
Re-formatted some blog posts using a custom paragraph formatter which limits columns to 70 characters.

pull/77/head
jrtechs 5 years ago
commit 60356eb059
6 changed files with 406 additions and 302 deletions
  1. blogContent/posts/data-science/is-using-ml-for-antivirus-safe.md (+82, -44)
  2. blogContent/posts/data-science/lets-build-a-genetic-algorithm.md (+128, -89)
  3. blogContent/posts/data-science/r-programming-language.md (+35, -22)
  4. blogContent/posts/open-source/the-essential-vim-configuration.md (+64, -75)
  5. blogContent/posts/other/2018-in-review.md (+10, -7)
  6. blogContent/posts/other/morality-of-self-driving-cars.md (+87, -65)

blogContent/posts/data-science/is-using-ml-for-antivirus-safe.md (+82, -44)

@ -1,28 +1,41 @@
In this blog post I examine the ways in which antivirus programs currently employ machine learning and then go into the security vulnerabilities that it brings.
# ML in the Antivirus Industry
Malware detection falls into two broad categories: static and dynamic analysis. Static analysis examines the program without actually running the code. It looks at things like file fingerprints, hashes, reverse engineering, memory artifacts, packer detection, and debugging. Static analysis is largely known for looking up the hash of a file against a known database of viruses. It is super easy to fool signature-based malware detection using simple obfuscation methods. Dynamic analysis is a technique where you run the program in a sandbox and monitor all the actions that it takes. If the program acts suspiciously, it is likely a virus. Suspicious behavior typically includes things like registry edits and API calls to bad host names.

Antivirus detection is very difficult, but probably not for the reasons you think. The issue isn't writing programs which can detect these static or dynamic properties of viruses -- that is the easy part. It is also relatively easy to determine a general rule set for what makes a program dangerous. You can also easily blacklist suspicious domains, block malicious activity, and implement a signature-based malware detection program.

The real problem is that there are hundreds of thousands of malware applications and more are created every day. Not only are there tons of pesky malware applications, there is an absurd number of normal programs which we don't want the antivirus to block. It is impossible for a small team of malware researchers to create a definitive set of heuristics which can correctly identify all malware programs. This is where we turn to the field of Machine Learning. Humans are very bad with big data, but computers love big data. Most antivirus companies use machine learning, and it has been a large success so far because it has dramatically improved our ability to detect zero-day viruses.
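As a concrete illustration (a minimal sketch, not taken from any real antivirus product), signature matching boils down to hashing a file and checking the digest against a list of known-bad hashes; the file name and hash value below are placeholders. Changing a single byte of the malware changes the hash, which is exactly why simple obfuscation defeats this check.

```javascript
// Hypothetical signature check: SHA-256 the file and compare the digest
// against a set of known malware hashes. The hash below is a placeholder.
const crypto = require("crypto");
const fs = require("fs");

const knownMalwareHashes = new Set([
    "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"
]);

const isKnownMalware = function(filePath)
{
    const bytes = fs.readFileSync(filePath);
    const digest = crypto.createHash("sha256").update(bytes).digest("hex");
    return knownMalwareHashes.has(digest);
};

console.log(isKnownMalware("./suspicious-download.bin"));
```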
## Interesting Examples
@ -39,50 +52,75 @@ Anything which is not a normal program, it alerts you about since it can be a vi
### Kaspersky
Kaspersky appears to have done a ton of research into using machine learning for malware detection. I would highly recommend that you read their [white paper](https://media.kaspersky.com/en/enterprise-security/Kaspersky-Lab-Whitepaper-Machine-Learning.pdf) on this subject.
# Why is this a problem?
It turns out that machine learning systems can be easily fooled by using other machine learning algorithms. A classic example of this is image classification. It is easy to use neural networks or genetic algorithms to generate examples which fool a machine learning application: you learn the weights of the model and then make slight tweaks to your input to produce a false classification.
![](media/AISaftey/AdversarialExample.png)
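To make the "slight tweaks" idea concrete, here is a toy sketch (my own illustration, not code from this post) that treats the classifier as a black box and keeps any random nudge that lowers its confidence. The scoring function is a stand-in, not a real model, and a gradient-based attack on a real network would be far more efficient.

```javascript
// Toy black-box evasion: randomly perturb the input and keep any change
// that lowers the (placeholder) model's confidence in the true class.
const classifyConfidence = function(input)
{
    // stand-in for a real model: confidence peaks at the point [5, 5]
    return 1 / (1 + Math.abs(input[0] - 5) + Math.abs(input[1] - 5));
};

const findAdversarialExample = function(input, steps)
{
    let best = input.slice();
    for (let i = 0; i < steps; i++)
    {
        const candidate = best.slice();
        const dim = Math.floor(Math.random() * candidate.length);
        candidate[dim] += (Math.random() - 0.5) * 0.2; // small tweak
        if (classifyConfidence(candidate) < classifyConfidence(best))
        {
            best = candidate;
        }
    }
    return best;
};

console.log(findAdversarialExample([5, 5], 1000));
```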
Since virus generation is a non-differentiable problem, people often use genetic algorithms for the adversarial network to fool the antivirus. In other words, you don't want to attempt to calculate the derivative between two versions of a virus for gradient descent. Since viruses are high-dimensional problems, it turns out that most calculus-based implementations would actually be inefficient at traversing the search space to find the global minimum. If you want to learn more about genetic algorithms, check out my [recent blog post](https://jrtechs.net/data-science/lets-build-a-genetic-algorithm) on it.
# Fooling Antivirus Software
## Genetic Algorithms
There are two major approaches which people have used to generate antivirus resistant malware with genetic algorithms. The first approach is to slowly make polymorphic changes to the virus in order to fool the malware detection. One of the interesting things about this approach is that you have to have some way of verifying that the polymorphic behaviors that you apply to the virus don't break its "virus capabilities".
Another approach is to represent a virus as a set of properties. These properties are everything from the port of attack, the payloads, obfuscation parameters, etc. The genetic algorithm would simply tweak the properties of the virus until it found a configuration which evaded the antivirus program.
## Reinforcement Learning
A research group at [Endgame](https://www.endgame.com/) recently gave a [Def Con](https://www.defcon.org/) talk where they presented a framework which uses reinforcement learning to evade static virus detection.
![Reinforcement Learning Diagram](media/AISaftey/Reinforcement_learning_diagram.png)
At a high level, the AI plays a "game" against the antivirus where the agent can make functionality-preserving mutations to the virus. The reward for the agent is its ability to avoid detection by the antivirus. Over time the AI learns which types of actions result in getting detected by the antivirus. This framework can be found on [GitHub](https://github.com/endgameinc/gym-malware).
# Takeaways
Machine learning is great, but it needs to be properly defended. As we start to use machine learning more and more, a large portion of the cyber security field may shift its focus away from securing systems to securing big data applications.
# Resources

blogContent/posts/data-science/lets-build-a-genetic-algorithm.md (+128, -89)

@ -4,36 +4,49 @@
# Background and Theory
Since you stumbled upon this article, you might be wondering what the heck genetic algorithms are. To put it simply: genetic algorithms employ the same tactics used in natural selection to find an optimal solution to an optimization problem. Genetic algorithms are often used in high-dimensional problems where the optimal solutions are not apparent. Genetic algorithms are commonly used to tune the [hyper-parameters](https://en.wikipedia.org/wiki/Hyperparameter) of a program. However, this algorithm can be used in any scenario where you have a function which defines how good a solution is. Many people have used genetic algorithms in video games to automatically learn the weaknesses of players.

The beautiful part about genetic algorithms is their simplicity; you need absolutely no knowledge of linear algebra or calculus. To implement a genetic algorithm from scratch you only need **very basic** algebra and a general grasp of evolution.
# Genetic Algorithm
All genetic algorithms typically have a single cycle where you continuously mutate, breed, and select the most optimal solutions. I will dive into each section of this algorithm using simple JavaScript code snippets. The algorithm which I present is very generic and modular so it should be easy to port into other programming languages and applications.
![Genetic Algorithms Flow Chart](media/GA/GAFlowChart.svg)
## Population Creation
The very first thing we need to do is specify a data structure for storing our genetic information. In biology, chromosomes are composed of sequences of genes. Many people run genetic algorithms on binary arrays since they more closely represent DNA. However, as computer scientists, it is often easier to model problems using continuous numbers. In this approach, every gene will be a single floating point number ranging between zero and one. Every type of gene will have a max and min value which represent the absolute extremes of that gene. This works well for optimization because it allows us to easily limit our search space. For example, we can specify that the "height" gene can only vary between 0 and 90. To get the actual value of the gene from its \[0-1] value we simply de-normalize it.
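For example, a normalized value of 0.5 on a height gene bounded by 0 and 90 de-normalizes to (90 - 0) * 0.5 + 0 = 45.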
$$
g_{\text{real value}} = (g_{high} - g_{low})\, g_{norm} + g_{low}
@ -87,17 +100,22 @@ class Gene
```
Now that we have genes, we can create chromosomes. Chromosomes are simply collections of genes. Whatever language you write this in, make sure that when you create a new chromosome it has a [deep copy](https://en.wikipedia.org/wiki/Object_copying) of the original genetic information rather than a shallow copy. A shallow copy is when you simply copy the object pointer, whereas a deep copy actually creates a new object. If you fail to do a deep copy, you will have weird issues where multiple chromosomes share the same DNA.
In this class I added helper functions to clone the chromosome as a random copy. You can only create a new chromosome by cloning because I wanted to keep the program generic and make no assumptions about the domain. Since you only provide the min/max information for the genes once, cloning an existing chromosome is the easiest way of ensuring that all corresponding chromosomes contain genes with identical extrema.
```javascript
@ -148,7 +166,8 @@ class Chromosome
}
```
Creating a random population is pretty straightforward if you have implemented a method to create a random clone of a chromosome.
```javascript
/**
@ -170,16 +189,17 @@ const createRandomPopulation = function(geneticChromosome, populationSize)
};
```
This is where nearly all the domain information is introduced. After you define what types of genes are found on each chromosome, you can create an entire population. In this example all genes contain values ranging between one and ten.
```javascript
let gene1 = new Gene(1,10,10);
let gene2 = new Gene(1,10,0.4);
let geneList = [gene1, gene2];
let exampleOrganism = new Chromosome(geneList);
let population = createRandomPopulation(exampleOrganism, 100);
```
@ -187,10 +207,14 @@ let population = createRandomPopulation(genericChromosome, 100);
## Evaluate Fitness
Like all optimization problems, you need a way to evaluate the performance of a particular solution. The cost function takes in a chromosome and evaluates how close it got to the ideal solution. This particular example just computes the [Manhattan Distance](https://en.wiktionary.org/wiki/Manhattan_distance) to a random 2D point. I chose two dimensions because it is easy to graph; however, real applications may have dozens of genes on each chromosome.
```javascript
let costx = Math.random() * 10;
@ -209,9 +233,11 @@ const basicCostFunction = function(chromosome)
## Selection
Selecting the best performing chromosomes is straightforward after you have a function for evaluating the performance. This code snippet also computes the average and best chromosome of the population to make it easier to graph and define the stopping point for the algorithm's main loop.
```javascript
/**
@ -249,10 +275,12 @@ const naturalSelection = function(population, keepNumber, fitnessFunction)
};
```
You might be wondering how I sorted the list of JSON objects -- not a numerical array. I used the following function as a comparator for JavaScript's built in sort function. This comparator will compare objects based on a specific attribute that you give it. This is a very handy function to include in all of your JavaScript projects for easy sorting.
```javascript
/**
@ -281,15 +309,16 @@ function predicateBy(prop)
## Reproduction
The process of reproduction can be broken down into Pairing and Mating.
### Pairing
Pairing is the process of selecting mates to produce offspring. A typical approach will separate the population into two segments of mothers and fathers. You then randomly pick pairs of mothers and fathers to produce offspring. It is ok if one chromosome mates more than once. It is just important that you keep this process random.
```javascript
/**
@ -317,22 +346,26 @@ const matePopulation = function(population, desiredPopulationSize)
### Mating
Mating is the actual act of forming new chromosomes/organisms based on your previously selected pairs. From my research, there are two major forms of mating: blending and crossover.
Blending is typically the most preferred approach to mating when dealing with continuous variables. In this approach you combine the genes of both parents based on a random factor.
$$
c_{new} = r * c_{mother} + (1-r) * c_{father}
$$
The second offspring simply uses (1-r) for its random factor to adjust the chromosomes.
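For example, with r = 0.25, a mother gene of 8 and a father gene of 4 blend into 0.25 * 8 + 0.75 * 4 = 5 for the first offspring and 0.75 * 8 + 0.25 * 4 = 7 for the second.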
Crossover is the simplest approach to mating. In this process you clone the parents and then randomly swap *n* of their genes. This works fine in some scenarios; however, it severely limits genetic diversity because you now have to rely solely on mutations for changes.
```javascript
/**
@ -373,14 +406,16 @@ const blendGene = function(gene1, gene2, blendCoef)
## Mutation
Mutations are random changes to an organism's DNA. In the scope of genetic algorithms, they help our population converge on the correct solution.
You can either adjust genes by a factor, resulting in a smaller change, or you can change the value of the gene to be something completely random. Since we are using the blending technique for reproduction, we already have small incremental changes. I prefer to use mutations to randomly change the entire gene since it helps prevent the algorithm from settling on a local minimum rather than the global minimum.
```javascript
@ -408,11 +443,13 @@ const mutatePopulation = function(population, mutatePercentage)
## Immigration
Immigration or "new blood" is the process of dumping random organisms into your population at each generation.
This prevents us from getting stuck in a local minimum rather than the global minimum.
There are more advanced techniques to accomplish this same concept.
My favorite approach (not implemented here) is raising **x** populations simultaneously and every **y** generations
you take **z** organisms from each population and move them to another population.
Immigration or "new blood" is the process of dumping random organisms
into your population at each generation. This prevents us from getting
stuck in a local minimum rather than the global minimum. There are
more advanced techniques to accomplish this same concept. My favorite
approach (not implemented here) is raising **x** populations
simultaneously and every **y** generations you take **z** organisms
from each population and move them to another population.
```javascript
/**
@ -432,7 +469,8 @@ const newBlood = function(population, immigrationSize)
## Putting It All Together
Now that we have all the ingredients for a genetic algorithm we can piece it together in a simple loop.
```javascript
/**
@ -487,11 +525,14 @@ const runGeneticOptimization = function(geneticChromosome, costFunction,
## Running
Running the program is pretty straightforward after you have your genes and cost function defined. You might be wondering if there is an optimal configuration of parameters to use with this algorithm. The answer is that it varies based on the particular problem. Problems like the one graphed by this website perform very well with a low mutation rate and a high population. However, some higher dimensional problems won't even converge on a local answer if you set your mutation rate too low.
```javascript
let gene1 = new Gene(1,10,10);
@ -499,17 +540,15 @@ let gene1 = new Gene(1,10,10);
let geneN = new Gene(1,10,0.4);
let geneList = [gene1,..., geneN];
let exampleOrganism = new Chromosome(geneList);

costFunction = function(chromosome)
{
    var d =...;
    //compute cost
    return d;
}
runGeneticOptimization(exampleOrganism, costFunction, 100, 50, 0.01, 0.3, 20, 10);
```
The complete code for the genetic algorithm and the fancy JavaScript graphs can be found in my [Random Scripts GitHub Repository](https://github.com/jrtechs/RandomScripts). In the future I may package this into an [npm](https://www.npmjs.com/) package.

blogContent/posts/data-science/r-programming-language.md (+35, -22)

@ -1,39 +1,52 @@
R is a programming language designed for statistical analysis and graphics. Since R has been around since 1992, it has developed a large community and has over [13 thousand packages](https://cran.r-project.org/web/packages/) publicly available. What is really cool about R is that it is an open source [GNU](http://www.gnu.org/) project.
# R Syntax and Paradigms
The syntax of R is C-esque with its use of curly braces. The type system of R is similar to Python's in that it can infer what type you are using. This "lazy" type system allows for "faster" development since you don't have to worry about declaring types -- but this laziness makes it harder to debug and read your code. The type system of R is rather strange and distinctly different from most other languages. For starters, integers are represented as vectors of length 1. These things may feel weird at first, but R's type system is one of the things that make it a great tool for manipulating data.
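For example, `length(5)` evaluates to 1 because the scalar 5 is really a numeric vector of length one.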
![R Arrays Start at 1](media/r/arrays.jpg)
Did I mention that arrays start at 1? Technically, the thing which we refer to as an array in Java is really a vector in R. Arrays in R are data objects which can store data in more than two dimensions. Since R tries to follow mathematical notation, indexing starts at 1 -- just like in linear algebra. Using zero-based indexing makes sense for languages like C because the index is used to get at a particular memory location from a pointer.
<youtube src="s3FozVfd7q4" />
I don't have the time to go over the basic syntax of R in a single blog post; however, I feel that this YouTube video does a pretty good job.
# R Markdown
One of my favorite aspects of R is its markdown language called Rmd.
Rmd is essentially markdown which can have embedded R scripts run in it. The Rmd file is compiled down to a markdown file which is converted to either a PDF, an HTML file, or a slide show using pandoc. You can provide options for the pandoc render using a YAML header in the Rmd file. This is an amazing tool for creating reports and writing research papers. The documents which you create are reproducible since you can share the source code for them. If the data which you are using changes, you simply have to recompile the document to get an updated view. You no longer have to re-generate a dozen graphs and update figures and statistics across your document.
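For example, a header along the lines of `title: "My Report"` followed by `output: pdf_document` is enough to have the document rendered as a PDF.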
# Resources

blogContent/posts/open-source/the-essential-vim-configuration.md (+64, -75)

@ -1,31 +1,27 @@
# Vim Configuration
Stock Vim is pretty boring. The good news is that Vim has a very comprehensive configuration file which allows you to tweak it to your heart's content. To make changes to Vim you simply modify the ~/.vimrc file in your home directory. By adding simple commands to this file you can easily change the way your text editor looks. Neat.
I attempted to create the smallest Vim configuration file which makes Vim usable enough for me to use as my daily text editor. I believe that it is important for everyone to know what their Vim configuration does. This knowledge will help ensure that you are only adding the things you want and that you can later customize it for your workflow.
Although it may be tempting to download somebody else's massive Vim configuration, I argue that this can lead to problems down the road.
I want to mention that I don't use Vim as my primary IDE; I only use Vim as a text editor. I tend to use JetBrains tools on larger projects since they have amazing auto complete functionality, build tools, and comprehensive error detection. There are great Vim configurations out there on the internet; however, most tend to be a bit overkill for what most people want to do.
Alright, let's dive into my Vim configuration!
# Spell Check
@ -35,34 +31,32 @@ autocmd BufRead,BufNewFile *.md setlocal spell spelllang=en_us
autocmd BufRead,BufNewFile *.txt setlocal spell spelllang=en_us
```
Since I am often an atrocious speller, having basic spell check abilities in Vim is a lifesaver. It does not make sense to have spell check enabled for most files since it would light up most programming files like a Christmas tree. I have my Vim configuration set to automatically enable spell check for markdown files and basic text files. If you need spell check in other files, you can enter the command ":set spell" to enable spell check for that file. To see the spelling recommendations, type "z=" when you are over a highlighted word.
# Appearance
Adding colors to Vim is fun. The "syntax enable" command tells Vim to highlight keywords in programming files and other structured files.
```vim
syntax enable
```
I would encourage everyone to look at the different color schemes available for Vim. I threw the color scheme command in a try-catch block to ensure that it does not crash Vim if you don't have the color scheme installed. By default the desert color scheme is installed; however, that is not always the case for [community created](http://vimcolors.com/) Vim color schemes.
```vim
try
@ -70,13 +64,12 @@ try
catch
endtry
set background=dark
```
# Indentation and Tabs
Having your indentation settings squared away will save you a ton of time if you are doing any programming in Vim.
```vim
"copy indentation from current line when making a new line
@ -84,28 +77,25 @@ set autoindent
" Smart indentation when programming: indent after {
set smartindent
set tabstop=4    " number of spaces per tab
set expandtab    " convert tabs to spaces
set shiftwidth=4 " set a tab press equal to 4 spaces
```
# Useful UI Tweaks
These are three UI tweaks that I find really useful to have; some people may have different opinions on these. Seeing line numbers is useful since programming errors typically just tell you what line your program went up in flames on. The cursor line is useful since it allows you to easily find your place in the file -- this may be a bit too much for some people.

I like to keep every line under 80 characters long for technical files, and having a visual cue for this is helpful. Some people prefer to just use the auto word wrap and keep their lines as long as they like. I like to keep to the 80 character limit and explicitly choose where I cut each line. Some of my university classes mandate the 80 character limit and take points off if you don't follow it.
```vim
" Set Line Numbers to show "
@ -121,7 +111,7 @@ set colorcolumn=80
# Searching and Auto Complete
These configurations make searching in Vim less painful.
```vim
" search as characters are entered "
@ -133,8 +123,8 @@ set hlsearch
set ignorecase
```
These configurations will make command completion easier by showing an auto-complete menu when you press tab.
```vim
" Shows a auto complete menu when you are typing a command "
@ -147,11 +137,10 @@ set wildignorecase " ignore case for auto complete
# Useful Things to Have
There is nothing too earth shattering in this section, just things that might save you some time. Enabling mouse support is a really interesting configuration. When enabled, this allows you to select text and jump between different locations with your mouse.
```vim
" Enables mouse support "
@ -170,7 +159,7 @@ set autoread
set lazyredraw
```
Setting your file format is always a good idea for compatibility.
```vim
" Set utf8 as standard encoding and en_US as the standard language "
@ -183,6 +172,6 @@ set ffs=unix,dos,mac
# Wrapping it up
I hope that this quick blog post inspired you to maintain your own Vim configuration file. You can find my current configuration files in my [random scripts repository](https://github.com/jrtechs/RandomScripts/tree/master/config).

blogContent/posts/other/2018-in-review.md (+10, -7)

@ -1,7 +1,10 @@
Inspired by [Justin Flory](https://justinwflory.com/) and [Dan Schneiderman](http://www.schneidy.com), I decided to make a 2018 review post. I believe that it is a good way to reflect upon what I did in 2018 and make plans for 2019. This post will be a very high level overview of the projects and activities that I did in 2018 -- nothing personal. Pictures say a thousand words, so I will include a lot.
# January:
@ -11,7 +14,7 @@ activities that I did in 2018 -- nothing personal. Pictures say a thousand words
**Started Second Semester of College**
Classes:
- Mechanics of Programming
- Statistics
@ -92,9 +95,9 @@ Classes:
**Second Year of College**
First year on the Eboard of RITlug as Vice President.
Classes:
- Linear Algebra
- Analysis Of Algorithms

blogContent/posts/other/morality-of-self-driving-cars.md (+87, -65)

@ -1,84 +1,106 @@
<youtube src="_MFGx8d1zl0" />
Although the movie *I Robot* has not aged well, it still brings up some interesting ethical questions that we are still discussing concerning self driving cars. The protagonist, Detective Spooner, has an almost unhealthy amount of distrust towards robots. In the movie, a robot decided to save Spooner's life over a 12 year old girl's in a car accident. This ignites the famous ethical debate of the trolley problem, but now with artificial intelligence. The debate boils down to this: are machines capable of making moral decisions? The surface level answer from the movie is presented as **no** when Spooner presents his car crash anecdote. This question parallels the discussion that we are currently having with self driving cars. When a self driving car is presented with two options which both result in the loss of life, what should it choose?
<iframe width="100%" height="315" src="https://www.youtube.com/embed/ixIoDYVfKA0" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>
When surveyed, most people say that they would prefer to have self driving cars take the utilitarian approach towards the trolley problem. A utilitarian approach would try to minimize the total amount of harm. MIT made a neat [website](http://moralmachine.mit.edu/) where it presents you with a bunch of "trolley problems" where you have to decide who dies. At the end of the survey the website presents you with a list of observed preferences you made when deciding whose life was more important to save. The purpose of the trolley problem is merely to ponder what decision a self driving car should make when **all** of its alternatives are depleted.
![Moral Machine](media/selfDrivingCars/moralmachine3.png)
We still need to question whether utilitarianism is the right moral engine for self driving cars. Would it be ethical for a car to take into account your age, race, gender, and social status when deciding if you get to live? If self driving cars could access personal information such as criminal history or known friends, would it be ethical to use that information? Would it be moral for someone to make a car which favored the safety of the passengers of the car above others?
![Moral Machine](media/selfDrivingCars/moralMachine.png)
Even though most people want self driving cars to use utilitarianism, most people surveyed also responded that they would not buy a car which did not have their safety as its top priority. This brings up a serious social dilemma. If people want everyone else's cars to be utilitarian, yet have their own cars be greedy and favor their safety, we would see none of the utilitarian improvements. This presents us with a tragedy of the commons problem, since everyone would favor their own safety and nobody would sacrifice their safety for the public good. This brings up yet another question: would it be fair to ask someone to sacrifice their safety in this way?

In most cases, when a tragedy of the commons situation is presented, government intervention is the most practical solution. It might be best to have the government mandate that all cars try to maximize the amount of life saved when a car is presented with the trolley problem. Despite appearing to be a good solution, the flaw in this does not become apparent until you use consequentialism to examine the problem.
![Moral Machine](media/selfDrivingCars/moralMachine6.png)
Self driving cars are expected to reduce car accidents by 90% by eliminating human error. If people decide not to use self driving cars due to the utilitarian moral engine, we run the risk of actually losing more lives. Some people have actually argued that since artificial intelligence is incapable of making moral decisions, it should take no action at all in a situation which always results in the loss of life. In the frame of the trolley problem, it is best for the artificial intelligence to not pull the lever. I will argue that it is best for self driving cars to not make ethical decisions because it would result in the highest adoption rate of self driving cars. This would end up saving the most lives in the long run. Plus, the likelihood that a car is actually presented with a trolley problem is pretty slim.

The discussion over the moral decisions a car has to make is almost fruitless. It turns out that humans are not even good at making moral decisions in emergency situations. When we make rash decisions influenced by anxiety, we are heavily influenced by prejudices and self motives. Despite our own shortcomings when it comes to decision making, that does not mean that we cannot do better with self driving cars. However, we need to realize that it is the mass adoption of self driving cars which will save the most lives, not the moral engine which we program the cars with. We cannot let the moral engine of the self driving cars get in the way of adoption.

The conclusion that I made parallels Spooner's problem with robots in the movie *I Robot*. Spooner was so mad at the robots for saving his own life rather than the girl's that he never realized that if it were not for the robots, neither of them would have survived that car crash. Does that mean we can't do better than not pulling the lever? Well... not exactly. Near the end of the movie a robot was presented with another trolley problem, but this time it managed to find a way which saved both parties. Without reading into this movie too deeply, this illustrates how the early adoption of artificial intelligence ended up saving tons of lives like Spooner's. It is only when the technology fully develops that we can start to avoid the trolley problem completely.
