diff --git a/blogContent/posts/data-science/is-using-ml-for-antivirus-safe.md b/blogContent/posts/data-science/is-using-ml-for-antivirus-safe.md index 88c4c2d..4ac9bfc 100644 --- a/blogContent/posts/data-science/is-using-ml-for-antivirus-safe.md +++ b/blogContent/posts/data-science/is-using-ml-for-antivirus-safe.md @@ -1,28 +1,41 @@ -In this blog post I examine the ways in which antivirus programs currently employ machine learning and then go into the -security vulnerabilities that it brings. +In this blog post I examine the ways in which antivirus programs +currently employ machine learning and then go into the security +vulnerabilities that it brings. # ML in the Antivirus Industry -Malware detection falls into two broad categories: static and dynamic analysis. -Static analysis examines the program without actually running the code. -Static analysis looks at things like the file fingerprints, hashes, reverse engineering, memory artifacts, packer detection, and debugging. -Static analysis is largely known for looking up the hashes of the virus against a known database of viruses. -It is super easy to fool signature based malware detection using simple obfuscation methods. -Dynamic analysis is a technique where you run the program in a sandbox and monitor all the actions that the virus takes. -If you notice that the program is acting suspicious, it is likely a virus. -Suspicious behavior typically includes things like registry edits and API calls to bad host names. - -Antivirus detection is very difficult, but, probably not for the reasons you think. -The issue isn't writing programs which can detect these static or dynamic properties of viruses-- that is the easy part. -It is also relatively easy to determine a general rule set for what makes a program dangerous. -You can also easily blacklist suspicious domains, block malicious activity, and implement a signature based maleware detection program. 
- -The real problem is that there are hundreds of thousands of malware applications and more are created every day. -Not only are there tons of pesky malware applications, there is an absurd amount of normal programs which we don't want malware applications to block. -It is impossible for a small team of malware researchers to create a definitive set of heuristics which can correctly identify all malware programs. -This is where we turn to the field of Machine Learning. -Humans are very bad with big data, but, computers love big data. -Most antivirus companies use machine learning and it has been a large success so far because it has allowed us to dramatically improve our ability to detect zero day viruses. +Malware detection falls into two broad categories: static and dynamic +analysis. Static analysis examines the program without actually +running the code. Static analysis looks at things like the file +fingerprints, hashes, reverse engineering, memory artifacts, packer +detection, and debugging. Static analysis is largely known for +looking up the hashes of the virus against a known database of +viruses. It is super easy to fool signature-based malware detection +using simple obfuscation methods. Dynamic analysis is a technique +where you run the program in a sandbox and monitor all the actions +that the virus takes. If you notice that the program is acting +suspiciously, it is likely a virus. Suspicious behavior typically +includes things like registry edits and API calls to bad host names. + +Antivirus detection is very difficult, but probably not for the +reasons you think. The issue isn't writing programs which can detect +these static or dynamic properties of viruses -- that is the easy part. +It is also relatively easy to determine a general rule set for what +makes a program dangerous. You can also easily blacklist suspicious +domains, block malicious activity, and implement a signature-based +malware detection program. 
+ +The real problem is that there are hundreds of thousands of malware +applications and more are created every day. Not only are there tons +of pesky malware applications, there is an absurd number of normal +programs which we don't want antivirus applications to block. It is +impossible for a small team of malware researchers to create a +definitive set of heuristics which can correctly identify all malware +programs. This is where we turn to the field of machine learning. +Humans are very bad with big data, but computers love big data. Most +antivirus companies use machine learning and it has been a large +success so far because it has allowed us to dramatically improve our +ability to detect zero-day viruses. ## Interesting Examples @@ -39,50 +52,75 @@ Anything which is not a normal program, it alerts you about since it can be a vi ### Kaspersky -Kaspersky appears to have done a ton of research into using machine learning for malware detection. -I would highly recommend that you read their [white paper](https://media.kaspersky.com/en/enterprise-security/Kaspersky-Lab-Whitepaper-Machine-Learning.pdf) on this subject. +Kaspersky appears to have done a ton of research into using machine +learning for malware detection. I would highly recommend that you read +their [white +paper](https://media.kaspersky.com/en/enterprise-security/Kaspersky-Lab-Whitepaper-Machine-Learning.pdf) +on this subject. # Why is this a problem? -It turns out that machine learning systems can be easily fooled by using other machine learning algorithms. -A classic example of this is with image classification. -It is easy to use neural networks or genetic algorithms to generate examples which fool the machine learning application by learning the weights of the machine -learning application and then making slight tweaks to your input to give a false classification. +It turns out that machine learning systems can be easily fooled by +using other machine learning algorithms. 
A classic example of this is +with image classification. It is easy to use neural networks or +genetic algorithms to generate examples which fool the machine +learning application by learning the weights of the machine learning +application and then making slight tweaks to your input to give a +false classification. ![](media/AISaftey/AdversarialExample.png) -Since viruses generation is a non-differentiable problem, people often use Genetic algorithms for the adversarial network to fool the antivirus. -In other words, you don't want to attempt to calculate the derivative between two versions of a virus for gradient decent. -Since viruses are high dimensional problems, it turns out that most calc implementations would actually be inefficient at traversing the search space to find the global minimum. -If you want to learn more about genetic algorithms, check out my [recent blog post](https://jrtechs.net/data-science/lets-build-a-genetic-algorithm) on it. +Since virus generation is a non-differentiable problem, people often +use genetic algorithms for the adversarial network to fool the +antivirus. In other words, you don't want to attempt to calculate the +derivative between two versions of a virus for gradient descent. Since +viruses are high-dimensional problems, it turns out that most +calculus-based methods would actually be inefficient at traversing the search +space to find the global minimum. If you want to learn more about +genetic algorithms, check out my [recent blog +post](https://jrtechs.net/data-science/lets-build-a-genetic-algorithm) +on it. # Fooling Antivirus Software ## Genetic Algorithms -There are two major approaches which people have used to generate antivirus resistant malware with genetic algorithms. -The first approach is to slowly make polymorphic changes to the virus in order to fool the malware detection. 
-One of the interesting things about this approach is that you have to have some way of verifying that the polymorphic behaviors that you apply to the virus don't break its "virus capabilities". +There are two major approaches which people have used to generate +antivirus resistant malware with genetic algorithms. The first +approach is to slowly make polymorphic changes to the virus in order +to fool the malware detection. One of the interesting things about +this approach is that you have to have some way of verifying that the +polymorphic behaviors that you apply to the virus don't break its +"virus capabilities". An other approach used is to represent a virus as a set of properties. -These properties are everything from the port of attack, the payloads, obfuscation parameters, etc. -The genetic algorithm would simply tweak the properties of the virus until it found a configuration which evaded the antivirus program. +These properties are everything from the port of attack, the payloads, +obfuscation parameters, etc. The genetic algorithm would simply tweak +the properties of the virus until it found a configuration which +evaded the antivirus program. ## Reinforcement Learning -A research group at [Endgame](https://www.endgame.com/) recently gave a [Def Con](https://www.defcon.org/) talk where they presented a framework which uses reinforcement learning to evade static virus detection. +A research group at [Endgame](https://www.endgame.com/) recently gave +a [Def Con](https://www.defcon.org/) talk where they presented a +framework which uses reinforcement learning to evade static virus +detection. ![Reinforcement Learning Diagram](media/AISaftey/Reinforcement_learning_diagram.png) -At a high level, the AI plays a "game" against the antivirus where the agent can make functionality-preserving mutations to the virus. -The reward for the agent is its ability to not get detected by the anti-virus. 
-Over time the AI will learn which type of actions will result in getting detected by the antivirus. -This framework can be found on [Github](https://github.com/endgameinc/gym-malware). +At a high level, the AI plays a "game" against the antivirus where the +agent can make functionality-preserving mutations to the virus. The +reward for the agent is its ability to not get detected by the +anti-virus. Over time the AI will learn which type of actions will +result in getting detected by the antivirus. This framework can be +found on [Github](https://github.com/endgameinc/gym-malware). # Takeaways -Machine learning is great, but, it needs to be properly defended. -As we start to use machine learning more and more, a large portion of the cyber security field may shift its focus away from securing systems to securing big data applications. +Machine learning is great, but, it needs to be properly defended. As +we start to use machine learning more and more, a large portion of the +cyber security field may shift its focus away from securing systems to +securing big data applications. # Resources diff --git a/blogContent/posts/data-science/lets-build-a-genetic-algorithm.md b/blogContent/posts/data-science/lets-build-a-genetic-algorithm.md index 1dcdfb5..0f735bc 100644 --- a/blogContent/posts/data-science/lets-build-a-genetic-algorithm.md +++ b/blogContent/posts/data-science/lets-build-a-genetic-algorithm.md @@ -4,36 +4,49 @@ # Background and Theory -Since you stumbled upon this article, you might be wondering what the heck genetic algorithms are. -To put it simply: genetic algorithms employ the same tactics used in natural selection to find an optimal solution to an optimization problem. -Genetic algorithms are often used in high dimensional problems where the optimal solutions are not apparent. -Genetic algorithms are commonly used to tune the [hyper-parameters](https://en.wikipedia.org/wiki/Hyperparameter) of a program. 
-However, this algorithm can be used in any scenario where you have a function which defines how well a solution is. -Many people have used genetic algorithms in video games to auto learn the weaknesses of players. - -The beautiful part about Genetic Algorithms are their simplicity; you need absolutely no knowledge of linear algebra or calculus. -To implement a genetic algorithm from scratch you only need **very basic** algebra and a general grasp of evolution. +Since you stumbled upon this article, you might be wondering what the +heck genetic algorithms are. To put it simply: genetic algorithms +employ the same tactics used in natural selection to find an optimal +solution to an optimization problem. Genetic algorithms are often used +in high dimensional problems where the optimal solutions are not +apparent. Genetic algorithms are commonly used to tune the +[hyper-parameters](https://en.wikipedia.org/wiki/Hyperparameter) of a +program. However, this algorithm can be used in any scenario where you +have a function which defines how well a solution is. Many people have +used genetic algorithms in video games to auto learn the weaknesses of +players. + +The beautiful part about Genetic Algorithms are their simplicity; you +need absolutely no knowledge of linear algebra or calculus. To +implement a genetic algorithm from scratch you only need **very +basic** algebra and a general grasp of evolution. # Genetic Algorithm -All genetic algorithms typically have a single cycle where you continuously mutate, breed, and select the most optimal solutions. -I will dive into each section of this algorithm using simple JavaScript code snippets. -The algorithm which I present is very generic and modular so it should be easy to port into other programming languages and applications. +All genetic algorithms typically have a single cycle where you +continuously mutate, breed, and select the most optimal solutions. 
I +will dive into each section of this algorithm using simple JavaScript +code snippets. The algorithm which I present is very generic and +modular so it should be easy to port into other programming languages +and applications. ![Genetic Algorithms Flow Chart](media/GA/GAFlowChart.svg) ## Population Creation -The very first thing we need to do is specify a data-structure for storing our genetic information. -In biology, chromosomes are composed of sequences of genes. -Many people run genetic algorithms on binary arrays since they more closely represent DNA. -However, as computer scientists, it is often easier to model problems using continuous numbers. -In this approach, every gene will be a single floating point number ranging between zero and one. -Every type of gene will have a max and min value which represents the absolute extremes of that gene. -This works well for optimization because it allows us to easily limit our search space. -For example, we can specify that "height" gene can only vary between 0 and 90. -To get the actual value of the gene from its \[0-1] value we simple de-normalize it. +The very first thing we need to do is specify a data-structure for +storing our genetic information. In biology, chromosomes are composed +of sequences of genes. Many people run genetic algorithms on binary +arrays since they more closely represent DNA. However, as computer +scientists, it is often easier to model problems using continuous +numbers. In this approach, every gene will be a single floating point +number ranging between zero and one. Every type of gene will have a +max and min value which represents the absolute extremes of that gene. +This works well for optimization because it allows us to easily limit +our search space. For example, we can specify that "height" gene can +only vary between 0 and 90. To get the actual value of the gene from +its \[0-1] value we simple de-normalize it. 
$$ g_{real value} = (g_{high}- g_{low})g_{norm} + g_{low} @@ -87,17 +100,22 @@ class Gene ``` -Now that we have genes, we can create chromosomes. -Chromosomes are simply collections of genes. -Whatever language you make this in, make sure that when you create a new chromosome it -is has a [deep copy](https://en.wikipedia.org/wiki/Object_copying) of the original genetic information rather than a shallow copy. -A shallow copy is when you simple copy the object pointer where a deep copy is actually creating a new object. -If you fail to do a deep copy, you will have weird issues where multiple chromosomes will share the same DNA. +Now that we have genes, we can create chromosomes. Chromosomes are +simply collections of genes. Whatever language you make this in, make +sure that when you create a new chromosome it is has a [deep +copy](https://en.wikipedia.org/wiki/Object_copying) of the original +genetic information rather than a shallow copy. A shallow copy is when +you simple copy the object pointer where a deep copy is actually +creating a new object. If you fail to do a deep copy, you will have +weird issues where multiple chromosomes will share the same DNA. -In this class I added helper functions to clone the chromosome as a random copy. -You can only create a new chromosome by cloning because I wanted to keep the program generic and make no assumptions about the domain. -Since you only provide the min/max information for the genes once, cloning an existing chromosome is the easiest way of - ensuring that all corresponding chromosomes contain genes with identical extrema. +In this class I added helper functions to clone the chromosome as a +random copy. You can only create a new chromosome by cloning because +I wanted to keep the program generic and make no assumptions about the +domain. 
Since you only provide the min/max information for the genes +once, cloning an existing chromosome is the easiest way of ensuring +that all corresponding chromosomes contain genes with identical +extrema. ```javascript @@ -148,7 +166,8 @@ class Chromosome } ``` -Creating a random population is pretty straight forward if implemented a method to create a random clone of a chromosome. +Creating a random population is pretty straight forward if implemented +a method to create a random clone of a chromosome. ```javascript /** @@ -170,16 +189,17 @@ const createRandomPopulation = function(geneticChromosome, populationSize) }; ``` -This is where nearly all the domain information is introduced. -After you define what types of genes are found on each chromosome, you can create an entire population. -In this example all genes contain values ranging between one and ten. +This is where nearly all the domain information is introduced. After +you define what types of genes are found on each chromosome, you can +create an entire population. In this example all genes contain values +ranging between one and ten. ```javascript let gene1 = new Gene(1,10,10); let gene2 = new Gene(1,10,0.4); let geneList = [gene1, gene2]; -let exampleOrganism = new Chromosome(geneList); +let exampleOrganism = new Chromosome(geneList); let population = createRandomPopulation(genericChromosome, 100); ``` @@ -187,10 +207,14 @@ let population = createRandomPopulation(genericChromosome, 100); ## Evaluate Fitness -Like all optimization problems, you need a way to evaluate the performance of a particular solution. -The cost function takes in a chromosome and evaluates how close it got to the ideal solution. -This particular example it is just computing the [Manhattan Distance](https://en.wiktionary.org/wiki/Manhattan_distance) to a random 2D point. -I chose two dimensions because it is easy to graph, however, real applications may have dozens of genes on each chromosome. 
+Like all optimization problems, you need a way to evaluate the +performance of a particular solution. The cost function takes in a +chromosome and evaluates how close it got to the ideal solution. This +particular example it is just computing the [Manhattan +Distance](https://en.wiktionary.org/wiki/Manhattan_distance) to a +random 2D point. I chose two dimensions because it is easy to graph, +however, real applications may have dozens of genes on each +chromosome. ```javascript let costx = Math.random() * 10; @@ -209,9 +233,11 @@ const basicCostFunction = function(chromosome) ## Selection -Selecting the best performing chromosomes is straightforward after you have a function for evaluating the performance. -This code snippet also computes the average and best chromosome of the population to make it easier to graph and define -the stopping point for the algorithm's main loop. +Selecting the best performing chromosomes is straightforward after you +have a function for evaluating the performance. This code snippet also +computes the average and best chromosome of the population to make it +easier to graph and define the stopping point for the algorithm's main +loop. ```javascript /** @@ -249,10 +275,12 @@ const naturalSelection = function(population, keepNumber, fitnessFunction) }; ``` -You might be wondering how I sorted the list of JSON objects - not a numerical array. -I used the following function as a comparator for JavaScript's built in sort function. -This comparator will compare objects based on a specific attribute that you give it. -This is a very handy function to include in all of your JavaScript projects for easy sorting. +You might be wondering how I sorted the list of JSON objects - not a +numerical array. I used the following function as a comparator for +JavaScript's built in sort function. This comparator will compare +objects based on a specific attribute that you give it. 
This is a very +handy function to include in all of your JavaScript projects for easy +sorting. ```javascript /** @@ -281,15 +309,16 @@ function predicateBy(prop) ## Reproduction -The process of reproduction can be broken down into Pairing and Mating. +The process of reproduction can be broken down into Pairing and +Mating. ### Pairing -Pairing is the process of selecting mates to produce offspring. -A typical approach will separate the population into two segments of mothers and fathers. -You then randomly pick pairs of mothers and fathers to produce offspring. -It is ok if one chromosome mates more than once. -It is just important that you keep this process random. +Pairing is the process of selecting mates to produce offspring. A +typical approach will separate the population into two segments of +mothers and fathers. You then randomly pick pairs of mothers and +fathers to produce offspring. It is ok if one chromosome mates more +than once. It is just important that you keep this process random. ```javascript /** @@ -317,22 +346,26 @@ const matePopulation = function(population, desiredPopulationSize) ### Mating -Mating is the actual act of forming new chromosomes/organisms based on your previously selected pairs. -From my research, there are two major forms of mating: blending, crossover. +Mating is the actual act of forming new chromosomes/organisms based on +your previously selected pairs. From my research, there are two major +forms of mating: blending, crossover. -Blending is typically the most preferred approach to mating when dealing with continuous variables. -In this approach you combine the genes of both parents based on a random factor. +Blending is typically the most preferred approach to mating when +dealing with continuous variables. In this approach you combine the +genes of both parents based on a random factor. 
$$ c_{new} = r * c_{mother} + (1-r) * c_{father} $$ -The second offspring simply uses (1-r) for their random factor to adjust the chromosomes. +The second offspring simply uses (1-r) for their random factor to +adjust the chromosomes. -Crossover is the simplest approach to mating. -In this process you clone the parents and then you randomly swap *n* of their genes. -This works fine in some scenarios; however, this severely lacks the genetic diversity of the genes because you now have to solely -rely on mutations for changes. +Crossover is the simplest approach to mating. In this process you +clone the parents and then you randomly swap *n* of their genes. This +works fine in some scenarios; however, this severely lacks the genetic +diversity of the genes because you now have to solely rely on +mutations for changes. ```javascript /** @@ -373,14 +406,16 @@ const blendGene = function(gene1, gene2, blendCoef) ## Mutation -Mutations are random changes to an organisms DNA. -In the scope of genetic algorithms, it helps our population converge on the correct solution. +Mutations are random changes to an organisms DNA. In the scope of +genetic algorithms, it helps our population converge on the correct +solution. -You can either adjust genes by a factor resulting in a smaller change or, you can -change the value of the gene to be something completely random. -Since we are using the blending technique for reproduction, we already have small incremental changes. -I prefer to use mutations to randomly change the entire gene since it helps prevent the algorithm -from settling on a local minimum rather than the global minimum. +You can either adjust genes by a factor resulting in a smaller change +or, you can change the value of the gene to be something completely +random. Since we are using the blending technique for reproduction, we +already have small incremental changes. 
I prefer to use mutations to +randomly change the entire gene since it helps prevent the algorithm +from settling on a local minimum rather than the global minimum. ```javascript @@ -408,11 +443,13 @@ const mutatePopulation = function(population, mutatePercentage) ## Immigration -Immigration or "new blood" is the process of dumping random organisms into your population at each generation. -This prevents us from getting stuck in a local minimum rather than the global minimum. -There are more advanced techniques to accomplish this same concept. -My favorite approach (not implemented here) is raising **x** populations simultaneously and every **y** generations -you take **z** organisms from each population and move them to another population. +Immigration or "new blood" is the process of dumping random organisms +into your population at each generation. This prevents us from getting +stuck in a local minimum rather than the global minimum. There are +more advanced techniques to accomplish this same concept. My favorite +approach (not implemented here) is raising **x** populations +simultaneously and every **y** generations you take **z** organisms +from each population and move them to another population. ```javascript /** @@ -432,7 +469,8 @@ const newBlood = function(population, immigrationSize) ## Putting It All Together -Now that we have all the ingredients for a genetic algorithm we can piece it together in a simple loop. +Now that we have all the ingredients for a genetic algorithm we can +piece it together in a simple loop. ```javascript /** @@ -487,11 +525,14 @@ const runGeneticOptimization = function(geneticChromosome, costFunction, ## Running -Running the program is pretty straight forward after you have your genes and cost function defined. -You might be wondering if there is an optimal configuration of parameters to use with this algorithm. -The answer is that it varies based on the particular problem. 
-Problems like the one graphed by this website perform very well with a low mutation rate and a high population. -However, some higher dimensional problems won't even converge on a local answer if you set your mutation rate too low. +Running the program is pretty straightforward after you have your +genes and cost function defined. You might be wondering if there is an +optimal configuration of parameters to use with this algorithm. The +answer is that it varies based on the particular problem. Problems +like the one graphed by this website perform very well with a low +mutation rate and a high population. However, some higher-dimensional +problems won't even converge on a local answer if you set your +mutation rate too low. ```javascript let gene1 = new Gene(1,10,10); @@ -499,17 +540,15 @@ let gene1 = new Gene(1,10,10); let geneN = new Gene(1,10,0.4); let geneList = [gene1,..., geneN]; -let exampleOrganism = new Chromosome(geneList); +let exampleOrganism = new Chromosome(geneList); -costFunction = function(chromosome) -{ - var d =...; - //compute cost - return d; -} +costFunction = function(chromosome) { var d =...; /* compute cost */ +return d; } runGeneticOptimization(exampleOrganism, costFunction, 100, 50, 0.01, 0.3, 20, 10); ``` -The complete code for the genetic algorithm and the fancy JavaScript graphs can be found in my [Random Scripts GitHub Repository](https://github.com/jrtechs/RandomScripts). -In the future I may package this into an [npm](https://www.npmjs.com/) package. +The complete code for the genetic algorithm and the fancy JavaScript +graphs can be found in my [Random Scripts GitHub +Repository](https://github.com/jrtechs/RandomScripts). In the future I +may package this into an [npm](https://www.npmjs.com/) package. 
\ No newline at end of file diff --git a/blogContent/posts/data-science/r-programming-language.md b/blogContent/posts/data-science/r-programming-language.md index c45efe7..24f0cf5 100644 --- a/blogContent/posts/data-science/r-programming-language.md +++ b/blogContent/posts/data-science/r-programming-language.md @@ -1,39 +1,52 @@ -R is a programming language designed for statistical analysis and graphics. -Since R has been around since 1992, it has developed a large community and has over [13 thousand packages](https://cran.r-project.org/web/packages/) publicly available. -What is really cool about R is that it is an open source [GNU](http://www.gnu.org/) project. +R is a programming language designed for statistical analysis and +graphics. Since R has been around since 1992, it has developed a large +community and has over [13 thousand +packages](https://cran.r-project.org/web/packages/) publicly +available. What is really cool about R is that it is an open-source +[GNU](http://www.gnu.org/) project. # R Syntax and Paradigms -The syntax of R is C esk with its use of curly braces. -The type system of R is similar to Python where it can infer what type you are using. -This "lazy" type system allows for "faster" development since you don't have to worry about declaring types -- this laziness makes it harder to debug and read your code. -The type system of R is rather strange and distinctly different from most other languages. -For starters, integers are represented as vectors of length 1. -These things may feel weird at first, but, R's type system is one of the things that make it a great tool for manipulating data. +The syntax of R is C-esque with its use of curly braces. The type +system of R is similar to Python where it can infer what type you are +using. This "lazy" type system allows for "faster" development since +you don't have to worry about declaring types -- however, this laziness makes +it harder to debug and read your code. 
The type system of R is rather +strange and distinctly different from most other languages. For +starters, integers are represented as vectors of length 1. These +things may feel weird at first, but, R's type system is one of the +things that make it a great tool for manipulating data. ![R Arrays Start at 1](media/r/arrays.jpg) -Did I mention that arrays start at 1? -Technically, the thing which we refer to as an array in Java are really vectors in R. -Arrays in R are data objects which can store data in more than two dimensions. -Since R tries to follow mathematical notation, indexing starts at 1 -- just like in linear algebra. -Using zero based indexing makes sense for languages like C because the index is used to get at a particular memory location from a pointer. +Did I mention that arrays start at 1? Technically, the thing which we +refer to as an array in Java are really vectors in R. Arrays in R are +data objects which can store data in more than two dimensions. Since R +tries to follow mathematical notation, indexing starts at 1 -- just +like in linear algebra. Using zero based indexing makes sense for +languages like C because the index is used to get at a particular +memory location from a pointer. -I don't have the time to go over the basic syntax of R in a single blog post; however, I feel that this youtube video does a pretty good job. +I don't have the time to go over the basic syntax of R in a single +blog post; however, I feel that this youtube video does a pretty good +job. # R Markdown One of my favorite aspects of R is its markdown language called Rmd. -Rmd is essentially markdown which has can have embedded R scripts run in it. -The Rmd file is compiled down to a markdown file which is converted to either a PDF, HTML file, or a slide show using pandoc. -You can provide options for the pandoc render using a YAMAL header in the Rmd file. -This is an amazing tool for creating reports and writing research papers. 
-The documents which you create are reproducible since you can share the source code to it. -If the data which you are using changes, you simply have to recompile to document to get an updated view. -You no longer have to re-generate a dozen graphs and update figures and statistics across your document. +Rmd is essentially markdown which has can have embedded R scripts run +in it. The Rmd file is compiled down to a markdown file which is +converted to either a PDF, HTML file, or a slide show using pandoc. +You can provide options for the pandoc render using a YAMAL header in +the Rmd file. This is an amazing tool for creating reports and writing +research papers. The documents which you create are reproducible since +you can share the source code to it. If the data which you are using +changes, you simply have to recompile to document to get an updated +view. You no longer have to re-generate a dozen graphs and update +figures and statistics across your document. # Resources diff --git a/blogContent/posts/open-source/the-essential-vim-configuration.md b/blogContent/posts/open-source/the-essential-vim-configuration.md index f2d401c..41b288a 100644 --- a/blogContent/posts/open-source/the-essential-vim-configuration.md +++ b/blogContent/posts/open-source/the-essential-vim-configuration.md @@ -1,31 +1,27 @@ # Vim Configuration -Stock Vim is pretty boring. -The good news is that Vim has a very comprehensive configuration file which -allows you to tweak it to your heart's content. -To make changes to Vim you simply modify the ~/.vimrc file in your home -directory. -By adding simple commands this file you can easily change the way your -text editor looks. -Neat. +Stock Vim is pretty boring. The good news is that Vim has a very +comprehensive configuration file which allows you to tweak it to your +heart's content. To make changes to Vim you simply modify the ~/.vimrc +file in your home directory. 
By adding simple commands this file you +can easily change the way your text editor looks. Neat. I attempted to create the smallest Vim configuration file which makes -Vim usable enough for me to use as my daily text editor. -I believe that it is important for everyone to know what their -Vim configuration does. -This knowledge will help ensure that you are only adding the things -you want and that you can later customize it for your workflow. +Vim usable enough for me to use as my daily text editor. I believe +that it is important for everyone to know what their Vim configuration +does. This knowledge will help ensure that you are only adding the +things you want and that you can later customize it for your workflow. Although it may be tempting to download somebody else's massive Vim -configuration, I argue that this can lead to problems down the road. +configuration, I argue that this can lead to problems down the road. -I want to mention that I don't use Vim as my primary -IDE; I only use Vim as a text editor. -I tend to use JetBrains tools on larger projects since they have amazing -auto complete functionality, build tools, and comprehensive error detection. -There are great Vim configurations out there on the internet; however, most -tend to be a bit overkill for what most people want to do. +I want to mention that I don't use Vim as my primary IDE; I only use +Vim as a text editor. I tend to use JetBrains tools on larger projects +since they have amazing auto complete functionality, build tools, and +comprehensive error detection. There are great Vim configurations out +there on the internet; however, most tend to be a bit overkill for +what most people want to do. -Alright, lets dive into my vim configuration! +Alright, lets dive into my vim configuration! 
# Spell Check @@ -35,34 +31,32 @@ autocmd BufRead,BufNewFile *.md setlocal spell spelllang=en_us autocmd BufRead,BufNewFile *.txt setlocal spell spelllang=en_us ``` -Since I am often an atrocious speller, having basic spell check abilities in -Vim is a lifesaver. -It does not make sense to have spell check enabled for most files since it -would light up most programming files like a Christmas tree. -I have my Vim configuration set to automatically enable spell check for markdown files -and basic text files. -If you need spell check in other files, you can enter the command -":set spell" to enable spell check for that file. -To see the spelling recommendations, type "z=" when you are over a -highlighted word. +Since I am often an atrocious speller, having basic spell check +abilities in Vim is a lifesaver. It does not make sense to have spell +check enabled for most files since it would light up most programming +files like a Christmas tree. I have my Vim configuration set to +automatically enable spell check for markdown files and basic text +files. If you need spell check in other files, you can enter the +command ":set spell" to enable spell check for that file. To see the +spelling recommendations, type "z=" when you are over a highlighted +word. # Appearance -Adding colors to Vim is fun. -The "syntax enable" command tells vim to highlight keywords in programming -files and other structured files. +Adding colors to Vim is fun. The "syntax enable" command tells vim to +highlight keywords in programming files and other structured files. ```vim syntax enable ``` -I would encourage everyone to look at the different color schemes available for -Vim. -I threw the color scheme command in a try-catch block to ensure that it does not crash -Vim if you don't have the color scheme installed. -By default the desert color scheme is installed; however, that is not always the -case for [community created](http://vimcolors.com/) Vim color schemes. 
+I would encourage everyone to look at the different color schemes +available for Vim. I threw the color scheme command in a try-catch +block to ensure that it does not crash Vim if you don't have the color +scheme installed. By default the desert color scheme is installed; +however, that is not always the case for [community +created](http://vimcolors.com/) Vim color schemes. ```vim try @@ -70,13 +64,12 @@ try catch endtry -set background=dark -``` +set background=dark ``` # Indentation and Tabs -Having your indentation settings squared away will save you a ton of time -if you are doing any programming in Vim. +Having your indentation settings squared away will save you a ton of +time if you are doing any programming in Vim. ```vim "copy indentation from current line when making a new line @@ -84,28 +77,25 @@ set autoindent " Smart indentation when programming: indent after { set smartindent -set tabstop=4 " number of spaces per tab -set expandtab " convert tabs to spaces -set shiftwidth=4 " set a tab press equal to 4 spaces -``` +set tabstop=4 " number of spaces per tab set expandtab " +convert tabs to spaces set shiftwidth=4 " set a tab press equal to 4 +spaces ``` # Useful UI Tweaks -These are three UI tweaks that I find really useful to have, some people may -have different opinions on these. -Seeing line numbers is useful since programming errors typically just -tells you what line your program went up in flames. -The cursor line is useful since it allows you to easily to find your place -in the file -- this may be a bit too much for some people. - -I like to keep every line under 80 characters long for technical files, -having a visual queue for this is helpful. -Some people prefer to just use the auto word wrap and keep their lines as long -as they like. -I like to keep to the 80 character limit and explicitly choose where -I cut each line. -Some of my university classes mandate the 80 character limit and take -points off if you don't follow it. 
+These are three UI tweaks that I find really useful to have, some +people may have different opinions on these. Seeing line numbers is +useful since programming errors typically just tells you what line +your program went up in flames. The cursor line is useful since it +allows you to easily to find your place in the file -- this may be a +bit too much for some people. + +I like to keep every line under 80 characters long for technical +files, having a visual queue for this is helpful. Some people prefer +to just use the auto word wrap and keep their lines as long as they +like. I like to keep to the 80 character limit and explicitly choose +where I cut each line. Some of my university classes mandate the 80 +character limit and take points off if you don't follow it. ```vim " Set Line Numbers to show " @@ -121,7 +111,7 @@ set colorcolumn=80 # Searching and Auto Complete -This these configurations make searching in Vim less painful. +This these configurations make searching in Vim less painful. ```vim " search as characters are entered " @@ -133,8 +123,8 @@ set hlsearch set ignorecase ``` -These configurations will make command completion easier by -showing an auto-complete menu when you press tab. +These configurations will make command completion easier by showing +an auto-complete menu when you press tab. ```vim " Shows a auto complete menu when you are typing a command " @@ -147,11 +137,10 @@ set wildignorecase " ignore case for auto complete # Useful Things to Have -There is nothing too earth shattering in this section, just things that -might save you some time. -Enabling mouse support is a really interesting configuration. -When enabled, this allows you to select text and jump between different -locations with your mouse. +There is nothing too earth shattering in this section, just things +that might save you some time. Enabling mouse support is a really +interesting configuration. 
When enabled, this allows you to select +text and jump between different locations with your mouse. ```vim " Enables mouse support " @@ -170,7 +159,7 @@ set autoread set lazyredraw ``` -Setting your file format is always a good idea for compatibility. +Setting your file format is always a good idea for compatibility. ```vim " Set utf8 as standard encoding and en_US as the standard language " @@ -183,6 +172,6 @@ set ffs=unix,dos,mac # Wrapping it up I hope that this quick blog post inspired you to maintain your own Vim -configuration file. -You can find my current configuration files in my -[random scripts repository](https://github.com/jrtechs/RandomScripts/tree/master/config). +configuration file. You can find my current configuration files in my +[random scripts +repository](https://github.com/jrtechs/RandomScripts/tree/master/config). diff --git a/blogContent/posts/other/2018-in-review.md b/blogContent/posts/other/2018-in-review.md index 9f919ed..31caa80 100644 --- a/blogContent/posts/other/2018-in-review.md +++ b/blogContent/posts/other/2018-in-review.md @@ -1,7 +1,10 @@ -Inspired by [Justin Flory](https://justinwflory.com/) and [Dan Schneiderman](http://www.schneidy.com), -I decided to make a 2018 review post. I believe that it would be a good way to reflect upon what I did -in 2018 and make plans for 2019. This post will be a very high level overview of the projects and -activities that I did in 2018 -- nothing personal. Pictures say a thousand words, so, I will include a lot. +Inspired by [Justin Flory](https://justinwflory.com/) and [Dan +Schneiderman](http://www.schneidy.com), I decided to make a 2018 +review post. I believe that it would be a good way to reflect upon +what I did in 2018 and make plans for 2019. This post will be a very +high level overview of the projects and activities that I did in 2018 +-- nothing personal. Pictures say a thousand words, so, I will include +a lot. 
# January: @@ -11,7 +14,7 @@ activities that I did in 2018 -- nothing personal. Pictures say a thousand words **Started Second Semester of College** -Classes: +Classes: - Mechanics of Programming - Statistics @@ -92,9 +95,9 @@ Classes: **Second Year of College** -First year on the Eboard of RITlug as Vice President. +First year on the Eboard of RITlug as Vice President. -Classes: +Classes: - Linear Algebra - Analysis Of Algorithms diff --git a/blogContent/posts/other/morality-of-self-driving-cars.md b/blogContent/posts/other/morality-of-self-driving-cars.md index bc527d9..5542eb6 100644 --- a/blogContent/posts/other/morality-of-self-driving-cars.md +++ b/blogContent/posts/other/morality-of-self-driving-cars.md @@ -1,84 +1,106 @@ -Although the movie *I Robot* has not aged well, it still brings up some interesting ethical questions -that we are still discussing concerning self driving cars. The protagonist Detective Spooner -has an almost unhealthy amount of distrust towards -robots. In the movie, a robot decided to save Spooner's life over a 12 year old girl in a car accident. -This ignites the famous ethical debate of the trolley problem, but, now with artificial intelligence. -The debate boils down to this: are machines capable of making moral decisions. The - surface level answer from the movie is presented as **no** when Spooner's presents car crash antidote. -This question parallels the discussion that we are currently having with self driving cars. -When a self driving car is presented with two options which result in the loss of life, -what should it choose? +Although the movie *I Robot* has not aged well, it still brings up +some interesting ethical questions that we are still discussing +concerning self driving cars. The protagonist Detective Spooner has +an almost unhealthy amount of distrust towards robots. In the movie, a +robot decided to save Spooner's life over a 12 year old girl in a car +accident. 
This ignites the famous ethical debate of the trolley +problem, but, now with artificial intelligence. The debate boils down +to this: are machines capable of making moral decisions. The surface +level answer from the movie is presented as **no** when Spooner's +presents car crash antidote. This question parallels the discussion +that we are currently having with self driving cars. When a self +driving car is presented with two options which result in the loss of +life, what should it choose? -When surveyed, most people say that they would prefer to have self driving cars take the utilitarian -approach towards the trolley problem. A utilitarian approach would try to minimize the - total amount of harm. MIT made a neat [website](http://moralmachine.mit.edu/) where it presents you with a -bunch of "trolley problems" where you have to decide who dies. At the end of the survey the -website presents you with a list of observed preferences you made when deciding who's life was more important to save. -The purpose of the trolley problem is merely to ponder what decision a self driving car -should make when **all** of its alternatives are depleted. +When surveyed, most people say that they would prefer to have self +driving cars take the utilitarian approach towards the trolley +problem. A utilitarian approach would try to minimize the total +amount of harm. MIT made a neat +[website](http://moralmachine.mit.edu/) where it presents you with a +bunch of "trolley problems" where you have to decide who dies. At the +end of the survey the website presents you with a list of observed +preferences you made when deciding who's life was more important to +save. The purpose of the trolley problem is merely to ponder what +decision a self driving car should make when **all** of its +alternatives are depleted. ![Moral Machine](media/selfDrivingCars/moralmachine3.png) -We still need to question whether -utilitarianism is the right moral engine for self driving cars. 
Would it be ethical -for a car to take into account -you age, race, gender, and social status when deciding if you get to live? -If self driving cars could access personal information such as criminal history or known friends, would it - be ethical to use that information? Would it be moral for -someone to make a car which favored the safety of the passengers of the car above -others? +We still need to question whether utilitarianism is the right moral +engine for self driving cars. Would it be ethical for a car to take +into account you age, race, gender, and social status when deciding +if you get to live? If self driving cars could access personal +information such as criminal history or known friends, would it be +ethical to use that information? Would it be moral for someone to make +a car which favored the safety of the passengers of the car above +others? ![Moral Machine](media/selfDrivingCars/moralMachine.png) -Even though most people want self driving cars to use utilitarianism, most people surveyed also responded -that they would not buy a car which did not have their safety as its top priority. -This brings up a serious social dilemma. If people want everyone else's cars to be utilitarians, -yet, have their own cars be greedy and favor their safety, we would see none of the utilitarian improvements. This -presented us with the tragedy of the commons problem since everyone would favor their own -safety and nobody would sacrifice their safety for the public good. This brings up yet another question: -would it be fair to ask someone to sacrifice their safety in this way? - -In most cases, when a tragedy of the commons situation is presented, government intervention is - the most piratical solution. It might be the best to have the government -mandate that all cars try to maximize the amount of life saved when a car is presented with the -trolley problem. 
Despite appearing to be a good solution, the flaw in this does not become apparent before you us -consequentialism to examine this problem. +Even though most people want self driving cars to use utilitarianism, +most people surveyed also responded that they would not buy a car +which did not have their safety as its top priority. This brings up a +serious social dilemma. If people want everyone else's cars to be +utilitarians, yet, have their own cars be greedy and favor their +safety, we would see none of the utilitarian improvements. This +presented us with the tragedy of the commons problem since everyone +would favor their own safety and nobody would sacrifice their safety +for the public good. This brings up yet another question: would it be +fair to ask someone to sacrifice their safety in this way? + +In most cases, when a tragedy of the commons situation is presented, +government intervention is the most piratical solution. It might be +the best to have the government mandate that all cars try to maximize +the amount of life saved when a car is presented with the trolley +problem. Despite appearing to be a good solution, the flaw in this +does not become apparent before you us consequentialism to examine +this problem. ![Moral Machine](media/selfDrivingCars/moralMachine6.png) -Self driving cars are expected to reduce car accidents by 90% by eliminating human error. If people -decide to not use self driving cars due to the utilitarian moral engine, we run the -risk of actually loosing more lives. Some people have actually argued that since -artificial intelligence is incapable of making moral decisions, they should take -no action at all when there is a situation which will always results in the loss of life. -In the frame of the trolley problem, -it is best for the artificial intelligence to not pull the lever. 
I will argue that -it is best for self driving cars to not make ethical -decisions because, it would result in the highest adoption rate of self driving cars. This would end up -saving the most lives in the long run. Plus, the likelihood that a car is actually presented with - a trolley problem is pretty slim. - -The discussion over the moral decisions a car has to make is almost fruitless. It turns out -that humans are not even good at making moral decisions in emergency situations. When we make rash decisions -influenced by anxiety, we are heavily influenced by prejudices and self motives. Despite our own shortcomings when it -comes to decision making, that does not mean that we can not do better with self driving cars. However, -we need to realize that it is the mass adoption of self driving cars which will save the most lives, not -the moral engine which we program the cars with. We can not let the moral engine of the self driving -cars get in the way of adoption. - -The conclusion that I made parallels Spooner's problem with robots in the movie *I Robot*. Spooner was so mad at the robots for -saving his own life rather than the girl's, he never realized that if it was not for the robots, neither of them would -have survived that car crash. Does that mean we can't do better than not pulling the lever? Well... not exactly. -Near the end of the movie a robot was presented with another trolley problem, but, this time he managed to -find a way which saved both parties. Without reading into this movie too deep, this illustrates how the early -adoption of artificial intelligence ended up saving tons of lives like Spooners. It is only when the technology fully develops -is when we can start to avoid the trolley problem completely. +Self driving cars are expected to reduce car accidents by 90% by +eliminating human error. If people decide to not use self driving cars +due to the utilitarian moral engine, we run the risk of actually +loosing more lives. 
Some people have actually argued that since +artificial intelligence is incapable of making moral decisions, they +should take no action at all when there is a situation which will +always results in the loss of life. In the frame of the trolley +problem, it is best for the artificial intelligence to not pull the +lever. I will argue that it is best for self driving cars to not make +ethical decisions because, it would result in the highest adoption +rate of self driving cars. This would end up saving the most lives in +the long run. Plus, the likelihood that a car is actually presented +with a trolley problem is pretty slim. + +The discussion over the moral decisions a car has to make is almost +fruitless. It turns out that humans are not even good at making moral +decisions in emergency situations. When we make rash decisions +influenced by anxiety, we are heavily influenced by prejudices and +self motives. Despite our own shortcomings when it comes to decision +making, that does not mean that we can not do better with self driving +cars. However, we need to realize that it is the mass adoption of self +driving cars which will save the most lives, not the moral engine +which we program the cars with. We can not let the moral engine of the +self driving cars get in the way of adoption. + +The conclusion that I made parallels Spooner's problem with robots in +the movie *I Robot*. Spooner was so mad at the robots for saving his +own life rather than the girl's, he never realized that if it was not +for the robots, neither of them would have survived that car crash. +Does that mean we can't do better than not pulling the lever? Well... +not exactly. Near the end of the movie a robot was presented with +another trolley problem, but, this time he managed to find a way which +saved both parties. Without reading into this movie too deep, this +illustrates how the early adoption of artificial intelligence ended up +saving tons of lives like Spooners. 
It is only when the technology +fully develops is when we can start to avoid the trolley problem +completely.