
Re-formatted some blog posts using a custom paragraph formatter which limits columns to 70 characters.

pull/77/head
jrtechs 5 years ago
commit 60356eb059
6 changed files with 406 additions and 302 deletions
  1. blogContent/posts/data-science/is-using-ml-for-antivirus-safe.md (+82, -44)
  2. blogContent/posts/data-science/lets-build-a-genetic-algorithm.md (+128, -89)
  3. blogContent/posts/data-science/r-programming-language.md (+35, -22)
  4. blogContent/posts/open-source/the-essential-vim-configuration.md (+64, -75)
  5. blogContent/posts/other/2018-in-review.md (+10, -7)
  6. blogContent/posts/other/morality-of-self-driving-cars.md (+87, -65)

blogContent/posts/data-science/is-using-ml-for-antivirus-safe.md (+82, -44)

@ -1,28 +1,41 @@
In this blog post I examine the ways in which antivirus programs currently employ machine learning and then go into the security vulnerabilities that it brings.
# ML in the Antivirus Industry
Malware detection falls into two broad categories: static and dynamic analysis. Static analysis examines the program without actually running the code. It looks at things like file fingerprints, hashes, reverse engineering, memory artifacts, packer detection, and debugging. Static analysis is largely known for looking up the hash of a file against a known database of viruses. It is super easy to fool signature-based malware detection using simple obfuscation methods. Dynamic analysis is a technique where you run the program in a sandbox and monitor all the actions that it takes. If the program acts suspiciously, it is likely a virus. Suspicious behavior typically includes things like registry edits and API calls to bad host names.

Antivirus detection is very difficult, but probably not for the reasons you think. The issue isn't writing programs which can detect these static or dynamic properties of viruses -- that is the easy part. It is also relatively easy to determine a general rule set for what makes a program dangerous. You can also easily blacklist suspicious domains, block malicious activity, and implement a signature-based malware detection program.

The real problem is that there are hundreds of thousands of malware applications and more are created every day. Not only are there tons of pesky malware applications, there is an absurd number of normal programs which we don't want the antivirus to block. It is impossible for a small team of malware researchers to create a definitive set of heuristics which can correctly identify all malware programs. This is where we turn to the field of Machine Learning. Humans are very bad with big data, but computers love big data. Most antivirus companies use machine learning, and it has been a large success so far because it has dramatically improved our ability to detect zero-day viruses.
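As a concrete illustration (a minimal sketch, not taken from any real antivirus product), signature matching boils down to hashing a file and checking the digest against a list of known-bad hashes; the file name and hash value below are placeholders. Changing a single byte of the malware changes the hash, which is exactly why simple obfuscation defeats this check.

```javascript
// Hypothetical signature check: SHA-256 the file and compare the digest
// against a set of known malware hashes. The hash below is a placeholder.
const crypto = require("crypto");
const fs = require("fs");

const knownMalwareHashes = new Set([
    "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"
]);

const isKnownMalware = function(filePath)
{
    const bytes = fs.readFileSync(filePath);
    const digest = crypto.createHash("sha256").update(bytes).digest("hex");
    return knownMalwareHashes.has(digest);
};

console.log(isKnownMalware("./suspicious-download.bin"));
```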
## Interesting Examples
@ -39,50 +52,75 @@ Anything which is not a normal program, it alerts you about since it can be a vi
### Kaspersky
Kaspersky appears to have done a ton of research into using machine learning for malware detection. I would highly recommend that you read their [white paper](https://media.kaspersky.com/en/enterprise-security/Kaspersky-Lab-Whitepaper-Machine-Learning.pdf) on this subject.
# Why is this a problem?
It turns out that machine learning systems can be easily fooled by using other machine learning algorithms. A classic example of this is image classification. It is easy to use neural networks or genetic algorithms to generate examples which fool a machine learning application: you learn the weights of the model and then make slight tweaks to your input to produce a false classification.
![](media/AISaftey/AdversarialExample.png)
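To make the "slight tweaks" idea concrete, here is a toy sketch (my own illustration, not code from this post) that treats the classifier as a black box and keeps any random nudge that lowers its confidence. The scoring function is a stand-in, not a real model, and a gradient-based attack on a real network would be far more efficient.

```javascript
// Toy black-box evasion: randomly perturb the input and keep any change
// that lowers the (placeholder) model's confidence in the true class.
const classifyConfidence = function(input)
{
    // stand-in for a real model: confidence peaks at the point [5, 5]
    return 1 / (1 + Math.abs(input[0] - 5) + Math.abs(input[1] - 5));
};

const findAdversarialExample = function(input, steps)
{
    let best = input.slice();
    for (let i = 0; i < steps; i++)
    {
        const candidate = best.slice();
        const dim = Math.floor(Math.random() * candidate.length);
        candidate[dim] += (Math.random() - 0.5) * 0.2; // small tweak
        if (classifyConfidence(candidate) < classifyConfidence(best))
        {
            best = candidate;
        }
    }
    return best;
};

console.log(findAdversarialExample([5, 5], 1000));
```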
Since virus generation is a non-differentiable problem, people often use genetic algorithms for the adversarial network to fool the antivirus. In other words, you don't want to attempt to calculate the derivative between two versions of a virus for gradient descent. Since viruses are high-dimensional problems, it turns out that most calculus-based implementations would actually be inefficient at traversing the search space to find the global minimum. If you want to learn more about genetic algorithms, check out my [recent blog post](https://jrtechs.net/data-science/lets-build-a-genetic-algorithm) on it.
# Fooling Antivirus Software
## Genetic Algorithms
There are two major approaches which people have used to generate antivirus resistant malware with genetic algorithms. The first approach is to slowly make polymorphic changes to the virus in order to fool the malware detection. One of the interesting things about this approach is that you have to have some way of verifying that the polymorphic behaviors that you apply to the virus don't break its "virus capabilities".
Another approach is to represent a virus as a set of properties. These properties are everything from the port of attack, the payloads, obfuscation parameters, etc. The genetic algorithm would simply tweak the properties of the virus until it found a configuration which evaded the antivirus program.
## Reinforcement Learning
A research group at [Endgame](https://www.endgame.com/) recently gave a [Def Con](https://www.defcon.org/) talk where they presented a framework which uses reinforcement learning to evade static virus detection.
![Reinforcement Learning Diagram](media/AISaftey/Reinforcement_learning_diagram.png)
At a high level, the AI plays a "game" against the antivirus where the agent can make functionality-preserving mutations to the virus. The reward for the agent is its ability to avoid detection by the antivirus. Over time the AI learns which types of actions result in getting detected by the antivirus. This framework can be found on [GitHub](https://github.com/endgameinc/gym-malware).
# Takeaways
Machine learning is great, but it needs to be properly defended. As we start to use machine learning more and more, a large portion of the cyber security field may shift its focus away from securing systems to securing big data applications.
# Resources

blogContent/posts/data-science/lets-build-a-genetic-algorithm.md (+128, -89)

@ -4,36 +4,49 @@
# Background and Theory
Since you stumbled upon this article, you might be wondering what the heck genetic algorithms are. To put it simply: genetic algorithms employ the same tactics used in natural selection to find an optimal solution to an optimization problem. Genetic algorithms are often used in high-dimensional problems where the optimal solutions are not apparent. Genetic algorithms are commonly used to tune the [hyper-parameters](https://en.wikipedia.org/wiki/Hyperparameter) of a program. However, this algorithm can be used in any scenario where you have a function which defines how good a solution is. Many people have used genetic algorithms in video games to automatically learn the weaknesses of players.

The beautiful part about genetic algorithms is their simplicity; you need absolutely no knowledge of linear algebra or calculus. To implement a genetic algorithm from scratch you only need **very basic** algebra and a general grasp of evolution.
# Genetic Algorithm
All genetic algorithms typically have a single cycle where you continuously mutate, breed, and select the most optimal solutions. I will dive into each section of this algorithm using simple JavaScript code snippets. The algorithm which I present is very generic and modular so it should be easy to port into other programming languages and applications.
![Genetic Algorithms Flow Chart](media/GA/GAFlowChart.svg)
## Population Creation
The very first thing we need to do is specify a data structure for storing our genetic information. In biology, chromosomes are composed of sequences of genes. Many people run genetic algorithms on binary arrays since they more closely represent DNA. However, as computer scientists, it is often easier to model problems using continuous numbers. In this approach, every gene will be a single floating point number ranging between zero and one. Every type of gene will have a max and min value which represent the absolute extremes of that gene. This works well for optimization because it allows us to easily limit our search space. For example, we can specify that the "height" gene can only vary between 0 and 90. To get the actual value of the gene from its \[0-1] value we simply de-normalize it.
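For example, a normalized value of 0.5 on a height gene bounded by 0 and 90 de-normalizes to (90 - 0) * 0.5 + 0 = 45.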
$$
g_{\text{real value}} = (g_{high} - g_{low})\, g_{norm} + g_{low}
@ -87,17 +100,22 @@ class Gene
```
Now that we have genes, we can create chromosomes. Chromosomes are simply collections of genes. Whatever language you write this in, make sure that when you create a new chromosome it has a [deep copy](https://en.wikipedia.org/wiki/Object_copying) of the original genetic information rather than a shallow copy. A shallow copy is when you simply copy the object pointer, whereas a deep copy actually creates a new object. If you fail to do a deep copy, you will have weird issues where multiple chromosomes share the same DNA.
In this class I added helper functions to clone the chromosome as a random copy. You can only create a new chromosome by cloning because I wanted to keep the program generic and make no assumptions about the domain. Since you only provide the min/max information for the genes once, cloning an existing chromosome is the easiest way of ensuring that all corresponding chromosomes contain genes with identical extrema.
```javascript
@ -148,7 +166,8 @@ class Chromosome
}
```
Creating a random population is pretty straightforward if you have implemented a method to create a random clone of a chromosome.
```javascript
/**
@ -170,16 +189,17 @@ const createRandomPopulation = function(geneticChromosome, populationSize)
};
```
This is where nearly all the domain information is introduced. After you define what types of genes are found on each chromosome, you can create an entire population. In this example all genes contain values ranging between one and ten.
```javascript
let gene1 = new Gene(1,10,10);
let gene2 = new Gene(1,10,0.4);
let geneList = [gene1, gene2];
let exampleOrganism = new Chromosome(geneList);
let population = createRandomPopulation(exampleOrganism, 100);
```
@ -187,10 +207,14 @@ let population = createRandomPopulation(genericChromosome, 100);
## Evaluate Fitness
Like all optimization problems, you need a way to evaluate the performance of a particular solution. The cost function takes in a chromosome and evaluates how close it got to the ideal solution. This particular example just computes the [Manhattan Distance](https://en.wiktionary.org/wiki/Manhattan_distance) to a random 2D point. I chose two dimensions because it is easy to graph; however, real applications may have dozens of genes on each chromosome.
```javascript
let costx = Math.random() * 10;
@ -209,9 +233,11 @@ const basicCostFunction = function(chromosome)
## Selection
Selecting the best performing chromosomes is straightforward after you have a function for evaluating the performance. This code snippet also computes the average and best chromosome of the population to make it easier to graph and define the stopping point for the algorithm's main loop.
```javascript
/**
@ -249,10 +275,12 @@ const naturalSelection = function(population, keepNumber, fitnessFunction)
};
```
You might be wondering how I sorted the list of JSON objects -- not a numerical array. I used the following function as a comparator for JavaScript's built in sort function. This comparator will compare objects based on a specific attribute that you give it. This is a very handy function to include in all of your JavaScript projects for easy sorting.
```javascript
/**
@ -281,15 +309,16 @@ function predicateBy(prop)
## Reproduction
The process of reproduction can be broken down into Pairing and Mating.
### Pairing
Pairing is the process of selecting mates to produce offspring. A typical approach will separate the population into two segments of mothers and fathers. You then randomly pick pairs of mothers and fathers to produce offspring. It is ok if one chromosome mates more than once. It is just important that you keep this process random.
```javascript
/**
@ -317,22 +346,26 @@ const matePopulation = function(population, desiredPopulationSize)
### Mating
Mating is the actual act of forming new chromosomes/organisms based on your previously selected pairs. From my research, there are two major forms of mating: blending and crossover.
Blending is typically the most preferred approach to mating when dealing with continuous variables. In this approach you combine the genes of both parents based on a random factor.
$$
c_{new} = r * c_{mother} + (1-r) * c_{father}
$$
The second offspring simply uses (1-r) for its random factor to adjust the chromosomes.
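For example, with r = 0.25, a mother gene of 8 and a father gene of 4 blend into 0.25 * 8 + 0.75 * 4 = 5 for the first offspring and 0.75 * 8 + 0.25 * 4 = 7 for the second.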
Crossover is the simplest approach to mating. In this process you clone the parents and then randomly swap *n* of their genes. This works fine in some scenarios; however, it severely limits genetic diversity because you now have to rely solely on mutations for changes.
```javascript
/**
@ -373,14 +406,16 @@ const blendGene = function(gene1, gene2, blendCoef)
## Mutation
Mutations are random changes to an organism's DNA. In the scope of genetic algorithms, they help our population converge on the correct solution.
You can either adjust genes by a factor, resulting in a smaller change, or you can change the value of the gene to be something completely random. Since we are using the blending technique for reproduction, we already have small incremental changes. I prefer to use mutations to randomly change the entire gene since it helps prevent the algorithm from settling on a local minimum rather than the global minimum.
```javascript
@ -408,11 +443,13 @@ const mutatePopulation = function(population, mutatePercentage)
## Immigration
Immigration or "new blood" is the process of dumping random organisms into your population at each generation.
This prevents us from getting stuck in a local minimum rather than the global minimum.
There are more advanced techniques to accomplish this same concept.
My favorite approach (not implemented here) is raising **x** populations simultaneously and every **y** generations
you take **z** organisms from each population and move them to another population.
Immigration or "new blood" is the process of dumping random organisms
into your population at each generation. This prevents us from getting
stuck in a local minimum rather than the global minimum. There are
more advanced techniques to accomplish this same concept. My favorite
approach (not implemented here) is raising **x** populations
simultaneously and every **y** generations you take **z** organisms
from each population and move them to another population.
```javascript
/**
@ -432,7 +469,8 @@ const newBlood = function(population, immigrationSize)
## Putting It All Together
Now that we have all the ingredients for a genetic algorithm we can piece it together in a simple loop.
```javascript
/**
@ -487,11 +525,14 @@ const runGeneticOptimization = function(geneticChromosome, costFunction,
## Running
Running the program is pretty straightforward after you have your genes and cost function defined. You might be wondering if there is an optimal configuration of parameters to use with this algorithm. The answer is that it varies based on the particular problem. Problems like the one graphed by this website perform very well with a low mutation rate and a high population. However, some higher dimensional problems won't even converge on a local answer if you set your mutation rate too low.
```javascript
let gene1 = new Gene(1,10,10);
@ -499,17 +540,15 @@ let gene1 = new Gene(1,10,10);
let geneN = new Gene(1,10,0.4);
let geneList = [gene1,..., geneN];
let exampleOrganism = new Chromosome(geneList);

costFunction = function(chromosome)
{
    var d =...;
    //compute cost
    return d;
}
runGeneticOptimization(exampleOrganism, costFunction, 100, 50, 0.01, 0.3, 20, 10);
```
The complete code for the genetic algorithm and the fancy JavaScript graphs can be found in my [Random Scripts GitHub Repository](https://github.com/jrtechs/RandomScripts). In the future I may package this into an [npm](https://www.npmjs.com/) package.

blogContent/posts/data-science/r-programming-language.md (+35, -22)

@ -1,39 +1,52 @@
R is a programming language designed for statistical analysis and graphics. Since R has been around since 1992, it has developed a large community and has over [13 thousand packages](https://cran.r-project.org/web/packages/) publicly available. What is really cool about R is that it is an open source [GNU](http://www.gnu.org/) project.
# R Syntax and Paradigms
The syntax of R is C-esque with its use of curly braces. The type system of R is similar to Python's in that it can infer what type you are using. This "lazy" type system allows for "faster" development since you don't have to worry about declaring types -- but this laziness makes it harder to debug and read your code. The type system of R is rather strange and distinctly different from most other languages. For starters, integers are represented as vectors of length 1. These things may feel weird at first, but R's type system is one of the things that make it a great tool for manipulating data.
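For example, `length(5)` evaluates to 1 because the scalar 5 is really a numeric vector of length one.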
![R Arrays Start at 1](media/r/arrays.jpg)
Did I mention that arrays start at 1? Technically, the thing which we refer to as an array in Java is really a vector in R. Arrays in R are data objects which can store data in more than two dimensions. Since R tries to follow mathematical notation, indexing starts at 1 -- just like in linear algebra. Using zero-based indexing makes sense for languages like C because the index is used to get at a particular memory location from a pointer.
<youtube src="s3FozVfd7q4" />
I don't have the time to go over the basic syntax of R in a single blog post; however, I feel that this YouTube video does a pretty good job.
# R Markdown
One of my favorite aspects of R is its markdown language called Rmd.
Rmd is essentially markdown which can have embedded R scripts run in it. The Rmd file is compiled down to a markdown file which is converted to either a PDF, an HTML file, or a slide show using pandoc. You can provide options for the pandoc render using a YAML header in the Rmd file. This is an amazing tool for creating reports and writing research papers. The documents which you create are reproducible since you can share the source code for them. If the data which you are using changes, you simply have to recompile the document to get an updated view. You no longer have to re-generate a dozen graphs and update figures and statistics across your document.
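For example, a header along the lines of `title: "My Report"` followed by `output: pdf_document` is enough to have the document rendered as a PDF.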
# Resources

blogContent/posts/open-source/the-essential-vim-configuration.md (+64, -75)

@ -1,31 +1,27 @@
# Vim Configuration
Stock Vim is pretty boring. The good news is that Vim has a very comprehensive configuration file which allows you to tweak it to your heart's content. To make changes to Vim you simply modify the ~/.vimrc file in your home directory. By adding simple commands to this file you can easily change the way your text editor looks. Neat.
I attempted to create the smallest Vim configuration file which makes Vim usable enough for me to use as my daily text editor. I believe that it is important for everyone to know what their Vim configuration does. This knowledge will help ensure that you are only adding the things you want and that you can later customize it for your workflow.
Although it may be tempting to download somebody else's massive Vim configuration, I argue that this can lead to problems down the road.
I want to mention that I don't use Vim as my primary IDE; I only use Vim as a text editor. I tend to use JetBrains tools on larger projects since they have amazing auto complete functionality, build tools, and comprehensive error detection. There are great Vim configurations out there on the internet; however, most tend to be a bit overkill for what most people want to do.
Alright, let's dive into my Vim configuration!
# Spell Check
@ -35,34 +31,32 @@ autocmd BufRead,BufNewFile *.md setlocal spell spelllang=en_us
autocmd BufRead,BufNewFile *.txt setlocal spell spelllang=en_us
```
Since I am often an atrocious speller, having basic spell check abilities in Vim is a lifesaver. It does not make sense to have spell check enabled for most files since it would light up most programming files like a Christmas tree. I have my Vim configuration set to automatically enable spell check for markdown files and basic text files. If you need spell check in other files, you can enter the command ":set spell" to enable spell check for that file. To see the spelling recommendations, type "z=" when you are over a highlighted word.
# Appearance
Adding colors to Vim is fun. The "syntax enable" command tells Vim to highlight keywords in programming files and other structured files.
```vim
syntax enable
```
I would encourage everyone to look at the different color schemes available for Vim. I threw the color scheme command in a try-catch block to ensure that it does not crash Vim if you don't have the color scheme installed. By default the desert color scheme is installed; however, that is not always the case for [community created](http://vimcolors.com/) Vim color schemes.
```vim
try
@ -70,13 +64,12 @@ try
catch
endtry
set background=dark
```
# Indentation and Tabs
Having your indentation settings squared away will save you a ton of time if you are doing any programming in Vim.
```vim
"copy indentation from current line when making a new line
@ -84,28 +77,25 @@ set autoindent
" Smart indentation when programming: indent after {
set smartindent
set tabstop=4    " number of spaces per tab
set expandtab    " convert tabs to spaces
set shiftwidth=4 " set a tab press equal to 4 spaces
```
# Useful UI Tweaks
These are three UI tweaks that I find really useful to have; some people may have different opinions on these. Seeing line numbers is useful since programming errors typically just tell you what line your program went up in flames on. The cursor line is useful since it allows you to easily find your place in the file -- this may be a bit too much for some people.

I like to keep every line under 80 characters long for technical files, and having a visual cue for this is helpful. Some people prefer to just use the auto word wrap and keep their lines as long as they like. I like to keep to the 80 character limit and explicitly choose where I cut each line. Some of my university classes mandate the 80 character limit and take points off if you don't follow it.
```vim
" Set Line Numbers to show "
@ -121,7 +111,7 @@ set colorcolumn=80
# Searching and Auto Complete
These configurations make searching in Vim less painful.
```vim
" search as characters are entered "
@ -133,8 +123,8 @@ set hlsearch
set ignorecase
```
These configurations will make command completion easier by showing an auto-complete menu when you press tab.
```vim
" Shows a auto complete menu when you are typing a command "
@ -147,11 +137,10 @@ set wildignorecase " ignore case for auto complete
# Useful Things to Have
There is nothing too earth shattering in this section, just things that might save you some time. Enabling mouse support is a really interesting configuration. When enabled, this allows you to select text and jump between different locations with your mouse.
```vim
" Enables mouse support "
@ -170,7 +159,7 @@ set autoread
set lazyredraw
```
Setting your file format is always a good idea for compatibility.
```vim
" Set utf8 as standard encoding and en_US as the standard language "
@ -183,6 +172,6 @@ set ffs=unix,dos,mac
# Wrapping it up
I hope that this quick blog post inspired you to maintain your own Vim configuration file. You can find my current configuration files in my [random scripts repository](https://github.com/jrtechs/RandomScripts/tree/master/config).

blogContent/posts/other/2018-in-review.md (+10, -7)

@ -1,7 +1,10 @@
Inspired by [Justin Flory](https://justinwflory.com/) and [Dan Schneiderman](http://www.schneidy.com), I decided to make a 2018 review post. I believe that it is a good way to reflect upon what I did in 2018 and make plans for 2019. This post will be a very high level overview of the projects and activities that I did in 2018 -- nothing personal. Pictures say a thousand words, so I will include a lot.
# January:
@ -11,7 +14,7 @@ activities that I did in 2018 -- nothing personal. Pictures say a thousand words
**Started Second Semester of College**
Classes:
- Mechanics of Programming
- Statistics
@ -92,9 +95,9 @@ Classes:
**Second Year of College**
First year on the Eboard of RITlug as Vice President.
Classes:
- Linear Algebra
- Analysis Of Algorithms

blogContent/posts/other/morality-of-self-driving-cars.md (+87, -65)

@ -1,84 +1,106 @@
<youtube src="_MFGx8d1zl0" />
Although the movie *I Robot* has not aged well, it still brings up some interesting ethical questions that we are still discussing concerning self driving cars. The protagonist, Detective Spooner, has an almost unhealthy amount of distrust towards robots. In the movie, a robot decided to save Spooner's life over a 12 year old girl's in a car accident. This ignites the famous ethical debate of the trolley problem, but now with artificial intelligence. The debate boils down to this: are machines capable of making moral decisions? The surface level answer from the movie is presented as **no** when Spooner presents his car crash anecdote. This question parallels the discussion that we are currently having with self driving cars. When a self driving car is presented with two options which both result in the loss of life, what should it choose?
<iframe width="100%" height="315" src="https://www.youtube.com/embed/ixIoDYVfKA0" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>
When surveyed, most people say that they would prefer to have self driving cars take the utilitarian approach towards the trolley problem. A utilitarian approach would try to minimize the total amount of harm. MIT made a neat [website](http://moralmachine.mit.edu/) where it presents you with a bunch of "trolley problems" where you have to decide who dies. At the end of the survey the website presents you with a list of observed preferences you made when deciding whose life was more important to save. The purpose of the trolley problem is merely to ponder what decision a self driving car should make when **all** of its alternatives are depleted.
![Moral Machine](media/selfDrivingCars/moralmachine3.png)
We still need to question whether utilitarianism is the right moral engine for self driving cars. Would it be ethical for a car to take into account your age, race, gender, and social status when deciding if you get to live? If self driving cars could access personal information such as criminal history or known friends, would it be ethical to use that information? Would it be moral for someone to make a car which favored the safety of the passengers of the car above others?
![Moral Machine](media/selfDrivingCars/moralMachine.png)
Even though most people want self driving cars to use utilitarianism, most people surveyed also responded that they would not buy a car which did not have their safety as its top priority. This brings up a serious social dilemma. If people want everyone else's cars to be utilitarian, yet have their own cars be greedy and favor their safety, we would see none of the utilitarian improvements. This presents us with a tragedy of the commons problem, since everyone would favor their own safety and nobody would sacrifice their safety for the public good. This brings up yet another question: would it be fair to ask someone to sacrifice their safety in this way?

In most cases, when a tragedy of the commons situation is presented, government intervention is the most practical solution. It might be best to have the government mandate that all cars try to maximize the amount of life saved when a car is presented with the trolley problem. Despite appearing to be a good solution, the flaw in this does not become apparent until you use consequentialism to examine the problem.
![Moral Machine](media/selfDrivingCars/moralMachine6.png)
Self driving cars are expected to reduce car accidents by 90% by eliminating human error. If people decide not to use self driving cars due to the utilitarian moral engine, we run the risk of actually losing more lives. Some people have actually argued that since artificial intelligence is incapable of making moral decisions, it should take no action at all in a situation which always results in the loss of life. In the frame of the trolley problem, it is best for the artificial intelligence to not pull the lever. I will argue that it is best for self driving cars to not make ethical decisions because it would result in the highest adoption rate of self driving cars. This would end up saving the most lives in the long run. Plus, the likelihood that a car is actually presented with a trolley problem is pretty slim.

The discussion over the moral decisions a car has to make is almost fruitless. It turns out that humans are not even good at making moral decisions in emergency situations. When we make rash decisions influenced by anxiety, we are heavily influenced by prejudices and self motives. Despite our own shortcomings when it comes to decision making, that does not mean that we cannot do better with self driving cars. However, we need to realize that it is the mass adoption of self driving cars which will save the most lives, not the moral engine which we program the cars with. We cannot let the moral engine of the self driving cars get in the way of adoption.

The conclusion that I made parallels Spooner's problem with robots in the movie *I Robot*. Spooner was so mad at the robots for saving his own life rather than the girl's that he never realized that if it were not for the robots, neither of them would have survived that car crash. Does that mean we can't do better than not pulling the lever? Well... not exactly. Near the end of the movie a robot was presented with another trolley problem, but this time it managed to find a way which saved both parties. Without reading into this movie too deeply, this illustrates how the early adoption of artificial intelligence ended up saving tons of lives like Spooner's. It is only when the technology fully develops that we can start to avoid the trolley problem completely.
