| @ -0,0 +1,45 @@ | |||
| In this blog post I want to examine how antivirus programs currently employ machine learning and then go into the potential pitfalls that ML brings. | |||
| # ML In the Antivirus Industry | |||
| Most current malware detection falls into two broad categories: static and dynamic analysis. | |||
| Static analysis looks at the program without actually running the code. | |||
| Static analysis typically looks at things like the file fingerprint, virus scanning, reverse engineering, memory artifacts, packer detection, and debugging. | |||
| Static analysis also encompasses looking up the file's hash in a database of known viruses. | |||
| However, it is super easy to fool signature-based malware detection using simple obfuscation methods, since changing even a single byte of a file produces a completely different hash. | |||
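To make the hash lookup and its weakness concrete, here is a minimal sketch in Python. The sample bytes and the `KNOWN_BAD_HASHES` set are made up for illustration; a real product would query a much larger signature database, and this is not how any particular vendor implements it.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 fingerprint of a file's contents as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical "malicious" sample; the bytes are placeholders, not real malware.
original_sample = b"MZ\x90\x00 pretend this is a malicious payload"

# Hypothetical signature database containing the fingerprint of that sample.
KNOWN_BAD_HASHES = {sha256_hex(original_sample)}


def is_known_malware(data: bytes) -> bool:
    """Signature check: flag the file only if its exact hash is in the database."""
    return sha256_hex(data) in KNOWN_BAD_HASHES


# Trivial "obfuscation": append a single padding byte to the same payload.
padded_sample = original_sample + b"\x00"

print(is_known_malware(original_sample))  # True
print(is_known_malware(padded_sample))    # False
```

The padded file behaves the same as the original in this toy setup, but its fingerprint no longer matches, so the lookup misses it.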
| Dynamic analysis is a technique where you run the program in a sandbox and monitor all the actions that the program takes. | |||
| If you notice that the program is acting suspiciously (e.g. changing the registry or making suspicious API calls), it is likely a virus. | |||
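As a rough sketch of that idea, the snippet below scores a list of events recorded by a sandbox against a hand-written rule set. The event names, weights, and threshold are all assumptions made up for this example; real sandboxes produce far richer logs.

```python
# Hypothetical weights for behaviours that a sandbox might record.
SUSPICIOUS_ACTIONS = {
    "registry_write": 3,
    "create_remote_thread": 5,
    "disable_security_service": 5,
    "network_connect": 1,
}

# Hypothetical cut-off: at or above this score we call the program suspicious.
ALERT_THRESHOLD = 6


def suspicion_score(events):
    """Sum the weights of every recorded event; unknown events count as 0."""
    return sum(SUSPICIOUS_ACTIONS.get(event, 0) for event in events)


def looks_malicious(events):
    """Flag the program if its total score reaches the alert threshold."""
    return suspicion_score(events) >= ALERT_THRESHOLD


# Two made-up sandbox logs.
text_editor_log = ["file_open", "file_write", "network_connect"]
dropper_log = ["file_open", "registry_write", "create_remote_thread", "network_connect"]

print(suspicion_score(text_editor_log), looks_malicious(text_editor_log))  # 1 False
print(suspicion_score(dropper_log), looks_malicious(dropper_log))          # 9 True
```

Writing the scoring function is simple. Choosing weights and a threshold that hold up across millions of programs is the hard part.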
| Malware detection is very difficult, but probably not for the reasons you think. | |||
| The issue isn't writing programs which can detect these static or dynamic properties of viruses; that is the easy part. | |||
| It is also relatively easy to determine a general rule set for what makes a virus a virus. | |||
| You can easily blacklist suspicious domains and determine that certain file fingerprints, hashes, and behaviours are virus-like. | |||
| The real problem is that there are hundreds of thousands of malware applications and more are created every day. | |||
| Not only are there tons of pesky malware applications, there is also an absurd number of normal programs which we don't want antivirus applications to block. | |||
| It is impossible for a small team of malware researchers to create a definitive set of heuristics which can correctly identify all malware programs. | |||
| This is where we turn to the field of Machine Learning. | |||
| Humans are bad with big data, but computers absolutely love big data. | |||
| Most antivirus companies use machine learning, and it has been a large success so far because it has allowed us to dramatically improve our ability to detect zero-day viruses. | |||
| ## Interesting Examples | |||
| ### Cylance | |||
| [Cylance](https://www.cylance.com) uses supervised learning and static analysis to classify files as malware. The product pulls a list of attributes from each file, which it can then compare against those of known viruses. | |||
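To give a feel for "pull attributes from the file, then compare against known samples", here is a toy supervised classifier. It is not Cylance's actual method, which is not public in this detail. The two features (byte entropy and the fraction of printable bytes), the nearest-centroid rule, and the training samples are all assumptions chosen to keep the sketch short.

```python
import math
from collections import Counter


def extract_features(data: bytes):
    """Return (byte entropy scaled to 0..1, fraction of printable ASCII bytes)."""
    if not data:
        return (0.0, 0.0)
    total = len(data)
    counts = Counter(data)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    printable = sum(1 for b in data if 32 <= b < 127) / total
    return (entropy / 8.0, printable)


def train_centroids(labeled_samples):
    """Supervised step: average the feature vectors of each label."""
    grouped = {}
    for data, label in labeled_samples:
        grouped.setdefault(label, []).append(extract_features(data))
    return {
        label: tuple(sum(column) / len(column) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def classify(centroids, data: bytes) -> str:
    """Assign the label whose centroid is closest to the file's features."""
    features = extract_features(data)
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


# Made-up training set: text-like files as "benign", high-entropy blobs
# (standing in for packed or encrypted payloads) as "malware".
training_set = [
    (b"The quick brown fox jumps over the lazy dog. " * 20, "benign"),
    (b"int main(void) { return 0; }\n" * 30, "benign"),
    (bytes(range(256)) * 4, "malware"),
    (bytes((i * 7 + 3) % 256 for i in range(1024)), "malware"),
]
centroids = train_centroids(training_set)

script_file = b"print('hello')\n" * 50
packed_blob = bytes((i * 13 + 5) % 256 for i in range(2048))

print(classify(centroids, script_file))  # benign
print(classify(centroids, packed_blob))  # malware
```

A real classifier would use many more attributes and a far larger labeled corpus, but the shape is the same: features in, label out, learned from known examples.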
| ### Malwarebytes Anomalous | |||
| [Anomalous](https://blog.malwarebytes.com/detections/machinelearning-anomalous-100/) is a machine learning application which simply flags files which appear different from its training set of known normal files. | |||
| It does not attempt to learn what makes a virus a virus, but rather what makes a normal program a normal program. | |||
| It alerts you about anything which does not look like a normal program, since it is probably a virus. | |||
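The "learn only what normal looks like" approach can be sketched with a simple z-score check. This is an illustration of anomaly detection in general, not the Malwarebytes implementation. The feature choice (entropy and import count), the training values, and the threshold of 3 standard deviations are all hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical features of known-normal files: (byte entropy, imported functions).
NORMAL_TRAINING_SET = [
    (4.8, 120),
    (5.1, 95),
    (5.4, 140),
    (4.9, 110),
    (5.2, 130),
]


def fit_normal_profile(samples):
    """Learn the mean and standard deviation of each feature from normal files only."""
    return [(mean(column), pstdev(column)) for column in zip(*samples)]


def anomaly_score(profile, sample):
    """Largest absolute z-score across features: how far from 'normal' the file is."""
    return max(
        abs(value - mu) / max(sigma, 1e-9)  # guard against zero variance
        for value, (mu, sigma) in zip(sample, profile)
    )


def is_anomalous(profile, sample, threshold=3.0):
    """Flag anything more than `threshold` standard deviations from normal."""
    return anomaly_score(profile, sample) > threshold


profile = fit_normal_profile(NORMAL_TRAINING_SET)

ordinary_file = (5.0, 115)  # resembles the training set
odd_file = (7.9, 3)         # very high entropy, almost no imports

print(is_anomalous(profile, ordinary_file))  # False
print(is_anomalous(profile, odd_file))       # True
```

Note that no malware samples appear anywhere in the training data. The model never learns what a virus looks like, only what does not look normal.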
| ### Kaspersky | |||
| Kaspersky appears to have done a ton of research into using machine learning for malware detection. | |||
| I would highly recommend that you read their [white paper](https://media.kaspersky.com/en/enterprise-security/Kaspersky-Lab-Whitepaper-Machine-Learning.pdf) on this subject. | |||
| # Why is this a problem? | |||
| It turns out that machine learning systems can be easily fooled by using [Generative Adversarial Networks](https://en.wikipedia.org/wiki/Generative_adversarial_network). Essentially what this boils down to is that you have two | |||