The complexities of machine learning in data science

Machine learning served as an API

Machine learning is no longer just for geeks. Today, any programmer can call a few APIs and make them part of their work. Amazon's cloud, Google Cloud Platform (GCP), and many similar platforms already offer machine learning models as APIs, and we can expect more of this in the coming years. All you have to do is work on your data: clean it and convert it into a format that can be fed to a machine learning algorithm, which to you is nothing more than an API. It becomes plug and play. You pass the data in an API call, the API hands it to the provider's computing machines, the call returns with the predicted results, and you take an action based on them.
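The clean, convert, call, and act steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the endpoint URL, the `instances` payload field, and the 0.5 threshold are made up here, and every real cloud service defines its own URL, authentication, and schema.

```python
import json
from urllib import request

# Hypothetical endpoint: a real service publishes its own URL and schema.
PREDICT_URL = "https://ml.example.com/v1/predict"

def clean(rows):
    """Drop rows with missing values and coerce every field to float."""
    cleaned = []
    for row in rows:
        if any(value is None or value == "" for value in row):
            continue
        cleaned.append([float(value) for value in row])
    return cleaned

def build_request(rows):
    """Package cleaned rows as a JSON POST request (payload shape is assumed)."""
    body = json.dumps({"instances": rows}).encode("utf-8")
    return request.Request(
        PREDICT_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def act_on(scores, threshold=0.5):
    """Turn the scores returned by the API into one action per row."""
    return ["flag" if score >= threshold else "ignore" for score in scores]

raw = [["1.5", "2"], ["", "3"], [4, None], ["0.1", "7"]]
rows = clean(raw)
req = build_request(rows)
# Sending would be request.urlopen(req); it is left out because the URL is fictional.
```

The request is built but never sent, so the sketch runs offline. With a real service, the JSON response would be parsed and passed to something like `act_on`.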

Machine learning: some use cases

Facial recognition, voice recognition, identifying whether a file is a virus, or predicting today's and tomorrow's weather: all of these uses are possible through this mechanism. But obviously someone has put in a lot of work to make these APIs available. Take facial recognition, for example. A lot of work has gone into image processing, where you take images, train your model on them, and finally come out with a generalized model that works on new data that arrives in the future and that was not used to train the model. This is typically how machine learning models are built.
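The idea of training on one set of data and then checking the model on data it has never seen can be shown with a toy example. The sketch below swaps the image model for a much simpler nearest-centroid classifier on synthetic 2-D points; the data generator, the function names, and the cluster positions are all assumptions made for illustration.

```python
import random

def make_data(n, seed):
    """Generate toy 2-D points: class 0 near (0, 0), class 1 near (3, 3)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 3.0 * label
        data.append(((rng.gauss(center, 0.5), rng.gauss(center, 0.5)), label))
    return data

def train(samples):
    """Learn one centroid (mean point) per class from labelled samples."""
    sums = {}
    for (x, y), label in samples:
        sx, sy, count = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, count + 1)
    return {label: (sx / c, sy / c) for label, (sx, sy, c) in sums.items()}

def predict(model, point):
    """Assign the class whose centroid is closest to the point."""
    x, y = point
    return min(
        model,
        key=lambda label: (x - model[label][0]) ** 2 + (y - model[label][1]) ** 2,
    )

def accuracy(model, samples):
    """Fraction of samples whose predicted class matches the true label."""
    hits = sum(predict(model, point) == label for point, label in samples)
    return hits / len(samples)

train_set = make_data(200, seed=1)
test_set = make_data(50, seed=2)  # generated separately, never used in training
model = train(train_set)
held_out_accuracy = accuracy(model, test_set)
```

The number that matters is `held_out_accuracy`, because it is measured on points the model did not see while training. That is what "generalized" means in practice.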

The case of antivirus software

Antivirus software has to classify each file as malicious or benign, and most antivirus products have now moved from static, signature-based identification to dynamic, machine learning-based detection. Antivirus software regularly delivers updates. In the early days these updates contained virus signatures, but today they increasingly contain machine learning models. When a new virus appears, the existing model has to be retrained so that it recognizes the new virus on the market and on your machine. Machine learning can do this because every malware or virus file has certain characteristics associated with it. For example, when a Trojan gets onto your machine, the first thing it does is create a hidden folder. The second thing it does is copy some DLLs. The moment a malicious program starts to perform any action on your machine, it leaves traces, and these traces help to detect it.
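To make the idea of learning from behavioral traces concrete, here is a minimal sketch that encodes observed actions as features and trains a simple perceptron on them. It is not how any particular antivirus product works: the event names, the five toy traces, and the choice of a perceptron are all assumptions made for illustration.

```python
# Hypothetical behaviour events; real products monitor many more signals.
FEATURES = ["creates_hidden_folder", "copies_dll", "opens_document"]

def to_vector(trace):
    """Encode a list of observed events as a 0/1 feature vector."""
    return [1 if name in trace else 0 for name in FEATURES]

def score(weights, bias, trace):
    """Weighted sum of the features present in the trace, plus the bias."""
    return sum(w * xi for w, xi in zip(weights, to_vector(trace))) + bias

def train_perceptron(examples, epochs=20):
    """Learn weights with the classic perceptron update rule."""
    weights = [0.0] * len(FEATURES)
    bias = 0.0
    for _ in range(epochs):
        for trace, label in examples:
            predicted = 1 if score(weights, bias, trace) > 0 else 0
            error = label - predicted  # 0 when the prediction is already right
            weights = [w + error * xi for w, xi in zip(weights, to_vector(trace))]
            bias += error
    return weights, bias

def is_malicious(model, trace):
    """Classify a trace as malicious when its score is positive."""
    weights, bias = model
    return score(weights, bias, trace) > 0

# Invented toy traces (1 = malicious, 0 = benign), for illustration only.
examples = [
    (["creates_hidden_folder", "copies_dll"], 1),
    (["creates_hidden_folder", "copies_dll", "opens_document"], 1),
    (["opens_document"], 0),
    (["copies_dll"], 0),
    ([], 0),
]
model = train_perceptron(examples)
```

Retraining for a new threat then amounts to adding fresh labelled traces to `examples` and calling `train_perceptron` again, which mirrors the update cycle described above.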