Automating the Data Scientist – is there any such thing?

Data science skills are likely to be more readily available within the next decade or so. If you are not convinced, just have a look at the list of engineering degrees offered by most universities today – you will almost certainly find at least one curriculum related to data science. But while we wait for the next generation of engineers and data scientists to finish their university degrees and gap years, the shortage of data science skills in the workforce will continue to pose a major challenge for organisations trying to realise the potential of their data. So the obvious question is: why don’t we use machine learning techniques to create our own data scientists? It turns out that some data scientists have been working on this idea in recent years. Their objective: to develop automated machine learning methods and processes that make machine learning (ML) more readily available to non-ML experts, improve the efficiency of ML, and accelerate research on ML.

Computer programming is about automation, and machine learning is all about automating automation. Then, automated machine learning is the automation of automating automation.

Sebastian Raschka

The field of automated machine learning (AutoML) is evolving so quickly that there is no universally agreed-upon definition. Fundamentally, AutoML offers machine learning experts tools to automate repetitive tasks by applying ML to ML itself. A recent Google Research article explains:

The goal of automating machine learning is to develop techniques for computers to solve new machine learning problems automatically, without the need for human machine learning experts to intervene on every new problem. If we’re ever going to have truly intelligent systems, this is a fundamental capability that we will need.
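To make "applying ML to ML itself" a little more concrete, here is a minimal sketch in Python of the simplest form of the idea: a loop that tries several candidate models and keeps the one that scores best on held-out data. It is a toy illustration only, and is not how Cloud AutoML or any other product works internally. The dataset, the nearest-neighbour model, and every function name (`make_data`, `knn_predict`, `auto_select_k`) are assumptions invented for this example.

```python
import random


def make_data(n, seed):
    """Toy 1-D dataset (assumed for illustration): label is 1 when x > 0.5,
    with roughly 10% of the labels flipped to simulate noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:
            y = 1 - y
        data.append((x, y))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(y for _, y in nearest)
    return int(2 * votes > k)


def accuracy(train, held_out, k):
    """Fraction of held-out points that a k-NN model labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in held_out)
    return hits / len(held_out)


def auto_select_k(train, valid, candidates):
    """The 'automated' part: score every candidate k on validation data
    and return the best one together with all scores."""
    scores = {k: accuracy(train, valid, k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores


train = make_data(200, seed=0)
valid = make_data(100, seed=1)
best_k, scores = auto_select_k(train, valid, candidates=[1, 3, 5, 9, 15])
print("chosen k:", best_k)
print("validation accuracy per k:", scores)
```

Real AutoML systems search over far larger spaces (model families, architectures, preprocessing steps) with smarter strategies than trying every option, but the basic shape is the same: the choice a human expert would normally make by hand becomes the output of a search.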

Today, Google offers a paid service called Cloud AutoML that enables developers with limited machine learning expertise to train high-quality models specific to their business needs, by leveraging Google’s state-of-the-art machine learning technology. Cloud AutoML even provides a simple graphical user interface (GUI) for users to train, evaluate, improve, and deploy models based on their own data in just a few minutes! A number of Google's pre-trained machine learning models are also accessible to users via this platform. For example, the “Cloud Vision” service allows developers to automatically understand the contents of images, classify them into thousands of categories (such as “sailboat”), detect individual objects and faces within images, and read printed words contained within images. Similarly, using the “Cloud Natural Language” service, developers can extract information about people, places and events from text documents, news articles, or blog posts; understand sentiment about their products on social media; or parse intent from customer conversations happening in a call centre or a messaging app.



But Google is not alone in this game. The other technology giants, such as Microsoft, IBM and Baidu, are also offering AutoML-type services via their cloud platforms. And to make the AutoML party even merrier, the open source community has come up with its own answer: AutoKeras, an open source Python package written using the Keras deep learning library.

So yes, “automating the data scientist” is not a fantasy; it is very much real!


