“Using TensorFlow makes me feel like I’m not smart enough to use TensorFlow” – Rachel Thomas, professor at the University of San Francisco and co-founder of Fast.ai. If you are wondering what TensorFlow is, it is one of the most popular deep learning frameworks, originally developed by Google (if you would like to know more, here is the link to an article I wrote about TensorFlow a couple of weeks back: Getting Started with TensorFlow and Deep Learning).
Using the native TensorFlow API for deep learning is no easy task, especially if you are a beginner. The main reason for this is the relatively low level of abstraction offered by the TensorFlow API, which makes it hard to express complex ideas concisely. However, things changed for the better in early 2017, when François Chollet, author of Keras and AI researcher at Google, revealed that work was already underway to develop a TensorFlow implementation of the Keras API specification. Keras is an extremely user-friendly, high-level API for building and training deep learning models, and making TensorFlow accessible natively via the Keras API means that developers new to machine learning can get started with TensorFlow and deep learning very quickly, without sacrificing the flexibility and performance of TensorFlow. As a matter of fact, when TensorFlow 1.4 was released (we are awaiting the release of version 2.0 any time now), it included an early implementation of the Keras API (tf.keras), which made it possible to build, train and test a fairly complex deep learning model (such as a classifier for handwritten digits) in just six to ten lines of Python code!
So where does Uber fit into all of this? It turns out that Uber’s Artificial Intelligence team had been working on a similar initiative (i.e. simplifying the development and use of deep learning models), codenamed Ludwig, for the past two years, but at an even higher level of abstraction – deep learning with TensorFlow without having to write a single line of code! More importantly, Uber announced this week that they are open sourcing Ludwig! At first glance, Ludwig looks pretty impressive as a toolkit for deep learning, and I look forward to trying it out (I will report back once I have run some of my own tests). In the meantime, here is a list of some of the coolest features of Ludwig.
- You can run Ludwig from the command line by providing just a tabular file (such as a CSV) containing the data and a YAML configuration file that specifies which columns of the tabular file are input features and which are output target variables. As an example, you can train a model as follows:
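    A minimal sketch of such an invocation, based on Ludwig's documented command-line interface at release (the CSV and YAML file names here are hypothetical placeholders — check the official documentation for the exact flags in your version):

    ```shell
    # Train a model from a CSV dataset and a YAML model definition file
    ludwig train \
      --data_csv reuters-allcats.csv \
      --model_definition_file model_definition.yaml
    ```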
- In the model definition file, all you have to do is specify the names of the columns in the CSV file that are inputs to your model and their datatypes, and the names of the columns in the CSV file that will be the outputs (i.e. the target variables which the model will learn to predict). Here’s an example of a model definition file for a text classifier built using a parallel convolutional neural network. In this example, the CSV file contains two columns: (i) “text” of type text (which will be used as the input feature); and (ii) “class” of type category (which is the target variable that the model will learn to predict). It is that simple!
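    A model definition along those lines might look like the following sketch (the field layout follows Ludwig's documented YAML format; `parallel_cnn` is Ludwig's identifier for the parallel CNN encoder):

    ```yaml
    input_features:
      - name: text
        type: text
        encoder: parallel_cnn

    output_features:
      - name: class
        type: category
    ```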
- The main new idea that Ludwig introduces is the notion of datatype-specific encoders and decoders, which results in a highly modularised and extensible architecture: each type of data supported (text, images, categories etc.) has its own preprocessing functions and encoders. For instance, text can be encoded with a convolutional neural network (CNN) or a recurrent neural network (RNN). In short, encoders map the raw input data to tensors, and decoders map tensors back to raw output data (the model's predictions).
- Ludwig currently supports the following nine datatypes: binary, numerical, category, set, bag, sequence, text, timeseries and image. Ludwig also allows you to add new datatypes, though this requires programming skills (there is a developer’s guide available on Ludwig’s website).
- The model definition can contain additional information, in particular preprocessing information for each feature in the dataset, which encoder or decoder to use for each feature, architectural parameters for each encoder and decoder, and training parameters. Default values of preprocessing, training, and various model architecture parameters are based on Uber’s own experience or have been adapted from the academic literature, allowing novices to easily train complex models. At the same time, the ability to set each of them individually in the model configuration file offers full flexibility to experts. Each model trained with Ludwig is saved and can be loaded at a later time to obtain predictions on new data.
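    As a hedged sketch of what overriding those defaults looks like, a more detailed model definition might add preprocessing and training sections like the following (the specific parameter names here are illustrative of Ludwig's configuration format — consult the user guide for the exact keys your version supports):

    ```yaml
    input_features:
      - name: text
        type: text
        encoder: parallel_cnn
        preprocessing:
          lowercase: true

    output_features:
      - name: class
        type: category

    training:
      epochs: 25
      learning_rate: 0.001
    ```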
- Ludwig provides a set of model architectures (e.g. parallel CNN, LSTM, RNN and stacked CNN) that can be combined together to create an end-to-end model for a given use case.
- Ludwig also provides a simple Python API that allows you to programmatically train or load a model and use it to obtain predictions on new data. Furthermore, you can train models on multiple GPUs locally, and in a distributed fashion through the use of Horovod, an open source distributed training framework.
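    A sketch of what that looks like in code, based on Ludwig's documented Python API at release (`LudwigModel` with `train()` and `predict()` methods); the model definition mirrors the YAML example above, and the CSV file names are hypothetical:

    ```python
    from ludwig.api import LudwigModel

    # Model definition as a Python dict: one text input, one category output
    model_definition = {
        'input_features': [
            {'name': 'text', 'type': 'text', 'encoder': 'parallel_cnn'},
        ],
        'output_features': [
            {'name': 'class', 'type': 'category'},
        ],
    }

    # Train on a CSV file; training statistics are returned for inspection
    model = LudwigModel(model_definition)
    train_stats = model.train(data_csv='reuters-allcats.csv')

    # Obtain predictions on new, unseen data
    predictions = model.predict(data_csv='new_data.csv')

    model.close()
    ```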
- After training, Ludwig creates a results directory containing the trained model with its hyperparameters and summary statistics of the training process. You can visualise them using one of the several visualisation options available through Ludwig’s visualise command.
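    For example, learning curves from a training run could be plotted with something like the following (the visualisation name and statistics path are illustrative placeholders; see Ludwig's visualisation documentation for the full set of options):

    ```shell
    # Plot learning curves from the statistics saved in the results directory
    ludwig visualize \
      --visualization learning_curves \
      --training_statistics results/experiment_run/training_statistics.json
    ```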
- In future releases, Uber’s AI team hopes to: (i) add several new encoders for each datatype, such as Transformer, ELMo and BERT for text, and DenseNet and FractalNet for images; and (ii) introduce additional datatypes like audio, point clouds and graphs, while at the same time integrating more scalable solutions for managing big datasets, like Petastorm.
Hats off to the Uber AI team!