Author: Edward Huskin
Implementing a neural network from scratch is a deeply valuable exercise, especially for someone just picking up the fundamentals of machine learning.
Once those concepts have sunk in, though, writing your own library for something as common as image classification or OCR is essentially reinventing the wheel. Rather than spend days or weeks writing and training such software, most people are better off using a machine learning framework.
A machine learning framework can be thought of as a pre-built wrapper that abstracts away the complicated aspects of writing and optimizing that code yourself.
1. TensorFlow

TensorFlow is by far the most popular machine learning framework in the world. It wasn’t the first ML framework on the market, but it made its way to the top because it’s backed by a large company (Google), has great documentation, and is extremely powerful.
TensorFlow is a great fit for almost any kind of ML project – it excels at text-based ML, image recognition, and even video analysis. It also has the largest ML community online, which makes help easy to find.
However, it works best for people who already have a solid grasp of the mathematical underpinnings of machine learning, such as linear algebra and calculus. It’s a lot more low-level than other libraries out there, so it isn’t very beginner-friendly.
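To give a flavor of that low-level style, here is a minimal sketch (assuming TensorFlow 2.x is installed) that uses automatic differentiation directly – the kind of calculus-adjacent work the framework exposes:

```python
import tensorflow as tf

# Differentiate y = x**2 at x = 3 using TensorFlow's GradientTape.
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2  # dy/dx = 2x

grad = tape.gradient(y, x)  # evaluates to 2 * 3.0 = 6.0
```

This is the building block underneath training loops: the same tape mechanism computes gradients of a loss with respect to model weights.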
2. MLlib

MLlib is a machine learning library maintained by the Apache Software Foundation and geared toward data-crunching tasks in data warehouses. It ships as part of Apache Spark, which allows it to be used together with Spark or Hadoop for big data collection, sorting, and classification.
One of its greatest advantages is interoperability with the most popular big data platforms – Kubernetes, Mesos, Spark, Hadoop, and EC2. It is also compatible with hundreds of different data sources, including Hive and HBase.
In addition, it is extremely fast at processing data, supports in-memory computing, and is highly fault-tolerant.
On the downside, it tends to have very high memory consumption, lacks a file management system of its own, and can suffer from high latency.
3. Keras

Keras is a deep learning library built with simplicity in mind above all else. It isn’t a standalone framework but runs on top of others such as TensorFlow and Theano. For people who don’t like interacting with TensorFlow’s low-level design paradigm, Keras’s high-level, modular design is a perfect fit, which makes it ideal for experimentation and quick prototypes.
The biggest downside to this framework is that it still requires an understanding of what runs underneath it. For example, if you decide to run it on top of Theano, you should have at least a basic grasp of what that library is doing, since you might run into a low-level error that warrants delving into Theano’s code to understand what’s going on.
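For comparison with the low-level TensorFlow style, here is a sketch of the Keras approach (assuming the `tf.keras` API bundled with TensorFlow 2.x; the layer sizes are arbitrary) – a whole model is a few declarative lines:

```python
import tensorflow as tf

# Stack layers declaratively; no manual tensor plumbing required.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# A single forward pass on dummy input builds the model and yields
# one row of class probabilities.
out = model(tf.zeros((1, 4)))
```

From here, training is one call to `model.fit(X, y)` – which is exactly the kind of quick prototyping the library is built for.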
4. PyTorch

Unlike TensorFlow and Keras, which can provide a high level of abstraction for those who need it, PyTorch was designed around direct work with array expressions. TensorFlow currently holds the title of most popular ML library, but PyTorch isn’t far behind.
It has gained traction for the freedom it allows in writing custom layers and optimizing numerical tasks. It is often compared with Keras for the level of abstraction it provides, but it is far less limiting, allowing more complex architectures to be built.
The real game-changer with PyTorch is its built-in support for dynamic neural networks. Unlike TensorFlow’s original static graphs, where changing the behavior of your model meant rebuilding it from scratch, PyTorch lets you tweak the neural network on the fly.
A feature PyTorch shares with other frameworks is the ability to distribute computational work across multiple CPU or GPU cores. PyTorch stands out from TensorFlow, for instance, because this is a lot easier to achieve.
5. Scikit-learn

Scikit-learn is a library used for simpler machine learning tasks, such as data mining and analysis. It’s easy to get into and familiarize oneself with because it is built on top of popular libraries such as SciPy, NumPy, and Matplotlib.
Scikit-learn implements a broad range of supervised algorithms alongside several variants of dimensionality reduction and clustering for unsupervised learning. It isn’t very useful for deep learning tasks, but it will be more than enough for most scenarios, where TensorFlow tends to be overkill.
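A typical end-to-end scikit-learn workflow fits in a handful of lines (this sketch assumes `scikit-learn` is installed; the iris dataset and logistic regression are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small built-in dataset and hold out a test split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a classifier and score it on the held-out data.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
score = clf.score(X_test, y_test)  # mean accuracy
```

Every estimator in the library follows this same `fit`/`predict`/`score` pattern, which is a big part of why it is so easy to pick up.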
6. Amazon ML
Amazon ML is a cloud-based service that has gained prominence for its ability to construct mathematical models from the data it is given. It’s an API-based ML service that also provides visualization tools that would otherwise be difficult for beginners to write on their own using non-deployable languages such as R.
Essentially, it takes away the need to learn complex algorithms, write additional code, or manage infrastructure on your own.
What’s the Best Machine Learning Framework?
There is never going to be a one-size-fits-all solution in machine learning. Most frameworks and libraries are built so that they excel at one or more applications while sacrificing performance in others.
Your choice of library will depend on your level of familiarity with different machine learning tools and concepts, your particular use case, your time frame, and the resources you have at your disposal.
For instance, TensorFlow is by far the most popular ML framework. It is incredibly versatile and a go-to tool for people working with images. However, it has a terribly steep learning curve and requires mastery of mathematical concepts like calculus.
On the other hand, Scikit-learn isn’t nearly as versatile but is incredibly easy to pick up and is enough for most small-scale applications, for which TensorFlow often tends to be overkill.
If your use case involves working with big data, MLlib, together with Spark or your favorite big data platform, is indispensable.
Image credit: freepic