Machine learning (ML) is a branch of artificial intelligence (AI) in which models learn to predict an outcome from patterns present in data. These models can identify patterns without being explicitly programmed or told what to look for, can improve their accuracy as they are given more data, and can then make predictions on previously unseen data.
Machine learning models are trained on historic datasets and used to make predictions on future data. As new data become available, the models are regularly re-trained to help them improve their predictions as patterns in the data change.
ML dates back to the 1940s, when McCulloch and Pitts created a model of the first neural network using an electrical circuit. Alan Turing created the Turing Test in 1950, and in 1952 Arthur Samuel created a running program that played checkers.
However, it wasn’t really until the 2000s that machine learning exploded in popularity. These days, the field is developing rapidly, and ML algorithms have been democratized: they are now much easier to use, and you no longer need to write the complex algorithms from scratch.
There are four main types (or styles) of machine learning: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.
In supervised machine learning, you provide a labeled dataset containing a set of features and the target variable you’re trying to predict. You give the model the data and the answers, define an evaluation metric (e.g., accuracy), and it trains itself to predict the answer as accurately as it can. You can then pass the model previously unseen data and it will predict the answer based on what it previously learned.
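As a minimal sketch of that workflow, the example below trains a classifier on labeled data and scores it on data it has not seen. The choice of scikit-learn's built-in wine dataset, logistic regression, and a 25% hold-out set are my assumptions for illustration, not the only way to do it.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X holds the features, y holds the labeled target (the "answers")
X, y = load_wine(return_X_y=True)

# Hold back 25% of the rows so the model can be scored on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Scale the features, then fit a logistic regression classifier
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Predict the held-back rows and compare against the true answers
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy on unseen data: {accuracy:.3f}")
```

Accuracy is only one possible evaluation metric. For imbalanced classes, something like precision, recall, or F1 is often a better fit.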
In unsupervised learning, you provide an unlabeled dataset comprising a set of features, but you don’t provide the answer you’re trying to predict. Instead, the model looks for patterns in the data and places the observations into groups based on the similarity of their features. You can then pass the model unseen data and it will group the data based on what it previously observed in your training data. Unsupervised machine learning can also be used for dimensionality reduction, via models such as Principal Component Analysis (PCA), and for outlier detection, via models such as the Isolation Forest.
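The three uses mentioned above can be sketched on a small synthetic dataset. Everything here is an assumption made for illustration: the data are randomly generated, k-means is just one of many clustering algorithms, and the cluster count of two is chosen because I generated two groups.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Two well-separated groups of unlabeled points, four features each
group_a = rng.normal(loc=0.0, scale=0.5, size=(50, 4))
group_b = rng.normal(loc=5.0, scale=0.5, size=(50, 4))
X = np.vstack([group_a, group_b])

# Clustering: learn the groupings, then assign unseen points to them
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
new_points = np.array([[0.1, -0.2, 0.0, 0.3], [5.2, 4.9, 5.1, 4.8]])
new_clusters = kmeans.predict(new_points)

# Dimensionality reduction: compress four features down to two
X_2d = PCA(n_components=2).fit_transform(X)

# Outlier detection: add one extreme point and flag it (-1 = outlier)
X_with_outlier = np.vstack([X, [[20.0, 20.0, 20.0, 20.0]]])
flags = IsolationForest(random_state=0).fit_predict(X_with_outlier)

print(new_clusters, X_2d.shape, flags[-1])
```

Note that the cluster numbers themselves are arbitrary. What matters is which points end up sharing a cluster.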
Semi-supervised learning is the least common machine learning approach, and is less widely supported in ML frameworks than the other approaches, although scikit-learn does include some semi-supervised models. Semi-supervised learning is essentially a hybrid of supervised and unsupervised learning that doesn’t require all the training data to be labeled. The benefit is that it allows you to work with a much smaller labeled dataset, which is essential in situations where labels are scarce or expensive to obtain.
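Here is a rough sketch of the idea using scikit-learn's LabelSpreading model, in which unlabeled rows are marked with -1. The iris dataset, the decision to hide roughly 70% of the labels, and the nearest-neighbour kernel settings are all assumptions I've made for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelSpreading

X, y = load_iris(return_X_y=True)

# Pretend most labels are unknown: scikit-learn uses -1 for "unlabeled"
rng = np.random.default_rng(0)
unlabeled = rng.random(len(y)) < 0.7
y_partial = y.copy()
y_partial[unlabeled] = -1

# Spread the known labels to nearby unlabeled points
model = LabelSpreading(kernel="knn", n_neighbors=7)
model.fit(X, y_partial)

# transduction_ holds the label inferred for every training row
inferred = model.transduction_[unlabeled]
accuracy = (inferred == y[unlabeled]).mean()
print(f"Accuracy on the hidden labels: {accuracy:.3f}")
```

In a real project you wouldn't have the hidden labels to check against, so you would normally keep a small labeled validation set aside instead.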
Reinforcement learning is one of the newer machine learning methods in use today. It doesn’t require labeled input data. Instead, the problem is usually framed as a Markov decision process (MDP) and solved with techniques such as dynamic programming. The models are designed to take various actions and maximise a reward variable, thus learning from their mistakes to get better at meeting the target objective.
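To make the action-and-reward loop concrete, here is a toy sketch using tabular Q-learning, which is my choice of algorithm for illustration rather than one named above. The environment is hypothetical: a corridor of five cells in which the agent starts at cell 0 and earns a reward of 1 for reaching cell 4. The learning rate, discount factor, and exploration rate are arbitrary values.

```python
import random

N_STATES = 5            # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)      # move one cell left or one cell right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # learning rate, discount, exploration


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    # One row per state, one estimated value per action
    q = [[0.0, 0.0] for _ in ACTIONS * 0 or range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < EPSILON:
                a = rng.randrange(len(ACTIONS))      # explore
            else:
                best = max(q[state])                 # exploit, random tie-break
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate towards reward + discounted future value
            future = 0.0 if done else GAMMA * max(q[next_state])
            q[state][a] += ALPHA * (reward + future - q[state][a])
            state = next_state
    return q


q_table = train()
# Best action in each non-goal cell according to the learned values
policy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
print(policy)
```

After training, the learned policy should be to move right (+1) from every cell, because that is the shortest route to the reward.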
Machine learning techniques can even detect fake news. (Picture by Wesley Tingey, Unsplash.)
In machine learning, features are the variables you pass in to the model to train it to predict the target variable. To identify patterns in data, machine learning models use maths, so the features you provide in the input data all need to be converted to a numeric form.
For example, let’s say you are building a model to predict who survived on the Titanic, based on their gender, age, and ticket price. For each passenger, you would provide their age (e.g., 39), ticket price (e.g., £100), and their gender, but the gender feature would need to be encoded numerically, so males would have a 1 and females would have a 0.
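The encoding step might look something like this in Pandas. The four passengers below are made-up rows for illustration, not records from the real Titanic dataset, and the column names are my own.

```python
import pandas as pd

# Made-up example rows, not real Titanic records
passengers = pd.DataFrame({
    "gender": ["male", "female", "female", "male"],
    "age": [39, 26, 35, 54],
    "ticket_price": [100.0, 71.28, 53.10, 51.86],
    "survived": [0, 1, 1, 0],
})

# Encode the text feature numerically: males become 1, females become 0
passengers["gender"] = passengers["gender"].map({"male": 1, "female": 0})

# Every feature is now numeric and ready to pass to a model
X = passengers[["gender", "age", "ticket_price"]]
y = passengers["survived"]
print(X.dtypes)
```

For features with more than two categories, one-hot encoding (for example via `pd.get_dummies()`) is the more usual approach.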
In most machine learning projects, you will spend a lot of time preparing existing features and generating new ones. This process is called feature engineering and it can be one of the most fascinating parts of many ML projects, since you really need to understand the underlying data.
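As a small, hedged illustration of feature engineering, the snippet below derives new features from a date column and from a ratio of two existing columns. The orders dataframe and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical order data with one date and two numeric columns
orders = pd.DataFrame({
    "order_date": pd.to_datetime(["2021-01-04", "2021-02-14", "2021-03-03"]),
    "revenue": [120.0, 45.0, 300.0],
    "items": [4, 1, 10],
})

# New features extracted from the date
orders["month"] = orders["order_date"].dt.month
orders["day_of_week"] = orders["order_date"].dt.dayofweek  # Monday = 0
orders["is_weekend"] = orders["day_of_week"] >= 5

# New feature created by combining two existing ones
orders["revenue_per_item"] = orders["revenue"] / orders["items"]

print(orders)
```

Which engineered features actually help depends entirely on the problem, which is why understanding the data matters so much.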
Many people use the phrases artificial intelligence and machine learning interchangeably to mean the same thing. Technically, however, machine learning is just a sub-branch of AI, while artificial intelligence is perhaps best viewed as the broader concept, in which the goal is to create intelligent algorithms that simulate or outperform human thinking.
Deep learning is a subset of machine learning in which the algorithms use artificial neural networks loosely modelled on the way human brains learn. All deep learning algorithms are therefore machine learning algorithms, but only some machine learning algorithms use deep learning.
Deep learning algorithms are much more complicated than the supervised and unsupervised machine learning algorithms more widely used by data scientists, and require more advanced skills. However, modern deep learning frameworks such as TensorFlow, Keras, and PyTorch make it much easier to create neural networks than it used to be.
Popular deep learning models include recurrent neural networks (RNNs) and convolutional neural networks (CNNs), which are being used in various areas, such as Natural Language Processing (NLP), Natural Language Understanding (NLU), and computer vision. Neural nets power the technology in modern self-driving cars.
Deep learning techniques and deep neural nets dominate in computer vision. (Picture by Wesley Tingey, Unsplash.)
Machine learning algorithms are the underlying mathematical procedures used within machine learning models. ML algorithms perform pattern recognition to learn from data, but come in a wide range of forms, designed to detect patterns in a variety of ways.
Common ML algorithms include: linear regression, logistic regression, decision trees, principal component analysis (PCA), support vector machines (SVM), k-means, k-nearest neighbours (KNN), and artificial neural networks (ANNs). Sometimes the terms “algorithm” and “model” are used interchangeably in data science, but they do technically represent different things.
A machine learning model is best thought of as the combination of the data, algorithm, and program that examines data and generates predictions. When a model is created, the data is first analysed, then prepared and formatted for use in the model, and then used to train the model. The trained model is then saved and used to make predictions on previously unseen data, allowing previous observations to be used to make predictions.
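The train, save, and predict cycle described above can be sketched as follows. I've assumed a decision tree trained on scikit-learn's wine dataset and used Python's built-in pickle module to save the model to a temporary directory. The file name is hypothetical, and joblib is a common alternative for saving models.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Prepare the data and train the model
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "wine_model.pkl"  # hypothetical file name

    # Save the trained model to disk
    with open(model_path, "wb") as f:
        pickle.dump(model, f)

    # Later, load it back and predict on previously unseen data
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

same_predictions = bool((restored.predict(X_test) == model.predict(X_test)).all())
print(same_predictions)
```

Only unpickle model files you trust, and load them with the same library versions used to save them where possible.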
A decade or so ago, R was the main language used by data scientists building machine learning models. However, it’s now fallen somewhat out of favour due to the explosive growth of Python. Having worked with R in the distant past, I now solely use Python and find it a really enjoyable language to use.
Early ML models ran on the computer’s CPU, and many data scientists still use this technique. However, most serious ML engineers now utilise the GPUs (graphics processing units) in their graphics cards instead, often building a data science workstation specifically for this purpose.
GPUs have hundreds or thousands of cores per chip and can perform many calculations in parallel, making them significantly faster for this kind of work. It’s also possible to access GPUs (and the more recent TPUs - Tensor Processing Units) in the cloud.
ML is primarily used to solve two main types of problem: classification and regression. Classification aims to predict which “class” an item belongs to, while regression aims to predict a number. The two types of model are known as classifiers and regression models, respectively.
These two basic problem types actually have loads of different practical applications in the real world. These include everything from speech recognition, to computer vision, Natural Language Processing (NLP), Natural Language Generation (NLG), and Natural Language Understanding (NLU), to time series forecasting, clustering, and a whole array of other types of predictive models.
Classifiers are designed to classify items into groups called classes based on their features. They predict a discrete variable (y) from a set of predictive variables or features (X).
At the most basic level, a classifier would predict a binary or binomial outcome, i.e., 1 or 0. For example, a spam classifier might examine emails and predict whether they are “spam” or “not spam”. An image classifier might predict whether a meal is a “hotdog” or “not hotdog”.
However, more advanced classifiers can distinguish between more than two classes. These multiclass classification models, or multinomial classification models, can return the probability of each item belonging to each of the pre-defined classes.
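In scikit-learn, most classifiers expose these per-class probabilities via `predict_proba()`. The sketch below assumes the three-class wine dataset and a logistic regression model, both chosen purely for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The wine dataset has three classes, labeled 0, 1, and 2
X, y = load_wine(return_X_y=True)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

# One row per item, one column per class; each row sums to 1
probabilities = model.predict_proba(X[:3])

# predict() returns the single most likely class for each item
predicted = model.predict(X[:3])

print(probabilities.round(3))
print(predicted)
```

The class returned by `predict()` is the one with the highest probability in the corresponding row.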
In ML, regression models aim to predict a numeric value. For example, what price a house will sell for based on its location and attributes, or what sales a company will generate in August. They use methods that predict a continuous variable (y) based on a set of predictor variables or features (X).
By examining the features, the model learns y as a function of X and can predict y when it gets a new, previously unseen set of features. For example, a model might examine the features of houses and predict the house sale price.
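Here is a minimal sketch of the house price example using linear regression. The data are synthetic: I've invented a pricing rule (a base price plus an amount per square metre and per bedroom, with some random noise) so that there is a known pattern for the model to recover.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Synthetic features for 200 made-up houses
floor_area = rng.uniform(50, 200, size=200)   # square metres
bedrooms = rng.integers(1, 6, size=200)       # 1 to 5 bedrooms

# Invented pricing rule plus noise, so the "true" pattern is known
price = (
    50_000
    + 2_000 * floor_area
    + 10_000 * bedrooms
    + rng.normal(0, 5_000, size=200)
)

# X is the feature matrix, y (price) is the continuous target
X = np.column_stack([floor_area, bedrooms])
model = LinearRegression().fit(X, price)

# Predict the price of a new, unseen house: 120 square metres, 3 bedrooms
new_house = np.array([[120, 3]])
predicted_price = model.predict(new_house)[0]
print(f"Predicted price: £{predicted_price:,.0f}")
```

Under the invented rule the noise-free price of this house would be £320,000, so the prediction should land close to that figure.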
Machine learning can be relatively difficult, or fairly easy, depending on the complexity of the problem you are trying to solve. ML requires a very particular set of skills. Skills that can be acquired over a long career. Skills that make machine learning engineers a nightmare for employers to find.
Modern ML frameworks, such as scikit-learn, have made ML much easier than it used to be, because you can use pre-written algorithms in your models without the need to write them from scratch. However, you’ll still need to be good at maths, stats, and data analysis, and understand the principles behind modelling.
Machine learning frameworks, like scikit-learn, use a common approach to creating their models, so the complexity of creating a simple linear regression model, a gradient boosted tree, or a neural network is actually quite similar. None of them require you to write the underlying machine learning algorithm - you simply call the relevant function.
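To show how similar the code is, the sketch below fits three very different regression models using exactly the same `fit()` and `score()` calls. The synthetic dataset and the model settings are assumptions made for the example, and none of the models have been tuned.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic regression data, split into training and test sets
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Three different algorithms, one shared interface
models = {
    "linear regression": LinearRegression(),
    "gradient boosted trees": GradientBoostingRegressor(random_state=0),
    "neural network": MLPRegressor(
        hidden_layer_sizes=(32,), max_iter=1000, random_state=0
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                  # same call for every model
    scores[name] = model.score(X_test, y_test)   # R squared on unseen data

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Swapping one algorithm for another is usually a one-line change, which is why it is so easy to compare several models on the same problem.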
Machine learning engineers, and data scientists who work in ML, command very high salaries in the UK, even when compared to other high demand technical roles, such as web development.
As of 2021, graduate salaries for ML roles are upwards of £30K in the UK. Senior and Lead ML roles can hit £100K or more, depending on the business, sector, or work being undertaken. Most companies are looking for people with a degree, and often a Master’s or PhD, plus demonstrable experience in ML.
With data science dubbed “the sexiest job of the 21st century”, very high salaries on offer, and a global shortage of data scientists, it is no surprise that lots of people are looking to learn machine learning.
There are a growing number of Bachelor’s and Master’s degree programmes that aim to teach data science and ML. However, most people will likely opt for one of the many online learning courses available via providers like DataCamp.
There are also loads of great websites out there which, like Practical Data Science, aim to share practical advice on specific ML and data science problems to help other people get to grips with them and use them in their work. A simple classification model, such as the wine classification problem, is a good way to get started.
Picture by Andrea Piacquadio, Pexels.
Realistically, you’re likely to require a good degree in some kind of numerate discipline, such as computer science, mathematics, or statistics, if you want to have a career in machine learning. That said, if you’re intelligent, you could retrain from a different field, although you’ll need to put in a huge number of hours to become proficient enough to do this as your day job.
If you don’t want to take 1-3 years out of employment by studying for a degree-level qualification, your best bet is likely to be online learning. Many of the courses available are really excellent and, if you put in the time to practice and read around the subject, you can pick up some very practical skills.
To get started, I’d suggest building a simple linear regression model and a simple classification model, and then moving on to more advanced techniques and best practices. Learning the basics of the processes behind creating and assessing a machine learning model is the most important thing at first. Once you know these, you can apply them to whatever models you like.
I’d also highly recommend that you use machine learning to solve problems on real world datasets, rather than the toy datasets often used in machine learning examples and documentation. This will better equip you with the skills you need to apply your knowledge to real business problems.
I tried a few online courses when I was learning ML and settled on DataCamp. My wife is also currently enrolled with DataCamp and is following one of their longer, more in-depth career pathways. Their courses are excellent. They have very knowledgeable instructors, good video-based content, and practical tasks to try, with plenty of code examples.
With DataCamp, you can also take the first part of any four-part course for free. This is a great way to find out if the course is right for you, and is also a quick way to get started in any new data science technologies you fancy trying. You can view a full list of DataCamp’s data science and machine learning courses here.
Matt Clarke, Wednesday, March 03, 2021