The goal of this article is to introduce a core concept in the world of Machine Learning: model training. I will give a broad description of the concept here and get more specific in future articles.
If you are new to Machine Learning, or simply have an interest in it, this is for you.
Note: This article focuses on parameterized training in supervised learning, which is the most common setting.
Also, I try to stay fairly abstract, with only small illustrative sketches along the way. This allows you to think more broadly about the ideas; it's easier to go from a generalization to a specialization.
First, a model
Let’s define a model. Definitions differ depending on which text you are reading, but broadly speaking, a model is the output of a training procedure. Essentially, a model is a representation of your data distribution, and how well it captures that distribution determines its performance; that performance depends on factors like architecture, training algorithm, and dataset.
So, what does it mean to train a model? In broad terms, training means improving or optimizing something towards a goal. If you were to take a test, you would have to prepare for it, right? You would also need some resources to study from, e.g. tutorials, books, etc.
In the same manner, a model has to be trained (learning/studying) using data (books and tutorials) to perform better in the real world (the test).
It all starts with the data
Data is required to train a model; it is essential for achieving high performance. Without data, the model has nothing to learn from, just as without books, lectures, and tutorials, the chances of scoring high on a test are very low.
The quality of the dataset you train on is also a determining factor in model performance, which is why there are data practitioners focused entirely on data preprocessing and cleaning.
Training data is used to fit the model, while test data is held out so you can evaluate the model on examples it has never seen. A common way to split a dataset is shown below.
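As a minimal sketch of this split (using scikit-learn and made-up data; the library choice and numbers are purely illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 10)            # 1,000 examples, 10 features each
y = np.random.randint(0, 2, size=1000)  # a binary label per example

# Hold out 20% of the data; the model never sees it during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (800, 10) (200, 10)
```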
Choice of Model
Several factors inform one's choice of a model, including the problem to be solved, the dataset you have, and so on.
According to research and practice, some model types and architectures have been found to perform better at certain tasks than others. Examples of model types include (a quick code sketch follows the list):
- Decision Tree and its variants
- Artificial Neural Nets
- Linear Regression
- Logistic Regression
- Graph Models (a particularly interesting family)
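As a rough illustration (assuming scikit-learn; the hyperparameters here are arbitrary), several of these model types are available off the shelf:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.neural_network import MLPClassifier  # a small artificial neural net

tree = DecisionTreeClassifier(max_depth=5)
linear = LinearRegression()
logistic = LogisticRegression()
neural_net = MLPClassifier(hidden_layer_sizes=(32, 16))
```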
Parameters / Weights
This idea is easy to state but can be tricky to internalize. Models are made up of things called parameters (typically called weights in deep neural networks). Essentially, parameters are just numbers that determine how the model responds to an input, and therefore what output it produces.
There are millions of weights in a typical real-life model; BERT (Bidirectional Encoder Representations from Transformers), for example, has over a hundred million parameters in its base version.
These parameters (numbers) have to be tuned (or optimized) so that the model performs better.
Think of a baby: it knows nothing when it is born, so you could say its weights are randomized, and a collection of randomized weights is not guaranteed to perform well in any scenario. As the baby grows, its "weights" are optimized, which becomes evident in its ability to walk, talk, etc. The baby learns to talk by optimizing parameters in its brain. A tiny sketch of what parameters look like in code follows.
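To make "parameters are just numbers" concrete, here is a minimal sketch of a linear model with three weights and a bias (pure NumPy, purely illustrative):

```python
import numpy as np

# A linear model y = w.x + b. Its parameters are the weight vector w and the bias b.
rng = np.random.default_rng(0)
w = rng.normal(size=3)  # randomly initialized weights -- the "newborn" state
b = 0.0

def predict(x):
    # The parameters alone determine how input is mapped to output.
    return np.dot(w, x) + b

print(predict(np.array([1.0, 2.0, 3.0])))
```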
Loss Function / Error Measurement
This part is simple: while the model is learning, you have to keep measuring how well it is doing, and a loss function is how you do that. It entails comparing the output of your model during training to the true (target) output.
There are several types of loss functions. Here are a few you can use (tiny hand-rolled versions follow the list):
- Mean Squared Error
- Mean Absolute Error
- Cross-Entropy
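As a sketch of what these actually compute (hand-rolled NumPy versions for illustration; in practice you would use a library's implementations):

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def mean_absolute_error(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    p = np.clip(p_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7])
print(mean_squared_error(y_true, y_pred))    # small when predictions are close
print(binary_cross_entropy(y_true, y_pred))
```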
Optimization
Here’s the interesting part of machine learning. After we have measured how far we are from the truth, we have to update our weights, and optimization is how we do so. For each parameter in the model, we calculate a delta and use it to update that parameter; the delta is derived from the error described earlier (in gradient descent, it is the gradient of the loss with respect to the parameter, scaled by a learning rate). Examples of optimizers include the following; a worked sketch of gradient descent follows the list:
- Gradient Descent (a very common algorithm)
- RMSProp
- Adam Optimizer
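Here is a minimal sketch of the simplest of these, gradient descent, fitting a one-weight-plus-bias linear model (NumPy only; the data and learning rate are made up):

```python
import numpy as np

# Fit y = w*x + b to noisy data by plain gradient descent on the MSE loss.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=100)  # true w=3.0, b=0.5

w, b = 0.0, 0.0  # initial parameter values
lr = 0.1         # learning rate: scales each update

for step in range(500):
    error = (w * x + b) - y
    # Gradients of the MSE loss with respect to each parameter.
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # The "delta" for each parameter is its gradient scaled by the learning rate.
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # should approach 3.0 and 0.5
```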
Performance Measure
So you have trained a model and you’d like to know how it is performing. It is important to use the performance measure that best reflects how well the model will do in real life. Common measures include:
- Accuracy (typically not a good measure of performance)
- F1-Score
- ROC-AUC (Area Under the Curve)
- IoU (Intersection over Union, used in Computer Vision problems)
As the list hints, accuracy is often misleading in real-life cases. For example, if 95% of your examples belong to one class, a model that always predicts that class achieves 95% accuracy while having learned nothing; metrics like F1-Score or ROC-AUC expose this failure.
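A quick illustration of that failure mode using scikit-learn (synthetic labels, purely for demonstration):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# An imbalanced problem: 95% of the labels are 0.
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros(100, dtype=int)  # a useless "model" that always predicts 0

print(accuracy_score(y_true, y_pred))  # 0.95 -- looks great, means nothing
# F1 is 0.0 here (scikit-learn may warn that precision is undefined).
print(f1_score(y_true, y_pred))
```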
Summary
So, in general, here are the steps in training a machine learning model:
1. Collect your data (this is huge, and not very easy).
2. Choose a model.
3. Choose a loss function.
4. Choose an optimization method.
5. Initialize weights (most people use random initialization).
6. Pass your training data as input to the model.
7. Compare the model output to the true value.
8. Measure how far you are from the true value using the loss function.
9. Optimize your model using your optimizer.
10. Measure model performance using a test dataset.
11. If you get your desired performance level, stop. Else, go back to step 6.
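To tie the steps together, here is a minimal end-to-end sketch (assuming scikit-learn with synthetic data; the library chooses the loss and optimizer internally for this model, and `fit()` runs the training loop):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Steps 1-2: "collect" data (synthetic here) and choose a model.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = LogisticRegression(max_iter=1000)

# Steps 3-9: fit() initializes the weights and runs the optimization loop.
model.fit(X_train, y_train)

# Steps 10-11: measure performance on the held-out test set.
print(f1_score(y_test, model.predict(X_test)))
```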
You could say that the goal is to improve performance while reducing error. Of course, this is a very high-level description of model training; there’s more to it, but I hope this introduction shows that it is not as daunting as you might think.
Thanks for reading up to this point. If you like what you just read, kindly give a few claps. 👏🏽
You can also follow me on Twitter: @iamtemibabs