So last week, I joined a competition to recognize patterns in a difficult dataset (I’ll explain later). My team decided we needed to split the work, so we were each assigned one model to work with and see how it fits. The model assigned to me was an Artificial Neural Network, and I wasn’t very happy about that (again, I’ll explain later). Eventually, I settled on a type of ANN called the Multi-Layer Perceptron.
My aim with this post is a surface-level, non-mathematical walkthrough of my work last weekend. So don’t worry if you’re not comfortable with calculus or math; you’ll cope.
I’m doing another post next week for those interested in the mathematics.
What is this Perceptron?
A perceptron is a type of Artificial Neural Network (ANN) that is organized in layers/stages from neuron to neuron. An ANN is patterned after how the brain works. You see, at a surface level, the brain is made up of elements called neurons
(the red stuff in the image), connected/linked in the manner shown in the image. There are tens of billions of these in the brain, with trillions of connections between them. Tiny calculations happen at each of these points/junctions to help us make decisions about our environment (in Machine Learning, the equivalent calculation involves what we call a non-linearity). Now let’s make that artificial. In the Machine Learning world, we still call the artificial neurons neurons, and we call the links weights.
The weights are the connections between neurons, and their strengths form the basis of decision-making in the brain. In the ANN, they’re basically just numbers.
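To make that concrete, here’s a minimal sketch of a single artificial neuron in Python. All the values are made up; the point is just that the weights are plain numbers, and a neuron is a weighted sum followed by a tiny calculation (a ReLU non-linearity here, as an illustrative choice):

```python
import numpy as np

inputs = np.array([0.5, 0.1, 0.9])    # signals arriving at the neuron
weights = np.array([0.2, -0.4, 0.7])  # the strength of each link
bias = 0.1

# weighted sum of the inputs, then the non-linearity (ReLU)
z = np.dot(inputs, weights) + bias
output = max(0.0, z)  # the neuron "fires" only if the sum is positive
print(output)         # 0.79
```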
In the ANN, we do something called layering. This means that we group certain neurons and treat them alike (look at the image below).
As you can see, this ANN has:
- 6 input neurons, where the input is received;
- 4 hidden neurons, where the machine learns stuff;
- another 4 hidden neurons, where the machine learns more stuff;
- and 1 output neuron, where the final decision is made and the machine returns an output.
THAT’S 4 LAYERS.
You’ll notice that in this perceptron, every neuron in a layer is connected to every neuron in the next layer. We say it is a fully-connected neural network. Also observe that the network connects the input layer to the output layer indirectly through the hidden layers (we have 2 hidden layers in the image above). This is the Multi in the Multi-Layer Perceptron (MLP). Whenever you have at least 1 hidden layer, you have a multi-layer perceptron.
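If you want to see that architecture in code, here’s a minimal sketch of the 6-4-4-1 network using scikit-learn’s MLPRegressor. The layer sizes come from the image above; the random data, activation, and iteration count are illustrative assumptions, not what my team actually used:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))  # 6 input neurons -> 6 features per record
y = rng.normal(size=200)       # 1 output neuron -> 1 target per record

model = MLPRegressor(
    hidden_layer_sizes=(4, 4),  # two hidden layers of 4 neurons each
    activation="relu",          # the non-linearity at each neuron
    max_iter=2000,
)
model.fit(X, y)
print(model.predict(X[:3]))    # the output layer's decisions
```

Note that scikit-learn only asks for the hidden layer sizes; the input and output layers are inferred from the shape of the data.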
For an Artificial Neural Network to do classification or regression well, it typically needs a huge amount of data, often thousands to millions of records (and for hard problems, far more). THAT’S A LOT!!! Interestingly, that’s why brain-training starts in childhood, immediately after birth, and it takes a long time for you as a human to take actions or make decisions based on that training (depending on what level of cognition and decision-making you’re talking about). But when the brain learns, it learns.
The Training of the Brain
Our brain starts at a random state (that means it uses random values for the weights) and then makes decisions based on these. Then it observes the output of the environment, compares that output with its own, and corrects itself, adjusting the weights to minimize the error it makes. It’s a lot of matrix-matrix multiplication and a fair amount of calculus, but like I promised, no math stuff here. Long process, huh? So the weights of the brain are set as a result of the observed environment. Be careful what you learn!!!
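Here’s a toy sketch of that “adjust the weights to shrink the error” loop, for a single neuron on made-up data. Real MLP training (backpropagation) repeats this idea across every layer at once:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))  # 100 observations, 3 inputs each
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w                 # the "environment's" answers

w = rng.normal(size=3)         # start at a random state
lr = 0.1                       # how big each correction step is

for _ in range(200):
    pred = X @ w                 # make a decision
    error = pred - y             # compare with the environment
    grad = X.T @ error / len(y)  # direction that increases the error
    w -= lr * grad               # step the other way: correct yourself

print(w)  # ends up very close to true_w
```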
The Data Set
Now, let me describe the pain-in-the-ass dataset. Here are the highlights, 3 issues:
- It has over 4700 features (that’s a lot). I mean, it’s not even an image dataset.
- It is very limited in the number of records or examples (fewer records than features).
- It is a sparse dataset. It has a lot of zeros in it.
If you’re a Machine Learning expert or a good Data Scientist, you’ll know a Neural Network is a major time-waste for a problem like this. You feel my pain?
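If you want a feel for what “wide, short, and sparse” looks like, here’s a hypothetical sketch using SciPy. The shapes and density are made-up stand-ins, not the actual competition data:

```python
import scipy.sparse as sp

# more features (columns) than records (rows), and ~98% zeros
X = sp.random(3000, 4700, density=0.02, format="csr", random_state=0)
print(X.shape)                            # (3000, 4700)
print(X.nnz / (X.shape[0] * X.shape[1]))  # fraction of non-zeros: 0.02
```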
Poor Results
After spending hours building the MLP model, my beloved model did not even show signs that I was training it at all; the results on the test dataset were terrible (a Root Mean Squared Logarithmic Error of 8.321). I wasn’t expecting great results, but I wasn’t expecting this level of terribleness.
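For reference, this is what that metric computes; a standard definition of RMSLE, not code from my entry:

```python
import numpy as np

def rmsle(y_true, y_pred):
    # log1p(x) = log(1 + x), which keeps zero-valued targets well-behaved
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

print(rmsle(np.array([1.0, 5.0, 20.0]), np.array([2.0, 4.0, 15.0])))  # ~0.30
```

Since the error lives on a log scale, an RMSLE of 8.321 roughly means the predictions were off by orders of magnitude on average.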
And that, my friends, is how I spent my weekend training a baby that never learned anything.
As promised, I’ll publish another post next week covering the mathematics.