Machine Learning
(Ref. Huda Nasser, Julia Academy - Data Science)
In this chapter, we will explore the fundamentals of machine learning by working with the MNIST dataset — a classic benchmark in computer vision. The MNIST dataset consists of 70,000 handwritten digits (0 through 9), split into 60,000 training images and 10,000 testing images. Each image is a grayscale 28×28 pixel image, making it ideal for experimenting with classification models.
The Julia language offers powerful packages, including Flux.jl (for building neural networks), MLDatasets.jl (for accessing standard datasets) and OneHotArrays.jl (for one-hot encoding of target batches). Throughout the exercise we will use a set of tools (Images.jl, ImageInTerminal.jl, Plots.jl) to make visual checks along the way.
The exercise in this chapter will guide you through the following steps:
- Load and visualize the MNIST dataset;
- Preprocess the data for model training;
- Build and train a simple machine learning model (here a neural network);
- Evaluate the model’s performance on unseen data.
MNIST dataset
The MNIST dataset can be retrieved from the MLDatasets.jl package. Start by loading the training dataset.
using MLDatasets
d_train = MNIST(split=:train)
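You can quickly confirm the dimensions mentioned above:
size(d_train.features)   # (28, 28, 60000): 60,000 images of 28×28 pixels
size(d_train.targets)    # (60000,): one label per image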
What does this dataset actually look like? You can check this by typing the following commands.
using Images
using ImageInTerminal
colorview(Gray,d_train.features[:,:,1])
At times it is unclear which digit is written on the image. To clarify this, have a look at the label associated with the image.
d_train.targets[1]
Neural network
A neural network is a type of machine learning model inspired by the structure and function of the human brain. It is composed of layers of interconnected nodes called neurons, which work together to process data, recognize patterns, and make predictions.
At its core, a neural network learns to approximate complex functions by adjusting the weights and biases of these connections based on the data it sees.
A standard neural network for the MNIST dataset to start off with has the following structure:
Input (784) ⟶ Dense (32) ⟶ ReLU ⟶ Dense (10) ⟶ Softmax ⟶ Output (Digit 0–9)
The 28-by-28 grayscale images are flattened into a 784-element vector. No activation function is applied at this stage; the input is just passed to the next layer. The input data is fed to a 32-neuron hidden layer, which computes a weighted sum of the inputs, adds a bias, and passes the result through an activation function, here ReLU (Rectified Linear Unit), to introduce non-linearity. The output layer has 10 neurons, consistent with the 10 possible targets (0 through 9). Since this is a classification task on the handwritten digit, we use a Softmax activation function to convert the outputs into probabilities that sum to 1.
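To make this data flow concrete, here is a minimal sketch of the same computations with plain arrays (illustrative only: random weights stand in for the trained parameters).
x = rand(Float32, 28, 28)                            # one grayscale image
v = vec(x)                                           # flatten: 784-element vector
W1, b1 = rand(Float32, 32, 784), zeros(Float32, 32)
h = max.(W1 * v .+ b1, 0f0)                          # Dense(784 => 32) followed by ReLU
W2, b2 = rand(Float32, 10, 32), zeros(Float32, 10)
z = W2 * h .+ b2                                     # Dense(32 => 10)
p = exp.(z) ./ sum(exp.(z))                          # Softmax: probabilities summing to 1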
Preprocess the dataset
The neural network we will be using in this exercise requires as input a 1D vector of length 784. Start by flattening the matrices representing the images of our dataset using Flux.jl.
using Flux
v_train = Flux.flatten(d_train.features)
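The result is a 784×60000 matrix with one column per image:
size(v_train)   # (784, 60000)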
You should now use OneHotArrays.jl to transform the target array to vectors of 10 elements, with 1 at the index of the target digit.
using OneHotArrays
Y = onehotbatch(d_train.targets, 0:9)
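For example, the digit 3 becomes a length-10 column with a single 1 at the position corresponding to 3 in the range 0:9:
onehotbatch([3, 5], 0:9)   # 10×2 one-hot matrix, one column per digit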
Set-up the neural network
The lines of code below are simply a translation of the neural network schematic into Julia.
m = Chain(
    Dense(28*28, 32, relu),
    Dense(32, 10),
    softmax
)
What happens if we apply this neural network to one of the images?
m(v_train[:,1])
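The output is a vector of 10 probabilities, one per digit. Since softmax normalizes the outputs, they should sum to 1 (up to floating-point error):
sum(m(v_train[:,1]))   # ≈ 1.0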
Training
You can start by having a look at the training function within Flux.jl in the following way.
#| output: false
? Flux.train!
Take care, when changing package versions, to have a look at the major changes. For instance, from version 0.14 of Flux.jl on, the syntax of Flux.train! changed: it went from Flux.train!(loss, params(model), data, opt) to Flux.train!(loss, model, data, opt_state), where opt_state comes from the new setup mechanism of Optimisers.jl.
When a neural network makes predictions (like classifying an image as a “3” instead of a “7”), we need a way to measure the difference between the predicted output and the actual (true) target.
The loss function provides this measure. It returns a numerical value that represents the “error” — the higher the value, the worse the prediction. Since we have a classification problem in this exercise, a typical loss choice is the cross-entropy loss.
loss(m, x, y) = Flux.Losses.crossentropy(m(x), y)
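As a sanity check before training, you can evaluate the loss on the whole training set. For an untrained 10-class model the predictions are close to uniform, so the value should be near -log(1/10) ≈ 2.3:
loss(m, v_train, Y)   # roughly 2.3 before training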
To train the neural network properly, we wish to minimize the loss function. To do so, we will be using Adam, a variant of gradient descent.
optimizer = Flux.setup(Adam(), m)
When training a neural network, we often need to go over the training data multiple times. Each full pass over the training data is called an epoch.
using IterTools: ncycle
dataset = ncycle([(v_train, Y)], 200)
The dataset constructed in the cell above tells Flux.train! to train for 200 epochs: the network will see the training data 200 times.
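You can verify that the iterator indeed yields 200 copies of the training pair:
length(collect(dataset))   # 200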
Let’s train the neural network now!
Flux.train!(loss, m, dataset, optimizer)
So, does it work better than previously on our first image?
tst = m(v_train[:,1])
cls = argmax(tst) - 1
tgt = d_train.targets[1]
println("Image classified as ", cls, " with target ", tgt, ".")
Let us now have a look under the hood of Flux.train!. What is happening in the training loop?
- Take a subset of the input data with associated targets: a batch;
- Determine whether the model m predicts the targets well: use the loss function;
- Find out in which direction each model parameter should move: compute the gradient of the loss with respect to each parameter;
- Adjust the parameters using the gradients and an optimizer.
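These steps can be written out by hand. The sketch below is a minimal manual version of such a loop (not the actual implementation of Flux.train!), assuming loss, m and optimizer as defined above:
for (x, y) in ncycle([(v_train, Y)], 200)
    grads = Flux.gradient(model -> loss(model, x, y), m)   # gradients w.r.t. all parameters
    Flux.update!(optimizer, m, grads[1])                   # one optimizer step
end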
Testing
We can now evaluate our trained neural network on unseen data, the so-called test dataset.
d_test = MNIST(split=:test)
for i in 1:10
    b = d_test.features[:,:,i]
    v_b = reshape(b, 784)
    a = m(v_b)
    r = argmax(a) - 1
    println("Image classified as ", r, " with target ", d_test.targets[i], ".")
end
The results seem pretty good at first glance.
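To quantify this beyond the first ten images, one can compute the accuracy over the full test set. A minimal sketch, using onecold from OneHotArrays to map each output column back to a digit:
v_test = Flux.flatten(d_test.features)
preds = onecold(m(v_test), 0:9)                      # predicted digit per image
accuracy = sum(preds .== d_test.targets) / length(d_test.targets)
println("Test accuracy: ", round(100 * accuracy; digits=2), "%")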