Artificial Neural Networks (ANNs) can be complex, but frameworks and libraries have made them much easier to implement over the past few years. In this post, we’ll walk through the process of creating a basic ANN. We’ll be using Python, TensorFlow, and Keras to create an ANN for recognizing handwritten digits. This is kind of the “Hello World” of AI.
In preparation, and for better understanding, it would be good to watch 3Blue1Brown’s videos “But what is a neural network?” and “Gradient descent, how neural networks learn”, as we will be implementing a neural network as he described in those videos.
Python is a high-level, interpreted programming language known for its readability, simplicity, and flexibility. Kinda like BASIC. It is one of the most popular programming languages for artificial intelligence and machine learning due to its versatility, ease of use, and strong ecosystem of libraries that streamline complex AI tasks.
TensorFlow is an open-source machine learning (ML) framework developed by Google that allows you to build, train, and deploy ML models, particularly deep learning models. It provides a wide range of tools, libraries, and community resources to simplify the process of working with ML algorithms.
Keras is an open-source, high-level neural networks API written in Python. It is designed to simplify the process of building and training deep learning models. Originally, Keras was designed to run on top of various lower-level deep learning frameworks. However, it is now integrated into TensorFlow as its high-level API.
Installation
To install Python, go to the Python Downloads page, download the installer for your operating system, and follow the instructions. Side note: I had problems with the latest version, so I downloaded Python 3.12.7 instead. I'm not sure if it was because I was using macOS 15 or because of TensorFlow.
Once Python is installed, we can install TensorFlow.
pip3 install tensorflow
Creating The Model
Fire up your favorite editor and start by importing TensorFlow.
# Import necessary libraries and modules
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Flatten, Dense
from tensorflow.keras.utils import to_categorical
Next, we define the ANN model.
# Build a neural network model
model = Sequential([
    Input(shape=(28, 28)),            # Defines input shape (28, 28)
    Flatten(),                        # Converts the 2D array into 1D array
    Dense(16, activation='sigmoid'),  # First hidden layer with 16 neurons, sigmoid activation
    Dense(16, activation='sigmoid'),  # Second hidden layer with 16 neurons, sigmoid activation
    Dense(10, activation='sigmoid')   # Output layer with 10 neurons (for 10 classes)
])
I attempted to define the model as described in 3Blue1Brown’s video: It is a simple sequential network with an input layer that accepts 28×28 images, a flatten layer that converts each image into a 1D array of 784 values, two 16-neuron hidden layers with sigmoid activation functions, and a 10-neuron output layer, also with a sigmoid activation function.
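To get a feel for how small this network is, here is a rough sketch in plain Python (no TensorFlow needed) that counts its trainable parameters. It assumes each dense layer has one weight per input-output connection plus one bias per neuron; the helper name `count_parameters` is just made up for this illustration.

```python
# Layer sizes taken from the model above: 784 inputs -> 16 -> 16 -> 10
layer_sizes = [28 * 28, 16, 16, 10]


def count_parameters(sizes):
    """Count weights and biases for a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = n_in * n_out  # one weight per input-output connection
        biases = n_out          # one bias per neuron in the layer
        total += weights + biases
    return total


total_params = count_parameters(layer_sizes)
print(total_params)  # 13002
```

Calling model.summary() on the Keras model should report the same total if the assumption above holds.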
Next, we compile the model.
# Compile the model with optimizer, loss function, and evaluation metric
model.compile(optimizer='sgd',      # Stochastic Gradient Descent as the optimization function
              loss='mse',           # Mean Squared Error as the cost function
              metrics=['accuracy'])
Again, I attempted to follow 3Blue1Brown’s video: the model uses stochastic gradient descent as the optimization function and mean squared error as the cost (or loss) function.
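To make the cost function a bit more concrete, here is a minimal plain-Python sketch of mean squared error for a single image. The `prediction` values are invented for illustration and are not real network output. Note that this version averages over the 10 outputs, which is how I understand Keras computes 'mse', whereas the video adds up the squared differences without dividing.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between target and prediction."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)


# Target for the digit 3, and a hypothetical (made-up) network output
target = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
prediction = [0.1, 0.0, 0.0, 0.8, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0]

loss = mean_squared_error(target, prediction)
print(round(loss, 4))  # 0.006
```

The closer the output is to the target, the smaller this number gets, and training is the process of nudging the weights to push it down.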
Training The Model
Next, we load the MNIST dataset, a well-known collection of 28×28 handwritten digit images. It has 70,000 labeled images (60,000 for training and 10,000 for testing). Conveniently, Keras includes the dataset, so we can load it directly.
# Load the MNIST dataset (handwritten digits)
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
Next, we normalize the data.
# Normalize the data to a range of 0 to 1 by dividing by 255
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
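As a quick illustration of what the division does, here is a plain-Python sketch applied to a few made-up pixel values. Each MNIST pixel is an 8-bit grayscale value from 0 to 255, so dividing by 255 maps it into the range 0 to 1. The `normalize` helper and the sample `row` are hypothetical, just for this example.

```python
def normalize(pixels):
    """Scale 8-bit grayscale values (0 to 255) into the range 0.0 to 1.0."""
    return [p / 255.0 for p in pixels]


row = [0, 51, 255]  # hypothetical pixel values from one image row
scaled = normalize(row)
print(scaled)  # [0.0, 0.2, 1.0]
```

Keeping the inputs small like this generally helps with sigmoid activations, which flatten out for large input values.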
Next, we convert the labels to one-hot encoding.
# Convert the labels to one-hot encoding for multi-class classification
y_train = to_categorical(y_train, 10) # One-hot encode the training labels (10 classes)
y_test = to_categorical(y_test, 10) # One-hot encode the test labels (10 classes)
One-hot encoding is a technique for converting categorical data into a numerical format that machine learning models can use. Each category is represented as a binary vector with all elements set to 0 except for the element corresponding to the category, which is set to 1. For example:
0 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
1 = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
:
9 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
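The same idea can be sketched in a few lines of plain Python. This is only an illustration of the technique, not how to_categorical is implemented (as far as I know, it returns NumPy float arrays rather than lists of ints). The helper names `one_hot` and `decode` are made up for this example.

```python
def one_hot(label, num_classes=10):
    """Return a list of zeros with a single 1 at index `label`."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
    vector = [0] * num_classes
    vector[label] = 1
    return vector


def decode(vector):
    """Recover the label as the index of the largest element."""
    return max(range(len(vector)), key=lambda i: vector[i])


encoded = one_hot(3)
print(encoded)          # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(decode(encoded))  # 3
```

Decoding by picking the largest element is also how a digit can be read back from the network's 10 outputs.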
Then, we train the model for 500 epochs. Let’s just leave it at that for now.
# Train the model on the training data for 500 epochs
model.fit(x_train, y_train, epochs=500)
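For a sense of scale, here is a back-of-the-envelope sketch of how many weight updates that training call performs. It assumes the Keras default batch size of 32, since the call above does not pass a batch_size argument; with a different batch size the numbers change.

```python
import math

num_samples = 60_000  # MNIST training images
batch_size = 32       # assumed: the Keras default when fit() gets no batch_size
epochs = 500          # as in the training call above

# Each epoch walks through the whole training set once, one batch at a time
steps_per_epoch = math.ceil(num_samples / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 1875
print(total_updates)    # 937500
```

So one epoch is one full pass over the training images, and every batch within it triggers one gradient descent step.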
Then, we evaluate the result of the training.
# Evaluate the model on the test data
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print('Test loss:', test_loss)
print('Test accuracy:', test_acc)
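Roughly speaking, the accuracy reported here is the fraction of test images where the strongest output neuron matches the label. Below is a minimal plain-Python sketch of that calculation, using three classes and made-up outputs to keep it short. I am assuming Keras resolves 'accuracy' to this categorical form for one-hot labels; the function names are hypothetical.

```python
def argmax(values):
    """Index of the largest element in a list."""
    return max(range(len(values)), key=lambda i: values[i])


def categorical_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the one-hot label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if argmax(t) == argmax(p))
    return correct / len(y_true)


# Hypothetical one-hot labels and network outputs for four samples
labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
outputs = [[0.9, 0.2, 0.1], [0.3, 0.6, 0.2], [0.5, 0.1, 0.4], [0.1, 0.7, 0.3]]

accuracy = categorical_accuracy(labels, outputs)
print(accuracy)  # 0.75
```

In this toy data the third sample is misclassified (the label is class 2 but the strongest output is class 0), so three out of four are correct.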
Finally, we save the model.
# Save the trained model
try:
    model.save('model.keras')
    print("Model training complete and saved")
except Exception as e:
    print(f"Error saving model: {e}")
Save the file and run it. The file is available on the GitHub repo.
python3 create_basic_model.py
After a few minutes of training, the result of the evaluation is:
Test loss: 0.02075177989900112
Test accuracy: 0.89410001039505
And that’s it. We have created and trained a basic ANN. In the next post, we will use it for recognition.