What is an Artificial Neural Network (ANN) in Deep Learning?

[Image: artificial neural network visualized as a digital brain]

Throughout our previous blog posts, we have explored classical machine learning algorithms like Linear Regression and Random Forests. Today, we cross the threshold into the most powerful sector of modern AI: Deep Learning.

If you have ever asked, "what is an artificial neural network in deep learning?", you are in the right place. Let's break down the biological inspiration, the mathematical architecture, and the Python code that powers self-driving cars and ChatGPT.

1. The Biological Inspiration: The Perceptron

The artificial neural network takes its name, quite literally, from biology. To build machines that could "think", researchers mapped the architecture of the human brain, which contains roughly 86 billion biological cells called neurons.

In machine learning, the digital equivalent of a single neuron is called a Perceptron. A perceptron receives numerical signals (inputs), multiplies them by a level of importance (weights), adds a baseline number (bias), and then passes the result through a mathematical filter to decide whether it should "fire" a signal forward, loosely mimicking how a biological neuron fires an electrical impulse.
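The steps above can be sketched in a few lines of NumPy. The inputs, weights, and bias below are made-up illustrative numbers, and a simple step function plays the role of the "filter":

```python
import numpy as np

def perceptron(inputs, weights, bias):
    # Weighted sum of the input signals plus the bias "baseline"
    z = np.dot(inputs, weights) + bias
    # Step filter: "fire" (output 1) only when the combined signal is positive
    return 1 if z > 0 else 0

# Two input signals with hand-picked weights
x = np.array([0.5, 0.8])
w = np.array([0.4, -0.2])
print(perceptron(x, w, bias=0.1))  # 0.5*0.4 + 0.8*(-0.2) + 0.1 = 0.14 > 0, so it fires: 1
```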

2. The Architecture: Layers of the Network

A single perceptron isn't very smart. But when you connect thousands of them together, you create a sprawling "Neural Network". A standard ANN architecture is divided into three distinct stages:

  • Input Layer: The very first wall of neurons. It receives the raw data (like the pixels of an image or columns in a spreadsheet) and passes it inside.
  • Hidden Layers: This is where the "Deep" in Deep Learning comes from! The hidden layers consist of multiple dense columns of neurons. They perform complex mathematical transformations to extract hidden features (like recognizing the curve of a cat's ear).
  • Output Layer: The final wall of neurons that spits out the final prediction (e.g., predicting 95% probability the image is a 'Cat').
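To make the three stages concrete, here is a rough NumPy sketch of one sample flowing through the network; the layer sizes (3 inputs, 8 hidden neurons, 1 output) are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=(1, 3))             # Input layer: one sample with 3 raw features
W1, b1 = rng.normal(size=(3, 8)), np.zeros(8)
h = np.maximum(0, x @ W1 + b1)          # Hidden layer: 8 neurons with ReLU
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
out = 1 / (1 + np.exp(-(h @ W2 + b2)))  # Output layer: a single probability
print(h.shape, out.shape)               # (1, 8) (1, 1)
```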

3. Activation Functions (The Brain's Mathematical Filters)

If a neural network only multiplied weights and added biases, it would essentially just be a giant, linear math equation. It could never solve complex, curvy, non-linear real-world problems. This is where activation functions like ReLU and Sigmoid step in.

  • ReLU (Rectified Linear Unit): The most popular activation function for hidden layers. If a neuron calculates a negative number, ReLU turns it to 0 (turns the neuron off). If it is positive, it lets the number pass through. It is incredibly fast and avoids the vanishing gradient problem.
  • Sigmoid: Primarily used in the final Output Layer for binary classification. It mathematically squishes any number into a value between 0 and 1, which can be read directly as a probability.
  • Softmax: Typically used in the final Output Layer for multi-class classification (e.g., predicting between Cat, Dog, or Bird); it guarantees that all output probabilities sum to 100%.
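Each of the three functions above is only a line or two of NumPy; this sketch shows exactly what they do to the numbers:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)        # negatives become 0, positives pass through

def sigmoid(z):
    return 1 / (1 + np.exp(-z))    # squashes any number into (0, 1)

def softmax(z):
    e = np.exp(z - np.max(z))      # subtract the max for numerical stability
    return e / e.sum()             # probabilities that sum to 1

print(relu(np.array([-2.0, 3.0])))         # [0. 3.]
print(softmax(np.array([1.0, 2.0, 3.0])))  # three probabilities summing to 1
```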

4. How the Network "Learns": Backpropagation

When you first create a network, the mathematical weights connecting the neurons are completely randomized. Its first few predictions will be hilariously wrong. How does it learn?

Through the legendary backpropagation algorithm (Backward Propagation of Errors). The network checks its prediction against the actual answer and measures how "wrong" it was using a Loss Function. It then travels backward through the entire network, fine-tuning every single weight using calculus (Gradient Descent) so that the prediction is slightly less wrong the next time. It repeats this forward-backward loop thousands or millions of times.
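Backpropagation through a deep network takes some bookkeeping, but the core forward-backward loop can be shown on a single weight. This toy example fits y = 2x with plain gradient descent on a mean-squared-error loss; the learning rate of 0.1 is an illustrative choice:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])  # the "answer key": y = 2x

w = 0.0                        # start from a naive weight (random in a real network)
for _ in range(100):
    pred = w * x                        # forward pass: make a prediction
    grad = 2 * np.mean((pred - y) * x)  # backward pass: gradient of the MSE loss w.r.t. w
    w -= 0.1 * grad                     # gradient descent: step to be slightly less wrong
print(round(w, 3))  # converges to 2.0
```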

5. Practical Python Implementation (TensorFlow / Keras)

In the modern era, you do not need to write the complex calculus from scratch. Here is standard Python code using TensorFlow and Keras to build a functional Artificial Neural Network that predicts customer churn (binary classification).

# 1. Import TensorFlow and Keras Sequential Model
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 2. Simulated Customer Data (Age, Account Balance, Credit Score) -> Churned (1 or 0)
X = np.array([[25, 4000, 600], [50, 85000, 800], [35, 200, 500], [45, 120000, 750], [22, 100, 550]])
y = np.array([1, 0, 1, 0, 1])

# 3. Train/Test Split and Feature Scaling (CRITICAL FOR NEURAL NETWORKS)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # reuse the training statistics; never fit the scaler on test data

# 4. Build the ANN Architecture
model = Sequential()

# Input Layer & First Hidden Layer (3 Input Features, 8 Neurons, ReLU Activation)
model.add(Dense(units=8, activation='relu', input_shape=(3,)))

# Second Hidden Layer (4 Neurons)
model.add(Dense(units=4, activation='relu'))

# Output Layer (1 Neuron, Sigmoid Activation for Binary Probability 0 to 1)
model.add(Dense(units=1, activation='sigmoid'))

# 5. Compile the Model (Attach the Optimizer and Cost Function)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# 6. Train the Neural Network via Backpropagation! (epochs = number of loops)
model.fit(X_train_scaled, y_train, batch_size=2, epochs=50)

# 7. Evaluate on the held-out test set (scaled with the training statistics)
loss, accuracy = model.evaluate(scaler.transform(X_test), y_test)

Conclusion

The dawn of deep learning has forever shifted the paradigm of artificial intelligence. While traditional machine learning tends to plateau no matter how much data you add, Artificial Neural Networks keep improving as datasets grow: the more data you feed them, the smarter they become. However, they demand massive amounts of GPU compute and very large, clean datasets to reach their full potential.

Frequently Asked Questions (FAQs)

What is an Epoch vs a Batch Size?
An Epoch occurs when the neural network has processed the entire dataset exactly once (one full forward and backward pass over every sample). The Batch Size is how many rows of data the network processes before pausing to update its weights. Updating in small batches keeps memory usage manageable on standard hardware.
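As a quick sanity check, you can compute how many weight updates a training run performs; the dataset size, batch size, and epoch count below are illustrative:

```python
import math

n_samples, batch_size, epochs = 1000, 32, 50

updates_per_epoch = math.ceil(n_samples / batch_size)  # number of batches in one full pass
total_updates = updates_per_epoch * epochs             # weight updates across all of training
print(updates_per_epoch, total_updates)                # 32 1600
```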
Why must I "Scale" data before feeding it to an ANN?
Neural networks update their weights using gradients computed via calculus. If one feature is 1,000,000 (Income) and another is 5 (Number of Kids), the enormous values dominate the gradient calculations and can trigger the "Exploding Gradient" problem, making it practically impossible for the network to learn. Standard scaling fixes this by transforming every feature to have a mean of 0 and a standard deviation of 1.
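Here is what standard scaling does to two wildly mismatched columns. The numbers are made up, and the arithmetic mirrors what sklearn's StandardScaler computes under the hood:

```python
import numpy as np

# Income and number-of-kids columns on totally different scales
X = np.array([[1_000_000.0, 5.0],
              [40_000.0, 2.0],
              [75_000.0, 0.0]])

# Standard scaling: subtract each column's mean, divide by its standard deviation
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_scaled.mean(axis=0))  # every feature is now centred at (roughly) 0
print(X_scaled.std(axis=0))   # with a standard deviation of 1
```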
What is the difference between ANN, CNN, and RNN?
ANN (Artificial Neural Network) uses standard dense layers and is suited to tabular, spreadsheet-style data. CNN (Convolutional Neural Network) uses sliding filters designed to process 2D images for Computer Vision. RNN (Recurrent Neural Network) contains internal memory loops designed to process sequential data like stock prices or sentences of text.