NGN Tech Hub - Leading Innovation and Sustainability in Software Development, Learning & Personal Development | Experts in Transforming Ideas into Digital Solutions.

Understanding the fundamentals of Convolutional Neural Networks

January 10, 2025

We sometimes use affiliate links in our content. This means that if you click on a link and make a purchase, we may receive a small commission at no extra cost to you. This helps us keep creating valuable content for you!

In our previous blog, we provided an overview of the principles of AI and their applications. Now, let’s dive deeper into one of the most powerful techniques in AI: Convolutional Neural Networks (CNNs). In this article, we will explore the fundamentals of CNNs, their architecture, and a basic implementation in Python. But first, let’s briefly revisit Artificial Neural Networks (ANNs) to set the stage for understanding CNNs.

Prerequisites

What are Artificial Neural Networks?

ANNs are a set of algorithms, loosely inspired by the human brain, that are designed to recognize patterns. They are made up of layers of interconnected nodes, called neurons, that process data. They interpret sensory data through a kind of machine perception, labeling or clustering raw input. The patterns they recognize are numerical and contained in vectors, into which all real-world data, be it images, sound, text, or time series, must be translated.
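
To make this concrete, the minimal pure-Python sketch below shows what a single artificial neuron computes: a weighted sum of numeric inputs plus a bias, passed through an activation function (sigmoid here). It also shows a tiny 2x2 "image" being translated into a vector first. The function name neuron and all the numbers are illustrative assumptions, not part of any library.

```python
import math
from typing import Sequence


def neuron(inputs: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """A single artificial neuron: sigmoid(w . x + b)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Weighted sum of the inputs plus the bias term
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Sigmoid activation, written in a numerically stable form for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# A 2x2 grayscale "image" must first be translated into a numeric vector
pixels = [[0.0, 1.0], [1.0, 0.0]]
x = [p for row in pixels for p in row]  # [0.0, 1.0, 1.0, 0.0]
print(neuron(x, weights=[0.5, -0.5, -0.5, 0.5], bias=1.0))  # 0.5, since w . x + b = 0
```

A real network stacks many such neurons into layers and learns the weights and biases from data instead of setting them by hand.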

What are Convolutional Neural Networks?

Convolutional Neural Networks (CNNs) are a specialized type of deep artificial neural network designed primarily for processing and analyzing data with a grid-like topology, such as images and videos. CNNs also work well for some non-image data, especially in NLP tasks such as text classification. They are a cornerstone of deep learning: they have transformed computer vision and are also widely used in natural language processing and audio recognition.

Key Concepts and Components of CNNs

CNNs are built on principles inspired by the human visual system, particularly in how the brain processes visual information through hierarchical patterns of increasingly complex features. Below are the essential components that define CNNs:

Convolution Operation

The convolution operation is the heart of a CNN. It involves sliding a filter (also called a kernel) over the input data to extract features such as edges, corners, and textures.

Filters/Kernels: Small matrices with learnable parameters that capture specific patterns in the data. They measure how closely a patch or region of the input matches a feature.
Feature Maps: The output of the convolution operation, representing the filtered features.
Stride: Determines how many positions (pixels) the filter moves at each step. A smaller stride is preferable when we expect fine-grained features to matter in the output, while a larger stride suits cases where we are only interested in macro-level features. Larger strides reduce the spatial dimensions of the feature map.
Padding: Adds a border of values, usually zeros, around the input. With a suitable padding size (for example, 1 for a 3x3 kernel with stride 1), the feature map keeps the same spatial dimensions as the input.
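
The combined effect of kernel size, stride, and padding on the feature map can be worked out with the standard output-size formula for one spatial dimension. Here W is the input size, K the kernel size, P the padding added on each side, and S the stride; these symbols are introduced for illustration:

$$
O = \left\lfloor \frac{W - K + 2P}{S} \right\rfloor + 1
$$

For a 28x28 MNIST image and a 3x3 kernel:

$$
\begin{aligned}
P = 1,\ S = 1:&\quad O = \left\lfloor \tfrac{28 - 3 + 2}{1} \right\rfloor + 1 = 28 \\
P = 0,\ S = 2:&\quad O = \left\lfloor \tfrac{28 - 3 + 0}{2} \right\rfloor + 1 = 12 + 1 = 13
\end{aligned}
$$

With padding 1 and stride 1 the spatial size is preserved, while a stride of 2 without padding roughly halves it.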

Mathematical Representation:

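For a 2D convolution (with stride 1 and no padding), if X is the input matrix and K is the kernel, the value of the output feature map S at position (i, j) is:

$$
S(i, j) = (X * K)(i, j) = \sum_{m} \sum_{n} X(i + m, j + n)\, K(m, n)
$$

where m and n run over the rows and columns of the kernel. Strictly speaking, this is cross-correlation (the kernel is not flipped), which is what deep learning libraries such as PyTorch compute in their convolution layers; because the kernel weights are learned, the distinction does not matter in practice.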
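
A direct, unoptimized translation of the convolution operation into pure Python (single channel, no batching) might look like the sketch below. The function name conv2d and its stride and padding parameters are illustrative assumptions; like deep learning libraries, it does not flip the kernel.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(x: Matrix, k: Matrix, stride: int = 1, padding: int = 0) -> Matrix:
    """Single-channel 2D convolution (cross-correlation) with zero padding."""
    h, w = len(x), len(x[0])
    # Zero-pad the input on all four sides
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for i in range(h):
        for j in range(w):
            padded[i + padding][j + padding] = x[i][j]
    kh, kw = len(k), len(k[0])
    out_h = (h + 2 * padding - kh) // stride + 1
    out_w = (w + 2 * padding - kw) // stride + 1
    out: Matrix = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # out[i][j] = sum over m, n of padded[i*stride + m][j*stride + n] * k[m][n]
            acc = 0.0
            for m in range(kh):
                for n in range(kw):
                    acc += padded[i * stride + m][j * stride + n] * k[m][n]
            row.append(acc)
        out.append(row)
    return out


# A vertical edge: bright left half, dark right half
image = [[1, 1, 0, 0] for _ in range(4)]
edge_kernel = [[1, -1], [1, -1]]
print(conv2d(image, edge_kernel))  # [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```

The filter responds only where the bright-to-dark edge sits, which is exactly the "how closely does this patch match the feature" behavior described above. Passing stride=2 or padding=1 changes the output size as the stride and padding descriptions predict.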

Pooling Layers

Pooling layers reduce the spatial dimensions of feature maps, making the network computationally efficient and robust to small translations in the input.

Max Pooling: Extracts the maximum value from a region of the feature map.
Average Pooling: Computes the average value of a region.

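Pooling layers have no learnable parameters of their own, but by shrinking the feature maps they reduce the number of parameters needed in the fully connected layers that follow, which can help reduce overfitting.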
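
Max and average pooling can be sketched in a few lines of pure Python. This assumes a single-channel feature map, no padding, and the common 2x2 window with stride 2; the function name pool2d and its mode parameter are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def pool2d(x: Matrix, size: int = 2, stride: int = 2, mode: str = "max") -> Matrix:
    """Max or average pooling over a single-channel feature map (no padding)."""
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    out_h = (len(x) - size) // stride + 1
    out_w = (len(x[0]) - size) // stride + 1
    out: Matrix = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Collect the size x size window whose top-left corner is (i*stride, j*stride)
            window = [x[i * stride + m][j * stride + n]
                      for m in range(size) for n in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out


fmap = [[1, 3, 2, 1],
        [4, 6, 5, 0],
        [7, 2, 1, 8],
        [3, 0, 9, 4]]
print(pool2d(fmap))              # [[6, 5], [7, 9]]
print(pool2d(fmap, mode="avg"))  # [[3.5, 2.0], [3.0, 5.5]]
```

Each output keeps one value per 2x2 region, so a 4x4 map becomes 2x2 without any learnable weights.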

Activation Functions

Non-linear activation functions are applied to introduce non-linearity, enabling CNNs to learn complex patterns.

ReLU (Rectified Linear Unit): Replaces negative values with zero, defined as:
f(x) = max(0, x)
Other common activations: Sigmoid, Tanh, and Leaky ReLU.
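
The activation functions listed above are simple element-wise formulas. The pure-Python sketch below writes them out for scalar inputs; the default negative_slope of 0.01 matches PyTorch's nn.LeakyReLU default, and the function names are only illustrative.

```python
import math


def relu(x: float) -> float:
    """ReLU: f(x) = max(0, x)."""
    return max(0.0, x)


def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    """Leaky ReLU: keeps a small slope for negative inputs instead of zeroing them."""
    return x if x >= 0 else negative_slope * x


def sigmoid(x: float) -> float:
    """Sigmoid: squashes any real number into (0, 1); stable form avoids overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def tanh(x: float) -> float:
    """Tanh: squashes any real number into (-1, 1)."""
    return math.tanh(x)


for v in (-2.0, 0.0, 3.0):
    print(v, relu(v), leaky_relu(v), round(sigmoid(v), 4), round(tanh(v), 4))
```

In a CNN, these functions are applied to every element of a feature map, as F.relu does in the implementation later in this article.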

Fully Connected Layers

After extracting features using convolution and pooling layers, the CNN flattens the feature maps into a single vector and feeds it into fully connected layers for classification or regression tasks.

These layers connect every neuron in one layer to every neuron in the next, making decisions based on the learned features.
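
The flatten-then-classify step can be sketched in pure Python: a (channels, height, width) feature map is flattened channel by channel, then row by row (the same order that x.view(x.size(0), -1) produces for each sample in the PyTorch code later), and a fully connected layer computes y = Wx + b. The helper names and the toy weights below are illustrative assumptions.

```python
from typing import List, Sequence


def flatten(feature_maps: List[List[List[float]]]) -> List[float]:
    """Flatten a (channels, height, width) nested list into one vector."""
    return [v for channel in feature_maps for row in channel for v in row]


def linear(x: Sequence[float], weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Fully connected layer: y = W x + b; each row of W feeds one output neuron."""
    if len(weights) != len(bias) or any(len(row) != len(x) for row in weights):
        raise ValueError("weights must have shape (len(bias), len(x))")
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


# Two 2x2 feature maps (toy values) -> a vector of length 2 * 2 * 2 = 8
maps = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
vec = flatten(maps)
W = [[1.0] * 8, [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0]]
b = [0.0, 0.5]
print(vec, linear(vec, W, b))
```

Every output depends on every input value, which is why fully connected layers hold most of the parameters in small CNNs.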

Dropout Layers

Dropout layers randomly deactivate a fraction of neurons during training to prevent overfitting and enhance generalization.
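
PyTorch's nn.Dropout implements so-called inverted dropout: during training, each activation is zeroed with probability p and the survivors are scaled by 1/(1 - p), so the expected activation stays the same; at evaluation time (after model.eval()) dropout does nothing. The pure-Python sketch below illustrates that behavior; the function name and the explicit training flag are illustrative assumptions.

```python
import random
from typing import List, Optional


def dropout(x: List[float], p: float = 0.5, training: bool = True,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout: zero each value with probability p, scale survivors by 1/(1-p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        # At inference time dropout is a no-op
        return list(x)
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in x]


activations = [1.0] * 8
print(dropout(activations, p=0.5, rng=random.Random(0)))  # each entry is 0.0 or 2.0
print(dropout(activations, p=0.5, training=False))         # unchanged
```

Because a different random subset of neurons is dropped on every batch, the network cannot rely on any single neuron, which is what improves generalization.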

How Does a CNN Work?

To understand how a CNN operates, let’s break down the pipeline of a typical CNN used for image classification:

Input Layer: Receives raw pixel data from the input image (e.g., 224x224x3 for a color image).
Convolutional Layers: Extract features such as edges and textures using filters.
Pooling Layers: Downsample the feature maps to reduce complexity.
Fully Connected Layers: Combine extracted features and classify them into predefined categories.
Output Layer: Outputs the probabilities for each class using functions like Softmax.
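
Softmax turns the final layer's raw scores (logits) into class probabilities. A minimal pure-Python sketch follows; the function name is illustrative. Note that the PyTorch implementation later in this article returns raw logits, because nn.CrossEntropyLoss combines log-softmax and negative log-likelihood internally, so softmax is only needed explicitly when you want probabilities.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores (logits) into probabilities that sum to 1."""
    # Subtracting the max logit avoids overflow in exp() without changing the result
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```

The predicted class is the one with the highest probability, which is the same as the one with the highest logit.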

Advantages of CNNs

Spatial Hierarchy: CNNs capture spatial dependencies by processing small regions at a time, enabling efficient feature extraction.
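Parameter Sharing: The same filter weights are applied at every position of the input, significantly reducing the number of learnable parameters compared with fully connected layers.
Translation Invariance: A filter detects its feature wherever it appears in the input, and pooling layers add robustness to small shifts and distortions.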
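
To see how much parameter sharing saves, the sketch below counts learnable parameters. It assumes square kernels and one bias per output channel or neuron (the default for PyTorch's nn.Conv2d and nn.Linear); the helper names are hypothetical. The first two calls match conv1 and conv2 in the implementation later in this article, and the third counts a fully connected layer that would map a 28x28 image to an output the same size as conv1's (32x28x28).

```python
def conv2d_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """One kernel_size x kernel_size x in_channels filter plus one bias per output channel."""
    return out_channels * in_channels * kernel_size * kernel_size + out_channels


def linear_params(in_features: int, out_features: int) -> int:
    """A full weight matrix plus one bias per output neuron."""
    return in_features * out_features + out_features


print(conv2d_params(1, 32, 3))               # 320
print(conv2d_params(32, 64, 3))              # 18496
print(linear_params(28 * 28, 32 * 28 * 28))  # 19694080
```

The convolution reuses its 320 parameters at every pixel position, while the equivalent fully connected layer needs almost 20 million.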

Applications of CNNs

CNNs have widespread applications across various industries:

Computer Vision
- Image Classification: Recognizing objects in images (e.g., classifying cats and dogs).
- Object Detection: Identifying and localizing multiple objects in an image (e.g., YOLO, SSD).
- Image Segmentation: Dividing an image into regions or objects (e.g., U-Net).
Healthcare
- Medical Imaging: Detecting abnormalities in X-rays, MRIs, and CT scans.
- Cancer Diagnosis: Analyzing histopathological images for early detection.
Natural Language Processing
- Sentiment analysis using 1D convolutions on text data.
- Sentence classification and language modeling.
Autonomous Vehicles
- Recognizing pedestrians, traffic signs, and lane boundaries using CNN-based models like MobileNet and ResNet.
Facial Recognition
- Powering systems for security and authentication (e.g., FaceNet).
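
For the NLP use case above, a 1D convolution slides a filter along the sequence of word embeddings, covering a few consecutive tokens at a time, and max-over-time pooling keeps the strongest response. The pure-Python sketch below illustrates that common design for text classifiers; the embedding values, filter weights, and function name are made up for illustration.

```python
from typing import List, Sequence


def conv1d_text(embeddings: Sequence[Sequence[float]],
                kernel: Sequence[Sequence[float]], bias: float = 0.0) -> List[float]:
    """Slide a filter spanning len(kernel) consecutive tokens along a sentence.

    embeddings: one vector per token (seq_len x embed_dim)
    kernel:     one weight vector per covered token (width x embed_dim)
    """
    width = len(kernel)
    out = []
    for t in range(len(embeddings) - width + 1):
        # Dot product between the filter and the window of tokens t .. t+width-1
        acc = bias
        for m in range(width):
            acc += sum(e * w for e, w in zip(embeddings[t + m], kernel[m]))
        out.append(acc)
    return out


# Toy 2-dimensional embeddings for a 4-token sentence (made-up values)
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]
features = conv1d_text(sentence, bigram_filter)
print(features, max(features))  # max-over-time pooling keeps the strongest response
```

A real text classifier uses many such filters of different widths and feeds the pooled values to a fully connected layer.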

Challenges of CNNs

High Computational Cost: Training deep CNNs requires significant computational resources.
Data Dependency: CNNs trained from scratch typically require large labeled datasets for effective training.
Overfitting: Small datasets can lead to models that do not generalize well.
Interpretability: Understanding why CNNs make specific predictions can be challenging.

Future of CNNs

With advancements in hardware and software, CNNs are becoming more efficient and powerful. Emerging trends include:

Hybrid Architectures: Combining CNNs with transformers for better spatial and temporal feature learning.
Automated CNN Design: Using Neural Architecture Search (NAS) to automate the creation of CNNs.
Edge AI: Deploying lightweight CNNs on mobile and IoT devices for real-time inference.

Basic CNN Implementation with Python

This section provides a Python implementation of a basic Convolutional Neural Network (CNN) using PyTorch, one of the most popular deep learning frameworks. This implementation is for an image classification task, such as recognizing digits from the MNIST dataset.

Dependencies

Make sure the dependencies imported below are installed (for example, with pip install torch torchvision) before running the code.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

Initialization of CNN and forward function

The __init__ method defines the layers of the CNN model, while the forward method defines the forward pass: how the input tensor flows through the network layers to produce the output logits. The comments in the code explain each step.

class CNN(nn.Module):
    def __init__(self, num_classes=10):
        """
        Initialize the CNN model.
        """
        super(CNN, self).__init__()
        # Convolutional layers
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, stride=1, padding=1)  # Output: 32x28x28
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1)  # Output: 64x28x28
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)  # Reduces spatial dimensions by half (e.g., 28x28 -> 14x14)

        # Fully connected layers
        self.fc1 = nn.Linear(64 * 14 * 14, 128)  # Flattened size: 64x14x14
        self.fc2 = nn.Linear(128, num_classes)

        # Dropout layer for regularization
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        """
        Forward pass of the CNN.
        Args:
            x (torch.Tensor): Input tensor of shape (batch_size, channels, height, width).
        Returns:
            torch.Tensor: Output logits.
        """
        # Convolutional layers with ReLU and pooling
        x = F.relu(self.conv1(x))
        x = self.pool(F.relu(self.conv2(x)))

        # Flatten the feature maps
        x = x.view(x.size(0), -1)  # Reshape to (batch_size, flattened_features)

        # Fully connected layers
        x = F.relu(self.fc1(x))
        x = self.dropout(x)  # Apply dropout
        x = self.fc2(x)  # Output layer

        return x
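
The flattened size used by fc1 (64 * 14 * 14) follows from how each layer changes the tensor shape. The pure-Python sketch below traces those shapes for one grayscale input using the output-size formula floor((size - kernel + 2 * padding) / stride) + 1. The helper names conv_out and trace_shapes are hypothetical, and the sketch does not require PyTorch.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output size along one spatial dimension of a convolution or pooling layer."""
    return (size - kernel + 2 * padding) // stride + 1


def trace_shapes(height: int = 28, width: int = 28) -> dict:
    """Trace (channels, height, width) through the CNN class above for one image."""
    shapes = {"input": (1, height, width)}
    # conv1: 1 -> 32 channels, 3x3 kernel, stride 1, padding 1
    h, w = conv_out(height, 3, 1, 1), conv_out(width, 3, 1, 1)
    shapes["conv1"] = (32, h, w)
    # conv2: 32 -> 64 channels, same kernel settings
    h, w = conv_out(h, 3, 1, 1), conv_out(w, 3, 1, 1)
    shapes["conv2"] = (64, h, w)
    # MaxPool2d(kernel_size=2, stride=2) halves each spatial dimension
    h, w = conv_out(h, 2, 2), conv_out(w, 2, 2)
    shapes["pool"] = (64, h, w)
    # Flattening multiplies channels, height, and width together
    shapes["flatten"] = (64 * h * w,)
    return shapes


for layer, shape in trace_shapes().items():
    print(layer, shape)
```

For MNIST the trace ends at (12544,), which equals 64 * 14 * 14, the in_features of fc1. For a different input size, such as 32x32, the flattened size changes (trace_shapes(32, 32) gives 16384), so fc1 would need to change too.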

Training and testing: To train the CNN model, you need to follow these steps:

Set up the environment: Ensure you have the necessary libraries installed.
Load and preprocess the data: Use the MNIST dataset for training and testing.
Define the model: Use the CNN class provided.
Set up the training loop: Train the model using the training data.
Evaluate the model: Test the model using the test data.

# Training and Testing Functions
def train(model, device, train_loader, optimizer, criterion, epochs=5):
    """
    Train the CNN model.
    Args:
        model: The CNN model.
        device: The device to run on (CPU or GPU).
        train_loader: DataLoader for training data.
        optimizer: Optimizer for updating model parameters.
        criterion: Loss function.
        epochs (int): Number of epochs to train.
    """
    model.train()
    for epoch in range(epochs):
        running_loss = 0.0
        for batch_idx, (data, target) in enumerate(train_loader):
            data, target = data.to(device), target.to(device)
            # Zero the parameter gradients
            optimizer.zero_grad()
            # Forward pass
            output = model(data)
            loss = criterion(output, target)
            # Backward pass and optimization
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        print(f"Epoch {epoch + 1}, Loss: {running_loss / len(train_loader):.4f}")


def test(model, device, test_loader, criterion):
    """
    Test the CNN model.
    Args:
        model: The CNN model.
        device: The device to run on (CPU or GPU).
        test_loader: DataLoader for test data.
        criterion: Loss function.
    """
    model.eval()
    test_loss = 0.0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            # Forward pass
            output = model(data)
            test_loss += criterion(output, target).item()
            # Get predictions
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader)
    accuracy = 100. * correct / len(test_loader.dataset)
    print(f"Test Loss: {test_loss:.4f}, Accuracy: {accuracy:.2f}%")

Key Features of the Implementation

Train and Test Functions: Functions for training and testing are separated for better organization.
Hyperparameter Customization: Easily adjustable parameters like batch size, learning rate, and epochs.
GPU Support: The model automatically uses a GPU if available.
Regularization: Dropout is used in the fully connected layers to reduce overfitting.

Model training, testing, and performance evaluation

Model training and testing

The execute function is the core of the CNN project pipeline: it trains the model, tests it, and orchestrates the entire workflow. It covers three major tasks:

Data Preparation:
- The function loads the training and test datasets and applies data transformations (here, conversion to tensors and normalization; augmentation could also be added at this step) to support training and generalization. The data is then wrapped in DataLoader objects, which serve it in mini-batches (shuffled for training) so the model never has to process the entire dataset at once.
Model Setup:
- Within the execute function, the CNN model is instantiated and moved to the selected device (a GPU if available, otherwise the CPU). The function also configures the loss function (CrossEntropyLoss, suitable for classification) and the optimizer (Adam here; SGD is a common alternative), so the model is ready for training and evaluation.
Training and Testing Process:
- The function manages the training loop, where the model learns from the training data using forward and backward passes, optimizing weights with the specified optimizer. It also includes the testing phase, evaluating model performance on unseen data to compute metrics like accuracy and loss.

def execute():
    try:
        # Hyperparameters
        batch_size = 64
        learning_rate = 0.001
        epochs = 10
        # Device configuration: use the GPU if one is available
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        # Data transformations: convert images to tensors and normalize pixel values to [-1, 1]
        transform = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.5,), (0.5,))
        ])
        # Load MNIST dataset (downloaded to ./data on the first run)
        train_dataset = datasets.MNIST(root="./data", train=True, transform=transform, download=True)
        test_dataset = datasets.MNIST(root="./data", train=False, transform=transform, download=True)
        train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
        test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)
        # Initialize model, loss function, and optimizer
        model = CNN(num_classes=10).to(device)
        criterion = nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
        # Train and test the model
        print("Training the model...")
        train(model, device, train_loader, optimizer, criterion, epochs)
        print("Testing the model...")
        test(model, device, test_loader, criterion)
    except Exception as e:
        print(f"Error executing task: {e}")

The main block serves as the script's entry point: when the file is run directly, it calls execute() to train and test the model and display the results.

if __name__ == "__main__":
    execute()

To run the application and display the results:

Install dependencies: pip install torch torchvision
Run the script with python cnn.py. It downloads MNIST on the first run, trains the CNN, prints the average training loss for each epoch, and then prints the test loss and accuracy.

Performance evaluation

The results show that the CNN performs strongly, with a steadily decreasing training loss and high test accuracy, which indicates that it generalizes well to unseen data. A small gap between the training and test loss suggests effective learning, while the remaining misclassifications point to possible improvements, such as fine-tuning hyperparameters or improving data preprocessing.
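
One way to act on misclassifications is to count which digits are confused with which. The sketch below is a hypothetical helper, not part of the original code, and the label lists are made up for illustration. In practice, the predicted and true labels could be collected inside the loop of the test function, for example with pred.view(-1).tolist() and target.tolist().

```python
from typing import List, Sequence, Tuple


def confusion_matrix(targets: Sequence[int], preds: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """matrix[t][p] counts samples whose true class is t and predicted class is p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(targets, preds):
        matrix[t][p] += 1
    return matrix


def most_confused(matrix: List[List[int]], top: int = 3) -> List[Tuple[int, int, int]]:
    """Return up to `top` (true, predicted, count) entries with the most errors."""
    errors = [(t, p, c) for t, row in enumerate(matrix)
              for p, c in enumerate(row) if t != p and c > 0]
    # sorted() is stable, so ties keep their (true, predicted) order
    return sorted(errors, key=lambda e: e[2], reverse=True)[:top]


# Made-up labels purely for illustration
targets = [4, 4, 4, 9, 7, 1]
preds = [9, 9, 4, 4, 1, 1]
m = confusion_matrix(targets, preds, num_classes=10)
accuracy = sum(m[c][c] for c in range(10)) / len(targets)
print(most_confused(m), f"accuracy={accuracy:.2%}")
```

Frequent confusions (for example, 4 predicted as 9) suggest where extra data, augmentation, or a deeper model might help.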

Next Steps

Extend the Model: Add more convolutional and fully connected layers.
Experiment: Use different datasets (e.g., CIFAR-10) and optimizers (e.g., SGD).
Optimize: Implement learning rate scheduling or fine-tune pre-trained models.
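
Learning rate scheduling can be as simple as step decay: multiply the learning rate by a factor gamma every few epochs. The pure-Python sketch below shows the schedule itself; the helper name step_lr and the example values are assumptions. PyTorch's torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma) applies the same rule when its step() method is called once per epoch after training.

```python
def step_lr(base_lr: float, epoch: int, step_size: int = 5, gamma: float = 0.1) -> float:
    """Step decay: multiply the learning rate by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


# Starting from the learning rate used in execute()
for epoch in range(10):
    print(epoch, step_lr(0.001, epoch))
```

Large early steps make fast progress, and smaller later steps let the model settle into a good minimum.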

Summary

Convolutional Neural Networks (CNNs) are a powerful class of deep learning models particularly well suited for image recognition and classification tasks. They leverage the spatial structure of images through convolutional layers, pooling layers, and fully connected layers to automatically and adaptively learn spatial hierarchies of features. This enables CNNs to efficiently extract features like edges, textures, and patterns, supporting tasks such as image classification, object detection, and segmentation. Parameter sharing keeps them computationally efficient across a wide range of applications. While challenges such as computational demands remain, continued advancements suggest that CNNs will remain a vital tool in the AI toolkit.
