Activation Functions: Sigmoid, ReLU, and Tanh Explained

Interview Preparation Hub for AI/ML Engineering Roles

1. Introduction

Activation functions are at the heart of neural networks. They introduce non-linearity into the model, enabling networks to learn complex patterns beyond simple linear relationships. Without activation functions, any stack of layers collapses into a single linear (affine) transformation, so a deep network would be no more expressive than a linear model.
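To make the collapse concrete, here is a minimal sketch in plain Python. The weights, biases, and helper names (matvec, matmul, add) are made up for illustration. It checks that two linear layers applied in sequence give the same output as one layer with W = W2 W1 and b = W2 b1 + b2.

```python
# Illustrative sketch: two stacked linear layers with no activation
# are equivalent to one linear layer. All values are made up.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrices A and B (lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

W1, b1 = [[1.0, 2.0], [0.5, -1.0]], [0.1, -0.2]
W2, b2 = [[2.0, 0.0], [1.0, 3.0]], [0.0, 0.5]
x = [1.5, -2.0]

# Two layers applied in sequence, with no activation in between.
stacked = add(matvec(W2, add(matvec(W1, x), b1)), b2)

# One equivalent layer: W = W2 W1, b = W2 b1 + b2.
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)
collapsed = add(matvec(W, x), b)

print(stacked)
print(collapsed)
```

Both printed vectors agree up to floating-point rounding. Inserting a non-linear function between the two layers is what breaks this equivalence.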

This guide explores three of the most widely used activation functions: Sigmoid, ReLU, and Tanh. It covers their mathematical definitions, properties, advantages, limitations, applications, and relevance in modern deep learning architectures.

2. What is an Activation Function?

An activation function defines how the weighted sum of inputs and bias is transformed before passing to the next layer. Mathematically:

z = Σ (w_i * x_i) + b
a = f(z)

Where f is the activation function. The choice of f determines the network’s ability to capture non-linear relationships.
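The two formulas above can be sketched for a single neuron as follows. The function name neuron and the input, weight, and bias values are hypothetical, chosen only to illustrate the computation.

```python
import math

def neuron(inputs, weights, bias, activation):
    """Compute a = f(z), where z = sum(w_i * x_i) + b."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

def relu(z):
    return max(0.0, z)

# Made-up example values.
x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, -0.2]
b = 0.1

# Here z = 0.2 - 0.3 - 0.4 + 0.1 = -0.4 (up to rounding), so the same
# weighted sum yields different outputs depending on the choice of f.
print(neuron(x, w, b, math.tanh))
print(neuron(x, w, b, relu))
```

Passing the activation as an argument keeps the weighted sum separate from the non-linearity, which mirrors the z and a = f(z) split in the definition.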

3. Sigmoid Function

The Sigmoid function maps inputs to the range (0, 1). It is defined as:

f(x) = 1 / (1 + e^(-x))

Properties:

Range: (0, 1).
S-shaped curve.
Useful for probability interpretation.

Advantages:

Simple and smooth.
Interpretable as probability.

Limitations:

Vanishing gradient problem: the derivative never exceeds 0.25 and approaches zero for large |x|.
Outputs not zero-centered.
Slow convergence in deep networks.
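A minimal sketch of the Sigmoid function and its derivative f'(x) = f(x)(1 - f(x)) may help show where the vanishing gradient comes from. The branch on the sign of x is one common way to avoid overflow in exp; the function names are illustrative.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for very negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # safe: x < 0, so exp(x) is in (0, 1)
    return e / (1.0 + e)

def sigmoid_grad(x):
    """Derivative f'(x) = f(x) * (1 - f(x)); its maximum is 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"x={x:6.1f}  sigmoid={sigmoid(x):.5f}  grad={sigmoid_grad(x):.5f}")
```

The gradient column peaks at x = 0 and shrinks quickly on both sides, which is the saturation behaviour behind the limitations listed above.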

4. Tanh Function

The Tanh function maps inputs to the range (-1, 1). It is defined as:

f(x) = (e^x - e^(-x)) / (e^x + e^(-x))

Properties:

Range: (-1, 1).
S-shaped curve, centered at zero.

Advantages:

Zero-centered outputs.
Better convergence than Sigmoid.

Limitations:

Still suffers from vanishing gradients: the derivative 1 - tanh^2(x) approaches zero for large |x|.
Computationally expensive compared to ReLU.
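As a rough illustration, the sketch below computes the Tanh derivative 1 - tanh^2(x) and checks the identity tanh(x) = 2 * sigmoid(2x) - 1, which shows that Tanh is a rescaled, zero-centered Sigmoid. The helper names are illustrative, and the simple sigmoid here is only intended for moderate values of x.

```python
import math

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2, equal to 1 at x = 0."""
    t = math.tanh(x)
    return 1.0 - t * t

def sigmoid(x):
    """Plain logistic sigmoid (fine for the moderate inputs used here)."""
    return 1.0 / (1.0 + math.exp(-x))

for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    rescaled = 2.0 * sigmoid(2.0 * x) - 1.0  # should match math.tanh(x)
    print(f"x={x:5.1f}  tanh={math.tanh(x):.5f}  "
          f"2*sigmoid(2x)-1={rescaled:.5f}  grad={tanh_grad(x):.5f}")
```

The peak gradient of 1 (versus 0.25 for Sigmoid) is one reason Tanh tends to converge better, but the gradient still decays toward zero as |x| grows.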

5. ReLU Function

ReLU (Rectified Linear Unit) is one of the most widely used activation functions in modern deep learning. It is defined as:

f(x) = max(0, x)

Properties:

Range: [0, ∞).
Piecewise linear.
Sparse activation (many neurons output zero).

Advantages:

Efficient computation.
Mitigates the vanishing gradient problem: the gradient is exactly 1 for every positive input.
Enables deep networks to train faster.

Limitations:

“Dying ReLU” problem: a neuron whose pre-activation stays negative outputs zero, receives zero gradient, and stops updating.
Not zero-centered.
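The following sketch shows ReLU, its gradient, and a toy "dead" neuron. The weight, bias, and inputs are made-up values chosen so that the pre-activation is negative for every input; the gradient at exactly x = 0 is set to 0 here, which is a convention rather than a mathematical necessity.

```python
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    """1 for x > 0, otherwise 0 (the value at x = 0 is a convention)."""
    return 1.0 if x > 0 else 0.0

# Hypothetical single-input neuron whose pre-activation z = w * x + b
# is negative for every (non-negative) input in this toy dataset.
inputs = [0.2, 0.9, 1.5, 3.0]
w, b = -1.0, -0.5

outputs = [relu(w * x + b) for x in inputs]

# Local gradient of the output with respect to w is relu'(z) * x.
weight_grads = [relu_grad(w * x + b) * x for x in inputs]

print(outputs)
print(weight_grads)
```

Every output and every weight gradient is zero, so gradient descent has no signal with which to move w or b back into the active region.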

6. Comparison of Sigmoid, Tanh, and ReLU

Function	Range	Advantages	Limitations
Sigmoid	(0, 1)	Probability interpretation	Vanishing gradients, not zero-centered
Tanh	(-1, 1)	Zero-centered, better convergence than Sigmoid	Vanishing gradients, costlier to compute than ReLU
ReLU	[0, ∞)	Fast, efficient, mitigates vanishing gradients	Dying ReLU problem, not zero-centered
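To put rough numbers on the "vanishing gradients" column, the sketch below multiplies the local derivative of each function across several layers. It is a deliberate simplification: it assumes every layer sees the same pre-activation z and ignores the weight matrices, so it only illustrates the trend. The function chained_gradient and the chosen z and depth are hypothetical.

```python
import math

def sigmoid_grad(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2

def relu_grad(z):
    return 1.0 if z > 0 else 0.0

def chained_gradient(local_grad, z, depth):
    """Product of `depth` identical local derivatives (weights ignored)."""
    return local_grad(z) ** depth

z, depth = 2.0, 10  # assumed values, for illustration only
for name, grad in (("sigmoid", sigmoid_grad),
                   ("tanh", tanh_grad),
                   ("relu", relu_grad)):
    print(f"{name:8s} {chained_gradient(grad, z, depth):.3e}")
```

Under these assumptions the Sigmoid and Tanh products shrink by many orders of magnitude over ten layers, while the ReLU product stays at 1 as long as the units are active.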

7. Applications in Deep Learning

Sigmoid: Binary classification, probability outputs.
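Tanh: Hidden layers in shallow networks and the state updates of recurrent networks such as LSTMs and GRUs.
ReLU: Default choice for hidden layers in deep feed-forward networks and CNNs; the original Transformer also uses it in its feed-forward layers.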

8. Advanced Variants of ReLU

To address ReLU’s limitations, several variants were introduced:

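Leaky ReLU: Uses a small fixed slope (for example 0.01) for negative inputs, so the gradient never becomes exactly zero.
Parametric ReLU (PReLU): Same form as Leaky ReLU, but the negative slope is learned during training.
Exponential Linear Unit (ELU): Uses α(e^x - 1) for negative inputs, which saturates smoothly toward -α.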
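The three variants can be sketched as scalar functions. The default values (slope 0.01, alpha 1.0) are common choices rather than requirements, and in a real PReLU layer alpha would be a trainable parameter updated by the optimizer; here it is simply passed in as an argument.

```python
import math

def leaky_relu(x, slope=0.01):
    """Fixed small slope for negative inputs."""
    return x if x > 0 else slope * x

def prelu(x, alpha):
    """Same form as Leaky ReLU; alpha is meant to be learned."""
    return x if x > 0 else alpha * x

def prelu_grad_alpha(x):
    """Gradient of the PReLU output with respect to alpha."""
    return 0.0 if x > 0 else x

def elu(x, alpha=1.0):
    """alpha * (e^x - 1) for negative inputs; saturates toward -alpha."""
    return x if x > 0 else alpha * math.expm1(x)

for x in (-5.0, -1.0, 0.5, 2.0):
    print(f"x={x:5.1f}  leaky={leaky_relu(x):8.4f}  "
          f"prelu(0.25)={prelu(x, 0.25):8.4f}  elu={elu(x):8.4f}")
```

All three agree with ReLU for positive inputs and differ only in how they treat negative ones, which is where they keep a non-zero gradient and so reduce the risk of dead neurons.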

9. Challenges

Choosing the right activation function for the task.
Balancing computational efficiency with accuracy.
Handling vanishing/exploding gradients.

10. Interview Notes

Be ready to explain mathematical definitions of Sigmoid, Tanh, and ReLU.
Discuss advantages and limitations of each.
Explain vanishing gradient and dying ReLU problems.
Describe applications and modern relevance.

Diagram: Interview Prep Map

Sigmoid → Tanh → ReLU → Variants → Applications → Challenges → Interview Prep
