What Is Batch Normalization?

Written by Coursera Staff

Begin your understanding of batch normalization, a technique revolutionizing neural network training, by learning what batch normalization is and why it’s important in deep learning.


Batch normalization is a machine learning technique that speeds up deep learning training and improves the stability of neural networks. In 2015, Sergey Ioffe and Christian Szegedy revolutionized the field of deep learning with a normalization method that addressed a significant challenge in training neural networks: internal covariate shift. Internal covariate shift occurs when the distribution of each layer's activations changes as the network's parameters update during training. Batch normalization aims to solve this problem by normalizing each layer's activations over the current mini-batch rather than over the entire training set.

Discover how batch normalization works, what differentiates it from other normalization techniques, and its real-world implications. 

What is batch normalization?

Batch normalization speeds up training and improves accuracy in a deep neural network. It does this by inserting a normalization layer between consecutive layers of the network. By re-centering and re-scaling the output of the intermediate layer during training, batch normalization reduces the network's dependence on its initial weights. The network can then train on inputs with a consistent distribution and isn't slowed down by constantly shifting activation distributions.

How batch normalization works

Batch normalization operates in three stages and repeats those stages for each batch in the training set. Explore them in more detail: 

  1. First, the batch normalization layer calculates the mean and standard deviation of its inputs over the current mini-batch.

  2. The second step normalizes the inputs to zero mean and unit standard deviation, which you calculate by subtracting the batch mean from each input and dividing by the batch standard deviation.

  3. Forcing these statistics can be too restrictive, so the third step in batch normalization is adding two learnable parameters: gamma and beta. Gamma is a scaling factor that can modify the standard deviation, and beta is an offset that shifts the distribution left or right. When a neural network benefits from a different input distribution, gamma and beta learn the optimal values over each mini-batch and can even learn to recover the original distribution.
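The three steps above can be sketched in NumPy (a minimal training-time illustration; the function name, the small epsilon for numerical stability, and the example shapes are assumptions here, and real frameworks additionally track running statistics for use at inference time):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize a mini-batch x of shape (batch, features), then scale and shift."""
    mean = x.mean(axis=0)             # step 1: per-feature mean over the mini-batch
    std = x.std(axis=0)               # step 1: per-feature standard deviation
    x_hat = (x - mean) / (std + eps)  # step 2: zero mean, unit standard deviation
    return gamma * x_hat + beta       # step 3: learnable scale (gamma) and shift (beta)

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(32, 4))  # inputs far from zero mean
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
# With gamma=1 and beta=0, each feature of y has mean ~0 and std ~1.
```

During training, gradient descent updates gamma and beta alongside the network's weights, which is how the layer can restore a non-normalized distribution when that helps.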

Implementation in neural networks

Batch normalization is available in most deep learning frameworks; however, it is important to determine whether it is the best technique for your neural network. Multilayer perceptrons and convolutional neural networks are good candidates for batch normalization; recurrent neural networks, which process variable-length sequences, are harder to normalize per batch, and layer normalization is often a better fit for them.

Another consideration is whether to apply batch normalization before or after an activation function. Applying it before the activation is common; however, some architectures yield better results when it is applied afterward. Batch normalization also tolerates larger-than-usual learning rates, which can factor into whether it makes sense for your network.
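The two placements can be sketched as follows (illustrative only: the layer is a plain matrix multiply, and the normalization omits the learnable gamma and beta for brevity):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def normalize(x, eps=1e-5):
    """Zero-mean, unit-std normalization over the batch axis (gamma/beta omitted)."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + eps)

rng = np.random.default_rng(1)
x = rng.normal(size=(16, 8))   # a mini-batch of 16 examples
w = rng.normal(size=(8, 8))    # weights of one fully connected layer

before = relu(normalize(x @ w))  # common choice: normalize the pre-activations
after = normalize(relu(x @ w))   # alternative: normalize the post-activations
```

Note that the two orderings produce different distributions: normalizing first feeds the activation a symmetric, zero-centered input, while normalizing afterward re-centers the activation's (non-negative, for ReLU) output.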

Batch normalization vs. layer normalization

Batch normalization and layer normalization are both standard methods for training neural networks, but they have distinct uses, and choosing the right one can determine the network's success or failure. Batch normalization normalizes each feature across the examples in a mini-batch, while layer normalization normalizes across all features independently for each input. This means that while batch normalization depends on the batch size, layer normalization works even with very small batches.
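The difference comes down to which axis the statistics are computed over. A small NumPy illustration (the axis conventions here assume rows are examples and columns are features):

```python
import numpy as np

x = np.arange(12, dtype=float).reshape(3, 4)  # 3 examples (rows) x 4 features (columns)

# Batch normalization: per-feature statistics, computed across the batch (axis 0)
batch_normed = (x - x.mean(axis=0)) / x.std(axis=0)

# Layer normalization: per-example statistics, computed across features (axis 1)
layer_normed = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
```

Because layer normalization never looks across the batch axis, its output for a given example is the same whether the batch holds one example or a thousand.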

Who uses batch normalization?

Jobs that create and interpret machine learning algorithms and models require understanding normalization techniques like batch normalization. Machine learning has applications in nearly all industries, with uses from automated order-taking (AOT) at drive-thrus to lung cancer screenings and self-driving cars. 

Below are three careers that work with machine learning and batch normalization.

1. Data scientist

Average annual US salary (Glassdoor): $116,946 [1]

As a data scientist, your company or clients may look to you for recommendations on business decisions and processes based on your understanding of data. After collecting raw data, you can create algorithms and models and use machine learning to categorize data to make predictions and interpret the data.

2. Machine learning engineer

Average annual US salary (Glassdoor): $122,019 [2]

As a machine learning engineer, you often bridge the gap between software engineers and data scientists. You may develop software that automates predictive models and work with data scientists to input their data into machine learning algorithms. 

3. Deep learning model researcher

Average annual US salary (Glassdoor): $125,613 [3]

As a deep learning model researcher, your work may be similar to that of a machine learning engineer. Working with machine learning models, you would focus on developing models that let the machine train, teach, and think for itself. 

Pros and cons of using batch normalization

The benefits of batch normalization solidify it as a universally accepted tool for machine learning engineers when training deep neural networks. However, batch normalization also has limitations and challenges that you should consider when choosing a normalization method.

Advantages of batch normalization

  • Accelerated training process: Batch normalization speeds up training and allows higher learning rates than models trained without a normalization technique.

  • Improved stability and performance: By reducing internal covariate shift, the model is more stable and accurate during training.

  • Reduced dependence on initial parameters: Normalizing each batch rather than the entire training set makes the network less dependent on initial starting weights. The model continuously learns the ideal mean and standard deviation, increasing flexibility as it trains.

Disadvantages of batch normalization

  • Increases the complexity of the model: Adding additional layers to the training model makes it larger and more complex and can increase the operational costs.

  • Requires large batch sizes: Using batch normalization on small or highly varied batches can produce poor estimates of the mean and standard deviation, creating an unstable training process.

  • Not universally beneficial: Batch normalization works well with convolutional neural networks but can create detrimental results in training sequence data sets such as recurrent neural networks.
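The small-batch caveat is easy to demonstrate: the standard deviation estimated from a tiny mini-batch varies wildly from batch to batch, while a large batch gives consistent estimates (a toy simulation; the population, batch sizes, and sample counts are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(loc=0.0, scale=1.0, size=100_000)  # true std is 1.0

def batch_std_estimates(batch_size, n_batches=200):
    """Draw many random mini-batches and return each batch's std estimate."""
    batches = rng.choice(population, size=(n_batches, batch_size))
    return batches.std(axis=1)

small = batch_std_estimates(2)    # tiny batches: estimates scatter widely around 1.0
large = batch_std_estimates(512)  # large batches: estimates cluster tightly near 1.0
```

A network normalized with the noisy small-batch statistics sees a different effective input distribution on every step, which is exactly the instability the bullet above describes.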

How to begin with batch normalization

If you want to have a career in deep learning, you will likely need to know batch normalization. To enter the machine learning field, you’ll likely need a bachelor's degree in mathematics, computer science, or statistics. Many employers look for candidates with a bachelor's degree and relevant work experience, yet some require a master’s or doctoral degree. 

Continue exploring deep learning with Coursera

Whether you are automating a virtual sales assistant or training a model to spot fraudulent bank transactions, batch normalization has many applicable uses. 

Strengthen your fundamental knowledge of machine learning with one or all five courses in the series, Deep Learning Specialization, and work towards mastery of deep learning and artificial intelligence (AI). You can also gain a thorough introduction to AI when you complete the Google AI Essential Career Certificate. You’ll find these options and more on Coursera.

Article sources

1

Glassdoor. "How much does a Data Scientist make?" https://www.glassdoor.com/Salaries/data-scientist-salary-SRCH_KO0,14.htm. Accessed December 19, 2024.


This content has been made available for informational purposes only. Learners are advised to conduct additional research to ensure that courses and other credentials pursued meet their personal, professional, and financial goals.