Text Summarization Using Deep Learning
In this article, we will build a text summarizer using deep learning. We will walk through the process step by step and then implement our first text summarization model in Python.
What is Text Summarization in NLP?
Natural Language Processing, or NLP for short, is broadly defined as the automatic manipulation of natural language, like speech and text, by software. It helps machines process and understand human language.
“Text summarization is the problem of creating a short, accurate, and fluent summary of a longer text document.”
There are broadly two different approaches that are used for text summarization:
- Extractive Summarization
- Abstractive Summarization
Let’s look at these two types in more detail.
Extractive Summarization
In this type of summarization, we identify the important sentences or phrases in the original text and extract only those. An extractive summarizer can be built, for example, with the TextRank algorithm.
Abstractive Summarization
Here, we generate new sentences that convey the meaning of the original text. The sentences produced by abstractive summarization may not appear in the source text at all.
So here we are going to build an Abstractive Text Summarizer using Deep Learning.
Introduction to Sequence-to-Sequence (Seq2Seq) Modeling
We can build a Seq2Seq model on any problem that involves sequential information. Two very common applications are:
- Neural Machine Translation
- Named Entity Recognition
In the case of Neural Machine Translation, the input is text in one language and the output is text in another language.
In the case of Named Entity Recognition, the input is a sequence of words and the output is a sequence of tags, one for each input word.
Our objective is to build an Abstractive Text Summarizer where the input is a long sequence of words and the output is a short summary of those words. So, we can model this as a Many-to-Many Seq2Seq problem.
There are two major components of a Seq2Seq model:
- Encoder
- Decoder
Encoder-Decoder
The Encoder-Decoder architecture is mainly used to solve the sequence-to-sequence (Seq2Seq) problems where the input and output sequences are of different lengths.
There are two phases in setting up the Encoder-Decoder:
- Training Phase
- Inference Phase
Understanding the Problem Statement
Customer reviews can often be long and descriptive. Analyzing these reviews manually, as you can imagine, is really time-consuming. This is where the brilliance of Natural Language Processing can be applied to generate a summary for long reviews.
Implementing Text Summarization in Python using Keras
Keras is an open-source library that provides a Python interface for artificial neural networks.
Let’s import it into our environment:
Import the Libraries
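A typical import block for this pipeline might look as follows. The exact set of libraries is an assumption (the original imports are not shown); it uses TensorFlow's bundled Keras, pandas, and NumPy:

```python
import re

import numpy as np
import pandas as pd

# Keras building blocks used later for the Seq2Seq model
from tensorflow.keras.layers import Input, LSTM, Embedding, Dense, TimeDistributed
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping
```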
Read the dataset
Drop Duplicates and NA values
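The article does not name the dataset file, so the snippet below parses a tiny inline CSV with the same two columns (`Text`, `Summary`) used later; in practice you would call `pd.read_csv` on your reviews file instead. Duplicates and missing values are then dropped:

```python
import io
import pandas as pd

# Hypothetical stand-in for the real reviews CSV
csv_data = io.StringIO(
    "Text,Summary\n"
    "Great taffy at a great price,Great taffy\n"
    "Great taffy at a great price,Great taffy\n"
    "This oatmeal is not good,\n"
)
data = pd.read_csv(csv_data)

data.drop_duplicates(subset=["Text"], inplace=True)  # remove duplicate reviews
data.dropna(axis=0, inplace=True)                    # remove rows with missing values
```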
Preprocessing
Performing basic preprocessing steps is very important before we get to the model building part. Using messy and uncleaned text data is a potentially disastrous move. So in this step, we will drop all the unwanted symbols, characters, etc. from the text that do not affect the objective of our problem.
Here is the dictionary that we will use for expanding the contractions:
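A small slice of such a contraction dictionary (the real one covers far more contractions), with a quick word-by-word expansion:

```python
# Illustrative subset of the contraction map
contraction_mapping = {
    "ain't": "is not", "aren't": "are not", "can't": "cannot",
    "didn't": "did not", "don't": "do not", "it's": "it is",
    "i've": "i have", "won't": "will not", "you'll": "you will",
}

# Expand contractions token by token, leaving other words unchanged
text = "it's great but i've had better"
expanded = " ".join(contraction_mapping.get(w, w) for w in text.split())
```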
a) Text Cleaning
Let’s look at the first 10 reviews in our dataset to get an idea of the text preprocessing steps:
We will perform the following preprocessing tasks on our data:
- Convert everything to lowercase
- Remove HTML tags
- Contraction mapping
- Remove ('s)
- Remove any text inside parentheses ( )
- Eliminate punctuation and special characters
- Remove stopwords
- Remove short words
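The steps above can be sketched as a single cleaning function. The contraction map and stopword list below are tiny illustrative stand-ins for the full versions (NLTK's English stopword list is commonly used here):

```python
import re

# Small illustrative stand-ins; the real versions are much larger
contraction_mapping = {"can't": "cannot", "won't": "will not", "it's": "it is"}
stop_words = {"a", "an", "the", "is", "and", "of", "in", "to"}

def text_cleaner(text):
    s = text.lower()                                                 # lowercase
    s = re.sub(r"<[^>]+>", "", s)                                    # strip HTML tags
    s = " ".join(contraction_mapping.get(w, w) for w in s.split())   # expand contractions
    s = re.sub(r"'s\b", "", s)                                       # remove 's
    s = re.sub(r"\([^)]*\)", "", s)                                  # drop text in parentheses
    s = re.sub(r"[^a-z ]", "", s)                                    # punctuation/special chars
    tokens = [w for w in s.split() if w not in stop_words]           # remove stopwords
    tokens = [w for w in tokens if len(w) > 1]                       # remove short words
    return " ".join(tokens)
```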
b) Summary Cleaning
Now let's look at the first 10 rows of the summary column to get an idea of its preprocessing steps:
Define the function for this task:
Remember to add the start and end special tokens at the beginning and end of the summary:
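A minimal sketch of the summary-cleaning function, with `sostok`/`eostok` as the (assumed) start and end token names; the decoder uses these tokens to know when to begin and stop generating:

```python
import re

def summary_cleaner(text):
    s = text.lower()
    s = re.sub(r"[^a-z ]", "", s)   # keep letters and spaces only
    return " ".join(s.split())      # collapse repeated whitespace

# Wrap each cleaned summary with the special start/end tokens
cleaned = ["Great taffy!", "Not as advertised"]
summaries = ["sostok " + summary_cleaner(s) + " eostok" for s in cleaned]
```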
Now, let’s take a look at the top 5 reviews and their summary:
Understanding the distribution of the sequences
Here, we will analyze the lengths of the reviews and the summaries to get an overall idea of how the text lengths are distributed. This will help us fix the maximum sequence lengths.
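One way to inspect the distributions is to histogram the word counts of both columns; the few reviews below are illustrative stand-ins for the cleaned dataset:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative cleaned reviews and summaries
reviews = ["great taffy at great price", "oatmeal not good", "best dog food ever bought"]
summaries = ["great taffy", "not good", "best food"]

text_word_count = [len(r.split()) for r in reviews]
summary_word_count = [len(s.split()) for s in summaries]

# Side-by-side histograms of review and summary lengths
length_df = pd.DataFrame({"text": text_word_count, "summary": summary_word_count})
length_df.hist(bins=30)
plt.savefig("length_distribution.png")
```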
For example, if the vast majority of summaries fall below a certain word count, we can cap the summary sequences at that length and truncate or pad the rest.
Now we are getting closer to the model-building part. Before that, we need to split the dataset into training and validation sets.
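A sketch of the split using scikit-learn's `train_test_split`, holding out 10% for validation; the four tiny examples are placeholders for the cleaned dataset:

```python
from sklearn.model_selection import train_test_split

# Illustrative cleaned reviews and token-wrapped summaries
cleaned_text = ["great taffy at great price", "oatmeal not good",
                "delicious product", "arrived broken"]
cleaned_summary = ["sostok great taffy eostok", "sostok not advertised eostok",
                   "sostok delicious eostok", "sostok disappointed eostok"]

x_tr, x_val, y_tr, y_val = train_test_split(
    cleaned_text, cleaned_summary,
    test_size=0.1,            # 10% holdout for validation
    random_state=0, shuffle=True)
```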
Preparing the Tokenizer
The tokenizer builds a vocabulary from the text and converts word sequences into integer sequences.
Let's build tokenizers for the text and the summary:
a) Text Tokenizer
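A sketch of the text tokenizer using Keras' `Tokenizer` and `pad_sequences`; the two reviews and the maximum length are illustrative placeholders:

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

max_text_len = 30  # assumed cap, chosen from the length distribution

x_tr = ["great taffy at great price", "oatmeal not good"]  # illustrative reviews

x_tokenizer = Tokenizer()
x_tokenizer.fit_on_texts(x_tr)

# Convert word sequences to integer sequences and pad to a fixed length
x_tr_seq = x_tokenizer.texts_to_sequences(x_tr)
x_tr_pad = pad_sequences(x_tr_seq, maxlen=max_text_len, padding="post")

x_voc = len(x_tokenizer.word_index) + 1  # vocabulary size (+1 for padding index 0)
```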
b) Summary Tokenizer
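The summary tokenizer is built the same way, over the token-wrapped summaries; again, the sample data and maximum length are placeholders:

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

max_summary_len = 8  # assumed cap for summaries

y_tr = ["sostok great taffy eostok", "sostok not advertised eostok"]

y_tokenizer = Tokenizer()
y_tokenizer.fit_on_texts(y_tr)

y_tr_seq = y_tokenizer.texts_to_sequences(y_tr)
y_tr_pad = pad_sequences(y_tr_seq, maxlen=max_summary_len, padding="post")

y_voc = len(y_tokenizer.word_index) + 1  # summary vocabulary size
```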
Model building
Now we move on to the model-building part. Before that, let's understand a few terms:
- Return Sequences = True: the LSTM produces the hidden state output for every timestep, not just the last one.
- Return State = True: the LSTM additionally returns its hidden state and cell state for the last timestep.
- Initial State: initializes the internal states of the LSTM for its first timestep.
- Stacked LSTM: multiple LSTM layers on top of each other, which leads to a better representation of the sequence.
Here we build a stacked LSTM for the encoder:
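A sketch of the encoder-decoder described above, with three stacked encoder LSTMs feeding their final states into a single decoder LSTM. The vocabulary sizes, embedding size, and latent dimension are placeholder values; in a real run they come from the tokenizers and the length analysis:

```python
from tensorflow.keras.layers import Input, LSTM, Embedding, Dense, TimeDistributed
from tensorflow.keras.models import Model

# Placeholder sizes (assumptions, not from the article)
max_text_len, x_voc, y_voc = 30, 8000, 2000
embedding_dim, latent_dim = 100, 300

# Encoder: three stacked LSTM layers over the embedded review sequence
encoder_inputs = Input(shape=(max_text_len,))
enc_emb = Embedding(x_voc, embedding_dim, trainable=True)(encoder_inputs)

encoder_output1, _, _ = LSTM(latent_dim, return_sequences=True,
                             return_state=True)(enc_emb)
encoder_output2, _, _ = LSTM(latent_dim, return_sequences=True,
                             return_state=True)(encoder_output1)
encoder_outputs, state_h, state_c = LSTM(latent_dim, return_sequences=True,
                                         return_state=True)(encoder_output2)

# Decoder: a single LSTM initialised with the encoder's final states
decoder_inputs = Input(shape=(None,))
dec_emb = Embedding(y_voc, embedding_dim, trainable=True)(decoder_inputs)
decoder_outputs, _, _ = LSTM(latent_dim, return_sequences=True, return_state=True)(
    dec_emb, initial_state=[state_h, state_c])

# Softmax over the summary vocabulary at every decoder timestep
decoder_outputs = TimeDistributed(Dense(y_voc, activation="softmax"))(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy")
```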
Early stopping halts the training of the neural network at the right time by monitoring a user-specified metric. Here we monitor the validation loss (val_loss): once it starts increasing, the model stops training.
We will train the model with a batch size of 512 and validate it on a holdout set (10% of the dataset). Batch size is a deep learning term that refers to the number of training examples used in one iteration.
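To make the fit call concrete, here is a miniature, runnable stand-in: a tiny version of the model trained for one epoch on random integer data. In a real run you would use the full model above, the padded tokenizer output, `batch_size=512`, and more epochs:

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM, Embedding, Dense, TimeDistributed
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping

# Tiny placeholder dimensions so this demo trains in seconds
x_voc, y_voc, latent_dim = 50, 20, 16
max_text_len, max_summary_len = 10, 5

enc_in = Input(shape=(max_text_len,))
_, h, c = LSTM(latent_dim, return_state=True)(Embedding(x_voc, 8)(enc_in))
dec_in = Input(shape=(None,))
dec_out, _, _ = LSTM(latent_dim, return_sequences=True, return_state=True)(
    Embedding(y_voc, 8)(dec_in), initial_state=[h, c])
out = TimeDistributed(Dense(y_voc, activation="softmax"))(dec_out)
model = Model([enc_in, dec_in], out)
model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy")

# Random integer stand-ins for the padded review and summary sequences
x = np.random.randint(1, x_voc, (64, max_text_len))
y = np.random.randint(1, y_voc, (64, max_summary_len))

# Stop training once val_loss stops improving
es = EarlyStopping(monitor="val_loss", mode="min", patience=2)

# Teacher forcing: the decoder input is the summary shifted by one timestep
history = model.fit([x, y[:, :-1]], y[:, 1:],
                    epochs=1, batch_size=32,   # the article uses batch_size=512
                    validation_split=0.1,      # 10% holdout
                    callbacks=[es], verbose=0)
```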
Understanding the Diagnostic plot
Here we plot diagnostic curves to understand the behavior of the model over time.
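Plotting the training and validation loss side by side reveals over- or under-fitting. The loss values below are illustrative; in practice they come from `history.history["loss"]` and `history.history["val_loss"]`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts
import matplotlib.pyplot as plt

# Illustrative loss values for a few epochs
train_loss = [2.1, 1.6, 1.3, 1.2, 1.15]
val_loss = [2.2, 1.8, 1.6, 1.55, 1.6]

plt.plot(train_loss, label="train")
plt.plot(val_loss, label="test")
plt.legend()
plt.savefig("loss_curve.png")
```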
Now let's build the dictionaries that convert indices back to words for the target and source vocabularies.
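These reverse lookups can be built by inverting the tokenizers' `word_index` dictionaries; the small indices below are illustrative stand-ins for the fitted tokenizers:

```python
# Illustrative word indices; in practice: x_tokenizer.word_index, y_tokenizer.word_index
x_word_index = {"great": 1, "taffy": 2}
y_word_index = {"sostok": 1, "eostok": 2, "good": 3}

# Invert index -> word for both vocabularies
reverse_source_word_index = {i: w for w, i in x_word_index.items()}
reverse_target_word_index = {i: w for w, i in y_word_index.items()}
target_word_index = y_word_index  # word -> index, used to seed the decoder
```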
Inference
Set up the inference models for the Encoder-Decoder.
Below, we define the function that implements the inference process.
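A runnable sketch of the whole inference step on a tiny untrained stand-in model: separate encoder and decoder inference models are wired from the shared layers, and `decode_sequence` generates the summary one token at a time. All dimensions and the token names (`sostok`/`eostok`) are illustrative assumptions:

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM, Embedding, Dense, TimeDistributed
from tensorflow.keras.models import Model

# Tiny placeholder dimensions
x_voc, y_voc, latent_dim = 50, 20, 16
max_text_len, max_summary_len = 10, 5

enc_in = Input(shape=(max_text_len,))
enc_out, state_h, state_c = LSTM(latent_dim, return_sequences=True,
                                 return_state=True)(Embedding(x_voc, 8)(enc_in))

dec_in = Input(shape=(None,))
dec_emb_layer = Embedding(y_voc, 8)
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_dense = TimeDistributed(Dense(y_voc, activation="softmax"))
dec_out, _, _ = decoder_lstm(dec_emb_layer(dec_in), initial_state=[state_h, state_c])
model = Model([enc_in, dec_in], decoder_dense(dec_out))  # training model (untrained here)

# Encoder inference model: maps a review to the encoder outputs and final states
encoder_model = Model(enc_in, [enc_out, state_h, state_c])

# Decoder inference model: runs a single timestep from the previous states
dec_h_in = Input(shape=(latent_dim,))
dec_c_in = Input(shape=(latent_dim,))
step_out, step_h, step_c = decoder_lstm(dec_emb_layer(dec_in),
                                        initial_state=[dec_h_in, dec_c_in])
decoder_model = Model([dec_in, dec_h_in, dec_c_in],
                      [decoder_dense(step_out), step_h, step_c])

# Illustrative vocabulary lookups
target_word_index = {"sostok": 1, "eostok": 2}
reverse_target_word_index = {1: "sostok", 2: "eostok", 3: "good", 4: "taffy"}

def decode_sequence(input_seq):
    # Encode the review, then generate the summary token by token,
    # feeding each sampled token back in until eostok or the length cap.
    _, h, c = encoder_model.predict(input_seq, verbose=0)
    target_seq = np.array([[target_word_index["sostok"]]])
    decoded = []
    while True:
        tokens, h, c = decoder_model.predict([target_seq, h, c], verbose=0)
        sampled_index = int(np.argmax(tokens[0, -1, :]))
        word = reverse_target_word_index.get(sampled_index, "")
        if word == "eostok" or sampled_index == 0 or len(decoded) >= max_summary_len - 1:
            break
        decoded.append(word)
        target_seq = np.array([[sampled_index]])
    return " ".join(decoded)

summary = decode_sequence(np.random.randint(1, x_voc, (1, max_text_len)))
```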
Let's define the functions that convert an integer sequence back to a word sequence, for the summaries as well as the reviews.
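A sketch of these two helpers; the reverse indices are illustrative stand-ins for the inverted tokenizer dictionaries, and `sostok`/`eostok` are the assumed special tokens:

```python
# Illustrative reverse lookups; in practice built from the tokenizers
reverse_source_word_index = {1: "great", 2: "taffy", 3: "price"}
reverse_target_word_index = {1: "sostok", 2: "eostok", 3: "good", 4: "taffy"}
target_word_index = {"sostok": 1, "eostok": 2}

def seq2text(input_seq):
    # Convert an integer review sequence back to words, skipping padding zeros
    return " ".join(reverse_source_word_index[i] for i in input_seq if i != 0)

def seq2summary(input_seq):
    # Same for summaries, also skipping the special start/end tokens
    special = (target_word_index["sostok"], target_word_index["eostok"])
    return " ".join(reverse_target_word_index[i] for i in input_seq
                    if i != 0 and i not in special)
```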
Here is the output that we get:
As you can see, we are not yet getting fully accurate summaries; more data and longer training can improve the results.
This is how we can perform text summarization using deep learning concepts in Python.