What it takes to learn transformer models for natural language processing!

joydeepml2020

Transformer models have become very popular in natural language processing, especially for tasks like neural machine translation, image captioning, and text summarization. In fact, with some small tweaks, the transformer-based encoder stack (BERT) can be used for tasks like text classification, sentiment analysis, and news classification. In recent years the transformer has become one of the most popular architectures for NLP problems, and in this series of articles we will explore transformer models. The transformer was introduced by Vaswani et al. at NIPS 2017 in their paper "Attention Is All You Need". Before the transformer, sequence-to-sequence tasks were handled by recurrent neural network based models, i.e. RNN-based encoders and decoders; in fact, the best-performing seq2seq model was an RNN-based encoder-decoder with an attention mechanism. In their paper, Vaswani et al. introduced an architecture built solely on attention, discarding the RNN and CNN components. To understand the transformer well, we first need to understand the shortcomings of RNN-based encoder-decoder models and of attention-based models.
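As a quick preview of the attention mechanism at the heart of the transformer, here is a minimal sketch of scaled dot-product attention, the formula from "Attention Is All You Need", in TensorFlow 2.0. The function name and the toy tensor shapes below are illustrative, not taken from the paper's reference code.

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    q: (..., seq_len_q, d_k), k: (..., seq_len_k, d_k), v: (..., seq_len_k, d_v)
    """
    matmul_qk = tf.matmul(q, k, transpose_b=True)     # (..., seq_len_q, seq_len_k)
    d_k = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_logits = matmul_qk / tf.math.sqrt(d_k)     # scaling stabilizes the softmax
    weights = tf.nn.softmax(scaled_logits, axis=-1)   # each query's weights sum to 1
    return tf.matmul(weights, v)                      # weighted sum of value vectors

# Toy usage: batch of 2, 5 positions, depth 8 (illustrative shapes)
q = tf.random.normal((2, 5, 8))
k = tf.random.normal((2, 5, 8))
v = tf.random.normal((2, 5, 8))
print(scaled_dot_product_attention(q, k, v).shape)  # (2, 5, 8)
```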

In this series, we will discuss encoder-decoder based RNN seq2seq models and their shortcomings, then see how the attention mechanism helps overcome them. Finally, we will discuss transformer models along with a TensorFlow 2.0 implementation of the transformer architecture.

In this article, we are going to understand encoder-decoder RNN based seq2seq models.
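To make the setup concrete before the detailed walkthrough, here is a minimal sketch of such an encoder-decoder in TensorFlow 2.0. All hyperparameters and names (VOCAB_IN, UNITS, and so on) are hypothetical, chosen only for illustration. Note how the encoder compresses the whole source sentence into a single fixed-size state; this bottleneck is exactly what attention was later introduced to relieve.

```python
import tensorflow as tf

# Hypothetical hyperparameters, for illustration only
VOCAB_IN, VOCAB_OUT, EMBED_DIM, UNITS = 8000, 8000, 128, 256

class Encoder(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.embed = tf.keras.layers.Embedding(VOCAB_IN, EMBED_DIM)
        # The GRU's final hidden state summarizes the source sentence
        self.gru = tf.keras.layers.GRU(UNITS, return_state=True)

    def call(self, x):
        _, state = self.gru(self.embed(x))
        return state  # the fixed-size "context vector"

class Decoder(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.embed = tf.keras.layers.Embedding(VOCAB_OUT, EMBED_DIM)
        self.gru = tf.keras.layers.GRU(UNITS, return_sequences=True,
                                       return_state=True)
        self.out = tf.keras.layers.Dense(VOCAB_OUT)  # logits over target vocab

    def call(self, y, state):
        seq, state = self.gru(self.embed(y), initial_state=state)
        return self.out(seq), state

# Toy forward pass: 4 source sentences (length 10), target prefixes (length 7)
enc, dec = Encoder(), Decoder()
src = tf.random.uniform((4, 10), maxval=VOCAB_IN, dtype=tf.int32)
tgt = tf.random.uniform((4, 7), maxval=VOCAB_OUT, dtype=tf.int32)
context = enc(src)
logits, _ = dec(tgt, context)
print(logits.shape)  # (4, 7, 8000)
```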




