{"id":5257,"date":"2020-10-02T16:44:46","date_gmt":"2020-10-02T11:14:46","guid":{"rendered":"https:\/\/www.h2kinfosys.com\/blog\/?p=5257"},"modified":"2025-09-30T07:00:47","modified_gmt":"2025-09-30T11:00:47","slug":"sequence-to-sequence-model-for-deep-learning-with-keras","status":"publish","type":"post","link":"https:\/\/www.h2kinfosys.com\/blog\/sequence-to-sequence-model-for-deep-learning-with-keras\/","title":{"rendered":"Sequence to Sequence Model for Deep Learning with Keras"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\"><strong>What is Seq2seq Learning?<\/strong><\/h2>\n\n\n\n<p>Sequence to sequence learning involves building a model where data in a domain can be converted to another domain, following the input data sequence. Seq2seq, as it is called for short, is especially useful in Natural Language Processing for language translation. If you\u2019ve used popular language translators like Google Translate, you\u2019d realize that as you type a word in a language (say English), it converts that word to another language (say French) in real-time following the sequence of input words. If you change any word in English, there is a corresponding change in the French word. A seq2seq model does this.<\/p>\n\n\n\n<p><em>I am learning how to build models -&gt; [seq2seq] -&gt; J&#8217;apprends \u00e0 construire des mod\u00e8les<\/em><\/p>\n\n\n\n<p>Seq2seq model can be used for other applications such as conversational models, image captioning, text summarization, and more. In this tutorial, we will focus on building a seq2seq model that does language translation using Keras in Python.&nbsp;<\/p>\n\n\n\n<p>A seq2seq model has two important components: the encoder and the decoder. And that&#8217;s why the Seq2seq model can also be called the encoder-decoder model. 
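Before any neural networks, the division of labour between the two components can be sketched with two plain functions: the encoder reduces the input sequence to a single state (the context), and the decoder generates the output from that state alone. This is only a toy illustration of the data flow, not a working translator:

```python
# Toy illustration of the encoder-decoder split (no neural network).
def encoder(input_seq):
    # compress the whole input into one "context" object
    return {"summary": list(input_seq)}

def decoder(state):
    # generate the output sequence from the context alone
    return [token.upper() for token in state["summary"]]

print(decoder(encoder(["go", "home"])))  # -> ['GO', 'HOME']
```

In the real model, the "context" is the pair of LSTM state vectors, and generation happens one character at a time.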
The encoder maps the input sequence to an internal representation, which the decoder then uses to generate the output sequence.&nbsp;<\/p>\n\n\n\n<p>There are many ways to build the neural network architecture of the encoder and decoder, depending on how you want to apply it. For image captioning, a convolutional neural network (CNN) with a flattened final layer is typically used. For language translation models, a recurrent neural network (RNN) is used. Since there are several variants of RNN models, you would need to determine the kind of RNN to apply. The two common RNN models are LSTM (long short-term memory) and GRU (gated recurrent units).&nbsp;<\/p>\n\n\n\n<p>In this project, we shall use LSTM to build a seq2seq model for machine translation.&nbsp;<\/p>\n\n\n\n<p>Let&#8217;s start by understanding how sequence to sequence models work.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>How Seq2seq Works<\/strong><\/h2>\n\n\n\n<p>1. RNN layer which serves as the encoder: The encoder receives a sequence as input and returns its own internal state. This process continues until the end of the sequence. Note that the outputs of the encoder at each time step are discarded; only the internal states (the hidden state and the cell state, which act as the memory of the layer) are carried forward from one step to the next.&nbsp; The final states are called the context, or conditioning, of the decoder. This final output of the encoder is used as the initial state of the decoder.<\/p>\n\n\n\n<p>2. RNN layer which serves as the decoder: The decoder is trained to return the target characters of the data but with an offset in time, i.e. one step into the future. Put another way, the decoder is trained to predict the next character, given the previous character. 
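This one-step offset between what the decoder sees and what it must predict (often called teacher forcing) is easy to illustrate. The sample target below is hypothetical, using the same \t start and \n end markers adopted later in this tutorial:

```python
# Illustration of the decoder's input/target offset (teacher forcing).
# '\t' marks start-of-sequence and '\n' marks end-of-sequence.
target_text = "\tHola.\n"

decoder_input = list(target_text[:-1])   # what the decoder sees at each step
decoder_target = list(target_text[1:])   # what it must predict at that step

for given, predict in zip(decoder_input, decoder_target):
    print(repr(given), "->", repr(predict))
```

At every step the target is simply the input shifted one position to the left.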
Or, equivalently, the decoder is trained to return the target at time t+1 given the targets up to time t, conditioned on the input sequence.&nbsp;<\/p>\n\n\n\n<p>With the context received from the encoder as its initial state, each step of the decoder returns an output that serves as input for the next step. The output of the decoder is thus one character per time step.&nbsp;<\/p>\n\n\n\n<p>After training the model, the next step is to carry out inference with the model. In the inference phase, the model predicts the sequence of characters for a completely new input sequence.&nbsp;<\/p>\n\n\n\n<p>To do this we&#8217;d start by encoding the input sequence into state vectors. Then we feed the state vectors and a single-character target sequence into the decoder to generate predictions for the next character. Afterward, argmax is used to sample the predicted character and append it to the target sequence. The process is repeated until the end-of-sequence character is produced or a defined limit of characters is reached.&nbsp;<\/p>\n\n\n\n<p>That\u2019s how the seq2seq model works. Now that we have an understanding of how encoding, decoding, and inference operate, let&#8217;s take a coding example.&nbsp;<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>A Seq2seq Model Example: Building a Machine Translator.\u00a0<\/strong><\/h4>\n\n\n\n<p>In the Keras official blog, the author of the Keras library, Francois Chollet, wrote an <a href=\"https:\/\/blog.keras.io\/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html\" rel=\"nofollow noopener\" target=\"_blank\">article<\/a> that details how to implement an LSTM-based sequence to sequence model to make predictions. In this post, we&#8217;ll be discussing how to build such models and specifically use them for machine translation. The LSTM model will predict Spanish text given a sequence of English input text. 
The data used for this project was obtained from <a href=\"http:\/\/www.manythings.org\/anki\/\" rel=\"nofollow noopener\" target=\"_blank\">manythings.org\/anki<\/a>. In the dataset, the input sequences are English sentences while the output sequences are Spanish sentences. You can download the dataset <a href=\"http:\/\/www.manythings.org\/anki\/spa-eng.zip\" rel=\"nofollow noopener\" target=\"_blank\">here<\/a>.<\/p>\n\n\n\n<p>Here&#8217;s what the dataset looks like.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">Go. Ve. CC-BY 2.0 (France) Attribution: tatoeba.org <em>#2877272 (CM) &amp; #4986655 (cueyayotl)<\/em>\nGo. Vete. CC-BY 2.0 (France) Attribution: tatoeba.org <em>#2877272 (CM) &amp; #4986656 (cueyayotl)<\/em>\nGo. Vaya. CC-BY 2.0 (France) Attribution: tatoeba.org <em>#2877272 (CM) &amp; #4986657 (cueyayotl)<\/em>\nGo. V\u00e1yase. CC-BY 2.0 (France) Attribution: tatoeba.org <em>#2877272 (CM) &amp; #6586271 (arh)<\/em>\nHi. Hola. CC-BY 2.0 (France) Attribution: tatoeba.org <em>#538123 (CM) &amp; #431975 (Leono)<\/em>\nRun! \u00a1Corre! CC-BY 2.0 (France) Attribution: tatoeba.org <em>#906328 (papabear) &amp; #1685404 (Elenitigormiti)<\/em>\nRun! \u00a1Corran! CC-BY 2.0 (France) Attribution: tatoeba.org <em>#906328 (papabear) &amp; #5213896 (cueyayotl)<\/em>\nRun! \u00a1Corra! CC-BY 2.0 (France) Attribution: tatoeba.org <em>#906328 (papabear) &amp; #8005613 (Seael)<\/em>\nRun! \u00a1Corred! CC-BY 2.0 (France) Attribution: tatoeba.org <em>#906328 (papabear) &amp; #8005615 (Seael)<\/em>\nRun. Corred. CC-BY 2.0 (France) Attribution: tatoeba.org <em>#4008918 (JSakuragi) &amp; #6681472 (arh)<\/em>\nHe laughed. \u00c9l se re\u00eda. CC-BY 2.0 (France) Attribution: tatoeba.org <em>#299650 (Sprachprofi) &amp; #745277 (Shishir)<\/em>\nHe made it. Lo hizo \u00e9l. CC-BY 2.0 (France) Attribution: tatoeba.org <em>#300301 (CK) &amp; #6682410 (arh)<\/em>\nHe made it. Lo logr\u00f3. 
CC-BY 2.0 (France) Attribution: tatoeba.org <em>#300301 (CK) &amp; #6682411 (arh)<\/em>\nHe made it. Lo hizo. CC-BY 2.0 (France) Attribution: tatoeba.org <em>#300301 (CK) &amp; #6682413 (arh)<\/em><\/pre>\n\n\n\n<p>The modelling process in a nutshell:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>We have reiterated the fact that machine learning algorithms work with numbers, not strings. Thus, we need to convert the sentences into <a href=\"https:\/\/www.h2kinfosys.com\/blog\/getting-started-with-numpy\/\">NumPy arrays.<\/a> The encoder-decoder LSTM model requires three numerical arrays: encoder_input_data, decoder_input_data, and decoder_target_data.<\/li>\n<\/ol>\n\n\n\n<p>Let&#8217;s understand what each of these NumPy arrays is.&nbsp;<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The input data of the encoder (encoder_input_data): This is a 3-dimensional array whose dimensions are the number of sentence pairs (input_texts), the maximum length of the English sentences (max_encoder_seq_length), and the number of unique English characters (num_encoder_tokens). The data is a one-hot encoded representation of the sentences in English.\u00a0<\/li>\n\n\n\n<li>The input data of the decoder (decoder_input_data): This is a 3-dimensional array whose dimensions are the number of sentence pairs (target_texts), the maximum length of the Spanish sentences (max_decoder_seq_length), and the number of unique Spanish characters (num_decoder_tokens). Again, the data is a one-hot encoded representation of the Spanish sentences.\u00a0<\/li>\n\n\n\n<li>The target data of the decoder (decoder_target_data): This data is the same as the input data of the decoder, except that it is shifted by one time step, i.e. each position holds the character at time t + 1. 
Putting it differently, the target data of the decoder at time t (decoder_target_data[:, t, :]) is the same as the input data of the decoder at time t + 1 (decoder_input_data[:, t + 1, :]).<\/li>\n<\/ul>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Once we have all three arrays, we proceed to train a simple LSTM-based sequence to sequence model that predicts the target data of the decoder given the input data of the encoder and the input data of the decoder.\u00a0<\/li>\n\n\n\n<li>Finally, we carry out inference by trying out the model on new sentences to see if it can decode the sentences with high accuracy.\u00a0<\/li>\n<\/ol>\n\n\n\n<p>Although the training and inference processes make use of similar RNN layers, they are two different models and must be built separately. We will begin by building the training model.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>The Training Model<\/strong><\/h2>\n\n\n\n<p>We\u2019d start by importing the necessary libraries and defining some parameters that will be needed when training the model.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#import the necessary libraries<\/em>\n<strong>from<\/strong> <strong>tensorflow.keras.models<\/strong> <strong>import<\/strong> Model\n<strong>from<\/strong> <strong>tensorflow.keras.layers<\/strong> <strong>import<\/strong> Input, LSTM, Dense\n<strong>import<\/strong> <strong>numpy<\/strong> <strong>as<\/strong> <strong>np<\/strong>\n\n<em>#define the batch size for training<\/em>\nbatch_size = 70\n<em>#define the number of epochs for training<\/em>\nepochs = 40\n<em>#define the dimensionality of the encoding space<\/em>\nlatent_dim = 256\n<em>#define the number of samples to train on<\/em>\nnum_samples = 10000<\/pre>\n\n\n\n<p>Next, we read the data to extract the input texts (sentences in English), target texts (sentences in Spanish), input characters (unique characters in English), and target characters (unique characters in 
Spanish).&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#define the file location<\/em>\ndata_path = r\"C:\\Users\\wale obembe\\Downloads\\Compressed\\spa.txt\"\n<em>#define an empty list to store the sentences in English<\/em>\ninput_texts = []\n<em>#define an empty list to store the sentences in Spanish<\/em>\ntarget_texts = []\n<em>#define a set to store the unique characters in English<\/em>\n<em>#a set data type and not a list is used to avoid repetition of characters<\/em>\ninput_characters = set()\n<em>#define a set to store the unique characters in Spanish<\/em>\ntarget_characters = set()\n<em>#read the data file and parse it line by line<\/em>\n<strong>with<\/strong> open(data_path, 'r', encoding='utf-8') <strong>as<\/strong> f:\n&nbsp;&nbsp;&nbsp;&nbsp;lines = f.read().split('<strong>\\n<\/strong>')\n\n<strong>for<\/strong> line <strong>in<\/strong> lines[: min(num_samples, len(lines) - 1)]:\n&nbsp;&nbsp;&nbsp;&nbsp;<em>#split each line into the input text, the target text and the attribution text we do not need<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;input_text, target_text, _ = line.split('<strong>\\t<\/strong>')\n&nbsp;&nbsp;&nbsp;&nbsp;<em>#we use tab as the start sequence character for the target, and \\n as the end sequence character<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;target_text = '<strong>\\t<\/strong>' + target_text + '<strong>\\n<\/strong>'\n&nbsp;&nbsp;&nbsp;&nbsp;<em>#append the English sentence to its list<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;input_texts.append(input_text)\n&nbsp;&nbsp;&nbsp;&nbsp;<em>#append the Spanish sentence to its list<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;target_texts.append(target_text)\n&nbsp;&nbsp;&nbsp;&nbsp;<em>#collect the characters in English<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;<strong>for<\/strong> char <strong>in<\/strong> input_text:\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<em>#check if the character is not already in the set<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<strong>if<\/strong> char 
<strong>not<\/strong> <strong>in<\/strong> input_characters:\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<em>#add the new character to the set<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;input_characters.add(char)\n&nbsp;&nbsp;&nbsp;&nbsp;<em>#collect the characters in Spanish<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;<strong>for<\/strong> char <strong>in<\/strong> target_text:\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<em>#check if the character is not already in the set<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<strong>if<\/strong> char <strong>not<\/strong> <strong>in<\/strong> target_characters:\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<em>#add the new character to the set<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;target_characters.add(char)<\/pre>\n\n\n\n<p>Let\u2019s print out these variables.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><strong>print<\/strong>(sorted(input_characters))<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<p><code>[' ', '!', '$', \"'\", ',', '-', '.', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', '?', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'Y', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']<\/code><\/p>\n\n\n\n<p><strong>print<\/strong>(sorted(target_characters))<\/p>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">['\\t', ' ', '!', '\"', \"'\", ',', '-', '.', '0', '1', '2', '3', '4', '5', '6', '7', '8', ':', '?', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'Y', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '\u00a1', '\u00ab', '\u00bb', '\u00bf', 
'\u00c1', '\u00c9', '\u00d3', '\u00da', '\u00e1', '\u00e9', '\u00ed', '\u00f1', '\u00f3', '\u00fa', '\u00fc']<\/pre>\n\n\n\n<p><strong>print<\/strong>(input_texts[: 50])<\/p>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">['Go.', 'Go.', 'Go.', 'Go.', 'Hi.', 'Run!', 'Run!', 'Run!', 'Run!', 'Run.', 'Who?', 'Wow!', 'Fire!', 'Fire!', 'Fire!', 'Help!', 'Help!', 'Help!', 'Jump!', 'Jump.', 'Stop!', 'Stop!', 'Stop!', 'Wait!', 'Wait.', 'Go on.', 'Go on.', 'Hello!', 'Hurry!', 'Hurry!', 'Hurry!', 'I hid.', 'I hid.', 'I hid.', 'I hid.', 'I ran.', 'I ran.', 'I try.', 'I won!', 'Oh no!', 'Relax.', 'Shoot!', 'Shoot!', 'Shoot!', 'Shoot!', 'Shoot!', 'Shoot!', 'Smile.', 'Attack!', 'Attack!']<\/pre>\n\n\n\n<p><strong>print<\/strong>(target_texts[:50])<\/p>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">['<strong>\\t<\/strong>Ve.<strong>\\t<\/strong>', '<strong>\\t<\/strong>Vete.<strong>\\t<\/strong>', '<strong>\\t<\/strong>Vaya.<strong>\\t<\/strong>', '<strong>\\t<\/strong>V\u00e1yase.<strong>\\t<\/strong>', '<strong>\\t<\/strong>Hola.<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00a1Corre!<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00a1Corran!<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00a1Corra!<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00a1Corred!<strong>\\t<\/strong>', '<strong>\\t<\/strong>Corred.<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00bfQui\u00e9n?<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00a1\u00d3rale!<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00a1Fuego!<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00a1Incendio!<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00a1Disparad!<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00a1Ayuda!<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00a1Socorro! 
\u00a1Auxilio!<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00a1Auxilio!<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00a1Salta!<strong>\\t<\/strong>', '<strong>\\t<\/strong>Salte.<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00a1Parad!<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00a1Para!<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00a1Pare!<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00a1Espera!<strong>\\t<\/strong>', '<strong>\\t<\/strong>Esperen.<strong>\\t<\/strong>', '<strong>\\t<\/strong>Contin\u00faa.<strong>\\t<\/strong>', '<strong>\\t<\/strong>Contin\u00fae.<strong>\\t<\/strong>', '<strong>\\t<\/strong>Hola.<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00a1Date prisa!<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00a1Daos prisa!<strong>\\t<\/strong>', '<strong>\\t<\/strong>Dese prisa.<strong>\\t<\/strong>', '<strong>\\t<\/strong>Me ocult\u00e9.<strong>\\t<\/strong>', '<strong>\\t<\/strong>Me escond\u00ed.<strong>\\t<\/strong>', '<strong>\\t<\/strong>Me ocultaba.<strong>\\t<\/strong>', '<strong>\\t<\/strong>Me escond\u00eda.<strong>\\t<\/strong>', '<strong>\\t<\/strong>Corr\u00ed.<strong>\\t<\/strong>', '<strong>\\t<\/strong>Corr\u00eda.<strong>\\t<\/strong>', '<strong>\\t<\/strong>Lo intento.<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00a1He ganado!<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00a1Oh, no!<strong>\\t<\/strong>', '<strong>\\t<\/strong>Tom\u00e1telo con soda.<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00a1Fuego!<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00a1Disparad!<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00a1Disparen!<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00a1Dispara!<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00a1Dispar\u00e1!<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00a1Dispare!<strong>\\t<\/strong>', '<strong>\\t<\/strong>Sonr\u00ede.<strong>\\t<\/strong>', '<strong>\\t<\/strong>\u00a1Al ataque!<strong>\\t<\/strong>', 
'<strong>\\t<\/strong>\u00a1Atacad!<strong>\\t<\/strong>']<\/pre>\n\n\n\n<p>Let\u2019s explicitly define the number of unique English characters, the number of unique Spanish characters, the number of characters in the longest English sentence and the number of characters in the longest Spanish sentence. We will be needing these variables later in our model.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#define the sorted list of English characters<\/em>\ninput_characters = sorted(list(input_characters))\n<em>#define the sorted list of Spanish characters<\/em>\ntarget_characters = sorted(list(target_characters))\n<em>#define the number of unique English characters<\/em>\nnum_encoder_tokens = len(input_characters)\n<em>#define the number of unique Spanish characters<\/em>\nnum_decoder_tokens = len(target_characters)\n<em>#define the maximum length of the English sentences<\/em>\nmax_encoder_seq_length = max([len(txt) <strong>for<\/strong> txt <strong>in<\/strong> input_texts])\n<em>#define the maximum length of the Spanish sentences<\/em>\nmax_decoder_seq_length = max([len(txt) <strong>for<\/strong> txt <strong>in<\/strong> target_texts])<\/pre>\n\n\n\n<p>Print the result&#8230;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><strong>print<\/strong>(\"Number of samples:\", len(input_texts))\n<strong>print<\/strong>(\"Number of unique input tokens:\", num_encoder_tokens)\n<strong>print<\/strong>(\"Number of unique output tokens:\", num_decoder_tokens)\n<strong>print<\/strong>(\"Max sequence length for inputs:\", max_encoder_seq_length)\n<strong>print<\/strong>(\"Max sequence length for outputs:\", max_decoder_seq_length)<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">Number of samples: 10000\nNumber of unique input tokens: 69\nNumber of unique output tokens: 83\nMax sequence length for inputs: 16\nMax sequence length for outputs: 42<\/pre>\n\n\n\n<p>We can also define two variables that will hold the index and characters for both English and Spanish 
characters in a dictionary data type.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#map each English character to an index<\/em>\ninput_token_index = dict(\n[(char, i) <strong>for<\/strong> i, char <strong>in<\/strong> enumerate(input_characters)])\n<em>#map each Spanish character to an index<\/em>\ntarget_token_index = dict(\n[(char, i) <strong>for<\/strong> i, char <strong>in<\/strong> enumerate(target_characters)])<\/pre>\n\n\n\n<p>Going forward, we define the three arrays that our training model will require, i.e. encoder_input_data, decoder_input_data, and decoder_target_data. Recall that each is a 3-dimensional one-hot encoded array. To carry out the one-hot encoding process, let\u2019s begin by populating the arrays with zeros.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#define the input data of the encoder as a 3-dimensional matrix populated with zeros<\/em>\n<em>#the shape of the matrix is the number of input texts by the max encoder sequence length by the number of encoder tokens<\/em>\n<em>#the np.zeros() function takes an argument for the dtype. Here we use float32<\/em>\nencoder_input_data = np.zeros(\n(len(input_texts), max_encoder_seq_length, num_encoder_tokens\n), dtype='float32')\n<em>#define the input data of the decoder as a 3-dimensional matrix populated with zeros<\/em>\ndecoder_input_data = np.zeros(\n(len(input_texts), max_decoder_seq_length, num_decoder_tokens\n), dtype='float32')\n<em>#define the target data of the decoder as a 3-dimensional matrix populated with zeros<\/em>\ndecoder_target_data = np.zeros(\n(len(input_texts), max_decoder_seq_length, num_decoder_tokens\n), dtype='float32')<\/pre>\n\n\n\n<p>Next, we need to convert the input texts into numerical vectors using one-hot encoding. 
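To make the one-hot encoding concrete before the full vectorization loop, here is a minimal sketch over a tiny, hypothetical character set (the real token index is built from input_characters above):

```python
import numpy as np

# One-hot encode a short text over a tiny, hypothetical character set.
chars = sorted(set("Go.!"))                 # -> ['!', '.', 'G', 'o']
token_index = {c: i for i, c in enumerate(chars)}

text = "Go."
max_len = 4
one_hot = np.zeros((max_len, len(chars)), dtype="float32")
for t, ch in enumerate(text):
    one_hot[t, token_index[ch]] = 1.0      # exactly one 1 per used time step

print(one_hot)
```

Each row corresponds to one time step, and rows beyond the end of the text stay all-zero (the real code pads them with the space character instead).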
The code below does that.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#parse the input and output texts<\/em>\n<strong>for<\/strong> i, (input_text, target_text) <strong>in<\/strong> enumerate(zip(input_texts, target_texts)):\n&nbsp;&nbsp;&nbsp;&nbsp;<strong>for<\/strong> t, char <strong>in<\/strong> enumerate(input_text):\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;encoder_input_data[i, t, input_token_index[char]] = 1.\n&nbsp;&nbsp;&nbsp;&nbsp;encoder_input_data[i, t + 1:, input_token_index[' ']] = 1.\n&nbsp;&nbsp;&nbsp;&nbsp;<strong>for<\/strong> t, char <strong>in<\/strong> enumerate(target_text):\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<em>#decoder_target_data is one time step ahead of decoder_input_data<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;decoder_input_data[i, t, target_token_index[char]] = 1.\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<strong>if<\/strong> t &gt; 0:\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<em>#decoder_target_data is ahead by one time step and does not include the start character<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;decoder_target_data[i, t - 1, target_token_index[char]] = 1.\n&nbsp;&nbsp;&nbsp;&nbsp;decoder_input_data[i, t + 1:, target_token_index[' ']] = 1.\n&nbsp;&nbsp;&nbsp;&nbsp;decoder_target_data[i, t:, target_token_index[' ']] = 1.<\/pre>\n\n\n\n<p>Going further, we train the model. When training the RNN, some parameters must be carefully defined. Let&#8217;s discuss what these parameters mean.&nbsp;<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The return_state parameter: When it is set to True, the RNN layer returns a list where the first entry is the output and the remaining entries are the final hidden and cell states. 
It is used to recover the encoder states that will initialize the decoder.\u00a0<\/li>\n\n\n\n<li>The return_sequences parameter: By default, the RNN returns only the output of the last time step. When this parameter is set to True, it returns the entire sequence of outputs, one per time step. This is typically used for the decoder.\u00a0<\/li>\n\n\n\n<li>The initial_state parameter: This passes the encoder states to the decoder as its initial state.\u00a0<\/li>\n<\/ol>\n\n\n\n<p>Let&#8217;s go ahead and define the encoder.<\/p>\n\n\n\n<p>We first define the encoder input, which is the English character sequence as one-hot encodings, whose width is equal to the number of encoder tokens. As explained earlier, the return_state parameter should be set to True for the encoder.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#define an input of the encoder with width as the number of encoder tokens<\/em>\nencoder_inputs = Input(shape=(None, num_encoder_tokens))\n<em>#instantiate the LSTM layer<\/em>\nencoder = LSTM(latent_dim, return_state=True)\n<em>#define the outputs and states of the encoder<\/em>\nencoder_outputs, state_h, state_c = encoder(encoder_inputs)\n<em>#disregard encoder_outputs and keep only the states<\/em>\nencoder_states = [state_h, state_c]<\/pre>\n\n\n\n<p>It&#8217;s now time to define the decoder.&nbsp;<\/p>\n\n\n\n<p>As with the encoder, the input is a sequence of Spanish characters as one-hot encodings, whose width is the number of decoder tokens. The LSTM is defined to return the full output sequence by setting return_sequences to True.<\/p>\n\n\n\n<p>The final hidden and cell states of the encoder are used to initialize the state of the decoder. Additionally, a Dense layer with softmax activation is used to predict the output character at each time step. 
Finally, the Model can be defined to map the encoder input data and decoder input data to the decoder target data.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#define an input of the decoder with width as the number of decoder tokens<\/em>\ndecoder_inputs = Input(shape=(None, num_decoder_tokens))\n<em>#define the LSTM layer for the decoder, setting return_sequences and return_state to True<\/em>\ndecoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)\n<em>#keep only the decoder output for the training model. The states are only needed in the inference model<\/em>\ndecoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)\ndecoder_dense = Dense(num_decoder_tokens, activation='softmax')\ndecoder_outputs = decoder_dense(decoder_outputs)\n<em>#define the training model which takes the encoder_input_data and decoder_input_data to return the decoder_target_data<\/em>\nmodel = Model([encoder_inputs, decoder_inputs], decoder_outputs)\n<em>#compile and train the model<\/em>\nmodel.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])\nmodel.fit([encoder_input_data, decoder_input_data], decoder_target_data,\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;batch_size=batch_size,\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;epochs=epochs,\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;validation_split=0.2)<\/pre>\n\n\n\n<p>After about an hour, the model was successfully trained with a loss of 0.192 and a training accuracy of 94.2%. 
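With num_samples set to 10000 and validation_split=0.2, Keras holds out the last 20% of the samples for validation, which is why the training logs report 8000 samples per epoch. A quick sanity check in plain Python (no Keras needed):

```python
# Sanity check of the train/validation split used above:
# validation_split=0.2 reserves the last 20% of the samples for validation.
num_samples = 10000
validation_split = 0.2

num_val = int(num_samples * validation_split)
num_train = num_samples - num_val
print(num_train, num_val)  # -> 8000 2000
```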
Note that the accuracy of the training model can be increased by increasing the number of epochs.&nbsp;<\/p>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">Epoch 1\/40\n8000\/8000 [==============================] - 111s 14ms\/sample - loss: 1.4567 - acc: 0.6568 - val_loss: 1.3915 - val_acc: 0.6210\nEpoch 2\/40\n8000\/8000 [==============================] - 100s 12ms\/sample - loss: 1.0924 - acc: 0.7033 - val_loss: 1.1186 - val_acc: 0.6872\nEpoch 3\/40\n8000\/8000 [==============================] - 94s 12ms\/sample - loss: 0.8982 - acc: 0.7445 - val_loss: 0.9992 - val_acc: 0.7029\nEpoch 4\/40\n8000\/8000 [==============================] - 88s 11ms\/sample - loss: 0.8080 - acc: 0.7601 - val_loss: 0.9034 - val_acc: 0.7278\nEpoch 5\/40\n8000\/8000 [==============================] - 95s 12ms\/sample - loss: 0.7425 - acc: 0.7771 - val_loss: 0.8708 - val_acc: 0.7350\n.\n.\n.\nEpoch 35\/40\n8000\/8000 [==============================] - 104s 13ms\/sample - loss: 0.2295 - acc: 0.9309 - val_loss: 0.7207 - val_acc: 0.8126\nEpoch 36\/40\n8000\/8000 [==============================] - 87s 11ms\/sample - loss: 0.2212 - acc: 0.9335 - val_loss: 0.7240 - val_acc: 0.8131\nEpoch 37\/40\n8000\/8000 [==============================] - 98s 12ms\/sample - loss: 0.2129 - acc: 0.9362 - val_loss: 0.7271 - val_acc: 0.8124\nEpoch 38\/40\n8000\/8000 [==============================] - 89s 11ms\/sample - loss: 0.2057 - acc: 0.9381 - val_loss: 0.7324 - val_acc: 0.8138\nEpoch 39\/40\n8000\/8000 [==============================] - 88s 11ms\/sample - loss: 0.1985 - acc: 0.9405 - val_loss: 0.7427 - val_acc: 0.8130\nEpoch 40\/40\n8000\/8000 [==============================] - 87s 11ms\/sample - loss: 0.1920 - acc: 0.9425 - val_loss: 0.7403 - val_acc: 0.8162<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>The Inference Model<\/strong><\/h2>\n\n\n\n<p>After training the model, the next step is to use the model to make predictions. 
The model that makes the predictions is called the inference model. This model is almost identical to the training model, save for some slight differences: unlike the training model, the inference model must generate the output recursively, one character at a time.&nbsp;<\/p>\n\n\n\n<p>Even though the inference model is a different model from the training model, it references the layers of the trained model. To define the encoder model, we take the input layer of the trained encoder and output its states (the cell and hidden state).&nbsp;<\/p>\n\n\n\n<p>To define the decoder model, we set its initial state to the hidden and cell state produced by the newly created encoder model (the encoder of the inference model). This is important because this decoder is a different model and must take its initial state from the encoder. Having defined the state inputs of the decoder, they can now be passed as the initial state of the LSTM layer.&nbsp;<\/p>\n\n\n\n<p>The model is built such that the hidden and cell states of the encoder are used as the initial state of the decoder. However, on subsequent calls, the initial state of the decoder is the hidden and cell state of the previous call. 
Hence, the model has to output the hidden and cell state alongside the predicted character on each call.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#define the encoder of the inference model, mapping the encoder input to the encoder states<\/em>\nencoder_model = Model(encoder_inputs, encoder_states)\n<em>#define inputs for the decoder's hidden and cell states<\/em>\ndecoder_state_input_h = Input(shape=(latent_dim,))\ndecoder_state_input_c = Input(shape=(latent_dim,))\n<em>#define the decoder input state as a list of the hidden and cell state<\/em>\ndecoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]\ndecoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)\ndecoder_states = [state_h, state_c]\n<em>#define the decoder output<\/em>\ndecoder_outputs = decoder_dense(decoder_outputs)\n<em>#define the decoder model<\/em>\ndecoder_model = Model(\n[decoder_inputs] + decoder_states_inputs,&nbsp;\n&nbsp;&nbsp;&nbsp;&nbsp;[decoder_outputs] + decoder_states\n)\n\n<em># Reverse-lookup token index to decode sequences back to something readable<\/em>\nreverse_input_char_index = dict(\n(i, char) <strong>for<\/strong> char, i <strong>in<\/strong> input_token_index.items()\n)\nreverse_target_char_index = dict(\n(i, char) <strong>for<\/strong> char, i <strong>in<\/strong> target_token_index.items()\n)\n<\/pre>\n\n\n\n<p>Now, we can tie it all together and define a function that decodes some text in English to output Spanish text.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><strong>def<\/strong> decode_sequence(input_seq):\n&nbsp;&nbsp;&nbsp;&nbsp;<em>#encode the input as state vectors<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;states_value = encoder_model.predict(input_seq)\n&nbsp;&nbsp;&nbsp;&nbsp;\n&nbsp;&nbsp;&nbsp;&nbsp;<em>#generate empty target sequence of length 1.<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;target_seq = np.zeros((1, 1, num_decoder_tokens))\n&nbsp;&nbsp;&nbsp;&nbsp;<em>#populate the first character of target sequence with the start character<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;target_seq[0, 0, target_token_index['<strong>\\t<\/strong>']] = 1.\n&nbsp;&nbsp;&nbsp;&nbsp;\n&nbsp;&nbsp;&nbsp;&nbsp;<em>#sampling loop for a batch of sequences<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;<em>#to simplify, we use batch size of 
1<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;stop_condition = False\n&nbsp;&nbsp;&nbsp;&nbsp;decoded_sentence = ''\n&nbsp;&nbsp;&nbsp;&nbsp;<strong>while<\/strong> <strong>not<\/strong> stop_condition:\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;output_tokens, h, c = decoder_model.predict(\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[target_seq] + states_value\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;)\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<em>#sample a token<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;sampled_token_index = np.argmax(output_tokens[0, -1, :])\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;sampled_char = reverse_target_char_index[sampled_token_index]\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;decoded_sentence += sampled_char\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<em>#exit condition: either hit max length or find stop character<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<strong>if<\/strong> (len(decoded_sentence) &gt; max_decoder_seq_length <strong>or<\/strong> sampled_char == '<strong>\\n<\/strong>'):\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;stop_condition = True\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<em>#update the target sequence (of length 1)<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;target_seq = np.zeros((1, 1, num_decoder_tokens))\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;target_seq[0, 0, sampled_token_index] = 1.\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<em>#update states<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;states_value = [h, c]\n&nbsp;&nbsp;&nbsp;&nbsp;\n&nbsp;&nbsp;&nbsp;&nbsp;<strong>return<\/strong> decoded_sentence\n<\/pre>\n\n\n\n<p>Let\u2019s call the function and check 
the result:&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><strong>for<\/strong> seq_index <strong>in<\/strong> range(100):\n&nbsp;&nbsp;&nbsp;&nbsp;<em>#take one sequence (part of the training set) for trying out decoding<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;input_seq = encoder_input_data[seq_index: seq_index + 1]\n&nbsp;&nbsp;&nbsp;&nbsp;<em>#call the function<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;decoded_sentence = decode_sequence(input_seq)\n&nbsp;&nbsp;&nbsp;&nbsp;<strong>print<\/strong>()\n&nbsp;&nbsp;&nbsp;&nbsp;<strong>print<\/strong>(f\"Input sentence: {input_texts[seq_index]}\")\n&nbsp;&nbsp;&nbsp;&nbsp;<strong>print<\/strong>(f\"Decoded sentence: {decoded_sentence}\")<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">Input sentence: Go. Decoded sentence: Vaya.\nInput sentence: Run! Decoded sentence: \u00a1Corre!\nInput sentence: Who? Decoded sentence: \u00bfQui\u00e9n es?\nInput sentence: Fire! Decoded sentence: \u00a1Disparad!\nInput sentence: I care. Decoded sentence: Me preocupo.\nInput sentence: I fell. Decoded sentence: Me acuera.\n<\/pre>\n\n\n\n<p>Quite a decent result!<\/p>\n\n\n\n<p>There you have it, an English-Spanish translation model built with seq2seq. If you have any questions, let us know in the comment section.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>What is Seq2seq Learning? Sequence to sequence learning involves building a model where data in a domain can be converted to another domain, following the input data sequence. Seq2seq, as it is called for short, is especially useful in Natural Language Processing for language translation. 
If you\u2019ve used popular language translators like Google Translate, you\u2019d [&hellip;]<\/p>\n","protected":false},"author":10,"featured_media":5314,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[498],"tags":[],"class_list":["post-5257","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence-tutorials"],"_links":{"self":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts\/5257","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/users\/10"}],"replies":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/comments?post=5257"}],"version-history":[{"count":1,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts\/5257\/revisions"}],"predecessor-version":[{"id":30154,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts\/5257\/revisions\/30154"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/media\/5314"}],"wp:attachment":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/media?parent=5257"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/categories?post=5257"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/tags?post=5257"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}