This is a set of Jupyter notebooks I have created (in both Spanish and English) to accompany classes I give on Masters in Artificial Intelligence courses, covering the latest developments in end-to-end NLP (Natural Language Processing) with neural networks.

Some people say that 2018 was the “ImageNet” year for text. By this they are referring to the breakthroughs that transfer learning brought to image recognition: the possibility of training a large, computationally expensive model on a general data set and then “fine-tuning” it for a specific task (for example, telling the difference between dogs and cats). Until recently it was not feasible to apply transfer learning to text-based (NLP) models.
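To make the analogy concrete, here is a minimal, hypothetical sketch of transfer learning in the image setting, written with Keras (the choice of base model, directory layout and hyperparameters are illustrative assumptions, not part of these notebooks): a network pre-trained on ImageNet is frozen and only a small classification head is trained on the dogs-vs-cats data.

```python
# Minimal transfer-learning sketch (hypothetical): reuse an ImageNet model
# to tell dogs from cats. Assumes images are arranged as train_dir/{dog,cat}/.
from keras.applications import ResNet50
from keras.layers import GlobalAveragePooling2D, Dense
from keras.models import Model
from keras.preprocessing.image import ImageDataGenerator

# 1. Load a large model pre-trained on a general data set (ImageNet),
#    without its original 1000-class output layer.
base = ResNet50(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False  # freeze the expensive, general-purpose features

# 2. Add a small task-specific head; only this part will be trained.
x = GlobalAveragePooling2D()(base.output)
output = Dense(1, activation="sigmoid")(x)  # dog vs cat
model = Model(base.input, output)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# 3. Fine-tune briefly on the small, task-specific data set.
train_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "train_dir", target_size=(224, 224), batch_size=32, class_mode="binary")
model.fit_generator(train_gen, epochs=3)
```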

  • References. A list of links to all the relevant academic papers.
  • Classification of text with cutting-edge models. An introduction to and comparison of Word2Vec, ELMo, BERT and XLNet models for classifying IMDB movie reviews as either positive or negative.
  • Attention. A deep dive into the attention mechanism used in the Transformer - the main building block of BERT - starting from a simple Vec2Vec model to translate from English to Spanish.
  • BERT understands. Here a BERT model that has been fine-tuned on SQuAD (the Stanford Question Answering Dataset) is used to answer reading comprehension questions about a Harry Potter book.
  • BERT predicts. BERT is trained to be able to fill in the missing words in a sentence.
  • Bertle. A semantic search engine that uses BERT sentence embeddings to find relevant articles from Stack Overflow.
  • Dr BERT. A psychoanalyst inspired by Eliza and trained using the transcripts of Dr Carl Rogers.
  • Language models. A language model is a function that estimates the probability of the next word (or token) conditioned on the text that precedes it. Here we use the GPT-2 language model to predict the continuation of a sentence and to draw attention to unlikely constructions (a minimal sketch of next-token probabilities appears after this list).
  • Text generation with cutting-edge models. The XLNet and GPT-2 language models are used to generate random prose, from writing chapters of Game of Thrones to generating tweets in the style of Donald Trump.
  • Amazon opinions. A Kaggle-style competition in which you use what you have learned to build a model that classifies Amazon reviews as either positive or negative. Extra challenges arise from a very small, imbalanced data set in Spanish.
  • GPT-2. A Python script using Hugging Face’s PyTorch implementation to generate text with the 1.5 billion parameter GPT-2 model released by OpenAI in November 2019.
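The language-model definition above can be illustrated with a short sketch using the Hugging Face transformers library (a more recent API than the TensorFlow 1.14 setup these notebooks use, so treat the exact calls as an illustrative assumption): GPT-2 assigns a probability to every possible next token given the preceding text.

```python
# Sketch: estimate P(next token | preceding text) with GPT-2 and
# list the most likely continuations of a prompt.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

text = "The capital of France is"
input_ids = tokenizer.encode(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(input_ids)
logits = outputs[0]                           # (1, sequence_length, vocab_size)
probs = torch.softmax(logits[0, -1], dim=-1)  # distribution over the next token

# Show the five most probable continuations and their probabilities.
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode([int(idx)]):>10s}  {p.item():.3f}")
```

A low probability assigned to the word that actually follows is exactly what the “Language models” notebook uses to flag unlikely constructions.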

All the notebooks can be run on Google Colab. If you want to access the pre-trained checkpoints (approximately 10 GB) on Google Colab, send this link to your Gmail account and save the directory, named “checkpoints”, to a subdirectory of your Google Drive called “Colab Notebooks” (which you may need to create). Note that the notebooks currently work with TensorFlow 1.14 and the checkpoints are compatible with Keras 2.2.4. On Google Colab you can install these versions with !pip uninstall -y tensorflow, !pip install --upgrade tensorflow-gpu==1.14 and !pip install --upgrade keras==2.2.4.
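For convenience, the setup can be collected in a single Colab cell. The pip commands are the ones listed above; the Drive-mounting step and the checkpoint path are assumptions about how a notebook would reach the saved “checkpoints” directory.

```python
# Example Colab setup cell: pin the library versions the notebooks expect.
!pip uninstall -y tensorflow
!pip install --upgrade tensorflow-gpu==1.14
!pip install --upgrade keras==2.2.4

# Mount Google Drive so the pre-trained checkpoints saved under
# "Colab Notebooks/checkpoints" are visible to the notebook (path assumed).
from google.colab import drive
drive.mount('/content/drive')
checkpoints_dir = '/content/drive/My Drive/Colab Notebooks/checkpoints'
```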