Deep Learning for Machine Translation

Delivered in Fall 2017 at Heinrich-Heine-Universität Düsseldorf (DGfS-CL Fall School).

Hassan Sajjad and I were fortunate to have the opportunity to teach a deep learning course at the Computational Linguistics Fall School organized by the Deutsche Gesellschaft für Sprachwissenschaft. The course is geared towards students with limited background in machine learning and deep learning. It walks them through building their very first machine learning model, all the way up to developing a strong intuition for sequence-to-sequence models, with the material framed in the context of language. We also look at practical considerations when training these models, write code and work through exercises that mirror the lectures, and finally peek into techniques for better understanding what these state-of-the-art models actually learn about language.

The official spiel

Statistical methods have dominated the field of machine translation for almost a decade now. These methods use a parallel corpus, i.e. a set of sentence pairs, where each pair consists of a source-language sentence and its corresponding target-language translation. The main objective of these methods has been to learn a mapping between source and target words, and then use this mapping to generate translations of new source sentences. In the last couple of years, deep neural networks have dethroned phrase-based methods and have been shown to give state-of-the-art results for machine translation.
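
To make the idea of learning a word mapping from a parallel corpus concrete, here is a minimal sketch of IBM Model 1 style EM training on a toy corpus. The corpus, variable names and iteration count are purely illustrative and not taken from the course materials:

```python
from collections import defaultdict

# Toy parallel corpus: (source sentence, target sentence) pairs.
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]

# Initialise translation probabilities t(target | source) uniformly.
target_vocab = {w for _, tgt in corpus for w in tgt}
t = defaultdict(lambda: 1.0 / len(target_vocab))

for _ in range(10):  # a few EM iterations suffice for this toy example
    count = defaultdict(float)   # expected counts c(target, source)
    total = defaultdict(float)   # expected counts c(source)
    for src, tgt in corpus:
        for tw in tgt:
            # E-step: distribute each target word's probability mass
            # over the source words that could have generated it.
            norm = sum(t[(tw, sw)] for sw in src)
            for sw in src:
                frac = t[(tw, sw)] / norm
                count[(tw, sw)] += frac
                total[sw] += frac
    # M-step: re-estimate t(target | source) from the expected counts.
    for (tw, sw), c in count.items():
        t[(tw, sw)] = c / total[sw]

# After a few iterations, "haus" aligns strongly with "house"
# and "das" with "the", even though alignments were never annotated.
print(t[("house", "haus")], t[("the", "das")])
```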

In this lecture series, we will first cover the basics of statistical machine translation to establish the intuition behind machine translation. We will then cover the basics of neural network models: word embeddings and neural language models. Next, we will build an end-to-end translation system based entirely on deep neural networks. In the last part of the lecture series, we will learn to peek into these neural systems and analyze what they learn about linguistic properties such as morphology and syntax, even though these details are never explicitly marked in the training data. We will also see how to adapt these models quickly to a required domain without retraining them from scratch.
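
As a rough illustration of the end-to-end, sequence-to-sequence idea covered in the later lectures, the sketch below wires a GRU encoder and decoder together in PyTorch. All layer sizes and names are placeholders, and the course exercises may use a different toolkit:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: embed source tokens, encode with a GRU,
    then decode target tokens conditioned on the final encoder state."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=64, hid_dim=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the source sentence into a single hidden state.
        _, hidden = self.encoder(self.src_emb(src_ids))
        # Decode with teacher forcing (a real system would feed the
        # target sequence shifted right by one position).
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), hidden)
        return self.out(dec_out)  # logits over the target vocabulary

# Toy usage: batch of 2 sentences, source length 5, target length 6.
model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
src = torch.randint(0, 1000, (2, 5))
tgt = torch.randint(0, 1200, (2, 6))
logits = model(src, tgt)                        # shape (2, 6, 1200)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 1200), tgt.reshape(-1))  # word-prediction loss
```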

Prerequisites

Small portions of the lectures and exercises are dedicated to refreshing these concepts as well!

Lecture materials
