Deep Learning for Machine Translation

Delivered in Fall 2017 at Heinrich-Heine-Universität Düsseldorf (DGfS-CL Fall School).

Hassan Sajjad and I were fortunate to have the opportunity to teach a deep learning course at the Computational Linguistics Fall School organized by the Deutsche Gesellschaft für Sprachwissenschaft. The course is geared towards students with limited background in machine learning and deep learning. It walks them through building their very first machine learning model, all the way up to developing a strong intuition for sequence-to-sequence models, with the material framed in the context of language. We also look at practical considerations when training these models, write code and work through exercises that mirror the lectures, and finally peek into techniques for better understanding what these state-of-the-art models actually learn about language.

The official spiel

Statistical methods have dominated the field of machine translation for almost a decade now. These methods use a parallel corpus, i.e. a set of sentence pairs, where each pair consists of a source-language sentence and its corresponding target-language translation. The main objective of these methods has been to learn a mapping between source and target words, and then use this mapping to generate translations of new source sentences. In the last couple of years, deep neural networks have dethroned phrase-based methods and have been shown to give state-of-the-art results for machine translation.
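
To make the idea of learning a word mapping from a parallel corpus concrete, here is a minimal sketch of IBM Model 1 style EM training on a toy corpus. The corpus, variable names and iteration count are purely illustrative and not taken from the course materials:

```python
from collections import defaultdict

# Toy parallel corpus: (source sentence, target sentence) pairs.
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]

# Initialise translation probabilities t(target | source) uniformly.
target_vocab = {w for _, tgt in corpus for w in tgt}
t = defaultdict(lambda: 1.0 / len(target_vocab))

for _ in range(10):  # a few EM iterations suffice for this toy example
    count = defaultdict(float)   # expected counts c(target, source)
    total = defaultdict(float)   # expected counts c(source)
    for src, tgt in corpus:
        for tw in tgt:
            # E-step: distribute each target word's probability mass
            # over the source words that could have generated it.
            norm = sum(t[(tw, sw)] for sw in src)
            for sw in src:
                frac = t[(tw, sw)] / norm
                count[(tw, sw)] += frac
                total[sw] += frac
    # M-step: re-estimate t(target | source) from the expected counts.
    for (tw, sw), c in count.items():
        t[(tw, sw)] = c / total[sw]

# After a few iterations, "haus" aligns strongly with "house"
# and "das" with "the", even though alignments were never annotated.
print(t[("house", "haus")], t[("the", "das")])
```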

In this lecture series, we will first cover the basics of statistical machine translation to establish the intuition behind machine translation. We will then cover the basics of neural network models: word embeddings and neural language models. Next, we will build an end-to-end translation system based entirely on deep neural networks. In the last part of the lecture series, we will learn to peek into these neural systems and analyze what they learn about linguistic properties such as morphology and syntax, even though these details are never explicitly marked in the training data. We will also see how to adapt these models quickly to a required domain without retraining them from scratch.
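
As a rough illustration of the end-to-end, sequence-to-sequence idea covered in the later lectures, the sketch below wires a GRU encoder and decoder together in PyTorch. All layer sizes and names are placeholders, and the course exercises may use a different toolkit:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: embed source tokens, encode with a GRU,
    then decode target tokens conditioned on the final encoder state."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=64, hid_dim=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the source sentence into a single hidden state.
        _, hidden = self.encoder(self.src_emb(src_ids))
        # Decode with teacher forcing (a real system would feed the
        # target sequence shifted right by one position).
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), hidden)
        return self.out(dec_out)  # logits over the target vocabulary

# Toy usage: batch of 2 sentences, source length 5, target length 6.
model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
src = torch.randint(0, 1000, (2, 5))
tgt = torch.randint(0, 1200, (2, 6))
logits = model(src, tgt)                        # shape (2, 6, 1200)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 1200), tgt.reshape(-1))  # word-prediction loss
```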

Prerequisites

Small portions of the lectures and exercises are dedicated to refreshing these concepts as well!

Lecture materials
