Machine Translation/Statistics

Statistical machine translation

Language models

Language models are used in MT for (a) scoring arbitrary sequences of words (tokens) and (b) predicting which token is likely to follow a given sequence of tokens. Formally, a language model is a probability distribution over sequences of tokens in a given language.
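The two uses can be illustrated with a minimal sketch of a bigram model, one simple kind of language model. The toy corpus, the boundary markers `<s>` and `</s>`, the add-one smoothing, and all function names are assumptions made for this illustration only; they are not taken from any particular MT toolkit.

```python
import math
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"  # hypothetical sentence-boundary markers


def train_bigram(sentences):
    """Count bigrams over tokenised sentences padded with boundary markers."""
    counts = defaultdict(Counter)
    vocab = {EOS}  # tokens the model may predict (BOS is never predicted)
    for tokens in sentences:
        padded = [BOS] + tokens + [EOS]
        vocab.update(tokens)
        for prev, cur in zip(padded, padded[1:]):
            counts[prev][cur] += 1
    return counts, vocab


def prob(counts, vocab, prev, cur):
    """P(cur | prev) with add-one smoothing (an assumption of this sketch)."""
    context = counts.get(prev, Counter())
    return (context[cur] + 1) / (sum(context.values()) + len(vocab))


def score(counts, vocab, tokens):
    """Use (a): log-probability of a whole token sequence."""
    padded = [BOS] + tokens + [EOS]
    return sum(math.log(prob(counts, vocab, p, c))
               for p, c in zip(padded, padded[1:]))


def predict_next(counts, vocab, prev):
    """Use (b): the most probable token after `prev` (ties broken alphabetically)."""
    return max(sorted(vocab), key=lambda w: prob(counts, vocab, prev, w))


corpus = [["the", "cat", "sleeps"],
          ["the", "dog", "sleeps"],
          ["the", "cat", "eats"]]
counts, vocab = train_bigram(corpus)

fluent = score(counts, vocab, ["the", "cat", "sleeps"])
scrambled = score(counts, vocab, ["sleeps", "cat", "the"])
print(fluent > scrambled)                  # the observed word order scores higher
print(predict_next(counts, vocab, "the"))  # most likely continuation of "the"
```

In this sketch a sequence seen in the corpus receives a higher score than a scrambled version of the same words, which is the property an MT system relies on when it uses a language model to rank candidate translations by fluency.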

N-gram models

Character-based models

It has been shown that sub-words, characters or even bytes can serve as the basic units for language modelling, in place of whole words[citation needed]. A few events focus specifically on such models and, more generally, on processing language data at the sub-word level, e.g. SCLeM 2017.
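As a rough illustration of what changing the basic unit means, the sketch below splits the same string into characters and into UTF-8 bytes. The function name `to_units` and the example string are assumptions for this illustration; sub-word segmentation is not shown, since it needs a vocabulary learned from data.

```python
def to_units(text, level):
    """Split text into the basic units a model would be trained on."""
    if level == "char":
        return list(text)                  # one unit per Unicode character
    if level == "byte":
        return list(text.encode("utf-8"))  # one unit per byte, as integers 0-255
    raise ValueError(f"unknown level: {level}")


word = "na\u00efve"  # "naive" written with a diaeresis on the i
chars = to_units(word, "char")
byte_units = to_units(word, "byte")
print(len(chars), len(byte_units))  # the accented letter takes two bytes in UTF-8
```

A byte-level model never meets an unknown symbol, because every text maps onto the same 256 possible values, but the sequences it has to model are longer than the corresponding character sequences.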

Translation models

IBM models 1-5

Phrase-based models

Factored translation models

Syntax- and tree-based models

Synchronous phrase grammar

Parallel tree-banks

Syntactic rules extraction

Decoding

Hybrid systems

Computer-aided translation

Translation memory
