LaTeX/Internationalization
LaTeX requires some additional configuration to typeset documents in languages other than English. There are currently two packages providing international language support, namely, Babel and Polyglossia:
- Babel[1] works with the three main engines, namely, pdfTeX, LuaTeX and XeTeX. Depending on the engine the number of supported languages (with various levels of coverage) goes from about 170 to 300, and new ones can be declared easily from scratch. It also provides partial support for Plain TeX.
- Polyglossia was devised as an alternative to Babel for XeTeX (although it currently also provides partial support for LuaTeX, but not for pdfTeX). It supports about 90 languages.
Both packages cover the major languages around the world (French, Spanish, Arabic, Chinese, Japanese, Thai, Hindi, Marathi, etc.) and handle the following tasks:
- Fonts
- Setting the script and language tags of the current font, if possible, and switching between fonts for each language, as specified by the user (mainly XeTeX and LuaTeX). With Babel + LuaTeX the font can be switched automatically based on script. See also the discussion of fontspec in the Fonts chapter.
- Linebreaking, justification and hyphenation
- Activating for each script and language the corresponding line breaking algorithm. In the case of hyphenated languages, loading the language-specific hyphenation patterns. Babel provides basic line breaking for CJK scripts, as well as non-standard hyphenation, like “ff” → “ff-f”, repeated hyphens, and ranked rules. There is also some tentative support for Arabic and Tibetan justification.
- Cultural elements
- Translating document labels (like “chapter”, “figure”, “bibliography”), as well as formatting dates according to language-specific conventions and formatting numbers for languages that have their own numbering system. Polyglossia can generate the current date in the Hebrew, Islamic (Civil) and Persian calendars; Babel supports in addition Islamic Umm al-Qura, Coptic, Ethiopic, Chinese, and Buddhist.
- Bidirectional typesetting
- Supporting documents that contain right to left scripts. Babel + LuaTeX uses an algorithm based on the Unicode one, which changes the direction automatically. Layout elements such as tables, margins and so on must be reversed too, and this is done by Babel with LuaTeX to a great extent. With XeTeX, both Babel and Polyglossia rely on the bidi package, which requires explicit markup to change the direction.
- Typographical rules and transliterations
- Performing miscellaneous transformations both at the character level (like transliterations) and at the typographical level (like inserting spaces or penalties at appropriate places). Babel with LuaTeX can do this automatically by means of “transforms”; with XeTeX this can be done to some extent (both Babel and Polyglossia), while in 8-bit engines many of them must be done by hand.
With Babel, LaTeX ≥ 2018-04-01, and a monolingual document in UTF-8 encoding (which is the recommended encoding), all you need in many European languages is something like, for example:
\documentclass[french]{article}
\usepackage[T1]{fontenc} % <- With XeTeX or LuaTeX, delete this line
\usepackage{babel}
\begin{document}
Plus ça change, plus c'est la même chose!
\end{document}
|
In addition, there are some specialized frameworks for languages like Japanese, Korean or Chinese, described below.
Encodings
Unicode engines
When using the xelatex or lualatex engines, many of the problems described below are solved for you. Input files are assumed to be UTF-8 (XeLaTeX also accepts UTF-16 and UTF-32), and the engine automatically maps Unicode characters to their glyphs in the TrueType or OpenType fonts you selected for your document. (This is, of course, assuming those fonts contain the glyphs you need, so you must ensure that your fonts support the languages you are using.)
8-bit engines
With engines not supporting Unicode internally (latex or pdflatex), LaTeX must handle two fundamental problems:
- Mapping the bytes of your input file into the characters of the language(s) you want to use.
- Mapping those characters to their glyphs in the fonts your document uses.
With these engines, you must tell LaTeX which encoding your input files use, and which "output" encoding it should use to map characters to their glyphs in the fonts. In most cases (especially for multilingual documents), UTF-8 is the best input encoding, and it is the default in LaTeX ≥ 2018-04-01.
For most Latin languages, T1 is the desired output encoding, and can be set with:
\usepackage[T1]{fontenc}
|
Other output encodings for specific languages are shown below.
For additional information, see the discussion of encoding in the Fonts chapter, as well as the Special Characters chapter.
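The two mappings described above can be declared explicitly, which is mainly useful with LaTeX formats older than 2018-04-01. The following is a minimal sketch for pdfLaTeX; the sample sentence is only illustrative and was chosen because all its characters are available in T1.

```latex
\documentclass{article}
% Input encoding: maps the bytes of the file to characters.
% Redundant with LaTeX >= 2018-04-01, where UTF-8 is the default.
\usepackage[utf8]{inputenc}
% Output encoding: maps characters to glyphs in the font.
\usepackage[T1]{fontenc}
\begin{document}
Žluťoučký kůň, façade, Ærøskøbing, Straße.
\end{document}
```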
Babel
The core package babel supports the three main engines (pdfLaTeX, LuaLaTeX and XeLaTeX). There are two ways to specify the document languages. One of them is as arguments to the package when it is loaded:
\usepackage[language]{babel}
|
Another approach is making the language a global option in order to let other packages detect and use it:
\documentclass[language]{article}
\usepackage{babel}
|
In addition, babel provides total or partial support for about 250 languages with a set of ini files, which are accessed with \babelprovide. This command can also be used to easily define your own language from scratch.
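As an illustration of loading a locale from its ini file, here is a minimal sketch for LuaLaTeX or XeLaTeX, modelled on the kind of example given in the babel manual. The choice of Georgian and of the DejaVu Sans font are assumptions made for the example; any locale with an ini file and any font containing the required glyphs should work the same way.

```latex
\documentclass{article}
\usepackage{babel}
% import: read the locale data from the ini file
% main:   make it the main language of the document
\babelprovide[import, main]{georgian}
% Assumed font; it must contain Georgian glyphs.
\babelfont{rm}{DejaVu Sans}
\begin{document}
ქართული ენა
\end{document}
```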
Babel will automatically activate the appropriate hyphenation rules for the language you choose. If your LaTeX format does not support hyphenation in the language of your choice, babel will still work but will disable hyphenation, which has quite a negative effect on the appearance of the typeset document (with LuaLaTeX, however, hyphenation rules can be loaded when the document is being typeset). Babel also specifies new commands for some languages, which simplify the input of special characters. See the sections about languages below for more information.
If you call babel with multiple languages:
\usepackage[languageA,languageB]{babel}
|
then the last language in the option list (languageB here) will be active. Short texts in a secondary language do not require an explicit declaration when loading babel: just select the language as explained below, and its basic declarations will be loaded on the fly. With several languages available, you can use the command
\selectlanguage{languageA}
|
to change the active language (when the document begins, with \begin{document}, the main language is automatically selected). You can also add short pieces of text in another language using the command
\foreignlanguage{languageB}{Text in another language}
|
Babel also offers various environments for entering larger pieces of text in another language:
\begin{otherlanguage}{languageB}
Text in language B. This environment switches all language-related definitions, like the language
specific names for figures, tables etc. to the other language.
\end{otherlanguage}
|
The starred version of this environment typesets the main text according to the rules of the other language, but keeps the language-specific strings for ancillary things like figures in the main language of the document. The environment hyphenrules switches only the hyphenation patterns used; it can also be used to disallow hyphenation by using the language name 'nohyphenation' (but note that otherlanguage* is preferred).
The babel manual provides much more information on these and many other options.
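The selectors described above can be combined in one document. The following is a minimal sketch with English as the main language and German as the secondary one; the sample sentences are only illustrative, and the nohyphenation language is assumed to be available in your TeX installation.

```latex
\documentclass{article}
\usepackage[T1]{fontenc} % delete this line with XeTeX or LuaTeX
\usepackage[ngerman,english]{babel} % the last language (english) is active
\begin{document}
English text with \foreignlanguage{ngerman}{einem kurzen deutschen Einschub}.

% Switch everything (hyphenation, labels, date) to German ...
\selectlanguage{ngerman}
Ab hier ist Deutsch die aktive Sprache.

% ... and back to English.
\selectlanguage{english}

% Starred environment: German rules for the text,
% but labels such as the figure name stay in English.
\begin{otherlanguage*}{ngerman}
Ein längerer deutscher Abschnitt.
\end{otherlanguage*}

% Only the hyphenation patterns are switched (here: none at all).
\begin{hyphenrules}{nohyphenation}
No word in this paragraph will be hyphenated.
\end{hyphenrules}
\end{document}
```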
Font management
If you are using XeTeX or LuaTeX, Babel supports OpenType fonts with fontspec. To ease font handling, it provides the macro \babelfont, which switches the font across languages and sets the OpenType ‘language system’ (i.e., language and script). Let us assume you are setting up a document in Swedish, with some words in Hebrew, with a font suited for both languages:
\babelfont{rm}{FreeSerif}
|
If, on the other hand, you have to resort to different fonts, you would say:
\babelfont{rm}{Iwona}
\babelfont[hebrew]{rm}{FreeSerif}
|
Also, with version ≥ 3.38 the locale identifiers (\language and \localeid) and the fonts can be switched without explicit markup, depending on the script (only LuaTeX). In the following example, bidi=basic switches the direction, and onchar=ids fonts switches the identifiers and the font:
\documentclass{article}
\usepackage[swedish, bidi=basic]{babel}
\babelprovide[import, onchar=ids fonts]{hebrew}
\babelfont{rm}{Iwona}
\babelfont[hebrew]{rm}{FreeSerif}
\begin{document}
Svenska עִבְרִית svenska.
\end{document}
|
Bidirectional texts
Babel provides basic support for bidi texts, mainly in LuaTeX. The package option bidi may take three values, namely, default, basic-r, and basic. With bidi=basic, RTL and LTR text can be mixed without explicit markup (only LuaTeX).
Multilingual versions
It is possible in LaTeX to typeset the content of one document in several languages and to choose upon compilation which language to output in predefined strings (chapter name, date, etc.). Using the commands above in multilingual documents can be cumbersome, and therefore babel provides a way to define shorter names. With
\babeltags{de = german}
|
you can write:
text \textde{German text} text
text
\begin{de}
German text
\end{de}
text
|
There is a clear drawback to this feature, namely, the ‘prefix’ \text... is heavily overloaded in LaTeX and conflicts with existing macros may arise. The babel manual recommends sticking to the default selectors or defining your own alternatives.
The current language can also be tested by using the iflang package by Heiko Oberdiek (the built-in feature from the babel package is not reliable). Here is a simple example:
\IfLanguageName{ngerman}{Hallo}{Hello}
This makes it easy to distinguish between two languages without defining your own commands. Another approach for localized strings is the translator package.
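To show the language test in context, here is a minimal sketch of a complete document. The macro name \greeting is a hypothetical helper introduced only for this example; it wraps the test so that the same command yields a different string depending on the active language.

```latex
\documentclass{article}
\usepackage[T1]{fontenc} % delete this line with XeTeX or LuaTeX
\usepackage[english,ngerman]{babel} % ngerman (last) is the main language
\usepackage{iflang}
% Hypothetical helper: picks a string according to the active language.
\newcommand{\greeting}{\IfLanguageName{ngerman}{Hallo}{Hello}}
\begin{document}
\greeting{} Welt! % the German branch is taken here

\selectlanguage{english}
\greeting{} world! % the fallback branch is taken here
\end{document}
```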
Polyglossia
When using XeLaTeX or LuaLaTeX, polyglossia provides an alternative to the core babel package for international language support, as described in its manual.
The original aim was to be compatible with babel, but there are a number of differences. For example, the standard mechanism in LaTeX to declare languages, via package or class options, is not recognized, and the user must rely on a set of new commands, as shown in the example. Unlike babel, secondary languages must always be explicitly declared. It also adds the concept of ‘language variant’, while in babel all locales are treated on an equal footing. Not only are languages declared in a non-standard way, but a new way to switch languages has also been devised, with commands like \textenglish or \textlang.
To use polyglossia, load it in your preamble and specify the languages you will be using, along with any language-specific options you wish.
If, for example, we have a document with American English as the main language, and some short texts in French, Bulgarian and Serbian, you might use:
\documentclass{article}
\usepackage{polyglossia}
\setdefaultlanguage[variant=american]{english}
\setotherlanguages{french, bulgarian, serbian}
\newfontfamily\bulgarianfont
{NewComputerModern10}[Script=Cyrillic,Language=Bulgarian]
\newfontfamily\serbianfont
{NewComputerModern10}[Script=Cyrillic,Language=Serbian]
\begin{document}
English. \textlang{french}{French}. \textlang{bulgarian}{Български}.
\textlang{serbian}{Српски}.
\end{document}
|
As a comparison, here is the code with babel:
\documentclass{article}
\usepackage[american]{babel}
\babelfont[*cyrillic]{rm}{NewComputerModern10}
\begin{document}
English. \foreignlanguage{french}{French}.
\foreignlanguage{bulgarian}{Български}. \foreignlanguage{serbian}{Српски}.
\end{document}
|
Specific languages
Here is a collection of language-specific suggestions. If you have experience in a language not listed below, please add some notes about it. Some of the methods described in this chapter may be useful when dealing with non-English author names in bibliographies.
Most of the following assumes you are using babel, but polyglossia supports some of the same commands, although their behavior may be different.
Arabic script
Documents in the Arabic script, including Arabic, Persian, Urdu, Pashto, Kurdish, Uyghur, etc., are best typeset with either XeTeX or LuaTeX. An example with babel and LuaTeX follows (rendering by the browser may be different from an editor):
\documentclass{article}
\usepackage[bidi=basic]{babel}
\babelprovide[import, main]{arabic}
\babelfont{rm}{FreeSerif}
\begin{document}
وﻗﺪ ﻋﺮﻓﺖ ﺷﺒﻪ ﺟﺰﻳﺮة اﻟﻌﺮب ﻃﻴﻠﺔ اﻟﻌﺼﺮ اﻟﻬﻴﻠﻴﻨﻲ )اﻻﻏﺮﻳﻘﻲ( ﺑـ
Arabia أو Aravia )ﺑﺎﻻﻏﺮﻳﻘﻴﺔ Αραβία (، اﺳﺘﺨﺪم اﻟﺮوﻣﺎن ﺛﻼث
ﺑﺎدﺋﺎت ﺑـ “Arabia” ﻋﻠﻰ ﺛﻼث ﻣﻨﺎﻃﻖ ﻣﻦ ﺷﺒﻪ اﻟﺠﺰﻳﺮة اﻟﻌﺮﺑﻴﺔ، إﻻ أﻧﻬﺎ
ﺣﻘﻴﻘﺔً ﻛﺎﻧﺖ أﻛﺒﺮ ﻣﻤﺎ ﺗﻌﺮف ﻋﻠﻴﻪ اﻟﻴﻮم.
\end{document}
|
With XeTeX, you may set bidi=bidi-r, but mixed LR and RL text must be marked up explicitly. The same applies to polyglossia.
babel with LuaTeX provides partial and tentative support for Arabic justification based on kashida (with the ARABIC TATWEEL Unicode character) or on the ‘justification alternatives’ OpenType table (jalt).
An alternative package for LuaTeX is arabluatex, which is an extension for LuaTeX of arabtex, described below. For XeTeX there is arabxetex.
In 8-bit engines, they can be typeset in a number of ways, one of the oldest being arabtex. Add the following code to your preamble:
\usepackage{arabtex}
|
You can input text in either romanized characters or native Arabic script encodings. Use any of the following commands and environments to enter in text:
\< ... >
\RL{ ... }
\begin{arabtext} ... \end{arabtext}.
|
See the ArabTeX Wikipedia article for further details.
You may also use the Arabi package within Babel to typeset Arabic and Persian:
\usepackage{cmap}
\usepackage[LAE,LFE]{fontenc}
\usepackage[arabic,farsi]{babel}
|
You may also copy and paste from PDF files produced with Arabi thanks to the support of the cmap package. You may use Arabi with LyX, or with tex4ht to produce HTML.
Armenian
The Armenian script uses its own characters, which will require you to install a text editor that supports Unicode and will allow you to enter UTF-8 text, such as Texmaker or WinEdt. These text editors should then be configured to compile using XeLaTeX or LuaLaTeX.
Once the text editor is set up to compile with XeLaTeX or LuaLaTeX, the fontspec package can be used to write in Armenian:
\usepackage{fontspec}
\setmainfont{DejaVu Serif}
|
or
\usepackage{fontspec}
\setmainfont{Sylfaen}
|
The Sylfaen font lacks italic and bold, but DejaVu Serif supports them.
See Armenian Wikibooks for further details, especially on how to configure the Unicode supporting text editors to compile with Unicode engines.
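Putting the pieces together, a complete Armenian document might look like the following minimal sketch, to be compiled with XeLaTeX or LuaLaTeX. The sample word (the Armenian name of the language) is only illustrative; DejaVu Serif is used because it also provides italic and bold.

```latex
\documentclass{article}
\usepackage{fontspec}
% Any font with Armenian glyphs works; DejaVu Serif has italic and bold too.
\setmainfont{DejaVu Serif}
\begin{document}
Հայերեն, \textit{Հայերեն}, \textbf{Հայերեն}.
\end{document}
```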
Cyrillic script
Currently the most convenient way to typeset Cyrillic texts is with XeTeX or LuaTeX in the UTF-8 encoding. An example for Russian with these engines, which do not require encoding transformations because everything is done directly in that encoding, is:
\documentclass{article}
\usepackage[russian]{babel}
\babelfont{rm}{DejaVu Serif}
\begin{document}
Россия, находящаяся на пересечении множества культур, а также
с учётом многонационального характера её населения, — отличается
высокой степенью этнокультурного многообразия и способностью к
межкультурному диалогу.
\end{document}
|
Support for Cyrillic in non-Unicode engines is based on standard LaTeX mechanisms plus the fontenc and inputenc packages. babel includes support for the T2* encodings and for typesetting Bulgarian, Russian and Ukrainian texts using Cyrillic letters[2] with non-Unicode engines. AMS-LaTeX packages should be loaded before fontenc and babel. If you are going to use Cyrillic letters in math mode, you also need to load the mathtext package before fontenc:
\usepackage{amsmath,amsthm,amssymb}
\usepackage{mathtext}
\usepackage[T1,T2A]{fontenc}
\usepackage[english,bulgarian,russian,ukrainian]{babel}
|
Generally, babel will automatically choose the default font encoding; for the above three languages this is T2A. However, documents are not restricted to a single font encoding. For multilingual documents using Cyrillic and Latin-based languages it makes sense to include the Latin font encoding explicitly. Babel will take care of switching to the appropriate font encoding when a different language is selected within the document.
On modern operating systems it is beneficial to use Unicode (utf8 or utf8x) instead of KOI8-RU (koi8-ru) as an input encoding for Cyrillic text.
In addition to enabling hyphenation, translating automatically generated text strings, and activating some language-specific typographic rules (like \frenchspacing), babel provides some commands allowing typesetting according to the standards of the Bulgarian, Russian, or Ukrainian languages.
For all three languages, language-specific punctuation is provided: the Cyrillic dash for the text (it is a little narrower than the Latin dash and surrounded by tiny spaces), a dash for direct speech, quotes, and commands to facilitate hyphenation:
Key combination | Action |
---|---|
"| | No ligature at this position. |
"- | Explicit hyphen sign, allowing hyphenation in the rest of the word. |
"--- | Cyrillic emdash in plain text. |
"--~ | Cyrillic emdash in compound names (surnames). |
"--* | Cyrillic emdash for denoting direct speech. |
"" | Similar to "-, but it produces no hyphen sign (used for compound words with hyphen, e.g. x-""y or some other signs as “disable/enable”). |
"~ | Compound word mark without a breakpoint. |
"= | Compound word mark with a breakpoint, allowing hyphenation in the composing words. |
", | Thinspace for initials with a breakpoint in a following surname. |
"` | German opening double quote (,,). |
"' | German closing double quote (“). |
"< | French opening double quote (<<). |
"> | French closing double quote (>>). |
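A few of these key combinations are shown in use in the following minimal sketch for pdfLaTeX with the Russian option. The sample sentences are only illustrative, and the exact spacing around the dash depends on the babel-russian version installed.

```latex
\documentclass{article}
\usepackage[T2A]{fontenc} % with XeTeX or LuaTeX, set a Cyrillic font instead
\usepackage[russian]{babel}
\begin{document}
% "--- gives the Cyrillic dash in running text.
Москва "--- столица России.

% "< and "> give the French-style quotes.
"<Война и мир"> "--- роман Толстого.
\end{document}
```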
The Russian and Ukrainian options of babel define the commands
\Asbuk
\asbuk
|
which act like \Alph and \alph (commands for turning counters into letters, e.g. a, b, c...), but produce capital and small letters of the Russian or Ukrainian alphabets (whichever is the active language of the document).
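Since these commands take a counter name in the same way as \alph does, they can be used to relabel lists. The following is a minimal sketch for pdfLaTeX; the redefinition of the enumerate label is an assumption made for illustration and is placed after \begin{document} so that it is not overridden when the language is selected.

```latex
\documentclass{article}
\usepackage[T2A]{fontenc} % with XeTeX or LuaTeX, set a Cyrillic font instead
\usepackage[russian]{babel}
\begin{document}
% Label first-level enumerate items with small Cyrillic letters.
\renewcommand{\theenumi}{\asbuk{enumi}}
\renewcommand{\labelenumi}{\theenumi)}
\begin{enumerate}
  \item первый пункт;
  \item второй пункт;
  \item третий пункт.
\end{enumerate}
\end{document}
```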
The Bulgarian option of babel provides the commands
\enumBul
\enumLat
\enumEng
|
which make \Alph and \alph produce letters of either the Bulgarian or the Latin (English) alphabet. The default behaviour of \Alph and \alph for the Bulgarian language option is to produce letters from the Bulgarian alphabet.
See the Bulgarian translation of "The Not So Short Introduction to LaTeX" [3] for a method to type Cyrillic letters directly from the keyboard using a different distribution.
Chinese
Typesetting Chinese texts (and, in general, CJK script ones) is best done with a complete framework, like CJK or xeCJK, although for short texts or a few words in horizontal typesetting babel with XeTeX and LuaTeX could be enough, with basic line breaking.
CJK Package
One option for Chinese support is the CJK package collection. If you are using a package manager or a portage tree, the CJK collection is usually in a separate package because of its size (mainly due to fonts).
Make sure your document is saved using the UTF-8 character encoding. See Special Characters for more details. Put the parts where you want to write Chinese characters in a CJK environment.
\documentclass{article}
\usepackage{CJK}
\begin{document}
\begin{CJK}{UTF8}{gbsn}
你好
You can mix Latin letters and Chinese.
\end{CJK}
\end{document}
|
The last argument specifies the font. It must fit the desired language, since fonts are different for Chinese, Japanese and Korean. Possible choices for Chinese include:
- gbsn (简体宋体, simplified Chinese)
- gkai (简体楷体, simplified Chinese)
- bsmi (繁體細上海宋體, traditional Chinese)
- bkai (繁體標楷體, traditional Chinese)
In the CTeX distribution (now outdated), six more fonts for simplified Chinese are included, corresponding to default Windows fonts:
- song (宋体, Simsun)
- hei (黑体, Simhei)
- fang (仿宋, STFangSong)
- kai (楷体, STKaiti)
- li (隶书, SimLi)
- you (幼圆, SimYou)
xeCJK Package
When using the XeTeX engine, there is another package called xeCJK, which is based on fontspec and offers an interface similar to that of the CJK package.
When using the package, one can define CJK fonts like this:
\documentclass{article}
\usepackage{xeCJK}
\setCJKmainfont{FZSSK.ttf} % use Foundertype's Chinese font, which has a free license
\begin{document}
你好
You can still mix Latin letters and Chinese!
\end{document}
|
Czech
Czech is supported with:
\usepackage[czech]{babel}
|
UTF-8 allows you to have „czech quotation marks“ directly in your text. Otherwise, there are macros \clqq and \crqq to produce the left and right quotes. You can place quoted text inside \uv.
Copying and searching in PDF
Although Czech letters with diacritical signs are displayed correctly, they cannot be copied or searched in PDF files generated with pdfLaTeX using just the command above. The cmap package solves this for some fonts; for others it is also necessary to load glyphtounicode.
Font | (no additional command) | \usepackage{cmap} | \usepackage[resetfonts]{cmap} | \usepackage{cmap} \input{glyphtounicode} \pdfgentounicode=1 |
---|---|---|---|---|
\usepackage{lmodern} | ešcržýáíédtnúuŠCRŽÁÚ | ěščřžýáíéďťňúůŠČŘŽÁÚ | ěščřžýáíéďťňúůŠČŘŽÁÚ | ěščřžýáíéďťňúůŠČŘŽÁÚ |
\usepackage{ebgaramond} | ešcržýáíédtnúuŠCRŽÁÚ | ešcržýáíédtnúuŠCRŽÁÚ | ešcržýáíédtnúuŠCRŽÁÚ | ěščřžýáíéďťňúůŠČŘŽÁÚ |
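The combination in the last column of the table can be written out as a complete preamble. This is a minimal sketch for pdfLaTeX only; loading cmap before fontenc follows the usual recommendation for that package, and the sample sentence is only illustrative.

```latex
\documentclass{article}
\usepackage{cmap}          % load early, before fontenc
\usepackage[T1]{fontenc}
\usepackage[czech]{babel}
\input{glyphtounicode}     % glyph name to Unicode table (pdfTeX)
\pdfgentounicode=1         % write ToUnicode maps into the PDF
\begin{document}
Příliš žluťoučký kůň úpěl ďábelské ódy.
\end{document}
```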
Devanagari and other Indic scripts
The Devanagari script is used by many languages, including Marathi, Pāḷi, Sanskrit, Hindi, Nepali, Bodo, Konkani, Prakrit. Here is an example for Hindi with babel, for both XeTeX and LuaTeX:
\documentclass{article}
\usepackage[hindi, provide=*]{babel}
\babelfont{rm}{FreeSerif}
\begin{document}
देवनागरी एक भारतीय लिपि है जिसमें अनेक भारतीय भाषाएँ तथा कई विदेशी
भाषाएँ लिखी जाती हैं।
\end{document}
|
Other Indic scripts have a similar setup (Malayalam, Bengali, Sinhala, Telugu, Tamil, Kannada, Assamese, Punjabi, etc.).
If any additional features are required, you need an alternative approach, as illustrated in the following example for Bangla, which sets the option mapdigits for the Arabic digits to be converted to the local ones (only LuaTeX).
\documentclass{article}
\usepackage{babel}
\babelprovide[import, main, mapdigits]{bengali}
\babelfont{rm}{FreeSerif}
\begin{document}
গাইতে গাইতে গায়েন।
\end{document}
|
Mapping the digits is accomplished in XeTeX at the font level, with the option Mapping=, like:
\babelfont{rm}[Mapping=bengalidigits]{FreeSerif}
|
This is actually a XeTeX feature and doesn't require babel. It can be used directly with fontspec.
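For the fontspec-only route, a minimal sketch for XeLaTeX could look as follows. It reuses the mapping name and font from the text above and assumes that both the mapping file and the font are installed on your system.

```latex
\documentclass{article}
\usepackage{fontspec}
% Font-level digit mapping (XeTeX only); names taken from the example above.
\setmainfont[Mapping=bengalidigits]{FreeSerif}
\begin{document}
% Digits typed as 0-9 are expected to be shown with Bengali digits.
2024
\end{document}
```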
Support in pdfTeX is based mainly on the velthuis package. An alternative for XeTeX is latexbangla, which relies on polyglossia.
Finnish
Finnish language hyphenation is enabled with:
\usepackage[finnish]{babel}
|
This will also automatically change document language (section names, etc.) to Finnish.
French
As of version 3.0 of babel-french, it is advised to choose the language as a global option with the following command[4]:
\documentclass[french]{article}
\usepackage{babel}
|
Formerly, you could load French language support with the following command:
\usepackage[frenchb]{babel}
|
or
\usepackage[francais]{babel}
|
There are multiple options for typesetting French documents, depending on the flavor of French: french for Parisian French, and acadian and canadien for new-world French. If you do not know or do not really care, we would recommend using french.
All enable French hyphenation, if you have configured your LaTeX system accordingly. All of these also change all automatic text into French: \chapter prints Chapitre, \today prints the current date in French and so on. A set of new commands also becomes available, which allows you to write French input files more easily. Check out the following table for inspiration:
input code | rendered output |
---|---|
\og guillemets \fg{} | « guillemets » |
M\up{me}, D\up{r} | Mme, Dr |
1\ier{}, 1\iere{}, 1\ieres{} | 1er, 1re, 1res |
2\ieme{} 4\iemes{} | 2e 4es |
\No 1, \no 2 | N° 1, n° 2 |
20~\degres C, 45\degres | 20 °C, 45° |
M. \bsc{Durand} | M. Durand |
\nombre{1234,56789} | 1 234,567 89 |
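Several of the commands from the table are combined in the following minimal sketch of a complete document. The sample sentences are only illustrative.

```latex
\documentclass[french]{article}
\usepackage[T1]{fontenc} % delete this line with XeTeX or LuaTeX
\usepackage{babel}
\begin{document}
% \up raises the abbreviation ending, \bsc sets the name in small caps.
M\up{me} \bsc{Durand} habite au 1\ier{} étage, \no 2.

% \og and \fg produce the guillemets with proper spacing.
Elle a dit \og bonjour \fg{} pour la 2\ieme{} fois.
\end{document}
```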
You may want to typeset guillemets and other French characters directly if your keyboard has them. Running Xorg (*BSD and GNU/Linux), you may want to use the oss variant which features some nice shortcuts, like
Key combination | Character |
---|---|
Alt Gr + w | « |
Alt Gr + x | » |
Alt Gr + Shift + é | É |
Alt Gr + Shift + è | È |
Alt Gr + Shift + ç | Ç |
You will need the T1 font encoding for guillemets to print properly.
For the degree character you will get an error like
! Package inputenc Error: Unicode char \u8:° not set up for use with LaTeX.
The textcomp package will fix it for you.
The great advantage of Babel for French is that it will handle some elements of French typography for you, especially non-breaking spaces before all two-part punctuation marks. So now you can write:
Il répondit: «Ce pain coûte-t-il 2~€?»
|
The non-breaking space before the euro symbol is still necessary because currency symbols and other units are not supported in general (that's not specific to French).
You can use the numprint package along with Babel. It will let you print numbers the French way.
\usepackage[french]{babel}
\usepackage[autolanguage]{numprint} % Must be loaded *after* babel.
% ...
\nombre{123456.123456 e-17}
|
|
You will also notice that the layout of lists changes when switching to the French language. This is customizable using the \frenchsetup command. For more information on what the french option of babel does and how you can customize its behavior, run LaTeX on file frenchb.dtx and read the produced file frenchb.pdf or frenchb.dvi. You can get the PDF version on CTAN.
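As an illustration of this customization, the following minimal sketch asks babel-french to leave the list layout untouched. The StandardLists key is one of the options documented for \frenchsetup; check the documentation of your installed version for the full list.

```latex
\documentclass[french]{article}
\usepackage[T1]{fontenc} % delete this line with XeTeX or LuaTeX
\usepackage{babel}
% Keep the standard LaTeX list layout instead of the French one.
\frenchsetup{StandardLists=true}
\begin{document}
\begin{itemize}
  \item premier point ;
  \item second point.
\end{itemize}
\end{document}
```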
German
You can load German language support using either one of the two following commands (pdfTeX, XeTeX and LuaTeX are supported).
For traditional ("old") German orthography use
\usepackage[german]{babel}
|
or for reform ("new") German orthography use
\usepackage[ngerman]{babel}
|
This enables German hyphenation, if you have configured your LaTeX system accordingly. It also changes all automatic text into German, e.g. “Chapter” becomes “Kapitel”. A set of new commands also becomes available, which allows you to write German input files more quickly even when you don't use the inputenc package. Check out the table below for inspiration. With inputenc, all this becomes moot, but your text also is locked in a particular encoding world.
"A "O "U | Ä Ö Ü |
"a "o "u "s | ä ö ü ß |
"` or \glqq | „ |
"' or \grqq | “ |
\glq \grq | ‚ ‘ |
"< or \flqq | « |
"> or \frqq | » |
\flq \frq | ‹ › |
\dq | " |
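The shorthands from the table allow German text to be typed with ASCII characters only, as in the following minimal sketch. The sample phrases are only illustrative.

```latex
\documentclass{article}
\usepackage[T1]{fontenc} % delete this line with XeTeX or LuaTeX
\usepackage[ngerman]{babel}
\begin{document}
% "` and "' give the German quotes; "o, "u and "s give ö, ü and ß.
"`Sch"one Gr"u"se"'

% Guillemets used the German way, pointing inwards.
">Zitat"<
\end{document}
```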
In German books you sometimes find French quotation marks («guillemets»). German typesetters, however, use them differently. A quote in a German book would look like »this«. In the German-speaking part of Switzerland, typesetters use «guillemets» the same way the French do. A major problem arises from the use of commands like \flq: if you use the OT1 font encoding (which is the default), the guillemets will look like the math symbol “≪”, which turns a typesetter's stomach. T1 encoded fonts, on the other hand, do contain the required symbols. So if you are using this type of quote, make sure you use the T1 encoding.
Decimal numbers usually have to be written like 0{,}5 (not just 0,5). Packages like ziffer enable input like 0,5. Alternatively, one can use the \num command from the siunitx package and (globally) set the decimal marker using
\usepackage[output-decimal-marker={,}]{siunitx}
% ...
\num{0,5}
|
|
Greek
This is the preamble you need to write in the Greek language.
\usepackage[greek]{babel}
|
This preamble enables hyphenation and changes all automatic text to Greek. A set of new commands also becomes available, which allows you to write Greek input files more easily.
Modern Monotonic Greek, as well as Polytonic and Ancient Greek are supported.
If you also need a language in the Latin script and you are using LuaTeX, you can switch the font automatically in the following way, with no explicit markup:
\documentclass{book}
\usepackage[portuguese, greek]{babel}
\babelprovide[onchar=ids fonts]{portuguese}
\babelfont{rm}{FreeSerif}
\babelfont[portuguese]{rm}{DejaVu Sans}
\begin{document}
abelha -- μελισσα
\end{document}
|
There is a dedicated package for XeTeX named xgreek.
Hungarian
Use the following lines:
\usepackage[magyar]{babel}
|
More information is available in Hungarian.
Icelandic and Faroese
The following lines can be added to write Icelandic text:
\usepackage[icelandic]{babel}
|
This changes text like Part into Hluti. It makes additional commands available:
"` or \glqq | „ |
\grqq | “ |
\TH | Þ |
\th | þ |
\DH | Ð |
\dh | ð |
To make special characters such as Þ and Æ become available just add:
\usepackage[T1]{fontenc}
|
The default LaTeX font encoding is OT1, but it contains only 128 characters. The T1 encoding contains letters and punctuation characters for most of the European languages using the Latin script.
Italian
Italian is well supported by LaTeX. Just add
\usepackage[italian]{babel}
|
at the beginning of your document and the output of all the commands will be translated properly.
Norwegian
Norwegian is well supported by LaTeX. Just add
\usepackage[norsk]{babel}
|
at the beginning of your document and the output of all the commands will be translated properly.
Japanese
jlreq
The jlreq package provides a class file and JFM (Japanese font metric) files for LuaTeX-ja / pLaTeX / upLaTeX. It aims to implement the Requirements for Japanese Text Layout.
upTeX, pTeX
There is a variant of TeX intended for Japanese named upTeX, which supports vertical typesetting.
luatexja
Another way to write in Japanese is to use LuaLaTeX and the luatex-ja package. The following example is adapted from the LuaTeX-ja documentation:
\documentclass{ltjsarticle}
\usepackage{luatexja} % This line is unnecessary when using ltjclasses or ltjsclasses.
\begin{document}
\section{はじめてのLua\TeX-ja}
ちゃんと日本語が出るかな?
\subsection{出たかな?}
長い文章を入力するとちゃんと右端のところで折り返されるかな?
大丈夫そうな気がするけど.ちょっと不安だけど何事も挑戦だよね.
\end{document}
|
You can also use capabilities provided by the fontspec package and those provided by luatexja-fontspec to declare the font you want to use in your paper. Let us take an example:
% **********************************
% Basic setup
\documentclass[10pt,a4paper]{article}
\usepackage{fontspec}
\setmainfont[Numbers={OldStyle,Proportional}]{Arno Pro} %setup of western font
\usepackage{luatexja}
\usepackage{luatexja-fontspec}%needed to call \setmainjfont below
\setmainjfont[BoldFont=KozGoPr6N-Bold]{KozGoPr6N-Regular} %setup of japanese font
%***********************************
\begin{document}
It is a test to show japanese and english mix. テスト中です。どうですか皆さん。
\end{document}
|
Use UTF-8 as your encoding. In case you don't know how to do this, take a look at Texmaker, a LaTeX editor which uses UTF-8 by default.
luatex-ja can collaborate with babel. For example:
\documentclass{ltjbook}
\usepackage[ngerman,japanese]{babel}
|
For short Japanese texts (a few words or a few paragraphs) in a document in another language, babel (≥3.31) with LuaTeX could be enough, e.g.:
\usepackage[ngerman]{babel}
\babelprovide[import]{japanese}
\babelfont[japanese]{rm}{IPAMincho}
|
For the hyperref package to show the table of contents correctly, the encoding has to be explicitly specified:
\usepackage[unicode=true]{hyperref}
|
CJK, XeCJK, bxcjkjatype
Another (but old) way to support Japanese is provided by the CJK or XeCJK package collections. If you are using a package manager or a portage tree, the CJK collection is usually in a separate package because of its size (mainly due to fonts).
Make sure your document is saved using the UTF-8 character encoding. See Special Characters for more details. Put the parts where you want to write Japanese characters in a CJK environment.
\documentclass{article}
\usepackage{CJK}
\begin{document}
\begin{CJK}{UTF8}{min}
こんにちは
You can mix latin letters as well as hiragana, katakana and kanji.
\end{CJK}
\end{document}
|
The last argument specifies the font. It must fit the desired language, since fonts are different for Chinese, Japanese and Korean. min is an example for Japanese.
The bxcjkjatype package provides a working configuration of the CJK package, suitable for Japanese typesetting of moderate quality. Moreover, it facilitates use of the CJK package for pLaTeX users, by providing commands that are similar to those used by the pLaTeX kernel and some other packages used with it.
\documentclass[pdflatex,ja=standard]{bxjsarticle}
\begin{document}
吾輩は猫である。名前はまだ無い。
どこで生れたかとんと見当がつかぬ。
何でも薄暗いじめじめした所で
ニャーニャー泣いていた事だけは記憶している。
吾輩はここで始めて人間というものを見た。
\end{document}
|
Korean
The two most widely used encodings for Korean text files are EUC-KR and its upward-compatible extension used in Korean MS-Windows, CP949/Windows-949/UHC. In these encodings each US-ASCII character keeps its normal ASCII code, as in other ASCII-compatible encodings such as ISO-8859-x, EUC-JP, Big5, or Shift_JIS. Hangul syllables, Hanja (Chinese characters as used in Korea), Hangul Jamo, Hiragana, Katakana, Greek and Cyrillic characters, and the other symbols and letters drawn from KS X 1001 are represented by two consecutive octets, the first of which has its most significant bit (MSB) set. Until the mid-1990s, it took a considerable amount of time and effort to set up a Korean-capable environment under a non-localized (non-Korean) operating system. You can skim through the now much-outdated http://jshin.net/faq to get a glimpse of what it was like to use Korean under a non-Korean OS in the mid-1990s.
TeX and LaTeX were originally written for scripts with no more than 256 characters in their alphabet. To make them work for languages with considerably more characters such as Korean or Chinese, a subfont mechanism was developed. It divides a single CJK font with thousands or tens of thousands of glyphs into a set of subfonts with 256 glyphs each.
For Korean, there are three widely used packages.
- HLaTeX by UN Koaunghi
- hLaTeXp by CHA Jaechoon
- the CJK package by Werner Lemberg
HLaTeX and hLaTeXp are specific to Korean and provide Korean localization on top of the font support. Both can process Korean input text files encoded in EUC-KR. HLaTeX can even process input files encoded in CP949/Windows-949/UHC and UTF-8 when used along with Λ and Ω.
The CJK package is not specific to Korean. It can process input files in UTF-8 as well as in various CJK encodings, including EUC-KR and CP949/Windows-949/UHC, and it can be used to typeset documents with multilingual content (especially Chinese, Japanese and Korean). The CJK package has no Korean localization such as the one offered by HLaTeX, and it does not come with as many special Korean fonts as HLaTeX.
The ultimate purpose of using typesetting programs like TeX and LaTeX is to get documents typeset in an aesthetically satisfying way. Arguably the most important element in typesetting is a set of well-designed fonts. The HLaTeX distribution includes UHC PostScript fonts of 10 different families and Munhwabu fonts (TrueType) of 5 different families. The CJK package works with a set of fonts used by earlier versions of HLaTeX, and it can use Bitstream's Cyberbit TrueType font.
To use the HLaTeX package for typesetting your Korean text, put the following declaration into the preamble of your document:
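For the CJK package route, a Korean document might look like the sketch below. It mirrors the CJK environment used elsewhere in this chapter; the font family name mj is an assumption (a Korean family commonly shipped alongside CJK) and must be available in your TeX installation, and the file must be saved as UTF-8.

```latex
\documentclass{article}
\usepackage{CJK}
\begin{document}
% "mj" is an assumed Korean font family; substitute one you have installed.
\begin{CJK}{UTF8}{mj}
안녕하세요. Latin letters can be mixed with Hangul.
\end{CJK}
\end{document}
```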
\usepackage{hangul}
|
This command turns the Korean localization on. The headings of chapters, sections, subsections, the table of contents and the list of figures are all translated into Korean, and the formatting of the document is changed to follow Korean conventions. The package also provides automatic particle selection. In Korean, there are pairs of postfix particles that are grammatically equivalent but different in form. Which member of a given pair is correct depends on whether the preceding syllable ends with a vowel or a consonant. (It is a bit more complex than this, but this should give you a good picture.) Native Korean speakers have no problem picking the right particle, but the right particle cannot be determined in advance for references and other automatic text that changes while you edit the document. It takes a painstaking effort to place appropriate particles manually every time you add or remove references or simply shuffle parts of your document around. HLaTeX relieves its users of this boring and error-prone process.
In case you don't need Korean localization features but just want to typeset Korean text, you can put the following line in the preamble, instead.
\usepackage{hfont}
|
For more details on typesetting Korean with HLaTeX, refer to the HLaTeX Guide and the web site of the Korean TeX User Group (KTUG).
The FAQ section of KTUG recommends using the kotex package:
\usepackage{kotex}
|
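A minimal complete document using kotex could look like the following sketch. It assumes a UTF-8 encoded source file and an installation that includes the Korean fonts kotex relies on; the sample sentence is purely illustrative.

```latex
\documentclass{article}
\usepackage{kotex}  % Korean support recommended by the KTUG FAQ
\begin{document}
한국어 문서를 조판합니다.
\end{document}
```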
Persian script
For the Persian language, there is a dedicated package called XePersian which uses XeLaTeX as the typesetting engine. Just add the following code to your preamble:
\usepackage{xepersian}
|
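In a full document this might look like the sketch below. The font name is only a placeholder assumption; \settextfont should be given a Persian-capable font that is actually installed on your system, and the file must be compiled with XeLaTeX.

```latex
% Compile with XeLaTeX.
\documentclass{article}
\usepackage{xepersian}     % usually loaded as the last package
\settextfont{XB Niloofar}  % assumed font name; use any installed Persian font
\begin{document}
سلام دنیا
\end{document}
```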
Moreover, Arabic script can be used to type Persian as illustrated in the corresponding section.
Polish
If you plan to use Polish in your document, use the following code:
\usepackage{polski}
\usepackage[polish]{babel}
|
The above code merely allows you to use Polish letters and translates the automatic text to Polish, so that "chapter" becomes "rozdział". A few additional points are worth remembering.
Connectives
Polish has many single-letter connectives ("a", "o", "w", "i", "u", "z", etc.), and grammar and typography rules do not allow them to end a printed line. To ensure that LaTeX does not set one as the last letter of a line, follow it with a non-breakable space (~):
Noc była sierpniowa, ciepła i~słodka, Księżyc oświecał srebrnem światłem wgłębienie, tak,
że twarze małego rycerza i~Basi były skąpane w blasku.
Poniżej, na podwórzu zamkowem, widać było uśpione kupy żołnierzy, a~także i~ciała zabitych
podczas dziennej strzelaniny, bo nie znaleziono dotąd czasu na ich pogrzebanie.
|
Babel (>=3.58) with LuaTeX provides a transform for this purpose, without explicit markup, which is activated with:
\babelprovide[transforms = oneletter.nobreak]{polish}
|
Numerals
According to Polish grammar rules, you have to put dots after numerals in chapter, section, subsection, etc. headers.
This is achieved by redefining a few LaTeX macros.
For books:
\renewcommand\thechapter{\arabic{chapter}.}
\renewcommand\thesection{\arabic{chapter}.\arabic{section}.}
\renewcommand\thesubsection{\arabic{chapter}.\arabic{section}.\arabic{subsection}.}
\renewcommand\thesubsubsection{\arabic{chapter}.\arabic{section}.\arabic{subsection}.\arabic{subsubsection}.}
|
For articles:
\renewcommand\thesection{\arabic{section}.}
\renewcommand\thesubsection{\arabic{section}.\arabic{subsection}.}
\renewcommand\thesubsubsection{\arabic{section}.\arabic{subsection}.\arabic{subsubsection}.}
|
Alternatively you can use dedicated document classes:
- the mwart class instead of article,
- mwbk instead of book
- and mwrep instead of report.
These classes apply many more European typographic conventions, but they do not depend on Polish babel settings or on a particular character encoding.
Simple usage:
\documentclass{mwart}
\usepackage[polish]{babel}
\usepackage{polski}
\begin{document}
Pójdź kińże tę chmurność w głąb flaszy.
\end{document}
|
Full documentation for those classes is available at http://web.archive.org/web/20040609034031/http://www.ci.pwr.wroc.pl/~pmazur/LaTeX/mwclsdoc.pdf (Polish).
Indentation
It may be customary (depending on publisher) to indent the first paragraph in sections and chapters:
\usepackage{indentfirst}
|
Hyphenation and typography
Hyphenating a word across a page break is frowned upon much more than in American typesetting.
To adjust the penalty for hyphenation spanning pages, use this command:
\brokenpenalty=1000
|
To adjust the penalties for leaving widows and orphans (clubs in TeX nomenclature), use these commands:
\clubpenalty=1000
\widowpenalty=1000
|
Commas in math
According to some typography rules, fractional parts of numbers should be delimited by a comma, not a dot. To make LaTeX not insert additional space in math mode after a comma (unless there is a space after the comma), use the icomma package.
\usepackage{icomma}
|
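The behaviour described above can be tried with a short test document such as this sketch: a comma typed without a following space is treated as a decimal separator, while a comma followed by a space keeps the usual punctuation spacing.

```latex
\documentclass{article}
\usepackage{icomma}
\begin{document}
% No space after the comma in the source: set tightly, as a decimal number.
$3,14$
% Space after the comma in the source: normal punctuation spacing is kept.
$f(x, y)$
\end{document}
```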
Unfortunately, it is partially incompatible with the dcolumn package. One needs to either use dots in columns with numerical data in the source file and make dcolumn switch them to commas for display or define the column as follows:
\begin{tabular}{... D{,}{\mathord\mathcomma}{2} ...}
|
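The first of the two approaches, keeping dots in the source and letting dcolumn print commas, can be sketched as follows. The D column takes the separator used in the source, the separator to print, and the number of decimal places; the table contents are made up for illustration.

```latex
\documentclass{article}
\usepackage{dcolumn}
\begin{document}
% Source uses ".", output shows ",", with at most two decimal places.
\begin{tabular}{l D{.}{,}{2}}
Item & \multicolumn{1}{c}{Price} \\
A    & 3.14 \\
B    & 12.5 \\
\end{tabular}
\end{document}
```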
One alternative is the numprint package, but it is much less convenient.
Another alternative is the siunitx package, which lets you typeset numbers and their corresponding units consistently. Number alignment in tables and different output modes are supported.
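A minimal siunitx sketch for decimal commas is shown below. It relies on the output-decimal-marker option and the S column type; the table data and the table-format value are illustrative assumptions.

```latex
\documentclass{article}
\usepackage{siunitx}
\sisetup{output-decimal-marker={,}}  % print a comma as the decimal marker
\begin{document}
Inline number: \num{3.14}

% The S column aligns numbers on the decimal marker;
% non-numeric cells such as headers are protected with braces.
\begin{tabular}{l S[table-format=2.2]}
Item & {Price} \\
A    & 3.14 \\
B    & 12.5 \\
\end{tabular}
\end{document}
```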
Further information
Refer to the Słownik Ortograficzny (in Polish) for additional information on Polish grammar and typography rules.
A good extract is available at Zasady Typograficzne Składania Tekstu (in Polish).
Portuguese
Add the following code to your preamble:
\usepackage[portuguese]{babel}
|
For Brazilian Portuguese, replace portuguese with brazilian or brazil.
Slovak
Basic settings are fine when left the same as for Czech, but Slovak needs special signs for 'ď', 'ť', 'ľ'. To be able to type them from the keyboard, use the following settings:
\usepackage[slovak]{babel}
\usepackage[T1]{fontenc}
|
Spanish
Include the appropriate Babel option:
\usepackage[spanish]{babel}
|
The trick is that Spanish has several options and commands to control the layout. The options may be loaded either at the call to Babel, or before, by defining the command \spanishoptions. Therefore, the following commands are roughly equivalent:
\def\spanishoptions{mexico}
\usepackage[spanish]{babel}
|
\usepackage[spanish,mexico]{babel}
|
In general, the former syntax should be preferred, as the latter is not recognized by some programs (LyX, latex2rtf) interacting with LaTeX.
Spanish also defines shorthands for the dot and << >> so that they are used as logical markup: the former is used as the decimal marker in math mode, and the output is typically either a comma or a dot; the latter is used for quoted text, and the output is typically either «» or “”. This allows different typographical conventions with the same input, as preferences may differ considerably between, say, Spain and Mexico.
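The two shorthands can be tried with a short document like the sketch below. It assumes the default Spanish settings (no es-noquoting or es-noshorthands option); the \decimalpoint command of the Spanish style is used to show how the same input can yield a different decimal marker.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[spanish]{babel}
\begin{document}
% << and >> are logical quotation marks; the dot in math is a decimal marker.
Dijo <<hola>> y midió $3.14$ metros.

\decimalpoint % switch the decimal marker to a dot for what follows
Ahora el mismo número se escribe $3.14$.
\end{document}
```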
Two particularly useful options are es-noquoting,es-nolists: some packages and classes are known to collide with Spanish in the way they handle active characters, and these options disable the internal workings of Spanish to allow you to overcome these common pitfalls. Moreover, these options may simplify the way LyX customizes some features of the Spanish layout from inside the GUI.
The options mexico,mexico-com provide support for local custom in Mexico: the former using decimal dot, as customary, and the latter allowing decimal comma, as formerly required by the Mexican Official Norm (NOM) of the Department of Economy for labels in foods and goods. More localizations are in the making.
The other commands modify the Spanish layout after loading Babel. Two particularly useful commands are \spanishoperators and \spanishdeactivate.
The macro \spanishoperators{<list of operators>} contains a list of Spanish mathematical operators, and may be redefined at will. For instance, the command
\def\spanishoperators{sen}
|
only defines sen, overriding all other definitions; the command \let\spanishoperators\relax disables them all. This command supports accented or spaced operators: the \acute{<letter>} command puts an accent, and the \, command adds a small space.
For instance, the following operators are defined by default.
l\acute{i}m l\acute{i}m\,sup l\acute{i}m\,inf m\acute{a}x
\acute{i}nf m\acute{i}n sen tg arc\,sen arc\,cos arc\,tg
cotg cosec senh tgh
|
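As a rough sketch of how such a redefinition might be used, the document below keeps only three operators. It assumes that each list entry yields a command named after the entry with the spacing command removed (so that arc\,sen gives \arcsen), and that the redefinition in the preamble after loading Babel takes effect; check spanish.dtx if the result differs.

```latex
\documentclass{article}
\usepackage[spanish]{babel}
% Keep only these three operators (assumption: entries map to \sen, \tg, \arcsen).
\def\spanishoperators{sen tg arc\,sen}
\begin{document}
$\sen^2 x + \tg x = \arcsen y$
\end{document}
```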
Finally, the macro \spanishdeactivate{<list of characters>} disables some active characters, to keep you out of trouble if they are redefined by other packages. The candidates for deactivation are the set {<>."'}. Please beware that some options preempt the availability of some active characters. In particular, you should not combine the es-noquoting option with \spanishdeactivate{<>}, or the es-noshorthands option with \spanishdeactivate{<>."}.
Please check the documentation for Babel or spanish.dtx for further details.
Thai
Both babel (LuaTeX and XeTeX) and polyglossia (only XeTeX) support Thai. Word division in LuaTeX is based on the standard hyphenation mechanism, so that patterns can be modified with \babelpatterns, while XeTeX relies on its own built-in mechanism. In pdfTeX you need an external tool for word segmentation (like swath). An example with babel (LuaTeX and XeTeX) is:
\documentclass{book}
\usepackage{babel}
\babelprovide[main, import]{thai}
\babelfont{rm}{FreeSerif}
\begin{document}
ปัจจุบันข้าวและพริกเป็นส่วนประกอบสำคัญที่สุดของอาหารไทย
\end{document}
|
Tibetan
One option to use Tibetan script in LaTeX is to add
\usepackage{ctib}
|
to your preamble and use a slightly modified Wylie transliteration for input. Refer to the excellent package documentation for details. More information can be found on [1].
babel for LuaTeX provides tentative support for justification with trailing tshegs.[2]
Vietnamese
The following preamble can be used to type Vietnamese directly (XeTeX or LuaTeX).
\documentclass{article}
\usepackage{fontspec}%
\setmainfont[Ligatures=TeX]{Linux Libertine O}
|
For a document written in this language:
\documentclass{article}
\usepackage[vietnamese]{babel}
|
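Combining the two fragments above gives a complete document along the lines of this sketch. It assumes compilation with XeLaTeX or LuaLaTeX, that the Linux Libertine O font is installed, and that Vietnamese support for babel is available in your TeX distribution; the sample sentence is illustrative.

```latex
% Compile with XeLaTeX or LuaLaTeX.
\documentclass{article}
\usepackage{fontspec}
\setmainfont[Ligatures=TeX]{Linux Libertine O}
\usepackage[vietnamese]{babel}  % Vietnamese captions and hyphenation
\begin{document}
Tiếng Việt là ngôn ngữ chính thức của Việt Nam.
\end{document}
```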
References
- Babel: The multilingual framework to localize LaTeX, LuaLaTeX, XeLaTeX
- The Not So Short Introduction to LaTeX, 2.5.6 Support for Cyrillic, Maksym Polyakov
- The Not So Short Introduction to LaTeX, Bulgarian translation
- babel-french documentation: "the French language should now be loaded as french, not as frenchb or francais and preferably as a global option of \documentclass. Some tolerance still exists in v3.0, but do not rely on it."