Gujarati/How to use Unicode in creating Gujarati script

Introduction

This page contains an Indic script. Without sufficient text support you may see irregular vowel placements and no conjuncts.

The Gujarati script is used to write the Gujarati language. This topic started as a stub, then became a sub-page under Gujarati script, and was finally flagged as a candidate for Wikibooks. Here we attempt to explain, for non-native users of the script, both the somewhat complicated typography of the Gujarati script and the somewhat complicated manner in which it is implemented in Unicode. For the sake of standardization, Unicode implements twelve South Asian scripts with a similar set of rules. This means that once you know how to use Unicode to create Gujarati script, you can apply a similar methodology to other Indic scripts such as Devanagari, Bengali, Gurmukhi, and others, provided you have basic knowledge of, and the required fluency in, the respective writing system.

The Basics

The Gujarati alphabet mainly includes 34 consonants (ornamented sounds), 2 compound characters that are treated as consonants (though not lexically), and 14 vowels (pure sounds). Overall, the writing system comprises 94 legitimate and recognized distinct symbols or shapes. In the current Unicode 4.1 implementation, however, only some of these symbols are encoded as characters of their own. The remaining shapes are created as conjuncts.

Introductory knowledge of Gujarati language and script can be obtained from

Framework of a Gujarati symbol

Given a constructed Gujarati syllable, it can be logically divided into the following parts based on the position of the shapes involved.

1. Baseline area – this is the placeholder for consonants and independent vowels
2. Area below and above the baseline – used for placing lower (below-base) and upper (above-base) dependent vowels respectively
3. Area before and after the baseline – used for writing left (pre-base) and right (post-base) dependent vowels respectively

Examples (clockwise from top-left): 1. Post-based (Right) 2. Below-based (Lower) 3. Pre-based (Left) 4. Above-based (Upper). We will use these conventions in the discussion that follows.

What is Substitution?

Substitution, in the sense applicable here, means replacing a set or group of characters or shapes with a single character or shape. In practical terms, this means that 1) multiple key-strokes generate a single shape, and 2) the resulting shape keeps transforming itself (based on certain rules) in accordance with the user's key-strokes or inputs.

Substitution can happen when you add one or more shapes in any of the positions other than the baseline area (see illustration above).

Unicode Code-set

The Unicode range for Gujarati script is from U+0A80 to U+0AFF. The ISCII Code-page identifier for Gujarati script is 57010.
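As a minimal sketch of the range stated above, the following Python snippet checks whether a character falls inside the Gujarati block. The helper name `is_gujarati` is hypothetical and is not part of any library.

```python
GUJARATI_START = 0x0A80  # first code point of the block
GUJARATI_END = 0x0AFF    # last code point of the block

def is_gujarati(ch: str) -> bool:
    """Return True if the single character ch lies in U+0A80..U+0AFF."""
    return GUJARATI_START <= ord(ch) <= GUJARATI_END

# The word "Gujarati" written in Gujarati script, spelled with escapes.
word = "\u0A97\u0AC1\u0A9C\u0AB0\u0ABE\u0AA4\u0AC0"
print(all(is_gujarati(c) for c in word))
print(is_gujarati("A"))
```

Note that the check only tests block membership; reserved code points inside the block also pass it.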

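The table below shows the glyphs that are implemented in Unicode standard 4.0.0. Empty cells indicate the code-points that are reserved/unused. Combining signs are shown on a dotted circle (◌).

| x= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| U+0A8x | | ◌ઁ | ◌ં | ◌ઃ | | અ | આ | ઇ | ઈ | ઉ | ઊ | ઋ | ઌ | ઍ | | એ |
| U+0A9x | ઐ | ઑ | | ઓ | ઔ | ક | ખ | ગ | ઘ | ઙ | ચ | છ | જ | ઝ | ઞ | ટ |
| U+0AAx | ઠ | ડ | ઢ | ણ | ત | થ | દ | ધ | ન | | પ | ફ | બ | ભ | મ | ય |
| U+0ABx | ર | | લ | ળ | | વ | શ | ષ | સ | હ | | | ◌઼ | ઽ | ◌ા | ◌િ |
| U+0ACx | ◌ી | ◌ુ | ◌ૂ | ◌ૃ | ◌ૄ | ◌ૅ | | ◌ે | ◌ૈ | ◌ૉ | | ◌ો | ◌ૌ | ◌્ | | |
| U+0ADx | ૐ | | | | | | | | | | | | | | | |
| U+0AEx | ૠ | ૡ | ◌ૢ | ◌ૣ | | | ૦ | ૧ | ૨ | ૩ | ૪ | ૫ | ૬ | ૭ | ૮ | ૯ |
| U+0AFx | | ૱ | | | | | | | | | | | | | | |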

For further details regarding Unicode Code-points and standards, you may refer to Unicode Code-chart — Standard 4.1.
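The chart above can also be regenerated from Python's built-in Unicode database. This is only a sketch: the function name `gujarati_chart_rows` is hypothetical, and because a Python build may ship a newer Unicode version than the one described here, it may report more assigned code points than the chart shows.

```python
import unicodedata

def gujarati_chart_rows():
    """Yield (row_label, cells) for U+0A80..U+0AFF, 16 code points per row.

    A cell holds the character itself, or None when this Python build's
    Unicode database has no name for the code point (reserved/unused).
    """
    for base in range(0x0A80, 0x0B00, 0x10):
        cells = []
        for offset in range(16):
            ch = chr(base + offset)
            cells.append(ch if unicodedata.name(ch, None) else None)
        # "U+0A80" becomes the row label "U+0A8x"
        yield f"U+{base:04X}"[:-1] + "x", cells

for label, cells in gujarati_chart_rows():
    print(label, " ".join(c if c else "." for c in cells))
```

Reserved positions are printed as a dot, which plays the role of the empty cells in the chart.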

Examples

Note: In the examples shown in the sections below, the "+" sign denotes the combination of key-strokes or user inputs.

Half-form of consonants

Half-forms of consonants are used in the pre-base position. For consonants that do not have a distinct glyph for the half-form, a Halant (્) is used to create the half-form as follows:

મ +્ + ય = મ્ય

— as in રમ્ય (pleasant)

(Note the Half-form of મ, which is used here in conjunction with ય.)

Note: A Half-form is not created for the base glyph even if the syllable ends with a Halant.
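The key-stroke sequence above corresponds to three stored code points; the half-form itself is chosen by the font and rendering engine at display time. The following sketch shows the stored sequence. The helper `conjunct` is a hypothetical name introduced only for illustration.

```python
import unicodedata

HALANT = "\u0ACD"  # GUJARATI SIGN VIRAMA, called Halant in this text

def conjunct(*consonants: str) -> str:
    """Join consonants with a Halant between each pair (hypothetical helper)."""
    return HALANT.join(consonants)

mya = conjunct("\u0AAE", "\u0AAF")  # MA + Halant + YA
for ch in mya:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The string has a length of three even though it is displayed as a single conjunct shape.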

Application of Upper-based form of Ra – (Reph)

Placing Ra with a Halant (the Half-form of Ra, as seen above) before a full-form consonant produces a Reph on that consonant. The Ra is drawn above the consonant but is still pronounced before it. A Reph can be created as follows:

ર +્	= Ra + Halant
ર +્ + થ = ર્થ	— as in અર્થ (meaning)

(Ra + Halant + થ = Reph effect on થ)
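A rough way to find Reph candidates in stored text is to look for Ra + Halant followed by a consonant. The sketch below makes simplifying assumptions: the function names are hypothetical, the consonant test is a plain code-point range that also covers a few reserved positions, and a real shaping engine applies further conditions before drawing a Reph.

```python
RA = "\u0AB0"      # GUJARATI LETTER RA
HALANT = "\u0ACD"  # GUJARATI SIGN VIRAMA

def is_consonant(ch: str) -> bool:
    """True for code points in the consonant range KA..HA (U+0A95..U+0AB9)."""
    return 0x0A95 <= ord(ch) <= 0x0AB9

def reph_positions(text: str) -> list:
    """Return indices where Ra + Halant is followed by a consonant."""
    return [
        i for i in range(len(text) - 2)
        if text[i] == RA and text[i + 1] == HALANT and is_consonant(text[i + 2])
    ]

artha = "\u0A85\u0AB0\u0ACD\u0AA5"  # A + RA + Halant + THA
print(reph_positions(artha))
```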

Application of Lower-based form of Ra – (Vattu)

Applying a consonant with a Halant (the Half-form of the consonant) to a full-form Ra produces a Vattu on that consonant. The Ra is drawn below the consonant and is pronounced after it. A Vattu can be created as follows:

પ +્ + ર = પ્ર

— as in પ્રજા (people)

(પ + Halant + Ra = Vattu effect on પ)
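The Vattu case is the mirror image of the Reph case in stored text: a consonant and a Halant come first, and Ra comes last. The sketch below locates such sequences. The function name `vattu_positions` is hypothetical, and the range test for consonants is a simplification.

```python
RA = "\u0AB0"      # GUJARATI LETTER RA
HALANT = "\u0ACD"  # GUJARATI SIGN VIRAMA

def vattu_positions(text: str) -> list:
    """Return indices of consonants that are followed by Halant + Ra."""
    found = []
    for i in range(len(text) - 2):
        is_consonant = 0x0A95 <= ord(text[i]) <= 0x0AB9
        if is_consonant and text[i + 1] == HALANT and text[i + 2] == RA:
            found.append(i)
    return found

praja = "\u0AAA\u0ACD\u0AB0\u0A9C\u0ABE"  # PA + Halant + RA + JA + AA
print(vattu_positions(praja))
```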

Vattu variants

Vattu variants (half and full) are formed when consonants carrying the vattu mark are combined. In some cases, a special glyph is required to represent the vattu for a particular consonant.

ડ +્ + ર = ડ્ર

— as in ડ્રમ (drum)

(special glyph ડ્ર. Notice the two lower-based marks, as compared to only one in the previous example.)

Special Marks, Characters and Nukta

Above-based marks

All above-based marks and post-based matra are created as follows:

ક +ં = કં

— as in કંપન (vibration)

Below-based marks

The below-based marks and post-based matra are created as follows:

ક +ુ = કુ	— as in કુતરો (dog)
ભ +ૂ = ભૂ	— as in ભૂકંપ (earthquake)

Characters શ્ર, ક્ષ and જ્ઞ

The following characters are part of the Gujarati alphabet but are not encoded as single characters in the Unicode character set. They can be generated as indicated below:

શ +્ + ર = શ્ર

ક +્ + ષ = ક્ષ

જ +્ + ઞ = જ્ઞ
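The three sequences above can be written out as code points, which makes it clear that each one is stored as consonant + Halant + consonant. The dictionary keys in this sketch are informal romanized labels chosen only for illustration.

```python
import unicodedata

HALANT = "\u0ACD"  # GUJARATI SIGN VIRAMA

SPECIAL_CONJUNCTS = {
    "shra": "\u0AB6" + HALANT + "\u0AB0",  # SHA + Halant + RA
    "ksha": "\u0A95" + HALANT + "\u0AB7",  # KA + Halant + SSA
    "gnya": "\u0A9C" + HALANT + "\u0A9E",  # JA + Halant + NYA
}

for label, seq in SPECIAL_CONJUNCTS.items():
    names = [unicodedata.name(ch) for ch in seq]
    print(label, seq, names)
```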

Application of Nukta

Nukta affects the pronunciation of the (preceding) consonant to which it is applied. A Nukta form of a consonant can be created in Unicode as follows:

ય +઼ = ય઼
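In stored text the Nukta directly follows its consonant. The sketch below prints the code points of the example and the canonical combining classes of Nukta and Halant, and checks that NFC normalization leaves the sequence as two code points.

```python
import unicodedata

NUKTA = "\u0ABC"   # GUJARATI SIGN NUKTA
HALANT = "\u0ACD"  # GUJARATI SIGN VIRAMA

ya_nukta = "\u0AAF" + NUKTA  # YA followed directly by Nukta

print([f"U+{ord(c):04X}" for c in ya_nukta])
print(unicodedata.combining(NUKTA), unicodedata.combining(HALANT))
# NFC leaves this sequence unchanged: the Gujarati block has no
# precomposed consonant-with-nukta characters to compose into.
print(unicodedata.normalize("NFC", ya_nukta) == ya_nukta)
```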

Substitutions for specific typography of the script

Following are the main character substitutions which are required to address the complexity of the language and to generate various character forms of the script:

Pre-base substitutions

The half-form conjunctions, one of the most common occurrences of the script, are created by pre-base substitutions.

ન +્ + ન = ન્ન

— as in પ્રસન્ન (happy)

Also, the special use of this substitution is in creating I-Matra (and its appropriately aligned shape) as shown below:

ત +િ = તિ

— as in તિર (arrow)
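The I-Matra example is a useful reminder that the stored order follows the key-strokes, not the visual layout. In this sketch the consonant is at index 0 and the vowel sign at index 1, even though the vowel sign is drawn to the left of the consonant.

```python
import unicodedata

ti = "\u0AA4\u0ABF"  # TA followed by VOWEL SIGN I

# The consonant is stored first; the renderer moves the vowel sign
# to the pre-base (left) position when the text is displayed.
for index, ch in enumerate(ti):
    print(index, f"U+{ord(ch):04X}", unicodedata.name(ch))
```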

Post-base substitutions

Consonants of the Gujarati script do not have post-based forms. Primarily, post-based substitution is used to create visarga out of vowels, and is also applied for the long "I-Matra" (ી) substitutions as follows (which will precede any above-based substitution, if applied as well):

જ +ી = જી

— as in જીવન (life)

(Compare the special shape જી – a result of post-based substitution – with another result of similar combination using a character like લ, which will generate: લ +ી = લી)

Above-base substitutions

Above-based substitution is mainly applied for Matra, Reph, vowel modifications and for stress and tone marks. Consider the following examples:

વ +ૈ = વૈ	— as in વૈભવ (pompousness)
ર +્ + ગ +ે = ર્ગે	— as in સ્વર્ગે (in heaven)
મ +ે +ં = મેં	— as in મેંઢક (frog)
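To see what is stored for clusters that carry above-base marks, it can help to list the Unicode name of every code point. The helper `describe` below is a hypothetical name used only for this sketch.

```python
import unicodedata

def describe(text: str) -> list:
    """Return the Unicode name of every code point in text, in stored order."""
    return [unicodedata.name(ch) for ch in text]

rge = "\u0AB0\u0ACD\u0A97\u0AC7"  # RA + Halant + GA + VOWEL SIGN E
mem = "\u0AAE\u0AC7\u0A82"        # MA + VOWEL SIGN E + ANUSVARA

for sample in (rge, mem):
    print(sample, describe(sample))
```

In the first sample the Reph and the vowel sign both end up above the consonant GA, yet the Ra and Halant that produce the Reph are stored before GA.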

Below-base substitutions

Mainly used for below-based matra, the below-based substitution could produce a conjunction, or change the whole shape of the glyph. This substitution is also used for producing special tone effect like anudatta.

More details on Gujarati Unicode

For further details on Gujarati Unicode, you may refer to Unicode Std 4.0.0 - Chapter 9
TDIL: Ministry of Communication & Information Technology, India
If you are creating a web-page on a system whose OS language is not Gujarati, save the file as UTF-8 encoded HTML. Otherwise the Gujarati code-points may be lost.
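One way to follow this advice programmatically is to write the file with an explicit UTF-8 encoding and declare the same encoding in the page. This is a minimal sketch; the file name `sample.html` and the page content are placeholders.

```python
import os
import tempfile

# The word "Gujarati" written in Gujarati script, spelled with escapes.
text = "\u0A97\u0AC1\u0A9C\u0AB0\u0ABE\u0AA4\u0AC0"

html = (
    "<!DOCTYPE html>\n"
    '<html lang="gu">\n'
    '<head><meta charset="utf-8"><title>Sample</title></head>\n'
    f"<body><p>{text}</p></body>\n"
    "</html>\n"
)

# Write with an explicit encoding instead of relying on the OS default.
path = os.path.join(tempfile.mkdtemp(), "sample.html")
with open(path, "w", encoding="utf-8") as handle:
    handle.write(html)

# Reading the file back as UTF-8 recovers the original code points.
with open(path, encoding="utf-8") as handle:
    round_trip = handle.read()
print(text in round_trip)

# A legacy single-byte encoding cannot represent the text at all.
try:
    text.encode("latin-1")
except UnicodeEncodeError:
    print("latin-1 cannot encode Gujarati text")
```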