What is Punjabi?

Punjabi (sometimes spelt Panjabi) is a language spoken predominantly on the Indian sub-continent by approximately 60 million people. There are two variants: eastern Punjabi, spoken by people from India, and western Punjabi, spoken by people from Pakistan. In India, Punjabi is an official state language and is written in the Gurmukhi script. In Pakistan, Punjabi is written in a script called Shahmukhi. Even though Punjabi is spoken there by approximately 30 million people (representing the single largest linguistic group), it is not an official language of Pakistan. Due to the promotion of Urdu, Punjabi’s formal use in Pakistan is considerably less than in India.

There are also large numbers of Punjabi speakers in countries around the world (especially the UK, Canada and the United States). These are predominantly immigrants or descendants of immigrants from the Indian sub-continent.

What is Gurmukhi?

Gurmukhi is by far the predominant script used for writing Punjabi in eastern (Indian) Punjab. The Gurmukhi script is derived from the Landa alphabet and was standardised by Guru Angad Dev in the 16th century. It forms part of the Brahmi script family.

Gurmukhi should use the locale type pa.

What is Shahmukhi?

Shahmukhi is based on the Arabic script (read from right to left) and is the predominant script for writing Punjabi in western (Pakistan) Punjab. This web site concentrates on the Gurmukhi side of Punjabi. Users wishing to use Punjabi in Shahmukhi are advised to research Arabic (and Urdu) computing.

Shahmukhi should use the locale type pa-PK.

What is Unicode?

Unicode is the international standard whose goal is to assign a single code point (an integer) to every character needed by every written human language. For Indic languages, it provides support for 9 different scripts.

Unicode provides the first well-implemented standard for using a large variety of scripts on computers across the world.
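To make the idea of a code point concrete, here is a minimal Python sketch. It only uses the standard library; the helper name is_gurmukhi is an illustrative assumption and not part of any standard API. It shows that a Gurmukhi letter is stored as a single integer and that the Gurmukhi block occupies U+0A00 to U+0A7F.

```python
import unicodedata

pappa = "\u0A2A"  # the Gurmukhi letter Pappa

# Every character maps to exactly one integer code point, and back again.
code_point = ord(pappa)
print(f"U+{code_point:04X}", unicodedata.name(pappa))

# The Gurmukhi block spans U+0A00..U+0A7F.
GURMUKHI_BLOCK = range(0x0A00, 0x0A80)

def is_gurmukhi(ch):
    """Hypothetical helper: True if ch lies inside the Gurmukhi block."""
    return ord(ch) in GURMUKHI_BLOCK

print(is_gurmukhi(pappa), is_gurmukhi("a"))
```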

What was ISCII?

ISCII was the Indian Script Code for Information Interchange. It was developed by the Indian government to represent Indic scripts uniformly across multiple platforms. It was difficult to implement and did not have widespread backing. The Indic blocks of Unicode were based on the ISCII standard, and Unicode currently has much better support.

The use of ISCII on new projects is NOT recommended!

What are the advantages of using Unicode for Gurmukhi?

Unicode is the international standard for data interchange. It is slowly but surely replacing all other standards used across the world. With a Unicode compatible computer system you can:

  • Name files and folders using Gurmukhi.
  • Search the entire web in Gurmukhi as you can do now for English. Major search engines already support Unicode Gurmukhi.
  • Create programs in Gurmukhi.
  • Have your web page titles in Gurmukhi.
  • Sort and organise data on your entire computer.
  • Exchange data with other users without loss of information and without specific fonts.

What are the disadvantages of using Unicode for Gurmukhi?

Unicode is a good solution for Punjabi, although it does have its drawbacks:

  • Not all features of (older) Gurmukhi can be represented yet. This is not a problem for modern Punjabi.
  • Using Unicode Gurmukhi requires some readjustment in the way it is approached in comparison to font-based Gurmukhi. For example, you may have to use a different keyboard layout.

I’ve already got a web site in Punjabi, how can I convert it to Unicode?

The Punjabi Computing Resource Centre provides software to enable you to migrate font-based Gurmukhi to Unicode. The program in question - the Gurmukhi Unicode Conversion Application - is free and available to download right now! We also have resources that help you make Unicode compatible web sites.

Where can I find Gurmukhi fonts for Unicode?

Most operating systems these days come with one or more Punjabi Unicode fonts. Apple’s macOS comes with the Gurmukhi MN and Gurmukhi MT fonts. Windows comes with Raavi. For Linux, Red Hat has developed Lohit Punjabi. There are many other Unicode-compliant or compatible fonts available for free on the internet.

Why is there no Danda (Purn Viram) or Double Danda (Deergh Viram) for Gurmukhi?

At present, all Indic scripts use the Danda and Double Danda in the Devanagari block at U+0964 and U+0965 respectively. This poses no problem because Unicode enables you to use any characters from different blocks. One example is the use of Latin punctuation in Gurmukhi text.

The Inscript keyboard layout uses the Devanagari Danda at U+0964.
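As a small illustration of mixing blocks, the Python sketch below builds the word ਪੰਜਾਬੀ followed by a Danda and reports which script each character's Unicode name belongs to. The sample word and the variable names are chosen for this example only.

```python
import unicodedata

DANDA = "\u0964"         # encoded in the Devanagari block
DOUBLE_DANDA = "\u0965"  # encoded in the Devanagari block

# Pappa, Tippi, Jajja, Kanna, Babba, Bihari, then the Danda.
sentence = "\u0A2A\u0A70\u0A1C\u0A3E\u0A2C\u0A40" + DANDA

# The first word of each character name is the script it is encoded under.
scripts = [unicodedata.name(ch).split()[0] for ch in sentence]
for ch, script in zip(sentence, scripts):
    print(f"U+{ord(ch):04X} {script}")
```

The text is a perfectly ordinary string: six Gurmukhi characters followed by one character from the Devanagari block.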

Why can I not create Era with Lavan?

For historic and transliteration reasons, Unicode encodes independent vowel forms separately. That means they cannot be created using a combination of the components (e.g. Iri + Lavan).

In some literary texts, Era + Lavan is used in place of Iri + Lavan. After much research, we have concluded that this is indeed a mistake and should be replaced with Iri + Lavan. If you must use this combination, it can be created by using a ZWJ (Zero Width Joiner). That is, Era (U+0A05) + ZWJ (U+200D) + Lavan (U+0A47) to give ਅ‍ੇ. This method is not recommended because not all applications and programs handle the use of ZWJ correctly.
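The sequences discussed here can be inspected with a short Python sketch. It assumes only the standard unicodedata module; the constant names are illustrative. It builds Era + ZWJ + Lavan and also checks that normalisation does not turn Iri + Lavan into the separately encoded independent vowel at U+0A0F.

```python
import unicodedata

ERA = "\u0A05"    # GURMUKHI LETTER A
IRI = "\u0A72"    # GURMUKHI IRI
LAVAN = "\u0A47"  # GURMUKHI VOWEL SIGN EE
ZWJ = "\u200D"    # ZERO WIDTH JOINER
EE = "\u0A0F"     # the independent vowel, encoded as its own character

era_lavan = ERA + ZWJ + LAVAN
for ch in era_lavan:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# The independent vowel is a single code point; composing the
# components does not produce it, even after NFC normalisation.
composed = unicodedata.normalize("NFC", IRI + LAVAN)
print(composed == EE)
```

Because the two spellings stay distinct at the code point level, searching and sorting will treat them as different text even if they look alike on screen.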

Where are all the Paireen characters?

Unicode does not encode Paireen or subjoined characters on their own. To encode a subjoined character, you simply enter the base consonant, the Virama (Halant) sign and finally the full form of the subjoined character you wish to use.

For example, to type ‘pra’, you would enter ਪ + '੍' + ਰ to give ਪ੍ਰ. If your computer does not have the associated glyph for a particular subjoined form, your computer will display the Virama followed by the full form of the following character (like ਪ੍‌ਰ). You can force your computer to display the full form by entering a ZWNJ (Zero Width Non-Joiner - U+200C) after the Virama.
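The two spellings of ‘pra’ can be written out explicitly in Python. This is a minimal sketch; the helper code_points is a name invented for this example.

```python
PAPPA = "\u0A2A"   # base consonant
VIRAMA = "\u0A4D"  # Virama (Halant) sign
RARA = "\u0A30"    # full form of the character to be subjoined
ZWNJ = "\u200C"    # Zero Width Non-Joiner

def code_points(text):
    """Hypothetical helper: list the code points of text as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text]

# Base consonant + Virama + full form: rendered with a Paireen Rara
# when the font has the glyph.
pra = PAPPA + VIRAMA + RARA

# A ZWNJ after the Virama asks the renderer to keep the full form.
pra_full_form = PAPPA + VIRAMA + ZWNJ + RARA

print(pra, code_points(pra))
print(pra_full_form, code_points(pra_full_form))
```

How each sequence is drawn depends on the font and rendering engine; the code only fixes what is stored.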

Why are all the Gurmukhi characters in such a strange order?

Unicode for Indic languages is based on ISCII. ISCII encoded nine different Indic scripts and provided a mechanism to easily switch between the scripts. This enabled users to view any Indian language text in the script of their choice. This was possible because of the many similarities between Brahmi-based scripts.

As a result, equivalent characters in all of these Indic scripts sit at the same relative position within their blocks, and that position follows the layout of the Devanagari block. Because the order is inherited from Devanagari, it does not seem correct to Gurmukhi readers.
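The shared layout can be seen by comparing offsets within the two blocks. The Python sketch below assumes the block starts U+0900 (Devanagari) and U+0A00 (Gurmukhi); the function name gurmukhi_to_devanagari is illustrative, and not every offset is populated in both blocks, so it should not be read as a general transliteration routine.

```python
import unicodedata

DEVANAGARI_BASE = 0x0900  # start of the Devanagari block
GURMUKHI_BASE = 0x0A00    # start of the Gurmukhi block

def gurmukhi_to_devanagari(ch):
    """Map a Gurmukhi character to the Devanagari character at the same offset."""
    offset = ord(ch) - GURMUKHI_BASE
    return chr(DEVANAGARI_BASE + offset)

# Kakka, Pappa and Rara share their offsets with Devanagari KA, PA and RA.
for ch in ("\u0A15", "\u0A2A", "\u0A30"):
    twin = gurmukhi_to_devanagari(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} <-> "
          f"U+{ord(twin):04X} {unicodedata.name(twin)}")
```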

How do I sort Gurmukhi text?

Unicode provides a special algorithm, the Unicode Collation Algorithm (UCA), along with guides for sorting and ordering text. The software you are using must implement this in order to correctly sort Gurmukhi text.
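The Python sketch below shows why a plain code point sort is not enough. It is a toy illustration only: the key function toy_key and the table GURMUKHI_ORDER are assumptions made for this example, they cover just the 35 basic letters, and they are not an implementation of the Unicode collation rules. Real software should rely on a library that implements them.

```python
# Traditional alphabet order: Ura, Aira, Iri, Sassa, Haha, then Kakka onwards.
GURMUKHI_ORDER = (
    [0x0A73, 0x0A05, 0x0A72, 0x0A38, 0x0A39]
    + list(range(0x0A15, 0x0A29))   # Kakka .. Nanna
    + list(range(0x0A2A, 0x0A31))   # Pappa .. Rara
    + [0x0A32, 0x0A35, 0x0A5C]      # Lalla, Vavva, Rarra
)
RANK = {chr(cp): i for i, cp in enumerate(GURMUKHI_ORDER)}

def toy_key(word):
    """Toy sort key: alphabet rank per letter, unknown characters last."""
    return [(RANK.get(ch, len(RANK)), ord(ch)) for ch in word]

KAR = "\u0A15\u0A30"  # Kakka + Rara
SAR = "\u0A38\u0A30"  # Sassa + Rara
HAR = "\u0A39\u0A30"  # Haha + Rara
words = [HAR, KAR, SAR]

by_code_point = sorted(words)             # follows the Devanagari-based layout
by_alphabet = sorted(words, key=toy_key)  # Sassa and Haha come before Kakka

print(by_code_point)
print(by_alphabet)
```

The code point sort puts Kakka first because of where it sits in the block, whereas the alphabet places Sassa and Haha before it.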

Why can I not type characters used in older Gurmukhi (for example, the SGGS) using Unicode?

At present, Unicode Gurmukhi is geared towards typing modern Punjabi. It has not been implemented with archaic forms of Gurmukhi in mind. Older and Sanskritised forms of Gurmukhi break with some modern Gurmukhi conventions. This makes implementing them particularly troublesome, because Unicode rendering engines heavily enforce rules on Indian scripts.

The troublesome rules include:

  • Allowing only one vowel sign to be attached to a consonant, e.g. older texts can use both Hora and Onkar on one consonant.
  • Preventing adaptation of independent vowels, e.g. older texts can place Onkar on ਓ to represent the independent form of Hora and Onkar.
  • Allowing only one form of a conjunct, e.g. older texts use Pari Haha and Udaat as alternative forms of Haha.

These rules do not conflict with modern Gurmukhi – in fact, they complement modern Gurmukhi – but they cause huge difficulties when a user wishes to enter text in a form that breaks with convention.

Many of the problems can be overcome with sporadic use of ZWS (Zero Width Space), ZWJ (Zero Width Joiner) and ZWNJ (Zero Width Non-Joiner). For example, Onkar on ਓ can be created as follows: (Your browser may have trouble rendering this sequence.)

ਓ + ZWJ + 'ੁ' = ਓ‍ੁ

However, the use of these special characters cannot be a long-term solution, and the underlying problem needs to be addressed.

The PCRC has been formulating proposals for many months now and is urgently looking for experts to contribute. If you have in-depth knowledge of non-conventional forms of Gurmukhi and its relation to Sanskrit, Persian and other languages, please contact us.

These proposals should go some way towards addressing most of the issues present. However, they may never be able to address stylistic differences such as using a Bindi/Tippi both before and after Bihari.

Source: This article was originally published at Punjabi Computing Resource Center by Sukhjinder Sidhu.