ASSAMESE AND BENGALI CONTROVERSY IN UNICODE STANDARD ::::: SOLUTIONS

The Unicode Consortium, a non-governmental body headquartered in the U.S.A. whose members include governmental agencies of many countries, has standardised and maintains a Universal Character Set (UCS), i.e. a standard that defines, in one place, all the characters needed for writing the majority of living languages in use on computers. It aims to be, and to a large extent already is, a superset of all other character sets that have been encoded. Unicode (as the UCS is commonly referred to) has room for over a million characters, of which about 100,000 have already been defined. These include characters for all the world’s main languages along with a selection of symbols for various purposes.

REASONS FOR DISSENT AMONG THE ASSAMESE :

1. Non-representation or misrepresentation of the Assamese writing system in the Unicode Standard, because the Unicode Consortium and the Government of India think that the current Bengali code chart is sufficient for using the Assamese language on computers.

2. The script is named Bengali, all character descriptors in the Unicode code chart follow the Bengali nomenclature, and the Assamese are forced to use it. Neither the Government of India nor the Unicode Consortium is willing to do anything positive about it. Both treat it as a political issue, cite multiple technical difficulties in solving it, and try to convince the complainants that nothing is wrong with it.

3. But the fact remains that the Assamese letter ৰ (Ro) is described as BENGALI LETTER RA WITH MIDDLE DIAGONAL (U+09F0) in the Bengali chart of the Unicode Standard.

4. The Assamese letter ৱ (Wobo) is described as BENGALI LETTER RA WITH LOWER DIAGONAL (U+09F1) in the Bengali chart of the Unicode Standard.

5. Thirteen other Assamese letters are similarly misrepresented in the Bengali chart of the Unicode Standard.

6. The Assamese letter ক্ষ (Khya) is not represented at all in the Bengali code chart of the Unicode Standard.

7. Gross collation errors occur when sorting software is run on Assamese text. These were thought to arise because (Ro) and (Wobo) are not in their proper place and ক্ষ (Khya) is not represented at all in the Bengali code chart of the Unicode Standard.

A Unicode expert’s latest communication rules out the first of these two causes (the misplacement of Ro and Wobo); no comment is available on the second (the absence of Khya).
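The descriptors cited in points 3, 4 and 6 can be inspected with Python's standard `unicodedata` module. What follows is only a minimal sketch, not part of the original complaint. It assumes that (Ro) and (Wobo) are the characters at U+09F0 and U+09F1, and that ক্ষ (Khya) is typed as the three-character sequence KA + VIRAMA + SSA, since the chart gives it no code point of its own.

```python
import unicodedata

RO = "\u09F0"                # assumed to be the Assamese letter Ro
WOBO = "\u09F1"              # assumed to be the Assamese letter Wobo
KHYA = "\u0995\u09CD\u09B7"  # assumed spelling of Khya: KA + VIRAMA + SSA


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for label, text in [("Ro", RO), ("Wobo", WOBO), ("Khya", KHYA)]:
    print(label, describe(text))
```

Under these assumptions, Ro and Wobo each print a single entry carrying a "BENGALI LETTER RA WITH ..." name, while Khya prints three entries, one per code point in the sequence.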


SOLUTIONS UNDER CONSIDERATION :

1. RENAMING OF THE SCRIPT AND ALTERNATIVE NOMENCLATURE OF THE CHARACTER DESCRIPTORS

This is stated first because the Government of India seems more interested in solving the issue this way. Many have proposed renaming the current Bengali script in the Unicode Standard with a name acceptable to all. Renaming raises problems on both the Bengali and the Assamese side and, most importantly, has a technical problem associated with it.

A. Will the Bengali community agree to it, considering that the present Bengali code chart serves their purpose quite well? The Bengali community lives in two sovereign countries, India and Bangladesh.

B. The major problem lies on the Assamese side: will the renaming be limited to the name of the script and code chart, or will it also cover the nomenclature of the misrepresented character descriptors? For example, the following Assamese characters have Bengali descriptors, different from how they would be described in Assamese.

Supposing renaming is taken up as the best solution for the controversy, the whole current Bengali code chart of the Unicode Standard will need alternative nomenclature, beginning with a script title such as ASSAMESE AND BENGALI, and the individual characters will also need alternative character descriptors like this :

U+09B8 e0 a6 b8 = BENGALI LETTER SA / ASSAMESE LETTER XA (DONTIYA)

U+09AF e0 a6 af = BENGALI LETTER YA / ASSAMESE LETTER ZA (ANTUSTYA)
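The code point, UTF-8 bytes and current descriptor in the two lines above can be reproduced with a few lines of Python. This is only an illustrative sketch; the helper name `chart_entry` is made up here, and the output shows only the single name the standard carries today, not the proposed dual descriptor.

```python
import unicodedata


def chart_entry(code_point):
    """Format one character as 'U+XXXX <utf-8 bytes> = <current Unicode name>'."""
    ch = chr(code_point)
    utf8 = " ".join(f"{byte:02x}" for byte in ch.encode("utf-8"))
    return f"U+{code_point:04X} {utf8} = {unicodedata.name(ch)}"


for cp in (0x09B8, 0x09AF):
    print(chart_entry(cp))
```

Each code point maps to exactly one name in this lookup, which is the property the dual-descriptor proposal would have to relax.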

If such an alteration is possible, so that every character is given both Assamese and Bengali descriptors, the script is renamed with an acceptable name, and the displaced and missing Assamese characters (Ro), (Wobo) and ক্ষ (Khya) are put in their proper place in the chart, the problem may be solved.

The latest communication from some Unicode experts reports that the misplaced letters do not cause the collation error, because collation is determined by the CLDR (Common Locale Data Repository) of Unicode rather than by position in the chart. The collation error has, however, been present in Assamese for a long time and persists. One anonymous person has reported it in the Unicode Forum. No comment is available on the absence of the letter ক্ষ (Khya) as a possible cause.
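The difference between chart position and collation order can be sketched in Python. The example below is a toy illustration only: it does not use CLDR data, and the list `ASSUMED_ORDER` is a hypothetical five-letter ordering (Ya, Ro, La, Wobo, Sha) chosen for the demonstration. It shows that sorting by raw code point pushes U+09F0 and U+09F1 to the end, while a separate ordering table can place them anywhere regardless of their code points.

```python
# Five consonants, deliberately shuffled: Ro, La, Ya, Wobo, Sha
letters = ["\u09F0", "\u09B2", "\u09AF", "\u09F1", "\u09B6"]

# Assumption for illustration only: the order Ya, Ro, La, Wobo, Sha.
# This list is hypothetical and is not copied from CLDR data.
ASSUMED_ORDER = ["\u09AF", "\u09F0", "\u09B2", "\u09F1", "\u09B6"]
RANK = {ch: i for i, ch in enumerate(ASSUMED_ORDER)}


def code_point_sort(items):
    """Sort by raw code point, i.e. by position in the code chart."""
    return sorted(items)


def tailored_sort(items):
    """Sort by the assumed ordering table; raises KeyError for unlisted letters."""
    return sorted(items, key=lambda s: [RANK[ch] for ch in s])


def show(items):
    """Render a list of letters as code points for easy comparison."""
    return " ".join(f"U+{ord(ch):04X}" for ch in items)


print("code point order:", show(code_point_sort(letters)))
print("tailored order:  ", show(tailored_sort(letters)))
```

A real implementation would take its ordering from the locale's collation rules instead of a hand-written table.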


But as per the basic principle of a unique code, one particular entity has one identifier; in this case around fifteen characters would have one identifier for two entities.

If the Unicode Consortium or the Indian Government thinks that this basic principle of unique codification can be violated, then the matter may be acceptable to the Assamese and Bengali alike.

2. SEPARATE SLOT/RANGE FOR THE ASSAMESE SCRIPT

If renaming in the way described above is not possible, then allocation of a separate slot/range for the Assamese script remains the only solution, and it is perhaps easier for the Unicode Consortium to do. The Government of Assam has also moved the Government of India seeking a separate slot/range for the Assamese script. Allocating a separate slot/range for the Assamese script would mean the Unicode Consortium allowing and accepting duplication of characters. The Unicode Consortium has already allowed and accepted not only duplication but, for some characters, triplication in the three major European writing systems, viz. Cyrillic, Greek and Latin. Consequently the Unicode Standard has at least the following number of duplicate characters :

a=2, A=3, B=3, c=2, C=2, e=2, E=3, H=3, i=2, I=3, j=2, J=2, K=2, M=3, N=2, o=2, O=3, p=2, P=3, s=2, S=2, T=2, x=2, X=3, y=2, Y=2 and Z=2

For complete list of characters duplicated between Cyrillic, Greek and Latin click here

This list alone gives a total of 63 (sixty-three) characters duplicated between the three major European writing systems, Cyrillic, Greek and Latin; the actual number is higher.

The number of duplicated characters would perhaps be much smaller than this if the Bengali and Assamese scripts were duplicated and allocated separate slots/ranges of their own.
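The kind of duplication described here, and the total of 63, can be checked with a short Python sketch. It is only an illustration: the three code points chosen for the capital A look-alikes (U+0041, U+0391, U+0410) are an example picked for this sketch, and the tally is copied from the list given above.

```python
import unicodedata

# One example of a triplicated shape: Latin A, Greek Alpha, Cyrillic A.
LOOKALIKE_A = ["\u0041", "\u0391", "\u0410"]

for ch in LOOKALIKE_A:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Tally copied from the list in the text (letter -> number of encoded copies).
TALLY = {
    "a": 2, "A": 3, "B": 3, "c": 2, "C": 2, "e": 2, "E": 3, "H": 3, "i": 2,
    "I": 3, "j": 2, "J": 2, "K": 2, "M": 3, "N": 2, "o": 2, "O": 3, "p": 2,
    "P": 3, "s": 2, "S": 2, "T": 2, "x": 2, "X": 3, "y": 2, "Y": 2, "Z": 2,
}
total = sum(TALLY.values())
print("total encoded copies in the tally:", total)
```

The three characters share a shape but have distinct code points and distinct names, which is the precedent the separate-range option relies on.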

CONCLUSION :

The solution therefore lies in duplication. The first option duplicates the unique codes, meaning a single code for two entities; the second option duplicates characters, meaning two characters of the same appearance. The Unicode Consortium and the Government of India have to choose between the two. Duplication of characters already exists in the Unicode Standard. Whether duplication of unique codes exists, whether it is acceptable to the experts, and whether it is justified is not known, because duplication itself means the loss of uniqueness of any unique code.

Separating Assamese and Bengali in this way is called, in Unicode parlance, disunification.

For full details on the issue go to this webpage

(https://drsatyakamphukan.wordpress.com/assamese-and-unicode)

Dr Satyakam Phukan

General Surgeon

Jorpukhuripar, Uzanbazar

Guwahati, Assam

P.I.N : 781001

Phone: 99540 46357

One response »

  1. Pallab Jyoti Sarmah

    I support Phukan sir. I prefer the first option for unicode consortium, to build a separate block for assamese language.
