Statistical Transliteration of Devanagari to Arabic
Automated transliteration of Devanagari into the Arabic script cannot be
performed accurately using rule-based approaches alone. Not only do the structural
differences between the alpha-syllabic system of Devanagari and the abjad system of
Arabic pose challenges, but the incongruent character repertoires of the two
scripts present additional hurdles. This Python prototype converts
Hindi/Urdu/Hindustani text written in Devanagari into the Arabic script using
both rule-based and simple statistical techniques. The core function is a simple spelling validator.
As expected of a simple validator, some of its Arabic ‘corrections’ are wrong. An enhancement based upon
n-gram analysis is being developed.
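The combination of rules and a statistical spelling check can be sketched as follows. This is a minimal illustration, not the prototype's actual code: the mapping table `CANDIDATES`, the function names, and the word counts are all hypothetical, and the sketch ignores inherent vowels, conjuncts, and contextual letter forms. It shows only the core idea: where one Devanagari letter corresponds to several Arabic letters, the rules generate every candidate spelling and a frequency lexicon picks among them.

```python
from itertools import product

# Hypothetical, deliberately tiny mapping table (an assumption for illustration).
# One Devanagari letter may correspond to several Arabic letters.
CANDIDATES = {
    "स": ["س", "ص", "ث"],
    "त": ["ت", "ط"],
    "ल": ["ل"],
    "ा": ["ا"],
}

def generate_candidates(word):
    """Rule-based step: expand every character into its possible Arabic letters."""
    options = [CANDIDATES.get(ch, [ch]) for ch in word]
    return ["".join(combo) for combo in product(*options)]

def transliterate(word, lexicon):
    """Statistical step: keep the attested candidate with the highest count.

    `lexicon` maps Arabic-script spellings to corpus counts (hypothetical data).
    If no candidate is attested, fall back to the first rule-based spelling.
    """
    candidates = generate_candidates(word)
    attested = [c for c in candidates if c in lexicon]
    if attested:
        return max(attested, key=lexicon.get)
    return candidates[0]

# Toy counts, invented for the example.
toy_lexicon = {"سال": 120, "صال": 1}
print(transliterate("साल", toy_lexicon))
```

A validator of this kind is only as good as its lexicon: if a wrong spelling happens to be attested more often than the right one, the more frequent form wins, which is one way incorrect ‘corrections’ can arise.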
Devanagari Orthographic Syllable Analyzer
A Python implementation of Unicode Standard Annex (UAX) #29, “Unicode Text
Segmentation”, that segments Devanagari text into grapheme clusters. For this
script, such clusters correspond to orthographic syllables. The algorithm will
be extended to identify Sanskrit meters using machine learning.
Note: iOS devices may display some glyphs incorrectly, or not at all, because of
known bugs in Apple’s Devanagari font.
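The segmentation this analyzer performs can be approximated in a few lines. The sketch below is a simplified heuristic and not a full implementation of the UAX #29 rules: the function name `orthographic_syllables` is hypothetical, and the logic assumes that a cluster is a base character plus any following combining marks, with a virama joining the next consonant into the same cluster.

```python
import unicodedata

VIRAMA = "\u094D"   # DEVANAGARI SIGN VIRAMA
ZWJ = "\u200D"      # ZERO WIDTH JOINER
ZWNJ = "\u200C"     # ZERO WIDTH NON-JOINER

def orthographic_syllables(text):
    """Split text into approximate orthographic syllables.

    Simplified heuristic (an assumption, not the full UAX #29 algorithm):
    - combining marks (categories Mn, Mc) and ZWJ/ZWNJ extend the current cluster;
    - a letter that follows a virama is joined to form a conjunct;
    - anything else starts a new cluster.
    """
    clusters = []
    for ch in text:
        category = unicodedata.category(ch)
        is_extender = category in ("Mn", "Mc") or ch in (ZWJ, ZWNJ)
        prev = clusters[-1] if clusters else ""
        joins_conjunct = category == "Lo" and prev.rstrip(ZWJ).endswith(VIRAMA)
        if clusters and (is_extender or joins_conjunct):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# Two sample words, printed with a separator between clusters.
print(" | ".join(orthographic_syllables("संस्कृत")))
print(" | ".join(orthographic_syllables("हिन्दी")))
```

Working on clusters rather than code points matters for downstream tasks such as meter identification, where the unit of analysis is the syllable and not the individual character.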
Jumble Solver with WordNet Integration
Produces permutations of a scrambled string and validates each one against
the WordNet lexicon to return plausible English words and their definitions. I coded
this as a prototype for a validator of jumbled strings in South Asian languages,
which are written in alpha-syllabic scripts. Permuting text in these
languages requires operating on syllabic tokens rather than on the individual
letters used for English.
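The permute-and-validate approach can be sketched as follows. This is an illustration under stated assumptions rather than the solver's actual code: `solve_jumble` is a hypothetical name, and a plain Python set stands in for the WordNet lexicon so that the example runs without external data. The function takes a list of tokens, so the same code handles English letters and pre-segmented syllabic tokens.

```python
from itertools import permutations

def solve_jumble(tokens, lexicon):
    """Return every distinct arrangement of `tokens` that appears in `lexicon`.

    `tokens` is a list of units to permute: single letters for English,
    orthographic syllables for an alpha-syllabic script.
    `lexicon` is any container supporting `in` (a set stands in for WordNet here).
    """
    seen = set()
    matches = []
    for perm in permutations(tokens):
        candidate = "".join(perm)
        if candidate in lexicon and candidate not in seen:
            seen.add(candidate)
            matches.append(candidate)
    return matches

# English: permute individual letters.
print(solve_jumble(list("tca"), {"cat", "act", "dog"}))

# Devanagari: permute syllabic tokens, not code points.
# The token list is segmented by hand for this example.
print(solve_jumble(["ता", "कि", "ब"], {"किताब"}))
```

With NLTK installed and its WordNet corpus downloaded, the membership test could be replaced by a check that `nltk.corpus.wordnet.synsets(candidate)` returns a non-empty list, and each synset's `definition()` would supply the definitions. Since the number of permutations grows factorially with the number of tokens, this brute-force approach is practical only for short strings.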