• Statistical Transliteration of Devanagari to Arabic

    Automated transliteration of Devanagari into the Arabic script cannot be performed accurately using rule-based approaches. Not only do the structural differences of the alpha-syllabic system of Devanagari and the abjad system of Arabic pose challenges, but the incongruent character repertoires of the two scripts presents additional hurdles. This Python prototype converts the Hindi/Urdu/Hindustani languages in Devanagari into the Arabic script using both rule-based and simple statistical techniques. The core function is a simple spelling validator. As expected, some Arabic ‘corrections’​ are wrong. An enhancement based upon n-gram analysis is being developed.

  • Devanagari Orthographic Syllable Analyzer

    A Python implementation of Unicode Standard Annex (UAX) #29 “Unicode Text Segmentation”​, which analyzes grapheme clusters of Devanagari. Such clusters are synonymous with orthographic syllables of the script. The algorithm will be extended for performing identification of Sanskrit meters using machine learning. Note: iOS devices may not correctly or at all display some glyphs on account of known bugs in Apple’s Devanagari font.

  • Jumble Solver with WordNet Integration

    Produces permutations of a scrambled string and validates productions against the WordNet lexicon to return plausible English words and definitions. I coded this as a prototype for a validator of jumbled strings in South Asian languages, which are written in alpha-syllabic scripts. Permutation of text in these languages requires analysis of syllabic tokens instead of individual letters as in English.