  • Expanding Unicode

    Developing Unicode standards for representing the world’s writing systems on digital devices. I have encoded 26 script blocks in Unicode and have submitted proposals for the addition of 16 new scripts.

  • Devanagari Syllable Analyzer

    A Python implementation of Unicode Standard Annex (UAX) #29, “Unicode Text Segmentation”, that analyzes grapheme clusters in Devanagari text. These clusters correspond to the orthographic syllables of the script. The algorithm will be extended to identify Sanskrit meters using machine learning. Note: iOS devices may display some glyphs incorrectly, or not at all, because of known bugs in Apple’s Devanagari font.
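    The idea of grouping Devanagari code points into orthographic syllables can be sketched roughly as follows. This is not the analyzer itself and not a full UAX #29 implementation: it assumes a simplified rule set (combining marks attach to the preceding base, and a consonant attaches to a preceding virama), and the names `is_consonant` and `syllables` are hypothetical.

```python
import unicodedata

VIRAMA = "\u094D"               # DEVANAGARI SIGN VIRAMA
JOINERS = ("\u200C", "\u200D")  # ZWNJ, ZWJ


def is_consonant(ch):
    # Simplified assumption: only the core consonant ranges are covered
    # (U+0915..U+0939 and the nukta forms U+0958..U+095F).
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095F"


def is_mark(ch):
    # Dependent vowel signs, virama, nukta, anusvara, and visarga are all
    # nonspacing (Mn) or spacing (Mc) combining marks.
    return unicodedata.category(ch) in ("Mn", "Mc")


def syllables(text):
    """Split text into approximate orthographic syllables."""
    clusters = []
    for ch in text:
        if clusters:
            last = clusters[-1][-1]
            # Marks and joiners extend the current cluster.
            if is_mark(ch) or ch in JOINERS:
                clusters[-1] += ch
                continue
            # A consonant directly after a virama continues a conjunct.
            if is_consonant(ch) and last == VIRAMA:
                clusters[-1] += ch
                continue
        clusters.append(ch)
    return clusters


print(syllables("हिन्दी"))   # ['हि', 'न्दी']
print(syllables("संस्कृत"))  # ['सं', 'स्कृ', 'त']
```

    A complete implementation would instead follow the grapheme cluster break properties and rules defined in UAX #29, which handle cases this sketch ignores.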

  • Statistical Transliteration of Devanagari to Arabic

    Automated transliteration of Devanagari into the Arabic script cannot be performed accurately using rule-based approaches alone. The structural differences between the alphasyllabic system of Devanagari and the abjad system of Arabic pose challenges, and the incongruent character repertoires of the two scripts present additional hurdles. This prototype, written in Python, converts Hindi/Urdu text written in Devanagari into the Arabic script using both rule-based and simple statistical techniques. As expected, some of the Arabic ‘corrections’ are wrong. The core of this function may be considered a spelling validator, so an enhancement that implements n-gram analysis using TensorFlow is being developed.
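
    One way to combine rules with a simple statistical check is sketched below. It is not the prototype's code. It assumes a tiny hypothetical rule table (`RULES`) that maps each Devanagari unit to one or more Arabic-script options, and a hypothetical word-frequency lexicon (`LEXICON`) with invented counts that acts as the spelling validator. Vowel handling and most of both scripts are deliberately left out.

```python
from itertools import product

# Hypothetical rule table: the first option in each list is the default.
RULES = {
    "\u0938": ["\u0633", "\u0635", "\u062B"],  # स -> س, ص, ث
    "\u0924": ["\u062A", "\u0637"],            # त -> ت, ط
    "\u093E": ["\u0627"],                      # ा -> ا
    "\u092B\u093C": ["\u0641"],                # फ़ (फ + nukta) -> ف
}

# Hypothetical attested spellings with made-up counts.
LEXICON = {
    "\u0635\u0627\u0641": 120,  # صاف
    "\u0633\u0627\u062A": 95,   # سات
}


def options(word):
    """Greedy longest-match lookup of the rule options for each unit."""
    slots, i = [], 0
    while i < len(word):
        for size in (2, 1):
            key = word[i:i + size]
            if key in RULES:
                slots.append(RULES[key])
                i += len(key)
                break
        else:
            # No rule applies: pass the character through unchanged.
            slots.append([word[i]])
            i += 1
    return slots


def transliterate(word):
    """Return the most frequent attested candidate, else the rule default."""
    candidates = ["".join(parts) for parts in product(*options(word))]
    # max() keeps the first candidate on ties, so an unattested word
    # falls back to the purely rule-based default spelling.
    return max(candidates, key=lambda c: LEXICON.get(c, 0))


print(transliterate("साफ़"))  # صاف
print(transliterate("सात"))  # سات
```

    Because the lexicon only scores whole words, a wrong but attested spelling can still win, which is one way ‘corrections’ of this kind go wrong.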