• Nearest Chicago Public Library

    I developed this app to explore various geolocation and geocoding libraries. The app provides the location of the Chicago Public Library branch nearest to an address specified by the user. I chose this use case because the data set is geographically confined. The app relies on data from the City of Chicago Data Portal. It renders maps and performs geocoding using the Google Maps JavaScript API and the Google Maps Geocoding API. ‘Nearest neighbors’ are determined ‘as the crow flies’ using a simple linear distance algorithm, while mileage is calculated using Vincenty distance. The default map displays all library locations; after an address is provided, the map displays only that address and the nearest library, with the zoom factor determined by the distance between the two locations.
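    The nearest-neighbor step can be sketched as below. The branch names and coordinates are hypothetical stand-ins for the City of Chicago data; the point of the sketch is that, for ranking candidates, squared coordinate deltas suffice and no square root is needed.

```python
import math

# Hypothetical branch coordinates (lat, lon) -- the real app loads these
# from the City of Chicago Data Portal.
branches = {
    "Harold Washington": (41.8763, -87.6282),
    "Sulzer Regional":   (41.9614, -87.6891),
}

def nearest_branch(lat, lon):
    # 'As the crow flies': compare squared coordinate deltas. The branch
    # minimizing the squared distance also minimizes the true distance,
    # so the square root can be skipped.
    return min(
        branches.items(),
        key=lambda kv: (kv[1][0] - lat) ** 2 + (kv[1][1] - lon) ** 2,
    )[0]
```

    Once the winner is known, the mileage shown to the user can be computed with an ellipsoidal distance such as geopy's `geodesic` (the successor to its Vincenty implementation).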

Data Visualization

  • Revisualizing the Unicode Roadmap

    I undertook this project in an effort to present a single view of the Unicode Roadmap to the SMP, which stretches over 90 versions from 2001 to the present and encompasses 298 entries for scripts / blocks. I wrote a web scraper in Python using requests and beautifulsoup4 to parse each version. The data processing initially relied upon a set of dicts generated from the raw data, but I switched to pandas for greater efficiency. The visualization resembles a Gantt chart and is plotted using matplotlib.
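    The pandas step can be sketched as follows. The sample rows are invented for illustration; the real input is the scraped (version, script) entries. Reducing each script to its first and last version yields exactly the per-row spans a Gantt-style plot (e.g. matplotlib's `broken_barh`) needs.

```python
import pandas as pd

# Hypothetical sample of scraped roadmap entries: (version, script) pairs.
rows = [
    ("3.1", "Deseret"), ("3.1", "Gothic"),
    ("4.0", "Gothic"), ("4.0", "Linear B"),
    ("5.0", "Linear B"),
]
df = pd.DataFrame(rows, columns=["version", "script"])

# One row per script with the first and last version it appears in --
# the horizontal span for that script's bar in the chart.
spans = df.groupby("script")["version"].agg(first="min", last="max")
```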

Text Processing

  • Devanagari Syllable Analyzer

    A Python implementation of Unicode Standard Annex (UAX) #29 “Unicode Text Segmentation”, which analyzes grapheme clusters of Devanagari. Such clusters are synonymous with the orthographic syllables of the script. The algorithm will be extended to identify Sanskrit meters using machine learning. Note: iOS devices may display some glyphs incorrectly, or not at all, on account of known bugs in Apple’s Devanagari font.
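    The core segmentation idea can be sketched with a regular expression: a Devanagari orthographic syllable is, roughly, a consonant, zero or more virama-plus-consonant conjunct extensions, and an optional dependent vowel sign. This is a deliberately simplified sketch; the full UAX #29 rules handle many more cases (ZWJ/ZWNJ, nukta, word-final virama, and so on).

```python
import re

# Simplified orthographic-syllable pattern (NOT the full UAX #29 rule set):
#   consonant (virama consonant)* matra?  |  independent vowel
SYLLABLE = re.compile(
    r"[\u0915-\u0939]"                # consonant
    r"(?:\u094D[\u0915-\u0939])*"     # virama + consonant (conjuncts)
    r"[\u093E-\u094C\u0901\u0902]?"   # optional vowel sign / nasalization
    r"|[\u0904-\u0914]"               # or an independent vowel
)

def syllables(text):
    """Split Devanagari text into approximate orthographic syllables."""
    return SYLLABLE.findall(text)
```

    For example, हिन्दी segments into two syllables, हि and न्दी, because the virama (U+094D) binds न and द into a single conjunct cluster.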

  • Statistical Transliteration of Devanagari to Arabic

    Automated transliteration of Devanagari into the Arabic script cannot be performed accurately using rule-based approaches alone. Not only do the structural differences between the alpha-syllabic system of Devanagari and the abjad system of Arabic pose challenges, but the incongruent character repertoires of the two scripts present additional hurdles. This Python prototype converts Urdu/Hindi text written in Devanagari into the Arabic script using both rule-based and simple statistical techniques. The core function is a simple spelling validator. As expected, some Arabic ‘corrections’ are wrong. An enhancement based upon n-gram analysis using TensorFlow is being developed.
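    The rule-based first pass and the validator can be sketched as below. The mapping table is a toy, hypothetical excerpt (the real repertoire is far larger and ambiguous), but it already shows the abjad mismatch: the short-vowel sign ि has no written counterpart in ordinary Arabic-script spelling.

```python
# Toy, illustrative mapping table -- NOT the project's actual table.
DEV_TO_ARABIC = {
    "क": "ک", "त": "ت", "ब": "ب",
    "ा": "ا",   # long-vowel sign -> alif
    "ि": "",    # short vowels are typically unwritten in the abjad
}

# Tiny stand-in lexicon of attested Urdu spellings for the validator.
LEXICON = {"کتاب"}

def transliterate(word):
    """Rule-based pass: map each Devanagari character independently."""
    return "".join(DEV_TO_ARABIC.get(ch, ch) for ch in word)

def validate(candidate):
    """The core 'spelling validator': accept only attested spellings."""
    return candidate in LEXICON
```

    For instance, किताब (kitāb, ‘book’) transliterates to کتاب, which the validator accepts; the statistical layer then arbitrates among candidates the rules alone cannot decide between.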

  • Jumble Solver with WordNet Integration

    Produces permutations of a scrambled string and validates the productions against the WordNet lexicon to return plausible English words and their definitions. I wrote this as a prototype for a validator of jumbled strings in South Asian languages, which are written in alpha-syllabic scripts. Permuting text in these languages requires analysis of syllabic tokens rather than individual letters, as in English.
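    The permute-and-validate core can be sketched in a few lines. A small in-memory word set stands in for the WordNet lexicon here so the sketch is self-contained; the real solver would query WordNet (e.g. via nltk) for membership and definitions.

```python
from itertools import permutations

def solve_jumble(scrambled, lexicon):
    """Return every distinct permutation of `scrambled` found in `lexicon`."""
    candidates = {"".join(p) for p in permutations(scrambled)}
    return sorted(candidates & lexicon)
```

    For a syllabic script, the call to `permutations` would instead operate over syllable tokens (as produced by the Devanagari analyzer above) rather than over individual code points.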