Projects

2018


What to cook

Project background

This is a side project to save me some time at the grocery store. I would often stand in front of a shelf full of products, thinking about what to have for dinner and how to put it together. This is extremely annoying, especially when I’m in a rush.

Inspired by the Kaggle project “What’s cooking”, I decided to make an app where I can enter the ingredients I want in a meal and get a few recipes back. E.g., if I really want tomato and garlic tonight, I wouldn’t mind cooking them with other ingredients as long as there’s a good recipe. From there, selecting a style of cooking, whether Asian, French, or Italian, shouldn’t be much trouble.
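The lookup described above could be sketched roughly as follows. This is only a minimal illustration, not the app’s actual code: the `Recipe` class, the `suggest` function, and the sample recipes are all hypothetical, and matching is done by a plain subset test on ingredient names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipe:
    name: str
    cuisine: str
    ingredients: frozenset


# Hypothetical sample data, for illustration only.
RECIPES = [
    Recipe("Spaghetti al pomodoro", "Italian",
           frozenset({"tomato", "garlic", "spaghetti", "basil"})),
    Recipe("Tomato egg stir-fry", "Asian",
           frozenset({"tomato", "egg", "scallion"})),
    Recipe("Garlic butter mushrooms", "French",
           frozenset({"garlic", "butter", "mushroom"})),
    Recipe("Ratatouille", "French",
           frozenset({"tomato", "garlic", "eggplant", "zucchini"})),
]


def suggest(wanted, recipes=RECIPES, cuisine=None, top_n=3):
    """Return up to top_n recipes that contain every wanted ingredient.

    Recipes that need fewer extra ingredients come first; ties are
    broken alphabetically by name.
    """
    wanted = frozenset(w.strip().lower() for w in wanted)
    matches = [
        r for r in recipes
        if wanted <= r.ingredients and (cuisine is None or r.cuisine == cuisine)
    ]
    matches.sort(key=lambda r: (len(r.ingredients - wanted), r.name))
    return matches[:top_n]


if __name__ == "__main__":
    for recipe in suggest(["tomato", "garlic"]):
        print(recipe.cuisine, "-", recipe.name)
```

Passing `cuisine="French"` narrows the same query down to a single style of cooking.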

 

OCR on degraded mimeograph printings

While working as Algorithm Engineer @ BMSMART LLC

With tons of mimeograph printings floating around the world, searching through them seems like an impossible job, so digitizing mimeograph prints is becoming increasingly important. Unlike alphabetic writing systems, logographic writing systems such as Chinese have far more variation in handwriting and, of course, in print as well, with more fonts to choose from. Correctly recognizing these characters is a much harder task; it was the hardest machine learning problem I had led up to that point. To add to the complexity, prints can degrade for a variety of reasons: fading from exposure to the environment, contamination, or simply because the print did not come out of the printing process solid enough.

To tackle this problem, I applied multiple measures at each step of the machine learning pipeline: binarization, normalization and standardization in [redacted] when pre-processing data; enhancing the training data with [redacted] before training a CNN; and applying NLP techniques to enhance and correct the results using context, especially from [redacted] perspectives.
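The specific methods are redacted, so the following is only a generic illustration of what a normalization and binarization step can look like on a faded print. It assumes a grayscale image given as rows of 0-255 integers, and it swaps in a simple contrast stretch followed by Otsu’s threshold; the function names are hypothetical and this is not the pipeline used in the project.

```python
def otsu_threshold(pixels):
    """Return the 0-255 gray level that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(hist))

    best_level, best_var = 0, -1.0
    dark_count = dark_sum = 0
    for level, count in enumerate(hist):
        dark_count += count            # pixels with value <= level
        dark_sum += level * count
        bright_count = total - dark_count
        if dark_count == 0 or bright_count == 0:
            continue                   # one class is empty at this level
        mean_dark = dark_sum / dark_count
        mean_bright = (total_sum - dark_sum) / bright_count
        # Between-class variance, up to a constant factor of 1 / total**2.
        var = dark_count * bright_count * (mean_dark - mean_bright) ** 2
        if var > best_var:
            best_level, best_var = level, var
    return best_level


def binarize(image):
    """Map a grayscale image (rows of 0-255 ints) to 1 = ink, 0 = paper."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1              # avoid dividing by zero on flat images
    # Contrast-stretch so that faded prints use the full 0-255 range.
    stretched = [[round(255 * (p - lo) / span) for p in row] for row in image]
    threshold = otsu_threshold([p for row in stretched for p in row])
    return [[1 if p <= threshold else 0 for p in row] for row in stretched]


if __name__ == "__main__":
    # A faded stroke (values 120 and 130) on light paper (value 200).
    faded = [
        [200, 200, 200],
        [200, 120, 200],
        [200, 130, 200],
    ]
    for row in binarize(faded):
        print(row)
```

Stretching the contrast before thresholding is one way to keep a uniformly faded stroke separable from the paper, since the threshold is then chosen on the full intensity range rather than on a narrow band.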

This method yields a 9% increase in recognition accuracy over the baseline, and a patent is currently pending.

 

SMILe (Smoker indication and lifestyle estimation) Tool

While working as Data Engineer @ Lapetus Solutions

SMILe is a dataset being developed by Lapetus Solutions and UNC-Wilmington’s Faceaging Group in an ongoing project that tries to identify the impact of smoking on human faces.

This tool aims to enrich the existing data by more than 50% with additional annotations [redacted]. Annotators can choose the fields they’d like to annotate, with the corresponding existing data displayed in the GUI for reference. The data can later be stored in whichever format is appropriate.

The tool is developed in Python, with PyQt as the front end and Pandas as the back end. It connects to CSV files, a MySQL database, or MongoDB using csv, unicodecsv, pandas, mysqlclient, and pymongo.
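As a rough sketch of the annotate-then-export flow, the snippet below merges newly entered fields into existing records and writes them out as CSV. It is an assumption-laden illustration, not the tool’s code: it uses only the standard-library csv module rather than Pandas, and the function names and the field names (`image_id`, `smoker`, `wrinkles`) are hypothetical, since the real annotation fields are not listed here.

```python
import csv
import io


def merge_annotations(existing_rows, new_annotations, key="image_id"):
    """Add annotator-supplied fields to existing rows.

    Values that are already filled in are kept; only missing or empty
    fields are taken from the new annotations.
    """
    by_key = {row[key]: dict(row) for row in existing_rows}
    for ann in new_annotations:
        row = by_key.setdefault(ann[key], {key: ann[key]})
        for field, value in ann.items():
            if field != key and not row.get(field):
                row[field] = value
    return list(by_key.values())


def to_csv(rows):
    """Serialize rows to CSV text; columns appear in first-seen order."""
    fields = []
    for row in rows:
        for name in row:
            if name not in fields:
                fields.append(name)
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=fields, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    existing = [
        {"image_id": "a1", "smoker": "yes"},
        {"image_id": "a2", "smoker": ""},
    ]
    new = [{"image_id": "a2", "smoker": "no", "wrinkles": "mild"}]
    print(to_csv(merge_annotations(existing, new)))
```

Keeping the merge step separate from the serializer means the same merged rows could be handed to a different writer for another storage format.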

2017


I.D.I.O.T (Intelligent data ingestion & organization tool)

While working as Data Engineer @ Lapetus Solutions

 

Tackling handwritten script recognition with NLP

Project background

 

Rotifers counter

Project background

UNC-Wilmington hackathon 2017 project

 

Augmented Reality Sandbox

Project background

 

Tattoo & Piercing Dataset

Project background

 

2016


Fish 2.0

Project background

UNC-Wilmington hackathon 2016 project

 

2015


Application for incremental file-system backup on the AWS cloud

Project background

 

2014


Performance tuning of departmental database

Project background

 

2013


Homemade typewriter/printer driver

Project background

So the computer lab at BIT brought in some new toy-bo… I mean tool-boxes this year. Instead of testing what we know about assembly language on simulators/emulators, we could actually see real-life behavior on tangible devices! To get started I would only need the keypad, which supposedly came with its own driver that would spit out a mapped signal when a key is pressed, and the typewriter-style print head, which had no driver.

Mapping input signals to functional commands turned out to be not very hard, and neither was scanning for input. Buffering the received data, however, was the tricky part. Based on the manual that came with the set, it should have been safe to assume that things would work if we put our buffer two blocks below where our home-brewed driver code was stored. Reality said no: apparently the pointer did not start where the manual stated, and we just had to test repeatedly to find out where the section really started. It did function as intended in the end, though we wasted a lot of paper getting it right. Overall, this was a lovely, fun little project.

 

2012


Gradient directed composition of multi-exposure images

Project background