2022
-
Built a Python package for healthcare data harmonization and automation that extracts, cleans, validates, profiles, and standardizes EMR and RUO (Research Use Only) data from different hospitals.
2021
-
Built machine learning models to reduce Personal Injury Protection (PIP) claim losses by preventing fraud. The models are projected to save more than 1 million per month in claim losses.
2020
-
Initiated an event-driven data architecture by ingesting database Change Data Capture (CDC) streams into Kafka topics using Docker Compose.
2019
-
Built machine learning models to reduce Personal Injury Protection (PIP) claim losses by preventing fraud. The models are projected to save more than 1 million per month in claim losses.
-
Built a web crawler to collect used-car listings from Craigslist for the greater Portland, Oregon area. Loaded the data into Azure Databricks, explored it, and compared the correlation between price and odometer reading with the correlation between price and model year.
-
Loaded song and song-log data into Azure Blob Storage, mounted the storage on Azure Databricks, and read the data into Hive tables using Spark. Built a Hive database after cleaning and partitioning the data.
-
Automated daily ETL runtime and table-size reports using SSIS and SSRS packages.
-
Built an error event schema in SSIS for the ETL process, which made troubleshooting more efficient and the ETL more resilient.
2018
-
Converted and verified more than 300 queries that extracted and conformed data from the Kaiser Permanente (KP) EMR and claims databases when KP migrated its host from Teradata to Oracle.
-
Detected anomalies in the staging area and prevented bad data from leaking into production. This system kept more than 80 source errors from being loaded over a one-year period.
2017
-
Explored the data, calculated response times, performed false alarm analysis, and visualized the analytical results in a Python notebook.
-
Built an NVIDIA end-to-end convolutional neural network with the Python Keras library to drive a car in a simulator. Collected 10 minutes of driving data from the simulator and used it to train the model, which can drive the simulator car autonomously.
2013-2016
-
Established a pharmacogenomics knowledge database in SQL Server for reporting purposes.
-
Managed data collection, storage, and analysis for 6 clinical trials as assay development technical lead.
2008-2013
-
Identified more than 50 mutations that cause neonatal diabetes and published 10 peer-reviewed scientific papers.
2004-2008