2022
  • Built a Python package for healthcare data harmonization and automation that extracts, cleans, validates, profiles, and standardizes EMR and RUO (Research Use Only) data from different hospitals.
    2021
    • Built machine learning models to reduce Personal Injury Protection (PIP) claim loss by preventing fraud. The models are projected to save more than 1 million per month in claim loss.
    2020
    • Initiated an event-driven data architecture by ingesting database Change Data Capture (CDC) streams into Kafka topics using Docker Compose.
    2019
    • Built a web crawler to collect used-car listings from Craigslist in the Greater Portland, Oregon area. Loaded the data into Azure Databricks, explored it, and compared the correlation between price and odometer reading with the correlation between price and model year.
    • [Hive database]
      Loaded song and song-log data into Azure Blob Storage and mounted the storage on Azure Databricks. Read the data into Hive tables using Spark, then built a Hive database after cleaning and partitioning.
    • [ETL report]
      Automated daily ETL runtime and table-size reports using SSIS and SSRS packages.
    • [Error Event Schema]
      Built an Error Event Schema in SSIS for the ETL process, which made troubleshooting more efficient and the ETL more resilient.
    2018
    • [Teradata to Oracle]
      Converted and verified more than 300 queries that extracted and conformed data from the Kaiser Permanente (KP) EMR and Claim databases when KP changed the host from Teradata to Oracle.
    • [Anomaly Detection]
      Detected anomalies in the staging area and prevented bad data from leaking into production. Over one year, the system kept more than 80 source errors from being loaded.
    2017
    • Explored the data, calculated response times, performed false-alarm analysis, and visualized the results in a Python notebook.
    • Built an Nvidia end-to-end convolutional neural network with the Python Keras library to drive a car in a simulator. Collected 10 minutes of data in the simulator and used it to train the model, which can drive the simulator car autonomously.
    2013-2016
    2008-2013
    • [Genetics]
      Identified more than 50 mutations that cause neonatal diabetes and published 10 peer-reviewed scientific papers.
    2004-2008