About

I am a Data Scientist with a background in Chemical & Biomolecular Engineering from UCLA, specializing in the intersection of deep technical implementation and high-level business strategy.

Currently a Data Science Manager at Capital One, I lead the development of high-performance Python frameworks that empower enterprise-wide teams to aggregate metrics and derive actionable insights from massive datasets. My career has spanned critical roles for the U.S. Census Bureau's Economy-Wide Statistics Division and Deloitte’s Risk & Financial Advisory practice, consistently focusing on Natural Language Processing, Machine Learning, and Process Automation.

My engineering roots at UCLA instilled a "first-principles" approach to problem-solving, which I now apply to the world of Big Data. Since my 2018 pivot at NovellusDX, I’ve been obsessed with turning raw information into strategic assets. I'm currently living in the Washington, D.C. area, so I’m always looking to collaborate with fellow tech enthusiasts in the DMV region; please reach out and let's connect!

Core Tech Stack

Programming Languages

Python, SQL, JavaScript, MATLAB, C++

Python Libraries

NumPy, pandas, Matplotlib, Seaborn, scikit-learn, spaCy, NLTK, Bokeh, Flask, Plotly, Transformers, Haystack, SQLGlot, SQLFluff

Software Tools

QGIS, Jenkins, Vim, Elasticsearch, Kibana, NiFi, Git, Snowflake

Case Studies

Side Projects

Overview

In my spare time, I enjoy building side projects that focus on bridging the gap between deep-backend data engineering and accessible user-facing utility to empower non-technical stakeholders. To date, I have released three Flask web services hosted via Render, but please keep an eye on this page in the future for other projects I currently have in flight.

Dr. Playlist

Several years ago, I teamed up with another UCLA alumnus to use data science and machine learning to improve music classification beyond what current algorithms achieve. We focused on tracks' genres and styles, analyzing instrument type, chord progressions, meter, and several other features. Although the end goal of Dr. Playlist is to classify any given song based on those features, we first had to explore the limitations of current models to determine the best way to proceed.

Browser-based, Render-hosted Flask app that recommends Disney songs based on a song entered by the user, with dataset features obtained by querying the Spotipy and LyricsGenius APIs:
Dr. Playlist - Disney Edition
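
How the recommendations are computed is not described on this page. As an illustration only, one common approach to feature-based recommendation is to rank tracks by cosine similarity to the seed song. The track names, feature values, and the `recommend` helper below are hypothetical placeholders, not the app's actual data or code.

```python
import math

# Placeholder feature vectors (imagine scaled audio or lyric features).
# These values are invented for illustration, not taken from any API.
TRACKS = {
    "Song A": [1.0, 0.0, 0.0],
    "Song B": [0.9, 0.1, 0.0],
    "Song C": [0.0, 1.0, 0.0],
    "Song D": [0.5, 0.5, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(seed, tracks, k=2):
    """Return the k tracks most similar to the seed, excluding the seed."""
    scores = {
        name: cosine(tracks[seed], vec)
        for name, vec in tracks.items()
        if name != seed
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


top_matches = recommend("Song A", TRACKS)
```

With these placeholder vectors, "Song B" ranks first because it points in nearly the same direction as "Song A".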



Named Entity Recognition (NER) Annotator

Built a custom spaCy-integrated web application to reduce the high cost of manual data labeling. The tool pre-identifies entities with a trained classifier, letting users rapidly validate or correct annotations in a Flask-based UI.

  • Reduces manual labeling time by ~60% through pre-annotation.
  • Exports directly to spaCy .spacy and .json formats.

Over the course of my career, I have often needed to manually annotate training data for machine learning applications, especially in Natural Language Processing. Although several annotation tools are available online, they all felt either too limited for my purposes or too expensive. So I developed my own annotation tool, demoed here using recipe data, for which I have trained a spaCy Named Entity Recognition (NER) classifier to pre-identify and annotate relevant entities. Users can then add their own annotations by highlighting a word or phrase with the mouse.

Annotations can be changed or removed by clicking on the colored box surrounding your selection and selecting the appropriate option.

Final annotations, saved when the user clicks the "Record Annotations" button, are uploaded to this Google Sheet.
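
The suggest-then-correct loop described above can be sketched in a few lines. This is a minimal, standard-library-only illustration: the `SUGGESTIONS` lookup is a hypothetical stand-in for the trained spaCy classifier, and the function names and JSON layout are assumptions rather than the tool's actual code or export schema.

```python
import json
import re

# Hypothetical stand-in for the trained NER classifier: a plain label lookup.
SUGGESTIONS = {"flour": "INGREDIENT", "sugar": "INGREDIENT", "cup": "UNIT"}


def pre_annotate(text):
    """Suggest entities as (start, end, label) character spans."""
    spans = []
    for match in re.finditer(r"[A-Za-z]+", text):
        label = SUGGESTIONS.get(match.group().lower())
        if label:
            spans.append((match.start(), match.end(), label))
    return spans


def apply_corrections(spans, removed=(), added=()):
    """Drop rejected spans, add user-highlighted ones, sort by position."""
    rejected = set(removed)
    kept = {span for span in spans if span not in rejected}
    return sorted(kept | set(added))


def to_json(text, spans):
    """Serialize one annotated example as text plus offset-based entities."""
    return json.dumps({"text": text, "entities": [list(s) for s in spans]})


text = "Mix 1 cup flour with brown sugar"
suggested = pre_annotate(text)

# The reviewer rejects the bare "sugar" span and highlights "brown sugar".
sugar_start = text.index("sugar")
phrase_start = text.index("brown sugar")
final = apply_corrections(
    suggested,
    removed=[(sugar_start, sugar_start + len("sugar"), "INGREDIENT")],
    added=[(phrase_start, phrase_start + len("brown sugar"), "INGREDIENT")],
)
record = to_json(text, final)
```

Storing entities as character offsets keeps the corrected record independent of any particular tokenizer, which is one reason offset-based layouts are common for NER training data.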

Kinship Linkage Tree

A Pythonic solution, delivered as a Render-hosted Flask app, for visualizing hierarchical and complex data structures. The engine uses Graphviz to automatically generate linkage diagrams from dynamic datasets.

  • Handles dynamic depth scaling for massive relationship trees.
  • Optimized SVG rendering for browser performance.

In 2022, I was tasked with keeping an electronic record of my family's genealogy to replace the paper copy curated manually by my great-uncle. Instead of simply using text boxes and arrows, I opted for an automated Pythonic solution with longevity, one that will let us keep adding new data for generations to come. The solution also optimizes the layout of the branches, illustrating the relationships by generation in a compact but easily readable way. For privacy reasons, I have recreated the functionality here using the British monarchy's family tree as a sample dataset.
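
The generation-by-generation layout can be illustrated by emitting Graphviz DOT source in which each generation shares a rank. This is a rough sketch under stated assumptions, not the app's actual code: the `generations` and `to_dot` helpers are hypothetical, the input is assumed to be an acyclic list of (parent, child) pairs, and only a small subset of the royal family is included.

```python
from collections import defaultdict

# Small illustrative subset of (parent, child) pairs.
PARENT_OF = [
    ("Elizabeth II", "Charles III"),
    ("Elizabeth II", "Anne"),
    ("Charles III", "William"),
    ("Charles III", "Harry"),
    ("William", "George"),
]


def generations(edges):
    """Map each person to a generation number; people with no parent are 0."""
    children = defaultdict(list)
    has_parent = set()
    for parent, child in edges:
        children[parent].append(child)
        has_parent.add(child)
    roots = [person for person in children if person not in has_parent]
    depth = {}
    stack = [(root, 0) for root in roots]
    while stack:
        person, level = stack.pop()
        # Keep the deepest level seen so descendants always sit below ancestors.
        if level > depth.get(person, -1):
            depth[person] = level
            stack.extend((c, level + 1) for c in children.get(person, []))
    return depth


def to_dot(edges):
    """Build DOT source that pins each generation to the same rank."""
    by_level = defaultdict(list)
    for person, level in generations(edges).items():
        by_level[level].append(person)
    lines = ["digraph family {", "  rankdir=TB;", "  node [shape=box];"]
    for level in sorted(by_level):
        names = " ".join(f'"{name}";' for name in sorted(by_level[level]))
        lines.append(f"  {{ rank=same; {names} }}")
    for parent, child in edges:
        lines.append(f'  "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)


dot_source = to_dot(PARENT_OF)
```

The resulting `dot_source` string could then be handed to Graphviz, for example through the `dot` command-line tool with `-Tsvg`, to produce an SVG for the browser.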

Resume