About
Hi there! I'm A.J., and I was born and raised in Los Angeles, where I completed my undergraduate education at UCLA in June 2019.
Although I graduated with a B.S. in Chemical & Biomolecular Engineering, I discovered a passion for Data Science during an internship
with NovellusDX in Summer 2018, and have since been pursuing related opportunities in that field. I currently live in the Washington,
D.C., metropolitan area and would love to connect with other professionals in the region, so please feel free to contact
me at any time!
Here are some of the technologies I've been working with recently:
- Programming Languages: Python, SQL, JavaScript, HTML/CSS, MATLAB, C++
- Libraries and Frameworks: NumPy, Pandas, PyTest, Matplotlib, Seaborn, Scikit-learn, spaCy, Scrapy, NLTK, Bokeh, Flask, Transformers, Graphviz, Haystack, SQLFluff, SQLGlot
- Software Tools: QGIS, Linux, Jenkins, Vim, Elasticsearch, Kibana, NiFi, Git, Snowflake
Intro
During the Summer of 2018, when I interned at a biotech start-up called NovellusDX, I expected to be placed in their research division, given my two years of hands-on experience in a Bioengineering laboratory at UCLA, where I researched synthetic tissue engineering. Instead, I was placed in their data analysis division, where I immediately discovered a passion for data science, a field almost completely unrelated to my major field of study, Chemical & Biomolecular Engineering. My newfound passion for data science was cultivated over only a few short months at NovellusDX, but my experience was so eye-opening that my career path pivoted to one focused entirely on Data Science in all its forms.
At NovellusDX, my work focused mainly on data visualization and analysis of machine learning outputs for the purposes of improving their algorithm’s accuracy in identifying cancerous mutations within a cell line. Then, during my senior year in 2019, I partnered with a fellow UCLA student to develop a project we’ve called “Dr. Playlist,” which has the goal of improving music classification by style, genre, or “vibe,” via the development of a machine learning pipeline focusing on variables including types of instruments, chord progressions, key, meter, etc.
While working on Dr. Playlist, I completed a 6-month data science bootcamp program through Thinkful, in which I enrolled to foster a deeper education while pursuing other opportunities in the field. As part of the Thinkful program, I completed projects involving supervised and unsupervised machine learning, experimental design, and general data visualization. The tools and strategies used in completing these projects included database querying via SQL, web scraping via Scrapy, dataset analysis with NumPy and Pandas, visualization using Matplotlib and Seaborn, and the various machine learning algorithms included in Scikit-learn.
Since completing the Thinkful program, I have worked on contracts for the U.S. Census Bureau as an employee of Spatial Front Inc. and for a federal law enforcement client as a member of Deloitte’s Risk & Financial Advisory practice. I now work for Capital One, in their U.S. Card division, developing Python functionality that enables others at the company to quickly and accurately compile data, aggregate metrics, and identify actionable insights.
Although most of my experience as a data scientist has coincidentally been focused on the financial side, I find that all realms of data science catch my interest. Natural Language Processing, Machine Learning, and Robotic Process Automation are some of my favorite areas to pursue in developing personal projects, so if you know of any new innovations, I would love to hear about them!
Check out my work samples, which include some of my favorite assignments completed within the Thinkful program,
or take a look at the other tabs on this site for other personal projects I have since developed in my spare time.
Work Samples
- Determination of the most significant features of a dataset with respect to life expectancy, accomplished by the treatment of outliers (winsorization), normalization (Box-Cox transformation), and correlation analyses:
Exploratory Data Analysis: Factors that Affect Life Expectancy
- Positive/negative sentiment analysis of a set of movie reviews from IMDb via a Bernoulli Naive Bayes classification algorithm:
Natural Language Processing: Sentiment Classification
- Research proposal, rollout plan, and evaluation plan for how U.S. airlines may be able to improve revenue:
U.S. Domestic Flights: Narrative Analytics and Experimentation (Thinkful Capstone #2)
- Prediction of house sale prices in Ames, Iowa, accomplished with Ordinary Least Squares, Lasso, Ridge, and ElasticNet Regression models:
Regression: Predicting House Prices
- Prediction of the type of weather in Szeged, Hungary, accomplished with Natural Language Processing as well as Decision Tree and Random Forest Classification models:
Classification: Predicting the Weather
- Prediction of the ratings for a set of recipes from their key terms and ingredients, accomplished with Natural Language Processing and a Support Vector Machine model:
SVM Classification: Recipe Ratings and Tags
- Comparison of Multi-Layer Perceptron with Gradient Boosting for the classification of weather by type of precipitation:
Neural Network: Weather Classification
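To make the preprocessing steps named in the first sample concrete, here is a minimal sketch of winsorization, a Box-Cox transformation with a fixed lambda, and a Pearson correlation. It is not the code from the linked notebook: it uses only the Python standard library so it runs anywhere, the function names are illustrative, the lambda is chosen by hand rather than fitted, and the numbers are made up for demonstration.

```python
import math


def winsorize(values, lower_pct=0.05, upper_pct=0.05):
    """Clip the lowest and highest fractions of values to the nearest kept value."""
    ordered = sorted(values)
    n = len(ordered)
    low = ordered[int(lower_pct * n)]
    high = ordered[n - 1 - int(upper_pct * n)]
    return [min(max(v, low), high) for v in values]


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a fixed lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up numbers for illustration only (not taken from the project's dataset).
feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]
target = [50.0, 52.0, 55.0, 57.0, 60.0, 61.0, 65.0, 66.0, 70.0, 72.0]

clipped = winsorize(feature, lower_pct=0.1, upper_pct=0.1)  # caps the 100.0 outlier
transformed = box_cox(clipped, lam=0.5)                     # lambda picked by hand
print(clipped)
print(round(pearson(transformed, target), 3))
```

Note that winsorization caps extreme values rather than dropping rows, so the dataset keeps its original length. In practice, SciPy offers equivalents (including a Box-Cox routine that fits lambda), which would be the more typical choice alongside NumPy and Pandas.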
Dr. Playlist
In May of 2019, two UCLA students teamed up with the goal of utilizing data science and machine learning to improve music classification beyond what current algorithms achieve,
with a focus on the tracks' genres/styles, via analysis of instrument type, chord progressions, meter, and several other features. Although the end goal of Dr. Playlist is to be
able to classify any given song based on those features, we had to first explore the limitations of current models in order to determine the best ways to proceed with this project.
Analysis and visualization of a subset of the Million Song Dataset, which was originally compiled by researchers at Columbia University:
Million Songs Dataset (Thinkful Capstone #1)
Analysis and visualization of data extracted via web scraping with Scrapy from a website of Disney song lyrics:
Web Scraping: Disney Song Lyrics Analysis
Predictive genre classification for a set of 100,000+ tracks of music, with a comparison of the accuracies of several supervised machine learning models:
Dr. Playlist - Initial Analysis and Genre Classification via Supervised Machine Learning
Browser-based, Render-hosted Flask app that provides Disney song recommendations based on a user-inputted song. Dataset features obtained by querying the Spotipy and LyricsGenius APIs:
Dr. Playlist - Disney Edition
Family Tree
In 2022, I was tasked with keeping an electronic record of my family's genealogy, to serve as a replacement for the paper copy being curated manually by my great-uncle.
Instead of simply using text boxes and arrows, I opted for an automated Pythonic solution that has longevity, one that will allow us to continue to add new data for generations
to come. This solution also optimizes the layout of the branches, illustrating the relationships by generation in a compact but easily readable way.
For privacy reasons, I have recreated the functionality here using the British monarchy's family tree as a sample dataset.
Browser-based, Render-hosted Flask app that plots linkages in a Family Tree from data stored in a Google sheet. Allows users to download as JPG/PDF and make updates in real-time.
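One way to lay out relationships by generation is sketched below. It is an illustration under stated assumptions, not the app's actual code: it takes a hypothetical list of (parent, child) pairs (standing in for rows read from the Google sheet), assigns each person a generation number, and emits Graphviz DOT text in which every generation shares one row. The function names `assign_generations` and `to_dot` are made up for this example, and the input is assumed to contain no cycles.

```python
from collections import defaultdict, deque


def assign_generations(parent_child_pairs):
    """Return {person: generation}, with the oldest ancestors at generation 0."""
    children = defaultdict(list)
    has_parent = set()
    people = set()
    for parent, child in parent_child_pairs:
        children[parent].append(child)
        has_parent.add(child)
        people.update((parent, child))

    generation = {}
    # Start from everyone who has no recorded parent.
    queue = deque((person, 0) for person in sorted(people - has_parent))
    while queue:
        person, depth = queue.popleft()
        # Keep the deepest level seen so a child sits below all of its parents.
        if generation.get(person, -1) >= depth:
            continue
        generation[person] = depth
        for child in children[person]:
            queue.append((child, depth + 1))
    return generation


def to_dot(parent_child_pairs):
    """Build Graphviz DOT text that pins each generation to a single row."""
    generation = assign_generations(parent_child_pairs)
    rows = defaultdict(list)
    for person, depth in generation.items():
        rows[depth].append(person)

    lines = ["digraph family {", "  node [shape=box];"]
    for depth in sorted(rows):
        names = " ".join(f'"{name}";' for name in sorted(rows[depth]))
        lines.append(f"  {{ rank=same; {names} }}")
    for parent, child in parent_child_pairs:
        lines.append(f'  "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)


# A few well-known links from the British monarchy, used as sample data.
pairs = [
    ("George VI", "Elizabeth II"),
    ("Elizabeth II", "Charles III"),
    ("Charles III", "William"),
    ("Charles III", "Harry"),
]

dot_text = to_dot(pairs)
print(dot_text)
```

The resulting text can be passed to Graphviz's `dot` command to render an image. Adding a new family member only requires appending a pair to the input, which is what makes a data-driven layout easier to maintain than hand-drawn boxes and arrows.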
Annotator Tool
Over the course of my career, I have on many occasions needed to manually annotate training data for machine learning applications,
especially those related to Natural Language Processing. Unfortunately, although there are several annotation tools available online,
they all felt too limited for my purposes or too expensive. So instead, I developed my own annotation tool, demoed here using Recipes data,
for which I have trained a spaCy Named Entity Recognition (NER) classifier to attempt to pre-identify and annotate relevant entities. The user can then make their own
annotations by highlighting a word or phrase with their mouse.
Annotations can be changed or removed by clicking on the colored box surrounding your selection and selecting the appropriate option
within this Browser-based, Render-hosted Flask app.
Final annotations, stored via the user's click of the "Record Annotations" button, are uploaded to this Google sheet.
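The add, change, and remove operations described above can be modeled as edits to a list of labeled character spans. The sketch below is an assumption about one reasonable data model, not the tool's actual implementation: the `Annotation` class, the helper names, the entity labels, and the rule that a new highlight replaces any span it overlaps are all hypothetical, and `span_of` merely stands in for a mouse selection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int   # character offset where the span begins
    end: int     # character offset just past the end of the span
    label: str   # entity label (hypothetical label names below)


def span_of(text, phrase, label):
    """Stand-in for a mouse highlight: locate the first occurrence of phrase."""
    start = text.index(phrase)
    return Annotation(start, start + len(phrase), label)


def overlaps(a, b):
    """True when two spans share at least one character."""
    return a.start < b.end and b.start < a.end


def add_annotation(annotations, new):
    """Add a span, replacing any existing spans that it overlaps."""
    kept = [a for a in annotations if not overlaps(a, new)]
    return sorted(kept + [new], key=lambda a: a.start)


def relabel(annotations, start, end, label):
    """Change the label of the span with exactly these offsets."""
    return [
        Annotation(a.start, a.end, label) if (a.start, a.end) == (start, end) else a
        for a in annotations
    ]


def remove_annotation(annotations, start, end):
    """Drop the span with exactly these offsets."""
    return [a for a in annotations if (a.start, a.end) != (start, end)]


def to_rows(text, annotations):
    """Flatten spans into rows suitable for appending to a spreadsheet."""
    return [[text[a.start:a.end], a.start, a.end, a.label] for a in annotations]


recipe = "Add 2 cups flour and 1 tsp salt."

# Pretend these came from the NER model's pre-annotation pass.
spans = [span_of(recipe, "cups", "UNIT"), span_of(recipe, "flour", "INGREDIENT")]

# The user highlights a wider phrase, which replaces the overlapping "cups" span.
spans = add_annotation(spans, span_of(recipe, "2 cups", "QUANTITY"))
print(to_rows(recipe, spans))
```

Storing character offsets rather than just the highlighted text keeps repeated words unambiguous and matches the span format that NER training data generally uses.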
Resume
Contact