Intro
Welcome to my data science portfolio! I'm thrilled to share my journey of transitioning from a pharmacy background to a proficient
data scientist with a passion for learning. My track record showcases my ability to craft precise models and extract actionable insights even within narrow timeframes.
My experience thriving under pressure in the fast-paced pharmacy environment and intensive bootcamp equips me to meet tight deadlines without compromising quality,
a skill I've successfully applied to my data work.
I'm actively seeking opportunities to use my skills to solve data-driven problems.
As you explore my portfolio, envision how my unique background and data expertise can create a positive impact.
If you are looking for a dedicated data professional with a relentless appetite for learning, let's connect and explore how I can add value to your
team. I'm eager to turn data into results.
Machine Learning Projects
Skin Disease Classification using Computer Vision
Because skin diseases are common and often self-treatable, accurate classification of common skin conditions can help
reduce the healthcare burden.
I developed an image classification model using deep transfer learning to classify five skin conditions prevalent in Singapore: acne, eczema, fungal skin infections, psoriasis, and warts. The model scored above 90% on all key metrics.
I deployed the model on Streamlit, providing a user-friendly platform that offers a preliminary diagnosis and valuable information
on the diseases and their management.
Key Skills: Python, TensorFlow, Keras, Convolutional Neural Networks, Transfer Learning, Deep Learning, Grad-CAM, Web Scraping,
BeautifulSoup, Data Augmentation, Ensemble Modeling, Data Visualization, Docker
Tackling the West Nile Virus Outbreak with Data
Leveraging mosquito population data and weather conditions, I developed a binary classification model to predict the presence of West Nile virus in mosquitoes across Chicago.
My model achieved a ROC AUC score of 88% in predicting West Nile virus presence in traps. Using these predictions to prioritize areas for pre-emptive spraying can help optimize pesticide deployment and minimize costs while safeguarding public health.
Key Skills: Python, AdaBoost, CatBoost, GradientBoost, XGBoost, Logistic Regression, Random Forest, Regularized Greedy Forest,
Data Visualization, Data Analysis, Machine Learning, Predictive Modeling, Binary Classification
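For readers less familiar with the ROC AUC metric quoted above, here is a minimal standard-library sketch of what it measures: the chance that a randomly chosen positive trap receives a higher predicted score than a randomly chosen negative one. The function name and the toy labels and scores are illustrative assumptions, not the project's code or data.

```python
def roc_auc(labels, scores):
    """Pairwise ROC AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical trap labels (1 = virus present) and model scores.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

Because the metric depends only on how the traps are ranked, it suits imbalanced problems where a fixed decision threshold is hard to choose up front.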
Reddit Binary Classification for Diet Communities
I used web scraping and natural language processing (NLP) techniques to collect and analyze a dataset of over 2000 user posts from the Keto and Paleo diet communities on Reddit.
The binary classification model achieved 94% accuracy and a 93% F1 score in predicting a post's subreddit of origin. By analyzing the most frequent bigrams and trigrams in user-generated content, I also identified distinctive patterns that can guide targeted marketing and product development.
Key Skills: Python, Natural Language Processing, Web Scraping, Logistic Regression, Multinomial Naive Bayes, Bernoulli Naive Bayes, Gaussian Naive Bayes,
K-Nearest Neighbors, Count Vectorization, TF-IDF Vectorization, Binary Classification, Machine Learning, Data Analysis
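The bigram/trigram analysis mentioned above can be sketched with the standard library alone. This is an illustrative outline, not the project's pipeline: the tokenizer, the helper names and the two sample posts are assumptions.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Lowercase the text, split it into word tokens, and return its n-grams."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(posts, n=2, k=3):
    """Count n-grams across all posts and return the k most frequent."""
    counts = Counter()
    for post in posts:
        counts.update(ngrams(post, n))
    return counts.most_common(k)


# Hypothetical posts standing in for scraped subreddit content.
sample_posts = ["Low carb diet works", "low carb bread recipe"]
most_common_bigram = top_ngrams(sample_posts, n=2, k=1)
```

Running the same count separately per subreddit and comparing the top entries is one simple way to surface phrases that distinguish the two communities.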
Predicting HDB Resale Prices
HDB flats are highly sought after due to their affordability and amenities, making accurate price predictions vital for both buyers and sellers.
Using a comprehensive dataset from a Kaggle competition, I developed a linear regression model to predict the resale price of HDB flats. For simplicity, I reduced the number of features from 77 to 16 and achieved an R² score of 0.9 and a root-mean-squared error (RMSE) of $46,000.
Key Skills: Python, Data Cleaning, Exploratory Data Analysis (EDA), Feature Engineering, Linear Regression, Ridge Regression, Lasso Regression, ElasticNet Regression, Data Visualization, Data Analysis
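As a reference for the two metrics reported above, the sketch below computes R² and RMSE from scratch. The toy values are made up for illustration and are unrelated to the HDB dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error: typical prediction error in the target's units."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """R²: share of the target's variance explained by the predictions."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted values.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 8.0]
toy_rmse = rmse(actual, predicted)
toy_r2 = r2_score(actual, predicted)
```

RMSE is expressed in the same units as the target (dollars, for resale prices), while R² is unitless, which is why the two are usually reported together.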
Data Analysis Projects
COVID-19 Data Analysis and Visualization
I analyzed COVID-19 data using SQL to uncover insights into cases, deaths, and vaccinations. This project covers
data preparation, exploratory analysis, and in-depth examinations at global, continental, and country levels. SQL techniques used include
data loading, aggregation, subqueries, common table expressions (CTEs), temporary tables, views, and window functions.
Additionally, I created an interactive Tableau dashboard, which features:
- Geographical map showing population percentages for vaccinations, deaths, and infections.
- Area chart illustrating rolling totals for vaccinations, cases, and deaths over time.
- Treemap for visualizing the probability of death if infected by country and continent.
- Bar chart identifying the top 5 countries with the highest population percentage of deceased.
- Line chart tracking new cases over time.
Some key insights include:
- A seasonal component exists for COVID, with a spike in cases towards the end of the year.
- South America has the highest vaccinations per capita (1.9), yet it also has the highest deaths per capita (0.0031). Asia has the lowest cases and deaths per capita of all continents.
- Yemen has the highest probability of death if infected at ~18%, far above the second-highest country, Sudan, at 7.9%.
Key Skills: SQL, MySQL, Exploratory Data Analysis, Data Cleaning, Extract-transform-load (ETL), Data Integration, Data Analysis, Data Visualization, Tableau, Interactive Dashboards
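Two of the SQL techniques listed above, a common table expression and a window function, can be sketched as follows. The project used MySQL; this example swaps in Python's built-in `sqlite3` module so it runs without a server, and the `daily` table, its columns and its rows are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, made-up daily table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daily (country TEXT, day TEXT, "
        "new_cases INTEGER, new_deaths INTEGER)"
    )
    conn.executemany(
        "INSERT INTO daily VALUES (?, ?, ?, ?)",
        [
            ("A", "2021-01-01", 100, 2),
            ("A", "2021-01-02", 150, 3),
            ("B", "2021-01-01", 50, 5),
            ("B", "2021-01-02", 50, 5),
        ],
    )
    return conn


def death_rate_by_country(conn):
    """CTE: total cases and deaths per country, then deaths as a % of cases."""
    query = """
        WITH totals AS (
            SELECT country,
                   SUM(new_cases)  AS cases,
                   SUM(new_deaths) AS deaths
            FROM daily
            GROUP BY country
        )
        SELECT country, cases, deaths, 100.0 * deaths / cases AS death_pct
        FROM totals
        ORDER BY death_pct DESC
    """
    return conn.execute(query).fetchall()


def rolling_cases(conn):
    """Window function: running total of cases per country over time.

    Window functions need SQLite 3.25 or newer.
    """
    query = """
        SELECT country, day,
               SUM(new_cases) OVER (PARTITION BY country ORDER BY day)
        FROM daily
        ORDER BY country, day
    """
    return conn.execute(query).fetchall()


demo_conn = build_demo_db()
rates = death_rate_by_country(demo_conn)
running = rolling_cases(demo_conn)
```

The first query mirrors the "probability of death if infected" figure (deaths divided by cases), and the second mirrors the rolling totals shown in the dashboard's area chart.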
Climate Data Analysis for Picnic Planning
I analyzed climate data for Singapore to optimize picnic planning and operations
using weather insights. Using Python and the Pandas library, I cleaned, validated and integrated data from 8 sources into a unified view before conducting exploratory
data analysis and visualization to uncover key insights. Visualizations used in this project include heatmaps, histograms, boxplots, scatterplots and line charts.
Some key insights include:
- February has ideal conditions for a picnic, with the best wind speeds and the lowest rainfall.
- Rainfall varies by weekday: the heaviest rain falls on Fridays, while Wednesdays have the highest probability of rain.
- Changi offers a higher probability of favorable weather conditions compared to Tuas and Ang Mo Kio.
Key Skills: Python, Extract-transform-load (ETL), Data Cleaning, Data Integration, Data Validation, Pandas, NumPy, Data Transformation,
Exploratory Data Analysis, Data Visualization, Matplotlib, Seaborn, Statistics, Data Analysis
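The weekday insight above separates two quantities: how often it rains and how much rain falls. A minimal sketch of that aggregation is shown below. The project used Pandas; this version uses only the standard library, and the record format, function name and sample readings are assumptions.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def rain_by_weekday(records):
    """Summarize (date, rainfall_mm) records per weekday.

    Returns, for each weekday seen, the share of days with any rain
    ("p_rain") and the mean daily rainfall in mm ("mean_mm").
    """
    stats = defaultdict(lambda: {"days": 0, "wet": 0, "rain": 0.0})
    for day, mm in records:
        entry = stats[WEEKDAYS[day.weekday()]]
        entry["days"] += 1
        entry["rain"] += mm
        if mm > 0:
            entry["wet"] += 1
    return {
        name: {"p_rain": s["wet"] / s["days"], "mean_mm": s["rain"] / s["days"]}
        for name, s in stats.items()
    }


# Hypothetical readings: two Wednesdays and two Fridays.
sample_records = [
    (date(2023, 1, 4), 1.0),
    (date(2023, 1, 11), 2.0),
    (date(2023, 1, 6), 30.0),
    (date(2023, 1, 13), 0.0),
]
weekday_summary = rain_by_weekday(sample_records)
```

In this toy data, Wednesday rains more often but Friday brings more rain on average, which is the same kind of contrast the insight describes.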
About Me
I graduated with a Bachelor's Degree in Pharmacy from the National University of Singapore, and worked as a hospital
pharmacist for 2.5 years thereafter. Venturing beyond my comfort zone, I used my spare time to pick up essential skills in finance,
investing, programming, data analytics and data science. As a person who loves solving difficult problems, data challenges were akin to
puzzles, and I was soon drawn to this exciting domain.
I eventually made a pivotal decision to enroll in General Assembly's intensive data science bootcamp, where I honed my expertise through a
diverse array of projects. My data journey has only just started and I am excited to see how it unfolds.
Skills
Programming Languages
Data Science and Machine Learning
- Pandas
- NumPy
- Scikit-learn
- Web Scraping
- TensorFlow
- Keras
- Natural Language Processing
- Deep Learning / Neural Networks
- Docker
- Spark
- SQL
Data Analysis and Visualization
- Data Cleaning
- Tableau / Dashboards
- Matplotlib
- Seaborn
Soft Skills
- Communication
- Detail-oriented
- Problem Solving
Education
General Assembly
May 2023 - Aug 2023
- Received 480 hours of data science instructional content
- Topics include Data Cleaning, Regression and Classification, Natural Language Processing, Clustering, Time Series, Neural Networks and others
- Completed 4 machine learning projects and 1 project on data analysis
National University of Singapore
Aug 2016 - Jun 2020
- Bachelor of Pharmacy (First Class Honours)
- CAP 4.92
- Dean's list for all semesters
- Best Overall Performance in Pharmaceutical Chemistry Modules
Experiences
Pharmacist
Apr 2021 - Apr 2023
- Established the pharmacy-led rheumatology service, which includes specialised counselling, tailored educational materials
and medication optimization.
- Handled a daily individual workload of approximately 60 prescriptions while achieving desired patient outcomes.
Contact
Find me at: