Ray Yi

Data Analyst · chirayy@uci.edu

Welcome! I am a Predictive Data Analyst at Pacific Life with 1.5 years of professional experience in data querying and manipulation, data visualization and dashboarding, machine learning and time series models, extract/transform/load (ETL) processes, and communication with both technical and non-technical stakeholders. At Pacific Life, I've collaborated with project managers and data scientists on projects that enable the Operations department to make data-informed decisions about staffing, employee scheduling, budgeting, and processing productivity, to name a few. I'm proficient in Python, SQL, Tableau, and R, as well as statistics and probability concepts. I graduated from the University of California, Irvine in June 2020 with a Bachelor of Science in Data Science and a minor in Management. What I enjoy most about data science is making sense of available data and figuring out the best way to present insights through data storytelling.

Aside from work, I compete in ping pong and enjoy everything basketball. Since COVID, I've set up a budget home gym that consists of a bench assembled from furniture and an adjustable dumbbell. Happy to say I've made it work and have been keeping up with my lifting routine :) Ask me about my favorite NBA player, fantasy basketball, thoughts on the importance of sleep, or anything really. I'd love to chat!

Experience

Pacific Life

Predictive Data Analyst

• Developed time-series model that improved call volume forecast MAPE by 10% and queue time MAPE by 70% using ARIMA and H2O.ai in Python, connected to Tableau dashboard that highlights call center metrics
• Developed Python app that appends daily work/staff volume to Snowflake, used to update Tableau dashboard that highlights production needs and recommends resource sharing strategies, reducing overtime by 25%
• Conducted contract unit cost analysis by cleaning/manipulating various data sources in Pandas, developed metrics to quantify cost/resources, providing new insights on budgeting from a contract perspective
• Implemented Python-Snowflake ETL processes scheduled on Alteryx server that power Operations reports/dashboards

June 2020 - Present
Remote: Newport Beach, CA

Pacific Life

Data Science Intern

• Built RShiny dashboard that visualizes and ranks policy abnormality from >100,000 policy numbers using PCA, K-Medoids Clustering, and Manhattan Distance; data wrangling done in SSMS, Alteryx, and R
• Led weekly meetings to discuss feature engineering and app design/functionality for insurance disbursement fraud

June 2019 - August 2019
Aliso Viejo, CA

Wing

Data Science Intern

• Identified KPIs such as user browsing habits and order preferences, wrote respective MySQL queries, connected PostgreSQL with Tableau for live data visualization, increasing personalized commands by 60%
• Scraped web pages of competing prices with BeautifulSoup and merged data with Alteryx for pricing strategy

October 2018 - May 2019
Irvine, CA

UC Irvine Center in Writing & Communication

Writing Peer Tutor

• Suggested ways to improve students' textual analysis logic and received annual evaluation of 9.3/10
• Prioritized teaching the principles behind grammatical and logical errors so students can apply them in future papers

December 2016 - May 2018
Irvine, CA

Selected Projects

NBA Matchup and Season Performance Predictor

• Jupyter Notebooks that visualize and detail the process to achieve 65% accuracy and 0.1 MAE on NBA team record prediction and 75% accuracy on matchup victor prediction
• Scraped, cleaned, and joined player and team data from multiple sources, followed by EDA (scatterplots and heatmaps) for feature analysis/selection
• Wrote custom functions to pull each roster's prior-season (season - 1) performance for a given season and weighted features by minutes played to reduce bias
• Modeled our final dataset of 1M+ rows with Random Forest Regressor/Classifier and Multiple Linear Regression from Scikit-Learn

Time Series Sales Forecast Dashboard

• Tableau Dashboard that visualizes 3-month projection of daily sales, achieved Kaggle competition top 30, with 20% MAPE
• Dashboard refreshed and automated by Python script that connects to database, manipulates data to optimize time series model, and uploads forecast back to database for Tableau to consume; run on a schedule via Alteryx

Presidential Candidate Twitter Sentiment Analysis

• Jupyter Notebook that reveals Twitter perception of 2020 Presidential Candidates in March using PostgreSQL, SQLAlchemy, Tweepy, and Scikit-Learn
• Designed database schema for Tweets, streamed Tweets into PostgreSQL, and conducted sentiment analysis with the NLTK package
• Utilized K-Means clustering to group similar tweets about each candidate to understand the main topics of discussion

KinectBeats - 1st Place HackTech 19 - Python Motion-to-Music Platform

• Music platform implemented with Kinect API to map hand gestures to sounds coded with Sonic Pi in Python
• Improved gesture recognition precision and accuracy by tweaking existing gesture images for a more diverse dataset

Education

Major: Bachelor of Science, Data Science
Minor: Business Management

GPA: 3.5/4.0

Relevant Coursework

Programming in Python I-III, Statistical Methods for Data Analysis I-III, Machine Learning, Database Management, Information Visualization, Probability & Statistics I-III, Bayesian Statistics, Data Science Capstone Project, Data Structures in C++, Analysis of Algorithms

Skills
  • Programming - Python, SQL, R, C++, HTML/CSS/JavaScript
  • Python - Pandas, NumPy, Matplotlib, Scikit-Learn, BeautifulSoup, TensorFlow, Data Structures and Algorithms
  • SQL - Snowflake, MySQL, PostgreSQL, SQLAlchemy, MongoDB
  • R - Statistical Modelling, Caret, RShiny
  • Applications - Tableau, Excel, Git, Alteryx, H2O.ai
  • Languages - Fluent in English and Chinese, Conversational Proficiency in Spanish
Affiliations
  • Undergraduate Data Science Association
  • UC Irvine Varsity Table Tennis
  • Donald Bren School of Information and Computer Science Peer Advisory
Graduated June 2020