Ashish Trehan
Contact
ashishtrehan10@gmail.com – PN: 706-294-095
www.ashishtrehan.com
Skills and Subject Matter
Python (NumPy, SciPy, Pandas) | Automotive, Logistics, Healthcare, Security, Geospatial |
SQL (Postgres, MySQL, SQL Server, Snowflake, MariaDB) | Operations and Support |
Data Visualization (Plotly, D3.js, Tableau, Power BI, Periscope, Looker, Streamlit) | Media, Advertising, EdTech, Procurement, Pharma |
NLP (spaCy, NLTK, Hugging Face)
AI and Computer Vision (LangChain, YOLO, LlamaIndex) | Infrastructure and Databases (Docker, Git, Codefresh, CircleCI, Kibana, Snowflake, Hadoop, ZooKeeper, Redshift, Redis) |
Machine Learning (Classification, Prediction, Clustering, Optimization) using scikit-learn, OR-Tools, and Pyomo | Recommendation Systems (Collaborative Filtering and Knowledge-Based) |
Optimization (Pyomo, PuLP, Google OR-Tools) | Metric and KPI design and implementation |
Meta-Heuristics (Genetic Algorithm, Particle Swarm, and Simulated Annealing) | Market Segmentation |
Statistical Modeling (Statsmodels, R, SAS) | Qualitative and Quantitative Research |
Testing (nose, pytest, Selenium) | Data Engineering (Flask, FastAPI, Snowpipe, Snowpark, AWS (Lambda, ECR, Kubernetes), Airflow, Databricks, Azure Full Stack, Google Cloud, Redis, Alteryx, RabbitMQ, Alooma) |
WORK
TECH DATA LEAD
SpendHQ / September 2023 - Current / Atlanta, GA
- Leadership: Mentor five junior Data Engineers, with weekly one-on-ones
- Data Engineering:
- Co-architected and deployed SpendHQ's new data model, moving from a BFT to a normalized data architecture with Snowflake as the query and transform layer and SingleStore as the read layer
- Adopted Flyway to manage migration deployments for the new data model
- Rebuilt the ETL and rules execution, replacing the black-box Alteryx/Assembler solution with a self-service one built on the new data architecture, Snowflake stored procedures/tasks, and FastAPI
- Deployed Redis to publish events when the Rules Manager changed status
- Built an internal mock-data generator leveraging LLMs and time series to help Sales Consulting sell the product to other industries using data resembling their line of work, e.g., synthetic data for an OEM versus a CPG
- Incorporated LexoRank within the data pipeline to allow reordering, generating the new sequence during retrieval of the dataset (see the sketch below)
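A minimal sketch of the LexoRank-style midpoint keying described above; the base-36 alphabet and helper name are illustrative, not SpendHQ's production implementation:

    # LexoRank-style midpoint keys: each row stores a string rank, and
    # inserting between two rows only rewrites the moved row's key.
    ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"  # sorts lexicographically

    def midpoint(lo: str, hi: str) -> str:
        """Return a key that sorts strictly between lo and hi."""
        key, i = "", 0
        while True:
            lo_d = ALPHABET.index(lo[i]) if i < len(lo) else 0
            hi_d = ALPHABET.index(hi[i]) if i < len(hi) else len(ALPHABET)
            if hi_d - lo_d > 1:                      # room for a digit between
                return key + ALPHABET[(lo_d + hi_d) // 2]
            key += ALPHABET[lo_d]                    # shared prefix; go deeper
            i += 1

    print(midpoint("a", "b"))                  # "ai" -- sorts between "a" and "b"
    print(midpoint("a", midpoint("a", "b")))   # "a9" -- between "a" and "ai"

Retrieval then simply orders by the rank column, so reordering never requires renumbering the rest of the dataset.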
LEAD SENIOR DATA ENGINEER
Healthgrades / July 2021 - August 2023 / Atlanta, GA
- Leadership: Mentored two BI Analysts and two Data Engineers, with weekly one-on-ones
- Data Engineering:
- Architected and deployed 60 Snowpipe workflows ingesting from S3 to Snowflake, helping sunset Trifacta and MySQL; this saved the team $100,000 in renewal costs and let more developers support and manage the solution.
- Managed and designed the migration of AdRevOps data from MySQL and SQL Server to Snowflake, following Data Mesh principles for a more collaborative, scalable, and decentralized architecture that enabled:
- Self-service for analysts and developers
- Domain-oriented data governance
- Data as a Service, helping Product Owners ideate features faster and enabling developer-forward services
- Replaced email-attachment download workflows with Lambdas and Airflow jobs hitting APIs or pulling data from other cloud providers.
- Designed the Data Dictionary and Catalog in Alation for PII and PHI data.
- Designed the role-based security model within Snowflake, keeping staging tables under developer control and exposing views for the business to build reports on.
- Co-architected and developed the mapping of claims data into the FHIR standard, letting customers transfer data between insurance providers per regulatory requirements, leveraging Apigee, Azure Data Factory, and Databricks.
- Deployed POC of Streamlit dashboard to replace Tableau Dashboards
- The Streamlit dashboard joined marketing and advertising data across multiple sources; daily reports drove efficiencies within the Ad Rev Ops team, turning weekly and end-of-month reporting into daily reports for real-time decision making.
- Deployed predictive time-series plots for the Ad Rev Ops team forecasting expected impressions and clicks for ad campaigns; forecast error (sMAPE) ran 5% to 8% (metric sketched below).
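For reference, a minimal sketch of the sMAPE metric quoted above (the toy numbers are illustrative, not campaign data):

    import numpy as np

    def smape(actual: np.ndarray, forecast: np.ndarray) -> float:
        """Symmetric mean absolute percentage error, in percent."""
        denom = (np.abs(actual) + np.abs(forecast)) / 2.0
        return float(np.mean(np.abs(forecast - actual) / denom) * 100)

    # Example: daily impressions, forecast vs. actual, for one campaign.
    actual = np.array([12000, 13500, 12800, 14100])
    forecast = np.array([11500, 14200, 12500, 13600])
    print(f"sMAPE: {smape(actual, forecast):.1f}%")  # ~3.8% on this toy data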
DATA SCIENTIST
Uplift K12 / December 2020 - June 2021 / Atlanta, GA | Houston, TX
- Designed and deployed the new data architecture for Uplift K12's legacy product and its new product, Nora.
- Served as Technical Project Manager, scoring and prioritizing engineering tasks onto a roadmap; the roadmap was created in Aha! and all tasks were written in Asana.
- Worked with the Engineering team to implement three-week sprint cycles and adopt CircleCI for continuous integration.
- Managed all current sprint planning and story writing.
- Deployed the new data architecture on AWS; tested all PostgreSQL locally using Docker.
- Wrote all regression and acceptance tests for legacy product lines using Selenium.
- Built and deployed the new product's API using FastAPI (minimal sketch below)
- Deployed a reporting dashboard for user engagement using Cube.js
- Prototyped in Tableau, Periscope, and Looker before settling on Cube.js
- Built algorithmic ranking of incoming requests for the new product.
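A minimal FastAPI sketch of a request-ranking endpoint; the model fields and scoring weights are hypothetical stand-ins, not Uplift K12's actual logic:

    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class TutoringRequest(BaseModel):      # hypothetical request shape
        student_id: int
        wait_minutes: float                # how long the request has queued
        priority: int                      # e.g. 1 (low) to 3 (high)

    @app.post("/rank")
    def rank_requests(requests: list[TutoringRequest]) -> list[int]:
        """Return student_ids ordered by a simple weighted score."""
        scored = sorted(requests,
                        key=lambda r: r.priority * 10 + r.wait_minutes,
                        reverse=True)
        return [r.student_id for r in scored]

    # Run with: uvicorn main:app --reload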
DATA SCIENTIST | BI MANAGER
Clutch Technologies / July 2016 - September 2020 / Atlanta, GA
Data Science:
- Replaced the manual vehicle-recommendation process, initially with a heuristic model based on business SME knowledge balancing two metrics, likability and predictability.
- The recommendation system later evolved to use explicit inputs (vehicle ratings, make, and model) and implicit inputs (time in vehicle), recommending vehicles via Matrix Factorization, a class of collaborative filtering (see the factorization sketch after this list).
- Vehicle and Concierge Assignment:
- The recommendation system's predicted ratings then fed a combinatorial optimization problem: maximize total predicted rating over the matching of users requesting vehicles to available or soon-to-be-available vehicles (see the assignment sketch after this list).
- Customer representatives handled assignment manually before our algorithm's deployment; automation grew from 40% to eventually 81% of all incoming requests.
- We later added a layer determining which concierges would complete each vehicle's pick-up and delivery.
- Automated unit testing of this algorithm with nosetests to ensure we still reached the global maximum after adding new code to the master repo.
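A minimal sketch of the matrix-factorization recommender described above, fit by gradient descent on a toy ratings matrix (dimensions, learning rate, and regularization are illustrative):

    import numpy as np

    # Toy user-by-vehicle ratings; 0 marks an unobserved rating to predict.
    R = np.array([[5, 3, 0, 1],
                  [4, 0, 0, 1],
                  [1, 1, 0, 5],
                  [0, 1, 5, 4]], dtype=float)
    mask = R > 0
    k, lr, reg = 2, 0.01, 0.02            # latent factors, step size, L2 penalty
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(R.shape[0], k))   # user factors
    V = rng.normal(scale=0.1, size=(R.shape[1], k))   # vehicle factors

    for _ in range(2000):                 # gradient steps on observed entries only
        err = mask * (R - U @ V.T)
        U += lr * (err @ V - reg * U)
        V += lr * (err.T @ U - reg * V)

    print(np.round(U @ V.T, 2))           # predicted ratings fill the blanks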
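And a sketch of the assignment step posed as a linear assignment over predicted ratings; SciPy's Hungarian solver stands in here for the production optimizer:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    # Toy predicted ratings of each requesting user for each available vehicle.
    ratings = np.array([[4.5, 3.1, 2.0],
                        [3.9, 4.8, 1.5],
                        [2.2, 3.3, 4.1]])

    # Maximize total predicted rating across the user-to-vehicle matching.
    users, vehicles = linear_sum_assignment(ratings, maximize=True)
    for u, v in zip(users, vehicles):
        print(f"user {u} -> vehicle {v} (predicted rating {ratings[u, v]})")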
Business Intelligence:
- Designed and Maintained 30 Client Facing Dashboards
- Dashboards built on Periscope (Sisense) helped manage, track, and report the day-to-day operations of clients ranging from Fortune 500 OEMs to large regional dealerships
- 168 different visualizations required maintenance and support
- Worked with Business Stakeholders to design a metric around churn.
- All internal and external ad-hoc reporting tasks were routed to me as tickets
Data Engineering:
- All algorithmic services, like the Recommender and Assignment, were built as backend APIs using Flask, later migrating to FastAPI
- Used Segment to collect and pipeline web-traffic data, adding zip codes to incoming IP addresses to support the marketing segmentation model. Used Redis to build visitor sessions per business requirements (sessionization sketched below). All data was channeled into Snowflake, and we used Periscope to build a dashboard for the Marketing team.
- Built a pipeline for all telematics (geolocation) data: JSON from Elasticsearch was parsed and moved via AWS Glue to Redshift, with Periscope and Kibana surfacing the relevant data
- Built logging that shipped all service logs to Elasticsearch, with Kibana for monitoring and debugging
- While testing and phasing in services, messages were first sent to Slack webhooks to aid debugging and monitoring
- Built many SQL views of denormalized tables to support dashboards
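A minimal sketch of the Redis sessionization rule mentioned above; the key scheme and 30-minute inactivity window are illustrative assumptions, not the actual business rules:

    import time
    import redis

    r = redis.Redis()
    SESSION_TTL = 30 * 60   # assumed 30-minute inactivity window

    def touch_session(visitor_id: str) -> str:
        """Return the visitor's current session id, opening a new one after inactivity."""
        key = f"session:{visitor_id}"
        session_id = r.get(key)
        if session_id is None:                   # expired or first visit
            session_id = f"{visitor_id}:{int(time.time())}"
            r.set(key, session_id)
        r.expire(key, SESSION_TTL)               # sliding expiry on every event
        return session_id if isinstance(session_id, str) else session_id.decode()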
Project Management:
- Working with the Cox Automotive Data Science team, wrote requirements for an automotive portfolio optimization project
- Articulated business needs into clear requirements for project success
- Wrote our own stories within an Agile sprint framework for all projects
SENIOR ASSOCIATE DATA SCIENCE ANALYST
dRISTi 360 / July 2015 - July 2016 / Atlanta, GA
Business and Data Analytics Associate advancing predictive modeling within the advertising technology stack, leveraging social-stream data from Twitter and Facebook to advise on and implement segmentation strategies over networks' audiences. Created data-driven segments to craft sales stories helping networks prospect and create media plans for brands and advertisers.
- Co-designed and implemented an Audience Insights business application reporting industry KPIs and data visualizations, reducing the need to build individualized client presentations.
- Developed and executed the functional-requirements design for our technology to ensure valid segmentation could be made from social data
- Built the data-cleansing requirements for social data to ensure we were finding tweets that corresponded with our content dictionaries
- Developed and implemented data-cleansing statistics (confusion matrix, accuracy, Matthews correlation coefficient) and a noise-reduction framework using Random Forest for feature extraction and a Support Vector Machine as the decision model to classify tweets as accept/reject and other classes
- My fluency with the data enabled a workflow and pipeline running from HDFS to Hive, preprocessing with MinHash, then visual inspection with bar charts to ensure a balanced training set across classes. After the training set was prepped and manually classified, Random Forest pulled the feature set and a Support Vector Machine served as the decision model. This improved overall efficiency in dictionary creation and moved accuracy from 71% to 85%.
- Data Discovery Platform: created IPython notebooks, deployed on a server, letting the broader team inspect tweet data to discover keywords or topics that correlated heavily with the overall topic (a TV show), using SVD for dimensionality reduction and t-SNE to plot the data for visual inspection (sketched below); this gave the company a 50% productivity gain in finding new keywords for the content dictionary.
- Supported analytics efforts to drive actionable insights, including numerous ROI business-case analyses of program-based outcomes
- Delivered exec-level presentations walking through the front-end Ad Sales application and gathering requirements.
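A minimal sketch of the SVD-plus-t-SNE inspection flow, using scikit-learn on toy tweets (the real pipeline would reduce to far more SVD components than shown here):

    import matplotlib.pyplot as plt
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.manifold import TSNE

    tweets = ["loved last night's episode", "that finale twist was wild",
              "ordering pizza for the game", "new season trailer looks great"]

    X = TfidfVectorizer().fit_transform(tweets)            # sparse term matrix
    X_svd = TruncatedSVD(n_components=2).fit_transform(X)  # SVD reduction
    X_2d = TSNE(n_components=2, perplexity=2,
                init="random").fit_transform(X_svd)        # 2-D layout

    plt.scatter(X_2d[:, 0], X_2d[:, 1])
    for i, t in enumerate(tweets):
        plt.annotate(t[:20], X_2d[i])                      # label points for inspection
    plt.show()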
BIG DATA AND ADVANCED ANALYTICS CONSULTANT
Slalom Consulting / June 2014 - June 2015 / Atlanta, GA
Big Data and Advanced Analytics consultant who advised on and implemented Business Intelligence, Advanced Analytics, and Big Data projects.
- Assisted in the SQL development in AWS Redshift and the workflow design of a cross-customer platform helping customer service better understand the customer journey of cable clients, using Tableau. The database combined call-center data, web-stream data, and truck rolls to begin understanding how to reduce service cost and call-in rate. Built Tableau dashboards into analytic platforms helping analysts find at what page depth customers would quit and then call in, and used Sankey graphs to trace a work order's progression through web, call, and finally truck roll. Found that sending the set-top box a hit/reset was the most successful deterrent to truck rolls, letting customers be serviced without a call-in or truck roll.
- Delivered a survival-analysis presentation on worker retention, showing the client was likely to lose male workers after the 4th year of work and female workers in the 7th (sketched below)
- Deployed a self-service R + Python (IPython) server as an internal tool for the Advanced Analytics practice, later shared with the broader Business Intelligence team as a way to share and co-develop algorithms and code.
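A minimal sketch of that survival analysis using Kaplan-Meier curves; lifelines and the toy tenure data are stand-ins for the client's actual method and data:

    import pandas as pd
    import matplotlib.pyplot as plt
    from lifelines import KaplanMeierFitter

    # Toy tenure data: years employed and whether the worker left (1) or stayed (0).
    df = pd.DataFrame({
        "years":  [1, 3, 4, 4, 5, 6, 7, 7, 8, 2],
        "left":   [0, 1, 1, 1, 0, 1, 1, 1, 0, 0],
        "gender": ["M", "M", "M", "M", "F", "F", "F", "F", "F", "M"],
    })

    kmf = KaplanMeierFitter()
    for gender, grp in df.groupby("gender"):
        kmf.fit(grp["years"], grp["left"], label=gender)
        kmf.plot_survival_function()       # retention curve per group
    plt.xlabel("Years employed")
    plt.show()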
PRACTICE AREA LEAD - DATA ANALYTICS
Sogeti Consulting (CapGemini) / March 2013 - July 2014 / Atlanta, GA
IT and Analytics professional within the information management and analytics practice, with an emphasis on Big Data technology ecosystems, advanced analytics, and data visualization across customer service and manufacturing.
- Sogeti, NCR IT Services, and the Teradata-Aster team collaborated on NCR's first Big Data initiative: a multi-disciplinary team, spanning data analysts to business experts, charged with leveraging the Aster appliance and machine-learning techniques to predict the failure of devices on ATM units. Used statistical methods to quantify the predictions of the legacy model NCR used for predicting work orders, and used chi-squared independence tests, ROC charts, and scatter plots to test whether the legacy and Random Forest models beat a random guess (test sketched below). Developed and ran Python code that created SQL functions to run our Random Forest ensemble model over Aster-MR; the model moved the needle from predicting under 10% of work orders with the old rule-based system to 32%.
- Used SAS to build statistical models around Chi-Square Tests
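A minimal sketch of the chi-squared independence test used to check a model against random guessing (the counts are toy numbers, not NCR data):

    import numpy as np
    from scipy.stats import chi2_contingency

    # Contingency table: model prediction (rows) vs. actual device failure (cols).
    table = np.array([[120,  30],    # predicted failure: actually failed / did not
                      [ 40, 310]])   # predicted OK:      actually failed / did not

    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi2={chi2:.1f}, p={p:.4g}")  # small p: predictions beat a random guess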
Education
Master's in Applied Economics, University of Georgia, Athens, Georgia (2013)
Bachelor of Science (B.S.) in Architecture, University of Georgia, Athens, Georgia (2010)