Lecture 14: A Few Tips for Data Science in the “Real” World
Applied Data Science Fall 2025
Amir Hesam Salavati
Hamed Shah-Mansouri
Last Session We Covered...
Explainable AI
Global Explanation
Local Explanation
Imbalanced Data
In this session
Data Pipelines
https://russiabusinesstoday.com/infrastructure/russia-to-build-20000-km-of-oil-gas-pipelines-by-2022-report/
Pipeline Purpose
Image: https://www.geeksforgeeks.org/whats-data-science-pipeline/
Pipeline Tools
Spectrum of Pipelines
Apache Spark
Source: https://databricks.com/spark/about
Kafka
Source: https://axual.com/what-is-apache-kafka/
Apache Airflow
Scikit-learn Pipelines + Transformers
Image: https://youtube.com/watch?v=jzKSAeJpC6s
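A minimal sketch of what a scikit-learn pipeline with transformers can look like. The toy data, the column names (`age`, `city`), and the choice of `LogisticRegression` are assumptions made for illustration and are not taken from the slides.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data (made up for illustration): one numeric and one categorical feature.
X = pd.DataFrame({
    "age": [22, 35, np.nan, 58, 41, 29, 63, 37],
    "city": ["A", "B", "A", "C", "B", "A", "C", "B"],
})
y = np.array([0, 0, 0, 1, 1, 0, 1, 1])

# Numeric branch: fill missing values with the median, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Route each column group to its own transformer.
preprocess = ColumnTransformer([
    ("num", numeric, ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# Preprocessing and the estimator are fit and applied as a single object.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression()),
])

model.fit(X, y)
predictions = model.predict(X)
```

Because the preprocessing steps live inside the pipeline, the same fitted transformations are reused at prediction time, which helps avoid leaking test data into the fitting of scalers or imputers.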
Pandas Pipes
https://hackersandslackers.com/merge-dataframes-with-pandas/
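A small sketch of chaining steps with `DataFrame.pipe`. The helper functions (`drop_missing`, `add_total`, `keep_above`) and the toy columns are hypothetical names introduced only for this example.

```python
import pandas as pd


def drop_missing(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Remove rows where `column` is missing."""
    return df.dropna(subset=[column])


def add_total(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a `total` column; the input frame is not mutated."""
    return df.assign(total=df["price"] * df["quantity"])


def keep_above(df: pd.DataFrame, column: str, threshold: float) -> pd.DataFrame:
    """Keep rows where `column` is strictly greater than `threshold`."""
    return df[df[column] > threshold]


# Toy data (made up for illustration).
raw = pd.DataFrame({
    "price": [10.0, None, 4.0, 25.0],
    "quantity": [2, 1, 5, 1],
})

# Each step takes a DataFrame and returns a DataFrame, so the steps chain
# top to bottom in the order they are applied.
clean = (
    raw
    .pipe(drop_missing, column="price")
    .pipe(add_total)
    .pipe(keep_above, column="total", threshold=20.0)
)
```

Writing each step as a plain function keeps the steps individually testable, and the chain reads like a description of the pipeline.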
No-Code Pipeline
Image: https://siliconangle.com/2021/12/17/hevo-data-closes-30m-round-provide-data-pipelines-cloud/
Cloud-based Pipeline
Some Examples of Real Data Pipelines
Pinterest: over 100M MAU generating 10B+ pageviews per month
Airbnb: over 100M users browsing over 2M listings
Automated Machine Learning (AutoML)
Image: https://datafloq.com/read/why-automated-machine-learning-important/
AutoML, Its Roots and Applications
AutoML Range of Applications
AutoML Tools
Cloud AutoML
Challenges of Data Science in “Real” Life Applications
Image: https://summitphotography.ca/
Real-life environments usually come with constraints (e.g., cost, resources, speed), and we have to solve the problem within those boundaries.
Accuracy vs. Speed vs. Cost Tradeoffs
Accuracy vs. Explainability (vs. Manipulation) Tradeoffs
Cold Start Problem: What to Do When We Have No Data?
Cold Start Problem
A/B Testing: Why Do Limited Releases at First?
A/B Testing, Even When You Are Pretty Sure!
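One way to check whether a limited release actually changed a conversion rate is a two-proportion z-test. The sketch below is one common choice, not necessarily the method used in the lecture; the function name and the counts are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for H0: both variants convert equally."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: variant A is the current version, B is the new one.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the observed difference is unlikely under the assumption that both variants perform the same. The normal approximation used here assumes reasonably large samples in both groups.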
Necessary Skills for an Applied Data Scientist
Image: http://www.leansolutions.gr/blog/what-are-benefits-multi-skilling-in-production/
Another Data Analysis Pipeline/Workflow
Problem
Gathering Data
Analysis
Knowledge & Insight
Exploratory Data Analysis
Data Cleaning/Preprocessing
Which Parts Did We Cover in This Course?
What We Have Covered in This Course
Problem
Gathering Data
Analysis
Knowledge & Insight
Exploratory Data Analysis
Data Cleaning/Preprocessing
What Remains: Problem Formulation
Image: https://techgrabyte.com/10-machine-learning-algorithms-application/
What Remains: Data Gathering
Image: https://docs.eazybi.com/eazybi/data-import/external-data-sources
What Remains: Insight Sharing and Storytelling
Source: https://madelearningdesigns.com/2014/02/06/using-digital-stories-in-elearning/ + https://forbes.com/sites/jeffthomson/2021/05/28/the-power-of-storytelling-in-the-finance-function/
Additional Soft Skills That Could Help
Some Helpful Resources
Image: https://ideas.ted.com/teds-summer-culture-list-114-podcasts-books-tv-shows-movies-and-more-to-nourish-you/
Courses
Blogs
Community
Podcasts
YouTube Channels
Data Pipelines
AutoML
Tips for Data Science in the “Real” World
https://towardsdatascience.com/the-absolute-beginners-guide-for-data-science-rookies-736e4fcbff0a
ToDo List for Next Session