Data Science/ML Interviews
Interview Pillars & Resources
...the never ending list
And now we have LLMs too!
BUT FIRST...
If you have no idea whatsoever..
Just do these courses to get started:
Machine Learning A-Z (Python & R in Data Science Course)
Learn Python for Data Structures, Algorithms & Interviews
... Create a GitHub repo. E.g. this
... Start solving past and present Kaggle competitions, starting with forecasting using LightGBM (LGBM), XGBoost, and Bayesian optimization
Disclaimer: All this is based on my experience of failing many many interviews and endlessly reading stuff on Reddit, Blind and LC
Resources
https://www.youtube.com/watch?v=-WEpWH1NHGU [start with this]
https://www.interviewquery.com/questions
4. A/B Testing
Sample size, power, alpha | |
Multiple metrics | Bonferroni Correction |
Bonferroni correction - multiple testing | Trustworthy Online Controlled Experiments [you can find a pdf] |
Type 1 and Type 2 errors | |
Experimental design - randomization unit, minimum detectable effect (MDE) | |
Multivariate testing (MVT) | |
A/B/n testing, multi-armed bandits | |
Simpson's paradox | |
Sample ratio mismatch (SRM) | |
Quasi-experiments | |
Z-test, t-test, ANOVA, ANCOVA, chi-square | |
A/B tests on non-normal data | |
Proportion testing | |
Summary of A/B Testing |
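Several of the topics above (sample size, power, alpha, proportion testing, Bonferroni correction) come down to a few lines of arithmetic. Here is a minimal sketch using only the Python standard library. The function names and the example numbers are made up for illustration, and the sample size formula is the common normal approximation for comparing two proportions, not the only valid one.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def sample_size_per_arm(p_base, mde_abs, alpha=0.05, power=0.80):
    """Approximate users needed per arm to detect an absolute lift of mde_abs."""
    p_new = p_base + mde_abs
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = _STD_NORMAL.inv_cdf(power)
    variance_sum = p_base * (1 - p_base) + p_new * (1 - p_new)
    return ceil((z_alpha + z_power) ** 2 * variance_sum / mde_abs ** 2)


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z statistic, two-sided p-value) using the pooled standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - _STD_NORMAL.cdf(abs(z)))
    return z, p_value


def bonferroni_alpha(alpha, n_metrics):
    """Per-metric significance level that keeps the family-wise error rate <= alpha."""
    return alpha / n_metrics


if __name__ == "__main__":
    # Hypothetical experiment: 10% baseline conversion, 2-point absolute MDE.
    print("users per arm:", sample_size_per_arm(0.10, 0.02))
    z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
    print("z:", round(z, 3), "p-value:", round(p, 4))
    print("alpha per metric with 5 metrics:", bonferroni_alpha(0.05, 5))
```

In an interview it helps to say out loud which inputs you are assuming (baseline rate, MDE, alpha, power), since the required sample size is very sensitive to the MDE.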
5. Product Sense
Resources
6. System Design: plenty of tutorials on YouTube [Tech Dummies is great!]
7. ML System Design: www.boringbot.xyz
8. Live Coding on ML Problems: Kaggle Kaggle Kaggle! [learn how to write precision/recall from scratch]
9. Writing Algorithms from scratch: hamzafarooq/algos: Building ML Algorithms ground-up
10. Behavioral: Amazon Leadership Principles
11/12. DS + ML Concepts: 100 Page ML Book
Bonus link for Data Science Concepts: ISLR Textbook Slides, Videos and Resources
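For the live-coding item above ("write precision/recall from scratch"), here is a minimal sketch in plain Python. The function names are my own, and returning 0.0 when a denominator is zero is an assumed convention; say which convention you use if asked.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for one positive class, no libraries."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Assumed convention: report 0.0 when the denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 1]
    y_pred = [1, 1, 1, 0, 0, 0]
    prec, rec = precision_recall(y_true, y_pred)
    print("precision:", prec, "recall:", rec, "f1:", f1_score(prec, rec))
```

Precision answers "of everything I flagged, how much was right?", and recall answers "of everything that was actually positive, how much did I catch?".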
DS Skills | Description | Expectations |
Data Querying | Ability to write queries in SQL, MySQL, Hive, or similar for joining, summarizing, and aggregating datasets from large-scale databases | Minimum expectations (fixed): the different types of joins and when to use each, GROUP BY, DISTINCT, UNION, basic subqueries, comparison operators. Good to have (depending on level): window functions, date/time manipulation, string formatting, running totals, pivoting, LAG/LEAD operations. |
Statistics | Understanding of the statistical intuition behind samples, populations, and hypothesis testing | Minimum expectations (fixed): central limit theorem, use cases of the top 3 statistical distributions, p-values, confidence intervals, linear regression, basic parametric tests such as the z-test and t-test. Good to have (depending on level): effect size, power analysis, sampling techniques, use cases of the top 10 statistical distributions, understanding of L1/L2 regularization. |
Machine Learning | Understanding of how machine learning works, and of the algorithms and thought process behind the top ML algorithms | Minimum expectations (fixed): overfitting and underfitting, training/validation/test sets, ability to deal with unclean data, how trees, clustering, logistic regression, and dimensionality reduction work, cross-validation, and how to evaluate which algorithm is better. Good to have (depending on level): tree pruning, bootstrapping, ensemble models, boosting, ROC curves, parameter tuning, when and how to balance accuracy against interpretability of ML models. |
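The Data Querying row is easy to practice locally. Below is a small sketch that runs a join with GROUP BY and a window-function query (running total plus LAG) through Python's built-in `sqlite3` module. The `users` and `orders` tables and their rows are made up for illustration, and the window functions assume the bundled SQLite is version 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy schema and data, for illustration only.
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER,
                     order_date TEXT, amount REAL);
INSERT INTO users VALUES (1, 'US'), (2, 'IN'), (3, 'US');
INSERT INTO orders VALUES
    (1, 1, '2024-01-01', 10.0),
    (2, 1, '2024-01-03', 20.0),
    (3, 2, '2024-01-02', 5.0);
""")

# LEFT JOIN keeps users with no orders; GROUP BY aggregates per country.
revenue_by_country = conn.execute("""
    SELECT u.country,
           COUNT(o.order_id) AS n_orders,
           COALESCE(SUM(o.amount), 0) AS revenue
    FROM users AS u
    LEFT JOIN orders AS o ON o.user_id = u.user_id
    GROUP BY u.country
    ORDER BY u.country
""").fetchall()

# Window functions: running total and previous order amount per user.
running_totals = conn.execute("""
    SELECT user_id, order_date, amount,
           SUM(amount) OVER (PARTITION BY user_id ORDER BY order_date) AS running_total,
           LAG(amount) OVER (PARTITION BY user_id ORDER BY order_date) AS prev_amount
    FROM orders
    ORDER BY user_id, order_date
""").fetchall()

for row in revenue_by_country:
    print(row)
for row in running_totals:
    print(row)
```

Swapping the LEFT JOIN for an INNER JOIN here is a quick way to see the difference: the user with no orders stops contributing to the result.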
DS Skills | Description | Expectations |
Fundamentals of Programming | Experience writing basic programs in any language and understanding of basic data structures | Minimum expectations (fixed): pseudocode for common problems, loops, counters, edge cases, basic space and time complexity of any approach, ability to write common programs such as computing the area of a triangle or checking for a palindrome. Good to have (depending on level): object-oriented programming, dictionaries and hash maps, breaking a problem down into subproblems, solid understanding of big-O notation, experience writing large programs, experience with version control (Git). |
Applied Math & Probability | Solid fundamentals of high school math, numbers, and probability | Minimum expectations (fixed): permutations, combinations, fundamental properties of probability, Bayes' theorem, basics of linear algebra and matrices. Good to have (depending on level): optimization, intuition for inflection points. |
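Bayes' theorem and counting questions from the Applied Math & Probability row are quick to sanity-check in code. This is a small sketch with a made-up screening-test scenario. The `posterior` helper and its numbers are hypothetical, and `math.comb` / `math.perm` need Python 3.8 or newer.

```python
from fractions import Fraction
from math import comb, perm


def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem."""
    true_pos = sensitivity * prior                   # P(positive and condition)
    false_pos = false_positive_rate * (1 - prior)    # P(positive and no condition)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false positives.
    p = posterior(Fraction(1, 100), Fraction(95, 100), Fraction(5, 100))
    print("P(condition | positive) =", p, "~", round(float(p), 3))

    # Counting: choose 2 of 5 items without order vs. with order.
    print("combinations C(5, 2):", comb(5, 2))
    print("permutations P(5, 2):", perm(5, 2))
```

Using `Fraction` keeps the arithmetic exact, which makes it easier to compare against a hand derivation on a whiteboard.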
Language Models
But why do I need to learn all this?
Resources
What is NLP anyways?
Resource: Ultimate Guide to Understand and Implement Natural Language Processing
Natural Language Processing (NLP) is the branch of Artificial Intelligence that gives computers the capability to understand text and spoken words in the same way a human being can. It combines machine learning models, statistics, and deep learning models with computational linguistics, i.e. rule-based modeling of human language.
The ultimate Notion Resource - built by my team
Different Kinds of Roles
Product Analyst | Data Analyst |
Metrics for Product growth/health | Pretty much a Data Analyst |
Ex: Our MAU and DAU are down by 10% | Ex: derive insights for all the different kinds of users across the board |
Focus on immediate commercial outcome | Focus on long-term outcome |
Data Scientist | ML Engineer |
Extract knowledge and insights from structured and unstructured data | ML models learn from data -> ML is part of data science |
Use data to help the company make decisions | Develop models to turn data into products |
Is a scientist -> engineering isn’t a top priority | Is an engineer -> engineering is a top priority |
Caveats
Research | Applied research |
Find the answers for fundamental questions and expand the body of theoretical knowledge. | Find solutions to practical problems |
Ex: develop a new learning method for unsupervised transfer learning | Ex: develop techniques to make that new learning method work on a real-world dataset |
Focus on long-term outcome | Focus on immediate commercial outcome |
Caveats