Course map
Python & Colab (2 weeks): Develop intermediate Python skills; learn how to use code and text cells in Colab notebooks. Access data on Google Drive and in repositories.
pandas, matplotlib & visualization (3 weeks): Learn how to use dataframes. Set up plots and draw annotated charts using dataframes.
APIs & City data (3 weeks): Establish data feeds, learn about ETL, and analyze and identify patterns.
Global data (3 weeks): Research and correlate multiple data sets from different sources.
Real-time data (3 weeks): Process streaming data to derive time-dependent characteristics.
Diagram labels: Tools & techniques; theory & methods
Python & Colab (2 weeks)
Transition to Python: write simple functions, appreciate multi-value returns (one function returning several values), and write basic programs in Python.
Understand the Colab environment: it is a simple IDE for Python combined with a document editor. Learn the basic syntax of Markdown (MD) as well as simple math typesetting using LaTeX.
Learn how to mount your Google Drive in a Colab notebook so your Python code can read files stored there.
Review of basic quantitative techniques.
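A minimal sketch of a multi-value return applied to a basic quantitative technique: one function hands back both the mean and the population standard deviation of a list. The function name `mean_and_spread` is hypothetical; `statistics.fmean` and `statistics.pstdev` are standard-library functions (Python 3.8+).

```python
import statistics
from typing import Sequence, Tuple


def mean_and_spread(values: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, population standard deviation) as a single tuple."""
    return statistics.fmean(values), statistics.pstdev(values)


# Tuple unpacking receives both return values in one assignment.
average, spread = mean_and_spread([2, 4, 4, 4, 5, 5, 7, 9])
print(f"mean={average}, std={spread}")  # prints "mean=5.0, std=2.0"
```

Python's built-in `divmod` follows the same pattern: `divmod(17, 5)` returns `(3, 2)`, the quotient and remainder together.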
pandas, matplotlib & visualization (3 weeks)
Develop an understanding of pandas DataFrames: how to create them, how to edit them, and so on.
The ETL (Extract-Transform-Load) process. Extract: import data files from Google Drive or from online sources into a dataframe. Transform: curate the dataframe to identify missing, corrupt, or meaningless data. Load: pass the dataframe into analytical or visualization tools.
For visualization, develop familiarity with the matplotlib package. Learn how to customize graphs.
Correlation maps, histograms, and clusters.
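The three ETL steps can be sketched with pandas as follows. This is a minimal illustration, not the course's reference solution: the inline CSV, its column names, and the plausible temperature range are all hypothetical stand-ins for a real file on Google Drive or an online source.

```python
import io

import pandas as pd

# Hypothetical raw data with one missing value, one corrupt value ("abc"),
# and one meaningless sentinel (-999).
RAW_CSV = """station,date,temperature_c
A,2024-01-01,3.5
A,2024-01-02,
B,2024-01-01,abc
B,2024-01-02,-999
C,2024-01-01,4.0
"""


def extract(source) -> pd.DataFrame:
    """Extract: read CSV from a path, URL, or file-like object."""
    return pd.read_csv(source)


def transform(df: pd.DataFrame, valid_range=(-60.0, 60.0)) -> pd.DataFrame:
    """Transform: turn corrupt or implausible readings into NaN, then drop them."""
    df = df.copy()
    # errors="coerce" maps unparseable strings such as "abc" to NaN.
    temps = pd.to_numeric(df["temperature_c"], errors="coerce")
    low, high = valid_range
    # Keep only values inside the assumed plausible range; others become NaN.
    df["temperature_c"] = temps.where(temps.between(low, high))
    df["date"] = pd.to_datetime(df["date"])
    return df.dropna(subset=["temperature_c"])


def load(df: pd.DataFrame) -> pd.Series:
    """Load: hand the clean frame to an analysis step (per-station mean)."""
    return df.groupby("station")["temperature_c"].mean()


clean = transform(extract(io.StringIO(RAW_CSV)))
summary = load(clean)
print(summary)
```

Keeping each step in its own function means the same `transform` can later run on a much larger file. In a notebook, `summary.plot(kind="bar")` would pass the result on to matplotlib.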
APIs & City data (3 weeks)
Learn how to use APIs for public data repositories. Case study: City of Chicago data portal.
Build an analytical tool from scratch, first with a local copy of a small data subset, then by using the API to pull the full data set.
Develop an ETL strategy, including how to handle bad data.
Discover and narrate simple associations. Experiment with K-means clustering.
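Moving from a small local subset to the full data set usually means paging through the API. The sketch below assumes a Socrata-style endpoint using the SoQL `$limit`, `$offset`, and `$order` query parameters. The dataset id, the field names, and the helpers `page_url`, `fetch_all`, and `clean_records` are hypothetical. The download function is injectable, so the same code can be exercised against local data before it ever touches the network.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Socrata-style resource URL; "{dataset_id}" is a placeholder.
BASE_URL = "https://data.cityofchicago.org/resource/{dataset_id}.json"


def page_url(dataset_id: str, limit: int, offset: int) -> str:
    """Build one paged query; ordering by :id keeps pages stable."""
    params = {"$limit": limit, "$offset": offset, "$order": ":id"}
    return BASE_URL.format(dataset_id=dataset_id) + "?" + urlencode(params)


def fetch_json(url: str) -> List[Dict]:
    """Download one page of records (performs a network request)."""
    with urlopen(url) as response:
        return json.load(response)


def fetch_all(
    dataset_id: str,
    page_size: int = 1000,
    fetch: Callable[[str], List[Dict]] = fetch_json,
) -> Iterator[Dict]:
    """Yield every record, page by page, stopping at the first short page."""
    offset = 0
    while True:
        page = fetch(page_url(dataset_id, page_size, offset))
        yield from page
        if len(page) < page_size:
            break
        offset += page_size


def clean_records(records: Iterable[Dict], required_fields: List[str]) -> List[Dict]:
    """A simple bad-data policy: drop records missing any required field."""
    return [
        r for r in records
        if all(r.get(field) not in (None, "") for field in required_fields)
    ]
```

Dropping incomplete rows is only one possible policy; depending on the analysis, flagging or imputing values may be more appropriate.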
Global data (3 weeks)
Does a country’s freedom of the press correlate with its commitment to transparency? How about with its quality of life? And how do we measure commitment to transparency or quality of life?
In this module you’ll explore data sets from the World Bank and major non-governmental organizations to narrate correlations in socioeconomic data.
Explore the correlation/causation barrier. Formulate hypotheses and test them.
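Correlating data sets from different sources starts with joining them on a shared key, then measuring and testing the association. The sketch below uses a pandas inner merge, Pearson's r via `Series.corr`, and a permutation test as one possible hypothesis test (a swapped-in technique, not necessarily the one used in class). The country codes and indicator values are made-up toy numbers, not real World Bank or NGO data.

```python
import random

import pandas as pd

# Hypothetical sources keyed by a shared country code (toy values only).
source_a = pd.DataFrame({
    "country": ["AAA", "BBB", "CCC", "DDD", "EEE"],
    "indicator_a": [1.0, 2.0, 3.0, 4.0, 5.0],
})
source_b = pd.DataFrame({
    "country": ["AAA", "BBB", "CCC", "DDD", "FFF"],
    "indicator_b": [3.0, 5.0, 7.0, 9.0, 4.0],
})


def merge_sources(a: pd.DataFrame, b: pd.DataFrame, key: str = "country") -> pd.DataFrame:
    """Inner join: keep only countries that appear in both sources."""
    return a.merge(b, on=key, how="inner")


def permutation_test(x, y, n_permutations: int = 2000, seed: int = 0):
    """Return (r, p): Pearson r and a two-sided permutation p-value.

    Null hypothesis: no association. Shuffling y breaks any link with x,
    so p estimates how often chance alone gives an |r| at least as large.
    """
    x = pd.Series(x, dtype=float).reset_index(drop=True)
    y = pd.Series(y, dtype=float).reset_index(drop=True)
    r_observed = x.corr(y)  # Pearson correlation by default
    rng = random.Random(seed)
    shuffled = list(y)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        r_perm = x.corr(pd.Series(shuffled))
        # Small tolerance so floating-point noise does not hide ties.
        if abs(r_perm) >= abs(r_observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return r_observed, p_value


merged = merge_sources(source_a, source_b)
r, p = permutation_test(merged["indicator_a"], merged["indicator_b"])
print(f"n={len(merged)}, r={r:.3f}, p~{p:.3f}")
```

Here only four countries survive the merge, so even a perfect r = 1 yields a p-value of roughly 0.08: small samples limit what a test can show, and a small p-value on its own still says nothing about causation.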
Real-time data (3 weeks)
How can we capture data in real-time? What can we do with time-dependent data?
In this module you will explore simple time-series analysis and aggregation of streaming data. There are many sources of real-time data: aviation traffic, stock markets, ham radio, social media, etc.
In addition to time series observations, some data streams can be analyzed for sentiments (e.g., social media posts).
Basics of natural language processing.
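A minimal sketch combining both ideas: each incoming post gets a crude lexicon-based sentiment score, and a trailing time window aggregates the scores as the stream arrives. The word lists and posts are hypothetical, and counting positive minus negative words is a deliberately simple stand-in for real NLP tooling. The code assumes events arrive in timestamp order.

```python
import re
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

# Hypothetical toy lexicon; a real project would use a curated one.
POSITIVE = {"good", "great", "love", "happy", "smooth"}
NEGATIVE = {"bad", "late", "hate", "delayed", "angry"}


def sentiment_score(text: str) -> int:
    """Score a post as (# positive words) - (# negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def windowed_mean(
    events: Iterable[Tuple[float, float]], window_seconds: float
) -> Iterator[Tuple[float, float, int]]:
    """Yield (timestamp, mean, count) over the trailing window (t - window, t]."""
    window: Deque[Tuple[float, float]] = deque()
    running_sum = 0.0
    for ts, value in events:
        window.append((ts, value))
        running_sum += value
        # Evict events that have fallen out of the trailing window.
        while window and window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            running_sum -= old_value
        yield ts, running_sum / len(window), len(window)


# Usage: score hypothetical (timestamp_seconds, text) posts, then aggregate.
posts = [(0, "Great flight, love it!"), (2, "Flight delayed. Bad day."), (9, "Smooth landing")]
scored = ((ts, sentiment_score(text)) for ts, text in posts)
for ts, mean, count in windowed_mean(scored, window_seconds=5):
    print(ts, mean, count)
```

The running sum with a deque makes each update cost proportional only to the number of evicted events, so the aggregation keeps pace with a live feed instead of rescanning history.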
Have questions?
I already know Python. What am I supposed to do for two weeks?
Spend more time mastering the Markdown component of Colab notebooks and develop skills in mathematical typesetting using LaTeX. Pair with a classmate who’s not as familiar with Python and assist them.
I’ve used pandas, matplotlib, etc., before. What should I do for three weeks?
Get an early start with seaborn and scikit-learn. That said, if you are already familiar with pandas and matplotlib, please speak with me to make sure you are not taking a redundant course.
Can I work on my own project?
Yes, but conditions apply. If you have an idea that is relevant to other parts of your studies or work, please speak with me to discuss options.