The fight against the COVID-19 pandemic is being driven by data and model projections. These models provide crucial insights to help mobilize citizens, public health entities and governments in response to the virus.

Data science is a multidisciplinary field founded on statistics, data analysis and machine learning. It aims to extract knowledge and insights from big and small data. It’s advancing almost every aspect of our lives, from public health and healthcare, so evident today, to politics, economics, entertainment and national security. It’s spawning new startups and technology, such as targeted advertisements, fraud detection, self-driving cars, search recommender systems, stock forecasting and health diagnostics tools.

In this seminar you will hear from leading researchers at Dartmouth across a wide variety of fields who are extracting new insights from data to advance their respective fields. You will also read and present research papers on key areas in data science. The seminar also includes a number of programming assignments that reinforce concepts and computational methods widely used in data science, in preparation for the group projects. The programming assignments will use the pydata stack: the Python open data science stack.

There will be no formal lectures, and a large amount of self-learning and data hacking will be required. We intend to use most X-hours to briefly preview the weekly programming assignments. Note that there is little math in this seminar; it’s mostly focused on hacking using the pydata stack. While advanced seminars such as this are geared more toward research and graduate students, undergraduates are very welcome.

Prerequisites: Python and Machine Learning or Professor Campbell’s approval.

Our seminar guest speakers

 Class information and Logistics

The plan is to run the seminar as follows, though we may have to adapt. All interaction will be via Zoom (the Zoom ID will be listed on Canvas).

Time: 2A Tuesday 2.25-4.15 pm, Thursday 2.25-4.15 pm, Wednesday-X 4.35-5.25 pm

Seminar leader: Andrew T. Campbell. Zoom office hours Monday and Friday 4-5 pm

TA: Shayan Mirjafari 

Office hours: Zoom office hours.

Guest Speakers: Guest presenters will record their talks, and the recordings will be made available via Canvas. I am hoping that the guest presenters will be available for 30 minutes during the Tuesday or Thursday session for which their talk was previously scheduled.

Assignments: Shayan Mirjafari will present a brief summary of each assignment. Again, the video will be available via Canvas. He will be available for questions on Zoom during the X-hour slot on Wednesday-X 4.35-5.25 pm.

Student presentations. We are planning to use Zoom. I will ask presenters to record their presentations in case there is an issue with live streaming. I will post the video on Canvas. In a Zoom setting we may be able to have a Q and A as in a traditional seminar.

  Coursework and grading

Reading/critiques 20%

Presentations: 15%

Programming assignments: 40%

Group project: 25%

  Writing critiques

You are required to read the papers presented in class and write a critique. A critique should be at least 1.5 pages long but can be longer. It should include the following:

Note, I will grade most of your critique on what you liked (strengths), what you didn’t like (weaknesses) and what the holes were (your proposed future work).

Please read this: Keshav, S. (2007). How to read a paper. ACM SIGCOMM Computer Communication Review, 37(3), 83-84.

And check out How to give a good presentation

Grading Critiques

There are approximately three papers per week: one from the invited speaker and two presented by students. You must write a critique of the invited speaker’s paper (mandatory), and you can select one of the student-presented papers for the other critique. That means on average you will read 3 papers and write 2 critiques per week. Critiques are due at 11.59 pm the day before the speaker or student presents.

The critiques serve two purposes in my mind:

So you come to class with good knowledge of the paper and are ready to contribute to the discussion of the contributions, and pros and cons of the paper presented.

Programming assignments

Each student will complete programming assignments that reinforce ideas and techniques in data science. The tentative list of assignments is as follows:

  1. Data acquisition
  2. Statistical tests: confidence interval, t-tests, ANOVA, correlation. MPG dataset and Hanover climate dataset.
  3. Linear regression: training and interpretations. Bike rental dataset
  4. Classification – credit card fraud detection
  5. Twitter: topic models and sentiment analysis
  6. Deep learning and TensorFlow
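
As a rough illustration of the kind of computation behind items 2 and 3, here is a minimal sketch of a Welch two-sample t-statistic, a Pearson correlation and a least-squares line. It is not taken from the course notebooks: the function names (welch_t, pearson_r, fit_line) and the numbers are made up for illustration, and it sticks to the Python standard library so it runs without any extra packages. The actual assignments use the pydata stack.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (unequal variances)."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)  # sample variances
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


if __name__ == "__main__":
    # Made-up toy numbers, NOT from the course datasets.
    temperature = [5.0, 10.0, 15.0, 20.0, 25.0]
    rentals = [12.0, 22.0, 31.0, 42.0, 52.0]
    group_a = [21.0, 22.5, 19.8, 24.1, 23.3]
    group_b = [18.2, 17.9, 20.1, 16.5, 19.0]

    slope, intercept = fit_line(temperature, rentals)
    print("slope:", slope, "intercept:", intercept)
    print("correlation:", pearson_r(temperature, rentals))
    print("Welch t:", welch_t(group_a, group_b))
```

The slope is read as the expected change in the response per unit change in the predictor, which is the kind of interpretation item 3 refers to. Turning the t-statistic into a p-value or confidence interval requires the t distribution, which a statistics library would normally supply.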

Submission of programming assignments

Your assignments will use Jupyter notebooks that we provide when the assignment is handed out. Please submit your notebook and data files to Canvas.

Assignments will be previewed during the Wednesday X-hour and are due the following Tuesday at 11.59 pm. You have one 24-hour extension you can use.

Resources

PyData stack tools

How to read a paper

How to give a good presentation

Statistics

Group projects

Project material from the last offering of the class is here.

Week 1 -- Future of Mobile Sensing

Introduction to data science seminar

Tuesday: Speaker: Andrew Campbell (Computer Science): “Future of Mental Health Sensing”

Paper: Wang, Rui, et al. “Tracking depression dynamics in college students using mobile phone and wearable sensing.” Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2.1 (2018): 1-26

Wednesday-X: Shayan Mirjafari: Assignment preview to be recorded and have Zoom session [See Canvas for Zoom]

Thursday: Student Presentations

Obuchi, Mikio, et al. "Predicting Brain Functional Connectivity Using Mobile Sensing." Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 4.1 (2020): 1-22 (Presenter: Mikio Obuchi [presentation_slides])

Week 2 -- Social Networks and COVID-19

Tuesday: Student Presentations

Li, L., Qin, L., Xu, Z., Yin, Y., Wang, X., Kong, B., ... & Cao, K. (2020). Artificial Intelligence Distinguishes COVID-19 from Community Acquired Pneumonia on Chest CT. Radiology, 200905. (Presenter: Sunint Bindra [presentation_slides]).

Mirjafari, Shayan, et al. "Differentiating higher and lower job performers in the workplace using mobile sensing." Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 3.2 (2019): 1-24 (Presenter: Subigya Nepal. [presentation_slides]).

Wednesday-X: Shayan Mirjafari: Assignment preview to be recorded and have Zoom session [See Canvas for Zoom]

Thursday: Speaker: Thalia Wheatley (Psychological and Brain Sciences): “Similar neural responses predict friendship”

Paper: Parkinson, Carolyn, Adam M. Kleinbaum, and Thalia Wheatley. "Similar neural responses predict friendship." Nature communications 9.1 (2018): 1-14.

Week 3 -- Healthcare and Deep Learning

Tuesday: Student Presentations

Esteva, Andre, Alexandre Robicquet, Bharath Ramsundar, Volodymyr Kuleshov, Mark DePristo, Katherine Chou, Claire Cui, Greg Corrado, Sebastian Thrun, and Jeff Dean. "A guide to deep learning in healthcare." Nature medicine 25, no. 1 (2019): 24-29. Presenter: David Chen [presentation_slides].

Miotto, Riccardo, Fei Wang, Shuang Wang, Xiaoqian Jiang, and Joel T. Dudley. "Deep learning for healthcare: review, opportunities and challenges." Briefings in bioinformatics 19, no. 6 (2018): 1236-1246. Presenter: Derek Bai [presentation_slides].

Wednesday-X: Shayan Mirjafari: Assignment preview to be recorded and have Zoom session [See Canvas for Zoom]

Thursday: Speaker: Nicholas Jacobson (Geisel School of Medicine): “Cognitive-behavioral therapy in the digital age”

Paper: Wilhelm, S., Weingarden, H., Ladis, I., Braddick, V., Shin, J., & Jacobson, N. C. (2020). Cognitive-behavioral therapy in the digital age: presidential address. Behavior Therapy, 51(1), 1-14.

Nick thought this was a good short primer for his paper above: Jacobson, N. C., Weingarden, H., & Wilhelm, S. (2019). Digital biomarkers of mood disorders and symptom change. NPJ digital medicine, 2(1), 1-3.

Week 4 -- Computational Music and the Brain

Tuesday: Student Presentations

Martín Abadi, et al. “TensorFlow: a system for large-scale machine learning.” In Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation (OSDI'16). USENIX Association, Berkeley, CA, USA, 265-283. (Presenter: Ahsan Azim [presentation_slides])

Dean, Jeffrey, and Sanjay Ghemawat. "MapReduce: simplified data processing on large clusters." Communications of the ACM 51.1 (2008): 107-113 (Presenter: Shuai Jiang [presentation_slides])

Wednesday-X: Shayan Mirjafari: Assignment preview to be recorded and have Zoom session [See Canvas for Zoom]

Thursday: Speaker: Michael Casey (Music/Computer Science): “Music and the Brain”

Paper: Casey, Michael A. “Music of the 7Ts: Predicting and Decoding Multivoxel fMRI Responses with Acoustic, Schematic, and Categorical Music Features.” Frontiers in psychology 8 (2017): 1179

Week 5 -- Computational Healthcare -- and Social Justice

Tuesday: Speaker: James O'Malley (The Dartmouth Institute): “Computational Healthcare”

Paper: O'Malley, A. James, Erika L. Moen, Julie PW Bynum, Andrea M. Austin, and Jonathan S. Skinner. "Modeling peer effect modification by network strength: The diffusion of implantable cardioverter defibrillators in the US hospital network." Statistics in Medicine (2020).

Wednesday-X: Shayan Mirjafari: Assignment preview to be recorded and have Zoom session [See Canvas for Zoom]

Thursday: Student Presentations

Dressel, Julia, and Hany Farid. "The accuracy, fairness, and limits of predicting recidivism." Science advances 4, no. 1 (2018) (Presenter: Madeleine Genereux [presentation_slides])

Martín Abadi, et al. “TensorFlow: a system for large-scale machine learning.” In Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation (OSDI'16). USENIX Association, Berkeley, CA, USA, 265-283. (Presenter: Ahsan Azim [presentation_slides])

Week 6 -- Election Forecasting and Fake News

Tuesday: Shayan Mirjafari: Assignment preview to be recorded and have Zoom session [See Canvas for Zoom]

Wednesday-X: Speaker: Sean Jeremy Westwood (Government): “The political horserace”

Paper: Westwood, Sean, Solomon Messing, and Yphtach Lelkes. "Projecting confidence: How the probabilistic horse race confuses and demobilizes the public."

Thursday: Student Presentations

Shu, Kai, Amy Sliva, Suhang Wang, Jiliang Tang, and Huan Liu. "Fake news detection on social media: A data mining perspective." ACM SIGKDD Explorations Newsletter 19, no. 1 (2017). (Presenter: Christopher Miller [presentation_slides]).

Guess, A. M., Lockett, D., Lyons, B., Montgomery, J. M., Nyhan, B., & Reifler, J. (2020). “Fake news” may have limited effects beyond increasing beliefs in false claims. Harvard Kennedy School Misinformation Review. (Presenter: Sydney Lister [presentation_slides]).

Week 7 -- The Twittersphere -- and Fintech

Tuesday: No class

Wednesday-X: Speaker: Soroush Vosoughi (Computer Science): “Tribalism in the Twittersphere”

Thursday: Project Pitches and Student Presentations

Dixon, Matthew, Diego Klabjan, and Jin Hoon Bang. "Classification-based financial markets prediction using deep neural networks." Algorithmic Finance 6, no. 3-4 (2017): 67-77. (Presenter: Joshua Ackerman [presentation_slides]).

Weeks 8-10 -- Group Projects

All project material is here.