Statistical Models of Networks, Graphs, and Other Relational Structures
Prof Daniel M. Roy
firstname.lastname@example.org (please include “STA4513” in your email’s subject line or body)
Office hours: SS 6026, Tuesdays 9--11am, or by appointment.
Because the class was approved rather late, students who miss the first class but still want to take the course are encouraged to come to the second class. Please contact me so that you can receive notes for the first class.
For short courses, like this one, the drop date is the date of the second class.
Date/Time: Fall 2014, Wednesdays, 2--5pm from Sept. 10 through Oct. 15
Room: WE 69 (Wetmore Hall, New College, across Huron St from Sidney Smith)
This course is a survey of topics on the statistical analysis of network, graph, and relational data, with an emphasis on network modeling and random graphs. In particular, the role of probabilistic symmetries---including exchangeability and stationarity---will be our primary concern, at least in the second half.
Our understanding of graph- and network-valued data has undergone a dramatic shift in the past decade. We now understand there to be fundamentally different regimes that relate to the prevalence of edges. The best understood is the dense regime, where, informally speaking, we expect to see edges among vertices chosen uniformly at random from a large graph. The mathematical foundations of this area can be traced back to work by Aldous and Hoover in the early 1980s, but work in graph theory over the past decade has enriched our understanding considerably. Most existing statistical methods, especially Bayesian ones, work implicitly in the dense regime. Real-world networks, however, are not dense. A growing community is now focused on the structure of large sparse graphs. The sparse regime, however, is not well understood: key mathematical notions continue to be identified. We will work through key papers in probability, statistics, and graph theory in order to gain the broader perspective necessary to identify opportunities to contribute to our understanding of statistical methods on graphs and networks.
The course will be structured around weekly readings and student presentations of papers. Each week there will be a lecture on a new topic and student presentations on the previous week's topic.
See Reading (below) for papers associated with each week’s topic.
The grade in the course will be based on:
Each student will be expected to make at least one class presentation. The second presentation can be replaced by a research project. Those auditing will be expected to make at least one class presentation, and may be asked to scribe if there are too few registered students.
*Scribing requires that the student take detailed notes during lecture, and produce a LaTeX version of these notes. The notes should be free from typos, and should allow a reader to follow the logic of the presentation. These will be distributed to the students. A LaTeX template for scribing will be made available on the course website and should be used.
Students should remain attentive, ask questions, and answer questions during class. Participation also involves reading over those papers being presented that week, and coming with questions. Scribing is a clear way to participate.
Students are expected to make paper presentations, which can be prepared collaboratively in groups of up to 2 students. (Groups of 3 should receive my prior approval.) A list of scheduled presentations will be available on the course website. Papers other than those that have been marked with yellow asterisks (*) require prior approval. Papers should be chosen from the appropriate topic for the week, but exceptions will be considered if proposed more than a week in advance. Students are also encouraged to find papers in application areas of interest to them.
Presentations can be either slide presentations or chalk talks. They should be planned to take 45 minutes, with 15 minutes for questions. (Rehearse your presentations to check for timing. A rough guide is no more than one slide per minute.) In the case of a slide presentation, PDF slides will be placed on the website after the class. In the case of a chalk talk, the presenter is responsible for producing notes for the talk. These can be the hand-written notes if they are clearly written and suitable for publication on the website; otherwise they should be typeset in LaTeX. Notes should be submitted by Friday of the same week as the presentation.
Research projects will be simulations and/or theory on a model that I will choose. Unless I give approval otherwise, research projects should be completed individually. Students should decide whether they will pursue a research project before the third class.
Research projects are due by email on the last day of class (11:59pm Toronto time, Oct 15). Every day of delay thereafter results in a 10% deduction. Presentations cancelled later than Monday noon will be counted as missed. Missed presentations can be made up for 75% credit if there are slots available. Extenuating circumstances will be handled on a case-by-case basis.
Mathematical maturity and some background in linear algebra, analysis, measure theory, and probability theory are recommended.
Blackboard will be used to manage the course list and grades, but the course information and links will be available at http://danroy.org/teaching/2014/STA4513/
Students with diverse learning styles and needs are welcome in this course. Please feel free to approach me or Accessibility Services so we can assist you in achieving academic success in this course. If you have a disability and have not registered with Accessibility Services, please visit the Accessibility Services website at http://www.accessibility.utoronto.ca for information on how to register.
There is no required course text. Students should read the papers being presented each week. Papers marked with * are approved for presenting. Students may suggest other articles (listed here, or otherwise), but my prior approval is required.
These readings, cited by Snijders, address statistical problems dealing with network data.
There are several courses available on the statistical analysis of networks. They tend not to emphasize the connections with random graphs or exchangeability, hence my emphasis on models. Nevertheless, for doing work in this area, it is useful to know the perspective of these courses.