Analysis of Social Media 10-802 and 11-772

Announcements:


Time & Place: Tuesday 4:30-6:30 Wean Hall 4623

Instructors: William W. Cohen (Machine Learning Dept); Natalie Glance (Google Pittsburgh)

Description: The most actively growing part of the web is "social media", e.g., wikis, blogs, bboards, and collaboratively developed community sites like Flickr and YouTube.  This seminar course will review selected papers from the recent research literature that address the problem of analyzing and understanding social media.  This will be a 6-credit course, with the primary workload being attending class and presenting material.  (Students who would like to upgrade the course to a full 12-credit course and submit a course project should contact Dr. Cohen.)


Topics that will be covered include:

The seminar will also include guest lectures from researchers in industry who have actively worked on social media analysis.

Students should have taken a machine learning course (e.g., 15-781 or 15-681) or have the consent of the instructor. The content of the course will be complementary to another new course, “The Social Web: Content, Communities, and Context” (05-320/05-820), which is also being offered in fall 2007.


This course is cross-listed under LTI as 11-772.

Tentative schedule

Reading list

Text analysis for social media

Sentiment/opinion:
  1. Thumbs up or thumbs down? semantic orientation..., Turney, ACL 2002.
  2. Thumbs up? Sentiment Classification using Machine Learning ..., Pang et al, EMNLP 2002.
  3. Extracting Product Features and Opinions from Reviews, Ana-Maria Popescu, Oren Etzioni, Proceedings of HLT-EMNLP, 2005. The OPINE System.
  4. Joint Extraction of Entities and Relations for Opinion Recognition, Choi et al, EMNLP 2006.  
  5. Overview of the TREC-2006 Blog Track, Ounis et al, TREC 2006.
  6. Annotating Expressions of Opinions and Emotions in Language, Wiebe, Wilson, Cardie, Language Resources and Evaluation 2005.
  7. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification, Blitzer et al, ACL 2007
  8. Large-Scale Sentiment Analysis for News and Blogs, Godbole et al, ICWSM 2007

Demographics/authorship:
  1. GIS and the Blogosphere, Hurst, WWW 2005 Workshop on the Weblogging Ecosystem. (Predicting GIS info for authors)
  2. Style mining of electronic messages for multiple authorship attribution: First Results, Argamon et al, KDD 2003. (On newsgroups).
  3. Effects of age and gender on blogging, Schler et al, AAAI 2006 Spring Symposium on Computational Approaches to ....
  4. Whose Thumb Is It Anyway? Classifying Author Personality from Weblog Text, Oberlander & Nowson, COLING/ACL 2006. (Predicting personality info for authors)

Link-based community analysis

Topic modeling/LDA-like stuff:
  1. The Missing Link: A Probabilistic Model of Document Content and Hypertext Connectivity, Cohn & Hofmann, NIPS 2001
  2. Topic and Role Discovery in Social Networks, McCallum et al, IJCAI 2005

Propagating trust and influence:
  1. The Web as a graph: Measurements, models and methods, Kleinberg et al, Invited survey at the International Conference on Combinatorics and Computing, 1999. Background on HITS and bipartite cliques.
  2. The PageRank citation ranking: Bringing order to the Web, Page et al, 1999. Background on PageRank.
  3. Modeling Trust and Influence in the Blogosphere Using Link Polarity, Kale et al, ICWSM 2007.
  4. Mining Knowledge-Sharing Sites for Viral Marketing, Richardson & Domingos, KDD 2002.
  5. Maximizing the spread of influence through a social network, Kempe et al, KDD 2003.
  6. Patterns of Influence in a Recommendation Network, Leskovec et al, PAKDD 2006.
  7. Information diffusion through blogspace, Gruhl et al, WWW 2004
  8. The Small-World Phenomenon: An Algorithmic Perspective, Kleinberg, STOC 2000
  9. Implicit Structure and the Dynamics of Blogspace, Adar et al, WWW 2004 Workshop on the Weblogging Ecosystem.
  10. Unsupervised prediction of citation influences, Dietz et al, ICML 2007.
  11. The political blogosphere and the 2004 US election: divided they blog, Adamic & Glance, 3rd International Workshop on Link …, 2005
  12. Cascading Behavior in Large Blog Graphs, Leskovec et al, SDM 2007.
  13. Cost-effective Outbreak Detection in Networks, Leskovec et al, KDD 2007
  14. Mining blog stories using community-based and temporal clustering, Qamra et al, CIKM 2006

Other analysis tasks for social media


Spam detection:

  1. Blocking Blog Spam with Language Model Disagreement, Mishne et al, WWW 2005 Workshop on Adversarial IR on the Web


Tags and Folksonomies:

  1. TagAssist: Automatic Tag Suggestion for Blog Posts, Sood et al, ICWSM 2007
  2. Improved Annotation of the Blogosphere via Autotagging and Hierarchical Clustering, Brooks & Montanez, WWW 2006
  3. The Complex Dynamics of Collaborative Tagging, Halpin et al, WWW 2007

Trends in social media:

  1. Bursty and hierarchical structure in streams, Kleinberg, KDD 2002. Background for a couple of the papers below.
  2. On the Bursty Evolution of Blogspace, Kumar et al, WWW 2003.
  3. The predictive power of online chatter, Gruhl et al, KDD 2005.
  4. Visualizing tags over time, Dubinko et al, WWW 2006. 
  5. Graphs over time: densification..., Leskovec, Kleinberg, Faloutsos, KDD 2005

Datasets and tools

  1. GUESS: A Language and Interface for Graph Exploration, Adar, CHI 2006. Visualization tool.
  2. Overview of the TREC-2006 Blog Track, Ounis et al, TREC 2006. The TREC blog data.
  3. ICWSM dataset.