Analysis of Social Media 10-802 and 11-772
Announcements:
Time & Place: Tuesday 4:30-6:30, Wean Hall 4623
Instructors: William W. Cohen (Machine Learning Dept.); Natalie Glance (Google Pittsburgh)
Description: The most actively growing part of the web is "social
media," such as wikis, blogs, bboards, and collaboratively developed community
sites like Flickr and YouTube. This seminar course will review selected
papers from the recent research literature that address the problem of
analyzing and understanding social media. This will be a 6-credit course, with
the primary workload being attending class and presenting material.
(Students who would like to upgrade the course to a full 12-credit course and
submit a course project should contact Dr. Cohen.)
Topics that will be covered include:
- Text analysis techniques for sentiment analysis, analysis of figurative language, authorship attribution, and inference of demographic information about authors (e.g., age or sex).
- Community analysis techniques for detecting communities, predicting authority, assessing influence (e.g., in viral marketing), or detecting spam.
- Visualization techniques for understanding the interactions within and between communities.
- Learning techniques for modeling and predicting trends in social media, or predicting other properties of media (e.g., user-provided content tags).
The seminar will also include guest lectures from industry researchers who
have actively worked on social media analysis.
Students should have taken a machine learning course (e.g., 15-781 or 15-681)
or have consent of the instructor. The content of the course will be
complementary to another new course, “The Social Web: Content, Communities,
and Context” (05-320/05-820), which is also being offered in fall 2007.
This course is cross-listed under LTI as 11-772.
Tentative schedule
Reading list
Text analysis for social media
Sentiment/opinion:
- Thumbs up or thumbs down? Semantic orientation..., Turney, ACL 2002.
- Thumbs up? Sentiment Classification using Machine Learning ..., Pang et al., EMNLP 2002.
- Extracting Product Features and Opinions from Reviews, Ana-Maria Popescu, Oren Etzioni, Proceedings of HLT-EMNLP, 2005. The OPINE system.
- Joint Extraction of Entities and Relations for Opinion Recognition, Choi et al., EMNLP 2006.
- Overview of the TREC-2006 Blog Track, Ounis et al., TREC 2006.
- Annotating Expressions of Opinions and Emotions in Language, Wiebe, Wilson, Cardie, Language Resources and Evaluation 2005.
- Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification, Blitzer et al., ACL 2007.
- Large-Scale Sentiment Analysis for News and Blogs, Godbole et al., ICWSM 2007.
Demographics/authorship:
- GIS and the Blogosphere, Hurst, WWW 2005 Workshop on the Weblogging Ecosystem. (Predicting GIS info for authors)
- Style mining of electronic messages for multiple authorship attribution: First Results, Argamon et al., KDD 2003. (On newsgroups)
- Effects of age and gender on blogging, Schler et al., AAAI 2006 Spring Symposium on Computational Approaches to ....
- Whose Thumb Is It Anyway? Classifying Author Personality from Weblog Text, Oberlander & Nowson, COLING/ACL 2006. (Predicting personality info for authors)
Link-based community analysis
Topic modeling/LDA-like stuff:
- The Missing Link: A Probabilistic Model of Document Content and Hypertext Connectivity, Cohn & Hofmann, NIPS 2001.
- Topic and Role Discovery in Social Networks, McCallum et al., IJCAI 2005.
Propagating trust and influence:
- The Web as a graph: Measurements, models and methods, Kleinberg et al., invited survey at the International Conference on Combinatorics and Computing, 1999. Background on HITS and bipartite cliques.
- The PageRank citation ranking: Bringing order to the Web, Page et al., 1999. Background on PageRank.
- Modeling Trust and Influence in the Blogosphere Using Link Polarity, Kale et al., ICWSM 2007.
- Mining Knowledge-Sharing Sites for Viral Marketing, Richardson & Domingos, KDD 2002.
- Maximizing the spread of influence through a social network, Kempe et al., KDD 2003.
- Patterns of Influence in a Recommendation Network, Leskovec et al., PAKDD 2006.
- Information diffusion through blogspace, Gruhl et al., WWW 2004.
- The Small-World Phenomenon: An Algorithmic Perspective, Kleinberg, STOC 2000.
- Implicit structure in the dynamics of the blogosphere, Adar, WWW 2005 Workshop on the Weblogging Ecosystem.
- Unsupervised prediction of citation influences, Dietz et al., ICML 2007.
- The political blogosphere and the 2004 US election: divided they blog, Adamic & Glance, 3rd International Workshop on Link …, 2005.
- Cascading Behavior in Large Blog Graphs, Leskovec et al., SDM 2007.
- Cost-effective Outbreak Detection in Networks, Leskovec et al., KDD 2007.
- Mining blog stories using community-based and temporal clustering, Qamra et al., CIKM 2006.
Other analysis tasks for social media
Spam detection:
- Blocking Blog Spam with Language Model Disagreement, Mishne et al., WWW 2005 Workshop on Adversarial IR on the Web.
Tags and Folksonomies:
- TagAssist: Automatic Tag Suggestion for Blog Posts, Sood et al., ICWSM 2007.
- Improved Annotation of the Blogosphere via Autotagging and Hierarchical Clustering, Brooks & Montanez, WWW 2006.
- The Complex Dynamics of Collaborative Tagging, Halpin et al., WWW 2007.
Trends in social media:
- Bursty and hierarchical structure in streams, Kleinberg, KDD 2002. Background for a couple of the papers below.
- On the Bursty Evolution of Blogspace, Kumar et al., WWW 2003.
- The predictive power of online chatter, Gruhl et al., KDD 2005.
- Visualizing tags over time, Dubinko et al., WWW 2006.
- Graphs over time: densification..., Leskovec, Kleinberg, Faloutsos, KDD 2005.
Datasets and tools
- GUESS: A Language and Interface for Graph Exploration, Adar, CHI 2006. Visualization tool.
- Overview of the TREC-2006 Blog Track, Ounis et al., TREC 2006. The TREC blog data.
- ICWSM dataset.