Social Meaning and Social Language Processing

LIN 393S

Instructor: David Beaver

Spring 2011, Wed 2-5, Parlin 10

Summary

We will consider how social traits are marked (or betrayed) by the language we use, and how corpus and computational methods can be employed to study such marking. Topics will include markers of a wide range of social and psychological factors, such as Politeness, Sentiment, Deception, Status, Gender, Depression, and Group Cohesion. As well as discussing and presenting theoretical literature on these topics, participants will be encouraged to engage in hands-on statistical and computational analysis of corpora or web sources.

The seminar will be based primarily on readings drawn from the primary literature, though there will also be a more practical side, with participants using statistical methods (e.g. in the R environment) or machine learning techniques to run investigations on corpora and other data sets.
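As an example of the kind of hands-on machine-learning experiment meant here: one of the classifiers Pang, Lee and Vaithyanathan (2002) compared for sentiment classification was Naive Bayes over unigram features, including binary word-presence features. The following is a minimal sketch in that spirit, not a reproduction of their setup. The function names, the whitespace tokenizer, and the four training "reviews" are invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split. Real studies need proper
    # tokenization (punctuation, negation handling, etc.).
    return text.lower().split()


def train_nb(docs, labels, alpha=1.0):
    """Train a multinomial Naive Bayes model on binarized unigram features.

    Each document contributes each distinct word at most once ("presence"
    rather than frequency). alpha is the add-alpha (Laplace) smoothing constant.
    """
    classes = sorted(set(labels))
    doc_counts = Counter(labels)
    word_counts = {c: Counter() for c in classes}
    vocab = set()
    for doc, label in zip(docs, labels):
        words = set(tokenize(doc))  # presence, not frequency
        vocab |= words
        word_counts[label].update(words)
    log_prior = {c: math.log(doc_counts[c] / len(docs)) for c in classes}
    log_lik = {}
    for c in classes:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return classes, log_prior, log_lik


def classify(model, text):
    """Return the class with the highest log posterior score."""
    classes, log_prior, log_lik = model
    words = set(tokenize(text))
    scores = {
        # Words never seen in training are simply ignored.
        c: log_prior[c] + sum(log_lik[c][w] for w in words if w in log_lik[c])
        for c in classes
    }
    return max(scores, key=scores.get)


# Invented toy data, for illustration only (not from any real corpus).
train_docs = [
    "a great and moving film",
    "great acting and a wonderful plot",
    "a dull and boring film",
    "boring plot and terrible acting",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = train_nb(train_docs, train_labels)
print(classify(model, "a wonderful film"))  # pos
print(classify(model, "a boring film"))     # neg
```

Working in log space avoids numerical underflow when many word probabilities are multiplied; with a real corpus you would also hold out test data (e.g. by cross-validation) rather than judging the classifier on a couple of hand-picked sentences.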

We will be reading through papers and book chapters each week. All participants will present papers to the group regularly. The main topics to be covered will be:

  1. Sentiment
  2. Deception
  3. Politics and Leadership
  4. Personality and health
  5. Social Interaction
  6. Politeness


Week 1 (Jan 19): general introduction, data resources


Week 2 (Jan 25)

Below are four readings for the next class (or the next two classes), plus discussion leaders.

Let me reiterate that the discussion leader is not necessarily expected to present the paper, although for technical papers a handout or slides may be necessary. All participants are expected to try out their own experiments as they read the papers and to bring the results to class. This should be particularly easy for the Science paper: I expect you all to have tried simple experiments with Google’s Ngram Viewer.
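The Ngram Viewer plots an n-gram's yearly count divided by the total number of words in the corpus for that year, so raw counts alone can mislead when the corpus grows over time. Below is a minimal sketch of that normalization, plus an optional centered moving average; we assume this is roughly what the Viewer's smoothing setting does, which you should check before relying on it. The function names and the numbers are invented for illustration, not taken from the Google Books data.

```python
def relative_frequency(counts, totals):
    """Return {year: count / total_tokens} for every year in totals.

    counts: {year: occurrences of the n-gram}; years absent from counts get 0.
    totals: {year: total number of tokens in the corpus for that year}.
    """
    return {year: counts.get(year, 0) / totals[year] for year in sorted(totals)}


def smooth(series, window=1):
    """Centered moving average over `window` years on each side.

    Near the edges the window is simply truncated (an assumption; check how
    any tool you compare against handles this).
    """
    years = sorted(series)
    smoothed = {}
    for i, year in enumerate(years):
        lo, hi = max(0, i - window), min(len(years), i + window + 1)
        values = [series[years[j]] for j in range(lo, hi)]
        smoothed[year] = sum(values) / len(values)
    return smoothed


# Invented numbers, for illustration only.
totals = {1900: 1_000_000, 1901: 2_000_000, 1902: 4_000_000}
counts = {1900: 10, 1901: 10, 1902: 40}

freq = relative_frequency(counts, totals)
# 1901 has the same raw count as 1900 but half the relative frequency,
# because the 1901 corpus is twice as large.
print(freq)
print(smooth(freq, window=1))
```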

For many of the papers we’ll discuss in future classes, not all of the participants (instructor included) come pre-equipped with sufficient knowledge to perform their own studies. So it’s important that you all provide feedback on which special topics we should cover in class, e.g. how-to tutorials on specific tools like R and LIWC, or on particular data sets.
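For those new to LIWC: its core operation is, roughly, computing what percentage of a text's words fall into each of a set of dictionary categories, where dictionary entries can be whole words or word stems marked with an asterisk. A minimal sketch of that idea follows. The category names and word lists are invented stand-ins (the real LIWC dictionaries are distributed with the software), and the regular-expression tokenizer is a simplifying assumption.

```python
import re

# Hypothetical mini-dictionary for illustration; NOT the actual LIWC lexicon.
# A trailing '*' marks a prefix entry (e.g. "happ*" matches "happy", "happiness").
CATEGORIES = {
    "i": ["i", "me", "my", "mine", "myself"],
    "we": ["we", "us", "our", "ours", "ourselves"],
    "posemo": ["happ*", "love*", "good", "nice"],
    "negemo": ["sad*", "hate*", "hurt*", "bad"],
}


def matches(word, entry):
    """True if word matches a dictionary entry (exact, or prefix if entry ends in '*')."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry


def category_percentages(text, categories=CATEGORIES):
    """Percentage of the text's words that fall into each category."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {cat: 0.0 for cat in categories}
    return {
        cat: 100.0 * sum(1 for w in words if any(matches(w, e) for e in entries)) / len(words)
        for cat, entries in categories.items()
    }


# Nine words: "i", "my", "i" fall in the hypothetical "i" category (3/9);
# "love" is posemo and "hate" is negemo (1/9 each).
print(category_percentages("I love my dog but I hate the rain"))
```

Because the output is a percentage of total words, texts of different lengths can be compared directly, which is how studies such as the deception and gender papers on this list use such counts.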

Reading

Discussion Leader

Jean-Baptiste Michel, Yuan Kui Shen, Aviva P. Aiden, Adrian Veres, Matthew K. Gray, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak, and Erez Lieberman Aiden. 2011. Quantitative Analysis of Culture Using Millions of Digitized Books. Science 331(6014): 176–182.

David

Pang, Bo, Lee, Lillian, and Vaithyanathan, Shivakumar. 2002. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 79-86. ACL.

David


Week 3 (Feb 2)

Reading

Discussion Leader

Pang, Bo and Lillian Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval 2(1–2): 1–135. Chapter 4, pp. 23–60.

Dan G

Davis, Christopher and Christopher Potts. 2010. Affective demonstratives and the division of pragmatic labor. In Maria Aloni, Harald Bastiaanse, Tikitu de Jager, and Katrin Schulz, eds., Logic, Language, and Meaning: 17th Amsterdam Colloquium Revised Selected Papers, 42-52. Berlin: Springer.

Dan V


Week 4 (Feb 9)

Reading

Discussion Leader

Hatzivassiloglou, Vasileios and Kathleen R. McKeown. 1997. Predicting the semantic orientation of adjectives. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics, 174-181.

Zach

de Marneffe, Marie-Catherine, Christopher D. Manning and Christopher Potts. 2010. Was it good? It was provocative. Learning the meaning of scalar adjectives. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden.

Joey


Week 5 (Thursday Feb 17, 5-6:15)

Reading

Discussion Leader

Pang, Bo and Lillian Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval 2(1–2): 1–135. Chapter 5.

Justin

Kennedy, Alistair and Diana Inkpen. 2006. Sentiment classification of movie reviews using contextual valence shifters. Computational Intelligence 22:110-125.

Juwon


Week 6 (Feb 23)

Reading

Discussion Leader

Blair-Goldensohn, Sasha, Kerry Hannan, Ryan McDonald, Tyler Neylon, George A. Reis, and Jeff Reynar. 2008. Building a sentiment summarizer for local service reviews. In WWW Workshop on NLP in the Information Explosion Era (NLPIX).

Patrick

Danescu-Niculescu-Mizil, Cristian, Gueorgi Kossinets, Jon Kleinberg, Lillian Lee. 2009. How opinions are received by online communities: A case study on Amazon.com helpfulness votes. Proceedings of WWW, 141-150.

Patrick

TBA: Deception or politics papers


Week 7 (Mar 2)

Reading

Discussion Leader

M. L. Newman, J. W. Pennebaker, D. S. Berry, and J. M. Richards. 2003. Lying words: Predicting deception from linguistic style. Personality and Social Psychology Bulletin, 29:665–675.

Dan

Catalina L. Toma & Jeffrey T. Hancock. 2010. Reading between the Lines: Linguistic Cues to Deception in Online Dating Profiles. CSCW 2010.

Chris

Hancock, J.T., L.E. Curry, S. Goorha, & M. Woodworth (2008). On lying and being lied to: A linguistic analysis of deception in computer-mediated communication. Discourse Processes 45(1): 1-23. Available via: http://sml.comm.cornell.edu/publications.php

Justin

Special talk

Dan Ramage / Stanford University (http://www.stanford.edu/~dramage)

Date/Time: Wednesday, March 2, 3:15 p.m.

Location: UTA 5.522 (1616 Guadalupe St.)

Talk Title: Analyzing People, Places, and the Web with Statistical Text Models

Talk Abstract:

From Twitter to academic publications, information technologies have enabled companies, organizations, and governments to collect huge datasets about the world, often with major textual components. These datasets promise to improve our understanding of large-scale social phenomena in new ways. Doing so requires tools that can discover and quantify interpretable, trustworthy patterns in the data. In particular, these tools should discover textual trends that align with labels, tags, or other known categories of interest, when they are available, and lend themselves to visual exploration and interpretation. I will present a series of probabilistic models of metadata-enriched text with applications to tagged web pages and Twitter, enabling new kinds of characterizations of how people organize information and communicate online. I'll also present results from an ongoing study of innovation in academia, as seen through the lens of one million PhD dissertation abstracts from the past 30 years.

Speaker Bio:

Daniel Ramage is a PhD candidate in Computer Science at Stanford University. He works in the Natural Language Processing Group and is advised by Chris Manning. His research focuses on building models and tools that can help shed light on real-world phenomena through the lens of the text people write. He has worked at large industrial research labs, including Microsoft Research in Redmond and IBM Research in Zurich, and collaborated extensively with social scientists from Stanford's School of Education. In recent summers, he has volunteered in Jerusalem teaching software engineering and project management to Palestinian and Israeli teenagers through Middle East Education through Technology. In his spare time, he satisfies his inner coder by working on open source software, including the Stanford Topic Modeling Toolbox.


Week 8 (Mar 9)

Reading

Discussion Leader

M. L. Newman, J. W. Pennebaker, D. S. Berry, and J. M. Richards. 2003. Lying words: Predicting deception from linguistic style. Personality and Social Psychology Bulletin, 29:665–675.

Dan V

Hancock, J.T., L.E. Curry, S. Goorha, & M. Woodworth (2008). On lying and being lied to: A linguistic analysis of deception in computer-mediated communication. Discourse Processes 45(1): 1-23. Available via: http://sml.comm.cornell.edu/publications.php

Justin

Rajesh Ranganath, Dan Jurafsky, and Dan McFarland. 2009. It's Not You, it's Me: Detecting Flirting and its Misperception in Speed-Dates. Proceedings of EMNLP 2009.

Dan G

Dan Jurafsky, Rajesh Ranganath, and Dan McFarland. 2009. Extracting Social Meaning: Identifying Interactional Style in Spoken Conversation. Proceedings of NAACL HLT 2009.

Dan G


Week 9 (Mar 24)

Reading

Discussion Leader

Rajesh Ranganath, Dan Jurafsky, and Dan McFarland. 2009. It's Not You, it's Me: Detecting Flirting and its Misperception in Speed-Dates. Proceedings of EMNLP 2009.

Dan G

Dan Jurafsky, Rajesh Ranganath, and Dan McFarland. 2009. Extracting Social Meaning: Identifying Interactional Style in Spoken Conversation. Proceedings of NAACL HLT 2009.

Dan G

Burt L. Monroe, Michael P. Colaresi, and Kevin M. Quinn. 2008. Fightin’ Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict. Political Analysis (2008) 16:372–403.

Zach

Slatcher, R.B., Chung, C.K., Pennebaker, J.W., & Stone, L.D. (2007). Winning words: Individual differences in linguistic style among U.S. presidential and vice presidential candidates. Journal of Research in Personality, 41, 63-75.

Zach


Week 10 (Mar 31)

Reading

Discussion Leader

Burt L. Monroe, Michael P. Colaresi, and Kevin M. Quinn. 2008. Fightin’ Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict. Political Analysis (2008) 16:372–403.

Zach

Brown, P. & Levinson, S. (1987). Politeness: Some Universals in Language Usage. Cambridge, UK: Cambridge University Press. pp. 55-129.

David

Miller, C., P. Wu, H. Funk, L. Johnson, and H. Vilhjálmsson. 2007. A computational approach to etiquette and politeness: An “Etiquette Engine” for cultural interaction training. Proceedings of the 16th Conference on Behavior Representation in Modeling and Simulation (BRIMS), 26–29.

Juwon

Ardissono, L., G. Boella, and L. Lesmo. 1995. Indirect speech acts and politeness: A computational approach. Proceedings of the Seventeenth Annual Conference of the Cognitive Science Society. Lawrence Erlbaum Associates.

Juwon


Week 11 (Apr 6)

Reading

Discussion Leader

Ardissono, L., G. Boella, and L. Lesmo. 1995. Indirect speech acts and politeness: A computational approach. Proceedings of the Seventeenth Annual Conference of the Cognitive Science Society. Lawrence Erlbaum Associates.

Juwon

Brown, P. & Levinson, S. (1987). Politeness: Some Universals in Language Usage. Cambridge, UK: Cambridge University Press. pp. 55-129.

David

Koppel, M., S. Argamon, and A. R. Shimoni. 2002. Automatically categorizing written texts by author gender. Literary and Linguistic Computing 17(4): 401–412.

Patrick

Newman, M.L., Groom, C.J., Handelman, L.D., and Pennebaker, J.W. 2008. Gender differences in language use: an analysis of 14,000 text samples. Discourse Processes, 45, 211–236.

Dan

Velleman, Dan. 2010. Linguistic Inquiry and N-gram Count: a closer look at the psychology of syntactic patterns. Unpublished Qualifying Paper, The University of Texas at Austin.

Dan


Week 12 (Apr 13)

Reading

Discussion Leader

Ardissono, L., G. Boella, and L. Lesmo. 1995. Indirect speech acts and politeness: A computational approach. Proceedings of the Seventeenth Annual Conference of the Cognitive Science Society. Lawrence Erlbaum Associates.

Juwon

van Rooij, Robert. 2003. Being polite is a handicap: Towards a game theoretical analysis of polite linguistic behavior. In M. Tennenholtz (ed.), Proceedings of TARK 9.

Joey

Eckert, Penny. 2008. Variation and the indexical field. Journal of Sociolinguistics 12(4): 453–476.

Patrick


Week 13 (Apr 20)

Reading

Discussion Leader

Eckert, Penny. 2008. Variation and the indexical field. Journal of Sociolinguistics 12(4): 453–476.

Patrick

Lee CH, Kim K, Seo YS, Chung CK. (2007). The relations between personality and language use. Journal of General Psychology 134(4): 405-413.

Justin

Ireland, M.E., Slatcher, R.B., Eastwick, P.W., Scissors, L.E., Finkel, E.J., & Pennebaker, J.W. (in press). Language style matching predicts relationship initiation and stability. Psychological Science.

Justin

Niederhoffer, K.G. & Pennebaker, J.W. (2002). Linguistic style matching in social interaction. Journal of Language and Social Psychology, 21, 337-360.

Dan G.

(if time)

Abdullah, N. A. 2006. Constructing Business Email Messages: A Model of Writers’ Choice. ESP Malaysia 12: 53-63.

Chris

(if time)


Week 14 (Apr 27)

Reading

Discussion Leader

Lee CH, Kim K, Seo YS, Chung CK. (2007). The relations between personality and language use. Journal of General Psychology 134(4): 405-413.

Justin

Pennebaker, J. W., & King, L. A. (1999). Linguistic styles: Language use as an individual difference. Journal of Personality and Social Psychology, 77, 1296–1312.

Justin

Niederhoffer, K.G. & Pennebaker, J.W. (2002). Linguistic style matching in social interaction. Journal of Language and Social Psychology, 21, 337-360.

Dan G.

(if time)

Abdullah, N. A. 2006. Constructing Business Email Messages: A Model of Writers’ Choice. ESP Malaysia 12: 53-63.

Chris

General discussion of Politeness, and Brown and Levinson in particular

David

References

Sentiment

Blair-Goldensohn, Sasha, Kerry Hannan, Ryan McDonald, Tyler Neylon, George A. Reis, and Jeff Reynar. 2008. Building a sentiment summarizer for local service reviews. In WWW Workshop on NLP in the Information Explosion Era (NLPIX).

Danescu-Niculescu-Mizil, Cristian, Gueorgi Kossinets, Jon Kleinberg, Lillian Lee. 2009. How opinions are received by online communities: A case study on Amazon.com helpfulness votes. Proceedings of WWW, 141-150.

Davis, Christopher and Christopher Potts. 2010. Affective demonstratives and the division of pragmatic labor. In Maria Aloni, Harald Bastiaanse, Tikitu de Jager, and Katrin Schulz, eds., Logic, Language, and Meaning: 17th Amsterdam Colloquium Revised Selected Papers, 42-52. Berlin: Springer.

Ghose, Anindya and Panagiotis G. Ipeirotis. 2007. Designing novel review ranking systems: Predicting the usefulness and impact of reviews. Proceedings of ICEC 2007.

Hatzivassiloglou, Vasileios and Kathleen R. McKeown. 1997. Predicting the semantic orientation of adjectives. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics, 174-181.

Kennedy, Alistair and Diana Inkpen. 2006. Sentiment classification of movie reviews using contextual valence shifters. Computational Intelligence 22:110-125.

de Marneffe, Marie-Catherine, Christopher D. Manning and Christopher Potts. 2010. Was it good? It was provocative. Learning the meaning of scalar adjectives. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden.

Pang, Bo, Lee, Lillian, and Vaithyanathan, Shivakumar. 2002. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 79-86. ACL.

Pang, Bo and Lillian Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval 2(1–2): 1–135. Chapter 4.

Pang, Bo and Lillian Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval 2(1–2): 1–135. Chapter 5.

Potts, Christopher and Florian Schwarz. 2008. Exclamatives and heightened emotion: Extracting pragmatic generalizations from large corpora. Ms., UMass Amherst.

Potts, Christopher. 2010. On the negativity of negation. To appear in David Lutz and Nan Li, eds., Proceedings of Semantics and Linguistic Theory 20.

Velikovich, Leonid, Sasha Blair-Goldensohn, Kerry Hannan, and Ryan McDonald. 2010. The viability of web-derived polarity lexicons. Proceedings of NAACL 2010.

Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2009). Recognizing Contextual Polarity: an exploration of features for phrase-level sentiment analysis. Computational Linguistics 35(3).

Politics

Burt L. Monroe, Michael P. Colaresi, and Kevin M. Quinn. 2008. Fightin’ Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict. Political Analysis (2008) 16:372–403.

Quinn, K. M., B. L. Monroe, M. Colaresi, M. H. Crespin, and D. R. Radev. 2010. How to Analyze Political Attention with Minimal Assumptions and Costs. American Journal of Political Science 54(1): 209–28.

Thomas, Matt, Pang, Bo, and Lee, Lillian. 2006. Get out the vote: determining support or opposition from Congressional floor-debate transcripts. In Proceedings of EMNLP 2006, 327–335.

Slatcher, R.B., Chung, C.K., Pennebaker, J.W., & Stone, L.D. (2007). Winning words: Individual differences in linguistic style among U.S. presidential and vice presidential candidates. Journal of Research in Personality, 41, 63-75.

Tae Yano, Philip Resnik, and Noah A. Smith. 2010. Shedding (a Thousand Points of) Light on Biased Language. In Proceedings of the NAACL-HLT Workshop on Creating Speech and Language Data With Mechanical Turk, Los Angeles, CA, June 2010.

Brendan O'Connor, Ramnath Balasubramanyan, Bryan R. Routledge, and Noah A. Smith. 2010. From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. In Proceedings of the International AAAI Conference on Weblogs and Social Media, Washington, DC, May 2010.

Hancock, J.T., Beaver, D.I., Chung, C.K., Frazee, J., Pennebaker, J.W., Graesser, A., & Cai, Z. (2010). Social language processing: A framework for analyzing the communication of terrorists and authoritarian regimes. International Journal of Language, Culture, and Society.

Social Interaction and Dating

Gonzales, A.L., Hancock, J.T., & Pennebaker, J.W. 2010. Language indicators of social dynamics in small groups. Communication Research 31, 3-19.

Ireland, M.E., Slatcher, R.B., Eastwick, P.W., Scissors, L.E., Finkel, E.J., & Pennebaker, J.W. (in press). Language style matching predicts relationship initiation and stability. Psychological Science.

Dan Jurafsky, Rajesh Ranganath, and Dan McFarland. 2009. Extracting Social Meaning: Identifying Interactional Style in Spoken Conversation. Proceedings of NAACL HLT 2009.

Niederhoffer, K.G. & Pennebaker, J.W. (2002). Linguistic style matching in social interaction. Journal of Language and Social Psychology, 21, 337-360.

Rajesh Ranganath, Dan Jurafsky, and Dan McFarland. 2009. It's Not You, it's Me: Detecting Flirting and its Misperception in Speed-Dates. Proceedings of EMNLP 2009.

Scholand, A.J., Y.R. Tausczik, and J.W. Pennebaker. 2010. Social language network analysis. Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work, pp. 23–26. ACM.

Personality and Gender

Koppel, M., S. Argamon, and A. R. Shimoni. 2002. Automatically categorizing written texts by author gender. Literary and Linguistic Computing 17(4): 401–412.

Lee CH, Kim K, Seo YS, Chung CK. (2007). The relations between personality and language use. Journal of General Psychology 134(4): 405-413.

F. Mairesse, M. Walker, M. Mehl, and R. Moore. 2007. Using linguistic cues for the automatic recognition of personality in conversation and text. Journal of Artificial Intelligence Research (JAIR), 30:457-500.

Newman, M.L., Groom, C.J., Handelman, L.D., and Pennebaker, J.W. 2008. Gender differences in language use: an analysis of 14,000 text samples. Discourse Processes, 45, 211–236.

Tausczik, Y., & Pennebaker, J.W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29, 24-54.

Velleman, Dan. 2010. Linguistic Inquiry and N-gram Count: a closer look at the psychology of syntactic patterns. Unpublished Qualifying Paper, The University of Texas at Austin.


Deception

Duran, N.D., Hall, C., McCarthy, P.M., & McNamara, D.S. (2010). The linguistic correlates of conversational deception: Comparing natural language processing technologies. Applied Psycholinguistics.

Enos, Frank, Elizabeth Shriberg, Martin Graciarena, Julia Hirschberg, and Andreas Stolcke. 2007. Detecting deception using critical segments. In Proceedings Interspeech, 1621-1624. Antwerp.

Hancock, J.T., L.E. Curry, S. Goorha, & M. Woodworth (2008). On lying and being lied to: A linguistic analysis of deception in computer-mediated communication. Discourse Processes 45(1): 1-23. Available via: http://sml.comm.cornell.edu/publications.php

David F. Larcker and Anastasia A. Zakolyukina. submitted manuscript. Detecting Deceptive Discussions in Conference Calls.

M. L. Newman, J. W. Pennebaker, D. S. Berry, and J. M. Richards. 2003. Lying words: Predicting deception from linguistic style. Personality and Social Psychology Bulletin, 29:665–675.

Catalina L. Toma & Jeffrey T. Hancock. 2010. Reading between the Lines: Linguistic Cues to Deception in Online Dating Profiles. CSCW 2010.

Health

Campbell, Sherlock R. and Pennebaker, James W. 2003. The secret life of pronouns: flexibility in writing style and physical health. Psychological Science 14(1): 60–65.

Nairan Ramirez-Esparza, Cindy K. Chung, Ewa Kacewicz, and James W. Pennebaker. 2008. The Psychology of Word Use in Depression Forums in English and in Spanish: Testing Two Text Analytic Approaches. Int'l AAAI Conference on Weblogs and Social Media (ICWSM) 2008.

S. S. Rude, E. M. Gortner, and J. W. Pennebaker. 2004. Language use of depressed and depression-vulnerable college students. Cognition and Emotion, 18:1121-1133.

Politeness and speech acts

Brown, P. & Levinson, S. (1987). Politeness: Some Universals in Language Usage. Cambridge, UK: Cambridge University Press.

Local materials at UT: Part 1 (text converted), Part 1 (big unconverted file), Part 2, Part 3, Part 4

Miller, C., P. Wu, H. Funk, L. Johnson, and H. Vilhjálmsson. 2007. A computational approach to etiquette and politeness: An “Etiquette Engine” for cultural interaction training. Proceedings of the 16th Conference on Behavior Representation in Modeling and Simulation (BRIMS), 26–29.

Ardissono, L., G. Boella, and L. Lesmo. 1995. Indirect speech acts and politeness: A computational approach. Proceedings of the Seventeenth Annual Conference of the Cognitive Science Society. Lawrence Erlbaum Associates.

van Rooij, Robert. 2003. Being polite is a handicap: Towards a game theoretical analysis of polite linguistic behavior. In M. Tennenholtz (ed.), Proceedings of TARK 9.

 

Sociolinguistics

Eckert, Penny. 2008. Variation and the indexical field. Journal of Sociolinguistics 12(4): 453–476.

Conferences

2010 IEEE Second International Conference on Social Computing

Statistical resources

Baayen, R. H. (2008) Analyzing Linguistic Data. A Practical Introduction to Statistics Using R. Cambridge University Press.

The R Project for Statistical Computing

R tutorial

Class pages

Dan Jurafsky and Chris Potts. 2010. Extracting Social Meaning and Sentiment. Stanford University. (Note: Many thanks are due to these guys. Much material on this page was gathered from theirs, with permission. - dib)

Conferences of interest

1st International Academic Conference on Social Broadcasting Technology, March 13 & 14, 2011 - Austin, Texas

Workshop on Language in Social Media (LSM 2011), June 23 at ACL/HLT 2011 in Portland, Oregon, submissions due April 1.

Data Resources (collection compiled by Joey Frazee, Dan Velleman, et al.)

Supplementary Topics

Authorship Attribution

Patrick Juola. 2006. Authorship Attribution. Foundations and Trends in Information Retrieval 1(3): 233–334.

E. Stamatatos. 2009. A survey of modern authorship attribution methods. Journal of the American Society for Information Science and Technology, 60(3):538–556

Mohtasseb, H. and Ahmed, A. 2009. More blogging features for author identification. Proceedings of the 2009 international conference on Knowledge discovery (ICKD’09). 534–539.

[Added the above just to see a LIWC approach in action. - dib]