Text Analysis Workshop for Social Sciences

An Official Live Online UT Course (LA125) – Spring 2016

Registration Information


James W. Pennebaker, Professor of Psychology

Class times: Mondays 6:00-7:15pm CST; January 25-February 29

Questions?  Email: Pennebaker@mail.utexas.edu



What can you learn about individuals, groups, companies, or entire civilizations by studying their words?  This workshop will train you to analyze everyday language using the most recent version of the text analysis program Linguistic Inquiry and Word Count (LIWC).  Over the six classes, you should learn to analyze books, blogs, emails, tweets, and just about any other text to get a psychological picture of its author.


This is kind of a workshop and kind of a class.  Since it is being run through the University of Texas at Austin, it is actually a 1-hour credit/no-credit course.  But you can think of it as a workshop that meets once a week on Monday nights for six consecutive weeks.  The cost of the workshop/class is $200.  You can sign up for it to get college credit or, if that is too threatening, you can simply audit the course for the same amount of money.  All workshops, including comments from students, will be recorded and made available for replay.


This is a 1-hour credit/no-credit course with no letter grade.  Credit will be given only to participants who attend at least 4 of the 6 live sessions.


Requirements and prerequisites.  To take the workshop, you must register for the course through Extended Campus (formerly known as University Extension) as course number LA-125.  To register, go to the online registration site. Note that UT charges $200 for the course.


This workshop is designed for people with no formal training in computerized text analysis.  It is relevant for researchers in multiple disciplines as well as for IT experts.  There are no prerequisites.  However, participants are encouraged to have an understanding of basic statistics, including correlations, t-tests, and analyses of variance.  At the very minimum, students must be familiar with Excel and, preferably, with a statistical package such as SPSS, SAS, or R.


Class meetings.  All six classes will take place from 6:00 – 7:15 PM Central Time and will be broadcast via Adobe Connect. Students will be able to participate and ask questions.  You will receive information about the class weblink in mid-January after signing up for the class.


LIWC2015 computer program.  All registered students will be provided with a free 2-month copy of the LIWC2015 program.  This will come with two licenses so that you can install it on two computers.  If you have a choice between a Mac and a PC, the PC offers one slight advantage when we work with the Meaning Extraction Method in Workshop 5.  Note that LIWC is a commercial product intended for research purposes only.  Any commercial use requires a commercial license from a separate company (Receptiviti.com).


Privacy and confidentiality.  All interactions in the class itself, including questions, texts, shared data, etc. will be available to the students for both educational and research purposes.  By signing up for and participating in this workshop/class, you understand that online interactions may be analyzed and published (with all identifying information removed). Personal emails to the instructor will not be part of the class data archive and will be kept confidential.


Workshop homework and assignments.  Each workshop will involve a homework assignment.  The assignments are optional. If you are serious about learning about computerized text analysis, however, you really should do all the assignments. The assignments that you turn in will be put on the class website for others to review. Depending on the size of the course, feedback will be provided by other class members and (privately) by the instructor.


Class website.  Within Canvas, we will be creating a class site that will encourage class discussions outside of the workshops themselves.  It is hoped that students will use this site to get to know one another and start possible collaborations.  The instructor will set up a weekly office hour to address questions.








Tentative Syllabus





January 25

Basics of text analysis and LIWC.  A brief overview of the logic of analyzing natural text.  A primer and demonstration of how LIWC works.

LIWC operator’s manual
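To make the word-counting logic concrete, here is a minimal Python sketch of dictionary-based counting of the kind LIWC performs.  Each word is checked against category word lists, and each category is reported as a percentage of total words.  The category names, the tiny word lists, the trailing-asterisk wildcard handling, and the simple tokenizer are all hypothetical stand-ins for illustration.  The real LIWC2015 dictionaries and text processing are far more extensive.

```python
import re
from collections import Counter

# Hypothetical toy dictionary for illustration only; the real LIWC2015
# dictionaries are much larger and ship with the program.
# A trailing "*" marks a prefix match (e.g., "happ*" matches "happy", "happiness").
TOY_DICTIONARY = {
    "i": ["i", "me", "my", "mine", "myself"],
    "we": ["we", "us", "our", "ours", "ourselves"],
    "posemo": ["happ*", "good", "love*", "nice"],
    "negemo": ["sad*", "hate*", "angry", "hurt*"],
}


def tokenize(text):
    """Lowercase the text and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def matches(word, entry):
    """True if the word matches a dictionary entry, honoring a trailing '*' wildcard."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry


def category_percentages(text, dictionary=TOY_DICTIONARY):
    """Return each category's share of the total word count, as a percentage."""
    words = tokenize(text)
    if not words:
        return {category: 0.0 for category in dictionary}
    counts = Counter()
    for word in words:
        for category, entries in dictionary.items():
            if any(matches(word, entry) for entry in entries):
                counts[category] += 1  # one word may count toward several categories
    return {category: 100.0 * counts[category] / len(words) for category in dictionary}


if __name__ == "__main__":
    sample = "I love my dog, and we are happy together. I hate rainy days."
    for category, pct in category_percentages(sample).items():
        print(f"{category:8s} {pct:5.1f}%")
```

Reporting percentages rather than raw counts keeps texts of different lengths comparable.  Prefix wildcards are convenient but can over-match, because "sad*" would also catch "saddle."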

February 1

Dictionaries, data sets, and the psychometrics of words.  A demonstration of building your own dictionary.  An overview of the types of corpora available and how to build your own. The behavior of words will be discussed.

LIWC language manual

February 8

Personality, cognition, and social dynamics.  How LIWC and related systems can reveal aspects of individual differences, thinking styles, and social identity. Particular emphasis will be placed on the distinction between self-reports, observer reports, and text analyses.

Tausczik & Pennebaker

February 15

Social interactions and language style matching. Language is inherently social. An overview of ways to measure conversations or text-based interactions (e.g., email, Twitter) to understand how people are connecting and/or thinking alike.

LSM paper
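One commonly cited formulation of language style matching (LSM) scores each function-word category as 1 - |p1 - p2| / (p1 + p2 + 0.0001).  Here p1 and p2 are the two speakers' percentages for that category, and the small constant avoids division by zero.  The per-category scores are then averaged, typically across nine function-word categories.  Below is a minimal Python sketch of that formulation.  It assumes the percentages already come from a LIWC-style analysis.  The category labels and numbers in the usage block are made up for illustration, and this is not necessarily the exact procedure used in class.

```python
# Minimal LSM sketch, assuming each speaker's function-word usage is already
# expressed as a percentage of total words per category.


def category_lsm(p1, p2, epsilon=0.0001):
    """Similarity in one category: 1 minus the normalized absolute difference."""
    return 1.0 - abs(p1 - p2) / (p1 + p2 + epsilon)


def lsm(speaker_a, speaker_b):
    """Average the per-category similarity over the categories both speakers share."""
    shared = sorted(set(speaker_a) & set(speaker_b))
    if not shared:
        raise ValueError("the two profiles share no categories")
    scores = [category_lsm(speaker_a[c], speaker_b[c]) for c in shared]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Illustrative, made-up percentages for two people in one conversation.
    alice = {"ppron": 9.5, "ipron": 5.0, "article": 6.2, "prep": 12.8, "negate": 1.4}
    bob = {"ppron": 8.1, "ipron": 4.6, "article": 7.0, "prep": 13.5, "negate": 1.1}
    print(f"LSM = {lsm(alice, bob):.3f}")
```

Scores near 1 indicate closely matched function-word styles, and lower scores indicate divergent styles.  Because each category is normalized by the two speakers' combined usage, rare and common categories contribute on a comparable scale.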

February 22

Topic modeling and meaning extraction.  How to automatically extract topics or themes from text.  This will be an introduction to quantitative approaches to qualitative analysis.

Chung & Pennebaker; RiotScan and MEH

February 29

Machine learning and future directions.  The computational social science revolution is underway.  A discussion of new ways of thinking within computational linguistics, computer science, medicine, business, and other disciplines dealing with big data.