Differentiating Communication Styles
of Leaders on the
Linux Kernel Mailing List
Daniel Schneider, Scott Spurlock, Megan Squire
Elon University
@MeganSquire0
FLOSSmole / FLOSSdata / FLOSSpapers
OpenSym '16 (Berlin)
August 17, 2016
The LKML
Linux Kernel Mailing List
--one of 148 email mailing lists used to manage the Linux operating system
--7,000 subscribers
--300 messages per day
--earliest archive is from 1995
--2,160,000 messages (1995 - March 2015)
--54,000 unique senders (addresses)
The Linux Kernel Civility Incident - July 2013
Linus Torvalds
Greg Kroah-Hartman
Sarah Sharp
Research Questions
RQ1: Considering the two LKML leaders who were at the center of the 2013 controversy (Torvalds & Kroah-Hartman), what are the interesting features of their written discourse? How different are their communication styles?
RQ2: Can we automatically differentiate emails written by each person, solely based on their (non-code) content? What features of the email content are most helpful to this task?
First, we collected all the emails sent by Linus and by Greg
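As a rough illustration of this collection step, the sketch below filters raw messages by sender and keeps only the non-code prose. It is not the pipeline used in the study: the addresses in LEADERS are hypothetical placeholders, and the rule for dropping quoted replies and patches (skip ">" lines, stop at the first "diff --git" or "--- a/" line) is a simplifying assumption.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical addresses, for illustration only (not the real ones).
LEADERS = {"linus@example.org": "Linus", "greg@example.org": "Greg"}


def sender_label(msg):
    """Return "Linus" or "Greg" based on the From header, else None."""
    _, address = parseaddr(msg.get("From", ""))
    return LEADERS.get(address.lower())


def prose_only(body):
    """Keep non-code text: skip quoted replies, stop at the first patch.

    Assumption: patches are appended after the prose and start with
    "diff --git" or "--- a/". Real emails are messier than this.
    """
    kept = []
    for line in body.splitlines():
        if line.startswith(("diff --git", "--- a/")):
            break
        if line.lstrip().startswith(">"):
            continue
        kept.append(line)
    return "\n".join(kept).strip()


def collect(raw_messages):
    """Group the prose of each single-part message by leader label."""
    by_leader = {label: [] for label in LEADERS.values()}
    for raw in raw_messages:
        msg = message_from_string(raw)
        label = sender_label(msg)
        if label is None or msg.is_multipart():
            continue
        by_leader[label].append(prose_only(msg.get_payload()))
    return by_leader
```

Matching on exact addresses is the simplest choice here; a sender who posts from several addresses would need each one listed.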
RQ1: lexical differences
Flesch-Kincaid Reading Ease (FKRE) is a simple measure of how easy a text is to read, based on average sentence length and average syllables per word (higher scores are easier to read).
The same counts can also be expressed as a school grade level equivalent (FKGL).
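For reference, both scores can be computed from three counts: words, sentences, and syllables. The sketch below uses the standard published coefficients for the two formulas. The counting helpers are assumptions added for illustration; in particular, rough_syllables is a crude vowel-group heuristic, not the syllable counter used in the study.

```python
import re


def flesch_reading_ease(words, sentences, syllables):
    """Reading ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Approximate school grade level needed to read the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def rough_syllables(word):
    """Crude heuristic: count runs of consecutive vowels (at least 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_counts(text):
    """Return (words, sentences, syllables) for a piece of plain text."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(rough_syllables(w) for w in words)
    return len(words), sentences, syllables


# Usage with hypothetical counts: 100 words, 5 sentences, 150 syllables.
ease = flesch_reading_ease(100, 5, 150)    # about 59.6
grade = flesch_kincaid_grade(100, 5, 150)  # about 9.9
```

Both formulas divide by the word and sentence counts, so an empty text needs to be handled by the caller.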
RQ1: lexical differences
RQ1: expletive caveats
RQ2: automatically distinguishing the two authors
Perhaps this is too good?
Which features are actually interesting?
Thanks
At first we suspected that Greg simply used a "thanks" signature and Linus did not.
However, this is not the whole story.
Examples of "thanks" (all Greg)
Greg:
Which features are actually interesting?
Sorry
Greg says sorry a lot more than Linus does.
Examples of "sorry" usage
Greg:
Linus:
Which features are actually interesting?
AdverbCount
Here we counted "actually" both in the total count of adverbs and as a feature on its own.
This adverb usage is an interesting fingerprint of Linus' writing. Linus' most-often used adverbs include:
Examples of adverbs (all Linus)
Which features are actually interesting?
ExpletivesCount
Whether or not an email used expletives was an interesting feature.
Which features are actually interesting?
Names
Examples of Name Usage
Greg:
Feel free to play around with this patch, I've sent it on to Linus.
Thanks, I've applied this to my trees, and will include it in the next round of changesets to Linus.
Linus:
Greg seems to use some seriously bad drugs, and creates totally crap commit messages that are just annoying when you have to look at them because there's some conflict. Greg - please fix your crazy tools. Look at this:...and tell me why the f*&% you have commit messages like this....
Which features are actually interesting?
Thing
I was curious why the word "thing" should be such a strong fingerprint for Linus.
Examples of "the thing" (all Linus)
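The features discussed in the preceding slides (Thanks, Sorry, AdverbCount, ExpletivesCount, Names, Thing) can be sketched as a simple per-email extractor. This is an illustration only, not the feature code used in the study: the word lists below are short hypothetical placeholders, and the "Actually" and "UsesExpletives" keys are names chosen here for the stand-alone "actually" count and the used-or-not expletive flag.

```python
import re

# Hypothetical placeholder lists; the study's actual lists are not shown here.
ADVERBS = {"actually", "really", "just"}
EXPLETIVES = {"crap"}
NAMES = {"linus", "greg"}


def extract_features(text):
    """Map one email's prose to a small dictionary of word-based features."""
    tokens = re.findall(r"[a-z']+", text.lower())
    expletives = sum(t in EXPLETIVES for t in tokens)
    return {
        "Thanks": tokens.count("thanks"),
        "Sorry": tokens.count("sorry"),
        # "actually" counts toward the adverb total and also on its own.
        "AdverbCount": sum(t in ADVERBS for t in tokens),
        "Actually": tokens.count("actually"),
        "ExpletivesCount": expletives,
        "UsesExpletives": expletives > 0,
        "Names": sum(t in NAMES for t in tokens),
        "Thing": tokens.count("thing"),
    }
```

Dictionaries like this, one per email, could then be fed to any standard classifier to test how well the two authors can be told apart.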
Future Work