Varun's Reading List Notes

1) Grudin: Why CSCW Applications Fail (1988)

CSCW systems fail for the following 3 reasons:

2) Horvitz: Principles of Mixed-Initiative User Interfaces (1999)

Motivation:

The paper presents the design of LookOut, a feature for Microsoft Outlook that automatically parses the content of email messages (grounding dates relative to the message's send date) and creates calendar events in Outlook that the user can then manipulate. The design of LookOut adheres to a set of 12 principles for mixed-initiative design proposed at the outset of the paper. The paper has an interesting discussion of handling uncertainty, and of the expected costs and benefits of taking autonomous action in different situations.
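A minimal sketch of this cost-benefit reasoning (not LookOut's actual implementation; the utility values and dialog-cost parameter are illustrative assumptions): compare doing nothing, engaging the user in dialog, and acting autonomously, given the inferred probability that the user wants the service.

```python
# Hedged sketch: illustrative utilities only, not values from the paper.

def expected_utility_of_action(p_goal, benefit_if_desired, cost_if_undesired):
    """Expected utility of acting now, relative to doing nothing (utility 0)."""
    return p_goal * benefit_if_desired - (1.0 - p_goal) * cost_if_undesired


def choose_behavior(p_goal, dialog_cost=0.1):
    """Pick among doing nothing, engaging the user in dialog, or acting autonomously."""
    # A wanted calendar event helps a lot (+1.0); an unwanted one is a
    # moderate annoyance the user has to undo (-0.5).
    u_act = expected_utility_of_action(p_goal, benefit_if_desired=1.0,
                                       cost_if_undesired=0.5)
    # Asking first always costs a small interruption but avoids wrong actions.
    u_ask = p_goal * (1.0 - dialog_cost) - (1.0 - p_goal) * dialog_cost
    options = [("do nothing", 0.0), ("ask user", u_ask), ("act autonomously", u_act)]
    return max(options, key=lambda pair: pair[1])[0]


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        print(p, choose_behavior(p))  # low p -> do nothing, mid -> ask, high -> act
```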

3) Ackerman: The Intellectual Challenge of CSCW: The Gap Between Social Requirements and Technical Feasibility (2000)

The paper reviews the existing state of CSCW research, argues that the main problem is the social-technical gap, or the gap between social requirements and technical feasibility, and presents solutions for what the field must do moving forward.

Existing CSCW research: human activity is nuanced, flexible and contextualized; CSCW systems need to be modeled similarly.

CSCW system / HCI problem: a computational system that enables and facilitates collaborative work among multiple individuals or groups; the research deals with how people manage and interact with these computational systems.

Gap: there are no existing HCI mechanisms that can fully automate the everyday social handling of personal information, so the problem scope must be restricted relative to what is socially appropriate.

Ackerman uses P3P (the Platform for Privacy Preferences Project) to illustrate this. P3P lets services declare, and users configure, privacy preferences related to data sharing; an interaction proceeds when the service's and user's preferences match. But Ackerman argues that no technical solution can accurately capture what is socially appropriate when exceptions are the norm, and hence the problem scope needs to change.
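A toy sketch of the kind of rigid rule matching Ackerman critiques (the preference fields below are invented for illustration and are not the real P3P vocabulary): the match is all-or-nothing, with no room for the context-dependent exceptions that govern real social sharing.

```python
# Toy sketch of rigid preference matching (fields invented for illustration,
# not the real P3P vocabulary): any mismatch simply blocks the interaction.

USER_PREFERENCES = {
    "share_email_with_third_parties": False,
    "retain_data_days_max": 30,
}

SERVICE_POLICY = {
    "share_email_with_third_parties": False,
    "retain_data_days_max": 90,
}

def preferences_match(user, service):
    """All-or-nothing rule matching: no room for context-dependent exceptions."""
    if service["share_email_with_third_parties"] and not user["share_email_with_third_parties"]:
        return False
    if service["retain_data_days_max"] > user["retain_data_days_max"]:
        return False
    return True

print(preferences_match(USER_PREFERENCES, SERVICE_POLICY))  # False: the rule cannot say "it depends"
```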

In summary there are 3 main issues with CSCW systems:

  1. Systems are not nuanced
  2. Systems don’t allow for ambiguity
  3. Systems are not socially flexible

Arguments against the significance of the gap: technology will change, users will change. But after 25 years, we can see that technological solutions remain elusive, and forcing users to change is against the central premise of HCI.

Solutions: CSCW/HCI researchers should centralize the gap in their work. Ackerman argues that CSCW needs to be reconceptualized as the science of the artificial. 

Here are some key highlights:

Concluding quote:

“HCI and CSCW systems need to have at their core a fundamental understanding of how people really work and live in groups, organizations, communities, and other forms of collective life. Otherwise, we will produce unusable systems, badly mechanizing and distorting collaboration and other social activity”

 

4) Wobbrock and Kientz: Research Contributions in Human-Computer Interaction (2016)

7 types of HCI contributions:

  1. Empirical:

Definition: new knowledge created through qual/quant observation and data gathering

Methods: interviews, experiments, surveys, ethnographies, diaries, logs

Significance: findings highlight new knowledge

Evaluation: importance of findings and soundness of methods

  2. Artifacts:

Definition: Arise from generative design

Methods: New systems, architectures, tools, toolkits, sketches, mockups

Significance: compel us to imagine new futures

Evaluation: accompanying empirical studies or quantitative evaluations, and how well designs negotiate trade-offs and keep competing priorities in balance.

  3. Methodological:

Definition: Create new knowledge by informing how you do research

        Methods: new techniques for design, analysis, and measurement

        Significance: improve research practices

        Evaluation: utility, reproducibility, reliability, and validity

  4. Theoretical:

Definition: Create new knowledge by defining qual/quant theories that have descriptive and/or predictive power

Methods: consist of new or improved concepts, definitions, models, principles, frameworks

Significance: theoretical contributions inform what we do, why we do it, and what we expect from it

Evaluation: novelty, soundness, and power to describe, predict, and explain. Validation through empirical work.

  5. Survey:

Definition: review and synthesize work done on a research topic

        Methods: review prior work on a research topic which has some maturity

        Significance: exposing trends and gaps

Evaluation: evaluated based on how well they organize what is currently known about a topic and reveal opportunities for further research

  6. Dataset:

Definition: provides a new and useful corpus, often accompanied by an analysis of its characteristics, for the benefit of the research community

        Methods: synthesis of a new corpus from a variety of sources (web scraping, crowdsourcing, etc.)

Significance: enable new algorithms, systems, or methods to be evaluated against a shared repository

Evaluation: extent to which they supply the research community with a useful and representative corpus against which to test and measure.

  7. Opinion:

        Definition: seek to change the minds of readers through persuasion

        Methods: draw upon many of the above contribution types to make their case

Significance: goal is to persuade, compel reflection, discussion, and debate, not just inform.

Evaluation: strength of their argument. Strong arguments credibly use supporting evidence and fairly consider opposing perspectives

5) Rosenblat and Stark: Algorithmic Labor and Information Asymmetries: A Case Study of Uber’s Drivers (2016)

Summary:

The study examines how Uber drivers experience labor under a regime of automated, algorithmic management. Through a qualitative study combining analysis of online forum data (1,350 items) and driver interviews (7 participants), the paper highlights:

(i) the information and power asymmetries, sustained through soft control and gamification, that are vital for Uber to run its business, and

(ii) critiques of Uber's algorithms and of its advertising and corporate communications.

Highlights:

6) Alkhatib, Bernstein, and Levi: Examining Crowd Work and Gig Work Through The Historical Lens of Piecework (2017)

The paper characterizes on-demand work (i.e. crowd work like AMT and gig work like Uber driving) using the historical analogy of piecework, which has been well studied and has many parallels to on-demand work.

Characteristics of piecework:

Research Questions:

  1. what are the complexity limits of on-demand work?
  2. how far can work be decomposed into smaller microtasks?
  3. what will work and the place of work look like for workers?

7) Webb: The Impact of AI on the Labor Market (2020)

The paper presents a methodology to measure the exposure of occupations to different technologies (software, robots, and AI). The exposure score is computed as the overlap between the task descriptions of an occupation and the tasks the technology performs, using patent titles as a proxy for the technology's capabilities.
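A rough sketch of the overlap idea (not Webb's exact procedure; the crude keyword extraction and the example strings are illustrative assumptions): score an occupation's exposure by how many of its task descriptions share phrasing with patent titles for the technology.

```python
# Rough sketch (not Webb's exact procedure): the keyword extraction below is a
# crude stand-in for his verb-noun matching, and all strings are illustrative.

import re

def word_pairs(text):
    """Crude stand-in for NLP parsing: return adjacent lower-cased word pairs."""
    words = re.findall(r"[a-z]+", text.lower())
    return {(a, b) for a, b in zip(words, words[1:])}

def exposure_score(occupation_tasks, patent_titles):
    """Fraction of an occupation's tasks whose phrasing overlaps some patent title."""
    patent_pairs = set()
    for title in patent_titles:
        patent_pairs |= word_pairs(title)
    exposed = sum(1 for task in occupation_tasks if word_pairs(task) & patent_pairs)
    return exposed / len(occupation_tasks)

# Illustrative example only.
tasks = ["diagnose patient conditions", "schedule patient appointments"]
patents = ["system to diagnose patient conditions from imaging data"]
print(exposure_score(tasks, patents))  # 0.5
```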

Findings:

Nature of the occupation:

Nature of individuals

Employment and Wages:

Other references on impact of AI on Labor:

8) Zhang: Algorithmic Management Reimagined For Workers and By Workers: Centering Worker Well-Being in Gig Work (2022)

The study explores rideshare workers' concerns about their well-being and imagines new platforms through the lens of algorithmic imaginaries.

The specific research questions include:

  1. How do gig work’s algorithmic management and platform design affect worker well-being?
  2. What do gig workers desire to see in technology designs that support their well-being and work preferences?

The paper argues that algorithmic imaginaries, or ways of thinking about what algorithms are and how they should function, are a better lens than mental models. The authors conduct focus groups to understand worker concerns and then follow-up participatory design sessions asking participants to envision new platform design features.

Highlights:

9) Dubal: On Algorithmic Wage Discrimination (2023)

Definitions:

Harms:

“…white people. That raises some tough questions.” The Washington Post (2016).

“Caldor Fire Evacuees Report Tahoe Ride-Hail Price Gouging of More Than $1,500,” KQED, accessed October 25, 2022, https://www.kqed.org/news/11887558/caldor-fire-evacueesreport-tahoe-ride-hail-price-gouging-of-more-than-1500.

Legality:

Empirical Evidence of Hourly Pay Calculation:

Transparency:

Algorithm:

Mitigations:

Gig Work vs Organized Work: Why is AWD unacceptable in gig work? (TRUS)

Price vs Wage Discrimination: Why is the former okay?

Consumers Hate ‘Price Discrimination,’ but They Sure Love a Discount - The New York Times

“The most important factor… is that shoppers understand the rules that merchants have created. Problems arise when there’s an “informational imbalance”.

https://www.cnn.com/2024/04/05/business/walmart-shoppers-class-action-settlement/index.html

Zephyr Teachout - Algorithmic Personalized Wages

10) Sweeney: Discrimination in Online Ad Delivery (2013)

Google ads (served via Google AdSense) suggesting an arrest record appear more frequently when the search string contains names associated with Black people than with white people, regardless of whether the advertiser actually holds an arrest record for that person.

This isn’t illegal in itself, since Title VII would only apply if one could prove that an employer used the ads about the arrest in a hiring decision. Furthermore, the advertiser and the ad may be protected as free speech under the First Amendment.

This is one of the first works to examine bias in ad delivery and Google Image Search, setting a precedent for several later studies.

11) Ali: Discrimination through Optimization (2019)

The paper presents empirical evidence of bias in ad delivery optimization on Facebook along gender and racial lines, which could potentially violate Title VII.

Methodology:

Finding 1:  

Skewed ad delivery occurs due to market effects alone, even when targeting the same audience with varying budgets.

Methodology:  

Identical ads targeting the same audience but with varying budgets were run on Facebook.

Significance:

The audience that saw the ads ranged from over 55% men for low-budget ads to under 45% men for high-budget ads, demonstrating that market effects alone can skew ad delivery across protected classes.

Finding 2:

Skewed ad delivery occurs due to the ad creative content (headline, text, and image).

Methodology:

Ads targeting the same audience but containing creatives stereotypically of interest to different genders and races were used (e.g., bodybuilding for men, cosmetics for women, hip-hop for Black users, country music for white users).

Significance:

Despite identical targeting and bids, ad delivery was heavily skewed based solely on the creative, with some ads delivering to over 80% men, over 90% women, over 85% Black users, or over 80% white users.

Finding 3:  

The ad image alone significantly impacts ad delivery.

Methodology:

Experiments swapping different headlines, text, and images were run, including cases where the image contradicted the other creative components' stereotypical interests.

Significance:

Differences in delivery were significantly affected by just the image. E.g., an ad with male-stereotypical text/headline but a female-stereotypical image delivered primarily to women.

Finding 4:

Facebook likely automatically classifies ad images, skewing delivery from the ad run's start.  

Methodology:

Ads with nearly transparent male/female stereotype images (visually indistinguishable but retaining data) were created.

Significance:

Statistically significant delivery differences based on the transparent images indicate Facebook's automated image classification and relevance estimation contribute to skewed delivery from the outset.

Finding 5:

Real employment and housing ads experience significantly skewed delivery.

Methodology:

Employment and housing ads were created and run while measuring delivery to users of different races and genders when optimizing for clicks.

Significance:

Despite identical targeting, ads for different job types and housing delivered to vastly different audiences based solely on the creative. E.g., lumber jobs: 72% white, 90% male; taxi jobs: 75% Black.

12) BHN: Fairness in Machine Learning (2023)

Chapter 2: Legitimacy

Question: is it morally acceptable to use ML in a specific scenario? E.g. social media bans, automated essay scoring, criminal risk prediction. This is a different question from the other notion of fairness, the relative treatment of groups.

ML is not a drop-in replacement for human decision making: in high-stakes scenarios like hiring, credit, and housing, decisions are typically made by a bureaucracy, not by a single individual.

Bureaucracies incorporate procedural protections such as:

Bureaucracies protect against arbitrary decisions, i.e. decisions that are inconsistent or lack well-justified reasoning. This is built on the principle that people are entitled to similar decisions unless there are reasons to treat them otherwise. Arbitrary decisions show a lack of respect for the people who are subject to them.

3 types of automation:

PO concerns:

  1. Mismatch between goal and prediction target
  2. Fail to consider relevant information,
  3. may seize upon spurious correlations (e.g. sneaker color and speed: in the sample the coach observed, fast runners happened to wear blue sneakers and slow runners red, so the coach uses sneaker color to decide team membership; see the sketch after this list)
  4. Lack agency and recourse
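A toy illustration of concern 3 above, with invented numbers: a rule learned from a sample in which sneaker color happens to track speed fails as soon as that accidental correlation breaks.

```python
# Tiny illustration (invented numbers) of the sneaker example: a rule learned
# from a sample where shoe color happens to track speed generalizes badly once
# that accidental correlation disappears.

# Training sample the coach happened to observe: (sneaker_color, is_fast)
observed = [("blue", True), ("blue", True), ("red", False), ("red", False)]

# "Model": predict fast if that color was mostly fast in the observed sample.
fast_rate = {}
for color in {"blue", "red"}:
    rows = [fast for c, fast in observed if c == color]
    fast_rate[color] = sum(rows) / len(rows)

def predict_fast(color):
    return fast_rate[color] >= 0.5

# New runners whose sneaker color carries no information about speed.
new_runners = [("red", True), ("blue", False), ("red", True), ("blue", False)]
accuracy = sum(predict_fast(c) == fast for c, fast in new_runners) / len(new_runners)
print(accuracy)  # 0.0 -- the spurious feature does all the (wrong) work
```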

PO concerns: Mismatch between goal and prediction target example

Goal: where to deploy police to lessen crime

Target: arrest data

Mismatch:

Conclusion:

To establish legitimacy, decision makers must affirmatively justify their scheme by: demonstrating the target's relation to agreed-upon stakeholder goals; validating the deployed system's accuracy; allowing recourse methods; and addressing other outlined dimensions. While procedural protections around automated systems can achieve justification, decision makers avoid implementing them, as it undercuts automation's intended cost savings.

Chapter 3: Classification

No fairness through unawareness: being blind to the sensitive attribute (i.e. removing it from the input) cannot ensure fair classification, because the sensitive attribute is typically redundantly encoded across many other features, especially in large feature spaces.

3 Statistical Non-Discrimination Criteria:

  1. INDEPENDENCE (acceptance rate): R ⊥ A

        e.g. demographic parity: P(Ŷ = 1 | A = a) = P(Ŷ = 1 | A = b)

        Note: group-specific thresholds may not satisfy independence.

  2. SEPARATION (error rates): R ⊥ A | Y

        P(Ŷ = 1 | Y = 1, A = a) = P(Ŷ = 1 | Y = 1, A = b)   (true positive rate)

        P(Ŷ = 1 | Y = 0, A = a) = P(Ŷ = 1 | Y = 0, A = b)   (false positive rate)

        For example, a lender could apply a more lenient risk threshold to one group in order to equalize error rates across groups.

  3. SUFFICIENCY (outcome frequency given a score): Y ⊥ A | R, i.e. parity of positive/negative predictive values

        P(Y = 1 | R = r, A = a) = P(Y = 1 | R = r, A = b)

        P(Y = 1 | R = r) = r   (calibration)

        P(Y = 1 | R = r, A = a) = r   (calibration by group, which implies sufficiency)

Non-discrimination criteria can be enforced by pre-processing the data, by constraining training (in-processing), or by post-processing a trained classifier; a sketch of checking these criteria follows.
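A minimal sketch of how the three criteria can be checked on held-out predictions (illustrative only; the array names, the synthetic data, and the binning choice are assumptions): y is the true outcome, y_hat the binary decision, r a score in [0, 1], and a the group attribute.

```python
# Illustrative check of independence, separation, and sufficiency; not from the book.
import numpy as np

def independence_gap(y_hat, a):
    """Demographic parity: spread of acceptance rates P(Y_hat = 1 | A = a)."""
    rates = [y_hat[a == g].mean() for g in np.unique(a)]
    return max(rates) - min(rates)

def separation_gaps(y, y_hat, a):
    """Spread of true-positive and false-positive rates across groups."""
    tpr = [y_hat[(a == g) & (y == 1)].mean() for g in np.unique(a)]
    fpr = [y_hat[(a == g) & (y == 0)].mean() for g in np.unique(a)]
    return max(tpr) - min(tpr), max(fpr) - min(fpr)

def sufficiency_gap(y, r, a, bins=10):
    """Calibration by group: spread of P(Y = 1 | R in bin, A = a) across groups."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    worst = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (r >= lo) & (r < hi)
        rates = [y[in_bin & (a == g)].mean()
                 for g in np.unique(a) if (in_bin & (a == g)).any()]
        if len(rates) > 1:
            worst = max(worst, max(rates) - min(rates))
    return worst

# Synthetic example: scores calibrated by construction, so all gaps stay small.
rng = np.random.default_rng(0)
a = rng.integers(0, 2, 1000)
r = rng.random(1000)
y = (rng.random(1000) < r).astype(int)
y_hat = (r > 0.5).astype(int)
print(independence_gap(y_hat, a), separation_gaps(y, y_hat, a), sufficiency_gap(y, r, a))
```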

Note: ProPublica implicitly adopted equality of false positive rates as a fairness criterion in their article on COMPAS scores (Black defendants had “twice the false positive rate” of White defendants). Northpointe, the maker of the COMPAS software, emphasized the importance of calibration by group in their rebuttal to ProPublica’s article.

Chapter 4: Relative notions of fairness

Question: why might we be concerned about the uneven allocation of opportunities across specific groups and across society overall?

Note: Race and gender have historically been a basis for how most societies are organized, not just idiosyncratic traits that employers use to discriminate.

6 reasons why discrimination is morally wrong:

  1. Relevance: race/gender has no relevance to the outcome of the decision
  2. Generalization: treats people within groups as overly uniform
  3. Prejudice: rests on a belief that some groups are inferior to others
  4. Disrespect: demeans people of specific groups
  5. Immutability: treats people differently based on criteria they have no control over
  6. Compounding injustice: decisions should not compound disadvantages created by past injustice

Equality of opportunity: 3 Views

  1. Narrow: similar people should be treated similarly based on current level of similarity

e.g. education admission through a standardized exam (meritocracy)

  2. Middle: treat seemingly dissimilar people similarly by discounting dissimilarity that results from past injustice beyond their control

e.g. affirmative action; Texas law: the top 10% of each high school's graduating class is guaranteed admission

Some institutions must bear the cost even if there may be no guaranteed reward

  3. Broad: society should be organized such that people of similar ability are able to attain similar outcomes

e.g. equalize quality of education accessible to rich and poor (not admissions)

Tensions: at what point in life are we ultimately responsible for how we compare with others?

Randomization and Thresholding: we must recognize that precisely controlled and purposeful randomness is not the same as arbitrariness or capriciousness.

Randomization (e.g. lotteries for affordable housing or immigration visas) may be preferable to strict thresholding when three conditions hold:

Base rates (Error rate parity somewhat realizes the middle view):

One thing we can do even without the features is to look at differences in base rates (i.e., rates at which different groups achieve desired outcomes, such as loan repayment or job success). If the base rates are significantly different —and if we assume that individual differences in ability and ambition cancel out at the level of groups — it suggests that people’s qualifications may differ due to circumstances beyond the individual.

In fact, if base rates are so different that we expect large disparities in error rates that cannot be mitigated by interventions like data collection, then it suggests that the use of predictive decision making is itself problematic, and perhaps we should scrap the system or apply more fundamental interventions.
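A small sketch of this base-rate check, with invented numbers: compare how often the desired outcome occurs in each group before interpreting any predictor's error rates.

```python
# Invented outcome data; 1 = desired outcome (e.g. loan repaid).
outcomes_by_group = {
    "group_a": [1, 1, 1, 0, 1, 1, 0, 1],
    "group_b": [1, 0, 0, 1, 0, 0, 1, 0],
}

base_rates = {g: sum(ys) / len(ys) for g, ys in outcomes_by_group.items()}
print(base_rates)  # {'group_a': 0.75, 'group_b': 0.375}

# If ability and ambition are assumed to average out across groups, a gap this
# large points to circumstances beyond the individual, and a threshold
# predictor trained on such data will generally show unequal error rates.
```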

Chapter 7: Broader view of discrimination

Social scientists organize discrimination into 3 levels:

  1. Structural: the way society is organized, e.g. laws
  2. Organizational: at the level of organizations, e.g. universities
  3. Interpersonal: attitudes and beliefs of individuals

13) Kraut, Robert E., and Paul Resnick. Building Successful Online Communities: Evidence-Based Social Design (2012)

Chapter 4: Regulating Behavior in Online Communities

Part 1: How to lessen impact of bad behavior:

Part 2: How to limit bad behavior:

Part 3: How to encourage voluntary compliance:

Recommendations:

  1. Start with softer approaches:
  2. Use tangible remedies for persistent offenses:
  3. Build legitimacy and compliance:

Chapter 6: Starting New Online Communities

The chapter emphasizes the importance of carving out a useful niche, defending it from competitors, and reaching critical mass to ensure community success.

Design decisions for carving out a niche involve:

  1. Interactions: select, sort, highlight, notifications
  2. Structure: size and scope

Design decisions for reaching critical mass involve:

  1. External communication and integration: sharing IDs and profiles
  2. Create rewards for retention and recruitment
  3. Advertising and communicating in the right way

14) Van den Hoven, Jeroen. The Cambridge handbook of information and computer ethics (2010)

Chapter 4: The use of normative theories in computer ethics.

Moral dilemmas: a person faces two choices and cannot take both; how should they proceed? E.g. the trolley problem: a runaway trolley will kill 5 people if you do nothing; pulling a lever diverts it so that it kills 1 person instead.

Utilitarian ethics: minimize total harm, so pull the lever and kill 1 person to save 5

Deontological ethics: do nothing; actively killing someone violates a moral principle regardless of the outcome

Criticisms of LLMs in HCI work:

Committee Feedback: