1 of 74

Human-Computer Interaction

saadh.info/hci

Week 9 (Thursday): Experiment Design

2 of 74

Attendance and Agenda

  1. Scientific Foundations

3 of 74

Announcements

  • Please submit the teammate evaluation form.
  • Submit assignment 2 by the end of the day today to avoid any late submission penalty.
  • Please check that you have completed all quizzes; some submissions are still missing.
  • Start working on milestone 2!

4 of 74

1. Perception and Cognition

5 of 74

2. User Research Methods & Qualitative Analysis

Interviews

Contextual Inquiry

Think-Aloud

6 of 74

3. Experimental Research in HCI

Error bars show ±1 standard deviation

7 of 74

4. Modeling Interactions (Next)

Units: bits

Fitts’ Law

Hick-Hyman Law: RT = a + b log₂(n + 1)

8 of 74

3. Experimental Research in HCI

Error bars show ±1 standard deviation

9 of 74

Experimental Research in HCI

Scientific Foundation

Experiment Design

Hypothesis Testing

Demo and Assignment 3

10 of 74

Research – Definition #3

  • Research is… investigation or experimentation aimed at the discovery and interpretation of facts, the revision of accepted theories or laws in light of new facts.
  • What does that mean for HCI researchers?
    • Design and conduct a user study to test whether a new interaction technique improves on an existing interaction technique

11 of 74

Example: Apple iPhone (2007)

iPhone Gestures:

  • Tilt
  • Multitouch
  • Flick

12 of 74

Tilt

  • Research on tilt as an interaction primitive dates at least to 1998¹

¹ Harrison, B., Fishkin, K. P., Gujar, A., Mochon, C., & Want, R. (1998). Squeeze me, hold me, tilt me! An exploration of manipulative user interfaces. Proceedings of CHI '98, 17-24, New York: ACM.

13 of 74

Multitouch

  • Research on multitouch as an interaction primitive dates at least to 1978¹

¹ Herot, C. F., & Weinzapfel, G. (1978). One-point touch input of vector information for computer displays. Proceedings of SIGGRAPH 1978, 210-216, New York: ACM.

14 of 74

Flick

  • Research on flick as an interaction primitive dates at least to 1963¹

¹ Sutherland, I. E. (1963). Sketchpad: A man-machine graphical communication system. Proceedings of the AFIPS Spring Joint Computer Conference, 329-346, New York: ACM.

15 of 74

So…

[Figure: timeline (axis: time) with 1963 (flick), 1978 (multitouch), 1998 (tilt), and 2007 (iPhone); labels: Research, Engineering, Design, Materials & Processes, Products]

16 of 74

Experimental Method

  • Aka scientific method
  • Controlled experiments conducted in lab setting
  • Relevance vs. precision
    • Low in relevance (artificial environment)
    • High in precision (extraneous behaviours easy to control)
  • At least two variables:
    • Manipulated variable (aka independent variable)
    • Response variable (aka dependent variable)
  • Cause-and-effect conclusions possible (changes in the manipulated variable caused changes in the response variable)

17 of 74

Correlational Method

  • Look for relationships between variables
  • Observations made, data collected
    • Example: are users’ privacy settings on social networks related to their age, gender, level of education, employment status, income, etc.?
  • Non-experimental
    • Interviews, online surveys, questionnaires, etc.
  • Balance between relevance and precision (some quantification, observations not in lab)
  • Cause-and-effect conclusions not possible

18 of 74

General Rules

OK to compute...                      Nominal   Ordinal   Interval   Ratio
frequency distribution                Yes       Yes       Yes        Yes
median and percentiles                No        Yes       Yes        Yes
addition or subtraction               No        No        Yes        Yes
mean or standard deviation            No        No        Yes        Yes
ratio, or coefficient of variation    No        No        No         Yes

19 of 74

Testable Research Questions (2)

  • Very weak: Is the new technique any good?
  • Weak: Is the new technique better than QSK?
  • Better: Is the new technique faster than QSK?
  • Better still: Is the measured entry speed (in words per minute) higher for the new technique than for QSK after one hour of use?

20 of 74

A Tradeoff

21 of 74

The Tradeoff

  • There is tension between internal and external validity
  • The more the test environment and experimental procedures are “relaxed” (to mimic real-world situations), the more the experiment is susceptible to uncontrolled sources of variation, such as pondering, distractions, fiddling, or secondary tasks

22 of 74

Comparative Evaluations

  • Preferable to do a comparative evaluation rather than a one-off
  • More insightful results obtained
  • Factorial experiments require comparison, because there must be at least one independent variable with at least two levels
  • If one condition is a baseline, comparisons are possible between studies (assuming similar methodology)

23 of 74

Ingredients for an HCI Experiment

24 of 74

Introduction

  • Learning to conduct and design an experiment is a skill required of all researchers in HCI
  • Experiment design is the process of
    • deciding what variables to use,
    • what tasks and procedures to use,
    • how many participants to use and
    • how to solicit them, and so on

25 of 74

Signal and Noise Metaphor

  • Signal and noise metaphor for experiment design:
    • Signal → a variable of interest
    • Noise → everything else (random influences)
  • Experiment design seeks to enhance the signal, while minimizing the noise

26 of 74

Methodology

  • Methodology is the way an experiment is designed and carried out
  • Methodology is critical: “Science is method. Everything else is commentary.”¹
  • What methodology?
  • Don’t just make it up because it seems reasonable
  • Follow standards for experiments with human participants (next slide)

¹ This quote from Allen Newell was cited and elaborated on by Stuart Card in an invited talk at the ACM’s SIGCHI conference in Austin, Texas (May 10, 2012).

27 of 74

Ethics Approval

  • Ethics approval is a crucial step that precedes every HCI experiment
  • HCI experiments involve humans, thus… “Researchers must respect the safety, welfare, and dignity of human participants in their research and treat them equally and fairly.”¹
  • Proposal submitted to ethics review committee
  • Criteria for approval:
    • research methodology
    • risks or benefits
    • the right not to participate, to terminate participation, etc.
    • the right to anonymity and confidentiality

¹ http://www.yorku.ca/research/students/index.html

28 of 74

Getting Started With Experiment Design

  • Transitioning from the creative work in formulating and prototyping ideas to experimental research is a challenge
  • Begin with… What are the experimental variables?
  • Remember research questions: Can a task be performed more quickly with my new interface than with an existing interface?
  • Properly formed research questions inherently identify experimental variables (can you spot the independent variable and the dependent variable in the question above?)

29 of 74

Independent Variable

  • An independent variable (IV) is a circumstance or characteristic that is manipulated in an experiment to elicit a change in a human response while interacting with a computer.
  • “Independent” because it is independent of participant behavior (i.e., there is nothing a participant can do to influence an independent variable)
  • Examples:
    • interface, device, feedback mode, button layout, visual layout, age, gender, background noise, expertise, etc.
  • The terms independent variable and factor are synonymous

30 of 74

Test Conditions

  • An independent variable (IV) must have at least two levels
  • The levels, values, or settings for an IV are the test conditions
  • Name both the factor (IV) and its levels (test conditions):

Factor (IV)         Levels (test conditions)
Device              mouse, trackball, joystick
Feedback mode       audio, tactile, none
Task                pointing, dragging
Visualization       2D, 3D, animated
Search interface    Google, custom

31 of 74

Human Characteristics

  • Human characteristics are naturally occurring attributes
  • Examples:
    • Gender, age, height, weight, handedness, grip strength, finger width, visual acuity, personality trait, political viewpoint, first language, shoe size, etc.
  • They are legitimate independent variables, but they cannot be “manipulated” in the usual sense
  • Causal relationships are difficult to obtain due to unavoidable confounding variables

32 of 74

How Many IVs?

  • An experiment must have at least one independent variable
  • Possible to have 2, 3, or more IVs
  • But the number of “effects” increases rapidly with the size of the experiment:

  • Advice: Keep it simple (1 or 2 IVs, 3 at the most)
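
One way to see how quickly the number of effects grows is to count them. The sketch below assumes a full factorial design in which every main effect and every interaction among the IVs is tested; the function name `count_effects` is made up for this example.

```python
from math import comb

def count_effects(n_ivs):
    """Count effects by order (1 = main effects, 2 = two-way interactions, ...)."""
    by_order = {order: comb(n_ivs, order) for order in range(1, n_ivs + 1)}
    return by_order, sum(by_order.values())

for n_ivs in range(1, 6):
    by_order, total = count_effects(n_ivs)
    print(f"{n_ivs} IV(s): {by_order} -> {total} effects in total")
```

Under this assumption the total is 2^n − 1, so three IVs already give seven effects to analyze and interpret.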

33 of 74

Dependent Variable

  • A dependent variable is a measured human behaviour (related to an aspect of the interaction involving an independent variable)
  • “Dependent” because it depends on what the participant does
  • Examples:
    • task completion time, speed, accuracy, error rate, throughput, target re-entries, task retries, presses of backspace, etc.
  • Dependent variables must be clearly defined
    • Research must be reproducible!

34 of 74

Unique DVs

  • Any observable, measurable behaviour is a legitimate dependent variable (provided it has the potential to reveal differences among the test conditions)
  • So, feel free to “roll your own”
  • Example: negative facial expressions¹
    • Application: user difficulty with mobile games
    • Events logged included frowns, head shaking
    • Counts used in ANOVA, etc.
    • Clearly defined → reproducible

¹ Duh, H. B.-L., Chen, V. H. H., & Tan, C. B. (2008). Playing different games on different phones: An empirical study on mobile gaming. Proceedings of MobileHCI 2008, 391-394, New York: ACM.

35 of 74

Data Collection

  • Obviously, the data for dependent variables must be collected in some manner
  • Ideally, engage the experiment software to log timestamps, key presses, button clicks, etc.
  • Planning and pilot testing important
  • Ensure conditions are identified, either in the filenames or in the data columns
  • Examples: (next two slides)

36 of 74

TextInputHuffman-P01-D99-B06-S01.sd1

37 of 74

TextInputHuffman-P01-D99-B06-S01.sd2
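
A filename such as the one above can carry the condition codes into the analysis. The sketch below splits it into an application name, a file extension, and letter-plus-number codes. The slides do not say what P, D, B, and S stand for (participant, block, and session are plausible guesses), so the code keeps the letters as-is; `parse_condition_codes` is a hypothetical helper, not part of the experiment software.

```python
from pathlib import Path

def parse_condition_codes(filename):
    """Split 'App-P01-D99-B06-S01.sd1' into the app name, extension, and codes."""
    path = Path(filename)
    app, *codes = path.stem.split("-")
    parsed = {"app": app, "ext": path.suffix.lstrip(".")}
    for code in codes:
        # First character is the code letter; the rest is its numeric value.
        parsed[code[0]] = int(code[1:])
    return parsed

print(parse_condition_codes("TextInputHuffman-P01-D99-B06-S01.sd1"))
```

Parsing the codes once, at load time, lets each data file be tagged with its conditions before the files are merged for analysis.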

38 of 74

More Variables

  1. Control Variable
  2. Random Variable
  3. Confounding Variable

39 of 74

Control Variable

  • A control variable is a circumstance (not under investigation) that is kept constant while testing the effect of an independent variable
  • More control means the experiment is less generalizable (i.e., less applicable to other people and other situations)
  • Research question: Is there an effect of font color or background color on reading comprehension?
    • Independent variables: font color, background color
    • Dependent variable: comprehension test scores
    • Control variables
      • Font size (e.g., 12 point)
      • Font family (e.g., Times)
      • Ambient lighting (e.g., fluorescent, fixed intensity)
      • Etc.

40 of 74

Random Variable

  • A random variable is a circumstance that is allowed to vary randomly
  • More variability is introduced in the measures (bad!), but the results are more generalizable (good!)
  • Research question: Does user stance affect performance while playing Guitar Hero?
    • Independent variable: stance (standing, sitting)
    • Dependent variable: score on songs
    • Random variables
      • Prior experience playing a real musical instrument
      • Prior experience playing Guitar Hero
      • Amount of coffee consumed prior to testing
      • Etc.

41 of 74

Control vs. Random Variables

  • There is a trade-off which can be examined in terms of internal validity and external validity (see below)

42 of 74

Confounding Variable

  • A confounding variable is a circumstance that varies systematically with an independent variable
  • Must be considered; otherwise the results may be misleading
  • Research question: In an eye tracking application, is there an effect of “camera distance” on task completion time?
    • Independent variable: Camera distance (near, far)
      • Near camera (A): inexpensive camera mounted on eye glasses
      • Far camera (B): expensive camera mounted above system display
    • Dependent variable: task completion time
    • But, “camera” is a confounding variable: camera A for the near setup, camera B for the far setup
    • Are the effects due to camera distance or to some aspect of the different setups?

43 of 74

Fitts’ Law Example

  • Independent variables:
    • Index of difficulty (ID): 1, 2, 3, 4 bits
    • Amplitude: 16, 32, 64, 128 pixels
  • Note: ID = log₂(2A / W)
  • W is a confounding variable (see below)

[Figure: two views of the experiment design, ID × Amplitude (showing width, W) and W × Amplitude]
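
A short calculation shows why W cannot be treated as free here. Solving ID = log₂(2A / W) for W gives W = 2A / 2^ID, so once ID and amplitude are chosen, the width is fixed. The sketch below tabulates W for the levels listed on the slide; the function name `width_for` is invented for this example.

```python
# Levels of the two independent variables, as listed on the slide.
ids = [1, 2, 3, 4]                 # index of difficulty, in bits
amplitudes = [16, 32, 64, 128]     # movement amplitude, in pixels

def width_for(index_of_difficulty, amplitude):
    """Solve ID = log2(2A / W) for the target width W."""
    return 2 * amplitude / 2 ** index_of_difficulty

for amplitude in amplitudes:
    widths = [width_for(i, amplitude) for i in ids]
    print(f"A = {amplitude:>3} px -> W for ID 1..4: {widths}")
```

W halves with every extra bit of ID and doubles with every doubling of A, so it varies systematically with both independent variables, which is the defining property of a confounding variable.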

44 of 74

Experiment Task

  • Recall the definition of an independent variable:
    • a circumstance or characteristic that is manipulated in an experiment to elicit a change in a human response while interacting with a computer
  • The experiment task must “elicit a change”
  • Qualities of a good task: represent, discriminate
    • Represent activities people do with the interface
      • Improves external validity (but may compromise internal validity)
    • Discriminate among the test conditions
      • Increases likelihood of a statistically significant outcome (i.e., the sought-after “change” occurs)

45 of 74

Task Examples

  • Usually the task is self-evident (follows directly from the research idea)
  • Research idea 🡪 a new graphical method for entering equations in a spreadsheet
    • Experiment task 🡪 insert an equation using
      • (a) the graphical method and
      • (b) the conventional method
  • Research idea 🡪 an auditory feedback technique for programming a GPS device
    • Experiment task 🡪 program a destination location into a GPS device using
      • (a) the auditory feedback method and
      • (b) the conventional method

46 of 74

Knowledge-based Tasks

  • Most experiment tasks are performance-based or skill-based (e.g., inserting an equation, programming a destination location; see previous slide)
  • Sometimes the task is knowledge-based (e.g., “Use an Internet search interface to find the birth date of Albert Einstein.”)
  • In this case, participants become contaminated (in a sense) after the first run of the task, since they have acquired the knowledge
  • Experimentally, this poses problems (beware!)
  • A creative approach is needed (e.g., for the other test condition, slightly change the task; “…of William Shakespeare”)

47 of 74

Procedure

  • The procedure encompasses everything that occurs with participants
  • The procedure includes the experiment task (obviously), but everything else as well…
    • Arriving, welcoming
    • Signing a consent form
    • Instructions given to participants about the experiment task (next slide)
    • Demonstration trials, practice trials
    • Rest breaks
    • Administering of a questionnaire or an interview

48 of 74

Instructions

  • Very important (best to prepare in advance; write out)
  • Often the goal in the experiment task is “to proceed as quickly and accurately as possible but at a pace that is comfortable”
  • Other instructions are fine, as per the goal of the experiment or the nature of the tasks, but…
  • Give the same instructions to all participants
  • If a participant asks for clarification, do not change the instructions in a way that may cause the participant to behave differently from the other participants

49 of 74

Participants

  • Researchers want experimental results to apply to people not actually tested – a population
  • Population examples:
    • Computer-literate adults, teenagers, children, people with certain disabilities, left-handed people, engineers, musicians, etc.
  • For results to apply generally to a population, the participants used in the experiment must be…
    • Members of the desired population
    • Selected at random from the population
  • True random sampling is rarely done (consider the number and location of people in the population examples above)
  • Some form of convenience sampling is typical

50 of 74

How Many Participants?

  • Too few 🡪 experimental effects fail to achieve statistical significance
  • Too many 🡪 statistical significance for effects of no practical value
  • The correct number… (drum roll please)
    • I recommend: Use the same number of participants as used in similar research¹
  • Power Analysis
    • Requires null and alternative hypotheses, the statistical method, the effect size or variability, the significance level (e.g., α = 0.05), and either the sample size or the desired power (e.g., 80%). We will look at this later…

¹ Martin, D. W. (2004). Doing psychology experiments (6th ed.). Belmont, CA: Wadsworth.
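
As a rough preview of what a power analysis computes, the sketch below estimates the participants needed per group for a two-sided comparison of two group means. It uses the normal approximation, n ≈ 2((z₁₋α/₂ + z_power) / d)², where d is a standardized effect size. The between-subjects, two-group setting and the values d = 0.5 and d = 0.8 are assumptions for illustration, not recommendations from the slides. A t-based calculation would give a slightly larger number.

```python
from math import ceil
from statistics import NormalDist

def participants_per_group(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size for a two-sided, two-group comparison."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for alpha
    z_power = NormalDist().inv_cdf(power)           # quantile for desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# Illustrative effect sizes (assumed): medium (0.5) and large (0.8).
for d in (0.5, 0.8):
    print(f"d = {d}: about {participants_per_group(d)} participants per group")
```

The smaller the effect you hope to detect, the more participants are needed, which is one reason within-subjects designs are attractive when recruitment is difficult.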

51 of 74

Questionnaires

  • Questionnaires are used in most HCI experiments
  • Two purposes
    • Collect information about the participants
      • Demographics (gender, age, first language, handedness, visual acuity, etc.)
      • Prior experience with interfaces or interaction techniques related to the research
    • Solicit feedback, comments, impressions, suggestions, etc., about participants’ use of the experimental apparatus
  • Questionnaires, as an adjunct to experimental research, are usually brief

52 of 74

Information Questions

[Figure: example information questions, annotated as ratio-scale data, ordinal-scale data, open-ended, and closed]

53 of 74

Participant Feedback

  • Using NASA Task Load Index (TLX):

  • ISO 9241-9:

54 of 74

Within-subjects, Between-subjects

55 of 74

Within-subjects, Between-subjects

Within-subjects

Between-subjects

56 of 74

Within-subjects, Between-subjects (2)

  • Within-subjects advantages
    • Fewer participants (easier to recruit, schedule, etc.)
    • Less “variation due to participants”
    • No need to balance groups (because there is only one group!)
  • Within-subjects disadvantage
    • Order effects (i.e., interference between conditions)
  • Between-subjects advantage
    • No order effects (i.e., no interference between conditions)
  • Between-subjects disadvantage
    • More participants (harder to recruit, schedule, etc.)
    • More “variation due to participants”
    • Need to balance groups (to ensure they are more or less the same)
  • HCI researchers generally prefer within-subjects designs

57 of 74

Within-subjects, Between-subjects (3)

  • Sometimes…
    • A factor must be assigned within-subjects
      • Examples: Block, session (if learning is the IV)
    • A factor must be assigned between-subjects
      • Examples: gender, handedness
    • There is a choice
      • In this case, the balance tips to within-subjects
  • With two factors, there are three possibilities:
      • both factors within-subjects
      • both factors between-subjects
      • one factor within-subjects + one factor between-subjects (this is a mixed design)

58 of 74

Order Effects, Counterbalancing

  • Only relevant for within-subjects factors
  • The issue: order effects (aka learning effects, practice effects, fatigue effects, sequence effects)
  • Order effects offset by counterbalancing:
    • Participants divided into groups
    • Test conditions are administered in a different order to each group
    • Order of administering test conditions uses a Latin square
    • Distinguishing property of a Latin square 🡪 each condition occurs precisely once in each row and column (next slide)
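
The distinguishing property is easy to check in code. The sketch below builds a simple Latin square by shifting the first row one position per row (one common construction, not necessarily the one used in the slides) and verifies that each condition occurs exactly once in every row and column. The function names are invented for this example.

```python
from string import ascii_uppercase

def latin_square(n):
    """Build an n x n Latin square by cyclically shifting the row A, B, C, ..."""
    conditions = ascii_uppercase[:n]
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]

def is_latin_square(square):
    """True if every condition occurs exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return len(symbols) == n and rows_ok and cols_ok

for row in latin_square(3):
    print(" ".join(row))
print(is_latin_square(latin_square(3)))
```

Each row is the order of conditions given to one group of participants, so an n × n square calls for n groups.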

59 of 74

Latin Squares

[Figure: example Latin squares of order 2 × 2, 3 × 3, 4 × 4, and 5 × 5]

60 of 74

Balanced Latin Square

  • With a balanced Latin square, each condition precedes and follows each other condition an equal number of times
  • Only possible for even orders (an even number of conditions)
  • Top row pattern: A, B, n, C, n – 1, D, n – 2, …
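
The top-row pattern can be turned into a small generator. The sketch below builds the first row as A, B, n, C, n − 1, … and obtains each later row by advancing every condition by one (wrapping around), then counts how often each condition immediately follows each other condition. The function names are invented for this example, and the construction is one standard way to get a balanced square for an even number of conditions.

```python
from collections import Counter
from string import ascii_uppercase

def balanced_latin_square(n):
    """Balanced Latin square for an even n, returned as a list of row strings."""
    if n < 2 or n % 2:
        raise ValueError("this construction needs an even number of conditions")
    first_row = [0]
    low, high = 1, n - 1
    while len(first_row) < n:          # alternate: next lowest, then next highest
        first_row.append(low)
        low += 1
        if len(first_row) < n:
            first_row.append(high)
            high -= 1
    return ["".join(ascii_uppercase[(c + shift) % n] for c in first_row)
            for shift in range(n)]

def follow_counts(square):
    """Count how often each condition is immediately followed by each other one."""
    return Counter(pair for row in square for pair in zip(row, row[1:]))

square = balanced_latin_square(4)
print("\n".join(square))
print(set(follow_counts(square).values()))
```

If the square is balanced, every ordered pair of distinct conditions appears as neighbours the same number of times, so the set of counts contains a single value.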

[Figure: balanced Latin squares, 4 × 4 and 6 × 6]

61 of 74

Example

  • An experimenter seeks to determine if three editing methods (A, B, C) differ in the amount of time to do a common editing task:

  • Conditions are assigned within-subjects
  • Twelve participants are recruited and divided into three groups (4 participants/group)
  • Methods administered using a 3 × 3 Latin Square (2 slides back)
  • Results (next slide)
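
The assignment in this example can be sketched as follows. The three orders assume a cyclic 3 × 3 Latin square (ABC, BCA, CAB), and participants are placed into groups in ID order; both choices are assumptions, and in practice group membership might be randomized. The participant IDs and the helper `assign_groups` are hypothetical.

```python
# Twelve hypothetical participant IDs and an assumed cyclic 3 x 3 Latin square.
participants = [f"P{i:02d}" for i in range(1, 13)]
orders = ["ABC", "BCA", "CAB"]

def assign_groups(participants, orders):
    """Map each participant to (group number, order of editing methods)."""
    group_size, remainder = divmod(len(participants), len(orders))
    if remainder:
        raise ValueError("participants must divide evenly among the groups")
    schedule = {}
    for index, participant in enumerate(participants):
        group = index // group_size
        schedule[participant] = (group + 1, orders[group])
    return schedule

schedule = assign_groups(participants, orders)
for participant, (group, order) in schedule.items():
    print(participant, "group", group, "order", order)
```

Every participant still uses all three methods (within-subjects); only the order differs between groups.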

62 of 74

Results - Data

Group effect is small

∴ Counterbalancing worked!

63 of 74

Results - Chart

64 of 74

Other Techniques

  • Instead of using a Latin square, all orders (n!) can be used (e.g., 3! = 6 orders for three conditions)
  • Conditions can be randomized
  • Randomizing is best if the tasks are brief and repeated often; examples: target size, movement direction, movement distance
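
Both alternatives are short to express in code. The sketch below lists all n! orders for three conditions and, separately, shuffles the trials formed by crossing target size, movement direction, and movement distance. The specific level values are hypothetical, chosen only to make the example runnable.

```python
import random
from itertools import permutations, product

# All n! orders of three conditions (3! = 6).
all_orders = ["".join(order) for order in permutations("ABC")]

# Hypothetical levels for the three example variables.
target_sizes = [8, 16, 32]
directions = ["left", "right"]
distances = [64, 128]

def randomized_trials(sizes, directions, distances, seed=None):
    """Return every size x direction x distance combination in shuffled order."""
    trials = list(product(sizes, directions, distances))
    random.Random(seed).shuffle(trials)
    return trials

trials = randomized_trials(target_sizes, directions, distances, seed=1)
print(all_orders)
print(len(trials), "trials; first three:", trials[:3])
```

Passing a seed makes the shuffled order reproducible, which helps when the trial sequence needs to be documented or replayed.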

65 of 74

Asymmetric Skill Transfer

  • A phenomenon known as asymmetric skill transfer sometimes occurs in within-subjects designs
  • Figures in next three slides demonstrate
  • They are presented in the slides without an explanation (to be discussed in class)

66 of 74

[Figure: the two test conditions, a Letters Only (LO) keyboard and a Letters + Word Prediction keyboard]

67 of 74

68 of 74

69 of 74

70 of 74

Longitudinal Studies

  • Sometimes instead of “balancing out” learning effects, the research seeks to promote and investigate learning
  • If so, a longitudinal study is conducted
  • “Practice” is the IV
  • Participants practice over a prolonged period of time
  • Practice units: blocks, sessions, hours, days, etc.
  • Example on next slide

71 of 74

Longitudinal Study – Results¹

¹ MacKenzie, I. S., Kober, H., Smith, D., Jones, T., & Skepner, E. (2001). LetterWise: Prefix-based disambiguation for mobile text entry. Proceedings of the ACM Symposium on User Interface Software and Technology - UIST 2001, 111-120, New York: ACM.

72 of 74

The New vs. The Old

  • Sometimes a new technique will initially perform poorly in comparison to an established technique
  • A longitudinal study will determine if a crossover point occurs and, if so, after how much practice (see below)
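
One way to reason about a crossover point is to fit a learning curve to each technique and solve for where the curves meet. The sketch below assumes each curve follows a power function of practice, speed = a · x^b, which is an assumption introduced here rather than something stated in the slides. The parameter values are invented for illustration and are not measured data.

```python
def power_law(a, b):
    """Return a model speed(x) = a * x**b, with x in units of practice (x > 0)."""
    return lambda x: a * x ** b

def crossover_point(a_old, b_old, a_new, b_new):
    """Amount of practice at which the two power-law curves intersect."""
    if b_new == b_old:
        raise ValueError("curves with equal exponents never cross")
    return (a_old / a_new) ** (1 / (b_new - b_old))

# Invented parameters: the new technique starts slower but improves faster.
old_technique = power_law(10, 0.1)
new_technique = power_law(6, 0.4)
x = crossover_point(10, 0.1, 6, 0.4)
print(f"crossover after about {x:.1f} units of practice")
print(f"at session 1: old = {old_technique(1):.1f}, new = {new_technique(1):.1f}")
print(f"at session 10: old = {old_technique(10):.1f}, new = {new_technique(10):.1f}")
```

With real data, the curve parameters would come from fitting the observed per-session means, and the crossover estimate is only as trustworthy as those fits.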

73 of 74

Cost-Benefit Trade-offs

  • New, improved techniques sometimes languish
  • Evidently, the benefit in the new technique is insufficient to overcome the cost in learning it (see below)

74 of 74

Attendance & Next Time

  • Hypothesis Testing
