1 of 38

PSY6422

Data Management & Visualisation

Tom Stafford

t.stafford@sheffield.ac.uk

2 of 38

1: OVERVIEW

3 of 38

Module Aims

Introduce modern data analysis tools.

Demonstrate best practice

Teach skills which are

...robust

...scalable

...transferable

4 of 38

Motivation

5 of 38

Point-and-click analysis

- introduce errors

- forget process

- annoying to repeat

- what you have done becomes invisible (including errors)

6 of 38

Excel issues, #1

Ziemann, M., Eren, Y., & El-Osta, A. (2016). Gene name errors are widespread in the scientific literature. Genome biology, 17(1), 177. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1044-7
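
A minimal Python sketch of the kind of check this finding motivates: scanning a list of gene symbols for entries that look like spreadsheet-converted dates (for example `SEPT2` becoming `2-Sep`). The function name `find_date_like`, the sample list and the regular expression are illustrative assumptions, not the method used by Ziemann et al.

```python
import re

# Assumed pattern: day-month strings such as "2-Sep" or "1-Mar", the form
# a spreadsheet may give to gene symbols like SEPT2 or MARCH1.
DATE_LIKE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$",
    re.IGNORECASE,
)


def find_date_like(symbols):
    """Return the entries that look like dates rather than gene symbols."""
    return [s for s in symbols if DATE_LIKE.match(s.strip())]


# Hypothetical column of gene names, two of them already mangled.
genes = ["TP53", "2-Sep", "BRCA1", "1-Mar", "SEPT2"]
suspicious = find_date_like(genes)
print(suspicious)
```

Because the check lives in a script, it can be re-run on every new file rather than relying on someone spotting the problem by eye.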

7 of 38

Excel issues, #2

Herndon, T., Ash, M., & Pollin, R. (2014). Does high public debt consistently stifle economic growth? A critique of Reinhart and Rogoff. Cambridge journal of economics, 38(2), 257-279.

8 of 38

Reporting errors are widespread

Nuijten, M. B., Hartgerink, C. H., van Assen, M. A., Epskamp, S., & Wicherts, J. M. (2016). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior research methods, 48(4), 1205-1226. https://link.springer.com/article/10.3758/s13428-015-0664-2

- “The prevalence of statistical reporting errors in psychology (1985–2013)”

- Half of the papers examined contained at least one reporting error: "One in eight papers contained a grossly inconsistent p-value that may have affected the statistical conclusion"

- Dare you check your last report? http://statcheck.io/
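
The idea behind this kind of consistency check can be sketched in a few lines of Python: recompute the p-value from the reported test statistic and see whether it matches the reported p-value after rounding. This is only an illustration for a z statistic using the standard library; it is not statcheck's own code, and the function names and the two-decimal rounding rule are assumptions.

```python
import math


def two_tailed_p_from_z(z):
    """Two-tailed p-value for a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))


def is_consistent(z, reported_p, decimals=2):
    """True if the recomputed p-value rounds to the reported p-value."""
    return round(two_tailed_p_from_z(z), decimals) == round(reported_p, decimals)


# Hypothetical reported results, written as (z, reported p).
print(is_consistent(1.96, 0.05))  # recomputed p is close to 0.05
print(is_consistent(1.50, 0.04))  # recomputed p is about 0.13
```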

9 of 38

Every analysis you do, you will have to do again

...and again

...and again
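
A minimal sketch of why scripted analysis makes repetition cheap: once the analysis is a function, re-running it on corrected or extended data is a single call. The function name `summarise` and both data lists are hypothetical.

```python
import statistics


def summarise(scores):
    """Return the same summary for any list of numeric scores."""
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "sd": statistics.stdev(scores),
    }


# Hypothetical data: the second list stands in for an extended dataset.
first_run = summarise([4, 5, 6, 7, 8])
second_run = summarise([4, 5, 6, 7, 8, 9])
print(first_run)
print(second_run)
```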

10 of 38

The old way isn’t working

Encourages errors

Opaque -> lack of credibility

High effort to repeat

High effort to adapt : inflexible

Doesn’t scale

Hard to share

11 of 38

The new way….Open Science

Open Access Publishing

Open (Source) Tools

Open Data

Open ...and reproducible...Analysis

12 of 38

The new way….Open Science

"What is Open Science? It is endeavoring to preserve the rights of others to reach independent conclusions about your data and work."

Jeff Rouder https://twitter.com/JeffRouder/status/938147822431502337

13 of 38

Motivation 2

The new way is

...increasingly expected

...scalable

...transferable

...more reliable

14 of 38

Motivation 2

All science is now data science

“I consider computational literacy (including coding) to be essential for any student today (regardless of whether they are from sciences or the humanities)”

Russ Poldrack, Stanford University

15 of 38

Motivation!

These are interesting times in scholarship

Learning just a few basics prepares you for a rapidly expanding world of tools and techniques

16 of 38

Module logistics

17 of 38

Stafford, T. (2008), A fire to be lighted: a case-study in enquiry-based learning, Practice and Evidence of Scholarship of Teaching and Learning in Higher Education, Vol. 3, No. 1, April 2008, pp.20-42.

18 of 38

A graduate seminar

Lack of structure... BUT more autonomy

Customised learning... BUT requires more effort

Advanced topics... BUT you come face to face with uncertainty and changing practices

There is room to include the topics *you* want to learn

* * You have my full attention * *

19 of 38

There are no special skills

Anyone can do this

All the mistakes you are about to make, I have made myself, many times

You can only learn by making these mistakes.

20 of 38

Assessment

21 of 38

Assessment

Initial self-assessment, including target setting: 20% (Learning Portfolio)

Wrap-up self-assessment: 30% (Learning Portfolio)

Code project (data visualisation): 50%

22 of 38

Self-Assessment

Deadline: 2019-02-15 17:00

23 of 38

Self-ratings: competencies

I don’t understand what this is

I am ready to try this

I can probably do this

This is trivial for me

24 of 38

Self-ratings: jargon

I don’t know this

I could guess

I know what this is

I am very familiar

25 of 38

Self-ratings: reading

I haven’t read this

I read this

I read this and made notes

I read this, made notes, followed up refs

26 of 38

Target setting

27 of 38

Process

AT START OF MODULE

Initial self-assessment

Identify target items for learning

ONGOING

Add items (reading, jargon) as you come across them

28 of 38

Wrap-up Self-Assessment

Deadline: 2019-05-20 17:00

29 of 38

AT END OF COURSE

Identify achieved targets

Identify future targets

30 of 38

Code project

Deadline: 2019-05-20 17:00

31 of 38

Topics

32 of 38

Topics

Project organisation

Python: Coding fundamentals

Python: Data management

Python: Coding principles

Python: Making graphs

R: Introduction to R / RStudio

R: Regression models / visualising regression models

The terminal (command line interface)

33 of 38

After Easter

Personalised tuition & Advanced topics

Version control and git

Jupyter notebooks

Shiny apps

RMarkdown

….????

34 of 38

Deadlines

35 of 38

Deadlines

2019-02-15 @ 5pm

Initial self-assessment

2019-05-20 @ 5pm

Wrap-up self-assessment

2019-05-20 @ 5pm

Code project

These are on the timetable

36 of 38

Final task

37 of 38

Is Spyder installed on your PC?

Search under programmes

If not, install via ‘software centre’

38 of 38

Questions, etc

t.stafford@sheffield.ac.uk