PSY6422
Data Management & Visualisation
Tom Stafford
t.stafford@sheffield.ac.uk
1: OVERVIEW
Module Aims
Introduce modern data analysis tools.
Demonstrate best practice
Teach skills which are
- robust
- scalable
- transferable
Motivation
Point-and-click analysis
- introduces errors
- lets you forget the process
- is annoying to repeat
- makes what you have done invisible (including errors)
Excel issues, #1
Ziemann, M., Eren, Y., & El-Osta, A. (2016). Gene name errors are widespread in the scientific literature. Genome Biology, 17(1), 177. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1044-7
Excel issues, #2
Herndon, T., Ash, M., & Pollin, R. (2014). Does high public debt consistently stifle economic growth? A critique of Reinhart and Rogoff. Cambridge Journal of Economics, 38(2), 257-279.
Reporting errors are widespread
Nuijten, M. B., Hartgerink, C. H., van Assen, M. A., Epskamp, S., & Wicherts, J. M. (2016). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods, 48(4), 1205-1226. https://link.springer.com/article/10.3758/s13428-015-0664-2
- Half of the papers examined contained at least one inconsistent p-value; "One in eight papers contained a grossly inconsistent p-value that may have affected the statistical conclusion"
- Dare you check your last report? http://statcheck.io/
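The idea behind statcheck is that a reported result carries its own check: the p-value can be recomputed from the reported test statistic (and its degrees of freedom) and compared with the p-value printed in the paper. A minimal Python sketch of that idea follows, assuming only z statistics (t, F and chi-square tests would need their own distributions) and using hypothetical helper names; it is not statcheck's code, and the tool itself is far more complete. "Grossly inconsistent" here follows the paper's sense: the mismatch also flips the significance decision at alpha = .05.

```python
import math


def two_tailed_p_from_z(z):
    """Two-tailed p-value for a z statistic (standard normal distribution)."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-tailed tail area
    return math.erfc(abs(z) / math.sqrt(2))


def is_consistent(z, reported_p, decimals=2):
    """Hypothetical helper: does the reported p match the recomputed p once both
    are rounded to the number of decimals the report uses?
    Simplification: ignores the fact that z itself was rounded when reported."""
    recomputed = two_tailed_p_from_z(z)
    return round(recomputed, decimals) == round(reported_p, decimals)


def is_gross_error(z, reported_p, alpha=0.05, decimals=2):
    """Hypothetical helper: an inconsistency that also flips the significance decision."""
    recomputed = two_tailed_p_from_z(z)
    flips_decision = (recomputed < alpha) != (reported_p < alpha)
    return (not is_consistent(z, reported_p, decimals)) and flips_decision


if __name__ == "__main__":
    # Made-up example reports, not taken from any paper
    for z, p in [(2.10, 0.04), (2.10, 0.02), (1.50, 0.03)]:
        print(f"z = {z:.2f}, reported p = {p:.2f}: "
              f"recomputed p = {two_tailed_p_from_z(z):.4f}, "
              f"consistent = {is_consistent(z, p)}, gross = {is_gross_error(z, p)}")
```

In the made-up examples, "z = 2.10, p = .02" is inconsistent but both values are below .05, so the conclusion survives; "z = 1.50, p = .03" is a gross error, because the recomputed p (about .13) is not significant.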
Every analysis you do, you will have to do again
...and again
...and again
The old way isn’t working
Encourages errors
Opaque -> lack of credibility
High effort to repeat
High effort to adapt: inflexible
Doesn’t scale
Hard to share
The new way: Open Science
Open Access Publishing
Open (Source) Tools
Open Data
Open (and reproducible) Analysis
The new way: Open Science
"What is Open Science? It is endeavoring to preserve the rights of others to reach independent conclusions about your data and work."
Jeff Rouder https://twitter.com/JeffRouder/status/938147822431502337
Motivation 2
The new way is
...increasingly expected
...scalable
...transferable
...more reliable
Motivation 2
All science is now data science
“I consider computational literacy (including coding) to be essential for any student today (regardless of whether they are from sciences or the humanities)”
Russ Poldrack, Stanford University
Motivation!
These are interesting times in scholarship
Learning just a few basics prepares you for a rapidly expanding world of tools and techniques
Module logistics
Stafford, T. (2008). A fire to be lighted: A case-study in enquiry-based learning. Practice and Evidence of Scholarship of Teaching and Learning in Higher Education, 3(1), 20-42.
A graduate seminar
Less structure, BUT more autonomy
Customised learning, BUT it requires more effort
Advanced topics, BUT you come face to face with uncertainty and changing practices
There is room to include the topics *you* want to learn
* * You have my full attention * *
No special skills are required
Anyone can do this
All the mistakes you are about to make, I have made myself, many times
You can only learn by making these mistakes.
Assessment
Learning Portfolio
Initial self-assessment, including target setting | 20% |
Wrap-up self-assessment | 30% |
Code project: data visualisation | 50% |
Self-Assessment
Deadline: 2019-02-15 17:00
Self-ratings: competencies
I don’t understand what this is
I am ready to try this
I can probably do this
This is trivial for me
Self-ratings: jargon
I don’t know this
I could guess
I know what this is
I am very familiar
Self-ratings: reading
I haven’t read this
I read this
I read this and made notes
I read this, made notes, followed up refs
Target setting
Process
AT START OF MODULE
Initial self-assessment
Identify target items for learning
ONGOING
Add items (reading, jargon) as you come across them
Wrap-up self-assessment
Deadline: 2019-05-20 17:00
AT END OF COURSE
Identify achieved targets
Identify future targets
Code project
Deadline: 2019-05-20 17:00
Topics
| Project organisation |
Python | Coding Fundamentals |
Python | Data Management |
Python | Coding principles |
Python | Making graphs |
R | Introduction to R / RStudio |
R | Regression models / visualising regression models |
| The terminal (command line interface) |
After Easter
Personalised tuition & Advanced topics
Version control and git |
Jupyter notebooks |
Shiny apps |
RMarkdown |
….???? |
Deadlines
2019-02-15 @ 5pm | Initial self-assessment |
2019-05-20 @ 5pm | Wrap-up self-assessment |
2019-05-20 @ 5pm | Code project |
These are on the timetable
Final task
Is Spyder installed on your PC?
Search under programmes
If not, install via ‘software centre’
Questions, etc
t.stafford@sheffield.ac.uk