1 of 52

Get good data out of bad spreadsheets

With tidy tools

Duncan Garmonsway

@nacnudus

2 of 52

3 of 52

4 of 52

5 of 52

Uh oh

6 of 52

7 of 52

8 of 52

9 of 52

10 of 52

11 of 52

12 of 52

13 of 52

14 of 52

15 of 52

16 of 52

17 of 52

18 of 52

19 of 52

20 of 52

21 of 52

22 of 52

filter(row == 3, col == 1)$character

23 of 52

readr::melt_csv()

googlesheets4::sheets_cells()

tidyxl::xlsx_cells()

24 of 52

25 of 52

26 of 52

27 of 52

28 of 52

29 of 52

30 of 52

31 of 52

32 of 52

33 of 52

34 of 52

35 of 52

36 of 52

37 of 52

38 of 52

39 of 52

40 of 52

41 of 52

  • One-row-per-cell format:

42 of 52

  • One-row-per-cell format:

tidyxl

readr

googlesheets4

43 of 52

  • One-row-per-cell format:

tidyxl

readr

googlesheets4

  • behead() each header in turn

44 of 52

  • One-row-per-cell format:

tidyxl

readr

googlesheets4

  • behead() each header in turn

unpivotr

45 of 52

46 of 52

Via David Robinson and Jenny Bryan https://rpubs.com/dgrtwo/tidying-enron

47 of 52

48 of 52

((IF((103-B$89)=103,0,(103-B$89)))+(IF((200-B$95)=200,0,(200-B$95)))+(IF((196-B$98)=196,0,(196-B$98)))+(IF((200-B$101)=200,0,(200-B$101)))+(IF((70-B$104)=70,0,(MIN(40,(70-B$104))))+(IF((78-B$109)=78,0,(MIN(50,(78-B$109)))))+(IF((103-B$114)=103,0,(MIN(66,(103-B$114)))))+(IF((195-B$119-B$124-B$129-B$134-B$139)=195,0,(MIN(70,(195-B$119-B$124-B$129-B$134-B$139)))))+(IF((64-B$144)=64,0,(MIN(50,(64-B$144)))))+(IF((48-B$149)=48,0,(MIN(20,(48-B$149)))))+(IF((44-B$154)=44,0,(MIN(20,(44-B$154)))))+(IF((130-B$159)=130,0,(MIN(20,(130-B$159)))))))

49 of 52

50 of 52

51 of 52

52 of 52

Get good data out of bad spreadsheets

Duncan Garmonsway

@nacnudus