A | B | C | D | E | F | G | H | I | J | K | |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | Friendly URL of this spreadsheet: | ||||||||||
2 | https://tinyurl.com/rxivist-further-analysis | ||||||||||
3 | |||||||||||
4 | Rxivist preprint: | ||||||||||
5 | https://doi.org/10.1101/515643 | ||||||||||
6 | Data on Zenodo: | ||||||||||
7 | https://doi.org/10.5281/zenodo.2465689 | ||||||||||
8 | |||||||||||
9 | Question: can the relation between IF and preprint downloads (as shown in the Rxivist preprint) be correlated to downloads pre- and post-publication of the respective journal articles? | ||||||||||
10 | (and possibly to their OA-availability?) | ||||||||||
11 | |||||||||||
12 | Comparable figure for ArXiv (one-dimensional, no IF component) | ||||||||||
13 | https://twitter.com/catmacOA/status/1041699581233455104 | ||||||||||
14 | NB This is not the route explored here, because of the limitations of using absolute download numbers pre- and post-publication. | ||||||||||
15 | |||||||||||
16 | According to preprint: 37,563 articles, of which 15,797 published | ||||||||||
17 | 30 most-frequent journals further analyzed (IF, interval to publication) | ||||||||||
18 | According to publications_per_journal: 7576 articles in these journals | ||||||||||
19 | According to publication_time_journal: 7653 articles in 30 most-frequent journals (interval times listed for these) | ||||||||||
20 | According to downloads_by_months: 18823 articles with monthly download stats (for 1st 12 months) = all articles except 2018 (n=18740) | ||||||||||
21 | Of articles with monthly stats (2013-2017), 12040 have been published, 6026 in 30 most-frequent journals | ||||||||||
22 | |||||||||||
23 | Available data per article in this set (n=6026): | ||||||||||
24 | ID | ||||||||||
25 | DOI: NO | ||||||||||
26 | Interval to journal publication (days) | ||||||||||
27 | Downloads per month for first 12 months | ||||||||||
28 | IF of journal (IF for 2017) | ||||||||||
29 | OA availability of journal article <- retrieve from UPW with DOI: NO | ||||||||||
30 | OA policy of 30 most-frequent journals | ||||||||||
31 | |||||||||||
32 | Data processing (data_aggregated) | ||||||||||
33 | - copy download_publication_status | ||||||||||
34 | - VLOOKUP in publication_time_journal for journal name and interval | ||||||||||
35 | - interval_month = IFERROR(QUOTIENT(interval,(365/12)),"-") NB This gives the number of full months prior to publication. | ||||||||||
36 | - VLOOKUP in downloads_by_months for monthly download stats (via helper columns with 2 values concatenated) | ||||||||||
37 | - calculate % prepub and postpub downloads in 1st year | ||||||||||
38 | |||||||||||
39 | Data processing part 2 (data_aggregated_2) | ||||||||||
40 | - copy download_publication_status | ||||||||||
41 | - calculate number of articles from pivot data_aggregated | ||||||||||
42 | - calculate median_interval, median_downloads_total, median_downloads_year1 and median_prepub directly from data_aggregated, copy into data_aggregated_2 | ||||||||||
43 | - calculate median_interval_months by dividing median_interval by 365/12 | ||||||||||
44 | |||||||||||
45 | Data processing part 3 (Chart 8 boxplots - data) | ||||||||||
46 | - from data_aggregated, calculate 5 number summary (median, Q1, Q3, Q1-1.5*IQR, Q3+1.5*IQR) for % downloads post-publication per journal | ||||||||||
47 | - using Google Sheets formulas MEDIAN, QUARTILE | ||||||||||
48 | - Template for making box plots taken from: | ||||||||||
49 | https://creativemaths.net/blog/teaching-resources-item/google-sheets-compare-box-plots/ | ||||||||||
50 | |||||||||||
51 | |||||||||||
52 | List of charts in this spreadsheet: | ||||||||||
53 | Chart 1 | IF vs. downloads | |||||||||
54 | Chart 2 | IF vs. downloads year 1 | |||||||||
55 | Chart 3 | IF vs. interval until publication | |||||||||
56 | Chart 4 | interval until publication vs. % downloads post-publication | |||||||||
57 | Chart 5 | IF vs. % downloads post-publication | |||||||||
58 | Chart 6 | downloads year 1 vs. % downloads post-publication | |||||||||
59 | Chart 7 | scatterplots per journal (downloads year 1 vs. % downloads post-publication) | |||||||||
60 | Chart 8 | boxplots (% downloads post-publication) | |||||||||
61 | |||||||||||
62 | Excel sheet with underlying data: | ||||||||||
63 | https://www.dropbox.com/s/69lxaab17u2efld/Rxivist%20analysis.xlsx?dl=0 | ||||||||||
64 | |||||||||||
65 | Bianca Kramer | ||||||||||
66 | @MsPhelps | ||||||||||
67 | created 190123 | ||||||||||
68 | last modified 190127 | ||||||||||
69 | |||||||||||
70 | original data shared by the authors of the Rxivist preprint under CC-BY-NC | ||||||||||
71 | |||||||||||
72 | |||||||||||
73 | |||||||||||
74 | |||||||||||
75 |
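
The per-article steps listed under "Data processing" and the per-journal summary under "Data processing part 3" could be sketched in Python roughly as follows. This is only an illustration of the spreadsheet formulas, not the workflow actually used: the function names are hypothetical, the rule for splitting the 12 months into pre- and post-publication is an assumption (the notes only say that interval_month is the number of full months prior to publication), and `method="inclusive"` is assumed to correspond to what the Google Sheets QUARTILE formula computes.

```python
import statistics

MONTHS_PER_YEAR = 12
DAYS_PER_YEAR = 365


def interval_month(interval_days):
    """Full months between preprint posting and journal publication.

    Mirrors IFERROR(QUOTIENT(interval,(365/12)),"-"): truncating division,
    with None standing in for the "-" placeholder when no interval is known.
    """
    if interval_days is None:
        return None
    return int(interval_days * MONTHS_PER_YEAR / DAYS_PER_YEAR)


def download_shares(monthly_downloads, months_before_publication):
    """Return (% pre-publication, % post-publication) of first-year downloads.

    Assumption: the first `months_before_publication` months count as
    pre-publication; every later month, including the one in which the
    journal article appeared, counts as post-publication.
    """
    total = sum(monthly_downloads)
    if total == 0 or months_before_publication is None:
        return None
    cutoff = max(0, min(len(monthly_downloads), months_before_publication))
    prepub = sum(monthly_downloads[:cutoff])
    return 100 * prepub / total, 100 * (total - prepub) / total


def box_summary(values):
    """Median, Q1, Q3 and the 1.5*IQR fences used for the box plots."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "lower_fence": q1 - 1.5 * iqr,
        "upper_fence": q3 + 1.5 * iqr,
    }
```

For example, an article published 100 days after posting has `interval_month(100) == 3`, so with a flat 10 downloads per month its first three months (25% of year-1 downloads) would count as pre-publication. Applying `box_summary` to the list of post-publication percentages of one journal gives the values plotted per journal in Chart 8.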