Budgeting for Data Management and Open Science
Fernando Rios
Research Data Management Specialist, UA Libraries
Tina Lee
User Engagement Officer, CyVerse
Budgeting assumptions
Data Management activities that need to be budgeted for:
Active Research
Sharing outputs openly and reproducibly
Mostly need to budget someone’s time to take ownership of implementing data operating procedures, curation, documentation, and sharing
Under-budgeting of time...
80% of time is spent finding and cleaning data
Ballpark amounts
European Commission High Level Expert Group on the European Open Science Cloud “Realising the European Open Science Cloud, First report and recommendations of the Commission High Level Expert Group on the European Open Science Cloud”. 2016.
Saleem Arriago. “Facilitating and Ensuring Data Stewardship: Data Challenges of NOAA’s Climate Observation Division”. 2016.
Costs at different points in the research lifecycle
Credit: Digital Curation Centre
Time/Personnel, Data licensing costs
Time/Personnel (experience vs cost), Storage
Compute
Documentation
Long-term storage
Data curation & archiving costs
Active-phase research
Personnel budgets should...
...consider:
...include time and experience for:
Hardware and software budgets should include:
Sample Storage Costs
Assume 100 TB storage, 10 TB egress per year.
Think about what the storage time frame and purpose are
Storage Type | Storage Cost | Egress (10 TB/yr) | Total (1 yr, 100 TB) | Note |
Institutional research computing | $3900 - $4500 | $0 | $3900 - $4500 | Meant for active storage |
Amazon S3 | ~$1500 | ~$900 | $2500 | Glacier Deep Archive (infrequent access) |
LTO-8 Tape | ~$1500 + $7500 | $0 | $9000 | 10 x 12 TB tapes + drive. Long-term storage. Does not include labor or storage |
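The totals in the table are storage cost plus egress cost for the first year. A minimal Python sketch of that arithmetic follows. The figures are copied from the table; the dictionary layout and function names are illustrative assumptions, not part of the original slides. Note that the rounded S3 components sum to $2400, slightly below the $2500 listed in the table.

```python
# First-year figures copied from the sample storage table
# (USD, 100 TB stored, 10 TB egress per year).
# Each entry is (low storage cost, high storage cost, egress cost).
# The names and layout here are illustrative assumptions.
SAMPLE_OPTIONS = {
    "Institutional research computing": (3900, 4500, 0),
    "Amazon S3 (Glacier Deep Archive)": (1500, 1500, 900),
    "LTO-8 tape (10 tapes + drive)": (1500 + 7500, 1500 + 7500, 0),
}


def first_year_total(low, high, egress):
    """Return the (low, high) first-year total: storage plus egress."""
    return (low + egress, high + egress)


def summarize(options):
    """Map each storage option to its (low, high) first-year total."""
    return {name: first_year_total(*costs) for name, costs in options.items()}


if __name__ == "__main__":
    for name, (low, high) in summarize(SAMPLE_OPTIONS).items():
        if low == high:
            print(f"{name}: ${low}")
        else:
            print(f"{name}: ${low} - ${high}")
```

Swapping in quotes from your own institution or provider, and your own expected egress, gives a quick side-by-side comparison. Labor and physical tape storage are not included, as the table notes.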
Wrapping up research
What needs to happen?
Data Archiving Costs
Easier to budget for than other costs
Public vs non-public
Non-public: See previous slide on storage costs
Data Curation Costs
Highly variable.
Example 1: Inexperienced student in the discipline who requires training on data sharing and publishing
TERRA-REF project job posting for student at UA
A lot of the work is documentation
Data Curation Costs
Example 2: Experienced researcher in the ecological sciences with experience in open science
http://brunalab.org/blog/2014/09/04/the-opportunity-cost-of-my-openscience-was-35-hours-690/
Item | Cost (time or USD) | Note |
Double checking the main dataset and doing some reformatting to prepare it for submission | 5h | Had already spent a fair amount of time reformatting it to best-practices |
Creating missing supplementary datafile and metadata | 3h | Missing datafile may not have been needed after checking again for errors |
Submission to Dryad | 0.75h + $90 | |
Prepare a geographic map of the locations in Dryad submission | 1h | Realized not everyone is familiar with locations in dataset |
Submission of map to Figshare | 0.25h | |
Revising, cleaning up code, uploading it to GitHub | 25h | Much work needed to clean it up |
Archiving code & making it citable with a DOI in Zenodo | 0.5h | |
Editing bibliography in paper to follow best practices for data and code citation | 0.5h | |
Open access costs | $600 | Article Processing Charges (APCs) |
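To turn an itemized list like this into a budget line, add up the hours and the direct fees separately, then price the hours. The Python sketch below does this using the rows above. The item labels are shortened, and the hourly rate is a purely hypothetical placeholder, not a figure from the source. As listed, the rows sum to 36 hours and $690 in fees.

```python
# (item, hours, direct fees in USD) transcribed from the itemized table above.
# Item labels are shortened for readability.
ITEMS = [
    ("Check and reformat main dataset", 5.0, 0),
    ("Create supplementary datafile and metadata", 3.0, 0),
    ("Submission to Dryad", 0.75, 90),
    ("Prepare geographic map", 1.0, 0),
    ("Submission of map to Figshare", 0.25, 0),
    ("Revise and clean up code, upload to GitHub", 25.0, 0),
    ("Archive code with DOI in Zenodo", 0.5, 0),
    ("Edit bibliography for data and code citation", 0.5, 0),
    ("Open access costs (APCs)", 0.0, 600),
]


def curation_totals(items):
    """Return (total hours, total direct fees in USD)."""
    hours = sum(h for _, h, _ in items)
    fees = sum(f for _, _, f in items)
    return hours, fees


def budget_estimate(items, hourly_rate_usd):
    """Price the hours at a given rate and add the direct fees."""
    hours, fees = curation_totals(items)
    return hours * hourly_rate_usd + fees


if __name__ == "__main__":
    hours, fees = curation_totals(ITEMS)
    print(f"Total time: {hours} h, direct fees: ${fees}")
    # ASSUMPTION: 30 USD/h is a made-up rate for illustration only;
    # substitute the actual salary or wage rate for your project.
    print(f"Estimated cost: ${budget_estimate(ITEMS, hourly_rate_usd=30.0)}")
```

Most of the time in this example goes to cleaning up code, so the rate chosen for that person's time dominates the estimate.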
How do I find what I need to budget for in my project?