1 of 25

Data Publishing and Preservation

Michael Shensky

Head of Research Data Services

m.shensky@austin.utexas.edu

Meryl Brodsky

Liaison Librarian

meryl.brodsky@austin.utexas.edu

2 of 25

Welcome

3 of 25

Workshop Logistics

4 of 25

Other Fall 2022 Research Data Services Announcements

UT GIS Day 2022

  • UT GIS Day will take place on Wednesday 11/16 from 9am to 6pm
  • We will have a series of events dedicated to showcasing how geographic information systems & related technologies are being used by the campus community

  • Events will include:
    • Maya interactive lidar event
    • Panel discussion
    • GIS career event
    • Lightning talks
    • Drone demo
    • PCL Map Room tour
    • And more…

Proposals are now being accepted for presentations and maps/posters!

Visit the event website using the QR code or link below to submit your proposal!

Visit the UT GIS Day site at https://guides.lib.utexas.edu/gis for full event details!

5 of 25

Goals for This Workshop

  • Gain an understanding of why data publication is beneficial (and/or required!)
  • Learn about good data management practices in preparation for data publishing and preservation
  • Discuss data publication best practices including: file naming, metadata, and more
  • Build familiarity with strategies for protecting sensitive information
  • Explore different data repositories
  • Log in to the Texas Data Repository and Texas ScholarWorks

6 of 25

Why Preserve & Publish your Research Data?

7 of 25

Why Preserve & Publish your Research Data?

  • Ensures your data will not be lost
  • Required by many funding agencies and some journal publishers
  • Open science can support researchers in other fields or countries (improving equity)
  • Published data counts as a second research product, which can increase your research impact

8 of 25

Data Preservation & Publication within the Research Data Lifecycle

  • Typically comes towards the end of the research data lifecycle
  • Requires that the initial stages of the lifecycle be carried out successfully
  • If done correctly, data preservation & publication helps foster further research

Research data lifecycle: Plan → Find/Create → Process → Analyze → Preserve → Publish → Reuse

9 of 25

Good Data Management Practices

  • Write a data management plan and designate a point person in charge of data decisions
  • Create a data dictionary to define the variables
  • Create a file naming protocol and document it
  • Document workflow and use scripted workflows where possible
  • Keep raw data raw
  • Back up data regularly to different servers or computers
  • Use open standard file formats instead of proprietary file formats when possible
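
One way to support "keep raw data raw" and regular backups is to record a checksum for every raw file and compare it again later, for example after copying to a backup server. The following is a minimal sketch using only the Python standard library; the function names and the idea of a checksum manifest are illustrative assumptions, not a workflow prescribed in this workshop.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to folder) to its checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(original: dict[str, str], current: dict[str, str]) -> list[str]:
    """List files that are missing or whose contents differ from the original."""
    return sorted(
        name for name, checksum in original.items()
        if current.get(name) != checksum
    )
```

Building a manifest when the raw data is first collected, and again on each backup copy, makes it possible to confirm that nothing was altered or lost.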

10 of 25

Good Data Management Practices

11 of 25

Data Dictionary & Variables
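
A data dictionary can be as simple as a table with one row per variable. The sketch below writes one as CSV text with the Python standard library. The column names and the two example variables are hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical variables for illustration; replace with those in your dataset.
DATA_DICTIONARY = [
    {"variable": "sample_id", "description": "Unique sample identifier",
     "type": "string", "units": "", "allowed_values": ""},
    {"variable": "plaque_area", "description": "Measured plaque area",
     "type": "float", "units": "mm^2", "allowed_values": ">= 0"},
]


def data_dictionary_csv(rows: list[dict[str, str]]) -> str:
    """Render the data dictionary as CSV text, one row per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(data_dictionary_csv(DATA_DICTIONARY))
```

Saving the result alongside the data files keeps the variable definitions in an open, non-proprietary format.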

12 of 25

File Name Protocol - Be consistent, document rules

  • Example: AtherRat_012_056_mb_0423_raw.csv
  • AtherRat = Experiment Name
  • 012 = Experiment Number
  • 056 = Sample Number
  • mb = Stain Used, Methylene Blue
  • 0423 = 2-digit coordinates of image: 04 across, 23 down
  • raw = Data Stage
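
The naming rules above can be encoded so that file names are built and checked consistently. This is a minimal sketch, assuming the field widths seen in the single example (three-digit experiment and sample numbers, two-digit coordinates). The function names and the regular expression are illustrative, not part of any official protocol.

```python
import re

# Pattern mirrors the example AtherRat_012_056_mb_0423_raw.csv.
# Field widths and allowed characters are assumptions based on that one example.
NAME_PATTERN = re.compile(
    r"^(?P<experiment>[A-Za-z]+)_(?P<exp_num>\d{3})_(?P<sample>\d{3})_"
    r"(?P<stain>[a-z]+)_(?P<across>\d{2})(?P<down>\d{2})_(?P<stage>[a-z]+)\.csv$"
)


def build_name(experiment, exp_num, sample, stain, across, down, stage):
    """Assemble a file name with zero-padded numeric fields."""
    return (f"{experiment}_{exp_num:03d}_{sample:03d}_{stain}_"
            f"{across:02d}{down:02d}_{stage}.csv")


def parse_name(filename):
    """Return the name's fields as a dict, or None if it breaks the protocol."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None


print(build_name("AtherRat", 12, 56, "mb", 4, 23, "raw"))
print(parse_name("AtherRat_012_056_mb_0423_raw.csv"))
```

Running parse_name over a folder listing is a quick way to spot files that drifted from the documented rules.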

13 of 25

Readme File = Inventory of data files + project

General Information

Sharing/Access Information

Data & File Overview

Methodology Information

Data-Specific Information
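
The five section headings above can be turned into a starter readme file. This is a minimal sketch; the plain-text layout and the "TODO" placeholder text are assumptions for illustration, not an official template.

```python
README_SECTIONS = (
    "General Information",
    "Sharing/Access Information",
    "Data & File Overview",
    "Methodology Information",
    "Data-Specific Information",
)


def readme_template(title: str, sections=README_SECTIONS) -> str:
    """Build a plain-text readme skeleton with one block per section."""
    lines = [f"README for: {title}", ""]
    for section in sections:
        lines += [section.upper(), "-" * len(section), "TODO: fill in", ""]
    return "\n".join(lines)


print(readme_template("Example dataset"))
```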

14 of 25

Good Data Publication Practices

  • Findable = Use metadata for datasets
  • Accessible = Publish in an appropriate repository
  • Interoperable = Do not compress files, and use non-proprietary formats
  • Reusable = Data should be well described, so it can be replicated or combined with other data

15 of 25

Publishing & Protecting Sensitive Data

  • Anonymization
  • Generalization
  • Encryption
  • Aggregation
  • Destruction
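
Three of the strategies above (anonymizing, generalizing, and aggregating) can be sketched in a few lines of Python. The records, field names, salt value, and 10-year age bands below are all hypothetical. Note that a salted hash is strictly pseudonymization rather than full anonymization, so treat this as an illustration under those assumptions rather than a complete de-identification method.

```python
import hashlib
from collections import Counter

# Hypothetical records for illustration only.
RECORDS = [
    {"name": "Ana", "age": 34, "zip": "78701"},
    {"name": "Ben", "age": 37, "zip": "78705"},
    {"name": "Cy", "age": 52, "zip": "78712"},
]


def pseudonymize(value: str, salt: str) -> str:
    """Replace a direct identifier with a salted hash (keep the salt secret)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]


def generalize_age(age: int, width: int = 10) -> str:
    """Report an age band such as '30-39' instead of the exact age."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def protect(record: dict, salt: str) -> dict:
    """Drop the name, coarsen the age, and truncate the ZIP code to 3 digits."""
    return {
        "id": pseudonymize(record["name"], salt),
        "age_band": generalize_age(record["age"]),
        "zip3": record["zip"][:3],
    }


protected = [protect(r, salt="replace-with-secret") for r in RECORDS]

# Aggregation: publish counts per age band rather than individual rows.
counts_by_band = Counter(row["age_band"] for row in protected)
print(dict(counts_by_band))
```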

16 of 25

Texas Data Repository (TDR)

  • TDR is a multi-institutional data repository managed by the Texas Digital Library and built on open source Dataverse software
  • UT Austin has its own “Dataverse” within TDR
  • All UT Austin affiliates can log in with their UT EID
  • If you are leaving UT Austin (graduating, etc.), contact TDR support so that you can maintain access to your account

Try logging in

17 of 25

Why use the Texas Data Repository?

  • Comply with funding requirements
  • Ensure reliable, managed access for data
  • Make your research data citable and increase scholarly impact since all published datasets receive a DOI
  • Collaborate with research team members who can also use TDR
  • Access local data publication support through the UT Libraries
  • Great default choice for a data repository if you don’t have time or interest to search through all possible options

18 of 25

Adding Data to the Texas Data Repository

  • Create a dataverse under which all of your related projects can be grouped
  • Create datasets within that dataverse to store data about a particular project or study
  • Create metadata for your dataverse or dataset
  • Upload individual data files to your dataset
  • Determine data licensing (default is CC0) and any restrictions you want to impose on data access

19 of 25

Managing Data in the Texas Data Repository

  • You can save draft datasets in TDR if you are not quite ready to publish
  • Publish new “versions” if data changes
  • You can ask for a review of your data prior to publication if you would like
  • Try to avoid deaccessioning data if at all possible
    • This will lead to a “tombstone” in TDR

20 of 25

Data Limits in the Texas Data Repository

  • Datasets should be less than 10GB in size
  • Individual data files can be up to 4GB, but files larger than 2GB may experience upload issues
  • For projects with 10GB+ of data, consider breaking the data up into multiple datasets
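
The limits above can be checked before uploading. This is a minimal sketch, assuming 1 GB means 1024^3 bytes (the slide does not say whether decimal or binary units are meant) and using a simple first-fit grouping to break a large project into multiple datasets. The function names are illustrative.

```python
GB = 1024 ** 3          # assumption: binary gigabytes
MAX_FILE = 4 * GB       # individual files can be up to 4GB
WARN_FILE = 2 * GB      # files larger than 2GB may have upload issues
MAX_DATASET = 10 * GB   # datasets should be less than 10GB


def check_file(size_bytes: int) -> str:
    """Classify one file against the per-file limits."""
    if size_bytes > MAX_FILE:
        return "too large"
    if size_bytes > WARN_FILE:
        return "may have upload issues"
    return "ok"


def split_into_datasets(sizes: dict[str, int]) -> list[list[str]]:
    """Group files so each group totals less than MAX_DATASET.

    Uses first-fit over files sorted largest first.
    """
    totals: list[int] = []
    groups: list[list[str]] = []
    for name, size in sorted(sizes.items(), key=lambda item: item[1], reverse=True):
        if size > MAX_FILE:
            raise ValueError(f"{name} exceeds the per-file limit")
        for i, total in enumerate(totals):
            if total + size < MAX_DATASET:
                totals[i] += size
                groups[i].append(name)
                break
        else:
            # No existing group has room, so start a new dataset.
            totals.append(size)
            groups.append([name])
    return groups
```

First-fit grouping does not guarantee the fewest possible datasets; in practice it is usually better to split along meaningful lines (by experiment, year, or data stage) and use a check like this only to confirm each dataset stays under the limit.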

21 of 25

Detailed Info about the Texas Data Repository

Works best in Google Chrome

22 of 25

Texas Data Repository

  • Because TDR was built using Dataverse software, you can use the Dataverse API to interact with TDR for operations like:
    • Searching
    • Downloading
    • Creating dataverses or datasets
    • Examining dataverse content statistics
    • And more!
  • Some API functions require you to generate your own API key
  • Try using the API with Python scripts that utilize the requests module
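
As a starting point, here is a minimal sketch of a search request against the Dataverse Search API using the requests module. It assumes that TDR is reachable at https://dataverse.tdl.org and that an API key, when one is needed, is sent in the X-Dataverse-key header as described in the Dataverse API guide; verify both against current TDR documentation. The function names are illustrative.

```python
from typing import Optional

BASE_URL = "https://dataverse.tdl.org"  # assumed TDR address; verify before use


def search_params(query: str, item_type: str = "dataset", per_page: int = 10) -> dict:
    """Query parameters for the Dataverse Search API (/api/search)."""
    return {"q": query, "type": item_type, "per_page": per_page}


def auth_headers(api_key: Optional[str] = None) -> dict:
    """Header carrying the API key; empty when no key is supplied."""
    return {"X-Dataverse-key": api_key} if api_key else {}


def search_tdr(query: str, api_key: Optional[str] = None) -> list:
    """Return the names of items matching the query."""
    # Imported here so the helpers above work without requests installed.
    import requests  # third-party: pip install requests

    response = requests.get(
        f"{BASE_URL}/api/search",
        params=search_params(query),
        headers=auth_headers(api_key),
        timeout=30,
    )
    response.raise_for_status()
    items = response.json()["data"]["items"]
    return [item["name"] for item in items]
```

Calling search_tdr("lidar") would send one search request and return the names of up to ten matching datasets. Searching published data generally does not require a key; pass api_key only for operations that need it.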

23 of 25

Other Data Repository Options

  • Re3data.org is a website you can use to search for online data repositories
  • This site can be helpful when you are looking for data to download or looking for a place to publish your data
  • Practice using re3data.org to find a data repository that might be relevant for your research

24 of 25

Texas ScholarWorks (TSW)

  • TSW is a repository for depositing scholarly publications associated with your data
  • Posters, conference presentations, articles, theses, dissertations, and more
  • Over 100,000 publications already deposited
  • All publications receive a DOI and Handle

Try logging in

25 of 25

Wrap Up

Questions? Comments?

Michael Shensky

Head of Research Data Services

m.shensky@austin.utexas.edu

Meryl Brodsky

Liaison Librarian

meryl.brodsky@austin.utexas.edu

Upcoming Events

  • UT GIS Day will take place on Wednesday 11/16 from 9am to 6pm