1 of 31

Managing Data with SQL

& Open Source Relational Databases

Michael Shensky

Head of Research Data Services

m.shensky@austin.utexas.edu

October 24, 2025

UT-OSPO

UT Austin Open Source Program Office

Sloan grant number: G-2023-20944

2 of 31

Welcome and Workshop Logistics

  • Workshops will have live automated transcription, but transcription may not be perfect
  • Feel free to ask questions out loud and add comments in the chat
  • Workshop instruction will run from 12pm to 1pm and there will be time for questions until 1:15pm
  • Recorded workshops and lecture slides will be posted online at:

3 of 31

Fall 2025 Data & Donuts Schedule

    • Sept. 5 Overcoming research data management challenges: learning through practice
    • Sept. 26 Analysis and management of textual data using Python
    • Oct. 3 Data discovery strategies
    • Oct. 24 SQL and open source relational databases

4 of 31

AI for Generating Research Code

In this virtual workshop you will learn about different AI tools that can be used by researchers to generate research code, their respective strengths and limitations, and important considerations when using AI in research. Participants will have an opportunity to learn through live demonstrations and hands-on practice with AI tools.

Date/time: Wednesday, November 5, from 12-1 pm

Format: Virtual (Zoom)

Registration: https://utexas.zoom.us/meeting/register/SBhpVsbCQ22weDaOc6H3dg#/registration

5 of 31

Fall 2025 UTL Research Data Services Events

What Celebrate and learn about GIS and geospatial research here at UT

When Wednesday, November 19th, time 12 - 5pm

Where PCL Scholars Lab

More info https://guides.lib.utexas.edu/gis/ut-gis-day

6 of 31

Goals for This Workshop

  • Provide a foundation in the basics of using SQL
  • Understand why SQL is useful and develop familiarity with common use cases
  • Cover the most common relational database options, with a focus on open source
  • Practice using DB Browser for SQLite to manage data using SQL
  • Gain hands on experience using SQL and Python to interact with a relational database
  • Understand benefits of using open source software

7 of 31

What is a Database?

  • A database is an organized collection of related datasets with defined data structures
  • Conforms to a specific database model (relational, document, graph, etc.)
  • Managing information in a database can be more efficient and more secure that storing data in spreadsheets

A grey cylinder symbol is commonly used to represent databases in system diagrams

8 of 31

Why Use SQL and Relational Databases?

    • Can be easier to manage tables in a database than manage multiple spreadsheets
    • Ability to join related data records
    • SQL commands are easy to share and results are easy to reproduce
    • SQL commands can be integrated into scripted processes (Python, PHP, Java, etc.)
    • SQL has many applications - data driven web design, GIS data management, etc.
    • RDBMS instances can store massive amounts of data and allow for granular control over data access

9 of 31

Limits of Working with Spreadsheets & Excel

  • Maximum database size:

281TB

  • Maximum rows in a table: 18,446,744,073,709,551,616
  • Maximum number of columns per table

2000

10 of 31

What is a Relational Database?

  • In a relational database all data is stored in tables
  • Each row contains information about a particular record
  • Each column is dedicated to storing information about a specific attribute
  • Rows in one table can be joined to related rows in another table based on a common key value
  • Data can be normalized to avoid redundant storage of information across multiple tables

tree_id

species

condition

plant_date

1

oak

good

1/1/1998

2

ash

poor

3/28/1993

tree_id

work

date

1

trim

1/5/2020

1

trim

2/4/2021

2

fertilize

3/8/2020

2

trim

5/9/2021

3

remove

2/9/2022

11 of 31

Database Implementations

Client-Server (multi-user architecture)

File database (single user architecture)

12 of 31

What is Structured Query Language?

  • SQL = Structured Query Language
  • SQL is a standardized syntax for interacting with relational databases
  • SQL can be used within database management software or can be used within scripts written in other languages (e.g. Python)
  • Used for:
    • Creating new databases and tables
    • Querying data
    • Updating data
    • Joining data
    • Setting constraints
    • Defining triggers and views
    • Creating indices
    • Managing primary and foreign keys

13 of 31

Relational Database Use Cases

  • Managing business records pertaining to sales, marketing, HR, etc.
  • Building data driven web sites
  • Embedding data within mobile applications
  • Managing research data

14 of 31

Relational Database Research Use Cases

  • Li, Guan-Cheng. 2019. Kickstarter Structured Relational Database, https://doi.org/10.7910/DVN/EOYBXM, Harvard Dataverse, V2
  • Tan, Kynan; Munster, Anna; Mackenzie, Adrian. 2021. Replication Data for: "Images of the arXiv: reconfiguring large scientific image datasets, https://doi.org/10.7910/DVN/EAAG94, Harvard Dataverse, V1
  • Brown, Eva Maxfield; Slaughter, Isaac; Weber, Nicholas. 2025. Code Contribution and Credit in Science. https://doi.org/10.7910/DVN/KPYVI1, Harvard Dataverse, V1, UNF:6:wWLU2O+hmG5I5RHguL5VTg== [fileUNF]
  • Hentati-Sundberg, J., & Olin, A. 2022. Common guillemots in the Baltic Sea studied with video surveillance and object detection: raw data, annotations, model, and model outputs [Data set]. Zenodo. https://doi.org/10.5061/dryad.xsj3tx9hx
  • Cotacallapa Mamani, H. E. 2023. E-commerce Product Dataset from Mercado Libre Perú (1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.8415496
  • Lessmann, O., Jewell, K. S., Ehlig, B., Kunkel, U., Macherius, A., Winter, E., & Wick, A. 2025. Collective Spectral Library (25.6) [Data set]. Federal Institute of Hydrology. https://doi.org/10.5281/zenodo.17182260

15 of 31

Using SQL in Open Source Research Software: QGIS

16 of 31

RDBMS Popularity Rankings

Rankings of Database Popularity: https://db-engines.com/en/ranking

17 of 31

What is Open Source Software?

  • Source code for the software is openly shared - often on platforms like GitHub
  • Typically, open source software is shared under a software license that defines how the software can be used by others
  • Using open source software in research can provide benefits over commercial software
    • Free to use, modifiable, code can be checked for understanding, etc.

18 of 31

RDBMS Options: Open Source vs. Commercial Options

Analysis of Commercial vs Open Source Databases https://db-engines.com/en/ranking_osvsc

19 of 31

RDBMS Services at UT Austin

  • SQLite
  • MySQL* / MariaDB
  • PostgreSQL
  • Microsoft SQL Server*
  • Oracle*

* Offered as a centrally funded service for faculty & staff by UT Enterprise Technology

20 of 31

DB Browser for SQLite

  • Cross platform compatible (Windows, MacOS, Linux)
  • Easy to use, good for practicing database management with SQL
  • Good DB Browser tutorial at http://datacarpentry.org/sql-ecology-lesson/
  • Demonstration of common SQL operations
    • Select, distinct, where, count, sum, avg, order by

Download installation files for DB Browser for SQLite at https://sqlitebrowser.org/dl/

21 of 31

DB Browser Demo

  • Install DB Browser (https://sqlitebrowser.org/)
  • Download tree inventory data from the City of Portland by exporting to CSV
  • Open DB Browser
  • Import CSV as table
  • Demonstration of common SQL table structure operations
    • Change table name, change field name, add field, drop field
  • Demonstration of common SQL operations
    • Select, distinct, where, count, min, max, sum, avg, order by, insert into

22 of 31

Google Colab Python and SQL Example

  • Colab is a free Google service that allows you to create and run Jupyter Notebooks
  • Allows you to write Python code and text notes in compartmentalized cells within a notebook
    • Code cells can be run individually to allow for previewing outputs and troubleshooting issues
  • Notebooks are stored in Google Drive and can access other files in Google Drive

https://research.google.com/colaboratory/faq.html

23 of 31

What is Python?

  • Open source, interpreted programming language
  • Cross platform
  • Extensive use in a variety of fields
  • Large ecosystem of open source packages
  • Can be used for file management, analyzing data, editing data, visualizing data, and more!

24 of 31

Python and SQL Demo

  • Importing packages and modules
  • Creating a new SQLite database
  • Creating new tables
  • Populating tables with records
  • Querying records
  • Joining tables

Google Colab notebook file for this demo can be accessed at:

https://colab.research.google.com/drive/1XBuPpFq7kAyjLwNleJNx-RqCx7r9PQGS?usp=sharing

25 of 31

SQL and Database Resources

26 of 31

UT Open Source Program Office (UT-OSPO)

26

  • The UT Open Source Program Office (UT-OSPO) is a great resource for developers of research software
  • Established in 2023 with funding from the Alfred P. Sloan Foundation (Grant Number G-2023-20944)
  • Director: Angela Newell, Ph.D. (anewell@austin.utexas.edu)
  • The Office is a partnership between TACC, ET, the iSchool, & UT Libraries

27 of 31

UT-OSPO Vision

Engage faculty and students in an open source participation pathway advancing basic use of open source software through contribution, sharing, accepting external contributions, and ultimately developing an ecosystem of related open source projects.

27

Develop a community of researchers who use, contribute to or develop open source code and a program office that embraces and supports researchers at every step of the pathway from using code to developing software

Focus on faculty, post-docs, grad students - helping them interact with the open source communities for the tools that they use.

28 of 31

28

29 of 31

How Can the OSPO Help Researchers?

29

  • Training/Events
  • Funding Identification
  • Best practices guides
  • Consultation Services

30 of 31

How to Connect

31 of 31

Wrap Up

Michael Shensky

Head of Research Data Services

m.shensky@austin.utexas.edu

Questions? Comments?