1 of 21

Managing Data with SQL

& Relational Databases

Michael Shensky

Head of Research Data Services

m.shensky@austin.utexas.edu

2 of 21

Welcome

3 of 21

Workshop Logistics

  • All workshops this fall will be conducted as Zoom webinars (workshop on 9/16 is also in-person)
  • You can register for upcoming workshops by visiting https://guides.lib.utexas.edu/data-and-donuts
  • Feel free to ask questions and add comments in the chat
  • Workshop instruction will run from 12pm to 1pm and there will be time for questions until 1:15pm
  • Workshops this fall will be recorded and links to view the recordings will be posted at https://guides.lib.utexas.edu/data-and-donuts/information-about-past-workshops

4 of 21

Other Fall 2022 Research Data Services Announcements

UTL Map & Geospatial Collections Explorer Fellowship

  • This Fellowship will be awarded for a second consecutive year in 2022
  • The Fellowship is designed to incentivize and reward use of UT Libraries geospatial collections
  • The 2022 call for proposals is now posted and proposals are now due by 10/10/2022 at 11:59pm
  • Currently enrolled UT students at all levels, faculty, & post-docs are eligible for the $1500 fellowship award
  • Winners will be announced at an event on UT GIS Day (11/16/2022)
  • Visit https://guides.lib.utexas.edu/gis/utl-map-and-geospatial-collection-explorer-fellowship for the latest updates

5 of 21

Goals for This Workshop

  • Provide a foundation in the basics of using SQL
  • Understand why SQL is useful and develop familiarity with common use cases
  • Cover the most common relational database options
  • Practice using DB Browser for SQLite to manage data using SQL
  • Gain hands on experience using SQL and Python to interact with a relational database

6 of 21

What is a Database?

  • A database is an organized collection of related datasets with defined data structures
  • Conforms to a specific database model (relational, document, graph, etc.)
  • In relational databases, stored information can be queried using SQL

A grey cylinder symbol is commonly used to represent databases in system diagrams

7 of 21

What is a Relational Database?

  • In a relational database all data is stored in tables
  • Each row contains information about a particular record
  • Each column is dedicated to storing information about a specific attribute
  • Rows in one table can be joined to related rows in another table based on a common key value
  • Data can be normalized so that no redundant information needs to be stored

tree_id

species

condition

plant_date

1

oak

good

1/1/1998

2

ash

poor

3/28/1993

tree_id

work

date

1

trim

1/5/2020

1

trim

2/4/2021

2

fertilize

3/8/2020

2

trim

5/9/2021

3

remove

2/9/2022

8 of 21

What is SQL?

  • SQL = Structured Query Language
  • SQL is a standardized syntax for interacting with relational databases
  • SQL can be used within database management software or can be used within scripts written in other languages (e.g. Python)
  • Used for:
    • Creating new databases and tables
    • Querying data
    • Updating data
    • Joining data
    • Setting constraints
    • Defining triggers and views
    • Creating indices
    • Managing primary and foreign keys

9 of 21

Database Implementations

File database (single user architecture)

  • Functionality for accessing and modifying a database is built directly into an application/script and the database is stored locally

Application/script >>> Locally stored file database

Client-Server (multi-user architecture)

  • The application/script sends a request to a relational database management system (RDBMS) software on a server that controls the database

Application/script >>>

Database management system on a server >>>

Database within DBMS

10 of 21

Why Use Databases & SQL for Data Management

    • Easier to manage multiple tables in a database than manage multiple spreadsheets
    • Ability to join related data records
    • SQL commands are easy to share and results are easy to reproduce
    • SQL commands can be integrated into scripted processes (Python, PHP, Java, etc.)
    • SQL has many applications - data driven web design, GIS data management, etc.
    • RDBMS instances can store massive amounts of data and allow for granular control over data access

11 of 21

Relational Database Use Case Examples

  • Managing experimental research data
  • Managing business records pertaining to sales, marketing, HR, etc.
  • Building data driven web sites
  • Embedding data within mobile applications

12 of 21

RDBMS Options

  • SQLite
  • MySQL* / MariaDB
  • PostgreSQL
  • Microsoft SQL Server*
  • Oracle*

*Offered as a free service for faculty & staff by UT ITS

13 of 21

RDBMS Popularity Rankings

Rankings of Database Popularity: https://db-engines.com/en/ranking

14 of 21

DB Browser for SQLite

  • Cross platform compatible (Windows, MacOS, Linux)
  • Easy to use, good for practicing database management with SQL
  • Good DB Browser tutorial at http://datacarpentry.org/sql-ecology-lesson/
  • Demonstration of common SQL operations
    • Select, distinct, where, count, sum, avg, order by

Download installation files for DB Browser for SQLite at https://sqlitebrowser.org/dl/

15 of 21

DB Browser Demo

  • Install DB Browser
  • Download tree inventory data from the City of Austin
  • Open DB Browser
  • Import CSV as table
  • Demonstration of common SQL table structure operations
    • Change table name, change field name, add field, drop field
  • Demonstration of common SQL operations
    • Select, distinct, where, count, min, max, sum, avg, order by, insert into

Export Austin tree inventory data as CSV from: https://data.austintexas.gov/Locations-and-Maps/Tree-Inventory/wrik-xasw

16 of 21

Google Colab SQL Example

  • Colab is a free Google service that allows you to create and run Jupyter Notebooks
  • Allows you to write Python code and text notes in compartmentalized cells within a notebook
    • Code cells can be run individually to allow for previewing outputs and troubleshooting issues
  • Notebooks are stored in Google Drive and can access other files in Google Drive

https://research.google.com/colaboratory/faq.html

17 of 21

What is Python?

  • Open source, interpreted programming language
  • Cross platform
  • Current version is 3.10.4 (as of 4/22)
  • Extensive use in a variety of fields
  • Large ecosystem of open source packages
  • Can be used for file management, analyzing data, editing data, visualizing data, and more!

18 of 21

Python Basics

  • Packages and modules
  • Variables and object types
  • Operators
  • Methods and functions
  • Syntax
  • Comments
  • Conditional statements
  • Loops

19 of 21

Python and SQL Demo

  • Importing packages and modules
  • Creating a new SQLite database
  • Creating new tables
  • Populating tables with records
  • Querying records
  • Joining tables

Google Colab notebook file for this demo can be accessed at:

https://colab.research.google.com/drive/1XBuPpFq7kAyjLwNleJNx-RqCx7r9PQGS?usp=sharing

20 of 21

SQL Resources, Documentation, and Help

21 of 21

Wrap Up

Michael Shensky

Head of Research Data Services

m.shensky@austin.utexas.edu

Questions? Comments?

Next Workshop