Databases are popular for many applications, especially for use with web applications or customer-oriented programs. There is a caveat though; databases don't have the performance that file-system based applications do.
Normal files, such as text files, are easy to create and use; Python has the tools built-in and it doesn't take much to work with files. File systems are more efficient (most of the time) in terms of performance because you don't have the overhead of database queries or other things to worry about. And files are easily portable between operating systems (assuming you aren't using a proprietary format) and are often editable/usable with different programs.
Databases are good when discrete "structures" are to be operated on, e.g. a customer list that has phone numbers, addresses, past orders, etc. A database can store a lump of data and allow the user or developer to pull the necessary information, without regard to how the data is stored. Additionally, databases can be used to retrieve data randomly, rather than sequentially. For pure sequential processing, a standard file is better.
Obviously, there is more to the file-system vs. database battle than what I just covered. But, generally speaking, you will be better suited using a file-system structure than a database unless there is a reason to use a database. My personal recommendation is that, unless you are creating a server-based application, try using a local file rather than a database. If that doesn't work, then you can try a database.
How to Use a Database
A database (DB) is simply a collection of data, placed into an arbitrary structured format. The most common DB is a relational database; tables are used to store the data and relationships can be defined between different tables. SQL (structured query language) is the language used to work with most DBs. (SQL can either be pronounced as discrete letters "S-Q-L" or as a word "sequel". I personally use "sequel".)
SQL provides the commands to query a database and retrieve or manipulate information. The format of a query is one of the most powerful forces when working with DBs; an improper query won't return the desired information, or worse, it will return the wrong information. SQL is also used to input information into a DB.
While you can interact directly with a DB using SQL, as a programmer you have the liberty of using Python to control much of the interactions. You will still have to know SQL so you can populate and interact with the DB, but most of the calls to the DB will be with the Python DB-API (database application programming interface).
Working With a Database
This book is not intended to be a database or SQL primer. However, I will provide you with enough information to create simple database and an application that uses it. First I will cover the basic principles of databases and SQL queries then we will use Python to make and manipulate a small database.
First off, consider a database to be one or more tables, just like a spreadsheet. The vertical columns comprise different fields or categories; they are analogous to the fields you fill out in a form. The horizontal rows are individual records; each row is one complete record entry. Here's a pictorial summary, representing a customer list. The table's name is "Customers_table":
|
Index |
Last_Name |
First_Name |
Address |
City |
State |
Zip_Code |
Phone_Number |
|
0 |
Johnson |
Jack |
123 Easy St. |
Anywhere |
CA |
12345 |
555-555-5555 |
|
1 |
Smith |
John |
312 Hard St. |
Somewhere |
NY |
54321 |
666-555-5555 |
The only column that needs special explanation is the Index field. This field isn't required but is highly recommended. You can name it anything you want but the purpose is the same. It is a field that provides a unique value to every record; it's often called the primary key field. The primary key is a special object for most databases; simply identifying which field is the primary key will automatically increment that field as new entries are made, thereby ensuring a unique data object for easy identification. The other fields are simply created based on the information that you want to include in the database.
To make a true relational database, you have one table that refers to one or more tables in some fashion. If I wanted to make a order-entry database, I could make another table that tracks an order and relate that order to the above customer list, like so:
|
Key |
Item_title |
Price |
Order_Number |
Customer_ID |
|
0 |
Boots |
55.50 |
4455 |
0 |
|
1 |
Shirt |
16.00 |
4455 |
0 |
|
2 |
Pants |
33.00 |
7690 |
0 |
|
4 |
Shoes |
23.99 |
3490 |
1 |
|
5 |
Shoes |
65.00 |
5512 |
1 |
This table is called "Orders_table". This table shoes the various orders made by each person in the customer table. Each entry has a unique key and is related to the customer table by the Customer_ID field.
Using SQL to Query a Database
To query a table using SQL, you simply tell the database what it is your are trying to do. If you want to get a list of the customers or a list of orders in the system, just select what parts of the table you want to get. (Note: the following code snippets are not Python specific; additionally, SQL statements are not case-sensitive but are usually written in uppercase for clarity.)
SELECT * FROM Customers_table
The result should pretty look just like the table above; it may be textual or have grid lines, depending on the environment you are using but the information will all be there.
You can also limit the selection to specific fields, such as:
SELECT Last_name, First_name FROM Customers_table
SELECT Address FROM Customers_table WHERE State == "NY"
The second SQL query above uses the "WHERE" statement, which returns a limited set of information based on the condition specified. If you used the statement as written, you should only get back the addresses of customers who live in New York state. Obviously this is a good idea because it limits the results you have to process and it reduces the amount of memory being used. Many system slowdowns can be traced to bad DB queries that return too much information and consume too many resources.
To combine the information from two tables, i.e. to harness the power of relational databases, you have to join the tables in the query.
SELECT Last_name, First_name, Order_Number FROM Customers_table, Orders_table WHERE Customers_table.Index = Orders_table.Customer_ID
This should give you something that looks like this:
Johnson Jack 4455
Johnson Jack 4455
Johnson Jack 7690
Smith John 3490
Smith John 5512
Again, the formatting may be different depending on the system you are working with but it’s the information that counts.
Python and SQLite
Starting with v2.5, Python has included SQLite, a light-weight SQL library. SQLite is written in C, so it’s quick. It also creates the database in a single file, which makes implementing a DB fairly simple; you don’t have to worry about all the issues of having a DB spread across a server. However, it does mean that SQLite is better suited to either development purposes or small, stand-alone applications. If you are planning on using your Python program for large-scale systems, you’ll want to move to a more robust database, such as PostgreSQL or MySQL.
However, this doesn’t mean SQLite isn’t useful. It’s good for prototyping your application before you throw in a full-blown DB; that way you know your program works and any problems are most likely with the DB implementation. It’s also good for small programs that don’t need a complete DB package with its associated overhead.
So, how do you use SQLite with Python? I’ll show you.
Creating an SQLite DB
Because SQLite is built into Python, you simply import it like any other library. Once imported, you have to make a connection to it; this creates the database file. A cursor is the object within SQLite that performs most of the functions you will be doing with the DB.
import sqlite3 #SQLite v3 is the version currently included with Python
connection = sqlite3.connect(“Hand_tools.db”) #The .db extension isn’t required but is helpful
cursor = connection.cursor()
#mem_conn = sqlite3.connect(“:memory:”) #Alternative DB created only in memory
#cursor = mem_conn.cursor()
cursor.execute(“””CREATE TABLE Tools #Listing the table items vertically helps with clarity
(id INTEGER PRIMARY KEY,
name TEXT,
size TEXT,
price INTEGER)”””)
for item in (
(None, “Knife”, “Small”, 15), #The comma at the end is necessary to separate tuple items
(None, “Machete”, “Medium”, 35),
(None, “Axe”, “Large”, 55),
(None, “Hatchet”, “Small”, 25),
(None, “Hammer”, “Small”, 25)
(None, “Screwdriver”, “Small”, 10),
(None, “Prybar”, “Large”, 60),
):cursor.execute(“INSERT INTO Tools VALUES (?, ?, ?, ?)”, item)
connection.commit() #Write data to database
cursor.close() #Close database
The above code makes a simple, single-table database of a collection of hand tools. Notice the question marks used to insert items into the table. The question marks are used to prevent a SQL injection attack, where a SQL command is passed to the DB as a legitimate value. The DB program will process the command as a normal, legitimate command which could delete data, change data, or otherwise compromise your DB. The question marks act as a substitution value to prevent this from occurring.
You’ll also note the ability to create a DB in memory. This is good for testing, when you don’t want to take the time to write to disc or worry about directories. If you have enough memory, you can also create the DB completely in memory for your final product; however, if you lose power or otherwise have memory problems, you lose the complete DB. I only use a RAM DB when I’m testing the initial implementation to make sure I have the syntax and format correct. Once I verify it works the way I want, then I change it to create a disc-based DB.
Pulling Data from a DB
To retrieve the data from an SQLite DB, you just use the SQL commands that tell the DB what information you want and how you want it formatted.
cursor.execute(“SELECT name, size, price FROM Tools”)
toolsTuple = cursor.fetchall()
for tuple in toolsTuple:
name, size, price = tuple #unpack the tuples
item = (“%s, %s, %d” % (name, size, price))
print item
Which returns the following list:
Knife, Small, 15
Machete, Medium, 35
Axe, Large, 55
Hatchet, Small, 25
Hammer, Small, 25
Screwdriver, Small, 10
Prybar, Large, 60
Knife, Small, 15
Machete, Medium, 35
Axe, Large, 55
Hatchet, Small, 25
Hammer, Small, 25
Screwdriver, Small, 10
Prybar, Large, 60
Alternatively, if you want to print out pretty tables, you can do something like this:
cursor.execute(“SELECT * FROM Tools”)
for row in cursor:
print “-” * 10
print “ID:”, row[0]
print “Name:”, row[1]
print “Size:”, row[2]
print “Price:”, row[3]
print “-” * 10
Which gives you this:
----------
ID: 1
Name: Knife
Size: Small
Price: 15
----------
----------
ID: 2
Name: Machete
Size: Medium
Price: 35
----------
----------
ID: 3
Name: Axe
Size: Large
Price: 55
----------
----------
ID: 4
Name: Hatchet
Size: Small
Price: 25
----------
----------
ID: 5
Name: Hammer
Size: Small
Price: 25
----------
----------
ID: 6
Name: Screwdriver
Size: Small
Price: 10
----------
----------
ID: 7
Name: Prybar
Size: Large
Price: 60
----------
----------
ID: 8
Name: Knife
Size: Small
Price: 15
----------
----------
ID: 9
Name: Machete
Size: Medium
Price: 35
----------
----------
ID: 10
Name: Axe
Size: Large
Price: 55
----------
----------
ID: 11
Name: Hatchet
Size: Small
Price: 25
----------
----------
ID: 12
Name: Hammer
Size: Small
Price: 25
----------
----------
ID: 13
Name: Screwdriver
Size: Small
Price: 10
----------
----------
ID: 14
Name: Prybar
Size: Large
Price: 60
----------
Obviously, you can mess around with the formatting to present the information as you desire, such as giving columns with headers, including or removing certain fields, etc.
SQLite Database Files
SQLite will try to recreate the database file every time you run the program. If the DB file already exists, you will get an “OperationalError” exception stating that the file already exists. The easiest way to deal with this is to simply catch the exception and ignore it.
try:
cursor.execute(“CREATE TABLE Foo (id INTEGER PRIMARY KEY, name TEXT)”)
except sqlite3.OperationalError:
pass
This will allow you to run your database program multiple times (such as during creation or testing) without having to delete the DB file after every run.
You can also use a similar try/except block when testing to see if the DB file already exists; if the file doesn’t exist, then you can call the DB creation module. This allows you to put the DB creation code in a separate module from your “core” program, calling it only when needed.