1 of 17

SimpleDB Overview

CSE 444 – Section 1


2 of 17

Today…

  • Demo Git/Eclipse Setup
  • Go through an overview of SimpleDB


3 of 17

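Git

[Diagram: repository setup]
  • GitLab Course Repository: git@.../simple-db
    • Forked (by you) into your GitLab Individual Repository
  • GitLab Individual Repository: git@.../simple-db-jortiz16
    • Cloned (by you) into your Local Repository; this is the origin remote
  • Local Repository: /home/jortiz16
    • Course repository added (by you) as the upstream remote
    • git push / git pull: sync your work with origin
    • git pull upstream master: pull in course updates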

4 of 17

What you should NOT do:

  • Modify given classes
    • Removing, renaming, relocating to other packages
  • Modify given methods
    • Changing parameters or return types
  • Use third-party libraries
    • Except the ones under the lib/ directory
    • You can do everything with the standard Java libraries


5 of 17

What you CAN do:

  • Add new classes/interfaces/methods/packages
    • Watch out for name conflicts with future labs!
    • Safer choice: use new packages (best) or inner classes (meh)
  • Re-implement provided methods
    • Just don’t break correctness or the specification!
  • Find bugs!


6 of 17

What you CAN do (continued):

  • System test cases
    • Under test/systemtest
    • We’ll grade using additional tests
  • Write-up
    • Explain why you implemented things the way you did
  • We’ll read your code
    • Reading horrible code is horrible, so spend some time polishing
    • Passing all the test cases does not necessarily mean you’ll get a high score
      • Sanity check, not a final grade


7 of 17

Setting up SimpleDB

Any questions or concerns?


8 of 17

Overview of SimpleDB


9 of 17

Database

  • A single database
    • One schema
    • List of tables
  • References to major components
    • Global instance of Catalog
    • Global instance of BufferPool
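The "one database, global components" idea can be sketched as a class that owns the single Catalog and BufferPool and hands them out through static accessors, in the spirit of SimpleDB's `Database.getCatalog()` / `Database.getBufferPool()`. This is a simplified illustration, not SimpleDB's code: the stub `Catalog` and `BufferPool` classes, the 50-page capacity, and the `reset()` helper are all hypothetical.

```java
// Hypothetical, heavily simplified stand-ins for SimpleDB's components.
class Catalog { /* table metadata would live here */ }

class BufferPool {
    private final int numPages;
    BufferPool(int numPages) { this.numPages = numPages; }
    int getNumPages() { return numPages; }
}

// One Database per process; other code reaches the global Catalog and
// BufferPool through these static accessors instead of passing them around.
final class Database {
    private static final int DEFAULT_PAGES = 50;   // arbitrary capacity for the sketch
    private static volatile Database instance = new Database();

    private final Catalog catalog = new Catalog();
    private final BufferPool bufferPool = new BufferPool(DEFAULT_PAGES);

    private Database() {}

    static Catalog getCatalog() { return instance.catalog; }
    static BufferPool getBufferPool() { return instance.bufferPool; }

    // Hypothetical helper: swap in a fresh instance (e.g., between unit tests).
    static void reset() { instance = new Database(); }
}
```

Because every caller goes through the same static accessors, any operator can look up table metadata or request a page without holding its own references.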


10 of 17

Catalog

  • Stores metadata about tables in the database
    • void addTable(DbFile f, TupleDesc d)
    • DbFile getTable(int tableid)
    • TupleDesc getTupleDesc(int tableid)
  • NOT persisted to disk
    • Catalog info is reloaded every time SimpleDB starts up
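The three Catalog methods can be pictured as two in-memory maps keyed by table id. This is a minimal sketch, not SimpleDB's Catalog: `DbFile` and `TupleDesc` are reduced to hypothetical stubs, and throwing `NoSuchElementException` for an unknown id is an assumption. Since nothing is written to disk, the maps start empty on every run, which is why the catalog has to be repopulated at startup.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical, simplified stand-ins for SimpleDB's TupleDesc and DbFile.
final class TupleDesc {
    final List<String> fieldNames;
    TupleDesc(List<String> fieldNames) { this.fieldNames = List.copyOf(fieldNames); }
}

interface DbFile {
    int getId();   // unique table id
}

// In-memory only: the maps are empty each time the process starts,
// so tables must be re-added (e.g., from a schema file) on startup.
class Catalog {
    private final Map<Integer, DbFile> files = new HashMap<>();
    private final Map<Integer, TupleDesc> schemas = new HashMap<>();

    void addTable(DbFile f, TupleDesc d) {
        files.put(f.getId(), f);      // re-adding the same id replaces the old entry
        schemas.put(f.getId(), d);
    }

    DbFile getTable(int tableid) {
        DbFile f = files.get(tableid);
        if (f == null) throw new NoSuchElementException("no table with id " + tableid);
        return f;
    }

    TupleDesc getTupleDesc(int tableid) {
        getTable(tableid);            // reuse the existence check
        return schemas.get(tableid);
    }
}
```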


11 of 17

BufferPool

  • The ONLY bridge between data-processing operators and actual data files
    • Strict interface for physical independence!
  • Data files are never accessed directly
  • Later labs:
    • Locking for transactions
    • Flushing pages for recovery
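To make the "only bridge" rule concrete, here is a minimal sketch of a page cache that operators call instead of touching files. It is an illustration under assumptions: `PageId`, `Page`, `DbFile.readPage`, the `diskReads` counter, and the table-id-to-file map are hypothetical simplifications, SimpleDB's real `getPage` also takes a transaction id and permissions, and refusing new pages when full (rather than evicting) is an assumed placeholder until eviction is implemented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal types: a page id, a page, and a file that can read pages.
final class PageId {
    final int tableId;
    final int pageNo;
    PageId(int tableId, int pageNo) { this.tableId = tableId; this.pageNo = pageNo; }
    @Override public boolean equals(Object o) {
        if (!(o instanceof PageId)) return false;
        PageId p = (PageId) o;
        return p.tableId == tableId && p.pageNo == pageNo;
    }
    @Override public int hashCode() { return 31 * tableId + pageNo; }
}

final class Page {
    final PageId id;
    final byte[] data;
    Page(PageId id, byte[] data) { this.id = id; this.data = data; }
}

interface DbFile {
    Page readPage(PageId pid);
}

// Operators ask the pool for pages; only the pool ever calls DbFile.readPage.
class BufferPool {
    private final int capacity;
    private final Map<PageId, Page> cache = new HashMap<>();
    private final Map<Integer, DbFile> files;   // table id -> file (stands in for the Catalog)
    int diskReads = 0;                            // counts cache misses, for illustration

    BufferPool(int capacity, Map<Integer, DbFile> files) {
        this.capacity = capacity;
        this.files = files;
    }

    Page getPage(PageId pid) {
        Page cached = cache.get(pid);
        if (cached != null) return cached;        // hit: no file access
        if (cache.size() >= capacity) {
            // Assumption: no eviction policy yet, so a full pool is an error.
            throw new IllegalStateException("buffer pool full");
        }
        Page p = files.get(pid.tableId).readPage(pid);  // miss: read through the file
        diskReads++;
        cache.put(pid, p);
        return p;
    }
}
```

Funneling every page request through one method is what later makes it possible to add locking and flushing in a single place.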


12 of 17

Data Types

  • Integer
    • Type.INT_TYPE
    • 4 byte width
  • Fixed-length Strings
    • Type.STRING_TYPE
    • 128 bytes long (Type.STRING_LEN)
    • Do not change this constant!
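Because both types are fixed-width, every tuple of a given schema has the same byte width, which is what lets pages hold a fixed number of tuple slots. The sketch below is a hypothetical mirror of the two types using the widths exactly as listed on this slide; the `Schemas.tupleWidth` helper is illustrative, and SimpleDB's own `Type.getLen()` may count extra bytes for strings (such as a length prefix), so check it before relying on these numbers.

```java
import java.util.List;

// Hypothetical mirror of the slide's two types; widths taken from the slide.
enum Type {
    INT_TYPE, STRING_TYPE;

    static final int STRING_LEN = 128;   // do not change (per the slide)

    int getLen() {
        switch (this) {
            case INT_TYPE:    return 4;
            case STRING_TYPE: return STRING_LEN;
            default: throw new AssertionError(this);
        }
    }
}

final class Schemas {
    // Every field is fixed-width, so a schema's tuple width is just the sum.
    static int tupleWidth(List<Type> fields) {
        int total = 0;
        for (Type t : fields) total += t.getLen();
        return total;
    }
}
```

For example, a schema of (INT, STRING) is 4 + 128 = 132 bytes per tuple under these widths.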


13 of 17

OpIterator

  • Interface implemented by all operators
    • Join, Project, SeqScan, etc…
  • Each operator has methods:
    • open(), close(), getTupleDesc(), hasNext(), next(), rewind()
  • Iterator model: chain operators into a tree; each operator pulls tuples from its child(ren) by calling hasNext()/next()
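The iterator model can be illustrated with a toy two-operator plan: a filter pulling tuples from a scan. This sketch keeps the slide's method names but simplifies everything else: tuples are `int[]`, the "TupleDesc" is a list of field names, and `ListScan`, `Filter`, and `Runner` are hypothetical classes (real SimpleDB operators work on `Tuple`/`TupleDesc` objects and may declare checked exceptions).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Simplified operator interface with the slide's method names (hypothetical types).
interface OpIterator {
    void open();
    void close();
    List<String> getTupleDesc();
    boolean hasNext();
    int[] next();
    void rewind();
}

// Leaf operator: scans an in-memory list (stands in for SeqScan over a file).
class ListScan implements OpIterator {
    private final List<String> desc;
    private final List<int[]> rows;
    private int pos = -1;                       // -1 means "not open"

    ListScan(List<String> desc, List<int[]> rows) { this.desc = desc; this.rows = rows; }
    public void open() { pos = 0; }
    public void close() { pos = -1; }
    public List<String> getTupleDesc() { return desc; }
    public boolean hasNext() {
        if (pos < 0) throw new IllegalStateException("not open");
        return pos < rows.size();
    }
    public int[] next() {
        if (!hasNext()) throw new NoSuchElementException();
        return rows.get(pos++);
    }
    public void rewind() { open(); }
}

// Filter pulls tuples from its child on demand and passes on the matching ones.
class Filter implements OpIterator {
    private final OpIterator child;
    private final Predicate<int[]> pred;
    private int[] lookahead;                    // next matching tuple, if already fetched

    Filter(OpIterator child, Predicate<int[]> pred) { this.child = child; this.pred = pred; }
    public void open() { child.open(); lookahead = null; }
    public void close() { child.close(); lookahead = null; }
    public List<String> getTupleDesc() { return child.getTupleDesc(); }
    public boolean hasNext() {
        while (lookahead == null && child.hasNext()) {
            int[] t = child.next();
            if (pred.test(t)) lookahead = t;
        }
        return lookahead != null;
    }
    public int[] next() {
        if (!hasNext()) throw new NoSuchElementException();
        int[] t = lookahead;
        lookahead = null;
        return t;
    }
    public void rewind() { child.rewind(); lookahead = null; }
}

// Drains any operator tree; the caller only ever talks to the root.
final class Runner {
    static List<int[]> drain(OpIterator root) {
        List<int[]> out = new ArrayList<>();
        root.open();
        while (root.hasNext()) out.add(root.next());
        root.close();
        return out;
    }
}
```

A plan such as `new Filter(new ListScan(...), t -> t[1] > 15)` produces tuples one at a time: each `next()` on the filter pulls only as many tuples from the scan as it needs.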


14 of 17

HeapFile

  • Main class that organizes the physical storage of tables
  • Collection of HeapPages on disk
    • One HeapFile for each table
    • Fixed-size pages mean page i starts at byte offset i × page size, so any page can be located directly


[Diagram: a HeapFile stored as consecutive pages: HeapPage #1, HeapPage #2, HeapPage #3]
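The payoff of fixed-size pages is that locating a page is arithmetic, not a search. A minimal sketch, assuming a 4096-byte page size and a raw file of back-to-back pages; `HeapFileSketch` and its methods are hypothetical names, and only the offset arithmetic is the point.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch of page lookup in a file of fixed-size pages (PAGE_SIZE is an assumption).
final class HeapFileSketch {
    static final int PAGE_SIZE = 4096;
    private final File file;

    HeapFileSketch(File file) { this.file = file; }

    // Pages are laid out back to back, so the page count is size / PAGE_SIZE.
    int numPages() {
        return (int) (file.length() / PAGE_SIZE);
    }

    // Page i starts at byte i * PAGE_SIZE: one seek, no scan of earlier pages.
    byte[] readPage(int pageNo) throws IOException {
        if (pageNo < 0 || pageNo >= numPages()) {
            throw new IllegalArgumentException("no page " + pageNo);
        }
        byte[] buf = new byte[PAGE_SIZE];
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek((long) pageNo * PAGE_SIZE);
            raf.readFully(buf);
        }
        return buf;
    }
}
```

With 4096-byte pages, page 3 starts at byte 12288, regardless of what the earlier pages contain.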

15 of 17


16 of 17

HeapPage

  • A chunk of data that can reside in the BufferPool
  • Format: Header + Tuples
    • Header is a bitmap with one bit per tuple slot; # of 1 bits in the bitmap = # of active tuples on the page
  • Fixed size: BufferPool.PAGE_SIZE


[Diagram: HeapPage layout: Header Bitmap, followed by Tuple #1, Tuple #2, …]
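The header/tuple layout implies some simple arithmetic: each slot costs its tuple's bytes plus one header bit, and counting 1 bits gives the number of active tuples. The sketch below works under stated assumptions: a 4096-byte page, the slot formula floor(PAGE_SIZE × 8 / (tupleBytes × 8 + 1)), and bit i stored in header byte i / 8 at position i % 8 (least-significant bit first). `HeapPageLayout` is a hypothetical helper; check the lab handout for the exact layout SimpleDB expects.

```java
// Sketch of HeapPage layout math under the assumptions stated above.
final class HeapPageLayout {
    static final int PAGE_SIZE = 4096;

    // Largest n such that n tuples plus n header bits fit in one page.
    static int tuplesPerPage(int tupleBytes) {
        return (PAGE_SIZE * 8) / (tupleBytes * 8 + 1);
    }

    // Header bytes needed for one bit per slot, rounded up.
    static int headerBytes(int tupleBytes) {
        return (tuplesPerPage(tupleBytes) + 7) / 8;
    }

    // Slot i is in use if its header bit is 1 (LSB-first within each byte).
    static boolean isSlotUsed(byte[] header, int slot) {
        return ((header[slot / 8] >> (slot % 8)) & 1) == 1;
    }

    // Number of 1 bits in the header = number of active tuples on the page.
    static int activeTuples(byte[] header) {
        int count = 0;
        for (byte b : header) count += Integer.bitCount(b & 0xFF);
        return count;
    }
}
```

For a schema of two INT fields (8 bytes per tuple), this gives 504 slots and a 63-byte header: 504 × 8 + 63 = 4095 bytes, which fits in a 4096-byte page.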

17 of 17

Questions?
