Warmup
Consider the following two tables:
What is the result of the following join?
1
a | b |
A | 1 |
B | 2 |
c | d |
3 | X |
2 | Y |
2 | Z |
table1
table2
table2.merge(table1, left_on='c', right_on='b',
how='left')
Pair
pollev.com/cse163
CSE 163
Joins / Spatial Indices
Hunter Schafer
Geopandas
3
data = geopandas.read_file('data_file.shp')
Geometry
4
Matplotlib
There are two fundamental concepts for matplotlib
The subplots method conveniently returns a new figure and axis
5
fig, axs = plt.subplots(3)
fig, [ax1, ax2, ax3] = plt.subplots(3)
Dissolve
6
Separated Data
Imagine that your data is split into multiple DataFrames and you want to combine them.
How can I print out the grading assignments?
7
ta_name | ta_id |
Ryan | 1 |
James | 2 |
Nicole | 3 |
grader_id | student_name |
2 | Flora |
3 | Paul |
1 | Wen |
3 | Andrew |
tas
grading
Combining Data
Basic Idea, write code to do something like (not real code)
This is called a join since we are joining the tables together
This is such a common task, there is a method we can call to do this
8
for t in tas:
for g in grading:
if t['ta_id'] == g['grader_id']:
print(t, g)
Pandas Join
9
tas.merge(grading, left_on='ta_id',
right_on='grader_id')
ta_name | ta_id |
Ryan | 1 |
James | 2 |
Nicole | 3 |
grader_id | student_name |
2 | Flora |
3 | Paul |
1 | Wen |
3 | Andrew |
ta_name | ta_id | grader_id | student_name |
Ryan | 1 | 1 | Wen |
James | 2 | 2 | Flora |
Nicole | 3 | 3 | Paul |
Nicole | 3 | 3 | Andrew |
Type of Join
There are interesting questions of what happens if there are rows that don’t “line up”. Different type of joins differ in how to handle this case.
Types of Joins
left.merge(right, left_on=’lcol’, right_on=’rcol’,
how=’type’)
10
Warmup
Consider the following two tables:
What is the result of the following join?
11
a | b |
A | 1 |
B | 2 |
c | d |
3 | X |
2 | Y |
2 | Z |
table1
table2
table2.merge(table1, left_on='c', right_on='b',
how='left')
Pair
pollev.com/cse163
1 min
Suppose we had two tables
Suppose we wanted to compute the number of ratings each instructors received. From rows shown above, the result should print out that Hunter received 0, Rit received 2, and Aleks received 1.
There are multiple steps, to solve this: Join the data and then a groupby instructor to get counts.
We want you to figure out the join.
12
name | id |
Hunter | 1 |
Rit | 2 |
Aleks | 3 |
... | ... |
instructor_id | rating |
2 | 5 |
3 | 5 |
2 | 5 |
... | ... |
instructors
evals
Think
pollev.com/cse163
2 min
Suppose we had two tables
Suppose we wanted to compute the number of ratings each instructors received. From rows shown above, the result should print out that Hunter received 0, Rit received 2, and Aleks received 1.
There are multiple steps, to solve this: Join the data and then a groupby instructor to get counts.
We want you to figure out the join.
13
name | id |
Hunter | 1 |
Rit | 2 |
Aleks | 3 |
... | ... |
instructor_id | rating |
2 | 5 |
3 | 5 |
2 | 5 |
... | ... |
instructors
evals
Pair
pollev.com/cse163
Geospatial Join
Goals
Notes
14
Spatial Joins
Very similar to plain old joins (how)
The key difference are you match by geo-spatial relation (op)
15
gpd.sjoin(mainland, florence, � how='inner', op='intersects')
Spatial Join - Points
Imagine we had a dataset of where�people live in England.��Want to know if someone lives�at coordinates (x, y)��How many points would we �have to search through?
16
[Point1, Point2, Point3, …]
Spatial Index
17
Spatial Index
This is a tree!
To find if a Point is in the dataset, follow the tree!
18
Spatial Index Performance
19
Technicalities
20