Olivia Koski
Insight Data Engineering Fellowship
New York 2020
WINDFINDER
Unleash The Power of Wind
photo credit: Matt Artz via unsplash
Demand for Renewable Energy Is Growing
Energy Entrepreneurs Rely on Big Data
A sustained wind speed above 13 mph is required to produce power
GRIB
netCDF
HDF
GeoTIFF
National Renewable Energy Laboratory WIND Toolkit
x index (0 to 2976)
y index (0 to 1602)
t index (0 to 61368)
Data Conversion
50 terabytes of HDF5, converted to 45 MB CSV time slices
[Figure: a stack of time slices t0 to t4 along the t axis; each slice is an x-y grid of wind speed v(x, y)]
1602 x 2975 grid = 4,765,950 data points per snapshot
Pipeline
Ingestion
Processing
Results
Visualization
15 m/s ≈ 34 mph
25 m/s ≈ 56 mph
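The conversion between m/s and mph can be checked with a short sketch. The function names are illustrative and not from the slides; the only constant assumed is the international mile of 1609.344 m.

```python
# Convert between metres per second and miles per hour.
METERS_PER_MILE = 1609.344   # international mile, exact by definition
SECONDS_PER_HOUR = 3600


def mps_to_mph(speed_mps: float) -> float:
    """Metres per second to miles per hour."""
    return speed_mps * SECONDS_PER_HOUR / METERS_PER_MILE


def mph_to_mps(speed_mph: float) -> float:
    """Miles per hour to metres per second."""
    return speed_mph * METERS_PER_MILE / SECONDS_PER_HOUR


for mps in (15, 25):
    print(f"{mps} m/s = {mps_to_mph(mps):.1f} mph")
```

The same helper converts the 13 mph minimum wind speed quoted earlier into m/s via `mph_to_mps(13)`.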
Thank You!
Q & A
Where to Build a Wind Turbine?
Coastal Areas / The Ocean
Rounded Hills
Open Plains
> 50 - 60 % capacity factor
15 - 50 % capacity factor
Should You Build a Wind Turbine?
Cost to build a turbine
Expected Earnings
Return on Investment
Wind: A Growth Industry?
Government Subsidies
Consumer Demand
Climate Change and Energy Independence
Turbine Spacing
GE 2 megawatt onshore model
D = 433 feet
1D
4D
10D
1 - 1.5 acres of land per turbine
How Do Turbines Work?
Taller towers, steadier winds
Larger blade captures more energy
P = π/2 * r² * v³ * ρ * η
where r = blade radius (m), v = wind speed (m/s), ρ = air density (kg/m³), η = efficiency
One watt: 1 W = 1 kg * m² / s³
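The power formula can be evaluated directly. The sketch below is illustrative only: the air density of 1.225 kg/m³ (a common sea-level value) and the efficiency of 0.3 are assumptions, not figures from the slides, and the rotor diameter of 433 feet is taken from the turbine-spacing slide.

```python
import math

AIR_DENSITY = 1.225      # kg/m^3, assumed sea-level value (not from the slides)
FEET_TO_METERS = 0.3048  # exact conversion factor


def wind_power_watts(radius_m, wind_speed_mps, efficiency, air_density=AIR_DENSITY):
    """P = pi/2 * r^2 * v^3 * rho * eta, returned in watts."""
    return (math.pi / 2) * radius_m ** 2 * wind_speed_mps ** 3 * air_density * efficiency


# Rotor diameter D = 433 feet, so the blade radius is D / 2.
radius_m = 433 * FEET_TO_METERS / 2

for v in (6, 8, 10):
    p_mw = wind_power_watts(radius_m, v, efficiency=0.3) / 1e6  # 0.3 is an assumed efficiency
    print(f"v = {v} m/s -> {p_mw:.2f} MW")
```

Because power scales with v³, doubling the wind speed multiplies the output by eight. The formula has no upper bound, whereas a real turbine stops increasing output once it reaches its rated power.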
Turbine Power Generation
1 megawatt or 1 MW = one million watts
Energy is measured in MWh or kWh (power multiplied by time)
Ideal wind speed range at max power (2 MW) = 12 to 24 m/s (about 27 to 54 mph)
Ideal: 30 - 40 %; Reality: 15 - 30 %; Betz limit (theoretical best): 59 %
Assuming 25 % capacity factor
2 MW x 365 days x 24 hours x 25 % = 4,380 MWh or 4,380,000 kWh per year
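The annual energy estimate above is a single multiplication. A minimal sketch, with a hypothetical function name, using the 2 MW rating and 25 % capacity factor from this slide:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def annual_energy_mwh(rated_power_mw, capacity_factor):
    """Expected yearly energy in MWh: rated power x hours per year x capacity factor."""
    return rated_power_mw * HOURS_PER_YEAR * capacity_factor


energy_mwh = annual_energy_mwh(2, 0.25)
print(f"{energy_mwh:,.0f} MWh = {energy_mwh * 1000:,.0f} kWh per year")
# prints: 4,380 MWh = 4,380,000 kWh per year
```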
HDF5 -> CSV
Full dataset (2007 to 2013): 50 TB HDF5, 37 attributes, 7 years every hour, 2 km x 2 km
One year: 4 TB HDF5, 37 attributes, 12 months of data, 2 km x 2 km
One time slice: 45 MB CSV, 1 attribute, 4,765,950 data points
Data Transformation
| x0 | x1 | ... | xn |
y0 | v00 | v01 | ... | v0n |
y1 | v10 | v11 | ... | v1n |
ym | vm0 | vm1 | ... | vmn |
| y | x | v | t |
y0 | 0 | 0 | v00 | 0 |
y0 | 0 | 1 | v01 | 1 |
y0 | 0 | 2 | v02 | 2 |
ym | m | 2 | vm2 | 3 |
timeslice example data at t = 0
transformation to get statistics
over y, x, t values
m = 1602, n = 2975
Data sampled at 2 km x 2 km resolution
20 seconds per column per 117 files with Spark processing
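The reshaping from a wide grid (one row per y, one column per x) to long rows of (y, x, v, t) can be sketched in plain Python. The slides perform this step in Spark; the function name and the tiny 2 x 3 grid of values below are made up for illustration and stand in for the 1602 x 2975 time slice.

```python
def wide_to_long(grid, t):
    """Yield one (y, x, v, t) row per cell of a 2-D grid indexed as grid[y][x]."""
    for y, row in enumerate(grid):
        for x, v in enumerate(row):
            yield (y, x, v, t)


# Made-up 2 x 3 time slice at t = 0 (illustrative values only).
slice_t0 = [
    [4.7, 4.3, 4.4],
    [5.4, 4.8, 4.1],
]

rows = list(wide_to_long(slice_t0, t=0))
for row in rows:
    print(row)
```

In the long layout, statistics over y, x and t become ordinary group-by operations on columns.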
Map y, x -> lat, long and t -> datetime
Map t index -> datetime: add a column to the dataframe in Spark with the datetime stamp
Map y, x index -> lat, long: add lat, long coordinates after calculating top speeds, for efficiency
y | x | v | datetime | lat | long |
0 | 0 | v000 | MMDDYYYY | | |
0 | 0 | v001 | MMDDYYYY | | |
.. | .. | .. | .. | | |
m | n | vmn792 | MMDDYYYY | | |
ADD COLUMNS in PostgreSQL after top wind speeds are tabulated
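The t index to datetime mapping can be sketched as below, assuming that index 0 corresponds to 2007-01-01 00:00 and that each step is one hour (consistent with "7 years every hour"). The function name is hypothetical. The y, x to lat, long mapping is not sketched because it depends on the dataset's own coordinate arrays.

```python
from datetime import datetime, timedelta

# Assumption: time index 0 is midnight on 1 January 2007 and steps are hourly.
START = datetime(2007, 1, 1, 0, 0, 0)


def t_index_to_datetime(t_index):
    """Map an hourly time index to its timestamp."""
    return START + timedelta(hours=t_index)


for t in (0, 1, 61367):
    print(t, t_index_to_datetime(t).strftime("%Y%m%d %H:%M:%S"))
```

Under these assumptions the last index, 61367, lands on the final hour of 2013, which matches the seven-year span of the data.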
Throughput
Ingestion
Processing
Results
Visualization
~5 minutes per time slice to extract the full-resolution wind grid
1.6 hours at 20 km resolution
117 time slices, ~5 GB
3 GB database
Minutes to perform queries and convert y, x to lat, long
Downsample to improve query performance
| 0 | 1 | ... | 9 | ... | 2975 |
0 | v00 | v01 | ... | v09 | ... | v02975 |
1 | v10 | v11 | ... | v19 | ... | v12975 |
Downsampling
~20 seconds per transformation
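Column downsampling amounts to keeping every k-th column of each row. The sketch below uses plain Python rather than Spark, and the step of 10 is an assumption inferred from going from 2 km to 20 km resolution along x; the function name is hypothetical.

```python
def downsample_columns(grid, step=10):
    """Keep every `step`-th column (x index 0, step, 2*step, ...) of each row."""
    return [row[::step] for row in grid]


# Small stand-in grid: 2 rows x 25 columns, each cell holding its own x index.
grid = [list(range(25)) for _ in range(2)]
print(downsample_columns(grid))

# Number of columns kept when 2,976 x values are sampled every 10th column.
print(len(range(0, 2976, 10)))
```

Each row shrinks by roughly the step factor, which is what reduces the processing time and the size of the results table.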
Spark cluster: 1 driver (m4.2xlarge) and 3 workers (m4.2xlarge each)
20 seconds per column per 117 files with Spark processing
Future Work
Data Source
x index (0 to 2976)
y index (0 to 1602)
t index (0 to 61368)
1602 x 2976 x 61368 = 292,575,131,136 values
x 4 bytes per value = 1,170,300,524,544 bytes for 1 attribute (32-bit floating point)
x 37 attributes = 43.3 TB
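The size estimate can be reproduced with integer arithmetic. The 4 bytes per value follows from the ratio between the two figures on this slide and is taken here to mean 32-bit floats.

```python
NY, NX, NT = 1602, 2976, 61368  # grid rows, grid columns, hourly time steps
BYTES_PER_VALUE = 4             # 32-bit floating point, inferred from the slide
N_ATTRIBUTES = 37

values_per_attribute = NY * NX * NT
bytes_per_attribute = values_per_attribute * BYTES_PER_VALUE
total_bytes = bytes_per_attribute * N_ATTRIBUTES

print(f"{values_per_attribute:,} values per attribute")
print(f"{bytes_per_attribute:,} bytes per attribute")
print(f"{total_bytes / 1e12:.1f} TB for all {N_ATTRIBUTES} attributes")
```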
Month | Number of Files Processed |
Jan | 23 |
Feb | 9 |
March | 9 |
April | 9 |
May | 9 |
June | 9 |
July | 9 |
August | 9 |
Sept | 9 |
Oct | 4 |
Nov | 14 |
Dec | 4 |
Full resolution (2 km x 2 km) Spark processing for 117 files = 16.5 hours
2 km x 20 km resolution: Spark processing for 117 files = 1.65 hours
1 year = 8,760 hours (one file per hour)
Every other day, every other hour: 1 year = 2,190 hours
Every third day, every third hour: 1 year = 973.3 hours
Every fourth day, every fourth hour: 1 year = 547.5 hours
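The temporal downsampling counts follow from keeping 1/n of the days and 1/n of the hours in each kept day, so the number of hourly files drops by a factor of n². A minimal sketch with a hypothetical function name:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hourly files in a non-leap year


def files_after_downsampling(n):
    """Approximate files left when keeping every n-th day and every n-th hour."""
    return HOURS_PER_YEAR / (n * n)


for n in (1, 2, 3, 4):
    print(f"every {n}: {files_after_downsampling(n):,.1f} files per year")
```

The fractional results for n = 3 and n = 4 show that these are averages; an actual selection of days and hours would give a whole number close to them.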
2 km x 20 km resolution: Spark processing for 117 files ~ 5 hours
Time index | Datetime | Location (long, lat) | windspeed_100m |
0 | 20070101 00:00:00 | -108.918,36.314 | 4.712059 |
0 | 20070101 00:00:00 | -107.291,36.513 | 4.2878494 |
0 | 20070101 00:00:00 | -106.201,36.631 | 4.428238 |
... | | | |
1 | 20070101 01:00:00 | -108.918,36.314 | 5.426193 |
1 | 20070101 01:00:00 | -107.291,36.513 | 4.7944565 |
1 | 20070101 01:00:00 | -106.201,36.631 | 4.086426 |
Common Weather Data Formats
GRIB1: GRIdded Binary (Edition 1), World Meteorological Organization
GRIB2: GRIdded Binary (Edition 2), World Meteorological Organization
netCDF3: Network Common Data Form, (Version 3.x), Unidata (UCAR/NCAR)
netCDF4: Network Common Data Form, (Version 4.x), Unidata (UCAR/NCAR)
HDF4: Hierarchical Data Format, (Version 4.x), NCSA/NASA
HDF4-EOS2: HDF4-Earth Observing System, (Version 2; georeferenced data)
HDF5: Hierarchical Data Format, (Version 5.x), NCSA/NASA
HDF5-EOS5: HDF5-Earth Observing System, (Version 5; georeferenced data)
GeoTIFF: Georeferenced raster imagery