1 of 29

Olivia Koski

Insight Data Engineering Fellowship

New York 2020

WINDFINDER

Unleash The Power of Wind

photo credit: Matt Artz via unsplash

2 of 29

Demand for Renewable Energy Is Growing

  • Wind power supplied 6.5% of the nation’s electricity in 2018
    • Enough for 26 million homes
  • Provides 20% of electricity in 6 states
  • Could supply 35% of U.S. electricity by 2050
  • Large companies like AT&T, Walmart, Shell, Exxon are buying wind at record levels to offset carbon footprint

3 of 29

Energy Entrepreneurs Rely on Big Data

Minimum sustained wind speed greater than 13 mph required to produce power

Common weather data formats: GRIB, netCDF, HDF, GeoTIFF

4 of 29

National Renewable Energy Laboratory WIND Toolkit

Grid indices: x (0 to 2976), y (0 to 1602), t (0 to 61368)

  • 50 Terabytes available via Amazon’s Public Data Registry
    • 2 x 2 km spatial resolution
    • 1 hour time resolution
    • 7 years of data
  • HDF5 files (hierarchical data format)
    • Well suited to large, heterogeneous, complex datasets
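
The index ranges above can be cross-checked against the stated 1 hour resolution and 7 years of data. This is a minimal sketch, assuming the records run from the start of 2007 to the end of 2013 (the years shown on the later HDF5 -> CSV slide) and using the 1602 x 2975 grid quoted on the Data Conversion slide.

```python
from datetime import datetime, timedelta

# Assumption: hourly records cover 2007-01-01 00:00 up to (not including) 2014-01-01 00:00.
start = datetime(2007, 1, 1)
end = datetime(2014, 1, 1)

# Number of one-hour steps in the seven-year span (two of the years are leap years).
hourly_steps = (end - start) // timedelta(hours=1)

# Grid points in a single snapshot, using the 1602 x 2975 grid from the deck.
grid_points_per_snapshot = 1602 * 2975

print("hourly steps:", hourly_steps)
print("grid points per snapshot:", grid_points_per_snapshot)
```

The hourly step count matches the upper bound of the t index, which suggests the t axis is simply "hours since the start of 2007".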

5 of 29

Data Conversion

50 Terabytes of HDF5 -> convert to 45 Mbyte CSV time slices (t0, t1, t2, t3, t4)

Each time slice holds one value v (x, y) per grid point at a fixed t

1602 x 2975 grid = 4,765,950 data points per snapshot
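
A minimal sketch of the time-slice conversion, not the pipeline's actual code. It assumes the wind speed data is available as an object indexed as `dataset[t]` that yields the rows of the y, x grid; a nested list with made-up values stands in for the HDF5 dataset here. The function name `timeslice_to_csv` is hypothetical.

```python
import csv
import io


def timeslice_to_csv(dataset, t, out):
    """Write the 2D grid at time index t to `out` as CSV, one grid row per line.

    `dataset` is any object where dataset[t] yields an iterable of rows
    (a stand-in for a 3D t, y, x wind speed array).
    Returns the number of values written.
    """
    writer = csv.writer(out)
    count = 0
    for row in dataset[t]:
        values = list(row)
        writer.writerow(values)
        count += len(values)
    return count


# Toy data: 2 time steps of a 2 x 3 grid (made-up values, not from the toolkit).
toy = [
    [[4.7, 4.3, 4.4], [5.1, 4.9, 4.0]],
    [[5.4, 4.8, 4.1], [5.0, 4.6, 3.9]],
]

buffer = io.StringIO()
n_values = timeslice_to_csv(toy, 0, buffer)
csv_text = buffer.getvalue()
print(n_values, "values written")
print(csv_text)
```

Writing one file per time index keeps each CSV small enough to process independently, which is the trade-off the slide describes.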

6 of 29

Pipeline

Ingestion -> Processing -> Results -> Visualization

7 of 29

8 of 29

Olivia Koski

  • BS Physics CU-Boulder, MA Journalism NYU
  • Former laser engineer at Lockheed Martin
  • Entrepreneur with lots of client-facing experience
  • PMP Certified
  • Wrote a book about space vacations!

9 of 29

15 m/s ≈ 34 mph

25 m/s ≈ 56 mph
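
The conversions on this slide are rounded. A small sketch of the unit conversion, using the exact definition of the mile; the function names are illustrative only.

```python
# 1 mile = 1609.344 m exactly, 1 hour = 3600 s
MPH_PER_MPS = 3600 / 1609.344


def mps_to_mph(speed_mps):
    """Convert metres per second to miles per hour."""
    return speed_mps * MPH_PER_MPS


def mph_to_mps(speed_mph):
    """Convert miles per hour to metres per second."""
    return speed_mph / MPH_PER_MPS


for speed in (15, 25):
    print(f"{speed} m/s = {mps_to_mph(speed):.1f} mph")

# The 13 mph minimum sustained wind speed quoted earlier in the deck, in m/s.
print(f"13 mph = {mph_to_mps(13):.1f} m/s")
```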

10 of 29

Thank You!

Q & A

11 of 29

12 of 29

Where to Build a Wind Turbine?

Coastal Areas / The Ocean

Rounded Hills

Open Plains

> 50 - 60 % capacity factor

15 - 50 % capacity factor

13 of 29

Should You Build a Wind Turbine?

Cost to build a turbine

  • 2 MW turbine will cost $3 - $4 million to purchase and install

Expected Earnings

  • Power Purchase Agreements range from $0.03 - $0.10 per kWh
  • For a 2 MW turbine operating at a 25% capacity factor, revenue ranges from $131,400 to $438,000 per year

Return on Investment

  • Turbine lifespan is 20 - 25 years
  • Assume 10 - 15 years to pay off with no subsidies
  • Income of $1M - $5M over ten year period after pay off.
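
The revenue range above follows from rated power, capacity factor, and the PPA price. A minimal sketch of that arithmetic; it ignores operating costs, financing, and subsidies, and the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours


def annual_revenue_usd(rated_mw, capacity_factor, price_per_kwh):
    """Yearly revenue for a turbine selling all of its output at a flat PPA price."""
    energy_kwh = rated_mw * 1000 * HOURS_PER_YEAR * capacity_factor
    return energy_kwh * price_per_kwh


# 2 MW turbine at a 25% capacity factor, at the low and high PPA prices from the slide.
low = annual_revenue_usd(2, 0.25, 0.03)
high = annual_revenue_usd(2, 0.25, 0.10)
print(f"${low:,.0f} to ${high:,.0f} per year")
```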

14 of 29

Wind: A Growth Industry?

Government Subsidies

  • $163.9 billion provided to companies in the form of federal loans
  • $9.4 billion federal grants and tax credits
  • $2.9 billion provided by state and local governments

Consumer Demand

  • Non-utility customers like AT&T, Walmart, ExxonMobil and Shell Energy purchased a record 4,203 MW of wind power capacity in 2018 through power purchase agreements (PPA)

Climate Change and Energy Independence

  • Renewable energy reduces carbon emissions
  • Wind power usage reduces reliance on imported energy

15 of 29

Turbine Spacing

GE 2 megawatt onshore model, rotor diameter D = 433 feet

Spacing shown in multiples of D: 1D, 4D, 10D

1 - 1.5 acres of land per turbine
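
The spacing multiples can be turned into distances directly. A small sketch, assuming D is the rotor diameter; which multiple applies in which direction is not stated on the slide, so none is assumed here.

```python
ROTOR_DIAMETER_FT = 433   # D from the slide
METERS_PER_FOOT = 0.3048  # exact by definition


def spacing_ft(multiple, diameter_ft=ROTOR_DIAMETER_FT):
    """Distance in feet for a spacing given as a multiple of the diameter D."""
    return multiple * diameter_ft


def feet_to_m(feet):
    """Convert feet to metres."""
    return feet * METERS_PER_FOOT


for multiple in (1, 4, 10):
    feet = spacing_ft(multiple)
    print(f"{multiple}D = {feet} ft = {feet_to_m(feet):.0f} m")
```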

16 of 29

How Do Turbines Work?

Taller towers, steadier winds

Larger blades capture more energy

P = π/2 * r² * v³ * ρ * η

(r = rotor radius, v = wind speed, ρ = air density, η = efficiency)

One watt: 1 W = 1 kg * m² / s³
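
A short sketch of the power formula above. The default air density (1.225 kg/m³, a standard sea-level value) and the default efficiency (0.35) are assumptions for illustration, not figures from the deck, and the formula does not model the cap at a turbine's rated power.

```python
import math


def wind_power_watts(radius_m, wind_speed_mps, air_density=1.225, efficiency=0.35):
    """P = pi/2 * r^2 * v^3 * rho * eta, in watts.

    air_density and efficiency defaults are assumed values for illustration.
    """
    return (math.pi / 2) * radius_m**2 * wind_speed_mps**3 * air_density * efficiency


# Rotor radius for a 433 ft diameter rotor, converted to metres.
radius_m = 433 * 0.3048 / 2

# Power scales with the cube of wind speed: doubling v gives 8x the power.
ratio = wind_power_watts(radius_m, 12) / wind_power_watts(radius_m, 6)
print(f"power ratio for 12 m/s vs 6 m/s: {ratio:.1f}")
```

The cubic dependence on v is why small differences in sustained wind speed matter so much when choosing a site.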

17 of 29

Turbine Power Generation

1 megawatt or 1 MW = one million watts

Energy is measured in MWh or kWh (power x time)

Ideal wind speed range @ max power (2 MW) = 27 to 55 mph (12 to 24 m/s)

Ideal: 30 - 40%   Reality: 15 - 30%   Betz Limit (Best): 59%

Assuming 25% capacity factor

2 MW x 365 days x 24 hours x 25% = 4,380 MWh or 4,380,000 kWh per year
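
The annual energy calculation above, written as a pair of helper functions. The names are illustrative; the second function simply inverts the first to recover a capacity factor from measured output.

```python
def annual_energy_mwh(rated_mw, capacity_factor, days=365):
    """Energy produced in a year, in MWh, at a given capacity factor."""
    return rated_mw * days * 24 * capacity_factor


def capacity_factor(actual_mwh, rated_mw, days=365):
    """Fraction of the theoretical maximum energy that was actually produced."""
    return actual_mwh / (rated_mw * days * 24)


energy_mwh = annual_energy_mwh(2, 0.25)
print(f"{energy_mwh:,.0f} MWh = {energy_mwh * 1000:,.0f} kWh per year")
```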

18 of 29

HDF5 -> CSV

50 Tbytes HDF5 (2007 - 2013): 37 attributes, 7 years every hour, 2 x 2 km

4 Tbytes HDF5: 37 attributes, 12 months of data, 2 x 2 km

45 Mbyte CSV time slice: 1 attribute, 4,765,950 data points

19 of 29

Data Transformation

Timeslice example data at t = 0:

        x0     x1     ...    xn
y0      v00    v01    ...    v0n
y1      v10    v11    ...    v1n
ym      vm0    vm1    ...    vmn

Transformation to get statistics over y, x, t values:

y      x      v      t
0      0      v00    0
0      1      v01    1
0      2      v02    2
m      2      vm2    3

m = 1602 x n = 2975

data sampled at 2 km x 2 km resolution

20 seconds per column per 117 files with Spark processing
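
The grid-to-table transformation can be sketched in plain Python. This is a stand-in for the Spark job, not the pipeline's code; the function names and the toy values are hypothetical. Each cell of a timeslice becomes one (y, x, v, t) row, after which statistics such as the top speed per grid cell are a simple aggregation.

```python
def grid_to_rows(grid, t):
    """Flatten a 2D grid (grid[y][x] = v) into a list of (y, x, v, t) tuples."""
    return [
        (y, x, v, t)
        for y, row in enumerate(grid)
        for x, v in enumerate(row)
    ]


def max_speed_by_cell(rows):
    """Return {(y, x): highest v seen across all time indices}."""
    best = {}
    for y, x, v, _t in rows:
        key = (y, x)
        if key not in best or v > best[key]:
            best[key] = v
    return best


# Two made-up 2 x 2 timeslices at t = 0 and t = 1.
slice_t0 = [[4.7, 4.3], [5.1, 4.9]]
slice_t1 = [[5.4, 4.0], [5.0, 5.2]]

rows = grid_to_rows(slice_t0, 0) + grid_to_rows(slice_t1, 1)
top_speeds = max_speed_by_cell(rows)
print(rows[:2])
print(top_speeds)
```

The long format has one row per grid point per time step, which is why the row count grows to hundreds of millions at full resolution.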

20 of 29

Map y, x -> lat, long and t -> datetime

Map t index -> datetime: add column to dataframe in Spark with datetime stamp

Map y, x index -> lat, long: add lat, long coordinates after calculating top speeds for efficiency

y      x      v         datetime    lat    long
0      0      v000      MMDDYYYY
0      0      v001      MMDDYYYY
..     ..     ..        ..
m      n      vmn792    MMDDYYYY

Add columns (lat, long) in PostgreSQL after top wind speeds are tabulated
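
A minimal sketch of the t index -> datetime mapping, assuming t = 0 corresponds to 2007-01-01 00:00 and each step is one hour (the deck's stated time resolution). The function name and the output format string are illustrative; the lat, long mapping is not shown because it depends on the dataset's coordinate table.

```python
from datetime import datetime, timedelta

# Assumption: the first record (t = 0) is at midnight on 1 January 2007.
START = datetime(2007, 1, 1)


def t_index_to_datetime(t, start=START):
    """Map an hourly time index to a datetime."""
    return start + timedelta(hours=t)


for t in (0, 1, 61367):
    print(t, t_index_to_datetime(t).strftime("%Y%m%d %H:%M:%S"))
```

Because the mapping is a pure function of t, it can be applied as a derived column at any stage, which fits the slide's point about adding columns late for efficiency.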

21 of 29

Throughput

Ingestion -> Processing -> Results -> Visualization

  • ~5 minutes per timeslice to extract full-resolution wind grid
  • 1.6 hours at 20 km resolution
  • 117 time slices, ~5 Gbyte
  • 3 Gbyte database
  • ~ minutes to perform queries and convert y, x to lat, long

22 of 29

Downsample to improve query performance

Downsampling (2976 columns -> 297 columns):

       0      1      ...    9      ...    2975
0      v00    v01    ...    v09    ...    v02975
1      v10    v11    ...    v19    ...    v12975

~20 seconds per transformation

  • 16.4 hours to transform and load 117 files into PostgreSQL table
    • 2976 columns
    • 2 km x 2 km resolution
    • 558M rows
    • 28.3 Gbyte table

  • 1.6 hours to transform and load 117 files into PostgreSQL table
    • 297 columns
    • 2 km x 20 km resolution
    • 55.8M rows
    • 2.8 Gbyte table
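
A sketch of column downsampling and of the run-time estimate it implies. The stride of 10 is an assumption inferred from the drop from 2976 to 297 columns, and the 20 seconds per column figure is taken from the deck; the function names are illustrative.

```python
def downsample_columns(grid, step=10):
    """Keep every `step`-th column of a 2D grid given as a list of rows."""
    return [row[::step] for row in grid]


def estimated_hours(n_columns, seconds_per_column=20):
    """Total processing time if each column takes a fixed number of seconds."""
    return n_columns * seconds_per_column / 3600


# Toy grid: 2 rows x 25 columns.
grid = [list(range(25)), list(range(100, 125))]
print(downsample_columns(grid))

for n_columns in (2976, 297):
    print(f"{n_columns} columns: about {estimated_hours(n_columns):.2f} hours")
```

Since processing time is proportional to the number of columns, cutting columns by a factor of ten cuts the run time by the same factor, at the cost of coarser resolution in one direction.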

23 of 29

Cluster: 1 driver and 3 workers, each m4.2xlarge (8 vCPU, 32 Gbyte memory per machine)

  • 16.4 hours to transform 117 .csv files (1602 x 2975 wind speed arrays) to a y, x, v, t table with 558M rows at 2 km x 2 km resolution (2976 queries), 28.3 Gbyte database

  • 1.6 hours to transform 117 arrays to a y, x, v, t table with 55.8M rows at 2 km x 20 km resolution (297 queries), 2.8 Gbyte database

  • ~20 seconds per query (reducing the data set from 2976 columns to 297 reduces processing time by a factor of 10)

24 of 29

Future Work

  • Investigate Spark MLlib matrix math capabilities
  • Ingest HDF5 files directly into Spark via NumPy
  • Benchmark HDF5 “traditional” cluster computing
  • Benchmark Parquet file processing
  • Scale up processing with additional attributes
  • High resolution analysis of target locations (e.g. states with an underdeveloped wind power industry like PA, West Virginia, Ohio)
  • Collaboration with industry experts to better understand specific data challenges related to finding wind resources

25 of 29

Data Source

Grid indices: x (0 to 2976), y (0 to 1602), t (0 to 61368)

1602 x 2976 x 61368 = 292,575,131,136 values

x 4 bytes per value = 1,170,300,524,544 bytes for 1 attribute (32-bit floating point)

x 37 attributes = 43.3 Tbytes
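
The size estimate above can be reproduced step by step. A minimal sketch, assuming 4 bytes per value (32-bit floats, which is what the per-attribute byte count implies) and decimal terabytes.

```python
NX, NY, NT = 2976, 1602, 61368  # grid and time extents from the slide
BYTES_PER_VALUE = 4             # assumption: 32-bit floating point
N_ATTRIBUTES = 37

values_per_attribute = NX * NY * NT
bytes_per_attribute = values_per_attribute * BYTES_PER_VALUE
total_bytes = bytes_per_attribute * N_ATTRIBUTES

print(f"{values_per_attribute:,} values per attribute")
print(f"{bytes_per_attribute:,} bytes per attribute")
print(f"{total_bytes / 1e12:.1f} Tbytes for all attributes")
```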

26 of 29

Month      Number of Files Processed
Jan        23
Feb        9
March      9
April      9
May        9
June       9
July       9
August     9
Sept       9
Oct        4
Nov        14
Dec        4

Full resolution (2 km x 2 km) Spark processing for 117 files = 16.5 hours

2 km x 20 km resolution Spark processing for 117 files = 1.65 hours

1 year = 8,760 hours (files)

Downsample to every other day, every other hour: 1 year = 2,190 hours

Every third day, every third hour: 1 year = 973.3 hours

Every fourth day, every fourth hour: 1 year = 547.5 hours

2 km x 20 km resolution Spark processing for 117 files ~ 5 hours
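
The temporal downsampling counts above come from dividing the hours in a year by the product of the two sampling strides. A small sketch; the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hourly files per year


def files_per_year(every_nth_day, every_nth_hour):
    """Hourly files left in a year after keeping every n-th day and every n-th hour."""
    return HOURS_PER_YEAR / (every_nth_day * every_nth_hour)


for n in (1, 2, 3, 4):
    print(f"every {n} day(s), every {n} hour(s): {files_per_year(n, n):.1f} files")
```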

27 of 29

Time index   Datetime             Location (long, lat)   windspeed_100m
0            20070101 00:00:00    -108.918,36.314        4.712059
0            20070101 00:00:00    -107.291,36.513        4.2878494
0            20070101 00:00:00    -106.201,36.631        4.428238
...
1            20070101 01:00:00    -108.918,36.314        5.426193
1            20070101 01:00:00    -107.291,36.513        4.7944565
1            20070101 01:00:00    -106.201,36.631        4.086426

28 of 29

29 of 29

Common Weather Data Formats

GRIB1: GRIdded Binary (Edition 1), World Meteorological Organization

GRIB2: GRIdded Binary (Edition 2), World Meteorological Organization

netCDF3: Network Common Data Form, (Version 3.x), Unidata (UCAR/NCAR)

netCDF4: Network Common Data Form, (Version 4.x), Unidata (UCAR/NCAR)

HDF4: Hierarchical Data Format, (Version 4.x), NCSA/NASA

HDF4-EOS2: HDF4-Earth Observing System, (Version 2; georeferenced data)

HDF5: Hierarchical Data Format, (Version 5.x), NCSA/NASA

HDF5-EOS5: HDF5-Earth Observing System, (Version 5; georeferenced data)

GeoTIFF: Georeferenced raster imagery
