LEM data flow
ESA WorldCover tile
File type: GeoTIFF
Dims:
Location: AWS public URL
EO data
File type: GeoTIFF
Dims: 256 x 256 x 18 b x 24 m
Location: gs://lem-tifs/tifs
Spec: 3040 files at 70-90 MB
Total: 220.06 GB
Notes:
The entire tile is a polygon, not a box (e.g. NaN ocean data is skipped).
256 x 256 dimensions must be used so that each file fits in the tif-to-np cloud function's memory.
Sampled EO data
File type: npy
Dims: 18 b x 24 m
Source: tif-to-np cloud function
Location: gs://lem-assets/npy
Spec: 1,145,889 files at 3.5 KB
Total: 3.82 GB
Notes:
Npy files are sampled every 10 pixels. Why is the total size reduction nearly 60x?
Theories: NaNs + lighter encoding + cloud function scaling, float32
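The size question in the notes above can be checked with a few lines of arithmetic. The sketch below is only an illustration, not part of the pipeline: the function names are hypothetical, and it assumes "every 10 pixels" means a stride of 10 along one or both spatial axes of a 256 x 256 tile. It ignores NaN skipping and any change of dtype or encoding.

```python
import math


def observed_reduction(source_gb, sampled_gb):
    """Ratio between the GeoTIFF total and the npy total."""
    return source_gb / sampled_gb


def samples_per_axis(size, stride):
    """Number of pixels kept along one axis when taking every `stride`-th pixel."""
    return math.ceil(size / stride)


def expected_reduction(size, stride, axes=2):
    """Reduction from striding alone (assumption: no NaN or dtype effects)."""
    kept = samples_per_axis(size, stride) ** axes
    total = size ** axes
    return total / kept


# Totals taken from the slide: 220.06 GB of GeoTIFFs, 3.82 GB of npy files.
observed = observed_reduction(220.06, 3.82)
one_axis = expected_reduction(256, 10, axes=1)
two_axes = expected_reduction(256, 10, axes=2)
```

Under these assumptions the observed ratio (about 57.6x) falls between the one-axis estimate (about 9.8x) and the two-axis estimate (about 96.9x), so the stride alone does not account for it.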
WebDataset shard
File type: tar
Source: Cloud VM
Location: gs://lem-assets/tars
Spec: 1 tar file at 4.4 GB (within the recommended 1-10 GB range)
Notes:
The time it takes to download the data is frustrating given how little data there is.
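For reference, a WebDataset shard is a plain tar archive in which members sharing a basename form one sample. The sketch below shows one possible way to do the packaging step with Python's standard `tarfile` module. It is not the Cloud VM script itself: `write_shard`, `list_shard`, and the key naming are hypothetical, and the payloads are assumed to be the raw bytes of already-saved npy files.

```python
import io
import tarfile


def write_shard(shard_path, samples):
    """Write (key, npy_bytes) pairs into one uncompressed tar shard.

    Each member is named "<key>.npy" so that a WebDataset-style reader
    can group members into samples by their shared basename.
    """
    count = 0
    with tarfile.open(shard_path, "w") as tar:
        for key, payload in samples:
            info = tarfile.TarInfo(name=f"{key}.npy")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
            count += 1
    return count


def list_shard(shard_path):
    """Return member names in archive order, for a quick sanity check."""
    with tarfile.open(shard_path, "r") as tar:
        return tar.getnames()
```

Writing members in sorted key order keeps the archive deterministic, which makes it easier to compare shards between runs.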
Training
Earth Engine [fetching] (~24 hrs)
Tif-to-np function [sampling] (mins)
Cloud VM script [packaging] (~3 hrs)
LEM data flow
ESA WorldCover tile
File type: GeoTIFF
Dims:
Location: AWS public URL
EO data
File type: GeoTIFF
Dims: 256 x 256 x 18 b x 24 m
Location: gs://lem-tifs/tifs
Spec: 3040 files at 70-90 MB
Total: 220.06 GB
Notes:
The entire tile is a polygon, not a box (e.g. NaN ocean data is skipped).
256 x 256 dimensions must be used so that each file fits in the tif-to-np cloud function's memory.
Sampled EO data
File type: npy
Dims: 600 x 18 b x 24 m
Source: tif-to-np cloud function
Location: gs://lem-assets/npy
Spec: 1884 files at 2 MB
Total: 3.82 GB
Notes:
Npy files are sampled every 10 pixels. Why is the total size reduction nearly 60x?
Theories: NaNs + lighter encoding + cloud function scaling, float32
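One way to test the float32 theory is to compare the listed file size with the raw payload a 600 x 18 x 24 array would occupy at different precisions. The sketch below is a back-of-the-envelope check only: `npy_payload_bytes` is a hypothetical helper, and it ignores the small npy header.

```python
import math


def npy_payload_bytes(shape, bytes_per_value):
    """Raw array payload in bytes (assumption: npy header excluded)."""
    return math.prod(shape) * bytes_per_value


# Shape taken from the slide: 600 samples x 18 b x 24 m per file.
SHAPE = (600, 18, 24)

float32_bytes = npy_payload_bytes(SHAPE, 4)  # 4 bytes per float32 value
float64_bytes = npy_payload_bytes(SHAPE, 8)  # 8 bytes per float64 value
```

Under these assumptions a float64 payload (about 2.07 MB) is closer to the listed 2 MB than a float32 payload (about 1.04 MB). Whether the files are actually float64 would need to be confirmed by inspecting one of them.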
WebDataset shard
File type: tar
Source: Cloud VM
Location: gs://lem-assets/tars
Spec: 1 tar file at 3.82 GB (within the recommended 1-10 GB range)
Training
Earth Engine [fetching] (~24 hrs)
Tif-to-np function [sampling] (mins)
Cloud VM script [packaging] (mins)
LEM data flow
2022-11-28
[Diagram: Training; nodes 0-5; S1_S2_ERA5_STRM (EEPipeline); WorldCover2020 (EEPipeline)]