1 of 20

[Timeline figure, 2009 to 2019: Origin (V. Guemas); On CRAN; v2.8.5]

R Tools

THREDDS /

2 of 20

# Define start dates
# mth is the start month; "11" (November) matches the case described below
mth <- "11"
start <- as.Date(paste(1992, mth, "01", sep = ""), "%Y%m%d")
end <- as.Date(paste(2012, mth, "01", sep = ""), "%Y%m%d")
dateseq <- format(seq(start, end, by = "year"), "%Y%m%d")

# Load it
# glosea5 is assumed to have been defined earlier as a dataset description
library(s2dverification)
data <- Load(var = 'tas',
             exp = list(glosea5, list(name = 'ecmwf/system4_m1'),
                        list(name = 'meteofrance/system5_m1')),
             obs = 'erainterim',
             sdates = dateseq, leadtimemin = 2, leadtimemax = 4,
             lonmin = -15, lonmax = 45, latmin = 25, latmax = 50,
             storefreq = "monthly", sampleperiod = 1, nmember = 9,
             output = "lonlat", method = "bilinear",
             grid = 'r256x128')

Loaded into RAM:

data$mod: dataset {3}, member {9}, sdate {21}, ftime {3}, lat {18}, lon {43}

data$obs: dataset {3}, member {1}, sdate {21}, ftime {3}, lat {18}, lon {43}

Load monthly 2-metre air temperature for DJF over Europe from the ECMWF, GloSea5 and Météo-France experiments (9 ensemble members each) and from the ERA-Interim reanalysis, for 1 November start dates from 1992 to 2012.
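As a quick check of the request above, the following base R sketch counts the start dates and lists the calendar months selected by lead times 2 to 4. It assumes that lead time 1 corresponds to the start month (November), which is the reading consistent with the DJF description; the variable names lead_dates and lead_months are illustrative only.

```r
# Start dates: 1 November of each year from 1992 to 2012
mth <- "11"
start <- as.Date(paste(1992, mth, "01", sep = ""), "%Y%m%d")
end <- as.Date(paste(2012, mth, "01", sep = ""), "%Y%m%d")
dateseq <- format(seq(start, end, by = "year"), "%Y%m%d")

# Number of start dates, which matches sdate {21}
n_sdates <- length(dateseq)

# Assumption: lead time 1 is the start month, so leads 2 to 4 are the
# three months that follow it
lead_dates <- seq(start, by = "month", length.out = 4)
lead_months <- format(lead_dates[2:4], "%m")

print(n_sdates)
print(lead_months)
```

Under that assumption the three selected months are December, January and February, which is why ftime has length 3.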

3 of 20

BigData issues

Computing time can rise to several hours for score computation or data retrieval.

In some cases the data involved far exceed the available main memory and hang the machine.

Example case: dataset {2}, member {9}, sdate {27}, ftime {3}, lat {17}, lon {39}; 7.7 Mbyte

Usual case: dataset {5}, member {9}, sdate {27}, ftime {60}, lat {73}, lon {144}; 6.1 Gbyte

Big case: dataset {1}, member {50}, sdate {36}, ftime {120}, lat {144}, lon {288}; 71.6 Gbyte
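The sizes quoted for the three cases can be reproduced with a short calculation. This is a minimal sketch assuming each value is stored as an 8-byte double (R's default numeric type) and that Mbyte and Gbyte are decimal units; the helper array_bytes() is introduced here for illustration only.

```r
# Bytes needed to hold an array with the given dimensions,
# assuming 8 bytes per value (double precision)
array_bytes <- function(dims, bytes_per_value = 8) {
  prod(dims) * bytes_per_value
}

cases <- list(
  example = c(dataset = 2, member = 9, sdate = 27, ftime = 3, lat = 17, lon = 39),
  usual = c(dataset = 5, member = 9, sdate = 27, ftime = 60, lat = 73, lon = 144),
  big = c(dataset = 1, member = 50, sdate = 36, ftime = 120, lat = 144, lon = 288)
)

sizes <- sapply(cases, array_bytes)

# Sizes in Mbyte (10^6 bytes) and Gbyte (10^9 bytes)
print(sizes[["example"]] / 1e6)
print(sizes[["usual"]] / 1e9)
print(sizes[["big"]] / 1e9)
```

Under these assumptions the results agree with the quoted figures to within rounding, and the big case alone exceeds the main memory of a typical workstation.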

Solution

startR

retrieve data and parallel distributed processing

4 of 20

startR

retrieve data and parallel distributed processing

Top features:

  • It loads data at any time frequency (e.g., 6-hourly, daily, weekly, monthly...).
  • Start() accepts user-defined transformation or reordering functions that preprocess the data before the analysis.
  • It is not bound to a specific file format. Currently, it includes an interface function for the NetCDF format.
  • Parallelization on multiple nodes allows loading data and computing the analysis. The steps are:
    • Identify the amount of data.
    • Define the workflow:
      • the operation / analysis to perform;
      • the dimensions over which the analysis is performed.
    • Execute, indicating the HPC (if desired), the nodes and the dimensions along which to chunk the parallel computation.
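The chunking idea in the last feature can be illustrated in base R. This is only a sketch of the concept on synthetic data, not startR's implementation or API: an array is split along one dimension, the same operation is applied to each chunk independently, and the partial results are combined. The array layout (member x sdate x lon) and the helper ens_mean() are assumptions made for the illustration.

```r
# Synthetic array with dimensions member x sdate x lon
set.seed(1)
x <- array(rnorm(4 * 5 * 6), dim = c(4, 5, 6))

# Operation: ensemble mean, i.e. the mean over the member dimension
ens_mean <- function(a) apply(a, c(2, 3), mean)

# Split the lon indices into 3 chunks of 2 longitudes each
chunks <- split(seq_len(dim(x)[3]), rep(1:3, each = 2))

# Map: apply the operation to each chunk independently
# (each call only needs its own slice of the data in memory)
partial <- lapply(chunks, function(idx) ens_mean(x[, , idx, drop = FALSE]))

# Reduce: bind the partial results back together along lon
chunked <- do.call(cbind, unname(partial))

# The chunked result equals the result computed on the whole array
print(all.equal(chunked, ens_mean(x), check.attributes = FALSE))
```

Because each chunk is processed independently, the lapply() call could be replaced by a parallel or distributed equivalent without changing the result.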

startR is the only R tool tailored to the seasonal-to-decadal prediction framework for retrieving big data and performing parallel distributed computing.

5 of 20

startR

retrieve data and parallel distributed processing

CSTools

Climate Services Tools - MEDSCOPE Toolbox

MEDiterranean Services Chain based On Climate PrEdictions

THREDDS /

11 of 20

CSTools

Climate Services Tools

Top features:

  • Functions for downscaling precipitation (RainFARM), calibration, bias correction, multivariable RMSE, multi-model skill scores and quantile plotting, with more to come.

  • Documentation in vignette* format.

  • Development policy for contributors from different institutions:
    • Role definition: author / reviewer / coordinator.
    • Extended workflow, from identifying a feature to be included to its publication on CRAN.
    • Guidelines to ensure the quality, usability and interoperability of the functionalities: coding style, testing (preferably unit testing) and reviewing through a version control system and cloud services (GitLab), which allow discussion at every step of the process.

*A vignette is an instructive tutorial demonstrating practical uses of the software, with discussion of the interpretation of the results.
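To give a flavour of what a bias correction does, here is a minimal base R sketch of a simple mean-and-variance adjustment on synthetic data. It is an illustration of the general idea under stated assumptions, not necessarily the method CSTools implements; the function bias_correct() and the synthetic hindcast and observations are hypothetical.

```r
# Synthetic data: 9 members x 21 start dates of hindcasts, and matching
# observations; the hindcast is deliberately too warm and too variable
set.seed(42)
n_member <- 9
n_sdate <- 21
obs <- rnorm(n_sdate, mean = 10, sd = 2)
hcst <- matrix(rnorm(n_member * n_sdate, mean = 12, sd = 3), nrow = n_member)

# Rescale a forecast so that the hindcast mean and standard deviation
# match those of the observations
bias_correct <- function(fcst, hcst, obs) {
  (fcst - mean(hcst)) * sd(obs) / sd(as.vector(hcst)) + mean(obs)
}

# Applied to the hindcast itself, the corrected values reproduce the
# observed mean and standard deviation
corrected <- bias_correct(hcst, hcst, obs)
print(c(mean(corrected), mean(obs)))
print(c(sd(as.vector(corrected)), sd(obs)))
```

In practice the adjustment parameters would usually be estimated in cross-validation, leaving out the year being corrected, which this sketch omits for brevity.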

12 of 20

Summer 2020 or Long term

Winter-Spring 2020

Autumn 2019

Current (summer 2019)

startR 0.0.1 On CRAN

CSTools 1.0.1

CSTools 1.1.0

startR 0.1.3 On GitLab

startR 0.1.4 on CRAN

  • Documentation
  • Downscaling based on Analogs, Save, Calibration, multiEOFS

startR 0.1.5

  • Compatibility break
  • Rewrite all functions for N-dimensional arrays with named dimensions
  • Compatible with startR

s2dverification 3.0.0

s2dverification 2.8.6

  • Documentation in roxygen2 format
  • Bug fixes
  • PlotMatrix()

  • Interface with other file formats
  • Improvements in parallelization

s2dverification 3.1.0

  • Integration of new functionalities
  • Visualization
  • ….

CSTools 1.2.0

  • More functionalities

s2dverification 2.8.5

13 of 20

Summer 2021 or Long term

Winter-Spring 2021

Autumn-Winter 2020

Current (autumn 2020)

CSTools 3.0.1

CSTools 4.0.0

startR 2.1.0

  • New developments: regridding, ...
  • All functions promised
  • Fixes for the manuscript

CSTools

s2dverification 2.8.7

  • Fixes: Load, Clim,...
  • 2 years of support after the project. BSC is the copyright owner.
  • To be deprecated

startR 3.0.0

  • ++R workflow manager, multiple steps in AddStep

s2dverification 2.8.6

s2dv 1.0.1

  • Fixes: Clim,..
  • Add new functions from s2dverification
  • 6 new functions

  • Finish all the transformation of s2dverification functions

s2dv 1.0.2

CSTools 4.0.1

  • Fixes

s2dverification 2.8.8

  • Refine plotting function (use MapGenerator)

s2dv 2.0.0

startR 2.0.4

  • Fixes shm/dev, metadata ...

s2dv 0.0.1

startR 2.0.1

14 of 20

startR

retrieve data and parallel distributed processing

CSTools

Climate Services Tools

s2dverification

analysis and visualization

Forecast calibration, bias correction, statistical and stochastic downscaling, optimal forecast combination and multivariate verification, as well as basic and advanced tools to obtain tailored products.

Computation of statistics and skill scores against observations and visualisation of data and results.
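As an illustration of a skill score computed against observations, the sketch below evaluates an ensemble-mean forecast with the correlation, the root mean square error (RMSE) and an RMSE skill score relative to a climatological forecast. It uses base R and small made-up numbers; the helpers rmse() and rmsss() are hypothetical and are not the s2dverification functions.

```r
# Made-up observations for 5 start dates and a 3-member ensemble forecast
obs <- c(-1.0, 0.5, 1.5, -0.5, 2.0)
ens <- rbind(obs + 0.3,
             obs - 0.3,
             obs + 0.3 * c(1, -1, 1, -1, 1))

# Ensemble mean over the member dimension (rows)
ens_mean <- colMeans(ens)

# Root mean square error of a forecast against observations
rmse <- function(fcst, obs) sqrt(mean((fcst - obs)^2))

# RMSE skill score: 1 is a perfect forecast, 0 is no better than
# always forecasting the observed climatological mean
rmsss <- function(fcst, obs) {
  clim <- rep(mean(obs), length(obs))
  1 - rmse(fcst, obs) / rmse(clim, obs)
}

print(cor(ens_mean, obs))
print(rmse(ens_mean, obs))
print(rmsss(ens_mean, obs))
```

A score close to 1 indicates that the ensemble mean is much closer to the observations than the climatological reference.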

easyNCDF

multiApply

Automatic loading (from disk or store to RAM) and arrangement of multi-dimensional data sets.

Implements the MapReduce paradigm on HPCs in a way transparent to the user and specially oriented to complex multidimensional datasets.

Free download

Contact Us:

An-Chi Ho (Research Engineer)

an.ho@bsc.es

Nuria Pérez-Zanón (Postdoctoral Researcher)

nuria.perez@bsc.es

There you will find vignettes, code, wiki, documentation, issue tracker and much more!

15 of 20

16 of 20

CSTools

Climate Services Tools

Example: PlotForecastPDF applied to three seasonal surface wind speed forecasts.

Example: PlotMostLikelyQuantileMap() of 10-m wind speed for ECMWF System 4 seasonal forecast for DJF 2016-2017.
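The quantity behind a most-likely-category map can be sketched in base R: for one grid point, count the fraction of ensemble members falling below, between and above two climatological tercile limits, then pick the category with the highest probability. The wind speed values, the tercile limits and the helper tercile_probs() are made up for illustration and are not part of CSTools.

```r
# Fraction of ensemble members in each tercile category
tercile_probs <- function(members, limits) {
  categories <- cut(members, breaks = c(-Inf, limits, Inf),
                    labels = c("below", "normal", "above"))
  counts <- table(categories)
  p <- as.numeric(counts) / length(members)
  names(p) <- names(counts)
  p
}

# Made-up 10-member wind speed forecast (m/s) at one grid point and
# made-up climatological tercile limits
members <- c(4.1, 5.3, 6.8, 7.2, 7.9, 8.4, 6.1, 7.5, 8.8, 5.9)
limits <- c(5.5, 7.0)

p <- tercile_probs(members, limits)
most_likely <- names(which.max(p))

print(p)
print(most_likely)
```

Repeating this at every grid point gives the field of most likely categories and their probabilities that such a map displays.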

17 of 20

18 of 20

Useful information

s2dverification

BSC-Earth GitLab (open project): https://earth.bsc.es/gitlab/es/s2dverification
  • Find wiki, vignettes, and issue tracker here

CRAN: https://CRAN.R-project.org/package=s2dverification
  • Find documentation here

CSTools

CRAN: https://CRAN.R-project.org/package=CSTools
  • Find the vignettes and documentation here

startR

BSC-Earth GitLab (open project): https://earth.bsc.es/gitlab/es/s2dverification
  • Find wiki, vignettes, and issue tracker here

Contact Us:

An-Chi Ho (Research Engineer)

an.ho@bsc.es

Nuria Pérez-Zanón (Postdoctoral Researcher)

nuria.perez@bsc.es

19 of 20

20 of 20