1 of 20

R Tools

Timeline (figure): 2009 to 2019 in two-year steps. Labels: Origin (V. Guemas); On CRAN; v2.8.5.

THREDDS /

2 of 20

> # Define start dates: 1 November of each year from 1992 to 2012
> mth <- "11"
> start <- as.Date(paste(1992, mth, "01", sep = ""), "%Y%m%d")
> end <- as.Date(paste(2012, mth, "01", sep = ""), "%Y%m%d")
> dateseq <- format(seq(start, end, by = "year"), "%Y%m%d")
>
> # Load it (glosea5 is a dataset specification defined elsewhere)
> data <- Load(var = 'tas',
+              exp = list(glosea5, list(name = 'ecmwf/system4_m1'),
+                         list(name = 'meteofrance/system5_m1')),
+              obs = 'erainterim',
+              sdates = dateseq, leadtimemin = 2, leadtimemax = 4,
+              lonmin = -15, lonmax = 45, latmin = 25, latmax = 50,
+              storefreq = "monthly", sampleperiod = 1, nmember = 9,
+              output = "lonlat", method = "bilinear",
+              grid = 'r256x128')

Resulting arrays (held in RAM):

data$mod: dataset {3}, member {9}, sdate {21}, ftime {3}, lat {18}, lon {43}
data$obs: dataset {3}, member {1}, sdate {21}, ftime {3}, lat {18}, lon {43}

This call picks monthly 2-metre air temperature in DJF over Europe from the ECMWF, GloSea5 and Météo-France experiments (9 ensemble members each) and from the ERA-Interim reanalysis, for 1 November start dates from 1992 to 2012.

3 of 20

BigData issues

Computing time can rise to several hours in score computation or data retrieval.

In some cases the data involved occupy far more than the available main memory and hang the machine.

Example case (7.7 Mbyte): dataset {2}, member {9}, sdate {27}, ftime {3}, lat {17}, lon {39}
Usual case (6.1 Gbyte): dataset {5}, member {9}, sdate {27}, ftime {60}, lat {73}, lon {144}
Big case (71.6 Gbyte): dataset {1}, member {50}, sdate {36}, ftime {120}, lat {144}, lon {288}

Solution: startR (retrieve data and parallel distributed processing)
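The sizes quoted for the three cases can be checked from the dimension lengths alone. The sketch below assumes each element is stored as an 8-byte double (the default numeric type in R) and uses decimal units (1 Mbyte = 10^6 bytes). The helper name array_bytes is introduced here for illustration only.

```r
# Bytes needed for an array of doubles (8 bytes per element),
# given the length of each of its dimensions.
array_bytes <- function(dims) prod(dims) * 8

# Dimension lengths of the three cases listed on the slide.
cases <- list(
  example = c(dataset = 2, member = 9,  sdate = 27, ftime = 3,   lat = 17,  lon = 39),
  usual   = c(dataset = 5, member = 9,  sdate = 27, ftime = 60,  lat = 73,  lon = 144),
  big     = c(dataset = 1, member = 50, sdate = 36, ftime = 120, lat = 144, lon = 288)
)

bytes <- sapply(cases, array_bytes)

# Sizes in Mbyte (10^6 bytes), rounded to one decimal.
print(round(bytes / 1e6, 1))
```

Under these assumptions the three cases come to roughly 7.7 Mbyte, 6.1 Gbyte and 71.7 Gbyte, in line with the figures quoted above. Any intermediate copy made during a computation adds to this footprint.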

4 of 20

startR

retrieve data and parallel distributed processing

Top features:

It loads data at any time frequency (e.g., 6-hourly, daily, weekly, monthly...).
Start() accepts user-defined transformation or reordering functions to preprocess the data before the analysis.
It is not bound to a specific file format. Currently, it includes interface functions for the NetCDF format.
Parallelization on multiple nodes allows loading data and computing the analysis in three steps:
  1. Identify the amount of data.
  2. Define the workflow: the operation or analysis, and the dimensions over which it is performed.
  3. Execute, indicating the HPC (if desired), the nodes and the dimensions along which to parallelize the computation (chunking).

startR is the only R tool tailored to the seasonal-to-decadal prediction framework for retrieving big data and performing parallel distributed computing.
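The chunking idea can be illustrated without startR. The following is a minimal base-R sketch, not startR's API: a toy array is split along one dimension (lat), an operation over another dimension (the ensemble mean over member) is applied to each chunk independently, and the pieces are reassembled. The array sizes, the function name ens_mean and the number of chunks are assumptions made for illustration. The chunks are processed sequentially here, whereas on an HPC each one could be sent to a different node.

```r
# Toy array with named dimensions (sizes are arbitrary).
dims <- c(member = 3, sdate = 4, lat = 6, lon = 8)
set.seed(1)
x <- array(rnorm(prod(dims)), dim = dims)

# Operation: ensemble mean, i.e. the average over the first ("member") dimension.
# Input: member x sdate x lat x lon. Output: sdate x lat x lon.
ens_mean <- function(a) apply(a, 2:4, mean)

# Chunking: split the latitude indices into two contiguous chunks.
n_chunks <- 2
lat_all <- seq_len(dims[["lat"]])
lat_idx <- split(lat_all, cut(lat_all, n_chunks, labels = FALSE))

# Each chunk is independent of the others, so it could be computed on a
# separate node; lapply() processes them one after another.
pieces <- lapply(lat_idx, function(i) ens_mean(x[, , i, , drop = FALSE]))

# Reassemble the partial results along the latitude dimension.
result <- array(NA_real_, dim = dims[c("sdate", "lat", "lon")])
for (k in seq_along(lat_idx)) {
  result[, lat_idx[[k]], ] <- pieces[[k]]
}

# The chunked result matches the one computed on the whole array at once.
stopifnot(isTRUE(all.equal(as.vector(result), as.vector(ens_mean(x)))))
```

Only one chunk needs to be in memory at a time, which is what makes cases larger than the available RAM tractable.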

5 of 20

startR

retrieve data and parallel distributed processing

CSTools

Climate Services Tools - MEDSCOPE Toolbox

MEDiterranean Services Chain based On Climate PrEdictions

THREDDS /

11 of 20

CSTools

Climate Services Tools

Top features:

Functions for downscaling precipitation (RainFARM), calibration, bias correction, multivariable RMSE, multi-model skill scores and quantile plotting, with more to come.

Documentation in vignette* format.

Development policy for contributors from different institutions:

Role definition: author / reviewer / coordinator.
Extended workflow, from detecting a feature to be included until its publication on CRAN.
Guidelines to ensure the quality, usability and interoperability of the functionalities: coding style, testing (preferably unit testing) and reviewing, using a version control system and cloud services (GitLab) that allow discussion at every step of the process.

*A vignette is an instructive tutorial demonstrating practical uses of the software, with discussion of the interpretation of the results.

12 of 20

Summer 2020 or Long term

Winter-Spring 2020

Autumn 2019

Current (summer 2019)

startR 0.0.1 On CRAN

CSTools 1.0.1

CSTools 1.1.0

startR 0.1.3 On GitLab

startR 0.1.4 on CRAN

Documentation

Downscaling based on Analogs, Save, Calibration, multiEOFS

startR 0.1.5

Compatibility break
Rewrite all functions for N-dimensional arrays with named dimensions
Compatible with startR

s2dverification 3.0.0

s2dverification 2.8.6

Documentation in roxygen2 format
Bug fixes
PlotMatrix()

Interface with other file formats
Improvements in parallelization

s2dverification 3.1.0

Integration of new functionalities
Visualization
….

CSTools 1.2.0

More functionalities

s2dverification 2.8.5

13 of 20

Summer 2021 or Long term

Winter-Spring 2021

Autumn-Winter 2020

Current (autumn 2020)

CSTools 3.0.1

CSTools 4.0.0

startR 2.1.0

New developments: regridding, ...

All functions promised
Fixes for the manuscript

CSTools

s2dverification 2.8.7

Fixes: Load, Clim,...

2 years of support after the project. BSC is the copyright owner.

To be deprecated

startR 3.0.0

++R workflow manager, multiple steps in AddStep

s2dverification 2.8.6

s2dv 1.0.1

Fixes: Clim,..
Add new functions from s2dverification
6 new functions

Finish all the transformation of s2dverification functions

s2dv 1.0.2

CSTools 4.0.1

Fixes

s2dverification 2.8.8

Refine plotting function (use MapGenerator)

s2dv 2.0.0

startR 2.0.4

Fixes shm/dev, metadata ...

s2dv 0.0.1

startR 2.0.1

14 of 20

startR: retrieve data and parallel distributed processing

Automatic loading (from disk or store to RAM) and arrangement of multi-dimensional data sets. Implements the MapReduce paradigm on HPCs in a way transparent to the user and specially oriented to complex multidimensional datasets.

CSTools: Climate Services Tools

Forecast calibration, bias correction, statistical and stochastic downscaling, optimal forecast combination and multivariate verification, as well as basic and advanced tools to obtain tailored products.

s2dverification: analysis and visualization

Computation of statistics and skill scores against observations, and visualisation of data and results.

easyNCDF

multiApply

Free download

https://earth.bsc.es/gitlab/es/

https://CRAN.R-project.org/

Find there vignettes, code, wiki, documentation, issue tracker and much more!

Contact Us:

An-Chi Ho (Research Engineer)
an.ho@bsc.es

Nuria Pérez-Zanón (Postdoctoral Researcher)
nuria.perez@bsc.es

15 of 20

16 of 20

CSTools

Climate Services Tools

Examples from CSTools:

PlotForecastPDF() applied to three seasonal surface wind speed forecasts. Each panel corresponds to a different start date. Each ensemble member (yellow circle) and the observation for that month (purple diamond) are drawn for each forecast. The probability of each tercile is shown as a differently coloured shading: above normal (brown), normal (grey) and below normal (blue), with its value given on the left axis. The probabilities of exceeding the 90th percentile (falling below the 10th percentile) are displayed with a red (blue) striped background. An asterisk marks the tercile with the highest probability.

PlotMostLikelyQuantileMap() of 10-m wind speed for the ECMWF System 4 seasonal forecast for DJF 2016-2017. The predictions were issued on 1 November 2016. The most likely category and its probability of occurrence (in percent) are shown. White indicates that the forecast probabilities are below 30% for all five categories. The reference dataset is ERA-Interim and the climatological period is 1981-2015.
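The tercile probabilities and the most likely category shown in such plots can be derived from an ensemble with a few lines of base R. The sketch below is illustrative only and does not reproduce the CSTools implementation: the climatological sample and the ensemble values are made-up numbers, and the tercile boundaries are taken with the default quantile() definition.

```r
# Hypothetical climatological sample (e.g. past seasonal means of wind speed).
clim <- c(4.1, 5.0, 5.6, 6.2, 4.8, 5.3, 6.9, 5.9, 4.4, 6.5, 5.1, 5.7)

# Hypothetical ensemble forecast with 9 members for one season.
ens <- c(6.3, 6.8, 5.9, 6.1, 7.0, 5.2, 6.6, 6.4, 4.9)

# Tercile boundaries of the climatology (lower and upper thirds).
bounds <- quantile(clim, probs = c(1/3, 2/3), names = FALSE)

# Assign each ensemble member to a tercile category.
category <- cut(ens, breaks = c(-Inf, bounds, Inf),
                labels = c("below normal", "normal", "above normal"))

# Probability of each category: fraction of members falling in it.
probs <- table(category) / length(ens)

# Category with the highest probability (the one an asterisk would mark).
most_likely <- names(which.max(probs))

print(round(probs, 2))
print(most_likely)
```

With these made-up values, seven of the nine members fall in the upper tercile, so "above normal" is the most likely category. A map version would repeat the same calculation at every grid point.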

17 of 20

18 of 20

Useful information

s2dverification

BSC-Earth GitLab -- Open project
https://earth.bsc.es/gitlab/es/s2dverification
* Find wiki, vignettes, and issue tracker here

CRAN
https://CRAN.R-project.org/package=s2dverification
* Find documentation here

CSTools

CRAN
https://CRAN.R-project.org/package=CSTools
* Find the vignettes and documentation here

startR

BSC-Earth GitLab -- Open project
https://earth.bsc.es/gitlab/es/s2dverification
* Find wiki, vignettes, and issue tracker here

MEDSCOPE

Website
https://www.medscope-project.eu/the-project/medscope-platform/

Contact Us:

An-Chi Ho (Research Engineer)
an.ho@bsc.es

Nuria Pérez-Zanón (Postdoctoral Researcher)
nuria.perez@bsc.es
