DIRISA: A National Data Infrastructure for Digital Humanities
DH-IGNITE
19 October 2022
About DIRISA & NICIS
DIRISA: A national initiative enabling and supporting data-driven research
“Researchers deposit, find and access relevant data in the DIRISA Data Commons. They share, reuse and combine data from other domains with their own research in new ways”
Core services
Networked resources
Skills & expertise
Computing Services (CHPC)
Networking Services (SANReN)
Data Services (DIRISA)
Data based research environments (Cloud)
Materials & Manuf.
Energy
Earth & Environment
Phy Sci & Eng.
Humans & Society
Health, Bio & Food
DIRISA Objectives and Activities
Build research data infrastructure
Develop skills and expertise
Advocate and coordinate
Strategic input
South African Research Data Commons
Authenticate DIRISA user
Research Data Management and Data Based Research Services
DShare
Register at DIRISA
Data Management Planning: DMP_SA Tool
Create data management plans: https://secure.dirisa.ac.za/SADMPTool/
“Data is the new gold”: National Investment in Data
SKA projected budget
€ 2 billion to 2020
€ 650 million for Phase 1
SA so far: R2 billion
“We should get more value from our investments in data” [DST Minister Pandor, 2016]
Data Connects Disciplines
Data Attribution
The Open (Research) Data Mindset
Data Access Model: Open by Default
Closed Shared Open
Internal access
Named access
Group-based access
Public access
Anyone
Personal Private Public
Small Medium Big
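The access levels listed above can be sketched as a small data structure. This is only an illustrative sketch, not DIRISA's implementation: the names `Access`, `Dataset` and `can_read` are hypothetical, and the pairing of "internal" with closed (owner only), "named" and "group-based" with shared, and "public" with open is an assumption read from the slide labels.

```python
from dataclasses import dataclass, field
from enum import Enum


class Access(Enum):
    # Assumed mapping of the slide's labels onto Closed / Shared / Open.
    INTERNAL = "internal"  # closed: treated here as owner only (assumption)
    NAMED = "named"        # shared: explicitly listed individuals
    GROUP = "group"        # shared: members of listed groups
    PUBLIC = "public"      # open: anyone


@dataclass
class Dataset:
    owner: str
    access: Access = Access.PUBLIC  # "open by default"
    named_users: set = field(default_factory=set)
    groups: set = field(default_factory=set)


def can_read(dataset, user=None, user_groups=frozenset()):
    """Return True if `user` (None means anonymous) may read `dataset`."""
    if dataset.access is Access.PUBLIC:
        return True
    if user is None:
        return False  # anonymous users only see public data
    if user == dataset.owner:
        return True
    if dataset.access is Access.NAMED:
        return user in dataset.named_users
    if dataset.access is Access.GROUP:
        return bool(dataset.groups & set(user_groups))
    return False  # INTERNAL: nobody but the owner
```

A dataset created without an explicit level, such as `Dataset(owner="alice")`, is public, which is the "open by default" idea; restricting it requires an explicit choice by the depositor.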
Thank you
Dr Anwar Vahed
NICIS – DIRISA
avahed@csir.ac.za
Data Ecosystem, Data Visibility
Well managed data
Funder
(Private, Public)
Publisher
(Profit, Non-profit)
Repository / Long-Term Archive
Data Steward / Data Manager
Researcher, Collector
Library
Research has changed
DIRISA Activities
South African National Data Commons
Tier 3 (Institutional)
Tier 2 (Regional/Thematic)
Tier 1 (National)
Tier 0 (Global)
CERN, SKA
ARDC (Australia)
Nectar
ANDS
JISC (UK)
EUDAT (EU)
NICIS
SANSA
SAEON
Ilifu
IDIA
H3ABioNet
Tier 1 Conceptual Architecture
40 PB
2 PB
Archival data & staging; DevOps
8 (16) PB
Active data: near real time interactive access
0.5 PB
Services & staging between DIRISA and CHPC storage systems
Storage Virtualisation Service
CHPC Lustre or POSIX storage systems
CHPC compute systems
* PB
Software defined storage hierarchy
iRODS
DIRISA cloud portal
High Level Architecture
Distributed Data Clouds Management (iRODS, OpenStack, Ceph, Resonant, …)
Deposit iRODS client
Data Cloud Interface
T2/3
Regional/Other
8 PB
2 PB WOS
Service and Portal Infrastructure
DEPOSIT | DISCOVERY | APPLICATION
DOI: SAFIRE RA…
DMP tool
RDM services
WebDAV
ORCID, Re3data…
Registries
Data Objects
Services
Users
Data Staging
CHPC
T2/3
T2/3
T2/3
40 PB
Collaborators
EUDAT
ARDC
UK DA
JISC
Data.gov
NIST
Hardware
Middleware
Service apps
In Conclusion
Incentive to produce award-worthy data sets
Set benchmark for good research practice
Improve research practice
More good quality data
More and more diverse research
Research Data Value Chain
Phenomena
Simulation models
Instruments, Sensors & Humans
Data collection tools
Research data repositories
Research Analyses & Visualisation
Innovation
Publication Provenance
Improving Return on Data
Research Ecosystems: cross- and multi-disciplinary research
RDM Services: harmonised data management
Federated Data Infrastructure: observations (models and measurements)
Skills and expertise
SAEON
SANSA
StatsSA
SUN
DIRISA
e-Research Environments
Data Management Services
National Data Infrastructure