Venkata Chandrasekhar Nainala (Chandu)
Friedrich Schiller University Jena
Update: January 2025
PUBLIC
PRIVATE / EMBARGO
nmrXiv
CURATION WORKFLOW
Curation workflow - Semi-automatic
Human-in-the-loop approach
RAW FILES
PROCESSED FILES ~ STANDARD FORMATS
LONG TERM ARCHIVAL
Version 3.0
Version 2.0
Version 1.0
WEB
API
AI/ML tools
CASE
SEARCH ENGINES
STEP 1: FILE UPLOAD
ARCHIVAL / PUBLISHING
STEP 2: ASSIGNMENTS & META-DATA
STEP 3: VALIDATION
STEP 1: FILE UPLOAD - Data ingestion & staging
STEP 2: ASSIGNMENTS & META-DATA - Metadata normalization
STEP 3: VALIDATION - Data validation
Validation failures: failure to meet any of the requirements (further human attention is required)
MIChI: Minimum Information about Chemical Investigation
Missing minimum info
File Integrity Checks
Checksums match
Missing files
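A minimal sketch of the file integrity checks, assuming the upload arrives with a manifest that maps relative paths to checksums. SHA-256, the function names and the Bruker-style file names in the demo are illustrative assumptions, not the nmrXiv implementation.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_integrity(root: Path, manifest: dict) -> dict:
    """Compare files under `root` with a {relative_path: checksum} manifest."""
    report = {"ok": [], "missing": [], "mismatch": []}
    for rel_path, expected in sorted(manifest.items()):
        target = root / rel_path
        if not target.is_file():
            report["missing"].append(rel_path)    # listed but never uploaded
        elif sha256_of(target) != expected:
            report["mismatch"].append(rel_path)   # uploaded but corrupt or altered
        else:
            report["ok"].append(rel_path)
    return report


def demo() -> dict:
    """Build a tiny upload with one good, one corrupt and one missing file."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "fid").write_bytes(b"raw signal")
        (root / "acqus").write_bytes(b"##TITLE= demo")
        manifest = {
            "fid": sha256_of(root / "fid"),
            "acqus": "0" * 64,        # deliberately wrong checksum
            "pdata/1/1r": "0" * 64,   # file that is not present
        }
        return check_integrity(root, manifest)


if __name__ == "__main__":
    print(demo())
```

Files that end up under "missing" or "mismatch" are the ones a curator would need to look at.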
Meta-Data Checks
Citation
Author
License
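The metadata checks could be sketched as a required-field test that flags a submission for human attention. The lowercase key names and the report layout are assumptions made for illustration; only the three fields (citation, author, license) come from the slide.

```python
# Required metadata fields taken from the slide; key spelling is assumed.
REQUIRED_FIELDS = ("citation", "author", "license")


def is_blank(value) -> bool:
    """Treat None, whitespace-only strings and empty containers as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, dict)):
        return len(value) == 0
    return False


def validate_metadata(record: dict) -> list:
    """Return the required fields that are absent or blank, in a fixed order."""
    return [field for field in REQUIRED_FIELDS if is_blank(record.get(field))]


def validation_report(record: dict) -> dict:
    """Summarise the check; any missing field requires human attention."""
    missing = validate_metadata(record)
    return {
        "passed": not missing,
        "missing": missing,
        "needs_human_attention": bool(missing),
    }


if __name__ == "__main__":
    print(validation_report({"citation": "", "author": ["A. Author"]}))
```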
nmrXiv
COMPOUND
Caffeine
Caffeine - Spectra
Caffeine - Metadata
nmrXiv
UPDATES
nmrXiv
NEW DATA SUBMISSION
Onboarding Screen (Primer)
Upload (Drag & Drop or File Browser)
Parallel uploads (strict validations & error tracking)
Missing & corrupt file checks
Auto-processing spectra
Auto-import metadata (compound information - .mol, .sdf, .nmredata.sdf)
Validation report
Multi-Spectra Views
Project or Independent Sample submissions
Ontology driven - Organism details (Part as well)
Samples Overview
Embargo mode - Team Sharing
Optimised processing - notifications
nmrXiv
SEARCH
MIChI Recommendations (Draft Version)
https://docs.google.com/spreadsheets/d/1MxCceGO3UUAvIn-GWxxgeOUFnR34A3ileqIW3sYgZNU/edit#gid=0
Advanced Search
Project
Sample
Assay(s)
Molecule(s)
Sample study
Spectral Dataset(s)
Collection
Structure
Search
RDKit Based
~ Exact
~ Substructure
~ Similarity
Structure Search
Results
~ Browse all samples reporting the compound in one view
~ Compare spectra from different samples (UI updates pending)
nmrXiv
BIOSCHEMAS METADATA STRUCTURE
BioSchemas Metadata Structure
Project
Repo
Study/Sample
Dataset/Spectrum
hasPart (1 => n)
hasPart (1 => n)
isPartOf (1)
isPartOf (1)
includedInDataCatalog (1)
Study
Study
ISA
ISA
Dataset
ISA
DataCatalog
ISA
Organization
CreativeWork
Person
publisher
author
citation
ChemicalSubstance: sample
about
MolecularEntity: molecules
hasBioChemEntityPart
'NMR solvent': 'NMR:1000330' | CDCl3
'acquisition nucleus': 'NMR:1400083' | 13C
Other than the measurementTechnique (url), everything is a PropertyValue. Dimension, probe, Temperature, frequency, field strength, number of scans, experiment…
variableMeasured
Properties: inChI, inChIKey, iupacName, molecularFormula, molecularWeight, smiles, mol (hasRepresentation), Percentage composition (description)
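A rough sketch of how one spectral dataset might be serialised as Bioschemas-style JSON-LD using the properties named on this slide. The nesting (Dataset isPartOf Study, about ChemicalSubstance, hasBioChemEntityPart MolecularEntity), the `propertyID` usage, the schema.org context and the caffeine values are assumptions for illustration, not the actual nmrXiv output.

```python
import json


def property_value(name: str, value: str, property_id: str = "") -> dict:
    """Wrap one measured parameter as a schema.org PropertyValue node."""
    node = {"@type": "PropertyValue", "name": name, "value": value}
    if property_id:
        node["propertyID"] = property_id  # ontology term, e.g. an NMR CV id
    return node


def build_dataset_jsonld() -> dict:
    """Assemble a Dataset node with its sample, molecule and parameters."""
    molecule = {
        "@type": "MolecularEntity",
        "name": "Caffeine",
        "molecularFormula": "C8H10N4O2",
        "smiles": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
    }
    sample = {
        "@type": "ChemicalSubstance",
        "name": "Caffeine sample",
        "hasBioChemEntityPart": [molecule],
    }
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": "Caffeine - Spectra",
        "isPartOf": {"@type": "Study", "name": "Caffeine"},
        "includedInDataCatalog": {"@type": "DataCatalog", "name": "nmrXiv"},
        "about": sample,
        "variableMeasured": [
            property_value("NMR solvent", "CDCl3", "NMR:1000330"),
            property_value("acquisition nucleus", "13C", "NMR:1400083"),
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_dataset_jsonld(), indent=2))
```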
nmrXiv Object
Bioschemas Type
Bioschemas Type
ARCHITECTURE
OVERVIEW
nmrXiv Core
integrations
Web Application
Application Database
------
NMR Database
Search
Importers /
Exporters
Cache
SPA
Front end
File format converters
AI/ML
Tools
API
Workflows
Plugins
OAuth ~ SSO / AAI
Schemas
File format converters
AI/ML
Tools
Workflows
Plugins
Schemas
File format conversions
File format converters
Prediction Service
AI/ML
HOSE Codes
Lookup tables
Prediction / CASE
Assignments
CASE
Software /
Prediction
nmrXiv
Repository
Private
Inhouse data
Prediction
Service
Assignments
AI/ML
Prediction
Model
AI/ML
Prediction
Model*
Assignments
ELN
A collection of microservices designed to simplify NMR data processing and analysis. nmrKit offers NMR prediction, validation, and depiction via the NMRium library, and format conversion using the nmr-load-save package, all accessible through a unified API.
FastAPI
CDK
RDKit
HOSE codes
ALATIS NS
nmr-load-save
lwreg
NMR
Processing/Format conversions
NMR Prediction / Training
Spectral Assignments Validation
Search / Depiction
PostgreSQL
Redis
MinIO
Grafana
Prometheus
NN-Models (TensorFlow)
Deep Learning Models (TensorFlow/PyTorch)
Data - So far…
ELN Integration
Chemotion - nmrXiv
ELN (Chemotion) - nmrXiv
Workflow
NMR Platform
New Roles
Manage Samples
Statistics
Add or update lab
Operators
Manage instruments
Statistics
Announcements
Any other roles??
Admin console access
Can access the admin console to manage the NMR platform from the nmrXiv interface when they log in
Users with the admin console access (NMR Platform) will have additional links in the dropdown on the top right corner.
Admin console options
Options to access NMR Platform in the User Admin Console
NMR Platform Management
NMR Platform Dashboard
Sample Management > Submission
nmrXiv University Private Page
Options to submit orders / Search etc.
Sample Management > Backend
Sample Details View
Options to select/upload spectra. Assignments and other meta-data
Samples overview
NMR Platform > Settings > Device Management
Manage devices (Add, Edit or Remove)
Quick links to other settings on the platform
This page evolves as we improve the platform integration
Metrics
Displays all metrics you would like to access - date range selection will be user controlled
Highlights - Recent release
Next steps (nmrKit)
Thank you
DATA REPOSITORIES
Electronic Lab Notebooks (ELNs)
NFDI4Chem
TERMINOLOGY
SERVICES
TS4NFDI Terminology Service Suite
nmrXiv
DATA STRUCTURE
Data life cycle / Versioning
https://github.com/ScienceObjectsDB/Documentation
Project 1
Project 2
Project 3
Study
Dataset
Dataset
Dataset
Dataset
Dataset
Dataset
Dataset
Sample
Assay
Spectra
M
Data Structure
Sample Study
Sample Study
https://github.com/ScienceObjectsDB/Documentation
Project 1
Project 2
Project 3
Sample Study
Sample
Assay(s)
Spectral Dataset(s)
Molecule(s)
Sample Study
Sample Study
Sample Study
Sample Study
Sample Study
Sample Study
Sample Study
Sample Study
Sample Study
Next steps
Release
nmrkit.nmrxiv.org
ONGOING DEVELOPMENT
Prediction service will be going live soon*
Subsequent releases should add file format conversions and other APIs
SERVER SIDE
BACKEND
Web application framework
Database
Cache
Search
Backend
Testing
CI/CD
Deployment / Testing (Maintenance)
Web application framework - PHP (MIT)
Database
Cache
Search
Testing
Selenium ~ Chrome Driver
Deployment & DevOps
Backend
SPECTRA VIEWER
DOCUMENTATION
DATA IMPORT & EXPORT
I/O: Meta Data Model - RO-Crate
Chemotion Integration - todo
Workflow
Bioschemas / DataCite
THANK YOU
nmrXiv
DATA SUBMISSION
(mockup based on feedback)
Next
Drag and Drop
Sample - 1
Sample - 2
Sample - 3
Sample - 4
Project > Study
User Authentication
STEP 1: FILE UPLOAD
nmrXiv Data Submission
Sample 1
Sample 3
Sample 2
DATA UPLOADED IN PREVIOUS STEP
STEP 2: SIMILARITY SEARCH AND SPECTRA - ATOM ASSIGNMENTS
nmrXiv Data Submission
ADD MOLECULE(S)
Instrument Data
Sample 1
PREV
NEXT
NMRShiftDB
SHERLOCK
NMRium AUTO Assignments
Mol 1 95%
Mol 2 03%
Mol 3 01%
Mol 4 -
Cancel
Next
STEP 3: META DATA (Minimum information requirements / validations will be implemented at this stage)
nmrXiv Data Submission
Next
SAMPLE DETAILS PROVIDED
Sample Preparation Protocol
Assay Protocol
Provenance
License Information*
Cancel
STEP 4: COMPLETE
nmrXiv Data Submission
CLOSE
SAMPLE DETAILS PROVIDED
ID0001 | ID0002 | ID0003 | ID0004 |
Data Set IDs
Release Date
Citation
Private
Public
Visibility
Author, 1., & Author, 2.. (2022). FAIR, consensus-driven NMR data repository and computational platform. The ultimate goal is to accelerate broader coordination and data sharing among natural product (NP) researchers by enabling storage, management, sharing and analysis of NMR data.
Download Zip
MD5 hashmap
Embed </>
Share
TEST SITE
Currently in pre-beta development stage
Meta Data Model
https://isa-tools.org/
ISA Limitations
(Repository perspective)
Need to extend beyond the ISA models and give end users full flexibility to define their own templates while remaining compliant with the ISA specifications.
Data/File Types
- CeNAPT data of 42 IMPS, some fully interpreted and with HiFSA profiles: https://dataverse.harvard.edu/dataverse/cenapt
- Data from all our publications with NMR data (39 papers, have not counted but should be 100-200 cpds) since 2015: https://dataverse.harvard.edu/dataverse/gfpuic
1H, HSQC, HMBC plus COSY, NOESY, 13C/APT
Any minimum requirements?
Any format conditions (zipped, un/processed, TopSpin/Xwinnmr, size)?
Data Conversions
Redis / RabbitMQ Queues (Jobs)
Python Job - Dispatcher REST
Web Service
Interacts with the Repository core for authentication/authorization, projects ~ storage details
nmrml2ISA
mzml2ISA
Bruker Converter
ML models
Analysis modules
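The queue and dispatcher arrangement listed here might look roughly like the sketch below. An in-memory `queue.Queue` stands in for Redis / RabbitMQ, and the job layout, handler names and return values are hypothetical placeholders rather than the real converters.

```python
import queue


def nmrml_to_isa(payload: str) -> str:
    """Placeholder for an nmrml2ISA conversion; returns a label only."""
    return f"ISA record for {payload}"


def bruker_convert(payload: str) -> str:
    """Placeholder for a Bruker converter; returns a label only."""
    return f"converted {payload}"


# Hypothetical mapping from job type to the function that handles it.
HANDLERS = {"nmrml2isa": nmrml_to_isa, "bruker": bruker_convert}


def dispatch_all(jobs: queue.Queue, handlers: dict) -> list:
    """Drain the queue, run each job's handler and collect a status per job."""
    results = []
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            break
        handler = handlers.get(job["type"])
        if handler is None:
            # Unknown job types are reported instead of silently dropped.
            detail = f"no handler for {job['type']}"
            results.append({"id": job["id"], "status": "rejected", "detail": detail})
        else:
            detail = handler(job["payload"])
            results.append({"id": job["id"], "status": "done", "detail": detail})
        jobs.task_done()
    return results


if __name__ == "__main__":
    pending = queue.Queue()
    pending.put({"id": 1, "type": "nmrml2isa", "payload": "sample-1.nmrML"})
    pending.put({"id": 2, "type": "raman", "payload": "sample-2"})
    print(dispatch_all(pending, HANDLERS))
```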
NMR Workflow
MS Workflow
Raman Workflow
Argo Workflows - https://argoproj.github.io/argo-workflows
GKE: https://medium.com/sysmap-labs/how-to-install-and-configure-argo-workflows-on-gke-9dde654c145e
Python SDK: https://github.com/argoproj/argo-workflows/tree/master/sdks/python
Considerations before choosing Argo: https://medium.com/datamindedbe/what-to-consider-before-choosing-argo-workflow-54f6067307a8
Argo Workflows is an open source container-native workflow engine for orchestrating parallel jobs on Kubernetes. Argo Workflows is implemented as a Kubernetes CRD (Custom Resource Definition).
- REST API
- S3 support
- Open source
- GUI
Interacts with the Python dispatcher via REST API
Every cluster job is run as a workflow. Some workflows have a single node (for example, converters).
Conversion between Argo YAML and CWL, in both directions, is under development.
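A single-node workflow of the kind mentioned above (a converter job) could be described by a manifest like the one built here. The `apiVersion`, `kind` and template layout follow the Argo Workflow resource; the image name, the `convert.py` script and the parameter name are hypothetical. A dispatcher would typically send such a manifest to the Argo Server REST API, which is not shown.

```python
import json


def converter_workflow(image: str, input_key: str) -> dict:
    """Build an Argo Workflow manifest with one container template."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        # generateName lets the cluster append a unique suffix per run.
        "metadata": {"generateName": "bruker-convert-"},
        "spec": {
            "entrypoint": "convert",
            "arguments": {
                "parameters": [{"name": "input-key", "value": input_key}]
            },
            "templates": [
                {
                    "name": "convert",
                    "inputs": {"parameters": [{"name": "input-key"}]},
                    "container": {
                        "image": image,                       # hypothetical image
                        "command": ["python", "convert.py"],  # hypothetical script
                        "args": ["{{inputs.parameters.input-key}}"],
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = converter_workflow(
        "example.org/bruker-converter:latest", "uploads/sample-1.zip"
    )
    print(json.dumps(manifest, indent=2))
```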
Data Conversions
https://github.com/NFDI4Chem/formaTAPIRest
Core Trust Seal
https://zenodo.org/record/3638211#.YOauSxNKhCU; Remark by Ti: Repositories with CTS are explicitly recommended in the author guidelines of some journals, e.g. Open Chemistry (De Gruyter).
ISO 16290:2013 (Technology Readiness Levels)
https://www.iso.org/standard/56064.html
DOI
https://support.datacite.org/docs/api-create-dois
DOI Registration Agencies: https://www.doi.org/RA_Coverage.html
Backend
Frame-work
Database
Server
nmrXiv
TECHNOLOGY STACK
Backend
Ruby (Ruby on Rails)
Vs
Python (Django)
Vs
PHP (Laravel)
Backend
Factors considered
EASY | NOT STRAIGHTFORWARD | YES |
VERY SCALABLE | VERY SCALABLE | SLOW |
Very good | Good | Good |
Good | Very good | Good |
Very good | Good | Good |
Concurrent requests | One request at a time | Yes with multi threading |
Gaining popularity | Okay | Declining |
YES | YES | NO |
https://trends.builtwith.com/framework/Laravel
https://trends.builtwith.com/framework/Ruby-on-Rails
https://trends.builtwith.com/framework/Django-Language
Backend - Trends
Backend
https://laravel.com/
Source: https://laravel.com and Wikipedia
Amazon S3-compatible server-side software storage stack. It can handle unstructured data such as photos, videos, log files, backups, and container images; the maximum supported object size is currently 5 TB.
Source: https://min.io and Wikipedia
Cache
Search
Instant search: https://www.meilisearch.com/
Big Data: ELK Stack: Elasticsearch, Logstash, Kibana
Cache: https://redis.io/
Status Page - Example Dropbox
Easily communicate real-time status to our users
Source: https://www.atlassian.com/software/statuspage and Wikipedia
Frontend
Frame-work
Bundler
Package manager
Frontend
https://vuejs.org/
Source: https://vuejs.org and Wikipedia
Mandate
Acquisition
Deposition
Processing
Distribution
Discovery
Archiving
Repurposing
Analysis
Color
Key
Journal, Institution, Funder
User
nmrXiv Platform
Authentication and Authorisation
Directory structure
BagIt is a set of hierarchical file layout conventions designed to support storage and transfer of digital content.
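As an illustration of the BagIt layout, the sketch below writes a minimal bag: a `bagit.txt` declaration, a `data/` payload directory and a payload manifest. The SHA-256 manifest and the sample file names are assumptions; how nmrXiv actually packages archives is not stated here.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir: Path, payload: dict) -> None:
    """Write payload files under data/ plus bagit.txt and a manifest."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for rel_path, content in sorted(payload.items()):
        target = data_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # Each manifest line is "<checksum>  <path relative to the bag root>".
        manifest_lines.append(f"{digest}  data/{rel_path}")
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )


def list_bag(bag_dir: Path) -> list:
    """Return all files in the bag as sorted POSIX-style relative paths."""
    return sorted(
        p.relative_to(bag_dir).as_posix() for p in bag_dir.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp) / "project-1"
        make_bag(bag, {"sample-1/fid": b"raw signal"})
        print(list_bag(bag))
```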
ISA formats
ISA Tab
ISA Json
DataCite
OpenAIRE
Bioschemas
IUPAC FAIRSpec
I/O: Data Schemas
Data Formats
Support all major instrument raw output file formats and open data formats.
Data Versioning
Versioning is natively built around data models at the repository level. In addition to that we support DOI Versioning: https://support.datacite.org/docs/versioning
Version numbers follow semantic versioning principles. A dataset can carry additional tags such as "stable", "current" or "dev" that link to a specific version; tags can be updated and queried separately.
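One way to picture versions plus movable tags is a small registry like the sketch below. The class, its method names and the three-part version format are assumptions for illustration, not the repository's data model.

```python
import re

# MAJOR.MINOR.PATCH, without pre-release or build metadata (simplified).
SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def parse(version: str) -> tuple:
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    match = SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a semantic version: {version!r}")
    return tuple(int(part) for part in match.groups())


class VersionRegistry:
    """Keeps released versions and named tags that point at one of them."""

    def __init__(self) -> None:
        self._versions = set()
        self._tags = {}

    def add_version(self, version: str) -> None:
        parse(version)  # validates the format, raises ValueError otherwise
        self._versions.add(version)

    def tag(self, name: str, version: str) -> None:
        """Create a tag or move an existing one to another version."""
        if version not in self._versions:
            raise KeyError(version)
        self._tags[name] = version

    def resolve(self, ref: str) -> str:
        """Accept either a tag name or an explicit version number."""
        if ref in self._tags:
            return self._tags[ref]
        if ref in self._versions:
            return ref
        raise KeyError(ref)

    def latest(self) -> str:
        return max(self._versions, key=parse)


if __name__ == "__main__":
    registry = VersionRegistry()
    for number in ("1.0.0", "1.2.0", "1.10.0"):
        registry.add_version(number)
    registry.tag("stable", "1.2.0")
    print(registry.resolve("stable"), registry.latest())
```

Comparing parsed tuples rather than strings is what keeps 1.10.0 ahead of 1.2.0.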
Ontologies
Giving (meta)data meaning with ontologies
Ontology-driven input fields and text areas not only provide a rich user experience but also capture rich metadata, ensuring machine readability
nmrXiv - Ontology component
Smart compose - Ontologies / Controlled Vocabulary driven
Source: https://vuejs.org and https://reactjs.org