Welcome!
Please feel free to ask any questions you may have during the tutorial.
1
Our Presenters:
Corey Cox
Sierra Moxon
Kevin Schaper
With help from
Sarah Gehrke!
github.com/sierra-moxon
github.com/kevinschaper
github.com/amc-corey-cox
github.com/sagehrke
These slides: http://bit.ly/4ofmxHW
Getting Started
Shared Google drive for this workshop: https://go.lbl.gov/ICBO-LinkML
These slides: https://go.lbl.gov/ICBO-LinkML-slides
GitHub repository for this tutorial:
https://github.com/linkml/linkml-tutorial-2025
2
Software prerequisites for tutorial
3
Learning Objectives
4
Code Of Conduct
LinkML Code of Conduct:
This Code of Conduct is adapted from the Contributor Covenant, version 1.4: https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
5
6
Time (ET) | Topic | Presenter |
15 mins | Introduction (slides) | Sierra |
20 mins | Section 1: Set up a LinkML repository | Sierra |
30 mins | Section 2: Authoring a LinkML model A. Model components, mappings B. Classes, Slots, Enumerations | Sierra |
5 mins | BREAK | |
20 mins | Section 3: Schema best practices, including linting A. linkml-run-examples, linkml-validate | Sierra |
10 mins | Section 4: Generating code from your model A. Pydantic, JSONSchema B. Generating documentation | Sierra |
5 mins | BREAK | |
30 mins | Section 5: Schemaview & Toolkit development A. SchemaView B. Schemabuilder & programmatic access to LinkML tooling. | Kevin |
20 mins | Section 6: LinkML Map | Corey |
10 mins | Wrap-up and Discussion | Corey |
Software prerequisites for tutorial
7
Introduction: Why LinkML?
8
https://app.sli.do/event/cefJ3D92AfEyboNj3KCuoQ
LinkML Tutorial
ICBO 2025
Sierra Moxon, Kevin Schaper, Corey Cox
Nov 5, 2025
Thank you!*
10
Thank you to all of our open source contributors and to the NIH, Wellcome Trust, and CZI Open Science organizations for funding so much of the work in this field.
* Special Thanks to Leo Baumgart for providing Plant Tissue Schema Examples!
Biological data is complex…
11
https://www.mdpi.com/journal/dna
Francie Rodriguez
What we usually start with…
Spreadsheets!
What we usually start with…
Classes?
Hmm…
Slots?
Colors and transposition!
What we usually start with…
Syntax specifics
What we usually start with…
Some are really detailed and have helpful examples!
What we usually start with…
16
17
Existing frameworks not designed for interop
Pacific Ocean Sample Dataset
Crater Lake Sample Dataset
CREATE TABLE lake_sample (
id varchar primary key,
depth foreign key,
location foreign key,
environment foreign key
…
)
CREATE TABLE biosample (
acc varchar primary key,
depth float,
lat float,
long float,
environment varchar
…
)
https://www.mdpi.com/journal/dna
We have many frameworks to structure data
SHACL
Semantic web developer
Developers
Data Scientist
Scientists, Clinicians, ..
SQL DDL
HDF5
Pandas CSVs
Excel
ISO-11179
CDEs
FHIR
Clinical
modeler
JSON Schema
Ontologist
OWL
Pydantic
ProtoBuf
GraphQL
ShEx
…interoperation is difficult
Omics Data
Phenotype / Clinical
Data or Environmental Data
insights
What can we do differently?
20
21
Pacific Ocean Sample Dataset
depth | species |
22 cm | p |
22 cm | g |
Crater Lake Sample Dataset
depth | species |
22 inches | x |
15 feet | x, p |
?
Lake Abert Sample Dataset
depth | species |
22 cm | x |
23 cm | x, y, z |
Example: biosample datasets
Can I compare microbiotic composition of bodies of water?
Can I compare the microbiotic composition of samples taken from the epipelagic zones?
Can I compare microbiotic composition from salt water samples of epipelagic zones?
22
Example: biosample datasets
Pacific Ocean Sample Dataset
depth | species | type |
22 cm | p | ocean |
22 cm | p | ocean |
Crater Lake Sample Dataset
depth | species | type |
22 inches | x | lake |
15 feet | x, p | lake |
?
Lake Abert Sample Dataset
depth | species | type |
22 cm | x | lake |
23 cm | x, y, z | lake |
Can I compare microbiotic composition of bodies of water?
Can I compare the microbiotic composition of samples taken from the epipelagic zones?
Can I compare microbiotic composition from salt water samples of epipelagic zones?
23
Common vocabularies are key
Pacific Ocean Sample Dataset
Crater Lake Sample Dataset
id | depth | species | type |
CL1 | 22 inches | x | ENVO:00000020 |
CL2 | 15 feet | x, p | ENVO:00000020 |
id | depth | species | type |
PO1 | 22 cm | p | ENVO:00000209 |
PO2 | 22 cm | p | ENVO:00000209 |
?
Can I compare microbiotic composition of bodies of water?
Can I compare the microbiotic composition of samples taken from the epipelagic zones?
Can I compare microbiotic composition from salt water samples?
water body
ENVO:00000063
is-a
lake ENVO:00000020
marine water body
ENVO:00001999
is-a
marine photic zone
ENVO:00000209
is-a, part_of
24
Pacific Ocean Sample Dataset
Crater Lake Sample Dataset
depth | species | type |
22 inches | x | ENVO:00000020 |
15 feet | x, p | ENVO:00000020 |
depth | species | type |
22 cm | p | ENVO:00001999 |
22 cm | g | ENVO:00001999 |
These are “standards” (and “models”), but they are not computable without a human. How do we enforce that our users pick a term from ENVO, and how do they know which term to pick?
?
Models hiding in plain sight
Data Integration, Collection, and Distribution
25
26
LinkML: Modeling Language & Toolkit
THE STANDARD
A meta-datamodel for structuring your data
TOOLS
Pragmatic developer and curator friendly tools for working with data
definition
Class
Slot
element
has 0..*
is_a 0..1
mixin 0..n
range 0..1
schema
imports 0..*
Validators
Data Converters
Compatibility tools
Data entry
Schema inference
27
Models hiding in plain sight
depth | salinity | bacteria | latitude | longitude | sample_type |
22 cm | 35 | x,p | 44.8084° N | 24.0632° W | ENVO:00001999 |
imports:
linkml:types
classes:
Sample:
description: a sample of biological material.
attributes:
depth:
slot_uri: ENVO:3100031
salinity:
exact_mappings:
- PATO:0085001
bacteria:
multivalued: true
latitude:
range: string
longitude:
sample_type:
required: true
range: SampleType
enums:
SampleType:
reachable_from:
source_ontology: obo:envo
This example makes an enumeration with all values from ENVO, but lots of people want a more constrained list….
MIxS Combination Packages
Helps standardize sets of measurements and observations describing particular habitats that are applicable across all GSC checklists.
The environmental combination classes/packages contain 3 terms to represent the environment of the sample:
env_broad_scale, env_medium, env_local_scale
But can this be easier?
Guiding with LinkML value sets (enumerations)
Guiding with LinkML value sets (enumerations)
31
We have many frameworks to structure data
SHACL
Semantic web developer
Developers
Data Scientist
Scientists, Clinicians, ..
SQL DDL
HDF5
Pandas CSVs
Excel
ISO-11179
CDEs
FHIR
Clinical
modeler
JSON Schema
Ontologist
OWL
Pydantic
ProtoBuf
GraphQL
ShEx
33
LinkML as a universal converter box
JSON-Schema
ShEx, SHACL
JSON-LD Contexts
Python Dataclasses
OWL
Semantic Web Applications
and Infrastructure
“Traditional” Applications and Infrastructure
SQL DDL
TSVs
Create data models in simple YAML files, optionally annotated using ontologies
Compile to other frameworks
Choose the right tools for the job; no lock-in
Biocurator
Data Scientist
dct:creator
34
LinkML auto-generates documentation
LinkML has built in validators
35
classes:
Sample:
description: a sample of biological material.
attributes:
depth:
slot_uri: ENVO:3100031
species:
multivalued: true
salinity:
exact_mappings:
- PATO:0085001
longitude:
latitude:
type:
required: true
range: EnvironmentEnum
enums:
EnvironmentEnum:
reachable_from:
source_ontology: ENVO
JSON-Schema
Validator
(JSON-Schema)
populated JSON
linkml-validate uses a plug-in architecture; additional kinds of validation for specific formats are in use.
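Conceptually, the JSON-Schema route enforces checks like required-ness and enum membership on each record. A minimal stdlib-only sketch of those two checks (the function name and the abbreviated ENVO value list are invented for illustration; the real pipeline compiles the LinkML schema to JSON Schema and runs a JSON Schema validator):

```python
def check_sample(record: dict) -> list[str]:
    """Return validation error messages for a Sample-like record."""
    errors = []
    # required: true on the 'type' attribute
    if "type" not in record:
        errors.append("'type' is a required property")
    # enum membership: permissible values drawn from ENVO (abbreviated here)
    allowed = {"ENVO:00000020", "ENVO:00001999", "ENVO:00000209"}
    if "type" in record and record["type"] not in allowed:
        errors.append(f"{record['type']!r} is not a permissible value")
    return errors

print(check_sample({"depth": "22 cm", "type": "ENVO:00001999"}))  # []
print(check_sample({"depth": "22 cm", "type": "pond"}))
```

The generated JSON Schema expresses exactly these constraints declaratively, so no hand-written checks like this are needed in practice.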
36
Generate LinkML
id: https://example.org/linkml/hello-world
title: Really basic LinkML model
name: hello-world
version: 0.0.1
prefixes:
linkml: https://w3id.org/linkml/
sdo: https://schema.org/
ex: https://example.org/linkml/hello-world/
default_prefix: ex
default_curi_maps:
- semweb_context
imports:
- linkml:types
classes:
Person:
description: Minimal information about a person
class_uri: sdo:Person
attributes:
id:
identifier: true
slot_uri: sdo:taxID
first_name:
required: true
slot_uri: sdo:givenName
multivalued: true
last_name:
required: true
slot_uri: sdo:familyName
knows:
range: Person
multivalued: true
slot_uri: foaf:knows
YAML conformant to LinkML standard
Metadata
Dependencies
Namespaces
Actual data model
Get intelligent assistance from auto schema tools
Schema-automator
Semi-structured data sources
refine
37
Option B: Use Excel or Google Sheets
id: https://example.org/linkml/hello-world
title: Really basic LinkML model
name: hello-world
version: 0.0.1
prefixes:
linkml: https://w3id.org/linkml/
sdo: https://schema.org/
ex: https://example.org/linkml/hello-world/
default_prefix: ex
default_curi_maps:
- semweb_context
imports:
- linkml:types
classes:
Person:
description: Minimal information about a person
class_uri: sdo:Person
attributes:
id:
identifier: true
slot_uri: sdo:taxID
first_name:
required: true
slot_uri: sdo:givenName
multivalued: true
last_name:
required: true
slot_uri: sdo:familyName
knows:
range: Person
multivalued: true
slot_uri: foaf:knows
YAML conformant to LinkML standard
Metadata
Dependencies
Namespaces
Actual data model
Option B: Author using schemasheets
38
Option A: Write your schema in LinkML YAML
id: https://example.org/linkml/hello-world
title: Really basic LinkML model
name: hello-world
version: 0.0.1
prefixes:
linkml: https://w3id.org/linkml/
sdo: https://schema.org/
ex: https://example.org/linkml/hello-world/
default_prefix: ex
default_curi_maps:
- semweb_context
imports:
- linkml:types
classes:
Person:
description: Minimal information about a person
class_uri: sdo:Person
attributes:
id:
identifier: true
slot_uri: sdo:taxID
first_name:
required: true
slot_uri: sdo:givenName
multivalued: true
last_name:
required: true
slot_uri: sdo:familyName
knows:
range: Person
multivalued: true
slot_uri: foaf:knows
YAML conformant to LinkML standard
Metadata
Dependencies
Namespaces
Actual data model
Option A: Author YAML directly
Software prerequisites for tutorial
39
Section 1: Setting up a LinkML project
40
A brief word about Python package managers…
41
https://xkcd.com/1987/
Setup step 1: install prerequisites
42
> uv --version
> curl -LsSf https://astral.sh/uv/install.sh | sh
Setup step 1: install prerequisites
43
> copier --version
> uv tool install --with jinja2-time copier
Setup step 1: install prerequisites
44
> just --version
> uv tool install rust-just
Setup step 1: install prerequisites
45
> git config --global user.name
> git config --global user.email
> git config --global init.defaultBranch
> git config --global user.name "Your Name"
> git config --global user.email "you@example.org"
> git config --global init.defaultBranch main
linkml-project-copier overview
46
https://github.com/linkml/linkml-project-copier
Special thanks to David Linke for his help in development of this terrific resource!
Setup step 2: create LinkML project
47
> mkdir linkml-tutorial-2025
> cd linkml-tutorial-2025
Setup step 2: create LinkML project
48
> copier copy --trust https://github.com/linkml/linkml-project-copier .
Setup step 3: set up LinkML project
49
> just setup
Bonus step 4: pushing to GitHub
50
> git remote add origin https://github.com/<my-org>/linkml-tutorial-2025.git
> git push -u origin main
Be sure to set your GH settings to use gh-pages branch!
Step 5: Examine the directory structure
51
LinkML Schema YAML
Justfile - build and deploy scripts
pyproject.toml - python dependencies
GH actions to deploy documentation and run CI/CD commands
Step 5: Examine the example LinkML Schema
52
Metadata
Dependencies
Namespaces
Classes
Slots
Section 2: Authoring a Model
53
> git clone https://github.com/linkml/linkml-tutorial-2025
Basic LinkML Elements
54
A LinkML schema defines the structure and semantics of a data model. It’s the top-level specification that declares:
[Diagram: Schema, Class, Slot, Enum, and Permissible Values, linked by part-of, is-a, and constrains relationships]
Basic LinkML Elements
55
A LinkML Class defines a type of entity or concept in the model.
Each class can:
Basic LinkML Elements
56
A LinkML Slot defines an attribute or relationship that can be used by one or more classes. It’s analogous to an RDF property, and defines:
Basic LinkML Elements
57
A LinkML Enumeration defines a controlled vocabulary — a finite set of permissible values for a slot.
LinkML Enumerations can be:
Basic LinkML Elements
58
A LinkML Class defines a type of entity or concept in the model.
Each class can:
Some Best Practices for Defining a LinkML:Class
59
For the purposes of learning…
60
Class: PlantTissueSample
61
LinkML modeling language:
classes:
PlantTissueSample:
aliases: [""]
description: >-
slots:
- id
class_uri:
exact_mappings:
broad_mappings:
https://www.ebi.ac.uk/ols4/search?q=tissue+sample
Establish a hierarchy of classes
62
LinkML modeling language:
classes:
Sample:
PlantTissueSample:
is_a: Sample
aliases: [""]
description: >-
slots:
- id
class_uri:
exact_mappings:
related_mappings:
https://linkml.io/linkml/schemas/inheritance.html#mixin-classes-and-slots
Basic LinkML Elements
63
A LinkML Slot defines an attribute or relationship that can be used by one or more classes. It defines:
Some Best Practices for Defining a LinkML:Slot
64
Tissue collection slots
65
Sample Container* | TRUE | tube |
Plate Location (Well #)* | TRUE | B1 |
Strain, Variety, or Cultivar* | TRUE | RTx430 |
Isolate | | Sb_Mut1 |
NCBI Taxonomy ID* | TRUE | 4558 |
Ploidy | | diploid |
Collection Date And Time* | TRUE | 2025-08-10T14:00:00-07:00 |
Sample Size* | TRUE | 0.45 g |
Tissue* | TRUE | 5 mm lateral root tips |
Tissue Plant Ontology Term | | seedling cotyledon (PO:0025471); seedling hypocotyl (PO:0025291) |
Depth, meters | | 0.1 |
Elevation, meters | | 120 |
Broad-Scale Environmental Context | | rangeland biome [ENVO:01000247] |
Local Environmental Context | | hillside [ENVO:01000333] |
Environmental Medium | | bluegrass field soil [ENVO:00005789] |
LinkML Slots
66
LinkML modeling language:
slots:
sample_id:
sample_container:
plate_well_number:
strain_variety_or_cultivar:
isolate:
ncbi_taxon_id:
ploidy:
collection_date_and_time:
sample_size:
tissue_type:
depth_in_meters:
broad_scale_context:
local_environmental_context:
environmental_medium:
LinkML Slots
67
LinkML modeling language:
slots:
sample_id:
description: >-
required: true
multivalued: false
range: integer
ploidy:
description:
slot_uri: PATO:0001374
exact_mappings:
range: PloidyEnum
Enumerations
68
LinkML modeling language:
Enums:
enums:
TissueTypeEnum:
permissible_values:
stem:
meaning: PO:0009047
flower:
meaning: PO:0009046
cotyledon:
meaning: PO:0020030
Enumerations
69
LinkML modeling language:
Enums:
enums:
TissueTypeEnum:
reachable_from:
source_ontology: obo:po
source_nodes:
- PO:0025131
include_self: false
relationship_types:
- rdfs:subClassOf
> vskit expand --config vskit-config.yaml --schema value_set_example.yaml
Other useful constraints
70
LinkML modeling language:
slots:
gene_id:
range: uriorcurie
id_prefixes:
classes:
PlantTissueSample:
slots:
slot_usage:
gene_id:
required: true
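On the slide above, `id_prefixes` constrains which CURIE prefixes are acceptable for a slot whose range is `uriorcurie`. A stdlib-only sketch of the kind of check this implies (the allowed prefix list and function name are invented for illustration):

```python
# Illustrative only: the allowed prefixes here are made up for the example;
# in a real schema they come from the slot's id_prefixes list.
ALLOWED_PREFIXES = ["HGNC", "ENSEMBL"]

def valid_gene_id(curie: str) -> bool:
    """True if the CURIE has a non-empty local id and an allowed prefix."""
    prefix, _, local_id = curie.partition(":")
    return bool(local_id) and prefix in ALLOWED_PREFIXES

print(valid_gene_id("HGNC:1100"))     # allowed prefix
print(valid_gene_id("NCBIGene:672"))  # prefix not in the allowed list
```

In practice you declare the prefixes in the schema and let the LinkML validators and generators enforce this.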
Catching up…
71
Section 3: Generating Artifacts
72
Generate and deploy model serializations
73
> just gen-project
Run a generator directly
74
> uv run gen-pydantic src/linkml_tutorial_2025/schema/linkml_tutorial_2025.yaml > linkml_tutorial_2025_pydantic_model.py
Documentation generation
75
> just testdoc
http://127.0.0.1:8000/linkml-tutorial-2025/
Section 4: Validation
76
Validating an Example Data File
The linkml-validate command is a configurable command line utility for validating data instances against a schema.
CLI:
Documentation: https://linkml.io/linkml/data/validating-data.html
77
> uv run linkml-validate --schema [schema file] [data source...]
Validating an Example Data File
78
# tests/data/valid/PlantTissueSample-001.yaml
id: 1
sample_container: tube
strain_variety_cultivar: RTx430
ncbi_taxonomy_id: NCBITaxon:4558
ploidy: diploid
collection_date_time: "2025-08-10T14:00:00-07:00"
sample_size: 0.45 g
tissue: 5 mm lateral root tips
tissue_plant_ontology_term: PO:0025471
elevation_meters: 120
broad_scale_environmental_context: ENVO:01000247
local_environmental_context: ENVO:01000333
environmental_medium: ENVO:00005789
Validating an Example Data File
79
> uv run linkml-validate \
--schema src/linkml_tutorial_2025/schema/linkml_tutorial_2025.yaml \
-C PlantTissueSample \
tests/data/valid/PlantTissueSample-001.yaml
SMoxon@SMoxon-M82 linkml-tutorial-2025 % uv run linkml-validate --schema src/linkml_tutorial_2025/schema/linkml_tutorial_2025.yaml -C PlantTissueSample tests/data/valid/PlantTissueSample-001.yaml
No issues found
Validating an Example Data File
80
# tests/data/invalid/PlantTissueSample-missing-required.yaml
id: 2
sample_container: tube
isolate: Sb_Mut1
ploidy: diploid
collection_date_time: "2025-08-10T14:00:00-07:00"
sample_size: 0.45 g
Validating an Example Data File
81
> uv run linkml-validate \
--schema src/linkml_tutorial_2025/schema/linkml_tutorial_2025.yaml \
-C PlantTissueSample \
tests/data/invalid/PlantTissueSample-missing-required.yaml
SMoxon@SMoxon-M82 linkml-tutorial-2025 % uv run linkml-validate --schema src/linkml_tutorial_2025/schema/linkml_tutorial_2025.yaml -C PlantTissueSample tests/data/invalid/PlantTissueSample-missing-required.yaml
[ERROR] [tests/data/invalid/PlantTissueSample-missing-required.yaml/0] 'strain_variety_cultivar' is a required property in /
[ERROR] [tests/data/invalid/PlantTissueSample-missing-required.yaml/0] 'ncbi_taxonomy_id' is a required property in /
[ERROR] [tests/data/invalid/PlantTissueSample-missing-required.yaml/0] 'tissue' is a required property in /
Validating an Example Data File
82
# tests/data/invalid/PlantTissueSample-pattern-violation.yaml
id: 4
sample_container: plate
plate_location: Z99
strain_variety_cultivar: RTx430
ncbi_taxonomy_id: NCBITaxon:4558
collection_date_time: "2025-08-10T14:00:00-07:00"
sample_size: 0.45grams
tissue: 5 mm lateral root tips
Validating an Example Data File
83
> uv run linkml-validate \
--schema src/linkml_tutorial_2025/schema/linkml_tutorial_2025.yaml \
-C PlantTissueSample \
tests/data/invalid/PlantTissueSample-pattern-violation.yaml
SMoxon@SMoxon-M82 linkml-tutorial-2025 % uv run linkml-validate --schema src/linkml_tutorial_2025/schema/linkml_tutorial_2025.yaml -C PlantTissueSample tests/data/invalid/PlantTissueSample-pattern-violation.yaml
[ERROR] [tests/data/invalid/PlantTissueSample-pattern-violation.yaml/0] 'Z99' does not match '^[A-H][1-9][0-2]?$' in /plate_location
[ERROR] [tests/data/invalid/PlantTissueSample-pattern-violation.yaml/0] '0.45grams' does not match '^[0-9]+(\\.[0-9]+)?\\s+(g|ml|m2)$' in /sample_size
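The two regular expressions in the error output can be exercised directly with Python’s `re` module; the doubled backslashes in the log are just string-escaping of `\.` and `\s`:

```python
import re

# The patterns as the schema declares them (the log shows them string-escaped)
plate_location = re.compile(r"^[A-H][1-9][0-2]?$")            # wells A1 through H12
sample_size = re.compile(r"^[0-9]+(\.[0-9]+)?\s+(g|ml|m2)$")  # number, whitespace, unit

print(bool(plate_location.match("B1")))      # True: valid well
print(bool(plate_location.match("Z99")))     # False: row Z is outside A-H
print(bool(sample_size.match("0.45 g")))     # True: digits, dot, space, unit
print(bool(sample_size.match("0.45grams")))  # False: no whitespace before the unit
```

This is why `Z99` and `0.45grams` fail validation while `B1` and `0.45 g` pass.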
Validating an Example Data File
84
# tests/data/invalid/PlantTissueSample-bad-range.yaml
id: 3
sample_container: box
strain_variety_cultivar: RTx430
ncbi_taxonomy_id: NCBITaxon:4558
ploidy: octoploid
collection_date_time: "2025-08-10T14:00:00-07:00"
sample_size: 0.45 g
tissue: 5 mm lateral root tips
depth_meters: "very deep"
elevation_meters: true
Validating an Example Data File
85
> uv run linkml-validate \
--schema src/linkml_tutorial_2025/schema/linkml_tutorial_2025.yaml \
-C PlantTissueSample \
tests/data/invalid/PlantTissueSample-bad-range.yaml
SMoxon@SMoxon-M82 linkml-tutorial-2025 % uv run linkml-validate --schema src/linkml_tutorial_2025/schema/linkml_tutorial_2025.yaml -C PlantTissueSample tests/data/invalid/PlantTissueSample-bad-range.yaml
[ERROR] [tests/data/invalid/PlantTissueSample-bad-range.yaml/0] 'box' is not one of ['tube', 'plate'] in /sample_container
[ERROR] [tests/data/invalid/PlantTissueSample-bad-range.yaml/0] 'octoploid' is not one of ['haploid', 'diploid', 'triploid', 'tetraploid', 'allopolyploid'] in /ploidy
[ERROR] [tests/data/invalid/PlantTissueSample-bad-range.yaml/0] 'very deep' is not of type 'number', 'null' in /depth_meters
[ERROR] [tests/data/invalid/PlantTissueSample-bad-range.yaml/0] True is not of type 'number', 'null' in /elevation_meters
Going Further with Validation
86
Plugin examples:
NCATSTranslator/translator-ingests
microbiomedata/nmdc-schema
Catching up…
87
Section 5: Schema introspection
& programmatic composition
88
Introduction:
89
Why Schema Introspection?
90
Why Schema Introspection?
91
from linkml_runtime.utils.schemaview import SchemaView
sv = SchemaView("biolink-model.yaml")
for slot in sv.class_slots("gene"):
print(slot)
symbol
xref
has biological sequence
id
in taxon
in taxon label
provided by
full name
synonym
information content
equivalent identifiers
iri
category
type
name
description
has attribute
deprecated
What is SchemaView?
Key Features
SchemaView is at the core of the LinkML generators, but it can also be used to solve project-specific problems.
92
SchemaView Basics: Loading schemas
93
from linkml_runtime.utils.schemaview import SchemaView
sv_from_local_file = SchemaView("./biolink-model.yaml")
sv_from_url = SchemaView("https://w3id.org/biolink/biolink-model.yaml")
# when the schema is embedded in a python package
from importlib.resources import files
biolink_yaml = files('biolink_model.schema') / 'biolink_model.yaml'
sv_from_python_env = SchemaView(str(biolink_yaml))
SchemaView Basics: Accessing elements
94
# Get all classes, slots, enums, etc.
all_classes = sv.all_classes()
all_slots = sv.all_slots()
all_enums = sv.all_enums()
# Get specific elements
gene_class = sv.get_class("gene")
related_to_slot = sv.get_slot("related to")
# Generic element retrieval
element = sv.get_element("named thing")
SchemaView Basics: Navigating Hierarchies
95
# Class hierarchy navigation
parents = sv.class_parents("gene")
children = sv.class_children("gene")
ancestors = sv.class_ancestors("gene")
descendants = sv.class_descendants("gene")
# Find root classes (no parents)
roots = sv.class_roots()
# Find leaf classes (no children)
leaves = sv.class_leaves()
# Find stand-alone classes in the schema
orphans = set(sv.class_roots()) & set(sv.class_leaves())
Example use case: biolink predicate to RO
96
Goal: Extract SKOS-style relationships between Biolink predicates and RO terms
Why this matters:
Finding Predicates
The set of slots we’re interested in are the descendants of “related to”
97
#
from linkml_runtime.utils.schemaview import SchemaView
# Load Biolink Model
sv = SchemaView("https://w3id.org/biolink/biolink-model.yaml")
# Get all descendants of "related to"
predicates = sv.slot_descendants("related to")
print(f"Found {len(predicates)} predicates")
Extracting mappings from a single slot
98
def get_mappings(sv, slot):
"""Get all mappings for a single slot"""
mapping_types = {
'exact_mappings': 'skos:exactMatch',
'broad_mappings': 'skos:broadMatch',
'narrow_mappings': 'skos:narrowMatch',
'related_mappings': 'skos:relatedMatch'
}
results = []
for mapping_type, skos_pred in mapping_types.items():
mappings = getattr(slot, mapping_type, [])
for term in mappings:
results.append({
'biolink_predicate': sv.get_uri(slot, expand=False),
'mapping_type': skos_pred,
'mapped_term': term
})
return results
# Example: single slot
slot = sv.get_slot("related to")
mappings = get_mappings(sv, slot)
print(mappings)
[{'biolink_predicate': 'biolink:related_to', 'mapping_type': 'skos:exactMatch', 'mapped_term': 'UMLS:related_to'}, {'biolink_predicate': 'biolink:related_to', 'mapping_type': 'skos:broadMatch', 'mapped_term': 'owl:topObjectProperty'}
. . .
Extract Mappings for all predicates
99
import pandas as pd
# Apply to all descendants of "related to"
all_results = []
for slot_id in sv.slot_descendants("related to", reflexive=True):
slot = sv.get_slot(slot_id)
all_results.extend(get_mappings(sv, slot))
# Convert to DataFrame and filter for RO terms
df = pd.DataFrame(all_results)
df = df[df['mapped_term'].str.startswith('RO:')]
df = df.rename(columns={'mapped_term': 'ro_term'})
# Shuffle to show variety of predicates
df = df.sample(frac=1, random_state=42).reset_index(drop=True)
# View results
print(f"Found {len(df)} RO mappings")
print(df.head(10).to_string(index=False))
Found 200 RO mappings
biolink_predicate mapping_type ro_term
biolink:temporally_related_to skos:narrowMatch RO:0002092
biolink:related_to skos:narrowMatch RO:0002179
biolink:related_to skos:narrowMatch RO:0002373
biolink:orthologous_to skos:exactMatch RO:HOM0000017
biolink:has_input skos:narrowMatch RO:0002590
biolink:precedes skos:narrowMatch RO:0002412
biolink:occurs_in skos:narrowMatch RO:0002231
biolink:caused_by skos:narrowMatch RO:0009501
biolink:causes skos:narrowMatch RO:0002256
biolink:associated_with skos:narrowMatch RO:0004029
Analyze the Mapping Types
Wrap it up with a nice ascii art bar chart
100
# Group by mapping type
mapping_summary = df.groupby('mapping_type').size().sort_values(ascending=False)
print("SKOS mapping types distribution:\n")
max_count = mapping_summary.max()
for mapping_type, count in mapping_summary.items():
bar = '█' * int(50 * count / max_count)
print(f"{mapping_type:20s} {bar} {count}")
SKOS mapping types distribution:
skos:narrowMatch ██████████████████████████████████████████████████ 134
skos:exactMatch █████████████████████ 57
skos:broadMatch ███ 9
SchemaBuilder - Programmatic Schema Construction
Why SchemaBuilder?
Implements the Builder Pattern:
101
from linkml.utils.schema_builder import SchemaBuilder
sb = SchemaBuilder("my-schema")
sb.add_class(...).add_slot(...).add_enum(...)
schema = sb.schema
SchemaBuilder Basics
102
from linkml.utils.schema_builder import SchemaBuilder
sb = SchemaBuilder("sample-schema")
# Add slot definitions FIRST (with explicit properties)
sb.add_slot("name", description="Full name of the person", range="string")
sb.add_slot("age", description="Age in years", range="integer")
# Then add class that uses those slots
# Note: "email" will be auto-created with default range (string)
sb.add_class("Person", slots=["name", "age", "email"], description="A person")
# Get the schema
schema = sb.schema
Example - Clinical Research Data Harmonization Schema
Scenario: You're a data architect for a multi-site clinical research network studying chronic diseases.
Challenge: Bridge biomedical knowledge graphs with clinical standards
Solution: Compose a schema by combining:
Goal: Enable harmonized data across molecular research and clinical care
103
Step 1 - Load Source Schema, Start Our Schema
104
from importlib.resources import files
from linkml.utils.schema_builder import SchemaBuilder
from linkml_runtime.utils.schemaview import SchemaView
from linkml_runtime.linkml_model import SlotDefinition
# Load Biolink Model (installed package)
biolink_yaml = files('biolink_model.schema') / 'biolink_model.yaml'
biolink_sv = SchemaView(str(biolink_yaml))
# Create new schema
sb = SchemaBuilder("clinical-research-schema")
sb.add_defaults()
# Add prefixes
sb.add_prefix("biolink", "https://w3id.org/biolink/vocab/")
sb.add_prefix("RO", "http://purl.obolibrary.org/obo/RO_")
sb.add_prefix("HL7", "http://terminology.hl7.org/CodeSystem/")
sb.add_prefix("SNOMED", "http://snomed.info/id/")
Step 2 - Extract Classes and Slots from Biolink
105
# Get relevant Biolink entity classes for clinical research
biolink_classes = ["disease", "drug", "phenotypic feature", "biological entity"]
for class_name in biolink_classes:
cls = biolink_sv.get_class(class_name)
if cls:
# Use title case for our new schema
new_class_name = class_name.title().replace(" ", "")
class_slot_names = biolink_sv.class_slots(class_name)
for slot_name in class_slot_names:
if slot_name not in sb.schema.slots:
slot = biolink_sv.get_slot(slot_name)
# Left out: import enums & simplify other slot ranges, clear is_a, etc.
sb.add_slot(SlotDefinition(**slot.__dict__))
sb.add_class(
new_class_name,
description=cls.description,
slots=class_slot_names,
exact_mappings=[f"biolink:{class_name.replace(' ', '')}"])
Step 3 - Add Clinical RO Relationships
106
# Define clinical relationships using RO terms NOT in Biolink
# These capture patient-level clinical relationships
ro_relationships = {
"has_population_characteristic": {
"description": "Relates a patient to a demographic or population characteristic",
"exact_mappings": ["RO:0002551"],
"domain": "BiologicalEntity",
"range": "string"
},
"has_disposition": {
"description": "Relates a patient to a disease susceptibility or predisposition",
"exact_mappings": ["RO:0000091"],
"domain": "BiologicalEntity",
"range": "Disease"
},
"improves_condition_of": {
"description": "Relates a therapeutic intervention to the condition it improves",
"exact_mappings": ["RO:0002500"],
"domain": "Drug",
"range": "Disease"
},
. . .
for slot_name, props in ro_relationships.items():
sb.add_slot(slot_name,
description=props["description"],
domain=props["domain"],
range=props["range"],
exact_mappings=props["exact_mappings"])
print(f" ✓ {slot_name} → {props['exact_mappings'][0]}")
Step 4 - Reuse Clinical ValueSets
107
# Load standardized clinical value sets from linkml/valuesets repository
vs_base = "https://raw.githubusercontent.com/linkml/valuesets/main/src/valuesets/schema/"
demographics_sv = SchemaView(vs_base + "demographics.yaml")
clinical_sv = SchemaView(vs_base + "medical/clinical.yaml")
# Copy ontology-grounded enums from linkml/valuesets (HL7/SNOMED mapped) to our schema
sb.schema.enums["EducationLevel"] = demographics_sv.get_enum("EducationLevel")
sb.schema.enums["BloodTypeEnum"] = clinical_sv.get_enum("BloodTypeEnum")
# Add custom slots for clinical observations
sb.add_slot("education_level", description="Patient education level", range="EducationLevel")
sb.add_slot("blood_type", description="Patient blood type", range="BloodTypeEnum")
# Add a custom clinical observation class
sb.add_class("ClinicalObservation",
description="A clinical observation capturing patient characteristics",
slots=["id", "patient_id", "education_level", "blood_type", "observation_date"])
Step 5 - Export!
108
# Get the completed schema
schema = sb.schema
# Export to YAML
from linkml_runtime.dumpers import yaml_dumper
yaml_dumper.dump(schema, "clinical-research-schema.yaml")
Section 6: LinkML-Map
109
Transforming Data Models with LinkML-Map
What LinkML-Map does:
Built for interoperability and reproducibility
Tutorial goals:
Understand what LinkML-Map does and how to use it
110
Two Core Functions of LinkML-Map
Schema → Schema Transformation
Data → Data Transformation
111
Why LinkML-Map?
Reduces brittle, hand-coded ETL logic
Transparent mappings that can be version-controlled and reversible
Supports incremental alignment as models evolve
Part of the LinkML ecosystem: validation and support for multiple formats
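The core idea behind the data → data side can be sketched in a few lines. Note that linkml-map expresses this declaratively in a YAML transformation specification rather than in hand-written Python, and the slot names below are invented for illustration:

```python
# Illustrative only: a slot-level rename mapping between a source model
# and a target model, the kind of derivation linkml-map automates.
mapping = {
    # target slot: source slot
    "sample_depth": "depth",
    "env_type": "type",
}

def transform(record: dict, mapping: dict) -> dict:
    """Derive a target-model record from a source-model record."""
    return {target: record[source]
            for target, source in mapping.items()
            if source in record}

source = {"depth": "22 cm", "type": "ENVO:00001999", "species": "p"}
print(transform(source, mapping))
# {'sample_depth': '22 cm', 'env_type': 'ENVO:00001999'}
```

Keeping the mapping as data rather than code is what makes the transformation version-controllable and auditable.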
112
Core Concepts
113
Hands-On Tutorial Overview
(Refer to docs/examples/Tutorial.ipynb)
114
Where LinkML-Map Fits
More Example Notebooks
Tutorial.ipynb – core walkthrough - presented here
Derivations.ipynb – advanced nested mapping
Schema-Composition.ipynb – reusing maps
Data-Validation.ipynb – validating transformed instances
115
Wrap up and Discussion
116
Questions? Discussion?
117
Learning more and staying connected
118
Join the LinkML community!
119