1 of 119

Welcome!

Please feel free to ask any questions you may have during the tutorial.

1

Our Presenters:

Corey Cox

Sierra Moxon

Kevin Schaper

With help from

Sarah Gehrke!

github.com/sierra-moxon

github.com/kevinschaper

github.com/amc-corey-cox

github.com/sagehrke

These slides: http://bit.ly/4ofmxHW

2 of 119

Getting Started

Shared Google drive for this workshop: https://go.lbl.gov/ICBO-LinkML

These slides: https://go.lbl.gov/ICBO-LinkML-slides

GitHub repository for this tutorial:

https://github.com/linkml/linkml-tutorial-2025

2

3 of 119

Software prerequisites for tutorial

  • First, install UV, a Python version, build, and dependency management system.
  • Then add Copier, a templating system for building repositories with predefined dependencies.
  • Then add Just, a build and deployment tool that helps make command-line build operations less cumbersome.
    • uv tool install rust-just
    • https://github.com/casey/just

3

4 of 119

Learning Objectives

  • Understand and build a LinkML schema repository.
  • Understand how ontologies are used in LinkML.
  • Learn to integrate ontologies into LinkML models, focusing on creating controlled vocabularies, mappings, and enumerations based on existing ontologies. Discover how to mix and match ontologies within a single LinkML enumeration.
  • Learn to validate data against LinkML schemas.
  • Learn how to programmatically access a schema with LinkML tools like SchemaView.
  • Learn a little bit about schema transformations with LinkML-Map.

4

These slides: http://bit.ly/4ofmxHW

5 of 119

Code Of Conduct

LinkML Code of Conduct:

  • Use welcoming and inclusive language
  • Be respectful of differing viewpoints and experiences
  • Show empathy towards other community members

This Code of Conduct is adapted from the Contributor Covenant, version 1.4: https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

5

6 of 119

6

Time (ET) | Topic | Presenter
15 mins | Introduction (slides) | Sierra
20 mins | Section 1: Set up a LinkML repository | Sierra
30 mins | Section 2: Authoring a LinkML model (A. Model components, mappings; B. Classes, Slots, Enumerations) | Sierra
5 mins | BREAK |
20 mins | Section 3: Schema best practices, including linting (A. linkml-run-examples, linkml-validate) | Sierra
10 mins | Section 4: Generating code from your model (A. Pydantic, JSONSchema; B. Generating documentation) | Sierra
5 mins | BREAK |
30 mins | Section 5: SchemaView & toolkit development (A. SchemaView; B. SchemaBuilder & programmatic access to LinkML tooling) | Kevin
20 mins | Section 6: LinkML Map | Corey
10 mins | Wrap-up and Discussion | Corey

7 of 119

Software prerequisites for tutorial

  • First, install UV, a Python version, build, and dependency management system.
  • Then add Copier, a templating system for building repositories with predefined dependencies.
  • Then add Just, a build and deployment tool that helps make command-line build operations less cumbersome.
    • uv tool install rust-just
    • https://github.com/casey/just

7

8 of 119

Introduction: Why LinkML?

8

https://app.sli.do/event/cefJ3D92AfEyboNj3KCuoQ

9 of 119

LinkML Tutorial

ICBO 2025

Sierra Moxon, Kevin Schaper, Corey Cox

Nov 5, 2025

10 of 119

Thank you!*

10

Thank you to all of our open source contributors and to the NIH, Wellcome Trust, and CZI Open Science organizations for funding so much of the work in this field.

* Special Thanks to Leo Baumgart for providing Plant Tissue Schema Examples!

11 of 119

Biological data is complex…

11

https://www.mdpi.com/journal/dna

  • clearly labeled attributes
  • whole numbers
  • all integers
  • harmonized units
  • everyone is recording the same attributes
  • easy to compare and reuse

Francie Rodriguez

  • Complex, relational, contextual knowledge
  • >100k named entities and terms
  • Most knowledge exists in unstructured form (literature, figures, lab notebooks, spreadsheets)

12 of 119

What we usually start with…

Spreadsheets!

13 of 119

What we usually start with…

Classes?

Hmm…

Slots?

Colors and transposition!

14 of 119

What we usually start with…

Syntax specifics

15 of 119

What we usually start with…

Some are really detailed and have helpful examples!

16 of 119

What we usually start with…

16

17 of 119

17

Existing frameworks not designed for interop

Pacific Ocean Sample Dataset

Crater Lake Sample Dataset

CREATE TABLE lake_sample (
  id varchar primary key,
  depth foreign key,
  location foreign key,
  environment foreign key
)

CREATE TABLE biosample (
  acc varchar primary key,
  depth float,
  lat float,
  long float,
  environment varchar
)

https://www.mdpi.com/journal/dna

18 of 119

We have many frameworks to structure data

(figure: word cloud of frameworks and the personas who use them)

Frameworks: SHACL, SQL DDL, HDF5, Pandas CSVs, Excel, ISO-11179 CDEs, FHIR, JSON Schema, OWL, Pydantic, ProtoBuf, GraphQL, ShEx

Personas: semantic web developer, developers, data scientist, scientists, clinicians, clinical modeler, ontologist

19 of 119

…interoperation is difficult

(figure: Omics Data, Phenotype/Clinical Data, and Environmental Data combine to yield insights)

20 of 119

What can we do differently?

  • Start with ontologies
    • reuse and contribute to existing efforts when possible!
    • make selection of terms from an ontology easy.
  • Make implicit models, explicit
    • use an open, community driven approach
    • meet tool developers, subject matter experts, and organizations where they are
    • make documentation easy

20

21 of 119

21

Pacific Ocean Sample Dataset
  depth    species
  22 cm    p
  22 cm    g

Crater Lake Sample Dataset
  depth      species
  22 inches  x
  15 feet    x, p

Lake Abert Sample Dataset
  depth    species
  22 cm    x
  23 cm    x, y, z

Example: biosample datasets

Can I compare microbiotic composition of bodies of water?

Can I compare the microbiotic composition of samples taken from the epipelagic zones?

Can I compare microbiotic composition from salt water samples of epipelagic zones?

22 of 119

22

Example: biosample datasets

Pacific Ocean Sample Dataset
  depth    species    type
  22 cm    p          ocean
  22 cm    p          ocean

Crater Lake Sample Dataset
  depth      species    type
  22 inches  x          lake
  15 feet    x, p       lake

Lake Abert Sample Dataset
  depth    species    type
  22 cm    x          lake
  23 cm    x, y, z    lake

Can I compare microbiotic composition of bodies of water?

Can I compare the microbiotic composition of samples taken from the epipelagic zones?

Can I compare microbiotic composition from salt water samples of epipelagic zones?

23 of 119

23

Common vocabularies are key

Pacific Ocean Sample Dataset
  id    depth    species    type
  PO1   22 cm    p          ENVO:00000209
  PO2   22 cm    p          ENVO:00000209

Crater Lake Sample Dataset
  id    depth      species    type
  CL1   22 inches  x          ENVO:00000020
  CL2   15 feet    x, p       ENVO:00000020

Can I compare microbiotic composition of bodies of water?

Can I compare the microbiotic composition of samples taken from the epipelagic zones ?

Can I compare microbiotic composition from salt water samples?

ENVO hierarchy (figure):
  water body (ENVO:00000063)
    ├─ lake (ENVO:00000020) [is-a]
    └─ marine water body (ENVO:00001999) [is-a]
         └─ marine photic zone (ENVO:00000209) [is-a, part_of]

24 of 119

24

Pacific Ocean Sample Dataset
  depth    species    type
  22 cm    p          ENVO:00001999
  22 cm    g          ENVO:00001999

Crater Lake Sample Dataset
  depth      species    type
  22 inches  x          ENVO:00000020
  15 feet    x, p       ENVO:00000020

These are “standards” (and “models”), but they are not computable without a human. How do we enforce that users pick a term from ENVO, and how do they know which term to pick?

Models hiding in plain sight
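As a toy illustration of the enforcement question above (this pattern is a hypothetical sketch, not part of any slide or of ENVO's official ID policy), a pipeline could at least check that the type column holds something shaped like an ENVO CURIE before worrying about which term was picked:

```python
import re

# Hypothetical check: does a cell value look like an ENVO CURIE at all?
# ENVO local identifiers in the examples above are 8 digits.
ENVO_CURIE = re.compile(r"^ENVO:\d{8}$")

for value in ["ENVO:00000020", "lake", "ENVO:1999"]:
    print(value, bool(ENVO_CURIE.match(value)))
```

This catches free-text values like “lake”, but not a syntactically valid CURIE for the wrong concept; that is where value sets and enumerations (covered later) come in.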

25 of 119

Data Integration, Collection, and Distribution

  • Start with ontologies
    • reuse and contribute to existing efforts when possible!
    • make selection of terms from an ontology easy.
  • Make implicit models explicit
    • document which ontologies to use in annotating a dataset
    • map to existing knowledge in ontologies explicitly
    • meet tool developers, subject matter experts, and organizations where they are
    • make documentation easy

25

26 of 119

26

LinkML: Modeling Language & Toolkit

THE STANDARD: a meta-datamodel for structuring your data
(figure: metamodel sketch; Class and Slot are kinds of element; a schema imports 0..* schemas; relationships: has 0..*, is_a 0..1, mixin 0..n, range 0..1)

TOOLS: pragmatic, developer- and curator-friendly tools for working with data
  • Validators
  • Data Converters
  • Compatibility tools
  • Data entry
  • Schema inference

27 of 119

27

Models hiding in plain sight

depth

salinity

bacteria

latitude

longitude

sample_type

22 cm

35

x,p

44.8084° N

24.0632° W

ENVO:00001999

imports:
  - linkml:types
classes:
  Sample:
    description: a sample of biological material.
    attributes:
      depth:
        slot_uri: ENVO:3100031
      salinity:
        exact_mappings:
          - PATO:0085001
      bacteria:
        multivalued: true
      latitude:
        range: string
      longitude:
      sample_type:
        required: true
        range: SampleType
enums:
  SampleType:
    reachable_from:
      source_ontology: obo:envo

  • Reuse other LinkML models explicitly.
  • Simple, human-readable descriptions.
  • Reuse existing class definitions.
  • Mappings in the model, to other models.
  • Constrains users to existing terminologies (we’ll dive into this in more detail).

This example builds an enumeration containing every term in ENVO, but many users want a more constrained list…

28 of 119

MIxS Combination Packages

Helps standardize sets of measurements and observations describing particular habitats that are applicable across all GSC checklists.

The environmental combination classes/packages contain 3 terms to represent the environment of the sample:

env_broad_scale, env_medium, env_local_scale

29 of 119

But can this be easier?

30 of 119

Guiding with LinkML value sets (enumerations)

31 of 119

Guiding with LinkML value sets (enumerations)

31

32 of 119

We have many frameworks to structure data

SHACL

Semantic web developer

Developers

Data Scientist

Scientists, Clinicians, ..

SQL DDL

HDF5

Pandas CSVs

Excel

ISO-11179

CDEs

FHIR

Clinical

modeler

JSON Schema

Ontologist

OWL

Pydantic

ProtoBuf

GraphQL

ShEx

33 of 119

33

LinkML as a universal converter box

(figure: a LinkML YAML model compiles out to JSON-Schema, ShEx, SHACL, JSON-LD Contexts, Python Dataclasses, OWL, SQL DDL, and TSVs, serving both Semantic Web applications and infrastructure and “traditional” applications and infrastructure; personas: Biocurator, Data Scientist; dct:creator)

  • Create data models in simple YAML files, optionally annotated using ontologies
  • Compile to other frameworks
  • Choose the right tools for the job; no lock-in

34 of 119

34

LinkML auto-generates documentation

35 of 119

LinkML has built in validators

35

classes:
  Sample:
    description: a sample of biological material.
    attributes:
      depth:
        slot_uri: ENVO:3100031
      species:
        multivalued: true
      salinity:
        exact_mappings:
          - PATO:0085001
      longitude:
      latitude:
      type:
        required: true
        range: EnvironmentEnum
enums:
  EnvironmentEnum:
    reachable_from:
      source_ontology: ENVO

JSON-Schema

Validator

(JSON-Schema)

populated JSON

LinkML Validate uses a plug-in architecture; additional validators for specific formats are available.

36 of 119

36

Generate LinkML

id: https://example.org/linkml/hello-world
title: Really basic LinkML model
name: hello-world
version: 0.0.1
prefixes:
  linkml: https://w3id.org/linkml/
  sdo: https://schema.org/
  foaf: http://xmlns.com/foaf/0.1/
  ex: https://example.org/linkml/hello-world/
default_prefix: ex
default_curi_maps:
  - semweb_context
imports:
  - linkml:types
classes:
  Person:
    description: Minimal information about a person
    class_uri: sdo:Person
    attributes:
      id:
        identifier: true
        slot_uri: sdo:taxID
      first_name:
        required: true
        slot_uri: sdo:givenName
        multivalued: true
      last_name:
        required: true
        slot_uri: sdo:familyName
      knows:
        range: Person
        multivalued: true
        slot_uri: foaf:knows

YAML conformant to LinkML standard

Metadata

Dependencies

Namespaces

Actual Datamodel

Get intelligent assistance from auto schema tools

Schema-automator

Semi-structured data sources

refine

37 of 119

37

Option B: Use Excel or Google Sheets

id: https://example.org/linkml/hello-world
title: Really basic LinkML model
name: hello-world
version: 0.0.1
prefixes:
  linkml: https://w3id.org/linkml/
  sdo: https://schema.org/
  foaf: http://xmlns.com/foaf/0.1/
  ex: https://example.org/linkml/hello-world/
default_prefix: ex
default_curi_maps:
  - semweb_context
imports:
  - linkml:types
classes:
  Person:
    description: Minimal information about a person
    class_uri: sdo:Person
    attributes:
      id:
        identifier: true
        slot_uri: sdo:taxID
      first_name:
        required: true
        slot_uri: sdo:givenName
        multivalued: true
      last_name:
        required: true
        slot_uri: sdo:familyName
      knows:
        range: Person
        multivalued: true
        slot_uri: foaf:knows

YAML conformant to LinkML standard

Metadata

Dependencies

Namespaces

Actual Datamodel

Option B: Author using schemasheets

38 of 119

38

Option A: Write your schema in LinkML YAML

id: https://example.org/linkml/hello-world
title: Really basic LinkML model
name: hello-world
version: 0.0.1
prefixes:
  linkml: https://w3id.org/linkml/
  sdo: https://schema.org/
  foaf: http://xmlns.com/foaf/0.1/
  ex: https://example.org/linkml/hello-world/
default_prefix: ex
default_curi_maps:
  - semweb_context
imports:
  - linkml:types
classes:
  Person:
    description: Minimal information about a person
    class_uri: sdo:Person
    attributes:
      id:
        identifier: true
        slot_uri: sdo:taxID
      first_name:
        required: true
        slot_uri: sdo:givenName
        multivalued: true
      last_name:
        required: true
        slot_uri: sdo:familyName
      knows:
        range: Person
        multivalued: true
        slot_uri: foaf:knows

YAML conformant to LinkML standard

Metadata

Dependencies

Namespaces

Actual data model

Option A: Author YAML directly

39 of 119

Software prerequisites for tutorial

  • First, install UV, a Python version, build, and dependency management system.
  • Then add Copier, a templating system for building repositories with predefined dependencies.
  • Then add Just, a build and deployment tool that helps make command-line build operations less cumbersome.
    • uv tool install rust-just
    • https://github.com/casey/just

39

40 of 119

Section 1: Setting up a LinkML project

40

41 of 119

A brief word about python package managers…

41

https://xkcd.com/1987/

42 of 119

Setup step 1: install prerequisites

  • Check to see if you already have UV installed

  • If you get a “command not found” error, install

https://docs.astral.sh/uv/getting-started/installation/

42

> uv --version

> curl -LsSf https://astral.sh/uv/install.sh | sh

43 of 119

Setup step 1: install prerequisites

  • Check to see if you already have copier installed

  • If you get a “command not found” error, install

https://copier.readthedocs.io/en/stable/

43

> copier --version

> uv tool install --with jinja2-time copier

44 of 119

Setup step 1: install prerequisites

  • Check to see if you already have just installed

  • If you get a “command not found” error, install

https://github.com/casey/just

44

> just --version

> uv tool install rust-just

45 of 119

Setup step 1: install prerequisites

  • Check to see if git is configured correctly

  • If any do not return something, set the values as needed

45

> git config --global user.name
> git config --global user.email
> git config --global init.defaultBranch

> git config --global user.name "Your Name"
> git config --global user.email "you@example.org"
> git config --global init.defaultBranch main

46 of 119

linkml-project-copier overview

  • Simple, updateable project scaffold
  • Ensure we have the right tools installed
  • Create and setup the project in a standard format
  • Enable GitHub integrations automatically

46

https://github.com/linkml/linkml-project-copier

Special thanks to David Linke for his help in development of this terrific resource!

47 of 119

Setup step 2: create LinkML project

47

> mkdir linkml-tutorial-2025

> cd linkml-tutorial-2025

48 of 119

Setup step 2: create LinkML project

  • You will be prompted to enter a few values, like:
    • name: linkml-tutorial-2025
    • github_org: <org-name>
    • description: brief one line description of schema
    • full_name: full name of schema author
    • email: email id of schema author

48

> copier copy --trust https://github.com/linkml/linkml-project-copier .

49 of 119

Setup step 3: set up LinkML project

  • The setup process takes care of four things for you:
    • Creation of a virtual environment and installation of listed dependencies within it
    • Generation of all artifacts by LinkML suite of generators
    • Generation of Markdown and HTML documentation
    • Initialization of schema project as Git repository

49

> just setup

50 of 119

Bonus step 4: pushing to GitHub

  • Go to https://github.com/new and follow the instructions
    • Be sure NOT to add a README, .gitignore, or LICENSE file (the template takes care of these for you)
  • Add the remote to your local git repository

50

> git remote add origin https://github.com/<my-org>/linkml-tutorial-2025.git

> git push -u origin main

Be sure to set your GitHub Pages settings to use the gh-pages branch!

51 of 119

Step 5: Examine the directory structure

51

LinkML Schema YAML

Justfile - build and deploy scripts

pyproject.toml - python dependencies

GH actions to deploy documentation and run CI/CD commands

52 of 119

Step 5: Examine the example LinkML Schema

52

Metadata

Dependencies

Namespaces

Classes

Slots

53 of 119

Section 2: Authoring a Model

53

> git clone https://github.com/linkml/linkml-tutorial-2025

54 of 119

Basic LinkML Elements

54

A LinkML schema defines the structure and semantics of a data model. It’s the top-level specification that declares:

  • What kinds of things exist (classes)
  • What attributes or properties they have (slots)
  • What controlled values those properties can take (enums or references)
  • How these elements relate to each other

(figure: Schema, Class, Slot, Enum, and Permissible Values connected by “part of”, “is a”, and “constrains” relationships)

55 of 119

Basic LinkML Elements

55

A LinkML Class defines a type of entity or concept in the model.

Each class can:

  • Inherit from other classes
  • Specify which slots (attributes) apply to its instances
  • Impose constraints on those slots (required, cardinality, type, etc.)

(figure: Schema, Class, Slot, Enum, and Permissible Values connected by “part of”, “is a”, and “constrains” relationships)

56 of 119

Basic LinkML Elements

56

A LinkML Slot defines an attribute or relationship that can be used by one or more classes. It’s analogous to an RDF property, and defines:

  • Range (what kind of value is allowed)
  • Cardinality
  • Whether it’s required
  • Domain(s) (which classes use it)

(figure: Schema, Class, Slot, Enum, and Permissible Values connected by “part of”, “is a”, and “constrains” relationships)

57 of 119

Basic LinkML Elements

57

A LinkML Enumeration defines a controlled vocabulary — a finite set of permissible values for a slot.

LinkML Enumerations can be:

  • Inline lists of human-readable values
  • Linked to ontology terms (e.g., CURIEs like NCBITaxon:9606 or GO:0008150)
  • Expanded or validated against external ontology sources.
  • Mapped via annotations to ontology URIs, labels, or IDs

(figure: Schema, Class, Slot, Enum, and Permissible Values connected by “part of”, “is a”, and “constrains” relationships)

58 of 119

Basic LinkML Elements

58

A LinkML Class defines a type of entity or concept in the model.

Each class can:

  • Inherit from other classes
  • Specify which slots (attributes) apply to its instances
  • Impose constraints on those slots (required, cardinality, type, etc.)

(figure: Schema, Class, Slot, Enum, and Permissible Values connected by “part of”, “is a”, and “constrains” relationships)

59 of 119

Some Best Practices for Defining a LinkML:Class

  1. Model “things,” not “values”
    • Person, Sample, Dataset, Experiment
  2. Ask: “Would I have multiple distinct examples of this in data?”
    • yes: it’s a class; no: it may be a slot, an enumeration, or a nested structure.
  3. Use clear, singular names
  4. Define what it is!
    • Add title, description, class_uri, aliases and comments fields to make schemas self-documenting.
  5. Use ontologies for semantics and meaning, LinkML for data shape.

59

60 of 119

For the purposes of learning…

60

61 of 119

Class: PlantTissueSample

61

LinkML modeling language:

  • description
  • aliases
  • mappings
  • class_uri

classes:
  PlantTissueSample:
    aliases: [""]
    description: >-
    slots:
      - id
    class_uri:
    exact_mappings:
    broad_mappings:
      - NCIT:C70699

https://www.ebi.ac.uk/ols4/search?q=tissue+sample

62 of 119

Establish a hierarchy of classes

62

LinkML modeling language:

  • is_a
  • mixin

classes:
  Sample:
  PlantTissueSample:
    is_a: Sample
    aliases: [""]
    description: >-
    slots:
      - id
    class_uri:
    exact_mappings:
    related_mappings:

https://linkml.io/linkml/schemas/inheritance.html#mixin-classes-and-slots

63 of 119

Basic LinkML Elements

63

A LinkML Slot defines an attribute or relationship that can be used by one or more classes. It defines:

  • Range (what kind of value is allowed)
  • Cardinality
  • Whether it’s required
  • Domain(s) (which classes use it)

(figure: Schema, Class, Slot, Enum, and Permissible Values connected by “part of”, “is a”, and “constrains” relationships)

64 of 119

Some Best Practices for Defining a LinkML:Slot

  • Use slots to describe or connect things.
    • Example: age, has_disease, part_of, measured_value.
  • If you can express it as “An X has Y,” it’s probably a slot.
  • When a property starts needing subfields, it’s better modeled as its own class.
    • Example:
      1. Slot: height (range: float)
      2. Class: measurement (class with value, unit, method slots)
  • Ask: “Could multiple things share this property?”
  • Controversial: favor readability and usability over semantic purity.
  • Always add descriptions.
  • Always add mappings when available.

64

65 of 119

Tissue collection slots

65

Field                               Required  Example
Sample Container*                   TRUE      tube
Plate Location (Well #)*            TRUE      B1
Strain, Variety, or Cultivar*       TRUE      RTx430
Isolate                                       Sb_Mut1
NCBI Taxonomy ID*                   TRUE      4558
Ploidy                                        diploid
Collection Date And Time*           TRUE      2025-08-10T14:00:00-07:00
Sample Size*                        TRUE      0.45 g
Tissue*                             TRUE      5 mm lateral root tips
Tissue Plant Ontology Term                    seedling cotyledon (PO:0025471); seedling hypocotyl (PO:0025291)
Depth, meters                                 0.1
Elevation, meters                             120
Broad-Scale Environmental Context             rangeland biome [ENVO:01000247]
Local Environmental Context                   hillside [ENVO:01000333]
Environmental Medium                          bluegrass field soil [ENVO:00005789]

66 of 119

LinkML Slots

66

LinkML modeling language:

  • required
  • identifier
  • multivalued
  • range
  • slot_uri
  • description
  • mappings
  • examples

slots:
  sample_id:
  sample_container:
  plate_well_number:
  strain_variety_or_cultivar:
  isolate:
  ncbi_taxon_id:
  ploidy:
  collection_date_and_time:
  sample_size:
  tissue_type:
  depth_in_meters:
  broad_scale_context:
  local_environmental_context:
  environmental_medium:

67 of 119

LinkML Slots

67

LinkML modeling language:

  • required
  • identifier
  • multivalued
  • range
  • slot_uri
  • description
  • mappings
  • examples

slots:
  sample_id:
    description: >-
    required: true
    multivalued: false
    range: int
  ploidy:
    description:
    slot_uri: PATO:0001374
    exact_mappings:
      - PATO:0001374
      - NCIT:C17001
      - SIO:010278
    range: PloidyEnum

68 of 119

Enumerations

68

LinkML modeling language:

Enums:

  • permissible values
  • meaning

enums:
  TissueTypeEnum:
    permissible_values:
      stem:
        meaning: PO:0009047
      flower:
        meaning: PO:0009046
      cotyledon:
        meaning: PO:0020030
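Conceptually, an enumeration like this pairs each permissible value with an ontology meaning. A minimal plain-Python sketch (a hypothetical helper, not LinkML tooling) of how such a value set constrains data and resolves values to their CURIEs:

```python
# The TissueTypeEnum above, as a plain mapping from permissible value
# to its ontology meaning (CURIE).
TISSUE_TYPE_ENUM = {
    "stem": "PO:0009047",
    "flower": "PO:0009046",
    "cotyledon": "PO:0020030",
}

def check_tissue_type(value: str) -> str:
    """Return the ontology meaning for a permissible value, or raise."""
    if value not in TISSUE_TYPE_ENUM:
        raise ValueError(f"{value!r} is not one of {sorted(TISSUE_TYPE_ENUM)}")
    return TISSUE_TYPE_ENUM[value]

print(check_tissue_type("stem"))  # PO:0009047
```

In real schemas, LinkML generators and validators perform this check for any slot whose range is the enum.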

69 of 119

Enumerations

69

LinkML modeling language:

Enums:

  • reachable_from
    • vskit in oaklib

enums:
  TissueTypeEnum:
    reachable_from:
      source_ontology: obo:po
      source_nodes:
        - PO:0025131
      include_self: false
      relationship_types:
        - rdfs:subClassOf

> vskit expand --config vskit-config.yaml --schema value_set_example.yaml
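The idea behind reachable_from can be sketched in plain Python: collect every term whose relationship path leads back to the source node. The graph below is a toy stand-in (the parent/child placements are assumed for illustration, not real PO structure):

```python
# Toy is-a graph: child CURIE -> parent CURIE. The placements are
# illustrative only, not the actual Plant Ontology hierarchy.
IS_A = {
    "PO:0009047": "PO:0025131",  # stem (assumed placement)
    "PO:0009046": "PO:0025131",  # flower (assumed placement)
    "PO:0020030": "PO:0009046",  # cotyledon (assumed placement)
}

def reachable_from(source: str, edges: dict) -> set:
    """All nodes whose is-a path leads to `source` (excluding `source`,
    i.e. include_self: false)."""
    result = set()
    changed = True
    while changed:
        changed = False
        for child, parent in edges.items():
            if (parent == source or parent in result) and child not in result:
                result.add(child)
                changed = True
    return result

print(sorted(reachable_from("PO:0025131", IS_A)))
```

Tools like vskit do this traversal against the real ontology and write the expanded value set back into the schema.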

70 of 119

Other useful constraints

70

LinkML modeling language:

slots:
  gene_id:
    range: uriorcurie
    id_prefixes:
      - HGNC
      - NCBIGene
      - Ensembl

classes:
  PlantTissueSample:
    slots:
      - ncbi_taxon_id
    slot_usage:
      ncbi_taxon_id:
        required: true

71 of 119

Catching up…

71

72 of 119

Section 3: Generating Artifacts

72

73 of 119

Generate and deploy model serializations

  • just is a wrapper around uv
  • gen-project is a grouping of many popular model serialization generators (JSONSchema, python dataclasses, doc, OWL, etc.)
  • we can always run individual generators

73

> just gen-project

74 of 119

Run a generator directly

74

> uv run gen-pydantic src/linkml_tutorial_2025/schema/linkml_tutorial_2025.yaml > linkml_tutorial_2025_pydantic_model.py

75 of 119

Documentation generation

75

> just testdoc

http://127.0.0.1:8000/linkml-tutorial-2025/

76 of 119

Section 4: Validation

76

77 of 119

Validating an Example Data File

The linkml-validate command is a configurable command line utility for validating data instances against a schema.

CLI:

Documentation: https://linkml.io/linkml/data/validating-data.html

77

> uv run linkml-validate --schema [schema file] [data source...]

78 of 119

Validating an Example Data File

78

# tests/data/valid/PlantTissueSample-001.yaml

id: 1

sample_container: tube

strain_variety_cultivar: RTx430

ncbi_taxonomy_id: NCBITaxon:4558

ploidy: diploid

collection_date_time: "2025-08-10T14:00:00-07:00"

sample_size: 0.45 g

tissue: 5 mm lateral root tips

tissue_plant_ontology_term: PO:0025471

elevation_meters: 120

broad_scale_environmental_context: ENVO:01000247

local_environmental_context: ENVO:01000333

environmental_medium: ENVO:00005789

79 of 119

Validating an Example Data File

79

> uv run linkml-validate \
    --schema src/linkml_tutorial_2025/schema/linkml_tutorial_2025.yaml \
    -C PlantTissueSample \
    tests/data/valid/PlantTissueSample-001.yaml

SMoxon@SMoxon-M82 linkml-tutorial-2025 % uv run linkml-validate --schema src/linkml_tutorial_2025/schema/linkml_tutorial_2025.yaml -C PlantTissueSample tests/data/valid/PlantTissueSample-001.yaml

No issues found

80 of 119

Validating an Example Data File

80

# tests/data/invalid/PlantTissueSample-missing-required.yaml

id: 2

sample_container: tube

isolate: Sb_Mut1

ploidy: diploid

collection_date_time: "2025-08-10T14:00:00-07:00"

sample_size: 0.45 g

81 of 119

Validating an Example Data File

81

> uv run linkml-validate \
    --schema src/linkml_tutorial_2025/schema/linkml_tutorial_2025.yaml \
    -C PlantTissueSample \
    tests/data/invalid/PlantTissueSample-missing-required.yaml

SMoxon@SMoxon-M82 linkml-tutorial-2025 % uv run linkml-validate --schema src/linkml_tutorial_2025/schema/linkml_tutorial_2025.yaml -C PlantTissueSample tests/data/invalid/PlantTissueSample-missing-required.yaml

[ERROR] [tests/data/invalid/PlantTissueSample-missing-required.yaml/0] 'strain_variety_cultivar' is a required property in /

[ERROR] [tests/data/invalid/PlantTissueSample-missing-required.yaml/0] 'ncbi_taxonomy_id' is a required property in /

[ERROR] [tests/data/invalid/PlantTissueSample-missing-required.yaml/0] 'tissue' is a required property in /

82 of 119

Validating an Example Data File

82

# tests/data/invalid/PlantTissueSample-pattern-violation.yaml

id: 4

sample_container: plate

plate_location: Z99

strain_variety_cultivar: RTx430

ncbi_taxonomy_id: NCBITaxon:4558

collection_date_time: "2025-08-10T14:00:00-07:00"

sample_size: 0.45grams

tissue: 5 mm lateral root tips

83 of 119

Validating an Example Data File

83

> uv run linkml-validate \
    --schema src/linkml_tutorial_2025/schema/linkml_tutorial_2025.yaml \
    -C PlantTissueSample \
    tests/data/invalid/PlantTissueSample-pattern-violation.yaml

SMoxon@SMoxon-M82 linkml-tutorial-2025 % uv run linkml-validate --schema src/linkml_tutorial_2025/schema/linkml_tutorial_2025.yaml -C PlantTissueSample tests/data/invalid/PlantTissueSample-pattern-violation.yaml

[ERROR] [tests/data/invalid/PlantTissueSample-pattern-violation.yaml/0] 'Z99' does not match '^[A-H][1-9][0-2]?$' in /plate_location

[ERROR] [tests/data/invalid/PlantTissueSample-pattern-violation.yaml/0] '0.45grams' does not match '^[0-9]+(\\.[0-9]+)?\\s+(g|ml|m2)$' in /sample_size
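The two regular expressions reported in the errors above can be exercised directly with Python's stdlib re module, which is handy when debugging why a value fails a schema pattern:

```python
import re

# The patterns reported by linkml-validate above.
PLATE_WELL = re.compile(r"^[A-H][1-9][0-2]?$")
SAMPLE_SIZE = re.compile(r"^[0-9]+(\.[0-9]+)?\s+(g|ml|m2)$")

print(bool(PLATE_WELL.match("B1")))          # True
print(bool(PLATE_WELL.match("Z99")))         # False: 'Z' is outside A-H
print(bool(SAMPLE_SIZE.match("0.45 g")))     # True
print(bool(SAMPLE_SIZE.match("0.45grams"))) # False: no whitespace before unit
```

LinkML applies such patterns through the pattern slot metadata, which the JSON-Schema validator enforces.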

84 of 119

Validating an Example Data File

84

# tests/data/invalid/PlantTissueSample-bad-range.yaml

id: 3

sample_container: box

strain_variety_cultivar: RTx430

ncbi_taxonomy_id: NCBITaxon:4558

ploidy: octoploid

collection_date_time: "2025-08-10T14:00:00-07:00"

sample_size: 0.45 g

tissue: 5 mm lateral root tips

depth_meters: "very deep"

elevation_meters: true

85 of 119

Validating an Example Data File

85

> uv run linkml-validate \
    --schema src/linkml_tutorial_2025/schema/linkml_tutorial_2025.yaml \
    -C PlantTissueSample \
    tests/data/invalid/PlantTissueSample-bad-range.yaml

SMoxon@SMoxon-M82 linkml-tutorial-2025 % uv run linkml-validate --schema src/linkml_tutorial_2025/schema/linkml_tutorial_2025.yaml -C PlantTissueSample tests/data/invalid/PlantTissueSample-bad-range.yaml

[ERROR] [tests/data/invalid/PlantTissueSample-bad-range.yaml/0] 'box' is not one of ['tube', 'plate'] in /sample_container

[ERROR] [tests/data/invalid/PlantTissueSample-bad-range.yaml/0] 'octoploid' is not one of ['haploid', 'diploid', 'triploid', 'tetraploid', 'allopolyploid'] in /ploidy

[ERROR] [tests/data/invalid/PlantTissueSample-bad-range.yaml/0] 'very deep' is not of type 'number', 'null' in /depth_meters

[ERROR] [tests/data/invalid/PlantTissueSample-bad-range.yaml/0] True is not of type 'number', 'null' in /elevation_meters

86 of 119

Going Further with Validation

86

Plugin examples:

NCATSTranslator/translator-ingests

microbiomedata/nmdc-schema

87 of 119

Catching up…

87

88 of 119

Section 5: Schema introspection

& programmatic composition

88

89 of 119

Introduction:

  • Schema Introspection with SchemaView
    • Understanding and analyzing existing schemas
    • Navigating hierarchies and extracting mappings
    • Practical example: Mapping Biolink predicates to RO terms

  • Programmatic Schema Construction with SchemaBuilder
    • Building schemas from code
    • Combining components from multiple sources
    • Example: Composing a microbiome research schema

89

90 of 119

Why Schema Introspection?

  • “I can just read the yaml”:
    • And you should! But it may not scale for a very large schema (the Biolink Model is 13k lines!).

  • “It’s yaml, I can work with yaml”
    • Sometimes not wrong! yq '.classes | keys' is very nice, but…
    • yq '.classes.gene.slots' biolink-model.yaml tells me that genes only have symbol and xref properties, and that sounds wrong.

90

91 of 119

Why Schema Introspection?

  • SchemaView is the canonical implementation of LinkML semantics.
  • It handles inheritance, imports, refinements, and all the edge cases that would otherwise surprise you when naively reading YAML.

91

from linkml_runtime.utils.schemaview import SchemaView

sv = SchemaView("biolink-model.yaml")
for slot in sv.class_slots("gene"):
    print(slot)

symbol

xref

has biological sequence

id

in taxon

in taxon label

provided by

full name

synonym

information content

equivalent identifiers

iri

category

type

name

description

has attribute

deprecated

92 of 119

What is SchemaView?

Key Features

  • Load schemas from files or URLs
  • Automatic import resolution
  • Navigate class and slot hierarchies
  • Compute induced slots (inheritance + refinement)

SchemaView is at the core of the LinkML generators, but it can also be useful for solving project-specific problems.

92

93 of 119

SchemaView Basics: Loading schemas

93

from linkml_runtime.utils.schemaview import SchemaView

sv_from_local_file = SchemaView("./biolink-model.yaml")
sv_from_url = SchemaView("https://w3id.org/biolink/biolink-model.yaml")

# when the schema is embedded in a python package
from importlib.resources import files
biolink_yaml = files('biolink_model.schema') / 'biolink_model.yaml'
sv_from_python_env = SchemaView(str(biolink_yaml))

94 of 119

SchemaView Basics: Accessing elements

94

# Get all classes, slots, enums, etc.
all_classes = sv.all_classes()
all_slots = sv.all_slots()
all_enums = sv.all_enums()

# Get specific elements
gene_class = sv.get_class("gene")
related_to_slot = sv.get_slot("related to")

# Generic element retrieval
element = sv.get_element("named thing")

95 of 119

SchemaView Basics: Navigating Hierarchies

95

# Class hierarchy navigation
parents = sv.class_parents("gene")
children = sv.class_children("gene")
ancestors = sv.class_ancestors("gene")
descendants = sv.class_descendants("gene")

# Find root classes (no parents)
roots = sv.class_roots()

# Find leaf classes (no children)
leaves = sv.class_leaves()

# Find stand-alone classes in the schema
orphans = set(sv.class_roots()) & set(sv.class_leaves())

96 of 119

Example use case: biolink predicate to RO

96

Goal: Extract SKOS-style relationships between Biolink predicates and RO terms

Why this matters:

  • Biolink Model uses predicates (slots) for relationships
  • Many map to Relations Ontology (RO) terms
  • Understanding these mappings helps with:
    • Semantic interoperability
    • Cross-resource integration
    • Knowledge graph construction

97 of 119

Finding Predicates

The slots we’re interested in are the descendants of “related to”

97

from linkml_runtime.utils.schemaview import SchemaView

# Load Biolink Model
sv = SchemaView("https://w3id.org/biolink/biolink-model.yaml")

# Get all descendants of "related to"
predicates = sv.slot_descendants("related to")
print(f"Found {len(predicates)} predicates")

98 of 119

Extracting mappings from a single slot

98

def get_mappings(sv, slot):
    """Get all mappings for a single slot"""
    mapping_types = {
        'exact_mappings': 'skos:exactMatch',
        'broad_mappings': 'skos:broadMatch',
        'narrow_mappings': 'skos:narrowMatch',
        'related_mappings': 'skos:relatedMatch'
    }
    results = []
    for mapping_type, skos_pred in mapping_types.items():
        mappings = getattr(slot, mapping_type, [])
        for term in mappings:
            results.append({
                'biolink_predicate': sv.get_uri(slot, expand=False),
                'mapping_type': skos_pred,
                'mapped_term': term
            })
    return results

# Example: single slot
slot = sv.get_slot("related to")
mappings = get_mappings(sv, slot)
print(mappings)

[{'biolink_predicate': 'biolink:related_to', 'mapping_type': 'skos:exactMatch', 'mapped_term': 'UMLS:related_to'}, {'biolink_predicate': 'biolink:related_to', 'mapping_type': 'skos:broadMatch', 'mapped_term': 'owl:topObjectProperty'}

. . .

99 of 119

Extract Mappings for all predicates

99

import pandas as pd

# Apply to all descendants of "related to"
all_results = []
for slot_id in sv.slot_descendants("related to", reflexive=True):
    slot = sv.get_slot(slot_id)
    all_results.extend(get_mappings(sv, slot))

# Convert to DataFrame and filter for RO terms
df = pd.DataFrame(all_results)
df = df[df['mapped_term'].str.startswith('RO:')]
df = df.rename(columns={'mapped_term': 'ro_term'})

# Shuffle to show variety of predicates
df = df.sample(frac=1, random_state=42).reset_index(drop=True)

# View results
print(f"Found {len(df)} RO mappings")
print(df.head(10).to_string(index=False))

Found 200 RO mappings

biolink_predicate              mapping_type      ro_term
biolink:temporally_related_to  skos:narrowMatch  RO:0002092
biolink:related_to             skos:narrowMatch  RO:0002179
biolink:related_to             skos:narrowMatch  RO:0002373
biolink:orthologous_to         skos:exactMatch   RO:HOM0000017
biolink:has_input              skos:narrowMatch  RO:0002590
biolink:precedes               skos:narrowMatch  RO:0002412
biolink:occurs_in              skos:narrowMatch  RO:0002231
biolink:caused_by              skos:narrowMatch  RO:0009501
biolink:causes                 skos:narrowMatch  RO:0002256
biolink:associated_with        skos:narrowMatch  RO:0004029

100 of 119

Analyze the Mapping Types

Wrap it up with a nice ASCII-art bar chart

100

# Group by mapping type
mapping_summary = df.groupby('mapping_type').size().sort_values(ascending=False)

print("SKOS mapping types distribution:\n")
max_count = mapping_summary.max()
for mapping_type, count in mapping_summary.items():
    bar = '█' * int(50 * count / max_count)
    print(f"{mapping_type:20s} {bar} {count}")

SKOS mapping types distribution:

skos:narrowMatch     ██████████████████████████████████████████████████ 134
skos:exactMatch      █████████████████████ 57
skos:broadMatch      ███ 9

101 of 119

SchemaBuilder - Programmatic Schema Construction

Why SchemaBuilder?

  • Build schemas programmatically in Python
  • Combine components from multiple schemas
  • Generate schemas from other data sources
  • Implement schema transformations

Implements the Builder Pattern:

101

from linkml.utils.schema_builder import SchemaBuilder

sb = SchemaBuilder("my-schema")
sb.add_class(...).add_slot(...).add_enum(...)
schema = sb.schema

102 of 119

SchemaBuilder Basics

102

from linkml.utils.schema_builder import SchemaBuilder

sb = SchemaBuilder("sample-schema")

# Add slot definitions FIRST (with explicit properties)
sb.add_slot("name", description="Full name of the person", range="string")
sb.add_slot("age", description="Age in years", range="integer")

# Then add a class that uses those slots
# Note: "email" will be auto-created with the default range (string)
sb.add_class("Person", slots=["name", "age", "email"], description="A person")

# Get the schema
schema = sb.schema

103 of 119

Example - Clinical Research Data Harmonization Schema

Scenario: You're a data architect for a multi-site clinical research network studying chronic diseases.

Challenge: Bridge biomedical knowledge graphs with clinical standards

  • Research sites use Biolink Model for disease/drug entities
  • Clinical sites use HL7/SNOMED standards
  • Need: Unified schema supporting both research analysis AND clinical reporting

Solution: Compose a schema by combining:

  • Biolink entity classes (diseases, drugs, phenotypes)
  • Clinical RO terms not in Biolink (patient-level relationships)
  • Clinical valuesets (HL7/SNOMED-mapped enums)

Goal: Enable harmonized data across molecular research and clinical care

103

104 of 119

Step 1 - Load Source Schema, Start Our Schema

104

from importlib.resources import files
from linkml.utils.schema_builder import SchemaBuilder
from linkml_runtime.utils.schemaview import SchemaView
from linkml_runtime.linkml_model import SlotDefinition

# Load Biolink Model (installed package)
biolink_yaml = files('biolink_model.schema') / 'biolink_model.yaml'
biolink_sv = SchemaView(str(biolink_yaml))

# Create new schema
sb = SchemaBuilder("clinical-research-schema")
sb.add_defaults()

# Add prefixes
sb.add_prefix("biolink", "https://w3id.org/biolink/vocab/")
sb.add_prefix("RO", "http://purl.obolibrary.org/obo/RO_")
sb.add_prefix("HL7", "http://terminology.hl7.org/CodeSystem/")
sb.add_prefix("SNOMED", "http://snomed.info/id/")

105 of 119

Step 2 - Extract Classes and Slots from Biolink

105

# Get relevant Biolink entity classes for clinical research
biolink_classes = ["disease", "drug", "phenotypic feature", "biological entity"]

for class_name in biolink_classes:
    cls = biolink_sv.get_class(class_name)
    if cls:
        # Use title case for our new schema
        new_class_name = class_name.title().replace(" ", "")
        class_slot_names = biolink_sv.class_slots(class_name)
        for slot_name in class_slot_names:
            if slot_name not in sb.schema.slots:
                slot = biolink_sv.get_slot(slot_name)
                # Left out: import enums & simplify other slot ranges, clear is_a, etc.
                sb.add_slot(SlotDefinition(**slot.__dict__))
        sb.add_class(
            new_class_name,
            description=cls.description,
            slots=class_slot_names,
            exact_mappings=[f"biolink:{class_name.replace(' ', '')}"])

106 of 119

Step 3 - Add Clinical RO Relationships

106

# Define clinical relationships using RO terms NOT in Biolink

# These capture patient-level clinical relationships

ro_relationships = {
    "has_population_characteristic": {
        "description": "Relates a patient to a demographic or population characteristic",
        "exact_mappings": ["RO:0002551"],
        "domain": "BiologicalEntity",
        "range": "string"
    },
    "has_disposition": {
        "description": "Relates a patient to a disease susceptibility or predisposition",
        "exact_mappings": ["RO:0000091"],
        "domain": "BiologicalEntity",
        "range": "Disease"
    },
    "improves_condition_of": {
        "description": "Relates a therapeutic intervention to the condition it improves",
        "exact_mappings": ["RO:0002500"],
        "domain": "Drug",
        "range": "Disease"
    },
    . . .
}

for slot_name, props in ro_relationships.items():
    sb.add_slot(slot_name,
                description=props["description"],
                domain=props["domain"],
                range=props["range"],
                exact_mappings=props["exact_mappings"])
    print(f" ✓ {slot_name} → {props['exact_mappings'][0]}")

107 of 119

Step 4 - Reuse Clinical ValueSets

107

# Load standardized clinical value sets from the linkml/valuesets repository
vs_base = "https://raw.githubusercontent.com/linkml/valuesets/main/src/valuesets/schema/"
demographics_sv = SchemaView(vs_base + "demographics.yaml")
clinical_sv = SchemaView(vs_base + "medical/clinical.yaml")

# Copy ontology-grounded enums from linkml/valuesets (HL7/SNOMED mapped) to our schema
sb.schema.enums["EducationLevel"] = demographics_sv.get_enum("EducationLevel")
sb.schema.enums["BloodTypeEnum"] = clinical_sv.get_enum("BloodTypeEnum")

# Add custom slots for clinical observations
sb.add_slot("education_level", description="Patient education level", range="EducationLevel")
sb.add_slot("blood_type", description="Patient blood type", range="BloodTypeEnum")

# Add a custom clinical observation class
sb.add_class("ClinicalObservation",
             description="A clinical observation capturing patient characteristics",
             slots=["id", "patient_id", "education_level", "blood_type", "observation_date"])

108 of 119

Step 5 - Export!

108

# Get the completed schema
schema = sb.schema

# Export to YAML
from linkml_runtime.dumpers import yaml_dumper
yaml_dumper.dump(schema, "clinical-research-schema.yaml")

109 of 119

Section 6: LinkML-Map

109

110 of 119

Transforming Data Models with LinkML-Map

What LinkML-Map does:

  • Transform between data models
  • Define transformations declaratively, not procedurally

Built for interoperability and reproducibility

Tutorial goals:

Understand what LinkML-Map does and how to use it

110

111 of 119

Two Core Functions of LinkML-Map

Schema → Schema Transformation

  • Define how classes, slots, and enums in one model correspond to another
  • Mapping definitions are written in YAML using LinkML-Map model syntax
  • Enables consistent transformations between structured models
  • Useful for aligning community schemas (e.g., HPOA → Phenio)

Data → Data Transformation

  • Once the model mappings are defined, transform actual data instances
  • Input: source data (YAML/TSV) + mapping spec
  • Output: target model-compliant data
  • Consistent, automated, repeatable transformations that can be reversible
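As a rough sketch of what a schema → schema mapping looks like (hypothetical Individual/Patient classes and slot names, not from this tutorial; see the LinkML-Map documentation for the full specification model), a transformer specification is itself YAML:

```yaml
# Hypothetical example: derive a target Patient class from a source Individual class
class_derivations:
  Patient:
    populated_from: Individual
    slot_derivations:
      name:
        populated_from: full_name   # simple slot rename
      age_in_years:
        populated_from: age         # value carried over unchanged
```

Because the mapping is declarative data rather than code, it can be reviewed, version-controlled, and (for simple renames like these) inverted to run the transformation in reverse.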

111

112 of 119

Why LinkML-Map?

Reduces brittle, hand-coded ETL logic

Transparent mappings that can be version-controlled and reversible

Supports incremental alignment as models evolve

Part of the LinkML ecosystem: validation and support for multiple formats

112

113 of 119

Core Concepts

  • Source Schema: where your data starts
  • Target Schema: where your data needs to end up
  • Transformer Specification: how things align — class, slot, and value mappings
  • Transformation: executing the spec to perform the transformation
  • Validation: ensure the output conforms to the specification

113

114 of 119

Hands-On Tutorial Overview

(Refer to docs/examples/Tutorial.ipynb)

  • Step 1: Define or load source schema
  • Step 2: Write a mapping specification
  • Step 3: Transform instance data
  • Step 4: Transform the schema
  • Step 5: Using Expressions & Unit Conversions
  • Step 6: Tabular Serialization
  • Step 7: Reverse Transform
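As a rough illustration of Steps 2–3, assuming a hypothetical mapping specification that renames full_name → name and age → age_in_years (invented names, not from the notebook), a source instance and its transformed output might look like:

```yaml
# Source instance (conforms to the source schema)
full_name: "Ada Lovelace"
age: 36

# After transformation (conforms to the target schema)
name: "Ada Lovelace"
age_in_years: 36
```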

114

115 of 119

Where LinkML-Map Fits

More Example Notebooks

  • Works with other LinkML tools (validators, schema generators, etc.)
  • Complements LinkML Runtime and downstream frameworks
  • Can serve as an ETL backbone for harmonizing datasets
    • Data Model-Based Ingestion Pipeline [dm-bip]

Tutorial.ipynb – core walkthrough (presented here)

Derivations.ipynb – advanced nested mapping

Schema-Composition.ipynb – reusing maps

Data-Validation.ipynb – validating transformed instances

115

116 of 119

Wrap up and Discussion

116

117 of 119

Questions? Discussion?

117

118 of 119

Learning more and staying connected

  • Our website: https://linkml.io
  • GitHub:
    • Issues: https://github.com/linkml/linkml/issues
    • All feature requests, comments, questions are welcome!
    • We 🤎 pull requests!
  • Connecting directly
    • Developers currently meet on OBO Workspace slack

118

119 of 119

Join the LinkML community!

119