1 of 27

Open Access for Code and Models

Practical steps for making code and models reusable

Karin Engström, Associate Professor and Domain Specialist in bioinformatics, Lund University

2 of 27

Core idea

Putting code or models online is only the first step

Open Science happens when they are versioned, documented, licensed, and easy to cite or reuse.

These are also key steps toward making them FAIR:

Findable, Accessible, Interoperable, Reusable

https://www.reddit.com/r/ProgrammerHumor/comments/122hv4n/when_you_are_tired_of_explaining/

3 of 27

Session outline

60 minutes

1. Learning outcomes
2. Why open code and models matter
3. Concepts: code, models, licensing, citation
4. Hands-on 1: Git basics
5. Hands-on 2: README and reusable pipelines
6. Hands-on 3: model documentation
7. Wrap-up checklist

4 of 27

Hands-on work

Three practical skills to take away

1. Version control (Git): Create a small version-controlled research code folder.

2. Document: Write a README that helps others understand and run code.

3. Share responsibly: Identify the minimum documentation needed for model reuse.

Focus today: practical minimums, not perfect repositories.

Goal: make outputs easier to inspect, cite, and reuse.

5 of 27

“Online” is not the same as “reusable”

Difference between visibility and practical openness

Visible, but hard to reuse

  • Files uploaded somewhere
  • No README
  • No license
  • Unclear version
  • Missing dependencies
  • No citation or contact info

Reusable in practice

  • Versioned repository
  • README
  • License
  • Run instructions
  • Release or archive
  • Limitations documented

6 of 27

This reduces friction for others — and for your future self

7 of 27

What Open Science asks us to do here

Open as possible, closed as necessary

Aim for openness that supports reuse

  • Make outputs understandable
  • Make them easier to inspect
  • Enable reuse where possible
  • Document what others need to know

Why something may not be fully shareable

  • Personal or sensitive data
  • Copyrighted training material
  • API keys or credentials
  • Proprietary dependencies
  • Partner or contract restrictions

Even when full release is impossible, you can often still share metadata, documentation, synthetic examples, or access conditions.

8 of 27

What is code?

Code is a set of instructions that tells a computer how to perform a task (e.g. to build and run a model).

9 of 27

What is a model?

A model is a simplified representation of part of the world, built so we can describe it, explain it, predict outcomes, or simulate processes.

Examples across research

  • Statistical models
  • Climate or epidemiological models
  • Conceptual models in theory-building
  • Machine learning models

input → model → output

10 of 27

Code and models

Code builds and runs models

Models represent patterns or systems

Code → needed to reproduce how the model was created

Model → needed to reuse or apply it

Sharing only one is often not enough for open science

11 of 27

The minimum shareable code package

project/
├─ README.md
├─ LICENSE
├─ analysis.py
├─ requirements.txt
├─ results/
├─ .gitignore
└─ data/
   └─ README.txt

Why each file matters

  • README.md: explains what the project is and how to use it
  • LICENSE: states the reuse conditions
  • analysis.py: contains the example code
  • requirements.txt: lists the software dependencies needed
  • results/: where outputs may be written
  • .gitignore: lists files or folders that should not be tracked
  • data/README.txt: explains the data folder, even if the data cannot be shared
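
As a rough illustration, the layout above could be created and checked with a few lines of Python. This is a minimal sketch, not a recommended tool: the function names and the placeholder file contents are assumptions made for this example, and listing `results/` in `.gitignore` follows the example README later in the deck.

```python
from pathlib import Path

# Files from the example layout. The placeholder contents are assumptions.
SKELETON = {
    "README.md": "# Project name\n",
    "LICENSE": "",
    "analysis.py": "",
    "requirements.txt": "",
    ".gitignore": "results/\n",
    "data/README.txt": "Describe the data here, even if it cannot be shared.\n",
}


def create_skeleton(root):
    """Create the layout under `root`; existing files are never overwritten."""
    root = Path(root)
    (root / "results").mkdir(parents=True, exist_ok=True)
    created = []
    for relative_path, content in SKELETON.items():
        path = root / relative_path
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():
            path.write_text(content, encoding="utf-8")
            created.append(relative_path)
    return created


def missing_files(root):
    """Return the expected files that are absent from `root`."""
    root = Path(root)
    return [p for p in SKELETON if not (root / p).is_file()]
```

Calling `create_skeleton("project")` writes only the files that do not exist yet, and `missing_files("project")` can be run on an older folder to see what is still lacking.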

12 of 27

Sharing and citing code: repository vs archive

A paper needs a stable version, not only the latest state

Development platform

  • GitHub or GitLab
    • Note: a repository may be deleted at any time by its owner!
  • Active work
  • Latest version
  • Collaboration and issue tracking

Archive / citation layer

  • Zenodo
  • Release a specific version
  • Archive it, ensuring long-term access
  • Mint a DOI
  • Cite the exact version used in the paper

“A repository is where the project grows. An archive is where a version becomes part of the scholarly record.”
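
To make "cite the exact version" concrete, here is a small sketch that assembles a citation string for one archived release. The function name, the citation pattern, and the DOI are assumptions for illustration; the DOI is a placeholder, and the author and project come from the fictional README example in this deck.

```python
def format_software_citation(authors, year, title, version, doi):
    """Return a citation string that pins one archived version of the code."""
    return (
        f"{authors} ({year}). {title} (Version {version}) "
        f"[Computer software]. https://doi.org/{doi}"
    )


citation = format_software_citation(
    authors="Your Name",
    year=2026,
    title="Campus Coffee Consumption Analysis",
    version="1.0",
    doi="10.5281/zenodo.0000000",  # placeholder, not a real DOI
)
print(citation)
```

The point of the sketch is that the version and the DOI travel together, so a reader can retrieve the exact code behind the paper rather than the latest state of the repository.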

13 of 27

Licensing: visible is not the same as reusable

Keep the legal message simple and practical

No license

People can see the code, but reuse is unclear.

Common families to mention

MIT – very permissive: you can use, modify, and share the code freely, as long as you include the original copyright and license notice.

GPL – copyleft: you can use and modify the code, but if you share your version, it must also be open under the same license.

Apache 2.0 – permissive: like MIT, but with additional legal clarity, especially regarding patents.

14 of 27

Git and version control

15 of 27

Git and open science - making research more transparent and reusable

  • Transparency

See what changed, when, and why

  • Reproducibility

Return to the exact version used in a study - avoid “final_final_v3” confusion

  • Collaboration

Work together without overwriting each other and track contributions clearly

  • Sharing

Structure your code for others and connect to platforms (GitHub, Zenodo)

  • Your future self

Understand your own work later!

17 of 27

Git and GitHub: what is the difference?

Git = version control tool. GitHub = online hosting platform.

Git

  • A tool for tracking changes in files
    • Works with any text-based files (R, Python, SPSS etc.)
  • Works on your own computer
  • Allows you to save versions through commits

GitHub

  • A web platform for hosting Git repositories online
  • Used for sharing, collaboration, and visibility
  • Often used to publish code publicly
  • Builds on Git, but is not the same thing

You can use Git without GitHub, but GitHub is built on Git.

18 of 27

Hands-on 1: Git basics

Around 15–20 minutes

What to remember

  • A repository starts in an ordinary folder
  • A commit is a named snapshot
  • Short commit messages explain the change
  • Versioning begins with a few simple commands
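
The "few simple commands" are typically `git init`, `git add`, and `git commit`. The sketch below drives them from Python so that it matches the other examples in this deck; in the exercise you would normally type them in a terminal. It assumes Git is installed. The helper names, the file content, and the commit message are made up for illustration, and the name and email are placeholders.

```python
import subprocess
from pathlib import Path


def git(*args, cwd):
    """Run one git command in `cwd` and return its standard output."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def first_commit(folder, message="Add first analysis script"):
    """Turn an ordinary folder into a repository and record one commit."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    script = folder / "analysis.py"
    if not script.exists():
        script.write_text('print("hello, coffee data")\n', encoding="utf-8")

    git("init", cwd=folder)                # the folder becomes a repository
    git("add", "analysis.py", cwd=folder)  # stage the file for the snapshot
    # Identity is passed inline so the sketch runs without global Git settings.
    git(
        "-c", "user.name=Your Name",
        "-c", "user.email=email@university.se",
        "commit", "-m", message,
        cwd=folder,
    )
    return git("log", "--oneline", cwd=folder)  # one line per commit
```

`first_commit("coffee-project")` returns a single log line: a short commit hash followed by the message. Running it a second time on an unchanged folder raises an error, because there is nothing new to commit.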

19 of 27

Link to exercises

20 of 27

Hands-on 2: Writing a README

If someone reads only one file, it will probably be this one

Example README structure

# Project name
## Short description
## Files included
## How to run
## Dependencies
## Data
## Citation / contact
## License

A good README answers basic questions

  • What is this project?
  • What does it do?
  • How do I run it?
  • What does it depend on?
  • Is the data included, restricted, synthetic, or available elsewhere?
  • How should I cite it?
  • What license applies?
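
One way to use the example structure during the exercise is to check a draft README against it. The sketch below looks for the level-2 headings from the structure above and reports the ones that are missing. The function names are assumptions for this example, and the check covers headings only, not the quality of the text under them.

```python
import re

# Section headings taken from the example README structure.
EXPECTED_SECTIONS = [
    "Short description",
    "Files included",
    "How to run",
    "Dependencies",
    "Data",
    "Citation / contact",
    "License",
]


def readme_headings(text):
    """Return the level-2 markdown headings ('## ...') found in `text`."""
    pattern = re.compile(r"^##[ \t]+(.+)$", re.MULTILINE)
    return [match.group(1).strip() for match in pattern.finditer(text)]


def missing_sections(text):
    """List expected sections that the README lacks (case-insensitive)."""
    present = {heading.lower() for heading in readme_headings(text)}
    return [s for s in EXPECTED_SECTIONS if s.lower() not in present]


draft = (
    "# Campus Coffee Consumption Analysis\n\n"
    "## Short description\nAnalyzes coffee habits.\n\n"
    "## How to run\npython analysis.py\n"
)
print(missing_sections(draft))
```

For this draft the function reports the five sections that have not been written yet, which gives a quick to-do list before sharing.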

21 of 27

Hands-on 2: Example README

Fictional project: Campus Coffee Consumption Analysis

1. Project name: Campus Coffee Consumption Analysis

2. Short description: Analyzes coffee habits from a small campus survey.

3. Files included: analysis.py, data/README.txt, results/ (generated outputs, ignored by Git).

4. How to run: Install the dependencies, then run: python analysis.py. The data is described in data/README.txt.

5. Dependencies: Python 3.10, pandas, matplotlib.

6. Citation / contact / license: Your Name, 2026. Contact: email@university.se. MIT License.

22 of 27

Link to exercises

23 of 27

Models are not just files: they need documentation

Purpose

What is this model for? Who is it meant to help?

Data and evaluation

What data was used?

How was the model evaluated?

Limitations

Where can it fail? What is out of scope?

License and use

What may others do with it, and how do they run it?

There are many domain-specific frameworks (such as DOME for ML), but they all focus on the same core questions: purpose, data, evaluation, and limitations.

24 of 27

Hands-on 3: Documenting models

# Model name
## Version
## Purpose
## Intended use
## Out-of-scope use
## Training/source data
## Evaluation
## Limitations
## License
## How to use

Around 10 minutes

Before sharing code or a model, ask:

  • Is it versioned?
  • Is it licensed?
  • Is it documented?
  • Is it runnable or at least understandable?
  • Have sensitive files, credentials, and restricted data been excluded?
  • Is there a stable citation point?
  • Are the limitations clear?

This is essentially a simplified version of what is called a ‘model card’ in machine learning
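
The template headings on this slide can also be filled in from a simple dictionary, which makes gaps visible instead of silently dropping them. This is a minimal sketch under stated assumptions: the function names and the "TODO" marker are invented for this example, and the sample answers come from the fictional coffee classifier used in this deck.

```python
# Field names from the model documentation template on this slide.
FIELDS = [
    "Version",
    "Purpose",
    "Intended use",
    "Out-of-scope use",
    "Training/source data",
    "Evaluation",
    "Limitations",
    "License",
    "How to use",
]

TODO = "TODO: not documented yet"


def render_model_card(name, answers):
    """Render the template as markdown; unanswered fields are flagged."""
    lines = [f"# {name}", ""]
    for field in FIELDS:
        lines += [f"## {field}", answers.get(field) or TODO, ""]
    return "\n".join(lines)


def undocumented_fields(answers):
    """Return the template fields that have no answer yet."""
    return [f for f in FIELDS if not (answers.get(f) or "").strip()]


card = render_model_card(
    "Coffee Consumption Classifier",
    {
        "Version": "v1.0",
        "Purpose": "Predict high vs low coffee use from survey answers.",
    },
)
print(card)
```

Printing the card shows the two documented fields followed by seven sections marked as not documented yet, which is a reasonable starting point for the 10-minute exercise.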

25 of 27

Hands-on 3: Example Model Documentation

Fictional model: Coffee Consumption Classifier

1. Version: v1.0 (the documented model release).

2. Purpose: Predict high vs low coffee use from survey answers.

3. Intended use: Teaching and research demonstration of a simple classifier.

4. Out-of-scope use: Not for health advice, profiling, or decisions about individuals.

5. Training/source data: Campus survey of staff and students (n=200); no identifiers included.

6. Evaluation: Tested on held-out survey responses; accuracy around 78%.

7. Limitations: Small dataset, one university context, limited generalizability.

8. License: MIT License for example code and documentation.

9. How to use: Run the prediction function in analysis.py with new survey data.

26 of 27

Link to exercises

27 of 27

Take-home checklist

Versioned → track changes (Git)

Documented → README + model description

Licensed → clear reuse conditions

Runnable → dependencies + simple instructions

Citable → release or DOI (e.g. Zenodo)

Responsible → no sensitive or restricted data

Open Science is not about sharing everything — it’s about sharing enough for others to understand, run, and reuse your work.
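
For the "Responsible" item, a quick scan for file names that often hold credentials can catch obvious mistakes before a folder is shared. The sketch below is only a first filter: the pattern list and the function name are assumptions for this example, it inspects file names rather than file contents, and it does not replace a manual review of the data.

```python
from fnmatch import fnmatch
from pathlib import Path

# File-name patterns that often indicate credentials or keys.
# This list is an illustrative assumption, not a complete rule set.
SUSPICIOUS_PATTERNS = [".env", "*.pem", "*.key", "*credentials*", "*secret*"]


def flag_sensitive_files(root):
    """Return relative paths under `root` whose names match a pattern."""
    root = Path(root)
    flagged = []
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        if ".git" in relative.parts:
            continue  # skip Git's internal folder
        name = path.name.lower()
        if path.is_file() and any(fnmatch(name, p) for p in SUSPICIOUS_PATTERNS):
            flagged.append(relative.as_posix())
    return flagged
```

Any path returned by `flag_sensitive_files("project")` is a candidate for `.gitignore` or for removal before publishing. An empty list means only that no file name matched, not that the folder is safe to share.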