Welcome to CoFest 2024!
open-bio.org/events/bosc-2024/obf-bosc-collaborationfest-2024/
Land Acknowledgement
We respectfully acknowledge that we are currently meeting on unceded Indigenous lands. The Kanien’kehà:ka Nation is recognized as the custodians of the lands and waters on which we gather today.
Logistics
Online channels
ZOOM ROOM FOR DAY 1
ZOOM ROOM FOR DAY 2
https://app.slack.com/client/T01M4QNJKAL/C01M9N7B02E
SLACK #COFEST @OBF-BOSC
Round of introductions
Project/Discussions/Collab Reports, day 1
Project: Hervé
We worked on
You can see/learn more about it on https://www.pizzapizza.ca ;)
We'd very much like your help/feedback with cleaning up the room tonight
Tomorrow, I've planned to order more pizza/coffee, and work on codefair.io.
Project: CWL v1.3
cwltool: upgraded the loop implementation to the new syntax proposed in CWL v1.3.0-dev1
Discussed and advanced many proposals
Project: Taking over the world with data frames
JBrowse2 Tripal Integration
People:
Goal: Create a JBrowse2 Authentication plugin that uses the Drupal REST API to authenticate users.
Progress:
The Saga Continues: JBrowse2 Tripal Embedding
People:
Goal: Continue the ongoing collaboration to embed a JBrowse instance in a Tripal site
Progress:
Repo: tripal/tripal_jbrowse
Project: Workflow Benchmarking
People:
Goal:
Discussed a collaborative effort to provide reference workflows for selected use cases. These would serve as a benchmarking and Rosetta Stone resource that helps users understand the differences between workflow languages and frameworks.
Combining JBrowse2 and iCn3D
Jiyao Wang
Francois Belleau
Colin Diesh
Philippe Youkharibache
Ravi Abrol
Goal
Create a protein features plugin for visualization in JBrowse, inspired by the mygene.info plugin.
Create an Elasticsearch index for MyGene.info and RefSeq to be consumed by the plugin.
Displaying Protein-Ligand Interactions in iCn3D
Jiyao Wang, Ravi Abrol, Philippe Youkharibache
Ligand in PDB format
Ligand in SVG format
Openbabel
PDB 3GVU
Protein-Ligand Interaction in iCn3D
Current Display
Future Display
Project: End-to-End-Open Biomedical AI
We made a Llamafile (running on a local GPU) available on the public Internet.
You can learn more by emailing Sasha (awaitz@post.harvard.edu) or saying hello in the CollaborationFest Slack
We'd very much enjoy working with other teams that want to evaluate their own biomedical LLMs (prompt / model / inference engine / GPU) in a standards-based way
Tomorrow we plan to try a concrete example with CWL
Describing/Scanning workflows using LLMs
David
Sasha
Nuria
Tazro
$ echo -e "Here is a provenance information of a computational workflow. Please describe the potential security risks of this workflow runs.\n\n$(cat ./ro-crate-metadata.json)" | yllm --api-url http://localhost:11434/v1/chat/completions --model llama3 --stdin
A computational workflow provenance!
After analyzing the provided information, I've identified some potential security risks:
1. **Untrusted Docker Images**: The workflow uses Docker images from untrusted sources (e.g., `ghcr.io/inutano/download-sra:a827d85` and `ncbi/sra-tools:3.0.0`). This could lead to the execution of malicious code or data exfiltration.
2. **Privilege Escalation**: The workflow runs as user `0:0`, which is equivalent to root privileges on Linux systems. This allows the workflow to perform actions that would normally require elevated permissions, potentially leading to privilege escalation attacks.
3. **Data Exposure**: The workflow writes files to `/var/lib/cwl/stg0c84bfcb-9196-4d1e-ac8d-3d2941b157c8/SRR1274307.sra` and other locations, which could lead to sensitive data exposure if not properly secured.
4. **Unvalidated User Input**: The workflow uses user input (e.g., `SRR1274307`) without proper validation or sanitization, which could lead to injection attacks or other security vulnerabilities.
5. **Insecure File Mounting**: The workflow mounts files from untrusted sources using the `--mount` flag, which could lead to file system tampering or data exfiltration.
6. **Unsecured Environment Variables**: The workflow sets environment variables (e.g., `TMPDIR=/tmp`, `HOME=/root`) without proper validation or sanitization, which could lead to security vulnerabilities.
To mitigate these risks, it's essential to:
1. Use trusted Docker images and ensure they are properly secured.
2. Limit the privileges of the workflow to a least-privileged user account.
3. Ensure that sensitive data is properly encrypted and access-controlled.
4. Validate and sanitize all user input before processing.
5. Use secure file mounting mechanisms, such as FUSE or aufs, instead of `--mount`.
6. Set environment variables only for necessary components and ensure they are properly validated and sanitized.
Remember to regularly review and update your workflow's security configuration to prevent potential vulnerabilities.
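The command at the top of this report builds its prompt with `echo -e`, which also interprets backslash sequences inside the pasted `ro-crate-metadata.json` (for example a `\n` within a JSON string). A minimal sketch of one way to avoid that, assuming a POSIX shell; `build_prompt` is a hypothetical helper and not part of yllm:

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of yllm):
# print the instruction, one blank line, then the file contents verbatim.
# printf '%s' does not interpret backslashes in its arguments, and cat
# copies the file byte for byte, so JSON escapes reach the model unchanged.
build_prompt() {
  instruction=$1
  metadata_file=$2
  printf '%s\n\n' "$instruction"
  cat "$metadata_file"
}
```

The output could then be piped into the same call used above, for example `build_prompt "Please describe the potential security risks of this workflow run." ./ro-crate-metadata.json | yllm --api-url http://localhost:11434/v1/chat/completions --model llama3 --stdin`.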
Day 2
Describing/Scanning workflows using LLMs
Nuria
Using LLMs to describe datasets and potential biases in them
more is available here:
Tataki and the nightmare of file formats
Who
Goal
What we did
Continued: Tataki and the nightmare of file formats
$ curl -sL https://github.com/edamontology/edamontology/releases/download/1.25/EDAM_1.25.tsv \
    | awk 'BEGIN{ FS=OFS="\t" } $1 ~ /format/ && $13 != "" { print $1,$2,$13 }' \
    | head \
    | while IFS=$'\t' read -r edam_id plabel url; do
        # Print the EDAM ID, then ask the model whether the linked page is the format's specification
        echo "$edam_id"
        printf 'Here is a http source of a web page.\n\n%s\n\nIs this web page describing the formal specification of the file format %s? If your answer is yes, say %s. If the answer is no, just say no. No explanation is needed.\n\n' "$(curl -sL "$url")" "$plabel" "$plabel" \
          | yllm --api-url http://localhost:11434/v1/chat/completions --model llama3 --stdin
      done
http://edamontology.org/format_1196
no
http://edamontology.org/format_1630
No
http://edamontology.org/format_1631
EXP
http://edamontology.org/format_1632
SCF
http://edamontology.org/format_1633
no
http://edamontology.org/format_1960
Staden format.
http://edamontology.org/format_1961
Stockholm format
http://edamontology.org/format_1974
No
http://edamontology.org/format_1975
No
http://edamontology.org/format_1997
PHYLIP format.
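The replies above vary in case and punctuation ("no", "No", "Staden format."), which makes them awkward to tally directly. A minimal sketch of a post-processing step, assuming a POSIX shell and assuming the EDAM preferred label is available next to each reply; `normalize_answer` and `classify_answer` are hypothetical helpers, not part of yllm:

```shell
#!/bin/sh
# Hypothetical helper: lower-case a reply, trim surrounding whitespace, and
# drop one trailing period so that "No", "no" and "no." compare equal.
normalize_answer() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e 's/\.$//'
}

# Hypothetical helper: print "no" when the reply is a plain no, "yes" when the
# normalised reply equals the normalised label, and "unclear" otherwise.
classify_answer() {
  reply=$(normalize_answer "$1")
  label=$(normalize_answer "$2")
  if [ "$reply" = "no" ]; then
    echo no
  elif [ "$reply" = "$label" ]; then
    echo yes
  else
    echo unclear
  fi
}

# The label "PHYLIP format" is assumed here for illustration.
classify_answer "PHYLIP format." "PHYLIP format"   # prints "yes"
classify_answer "No" "PHYLIP format"               # prints "no"
```

Anything classified as "unclear" would still need a manual look, since the model did not follow the requested answer format.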
Project: CWL v1.3
schema-salad: new release to support CWL v1.3.0-dev1 features
Discussed and advanced many proposals
Project: Codefair
Good CWL file
Bad CWL file
Future work: