2 of 106

Interactive Exercises

Command line examples will look like this

Follow the URLs for browser-based examples

$ cd ~/blobtoolkit

$ ls

blobtools2

insdc-pipeline

specification

taxdump

viewer

https://blobtoolkit.genomehubs.org/view

3 of 106

Overview

Why Blobs?
BlobToolKit
Using the Viewer — browser
Running BlobToolKit — command line
Programmatic Access — browser & command line

4 of 106

Why blobs?

We want to sequence a tardigrade

– but it comes with a soup of other organisms

5 of 106

Why blobs?

All tardigrade DNA will be at the same molarity*

* Organellar genomes

* Sex chromosomes

* Allelic divergence

* Repeats

Contaminant DNA will be at different molarities*

* Single-copy symbionts

* Or not
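The equal-molarity argument can be made concrete with a back-of-envelope calculation. If reads are sampled uniformly, expected depth is the number of sequenced bases from a genome divided by that genome's span, so contigs from one genome share roughly the same coverage while a contaminant at a different molarity sits at a different depth. A minimal Python sketch; the function name and all numbers are hypothetical.

```python
def mean_coverage(read_count: int, read_length: int, span: int) -> float:
    """Expected depth: total sequenced bases divided by the span they cover."""
    if span <= 0:
        raise ValueError("span must be positive")
    return read_count * read_length / span


# Hypothetical numbers: a 100 Mb target genome and a 5 Mb contaminant genome.
target = mean_coverage(read_count=20_000_000, read_length=150, span=100_000_000)
contaminant = mean_coverage(read_count=100_000, read_length=150, span=5_000_000)
```

With these made-up inputs the two genomes end up an order of magnitude apart in depth, which is the separation a coverage axis relies on.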

6 of 106

Why blobs?

[Figure: relative frequency plotted against coverage (<1x, 1x, 2x, 4x)]

8 of 106

Why blobs?

All tardigrade DNA will have similar GC content*

* Organellar genomes

* Localised variation

* Repeats

Contaminant DNA GC content may differ*

* Or not
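GC proportion is simple to compute per contig. The sketch below is one possible definition, assuming ambiguous bases such as N are left out of the denominator; whether BlobTools2 treats them the same way is not stated in these slides, and `gc_proportion` is a hypothetical helper.

```python
def gc_proportion(sequence: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    # A sequence made only of ambiguous bases has no defined GC; return 0.0.
    return gc / total if total else 0.0
```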

9 of 106

Why Blobs?

[Figure: relative frequency plotted against GC proportion (0% to 100%)]

10 of 106

Why Blobs?

[Figure: coverage plotted against GC proportion (0% to 100%)]

11 of 106

Why Blobs?

Add taxonomic annotation to each contig

[Figure: coverage plotted against GC proportion (0% to 100%)]

12 of 106

Why Blobs?

[Figure: coverage plotted against GC proportion (0% to 100%), with contigs labelled by taxon: Hypsibius dujardini, Chitinophaga, Pseudomonas, Stenotrophomonas, alphaproteobacterium]

13 of 106

Blobology Sujai Kumar and colleagues 2013

https://dx.doi.org/10.3389/fgene.2013.00237

15 of 106

BlobTools Dominik Laetsch and colleagues 2017

https://f1000research.com/articles/6-1287/v1

17 of 106

BlobToolKit

University of Edinburgh &

Wellcome Sanger Institute

Richard Challis
Mark Blaxter

European Nucleotide Archive

European Bioinformatics Institute

Edward Richards
Jeena Rajan
Guy Cochrane

https://blobtoolkit.genomehubs.org

18 of 106

BlobToolKit

https://www.biorxiv.org/content/10.1101/844852v1

19 of 106

BlobToolKit

BlobDir dataset

DatasetID

|— meta.json

|— identifiers.json

|— gc.json

|— length.json

|— ncount.json

|— {LIBRARYNAME}_cov.json

|— {LIBRARYNAME}_read_cov.json

|— {TAXRULE}_positions.json

|— {TAXRULE}_{RANK}.json

|— {TAXRULE}_{RANK}_cindex.json

|— {TAXRULE}_{RANK}_positions.json

|— {TAXRULE}_{RANK}_score.json

|— {LINEAGE}_busco.json

https://blobtoolkit.genomehubs.org
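Because a BlobDir is a directory of JSON files, one file per field, it can be read with standard tools. The Python sketch below assumes each field file holds one entry per contig under "values", and that category fields also carry a "keys" list that the values index into. The "values" and "keys" names match the jq examples later in this deck; the exact layout is otherwise an assumption, and `load_field` and `decode_values` are hypothetical helpers.

```python
import json
from pathlib import Path


def load_field(blobdir: str, field_id: str) -> dict:
    """Read one field file, e.g. <blobdir>/gc.json."""
    with open(Path(blobdir) / f"{field_id}.json") as handle:
        return json.load(handle)


def decode_values(field: dict) -> list:
    """Return per-contig values, mapping category indices to their labels.

    Assumption: category fields store integer indices in "values" that
    point into a non-empty "keys" list; other fields store raw values.
    """
    values = field["values"]
    keys = field.get("keys")
    if keys:
        return [keys[index] for index in values]
    return list(values)
```

For example, `decode_values(load_field("/path/to/BlobDir", "gc"))` would return one GC value per contig under these assumptions.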

23 of 106

BlobToolKit

BlobDir dataset

BlobTools2

DatasetID

|— meta.json

...

$ ./blobtools create --fasta ACVV01.fasta \
    ... /path/to/BlobDir

ENA browser

Pipeline

Viewer

24 of 106

BlobToolKit Pipeline

github.com/blobtoolkit/insdc-pipeline

25 of 106

BlobToolKit Pipeline

https://blobtoolkit.genomehubs.org/pipeline/pipeline-tutorials/pipeline-configuration/

27 of 106

BlobToolKit Pipeline

Pipeline configuration file

assembly:
  accession: GCA_00029833$
  alias: DroAlb_1.0
  bioproject: PRJNA39511
  level: scaffold
  span: 253560284
  prefix: ACVV01
taxon:
  taxid: 7291
  name: Drosophila albomi$

github.com/blobtoolkit/insdc-pipeline

28 of 106

BlobToolKit Pipeline

Pipeline configuration file

similarity:
  defaults:
    evalue: 1e-25
    max_target_seqs: 10
    root: 1
    mask_ids: [7215]
  databases:
    - {name: nt_v5, local$
    - {name: reference_pr$
  taxrule: bestsumorder

github.com/blobtoolkit/insdc-pipeline

29 of 106

BlobToolKit Pipeline

Pipeline configuration file

reads:
  paired:
    - [SRR01,ILLUMINA,482$
    - [SRR02,ILLUMINA,552$
  single:
    - [SRR03,PACBIO]
  coverage:
    max: 100
    min: 0.5

github.com/blobtoolkit/insdc-pipeline

30 of 106

BlobToolKit Pipeline

Pipeline configuration file

busco:
  lineages:
    - diptera_odb9
    - arthropoda_odb9
    - eukaryota_odb9
  lineage_dir: /busco/lin$

github.com/blobtoolkit/insdc-pipeline

31 of 106

BlobToolKit Pipeline

Snakemake command to run the Pipeline

snakemake -p \
    --use-conda \
    --conda-prefix $CONDA_DIR \
    --directory $WORKDIR/ \
    --configfile $WORKDIR/$ASSEMBLY.yaml \
    --stats $ASSEMBLY.snakemake.stats \
    -j $THREADS \
    --resources btk=1 \
    -n

https://github.com/blobtoolkit/insdc-pipeline

33 of 106

BlobToolKit Pipeline

Cluster configuration

__default__:
  mem: 100
  queue: 'small'
bamtools_stats:
  threads: 1
  mem: 1000
run_blastn:
  threads: 16
  mem: 100000
  queue: 'normal'
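In a Snakemake cluster configuration the `__default__` entry applies to every rule, and a named rule overrides individual keys. A minimal Python sketch of that lookup, assuming each rule name on the slide maps to its own block of settings; `resolve` is a hypothetical helper for illustration, not part of Snakemake.

```python
CLUSTER_CONFIG = {
    "__default__": {"mem": 100, "queue": "small"},
    "bamtools_stats": {"threads": 1, "mem": 1000},
    "run_blastn": {"threads": 16, "mem": 100000, "queue": "normal"},
}


def resolve(rule: str, config: dict = CLUSTER_CONFIG) -> dict:
    """Merge a rule's settings over the __default__ entry."""
    settings = dict(config.get("__default__", {}))
    settings.update(config.get(rule, {}))
    return settings
```

Under this reading, `bamtools_stats` inherits the 'small' queue, while `run_blastn` replaces both the memory and the queue.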

github.com/blobtoolkit/insdc-pipeline

34 of 106

BlobToolKit Pipeline

Snakemake command for running the Pipeline on a cluster

snakemake -p --cluster-config cluster.yaml \
    --drmaa " -o {log}.o \
        -e {log}.e \
        -R \"select[mem>{cluster.mem}] rusag$
        -M {cluster.mem} \
        -n {cluster.threads} \
        -q {cluster.queue}" \
    ...

https://github.com/blobtoolkit/insdc-pipeline

35 of 106

Using the Viewer

Finding datasets
BlobToolKit Views
Indicators of assembly quality
Exploring non-target data
Digging deeper
Customising plots
Reproducibility

36 of 106

Using the Viewer

https://blobtoolkit.genomehubs.org/btk-viewer/

38 of 106

Finding datasets

https://blobtoolkit.genomehubs.org/view

40 of 106

BlobToolKit views

https://blobtoolkit.genomehubs.org/view/Drosophilidae/dataset/ACVV01/blob#Filters

42 of 106

Indicators of assembly quality

https://blobtoolkit.genomehubs.org/view/Drosophilidae/dataset/ACVV01/blob?plotShape=circle&plotGraphics=svg#Settings

44 of 106

Exploring non-target data

https://blobtoolkit.genomehubs.org/view/Drosophilidae

46 of 106

Digging deeper

https://blobtoolkit.genomehubs.org/view/Drosophilidae/dataset/AFFF02/blob?zScale=scaleLog#Summary

48 of 106

Customising plots

https://blobtoolkit.genomehubs.org/view/Drosophilidae/dataset/AFFF02/blob?plotShape=circle#Filters

50 of 106

Reproducibility

https://blobtoolkit.genomehubs.org/view/Drosophilidae/dataset/AFFF02/blob?length--Min=1000&SRR340178_cov--Min=0.1&SRR340178_cov--Max=10&SRR340178_cov--LimitMin=0.1&SRR340178_cov--LimitMax=10&gc--LimitMin=0.0028&gc--Min=0.0028#Filters

KB458458.1
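The URL above carries every filter setting in its query string, which is what makes a view reproducible. A minimal Python sketch of assembling such a URL from a dictionary of parameters; `view_url` is a hypothetical helper, and the path layout is simply copied from the URLs shown in these slides.

```python
from urllib.parse import urlencode

BASE = "https://blobtoolkit.genomehubs.org/view"


def view_url(search: str, dataset: str, view: str, params: dict, tab: str = "") -> str:
    """Build a Viewer URL whose query string records the filter settings."""
    path = f"{BASE}/{search}/dataset/{dataset}/{view}"
    query = urlencode(params)
    url = f"{path}?{query}" if query else path
    return f"{url}#{tab}" if tab else url


url = view_url(
    "Drosophilidae",
    "AFFF02",
    "blob",
    {"length--Min": 1000, "SRR340178_cov--Min": 0.1, "SRR340178_cov--Max": 10},
    tab="Filters",
)
```

Keeping the parameters in a dictionary makes it easy to store them alongside an analysis and regenerate the same view later.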

51 of 106

Using the Viewer

Find assemblies with good and bad conventional metrics to compare plots

Search for assemblies with cobionts, e.g.:

Apicomplexa in Chordata
Proteobacteria in Arachnida

Take a look at some of these assemblies:

Crypturellus cinnamomeus PTEZ01 (bird)
Colinus virginianus AWGT02 (bird)
Onchocerca ochengi FJNM01 (nematode parasite)
Brugia timori UZAG01 (nematode parasite)
Sciurus vulgaris mSciVul1_1 (mammal)

https://blobtoolkit.genomehubs.org/view

52 of 106

Running BlobToolKit

Hosting a local Viewer instance
Running BlobTools2
Extending BlobTools2
Dataset validation

53 of 106

Running BlobToolKit

Check BlobToolKit has been downloaded

$ cd ~/blobtoolkit

$ ls

blobtools2

insdc-pipeline

specification

taxdump

viewer

54 of 106

Running BlobToolKit

Check the Conda package manager has been installed

$ conda activate btk_env

(btk_env) $ which python3

/home/username/miniconda3/envs/btk_env/bin/python3

56 of 106

Hosting a local Viewer instance

http://blobtoolkit.genomehubs.org/download/AC/ACVV01/

57 of 106

Hosting a local Viewer instance

Download a BlobDir dataset from the public Viewer

$ mkdir -p ~/blobtoolkit/datasets

$ cd ~/blobtoolkit/datasets

$ curl http://blobtoolkit.genomehubs.org/download/AC/ACVV01/ACVV01.blobdir.tar.gz | tar xzf -

http://blobtoolkit.genomehubs.org/download/AC/ACVV01

58 of 106

Hosting a local Viewer instance

Viewer configuration environment variables

NODE_ENV=local

BTK_CLIENT_PORT=8080

BTK_API_PORT=8000

BTK_API_URL=http://localhost:8000/api/v1

BTK_BASENAME=/view

BTK_ORIGINS='http://localhost:8080 http://localhost null'

BTK_HOST=localhost

BTK_FILE_PATH=/home/username/blobtoolkit/datasets

BTK_USE_DEFAULT_LINKS=true

BTK_STATIC_THRESHOLD=100000

BTK_NOHIT_THRESHOLD=1000000

59 of 106

Hosting a local Viewer instance

Create file with Viewer environment variables

$ cd ~/blobtoolkit/viewer

$ cp .env.dist .env

$ pwd

/home/username/blobtoolkit/viewer

$ nano .env

...

BTK_FILE_PATH=/home/username/blobtoolkit/datasets

...
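Editing `.env` by hand works well for a single machine. For scripted setups, the same change can be sketched in Python as below; `set_env_value` is a hypothetical helper that assumes plain KEY=VALUE lines, as in the variables listed on the previous slide.

```python
def set_env_value(text: str, key: str, value: str) -> str:
    """Return .env text with KEY=value replaced, or appended if absent."""
    lines = text.splitlines()
    prefix = f"{key}="
    for i, line in enumerate(lines):
        if line.startswith(prefix):
            lines[i] = prefix + value
            break
    else:
        # No existing line for this key, so add one at the end.
        lines.append(prefix + value)
    return "\n".join(lines) + "\n"
```

A typical use would read `.env`, call `set_env_value(text, "BTK_FILE_PATH", "/home/username/blobtoolkit/datasets")`, and write the result back.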

60 of 106

Hosting a local Viewer instance

Start the Viewer API (back end server)

$ cd ~/blobtoolkit/viewer

$ conda activate btk_env

(btk_env) $ npm start

...

Start the Viewer (front end server)

$ cd ~/blobtoolkit/viewer

$ conda activate btk_env

(btk_env) $ npm run client

...

https://github.com/blobtoolkit/viewer

61 of 106

Hosting a local Viewer instance

http://localhost:8080/view

62 of 106

Hosting a local Viewer instance

Environment variables for publicly available site

NODE_ENV=production

BTK_API_URL=https://blobtoolkit.genomehubs.org/api/v1

BTK_HTTPS=true

BTK_ORIGINS='https://localhost:8080 https://blobtoolkit.genom$

BTK_HOST='blobtoolkit.genomehubs.org'

BTK_KEYFILE='/path/to/privkey.pem'

BTK_CERTFILE='/path/to/cert.pem'

BTK_GDPR_URL=https://genomehubs.org/gdpr

BTK_DATASET_TABLE=true

64 of 106

Running BlobTools2

https://blobtoolkit.genomehubs.org/blobtools2/

65 of 106

Running BlobTools2

github.com/blobtoolkit/insdc-pipeline

69 of 106

Running BlobTools2

Command to create a BlobDir dataset

$ ./blobtools create --fasta ACVV01.fasta \
    --meta ACVV01.yaml \
    --hits ACVV01.blastn.out \
    --hits ACVV01.diamond.out \
    --cov ACVV01.SR01.bam \
    --busco ACVV01.busco.diptera_odb9.tsv \
    --taxid 7291 \
    --taxdump /path/to/taxdump \
    /path/to/BlobDir

https://github.com/blobtoolkit/blobtools2

70 of 106

Running BlobTools2

Options when adding BLAST results to a BlobDir dataset

$ ./blobtools add --hits ACVV01.blastn.vs.custom.db.out \
    --taxdump /path/to/taxdump \
    --taxrule bestsum=myTaxruleName \
    --bitscore 500 \
    --evalue 1e-75 \
    --hit-count 5 \
    /path/to/BlobDir

https://github.com/blobtoolkit/blobtools2
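The idea behind a "bestsum" rule, as described for BlobTools, is to assign each sequence to the taxon whose hits have the highest summed bitscore. The Python sketch below is a simplified illustration of that idea with thresholds analogous to --bitscore, --evalue and --hit-count. It is not the BlobTools2 implementation, and it ignores taxonomic ranks, ties and multiple hit files; `bestsum` and its parameters are hypothetical.

```python
from collections import defaultdict


def bestsum(hits, min_bitscore=0.0, max_evalue=1.0, hit_count=10):
    """Pick the taxon with the highest summed bitscore for one sequence.

    hits: iterable of (taxon, bitscore, evalue) tuples for a single contig.
    Returns "no-hit" if nothing passes the thresholds.
    """
    passing = [h for h in hits if h[1] >= min_bitscore and h[2] <= max_evalue]
    # Keep only the strongest hits before summing.
    passing.sort(key=lambda h: h[1], reverse=True)
    totals = defaultdict(float)
    for taxon, bitscore, _ in passing[:hit_count]:
        totals[taxon] += bitscore
    if not totals:
        return "no-hit"
    return max(totals, key=totals.get)
```

Stricter thresholds move weakly supported sequences into "no-hit", which is why importing the same hits under a second taxrule name is a useful way to compare settings.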

71 of 106

Running BlobTools2

Download BLAST results from the public Viewer

$ cd ~/blobtoolkit

$ conda activate btk_env

(btk_env) $ curl http://blobtoolkit.genomehubs.org/download/AC/ACVV01/ACVV01.blastn.nt.root.1.minus.7215.out.gz | gunzip > ACVV01.blastn.out

http://blobtoolkit.genomehubs.org/download/AC/ACVV01

72 of 106

Running BlobTools2

Import BLAST results with non-default settings

$ cd ~/blobtoolkit

$ ./blobtools2/blobtools add --hits ACVV01.blastn.out --taxrule bestsum=alt --taxdump ./taxdump --bitscore 500 ./datasets/ACVV01

$ ls ./datasets/ACVV01/alt_p*

datasets/ACVV01/alt_phylum.json

datasets/ACVV01/alt_phylum_cindex.json

datasets/ACVV01/alt_phylum_positions.json

datasets/ACVV01/alt_phylum_score.json

datasets/ACVV01/alt_positions.json

https://github.com/blobtoolkit/blobtools2

74 of 106

Extending BlobTools2

Generic datatypes

Identifier
Variable
Category
Array
MultiArray

One parser per analysis type

fasta.py for --fasta
hits.py for --hits
cov.py for --cov
busco.py for --busco
trnascan.py for --trnascan
txt.py for --txt (coming soon)

https://github.com/blobtoolkit/blobtools2

76 of 106

BlobDir validation

https://blobtoolkit.genomehubs.org/specification/specification-tutorials/validating-datasets/

77 of 106

BlobDir validation

Validate a BlobDir dataset

$ cd ~/blobtoolkit

$ ./specification/validate.py ./datasets/ACVV01/meta.json

VALID

https://github.com/blobtoolkit/specification

78 of 106

Running BlobToolKit

Which method seems most useful to you?

Pipeline
BlobTools2

Which analyses would you like to see incorporated?

Where do you think development efforts should be focused?

Viewer?
Pipeline?
BlobTools2?
documentation?

79 of 106

Programmatic Access

Viewer API — browser & command line
Viewer plots — command line
Filtering datasets — command line
Filtering data files

81 of 106

Viewer API

https://blobtoolkit.genomehubs.org/api-docs/

82 of 106

Viewer API

Access API endpoints on the command line

$ curl -s https://blobtoolkit.genomehubs.org/api/v1/dataset/id/ACVV01/assembly/span

253560284

https://blobtoolkit.genomehubs.org/api-docs/

83 of 106

Viewer API

Use jq to process API data

$ curl -s https://blobtoolkit.genomehubs.org/api/v1/field/ACVV01/length | jq '.values | add'

253560284

$ curl -s https://blobtoolkit.genomehubs.org/api/v1/field/ACVV01/length | jq '.values | map(select(. > 5000)) | add'

214698551

https://stedolan.github.io/jq/
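The same sums can be computed without jq. The Python sketch below mirrors the two filters above using only the standard library; the endpoint path is the one shown in the curl commands, while `fetch_field` and `total_length` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

API = "https://blobtoolkit.genomehubs.org/api/v1"


def fetch_field(dataset_id: str, field_id: str) -> dict:
    """Fetch one field from the public API (requires network access)."""
    with urlopen(f"{API}/field/{dataset_id}/{field_id}") as response:
        return json.load(response)


def total_length(field: dict, min_length=None) -> int:
    """Sum "values", optionally keeping only those above min_length."""
    values = field["values"]
    if min_length is not None:
        # Same strict comparison as jq's select(. > 5000).
        values = [v for v in values if v > min_length]
    return sum(values)
```

Calling `total_length(fetch_field("ACVV01", "length"), min_length=5000)` corresponds to the second jq command.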

85 of 106

Viewer plots

X11

Generate a plot using blobtools view

$ ./blobtools2/blobtools view --ports 8000-8099 \
    --param gc--Min=0.3 --param plotShape=hex \
    ./datasets/ACVV01

Initializing viewer |███████████████████████████████████████| 15/15 seconds

Loading http://localhost:8013/view/dataset/ACVV01/blob?staticThreshold=Infinity&nohitThreshold=Infinity&plotGraphics=svg&gc--Min=0.3&plotShape=hex

...

waiting for file 'ACVV01.blob.hex.png'

https://github.com/blobtoolkit/blobtools2

87 of 106

Viewer plots

Command we’d like to use to generate a taxon-filtered plot

$ ./blobtools2/blobtools view \
    --host https://blobtoolkit.genomehubs.org \
    --view snail \
    --param bestsumorder_phylum--Keys=Proteobacteria \
    ACVV01

https://github.com/blobtoolkit/blobtools2

89 of 106

Viewer plots

Use jq to view bestsumorder_phylum category keys

$ jq '.keys' ./datasets/ACVV01/bestsumorder_phylum.json

[

"no-hit",

"Proteobacteria",

"Arthropoda",

"undef",

"Ascomycota",

"Chordata",

"Mollusca",

...

https://github.com/blobtoolkit/blobtools2

90 of 106

Viewer plots

Use jq to view bestsumorder_phylum category keys

$ curl -s https://blobtoolkit.genomehubs.org/api/v1/field/ACVV01/bestsumorder_phylum | jq '.keys'

[

"no-hit",

"Proteobacteria",

"Arthropoda",

"undef",

"Ascomycota",

...

https://github.com/blobtoolkit/blobtools2
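Since a category is addressed by its position in the keys list, the lookup can be scripted. A minimal Python sketch using the keys shown above; `keys_param` is a hypothetical helper that builds the value passed to --param.

```python
def keys_param(field_id: str, keys: list, label: str) -> str:
    """Build a --param value that refers to a category by its index."""
    return f"{field_id}--Keys={keys.index(label)}"


# First few keys as printed by the jq command above.
keys = ["no-hit", "Proteobacteria", "Arthropoda", "undef", "Ascomycota"]
param = keys_param("bestsumorder_phylum", keys, "Proteobacteria")
```

Here `param` is `bestsumorder_phylum--Keys=1`, because Proteobacteria is the second entry in the list.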

91 of 106

Viewer plots

Use the key index (Proteobacteria is at index 1 in the keys list) to generate a plot from a publicly hosted dataset

$ ./blobtools2/blobtools view --view snail \
    --host https://blobtoolkit.genomehubs.org \
    --param bestsumorder_phylum--Keys=1 ACVV01

Loading https://blobtoolkit.genomehubs.org/view/dataset/ACVV01/snail?staticThreshold=Infinity&nohitThreshold=Infinity&plotGraphics=svg&bestsumorder_phylum--Keys=1

Fetching ACVV01.snail.png

waiting for element snail_save_png

waiting for file 'ACVV01.snail.png'

https://github.com/blobtoolkit/blobtools2

93 of 106

Viewer plots

Alternate command to host a local Viewer instance

$ ./blobtools2/blobtools view --ports 8000-8099 --remote \
    ./datasets/ACVV01

Initializing viewer |███████████████████████████████████████| 15/15 seconds

Open dataset at http://localhost:8001/view/dataset/BlobDir/blob?

For remote access use:

ssh -L 8001:127.0.0.1:8001 -L 8000:127.0.0.1:8000 username@remote_host

https://github.com/blobtoolkit/blobtools2

95 of 106

Filtering datasets

Filter a local BlobDir dataset

$ ./blobtools2/blobtools filter --param length--Min=3000000 --table STDOUT ./datasets/ACVV01

[

["index","identifiers","gc","length","SRR026696_cov","best$

[17958,"JH855722.1",0.3877,3161164,0.6789,"Arthropoda"],

[21431,"JH859027.1",0.3881,7262926,0.753,"Arthropoda"]

]

http://blobtoolkit.genomehubs.org/download/AC/ACVV01

96 of 106

Filtering datasets

Use filters to compare alternative taxonomic inferences

$ ./blobtools2/blobtools filter \
    --param bestsumorder_phylum--Keys=no-hit \
    --table ACVV01.alt_taxrule.tsv \
    --table-fields bestsumorder_phylum,alt_phylum \
    ./datasets/ACVV01

$ head ACVV01.alt_taxrule.tsv

index identifiers bestsumorder_phylum alt_phylum

1 JH838199.1 Proteobacteria no-hit

4 JH838202.1 Arthropoda no-hit

...

http://blobtoolkit.genomehubs.org/download/AC/ACVV01

97 of 106

Filtering datasets

Generate multiple outputs from a single filter command

$ ./blobtools2/blobtools filter \
    --param bestsumorder_phylum--Keys=Proteobacteria \
    --param bestsumorder_phylum--Inv=true \
    --table ACVV01.proteobacteria.tsv \
    --table-fields length,bestsumorder_genus,SRR026696_cov \
    --summary ACVV01.proteobacteria.json \
    --summary-rank genus \
    --out ./datasets/ACVV01_proteobacteria \
    ./datasets/ACVV01

http://blobtoolkit.genomehubs.org/download/AC/ACVV01
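Judging from the examples in this deck, a `--Keys` parameter lists categories to exclude, and `--Inv=true` inverts the selection so that only the listed categories are kept. A minimal Python sketch of that selection logic; this reading is inferred from the slides rather than from the BlobTools2 source, and `apply_keys_filter` is a hypothetical helper.

```python
def apply_keys_filter(labels, keys, invert=False):
    """Return indices of records kept by a category filter.

    labels: one category label per record.
    keys:   categories named in the filter.
    invert: False drops the named categories, True keeps only them.
    """
    named = set(keys)
    return [i for i, label in enumerate(labels) if (label in named) == invert]
```

With `invert=True` and `keys=["Proteobacteria"]`, only records labelled Proteobacteria survive, matching the command above.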

98 of 106

Filtering datasets

Inspect the genus-level taxonomy of scaffolds in the filtered dataset

$ head ACVV01.proteobacteria.tsv

index identifiers length bestsumorder_genus SRR026696_cov

1 JH838199.1 1836 Acetobacter 0.0403

37 JH838235.1 2833 Gluconobacter 0.041600000000000005

46 JH838244.1 1575 Acetobacter 0.0407

69 JH838267.1 2008 Gluconobacter 0

91 JH838288.1 23979 Acetobacter 0.1859

99 JH838296.1 1326 Gluconobacter 0.0463

118 JH838315.1 1342 Acetobacter 0

124 JH838321.1 1445 Gluconobacter 0.0333

http://blobtoolkit.genomehubs.org/download/AC/ACVV01

99 of 106

Filtering datasets

View summary data for scaffolds assigned to Acetobacter

$ jq '.summaryStats.hits.Acetobacter' ACVV01.proteobacteria.json

{ "span": 4272045,

"count": 695,

"gc": [0.564,0.5814,0.4864,0.6416,0.3952,0.6496],

"cov": [0.1937,0.2394,0.0637,0.5886,0.0216,2.2387],

"n50": 48752,

"l50": 14,

"n90": 1756,

"l90": 366 }

https://github.com/blobtoolkit/blobtools2
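The n50, l50, n90 and l90 values in the summary follow the usual assembly-metric definitions: sort sequences longest first and find the point at which the running total reaches 50% or 90% of the span. A minimal Python sketch of that calculation; `nx_lx` is a hypothetical helper and is not taken from BlobTools2.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of sequence lengths.

    Nx is the length of the sequence at which the cumulative span,
    counting longest first, reaches the given fraction of the total;
    Lx is the number of sequences needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    return 0, 0
```

For example, `nx_lx(lengths)` gives N50 and L50, and `nx_lx(lengths, fraction=0.9)` gives N90 and L90.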

100 of 106

Filtering datasets

Generate a blob plot from the filtered dataset

$ ./blobtools2/blobtools view --ports 8000-8099 \
    --param plotShape=circle \
    --param catField=bestsumorder_genus \
    --param bestsumorder_genus--Active=true \
    ./datasets/ACVV01_proteobacteria

Initializing viewer |███████████████████████████████████████| 15/15 seconds

...

waiting for file 'ACVV01_proteobacteria.blob.circle.png'

https://github.com/blobtoolkit/blobtools2

103 of 106

Filtering assembly files

Filter a FASTA file based on taxonomic inference

$ ./blobtools2/blobtools filter --fasta ACVV01.fasta \
    --param bestsumorder_phylum--Keys=no-hit \
    --suffix with_taxonomy ./datasets/ACVV01

$ ./blobtools2/blobtools filter --fasta ACVV01.fasta \
    --param bestsumorder_phylum--Keys=no-hit \
    --param bestsumorder_phylum--Inv=true \
    --suffix without_taxonomy ./datasets/ACVV01

$ ls

ACVV01.fasta ACVV01.without_taxonomy.fasta

ACVV01.with_taxonomy.fasta

https://github.com/blobtoolkit/blobtools2
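Conceptually, filtering a FASTA file comes down to keeping the records whose identifiers survive the filter. The Python sketch below shows that step in isolation, assuming the set of identifiers to keep has already been worked out; `filter_fasta` is a hypothetical helper and not the BlobTools2 code.

```python
def filter_fasta(lines, keep_ids):
    """Yield FASTA lines for records whose identifier is in keep_ids."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # The identifier is the first whitespace-delimited token.
            header = line[1:].split()
            keep = bool(header) and header[0] in keep_ids
        if keep:
            yield line
```

Because it works line by line, the function can be fed an open file handle and its output written straight to a new file.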

104 of 106

Filtering assembly files

Filter FASTQ files based on taxonomic inference

$ ./blobtools2/blobtools filter \
    --fastq SRR01_1.fastq.gz \
    --fastq SRR01_2.fastq.gz \
    --cov ACVV01.SRR01.bam \
    --param bestsumorder_phylum--Keys=no-hit \
    --suffix with_taxonomy ./datasets/ACVV01

$ ls

ACVV01.fastq.gz ACVV01.fastq.with_taxonomy.gz

https://github.com/blobtoolkit/blobtools2

105 of 106

Exploring Further

Download a BlobDir dataset from the public Viewer

$ cd ~/blobtoolkit/datasets

$ curl http://blobtoolkit.genomehubs.org/download/AC/ACVV01/ACVV01.blobdir.tar.gz | tar xzf -

Try alternate taxrule parameters

Filter and compare tables and summaries

Reproduce interactive plots

http://blobtoolkit.genomehubs.org/download/

106 of 106

BlobToolKit

University of Edinburgh &

Wellcome Sanger Institute

Richard Challis (rc28@sanger.ac.uk)
Mark Blaxter

European Nucleotide Archive

European Bioinformatics Institute

Edward Richards
Jeena Rajan
Guy Cochrane

blobtoolkit.genomehubs.org