Dask and Genomics
Dask Life Science Workshop
Dask Distributed Summit, May 19-21 2021
Tom White
UK Biobank scale:
10M variants x 500k samples
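To make that scale concrete, here is a back-of-the-envelope sketch of the dense matrix size. The shape comes from the slide; the one-byte-per-value storage (for example an int8 code) is an assumption for illustration, not a figure from the talk.

```python
# Figures from the slide: 10M variants x 500k samples.
N_VARIANTS = 10_000_000
N_SAMPLES = 500_000

# Assumption (not from the talk): each stored value takes one byte (int8).
BYTES_PER_VALUE = 1


def dense_size_bytes(n_variants, n_samples, bytes_per_value):
    """Size of a dense, uncompressed (variants x samples) matrix in bytes."""
    return n_variants * n_samples * bytes_per_value


n_values = N_VARIANTS * N_SAMPLES
size_tb = dense_size_bytes(N_VARIANTS, N_SAMPLES, BYTES_PER_VALUE) / 1e12
print(f"{n_values:.1e} values, about {size_tb:.0f} TB at {BYTES_PER_VALUE} byte each")
```

A matrix of this size does not fit in the memory of a single machine, which is why it has to be processed in chunks.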
Statistical genetics toolkit in Python
Why sgkit?
5 Dask Challenges
Wish: support masked arrays in CuPy; or float8
2. Optimizing chunking for performance
Wish: better tools for choosing chunking; incorporate rechunker ideas in Dask
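As a minimal sketch of what choosing chunking looks like today, the snippet below builds a small stand-in matrix and switches it from row-wise to column-wise chunks. The array shape and chunk sizes are hypothetical, chosen only for illustration. It uses Dask's built-in `rechunk`, not the separate rechunker library mentioned in the wish.

```python
import dask.array as da

# Hypothetical small stand-in for a (variants, samples) matrix.
# Chunked along the variants axis: each chunk holds 1,000 variants for all samples.
x = da.zeros((10_000, 5_000), dtype="int8", chunks=(1_000, 5_000))

# Re-chunk along the samples axis: each chunk holds all variants for 500 samples.
# This only builds a task graph; no data moves until the result is computed.
y = x.rechunk((10_000, 500))

print(x.numblocks, x.chunksize)  # blocks per axis, shape of the largest chunk
print(y.numblocks, y.chunksize)
```

Which layout is better depends on the operation: a reduction over samples favors the first, a reduction over variants favors the second, and converting between the two is itself an all-to-all data movement.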
3. Scaling linear algebra operations
Wish: a SLAB-like scalable linear algebra benchmark
4. Reasoning about Dask execution
Wish: high-level visualization of computation (including chunking)
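For context, the sketch below shows the kind of inspection Dask already allows: the chunk layout of an array, and the layers and task count of the graph behind a result. The array and the computation are hypothetical examples, not taken from the talk.

```python
import dask.array as da

# Hypothetical example computation on a small chunked array.
x = da.ones((4_000, 2_000), chunks=(1_000, 1_000))
result = (x + 1).sum(axis=0)

# Chunk layout: one tuple of chunk lengths per axis.
print(x.chunks)
print(result.chunks)

# The high-level graph groups tasks into named layers (one or more per operation).
graph = result.dask
layer_names = list(graph.layers)
n_tasks = len(graph)
print(f"{len(layer_names)} layers, {n_tasks} tasks")

# result.visualize() would render the task graph, but it needs the optional
# graphviz dependency, so it is not called here.
```

The task count grows with the number of chunks, so low-level graph renderings quickly become unreadable for large arrays.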
5. Choosing a stable Dask release
Wish: “blessed” stable releases and quality testing by the vendors, in partnership with the community