1 of 14

Python type annotation of Galaxy and BioBlend: goals and progress

Nicola Soranzo

@NicolaSoranzo

European Galaxy Days 2022

2 of 14

Outline

  • Intro to Python type annotations
  • Static type checking
  • Type annotations in the Galaxy ecosystem

www.earlham.ac.uk

3 of 14

Statically vs dynamically typed languages

  • In programming languages, types are used to give meaning to data (sequence of bits)
  • The type system of a language enables type checking (verifying constraints) and optimisations
  • Languages can be divided into:
    • Statically typed: checks at compile time (e.g. C, Rust)
      • Fewer bugs; faster execution
    • Dynamically typed: checks at runtime (e.g. R, Python)
      • Concise code; more flexible (duck typing)
    • But often there’s a mix
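As a small illustration of the duck typing mentioned above, here is a minimal sketch. The class and function names (Duck, Robot, make_it_quack) are invented for this example and do not come from the talk.

```python
# Hypothetical classes for illustration: they share no base class,
# but both provide a quack() method.
class Duck:
    def quack(self) -> str:
        return "Quack!"


class Robot:
    def quack(self) -> str:
        return "Beep-quack!"


def make_it_quack(thing):
    # No declared type: any object with a quack() method works,
    # and a missing method is only detected when this line runs.
    return thing.quack()


sounds = [make_it_quack(Duck()), make_it_quack(Robot())]
print(sounds)

try:
    make_it_quack(42)  # int has no quack() method
except AttributeError as exc:
    print(f"Runtime check failed: {exc}")
```

The flexibility comes at a price: the invalid call is only discovered when it is executed.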

4 of 14

Type checking

  • Example of dynamic type checking in Python:

$ cat code.py
x = 4
print(f"x is {x}")
y = x + "hello"
$ python3 code.py
x is 4
Traceback (most recent call last):
  File "code.py", line 3, in <module>
    y = x + "hello"
TypeError: unsupported operand type(s) for +: 'int' and 'str'

5 of 14

Type annotations (or hints) in Python

  • An increasing number of type annotation features have been added to Python over the course of the latest 3.x releases

# With type annotations
def repeat(s: str, times: int) -> str:
    return s * times

# The same function without type annotations
def repeat(s, times):
    return s * times

6 of 14

Why add type annotations to Python code?

  • Type hints are NOT enforced by Python at runtime, BUT are quite useful for:
    • Developing code using a type-annotated library
    • Static type checking
    • Code refactoring
    • Data validation at runtime (e.g. pydantic)
    • Automatic code generation
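The first point can be checked directly. The following minimal sketch reuses the repeat function from the previous slide and relies only on the standard library. It shows that a call violating the annotations still runs, while the annotations remain available for tools to inspect.

```python
from typing import get_type_hints


def repeat(s: str, times: int) -> str:
    return s * times


# The annotations are ignored at runtime: passing an int as `s`
# raises no error, the call simply performs integer multiplication.
result = repeat(3, 2)
print(result)

# The annotations are still stored on the function, which is what
# tools such as static checkers and validation libraries rely on.
hints = get_type_hints(repeat)
print(hints)
```

A static type checker would reject the repeat(3, 2) call without running the code.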

7 of 14

Available types

  • Basic types: int, float, bool, str, bytes, None, object
  • Generic collection types (to be imported from the typing module):
    • Dict, List, Set, Tuple, …
    • Mapping, Sequence, Iterable, Generator, …
    • Contained types in []: List[int], Dict[str, float], Sequence[bool]
    • From Python 3.9, the built-in types (and those in collections.abc) can be used directly instead: dict[str, float]
  • Can be anything: Any
  • Can also be None: Optional[str]
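A minimal sketch of how these types combine in practice. The function names and data are invented for this example, and the typing-module spellings are used so that it also runs on Python versions older than 3.9.

```python
from typing import Dict, List, Optional, Sequence


def mean_by_sample(readings: Dict[str, List[float]]) -> Dict[str, float]:
    """Average the readings of each sample, skipping samples with no readings."""
    return {name: sum(vals) / len(vals) for name, vals in readings.items() if vals}


def find_first_negative(values: Sequence[float]) -> Optional[float]:
    """Return the first negative value, or None if there is none."""
    for value in values:
        if value < 0:
            return value
    return None


means = mean_by_sample({"a": [1.0, 3.0], "b": []})
first_neg = find_first_negative([2.5, -1.0, -3.0])  # a list is a Sequence
nothing = find_first_negative((1.0, 2.0))  # so is a tuple
print(means, first_neg, nothing)
```

Annotating the parameter as Sequence rather than List lets callers pass either lists or tuples, while Optional makes the possible None return value explicit.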

8 of 14

Other types

  • Any of certain types: Union[int, str, List[int]]
  • Any of certain values: Literal[3, "three"] (int, str, bytes, bool, enum members and None are allowed, but not float)
  • Function: Callable[[Arg1Type, Arg2Type], ReturnType]
  • Streams: IO, TextIO, BinaryIO
  • Type alias: Vector: TypeAlias = List[float]
  • New type: UserId = NewType("UserId", int)
  • Generics:

from typing import Sequence, TypeVar

T = TypeVar('T')  # Declare type variable

def first(l: Sequence[T]) -> T:  # Generic function
    return l[0]
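The other constructs listed on this slide can be combined as in the following minimal sketch. The functions scale, apply and open_mode are invented for this example. A plain assignment is used for the alias instead of the TypeAlias marker, which typing only provides from Python 3.10, and Literal requires Python 3.8 or later.

```python
from typing import Callable, List, Literal, NewType, Union

UserId = NewType("UserId", int)  # distinct from int for the type checker only
Vector = List[float]  # plain type alias

Number = Union[int, float]


def scale(v: Vector, factor: Number) -> Vector:
    return [x * factor for x in v]


def apply(func: Callable[[Vector, Number], Vector], v: Vector) -> Vector:
    # Any callable with a matching signature is accepted.
    return func(v, 2)


def open_mode(kind: Literal["text", "binary"]) -> str:
    # A checker rejects calls with any string other than the two listed.
    return "r" if kind == "text" else "rb"


uid = UserId(42)  # at runtime this is just the int 42
doubled = apply(scale, [1.0, 2.5])
mode = open_mode("binary")
print(uid, doubled, mode)
```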

9 of 14

In-line annotations vs Stubs

  • Instead of adding type annotations within the Python source (.py) files, separate stub files (.pyi) can be used, but in-line annotations are preferred!

class MyClass:
    read_write: int

    def do_stuff(self, y: str) -> bool: ...
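For comparison, the same interface with in-line annotations could look like the sketch below. The constructor and the method body are placeholders invented for this example; only the attribute and method signatures come from the stub above.

```python
# Same interface as the stub, annotated in-line in a regular .py file.
class MyClass:
    read_write: int  # annotated attribute, assigned in __init__

    def __init__(self, read_write: int = 0) -> None:
        self.read_write = read_write

    def do_stuff(self, y: str) -> bool:
        # Placeholder body invented for this example.
        return len(y) > self.read_write


obj = MyClass(read_write=2)
print(obj.do_stuff("abc"))
```

With in-line annotations the types live next to the code they describe, so they are harder to forget when the implementation changes.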

10 of 14

Static type checking

  • Most common checker is mypy: https://mypy.readthedocs.io/
    • mypy dir/
    • Configuration in mypy.ini
      • How strict mypy should be, e.g.:
        • check_untyped_defs
        • no_implicit_optional
        • no_implicit_reexport
        • disallow_any_generics
        • disallow_untyped_calls
  • Add it to your Continuous Integration linting!
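A minimal mypy.ini enabling the options listed above might look like the following sketch. It only sets global options; which of them are appropriate, and whether per-module overrides are needed, depends on the project.

```ini
[mypy]
check_untyped_defs = True
no_implicit_optional = True
no_implicit_reexport = True
disallow_any_generics = True
disallow_untyped_calls = True
```

With this file in the directory from which mypy is run, the mypy dir/ command shown above picks up the settings automatically.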

11 of 14

Other resources

  • Other static type checkers:
    • Microsoft’s Pyright
    • Google’s pytype
    • Meta’s Pyre
  • Pydantic: data validation by enforcing type annotations at runtime
  • MonkeyType: automatically adds draft type annotations to Python code based on the types collected at runtime
  • Upcoming GTN tutorial from Helena Rasche!

12 of 14

Type annotations in the Galaxy ecosystem

  • Various degrees of type annotation across repositories
  • In Galaxy, annotations started in late 2020 to support FastAPI/pydantic runtime data validation in API calls
    • Large part of Galaxy API annotated
    • Extended to packages produced from the codebase (e.g. galaxy-util, galaxy-tool-util, …)
    • Often added while refactoring code
    • Goal of becoming more complete and strict for core code, tracked in mypy.ini
  • Thanks to Marius van den Beek, John Chilton, David López, Michael Crusoe and many others!

13 of 14

BioBlend annotation at GCC2022 Collaboration Fest

  • The BioBlend codebase had only some initial type annotations
  • Proposed project at GCC2022 Collaboration Fest
    • 5 participants (only 1 core dev): perfect project for newcomers!
    • Divided work by module, with fast review/merge cycles
  • Almost finished annotating the whole codebase in 3 days
    • 29 merged pull requests, 45 files changed, 1200 insertions(+), 650 deletions(-) in total.
  • Thanks to Catherine Bromhead, Jayadev Joshi, Fabio Cumbo and Bryan @thepineapplepirate!

14 of 14

@NicolaSoranzo

Questions or comments?

Should we have an EGD CoFest type annotation project?

nicola.soranzo@earlham.ac.uk

Earlham Institute, Norwich Research Park, Norwich, Norfolk, NR4 7UZ, UK

www.earlham.ac.uk