1 of 58

KWCoco

2 of 58

KWCoco

  • Does not depend on torch!
  • Formal JSON schema which is an extension of (and backwards compatible with) MS-COCO
  • Tooling to read, write, and modify kwcoco.json files
  • CLI tool for stats, visualization, evaluation

3 of 58

How it started, and how it’s going

2021-02-05 - how it started

2022-04-06 - how it’s going

2023-11-29 - how it’s still going

4 of 58

KWCOCO

  • Manifest of videos, images, categories, and annotations.
  • Efficient raster and vector sampling at specified resolution, space-time location, and sensor/band combination.
  • Multispectral images, raster features, and predicted heatmaps are simply another “asset” for each image.
  • Python API and CLI interfaces.
  • gitlab.kitware.com/computer-vision/kwcoco

[Diagram labels: Video; Img 1, Img 2, Img 3; Assets 1, Assets 2, Assets 3; channel labels R|G|B, b3, b8, pan, f1, f2; Annots (three times).]

Videos:

  • Corresponds to a region.
  • “Video-space” corresponds to a constant GSD.

Assets:

  • File-paths
  • Channels
  • Transform to “image-space”

Images:

  • Sensor
  • Datetime
  • Transform to “video-space”

Annotations:

  • Bounding box
  • Segmentation
  • Category
  • Stored in “image-space”

5 of 58

KWCOCO For SMART

  • Asset Space
    • native on disk
  • Image Space
    • all bands resampled to largest
  • Video Space
    • all regions in a common resolution
  • Register any raster data (raw or features)
  • All resampling is done on the fly
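
The three spaces can be related by chaining transforms. A minimal sketch, assuming pure scale transforms and made-up scale factors; the helper names (`scale_matrix`, `matmul3`, `apply`) are hypothetical and are not part of kwcoco:

```python
def scale_matrix(s):
    """3x3 homogeneous matrix for a uniform scale (hypothetical helper)."""
    return [[s, 0.0, 0.0],
            [0.0, s, 0.0],
            [0.0, 0.0, 1.0]]


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(mat, x, y):
    """Apply a 3x3 affine matrix to a 2D point."""
    return (mat[0][0] * x + mat[0][1] * y + mat[0][2],
            mat[1][0] * x + mat[1][1] * y + mat[1][2])


# Assumed example values: a band stored at half the image resolution, and a
# video space at half the image resolution.
warp_asset_to_img = scale_matrix(2.0)   # asset space -> image space
warp_img_to_vid = scale_matrix(0.5)     # image space -> video space

# Compose once, so data only needs to be resampled a single time.
warp_asset_to_vid = matmul3(warp_img_to_vid, warp_asset_to_img)

print(apply(warp_asset_to_img, 10, 20))  # the point in image space
print(apply(warp_asset_to_vid, 10, 20))  # the same point in video space
```

Composing the matrices before touching pixels is what makes on-the-fly resampling cheap: only the final transform is applied to the data.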

6 of 58

KWCOCO (tracking)

MS-COCO modified to support simple tracking:

  • Added top-level video table
  • Images contain:
    • The “video_id” they belong to
    • A “frame_index” or “timestamp” to define ordering of the frames
  • Annotations contain:
    • A “track_id” to be shared between annotations in the same track.
  • Limitations: annotations are stored for each frame. In the future we will introduce an alternative specification to reduce redundancy.
  • gitlab.kitware.com/computer-vision/kwcoco
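
The tracking fields above can be exercised with plain dictionaries. A minimal sketch with a hand-written manifest (all values are made up for illustration); it groups annotations by “track_id” and orders each track by “frame_index”:

```python
from collections import defaultdict

# A tiny hand-written manifest (assumed values, for illustration only).
dataset = {
    "videos": [{"id": 1, "name": "video1"}],
    "images": [
        {"id": 1, "video_id": 1, "frame_index": 0, "file_name": "frame0.png"},
        {"id": 2, "video_id": 1, "frame_index": 1, "file_name": "frame1.png"},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "track_id": 7, "bbox": [10, 10, 5, 5]},
        {"id": 2, "image_id": 2, "track_id": 7, "bbox": [12, 11, 5, 5]},
        {"id": 3, "image_id": 2, "track_id": 8, "bbox": [40, 40, 9, 9]},
    ],
}

# Map each image id to its position in the video.
frame_of = {img["id"]: img["frame_index"] for img in dataset["images"]}

# Group annotations by track, then order each track by frame.
tracks = defaultdict(list)
for ann in dataset["annotations"]:
    tracks[ann["track_id"]].append(ann)
for anns in tracks.values():
    anns.sort(key=lambda ann: frame_of[ann["image_id"]])

for track_id, anns in sorted(tracks.items()):
    print(track_id, [(frame_of[a["image_id"]], a["bbox"]) for a in anns])
```

Because every frame stores its own annotation, a track is recovered by grouping rather than by reading a single record, which is the redundancy noted in the limitations.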

[Diagram labels: Video; Img 1, Img 2, Img 3; Assets 1, Assets 2, Assets 3; channel labels R|G|B (three times), depth, IR; Annots (three times), each with a <track_id>.]

7 of 58

Motivation

  • MS-COCO is an easy-to-use format, with good and bad properties
    • Good: all metadata in one file
    • Bad: no holes in polygons
    • Bad: no keypoint category encoding

8 of 58

KWCOCO

  • Pycocotools is the official API
    • completely static
    • Needs compiled modules
  • KWCOCO is an alternative
    • Can add / remove objects
    • Command line interface
    • Can run in pure-python
    • *new* experimental SQL backend
    • Support for an extended schema
      • Holes in polygons
      • Keypoints with categories
      • Group images into videos
      • Auxiliary channels

9 of 58

KWCOCO CLI

  • Quick stats on a dataset
  • Combine two datasets together
  • Split a dataset into train / validation / test
  • Change the referenced image paths to a new absolute or relative location
  • Create DEMODATA!
  • Infer unset attributes to conform to the spec
  • Evaluate object detection using one COCO file as truth and another as predictions
  • Change category names
  • Validate the schema and assets

10 of 58

Demo Data

  • The toydata command can generate a toy dataset consisting of objects to detect on a noisy background.
  • Useful for testing of ML algorithms without having to rely on downloading a large dataset.

11 of 58

Default Dictionary Backend

  • To open an existing dataset:
      import kwcoco
      dset = kwcoco.CocoDataset(path)
  • dset.dataset is exactly what is loaded from the json file.
  • dset.index maintains fast lookup tables by primary keys (e.g. id)
    • dset.index.anns
    • dset.index.imgs
    • dset.index.cats
    • dset.index.gid_to_aids
    • dset.index.cid_to_aids
    • dset.index.name_to_video
    • dset.index.file_name_to_img

Note on notation: a = Annotation, c = Category, g = imaGe, vid = video (e.g. aid = annotation id, gid = image id)
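
The idea behind the lookup tables can be sketched with plain dictionaries. This is an illustration under assumptions, not kwcoco's implementation: `build_index` is a hypothetical helper and the manifest is hand-written.

```python
from collections import defaultdict


def build_index(dataset):
    """Build id-based lookup tables from a COCO-style dict (sketch only)."""
    index = {
        "anns": {ann["id"]: ann for ann in dataset.get("annotations", [])},
        "imgs": {img["id"]: img for img in dataset.get("images", [])},
        "cats": {cat["id"]: cat for cat in dataset.get("categories", [])},
        "file_name_to_img": {
            img["file_name"]: img for img in dataset.get("images", [])
        },
        "gid_to_aids": defaultdict(set),
        "cid_to_aids": defaultdict(set),
    }
    for aid, ann in index["anns"].items():
        index["gid_to_aids"][ann["image_id"]].add(aid)
        index["cid_to_aids"][ann["category_id"]].add(aid)
    return index


# Hand-written example data (assumed values).
dataset = {
    "images": [
        {"id": 1, "file_name": "a.png"},
        {"id": 2, "file_name": "b.png"},
    ],
    "categories": [{"id": 1, "name": "cat"}, {"id": 2, "name": "dog"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},
        {"id": 11, "image_id": 1, "category_id": 2},
        {"id": 12, "image_id": 2, "category_id": 2},
    ],
}

index = build_index(dataset)
print(sorted(index["gid_to_aids"][1]))     # annotation ids in image 1
print(sorted(index["cid_to_aids"][2]))     # annotation ids with category 2
print(index["file_name_to_img"]["b.png"])  # image record by file name
```

The raw dictionary stays untouched; the index only holds references into it, so a lookup by id never scans the lists.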

Example JSON data structure

12 of 58

Vectorized interface

Convenience accessors are provided to allow for accessing multiple attributes with minimal code.

dset.images(<gids>)
dset.annots(<aids>)
dset.videos(<vidids>)
dset.categories(<cids>)
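
What “accessing multiple attributes with minimal code” can look like, as a rough sketch. The `Rows` class below is a hypothetical stand-in written for this example; it is not the object that the kwcoco accessors return.

```python
class Rows:
    """Hypothetical stand-in for a vectorized view over many records."""

    def __init__(self, ids, table):
        self.ids = list(ids)
        self._table = table  # maps id -> record dict

    def lookup(self, key, default=None):
        """Return one attribute for every id, in order."""
        return [self._table[i].get(key, default) for i in self.ids]

    def __len__(self):
        return len(self.ids)


# Hand-written image table (assumed values).
imgs = {
    1: {"id": 1, "file_name": "a.png", "width": 640, "height": 480},
    2: {"id": 2, "file_name": "b.png", "width": 320, "height": 240},
}

images = Rows([1, 2], imgs)
print(images.lookup("width"))
print(images.lookup("sensor", default="unknown"))
```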

13 of 58

Experimental SQLAlchemy Backend

  • A coco dataset can be converted into a read-only SQLite file.
  • Allows for working around Torch issue #13246 with DataLoaders and multiprocessing. (Only a python string is copied).
  • API is exactly the same as the dictionary-based json data structure.
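
Why only a string needs to be copied can be shown with the standard `sqlite3` module. This is a sketch of the general idea only: the table layout and the `aids_for_image` helper are assumptions made for this example, and kwcoco's actual SQLAlchemy schema is not shown.

```python
import os
import sqlite3
import tempfile

# Hand-written annotation rows (assumed values).
annotations = [
    {"id": 1, "image_id": 1, "category_id": 2},
    {"id": 2, "image_id": 1, "category_id": 3},
    {"id": 3, "image_id": 2, "category_id": 2},
]

# Write the table once.
db_path = os.path.join(tempfile.mkdtemp(), "toy.sqlite")
conn = sqlite3.connect(db_path)
conn.execute(
    "CREATE TABLE annotations ("
    "id INTEGER PRIMARY KEY, image_id INTEGER, category_id INTEGER)"
)
conn.executemany(
    "INSERT INTO annotations VALUES (:id, :image_id, :category_id)",
    annotations,
)
conn.execute("CREATE INDEX ix_image_id ON annotations (image_id)")
conn.commit()
conn.close()


def aids_for_image(path, image_id):
    """Open the file read-only and return annotation ids for one image."""
    ro = sqlite3.connect("file:" + path + "?mode=ro", uri=True)
    try:
        rows = ro.execute(
            "SELECT id FROM annotations WHERE image_id = ? ORDER BY id",
            (image_id,),
        ).fetchall()
    finally:
        ro.close()
    return [row[0] for row in rows]


# A worker process only needs the path string, not the whole dataset.
print(aids_for_image(db_path, 1))
```

Each worker opens its own read-only connection, so nothing large has to be pickled or duplicated in memory when workers are spawned.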

14 of 58

SQL Scaling

15 of 58

NDSampler

16 of 58

NDSampler

  • Easy integration with kwcoco
  • Automatic conversion of images to COG format (configurable, with sensible defaults). This is optional and tricky (due to GDAL), but ultimately worthwhile.
  • Spatial Indexes - quickly find all the annotations in a specified region of the image.
  • Support for kwimage (soon to be kwannot?) data structures.
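
A spatial index answers “which annotations touch this region?” without scanning every box. The sketch below uses a uniform grid as a stand-in technique; it is not ndsampler's implementation, and `GridIndex` is a hypothetical class written for this example.

```python
from collections import defaultdict


class GridIndex:
    """Uniform-grid spatial index over xywh boxes (illustrative only)."""

    def __init__(self, cell_size=64):
        self.cell_size = cell_size
        self.cells = defaultdict(set)   # (col, row) -> annotation ids
        self.boxes = {}                 # annotation id -> (x, y, w, h)

    def _cells_for(self, x, y, w, h):
        """Yield every grid cell touched by the box."""
        c = self.cell_size
        for col in range(int(x // c), int((x + w) // c) + 1):
            for row in range(int(y // c), int((y + h) // c) + 1):
                yield (col, row)

    def insert(self, aid, box):
        self.boxes[aid] = box
        for cell in self._cells_for(*box):
            self.cells[cell].add(aid)

    def query(self, x, y, w, h):
        """Return sorted ids of boxes that overlap the query region."""
        candidates = set()
        for cell in self._cells_for(x, y, w, h):
            candidates |= self.cells.get(cell, set())
        hits = []
        for aid in candidates:
            bx, by, bw, bh = self.boxes[aid]
            # Exact overlap test on the few candidates from nearby cells.
            if bx < x + w and x < bx + bw and by < y + h and y < by + bh:
                hits.append(aid)
        return sorted(hits)


index = GridIndex(cell_size=64)
index.insert(1, (10, 10, 20, 20))
index.insert(2, (200, 200, 50, 50))
index.insert(3, (60, 60, 10, 10))
print(index.query(0, 0, 64, 64))
```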

17 of 58

Training a 224x224 Resnet50 Classifier on Annotations in 1080x1920 images

[Benchmark figure comparing runs with and without COG. Labels, in extraction order: With COG; ~80-95ish% constant utilization; Without COG; 67Hz @ 3x224x224; With COG; best worker count = 4; 19.4GB RAM per worker; Without COG; With COG; best worker count > 12; 42GB RAM per worker!; Without COG; jumps between 0 and 100% utilization; With COG; 654Hz @ 3x224x224.]

18 of 58

KWImage Data Structures
(may change to kwannot)

These all interface nicely with shapely

19 of 58

Overview

  • Boxes - bounding boxes
  • Coords - arbitrary D-dim coordinates
  • Mask - binary mask
  • Polygon - exterior ring, and a list of interior rings
  • Points - wraps Coords, used for xy keypoints
  • MultiPolygon - Multiple polygons
  • Detections - Container for multiple boxes, and associated score, class-ids, etc.
  • Heatmap - soft multi-category masks

All structures have methods:

  • tensor / numpy
    • Convert data to a torch / numpy backend
  • warp
    • Transform underlying raster or vector data with some transform specification, e.g. a matrix, imgaug, a GDAL transform, or an arbitrary function.
  • draw_on
    • Visualizes the structure on an image
  • random
    • classmethod to make a random instance for demo, testing, and algorithm purposes

Core principle: Classes should be extremely thin wrappers around underlying numpy arrays

20 of 58

kwimage.structs.Boxes

  • Maintains box format
    • xywh - tl_x, tl_y, w, h
    • ltrb - tl_x, tl_y, br_x, br_y
    • cxywh - cx, cy, w, h
  • Fast backend operations:
    • Boxes.ious has a C backend
  • Selected Boxes methods:
    • Boxes.translate
    • Boxes.scale
    • Boxes.warp
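
The three box formats and an IoU computation, written out in plain Python for reference. These free functions are written for this example and are not the kwimage API; they only make the arithmetic behind the format names concrete.

```python
def xywh_to_ltrb(box):
    """(tl_x, tl_y, w, h) -> (tl_x, tl_y, br_x, br_y)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def ltrb_to_cxywh(box):
    """(tl_x, tl_y, br_x, br_y) -> (cx, cy, w, h)."""
    left, top, right, bottom = box
    w, h = right - left, bottom - top
    return (left + w / 2, top + h / 2, w, h)


def cxywh_to_xywh(box):
    """(cx, cy, w, h) -> (tl_x, tl_y, w, h)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, w, h)


def iou(a, b):
    """Intersection-over-union of two ltrb boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


box = (10, 20, 30, 40)               # xywh
ltrb = xywh_to_ltrb(box)             # same box as corners
cxywh = ltrb_to_cxywh(ltrb)          # same box as center + size
round_trip = cxywh_to_xywh(cxywh)    # back to xywh

print(ltrb, cxywh, round_trip)
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```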

21 of 58

kwimage.structs.Coords

  • Simple backend for other classes that need to maintain lists of coordinates.
  • Selected Coords methods:
    • Coords.to_imgaug
    • Coords.from_imgaug
    • Coords.soft_fill

22 of 58

kwimage.structs.Polygon

  • Maintain an exterior and multiple interior rings of coordinates.
  • Selected Polygon methods:
    • Polygon.to_shapely
    • Polygon.to_mask
    • Polygon.to_geojson
    • Polygon.to_coco
    • <from variants of above>

23 of 58

kwimage.Detections

  • Container for columns of associated data such as boxes, classes, scores, keypoints, segmentations, etc.
  • Selected Detections methods:
    • Detections.non_max_supress
    • Detections.argsort
    • Detections.warp
    • Detections.to_coco
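
For readers unfamiliar with non-maximum suppression, here is the classic greedy variant in plain Python. It is a sketch of the general technique, not the kwimage implementation, and the function names are chosen for this example.

```python
def iou(a, b):
    """Intersection-over-union of two ltrb boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, thresh=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a better box too much.
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores, thresh=0.5))
```

The second box overlaps the first with an IoU of 81/119, which is above the threshold, so it is suppressed.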

24 of 58

KWArray

  • Misc Items
  • ArrayAPI - torch-numpy API interoperability
  • DataFrameArray - faster-than-pandas named-column-based arrays.
    • DataFrameLight - Python List version of the above structure useful for fast appends.
    • Future API: One class, specify between fast append mode or vectorized mode.
  • Algorithms:
    • Hungarian (i.e. Maximum Value Matching)
    • SetCover (Greedy Approximation and an Exact ILP)
  • Utilities
    • util_random (ensure_rng)
    • stats_dict(arr) -> {'min': 0, 'max': 3, 'mean': 1.2, …}
    • group_items | group_indices, apply_grouping, group_consecutive
    • Faster 32-bit RNGs: standard_normal32, uniform32 (numpy may have this now)
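
The greedy set-cover approximation mentioned above, sketched in plain Python. This shows the textbook algorithm under assumed inputs; it is not kwarray's code, and `greedy_set_cover` is a name chosen for this example.

```python
def greedy_set_cover(candidates):
    """Pick sets greedily until every coverable item is covered.

    candidates: dict mapping a key to a set of items.
    Returns the chosen keys in the order they were picked.
    """
    uncovered = set().union(*candidates.values())
    chosen = []
    while uncovered and candidates:
        # Choose the set that covers the most still-uncovered items.
        best = max(candidates, key=lambda k: len(candidates[k] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            break
        chosen.append(best)
        uncovered -= gain
    return chosen


candidates = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5},
    "d": {1, 5},
}
print(greedy_set_cover(candidates))
```

The greedy choice is fast but only approximate; an exact ILP formulation can find a smaller cover at a higher computational cost.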

25 of 58

KWImage

  • The “Structs”: Boxes, Mask, Coords, Points, Polygon, MultiPolygon, Detections, Heatmap
  • IO:
    • imread - can force a colorspace (defaults to rgbx) and choose a backend (gdal, opencv, scikit-image; no PIL, because PIL is slow), defaulting to the fastest.
    • imwrite - same sensible default colorspace and backend choices.
    • load_image_shape - get the image shape without reading the entire image. This is where PIL is useful; gdal is also reasonable.
  • Functional:
    • overlay_alpha_images(img1, img2) -> blended image # can also specify alpha values
    • stack_images([*img1, img2, …], axis=0) -> concatenate images (options to handle heterogeneous sizes)
    • warp_tensor - warp a torch tensor or numpy array with a homography or affine matrix
    • grab_test_image, grab_test_image_path - Useful for unit/doc tests. You are testing, right?
    • imresize - scale factor, constant size, letterboxing, returns transform info
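
Letterboxing and its “transform info” come down to one scale and one offset. A minimal sketch of that arithmetic; `letterbox_transform` and `to_canvas` are hypothetical helpers written for this example, not the imresize signature or return format.

```python
def letterbox_transform(src_w, src_h, dst_w, dst_h):
    """Scale and offset that fit a source image inside a destination canvas.

    The aspect ratio is preserved and the image is centered; the rest of
    the canvas is padding.
    """
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    offset = ((dst_w - new_w) // 2, (dst_h - new_h) // 2)
    return {"scale": scale, "offset": offset, "size": (new_w, new_h)}


def to_canvas(info, x, y):
    """Map a point from source pixels to canvas pixels."""
    return (x * info["scale"] + info["offset"][0],
            y * info["scale"] + info["offset"][1])


info = letterbox_transform(1920, 1080, 224, 224)
print(info["size"], info["offset"])
print(to_canvas(info, 0, 0))
```

Returning the transform alongside the resized image is what lets annotations be warped into the same canvas afterwards.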

26 of 58

KWImage - The Structures (kwannot?)

  • The “Structs”: Boxes, Mask, Coords, Points, Polygon, MultiPolygon, Detections, Heatmap
  • Drawing functions
    • ax = item.draw() # matplotlib
    • canvas = item.draw_on(img.copy(), color='blue') # opencv, inplace whenever possible
  • Casting Functions
    • coerce - try your best to make this type object from some other type of object
    • to_coco / from_coco - return a coco-compatible representation (may not always be perfect)
    • toformat - change the underlying data format
    • random - make a random instance for testing or randomized algorithms

27 of 58

KWPlot

  • It's OK; I know how to use it. Seaborn is really cool too; it's probably better and you should learn that instead. kwimage does depend on kwplot for its structures' “.draw” methods.
  • The most likely to be useful bits:
    • kwplot.autompl is nice if you use IPython
    • kwplot.BackendContext can force mpl backends
    • multi_plot - very seaborn like interface, but no pandas required

28 of 58

Somewhat related to kwcoco

  • Some capabilities of smartwatch will be ported to kwcoco itself

29 of 58

Issues With Existing Machine Learning Systems

  • Lack of a good Data Manifest in existing DL CV systems (e.g. detectron2, mmdet)
    • Hard-coded number of categories
    • Hard-coded mean/std
    • MSI Images need to be resampled and aligned on-disk (if MSI is even supported)
    • Weight checkpoints are the only output of training (topology and metadata not included)
    • Annotations are decoupled from images (often need to specify a path to annotations and a path to images)
  • We want:
    • A manifest that registers paths to images and their annotations (1 file = 1 dataset)
    • Infer number of categories based on the manifest (extend to new categories)
    • Infer mean/std based on the manifest (extend to different MSI sensors)
    • Test data that contains corner cases so we can run it on the CI (auto-generated)
    • Produce deployed packages with all metadata needed to predict on unseen data.

30 of 58

KWCOCO

  • Extension of MSCOCO
    • Categories
    • Annotations (with Tracks)
    • Images
      • Auxiliary
    • Videos
  • Combined with ndsampler to randomly sample space-time windows
  • Images stored in native resolution in one or multiple files

New!

Visualizations of an MSI kwcoco file.

  • Left: loaded red|green|blue features
  • Right: loaded inv_sort1|inv_augment1|inv_shared1 features (These are UKY TA2 features)

Any set of channels can be loaded as a “DelayedImage”; the finalize() operation loads and aligns the data at a specified resolution on the fly.
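
The delayed-loading pattern, reduced to its core. The `Delayed` class below is a hypothetical toy written for this example (it only supports cropping nested lists) and is not kwcoco's DelayedImage; it shows that operations are queued and nothing is read until finalize().

```python
class Delayed:
    """Toy delayed-operation node: records work, runs it on finalize()."""

    def __init__(self, loader, ops=()):
        self.loader = loader     # zero-argument callable returning rows
        self.ops = tuple(ops)    # queued (name, args) operations

    def crop(self, y1, y2, x1, x2):
        """Queue a crop and return a new node; no data is touched."""
        return Delayed(self.loader, self.ops + (("crop", (y1, y2, x1, x2)),))

    def finalize(self):
        """Load the data and apply every queued operation in order."""
        data = self.loader()     # the only place data is actually read
        for name, args in self.ops:
            if name == "crop":
                y1, y2, x1, x2 = args
                data = [row[x1:x2] for row in data[y1:y2]]
        return data


load_calls = []


def fake_loader():
    """Stand-in for reading an image from disk."""
    load_calls.append(1)
    return [[r * 10 + c for c in range(4)] for r in range(4)]


delayed = Delayed(fake_loader).crop(1, 3, 0, 2)
print(len(load_calls))        # nothing has been loaded yet
print(delayed.finalize())     # load + crop happen here
```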

Note: A video corresponds to a time sequence of images.

31 of 58

ToyData: Food for the CI

Toy data is useful for developing, debugging, and running tests on CI! It separates data from algorithms: because Drop1 (and every other dataset we work with) is in the same format, we can swap it in for the toy data.

32 of 58

KW-COCO

  • JSON Manifest of image sequences (i.e. videos aka regions)
    • Heterogeneous sensors / resolutions / channels
    • Pixel based box, polygon, and mask annotations
  • Kitware’s TA-2 Interchange Format
    • Stores data in native resolution
    • Intermediate features stored as new “auxiliary” channels
    • Final results stored as annotations
  • Combines with ndsampler
    • Resampling at specified resolution (on the fly)
    • Random sampling of subregions for training
  • SQLAlchemy backend for scaling

Sample Data From Region at a Virtual Resolution

I want channels [B2, nir, Material1, Material2, B11] at 10 meter GSD from [0:100, 100:200] at frames [3, 7] in video “23KPQ_BR_Rio_R01”.

kwcoco + ndsampler

Here’s that (5,2,100,100) tensor and annotations in relative coordinates.
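
The request above maps directly onto the tensor shape. In this sketch the request is a plain dict whose keys are invented for illustration; this is not the ndsampler call signature.

```python
# The request from the slide, written as a plain dict (hypothetical keys).
request = {
    "video": "23KPQ_BR_Rio_R01",
    "channels": ["B2", "nir", "Material1", "Material2", "B11"],
    "frames": [3, 7],
    "space_slice": (slice(0, 100), slice(100, 200)),
    "gsd": 10,  # meters
}


def expected_shape(request):
    """(channels, frames, height, width) of the sampled tensor."""
    sl_y, sl_x = request["space_slice"]
    return (
        len(request["channels"]),
        len(request["frames"]),
        sl_y.stop - sl_y.start,
        sl_x.stop - sl_x.start,
    )


print(expected_shape(request))
```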

33 of 58

Adding Your Features to the KWCOCO file

Input KWCOCO:

    {
      "videos": [{"name": "TheRegionName", "width": 300, "height": 400}, ...],
      "images": [
        {
          "name": "TheImageName",
          "width": 600,
          "height": 800,
          "video_id": 1,
          "date_captured": "2018-10-16T16:02:29",
          "warp_img_to_vid": {"scale": 0.5},
          "auxiliary": [
            {
              "file_name": "B1.tif",
              "warp_aux_to_img": {"scale": 2.0},
              "width": 300, "height": 400,
              "channels": "coastal", "num_bands": 1
            },
            {
              "file_name": "B2.tif",
              "warp_aux_to_img": {"scale": 1.0},
              "channels": "blue", "num_bands": 1
            },
            ...
          ]
        },
        ...
      ]
    }

Output KWCOCO:

    ...
    "auxiliary": [
      {"file_name": "B1.tif", ...},
      {"file_name": "B2.tif", ...},
      {
        "file_name": "YOUR_FEATURE_PATH.tif",
        "warp_aux_to_img": {"scale": 4.0},
        "width": 75, "height": 100,
        "channels": "your_channel_code",
        "num_bands": 32,
        ...
      },
      ...
    ]

Append a new “auxiliary” item
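
Appending the new item can be done on the plain dictionary. A minimal sketch: `add_feature_asset` is a hypothetical helper, the 150x200 feature size is an assumed value, and the scale is derived under the assumption that the feature raster covers the whole image with the same aspect ratio.

```python
import copy


def add_feature_asset(image, file_name, channels, num_bands, width, height):
    """Return a copy of `image` with one more entry in its "auxiliary" list."""
    new_image = copy.deepcopy(image)
    # Assumption: a uniform scale maps the feature raster onto the image.
    scale = image["width"] / width
    new_image.setdefault("auxiliary", []).append({
        "file_name": file_name,
        "warp_aux_to_img": {"scale": scale},
        "width": width,
        "height": height,
        "channels": channels,
        "num_bands": num_bands,
    })
    return new_image


# Image record following the input example (trimmed to one band).
image = {
    "name": "TheImageName",
    "width": 600,
    "height": 800,
    "auxiliary": [
        {"file_name": "B1.tif", "warp_aux_to_img": {"scale": 2.0},
         "width": 300, "height": 400, "channels": "coastal", "num_bands": 1},
    ],
}

updated = add_feature_asset(
    image, "YOUR_FEATURE_PATH.tif", "your_channel_code",
    num_bands=32, width=150, height=200,  # hypothetical feature size
)
print(updated["auxiliary"][-1])
```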

34 of 58

Part 1: The kwcoco + ndsampler libraries

35 of 58

The kwcoco library

  • kwcoco: https://kwcoco.readthedocs.io/en/latest/
    • IS
      • A data format
      • An indexable manifest of categories, videos, images, and annotations.
      • Human readable (mostly, you wanna see what a segmentation looks like?)
      • A command line interface (CLI) tool. (with scoring code)
        • See `kwcoco --help`
      • An API with add / remove, statistic, and other helper methods.
      • Coercible - several ways to represent annotation data
        • i.e. There is a formal schema, but multiple backwards compatible formats are supported and new styles (like WKT or on-disk masks for segmentations) can be added.
    • IS NOT
      • For loading data (it has lightweight ways of doing it, i.e. inefficient because there is no disk cache)
      • For data streaming

36 of 58

I truncated the segmentations

37 of 58

The data structure

  • Accessing information in a kwcoco dataset is usually done by interfacing with an “index” object.
  • There is also an alternative ORM-like API

38 of 58

The SQL Backend (don’t linger on this slide)

39 of 58

The ndsampler Library

  • ndsampler: https://ndsampler.readthedocs.io/en/latest/
    • DOES
      • Load (3d-video and 2d-image) data
      • Cache things like spatial indexes and COGs to a “workdir”
      • Do all of the alignment magic between channels, images, and videos with different resolutions
      • Provide an indexable regular grid of “positive” and “negative” samples. (helps write dataloaders)
    • DOES NOT
      • Implement a torch dataset / dataloader by itself
      • Work with 1d or 4d+ data 😞
      • Work with anything but kwcoco (currently)
      • Use the GPU
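
An “indexable regular grid” of sample windows can be pictured as a list of slice tuples. The sketch below is written for this example (it is neither ndsampler's grid nor kwarray.SlidingWindow) and tiles a (time, height, width) volume; windows that would run past an edge are shifted back so every window has full size.

```python
import itertools


def sliding_window_slices(shape, window, stride=None):
    """Return tuples of slices that tile an N-dimensional shape."""
    if stride is None:
        stride = window
    per_dim = []
    for size, win, step in zip(shape, window, stride):
        if win > size:
            raise ValueError("window must fit inside the shape")
        starts = list(range(0, size - win + 1, step))
        if starts[-1] + win < size:
            starts.append(size - win)  # final window flush with the edge
        per_dim.append([slice(s, s + win) for s in starts])
    return list(itertools.product(*per_dim))


# Assumed example: 5 frames of 100x150 pixels, windows of 2 frames x 50 x 50.
grid = sliding_window_slices(shape=(5, 100, 150), window=(2, 50, 50))
print(len(grid))
print(grid[0])
print(grid[-1])
```

Because the grid is a plain list, a dataset's `__getitem__` can map an integer index straight to a space-time region.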

40 of 58

Ndsampler API

  • Can be modified to suit developer needs
  • Images:
    • gid
    • space-region
      • slices (y1:y2, x1:x2)
      • cx, cy, width, height
  • Videos:
    • vidid
    • space-time-region
      • slices (t1:t2, y1:y2, x1:x2)
    • Alternates?
      • Specify space and time separately
      • Specify list of gids for time

41 of 58

Supporting Libraries

  • kwarray - https://kwarray.readthedocs.io/en/latest/
    • Low-level python library containing numpy-like array operations.
    • kwarray.SlidingWindow(<shape>, <window>) - will be used later when gridding up “videos”
    • kwarray.ArrayAPI - interoperability between torch and numpy
  • kwimage - https://kwimage.readthedocs.io/en/latest/
    • Low level python library specifically for image operations
    • Currently is the home of the “kwimage” data structures:
      • kwimage.Boxes
      • kwimage.Coords
      • kwimage.MultiPolygon
      • kwimage.Mask
      • kwimage.Detections

42 of 58

Part 2: The WATCH Datasets

43 of 58

Givens and Goals

  • Given a static training dataset containing:
    • “Videos” of orthorectified spatial regions chosen a priori
    • Frames in each video may contain a mixture of:
      • Channels spread across different files and at different resolutions
      • Single files containing multiple channels
    • Each frame might be at a different resolution.
    • Note: Channels might be features from earlier steps in TA1 or TA2!
  • The TA2 developer should be able to:
    • Point at a single file to load a dataset
    • Load any space-time-region at some specified resolution for any video with any subset of channels.

[Diagram: Orthorectified Dataset. Frame labels: ???, LC, WV, S2, S2, LC, LC, WV. Channel labels: Gray (uint8); TrueColor (uint8); B1, B2, B3, B4, B5, B6, B7, B8, B8A, B9, B10, B11; 8 bands, unknown.]

44 of 58

Method Overview

Raw GeoTiff Source

(e.g. RGD)

[Diagram: frames labeled S2, LC, WV, S2, S2, LC, LC, WV]

Orthorectified Native Resolution KWCoco

For each ROI-Query we:

  • find all overlapping geotiffs
  • orthorectify and crop them to the ROI at native resolution
  • register them as a “video” in a kwcoco file.

Updated KWCoco at a Specified “Virtual” Resolution

watch/scripts/geojson_to_kwcoco.py

watch/scripts/coco_align_geotiffs.py

watch/scripts/coco_add_watch_fields.py

  • Each file is associated with a “channel” code (a “|”-separated string).
  • Transforms are computed between images in a video. (all resize ops are delayed until sample time)
    • warp_img_to_vid
    • warp_aux_to_img
  • ndsampler can now sample given:
    • Temporal bounds in frame indexes (or specific image-ids)
    • Spatial bounds in pixels
    • Specified channels
    • Specified scale (TODO)

45 of 58

Important kwcoco Fields for WATCH

  • channels - a ‘|’ separated string that gives a codename to each channel.
  • warp_aux_to_img - a transformation from auxiliary space to a chosen “image” space. Not shown here, but used when an image has multiple bands in different files.
  • warp_img_to_vid - a transformation from the chosen “image” space into a chosen “video” space.

46 of 58

The updated “drop0-aligned” kwcoco dataset

  • Notice Videos now have fields:
    • “wld_to_vid”
    • “height”
    • “width”
  • Notice Images now have fields:
    • “channels”
    • “timestamp”
    • “num_bands”
    • “approx_meter_gsd”
    • “approx_elevation”
    • “warp_img_to_vid”
    • “warp_to_wld”
      • Note that the TransformSpec is flexible, and we could do anything you want as long as `kwimage.Transform.coerce` can support it.
  • Note: this would be slightly different for auxiliary images.

47 of 58

Method summary

  • We have an orthorectified kwcoco dataset.
    • Images are in COG format
    • Images have transforms populated
    • A video “gsd” is chosen
  • We load the kwcoco dataset
  • We create an ndsampler.CocoSampler
  • We create an indexable grid of sample regions
  • We use that grid and the sampler itself to write a data loader.

48 of 58

Part 3: In Domain Examples

49 of 58

50 of 58

51 of 58

The kwcoco API - compute stats on drop0

52 of 58

53 of 58

54 of 58

Scratch

55 of 58

[Diagram labels: Video; Img 1, Img 2; Assets 1, Assets 2; channel labels R|G|B (twice), depth; Annot 1, Annot 2, Annot 3; Track1, Track2.]

56 of 58

[Diagram, built up in three steps. Step 1: Image, Asset, Annotation, Track, and Video entities joined by links marked “*”. Step 2: adds Category and Info. Step 3: adds Audio and MPEG.]

? audio / mpeg structure to be determined.

57 of 58

[Diagram: Image, Asset, Annotation, Track, Video, Category, and Info entities joined by links marked “*”, with Audio and MPEG labels repeated and an additional Annotation label.]

The basic audio annotation is a 1D box, indicating start / end time. The label could be a category, a caption, etc.

58 of 58

[Diagram: Image, Asset, Annotation, Track, Video, Category, and Info entities joined by links marked “*”, with Audio, Movies, and Stills labels.]