1 of 24

Decoding the Battlefield: Machine Learning Challenges in Electronic Warfare


March 1, 2024

© 2023 SRC, Inc. All rights reserved | v20231101

This document contains information proprietary to SRC that shall not be disclosed outside the organization receiving the document,

and shall not be duplicated, used, or disclosed in whole or in part for any purpose other than to evaluate its content.

PR 20-####


2 of 24

Introduction

3 of 24

SRC Overview

  • Independent, not-for-profit research and development corporation
  • Headquartered in Syracuse, NY, with several regional locations (1,500 employees)
  • Customers include
    • U.S. Dept of Defense, intelligence agencies
    • Other U.S. Agencies (CDC, EPA)
    • International Partners

  • Mission: To deliver innovative, advanced defense solutions and products that help keep America and our allies safe and strong through R&D, services, and manufacturing - investing all our earnings into technology, employees, and communities.

4 of 24

What we do at SRC

5 of 24

Machine Learning in Electronic Warfare

  • Electronic Warfare involves using the electromagnetic spectrum to detect actions or communications, attack enemy capabilities within the spectrum, and protect friendly assets from interruption.
  • Machine learning is being used to automate signals analysis functions across the Electronic Warfare workflow
    • Signal Feature Extraction/Emitter Characterization
    • Emitter/Platform Identification
    • Electronic Counter-Measures/Counter-Counter-Measures
    • Intelligence Production/Modeling/Reporting

6 of 24

Data is Everything

  • You can’t do Machine Learning without understanding the data
  • “Good” data needs to represent the problem you are trying to solve

7 of 24

Challenge 1 – Machine Learning Dataset Management

8 of 24

Deep Learning Datasets

  • Deep Learning requires a lot of data
    • YouTube-8M contains 7 million videos with 4,716 classes
    • MS Common Objects in Context (COCO) contains 330,000 images with 1.5M object instances
    • MNIST contains 70,000 images
    • ImageNet's goal is 80,000+ nouns × 1,000 images each (currently >14M images)
  • Data is difficult to collect
  • Robust models need samples with representative noise/corruptions to ensure learned features truly represent the classes
    • Data synthesis/augmentation is much more efficient than capturing the data ‘in the wild’
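
The synthesis/augmentation idea above can be sketched minimally: generate several corrupted copies of one clean sample instead of collecting more data in the wild. The noise model, severity values, and function name here are illustrative assumptions, not SRC's pipeline.

```python
import numpy as np

def augment_with_noise(clean, severities=(0.01, 0.05, 0.1), seed=0):
    """Return noisy copies of a clean sample, one per severity level.

    Additive Gaussian noise stands in for the many corruption types a
    real pipeline would apply (blur, dropout, channel effects, ...).
    """
    rng = np.random.default_rng(seed)
    return [clean + rng.normal(0.0, s, size=clean.shape) for s in severities]
```

Keeping the clean/augmented pairing explicit here is exactly what makes the dataset-management problem in Challenge 1 tractable later.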

9 of 24

Synthetic Data at SRC

We need lots of data!

[Diagram: data pipeline from the Simulator — a parametric summary (digest); noiseless and augmented Continuous Digital Intermediate Frequency (CDIF) .TMP files; unlabeled and labeled Pulse Descriptor Word (PDW) .XPDW files; file metadata (.JSON, e.g. {Author: J. Doe, Date: 1/6/21}); and simulation metadata (.CSV).]

10 of 24

Synthetic Data Augmentation

[Diagram: the same pipeline, with each noiseless CDIF .TMP file feeding multiple augmented versions (Augmented 1 … Augmented N), each of which yields its own unlabeled and labeled PDW .XPDW files, alongside the file metadata (.JSON) and simulation metadata (.CSV).]

11 of 24

Challenge 1: Dataset Management

  • Problem: Large volumes of ‘clean’ data and related augmented versions are difficult to organize and maintain. Training sets need to be constructed with balanced features and augmentations.
  • Task: Build an app to organize, query, and view a deep learning dataset with hierarchical relationships
    • Sub-Task 1: Develop a database that indexes a training/test/evaluation set of images to model relationships from files to other files and from files to their truth/metadata
    • Sub-Task 2: Create a UI for querying and browsing the data stored in the database, including functions for:
      • Querying the data based on relationships (e.g. “List all files derived from file X or Y”)
      • Visualizing files and the structure of relationships (e.g. tree/graph/list structures resulting from queries)
      • Displaying truth/metadata for an image (e.g. the classification of the image)
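
One possible shape for Sub-Task 1, sketched with SQLite. The table and column names are assumptions, not a prescribed schema; the recursive query shows how "list all files derived from file X" can be answered over a parent/child derivation table.

```python
import sqlite3

# Minimal sketch: files plus a derivation edge table (source -> augmentation).
SCHEMA = """
CREATE TABLE files (
    id    INTEGER PRIMARY KEY,
    path  TEXT UNIQUE NOT NULL,
    split TEXT NOT NULL,            -- train / test / validation
    class TEXT                      -- noun label (truth), may be NULL
);
CREATE TABLE derivations (
    parent_id    INTEGER NOT NULL REFERENCES files(id),
    child_id     INTEGER NOT NULL REFERENCES files(id),
    augmentation TEXT NOT NULL,     -- e.g. 'gaussian_noise'
    severity     INTEGER            -- 1-5
);
"""

# "List all files derived from file X" as a recursive walk down the tree.
LINEAGE = """
WITH RECURSIVE lineage(id) AS (
    SELECT id FROM files WHERE path = :root
    UNION
    SELECT d.child_id FROM derivations d JOIN lineage l ON d.parent_id = l.id
)
SELECT f.path FROM files f JOIN lineage USING (id) WHERE f.path != :root;
"""

con = sqlite3.connect(":memory:")
con.executescript(SCHEMA)
con.executemany(
    "INSERT INTO files (id, path, split, class) VALUES (?,?,?,?)",
    [(1, "original/train/goldfish/a.jpg", "train", "goldfish"),
     (2, "gaussian_noise/3/train/goldfish/a.jpg", "train", "goldfish")])
con.execute("INSERT INTO derivations VALUES (1, 2, 'gaussian_noise', 3)")
derived = [row[0] for row in
           con.execute(LINEAGE, {"root": "original/train/goldfish/a.jpg"})]
```

Because the recursion follows edges rather than naming conventions, the same query works whether augmentations are one level deep or chained.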


12 of 24

Dataset

  • The provided dataset is Tiny ImageNet (a small excerpt of ImageNet), along with augmentations of the images generated by an open-source script¹.
  • Original (undistorted) images are in the ‘original’ directory
    • Subdirectories for train, test, validation sets
    • Subdirectories of train set organize the images by their noun classification
  • At the top level, directories exist for each augmentation algorithm
    • Subdirectories 1-5 indicate the severity of the applied augmentation
    • Images organized into the same train/test/validation sets
  • Readme.txt describes the structure in more detail
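
The layout above can be turned into indexable metadata with a small path parser — a sketch in which the directory names (`original`, the split names) are assumptions taken from this description, not guaranteed by the Readme:

```python
from pathlib import PurePosixPath

def parse_dataset_path(path):
    """Map a relative dataset path to its metadata, following the
    described layout: either original/<split>/... or
    <augmentation>/<severity 1-5>/<split>/...
    """
    parts = PurePosixPath(path).parts
    if parts[0] == "original":
        aug, severity, rest = None, None, parts[1:]
    else:
        aug, severity, rest = parts[0], int(parts[1]), parts[2:]
    split = rest[0]
    # Only the train split nests images under a noun-class directory.
    cls = rest[1] if split == "train" and len(rest) > 2 else None
    return {"augmentation": aug, "severity": severity,
            "split": split, "class": cls, "file": parts[-1]}
```

Running this over a directory walk produces exactly the rows a relationship database needs.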

13 of 24

Objectives

  • Use Case 1: You are a machine learning practitioner who needs to select a diverse set of training and test images using this tool. You need to build a dataset that is balanced and robust for deep learning.
  • Use Case 2: You have evaluated a model on a dataset built with this tool. You have a list of results and want to understand more about where the model is succeeding and where it is failing.
  • Database clearly models the relationships between files (source<->augmentations), including metadata on the augmentations (type, severity)
  • Database is simple to query based on metadata features of a given file/relationship.
    • “Return all images in class X”
    • “Return all images based on Image X, Y, or Z”
    • “Return all images based on augmentation algorithm X”
    • “Return all augmentation algorithms that have been used”
  • UI is easy to navigate and provides clear information regarding the lineage of individual files (can you visualize the results of the queries above?)
  • UI provides a visual depiction of file contents (the images themselves can be previewed)

14 of 24

Resources

  • Database
  • UI
    • JavaScript/React are great options for relatively quick development
    • Python/Dash (https://dash.plotly.com/tutorial) is like React for Python

15 of 24

Challenge 2 – Interpulse Modulation Classification

16 of 24

Signal Data – I/Q


17 of 24

IQ Sample – Synthetic Radar Data

[Plot: synthetic radar I/Q data over time, annotating a single Pulse, its Pulse Width (PW), and the Pulse Repetition Interval (PRI).]

18 of 24

Signal Data – PDW

  • Pulse Descriptor Word (PDW) data summarizes the I/Q data in a condensed format
    • Each point in the file represents a single pulse rather than a single sample of the waveform – data size is much more manageable
    • Summarizes the PRI, PW, pulse frequency (PF), and pulse amplitude (PA) in a single time-based record
  • Relies on software (Parameterizer) to detect pulses in the I/Q data
    • Results are highly dependent on configuration of the parameterizer and quality of the I/Q data
    • Different parameterizers may have unique tendencies or introduce noise
  • Features of the signal within a single pulse (e.g. phase shifts) are lost since the pulse is summarized into one set of values
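
A minimal picture of what a PDW stream looks like in code, assuming illustrative field names (real PDW formats such as .XPDW carry more fields than this sketch):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PDW:
    """One pulse record summarizing a detected pulse."""
    toa: float  # time of arrival (s)
    pw: float   # pulse width (s)
    pf: float   # pulse frequency (Hz)
    pa: float   # pulse amplitude

def dtoa(pdws):
    """Differential time of arrival between consecutive pulses; for an
    unmodulated signal this recovers a constant PRI."""
    return np.diff([p.toa for p in pdws])
```

DTOA is one of the three domains Challenge 2 asks you to classify, which is why it is worth deriving it from TOA explicitly.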

19 of 24

PDW Sample – Synthetic Radar Data

20 of 24

Examples of Interpulse Modulation in PDW

  • Constant/Unmodulated

  • Stagger

  • Dwell and Switch
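
The three patterns above can be generated as toy PRI sequences — the generator names and parameter values are illustrative, not taken from the challenge data:

```python
import numpy as np

def constant_pri(n, pri):
    """Unmodulated: the same PRI for every pulse."""
    return np.full(n, pri)

def stagger_pri(n, levels):
    """Stagger: cycle through a short list of PRI values, one per pulse."""
    return np.array([levels[i % len(levels)] for i in range(n)])

def dwell_and_switch_pri(n, levels, dwell):
    """Dwell and switch: hold each PRI value for `dwell` pulses, then switch."""
    return np.array([levels[(i // dwell) % len(levels)] for i in range(n)])
```

Plotting these sequences against pulse index reproduces the qualitative shapes a classifier has to tell apart.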

21 of 24

Challenge 2: Interpulse Modulation Classification

  • Problem: The interpulse modulation is a description of how the values within a signal feature domain change from pulse to pulse. Automatically labeling this information may support either the estimation of an emitter’s objective or the identification of the emitter.
  • Task: Design a machine learning algorithm to classify the interpulse modulation in a sample of signal data
    • Given a recorded signal, reproduce the modulation classification label for each of three domains:
      • Differential Time of Arrival (DTOA)
      • Pulse Frequency (PF)
      • Pulse Width (PW)
    • Classification is given on a per-file basis, each domain separately
    • Any algorithm/architecture is acceptable
  • Objective: Maximize accuracy of produced labels


22 of 24

Dataset


23 of 24

Challenge 2: Interpulse Modulation Classification

  • Files contain different numbers of pulses
    • Possible solutions include padding/truncation, recurrent networks, stepping through fixed-size windows
  • Preprocessing may be necessary and may differ per domain
    • Scaling and standardization will depend on the typical ranges and units of those values

    • DTOA and Pulse Width are always positive, but Pulse Frequency may have negative values due to simulated receiver tuning
  • PA may be a useful piece of information (think attention-based networks), but it is not strictly necessary

Potential Considerations

Unmodulated or Stagger?
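
The fixed-size-window option noted above can be sketched as follows; the window length, padding scheme, and function name are illustrative choices, not part of the challenge specification:

```python
import numpy as np

def window_and_standardize(values, win=64):
    """Slice a variable-length pulse sequence into fixed-size windows and
    z-score each window independently, so files with different pulse
    counts and different units (DTOA vs. PF vs. PW) yield comparable
    model inputs. Tail windows are padded by repeating the last value.
    """
    v = np.asarray(values, dtype=float)
    n_win = -(-len(v) // win)                 # ceiling division
    padded = np.pad(v, (0, n_win * win - len(v)), mode="edge")
    windows = padded.reshape(n_win, win)
    mu = windows.mean(axis=1, keepdims=True)
    sd = windows.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0                         # constant windows -> all zeros
    return (windows - mu) / sd
```

Per-window standardization sidesteps the differing value ranges of the three domains, though it also erases absolute scale — a trade-off to weigh per domain.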

24 of 24

Thank you!