1 of 24

Decoding the Battlefield: Machine Learning Challenges in Electronic Warfare


March 1, 2024

© 2023 SRC, Inc. All rights reserved | v20231101

This document contains information proprietary to SRC that shall not be disclosed outside the organization receiving the document,

and shall not be duplicated, used, or disclosed in whole or in part for any purpose other than to evaluate its content.

PR 20-####


2 of 24

Introduction

3 of 24

SRC Overview

  • Independent, not-for-profit research and development corporation
  • Headquartered in Syracuse, NY, with several regional locations (1,500 employees)
  • Customers include
    • U.S. Dept of Defense, intelligence agencies
    • Other U.S. Agencies (CDC, EPA)
    • International Partners

  • Mission: To deliver innovative, advanced defense solutions and products that help keep America and our allies safe and strong through R&D, services, and manufacturing - investing all our earnings into technology, employees, and communities.

4 of 24

What we do at SRC

5 of 24

Machine Learning in Electronic Warfare

  • Electronic Warfare involves using the electromagnetic spectrum to detect actions or communications, attack enemy capabilities within the spectrum, and protect friendly assets from interruption.
  • Machine learning is being used to automate signals analysis functions across the Electronic Warfare workflow
    • Signal Feature Extraction/Emitter Characterization
    • Emitter/Platform Identification
    • Electronic Counter-Measures/Counter-Counter-Measures
    • Intelligence Production/Modeling/Reporting

6 of 24

Data is Everything

  • You can’t do Machine Learning without understanding the data
  • “Good” data needs to represent the problem you are trying to solve

7 of 24

Challenge 1 – Machine Learning Dataset Management

8 of 24

Deep Learning Datasets

  • Deep Learning requires a lot of data
    • YouTube-8M contains 7 million videos with 4,716 classes
    • MS Common Objects in Context (COCO) contains 330,000 images with 1.5M object instances
    • MNIST contains 70,000 images
    • ImageNet's goal is 80,000+ nouns × 1,000 images each (currently >14M images)
  • Data is difficult to collect
  • Robust models need samples with representative noise/corruptions to ensure learned features truly represent the classes
    • Data synthesis/augmentation is much more efficient than capturing the data ‘in the wild’
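
The synthesis/augmentation idea above can be sketched minimally: generate several corrupted copies of one clean sample instead of collecting more data in the wild. The noise model, severity values, and function name here are illustrative assumptions, not SRC's pipeline.

```python
import numpy as np

def augment_with_noise(clean, severities=(0.01, 0.05, 0.1), seed=0):
    """Return noisy copies of a clean sample, one per severity level.

    Additive Gaussian noise stands in for the many corruption types a
    real pipeline would apply (blur, dropout, channel effects, ...).
    """
    rng = np.random.default_rng(seed)
    return [clean + rng.normal(0.0, s, size=clean.shape) for s in severities]
```

Keeping the clean/augmented pairing explicit here is exactly what makes the dataset-management problem in Challenge 1 tractable later.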

9 of 24

Synthetic Data at SRC

We need lots of data!

[Diagram: data pipeline from the Simulator — a parametric summary (digest); noiseless and augmented Continuous Digital Intermediate Frequency (CDIF) .TMP files; unlabeled and labeled Pulse Descriptor Word (PDW) .XPDW files; file metadata (.JSON, e.g. {Author: J. Doe, Date: 1/6/21}); and simulation metadata (.CSV).]

10 of 24

Synthetic Data Augmentation

[Diagram: the same pipeline, with each noiseless CDIF .TMP file feeding multiple augmented versions (Augmented 1 … Augmented N), each of which yields its own unlabeled and labeled PDW .XPDW files, alongside the file metadata (.JSON) and simulation metadata (.CSV).]

11 of 24

Challenge 1: Dataset Management

  • Problem: Large volumes of ‘clean’ data and related augmented versions are difficult to organize and maintain. Training sets need to be constructed with balanced features and augmentations.
  • Task: Build an app to organize, query, and view a deep learning dataset with hierarchical relationships
    • Sub-Task 1: Develop a database that indexes a training/test/evaluation set of images to model relationships from files to other files and from files to their truth/metadata
    • Sub-Task 2: Create a UI for querying and browsing the data stored in the database, including functions for:
      • Querying the data based on relationships (e.g. “List all files derived from file X or Y”)
      • Visualizing files and the structure of relationships (e.g. tree/graph/list structures resulting from queries)
      • Displaying truth/metadata for an image (e.g. the classification of the image)
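
One possible shape for Sub-Task 1, sketched with SQLite. The table and column names are assumptions, not a prescribed schema; the recursive query shows how "list all files derived from file X" can be answered over a parent/child derivation table.

```python
import sqlite3

# Minimal sketch: files plus a derivation edge table (source -> augmentation).
SCHEMA = """
CREATE TABLE files (
    id    INTEGER PRIMARY KEY,
    path  TEXT UNIQUE NOT NULL,
    split TEXT NOT NULL,            -- train / test / validation
    class TEXT                      -- noun label (truth), may be NULL
);
CREATE TABLE derivations (
    parent_id    INTEGER NOT NULL REFERENCES files(id),
    child_id     INTEGER NOT NULL REFERENCES files(id),
    augmentation TEXT NOT NULL,     -- e.g. 'gaussian_noise'
    severity     INTEGER            -- 1-5
);
"""

# "List all files derived from file X" as a recursive walk down the tree.
LINEAGE = """
WITH RECURSIVE lineage(id) AS (
    SELECT id FROM files WHERE path = :root
    UNION
    SELECT d.child_id FROM derivations d JOIN lineage l ON d.parent_id = l.id
)
SELECT f.path FROM files f JOIN lineage USING (id) WHERE f.path != :root;
"""

con = sqlite3.connect(":memory:")
con.executescript(SCHEMA)
con.executemany(
    "INSERT INTO files (id, path, split, class) VALUES (?,?,?,?)",
    [(1, "original/train/goldfish/a.jpg", "train", "goldfish"),
     (2, "gaussian_noise/3/train/goldfish/a.jpg", "train", "goldfish")])
con.execute("INSERT INTO derivations VALUES (1, 2, 'gaussian_noise', 3)")
derived = [row[0] for row in
           con.execute(LINEAGE, {"root": "original/train/goldfish/a.jpg"})]
```

Because the recursion follows edges rather than naming conventions, the same query works whether augmentations are one level deep or chained.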


12 of 24

Dataset

  • The provided dataset is Tiny ImageNet (a small excerpt of ImageNet), along with augmentations of the images generated by an open-source script¹.
  • Original (undistorted) images are in the ‘original’ directory
    • Subdirectories for train, test, validation sets
    • Subdirectories of train set organize the images by their noun classification
  • At the top level, directories exist for each augmentation algorithm
    • Subdirectories 1-5 indicate the severity of the applied augmentation
    • Images organized into the same train/test/validation sets
  • Readme.txt describes the structure in more detail
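
The layout above can be turned into indexable metadata with a small path parser — a sketch in which the directory names (`original`, the split names) are assumptions taken from this description, not guaranteed by the Readme:

```python
from pathlib import PurePosixPath

def parse_dataset_path(path):
    """Map a relative dataset path to its metadata, following the
    described layout: either original/<split>/... or
    <augmentation>/<severity 1-5>/<split>/...
    """
    parts = PurePosixPath(path).parts
    if parts[0] == "original":
        aug, severity, rest = None, None, parts[1:]
    else:
        aug, severity, rest = parts[0], int(parts[1]), parts[2:]
    split = rest[0]
    # Only the train split nests images under a noun-class directory.
    cls = rest[1] if split == "train" and len(rest) > 2 else None
    return {"augmentation": aug, "severity": severity,
            "split": split, "class": cls, "file": parts[-1]}
```

Running this over a directory walk produces exactly the rows a relationship database needs.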

13 of 24

Objectives

  • Use Case 1: You are a machine learning practitioner who needs to select a diverse set of training and test images using this tool. You need to build a dataset that is balanced and robust for deep learning.
  • Use Case 2: You have evaluated a model on a dataset built with this tool. You have a list of results and want to understand more about where the model is succeeding and where it is failing.
  • Database clearly models the relationships between files (source<->augmentations), including metadata on the augmentations (type, severity)
  • Database is simple to query based on metadata features of a given file/relationship.
    • “Return all images in class X”
    • “Return all images based on Image X, Y, or Z”
    • “Return all images based on augmentation algorithm X”
    • “Return all augmentation algorithms that have been used”
  • UI is easy to navigate and provides clear information regarding the lineage of individual files (can you visualize the results of the queries above?)
  • UI provides a visual depiction of file contents (the images themselves can be previewed)

14 of 24

Resources

  • Database
  • UI
    • JavaScript/React are great options for relatively quick development
    • Python/Dash (https://dash.plotly.com/tutorial) is like React for Python

15 of 24

Challenge 2 – Interpulse Modulation Classification

16 of 24

Signal Data – I/Q


17 of 24

IQ Sample – Synthetic Radar Data

[Plot: synthetic radar I/Q data over time, annotating a single Pulse, its Pulse Width (PW), and the Pulse Repetition Interval (PRI).]

18 of 24

Signal Data – PDW

  • Pulse Descriptor Word (PDW) data summarizes the I/Q data in a condensed format
    • Each point in the file represents a single pulse rather than a single sample of the waveform – data size is much more manageable
    • Summarizes the PRI, PW, pulse frequency (PF), and pulse amplitude (PA) in a single time-based record
  • Relies on software (Parameterizer) to detect pulses in the I/Q data
    • Results are highly dependent on configuration of the parameterizer and quality of the I/Q data
    • Different parameterizers may have unique tendencies or introduce noise
  • Features of the signal within a single pulse (e.g. phase shifts) are lost since the pulse is summarized into one set of values
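
A minimal picture of what a PDW stream looks like in code, assuming illustrative field names (real PDW formats such as .XPDW carry more fields than this sketch):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PDW:
    """One pulse record summarizing a detected pulse."""
    toa: float  # time of arrival (s)
    pw: float   # pulse width (s)
    pf: float   # pulse frequency (Hz)
    pa: float   # pulse amplitude

def dtoa(pdws):
    """Differential time of arrival between consecutive pulses; for an
    unmodulated signal this recovers a constant PRI."""
    return np.diff([p.toa for p in pdws])
```

DTOA is one of the three domains Challenge 2 asks you to classify, which is why it is worth deriving it from TOA explicitly.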

19 of 24

PDW Sample – Synthetic Radar Data

20 of 24

Examples of Interpulse Modulation in PDW

  • Constant/Unmodulated

  • Stagger

  • Dwell and Switch
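
The three patterns above can be generated as toy PRI sequences — the generator names and parameter values are illustrative, not taken from the challenge data:

```python
import numpy as np

def constant_pri(n, pri):
    """Unmodulated: the same PRI for every pulse."""
    return np.full(n, pri)

def stagger_pri(n, levels):
    """Stagger: cycle through a short list of PRI values, one per pulse."""
    return np.array([levels[i % len(levels)] for i in range(n)])

def dwell_and_switch_pri(n, levels, dwell):
    """Dwell and switch: hold each PRI value for `dwell` pulses, then switch."""
    return np.array([levels[(i // dwell) % len(levels)] for i in range(n)])
```

Plotting these sequences against pulse index reproduces the qualitative shapes a classifier has to tell apart.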

21 of 24

Challenge 2: Interpulse Modulation Classification

  • Problem: The interpulse modulation is a description of how the values within a signal feature domain change from pulse to pulse. Automatically labeling this information may support either the estimation of an emitter’s objective or the identification of the emitter.
  • Task: Design a machine learning algorithm to classify the interpulse modulation in a sample of signal data
    • Given a recorded signal, reproduce the modulation classification label for each of three domains:
      • Differential Time of Arrival (DTOA)
      • Pulse Frequency (PF)
      • Pulse Width (PW)
    • Classification is given on a per-file basis, each domain separately
    • Any algorithm/architecture is acceptable
  • Objective: Maximize accuracy of produced labels


22 of 24

Dataset


23 of 24

Challenge 2: Interpulse Modulation Classification

  • Files contain different numbers of pulses
    • Possible solutions include padding/truncation, recurrent networks, stepping through fixed-size windows
  • Preprocessing may be necessary and may differ per domain
    • Scaling and standardization will depend on the typical ranges and units of those values

    • DTOA and Pulse Width are always positive, but Pulse Frequency may have negative values due to simulated receiver tuning
  • PA may be a useful piece of information (think attention-based networks), but it is not strictly necessary

Potential Considerations

Unmodulated or Stagger?
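
The fixed-size-window option noted above can be sketched as follows; the window length, padding scheme, and function name are illustrative choices, not part of the challenge specification:

```python
import numpy as np

def window_and_standardize(values, win=64):
    """Slice a variable-length pulse sequence into fixed-size windows and
    z-score each window independently, so files with different pulse
    counts and different units (DTOA vs. PF vs. PW) yield comparable
    model inputs. Tail windows are padded by repeating the last value.
    """
    v = np.asarray(values, dtype=float)
    n_win = -(-len(v) // win)                 # ceiling division
    padded = np.pad(v, (0, n_win * win - len(v)), mode="edge")
    windows = padded.reshape(n_win, win)
    mu = windows.mean(axis=1, keepdims=True)
    sd = windows.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0                         # constant windows -> all zeros
    return (windows - mu) / sd
```

Per-window standardization sidesteps the differing value ranges of the three domains, though it also erases absolute scale — a trade-off to weigh per domain.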

24 of 24

Thank you!