
X International Conference "Information Technology and Implementation" (IT&I-2023), November 20, 2023, Taras Shevchenko National University of Kyiv, Kyiv, Ukraine


Parallel and Distributed Machine Learning for Anomaly Detection Systems

Bohdan Koval, Iulia Khlevna


The purpose of this research:

  • explore how parallel and distributed machine learning techniques can be harnessed to make anomaly detection systems operate with greater speed and efficiency;
  • carry out benchmarking and performance evaluation to measure the efficiency gains and the reduction in processing time:
    • differentiate parallel and distributed machine learning;
    • formulate a parallel machine learning technique and analyze its performance;
    • elaborate on a distributed machine learning technique and how it can be applied.


Parallel and distributed computing

Parallel Computing:

  • Concurrency is achieved through simultaneous execution of numerous operations.
  • A solitary computing unit is sufficient to execute the tasks.
  • Concurrent operations are performed by multiple processors within a single system.
  • May encompass shared or distributed memory resources.
  • Inter-processor communication typically occurs via a shared memory bus.
  • Enhances the overall performance of a system.

Distributed Computing:

  • System components are geographically distributed across distinct locations.
  • Utilizes a network of multiple distinct computing units.
  • Concurrent operations are distributed across multiple discrete computing systems.
  • Solely employs distributed memory resources.
  • Communication between computing units relies on message passing protocols.
  • Enhances system scalability, fault tolerance, and resource sharing capabilities.


Parallel and distributed computing

  • Parallelism: Utilize the parallel processing or multi-threading capabilities available in your programming environment. Libraries like scikit-learn have options for parallel processing; this can be used to distribute the computation across multiple CPU cores and drastically reduce processing time (see the sketch below).
  • Distributed Computing: If your dataset is extremely large and cannot fit in memory, consider using distributed computing frameworks like Apache Spark. These frameworks can distribute the data and computation across a cluster of machines.
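
As an illustration of the first point, here is a minimal sketch of core-level parallelism with scikit-learn. The IsolationForest model, its parameters, and the synthetic dataset are illustrative assumptions, not the exact setup used in this study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for anomaly detection features (not the study's data)
X, _ = make_classification(n_samples=100_000, n_features=20, random_state=42)

# n_jobs=-1 lets scikit-learn fit the ensemble's trees on all available CPU cores
model = IsolationForest(n_estimators=200, n_jobs=-1, random_state=42)
model.fit(X)

labels = model.predict(X)  # 1 = normal, -1 = anomaly
```

The same n_jobs idea applies to many scikit-learn estimators; when the data no longer fits in memory, the equivalent role is played by a cluster framework such as Apache Spark.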


Two primary types of parallelism:

  • model parallelism, which focuses on dividing a large model into smaller, manageable components that can be trained concurrently;
  • data parallelism, a technique where subsets of the dataset are distributed to multiple processing units for simultaneous training (see the sketch after this list).
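
A minimal sketch of the data-parallel pattern, assuming Python's multiprocessing.Pool and scikit-learn; the shard count, the model, and the score-averaging step are illustrative assumptions rather than the study's exact setup.

```python
import numpy as np
from multiprocessing import Pool
from sklearn.ensemble import IsolationForest

def fit_shard(shard: np.ndarray) -> IsolationForest:
    """Train an independent model on one subset (shard) of the data."""
    return IsolationForest(n_estimators=100, random_state=0).fit(shard)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40_000, 10))      # stand-in dataset
    shards = np.array_split(X, 4)          # one subset per processing unit

    # Data parallelism: each worker trains on its own shard simultaneously.
    with Pool(processes=4) as pool:
        models = pool.map(fit_shard, shards)

    # One simple way to combine the shard models: average their anomaly scores.
    scores = np.mean([m.score_samples(X) for m in models], axis=0)
```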



Amdahl's law

S_latency = 1 / ((1 - p) + p / s),

where S_latency is the theoretical speedup in the time it takes to complete the entire task, s is the speedup of the part of the task that benefits from parallelization, and p is the proportion of the total task time that the parallelizable part occupies before parallelization.

Since S_latency ≤ 1 / (1 - p), even a small portion of the program that cannot be parallelized restricts the maximum speedup achievable through parallelization.
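
A small numeric illustration of the formula; the values of p and s below are assumptions chosen for illustration, roughly echoing the ~2x speedup from 4x resources reported later in this work.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup S_latency when the parallelizable fraction p
    of the task is accelerated by a factor s."""
    return 1.0 / ((1.0 - p) + p / s)

# If about two thirds of the runtime is parallelizable, 4 workers give 2x overall:
print(amdahl_speedup(p=2 / 3, s=4))     # 2.0
# Even unlimited workers cannot beat the serial part: the limit is 1 / (1 - p) = 3.
print(amdahl_speedup(p=2 / 3, s=1e9))   # ~3.0
```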


Data processing performance comparison

Processing approach                    | Result
Regular (sequential) processing        | 19.341 seconds
Pool parallel processing               |  9.120 seconds
Pool parallel processing (threads)     | 19.125 seconds
Joblib parallel processing             | 16.912 seconds
Joblib parallel processing (threads)   | 19.213 seconds
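
The exact workload behind these figures is not reproduced here, but the comparison can be set up along the following lines; the toy heavy() task, the item count, and the worker count are assumptions. Because heavy() is CPU-bound pure Python, the GIL keeps the thread-based variants close to the sequential time, which is consistent with the pattern in the table.

```python
import math
import time
from multiprocessing import Pool
from joblib import Parallel, delayed

def heavy(n: int) -> float:
    """CPU-bound stand-in for one unit of data processing."""
    return sum(math.sqrt(i) for i in range(n))

ITEMS = [200_000] * 32

if __name__ == "__main__":
    t0 = time.perf_counter()
    _ = [heavy(n) for n in ITEMS]                      # regular (sequential)
    print("sequential:", time.perf_counter() - t0)

    t0 = time.perf_counter()
    with Pool(processes=4) as pool:                    # Pool, separate processes
        _ = pool.map(heavy, ITEMS)
    print("Pool processes:", time.perf_counter() - t0)

    t0 = time.perf_counter()
    _ = Parallel(n_jobs=4, prefer="threads")(          # joblib, threads
        delayed(heavy)(n) for n in ITEMS
    )
    print("joblib threads:", time.perf_counter() - t0)
```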


Parallel feature engineering comparison

Feature engineering processing approach              | Result
Grouping features (sequential)                       | 1.721 seconds
Grouping features shifted (sequential)               | 1.981 seconds
Pool grouping features parallel                      | 2.820 seconds
Pool grouping features parallel (threads)            | 1.101 seconds
Pool grouping features shifted parallel              | 2.521 seconds
Pool grouping features shifted parallel (threads)    | 1.226 seconds
Joblib grouping features parallel (threads)          | 1.009 seconds
Joblib grouping features shifted parallel (threads)  | 1.182 seconds
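
Here the thread-based variants win, which is consistent with grouping work that runs largely inside compiled pandas/NumPy routines and avoids copying the DataFrame to worker processes. A hedged sketch of thread-parallel grouping features follows; the DataFrame and column names are illustrative, not the study's real features.

```python
import numpy as np
import pandas as pd
from joblib import Parallel, delayed

# Illustrative event log standing in for the anomaly detection dataset
n = 500_000
df = pd.DataFrame({
    "user_id": np.random.randint(0, 1_000, size=n),
    "amount": np.random.random(size=n),
    "duration": np.random.random(size=n),
})

def grouping_features(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Per-user aggregates of one column (a 'grouping features' step)."""
    return frame.groupby("user_id")[column].agg(["mean", "std", "max"])

# Threads share the DataFrame in memory, so no inter-process copying is needed.
features = Parallel(n_jobs=4, prefer="threads")(
    delayed(grouping_features)(df, col) for col in ["amount", "duration"]
)
```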


Distributed machine learning

Distributed training becomes a necessity under the following circumstances:

  1. Time-Intensive Training
  2. Storage Constraints
  3. Data Localization
  4. RAM Constraints


Distributed machine learning

Distributed machine learning training typically involves the following steps (a minimal sketch follows the list):

  1. Model Initialization
  2. Model Distribution
  3. Gradient Calculation
  4. Gradient Communication
  5. Model Update
  6. Iterative Process
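
A minimal, single-process simulation of these six steps for a simple linear model; it is purely illustrative, since a real system would place each shard on a separate machine and use a framework such as Apache Spark or a parameter server for the communication step.

```python
import numpy as np

def local_gradient(w: np.ndarray, X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Mean-squared-error gradient computed on one worker's data shard."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(8_000, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=8_000)

w = np.zeros(5)                                                   # 1. model initialization
shards = list(zip(np.array_split(X, 4), np.array_split(y, 4)))    # 2. model/data distribution to 4 workers

for step in range(100):                                           # 6. iterative process
    grads = [local_gradient(w, Xs, ys) for Xs, ys in shards]      # 3. gradient calculation on each worker
    avg_grad = np.mean(grads, axis=0)                             # 4. gradient communication (averaging)
    w -= 0.01 * avg_grad                                          # 5. model update, shared with all workers
```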


Conclusion

  • Parallel processing can improve training speed in both cases, but the size of the gain depends on the input size: the larger the input, the greater the improvement. Broadly speaking, we observe roughly a 2x speedup when allocating 4x the resources.
  • The benefits of distributed training are particularly evident in scenarios involving extensive time-consuming training processes, storage constraints, data localization requirements, and RAM limitations.