1 of 26

Optimizing Continuous Development By Detecting and Preventing Unnecessary Content Generation

Talank Baral, Shanto Rahman, Bala Naren Chanumolu, Basak Balci, Tuna Tuncer, August Shi, Wing Lam

CCF-2145774

2 of 26

Developer Anecdote

pom.xml: Defines project dependencies and configuration for Maven builds

[Figure: the developer's laptop and the servers on a timeline marked 4:15 PM and 4:17 PM. The developer changes static int add() (diff line: - ts.addRow("");, followed by return ts.size();). The servers build the code and run tests using pom.xml, which contains a <plugin> entry for jacoco-maven-plugin.]

3 of 26

Developer Anecdote

[Figure: the timeline now covers three rounds of changes on the developer's laptop, each followed by a build on the servers. Time markers: 4:15 PM, 4:17 PM, 4:25 PM, 4:27 PM, 4:35 PM, 6:10 PM, 6:15 PM. Two builds are labeled "Build code and run tests using pom.xml" and one is labeled "Build code and run tests using pom.xml and YAML file"; each pom.xml contains a <plugin> entry for jacoco-maven-plugin.]

Changes to add() shown across the three rounds:

  …
    static int add() {
  - ts.addRow("");
    return ts.size();

  …
    static int add() {
  - ts.addRow("");
  + ts.addRow(r);
    return ts.size();

  …
  - static int add() {
  + static int add(r) {
  - ts.addRow("");
  + ts.addRow(r);
    return ts.size();

4 of 26

Developer Anecdote

[Figure: the same three-round timeline and code changes as on the previous slide, now highlighting the YAML file used by the server build.]

YAML file: Defines what CD build should do on server

5 of 26

Developer Anecdote

[Figure: the same three-round timeline and code changes as on the previous slide, with the outcome added.]

Merged time: 6:30 PM

Total time: 2h 15min

6 of 26

Developer Anecdote

[Figure: the complete timeline from the previous slides, annotated with the duration of each coding and building interval.]

Coding: 2 min. (4:15 PM to 4:17 PM) + 2 min. (4:25 PM to 4:27 PM) + ... + 5 min. (6:10 PM to 6:15 PM)

Building: 8 min. (4:17 PM to 4:25 PM) + 8 min. (4:27 PM to 4:35 PM) + ... + 15 min. (6:15 PM to 6:30 PM)

Merged time: 6:30 PM

Total time: 2h 15min

CD builds can be expensive to run, especially if they run unnecessary tasks to generate unnecessary files.

7 of 26

What is unnecessary?

[Figure: a Local Build on the developer's laptop and a CD Build on a temporary cloud machine both run $ mvn -B package --file pom.xml, with a pom.xml that contains a <plugin> entry for jacoco-maven-plugin. The outputs shown are a Coverage Report for each build and a Summary. The figure asks whether this output is unnecessary.]

Anything that is destroyed without being accessed is unnecessary

8 of 26

Configuration of CD

  • YAML files are widely used to configure CD, such as GitHub Actions

# Workflow name and events to trigger the build
name: Java CI with Maven

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

jobs:
  # Specify OS and java versions to build a job named "build"
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        java: [8, 11]
    name: java ${{ matrix.java }} building ...

    steps:
    # Setup steps
    - uses: actions/checkout@v3
    - name: Set up Java ${{ matrix.java }}
      uses: actions/setup-java@v3
      with:
        java-version: ${{ matrix.java }}
        distribution: 'temurin'
        cache: maven
    # Step to run mvn tests
    - name: Build with Maven
      run: mvn -B package --file pom.xml

[1] https://github.com/JSQLParser/JSqlParser/blob/b7e5c151df37f5eb5c0e46f7321e19daeb7b9863/.github/workflows/maven.yml

We can avoid creation of unnecessary:

  • Surefire report
  • Jacoco report
  • PMD code analysis

by adding these arguments to the Maven command:

-DdisableXmlReport=true -Djacoco.skip=true -Dpmd.skip=true

This saves over 14% of the step runtime!

These changes affect only the CD build and do not affect developers' local build!

9 of 26

Our Research

Goal: Optimize Continuous Development (CD) builds by not creating or modifying unnecessary files, to speed up runtime. To help with this goal, we

  1. propose dynamically tracking file reads/writes during CD builds to identify the generation of unnecessary content
  2. propose OptCD, which can
    1. Identify unnecessary files during CD builds
    2. Detect Maven plugins generating unnecessary files and directories
    3. Fix Maven commands to prevent generating unnecessary directories
  3. evaluate OptCD on 22 Maven projects and send 26 pull requests to developers to speed up build runtime: 12 are accepted, nine are pending, and five are rejected

10 of 26

OptCD

[Figure: OptCD shown together with the developer's laptop and the servers. A pom.xml (containing a <plugin> entry for jacoco-maven-plugin) and a YAML file each appear twice in the diagram.]

14 of 26

OptCD Overview

[Figure: overview of the OptCD pipeline. Artifact labels shown: Log, Unused Files, Unused Directories.]

16 of 26

OptCD Logger

[Figure: OptCD pipeline. Artifact labels: GitHub Actions YAML, Log, Unused Files, Unused Directories, Plugins from pom.xml, Modified GitHub Actions YAML, Modified pom.xml.]

Runs instrumented YAML once, which:

  • Uses inotifywait, a Linux-based tool, to record operations such as file creation, modification, access, and deletion
  • Continues recording until the last step and uploads the generated log for later use
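The recording step described for the Logger could be sketched as follows. This is a minimal illustration, not OptCD's actual instrumentation: the helper names (`inotifywait_command`, `parse_line`), the `|`-separated output format, and the timestamp format are assumptions chosen for the example. Only the inotifywait options themselves (`--monitor`, `--recursive`, `--event`, `--timefmt`, `--format`, `--outfile`) are real.

```python
from typing import FrozenSet, List, NamedTuple

# The four operations named on the slide: creation, modification, access, deletion.
EVENTS = ["create", "modify", "access", "delete"]


def inotifywait_command(watch_dir: str, log_path: str) -> List[str]:
    """Build (but do not run) an inotifywait invocation that watches watch_dir."""
    cmd = [
        "inotifywait", "--monitor", "--recursive",
        "--timefmt", "%Y-%m-%dT%H:%M:%S",   # assumed timestamp format
        "--format", "%T|%w%f|%e",           # assumed layout: time|path|events
        "--outfile", log_path,
    ]
    for event in EVENTS:
        cmd += ["--event", event]
    cmd.append(watch_dir)
    return cmd


class FileEvent(NamedTuple):
    timestamp: str
    path: str
    kinds: FrozenSet[str]


def parse_line(line: str) -> FileEvent:
    """Parse one line written in the assumed time|path|events layout."""
    timestamp, rest = line.rstrip("\n").split("|", 1)
    path, kinds = rest.rsplit("|", 1)
    # inotifywait joins multiple event names with commas, e.g. CREATE,ISDIR
    return FileEvent(timestamp, path, frozenset(kinds.split(",")))
```

A workflow step could start this command in the background before the build and upload the file at `log_path` after the last step; how OptCD wires this into the YAML is not shown here.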

17 of 26

OptCD Classifier

[Figure: OptCD pipeline. Artifact labels: GitHub Actions YAML, Log, Unused Files, Unused Directories, Plugins from pom.xml, Modified GitHub Actions YAML, Modified pom.xml.]

Unnecessary files are the files generated and written to during the build, but never read from later

[Table: combinations of Created, Modified, and Accessed events, each labeled as a necessary file or an unnecessary file. The individual cell values are not recoverable here.]

Examples:

  • echo "abc" >> file.txt: Created, Modified but not Accessed
  • cat file.txt: Not Modified but Accessed

With these rules, the Classifier identifies unnecessary files from the logs generated by the Logger
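The rule above can be illustrated with a short sketch. It assumes the log has already been reduced to chronological (path, event) pairs using inotify's event names CREATE, MODIFY, and ACCESS; the function name and the exact treatment of ordering are assumptions for illustration, not OptCD's implementation.

```python
from typing import Iterable, Set, Tuple

WRITE_EVENTS = {"CREATE", "MODIFY"}


def unnecessary_files(events: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return files that were written during the build but never read afterwards.

    events: (path, event_name) pairs in chronological order.
    """
    written: Set[str] = set()
    read_after_write: Set[str] = set()
    for path, kind in events:
        if kind in WRITE_EVENTS:
            written.add(path)
        elif kind == "ACCESS" and path in written:
            # Assumption: any read after the first write makes the file necessary.
            read_after_write.add(path)
    return written - read_after_write
```

With the slide's two examples, a file that is only created and modified (`echo "abc" >> file.txt`) is returned as unnecessary, while a file that is only accessed (`cat file.txt`) is not.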

18 of 26

OptCD Clusterer

[Figure: OptCD pipeline. Artifact labels: GitHub Actions YAML, Log, Unused Files, Unused Directories, Plugins from pom.xml, Modified GitHub Actions YAML, Modified pom.xml.]

Clusters unnecessary files into unnecessary directories

Plugins modify multiple files. Trying to disable an entire directory is faster than attempting to disable each file individually.

target
├── classes
│   └── META-INF
│       └── MANIFEST.MF
├── site
│   ├── CCJSqlParser.java.html
│   └── CCJSqlParserTokenManager.html
├── jacoco
│   ├── jacoco.xml
│   └── net.sf.jsqlparser.parser
├── javadoc
│   └── ClassName.html
└── checkstyle-result.xml

[Legend: markers distinguish unnecessary files from necessary files, and unnecessary directories from necessary directories. The per-file markers are not recoverable here.]

../target/jacoco is unnecessary

../target/site is necessary

../target/javadoc is necessary

../target is necessary

The majority of unused files are solely in unnecessary directories
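One way to read the clustering step is: a directory is unnecessary when every file beneath it is unnecessary, and only the topmost such directories are reported. The sketch below implements that reading with relative POSIX paths. The function name and the rule itself are assumptions for illustration, and the file under `net.sf.jsqlparser.parser` in the usage example is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, Set


def unnecessary_directories(all_files: Iterable[str], unnecessary: Iterable[str]) -> Set[str]:
    """Return the topmost directories that contain only unnecessary files."""
    unnecessary = set(unnecessary)
    candidates: Set[str] = set()     # directories holding at least one unnecessary file
    disqualified: Set[str] = set()   # directories holding at least one necessary file
    for f in all_files:
        bucket = candidates if f in unnecessary else disqualified
        for parent in PurePosixPath(f).parents:
            if str(parent) != ".":
                bucket.add(str(parent))
    clean = candidates - disqualified
    # Drop directories nested inside another clean directory.
    return {
        d for d in clean
        if not any(str(p) in clean for p in PurePosixPath(d).parents)
    }


files = [
    "target/classes/META-INF/MANIFEST.MF",
    "target/site/CCJSqlParser.java.html",
    "target/jacoco/jacoco.xml",
    "target/jacoco/net.sf.jsqlparser.parser/index.html",  # hypothetical file
    "target/javadoc/ClassName.html",
]
unused = ["target/jacoco/jacoco.xml", "target/jacoco/net.sf.jsqlparser.parser/index.html"]
result = unnecessary_directories(files, unused)
```

Here `result` contains only `target/jacoco`, matching the slide's statement that ../target/jacoco is unnecessary while ../target, ../target/site, and ../target/javadoc are necessary.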

19 of 26

OptCD Mapper

[Figure: OptCD pipeline. Artifact labels: GitHub Actions YAML, Log, Unused Files, Unused Directories, Plugins from pom.xml, Modified GitHub Actions YAML, Modified pom.xml.]

The Mapper searches for the plugin responsible for creating an unnecessary directory

Three methods:

  1. Information Retrieval
  2. ChatGPT
  3. Log Search

20 of 26

OptCD Mapper

[Figure: OptCD pipeline. Artifact labels: GitHub Actions YAML, Log, Unused Files, Unused Directories, Plugins from pom.xml, Modified GitHub Actions YAML, Modified pom.xml.]

Information Retrieval method:

  • Uses TF-IDF, which matches the name of an unnecessary directory to the likely plugin name that modifies it

Plugins from pom.xml:

maven-resources-plugin:3.3.0
maven-surefire-plugin:3.0.0-M7
maven-compiler-plugin:3.10.1
maven-bundle-plugin:5.1.8
build-helper-maven-plugin:3.2.0
maven-checkstyle-plugin:3.1.0
maven-pmd-plugin:3.19.0
jacoco-maven-plugin:0.8.8
maven-site-plugin:3.12.1
maven-source-plugin:3.2.1
javacc-maven-plugin:3.0.3
license-maven-plugin:2.0.0
maven-javadoc-plugin:3.4.1

[Figure annotation: two plugins in the list are marked "Ranked #1" and "Ranked #2"; which two is not recoverable here.]

../target/jacoco is unnecessary

../target/site is necessary

../target/javadoc is necessary

../target is necessary
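A minimal sketch of TF-IDF matching between a directory name and plugin names follows, using only the standard library. The tokenization (splitting on non-alphanumeric characters and ignoring the version), the smoothed idf formula, and the use of cosine similarity are assumptions made for this example; the slides do not specify how OptCD computes its scores.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokens(text: str) -> List[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def rank_plugins(directory: str, plugins: List[str]) -> List[Tuple[str, float]]:
    """Rank plugins by TF-IDF cosine similarity between directory and plugin name."""
    # One "document" per plugin, built from its name without the version.
    docs = {p: Counter(tokens(p.split(":")[0])) for p in plugins}
    n_docs = len(docs)
    doc_freq = Counter(t for counts in docs.values() for t in counts)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}

    def weigh(counts: Counter) -> Dict[str, float]:
        # Terms unknown to the plugin vocabulary (e.g. "target") are dropped.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
        return dot / norm if norm else 0.0

    query = weigh(Counter(tokens(directory)))
    scored = [(p, cosine(query, weigh(c))) for p, c in docs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


plugins = [
    "maven-pmd-plugin:3.19.0",
    "jacoco-maven-plugin:0.8.8",
    "maven-site-plugin:3.12.1",
    "maven-javadoc-plugin:3.4.1",
]
ranked = rank_plugins("../target/jacoco", plugins)
```

For the directory ../target/jacoco, the only plugin sharing a token with the directory name is jacoco-maven-plugin, so it is ranked first. Name matching of this kind cannot help when a directory name shares no token with any plugin, which may be one reason the Mapper has two other methods.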

21 of 26

OptCD Mapper

[Figure: OptCD pipeline. Artifact labels: GitHub Actions YAML, Log, Unused Files, Unused Directories, Plugins from pom.xml, Modified GitHub Actions YAML, Modified pom.xml.]

ChatGPT method:

  • Constructs a prompt asking which plugin is responsible for generating an unused directory

Inputs:

  • List of plugins
  • Maven command
  • Unused directory

Request: ask to rank plugins

Output: Ranked list of plugins
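Prompt construction from the three inputs could look like the sketch below. The prompt wording and the function name are hypothetical, written only to show how the inputs fit together; the actual prompt used by OptCD is not given in the slides, and no call to any ChatGPT API is made here.

```python
from typing import List


def build_mapper_prompt(plugins: List[str], maven_command: str, unused_dir: str) -> str:
    """Assemble a ranking request from the three inputs (hypothetical wording)."""
    plugin_lines = "\n".join(f"- {p}" for p in plugins)
    return (
        f"A CD build runs the Maven command: {maven_command}\n"
        f"The build uses these plugins:\n{plugin_lines}\n"
        f"The directory {unused_dir} is generated during the build but never used.\n"
        "Rank the plugins above from most to least likely to have generated this directory."
    )


prompt = build_mapper_prompt(
    ["jacoco-maven-plugin:0.8.8", "maven-site-plugin:3.12.1"],
    "mvn -B package --file pom.xml",
    "../target/jacoco",
)
```

The response would then need to be parsed into the ranked list of plugins, a step this sketch leaves out.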

22 of 26

OptCD Mapper

[Figure: OptCD pipeline. Artifact labels: GitHub Actions YAML, Log, Unused Files, Unused Directories, Plugins from pom.xml, Modified GitHub Actions YAML, Modified pom.xml.]

Log Search method:

  • Correlates timestamps in the Logger logs with the timestamps of when plugins started and ended in the GitHub Actions build logs

Example GitHub Actions build log:

Java 8 building … Build with Maven 2023-08-07T02:32:45.3833432Z [INFO] --- jacoco-maven-plugin:0.8.7:prepare-agent (default) @ jsqlparser ---

Java 8 building … Build with Maven 2023-08-07T02:34:36.4448210Z [INFO] --- jacoco-maven-plugin:0.8.7:prepare-agent (default) @ jsqlparser ---

[Figure: an example Logger log is shown alongside the build log; its contents are not recoverable here.]
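The timestamp correlation can be sketched as follows. The sketch assumes that a plugin's interval ends when the next plugin's start line appears in the build log, that second-level precision is sufficient, and that Logger timestamps are in the same time zone as the build log; these are simplifications for illustration, and the function names are hypothetical.

```python
import re
from datetime import datetime
from typing import List, Optional, Tuple

# Matches Maven's plugin banner, e.g.
# 2023-08-07T02:32:45.3833432Z [INFO] --- jacoco-maven-plugin:0.8.7:prepare-agent (default) @ jsqlparser ---
PLUGIN_START = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.\d+)?Z \[INFO\] --- ([\w.-]+):([\w.-]+):([\w-]+)"
)

Interval = Tuple[datetime, Optional[datetime], str]


def parse_ts(text: str) -> datetime:
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S")


def plugin_intervals(build_log_lines: List[str]) -> List[Interval]:
    """Turn plugin banner lines into (start, end, plugin_name) intervals."""
    starts = []
    for line in build_log_lines:
        match = PLUGIN_START.search(line)
        if match:
            starts.append((parse_ts(match.group(1)), match.group(2)))
    intervals = []
    for i, (start, name) in enumerate(starts):
        # Assumption: a plugin runs until the next banner; the last one is open-ended.
        end = starts[i + 1][0] if i + 1 < len(starts) else None
        intervals.append((start, end, name))
    return intervals


def plugin_at(event_time: datetime, intervals: List[Interval]) -> Optional[str]:
    """Return the plugin that was running when a file event was logged."""
    for start, end, name in intervals:
        if start <= event_time and (end is None or event_time < end):
            return name
    return None


log = [
    "Java 8 building … Build with Maven 2023-08-07T02:32:45.3833432Z [INFO] --- jacoco-maven-plugin:0.8.7:prepare-agent (default) @ jsqlparser ---",
    "Java 8 building … Build with Maven 2023-08-07T02:34:36.4448210Z [INFO] --- jacoco-maven-plugin:0.8.7:prepare-agent (default) @ jsqlparser ---",
]
intervals = plugin_intervals(log)
```

Calling `plugin_at` with the creation time of a file in an unused directory then yields the candidate plugin; the event time would come from the Logger log.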

23 of 26

OptCD Fixer

[Figure: OptCD pipeline. Artifact labels: GitHub Actions YAML, Log, Unused Files, Unused Directories, Plugins from pom.xml, Modified GitHub Actions YAML, Modified pom.xml.]

Ask ChatGPT to find a fix for the Maven command in the YAML file

Inputs:

  • List of plugins
  • Maven command
  • Unused directory

Request: ask for the argument to add

Output: Argument to update command
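Once an argument has been obtained, applying it to the workflow amounts to editing the Maven command in the YAML file. The sketch below does this with plain line matching rather than a YAML parser; the function name and the matching strategy are assumptions for illustration, and the arguments in the usage example are the ones shown earlier in the talk.

```python
from typing import List


def add_maven_arguments(yaml_text: str, maven_command: str, arguments: List[str]) -> str:
    """Append arguments to the workflow line that runs maven_command."""
    updated = []
    for line in yaml_text.splitlines():
        if line.strip() == f"run: {maven_command}":
            # Skip arguments that are already present on the line.
            missing = [arg for arg in arguments if arg not in line]
            line = " ".join([line] + missing)
        updated.append(line)
    trailing = "\n" if yaml_text.endswith("\n") else ""
    return "\n".join(updated) + trailing


workflow = (
    "    - name: Build with Maven\n"
    "      run: mvn -B package --file pom.xml\n"
)
fixed = add_maven_arguments(
    workflow,
    "mvn -B package --file pom.xml",
    ["-DdisableXmlReport=true", "-Djacoco.skip=true", "-Dpmd.skip=true"],
)
```

Because only the workflow file changes, the pom.xml and therefore the developers' local builds stay as they were.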

24 of 26

OptCD Overview

[Figure: the complete OptCD pipeline. Artifact labels: GitHub Actions YAML, Log, Unused Files, Unused Directories, Plugins from pom.xml, Modified pom.xml, Modified GitHub Actions YAML.]

25 of 26

Research Questions on 22 GitHub Open-Source Projects

  • RQ1: How often are unused directories generated?
    • On average 2.1 unused directories per step
    • 86% of steps generated at least one unused directory
  • RQ2: How effective is OptCD?
    • Identifies the correct plugin for 92% of the unused directories
  • RQ3: How much runtime does OptCD save?
    • Reduces average step runtime by 7%
  • RQ4: What are developers' reactions to OptCD?
    • Accepted 46.2% (12 out of 26) of submitted pull requests; 9 are still pending

26 of 26

Conclusion

Problem: The execution of unnecessary plugins and generation of unnecessary directories in builds waste developers' time and CD resources

Solution: We present OptCD, a technique to dynamically detect unnecessary work within CD builds, reducing build runtime and enhancing efficiency

  • Our study reveals that 86% of evaluated steps generate an average of 2.1 unused directories
  • Disabling generation of unused directories in 22 open source projects reduces step runtime by 7% on average

Wing Lam <winglam@gmu.edu>