1 of 17

UG2+ Challenge

CVPR 2022 workshop

Team: Brave little briquettes

Peng Zhang [1], Chao Huang [2], Chengqing Xu [1], Jinghui Tang [1]

[1] China University of Mining and Technology

[2] University of Electronic Science and Technology of China

Docker: http://hub.docker.com/repository/docker/jinghuitang/image_tjh

2 of 17

Introduction to the competition

Target: UG2+ Track 1 aims to evaluate and advance the robustness of object detection algorithms on images captured in hazy environments.

Dataset: 240 clear images and 177 paired hazy/clean images for training, 60 hazy images for validation, and finally 50 hazy images for testing.

Performance Metric: mAP@0.5

[Figure: example clean images, paired hazy/clean images, and hazy images]

3 of 17

Baseline model -- dehazing

[Figure: original hazy image and dehazed results from three pre-trained models (results 1-3)]

We use the EDN-GTM algorithm; the results are shown in the figure above, which contains the original hazy image and the outputs of different pre-trained models. The artifacts introduced by the dehazing algorithm are obvious.

(We tried many methods; EDN-GTM is shown here as an example.)

4 of 17

Baseline model -- dehazing

Next, we also tried fine-tuning on the competition dataset, but the paired images are not exactly aligned, so fine-tuning directly leads to poor results. We therefore modified the loss function by removing the pixel-level L2 alignment loss; although the results improved, they still did not meet our requirements.

Results of different methods: a) original image; b) pre-trained model; c) fine-tuned model without pixel loss.
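A minimal PyTorch sketch of this modification, under our own assumptions (EDN-GTM's exact training objective is not reproduced here): the fine-tuning loss keeps alignment-tolerant terms (perceptual, adversarial) and simply drops the pixel-level L2 term, since the hazy/clean pairs are not pixel-aligned. The names perceptual_loss and disc_logits are hypothetical.

```python
import torch
import torch.nn.functional as F

def finetune_loss(pred, target, disc_logits, perceptual_loss, use_pixel_l2=False):
    # Sketch of a fine-tuning loss for misaligned hazy/clean pairs.
    # pred, target: dehazed output and (roughly) clean reference, (N, 3, H, W).
    # disc_logits: discriminator scores on pred (adversarial term).
    # perceptual_loss: hypothetical callable returning a feature-space distance
    # (e.g., on VGG features) that tolerates small misalignments.
    loss = perceptual_loss(pred, target)
    loss = loss + F.binary_cross_entropy_with_logits(
        disc_logits, torch.ones_like(disc_logits))
    if use_pixel_l2:
        # Disabled in our setting: the pairs are not pixel-aligned, so an L2
        # term penalizes the network for correct but slightly shifted output.
        loss = loss + F.mse_loss(pred, target)
    return loss
```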

5 of 17

Baseline model -- dehazing

Since the performance of the dehazing algorithms is poor (they instead introduce many artifacts), we began to question whether dehazing is necessary at all. Our experiments show that the targets can largely be detected with a simple detection algorithm, as shown in the figure below, so we did not include a dehazing module in our pipeline.

6 of 17

Baseline model -- YOLOX

[Figure: detection results of YOLOX-s, YOLOX-l, and YOLOX-x]

YOLOX          s       l       x
AP 0.5:0.95    0.534   0.613   0.633
AP 0.5         0.784   0.791   0.804

P.S. We use COCO-pretrained weights as the pre-trained model.

7 of 17

Solution – pre-train dataset

We found that the RTTS dataset is very similar to the competition dataset.

YOLOX-x        COCO    RTTS
AP 0.5:0.95    0.633   0.646
AP 0.5         0.804   0.839

The results show that performance improves substantially with RTTS pre-training, which indicates that the competition data and RTTS are homologous.
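A minimal sketch of the swap, assuming a standard PyTorch checkpoint; the path and the stand-in model are hypothetical, and in practice the detector is YOLOX-x.

```python
import torch
from torch import nn

# Stand-in module for illustration; in practice this is the YOLOX-x detector.
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU())

# Initialize from an RTTS-pretrained checkpoint instead of COCO.
# "rtts_pretrained.pth" is a hypothetical path.
ckpt = torch.load("rtts_pretrained.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)  # YOLOX checkpoints nest weights under "model"
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
```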

8 of 17

Solution – more powerful backbone

At the model level, ConvNeXt is adopted as the new backbone of YOLOX.

As shown in Fig. 1, ConvNeXt achieves the best performance, surpassing the Swin Transformer, which indicates that ConvNeXt has strong feature-extraction ability. Due to limited computing resources, ConvNeXt-B (backbone) + YOLOX-l (neck + head) is selected. In the competition, this configuration reaches an AP@0.5 of 0.882.

Fig. 1

Fig. 2
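A minimal sketch of the backbone swap, assuming the timm library for the ConvNeXt implementation; the integration with the YOLOX PAFPN neck is our own illustration, not official code.

```python
import timm
import torch

# ConvNeXt-B as a multi-scale feature extractor producing the three maps
# (strides 8 / 16 / 32) that the YOLOX PAFPN neck consumes.
backbone = timm.create_model(
    "convnext_base", pretrained=True,
    features_only=True, out_indices=(1, 2, 3))

x = torch.randn(1, 3, 640, 640)
c3, c4, c5 = backbone(x)  # channels (256, 512, 1024) for ConvNeXt-B
print([tuple(f.shape) for f in (c3, c4, c5)])
# A YOLOX-l sized neck/head is then attached on top of these feature maps.
```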

9 of 17

Solution – adjust parameters

Different loss terms carry different weights, and the weights we used so far are those tuned on COCO. COCO has 80 categories, but the competition has only one, so we believe localization matters most for our model. In the loss, we therefore change the weight of the IoU loss from 5 to 10 and the weight of the class loss from 1 to 0.5.

As shown in Fig. 1, the AP(0.5:0.95) of our model rises from 0.643 to 0.662, and the AP(0.5) reaches 0.903; the performance improves significantly.

Fig. 1
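A sketch of the change, assuming YOLOX's head combines IoU, objectness, classification, and L1 terms (in the official code the IoU term carries a weight of 5.0); the function and variable names are illustrative.

```python
# Adjusted loss weights: shift weight toward localization and away from
# classification, since the competition has a single class.
IOU_WEIGHT = 10.0  # was 5.0
CLS_WEIGHT = 0.5   # was 1.0

def total_loss(loss_iou, loss_obj, loss_cls, loss_l1):
    # Weighted sum of the YOLOX head's loss terms.
    return IOU_WEIGHT * loss_iou + loss_obj + CLS_WEIGHT * loss_cls + loss_l1
```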

10 of 17

Solution – adjusting parameters

[Figure: multi-scale training sizes, original and adjusted]

Multi-scale training has proved effective. Moreover, we find that the larger the scale span of the training images, the better the performance. The figure shows our training sizes before and after adjustment.
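A sketch of widening the range in a YOLOX experiment file, assuming the standard Exp attributes input_size and multiscale_range; the value below is illustrative, as the actual sizes are those shown in the figure.

```python
from yolox.exp import Exp as BaseExp

class Exp(BaseExp):
    def __init__(self):
        super().__init__()
        # YOLOX periodically samples a new training size from
        # [640/32 - multiscale_range, 640/32 + multiscale_range] * 32.
        # Widening the range (default is 5) enlarges the scale span;
        # the value here is illustrative, not our exact setting.
        self.input_size = (640, 640)
        self.multiscale_range = 10
```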

11 of 17

Solution – adjusting parameters

In multi-scale training, we pull the scale range to the maximum our hardware allows. Images at different sizes carry different amounts of information, and enlarging the size lets the model learn richer information, improving its detection accuracy.

               Previous multi-scale   Adjusted multi-scale
AP 0.5:0.95    0.662                  0.696
AP 0.5         0.903                  0.904

AP(0.5:0.95) improves by about 3 points, while AP(0.5) barely changes.

12 of 17

Solution – TTA

TTA (test-time augmentation) evaluates the model on several transformed views of each test image. In this competition we did not choose aggressive augmentations, using only horizontal and vertical flips at test time.
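A minimal sketch of flip-based TTA, assuming a hypothetical detect(image) callable that returns pixel-coordinate boxes (x1, y1, x2, y2) with scores; the per-view detections are merged afterwards (e.g., with NMS or WBF).

```python
import numpy as np

def flip_tta(image, detect):
    # image: (H, W, 3) array; detect: hypothetical callable returning
    # (boxes, scores) with boxes as an (N, 4) array of (x1, y1, x2, y2).
    h, w = image.shape[:2]
    all_boxes, all_scores = [], []

    for flip in ("none", "horizontal", "vertical"):
        if flip == "horizontal":
            boxes, scores = detect(image[:, ::-1])
            boxes = boxes.copy()
            boxes[:, [0, 2]] = w - boxes[:, [2, 0]]  # mirror x back
        elif flip == "vertical":
            boxes, scores = detect(image[::-1])
            boxes = boxes.copy()
            boxes[:, [1, 3]] = h - boxes[:, [3, 1]]  # mirror y back
        else:
            boxes, scores = detect(image)
        all_boxes.append(boxes)
        all_scores.append(scores)

    # The per-view detections are then merged (e.g., NMS or WBF).
    return np.concatenate(all_boxes), np.concatenate(all_scores)
```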

13 of 17

Solution – TTA

[Figure: detection results without TTA and with TTA]

               YOLOX   YOLOX-TTA
AP 0.5:0.95    0.697   0.709
AP 0.5         0.914   0.922

The performance of the model clearly improves with the introduction of TTA.

Note: YOLOX is evaluated here with a confidence threshold of 0.001, to match the competition's evaluation protocol.

14 of 17

Solution – Swin Faster RCNN

  1. Use Faster R-CNN as the overall framework
  2. Use Swin-Base (384×384) as the backbone (ImageNet pre-training)
  3. Pre-train on the RTTS dataset
  4. Introduce CIoU loss, with a weight twice that of the CE loss (see the config sketch below)
  5. Mixup + Mosaic augmentation
  6. Multi-scale training / testing (TTA)
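A hedged mmdetection-style fragment for item 4 (our own illustration; the surrounding config is omitted). IoU-family box losses in R-CNN heads operate on decoded boxes, hence reg_decoded_bbox=True.

```python
# Bbox head fragment: CIoU loss weighted 2x the cross-entropy loss.
bbox_head = dict(
    type="Shared2FCBBoxHead",
    num_classes=1,                 # single competition class
    reg_decoded_bbox=True,         # IoU-family losses need decoded boxes
    loss_cls=dict(type="CrossEntropyLoss", loss_weight=1.0),
    loss_bbox=dict(type="CIoULoss", loss_weight=2.0),
)
```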

15 of 17

Solution – Cascade RCNN scheme

a. Use Cascade R-CNN as the overall framework;

b. The IoU thresholds of the three stages are set to 0.5, 0.6, and 0.7 (see the sketch below);

c. Other settings are consistent with the Faster R-CNN scheme.
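A hedged mmdetection-style sketch of item b: each cascade stage assigns positive samples at a progressively stricter IoU threshold.

```python
# Per-stage training configs: positives assigned at increasing IoU thresholds.
rcnn_train_cfg = [
    dict(assigner=dict(type="MaxIoUAssigner",
                       pos_iou_thr=thr, neg_iou_thr=thr, min_pos_iou=thr))
    for thr in (0.5, 0.6, 0.7)
]
```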

16 of 17

Performance

We finally selected “YOLOX + Swin Faster R-CNN + Swin Cascade R-CNN” as the final model for testing.

We report the performance of different models on the validation set, including the results of combining different models with weighted boxes fusion (WBF); a fusion sketch follows the table.

Method                                            mAP 0.5:0.95   AP 0.5
YOLOX (ConvNeXt)                                  0.709          0.922
Swin Faster R-CNN                                 0.645          0.901
Swin Cascade R-CNN                                0.653          0.903
YOLOX + Swin Faster R-CNN                         0.708          0.930
YOLOX + Swin Cascade R-CNN                        0.712          0.926
Swin Faster R-CNN + Swin Cascade R-CNN            0.671          0.915
YOLOX + Swin Faster R-CNN + Swin Cascade R-CNN    0.709          0.929
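A minimal WBF sketch using the ensemble_boxes package (our assumption for the fusion implementation): boxes are (x1, y1, x2, y2) normalized to [0, 1], with one list entry per model; the weights and IoU threshold shown are illustrative.

```python
from ensemble_boxes import weighted_boxes_fusion

# Toy per-model detections; in practice these come from YOLOX,
# Swin Faster R-CNN, and Swin Cascade R-CNN.
boxes_list = [
    [[0.10, 0.10, 0.50, 0.50]],   # model 1
    [[0.12, 0.11, 0.51, 0.49]],   # model 2
]
scores_list = [[0.90], [0.80]]
labels_list = [[0], [0]]          # the single competition class

fused_boxes, fused_scores, fused_labels = weighted_boxes_fusion(
    boxes_list, scores_list, labels_list,
    weights=[1, 1],       # illustrative per-model weights
    iou_thr=0.55,         # illustrative fusion IoU threshold
    skip_box_thr=0.001,   # matches the 0.001 confidence cut noted earlier
)
print(fused_boxes, fused_scores, fused_labels)
```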

17 of 17

Thanks