Rapid Object Detection With A Cascade of Boosted Classifiers Based on Haar-like Features

Introduction

This is a modified version of the official haartraining utility manual; it includes my own additions.

This document describes how to train and use a cascade of boosted classifiers for rapid object detection. A large set of over-complete Haar-like features provides the basis for the simple individual classifiers. Examples of object detection tasks are face, eye and nose detection, as well as logo detection.

 

The sample detection task in this document is logo detection, since logo detection does not require the collection of a large set of registered and carefully marked object samples. Instead we assume that a very large set of object examples can be derived from one prototype image (createsamples utility, see below).

 

A detailed description of the training/evaluation algorithm can be found in [1] and [2].

The haartraining utilities have one notable quirk: they perform almost no error handling. Check option names yourself, because the utilities silently ignore options that are specified incorrectly.


Samples Creation

For training, a set of training samples must be collected. There are two sample types: negative samples and positive samples. Negative samples correspond to non-object images. Positive samples correspond to object images.

Negative Samples

Negative samples are taken from arbitrary images. These images must not contain object representations. Negative samples are passed to the utilities through a background description file. It is a text file in which each line contains the filename (relative to the directory of the description file) of a negative sample image. This file must be created manually. Note that negative samples and negative sample images are also called background samples or background sample images; the terms are used interchangeably in this document.

 

Example of negative description file:

 

Directory structure:

/img
    img1.jpg
    img2.jpg
bg.txt

 

File bg.txt:

img/img1.jpg

img/img2.jpg

Such a description file can be created with a UNIX command, for example:

 $ find img/ -name '*.jpg' > bg.txt
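The same list can also be produced programmatically. The following is a minimal Python sketch, not part of the haartraining tools; the function name write_background_file and its arguments are hypothetical. It writes one path per line, relative to the directory that holds the description file, as the format above requires.

```python
from pathlib import Path


def write_background_file(image_dir, description_file, pattern="*.jpg"):
    """Write one image path per line, relative to the description file's directory.

    Hypothetical helper for illustration. It assumes image_dir lies inside the
    directory that contains description_file; otherwise relative_to() raises
    ValueError.
    """
    description_file = Path(description_file)
    base = description_file.parent
    # Sort so that the output is reproducible from run to run.
    images = sorted(Path(image_dir).rglob(pattern))
    lines = [p.relative_to(base).as_posix() for p in images]
    description_file.write_text("".join(line + "\n" for line in lines))
    return lines
```

Calling write_background_file("img", "bg.txt") from the directory that contains img/ should give a file equivalent to the one produced by the find command, up to line order.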


Positive Samples

Positive samples are created by the createsamples utility. They may be created from a single object image or from a collection of previously marked-up images.

This is the list of options of the createsamples utility. You can display it by running the utility without arguments:

 $ createsamples

Usage: ./createsamples

  [-info <collection_file_name>]

  [-img <image_file_name>]

  [-vec <vec_file_name>]

  [-bg <background_file_name>]

  [-num <number_of_samples = 1000>]

  [-bgcolor <background_color = 0>]

  [-inv] [-randinv] [-bgthresh <background_color_threshold = 80>]

  [-maxidev <max_intensity_deviation = 40>]

  [-maxxangle <max_x_rotation_angle = 1.100000>]

  [-maxyangle <max_y_rotation_angle = 1.100000>]

  [-maxzangle <max_z_rotation_angle = 0.500000>]

  [-show [<scale = 4.000000>]]

  [-w <sample_width = 24>]

  [-h <sample_height = 24>]

The single object image may, for instance, contain a company logo. A large set of positive samples is then created from the given object image by randomly rotating it, changing the logo color, and placing the logo on arbitrary backgrounds.

The amount and range of randomness can be controlled by command line arguments.

Command line arguments:

-vec <vec_file_name>

name of the output file containing the positive samples for training

-img <image_file_name>

source object image (e.g., a company logo)

-bg <background_file_name>

background description file; contains a list of images into which randomly distorted versions of the object are pasted for positive sample generation

-num <number_of_samples>

number of positive samples to generate

-bgcolor <background_color>

background color (currently grayscale images are assumed); the background color denotes the transparent color. Since there might be compression artifacts, the amount of color tolerance can be specified by -bgthresh. All pixels between bgcolor-bgthresh and bgcolor+bgthresh are regarded as transparent.

-bgthresh <background_color_threshold>

color tolerance around the background color, as described under -bgcolor

-inv

if specified, the colors will be inverted

-randinv

if specified, the colors will be inverted randomly

-maxidev <max_intensity_deviation>

maximal intensity deviation of foreground sample pixels; units are in [0, 255].

-maxxangle <max_x_rotation_angle>

maximum rotation angle around the horizontal axis, in radians. Use it to get faces that look up and down.

-maxyangle <max_y_rotation_angle>

maximum rotation angle around the vertical axis, in radians. Use it to get faces that look left and right.

-maxzangle <max_z_rotation_angle>

maximum rotation angle around the depth axis, in radians. Use it to get faces that tilt the head to the left and right.

-show

if specified, each sample will be shown. Pressing Esc continues the creation process without showing further samples. A useful debugging option.

-w <sample_width>

width (in pixels) of the output samples

-h <sample_height>

height (in pixels) of the output samples

-info <collection_file_name>

See below

 

The following procedure is used to create a sample object instance:

The source image is rotated randomly around all three axes. The chosen angles are limited by -max?angle. Next, pixels with intensities in the range [bg_color-bg_color_threshold; bg_color+bg_color_threshold] are regarded as transparent. White noise is added to the intensities of the foreground. If the -inv key is specified, the foreground pixel intensities are inverted. If the -randinv key is specified, it is randomly selected whether inversion is applied to this sample. Finally, the obtained image is placed onto an arbitrary background from the background description file, resized to the size specified by -w and -h, and stored in the file specified by the -vec command line parameter.
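The transparency and inversion steps of this procedure can be illustrated with a small Python sketch. It is not the createsamples implementation: rotation, white noise and resizing are left out, images are plain lists of rows of grayscale values, and the function names is_transparent and paste_object are hypothetical.

```python
def is_transparent(value, bgcolor=0, bgthresh=80):
    """True if a grayscale value lies in [bgcolor - bgthresh, bgcolor + bgthresh]."""
    return bgcolor - bgthresh <= value <= bgcolor + bgthresh


def paste_object(obj, background, bgcolor=0, bgthresh=80, invert=False):
    """Overlay an object image onto a background of the same size.

    Both images are lists of rows of grayscale values in [0, 255].
    Object pixels that count as transparent are replaced by the background;
    the remaining foreground pixels are optionally inverted (as with -inv).
    """
    result = []
    for obj_row, bg_row in zip(obj, background):
        row = []
        for fg, bg in zip(obj_row, bg_row):
            if is_transparent(fg, bgcolor, bgthresh):
                row.append(bg)  # transparent: the background shows through
            else:
                row.append(255 - fg if invert else fg)
        result.append(row)
    return result
```

With the default values bgcolor = 0 and bgthresh = 80, every object pixel with intensity 80 or lower is treated as background, so a dark logo on a black canvas would largely disappear; this is why the two options need to match the prototype image.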

 


Positive samples may also be obtained from a collection of previously marked-up images. This collection is described by a text file similar to the background description file. Each line of this file corresponds to one collection image. The first element of the line is the image file name. It is followed by the number of object instances. The following numbers are the coordinates of the bounding rectangles (x, y, width, height).

 

Example of description file:

 

Directory structure:

/img
    img1.jpg
    img2.jpg
info.dat

 

File info.dat:

img/img1.jpg 1 140 100 45 45

img/img2.jpg 2 100 200 50 50 50 30 25 25

[filename] [# of objects] [[left_x top_y width height] [... 2nd object] ...]

 

Image img1.jpg contains a single object instance with bounding rectangle (140, 100, 45, 45). Image img2.jpg contains two object instances, with bounding rectangles (100, 200, 50, 50) and (50, 30, 25, 25). Here (x, y) is the position of the upper-left corner of the object, and the origin (0, 0) is the upper-left corner of the entire image.
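To make the line format concrete, here is a short Python sketch that parses one line of such a description file. The function name parse_description_line is hypothetical, and the sketch assumes that file names contain no whitespace.

```python
def parse_description_line(line):
    """Parse '<filename> <count> <x y width height> ...' into (filename, rectangles)."""
    fields = line.split()
    filename, count = fields[0], int(fields[1])
    numbers = [int(v) for v in fields[2:]]
    # Each object instance contributes exactly four numbers.
    if len(numbers) != 4 * count:
        raise ValueError(
            "expected %d rectangle values, got %d" % (4 * count, len(numbers))
        )
    rects = [tuple(numbers[i:i + 4]) for i in range(0, len(numbers), 4)]
    return filename, rects


for text in ("img/img1.jpg 1 140 100 45 45",
             "img/img2.jpg 2 100 200 50 50 50 30 25 25"):
    print(parse_description_line(text))
```

A check like this is worth running over a hand-written description file before training, given that the utilities themselves report few errors.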

 

In order to create positive samples from such a collection, the -info argument should be specified instead of -img:

-info <collection_file_name>

description file of the marked-up image collection

 

The scheme of sample creation in this case is as follows. The object instances are taken from the images. Then they are resized to the sample size and stored in the output file. No distortion is applied, so the only arguments that have an effect are -w, -h, -show and -num.

 

The createsamples utility may also be used to examine the samples stored in a positive samples file. To do this, only the -vec, -w and -h parameters should be specified.

 

Note that for training, it does not matter how the positive samples file is generated. The createsamples utility is only one way to collect or create a vector file of positive samples.


Training

The next step after sample creation is training the classifier. It is performed by the haartraining utility.

This is the list of options of the haartraining utility:

Usage: haartraining

  -data <dir_name>

  -vec <vec_file_name>

  -bg <background_file_name>

  [-npos <number_of_positive_samples = 2000>]

  [-nneg <number_of_negative_samples = 2000>]

  [-nstages <number_of_stages = 14>]

  [-nsplits <number_of_splits = 1>]

  [-mem <memory_in_MB = 200>]

  [-sym (default)] [-nonsym]

  [-minhitrate <min_hit_rate = 0.995000>]

  [-maxfalsealarm <max_false_alarm_rate = 0.500000>]

  [-weighttrimming <weight_trimming = 0.950000>]

  [-eqw]

  [-mode <BASIC (default) | CORE | ALL>]

  [-w <sample_width = 24>]

  [-h <sample_height = 24>]

  [-bt <DAB | RAB | LB | GAB (default)>]

  [-err <misclass (default) | gini | entropy>]

  [-maxtreesplits <max_number_of_splits_in_tree_cascade = 0>]

  [-minpos <min_number_of_positive_samples_per_cluster = 500>]

 

Command line arguments:

-data <dir_name>

directory name in which the trained classifier is stored

-vec <vec_file_name>

file name of the positive samples file (created by the createsamples utility or by any other means)

-bg <background_file_name>

background description file

-npos <number_of_positive_samples>

-nneg <number_of_negative_samples>

number of positive/negative samples used in training of each classifier stage. Reasonable values are npos = 7000 and nneg = 3000.

-nstages <number_of_stages>

number of stages to be trained

-nsplits <number_of_splits>

determines the weak classifier used in stage classifiers. If 1, a simple stump classifier is used; if 2 or more, a CART classifier with number_of_splits internal (split) nodes is used. This is the number of feature values to be used in a weak classifier, not the number of splits of the classifier tree. Each stage is a linear combination of weak classifiers, and the final classifier is a cascade of such stages. P.S. I do not think CART helps in theory, but an empirical result shows that it helped [2].

-mem <memory_in_MB>

Available memory in MB for precalculation. The more memory you have, the faster the training process.

-sym (default)

-nonsym

specifies whether the object class under training has vertical symmetry or not. Vertical symmetry speeds up the training process. For instance, frontal faces exhibit vertical symmetry.

-minhitrate <min_hit_rate>

minimal desired hit rate for each stage classifier. The overall hit rate may be estimated as (min_hit_rate^number_of_stages).

-maxfalsealarm <max_false_alarm_rate>

maximal desired false alarm rate for each stage classifier. The overall false alarm rate may be estimated as (max_false_alarm_rate^number_of_stages).

-weighttrimming <weight_trimming>

Specifies whether weight trimming is used and, if so, how much. A decent choice is 0.90. This is a parameter of the boosting algorithm. You may refer to the Boosting section of the OpenCV Machine Learning Reference (cvBoostParams.weight_trim_rate).

-eqw

Enables equal weights for positives and negatives. You may feel that it is unfair to treat positives and negatives equally when their numbers differ. By default, the haartraining utility compensates for this by calculating the error according to the ratio of the numbers of positives and negatives. This option disables that behavior. [1] states that "error is calculated with respect to the weighted positive and negative samples."

-mode <BASIC (default) | CORE | ALL>

selects the type of Haar feature set used in training. BASIC uses only upright features, while ALL uses the full set of upright and 45-degree rotated features. See [1] for more details.

-w <sample_width>

-h <sample_height>

Size of training samples (in pixels). Must have exactly the same values as used during training sample creation (createsamples utility).

-bt <DAB | RAB | LB | GAB (default)>

DAB is Discrete AdaBoost, RAB is Real AdaBoost, LB is LogitBoost, GAB is Gentle AdaBoost. [2] states that "GAB is not only the best, but also the fastest classifier."

-err <misclass (default) | gini | entropy>

Type of error used; available only when the Discrete AdaBoost algorithm is applied. misclass is the number of misclassified samples divided by the total number of samples. Entropy or Gini can also be used as error measures.

-maxtreesplits <max_number_of_splits_in_tree_cascade = 0>

Constructs classifier trees rather than cascades. Theoretically a serial cascade should be enough, but empirically a tree structure may help.

-minpos <min_number_of_positive_samples_per_cluster = 500>

Note: in order to use multiprocessor advantage a compiler that supports OpenMP 1.0 standard should be used.
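The estimates given for -minhitrate and -maxfalsealarm are easy to evaluate for a concrete configuration. The Python sketch below applies the two formulas to the default values from the usage listing (0.995, 0.5 and 14 stages); the function name cascade_estimates is hypothetical.

```python
def cascade_estimates(min_hit_rate=0.995, max_false_alarm=0.5, stages=14):
    """Estimate overall rates of a cascade from its per-stage targets."""
    overall_hit_rate = min_hit_rate ** stages
    overall_false_alarm = max_false_alarm ** stages
    return overall_hit_rate, overall_false_alarm


hr, fa = cascade_estimates()
print("overall hit rate    ~ %.4f" % hr)
print("overall false alarm ~ %.2e" % fa)
```

For the defaults this gives an overall hit rate of roughly 0.93 and an overall false alarm rate of roughly 6.1e-5. These are estimates of the training targets only; the rates measured on real test data may differ.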

The haartraining utility produces output like the following:

+----+----+-+---------+---------+---------+---------+
|  N |%SMP|F|  ST.THR |    HR   |    FA   | EXP. ERR|
+----+----+-+---------+---------+---------+---------+
|   1|100%|-|-0.857040| 1.000000| 1.000000| 0.082075|
+----+----+-+---------+---------+---------+---------+
|   2|100%|+|-1.702127| 1.000000| 1.000000| 0.102168|
+----+----+-+---------+---------+---------+---------+

N

The iteration number of feature selection training.

%SMP

The percentage of original samples left. 

F

+ indicates the feature is flipped. Related to -sym (default) option. 

ST.THR

Stage threshold

HR

Hit rate

FA

False alarm rate. FYI: (HR, FA) = (1.0, 1.0) means the detector simply alarms every time for everything.

EXP.ERR

Expected (misclassification) error.

The haartraining utility creates the file <dir_name>.xml when training has finished completely, where <dir_name> is the argument of the -data option. If you want to generate an XML file before the haartraining utility has finished, you can use the convert_cascade utility located at OpenCV/samples/c/convert_cascade.c (under your OpenCV installation directory). Compile it. The usage of the utility is as follows:

 $ convert_cascade --size="<sample_width>x<sample_height>" <haartraining_output_dir> <output_xml_file>


Application

The OpenCV cvHaarDetectObjects() function (see in particular the haarFaceDetect demo) is used for detection.

There is a facedetect utility in OpenCV/samples/c/facedetect.c (under your OpenCV installation directory). Compile it. The usage of the utility is as follows:

$ facedetect --cascade=<xml_file> [filename(image or video)|camera_index]

Usually the camera_index is 0. If you connect more than one camera to your computer, it would be 1 or 2, etc.


Test Samples

In order to evaluate the performance of a trained classifier, a collection of marked-up images is needed. When such a collection is not available, test samples may be created from a single object image by the createsamples utility. The scheme of test sample creation in this case is similar to training sample creation, since each test sample is a background image into which a randomly distorted and randomly scaled instance of the object picture is pasted at a random position.

 

If both the -img and -info arguments are specified, test samples will be created by the createsamples utility. The sample image is randomly distorted as described above, then placed at a random location in a background image and stored. The corresponding description line is added to the file specified by the -info argument.

 

The -w and -h keys determine the minimal size of placed object picture.

 

The test image file name format is as follows:

imageOrderNumber_x_y_width_height.jpg, where x, y, width and height are the coordinates of placed object bounding rectangle.
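Because the ground-truth rectangle is encoded in the file name, it can be recovered without the description file. The following Python sketch shows one way to do this; the function name parse_test_image_name is hypothetical, and the sketch assumes the five fields are plain decimal numbers separated by underscores.

```python
import os


def parse_test_image_name(path):
    """Return (order_number, (x, y, width, height)) from a test image file name."""
    stem = os.path.splitext(os.path.basename(path))[0]
    # The stem is expected to look like 0001_0341_0241_0039_0039.
    order, x, y, width, height = (int(part) for part in stem.split("_"))
    return order, (x, y, width, height)


print(parse_test_image_name("tests/01/img01.bmp/0001_0341_0241_0039_0039.jpg"))
```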

Note that you should use a set of background images different from the set used during training.

Performance Evaluation

In order to evaluate the performance of the classifier, the performance utility may be used. It takes a collection of marked-up images, applies the classifier and outputs the performance, i.e. the number of found objects, the number of missed objects, the number of false alarms and other information.

Here is the list of options of the performance utility:

Usage: ./performance

  -data <classifier_directory_name>

  -info <collection_file_name>

  [-maxSizeDiff <max_size_difference = 1.500000>]

  [-maxPosDiff <max_position_difference = 0.300000>]

  [-sf <scale_factor = 1.200000>]

  [-ni]

  [-nos <number_of_stages = -1>]

  [-rs <roc_size = 40>]

  [-w <sample_width = 24>]

  [-h <sample_height = 24>]

Command line arguments:

-data <dir_name>

directory name in which the trained classifier is stored. A haarcascade XML file can also be specified; in that case the -w and -h options are unnecessary and ignored, because the XML file includes that information. FYI: the cvLoadHaarClassifierCascade function used in the performance utility supports both a classifier directory and a haarcascade XML file, but this function is obsolete.

-w <sample_width>

-h <sample_height>

Size of training samples (in pixels). Must have exactly the same values as used during training (haartraining utility).

-info <tests_collection_file_name>

file with the test samples description

-maxSizeDiff <max_size_difference>

-maxPosDiff <max_position_difference>

determine the criterion for deciding that a reference rectangle and a detected rectangle coincide. Default values are 1.5 and 0.3 respectively.

-sf <scale_factor>

detection parameter. Default value is 1.2. The search window is enlarged by multiplying its size by this number repeatedly until it exceeds the size of the picture.

-ni

Do not save the resulting detection image files. By default, the performance utility stores result images, which show the positions of detected objects as rectangles, in directories whose names are the test image directories prefixed with 'det-'. These directories must exist beforehand. For example, if a test image file is named "tests/01/img01.bmp/0001_0341_0241_0039_0039.jpg", the directory "det-tests/01/img01.bmp" must be created first; otherwise an error message such as "OpenCV ERROR: Unspecified error (could not save the image) in function cvSaveImage, loadsave.cpp(520)" appears. The error can be avoided with the -ni option or by creating the directories as follows:

    $ cat <tests_collection_file_name> | perl -pe 's!^(.*)/.*$!det-$1!g' | xargs mkdir -p

-rs <roc_size>

Resolution of the receiver operating characteristic (ROC) curve. Default value is 40. This is not a detection parameter but an output parameter (it is only needed for memory allocation).
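The effect of the -sf scale factor described above can be sketched in a few lines of Python. This only illustrates the stated rule (multiply the window size by the factor until it no longer fits in the picture); it is not the OpenCV implementation, the rounding is an assumption, and the function name window_sizes is hypothetical.

```python
def window_sizes(sample_w, sample_h, image_w, image_h, scale_factor=1.2):
    """List the search window sizes that fit inside the image."""
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    sizes = []
    factor = 1.0
    while True:
        # Rounding to whole pixels is an assumption made for this sketch.
        w = int(round(sample_w * factor))
        h = int(round(sample_h * factor))
        if w > image_w or h > image_h:
            break
        sizes.append((w, h))
        factor *= scale_factor
    return sizes


print(window_sizes(24, 24, 320, 240))
```

A smaller scale factor yields more window sizes, so objects of intermediate size are less likely to be skipped, at the cost of a longer detection time.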

Example output of the performance utility:

+================================+======+======+======+
|            File Name           | Hits |Missed| False|
+================================+======+======+======+
|tests/01/img01.bmp/0001_0153_005|     0|     1|     0|
+--------------------------------+------+------+------+
....
+--------------------------------+------+------+------+
|                           Total|   874|   554|    72|
+================================+======+======+======+
Number of stages: 15
Number of weak classifiers: 68
Total time: 115.000000
15
        874     72      0.612045        0.050420
        874     72      0.612045        0.050420
        360     2       0.252101        0.001401
        115     0       0.080532        0.000000
        26      0       0.018207        0.000000
        8       0       0.005602        0.000000
        4       0       0.002801        0.000000
        1       0       0.000700        0.000000
        ....

'Hits' shows the number of correct detections. 'Missed' shows the number of missed detections, or false negatives (the object is present, but the detector failed to detect it). 'False' shows the number of false alarms, or false positives (no object is present, but the detector reported one).
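The totals can be turned into rates with simple arithmetic. The Python sketch below uses the Total row of the sample output (874 hits, 554 missed, 72 false alarms); the function name detection_summary is hypothetical. The two ratios it prints agree with the first row of the numeric block that follows the table (0.612045 and 0.050420), which suggests that those columns are hits and false alarms divided by the number of reference objects. That reading is an inference from the sample numbers, not a documented definition.

```python
def detection_summary(hits, missed, false_alarms):
    """Summarize the Total row of the performance utility output."""
    total_objects = hits + missed  # every reference object is either hit or missed
    return {
        "objects": total_objects,
        "hit_fraction": hits / total_objects,
        "false_per_object": false_alarms / total_objects,
    }


summary = detection_summary(hits=874, missed=554, false_alarms=72)
print("%d objects, hit fraction %.6f, false alarms per object %.6f"
      % (summary["objects"], summary["hit_fraction"], summary["false_per_object"]))
```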

The second block of numbers is for an ROC plot. An ROC curve shows how many objects are detected correctly for a given tolerated level of false alarms. The simplest way to detect everything is to raise an alarm every time. For background, refer to the literature on receiver operating characteristic (ROC) curves. You can plot the data with MATLAB code such as the following:

>> ROC = [        874     72      0.612045        0.050420

        874     72      0.612045        0.050420

        360     2       0.252101        0.001401

        115     0       0.080532        0.000000

        26      0       0.018207        0.000000

        8       0       0.005602        0.000000

        4       0       0.002801        0.000000

        1       0       0.000700        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000

        0       0       0.000000        0.000000];

>> plot(ROC(:,4),ROC(:,3));

0.05 is the max false alarm value specified at the haartraining stage. This ROC plot therefore has values only up to 0.05, not up to 1.0 as in a usual ROC plot.

References

[1] Rainer Lienhart and Jochen Maydt. An Extended Set of Haar-like Features for Rapid Object Detection. Submitted to ICIP2002.

[2] Alexander Kuranov, Rainer Lienhart, and Vadim Pisarevsky. An Empirical Analysis of Boosting Algorithms for Rapid Objects With an Extended Set of Haar-like Features. Intel Technical Report MRL-TR-July02-01, 2002.

[3] Paul Viola and Michael J. Jones. Rapid Object Detection using a Boosted Cascade of Simple Features. IEEE CVPR, 2001.