Call for Algorithm Competition in Foreground/Background Segmentation

Segmentation of foreground objects, especially moving objects, in image sequences is a core component of many computer vision systems, including automated visual surveillance. Commonly, a foreground/background segmentation algorithm provides at every time instance (possibly after some initial training) an estimate of the background image as well as a probabilistic foreground mask, in which each pixel carries a probability score of belonging to a foreground object. A wide variety of algorithms for foreground/background segmentation has been proposed. However, it is still difficult to compare them since (a) implementations are not available in source code and (b) the algorithms have been tested on different datasets and under widely varying conditions. Results reported in the literature do not provide a direct comparison among algorithms because each researcher reports results using different assumptions, evaluation methods and test sequences.

This algorithm competition is unique in that all participating algorithms

  1. must be submitted in source code complying with a very minimal, but also very general C/C++ API (based on OpenCV),

  2. are applied and evaluated on the same public data sets

  3. using a performance evaluation available in C/C++ source code.

This enables the evaluation of differences between the various foreground/background segmentation algorithms. Therefore the independently administered test provides a direct quantitative assessment of the relative strengths and weaknesses of the different approaches. As the test sets and performance metrics will likely change over time based on the submitted suggestions and discussions at the workshop, having algorithms available in source code will make it possible to re-run the tests. (NOTE: Researchers will have the opportunity to include their source code into the next OpenCV release.)

To obtain robust assessment of performance, algorithms are evaluated against different categories of test sequences. The test datasets will include the following problems:

The aim of the competition is manifold. It will bring together researchers interested in the area of background/foreground segmentation to discuss which criteria and test cases should be used for objective evaluation. Moreover, through this competition the community learns in an open manner which technical problems are important to address and how it is progressing toward solving them.

Submission details
Important dates
API
Download
Test videos
Evaluation
Contact
References

Submission details:

Algorithms must be submitted in C/C++ source code against a predefined Application Programming Interface (API). This API is very simple (only 3 functions and one data structure), but also very flexible and generic. Each submitted algorithm must be accompanied by a 4-page paper describing the algorithm in detail. All working submissions are given the opportunity to present their algorithm as a poster; the best performing algorithms will get the chance to present orally. Researchers will have the opportunity to include their source code in the next OpenCV release.

Formatting : Please prepare your paper using the ACM template for the conference. Papers must be submitted in the Portable Document Format (PDF), formatted in two-column conference style.
Page Limits : not more than 4 pages
On-line Submission instructions:
Please submit the API function implementations together with your .NET solution file (Windows) or Linux makefile. We will first try to use only your .{c,cpp,h} files, but in case we run into problems, it helps if we can consult your solution file or makefile. We do not need any executables. Also, please attach your 4-page paper in PostScript or PDF to the email.
Please send your submission by email to Eva.Hoerster@informatik.uni-augsburg.de and Rainer.Lienhart@informatik.uni-augsburg.de.

Important dates:

19 Aug. 2006: Submission deadline for algorithms and papers.
04 Sept. 2006: Notification of acceptance

API:

The Application Programming Interface consists of only three functions (named myCreateFGDStatModel(), myUpdateBGStatModel(), and myReleaseBGStatModel() in the following) and one data structure (named MyBGStatModel in the following).

Every algorithm must use a superset of the provided data structure CV_BG_STAT_MODEL_FIELDS() to store all its necessary state information:

//#define CV_BG_STAT_MODEL_FIELDS() \
// int type; /*type of BG model*/ \
// CvReleaseBGStatModel release; /*release function*/ \
// CvUpdateBGStatModel update; /* update bg model*/ \
// IplImage* background; /*8UC3 reference background image*/ \
// IplImage* foreground; /*8UC1 foreground image*/ \
// IplImage** layers; /*8UC3 reference background image, can be null */ \
// int layer_count; /* can be zero */ \
// CvMemStorage* storage; /*storage for “foreground_regions”*/ \
// CvSeq* foreground_regions /*foreground object contours*/

/* ignore the variables "int type", "CvMemStorage* storage" and "CvSeq* foreground_regions" */

//define your own model, i.e., extend the CV_BG_STAT_MODEL_FIELDS() model
typedef struct MyBGStatModel
{

CV_BG_STAT_MODEL_FIELDS();
// ... more fields could be added here ...

}
MyBGStatModel;

The unsigned char 3-channel image (8uC3) named background must always contain the current estimate of the background image, while foreground contains a mask image indicating which pixels are currently considered foreground. The foreground mask image is of pixel type unsigned char with one channel (8uC1). A pixel value of 0 indicates background at that position, while a value of 255 indicates a foreground pixel. Values between these two extremes can be interpreted as the probability (prob = value / 255.0f) of being a foreground pixel. The fields update and release must hold the function pointers to the respective functions that each algorithm must implement.
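The mapping between mask values and probabilities can be illustrated with a small sketch. It works on plain byte buffers rather than IplImage so that it compiles without OpenCV; the function names and the threshold parameter are hypothetical and not part of the competition API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert one 8-bit mask value to a foreground probability in [0, 1],
// using prob = value / 255.0f as stated in the text.
inline float maskValueToProbability(std::uint8_t value)
{
    return static_cast<float>(value) / 255.0f;
}

// Inverse direction: clamp the probability to [0, 1] and round to 0..255.
inline std::uint8_t probabilityToMaskValue(float prob)
{
    if (prob <= 0.0f) return 0;
    if (prob >= 1.0f) return 255;
    return static_cast<std::uint8_t>(prob * 255.0f + 0.5f);
}

// Hypothetical helper: turn a probabilistic mask into a hard 0/255 mask
// by thresholding the per-pixel probability.
std::vector<std::uint8_t> binarizeMask(const std::vector<std::uint8_t>& mask,
                                       float threshold)
{
    assert(threshold >= 0.0f && threshold <= 1.0f);
    std::vector<std::uint8_t> out(mask.size());
    for (std::size_t i = 0; i < mask.size(); ++i)
        out[i] = (maskValueToProbability(mask[i]) >= threshold) ? 255 : 0;
    return out;
}
```

An algorithm that only produces hard decisions can simply write 0 and 255; one that produces soft scores would store probabilityToMaskValue(p) per pixel.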

Optionally, an algorithm can support a layered representation. Layers are sometimes needed to keep track of objects that move into the scene, settle down, and stay there for some time before they start to move again. One can export this representation by dynamically updating the visual appearance and location of such objects in layers and layer_count. As mentioned, this feature is optional and is not used during the performance evaluation this year. Only background and foreground will be used for performance evaluation.

The data structure MyBGStatModel must be created by calling myCreateFGDStatModel(). An example of this function is given below:

/* Creates FGD model */
// first_frame must be 8uC3 (= 3 channel image (RGB))
CvBGStatModel* myCreateFGDStatModel( IplImage* first_frame )
{

//create (value-initialized, so every field you do not set yourself starts out as zero/NULL)
MyBGStatModel* myBGStatModel = new MyBGStatModel();
/* ... fill the struct with your parameters ... */
/// e.g.
//create images for background and foreground
myBGStatModel->background = cvCreateImage(cvGetSize(first_frame), IPL_DEPTH_8U, first_frame->nChannels);
myBGStatModel->foreground = cvCreateImage(cvGetSize(first_frame), IPL_DEPTH_8U, 1);

//layer images and number of layers used (could be zero)
myBGStatModel->layer_count = 0;
myBGStatModel->layers = 0;

//your algorithm specific update and release functions
myBGStatModel->update = myUpdateBGStatModel;
myBGStatModel->release = myReleaseBGStatModel;

// ... and cast your structure to the smaller generic structure
return (CvBGStatModel*)myBGStatModel;

}

The function that does all the work is myUpdateBGStatModel(). It has the following prototype:

/* Updates model*/
// typedef int (CV_CDECL * CvUpdateBGStatModel)( IplImage* curr_frame, struct CvBGStatModel* bg_model );
int myUpdateBGStatModel( IplImage* curr_frame, struct CvBGStatModel* bg_model )
{

// Necessary cast to get from the generic (i.e., common) part of the data structure
// to your algorithm specific fields.
MyBGStatModel* myBGStatModel = (MyBGStatModel*) bg_model;

//... define your algorithm specific update function; do whatever you have to do

// return the number of layers you have found; if your algorithm does not support layers, return 0
return 0;

}
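To give an idea of what an update step might do, here is a minimal sketch of a running-average background model. This is a deliberately simple stand-in chosen for illustration, not the algorithm of the example project and not a recommended entry. It uses single-channel byte buffers instead of IplImage so that it compiles without OpenCV, and the struct, the function names, and the parameters alpha and threshold are all assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical algorithm state (single channel for brevity).
struct RunningAverageModel {
    std::vector<float> background;         // current background estimate
    std::vector<std::uint8_t> foreground;  // 0 = background, 255 = foreground
    float alpha;                           // learning rate (assumed parameter)
    float threshold;                       // difference threshold (assumed parameter)
};

// Initialize the background estimate with the first frame.
RunningAverageModel createModel(const std::vector<std::uint8_t>& firstFrame,
                                float alpha, float threshold)
{
    RunningAverageModel m;
    m.background.assign(firstFrame.begin(), firstFrame.end());
    m.foreground.assign(firstFrame.size(), 0);
    m.alpha = alpha;
    m.threshold = threshold;
    return m;
}

// One update step. Returns the number of layers, which is always 0 here
// because this sketch does not support layers.
int updateModel(RunningAverageModel& m, const std::vector<std::uint8_t>& frame)
{
    assert(frame.size() == m.background.size());
    for (std::size_t i = 0; i < frame.size(); ++i) {
        const float pixel = static_cast<float>(frame[i]);
        const bool isForeground = std::fabs(pixel - m.background[i]) > m.threshold;
        m.foreground[i] = isForeground ? 255 : 0;
        // Adapt the background only where the pixel was classified as
        // background, so foreground objects do not blend into the estimate.
        if (!isForeground)
            m.background[i] += m.alpha * (pixel - m.background[i]);
    }
    return 0;
}
```

In an actual submission the same logic would live inside myUpdateBGStatModel() and write into the background and foreground images of the model.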

Finally, the function myReleaseBGStatModel() cleans up everything. Its function prototype is

void myReleaseBGStatModel( struct CvBGStatModel** bg_model )
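The create/update/release pattern, including a possible release function, can be sketched as follows. To keep the sketch compilable without OpenCV, it uses hypothetical stand-in types (Image, BGStatModel, MyModel) and a reduced field macro in place of IplImage, CvBGStatModel and CV_BG_STAT_MODEL_FIELDS(); only the structure of the code is meant to carry over.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for IplImage (hypothetical).
struct Image { int width, height, channels; std::vector<unsigned char> data; };

struct BGStatModel;
typedef void (*ReleaseFn)(BGStatModel** model);
typedef int  (*UpdateFn)(Image* frame, BGStatModel* model);

// Reduced stand-in for CV_BG_STAT_MODEL_FIELDS().
#define BG_STAT_MODEL_FIELDS() \
    ReleaseFn release;         \
    UpdateFn  update;          \
    Image*    background;      \
    Image*    foreground

// Generic model: only the common fields.
struct BGStatModel { BG_STAT_MODEL_FIELDS(); };

// Algorithm-specific model: the common fields come first, so a pointer to it
// can be handed out as BGStatModel* and cast back later.
struct MyModel { BG_STAT_MODEL_FIELDS(); int framesSeen; };

static int myUpdate(Image*, BGStatModel* model)
{
    ((MyModel*)model)->framesSeen++;  // placeholder for the real work
    return 0;                         // no layers supported
}

// Frees everything the create function allocated and resets the caller's pointer.
static void myRelease(BGStatModel** model)
{
    if (!model || !*model) return;
    MyModel* my = (MyModel*)*model;
    delete my->background;
    delete my->foreground;
    delete my;
    *model = nullptr;
}

BGStatModel* myCreate(Image* firstFrame)
{
    assert(firstFrame != nullptr);
    const std::size_t pixels =
        static_cast<std::size_t>(firstFrame->width) * firstFrame->height;
    MyModel* my = new MyModel();
    my->background = new Image{firstFrame->width, firstFrame->height,
                               firstFrame->channels,
                               std::vector<unsigned char>(pixels * firstFrame->channels)};
    my->foreground = new Image{firstFrame->width, firstFrame->height, 1,
                               std::vector<unsigned char>(pixels)};
    my->update = myUpdate;
    my->release = myRelease;
    my->framesSeen = 0;
    return (BGStatModel*)my;
}
```

With the real API, the images would be created with cvCreateImage() as in the listing above and freed with cvReleaseImage(); whatever else the create function allocated has to be freed in the release function as well.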

With all this defined, running a given foreground segmentation algorithm is extremely simple:

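#include <stdio.h>
#include <stdlib.h>
#include "cv.h"
#include "cvaux.h"
#include "highgui.h"

int main(int argc, char** argv)
{

IplImage* tmp_frame = NULL;
CvCapture* cap = NULL;

if( argc < 2 ) { printf("usage: %s <video file> \n", argv[0]); return 1; }

//capture video from file
cap = cvCaptureFromFile(argv[1]);
if(!cap) { printf("cannot open video \n"); return 1; }
tmp_frame = cvQueryFrame(cap);
if(!tmp_frame) { printf("bad video \n"); return 1; }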

//create windows to show background and foreground images
cvNamedWindow("Background", 1);
cvNamedWindow("Foreground Mask", 1);

//create BG model
CvBGStatModel* bg_model = myCreateFGDStatModel( tmp_frame );
//for all frames in the video
for( int fr = 1;tmp_frame; tmp_frame = cvQueryFrame(cap), fr++ ) {

//update BG model
//myUpdateBGStatModel( tmp_frame, bg_model );
bg_model->update( tmp_frame, bg_model );

//show current estimation
cvShowImage("Background", bg_model->background);
cvShowImage("Foreground Mask", bg_model->foreground);
int k = cvWaitKey(5);
if( k == 27 ) break;
printf("frame# %d \r", fr);

}

//release BG model
// myReleaseBGStatModel( &bg_model );
bg_model->release( &bg_model );

//release capture
cvReleaseCapture(&cap);
return 0;

}

Download:

The whole archive with training videos & sample projects (MS VisualStudio .NET2003 & .NET2005) can be downloaded here: videos & src code. Note that the code requires OpenCV beta5. Please copy the source code into the directory that contains the OpenCV directory. For instance, if “C:\Program files\OpenCV” is the OpenCV directory, then the source code should be in “C:\Program files\VSSN06-src-MSVC2003”.

The MS VisualStudio .NET 2003 solution file VSSN05.sln is divided into 3 projects:

  1. BG_FG_Template,

  2. BG_FG_Example and

  3. BG_FG_Evaluation. (For more information regarding the evaluation procedure and code see subsection Evaluation.)

Project BG_FG_Template:

The first step is to define the structure holding all your state information (MyBGStatModel). To do so, you may extend the existing OpenCV structure CV_BG_STAT_MODEL_FIELDS() by as many variables as you need. CV_BG_STAT_MODEL_FIELDS() contains two function pointers for updating and releasing your state information as well as IplImages for the current foreground mask image and the current estimated background. Furthermore, you may specify a number of layers.
Second, you need to implement the functions myReleaseBGStatModel(), myUpdateBGStatModel(), and myCreateFGDStatModel().

Project BG_FG_Example: An example showing how to use the above API and template project to create a real foreground/background segmentation algorithm. The implemented algorithm was presented by L. Li et al. in [1].

In the sample project the following include paths are set: ..\..\opencv\cxcore\include;..\..\opencv\cv\include;..\..\opencv\cvaux\include;..\..\opencv\otherlibs\highgui;..\..\opencv\cv\src
The following library path is needed: ..\..\opencv\lib
The following libraries are needed: cv.lib cxcore.lib highgui.lib cvaux.lib


Test Videos:

Each test video will consist of

  1. a video consisting of some (maybe dynamic) background and one or several foreground objects and

  2. a foreground mask video (ground truth video) specifying each pixel belonging to a foreground object (pixel values above 128; same pixel values belong to the same object, while different values belong to different objects).
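The ground-truth encoding described in item 2 can be decoded as in the following sketch. It operates on one mask frame stored as a plain byte buffer; the function names are hypothetical and the code is not taken from the evaluation project.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Collect the distinct object labels in a ground-truth frame.
// Per the description, only pixel values above 128 denote foreground,
// and each distinct value denotes a different object.
std::set<std::uint8_t> objectLabels(const std::vector<std::uint8_t>& gtMask)
{
    std::set<std::uint8_t> labels;
    for (std::uint8_t v : gtMask)
        if (v > 128) labels.insert(v);
    return labels;
}

// Extract a binary 0/255 mask for a single object label.
std::vector<std::uint8_t> objectMask(const std::vector<std::uint8_t>& gtMask,
                                     std::uint8_t label)
{
    assert(label > 128);
    std::vector<std::uint8_t> out(gtMask.size(), 0);
    for (std::size_t i = 0; i < gtMask.size(); ++i)
        if (gtMask[i] == label) out[i] = 255;
    return out;
}
```

For a frame containing the values 0, 200 and 255, objectLabels() reports two objects, one labeled 200 and one labeled 255.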

Each video will be a color video of size 320x240 or 384x240 at 25 fps. The foreground objects are taken from [2], [3]. Four different training videos will be provided. Each video has a startup period in which the system can already learn the background. During this period (10 seconds unless stated otherwise) the performance will not be evaluated. For each test category, each covering a different problem in background/foreground estimation, one or more training videos are provided:

Evaluation:

Given

  1. an input video consisting of some (maybe dynamic) background and one or several foreground objects and

  2. a foreground mask video (ground truth video) specifying each pixel belonging to a foreground object (pixel values above 128; same pixel values belong to the same object, while different values belong to different objects), and

  3. a startup period during which the performance will not be evaluated, but the system can learn the foreground and background already

the minimal, average, and maximal counts of false-alarm pixels and missed foreground pixels per video frame will be calculated. To allow for small boundary errors, errors within 2 pixels of the boundary between foreground and background will not be counted. As mentioned, the performance evaluation will start after the initial startup period, which is currently assumed to be 10 seconds.
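The per-frame error counting can be sketched as follows. This is not the code from BG_FG_Evaluation, only an illustration under stated assumptions: the algorithm's mask is binarized at the same "above 128" rule as the ground truth, and "within 2 pixels of the boundary" is interpreted as a square window of radius 2 around each pixel in which the ground-truth label changes. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct FrameErrors { int falseAlarms; int missed; };
struct Summary { int minimum; double average; int maximum; };

// True if the ground-truth label changes inside the square window of the
// given radius around (x, y). Assumed interpretation of the boundary rule.
static bool nearBoundary(const std::vector<std::uint8_t>& gt,
                         int w, int h, int x, int y, int radius)
{
    const bool fg = gt[y * w + x] > 128;
    for (int dy = -radius; dy <= radius; ++dy)
        for (int dx = -radius; dx <= radius; ++dx) {
            const int nx = x + dx, ny = y + dy;
            if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
            if ((gt[ny * w + nx] > 128) != fg) return true;
        }
    return false;
}

// Count false alarms (result says foreground, truth says background) and
// missed pixels (the reverse) for one frame, skipping the boundary band.
FrameErrors countErrors(const std::vector<std::uint8_t>& gt,
                        const std::vector<std::uint8_t>& result,
                        int w, int h, int radius = 2)
{
    assert(gt.size() == static_cast<std::size_t>(w) * h);
    assert(result.size() == gt.size());
    FrameErrors e = {0, 0};
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            const bool gtFg = gt[y * w + x] > 128;
            const bool resFg = result[y * w + x] > 128;  // assumed threshold
            if (gtFg == resFg) continue;
            if (nearBoundary(gt, w, h, x, y, radius)) continue;
            if (resFg) ++e.falseAlarms; else ++e.missed;
        }
    return e;
}

// Minimum, average and maximum of per-frame counts over a video.
Summary summarize(const std::vector<int>& counts)
{
    assert(!counts.empty());
    Summary s = {counts[0], 0.0, counts[0]};
    long sum = 0;
    for (int c : counts) {
        if (c < s.minimum) s.minimum = c;
        if (c > s.maximum) s.maximum = c;
        sum += c;
    }
    s.average = static_cast<double>(sum) / counts.size();
    return s;
}
```

In use, countErrors() would be called for every frame after the startup period, and summarize() once for the false-alarm counts and once for the missed-pixel counts.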

The code for the performance evaluation can be found under “vssn06-src-MSVC2005\BG_FG_Evaluation\” in the code archive (~120 MB consisting of source code, latest OpenCV source code, and videos).

Input:
Figure 1: Source video frame
Figure 2: Ground truth foreground mask

Algorithm Output:
Figure 3: Currently Estimated Background
Figure 4: Currently Estimated Foreground Mask

Performance Output:
Figure 5: False Alarms
Figure 6: Missed Foreground Pixels

Contact:

Eva Hörster: Eva.Hoerster@informatik.uni-augsburg.de
Rainer Lienhart: Rainer.Lienhart@informatik.uni-augsburg.de

References:

[1] L. Li, W. Huang, I.Y.H. Gu, Q. Tian "Foreground object detection from videos containing complex background", ACM Multimedia, 2003

[2] http://www.mpi-sb.mpg.de/departments/irg3/software.html

[3] http://www.gifart.de/