Call for Algorithm Competition in Foreground/Background Segmentation
Segmentation of foreground objects, especially moving objects, in image sequences is a core component of many computer vision systems, including automated visual surveillance. Commonly, a foreground/background segmentation algorithm provides at every time instance (possibly after some initial training) an estimate of the background image as well as a foreground probability mask, in which each pixel carries a probability score of belonging to a foreground object. A wide variety of algorithms for foreground/background segmentation has been proposed. However, it is still difficult to compare them since (a) implementations are not available in source code and (b) the algorithms have been tested on different datasets and under widely varying conditions. Results reported in the literature do not provide a direct comparison among algorithms because each researcher reports results using different assumptions, evaluation methods and test sequences.
This algorithm competition is unique in that all participating algorithms
must be submitted in source code complying with a very minimal, but also very general C/C++ API (based on OpenCV),
are applied and evaluated on the same public data sets
using a performance evaluation available in C/C++ source code.
This enables the evaluation of differences between the various foreground/background segmentation algorithms. The independently administered test therefore provides a direct quantitative assessment of the relative strengths and weaknesses of the different approaches. As the test sets and performance metrics will likely change over time based on the suggestions submitted and the discussions at the workshop, having the algorithms available in source code will make it possible to re-run the tests. (NOTE: Researchers will have the opportunity to include their source code in the next OpenCV release.)
To obtain a robust assessment of performance, algorithms are evaluated against different categories of test sequences. The test datasets will include the following problems:
vacillating background
gradual illumination changes
sudden changes in illumination
bootstrapping (i.e., a training period of absent foreground is not available)
shadows
The aims of the competition are manifold. It will bring together researchers interested in background/foreground segmentation to discuss which criteria and test cases should be used for objective evaluation. Moreover, through this competition the community learns in an open manner which important technical problems need to be addressed and how it is progressing toward solving them.
Submission details
Important dates
API
Download
Test videos
Evaluation
Contact
References
Submission of algorithms must be in C/C++ source code against a predefined Application Programming Interface (API). This API is very simple (only 3 functions and one data structure), but also very flexible and generic. Each submitted algorithm must be accompanied by a 4-page paper describing the algorithm in detail. All working submissions are given the opportunity to present their algorithm as a poster; the best-performing algorithms will get the chance to be presented orally. Researchers will have the opportunity to include their source code in the next OpenCV release.
Formatting: Please prepare your paper using the ACM template for the conference. Papers must be submitted in the Portable Document Format (PDF), formatted in two-column conference style.
Page limits: not more than 4 pages
On-line submission instructions: Please submit the API function implementations together with your .NET solution file (Windows) or Linux makefile. We will first try to use only your .{c,cpp,h} files, but in case of problems it helps if we can consult your solution file or makefile. We do not need any executables. Also, please attach your 4-page paper in PostScript or PDF to the email.
Please send your submission by email to Eva.Hoerster@informatik.uni-augsburg.de and Rainer.Lienhart@informatik.uni-augsburg.de.
19 Aug. 2006: Submission deadline for algorithms and papers
04 Sept. 2006: Notification of acceptance
The Application Programming Interface consists of only three functions (named myCreateFGDStatModel(), myUpdateBGStatModel(), and myReleaseBGStatModel() in the following) and one data structure (named MyBGStatModel in the following).
Every algorithm must use a superset of the provided data structure CV_BG_STAT_MODEL_FIELDS() to store all its necessary state information:
//#define CV_BG_STAT_MODEL_FIELDS() \
//    int type;                     /* type of BG model */ \
//    CvReleaseBGStatModel release; /* release function */ \
//    CvUpdateBGStatModel update;   /* update bg model */ \
//    IplImage* background;         /* 8UC3 reference background image */ \
//    IplImage* foreground;         /* 8UC1 foreground image */ \
//    IplImage** layers;            /* 8UC3 reference background image, can be null */ \
//    int layer_count;              /* can be zero */ \
//    CvMemStorage* storage;        /* storage for "foreground_regions" */ \
//    CvSeq* foreground_regions     /* foreground object contours */

/* ignore the variables "int type", "CvMemStorage* storage" and "CvSeq* foreground_regions" */

// define your own model, i.e., extend the CV_BG_STAT_MODEL_FIELDS() model
typedef struct MyBGStatModel
{
    CV_BG_STAT_MODEL_FIELDS();
    // ... more fields could be added here ...
} MyBGStatModel;
The unsigned char 3-channel image (8uC3) named background must always contain the current estimate of the background image, while foreground contains a mask image indicating which pixels are currently considered foreground. The foreground mask image is of pixel type unsigned char with one channel (8uC1). A pixel value of 0 indicates background at that position, while a value of 255 indicates a foreground pixel. Values between these two extremes can be interpreted as the probability (prob = value / 255.0f) of being a foreground pixel. The fields update and release must hold the function pointers to the respective functions that each algorithm must implement.
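As a minimal sketch of the probability interpretation of mask values, the helpers below map an 8-bit mask value to a probability and apply a cut-off. The names foregroundProbability and isForeground, as well as the idea of a caller-chosen cut-off, are illustrative assumptions and not part of the competition API.

```cpp
// Hypothetical helper: convert an 8-bit foreground mask value (0..255)
// into a probability in [0, 1], following prob = value / 255.0f.
static float foregroundProbability(unsigned char value)
{
    return value / 255.0f;
}

// Hypothetical helper: decide whether a mask value counts as foreground
// for a caller-chosen minimum probability (the cut-off is an assumption).
static bool isForeground(unsigned char value, float min_prob)
{
    return foregroundProbability(value) >= min_prob;
}
```

An algorithm that only produces hard decisions can simply write 0 or 255, which correspond to probabilities of exactly 0 and 1.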
Optionally, an algorithm can support a layered representation. Layers are sometimes needed to keep track of objects that move into the scene, settle down, and stay there for some time before they start to move again. An algorithm can export this representation by dynamically updating the visual appearance and location of such objects in layers and layer_count. As mentioned, this feature is optional and not used during the performance evaluation this year. Only background and foreground will be used for performance evaluation.
The data structure MyBGStatModel has to be created by calling myCreateFGDStatModel(). An example of this function is given below:
/* Creates FGD model */
// first_frame must be 8uC3 (= 3 channel image (RGB))
CvBGStatModel* myCreateFGDStatModel( IplImage* first_frame )
{
    // create
    MyBGStatModel* myBGStatModel = new MyBGStatModel;

    /* ... fill the struct with your parameters ... */
    // e.g.
    // create images for background and foreground
    myBGStatModel->background = cvCreateImage(cvGetSize(first_frame), IPL_DEPTH_8U, first_frame->nChannels);
    myBGStatModel->foreground = cvCreateImage(cvGetSize(first_frame), IPL_DEPTH_8U, 1);

    // layer images and number of layers used (could be zero)
    myBGStatModel->layer_count = 0;
    myBGStatModel->layers = 0;

    // your algorithm specific update and release functions
    myBGStatModel->update = myUpdateBGStatModel;
    myBGStatModel->release = myReleaseBGStatModel;

    // ... and cast your structure to the smaller generic structure
    return (CvBGStatModel*)myBGStatModel;
}
The function that does all the work is myUpdateBGStatModel(). It has the following prototype:
/* Updates model */
// typedef int (CV_CDECL * CvUpdateBGStatModel)( IplImage* curr_frame, struct CvBGStatModel* bg_model );
int myUpdateBGStatModel( IplImage* curr_frame, struct CvBGStatModel* bg_model )
{
    // Necessary cast to get from the generic (i.e., common) part of the data structure
    // to your algorithm specific fields.
    MyBGStatModel* myBGStatModel = (MyBGStatModel*) bg_model;

    // ... define your algorithm specific update function; do whatever you have to do ...

    // return the number of layers you have found; if your algorithm does not support layers, return 0
    return 0;
}
Finally, the function myReleaseBGStatModel() cleans up everything. Its function prototype is:
void myReleaseBGStatModel( struct CvBGStatModel** bg_model )
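The create/update/release life cycle can be illustrated without OpenCV. The following is a self-contained sketch under stated assumptions: Image, GenericModel, MyModel, GENERIC_MODEL_FIELDS(), myCreate, myUpdate and myRelease are hypothetical stand-ins for IplImage, CvBGStatModel, MyBGStatModel, CV_BG_STAT_MODEL_FIELDS() and the three API functions, and the per-pixel difference rule is a toy placeholder rather than a real segmentation algorithm. Like the API itself, it relies on the algorithm-specific structure starting with the common fields, so that the pointer casts are valid in practice.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for IplImage: a single-channel 8-bit buffer (hypothetical type).
struct Image {
    int width, height;
    std::vector<unsigned char> data;
    Image(int w, int h) : width(w), height(h), data((std::size_t)w * h, 0) {}
};

struct GenericModel;
typedef void (*ReleaseFn)(GenericModel** model);
typedef int  (*UpdateFn)(const Image* frame, GenericModel* model);

// Common fields shared by all algorithms (plays the role of CV_BG_STAT_MODEL_FIELDS()).
#define GENERIC_MODEL_FIELDS() \
    ReleaseFn release;         \
    UpdateFn  update;          \
    Image*    background;      \
    Image*    foreground

struct GenericModel { GENERIC_MODEL_FIELDS(); };

// Algorithm-specific model: the common fields come first, so a pointer to it
// can be handed out as a GenericModel* and cast back inside the callbacks.
struct MyModel {
    GENERIC_MODEL_FIELDS();
    int frames_seen;   // example of an extra, algorithm-specific field
};

static int myUpdate(const Image* frame, GenericModel* model)
{
    MyModel* m = (MyModel*)model;
    // Toy rule (assumption, not a real algorithm): a pixel is foreground when
    // it differs from the stored background by more than 30 gray levels.
    for (std::size_t i = 0; i < frame->data.size(); ++i) {
        int diff = (int)frame->data[i] - (int)m->background->data[i];
        if (diff < 0) diff = -diff;
        m->foreground->data[i] = diff > 30 ? 255 : 0;
    }
    ++m->frames_seen;
    return 0;   // number of layers; 0 because layers are not supported
}

static void myRelease(GenericModel** model)
{
    if (!model || !*model) return;
    MyModel* m = (MyModel*)*model;
    delete m->background;
    delete m->foreground;
    delete m;
    *model = NULL;   // do not leave the caller with a dangling pointer
}

static GenericModel* myCreate(const Image* first_frame)
{
    MyModel* m = new MyModel;
    m->background = new Image(*first_frame);   // first frame as initial background
    m->foreground = new Image(first_frame->width, first_frame->height);
    m->frames_seen = 0;
    m->update  = myUpdate;
    m->release = myRelease;
    return (GenericModel*)m;
}
```

A caller only ever sees the generic pointer and invokes model->update(frame, model) and model->release(&model), which is what allows a single test harness to drive every submitted algorithm.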
With all this defined, running a given foreground segmentation algorithm is extremely simple:
int main(int argc, char** argv)
{
    IplImage* tmp_frame = NULL;
    CvCapture* cap = NULL;

    // the video file name is expected as the first argument
    if( argc < 2 ) { printf("usage: %s <video file>\n", argv[0]); exit(1); }

    // capture video from file
    cap = cvCaptureFromFile(argv[1]);
    tmp_frame = cvQueryFrame(cap);
    if(!tmp_frame) { printf("bad video \n"); exit(1); }

    // create windows to show background and foreground images
    cvNamedWindow("Background", 1);
    cvNamedWindow("Foreground Mask", 1);

    // create BG model
    CvBGStatModel* bg_model = myCreateFGDStatModel( tmp_frame );

    // for all frames in the video
    for( int fr = 1; tmp_frame; tmp_frame = cvQueryFrame(cap), fr++ ) {
        // update BG model
        // myUpdateBGStatModel( tmp_frame, bg_model );
        bg_model->update( tmp_frame, bg_model );

        // show current estimation
        cvShowImage("Background", bg_model->background);
        cvShowImage("Foreground Mask", bg_model->foreground);

        // stop when ESC is pressed
        int k = cvWaitKey(5);
        if( k == 27 ) break;

        printf("frame# %d \r", fr);
    }

    // release BG model
    // myReleaseBGStatModel( &bg_model );
    bg_model->release( &bg_model );

    // release capture
    cvReleaseCapture(&cap);

    return 0;
}
The whole archive with training videos & sample projects (MS VisualStudio .NET2003 & .NET2005) can be downloaded here: videos & src code. Note that the code requires OpenCV beta5. Please copy the source code into the directory that contains the OpenCV directory. For instance, if “C:\Program files\OpenCV” is the OpenCV directory, then the source code should be in “C:\Program files\VSSN06-src-MSVC2003”.
The MS VisualStudio .NET 2003 solution file VSSN05.sln is divided into 3 projects:
BG_FG_Template,
BG_FG_Example and
BG_FG_Evaluation. (For more information regarding the evaluation procedure and code see subsection Evaluation.)
Project BG_FG_Template:
The first step is to define the structure holding all your state information (MyBGStatModel). To do so, you may extend the existing OpenCV structure CV_BG_STAT_MODEL_FIELDS() by as many variables as you need. CV_BG_STAT_MODEL_FIELDS() contains two function pointers for updating and releasing your state information as well as IplImages for the current foreground mask image and the current estimated background. Furthermore, you may specify a number of layers.
Second, you need to implement the functions myReleaseBGStatModel(), myUpdateBGStatModel(), and myCreateFGDStatModel().
Project BG_FG_Example: An example showing how to use the above API and template project to create a real foreground/background segmentation algorithm. The implemented algorithm was presented by L. Li in [1].
In the sample project the following include paths are set: ..\..\opencv\cxcore\include;..\..\opencv\cv\include;..\..\opencv\cvaux\include;..\..\opencv\otherlibs\highgui;..\..\opencv\cv\src
The following library path is needed: ..\..\opencv\lib
The following libraries are needed: cv.lib cxcore.lib highgui.lib cvaux.lib
Each test video will consist of
a video consisting of some (maybe dynamic) background and one or several foreground objects and
a foreground mask video (ground truth video) specifying each pixel belonging to a foreground object (pixel values above 128; same pixel values belong to the same object, while different values belong to different objects).
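The ground-truth encoding can be read back with a few lines of code. The sketch below collects the distinct object labels in one ground-truth frame. It assumes the frame is available as a flat buffer of 8-bit values and that "above 128" means strictly greater than 128; the helper name objectLabels is hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper: return the set of distinct object labels in a
// ground-truth frame. Values of 128 or below are treated as background.
static std::set<unsigned char> objectLabels(const std::vector<unsigned char>& gt)
{
    std::set<unsigned char> labels;
    for (std::size_t i = 0; i < gt.size(); ++i)
        if (gt[i] > 128)
            labels.insert(gt[i]);
    return labels;
}
```

The size of the returned set is the number of foreground objects visible in that frame.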
Each video will be a color video of size 320x240 or 384x240 at 25 fps. The foreground objects are taken from [2], [3]. Four different training videos will be provided. Each video has a startup period during which the system can already learn the background. During this period (10 seconds unless noted otherwise) the performance will not be evaluated. For each test category, each of which covers different problems in background/foreground estimation, one or more training videos are provided:
vacillating background, gradual illumination changes, shadows:
input video 9 (no ground truth provided)
sudden changes in illumination
input video 3 and ground truth video 3 (here the training period is 25 seconds)
input video (tunnel video, no ground truth provided)
bootstrapping (i.e., a training period of absent foreground is not available) (input video 4 and ground truth video 4)
two or more cameras with many people. The fact that the two views overlap can be exploited, but this is not required.
Given
an input video consisting of some (maybe dynamic) background and one or several foreground objects and
a foreground mask video (ground truth video) specifying each pixel belonging to a foreground object (pixel values above 128; same pixel values belong to the same object, while different values belong to different objects), and
a startup period during which the performance will not be evaluated, but the system can learn the foreground and background already
the minimal, average, and maximal counts of false-alarm pixels and missed foreground pixels per video frame will be calculated. To allow for small boundary errors, errors within 2 pixels of the boundary between foreground and background will not be counted. As mentioned, the performance evaluation starts after the initial startup period, which is currently assumed to be 10 seconds.
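The measure described above can be sketched as follows. This is not the code from BG_FG_Evaluation but an illustration under stated assumptions: the estimated mask is binarized at a threshold of 128 (the text does not specify one), a pixel counts as "within 2 pixels of the boundary" when any pixel in its 5x5 neighbourhood has a different ground-truth label, and the names FrameErrors, nearBoundary, countErrors, Summary and summarize are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct FrameErrors { int false_alarms; int misses; };

// True if some pixel within `tol` pixels (square neighbourhood) of (x, y) has
// a different ground-truth label, i.e. (x, y) lies near the object boundary.
static bool nearBoundary(const std::vector<unsigned char>& gt, int w, int h,
                         int x, int y, int tol)
{
    const bool fg = gt[(std::size_t)y * w + x] > 128;
    for (int dy = -tol; dy <= tol; ++dy)
        for (int dx = -tol; dx <= tol; ++dx) {
            const int nx = x + dx, ny = y + dy;
            if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
            if ((gt[(std::size_t)ny * w + nx] > 128) != fg) return true;
        }
    return false;
}

// Count false alarms and missed foreground pixels for one frame,
// skipping pixels near the ground-truth boundary.
static FrameErrors countErrors(const std::vector<unsigned char>& mask,
                               const std::vector<unsigned char>& gt,
                               int w, int h, int tol = 2)
{
    FrameErrors e = { 0, 0 };
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            if (nearBoundary(gt, w, h, x, y, tol)) continue;
            const bool est_fg = mask[(std::size_t)y * w + x] > 128;  // assumed threshold
            const bool gt_fg  = gt[(std::size_t)y * w + x] > 128;
            if (est_fg && !gt_fg) ++e.false_alarms;
            if (!est_fg && gt_fg) ++e.misses;
        }
    return e;
}

struct Summary { int min; double avg; int max; };

// Minimal, average and maximal count over all evaluated frames.
static Summary summarize(const std::vector<int>& counts)
{
    Summary s = { 0, 0.0, 0 };
    if (counts.empty()) return s;
    s.min = *std::min_element(counts.begin(), counts.end());
    s.max = *std::max_element(counts.begin(), counts.end());
    long sum = 0;
    for (std::size_t i = 0; i < counts.size(); ++i) sum += counts[i];
    s.avg = (double)sum / counts.size();
    return s;
}
```

Calling countErrors() once per frame after the startup period and passing the collected false-alarm and miss counts to summarize() yields the six reported numbers.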
The code for the performance evaluation can be found under “vssn06-src-MSVC2005\BG_FG_Evaluation\” in the code archive (~120 MB consisting of source code, latest OpenCV source code, and videos).
[Table: Input | Algorithm | Performance]
Eva Hörster: Eva.Hoerster@informatik.uni-augsburg.de
Rainer Lienhart: Rainer.Lienhart@informatik.uni-augsburg.de
[1] L. Li, W. Huang, I.Y.H. Gu, Q. Tian, "Foreground object detection from videos containing complex background", ACM Multimedia, 2003.
[2] http://www.mpi-sb.mpg.de/departments/irg3/software.html