Friday, January 30, 2015

Object localization using color histograms

This post shows how to use histogram backprojection to find the location of a known object within an image. Histogram backprojection for object localization was first proposed in the paper 'Color Indexing' by Swain and Ballard [1]. Following the terminology in the paper, the known object is referred to as the 'model' and the image within which we are searching for the model is referred to as the 'image' below. I'm using OpenCV and C++ to demonstrate the techniques presented in the paper.

Histogram backprojection tells us where in an image the colors of a given model histogram occur. For localization, we backproject what is called the ratio histogram onto the image. The ratio histogram is defined as

R[i] = min(M[i]/I[i], 1) for all bins i
where M, I and R are the Model, Image and Ratio histograms respectively.

This backprojected image is then convolved with a circular mask having the same area as our model. The peak of the convolution should give us the location of the model within the image, provided the model actually appears in the image.


The following program shows how to apply the above technique to locate a model within an image. It builds 3D histograms over the BGR channels (OpenCV's default channel order) with 32 bins per channel. The model here was extracted from a different image than the test image, and there were no significant lighting changes between the two.

#include <opencv2/opencv.hpp>

using namespace cv;

int main(int argc, char* argv[])   
{ 
   Mat model = imread("model.jpg");
   Mat image = imread("image.jpg");
   if (model.empty() || image.empty())
      return -1;

   int bins = 32;
   double d, max;
   Point maxPt;

   MatND histModel, histImage, histRatio;
   Mat model32fc3, image32fc3, backprj, kernel, conv;

   Mat temp, color;

   const int channels[] = {0, 1, 2};
   const int histSize[] = {bins, bins, bins};
   const float rgbRange[] = {0, 256};
   const float* ranges[] = {rgbRange, rgbRange, rgbRange};

   // model histogram
   model.convertTo(model32fc3, CV_32FC3);
   calcHist(&model32fc3, 1, channels, Mat(), histModel, 3, histSize, ranges, true, false);
   // image histogram
   image.convertTo(image32fc3, CV_32FC3);
   calcHist(&image32fc3, 1, channels, Mat(), histImage, 3, histSize, ranges, true, false);
   // ratio histogram: R = min(M/I, 1); divide() yields 0 where I is 0
   divide(histModel, histImage, histRatio, 1.0, CV_32F);
   cv::min(histRatio, 1.0, histRatio);
   // backproject ratio histogram onto the image
   calcBackProject(&image32fc3, 1, channels, histRatio, backprj, ranges);
   // obtain a circular kernel having the same area as the model
   d = sqrt(4*model.rows*model.cols/CV_PI);
   kernel = getStructuringElement(MORPH_ELLIPSE, Size((int)d, (int)d));
   // convolve the kernel with the backprojected image
   filter2D(backprj, conv, CV_32F, kernel);
   // find the peak
   minMaxLoc(conv, NULL, &max, NULL, &maxPt);

   // display with color map
   // ratio histogram backprojected image
   backprj.convertTo(temp, CV_8U, 255);
   applyColorMap(temp, color, COLORMAP_JET);
   imshow("ratio histogram backprojection onto the image", color);
   // convolution result
   conv.convertTo(temp, CV_8U, 255/max);
   applyColorMap(temp, color, COLORMAP_JET);
   imshow("convolution with circular mask", color);
   // location of the peak
   image.copyTo(color);
   circle(color, maxPt, (int)(d/2), Scalar(0, 0, 255), 3);
   imshow("location of the peak", color);
   waitKey();

   return 0;   
}  

It's worth mentioning a few points here:
* The significance of backprojecting the ratio histogram rather than the normalized model histogram is that, if a certain model color is distributed over large areas of the image (i.e. M[i] << I[i] for some bin i), those areas get a low value in the backprojected image and are hence less likely to distract the localization mechanism.
* The convolution kernel needn't be a circular mask; a circle is used here to handle the general case where the orientation of the model within the image is unknown.
* The case where I[i] = 0 for some bin i: since that color is absent from the image, the corresponding R[i] is never backprojected onto the image (and OpenCV's divide() returns 0 for these bins anyway).

* Color histograms are very sensitive to illumination. If the lighting conditions differ between the model and the image, this technique will not perform well; in that case, both should first be preprocessed with a color constancy method.
* The technique is robust to viewpoint changes and partial occlusion.

Below are the inputs and results of the program:

Model:
Location of the model within the image:
Ratio histogram backprojection onto the image:
Result of convolution with circular mask:

References
[1] M. J. Swain and D. H. Ballard, "Color indexing," International Journal of Computer Vision, vol. 7, no. 1, pp. 11–32, Nov. 1991.