The challenge of object categorization in images is largely due to arbitrary
translations and scales of the foreground objects. To address this difficulty,
we propose a new approach called collaborative receptive field learning, which
extracts specific receptive fields (RFs), or regions, from multiple images such
that the selected RFs focus on the foreground objects of a common
category. We formulate this selection as maximizing a submodular function
over a similarity graph constructed from a pool of RF candidates. However,
measuring the pairwise distance between RFs to build the similarity graph is a
nontrivial problem. Hence, we introduce a similarity metric called
pyramid-error distance (PED), which measures pairwise distances by
summing pyramid-like matching errors over a set of low-level features.
In addition, consistent with the proposed PED, we construct a simple
nonparametric classifier. Experimental results show that our
method effectively discovers the foreground objects in images, and improves
classification performance.
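
As a rough illustration of selecting RFs by submodular maximization over a similarity graph, the sketch below runs the standard greedy algorithm on a small similarity matrix. The abstract does not specify the submodular function, so the facility-location objective used here is an assumption, and the names `facility_location`, `greedy_select`, and the toy matrix `S` are hypothetical and not taken from the paper.

```python
def facility_location(similarity, selected):
    """Facility-location value: for every candidate (row), take its best
    similarity to any selected candidate, then sum over all rows.
    This objective is an assumed stand-in, not the paper's function."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in similarity)


def greedy_select(similarity, k):
    """Greedily pick up to k candidates, each time adding the one with the
    largest marginal gain in the facility-location value."""
    selected = []
    remaining = set(range(len(similarity)))
    for _ in range(min(k, len(similarity))):
        current = facility_location(similarity, selected)
        # Iterate in sorted order so ties are broken by the lowest index.
        best = max(
            sorted(remaining),
            key=lambda j: facility_location(similarity, selected + [j]) - current,
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy symmetric similarity matrix over five hypothetical RF candidates:
# candidates 0-2 are mutually similar, candidates 3-4 are outliers.
S = [
    [1.0, 0.9, 0.8, 0.1, 0.0],
    [0.9, 1.0, 0.85, 0.1, 0.0],
    [0.8, 0.85, 1.0, 0.2, 0.1],
    [0.1, 0.1, 0.2, 1.0, 0.3],
    [0.0, 0.0, 0.1, 0.3, 1.0],
]

picks = greedy_select(S, 2)
print(picks)
```

On this toy matrix the first pick is candidate 2, the one most similar to all the others overall. In a real pipeline the similarities would be derived from a distance such as PED, and the objective would need to favor RFs that cover a common foreground category rather than simply covering all candidates.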