Fusing Scene Context to Improve Object Recognition



convolutional neural networks, neural networks, object recognition


Computer vision is the science that aims to give computers the capability of seeing the world around them. Among its tasks, object recognition aims to classify objects and to locate each one in a given image. As objects tend to occur in particular environments, their contextual association can be used to improve object recognition. To bring contextual awareness to object recognition, our approach fuses scene context information with object detection features in order to improve recognition quality. We propose a novel architecture composed of two convolutional neural networks based on two well-known pre-trained networks: Places365-CNN and Faster R-CNN. Our two-stream architecture concatenates object features with scene context features in a late fusion approach. We perform experiments on the PASCAL VOC 2007 and MS COCO public datasets, analyzing performance at different intersection over union thresholds. Results show that our approach raises the scores of in-context objects and reduces the scores of out-of-context objects.
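The two technical ideas named in the abstract, late fusion by concatenation and intersection over union, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the box format (x_min, y_min, x_max, y_max), and the toy feature sizes are assumptions made for illustration, and the abstract does not specify the layers that consume the fused vector.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def late_fusion_concat(object_features: Sequence[float],
                       scene_features: Sequence[float]) -> List[float]:
    """Concatenate one detected object's feature vector with the
    image-level scene context vector (hypothetical helper)."""
    return list(object_features) + list(scene_features)


def fuse_all(per_object_features: Sequence[Sequence[float]],
             scene_features: Sequence[float]) -> List[List[float]]:
    """Attach the same scene vector to every object in the image."""
    return [late_fusion_concat(f, scene_features) for f in per_object_features]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Toy vectors; real object and scene features would be far larger.
    objects = [[0.1, 0.9], [0.7, 0.3]]
    scene = [0.5, 0.2, 0.3]
    for fused in fuse_all(objects, scene):
        print(len(fused), fused)
    # A detection is typically counted as correct when its IoU with a
    # ground-truth box meets the chosen threshold.
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

In this sketch every object in an image shares the same scene vector, so a classifier applied to the fused vector can in principle weigh each object against its surroundings.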

