Learning by Asking Questions for Knowledge-Based Novel Object Recognition

International Journal of Computer Vision (IJCV)


Kohei Uehara¹, Tatsuya Harada¹ ²

¹The University of Tokyo, ²RIKEN


In real-world object recognition, there are numerous object classes to be recognized. Traditional image recognition methods based on supervised learning can only recognize object classes present in the training data, which limits their applicability in the real world. In contrast, humans can recognize novel objects by asking questions and acquiring knowledge about them. Inspired by this, we propose a framework for acquiring external knowledge by generating questions that enable the model to instantly recognize novel objects. Our framework comprises three components: the object classifier (OC), which performs knowledge-based object recognition; the question generator (QG), which generates knowledge-aware questions to acquire novel knowledge; and the policy decision (PD) model, which determines the “policy” of questions to be asked. The PD model utilizes two strategies, namely “confirmation” and “exploration”: the former confirms candidate knowledge, while the latter explores completely new knowledge. Our experiments demonstrate that the proposed pipeline acquires knowledge about novel objects more effectively than several baselines and realizes novel object recognition utilizing the obtained knowledge. We also performed a real-world evaluation in which humans responded to the generated questions and the model used the acquired knowledge to retrain the OC, which is a fundamental step toward a real-world human-in-the-loop learning-by-asking framework.


Dataset


The K-VQG v2 dataset (kvqg_dataset_v2.zip) can be downloaded from the link below.
After unzipping the downloaded file, you will get two files: kvqgv2_train.json and kvqgv2_val.json.
The training data contains 16,110 questions and the validation data contains 6,102 questions.
Image files can be downloaded from the Visual Genome website.
Note that the image IDs and object IDs match those used in Visual Genome.
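
As a rough starting point, the unzip-and-inspect step described above might be sketched as follows. The archive and JSON file names come from the description above. Everything else is an assumption made for illustration: the output directory `kvqg_v2`, the helper names `extract_archive` and `summarize`, and the idea that each JSON file is a top-level list or object. The record schema is not documented here, so the sketch reads field names from the file rather than assuming them.

```python
import json
import zipfile
from pathlib import Path

# Archive name as given in the dataset description.
ARCHIVE = Path("kvqg_dataset_v2.zip")


def extract_archive(archive: Path, out_dir: Path) -> None:
    """Unzip the downloaded archive into out_dir (created if missing)."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(out_dir)


def summarize(path: Path) -> dict:
    """Describe the top-level JSON structure without assuming a schema."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    size = len(data) if isinstance(data, (list, dict)) else None
    summary = {"type": type(data).__name__, "size": size}
    if isinstance(data, list) and data and isinstance(data[0], dict):
        # Field names are read from the first record, not assumed.
        summary["fields"] = sorted(data[0].keys())
    elif isinstance(data, dict):
        summary["fields"] = sorted(data.keys())
    return summary


if __name__ == "__main__":
    out_dir = Path("kvqg_v2")  # hypothetical output directory
    if ARCHIVE.exists():
        extract_archive(ARCHIVE, out_dir)
        # Search recursively in case the archive contains a subfolder.
        for path in sorted(out_dir.rglob("kvqgv2_*.json")):
            print(path.name, summarize(path))
```

If each file turns out to be a list with one record per question, the reported sizes should match the 16,110 training and 6,102 validation questions stated above. If not, the printed field names indicate where the question records are stored.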


Here is an example of the data contained in the dataset.


