Kohei Uehara's Website
About me
I am Kohei Uehara (上原 康平), an Assistant Professor at the Machine Intelligence Lab, The University of Tokyo. I also work as a part-time researcher at the Accessibility Lab, Miraikan (The National Museum of Emerging Science and Innovation). My research interests focus on machine learning across vision and language, Large Language Models (LLMs), accessibility, and Human-Computer Interaction (HCI).
Current Positions
- Assistant Professor, Machine Intelligence Lab., Research Center for Advanced Science and Technology (RCAST), The University of Tokyo
- Part-Time Researcher, Accessibility Lab., Miraikan (The National Museum of Emerging Science and Innovation)
- Visiting Researcher, Machine Intelligence for Medical Engineering Team, RIKEN
Education
- April 2020 - March 2023 : Ph.D. student, Information Science and Technology, The University of Tokyo. (Advisor: Prof. Tatsuya Harada)
- April 2018 - March 2020 : Master’s student, Information Science and Technology, The University of Tokyo. (Advisor: Prof. Tatsuya Harada)
- April 2014 - March 2018 : Undergraduate student, Mechano-Informatics, The University of Tokyo. (Advisor: Prof. Tatsuya Harada)
Projects
Asagi - Japanese Vision & Language Model
Asagi is a Japanese Vision & Language model.
Its architecture follows LLaVA and consists of a vision encoder, a language decoder, and a 2-layer MLP that projects visual features into the language feature space (a minimal sketch of this design is shown below).
We use Japanese LLMs as the language decoder, and the vision encoder is based on the SigLIP model.
To train the model, we synthesized a large-scale Japanese Vision & Language dataset consisting of approximately 20 million image-text pairs.
The model is publicly available on the Hugging Face Model Hub.
Please check the project page for more details.
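The sketch below illustrates, in PyTorch, how a LLaVA-style 2-layer MLP connector can bridge a vision encoder and an LLM decoder. It is a minimal, hypothetical example: the module name VisionLanguageConnector and the dimensions (e.g., vision_dim=1152 for a SigLIP-like encoder, llm_dim=4096 for the LLM) are illustrative assumptions, not the actual Asagi implementation.

```python
# Hypothetical sketch of a LLaVA-style vision-language connector.
# Names and dimensions are illustrative; this is not the Asagi source code.
import torch
import torch.nn as nn


class VisionLanguageConnector(nn.Module):
    """2-layer MLP that projects vision features into the LLM embedding space."""

    def __init__(self, vision_dim: int = 1152, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, vision_features: torch.Tensor) -> torch.Tensor:
        # vision_features: (batch, num_patches, vision_dim)
        return self.proj(vision_features)


if __name__ == "__main__":
    # Stand-ins for SigLIP-style patch features and LLM token embeddings.
    batch, num_patches, vision_dim, llm_dim = 2, 256, 1152, 4096
    vision_features = torch.randn(batch, num_patches, vision_dim)
    text_embeddings = torch.randn(batch, 32, llm_dim)

    connector = VisionLanguageConnector(vision_dim, llm_dim)
    visual_tokens = connector(vision_features)

    # The projected visual tokens are concatenated with the text embeddings
    # and fed to the language decoder (the LLM itself is not shown here).
    llm_input = torch.cat([visual_tokens, text_embeddings], dim=1)
    print(llm_input.shape)  # torch.Size([2, 288, 4096])
```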

Publications
Journal and International Conference
- NEW Kohtaro Tanaka, Kohei Uehara, Lin Gu, Yusuke Mukuta, Tatsuya Harada. Content-Specific Humorous Image Captioning Using Incongruity Resolution Chain-of-Thought. Findings of the Association for Computational Linguistics (NAACL Findings), 2024.
- Kohei Uehara and Tatsuya Harada. Learning by Asking Questions for Knowledge-Based Novel Object Recognition. International Journal of Computer Vision (IJCV), 2024 [Paper] [Project Page]
- Kohei Uehara and Tatsuya Harada. K-VQG: Knowledge-aware Visual Question Generation for Common-sense Acquisition. WACV, 2023. [Paper] [Project Page]
- Kohei Uehara, Nan Duan, Tatsuya Harada. Learning to Ask Informative Sub-Questions for Visual Question Answering. 5th Multimodal Learning and Applications Workshop (CVPR 2022, Workshop), 2022. [Paper]
- Kohei Uehara†, Yusuke Mori† (†equal contribution), Yusuke Mukuta and Tatsuya Harada. ViNTER: Image Narrative Generation with Emotion-Arc-Aware Transformer. The 1st International Workshop on Multimodal Understanding for the Web and Social Media (WWW 2022, Workshop), 2022. [Paper]
- Kohei Uehara, Tatsuya Harada. Unsupervised Keyword Extraction for Full-sentence VQA. First International Workshop on Natural Language Processing Beyond Text with EMNLP 2020 (NLPBT2020), 2020. [Paper]
- Sho Maeoki, Kohei Uehara, Tatsuya Harada. Interactive Video Retrieval with Dialog. CVPR 2020 Workshop on Multimodal Learning, 2020. [Paper]
- Kohei Uehara, Antonio Tejero-de-Pablos, Yoshitaka Ushiku and Tatsuya Harada. Visual Question Generation for Class Acquisition of Unknown Objects. The 15th European Conference on Computer Vision (ECCV2018), 2018. [Paper]
Domestic Conference
- Yusuke Mori†, Kohei Uehara† (†equal contribution), Tatsuya Harada. Narrative Text Generation from Images with a Vision-and-Language Transformer Model. CAI+CAI First Workshop (Workshop at the 27th Annual Meeting of the Association for Natural Language Processing), 2021. [Paper]
Others
- NEW Masaki Kuribayashi, Kohei Uehara, Allan Wang, Daisuke Sato, Simon Chu, Shigeo Morishima. Memory-Maze: Scenario Driven Benchmark and Visual Language Navigation Model for Guiding Blind People. arXiv, 2024. [Paper]
- Kohei Uehara, Nabarun Goswami, Hanqin Wang, Toshiaki Baba, Kohtaro Tanaka, Tomohiro Hashimoto, Kai Wang, Rei Ito, Naoya Takagi, Ryo Umagami, Yingyi Wen, Tanachai Anakewat, Tatsuya Harada. Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation. arXiv, 2024. [Paper]
Competitions
- 5th place in the Visual Question Answering (VQA) Challenge 2018 at CVPR 2018. Mikihiro Tanaka, Atsuhiro Noguchi, Kohei Uehara, Lisa Kawai, Yoshitaka Ushiku, Tatsuya Harada [competition page]
Lectures
- Intelligent Informatics - Graduate School of Information Science and Technology, The University of Tokyo, June 6, 2024
Invited Talks
- Kohei Uehara, Antonio Tejero-de-Pablos, Yoshitaka Ushiku and Tatsuya Harada. Visual Question Generation for Class Acquisition of Unknown Objects. FIT, 2019. [Program]
- Kohei Uehara, Antonio Tejero-de-Pablos, Yoshitaka Ushiku and Tatsuya Harada. Visual Question Generation for Class Acquisition of Unknown Objects. MIRU, 2019. [Program]
- Kohei Uehara, Antonio Tejero-de-Pablos, Yoshitaka Ushiku and Tatsuya Harada. Visual Question Generation for Class Acquisition of Unknown Objects. PRMU, 2019. [Program]
Work Experiences
- April 2023 - Present: The University of Tokyo, Assistant Professor
- April 2021 - July 2021: NVIDIA, Research Internship
- February 2019 - April 2019: LINE Corporation, Machine Learning Engineer (part-time)
- August 2018: Mercari, Inc., Machine Learning Engineer Internship
Grants & Fellowships
- January 2021 - December 2021 : Microsoft Research Asia Collaborative Research for Ph.D. Student 2021 (D-CORE 2021)
- April 2020 - March 2023 : Japan Society for the Promotion of Science (JSPS) Research Fellowship for Young Scientists (DC1)
Professional Activities
- Reviewer: ICCV, CVPR, ECCV, WACV, AAAI, NeurIPS, etc.
Links
Google Scholar Citations
last update: February 24, 2025