Research
Overview
Our lab explores natural language processing (NLP) with a focus on large language models (LLMs) and AI agents. We study fundamental challenges that remain unsolved despite the recent progress of LLMs, including the accuracy and updatability of knowledge, reasoning and planning, behavior control, and the understanding of internal mechanisms. We also investigate AI agents that interact with external environments and tools, reason and plan toward goals, and act autonomously.
Our goal is to develop AI systems that can reason, plan, and act autonomously, grounded in rich knowledge of the real world.
Deep Research Agents
Although recent LLMs demonstrate impressive capabilities, they still struggle with complex problems that require up-to-date information beyond the knowledge encoded in their parameters.
In this project, we study question-answering AI agents, or Deep Research agents, that autonomously search for the information needed to answer complex questions and generate responses by integrating and synthesizing the collected information. By leveraging multiple information sources and performing advanced reasoning and multi-step exploration, we aim to enable reliable question answering for complex real-world problems.
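The search–integrate–synthesize loop described above can be sketched in a few lines. This is a minimal illustration only: `search` and `synthesize` are toy stand-ins (a word-overlap retriever over a tiny corpus and a string join), not our actual retriever or LLM, and `deep_research` is a hypothetical name for the multi-step exploration loop.

```python
def search(query, corpus):
    """Toy retriever: return passages sharing a word with the query."""
    terms = set(query.lower().split())
    return [p for p in corpus if terms & set(p.lower().split())]

def synthesize(question, evidence):
    """Stand-in for an LLM that composes an answer from evidence."""
    return " ".join(evidence) if evidence else "No answer found."

def deep_research(question, corpus, max_steps=3):
    evidence, query = [], question
    for _ in range(max_steps):            # multi-step exploration
        new = [h for h in search(query, corpus) if h not in evidence]
        if not new:                       # nothing new found: stop searching
            break
        evidence.extend(new)
        query = new[0]                    # follow up on the newest finding
    return synthesize(question, evidence) # integrate and synthesize

corpus = [
    "NeurIPS 2025 hosted the MMU-RAG competition.",
    "The MMU-RAG competition evaluated retrieval-augmented systems.",
]
print(deep_research("What was hosted at NeurIPS 2025?", corpus))
```

Even this toy loop exhibits the key behavior: the second passage never matches the original question directly, and is reached only by issuing a follow-up query derived from the first result.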
Achievements in NeurIPS Competitions
NeurIPS has hosted a series of competitions on question-answering systems. We have participated three times, earning two first-place finishes and one runner-up finish.
- 2017: Achieved the best performance at the Human-Computer QA Competition, defeating a human team of six U.S. quiz champions by a large margin.
- 2020: At the EfficientQA Competition, finished as runner-up in the constrained track behind Facebook (now Meta) and third in the unconstrained track behind Microsoft and Facebook.
- 2025: Won the NeurIPS MMU-RAG Competition with a system based on LLMs that performs multi-step autonomous reasoning and search.
Papers
- Yamada et al. An Open and Reproducible Deep Research Agent for Long-Form Question Answering. Preprint. 2026.
- Yamada et al. Efficient passage retrieval with hashing for open-domain question answering. ACL. 2021.
- Min et al. NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned. PMLR. 2021.
- Wallace et al. Trick Me If You Can: Human-in-the-loop Generation of Adversarial Examples for Question Answering. TACL. 2019.
- Yamada et al. Studio Ousia’s Quiz Bowl Question Answering System. The NIPS ’17 Competition: Building Intelligent Systems. 2018.
Knowledgeable AI: Enhancing How LLMs Handle Knowledge
Many real-world problems require specialized or organization-specific knowledge that is available only within a particular community or institution, yet such knowledge is difficult for LLMs to acquire during training. In addition, knowledge must be updated as the world changes. Even when models have learned relevant knowledge, they do not always use it effectively, especially in low-resource languages.
Two representative approaches to incorporating knowledge into LLMs are continual learning and retrieval-augmented generation (RAG). However, continual learning is often costly and can suffer from catastrophic forgetting, while RAG depends heavily on retriever quality and is limited by the model’s context window. To overcome these limitations, we study LLMs that can efficiently acquire, update, and accumulate knowledge.
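The RAG limitation mentioned above can be made concrete with a small sketch. This is purely illustrative: the word-overlap retriever and the fixed character budget (standing in for the model's context window) are simplifying assumptions, not our systems, and `build_prompt` is a hypothetical helper name.

```python
def retrieve(query, documents, k=2):
    """Toy retriever: rank documents by word overlap with the query, keep top k.
    RAG quality is bounded by this step; irrelevant passages propagate downstream."""
    q = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query, passages, max_chars=500):
    """Stuff retrieved passages into the prompt, truncated to a fixed budget
    that mimics the model's limited context window."""
    context = "\n".join(passages)[:max_chars]
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "LUKE uses entity-aware self-attention.",
    "RAG retrieves passages and conditions generation on them.",
    "Catastrophic forgetting can occur during continual learning.",
]
query = "What does RAG do?"
prompt = build_prompt(query, retrieve(query, docs))
print(prompt)
```

The `max_chars` truncation makes the context-window limitation visible: once the retrieved evidence exceeds the budget, later passages are silently dropped regardless of their relevance, which is one of the failure modes motivating models that accumulate knowledge internally.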
Papers
- Yamada et al. Dynamic injection of entity knowledge into dense retrievers. EMNLP Findings. 2025.
- Yamada et al. LEIA: Facilitating cross-lingual knowledge transfer in language models with entity-based data augmentation. ACL Findings. 2024.
- Oba et al. Entity embedding completion for wide-coverage entity disambiguation. EMNLP Findings. 2022.
- Nishikawa et al. EASE: Entity-aware contrastive learning of sentence embedding. NAACL. 2022.
- Ri et al. mLUKE: The power of entity representations in multilingual pretrained language models. ACL. 2022.
- Yamada et al. LUKE: Deep contextualized entity representations with entity-aware self-attention. EMNLP. 2020.