About
My research focuses on developing practical language agents that assist humans with real-world tasks and enhance productivity. Two key challenges are grounding and planning. Existing work has primarily focused on action grounding; situational grounding at a higher level, which aligns the agent's internal state representation with the actual environment, remains underexplored. In planning, search is essential for optimizing performance, but it is challenging in agent tasks due to the complexity of environmental interactions. Robust world modeling plays a crucial role in improving both situational grounding and search.
My current efforts are dedicated to advancing methods in both situational grounding and planning.
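To give a concrete sense of what "search with a world model" means here, below is a minimal sketch of depth-limited lookahead, where candidate actions are simulated with a world model instead of executed in the real environment. The names (`plan_with_world_model`, `SimulateFn`, `ValueFn`) are hypothetical and purely illustrative; this is a simplified schematic under those assumptions, not the method from any specific paper of mine.

```python
from typing import Callable, List

# Hypothetical interfaces: a world model that predicts the effect of an
# action on a state, and a value function that scores how promising a
# (simulated) state is for the task goal.
SimulateFn = Callable[[str, str], str]   # (state, action) -> predicted next state
ValueFn = Callable[[str], float]         # state -> estimated task progress

def plan_with_world_model(
    state: str,
    candidate_actions: List[str],
    simulate: SimulateFn,
    value: ValueFn,
    depth: int = 2,
) -> str:
    """Depth-limited lookahead: simulate each candidate action with the
    world model rather than acting in the real environment, and pick the
    action whose simulated trajectory scores highest."""
    def rollout_value(s: str, d: int) -> float:
        if d == 0:
            return value(s)
        # Greedy expansion: follow the best simulated action at each level.
        return max(rollout_value(simulate(s, a), d - 1) for a in candidate_actions)

    return max(
        candidate_actions,
        key=lambda a: rollout_value(simulate(state, a), depth - 1),
    )
```

The appeal of this pattern for agents is that real environment interactions are expensive and sometimes irreversible, whereas simulated rollouts are cheap; the quality of the plan then hinges on how faithful the world model is, which is exactly why world modeling matters for search.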
More broadly, as an AI researcher, I am a fan of "compression is intelligence" (see the sketch after the questions below), but NOT a fan of "scaling is all you need." To achieve better compression, the following questions need to be answered.
- What is the alternative to scaling laws for achieving better compression? Should we combine weight-based compression with symbol-based compression (e.g., language)? And if so, how?
- Closely related to the first question, which tasks are interpolative in vector space, and which are not? Our understanding of this question is constantly evolving (e.g., how many people considered learning natural language grammar an interpolative task a decade ago?). Fortunately, I believe most agent tasks today are still viewed as non-interpolative, which motivates us to explore solutions beyond scaling laws, using agent tasks as testbeds.
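To make the compression view concrete: any next-token predictor can be turned into a lossless compressor via arithmetic coding, and the resulting code length is (up to a small constant overhead) the model's total negative log-likelihood, so better prediction literally means better compression. Below is a minimal sketch of this standard equivalence; the `ProbFn` interface and `code_length_bits` are names I made up for illustration.

```python
import math
from typing import Callable, Sequence

# A predictor here is anything that returns P(next_token | prefix).
# Arithmetic coding turns such a predictor into a lossless compressor
# whose output size approaches the sum of -log2 P over the sequence.
ProbFn = Callable[[Sequence[str], str], float]

def code_length_bits(tokens: Sequence[str], prob: ProbFn) -> float:
    """Ideal code length, in bits, for compressing `tokens` with the
    predictor `prob` (i.e., the sum of -log2 P(token | prefix))."""
    total = 0.0
    for i, tok in enumerate(tokens):
        total += -math.log2(prob(tokens[:i], tok))
    return total

# Toy check: a uniform model over a 4-token vocabulary spends exactly
# 2 bits per token; any model that predicts better spends fewer.
uniform = lambda prefix, tok: 0.25
assert code_length_bits(["a", "b", "a", "a"], uniform) == 8.0
```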
Selected Publications
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
Yu Gu*, Boyuan Zheng*, Boyu Gou, Kai Zhang, Cheng Chang, Sanjari Srivastava, Yanan Xie, Peng Qi, Huan Sun, Yu Su
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
Xiao Liu*, Tianjie Zhang*, Yu Gu*, Iat Long Iong, Yifan Xu, Xixuan Song, Shudan Zhang, Hanyu Lai, Xinyi Liu, Hanlin Zhao, Jiadai Sun, Xinyue Yang, Yu Yang, Zehan Qi, Shuntian Yao, Xueqiao Sun, Siyi Cheng, Qinkai Zheng, Hao Yu, Hanchen Zhang, Wenyi Hong, Ming Ding, Lihang Pan, Xiaotao Gu, Aohan Zeng, Zhengxiao Du, Chan Hee Song, Yu Su, Yuxiao Dong, Jie Tang
Don't Generate, Discriminate: A Proposal for Grounding Language Models to Real-World Environments
Yu Gu, Xiang Deng, Yu Su
ArcaneQA: Dynamic Program Induction and Contextualized Encoding for Knowledge Base Question Answering
Yu Gu, Yu Su
Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases
Yu Gu, Sue Kase, Michelle Vanni, Brian Sadler, Percy Liang, Xifeng Yan, Yu Su