Zebin Yao

I am a 2nd-year Master's student at the School of Computer Science, Nankai University, where I am fortunate to be advised by Prof. Gang Wang and Prof. Xiaoguang Liu. Before that, I was an undergraduate student in Computer Science and Technology at Nankai University.

yaozb[@]nbjl.nankai.edu.cn / GitHub

Research

My primary research interests lie in developing high-performance vector search systems and applying them efficiently within large language model applications, such as retrieval-augmented generation and search agents. I am also interested in efficient disk-based indexes and parallel algorithms for approximate nearest neighbor search.

I am actively working on SearchAgent-X, a highly efficient system for large language model (LLM) agents that interleave reasoning with search. I am integrating SearchAgent-X into reasoning search agents trained with reinforcement learning, such as ReSearch and Search-R1, to accelerate their post-training and deployment.

Dynamic Detect and Fix Hardness for Efficient Approximate Nearest Neighbor Search

Zhiyuan Hua, Qiji Mo, Zebin Yao, et al.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2026
paper / code

We propose a novel method for improving approximate nearest neighbor search performance in out-of-distribution scenarios by introducing Escape Hardness to dynamically identify and fix defective graph regions.

Demystifying and Enhancing the Efficiency of Large Language Model Based Search Agents

Tiannuo Yang*, Zebin Yao*, Bowen Jin, Lixiao Cui, et al.
arXiv, 2025
arXiv / code

We demystify the key factors affecting the efficiency of LLM-based search agents and, based on these insights, design SearchAgent-X to improve end-to-end efficiency without compromising generation quality.

ALGAS: A Low-latency GPU-Accelerated Approximate Nearest Neighbor Search System

Yuanhui Chen, Lixiao Cui, Zebin Yao, Hao Zhou, et al.
IEEE International Parallel & Distributed Processing Symposium, 2025
paper

We present a low-latency GPU vector search system for small-batch processing that reduces query bubbles via dynamic batching, lowers sorting overhead with beam extend, eliminates TopK-merging overhead through GPU-CPU cooperation, and improves resource utilization with adaptive tuning.

Design and source code from Jon Barron's website