News

💡 Check out our math reasoning paper "Cliff Tokens"

'Cliff Tokens: Identifying Single-Token Failure Triggers in LLM Mathematical Reasoning' now available on [arXiv](https://arxiv.org/abs/2606.25524)! Work with Jaeyong Ko and Pilsung …

Jun 24, 2026 • 1 min read

💡 Check out our implicit CoT paper "CIRF"

'CIRF: Tokenizing Chain-of-Thoughts into Reusable Functional Units for Efficient Latent Reasoning in Large Language Models' now available on …

May 30, 2026 • 1 min read

🎉 "Can Structural Cues Save LLMs?" has been accepted to KDD 2026!

Can Structural Cues Save LLMs? Evaluating Language Models in Massive Document Streams. Work with Yebin Lim, Woojun Jung, Wonjun Choi and Susik Yoon 🐯

May 15, 2026 • 1 min read

🎉 RExBench has been accepted to ACL 2026!

RExBench, Can coding agents autonomously implement AI research extensions?

Apr 6, 2026 • 1 min read

🎉 Co-organizing the "AI Modeling for Disappearing Knowledge" workshop at IJCAI 2026.

Please consider submitting your work :)

Mar 7, 2026 • 1 min read

👩🏻‍🏫 Invited talks at SNU (Jan 2nd), HYU (Jan 8th), and KU (Jan 9th)

Topic - Towards a Science of Evaluation for Language Models

Jan 2, 2026 • 1 min read

👩🏻‍🏫 Gave an invited talk at Stanford/UW (RExBench 🦖) with Nicholas

Topic - Can coding agents autonomously implement AI research extensions?

Sep 15, 2025 • 1 min read

🎉 CheckEval has been accepted to EMNLP 2025!

CheckEval, A reliable LLM-as-a-Judge framework for evaluating text generation using checklists. Thanks to my collaborators ❤️

Aug 21, 2025 • 1 min read

👩🏻‍🏫 Gave an invited talk at Korea University (RExBench 🦖)

Topic - Can coding agents autonomously implement AI research extensions?

Aug 19, 2025 • 1 min read

💡 Check out our RExBench paper

Can coding agents autonomously implement AI research extensions? Our [RExBench](https://arxiv.org/abs/2506.22598) paper is now available on arXiv!

Jun 30, 2025 • 1 min read
