💡 Check out our implicit CoT paper "CIRF"
'CIRF: Tokenizing Chain-of-Thoughts into Reusable Functional Units for Efficient Latent Reasoning in Large Language Models' now available on …
Can Structural Cues Save LLMs? Evaluating Language Models in Massive Document Streams. Collaboration with Korea University team 🐯
RExBench, Can coding agents autonomously implement AI research extensions?
Please consider submitting your work :)
Topic - Towards a Science of Evaluation for Language Model
Topic - Can coding agents autonomously implement AI research extensions?
CheckEval, A reliable LLM-as-a-Judge framework for evaluating text generation using checklists. Thanks to my collaborators ❤️
Topic - Can coding agents autonomously implement AI research extensions?
Can coding agents autonomously implement AI research extensions? Our [RExBench](https://arxiv.org/abs/2506.22598) is now available on arXiv!
Our [WritingPath](https://arxiv.org/abs/2404.13919) paper has been accepted to **NAACL 2025 (Industry Track)**!