Topic - Can coding agents autonomously implement AI research extensions?
CheckEval: a reliable LLM-as-a-Judge framework for evaluating text generation using checklists. Thanks to my collaborators ❤️
Can coding agents autonomously implement AI research extensions? Our [RExBench](https://arxiv.org/abs/2506.22598) paper is now available on arXiv!
Our [WritingPath](https://arxiv.org/abs/2404.13919) paper has been accepted to **NAACL 2025 (Industry Track)**!
Our [ContAccum](https://arxiv.org/abs/2406.12356v1) paper has been accepted to **NeurIPS 2024**!
Topic - Goal-Oriented Language Models and Evaluation
🤖💭🦔
About LLM-based Evaluation for Open-ended Generation