AI Writers Have a Consistent Stylometric Footprint, but AI Editors Do Not

Apr 1, 2026

Zhengyang Shan*

Yukyung Lee*

Sophie Hao


Abstract

Text generated by large language models (LLMs) has been shown to be stylometrically distinct from human-written text. We show that stylometric features alone can accurately distinguish LLM-generated text from human-written text via logistic regression. Two such features, entropy and lexical diversity, form a consistent "stylometric footprint" that characterizes text generated by 8 different LLMs across 5 domains. LLM-edited text, however, does not conform to this footprint: we show that LLM editing has little to no effect on the stylometric properties of text originally written by humans. Consequently, logistic regression models trained on stylometric features largely fail to distinguish LLM-edited text from human-written text, but they reliably separate LLM-edited text from LLM-generated text. Transformer-based detectors of LLM editing behave differently: they are much more likely to mistake LLM-edited text for LLM-generated text. We show that adding a stylometric model to a Transformer model via ensembling significantly improves its ability to distinguish LLM-edited text from LLM-generated text.
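To make the two footprint features concrete, here is a minimal sketch of how entropy and lexical diversity might be computed for a single text. It assumes word-level tokenization, Shannon entropy over the empirical word-unigram distribution, and type-token ratio as the lexical-diversity measure. The abstract does not specify these definitions, so the helper names (`tokenize`, `unigram_entropy`, `type_token_ratio`, `soft_vote`) and the simple weighted-average ensemble are hypothetical illustrations, not the authors' method. In practice, feature vectors like these would be passed to a logistic regression classifier (for example scikit-learn's `LogisticRegression`).

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer (assumption): lowercase alphanumeric word tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


def unigram_entropy(tokens: list[str]) -> float:
    """Shannon entropy, in bits, of the empirical word-unigram distribution."""
    if not tokens:
        return 0.0
    n = len(tokens)
    counts = Counter(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def type_token_ratio(tokens: list[str]) -> float:
    """Lexical diversity as the number of distinct words over total words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def stylometric_features(text: str) -> dict[str, float]:
    # Two-feature vector; an assumed stand-in for the paper's "footprint".
    tokens = tokenize(text)
    return {"entropy": unigram_entropy(tokens), "ttr": type_token_ratio(tokens)}


def soft_vote(p_stylometric: float, p_transformer: float, weight: float = 0.5) -> float:
    # Hypothetical ensemble (assumption): weighted average of the two models'
    # predicted probabilities that a text is LLM-generated.
    return weight * p_stylometric + (1 - weight) * p_transformer


if __name__ == "__main__":
    print(stylometric_features("The cat sat on the mat"))
```

For "The cat sat on the mat", there are 6 tokens and 5 distinct words, so the type-token ratio is 5/6. Type-token ratio is sensitive to text length, so a real pipeline would likely compare texts of similar length or use a length-normalized variant.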

Type

Preprint

Publication

preprint

Last updated on Apr 4, 2026
