BRIAN REDEKOPP
JULY 2023
CAN WE RELIABLY DETECT AI-GENERATED TEXT?
THE CHATGPT PROBLEM
OVERVIEW: METHODS OF DETECTION
(1) Statistical models measure certain statistical properties of a text and compare the result with properties typical of AI-generated text.
This is the method used by Turnitin and most of the other detectors currently available, e.g. CopyLeaks, GPTZero, ZeroGPT, Writer AI, and Originality AI.
(2) Training-based models use machine-learning techniques to train an AI system to classify text as either human or AI-generated. Through exposure to a large quantity of text of each kind, the machine creates its own rules for identifying AI-generated text.
This is the method OpenAI used for its own detector. But the detector proved unreliable, and OpenAI quietly shut it down in July 2023.
(3) Watermarking models modify an LLM’s text-generation process such that its output contains hidden markers that can be detected by a statistical algorithm.
(4) Retrieval-based models store all the outputs of an LLM in a database and then compare text against these outputs.
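To make the retrieval idea concrete, here is a toy sketch in Python. It checks a submission against a store of past LLM outputs using character n-gram overlap. This is only an illustration: the similarity measure, threshold, and stored outputs are my assumptions, and real systems (e.g. the retrieval defense studied by Krishna et al.) match semantically rather than just lexically.

    # Toy sketch of a retrieval-based detector. The similarity measure and
    # threshold are illustrative assumptions, not any vendor's actual method.

    def ngrams(text: str, n: int = 5) -> set[str]:
        """Character n-grams of a whitespace-normalized, lowercased text."""
        text = " ".join(text.lower().split())
        return {text[i:i + n] for i in range(max(len(text) - n + 1, 0))}

    def jaccard(a: str, b: str) -> float:
        """Jaccard overlap between the two texts' n-gram sets."""
        ga, gb = ngrams(a), ngrams(b)
        return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

    def retrieval_detect(submission: str, stored_outputs: list[str],
                         threshold: float = 0.6) -> bool:
        """Flag the submission if it closely matches any stored LLM output."""
        return any(jaccard(submission, out) >= threshold
                   for out in stored_outputs)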
WHAT STATISTICAL MODELS (OUR CURRENT DETECTORS) MEASURE: “PERPLEXITY” AND “BURSTINESS”
Large language models (LLMs) generate strings of linguistic tokens by calculating each token’s probability given the prompt and the tokens already generated. To keep its output coherent, an LLM tends to select tokens with higher probability. This is why LLMs tend to generate bland, formulaic, and syntactically clean text. More technically, AI-generated text tends to have lower “perplexity” and “burstiness” than human-generated text.
These terms are used to describe both the performance of an LLM and the text it generates.
The perplexity of an LLM measures how uncertain it is in predicting the next token from a given string. A model with low perplexity is one that predicts the next token well.
Thus the perplexity of a text is how unpredictable it is, i.e. how unlikely each word is given the previous words. Since humans often creatively (or mistakenly) combine words in novel ways, human writing tends to have higher perplexity than AI writing.
The burstiness of an LLM is how much variation it generates in its outputs, i.e. in its terms, sentence structures, and sentence lengths. A model with low burstiness is one that generates syntactically homogeneous text.
Thus the burstiness of a text is how much variety it has amongst its terms and in its structure. Again, given the dynamism of human language, human writing tends to have higher burstiness than AI writing.
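As a rough illustration, burstiness can be proxied by variation in sentence length. The sketch below is my own illustration, not any detector’s actual formula; it scores a text by the coefficient of variation of its sentence lengths.

    # Illustrative burstiness proxy: variation in sentence length.
    # Real detectors may combine several such signals; this is only a sketch.
    import re
    import statistics

    def burstiness(text: str) -> float:
        """Coefficient of variation of sentence lengths (higher = burstier)."""
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        lengths = [len(s.split()) for s in sentences]
        if len(lengths) < 2:
            return 0.0
        return statistics.stdev(lengths) / statistics.mean(lengths)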
Statistical models can measure perplexity and burstiness by calculating the average per-token probability of a text and comparing this with a threshold deemed characteristic of AI writing (Mitchell et al. 2).
This threshold is difficult to set. If it is set too high, the risk of falsely identifying human writing as AI-generated increases (the false positive problem). If it is set too low, the risk of failing to detect AI writing increases (the evasion problem).
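Here is a minimal sketch of such a detector, assuming the Hugging Face transformers library and GPT-2 as the scoring model. The threshold value is purely illustrative; real detectors calibrate it on labeled data.

    # Minimal perplexity-threshold detector. Model choice and threshold are
    # illustrative assumptions; production detectors are far more elaborate.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def perplexity(text: str) -> float:
        """exp of the average per-token negative log-likelihood under GPT-2."""
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            loss = model(ids, labels=ids).loss  # mean cross-entropy per token
        return torch.exp(loss).item()

    THRESHOLD = 60.0  # illustrative; tuned on labeled data in practice

    def classify(text: str) -> str:
        """Low perplexity = highly predictable text = flagged as AI-like."""
        return "likely AI" if perplexity(text) < THRESHOLD else "likely human"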
PROBLEMS WITH TURNITIN AND OTHER STATISTICAL MODELS
(1) Risk of false positives
(2) Bias against non-native writers
(3) Easily defeated through prompting and paraphrasing
(Here is an overview of Turnitin’s strengths and weaknesses, from a website that is a strange mix of advice for educators on how to detect AI and advice for students on how to evade detection.)
THE RISK OF FALSE POSITIVES
Given how statistical models work, student writing that is less “perplexing” or “bursty” risks falling below the detector’s statistical threshold and getting wrongly flagged as AI-generated.
A detector tuned to catch a higher percentage of AI writing will use a higher threshold, thereby generating a higher rate of false positives. A lower threshold results in fewer false positives, but at the cost of missing a higher percentage of AI writing.
So statistical models will inevitably generate false positives; the likelihood depends on the choice the developer makes in the trade-off between maximizing true positives and minimizing false ones.
STUDENTS AT A HIGHER RISK OF FALSE POSITIVES
LIANG ET AL., “GPT DETECTORS ARE BIASED AGAINST NON-NATIVE ENGLISH WRITERS” (JULY 2023)
“In our study, we evaluated the performance of seven widely used GPT detectors on 91 TOEFL (Test of English as a Foreign Language) essays from a Chinese forum and 88 US eighth-grade essays from the Hewlett Foundation’s ASAP dataset. While the detectors accurately classified the US student essays, they incorrectly labeled more than half of the TOEFL essays as "AI-generated" (average false-positive rate: 61.3%). All detectors unanimously identified 19.8% of the human-written TOEFL essays as AI authored, and at least one detector flagged 97.8% of TOEFL essays as AI generated. Upon closer inspection, the unanimously identified TOEFL essays exhibited significantly lower text perplexity.”
“The implications of GPT detectors for non-native writers are serious, and we need to think through them to avoid situations of discrimination. Within social media, GPT detectors could spuriously flag non-native authors’ content as AI plagiarism, paving the way for undue harassment of specific non-native communities. Internet search engines, such as Google, that implement mechanisms to devalue AI-generated content may inadvertently restrict the visibility of non-native communities, potentially silencing diverse perspectives. Academic conferences or journals prohibiting use of GPT may penalize researchers from non-English-speaking countries. In education, arguably the most significant market for GPT detectors, non-native students bear more risks of false accusations of cheating, which can be detrimental to a student’s academic career and psychological well-being.
Paradoxically, GPT detectors might compel non-native writers to use GPT more to evade detection. As GPT text-generation models advance and detection thresholds tighten, the risk of non-native authors being inadvertently caught in the GPT detection net increases. If non-native writing is more consistently caught as GPT, this may create an unintended consequence of ironically causing non-native writers to use GPT to refine their vocabulary and linguistic diversity to sound more native.”
EVADING DETECTION THROUGH PROMPTING
One way to evade detection is simply to prompt the LLM to generate text that is less statistically flat, e.g. this prompt from the sort of “how to avoid detection” video a student might consult on YouTube:
“Write an engaging article that incorporates a human-like style, simple English, contractions, idioms, transitional phrases, interjections, dangling modifiers, and colloquialisms, while also weaving in literary devices such as symbolism, irony, foreshadowing, metaphor, personification, hyperbole, alliteration, imagery, onomatopoeia, and simile without directly mentioning them.”
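For illustration, a student could issue such a prompt programmatically. The sketch below uses the OpenAI Python library (the pre-v1 ChatCompletion interface); the model name, style instruction, and essay topic are my assumptions.

    # Sketch of prompting-based evasion via the OpenAI API (pre-v1 library).
    # The model name, style instruction, and essay topic are illustrative.
    import openai

    openai.api_key = "YOUR_API_KEY"

    style = ("Write in a human-like style: simple English, contractions, "
             "idioms, interjections, colloquialisms, and varied sentence lengths.")

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": style},
            {"role": "user", "content": "Write a 1200-word essay on free will."},
        ],
    )
    print(response.choices[0].message.content)  # text meant to read as human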
EVADING DETECTION THROUGH PARAPHRASERS
Another highly effective way to evade existing detectors is to run AI-generated text through an AI-paraphrasing tool. These tools are readily available, and a student need only spend a short time on YouTube to learn how to use them effectively.
One way to paraphrase is simply by prompting the LLM to paraphrase its own output, e.g. with a follow-up prompt along the lines of “Rewrite the essay above in your own words, varying your vocabulary and sentence structure.”
EVADING DETECTION THROUGH PARAPHRASERS
Another way to paraphrase is by using a paraphrasing program like Quillbot, Spinbot, Undetectable, Stealthwriter, CogniBypass, Word AI.
I tested Quillbot against Turnitin by having ChatGPT generate a 1200-word essay on Descartes’ views on free will. Turnitin flagged it as 40% AI-generated. I then paraphrased it with Quillbot’s synonym setting at maximum. (Doing this for free requires tediously inputting 125 words or fewer at a time.) The paraphrased essay was clunkier and less readable, but this gave it the feel of an average student paper.
After the Quillbot paraphrase, Turnitin’s AI score fell from 40% to 0%.
Undetectable seems especially strong for evading detection.
Sadasivan et al. (June 2023) show that paraphrasing can defeat any current state-of-the-art detector.
LONG-TERM PROSPECTS FOR DETECTION: GROUNDS FOR OPTIMISM
Currently, computer scientists are divided on the prospects for reliably detecting AI-generated text.
Chakraborty et al. (June 2023) provide a mathematical proof that “it is almost always possible to detect AI-generated text as long as we can collect multiple samples.”
They argue that while detection becomes more difficult the more statistically similar AI and human text become, it becomes impossible only in the rare instance of no statistical difference whatsoever. Short of this, we can always compensate for a smaller statistical gap between human and AI text by collecting more samples.
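A toy simulation illustrates the point (my own sketch; the Gaussian “scores” and the tiny per-sample gap are assumptions, not the authors’ model): even when a single sample is nearly uninformative, averaging over many independent samples separates the two sources almost perfectly.

    # Toy illustration of "more samples compensate for a smaller gap".
    # The Gaussian score model and gap size are assumptions for illustration.
    import random
    import statistics

    GAP = 0.1  # tiny per-sample statistical difference between the two sources

    def detect(scores: list[float]) -> bool:
        """Guess 'AI' if the mean score sits closer to the AI distribution."""
        return statistics.mean(scores) > GAP / 2

    def accuracy(n_samples: int, trials: int = 2000) -> float:
        correct = 0
        for _ in range(trials):
            is_ai = random.random() < 0.5
            mu = GAP if is_ai else 0.0
            scores = [random.gauss(mu, 1.0) for _ in range(n_samples)]
            correct += detect(scores) == is_ai
        return correct / trials

    for n in (1, 10, 100, 1000, 10000):
        print(n, round(accuracy(n), 3))  # accuracy climbs toward 1.0 with n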
The authors acknowledge that paraphrasing reduces detection performance. But they argue that detection can withstand paraphrasing through a combination of increasing the number of samples and designing better watermarks.
LONG-TERM PROSPECTS FOR DETECTION: GROUNDS FOR OPTIMISM
Kirchenbauer et al. (June 2023) share this view. They propose a watermarking technique and show that it is robust against paraphrasing, especially as text length increases.
The technique: “At each step of the text generation process, the watermark pseudo-randomly “colors” tokens into green and red lists. Then a sampling rule is used that preferentially samples [selects] green tokens when doing so does not negatively impact perplexity. To detect the watermark, a third party with knowledge of the hash function can reproduce the red and green lists for each step and count the violations.”
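The following sketch (simplified from the quoted description; the hashing scheme, green fraction, and z-test framing are my assumptions) shows the detection side: recompute each token’s color from the preceding token and count how many green tokens appear relative to chance.

    # Simplified sketch of green/red-list watermark detection. The hashing
    # scheme and green fraction are illustrative; see Kirchenbauer et al.
    import hashlib
    import math

    GREEN_FRACTION = 0.5  # share of the vocabulary colored green at each step

    def is_green(prev_token: str, token: str) -> bool:
        """Pseudo-randomly color `token`, seeded by the preceding token."""
        digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
        return digest[0] < GREEN_FRACTION * 256

    def watermark_zscore(tokens: list[str]) -> float:
        """How far the green-token count exceeds what chance would predict."""
        n = len(tokens) - 1
        if n < 1:
            return 0.0
        greens = sum(is_green(tokens[i - 1], tokens[i])
                     for i in range(1, len(tokens)))
        expected = GREEN_FRACTION * n
        std = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
        return (greens - expected) / std  # large z-score => likely watermarked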
They find that “even human writers cannot reliably remove watermarks if being measured at 1000 words, despite having the goal of removing the watermark.”
LONG-TERM PROSPECTS FOR DETECTION: GROUNDS FOR PESSIMISM
Sadasivan et al. (June 2023) show that paraphrasing can defeat any current state-of-the-art detector, and argue that, as LLMs become more and more sophisticated, “reliable detection may be unachievable.” (5)
In response to the proof in Chakraborty et al. that detection is, in principle, always possible with enough samples, they point out that this will often not be practical. The samples must be independent of each other (e.g. separate articles or papers), but “it would be unreasonable to expect a student to submit several versions of their essay just to determine whether it has been written using AI or not.” (20)
LONG-TERM PROSPECTS FOR DETECTION: GROUNDS FOR PESSIMISM
Problems with a watermarking approach:
Problem with a retrieval approach:
WORKS CITED
Chakraborty, Souradip, et al. “On the Possibilities of AI-Generated Text Detection.” arXiv preprint, June 2023. https://arxiv.org/abs/2304.04736
Kirchenbauer, John, et al. “On the Reliability of Watermarks for Large Language Models.” arXiv preprint, June 2023. https://arxiv.org/abs/2306.04634
Krishna, Kalpesh, et al. “Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense.” arXiv preprint, March 2023. https://arxiv.org/abs/2303.13408
Liang, Weixin, et al. “GPT detectors are biased against non-native English writers.” arXiv preprint, June 2023. https://arxiv.org/abs/2304.02819
Lu, Ning, et al. “Large Language Models can be Guided to Evade AI-Generated Text Detection.” arXiv preprint, June 2023. https://arxiv.org/abs/2305.10847
Mitchell, Eric, et al. “DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature.” arXiv preprint, January 2023. https://arxiv.org/abs/2301.11305
Sadasivan, Vinu, et al. “Can AI-Generated Text be Reliably Detected?” arXiv preprint, June 2023. https://arxiv.org/abs/2303.11156