How AI Detectors Work
AI text detectors are useful tools, but they're not infallible. Here's an honest explanation of how they work, what the numbers mean, and why we run 4 models instead of 1.
The core idea: statistical patterns
Language models (GPT-4, Claude, Gemini, etc.) generate text by predicting the most likely next token given the previous context. This tends to produce text that is statistically more predictable than human writing, which shows up as lower "perplexity" in NLP terms.
AI detectors exploit this. They model the distribution of token probabilities and look for patterns that correlate with machine generation: unusually low perplexity, low burstiness (sentences that stay uniform in length and complexity, where human writing tends to alternate between very simple and very complex ones), and characteristic phrasing.
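To make the two statistical signals concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method any of our models actually uses: the per-token probabilities are made-up numbers standing in for what a language model would assign, and burstiness is approximated as the coefficient of variation of sentence lengths, which is only one of several proxies in use. The function names are hypothetical.

```python
import math
import re
import statistics


def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability per token.

    token_probs: probabilities a language model assigned to each observed
    token (hypothetical inputs here; a real detector gets them from a model).
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


def burstiness(text):
    """Assumed proxy: std. dev. of sentence lengths divided by their mean."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0  # not enough sentences to measure variation
    return statistics.pstdev(lengths) / statistics.mean(lengths)


# Made-up probabilities: the first sequence is easy to predict, the second is not.
predictable = [0.9, 0.8, 0.85, 0.9]
surprising = [0.2, 0.05, 0.4, 0.1]

print(perplexity(predictable) < perplexity(surprising))    # predictable text scores lower
print(burstiness("Hi. This is a much longer sentence here."))  # uneven sentence lengths
print(burstiness("One two three. One two three."))             # perfectly even lengths
```

Neither number is a verdict on its own. A real classifier combines many such features, or learns them implicitly, and is still subject to the limits described in the rest of this page.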
Why accuracy is bounded at 76-88%
Independent benchmarks consistently show top AI detectors achieving 76-88% accuracy on mixed human/AI corpora. Here's why the ceiling isn't higher:
- Human writing varies enormously. Technical writing, ESL text, and formal prose can resemble AI output statistically.
- AI output varies enormously. Heavily edited AI text looks more human. Prompted differently, the same model produces very different statistical signatures.
- Adversarial pressure. Users who want to evade detection will edit, paraphrase, or use humanizer tools.
- Distribution shift. Models trained on data up to 2024 may not generalize to text generated by models released in 2025-2026.
This is why we display all four model scores, not just an aggregate.
What the divergence score means
When our 4 models agree (all score >75 or all score <25), you can have higher confidence. When they diverge (for example, one says 90% AI and another says 30%), that's a signal to interpret carefully. We show you the divergence score explicitly so you don't over-rely on a misleading average.
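The sketch below shows one way an agreement check like this could work. The 75 and 25 thresholds come from the text above. Everything else is an assumption for illustration: this page does not state how the divergence score is calculated, so the example uses the simple spread between the highest and lowest model score, and the function name and return fields are hypothetical.

```python
def summarize_scores(scores, high=75, low=25):
    """Summarize per-model AI-likelihood scores (0-100).

    Divergence is taken as max minus min, an assumed definition
    chosen only to illustrate the idea.
    """
    if not scores:
        raise ValueError("need at least one score")
    if all(s > high for s in scores):
        agreement = "agree: likely AI"
    elif all(s < low for s in scores):
        agreement = "agree: likely human"
    else:
        agreement = "mixed: interpret carefully"
    return {
        "mean": sum(scores) / len(scores),
        "divergence": max(scores) - min(scores),
        "agreement": agreement,
    }


# Hypothetical scores from four models.
print(summarize_scores([90, 30, 60, 60]))  # same mean as below, very different spread
print(summarize_scores([60, 60, 60, 60]))
print(summarize_scores([80, 90, 85, 95]))
```

The first two inputs have the same average of 60, yet one hides a 60-point disagreement between models. That is the case where an average alone would mislead.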
The 4 models we use
- RoBERTa OpenAI Detector: A fine-tuned RoBERTa model trained specifically to detect GPT-style output. Strong on formal academic-style text.
- Sapling AI Detector: A commercial classifier with broad training across multiple model families. Good coverage of Claude and Gemini output.
- Gemini Text Classification: Google's own classifier, useful as a cross-check on Gemini-generated content and for multilingual text.
- AICheckHub Internal Classifier: Our own model, continuously updated on recent model output including frontier models released in 2025-2026.
False positives and false negatives
A false positive (human text flagged as AI) is a real risk, especially for text by non-native English speakers, highly technical writing, and minimalist prose styles.
A false negative (AI text flagged as human) is common when the AI output has been edited, paraphrased, or passed through a humanizer.
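For readers who want to see how these two error types relate to a headline accuracy figure, here is a small Python sketch. The labels and predictions are invented for illustration and are not benchmark results. The function name is hypothetical.

```python
def error_rates(labels, predictions):
    """Compute error rates for a detector.

    labels / predictions: lists of booleans, True meaning "AI-written".
    """
    if len(labels) != len(predictions) or not labels:
        raise ValueError("labels and predictions must be non-empty and equal length")
    pairs = list(zip(labels, predictions))
    false_pos = sum(1 for y, p in pairs if not y and p)  # human flagged as AI
    false_neg = sum(1 for y, p in pairs if y and not p)  # AI passed as human
    humans = labels.count(False)
    ais = labels.count(True)
    return {
        "false_positive_rate": false_pos / humans if humans else 0.0,
        "false_negative_rate": false_neg / ais if ais else 0.0,
        "accuracy": sum(1 for y, p in pairs if y == p) / len(pairs),
    }


# Invented sample: 5 human texts followed by 5 AI texts.
labels = [False] * 5 + [True] * 5
predictions = [False, False, False, False, True,   # one human text flagged as AI
               True, True, True, False, False]     # two AI texts missed
print(error_rates(labels, predictions))
```

A single accuracy number can hide very different mixes of the two error types, and which one matters more depends on what the score is being used for.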
We will never tell you a score is definitive. Any use of AI detection scores in high-stakes decisions (academic sanctions, hiring, legal proceedings) should involve human review.