UncovAI's New AI Text Detector Passes Everything — Including the Attacks

We just shipped a new text detection model. Before releasing it, we ran it through a benchmark covering genuine human writing, AI-generated content, hallucinated output, and the humanization tools and text-manipulation tricks commonly used to evade detection. It passed all of them. Here's exactly what we tested and what the results mean.


Why text detection is harder than it looks

Detecting AI-generated text sounds straightforward until you see what people do to evade it. Most AI text detectors are trained to recognize patterns from a fixed model at a fixed point in time. Swap the model, run the output through a paraphraser, or replace a few characters with lookalikes — and many detectors fail silently, returning a "human" verdict on text that was never written by a person.

The humanization industry has made this worse. Tools like QuillBot, ZeroGPT Humanizer, CleverAI, HumanizeAI, AIHumanize, and Undetectable AI exist specifically to rewrite AI output until other detectors can't identify it. They're widely used, and they work — against most detectors.

Our new model was built with those attacks in mind from the start. We didn't test for evasion after training. We trained with evasion as part of the target.

What we tested

The benchmark covers three categories: real human writing, AI-generated content across different scenarios, and adversarial attacks designed to disguise AI output as human.

Human baselines

A detector that flags everything as AI is useless. The first requirement is that it correctly identifies genuine human writing. We tested two anchors at opposite ends of the spectrum:

| Sample | Type | Result |
| --- | --- | --- |
| US Constitution | Historical human writing | Human ✓ Correctly identified |
| Harry Potter excerpt | Published literary fiction | Human ✓ Correctly identified |

These two samples represent very different human writing styles — formal legal prose from the 18th century and contemporary narrative fiction. Both were correctly identified as human. No false positives on either.

AI-generated content

We tested four samples across three distinct AI output types (hallucinated content, academic-style writing, and standard prose), chosen because they represent the most common real-world use cases where accurate detection matters.

| Sample | Type | Result |
| --- | --- | --- |
| PyCharm hallucination | AI-generated technical content with fabricated facts | AI ✓ Correctly identified |
| France hallucination | AI-generated factual content with invented details | AI ✓ Correctly identified |
| Scientific abstract | AI-generated academic-style writing | AI ✓ Correctly identified |
| Dublin | Standard AI-generated prose | AI ✓ Correctly identified |

The hallucination samples are worth noting specifically. When an AI model fabricates facts (a made-up PyCharm feature, a false claim about France), the output is still AI-generated text. Hallucinations don't change the underlying writing patterns. The model caught all four AI-generated samples cleanly.

The attack tests — where most detectors fail

This is the part that matters most in practice. We took the same base AI-generated text (the Dublin sample) and ran it through every major evasion method currently in use. Each attack transforms the text in a different way, attempting to strip the statistical and structural signatures that detectors look for.

The result

Every single attack was detected. The Dublin text, regardless of how it was transformed, was correctly identified as AI-generated across all thirteen attack variants.

| Attack | Method | Result |
| --- | --- | --- |
| Dublin (backtranslated) | Translated to another language and back to English | ✓ Still detected as AI |
| Dublin (QuillBot humanized) | Paraphrased via QuillBot | ✓ Still detected as AI |
| Dublin (ZeroGPT humanized) | Rewritten via ZeroGPT humanizer | ✓ Still detected as AI |
| Dublin (CleverAI humanized) | Rewritten via CleverAI | ✓ Still detected as AI |
| Dublin (HumanizeAI humanized) | Rewritten via HumanizeAI | ✓ Still detected as AI |
| Dublin (AIHumanize humanized) | Rewritten via AIHumanize | ✓ Still detected as AI |
| Dublin (Undetectable AI humanized) | Rewritten via Undetectable AI | ✓ Still detected as AI |
| Dublin (homoglyph attack) | Characters replaced with visually identical Unicode lookalikes | ✓ Still detected as AI |
| Dublin (article deletion) | Articles (a, an, the) systematically removed | ✓ Still detected as AI |
| Dublin (uppercase/lowercase) | Random case changes inserted throughout | ✓ Still detected as AI |
| Dublin (whitespace) | Hidden whitespace characters injected between words | ✓ Still detected as AI |
| Dublin (perplexity misspelling) | Deliberate misspellings to disrupt token probability | ✓ Still detected as AI |
| Dublin (paraphrase) | Manual and automated rephrasing of full paragraphs | ✓ Still detected as AI |
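
To make the simpler surface attacks concrete, here is a minimal Python sketch of four of them: homoglyph substitution, article deletion, random case changes, and hidden whitespace injection. The function names, the tiny homoglyph map, and the parameters are illustrative assumptions, not the tooling used to build the benchmark.

```python
import random
import re

# Illustrative subset of Latin -> Cyrillic lookalikes (assumption: real
# attacks draw on much larger confusable tables).
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "p": "\u0440", "c": "\u0441"}
ZERO_WIDTH_SPACE = "\u200b"


def homoglyph_attack(text: str) -> str:
    """Swap selected Latin letters for visually similar Cyrillic ones."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)


def delete_articles(text: str) -> str:
    """Remove the articles a, an, the together with the whitespace after them."""
    return re.sub(r"\b(?:a|an|the)\b\s*", "", text, flags=re.IGNORECASE)


def random_case(text: str, rate: float = 0.2, seed: int = 0) -> str:
    """Flip the case of roughly `rate` of the characters (seeded for repeatability)."""
    rng = random.Random(seed)
    return "".join(ch.swapcase() if rng.random() < rate else ch for ch in text)


def inject_whitespace(text: str) -> str:
    """Insert an invisible zero-width space after every ordinary space."""
    return text.replace(" ", " " + ZERO_WIDTH_SPACE)


sample = "The city of Dublin is a capital."
print(delete_articles(sample))  # prints: city of Dublin is capital.
print(homoglyph_attack(sample) == sample)  # prints: False (looks the same, differs in code points)
```

Each transform leaves the text readable to a person while changing the exact character or token sequence a detector receives.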

Why the attacks don't work against this model

Most AI text detectors rely heavily on perplexity — a measure of how "surprising" each word choice is. Low perplexity means predictable, model-like text. That approach works until someone runs the output through a humanizer that adds variation, or swaps a few characters with Unicode lookalikes that break the tokenizer.
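
For readers who want the definition, perplexity is the exponential of the average negative log-probability a language model assigns to each token. The sketch below computes it from a list of per-token probabilities; the probability values are made up for illustration and do not come from any real model.

```python
import math


def perplexity(token_probs: list[float]) -> float:
    """Perplexity = exp(mean negative log-probability per token)."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


# Hypothetical per-token probabilities (assumed values, for illustration only).
predictable = [0.9, 0.8, 0.85, 0.9]  # every word is what the model expected
surprising = [0.2, 0.05, 0.3, 0.1]   # e.g. after misspellings or odd word swaps

print(perplexity(predictable) < perplexity(surprising))  # prints: True
```

This is why perplexity alone is fragile: anything that makes individual tokens less predictable raises the score, whether or not a person wrote the text.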

Our model doesn't depend on a single signal. It looks across multiple layers simultaneously: structural patterns, semantic consistency, token-level behavior, and document-level coherence. Humanizers change the surface. They rarely change all layers at once.

The backtranslation attack is a good example. Translating text to French and back to English changes vocabulary and sentence structure significantly. The perplexity profile shifts. But the way ideas are connected, the way arguments are constructed, the document-level consistency — those patterns survive translation and remain identifiable.

The homoglyph and whitespace attacks work by corrupting the input before it reaches the detector's tokenizer, hoping the model will fail silently on malformed text. Our preprocessing layer normalizes these before detection begins. By the time the model sees the text, the attack has already been neutralized.
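
As a rough idea of what this kind of normalization can look like, here is a minimal Python sketch. It is an assumption about the general technique, not UncovAI's actual preprocessing layer: the `normalize` function and the small `CONFUSABLES` table are hypothetical, and a production system would use a far larger confusables mapping.

```python
import re
import unicodedata

# Illustrative subset of Cyrillic/Greek lookalikes mapped back to Latin.
CONFUSABLES = str.maketrans({
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
    "\u03bf": "o",  # Greek omicron
})
# Zero-width and other invisible characters often used for injection.
INVISIBLE = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")


def normalize(text: str) -> str:
    """Undo common surface corruption before the text reaches a tokenizer."""
    text = unicodedata.normalize("NFKC", text)  # fold compatibility forms (e.g. full-width letters)
    text = INVISIBLE.sub("", text)              # drop hidden characters
    text = text.translate(CONFUSABLES)          # map lookalikes back to Latin
    return re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace


attacked = "c\u0430pit\u0430l\u200b  city"
print(normalize(attacked))  # prints: capital city
```

Blindly mapping Cyrillic or Greek letters to Latin would damage genuine text in those scripts, so a real pipeline would need to apply this step only where mixed-script lookalikes appear inside otherwise Latin words.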

Surface attacks neutralized

Homoglyphs, whitespace injection, case changes, and article deletion are cleaned before analysis. The model never sees the corrupted form.

Humanizers don't change enough

Paraphrasers alter word choice and sentence structure. They don't rewrite how an AI organizes ideas — and that's what the model reads.

Translation-resistant

Backtranslation shifts the surface vocabulary. Document-level coherence patterns survive the round trip and remain detectable.

Multi-signal analysis

No single metric drives the verdict. The model weighs structural, semantic, and statistical signals together, making single-vector attacks insufficient.
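
The intuition behind multi-signal analysis can be shown with a toy example. The sketch below is purely illustrative: the four signal names follow the layers described above, but the scores, the equal weighting, and the 0.5 threshold are assumptions and say nothing about how the real model combines its signals.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical per-layer scores in [0, 1]; higher means more AI-like."""
    structural: float
    semantic: float
    token_level: float
    document_coherence: float


THRESHOLD = 0.5  # assumed decision threshold


def combined_score(s: Signals) -> float:
    """Equal-weight average of all four signals (an assumed combination rule)."""
    return (s.structural + s.semantic + s.token_level + s.document_coherence) / 4


def is_ai(s: Signals) -> bool:
    return combined_score(s) > THRESHOLD


original = Signals(structural=0.9, semantic=0.85, token_level=0.9, document_coherence=0.8)
# A humanizer mostly disturbs token-level statistics; the other layers barely move.
humanized = Signals(structural=0.85, semantic=0.8, token_level=0.2, document_coherence=0.8)

print(humanized.token_level > THRESHOLD)  # prints: False (a token-only detector is fooled)
print(is_ai(humanized))                   # prints: True (the combined verdict still holds)
```

An attack that suppresses one signal lowers the combined score but does not flip the verdict unless it also shifts the other layers.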

What this means for real-world use

The benchmark matters because these aren't theoretical attacks. Students, content farms, and anyone trying to pass AI-generated work as their own routinely run text through QuillBot, Undetectable AI, or a quick round of backtranslation before submitting it. These tools are cheap, fast, and — until now — effective at fooling most detectors.

Educators using UncovAI's text detection to review submitted work will get accurate results even when students have explicitly tried to humanize their output. Publishers, legal teams, and compliance teams verifying document authenticity can trust the result regardless of how the source material was processed before delivery.

The false positive rate matters equally. Incorrectly flagging the US Constitution or a Harry Potter chapter as AI-generated isn't just an embarrassing error — it undermines trust in every result the tool produces. Getting both directions right is the baseline requirement, and this model meets it.
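
Both directions can be expressed as two numbers: the false positive rate on human text and the detection rate on AI text. The short Python sketch below tallies them from the samples listed in this post (two human samples, four AI samples, and thirteen attack variants); the `rates` helper is a hypothetical name used only for this illustration.

```python
# (true label, predicted label) pairs, taken from the result tables in this post.
results = (
    [("human", "human")] * 2   # US Constitution, Harry Potter excerpt
    + [("ai", "ai")] * 4       # PyCharm, France, scientific abstract, Dublin
    + [("ai", "ai")] * 13      # thirteen attack variants of the Dublin text
)


def rates(pairs: list[tuple[str, str]]) -> tuple[float, float]:
    """Return (false positive rate on human text, detection rate on AI text)."""
    humans = [pred for true, pred in pairs if true == "human"]
    ais = [pred for true, pred in pairs if true == "ai"]
    fpr = sum(p == "ai" for p in humans) / len(humans) if humans else 0.0
    detection = sum(p == "ai" for p in ais) / len(ais) if ais else 0.0
    return fpr, detection


fpr, detection = rates(results)
print(fpr, detection)  # prints: 0.0 1.0
```

With only two human samples, the false positive figure is a coarse estimate; the same tally applied to a larger human corpus would give a tighter number.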

Humanizers change the surface. They don't change how an AI thinks through a problem. That gap is what the model detects.

Part of a complete detection suite

Text is one vector. AI-generated content now spans every media format — and the same evasion logic applies everywhere. Deepfake video gets run through re-encoding tools. AI-generated images get passed through style filters. Cloned audio gets mixed with background noise.

UncovAI covers all of it. The video detection tool identifies deepfakes and synthetic footage. The audio detection tool catches cloned voices and AI-generated speech. The image detector flags generated and manipulated visuals. If you need protection across formats — or want to verify content in real time during a meeting — the real-time deepfake detection tool handles that too.

Frequently asked questions

Does the text detector work on short texts?

Yes, though accuracy improves with length. Very short samples — a sentence or two — give the model less signal to work with. A paragraph or more produces reliable results. For short-form content, combining text detection with context from the document as a whole helps.

Will it flag text that was partly written by AI and partly by a human?

Mixed-authorship documents are one of the harder problems in AI detection. The model is designed to handle hybrid content and will flag the AI-generated portions, but the overall document verdict depends on the proportion and distribution of AI-written sections. We're continuing to improve mixed-authorship detection as a specific focus area.

What languages does the text detector support?

The current model performs best on English. Multilingual support is in active development. For non-English text, results are directionally useful but should be treated with more caution than English outputs.

How does UncovAI handle new AI models not in its training data?

The detector isn't trained to recognize specific AI models. It's trained to recognize the structural and statistical properties of AI-generated text in general. New models from different providers share enough underlying characteristics that the detector generalizes well. We run continuous benchmarking against new model releases to verify this.

Can I use the text detector via API?

Yes. API access is available for teams integrating AI detection into their own workflows. Visit the products page for details on API access and usage limits, or check pricing for plan options.

The model is live. Try it now.

Every test passed. Every attack neutralized. The new AI text detector is available now — no sign-up required to run your first detection. Paste any text and see the result in seconds.
