Synthetic or Not

The Experiment

We took five AI-generated text samples (produced by GPT-4, Claude 3, Gemini, Llama 3, and Mistral) and five human-written samples (drawn from published journalism, a Reddit post, an academic paper, and a personal blog). We ran all ten samples through the same ten detection tools and recorded each tool's verdict on each sample.

Our goal was simple: find out which tools actually work, which ones give false positives, and which ones you can trust with real decisions.

The Tools We Tested

We selected tools across a range of price points and popularity:

Tool	Price Tier	Vendor Claims
GPTZero	Freemium	99% accuracy
Originality.ai	Paid	99% accuracy, 0.5% false positive
Copyleaks	Freemium	99%+ accuracy, 0.2% false positive
Winston AI	Freemium	99.98% accuracy
Pangram Labs	Paid	Near-zero false positives
ZeroGPT	Freemium	98%+ accuracy
Content at Scale	Free	Previously claimed 98%
Sapling	Freemium	97% accuracy
Writer.com	Free	No specific claims
Quillbot	Free	No specific claims

The Results

Detecting AI-Generated Text

The top performers correctly identified all five AI-generated samples:

Copyleaks and Pangram Labs both achieved a perfect 5/5 detection rate on AI text with zero false positives on human text. These were the clear winners.

GPTZero and Winston AI each caught 4 out of 5 AI samples. Both missed the Mistral-generated sample, which used a more conversational tone. Still, strong performance overall.

Originality.ai caught 4/5 but also flagged one human-written sample (the academic paper) as AI-generated, a false positive that could have serious consequences in an educational setting.
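The counts above translate into rates with some simple arithmetic. The sketch below is a minimal illustration, not the scoring method used in the test: the `ToolResult` class and its method names are hypothetical, and only tools whose AI and human counts are both stated above are included.

```python
from dataclasses import dataclass


@dataclass
class ToolResult:
    """Hypothetical container for one tool's counts on the ten samples."""
    name: str
    ai_caught: int          # AI samples correctly flagged as AI
    human_flagged: int      # human samples wrongly flagged as AI
    ai_total: int = 5       # AI samples in the test set
    human_total: int = 5    # human samples in the test set

    def detection_rate(self) -> float:
        # Share of AI samples the tool caught.
        return self.ai_caught / self.ai_total

    def false_positive_rate(self) -> float:
        # Share of human samples the tool wrongly flagged.
        return self.human_flagged / self.human_total

    def accuracy(self) -> float:
        # Correct verdicts across all ten samples.
        correct = self.ai_caught + (self.human_total - self.human_flagged)
        return correct / (self.ai_total + self.human_total)


# Counts taken from the results described above.
results = [
    ToolResult("Copyleaks", ai_caught=5, human_flagged=0),
    ToolResult("Pangram Labs", ai_caught=5, human_flagged=0),
    ToolResult("Originality.ai", ai_caught=4, human_flagged=1),
]

for r in results:
    print(f"{r.name}: detection {r.detection_rate():.0%}, "
          f"false positives {r.false_positive_rate():.0%}, "
          f"accuracy {r.accuracy():.0%}")
```

With only five samples per class, a single verdict moves the detection rate or the false positive rate by 20 percentage points, so these figures are best read as rough indicators.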

The Worst Performers

Writer.com detected zero AI-generated samples in our test. Every single one was marked as "likely human." We cannot recommend this tool for any serious use case.

ZeroGPT and Content at Scale both showed inconsistent results, catching some obvious AI text but missing more sophisticated outputs. Their accuracy hovered around 60%, far below their marketing claims.

Our Recommendations

Best Overall: Copyleaks. Accurate, affordable, and supports multiple content types including images and code.

Best Free Option: GPTZero's free tier. Limited in volume but reliable for spot-checking.

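Best for Enterprises: Pangram Labs or Hive Moderation. Both offer enterprise-grade features. Pangram Labs scored a perfect 5/5 with zero false positives in our test; Hive Moderation was not among the ten tools we tested, so treat its accuracy as unverified here.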

Best for Educators: GPTZero or Turnitin (if your institution already has a subscription). Note that Turnitin was not among the ten tools we tested.

The Bottom Line

No detection tool is perfect. The best ones hover around 90-95% real-world accuracy, not the 99%+ they claim in marketing materials. Use them as one signal among many, not as the sole basis for accusations of AI use.

The most reliable approach combines tool-based detection with human judgment: look for the patterns, check the context, and use multiple tools when the stakes are high.
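One way to put "multiple tools plus human judgment" into practice is to escalate a document to a human reviewer only when several tools agree. The sketch below is an illustration under assumptions: the function name `should_escalate`, the threshold of two agreeing tools, and the example verdicts are all hypothetical, not part of our test.

```python
def should_escalate(verdicts: dict[str, bool], min_agreeing: int = 2) -> bool:
    """Return True when enough tools flag the text to justify a human review.

    verdicts maps a tool name to True if that tool flagged the text as AI.
    min_agreeing is an assumed threshold; tune it to how costly a false
    accusation would be in your setting.
    """
    flagged = sum(1 for is_ai in verdicts.values() if is_ai)
    return flagged >= min_agreeing


# Hypothetical verdicts for one document (True = flagged as AI).
example = {"Copyleaks": True, "Pangram Labs": False, "GPTZero": False}

if should_escalate(example):
    print("Send to a human reviewer for a closer look.")
else:
    print("No escalation; a single flag is not enough on its own.")
```

The result of a check like this is a prompt for human review, never a verdict in itself.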

We Tested 10 AI Detection Tools on the Same Content. Here Are the Results.