An AI detection report lands in your inbox. The score reads 87 percent. What do you do next? If your instinct is to confront the writer or kill the piece, pause. That number means considerably less than it appears to, and acting on it without understanding what it actually measures can damage a journalist's career on the basis of a statistical guess.

This guide is for editors who need to make real decisions. We will walk through what the components of a typical detection report actually represent, where the numbers break down, and what responsible editorial practice looks like when AI detection is part of your workflow.

What the Score Actually Represents

Most AI detection tools output a probability, not a finding. A score of 87 percent does not mean that 87 percent of the text is AI-generated. It is the detector's estimate of how closely the text resembles the machine-generated samples it was trained on. The underlying technology, explained in detail in our piece on how AI content detection works, relies on statistical patterns in language, primarily how predictable each word choice is relative to what a language model would have selected. Human writers, particularly those who write clearly and concisely, make predictable word choices too, so their work can score very high on that measure.
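To make "predictable" concrete, here is a minimal sketch of perplexity, the usual summary of how predictable a passage looks to a language model. The per-token probabilities are invented for illustration; a real detector would obtain them from its own model, and the function name is ours, not any vendor's.

```python
import math

def perplexity(token_probs):
    """Perplexity of a passage, given the probability a language model
    assigned to each token that actually appears in it."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-probability per token, then exponentiate.
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# Hypothetical values: plain prose uses words the model finds likely,
# idiosyncratic prose uses words the model finds surprising.
plain_prose = [0.9, 0.8, 0.85, 0.9, 0.8]
idiosyncratic_prose = [0.3, 0.05, 0.6, 0.1, 0.2]

# Lower perplexity means more predictable text, which detectors tend to
# read as more machine-like, whoever actually wrote it.
print(perplexity(plain_prose) < perplexity(idiosyncratic_prose))
```

The point of the sketch is that nothing in the calculation knows who wrote the words. It only measures how unsurprising they are.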

The key things to look for in any report:

  • The overall score or percentage. Treat this as a signal, not a conclusion. Scores above roughly 80 percent warrant a closer look; they do not warrant an accusation.
  • Sentence-level or paragraph-level highlighting. Most tools flag individual sentences as high-risk. This granularity is useful for identifying which passages to examine editorially, but flagged sentences are often stylistically plain rather than machine-generated.
  • Confidence indicators. Some tools display a confidence band or a categorical label such as "likely AI" or "unclear." A result in the "unclear" band should be treated as uninformative, full stop.
  • The model version and training date. Detection tools are trained on specific model outputs. A tool trained primarily on GPT-3 outputs will perform differently on text produced by newer or less common models.
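
As a rough illustration, the reading rules above could be written down as a small triage function. This is a sketch, not any tool's logic: the 80 percent threshold comes from the guidance above, while the function name, the label strings, and the wording of the returned next steps are hypothetical.

```python
def triage_report(score, label=None):
    """Map a detection report to an editorial next step, never to a verdict.

    score: overall percentage from the report (0 to 100).
    label: optional categorical label from the tool, e.g. "unclear".
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be a percentage between 0 and 100")
    # An "unclear" label is treated as uninformative, whatever the number.
    if label is not None and label.strip().lower() == "unclear":
        return "uninformative: disregard this report"
    # Roughly 80 percent is the threshold for a closer look, not an accusation.
    if score > 80:
        return "closer look: review flagged passages, no accusation"
    return "no prompt from the score: apply normal editing"

print(triage_report(87))
print(triage_report(87, label="unclear"))
```

Note that none of the outcomes is "AI-written". The function only decides how much editorial attention the report deserves.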

The False Positive Problem Is Not an Edge Case

The journalism community needs to understand that false positives are not rare anomalies. Research cited by Turnitin and GPTZero themselves, as well as independent academic work, has shown that non-native English speakers are flagged at substantially higher rates than native speakers, because their writing can exhibit the kind of lexical regularity that detectors associate with machine output. Students and writers who use plain, direct prose face the same issue.

Our analysis of how AI detectors disagree with one another on identical texts reinforces this point: the same article can receive a 20 percent score from one tool and a 76 percent score from another. That variance alone should make any editor cautious about treating a single report as authoritative. If you want to understand how different leading tools compare in practice, our 2026 comparison of GPTZero, Originality AI, and Turnitin lays out where each one tends to over- and under-flag.

What a Report Cannot Tell You

A detection report cannot tell you:

  • Whether a specific human did or did not write the text.
  • Whether AI was used to generate the piece or only to lightly assist with it.
  • Whether the flagged sentences were edited, rewritten, or paraphrased by a human after generation.
  • Whether the writer had any intent to deceive, even if AI was involved.

This distinction matters editorially because many newsrooms are now operating with explicit or tacit acceptance of AI-assisted drafting for certain tasks: summaries, data tables, boilerplate sections. A detection flag on those sections is not the same thing as discovering fabricated sourcing or undisclosed ghostwriting. The editorial question is one of disclosure and policy compliance, not of the detection score itself.

A Practical Editorial Protocol

Rather than treating a detection report as a trigger for accusation, we suggest the following approach:

  • Use detection as a triage layer, not a verdict layer. If a score is high, it should prompt an editorial conversation, not a disciplinary one.
  • Cross-reference with editorial knowledge. Does the writing style match the journalist's previous work? Are the sources verifiable? Does the piece contain details, observations, or quotes that would require original reporting?
  • Run the same text through more than one tool. Divergent results are common, as detailed in our analysis of seven detectors run on the same piece. Convergence across tools raises the signal; divergence should lower your confidence in any single result.
  • Ask about process, not about the score. Invite the writer to walk you through their reporting process. That conversation will tell you far more than a percentage will.
  • Document your workflow. Whatever your newsroom's policy on AI-assisted content, apply it consistently and in writing. A detection score applied inconsistently across staff is a liability, both editorially and legally.
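
The multi-tool step in the protocol above can be sketched as a simple comparison of scores. The tool names, the 25-point spread limit, and the function name are assumptions made for illustration; the 80 percent figure and the 20 versus 76 example come from earlier in this guide.

```python
def compare_tools(scores, max_spread=25.0, high=80.0):
    """Summarise agreement between detectors run on the same text.

    scores: mapping of tool name to percentage score.
    max_spread: hypothetical limit on the gap between highest and lowest score.
    high: score above which a single report would warrant a closer look.
    """
    if len(scores) < 2:
        raise ValueError("need scores from at least two tools")
    values = list(scores.values())
    spread = max(values) - min(values)
    if spread > max_spread:
        # Divergence lowers confidence in every individual result.
        return "divergent: lower confidence in any single result"
    if min(values) > high:
        # Convergence raises the signal, which prompts a conversation only.
        return "convergent and high: ask the writer about process"
    return "convergent, not high: no added signal"

# The same article scored by two hypothetical tools.
print(compare_tools({"tool_a": 20, "tool_b": 76}))
```

Even the strongest outcome here leads to a conversation about process, in line with the triage-not-verdict principle.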

The Deeper Technical Limits

For editors who want to understand the architecture behind what they are reading, our deep technical dive into AI content detection covers how perplexity and burstiness scoring work and why both measures are inherently limited when applied to professionally edited prose. The short version: detection models are optimised for academic contexts and perform less reliably on journalistic writing, which tends to be edited for clarity and rhythm in ways that inadvertently mimic machine output patterns.
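
For a feel of what burstiness captures, here is a deliberately crude sketch. Burstiness is usually described as how much predictability varies from sentence to sentence. The sketch substitutes sentence length in words for predictability, which is our simplifying assumption and not how any named tool computes it, and the function name is hypothetical.

```python
import re
import statistics

def sentence_length_burstiness(text):
    """Coefficient of variation of sentence lengths (in words).

    A crude stand-in for burstiness: 0.0 means every sentence is the same
    length, and larger values mean a more uneven rhythm.
    """
    # Naive split on whitespace that follows ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        raise ValueError("need at least two sentences")
    return statistics.pstdev(lengths) / statistics.mean(lengths)

even = "The cat sat. The dog ran. The bird flew."
uneven = ("Stop. The committee deliberated for nearly six hours before "
          "announcing its decision. Nobody spoke.")

print(sentence_length_burstiness(even) < sentence_length_burstiness(uneven))
```

Copy that has been edited into a consistent house rhythm will tend to score low on a measure like this, which is one way professional editing can end up resembling machine output.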

The goal for our newsrooms is not to become experts in machine learning. It is to be informed enough to use these tools proportionately, to protect both editorial integrity and the journalists who work for us. A detection report is one data point. Treat it accordingly.

Sources

  • Turnitin AI Writing Detection capability documentation and accuracy disclosures, Turnitin.com
  • GPTZero documentation and published accuracy benchmarks, GPTZero.me
  • Liang, W. et al. (2023). "GPT detectors are biased against non-native English writers." Patterns, Cell Press.
  • Originality AI product documentation and classifier methodology notes, Originality.ai
  • World Editors Forum AI and Authenticity coverage, Editors Weblog