The rubric
Every check we run, in plain English. The full source is on GitHub.
A · AI crawler access (blocking)
- robots.txt fetched — present and parseable.
- 14 AI bots checked — GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, anthropic-ai, PerplexityBot, Google-Extended, CCBot, Applebot-Extended, Bytespider, Amazonbot, cohere-ai, FacebookBot, Meta-ExternalAgent.
- X-Robots-Tag header — checks for noai, noimageai, noindex.
- Meta robots — checks for noindex on the home page.
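Here is a minimal sketch of the bot check. It assumes robots.txt has already been fetched as a string. It uses Python's standard urllib.robotparser, which does simple prefix matching on paths and substring matching on user-agent names, so it may disagree with stricter parsers on edge cases. The helper name blocked_ai_bots is illustrative and is not the tool's actual code.

```python
from urllib.robotparser import RobotFileParser

# The 14 AI user-agent tokens listed in the rubric.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "anthropic-ai",
    "PerplexityBot", "Google-Extended", "CCBot", "Applebot-Extended",
    "Bytespider", "Amazonbot", "cohere-ai", "FacebookBot", "Meta-ExternalAgent",
]


def blocked_ai_bots(robots_txt: str, path: str = "/") -> list[str]:
    """Return the AI bots that robots.txt forbids from fetching `path`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded,
    # so can_fetch() works without any network request.
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, path)]


robots = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""
print(blocked_ai_bots(robots))  # ['GPTBot', 'CCBot']
```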
B · Discoverability & structure
- sitemap.xml — reachable, either at the standard path or referenced in robots.txt.
- llms.txt — present at root and non-empty (emerging standard).
- Canonical tag — present.
- lang attribute — declared on the <html> tag.
- HTTPS + HSTS — site served over HTTPS with optional HSTS hardening.
- Redirect chain — flags URLs that take more than one redirect hop to resolve.
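The sitemap rule can be sketched as building a list of candidate URLs to probe. Sitemap: lines declared in robots.txt come first, followed by the conventional /sitemap.xml. The helper sitemap_candidates is hypothetical, and the step that fetches each candidate and checks the response is left out.

```python
from urllib.parse import urljoin


def sitemap_candidates(site_root: str, robots_txt: str) -> list[str]:
    """Sitemap URLs to probe: those declared in robots.txt, then /sitemap.xml."""
    found: list[str] = []
    for line in robots_txt.splitlines():
        # Drop comments, then split on the first colon only (URLs contain colons).
        key, sep, value = line.split("#", 1)[0].partition(":")
        url = value.strip()
        # Directive names are matched case-insensitively.
        if sep and key.strip().lower() == "sitemap" and url and url not in found:
            found.append(url)
    default = urljoin(site_root, "/sitemap.xml")
    if default not in found:
        found.append(default)
    return found
```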
C · Structured data
- JSON-LD presence — at least one parseable block.
- Article / BlogPosting — requires author, datePublished, headline.
- Organization — requires name, url, logo, sameAs.
- Product — requires name, offers.
- FAQPage — at least 2 Question items.
- Microdata / RDFa — detected as fallback; we recommend migrating to JSON-LD.
- Open Graph + Twitter Card — checks og:title, og:description, og:image, og:type, twitter:card.
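The sketch below covers the required-property checks for the Article, BlogPosting, Organization and Product rules, using only the standard library. It assumes each property sits directly on its typed object, so nested objects are not searched. It handles a top-level object, a list, or an @graph array. The FAQPage, Microdata and Open Graph checks are not covered. Names such as missing_properties are hypothetical.

```python
import json
from html.parser import HTMLParser

# Required properties per @type, as listed in the rubric.
REQUIRED = {
    "Article": {"author", "datePublished", "headline"},
    "BlogPosting": {"author", "datePublished", "headline"},
    "Organization": {"name", "url", "logo", "sameAs"},
    "Product": {"name", "offers"},
}


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of each <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def missing_properties(html: str) -> dict[str, set[str]]:
    """Map each recognised @type to the required properties it lacks."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    problems: dict[str, set[str]] = {}
    for raw in extractor.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unparseable blocks are skipped in this sketch
        # A block may hold a single object, a list, or an @graph array.
        if isinstance(data, list):
            items = data
        elif isinstance(data, dict):
            items = data.get("@graph", [data])
        else:
            continue
        for item in items:
            if not isinstance(item, dict):
                continue
            types = item.get("@type", [])
            for t in types if isinstance(types, list) else [types]:
                if isinstance(t, str) and t in REQUIRED:
                    missing = REQUIRED[t].difference(item)
                    if missing:
                        problems.setdefault(t, set()).update(missing)
    return problems
```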
D · Content extractability
- Raw-HTML word count — flags pages with under 100 visible words in the raw HTML (AI crawlers generally don't execute JavaScript).
- H1 count + heading hierarchy — one h1, no skipped levels.
- Semantic landmarks — <article> or <main> present.
- Image alt-text coverage — flags low coverage on pages with 5+ images.
- Text-to-code ratio — flags pages dominated by markup over visible text.
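The heading rule (one h1, no skipped levels) can be checked by recording heading levels in document order. This sketch treats a "skip" as any heading more than one level deeper than the heading immediately before it. That interpretation is an assumption, as is the helper name heading_issues.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Records the level (1-6) of every <h1>..<h6> start tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.levels: list[int] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "H2" arrives as "h2".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_issues(html: str) -> list[str]:
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    levels = collector.levels
    issues = []
    h1_count = levels.count(1)
    if h1_count != 1:
        issues.append(f"expected exactly one h1, found {h1_count}")
    # Assumed definition of a skip: a heading more than one level deeper
    # than the heading right before it (e.g. h2 followed by h4).
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            issues.append(f"skipped level: h{prev} -> h{cur}")
    return issues
```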
E · Citability signals
- Author byline — meta author, JSON-LD author, rel="author", or "By [Name]" text pattern.
- Publish / update date — JSON-LD datePublished/dateModified, <time> with a datetime attribute, or article meta.
- Outbound authoritative links — links to .gov, .edu, Wikipedia, etc.
- Internal link density — flags very low link count on the home page.
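Here is a sketch of how an outbound link could be classified as authoritative. The rubric's list ends in "etc.", so the suffixes and domains below are only an illustrative subset. Matching on hostname suffix is an assumed approach.

```python
from urllib.parse import urlparse

# Illustrative subset only: the rubric's list ends in "etc.".
AUTHORITATIVE_SUFFIXES = (".gov", ".edu")
AUTHORITATIVE_DOMAINS = ("wikipedia.org",)


def is_authoritative(href: str, own_host: str) -> bool:
    """True if `href` points off-site to a host on the illustrative allowlist."""
    host = (urlparse(href).hostname or "").lower()
    if not host or host == own_host.lower():
        return False  # relative or same-site link
    if host.endswith(AUTHORITATIVE_SUFFIXES):
        return True
    return any(host == d or host.endswith("." + d) for d in AUTHORITATIVE_DOMAINS)
```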
F · Answer-shape
- Question-form headings — h2/h3 starting with "What/How/Why…" or ending in "?".
- Lists and tables — presence of structured content blocks.
- FAQ schema match — flags Q&A-style headings on pages without FAQPage schema.
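The question-heading and FAQ-mismatch rules come down to a text test on h2/h3 contents. This sketch assumes the heading texts have already been extracted. It checks only the three leading words the rubric names (What, How, Why) because the full word list is elided. The function names are hypothetical.

```python
import re

# Only the three words the rubric names; its full list is elided.
QUESTION_START = re.compile(r"^(what|how|why)\b", re.IGNORECASE)


def is_question_heading(text: str) -> bool:
    """True for headings that start with What/How/Why or end in '?'."""
    text = text.strip()
    return text.endswith("?") or bool(QUESTION_START.match(text))


def faq_schema_mismatch(headings: list[str], has_faqpage_schema: bool) -> bool:
    """Flag pages with question-style headings but no FAQPage schema."""
    return not has_faqpage_schema and any(is_question_heading(h) for h in headings)
```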
G · Classic SEO basics
- <title> length — 30-65 characters target.
- Meta description length — 120-160 characters target.
- Favicon + apple-touch-icon — present.
- Mobile viewport — declared.
- Core Web Vitals — fetched live from Google PageSpeed Insights (LCP, INP, CLS, performance score).
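The length checks compare a character count against the target ranges above. This sketch assumes three things: whitespace is trimmed first, length is counted in Python characters (code points), and both range ends are inclusive. The audit may measure differently.

```python
from typing import Optional

# Target ranges from the rubric, in characters; treated as inclusive (assumption).
TITLE_RANGE = (30, 65)
DESCRIPTION_RANGE = (120, 160)


def length_status(text: Optional[str], bounds: tuple[int, int]) -> str:
    """Classify a title or meta description against a (min, max) length range."""
    if text is None or not text.strip():
        return "missing"
    n = len(text.strip())
    low, high = bounds
    if n < low:
        return f"too short ({n} chars)"
    if n > high:
        return f"too long ({n} chars)"
    return "ok"
```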
H · Free extras
- Test-it-yourself deep links — prefilled queries for ChatGPT, Claude, Perplexity, and Google AI Mode, so you can see whether your page actually gets cited.
- Auto-generated llms.txt template — when missing.
- Auto-generated robots.txt patch — when AI bots are blocked.
- Auto-generated JSON-LD snippet — for the most-missing schema type.
- URL hygiene — clean, lowercase, no session params.
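Here is a sketch of the URL hygiene rule. The rubric does not say which parameters count as session parameters, so SESSION_PARAMS below is a hypothetical list. "Lowercase" is checked on the path only, since hostnames are case-insensitive anyway.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical list of session-style parameter names; the rubric names none.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid", "jsessionid"}


def url_hygiene_issues(url: str) -> list[str]:
    """Return human-readable hygiene problems found in `url`."""
    parts = urlsplit(url)
    issues = []
    if parts.path != parts.path.lower():
        issues.append("uppercase characters in path")
    # Compare query keys case-insensitively against the hypothetical list.
    keys = {k.lower() for k, _ in parse_qsl(parts.query, keep_blank_values=True)}
    if keys & SESSION_PARAMS:
        issues.append("session parameter in query string")
    return issues
```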
Scoring
Each finding penalises one or both disciplines (AI SEO, Classic SEO) by severity:
- Blocking fail: −25 points
- Blocking warn: −10 points
- Important fail: −10 points
- Important warn: −5 points
- Nice fail: −3 points
- Nice warn: −1 point
Each discipline starts at 100. Scores clamp to 0-100.
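The scoring rule can be sketched as follows. The severity and outcome labels, the Finding class, and the "ai"/"classic" discipline keys are hypothetical names. The penalty values and the clamp to 0-100 come from the list above.

```python
from dataclasses import dataclass

# Penalty per (severity, outcome), taken from the scoring list.
PENALTIES = {
    ("blocking", "fail"): 25,
    ("blocking", "warn"): 10,
    ("important", "fail"): 10,
    ("important", "warn"): 5,
    ("nice", "fail"): 3,
    ("nice", "warn"): 1,
}


@dataclass
class Finding:
    severity: str                 # "blocking", "important" or "nice"
    outcome: str                  # "fail" or "warn"
    disciplines: tuple[str, ...]  # scores it hits: "ai", "classic", or both


def score(findings: list[Finding]) -> dict[str, int]:
    """Start each discipline at 100, subtract penalties, clamp to 0-100."""
    scores = {"ai": 100, "classic": 100}
    for f in findings:
        for d in f.disciplines:
            scores[d] -= PENALTIES[(f.severity, f.outcome)]
    return {d: max(0, min(100, s)) for d, s in scores.items()}
```

For example, a blocking fail on AI SEO, an important warn on both disciplines, and a nice fail on Classic SEO give {'ai': 70, 'classic': 92}.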