An Open-Source SEO + GEO Audit Toolkit in Plain Node

Table of Contents

Why GEO? AI crawlers don't run your JavaScript
The four tools
What it found on my own site
How to check a sitemap for broken links
Philosophy
FAQ
Get it

I run this blog on a self-hosted stack, and I like knowing exactly how healthy it is — broken links, metadata, rankings, the lot. The tools that answer those questions properly start at around $99 a month, and I mostly needed the answers once a week. So over the last few weeks I built my own: four small Node scripts, each answering one question, each producing a markdown report. Today I cleaned them up and put them on GitHub.

The result is seo-geo-audit: MIT-licensed, about 1,500 lines in total, and no npm dependencies except a single one in a single tool (more on that below). Every tool is a single command, and every report is plain markdown you can read in the terminal, diff in git, or paste into an issue.

Why GEO? AI crawlers don't run your JavaScript

GEO (Generative Engine Optimization) asks whether AI answer engines like ChatGPT, Claude or Perplexity can actually read and cite your site. This stopped being theoretical for me when AI assistants started showing up as referrers in my own analytics. Those visitors are real, and they arrive because an AI read a page and linked to it.

Here is the catch: most AI crawlers fetch your raw server HTML and do not execute JavaScript. Google renders your page; GPTBot, ClaudeBot and PerplexityBot mostly don't. So structured data your framework injects client-side is invisible to them, even though every SEO browser extension, which inspects the rendered page, tells you it's fine. The same goes for metadata that streaming frameworks flush into the body instead of the initial head: Google still handles it after rendering, but a JS-less crawler misses it. My own site had six pages doing exactly that, and I only know because the crawler flags it as its own issue category.
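To make the "raw server HTML" view concrete, here is a minimal sketch of that kind of check. It is not the toolkit's actual code: it assumes you already have the unrendered HTML as a string (for example from `fetch(url)` in Node 18+), it uses regular expressions instead of a real HTML parser, and `rawHtmlGeoChecks` and its return shape are hypothetical.

```javascript
// Hypothetical sketch, not seo-audit's implementation: inspect raw server
// HTML the way a JS-less crawler would, without executing any scripts.
// Regex-based to stay dependency-free; a real HTML parser is more robust.
const PATTERNS = {
  title: /<title[\s>]/i,
  description: /<meta[^>]+name=["']description["']/i,
  canonical: /<link[^>]+rel=["']canonical["']/i,
};
const JSON_LD = /<script[^>]+type=["']application\/ld\+json["']/i;

function rawHtmlGeoChecks(html) {
  const lower = html.toLowerCase();
  // Split at the end of the initial <head>; fall back to the <body> tag.
  let split = lower.indexOf("</head>");
  if (split === -1) split = lower.search(/<body[\s>]/);
  if (split === -1) split = html.length;
  const head = html.slice(0, split);
  const body = html.slice(split);

  const names = Object.keys(PATTERNS);
  return {
    // JSON-LD present in the raw HTML is readable without JavaScript.
    hasJsonLd: JSON_LD.test(html),
    // Tags that only appear after </head>, e.g. flushed late by a streaming renderer.
    streamedToBody: names.filter((n) => !PATTERNS[n].test(head) && PATTERNS[n].test(body)),
    // Tags absent from the raw HTML altogether.
    missing: names.filter((n) => !PATTERNS[n].test(html)),
  };
}
```

If `hasJsonLd` is false here while the rendered page does carry structured data, that JSON-LD is being injected client-side. One known false positive of this simplification: an inline SVG `<title>` in the body counts as a streamed title when the head has none.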

The four tools

seo-audit crawls your sitemap, then every internal link and image target, and analyzes the raw server HTML — deliberately the JS-less view. It covers the classic checks (titles, descriptions, canonicals, hreflang, Open Graph, broken links with redirect-chain resolution, sitemap hygiene) plus the GEO set: client-side-only JSON-LD, metadata streamed to the body, heading-outline gaps, thin content, llms.txt presence, and AI crawlers blocked in robots.txt.

seo-audit/run.sh https://your-site.com
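The "AI crawlers blocked in robots.txt" check is easy to get subtly wrong, because a group that names a crawler overrides the `*` group for that crawler (RFC 9309). Below is a simplified sketch of that logic, not the toolkit's own parser: `blockedAiBots` and the bot list are illustrative, and it only detects a site-wide `Disallow: /`, ignoring path rules and wildcards.

```javascript
// Hypothetical sketch: which AI crawlers does a robots.txt shut out of the whole site?
// Simplified on purpose: only a root-level "Disallow: /" counts as blocked;
// path-specific rules and * / $ wildcards are ignored.
const AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot"];

function parseRobotsGroups(text) {
  const groups = [];
  let current = null;
  let lastWasAgent = false;
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, "").trim(); // drop comments
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one rule group.
      if (!lastWasAgent) {
        current = { agents: [], allow: [], disallow: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase()); // matching is case-insensitive
      lastWasAgent = true;
    } else {
      lastWasAgent = false;
      if (current && field === "allow") current.allow.push(value);
      if (current && field === "disallow") current.disallow.push(value);
    }
  }
  return groups;
}

function blockedAiBots(robotsTxt, bots = AI_BOTS) {
  const groups = parseRobotsGroups(robotsTxt);
  return bots.filter((bot) => {
    // A crawler follows the group(s) naming it; "*" applies only if none do.
    let rules = groups.filter((g) => g.agents.includes(bot.toLowerCase()));
    if (rules.length === 0) rules = groups.filter((g) => g.agents.includes("*"));
    const disallowRoot = rules.some((g) => g.disallow.includes("/"));
    const allowRoot = rules.some((g) => g.allow.includes("/"));
    return disallowRoot && !allowRoot; // on an equal-length tie, Allow wins
  });
}
```

For example, a file with `User-agent: GPTBot` / `Disallow: /` followed by `User-agent: *` / `Allow: /` blocks only GPTBot; ClaudeBot and PerplexityBot fall through to the permissive `*` group.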

perf-audit is the one tool with a dependency: it drives a real browser via Playwright. It measures lab metrics (the Core Web Vitals LCP and CLS, plus FCP, TTFB and TBT) against Google's thresholds, pulls real-user CrUX field data, including INP, through the free PageSpeed Insights API, tracks a performance budget per page, and, most usefully, captures the post-hydration DOM so you can diff what Google sees against what AI crawlers see.
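Rating a measurement against Google's thresholds is a small lookup. The sketch below uses the published web.dev boundaries (good at or below the first value, poor above the second); `rate`, `checkBudget` and the budget object shape are illustrative names, not perf-audit's API. TBT is left out to keep the table to metrics with web.dev thresholds.

```javascript
// Hypothetical sketch: rate measured values against the web.dev thresholds.
// Each entry is [good upper bound, poor lower bound]; times in milliseconds.
const THRESHOLDS = {
  LCP: [2500, 4000],
  INP: [200, 500],
  CLS: [0.1, 0.25], // unitless layout-shift score
  FCP: [1800, 3000],
  TTFB: [800, 1800],
};

function rate(metric, value) {
  const t = THRESHOLDS[metric];
  if (!t) throw new Error(`no threshold for ${metric}`);
  const [good, poor] = t;
  if (value <= good) return "good";
  if (value > poor) return "poor";
  return "needs-improvement";
}

// Per-page budget check; `budget` maps a metric name to the maximum you accept.
function checkBudget(measured, budget) {
  return Object.entries(budget)
    .filter(([metric, limit]) => metric in measured && measured[metric] > limit)
    .map(([metric, limit]) => ({ metric, value: measured[metric], limit }));
}
```

For instance, `rate("LCP", 3100)` returns `"needs-improvement"`, and a page measured at `{ LCP: 3000 }` against a budget of `{ LCP: 2500 }` yields one violation.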

gsc-fetch talks to the Search Console API and computes the two lists a solo operator actually acts on: striking-distance queries (average position 5–20 with real impressions, close enough that one title tweak can move them onto page 1 or up it) and low-CTR winners (already in the top 5 but earning fewer clicks than the position implies, with an estimate of the clicks left on the table). That second list is literally my editorial backlog now.
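Both lists are simple filters over Search Console's query rows, which carry `keys`, `clicks`, `impressions`, `ctr` and `position`. The sketch below is one possible way to compute them, not gsc-fetch's code: the "real impressions" floor (`minImpressions = 50`) and the "fewer clicks than the position implies" rule (below half the site's own average CTR at that rounded position) are hypothetical choices.

```javascript
// Hypothetical sketch, not gsc-fetch's code. Rows follow the Search Console
// searchAnalytics.query shape: { keys: [query], clicks, impressions, ctr, position }.

// Queries ranking 5-20 with enough impressions to be worth a title tweak.
function strikingDistance(rows, { minImpressions = 50 } = {}) {
  return rows
    .filter((r) => r.position >= 5 && r.position <= 20 && r.impressions >= minImpressions)
    .sort((a, b) => b.impressions - a.impressions);
}

// Top-5 queries whose CTR is well below the site's own average at that position.
function lowCtrWinners(rows, { maxPosition = 5, ratio = 0.5, minImpressions = 50 } = {}) {
  const bucket = (r) => Math.max(1, Math.round(r.position));
  // Baseline: aggregate clicks / impressions per rounded position, top 5 only.
  const agg = new Map();
  for (const r of rows) {
    if (r.position > maxPosition) continue;
    const a = agg.get(bucket(r)) ?? { clicks: 0, impressions: 0 };
    a.clicks += r.clicks;
    a.impressions += r.impressions;
    agg.set(bucket(r), a);
  }
  const out = [];
  for (const r of rows) {
    if (r.position > maxPosition || r.impressions < minImpressions) continue;
    const a = agg.get(bucket(r));
    const expectedCtr = a.clicks / a.impressions;
    const ctr = r.clicks / r.impressions;
    if (ctr < expectedCtr * ratio) {
      out.push({
        query: r.keys[0],
        position: r.position,
        ctr,
        expectedCtr,
        // Clicks the query would get at the baseline CTR, minus what it got.
        missedClicks: Math.round(r.impressions * expectedCtr - r.clicks),
      });
    }
  }
  return out.sort((a, b) => b.missedClicks - a.missedClicks);
}
```

Using the site's own CTR as the baseline avoids importing someone else's CTR-by-position curve, at the cost of a noisy baseline when only a few queries rank in the top 5.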

umami-fetch pulls data from a self-hosted Umami v3 instance: traffic channels, top pages, custom events, UTM campaigns, and a datacenter-adjusted totals row, because one datacenter country turned out to account for a third of my “visits” before I started subtracting it. Umami's API filters only by equality (you can ask for country X, but not for everything except X), so the tool fetches each suspect country separately and subtracts it from the totals.
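The adjustment itself is plain subtraction, which works on the assumption that each visit is attributed to exactly one country. A minimal sketch with hypothetical names (`adjustTotals`, the placeholder country code `XX`, metrics flattened to plain numbers) and illustrative figures; it also shows why a bot share of one third means the raw numbers overstate real traffic by half.

```javascript
// Hypothetical sketch: subtract per-country stats (one equality-filtered
// request per suspect country) from the unfiltered totals. Assumes metrics
// are flattened to plain numbers, e.g. { visitors: 900, pageviews: 3000 }.
function adjustTotals(totals, bySuspectCountry) {
  const adjusted = { ...totals };
  for (const stats of Object.values(bySuspectCountry)) {
    for (const key of Object.keys(adjusted)) {
      adjusted[key] -= stats[key] ?? 0;
    }
  }
  return adjusted;
}

// How much the raw figure overstates the adjusted one: 0.5 means +50%.
function inflation(totals, adjusted, key) {
  return totals[key] / adjusted[key] - 1;
}

// Illustrative numbers only: one suspect country ("XX") is a third of all visitors.
const totals = { visitors: 900, pageviews: 3000 };
const adjusted = adjustTotals(totals, { XX: { visitors: 300, pageviews: 600 } });
console.log(adjusted); // { visitors: 600, pageviews: 2400 }
console.log(inflation(totals, adjusted, "visitors")); // 0.5
```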

What it found on my own site

Eating my own dog food was sobering: six pages with metadata invisible to JS-less crawlers, eighty meta descriptions over the audit's length threshold, a broken internal link target I had missed for weeks, bot traffic inflating my visitor numbers by half, and a brand query ranking #1 with a 0% click-through rate. None of this was visible in any single dashboard I had. An audit you can re-run in thirty seconds is much harder to ignore than a subscription you check monthly.

How to check a sitemap for broken links

The job the toolkit does for me most often is checking the sitemap for broken links. The crawler fetches every URL in your sitemap, then follows every internal link and image target on those pages and reports anything that does not resolve:

git clone https://github.com/lireking/seo-geo-audit
cd seo-geo-audit
seo-audit/run.sh https://your-site.com

The report flags sitemap URLs that return 4xx or 5xx, plus every page that contains broken internal links or images. I run exactly this check as a nightly cron job on cloudapp.dev, so a broken link gets flagged within a day of appearing.
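The first half of that check, sitemap URLs only, fits in a few dozen lines of dependency-free Node. This is a minimal sketch, not seo-audit's code, assuming Node 18+ (global `fetch`, `AbortSignal.timeout`) and a plain `<urlset>` sitemap rather than a sitemap index; it does not follow each page's own links and images the way the real crawler does, and the script name below is hypothetical.

```javascript
// Minimal sketch (not seo-audit's code): report sitemap URLs that fail to resolve.
// Assumes Node 18+ and a plain <urlset> sitemap, not a sitemap index.

// Pull every <loc> value out of the sitemap XML and decode XML entities.
function parseSitemapLocs(xml) {
  const locs = [];
  const re = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  let m;
  while ((m = re.exec(xml)) !== null) {
    locs.push(
      m[1]
        .replace(/&lt;/g, "<")
        .replace(/&gt;/g, ">")
        .replace(/&quot;/g, '"')
        .replace(/&apos;/g, "'")
        .replace(/&amp;/g, "&") // last, so "&amp;lt;" is not double-decoded
    );
  }
  return locs;
}

// HEAD is cheap; fall back to GET for servers that reject HEAD (405/501).
async function checkUrl(url) {
  const init = (method) => ({ method, redirect: "follow", signal: AbortSignal.timeout(10_000) });
  let res = await fetch(url, init("HEAD"));
  if (res.status === 405 || res.status === 501) res = await fetch(url, init("GET"));
  if (res.body) await res.body.cancel(); // we only need the status, not the body
  return { url, status: res.status };
}

async function auditSitemap(sitemapUrl) {
  const res = await fetch(sitemapUrl);
  if (!res.ok) throw new Error(`sitemap returned HTTP ${res.status}`);
  const broken = [];
  for (const url of parseSitemapLocs(await res.text())) {
    try {
      const r = await checkUrl(url);
      if (r.status >= 400) broken.push(r); // 4xx and 5xx count as broken
    } catch (err) {
      broken.push({ url, status: `error: ${err.message}` }); // DNS, TLS, timeout...
    }
  }
  return broken;
}

// Only runs when a sitemap URL is passed on the command line.
const sitemapArg = process.argv[2];
if (sitemapArg) {
  auditSitemap(sitemapArg)
    .then((broken) => {
      for (const b of broken) console.log(`${b.status}\t${b.url}`);
      process.exitCode = broken.length > 0 ? 1 : 0;
    })
    .catch((err) => {
      console.error(err.message);
      process.exitCode = 2;
    });
}
```

Run it as `node check-sitemap.js https://your-site.com/sitemap.xml`. It prints one line per failing URL and exits with status 1 if anything is broken, which makes it easy to wire into a nightly cron job.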

Philosophy

One command, one markdown report. No build step, no config file with eighty options, no platform. Plain Node scripts you can read in one sitting and edit to your needs — sharp little knives, not a Swiss army knife. If a check doesn't apply to your stack, delete it; it's your copy.

FAQ

Is it free?

Yes — MIT license, no paid tier. The only optional costs are third-party APIs: PageSpeed Insights is free with a Google API key, Search Console is free, and backlink data is the one thing that genuinely has no free source (the tool plugs into a paid provider if you have one, and says so honestly if you don't).

What exactly is GEO?

Generative Engine Optimization: making your content readable, parseable and citable for AI answer engines. In practice it overlaps heavily with technical SEO — the difference is the renderer. AI crawlers read raw HTML, so anything that only exists after JavaScript runs does not exist for them.

Do I need API keys?

The crawler needs nothing — clone and run against any site. PageSpeed field data needs a free Google API key, Search Console needs a one-time OAuth flow (the kit includes a dependency-free helper that mints the refresh token), and the analytics tool needs your own Umami login.
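For the curious, the per-run part of that OAuth flow is short: trade the long-lived refresh token for an access token at Google's token endpoint, then call the Search Analytics query method. A minimal sketch assuming Node 18+ and your own OAuth client ID and secret with the `webmasters.readonly` scope; the function names are hypothetical, and this is not the kit's helper, which covers the one-time step of minting the refresh token.

```javascript
// Minimal sketch, not the kit's helper: refresh an OAuth access token and query
// Search Console. Assumes Node 18+ and your own OAuth client credentials.
const TOKEN_URL = "https://oauth2.googleapis.com/token";

// Form body for the refresh_token grant (fetch sends it form-urlencoded).
function refreshRequestBody({ clientId, clientSecret, refreshToken }) {
  return new URLSearchParams({
    grant_type: "refresh_token",
    client_id: clientId,
    client_secret: clientSecret,
    refresh_token: refreshToken,
  });
}

async function getAccessToken(creds) {
  const res = await fetch(TOKEN_URL, { method: "POST", body: refreshRequestBody(creds) });
  if (!res.ok) throw new Error(`token refresh failed: HTTP ${res.status}`);
  return (await res.json()).access_token;
}

// siteUrl is the property as Search Console names it, e.g. "https://example.com/"
// or "sc-domain:example.com"; it must be URL-encoded into the path.
function searchAnalyticsUrl(siteUrl) {
  return `https://www.googleapis.com/webmasters/v3/sites/${encodeURIComponent(siteUrl)}/searchAnalytics/query`;
}

async function queryRows(accessToken, siteUrl, startDate, endDate) {
  const res = await fetch(searchAnalyticsUrl(siteUrl), {
    method: "POST",
    headers: { Authorization: `Bearer ${accessToken}`, "Content-Type": "application/json" },
    body: JSON.stringify({ startDate, endDate, dimensions: ["query"], rowLimit: 1000 }),
  });
  if (!res.ok) throw new Error(`searchAnalytics.query failed: HTTP ${res.status}`);
  return (await res.json()).rows ?? []; // rows is omitted when there is no data
}
```

`startDate` and `endDate` are `YYYY-MM-DD` strings. Access tokens are short-lived, so refreshing once at the start of each run is usually enough for a weekly report.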

Get it

git clone https://github.com/lireking/seo-geo-audit
cd seo-geo-audit
seo-audit/run.sh https://your-site.com

That's all there is to it — the repo is at github.com/lireking/seo-geo-audit. If it flags something on your site that it shouldn't (or misses something it should catch), open an issue. PRs welcome — the scope stays small on purpose.