Frequently Asked Questions

Why must hreflang annotations be bidirectional, and what breaks if they are not?

Google requires that every language version of a page lists all other versions, including itself. If the English page references the Indonesian page but the Indonesian page does not reference back, the unpaired annotations are silently ignored — no error, no warning. The result is that Google cannot reliably route Indonesian searchers to the correct locale, which is the whole purpose of hreflang.

Why should you never point an Indonesian page's canonical tag at the English version?

A canonical tag tells Google which address is the authoritative one for that content. Pointing the Indonesian page's canonical at the English URL signals to Google that the Indonesian content is a duplicate and should be dropped from the index. For a bilingual site, each locale version must be its own canonical so both language versions remain eligible to rank.

Why is robots.txt the wrong tool for hiding duplicate content, and what should be used instead?

A page that is blocked by robots.txt cannot be crawled, which means Google never reads the canonical tag placed on it. Since canonical and Disallow are mutually exclusive tools, blocking a page prevents the very signal you need to consolidate duplicate URLs. The correct approach is to let the page be crawled and declare the authoritative URL with a canonical tag instead.

How does generating canonical and hreflang from a single shared helper prevent SEO regressions?

When canonical and the full language map are computed in one place, a page cannot declare one without the other. If each template generates its own hreflang independently, a refactor can silently break the bidirectional contract on some pages with no visible error. Centralizing the logic in one helper, called from generateMetadata in the App Router, enforces completeness structurally instead of relying on per-template discipline.

What is the fastest way to catch SEO regressions in CI before they reach production?

The post recommends adding a CI step that fetches a small set of rendered pages — the home page and one localized post in each language — and asserts on the presence of a correct self-referencing canonical, a complete reciprocal hreflang set, and parseable JSON-LD. This takes about an hour to write and catches the class of mistakes — missing annotations, broken bidirectionality, malformed structured data — that are invisible in the browser and in Lighthouse.

Technical SEO for Developers: Canonical, hreflang, JSON-LD

Technical SEO has a reputation problem among developers: it sounds like marketing, so we ignore it until someone asks why the Indonesian version of a page ranks in Germany, or why Google indexed the staging domain. But canonical tags, hreflang, structured data, and sitemaps are not marketing — they are protocol-level contracts between your HTML and a crawler. Getting them wrong is a bug, with the same root causes as any other bug.

This checklist comes from shipping this very portfolio — a bilingual English-Indonesian Next.js site with a couple hundred blog posts — plus a handful of client builds. Everything here is the developer half of SEO: things you fix in code, verify with curl, and regression-test in CI. No keyword research, no content strategy, just the plumbing that decides whether your content is even eligible to rank.

Canonical URLs: One Address per Piece of Content

Every page on your site is reachable through more URLs than you think: with and without trailing slash, with UTM parameters, via http before the redirect, sometimes through both a vanity route and a real one. To Google these are separate pages competing against each other, splitting ranking signals and burning crawl budget. The canonical tag is how you declare which address is the real one — and per Google's documentation, redirects are an even stronger signal, with sitemap inclusion the weakest.
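To make "one address" concrete, here is a minimal sketch of a normalizer that collapses those variants. It assumes a single https origin (the placeholder https://example.com), a no-trailing-slash policy, and a hypothetical list of tracking parameters; `canonicalize` is an illustrative name, not a Next.js API. Where you can, pair it with a redirect to the normalized URL, since redirects are the stronger signal.

```typescript
// Hypothetical normalizer: collapses URL variants into one absolute canonical
// address. Assumes every input belongs to this site; the https origin,
// no-trailing-slash rule and tracking-param list are policy assumptions.
const CANONICAL_ORIGIN = "https://example.com"

// Parameters assumed never to change page content.
const TRACKING_PARAMS = /^(utm_[a-z]+|gclid|fbclid)$/i

export function canonicalize(rawUrl: string): string {
  // Resolve relative input against the origin so the result is always absolute.
  const url = new URL(rawUrl, CANONICAL_ORIGIN)

  // Rebuild on the single public origin: drops http:// and www. variants.
  const canonical = new URL(url.pathname, CANONICAL_ORIGIN)

  // Keep only content-changing params, sorted so the string is stable.
  const kept = [...url.searchParams.entries()]
    .filter(([key]) => !TRACKING_PARAMS.test(key))
    .sort(([a], [b]) => a.localeCompare(b))
  for (const [key, value] of kept) canonical.searchParams.append(key, value)

  // Trailing-slash policy: strip it everywhere except the root path.
  if (canonical.pathname.length > 1 && canonical.pathname.endsWith("/")) {
    canonical.pathname = canonical.pathname.replace(/\/+$/, "")
  }
  return canonical.toString()
}
```

The same function can feed both the canonical tag and a redirect check in middleware, so the two signals never disagree.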

In the App Router the clean implementation is the alternates field of the Metadata API, computed in generateMetadata so every dynamic route declares itself. Two rules I enforce: canonicals are always absolute URLs, and every locale version is its own canonical. Pointing the Indonesian page's canonical at the English page is a classic mistake — it tells Google the Indonesian content is a duplicate to be dropped, which is the exact opposite of what a bilingual site wants.

```tsx
// app/[locale]/blog/[slug]/page.tsx: canonical + hreflang in one place
import type { Metadata } from "next"
import { getPost } from "@/lib/posts" // hypothetical loader for post data

type Props = {
  // Next.js 15 passes route params as a Promise
  params: Promise<{ locale: string; slug: string }>
}

export async function generateMetadata({ params }: Props): Promise<Metadata> {
  const { locale, slug } = await params
  const post = await getPost(locale, slug)
  const base = "https://example.com"

  return {
    title: post.title,
    description: post.description,
    alternates: {
      // ONE canonical per language version, not one shared
      // canonical pointing every locale at English.
      canonical: `${base}/${locale}/blog/${slug}`,
      languages: {
        en: `${base}/en/blog/${slug}`,
        id: `${base}/id/blog/${slug}`,
        // safety net for every unmatched visitor:
        "x-default": `${base}/en/blog/${slug}`,
      },
    },
  }
}
```

hreflang: The Contract That Must Be Bidirectional

hreflang tells Google which language versions of a page exist so it can route Indonesian searchers to /id/ and everyone else to /en/. The implementation rules from Google's docs are strict enough that most sites get at least one wrong:

Every version must list ALL versions including itself. If the English page references the Indonesian one, the Indonesian page must reference back — annotations that are not reciprocated get ignored.
Codes are ISO 639-1 language plus optional ISO 3166-1 region: id, en, en-GB. Made-up codes fail silently.
x-default declares the fallback for users matching no listed language — point it at your primary locale.
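These rules can be checked mechanically across a crawl. A minimal sketch, assuming you have already collected, for each page URL, the hreflang annotations it emits; `findHreflangProblems` and the input shape are hypothetical, and the code check is deliberately a shape check only.

```typescript
// Hypothetical crawl audit: each page URL maps to the hreflang annotations it
// emits (hreflang code -> absolute href). The input shape is an assumption.
type HreflangMap = Record<string, string>

// Shape check only (language, optional region). It does not verify that the
// code exists in ISO 639-1, and it rejects valid script subtags like zh-Hant.
const CODE_SHAPE = /^[a-z]{2}(-[a-z]{2})?$/i

export function findHreflangProblems(pages: Record<string, HreflangMap>): string[] {
  const problems: string[] = []
  for (const [pageUrl, annotations] of Object.entries(pages)) {
    // Every version must list itself...
    if (!Object.values(annotations).includes(pageUrl)) {
      problems.push(`${pageUrl}: missing self-reference`)
    }
    // ...and declare a fallback for unmatched users.
    if (!("x-default" in annotations)) {
      problems.push(`${pageUrl}: missing x-default`)
    }
    for (const [code, target] of Object.entries(annotations)) {
      if (code === "x-default") continue
      if (!CODE_SHAPE.test(code)) {
        problems.push(`${pageUrl}: invalid hreflang code "${code}"`)
      }
      if (target === pageUrl) continue
      // Reciprocity: the alternate page must point back at this one.
      const back = pages[target]
      if (!back || !Object.values(back).includes(pageUrl)) {
        problems.push(`${pageUrl}: ${code} -> ${target} is not reciprocated`)
      }
    }
  }
  return problems
}
```

An empty result means every pair is reciprocal and every page has a self-reference and an x-default.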

The mistake I made on an early version of this site: my hreflang annotations were generated in one component, but a refactor left one template emitting only the current locale's link. No error, no warning: hreflang was simply void on those pages because the bidirectional contract broke. The fix that sticks is structural: generate canonical and the full languages map from a single helper, so a page cannot declare one without the other.
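A minimal sketch of such a helper, assuming this site's two locales, English as the x-default, and the placeholder origin https://example.com. `buildAlternates` is a hypothetical name; its return value mirrors the `alternates` field of the Next.js Metadata object, so generateMetadata can return it as `alternates` once `locale` has been narrowed to a known value.

```typescript
// Hypothetical shared helper: the only place that knows the locale list, so
// canonical and the complete hreflang map are always emitted together.
const BASE_URL = "https://example.com"
const LOCALES = ["en", "id"] as const
type Locale = (typeof LOCALES)[number]
const DEFAULT_LOCALE: Locale = "en"

// Mirrors the shape of Metadata["alternates"] (canonical + languages).
export interface Alternates {
  canonical: string
  languages: Record<string, string>
}

/** `path` is the locale-less route starting with "/", e.g. "/blog/my-post". */
export function buildAlternates(locale: Locale, path: string): Alternates {
  const urlFor = (l: Locale) => `${BASE_URL}/${l}${path}`
  const languages: Record<string, string> = {}
  // Every locale is always listed, including the current one.
  for (const l of LOCALES) languages[l] = urlFor(l)
  languages["x-default"] = urlFor(DEFAULT_LOCALE)
  return { canonical: urlFor(locale), languages }
}
```

Adding a third locale then means editing `LOCALES` once, and every page picks up the reciprocal link automatically.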

hreflang failures are invisible in the browser and in Lighthouse. Search Console no longer has a dedicated International Targeting report (Google retired it in 2022), so the dependable check is raw HTML inspection. If you have not curled your production pages and read the link tags with your own eyes, you do not know your hreflang works.

Structured Data: JSON-LD or Nothing

Structured data is how you tell Google what a page is, not just what it says — this is an Article, by this Person, published on this date. Google recommends the JSON-LD format over microdata, and in a Server Component it costs nothing: build a plain object, serialize it into a script tag, done. No client JavaScript, no library.

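```tsx
// JSON-LD in a Server Component: no client JS needed
import { getPost } from "@/lib/posts" // hypothetical loader for post data

type Props = {
  params: Promise<{ locale: string; slug: string }>
}

export default async function BlogPost({ params }: Props) {
  const { locale, slug } = await params
  const post = await getPost(locale, slug)

  const jsonLd = {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: post.title,
    datePublished: post.datePublished,
    author: { "@type": "Person", name: "Matthews Wong" },
    image: `https://example.com${post.image}`,
  }

  // JSON.stringify does not escape "<", so a title containing "</script>"
  // could close the tag early; replacing "<" with its \u003c escape prevents that.
  const serialized = JSON.stringify(jsonLd).replace(/</g, "\\u003c")

  return (
    <>
      <script
        type="application/ld+json"
        dangerouslySetInnerHTML={{ __html: serialized }}
      />
      <article>...</article>
    </>
  )
}
```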

Be honest in the markup. Rich-result eligibility depends on the required properties for each schema type, and Google enforces its structured data policies: marking up ratings the content does not have, or authorship it cannot show, is how sites earn manual actions. I keep it boring: Article for posts, Person for the about page, BreadcrumbList where the UI actually shows breadcrumbs. Validate every template in the Rich Results Test before shipping, because a malformed JSON-LD block fails silently and takes the page's rich-result eligibility with it.
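For the BreadcrumbList case, a small builder keeps positions and absolute URLs consistent. A sketch, assuming a hypothetical `breadcrumbJsonLd` helper fed by the same data that renders the visible breadcrumb trail; its output is serialized into a script tag exactly like the Article object.

```typescript
// Hypothetical BreadcrumbList builder. Call it only on pages whose UI
// actually renders this breadcrumb trail.
interface Crumb {
  name: string
  url: string // absolute URL of the crumb's page
}

export function breadcrumbJsonLd(crumbs: Crumb[]) {
  return {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    itemListElement: crumbs.map((crumb, index) => ({
      "@type": "ListItem",
      position: index + 1, // ListItem positions are 1-based
      name: crumb.name,
      item: crumb.url,
    })),
  }
}
```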

Sitemaps and robots.txt: The Crawler's Map

The App Router turned sitemaps from an annoying build artifact into a typed function: app/sitemap.ts exports your URL list, generated from the same data source that renders the pages — on this site, the blog registry feeds both, so a new post cannot exist without a sitemap entry. Include lastModified honestly from real content dates; crawlers use it to prioritize re-crawls, and a sitemap where everything changed today reads as noise.
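A sketch of that function, assuming a hypothetical in-memory registry (`posts`, with placeholder slugs and dates) standing in for the real blog data. In a Next.js project the return type would be `MetadataRoute.Sitemap` from "next"; a local type replaces it here so the snippet runs on its own.

```typescript
// app/sitemap.ts sketch. SitemapEntry stands in for MetadataRoute.Sitemap.
type SitemapEntry = { url: string; lastModified: Date }

interface PostRecord {
  slug: string
  locale: "en" | "id"
  updatedAt: string // ISO date of the last real content change
}

// Placeholder registry; the real one is the module the pages render from.
const posts: PostRecord[] = [
  { slug: "example-post", locale: "en", updatedAt: "2024-05-01" },
  { slug: "example-post", locale: "id", updatedAt: "2024-05-03" },
]

export default function sitemap(): SitemapEntry[] {
  return posts.map((post) => ({
    url: `https://example.com/${post.locale}/blog/${post.slug}`,
    // Truthful lastModified: the content's own date, never the build time.
    lastModified: new Date(post.updatedAt),
  }))
}
```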

robots.txt via app/robots.ts follows the same pattern. Keep it minimal: point at the sitemap, block genuinely useless paths like API routes, and never use robots.txt to hide duplicate content — a blocked page cannot be crawled, which means Google can never see the canonical tag you put on it. Disallow and canonical are mutually exclusive tools, a subtlety straight out of Google's consolidation docs.
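The matching robots function, sketched under the same assumptions (placeholder origin, a local type in place of `MetadataRoute.Robots`). It blocks only the API routes and points at the sitemap; duplicate URLs stay crawlable so their canonicals can be read.

```typescript
// app/robots.ts sketch. RobotsConfig stands in for MetadataRoute.Robots.
type RobotsConfig = {
  rules: { userAgent: string; allow?: string; disallow?: string | string[] }
  sitemap: string
}

export default function robots(): RobotsConfig {
  return {
    // Block only genuinely useless paths; never use Disallow for duplicates.
    rules: { userAgent: "*", allow: "/", disallow: "/api/" },
    sitemap: "https://example.com/sitemap.xml",
  }
}
```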

The Parts Developers Forget

Four items that are absolutely the developer's job, found broken on most sites I audit:

Honest status codes

A missing page must return 404, not a styled error component with HTTP 200. Soft 404s pollute the index, and redirect chains bleed signal — permanent moves get a single 301, not a 302 left over from testing.
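A sketch for keeping every move to a single hop, assuming redirects live in a plain source-to-destination table (a hypothetical shape; in Next.js you would map it into next.config's `redirects()` entries). `flattenRedirects` points every source straight at its final destination and throws on loops. One Next.js detail: `permanent: true` produces a 308 rather than a 301, and Google treats both as permanent.

```typescript
// Hypothetical redirect-table audit: rewrites chains (a -> b -> c) so every
// source points directly at its final destination, and rejects loops.
export function flattenRedirects(table: Record<string, string>): Record<string, string> {
  const hops = new Map(Object.entries(table))
  const flat: Record<string, string> = {}
  for (const [source, first] of hops) {
    const seen = new Set([source])
    let target = first
    // Follow the chain until we land on a path that is not itself redirected.
    while (hops.has(target)) {
      if (seen.has(target)) throw new Error(`Redirect loop involving ${target}`)
      seen.add(target)
      target = hops.get(target)!
    }
    flat[source] = target
  }
  return flat
}
```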

One H1 and a real heading tree

Crawlers reconstruct document structure from headings, same as screen readers. The accessibility audit and the SEO audit converge here: heading hierarchy fixes serve both masters for free.
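Both audits check the same two things, which makes them easy to automate. A minimal sketch, assuming you have already extracted heading levels in document order (for example from rendered HTML in a test); `auditHeadings` is a hypothetical helper.

```typescript
// Hypothetical heading-tree check. `levels` lists heading levels in document
// order, e.g. [1, 2, 3, 2] for h1, h2, h3, h2.
export function auditHeadings(levels: number[]): string[] {
  const issues: string[] = []
  const h1Count = levels.filter((level) => level === 1).length
  if (h1Count !== 1) issues.push(`expected exactly one h1, found ${h1Count}`)
  // Going deeper by more than one level (h2 -> h4) breaks the outline;
  // jumping back up (h3 -> h2) is fine.
  for (let i = 1; i < levels.length; i++) {
    if (levels[i] > levels[i - 1] + 1) {
      issues.push(`h${levels[i - 1]} followed by h${levels[i]} skips a level`)
    }
  }
  return issues
}
```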

Metadata for the link preview economy

Open Graph and Twitter card tags decide how every share on WhatsApp and LinkedIn renders. For Indonesian audiences, where WhatsApp link sharing dominates traffic, a missing og:image measurably depresses click-through.
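A sketch of a helper that fills both tag families from one input, assuming the page's canonical URL and an absolute image URL are at hand. `shareMetadata` is hypothetical; the field names follow the `openGraph` and `twitter` objects of the Next.js Metadata API, while the 1200x630 image size and the `en_US`/`id_ID` locale mapping are assumptions to adjust.

```typescript
// Hypothetical helper returning the openGraph/twitter part of a Metadata object.
interface ShareInput {
  title: string
  description: string
  url: string // absolute canonical URL of the page
  imageUrl: string // absolute URL of the preview image
  locale: "en" | "id"
}

export function shareMetadata(input: ShareInput) {
  // 1200x630 is a common preview size, assumed here rather than required.
  const image = { url: input.imageUrl, width: 1200, height: 630, alt: input.title }
  return {
    openGraph: {
      type: "article" as const,
      title: input.title,
      description: input.description,
      url: input.url,
      // og:locale uses language_TERRITORY; the en_US choice is an assumption.
      locale: input.locale === "id" ? "id_ID" : "en_US",
      images: [image],
    },
    twitter: {
      card: "summary_large_image" as const,
      title: input.title,
      description: input.description,
      images: [input.imageUrl],
    },
  }
}
```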

Core Web Vitals as a ranking input

Page experience signals feed ranking. The perf work — LCP, CLS, INP — is not separate from SEO; it is the part of SEO that lives entirely in your codebase.
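If you want those three metrics as a budget in CI or in field reporting, Google's published thresholds are small enough to encode directly. A sketch, assuming values in Google's units (milliseconds for LCP and INP, unitless for CLS); Google assesses them at the 75th percentile of page loads, so feed it an aggregated p75 rather than a single sample. `rateVital` is hypothetical; the web-vitals package already attaches a similar `rating` to each metric it reports.

```typescript
// Classify a Core Web Vitals value: "good" at or below the first bound,
// "poor" above the second, "needs-improvement" in between.
type Metric = "LCP" | "CLS" | "INP"
type Rating = "good" | "needs-improvement" | "poor"

const THRESHOLDS: Record<Metric, [good: number, poor: number]> = {
  LCP: [2500, 4000], // milliseconds
  CLS: [0.1, 0.25], // unitless layout-shift score
  INP: [200, 500], // milliseconds
}

export function rateVital(metric: Metric, value: number): Rating {
  const [good, poor] = THRESHOLDS[metric]
  if (value <= good) return "good"
  if (value <= poor) return "needs-improvement"
  return "poor"
}
```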

The Ship-Week Checklist

Every route emits an absolute self-referencing canonical; parameterized and duplicate URLs all resolve to one address.
hreflang set is complete, reciprocal, includes x-default, and is generated from one shared helper, not per template.
JSON-LD validates in the Rich Results Test for every page template, not just the homepage.
Sitemap is generated from the same data that renders routes, with truthful lastModified dates; robots.txt points to it.
Missing content returns real 404s; moved content returns 301s; no redirect chains deeper than one hop.
OG and Twitter meta render correctly in a WhatsApp and LinkedIn preview test.
Staging and preview deployments are blocked from indexing — a noindex header or auth wall, verified, because Google finding your staging site is a special kind of embarrassment.
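One way to make the noindex header automatic is to derive it from the deployment environment. A sketch, assuming a Vercel-style `VERCEL_ENV` variable and a hypothetical `indexingHeaders` helper wired into next.config's `headers()`. Keep those deployments crawlable rather than disallowing them in robots.txt: as with canonicals, a crawler that cannot fetch a page never sees its noindex.

```typescript
// next.config.ts helper (hypothetical): non-production deployments get an
// X-Robots-Tag: noindex header on every route; production gets none.
type HeaderRule = {
  source: string
  headers: { key: string; value: string }[]
}

export function indexingHeaders(deployEnv: string | undefined): HeaderRule[] {
  // Only an explicit "production" value is indexable; preview, development
  // and an unset variable all fall through to noindex.
  if (deployEnv === "production") return []
  return [
    {
      source: "/:path*", // match every path
      headers: [{ key: "X-Robots-Tag", value: "noindex, nofollow" }],
    },
  ]
}

// Wiring in next.config.ts (assumes Vercel's VERCEL_ENV; not executed here):
// const nextConfig = {
//   async headers() {
//     return indexingHeaders(process.env.VERCEL_ENV)
//   },
// }
```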

The Takeaway

Technical SEO is a contract surface, and contracts are what we are good at. Canonical declares identity, hreflang declares language routing with a strict reciprocity rule, JSON-LD declares meaning, and the sitemap declares inventory. None of it requires marketing intuition — it requires the same discipline as an API: generate from one source of truth, validate in CI, verify in production with curl. Do the boring plumbing once, and your content competes on its actual merit.

Add a CI step that fetches three rendered pages — home, one localized post in each language — and asserts on canonical, hreflang completeness, and JSON-LD parseability. It takes an hour to write and has caught every SEO regression on this site since.
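A sketch of that check, assuming CI fetches each page's HTML and knows its expected URL and hreflang map. `checkSeoContracts` is a hypothetical helper; it uses regular expressions rather than an HTML parser, which is enough for predictable framework output but remains a simplification.

```typescript
// Hypothetical CI assertion over one rendered page. Regex extraction assumes
// well-formed, double-quoted attributes.
interface PageExpectation {
  url: string // the page's own absolute URL (its expected canonical)
  hreflang: Record<string, string> // full expected map, including x-default
}

// Attributes of every <link> tag, names lower-cased so hrefLang == hreflang.
function linkTags(html: string): Record<string, string>[] {
  const tags: Record<string, string>[] = []
  for (const [tag] of html.matchAll(/<link\b[^>]*>/gi)) {
    const attrs: Record<string, string> = {}
    for (const [, name, value] of tag.matchAll(/([a-z-]+)="([^"]*)"/gi)) {
      attrs[name.toLowerCase()] = value
    }
    tags.push(attrs)
  }
  return tags
}

export function checkSeoContracts(html: string, expected: PageExpectation): string[] {
  const errors: string[] = []
  const links = linkTags(html)

  // 1. Self-referencing canonical.
  const canonical = links.find((l) => l.rel === "canonical")?.href
  if (canonical !== expected.url) errors.push(`canonical is ${canonical}, expected ${expected.url}`)

  // 2. Complete hreflang set: every expected entry present, nothing extra.
  const found: Record<string, string> = {}
  for (const l of links) {
    if (l.rel === "alternate" && l.hreflang) found[l.hreflang] = l.href
  }
  for (const [code, href] of Object.entries(expected.hreflang)) {
    if (found[code] !== href) errors.push(`hreflang ${code} is ${found[code]}, expected ${href}`)
  }
  for (const code of Object.keys(found)) {
    if (expected.hreflang[code] === undefined) errors.push(`unexpected hreflang ${code}`)
  }

  // 3. At least one JSON-LD block, and every block parses.
  const blocks = [...html.matchAll(/<script type="application\/ld\+json"[^>]*>([\s\S]*?)<\/script>/gi)]
  if (blocks.length === 0) errors.push("no JSON-LD block found")
  for (const [, body] of blocks) {
    try {
      JSON.parse(body)
    } catch {
      errors.push("JSON-LD block is not valid JSON")
    }
  }
  return errors
}

// In CI (not executed here), for each page:
//   const html = await (await fetch(url)).text()
//   const errors = checkSeoContracts(html, { url, hreflang })
//   if (errors.length > 0) throw new Error(errors.join("\n"))
```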
