Validation and Rendering

AI Quick Start

Ask your AI agent: “Add a custom content validator to my Pagesmith collection that checks for missing image alt text and warns on TODO markers in markdown files.” Then read on to understand what happened and customize further.

Pagesmith separates validation from rendering so you can keep content workflows fast. Validation happens at load time (when getCollection() is called), while rendering is lazy and happens only when you call entry.render().

The diagram below highlights the key boundary on this page: getCollection() performs the full validation path up front, while entry.render() is a separate lazy step that builds and caches HTML only when you need it.

Flow showing Pagesmith validating content during getCollection, then deferring markdown rendering until entry.render caches the HTML result

Validation Pipeline

Validation runs in three phases during content loading, all orchestrated by ContentStore:

Phase 1: Schema Validation

Every entry’s data object is validated against the collection’s Zod schema using validateSchema(). This wraps Zod’s safeParse and converts errors into structured ValidationIssue[] objects:

```ts
type ValidationIssue = {
  message: string;
  severity: "error" | "warn";
  field?: string; // Dot-path to the invalid field (e.g., "tags.0")
};
```

The parsed output of safeParse is reused as the entry data, so Zod coercions and transforms (such as z.coerce.date()) are already applied when you read entry.data.
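To make the conversion concrete, here is a minimal sketch of a validateSchema()-style wrapper. It is not Pagesmith's implementation. It relies only on the shape of Zod's safeParse result (success and data on success, error.issues with path and message on failure), typed structurally so that any Zod schema fits. The names validateSchemaSketch and SafeParser are hypothetical, and reporting every schema failure with error severity is an assumption.

```ts
// Hypothetical sketch of a validateSchema()-style helper; not Pagesmith's source.
type ValidationIssue = {
  message: string;
  severity: "error" | "warn";
  field?: string;
};

// Structural subset of Zod's safeParse() result, so a Zod schema satisfies it.
type ZodLikeIssue = { path: PropertyKey[]; message: string };
type SafeParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: { issues: ZodLikeIssue[] } };
type SafeParser<T> = { safeParse(input: unknown): SafeParseResult<T> };

function validateSchemaSketch<T>(
  schema: SafeParser<T>,
  input: unknown,
): { data: T | undefined; issues: ValidationIssue[] } {
  const result = schema.safeParse(input);
  if (result.success) {
    // Reuse the parsed (coerced/transformed) value as the entry data.
    return { data: result.data, issues: [] };
  }
  const issues: ValidationIssue[] = result.error.issues.map((issue) => ({
    message: issue.message,
    severity: "error" as const, // assumption: schema failures are always errors
    // Join the path into a dot-path such as "tags.0"; omit it for root-level issues.
    field: issue.path.length > 0 ? issue.path.map(String).join(".") : undefined,
  }));
  return { data: undefined, issues };
}
```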

Phase 2: Content Validators

For markdown collections, Pagesmith runs content validators on the raw markdown AST (MDAST). The key optimization is that one MDAST parse is shared across all validators via the ValidatorContext.mdast field, avoiding redundant parsing.

The ValidatorContext provides:

```ts
import type { Root } from "mdast";

type ValidatorContext = {
  filePath: string; // Absolute path to the source file
  slug: string; // URL-friendly slug
  collection: string; // Collection name
  rawContent?: string; // Raw markdown body
  data: Record<string, any>; // Parsed frontmatter/data
  mdast?: Root; // Pre-parsed MDAST tree (shared)
};
```

The MDAST tree is parsed once in runValidators() using unified().use(remarkParse).parse(rawContent) and set on the context before any validators execute.
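The parse-once idea can be sketched without unified. In this hypothetical example the parser is injected as a function so a test can count calls. In Pagesmith, that role is played by unified().use(remarkParse).parse(rawContent). The names SharedContext, Validator, and runValidatorsOnce are illustrative and are not Pagesmith exports.

```ts
// Hypothetical sketch: parse the markdown once, then share the tree with every validator.
type Issue = { message: string; severity: "error" | "warn" };

type SharedContext<Tree> = {
  slug: string;
  rawContent?: string;
  mdast?: Tree; // filled in once, before any validator runs
};

type Validator<Tree> = {
  name: string;
  validate(ctx: SharedContext<Tree>): Issue[] | Promise<Issue[]>;
};

async function runValidatorsOnce<Tree>(
  ctx: SharedContext<Tree>,
  validators: Validator<Tree>[],
  parse: (markdown: string) => Tree, // stand-in for unified().use(remarkParse).parse
): Promise<Issue[]> {
  if (ctx.rawContent !== undefined && ctx.mdast === undefined) {
    ctx.mdast = parse(ctx.rawContent); // the only parse for this entry
  }
  const issues: Issue[] = [];
  for (const validator of validators) {
    // Every validator sees the same ctx object, and therefore the same tree.
    issues.push(...(await validator.validate(ctx)));
  }
  return issues;
}
```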

Built-in Validators

Pagesmith provides four built-in validators for markdown content (the builtinMarkdownValidators array):

linkValidator (configurable via createLinkValidator(options)) — Checks links and images:

Warns on empty link text.
Errors on missing image alt text (toggle with requireAltText).
Errors on raw <img> tags outside a <picture> element (toggle with forbidHtmlImgTag).
Errors when adjacent *-light.<ext> / *-dark.<ext> image pairs do not match (toggle with requireThemeVariantPairs).
Warns on malformed external URLs.
Errors on broken internal links (resolved against the markdown source path, the configured rootDir/basePath, and additionalRoots).
Optional checks: internalLinksMustBeMarkdown, requireCanonicalInternalLinks, and an opt-in external-URL reachability fetch (unreachableSeverity, fetchTimeoutMs, fetchConcurrency).

headingValidator — Checks heading structure (all warnings, never aborts):

Warns when a document with content has no headings.
Warns on empty heading text.
Warns when more than one h1 is present (one warning per extra h1).
Warns when heading levels skip (for example h1 → h3).
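As an illustration of these heading rules, the standalone sketch below applies them to a flat list of headings instead of an MDAST tree. The function name and message wording are hypothetical. It treats a skip as any heading more than one level deeper than the heading before it. The page does not say whether Pagesmith also flags a document whose first heading is h2, so this sketch does not.

```ts
// Hypothetical sketch of the headingValidator rules over a flat heading list.
type Heading = { depth: number; text: string };
type Issue = { message: string; severity: "warn" };

function checkHeadings(headings: Heading[], hasContent: boolean): Issue[] {
  const issues: Issue[] = [];
  if (hasContent && headings.length === 0) {
    issues.push({ message: "Document has no headings", severity: "warn" });
  }
  let h1Count = 0;
  let previousDepth = 0;
  for (const heading of headings) {
    if (heading.text.trim() === "") {
      issues.push({ message: "Empty heading text", severity: "warn" });
    }
    if (heading.depth === 1) {
      h1Count += 1;
      // One warning per extra h1, i.e. for the second h1 onward.
      if (h1Count > 1) issues.push({ message: "Multiple h1 headings", severity: "warn" });
    }
    // A skip is a jump of more than one level deeper than the previous heading.
    if (previousDepth > 0 && heading.depth > previousDepth + 1) {
      issues.push({
        message: `Heading level skips from h${previousDepth} to h${heading.depth}`,
        severity: "warn",
      });
    }
    previousDepth = heading.depth;
  }
  return issues;
}
```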

codeBlockValidator — Checks fenced code block meta syntax:

Warns when meta is set but the code block has no language identifier.
Warns on unknown meta keys (anything outside title, showLineNumbers, startLineNumber, wrap, frame, mark, ins, del, collapse).
Warns on malformed line ranges in mark, ins, del, or collapse.
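A rough sketch of these meta checks follows. The tokenizer is deliberately simplified: it splits on whitespace outside quotes and braces. The range grammar it accepts is inferred from the examples later on this page: comma-separated n or n-m entries inside braces, with no spaces. Pagesmith's real parser may accept more. The function name checkCodeMeta is hypothetical.

```ts
// Hypothetical sketch of codeBlockValidator-style meta checks; not Pagesmith's parser.
const KNOWN_META_KEYS = new Set([
  "title", "showLineNumbers", "startLineNumber", "wrap", "frame",
  "mark", "ins", "del", "collapse",
]);
const RANGE_KEYS = new Set(["mark", "ins", "del", "collapse"]);
// Accepts e.g. {3} or {1-3,7}; assumes no spaces inside the braces.
const RANGE_PATTERN = /^\{\d+(-\d+)?(,\d+(-\d+)?)*\}$/;

function checkCodeMeta(lang: string | null, meta: string | null): string[] {
  const warnings: string[] = [];
  if (!meta || meta.trim() === "") return warnings;
  if (!lang) warnings.push("Code block has meta but no language identifier");

  // Tokens look like key, key=value, key="quoted value", or key={ranges}.
  const tokens = meta.match(/[^\s=]+(=("[^"]*"|\{[^}]*\}|\S+))?/g) ?? [];
  for (const token of tokens) {
    const eq = token.indexOf("=");
    const key = eq === -1 ? token : token.slice(0, eq);
    const value = eq === -1 ? "" : token.slice(eq + 1);
    if (!KNOWN_META_KEYS.has(key)) {
      warnings.push(`Unknown meta key: ${key}`);
    } else if (RANGE_KEYS.has(key) && !RANGE_PATTERN.test(value)) {
      warnings.push(`Malformed line range for ${key}: ${value}`);
    }
  }
  return warnings;
}
```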

imageStructureValidator — Enforces the <figure><picture>...<img></picture><figcaption?></figure> shape (and <figure><img></figure> for SVG/GIF). Errors on <figure> nested inside <picture>, missing or multiple <img> inside <picture>, unbalanced <picture> tags, or foreign tags inside <picture>. Walks both MDAST html nodes and raw markdown.

Custom Content Validators

Implement the ContentValidator interface and add to the validators array:

```ts
import type { ContentValidator } from "@pagesmith/core";

const noTodoValidator: ContentValidator = {
  name: "no-todo",
  validate(ctx) {
    const issues = [];
    if (ctx.rawContent?.includes("TODO")) {
      issues.push({
        message: "Content contains TODO markers",
        severity: "warn" as const,
      });
    }
    return issues;
  },
};

const posts = defineCollection({
  loader: "markdown",
  directory: "content/posts",
  schema: z.object({ title: z.string() }),
  validators: [noTodoValidator],
});
```

Custom validators receive the same shared ValidatorContext with the pre-parsed MDAST tree. You can walk the MDAST tree for structural analysis:

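```ts
import type { ContentValidator } from "@pagesmith/core";
import type { Image } from "mdast";
import { visit } from "unist-util-visit";

// Walks the shared MDAST and warns on every image without alt text.
const imageAltValidator: ContentValidator = {
  name: "image-alt-text",
  validate(ctx) {
    // Annotated explicitly: TypeScript cannot infer the type of an array that is only pushed to inside a callback.
    const issues: Array<{ message: string; severity: "warn" }> = [];
    if (ctx.mdast) {
      visit(ctx.mdast, "image", (node: Image) => {
        if (!node.alt || node.alt.trim() === "") {
          issues.push({
            message: `Image missing alt text: ${node.url}`,
            severity: "warn",
          });
        }
      });
    }
    return issues;
  },
};
```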

Disabling Built-in Validators

Set disableBuiltinValidators: true on a collection to skip the built-in link, heading, code block, and image-structure validators:

```ts
const posts = defineCollection({
  loader: "markdown",
  directory: "content/posts",
  schema: z.object({ title: z.string() }),
  disableBuiltinValidators: true,
  validators: [myCustomValidator],
});
```

Error Handling in Validators

Validators that throw errors are caught and converted to error-severity issues, so one failing validator does not abort the rest:

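```ts
// Excerpt: `validators`, `ctx`, and `issues` are defined by the surrounding code.
for (const validator of validators) {
  try {
    const result = await validator.validate(ctx);
    issues.push(...result);
  } catch (err) {
    // Turn the throw into an error-severity issue and continue with the next validator.
    const message = err instanceof Error ? err.message : String(err);
    issues.push({
      message: `Validator "${validator.name}" threw: ${message}`,
      severity: "error",
    });
  }
}
```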

Phase 3: Plugin Validators

If ContentPlugin instances are registered in the config, their validate() hooks run after all other validation. Each hook receives { data, content? } and returns a string[] of error messages:

```ts
type ContentPlugin = {
  name: string;
  rehypePlugin?: () => (tree: any) => void;
  remarkPlugin?: () => (tree: any) => void;
  validate?: (entry: { data: Record<string, any>; content?: string }) => string[];
};
```
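Because plugin hooks return plain strings, those strings have to be turned into ValidationIssue objects before they can join the other issues. The sketch below shows one way to do that. The function name runPluginValidators is hypothetical. Treating every plugin message as an error and prefixing it with the plugin name are assumptions, not documented Pagesmith behavior.

```ts
// Hypothetical sketch: run plugin validate() hooks and convert their strings to issues.
type ValidationIssue = { message: string; severity: "error" | "warn"; field?: string };
type PluginEntry = { data: Record<string, unknown>; content?: string };
type PluginLike = { name: string; validate?: (entry: PluginEntry) => string[] };

function runPluginValidators(plugins: PluginLike[], entry: PluginEntry): ValidationIssue[] {
  const issues: ValidationIssue[] = [];
  for (const plugin of plugins) {
    if (!plugin.validate) continue; // validate is optional on a plugin
    for (const message of plugin.validate(entry)) {
      // Assumption: plugin messages are errors, tagged with the plugin name.
      issues.push({ message: `[${plugin.name}] ${message}`, severity: "error" });
    }
  }
  return issues;
}
```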

Collection-Level Validate Hook

The validate hook on CollectionDef provides a lightweight alternative to full ContentValidator instances. It runs during loading and returns a string error message or undefined:

```ts
const posts = defineCollection({
  loader: "markdown",
  directory: "content/posts",
  schema: z.object({
    title: z.string(),
    date: z.coerce.date(),
  }),
  validate(entry) {
    // Return a message to flag the entry, or undefined when it is valid.
    if (entry.data.date > new Date()) {
      return "Post date cannot be in the future";
    }
    return undefined;
  },
});
```

Rendering Model

Lazy Rendering

ContentEntry.render() is lazy — content loads with metadata, schema validation, and AST validation, but markdown becomes HTML only when you explicitly call render().

The ContentEntry class stores:

slug — URL-friendly identifier
collection — collection name
filePath — absolute path to source file
data — validated data (typed by the Zod schema)
rawContent — raw markdown body (only for markdown loaders)

When render() is called:

1. If a cached result exists (and force is not set), return it immediately.
2. If rawContent is empty (non-markdown entry), return { html: '', headings: [], readTime: 0 }.
3. Process rawContent through the unified markdown pipeline (processMarkdown()).
4. Compute read time from the raw markdown source (not the rendered HTML).
5. Cache and return the RenderedContent:

```ts
type Heading = { depth: number; text: string; slug: string };

type RenderedContent = {
  html: string; // Processed HTML
  headings: Heading[]; // Extracted headings for the TOC
  readTime: number; // Estimated read time in minutes
};
```

Render Caching

Rendered output is cached per entry after the first render() call. You can control caching with:

entry.render() — returns cached result if available
entry.render({ force: true }) — forces a re-render, replacing the cache
entry.clearRenderCache() — clears the cache without re-rendering
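The render() steps and cache controls above can be modeled in a few lines. LazyEntry below is a simplified, hypothetical stand-in for ContentEntry. The markdown processor is injected as a function; in Pagesmith that work is done by processMarkdown() plus the read-time computation. One design note: in this sketch, two overlapping render() calls on an uncached entry would both process the markdown. The page does not say whether Pagesmith deduplicates in-flight renders.

```ts
// Hypothetical sketch of ContentEntry's lazy, cached render(); not Pagesmith's source.
type Heading = { depth: number; text: string; slug: string };
type RenderedContent = { html: string; headings: Heading[]; readTime: number };
type Processor = (markdown: string) => Promise<RenderedContent>;

class LazyEntry {
  private cached: RenderedContent | undefined = undefined;
  private readonly rawContent: string;
  private readonly processor: Processor;

  constructor(rawContent: string, processor: Processor) {
    this.rawContent = rawContent;
    this.processor = processor; // stand-in for processMarkdown() plus read time
  }

  async render(options: { force?: boolean } = {}): Promise<RenderedContent> {
    // Step 1: serve the cached result unless a re-render is forced.
    if (this.cached !== undefined && !options.force) return this.cached;
    // Step 2: non-markdown entries have no body to render.
    if (this.rawContent === "") return { html: "", headings: [], readTime: 0 };
    // Steps 3-5: process, then cache (replacing any previous result) and return.
    const rendered = await this.processor(this.rawContent);
    this.cached = rendered;
    return rendered;
  }

  clearRenderCache(): void {
    this.cached = undefined; // the next render() will reprocess
  }
}
```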

Read Time Computation

Read time is computed from the original markdown source rather than rendered HTML. This produces better estimates because it counts the actual words the reader will see, not HTML tags and attributes. The computation uses a standard 200 words-per-minute reading rate.
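In code, the estimate reduces to a word count divided by 200. The sketch below assumes words are split on whitespace and the result is rounded up, so any non-empty text reads as at least one minute. The page does not specify Pagesmith's tokenization or rounding, so treat both as placeholders. Note that a plain whitespace split also counts markdown syntax tokens such as "#" as words.

```ts
// Hypothetical read-time estimate at 200 words per minute.
const WORDS_PER_MINUTE = 200;

function estimateReadTime(markdown: string): number {
  const words = markdown.split(/\s+/).filter((word) => word.length > 0).length;
  if (words === 0) return 0;
  // Assumption: round up so any non-empty text reads as at least one minute.
  return Math.ceil(words / WORDS_PER_MINUTE);
}
```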

Markdown Pipeline

The markdown pipeline is built using the unified ecosystem with a built-in Shiki-backed code renderer for syntax highlighting and code block features. The full chain:

```text
remark-parse                    Parse markdown to MDAST
  -> remark-gfm                Tables, strikethrough, task lists, autolinks
  -> remark-frontmatter        Strip YAML frontmatter from AST
  -> remark-github-alerts      > [!NOTE], > [!TIP], etc.
  -> remark-smartypants        Smart quotes, dashes, ellipses
  -> remark-math (optional)    Enabled when `markdown.math` is `true` or `'auto'` detects math markers
  -> [user remark plugins]     From MarkdownConfig.remarkPlugins
  -> lang-alias transform      Map fenced-code language tags via `markdown.shiki.langAlias`
  -> remark-rehype             MDAST -> HAST (`allowDangerousHtml` defaults to true)
  -> rehype-mathjax/svg        Render math to SVG (when math is enabled)
  -> applyPagesmithCodeRenderer Syntax highlighting, code frames, copy/collapse UI
  -> rehype-code-tabs          Group consecutive titled blocks into tabs
  -> rehype-scrollable-tables  Wrap markdown tables for overflow-safe scrolling
  -> rehype-slug               Add id="" to headings
  -> rehype-autolink-headings  Wrap heading text in anchor links (behavior: 'wrap')
  -> rehype-external-links     target="_blank" on external URLs
  -> rehype-accessible-emojis  aria-label on emoji characters
  -> rehype-local-images       Fill intrinsic image dimensions, AVIF/WebP <picture> fallbacks, light/dark <picture> pairs, <figure>/<figcaption> wrapping
  -> heading extraction        Custom plugin: walk HAST, collect Heading[]
  -> [user rehype plugins]     From MarkdownConfig.rehypePlugins
  -> rehype-stringify           Serialize HAST to HTML string
```

The processor is cached per MarkdownConfig object reference via a WeakMap to avoid rebuilding the plugin chain on every call.
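Caching by object reference is a standard WeakMap memoization pattern. The sketch below shows the idea with a generic build function. The names memoizeByReference and build are hypothetical stand-ins for assembling the unified chain. One practical consequence: two configs with identical contents but different object identities each get their own processor. Reusing a single MarkdownConfig object is what makes the cache effective. Because the map is weak, a discarded config and its processor can both be garbage-collected.

```ts
// Hypothetical sketch of per-config memoization keyed by object identity.
function memoizeByReference<Config extends object, Processor>(
  build: (config: Config) => Processor,
): (config: Config) => Processor {
  const cache = new WeakMap<Config, Processor>();
  return (config) => {
    let processor = cache.get(config);
    if (processor === undefined) {
      processor = build(config); // built at most once per config object
      cache.set(config, processor);
    }
    return processor;
  };
}
```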

Built-in Code Renderer Configuration

Pagesmith uses a built-in Shiki-backed code renderer for all code block processing. It handles:

Dual-theme syntax highlighting (defaults to github-light / github-dark)
Code block frames with title bars
Line numbers (enabled by default)
Copy-to-clipboard buttons
Line highlighting, insertions, and deletions
Collapsible sections
Code block grouping/tabs

The shared code-block chrome styles ship in the standard Pagesmith CSS bundles. Shiki token colors are injected during markdown processing. In the browser, the shared Pagesmith content runtime handles copy and collapse behavior.

The renderer respects Pagesmith design tokens for font families, sizes, and border radius via CSS custom properties (--ps-font-sans, --ps-font-mono, --ps-font-size-sm, --ps-radius-lg, --ps-color-border-subtle).

Code Block Meta Syntax

The built-in renderer supports a rich meta syntax on fenced code blocks. Add options after the language identifier:

Title

Display a filename or label in the code block header:

````md
```js title="app.js"
console.log('hello')
```
````

Line Numbers

Line numbers are shown by default (controlled by markdown.shiki.defaultShowLineNumbers). Hide them per block:

````md
```js showLineNumbers=false
console.log('no line numbers')
```
````

Line Highlighting

Mark lines to draw attention. Use mark for neutral highlights, ins for additions (green), and del for deletions (red):

````md
```js mark={3} ins={4} del={5}
const a = 1
const b = 2
const c = 3  // highlighted
const d = 4  // inserted (green)
const e = 5  // deleted (red)
```
````

Line ranges are supported: mark={1-3,7}, ins={2-4}, del={8-10}.
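Conceptually, a range list such as {1-3,7} expands to a set of 1-based line numbers that each code line can be checked against. The helper below is a hypothetical sketch, not Pagesmith's parser. It assumes the same comma-and-hyphen grammar as the examples and rejects descending ranges.

```ts
// Hypothetical helper: expand "{1-3,7}" into the line numbers it covers.
function expandLineRanges(spec: string): Set<number> {
  const lines = new Set<number>();
  // Strip the surrounding braces, e.g. "{1-3,7}" -> "1-3,7".
  const body = spec.trim().replace(/^\{/, "").replace(/\}$/, "");
  for (const part of body.split(",")) {
    const match = /^\s*(\d+)\s*(?:-\s*(\d+)\s*)?$/.exec(part);
    if (match === null) throw new Error(`Malformed line range: "${part}"`);
    const start = Number(match[1]);
    const end = match[2] === undefined ? start : Number(match[2]);
    if (end < start) throw new Error(`Descending line range: "${part}"`);
    for (let line = start; line <= end; line += 1) lines.add(line);
  }
  return lines;
}
```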

Collapsible Sections

Collapse line ranges to keep long code blocks readable:

````md
```js collapse={1-5,11-13}
// These lines will be collapsed
import { a } from 'a'
import { b } from 'b'
import { c } from 'c'
import { d } from 'd'

// Visible code here
const result = a + b + c + d
console.log(result)

// These will also be collapsed
// cleanup code
// more cleanup
```
````

Word Wrap

Enable word wrapping for long lines:

````md
```js wrap
const veryLongVariable = 'this is a very long string that would normally overflow the code block and require horizontal scrolling'
```
````

Frame Type

Control the frame style (code, terminal, or none). When omitted, the frame is auto-detected from the language:

````md
```bash frame="terminal"
npm install @pagesmith/core
```
````

Combined Example

Multiple options can be combined on a single code block:

````md
```ts title="server.ts" mark={3-4} ins={6} collapse={1-2}
import express from 'express'
import { createContentLayer } from '@pagesmith/core'

const layer = createContentLayer(config)
const app = express()

app.get('/api/posts', async (req, res) => {
  const posts = await layer.getCollection('posts')
  res.json(posts.map(p => p.data))
})
```
````

Plugins

Content plugins can inject into the markdown pipeline at two points:

remarkPlugin — runs as a remark plugin on the MDAST
rehypePlugin — runs as a rehype plugin on the HAST

The remarkPlugin and rehypePlugin hooks of all registered content plugins are collected and appended to the pipeline at render time, so they run on every markdown entry. A plugin's validate() hook, by contrast, runs during loading, not during rendering.

```ts
import { defineConfig } from "@pagesmith/core";

const myPlugin = {
  name: "my-plugin",
  rehypePlugin: () => (tree: unknown) => {
    // Transform the HAST tree
  },
  validate: (entry: { data: Record<string, any>; content?: string }) => {
    const errors: string[] = [];
    if (!entry.data.title) {
      errors.push("Missing required title field");
    }
    return errors;
  },
};

// `posts` is a collection created earlier with defineCollection().
const config = defineConfig({
  collections: { posts },
  plugins: [myPlugin],
});
```
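For a slightly more concrete rehypePlugin, here is a hypothetical plugin that adds loading="lazy" to every img element in the HAST. To stay dependency-free, it declares a minimal structural HAST node type instead of importing the hast typings, and it walks the tree by hand rather than using a visitor library. Register it in the plugins array the same way as myPlugin above.

```ts
// Hypothetical content plugin: add loading="lazy" to every <img> in the HAST.
// Minimal structural HAST node type so the sketch has no dependencies.
type HastNode = {
  type: string;
  tagName?: string;
  properties?: Record<string, unknown>;
  children?: HastNode[];
};

function addLazyLoading(node: HastNode): void {
  if (node.type === "element" && node.tagName === "img") {
    // Keep existing attributes (src, alt, ...) and add loading="lazy".
    node.properties = { ...node.properties, loading: "lazy" };
  }
  for (const child of node.children ?? []) addLazyLoading(child);
}

const lazyImagesPlugin = {
  name: "lazy-images",
  // Same shape as ContentPlugin.rehypePlugin: a factory returning a tree transformer.
  rehypePlugin: () => (tree: HastNode) => addLazyLoading(tree),
};
```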

Direct Conversion

Use the content layer when you need collection semantics (file discovery, schema validation, caching). Use direct conversion when you only have an isolated markdown string:

```ts
import { convert } from "@pagesmith/core";

// Via the content layer (respects the layer's markdown config)
const fragment = await layer.convert("# Hello\n\nWorld");
// fragment.html, fragment.headings, fragment.toc, fragment.frontmatter

// Via the standalone convert() function
const result = await convert("# Hello\n\nWorld", {
  markdown: { shiki: { themes: { light: "github-light", dark: "github-dark" } } },
});
// result.html, result.headings, result.toc, result.frontmatter
```

The ConvertResult type:

```ts
type ConvertResult = {
  html: string;
  headings: Heading[];
  /** @deprecated Use `headings` instead. */
  toc: Heading[];
  frontmatter: Record<string, any>;
};
```

convert() extracts headings from the rendered HTML using extractToc() (regex-based) and exposes them on both headings and the deprecated toc alias. entry.render() extracts headings from the HAST during processing (more accurate) and only returns headings.
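To see why the HAST-based path is more accurate, consider what a regex over HTML can and cannot do. The sketch below is not extractToc() itself, only an assumed approximation. It matches h1 to h6 elements that carry an id and strips nested tags from their text. Entities such as &amp; remain encoded, and unusual markup (for example, a > inside an attribute value) can fool it. Walking the HAST avoids these problems.

```ts
// Hypothetical regex-based heading extraction, approximating a regex-driven TOC.
type Heading = { depth: number; text: string; slug: string };

function extractHeadingsFromHtml(html: string): Heading[] {
  const headings: Heading[] = [];
  // <hN ... id="slug" ...>inner</hN>; the backreference \1 ties the closing tag to N.
  const pattern = /<h([1-6])\b[^>]*\bid="([^"]*)"[^>]*>([\s\S]*?)<\/h\1>/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    headings.push({
      depth: Number(match[1]),
      slug: match[2],
      text: match[3].replace(/<[^>]+>/g, "").trim(), // drop nested tags such as anchors
    });
  }
  return headings;
}
```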

Validation in CI

Use layer.validate() in application code when you want content validation results as structured data:

```ts
import { createContentLayer } from "@pagesmith/core";

const layer = createContentLayer(config);

// Validate all collections
const results = await layer.validate();

// Validate a specific collection
const postResults = await layer.validate("posts");

// Check results
for (const result of results) {
  console.log(`${result.collection}: ${result.errors} errors, ${result.warnings} warnings`);
  for (const entry of result.entries) {
    for (const issue of entry.issues) {
      console.log(`  [${issue.severity}] ${entry.slug}: ${issue.message}`);
    }
  }
}
```

The ValidationResult type:

```ts
type ValidationResult = {
  collection: string;
  entries: Array<{
    slug: string;
    filePath: string;
    issues: ValidationIssue[];
  }>;
  errors: number;
  warnings: number;
};
```
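In a CI script, the usual next step is to turn those counts into a pass/fail decision. Below is a minimal sketch that relies only on the ValidationResult shape above, redeclared locally so the block stands alone. The name summarizeValidation is hypothetical. Failing on errors while only reporting warnings is one reasonable policy, not a Pagesmith default.

```ts
// Hypothetical CI helper over ValidationResult[]: fail on errors, report warnings.
type ValidationIssue = { message: string; severity: "error" | "warn"; field?: string };
type ValidationResult = {
  collection: string;
  entries: Array<{ slug: string; filePath: string; issues: ValidationIssue[] }>;
  errors: number;
  warnings: number;
};

function summarizeValidation(results: ValidationResult[]): { ok: boolean; lines: string[] } {
  let errors = 0;
  const lines: string[] = [];
  for (const result of results) {
    errors += result.errors;
    for (const entry of result.entries) {
      for (const issue of entry.issues) {
        // Include the dot-path field when the issue has one (schema issues usually do).
        const where = issue.field ? `${entry.filePath} (${issue.field})` : entry.filePath;
        lines.push(`[${issue.severity}] ${result.collection}/${entry.slug} ${where}: ${issue.message}`);
      }
    }
  }
  return { ok: errors === 0, lines };
}
```

In a script, you would pass it the result of await layer.validate(), print each line, and set process.exitCode = 1 when ok is false.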

Validation Issue Lifecycle

Issues are collected during loading and stored alongside each entry in the ContentStore. The lifecycle:

Schema issues — generated by validateSchema() during loading
Content issues — generated by runValidators() during loading
Plugin issues — generated by plugin validate() hooks during loading
Load failures — caught and wrapped as error-severity issues

Issues do not prevent an entry from being returned by getCollection(). Entries with validation errors are still accessible — they may just have partial or coerced data. Use layer.validate() to inspect issues programmatically.