Collections and Loaders

AI Quick Start

Ask your AI agent: “Set up a content collection with schema validation for my blog posts in content/posts/. Use a Zod schema with title, date, tags, and draft fields. Read node_modules/@pagesmith/core/skills/pagesmith-core-setup/references/usage.md for reference.” Then read on to understand what happened and customize further.

Collections are the core FS-CMS primitive in Pagesmith. Each collection maps a filesystem directory to:

a loader (how to parse the files)
a schema (how to validate the data)
optional transforms, computed fields, filters, and validators

The diagram below gives the page’s mental model in one pass: files flow through a loader into a raw entry, then Pagesmith applies normalization, validation, derived fields, and optional filtering before you consume typed entries.

Collection pipeline showing files flowing through a loader, transform, schema, derived fields, filters, and into typed entries that are consumed through the content layer or Vite plugin

Defining a Collection

Use defineCollection() to create a type-safe collection definition:

1import { defineCollection, z } from "@pagesmith/core";23const posts = defineCollection({4  loader: "markdown",5  directory: "content/posts",6  schema: z.object({7    title: z.string(),8    date: z.coerce.date(),9    tags: z.array(z.string()).default([]),10    draft: z.boolean().default(false),11  }),12});

The defineCollection() function is a type-safe identity function — it does not transform the definition, but provides TypeScript inference from the Zod schema so that entry.data is fully typed.

CollectionDef Fields

The full CollectionDef<S> type accepts:

Field	Type	Required	Description
`loader`	`LoaderType \| Loader`	Yes	Built-in loader type string or a custom `Loader` instance
`directory`	`string`	Yes	Directory containing collection files (relative to `root`)
`schema`	`ZodType`	Yes	Zod schema for validating entry data
`include`	`string[]`	No	Glob patterns to include (defaults based on loader extensions)
`exclude`	`string[]`	No	Glob patterns to exclude
`computed`	`Record<string, (entry) => any>`	No	Computed fields derived from entry data
`transform`	`(entry) => RawEntry`	No	Pre-validation transform
`filter`	`(entry) => boolean`	No	Filter entries (return false to exclude)
`slugify`	`(filePath, directory) => string`	No	Custom slug generation
`validate`	`(entry) => string \| undefined`	No	Custom validation hook
`validators`	`ContentValidator[]`	No	Custom content validators
`disableBuiltinValidators`	`boolean`	No	Disable built-in markdown validators

Built-In Loaders

Each loader implements the Loader interface:

1interface Loader {2  name: string;3  kind: "markdown" | "data";4  extensions: string[];5  load(filePath: string): LoaderResult | Promise<LoaderResult>;6}78interface LoaderResult {9  data: Record<string, any>; // Parsed data (frontmatter or full object)10  content?: string; // Raw body content (markdown only)11}

The kind field distinguishes between markdown loaders (which produce a body for rendering) and data loaders (which produce structured data only).

Loader	Type String	Extensions	Kind	Parser	Notes
`MarkdownLoader`	`markdown`	`.md`	`markdown`	`gray-matter`	YAML frontmatter extracted as `data`, markdown body as `content`
`JsonLoader`	`json` / `json5`	`.json`, `.json5`	`data`	`JSON.parse` / `JSON5.parse`	Parser is chosen from the file extension
`JsoncLoader`	`jsonc`	`.jsonc`	`data`	comment stripping + `JSON.parse`	JSON with comments and trailing commas
`YamlLoader`	`yaml`	`.yml`, `.yaml`	`data`	`yaml` package	Structured config and content data
`TomlLoader`	`toml`	`.toml`	`data`	`smol-toml`	Useful for feature flags or settings

Loader resolution happens through resolveLoader(). If you pass a string like 'markdown', it creates a new MarkdownLoader() instance. If you pass an object implementing the Loader interface, it uses it directly.

Default include patterns are derived from the loader’s extensions array: each extension becomes a **/*<ext> glob pattern.

Custom Loaders

Custom loaders are objects implementing the Loader interface. Use them for CSV, XML, or proprietary export formats when the built-in loaders are not enough.

1import type { Loader, LoaderResult } from "@pagesmith/core/loaders";2import { readFileSync } from "fs";3import { parse as parseCsv } from "csv-parse/sync";45const csvLoader: Loader = {6  name: "csv",7  kind: "data",8  extensions: [".csv"],9  load(filePath: string): LoaderResult {10    const raw = readFileSync(filePath, "utf-8");11    const records = parseCsv(raw, { columns: true });12    // Return the first row as data (or adapt to your needs)13    return { data: records[0] ?? {} };14  },15};1617const metrics = defineCollection({18  loader: csvLoader,19  directory: "content/metrics",20  schema: z.object({21    name: z.string(),22    value: z.coerce.number(),23  }),24});

Collection Hooks

Transform

The transform hook runs before schema validation and receives a RawEntry (which has data, optional content, filePath, and slug). Use it to normalize data before validation:

1const posts = defineCollection({2  loader: "markdown",3  directory: "content/posts",4  schema: z.object({5    title: z.string(),6    tags: z.array(z.string()).default([]),7  }),8  transform(entry) {9    // Normalize tags to lowercase before validation10    if (entry.data.tags) {11      entry.data.tags = entry.data.tags.map((tag: string) => tag.toLowerCase());12    }13    return entry;14  },15});

Transforms can also be async:

1transform: async (entry) => {2  entry.data.wordCount = entry.content?.split(/\s+/).length ?? 03  return entry4},

Computed Fields

Computed fields are derived from entry data and merged into entry.data before schema validation (so they can be validated by the schema). They are defined as a map of field names to functions:

1const posts = defineCollection({2  loader: "markdown",3  directory: "content/posts",4  schema: z.object({5    title: z.string(),6    date: z.coerce.date(),7    tags: z.array(z.string()).default([]),8  }),9  computed: {10    slugPath: (entry) => `/posts/${entry.slug}/`,11    year: (entry) => new Date(entry.data.date).getFullYear(),12    hasMultipleTags: (entry) => entry.data.tags.length > 1,13  },14});

Computed field types are inferred and included in InferCollectionData<T>, so entry.data.slugPath will be typed as string automatically.

Filter

The filter hook runs after computed fields but before schema validation. Return false to exclude an entry from the collection results:

1const posts = defineCollection({2  loader: "markdown",3  directory: "content/posts",4  schema: z.object({5    title: z.string(),6    draft: z.boolean().default(false),7  }),8  filter(entry) {9    // Exclude drafts in production10    return process.env.NODE_ENV === "development" || !entry.data.draft;11  },12});

Custom Slug Generation

By default, Pagesmith generates slugs from relative file paths using toSlug(). Override this with slugify:

1const posts = defineCollection({2  loader: "markdown",3  directory: "content/posts",4  schema: z.object({5    title: z.string(),6    date: z.coerce.date(),7  }),8  slugify(filePath, directory) {9    // Use date-prefixed slugs: 2024-01-15-hello-world -> hello-world10    const base = path.relative(directory, filePath).replace(/\.md$/, "");11    return base.replace(/^\d{4}-\d{2}-\d{2}-/, "");12  },13});

Custom Validate Hook

The validate hook provides a lightweight way to add entry-level validation without implementing a full ContentValidator. Return a string to report an error, or undefined to pass:

1const posts = defineCollection({2  loader: "markdown",3  directory: "content/posts",4  schema: z.object({5    title: z.string(),6    date: z.coerce.date(),7  }),8  validate(entry) {9    if (entry.data.title.length > 100) {10      return "Title must be 100 characters or fewer";11    }12    // Return undefined to pass13  },14});

Slugs

Pagesmith generates slugs from relative file paths by default. The toSlug() utility strips file extensions and directory-based README/index suffixes:

File Path (relative to directory)	Generated Slug
`hello-world.md`	`hello-world`
`getting-started/README.md`	`getting-started`
`guide/setup/index.md`	`guide/setup`
`2024/my-post.yaml`	`2024/my-post`

Use slugify(filePath, directory) when you need custom URL rules.

Schema Strategy

Prefer schemas that describe exactly what your templates need:

Coerce dates at the schema level — use z.coerce.date() so string dates from frontmatter become Date objects automatically
Default array and boolean fields — use .default([]) and .default(false) to avoid undefined checks in templates
Keep markdown-derived fields as computed fields — read time, word count, and URL paths work well as computed fields rather than requiring them in frontmatter
Use .passthrough() sparingly — prefer explicit schemas that document all expected fields
Use .optional() for truly optional fields — the schema should reflect what is required vs. optional

Example of a well-structured schema:

1const posts = defineCollection({2  loader: "markdown",3  directory: "content/posts",4  schema: z.object({5    title: z.string(),6    description: z.string().optional(),7    date: z.coerce.date(),8    lastUpdated: z.coerce.date().optional(),9    tags: z.array(z.string()).default([]),10    category: z.enum(["tutorial", "guide", "reference"]).default("guide"),11    draft: z.boolean().default(false),12    featured: z.boolean().default(false),13    coverImage: z.string().optional(),14  }),15  computed: {16    url: (entry) => `/blog/${entry.slug}/`,17    readTime: (entry) => Math.ceil((entry.content?.split(/\s+/).length ?? 0) / 200),18  },19  filter: (entry) => !entry.data.draft || process.env.NODE_ENV === "development",20});

Multiple Collections

Use defineCollections() for defining several collections at once with strong literal type inference:

1import { defineCollection, defineCollections, z } from "@pagesmith/core";23const collections = defineCollections({4  posts: defineCollection({5    loader: "markdown",6    directory: "content/posts",7    schema: z.object({ title: z.string(), date: z.coerce.date() }),8  }),9  authors: defineCollection({10    loader: "yaml",11    directory: "content/authors",12    schema: z.object({ name: z.string(), bio: z.string() }),13  }),14  settings: defineCollection({15    loader: "json",16    directory: "content/config",17    schema: z.object({ siteName: z.string(), features: z.record(z.boolean()) }),18  }),19});

Using Collections with the Content Layer

1import { createContentLayer, defineConfig } from "@pagesmith/core";23const config = defineConfig({4  collections: { posts, authors },5  markdown: {6    shiki: { themes: { light: "github-light", dark: "github-dark" } },7  },8});910const layer = createContentLayer(config);1112// Get all entries in a collection13const allPosts = await layer.getCollection("posts");1415// Get a single entry by slug16const post = await layer.getEntry("posts", "hello-world");1718// Render markdown to HTML (lazy, cached)19if (post) {20  const rendered = await post.render();21  // rendered.html, rendered.headings, rendered.readTime22}2324// Validate all collections25const results = await layer.validate();

Using Collections with the Vite Plugin

For Vite-based projects, use the pagesmithContent() plugin to expose collections as virtual modules. Most app integrations keep that import on @pagesmith/site/vite, while @pagesmith/core/vite remains available for lower-level content-only setups:

1import { pagesmithContent } from "@pagesmith/site/vite";2import collections from "./content.config";34export default defineConfig({5  plugins: [pagesmithContent(collections)],6});

Then import collection data in your application code:

1import posts from "virtual:content/posts";23// Markdown entries: { id, contentSlug, html, headings, frontmatter }4// Data entries: { id, contentSlug, data }5for (const post of posts) {6  console.log(post.frontmatter.title, post.html);7}

The Vite plugin generates TypeScript declarations (pagesmith-content.d.ts) so imports are fully typed.

Content Plugins

Content plugins extend the markdown pipeline and validation system. A ContentPlugin can inject remark and rehype plugins and add custom validation:

1import { defineConfig } from "@pagesmith/core";23const myPlugin = {4  name: "my-plugin",5  rehypePlugin: () => (tree) => {6    // Transform the HAST tree7  },8  remarkPlugin: () => (tree) => {9    // Transform the MDAST tree10  },11  validate: (entry) => {12    const errors = [];13    if (!entry.data.title) {14      errors.push("Missing required title field");15    }16    return errors;17  },18};1920const config = defineConfig({21  collections: { posts },22  plugins: [myPlugin],23});

Plugin remark and rehype plugins are appended to the pipeline during rendering. Plugin validators run during the loading phase, after schema and content validators.

Practical Guidance

Use one collection per durable content type.
Keep singleton config files in their own collection.
Prefer markdown collections for longform content and structured loaders for metadata-heavy content.
Use include and exclude patterns when a directory contains mixed file types.
Keep schemas strict — they serve as documentation for your content model.
Use transforms for data normalization and computed fields for derived values.
Use filters to exclude draft content in production without removing files from the filesystem.