File Format Specification

The OLAMIP file (/olamip.json) is a structured JSON document that provides curated summaries of your website’s most important pages. It is designed to be easily parsed by large language models (LLMs), enabling them to understand, prioritize, and use your content with clarity, precision, and intent.


File Location

The file must be hosted at the root of your domain: https://yourdomain.com/olamip.json
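Because the location is fixed at the domain root, a client can derive it from any page URL on the site. The following is a minimal sketch, not part of the protocol; the function name `olamip_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def olamip_url(page_url: str) -> str:
    """Return the root-level olamip.json URL for the site hosting page_url."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected an absolute URL, got {page_url!r}")
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/olamip.json", "", ""))


location = olamip_url("https://yourdomain.com/blog/some-post/?ref=home#top")
```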

Declaring the OLAMIP File Location

To maximize adoption and ensure that all systems can reliably locate your OLAMIP file, I recommend publishing both a <link> tag and a <meta> tag in your site’s <head> section.

1. `<link rel="olamip">`: Primary Discovery

Standardized practice: Crawlers and parsers already scan <link> tags for resources like canonical, sitemap, and alternate.
Machine-friendly: Declares a formal relationship between the page and the OLAMIP file.
Interoperability: Fits neatly into existing web standards, making it easier for AI systems to adopt without special handling.

2. `<meta name="olamip-location">`: Fallback Discovery

Human-readable: Simple for webmasters to add and understand.
Compatibility: Some parsers and tools prefer scanning <meta> tags for metadata.
Redundancy: Acts as a backup if a crawler doesn’t yet support rel="olamip".

Why Both Together Are Stronger

Future-proofing: As OLAMIP adoption grows, different systems may implement discovery differently. Including both ensures no system is left behind.
Resilience: If one method fails (e.g., a crawler ignores <link> tags), the other provides a fallback.
Ease of integration: Developers can choose whichever method fits their pipeline best, without forcing webmasters to guess.
Trust and clarity: Dual signals reduce ambiguity and make it explicit where the OLAMIP file lives.

Best Practice Implementation

<link rel="olamip" href="https://yourdomain.com/olamip.json">
<meta name="olamip-location" content="https://yourdomain.com/olamip.json">

Including both tags makes your OLAMIP file discoverable by the widest range of crawlers, validators, and AI systems, whichever discovery method they implement.
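On the consuming side, discovery could look roughly like the sketch below. It uses Python's standard `html.parser` and assumes the `<link>` tag takes precedence over the `<meta>` tag, as the "primary" and "fallback" labels above suggest. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class OlamipLocator(HTMLParser):
    """Collects the first rel="olamip" link and the first olamip-location meta tag."""

    def __init__(self):
        super().__init__()
        self.link_href = None
        self.meta_content = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and self.link_href is None:
            # rel may hold several space-separated tokens.
            rels = (attr.get("rel") or "").lower().split()
            if "olamip" in rels:
                self.link_href = attr.get("href")
        elif tag == "meta" and self.meta_content is None:
            if (attr.get("name") or "").lower() == "olamip-location":
                self.meta_content = attr.get("content")


def find_olamip_location(html: str):
    """Return the declared OLAMIP URL, preferring <link> over <meta>, or None."""
    parser = OlamipLocator()
    parser.feed(html)
    return parser.link_href or parser.meta_content
```

If neither tag is present, a client can still try the root location `/olamip.json` directly.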

File Structure

The OLAMIP file must be a valid UTF-8 encoded JSON document containing:

Field	Description
protocol	Must be `"OLAMIP"`.
version	Protocol version (e.g., `"1.0"`).
identity	Describes the website or organization.
content	Contains overview, sections, subsections, and entries.
metadata	File‑level metadata such as language and last update date.

High‑level structure:


  {
    "protocol": "OLAMIP",
    "version": "1.0",
    "identity": { ... },
    "content": { ... },
    "metadata": { ... }
  }
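A first-pass check of this top-level shape might look like the sketch below. It assumes all five fields listed in the table are mandatory and only verifies their presence; the function name `check_top_level` is hypothetical.

```python
import json

REQUIRED_TOP_LEVEL = ("protocol", "version", "identity", "content", "metadata")


def check_top_level(raw: bytes) -> list[str]:
    """Return a list of problems found in the raw bytes of an olamip.json file."""
    try:
        doc = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        return [f"not valid UTF-8 JSON: {exc}"]
    if not isinstance(doc, dict):
        return ["top-level value must be a JSON object"]
    problems = [f"missing field: {key}" for key in REQUIRED_TOP_LEVEL if key not in doc]
    if "protocol" in doc and doc["protocol"] != "OLAMIP":
        problems.append('protocol must be "OLAMIP"')
    return problems
```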

1. Identity Object

Field	Type	Required	Description
name	string	✅ Yes	Name of the website or organization.
type	string	✅ Yes	Entity type (e.g., “company”, “blog”, “ecommerce”).
canonical_description	string	✅ Yes	Human‑readable description of the site.
tags	array	❌ No	Optional keywords describing the domain or industry.

2. Content Object

The content object contains:

an overview
a list of sections
each section may contain subsections
each section or subsection may contain entries

This supports multi‑level hierarchies.

Overview Object

Field	Type	Required	Description
summary	string	✅ Yes	A concise explanation of the website’s purpose.

2.1 Section Object Specification

A Section represents a category, collection, or grouping of content. Sections may contain:

entries (content items)
subsections (nested Section objects)

This allows unlimited nesting depth.

Section Fields

Field	Type	Required	Description
title	string	✅ Yes	Human‑readable name of the section.
summary	string	✅ Yes	Description of what the section contains.
url	string	✅ Yes	Canonical URL of the section.
section_type	string	✅ Yes	Semantic classification (see taxonomy).
policy	string	❌ No	“allow” or “forbid”. See explanation below for inheritance and default behavior.
tags	array	❌ No	Optional keywords.
priority	string	❌ No	“high”, “medium”, or “low”.
published	string	❌ No	ISO 8601 date.
entries	array	❌ No	Array of Entry objects.
subsections	array	❌ No	Array of nested Section objects.
language	string	❌ No	Use BCP‑47 language codes.

Allowed `section_type` Values

section_type	Meaning
blog_category	Groups blog articles.
news_section	Groups news articles.
product_collection	Groups products or services.
doc_category	Groups documentation pages.
research_category	Groups research papers or datasets.
project_group	Groups portfolio projects.
content_section	Generic fallback.
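The Section rules above (four required fields, a closed set of `section_type` values, and nested subsections that follow the same schema) can be checked recursively. This is a minimal sketch; the function name and the wording of the messages are illustrative only.

```python
SECTION_REQUIRED = ("title", "summary", "url", "section_type")
SECTION_TYPES = {
    "blog_category", "news_section", "product_collection", "doc_category",
    "research_category", "project_group", "content_section",
}


def check_section(section: dict, path: str = "section") -> list[str]:
    """Return problems for a Section and, recursively, for its subsections."""
    problems = [f"{path}: missing {name}" for name in SECTION_REQUIRED if name not in section]
    section_type = section.get("section_type")
    if section_type is not None and section_type not in SECTION_TYPES:
        problems.append(f"{path}: unknown section_type {section_type!r}")
    for i, sub in enumerate(section.get("subsections", [])):
        problems.extend(check_section(sub, f"{path}.subsections[{i}]"))
    return problems
```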

Policy Behavior and Inheritance

The policy field controls whether AI systems are permitted to ingest the content represented by a Section, Subsection, or Entry. Valid values are "allow" and "forbid". This field is optional at all levels of the OLAMIP structure.

Default Behavior

If the policy field is omitted at a given level, the effective policy is determined through inheritance. If no ancestor defines a policy, the effective policy defaults to "allow".

Inheritance Rules

OLAMIP supports hierarchical inheritance of the policy field. AI systems must determine the effective policy for each Entry using the following lookup order:

Entry-level policy: If the Entry defines a policy, that value is authoritative.
Subsection-level policy: If the Entry omits policy, AI systems must check the nearest Subsection that contains it, then each enclosing Subsection in turn.
Section-level policy: If neither the Entry nor any Subsection above it defines policy, AI systems must use the policy defined at the Section level.
Default policy: If no ancestor defines a policy, the effective policy is "allow".

Intended Webmaster Usage

To make the entire website ingestible by AI systems, omit the policy field everywhere.
To control ingestion, use "allow" and "forbid" selectively at any level of the hierarchy.
A policy applied at a Section or Subsection automatically applies to all descendants unless overridden.

AI System Requirements

AI systems must:

Apply the default "allow" only when no explicit policy exists in the ancestor chain.
Respect the effective policy determined through inheritance.
Treat "forbid" as a strict prohibition on ingestion.
Treat "allow" as permission to ingest the content represented by that node.
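The lookup order above amounts to carrying the nearest ancestor's policy down the tree and letting each node override it. The sketch below illustrates one way to compute the effective policy per Entry; it assumes the tree has already been validated, and the function name is hypothetical.

```python
def effective_policies(sections, inherited="allow"):
    """Yield (entry_url, effective_policy) for every Entry under the given sections."""
    for section in sections:
        # A Section or Subsection overrides whatever it inherited, if it sets policy.
        section_policy = section.get("policy", inherited)
        for entry in section.get("entries", []):
            # An Entry-level policy is authoritative; otherwise inherit.
            yield entry["url"], entry.get("policy", section_policy)
        yield from effective_policies(section.get("subsections", []), section_policy)


sections = [
    {
        "policy": "forbid",
        "entries": [
            {"url": "https://example.com/a/"},
            {"url": "https://example.com/b/", "policy": "allow"},
        ],
        "subsections": [{"entries": [{"url": "https://example.com/c/"}]}],
    },
    {"entries": [{"url": "https://example.com/d/"}]},
]
policies = dict(effective_policies(sections))
```

Here `a` and `c` inherit "forbid" from the first Section, `b` overrides it with its own "allow", and `d` falls back to the default "allow".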

Multi‑Level Hierarchy Diagram

 content
 └── sections[]
     ├── Section (Level 1)
     │     ├── entries[]
     │     └── subsections[]
     │           ├── Section (Level 2)
     │           │     ├── entries[]
     │           │     └── subsections[]
     │           │           └── Section (Level 3)
     │           │                 └── entries[]
     │           └── ...
     └── ...

This structure supports:

News → Politics → Opinion → Articles
Docs → API → Authentication → Pages
Store → Clothing → Men → Jackets → Products
Research → Physics → Quantum → Papers

2.2 Entry Object Specification

An Entry is the most granular content unit. Examples:

Blog article, news article, product page, documentation page, research paper, portfolio project, legal page, downloadable resource.

Entry Fields

Field	Type	Required	Description
title	string	✅ Yes	Human‑readable title.
summary	string	✅ Yes	Concise description of the content.
url	string	✅ Yes	Canonical, absolute URL.
content_type	string	✅ Yes	Semantic classification (see taxonomy).
policy	string	❌ No	“allow” or “forbid”. Same as in Sections/Subsections.
tags	array	❌ No	Optional keywords.
priority	string	❌ No	“high”, “medium”, or “low”.
published	string	❌ No	ISO 8601 publication date.
language	string	❌ No	Use BCP‑47 language codes.
metadata*	string	❌ No	Domain or page‑specific structured information

Arrays and Multi‑Value Fields

Some OLAMIP fields are designed to hold more than one value. Whenever a field contains multiple elements, such as tags or any custom list defined within the optional metadata field of an Entry object, it must be expressed as a JSON array. Arrays are enclosed in square brackets ([ ]) and contain a comma‑separated sequence of individual string values.

Using arrays ensures that AI systems can reliably interpret multi‑value data without ambiguity. Each element inside the array must be a standalone string, and the order of elements should remain consistent whenever it carries semantic meaning.

*The metadata field is used to store domain‑specific or page‑specific structured information that extends beyond the core OLAMIP fields. This field allows publishers to include additional machine‑interpretable details relevant to their industry or content type, providing AI systems with richer contextual signals without altering the core protocol.
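As an illustration, the sketch below builds an Entry whose multi-value fields are arrays of strings and checks them. It assumes the Entry-level metadata field holds a JSON object with custom keys; the keys `sizes` and `colors` are hypothetical and not defined by OLAMIP.

```python
def is_string_array(value) -> bool:
    """True when value is a list whose elements are all strings."""
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


# Hypothetical Entry: "sizes" and "colors" are illustrative custom lists.
entry = {
    "title": "Linen Jacket",
    "summary": "Lightweight linen jacket for summer.",
    "url": "https://store.com/products/linen-jacket/",
    "content_type": "product",
    "tags": ["linen", "jacket", "summer"],
    "metadata": {"sizes": ["S", "M", "L"], "colors": ["navy", "sand"]},
}

# Gather every multi-value field and list the ones that are not string arrays.
multi_value = {"tags": entry["tags"], **entry["metadata"]}
invalid = [name for name, value in multi_value.items() if not is_string_array(value)]
```

A comma-separated string such as `"linen, jacket"` would fail this check, which is the ambiguity the array requirement is meant to avoid.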

Why URLs Are Required

The URL field is essential because it serves as the canonical identifier for the content. While summaries convey the meaning of the content, URLs link that meaning to a specific and verifiable location on the web. AI systems utilize URLs for deduplication, retrieval, validation, and cross-referencing with schema.org, sitemaps, and crawlers.

Allowed `content_type` Values

General Pages

content_type	Meaning
page	Standard content page.
landing_page	Marketing or campaign page.
legal_page	Terms, privacy, disclaimers.

Blog

content_type	Meaning
blog_article	A blog post.

News

content_type	Meaning
news_article	A news story.

E‑commerce

content_type	Meaning
product	A product page.
service	A service offering.

Documentation

content_type	Meaning
doc_page	A documentation or help page.

Research

content_type	Meaning
research_paper	Academic or scientific paper.
dataset	Research dataset.

Portfolio

content_type	Meaning
project	Portfolio project or case study.

Media / Resources

content_type	Meaning
media_item	Video, audio, gallery, etc.
resource	Downloadable or reference material.
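The Entry field table and the `content_type` taxonomy above can be combined into a simple check. This is a sketch only; the function name `check_entry` is hypothetical.

```python
ENTRY_REQUIRED = ("title", "summary", "url", "content_type")
CONTENT_TYPES = {
    "page", "landing_page", "legal_page",   # general pages
    "blog_article", "news_article",         # blog and news
    "product", "service",                   # e-commerce
    "doc_page",                             # documentation
    "research_paper", "dataset",            # research
    "project",                              # portfolio
    "media_item", "resource",               # media and resources
}


def check_entry(entry: dict) -> list[str]:
    """Return problems with an Entry's required fields and content_type value."""
    problems = [f"missing {name}" for name in ENTRY_REQUIRED if name not in entry]
    content_type = entry.get("content_type")
    if content_type is not None and content_type not in CONTENT_TYPES:
        problems.append(f"unknown content_type {content_type!r}")
    return problems
```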

Multi‑Level Examples

Below are examples for different website types showing how subsections and entries work together.

Blog Website Example (3 Levels)

 Blog
 └── Photography
       └── Tutorials
             └── Articles

{
  "title": "Blog",
  "summary": "Articles and guides across multiple topics.",
  "url": "https://example.com/blog/",
  "section_type": "blog_category",
  "policy": "allow",
  "subsections": [
    {
      "title": "Photography",
      "summary": "Articles about photography techniques and gear.",
      "url": "https://example.com/blog/photography/",
      "section_type": "blog_category",
      "subsections": [
        {
          "title": "Tutorials",
          "summary": "Step-by-step photography guides.",
          "url": "https://example.com/blog/photography/tutorials/",
          "section_type": "blog_category",
          "entries": [
            {
              "title": "How to Shoot Long Exposure",
              "summary": "A beginner-friendly guide to long exposure photography.",
              "url": "https://example.com/blog/photography/tutorials/long-exposure/",
              "content_type": "blog_article"
            }
          ]
        }
      ]
    }
  ]
}

E‑commerce Website Example (Collections → Subcollections → Products)

 Store
 └── Clothing
       └── Men
             └── Jackets

{
  "title": "Clothing",
  "summary": "Apparel for all categories.",
  "url": "https://store.com/clothing/",
  "section_type": "product_collection",
  "policy": "allow",
  "subsections": [
    {
      "title": "Men",
      "summary": "Men's apparel.",
      "url": "https://store.com/clothing/men/",
      "section_type": "product_collection",
      "subsections": [
        {
          "title": "Jackets",
          "summary": "Men's jackets and outerwear.",
          "url": "https://store.com/clothing/men/jackets/",
          "section_type": "product_collection",
          "entries": [
            {
              "title": "Linen Jacket",
              "summary": "Lightweight linen jacket for summer.",
              "url": "https://store.com/products/linen-jacket/",
              "content_type": "product"
            }
          ]
        }
      ]
    }
  ]
}

Documentation Website Example (Category → Subcategory → Pages)

 Docs
 └── API
      └── Authentication
            └── Pages

{
  "title": "API Documentation",
  "summary": "Technical reference for developers.",
  "url": "https://docs.example.com/api/",
  "section_type": "doc_category",
  "policy": "allow",
  "subsections": [
    {
      "title": "Authentication",
      "summary": "Guides for API authentication.",
      "url": "https://docs.example.com/api/auth/",
      "section_type": "doc_category",
      "entries": [
        {
          "title": "API Key Authentication",
          "summary": "How to authenticate using API keys.",
          "url": "https://docs.example.com/api/auth/api-keys/",
          "content_type": "doc_page"
        }
      ]
    }
  ]
}

Research Website Example (Field → Subfield → Papers)

 Research
 └── Physics
      └── Quantum Mechanics
            └── Papers

{
  "title": "Physics",
  "summary": "Research in classical and modern physics.",
  "url": "https://research.example.com/physics/",
  "section_type": "research_category",
  "policy": "allow",
  "subsections": [
    {
      "title": "Quantum Mechanics",
      "summary": "Research papers on quantum theory.",
      "url": "https://research.example.com/physics/quantum/",
      "section_type": "research_category",
      "entries": [
        {
          "title": "Quantum Entanglement in Multi‑Particle Systems",
          "summary": "A study of entanglement behavior in complex quantum systems.",
          "url": "https://research.example.com/papers/entanglement/",
          "content_type": "research_paper"
        }
      ]
    }
  ]
}

Priority Field Guidelines

Value	Meaning
high	Flagship, mission‑critical content. Use sparingly.
medium	Default for most content.
low	Niche, outdated, or low‑value content.

Best Practices

Recommendation	Reason
Limit high to 5–10%	Preserves meaning of the signal.
Default to medium	Ensures consistency.
Use low for niche/legacy content	Reduces noise.
Review priorities regularly	Keeps the file accurate.
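A publisher could monitor the 5–10% guideline with a small script such as the sketch below. It assumes entries without a priority field count as "medium", and the 10% threshold is taken from the table above; both function names are hypothetical.

```python
from collections import Counter


def priority_shares(entries):
    """Return the fraction of entries at each priority level."""
    # Assumption: a missing priority is treated as "medium".
    counts = Counter(entry.get("priority", "medium") for entry in entries)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {level: counts[level] / total for level in ("high", "medium", "low")}


def high_is_overused(entries, limit=0.10):
    """True when more than `limit` of the entries are marked high."""
    return priority_shares(entries).get("high", 0.0) > limit
```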

Why It Matters

LLMs may use "priority" to:

Allocate more attention during training
Rank pages for retrieval tasks
Filter out less relevant content

If every page is marked "high", the signal becomes meaningless, and your most valuable content gets lost in the noise.

Why Categorical Priority Works Best

Benefit	Explanation
Clarity & Consistency	“High/Medium/Low” is universally interpretable.
Simpler for Publishers	No numeric scoring required.
Easier to Validate	Tools can detect misuse.
Flexible for LLM Pipelines	Models can internally map categories to weights.
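To illustrate the last row, a consuming pipeline might map the three categories to numeric weights before ranking. The weights below are purely hypothetical; OLAMIP does not prescribe any numeric values, and a missing priority is assumed to mean "medium".

```python
# Hypothetical weights chosen for illustration only.
PRIORITY_WEIGHTS = {"high": 1.0, "medium": 0.5, "low": 0.1}


def priority_weight(entry: dict) -> float:
    """Map an Entry's categorical priority to a numeric weight."""
    return PRIORITY_WEIGHTS.get(entry.get("priority", "medium"), PRIORITY_WEIGHTS["medium"])


def rank_by_priority(entries):
    """Return entries ordered from highest to lowest weight, keeping ties in input order."""
    return sorted(entries, key=priority_weight, reverse=True)
```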


3. Metadata Object

 
 "metadata": { 
     "last_updated": "2026-01-21",
     "language": "en",
     "source_url": "https://www.yourwebsite.com/",
     "copyright": "© year Copyright Holder" 
 }

Multilanguage Support

To fully support multilingual websites, you should define language at:

File level (global default), inside metadata.
Section level (optional override).
Entry level (optional override).

This is essential for:

multilingual blogs
international news outlets
research sites with papers in multiple languages
e‑commerce stores with localized product pages
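The three levels described above (file default, section override, entry override) can be resolved with a walk like the one sketched below. It assumes nested subsections inherit the language of their parent Section, which the text implies but does not state, and that sections live under `content.sections`. The function name is hypothetical.

```python
def effective_languages(doc: dict):
    """Yield (entry_url, language) pairs using file, section, then entry overrides."""
    default = doc.get("metadata", {}).get("language")

    def walk(sections, inherited):
        for section in sections:
            language = section.get("language", inherited)
            for entry in section.get("entries", []):
                yield entry["url"], entry.get("language", language)
            # Assumption: subsections inherit from their parent Section.
            yield from walk(section.get("subsections", []), language)

    yield from walk(doc.get("content", {}).get("sections", []), default)


doc = {
    "metadata": {"language": "en"},
    "content": {"sections": [
        {"language": "es", "entries": [
            {"url": "https://example.com/es/uno/"},
            {"url": "https://example.com/fr/un/", "language": "fr"},
        ]},
        {"entries": [{"url": "https://example.com/one/"}]},
    ]},
}
languages = dict(effective_languages(doc))
```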

Language Format

Use BCP‑47 language codes, the global standard used by:

schema.org
HTML lang attribute
W3C
search engines
major LLM pipelines

Examples:

Language	Code
English	`en`
Spanish	`es`
French	`fr`
German	`de`
Portuguese (Brazil)	`pt-BR`
Chinese (Simplified)	`zh-CN`
Arabic	`ar`

Why This Matters for AI Systems

LLMs use language metadata to:

choose the correct tokenizer
apply the right summarization model
avoid mixing languages in embeddings
improve retrieval accuracy
reduce hallucinations in multilingual contexts
support cross‑language search and translation

Without explicit language fields, AI systems must guess, and they often guess wrong.

General Validation Rules

Rule	Requirement
Valid JSON	No trailing commas or malformed structures.
Required fields	Sections and entries must include required fields.
Canonical URLs	Must be absolute and stable.
Summary length	Under 500 characters.
Tags	Lowercase, single‑word strings.
Subsections	Must follow the Section schema.
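Several of these rules are mechanical enough to check in a few lines. The sketch below covers summary length, absolute URLs, and tag format for a single Section or Entry. It assumes "absolute" means an http or https URL with a host, and "single-word" means no whitespace; the function name is hypothetical.

```python
from urllib.parse import urlsplit


def check_node(node: dict) -> list[str]:
    """Check one Section or Entry against the summary, URL, and tag rules."""
    problems = []
    if len(node.get("summary", "")) >= 500:
        problems.append("summary must be under 500 characters")
    parts = urlsplit(node.get("url", ""))
    if parts.scheme not in ("http", "https") or not parts.netloc:
        problems.append("url must be absolute")
    for tag in node.get("tags", []):
        # A valid tag is a lowercase string with no whitespace in it.
        if not isinstance(tag, str) or tag != tag.lower() or len(tag.split()) != 1:
            problems.append(f"tag {tag!r} must be a lowercase single word")
    return problems
```

Stability of a URL over time cannot be verified from the file alone, so that part of the rule is left to the publisher.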

Versioning

Guideline	Purpose
Parsers ignore unknown fields.	Ensures forward compatibility.
Publishers validate against latest schema.	Ensures correctness.
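For the first guideline, a parser can keep only the fields it recognizes and silently drop the rest, so files written against a newer version still load. This is a minimal sketch for Entry objects; the function name is hypothetical.

```python
KNOWN_ENTRY_FIELDS = {
    "title", "summary", "url", "content_type", "policy",
    "tags", "priority", "published", "language", "metadata",
}


def read_entry(raw: dict) -> dict:
    """Return a copy of the Entry containing only the fields this parser knows."""
    return {key: value for key, value in raw.items() if key in KNOWN_ENTRY_FIELDS}
```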

Semantic Alignment

OLAMIP complements existing structured data standards.

Standard	Purpose
schema.org / JSON‑LD	Defines what a page is for search engines and knowledge graphs.
OLAMIP	Explains why the page matters and how LLMs should interpret it.

Together

schema.org → structural meaning
OLAMIP → human‑curated interpretation

This dual‑layer approach improves AI comprehension and reduces hallucination.

Practical Integration

Recommended Workflow

Step	Action
1	Keep schema.org markup in your HTML.
2	Add OLAMIP to describe meaning, importance, and structure.
3	Reference schema.org types or Wikidata IDs when helpful.
4	Keep summaries concise and factual.
5	Update OLAMIP regularly.

Role Comparison

Task	schema.org	OLAMIP
Describe structured entities	✔️
Improve search engine visibility	✔️
Provide human‑curated summaries		✔️
Classify content for LLMs		✔️
Prioritize important pages		✔️
Provide multilingual context		✔️

Delta Updates (olamip-delta.json)

To support continuous learning and efficient AI ingestion, OLAMIP includes an optional companion file: olamip-delta.json. This file contains only the changes made since the last update to your main olamip.json file. It may include:

added: new pages or products
updated: modified summaries, tags, or metadata
removed: URLs no longer present on your site

Delta files allow AI systems to stay synchronized with your content without reprocessing the entire dataset. Important: The main olamip.json must always remain fully updated and reflect the current state of your website. Delta files are incremental updates, not replacements.
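A consumer that keeps a URL-keyed index of entries could apply a delta roughly as sketched below. The exact layout of olamip-delta.json is not defined in this section, so the sketch assumes `added` and `updated` are arrays of Entry objects and `removed` is an array of URL strings. The function name is hypothetical.

```python
def apply_delta(index: dict, delta: dict) -> dict:
    """Return a new URL-keyed index with the delta's changes applied."""
    result = dict(index)  # leave the caller's index untouched
    # Assumption: added and updated hold full Entry objects keyed by their url.
    for entry in delta.get("added", []) + delta.get("updated", []):
        result[entry["url"]] = entry
    # Assumption: removed holds plain URL strings.
    for url in delta.get("removed", []):
        result.pop(url, None)
    return result
```

If a delta is missed or the index drifts, reloading the full olamip.json restores the current state, which is why the main file must stay complete.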