The OLAMIP file (/olamip.json) is a structured JSON document that provides curated summaries of your website’s most important pages. It is designed to be easily parsed by large language models (LLMs), enabling them to understand, prioritize, and use your content with clarity, precision, and intent.

File Location
- The file must be hosted at the root of your domain: https://yourdomain.com/olamip.json
Declaring the OLAMIP File Location
To maximize adoption and ensure that all systems can reliably locate your OLAMIP file, I recommend publishing both a <link> tag and a <meta> tag in your site’s <head> section.
1. <link rel="olamip"> (Primary Discovery)
- Standardized practice: Crawlers and parsers already scan <link> tags for resources like canonical, sitemap, and alternate.
- Machine-friendly: Declares a formal relationship between the page and the OLAMIP file.
- Interoperability: Fits neatly into existing web standards, making it easier for AI systems to adopt without special handling.
2. <meta name="olamip-location"> (Fallback Discovery)
- Human-readable: Simple for webmasters to add and understand.
- Compatibility: Some parsers and tools prefer scanning <meta> tags for metadata.
- Redundancy: Acts as a backup if a crawler doesn't yet support rel="olamip".
Why Both Together Are Stronger
- Future-proofing: As OLAMIP adoption grows, different systems may implement discovery differently. Including both ensures no system is left behind.
- Resilience: If one method fails (e.g., a crawler ignores <link> tags), the other provides a fallback.
- Ease of integration: Developers can choose whichever method fits their pipeline best, without forcing webmasters to guess.
- Trust and clarity: Dual signals reduce ambiguity and make it explicit where the OLAMIP file lives.
Best Practice Implementation
```html
<link rel="olamip" href="https://yourdomain.com/olamip.json">
<meta name="olamip-location" content="https://yourdomain.com/olamip.json">
```
By including both tags, you make your OLAMIP file discoverable by the widest range of crawlers, validators, and AI systems, which helps ensure your content is read the way you intend.
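A crawler that honors both signals could resolve the file location in the order described above: <link> first, then <meta>, then the required root location. The following is a minimal Python sketch using only the standard library. The helper names (find_olamip_url, _OlamipTagFinder) are hypothetical, and resolving relative values with urljoin is an assumption, since the examples here always use absolute URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _OlamipTagFinder(HTMLParser):
    """Records the first OLAMIP location declared by a <link> or <meta> tag."""

    def __init__(self):
        super().__init__()
        self.link_href = None
        self.meta_content = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        name = (attrs.get("name") or "").lower()
        if tag == "link" and "olamip" in rels and self.link_href is None:
            self.link_href = attrs.get("href")
        elif tag == "meta" and name == "olamip-location" and self.meta_content is None:
            self.meta_content = attrs.get("content")


def find_olamip_url(page_url, html):
    """Return the OLAMIP file URL for a page (hypothetical helper, not part of the spec)."""
    finder = _OlamipTagFinder()
    finder.feed(html)
    finder.close()
    # Primary discovery via <link rel="olamip">, then the <meta> fallback.
    declared = finder.link_href or finder.meta_content
    if declared:
        # Assumption: relative values are resolved against the page URL.
        return urljoin(page_url, declared)
    # Neither tag present: fall back to the required root location.
    return urljoin(page_url, "/olamip.json")
```

Checking the <link> tag first mirrors its role as the primary signal; the root fallback means a page with neither tag still resolves to https://yourdomain.com/olamip.json.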
File Structure
The OLAMIP file must be a valid UTF-8 encoded JSON document containing:
| Field | Description |
|---|---|
| protocol | Must be "OLAMIP". |
| version | Protocol version (e.g., "1.0"). |
| identity | Describes the website or organization. |
| content | Contains overview, sections, subsections, and entries. |
| metadata | File‑level metadata such as language and last update date. |
High‑level structure:
```json
{
  "protocol": "OLAMIP",
  "version": "1.0",
  "identity": { ... },
  "content": { ... },
  "metadata": { ... }
}
```
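A consumer might check this top-level structure before looking deeper. The sketch below decodes the raw bytes as UTF-8, parses them as JSON, and reports missing or mistyped top-level fields; check_top_level is a hypothetical name, and requiring identity, content, and metadata to be JSON objects is inferred from the field table rather than stated outright.

```python
import json

REQUIRED_TOP_LEVEL = ("protocol", "version", "identity", "content", "metadata")


def check_top_level(raw_bytes):
    """Return a list of problems with the top-level OLAMIP structure (empty if none)."""
    try:
        doc = json.loads(raw_bytes.decode("utf-8"))
    except UnicodeDecodeError:
        return ["file is not valid UTF-8"]
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(doc, dict):
        return ["top-level value must be a JSON object"]
    problems = [f"missing field: {key}" for key in REQUIRED_TOP_LEVEL if key not in doc]
    if "protocol" in doc and doc["protocol"] != "OLAMIP":
        problems.append('protocol must be "OLAMIP"')
    if "version" in doc and not isinstance(doc["version"], str):
        problems.append('version must be a string such as "1.0"')
    for key in ("identity", "content", "metadata"):
        # Assumption: these three fields are JSON objects, as in the outline above.
        if key in doc and not isinstance(doc[key], dict):
            problems.append(f"{key} must be a JSON object")
    return problems
```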
1. Identity Object
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | ✅ Yes | Name of the website or organization. |
| type | string | ✅ Yes | Entity type (e.g., “company”, “blog”, “ecommerce”). |
| canonical_description | string | ✅ Yes | Human‑readable description of the site. |
| tags | array | ❌ No | Optional keywords describing the domain or industry. |
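The identity table translates directly into a few type checks. This is a sketch: check_identity is a hypothetical helper, and treating an empty or whitespace-only string as missing is an assumption, not a stated rule.

```python
def check_identity(identity):
    """Validate an identity object against the field table above (sketch)."""
    problems = []
    for field in ("name", "type", "canonical_description"):
        value = identity.get(field)
        # Assumption: a required string must also be non-empty.
        if not isinstance(value, str) or not value.strip():
            problems.append(f"identity.{field} is required and must be a non-empty string")
    tags = identity.get("tags")
    if tags is not None and not (isinstance(tags, list) and all(isinstance(t, str) for t in tags)):
        problems.append("identity.tags must be an array of strings")
    return problems
```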
2. Content Object
The content object contains:
- an overview
- a list of sections
- each section may contain subsections
- each section or subsection may contain entries
This supports multi‑level hierarchies.
Overview Object
| Field | Type | Required | Description |
|---|---|---|---|
| summary | string | ✅ Yes | A concise explanation of the website’s purpose. |
2.1 Section Object Specification
A Section represents a category, collection, or grouping of content. Sections may contain:
- entries (content items)
- subsections (nested Section objects)
This allows unlimited nesting depth.
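Because a subsection is just a nested Section, a consumer can collect every Entry with one recursive walk. A minimal Python sketch (iter_entries is a hypothetical name; the sample data is trimmed to the fields the walk uses):

```python
def iter_entries(sections, path=()):
    """Yield (section_path, entry) pairs for every Entry, at any nesting depth."""
    for section in sections:
        here = path + (section.get("title", "?"),)
        for entry in section.get("entries", []):
            yield here, entry
        # Subsections are ordinary Section objects, so the same walk applies recursively.
        yield from iter_entries(section.get("subsections", []), here)


blog = {
    "title": "Blog",
    "subsections": [{
        "title": "Photography",
        "subsections": [{
            "title": "Tutorials",
            "entries": [{"title": "How to Shoot Long Exposure"}],
        }],
    }],
}
for section_path, entry in iter_entries([blog]):
    print(" > ".join(section_path), "|", entry["title"])
# prints: Blog > Photography > Tutorials | How to Shoot Long Exposure
```

Python's default recursion limit (1,000 frames) is far deeper than any realistic site hierarchy, so recursion is a reasonable choice here.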
Section Fields
| Field | Type | Required | Description |
|---|---|---|---|
| title | string | ✅ Yes | Human‑readable name of the section. |
| summary | string | ✅ Yes | Description of what the section contains. |
| url | string | ✅ Yes | Canonical URL of the section. |
| section_type | string | ✅ Yes | Semantic classification (see taxonomy). |
| policy | string | ❌ No | "allow" or "forbid". See Policy Behavior and Inheritance below for inheritance and default behavior. |
| tags | array | ❌ No | Optional keywords. |
| priority | string | ❌ No | "high", "medium", or "low". |
| published | string | ❌ No | ISO 8601 date. |
| entries | array | ❌ No | Array of Entry objects. |
| subsections | array | ❌ No | Array of nested Section objects. |
| language | string | ❌ No | BCP‑47 language code; overrides the file-level default. |
Allowed section_type Values
| section_type | Meaning |
|---|---|
| blog_category | Groups blog articles. |
| news_section | Groups news articles. |
| product_collection | Groups products or services. |
| doc_category | Groups documentation pages. |
| research_category | Groups research papers or datasets. |
| project_group | Groups portfolio projects. |
| content_section | Generic fallback. |
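One way to validate a Section against the field table and the section_type list above is sketched below. It also checks the optional policy and priority values and recurses into subsections. check_section is a hypothetical helper, and it covers only what the tables state.

```python
SECTION_TYPES = {
    "blog_category", "news_section", "product_collection", "doc_category",
    "research_category", "project_group", "content_section",
}
REQUIRED_SECTION_FIELDS = ("title", "summary", "url", "section_type")


def check_section(section, path=None):
    """Recursively validate a Section and its subsections; return a list of problems."""
    path = path or section.get("title", "<untitled>")
    problems = []
    for field in REQUIRED_SECTION_FIELDS:
        if not isinstance(section.get(field), str):
            problems.append(f"{path}: missing or non-string '{field}'")
    stype = section.get("section_type")
    if isinstance(stype, str) and stype not in SECTION_TYPES:
        problems.append(f"{path}: unknown section_type {stype!r}")
    if section.get("policy") not in (None, "allow", "forbid"):
        problems.append(f'{path}: policy must be "allow" or "forbid"')
    if section.get("priority") not in (None, "high", "medium", "low"):
        problems.append(f'{path}: priority must be "high", "medium", or "low"')
    for sub in section.get("subsections", []):
        # Subsections follow the same schema, so validate them the same way.
        problems.extend(check_section(sub, f"{path} > {sub.get('title', '<untitled>')}"))
    return problems
```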
Policy Behavior and Inheritance
The policy field controls whether AI systems are permitted to ingest the content represented by a Section, Subsection, or Entry. Valid values are "allow" and "forbid". This field is optional at all levels of the OLAMIP structure.
Default Behavior
If the policy field is omitted at a given level, the effective policy is determined through inheritance. If no ancestor defines a policy, the effective policy defaults to "allow".
Inheritance Rules
OLAMIP supports hierarchical inheritance of the policy field. AI systems must determine the effective policy for each Entry using the following lookup order:
- Entry-level policy: If the Entry defines a policy, that value is authoritative.
- Subsection-level policy: If the Entry omits policy, AI systems must check the nearest Subsection that contains it, then each further enclosing Subsection in turn.
- Section-level policy: If neither the Entry nor any enclosing Subsection defines policy, AI systems must use the policy defined at the Section level.
- Default policy: If no ancestor defines a policy, the effective policy is "allow".
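The lookup order above amounts to passing the nearest explicit policy down the tree. A minimal Python sketch (effective_policies is a hypothetical name) that yields each Entry's URL with its effective policy and then keeps only what may be ingested:

```python
def effective_policies(sections, inherited="allow"):
    """Yield (entry_url, effective_policy) pairs, applying the lookup order above."""
    for section in sections:
        # A policy on this Section or Subsection overrides whatever an ancestor set;
        # at the top level, "inherited" carries the default "allow".
        current = section.get("policy", inherited)
        for entry in section.get("entries", []):
            # An Entry-level policy is authoritative when present.
            yield entry["url"], entry.get("policy", current)
        yield from effective_policies(section.get("subsections", []), current)


sections = [
    {"title": "Blog", "policy": "forbid",
     "entries": [{"url": "https://example.com/blog/drafts/"}],
     "subsections": [{"title": "Public", "policy": "allow",
                      "entries": [{"url": "https://example.com/blog/hello/"},
                                  {"url": "https://example.com/blog/private/", "policy": "forbid"}]}]},
    {"title": "Docs", "entries": [{"url": "https://example.com/docs/start/"}]},
]
# Only "allow" may be ingested; "forbid" is a strict prohibition.
ingestible = [url for url, policy in effective_policies(sections) if policy == "allow"]
print(ingestible)
# prints: ['https://example.com/blog/hello/', 'https://example.com/docs/start/']
```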
Intended Webmaster Usage
- To make the entire website ingestible by AI systems, omit the policy field everywhere.
- To control ingestion, use "allow" and "forbid" selectively at any level of the hierarchy.
- A policy applied at a Section or Subsection automatically applies to all descendants unless overridden.
AI System Requirements
AI systems must:
- Apply the default "allow" only when no explicit policy exists in the ancestor chain.
- Respect the effective policy determined through inheritance.
- Treat "forbid" as a strict prohibition on ingestion.
- Treat "allow" as permission to ingest the content represented by that node.
Multi‑Level Hierarchy Diagram
```text
content
└── sections[]
    ├── Section (Level 1)
    │   ├── entries[]
    │   └── subsections[]
    │       ├── Section (Level 2)
    │       │   ├── entries[]
    │       │   └── subsections[]
    │       │       └── Section (Level 3)
    │       │           └── entries[]
    │       └── ...
    └── ...
```
This structure supports:
- News → Politics → Opinion → Articles
- Docs → API → Authentication → Pages
- Store → Clothing → Men → Jackets → Products
- Research → Physics → Quantum → Papers
2.2 Entry Object Specification
An Entry is the most granular content unit. Examples:
Blog article, news article, product page, documentation page, research paper, portfolio project, legal page, downloadable resource.
Entry Fields
| Field | Type | Required | Description |
|---|---|---|---|
| title | string | ✅ Yes | Human‑readable title. |
| summary | string | ✅ Yes | Concise description of the content. |
| url | string | ✅ Yes | Canonical, absolute URL. |
| content_type | string | ✅ Yes | Semantic classification (see taxonomy). |
| policy | string | ❌ No | “allow” or “forbid”. Same as in Sections/Subsections. |
| tags | array | ❌ No | Optional keywords. |
| priority | string | ❌ No | “high”, “medium”, or “low”. |
| published | string | ❌ No | ISO 8601 publication date. |
| language | string | ❌ No | Use BCP‑47 language codes |
| metadata* | object | ❌ No | Domain‑ or page‑specific structured information. |
Arrays and Multi‑Value Fields
Some OLAMIP fields are designed to hold more than one value. Whenever a field contains multiple elements, such as tags or any custom list defined within the optional metadata field of an Entry object, it must be expressed as a JSON array. Arrays are enclosed in square brackets ([ ]) and contain a comma‑separated sequence of individual string values.
Using arrays ensures that AI systems can reliably interpret multi‑value data without ambiguity. Each element inside the array must be a standalone string, and the order of elements should remain consistent whenever it carries semantic meaning.
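A validator could enforce this rule with a check like the following sketch. It catches the common mistake of writing a comma-separated string instead of an array. It assumes that every list found inside metadata is a multi-value field, and the function name is hypothetical.

```python
def check_multi_value_fields(node):
    """Flag multi-value fields that are not JSON arrays of strings (sketch)."""
    problems = []
    tags = node.get("tags")
    if tags is not None:
        if isinstance(tags, str):
            problems.append("tags must be a JSON array, not a comma-separated string")
        elif not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
            problems.append("tags must be an array of strings")
    metadata = node.get("metadata")
    if isinstance(metadata, dict):
        for key, value in metadata.items():
            # Assumption: any list inside metadata is a multi-value field.
            if isinstance(value, list) and not all(isinstance(v, str) for v in value):
                problems.append(f"metadata.{key} must contain only strings")
    return problems
```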
*The metadata field is used to store domain‑specific or page‑specific structured information that extends beyond the core OLAMIP fields. This field allows publishers to include additional machine‑interpretable details relevant to their industry or content type, providing AI systems with richer contextual signals without altering the core protocol.
Why URLs Are Required
The url field is essential because it serves as the canonical identifier for the content. While summaries convey what the content means, URLs tie that meaning to a specific, verifiable location on the web. AI systems use URLs for deduplication, retrieval, validation, and cross-referencing against schema.org markup, sitemaps, and crawled pages.
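As an illustration of URL-based deduplication, the sketch below keeps the first Entry seen for each URL. The normalization (lowercasing the scheme and host, dropping the fragment) is an assumption chosen for the example; OLAMIP itself only asks for canonical, absolute URLs, and the function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Light normalization (assumption): lowercase scheme and host, drop the fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, ""))


def dedupe_entries(entries):
    """Keep the first Entry seen for each normalized URL, preserving order."""
    seen = set()
    unique = []
    for entry in entries:
        key = normalize_url(entry["url"])
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique
```

Paths and query strings are left untouched because they are case-sensitive on many servers.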
Allowed content_type Values
General Pages
| content_type | Meaning |
|---|---|
| page | Standard content page. |
| landing_page | Marketing or campaign page. |
| legal_page | Terms, privacy, disclaimers. |
Blog
| content_type | Meaning |
|---|---|
| blog_article | A blog post. |
News
| content_type | Meaning |
|---|---|
| news_article | A news story. |
E‑commerce
| content_type | Meaning |
|---|---|
| product | A product page. |
| service | A service offering. |
Documentation
| content_type | Meaning |
|---|---|
| doc_page | A documentation or help page. |
Research
| content_type | Meaning |
|---|---|
| research_paper | Academic or scientific paper. |
| dataset | Research dataset. |
Portfolio
| content_type | Meaning |
|---|---|
| project | Portfolio project or case study. |
Media / Resources
| content_type | Meaning |
|---|---|
| media_item | Video, audio, gallery, etc. |
| resource | Downloadable or reference material. |
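The Entry field table and the content_type taxonomy above can be checked together. A minimal sketch; check_entry is a hypothetical helper and covers only required fields and enumerated values.

```python
CONTENT_TYPES = {
    "page", "landing_page", "legal_page", "blog_article", "news_article",
    "product", "service", "doc_page", "research_paper", "dataset",
    "project", "media_item", "resource",
}


def check_entry(entry):
    """Validate one Entry's required fields and enumerated values (sketch)."""
    problems = []
    for field in ("title", "summary", "url", "content_type"):
        if not isinstance(entry.get(field), str):
            problems.append(f"missing or non-string '{field}'")
    ctype = entry.get("content_type")
    if isinstance(ctype, str) and ctype not in CONTENT_TYPES:
        problems.append(f"unknown content_type {ctype!r}")
    if entry.get("policy") not in (None, "allow", "forbid"):
        problems.append('policy must be "allow" or "forbid"')
    if entry.get("priority") not in (None, "high", "medium", "low"):
        problems.append('priority must be "high", "medium", or "low"')
    return problems
```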
Multi‑Level Examples
Below are examples for different website types showing how subsections and entries work together.
Blog Website Example (3 Levels)
```text
Blog
└── Photography
    └── Tutorials
        └── Articles
```
```json
{
  "title": "Blog",
  "summary": "Articles and guides across multiple topics.",
  "url": "https://example.com/blog/",
  "section_type": "blog_category",
  "policy": "allow",
  "subsections": [
    {
      "title": "Photography",
      "summary": "Articles about photography techniques and gear.",
      "url": "https://example.com/blog/photography/",
      "section_type": "blog_category",
      "subsections": [
        {
          "title": "Tutorials",
          "summary": "Step-by-step photography guides.",
          "url": "https://example.com/blog/photography/tutorials/",
          "section_type": "blog_category",
          "entries": [
            {
              "title": "How to Shoot Long Exposure",
              "summary": "A beginner-friendly guide to long exposure photography.",
              "url": "https://example.com/blog/photography/tutorials/long-exposure/",
              "content_type": "blog_article"
            }
          ]
        }
      ]
    }
  ]
}
```
E‑commerce Website Example (Collections → Subcollections → Products)
```text
Store
└── Clothing
    └── Men
        └── Jackets
```
```json
{
  "title": "Clothing",
  "summary": "Apparel for all categories.",
  "url": "https://store.com/clothing/",
  "section_type": "product_collection",
  "policy": "allow",
  "subsections": [
    {
      "title": "Men",
      "summary": "Men's apparel.",
      "url": "https://store.com/clothing/men/",
      "section_type": "product_collection",
      "subsections": [
        {
          "title": "Jackets",
          "summary": "Men's jackets and outerwear.",
          "url": "https://store.com/clothing/men/jackets/",
          "section_type": "product_collection",
          "entries": [
            {
              "title": "Linen Jacket",
              "summary": "Lightweight linen jacket for summer.",
              "url": "https://store.com/products/linen-jacket/",
              "content_type": "product"
            }
          ]
        }
      ]
    }
  ]
}
```
Documentation Website Example (Category → Subcategory → Pages)
```text
Docs
└── API
    └── Authentication
        └── Pages
```
```json
{
  "title": "API Documentation",
  "summary": "Technical reference for developers.",
  "url": "https://docs.example.com/api/",
  "section_type": "doc_category",
  "policy": "allow",
  "subsections": [
    {
      "title": "Authentication",
      "summary": "Guides for API authentication.",
      "url": "https://docs.example.com/api/auth/",
      "section_type": "doc_category",
      "entries": [
        {
          "title": "API Key Authentication",
          "summary": "How to authenticate using API keys.",
          "url": "https://docs.example.com/api/auth/api-keys/",
          "content_type": "doc_page"
        }
      ]
    }
  ]
}
```
Research Website Example (Field → Subfield → Papers)
```text
Research
└── Physics
    └── Quantum Mechanics
        └── Papers
```
```json
{
  "title": "Physics",
  "summary": "Research in classical and modern physics.",
  "url": "https://research.example.com/physics/",
  "section_type": "research_category",
  "policy": "allow",
  "subsections": [
    {
      "title": "Quantum Mechanics",
      "summary": "Research papers on quantum theory.",
      "url": "https://research.example.com/physics/quantum/",
      "section_type": "research_category",
      "entries": [
        {
          "title": "Quantum Entanglement in Multi‑Particle Systems",
          "summary": "A study of entanglement behavior in complex quantum systems.",
          "url": "https://research.example.com/papers/entanglement/",
          "content_type": "research_paper"
        }
      ]
    }
  ]
}
```
Priority Field Guidelines
| Value | Meaning |
|---|---|
| high | Flagship, mission‑critical content. Use sparingly. |
| medium | Default for most content. |
| low | Niche, outdated, or low‑value content. |
Best Practices
| Recommendation | Reason |
|---|---|
| Limit high to 5–10% | Preserves meaning of the signal. |
| Default to medium | Ensures consistency. |
| Use low for niche/legacy content | Reduces noise. |
| Review priorities regularly | Keeps the file accurate. |
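The 5 to 10% guideline for "high" is easy to monitor automatically. The sketch below reports priority counts and flags files where "high" exceeds a threshold. Counting entries without a priority as "medium" mirrors the "default to medium" recommendation but is an assumption about how a tool would treat omissions; priority_report is a hypothetical name.

```python
from collections import Counter


def priority_report(entries, max_high_share=0.10):
    """Summarize priority usage and flag overuse of "high" (sketch)."""
    # Assumption: an omitted priority is treated as "medium".
    counts = Counter(e.get("priority", "medium") for e in entries)
    total = sum(counts.values())
    high_share = counts["high"] / total if total else 0.0
    return {
        "counts": dict(counts),
        "high_share": high_share,
        "too_many_high": high_share > max_high_share,
    }
```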
Why It Matters
LLMs may use "priority" to:
- Allocate more attention during training
- Rank pages for retrieval tasks
- Filter out less relevant content
If every page is marked "high", the signal becomes meaningless, and your most valuable content gets lost in the noise.
Why Categorical Priority Works Best
| Benefit | Explanation |
|---|---|
| Clarity & Consistency | “High/Medium/Low” is universally interpretable. |
| Simpler for Publishers | No numeric scoring required. |
| Easier to Validate | Tools can detect misuse. |
| Flexible for LLM Pipelines | Models can internally map categories to weights. |
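OLAMIP defines only the three categories, not numbers, so any weighting is up to the consuming pipeline. As a hypothetical example, a retrieval system could scale a relevance score by a per-category weight; the weights and the rank_by_priority name below are invented for illustration.

```python
# Hypothetical weights: OLAMIP defines the categories, not numeric values.
PRIORITY_WEIGHTS = {"high": 1.0, "medium": 0.6, "low": 0.3}


def rank_by_priority(scored_entries):
    """Sort (relevance_score, entry) pairs by score times priority weight, highest first."""
    def combined(item):
        score, entry = item
        weight = PRIORITY_WEIGHTS.get(entry.get("priority", "medium"), PRIORITY_WEIGHTS["medium"])
        return score * weight

    return [entry for _, entry in sorted(scored_entries, key=combined, reverse=True)]
```

With these weights, a highly relevant "low" page can still be outranked by a moderately relevant "high" page, which is the kind of trade-off a pipeline would tune for itself.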
3. Metadata Object
```json
"metadata": {
  "last_updated": "2026-01-21",
  "language": "en",
  "source_url": "https://www.yourwebsite.com/",
  "copyright": "© year Copyright Holder"
}
```
Multilanguage Support
To fully support multilingual websites, you should define language at:
- File level (global default), inside metadata.
- Section level (optional override).
- Entry level (optional override).
This is essential for:
- multilingual blogs
- international news outlets
- research sites with papers in multiple languages
- e‑commerce stores with localized product pages
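Resolving the language for each Entry follows the same nearest-ancestor pattern as policy. A minimal sketch, assuming that a Subsection's language overrides its parent Section's in the same way (the text above names only the file, Section, and Entry levels); entry_languages is a hypothetical name.

```python
def entry_languages(doc):
    """Yield (entry_url, language) pairs using the nearest declared language."""
    # File-level default from metadata; None if the file declares no language.
    default = doc.get("metadata", {}).get("language")

    def walk(sections, inherited):
        for section in sections:
            # Assumption: a Subsection's language overrides its parent's.
            current = section.get("language", inherited)
            for entry in section.get("entries", []):
                # An Entry-level language is the most specific override.
                yield entry["url"], entry.get("language", current)
            yield from walk(section.get("subsections", []), current)

    yield from walk(doc.get("content", {}).get("sections", []), default)
```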
Language Format
Use BCP‑47 language codes, the global standard used by:
- schema.org
- the HTML lang attribute
- W3C
- search engines
- major LLM pipelines
Examples:
| Language | Code |
|---|---|
| English | en |
| Spanish | es |
| French | fr |
| German | de |
| Portuguese (Brazil) | pt-BR |
| Chinese (China) | zh-CN |
| Arabic | ar |
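Fully validating BCP‑47 tags requires the IANA language subtag registry, which the Python standard library does not ship. The sketch below only checks the common language[-Script][-REGION] shape used in the table above. It is deliberately stricter than the full standard and rejects variants, extensions, and private-use subtags. looks_like_language_tag is a hypothetical name.

```python
import re

# Simplified subset of BCP 47: 2-3 letter language, optional 4-letter script,
# optional 2-letter or 3-digit region. BCP 47 tags are case-insensitive.
_SIMPLE_TAG = re.compile(r"[A-Za-z]{2,3}(?:-[A-Za-z]{4})?(?:-(?:[A-Za-z]{2}|[0-9]{3}))?")


def looks_like_language_tag(tag):
    """Return True if tag matches the simplified language[-Script][-REGION] shape."""
    return isinstance(tag, str) and _SIMPLE_TAG.fullmatch(tag) is not None
```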
Why This Matters for AI Systems
AI systems can use language metadata to:
- choose the correct tokenizer
- apply the right summarization model
- avoid mixing languages in embeddings
- improve retrieval accuracy
- reduce hallucinations in multilingual contexts
- support cross‑language search and translation
Without explicit language fields, AI systems must infer the language from the text itself, and that inference is unreliable for short summaries and mixed-language pages.
General Validation Rules
| Rule | Requirement |
|---|---|
| Valid JSON | No trailing commas or malformed structures. |
| Required fields | Sections and entries must include required fields. |
| Canonical URLs | Must be absolute and stable. |
| Summary length | Under 500 characters. |
| Tags | Lowercase, single‑word strings. |
| Subsections | Must follow the Section schema. |
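Several of the rules above can be checked mechanically. This sketch applies the summary-length, tag, and URL rules to a single Section or Entry. Treating "single-word" as "non-empty with no whitespace" and "absolute" as "an http or https URL with a host" are interpretations, not spec text; check_general_rules is a hypothetical name.

```python
from urllib.parse import urlsplit

MAX_SUMMARY_CHARS = 500  # "Under 500 characters" from the table above


def check_general_rules(node):
    """Check summary length, tag format, and URL form for one Section or Entry (sketch)."""
    problems = []
    summary = node.get("summary")
    if isinstance(summary, str) and len(summary) >= MAX_SUMMARY_CHARS:
        problems.append(f"summary has {len(summary)} characters; keep it under {MAX_SUMMARY_CHARS}")
    for tag in node.get("tags") or []:
        # Interpretation: "single-word" means non-empty with no whitespace.
        if not isinstance(tag, str) or not tag or tag != tag.lower() or any(c.isspace() for c in tag):
            problems.append(f"tag {tag!r} should be a lowercase single word")
    url = node.get("url")
    parts = urlsplit(url) if isinstance(url, str) else None
    # Interpretation: "absolute" means an http(s) URL that includes a host.
    if parts is None or parts.scheme not in ("http", "https") or not parts.netloc:
        problems.append(f"url {url!r} must be absolute")
    return problems
```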
Versioning
| Guideline | Purpose |
|---|---|
| Parsers ignore unknown fields. | Ensures forward compatibility. |
| Publishers validate against latest schema. | Ensures correctness. |
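Forward compatibility in practice means a parser reads the fields it knows and skips the rest without raising an error. A minimal sketch for Entry objects; the field list comes from the Entry table above, while read_entry and the sample future_field are hypothetical.

```python
KNOWN_ENTRY_FIELDS = frozenset({
    "title", "summary", "url", "content_type", "policy", "tags",
    "priority", "published", "language", "metadata",
})


def read_entry(entry):
    """Return only the Entry fields this parser understands; unknown fields are skipped, not rejected."""
    return {key: value for key, value in entry.items() if key in KNOWN_ENTRY_FIELDS}


newer_entry = {
    "title": "Linen Jacket",
    "summary": "Lightweight linen jacket for summer.",
    "url": "https://store.com/products/linen-jacket/",
    "content_type": "product",
    "future_field": {"note": "defined by a later protocol version"},  # hypothetical
}
print(sorted(read_entry(newer_entry)))
```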
Semantic Alignment
OLAMIP complements existing structured data standards.
| Standard | Purpose |
|---|---|
| schema.org / JSON‑LD | Defines what a page is for search engines and knowledge graphs. |
| OLAMIP | Explains why the page matters and how LLMs should interpret it. |
Together
- schema.org → structural meaning
- OLAMIP → human‑curated interpretation
This dual‑layer approach is intended to improve AI comprehension and reduce hallucination.
Practical Integration
Recommended Workflow
| Step | Action |
|---|---|
| 1 | Keep schema.org markup in your HTML. |
| 2 | Add OLAMIP to describe meaning, importance, and structure. |
| 3 | Reference schema.org types or Wikidata IDs when helpful. |
| 4 | Keep summaries concise and factual. |
| 5 | Update OLAMIP regularly. |
Role Comparison
| Task | schema.org | OLAMIP |
|---|---|---|
| Describe structured entities | ✔️ | |
| Improve search engine visibility | ✔️ | |
| Provide human‑curated summaries | | ✔️ |
| Classify content for LLMs | | ✔️ |
| Prioritize important pages | | ✔️ |
| Provide multilingual context | | ✔️ |
Delta Updates (olamip-delta.json)
To support continuous learning and efficient AI ingestion, OLAMIP includes an optional companion file: olamip-delta.json. This file contains only the changes since the last update to your main olamip.json file. It may include:
- added: new pages or products
- updated: modified summaries, tags, or metadata
- removed: URLs no longer present on your site
Delta files allow AI systems to stay synchronized with your content without reprocessing the entire dataset. Important: The main olamip.json must always remain fully updated and reflect the current state of your website. Delta files are incremental updates, not replacements.
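The exact layout of olamip-delta.json is not defined here, so the sketch below assumes a shape: "added" and "updated" hold Entry objects, and "removed" holds URLs. Under that assumption, a consumer could apply a delta to a URL-keyed index like this, while still treating the full olamip.json as the authoritative source. apply_delta is a hypothetical name.

```python
def apply_delta(index, delta):
    """Return a new url -> Entry index with an (assumed-shape) delta applied."""
    result = dict(index)  # leave the caller's index untouched
    for entry in delta.get("added", []):
        result[entry["url"]] = entry
    for entry in delta.get("updated", []):
        # Assumption: updates may be partial, so changed fields are merged over the old entry.
        result[entry["url"]] = {**result.get(entry["url"], {}), **entry}
    for url in delta.get("removed", []):
        result.pop(url, None)
    return result
```

Because deltas are incremental, a consumer that misses one can recover by rebuilding its index from the full olamip.json.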