File Format Specification

The OLAMIP file (/olamip.json) is a structured JSON document that provides curated summaries of your website’s most important pages. It is designed to be easily parsed by large language models (LLMs), enabling them to understand, prioritize, and use your content with clarity, precision, and intent.

File Location

  • The file must be hosted at the root of your domain: https://yourdomain.com/olamip.json

Declaring the OLAMIP File Location

To maximize adoption and ensure that all systems can reliably locate your OLAMIP file, I recommend publishing both a <link> tag and a <meta> tag in your site’s <head> section.

1. <link rel="olamip">: Primary Discovery
  • Standardized practice: Crawlers and parsers already scan <link> tags for resources like canonical, sitemap, and alternate.
  • Machine-friendly: Declares a formal relationship between the page and the OLAMIP file.
  • Interoperability: Fits neatly into existing web standards, making it easier for AI systems to adopt without special handling.
2. <meta name="olamip-location">: Fallback Discovery
  • Human-readable: Simple for webmasters to add and understand.
  • Compatibility: Some parsers and tools prefer scanning <meta> tags for metadata.
  • Redundancy: Acts as a backup if a crawler doesn’t yet support rel="olamip".
Why Both Together Are Stronger
  • Future-proofing: As OLAMIP adoption grows, different systems may implement discovery differently. Including both ensures no system is left behind.
  • Resilience: If one method fails (e.g., a crawler ignores <link> tags), the other provides a fallback.
  • Ease of integration: Developers can choose whichever method fits their pipeline best, without forcing webmasters to guess.
  • Trust and clarity: Dual signals reduce ambiguity and make it explicit where the OLAMIP file lives.
Best Practice Implementation

Including both tags makes your OLAMIP file discoverable by the widest range of crawlers, validators, and AI systems, which helps ensure your content is read the way you intend.
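A crawler-side sketch of this dual discovery, using only Python's standard library. It assumes the file URL is carried in the `href` attribute of the `<link>` tag and in the `content` attribute of the `<meta>` tag; the text above names the tags but not these attributes, so both are assumptions. `SAMPLE_HEAD`, `OlamipLocator`, and `find_olamip_location` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional

# Hypothetical <head> markup; the href/content attributes are assumptions.
SAMPLE_HEAD = """
<head>
  <link rel="olamip" href="https://yourdomain.com/olamip.json">
  <meta name="olamip-location" content="https://yourdomain.com/olamip.json">
</head>
"""


class OlamipLocator(HTMLParser):
    """Collects OLAMIP locations declared via <link> and <meta> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.link_href: Optional[str] = None     # from <link rel="olamip">
        self.meta_content: Optional[str] = None  # from <meta name="olamip-location">

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_tokens = (attributes.get("rel") or "").lower().split()
        name = (attributes.get("name") or "").lower()
        if tag == "link" and "olamip" in rel_tokens:
            self.link_href = self.link_href or attributes.get("href")
        elif tag == "meta" and name == "olamip-location":
            self.meta_content = self.meta_content or attributes.get("content")


def find_olamip_location(html: str) -> Optional[str]:
    """Prefer the <link> declaration; fall back to the <meta> declaration."""
    parser = OlamipLocator()
    parser.feed(html)
    return parser.link_href or parser.meta_content
```

If neither tag is present the function returns `None`, in which case a client could still try the default root location described under File Location.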

File Structure

The OLAMIP file must be a valid UTF-8 encoded JSON document containing:

Field | Description
protocol | Must be "OLAMIP".
version | Protocol version (e.g., "1.0").
identity | Describes the website or organization.
content | Contains overview, sections, subsections, and entries.
metadata | File-level metadata such as language and last update date.

High‑level structure:
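A minimal sketch of the top-level skeleton, built only from the field table above and written as a Python dictionary that serializes to JSON. The nested objects are left empty as placeholders, and `missing_top_level_fields` is a hypothetical helper, not part of the protocol.

```python
import json

# The five top-level fields listed in the table above.
TOP_LEVEL_FIELDS = ("protocol", "version", "identity", "content", "metadata")

skeleton = {
    "protocol": "OLAMIP",
    "version": "1.0",
    "identity": {},  # placeholder: see the Identity Object
    "content": {},   # placeholder: overview, sections, subsections, entries
    "metadata": {},  # placeholder: e.g. language and last update date
}


def missing_top_level_fields(document: dict) -> list:
    """Return the top-level fields that the document does not define."""
    return [field for field in TOP_LEVEL_FIELDS if field not in document]


print(json.dumps(skeleton, indent=2))
```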

1. Identity Object

Field | Type | Required | Description
name | string | ✅ Yes | Name of the website or organization.
type | string | ✅ Yes | Entity type (e.g., "company", "blog", "ecommerce").
canonical_description | string | ✅ Yes | Human-readable description of the site.
tags | array | ❌ No | Optional keywords describing the domain or industry.
2. Content Object

The content object contains:

  • an overview
  • a list of sections
  • each section may contain subsections
  • each section or subsection may contain entries

This supports multi‑level hierarchies.

Overview Object
Field | Type | Required | Description
summary | string | ✅ Yes | A concise explanation of the website's purpose.
2.1 Section Object Specification

A Section represents a category, collection, or grouping of content. Sections may contain:

  • entries (content items)
  • subsections (nested Section objects)

This allows unlimited nesting depth.

Section Fields
Field | Type | Required | Description
title | string | ✅ Yes | Human-readable name of the section.
summary | string | ✅ Yes | Description of what the section contains.
url | string | ✅ Yes | Canonical URL of the section.
section_type | string | ✅ Yes | Semantic classification (see taxonomy).
policy | string | ❌ No | "allow" or "forbid". See explanation below for inheritance and default behavior.
tags | array | ❌ No | Optional keywords.
priority | string | ❌ No | "high", "medium", or "low".
published | string | ❌ No | ISO 8601 date.
entries | array | ❌ No | Array of Entry objects.
subsections | array | ❌ No | Array of nested Section objects.
language | string | ❌ No | BCP-47 language code.
Allowed section_type Values
section_type | Meaning
blog_category | Groups blog articles.
news_section | Groups news articles.
product_collection | Groups products or services.
doc_category | Groups documentation pages.
research_category | Groups research papers or datasets.
project_group | Groups portfolio projects.
content_section | Generic fallback.
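The Section schema and its taxonomy can be checked recursively, since subsections are themselves Section objects. The sketch below is one possible validator, not a reference implementation; `section_problems` and the sample data are hypothetical, and only the four fields marked as required are enforced.

```python
REQUIRED_SECTION_FIELDS = ("title", "summary", "url", "section_type")

SECTION_TYPES = {
    "blog_category", "news_section", "product_collection", "doc_category",
    "research_category", "project_group", "content_section",
}


def section_problems(section: dict, path: str = "section") -> list:
    """Collect problems for a Section and, recursively, its subsections."""
    problems = []
    for field in REQUIRED_SECTION_FIELDS:
        value = section.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"{path}: missing or empty '{field}'")
    section_type = section.get("section_type")
    if isinstance(section_type, str) and section_type and section_type not in SECTION_TYPES:
        problems.append(f"{path}: unknown section_type '{section_type}'")
    for index, sub in enumerate(section.get("subsections", [])):
        problems.extend(section_problems(sub, f"{path}.subsections[{index}]"))
    return problems


# Hypothetical sample: the nested section uses a type outside the taxonomy.
sample = {
    "title": "Blog",
    "summary": "All blog content.",
    "url": "https://yourdomain.com/blog/",
    "section_type": "blog_category",
    "subsections": [{
        "title": "Tutorials",
        "summary": "Step-by-step guides.",
        "url": "https://yourdomain.com/blog/tutorials/",
        "section_type": "tutorials",
    }],
}
```

Here `section_problems(sample)` reports a single problem for the nested section, whose `section_type` should fall back to `content_section` or another allowed value.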
Policy Behavior and Inheritance

The policy field controls whether AI systems are permitted to ingest the content represented by a Section, Subsection, or Entry. Valid values are "allow" and "forbid". This field is optional at all levels of the OLAMIP structure.

Default Behavior

If the policy field is omitted at a given level, the effective policy is determined through inheritance. If no ancestor defines a policy, the effective policy defaults to "allow".

Inheritance Rules

OLAMIP supports hierarchical inheritance of the policy field. AI systems must determine the effective policy for each Entry using the following lookup order:

  1. Entry-level policy: If the Entry defines a policy, that value is authoritative.
  2. Subsection-level policy: If the Entry omits policy, AI systems must check the nearest Subsection that contains it.
  3. Section-level policy: If neither the Entry nor its Subsection defines policy, AI systems must use the policy defined at the Section level.
  4. Default policy: If no ancestor defines a policy, the effective policy is "allow".
Intended Webmaster Usage
  • To make the entire website ingestible by AI systems, omit the policy field everywhere.
  • To control ingestion, use "allow" and "forbid" selectively at any level of the hierarchy.
  • A policy applied at a Section or Subsection automatically applies to all descendants unless overridden.
AI System Requirements

AI systems must:

  • Apply the default "allow" only when no explicit policy exists in the ancestor chain.
  • Respect the effective policy determined through inheritance.
  • Treat "forbid" as a strict prohibition on ingestion.
  • Treat "allow" as permission to ingest the content represented by that node.
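The lookup order above can be sketched as a walk from the Entry outward through its ancestors. This is an illustrative reading of the rules; `effective_policy`, `iter_entries`, `ingestible_urls`, and the sample data are hypothetical names and values.

```python
def effective_policy(entry: dict, ancestors) -> str:
    """ancestors: the Section chain from outermost to nearest."""
    if "policy" in entry:
        return entry["policy"]          # 1. Entry-level policy wins
    for node in reversed(ancestors):    # 2-3. nearest ancestor first
        if "policy" in node:
            return node["policy"]
    return "allow"                      # 4. default


def iter_entries(sections, ancestors=()):
    """Yield (entry, ancestor_chain) pairs for every entry in the tree."""
    for section in sections:
        chain = (*ancestors, section)
        for entry in section.get("entries", []):
            yield entry, chain
        yield from iter_entries(section.get("subsections", []), chain)


def ingestible_urls(sections) -> list:
    return [entry["url"] for entry, chain in iter_entries(sections)
            if effective_policy(entry, chain) == "allow"]


sections = [
    {"title": "Docs", "policy": "forbid",
     "entries": [{"title": "Internal notes", "url": "https://yourdomain.com/docs/internal"}],
     "subsections": [
         {"title": "API", "policy": "allow",
          "entries": [
              {"title": "Auth", "url": "https://yourdomain.com/docs/api/auth"},
              {"title": "Draft", "url": "https://yourdomain.com/docs/api/draft", "policy": "forbid"},
          ]},
     ]},
    {"title": "Blog",
     "entries": [{"title": "Hello", "url": "https://yourdomain.com/blog/hello"}]},
]
```

In this sample, "Internal notes" inherits "forbid" from Docs, "Auth" inherits "allow" from the API subsection, "Draft" is forbidden by its own field, and "Hello" falls through to the default "allow".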
Multi‑Level Hierarchy Diagram

This structure supports:

  • News → Politics → Opinion → Articles
  • Docs → API → Authentication → Pages
  • Store → Clothing → Men → Jackets → Products
  • Research → Physics → Quantum → Papers
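To visualize such a hierarchy, a small recursive walk can print a parsed `sections` array as an indented outline. The sketch below assumes the Section fields described above; `render_tree` and the sample titles are hypothetical.

```python
def render_tree(sections, depth: int = 0) -> list:
    """Return one indented line per section and per entry."""
    lines = []
    for section in sections:
        lines.append("  " * depth + section["title"] + "/")
        for entry in section.get("entries", []):
            lines.append("  " * (depth + 1) + "- " + entry["title"])
        lines.extend(render_tree(section.get("subsections", []), depth + 1))
    return lines


# Docs → API → Authentication → Pages, with one hypothetical page.
docs = [{
    "title": "Docs",
    "subsections": [{
        "title": "API",
        "subsections": [{
            "title": "Authentication",
            "entries": [{"title": "API keys"}],
        }],
    }],
}]

print("\n".join(render_tree(docs)))
```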
2.2 Entry Object Specification

An Entry is the most granular content unit. Examples include a blog article, news article, product page, documentation page, research paper, portfolio project, legal page, or downloadable resource.

Entry Fields
Field | Type | Required | Description
title | string | ✅ Yes | Human-readable title.
summary | string | ✅ Yes | Concise description of the content.
url | string | ✅ Yes | Canonical, absolute URL.
content_type | string | ✅ Yes | Semantic classification (see taxonomy).
policy | string | ❌ No | "allow" or "forbid". Same as in Sections/Subsections.
tags | array | ❌ No | Optional keywords.
priority | string | ❌ No | "high", "medium", or "low".
published | string | ❌ No | ISO 8601 publication date.
language | string | ❌ No | BCP-47 language code.
metadata* | string | ❌ No | Domain-specific or page-specific structured information.
Arrays and Multi‑Value Fields

Some OLAMIP fields are designed to hold more than one value. Whenever a field contains multiple elements, such as tags or any custom list defined within the optional metadata field of an Entry object, it must be expressed as a JSON array. Arrays are enclosed in square brackets ([ ]) and contain a comma-separated sequence of individual string values.

Using arrays ensures that AI systems can reliably interpret multi‑value data without ambiguity. Each element inside the array must be a standalone string, and the order of elements should remain consistent whenever it carries semantic meaning.
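As a quick illustration of the array rule, the sketch below parses a hypothetical entry and checks that `tags` is a JSON array of strings rather than, say, a single comma-separated string. `is_string_array` is a hypothetical helper.

```python
import json


def is_string_array(value) -> bool:
    """True only for a list whose elements are all strings."""
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


# Hypothetical entry fragment with a correctly formed tags array.
entry = json.loads('{"title": "Example", "tags": ["physics", "quantum"]}')

print(is_string_array(entry["tags"]))      # array of strings
print(is_string_array("physics, quantum")) # one string, not an array
```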

*The metadata field is used to store domain‑specific or page‑specific structured information that extends beyond the core OLAMIP fields. This field allows publishers to include additional machine‑interpretable details relevant to their industry or content type, providing AI systems with richer contextual signals without altering the core protocol.

Why URLs Are Required

The URL field is essential because it serves as the canonical identifier for the content. While summaries convey the meaning of the content, URLs link that meaning to a specific and verifiable location on the web. AI systems utilize URLs for deduplication, retrieval, validation, and cross-referencing with schema.org, sitemaps, and crawlers.

Allowed content_type Values
General Pages
content_type | Meaning
page | Standard content page.
landing_page | Marketing or campaign page.
legal_page | Terms, privacy, disclaimers.

Blog
content_type | Meaning
blog_article | A blog post.

News
content_type | Meaning
news_article | A news story.

E-commerce
content_type | Meaning
product | A product page.
service | A service offering.

Documentation
content_type | Meaning
doc_page | A documentation or help page.

Research
content_type | Meaning
research_paper | Academic or scientific paper.
dataset | Research dataset.

Portfolio
content_type | Meaning
project | Portfolio project or case study.

Media / Resources
content_type | Meaning
media_item | Video, audio, gallery, etc.
resource | Downloadable or reference material.
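A sketch of how a validator might check an Entry against the required fields and the taxonomy above. `entry_problems` is a hypothetical helper and the sample entries are invented for illustration.

```python
REQUIRED_ENTRY_FIELDS = ("title", "summary", "url", "content_type")

CONTENT_TYPES = {
    "page", "landing_page", "legal_page",
    "blog_article", "news_article",
    "product", "service",
    "doc_page",
    "research_paper", "dataset",
    "project",
    "media_item", "resource",
}


def entry_problems(entry: dict) -> list:
    """List missing required fields and an unknown content_type, if any."""
    problems = [f"missing '{field}'" for field in REQUIRED_ENTRY_FIELDS
                if not entry.get(field)]
    content_type = entry.get("content_type")
    if content_type and content_type not in CONTENT_TYPES:
        problems.append(f"unknown content_type '{content_type}'")
    return problems


good = {
    "title": "Privacy Policy",
    "summary": "How the site handles personal data.",
    "url": "https://yourdomain.com/privacy/",
    "content_type": "legal_page",
}
bad = {"title": "Podcast", "url": "https://yourdomain.com/podcast/", "content_type": "podcast"}
```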
Multi‑Level Examples

Below are examples for different website types showing how subsections and entries work together.

Blog Website Example (3 Levels)
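A hypothetical three-level blog structure (Section → Subsection → Entry), written as a Python dictionary and serialized to JSON. All titles, URLs, and summaries are invented placeholders; only the field names and taxonomy values come from the tables above.

```python
import json

blog_section = {
    "title": "Blog",
    "summary": "Articles published by the site.",
    "url": "https://yourdomain.com/blog/",
    "section_type": "blog_category",
    "entries": [],
    "subsections": [
        {
            "title": "Tutorials",
            "summary": "Step-by-step guides.",
            "url": "https://yourdomain.com/blog/tutorials/",
            "section_type": "blog_category",
            "entries": [
                {
                    "title": "Getting Started",
                    "summary": "A first walkthrough for new readers.",
                    "url": "https://yourdomain.com/blog/tutorials/getting-started/",
                    "content_type": "blog_article",
                    "priority": "medium",
                    "tags": ["tutorial", "basics"],
                }
            ],
        }
    ],
}

# Serialize to the JSON form that would appear inside content.sections.
blog_json = json.dumps(blog_section, indent=2)
print(blog_json)
```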
E‑commerce Website Example (Collections → Subcollections → Products)
Documentation Website Example (Category → Subcategory → Pages)
Research Website Example (Field → Subfield → Papers)
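A hypothetical research structure (Field → Subfield → Papers), again as a Python dictionary that serializes to JSON. The titles, URLs, date, and summaries are invented placeholders; `count_entries` is a hypothetical helper that shows how nested entries can be tallied.

```python
import json

research_section = {
    "title": "Physics",
    "summary": "Research output from the physics group.",
    "url": "https://yourdomain.com/research/physics/",
    "section_type": "research_category",
    "entries": [],
    "subsections": [
        {
            "title": "Quantum",
            "summary": "Papers on quantum systems.",
            "url": "https://yourdomain.com/research/physics/quantum/",
            "section_type": "research_category",
            "entries": [
                {
                    "title": "Example Paper Title",
                    "summary": "One-paragraph abstract of the paper.",
                    "url": "https://yourdomain.com/research/physics/quantum/example-paper/",
                    "content_type": "research_paper",
                    "published": "2024-01-15",
                    "language": "en",
                    "priority": "high",
                }
            ],
        }
    ],
}


def count_entries(section: dict) -> int:
    """Count entries in a section and all of its nested subsections."""
    return len(section.get("entries", [])) + sum(
        count_entries(sub) for sub in section.get("subsections", []))


print(json.dumps(research_section, indent=2))
```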
Priority Field Guidelines
Value | Meaning
high | Flagship, mission-critical content. Use sparingly.
medium | Default for most content.
low | Niche, outdated, or low-value content.
Best Practices
Recommendation | Reason
Limit high to 5–10% | Preserves meaning of the signal.
Default to medium | Ensures consistency.
Use low for niche/legacy content | Reduces noise.
Review priorities regularly | Keeps the file accurate.
Why It Matters

LLMs may use "priority" to:

  • Allocate more attention during training
  • Rank pages for retrieval tasks
  • Filter out less relevant content

If every page is marked "high", the signal becomes meaningless, and your most valuable content gets lost in the noise.
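A publisher-side check for the 5–10% guideline could look like the sketch below. It treats an entry without a `priority` field as "medium", which is an assumption based on the "Default to medium" recommendation rather than a stated protocol default; `priority_shares` and `high_is_overused` are hypothetical helpers.

```python
from collections import Counter


def priority_shares(entries) -> dict:
    """Fraction of entries at each priority level (empty dict if no entries)."""
    total = len(entries)
    if total == 0:
        return {}
    # Assumption: an omitted priority is counted as "medium".
    counts = Counter(entry.get("priority", "medium") for entry in entries)
    return {level: counts[level] / total for level in ("high", "medium", "low")}


def high_is_overused(entries, limit: float = 0.10) -> bool:
    """True when more than `limit` of the entries are marked "high"."""
    return priority_shares(entries).get("high", 0.0) > limit


# 20 hypothetical entries: 1 high, 2 low, 17 without an explicit priority.
entries = [{"priority": "high"}] + [{"priority": "low"}] * 2 + [{}] * 17
```

With this sample, 1 of 20 entries (5%) is "high", so the check passes; marking five of them "high" (25%) would trip it.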

Why Categorical Priority Works Best
Benefit | Explanation
Clarity & Consistency | "High/Medium/Low" is universally interpretable.
Simpler for Publishers | No numeric scoring required.
Easier to Validate | Tools can detect misuse.
Flexible for LLM Pipelines | Models can internally map categories to weights.

3. Metadata Object

Multilanguage Support

To fully support multilingual websites, you should define language at:

  • File level (global default), inside metadata.
  • Section level (optional override).
  • Entry level (optional override).

This is essential for:

  • multilingual blogs
  • international news outlets
  • research sites with papers in multiple languages
  • e‑commerce stores with localized product pages
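The three levels suggest a nearest-wins lookup, sketched below. Two assumptions are made: the file-level field inside `metadata` is named `language` (matching the Section and Entry field name), and a nested subsection's language overrides that of its parent section. `effective_language` is a hypothetical helper.

```python
from typing import Optional


def effective_language(entry: dict, ancestors, file_metadata: dict) -> Optional[str]:
    """ancestors: the Section chain from outermost to nearest."""
    if entry.get("language"):
        return entry["language"]            # Entry-level override
    for section in reversed(ancestors):     # nearest Section first
        if section.get("language"):
            return section["language"]
    return file_metadata.get("language")    # file-level default


metadata = {"language": "en"}
store_br = {"title": "Loja", "language": "pt-BR"}
```

For example, an entry with no `language` inside `store_br` resolves to "pt-BR", while an entry in a section without a language resolves to the file default "en".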

Language Format

Use BCP‑47 language codes, the global standard used by:

  • schema.org
  • HTML lang attribute
  • W3C
  • search engines
  • major LLM pipelines

Examples:

Language | Code
English | en
Spanish | es
French | fr
German | de
Portuguese (Brazil) | pt-BR
Chinese (Simplified) | zh-CN
Arabic | ar
Why This Matters for AI Systems

AI pipelines may use language metadata to:

  • choose the correct tokenizer
  • apply the right summarization model
  • avoid mixing languages in embeddings
  • improve retrieval accuracy
  • reduce hallucinations in multilingual contexts
  • support cross‑language search and translation

Without explicit language fields, AI systems must guess, and they often guess wrong.

General Validation Rules

Rule | Requirement
Valid JSON | No trailing commas or malformed structures.
Required fields | Sections and entries must include required fields.
Canonical URLs | Must be absolute and stable.
Summary length | Under 500 characters.
Tags | Lowercase, single-word strings.
Subsections | Must follow the Section schema.
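Several of these rules can be checked mechanically. The sketch below covers JSON validity, absolute URLs, summary length, and tag format using Python's standard library; URL stability cannot be verified from the file alone. All function names are hypothetical.

```python
import json
from urllib.parse import urlparse


def is_valid_json(text: str) -> bool:
    """Python's json module rejects trailing commas and malformed input."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


def is_absolute_url(url: str) -> bool:
    parts = urlparse(url)
    return bool(parts.scheme) and bool(parts.netloc)


def is_valid_tag(tag) -> bool:
    """Lowercase, non-empty, and free of whitespace (a single word)."""
    return (isinstance(tag, str) and bool(tag) and tag == tag.lower()
            and not any(ch.isspace() for ch in tag))


def rule_violations(node: dict) -> list:
    """Check one Section or Entry against the URL, summary, and tag rules."""
    violations = []
    if not is_absolute_url(node.get("url", "")):
        violations.append("url must be absolute")
    if len(node.get("summary", "")) >= 500:
        violations.append("summary must be under 500 characters")
    bad_tags = [tag for tag in node.get("tags", []) if not is_valid_tag(tag)]
    if bad_tags:
        violations.append(f"invalid tags: {bad_tags}")
    return violations
```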

Versioning

Guideline | Purpose
Parsers ignore unknown fields. | Ensures forward compatibility.
Publishers validate against latest schema. | Ensures correctness.
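One way a parser might honor the first guideline is to keep only the Entry fields it knows and set the rest aside instead of failing. In this sketch, `split_known_fields` is a hypothetical helper and `reading_time` is an invented field standing in for something a future version might add.

```python
KNOWN_ENTRY_FIELDS = {
    "title", "summary", "url", "content_type", "policy",
    "tags", "priority", "published", "language", "metadata",
}


def split_known_fields(entry: dict):
    """Return (known_fields, ignored_field_names) without raising on extras."""
    known = {key: value for key, value in entry.items() if key in KNOWN_ENTRY_FIELDS}
    ignored = sorted(set(entry) - KNOWN_ENTRY_FIELDS)
    return known, ignored


entry = {
    "title": "Example",
    "url": "https://yourdomain.com/example/",
    "reading_time": "5 min",  # invented field, unknown to this parser
}
known, ignored = split_known_fields(entry)
```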

Semantic Alignment

OLAMIP complements existing structured data standards.

Standard | Purpose
schema.org / JSON-LD | Defines what a page is for search engines and knowledge graphs.
OLAMIP | Explains why the page matters and how LLMs should interpret it.
Together
  • schema.org → structural meaning
  • OLAMIP → human‑curated interpretation

This dual‑layer approach improves AI comprehension and reduces hallucination.

Practical Integration

Recommended Workflow
Step | Action
1 | Keep schema.org markup in your HTML.
2 | Add OLAMIP to describe meaning, importance, and structure.
3 | Reference schema.org types or Wikidata IDs when helpful.
4 | Keep summaries concise and factual.
5 | Update OLAMIP regularly.
Role Comparison
Task | schema.org | OLAMIP
Describe structured entities | ✔️ |
Improve search engine visibility | ✔️ |
Provide human-curated summaries |  | ✔️
Classify content for LLMs |  | ✔️
Prioritize important pages |  | ✔️
Provide multilingual context |  | ✔️

Delta Updates (olamip-delta.json)

To support continuous learning and efficient AI ingestion, OLAMIP includes an optional companion file: olamip-delta.json. This file contains only the changes since the last update to your main olamip.json file. It may include:

  • added: new pages or products
  • updated: modified summaries, tags, or metadata
  • removed: URLs no longer present on your site

Delta files allow AI systems to stay synchronized with your content without reprocessing the entire dataset. Important: The main olamip.json must always remain fully updated and reflect the current state of your website. Delta files are incremental updates, not replacements.
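The exact layout of olamip-delta.json is not given above, so the following sketch rests on assumptions: `added` and `updated` are arrays of Entry-like objects keyed by `url`, and `removed` is an array of URL strings. `apply_delta` is a hypothetical consumer-side helper that updates a URL-indexed cache.

```python
def apply_delta(index: dict, delta: dict) -> dict:
    """Return a new URL-indexed cache with the delta applied."""
    result = dict(index)
    for entry in delta.get("added", []):
        result[entry["url"]] = entry
    for entry in delta.get("updated", []):
        # Merge changed fields over the cached record, if one exists.
        result[entry["url"]] = {**result.get(entry["url"], {}), **entry}
    for url in delta.get("removed", []):
        result.pop(url, None)
    return result


cache = {
    "https://yourdomain.com/a": {"url": "https://yourdomain.com/a", "summary": "Old summary."},
    "https://yourdomain.com/b": {"url": "https://yourdomain.com/b", "summary": "Page B."},
}
delta = {
    "added": [{"url": "https://yourdomain.com/c", "summary": "Page C."}],
    "updated": [{"url": "https://yourdomain.com/a", "summary": "New summary."}],
    "removed": ["https://yourdomain.com/b"],
}
synced = apply_delta(cache, delta)
```

Because the main olamip.json always reflects the current state, a consumer that suspects its cache has drifted can discard it and rebuild from the main file.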