Ingestion settings
Ingestion settings control how documents are parsed, chunked, and enriched during ingestion. They’re used by the ingestion service when processing documents from document sources.
At runtime, the ingestion pipeline typically uses:
- source.ingestionSettings (if a document source has explicit settings)
- otherwise a default ingestion settings record (if one exists)
- otherwise a built-in fallback (for some file types like spreadsheets)
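The fallback order above can be sketched as a small lookup function. This is only an illustration of the documented precedence, not the service's actual code: `resolve_settings`, `BUILTIN_FALLBACKS`, and the `.xlsx` entry are hypothetical names chosen for the example.

```python
from typing import Optional

# Hypothetical built-in fallbacks keyed by file extension. The real set of
# file types and the contents of each fallback depend on the deployment.
BUILTIN_FALLBACKS = {".xlsx": {"name": "builtin-spreadsheet"}}


def resolve_settings(
    source: dict,
    default_settings: Optional[dict],
    extension: str,
) -> Optional[dict]:
    """Pick ingestion settings using the documented precedence."""
    # 1. Explicit settings attached to the document source win.
    explicit = source.get("ingestionSettings")
    if explicit is not None:
        return explicit
    # 2. Otherwise use the default settings record, if one exists.
    if default_settings is not None:
        return default_settings
    # 3. Otherwise use a built-in fallback, available only for some file types.
    return BUILTIN_FALLBACKS.get(extension.lower())
```

A `None` result here stands for "no settings could be resolved"; how the real pipeline handles that case is not covered on this page.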
API endpoints
All endpoints require Authorization: Bearer <ACCESS_TOKEN>.
- GET /api/ingestion/settings: list settings
- POST /api/ingestion/settings: create settings
- HEAD /api/ingestion/settings/{id}: check existence (200 if exists, 204 if not)
- GET /api/ingestion/settings/{id}: fetch by id
- PUT /api/ingestion/settings/{id}: update
- DELETE /api/ingestion/settings/{id}: delete (204)
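As a rough sketch of calling these endpoints from Python, the helper below builds an authenticated request with the standard library and interprets the HEAD status codes listed above. `BASE_URL` is a placeholder host, and `build_request` and `exists_from_head` are hypothetical helper names; nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: placeholder host. Replace with your deployment's base URL.
BASE_URL = "https://ingestion.example.invalid"


def build_request(
    method: str,
    token: str,
    settings_id: Optional[str] = None,
    body: Optional[bytes] = None,
) -> urllib.request.Request:
    """Build a request for the collection or for a single settings record."""
    path = "/api/ingestion/settings"
    if settings_id is not None:
        # Percent-encode the id so it is safe to use as one path segment.
        path += "/" + urllib.parse.quote(settings_id, safe="")
    headers = {"Authorization": f"Bearer {token}"}
    if body is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=body, headers=headers, method=method
    )


def exists_from_head(status: int) -> bool:
    """Interpret the HEAD status: 200 means exists, 204 means it does not."""
    if status == 200:
        return True
    if status == 204:
        return False
    raise ValueError(f"unexpected status for existence check: {status}")
```

For example, `build_request("DELETE", token, "some-id")` produces the delete call, and `exists_from_head(response.status)` turns a HEAD response into a boolean.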
What’s inside an ingestion settings object
Ingestion settings are a single object that includes:
- LLM provider + embedding model (used for embedding / enrichment)
- parser (how raw content is extracted)
- splitter (how text is chunked)
- extractors (optional enrichment stages)
The concrete type of each sub-object is selected by its discriminator field, _t.
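To make the discriminator concrete, the sketch below reads `_t` from each sub-object of a settings dictionary. `collect_types` is a hypothetical helper written for illustration. It assumes the provider, parser, and splitter are single objects and the extractors are a list, as in the create example on this page.

```python
def collect_types(settings: dict) -> dict:
    """Return the _t discriminator found in each sub-object."""
    return {
        # Single sub-objects: read _t directly (None if the object is absent).
        "provider": settings.get("provider", {}).get("_t"),
        "parser": settings.get("parser", {}).get("_t"),
        "splitter": settings.get("splitter", {}).get("_t"),
        # Extractors are a list, so each entry carries its own _t.
        "extractors": [e.get("_t") for e in settings.get("extractors", [])],
    }
```

A quick look at this summary before submitting a settings object makes it easier to spot a sub-object that is missing its `_t`.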
Parser providers (parser._t)
Common parser providers include:
- azure-document-intelligence
- llama-parse
- default (if enabled in your deployment)
Splitter types (splitter._t)
Common splitter types include:
- sentence (chunkSize / chunkOverlap)
- token (chunkSize / chunkOverlap / separator)
- semantic (bufferSize / breakpointPercentileThreshold)
- markdown, markdown-table, html
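A client-side check can catch a parameter paired with the wrong splitter type before the request is sent. The sketch below encodes only the parameters named in the list above. It assumes the markdown, markdown-table, and html splitters take no parameters because none are listed here, which may not match your deployment. `SPLITTER_PARAMS` and `check_splitter` are hypothetical names.

```python
# Parameters named on this page for each splitter type. Empty sets mean
# "none listed here", not "none accepted by the service".
SPLITTER_PARAMS = {
    "sentence": {"chunkSize", "chunkOverlap"},
    "token": {"chunkSize", "chunkOverlap", "separator"},
    "semantic": {"bufferSize", "breakpointPercentileThreshold"},
    "markdown": set(),
    "markdown-table": set(),
    "html": set(),
}


def check_splitter(splitter: dict) -> list:
    """Return a list of human-readable problems (empty if none found)."""
    kind = splitter.get("_t")
    if kind not in SPLITTER_PARAMS:
        return [f"unknown splitter type: {kind!r}"]
    # Any key other than _t that is not listed for this type is flagged.
    extra = sorted(set(splitter) - {"_t"} - SPLITTER_PARAMS[kind])
    return [f"{name} is not listed for splitter {kind!r}" for name in extra]
```

Treat the output as a warning rather than a hard failure, since the service may accept fields that this page does not mention.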
Extractor types (extractors[]._t)
Optional extractors can enrich the ingestion output:
- keyword
- metadata
- summary
- questions-answered
- title
Create example
This example mirrors the ingestion-service integration tests.
{
  "name": "pytest-test-settings",
  "default": false,
  "provider": {
    "_t": "azure-openai",
    "endpoint": "https://YOUR-RESOURCE.openai.azure.com",
    "apiKey": "YOUR_AZURE_OPENAI_API_KEY",
    "apiVersion": "2024-02-01",
    "deployment": "gpt-4",
    "embeddingDeployment": "text-embedding-3-large"
  },
  "embeddingModel": "text-embedding-3-large",
  "embeddingDimensions": 3072,
  "parser": {
    "_t": "azure-document-intelligence",
    "endpoint": "https://YOUR-DI.cognitiveservices.azure.com",
    "apiKey": "YOUR_AZURE_DOCUMENT_INTELLIGENCE_KEY",
    "model": "prebuilt-layout",
    "locale": "en",
    "format": "markdown"
  },
  "splitter": {
    "_t": "sentence",
    "chunkSize": 2048,
    "chunkOverlap": 128
  },
  "extractors": [
    {
      "_t": "keyword",
      "keywords": 10
    }
  ],
  "excludeMetadata": [],
  "extractMetadata": []
}
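When building this payload in code, the placeholder endpoints and keys are usually better read from the environment than hard-coded. The sketch below assembles the same object in Python. The environment variable names and the `build_create_payload` helper are assumptions made for this example, not names defined by the ingestion service.

```python
import json
import os
from typing import Mapping


def build_create_payload(env: Mapping[str, str] = os.environ) -> dict:
    """Assemble the create payload, reading secrets from the environment."""
    return {
        "name": "pytest-test-settings",
        "default": False,
        "provider": {
            "_t": "azure-openai",
            # Hypothetical variable names; use whatever your setup defines.
            "endpoint": env["AZURE_OPENAI_ENDPOINT"],
            "apiKey": env["AZURE_OPENAI_API_KEY"],
            "apiVersion": "2024-02-01",
            "deployment": "gpt-4",
            "embeddingDeployment": "text-embedding-3-large",
        },
        "embeddingModel": "text-embedding-3-large",
        "embeddingDimensions": 3072,
        "parser": {
            "_t": "azure-document-intelligence",
            "endpoint": env["AZURE_DI_ENDPOINT"],
            "apiKey": env["AZURE_DI_API_KEY"],
            "model": "prebuilt-layout",
            "locale": "en",
            "format": "markdown",
        },
        "splitter": {"_t": "sentence", "chunkSize": 2048, "chunkOverlap": 128},
        "extractors": [{"_t": "keyword", "keywords": 10}],
        "excludeMetadata": [],
        "extractMetadata": [],
    }


def encode_payload(payload: dict) -> bytes:
    """Serialize the payload as UTF-8 JSON, ready to use as a request body."""
    return json.dumps(payload).encode("utf-8")
```

The bytes returned by `encode_payload` can be sent as the body of POST /api/ingestion/settings with the Authorization header described earlier. A missing environment variable raises `KeyError`, so a misconfigured run fails before any request is made.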