# Ingestion & Processing
Uploading a document only stores it. Ingestion is the step that extracts the text, splits it into chunks, and indexes the content so assistants and chats can retrieve it.
In the API reference, these endpoints are under Ingestion.
## What gets ingested?
Ingestion operates on a document id (returned by document upload). Ingestion typically produces:
- extracted text
- chunked “nodes” (the searchable units)
- indexing metadata and status
## Process a single document
Endpoint: `POST /api/ingestion/documents/{document_id}`
By default, this queues ingestion work for the given document id.
It supports a `sync` query parameter:
- `sync=false` (default): enqueue work and return immediately
- `sync=true`: perform ingestion synchronously before returning
Async (recommended):
```bash
curl -X POST "https://app.noema.ai/api/ingestion/documents/$DOCUMENT_ID?sync=false" \
  -H "Authorization: Bearer $ACCESS_TOKEN"
```
Synchronous (waits for completion):
```bash
curl -X POST "https://app.noema.ai/api/ingestion/documents/$DOCUMENT_ID?sync=true" \
  -H "Authorization: Bearer $ACCESS_TOKEN"
```
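In scripts, the two calls above can be wrapped in a small helper so the `sync` flag is chosen in one place. This is a minimal sketch rather than part of the API: `BASE_URL`, `ingestion_url`, and `start_ingestion` are hypothetical names introduced here, and it assumes `ACCESS_TOKEN` is set in the environment as in the examples above.

```bash
# Hypothetical helper names; only the endpoint path and the sync parameter come from this page.
BASE_URL="${BASE_URL:-https://app.noema.ai}"

# Print the process-document URL for a document id.
# The second argument is the sync flag and defaults to "false" (async).
ingestion_url() {
  local document_id="$1"
  local sync="${2:-false}"
  printf '%s/api/ingestion/documents/%s?sync=%s\n' "$BASE_URL" "$document_id" "$sync"
}

# POST to the endpoint. --fail makes curl exit non-zero on HTTP error responses,
# so callers can check the exit status.
start_ingestion() {
  curl --fail --silent --show-error -X POST "$(ingestion_url "$1" "${2:-false}")" \
    -H "Authorization: Bearer $ACCESS_TOKEN"
}
```

For example, `start_ingestion "$DOCUMENT_ID"` enqueues the work and returns, while `start_ingestion "$DOCUMENT_ID" true` waits for ingestion to finish.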
## Create & inspect ingestion records
These endpoints are useful for status inspection and debugging.
- Create record: `POST /api/ingestion/documents`
- Get record: `GET /api/ingestion/documents/{document_id}`
- Delete record: `DELETE /api/ingestion/documents/{document_id}`
Example (create):
```bash
curl -X POST "https://app.noema.ai/api/ingestion/documents" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"document_id\":\"$DOCUMENT_ID\"}"
```
Example (get):
```bash
curl "https://app.noema.ai/api/ingestion/documents/$DOCUMENT_ID" \
  -H "Authorization: Bearer $ACCESS_TOKEN"
```
## Inspect / delete nodes (chunks)
Endpoint: `GET /api/ingestion/documents/{document_id}/nodes`
```bash
curl "https://app.noema.ai/api/ingestion/documents/$DOCUMENT_ID/nodes" \
  -H "Authorization: Bearer $ACCESS_TOKEN"
```
Endpoint: `DELETE /api/ingestion/documents/{document_id}/nodes`
```bash
curl -X DELETE "https://app.noema.ai/api/ingestion/documents/$DOCUMENT_ID/nodes" \
  -H "Authorization: Bearer $ACCESS_TOKEN"
```
## Scan and process an entire source
For connector-based sources (SharePoint, Website, SQLDatabase, etc.), you can scan and process at the source level.
Scan a source (discover items):
```bash
curl -X POST "https://app.noema.ai/api/ingestion/sources/my-source/scan" \
  -H "Authorization: Bearer $ACCESS_TOKEN"
```
Process a source (ingest all documents in the source):
```bash
curl -X POST "https://app.noema.ai/api/ingestion/sources/my-source" \
  -H "Authorization: Bearer $ACCESS_TOKEN"
```
Test a source connection:
```bash
curl -X POST "https://app.noema.ai/api/ingestion/sources/my-source/test-connection" \
  -H "Authorization: Bearer $ACCESS_TOKEN"
```
## Ingestion settings
Ingestion behavior can be controlled via settings objects.
- List: `GET /api/ingestion/settings`
- Create: `POST /api/ingestion/settings`
- Exists: `HEAD /api/ingestion/settings/{id}`
- Get/update/delete: `GET|PUT|DELETE /api/ingestion/settings/{id}`
```bash
curl "https://app.noema.ai/api/ingestion/settings" \
  -H "Authorization: Bearer $ACCESS_TOKEN"
```
## Deprecated endpoints
Some older ingestion routes remain for backwards compatibility under `/api/ingestion/source/...`.
Prefer `/api/ingestion/sources/...` and `/api/ingestion/documents/...` for new integrations.