# Ingestion & Processing

Uploading a document only stores it. Ingestion is the step that extracts text, splits it into chunks, and indexes the result so assistants and chats can retrieve it.

In the API reference, these endpoints are under Ingestion.

## What gets ingested?

Ingestion operates on a document id (returned by document upload). Ingestion typically produces:

- extracted text
- chunked “nodes” (the searchable units)
- indexing metadata and status

## Process a single document

Endpoint: `POST /api/ingestion/documents/{document_id}`

This starts ingestion for the given document id.

It supports a `sync` query parameter:

- `sync=false` (default): enqueue the work and return immediately
- `sync=true`: run ingestion to completion before returning

Async (recommended):

```bash
curl -X POST "https://app.noema.ai/api/ingestion/documents/$DOCUMENT_ID?sync=false" \
  -H "Authorization: Bearer $ACCESS_TOKEN"
```

Synchronous (waits for completion):

```bash
curl -X POST "https://app.noema.ai/api/ingestion/documents/$DOCUMENT_ID?sync=true" \
  -H "Authorization: Bearer $ACCESS_TOKEN"
```
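The choice between the two modes can be wrapped in a small shell helper. This is a minimal sketch, not part of any Noema tooling: `ingest_url` and `ingest_document` are hypothetical names, and `NOEMA_BASE_URL` is an assumed variable that defaults to the host used in the examples above. Only the endpoint path and the `sync` parameter come from this page.

```bash
# Hypothetical helpers; NOEMA_BASE_URL is an assumed override for the host.
NOEMA_BASE_URL="${NOEMA_BASE_URL:-https://app.noema.ai}"

# Build the ingestion URL for a document id.
# Second argument "true" selects synchronous mode; anything else enqueues.
ingest_url() {
  if [ "${2:-false}" = "true" ]; then
    printf '%s/api/ingestion/documents/%s?sync=true\n' "$NOEMA_BASE_URL" "$1"
  else
    printf '%s/api/ingestion/documents/%s?sync=false\n' "$NOEMA_BASE_URL" "$1"
  fi
}

# Send the request. --fail makes curl exit non-zero on HTTP 4xx/5xx responses.
ingest_document() {
  curl --fail --silent --show-error -X POST "$(ingest_url "$1" "${2:-false}")" \
    -H "Authorization: Bearer ${ACCESS_TOKEN:-}"
}
```

With these defined, `ingest_document "$DOCUMENT_ID"` enqueues the work and `ingest_document "$DOCUMENT_ID" true` waits for it to finish.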

## Create & inspect ingestion records

These endpoints are useful for status inspection and debugging.

- Create record: `POST /api/ingestion/documents`
- Get record: `GET /api/ingestion/documents/{document_id}`
- Delete record: `DELETE /api/ingestion/documents/{document_id}`

Example (create):

```bash
curl -X POST "https://app.noema.ai/api/ingestion/documents" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"document_id\": \"$DOCUMENT_ID\"}"
```

Example (get):

```bash
curl "https://app.noema.ai/api/ingestion/documents/$DOCUMENT_ID" \
  -H "Authorization: Bearer $ACCESS_TOKEN"
```
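After an asynchronous request, the get-record endpoint is a natural place to poll for completion. The sketch below shows one way to do that. It rests on assumptions this page does not confirm: that the record JSON carries a `"status"` string, and that `completed` and `failed` are its terminal values. The function names are hypothetical; check the API reference for the real field and values before relying on it.

```bash
# ASSUMPTION: the ingestion record contains a "status" string whose terminal
# values are "completed" and "failed". Neither is documented on this page.

# Fetch the ingestion record for a document id as JSON.
fetch_record() {
  curl --fail --silent --show-error \
    "https://app.noema.ai/api/ingestion/documents/$1" \
    -H "Authorization: Bearer ${ACCESS_TOKEN:-}"
}

# Read JSON on stdin and print a "status" string value.
# Crude text matching: takes the first line that has a "status" key
# (the last such key on that line). A JSON parser such as jq is more robust.
extract_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Poll until a terminal status is seen or the attempts run out.
# Arguments: document id, max attempts (default 30), delay in seconds (default 2).
# Returns 0 on "completed", 1 on "failed", 2 on timeout.
wait_for_ingestion() {
  local doc_id="$1" attempts="${2:-30}" delay="${3:-2}" status
  while [ "$attempts" -gt 0 ]; do
    status="$(fetch_record "$doc_id" | extract_status)"
    case "$status" in
      completed) return 0 ;;
      failed) return 1 ;;
    esac
    attempts=$((attempts - 1))
    sleep "$delay"
  done
  return 2
}
```

A call such as `wait_for_ingestion "$DOCUMENT_ID" 60 5` would check every five seconds for up to five minutes.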

## Inspect / delete nodes (chunks)

Endpoint: `GET /api/ingestion/documents/{document_id}/nodes`

```bash
curl "https://app.noema.ai/api/ingestion/documents/$DOCUMENT_ID/nodes" \
  -H "Authorization: Bearer $ACCESS_TOKEN"
```

Endpoint: `DELETE /api/ingestion/documents/{document_id}/nodes`

```bash
curl -X DELETE "https://app.noema.ai/api/ingestion/documents/$DOCUMENT_ID/nodes" \
  -H "Authorization: Bearer $ACCESS_TOKEN"
```
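One plausible use of the delete endpoint is clearing a document's chunks before ingesting it again. This page does not say whether re-ingestion requires that step, so treat the sequence below as an assumption rather than a documented workflow. `noema_request` and `reingest_document` are hypothetical helper names.

```bash
# Hypothetical wrapper: HTTP method and API path in, authenticated curl call out.
noema_request() {
  curl --fail --silent --show-error -X "$1" "https://app.noema.ai$2" \
    -H "Authorization: Bearer ${ACCESS_TOKEN:-}"
}

# ASSUMED workflow: delete the document's nodes, then ingest synchronously.
# Stops at the first failing request, so ingestion never runs after a failed delete.
reingest_document() {
  noema_request DELETE "/api/ingestion/documents/$1/nodes" || return 1
  noema_request POST "/api/ingestion/documents/$1?sync=true" || return 1
}
```

Calling `reingest_document "$DOCUMENT_ID"` issues the two requests in order and returns non-zero if either one fails.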

## Scan and process an entire source

For connector-based sources (SharePoint, Website, SQLDatabase, etc.), you can scan and process at the source level.

Scan a source (discover items):

```bash
curl -X POST "https://app.noema.ai/api/ingestion/sources/my-source/scan" \
  -H "Authorization: Bearer $ACCESS_TOKEN"
```

Process a source (ingest all documents in the source):

```bash
curl -X POST "https://app.noema.ai/api/ingestion/sources/my-source" \
  -H "Authorization: Bearer $ACCESS_TOKEN"
```

Test a source connection:

```bash
curl -X POST "https://app.noema.ai/api/ingestion/sources/my-source/test-connection" \
  -H "Authorization: Bearer $ACCESS_TOKEN"
```
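The three source-level calls can be chained so that a broken connection is caught before any scanning or processing starts. The sketch below assumes that ordering is sensible for your connector. `noema_request` and `sync_source` are hypothetical names. Note that a successful response from the scan request may only mean the scan was accepted, not that discovery has finished; this page does not specify which.

```bash
# Hypothetical wrapper: HTTP method and API path in, authenticated curl call out.
noema_request() {
  curl --fail --silent --show-error -X "$1" "https://app.noema.ai$2" \
    -H "Authorization: Bearer ${ACCESS_TOKEN:-}"
}

# Run test-connection, scan, and process in order for one source id.
# Returns 1, 2, or 3 to indicate which step failed; 0 if all requests succeeded.
sync_source() {
  local src="/api/ingestion/sources/$1"
  noema_request POST "$src/test-connection" || { echo "connection test failed" >&2; return 1; }
  noema_request POST "$src/scan" || { echo "scan failed" >&2; return 2; }
  noema_request POST "$src" || { echo "process failed" >&2; return 3; }
}
```

For the source used in the examples above, the call would be `sync_source my-source`.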

## Ingestion settings

Ingestion behavior can be controlled via settings objects.

- List: `GET /api/ingestion/settings`
- Create: `POST /api/ingestion/settings`
- Exists: `HEAD /api/ingestion/settings/{id}`
- Get/update/delete: `GET|PUT|DELETE /api/ingestion/settings/{id}`

```bash
curl "https://app.noema.ai/api/ingestion/settings" \
  -H "Authorization: Bearer $ACCESS_TOKEN"
```
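The `HEAD` route makes it possible to check for a settings object without downloading it. The sketch below reads only the HTTP status code. It assumes the usual REST convention that a 2xx response means the object exists and 404 means it does not; this page does not state the actual codes. The function names are hypothetical.

```bash
# Print only the HTTP status code of a HEAD request for a settings id.
settings_status_code() {
  curl --silent --output /dev/null --write-out '%{http_code}' --head \
    "https://app.noema.ai/api/ingestion/settings/$1" \
    -H "Authorization: Bearer ${ACCESS_TOKEN:-}"
}

# ASSUMPTION: 2xx means the settings object exists, 404 means it does not.
# Returns 0 if it exists, 1 if missing, 2 on any other status (auth error, outage).
settings_exists() {
  local code
  code="$(settings_status_code "$1")"
  case "$code" in
    2??) return 0 ;;
    404) return 1 ;;
    *) echo "unexpected status: $code" >&2; return 2 ;;
  esac
}
```

Keeping the third return value separate avoids treating an expired token or a network failure as "does not exist".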

## Deprecated endpoints

Some older ingestion routes remain for backwards compatibility under `/api/ingestion/source/...`. Prefer `/api/ingestion/sources/...` and `/api/ingestion/documents/...` for new integrations.
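When auditing an existing integration, the deprecated prefix differs from the preferred one only by the singular `source`, which is easy to miss by eye. The helper below is a small illustrative check with a hypothetical name. It matches on the prefix alone and makes no claim about which specific legacy routes exist.

```bash
# Succeeds when a URL or path uses the deprecated singular
# /api/ingestion/source/ prefix. The trailing slash in the pattern keeps
# /api/ingestion/sources/ (plural) from matching.
is_deprecated_route() {
  case "$1" in
    */api/ingestion/source/*) return 0 ;;
    *) return 1 ;;
  esac
}
```

The same pattern works for a quick scan of a codebase: `grep -rn '/api/ingestion/source/' .` lists every call site that still uses the old prefix.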

