Documentation

Indexa reference

Everything to take a source from raw rows to a live, queryable REST API — declaratively.

Introduction

Writing an indexer by hand means re-solving the same hard problems every time: tracking where you left off, resuming after a crash without double-writing, backfilling history while tailing new data, and exposing a query API. Indexa solves these once. You only describe your data.

declare (yaml + optional handlers)  →  indexa deploy  →  REST query API

Install

Requires Node.js ≥ 22.5 — Indexa uses the built-in node:sqlite, so the default setup needs zero database installation.

$ npm install        # installs js-yaml (pg is optional, only for Postgres)
$ npm link           # optional: makes the `indexa` command global

Quickstart

Scaffold a starter project, then deploy it. Deploying backfills the sample data into SQLite and starts a REST API on port 4000.

$ indexa init my-app
$ cd my-app
$ indexa deploy --config indexa.config.yaml

$ curl localhost:4000/                       # metadata + schema
$ curl "localhost:4000/orders?status=paid"   # filter
$ curl "localhost:4000/orders/1"             # get by id

The config file

Everything lives in one file: a name, a source, a target, and a schema of output entities. If a stream's raw columns already match an entity (case/plural-insensitive: orders → Order), Indexa maps them automatically — no handler needed.

name: orders-indexer

source:
  type: csv                       # csv | postgres | evm | <your own>
  sources:
    - { key: orders, file: data/orders.csv }

target:
  type: sqlite                    # sqlite | postgres
  path: ./orders.db

schema:                           # your output entities
  Order:
    id: ID                        # every entity needs an id
    customer: String
    total: BigDecimal
    status: String
    items: Int
    created_at: Timestamp
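
The automatic mapping rule described above (orders → Order) can be pictured as a comparison of normalized names. The sketch below is an illustration only: normalizeName and findEntityForStream are hypothetical helpers, not part of Indexa, and the real matcher may handle plurals differently (this one only strips a single trailing "s"). It also ignores the column-to-field check that the real mapping performs.

```javascript
// Hypothetical helper: lower-case the name and drop one trailing "s".
// Indexa's actual pluralization rules are not documented here.
function normalizeName(name) {
  const lower = name.toLowerCase();
  return lower.endsWith('s') ? lower.slice(0, -1) : lower;
}

// Returns the schema entity whose normalized name matches the stream key,
// or undefined when no entity matches (the case where a handler is needed).
function findEntityForStream(streamKey, entityNames) {
  const wanted = normalizeName(streamKey);
  return entityNames.find((entity) => normalizeName(entity) === wanted);
}

console.log(findEntityForStream('orders', ['Customer', 'Order'])); // prints "Order"
```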

Field types

ID String Int Float Boolean BigInt BigDecimal JSON Timestamp

Fields can also be references to other entities. ${ENV_VAR} and ${ENV_VAR:-default} interpolation is supported anywhere in the config file.
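
To make the two interpolation forms concrete, here is a minimal sketch of shell-style substitution. It is not Indexa's implementation: the interpolate function and the DB_PATH variable are made up for illustration, and the behavior for an unset variable with no default (an empty string here) is an assumption.

```javascript
// Sketch of ${NAME} and ${NAME:-fallback} substitution.
// Assumption: an unset or empty variable with no fallback becomes "".
function interpolate(text, env) {
  const pattern = /\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}/g;
  return text.replace(pattern, (match, name, fallback) => {
    const value = env[name];
    if (value !== undefined && value !== '') return value;
    return fallback !== undefined ? fallback : '';
  });
}

// DB_PATH is a hypothetical variable name used only for this example.
console.log(interpolate('path: ${DB_PATH:-./orders.db}', {}));
console.log(interpolate('connection: ${SOURCE_DB_URL}', {
  SOURCE_DB_URL: 'postgres://localhost/shop',
}));
```

Passing the environment as an argument (rather than reading process.env directly) keeps the function easy to test.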

Handlers

Add a handlers: ./handlers.js line to the config only when raw data needs transforming or aggregating. Each source row is processed exactly once, so read-modify-write increments are safe.

export default {
  async orders(row, ctx) {
    await ctx.store.upsert('Order', row.id, {
      id: row.id, customer: row.customer, total: row.total, status: row.status,
    });

    // Stateful aggregate: read prior entity, then write.
    const c = await ctx.store.get('Customer', row.customer);
    await ctx.store.upsert('Customer', row.customer, {
      id: row.customer, name: row.customer,
      totalSpent: (c ? Number(c.totalSpent) : 0) + Number(row.total),
      orderCount: (c ? Number(c.orderCount) : 0) + 1,
    });
  },
};

Run indexa types to generate indexa-types.d.ts for autocomplete on entities and ctx.store.
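
Because the handler above only calls ctx.store.get and ctx.store.upsert, it can be exercised without a running indexer by passing it an in-memory stand-in. The following is a sketch under assumptions: createMemoryStore is a hypothetical test helper, not something Indexa ships, and it returns null for a missing entity, which may differ from the real store. The handler is inlined in trimmed form; in a project it would be imported from handlers.js.

```javascript
// Hypothetical in-memory stand-in for ctx.store, for unit tests only.
function createMemoryStore() {
  const tables = new Map(); // entity name -> Map(id -> record)
  const tableFor = (entity) => {
    if (!tables.has(entity)) tables.set(entity, new Map());
    return tables.get(entity);
  };
  return {
    async get(entity, id) {
      return tableFor(entity).get(id) ?? null; // assumed "not found" value
    },
    async upsert(entity, id, record) {
      tableFor(entity).set(id, { ...record });
    },
  };
}

// Trimmed copy of the aggregate half of the orders handler.
async function orders(row, ctx) {
  const c = await ctx.store.get('Customer', row.customer);
  await ctx.store.upsert('Customer', row.customer, {
    id: row.customer,
    totalSpent: (c ? Number(c.totalSpent) : 0) + Number(row.total),
    orderCount: (c ? Number(c.orderCount) : 0) + 1,
  });
}

// Feed two rows for the same customer and read back the aggregate.
async function demo() {
  const ctx = { store: createMemoryStore() };
  await orders({ id: '1', customer: 'Bob', total: '10.50' }, ctx);
  await orders({ id: '2', customer: 'Bob', total: '4.50' }, ctx);
  return ctx.store.get('Customer', 'Bob');
}

demo().then((customer) => console.log(customer));
```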

REST endpoints

From your schema, every entity becomes a REST resource — no code.

GET /                  App metadata + full schema
GET /<entity>          List with filtering
GET /<entity>/:id      Fetch one
GET /_health           Health check

Query params: any schema field (?status=paid&customer=Bob), plus limit, offset, orderBy, desc=true. Try them live in the API Explorer →
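
As an illustration of combining these parameters from client code, the sketch below builds a list URL with the standard URL API. buildListUrl is a hypothetical helper, and the base address assumes the quickstart's localhost:4000; the shape of the JSON response is not specified in this reference, so the fetch is only shown in a comment.

```javascript
// Builds a list URL from the documented query parameters.
// Assumption: the API is reachable at the base URL passed in.
function buildListUrl(baseUrl, entity, options = {}) {
  const { filters = {}, limit, offset, orderBy, desc } = options;
  const url = new URL(`/${entity}`, baseUrl);
  for (const [field, value] of Object.entries(filters)) {
    url.searchParams.set(field, String(value));
  }
  if (limit !== undefined) url.searchParams.set('limit', String(limit));
  if (offset !== undefined) url.searchParams.set('offset', String(offset));
  if (orderBy) url.searchParams.set('orderBy', orderBy);
  if (desc) url.searchParams.set('desc', 'true');
  return url.toString();
}

const listUrl = buildListUrl('http://localhost:4000', 'orders', {
  filters: { status: 'paid', customer: 'Bob' },
  limit: 20,
  offset: 40,
  orderBy: 'created_at',
  desc: true,
});
console.log(listUrl);
// With a deployed indexer running, the page could then be fetched with:
//   const body = await (await fetch(listUrl)).json();
```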

CLI

$ indexa init [dir]                  scaffold a starter project
$ indexa deploy --config <file>      backfill + live tail + API
        [--port 4000] [--once] [--no-api]
$ indexa validate --config <file>    check config without running
$ indexa types --config <file>       generate TypeScript types

--once runs a single backfill and stops (good for batch jobs / CI). Without it, Indexa keeps polling the source for new rows every pollIntervalMs milliseconds (default 2000).

Sources

Built-in: csv (zero-dependency files), postgres (tails a table by a monotonic cursor column), and evm (indexes blockchain events with automatic reorg handling). A connector exposes ordered streams with a monotonic cursor; the engine persists the cursor of the last written record inside the same transaction as the writes — that is what makes resumption idempotent.

source:
  type: postgres
  connection: ${SOURCE_DB_URL}
  tables:
    - { key: orders, table: orders, cursorColumn: updated_at }

Write your own by implementing init / streams / close, then register it:

import { registerConnector } from 'indexa';
import KafkaConnector from './kafka-connector.js';
registerConnector('kafka', KafkaConnector);
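
This reference names the three methods but does not spell out their signatures, so the following is only a guess at what a connector could look like. Everything beyond the method names init, streams, and close is an assumption: the constructor argument, the { key, read } stream descriptor, and the { cursor, row } record shape are hypothetical. The sketch reads from in-memory arrays so it runs without Indexa or a broker.

```javascript
// Hypothetical connector over in-memory arrays. Only the method names
// init / streams / close come from the docs; the signatures are assumed.
class ArrayConnector {
  constructor(config) {
    this.config = config; // e.g. { topics: { events: [{ offset: 1 }] } }
    this.open = false;
  }

  async init() {
    this.open = true; // a real connector would open its connection here
  }

  // Assumed shape: one descriptor per stream, each yielding records
  // strictly after the given cursor, in ascending cursor order.
  streams() {
    return Object.entries(this.config.topics).map(([key, records]) => ({
      key,
      read: async function* (afterCursor) {
        const ordered = [...records].sort((a, b) => a.offset - b.offset);
        for (const record of ordered) {
          if (afterCursor === null || record.offset > afterCursor) {
            yield { cursor: record.offset, row: record };
          }
        }
      },
    }));
  }

  async close() {
    this.open = false;
  }
}

// Drives the connector the way an engine might: init, drain, close.
async function readAll(connector, afterCursor) {
  await connector.init();
  const cursors = [];
  for (const stream of connector.streams()) {
    for await (const item of stream.read(afterCursor)) cursors.push(item.cursor);
  }
  await connector.close();
  return cursors;
}

const sample = { topics: { events: [{ offset: 3 }, { offset: 1 }, { offset: 2 }] } };
readAll(new ArrayConnector(sample), 1).then((cursors) => console.log(cursors));
```

The property worth keeping from this sketch, whatever the real interface turns out to be, is that reading "after cursor N" is deterministic and ordered, since resumption depends on it.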

Targets

Two targets are available: sqlite (the default, built in) and postgres (requires npm install pg). Tables are created automatically from your schema.

How it works

 source connector ──stream(cursor)──▶ engine ──transaction──▶ target store ──▶ REST API
                                       │                        ▲
                                       └─ checkpoint persisted ─┘ (same txn = idempotent)

1. Backfill: drain each stream from its last checkpoint until caught up.
2. Live tail: poll for records after the cursor on an interval.
3. Idempotency: entity writes and the checkpoint advance commit together, so a crash never double-writes or skips.
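
The idempotency guarantee can be modeled in a few lines. This is an in-memory illustration, not Indexa's engine: createTarget, processBatch, and countOrders are hypothetical names, and the "transaction" is simulated by working on a copy that replaces the live state only when the whole batch succeeds. A real target would rely on a database transaction instead.

```javascript
// In-memory model of "entity writes + checkpoint commit together".
function createTarget() {
  return { entities: new Map(), checkpoint: 0 };
}

// Applies every record after the checkpoint. All work happens on a draft
// that replaces the live state only if no handler call throws.
function processBatch(target, records, handler) {
  const draft = {
    entities: new Map(target.entities),
    checkpoint: target.checkpoint,
  };
  for (const record of records) {
    if (record.cursor <= draft.checkpoint) continue; // already committed
    handler(record, draft.entities);
    draft.checkpoint = record.cursor;
  }
  // "Commit": both fields change together, or neither does.
  target.entities = draft.entities;
  target.checkpoint = draft.checkpoint;
}

// Read-modify-write aggregate, safe because each record is applied once.
function countOrders(record, entities) {
  if (record.fail) throw new Error('simulated crash');
  entities.set('orderCount', (entities.get('orderCount') ?? 0) + 1);
}

const target = createTarget();
const records = [{ cursor: 1 }, { cursor: 2 }, { cursor: 3, fail: true }];

try {
  processBatch(target, records, countOrders); // crashes on the third record
} catch (err) {
  // Nothing was committed: count and checkpoint are both unchanged.
}
records[2].fail = false;
processBatch(target, records, countOrders); // retry from the checkpoint
processBatch(target, records, countOrders); // replaying is a no-op
console.log(target.entities.get('orderCount'), target.checkpoint); // prints 3 3
```

If the checkpoint were saved separately from the writes, the crash in the first call could leave two orders counted with the checkpoint still at 0, and the retry would count them again.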

Reorg handling

The evm connector handles chain reorganizations automatically. Every entity write made while indexing an unfinalized block is recorded in an undo journal. When a previously-indexed block hash changes, the engine rolls the affected entities back to the last common ancestor — including aggregated values like running balances — then re-indexes the new canonical chain. The full walkthrough is in the EVM guide →
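
A rough sketch of the undo-journal idea follows. It is a simplification under stated assumptions, not the evm connector's code: JournaledStore is a hypothetical class, blocks are identified by number only, and hash comparison, finality tracking, and journal pruning are all left out.

```javascript
// Sketch of an undo journal keyed by block number.
class JournaledStore {
  constructor() {
    this.entities = new Map();
    this.journal = []; // { block, key, existed, previous } in write order
  }

  // Record the prior value before overwriting, so the write can be undone.
  write(block, key, value) {
    const existed = this.entities.has(key);
    this.journal.push({ block, key, existed, previous: this.entities.get(key) });
    this.entities.set(key, value);
  }

  // Undo every write made in blocks after the common ancestor, newest first.
  rollbackTo(ancestorBlock) {
    while (
      this.journal.length > 0 &&
      this.journal[this.journal.length - 1].block > ancestorBlock
    ) {
      const entry = this.journal.pop();
      if (entry.existed) this.entities.set(entry.key, entry.previous);
      else this.entities.delete(entry.key);
    }
  }
}

const store = new JournaledStore();
// Running balance: an aggregate that depends on its own previous value.
const credit = (block, amount) =>
  store.write(block, 'balance', (store.entities.get('balance') ?? 0) + amount);

credit(100, 50); // balance 50
credit(101, 20); // balance 70
credit(102, 5);  // balance 75

store.rollbackTo(100); // blocks 101 and 102 were reorged out, balance back to 50
credit(101, 30);       // re-index the new canonical block 101
console.log(store.entities.get('balance')); // prints 80
```

Undoing entries newest-first is what restores aggregates correctly: each entry holds the value as it was immediately before that write.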

Deploy with Docker

# put your config + handlers + data under ./app, then:
$ docker build -t my-indexer .
$ docker run -p 4000:4000 -e CONFIG=app/indexa.config.yaml my-indexer

The image has a healthcheck on /_health and respects INDEXA_LOG_LEVEL. Use Postgres as the target for production.

Roadmap

Deliberately left out to keep the core small and the "just deploy" promise intact:

GraphQL: a resolver set generated from the same schema-derived layer as REST.

Parallel backfill: partition by cursor range for faster historical sync.

Firehose: a push-based transport (Substreams/Firehose) as a connector for higher throughput.