Configuring Lookup Vindexes for Cross-Shard Joins

A sharded table can only route efficiently on its primary sharding key. The moment an application filters or joins on a second column — fetching an order by order_id when the table is sharded by customer_id — VTGate has no way to compute the target shard from the value, so it fans the query out to every shard and reassembles the result. A lookup vindex removes that penalty: it maintains a reverse-index table that maps the secondary column to a keyspace ID, letting the proxy resolve the exact shard before dispatch. This page shows how to define, backfill, and operate a lookup vindex so that cross-shard access patterns collapse back into targeted single-shard reads, and how to keep the mapping table consistent with its base table under production write load.

This is a subsystem of VSchema configuration and routing rule management; if you have not yet grounded the four core routing abstractions — keyspace, VSchema, vindex, routing rule — read that overview first, then return here for the lookup-specific mechanics.

Prerequisites

Before configuring a lookup vindex, confirm the following are in place:

Vitess 14.0 or later. consistent_lookup and consistent_lookup_unique are stable from v14; earlier lookup/lookup_unique types work but do not update the mapping transactionally. Examples here use vtctldclient (the v15+ control-plane client) rather than the deprecated vtctlclient.
A sharded keyspace with a working primary vindex. Every routable table already needs a primary vindex that places rows on INSERT. If you are still choosing that key, settle it via the keyspace partitioning model before adding secondary lookups — the lookup vindex sits on top of an existing sharding decision, it does not replace one.
Familiarity with VSchema grammar. The vindexes and column_vindexes blocks, owner semantics, and param syntax are defined in mastering VSchema syntax and structure. This page assumes you can read a VSchema document.
VEXPLAIN access through VTGate. You will use VEXPLAIN PLAN <sql> to prove that a query routes single-shard before and after the change.
A maintenance path for backfill. Populating the mapping table for existing rows requires either a vreplication-style workflow or a batched INSERT ... SELECT, coordinated as part of Online DDL orchestration.

How a Lookup Vindex Resolves a Query

A lookup vindex is a two-phase indirection. Where a functional vindex such as hash computes the keyspace ID arithmetically from the column value, a lookup vindex stores the mapping in an ordinary MySQL table and reads it at execution time, once per query, rather than when the plan is built.

Read path. When VTGate executes a query whose WHERE predicate matches a lookup-vindex column, it issues a first hop against the lookup (mapping) table: SELECT keyspace_id FROM <lookup_table> WHERE <from_col> = ?. That returns one keyspace ID (for a _unique lookup) or a small set (for a non-unique lookup). Each keyspace ID falls inside exactly one shard’s key range, so VTGate now knows precisely which shard or shards hold the rows and dispatches the second hop only there. The scatter is replaced by a bounded lookup plus one or a few targeted reads. This first-hop routing is executed by the VTGate routing architecture, which caches the parsed plan but not the row-level mapping result.
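
The two hops can be sketched with a toy in-memory model. This is an illustration under stated assumptions, not VTGate's implementation: the LOOKUP dict stands in for the mapping table, SHARD_RANGES is an assumed four-shard layout, the keyspace IDs are made-up sample values, and shard_for and resolve_shards are hypothetical helper names.

```python
from bisect import bisect_right

# Assumed layout: four shards splitting the 8-byte keyspace-ID space evenly.
# Each entry is (inclusive lower bound of the key range, shard name).
SHARD_RANGES = [
    (bytes.fromhex("0000000000000000"), "-40"),
    (bytes.fromhex("4000000000000000"), "40-80"),
    (bytes.fromhex("8000000000000000"), "80-c0"),
    (bytes.fromhex("c000000000000000"), "c0-"),
]

# Stand-in for the mapping table: from-column value -> keyspace IDs.
# The keyspace IDs are arbitrary sample values.
LOOKUP = {
    b"a@example.com": [bytes.fromhex("166b40b44aba4bd6")],
    b"b@example.com": [bytes.fromhex("d2fd8867d50d2dfe")],
}


def shard_for(keyspace_id: bytes) -> str:
    """Return the shard whose key range contains keyspace_id."""
    starts = [start for start, _ in SHARD_RANGES]
    return SHARD_RANGES[bisect_right(starts, keyspace_id) - 1][1]


def resolve_shards(email: bytes) -> list[str]:
    """Hop 1 reads the mapping; hop 2 would target only the shards returned."""
    return sorted({shard_for(ksid) for ksid in LOOKUP.get(email, [])})
```

An email with no mapping entry resolves to an empty shard list in this model, which is why an incomplete mapping table shows up as missing rows rather than as slow queries.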

Write path and the consistency obligation. The mapping table only helps if it stays truthful. This is where ownership matters. An owned lookup vindex (declared with owner: <table>) is maintained by Vitess itself: on every INSERT, UPDATE, or DELETE of the base table, VTGate writes the corresponding mapping row in the same logical operation. A consistent_lookup vindex goes further: it sequences the mapping write and the base-row write on separate connections, with careful locking and commit ordering, so that a committed base row always has a mapping entry even when the two live on different shards. It achieves this without two-phase commit; the 2PC machinery described in handling cross-shard transactions in Vitess is a separate mechanism. At worst, a failure between the commits leaves an orphaned mapping entry, never a base row without one. An unowned lookup vindex, by contrast, is a read-only view onto a mapping that some other table owns (or that is maintained out of band); if writes bypass Vitess, an unowned mapping drifts and starts routing reads to the wrong shard.

The practical rule: if the lookup column belongs to the table you are sharding and writes flow through Vitess, make the vindex owned and consistent_lookup. Reserve unowned lookups for the case where two tables share one physical mapping and only one of them owns it.

Step-by-Step Implementation

The following sequence adds an owned consistent_lookup_unique vindex on customer.email, where customer is sharded by id using hash. Each step is independently verifiable.

1. Create the backing mapping table

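The lookup vindex needs a physical table to store the email → keyspace_id mapping. Shard it on its own from column (email) with a functional vindex, so that VTGate can compute the mapping row’s shard directly from the email and the first hop is a single-shard read; a mapping table that cannot be routed by its from column turns the first hop itself into a scatter. Note that this places each mapping row by its email, not alongside the base row it points at. Create it in the same sharded keyspace: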

CREATE TABLE customer_lookup (
  email        VARBINARY(128) NOT NULL,
  keyspace_id  VARBINARY(8)   NOT NULL,
  PRIMARY KEY (email)
);

Apply the schema through your normal DDL path so it lands on every shard. Verify it exists on each shard before proceeding:

vtctldclient GetSchema --tables customer_lookup <tablet-alias>

2. Declare the vindex in VSchema

Add the vindex definition and bind it to the base table’s email column. The owner field is what makes Vitess maintain the mapping automatically:

{
  "sharded": true,
  "vindexes": {
    "hash": { "type": "hash" },
    "xxhash": { "type": "xxhash" },
    "customer_email_lookup": {
      "type": "consistent_lookup_unique",
      "params": {
        "table": "commerce.customer_lookup",
        "from": "email",
        "to": "keyspace_id"
      },
      "owner": "customer"
    }
  },
  "tables": {
    "customer": {
      "column_vindexes": [
        { "column": "id",    "name": "hash" },
        { "column": "email", "name": "customer_email_lookup" }
      ]
    },
    "customer_lookup": {
      "column_vindexes": [
        { "column": "email", "name": "xxhash" }
      ]
    }
  }
}

Note that customer_lookup is itself a routed table with email as its primary vindex. It uses xxhash rather than hash because hash accepts only 64-bit integer values and email is a VARBINARY column. Routing the mapping table by email is what lets the first hop reach a single shard. The first entry in customer’s column_vindexes (id/hash) remains the primary vindex that places the row; email is the secondary lookup.
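
A VSchema like this can be checked structurally before it is applied. The following is a minimal sketch, not a Vitess tool: check_lookup_vindexes is a hypothetical helper, it inspects only the JSON shape, and it assumes the lookup table is declared in the same VSchema document as its owner (as in the example above).

```python
def check_lookup_vindexes(vschema: dict) -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    tables = vschema.get("tables", {})
    for name, vindex in vschema.get("vindexes", {}).items():
        if "lookup" not in vindex.get("type", ""):
            continue  # functional vindexes have no backing table to check
        params = vindex.get("params", {})
        for key in ("table", "from", "to"):
            if not params.get(key):
                problems.append(f"{name}: missing params.{key}")
        owner = vindex.get("owner")
        if owner is None:
            continue  # unowned lookups are legal; nothing else to verify
        if owner not in tables:
            problems.append(f"{name}: owner '{owner}' is not a declared table")
            continue
        bound = {cv.get("name") for cv in tables[owner].get("column_vindexes", [])}
        if name not in bound:
            problems.append(f"{name}: owner '{owner}' does not bind this vindex")
        # Assumption: the lookup table lives in this same VSchema document.
        lookup_table = params.get("table", "").split(".")[-1]
        if lookup_table and not tables.get(lookup_table, {}).get("column_vindexes"):
            problems.append(
                f"{name}: lookup table '{lookup_table}' has no primary vindex")
    return problems
```

Loading the document with json.load and failing the build when the returned list is non-empty is one way to catch a mis-declared owner or an unrouted mapping table before it reaches the topology server.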

3. Apply the VSchema

Write the document into the topology server. Every VTGate watches for the change and reloads asynchronously:

vtctldclient ApplyVSchema \
  --vschema-file commerce.vschema.json \
  commerce

Because propagation is eventually consistent, treat this as the start of a rollout, not a completed change — poll each VTGate for the new version before relying on the new routing. Gating the apply behind an async VSchema validation workflow turns an invalid mapping definition into a build failure instead of a production incident.

4. Backfill the mapping for existing rows

A freshly declared owned vindex only maintains mappings for rows written after it exists. Rows already in customer have no entry in customer_lookup, so a read filtering on one of their emails finds no mapping and returns no rows even though the row exists. Backfill them. For a large table, use a batched INSERT ... SELECT that computes the keyspace ID with the same function the primary vindex uses, run in key-range chunks to avoid long-running transactions:

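-- keyspace_id_of() is a placeholder, not a MySQL or Vitess function: substitute
-- whatever your backfill tooling uses to derive the keyspace ID that the primary
-- vindex (hash on id) assigns to each row.
INSERT INTO customer_lookup (email, keyspace_id)
SELECT email, keyspace_id_of(id)
FROM customer
WHERE id BETWEEN ? AND ?
ON DUPLICATE KEY UPDATE keyspace_id = VALUES(keyspace_id);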

For very large or actively written tables, prefer a vreplication workflow so the backfill runs continuously and picks up in-flight writes; sequence the backfill inside your migration coordination as described in coordinating multi-shard schema migrations. Do not activate any read path that assumes complete coverage until the backfill is verified complete.
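
The chunk boundaries for a batched backfill can be generated ahead of time. A minimal sketch, assuming integer ids and inclusive BETWEEN bounds; id_chunks is a hypothetical helper name, and each pair it yields binds to the two ? placeholders of one batch.

```python
from typing import Iterator


def id_chunks(min_id: int, max_id: int, size: int) -> Iterator[tuple[int, int]]:
    """Yield inclusive (low, high) id ranges that together cover [min_id, max_id]."""
    if size <= 0:
        raise ValueError("size must be positive")
    low = min_id
    while low <= max_id:
        high = min(low + size - 1, max_id)
        yield low, high
        low = high + 1


# Example: ids 1..10 in batches of 4.
for low, high in id_chunks(1, 10, 4):
    print(low, high)
```

Because the ranges never overlap and the statement uses ON DUPLICATE KEY UPDATE, a failed batch can be retried on its own without disturbing the others.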

5. Automate apply-and-verify from Python

Platform teams should never fire-and-forget a routing change. The durable pattern is an idempotent apply followed by a convergence poll that blocks until every serving VTGate reports the expected VSchema and the mapping row count matches the base table:

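import subprocess
import time


def _count(gate: str, keyspace: str, sql: str) -> int:
    """Run a COUNT(*) through one VTGate using the mysql CLI.

    Assumes gate is "host:port" for the VTGate MySQL listener and that the
    mysql client is on PATH; swap in your own driver as needed.
    """
    host, port = gate.rsplit(":", 1)
    result = subprocess.run(
        ["mysql", "-h", host, "-P", port, "-N", "-B", "-e", sql, keyspace],
        check=True, capture_output=True, text=True,
    )
    return int(result.stdout.strip())


def apply_and_wait(keyspace: str, vschema_path: str,
                   gates: list[str], timeout_s: int = 120) -> None:
    """Apply a VSchema and block until the lookup mapping is complete."""
    subprocess.run(
        ["vtctldclient", "ApplyVSchema",
         "--vschema-file", vschema_path, keyspace],
        check=True,
    )
    deadline = time.monotonic() + timeout_s
    backoff = 1.0
    while True:
        # Count through every gate: a gate that cannot route the query yet
        # (for example, one still serving the old VSchema) is treated as
        # "not converged" and retried.
        try:
            pairs = [(_count(g, keyspace, "SELECT COUNT(*) FROM customer"),
                      _count(g, keyspace, "SELECT COUNT(*) FROM customer_lookup"))
                     for g in gates]
        except subprocess.CalledProcessError:
            pairs = []
        if pairs and all(mapped >= base for base, mapped in pairs):
            return
        if time.monotonic() > deadline:
            raise TimeoutError(f"lookup backfill incomplete: {pairs}")
        time.sleep(backoff)
        backoff = min(backoff * 2, 8.0)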

Driving lookup rollout through validation, an idempotent apply, and topology-state polling keeps the routing layer deterministic even while orchestration rewrites it underneath live traffic.

Configuration Reference

The vindex params and the VTGate flags that bound lookup behaviour under load:

Setting	Type	Default	Recommended (production)
`type` (`consistent_lookup_unique` vs `consistent_lookup`)	enum	—	`_unique` when the column is 1:1 with a row; non-unique only when it genuinely maps to many rows
`owner`	string	none (unowned)	set to the base table so Vitess maintains the mapping transactionally
`params.from`	string	—	the secondary column(s); comma-separated for a composite lookup
`params.to`	string	—	`keyspace_id` (must be the mapping table’s keyspace-ID column)
`params.autocommit`	bool	`false`	leave `false` for owned lookups so the mapping write joins the base transaction
`--query_timeout` (VTGate)	duration	`0` (unbounded)	`30s` — cap any lingering scatter before the lookup is complete
`--max_memory_rows` (VTGate)	int	`300000`	`100000` — protect `VTGate` heap if a lookup miss falls back to scatter
`--transaction_mode` (VTGate)	enum	`MULTI`	`MULTI` is sufficient: `consistent_lookup` keeps mapping and base row consistent without 2PC; enable `TWOPC` only if other workloads need atomic cross-shard commits

The single most common misconfiguration is a lookup table that is not sharded on its own from column: VTGate then cannot compute which shard holds a mapping row, the routing hop itself scatters, and the entire optimisation is quietly defeated. The second is choosing consistent_lookup (non-unique) for a column that is actually unique, which stores redundant rows and slows the read hop. Match the type to the real cardinality of the column.

Failure Modes

Lookup-vindex failures present as latency and wrong-shard reads, rarely as loud errors. The named scenarios below are the ones that page.

Lookup vindex drift. Root cause: an unowned lookup table, or writes that bypass Vitess (direct MySQL access, an out-of-band backfill), leave the mapping out of sync with the base table. Symptoms: “row not found” on an email that demonstrably exists; a growing gap between COUNT(*) on the base table and the mapping table. Mitigation: prefer owned consistent_lookup so mapping and base row commit together; run a periodic reconciliation job that diffs the two tables; rebackfill affected key ranges when drift is detected.
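
The reconciliation job mentioned above reduces to a three-way diff. A minimal sketch, assuming both sides have already been loaded into dicts keyed by email; diff_mapping is a hypothetical helper, and how the expected keyspace IDs are derived from base rows is left to your tooling.

```python
def diff_mapping(base: dict[bytes, bytes],
                 mapping: dict[bytes, bytes]) -> dict[str, list[bytes]]:
    """Compare expected and stored mapping rows.

    base:    email -> keyspace ID derived from the base row
    mapping: email -> keyspace ID stored in the lookup table
    """
    return {
        # Base rows with no mapping entry: reads on these emails miss.
        "missing": sorted(k for k in base if k not in mapping),
        # Entries pointing at the wrong keyspace ID: reads hit the wrong shard.
        "stale": sorted(k for k in base if k in mapping and mapping[k] != base[k]),
        # Entries with no base row: wasted space, but not a routing error.
        "orphaned": sorted(k for k in mapping if k not in base),
    }
```

Missing and stale entries are the ones that need a rebackfill of the affected key range; orphaned entries can be cleaned up at lower priority.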

Incomplete backfill exposed early. Root cause: the read path is activated before step 4 finishes, so pre-existing rows have no mapping entry and their lookups either miss or fall back to scatter. Symptoms: a subset of keys — always the older ones — either scatter or return empty while newly written keys route correctly. Mitigation: block activation on the mapping-count check from step 5; keep --query_timeout bounded so any residual scatter cannot exhaust connection pools.

Cross-shard mapping hop. Root cause: the lookup table’s primary vindex is not on its from column (or the table sits in a keyspace where it cannot be routed by that column), so VTGate cannot target the mapping row. Symptoms: VEXPLAIN shows the routing itself producing a scatter before the target read; latency barely improves versus the un-optimised query. Mitigation: give the lookup table a functional primary vindex on its from column, as in step 2, so the first hop resolves to one shard.

Lookup write contention under write bursts. Root cause: a consistent_lookup vindex on a hot write path adds a mapping-table write, usually on a different shard, to every base-row mutation that touches the lookup column. Symptoms: elevated write latency and lock waits on the mapping table during write spikes. Mitigation: confirm the column truly needs transactional consistency; for append-mostly patterns a non-transactional owned lookup may suffice; tune write batching. Sustained high-QPS tuning of these structures (plan caching, buffer-pool sizing, and avoiding multi-column lookups) is covered in optimizing vindex performance for high QPS.

Verification

Prove the configuration works before and after, at three levels:

Plan level. Confirm the query now routes to one shard instead of scattering. Run through VTGate:

VEXPLAIN PLAN SELECT * FROM customer WHERE email = 'a@example.com';

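A correct result shows an OperatorType of Route with a Vindex of customer_email_lookup and a single-shard (EqualUnique/Equal) variant, not Scatter. Seeing Scatter here typically means the column is not bound to the lookup vindex in the VSchema this VTGate has loaded. The plan depends only on the VSchema, so missing or drifting mapping rows surface as empty or wrong results at the data level, not as Scatter.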
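
The plan check can be automated by walking the plan tree. A minimal sketch, assuming the plan has been captured as JSON with OperatorType, Variant, and nested Inputs keys; the sample documents below are hand-written illustrations, not captured VTGate output, and scatter_routes is a hypothetical helper.

```python
import json


def scatter_routes(plan: dict) -> list[dict]:
    """Return every Route node in the plan tree whose Variant is Scatter."""
    found = []
    if plan.get("OperatorType") == "Route" and plan.get("Variant") == "Scatter":
        found.append(plan)
    for child in plan.get("Inputs", []):
        found.extend(scatter_routes(child))
    return found


# Hand-written sample plans (assumed shape).
routed = json.loads(
    '{"OperatorType": "Route", "Variant": "EqualUnique",'
    ' "Vindex": "customer_email_lookup"}')
scattered = json.loads(
    '{"OperatorType": "Limit", "Inputs":'
    ' [{"OperatorType": "Route", "Variant": "Scatter"}]}')

print(len(scatter_routes(routed)), len(scatter_routes(scattered)))
```

Asserting that the returned list is empty for each query that must stay single-shard turns this check into a regression test for VSchema changes.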

Data level. Confirm the mapping is complete and consistent:

SELECT
  (SELECT COUNT(*) FROM customer)        AS base_rows,
  (SELECT COUNT(*) FROM customer_lookup) AS mapped_rows;

The counts should match for a _unique lookup. A persistent shortfall indicates an incomplete backfill or active drift.

Metric level. Watch VTGate’s scatter/single-shard query ratio (exported query-execution metrics) around the rollout: the single-shard hit-rate for queries filtering on the lookup column should rise toward 100% and stay there. A regression back toward scatter is the earliest signal of drift and should alert.
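
The hit-rate itself is a ratio of counter deltas over a window. A minimal sketch, assuming two monotonically increasing counters have been scraped at the start and end of the window; the key names single_shard and scatter are placeholders for whatever your metrics pipeline exports, not VTGate metric names.

```python
def single_shard_ratio(before: dict[str, int], after: dict[str, int]) -> float:
    """Fraction of queries in the window that were routed to a single shard.

    "single_shard" and "scatter" are assumed counter names.
    """
    single = after["single_shard"] - before["single_shard"]
    scatter = after["scatter"] - before["scatter"]
    total = single + scatter
    # An idle window has nothing to alert on, so report a perfect ratio.
    return 1.0 if total == 0 else single / total
```

Alerting when this ratio drops below a threshold chosen from your pre-rollout baseline gives the early drift signal described above.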

For dynamic traffic shifting on top of a working lookup — for instance pinning reads during a migration — combine this with dynamic routing rules and query rewriting, which redirects a query’s target before vindex resolution runs.

Related

Mastering VSchema Syntax and Structure — the full VSchema grammar behind the vindexes and column_vindexes blocks used here.
Optimizing VIndex Performance for High QPS — plan caching, buffer-pool sizing, and lookup-table tuning at sustained high throughput.
Dynamic Routing Rules and Query Rewriting — redirect and shift traffic on top of a working vindex during migrations.
Async VSchema Validation Workflows — gate every lookup-vindex change behind plan simulation and CI/CD checks.
Handling Cross-Shard Transactions in Vitess — the 2PC machinery that keeps a consistent_lookup mapping atomic with its base row.
