Scheduling DDL Windows Across Multiple Timezones

When a single logical schema change fans out to shards whose primaries live in APAC, EMEA, and AMER data centres, “run it at 2am” has no single meaning — the scheduler has to find a window that is quiet everywhere it needs to be and land the cutover inside it despite daylight-saving shifts.

Context

This page sits under Tracking Migration Progress and State Machines: the state machine tells you whether a migration is safe to cut over, and the window scheduler tells you when you are allowed to release that barrier. Both are part of the broader practice of Online DDL Orchestration & Migration Coordination. The timing problem only exists because shards are physically distributed — the geographic placement of primaries is decided upstream in Vitess Sharding Architecture & Topology Design, and the scheduler consumes that placement as input rather than changing it.

Vitess does not enforce timezone-aware scheduling on its own. vtctldclient will submit an ALTER TABLE the instant you ask, and each VTTablet executes on its own primary’s local clock. The scheduling layer is an external control that decides when the cutover directive is issued, then relies on the postponed-completion pattern so that no shard swaps traffic until every shard is inside its approved window.

The concept: intersecting rolling troughs, not offsetting a clock

Two mistakes make naive schedulers unsafe. The first is treating a timezone as a fixed offset. America/New_York is UTC−5 in January and UTC−4 in July; hard-coding either produces a window that drifts by an hour twice a year and eventually collides with the morning traffic ramp. The second is scheduling each shard independently. Because the cutover is barrier-gated across the whole migration_context group, the migration is bottlenecked by the union of every participating region’s peak — you need a UTC interval where all target regions are simultaneously in their low-traffic trough, which is a much narrower slot than any single region’s overnight window.
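A minimal sketch of the first mistake, assuming the host has IANA tz data available to zoneinfo. The same 02:00 local time in America/New_York maps to different UTC instants in January and July, while a hard-coded UTC−5 offset (the FIXED_EST name is illustrative) silently gives the January answer all year.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")
FIXED_EST = timezone(timedelta(hours=-5))   # the hard-coded offset a naive scheduler might use

def local_2am_in_utc(year: int, month: int, day: int, tz) -> datetime:
    """Project 02:00 wall-clock time on the given date in `tz` onto UTC."""
    return datetime(year, month, day, 2, 0, tzinfo=tz).astimezone(timezone.utc)

for month in (1, 7):
    tzdb = local_2am_in_utc(2026, month, 15, NEW_YORK)
    fixed = local_2am_in_utc(2026, month, 15, FIXED_EST)
    print(f"2026-{month:02d}-15  tz database: {tzdb:%H:%M} UTC  fixed -05:00: {fixed:%H:%M} UTC")
```

In January both agree on 07:00 UTC; in July the tz database gives 06:00 UTC while the fixed offset still says 07:00 UTC, which is 03:00 local wall-clock time, an hour later than intended.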

So the correct model is: express each region’s acceptable window in its own IANA zone, project every window onto absolute UTC using a DST-aware library, then take the intersection of those intervals. The intersection narrows as the spread of UTC offsets grows; for a fleet spanning more than ~9–10 hours of UTC offset it can be empty (sooner if the regional troughs are narrow). At that point you either split the change into per-region batches (each with its own barrier group) or accept a partial-fleet cutover that the state machine tracks as separate contexts.

Solution: a DST-aware window resolver in Python

Use the standard-library zoneinfo module (Python 3.9+) so DST transitions come from the system tz database rather than hand-maintained offsets. The resolver below takes a per-region low-traffic window in local wall-clock time, anchors it to a calendar day in that region’s own zone, converts it to UTC, and returns the intersection.

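from __future__ import annotations   # lets the `X | None` annotations below run on Python 3.9

from datetime import date, datetime, time, timedelta
from itertools import product
from zoneinfo import ZoneInfo

UTC = ZoneInfo("UTC")

# One acceptable local low-traffic window per region that hosts shards.
# Frankfurt has no IANA zone of its own; German data centres use Europe/Berlin.
REGION_WINDOWS = {
    "APAC": ("Asia/Singapore",   time(1, 0), time(4, 0)),
    "EMEA": ("Europe/Berlin",    time(2, 0), time(5, 0)),
    "AMER": ("America/New_York", time(1, 0), time(4, 0)),
}

def to_utc_interval(zone: str, start: time, end: time, day: date) -> tuple[datetime, datetime]:
    tz = ZoneInfo(zone)
    # Anchor the wall-clock window to a calendar day IN THAT ZONE, then convert.
    local_start = datetime.combine(day, start, tzinfo=tz)
    local_end = datetime.combine(day, end, tzinfo=tz)
    if local_end <= local_start:          # window crosses local midnight
        local_end += timedelta(days=1)
    return local_start.astimezone(UTC), local_end.astimezone(UTC)

def intersect(intervals: list[tuple[datetime, datetime]]) -> tuple[datetime, datetime] | None:
    lo = max(i[0] for i in intervals)
    hi = min(i[1] for i in intervals)
    return (lo, hi) if lo < hi else None      # None == no common trough

def next_window(regions: list[str], on: datetime) -> tuple[datetime, datetime] | None:
    """Earliest common window that has not ended yet at `on` (a timezone-aware datetime)."""
    candidates = []
    for r in regions:
        zone, start, end = REGION_WINDOWS[r]
        # `on` can fall on a different calendar day in each zone, so try that zone's
        # yesterday, today and tomorrow rather than reusing one date everywhere.
        today = on.astimezone(ZoneInfo(zone)).date()
        candidates.append([to_utc_interval(zone, start, end, today + timedelta(days=d))
                           for d in (-1, 0, 1)])
    common = [w for combo in product(*candidates)
              if (w := intersect(list(combo))) is not None and w[1] > on]
    return min(common, default=None)          # None == no common trough within a day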

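Because astimezone resolves the offset from the tz database for that specific date, the spring and autumn DST boundaries (March and November in the US, March and October in the EU) are handled for free: the projected UTC interval simply shifts by an hour on the correct day. Anchoring datetime.combine with tzinfo=tz (rather than converting a naive UTC time) is what keeps “2am local” meaning 2am wall-clock in each region. Note that the three sample windows never overlap: APAC’s 01:00–04:00 in Singapore is 17:00–20:00 UTC, EMEA’s 02:00–05:00 in Frankfurt is 00:00–03:00 or 01:00–04:00 UTC depending on the season, and AMER’s 01:00–04:00 in New York is 05:00–08:00 or 06:00–09:00 UTC. Any multi-region call therefore returns None, which is exactly the wide-fleet case described above.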
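To see the shift land on the right day, here is a self-contained sketch. It repeats the projection logic under the hypothetical name project rather than importing the resolver, and prints the America/New_York 01:00–04:00 window for the days around the 2026 US spring-forward transition on 8 March.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

def project(zone: str, start: time, end: time, day: date) -> tuple[datetime, datetime]:
    """Anchor a local wall-clock window to `day` in `zone` and convert both ends to UTC."""
    tz = ZoneInfo(zone)
    local_start = datetime.combine(day, start, tzinfo=tz)
    local_end = datetime.combine(day, end, tzinfo=tz)
    if local_end <= local_start:              # window crosses local midnight
        local_end += timedelta(days=1)
    return local_start.astimezone(timezone.utc), local_end.astimezone(timezone.utc)

for d in (date(2026, 3, 7), date(2026, 3, 8), date(2026, 3, 9)):
    lo, hi = project("America/New_York", time(1, 0), time(4, 0), d)
    print(d, f"{lo:%H:%M}-{hi:%H:%M} UTC", hi - lo)
```

The window is 06:00–09:00 UTC on 7 March and 05:00–08:00 UTC on 9 March. On the transition night itself it is 06:00–08:00 UTC, only two hours long, because 02:00–03:00 local does not exist that night.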

The resolver only decides when. The cutover itself uses the postponed-completion pattern so that submission and traffic-switch are decoupled: submit early with the barrier held, then complete only once the clock is inside the intersected window.

# Submit ahead of the window; every shard copies rows and then HOLDS at cutover.
vtctldclient ApplySchema \
  --ddl-strategy "vitess --postpone-completion" \
  --migration-context "orders-idx-2026q3" \
  --sql "ALTER TABLE orders ADD INDEX idx_customer (customer_id)" \
  commerce

import subprocess
from datetime import datetime, timedelta, timezone

SKEW_MARGIN = timedelta(seconds=30)   # clock-skew tolerance at each edge (see edge cases)

def release_if_in_window(keyspace: str, uuid: str, regions: list[str]) -> bool:
    # next_window() is the resolver defined above.
    now = datetime.now(timezone.utc)
    window = next_window(regions, now)
    if window is None:
        raise RuntimeError(f"no common low-traffic window for {regions}; batch per region")
    start, end = window
    if not (start + SKEW_MARGIN <= now <= end - SKEW_MARGIN):
        return False                      # outside the margin-shrunk window: poll again next tick
    # Clock satisfied. This does not check the barrier: call it only after every shard
    # reports it is holding at cutover. `complete` then releases the postponed cutover.
    subprocess.check_call(["vtctldclient", "OnlineDDL", "complete", keyspace, uuid])
    return True
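release_if_in_window only checks the clock; the barrier half (every shard has finished copying and is holding at cutover) still has to be confirmed first. The sketch below makes several assumptions. The per-shard rows have already been read from _vt.schema_migrations and expose its migration_status and ready_to_complete columns. A postponed migration reports running with ready_to_complete set once it is waiting. The ShardRow type and barrier_held function are hypothetical, and the query that fills them is left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ShardRow:
    """Hypothetical projection of one shard's row in _vt.schema_migrations."""
    shard: str
    migration_status: str      # e.g. "running", "complete", "failed"
    ready_to_complete: bool

def barrier_held(rows: list[ShardRow], expected_shards: set[str]) -> bool:
    """True only if every expected shard reports a running migration that is ready to cut over."""
    if {r.shard for r in rows} != expected_shards:
        return False                        # a shard is missing or unexpected: do not release
    return all(r.migration_status == "running" and r.ready_to_complete for r in rows)
```

A scheduler tick would call barrier_held first and only then release_if_in_window. Treating a missing shard as not ready, rather than ignoring it, keeps a shard whose state is unknown from being released blind.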

The window is None branch is the load-bearing one: rather than silently cutting over at a bad time, the scheduler refuses and forces the operator to split the change — for which the barrier mechanics are covered in Coordinating Multi-Shard Schema Migrations.
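The split itself can be sketched as a greedy grouping, assuming each region’s trough has already been projected to a UTC interval for the run’s date (the intervals below are illustrative, not measured, and "UK" is a hypothetical extra region). Regions are sorted by window start, and a new batch opens whenever adding a region would empty the running intersection. Each resulting batch would then get its own migration_context.

```python
from datetime import datetime, timezone

Interval = tuple[datetime, datetime]

def group_into_batches(windows: dict[str, Interval]) -> list[tuple[list[str], Interval]]:
    """Greedily group regions whose UTC windows share a common interval."""
    batches: list[tuple[list[str], Interval]] = []
    for region, (start, end) in sorted(windows.items(), key=lambda kv: kv[1][0]):
        if batches:
            members, (lo, hi) = batches[-1]
            new_lo, new_hi = max(lo, start), min(hi, end)
            if new_lo < new_hi:                       # still a common trough: join this batch
                batches[-1] = (members + [region], (new_lo, new_hi))
                continue
        batches.append(([region], (start, end)))      # no overlap: open a new batch
    return batches

def at(day: int, hour: int) -> datetime:
    """Illustrative helper: an aware UTC datetime in January 2026."""
    return datetime(2026, 1, day, hour, tzinfo=timezone.utc)

example = {
    "APAC": (at(14, 17), at(14, 20)),
    "EMEA": (at(15, 1), at(15, 4)),
    "UK":   (at(15, 2), at(15, 5)),   # hypothetical region
    "AMER": (at(15, 6), at(15, 9)),
}
for members, (lo, hi) in group_into_batches(example):
    print(members, f"{lo:%m-%d %H:%M}", "to", f"{hi:%m-%d %H:%M}", "UTC")
```

With these inputs APAC and AMER each run alone, while EMEA and UK share the 02:00–04:00 UTC slot on 15 January.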

Edge cases and gotchas

DST “spring forward” gaps. A window defined as 02:00–03:00 in a zone that skips from 02:00 to 03:00 on the transition night is a non-existent wall-clock interval. zoneinfo still resolves it, applying the pre-transition offset to the non-existent 02:00 (the default fold=0), so in America/New_York both ends land on the same UTC instant and the usable minutes collapse to zero. Widen the window to 01:00–04:00 so it survives the gap on both boundary nights.
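“Fall back” fold ambiguity. On the autumn transition one local hour occurs twice (01:00–02:00 in America/New_York, 02:00–03:00 in Europe/Berlin). datetime.combine builds wall-clock times with fold=0, so an endpoint inside that hour resolves to its first occurrence, and a window spanning the repeated hour projects onto one more hour of UTC than usual. If your trough is tied to wall-clock traffic patterns, do not assume the window grew that night; budget the repeated hour once.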
Clock skew between primaries. NTP/PTP skew of even a few seconds means “inside the window” is fuzzy at the edges. Apply a tolerance margin (shrink the usable interval by ~30s on each side) so a shard whose clock runs fast does not release the barrier a hair early.
Empty intersection across wide fleets. Any fleet spanning more than ~9–10 hours of UTC offset will routinely produce None, and narrow regional troughs empty out much sooner. Do not “fix” this by relaxing regional windows into peak hours; split into per-region migration_context batches instead, each tracked as its own run.
Window shorter than the cutover cost. The intersection may be 40 minutes, but a metadata-lock wait plus atomic rename on a hot table can exceed that. Verify the empirical cutover duration (from prior runs on the same keyspace) fits inside the interval before releasing; a rename that starts at the tail of the window can finish after traffic has ramped.
Regional blackout collisions. Compliance freezes, batch-billing runs, and holiday peaks are additional exclusions on top of the traffic trough. Subtract them from the per-region window before intersecting, or the scheduler will happily cut over into a payroll batch.
Choosing the strategy changes the window budget. External copy tools generally hold a wider vulnerable window than native VReplication; weigh that against control granularity in Vitess Native Online DDL vs External Tools before sizing the interval.
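Several of these checks compose into one gate before release. The sketch below assumes the intersected window, per-region blackout intervals already projected to UTC, and an empirical cutover duration from prior runs are all available. The 30-second skew margin follows the clock-skew guidance above. The subtract and usable_slot names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

Interval = tuple[datetime, datetime]
SKEW_MARGIN = timedelta(seconds=30)   # shrink each edge to absorb primary clock skew

def subtract(window: Interval, blackouts: list[Interval]) -> list[Interval]:
    """Remove blackout intervals (already in UTC) from a window; return the free slots."""
    free = [window]
    for b_lo, b_hi in blackouts:
        remaining = []
        for lo, hi in free:
            if b_hi <= lo or b_lo >= hi:          # no overlap: keep the slot whole
                remaining.append((lo, hi))
                continue
            if lo < b_lo:
                remaining.append((lo, b_lo))      # the part before the blackout
            if b_hi < hi:
                remaining.append((b_hi, hi))      # the part after the blackout
        free = remaining
    return free

def usable_slot(window: Interval, blackouts: list[Interval],
                cutover_cost: timedelta) -> Optional[Interval]:
    """Earliest free slot that still fits the empirical cutover after the skew margin."""
    for lo, hi in sorted(subtract(window, blackouts)):
        lo, hi = lo + SKEW_MARGIN, hi - SKEW_MARGIN
        if hi - lo >= cutover_cost:
            return lo, hi
    return None
```

Releasing no later than the slot’s end minus cutover_cost keeps the rename from running past the edge of the trough.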

Verification

Confirm the cutover actually landed inside the intended window, not just that it completed. Every shard records its cutover timestamp in _vt.schema_migrations; pull them as UTC and check the spread:

vtctldclient OnlineDDL show --json commerce orders-idx-2026q3 \
  | jq -r '.[] | [.shard, .completed_timestamp] | @tsv'

Every completed_timestamp should fall between the resolved start and end for the run’s date, and the max-minus-min across shards should be small (a few minutes). A shard whose timestamp lands outside the window, or a wide spread, means the barrier released late on one primary, usually because of cutover contention against a long-running transaction; that lock-wait failure mode is diagnosed in resolving gh-ost lock contention in sharded MySQL. Cross-check against the traffic dashboard for each region: QPS at the recorded cutover time should sit in the trough, not on the ramp. The field names in the --json output may differ between Vitess releases, so check the jq path against your version’s output before trusting a blank result.
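Both checks in that paragraph (every timestamp inside the window, a small spread across shards) are easy to script once the shard/timestamp pairs from the command above are parsed into datetimes. A sketch, assuming that parsing has already happened and that five minutes is the acceptable spread; the threshold and the audit_cutover name are illustrative.

```python
from datetime import datetime, timedelta

def audit_cutover(
    completed: dict[str, datetime],
    window: tuple[datetime, datetime],
    max_spread: timedelta = timedelta(minutes=5),
) -> list[str]:
    """Return human-readable problems; an empty list means the cutover landed cleanly."""
    start, end = window
    problems = [
        f"shard {shard} completed at {ts.isoformat()}, outside the window"
        for shard, ts in sorted(completed.items())
        if not (start <= ts <= end)
    ]
    if completed:
        spread = max(completed.values()) - min(completed.values())
        if spread > max_spread:
            problems.append(f"cutover spread across shards is {spread}, above {max_spread}")
    return problems
```

Returning a list of problems rather than a boolean lets the report name the offending shard, which is where the lock-contention investigation starts.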

FAQ

Why not just schedule everything at 00:00 UTC?

Midnight UTC is 08:00 in Singapore, as the APAC business day starts, and 19:00 or 20:00 (depending on DST) in New York, during the AMER evening; neither is a trough for those regions. A fixed UTC time only works if every target shard is in roughly the same zone; the whole point of the intersection is that it is not.
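The arithmetic behind that answer can be checked quickly with the same IANA zones the resolver uses (Europe/Berlin stands in for Frankfurt, which has no zone of its own). The local_times_at name is illustrative.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

ZONES = ("Asia/Singapore", "Europe/Berlin", "America/New_York")

def local_times_at(instant: datetime) -> dict[str, str]:
    """Wall-clock weekday and time in each zone at a single UTC instant."""
    return {z: instant.astimezone(ZoneInfo(z)).strftime("%a %H:%M") for z in ZONES}

for month in (1, 7):
    midnight = datetime(2026, month, 15, tzinfo=timezone.utc)
    print(midnight.date(), local_times_at(midnight))
```

In January this gives 08:00 in Singapore, 01:00 in Berlin and 19:00 the previous evening in New York; in July Berlin moves to 02:00 and New York to 20:00. Only EMEA lands in its overnight window.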

Can I let each shard cut over in its own local window instead of a shared one?

Only if the change is genuinely independent per region. A barrier-gated migration keeps the VTGate routing layer consistent by swapping all shards in a narrow common window; releasing shards on their own local clocks means query routing can observe a mix of old and new table definitions for hours.

How do I keep the tz database current?

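The offsets come from whatever tz data zoneinfo finds at runtime: the OS tz database, or the first-party tzdata package from PyPI when no system data is available. DST rule changes (which governments alter with little notice) therefore only apply once the scheduler host’s tzdata is patched. Treat tzdata as a monitored dependency on the orchestrator image.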
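One cheap guard for that dependency is a start-up check that every configured zone resolves against the tz data the scheduler will actually use. A sketch, assuming the zone names come from a mapping like REGION_WINDOWS; the invalid_zones helper is hypothetical. It would, for instance, reject Europe/Frankfurt, which is not an IANA zone.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError, available_timezones

def invalid_zones(zone_names: list[str]) -> list[str]:
    """Return configured zone names that the local tz data cannot resolve."""
    bad = []
    for name in zone_names:
        try:
            ZoneInfo(name)
        except (ZoneInfoNotFoundError, ValueError):   # unknown key, or a malformed key
            bad.append(name)
    return bad

# A count of zero here means zoneinfo found no tz data at all on this image.
print(len(available_timezones()), "zones visible to zoneinfo")
print(invalid_zones(["Asia/Singapore", "Europe/Berlin", "America/New_York", "Europe/Frankfurt"]))
```

Failing the orchestrator’s start-up on a non-empty result surfaces a typo or a stripped tzdata package before any window is computed.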

Related

Tracking Migration Progress and State Machines — the barrier and per-shard state the window scheduler gates on.
Coordinating Multi-Shard Schema Migrations — holding and releasing the cutover barrier across a keyspace, and splitting into per-region batches.
Configuring VTTablet for High Availability — the per-shard primaries whose local clocks and headroom bound the window.

For DST and fold semantics see the Python zoneinfo documentation; for the native cutover mechanics the window gates, see the Vitess Online DDL reference.
