Service-Layer Performance Budgets¶
Scope¶
This document defines performance budgets for the heavy service-layer sync and job paths that already have explicit runtime ownership, locks or tracked-operation boundaries.
The budgets are intentionally practical:
targetis the normal operating envelope;alertmeans the path is degrading and should page/emit an operational signal;hardis the maximum tolerated runtime before the path must be split, chunked or rescheduled.
For TaskIQ jobs, the hard budget must never exceed the effective Redis lock timeout for the same path.
Global Rules¶
- A service/job path that is forecast to exceed its hard budget must be chunked or delegated to a longer-running workflow, not left inline.
- Alert budget breaches should produce telemetry before lock timeouts are exhausted.
- Long-running sweeps must preserve restartability and partial-progress safety.
- Budget reviews happen when a path changes batching strategy, upstream provider mix or fan-out behavior.
Budget Matrix¶
| Domain / path | Target | Alert | Hard | Operational note |
|---|---|---|---|---|
market_data coin history sync (run_coin_history_job, keyed coin sync) |
5m |
15m |
30m |
Hard budget aligns with COIN_HISTORY_LOCK_TIMEOUT_SECONDS; if a single coin needs longer, split backfill windows. |
market_data latest history refresh sweep |
5m |
10m |
15m |
Must stay inside HISTORY_REFRESH_LOCK_TIMEOUT_SECONDS; provider stalls should fall back to later refresh rather than hold the global sweep lock. |
market_data history backfill sweep |
15m |
45m |
60m |
Uses the longest global lock; wide backfills should chunk by symbol groups and resume safely. |
market_structure single-source poll |
30s |
90s |
120s |
Must fit the per-source poll lock. |
market_structure enabled-source poll sweep |
2m |
4m |
5m |
Sweep must not monopolize the global enabled-poll lock. |
market_structure source health refresh |
30s |
2m |
3m |
Health refresh is lightweight; sustained overruns indicate source-count growth or hidden IO drift. |
news single-source poll |
30s |
90s |
120s |
Bound by per-source poll lock and upstream provider responsiveness. |
news enabled-source poll sweep |
2m |
4m |
5m |
Cursor-driven sweeps should remain incremental; oversized source sets must shard rather than extend the lock. |
portfolio balance sync |
1m |
3m |
4m |
Hard budget equals PORTFOLIO_SYNC_LOCK_TIMEOUT_SECONDS; exchange fan-out must stay bounded. |
predictions pending-evaluation sweep |
1m |
4m |
5m |
If evaluation volume grows, batch windows must split rather than hold the singleton evaluation lock. |
anomalies enrichment / sector / market-structure scans |
1m |
4m |
5m |
Keyed scans are safe to rerun; long anomaly batches should partition by trigger tuple. |
hypothesis_engine evaluation sweep |
1m |
4m |
5m |
Operation-tracked evaluation should surface long-running status before the singleton lock is exhausted. |
patterns bootstrap / statistics / market-structure refresh |
30m |
90m |
120m |
These are intentionally heavyweight analytics jobs; over-budget runs must checkpoint and resume instead of expanding lock time. |
patterns discovery / strategy discovery |
60m |
180m |
240m |
Discovery is the longest-running path; hard budget aligns with the existing 4h locks and should not grow further without repartitioning. |
Review Checklist¶
- Does the path have a target/alert/hard budget recorded here?
- Is the hard budget still below or equal to the real runtime lock/operation boundary?
- If the path exceeded alert budget, was the cause batching, provider latency or hidden cross-domain fan-out?
- If the path approaches hard budget, can it be chunked or resumed without violating ADR 0010 and ADR 0014?