# `pit_rules.md` — Point-in-Time Semantics Contract

**Version:** 0.1.0
**Status:** `stable`
**Scope:** defines the bitemporal contract the whole system depends on. Every table, every API endpoint, every backtest built on top inherits these rules. Breaking any of them invalidates downstream strategies.

## 1. Timestamps

Three timestamps per filing, all UTC, stored on `filings` and, denormalized, on `standardized_facts_pit`.

| Field | Who sets it | Authoritative? | Use |
|---|---|---|---|
| `filed_at_utc` | Filer, in the filing header | **No.** Filer-controlled. | Diagnostic only. |
| `accepted_at_utc` | EDGAR on receipt | **Yes — primary PIT timestamp.** | Strict bitemporal selection; "what did EDGAR know?" |
| `market_available_at` | Derived per §2 | **Yes — strategy-usable timestamp.** | Backtester-safe selection; "what could a strategy act on?" |

Both authoritative timestamps are stored. The caller picks which one drives the PIT cutoff via the `availability` query parameter.

## 2. Market availability policy

Inputs: `accepted_at_utc` (UTC), NYSE session calendar.
Output: `market_available_at` (UTC).

```
Convert accepted_at_utc → America/New_York; call the result accepted_at_et.
Let d  = accepted_at_et.date()
Let t  = accepted_at_et.time()

If d is a NYSE session day:
    If 09:30 ≤ t ≤ 16:00   → market_available_at = accepted_at_utc     (intraday)
    If t < 09:30           → market_available_at = 09:30 ET on d        (pre-market)
    If t > 16:00           → market_available_at = 09:30 ET on next session day  (after close)
Else (weekend / holiday)   → market_available_at = 09:30 ET on next session day

Every "09:30 ET" result is converted back to UTC before it is stored.
```

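NYSE calendar: weekends excluded, plus exchange holidays per `holidays.NYSE()` (the `holidays` Python package's NYSE calendar). These are exchange holidays, not federal ones; for example, NYSE closes on Good Friday. Half-days are treated as full sessions for availability purposes: the 16:00 ET boundary still applies, so a filing accepted before 13:00 ET on a 13:00-close day gets `market_available_at = accepted_at_utc`, and so does one accepted between 13:00 and 16:00 ET.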

**Implementation:** `pitdata/services/availability.py::compute_market_available_at`. Pinned by tests in `tests/test_smoke.py` and `tests/conftest.py` fixtures.
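The policy above can be sketched with the standard library alone. This is a minimal illustration, not the code in `pitdata/services/availability.py`: the function names are hypothetical, and the holiday calendar is passed in as any container that supports `in` for a `date` (a plain set in the tests, or a `holidays.NYSE()` instance in the real setup) so the sketch runs without the `holidays` dependency. Early closes are deliberately ignored, matching the half-day rule.

```python
from collections.abc import Container
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")
SESSION_OPEN = time(9, 30)
SESSION_CLOSE = time(16, 0)  # half-days use the same boundary (§2)


def is_session_day(d: date, exchange_holidays: Container[date]) -> bool:
    # A session day is a weekday that is not an exchange holiday.
    return d.weekday() < 5 and d not in exchange_holidays


def next_session_day(d: date, exchange_holidays: Container[date]) -> date:
    # First session day strictly after d.
    d += timedelta(days=1)
    while not is_session_day(d, exchange_holidays):
        d += timedelta(days=1)
    return d


def open_utc(d: date) -> datetime:
    # 09:30 America/New_York on d, converted to UTC (zoneinfo handles EST/EDT).
    return datetime.combine(d, SESSION_OPEN, tzinfo=NY).astimezone(timezone.utc)


def market_available_at(accepted_at_utc: datetime,
                        exchange_holidays: Container[date]) -> datetime:
    if accepted_at_utc.tzinfo is None:
        # Refuse to guess the zone of a naive timestamp.
        raise ValueError("accepted_at_utc must be timezone-aware")
    accepted_at_utc = accepted_at_utc.astimezone(timezone.utc)
    accepted_at_et = accepted_at_utc.astimezone(NY)
    d, t = accepted_at_et.date(), accepted_at_et.time()
    if is_session_day(d, exchange_holidays):
        if SESSION_OPEN <= t <= SESSION_CLOSE:
            return accepted_at_utc  # intraday
        if t < SESSION_OPEN:
            return open_utc(d)  # pre-market: open of the same day
    # After the close, on a weekend, or on a holiday: next session's open.
    return open_utc(next_session_day(d, exchange_holidays))
```

Because the result is always either `accepted_at_utc` itself or a later session open, invariant 1 in §7 (`market_available_at ≥ accepted_at_utc`) holds by construction.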

### 2.1 Edge cases and how they are handled

| Case | Behavior |
|---|---|
| Filing accepted 09:28 ET Monday | `market_available_at = 09:30 ET Monday`. |
| Filing accepted 20:00 ET Friday | `market_available_at = 09:30 ET next session Monday (or Tuesday if Monday is a holiday)`. |
| Filing accepted 11:00 ET on July 4 (holiday) | `market_available_at = 09:30 ET next session day`. |
| Filing accepted 10:00 ET on a half-day (close 13:00) | Intraday → `market_available_at = accepted_at_utc`. |
| Missing `accepted_at_utc` | Filing is **rejected** at ingestion time, not defaulted. PIT correctness cannot survive a guessed acceptance time. |
| Pre-market override | Not supported in v1. Strategies that trade pre-market must use `availability=accepted` explicitly and understand the trade-off. |

### 2.2 Non-overridable

`market_available_at` is a deterministic function of `accepted_at_utc`. It is never overwritten, never backfilled differently for the same filing, and never diverges between the `filings` row and its child `standardized_facts_pit` rows.

## 3. The PIT selection query

This is the canonical query. Every API endpoint that returns a fact executes some form of it.

```sql
-- "What did we know about CIK = :cik, concept = :concept,
--  fiscal_year = :fy, fiscal_period = :fp as of :as_of, for a strategy
--  trading under :availability rules?"
SELECT *
FROM standardized_facts_pit
WHERE cik           = :cik
  AND concept       = :concept
  AND fiscal_year   = :fy
  AND fiscal_period = :fp
  AND {availability_col} <= :as_of
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY cik, concept, period_end, fiscal_period
  ORDER BY {availability_col} DESC, accepted_at_utc DESC, pit_fact_id DESC
) = 1;
```

`{availability_col}` is `market_available_at` when `availability = market` (default) and `accepted_at_utc` when `availability = accepted`. The secondary sort on `accepted_at_utc DESC` and the deterministic tiebreaker on `pit_fact_id DESC` guarantee reproducibility when two facts share a timestamp.

**Partition key is `(cik, concept, period_end, fiscal_period)`, not `(cik, concept, fiscal_year, fiscal_period)`.** Period end is more stable across amendments and fiscal-calendar oddities.

**Implementation:** `pitdata/repositories/fundamentals.py::get_fact` (with a thin `get_latest_fact` back-compat shim).
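`{availability_col}` is a column name, so unlike `:cik` or `:as_of` it cannot be passed as a bound parameter. A minimal sketch of rendering the template safely, assuming a hypothetical `build_pit_query` helper that looks identifiers up in fixed whitelists instead of interpolating caller input. It also covers the `versioning` switch from §3.2 and assumes `versioning = original` flips all three sort keys to ASC; §3.1 states this for the availability column and for `pit_fact_id`.

```python
# Hypothetical helper; only the whitelisted values below can reach the SQL text.
AVAILABILITY_COLUMNS = {
    "market": "market_available_at",  # default
    "accepted": "accepted_at_utc",
}
SORT_DIRECTIONS = {
    "latest": "DESC",   # default: restatement-aware
    "original": "ASC",  # as-filed (assumed to flip every sort key)
}


def build_pit_query(availability: str = "market", versioning: str = "latest") -> str:
    # Identifiers and sort keywords cannot be bound like :cik or :as_of, so
    # they come from fixed lookups; anything else is rejected, never interpolated.
    if availability not in AVAILABILITY_COLUMNS:
        raise ValueError(f"unsupported availability: {availability!r}")
    if versioning not in SORT_DIRECTIONS:
        raise ValueError(f"unsupported versioning: {versioning!r}")
    col = AVAILABILITY_COLUMNS[availability]
    d = SORT_DIRECTIONS[versioning]
    return f"""
SELECT *
FROM standardized_facts_pit
WHERE cik           = :cik
  AND concept       = :concept
  AND fiscal_year   = :fy
  AND fiscal_period = :fp
  AND {col} <= :as_of
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY cik, concept, period_end, fiscal_period
  ORDER BY {col} {d}, accepted_at_utc {d}, pit_fact_id {d}
) = 1""".strip()
```

The named parameters (`:cik`, `:as_of` and the rest) are still bound by the database driver at execution time. `QUALIFY` requires an engine that supports it, such as DuckDB, Snowflake, or BigQuery.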

### 3.1 Semantics guarantees

1. **No look-ahead.** A fact is returned only if its value in the chosen availability column is `≤ as_of`. Equality is allowed.
2. **Amendments win by default.** When `versioning = latest` (default), the partition is ordered DESC on the availability column, so `/A` filings with a later `accepted_at_utc` (and therefore a `market_available_at` that is no earlier) shadow the original. The caller can inspect `is_restated` and `supersedes_pit_fact_id` to see lineage.
3. **Original is still reachable two ways.** Either query with an `as_of` before the amendment's availability timestamp (its `market_available_at`, or its `accepted_at_utc` under `availability = accepted`), or pass `versioning = original`, which swaps the sort to ASC and returns the earliest accepted fact whose availability is `≤ as_of`. Restatements never rewrite history; both rows live in `standardized_facts_pit` forever.
4. **Deterministic tiebreak.** If two filings somehow land at the same timestamp, `pit_fact_id DESC` (or `ASC` in `original` mode) picks the same row every run. Ties are rare but must be reproducible.

### 3.2 Versioning modes

| `versioning` | Sort on availability | What it returns |
|---|---|---|
| `latest` (default) | DESC | Restatement-aware current best knowledge. Correct default for research. |
| `original` | ASC | As-filed value before any 10-K/A or 10-Q/A rewrite. Use for strict as-filed backtests that reject hindsight. |

Both modes honor the same `availability <= as_of` filter; `versioning` only changes which end of the surviving timeline is chosen.
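The guarantees and both versioning modes can be checked against a small in-memory model of one partition. This is a hypothetical sketch (`PitFact` and `select_pit_fact` are illustrative names, not the repository API) that assumes its input already holds a single `(cik, concept, period_end, fiscal_period)` group and reuses the SQL sort keys.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional

# Maps the `availability` parameter to the field that drives the cutoff.
AVAILABILITY_FIELDS = {"market": "market_available_at", "accepted": "accepted_at_utc"}


@dataclass(frozen=True)
class PitFact:
    pit_fact_id: str
    accepted_at_utc: datetime
    market_available_at: datetime
    value: float


def select_pit_fact(group: Iterable[PitFact], as_of: datetime,
                    availability: str = "market",
                    versioning: str = "latest") -> Optional[PitFact]:
    if availability not in AVAILABILITY_FIELDS:
        raise ValueError(f"unknown availability: {availability!r}")
    if versioning not in ("latest", "original"):
        raise ValueError(f"unknown versioning: {versioning!r}")
    field = AVAILABILITY_FIELDS[availability]
    # Guarantee 1 (no look-ahead): keep only facts available at or before as_of.
    visible = [f for f in group if getattr(f, field) <= as_of]
    if not visible:
        return None

    # Same sort keys as the SQL: availability, then acceptance, then id.
    def sort_key(f: PitFact):
        return (getattr(f, field), f.accepted_at_utc, f.pit_fact_id)

    if versioning == "latest":
        return max(visible, key=sort_key)  # later amendments shadow originals
    return min(visible, key=sort_key)      # earliest as-filed row
```

Choosing `max` or `min` over the full key tuple is equivalent to taking row 1 of the DESC or ASC window, including the deterministic `pit_fact_id` tiebreak.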

## 4. Amendment and restatement lineage

Each standardized fact carries `supersedes_pit_fact_id`. The link is populated by `resolve_supersessions` (`pitdata/db/store.py`), which runs after every ingest and once during `ensure_seeded_database`.

**Algorithm.** For each `(cik, concept, period_end, fiscal_period)` group, order facts by `(accepted_at_utc ASC, pit_fact_id ASC)`. For row `n > 1`, set `supersedes_pit_fact_id = pit_fact_id_of_row(n-1)` and `is_restated = TRUE`. Row 1 has `NULL` and `FALSE`.
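The linking rule can be sketched in memory as follows. This is an illustration under assumptions, not the `resolve_supersessions` in `pitdata/db/store.py`: the `StdFact` record is hypothetical and carries only the fields the algorithm reads or writes. Because every link is recomputed from the ordering, running it again after another ingest yields the same result.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass
class StdFact:  # hypothetical subset of a standardized_facts_pit row
    pit_fact_id: str
    cik: str
    concept: str
    period_end: date
    fiscal_period: str
    accepted_at_utc: datetime
    supersedes_pit_fact_id: Optional[str] = None
    is_restated: bool = False


def resolve_supersessions(facts: list[StdFact]) -> None:
    """Set supersession links in place, one chain per group."""
    groups: dict[tuple, list[StdFact]] = defaultdict(list)
    for f in facts:
        groups[(f.cik, f.concept, f.period_end, f.fiscal_period)].append(f)
    for group in groups.values():
        # Row order: accepted_at_utc ASC, then pit_fact_id ASC as tiebreak.
        group.sort(key=lambda f: (f.accepted_at_utc, f.pit_fact_id))
        prev: Optional[StdFact] = None
        for f in group:
            # Row 1 gets NULL/FALSE; row n points at row n-1.
            f.supersedes_pit_fact_id = prev.pit_fact_id if prev else None
            f.is_restated = prev is not None
            prev = f
```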

**Consequences.**

- An original filing's PIT row is never updated or deleted when the amendment arrives. It is shadowed at query time by the later row, but directly queryable.
- Each `pit_fact_id` supersedes at most one prior `pit_fact_id` (row 1 of a group supersedes none). Multi-level restatements chain: `A ← B ← C`. Callers asking "what was the original?" walk `supersedes_pit_fact_id` recursively.
- Supersession is group-scoped, not form-scoped. A 10-K/A might restate Q1 facts that were reported in the 10-Q; both groups get their own chain.
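A caller-side sketch of the recursive walk, assuming the `pit_fact_id → supersedes_pit_fact_id` links have already been loaded into a mapping. `lineage` and `original_of` are hypothetical helpers; the cycle check is a defensive addition, since §7 invariant 2 means a well-formed store never contains one.

```python
from collections.abc import Mapping
from typing import Optional


def lineage(pit_fact_id: str, supersedes: Mapping[str, Optional[str]]) -> list[str]:
    """Return [pit_fact_id, its predecessor, ..., the original], newest first."""
    chain = [pit_fact_id]
    # Follow supersedes_pit_fact_id until a row with no predecessor is reached.
    while (prior := supersedes.get(chain[-1])) is not None:
        if prior in chain:
            raise ValueError(f"supersession cycle through {prior!r}")
        chain.append(prior)
    return chain


def original_of(pit_fact_id: str, supersedes: Mapping[str, Optional[str]]) -> str:
    # The as-filed row is the last element of the chain.
    return lineage(pit_fact_id, supersedes)[-1]
```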

## 5. Period semantics

The API surfaces two families of periods.

### 5.1 Reported periods

Appear directly in the filing. No arithmetic beyond mapping.

| `period` value | Meaning | Primary source form | Forms that typically carry it |
|---|---|---|---|
| `FY` | Fiscal year | 10-K | 10-K, 10-K/A, 20-F, 40-F |
| `Q1`..`Q4` | Quarter | 10-Q; 10-K (Q4 only) | 10-Q, 10-Q/A, 10-K (Q4 only) |
| `YTD` | Year-to-date as of `period_end` | 10-Q | 10-Q |

The DEI tag `dei:DocumentFiscalPeriodFocus` drives this value (`FY`, `Q1`, `Q2`, `Q3`). Q4 is implied by a 10-K. `YTD` is emitted when the raw fact's duration spans more than one quarter in a 10-Q (e.g., the 9-month duration on a Q3 10-Q).
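A sketch of that mapping, under stated assumptions: `reported_period` is a hypothetical name, and the 120-day threshold for "more than one quarter" is an assumption (13- and 14-week quarters run 91 to 98 days, six-month year-to-date spans about 182). Deriving Q4 from a 10-K is not modeled here.

```python
from datetime import date
from typing import Optional

# Assumption: a 10-Q duration longer than this spans more than one quarter.
MAX_QUARTER_DAYS = 120


def reported_period(form_type: str, dei_fiscal_period_focus: str,
                    period_start: Optional[date], period_end: date) -> str:
    """Map a raw fact to FY / Q1..Q3 / YTD; period_start is None for instants."""
    is_10q = form_type in ("10-Q", "10-Q/A")
    if is_10q and period_start is not None:
        if (period_end - period_start).days > MAX_QUARTER_DAYS:
            return "YTD"  # e.g. the 9-month duration on a Q3 10-Q
    # Otherwise the DEI focus (FY, Q1, Q2, Q3) is the reported period.
    return dei_fiscal_period_focus
```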

### 5.2 Derived periods

Computed values, stored in `derived_facts_pit` (reserved) or computed at query time; either way, the response **must** advertise the derivation.

| `period` value | Definition | Derivation |
|---|---|---|
| `TTM` | Trailing twelve months | Duration concepts: `sum_of_quarters` over the four latest non-overlapping quarters. Instant (balance-sheet) concepts: the latest instant, advertised under its own `derivationMethod` (see §5.3). |

**Contract:** `/v1/fundamentals` **never** silently returns a derived value. A caller requesting `period=TTM` gets a response that includes:

```json
{
  "derivationMethod": "sum_of_quarters",
  "sourcePitFactIds": ["pf_...", "pf_...", "pf_...", "pf_..."]
}
```

If the required operands are not all available as of `as_of`, the response omits the fact rather than returning a partial TTM.
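A sketch of the duration-concept TTM path, assuming each input quarter has already been PIT-resolved (one row per quarter, chosen as in §3) and carries the availability timestamp selected by the `availability` parameter. `QuarterFact`, `ttm_sum_of_quarters`, and the `value` key are hypothetical. The 371-day span check (a 53-week fiscal year) is an added assumption so that a missing quarter causes the fact to be omitted rather than summed across more than twelve months. Instant concepts are not covered.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

MAX_TTM_SPAN_DAYS = 371  # assumption: longest fiscal year (53 weeks)


@dataclass(frozen=True)
class QuarterFact:
    pit_fact_id: str
    period_start: date
    period_end: date
    value: float
    available_at: datetime  # column chosen by `availability`


def ttm_sum_of_quarters(quarters: list[QuarterFact], as_of: datetime) -> Optional[dict]:
    # Only quarters visible at as_of may be operands (no look-ahead).
    visible = sorted((q for q in quarters if q.available_at <= as_of),
                     key=lambda q: q.period_end, reverse=True)
    chosen: list[QuarterFact] = []
    for q in visible:
        # Newest first; skip anything overlapping the oldest quarter chosen so far.
        if chosen and q.period_end >= chosen[-1].period_start:
            continue
        chosen.append(q)
        if len(chosen) == 4:
            break
    if len(chosen) < 4:
        return None  # operands missing: omit the fact, never a partial TTM
    span_days = (chosen[0].period_end - chosen[-1].period_start).days + 1
    if span_days > MAX_TTM_SPAN_DAYS:
        return None  # a gap in the quarters; refuse to bridge it
    return {
        "value": sum(q.value for q in chosen),
        "derivationMethod": "sum_of_quarters",
        "sourcePitFactIds": [q.pit_fact_id for q in chosen],
    }
```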

### 5.3 Period-type guardrails

- `period_type` on the raw and standardized rows is `duration` or `instant`. Mapping the wrong type to a concept is a hard error: flow-concept (`Revenue`, `NetIncome`) rows must be `duration`; stock-concept (`TotalAssets`, `CashAndEquivalents`) rows must be `instant`. Enforced by `FactMapping.period_type` during standardization.
- TTM for instants returns the **latest** instant `≤ as_of`, not a sum. TTM for durations returns the sum of the trailing four non-overlapping quarters. The `derivationMethod` string disambiguates.
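A minimal sketch of the first guardrail. `FactMapping` mirrors the name from §5.3, but the record shape, the `MAPPINGS` table, `PeriodTypeMismatch`, and `check_period_type` are assumptions for illustration; only the rule (flow concepts are `duration`, stock concepts are `instant`, a mismatch is a hard error) comes from the text above.

```python
from dataclasses import dataclass
from typing import Literal

PeriodType = Literal["duration", "instant"]


@dataclass(frozen=True)
class FactMapping:  # hypothetical shape; only period_type is named in §5.3
    concept: str
    period_type: PeriodType


MAPPINGS = {m.concept: m for m in (
    FactMapping("Revenue", "duration"),
    FactMapping("NetIncome", "duration"),
    FactMapping("TotalAssets", "instant"),
    FactMapping("CashAndEquivalents", "instant"),
)}


class PeriodTypeMismatch(ValueError):
    """Hard error: a raw fact's period type disagrees with its concept."""


def check_period_type(concept: str, raw_period_type: str) -> None:
    mapping = MAPPINGS.get(concept)
    if mapping is None:
        raise ValueError(f"no mapping for concept {concept!r}")
    if raw_period_type != mapping.period_type:
        raise PeriodTypeMismatch(
            f"{concept} expects {mapping.period_type}, got {raw_period_type}")
```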

## 6. Fiscal calendar

`fiscal_year` and `fiscal_period` come from DEI tags (`dei:DocumentFiscalYearFocus`, `dei:DocumentFiscalPeriodFocus`), with a fallback to `period_end` only when both DEI tags are absent. They are **never** derived from `report_date + form_type` alone; that approach misclassifies non-calendar-year filers (e.g., Apple, whose fiscal year ends in late September).

Both `fiscal_year`/`fiscal_period` and `period_end` are retained so the caller can ask either "FY2023" or "as of calendar date X".

## 7. Invariants (enforced in code and tests)

1. `market_available_at ≥ accepted_at_utc` for every filing.
2. For every `(cik, concept, period_end, fiscal_period)` group ordered by `accepted_at_utc ASC`, the `supersedes_pit_fact_id` column is `NULL` on row 1 and equals the prior row's `pit_fact_id` on all others.
3. The `/v1/fundamentals` default response contains no derived values.
4. `period_type` of every standardized fact matches its concept's declared `period_type`.
5. Resolving ticker → CIK uses `ticker_history` filtered by `as_of`. Querying fact tables by ticker directly is forbidden.
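Invariants 1 and 2 lend themselves to a row-level check. The sketch below is hypothetical (the row types, the `accession` field, and `invariant_violations` are illustrative, not the project's test code) and breaks `accepted_at_utc` ties on `pit_fact_id`, matching the §4 ordering.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Iterable, NamedTuple, Optional


class FilingRow(NamedTuple):
    accession: str  # hypothetical filing identifier
    accepted_at_utc: datetime
    market_available_at: datetime


class FactRow(NamedTuple):
    pit_fact_id: str
    cik: str
    concept: str
    period_end: date
    fiscal_period: str
    accepted_at_utc: datetime
    supersedes_pit_fact_id: Optional[str]


def invariant_violations(filings: Iterable[FilingRow],
                         facts: Iterable[FactRow]) -> list[str]:
    problems: list[str] = []
    # Invariant 1: availability never precedes acceptance.
    for f in filings:
        if f.market_available_at < f.accepted_at_utc:
            problems.append(f"inv1: {f.accession} available before accepted")
    # Invariant 2: each group's links follow acceptance order.
    groups: dict[tuple, list[FactRow]] = defaultdict(list)
    for r in facts:
        groups[(r.cik, r.concept, r.period_end, r.fiscal_period)].append(r)
    for rows in groups.values():
        rows.sort(key=lambda r: (r.accepted_at_utc, r.pit_fact_id))
        expected_prev: Optional[str] = None
        for r in rows:
            if r.supersedes_pit_fact_id != expected_prev:
                problems.append(f"inv2: {r.pit_fact_id} supersedes "
                                f"{r.supersedes_pit_fact_id!r}, expected {expected_prev!r}")
            expected_prev = r.pit_fact_id
    return problems
```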

## 8. What happens when we get it wrong

A PIT violation is a **blocking quality bug** — it invalidates backtests built on the surface that produced it. Treatment:

- Revert the offending commit.
- Identify the `(cik, concept, period_end)` range affected.
- Add a regression test that would have caught it.
- Re-parse affected filings from bronze.

No partial re-parses are shipped; the store is dropped and rebuilt from the raw (bronze) layer until the invariant holds everywhere.

## 9. Change log

| Version | Date | Change |
|---|---|---|
| 0.1.0 | 2026-04-24 | Initial; codifies `compute_market_available_at`, PIT selection, DEI-sourced fiscal identity, and `resolve_supersessions` semantics as implemented and tested. |
