top-100 methodology + provenance review v1

thesis -> mechanism -> lever -> one step

thesis: we already have more than a list of biomarkers; there is a multi-layered selection logic. The problem is not a lack of thinking but that provenance is spread across different artifacts and is not equally strong everywhere.
mechanism: separate 4 layers of decision provenance (guideline/evidence, expert/ops, founder/cohort fit, health-system fit), then check which of them actually made it into v1 and what changed in v2.
lever: this gives an honest picture of where we stand on firm official anchors, where on working product logic, and where we carry explicit research debt.
one step: use this review as board prep before top100_biomarker_scorecard_v2.csv and top100_biomarker_board_decisions_v2.md.

1) executive read

what we already had

v1 was built as core-70 + gated-30 for a mostly healthy preventive cohort.
the logic base was sound:
hard gates
score model
retest discipline
preference for reproducibility over prestige

what became clear on review

for Health and the high-stress founder cohort, v1 was too generic.
price in the scoring made it hard to honestly promote useful but more complex markers.
the endocrine / recovery / HPA / vascular mechanics / skin-hair layers were under-specified.
part of v2 currently rests on strong product/system logic but is not yet fully covered by official-source provenance.

what came out of it

v2 is now fixed as a separate canon:
ai/data/biomarker_top100_selection_framework_v2.md
ai/data/biomarker_top100_rerun_protocol_v2.md
ai/data/system_age_contribution_model_v2.md
exact top-100 now refers only to orderable lab biomarkers.
derivations and companion modules are tracked separately.
the hpa best-read lock is currently:
cortisol_am
cortisol_pm
spot_urinary_cortisol_creatinine
urinary_metanephrines_fractionated
urinary_3_methoxytyramine
24h urinary free cortisol is kept only as a fallback / confirmatory path.

2) where the current logic actually came from

A. guideline / official evidence layer

the main anchors live in:
- ai/data/sources.md
- ai/data/panel_evidence_matrix_v1.csv

what is already strong there:
- lipids / apoB / Lp(a) / CAC:
  - C-101, C-105, C-122, C-131
- dysglycemia / glucose / HbA1c / OGTT:
  - C-102
- kidney / eGFR / albuminuria:
  - C-103, C-104
- thyroid caution / repeat logic:
  - C-118
- vitamin D caution:
  - C-119
- iron / ferritin:
  - C-120
- liver fibrosis gate (FIB-4):
  - C-121
- imaging boundaries:
  - C-107, C-108, C-123, C-124, C-125, C-126, C-127, C-128
- vascular mechanics / PWV:
  - C-133, C-134
- VO2max:
  - C-129

B. expert / operational layer

main anchors:
- ai/context/stanislav_skakun_biodata_call_2026_02_28_v1.md
- ai/data/first_panel_decision_packet_v1.md

what came from there:
- phased MVP instead of full-checkup from day 1
- baseline anchor:
  - cbc
  - expanded biochemistry
  - hs-crp
  - insulin / hba1c / glucose context
  - tsh
  - sex hormone block
- ultrasound-first and partner MRI/endoscopy
- reserve sample / reflex logic
- cIMT as adjunct, not standalone trigger

C. founder / cohort-fit layer

main anchor:
- ai/context/founder_input_2026_02_22_25_v1.md

what came from there:
- target cohort = high-agency founders / operators
- raw data + API + provenance > black-box score
- core biomarkers ~100-130
- skin baseline + ecg/vitals + optional dexa/cgm
- "money is not primary"
- machine-readable longitudinal self-use as filter

D. health-system fit layer

main anchor:
- ai/data/top100_blood_urine_health_audit_v1.md

what came from there:
- the generic top-100 must be checked against the real Health truth-layer
- the urine block turned out to be the main factual debt
- igf1 surfaced as a serious add-candidate
- the generic design was useful but did not fully cover the endocrine/adrenal read

E. equipment / companion-module doctrine

anchors:
- ai/data/mvp_equipment_philosophy_v1.md
- ai/data/vascular_intelligence_layer_v1.md
- ai/data/vascular_intelligence_ops_v1.md

what came from there:
- PWV как core vascular mechanics signal
- skin / dermoscopy / face / hair as machine-readable longitudinal companion layer
- hair as real endocrine / stress / nutrient-response surface, not cosmetic garnish

3) methodology evolution: v1 -> v2

| layer | v1 | v2 |
| --- | --- | --- |
| cohort | generic preventive mostly healthy | high-stress / high-travel founders, men + women |
| score | `bps v1` included `cost_efficiency` | `bps v2` removed price |
| count | biomarkers + derivations + non-lab items were too mixed | exact count = only orderable lab biomarkers |
| system-age | present but lighter endocrine weight | endocrine / recovery materially upgraded |
| hpa read | too flat / under-specified | dynamic read closed explicitly |
| vascular | blood + imaging boundary present | blood + `PWV` + carotid + CAC tied into one loop |
| skin/hair | present in equipment doctrine | now treated as required companion layer |

4) provenance map by cluster

| cluster | why it is in the model | primary provenance | status |
| --- | --- | --- | --- |
| cbc / hematology | baseline oxygen / marrow / inflammation context | framework logic + evidence matrix, but no clean hematology claim ids yet | `medium / gap` |
| cmp / chemistry / liver / electrolytes | intervention safety + organ context | evidence matrix, but chemistry anchors still partly `gap` | `medium / gap` |
| glycemic core | strongest preventive longitudinal loop | `C-102` + framework + system-age model | `strong` |
| kidney / urine / uacr / cystatin | silent drift + serum-only miss protection | `C-103`, `C-104` + audit | `strong` |
| lipids / apoB / Lp(a) / CAC | causal vascular burden + reclassification | `C-101`, `C-105`, `C-122`, `C-131` | `strong` |
| thyroid | baseline sanity + repeat caution | `C-118` | `strong for caution`, `medium for broad default use` |
| iron / ferritin / TSAT | common false-action zone; needs context | `C-120` + framework logic | `medium-strong` |
| vitamin D | common panel item but not strong default-screening evidence | `C-119` | `cautionary strong` |
| liver fibrosis gate | cheap escalation logic | `C-121` | `strong` |
| imaging exclusions | avoid prestige/incidental theater | `C-107`, `C-108`, `C-123`, `C-124`, `C-125`, `C-126`, `C-127`, `C-128` | `strong` |
| VO2max | powerful fitness/mortality signal | `C-129` + expert/founder tension | `strong signal`, `operationally mixed` |
| vascular mechanics / PWV | closes vascular age loop beyond lipids alone | `C-133`, `C-134` + vascular docs | `strong` |
| skin / melanoma baseline | companion longitudinal surface layer | founder + equipment doctrine; repo has cautionary `C-106`, but not a strong routine-screening support anchor | `product-strong / evidence-gap` |
| hair / trichoscopy baseline | companion endocrine/stress-visible layer | equipment doctrine + founder fit; no formal source registry anchor yet | `product-strong / evidence-gap` |
| endocrine / androgen closure | required for useful free-T and recovery read | founder fit + Health fit + framework v2; repo lacks strong direct claim anchors | `important / evidence-gap` |
| adrenal / HPA / SNS closure | critical for this cohort, absent from v1 logic | Health fit + framework v2; repo currently lacks strong source-registry anchors for spot urinary cortisol, urinary metanephrines, 3-methoxytyramine | `important / evidence-gap` |
| GlycA / holoTC / IGF-1 / apoE / NMR-LDL-P / sdLDL | promoted in v2 because they sharpen system-age or discordance reads | mainly v2 product/system logic; not yet fully backed by repo claim ids | `medium / gap` |
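
A minimal sketch of how the status column could be turned into a provenance-debt list ahead of the scorecard pass. The rows are copied from a few lines of the table above; the dict layout and the `provenance_debt` helper are illustrative assumptions, not existing repo code.

```python
# A few rows copied from the provenance table (cluster -> status).
# The data structure and helper name are hypothetical.
PROVENANCE = {
    "glycemic core": "strong",
    "kidney / urine / uacr / cystatin": "strong",
    "cbc / hematology": "medium / gap",
    "hair / trichoscopy baseline": "product-strong / evidence-gap",
    "adrenal / HPA / SNS closure": "important / evidence-gap",
}


def provenance_debt(status_by_cluster):
    """Return clusters whose status string mentions a gap, in table order."""
    return [
        cluster
        for cluster, status in status_by_cluster.items()
        if "gap" in status.lower()
    ]


debt = provenance_debt(PROVENANCE)
print(debt)
```

Matching on the word `gap` is deliberately crude; a real pass would probably use a fixed status vocabulary in the scorecard CSV instead of free text.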

5) what we were really relying on

strong-evidence zones

glycemic screening and OGTT reflex logic
kidney / albuminuria logic
lipid / apoB / Lp(a) / CAC hierarchy
imaging exclusions and screening boundaries
PWV as vascular mechanics signal

medium zones

thyroid inclusion:
- strong caution evidence
- weaker routine-screening enthusiasm
iron / ferritin / TSAT:
- useful, but context-dependent
VO2max:
- strong signal, but packaging/utilization question remains

weak or missing-by-repo zones

HPA / SNS best-read stack
spot urinary cortisol / creatinine
urinary metanephrines fractionated
urinary 3-methoxytyramine
IGF-1 in this cohort
GlycA
holoTC default policy
apoE baseline-consent logic
skin / hair companion layers as medical/protocol objects rather than premium UX objects

6) what v2 changed because of cohort fit

explicit additions

free_t3
fasting_c_peptide
igf_1
cortisol_pm
spot_urinary_cortisol_creatinine
urinary_metanephrines_fractionated
urinary_3_methoxytyramine
glyca
holotranscobalamin
apoe_genotype
apoa1
nmr_ldl_particle_number
small_dense_ldl

explicit closure rules

testosterone is not interpreted without:
- total_testosterone
- shbg
- albumin
- derived free_testosterone
adrenal/HPA is not interpreted from morning cortisol alone
vascular age is not treated as closed without PWV
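
The closure rules above can be sketched as a small completeness check. Marker ids are taken from this document; the `pwv` key, the function name, and the reading of "not from morning cortisol alone" as "at least one HPA-block marker other than `cortisol_am` is present" are assumptions made for illustration.

```python
# Inputs required before the derived free_testosterone read is allowed.
TESTOSTERONE_INPUTS = {"total_testosterone", "shbg", "albumin"}

# The HPA best-read block as listed in this review.
HPA_BLOCK = {
    "cortisol_am",
    "cortisol_pm",
    "spot_urinary_cortisol_creatinine",
    "urinary_metanephrines_fractionated",
    "urinary_3_methoxytyramine",
}


def open_reads(available):
    """Return the names of reads that must not be interpreted yet."""
    have = set(available)
    closed = {
        # All three inputs are needed to derive free testosterone.
        "testosterone": TESTOSTERONE_INPUTS <= have,
        # Assumption: something beyond morning cortisol must be present.
        "adrenal_hpa": bool((HPA_BLOCK & have) - {"cortisol_am"}),
        # "pwv" is a hypothetical id for the PWV companion module.
        "vascular_age": "pwv" in have,
    }
    return sorted(name for name, ok in closed.items() if not ok)


example = open_reads({"total_testosterone", "shbg", "cortisol_am", "pwv"})
print(example)
```

In this example albumin is missing and only morning cortisol is available, so both the testosterone and the adrenal/HPA reads stay open while vascular age is closed.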

explicit counting cleanup

out of exact 100:
- free_t_calc
- homa_ir
- uacr
- fib4
- non_hdl
- apob_apoa1
- tsat
kept outside count as companion modules:
- PWV
- carotid protocol
- skin mapping / dermoscopy
- face skin analysis
- hair diagnostics
- ECG
- DEXA
- VO2max
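
A short sketch of why derivations sit outside the exact count: each one is computed from labs that are already ordered. The formulas are the standard published ones for HOMA-IR (glucose in mg/dL, insulin in µU/mL), FIB-4, and non-HDL cholesterol; the `ITEMS` list, the `kind` tags, and `exact_count` are hypothetical names used only for illustration.

```python
import math


def homa_ir(glucose_mg_dl, insulin_uu_ml):
    """HOMA-IR from fasting glucose (mg/dL) and fasting insulin (µU/mL)."""
    return glucose_mg_dl * insulin_uu_ml / 405.0


def fib4(age_years, ast_u_l, alt_u_l, platelets_10e9_l):
    """FIB-4 index: age * AST / (platelets * sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


def non_hdl(total_cholesterol, hdl_cholesterol):
    """Non-HDL cholesterol, in the same unit as the inputs."""
    return total_cholesterol - hdl_cholesterol


# Hypothetical tagging of a few items from the lists above.
ITEMS = [
    ("total_testosterone", "lab"),
    ("shbg", "lab"),
    ("albumin", "lab"),
    ("free_t_calc", "derived"),
    ("homa_ir", "derived"),
    ("PWV", "companion"),
    ("DEXA", "companion"),
]


def exact_count(items):
    """Count only orderable lab biomarkers."""
    return sum(1 for _, kind in items if kind == "lab")


print(exact_count(ITEMS))
print(homa_ir(90, 9), fib4(40, 20, 25, 200), non_hdl(190, 50))
```

Tagging each row with an explicit kind keeps the count reproducible when derivations or companion modules are added later.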

7) the HPA block we locked now

best-read block:
- cortisol_am
- cortisol_pm
- spot_urinary_cortisol_creatinine
- urinary_metanephrines_fractionated
- urinary_3_methoxytyramine

why this is the chosen shape:
- am + pm gives diurnal shape
- spot urinary cortisol gives a more integrated read without 24h collection fragility
- urinary metanephrines separate sympathetic / catecholamine load from pure cortisol story
- 3-methoxytyramine adds extra catecholamine-metabolite nuance when we want the fullest read

what we explicitly rejected as default:
- 24h urinary free cortisol

why:
- too much collection friction
- weak routine longitudinal compliance for this cohort
- bad default if the real product is repeatable tracking

8) what came out of the review

the good

there was already a real selection engine, not a random lab wishlist.
the repo already had good anchors for cardio-metabolic, kidney, vascular boundaries, and imaging exclusions.
the move to v2 is not a vibe-shift; it is a coherent response to the actual intended cohort and Health system-age use case.

the weak

provenance is uneven.
some v2 promotions are currently justified more by product/system logic than by explicit claim ids in the repo.
HPA / SNS / endocrine closure is directionally right, but still under-cited.
skin/hair are clearly part of the product, but still under-anchored as research-backed longitudinal modules.

the actual result now

we have a cleaner doctrine.
we do not yet have the final exact v2 locked 100.
the next artifact must be a scorecard + board decision pass, not another prose-only brainstorm.

9) what must be added next

highest-priority provenance debt

HPA / SNS evidence anchors:
- spot urinary cortisol / creatinine
- urinary metanephrines fractionated
- urinary 3-methoxytyramine
endocrine / recovery anchors:
- IGF-1
- free T interpretation logic in preventive cohort
- DHT / AMH / progesterone timing rationale
v2 promoted markers:
- GlycA
- holoTC
- NMR-LDL-P
- small dense LDL
- apoE baseline-consent framing
companion-module provenance:
- skin mapping / dermoscopy as longitudinal protocol object
- hair diagnostics as endocrine/stress-response longitudinal object

operational debt

exact cadence by cluster
exact role-tag map (guardrail / response / engagement)
exact top-100 count after removing derivations and companion modules

10) final read

if the question is "did we have a method?" -> yes.
if the question is "was that method evenly provenanced?" -> no.
if the question is "does v2 improve the actual self-use / Health / founder-cohort fit?" -> yes, materially.
if the question is "are we done?" -> no; the next real step is top100_biomarker_scorecard_v2.csv and top100_biomarker_board_decisions_v2.md.

source map inside repo

methodology baseline:
ai/data/biomarker_top100_selection_framework_v1.md
methodology rerun:
ai/data/biomarker_top100_selection_framework_v2.md
ai/data/biomarker_top100_rerun_protocol_v2.md
evidence registry:
ai/data/sources.md
ai/data/panel_evidence_matrix_v1.csv
cohort / expert provenance:
ai/context/founder_input_2026_02_22_25_v1.md
ai/context/stanislav_skakun_biodata_call_2026_02_28_v1.md
system fit:
ai/data/top100_blood_urine_health_audit_v1.md
companion modules:
ai/data/vascular_intelligence_layer_v1.md
ai/data/vascular_intelligence_ops_v1.md
ai/data/mvp_equipment_philosophy_v1.md