Schema Reference
ExposoGraph uses a typed schema for all nodes and edges in the knowledge graph. Types are defined as Python enums and enforced by Pydantic models. The core ontology remains fixed, while matching/provenance metadata captures whether a record is canonical, alias-matched, unmatched, or custom.
Node Types
| Type | Description | Key Fields |
|---|---|---|
| `Carcinogen` | Chemical carcinogenic agents (PAHs, HCAs, nitrosamines, etc.) | |
| `Enzyme` | Metabolizing, transport, and repair proteins | |
| `Gene` | Gene loci, for pharmacogenomic variants or expression context | |
| `Metabolite` | Intermediate and terminal metabolites | |
| `DNA_Adduct` | Covalent DNA lesions formed by reactive metabolites | |
| `Pathway` | Biological/KEGG pathways | |
| `Tissue` | Anatomical tissues or organs | |
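As a rough illustration of how enum-typed nodes behave, the sketch below models the node types with a Python enum. It is an assumption-laden stand-in, not the library's code: it uses a stdlib dataclass instead of a Pydantic model to stay dependency-free, and the `Node` class, its three fields, and the enum member names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class NodeType(str, Enum):
    """Node categories; values mirror the type names used in the edge table."""
    CARCINOGEN = "Carcinogen"
    ENZYME = "Enzyme"
    GENE = "Gene"
    METABOLITE = "Metabolite"
    DNA_ADDUCT = "DNA_Adduct"
    PATHWAY = "Pathway"
    TISSUE = "Tissue"


@dataclass
class Node:
    # Hypothetical minimal node; the real model is a Pydantic class with more fields.
    id: str
    label: str
    type: NodeType

    def __post_init__(self) -> None:
        # Coerce plain strings to the enum so an unknown type fails early
        # with a ValueError instead of entering the graph.
        self.type = NodeType(self.type)


# A string value is accepted and converted to the enum member.
bap = Node(id="BaP", label="Benzo[a]pyrene", type="Carcinogen")
```

Passing a value outside the enum, such as `type="Protein"`, raises `ValueError` at construction time, which is the behavior a typed schema is meant to guarantee.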
Edge Types
| Type | Description | Typical Source → Target |
|---|---|---|
| | Enzyme converts procarcinogen to reactive metabolite | Enzyme → Metabolite |
| | Enzyme conjugates/inactivates a metabolite | Enzyme → Metabolite |
| | Efflux transporter moves conjugate out of cell | Enzyme → Metabolite |
| | Reactive metabolite covalently modifies DNA | Metabolite → DNA_Adduct |
| | DNA repair enzyme removes a lesion | Enzyme → DNA_Adduct |
| | Node belongs to a biological pathway | Node → Pathway |
| | Gene or enzyme is expressed in a tissue | Gene/Enzyme → Tissue |
| | Substance induces enzyme expression/activity | Carcinogen → Enzyme |
| | Substance inhibits enzyme expression/activity | Carcinogen → Enzyme |
| | Gene encodes an enzyme | Gene → Enzyme |
| | User-defined exploratory predicate | Any validated or provisional node pair |
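The typical source and target pairings above can be expressed as a lookup table. The following is a minimal sketch under stated assumptions: the predicate names (`ACTIVATES`, `FORMS_ADDUCT`, and so on) are hypothetical placeholders, not the schema's actual edge-type identifiers, and `is_typical` is an illustrative helper rather than part of ExposoGraph.

```python
# Hypothetical predicate names; the real edge-type identifiers are defined by the schema.
# Each entry maps a predicate to (allowed source types, allowed target types).
TYPICAL_ENDPOINTS = {
    "ACTIVATES": ({"Enzyme"}, {"Metabolite"}),
    "FORMS_ADDUCT": ({"Metabolite"}, {"DNA_Adduct"}),
    "REPAIRS": ({"Enzyme"}, {"DNA_Adduct"}),
    "EXPRESSED_IN": ({"Gene", "Enzyme"}, {"Tissue"}),
    "ENCODES": ({"Gene"}, {"Enzyme"}),
}


def is_typical(predicate: str, source_type: str, target_type: str) -> bool:
    """Return True when the endpoint types match the typical pattern for a predicate.

    Predicates missing from the table return False because no pattern is known.
    """
    if predicate not in TYPICAL_ENDPOINTS:
        return False
    sources, targets = TYPICAL_ENDPOINTS[predicate]
    return source_type in sources and target_type in targets
```

Because the table lists typical rather than mandatory pairings, a check like this is better suited to flagging edges for review than to rejecting them outright.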
Annotation Fields
All nodes support these optional annotation fields:
- `source_db`: Provenance databases (NCBI Gene, GTEx, ClinPGx, CTD, IARC, KEGG, etc.)
- `evidence`: Brief evidence note
- `pmid`: PubMed ID
- `tissue`: Relevant tissue context
For repair proteins, `group` is the recommended place to store the repair
class (for example `DNA Repair (BER)`, `DNA Repair (NER)`, or
`DNA Repair (Direct Reversal)`). `phase` is reserved for Phase I/II/III
metabolism and transport labels.
Edges also support `carcinogen` (the parent carcinogen context node ID)
and `label` (short description of the reaction).
Structured Provenance
Nodes and edges may also carry a `provenance` list with one or more records.
Each record can store:
- `source_db`: Source database or catalog
- `record_id`: Stable database identifier or accession when available
- `evidence`: Evidence summary for the specific record
- `pmid`: PubMed ID
- `tissue`: Tissue-specific context
- `citation`: Human-readable citation text
- `url`: Link to the source when available
Legacy top-level fields such as `source_db` and `pmid` remain supported.
When present, they are normalized into a single provenance record for backward
compatibility.
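The normalization of legacy fields might look roughly like the sketch below. It is an assumption, not the library's implementation: `normalize_provenance` and `LEGACY_FIELDS` are hypothetical names, records are modeled as plain dicts, and the choice to append the legacy record to an existing `provenance` list (rather than replace or merge) is a guess about behavior the documentation does not specify.

```python
from typing import Any

# Assumed set of legacy top-level fields that map onto provenance record fields.
LEGACY_FIELDS = ("source_db", "evidence", "pmid", "tissue")


def normalize_provenance(record: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the provenance list, folding legacy top-level fields into one record."""
    # Copy the existing list so the input record is not mutated.
    provenance = list(record.get("provenance") or [])
    # Collect whichever legacy fields are present and not None.
    legacy = {k: record[k] for k in LEGACY_FIELDS if record.get(k) is not None}
    # Append the folded record once; skip it if an identical record already exists.
    if legacy and legacy not in provenance:
        provenance.append(legacy)
    return provenance


# Placeholder values for illustration only; the PMID is not a real reference.
node = {"id": "CYP1A1", "source_db": "KEGG", "pmid": "0000000"}
records = normalize_provenance(node)
```

Here `records` holds a single dict containing the `source_db` and `pmid` values, and a record with neither legacy fields nor a `provenance` list yields an empty list.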
Grounding and Match Metadata
Nodes and edges also support a lightweight grounding layer:
- `origin`: where the record came from (`imported`, `seeded`, `user`, or `llm`)
- `match_status`: grounding state (`unknown`, `canonical`, `alias`, `unmatched`, or `custom`)
- `canonical_id` / `canonical_label` / `canonical_namespace`: canonical mapping for grounded nodes
- `canonical_predicate` / `canonical_namespace`: canonical mapping for grounded edges
- `custom_type` / `custom_predicate`: exploratory labels for intentionally user-defined content
`CUSTOM` nodes must include `custom_type` and `CUSTOM` edges must include
`custom_predicate`.
This metadata is orthogonal to curation. A node may be canonical but still in
`Draft` curation status, or custom but manually reviewed.
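The custom-label rule can be sketched as a small validation function. This is illustrative only: `check_custom_metadata` is a hypothetical helper, and it assumes that "CUSTOM" refers to a `match_status` of `custom`; the actual models may key the rule on a different field.

```python
from typing import Optional

MATCH_STATUSES = {"unknown", "canonical", "alias", "unmatched", "custom"}


def check_custom_metadata(
    kind: str,
    match_status: str,
    custom_type: Optional[str] = None,
    custom_predicate: Optional[str] = None,
) -> None:
    """Raise ValueError when a custom node or edge lacks its exploratory label.

    `kind` is "node" or "edge" (an assumption of this sketch).
    """
    if match_status not in MATCH_STATUSES:
        raise ValueError(f"unknown match_status: {match_status!r}")
    if match_status != "custom":
        return  # The rule applies only to custom records.
    if kind == "node" and not custom_type:
        raise ValueError("custom nodes must include custom_type")
    if kind == "edge" and not custom_predicate:
        raise ValueError("custom edges must include custom_predicate")
```

Note that the function never inspects curation status, which reflects the point above that grounding and curation are independent dimensions.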
Curation Fields
Nodes and edges may include a `curation` object with review metadata:
- `status`: `Draft`, `In Review`, `Reviewed`, `Approved`, or `Rejected`
- `confidence`: `Low`, `Medium`, or `High`
- `curator`: Person who created or updated the record
- `reviewed_by`: Reviewer identity
- `reviewed_at`: Review timestamp or date string
- `notes`: Free-text curation rationale
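One way to picture the curation object is as a small record that restricts `status` and `confidence` to the listed values. The sketch below is a stdlib stand-in for whatever Pydantic model the library defines; the `Curation` class name, the `Draft` default, and the all-optional remaining fields are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

STATUSES = ("Draft", "In Review", "Reviewed", "Approved", "Rejected")
CONFIDENCES = ("Low", "Medium", "High")


@dataclass
class Curation:
    # Defaulting to "Draft" is an assumption of this sketch.
    status: str = "Draft"
    confidence: Optional[str] = None
    curator: Optional[str] = None
    reviewed_by: Optional[str] = None
    reviewed_at: Optional[str] = None  # Timestamp or date string.
    notes: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject values outside the documented vocabularies.
        if self.status not in STATUSES:
            raise ValueError(f"invalid status: {self.status!r}")
        if self.confidence is not None and self.confidence not in CONFIDENCES:
            raise ValueError(f"invalid confidence: {self.confidence!r}")


review = Curation(status="Reviewed", confidence="High", reviewed_by="reviewer-1")
```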
Validation
The `GraphEngine` enforces referential integrity and
mode-aware merge behavior: