ExposoGraph.seeder

Database seeding helpers for KEGG, CTD, and IARC-backed enrichment.

KEGG and CTD seeders accept a mode argument so seeded graphs can be prepared in exploratory or strict mode before merge.

The KEGG-backed path uses the fixed-width KEGG REST record parser from ExposoGraph.db_clients.kegg. Multi-line GENE and PATHWAY sections are supported so pathway seeding keeps the expected gene symbols and memberships from live KEGG records.
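The multi-line handling described above can be illustrated with a minimal, self-contained sketch. It is not the parser in ExposoGraph.db_clients.kegg. The names parse_kegg_sections, gene_symbols, and KEY_WIDTH are hypothetical, and the sample record is an abbreviated, hand-written one. The sketch assumes the usual KEGG flat-file layout: the section keyword sits in the first 12 columns, continuation lines leave those columns blank, and a record ends with "///".

```python
KEY_WIDTH = 12  # assumed width of the keyword column in KEGG flat files


def parse_kegg_sections(text):
    """Group the data column of a KEGG flat-file record by section keyword."""
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith("///"):  # record terminator
            break
        if not line.strip():
            continue
        key = line[:KEY_WIDTH].strip()
        value = line[KEY_WIDTH:].strip()
        if key:
            # A non-blank keyword column starts a new section.
            current = key
            sections.setdefault(current, [])
        if current is not None and value:
            # Continuation lines (blank keyword column) extend the open section.
            sections[current].append(value)
    return sections


def gene_symbols(sections):
    """Extract symbols from GENE entries shaped like '1543  CYP1A1; description'."""
    symbols = []
    for entry in sections.get("GENE", []):
        parts = entry.split(None, 1)
        if len(parts) == 2:
            symbols.append(parts[1].split(";", 1)[0].strip())
    return symbols


# Abbreviated, hand-written record in the KEGG layout (not a full live record).
SAMPLE = "\n".join([
    f"{'ENTRY':<12}hsa05204",
    f"{'NAME':<12}Chemical carcinogenesis - DNA adducts",
    f"{'GENE':<12}1543  CYP1A1; cytochrome P450 family 1 subfamily A member 1",
    f"{'':<12}1545  CYP1B1; cytochrome P450 family 1 subfamily B member 1",
    f"{'':<12}2944  GSTM1; glutathione S-transferase mu 1",
    "///",
])

symbols = gene_symbols(parse_kegg_sections(SAMPLE))
print(symbols)
```

Tracking the open section across lines is what keeps the second and third genes attached to GENE. A parser that only reads lines with a keyword would return the first gene alone.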

Orchestrator for seeding the knowledge graph from public databases.

Converts KEGG, CTD, and IARC data into KnowledgeGraph objects that can be merged into an existing graph via the GraphEngine.

ExposoGraph.seeder.seed_from_kegg_pathway(pathway_id, *, client=None, mode=GraphMode.EXPLORATORY)

Build a KnowledgeGraph from a KEGG pathway.

Creates a Pathway node and Gene nodes for all member genes, connected by PATHWAY edges.

Parameters

pathway_id:

KEGG pathway identifier, e.g. "hsa05204".

client:

Optional pre-configured KEGGClient.

mode:

Graph mode for the seeded graph, exploratory or strict. Defaults to GraphMode.EXPLORATORY.

Return type:

KnowledgeGraph
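The graph shape this function describes (one Pathway node, one Gene node per member, PATHWAY edges between them) can be sketched with plain dataclasses. This is an illustration only: Node, Edge, Graph, and build_pathway_graph are hypothetical stand-ins rather than ExposoGraph classes, and the gene-to-pathway edge direction is an assumption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:  # hypothetical stand-in, not the ExposoGraph node model
    id: str
    type: str
    label: str


@dataclass(frozen=True)
class Edge:  # hypothetical stand-in, not the ExposoGraph edge model
    source: str
    target: str
    type: str


@dataclass
class Graph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)


def build_pathway_graph(pathway_id, pathway_name, gene_symbols):
    """Create one Pathway node plus one Gene node and PATHWAY edge per unique gene."""
    graph = Graph()
    graph.nodes.append(Node(pathway_id, "Pathway", pathway_name))
    seen = set()
    for symbol in gene_symbols:
        if symbol in seen:  # skip duplicate memberships
            continue
        seen.add(symbol)
        graph.nodes.append(Node(symbol, "Gene", symbol))
        # Assumed direction: gene -> pathway.
        graph.edges.append(Edge(symbol, pathway_id, "PATHWAY"))
    return graph


example = build_pathway_graph(
    "hsa05204",
    "Chemical carcinogenesis - DNA adducts",
    ["CYP1A1", "CYP1B1", "CYP1A1"],
)
print(len(example.nodes), len(example.edges))
```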

ExposoGraph.seeder.seed_from_ctd(chemical_name, *, client=None, organism='Homo sapiens', mode=GraphMode.EXPLORATORY)

Build a KnowledgeGraph from CTD chemical-gene interactions.

Creates a Carcinogen node for the chemical and Gene nodes for each interacting gene, connected by ACTIVATES or DETOXIFIES edges based on interaction text heuristics.

Parameters

chemical_name:

Chemical name to query (e.g. "Benzo(a)pyrene").

client:

Optional pre-configured CTDClient.

organism:

Organism filter. Defaults to "Homo sapiens".

mode:

Graph mode for the seeded graph, exploratory or strict. Defaults to GraphMode.EXPLORATORY.

Return type:

KnowledgeGraph
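The actual rules behind the interaction text heuristics are not documented here. The sketch below shows only how a keyword-based classifier of this kind might look. The hint lists, the function name classify_interaction, and the sample interaction strings are all invented for illustration and should not be read as the library's logic or as real CTD records.

```python
from typing import Optional

# Hypothetical keyword lists; the real heuristics may differ entirely.
DETOX_HINTS = ("detoxif", "glucuronidation", "glutathion", "sulfation")
ACTIVATION_HINTS = ("bioactivat", "increased metabolism", "hydroxylation")


def classify_interaction(text: str) -> Optional[str]:
    """Map free-text interaction descriptions to an edge type, or None if unclear."""
    lowered = text.lower()
    # Detoxification hints are checked first so conjugation reactions
    # are not mislabeled when both kinds of keyword appear.
    if any(hint in lowered for hint in DETOX_HINTS):
        return "DETOXIFIES"
    if any(hint in lowered for hint in ACTIVATION_HINTS):
        return "ACTIVATES"
    return None


# Illustrative strings written for this sketch.
activating = classify_interaction(
    "CYP1A1 protein results in increased metabolism of Benzo(a)pyrene"
)
detoxifying = classify_interaction(
    "GSTM1 protein results in increased glutathionylation of a Benzo(a)pyrene metabolite"
)
unclear = classify_interaction("AHR protein binds to Benzo(a)pyrene")
print(activating, detoxifying, unclear)
```

Returning None for unmatched text lets a caller skip or flag ambiguous interactions instead of forcing every one into an edge type.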

ExposoGraph.seeder.seed_iarc_classification(chemical_name, *, classifier=None)

Look up IARC classification for a chemical.

Returns a dict with group, cas, and category keys, or None if the chemical is not in the IARC dataset.

This is a lightweight helper: it does not produce a full KnowledgeGraph, but it provides annotation data to enrich existing Carcinogen nodes.

Return type:

dict[str, str] | None
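Because the helper returns either a dict or None, callers need to handle both cases when enriching a node. The following sketch shows one way to do that with plain dicts. The function annotate_carcinogen and the iarc_* attribute names are hypothetical, and the example classification is hand-written; the value formats the real helper returns are not specified here.

```python
from typing import Optional


def annotate_carcinogen(node_attrs: dict, classification: Optional[dict]) -> dict:
    """Return a copy of node_attrs with IARC fields added when a classification exists."""
    enriched = dict(node_attrs)
    if classification is None:
        # Chemical not in the IARC dataset: leave the node unchanged.
        return enriched
    # Attribute names are assumptions made for this sketch.
    enriched["iarc_group"] = classification["group"]
    enriched["cas"] = classification["cas"]
    enriched["iarc_category"] = classification["category"]
    return enriched


# Hand-written stand-in for a lookup result; value formats are assumed.
example_classification = {"group": "1", "cas": "50-32-8", "category": "chemical"}

node = {"id": "Benzo(a)pyrene", "type": "Carcinogen"}
annotated = annotate_carcinogen(node, example_classification)
untouched = annotate_carcinogen(node, None)
print(annotated)
```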