ExposoGraph.db_clients

Public database clients for KEGG, CTD, and the bundled IARC reference catalog.

KEGG notes

The KEGG client parses the fixed-width KEGG REST record format used by get/{id} responses. This includes:

  • multi-line GENE sections where the first token may be a numeric KEGG gene ID

  • multi-line PATHWAY sections in gene records

These parsing rules matter for seeding: they preserve both the member gene symbols of each pathway and the pathway memberships of each gene, as reported in live KEGG records.
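The section handling described above can be sketched as follows. This is a minimal illustration, not the KEGGClient implementation: the helper names (split_sections, gene_symbols, pathway_ids) are hypothetical, and the sample record is hand-written and abridged. It assumes the usual KEGG flat-file layout, where the section keyword occupies the first 12 columns and continuation lines leave those columns blank.

```python
KEY_WIDTH = 12  # KEGG flat-file records keep the section keyword in the first 12 columns


def split_sections(record: str) -> dict[str, list[str]]:
    """Group the lines of one flat-file record by section keyword."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in record.splitlines():
        if line.startswith("///"):  # end-of-record marker
            break
        keyword = line[:KEY_WIDTH].strip()
        value = line[KEY_WIDTH:].strip()
        if keyword:  # a non-blank keyword column opens a new section
            current = keyword
        if current is not None and value:
            sections.setdefault(current, []).append(value)
    return sections


def gene_symbols(gene_lines: list[str]) -> list[str]:
    """Pull the symbol out of lines such as '1543  CYP1A1; cytochrome ...'."""
    symbols = []
    for entry in gene_lines:
        tokens = entry.split()
        if len(tokens) > 1 and tokens[0].isdigit():
            tokens = tokens[1:]  # drop the leading numeric KEGG gene ID
        if tokens:
            symbols.append(tokens[0].rstrip(";"))
    return symbols


def pathway_ids(pathway_lines: list[str]) -> list[str]:
    """Pull the ID out of lines such as 'hsa05204  <pathway name>'."""
    return [entry.split()[0] for entry in pathway_lines if entry.split()]


# Hand-written, abridged sample; not a verbatim KEGG response.
SAMPLE = "\n".join([
    "ENTRY".ljust(KEY_WIDTH) + "hsa05204  Pathway",
    "GENE".ljust(KEY_WIDTH) + "1543  CYP1A1; cytochrome P450 family 1 subfamily A member 1",
    " " * KEY_WIDTH + "1544  CYP1A2; cytochrome P450 family 1 subfamily A member 2",
    "///",
])

print(gene_symbols(split_sections(SAMPLE)["GENE"]))
```

Keeping the section split separate from the per-line token handling means the same splitter serves both pathway records (GENE lines) and gene records (PATHWAY lines).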

KEGG REST API client for pathway and enzyme lookups.

Uses the public KEGG REST API (https://rest.kegg.jp/) to retrieve pathway membership, enzyme annotations, and gene-pathway mappings. No API key is required for the public endpoints.

class ExposoGraph.db_clients.kegg.KEGGPathway(pathway_id, name, genes=<factory>)[source]

Minimal representation of a KEGG pathway.

Parameters:
  • pathway_id (str)

  • name (str)

  • genes (list[str])

pathway_id: str
name: str
genes: list[str]

class ExposoGraph.db_clients.kegg.KEGGGene(gene_id, symbol, name='', pathways=<factory>)[source]

Minimal representation of a KEGG gene entry.

Parameters:
  • gene_id (str)

  • symbol (str)

  • name (str)

  • pathways (list[str])

gene_id: str
symbol: str
name: str = ''
pathways: list[str]

class ExposoGraph.db_clients.kegg.KEGGClient(base_url='https://rest.kegg.jp', timeout=30)[source]

Lightweight client for the KEGG REST API.

Parameters

base_url:

Override the KEGG REST base URL (useful for testing).

timeout:

HTTP request timeout in seconds.

get_pathway(pathway_id)[source]

Fetch pathway details including member genes.

Parameters

pathway_id:

KEGG pathway identifier, e.g. "hsa05204" or "path:hsa05204".

Parameters:

pathway_id (str)

Return type:

KEGGPathway
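Because pathway_id accepts both "hsa05204" and "path:hsa05204", the identifier presumably has to be normalized before a request URL is built. The sketch below shows one way to do that; normalize_pathway_id and build_get_url are hypothetical helpers, and the actual client may handle the prefix differently.

```python
def normalize_pathway_id(pathway_id: str) -> str:
    """Strip surrounding whitespace and an optional 'path:' database prefix."""
    return pathway_id.strip().removeprefix("path:")  # str.removeprefix needs Python 3.9+


def build_get_url(entry_id: str, base_url: str = "https://rest.kegg.jp") -> str:
    """Build the URL of a KEGG 'get' request for a single entry."""
    return f"{base_url.rstrip('/')}/get/{entry_id}"


print(build_get_url(normalize_pathway_id("path:hsa05204")))
```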

get_gene(gene_id)[source]

Fetch a KEGG gene entry.

Parameters

gene_id:

KEGG gene identifier, e.g. "hsa:1543" for CYP1A1.

Parameters:

gene_id (str)

Return type:

KEGGGene

find_genes(query, organism='hsa')[source]

Search KEGG for genes matching a query string.

Returns a list of {"gene_id": ..., "description": ...} dicts.

Parameters:
  • query (str)

  • organism (str)

Return type:

list[dict[str, str]]
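The return shape can be illustrated with a small parser. It assumes the KEGG find operation answers with one tab-separated line per hit (entry ID, then description); parse_find_response is a hypothetical helper and the sample text is hand-written and abridged, so treat this as a sketch rather than the client's own code.

```python
def parse_find_response(text: str) -> list[dict[str, str]]:
    """Turn tab-separated 'id<TAB>description' lines into dicts."""
    results = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines, e.g. a trailing newline
        gene_id, _, description = line.partition("\t")
        results.append(
            {"gene_id": gene_id.strip(), "description": description.strip()}
        )
    return results


# Hand-written, abridged sample; not a verbatim KEGG response.
SAMPLE = "hsa:1543\tCYP1A1; cytochrome P450 family 1 subfamily A member 1\n"

for hit in parse_find_response(SAMPLE):
    print(hit["gene_id"], "->", hit["description"])
```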

list_pathway_genes(pathway_id)[source]

Return gene IDs belonging to a pathway via the /link endpoint.

Parameters

pathway_id:

KEGG pathway identifier, e.g. "hsa05204".

Parameters:

pathway_id (str)

Return type:

list[str]
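A response from the /link endpoint can be reduced to gene IDs along these lines. The sketch assumes each line pairs the source entry with the linked entry, separated by a tab; parse_link_response is a hypothetical helper and the sample lines are hand-written.

```python
def parse_link_response(text: str) -> list[str]:
    """Collect the second column (the linked gene ID) from each line."""
    gene_ids = []
    for line in text.splitlines():
        columns = line.split("\t")
        if len(columns) >= 2 and columns[1].strip():
            gene_ids.append(columns[1].strip())
    return gene_ids


# Hand-written sample; not a verbatim KEGG response.
SAMPLE = "path:hsa05204\thsa:1543\npath:hsa05204\thsa:1544\n"

print(parse_link_response(SAMPLE))
```

Unlike get_pathway, this route yields gene IDs only, with no symbols or names attached.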

Parameters:
  • base_url (str)

  • timeout (int)

CTD (Comparative Toxicogenomics Database) chemical-gene interaction client.

Queries public CTD data through the CTD batch query API to retrieve chemical-gene interactions relevant to carcinogen metabolism.

class ExposoGraph.db_clients.ctd.ChemicalGeneInteraction(chemical_name, chemical_id, gene_symbol, gene_id, organism='', interaction='', pubmed_ids=<factory>)[source]

A single chemical-gene interaction from CTD.

Parameters:
  • chemical_name (str)

  • chemical_id (str)

  • gene_symbol (str)

  • gene_id (str)

  • organism (str)

  • interaction (str)

  • pubmed_ids (list[str])

chemical_name: str
chemical_id: str
gene_symbol: str
gene_id: str
organism: str = ''
interaction: str = ''
pubmed_ids: list[str]

class ExposoGraph.db_clients.ctd.CTDClient(base_url='https://ctdbase.org/tools/batchQuery.go', timeout=60)[source]

Client for querying CTD chemical-gene interactions.

Parameters

base_url:

Override the CTD batch query URL (useful for testing).

timeout:

HTTP request timeout in seconds.

get_chemical_gene_interactions(chemical_name, *, organism='Homo sapiens')[source]

Fetch chemical-gene interactions for a given chemical.

Parameters

chemical_name:

Chemical name to query (e.g. "Benzo(a)pyrene").

organism:

Organism filter. Defaults to "Homo sapiens".

Parameters:
  • chemical_name (str)

  • organism (str)

Return type:

list[ChemicalGeneInteraction]

get_gene_interactions(gene_symbol, *, organism='Homo sapiens')[source]

Fetch chemical-gene interactions for a given gene.

Parameters

gene_symbol:

Gene symbol to query (e.g. "CYP1A1").

organism:

Organism filter. Defaults to "Homo sapiens".

Parameters:
  • gene_symbol (str)

  • organism (str)

Return type:

list[ChemicalGeneInteraction]

Parameters:
  • base_url (str)

  • timeout (int)
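How tab-separated batch query output might map onto ChemicalGeneInteraction records is sketched below. Several things here are assumptions: the column names (ChemicalName, ChemicalId, GeneSymbol, GeneId, Organism, Interaction, PubMedIds), the "|" separator between PubMed IDs, and the client-side organism filter. The Interaction dataclass is a stand-in that mirrors the documented fields, and the sample rows hold placeholder values, not real CTD records.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Interaction:
    """Stand-in mirroring the documented ChemicalGeneInteraction fields."""
    chemical_name: str
    chemical_id: str
    gene_symbol: str
    gene_id: str
    organism: str = ""
    interaction: str = ""
    pubmed_ids: list[str] = field(default_factory=list)


def parse_interactions(tsv_text: str, organism: str = "Homo sapiens") -> list[Interaction]:
    """Parse TSV rows and keep those whose Organism column matches."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        if organism and row.get("Organism", "") != organism:
            continue
        # Assumed format: PubMed IDs joined with '|'.
        pubmed = [p for p in (row.get("PubMedIds") or "").split("|") if p]
        rows.append(
            Interaction(
                chemical_name=row.get("ChemicalName", ""),
                chemical_id=row.get("ChemicalId", ""),
                gene_symbol=row.get("GeneSymbol", ""),
                gene_id=row.get("GeneId", ""),
                organism=row.get("Organism", ""),
                interaction=row.get("Interaction", ""),
                pubmed_ids=pubmed,
            )
        )
    return rows


# Assumed column names and placeholder values, for illustration only.
HEADER = ["ChemicalName", "ChemicalId", "GeneSymbol", "GeneId",
          "Organism", "Interaction", "PubMedIds"]
SAMPLE = "\n".join("\t".join(r) for r in [
    HEADER,
    ["Benzo(a)pyrene", "CHEM-ID", "CYP1A1", "1543",
     "Homo sapiens", "placeholder interaction text", "11111111|22222222"],
    ["Benzo(a)pyrene", "CHEM-ID", "Cyp1a1", "GENE-ID",
     "Mus musculus", "placeholder interaction text", "33333333"],
])

for item in parse_interactions(SAMPLE):
    print(item.gene_symbol, item.pubmed_ids)
```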

Bundled IARC carcinogen classification data.

Provides a static lookup of IARC monograph classifications for common carcinogens relevant to the carcinogen metabolism knowledge graph. No external API calls are required; the data is embedded as a Python dict.

class ExposoGraph.db_clients.iarc.IARCGroup(value)[source]

IARC carcinogen classification groups.

GROUP_1 = 'Group 1'
GROUP_2A = 'Group 2A'
GROUP_2B = 'Group 2B'
GROUP_3 = 'Group 3'

class ExposoGraph.db_clients.iarc.IARCClassifier(extra=None)[source]

Look up IARC classifications from the bundled static dataset.

Example

>>> clf = IARCClassifier()
>>> clf.classify("Benzo[a]pyrene")
IARCGroup.GROUP_1

classify(chemical_name)[source]

Return the IARC group for a chemical, or None if not found.

Parameters:

chemical_name (str)

Return type:

IARCGroup | None

get_entry(chemical_name)[source]

Return the full classification entry (group, CAS, category).

Parameters:

chemical_name (str)

Return type:

dict[str, str] | None

list_by_group(group)[source]

Return all chemical names in a given IARC group.

Parameters:

group (IARCGroup)

Return type:

list[str]

list_by_category(category)[source]

Return all chemical names in a given category (e.g. 'PAH').

Parameters:

category (str)

Return type:

list[str]

property all_chemicals: list[str]

Return all known chemical names.

Parameters:

extra (Optional[dict[str, dict[str, str]]])
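The static-lookup design, including how an extra mapping could extend the bundled data, can be sketched in a few lines. The entry keys ("group", "cas", "category"), the merge order, and exact-name matching are assumptions; Group and StaticClassifier are stand-ins for IARCGroup and IARCClassifier, and "Example compound" is a made-up entry used only to show the extra argument.

```python
from enum import Enum
from typing import Optional


class Group(Enum):
    """Stand-in for IARCGroup, with the same member values."""
    GROUP_1 = "Group 1"
    GROUP_2A = "Group 2A"
    GROUP_2B = "Group 2B"
    GROUP_3 = "Group 3"


# One-entry stand-in for the bundled dataset; key names are assumptions.
_BUNDLED: dict[str, dict[str, str]] = {
    "Benzo[a]pyrene": {"group": "Group 1", "cas": "50-32-8", "category": "PAH"},
}


class StaticClassifier:
    """Dict-backed lookup; entries in `extra` override bundled ones."""

    def __init__(self, extra: Optional[dict[str, dict[str, str]]] = None):
        self._data = {**_BUNDLED, **(extra or {})}

    def classify(self, chemical_name: str) -> Optional[Group]:
        entry = self._data.get(chemical_name)
        return Group(entry["group"]) if entry else None

    def list_by_group(self, group: Group) -> list[str]:
        return sorted(n for n, e in self._data.items() if e["group"] == group.value)

    def list_by_category(self, category: str) -> list[str]:
        return sorted(n for n, e in self._data.items() if e["category"] == category)


# "Example compound" is hypothetical and carries no real classification.
clf = StaticClassifier(
    extra={"Example compound": {"group": "Group 3", "cas": "", "category": "demo"}}
)
print(clf.classify("Benzo[a]pyrene"))
print(clf.list_by_group(Group.GROUP_3))
```

Storing the group as its string value and converting through the enum on lookup keeps the dataset plain data, so user-supplied entries need no imports.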