ExposoGraph.graph_analysis

Domain-aware graph analysis helpers built on top of ExposoGraph.engine.GraphEngine.

Notable behavior:

  • metabolism_chain() follows carcinogen-linked metabolism edges without pulling in unrelated unlabeled branches that merely share an upstream enzyme

  • variant_impact_score() combines representative activity scores with downstream adduct and repair topology

All functions in this module are pure: they read from the NetworkX-backed GraphEngine but never mutate it. The module provides domain-aware queries (metabolism chains, variant impact) alongside standard graph-theory algorithms (shortest path, centrality).

class ExposoGraph.graph_analysis.MetabolismChain(carcinogen_id, node_ids=<factory>, edges=<factory>)[source]

Result of metabolism_chain().

Parameters:
  • carcinogen_id (str)

  • node_ids (list[str])

  • edges (list[dict[str, Any]])

carcinogen_id: str
node_ids: list[str]
edges: list[dict[str, Any]]
property activation_edges: list[dict[str, Any]]
property detox_edges: list[dict[str, Any]]
property adduct_edges: list[dict[str, Any]]
property repair_edges: list[dict[str, Any]]
class ExposoGraph.graph_analysis.VariantImpact(gene_id, activity_score, downstream_adduct_count, downstream_repair_count, score)[source]

Result of variant_impact_score().

Parameters:
  • gene_id (str)

  • activity_score (float | None)

  • downstream_adduct_count (int)

  • downstream_repair_count (int)

  • score (float)

gene_id: str
activity_score: float | None
downstream_adduct_count: int
downstream_repair_count: int
score: float
ExposoGraph.graph_analysis.shortest_path(engine, source, target)[source]

Return the shortest path between source and target as a list of node IDs, or None if no path is found.

Operates on the undirected view of the graph so that both incoming and outgoing edges are considered.

Parameters:
  • engine (GraphEngine)

  • source (str)

  • target (str)

Return type:

list[str] | None
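
The effect of searching the undirected view can be illustrated with a small standard-library sketch. This is not the library's implementation: the helper name undirected_shortest_path, the plain (source, target) edge pairs, and the node IDs are all hypothetical, and a breadth-first search stands in for whatever NetworkX routine the module calls.

```python
from collections import deque


def undirected_shortest_path(edges, source, target):
    """Breadth-first search over an undirected view of directed edge pairs."""
    # Register every edge in both directions so direction is ignored.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    if source not in neighbors or target not in neighbors:
        return None

    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(neighbors[node]):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical directed edges; "BaP" has no outgoing edge at all.
edges = [("CYP1A1", "BaP"), ("CYP1A1", "BPDE"), ("BPDE", "BPDE_dG")]
path = undirected_shortest_path(edges, "BaP", "BPDE_dG")
print(path)
```

A strictly directed search would find nothing here, because "BaP" only has an incoming edge; the undirected view lets the path run backwards through "CYP1A1".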

ExposoGraph.graph_analysis.all_shortest_paths(engine, source, target)[source]

Return all shortest paths between source and target.

Parameters:
  • engine (GraphEngine)

  • source (str)

  • target (str)

Return type:

list[list[str]]

ExposoGraph.graph_analysis.centrality(engine, method='degree')[source]

Compute centrality scores for all nodes.

method must be one of "degree", "betweenness", or "closeness".

Parameters:
  • engine (GraphEngine)

  • method (str)

Return type:

dict[str, float]
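
As a point of reference for the "degree" method, NetworkX's degree centrality divides each node's degree by n - 1, where n is the number of nodes. The sketch below reproduces that normalization with the standard library only. It assumes in- and out-edges are both counted and that only nodes appearing in an edge are scored; the function name and node IDs are hypothetical, and the module's own behavior may differ.

```python
def degree_centrality(edges):
    """Degree centrality: each node's degree divided by (n - 1)."""
    degree = {}
    for u, v in edges:
        # Count the edge once at each endpoint (in-degree plus out-degree).
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    n = len(degree)
    if n <= 1:
        # Degenerate graph: avoid dividing by zero.
        return {node: 1.0 for node in degree}
    return {node: d / (n - 1) for node, d in degree.items()}


# Hypothetical edges: two enzymes act on one metabolite, which forms an adduct.
edges = [("CYP1A1", "BPDE"), ("GSTM1", "BPDE"), ("BPDE", "BPDE_dG")]
scores = degree_centrality(edges)
print(scores)
```

In this toy graph "BPDE" touches every other node, so it receives the maximum score of 1.0, while each remaining node scores 1/3.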

ExposoGraph.graph_analysis.metabolism_chain(engine, carcinogen_id)[source]

Extract the full metabolic chain for a carcinogen.

Traverses edges of type ACTIVATES, DETOXIFIES, TRANSPORTS, FORMS_ADDUCT, and REPAIRS that are annotated with the given carcinogen_id (via the carcinogen edge attribute) or that directly connect to the carcinogen node.

Parameters:
  • engine (GraphEngine)

  • carcinogen_id (str)

Return type:

MetabolismChain
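
The edge-selection rule described above can be approximated by a simple filter. This is a simplified sketch, not the library's traversal: the "source", "target", and "type" dictionary keys, the helper name select_chain_edges, and the node IDs are assumptions; only the carcinogen attribute and the five edge types come from the description.

```python
METABOLISM_TYPES = {"ACTIVATES", "DETOXIFIES", "TRANSPORTS", "FORMS_ADDUCT", "REPAIRS"}


def select_chain_edges(edges, carcinogen_id):
    """Keep metabolism edges labeled with, or touching, the carcinogen."""
    chain = []
    for edge in edges:
        if edge["type"] not in METABOLISM_TYPES:
            continue
        labeled = edge.get("carcinogen") == carcinogen_id
        touches = carcinogen_id in (edge["source"], edge["target"])
        if labeled or touches:
            chain.append(edge)
    return chain


# Hypothetical edge records.
edges = [
    {"source": "CYP1A1", "target": "BaP", "type": "ACTIVATES"},
    {"source": "BaP", "target": "BPDE", "type": "ACTIVATES", "carcinogen": "BaP"},
    {"source": "BPDE", "target": "BPDE_dG", "type": "FORMS_ADDUCT", "carcinogen": "BaP"},
    # Unlabeled branch that merely shares the upstream enzyme: excluded.
    {"source": "CYP1A1", "target": "Estradiol_Q", "type": "ACTIVATES"},
    # Not a metabolism edge type: excluded.
    {"source": "BaP", "target": "Lung_Pathway", "type": "PATHWAY"},
]

chain = select_chain_edges(edges, "BaP")
node_ids = sorted({n for e in chain for n in (e["source"], e["target"])})
# Presumably similar to what MetabolismChain.activation_edges exposes (an assumption).
activation = [e for e in chain if e["type"] == "ACTIVATES"]
print(node_ids, len(activation))
```

The unlabeled CYP1A1 edge to "Estradiol_Q" is dropped even though it shares an enzyme with the chain, which matches the behavior noted at the top of this page.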

ExposoGraph.graph_analysis.pathway_subgraph(engine, pathway_id)[source]

Return node IDs connected to pathway_id via PATHWAY edges.

Parameters:
  • engine (GraphEngine)

  • pathway_id (str)

Return type:

list[str]

ExposoGraph.graph_analysis.variant_impact_score(engine, gene_id)[source]

Compute a variant impact score for a gene node.

The score combines the gene node’s activity_score with the number of adduct-forming and repair paths reachable downstream of the gene. A low activity score combined with many downstream adducts yields a higher score; higher scores indicate a more impactful variant.

Returns None if gene_id is not in the graph.

Parameters:
  • engine (GraphEngine)

  • gene_id (str)

Return type:

VariantImpact | None
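
The exact scoring formula is not stated here, so the following is only a toy illustration of the idea: count the adduct and repair edges reachable downstream of a gene, then combine the counts with the activity score so that lower activity and more downstream adducts give a larger number. The helper names, the (source, target, type) tuples, the edge directions, the treatment of a missing activity score as 1.0, and the formula itself are all assumptions.

```python
def downstream_counts(edges, gene_id):
    """Count FORMS_ADDUCT and REPAIRS edges reachable from gene_id."""
    outgoing = {}
    for src, dst, etype in edges:
        outgoing.setdefault(src, []).append((dst, etype))

    seen, stack = {gene_id}, [gene_id]
    adducts = repairs = 0
    while stack:
        node = stack.pop()
        # Each node is expanded once, so each reachable edge is counted once.
        for dst, etype in outgoing.get(node, []):
            if etype == "FORMS_ADDUCT":
                adducts += 1
            elif etype == "REPAIRS":
                repairs += 1
            if dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return adducts, repairs


def toy_impact(activity_score, adducts, repairs):
    """Hypothetical formula: NOT the one used by variant_impact_score()."""
    activity = 1.0 if activity_score is None else activity_score
    return (1 + adducts + repairs) / (1.0 + activity)


# Hypothetical edges; the direction of the REPAIRS edge is an assumption.
edges = [
    ("GSTM1", "BPDE", "DETOXIFIES"),
    ("BPDE", "BPDE_dG", "FORMS_ADDUCT"),
    ("BPDE", "N7_G", "FORMS_ADDUCT"),
    ("BPDE_dG", "NER_complex", "REPAIRS"),
]
counts = downstream_counts(edges, "GSTM1")
print(counts, toy_impact(0.0, *counts), toy_impact(1.0, *counts))
```

With the same topology, the null-activity case (0.0) scores twice as high as the normal-activity case (1.0) under this toy formula, mirroring the qualitative behavior described above.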

ExposoGraph.graph_analysis.compute_edge_weights(engine)[source]

Compute topology-based edge weights for layout distance scaling.

Each edge receives a weight w = 1 / (1 + hop_count) where hop_count is the shortest-path length from the edge’s source to the nearest DNA_Adduct node via metabolism edges. Edges closer to an adduct endpoint receive higher weights (shorter ideal layout distance).

Returns a dict mapping "source->target" keys to weight floats.

Parameters:

engine (GraphEngine)

Return type:

dict[str, float]
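
The weight rule w = 1 / (1 + hop_count) can be sketched with a multi-source breadth-first search that starts at the adduct nodes and walks edges backwards. This is an illustration under stated assumptions, not the library's code: the function name edge_weights, the plain (source, target) pairs, and the node IDs are hypothetical, every edge is treated as a metabolism edge, and edges whose source cannot reach an adduct are simply left out because their handling is not described here.

```python
from collections import deque


def edge_weights(edges, adduct_nodes):
    """Weight each edge by 1 / (1 + hops from its source to the nearest adduct)."""
    # Reverse adjacency lets one BFS from the adducts reach all upstream nodes.
    reverse = {}
    for src, dst in edges:
        reverse.setdefault(dst, []).append(src)

    hops = {node: 0 for node in adduct_nodes}
    queue = deque(adduct_nodes)
    while queue:
        node = queue.popleft()
        for pred in reverse.get(node, []):
            if pred not in hops:
                hops[pred] = hops[node] + 1
                queue.append(pred)

    return {
        f"{src}->{dst}": 1.0 / (1 + hops[src])
        for src, dst in edges
        if src in hops  # skip edges with no route to an adduct
    }


# Hypothetical linear chain ending in one DNA adduct node.
edges = [("CYP1A1", "BaP"), ("BaP", "BPDE"), ("BPDE", "BPDE_dG")]
weights = edge_weights(edges, ["BPDE_dG"])
print(weights)
```

The edge that ends at the adduct has a source one hop away and gets weight 0.5; the weights then fall to 1/3 and 0.25 moving upstream.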