Reference: textgraphs package

Package definitions for the TextGraphs library.

See copyright/license: https://huggingface.co/spaces/DerwenAI/textgraphs/blob/main/README.md

TextGraphs class

Construct a lemma graph from the unstructured text source, then extract ranked phrases using a textgraph algorithm.
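
A minimal end-to-end sketch of the typical call sequence, following the "Make sure to call beforehand" ordering documented below; the sample text and default factory settings are illustrative assumptions:

    import textgraphs

    tg = textgraphs.TextGraphs(factory=textgraphs.PipelineFactory())

    # parse one paragraph-sized text input
    pipe = tg.create_pipeline("Werner Herzog is a German film director.")

    # collect graph elements, link entities, infer relations
    tg.collect_graph_elements(pipe)
    tg.perform_entity_linking(pipe)
    tg.infer_relations(pipe)

    # build the lemma graph, then rank the extracted phrases
    tg.construct_lemma_graph()
    tg.calc_phrase_ranks()

    for phrase in tg.get_phrases():
        print(phrase)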


infer_relations_async method

[source]

infer_relations_async(pipe, debug=False)

Gather triples representing inferred relations and build edges, running an async queue concurrently; see https://stackoverflow.com/questions/52582685/using-asyncio-queue-for-producer-consumer-flow

Make sure to call beforehand: TextGraphs.collect_graph_elements()

  • pipe : textgraphs.pipe.Pipeline
    configured pipeline for this document

  • debug : bool
    debugging flag

  • returns : typing.List[textgraphs.elem.Edge]
    a list of the inferred Edge objects
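
Since this method is a coroutine, it must be driven by an event loop. A minimal sketch, assuming pipe was produced by TextGraphs.create_pipeline() and collect_graph_elements() has already run:

    import asyncio

    # run the producer/consumer queue to completion
    inferred_edges = asyncio.run(tg.infer_relations_async(pipe))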


__init__ method

[source]

__init__(factory=None, iri_base="https://github.com/DerwenAI/textgraphs/ns/")

Constructor.

  • factory : typing.Optional[textgraphs.pipe.PipelineFactory]
    optional PipelineFactory used to configure components

  • iri_base : str
    base IRI used to generate namespace IRIs for elements of this graph

create_pipeline method

[source]

create_pipeline(text_input)

Use the pipeline factory to create a pipeline (e.g., a spaCy Document) for each text input, which is typically paragraph-length.

  • text_input : str
    raw text to be parsed by this pipeline

  • returns : textgraphs.pipe.Pipeline
    a configured pipeline


create_render method

[source]

create_render()

Create an object for rendering the graph in PyVis HTML+JavaScript.

  • returns : textgraphs.vis.RenderPyVis
    a configured RenderPyVis object for generating graph visualizations

collect_graph_elements method

[source]

collect_graph_elements(pipe, text_id=0, para_id=0, debug=False)

Collect the elements of a lemma graph from the results of running the textgraph algorithm. These elements include: parse dependencies, lemmas, entities, and noun chunks.

Make sure to call beforehand: TextGraphs.create_pipeline()

  • pipe : textgraphs.pipe.Pipeline
    configured pipeline for this document

  • text_id : int
    text (top-level document) identifier

  • para_id : int
paragraph identifier

  • debug : bool
    debugging flag


construct_lemma_graph method

[source]

construct_lemma_graph(debug=False)

Construct the base level of the lemma graph from the collected elements. This gets represented in NetworkX as a directed graph with parallel edges.

Make sure to call beforehand: TextGraphs.collect_graph_elements()

  • debug : bool
    debugging flag

perform_entity_linking method

[source]

perform_entity_linking(pipe, debug=False)

Perform entity linking based on the KnowledgeGraph object.

Make sure to call beforehand: TextGraphs.collect_graph_elements()

  • pipe : textgraphs.pipe.Pipeline
    configured pipeline for this document

  • debug : bool
    debugging flag


infer_relations method

[source]

infer_relations(pipe, debug=False)

Gather triples representing inferred relations and build edges.

Make sure to call beforehand: TextGraphs.collect_graph_elements()

  • pipe : textgraphs.pipe.Pipeline
    configured pipeline for this document

  • debug : bool
    debugging flag

  • returns : typing.List[textgraphs.elem.Edge]
    a list of the inferred Edge objects


calc_phrase_ranks method

[source]

calc_phrase_ranks(pr_alpha=0.85, debug=False)

Calculate the weights for each node in the lemma graph, then stack-rank the nodes so that entities have priority over lemmas.

Phrase ranks are normalized to sum to 1.0; these then represent the ranked entities extracted from the document.

Make sure to call beforehand: TextGraphs.construct_lemma_graph()

  • pr_alpha : float
    optional alpha parameter for the PageRank algorithm

  • debug : bool
    debugging flag


get_phrases method

[source]

get_phrases()

Return the entities extracted from the document.

Make sure to call beforehand: TextGraphs.calc_phrase_ranks()

  • yields :
    extracted entities

get_phrases_as_df method

[source]

get_phrases_as_df()

Return the ranked extracted entities as a dataframe.

Make sure to call beforehand: TextGraphs.calc_phrase_ranks()

  • returns : pandas.core.frame.DataFrame
    a pandas.DataFrame of the extracted entities

export_rdf method

[source]

export_rdf(lang="en")

Extract the entities and relations which have IRIs as RDF triples.

  • lang : str
    language identifier

  • returns : str
RDF triples in N3 (Turtle) format, as a string
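
A sketch that round-trips the exported triples through rdflib to validate them; the rdflib dependency is an assumption:

    import rdflib

    ttl_str = tg.export_rdf(lang="en")

    # parse the Turtle output back into an RDF graph
    g = rdflib.Graph()
    g.parse(data=ttl_str, format="turtle")
    print(len(g), "triples exported")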


denormalize_iri method

[source]

denormalize_iri(uri_ref)

Discern between a parsed entity and a linked entity.

  • returns : str
    lemma_key for a parsed entity, the full IRI for a linked entity

load_bootstrap_ttl method

[source]

load_bootstrap_ttl(ttl_str, debug=False)

Parse a TTL string with an RDF semantic graph representation to load bootstrap definitions for the lemma graph prior to parsing, e.g., for synonyms.

  • ttl_str : str
    RDF triples in TTL (Turtle/N3) format

  • debug : bool
    debugging flag


export_kuzu method

[source]

export_kuzu(zip_name="lemma.zip", debug=False)

Export a labeled property graph for KùzuDB (openCypher).

  • zip_name : str
    filename for the generated ZIP file

  • debug : bool
    debugging flag

  • returns : str
    name of the generated ZIP file

SimpleGraph class

An in-memory graph used to build a MultiDiGraph in NetworkX.


__init__ method

[source]

__init__()

Constructor.


reset method

[source]

reset()

Re-initialize the data structures, resetting all but the configuration.


make_node method

[source]

make_node(tokens, key, span, kind, text_id, para_id, sent_id, label=None, length=1, linked=True)

Look up and return a Node object. By default, matching keys get linked into the same node; otherwise a new node is instantiated if one does not already exist.

  • tokens : typing.List[textgraphs.elem.Node]
    list of parsed tokens

  • key : str
    lemma key (invariant)

  • span : spacy.tokens.token.Token
    token span for the parsed entity

  • kind : <enum 'NodeEnum'>
    the kind of this Node object

  • text_id : int
    text (top-level document) identifier

  • para_id : int
paragraph identifier

  • sent_id : int
    sentence identifier

  • label : typing.Optional[str]
    node label (for a new object)

  • length : int
    length of token span

  • linked : bool
    flag for whether this links to an entity

  • returns : textgraphs.elem.Node
    the constructed Node object


make_edge method

[source]

make_edge(src_node, dst_node, kind, rel, prob, key=None, debug=False)

Look up an edge, creating a new one if it does not already exist, or incrementing its count if it does.

  • src_node : textgraphs.elem.Node
    source node in the triple

  • dst_node : textgraphs.elem.Node
    destination node in the triple

  • kind : <enum 'RelEnum'>
    the kind of this Edge object

  • rel : str
    relation label

  • prob : float
    probability of this Edge within the graph

  • key : typing.Optional[str]
    lemma key (invariant); generate a key if this is not provided

  • debug : bool
    debugging flag

  • returns : typing.Optional[textgraphs.elem.Edge]
    the constructed Edge object; this may be None if the input parameters indicate skipping the edge
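
A hedged sketch of building a tiny graph by hand with make_node() and make_edge(); the NodeEnum.LEM and RelEnum.INF member names are assumptions to be checked against the enum definitions:

    import spacy
    import textgraphs

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Herzog films documentaries.")

    graph = textgraphs.SimpleGraph()
    tokens = []

    # create (or look up) one node per token span
    src = graph.make_node(
        tokens, textgraphs.Pipeline.get_lemma_key(doc[0]), doc[0],
        textgraphs.NodeEnum.LEM,  # assumed enum member
        text_id=0, para_id=0, sent_id=0,
    )
    dst = graph.make_node(
        tokens, textgraphs.Pipeline.get_lemma_key(doc[2]), doc[2],
        textgraphs.NodeEnum.LEM,  # assumed enum member
        text_id=0, para_id=0, sent_id=0,
    )

    # create an edge between them, or increment its count if it exists
    edge = graph.make_edge(
        src, dst,
        textgraphs.RelEnum.INF,  # assumed enum member
        rel="films", prob=1.0,
    )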


dump_lemma_graph method

[source]

dump_lemma_graph()

Dump the lemma graph as a JSON string in node-link format, suitable for serialization and subsequent use in JavaScript, Neo4j, Graphistry, etc.

Make sure to call beforehand: TextGraphs.calc_phrase_ranks()

  • returns : str
a JSON representation of the exported lemma graph, in node-link format
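
The node-link output can be consumed directly by NetworkX; a sketch:

    import json
    import networkx as nx

    data = json.loads(tg.dump_lemma_graph())

    # rebuild the directed graph with parallel edges from node-link data
    nx_graph = nx.node_link_graph(data, directed=True, multigraph=True)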

load_lemma_graph method

[source]

load_lemma_graph(json_str, debug=False)

Load the lemma graph from a JSON string in node-link format, i.e., a JSON representation previously exported by dump_lemma_graph().

  • json_str : str
    a JSON representation of the exported lemma graph in node-link format

  • debug : bool
    debugging flag

Node class

A data class representing one node, i.e., an extracted phrase.


__repr__ method

[source]

__repr__()

get_linked_label method

[source]

get_linked_label()

When this node has a linked entity, return that IRI. Otherwise return its label value.

  • returns : typing.Optional[str]
    a label for the linked entity

get_name method

[source]

get_name()

Return a brief name for the graphical depiction of this Node.

  • returns : str
    brief label to be used in a graph

get_stacked_count method

[source]

get_stacked_count()

Return a modified count, to redact verbs and linked entities from the stack-rank partitions.

  • returns : int
    count, used for re-ranking extracted entities

get_pos method

[source]

get_pos()

Generate a position span for OpenNRE.

  • returns : typing.Tuple[int, int]
    a position span needed for OpenNRE relation extraction

Edge class

A data class representing an edge between two nodes.


__repr__ method

[source]

__repr__()

EnumBase class

A mixin for Enum codecs.

NodeEnum class

Enumeration for the kinds of node categories.

RelEnum class

Enumeration for the kinds of edge relations.

PipelineFactory class

Factory pattern for building a pipeline, which is one of the more expensive operations with spaCy.


__init__ method

[source]

__init__(spacy_model="en_core_web_sm", ner=None, kg=<default textgraphs.pipe.KnowledgeGraph object>, infer_rels=[])

Constructor which instantiates the spaCy pipelines:

  • tok_pipe -- regular generator for parsed tokens
  • ner_pipe -- with entities merged
  • aux_pipe -- spotlight entity linking

which will be needed for parsing and entity linking.

  • spacy_model : str
    the specific model to use in spaCy pipelines

  • ner : typing.Optional[textgraphs.pipe.Component]
    optional custom NER component

  • kg : textgraphs.pipe.KnowledgeGraph
    knowledge graph used for entity linking

  • infer_rels : typing.List[textgraphs.pipe.InferRel]
    a list of components for inferring relations
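
A sketch of a fully customized factory, wiring together components documented elsewhere on this page (package-level re-exports are assumed):

    import textgraphs

    factory = textgraphs.PipelineFactory(
        spacy_model="en_core_web_sm",
        ner=textgraphs.NERSpanMarker(),      # custom NER component
        kg=textgraphs.KGWikiMedia(),         # WikiMedia-backed entity linking
        infer_rels=[
            textgraphs.InferRel_OpenNRE(),   # OpenNRE relation extraction
            textgraphs.InferRel_Rebel(),     # REBEL relation extraction
        ],
    )

    tg = textgraphs.TextGraphs(factory=factory)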


create_pipeline method

[source]

create_pipeline(text_input)

Instantiate the document pipelines needed to parse the input text.

  • text_input : str
    raw text to be parsed

  • returns : textgraphs.pipe.Pipeline
    a configured Pipeline object

Pipeline class

Manage parsing of a document, which is assumed to be paragraph-sized.


__init__ method

[source]

__init__(text_input, tok_pipe, ner_pipe, aux_pipe, kg, infer_rels)

Constructor.

  • text_input : str
    raw text to be parsed

  • tok_pipe : spacy.language.Language
    the spaCy.Language pipeline used for tallying individual tokens

  • ner_pipe : spacy.language.Language
    the spaCy.Language pipeline used for tallying named entities

  • aux_pipe : spacy.language.Language
    the spaCy.Language pipeline used for auxiliary components (e.g., DBPedia Spotlight)

  • kg : textgraphs.pipe.KnowledgeGraph
    knowledge graph used for entity linking

  • infer_rels : typing.List[textgraphs.pipe.InferRel]
    a list of components for inferring relations


get_lemma_key classmethod

[source]

get_lemma_key(span, placeholder=False)

Compose a unique, invariant lemma key for the given span.

  • span : typing.Union[spacy.tokens.span.Span, spacy.tokens.token.Token]
    span of tokens within the lemma

  • placeholder : bool
    flag for whether to create a placeholder

  • returns : str
    a composed lemma key
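
Since this is a classmethod, it can be called without instantiating a Pipeline; a sketch:

    import spacy
    import textgraphs

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Werner Herzog films documentaries.")

    # lemma key for a single token
    key = textgraphs.Pipeline.get_lemma_key(doc[0])

    # lemma key for a multi-token span
    span_key = textgraphs.Pipeline.get_lemma_key(doc[0:2])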


get_ent_lemma_keys method

[source]

get_ent_lemma_keys()

Iterate through the fully qualified lemma keys for an extracted entity.

  • yields :
    the lemma keys within an extracted entity

link_noun_chunks method

[source]

link_noun_chunks(nodes, debug=False)

Link any noun chunks which are not already subsumed by named entities.

  • nodes : dict
    dictionary of Node objects in the graph

  • debug : bool
    debugging flag

  • returns : typing.List[textgraphs.elem.NounChunk]
    a list of identified noun chunks which are novel


iter_entity_pairs method

[source]

iter_entity_pairs(pipe_graph, max_skip, debug=True)

Iterator for entity pairs for which the algorithm infers relations.

  • pipe_graph : networkx.classes.multigraph.MultiGraph
    a networkx.MultiGraph representation of the graph, reused for graph algorithms

  • max_skip : int
    maximum distance between entities for inferred relations

  • debug : bool
    debugging flag

  • yields :
    pairs of entities within a range, e.g., to use for relation extraction

Component class

Abstract base class for a spaCy pipeline component.


augment_pipe method

[source]

augment_pipe(factory)

Encapsulate a spaCy call to add_pipe() configuration.

  • factory : PipelineFactory
    a PipelineFactory used to configure components

NERSpanMarker class

Configures a spaCy pipeline component for SpanMarkerNER.


__init__ method

[source]

__init__(ner_model="tomaarsen/span-marker-roberta-large-ontonotes5")

Constructor.

  • ner_model : str
    model to be used in SpanMarker

augment_pipe method

[source]

augment_pipe(factory)

Encapsulate a spaCy call to add_pipe() configuration.

  • factory : textgraphs.pipe.PipelineFactory
    the PipelineFactory used to configure this pipeline component

NounChunk class

A data class representing one noun chunk, i.e., a candidate as an extracted phrase.


__repr__ method

[source]

__repr__()

KnowledgeGraph class

Base class for a knowledge graph interface.


augment_pipe method

[source]

augment_pipe(factory)

Encapsulate a spaCy call to add_pipe() configuration.

  • factory : PipelineFactory
    a PipelineFactory used to configure components

remap_ner method

[source]

remap_ner(label)

Remap the OntoTypes4 values from NER output to more general-purpose IRIs.

  • label : typing.Optional[str]
    input NER label, an OntoTypes4 value

  • returns : typing.Optional[str]
    an IRI for the named entity


normalize_prefix method

[source]

normalize_prefix(iri, debug=False)

Normalize the given IRI to use standard namespace prefixes.

  • iri : str
    input IRI, in fully-qualified domain representation

  • debug : bool
    debugging flag

  • returns : str
    the compact IRI representation, using an RDF namespace prefix


perform_entity_linking method

[source]

perform_entity_linking(graph, pipe, debug=False)

Perform entity linking based on "spotlight" and other services.

  • graph : textgraphs.graph.SimpleGraph
    source graph

  • pipe : Pipeline
    configured pipeline for the current document

  • debug : bool
    debugging flag


resolve_rel_iri method

[source]

resolve_rel_iri(rel, lang="en", debug=False)

Resolve a rel string from a relation extraction model which has been trained on this knowledge graph.

  • rel : str
relation label; many RE projects generate these from Wikidata

  • lang : str
    language identifier

  • debug : bool
    debugging flag

  • returns : typing.Optional[str]
    a resolved IRI

KGSearchHit class

A data class representing a hit from a knowledge graph search.


__repr__ method

[source]

__repr__()

KGWikiMedia class

Manage access to WikiMedia-related APIs.


__init__ method

[source]

__init__(spotlight_api="https://api.dbpedia-spotlight.org/en", dbpedia_search_api="https://lookup.dbpedia.org/api/search", dbpedia_sparql_api="https://dbpedia.org/sparql", wikidata_api="https://www.wikidata.org/w/api.php", ner_map=<default NER map, listed below>, ns_prefix=<default namespace prefixes, listed below>, min_alias=0.8, min_similarity=0.9)

The default ner_map maps OntoTypes4 NER labels to IRIs:

  • CARDINAL : http://dbpedia.org/resource/Cardinal_number -- "cardinal number"; numerals that do not fall under another type
  • DATE : http://dbpedia.org/ontology/date -- "date"; absolute or relative dates or periods
  • EVENT : http://dbpedia.org/ontology/Event -- "event"; named hurricanes, battles, wars, sports events, etc.
  • FAC : http://dbpedia.org/ontology/Infrastructure -- "infrastructure"; buildings, airports, highways, bridges, etc.
  • GPE : http://dbpedia.org/ontology/Country -- "country"; countries, cities, states
  • LANGUAGE : http://dbpedia.org/ontology/Language -- "language"; any named language
  • LAW : http://dbpedia.org/ontology/Law -- "law"; named documents made into laws
  • LOC : http://dbpedia.org/ontology/Place -- "place"; non-GPE locations, mountain ranges, bodies of water
  • MONEY : http://dbpedia.org/resource/Money -- "money"; monetary values, including unit
  • NORP : http://dbpedia.org/ontology/nationality -- "nationality"; nationalities or religious or political groups
  • ORDINAL : http://dbpedia.org/resource/Ordinal_number -- "ordinal number"; i.e., first, second, etc.
  • ORG : http://dbpedia.org/ontology/Organisation -- "organization"; companies, agencies, institutions, etc.
  • PERCENT : http://dbpedia.org/resource/Percentage -- "percentage"
  • PERSON : http://dbpedia.org/ontology/Person -- "person"; people, including fictional
  • PRODUCT : http://dbpedia.org/ontology/product -- "product"; vehicles, weapons, foods, etc. (not services)
  • QUANTITY : http://dbpedia.org/resource/Quantity -- "quantity"; measurements, as of weight or distance
  • TIME : http://dbpedia.org/ontology/time -- "time"; times smaller than a day
  • WORK OF ART : http://dbpedia.org/resource/Work_of_art -- "work of art"; titles of books, songs, etc.

The default ns_prefix maps RDF namespace prefixes:

  • dbc : http://dbpedia.org/resource/Category:
  • dbt : http://dbpedia.org/resource/Template:
  • dbr : http://dbpedia.org/resource/
  • yago : http://dbpedia.org/class/yago/
  • dbd : http://dbpedia.org/datatype/
  • dbo : http://dbpedia.org/ontology/
  • dbp : http://dbpedia.org/property/
  • units : http://dbpedia.org/units/
  • dbpedia-commons : http://commons.dbpedia.org/resource/
  • dbpedia-wikicompany : http://dbpedia.openlinksw.com/wikicompany/
  • dbpedia-wikidata : http://wikidata.dbpedia.org/resource/
  • wd : http://www.wikidata.org/
  • wd_ent : http://www.wikidata.org/entity/
  • rdf : http://www.w3.org/1999/02/22-rdf-syntax-ns#
  • schema : https://schema.org/
  • owl : http://www.w3.org/2002/07/owl#

Constructor.

  • spotlight_api : str
    DBPedia Spotlight API or equivalent local service

  • dbpedia_search_api : str
    DBPedia Search API or equivalent local service

  • dbpedia_sparql_api : str
    DBPedia SPARQL API or equivalent local service

  • wikidata_api : str
    Wikidata Search API or equivalent local service

  • ner_map : dict
    named entity map for standardizing IRIs

  • ns_prefix : dict
    RDF namespace prefixes

  • min_alias : float
    minimum alias probability threshold for accepting linked entities

  • min_similarity : float
    minimum label similarity threshold for accepting linked entities
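
The endpoint parameters allow swapping in self-hosted services; a sketch pointing Spotlight at a local container, where the localhost URL is only a placeholder:

    import textgraphs

    kg = textgraphs.KGWikiMedia(
        spotlight_api="http://localhost:2222/rest",  # placeholder local Spotlight endpoint
        min_alias=0.8,
        min_similarity=0.9,
    )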


augment_pipe method

[source]

augment_pipe(factory)

Encapsulate a spaCy call to add_pipe() configuration.

  • factory : textgraphs.pipe.PipelineFactory
    a PipelineFactory used to configure components

remap_ner method

[source]

remap_ner(label)

Remap the OntoTypes4 values from NER output to more general-purpose IRIs.

  • label : typing.Optional[str]
    input NER label, an OntoTypes4 value

  • returns : typing.Optional[str]
    an IRI for the named entity


normalize_prefix method

[source]

normalize_prefix(iri, debug=False)

Normalize the given IRI using the standard DBPedia namespace prefixes.

  • iri : str
    input IRI, in fully-qualified domain representation

  • debug : bool
    debugging flag

  • returns : str
    the compact IRI representation, using an RDF namespace prefix


perform_entity_linking method

[source]

perform_entity_linking(graph, pipe, debug=False)

Perform entity linking based on DBPedia Spotlight and other services.

  • graph : textgraphs.graph.SimpleGraph
    source graph

  • pipe : textgraphs.pipe.Pipeline
    configured pipeline for the current document

  • debug : bool
    debugging flag


resolve_rel_iri method

[source]

resolve_rel_iri(rel, lang="en", debug=False)

Resolve a rel string from a relation extraction model which has been trained on this knowledge graph, which defaults to using the WikiMedia graphs.

  • rel : str
relation label; many RE projects generate these from Wikidata

  • lang : str
    language identifier

  • debug : bool
    debugging flag

  • returns : typing.Optional[str]
    a resolved IRI


wikidata_search method

[source]

wikidata_search(query, lang="en", debug=False)

Query the Wikidata search API.

  • query : str
    query string

  • lang : str
    language identifier

  • debug : bool
    debugging flag

  • returns : typing.Optional[textgraphs.elem.KGSearchHit]
    search hit, if any


dbpedia_search_entity method

[source]

dbpedia_search_entity(query, lang="en", debug=False)

Perform a DBPedia API search.

  • query : str
    query string

  • lang : str
    language identifier

  • debug : bool
    debugging flag

  • returns : typing.Optional[textgraphs.elem.KGSearchHit]
    search hit, if any


dbpedia_sparql_query method

[source]

dbpedia_sparql_query(sparql, debug=False)

Perform a SPARQL query on DBPedia.

  • sparql : str
    SPARQL query string

  • debug : bool
    debugging flag

  • returns : dict
    dictionary of query results
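
A sketch of a direct SPARQL lookup, where the query itself is only an illustration:

    sparql = """
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    SELECT ?equiv WHERE {
        <http://dbpedia.org/resource/Werner_Herzog> owl:sameAs ?equiv
    } LIMIT 10
    """

    results = kg.dbpedia_sparql_query(sparql)
    print(results)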


dbpedia_wikidata_equiv method

[source]

dbpedia_wikidata_equiv(dbpedia_iri, debug=False)

Perform a SPARQL query on DBPedia to find an equivalent Wikidata entity.

  • dbpedia_iri : str
    IRI in DBpedia

  • debug : bool
    debugging flag

  • returns : typing.Optional[str]
    equivalent IRI in Wikidata
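
A sketch chaining the search and equivalence helpers; the iri attribute on the returned hit is an assumption based on the KGSearchHit data class:

    hit = kg.dbpedia_search_entity("Werner Herzog", lang="en")

    if hit is not None:
        # map the DBPedia IRI to its Wikidata equivalent
        wd_iri = kg.dbpedia_wikidata_equiv(hit.iri)  # iri field assumed
        print(wd_iri)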

LinkedEntity class

A data class representing one linked entity.


__repr__ method

[source]

__repr__()

InferRel class

Abstract base class for a relation extraction model wrapper.


gen_triples_async method

[source]

gen_triples_async(pipe, queue, debug=False)

Infer relations concurrently, producing triples to a queue.

  • pipe : Pipeline
    configured pipeline for the current document

  • queue : asyncio.queues.Queue
    queue of inference tasks to be performed

  • debug : bool
    debugging flag


gen_triples method

[source]

gen_triples(pipe, debug=False)

Infer relations iteratively, yielding triples through a generator.

  • pipe : Pipeline
    configured pipeline for the current document

  • debug : bool
    debugging flag

  • yields :
    generated triples

InferRel_OpenNRE class

Perform relation extraction based on the OpenNRE model. https://github.com/thunlp/OpenNRE


__init__ method

[source]

__init__(model="wiki80_cnn_softmax", max_skip=11, min_prob=0.9)

Constructor.

  • model : str
    the specific model to be used in OpenNRE

  • max_skip : int
    maximum distance between entities for inferred relations

  • min_prob : float
    minimum probability threshold for accepting an inferred relation


gen_triples method

[source]

gen_triples(pipe, debug=False)

Iterate through entity pairs to drive OpenNRE, yielding inferred relations as triples through a generator.

  • pipe : textgraphs.pipe.Pipeline
    configured pipeline for the current document

  • debug : bool
    debugging flag

  • yields :
    generated triples as candidates for inferred relations

InferRel_Rebel class

Perform relation extraction based on the REBEL model. https://github.com/Babelscape/rebel https://huggingface.co/spaces/Babelscape/mrebel-demo


__init__ method

[source]

__init__(lang="en_XX", mrebel_model="Babelscape/mrebel-large")

Constructor.

  • lang : str
    language identifier

  • mrebel_model : str
    tokenizer model to be used


tokenize_sent method

[source]

tokenize_sent(text)

Apply the tokenizer manually, since we need to extract special tokens.

  • text : str
    input text for the sentence to be tokenized

  • returns : str
    extracted tokens


extract_triplets_typed method

[source]

extract_triplets_typed(text)

Parse the generated text and extract its triplets.

  • text : str
    input text for the sentence to use in inference

  • returns : list
    a list of extracted triples


gen_triples method

[source]

gen_triples(pipe, debug=False)

Drive REBEL to infer relations for each sentence, yielding them as triples through a generator.

  • pipe : textgraphs.pipe.Pipeline
    configured pipeline for the current document

  • debug : bool
    debugging flag

  • yields :
    generated triples as candidates for inferred relations

RenderPyVis class

Render the lemma graph as a PyVis network.


__init__ method

[source]

__init__(graph, kg)

Constructor.

  • graph : textgraphs.graph.SimpleGraph
    source graph to be visualized

  • kg : textgraphs.pipe.KnowledgeGraph
    knowledge graph used for entity linking


render_lemma_graph method

[source]

render_lemma_graph(debug=True)

Prepare the structure of the NetworkX graph, then use it to build and return a PyVis network for rendering.

Make sure to call beforehand: TextGraphs.calc_phrase_ranks()

  • debug : bool
    debugging flag

  • returns : pyvis.network.Network
a pyvis.network.Network interactive visualization
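
A sketch that writes the interactive visualization to an HTML file, using the standard pyvis API:

    render = tg.create_render()
    pv_net = render.render_lemma_graph(debug=False)

    # standard pyvis call to generate an interactive HTML page
    pv_net.save_graph("lemma.html")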


draw_communities method

[source]

draw_communities(spring_distance=1.4, debug=False)

Cluster the communities in the lemma graph, then draw a NetworkX graph of the nodes with a specific color for each community.

Make sure to call beforehand: TextGraphs.calc_phrase_ranks()

  • spring_distance : float
    NetworkX parameter used to separate clusters visually

  • debug : bool
    debugging flag

  • returns : typing.Dict[int, int]
    a map of the calculated communities


generate_wordcloud method

[source]

generate_wordcloud(background="black")

Generate a tag cloud from the given phrases.

Make sure to call beforehand: TextGraphs.calc_phrase_ranks()

  • background : str
    background color for the rendering

  • returns : wordcloud.wordcloud.WordCloud
    the rendering as a wordcloud.WordCloud object, which can be used to generate PNG images, etc.
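
A sketch that saves the rendering as a PNG, using the standard wordcloud API:

    wc = render.generate_wordcloud(background="black")
    wc.to_file("wordcloud.png")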

NodeStyle class

Dataclass used for styling PyVis nodes.


__setattr__ method

[source]

__setattr__(name, value)

GraphOfRelations class

Attempt to reproduce results published in "INGRAM: Inductive Knowledge Graph Embedding via Relation Graphs" https://arxiv.org/abs/2305.19987


__init__ method

[source]

__init__(source)

Constructor.

  • source : textgraphs.graph.SimpleGraph
    source graph to be transformed

load_ingram method

[source]

load_ingram(json_file, debug=False)

Load data for a source graph, as illustrated in lee2023ingram.

  • json_file : pathlib.Path
    path for the JSON dataset to load

  • debug : bool
    debugging flag


seeds method

[source]

seeds(debug=False)

Prep data for the topological transform illustrated in lee2023ingram.

  • debug : bool
    debugging flag

trace_source_graph method

[source]

trace_source_graph()

Output a "seed" representation of the source graph.


construct_gor method

[source]

construct_gor(debug=False)

Perform the topological transform described by lee2023ingram, constructing a graph of relations (GOR) and calculating affinity scores between entities in the GOR based on their definitions:

"we measure the affinity between two relations by considering how many entities are shared between them and how frequently they share the same entity"

  • debug : bool
    debugging flag

tally_frequencies classmethod

[source]

tally_frequencies(counter)

Tally the frequency of shared entities.

  • counter : collections.Counter
    counter data collection for the rel_b/entity pairs

  • returns : int
    tallied values for one relation


get_affinity_scores method

[source]

get_affinity_scores(debug=False)

Reproduce metrics based on the example published in lee2023ingram.

  • debug : bool
    debugging flag

  • returns : typing.Dict[tuple, float]
    the calculated affinity scores
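
A sketch of the full transform on a parsed document, assuming a TextGraphs object can serve as the SimpleGraph source:

    gor = textgraphs.GraphOfRelations(tg)  # tg assumed usable as a SimpleGraph source

    gor.seeds()
    gor.construct_gor()

    scores = gor.get_affinity_scores()
    pv_net = gor.render_gor_pyvis(scores)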


trace_metrics method

[source]

trace_metrics(scores)

Compare the calculated affinity scores with results from a published example.

  • scores : typing.Dict[tuple, float]
    the calculated affinity scores between pairs of relations (i.e., observed values)

  • returns : pandas.core.frame.DataFrame
    a pandas.DataFrame where the rows compare expected vs. observed affinity scores


render_gor_plt method

[source]

render_gor_plt(scores)

Visualize the graph of relations using matplotlib.

  • scores : typing.Dict[tuple, float]
    the calculated affinity scores between pairs of relations (i.e., observed values)

render_gor_pyvis method

[source]

render_gor_pyvis(scores)

Visualize the graph of relations interactively using PyVis.

  • scores : typing.Dict[tuple, float]
    the calculated affinity scores between pairs of relations (i.e., observed values)

  • returns : pyvis.network.Network
a pyvis.network.Network representation of the transformed graph

TransArc class

A data class representing one transformed rel-node-rel triple in a graph of relations.


__repr__ method

[source]

__repr__()

RelDir class

Enumeration for the directions of a relation.

SheafSeed class

A data class representing a node from the source graph plus its partial edge, based on a Sheaf Theory decomposition of a graph.


__repr__ method

[source]

__repr__()

Affinity class

A data class representing the affinity scores from one entity in the transformed graph of relations.

NB: there are much more efficient ways to calculate these affinity scores using sparse tensor algebra; this approach illustrates the process -- for research and debugging.


__repr__ method

[source]

__repr__()

module functions


calc_quantile_bins function

[source]

calc_quantile_bins(num_rows)

Calculate the bins to use for a quantile stripe, using numpy.linspace.

  • num_rows : int
    number of rows in the target dataframe

  • returns : numpy.ndarray
    calculated bins, as a numpy.ndarray


get_repo_version function

[source]

get_repo_version()

Access the Git repository information and return items to identify the version/commit running in production.

  • returns : typing.Tuple[str, str]
    version tag and commit hash

root_mean_square function

[source]

root_mean_square(values)

Calculate the root mean square of the values in the given list.

  • values : typing.List[float]
    list of values to use in the RMS calculation

  • returns : float
    RMS metric as a float


stripe_column function

[source]

stripe_column(values, bins)

Stripe a column in a dataframe, by interpolating quantiles into a set of discrete indexes.

  • values : list
    list of values to stripe

  • bins : int
    quantile bins; see calc_quantile_bins()

  • returns : numpy.ndarray
    the striped column values, as a numpy.ndarray
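
A sketch using these helpers together; their availability at package level is an assumption (otherwise import them from the defining module):

    import textgraphs

    values = [0.1, 0.4, 0.4, 0.7, 0.9]

    # discrete quantile bins sized for the number of rows
    bins = textgraphs.calc_quantile_bins(len(values))

    # interpolate the values into the quantile bins
    striped = textgraphs.stripe_column(values, bins)

    print(striped, textgraphs.root_mean_square(values))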


module types