Reference: textgraphs
package¶
Package definitions for the TextGraphs
library.
see copyright/license https://huggingface.co/spaces/DerwenAI/textgraphs/blob/main/README.md
TextGraphs
class¶
Construct a lemma graph from the unstructured text source,
then extract ranked phrases using a textgraph
algorithm.
infer_relations_async
method¶
infer_relations_async(pipe, debug=False)
Gather triples representing inferred relations and build edges, concurrently by running an async queue. https://stackoverflow.com/questions/52582685/using-asyncio-queue-for-producer-consumer-flow
Make sure to call beforehand: TextGraphs.collect_graph_elements()
-
pipe
:textgraphs.pipe.Pipeline
configured pipeline for this document -
debug
:bool
debugging flag -
returns :
typing.List[textgraphs.elem.Edge]
a list of the inferred Edge objects
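The producer/consumer flow referenced above can be sketched with a plain asyncio.Queue, independent of the textgraphs internals; the producer, consumer, and edge-tuple shapes here are illustrative only:

```python
import asyncio

async def producer(queue: asyncio.Queue, items: list) -> None:
    # enqueue one inference task per item, then signal completion
    for item in items:
        await queue.put(item)
    await queue.put(None)  # sentinel: no more work

async def consumer(queue: asyncio.Queue, results: list) -> None:
    # drain the queue, building one "edge" per inferred triple
    while True:
        item = await queue.get()
        if item is None:
            break
        results.append(("edge", item))

async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    results: list = []
    # run producer and consumer concurrently over the shared queue
    await asyncio.gather(
        producer(queue, ["rel_1", "rel_2", "rel_3"]),
        consumer(queue, results),
    )
    return results

edges = asyncio.run(main())
```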
__init__
method¶
__init__(factory=None, iri_base="https://github.com/DerwenAI/textgraphs/ns/")
Constructor.
factory
:typing.Optional[textgraphs.pipe.PipelineFactory]
optional PipelineFactory used to configure components
create_pipeline
method¶
create_pipeline(text_input)
Use the pipeline factory to create a pipeline (e.g., a spaCy.Document)
for each text input, which is typically paragraph-length.
-
text_input
:str
raw text to be parsed by this pipeline -
returns :
textgraphs.pipe.Pipeline
a configured pipeline
create_render
method¶
create_render()
Create an object for rendering the graph in PyVis
HTML+JavaScript.
- returns :
textgraphs.vis.RenderPyVis
a configured RenderPyVis object for generating graph visualizations
collect_graph_elements
method¶
collect_graph_elements(pipe, text_id=0, para_id=0, debug=False)
Collect the elements of a lemma graph from the results of running
the textgraph
algorithm. These elements include: parse dependencies,
lemmas, entities, and noun chunks.
Make sure to call beforehand: TextGraphs.create_pipeline()
-
pipe
:textgraphs.pipe.Pipeline
configured pipeline for this document -
text_id
:int
text (top-level document) identifier -
para_id
:int
paragraph identifier -
debug
:bool
debugging flag
construct_lemma_graph
method¶
construct_lemma_graph(debug=False)
Construct the base level of the lemma graph from the collected
elements. This gets represented in NetworkX
as a directed graph
with parallel edges.
Make sure to call beforehand: TextGraphs.collect_graph_elements()
debug
:bool
debugging flag
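The underlying representation can be sketched directly in NetworkX as a directed graph with parallel edges; the node names and relation labels below are invented for illustration:

```python
import networkx as nx

# a directed graph which permits parallel edges between the same pair of nodes
g = nx.MultiDiGraph()
g.add_node("werner_herzog", kind="entity")
g.add_node("filmmaker", kind="lemma")

# two distinct relations between the same node pair coexist as parallel edges,
# distinguished by their keys
g.add_edge("werner_herzog", "filmmaker", key="dep", rel="attr")
g.add_edge("werner_herzog", "filmmaker", key="inf", rel="occupation")

print(g.number_of_edges("werner_herzog", "filmmaker"))  # 2
```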
perform_entity_linking
method¶
perform_entity_linking(pipe, debug=False)
Perform entity linking based on the KnowledgeGraph
object.
Make sure to call beforehand: TextGraphs.collect_graph_elements()
-
pipe
:textgraphs.pipe.Pipeline
configured pipeline for this document -
debug
:bool
debugging flag
infer_relations
method¶
infer_relations(pipe, debug=False)
Gather triples representing inferred relations and build edges.
Make sure to call beforehand: TextGraphs.collect_graph_elements()
-
pipe
:textgraphs.pipe.Pipeline
configured pipeline for this document -
debug
:bool
debugging flag -
returns :
typing.List[textgraphs.elem.Edge]
a list of the inferred Edge objects
calc_phrase_ranks
method¶
calc_phrase_ranks(pr_alpha=0.85, debug=False)
Calculate the weights for each node in the lemma graph, then stack-rank the nodes so that entities have priority over lemmas.
Phrase ranks are normalized to sum to 1.0 and these now represent the ranked entities extracted from the document.
Make sure to call beforehand: TextGraphs.construct_lemma_graph()
-
pr_alpha
:float
optional alpha parameter for the PageRank algorithm -
debug
:bool
debugging flag
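A rough sketch of this ranking step using NetworkX PageRank; this is not the library's exact implementation, and the toy graph and the explicit renormalization are assumptions:

```python
import networkx as nx

# toy lemma graph: nodes are lemmas/entities, edges are parse dependencies
g = nx.DiGraph()
g.add_edges_from([("see", "run"), ("run", "dog"), ("dog", "see"), ("dog", "run")])

# alpha is the PageRank damping factor (the pr_alpha parameter above)
ranks = nx.pagerank(g, alpha=0.85)

# normalize so the ranks sum to 1.0; pagerank already returns a distribution,
# but renormalizing keeps this robust after any filtering step
total = sum(ranks.values())
normed = {node: rank / total for node, rank in ranks.items()}

# stack-rank: highest-weight nodes first
ranked = sorted(normed.items(), key=lambda kv: kv[1], reverse=True)
```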
get_phrases
method¶
get_phrases()
Return the entities extracted from the document.
Make sure to call beforehand: TextGraphs.calc_phrase_ranks()
- yields :
extracted entities
get_phrases_as_df
method¶
get_phrases_as_df()
Return the ranked extracted entities as a dataframe.
Make sure to call beforehand: TextGraphs.calc_phrase_ranks()
- returns :
pandas.core.frame.DataFrame
a pandas.DataFrame of the extracted entities
export_rdf
method¶
export_rdf(lang="en")
Extract the entities and relations which have IRIs as RDF triples.
-
lang
:str
language identifier -
returns :
str
RDF triples in N3 (Turtle) format, as a string
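The shape of Turtle output can be illustrated by composing prefix declarations and triple statements by hand; this is not the library's serializer, and the to_turtle helper, prefixes, and triple shown are assumptions for illustration:

```python
def to_turtle(triples, prefixes):
    # emit @prefix declarations, then one triple statement per line
    lines = [f"@prefix {p}: <{iri}> ." for p, iri in prefixes.items()]
    lines += [f"{s} {p} {o} ." for s, p, o in triples]
    return "\n".join(lines)

ttl = to_turtle(
    [("dbr:Werner_Herzog", "rdfs:label", '"Werner Herzog"@en')],
    {
        "dbr": "http://dbpedia.org/resource/",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    },
)
print(ttl)
```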
denormalize_iri
method¶
denormalize_iri(uri_ref)
Discern between a parsed entity and a linked entity.
- returns :
str
the lemma_key for a parsed entity, or the full IRI for a linked entity
load_bootstrap_ttl
method¶
load_bootstrap_ttl(ttl_str, debug=False)
Parse a TTL string with an RDF semantic graph representation to load bootstrap definitions for the lemma graph prior to parsing, e.g., for synonyms.
-
ttl_str
:str
RDF triples in TTL (Turtle/N3) format -
debug
:bool
debugging flag
export_kuzu
method¶
export_kuzu(zip_name="lemma.zip", debug=False)
Export a labeled property graph for KùzuDB (openCypher).
-
zip_name
:str
name of the ZIP file to be generated -
debug
:bool
debugging flag -
returns :
str
name of the generated ZIP file
SimpleGraph
class¶
An in-memory graph used to build a MultiDiGraph
in NetworkX.
__init__
method¶
__init__()
Constructor.
reset
method¶
reset()
Re-initialize the data structures, resetting all but the configuration.
make_node
method¶
make_node(tokens, key, span, kind, text_id, para_id, sent_id, label=None, length=1, linked=True)
Lookup and return a Node
object.
By default, link matching keys into the same node.
Otherwise instantiate a new node if it does not exist already.
-
tokens
:typing.List[textgraphs.elem.Node]
list of parsed tokens -
key
:str
lemma key (invariant) -
span
:spacy.tokens.token.Token
token span for the parsed entity -
kind
:<enum 'NodeEnum'>
the kind of this Node object -
text_id
:int
text (top-level document) identifier -
para_id
:int
paragraph identifier -
sent_id
:int
sentence identifier -
label
:typing.Optional[str]
node label (for a new object) -
length
:int
length of token span -
linked
:bool
flag for whether this links to an entity -
returns :
textgraphs.elem.Node
the constructed Node object
make_edge
method¶
make_edge(src_node, dst_node, kind, rel, prob, key=None, debug=False)
Lookup an edge, creating a new one if it does not exist already, and increment the count if it does.
-
src_node
:textgraphs.elem.Node
source node in the triple -
dst_node
:textgraphs.elem.Node
destination node in the triple -
kind
:<enum 'RelEnum'>
the kind of this Edge object -
rel
:str
relation label -
prob
:float
probability of this Edge within the graph -
key
:typing.Optional[str]
lemma key (invariant); generate a key if this is not provided -
debug
:bool
debugging flag -
returns :
typing.Optional[textgraphs.elem.Edge]
the constructed Edge object; this may be None if the input parameters indicate skipping the edge
dump_lemma_graph
method¶
dump_lemma_graph()
Dump the lemma graph as a JSON string in node-link format, suitable for serialization and subsequent use in JavaScript, Neo4j, Graphistry, etc.
Make sure to call beforehand: TextGraphs.calc_phrase_ranks()
- returns :
str
a JSON representation of the exported lemma graph in node-link format
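Node-link format can be demonstrated directly with NetworkX; the toy graph below is an assumption, but the round-trip mirrors what dump_lemma_graph() and load_lemma_graph() do:

```python
import json
import networkx as nx

# node-link format: a dict with "nodes" and "links" arrays, plus the
# directed/multigraph flags, which serializes cleanly to JSON
g = nx.MultiDiGraph()
g.add_edge("a", "b", key=0, rel="dep")

data = nx.node_link_data(g)
json_str = json.dumps(data)

# round-trip: rebuild an equivalent graph from the JSON representation
g2 = nx.node_link_graph(json.loads(json_str))
```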
load_lemma_graph
method¶
load_lemma_graph(json_str, debug=False)
Load a lemma graph from its JSON representation in node-link format.
-
json_str
:str
a JSON representation of the exported lemma graph in node-link format -
debug
:bool
debugging flag
Node
class¶
A data class representing one node, i.e., an extracted phrase.
__repr__
method¶
__repr__()
get_linked_label
method¶
get_linked_label()
When this node has a linked entity, return that IRI.
Otherwise return its label
value.
- returns :
typing.Optional[str]
a label for the linked entity
get_name
method¶
get_name()
Return a brief name for the graphical depiction of this Node.
- returns :
str
brief label to be used in a graph
get_stacked_count
method¶
get_stacked_count()
Return a modified count, to redact verbs and linked entities from the stack-rank partitions.
- returns :
int
count, used for re-ranking extracted entities
get_pos
method¶
get_pos()
Generate a position span for OpenNRE
.
- returns :
typing.Tuple[int, int]
a position span needed for OpenNRE relation extraction
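The payload shape assumed below follows OpenNRE's published inference examples, where the head and tail entities carry character-offset spans within the sentence text; the sentence and entity strings are invented for illustration:

```python
text = "Werner Herzog is a German filmmaker."
head = "Werner Herzog"
tail = "filmmaker"

# character offsets of each entity within the text
h_start = text.index(head)
t_start = text.index(tail)

# the dict shape OpenNRE's infer() call expects: text plus (start, end)
# position spans for the head ("h") and tail ("t") entities
payload = {
    "text": text,
    "h": {"pos": (h_start, h_start + len(head))},
    "t": {"pos": (t_start, t_start + len(tail))},
}
```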
Edge
class¶
A data class representing an edge between two nodes.
__repr__
method¶
__repr__()
EnumBase
class¶
A mixin for Enum codecs.
NodeEnum
class¶
Enumeration for the kinds of node categories
RelEnum
class¶
Enumeration for the kinds of edge relations
PipelineFactory
class¶
Factory pattern for building a pipeline, which is one of the more
expensive operations with spaCy
__init__
method¶
__init__(spacy_model="en_core_web_sm", ner=None, kg=KnowledgeGraph(), infer_rels=[])
Constructor which instantiates the spaCy
pipelines:
tok_pipe -- regular generator for parsed tokens
ner_pipe -- with entities merged
aux_pipe -- spotlight entity linking
which will be needed for parsing and entity linking.
-
spacy_model
:str
the specific model to use in the spaCy pipelines -
ner
:typing.Optional[textgraphs.pipe.Component]
optional custom NER component -
kg
:textgraphs.pipe.KnowledgeGraph
knowledge graph used for entity linking -
infer_rels
:typing.List[textgraphs.pipe.InferRel]
a list of components for inferring relations
create_pipeline
method¶
create_pipeline(text_input)
Instantiate the document pipelines needed to parse the input text.
-
text_input
:str
raw text to be parsed -
returns :
textgraphs.pipe.Pipeline
a configured Pipeline object
Pipeline
class¶
Manage parsing of a document, which is assumed to be paragraph-sized.
__init__
method¶
__init__(text_input, tok_pipe, ner_pipe, aux_pipe, kg, infer_rels)
Constructor.
-
text_input
:str
raw text to be parsed -
tok_pipe
:spacy.language.Language
the spaCy.Language pipeline used for tallying individual tokens -
ner_pipe
:spacy.language.Language
the spaCy.Language pipeline used for tallying named entities -
aux_pipe
:spacy.language.Language
the spaCy.Language pipeline used for auxiliary components (e.g., DBpedia Spotlight) -
kg
:textgraphs.pipe.KnowledgeGraph
knowledge graph used for entity linking -
infer_rels
:typing.List[textgraphs.pipe.InferRel]
a list of components for inferring relations
get_lemma_key
classmethod¶
get_lemma_key(span, placeholder=False)
Compose a unique, invariant lemma key for the given span.
-
span
:typing.Union[spacy.tokens.span.Span, spacy.tokens.token.Token]
span of tokens within the lemma -
placeholder
:bool
flag for whether to create a placeholder -
returns :
str
a composed lemma key
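The exact key scheme is internal to the library; one plausible invariant scheme, shown purely as an assumption, joins lowercased lemma/POS pairs so that different surface forms of the same phrase map to the same key:

```python
def make_lemma_key(tokens):
    # tokens: list of (lemma, part_of_speech) pairs for a span
    # an invariant key must not depend on capitalization or surface form,
    # so lowercase both components before joining
    return ".".join(f"{lemma.lower()}.{pos.lower()}" for lemma, pos in tokens)

key1 = make_lemma_key([("Werner", "PROPN"), ("Herzog", "PROPN")])
key2 = make_lemma_key([("werner", "PROPN"), ("herzog", "PROPN")])
assert key1 == key2  # invariant under case differences
```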
get_ent_lemma_keys
method¶
get_ent_lemma_keys()
Iterate through the fully qualified lemma keys for an extracted entity.
- yields :
the lemma keys within an extracted entity
link_noun_chunks
method¶
link_noun_chunks(nodes, debug=False)
Link any noun chunks which are not already subsumed by named entities.
-
nodes
:dict
dictionary of Node objects in the graph -
debug
:bool
debugging flag -
returns :
typing.List[textgraphs.elem.NounChunk]
a list of identified noun chunks which are novel
iter_entity_pairs
method¶
iter_entity_pairs(pipe_graph, max_skip, debug=True)
Iterator for entity pairs for which the algorithm infers relations.
-
pipe_graph
:networkx.classes.multigraph.MultiGraph
a networkx.MultiGraph representation of the graph, reused for graph algorithms -
max_skip
:int
maximum distance between entities for inferred relations -
debug
:bool
debugging flag -
yields :
pairs of entities within a range, e.g., to use for relation extraction
Component
class¶
Abstract base class for a spaCy
pipeline component.
augment_pipe
method¶
augment_pipe(factory)
Encapsulate a spaCy
call to add_pipe()
configuration.
factory
:PipelineFactory
a PipelineFactory used to configure components
NERSpanMarker
class¶
Configures a spaCy
pipeline component for SpanMarkerNER
__init__
method¶
__init__(ner_model="tomaarsen/span-marker-roberta-large-ontonotes5")
Constructor.
ner_model
:str
model to be used in SpanMarker
augment_pipe
method¶
augment_pipe(factory)
Encapsulate a spaCy
call to add_pipe()
configuration.
factory
:textgraphs.pipe.PipelineFactory
the PipelineFactory used to configure this pipeline component
NounChunk
class¶
A data class representing one noun chunk, i.e., a candidate as an extracted phrase.
__repr__
method¶
__repr__()
KnowledgeGraph
class¶
Base class for a knowledge graph interface.
augment_pipe
method¶
augment_pipe(factory)
Encapsulate a spaCy
call to add_pipe()
configuration.
factory
:PipelineFactory
a PipelineFactory used to configure components
remap_ner
method¶
remap_ner(label)
Remap the OntoTypes4 values from NER output to more general-purpose IRIs.
-
label
:typing.Optional[str]
input NER label, an OntoTypes4 value -
returns :
typing.Optional[str]
an IRI for the named entity
normalize_prefix
method¶
normalize_prefix(iri, debug=False)
Normalize the given IRI to use standard namespace prefixes.
-
iri
:str
input IRI, in fully-qualified domain representation -
debug
:bool
debugging flag -
returns :
str
the compact IRI representation, using an RDF namespace prefix
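Prefix normalization can be sketched as a longest-namespace-match lookup; the NS_PREFIX table and normalize_prefix helper below are hypothetical, not the library's code:

```python
# a small namespace table; the real table is configurable (see KGWikiMedia)
NS_PREFIX = {
    "dbo": "http://dbpedia.org/ontology/",
    "dbr": "http://dbpedia.org/resource/",
}

def normalize_prefix(iri: str) -> str:
    # try the longest namespaces first, so more specific prefixes win
    for prefix, ns in sorted(NS_PREFIX.items(), key=lambda kv: -len(kv[1])):
        if iri.startswith(ns):
            return f"{prefix}:{iri[len(ns):]}"
    return iri  # no known prefix: return the IRI unchanged

compact = normalize_prefix("http://dbpedia.org/ontology/Person")
```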
perform_entity_linking
method¶
perform_entity_linking(graph, pipe, debug=False)
Perform entity linking based on "spotlight" and other services.
-
graph
:textgraphs.graph.SimpleGraph
source graph -
pipe
:Pipeline
configured pipeline for the current document -
debug
:bool
debugging flag
resolve_rel_iri
method¶
resolve_rel_iri(rel, lang="en", debug=False)
Resolve a rel
string from a relation extraction model which has
been trained on this knowledge graph.
-
rel
:str
relation label; for many RE projects these are generated from Wikidata -
lang
:str
language identifier -
debug
:bool
debugging flag -
returns :
typing.Optional[str]
a resolved IRI
KGSearchHit
class¶
A data class representing a hit from a knowledge graph search.
__repr__
method¶
__repr__()
KGWikiMedia
class¶
Manage access to WikiMedia-related APIs.
__init__
method¶
__init__(spotlight_api="https://api.dbpedia-spotlight.org/en", dbpedia_search_api="https://lookup.dbpedia.org/api/search", dbpedia_sparql_api="https://dbpedia.org/sparql", wikidata_api="https://www.wikidata.org/w/api.php", ner_map=..., ns_prefix=..., min_alias=0.8, min_similarity=0.9)
The default ner_map maps OntoTypes4 NER labels to IRIs:
- CARDINAL: http://dbpedia.org/resource/Cardinal_number (numerals that do not fall under another type)
- DATE: http://dbpedia.org/ontology/date (absolute or relative dates or periods)
- EVENT: http://dbpedia.org/ontology/Event (named hurricanes, battles, wars, sports events, etc.)
- FAC: http://dbpedia.org/ontology/Infrastructure (buildings, airports, highways, bridges, etc.)
- GPE: http://dbpedia.org/ontology/Country (countries, cities, states)
- LANGUAGE: http://dbpedia.org/ontology/Language (any named language)
- LAW: http://dbpedia.org/ontology/Law (named documents made into laws)
- LOC: http://dbpedia.org/ontology/Place (non-GPE locations, mountain ranges, bodies of water)
- MONEY: http://dbpedia.org/resource/Money (monetary values, including unit)
- NORP: http://dbpedia.org/ontology/nationality (nationalities or religious or political groups)
- ORDINAL: http://dbpedia.org/resource/Ordinal_number (ordinal number, i.e., first, second, etc.)
- ORG: http://dbpedia.org/ontology/Organisation (companies, agencies, institutions, etc.)
- PERCENT: http://dbpedia.org/resource/Percentage (percentage)
- PERSON: http://dbpedia.org/ontology/Person (people, including fictional)
- PRODUCT: http://dbpedia.org/ontology/product (vehicles, weapons, foods, etc.; not services)
- QUANTITY: http://dbpedia.org/resource/Quantity (measurements, as of weight or distance)
- TIME: http://dbpedia.org/ontology/time (times smaller than a day)
- WORK OF ART: http://dbpedia.org/resource/Work_of_art (titles of books, songs, etc.)
The default ns_prefix provides the standard RDF namespace prefixes:
- dbc: http://dbpedia.org/resource/Category:
- dbt: http://dbpedia.org/resource/Template:
- dbr: http://dbpedia.org/resource/
- yago: http://dbpedia.org/class/yago/
- dbd: http://dbpedia.org/datatype/
- dbo: http://dbpedia.org/ontology/
- dbp: http://dbpedia.org/property/
- units: http://dbpedia.org/units/
- dbpedia-commons: http://commons.dbpedia.org/resource/
- dbpedia-wikicompany: http://dbpedia.openlinksw.com/wikicompany/
- dbpedia-wikidata: http://wikidata.dbpedia.org/resource/
- wd: http://www.wikidata.org/
- wd_ent: http://www.wikidata.org/entity/
- rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
- schema: https://schema.org/
- owl: http://www.w3.org/2002/07/owl#
Constructor.
-
spotlight_api
:str
DBPedia Spotlight
API or equivalent local service -
dbpedia_search_api
:str
DBPedia Search
API or equivalent local service -
dbpedia_sparql_api
:str
DBPedia SPARQL
API or equivalent local service -
wikidata_api
:str
Wikidata Search
API or equivalent local service -
ner_map
:dict
named entity map for standardizing IRIs -
ns_prefix
:dict
RDF namespace prefixes -
min_alias
:float
minimum alias probability threshold for accepting linked entities -
min_similarity
:float
minimum label similarity threshold for accepting linked entities
augment_pipe
method¶
augment_pipe(factory)
Encapsulate a spaCy
call to add_pipe()
configuration.
factory
:textgraphs.pipe.PipelineFactory
a PipelineFactory used to configure components
remap_ner
method¶
remap_ner(label)
Remap the OntoTypes4 values from NER output to more general-purpose IRIs.
-
label
:typing.Optional[str]
input NER label, an OntoTypes4 value -
returns :
typing.Optional[str]
an IRI for the named entity
normalize_prefix
method¶
normalize_prefix(iri, debug=False)
Normalize the given IRI using the standard DBPedia namespace prefixes.
-
iri
:str
input IRI, in fully-qualified domain representation -
debug
:bool
debugging flag -
returns :
str
the compact IRI representation, using an RDF namespace prefix
perform_entity_linking
method¶
perform_entity_linking(graph, pipe, debug=False)
Perform entity linking based on DBPedia Spotlight
and other services.
-
graph
:textgraphs.graph.SimpleGraph
source graph -
pipe
:textgraphs.pipe.Pipeline
configured pipeline for the current document -
debug
:bool
debugging flag
resolve_rel_iri
method¶
resolve_rel_iri(rel, lang="en", debug=False)
Resolve a rel
string from a relation extraction model which has
been trained on this knowledge graph, which defaults to using the
WikiMedia
graphs.
-
rel
:str
relation label; for many RE projects these are generated from Wikidata -
lang
:str
language identifier -
debug
:bool
debugging flag -
returns :
typing.Optional[str]
a resolved IRI
wikidata_search
method¶
wikidata_search(query, lang="en", debug=False)
Query the Wikidata search API.
-
query
:str
query string -
lang
:str
language identifier -
debug
:bool
debugging flag -
returns :
typing.Optional[textgraphs.elem.KGSearchHit]
search hit, if any
dbpedia_search_entity
method¶
dbpedia_search_entity(query, lang="en", debug=False)
Perform a DBPedia API search.
-
query
:str
query string -
lang
:str
language identifier -
debug
:bool
debugging flag -
returns :
typing.Optional[textgraphs.elem.KGSearchHit]
search hit, if any
dbpedia_sparql_query
method¶
dbpedia_sparql_query(sparql, debug=False)
Perform a SPARQL query on DBPedia.
-
sparql
:str
SPARQL query string -
debug
:bool
debugging flag -
returns :
dict
dictionary of query results
dbpedia_wikidata_equiv
method¶
dbpedia_wikidata_equiv(dbpedia_iri, debug=False)
Perform a SPARQL query on DBPedia to find an equivalent Wikidata entity.
-
dbpedia_iri
:str
IRI in DBpedia -
debug
:bool
debugging flag -
returns :
typing.Optional[str]
equivalent IRI in Wikidata
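One way such an equivalence lookup is commonly expressed, shown as an assumption rather than the library's exact query, uses an owl:sameAs pattern against the DBpedia SPARQL endpoint:

```python
from urllib.parse import urlencode

dbpedia_iri = "http://dbpedia.org/resource/Werner_Herzog"

# SPARQL query asking for an owl:sameAs link into Wikidata; the FILTER and
# prefix handling here are assumptions, not the library's exact query
sparql = f"""
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?wikidata WHERE {{
  <{dbpedia_iri}> owl:sameAs ?wikidata .
  FILTER (STRSTARTS(STR(?wikidata), "http://www.wikidata.org/"))
}}
"""

# the encoded form a GET request against https://dbpedia.org/sparql would carry
query_string = urlencode({"query": sparql, "format": "application/json"})
```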
LinkedEntity
class¶
A data class representing one linked entity.
__repr__
method¶
__repr__()
InferRel
class¶
Abstract base class for a relation extraction model wrapper.
gen_triples_async
method¶
gen_triples_async(pipe, queue, debug=False)
Infer relations as triples produced to a queue concurrently.
-
pipe
:Pipeline
configured pipeline for the current document -
queue
:asyncio.queues.Queue
queue of inference tasks to be performed -
debug
:bool
debugging flag
gen_triples
method¶
gen_triples(pipe, debug=False)
Infer relations as triples through a generator iteratively.
-
pipe
:Pipeline
configured pipeline for the current document -
debug
:bool
debugging flag -
yields :
generated triples
InferRel_OpenNRE
class¶
Perform relation extraction based on the OpenNRE
model.
https://github.com/thunlp/OpenNRE
__init__
method¶
__init__(model="wiki80_cnn_softmax", max_skip=11, min_prob=0.9)
Constructor.
-
model
:str
the specific model to be used in OpenNRE -
max_skip
:int
maximum distance between entities for inferred relations -
min_prob
:float
minimum probability threshold for accepting an inferred relation
gen_triples
method¶
gen_triples(pipe, debug=False)
Iterate on entity pairs to drive OpenNRE
, inferring relations
represented as triples which get produced by a generator.
-
pipe
:textgraphs.pipe.Pipeline
configured pipeline for the current document -
debug
:bool
debugging flag -
yields :
generated triples as candidates for inferred relations
InferRel_Rebel
class¶
Perform relation extraction based on the REBEL
model.
https://github.com/Babelscape/rebel
https://huggingface.co/spaces/Babelscape/mrebel-demo
__init__
method¶
__init__(lang="en_XX", mrebel_model="Babelscape/mrebel-large")
Constructor.
-
lang
:str
language identifier -
mrebel_model
:str
tokenizer model to be used
tokenize_sent
method¶
tokenize_sent(text)
Apply the tokenizer manually, since we need to extract special tokens.
-
text
:str
input text for the sentence to be tokenized -
returns :
str
extracted tokens
extract_triplets_typed
method¶
extract_triplets_typed(text)
Parse the generated text and extract its triplets.
-
text
:str
input text for the sentence to use in inference -
returns :
list
a list of extracted triples
gen_triples
method¶
gen_triples(pipe, debug=False)
Drive REBEL
to infer relations for each sentence, represented as
triples which get produced by a generator.
-
pipe
:textgraphs.pipe.Pipeline
configured pipeline for the current document -
debug
:bool
debugging flag -
yields :
generated triples as candidates for inferred relations
RenderPyVis
class¶
Render the lemma graph as a PyVis
network.
__init__
method¶
__init__(graph, kg)
Constructor.
-
graph
:textgraphs.graph.SimpleGraph
source graph to be visualized -
kg
:textgraphs.pipe.KnowledgeGraph
knowledge graph used for entity linking
render_lemma_graph
method¶
render_lemma_graph(debug=True)
Prepare the structure of the NetworkX
graph to use for building
and returning a PyVis
network to render.
Make sure to call beforehand: TextGraphs.calc_phrase_ranks()
-
debug
:bool
debugging flag -
returns :
pyvis.network.Network
a pyvis.network.Network interactive visualization
draw_communities
method¶
draw_communities(spring_distance=1.4, debug=False)
Cluster the communities in the lemma graph, then draw a
NetworkX graph of the nodes, with a specific color for each
community.
Make sure to call beforehand: TextGraphs.calc_phrase_ranks()
-
spring_distance
:float
NetworkX
parameter used to separate clusters visually -
debug
:bool
debugging flag -
returns :
typing.Dict[int, int]
a map of the calculated communities
generate_wordcloud
method¶
generate_wordcloud(background="black")
Generate a tag cloud from the given phrases.
Make sure to call beforehand: TextGraphs.calc_phrase_ranks()
-
background
:str
background color for the rendering -
returns :
wordcloud.wordcloud.WordCloud
the rendering as a wordcloud.WordCloud object, which can be used to generate PNG images, etc.
NodeStyle
class¶
Dataclass used for styling PyVis nodes.
__setattr__
method¶
__setattr__(name, value)
GraphOfRelations
class¶
Attempt to reproduce results published in "INGRAM: Inductive Knowledge Graph Embedding via Relation Graphs" https://arxiv.org/abs/2305.19987
__init__
method¶
__init__(source)
Constructor.
source
:textgraphs.graph.SimpleGraph
source graph to be transformed
load_ingram
method¶
load_ingram(json_file, debug=False)
Load data for a source graph, as illustrated in lee2023ingram
-
json_file
:pathlib.Path
path for the JSON dataset to load -
debug
:bool
debugging flag
seeds
method¶
seeds(debug=False)
Prep data for the topological transform illustrated in lee2023ingram
debug
:bool
debugging flag
trace_source_graph
method¶
trace_source_graph()
Output a "seed" representation of the source graph.
construct_gor
method¶
construct_gor(debug=False)
Perform the topological transform described by lee2023ingram, constructing a graph of relations (GOR) and calculating affinity scores between entities in the GOR based on their definitions:
we measure the affinity between two relations by considering how many entities are shared between them and how frequently they share the same entity
debug
:bool
debugging flag
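The affinity definition quoted above can be sketched with collections.Counter over a toy triple set; this simplified tally counts shared entities per relation pair only, omitting the frequency weighting from the published formulation, and all names are invented:

```python
from collections import Counter
from itertools import combinations

# edges of a toy source graph as (head_entity, relation, tail_entity) triples
triples = [
    ("a", "r1", "b"),
    ("a", "r2", "c"),
    ("b", "r2", "c"),
    ("b", "r1", "d"),
]

# tally, per entity, which relations touch it (as head or tail)
rels_at_entity: dict = {}
for h, r, t in triples:
    for ent in (h, t):
        rels_at_entity.setdefault(ent, set()).add(r)

# count shared entities for each relation pair: this is the raw signal
# behind the affinity scores in the graph of relations
shared = Counter()
for ent, rels in rels_at_entity.items():
    for pair in combinations(sorted(rels), 2):
        shared[pair] += 1
```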
tally_frequencies
classmethod¶
tally_frequencies(counter)
Tally the frequency of shared entities.
-
counter
:collections.Counter
a counter data collection for the rel_b/entity pairs -
returns :
int
tallied values for one relation
get_affinity_scores
method¶
get_affinity_scores(debug=False)
Reproduce metrics based on the example published in lee2023ingram
-
debug
:bool
debugging flag -
returns :
typing.Dict[tuple, float]
the calculated affinity scores
trace_metrics
method¶
trace_metrics(scores)
Compare the calculated affinity scores with results from a published example.
-
scores
:typing.Dict[tuple, float]
the calculated affinity scores between pairs of relations (i.e., observed values) -
returns :
pandas.core.frame.DataFrame
a pandas.DataFrame where the rows compare expected vs. observed affinity scores
render_gor_plt
method¶
render_gor_plt(scores)
Visualize the graph of relations using matplotlib
scores
:typing.Dict[tuple, float]
the calculated affinity scores between pairs of relations (i.e., observed values)
render_gor_pyvis
method¶
render_gor_pyvis(scores)
Visualize the graph of relations interactively using PyVis
-
scores
:typing.Dict[tuple, float]
the calculated affinity scores between pairs of relations (i.e., observed values) -
returns :
pyvis.network.Network
a pyvis.network.Network representation of the transformed graph
TransArc
class¶
A data class representing one transformed rel-node-rel triple in a graph of relations.
__repr__
method¶
__repr__()
RelDir
class¶
Enumeration for the directions of a relation.
SheafSeed
class¶
A data class representing a node from the source graph plus its partial edge, based on a Sheaf Theory decomposition of a graph.
__repr__
method¶
__repr__()
Affinity
class¶
A data class representing the affinity scores from one entity in the transformed graph of relations.
NB: there are much more efficient ways to calculate these affinity scores using sparse tensor algebra; this approach illustrates the process -- for research and debugging.
__repr__
method¶
__repr__()
module functions¶
calc_quantile_bins
function¶
calc_quantile_bins(num_rows)
Calculate the bins to use for a quantile stripe,
using numpy.linspace
-
num_rows
:int
number of rows in the target dataframe -
returns :
numpy.ndarray
calculated bins, as a numpy.ndarray
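A minimal sketch of this utility; the log-scaled granularity heuristic follows the similar utility in PyTextRank and should be treated as an assumption, not this library's exact formula:

```python
import math
import numpy as np

def calc_quantile_bins(num_rows: int) -> np.ndarray:
    # scale the number of bins logarithmically with the row count,
    # then place evenly spaced quantile boundaries on [0, 1]
    granularity = max(round(math.log(num_rows) * 4), 1)
    return np.linspace(0, 1, num=granularity, endpoint=True)

bins = calc_quantile_bins(100)
```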
get_repo_version
function¶
get_repo_version()
Access the Git repository information and return items to identify the version/commit running in production.
- returns :
typing.Tuple[str, str]
version tag and commit hash
root_mean_square
function¶
root_mean_square(values)
Calculate the root mean square of the values in the given list.
-
values
:typing.List[float]
list of values to use in the RMS calculation -
returns :
float
RMS metric as a float
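A minimal, self-contained equivalent of the description (square each value, average, take the root):

```python
import math

def root_mean_square(values):
    # square each value, take the mean, then the square root
    return math.sqrt(sum(x * x for x in values) / len(values))

rms = root_mean_square([3.0, 4.0])  # sqrt((9 + 16) / 2)
```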
stripe_column
function¶
stripe_column(values, bins)
Stripe a column in a dataframe, by interpolating quantiles into a set of discrete indexes.
-
values
:list
list of values to stripe -
bins
:int
quantile bins; see calc_quantile_bins()
-
returns :
numpy.ndarray
the striped column values, as a numpy.ndarray
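A sketch of quantile striping, following the analogous utility in PyTextRank; the interpolation choice and the digitize-based mapping are assumptions, not necessarily this library's exact implementation:

```python
import numpy as np
import pandas as pd

def stripe_column(values: list, bins: np.ndarray) -> np.ndarray:
    # evaluate the quantiles of the values at the given bin boundaries,
    # then map each value to the index of the quantile stripe it falls in
    q = pd.Series(values).quantile(bins, interpolation="nearest")
    return np.digitize(values, q) - 1

bins = np.linspace(0, 1, num=5, endpoint=True)
stripes = stripe_column([0.1, 0.4, 0.2, 0.9], bins)
```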