Technical Approach

Construct a lemma graph, then perform entity linking based on: spaCy, transformers, SpanMarkerNER, spaCy-DBpedia-Spotlight, REBEL, OpenNRE, qwikidata, pulp

use spaCy to parse a document, augmented by SpanMarker, which uses LLMs for NER
add noun chunks in parallel to entities, as "candidate" phrases for subsequent HITL confirmation
perform entity linking: spaCy-DBpedia-Spotlight, WikiMedia API, etc.
infer relations, plus graph inference: REBEL, OpenNRE, qwikidata, etc.
build a lemma graph in NetworkX from the parse results
run a modified TextRank algorithm plus graph analytics
approximate a Pareto archive (hypervolume) to re-rank extracted entities with pulp
visualize the lemma graph interactively in PyVis
cluster communities within the lemma graph
apply topological transforms to enhance graph ML and embeddings
build ML models from the graph of relations (in progress)
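
As a rough illustration of the lemma graph and ranking steps listed above, here is a minimal sketch in plain Python. It is not the TextGraphs implementation: the hand-lemmatized `SENTENCES`, the `STOP` set, the co-occurrence `window`, and the helper names `build_lemma_graph` and `rank_lemmas` are all assumptions made for this example, a nested dict stands in for a NetworkX graph, and an unmodified PageRank-style iteration stands in for the modified TextRank.

```python
from collections import defaultdict

# Hypothetical, hand-lemmatized sentences standing in for spaCy parse output.
SENTENCES = [
    ["werner", "herzog", "be", "german", "film", "director"],
    ["herzog", "direct", "film", "germany"],
    ["film", "director", "live", "germany"],
]
STOP = {"be"}  # assumed stop-lemma list, for illustration only


def build_lemma_graph(sentences, window=2):
    """Link lemmas that co-occur within `window` tokens of each other."""
    graph = defaultdict(lambda: defaultdict(float))
    for sent in sentences:
        lemmas = [tok for tok in sent if tok not in STOP]
        for i, src in enumerate(lemmas):
            for dst in lemmas[i + 1 : i + 1 + window]:
                if src != dst:
                    # Undirected edge: store the weight in both directions.
                    graph[src][dst] += 1.0
                    graph[dst][src] += 1.0
    return graph


def rank_lemmas(graph, damping=0.85, iterations=50):
    """Plain weighted PageRank by power iteration (a TextRank-style ranking)."""
    nodes = list(graph)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            # Each neighbor passes on its rank in proportion to edge weight.
            incoming = sum(
                rank[nbr] * weight / sum(graph[nbr].values())
                for nbr, weight in graph[node].items()
            )
            new_rank[node] = (1.0 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


graph = build_lemma_graph(SENTENCES)
ranks = rank_lemmas(graph)
top_lemmas = sorted(ranks, key=ranks.get, reverse=True)[:3]
```

In this toy input, `film` ranks highest because it has the largest weighted degree. In the actual pipeline the nodes would come from the spaCy parse, with entities and noun chunks added alongside lemmas.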

In other words, this hybrid approach integrates NLP parsing, LLMs, graph algorithms, semantic inference, and operations research, and provides UX affordances for human-in-the-loop practices.
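
The operations research element refers to the Pareto re-ranking step in the list above. The sketch below shows only the underlying idea of a Pareto front, namely keeping candidates that no other candidate beats on every metric. It makes several assumptions: the `CANDIDATES` scores are invented, the two metrics (a rank score and a mention count) are placeholders, and the helpers `dominates` and `pareto_front` are hypothetical. It does not use pulp and does not compute a hypervolume, so it should not be read as the method TextGraphs uses.

```python
# Hypothetical (rank score, mention count) pairs; higher is better on both.
CANDIDATES = {
    "film": (0.21, 3),
    "film director": (0.09, 4),
    "herzog": (0.14, 2),
    "germany": (0.12, 2),
    "werner": (0.06, 1),
}


def dominates(a, b):
    """True if `a` is at least as good as `b` everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Keep only the candidates that no other candidate dominates."""
    return {
        name: vec
        for name, vec in scores.items()
        if not any(dominates(other, vec) for other in scores.values())
    }


front = pareto_front(CANDIDATES)
```

Here `film` and `film director` survive: the first has the best rank score and the second has the most mentions, so neither dominates the other. Candidates on the front would be promoted ahead of dominated ones when re-ranking.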

The demo app and the Hugging Face space both illustrate a relatively small problem, but the approach addresses a much broader class of AI problems in industry.

This step is a prelude to leveraging topological transforms, large language models, graph representation learning, and human-in-the-loop domain expertise to infer the nodes, edges, properties, and probabilities needed for the semi-automated construction of knowledge graphs from raw unstructured text sources.

In addition to providing a library for production use cases, TextGraphs creates a "playground" or "gym" in which to prototype and evaluate abstractions based on "Graph Levels Of Detail".