Technical Approach
Construct a lemma graph, then perform entity linking based on: spaCy, transformers, SpanMarkerNER, spaCy-DBpedia-Spotlight, REBEL, OpenNRE, qwikidata, pulp.
- use spaCy to parse a document, augmented by SpanMarker, which uses LLMs for NER
- add noun chunks in parallel to entities, as "candidate" phrases for subsequent human-in-the-loop (HITL) confirmation
- perform entity linking: spaCy-DBpedia-Spotlight, WikiMedia API, etc.
- infer relations, plus graph inference: REBEL, OpenNRE, qwikidata, etc.
- build a lemma graph in NetworkX from the parse results
- run a modified TextRank algorithm plus graph analytics
- approximate a Pareto archive (hypervolume) to re-rank extracted entities with pulp
- visualize the lemma graph interactively in PyVis
- cluster communities within the lemma graph
- apply topological transforms to enhance graph ML and embeddings
- build ML models from the graph of relations (in progress)
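The graph-building and ranking steps in the list above can be illustrated with a minimal sketch. It is not the TextGraphs implementation: the `PARSED` tokens are hand-written stand-ins for what a spaCy parse would yield, the co-occurrence `window` rule for adding edges is an assumption, the ranking is plain PageRank rather than the modified TextRank the library runs, and plain dictionaries are used in place of NetworkX so the example has no dependencies.

```python
from collections import defaultdict

# Hypothetical parse output: (lemma, part-of-speech) pairs per sentence,
# standing in for the tokens a spaCy pipeline would produce.
PARSED = [
    [("werner", "PROPN"), ("herzog", "PROPN"), ("be", "AUX"), ("a", "DET"),
     ("german", "ADJ"), ("film", "NOUN"), ("director", "NOUN")],
    [("herzog", "PROPN"), ("direct", "VERB"), ("the", "DET"),
     ("documentary", "NOUN"), ("film", "NOUN")],
]

# Assumption: only content-bearing parts of speech become graph nodes.
KEEP = {"NOUN", "PROPN", "ADJ", "VERB"}


def build_lemma_graph(sentences, window=2):
    """Link lemmas that occur within `window` kept tokens of each other."""
    graph = defaultdict(lambda: defaultdict(float))
    for sent in sentences:
        lemmas = [lemma for lemma, pos in sent if pos in KEEP]
        for i, src in enumerate(lemmas):
            for dst in lemmas[i + 1 : i + 1 + window]:
                if src != dst:
                    # Undirected, weighted edge: store it in both directions.
                    graph[src][dst] += 1.0
                    graph[dst][src] += 1.0
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration over the adjacency dicts."""
    nodes = list(graph)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            # Each neighbor passes on its rank in proportion to edge weight.
            incoming = sum(
                rank[nbr] * weight / sum(graph[nbr].values())
                for nbr, weight in graph[node].items()
            )
            new_rank[node] = (1.0 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


graph = build_lemma_graph(PARSED)
ranks = pagerank(graph)

# Print lemmas from highest to lowest rank.
for lemma, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{lemma:12s} {score:.3f}")
```

Lemmas shared across sentences, such as `herzog` and `film` here, accumulate more edges and therefore more rank, which is the intuition behind using the lemma graph to surface candidate entities.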
In other words, this hybrid approach integrates NLP parsing, LLMs, graph algorithms, semantic inference, and operations research, and provides UX affordances for human-in-the-loop practices.
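To make the operations research step more concrete, the sketch below re-ranks entities by peeling off successive Pareto fronts. It is an illustration only: it uses simple non-dominated sorting in pure Python rather than the hypervolume approximation with pulp named in the list above, and the entity names, the three objectives, and all scores are hypothetical.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` on every objective
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_layers(scores):
    """Group entities into successive non-dominated fronts, best front first."""
    remaining = dict(scores)
    layers = []
    while remaining:
        # An entity is on the current front if nothing left dominates it.
        front = [
            name
            for name, vec in remaining.items()
            if not any(dominates(other, vec) for other in remaining.values())
        ]
        layers.append(sorted(front))
        for name in front:
            del remaining[name]
    return layers


# Hypothetical objectives per entity:
# (graph rank score, mention count, entity-linking confidence)
ENTITY_SCORES = {
    "werner herzog": (0.21, 3, 0.95),
    "film": (0.20, 4, 0.40),
    "germany": (0.10, 1, 0.90),
    "director": (0.08, 2, 0.30),
}

layers = pareto_layers(ENTITY_SCORES)
for depth, front in enumerate(layers, start=1):
    print(depth, front)
```

Ranking by fronts avoids collapsing the objectives into one weighted score: an entity with a high mention count but low linking confidence can share the top front with one that has the opposite profile.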
The demo app and the Hugging Face space both illustrate a relatively small problem, although the approach they demonstrate addresses a much broader class of AI problems in industry.
This step is a prelude to leveraging topological transforms, large language models, graph representation learning, and human-in-the-loop domain expertise to infer the nodes, edges, properties, and probabilities needed for the semi-automated construction of knowledge graphs from raw unstructured text sources.
In addition to providing a library for production use cases, TextGraphs creates a "playground" or "gym" in which to prototype and evaluate abstractions based on "Graph Levels Of Detail".