Technical Approach

Construct a lemma graph, then perform entity linking, based on: spaCy, transformers, SpanMarkerNER, spaCy-DBpedia-Spotlight, REBEL, OpenNRE, qwikidata, and pulp

  1. use spaCy to parse a document, augmented by SpanMarker's use of LLMs for NER
  2. add noun chunks in parallel with the extracted entities, as "candidate" phrases for subsequent human-in-the-loop (HITL) confirmation
  3. perform entity linking: spaCy-DBpedia-Spotlight, WikiMedia API, etc.
  4. infer relations, plus graph inference: REBEL, OpenNRE, qwikidata, etc.
  5. build a lemma graph in NetworkX from the parse results
  6. run a modified textrank algorithm plus graph analytics
  7. approximate a Pareto archive (hypervolume) to re-rank extracted entities with pulp
  8. visualize the lemma graph interactively in PyVis
  9. cluster communities within the lemma graph
  10. apply topological transforms to enhance graph ML and embeddings
  11. build ML models from the graph of relations (in progress)
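
The ranking pass in step 6 can be sketched without the full library. The adjacency dict, damping factor, and iteration count below are illustrative assumptions (TextGraphs builds the actual lemma graph in NetworkX), but the power-iteration core is the same idea behind textrank:

```python
# Minimal textrank-style sketch: PageRank power iteration over a toy
# lemma graph. The graph, damping factor, and iteration count are
# illustrative assumptions, not the library's real data structures.

def textrank(graph: dict[str, list[str]], damping: float = 0.85,
             iters: int = 50) -> dict[str, float]:
    """Rank nodes of an undirected adjacency dict by power iteration."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iters):
        new_rank = {}
        for node in nodes:
            # each neighbor passes along its rank, split across its degree
            incoming = sum(rank[nb] / len(graph[nb]) for nb in graph[node])
            new_rank[node] = (1.0 - damping) / n + damping * incoming
        rank = new_rank

    return rank

# toy lemma graph: lemmas co-occurring within a sentence share an edge
lemma_graph = {
    "knowledge": ["graph", "construction"],
    "graph": ["knowledge", "construction", "entity"],
    "construction": ["knowledge", "graph"],
    "entity": ["graph"],
}

ranks = textrank(lemma_graph)
top = max(ranks, key=ranks.get)
```

Since the co-occurrence edges are undirected, a lemma's rank grows with both its degree and the rank of its neighbors; in this toy graph, "graph" ranks highest.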
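
Step 9's community clustering can be illustrated with label propagation, one of the simpler community-detection algorithms available in NetworkX. This stdlib-only version, its deterministic tie-break, and the toy graph are assumptions for illustration, not the library's actual method:

```python
# Sketch of community clustering via label propagation: every node
# repeatedly adopts the most common label among its neighbors, so
# dense regions converge to a shared label. Illustrative only.
from collections import Counter

def label_propagation(graph: dict[str, list[str]],
                      iters: int = 10) -> dict[str, int]:
    # start with a unique label per node
    labels = {node: i for i, node in enumerate(sorted(graph))}
    for _ in range(iters):
        for node in sorted(graph):
            counts = Counter(labels[nb] for nb in graph[node])
            best = max(counts.values())
            top = {lab for lab, cnt in counts.items() if cnt == best}
            if labels[node] not in top:
                labels[node] = max(top)  # deterministic tie-break
    return labels

# two triangles joined by a single bridge edge c--d
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
labels = label_propagation(graph)
```

The bridge edge carries too few votes to merge the two triangles, so each triangle settles on its own community label.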

In other words, this hybrid approach integrates NLP parsing, LLMs, graph algorithms, semantic inference, and operations research, and provides UX affordances for human-in-the-loop practices.
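
The operations-research piece (step 7) amounts to keeping a non-dominated set of candidates. Here is a minimal sketch, assuming two toy objectives (a textrank score and a mention count, both invented for illustration); in practice the library formulates this with pulp:

```python
# Hedged sketch of Pareto-archive re-ranking: keep only entities that
# no other entity beats on every objective. Entity names and scores
# below are illustrative toy data, not output of the real pipeline.

def dominates(a: tuple, b: tuple) -> bool:
    """True if a is at least as good as b on every objective (higher
    is better) and strictly better on at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_archive(candidates: dict[str, tuple]) -> list[str]:
    """Return the non-dominated entities, preserving input order."""
    return [
        name for name, scores in candidates.items()
        if not any(dominates(other, scores)
                   for o_name, other in candidates.items()
                   if o_name != name)
    ]

# toy entities scored on (textrank score, mention count)
entities = {
    "Werner Herzog": (0.9, 3),
    "film": (0.5, 5),
    "Bavaria": (0.4, 1),   # dominated on both objectives
}

archive = pareto_archive(entities)
```

Entities that survive the archive trade off the objectives against each other; dominated ones can be safely demoted during re-ranking.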

The demo app and the Hugging Face space both illustrate a relatively small problem, although the approach addresses a much broader class of AI problems in industry.

This step is a prelude to leveraging topological transforms, large language models, graph representation learning, and human-in-the-loop domain expertise to infer the nodes, edges, properties, and probabilities needed for the semi-automated construction of knowledge graphs from raw unstructured text sources.

In addition to providing a library for production use cases, TextGraphs creates a "playground" or "gym" in which to prototype and evaluate abstractions based on "Graph Levels Of Detail".