Skip to content

Note

To run this notebook in JupyterLab, load examples/ex2_0.ipynb

bootstrap the lemma graph with RDF triples

Show how to bootstrap definitions in a lemma graph by loading RDF, e.g., for synonyms.

environment

from icecream import ic
from pyinstrument import Profiler
import pyvis

import textgraphs
%load_ext watermark
%watermark
Last updated: 2024-01-16T17:35:59.608787-08:00

Python implementation: CPython
Python version       : 3.10.11
IPython version      : 8.20.0

Compiler    : Clang 13.0.0 (clang-1300.0.29.30)
OS          : Darwin
Release     : 21.6.0
Machine     : x86_64
Processor   : i386
CPU cores   : 8
Architecture: 64bit
%watermark --iversions
pyvis     : 0.3.2
textgraphs: 0.5.0
sys       : 3.10.11 (v3.10.11:7d4cc5aa85, Apr  4 2023, 19:05:19) [Clang 13.0.0 (clang-1300.0.29.30)]

load the bootstrap definitions

Define the bootstrap RDF triples in N3/Turtle format: we define an entity Werner as a synonym for Werner Herzog by using the skos:broader relation. Keep in mind that this entity may also refer to other Werners...

TTL_STR: str = """
@base <https://github.com/DerwenAI/textgraphs/ns/> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<entity/werner_PROPN> a dbo:Person ;
    skos:prefLabel "Werner"@en .

<entity/werner_PROPN_herzog_PROPN> a dbo:Person ;
    skos:prefLabel "Werner Herzog"@en.

dbo:Person skos:definition "People, including fictional"@en ;
    skos:prefLabel "person"@en .

<entity/werner_PROPN_herzog_PROPN> skos:broader <entity/werner_PROPN> .
"""

Provide the source text

SRC_TEXT: str = """                                                                                                                      
Werner Herzog is a remarkable filmmaker and an intellectual originally from Germany, the son of Dietrich Herzog.
After the war, Werner fled to America to become famous.
"""

set up the statistical stack profiling

profiler: Profiler = Profiler()
profiler.start()

set up the TextGraphs pipeline

tg: textgraphs.TextGraphs = textgraphs.TextGraphs(
    factory = textgraphs.PipelineFactory(
        kg = textgraphs.KGWikiMedia(
            spotlight_api = textgraphs.DBPEDIA_SPOTLIGHT_API,
            dbpedia_search_api = textgraphs.DBPEDIA_SEARCH_API,
            dbpedia_sparql_api = textgraphs.DBPEDIA_SPARQL_API,
            wikidata_api = textgraphs.WIKIDATA_API,
            min_alias = textgraphs.DBPEDIA_MIN_ALIAS,
            min_similarity = textgraphs.DBPEDIA_MIN_SIM,
        ),
    ),
)

load the bootstrap definitions

tg.load_bootstrap_ttl(
    TTL_STR,
    debug = False,
)

parse the input text

pipe: textgraphs.Pipeline = tg.create_pipeline(
    SRC_TEXT.strip(),
)

tg.collect_graph_elements(
    pipe,
    debug = False,
)

tg.construct_lemma_graph(
    debug = False,
)

visualize the lemma graph

render: textgraphs.RenderPyVis = tg.create_render()

pv_graph: pyvis.network.Network = render.render_lemma_graph(
    debug = False,
)

initialize the layout parameters

pv_graph.force_atlas_2based(
    gravity = -38,
    central_gravity = 0.01,
    spring_length = 231,
    spring_strength = 0.7,
    damping = 0.8,
    overlap = 0,
)

pv_graph.show_buttons(filter_ = [ "physics" ])
pv_graph.toggle_physics(True)
pv_graph.prep_notebook()
pv_graph.show("tmp.fig04.html")
tmp.fig04.html

png

Notice how the Werner and Werner Herzog nodes are now linked? This synonym from the bootstrap definitions above provided means to link more portions of the lemma graph than the demo in ex0_0 with the same input text.

statistical stack profile instrumentation

profiler.stop()
<pyinstrument.session.Session at 0x1522e2110>
profiler.print()
  _     ._   __/__   _ _  _  _ _/_   Recorded: 17:35:59  Samples:  2846
 /_//_/// /_\ / //_// / //_'/ //     Duration: 4.111     CPU time: 3.294
/   _/                      v4.6.1

Program: /Users/paco/src/textgraphs/venv/lib/python3.10/site-packages/ipykernel_launcher.py -f /Users/paco/Library/Jupyter/runtime/kernel-4365d4ba-2d4d-4d4b-83e2-eb5ef8abfe26.json

4.111 IPythonKernel.dispatch_shell  ipykernel/kernelbase.py:378
└─ 4.075 IPythonKernel.execute_request  ipykernel/kernelbase.py:721
      [9 frames hidden]  ipykernel, IPython
         3.995 ZMQInteractiveShell.run_ast_nodes  IPython/core/interactiveshell.py:3394
         ├─ 3.250 <module>  ../ipykernel_4433/1372904243.py:1
           └─ 3.248 PipelineFactory.__init__  textgraphs/pipe.py:434
              └─ 3.232 load  spacy/__init__.py:27
                    [98 frames hidden]  spacy, en_core_web_sm, catalogue, imp...
                       0.496 tokenizer_factory  spacy/language.py:110
                       └─ 0.108 _validate_special_case  spacy/tokenizer.pyx:573
                       0.439 <lambda>  spacy/language.py:2170
                       └─ 0.085 _validate_special_case  spacy/tokenizer.pyx:573
         ├─ 0.672 <module>  ../ipykernel_4433/3257668275.py:1
           └─ 0.669 TextGraphs.create_pipeline  textgraphs/doc.py:103
              └─ 0.669 PipelineFactory.create_pipeline  textgraphs/pipe.py:508
                 └─ 0.669 Pipeline.__init__  textgraphs/pipe.py:216
                    └─ 0.669 English.__call__  spacy/language.py:1016
                          [31 frames hidden]  spacy, spacy_dbpedia_spotlight, reque...
         └─ 0.055 <module>  ../ipykernel_4433/72966960.py:1
            └─ 0.046 Network.prep_notebook  pyvis/network.py:552
                  [5 frames hidden]  pyvis, jinja2

outro

[ more parts are in progress, getting added to this demo ]