Note
To run this notebook in JupyterLab, load examples/ex2_0.ipynb
bootstrap the lemma graph with RDF triples¶
Show how to bootstrap definitions in a lemma graph by loading RDF, e.g., for synonyms.
environment¶
from icecream import ic
from pyinstrument import Profiler
import pyvis
import textgraphs
%load_ext watermark
%watermark
Last updated: 2024-01-16T17:35:59.608787-08:00
Python implementation: CPython
Python version : 3.10.11
IPython version : 8.20.0
Compiler : Clang 13.0.0 (clang-1300.0.29.30)
OS : Darwin
Release : 21.6.0
Machine : x86_64
Processor : i386
CPU cores : 8
Architecture: 64bit
%watermark --iversions
pyvis : 0.3.2
textgraphs: 0.5.0
sys : 3.10.11 (v3.10.11:7d4cc5aa85, Apr 4 2023, 19:05:19) [Clang 13.0.0 (clang-1300.0.29.30)]
load the bootstrap definitions¶
Define the bootstrap RDF triples in N3/Turtle format: the entity Werner is declared as a synonym for Werner Herzog by a skos:broader relation that points from Werner Herzog to Werner. Keep in mind that the entity Werner may also refer to other people named Werner.
TTL_STR: str = """
@base <https://github.com/DerwenAI/textgraphs/ns/> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
<entity/werner_PROPN> a dbo:Person ;
skos:prefLabel "Werner"@en .
<entity/werner_PROPN_herzog_PROPN> a dbo:Person ;
skos:prefLabel "Werner Herzog"@en.
dbo:Person skos:definition "People, including fictional"@en ;
skos:prefLabel "person"@en .
<entity/werner_PROPN_herzog_PROPN> skos:broader <entity/werner_PROPN> .
"""
Provide the source text
SRC_TEXT: str = """
Werner Herzog is a remarkable filmmaker and an intellectual originally from Germany, the son of Dietrich Herzog.
After the war, Werner fled to America to become famous.
"""
set up the statistical stack profiling
profiler: Profiler = Profiler()
profiler.start()
set up the TextGraphs pipeline
tg: textgraphs.TextGraphs = textgraphs.TextGraphs(
factory = textgraphs.PipelineFactory(
kg = textgraphs.KGWikiMedia(
spotlight_api = textgraphs.DBPEDIA_SPOTLIGHT_API,
dbpedia_search_api = textgraphs.DBPEDIA_SEARCH_API,
dbpedia_sparql_api = textgraphs.DBPEDIA_SPARQL_API,
wikidata_api = textgraphs.WIKIDATA_API,
min_alias = textgraphs.DBPEDIA_MIN_ALIAS,
min_similarity = textgraphs.DBPEDIA_MIN_SIM,
),
),
)
load the bootstrap definitions
tg.load_bootstrap_ttl(
TTL_STR,
debug = False,
)
parse the input text
pipe: textgraphs.Pipeline = tg.create_pipeline(
SRC_TEXT.strip(),
)
tg.collect_graph_elements(
pipe,
debug = False,
)
tg.construct_lemma_graph(
debug = False,
)
visualize the lemma graph¶
render: textgraphs.RenderPyVis = tg.create_render()
pv_graph: pyvis.network.Network = render.render_lemma_graph(
debug = False,
)
initialize the layout parameters
pv_graph.force_atlas_2based(
gravity = -38,
central_gravity = 0.01,
spring_length = 231,
spring_strength = 0.7,
damping = 0.8,
overlap = 0,
)
pv_graph.show_buttons(filter_ = [ "physics" ])
pv_graph.toggle_physics(True)
pv_graph.prep_notebook()
pv_graph.show("tmp.fig04.html")
tmp.fig04.html
Notice how the Werner and Werner Herzog nodes are now linked. The synonym from the bootstrap definitions above links more portions of the lemma graph than the demo in ex0_0 did with the same input text.
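To illustrate why one synonym link can join separate portions of a graph, here is a minimal sketch that counts connected components before and after adding such an edge. The node and edge lists are a hypothetical toy graph, not the lemma graph that textgraphs actually builds, and count_components is an illustrative helper rather than part of the library.

```python
from collections import defaultdict


def count_components(nodes: list[str], edges: list[tuple[str, str]]) -> int:
    """Count connected components of an undirected graph by depth-first search."""
    adj: dict[str, set[str]] = defaultdict(set)

    for src, dst in edges:
        adj[src].add(dst)
        adj[dst].add(src)

    seen: set[str] = set()
    count: int = 0

    for start in nodes:
        if start in seen:
            continue

        # each unvisited start node begins a new component
        count += 1
        stack: list[str] = [start]

        while stack:
            node = stack.pop()

            if node not in seen:
                seen.add(node)
                stack.extend(adj[node] - seen)

    return count


# hypothetical toy graph, loosely based on the source text
nodes: list[str] = ["werner herzog", "filmmaker", "germany", "werner", "america"]

edges: list[tuple[str, str]] = [
    ("werner herzog", "filmmaker"),
    ("werner herzog", "germany"),
    ("werner", "america"),
]

before: int = count_components(nodes, edges)

# add the synonym link, analogous to the skos:broader triple
after: int = count_components(nodes, edges + [("werner herzog", "werner")])

print(before, after)
```

In this toy case the two components merge into one once the synonym edge is present.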
statistical stack profile instrumentation¶
profiler.stop()
<pyinstrument.session.Session at 0x1522e2110>
profiler.print()
_ ._ __/__ _ _ _ _ _/_ Recorded: 17:35:59 Samples: 2846
/_//_/// /_\ / //_// / //_'/ // Duration: 4.111 CPU time: 3.294
/ _/ v4.6.1
Program: /Users/paco/src/textgraphs/venv/lib/python3.10/site-packages/ipykernel_launcher.py -f /Users/paco/Library/Jupyter/runtime/kernel-4365d4ba-2d4d-4d4b-83e2-eb5ef8abfe26.json
4.111 IPythonKernel.dispatch_shell ipykernel/kernelbase.py:378
└─ 4.075 IPythonKernel.execute_request ipykernel/kernelbase.py:721
[9 frames hidden] ipykernel, IPython
3.995 ZMQInteractiveShell.run_ast_nodes IPython/core/interactiveshell.py:3394
├─ 3.250 <module> ../ipykernel_4433/1372904243.py:1
│ └─ 3.248 PipelineFactory.__init__ textgraphs/pipe.py:434
│ └─ 3.232 load spacy/__init__.py:27
│ [98 frames hidden] spacy, en_core_web_sm, catalogue, imp...
│ 0.496 tokenizer_factory spacy/language.py:110
│ └─ 0.108 _validate_special_case spacy/tokenizer.pyx:573
│ 0.439 <lambda> spacy/language.py:2170
│ └─ 0.085 _validate_special_case spacy/tokenizer.pyx:573
├─ 0.672 <module> ../ipykernel_4433/3257668275.py:1
│ └─ 0.669 TextGraphs.create_pipeline textgraphs/doc.py:103
│ └─ 0.669 PipelineFactory.create_pipeline textgraphs/pipe.py:508
│ └─ 0.669 Pipeline.__init__ textgraphs/pipe.py:216
│ └─ 0.669 English.__call__ spacy/language.py:1016
│ [31 frames hidden] spacy, spacy_dbpedia_spotlight, reque...
└─ 0.055 <module> ../ipykernel_4433/72966960.py:1
└─ 0.046 Network.prep_notebook pyvis/network.py:552
[5 frames hidden] pyvis, jinja2
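As a rough reading of the profile above, the short calculation below expresses the main frames as a share of the total 4.111 seconds. The timings are copied from the output; the dictionary keys are shorthand labels chosen for this sketch.

```python
# timings in seconds, copied from the pyinstrument output above
TOTAL: float = 4.111

frames: dict[str, float] = {
    "spacy load": 3.232,
    "TextGraphs.create_pipeline": 0.669,
    "Network.prep_notebook": 0.046,
}

# share of total wall-clock time per frame, as a percentage
shares: dict[str, float] = {
    name: round(100.0 * secs / TOTAL, 1)
    for name, secs in frames.items()
}

for name, pct in shares.items():
    print(f"{name:28s} {pct:5.1f}%")
```

On these figures, loading the spaCy model inside PipelineFactory.__init__ accounts for most of the run, with parsing the input text a distant second.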
outro¶
[ more parts are in progress, getting added to this demo ]