Starting with a text document...
- parse the text document in `spaCy`
- iterate through each sentence
- filter for `token.pos_ in ["ADJ", "NOUN", "PROPN", "VERB"]`
- print the results
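
A minimal sketch of these steps, assuming the `en_core_web_sm` model is installed; the sample `text` here is only a placeholder:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# placeholder text; substitute your own document
text = "Compatibility of systems of linear constraints over the set of natural numbers."
doc = nlp(text)

# keep only the parts of speech that carry content
POS_KEPT = ["ADJ", "NOUN", "PROPN", "VERB"]

for sent in doc.sents:
    for token in sent:
        if token.pos_ in POS_KEPT:
            print(token.text, token.lemma_, token.pos_)
```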
Next, construct a graph...
- create a graph in `networkx`
- use `key = (token.lemma_, token.pos_)`
- create graph nodes using the key
- link nodes in each sentence which are within 3 hops of each other
- visualize the graph
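
One way to sketch the graph construction, reusing `doc` and `POS_KEPT` from above. Here "within 3 hops" is interpreted as a gap of at most 3 token positions within the sentence, and edge weights are omitted for simplicity:

```python
import networkx as nx
import matplotlib.pyplot as plt

g = nx.Graph()

for sent in doc.sents:
    seen = []  # (key, token index) pairs already visited in this sentence

    for token in sent:
        if token.pos_ not in POS_KEPT:
            continue

        key = (token.lemma_, token.pos_)
        g.add_node(key)

        # link back to earlier tokens in the same sentence within 3 hops
        for prev_key, prev_i in seen:
            if token.i - prev_i <= 3 and prev_key != key:
                g.add_edge(key, prev_key)

        seen.append((key, token.i))

nx.draw(g, with_labels=True, font_size=8)
plt.show()
```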
Next, rank tokens based on their "connectedness" within the graph...
- run `pagerank()` on the graph (eigenvector centrality)
- print the ranks (sort descending) and their lemma values
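
A sketch of the ranking step, assuming the graph `g` built above:

```python
import networkx as nx

# PageRank scores each node by its connectedness within the graph
ranks = nx.pagerank(g)

for (lemma, pos), rank in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{rank:.4f}  {lemma} ({pos})")
```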
Next, apply the ranks to the noun chunks within the text...
- iterate through each noun chunk in the text document
- sum the ranks for each token (lemma) within the noun chunk
- print the noun chunks (sort descending)
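
A sketch of scoring the noun chunks, reusing `doc` and `ranks` from above; tokens that were filtered out of the graph simply contribute zero:

```python
scored_chunks = []

for chunk in doc.noun_chunks:
    # sum the PageRank values for each lemma within the chunk
    score = sum(ranks.get((token.lemma_, token.pos_), 0.0) for token in chunk)
    scored_chunks.append((score, chunk.text))

for score, text in sorted(scored_chunks, reverse=True):
    print(f"{score:.4f}  {text}")
```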
Extras:
- use `spacy-wordnet` to add links among hypernyms and hyponyms (see the sketch after this list)
- restrict the WordNet domains to specific areas of interest
- how could you use noun chunks and entities within the graph?
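
A hedged sketch of the `spacy-wordnet` extra. It assumes the package is installed along with the NLTK WordNet corpora (e.g. `nltk.download("wordnet")`); the `"spacy_wordnet"` pipe name and the `token._.wordnet` extension follow the library's README for spaCy 3.x, so verify against your installed version. The domain list is illustrative:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("spacy_wordnet", after="tagger")

doc = nlp("The bank approved the loan after reviewing the application.")

# restrict the WordNet domains to specific areas of interest
domains = ["finance", "banking"]

for token in doc:
    if token.pos_ not in ["NOUN", "VERB"]:
        continue

    # synsets restricted to the chosen domains (returned as NLTK Synset objects)
    for synset in token._.wordnet.wordnet_synsets_for_domain(domains):
        for hyper in synset.hypernyms():
            # each pair is a candidate edge to add into the lemma graph, e.g.
            # g.add_edge((token.lemma_, token.pos_), (hyper.lemma_names()[0], token.pos_))
            print(token.lemma_, "->", hyper.name())
        for hypo in synset.hyponyms():
            print(token.lemma_, "<-", hypo.name())
```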