
Programming Challenge

Starting with a text document (see the sketch after this list)...

  • parse the text document in spaCy
  • iterate through each sentence...
  • filter for token.pos_ in ["ADJ", "NOUN", "PROPN", "VERB"]
  • print the results
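
A minimal sketch of these steps, assuming the `en_core_web_sm` pipeline is installed and using the placeholder file name `document.txt`:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# placeholder file name; substitute your own text document
with open("document.txt") as f:
    doc = nlp(f.read())

for sent in doc.sents:
    for token in sent:
        if token.pos_ in ["ADJ", "NOUN", "PROPN", "VERB"]:
            print(token.text, token.lemma_, token.pos_)
```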

Next, construct a graph (see the sketch after this list)...

  • create a graph in networkx
  • use key = (token.lemma_, token.pos_)
  • create graph nodes using the key
  • link nodes in each sentence that are within 3 hops of each other
  • visualize the graph
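
A sketch of the graph construction, reusing `doc` from the previous step. "Within 3 hops" is read here as: among a sentence's filtered tokens, link each node to the next three that follow it; that window interpretation is an assumption.

```python
import networkx as nx
import matplotlib.pyplot as plt

POS_KEPT = ["ADJ", "NOUN", "PROPN", "VERB"]

g = nx.Graph()

for sent in doc.sents:
    # the (lemma, pos) keys for this sentence, in order
    keys = [(t.lemma_, t.pos_) for t in sent if t.pos_ in POS_KEPT]

    for i, key in enumerate(keys):
        g.add_node(key)
        # link to the following nodes within a 3-hop window
        for other in keys[i + 1 : i + 4]:
            g.add_edge(key, other)

nx.draw(g, with_labels=True, node_size=100, font_size=8)
plt.show()
```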

Next, rank tokens based on their "connectedness" within the graph (sketched below)...

  • run pagerank() on the graph (a variant of eigenvector centrality)
  • print the ranks (sort descending) and their lemma values
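
A sketch of the ranking step, reusing the graph `g`:

```python
# networkx implements PageRank directly
ranks = nx.pagerank(g)

# highest-ranked (lemma, pos) keys first
for (lemma, pos), rank in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{rank:.4f}  {lemma}  {pos}")
```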

Next, apply the ranks to the noun chunks within the text (sketched below)...

  • iterate through each noun chunk in the text document
  • sum the ranks for each token (lemma) within the noun chunk
  • print the noun chunks (sort descending)
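
A sketch of the noun chunk scoring, reusing `doc` and `ranks`; tokens whose `(lemma, pos)` key never entered the graph contribute zero:

```python
chunk_scores = {}

for chunk in doc.noun_chunks:
    score = sum(ranks.get((t.lemma_, t.pos_), 0.0) for t in chunk)
    chunk_scores[chunk.text] = score

# highest-scoring chunks first
for text, score in sorted(chunk_scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{score:.4f}  {text}")
```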

Extras (a speculative sketch follows the list):

  • use spacy-wordnet to add links among hypernyms and hyponyms
  • restrict the WordNet domains to specific areas of interest
  • how could you use noun chunks and entities within the graph?
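
One possible approach to the first two extras, reusing `nlp`, `doc`, and `g` from above. The `"spacy_wordnet"` pipe name, the `token._.wordnet` extension, and `wordnet_synsets_for_domain()` follow the spacy-wordnet README for spaCy 3.x and may differ by version; the domain names here are placeholders:

```python
from spacy_wordnet.wordnet_annotator import WordnetAnnotator  # registers the pipe

nlp.add_pipe("spacy_wordnet", after="tagger")
doc = nlp(doc.text)  # re-parse so the WordNet annotations are attached

domains = ["finance", "banking"]  # placeholder: restrict to areas of interest

for token in doc:
    for synset in token._.wordnet.wordnet_synsets_for_domain(domains):
        for hypernym in synset.hypernyms():
            # one possible scheme: link the token's node to a node keyed
            # by the hypernym's first lemma, treated as a NOUN
            g.add_edge((token.lemma_, token.pos_), (hypernym.lemma_names()[0], "NOUN"))
```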