

Where possible, the bibliography entries use conventions at for citation keys.

Journal abbreviations come from based on ISO 4 standards.

Links to online versions of cited works use DOI for persistent identifiers. When available, open access URLs are listed as well.

– F –


"PositionRank: An Unsupervised Approach to Keyphrase Extraction from Scholarly Documents"
Corina Florescu, Cornelia Caragea
Comput Linguist Assoc Comput Linguis pp. 1105-1115 (2017-07-30)
DOI: 10.18653/v1/P17-1102

The large and growing amounts of online scholarly data present both challenges and opportunities to enhance knowledge discovery. One such challenge is to automatically extract a small set of keyphrases from a document that can accurately describe the document’s content and can facilitate fast information processing. In this paper, we propose PositionRank, an unsupervised model for keyphrase extraction from scholarly documents that incorporates information from all positions of a word’s occurrences into a biased PageRank. Our model obtains remarkable improvements in performance over PageRank models that do not take into account word positions as well as over strong baselines for this task. Specifically, on several datasets of research papers, PositionRank achieves improvements as high as 29.09%.
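The core idea is position-biased PageRank over a word co-occurrence graph. The minimal Python sketch below makes several simplifying assumptions: the input is already tokenized, there is no part-of-speech filtering or phrase assembly, the co-occurrence window covers two adjacent tokens, and the damping factor is 0.85. The function names `position_weights`, `cooccurrence_graph`, and `position_rank` are hypothetical, and this is not the authors' implementation. Each word's bias is the sum of 1/p over every position p where the word occurs. For example, a word seen at positions 2, 5, and 7 gets 1/2 + 1/5 + 1/7 before normalization.

```python
from collections import defaultdict

def position_weights(tokens):
    """Sum 1/p over each 1-indexed position p of every token, then normalize to 1."""
    raw = defaultdict(float)
    for pos, tok in enumerate(tokens, start=1):
        raw[tok] += 1.0 / pos
    total = sum(raw.values())
    return {tok: w / total for tok, w in raw.items()}

def cooccurrence_graph(tokens, window=2):
    """Undirected weighted graph; edge weight counts co-occurrences within `window` tokens."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, a in enumerate(tokens):
        for b in tokens[i + 1 : i + window]:
            if a != b:  # skip self-loops
                graph[a][b] += 1.0
                graph[b][a] += 1.0
    return graph

def position_rank(tokens, window=2, damping=0.85, iters=100):
    """Biased PageRank whose restart vector is the normalized position weights."""
    bias = position_weights(tokens)
    graph = cooccurrence_graph(tokens, window)
    out_weight = {u: sum(nbrs.values()) for u, nbrs in graph.items()}
    score = dict(bias)
    for _ in range(iters):
        # Each word restarts in proportion to its position weight and otherwise
        # receives score from neighbours in proportion to edge weight.
        score = {
            v: (1 - damping) * bias[v]
            + damping * sum(score[u] * w / out_weight[u] for u, w in graph.get(v, {}).items())
            for v in bias
        }
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical usage on a toy sentence.
tokens = "graph based keyphrase extraction ranks keyphrase candidates in a word graph".split()
print(position_rank(tokens)[:3])
```

Words that appear early and often receive a larger restart probability. That bias is the only change from an ordinary PageRank run over the same graph.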

– G –


"PageRank Beyond the Web"
David Gleich
SIAM Review 57 3 pp. 321-363 (2015-08-06)
DOI: 10.1137/140976649

Google's PageRank method was developed to evaluate the importance of web-pages via their link structure. The mathematics of PageRank, however, are entirely general and apply to any graph or network in any domain. Thus, PageRank is now regularly used in bibliometrics, social and information network analysis, and for link prediction and recommendation. It's even used for systems analysis of road networks, as well as biology, chemistry, neuroscience, and physics. We'll see the mathematics and ideas that unite these diverse applications.
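Nothing in the computation depends on web pages. The generic power-iteration sketch below therefore accepts any list of directed `(source, target)` edges, whether the nodes are web pages, road intersections, or proteins. It assumes uniform teleportation, spreads the score of dangling nodes (nodes with no out-links) uniformly, and uses `alpha = 0.85`. The function name `pagerank` and its parameters are hypothetical choices made for this example.

```python
def pagerank(edges, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank on any directed graph given as (src, dst) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    x = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Score held by dangling nodes is redistributed uniformly.
        dangling = sum(x[v] for v in nodes if not out[v])
        new = {v: (1 - alpha) / n + alpha * dangling / n for v in nodes}
        for v in nodes:
            for w in out[v]:
                new[w] += alpha * x[v] / len(out[v])
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(new[v] - x[v]) for v in nodes) < tol:
            return new
        x = new
    return x

# Hypothetical star graph: every other node points at "A".
print(pagerank([("B", "A"), ("C", "A"), ("D", "A")]))
```

In the star graph, node A ends up with the highest score, and B, C, and D tie, because every other node links to A and nothing links to them.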

– K –


"Biased TextRank: Unsupervised Graph-Based Content Extraction"
Ashkan Kazemi, Verónica Pérez-Rosas, Rada Mihalcea
COLING 28 pp. 1642-1652 (2020-12-08)
DOI: 10.18653/v1/2020.coling-main.144

We introduce Biased TextRank, a graph-based content extraction method inspired by the popular TextRank algorithm that ranks text spans according to their importance for language processing tasks and according to their relevance to an input 'focus'. Biased TextRank enables focused content extraction for text by modifying the random restarts in the execution of TextRank. The random restart probabilities are assigned based on the relevance of the graph nodes to the focus of the task. We present two applications of Biased TextRank: focused summarization and explanation extraction, and show that our algorithm leads to improved performance on two different datasets by significant ROUGE-N score margins. Much like its predecessor, Biased TextRank is unsupervised, easy to implement and orders of magnitude faster and lighter than current state-of-the-art Natural Language Processing methods for similar tasks.
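Changing the random restarts this way amounts to personalized PageRank, with a restart vector built from relevance to the focus. The Python sketch below illustrates the idea under stated assumptions. It uses bag-of-words cosine similarity in place of whatever sentence representation the authors used, and it builds a fully connected sentence graph weighted by that similarity. It then normalizes each sentence's similarity to the focus into restart probabilities. The names `cosine` and `biased_textrank` are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Bag-of-words cosine similarity (a stand-in for learned sentence embeddings)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(c * c for c in ca.values())) * math.sqrt(sum(c * c for c in cb.values()))
    return dot / norm if norm else 0.0

def biased_textrank(sentences, focus, damping=0.85, iters=100):
    """Rank sentence indices by PageRank with focus-weighted random restarts."""
    n = len(sentences)
    # Edge weights: pairwise sentence similarity, no self-loops.
    sim = [[cosine(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    row_sum = [sum(row) for row in sim]
    # Restart probabilities: relevance to the focus, falling back to uniform.
    bias = [cosine(s, focus) for s in sentences]
    total = sum(bias)
    bias = [b / total for b in bias] if total else [1.0 / n] * n
    score = bias[:]
    for _ in range(iters):
        score = [
            (1 - damping) * bias[i]
            + damping * sum(score[j] * sim[j][i] / row_sum[j] for j in range(n) if sim[j][i] > 0)
            for i in range(n)
        ]
    return sorted(range(n), key=lambda i: score[i], reverse=True)

docs = ["the cat sat on the mat", "dogs chase the cat",
        "stock prices rose sharply today", "markets and stock prices fell"]
print(biased_textrank(docs, "stock prices"))
```

Consider a sentence that shares no words with the focus and resembles only other unrelated sentences. It receives no restart mass, so it sinks to the bottom of the ranking.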

– M –


"TextRank: Bringing Order into Text"
Rada Mihalcea, Paul Tarau
EMNLP pp. 404-411 (2004-07-25)

In this paper, the authors introduce TextRank, a graph-based ranking model for text processing, and show how this model can be successfully used in natural language applications.
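For reference, TextRank scores each vertex $V_i$ of a text graph with a recursion adapted from PageRank. The paper gives the weighted form below, where $d$ is the damping factor (usually set to 0.85), $In(V_i)$ and $Out(V_j)$ are the vertices linking into $V_i$ and out of $V_j$, and $w_{ji}$ is the weight of the edge from $V_j$ to $V_i$:

$$
WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} \, WS(V_j)
$$

With every $w_{ji} = 1$, the fraction reduces to $1 / |Out(V_j)|$, which gives the unweighted score.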

– P –


"The PageRank Citation Ranking: Bringing Order to the Web"
Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd
Stanford InfoLab (1999-11-11)

The importance of a Web page is an inherently subjective matter, which depends on the readers' interests, knowledge and attitudes. But there is still much that can be said objectively about the relative importance of Web pages. This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them. We compare PageRank to an idealized random Web surfer. We show how to efficiently compute PageRank for large numbers of pages. And, we show how to apply PageRank to search and to user navigation.
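The random surfer can be simulated directly. In the minimal Python sketch below, the surfer follows a random out-link with probability 0.85. Otherwise, or whenever a page has no out-links, the surfer jumps to a page chosen uniformly at random. The long-run fraction of visits to each page then approximates its PageRank. The four-page link graph and the function name `random_surfer` are hypothetical. Uniform jumps are a simplifying assumption, since the paper also considers non-uniform jump distributions.

```python
import random
from collections import Counter

def random_surfer(links, steps=200_000, damping=0.85, seed=0):
    """Estimate PageRank as the visit frequency of a simulated random surfer."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    pages = sorted(links)
    page = rng.choice(pages)
    visits = Counter()
    for _ in range(steps):
        visits[page] += 1
        outlinks = links[page]
        if outlinks and rng.random() < damping:
            page = rng.choice(outlinks)  # follow a link on the current page
        else:
            page = rng.choice(pages)     # get bored and jump to any page
    return {p: visits[p] / steps for p in pages}

# Hypothetical link graph: every target must also appear as a key.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
print(random_surfer(links))
```

Page D has no in-links, so the surfer reaches it only through random jumps, about (1 - 0.85) / 4 = 0.0375 of the time.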

– W –


"Summarizing documents"
Mike Williams

I've recently given a couple of talks (PyGotham video, PyGotham slides, Strata NYC slides) about text summarization. I cover three ways of automatically summarizing text. One is an extremely simple algorithm from the 1950s, one uses Latent Dirichlet Allocation, and one uses skipthoughts and recurrent neural networks. The talk is conceptual, and avoids code and mathematics. So here is a list of resources if you're interested in text summarization and want to dive deeper. This list is hopefully also useful if you're interested in topic modelling or neural networks for other reasons.

Last update: 2021-06-29