Graph-Based Data Science

The kglab package provides a simple abstraction layer in Python for building knowledge graphs.

The main goal is to leverage idiomatic Python for common use cases in data science and data engineering work that require graph data, presenting graph-based data science as an emerging practice.

Cut to the Chase

  1. To get started right away, jump to Getting Started
  2. For other help, see Community Resources
  3. For an extensive, hands-on coding tour through kglab, follow the Tutorial notebooks
  4. Check the source code at



FAQ: Why build yet another graph library, when there are already so many available?

A short list of primary motivations has been identified for kglab, covering its design criteria and engineering trade-offs:

Point 1: integrate with popular graph libraries, including RDFLib, OWL-RL, pySHACL, NetworkX, igraph, PyVis, node2vec, pslpython, pgmpy, and so on – several of which would otherwise not have much common ground.
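
To make the "common ground" idea concrete, here is a minimal sketch using only the Python standard library. It does not use kglab or any of the libraries named above. The sample triples, the `ex:` prefixes, and the helper names `triples_to_adjacency` and `out_degree` are all hypothetical, chosen for illustration. The sketch shows how RDF-style triples (the shape that semantic libraries work with) might be projected onto an adjacency mapping (the shape that graph algorithm libraries tend to expect).

```python
from collections import defaultdict

# Hypothetical sample data: RDF-style (subject, predicate, object) triples.
TRIPLES = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:alice", "ex:worksAt", "ex:acme"),
]


def triples_to_adjacency(triples):
    """Project labeled triples onto a directed adjacency mapping.

    Each node maps to a list of (predicate, object) pairs, so the
    predicate survives as an edge label.
    """
    adjacency = defaultdict(list)
    for subj, pred, obj in triples:
        adjacency[subj].append((pred, obj))
        # Register nodes that only ever appear as objects.
        adjacency.setdefault(obj, [])
    return dict(adjacency)


def out_degree(adjacency):
    """Count outgoing edges per node, a typical graph-algorithm input."""
    return {node: len(edges) for node, edges in adjacency.items()}


adjacency = triples_to_adjacency(TRIPLES)
degrees = out_degree(adjacency)
print(degrees)
```

An abstraction layer would presumably handle this kind of conversion, along with namespaces and serialization, so that the same graph data can move between semantic tools and graph algorithm tools without hand-written glue code.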

Data Science Workflows

Point 2: provide close integration, plus example code, for working with the "PyData" stack (pandas, NumPy, scikit-learn, matplotlib, and so on), as well as PyTorch and other quintessential data science tools.

Distributed Systems Infrastructure

Point 3: integrate efficiently with Big Data tools and practices for contemporary data engineering and cloud computing infrastructure, such as Ray, Jupyter, RAPIDS, Apache Arrow, Apache Parquet, and Apache Spark.

Natural Language Understanding

Point 4: incorporate graph-based methods and semantic technologies into spaCy pipelines, e.g., through pytextrank, plus biome.text and other customized natural language pipelines.

Hybrid AI Approaches

Point 5: explore "hybrid" approaches that combine machine learning with symbolic, rule-based processing – including probabilistic graph inference and knowledge graph embedding.

Last update: 2021-04-03