# Graph-Based Data Science

The **kglab** package provides a simple abstraction layer in Python for building *knowledge graphs*.

The main goal is to leverage idiomatic Python for common use cases in data science and data engineering work that require graph data, presenting *graph-based data science* as an emerging practice.

## Cut to the Chase

- To get started right away, jump to *Getting Started*
- For other help, see *Community Resources*
- For an extensive, hands-on coding tour through **kglab**, follow the *Tutorial* notebooks
- Check the source code at https://github.com/DerwenAI/kglab

## Motivations

Note

**FAQ:** Why build yet another graph library, when there are already so many available?

A short list of primary motivations has shaped **kglab**, its design criteria, and its engineering trade-offs:

### Popular Graph Libraries

**Point 1:** integrate with popular graph libraries, including RDFlib, OWL-RL, pySHACL, NetworkX, iGraph, PyVis, node2vec, pslpython, pgmpy, and so on – several of which would otherwise not have much common ground.

### Data Science Workflows

**Point 2:** close integration plus example code for working with the "PyData" stack, namely pandas, NumPy, scikit-learn, matplotlib, etc., as well as PyTorch and other quintessential data science tools.

### Distributed Systems Infrastructure

**Point 3:** integrate efficiently with *Big Data* tools and practices for contemporary data engineering and cloud computing infrastructure, including Ray, Jupyter, RAPIDS, Apache Arrow, Apache Parquet, Apache Spark, etc.

### Natural Language Understanding

**Point 4:** incorporate graph-based methods and semantic technologies into `spaCy` pipelines, e.g., through `pytextrank`, plus `biome.text` and other customized natural language pipelines.

### Hybrid AI Approaches

**Point 5:** explore "hybrid" approaches that combine machine learning with symbolic, rule-based processing – including probabilistic graph inference and knowledge graph embedding.