What's a Knowledge Graph?¶
DRAFT: Work in progress¶
This material is a work in progress, at "rough draft" stage.
First, read: [hogan2020knowledge]
Some good elements for KG definition: https://towardsdatascience.com/howtocreaterepresentationsofentitiesinaknowledgegraphusingpyrdf2vec82e44dad1a0#1f07
Graphs are data structures that are useful to represent ubiquitous phenomena, such as social networks, chemical molecules and recommendation systems. One of their strengths lies in the fact that they explicitly model relations (i.e. edges) between individual units (i.e. nodes), which adds an extra dimension to the data.
compare: [nathan2019fifty]
going back to: [peirce1882]
Just Enough Graph Theory¶
In a pure mathematical form, where a node (or vertex) can connect through an edge (or arc or link) to another node.
In that case an adjacency matrix can represent the entire graph. Each node has a row and a column in the matrix.
 https://en.wikipedia.org/wiki/Adjacency_matrix
 https://mathworld.wolfram.com/AdjacencyMatrix.html
In the simplest form, a 1
value in the matrix element represents an
edge between nodes or a 0
otherwise.
 symmetric for undirected graphs
 asymmetric for directed graphs
If a directed graph has weights on its edges (i.e., to represent the
probability of an event between two nodes) then replace the 1
value
with the weight or probability.
This is called a stochastic matrix
 https://en.wikipedia.org/wiki/Stochastic_matrix
 transitions of Markov chain (state)
There's an entire field of algebraic graph theory that translates between graph theory and linear algebra.
eigenvalues, eigenvectors, spectrum
nonnegative, symmetric properties allow for factorization (or decomposition):
 https://en.wikipedia.org/wiki/Matrix_decomposition
 https://sparse.tamu.edu/about
 https://www.cs.purdue.edu/homes/dgleich/
RDF Graph¶
 RDF graph, semantic technologies
 KG compare/contrast with property graph
 the ubiquity of graphbased work in data science
 graph algorithms vs. visualization vs. graph AI vs. probabilistic graphs vs. graph queries
 separation of concerns between graphbased data science and data engineering
 lessons of 2010 all over again: MPPs, BI, EDW, etc.
narrative arc: [linden2006early] => [kreps2014] => [anderson2020dt] with [breiman2001] [brewer2012cap] inbetween
In 2018, Gartner began to acknowledge the term knowledge graph and in mid2020 described the importance of KGs for developing AI applications^{1}.
KGs allow for multiple teams to be working concurrently, i.e., with less centralized control.
This is in contrast to legacy notion about one size fits all (OSFA) for data management.
To wit, the best way to make data consistent and available^{2} is not to import all of the data sources into a single data management framework, where a limited of set of individuals proscribe the schema and access rules. Those days are long gone.

"How to Build Knowledge Graphs That Enable AIDriven Enterprise Applications", Afraz Jaffri, Gartner Research (20200527) ↩

see [brewer2012cap] for discussion of the CAP Theorem ↩