Reference: kglab package

KnowledgeGraph class

This is the primary class used to represent RDF graphs, on which the other classes are dependent. See https://derwen.ai/docs/kgl/concepts/#knowledge-graph

Core feature areas include:

  • namespace management (ontology, controlled vocabularies)
  • graph construction
  • serialization
  • SPARQL querying
  • SHACL validation
  • inference based on OWL-RL, RDFS, SKOS

__init__ method

__init__(name="generic", base_uri=None, language="en", use_gpus=True, import_graph=None, namespaces=None)

Constructor for a KnowledgeGraph object.

  • name : str
    optional, internal name for this graph

  • base_uri : str
    the default base URI for this RDF graph

  • language : str
    the default language tag, e.g., used for language indexing

  • use_gpus : bool
    optionally, use the NVIDIA GPU devices with RAPIDS if these libraries have been installed and the devices are available; defaults to True

  • import_graph : typing.Union[rdflib.graph.ConjunctiveGraph, rdflib.graph.Dataset, rdflib.graph.Graph, NoneType]
    optionally, another existing RDF graph to be used as a starting point

  • namespaces : dict
    a dictionary that maps prefix strings (keys) to namespace IRIs (values), to add as controlled vocabularies available for use in the RDF graph; each prefix gets bound to its corresponding namespace
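As a minimal illustration of the shape of the namespaces argument, the plain-Python sketch below uses prefixes as keys and namespace IRIs as values, then expands a compact name such as ex:Recipe against those bindings. The example IRIs and the expand_curie helper are hypothetical and not part of kglab; only the dictionary shape corresponds to what gets passed as namespaces.

```python
# Hypothetical prefix -> namespace IRI bindings, in the shape that the
# `namespaces` parameter expects (keys are prefixes, values are IRIs).
namespaces = {
    "ex": "http://example.org/ontology/",
    "nom": "http://example.org/nom#",
}


def expand_curie(curie: str, ns: dict) -> str:
    """Expand a compact name like 'ex:Recipe' into a full IRI string."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in ns:
        raise KeyError(f"unbound prefix in {curie!r}")
    return ns[prefix] + local


print(expand_curie("ex:Recipe", namespaces))  # http://example.org/ontology/Recipe
```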


rdf_graph method

rdf_graph()

Accessor for the RDF graph.


add_ns method

add_ns(prefix, iri, override=True, replace=False)

Adds another namespace among the controlled vocabularies available to use in the RDF graph, binding the prefix to the given namespace.

Since the RDFlib NamespaceManager automatically converts all input bindings into URIRef objects, this method keeps its own references to the Namespace objects for later use.

  • prefix : str
    a namespace prefix; it's recommended to confirm prefix usage (based on convention) by searching on http://prefix.cc/

  • iri : str
    URL to use for constructing the namespace IRI

  • override : bool
    rebind, even if the given namespace is already bound with another prefix

  • replace : bool
    replace any existing prefix with the new namespace


get_ns method

get_ns(prefix)

Look up a namespace among the controlled vocabularies available to use in the RDF graph.

  • prefix : str
    a namespace prefix

  • returns : rdflib.namespace.Namespace
    the RDFlib Namespace for the controlled vocabulary referenced by prefix


get_ns_dict method

get_ns_dict()

Generate a dictionary of the namespaces used in this RDF graph.

  • returns : dict
    a dict describing the namespaces in this RDF graph

describe_ns method

describe_ns()

Describe the namespaces used in this RDF graph.


get_context method

get_context()

Generates a JSON-LD context used for serializing the RDF graph as JSON-LD.

  • returns : dict
    context needed for JSON-LD serialization

encode_date method

encode_date(datetime, tzinfos)

Helper method to ensure that an input datetime value has a timezone that can be interpreted by rdflib.XSD.dateTime.

  • datetime : str
    input datetime as a string

  • tzinfos : dict
    timezones as a dict, used by

  • returns : rdflib.term.Literal
    rdflib.Literal formatted as an XML Schema 2 dateTime value
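A rough sketch of the idea behind encode_date, using only the Python standard library: attach an explicit UTC offset, looked up from a tzinfos-style dict, so the result is a valid xsd:dateTime lexical form. The input format ('YYYY-MM-DD HH:MM TZ'), the offsets-in-seconds mapping, and the encode_date_sketch name are assumptions for illustration; the real method returns an rdflib Literal rather than a string.

```python
from datetime import datetime, timedelta, timezone


def encode_date_sketch(text: str, tzinfos: dict) -> str:
    """Return an xsd:dateTime lexical form with an explicit UTC offset.

    Assumes `text` looks like 'YYYY-MM-DD HH:MM TZ', where TZ is an
    abbreviation whose offset from UTC, in seconds, is given in `tzinfos`.
    """
    stamp, _, abbrev = text.rpartition(" ")
    offset = timezone(timedelta(seconds=tzinfos[abbrev]))
    naive = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
    return naive.replace(tzinfo=offset).isoformat()


tzinfos = {"PST": -8 * 3600, "UTC": 0}
print(encode_date_sketch("2021-03-04 15:30 PST", tzinfos))  # 2021-03-04T15:30:00-08:00
```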


add method

add(s, p, o)

Wrapper for rdflib.Graph.add() to add a relation (subject, predicate, object) to the RDF graph, if it doesn't already exist. Uses the RDF Graph as its context.

To prepare for upcoming kglab features, this is the preferred method for adding relations to an RDF graph.

  • s : typing.Union[rdflib.term.URIRef, rdflib.term.Literal, rdflib.term.BNode]
    subject node;

  • p : typing.Union[rdflib.term.URIRef, rdflib.term.Literal, rdflib.term.BNode]
    predicate relation;

  • o : typing.Union[rdflib.term.URIRef, rdflib.term.Literal, rdflib.term.BNode]
    object node;


remove method

remove(s, p, o)

Wrapper for rdflib.Graph.remove() to remove a relation (subject, predicate, object) from the RDF graph, if it exists. Uses the RDF Graph as its context.

To prepare for upcoming kglab features, this is the preferred method for removing relations from an RDF graph.

  • s : typing.Union[rdflib.term.URIRef, rdflib.term.Literal, rdflib.term.BNode]
    subject node;

  • p : typing.Union[rdflib.term.URIRef, rdflib.term.Literal, rdflib.term.BNode]
    predicate relation;

  • o : typing.Union[rdflib.term.URIRef, rdflib.term.Literal, rdflib.term.BNode]
    object node;
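The add-if-absent and remove-if-present behavior described for add() and remove() matches set semantics. The toy class below is a hypothetical stand-in for an RDF graph, not kglab or RDFlib code, and uses plain strings for nodes.

```python
class ToyGraph:
    """Hypothetical in-memory graph: a set of (s, p, o) string tuples."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))  # adding an existing triple changes nothing

    def remove(self, s, p, o):
        self.triples.discard((s, p, o))  # removing a missing triple is not an error


g = ToyGraph()
g.add("ex:alice", "foaf:knows", "ex:bob")
g.add("ex:alice", "foaf:knows", "ex:bob")
print(len(g.triples))  # 1
g.remove("ex:alice", "foaf:knows", "ex:carol")
g.remove("ex:alice", "foaf:knows", "ex:bob")
print(len(g.triples))  # 0
```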


load_rdf method

load_rdf(path, format="ttl", base=None, **args)

Wrapper for rdflib.Graph.parse() which parses an RDF graph from the path source. This traps some edge cases for the several source-ish parameters in RDFlib which had been overloaded. Throws TypeError whenever a format parser plugin encounters a syntax error.

Note: this adds relations to the RDF graph; it does not overwrite the existing RDF graph.

  • path : typing.Union[str, pathlib.Path, urlpath.URL, typing.IO]
    must be a file name (str) or a path object (not a URL) to a local file reference; or a readable, file-like object

  • format : str
    serialization format, defaults to Turtle triples; see _RDF_FORMAT for a list of default formats, which can be extended with plugins – excluding the "json-ld" format; otherwise this throws a TypeError exception

  • base : str
    logical URI to use as the document base; if not specified, the document location gets used

  • returns : KnowledgeGraph
    this KnowledgeGraph object – used for method chaining


load_rdf_text method

load_rdf_text(data, format="ttl", base=None, **args)

Wrapper for rdflib.Graph.parse() which parses an RDF graph from text. This traps some edge cases for the several source-ish parameters in RDFlib which had been overloaded.

Note: this adds relations to the RDF graph; it does not overwrite the existing RDF graph.

  • data : typing.AnyStr
    text representation of RDF graph data

  • format : str
    serialization format, defaults to Turtle triples; see _RDF_FORMAT for a list of default formats, which can be extended with plugins – excluding the "json-ld" format; otherwise this throws a TypeError exception

  • base : str
    logical URI to use as the document base; if not specified, the document location gets used

  • returns : KnowledgeGraph
    this KnowledgeGraph object – used for method chaining


save_rdf method

save_rdf(path, format="ttl", base=None, encoding="utf-8", **args)

Wrapper for rdflib.Graph.serialize() which serializes the RDF graph to the path destination. This traps some edge cases for the destination parameter in RDFlib which had been overloaded.

  • path : typing.Union[str, pathlib.Path, urlpath.URL, typing.IO]
    must be a file name (str) or a path object (not a URL) to a local file reference; or a writable, bytes-like object; otherwise this throws a TypeError exception

  • format : str
    serialization format, which defaults to Turtle triples; see _RDF_FORMAT for a list of default formats, which can be extended with plugins – excluding the "json-ld" format; otherwise this throws a TypeError exception

  • base : str
    optional base set for the graph

  • encoding : str
    text encoding value, defaults to "utf-8", must be in the Python codec registry; otherwise this throws a LookupError exception


save_rdf_text method

save_rdf_text(format="ttl", base=None, encoding="utf-8", **args)

Wrapper for rdflib.Graph.serialize() which serializes the RDF graph to a string.

  • format : str
    serialization format, which defaults to Turtle triples; see _RDF_FORMAT for a list of default formats, which can be extended with plugins; otherwise this throws a TypeError exception

  • base : str
    optional base set for the graph

  • encoding : str
    text encoding value, defaults to "utf-8", must be in the Python codec registry; otherwise this throws a LookupError exception

  • returns : typing.AnyStr
    text representing the RDF graph


load_jsonld method

load_jsonld(path, encoding="utf-8", **args)

Wrapper for rdflib-jsonld.parser.JsonLDParser.parse() which parses an RDF graph from a JSON-LD source. This traps some edge cases for the several source-ish parameters in RDFlib which had been overloaded.

Note: this adds relations to the RDF graph; it does not overwrite the existing RDF graph.

  • path : typing.Union[str, pathlib.Path, urlpath.URL, typing.IO]
    must be a file name (str) or a path object (not a URL) to a local file reference; or a readable, file-like object; otherwise this throws a TypeError exception

  • encoding : str
    text encoding value, which defaults to "utf-8"; must be in the Python codec registry; otherwise this throws a LookupError exception

  • returns : KnowledgeGraph
    this KnowledgeGraph object – used for method chaining


save_jsonld method

save_jsonld(path, encoding="utf-8", **args)

Wrapper for rdflib-jsonld.serializer.JsonLDSerializer.serialize() which serializes the RDF graph to the path destination as JSON-LD. This traps some edge cases for the destination parameter in RDFlib which had been overloaded.

  • path : typing.Union[str, pathlib.Path, urlpath.URL, typing.IO]
    must be a file name (str) or a path object (not a URL) to a local file reference; or a writable, bytes-like object; otherwise this throws a TypeError exception

  • encoding : str
    text encoding value, which defaults to "utf-8"; must be in the Python codec registry; otherwise this throws a LookupError exception


load_csv method

load_csv(url)

Wrapper for csvwlib which parses a CSV file from the given URL, then converts it to RDF and merges the result into this RDF graph.

  • url : str
    must be a URL represented as a string

  • returns : KnowledgeGraph
    this KnowledgeGraph object – used for method chaining


load_parquet method

load_parquet(path, **kwargs)

Wrapper for pandas.read_parquet() which parses an RDF graph represented as a Parquet file, using the pyarrow engine. Uses the RAPIDS cuDF library if GPUs are enabled.

To prepare for upcoming kglab features, this is the preferred method for deserializing an RDF graph.

Note: this adds relations to the RDF graph; it does not overwrite the existing RDF graph.

  • path : typing.Union[str, pathlib.Path, urlpath.URL, typing.IO]
    must be a file name (str), path object to a local file reference, or a readable, file-like object; a string could be a URL; valid URL schemes include https, http, ftp, s3, gs, file; a file URL can also be a path to a directory that contains multiple partitioned files, including a bucket in cloud storage – based on fsspec

  • returns : KnowledgeGraph
    this KnowledgeGraph object – used for method chaining


save_parquet method

save_parquet(path, compression="snappy", storage_options=None, **kwargs)

Wrapper for pandas.DataFrame.to_parquet() which serializes an RDF graph to a Parquet file, using the pyarrow engine. Uses the RAPIDS cuDF library if GPUs are enabled.

To prepare for upcoming kglab features, this is the preferred method for serializing an RDF graph.

  • path : typing.Union[str, pathlib.Path, urlpath.URL, typing.IO]
    must be a file name (str), path object to a local file reference, or a writable, bytes-like object; a string could be a URL; valid URL schemes include https, http, ftp, s3, gs, file; accessing cloud storage is based on fsspec

  • compression : str
    name of the compression algorithm to use; defaults to "snappy"; can also be "gzip", "brotli", or None for no compression

  • storage_options : dict
    extra options parsed by fsspec for cloud storage access; NOT USED until pandas 1.2.x becomes stable across platforms and also RAPIDS provides support


n3fy method

n3fy(node, pythonify=True)

Wrapper for RDFlib n3() and toPython() to serialize a node into a human-readable representation using N3 format.

  • node : typing.Union[rdflib.term.URIRef, rdflib.term.Literal, rdflib.term.BNode]
    must be an rdflib.term.Node

  • pythonify : bool
    flag to force instances of rdflib.term.Literal to their Python literal representation

  • returns : typing.Any
    text (or Python objects) for the serialized node


n3fy_row method

n3fy_row(row_dict, pythonify=True)

Wrapper for RDFlib n3() and toPython() to serialize one row of a result set from a SPARQL query into a human-readable representation for each term using N3 format.

  • row_dict : dict
    one row of SPARQL query results, as a dict

  • pythonify : bool
    flag to force instances of rdflib.term.Literal to their Python literal representation

  • returns : dict
    a dictionary of serialized row bindings
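To illustrate what n3fy() and n3fy_row() produce, here is a self-contained sketch with hypothetical IRI and Lit classes standing in for rdflib.term.URIRef and rdflib.term.Literal. It converts only xsd:integer literals to Python values, and the exact N3 strings that RDFlib emits may differ (for example, when a namespace manager abbreviates IRIs).

```python
from dataclasses import dataclass
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class IRI:
    """Hypothetical stand-in for rdflib.term.URIRef."""
    value: str


@dataclass(frozen=True)
class Lit:
    """Hypothetical stand-in for rdflib.term.Literal."""
    lexical: str
    datatype: Optional[str] = None


def n3fy_sketch(node, pythonify=True):
    """Render a term in N3-like form, or convert a literal to a Python value."""
    if isinstance(node, Lit):
        if pythonify:
            # only xsd:integer is converted here; other literals stay as text
            if node.datatype == XSD + "integer":
                return int(node.lexical)
            return node.lexical
        if node.datatype is None:
            return f'"{node.lexical}"'
        return f'"{node.lexical}"^^<{node.datatype}>'
    return f"<{node.value}>"


def n3fy_row_sketch(row_dict, pythonify=True):
    """Apply n3fy_sketch to every variable binding in one result row."""
    return {var: n3fy_sketch(term, pythonify) for var, term in row_dict.items()}


row = {"recipe": IRI("http://example.org/recipe/1"), "minutes": Lit("30", XSD + "integer")}
print(n3fy_row_sketch(row))  # {'recipe': '<http://example.org/recipe/1>', 'minutes': 30}
```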


query method

query(sparql, bindings=None)

Wrapper for rdflib.Graph.query() to perform a SPARQL query on the RDF graph.

  • sparql : str
    text for the SPARQL query

  • bindings : dict
    initial variable bindings

  • yields :
    rdflib.query.ResultRow named tuples, to iterate through the query result set


query_as_df method

query_as_df(sparql, bindings=None, simplify=True, pythonify=True)

Wrapper for rdflib.Graph.query() to perform a SPARQL query on the RDF graph, returning the result set as a dataframe.

  • sparql : str
    text for the SPARQL query

  • bindings : dict
    initial variable bindings

  • simplify : bool
    convert terms in each row of the result set into a readable representation for each term, using N3 format

  • pythonify : bool
    convert instances of rdflib.term.Literal to their Python literal representation

  • returns : pandas.core.frame.DataFrame
    the query result set represented as a pandas.DataFrame; uses the RAPIDS cuDF library if GPUs are enabled


validate method

validate(shacl_graph=None, shacl_graph_format=None, ont_graph=None, ont_graph_format=None, advanced=False, inference=None, inplace=True, abort_on_error=None, **kwargs)

Wrapper for pyshacl.validate() for validating the RDF graph using rules expressed in SHACL (Shapes Constraint Language).

  • shacl_graph : typing.Union[rdflib.graph.ConjunctiveGraph, rdflib.graph.Dataset, rdflib.graph.Graph, ~AnyStr, NoneType]
    text representation, file path, or URL of the SHACL shapes graph to use in validation

  • shacl_graph_format : typing.Union[str, NoneType]
    RDF format, if the shacl_graph parameter is a text representation of the shapes graph

  • ont_graph : typing.Union[rdflib.graph.ConjunctiveGraph, rdflib.graph.Dataset, rdflib.graph.Graph, ~AnyStr, NoneType]
    text representation, file path, or URL of an optional, extra ontology to mix into the RDF graph

  • ont_graph_format
    RDF format, if the ont_graph parameter is a text representation of the extra ontology

  • advanced : typing.Union[bool, NoneType]
    enable advanced SHACL features

  • inference : typing.Union[str, NoneType]
    prior to validation, run OWL2 RL profile-based expansion of the RDF graph based on OWL-RL; values: "rdfs", "owlrl", "both", None

  • inplace : typing.Union[bool, NoneType]
    when enabled, do not clone the RDF graph prior to inference/expansion, just manipulate it in-place

  • abort_on_error : typing.Union[bool, NoneType]
    abort validation on the first error

  • returns : typing.Tuple[bool, KnowledgeGraph, str]
    a tuple of conforms (RDF graph passes the validation rules) + report_graph (report as a KnowledgeGraph object) + report_text (report formatted as text)


infer_owlrl_closure method

infer_owlrl_closure()

Infer deductive closure for OWL 2 RL semantics based on OWL-RL

See https://wiki.uib.no/info216/index.php/Python_Examples#RDFS_inference_with_RDFLib


infer_rdfs_closure method

infer_rdfs_closure()

Infer deductive closure for RDFS semantics based on OWL-RL

See https://wiki.uib.no/info216/index.php/Python_Examples#RDFS_inference_with_RDFLib


infer_rdfs_properties method

infer_rdfs_properties()

Perform RDFS sub-property inference, adding super-properties where sub-properties have been used.

Adapted from skosify, which wasn't being updated regularly.
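Sub-property inference can be sketched over string triples as a fixpoint loop: whenever (s, p, o) holds and p rdfs:subPropertyOf q, add (s, q, o), repeating so that chains of sub-properties are followed. This plain-Python version is an illustration under those assumptions, not the kglab or skosify implementation; the ex: names are hypothetical.

```python
RDFS_SUBPROP = "rdfs:subPropertyOf"


def infer_super_properties(triples):
    """Add (s, q, o) whenever (s, p, o) holds and p rdfs:subPropertyOf q."""
    result = set(triples)
    while True:
        supers = {(p, q) for (p, rel, q) in result if rel == RDFS_SUBPROP}
        new = {(s, q, o) for (s, p, o) in result for (sub, q) in supers if p == sub}
        if new <= result:  # fixpoint reached: nothing new to add
            return result
        result |= new


triples = {
    ("ex:alice", "ex:hasMother", "ex:carol"),
    ("ex:hasMother", RDFS_SUBPROP, "ex:hasParent"),
    ("ex:hasParent", RDFS_SUBPROP, "ex:hasAncestor"),
}
inferred = infer_super_properties(triples)
print(("ex:alice", "ex:hasAncestor", "ex:carol") in inferred)  # True
```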


infer_rdfs_classes method

infer_rdfs_classes()

Perform RDFS subclass inference, marking all resources having a subclass type with their superclass.

Adapted from skosify, which wasn't being updated regularly.
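Likewise, a minimal sketch of subclass inference over string triples: whenever x rdf:type C and C rdfs:subClassOf D, add x rdf:type D, iterating until nothing new appears. It is illustrative only and not the kglab implementation; the ex: names are hypothetical.

```python
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS = "rdfs:subClassOf"


def infer_superclasses(triples):
    """Add (x, rdf:type, D) whenever x rdf:type C and C rdfs:subClassOf D."""
    result = set(triples)
    while True:
        supers = {(c, d) for (c, rel, d) in result if rel == RDFS_SUBCLASS}
        new = {
            (x, RDF_TYPE, d)
            for (x, rel, c) in result if rel == RDF_TYPE
            for (sub, d) in supers if c == sub
        }
        if new <= result:  # fixpoint reached
            return result
        result |= new


triples = {
    ("ex:rex", RDF_TYPE, "ex:Dog"),
    ("ex:Dog", RDFS_SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", RDFS_SUBCLASS, "ex:Animal"),
}
inferred = infer_superclasses(triples)
print(sorted(o for (s, p, o) in inferred if s == "ex:rex"))  # ['ex:Animal', 'ex:Dog', 'ex:Mammal']
```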


infer_skos_related method

infer_skos_related()

Infer OWL symmetry (both directions) for skos:related (S23)

Adapted from skosify, which wasn't being updated regularly.
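Symmetry for skos:related amounts to adding the reverse of every skos:related triple. A one-function sketch over string triples (hypothetical names, not kglab code); the prop parameter shows that the same pattern would apply to other symmetric properties.

```python
def infer_symmetric(triples, prop="skos:related"):
    """Return the triples plus the reverse of every triple that uses `prop`."""
    triples = set(triples)
    return triples | {(o, p, s) for (s, p, o) in triples if p == prop}


inferred = infer_symmetric({("ex:bread", "skos:related", "ex:flour")})
print(("ex:flour", "skos:related", "ex:bread") in inferred)  # True
```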


infer_skos_concept method

infer_skos_concept()

Infer skos:topConceptOf as a sub-property of skos:inScheme (S7)

Infer skos:topConceptOf as owl:inverseOf the property skos:hasTopConcept (S8)

Adapted from skosify, which wasn't being updated regularly.


infer_skos_hierarchical method

infer_skos_hierarchical(narrower=True)

Infer skos:narrower as owl:inverseOf the property skos:broader; only keep skos:narrower on request (S25)

Adapted from skosify, which wasn't being updated regularly.

  • narrower : bool
    if false, skos:narrower will be removed instead of added
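The narrower flag can be sketched as follows, over string triples: every skos:narrower link yields the inverse skos:broader link; then skos:narrower is either added as the inverse of skos:broader or removed entirely. This mirrors the description above but is a hypothetical illustration, not the kglab source.

```python
BROADER, NARROWER = "skos:broader", "skos:narrower"


def infer_hierarchical(triples, narrower=True):
    """Sketch of S25: relate skos:broader and skos:narrower as inverses."""
    result = set(triples)
    # every skos:narrower link implies the inverse skos:broader link
    result |= {(o, BROADER, s) for (s, p, o) in triples if p == NARROWER}
    if narrower:
        # add skos:narrower as the inverse of every skos:broader link
        result |= {(o, NARROWER, s) for (s, p, o) in result if p == BROADER}
    else:
        # otherwise drop skos:narrower entirely
        result = {t for t in result if t[1] != NARROWER}
    return result


triples = {("ex:poodle", BROADER, "ex:dog"), ("ex:animal", NARROWER, "ex:dog")}
print(sorted(infer_hierarchical(triples, narrower=False)))
```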

infer_skos_transitive method

infer_skos_transitive(narrower=True)

Infer transitive closure, skos:broader as a sub-property of skos:broaderTransitive, and skos:narrower as a sub-property of skos:narrowerTransitive (S22)

Infer skos:broaderTransitive and skos:narrowerTransitive (on request only) as instances of owl:TransitiveProperty (S24)

Adapted from skosify, which wasn't being updated regularly.

  • narrower : bool
    also infer transitive closure for skos:narrowerTransitive
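Transitive closure, the core of this inference, can be sketched in plain Python: collect the skos:broader (and optionally skos:narrower) pairs, close them under transitivity, and emit them as skos:broaderTransitive or skos:narrowerTransitive triples. The helper names are hypothetical, and the owl:TransitiveProperty typing step (S24) is omitted.

```python
def transitive_closure(pairs):
    """All (a, c) reachable by following one or more (a, b) links."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


def infer_transitive(triples, narrower=True):
    """Sketch of S22: close skos:broader (and optionally skos:narrower)."""
    result = set(triples)
    links = [("skos:broader", "skos:broaderTransitive")]
    if narrower:
        links.append(("skos:narrower", "skos:narrowerTransitive"))
    for prop, transitive_prop in links:
        pairs = {(s, o) for (s, p, o) in triples if p == prop}
        result |= {(s, transitive_prop, o) for (s, o) in transitive_closure(pairs)}
    return result


triples = {
    ("ex:poodle", "skos:broader", "ex:dog"),
    ("ex:dog", "skos:broader", "ex:mammal"),
    ("ex:mammal", "skos:broader", "ex:animal"),
}
inferred = infer_transitive(triples)
print(("ex:poodle", "skos:broaderTransitive", "ex:animal") in inferred)  # True
```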

infer_skos_symmetric_mappings method

infer_skos_symmetric_mappings(related=True)

Infer symmetric mapping properties (skos:relatedMatch, skos:closeMatch, skos:exactMatch) as instances of owl:SymmetricProperty (S44)

Adapted from skosify, which wasn't being updated regularly.

  • related : bool
    infer the skos:related super-property for all skos:relatedMatch relations

infer_skos_hierarchical_mappings method

infer_skos_hierarchical_mappings(narrower=True)

Infer skos:narrowMatch as owl:inverseOf the property skos:broadMatch (S43)

Infer the skos:related super-property for all skos:relatedMatch relations (S41)

Adapted from skosify, which wasn't being updated regularly.

  • narrower : bool
    if false, skos:narrowMatch will be removed instead of added

Subgraph class

Base class for projection of an RDF graph into an algebraic object such as a vector, matrix, or tensor representation, to support integration with non-RDF graph libraries. In other words, this class provides means to vectorize selected portions of a graph as a dimension. See https://derwen.ai/docs/kgl/concepts/#subgraph

Features support several areas of use cases, including:

  • label encoding
  • vectorization (parallel processing)
  • graph algorithms
  • visualization
  • embedding (deep learning)
  • probabilistic graph inference (statistical relational learning)

The base case is where a subset of the nodes in the source RDF graph gets represented as a vector, in the node_vector member. This provides an efficient index on a constructed dimension, solely for the context of a specific use case.


__init__ method

__init__(kg, preload=None)

Constructor for creating and manipulating a subgraph as a vector, projecting from an RDF graph represented by a KnowledgeGraph object.

  • kg : kglab.kglab.KnowledgeGraph
    the source RDF graph

  • preload : list
    an optional, pre-determined list to pre-load for label encoding


transform method

transform(node)

Transforms a node in an RDF graph to an integer value, as a unique identifier within the closure of a specific use case. The integer value can then be used to index into an algebraic object such as a matrix or tensor. Effectively, this method is similar to a sklearn.preprocessing.LabelEncoder.

Notes:

  • the integer value is not a UUID since it is only defined within the closure of a specific use case
  • a special value -1 represents the unique identifier for a non-existent (None) node, which is useful in data structures that have optional placeholders for links to RDF nodes

  • node : typing.Union[str, NoneType, rdflib.term.URIRef, rdflib.term.Literal, rdflib.term.BNode]
    a node in the RDF graph

  • returns : int
    a unique identifier (an integer index) for the node in the RDF graph


inverse_transform method

inverse_transform(id)

Inverse transform from an integer to a node in the RDF graph, using the identifier as an index into the node vector.

  • id : int
    an integer index for the node in the RDF graph

  • returns : typing.Union[str, NoneType, rdflib.term.URIRef, rdflib.term.Literal, rdflib.term.BNode]
    node in the RDF graph
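The transform() and inverse_transform() pair behaves like a label encoder. A minimal sketch, assuming nodes are hashable and ids are assigned in first-seen order; the class name is hypothetical, while the node_vector attribute and the -1 id for None follow the description above.

```python
class LabelEncoderSketch:
    """Hypothetical encoder: nodes -> dense integer ids, with None -> -1."""

    def __init__(self, preload=None):
        self.node_vector = []  # position in this list is the node's id
        self._index = {}
        for node in preload or []:
            self.transform(node)

    def transform(self, node):
        if node is None:
            return -1  # placeholder id for a missing (optional) node
        if node not in self._index:
            self._index[node] = len(self.node_vector)
            self.node_vector.append(node)
        return self._index[node]

    def inverse_transform(self, node_id):
        return None if node_id == -1 else self.node_vector[node_id]


enc = LabelEncoderSketch(preload=["ex:alice"])
print(enc.transform("ex:bob"), enc.transform("ex:alice"), enc.transform(None))  # 1 0 -1
```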


n3fy method

n3fy(node)

Wrapper for RDFlib n3() and toPython() to serialize a node into a human-readable representation using N3 format. This method provides a convenience, which in turn calls KnowledgeGraph.n3fy()

  • node : typing.Union[rdflib.term.URIRef, rdflib.term.Literal, rdflib.term.BNode]
    must be an rdflib.term.Node

  • returns : typing.Any
    text (or Python object) for the serialized node

SubgraphMatrix class

Projection of an RDF graph to a matrix representation. Typical use cases include integration with non-RDF graph libraries for graph algorithms.


__init__ method

__init__(kg, sparql, bindings=None, src_dst=None)

Constructor for creating and manipulating a subgraph as a matrix, projecting from an RDF graph represented by a KnowledgeGraph object.

  • kg : kglab.kglab.KnowledgeGraph
    the source RDF graph

  • sparql : str
    text for a SPARQL query that yields pairs to project into the subgraph; by default this expects the query to return bindings for subject and object nodes in the RDF graph

  • bindings : dict
    initial variable bindings

  • src_dst : typing.List[str]
    an optional map to override the subject and object bindings expected in the SPARQL query results; defaults to None


build_df method

build_df(show_symbols=False)

Factory pattern to populate a pandas.DataFrame object, using transforms in this subgraph.

Note: this method is primarily intended for cuGraph support. Loading via a DataFrame is required, in lieu of the nx.add_node() approach; therefore support for representing bipartite graphs is still pending.

  • show_symbols : bool
    optionally, include the symbolic representation for each node; defaults to False

  • returns : pandas.core.frame.DataFrame
    the populated DataFrame object; uses the RAPIDS cuDF library if GPUs are enabled
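For intuition about the DataFrame that build_df() feeds to a graph library, this sketch turns (subject, object) pairs, such as those returned by the SPARQL query, into integer src/dst edge rows. The column names src, dst, src_sym, dst_sym and the function name are assumptions, not kglab's actual columns; a list of dicts like this can be passed to pandas.DataFrame() to get a dataframe.

```python
def build_edge_rows(pairs, show_symbols=False):
    """Encode (subject, object) pairs as integer edge rows."""
    ids = {}

    def encode(node):
        # assign ids in first-seen order; setdefault reuses an existing id
        return ids.setdefault(node, len(ids))

    rows = []
    for src, dst in pairs:
        row = {"src": encode(src), "dst": encode(dst)}
        if show_symbols:
            row["src_sym"], row["dst_sym"] = src, dst
        rows.append(row)
    return rows


pairs = [("ex:a", "ex:b"), ("ex:b", "ex:c"), ("ex:a", "ex:c")]
print(build_edge_rows(pairs))
```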


build_nx_graph method

build_nx_graph(nx_graph, bipartite=False)

Factory pattern to populate a networkx.DiGraph object, using transforms in this subgraph. See https://networkx.org/

  • nx_graph : networkx.classes.digraph.DiGraph
    pass in an unpopulated networkx.DiGraph object; must be a cugraph.DiGraph if GPUs are enabled

  • bipartite : bool
    flag for whether the (subject, object) pairs should be partitioned into bipartite sets, in other words whether the adjacency matrix is symmetric; ignored if GPUs are enabled

  • returns : networkx.classes.digraph.DiGraph
    the populated NetworkX graph object; uses the RAPIDS cuGraph library if GPUs are enabled


build_ig_graph method

build_ig_graph(ig_graph)

Factory pattern to populate an igraph.Graph object, using transforms in this subgraph. See https://igraph.org/python/doc/

Note that iGraph is somewhat notorious for being difficult to install correctly across a wide range of platforms and environments. Consequently it is no longer a dependency of kglab; to use iGraph, install and import it separately.

  • ig_graph : typing.Any
    pass in an unpopulated igraph.Graph object

  • returns : typing.Any
    the populated iGraph graph object

SubgraphTensor class

Projection of an RDF graph to a tensor representation. Typical use cases include integration with non-RDF graph libraries for visualization and embedding.


__init__ method

__init__(kg, excludes=None)

Constructor for creating and manipulating a subgraph as a tensor, projecting from an RDF graph represented by a KnowledgeGraph object.

  • kg : kglab.kglab.KnowledgeGraph
    the source RDF graph

  • excludes : list
    a list of RDF predicates to exclude from projection into the subgraph


as_tuples method

as_tuples()

Iterator for enumerating the RDF triples to be included in the subgraph, used in factory patterns for visualizations. This allows a kind of lazy evaluation.

  • yields :
    the RDF triples within the subgraph
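The lazy evaluation that as_tuples() provides can be sketched as a generator that skips excluded predicates, as configured through the excludes parameter of the constructor. The function name and the string triples are hypothetical.

```python
def as_tuples_sketch(triples, excludes=None):
    """Lazily yield triples whose predicate is not excluded."""
    excluded = set(excludes or [])
    for s, p, o in triples:
        if p not in excluded:
            yield s, p, o  # produced one at a time, only when requested


triples = [
    ("ex:rex", "rdf:type", "ex:Dog"),
    ("ex:rex", "ex:name", "Rex"),
]
print(list(as_tuples_sketch(triples, excludes=["rdf:type"])))  # [('ex:rex', 'ex:name', 'Rex')]
```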

pyvis_style_node method

pyvis_style_node(pyvis_graph, node_id, label, style=None)

Adds a node into a PyVis network, optionally with styling info.

  • pyvis_graph : pyvis.network.Network
    the pyvis.network.Network being used for interactive visualization

  • node_id : int
    unique identifier for a node in the RDF graph

  • label : str
    text label for the node

  • style : dict
    optional style dictionary


build_pyvis_graph method

build_pyvis_graph(notebook=False, style=None)

Factory pattern to create a pyvis.network.Network object, populated by transforms in this subgraph. See https://pyvis.readthedocs.io/

  • notebook : bool
    flag for whether or not the interactive visualization will be generated within a notebook

  • style : dict
    optional style dictionary

  • returns : pyvis.network.Network
    a PyVis network object

Measure class

This class measures an RDF graph. Its downstream use cases include: graph size estimates; computation costs; constructed shapes. See https://derwen.ai/docs/kgl/concepts/#measure

Core feature areas include:

  • descriptive statistics
  • topological analysis

__init__ method

__init__(name="generic")

Constructor for this graph measure.

  • name : str
    optional name for this measure

reset method

reset()

Reset (reinitialize) all of the counts for different kinds of census, which include:

  • total nodes
  • total edges
  • count for each kind of subject (Simplex0)
  • count for each kind of predicate (Simplex0)
  • count for each kind of object (Simplex0)
  • count for each kind of literal (Simplex0)
  • item census (Simplex1)
  • dyad census (Simplex1)

get_node_count method

get_node_count()

Accessor for the node count.

  • returns : int
    value of node_count

get_edge_count method

get_edge_count()

Accessor for the edge count.

  • returns : int
    value of edge_count

measure_graph method

measure_graph(kg)

Run a full measure of the given RDF graph.

  • kg : kglab.kglab.KnowledgeGraph
    KnowledgeGraph object representing the RDF graph to be measured

get_keyset method

get_keyset(incl_pred=True)

Accessor for the set of items (domain: nodes, predicates, labels, URLs, literals, etc.) that were measured. Used for label encoding in the transform between an RDF graph and a matrix or tensor representation.

  • incl_pred : bool
    flag to include the predicates in the set of keys to be encoded

  • returns : typing.List[str]
    sorted list of keys to be used in the encoding

Simplex0 class

Count the distribution of a class of items in an RDF graph. In other words, tally an "item census" – to be consistent with the usage of that term.


__init__ method

__init__(name="generic")

Constructor for an item census.

  • name : str
    optional name for this measure

increment method

increment(item0)

Increment the count for this item.

  • item0 : typing.Union[str, rdflib.term.URIRef, rdflib.term.Literal, rdflib.term.BNode]
    an item (domain: node, predicate, label, URL, literal, etc.) to be counted

get_tally method

get_tally()

Accessor for the item counts.

  • returns : typing.Union[pandas.core.frame.DataFrame, NoneType]
    a pandas.DataFrame with the count distribution, sorted in ascending order

get_keyset method

get_keyset()

Accessor for the set of items (domain) counted.

  • returns : set
    set of keys for the items (domain: nodes, predicates, labels, URLs, literals, etc.) that were counted
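An item census is essentially a counter. This standard-library sketch mirrors increment(), get_tally(), and get_keyset(), but returns a sorted list of (item, count) pairs instead of a pandas.DataFrame; the class name is hypothetical, and returning None before anything is counted is an assumption.

```python
from collections import Counter


class ItemCensusSketch:
    """Hypothetical stand-in for Simplex0, built on collections.Counter."""

    def __init__(self, name="generic"):
        self.name = name
        self.count = Counter()

    def increment(self, item0):
        self.count[item0] += 1

    def get_tally(self):
        """(item, count) pairs in ascending order of count, or None if empty."""
        if not self.count:
            return None
        return sorted(self.count.items(), key=lambda kv: (kv[1], str(kv[0])))

    def get_keyset(self):
        return set(self.count)


census = ItemCensusSketch("predicates")
for pred in ["rdf:type", "skos:broader", "rdf:type", "rdf:type"]:
    census.increment(pred)
print(census.get_tally())  # [('skos:broader', 1), ('rdf:type', 3)]
```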

Simplex1 class

Measure a dyad census in an RDF graph, i.e., count the relations (directed edges) which connect two nodes.


__init__ method

__init__(name="generic")

Constructor for a dyad census.

  • name : str
    optional name for this measure

increment method

increment(item0, item1)

Increment the count for a dyad represented by the two given items.

  • item0 : typing.Union[str, rdflib.term.URIRef, rdflib.term.Literal, rdflib.term.BNode]
    "source" item (domain: node, label, URL, etc.) to be counted

  • item1 : typing.Union[str, rdflib.term.URIRef, rdflib.term.Literal, rdflib.term.BNode]
    "sink" item (range: node, label, literal, URL, etc.) to be counted


get_tally_map method

get_tally_map()

Accessor for the dyad census.

  • returns : typing.Tuple[pandas.core.frame.DataFrame, dict]
    a tuple of a pandas.DataFrame with the count distribution, sorted in ascending order; and a map of the observed links between "source" and "sink" items
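Similarly, a dyad census can be sketched with a counter keyed on (source, sink) pairs plus a map of observed links. The class name and the list-of-pairs tally (instead of a pandas.DataFrame) are assumptions for illustration.

```python
from collections import Counter, defaultdict


class DyadCensusSketch:
    """Hypothetical stand-in for Simplex1: counts (source, sink) pairs."""

    def __init__(self, name="generic"):
        self.name = name
        self.count = Counter()
        self.link_map = defaultdict(set)

    def increment(self, item0, item1):
        self.count[(item0, item1)] += 1
        self.link_map[item0].add(item1)  # remember which sinks each source reaches

    def get_tally_map(self):
        tally = sorted(self.count.items(), key=lambda kv: (kv[1], str(kv[0])))
        return tally, dict(self.link_map)


dyads = DyadCensusSketch()
dyads.increment("ex:alice", "ex:bob")
dyads.increment("ex:alice", "ex:bob")
dyads.increment("ex:alice", "ex:carol")
tally, links = dyads.get_tally_map()
print(tally)  # [(('ex:alice', 'ex:carol'), 1), (('ex:alice', 'ex:bob'), 2)]
```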

PSLModel class

Class representing a probabilistic soft logic (PSL) model.

For PSL-specific terminology used here, see https://psl.linqs.org/wiki/master/Glossary.html


__init__ method

__init__(name=None)

Wrapper for constructing a pslpython.model.Model.

  • name : str
    optional name of the PSL model; if not supplied, PSL generates a random name

clear_model method

clear_model()

Clear any pre-existing data from each of the predicates, to initialize the model.

  • returns : PSLModel
    this PSL model – used for method chaining

add_predicate method

add_predicate(raw_name, size=None, closed=False, arg_types=None)

Add a pslpython.predicate.Predicate to this model. Enough details must be supplied for PSL to infer the number and types of each predicate's arguments.

  • raw_name : str
    name of the predicate; must be unique among all of the predicates

  • size : int
    optional, the number of arguments for this predicate

  • closed : bool
    indicates that this predicate is fully observed, i.e., all substitutions of this predicate have known values and will behave as evidence for inference; otherwise, if False then infer some values of this predicate; defaults to False

  • arg_types : typing.List
    optional, a list of types for the arguments for this predicate; if not supplied, all arguments default to string

  • returns : PSLModel
    this PSL model – used for method chaining


add_rule method

add_rule(rule_string, weighted=None, weight=None, squared=None)

Add a pslpython.rule.Rule to this model.

  • a weighted rule can change its weight or squared status
  • a weighted rule cannot convert into an unweighted rule nor vice versa
  • unweighted rules are constraints

For more details, see https://psl.linqs.org/wiki/master/Rule-Specification.html

  • rule_string : str
    text representation for specifying the rule

  • weighted : bool
    indicates that this rule is weighted

  • weight : float
    weight of this rule

  • squared : bool
    indicates that this rule's potential is squared

  • returns : PSLModel
    this PSL model – used for method chaining


add_data_row method

add_data_row(predicate_name, args, partition="observations", truth_value=1.0, verbose=False)

Add a single record to a specified predicate, within a specified partition.

  • predicate_name : str
    name of the specific predicate; name normalization will be handled internally; raises ModelError if the predicate name is not found

  • args : list
    arguments for the record being added, as a list

  • partition : str
    label for the pslpython.partition.Partition into which the data gets added; must be among [ "observations", "targets", "truth" ]; defaults to "observations"; see https://psl.linqs.org/wiki/master/Data-Storage-in-PSL.html

  • truth_value : float
    optional truth value of the record being added

  • verbose : bool
    flag for verbose trace of each added record

  • returns : PSLModel
    this PSL model – use for method chaining


trace_predicate method

[source]

trace_predicate(predicate_name, partition="observations", path=None)

Construct a trace of the data in a specified predicate, within a specified partition, formatted as a dataframe. Use consistent column naming and sort order, so that these values can be used later in testing. Optionally write this out to a TSV file.

  • predicate_name : str
    name of the specific predicate; name normalization will be handled internally; raises ModelError if the predicate name is not found

  • partition : str
    label for the pslpython.partition.Partition from which the data gets traced; must be among [ "observations", "targets", "truth" ]; defaults to "observations"; see https://psl.linqs.org/wiki/master/Data-Storage-in-PSL.html

  • path : pathlib.Path
    optional output path for the TSV file; defaults to None

  • returns : pandas.core.frame.DataFrame
    dataframe representing the traced partition data
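The point of a trace is reproducibility: if columns are always named the same way and rows always appear in the same order, two runs can be compared row by row. Below is a minimal pandas sketch of that idea. The function trace_frame and the column names arg0, arg1, ... are hypothetical assumptions, not kglab's actual naming scheme.

```python
import pathlib
import typing

import pandas as pd


def trace_frame(
    rows: typing.List[list],
    truth: typing.List[float],
    path: typing.Optional[pathlib.Path] = None,
) -> pd.DataFrame:
    """Hypothetical sketch: put traced predicate data into a stable, sorted dataframe."""
    num_args = len(rows[0]) if rows else 0
    # consistent column naming (assumed): one column per argument, plus "truth"
    arg_cols = [f"arg{i}" for i in range(num_args)]
    df = pd.DataFrame(rows, columns=arg_cols)
    df["truth"] = truth
    # consistent sort order, so repeated runs yield row-for-row identical traces
    if arg_cols:
        df = df.sort_values(by=arg_cols).reset_index(drop=True)
    # optionally persist the trace as a TSV baseline for later comparison
    if path is not None:
        df.to_csv(path, sep="\t", index=False)
    return df


trace = trace_frame([["bob", "alice"], ["alice", "bob"]], [0.2, 0.9])
print(trace)
```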


compare_predicate classmethod

[source]

compare_predicate(df, trace_path)

Compare the values of a predicate with its expected values, which get loaded from a file. This prints any expected (missing) or error (mismatched) rows.

  • df : pandas.core.frame.DataFrame
    dataframe from trace_predicate

  • trace_path : pathlib.Path
    path to a TSV file of expected values, saved from the trace of a baseline run

  • returns : pandas.core.frame.DataFrame
    dataframe loaded from the expected values
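One way to find "missing" and "mismatched" rows is a left merge on the argument columns, using pandas' merge indicator. Here is a minimal sketch under these assumptions: find_discrepancies is a hypothetical name, both frames share argument columns plus a "truth" column, and truth values are compared with a small tolerance. In practice the expected frame would be loaded from the baseline TSV, e.g., with pd.read_csv(trace_path, sep="\t").

```python
import typing

import pandas as pd


def find_discrepancies(
    df: pd.DataFrame,
    expected: pd.DataFrame,
    keys: typing.List[str],
    tol: float = 1e-6,
) -> typing.Tuple[pd.DataFrame, pd.DataFrame]:
    """Hypothetical sketch: find expected rows that are missing from, or mismatched in, df."""
    merged = expected.merge(
        df, on=keys, how="left", suffixes=("_expected", "_actual"), indicator=True
    )
    # expected rows with no matching argument tuple in df
    missing = merged.loc[merged["_merge"] == "left_only", keys]
    # rows in both frames whose truth values differ by more than the tolerance
    both = merged[merged["_merge"] == "both"]
    diff = (both["truth_expected"] - both["truth_actual"]).abs()
    mismatched = both.loc[diff > tol, keys + ["truth_expected", "truth_actual"]]
    for row in missing.itertuples(index=False):
        print("expected (missing):", tuple(row))
    for row in mismatched.itertuples(index=False):
        print("error (mismatched):", tuple(row))
    return missing.reset_index(drop=True), mismatched.reset_index(drop=True)


expected = pd.DataFrame({"arg0": ["a", "b", "c"], "truth": [1.0, 0.5, 0.2]})
actual = pd.DataFrame({"arg0": ["a", "b"], "truth": [1.0, 0.7]})
missing, mismatched = find_discrepancies(actual, expected, ["arg0"])
```

A tolerance is used instead of exact equality because inferred truth values are floats. Whether compare_predicate does the same is not stated here.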


infer method

[source]

infer(method="", cli_options=None, psl_config=None, jvm_options=None)

Run inference on this model, storing the inferred results in an internal dataframe.


get_results method

[source]

get_results(predicate_name)

Accessor for the inferred results for a specified predicate.

  • predicate_name : str
    name of the specific predicate; name normalization will be handled internally; raises ModelError if the predicate name is not found

  • returns : pandas.core.frame.DataFrame
    inferred values as a pandas.DataFrame, with column names for each argument plus the "truth" value


module functions


calc_quantile_bins function

[source]

calc_quantile_bins(num_rows)

Calculate the bins to use for a quantile stripe, using numpy.linspace.

  • num_rows : int
    number of rows in the target dataframe

  • returns : numpy.ndarray
    the calculated bins, as a numpy.ndarray
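Quantile bins are evenly spaced levels between 0 and 1, which numpy.linspace produces directly. How many levels to use for a given row count is a design choice. The sketch below assumes a log-scaled heuristic, and the function name quantile_bins_sketch is hypothetical. This is an illustration, not kglab's source.

```python
import math

import numpy as np


def quantile_bins_sketch(num_rows: int) -> np.ndarray:
    """Hypothetical sketch: evenly spaced quantile levels in [0, 1]."""
    if num_rows < 2:
        # too few rows to stripe meaningfully: just the two endpoints
        num_edges = 2
    else:
        # assumption: granularity grows with log(num_rows), never below 2 edges
        num_edges = max(round(math.log(num_rows) * 4), 2)
    return np.linspace(0.0, 1.0, num=num_edges)


print(quantile_bins_sketch(100))
```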


get_gpu_count function

[source]

get_gpu_count()

Special handling for detecting GPU availability: an approach recommended by the NVidia RAPIDS engineering team, since nvml bindings are difficult for Python libraries to keep updated.

  • returns : int
    count of available GPUs
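As context for why avoiding nvml bindings can help, here is one dependency-free way to count GPUs: call the nvidia-smi command-line tool, whose -L flag lists one device per line. This sketch does not claim to reproduce kglab's approach. The function count_gpus_via_nvidia_smi is hypothetical, and it returns 0 whenever the tool is missing or fails.

```python
import shutil
import subprocess


def count_gpus_via_nvidia_smi() -> int:
    """Hypothetical sketch: count GPUs listed by `nvidia-smi -L`, or 0 if unavailable."""
    if shutil.which("nvidia-smi") is None:
        # no NVIDIA driver tooling on PATH, so assume no usable GPUs
        return 0
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, check=True, timeout=10
        )
    except (subprocess.SubprocessError, OSError):
        # non-zero exit, timeout, or launch failure: treat as no GPUs
        return 0
    # each device appears on its own line, e.g. "GPU 0: ..."
    return sum(1 for line in result.stdout.splitlines() if line.startswith("GPU "))


print(count_gpus_via_nvidia_smi())
```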


import_from_neo4j function

[source]

import_from_neo4j(username, password, dbname, host="localhost", port="7474")

Wrapper for a Cypher export request, to provide neo4j integration through the neosemantics library.

Tested with ~10GB of stored triples.

  • username : str
    the user name, as a string

  • password : str
    the password, as a string

  • dbname : str
    the database name, as a string

  • host : str
    optionally, the neo4j server domain name or IP address, as a string – including the protocol scheme; defaults to "http://localhost"

  • port : str
    optionally, the neo4j server port; defaults to "7474"

  • returns : rdflib.graph.Graph
    an rdflib.Graph object parsed from the exported RDF
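The export is an HTTP request against the neosemantics (n10s) extension. The sketch below only builds the request, so it runs without a server. Several details are assumptions to check against the neosemantics documentation for your version: the endpoint path /rdf/<dbname>/cypher, the JSON payload keys "cypher" and "format", and the default Cypher query. The helper build_n10s_export_request is hypothetical. Following the parameter description above, host includes the protocol scheme.

```python
import typing


def build_n10s_export_request(
    dbname: str,
    host: str = "http://localhost",
    port: str = "7474",
    cypher: str = "MATCH (a)-[r]->(b) RETURN a, r, b",
    rdf_format: str = "Turtle",
) -> typing.Tuple[str, dict]:
    """Hypothetical sketch: URL and JSON payload for a neosemantics Cypher export."""
    # assumption: n10s serves RDF export of Cypher results at /rdf/<dbname>/cypher
    url = f"{host}:{port}/rdf/{dbname}/cypher"
    payload = {"cypher": cypher, "format": rdf_format}
    return url, payload


url, payload = build_n10s_export_request("neo4j")
print(url)  # http://localhost:7474/rdf/neo4j/cypher
```

With the requests library, the request could then be sent as requests.post(url, json=payload, auth=(username, password)). The response text could be parsed with rdflib.Graph().parse(data=response.text, format="turtle").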


root_mean_square function

[source]

root_mean_square(values)

Calculate the root mean square of the values in the given list.

  • values : list
    list of values to use in the RMS calculation

  • returns : float
    RMS metric as a float
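For n values x_1 … x_n, the root mean square is sqrt((x_1² + … + x_n²) / n). Below is a plain-Python sketch of that formula, not the library's source. Raising an error on an empty list is this sketch's own choice.

```python
import math
from typing import Sequence


def rms(values: Sequence[float]) -> float:
    """Sketch: square each value, average the squares, take the square root."""
    if not values:
        raise ValueError("values must be non-empty")
    return math.sqrt(sum(x * x for x in values) / len(values))


print(rms([3, 4]))  # sqrt((9 + 16) / 2) = sqrt(12.5) ≈ 3.5355
```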


stripe_column function

[source]

stripe_column(values, bins, use_gpus=False)

Stripe a column in a dataframe, by interpolating quantiles into a set of discrete indexes.

  • values : list
    list of values to stripe

  • bins : int
    quantile bins; see calc_quantile_bins()

  • use_gpus : bool
    optionally, use the NVidia GPU devices with the RAPIDS libraries if these libraries have been installed and the devices are available; defaults to False

  • returns : numpy.ndarray
    the striped column values, as a numpy.ndarray; uses the RAPIDS cuDF library if GPUs are enabled
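Striping works in two steps. First, the value at each quantile level is interpolated to give stripe edges. Then each value is assigned the index of the interval it falls into. Here is a minimal CPU-only numpy sketch that takes the array of quantile levels (e.g., from numpy.linspace). The name stripe_sketch is hypothetical. Folding the maximum value into the last stripe is this sketch's assumption, not documented kglab behavior.

```python
import numpy as np


def stripe_sketch(values, quantile_levels: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: map each value to the index of its quantile stripe."""
    arr = np.asarray(values, dtype=float)
    # interpolate the value at each quantile level, giving the stripe edges
    edges = np.quantile(arr, quantile_levels)
    # digitize returns i where edges[i-1] <= x < edges[i]; shift to 0-based
    stripes = np.digitize(arr, edges) - 1
    # the maximum value would land past the last stripe; fold it back in
    return np.clip(stripes, 0, len(edges) - 2)


levels = np.linspace(0.0, 1.0, num=5)  # quartile levels: 0, 0.25, 0.5, 0.75, 1
print(stripe_sketch([1, 2, 3, 4, 5, 6, 7, 8], levels))
# [0 0 1 1 2 2 3 3]
```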


module types

Census_Dyad_Tally type

Census_Dyad_Tally = typing.Tuple[pandas.core.frame.DataFrame, dict]

Census_Item type

Census_Item = typing.Union[str, rdflib.term.URIRef, rdflib.term.Literal, rdflib.term.BNode]

EvoShapeBoard type

EvoShapeBoard = typing.List[typing.List[~Evolike]]

EvoShapeDistance type

EvoShapeDistance = typing.Tuple[int, int, float]

GraphLike type

GraphLike = typing.Union[rdflib.graph.ConjunctiveGraph, rdflib.graph.Dataset, rdflib.graph.Graph]

IOPathLike type

IOPathLike = typing.Union[str, pathlib.Path, urlpath.URL, typing.IO]

NodeLike type

NodeLike = typing.Union[str, NoneType, rdflib.term.URIRef, rdflib.term.Literal, rdflib.term.BNode]

PathLike type

PathLike = typing.Union[str, pathlib.Path, urlpath.URL]

RDF_Node type

RDF_Node = typing.Union[rdflib.term.URIRef, rdflib.term.Literal, rdflib.term.BNode]

RDF_Triple type

RDF_Triple = typing.Tuple[typing.Union[rdflib.term.URIRef, rdflib.term.Literal, rdflib.term.BNode], typing.Union[rdflib.term.URIRef, rdflib.term.Literal, rdflib.term.BNode], typing.Union[rdflib.term.URIRef, rdflib.term.Literal, rdflib.term.BNode]]

SPARQL_Bindings type

SPARQL_Bindings = typing.Tuple[str, dict]

SerializedEvoShape type

SerializedEvoShape = typing.List[~Evolike]

Last update: 2021-04-10