Compute distributed RDF dataset statistics.
VoID description of the given dataset
29. Average per property {int,float,time} criterion
entities with their average values on the graph
22. Average typed string length criterion.
the average typed string length used throughout the RDF graph.
23. Average untyped string length criterion.
the average untyped string length used throughout the RDF graph.
19. Blanks as object criterion
number of triples where blank nodes are used as objects.
18. Blanks as subject criterion
number of triples where blank nodes are used as subjects.
2. Class Usage Count Criterion
Count the usage of the respective classes of a dataset.
The filter rule that is used to analyze a triple is the
same as in the first criterion.
As an action, a map is created that has class IRIs as
identifiers and their respective usage counts as values.
If a triple conforms to the filter rule, the respective
value is increased by one.
Filter rule : ?p=rdf:type && isIRI(?o)
Action : M[?o]++
RDD of classes used in the dataset and their frequencies.
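The filter rule and action above can be sketched in plain Python. This is a minimal, single-machine illustration and not the distributed implementation: triples are assumed to be tuples of N-Triples-style strings, and `is_iri` and `class_usage_count` are hypothetical helper names introduced only for this sketch.

```python
from collections import Counter

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def is_iri(term):
    # Assumption: IRIs are written in angle brackets, as in N-Triples.
    return term.startswith("<") and term.endswith(">")


def class_usage_count(triples):
    # Filter rule: ?p=rdf:type && isIRI(?o); action: M[?o]++
    counts = Counter()
    for _s, p, o in triples:
        if p == RDF_TYPE and is_iri(o):
            counts[o] += 1
    return counts


triples = [
    ("<http://example.org/alice>", RDF_TYPE, "<http://example.org/Person>"),
    ("<http://example.org/bob>", RDF_TYPE, "<http://example.org/Person>"),
    ("<http://example.org/acme>", RDF_TYPE, "<http://example.org/Company>"),
    ("<http://example.org/alice>", "<http://example.org/name>", '"Alice"'),
]
usage = class_usage_count(triples)
```

The last triple does not use rdf:type as predicate, so it does not contribute to any count.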
3. Classes Defined Criterion
This criterion is used to get the set of classes that are defined within a dataset.
Usually in RDF/S and OWL a class can be defined by a triple
using the predicate rdf:type and either rdfs:Class or owl:Class as object.
The filter rule illustrates the condition used to analyze the triple.
If the triple is accepted by the rule, the IRI used as subject is added to the set of classes.
Filter rule : ?p=rdf:type && isIRI(?s) && (?o=rdfs:Class || ?o=owl:Class)
Action : S += ?s
RDD of classes defined in the dataset.
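As a rough illustration of this rule, the following plain-Python sketch collects the defined classes into a set. It assumes triples are tuples of N-Triples-style strings; `is_iri` and `classes_defined` are hypothetical names used only here, and the code is not the distributed implementation.

```python
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
RDFS_CLASS = "<http://www.w3.org/2000/01/rdf-schema#Class>"
OWL_CLASS = "<http://www.w3.org/2002/07/owl#Class>"


def is_iri(term):
    # Assumption: IRIs are written in angle brackets, as in N-Triples.
    return term.startswith("<") and term.endswith(">")


def classes_defined(triples):
    # Filter rule: ?p=rdf:type && isIRI(?s) && (?o=rdfs:Class || ?o=owl:Class)
    # Action: S += ?s
    defined = set()
    for s, p, o in triples:
        if p == RDF_TYPE and is_iri(s) and o in (RDFS_CLASS, OWL_CLASS):
            defined.add(s)
    return defined


triples = [
    ("<http://example.org/Person>", RDF_TYPE, OWL_CLASS),
    ("<http://example.org/Company>", RDF_TYPE, RDFS_CLASS),
    ("_:b0", RDF_TYPE, OWL_CLASS),  # blank-node subject, rejected by isIRI(?s)
    ("<http://example.org/alice>", RDF_TYPE, "<http://example.org/Person>"),
]
defined = classes_defined(triples)
```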
20. Datatypes criterion
histogram of types used for literals.
16. Distinct entities
Count the distinct entities of a dataset by collecting all IRIs used as subject, predicate, or object.
Filter rule : none
Action : S += iris({?s,?p,?o})
RDD of distinct entities in the dataset.
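A minimal plain-Python sketch of this criterion follows. It assumes triples are tuples of N-Triples-style strings, so that IRIs can be told apart from blank nodes and literals by their angle brackets; `is_iri` and `distinct_entities` are hypothetical names for illustration only.

```python
def is_iri(term):
    # Assumption: IRIs are written in angle brackets, as in N-Triples.
    return term.startswith("<") and term.endswith(">")


def distinct_entities(triples):
    # Collect every IRI that occurs in subject, predicate, or object position.
    entities = set()
    for triple in triples:
        entities.update(term for term in triple if is_iri(term))
    return entities


triples = [
    ("<http://example.org/alice>", "<http://example.org/knows>", "<http://example.org/bob>"),
    ("<http://example.org/alice>", "<http://example.org/name>", '"Alice"'),
    ("_:b0", "<http://example.org/knows>", "<http://example.org/alice>"),
]
entities = distinct_entities(triples)
```

The literal "Alice" and the blank node _:b0 are skipped, so only the four IRIs remain in the set.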
Distinct Objects
Count distinct objects within triples.
Filter rule : isURI(?o)
Action : M[?o]++
RDD of objects used in the dataset.
Distinct Subjects
Count distinct subjects within triples.
Filter rule : isURI(?s)
Action : M[?s]++
RDD of subjects used in the dataset.
24. Labeled subjects criterion.
list of labeled subjects.
21. Languages criterion
histogram of languages used for literals.
26. Links criterion.
list of namespaces and their frequencies.
17. Literals criterion
number of triples that have a literal as object.
28. Maximum per property {int,float,time} criterion
entities with their maximum values on the graph
32. Object vocabularies
Compute object vocabularies/namespaces used through the dataset.
Filter rule : ns=ns(?o)
Action : M[ns]++
RDD of distinct object vocabularies used in the dataset and their frequencies.
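The vocabulary criteria rely on a namespace function ns(...), which is not spelled out here. The sketch below assumes a common convention: the namespace is everything up to and including the last '#' or '/' of the IRI. It also assumes that non-IRI objects (literals, blank nodes) are skipped. `is_iri`, `ns`, and `object_vocabularies` are hypothetical names, and the code is a single-machine illustration only.

```python
from collections import Counter

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def is_iri(term):
    # Assumption: IRIs are written in angle brackets, as in N-Triples.
    return term.startswith("<") and term.endswith(">")


def ns(term):
    # Assumed convention: cut after the last '#' or '/' of the IRI.
    iri = term[1:-1]
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1]


def object_vocabularies(triples):
    # Filter rule: ns=ns(?o); action: M[ns]++
    counts = Counter()
    for _s, _p, o in triples:
        if is_iri(o):
            counts[ns(o)] += 1
    return counts


triples = [
    ("<http://example.org/alice>", RDF_TYPE, "<http://xmlns.com/foaf/0.1/Person>"),
    ("<http://example.org/bob>", RDF_TYPE, "<http://xmlns.com/foaf/0.1/Person>"),
    ("<http://example.org/Person>", RDF_TYPE, "<http://www.w3.org/2002/07/owl#Class>"),
    ("<http://example.org/alice>", "<http://xmlns.com/foaf/0.1/name>", '"Alice"'),
]
vocabularies = object_vocabularies(triples)
```

The predicate and subject variants follow the same pattern, with ns applied to ?p or ?s instead of ?o.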
31. Predicate vocabularies
Compute predicate vocabularies/namespaces used through the dataset.
Filter rule : ns=ns(?p)
Action : M[ns]++
RDD of distinct predicate vocabularies used in the dataset and their frequencies.
Properties Defined
Count the defined properties within triples.
Filter rule : ?p=rdf:type && (?o=owl:ObjectProperty || ?o=rdf:Property) && !isIRI(?s)
Action : M[?p]++
RDD of predicates defined in the dataset.
5. Property Usage Criterion
Count the usage of properties within triples.
To do so, an RDD is created that uses the property
IRIs as identifiers.
Afterwards, their frequencies are computed.
Filter rule : none
Action : M[?p]++
RDD of predicates used in the dataset and their frequencies.
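Since this criterion has no filter rule, every triple contributes to the count of its predicate. A short plain-Python sketch, assuming triples are tuples of N-Triples-style strings and using the hypothetical name `property_usage`:

```python
from collections import Counter


def property_usage(triples):
    # Filter rule: none; action: M[?p]++
    return Counter(p for _s, p, _o in triples)


triples = [
    ("<http://example.org/alice>", "<http://example.org/knows>", "<http://example.org/bob>"),
    ("<http://example.org/bob>", "<http://example.org/knows>", "<http://example.org/alice>"),
    ("<http://example.org/alice>", "<http://example.org/name>", '"Alice"'),
]
frequencies = property_usage(triples)
```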
7. Property usage distinct per object
Count the usage of properties within triples based on objects.
Filter rule : none
Action : M[?o] += ?p
RDD of predicates used in the dataset and their frequencies.
6. Property usage distinct per subject
Count the usage of properties within triples based on subjects.
Filter rule : none
Action : M[?s] += ?p
RDD of predicates used in the dataset and their frequencies.
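One way to read the action M[?s] += ?p is as a map from each subject to the set of distinct properties used with it. The sketch below follows that reading; it is an interpretation and not necessarily how the distributed implementation aggregates the result. `property_usage_distinct_per_subject` is a hypothetical name.

```python
from collections import defaultdict


def property_usage_distinct_per_subject(triples):
    # Filter rule: none; action: M[?s] += ?p (a set, so duplicates collapse)
    properties = defaultdict(set)
    for s, p, _o in triples:
        properties[s].add(p)
    return properties


triples = [
    ("<http://example.org/alice>", "<http://example.org/knows>", "<http://example.org/bob>"),
    ("<http://example.org/alice>", "<http://example.org/knows>", "<http://example.org/carol>"),
    ("<http://example.org/alice>", "<http://example.org/name>", '"Alice"'),
    ("<http://example.org/bob>", "<http://example.org/name>", '"Bob"'),
]
per_subject = property_usage_distinct_per_subject(triples)
distinct_counts = {s: len(ps) for s, ps in per_subject.items()}
```

The per-object criterion is symmetric: key the map on the object instead of the subject.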
25. SameAs criterion.
list of triples with owl:sameAs as predicate.
30. Subject vocabularies
Compute subject vocabularies/namespaces used through the dataset.
Filter rule : ns=ns(?s)
Action : M[ns]++
RDD of distinct subject vocabularies used in the dataset and their frequencies.
24. Typed subjects criterion.
list of typed subjects.
1. Used Classes Criterion
Creates an RDD of classes that are in use by instances of the analyzed dataset.
An example of a triple that will be accepted by the filter is
sda:Gezim rdf:type distLODStats:Developer .
Filter rule : ?p=rdf:type && isIRI(?o)
Action : S += ?o
RDD of classes/instances