StatsCriteria

Instance Constructors

new StatsCriteria(triples: RDD[Triple])

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def asInstanceOf[T0]: T0

Definition Classes
Any
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def hashCode(): Int

Definition Classes
AnyRef → Any
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
def isTraceEnabled(): Boolean

Attributes
protected
Definition Classes
Logging
def log: Logger

Attributes
protected
Definition Classes
Logging
def logDebug(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logDebug(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logError(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logError(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logInfo(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logInfo(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logName: String

Attributes
protected
Definition Classes
Logging
def logTrace(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logTrace(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logWarning(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logWarning(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
val spark: SparkSession
def stats: RDD[String]

Compute distributed RDF dataset statistics.
returns
VoID description of the given dataset
def statsAvgPerProperty(): RDD[(Node, Double)]

29. Average per property {int,float,time} criterion
returns
entities with their average values on the graph
def statsAvgTypedStringLength(): Double

22. Average typed string length criterion.
returns
the average typed string length used throughout the RDF graph.
def statsAvgUntypedStringLength(): Double

23. Average untyped string length criterion.
returns
the average untyped string length used throughout the RDF graph.
def statsBlanksAsObject(): RDD[Triple]

19. Blanks as object criterion
returns
RDD of triples in which blank nodes are used as objects.
def statsBlanksAsSubject(): RDD[Triple]

18. Blanks as subject criterion
returns
RDD of triples in which blank nodes are used as subjects.
def statsClassHierarchyDepth(): RDD[(Node, Int)]

4. Class hierarchy depth criterion
returns
the depth of the graph
def statsClassUsageCount(): RDD[(Node, Int)]

2. Class Usage Count Criterion
Count how often each class of a dataset is used. The filter rule applied to each triple is the same as in the first criterion. As the action, a map is built with class IRIs as keys and their usage counts as values; whenever a triple conforms to the filter rule, the corresponding value is increased by one. Filter rule : ?p=rdf:type && isIRI(?o) Action : M[?o]++
returns
RDD of classes used in the dataset and their frequencies.
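The filter rule and action of this criterion can be illustrated on an in-memory collection. The following is only a minimal sketch, not the library's implementation: it uses plain Scala collections instead of an RDD, and `Term`, `Iri`, `Literal`, `Triple` and `classUsageCount` are hypothetical stand-ins for the real node and triple types.

```scala
object ClassUsageCountSketch {
  // Hypothetical, simplified RDF terms (assumption: not the types used by StatsCriteria).
  sealed trait Term
  final case class Iri(value: String) extends Term
  final case class Literal(lexical: String) extends Term
  final case class Triple(s: Term, p: Term, o: Term)

  val RdfType: Iri = Iri("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")

  // Filter rule: ?p = rdf:type && isIRI(?o)
  // Action:      M[?o]++
  def classUsageCount(triples: Seq[Triple]): Map[Iri, Int] =
    triples
      .collect { case Triple(_, RdfType, o: Iri) => o } // keep only the class IRIs
      .groupBy(identity)                                // group equal classes together
      .map { case (cls, hits) => cls -> hits.size }     // count occurrences per class
}
```

On an RDD the same idea would typically be expressed with a filter followed by a per-key count, which is consistent with the `RDD[(Node, Int)]` return type shown above.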
def statsClassesDefined(): RDD[Node]

3. Classes Defined Criterion
Gets the set of classes that are defined within a dataset. In RDF/S and OWL, a class is usually defined by a triple that uses the predicate rdf:type and either rdfs:Class or owl:Class as object. The filter rule gives the condition used to analyze each triple. If the triple is accepted by the rule, the IRI used as subject is added to the set of classes. Filter rule : ?p=rdf:type && isIRI(?s) &&(?o=rdfs:Class||?o=owl:Class) Action : S += ?s
returns
RDD of classes defined in the dataset.
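As a rough illustration of the filter rule above, the sketch below applies it to an in-memory sequence of triples. It is an assumption-laden example rather than the actual implementation: `Term`, `Iri`, `Blank`, `Triple` and `classesDefined` are hypothetical names, and a `Set` stands in for the returned RDD.

```scala
object ClassesDefinedSketch {
  // Hypothetical, simplified RDF terms (assumption: not the types used by StatsCriteria).
  sealed trait Term
  final case class Iri(value: String) extends Term
  final case class Blank(label: String) extends Term
  final case class Triple(s: Term, p: Term, o: Term)

  val RdfType: Iri   = Iri("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
  val RdfsClass: Iri = Iri("http://www.w3.org/2000/01/rdf-schema#Class")
  val OwlClass: Iri  = Iri("http://www.w3.org/2002/07/owl#Class")

  // Filter rule: ?p = rdf:type && isIRI(?s) && (?o = rdfs:Class || ?o = owl:Class)
  // Action:      S += ?s
  def classesDefined(triples: Seq[Triple]): Set[Iri] =
    triples.collect {
      case Triple(s: Iri, RdfType, o) if o == RdfsClass || o == OwlClass => s
    }.toSet
}
```

A class declared through a blank-node subject is skipped by this rule, because the subject must be an IRI.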
def statsDatatypes(): RDD[(String, Int)]

20. Datatypes criterion
returns
histogram of types used for literals.
def statsDistinctEntities(): RDD[Node]

16. Distinct entities
Count distinct entities of a dataset by collecting every IRI that occurs as subject, predicate, or object. Filter rule : S+=iris({?s,?p,?o}) Action : S
returns
RDD of distinct entities in the dataset.
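The rule `S+=iris({?s,?p,?o})` can be read as "collect every IRI from all three positions of every triple". A minimal sketch of that reading on plain Scala collections follows; the types and the `distinctEntities` helper are hypothetical and only serve the illustration.

```scala
object DistinctEntitiesSketch {
  // Hypothetical, simplified RDF terms (assumption: not the types used by StatsCriteria).
  sealed trait Term
  final case class Iri(value: String) extends Term
  final case class Blank(label: String) extends Term
  final case class Literal(lexical: String) extends Term
  final case class Triple(s: Term, p: Term, o: Term)

  // Filter rule: S += iris({?s, ?p, ?o})
  // Action:      S
  def distinctEntities(triples: Seq[Triple]): Set[Iri] =
    triples
      .flatMap(t => Seq(t.s, t.p, t.o)) // look at all three positions
      .collect { case iri: Iri => iri } // keep IRIs, drop blank nodes and literals
      .toSet                            // a set removes duplicates
}
```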
def statsDistinctObjects(): RDD[Node]

Distinct Objects
Count distinct objects within triples. Filter rule : isURI(?o) Action : M[?o]++
returns
RDD of objects used in the dataset.
def statsDistinctSubjects(): RDD[Node]

Distinct Subjects
Count distinct subjects within triples. Filter rule : isURI(?s) Action : M[?s]++
returns
RDD of subjects used in the dataset.
def statsLabeledSubjects(): RDD[Node]

24. Labeled subjects criterion.
returns
list of labeled subjects.
def statsLanguages(): RDD[(String, Int)]

21. Languages criterion
returns
histogram of languages used for literals.
def statsLinks(): RDD[(String, String, Int)]

26. Links criterion.
returns
list of namespaces and their frequencies.
def statsLiterals(): RDD[Triple]

17. Literals criterion
returns
RDD of triples whose object is a literal.
def statsMaxPerProperty(): RDD[(Node, Node)]

28. Maximum per property {int,float,time} criterion
returns
entities with their maximum values on the graph
def statsObjectVocabularies(): RDD[(String, Int)]

32. Object vocabularies
Compute object vocabularies/namespaces used through the dataset. Filter rule : ns=ns(?o) Action : M[ns]++
returns
RDD of distinct object vocabularies used in the dataset and their frequencies.
def statsPredicateVocabularies(): RDD[(String, Int)]

31. Predicate vocabularies
Compute predicate vocabularies/namespaces used through the dataset. Filter rule : ns=ns(?p) Action : M[ns]++
returns
RDD of distinct predicate vocabularies used in the dataset and their frequencies.
def statsPropertiesDefined(): RDD[Node]

Properties Defined
Count the defined properties within triples. Filter rule : ?p=rdf:type && (?o=owl:ObjectProperty || ?o=rdf:Property)&& !isIRI(?s) Action : M[?p]++
returns
RDD of predicates defined in the dataset.
def statsPropertyHierarchyDepth(): RDD[(Node, Int)]

12. Property hierarchy depth criterion
returns
the depth of the graph
def statsPropertyUsage(): RDD[(Node, Int)]

5. Property Usage Criterion
Count the usage of properties within triples. An RDD is created with all property IRIs as keys, and their frequencies are then computed. Filter rule : none Action : M[?p]++
returns
RDD of predicates used in the dataset and their frequencies.
def statsPropertyUsageDistinctPerObject(): RDD[(Iterable[Triple], Int)]

7. Property usage distinct per object
Count the usage of properties within triples based on objects. Filter rule : none Action : M[?o] += ?p
returns
RDD of predicates used in the dataset and their frequencies.
def statsPropertyUsageDistinctPerSubject(): RDD[(Iterable[Triple], Int)]

6. Property usage distinct per subject
Count the usage of properties within triples based on subjects. Filter rule : none Action : M[?s] += ?p
returns
RDD of predicates used in the dataset and their frequencies.
def statsSameAs(): RDD[Triple]

25. SameAs criterion.
returns
list of triples with owl#sameAs as predicate
def statsSubjectVocabularies(): RDD[(String, Int)]

30. Subject vocabularies
Compute subject vocabularies/namespaces used through the dataset. Filter rule : ns=ns(?s) Action : M[ns]++
returns
RDD of distinct subject vocabularies used in the dataset and their frequencies.
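The vocabulary criteria (subject, predicate, object) share the same shape: map a term to its namespace, then count per namespace. The sketch below shows that shape for subjects on plain Scala collections. How `ns(...)` is actually computed is not stated here, so the `namespace` helper is an assumption: it takes the IRI prefix up to and including the last `#` or `/`. `Triple` with string fields is likewise a hypothetical simplification.

```scala
object SubjectVocabulariesSketch {
  // Hypothetical simplification: IRIs are modelled as plain strings.
  final case class Triple(s: String, p: String, o: String)

  // Assumption: the namespace is the prefix up to and including the last '#' or '/'.
  def namespace(iri: String): String = {
    val cut = math.max(iri.lastIndexOf('#'), iri.lastIndexOf('/'))
    if (cut >= 0) iri.substring(0, cut + 1) else iri
  }

  // Filter rule: ns = ns(?s)
  // Action:      M[ns]++
  def subjectVocabularies(triples: Seq[Triple]): Map[String, Int] =
    triples
      .map(t => namespace(t.s))                  // namespace of each subject
      .groupBy(identity)                         // group equal namespaces
      .map { case (ns, hits) => ns -> hits.size } // count triples per namespace
}
```

Swapping `t.s` for `t.p` or `t.o` gives the corresponding sketch for the predicate and object vocabulary criteria.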
def statsTypedSubjects(): RDD[Node]

24. Typed subjects criterion.
returns
list of typed subjects.
def statsUsedClasses(): RDD[Node]

1. Used Classes Criterion
Creates an RDD of the classes that are in use by instances of the analyzed dataset. An example of a triple accepted by the filter is sda:Gezim rdf:type distLODStats:Developer. Filter rule : ?p=rdf:type && isIRI(?o) Action : S += ?o
returns
RDD of classes/instances
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
AnyRef → Any
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

Related Doc: package stats

implicit class StatsCriteria extends Logging
