net.sansa_stack.rdf.spark.io

object NTripleReader

An N-Triples reader. One triple per line is assumed.

Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  8. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  10. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  11. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  12. def load(session: SparkSession, path: String, stopOnBadTerm: ErrorParseMode.Value = ErrorParseMode.STOP, stopOnWarnings: WarningParseMode.Value = WarningParseMode.IGNORE, checkRDFTerms: Boolean = false, errorLog: Logger = ErrorHandlerFactory.stdLogger): RDD[Triple]

    Loads N-Triples data from a file or directory into an RDD. The path can also contain multiple paths and even wildcards, e.g. "/my/dir1,/my/paths/part-00[0-5]*,/another/dir,/a/specific/file"

    Handling of errors

    By default, the loading process stops once a parse error occurs, i.e. an org.apache.jena.riot.RiotException generated by the underlying parser will be thrown.

    The following options exist:

    • STOP: the whole data loading process will be stopped and an org.apache.jena.riot.RiotException will be thrown
    • SKIP: the line will be skipped but the data loading process will continue; an error message will be logged

    Handling of warnings

    If the additional checking of RDF terms is enabled, warnings can occur during parsing. For example, a lexical form of a literal that is invalid w.r.t. its datatype will lead to a warning.

    The following can be done with those warnings:

    • IGNORE: the warning will just be logged to the configured logger
    • STOP: similar to the error handling mode, the whole data loading process will be stopped and an org.apache.jena.riot.RiotException will be thrown
    • SKIP: similar to the error handling mode, the line will be skipped but the data loading process will continue

    Checking of RDF terms

    Sets whether to perform checking of N-Triples - defaults to no checking.

    Checking adds warnings over and above basic syntax errors. It can also be used to treat warnings as errors if the option stopOnWarnings is set to STOP or SKIP:

    • IRIs: whether IRIs conform to all the rules of the IRI scheme
    • Literals: whether the lexical form conforms to the rules for the datatype
    • Triples: whether each slot holds a valid kind of RDF term (parsers usually make this a syntax error anyway)

    See also the optional errorLog argument to control the output. The default is to log.

    session

    the Spark session

    path

    the path to the N-Triples file(s)

    stopOnBadTerm

    stop parsing on encountering a bad RDF term

    stopOnWarnings

    stop parsing on encountering a warning

    checkRDFTerms

    run with checking of literals and IRIs either on or off

    errorLog

    the logger used for error message handling

    returns

    the RDD of triples
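    As an illustrative sketch of the options above (the session setup, the input path, and the import locations of ErrorParseMode and WarningParseMode are assumptions for this example, not guarantees of this API doc), a job that skips bad lines instead of failing and enables term checking might look like:

    ```scala
    import net.sansa_stack.rdf.spark.io._
    import org.apache.spark.sql.SparkSession

    val session = SparkSession.builder()
      .appName("N-Triples loading example")
      .master("local[*]") // assumption: local mode, for illustration only
      .getOrCreate()

    // Skip malformed lines instead of stopping the whole load,
    // log warnings from term checking instead of failing on them.
    // "/data/dataset.nt" is a hypothetical path.
    val triples = NTripleReader.load(
      session,
      "/data/dataset.nt",
      stopOnBadTerm = ErrorParseMode.SKIP,
      stopOnWarnings = WarningParseMode.IGNORE,
      checkRDFTerms = true)

    println(s"Loaded ${triples.count()} triples")
    ```

    With ErrorParseMode.STOP (the default) the same call would instead fail on the first malformed line.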

  13. def load(session: SparkSession, paths: Seq[URI]): RDD[Triple]

    Loads N-Triples data from a set of files or directories into an RDD. Each path can also contain multiple paths and even wildcards, e.g. "/my/dir1,/my/paths/part-00[0-5]*,/another/dir,/a/specific/file"

    session

    the Spark session

    paths

    the paths to the N-Triples files

    returns

    the RDD of triples

  14. def load(session: SparkSession, path: URI): RDD[Triple]

    Loads N-Triples data from a file or directory into an RDD.

    session

    the Spark session

    path

    the path to the N-Triples file(s)

    returns

    the RDD of triples
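    A minimal sketch of the two URI-based overloads (session setup and the input paths are assumptions for this example):

    ```scala
    import java.net.URI

    import net.sansa_stack.rdf.spark.io.NTripleReader
    import org.apache.spark.sql.SparkSession

    val session = SparkSession.builder()
      .master("local[*]") // assumption: local mode, for illustration only
      .getOrCreate()

    // Single-path overload, using the default error handling
    // ("/data/dataset.nt" is a hypothetical path).
    val triples = NTripleReader.load(session, URI.create("/data/dataset.nt"))

    // Seq[URI] overload: combines several inputs into one RDD.
    val combined = NTripleReader.load(
      session,
      Seq(URI.create("/data/part1.nt"), URI.create("/data/part2.nt")))
    ```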

  15. def main(args: Array[String]): Unit

  16. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  17. final def notify(): Unit

    Definition Classes
    AnyRef
  18. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  19. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  20. def toString(): String

    Definition Classes
    AnyRef → Any
  21. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  22. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  23. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
