Object

net.sansa_stack.rdf.spark.io.nquads

NQuadReader

object NQuadReader

An N-Triples reader. One triple per line is assumed.

Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean
     Definition Classes: AnyRef → Any
  2. final def ##(): Int
     Definition Classes: AnyRef → Any
  3. final def ==(arg0: Any): Boolean
     Definition Classes: AnyRef → Any
  4. final def asInstanceOf[T0]: T0
     Definition Classes: Any
  5. def clone(): AnyRef
     Attributes: protected[java.lang]
     Definition Classes: AnyRef
     Annotations: @throws( ... )
  6. final def eq(arg0: AnyRef): Boolean
     Definition Classes: AnyRef
  7. def equals(arg0: Any): Boolean
     Definition Classes: AnyRef → Any
  8. def finalize(): Unit
     Attributes: protected[java.lang]
     Definition Classes: AnyRef
     Annotations: @throws( classOf[java.lang.Throwable] )
  9. final def getClass(): Class[_]
     Definition Classes: AnyRef → Any
  10. def hashCode(): Int
     Definition Classes: AnyRef → Any
  11. final def isInstanceOf[T0]: Boolean
     Definition Classes: Any
  12. def load(session: SparkSession, path: String, stopOnBadTerm: ErrorParseMode.Value = ErrorParseMode.STOP, stopOnWarnings: WarningParseMode.Value = WarningParseMode.IGNORE, checkRDFTerms: Boolean = false, errorLog: Logger = ErrorHandlerFactory.stdLogger): RDD[Triple]

    Loads N-Triples data from a file or directory into an RDD. The path string can also contain multiple comma-separated paths and even wildcards, e.g. "/my/dir1,/my/paths/part-00[0-5]*,/another/dir,/a/specific/file".

    Handling of errors

    By default, loading stops at the first parse error, i.e. an org.apache.jena.riot.RiotException generated by the underlying parser is thrown.

    The following options exist:

    • STOP: the whole data loading process is stopped and an org.apache.jena.riot.RiotException is thrown
    • SKIP: the offending line is skipped, an error message is logged, and the loading process continues
    Handling of warnings

    If the additional checking of RDF terms is enabled, warnings can occur during parsing. For example, a lexical form that is invalid w.r.t. the literal's datatype leads to a warning.

    The following can be done with those warnings:

    • IGNORE: the warning is only logged to the configured logger
    • STOP: as in the error handling mode, the whole data loading process is stopped and an org.apache.jena.riot.RiotException is thrown
    • SKIP: as in the error handling mode, the line is skipped and the loading process continues
    Checking of RDF terms

    Sets whether to perform checking of N-Triples terms - defaults to no checking.

    Checking adds warnings over and above basic syntax errors. It can also be used to turn warnings into exceptions if the stopOnWarnings option is set to STOP or SKIP.

    • IRIs: whether IRIs conform to all the rules of the IRI scheme
    • Literals: whether the lexical form conforms to the rules for the datatype
    • Triples: whether each slot holds a valid kind of RDF term (parsers usually make this a syntax error anyway)

    See also the optional errorLog argument to control where error messages are written; by default they are logged. A usage sketch is given at the end of this entry.

    session
      the Spark session
    path
      the path to the N-Triples file(s)
    stopOnBadTerm
      stop parsing on encountering a bad RDF term
    stopOnWarnings
      stop parsing on encountering a warning
    checkRDFTerms
      run with checking of literals and IRIs either on or off
    errorLog
      the logger used for error message handling
    returns
      the RDD of triples

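    Example: a minimal usage sketch, not taken from the SANSA sources. It assumes that Triple is org.apache.jena.graph.Triple, that ErrorParseMode and WarningParseMode can be imported from the SANSA io package (the exact location may differ between SANSA versions), and that the input paths exist; the object name NQuadReaderLoadExample is made up for illustration.

      import org.apache.jena.graph.Triple
      import org.apache.spark.rdd.RDD
      import org.apache.spark.sql.SparkSession

      import net.sansa_stack.rdf.spark.io.nquads.NQuadReader
      // Assumption: ErrorParseMode and WarningParseMode are provided by the SANSA io package;
      // adjust the import to where they live in your SANSA version.
      import net.sansa_stack.rdf.spark.io.{ErrorParseMode, WarningParseMode}

      object NQuadReaderLoadExample {
        def main(args: Array[String]): Unit = {
          val session = SparkSession.builder()
            .appName("NQuadReader load example")
            .master("local[*]")
            .getOrCreate()

          // A single path string may contain several comma-separated paths and wildcards.
          val path = "/my/dir1,/my/paths/part-00[0-5]*,/a/specific/file"

          // Skip lines with bad terms instead of stopping, also skip lines that only
          // produce warnings, and enable the additional checking of IRIs and literals.
          val triples: RDD[Triple] = NQuadReader.load(
            session,
            path,
            stopOnBadTerm = ErrorParseMode.SKIP,
            stopOnWarnings = WarningParseMode.SKIP,
            checkRDFTerms = true)

          println(s"Parsed ${triples.count()} triples")
          session.stop()
        }
      }
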
  13. def load(session: SparkSession, paths: Seq[URI]): RDD[Triple]

    Loads N-Triples data from a set of files or directories into an RDD. The path can also contain multiple paths and even wildcards, e.g. "/my/dir1,/my/paths/part-00[0-5]*,/another/dir,/a/specific/file". A usage sketch covering both URI-based overloads is given at the end of the next entry.

    session
      the Spark session
    paths
      the paths to the N-Triples file(s)
    returns
      the RDD of triples

  14. def load(session: SparkSession, path: URI): RDD[Triple]

    Loads N-Triples data from a file or directory into an RDD.

    session
      the Spark session
    path
      the path to the N-Triples file(s)
    returns
      the RDD of triples

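    Example: a minimal Spark-shell style sketch of the two URI-based overloads (this entry and the Seq[URI] variant above); the SparkSession settings and HDFS paths are hypothetical.

      import java.net.URI

      import org.apache.spark.sql.SparkSession
      import net.sansa_stack.rdf.spark.io.nquads.NQuadReader

      val session = SparkSession.builder()
        .appName("NQuadReader URI example")
        .master("local[*]")
        .getOrCreate()

      // Single file or directory: load(session, path: URI)
      val single = NQuadReader.load(session, URI.create("hdfs:///data/part-0.nt"))

      // Several files or directories at once: load(session, paths: Seq[URI])
      val many = NQuadReader.load(session, Seq(
        URI.create("hdfs:///data/dir1"),
        URI.create("hdfs:///data/dir2")))

      println(s"single: ${single.count()} triples, many: ${many.count()} triples")
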
  15. def main(args: Array[String]): Unit

  16. final def ne(arg0: AnyRef): Boolean
     Definition Classes: AnyRef
  17. final def notify(): Unit
     Definition Classes: AnyRef
  18. final def notifyAll(): Unit
     Definition Classes: AnyRef
  19. final def synchronized[T0](arg0: ⇒ T0): T0
     Definition Classes: AnyRef
  20. def toString(): String
     Definition Classes: AnyRef → Any
  21. final def wait(): Unit
     Definition Classes: AnyRef
     Annotations: @throws( ... )
  22. final def wait(arg0: Long, arg1: Int): Unit
     Definition Classes: AnyRef
     Annotations: @throws( ... )
  23. final def wait(arg0: Long): Unit
     Definition Classes: AnyRef
     Annotations: @throws( ... )

Inherited from AnyRef

Inherited from Any
