Release notes

These are the release notes for the SANSA project. This page lists the latest releases.


 

  • Features

    • General
      • #69 Add Coveralls integration
      • #82 Further improvement of unit test coverage
    • Spark
      • #81 Add RDF compression techniques
      • #83 Add TriX support
      • Refactor Tensor representation
    • Flink
      • Align Quality Assessment implementation with the Spark module

    Bug Fixes

    • #78 Write Code Once (Riot parser)
    • #80 Remove sortBy operation for semantic-based partition

    Dependency Changes

    • Apache Spark 2.4.0 → 2.4.3
    • Apache Flink 1.7.0 → 1.8.0
    • Apache Jena 3.9.0 → 3.11.0

    Dependency changes

    • Apache Spark 2.4.3
    • Apache Flink 1.8.0

    Features

    Bug Fixes

    • #30 Fix issue of not returning the result set on semantic query engine
    • #32 Correct broken DataLake input files

    Dependency changes

    • Apache Spark 2.4.0 → 2.4.3
    • Apache Flink 1.7.0 → 1.8.0
    • Apache Jena 3.9.0 → 3.11.0

    Features

    • Spark
      • #11 Return graph + inferred axioms for axioms instead of just inferred axioms
    • Flink
      • #14 Makes Flink layer compliant with Jena datastructures

    Bug Fixes

    • #10 Running axioms forward chaining generates different result set from triples inference

    Dependency Changes

    • Apache Spark 2.4.0 → 2.4.3
    • Apache Flink 1.7.0 → 1.8.0
    • Apache Jena 3.9.0 → 3.11.0

    Features

    • Spark
      • Add Coveralls integration
      • Further improvement of unit test coverage
      • #12 Refactor vandalism detection package
      • #20 Align with RDF layer

    Bug Fixes

    • #13 Classes with names that only differ in casing
    • #15 Hard-coded spatial partitioning value (DBSCAN)
    • #17 geospark scope removed

    Dependency Changes

    • Apache Spark 2.4.0 → 2.4.3
    • Apache Flink 1.7.0 → 1.8.0
    • Apache Jena 3.9.0 → 3.11.0

  • SANSA RDF 0.5.0

    Features

    • Spark
      • Further support for RDF quality assessment
      • Refactor and improve stats sub-module
      • Semantic partitioning (refactoring)
      • Adding the possibility to generate R2RML mappings and the associated SQL commands
    • Flink
      • Support for RDF quality assessment
      • Refactor and improve stats sub-module
      • Add support for TripleOps on DataSet
      • Add support for GraphOps on Gelly
      • Introduced implicit calls for partitioning strategies
      • Introduced implicit calls for io operations and align the API with sansa-rdf-module

    Bug Fixes

    • #33 kryo exceptions
    • #60 Issue with avg. untyped String literal length measure
    • #62 Issue with Distinct entities measure
    • #63 Improvement of the return type of the Link measure
    • #64 Issue with Class Hierarchy Depth measure
    • #65 Issues with Max/Avg Per Property measure
    • #68 Signature of net.sansa_stack.rdf.spark.partition.semantic.RdfPartition not clear

    Dependency changes

    • Apache Spark 2.3.1 → 2.4.0
    • Apache Flink 1.5.0 → 1.7.0
    • Apache Jena 3.7.0 → 3.9.0

    SANSA Query 0.5.0

    Features

    • Spark
      • Add possibility to query heterogeneous data sources using SANSA DataLake
      • Add OnTop integration
      • Semantic-based query engine (refactoring)

    Dependency changes

    • Apache Spark 2.4.0
    • Apache Flink 1.7.0
    • Apache Jena 3.9.0

    SANSA Inference 0.5.0

    Features

    • Spark
      • Rule-based forward chaining on OWL axioms (EXPERIMENTAL) supporting the following profiles
        • RDFS
        • OWL Horst

    Dependency Changes

    • Apache Spark 2.4.0
    • Apache Flink 1.7.0
    • Apache Jena 3.9.0
    • Jena Sparql API 3.9.0-1

    SANSA-ML 0.5.0

    Features

    • Spark
      • Numerical outlier detection
      • RDF Graph Kernels (Alpha)
      • Update: knowledge graph embedding support (pre Alpha)
      • Decision trees (pre Alpha)
      • Update: Clustering
        • Power iteration clustering
        • DBSCAN (experimental)
        • Unified interface for all clustering algorithms

    Dependency Changes

    • Apache Spark 2.4.0
    • Apache Flink 1.7.0
    • Apache Jena 3.9.0

    SANSA Examples 0.5.0

    Features

    • Spark
      • RDF
        • RDF Statistics example
        • RDF Quality Assessment example
        • PageRank of resources example
        • Triple Ops example
        • Triple reader example
        • Triple writer example
      • OWL
        • Dataset OWL reader example (Functional and Manchester syntax)
        • RDD OWL reader example (Functional and Manchester syntax)
      • Inference
        • Triples RDF Graph Inference example (RDFS, RDFS_SIMPLE, TRANSITIVE, OWL Horst reasoner)
        • Axioms RDF Graph Inference example (RDFS, OWL Horst reasoner) – new
      • Query
        • Sparklify example
        • Semantic example
        • Graph example
        • DataLake example – new
      • ML
        • Mines the Rules example
        • RDF By Modularity Clustering example
        • Power Iteration Clustering example
        • Border Flow Clustering example
        • Silvia Clustering example
        • Holdout Cross validation techniques example
        • Anomaly Detection example
        • RDF Graph Kernel example
    • Flink
      • RDF
        • RDF Statistics example
        • Triple Ops example
        • Triple reader example
        • Triple writer example
      • OWL
        • Dataset OWL reader example (Functional and Manchester syntax)
      • Inference
        • RDF Graph Inference example (RDFS (Full), RDFS (Simple), OWL Horst, Transitive reasoner)
      • ML
        • RDF By Modularity Clustering example

  • SANSA RDF 0.4.0

    Features

    • Spark
      • Support for RDF quality assessment
      • Semantic partitioning (improvements)
      • Graph partitioning strategies
      • Add support for TripleOps on DataFrame/DataSets
      • Add support for GraphOps on GraphX
      • RDF Parser Performance Improvement
      • Permissive RDF Parsing (N-Triples)
      • Deprecate banana-rdf implementation
      • Introduced implicit calls
      • Make Scala-style compliant
    • Flink
      • Support for RDF quality assessment
      • Support for semantic partitioning
      • Introduced implicit calls for RDF stats
      • Make Scala-style compliant

    Bug Fixes

    • #25 Multiple sources found exception
    • #28 Problem with rdf_loader.conf in master branch
    • #24 Ntriple reader gave error reading nt files including these characters
    • #47 Hotfix/rdf parser

    Dependency changes

    • Apache Spark 2.3.1
    • Apache Flink 1.5.0
    • Apache Jena 3.7.0

    SANSA OWL 0.4.0

    Dependency changes

    • Apache Spark 2.3.1
    • Apache Flink 1.5.0
    • OWL API 5.1.5

    SANSA Query 0.4.0

    Features

    • Spark
      • Update: Join conditions on Spark working
      • Experimental: SPARQL-to-GraphX translation (performance tuning targeted for SANSA 0.5)
      • New: Query rewriter on top of semantic partitioning
    • Flink
      • New: Query rewriter on top of semantic partitioning

    Dependency changes

    • Apache Spark 2.3.1
    • Apache Flink 1.5.0
    • Apache Jena 3.7.0

    SANSA Inference 0.4.0

    Features

    Spark

    • Alignment with RDF and OWL layer

    Flink

    • Alignment with RDF and OWL layer

    Dependency Changes

    • Apache Spark 2.3.1
    • Apache Flink 1.5.0
    • Apache Jena 3.7.0
    • OWL API 5.1.5

    SANSA-ML 0.4.0

    Features

    • Spark
      • New: Numerical outlier detection (Beta status)
      • New: RDF Graph Kernels (Alpha)
      • Update: knowledge graph embedding support (pre Alpha)
      • Update: Decision trees (pre Alpha)
      • Update: Clustering (Beta status)
      • Update: Semantic similarity measures
      • Update: Vandalism Detection in WikiData (Beta status)

    Dependency changes

    • Apache Spark 2.3.1
    • Apache Flink 1.5.0
    • Apache Jena 3.7.0

    SANSA Examples 0.4.0

    Features

    • Spark
      • RDF
        • RDF Statistics example
        • RDF Quality Assessment example
        • PageRank of resources example
        • Triple Ops example
        • Triple reader example
        • Triple writer example
      • OWL
        • Dataset OWL reader example (Functional and Manchester syntax)
        • RDD OWL reader example (Functional and Manchester syntax)
      • Inference
        • RDF Graph Inference example (RDFS, RDFS_SIMPLE, TRANSITIVE, OWL Horst reasoner)
      • Query
        • Sparklify example
        • Semantic example
        • Graph example
      • ML
        • Mines the Rules example
        • RDF By Modularity Clustering example
        • Power Iteration Clustering example
        • Border Flow Clustering example
        • Silvia Clustering example
        • Holdout Cross validation techniques example
        • Anomaly Detection example
        • RDF Graph Kernel example
    • Flink
      • RDF
        • RDF Statistics example
        • Triple Ops example
        • Triple reader example
        • Triple writer example
      • OWL
        • Dataset OWL reader example (Functional and Manchester syntax)
      • Inference
        • RDF Graph Inference example (RDFS (Full), RDFS (Simple), OWL Horst, Transitive reasoner)
      • ML
        • RDF By Modularity Clustering example

  • SANSA RDF 0.3.0

    Features

    • Spark
      • Support for ingestion of additional RDF formats
        • RDF/XML
        • N-Quads
        • Turtle
      • Support for RDF quality assessment
      • Support for semantic partitioning
    • Flink
      • Support for RDF quality assessment
      • Support for semantic partitioning

    Bug Fixes

    • #19 Review internal table naming

    Dependency changes

    • Scala 2.11.11
    • Apache Spark 2.2.1
    • Apache Flink 1.4.0
    • Apache Jena 3.5.0

    SANSA OWL 0.3.0

    Dependency changes

    • Scala 2.11.11
    • Apache Spark 2.2.1
    • Apache Flink 1.3.2
    • OWL API 5.1.3

    SANSA Query 0.3.0

    Bug Fixes

    • Fixed an issue when using language tags in basic graph patterns
    • Fixed version conflicts in dependencies
    • Fixed an issue that caused invalid table names to be derived from certain URIs

    Notes

    • SPARQL query capabilities against the bundled Spark release are limited because of a Spark issue that has already been resolved upstream; the fix will only be available in the next Spark release.

    SANSA Inference 0.3.0

    Features

    • Spark
      • Forward chaining for OWL EL (Experimental)
      • Automatic inference plan detection (Experimental)

    Dependency changes

    • Scala 2.11.11
    • Apache Spark 2.2.1
    • Apache Flink 1.3.2
    • Apache Jena 3.5.0

    SANSA-ML 0.3.0

    Features

    • Updated: Rule mining algorithm for RDF graphs based on AMIE+ further developed (still beta status)
    • Updated: Semantic similarity measures, defined as a function of the common and distinctive features of the entities being compared. The following measures are implemented:
      • Jaccard similarity
      • Rodríguez and Egenhofer similarity
      • Tversky Ratio Model
      • Batet Similarity
    • Updated: Clustering algorithms further extended and evaluated (Experimental)
      • Silvia Link Clustering
      • Border Flow (Extended for RDF)
      • Power Iteration Clustering (Extended for RDF)
      • Modularity Clustering
    • New: Anomaly detection (beta status)
    • New: Vandalism Detection (beta status)
    • Knowledge graph embedding approaches integrated into the SANSA core: TransE (beta status), DistMult (beta status)
    • In-Progress: Terminological Decision Trees for the classification of concepts
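
The feature-based similarity measures listed above compare two entities by their common and distinctive features. The following is a minimal Python sketch of two of them (Jaccard and the Tversky ratio model), assuming each entity is described by a plain set of feature strings. The function names, the `alpha`/`beta` parameters, and the sample feature sets are hypothetical illustrations and are not taken from the SANSA API.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared features divided by all features."""
    union = a | b
    if not union:
        return 0.0  # two empty feature sets: treat as no similarity
    return len(a & b) / len(union)


def tversky(a: set, b: set, alpha: float = 0.5, beta: float = 0.5) -> float:
    """Tversky ratio model: common features weighed against the
    distinctive features of each entity (weights alpha and beta)."""
    common = len(a & b)
    denominator = common + alpha * len(a - b) + beta * len(b - a)
    return common / denominator if denominator else 0.0


# Hypothetical feature sets for two entities.
berlin = {"type:City", "country:Germany", "capital:true"}
paris = {"type:City", "country:France", "capital:true"}

print(jaccard(berlin, paris))            # 2 shared out of 4 features
print(tversky(berlin, paris))            # alpha = beta = 0.5
print(tversky(berlin, paris, 1.0, 1.0))  # alpha = beta = 1 matches Jaccard
```

With `alpha = beta = 1` the Tversky ratio reduces to the Jaccard similarity, which makes a convenient sanity check.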

    Dependency changes

    • Scala 2.11.11
    • Apache Spark 2.2.1
    • Apache Flink 1.4.0
    • Apache Jena 3.5.0

    SANSA Examples 0.3.0

    Features

    • Spark
      • RDF Graph Inference example (RDFS, OWL Horst reasoner)
      • Dataset OWL reader example (Functional and Manchester syntax)
      • RDD OWL reader example (Functional and Manchester syntax)
      • Mines the Rules example
      • RDF By Modularity Clustering example
      • Power Iteration Clustering example
      • Border Flow Clustering example
      • Silvia Clustering example
      • Holdout Cross validation techniques example
      • Anomaly Detection example
      • Sparklify example
      • RDF Statistics example
      • PageRank of resources example
      • Triple Ops example
      • Triple reader example
      • Triple writer example
    • Flink
      • RDF Graph Inference example (RDFS (Full), RDFS (Simple), OWL Horst, Transitive reasoner)
      • Dataset OWL reader example (Functional and Manchester syntax)
      • RDF By Modularity Clustering example
      • RDF Statistics example
      • Triple Ops example
      • Triple reader example
      • Triple writer example

  • SANSA RDF 0.2.0

    Features

    • Spark
      • Support for streaming RDF files/kafka010 in N-Triples format
      • Support for RDF stats
    • Flink
      • Partitioning based on Sparqlify
      • Support for RDF stats
      • Support for Gelly

    Dependency changes

    • Spark 2.1.1
    • Flink 1.3.0
    • JenaAPI 3.1.1

    SANSA OWL 0.2.0

    Features

    • Extended support for reading OWL files in Manchester syntax for Spark and Flink

    Bug Fixes

    • In the previous version, certain constructs in Manchester syntax OWL files were not parsed correctly because of shortcomings in the OWLAPI Manchester parser, which also forced us to disable some of the tests. This is now fixed.

    Dependency changes

    • Spark 2.1.1
    • Flink 1.3.0
    • OWLAPI 5.1.0

    SANSA Query 0.2.0

    Features

    • Spark
      • Improved support for Datatypes

    SANSA Inference 0.2.0

    • Several bug fixes
    • Spark
      • ADDED reasoner to compute the transitive closure for a given set of properties
      • ADDED option to distinguish between RDFS Simple and RDFS Full
      • IMPROVED performance for RDFS and OWL Horst
    • Flink
      • ADDED option to distinguish between RDFS Simple and RDFS Full
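
To illustrate what computing the transitive closure for a given set of properties means, here is a small single-machine Python sketch. It assumes triples are plain `(subject, predicate, object)` tuples; the function name and the sample data are hypothetical, and the sketch does not reflect how the distributed reasoner is implemented.

```python
from collections import defaultdict


def transitive_closure(triples, properties):
    """Return the input triples plus every triple implied by treating
    each property in `properties` as transitive."""
    closed = set(triples)
    for prop in properties:
        # Adjacency list restricted to the current property.
        successors = defaultdict(set)
        for s, p, o in triples:
            if p == prop:
                successors[s].add(o)
        # Depth-first search from every subject to find all reachable objects.
        for start in list(successors):
            stack = list(successors[start])
            seen = set()
            while stack:
                node = stack.pop()
                if node in seen:
                    continue
                seen.add(node)
                stack.extend(successors.get(node, ()))
            closed.update((start, prop, node) for node in seen)
    return closed


# Hypothetical sample data: only subClassOf is declared transitive.
triples = {
    ("a", "subClassOf", "b"),
    ("b", "subClassOf", "c"),
    ("a", "knows", "b"),
    ("b", "knows", "c"),
}
result = transitive_closure(triples, {"subClassOf"})
print(sorted(result))
```

Only the listed properties are closed: the sketch adds `("a", "subClassOf", "c")` but leaves the `knows` triples untouched.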

    SANSA ML 0.2.0

    Features

    • Spark
      • Rule mining algorithm for RDF graphs based on AMIE+ further developed (still beta status)
      • Distributed Tensor Factorisation (very experimental, not fully integrated)
      • Several semantic similarity measures implemented (experimental)
      • Power Iteration Clustering with custom similarity measures
      • Border Flow Clustering
    • Flink

    Dependency changes

    • Spark 2.1.1
    • Flink 1.3.0
    • JenaAPI 3.1.1

    SANSA Examples 0.2.0

    Features

    • Spark
      • RDF Graph Inference example (RDFS, OWL Horst reasoner)
      • Dataset OWL reader example (Functional and Manchester syntax)
      • RDD OWL reader example (Functional and Manchester syntax)
      • Mines the Rules example
      • RDF By Modularity Clustering example
      • Power Iteration Clustering example
      • Border Flow Clustering example
      • Silvia Clustering example
      • Holdout Cross validation techniques example
      • Anomaly Detection example
      • Sparklify example
      • RDF Statistics example
      • PageRank of resources example
      • Triple Ops example
      • Triple reader example
      • Triple writer example
    • Flink
      • RDF Graph Inference example (RDFS (Full), RDFS (Simple), OWL Horst, Transitive reasoner)
      • Dataset OWL reader example (Functional and Manchester syntax)
      • RDF By Modularity Clustering example
      • RDF Statistics example
      • Triple Ops example
      • Triple reader example
      • Triple writer example

  • SANSA RDF 0.1.0

    Features

    • Spark
      • Support for reading and writing RDF files in N-Triples format
      • Partitioning based on Sparqlify
      • Jena Kryo Serialiser
    • Flink
      • Support for reading and writing RDF files in N-Triples format

    SANSA OWL 0.1.0

    Features

    • Spark
      • Support for reading OWL files in OWL Functional Syntax
      • Support for reading OWL files in Manchester Syntax (experimental)
      • Support for both RDD and Dataset representations of distributed OWL axioms
    • Flink
      • Support for reading OWL files in OWL Functional Syntax
      • Support for reading OWL files in Manchester Syntax (experimental)

    SANSA Query 0.1.0

    Features

    • Spark
      • Webserver based on Jena-Sparql-API
      • Partitioning based on Sparqlify
      • Support for SPARQL queries over SANSA RDF
    • Flink is not supported in this release

    SANSA Inference 0.1.0

    Features

    • Spark
      • Support for RDFS/RDFS Simple/OWL-Horst materialization
    • Flink
      • Support for RDFS/RDFS Simple/OWL-Horst materialization

    We provide JAR files that have been built for deployment on a Spark/Flink cluster.

    SANSA ML 0.1.0

    Features

    • Spark
      • An RDF clustering algorithm based on an approach for undirected graphs maximizing a modularity function, which was first introduced by Newman (DOI: https://doi.org/10.1103/PhysRevE.69.066133) (beta status)
      • A rule mining algorithm for RDF graphs based on AMIE+ (beta status)
    • Flink is not supported in this release
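
For readers unfamiliar with the modularity function that the clustering algorithm above maximizes, the following Python sketch evaluates it for a fixed community assignment. It assumes a simple undirected graph given as a list of edges without self-loops, and it only computes the score; it is not the clustering algorithm itself. The function name and the sample graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q = sum over communities of
    (fraction of edges inside the community)
    - (fraction of edge ends attached to the community) ** 2."""
    m = len(edges)
    if m == 0:
        return 0.0
    inside = defaultdict(int)  # edges with both ends in the community
    degree = defaultdict(int)  # edge ends attached to the community
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical graph: two triangles joined by a single bridge edge.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
one_group = {node: 0 for node in range(1, 7)}

print(modularity(edges, two_groups))  # 5/14, roughly 0.357
print(modularity(edges, one_group))   # 0.0: a single community scores zero
```

A modularity-maximizing algorithm searches over assignments like `two_groups` for the one with the highest score.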

    SANSA Examples 0.1.0

    Features

    • Spark
      • RDF Graph Inference example (RDFS, OWL Horst reasoner)
      • Dataset OWL reader example (Functional and Manchester syntax)
      • RDD OWL reader example (Functional and Manchester syntax)
      • Mines the Rules example
      • RDF By Modularity Clustering example
      • Sparklify example
      • PageRank of resources example
      • Triple Ops example
      • Triple reader example
      • Triple writer example
    • Flink
      • RDF Graph Inference example (RDFS, OWL Horst reasoner)
      • Dataset OWL reader example (Functional and Manchester syntax)


We are actively working on SANSA and adding richer functionality. Stay tuned for more information.