JensLehmann – SANSA-Stack

SANSA 0.7.1 (Semantic Analytics Stack) Released

Posted on January 16, 2020 (updated January 17, 2020) by JensLehmann

We are happy to announce SANSA 0.7.1 – the seventh release of the Scalable Semantic Analytics Stack. SANSA employs distributed computing via Apache Spark and Flink in order to allow scalable machine learning, inference and querying capabilities for large knowledge graphs.

Website: http://sansa-stack.net
GitHub: https://github.com/SANSA-Stack
Download: http://sansa-stack.net/downloads-usage/
ChangeLog: https://github.com/SANSA-Stack/SANSA-Stack/releases

You can find usage guidelines and examples at http://sansa-stack.net/user-guide.

The following features are currently supported by SANSA:

Reading and writing RDF files in N-Triples, Turtle, RDF/XML, N-Quads, and TriX formats
Reading OWL files in various standard formats
Query heterogeneous sources (Data Lake) using SPARQL – CSV, Parquet, MongoDB, Cassandra, JDBC (MySQL, SQL Server, etc.) are supported
Support for multiple data partitioning techniques
SPARQL querying via Sparqlify, Ontop, and tensors
Graph-parallel querying of RDF using SPARQL (1.0) via GraphX traversals (experimental)
RDFS, RDFS Simple and OWL-Horst forward chaining inference
RDF graph clustering with different algorithms
Terminological decision trees (experimental)
Knowledge graph embedding approaches: TransE (beta), DistMult (beta)
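
To make the forward chaining item in the list above more concrete, here is a minimal, single-machine sketch in plain Python. It is not SANSA code and does not use the SANSA API: SANSA runs inference in a distributed fashion on Spark or Flink, whereas this toy version keeps everything in an in-memory set. It assumes only two of the RDFS entailment rules (rdfs9 and rdfs11), and the function name and example IRIs are hypothetical.

```python
# Minimal forward-chaining sketch over an in-memory set of triples.
# All names here are illustrative and are not part of the SANSA API.

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def rdfs_forward_chain(triples):
    """Apply two RDFS rules (rdfs9, rdfs11) until no new triples appear."""
    closure = set(triples)
    while True:
        # Index the current subclass edges: subclass -> set of superclasses.
        supers = {}
        for s, p, o in closure:
            if p == SUBCLASS_OF:
                supers.setdefault(s, set()).add(o)

        derived = set()
        for s, p, o in closure:
            if p == SUBCLASS_OF:
                # rdfs11: subClassOf is transitive.
                for parent in supers.get(o, ()):
                    derived.add((s, SUBCLASS_OF, parent))
            elif p == RDF_TYPE:
                # rdfs9: an instance of a class is an instance of its superclasses.
                for parent in supers.get(o, ()):
                    derived.add((s, RDF_TYPE, parent))

        new = derived - closure
        if not new:
            # Fixpoint reached: nothing more can be inferred.
            return closure
        closure |= new


if __name__ == "__main__":
    graph = {
        ("ex:Dog", SUBCLASS_OF, "ex:Mammal"),
        ("ex:Mammal", SUBCLASS_OF, "ex:Animal"),
        ("ex:rex", RDF_TYPE, "ex:Dog"),
    }
    # Print only the triples that were inferred, not the asserted ones.
    for triple in sorted(rdfs_forward_chain(graph) - graph):
        print(triple)
```

The loop repeats rule application until a fixpoint is reached, which is the defining property of forward chaining: all consequences are materialised up front rather than computed at query time.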

Noteworthy changes or updates since the previous release are:

TRIX support
A new query engine over compressed RDF data
OWL/XML Support

Deployment and getting started:

Template projects for SBT and Maven are available for both Apache Spark and Apache Flink to help you get started.
The SANSA jar files are in Maven Central, i.e., in most IDEs you can just search for “sansa” to include the dependencies in Maven projects.
Example code is available for various tasks.
We provide interactive notebooks for running and testing code via Docker.

We want to thank everyone who helped to create this release, in particular the projects Big Data Ocean, SLIPO, QROWD, BETTER, BOOST, MLwin, PLATOON and Simple-ML. Also check out our recent articles in which we describe how to use SANSA for tensor-based querying, scalable RDB2RDF query execution, quality assessment and semantic partitioning.

Spread the word by retweeting our release announcement on Twitter. For more updates, please view our Twitter feed and consider following us.

Greetings from the SANSA Development Team

SANSA 0.6 (Semantic Analytics Stack) Released

Posted on July 2, 2019 by JensLehmann

We are happy to announce SANSA 0.6 – the sixth release of the Scalable Semantic Analytics Stack. SANSA employs distributed computing via Apache Spark and Flink in order to allow scalable machine learning, inference and querying capabilities for large knowledge graphs.

Website: http://sansa-stack.net
GitHub: https://github.com/SANSA-Stack
Download: http://sansa-stack.net/downloads-usage/
ChangeLog: https://github.com/SANSA-Stack/SANSA-Stack/releases

You can find the FAQ and usage examples at http://sansa-stack.net/faq/.

The following features are currently supported by SANSA:

Reading and writing RDF files in N-Triples, Turtle, RDF/XML, and N-Quads formats
Reading OWL files in various standard formats
Query heterogeneous sources (Data Lake) using SPARQL – CSV, Parquet, MongoDB, Cassandra, JDBC (MySQL, SQL Server, etc.) are supported
Support for multiple data partitioning techniques
SPARQL querying via Sparqlify, Ontop, and tensors
Graph-parallel querying of RDF using SPARQL (1.0) via GraphX traversals (experimental)
RDFS, RDFS Simple and OWL-Horst forward chaining inference
RDF graph clustering with different algorithms
Terminological decision trees (experimental)
Knowledge graph embedding approaches: TransE (beta), DistMult (beta)
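
As a rough illustration of the two embedding approaches named in the list above, the following plain-Python sketch shows the scoring functions commonly associated with TransE and DistMult. It is not taken from SANSA and does not reflect its API or training procedure. The function names and example vectors are hypothetical, and the use of the L2 norm for TransE is an assumption (the L1 norm is an equally common choice).

```python
import math

# Illustrative scoring functions only; names and vectors are hypothetical
# and are not part of the SANSA API.


def transe_score(head, relation, tail):
    """TransE: negative L2 distance between (head + relation) and tail.

    A score closer to zero means the triple is considered more plausible.
    """
    return -math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


def distmult_score(head, relation, tail):
    """DistMult: trilinear product, the sum over i of h_i * r_i * t_i."""
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


if __name__ == "__main__":
    head = [0.5, 1.0]
    relation = [0.5, -1.0]
    tail = [1.0, 0.0]          # head + relation lands exactly on this tail
    corrupted = [0.0, 1.0]     # a deliberately wrong tail for comparison

    print("TransE, true tail:     ", transe_score(head, relation, tail))
    print("TransE, corrupted tail:", transe_score(head, relation, corrupted))
    print("DistMult, true tail:   ", distmult_score(head, relation, tail))
```

One design difference is visible directly in the formulas: the DistMult score is unchanged when head and tail are swapped, so it cannot distinguish the direction of a relation, whereas TransE models a relation as a directed translation.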

Noteworthy changes or updates since the previous release are:

Tensor representation of RDF added
Ontop RDB2RDF engine support has been added
Tensor-based querying engine introduced
RDF data quality assessment methods have been added
Dataset statistics calculation has been substantially improved
New clustering algorithms have been added and the interface for clustering has been unified

Deployment and getting started:

Template projects for SBT and Maven are available for both Apache Spark and Apache Flink to help you get started.
The SANSA jar files are in Maven Central, i.e., in most IDEs you can just search for “sansa” to include the dependencies in Maven projects.
Example code is available for various tasks.
We provide interactive notebooks for running and testing code via Docker.

We want to thank everyone who helped to create this release, in particular the projects HOBBIT, Big Data Ocean, SLIPO, QROWD, BETTER, BOOST, MLwin and Simple-ML.

Spread the word by retweeting our release announcement on Twitter. For more updates, please view our Twitter feed and consider following us.

Greetings from the SANSA Development Team

SANSA 0.5 (Semantic Analytics Stack) Released

Posted on December 12, 2018 by JensLehmann

We are happy to announce SANSA 0.5 – the fifth release of the Scalable Semantic Analytics Stack. SANSA employs distributed computing via Apache Spark and Flink in order to allow scalable machine learning, inference and querying capabilities for large knowledge graphs.

Website: http://sansa-stack.net
GitHub: https://github.com/SANSA-Stack
Download: http://sansa-stack.net/downloads-usage/
ChangeLog: https://github.com/SANSA-Stack/SANSA-Stack/releases

You can find the FAQ and usage examples at http://sansa-stack.net/faq/.

The following features are currently supported by SANSA:

Reading and writing RDF files in N-Triples, Turtle, RDF/XML, and N-Quads formats
Reading OWL files in various standard formats
Query heterogeneous sources (Data Lake) using SPARQL – CSV, Parquet, MongoDB, Cassandra, JDBC (MySQL, SQL Server, etc.) are supported
Support for multiple data partitioning techniques
SPARQL querying via Sparqlify and Ontop
Graph-parallel querying of RDF using SPARQL (1.0) via GraphX traversals (experimental)
RDFS, RDFS Simple and OWL-Horst forward chaining inference
RDF graph clustering with different algorithms
Terminological decision trees (experimental)
Knowledge graph embedding approaches: TransE (beta), DistMult (beta)
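
The list above mentions data partitioning without naming specific techniques. One widely used scheme for RDF is vertical partitioning, where triples are split into one two-column table per predicate. The plain-Python sketch below shows the idea on a handful of triples. It is an assumption that this is among the techniques meant here, and the function name and example data are hypothetical rather than part of the SANSA API.

```python
from collections import defaultdict

# Illustrative only: vertical partitioning of triples by predicate.
# Not SANSA code; SANSA would do this on distributed Spark or Flink data sets.


def vertical_partition(triples):
    """Group triples into one list of (subject, object) pairs per predicate."""
    tables = defaultdict(list)
    for subject, predicate, obj in triples:
        tables[predicate].append((subject, obj))
    return dict(tables)


if __name__ == "__main__":
    triples = [
        ("ex:alice", "foaf:knows", "ex:bob"),
        ("ex:alice", "foaf:name", "Alice"),
        ("ex:bob", "foaf:name", "Bob"),
    ]
    for predicate, rows in vertical_partition(triples).items():
        print(predicate, rows)
```

With this layout, a query that touches a single predicate only needs to scan that predicate's table, which is why the scheme pairs well with SPARQL-to-SQL rewriting.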

Noteworthy changes or updates since the previous release are:

A data lake concept for querying heterogeneous data sources has been integrated into SANSA
New clustering algorithms have been added and the interface for clustering has been unified
Ontop RDB2RDF engine support has been added
RDF data quality assessment methods have been substantially improved
Dataset statistics calculation has been substantially improved
Improved unit test coverage

Deployment and getting started:

Template projects for SBT and Maven are available for both Apache Spark and Apache Flink to help you get started.
The SANSA jar files are in Maven Central, i.e., in most IDEs you can just search for “sansa” to include the dependencies in Maven projects.
Example code is available for various tasks.
We provide interactive notebooks for running and testing code via Docker.

We want to thank everyone who helped to create this release, in particular the projects HOBBIT, Big Data Ocean, SLIPO, QROWD, BETTER, BOOST, MLwin and Simple-ML.

Spread the word by retweeting our release announcement on Twitter. For more updates, please view our Twitter feed and consider following us.

Greetings from the SANSA Development Team

SANSA 0.4 (Semantic Analytics Stack) Released

Posted on June 26, 2018 by JensLehmann

We are happy to announce SANSA 0.4 – the fourth release of the Scalable Semantic Analytics Stack. SANSA employs distributed computing via Apache Spark and Flink in order to allow scalable machine learning, inference and querying capabilities for large knowledge graphs.

Website: http://sansa-stack.net
GitHub: https://github.com/SANSA-Stack
Download: http://sansa-stack.net/downloads-usage/
ChangeLog: https://github.com/SANSA-Stack/SANSA-Stack/releases

You can find the FAQ and usage examples at http://sansa-stack.net/faq/.

The following features are currently supported by SANSA:

Reading and writing RDF files in N-Triples, Turtle, RDF/XML, and N-Quads formats
Reading OWL files in various standard formats
Support for multiple data partitioning techniques
SPARQL querying via Sparqlify
Graph-parallel querying of RDF using SPARQL (1.0) via GraphX traversals (experimental)
RDFS, RDFS Simple, OWL-Horst, EL (experimental) forward chaining inference
Automatic inference plan creation (experimental)
RDF graph clustering with different algorithms
Terminological decision trees (experimental)
Anomaly detection (beta)
Knowledge graph embedding approaches: TransE (beta), DistMult (beta)

Noteworthy changes or updates since the previous release are:

Parser performance has been improved significantly, e.g., DBpedia 2016-10 can be loaded in under 100 seconds on a 7-node cluster
Support for a wider range of data partitioning strategies
A better unified API across data representations (RDD, DataFrame, DataSet, Graph) for triple operations
Improved unit test coverage
Improved distributed statistics calculation (see ISWC paper)
Initial scalability tests on 6 billion triples of Ethereum blockchain data on a 100-node cluster
New SPARQL-to-GraphX rewriter aiming at providing better performance for queries exploiting graph locality
Numeric outlier detection tested on DBpedia (en)
Improved clustering tested on 20 GB RDF data sets

Deployment and getting started:

Template projects for SBT and Maven are available for both Apache Spark and Apache Flink to help you get started.
The SANSA jar files are in Maven Central, i.e., in most IDEs you can just search for “sansa” to include the dependencies in Maven projects.
Example code is available for various tasks.
We provide interactive notebooks for running and testing code via Docker.

We want to thank everyone who helped to create this release, in particular the projects Big Data Europe, HOBBIT, SAKE, Big Data Ocean, SLIPO, QROWD, BETTER, BOOST and SPECIAL.

Spread the word by retweeting our release announcement on Twitter. For more updates, please view our Twitter feed and consider following us.

Greetings from the SANSA Development Team