What is SANSA?

SANSA is a big data engine for scalable processing of large-scale RDF data. It is built on Apache Spark, which offers fault-tolerant, highly available and scalable processing of massive datasets. SANSA provides facilities for semantic data representation, querying, inference, and analytics.

At its core, the SANSA-Stack is a data flow engine that provides data distribution and fault tolerance for distributed computations over large-scale RDF datasets.
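
To make the idea of data distribution concrete, here is a minimal sketch in plain Python. It does not use SANSA or Spark; the sample triples and the helper names (partition_by_subject, count_predicates, merge_counts) are hypothetical and chosen only for illustration. It splits triples into partitions by subject, computes a partial result on each partition independently, and merges the partial results, which is roughly the pattern a distributed data flow engine automates.

```python
from collections import Counter

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:bob", "ex:worksAt", "ex:acme"),
    ("ex:acme", "rdf:type", "ex:Company"),
]


def partition_by_subject(triples, num_partitions):
    """Assign each triple to a partition based on its subject."""
    partitions = [[] for _ in range(num_partitions)]
    for s, p, o in triples:
        # Summing code points is a deterministic stand-in for a real hash function.
        index = sum(map(ord, s)) % num_partitions
        partitions[index].append((s, p, o))
    return partitions


def count_predicates(partition):
    """Per-partition work: count how often each predicate occurs."""
    return Counter(p for _, p, _ in partition)


def merge_counts(partial_counts):
    """Combine the per-partition results into one global result."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


partitions = partition_by_subject(TRIPLES, 3)
predicate_counts = merge_counts(count_predicates(part) for part in partitions)
```

Because all triples with the same subject land in the same partition, subject-centric operations could run on each partition without consulting the others. In a real cluster, partitions live on different machines and the engine also handles recovery when one of them fails.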

SANSA includes several modules for creating applications:

  1. Read / Write RDF / OWL: reading and writing RDF and OWL data,
  2. Querying: supports a query language on top of the distributed RDF/OWL library, as well as querying heterogeneous non-RDF data,
  3. Inference: implements rule-based reasoning on RDF/OWL data,
  4. ML: machine learning for semantic-aware analytics on RDF data.
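
As an illustration of what rule-based reasoning on RDF data means, the following is a minimal single-machine sketch in plain Python. It is not SANSA's inference API; the function name infer and the sample triples are assumptions made for this example. It applies two RDFS-style rules (subclass transitivity and type inheritance) repeatedly until no new triples are derived.

```python
def infer(triples):
    """Apply two RDFS-style rules until no new triples appear (a fixpoint)."""
    closure = set(triples)
    while True:
        sub_class = {(s, o) for s, p, o in closure if p == "rdfs:subClassOf"}
        new = set()
        # Rule 1 (transitivity): A subClassOf B and B subClassOf C => A subClassOf C
        for a, b in sub_class:
            for b2, c in sub_class:
                if b == b2:
                    new.add((a, "rdfs:subClassOf", c))
        # Rule 2 (type inheritance): x type A and A subClassOf B => x type B
        for s, p, o in closure:
            if p == "rdf:type":
                for a, b in sub_class:
                    if o == a:
                        new.add((s, "rdf:type", b))
        if new <= closure:
            # Nothing new was derived, so the closure is complete.
            return closure
        closure |= new


# Hypothetical input data for the sketch.
asserted = [
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
    ("ex:Person", "rdfs:subClassOf", "ex:Agent"),
    ("ex:alice", "rdf:type", "ex:Student"),
]
closure = infer(asserted)
inferred = closure - set(asserted)
```

Here inferred holds the triples that follow from the rules but were never stated, for example that ex:alice is also an ex:Agent. A distributed reasoner has to evaluate the same kind of rules as joins over partitioned data rather than nested loops in one process.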

SANSA is built on top of Spark and integrates easily with well-known open source systems for data input and output, such as HDFS.


SANSA-Stack Architecture


What is the idea behind SANSA?

In SANSA, we combine distributed computing frameworks (specifically Spark and Flink) with the semantic technology stack.

The SANSA vision combines distributed analytics (left) and semantic technologies (right) into a scalable semantic analytics stack (top). The colours encode what part of the two original stacks influence which part of the SANSA stack. The main objective of SANSA is to investigate whether the characteristics of each technology stack (bottom) can be combined to retain the respective advantages.


SANSA inherits the following advantages from the semantic technology stack, machine learning research, and distributed computing:

Powerful Data Integration

Current analytics pipelines have to handle increasing data variety and complexity…

Expressive Modelling

The vast majority of machine learning algorithms have to rely on simple input…


The usage of W3C standards can generally reduce pre-processing time in those…

Measurable Benefits

A key driver for the success of machine learning is that its benefits are often directly…

Horizontal Scalability

Distributed in-memory computing can provide the horizontal scalability required…


If you have a question related to SANSA, you can post it on one of the community channels.


Supported By

Smart Data Analytics
SANSA is a research project of the Smart Data Analytics research group.