Semantic Analytics Stack (SANSA)
Open Source Algorithms for Distributed Data Processing
for Large-scale RDF Knowledge Graphs

What is SANSA?

SANSA is a big data engine for scalable processing of large-scale RDF data. It uses Spark and Flink, which offer fault-tolerant, highly available and scalable approaches for processing massive datasets efficiently. SANSA provides facilities for semantic data representation, querying, inference, and analytics.

The core of the SANSA-Stack is a data flow processing engine that provides data distribution and fault tolerance for distributed computations over large-scale RDF datasets.

SANSA includes several libraries for creating applications:

  1. Read/Write RDF/OWL library: for RDF/OWL operations,
  2. Querying library: supports a query language on top of the distributed RDF/OWL library,
  3. Inference library: implements rule-based reasoning on RDF/OWL data,
  4. ML library: the machine learning core library.
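
To make the division of labour between the first three layers concrete, here is a minimal single-machine sketch in plain Python. It is not the SANSA API and does not use Spark or Flink; the function names (`parse_triples`, `match`, `infer`), the simplified triple syntax with prefixed names, and the example data are all assumptions made for illustration. It shows reading triples, answering a triple-pattern query, and applying two RDFS-style rules (subclass transitivity and type propagation) until no new triples appear.

```python
# Toy illustration only: hypothetical helper names, not part of SANSA.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

EXAMPLE = """
ex:Student rdfs:subClassOf ex:Person .
ex:Person rdfs:subClassOf ex:Agent .
ex:alice rdf:type ex:Student .
"""


def parse_triples(text):
    """Read layer: parse a simplified 'subject predicate object .' line format."""
    triples = set()
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        s, p, o = line.rstrip(" .").split(None, 2)
        triples.add((s, p, o))
    return triples


def match(triples, s=None, p=None, o=None):
    """Query layer: return triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def infer(triples):
    """Inference layer: forward-chain two RDFS-style rules to a fixpoint."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        sub = [(s, o) for s, p, o in inferred if p == SUBCLASS_OF]
        for a, b in sub:
            # Rule 1: A subClassOf B and B subClassOf D  =>  A subClassOf D
            for c, d in sub:
                if b == c:
                    new.add((a, SUBCLASS_OF, d))
            # Rule 2: x type A and A subClassOf B  =>  x type B
            for s, p, o in inferred:
                if p == RDF_TYPE and o == a:
                    new.add((s, RDF_TYPE, b))
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred


if __name__ == "__main__":
    closed = infer(parse_triples(EXAMPLE))
    for triple in match(closed, s="ex:alice", p=RDF_TYPE):
        print(triple)
```

In a distributed setting the same three steps would run over partitioned collections of triples rather than an in-memory set, which is where the data distribution and fault tolerance of the underlying engine come in.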

SANSA integrates easily with well-known open source systems for both data input and output (HDFS) and is built on top of Spark and Flink.


SANSA-Stack Architecture



What is the idea behind SANSA?

In SANSA, we combine distributed computing frameworks (specifically Spark and Flink) with the semantic technology stack.

The SANSA vision combines distributed analytics (left) and semantic technologies (right) into a scalable semantic analytics stack (top). The colours encode which parts of the two original stacks influence which parts of the SANSA stack. The main objective of SANSA is to investigate whether the characteristics of each technology stack (bottom) can be combined while retaining their respective advantages.


Why SANSA?

SANSA inherits the following advantages from the semantic technology
stack, machine learning research, and distributed computing.



Powerful Data Integration

Current analytics pipelines have to handle increasing data variety and complexity…



Expressive Modelling

The vast majority of machine learning algorithms have to rely on simple input…



Standards

The usage of W3C standards can generally reduce pre-processing time in those…



Measurable Benefits

A key driver for the success of machine learning is that its benefits are often directly…



Horizontal Scalability

Distributed in-memory computing can provide the horizontal scalability required…


Community

If you have questions related to the SANSA community, you can post them on various channels.



Supported By

University of Bonn
InfAI
Fraunhofer IAIS
Smart Data Analytics
SANSA is a research project of the Smart Data Analytics research group.