Getting Started with SANSA-Stack
This document summarizes the instructions that help first-time users get and use SANSA-Stack.
Set up SANSA
To get started quickly, SANSA provides project templates for the following build tools: Maven and SBT.
Maven
- Use this Maven template to generate a SANSA project using Apache Spark:

  git clone https://github.com/SANSA-Stack/SANSA-Template-Maven-Spark.git
  cd SANSA-Template-Maven-Spark
  mvn clean package

  The subsequent steps depend on your IDE. Generally, just import this repository as a Maven project and start using SANSA / Spark.
- Use this Maven template to generate a SANSA project using Apache Flink:

  git clone https://github.com/SANSA-Stack/SANSA-Template-Maven-Flink.git
  cd SANSA-Template-Maven-Flink
  mvn clean package

  The subsequent steps depend on your IDE. Generally, just import this repository as a Maven project and start using SANSA / Flink.
SBT
- Use this SBT template to generate a SANSA project using Apache Spark:

  git clone https://github.com/SANSA-Stack/SANSA-Template-SBT-Spark.git
  cd SANSA-Template-SBT-Spark
  sbt clean package

  The subsequent steps depend on your IDE. Generally, just import this repository as an SBT project and start using SANSA / Spark.
- Use this SBT template to generate a SANSA project using Apache Flink:

  git clone https://github.com/SANSA-Stack/SANSA-Template-SBT-Flink.git
  cd SANSA-Template-SBT-Flink
  sbt clean package

  The subsequent steps depend on your IDE. Generally, just import this repository as an SBT project and start using SANSA / Flink.
These templates help you set up the project structure and create the initial build files. Enjoy! 🙂
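Once a Spark template project builds, a small smoke test helps verify that SANSA and Spark work together. The following sketch is not part of the templates: the object name and the data.nt path are placeholders, and it assumes the SANSA RDF layer's implicit I/O API (spark.rdf), which the Spark templates pull in.

import net.sansa_stack.rdf.spark.io._   // adds the rdf(...) reader to SparkSession
import org.apache.jena.riot.Lang
import org.apache.spark.sql.SparkSession

object SansaSmokeTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SANSA smoke test")
      .master("local[*]")   // local mode for a quick check; drop when submitting to a cluster
      .getOrCreate()

    // "data.nt" is a placeholder — point it at any N-Triples file
    val triples = spark.rdf(Lang.NTRIPLES)("data.nt")
    println(s"Parsed ${triples.count()} triples")

    spark.stop()
  }
}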
IDE Setup
Eclipse:

- Make sure that you have Java 8 or higher installed.
- Install the Eclipse m2e Maven plugin for Maven support, “m2e-egit” for Git support (if not already installed), and m2eclipse-scala (if not already installed).
- Go to File → New Project → “Checkout Maven Projects from SCM”.
- Set the SCM URL type to “git” and enter the URL of your repository (e.g. for https://github.com/SANSA-Stack/SANSA-RDF it is https://github.com/SANSA-Stack/SANSA-RDF.git).
- Click “OK” and wait a while.
IntelliJ IDEA:

- Go to File → New → Project from Version Control → GitHub.
- Log in to GitHub.
- Choose github.com/SANSA-Stack/SANSA-Query.git (for example) and clone it.
- When the “Non-managed pom file found” prompt appears in the lower right, choose “Add as Maven project”.
- Be patient while the IDE is “Resolving dependencies” (shown in the status bar).
- Done.
For developers using SANSA:
Eclipse:

- To generate Eclipse project files from the sbt project, install the sbteclipse plugin (see the snippet after this list) and run “sbt eclipse” in the root of the project.
- Once you have installed the plugin and generated the Eclipse project files, start Eclipse.
- Go to File → Import → General/Existing Projects into Workspace.
- Select the directory containing your project as the root directory (e.g. a checkout of https://github.com/SANSA-Stack/SANSA-Template-SBT-Spark), select the project, and hit Finish.
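A minimal project/plugins.sbt entry for sbteclipse looks like the following; the version shown is an example, so check the plugin's releases for the latest one:

// project/plugins.sbt — enables the "sbt eclipse" task
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "5.2.4")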
IntelliJ IDEA:

- Go to File → New → Project from Existing Sources.
- Select the project you want to import (e.g. a checkout of https://github.com/SANSA-Stack/SANSA-Template-SBT-Spark) and click OK.
- Select the “Import project from external model” option, choose “SBT project” from the list, and click Next.
- Select the SBT options and click Finish.
SANSA-Notebooks
Interactive Spark notebooks can run the SANSA examples and are easy to deploy with docker-compose. The deployment stack includes Hadoop for HDFS, Spark for running the SANSA examples, and Hue for navigating and copying files to HDFS. The notebooks themselves are created and run using Apache Zeppelin.
Clone the SANSA-Notebooks git repository:
git clone https://github.com/SANSA-Stack/SANSA-Notebooks
cd SANSA-Notebooks
Get the SANSA Examples jar file (requires wget):

make
Start the cluster (this will download the BDE Docker images, which takes a while):

make up
When start-up is done, you will be able to access the following interfaces:

- http://localhost:8080/ (Spark Master)
- http://localhost:8088/home (Hue HDFS file browser)
- http://localhost/ (Zeppelin)

To load the data into your cluster, simply run:

make load-data
Open Zeppelin, choose any of the available notebooks, and try to execute it.
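A typical Zeppelin paragraph (Spark/Scala interpreter) follows the pattern below. This is only a sketch: the HDFS path is hypothetical, so adjust it to whatever make load-data actually uploaded (browsable in Hue).

// Zeppelin provides the SparkSession as "spark"
import net.sansa_stack.rdf.spark.io._
import org.apache.jena.riot.Lang

// hypothetical path — check the Hue file browser for the real location
val triples = spark.rdf(Lang.NTRIPLES)("hdfs://namenode:8020/data/rdf.nt")
triples.take(5).foreach(println)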
For more information, refer to the SANSA-Notebooks GitHub repository. If you have questions or find bugs, feel free to open an issue on GitHub.
Configuring the Computing Frameworks
SANSA Version | Spark Version | Flink Version | Scala Version
---|---|---|---
0.8.0 | 3.0.x | – | 2.12
0.7.1 | 2.4.x | – | 2.11
0.6.0 | 2.4.x | 1.8.x | 2.11
0.5.0 | 2.4.x | 1.7.x | 2.11
0.4.0 | 2.3.x | 1.5.x | 2.11
0.3.0 | 2.2.x | 1.4.x | 2.11
0.2.0 | 2.1.x | 1.3.x | 2.11
0.1.0 | 2.0.x | 1.1.x | 2.11
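For SBT users, the table translates into a build.sbt along these lines. This is a sketch: the patch-level versions (3.0.1, 2.12.12) are illustrative picks from the 3.0.x / 2.12 row, not pinned requirements.

// build.sbt — pairing SANSA 0.8.0 with Spark 3.0.x and Scala 2.12, per the table above
scalaVersion := "2.12.12"

libraryDependencies ++= Seq(
  // Spark is typically "provided" by the cluster at runtime
  "org.apache.spark" %% "spark-core" % "3.0.1" % "provided",
  "org.apache.spark" %% "spark-sql"  % "3.0.1" % "provided",
  // the full SANSA stack; see the next section for single-layer artifacts
  "net.sansa-stack"  %% "sansa-stack-spark" % "0.8.0"
)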
Using SANSA in Maven Projects
If you want to import the full SANSA Stack for Apache Spark, please add the following Maven dependency to your project POM file:
<!-- SANSA Stack -->
<dependency>
  <groupId>net.sansa-stack</groupId>
  <artifactId>sansa-stack-spark_2.12</artifactId>
  <version>$LATEST_RELEASE_VERSION$</version>
</dependency>
If you want to use only a particular layer of the stack, the Maven artifact name always follows the pattern “sansa-LAYER_NAME-spark_SCALA_VERSION”, i.e. the dependency in your POM file looks as follows:
<!-- SANSA $LAYER_NAME$ layer -->
<dependency>
  <groupId>net.sansa-stack</groupId>
  <artifactId>sansa-$LAYER_NAME$-spark_$SCALA_VERSION$</artifactId>
  <version>$LATEST_RELEASE_VERSION$</version>
</dependency>
For example, if you just want to use the latest RDF layer (version 0.8.0) with Scala 2.12 in your project, you have to add:
<!-- SANSA RDF layer -->
<dependency>
  <groupId>net.sansa-stack</groupId>
  <artifactId>sansa-rdf-spark_2.12</artifactId>
  <version>0.8.0</version>
</dependency>