SANSA contains the implementation of a partitioning algorithm for RDF graphs given as N-Triples. The algorithm uses the structure of the underlying undirected graph to partition the nodes into different clusters. SANSA’s clustering procedure follows a standard algorithm for partitioning undirected graphs, aimed at maximizing a modularity function, which was first introduced by Newman.
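To make the objective concrete, here is a hedged, self-contained sketch (not SANSA's actual implementation) of Newman's modularity Q = (1/2m) Σ_ij (A_ij − k_i k_j / 2m) δ(c_i, c_j) for a given partition of an undirected graph; the object and function names are illustrative only:

```scala
// Hypothetical sketch of Newman's modularity for an undirected graph.
// edges: undirected edge list; community: node -> cluster id.
// Not SANSA code -- a naive O(n^2 * |E|) illustration of the quantity
// the clustering algorithm tries to maximize.
object ModularitySketch {
  def modularity(edges: Seq[(String, String)], community: Map[String, Int]): Double = {
    val m = edges.size.toDouble // number of undirected edges
    // degree of each node (each edge contributes to both endpoints)
    val degree = edges
      .flatMap { case (u, v) => Seq(u, v) }
      .groupBy(identity)
      .map { case (node, occ) => node -> occ.size.toDouble }
    val nodes = degree.keys.toSeq
    var q = 0.0
    // sum A_ij - k_i*k_j/(2m) over ordered node pairs in the same community
    for (i <- nodes; j <- nodes if community(i) == community(j)) {
      val aij = edges.count { case (u, v) =>
        (u == i && v == j) || (u == j && v == i)
      }.toDouble
      q += aij - degree(i) * degree(j) / (2 * m)
    }
    q / (2 * m)
  }
}
```

For example, a graph of two disconnected edges split into two matching clusters scores Q = 0.5, the best possible partition for that graph; putting all four nodes in one cluster scores Q = 0.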
You will need your RDF graph in the form of a text file, with each line containing exactly one triple of the graph. You then specify the number of iterations and supply a file path where the resulting clusters should be saved.
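To illustrate the expected "one triple per line" layout, the following sketch splits a single N-Triples statement into its subject, predicate, and object. This is an assumption-laden toy (a whitespace split is not a real N-Triples parser; literals containing spaces would break it, and SANSA's own RDF layer does the actual parsing):

```scala
// Hypothetical helper, for illustration only: split one N-Triples line
// "<s> <p> <o> ." into its three components. Naive whitespace split --
// NOT a conforming N-Triples parser.
object TripleLineSketch {
  def parseLine(line: String): (String, String, String) = {
    // drop the trailing " ." terminator, then split into at most 3 tokens
    val parts = line.trim.stripSuffix(".").trim.split("\\s+", 3)
    (parts(0), parts(1), parts(2))
  }
}
```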
```scala
import net.sansa_stack.ml.spark.clustering.RDFByModularityClustering

val numIterations = 100
val input = "path_to_your_RDFgraph.txt"
val output = "path_name_for_clusters.txt"

RDFByModularityClustering(spark.sparkContext, numIterations, input, output)
```
Full example code: https://github.com/SANSA-Stack/SANSA-Examples/blob/master/sansa-examples-spark/src/main/scala/net/sansa_stack/examples/spark/ml/clustering/RDFByModularityClustering.scala
```scala
import net.sansa_stack.ml.flink.clustering.RDFByModularityClustering
import org.apache.flink.api.scala.ExecutionEnvironment

val numIterations = 100
val input = "path_to_your_RDFgraph.txt"
val output = "path_name_for_clusters.txt"

val env = ExecutionEnvironment.getExecutionEnvironment

RDFByModularityClustering(env, numIterations, input, output)
```
Full example code: https://github.com/SANSA-Stack/SANSA-Examples/blob/master/sansa-examples-flink/src/main/scala/net/sansa_stack/examples/flink/ml/clustering/RDFByModularityClustering.scala