

Knowledge Browser

  • A UI dashboard built on top of Spark for browsing knowledge (a.k.a. data).
  • Queries Spark in real time and visualises the result as a graph.
  • Supports SQL query syntax.
  • This is a sample application that shows how to build an analytics dashboard on top of data processed by Spark. You can customise it to suit your needs.



  • When you run this project, the dashboard page will look something like the one shown above.
  • The user types a query and hits submit.
  • On submit, Spark processes the query and returns the data as JSON.
  • The JSON result is rendered as a graph using D3 and AngularJS.
  • The demo above illustrates a simple country profile search, CountryCode IN ("USA", "IND", "WLD"), and shows how the information for the three countries is displayed as a graph. Note that any relationship shared by two countries is linked via a common node.
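
The "common node" behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Row`, `Node`, `Link`, `Graph`, and `GraphSketch` names are hypothetical, and it assumes each row carries a country code, a series code, and a value. Two countries end up connected through one node whenever they share the same series/value pair.

```scala
// Hypothetical types for illustration; the project's real model may differ.
final case class Row(countryCode: String, seriesCode: String, value: String)
final case class Node(id: String)
final case class Link(source: String, target: String)
final case class Graph(nodes: Set[Node], links: Set[Link])

object GraphSketch {
  // Each country becomes a node, and each distinct (series, value) pair
  // becomes a node. Countries that share a pair link to the same node.
  def build(rows: Seq[Row]): Graph = {
    val links = rows.map(r => Link(r.countryCode, s"${r.seriesCode}=${r.value}")).toSet
    val nodes = links.flatMap(l => Set(Node(l.source), Node(l.target)))
    Graph(nodes, links)
  }
}
```

A structure like this maps naturally onto the nodes/links JSON shape that D3 force layouts commonly consume.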

Technology Stack

Running this project

What to Query?

I've used open-data country profile information as the knowledge base in this project.

Sample Data:

The following table shows some sample rows to give an idea of the columns and schema of the knowledge-base data: (screenshot of sample rows)

Sample Queries to run (supports sql syntax):

Query USA profile:

  • CountryCode = 'USA'

Query all countries whose country code starts with the letter 'I':

  • CountryCode LIKE 'I%'

Query the profile info for India, USA and World:

  • CountryCode IN ('USA', 'WLD', 'IND')

Query total population, mortality rate and population growth for India, USA and World:

  • CountryCode IN ('USA', 'WLD', 'IND') AND SeriesCode IN ('SP.POP.TOTL', 'SH.DYN.MORT', 'SP.POP.GROW')
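
Each sample above is a SQL filter expression rather than a full statement. One plausible way such an expression could be applied with Spark is sketched below. This is an assumption about the approach, not the project's confirmed implementation: `QuerySketch` and `runFilter` are hypothetical names, and the CSV is assumed to have a header row.

```scala
import org.apache.spark.sql.SparkSession

object QuerySketch {
  // Hypothetical helper: loads the CSV, applies the user's filter expression,
  // and returns each matching row as a JSON string.
  def runFilter(spark: SparkSession, csvPath: String, filter: String): Array[String] =
    spark.read
      .option("header", "true") // assumes the first line holds column names
      .csv(csvPath)
      .where(filter)            // e.g. "CountryCode IN ('USA', 'IND')"
      .toJSON
      .collect()
}
```

Passing raw user input to `where` is acceptable for a demo like this one, but a production dashboard would likely want to validate or restrict the expression first.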

cmd-line args:

Optionally, you can provide configuration parameters such as the host and port on the command line. To see the list of configurable parameters, run:

    $ spark-submit <path-to-graph-knowledge-browser.jar> --help

Help content will look something like this:

    Apart from Spark, this application uses akka-http from browser integration.
    So, it needs config params like AkkaWebPort to bind to, SparkMaster
    and SparkAppName

    Usage: spark-submit graph-knowledge-browser.jar [options]
      -h, --help
      -m, --master <master_url>                    spark://host:port, mesos://host:port, yarn, or local. Default: $sparkMasterDef
      -n, --name <name>                            A name of your application. Default: $sparkAppNameDef
      -p, --akkaHttpPort <portnumber>              Port where akka-http is binded. Default: $akkaHttpPortDef

    Configured one route:
    1. http://host:port/index.html - takes user to knowledge browser page
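
For illustration, the three options listed in the help text could be parsed along these lines. This is a hand-rolled sketch, not the project's parser: `AppConfig`, `ArgsSketch`, and the default values are placeholders, since the real defaults are not shown in the help output.

```scala
import scala.util.Try

// Placeholder defaults; the project's actual defaults are not shown in the help text.
final case class AppConfig(
  master: String = "local[*]",
  name: String = "knowledge-browser",
  akkaHttpPort: Int = 8080
)

object ArgsSketch {
  // Walks the argument list, consuming one flag and its value at a time.
  // Returns Left with a message on an unknown flag, a missing value, or a bad port.
  def parse(args: List[String], cfg: AppConfig = AppConfig()): Either[String, AppConfig] =
    args match {
      case Nil => Right(cfg)
      case ("-m" | "--master") :: v :: rest => parse(rest, cfg.copy(master = v))
      case ("-n" | "--name") :: v :: rest => parse(rest, cfg.copy(name = v))
      case ("-p" | "--akkaHttpPort") :: v :: rest =>
        Try(v.toInt).toOption match {
          case Some(port) => parse(rest, cfg.copy(akkaHttpPort = port))
          case None       => Left(s"Invalid port: $v")
        }
      case other :: _ => Left(s"Unknown or incomplete option: $other")
    }
}
```

For example, `ArgsSketch.parse(List("-p", "9090"))` yields a config with the port overridden and the other fields left at their placeholder defaults.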

Structure of the project:

  • src/main/scala/com/spoddutur/MainApp.scala: The main class, where application execution begins.
  • data/countriesProfile.csv: Sample data used for querying.
  • src/main/resources/application.conf: Tweak the command-line argument defaults directly here before building the jar and running spark-submit.
  • src/main/scala/com/spoddutur/web/WebServer.scala: Starts the akka-http web server at the configured host and port, and registers the routes (index.html).
  • src/main/scala/com/spoddutur/web/Router.scala: This is where more routes, apart from index.html, can be created and registered.

D3 references: