# SparkTDA

A scalable topological data analysis (TDA) package for Apache Spark. This project aims to implement the following features:

- Scalable Mapper Implemented as Reeb Diagrams, i.e., Reeb Cosheaves
- Scalable Mapper Implementation
- Scalable Multiscale Mapper Implementation
- Scalable Tower Computation for Multiscale Mapper
- Scalable Persistent Homology Computation on Top of Apache Spark

If you would like to know how to use these features or learn more about their implementation details, please follow the links.

# Status

**WIP** and **EXPERIMENTAL**. This package is still a proof of concept of scalable topological data analysis support for
Apache Spark, so it should not be considered ready for production use.

# Examples

### Mapper
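
As a rough illustration of the Mapper construction itself, here is a minimal single-machine sketch in plain Scala. It does not use SparkTDA's API or Spark at all: the names `MapperSketch`, `cover`, `cluster`, and `mapper`, the choice of single-linkage clustering, and every parameter value are assumptions made for this example only.

```scala
import scala.annotation.tailrec

// Hypothetical, self-contained sketch of Mapper; not SparkTDA's implementation.
object MapperSketch {
  type Point = (Double, Double)

  def dist(a: Point, b: Point): Double = math.hypot(a._1 - b._1, a._2 - b._2)

  // Cover [lo, hi] with n equal-length intervals; neighbours overlap by the given fraction.
  def cover(lo: Double, hi: Double, n: Int, overlap: Double): Seq[(Double, Double)] = {
    require(n >= 1 && overlap >= 0.0 && overlap < 1.0)
    val len = (hi - lo) / (n - (n - 1) * overlap)
    val step = len * (1 - overlap)
    // The last interval is closed at hi explicitly to avoid rounding gaps.
    (0 until n).map(i => (lo + i * step, if (i == n - 1) hi else lo + i * step + len))
  }

  // Single-linkage clustering: connected components of the "closer than eps" graph.
  def cluster(ids: Seq[Int], pts: IndexedSeq[Point], eps: Double): Seq[Set[Int]] = {
    @tailrec
    def grow(comp: Set[Int], frontier: Set[Int], rest: Set[Int]): (Set[Int], Set[Int]) =
      if (frontier.isEmpty) (comp, rest)
      else {
        val next = rest.filter(j => frontier.exists(i => dist(pts(i), pts(j)) <= eps))
        grow(comp ++ next, next, rest -- next)
      }

    @tailrec
    def loop(rest: Set[Int], acc: List[Set[Int]]): List[Set[Int]] =
      if (rest.isEmpty) acc.reverse
      else {
        val seed = rest.head
        val (comp, remaining) = grow(Set(seed), Set(seed), rest - seed)
        loop(remaining, comp :: acc)
      }

    loop(ids.toSet, Nil)
  }

  // Returns the Mapper nodes (clusters of point indices) and the edges between
  // nodes that share at least one point.
  def mapper(pts: IndexedSeq[Point], filter: Point => Double,
             n: Int, overlap: Double, eps: Double): (IndexedSeq[Set[Int]], Set[(Int, Int)]) = {
    val values = pts.map(filter)
    val nodes = cover(values.min, values.max, n, overlap).flatMap { case (a, b) =>
      // Preimage of the interval under the filter, then cluster it.
      val pre = pts.indices.filter(i => values(i) >= a && values(i) <= b)
      cluster(pre, pts, eps)
    }.toIndexedSeq
    val edges = for {
      i <- nodes.indices
      j <- nodes.indices
      if i < j && (nodes(i) & nodes(j)).nonEmpty
    } yield (i, j)
    (nodes, edges.toSet)
  }

  // Twelve points on the unit circle, with the x-coordinate as the filter.
  val circle: IndexedSeq[Point] = (0 until 12).map { k =>
    val t = math.toRadians(30.0 * k + 15.0)
    (math.cos(t), math.sin(t))
  }

  def main(args: Array[String]): Unit = {
    val (nodes, edges) = mapper(circle, _._1, n = 3, overlap = 0.5, eps = 0.6)
    println(s"nodes: $nodes")
    println(s"edges: $edges")
  }
}
```

With these settings the sketch should yield four nodes joined in a single cycle, which reflects the loop in the sampled circle. A distributed version would need to replace the in-memory clustering and the quadratic edge search.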

# Requirements

This library requires Spark 2.0+.

# Building and Running Unit Tests

To compile this project, run `sbt package` from the project home directory. This will also run the Scala unit tests.
To run the unit tests, run `sbt test` from the project home directory.

This project uses the sbt-spark-package plugin, which provides the `spPublish` and `spPublishLocal` tasks.
We recommend using this library with Apache Spark by supplying a comma-delimited list of Maven coordinates with `--packages`,
which downloads the package, including its dependencies, from the local repository or the official Spark Packages repository.

The package can be published locally with:

`$ sbt spPublishLocal`

The package can be published to Spark Packages with (requires authentication and authorization):

`$ sbt spPublish`

# Using with Spark Shell

This package can be added to Spark using the `--packages` command line option. For example, to include it when starting
the Spark shell:

`$ spark-shell --packages ognis1205:spark-tda:0.0.1-SNAPSHOT-spark2.2-s_2.11`

# Future Work

### Mapper

- Write Wiki
- Implement Python APIs
- Publish to Spark Packages
- Benchmark
- Consider using GraphFrames instead of plain GraphX
- Implement some useful filter functions, e.g., Gaussian density, graph Laplacian, etc., as transformers
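
To make the last item concrete, a Gaussian density filter assigns each point the sum of Gaussian kernel weights to every point in the data set. The following is a minimal sketch in plain Scala rather than a Spark ML transformer; `GaussianDensitySketch`, `gaussianDensity`, and the bandwidth parameter `sigma` are hypothetical names chosen for this example, and the estimate is left unnormalized.

```scala
// Hypothetical sketch of a Gaussian density filter; not part of SparkTDA.
object GaussianDensitySketch {
  // Squared Euclidean distance between two points of equal dimension.
  def sqDist(a: Seq[Double], b: Seq[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Unnormalized kernel density at each point:
  //   f(x_i) = sum_j exp(-||x_i - x_j||^2 / (2 * sigma^2))
  // This compares all pairs, so it is quadratic in the number of points.
  def gaussianDensity(pts: Seq[Seq[Double]], sigma: Double): Seq[Double] = {
    require(sigma > 0.0, "sigma must be positive")
    pts.map(p => pts.map(q => math.exp(-sqDist(p, q) / (2 * sigma * sigma))).sum)
  }

  def main(args: Array[String]): Unit = {
    // Three nearby points and one far-away point.
    val pts = Seq(Seq(0.0), Seq(0.1), Seq(0.2), Seq(5.0))
    gaussianDensity(pts, sigma = 1.0).foreach(println)
  }
}
```

Points in dense regions receive larger values than isolated ones, so the resulting values can serve as a filter for Mapper. A scalable version would have to avoid the all-pairs comparison used here.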

