
Last Commit
Apr. 21, 2019
Dec. 30, 2015


Build and scale real-time applications as easily as writing a script


A fast stream-processing framework. Wallaroo makes it easy to react to data in real time. By eliminating infrastructure complexity, it makes going from prototype to production simpler than ever.

What is Wallaroo?

When we set out to build Wallaroo, we had several high-level goals in mind:

  • Create a dependable and resilient distributed computing framework
  • Take care of the complexities of distributed computing "plumbing," allowing developers to focus on their business logic
  • Provide high-performance & low-latency data processing
  • Be portable and deploy easily (i.e., run on-prem or any cloud)
  • Manage in-memory state for the application
  • Allow applications to scale as needed, even when they are live and up-and-running

You can learn more about Wallaroo from our "Hello Wallaroo!" blog post and the Wallaroo overview video.

What makes Wallaroo unique

Wallaroo is a little different from most stream processing tools. While most require the JVM, Wallaroo can be deployed as a separate binary, which means no more jar files. Wallaroo also isn't locked into using Kafka as a source; use any source you like. Application logic can be written in Python 2, Python 3, or Pony.

Getting Started

Wallaroo can be installed via Docker, via Vagrant, or (on Linux) via our handy Wallaroo Up command.

As easy as:

docker pull

Check out our installation options page to learn more.


Once you've installed Wallaroo, take a look at some of our examples. Our word_count and market spread examples in Python are great places to start.

This is a complete example application that receives lines of text and counts each word.

import string
import struct
import wallaroo

def application_setup(args):
    in_name, in_host, in_port = wallaroo.tcp_parse_input_addrs(args)[0]
    out_host, out_port = wallaroo.tcp_parse_output_addrs(args)[0]

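    # Read length-framed lines from TCP and decode them with decode_line.
    lines = wallaroo.source("Split and Count",
                        wallaroo.TCPSourceConfig(in_name, in_host, in_port,
                                                 decode_line))
    # Split each line into words, route by word, count, then encode to the sink.
    pipeline = (lines
        .to(split)
        .key_by(extract_word)
        .to(count_word)
        .to_sink(wallaroo.TCPSinkConfig(out_host, out_port,
                                        encode_word_count)))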

    return wallaroo.build_application("Word Count Application", pipeline)

@wallaroo.computation_multi(name="split into words")
def split(data):
    punctuation = " !\"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~"

    words = []

    for line in data.split("\n"):
        clean_line = line.lower().strip(punctuation)
        for word in clean_line.split(" "):
            clean_word = word.strip(punctuation)
            words.append(clean_word)

    return words

class WordTotal(object):
    count = 0

@wallaroo.state_computation(name="count word", state=WordTotal)
def count_word(word, word_total):
    word_total.count = word_total.count + 1
    return WordCount(word, word_total.count)

class WordCount(object):
    def __init__(self, word, count):
        self.word = word
        self.count = count

@wallaroo.key_extractor
def extract_word(word):
    return word

@wallaroo.decoder(header_length=4, length_fmt=">I")
def decode_line(bs):
    return bs.decode("utf-8")

@wallaroo.encoder
def encode_word_count(word_count):
    output = word_count.word + " => " + str(word_count.count) + "\n"
    return output.encode("utf-8")
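
To get a feel for what the application above does to its input, the same steps can be sketched in plain Python without a running Wallaroo cluster. This is only an illustration, not part of the Wallaroo API: the names frame, unframe, split_words and run_locally are hypothetical helpers, a plain dictionary stands in for the per-word WordTotal state, and the check that skips empty tokens is an addition that the example above does not have. The framing follows the decoder's header_length=4 and length_fmt=">I" settings, i.e. a 4-byte big-endian length prefix before each message.

```python
import struct
from collections import defaultdict

PUNCTUATION = " !\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"


def frame(line):
    """Prefix a UTF-8 payload with a 4-byte big-endian length (">I")."""
    payload = line.encode("utf-8")
    return struct.pack(">I", len(payload)) + payload


def unframe(stream):
    """Yield each payload from a byte string of length-prefixed frames."""
    offset = 0
    while offset < len(stream):
        (length,) = struct.unpack_from(">I", stream, offset)
        offset += 4
        yield stream[offset:offset + length]
        offset += length


def split_words(data):
    """Same cleaning steps as the split computation, minus the decorator."""
    words = []
    for line in data.split("\n"):
        clean_line = line.lower().strip(PUNCTUATION)
        for word in clean_line.split(" "):
            clean_word = word.strip(PUNCTUATION)
            if clean_word:  # hypothetical addition: skip empty tokens
                words.append(clean_word)
    return words


def run_locally(stream):
    """Decode, split, count per word, and encode, all in one process."""
    totals = defaultdict(int)  # stands in for the per-key WordTotal state
    outputs = []
    for payload in unframe(stream):
        for word in split_words(payload.decode("utf-8")):
            totals[word] += 1
            line = word + " => " + str(totals[word]) + "\n"
            outputs.append(line.encode("utf-8"))
    return outputs


if __name__ == "__main__":
    stream = frame("Hello, world!") + frame("hello again")
    for out in run_locally(stream):
        print(out.decode("utf-8"), end="")
```

The frame helper is also a convenient way to build test input for the TCP source, since the decoder expects every message to carry that length header. Unlike this single-process sketch, Wallaroo routes each word to its own state by key, so counts can be kept across workers.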


Are you the sort who just wants to get going? Dive right into our documentation then! It will get you up and running with Wallaroo.

More information is also on our blog. There you can find more insight into what we are working on and industry use-cases.

Wallaroo currently exists as a mono-repo. All of Wallaroo's source code is located in this repo. See application structure for more information.

Need Help?

Trying to figure out how to get started?


We welcome contributions. Please see our Contribution Guide.

For your pull request to be accepted, you will need to accept our Contributor License Agreement.


Wallaroo is licensed under the Apache version 2 license.

Latest Releases
 Dec. 31 2018
 Nov. 30 2018
 Oct. 31 2018
 Sep. 28 2018
 Aug. 24 2018