Author
Last Commit: Dec. 15, 2018
Created: Dec. 30, 2015

Build and scale real-time applications as easily as writing a script


A fast stream-processing framework. Wallaroo makes it easy to react to data in real time. By eliminating infrastructure complexity, it simplifies the move from prototype to production.

What is Wallaroo?

When we set out to build Wallaroo, we had several high-level goals in mind:

  • Create a dependable and resilient distributed computing framework
  • Take care of the complexities of distributed computing "plumbing," allowing developers to focus on their business logic
  • Provide high-performance & low-latency data processing
  • Be portable and deploy easily (i.e., run on-premises or in any cloud)
  • Manage in-memory state for the application
  • Allow applications to scale as needed, even while they are running live

You can learn more about Wallaroo from our "Hello Wallaroo!" blog post and the Wallaroo overview video.

What makes Wallaroo unique

Wallaroo is a little different from most stream processing tools. While most require the JVM, Wallaroo can be deployed as a separate binary, so there are no jar files to manage. Wallaroo also isn't locked to Kafka as a source; you can use any source you like. Application logic can be written in Python 2, Python 3, or Pony.

Getting Started

Wallaroo can be installed via Docker, Vagrant, or (on Linux) our handy Wallaroo Up command.

As easy as:

docker pull wallaroo-labs-docker-wallaroolabs.bintray.io/release/wallaroo:latest

Check out our installation options page to learn more.

Usage

Once you've installed Wallaroo, take a look at some of our examples. Great places to start are our word_count and market spread examples in Python.

"""
This is a complete example application that receives lines of text and counts each word.
"""
import wallaroo

def application_setup(args):
    in_host, in_port = wallaroo.tcp_parse_input_addrs(args)[0]
    out_host, out_port = wallaroo.tcp_parse_output_addrs(args)[0]

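    # Source: length-framed lines of text arriving over TCP.
    lines = wallaroo.source("Split and Count",
                            wallaroo.TCPSourceConfig(in_host, in_port,
                                                     decode_line))
    # Pipeline: split lines into words, partition by word, count, then emit.
    pipeline = (lines
        .to(split)
        .key_by(extract_word)
        .to(count_word)
        .to_sink(wallaroo.TCPSinkConfig(out_host, out_port,
                                        encode_word_count)))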

    return wallaroo.build_application("Word Count Application", pipeline)

@wallaroo.computation_multi(name="split into words")
def split(data):
    # Characters stripped from the edges of each line and each word.
    punctuation = " !\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"

    words = []

    for line in data.split("\n"):
        clean_line = line.lower().strip(punctuation)
        for word in clean_line.split(" "):
            clean_word = word.strip(punctuation)
            # Skip empty strings left by repeated spaces or blank lines.
            if clean_word:
                words.append(clean_word)

    return words

class WordTotal(object):
    count = 0

@wallaroo.state_computation(name="count word", state=WordTotal)
def count_word(word, word_total):
    word_total.count = word_total.count + 1
    return WordCount(word, word_total.count)

class WordCount(object):
    def __init__(self, word, count):
        self.word = word
        self.count = count

@wallaroo.key_extractor
def extract_word(word):
    return word

@wallaroo.decoder(header_length=4, length_fmt=">I")
def decode_line(bs):
    return bs.decode("utf-8")

@wallaroo.encoder
def encode_word_count(word_count):
    output = word_count.word + " => " + str(word_count.count) + "\n"
    return output.encode("utf-8")
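
To get a feel for what the word count application computes before running it on Wallaroo, here is a minimal local sketch in plain Python. It does not import or call Wallaroo. It assumes, as the header_length=4 and length_fmt=">I" decoder arguments suggest, that each incoming message is prefixed with a 4-byte big-endian length. The helper names frame, unframe, and run_locally are hypothetical and exist only for this illustration, and the dictionary stands in for the per-key WordTotal state that Wallaroo would manage for you.

```python
import struct
from collections import defaultdict

PUNCTUATION = " !\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"


def frame(text):
    """Prefix a UTF-8 payload with a 4-byte big-endian length (">I")."""
    payload = text.encode("utf-8")
    return struct.pack(">I", len(payload)) + payload


def unframe(buf):
    """Split a byte buffer into payloads, reading one length header at a time."""
    payloads = []
    offset = 0
    while offset < len(buf):
        (length,) = struct.unpack_from(">I", buf, offset)
        offset += 4
        payloads.append(buf[offset:offset + length])
        offset += length
    return payloads


def split(data):
    """Word-splitting rule from the example; empty strings are dropped."""
    words = []
    for line in data.split("\n"):
        for word in line.lower().strip(PUNCTUATION).split(" "):
            clean_word = word.strip(PUNCTUATION)
            if clean_word:
                words.append(clean_word)
    return words


def run_locally(buf):
    """Emit one output record per word, carrying that word's running count."""
    totals = defaultdict(int)  # stands in for one WordTotal per key
    outputs = []
    for payload in unframe(buf):
        for word in split(payload.decode("utf-8")):
            totals[word] += 1
            outputs.append(word + " => " + str(totals[word]) + "\n")
    return outputs


if __name__ == "__main__":
    stream = frame("Hello world, hello Wallaroo!") + frame("World again")
    for record in run_locally(stream):
        print(record, end="")
```

The sketch processes everything sequentially in one process, so records come out in input order. In a real deployment the words are partitioned by key across workers, so the relative order of records for different words may differ.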

Documentation

Are you the sort who just wants to get going? Dive right into our documentation then! It will get you up and running with Wallaroo.

More information is also available on our blog. There you can find more insight into what we are working on and industry use cases.

Wallaroo currently exists as a mono-repo. All of Wallaroo's source is located in this repo. See application structure for more information.

Need Help?

Trying to figure out how to get started?

Contributing

We welcome contributions. Please see our Contribution Guide.

For your pull request to be accepted, you will need to accept our Contributor License Agreement.

License

Wallaroo is licensed under the Apache License, Version 2.0.

Latest Releases

  • 0.6.0 (Nov. 30, 2018)
  • 0.5.4 (Oct. 31, 2018)
  • 0.5.3 (Sep. 28, 2018)
  • 0.5.2 (Aug. 24, 2018)
  • 0.5.1 (Aug. 1, 2018)