Last Commit: Mar. 22, 2019
Created: Sep. 9, 2015

What is StreamSets Data Collector?

StreamSets Data Collector is an enterprise-grade, open source infrastructure for continuous big data ingestion. Its advanced, easy-to-use user interface lets data scientists, developers and data infrastructure teams build data pipelines in a fraction of the time that complex ingest scenarios typically require. Out of the box, StreamSets Data Collector reads from and writes to a large number of endpoints, including S3, JDBC, Hadoop, Kafka, Cassandra and many others. In addition to a large number of pre-built stages, you can use Python, JavaScript and Java Expression Language to transform and process data on the fly. For fault tolerance and scale-out, you can set up data pipelines in cluster mode and perform fine-grained monitoring at every stage of the pipeline.
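
As a rough illustration of what a scripted transformation stage does with each batch, the sketch below uses plain Python stand-ins. `Record`, `Sink` and `process_batch` are hypothetical names invented for this example and are not the StreamSets API; in a real pipeline the stage would receive its records and output handles from Data Collector. The idea shown is only the general pattern: transform each record, pass good records downstream, and route failures to an error stream.

```python
# Conceptual sketch only: plain Python stand-ins, NOT the StreamSets API.

class Record:
    """Hypothetical stand-in for a pipeline record holding a dict of fields."""
    def __init__(self, value):
        self.value = value


class Sink:
    """Hypothetical stand-in that collects whatever is written to it."""
    def __init__(self):
        self.items = []

    def write(self, record, message=None):
        self.items.append((record, message))


def process_batch(records, output, error):
    """Normalize the 'name' field; send records that fail to the error sink."""
    for record in records:
        try:
            record.value["name"] = record.value["name"].strip().lower()
            output.write(record)
        except (KeyError, AttributeError) as exc:
            # Missing field or non-string value: keep the record, note why it failed.
            error.write(record, repr(exc))


batch = [Record({"name": "  Alice "}), Record({"id": 7}), Record({"name": None})]
output, error = Sink(), Sink()
process_batch(batch, output, error)

good_names = [rec.value["name"] for rec, _ in output.items]
print(good_names)        # ['alice']
print(len(error.items))  # 2
```

Keeping failed records in a separate stream, rather than dropping them, is what makes per-stage monitoring of the kind described above possible.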

To learn more, check out http://streamsets.com

Building StreamSets Data Collector

To build StreamSets Data Collector from source code, see the project's build instructions.

License

StreamSets Data Collector is built on open source technologies, and our code is licensed under the Apache License 2.0.

Getting Help

A good place to start is http://streamsets.com/community. That page lists all the ways you can reach us and the channels our team monitors. You can post questions on the sdc-user Google Group or on StackExchange using the tag #StreamSets. Report bugs at http://issues.streamsets.com, or tweet at us with #StreamSets.

If you need help with production systems, you can check out the variety of support options offered on our support page.

Contributing Code

We welcome contributors. Please check out our guidelines to get started.

Changelog

See the latest changelog

Latest Releases

3.8.0-RC3 (Mar. 12, 2019)
3.8.0-RC2 (Mar. 8, 2019)
3.8.0-RC1 (Mar. 5, 2019)
3.6.2-RC1 (Feb. 8, 2019)
3.7.2-RC1 (Feb. 5, 2019)