Spark-Streaming-Twitch: live stream Twitch.tv data with Spark Streaming
Twitch-Streamer uses Twitch's Chat and IRC API to stream in messages from specified Twitch.tv channels. It is a lightweight wrapper over Spark Streaming and Twitch's chat IRC data feed. The goal of this project is to make full use of Spark Streaming's strengths so that others can analyze Twitch's live stream chatrooms.
Clients (yourself!) interact directly with the Scala classes, so these are the important ones.
The nitty gritty internals of the project were written in Java.
<dependency>
  <groupId>com.andrewgapic</groupId>
  <artifactId>spark-streaming-twitch</artifactId>
  <version>1.0.0</version>
</dependency>
libraryDependencies += "com.andrewgapic" %% "spark-streaming-twitch" % "1.0.0"
This project is built with Java 1.8, Scala 2.11.8, and Spark Streaming for Scala 2.11.
$ git clone https://github.com/agapic/twitch-streamer.git
$ cd twitch-streamer/
$ mvn clean install
Note: If you're interested in helping with developing the project further, there are lots of features and optimizations that could be done. For example, concurrency isn't as optimal as it could be; the entire stream is being fed through one thread. Other asynchronous calls could potentially also have their own thread.
Twitch-Streamer can be used with either Scala or Java, and was built with this notion in mind. Generally, Scala is a better fit since Spark was built in Scala; but, as always, use your favourite language.
Twitch-Streamer uses the Builder pattern to construct a new Receiver object; please refer to the scaladocs for a complete listing of mutator methods.
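For readers unfamiliar with the Builder pattern, here is a minimal, self-contained sketch in plain Java. It is not the library's TwitchStreamBuilder: the class StreamConfigSketch, its fields, and its default values are hypothetical, and only the setter names mirror the ones used in this README.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration of the Builder pattern; not part of spark-streaming-twitch.
final class StreamConfigSketch {
    final Set<String> games;
    final Set<String> channels;
    final String language;

    private StreamConfigSketch(Builder b) {
        // Defensive copies keep the built object immutable.
        this.games = Collections.unmodifiableSet(new LinkedHashSet<>(b.games));
        this.channels = Collections.unmodifiableSet(new LinkedHashSet<>(b.channels));
        this.language = b.language;
    }

    static final class Builder {
        // Defaults are assumptions made for this sketch only.
        private Set<String> games = Collections.emptySet();
        private Set<String> channels = Collections.emptySet();
        private String language = "en";

        // Each mutator returns the builder so that calls can be chained.
        Builder setGames(Set<String> games) { this.games = games; return this; }
        Builder setChannels(Set<String> channels) { this.channels = channels; return this; }
        Builder setLanguage(String language) { this.language = language; return this; }

        // build() copies the collected settings into the final object.
        StreamConfigSketch build() { return new StreamConfigSketch(this); }
    }
}
```

Any setter that is not called keeps its default, which is why a builder suits an object with many optional settings.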
- You can obtain the twitch_client_id by registering your application here: https://www.twitch.tv/settings/connections. Alternatively, you can follow the instructions here: https://blog.twitch.tv/client-id-required-for-kraken-api-calls-afbb8e95f843.
- twitch_username can be any string that hasn't joined the IRC chat already. Typically, I just use my actual username.
- twitch_password can be retrieved here: http://twitchapps.com/tmi/ after your application has been registered.
twitch_client_id <clientid>
twitch_username <username>
twitch_password <go here: http://twitchapps.com/tmi/>

It's an oauth: password, not your Twitch account password.
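As a sanity check before starting a stream, the three credentials above can be validated with a few lines of plain Java. This is only a sketch: it assumes one "<key> <value>" pair per line, as in the listing above, and the class CredentialsSketch and its methods are hypothetical. How the library itself reads these values may differ.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the library's own credential loading may differ.
final class CredentialsSketch {
    static final List<String> REQUIRED_KEYS =
        Arrays.asList("twitch_client_id", "twitch_username", "twitch_password");

    // Parses lines of the form "<key> <value>", skipping blank lines.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> creds = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split("\\s+", 2);
            if (parts.length < 2) {
                throw new IllegalArgumentException("Expected '<key> <value>' but got: " + line);
            }
            creds.put(parts[0], parts[1]);
        }
        return creds;
    }

    // True when every required key is present and the password is an OAuth token.
    static boolean isValid(Map<String, String> creds) {
        return creds.keySet().containsAll(REQUIRED_KEYS)
            && creds.get("twitch_password").startsWith("oauth:");
    }
}
```

The oauth: prefix check catches the most common mistake, which is pasting the Twitch account password instead of the generated token.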
Twitch-Streamer introduces an abstraction called a Message. It transforms a line of text from Twitch's IRC chat into a Message, which allows clients to get the author of the message, the channel name, and the actual message content.
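To make the transformation concrete, the sketch below shows one way a raw chat line could be split into author, channel, and content. It is not the library's Message class: ChatLineSketch and its field names are hypothetical. It assumes the plain Twitch IRC chat format ":<user>!<user>@<user>.tmi.twitch.tv PRIVMSG #<channel> :<text>", with no IRCv3 tags prefix and with the trailing line break already stripped.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the library's Message; field names are assumptions.
final class ChatLineSketch {
    // Captures the nick before '!', the channel after '#', and the text after " :".
    private static final Pattern PRIVMSG =
        Pattern.compile("^:([^!\\s]+)!\\S+ PRIVMSG #(\\S+) :(.*)$");

    final String author;
    final String channel;
    final String content;

    private ChatLineSketch(String author, String channel, String content) {
        this.author = author;
        this.channel = channel;
        this.content = content;
    }

    // Returns empty for anything that is not a chat message (PING, JOIN, ...).
    static Optional<ChatLineSketch> parse(String line) {
        Matcher m = PRIVMSG.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new ChatLineSketch(m.group(1), m.group(2), m.group(3)));
    }
}
```

Returning an Optional lets non-chat lines such as server PINGs be dropped without throwing.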
import org.apache.spark.streaming.dstream.ReceiverInputDStream
import com.andrewgapic.spark.streaming.TwitchStreamBuilder
import com.andrewgapic.stream.Message

// ssc is an existing StreamingContext
val gamesSet: Set[String] = Set("League+of+Legends")
val stream: ReceiverInputDStream[Message] =
  new TwitchStreamBuilder().setGames(gamesSet).build(ssc)
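import java.util.HashSet;
import java.util.Set;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import com.andrewgapic.spark.streaming.TwitchStreamBuilder;
import com.andrewgapic.stream.Message;

// jssc is an existing JavaStreamingContext
Set<String> gamesSet = new HashSet<>();
gamesSet.add("League+of+Legends");
JavaReceiverInputDStream<Message> stream =
    new TwitchStreamBuilder().setGames(gamesSet).build(jssc);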
More advanced usage (Scala)
Note: spaces in game names must be replaced by a + character. This will be done automatically in future versions.
import scala.concurrent.duration._
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.ReceiverInputDStream
import com.andrewgapic.spark.streaming.TwitchStreamBuilder
import com.andrewgapic.stream.Message

val sparkConf = new SparkConf().setAppName("TwitchTest")
val ssc = new StreamingContext(sparkConf, Seconds(2))

val gamesSet: Set[String] = Set("League+of+Legends")
val channelsSet: Set[String] = Set("TSM_Dyrus")
val language: String = "en" // English
val storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER_2
val schedulingInterval: FiniteDuration = 600.seconds // refresh channels every 10 minutes

val stream: ReceiverInputDStream[Message] = new TwitchStreamBuilder()
  .setGames(gamesSet)
  .setChannels(channelsSet)
  .setLanguage(language)
  .setStorageLevel(storageLevel)
  .setSchedulingInterval(schedulingInterval)
  .build(ssc)
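Until the library replaces spaces with + itself, as the note above describes, a small helper can do it before the set is passed to setGames. The class GameNamesSketch below is hypothetical and not part of the library. Collapsing a run of several spaces into a single + is a choice made for this sketch.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper; not part of spark-streaming-twitch.
final class GameNamesSketch {
    // Trims the name and replaces each run of whitespace with a single '+'.
    static String encode(String gameName) {
        return gameName.trim().replaceAll("\\s+", "+");
    }

    // Encodes every name, preserving the iteration order of the input set.
    static Set<String> encodeAll(Set<String> gameNames) {
        Set<String> encoded = new LinkedHashSet<>();
        for (String name : gameNames) {
            encoded.add(encode(name));
        }
        return encoded;
    }
}
```

For example, GameNamesSketch.encode("League of Legends") yields "League+of+Legends", which matches the form used in the snippets in this README.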
There are two examples in the examples folder: one in Scala (ChannelAndWordsCount) and one in Java.
The Scala example does two things: displays the top 15 words by word frequency (ignores stopwords), and displays the top channels by message frequency. The Java example only displays word frequency.
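The word-frequency idea can be tried without a cluster. The sketch below is not the code from the examples folder: it counts words over an in-memory list of message strings using plain Java streams rather than Spark. The class TopWordsSketch is hypothetical, and the caller supplies the stopword set.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, Spark-free illustration of a "top N words" computation.
final class TopWordsSketch {
    static List<Map.Entry<String, Long>> topWords(
            List<String> messages, Set<String> stopwords, int n) {
        // Lowercase, split on non-word characters, drop empties and stopwords, then count.
        Map<String, Long> counts = messages.stream()
            .flatMap(m -> Arrays.stream(m.toLowerCase(Locale.ROOT).split("\\W+")))
            .filter(w -> !w.isEmpty() && !stopwords.contains(w))
            .collect(Collectors.groupingBy(w -> w, Collectors.counting()));

        // Highest count first; ties are broken alphabetically so the output is stable.
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()))
            .limit(n)
            .collect(Collectors.toList());
    }
}
```

Calling topWords(messages, stopwords, 15) gives the same kind of ranking the Scala example prints. In a streaming job the counting step would typically run per batch or per window instead of over a fixed list.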
Bugs and Feedback
For bugs, questions and discussions please use the GitHub Issues.
Copyright 2017, Andrew Gapic.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.