Spark Kafka writer is a little project that lets you save your Spark RDDs and DStreams to Kafka seamlessly.
If you’ve already heard about this project it’s because I took over the work of Hari Shreedharan since his project was discontinued for reasons unknown to me.
I released the first version (0.1.0) a couple of weeks back and we’ve used it until now without issues so I thought I’d write a bit about how to use it.
Important note: the 0.1.0 relase is compatible with Apache Spark 1.6.x and hasn’t been tested against 2.0.0.
The first thing you’ll have to do is to add a dependency to the project.
If you’d like to write an RDD to Kafka it’s pretty simple.
First, you’ll have to define a configuration for your Kafka producer:
This is the minimum set of properties you’ll have to provide. A full list of the available properties can be found here.
Next up, you can just write your RDD to Kafka!
The first argument to the
writeToKafka function is our previously defined
The second argument is a function which transforms an element of the RDD (here
String) into a
ProducerRecord which can be sent to Kafka. Here, we
simply create a producer record with the topic we want to send our messages to
my-topic) and the record will only contain a value which is our RDD element.
If you want to know more about
ProducerRecord, the javadoc can be found
The API for writing a DStream to Kafka is very similar to the one we just used to write RDDs.
In fact, we’ll reuse the same configuration as before to write our DStream to Kafka:
As you can see, the API is exactly the same!
As mentioned earlier, the 0.1.0 version only supports Apache Spark 1.6.x. Because of that, the current focus is to support Spark 2.0.0 with support for both Kafka 0.8 and Kafka 0.10. This is planned for 0.2.0.
Another thing I’d like to work on is the ability to write DataFrames to Kafka but that won’t be in the near future I would think.
If you’d like to help out or fill a bug you’ve found, please pick or create an issue on GitHub!
You can also discuss the project on the Gitter channel.