As a reminder, Spark Kafka writer is a project that lets you save your Spark
DStreams to Kafka seamlessly.
In this post, I’ll introduce what’s new and the breaking changes we’ve made.
First and foremost, the 0.2.0 release is built against, and compatible with, Spark 2.0. Feel free to upgrade spark-kafka-writer if you're already on Spark 2.0.
Since Kafka support in Spark is split into separate modules depending on whether you're using
Kafka 0.8 (spark-streaming-kafka-0-8) or Kafka 0.10 (spark-streaming-kafka-0-10), we did the same for spark-kafka-writer.
As a result, two artifacts have been created, one per supported Kafka version.
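For instance, assuming the new artifacts mirror the naming of the Spark streaming modules (the exact artifact names are hypothetical here; the project README has the authoritative coordinates), an sbt dependency might look like:

```scala
// build.sbt -- pick the artifact matching your Kafka version
// (artifact name assumed; check the project README for the exact one)
libraryDependencies += "com.github.benfradet" %% "spark-kafka-0-10-writer" % "0.2.0"
```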
In order to reduce the verbosity when using spark-kafka-writer, we decided
to move the definitions of the implicits needed to convert an
RDD or a
DStream to our internal
KafkaWriter to a top-level package object.
Consequently, you won't need to import
com.github.benfradet.spark.kafka.writer.KafkaWriter._ as before, but only
com.github.benfradet.spark.kafka.writer._.
The following is a full example showing the new import, using Kafka 0.10.
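Since the original example did not survive, here is a sketch of what the new usage could look like; the stream (`dStream`, a `DStream[String]`), the broker address, and the topic name are placeholders:

```scala
import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.kafka.common.serialization.StringSerializer
// the implicits now live in the package object, so this single import suffices
import com.github.benfradet.spark.kafka.writer._

val producerConfig = Map(
  "bootstrap.servers" -> "127.0.0.1:9092",
  "key.serializer" -> classOf[StringSerializer].getName,
  "value.serializer" -> classOf[StringSerializer].getName
)

// dStream is an existing DStream[String]; "my-topic" is a placeholder topic
dStream.writeToKafka(
  producerConfig,
  s => new ProducerRecord[String, String]("my-topic", s)
)
```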
Starting from this release, the scaladoc is available online at https://benfradet.github.io/spark-kafka-writer.
Quite a few things are planned for 0.3.0, including:
- writing Datasets to Kafka
If you're interested in helping out, there are many ways you can contribute to the project.
You can also ask your questions and discuss the project on the Gitter channel.