spark-kafka-writer v0.2.0 released!

October 5, 2016 - Spark, Kafka, tutorial

As a reminder, spark-kafka-writer is a project that lets you save your Spark RDDs and DStreams to Kafka seamlessly.

The repo is on GitHub and you can find the latest version on Maven Central.

In this post, I’ll introduce what’s new and the breaking changes we’ve made.

Support for Spark 2.0!

First and foremost, the 0.2.0 release is compatible with Spark 2.0 and is built against it. If you're using Spark 2.0, feel free to bump your spark-kafka-writer dependency to this version.

Support for Kafka 0.8 and Kafka 0.10

Since support for Kafka is split inside Spark depending on whether you’re using Kafka 0.8 (spark-streaming-kafka-0-8) or Kafka 0.10 (spark-streaming-kafka-0-10), we did the same for spark-kafka-writer.

As a result, two artifacts have been created:

// for Kafka 0.8
libraryDependencies ++= Seq(
  "com.github.benfradet" %% "spark-kafka-0-8-writer" % "0.2.0"
)

// for Kafka 0.10
libraryDependencies ++= Seq(
  "com.github.benfradet" %% "spark-kafka-0-10-writer" % "0.2.0"
)

Implicits moved to a top-level package object

To reduce verbosity when using spark-kafka-writer, we decided to move the implicit conversions from RDD and DStream to our internal KafkaWriter into a top-level package object.

Consequently, you won’t need to import com.github.benfradet.spark.kafka.writer.KafkaWriter._ as before, but only the top-level package object matching your Kafka version, e.g. com.github.benfradet.spark.kafka010.writer._ for Kafka 0.10.
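
Concretely, for the Kafka 0.10 artifact the import change boils down to this:

// 0.1.0: the implicits lived on the KafkaWriter object
import com.github.benfradet.spark.kafka.writer.KafkaWriter._

// 0.2.0: the implicits live in a top-level package object
import com.github.benfradet.spark.kafka010.writer._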

Here is a full example showing the new import, using Kafka 0.10:

import com.github.benfradet.spark.kafka010.writer._
import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.kafka.common.serialization.StringSerializer
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

val producerConfig = {
  val p = new java.util.Properties()
  p.setProperty("bootstrap.servers", "127.0.0.1:9092")
  p.setProperty("key.serializer", classOf[StringSerializer].getName)
  p.setProperty("value.serializer", classOf[StringSerializer].getName)
  p // return the Properties, not the result of the last setProperty call
}

val rdd: RDD[String] = ...
rdd.writeToKafka(
  producerConfig,
  s => new ProducerRecord[String, String]("my-topic", s)
)

val dstream: DStream[String] = ...
dstream.writeToKafka(
  producerConfig,
  s => new ProducerRecord[String, String]("my-topic", s)
)
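
Since the transform function is free to build any ProducerRecord, you can also write keyed messages. Here's a minimal sketch using the same API as above and the standard ProducerRecord(topic, key, value) constructor; the key-value pairs and the topic name are purely illustrative:

// write key-value pairs, using the first element of each pair as the message key
val pairs: DStream[(String, String)] = ...
pairs.writeToKafka(
  producerConfig,
  { case (k, v) => new ProducerRecord[String, String]("my-topic", k, v) }
)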

Online Scaladoc

Starting from this release, the scaladoc is available online at https://benfradet.github.io/spark-kafka-writer.

Future work

Quite a few things are planned for 0.3.0.

If you’re interested in helping out, there are plenty of ways you can contribute to the project on GitHub, be it through issues or pull requests.

You can also ask questions and discuss the project on the Gitter channel.