spark-kafka-writer v0.3.0 released!

March 28, 2017 - Spark, Kafka, tutorial

We’re pleased to announce version 0.3.0 of Spark Kafka Writer.

Spark Kafka Writer is a library that lets you save your Spark RDDs and DStreams to Kafka seamlessly.

The repository is on GitHub and you can find the latest version on Maven Central.

In this post we’ll cover the new Callback API as well as the other small updates this release brings.

Callback API

The major addition in this release is the ability to have a Kafka Callback invoked whenever a message is produced.

As an example, we could log whether each record was produced successfully:

import com.github.benfradet.spark.kafka010.writer._
import org.apache.kafka.clients.producer.{Callback, ProducerRecord, RecordMetadata}
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

@transient lazy val log = org.apache.log4j.Logger.getLogger("spark-kafka-writer")

val topic: String = ...
val producerConfig: java.util.Properties = ...

// log the outcome of every produced record, defined once and reused below
val logCallback = Some(new Callback with Serializable {
  override def onCompletion(metadata: RecordMetadata, e: Exception): Unit = {
    if (Option(e).isDefined) {
      log.warn("error sending message", e)
    } else {
      log.info(s"everything went fine, record offset was ${metadata.offset()}")
    }
  }
})

// with an RDD
val rdd: RDD[String] = ...
rdd.writeToKafka(
  producerConfig,
  s => new ProducerRecord[String, String](topic, s),
  logCallback
)

// with a DStream
val dStream: DStream[String] = ...
dStream.writeToKafka(
  producerConfig,
  s => new ProducerRecord[String, String](topic, s),
  logCallback
)

To learn more about callbacks in Kafka, refer to the Kafka javadoc on sending a message with a Callback through the Kafka producer.
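The contract described in that javadoc is that exactly one of `metadata` and `exception` is non-null when the callback fires, which is why the example above only needs a single null check. A minimal, broker-free sketch of that dispatch pattern (the `FakeMetadata` and `FakeCallback` types here are illustrative stand-ins, not part of the Kafka API):

```scala
// Illustrative stand-ins for Kafka's RecordMetadata and Callback,
// used only to show the "exactly one of the two is non-null" contract.
final case class FakeMetadata(offset: Long)

trait FakeCallback {
  def onCompletion(metadata: FakeMetadata, exception: Exception): Unit
}

// Collect outcomes instead of logging, so the dispatch is easy to inspect.
val outcomes = scala.collection.mutable.Buffer.empty[String]

val callback = new FakeCallback with Serializable {
  override def onCompletion(metadata: FakeMetadata, e: Exception): Unit =
    if (e != null) outcomes += s"failure: ${e.getMessage}"
    else outcomes += s"success at offset ${metadata.offset}"
}

// Kafka would invoke the callback once per produced record:
callback.onCompletion(FakeMetadata(42L), null)                          // success path
callback.onCompletion(null, new RuntimeException("broker unavailable")) // failure path

println(outcomes.mkString("\n"))
// success at offset 42
// failure: broker unavailable
```

Note that the callback runs on the producer's I/O thread, so it should stay cheap; logging, as in the example above, is a typical use.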

Thanks a lot to Lawrence Carvalho for this cool new feature!

Other updates

Version 0.3.0 also brings a couple of smaller updates:

Roadmap

For version 0.4.0, we’re aiming to provide an API to write DataFrames and Datasets to Kafka.
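In the meantime, a DataFrame can already reach Kafka through the existing RDD API by serializing each row first, for example to JSON. A sketch, assuming a Spark SQL setup and a `topic` and `producerConfig` defined as in the example above (`writeDataFrameToKafka` is a hypothetical helper name, not part of the library):

```scala
import com.github.benfradet.spark.kafka010.writer._
import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.spark.sql.DataFrame

// Hypothetical helper: serialize each row to JSON and reuse the RDD API,
// since DataFrame.toJSON yields a Dataset[String] whose .rdd is an RDD[String].
def writeDataFrameToKafka(df: DataFrame,
                          topic: String,
                          producerConfig: java.util.Properties): Unit =
  df.toJSON.rdd.writeToKafka(
    producerConfig,
    json => new ProducerRecord[String, String](topic, json),
    None
  )
```

The planned 0.4.0 API may well look different; this is only a workaround with today's building blocks.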

If you’d like to get involved, there are different ways you can contribute to the project:

You can also ask questions and discuss the project on the Gitter channel and check out the Scaladoc.