spark-kafka-writer v0.4.0 released!
July 22, 2017
-
Spark,
Kafka,
tutorial
We’re pleased to announce version 0.4.0 of Spark Kafka Writer.
Spark Kafka Writer is a library that lets you save your Spark data to Kafka seamlessly: RDDs, DStreams, Datasets, and DataFrames.
The repository is on GitHub and you can find the latest version on Maven Central.
In this post, we’ll walk through the new support for writing DataFrames and Datasets to Kafka.
Writing a DataFrame to Kafka
From version 0.4.0 on, you’ll be able to write DataFrames to Kafka.
This differs from writing the output of batch queries to Kafka with the Structured Streaming API in that you control how Rows are serialized and you have access to the callback API.
Writing a Dataset to Kafka
In the same way you can write DataFrames to Kafka, you’ll now be able to write Datasets to Kafka:
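A minimal sketch, reusing the same producer configuration as above (the `Foo` case class, topic name, and sample data are illustrative assumptions):

```scala
import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.kafka.common.serialization.StringSerializer
import com.github.benfradet.spark.kafka.writer._

// Hypothetical element type for the Dataset
case class Foo(a: Int, b: String)

val producerConfig = Map(
  "bootstrap.servers" -> "127.0.0.1:9092",
  "key.serializer" -> classOf[StringSerializer].getName,
  "value.serializer" -> classOf[StringSerializer].getName
)

import spark.implicits._
val ds = Seq(Foo(1, "first"), Foo(2, "second")).toDS()

// The transform function works on the typed element, not a Row
ds.writeToKafka(
  producerConfig,
  foo => new ProducerRecord[String, String]("my-topic", foo.b)
)
```

The difference from the DataFrame case is that the transform function receives your typed element directly, so no Row decoding is needed.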
Other updates
Version 0.4.0 also brings other changes:
- Supporting Spark 2.2.0
- Providing a way to close producers (see pull request #77)
- Dropping the support for Kafka 0.8
Roadmap
For version 0.5.0, we’re aiming to provide a native API for Java and Python.
If you’d like to get involved, there are different ways you can contribute to the project.
You can also ask questions and discuss the project on the Gitter channel and check out the Scaladoc.