eBook – Guide Spring Cloud – NPI EA (cat=Spring Cloud)
announcement - icon

Let's get started with a Microservice Architecture with Spring Cloud:

>> Join Pro and download the eBook

eBook – Mockito – NPI EA (tag = Mockito)
announcement - icon

Mocking is an essential part of unit testing, and the Mockito library makes it easy to write clean and intuitive unit tests for your Java code.

Get started with mocking and improve your application tests using our Mockito guide:

Download the eBook

eBook – Java Concurrency – NPI EA (cat=Java Concurrency)
announcement - icon

Handling concurrency in an application can be a tricky process with many potential pitfalls. A solid grasp of the fundamentals will go a long way to help minimize these issues.

Get started with understanding multi-threaded applications with our Java Concurrency guide:

>> Download the eBook

eBook – Reactive – NPI EA (cat=Reactive)
announcement - icon

Spring 5 added support for reactive programming with the Spring WebFlux module, which has been improved upon ever since. Get started with the Reactor project basics and reactive programming in Spring Boot:

>> Join Pro and download the eBook

eBook – Java Streams – NPI EA (cat=Java Streams)
announcement - icon

Since its introduction in Java 8, the Stream API has become a staple of Java development. The basic operations like iterating, filtering, mapping sequences of elements are deceptively simple to use.

But these can also be overused and fall into some common pitfalls.

To get a better understanding on how Streams work and how to combine them with other language features, check out our guide to Java Streams:

>> Join Pro and download the eBook

eBook – Jackson – NPI EA (cat=Jackson)
announcement - icon

Do JSON right with Jackson

Download the E-book

eBook – HTTP Client – NPI EA (cat=Http Client-Side)
announcement - icon

Get the most out of the Apache HTTP Client

Download the E-book

eBook – Maven – NPI EA (cat = Maven)
announcement - icon

Get Started with Apache Maven:

Download the E-book

eBook – Persistence – NPI EA (cat=Persistence)
announcement - icon

Working on getting your persistence layer right with Spring?

Explore the eBook

eBook – RwS – NPI EA (cat=Spring MVC)
announcement - icon

Building a REST API with Spring?

Download the E-book

Course – LS – NPI EA (cat=Jackson)
announcement - icon

Get started with Spring and Spring Boot, through the Learn Spring course:

>> LEARN SPRING
Course – RWSB – NPI EA (cat=REST)
announcement - icon

Explore Spring Boot 3 and Spring 6 in-depth through building a full REST API with the framework:

>> The New “REST With Spring Boot”

Course – LSS – NPI EA (cat=Spring Security)
announcement - icon

Yes, Spring Security can be complex, from the more advanced functionality within the Core to the deep OAuth support in the framework.

I built the security material as two full courses - Core and OAuth, to get practical with these more complex scenarios. We explore when and how to use each feature and code through it on the backing project.

You can explore the course here:

>> Learn Spring Security

Course – LSD – NPI EA (tag=Spring Data JPA)
announcement - icon

Spring Data JPA is a great way to handle the complexity of JPA with the powerful simplicity of Spring Boot.

Get started with Spring Data JPA through the guided reference course:

>> CHECK OUT THE COURSE

Partner – Moderne – NPI EA (cat=Spring Boot)
announcement - icon

Refactor Java code safely — and automatically — with OpenRewrite.

Refactoring big codebases by hand is slow, risky, and easy to put off. That’s where OpenRewrite comes in. The open-source framework for large-scale, automated code transformations helps teams modernize safely and consistently.

Each month, the creators and maintainers of OpenRewrite at Moderne run live, hands-on training sessions — one for newcomers and one for experienced users. You’ll see how recipes work, how to apply them across projects, and how to modernize code with confidence.

Join the next session, bring your questions, and learn how to automate the kind of work that usually eats your sprint time.

Course – LJB – NPI EA (cat = Core Java)
announcement - icon

Code your way through and build up a solid, practical foundation of Java:

>> Learn Java Basics

Partner – LambdaTest – NPI EA (cat= Testing)
announcement - icon

Distributed systems often come with complex challenges such as service-to-service communication, state management, asynchronous messaging, security, and more.

Dapr (Distributed Application Runtime) provides a set of APIs and building blocks to address these challenges, abstracting away infrastructure so we can focus on business logic.

In this tutorial, we'll focus on Dapr's pub/sub API for message brokering. Using its Spring Boot integration, we'll simplify the creation of a loosely coupled, portable, and easily testable pub/sub messaging system:

>> Flexible Pub/Sub Messaging With Spring Boot and Dapr

1. Overview

Apache Kafka® is a distributed streaming platform. In a previous tutorial, we discussed how to implement Kafka consumers and producers using Spring.

In this tutorial, we’ll learn how to use Kafka Connectors.

We’ll have a look at:

  • Different types of Kafka Connectors
  • Features and modes of Kafka Connect
  • Connectors configuration using property files as well as the REST API

2. Basics of Kafka Connect and Kafka Connectors

Kafka Connect is a framework for connecting Kafka with external systems such as databases, key-value stores, search indexes, and file systems, using so-called Connectors.

Kafka Connectors are ready-to-use components, which can help us to import data from external systems into Kafka topics and export data from Kafka topics into external systemsWe can use existing connector implementations for common data sources and sinks or implement our own connectors.

A source connector collects data from a system. Source systems can be entire databases, streams tables, or message brokers. A source connector could also collect metrics from application servers into Kafka topics, making the data available for stream processing with low latency.

A sink connector delivers data from Kafka topics into other systems, which might be indexes such as Elasticsearch, batch systems such as Hadoop, or any kind of database.

Some connectors are maintained by the community, while others are supported by Confluent or its partners. Really, we can find connectors for most popular systems, like S3, JDBC, and Cassandra, just to name a few.

3. Features

Kafka Connect features include:

  • A framework for connecting external systems with Kafka – it simplifies the development, deployment, and management of connectors
  • Distributed and standalone modes – it helps us to deploy large clusters by leveraging the distributed nature of Kafka, as well as setups for development, testing, and small production deployments
  • REST interface – we can manage connectors using a REST API
  • Automatic offset management – Kafka Connect helps us to handle the offset commit process, which saves us the trouble of implementing this error-prone part of connector development manually
  • Distributed and scalable by default – Kafka Connect uses the existing group management protocol; we can add more workers to scale up a Kafka Connect cluster
  • Streaming and batch integration – Kafka Connect is an ideal solution for bridging streaming and batch data systems in connection with Kafka’s existing capabilities
  • Transformations – these enable us to make simple and lightweight modifications to individual messages

4. Setup

Instead of using the plain Kafka distribution, we’ll download Confluent Platform, a Kafka distribution provided by Confluent, Inc., the company behind Kafka. Confluent Platform comes with some additional tools and clients, compared to plain Kafka, as well as some additional pre-built Connectors.

For our case, the Open Source edition is sufficient, which can be found at Confluent’s site.

5. Quick Start Kafka Connect

For starters, we’ll discuss the principle of Kafka Connect, using its most basic Connectors, which are the file source connector and the file sink connector.

Conveniently, Confluent Platform comes with both of these connectors, as well as reference configurations.

5.1. Source Connector Configuration

For the source connector, the reference configuration is available at $CONFLUENT_HOME/etc/kafka/connect-file-source.properties:

name=local-file-source
connector.class=FileStreamSource
tasks.max=1
topic=connect-test
file=test.txt

This configuration has some properties that are common for all source connectors:

  • name is a user-specified name for the connector instance
  • connector.class specifies the implementing class, basically the kind of connector
  • tasks.max specifies how many instances of our source connector should run in parallel, and
  • topic defines the topic to which the connector should send the output

In this case, we also have a connector-specific attribute:

  • file defines the file from which the connector should read the input

For this to work then, let’s create a basic file with some content:

echo -e "foo\nbar\n" > $CONFLUENT_HOME/test.txt

Note that the working directory is $CONFLUENT_HOME.

5.2. Sink Connector Configuration

For our sink connector, we’ll use the reference configuration at $CONFLUENT_HOME/etc/kafka/connect-file-sink.properties:

name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=test.sink.txt
topics=connect-test

Logically, it contains exactly the same parameters, though this time connector.class specifies the sink connector implementation, and file is the location where the connector should write the content.

5.3. Worker Config

Finally, we have to configure the Connect worker, which will integrate our two connectors and do the work of reading from the source connector and writing to the sink connector.

For that, we can use $CONFLUENT_HOME/etc/kafka/connect-standalone.properties:

bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
plugin.path=/share/java

Note that plugin.path can hold a list of paths, where connector implementations are available

As we’ll use connectors bundled with Kafka, we can set plugin.path to $CONFLUENT_HOME/share/java. Working with Windows, it might be necessary to provide an absolute path here.

For the other parameters, we can leave the default values:

  • bootstrap.servers contains the addresses of the Kafka brokers
  • key.converter and value.converter define converter classes, which serialize and deserialize the data as it flows from the source into Kafka and then from Kafka to the sink
  • key.converter.schemas.enable and value.converter.schemas.enable are converter-specific settings
  • offset.storage.file.filename is the most important setting when running Connect in standalone mode: it defines where Connect should store its offset data
  • offset.flush.interval.ms defines the interval at which the worker tries to commit offsets for tasks

And the list of parameters is quite mature, so check out the official documentation for a complete list.

5.4. Kafka Connect in Standalone Mode

And with that, we can start our first connector setup:

$CONFLUENT_HOME/bin/connect-standalone \
  $CONFLUENT_HOME/etc/kafka/connect-standalone.properties \
  $CONFLUENT_HOME/etc/kafka/connect-file-source.properties \
  $CONFLUENT_HOME/etc/kafka/connect-file-sink.properties

First off, we can inspect the content of the topic using the command line:

$CONFLUENT_HOME/bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic connect-test --from-beginning

As we can see, the source connector took the data from the test.txt file, transformed it into JSON, and sent it to Kafka:

{"schema":{"type":"string","optional":false},"payload":"foo"}
{"schema":{"type":"string","optional":false},"payload":"bar"}

And, if we have a look at the folder $CONFLUENT_HOME, we can see that a file test.sink.txt was created here:

cat $CONFLUENT_HOME/test.sink.txt
foo
bar

As the sink connector extracts the value from the payload attribute and writes it to the destination file, the data in test.sink.txt has the content of the original test.txt file.

Now let’s add more lines to test.txt.

When we do, we see that the source connector detects these changes automatically.

We only have to make sure to insert a newline at the end, otherwise, the source connector won’t consider the last line.

At this point, let’s stop the Connect process, as we’ll start Connect in distributed mode in a few lines.

6. Connect’s REST API

Until now, we made all configurations by passing property files via the command line. However, as Connect is designed to run as a service, there is also a REST API available.

By default, it is available at http://localhost:8083. A few endpoints are:

  • GET /connectors – returns a list with all connectors in use
  • GET /connectors/{name} – returns details about a specific connector
  • POST /connectors – creates a new connector; the request body should be a JSON object containing a string name field and an object config field with the connector configuration parameters
  • GET /connectors/{name}/status – returns the current status of the connector – including if it is running, failed or paused – which worker it is assigned to, error information if it has failed, and the state of all its tasks
  • DELETE /connectors/{name} – deletes a connector, gracefully stopping all tasks and deleting its configuration
  • GET /connector-plugins – returns a list of connector plugins installed in the Kafka Connect cluster

The official documentation provides a list with all endpoints.

We’ll use the REST API for creating new connectors in the following section.

7. Kafka Connect in Distributed Mode

The standalone mode works perfectly for development and testing, as well as smaller setups. However, if we want to make full use of the distributed nature of Kafka, we have to launch Connect in distributed mode.

By doing so, connector settings and metadata are stored in Kafka topics instead of the file system. As a result, the worker nodes are really stateless.

7.1. Starting Connect 

A reference configuration for distributed mode can be found at $CONFLUENT_HOME/etc/kafka/connect-distributed.properties.

Parameters are mostly the same as for standalone mode. There are only a few differences:

  • group.id defines the name of the Connect cluster group. The value must be different from any consumer group ID
  • offset.storage.topicconfig.storage.topic and status.storage.topic define topics for these settings. For each topic, we can also define a replication factor

Again, the official documentation provides a list with all parameters.

We can start Connect in distributed mode as follows:

$CONFLUENT_HOME/bin/connect-distributed $CONFLUENT_HOME/etc/kafka/connect-distributed.properties

7.2. Adding Connectors Using the REST API

Now, compared to the standalone startup command, we didn’t pass any connector configurations as arguments. Instead, we have to create the connectors using the REST API.

To set up our example from before, we have to send two POST requests to http://localhost:8083/connectors containing the following JSON structs.

First, we need to create the body for the source connector POST as a JSON file. Here, we’ll call it connect-file-source.json:

{
    "name": "local-file-source",
    "config": {
        "connector.class": "FileStreamSource",
        "tasks.max": 1,
        "file": "test-distributed.txt",
        "topic": "connect-distributed"
    }
}

Note how this looks pretty similar to the reference configuration file we used the first time.

And then we POST it:

curl -d @"$CONFLUENT_HOME/connect-file-source.json" \
  -H "Content-Type: application/json" \
  -X POST http://localhost:8083/connectors

Then, we’ll do the same for the sink connector, calling the file connect-file-sink.json:

{
    "name": "local-file-sink",
    "config": {
        "connector.class": "FileStreamSink",
        "tasks.max": 1,
        "file": "test-distributed.sink.txt",
        "topics": "connect-distributed"
    }
}

And perform the POST like before:

curl -d @$CONFLUENT_HOME/connect-file-sink.json \
  -H "Content-Type: application/json" \
  -X POST http://localhost:8083/connectors

If needed, we can verify, that this setup is working correctly:

$CONFLUENT_HOME/bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic connect-distributed --from-beginning
{"schema":{"type":"string","optional":false},"payload":"foo"}
{"schema":{"type":"string","optional":false},"payload":"bar"}

And, if we have a look at the folder $CONFLUENT_HOME, we can see that a file test-distributed.sink.txt was created here:

cat $CONFLUENT_HOME/test-distributed.sink.txt
foo
bar

After we tested the distributed setup, let’s clean up, by removing the two connectors:

curl -X DELETE http://localhost:8083/connectors/local-file-source
curl -X DELETE http://localhost:8083/connectors/local-file-sink

8. Transforming Data

8.1. Supported Transformations

Transformations enable us to make simple and lightweight modifications to individual messages.

Kafka Connect supports the following built-in transformations:

  • InsertField – Add a field using either static data or record metadata
  • ReplaceField – Filter or rename fields
  • MaskField – Replace a field with the valid null value for the type (zero or an empty string, for example)
  • HoistField – Wrap the entire event as a single field inside a struct or a map
  • ExtractField – Extract a specific field from struct and map and include only this field in the results
  • SetSchemaMetadata – Modify the schema name or version
  • TimestampRouter – Modify the topic of a record based on original topic and timestamp
  • RegexRouter – Modify the topic of a record based on original topic, a replacement string, and a regular expression

A transformation is configured using the following parameters:

  • transforms – A comma-separated list of aliases for the transformations
  • transforms.$alias.type – Class name for the transformation
  • transforms.$alias.$transformationSpecificConfig – Configuration for the respective transformation

8.2. Applying a Transformer

To test some transformation features, let’s set up the following two transformations:

  • First, let’s wrap the entire message as a JSON struct
  • After that, let’s add a field to that struct

Before applying our transformations, we have to configure Connect to use schemaless JSON, by modifying the connect-distributed.properties:

key.converter.schemas.enable=false
value.converter.schemas.enable=false

After that, we have to restart Connect, again in distributed mode:

$CONFLUENT_HOME/bin/connect-distributed $CONFLUENT_HOME/etc/kafka/connect-distributed.properties

Again, we need to create the body for the source connector POST as a JSON file. Here, we’ll call it connect-file-source-transform.json.

Besides the already known parameters, we add a few lines for the two required transformations:

{
    "name": "local-file-source",
    "config": {
        "connector.class": "FileStreamSource",
        "tasks.max": 1,
        "file": "test-transformation.txt",
        "topic": "connect-transformation",
        "transforms": "MakeMap,InsertSource",
        "transforms.MakeMap.type": "org.apache.kafka.connect.transforms.HoistField$Value",
        "transforms.MakeMap.field": "line",
        "transforms.InsertSource.type": "org.apache.kafka.connect.transforms.InsertField$Value",
        "transforms.InsertSource.static.field": "data_source",
        "transforms.InsertSource.static.value": "test-file-source"
    }
}

After that, let’s perform the POST:

curl -d @$CONFLUENT_HOME/connect-file-source-transform.json \
  -H "Content-Type: application/json" \
  -X POST http://localhost:8083/connectors

Let’s write some lines to our test-transformation.txt:

Foo
Bar

If we now inspect the connect-transformation topic, we should get the following lines:

{"line":"Foo","data_source":"test-file-source"}
{"line":"Bar","data_source":"test-file-source"}

9. Using Ready Connectors

After using these simple connectors, let’s have a look at more advanced ready-to-use connectors, and how to install them.

9.1. Where to Find Connectors

Pre-built connectors are available from different sources:

  • A few connectors are bundled with plain Apache Kafka (source and sink for files and console)
  • Some more connectors are bundled with Confluent Platform (ElasticSearch, HDFS, JDBC, and AWS S3)
  • Also check out Confluent Hub, which is kind of an app store for Kafka connectors. The number of offered connectors is growing continuously:
    • Confluent connectors (developed, tested, documented and are fully supported by Confluent)
    • Certified connectors (implemented by a 3rd party and certified by Confluent)
    • Community-developed and -supported connectors
  • Beyond that, Confluent also provides a Connectors Page, with some connectors which are also available at the Confluent Hub, but also with some more community connectors
  • And finally, there are also vendors, who provide connectors as part of their product. For example, Landoop provides a streaming library called Lenses, which also contains a set of ~25 open source connectors (many of them also cross-listed in other places)

9.2. Installing Connectors from Confluent Hub

The enterprise version of Confluent provides a script for installing Connectors and other components from Confluent Hub (the script is not included in the Open Source version). If we’re using the enterprise version, we can install a connector using the following command:

$CONFLUENT_HOME/bin/confluent-hub install confluentinc/kafka-connect-mqtt:1.0.0-preview

9.3. Installing Connectors Manually

If we need a connector, which is not available on Confluent Hub or if we have the Open Source version of Confluent, we can install the required connectors manually. For that, we have to download and unzip the connector, as well as move the included libs to the folder specified as plugin.path.

For each connector, the archive should contain two folders that are interesting for us:

  • The lib folder contains the connector jar, for example, kafka-connect-mqtt-1.0.0-preview.jar, as well as some more jars required by the connector
  • The etc folder holds one or more reference config files

We have to move the lib folder to $CONFLUENT_HOME/share/java, or whichever path we specified as plugin.path in connect-standalone.properties and connect-distributed.properties. In doing so, it might also make sense to rename the folder to something meaningful.

We can use the config files from etc either by referencing them while starting in standalone mode, or we can just grab the properties and create a JSON file from them.

10. Conclusion

In this tutorial, we had a look at how to install and use Kafka Connect.

We looked at types of connectors, both source and sink. We also looked at some features and modes that Connect can run in. Then, we reviewed transformers. And finally, we learned where to get and how to install custom connectors.

The code backing this article is available on GitHub. Once you're logged in as a Baeldung Pro Member, start learning and coding on the project.
Baeldung Pro – NPI EA (cat = Baeldung)
announcement - icon

Baeldung Pro comes with both absolutely No-Ads as well as finally with Dark Mode, for a clean learning experience:

>> Explore a clean Baeldung

Once the early-adopter seats are all used, the price will go up and stay at $33/year.

eBook – HTTP Client – NPI EA (cat=HTTP Client-Side)
announcement - icon

The Apache HTTP Client is a very robust library, suitable for both simple and advanced use cases when testing HTTP endpoints. Check out our guide covering basic request and response handling, as well as security, cookies, timeouts, and more:

>> Download the eBook

eBook – Java Concurrency – NPI EA (cat=Java Concurrency)
announcement - icon

Handling concurrency in an application can be a tricky process with many potential pitfalls. A solid grasp of the fundamentals will go a long way to help minimize these issues.

Get started with understanding multi-threaded applications with our Java Concurrency guide:

>> Download the eBook

eBook – Java Streams – NPI EA (cat=Java Streams)
announcement - icon

Since its introduction in Java 8, the Stream API has become a staple of Java development. The basic operations like iterating, filtering, mapping sequences of elements are deceptively simple to use.

But these can also be overused and fall into some common pitfalls.

To get a better understanding on how Streams work and how to combine them with other language features, check out our guide to Java Streams:

>> Join Pro and download the eBook

eBook – Persistence – NPI EA (cat=Persistence)
announcement - icon

Working on getting your persistence layer right with Spring?

Explore the eBook

Course – LS – NPI EA (cat=REST)

announcement - icon

Get started with Spring Boot and with core Spring, through the Learn Spring course:

>> CHECK OUT THE COURSE

Partner – Moderne – NPI EA (tag=Refactoring)
announcement - icon

Modern Java teams move fast — but codebases don’t always keep up. Frameworks change, dependencies drift, and tech debt builds until it starts to drag on delivery. OpenRewrite was built to fix that: an open-source refactoring engine that automates repetitive code changes while keeping developer intent intact.

The monthly training series, led by the creators and maintainers of OpenRewrite at Moderne, walks through real-world migrations and modernization patterns. Whether you’re new to recipes or ready to write your own, you’ll learn practical ways to refactor safely and at scale.

If you’ve ever wished refactoring felt as natural — and as fast — as writing code, this is a good place to start.

eBook Jackson – NPI EA – 3 (cat = Jackson)