Overview

Kafka Connect is a framework for streaming data between Kafka and external systems using connector plugins. It can run in standalone mode (a single worker process configured from local files) or distributed mode (a cluster of workers coordinated through Kafka and managed via a REST API).

Kafka Connect scales by dividing a connector's work into multiple parallel tasks, which are executed across one or more worker instances. Distributed mode enables a cluster of workers to share the workload dynamically: when connectors are deployed or reconfigured, or when workers join, leave, or fail, the cluster rebalances tasks using Kafka's group coordination protocol, keeping the workload distributed and fault tolerant.

You typically run one Connect cluster with all the necessary plugins installed; each connector instance (e.g., an Elasticsearch sink or a JDBC source) runs within that cluster. Tasks from different connectors are executed across the same pool of worker instances, enabling shared scaling and fault tolerance.
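
For illustration, here is a minimal sketch of deploying a connector to a distributed cluster through Connect's REST API (POST /connectors). It assumes Java 15+, a worker listening at localhost:8083, and the Confluent JDBC source connector installed on the cluster; the connector name, class, and connection settings are illustrative. tasks.max caps how many parallel tasks the cluster may run for this connector.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DeployConnector {
    public static void main(String[] args) throws Exception {
        // Connector config submitted to the cluster's REST API.
        // Connector class and connection URL are assumptions for the example.
        String body = """
            {
              "name": "orders-jdbc-source",
              "config": {
                "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
                "connection.url": "jdbc:postgresql://db:5432/shop",
                "mode": "incrementing",
                "incrementing.column.name": "id",
                "topic.prefix": "db-",
                "tasks.max": "4"
              }
            }
            """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // 201 Created on success; the workers then split the job
        // into up to 4 tasks and spread them across the cluster.
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```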

| Connector Type | Direction of Data Flow | Purpose |
| --- | --- | --- |
| Source Connector | External System -> Kafka | Pulls data into Kafka topics from external systems |
| Sink Connector | Kafka -> External System | Pushes data out of Kafka topics into external systems |

Source connectors act like producers, continuously ingesting data from outside (e.g., databases, APIs, files) and writing it to Kafka topics. In contrast, sink connectors act like consumers, reading from Kafka topics and delivering data to external destinations such as databases, search systems, or storage.
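
The producer/consumer analogy maps directly onto the plugin API: the framework repeatedly calls a source task's poll() to obtain records to write to Kafka, and hands batches of consumed records to a sink task's put(). A minimal sketch, with hypothetical class names and no real I/O:

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;

import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

// Source side: the framework calls poll() in a loop and produces
// whatever records we return to the configured Kafka topic.
class FileSourceTask extends SourceTask {
    @Override public String version() { return "0.1"; }
    @Override public void start(Map<String, String> props) { /* open the external system */ }

    @Override
    public List<SourceRecord> poll() throws InterruptedException {
        // Read from the external system and wrap each value as a SourceRecord.
        return List.of(); // sketch: nothing to emit
    }

    @Override public void stop() { /* release resources */ }
}

// Sink side: the framework consumes from Kafka and hands us batches via put().
class LogSinkTask extends SinkTask {
    @Override public String version() { return "0.1"; }
    @Override public void start(Map<String, String> props) { /* connect to the destination */ }

    @Override
    public void put(Collection<SinkRecord> records) {
        // Deliver each record to the external destination.
        records.forEach(r -> System.out.println(r.topic() + ": " + r.value()));
    }

    @Override public void stop() { /* disconnect */ }
}
```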

Plugins

Kafka Connect plugins are extensions that implement the logic for connecting Kafka to external systems. They fall into two main categories: source connectors, which ingest data into Kafka, and sink connectors, which deliver data from Kafka to external systems. Kafka Connect comes with a set of pre-built connectors, but you can also develop custom plugins to meet specific requirements.

Commonly used plugins include:

- JDBC source/sink connectors for relational databases
- Elasticsearch sink connector for search indexing
- Debezium source connectors for change data capture (CDC)
- Amazon S3 sink connector for archiving topics to object storage

You can develop your own connector plugin if no pre-built connector meets your needs. Key points to consider (a minimal sketch follows the list):

- Implement a Connector class (source or sink) that validates configuration and splits the work into per-task configs, plus the Task class that actually moves the data.
- Declare your configuration options with a ConfigDef so Connect can validate submitted configs.
- For sources, record source offsets so tasks can resume where they left off after a restart; for sinks, plan for retries and duplicate delivery.
- Package the classes as a JAR and place it on each worker's plugin.path.
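
As a starting point, a plugin usually pairs a Connector class, which declares configuration and splits the job into per-task configs, with the Task class that moves the data (like the ones sketched above). A minimal, hypothetical source connector:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.source.SourceConnector;

// Hypothetical example; class and config names are illustrative.
public class FileSourceConnector extends SourceConnector {
    private Map<String, String> config;

    // Declares the connector's configuration surface; Connect uses this
    // to validate configs submitted through the REST API.
    private static final ConfigDef CONFIG_DEF = new ConfigDef()
            .define("input.path", ConfigDef.Type.STRING,
                    ConfigDef.Importance.HIGH, "Directory to watch for files");

    @Override public String version() { return "0.1"; }
    @Override public ConfigDef config() { return CONFIG_DEF; }
    @Override public Class<? extends Task> taskClass() { return FileSourceTask.class; }

    @Override public void start(Map<String, String> props) { this.config = props; }
    @Override public void stop() { }

    // Splits the job into at most maxTasks parallel units; each map becomes
    // the start() configuration of one task instance.
    @Override
    public List<Map<String, String>> taskConfigs(int maxTasks) {
        List<Map<String, String>> tasks = new ArrayList<>();
        for (int i = 0; i < maxTasks; i++) {
            tasks.add(config); // sketch: every task gets the same config
        }
        return tasks;
    }
}
```

Build the classes into a JAR and drop it into a directory on each worker's plugin.path; workers pick the plugin up at startup.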

Workers