Overview

Kafka Connect is a framework for streaming data between Kafka and external systems using connector plugins. It can run in standalone mode (a single worker process configured from local files) or distributed mode (a cluster of workers coordinated through Kafka and managed via a REST API).

Kafka Connect scales by dividing a connector's work into multiple parallel tasks, which are executed across one or more worker instances. Distributed mode enables a cluster of workers to share the workload dynamically: when connectors are deployed or reconfigured, or when workers join, leave, or fail, the cluster rebalances tasks using Kafka's group coordination protocol, keeping the workload distributed and fault tolerant.

You typically run one Connect cluster with all the necessary plugins installed; each connector instance (e.g., an Elasticsearch sink or a JDBC source) runs within that cluster. Tasks from different connectors are executed across the same pool of worker instances, enabling shared scaling and fault tolerance.
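
For illustration, here is a minimal sketch of deploying a connector to a distributed cluster through Connect's REST API (POST /connectors). It assumes Java 15+, a worker listening at localhost:8083, and the Confluent JDBC source connector installed on the cluster; the connector name, class, and connection settings are illustrative. tasks.max caps how many parallel tasks the cluster may run for this connector.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DeployConnector {
    public static void main(String[] args) throws Exception {
        // Connector config submitted to the cluster's REST API.
        // Connector class and connection URL are assumptions for the example.
        String body = """
            {
              "name": "orders-jdbc-source",
              "config": {
                "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
                "connection.url": "jdbc:postgresql://db:5432/shop",
                "mode": "incrementing",
                "incrementing.column.name": "id",
                "topic.prefix": "db-",
                "tasks.max": "4"
              }
            }
            """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // 201 Created on success; the workers then split the job
        // into up to 4 tasks and spread them across the cluster.
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```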

| Connector Type | Direction of Data Flow | Purpose |
| --- | --- | --- |
| Source Connector | External System -> Kafka | Pulls data into Kafka topics from external systems |
| Sink Connector | Kafka -> External System | Pushes data out of Kafka topics into external systems |

Source connectors act like producers, continuously ingesting data from outside (e.g., databases, APIs, files) and writing it to Kafka topics. In contrast, sink connectors act like consumers, reading from Kafka topics and delivering data to external destinations such as databases, search systems, or storage.
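
The producer/consumer analogy maps directly onto the plugin API: the framework repeatedly calls a source task's poll() to obtain records to write to Kafka, and hands batches of consumed records to a sink task's put(). A minimal sketch, with hypothetical class names and no real I/O:

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;

import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

// Source side: the framework calls poll() in a loop and produces
// whatever records we return to the configured Kafka topic.
class FileSourceTask extends SourceTask {
    @Override public String version() { return "0.1"; }
    @Override public void start(Map<String, String> props) { /* open the external system */ }

    @Override
    public List<SourceRecord> poll() throws InterruptedException {
        // Read from the external system and wrap each value as a SourceRecord.
        return List.of(); // sketch: nothing to emit
    }

    @Override public void stop() { /* release resources */ }
}

// Sink side: the framework consumes from Kafka and hands us batches via put().
class LogSinkTask extends SinkTask {
    @Override public String version() { return "0.1"; }
    @Override public void start(Map<String, String> props) { /* connect to the destination */ }

    @Override
    public void put(Collection<SinkRecord> records) {
        // Deliver each record to the external destination.
        records.forEach(r -> System.out.println(r.topic() + ": " + r.value()));
    }

    @Override public void stop() { /* disconnect */ }
}
```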

Plugins

Kafka Connect plugins are extensions that implement the logic for connecting Kafka to external systems. They fall into two main categories: source connectors, which ingest data into Kafka, and sink connectors, which deliver data from Kafka to external systems. Kafka Connect comes with a set of pre-built connectors, but you can also develop custom plugins to meet specific requirements.

Commonly used plugins include:

- JDBC source/sink connectors for relational databases
- Elasticsearch sink connector for search indexing
- Debezium source connectors for change data capture (CDC)
- Amazon S3 sink connector for archiving topics to object storage

You can develop your own connector plugin if no pre-built connector meets your needs. Key points to consider (a minimal sketch follows the list):

- Implement a Connector class (source or sink) that validates configuration and splits the work into per-task configs, plus the Task class that actually moves the data.
- Declare your configuration options with a ConfigDef so Connect can validate submitted configs.
- For sources, record source offsets so tasks can resume where they left off after a restart; for sinks, plan for retries and duplicate delivery.
- Package the classes as a JAR and place it on each worker's plugin.path.
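
As a starting point, a plugin usually pairs a Connector class, which declares configuration and splits the job into per-task configs, with the Task class that moves the data (like the ones sketched above). A minimal, hypothetical source connector:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.source.SourceConnector;

// Hypothetical example; class and config names are illustrative.
public class FileSourceConnector extends SourceConnector {
    private Map<String, String> config;

    // Declares the connector's configuration surface; Connect uses this
    // to validate configs submitted through the REST API.
    private static final ConfigDef CONFIG_DEF = new ConfigDef()
            .define("input.path", ConfigDef.Type.STRING,
                    ConfigDef.Importance.HIGH, "Directory to watch for files");

    @Override public String version() { return "0.1"; }
    @Override public ConfigDef config() { return CONFIG_DEF; }
    @Override public Class<? extends Task> taskClass() { return FileSourceTask.class; }

    @Override public void start(Map<String, String> props) { this.config = props; }
    @Override public void stop() { }

    // Splits the job into at most maxTasks parallel units; each map becomes
    // the start() configuration of one task instance.
    @Override
    public List<Map<String, String>> taskConfigs(int maxTasks) {
        List<Map<String, String>> tasks = new ArrayList<>();
        for (int i = 0; i < maxTasks; i++) {
            tasks.add(config); // sketch: every task gets the same config
        }
        return tasks;
    }
}
```

Build the classes into a JAR and drop it into a directory on each worker's plugin.path; workers pick the plugin up at startup.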

Workers