Kafka Connect


Overview

Kafka is a distributed event-streaming platform designed for high-throughput, low-latency, and fault-tolerant data pipelines. It is used to publish, store, process, and consume streams of records in real time, and is commonly deployed for log aggregation, metrics, messaging, stream processing, and event-driven microservices at scale.

This documentation provides a concise reference for Kafka architecture, configuration, scalability, and operational best practices.

Components

Each Kafka topic is split into multiple partitions: ordered, append-only logs stored across different brokers. Spreading a topic's data across partitions is what lets Kafka scale horizontally and handle large workloads across many machines.
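The append-only log described above can be sketched as a minimal model: each record appended to a partition receives the next sequential offset, and consumers read forward from an offset. This is an illustrative toy (the `Partition` class and its methods are hypothetical), not Kafka's actual storage implementation.

```python
# Minimal sketch of one partition as an ordered, append-only log.
# Illustrative only -- Kafka's real log is segmented files on disk.

class Partition:
    """Records get monotonically increasing offsets; existing records are never mutated."""

    def __init__(self):
        self._log = []

    def append(self, record):
        offset = len(self._log)  # next offset = current log length
        self._log.append(record)
        return offset

    def read_from(self, offset):
        """Consumers read sequentially, starting at a chosen offset."""
        return self._log[offset:]


p = Partition()
p.append("order-created")   # offset 0
p.append("order-paid")      # offset 1
print(p.read_from(0))       # records come back in append order
```

Because offsets are per-partition, a consumer's position in the log is just an integer it can commit and later resume from.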

Partitions enable parallel processing: consumers in a group each read a subset of partitions, increasing throughput. Partitioning also lets a topic grow beyond the capacity of a single broker. If one broker fails, partitions hosted on the remaining brokers stay available, and replication preserves data durability. Ordering is guaranteed only within a partition: records that share a key are routed to the same partition, so per-key ordering is strict.
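The per-key ordering guarantee above follows from deterministic partition assignment: the same key always hashes to the same partition. The sketch below uses `zlib.crc32` as a stand-in hash (Kafka's default partitioner uses murmur2); `partition_for` and `NUM_PARTITIONS` are hypothetical names for illustration.

```python
# Sketch of key-based partition assignment.
# Assumption: zlib.crc32 stands in for Kafka's murmur2 hash.
import zlib

NUM_PARTITIONS = 3

def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Same key -> same partition, so all records for that key
    land in one ordered log and per-key ordering is preserved."""
    return zlib.crc32(key) % num_partitions

# Every event for user-42 maps to the same partition, in send order;
# different keys may land on different partitions and are processed in parallel.
print(partition_for(b"user-42") == partition_for(b"user-42"))
```

Note the trade-off this implies: records with no key are spread across partitions for balance, at the cost of any cross-record ordering.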