Apache Kafka Core Concepts
Apache Kafka is a distributed streaming platform widely used for building real-time data pipelines and streaming applications. It is designed for high throughput, fault tolerance, and scalability. In this article, we will explore the core concepts of Kafka, along with some advanced topics, and provide examples in Golang to help you understand how to work with Kafka.
1. Topics
A topic in Kafka is a category or feed name to which messages are sent by producers. Topics act as logs where messages are stored and organized.
- Key Features:
- Topics are split into partitions for scalability.
- Topics can have one or more consumers.
- Data in a topic is retained based on the configured retention period.
- Example: A topic named `orders` could be used to store all order-related events in an e-commerce system.
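For illustration, here is a minimal sketch of creating that `orders` topic programmatically. It uses the third-party segmentio/kafka-go client (one of several Go Kafka libraries); the broker address and the partition and replication counts are assumptions for this example:

```go
package main

import (
	"log"
	"net"
	"strconv"

	"github.com/segmentio/kafka-go"
)

func main() {
	// Dial any broker in the cluster (address assumed for this sketch).
	conn, err := kafka.Dial("tcp", "localhost:9092")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Topic creation is handled by the cluster controller, so look it
	// up and open a second connection to it.
	controller, err := conn.Controller()
	if err != nil {
		log.Fatal(err)
	}
	ctrl, err := kafka.Dial("tcp", net.JoinHostPort(controller.Host, strconv.Itoa(controller.Port)))
	if err != nil {
		log.Fatal(err)
	}
	defer ctrl.Close()

	// An "orders" topic split into 3 partitions for parallelism.
	err = ctrl.CreateTopics(kafka.TopicConfig{
		Topic:             "orders",
		NumPartitions:     3,
		ReplicationFactor: 1,
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Println("topic orders created")
}
```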
2. Producer
A producer is a client application that publishes messages to a Kafka topic.
- Key Features:
- Producers can specify which partition a message should be sent to.
- Producers can include a message key to direct messages to specific partitions.
- Acknowledgment settings allow tuning between performance and reliability.
- Example: An inventory system that publishes stock updates to the `inventory` topic.
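A minimal producer sketch with segmentio/kafka-go might look like the following; the broker address and payload are assumptions, and `RequiredAcks` illustrates the acknowledgment tuning mentioned above:

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	// Writer publishes to the "inventory" topic. RequireAll waits for
	// all in-sync replicas to acknowledge, trading latency for safety.
	w := &kafka.Writer{
		Addr:         kafka.TCP("localhost:9092"),
		Topic:        "inventory",
		RequiredAcks: kafka.RequireAll,
	}
	defer w.Close()

	err := w.WriteMessages(context.Background(), kafka.Message{
		Key:   []byte("sku-42"), // the key also steers partition choice
		Value: []byte(`{"sku":"sku-42","stock":17}`),
	})
	if err != nil {
		log.Fatal(err)
	}
}
```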
3. Consumer
A consumer is a client application that subscribes to topics and processes messages.
- Key Features:
- Consumers read messages sequentially from a topic’s partition.
- Consumers keep track of their progress using offsets.
- Example: A billing system that processes new orders from the `orders` topic.
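A consumer sketch for that billing example, again with segmentio/kafka-go (the broker address and group name are assumptions):

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	// Reader subscribes to "orders"; the GroupID tells Kafka to track
	// this consumer's offsets under the "billing" group.
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},
		Topic:   "orders",
		GroupID: "billing",
	})
	defer r.Close()

	for {
		m, err := r.ReadMessage(context.Background())
		if err != nil {
			log.Fatal(err)
		}
		log.Printf("order at partition %d offset %d: %s", m.Partition, m.Offset, m.Value)
	}
}
```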
4. Consumer Groups
A consumer group is a group of consumers working together to process messages from a topic.
- Key Features:
- Each partition in a topic is consumed by only one consumer within a group.
- Allows horizontal scaling of message consumption.
- Example: A topic with 5 partitions can be processed by 5 consumers in a group, each consuming one partition.
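To sketch how a group scales, the snippet below starts three readers that share one GroupID; Kafka splits the topic's partitions among them automatically (addresses and names are assumptions):

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	// Three readers with the same GroupID form one consumer group:
	// each partition of "orders" is assigned to exactly one of them.
	for i := 0; i < 3; i++ {
		go func(id int) {
			r := kafka.NewReader(kafka.ReaderConfig{
				Brokers: []string{"localhost:9092"},
				Topic:   "orders",
				GroupID: "order-processors",
			})
			defer r.Close()
			for {
				m, err := r.ReadMessage(context.Background())
				if err != nil {
					log.Printf("reader %d stopping: %v", id, err)
					return
				}
				log.Printf("reader %d got partition %d offset %d", id, m.Partition, m.Offset)
			}
		}(i)
	}
	select {} // block forever; readers run in the background
}
```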
5. Partitions
A partition is a unit of parallelism in Kafka topics.
- Key Features:
- Each partition is an append-only log.
- Partitions enable scalability and fault tolerance.
- Messages within a partition are ordered.
- Example: A topic `user-logs` with 3 partitions distributes messages among three brokers, improving throughput.
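A reader can also bind to a single partition instead of a group, which makes the per-partition ordering visible. A sketch (topic and address assumed):

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	// Bind to partition 0 of "user-logs" directly (no consumer group).
	// Messages arrive strictly in offset order within this partition.
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers:   []string{"localhost:9092"},
		Topic:     "user-logs",
		Partition: 0,
	})
	defer r.Close()

	for {
		m, err := r.ReadMessage(context.Background())
		if err != nil {
			log.Fatal(err)
		}
		log.Printf("offset %d: %s", m.Offset, m.Value)
	}
}
```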
6. Offsets
An offset is a monotonically increasing sequence number that uniquely identifies a message within a partition.
- Key Features:
- Consumers use offsets to keep track of which messages have been processed.
- Offsets allow re-reading messages if necessary.
- Example: A consumer can start processing from offset `100` in partition `1` of a topic.
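With a partition-bound reader, segmentio/kafka-go lets you seek to an absolute offset, matching the re-reading example above (the offset, partition, and topic are assumptions):

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers:   []string{"localhost:9092"},
		Topic:     "orders",
		Partition: 1,
	})
	defer r.Close()

	// Seek to offset 100 and re-read from there. SetOffset only works
	// on partition-bound readers, not on readers that use a GroupID.
	if err := r.SetOffset(100); err != nil {
		log.Fatal(err)
	}
	m, err := r.ReadMessage(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("resumed at offset %d: %s", m.Offset, m.Value)
}
```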
7. Error Handling
Error handling ensures reliable message delivery and processing.
- Key Features:
- Producers handle delivery failures, typically by retrying them.
- Consumers handle errors such as message deserialization issues or processing failures.
- Example: If a consumer fails to process a message, it can log the error and retry based on the application logic.
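On the producer side, a sketch of this pattern with segmentio/kafka-go: `MaxAttempts` covers transient failures inside the client, and the surrounding code decides what to do once those retries are exhausted (all names are assumptions):

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	w := &kafka.Writer{
		Addr:        kafka.TCP("localhost:9092"),
		Topic:       "orders",
		MaxAttempts: 5, // client retries transient delivery failures itself
	}
	defer w.Close()

	msg := kafka.Message{Value: []byte("order-created")}
	if err := w.WriteMessages(context.Background(), msg); err != nil {
		// Client-level retries are exhausted: log and fall back to
		// application logic, e.g. a dead-letter topic or a local queue.
		log.Printf("delivery failed: %v", err)
	}
}
```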
Advanced Kafka Concepts
8. Message Keys
A message key is an optional value that producers can include with each message.
- Key Features:
- Determines the partition for the message.
- Ensures per-key ordering: all messages with the same key go to the same partition.
- Example: In a topic `user-actions`, a key `user123` ensures all actions for `user123` are sent to the same partition.
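A sketch of key-based routing: the `Hash` balancer in segmentio/kafka-go hashes the message key, so both events for `user123` below land on the same partition (topic and address are assumptions):

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	w := &kafka.Writer{
		Addr:     kafka.TCP("localhost:9092"),
		Topic:    "user-actions",
		Balancer: &kafka.Hash{}, // same key => same partition => per-key order
	}
	defer w.Close()

	err := w.WriteMessages(context.Background(),
		kafka.Message{Key: []byte("user123"), Value: []byte("clicked-checkout")},
		kafka.Message{Key: []byte("user123"), Value: []byte("placed-order")},
	)
	if err != nil {
		log.Fatal(err)
	}
}
```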
9. Custom Partitioning
Custom partitioning allows producers to define logic to determine the target partition.
- Key Features:
- Provides more control over data distribution.
- Useful for ensuring specific data is always written to specific partitions.
- Example: A producer sends high-priority messages to specific partitions using a custom partitioning strategy.
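In segmentio/kafka-go, custom partitioning means implementing the `Balancer` interface. Below is a hypothetical `priorityBalancer` that pins messages whose key starts with `hot:` to the first partition and round-robins everything else (the topic and keys are assumptions):

```go
package main

import (
	"bytes"
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

// priorityBalancer pins high-priority messages (key prefix "hot:") to
// the first partition and spreads all other messages round-robin.
type priorityBalancer struct {
	fallback kafka.RoundRobin
}

func (b *priorityBalancer) Balance(msg kafka.Message, partitions ...int) int {
	if bytes.HasPrefix(msg.Key, []byte("hot:")) {
		return partitions[0]
	}
	return b.fallback.Balance(msg, partitions...)
}

func main() {
	w := &kafka.Writer{
		Addr:     kafka.TCP("localhost:9092"),
		Topic:    "alerts",
		Balancer: &priorityBalancer{},
	}
	defer w.Close()

	err := w.WriteMessages(context.Background(),
		kafka.Message{Key: []byte("hot:payment-failed"), Value: []byte("page the on-call")},
		kafka.Message{Key: []byte("info:login"), Value: []byte("routine event")},
	)
	if err != nil {
		log.Fatal(err)
	}
}
```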
10. Brokers
A broker is a Kafka server that stores topic data.
- Key Features:
- Brokers manage partitions and handle read/write requests.
- Kafka clusters consist of multiple brokers for fault tolerance.
- Example: A cluster with 3 brokers might distribute partitions of a topic `logs` among them.
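Clients usually connect with a list of bootstrap brokers; any one of them lets the client discover the rest of the cluster. A sketch (the broker hostnames are assumptions):

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	// Any reachable broker in this list serves as the bootstrap entry
	// point for discovering the full cluster.
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"broker1:9092", "broker2:9092", "broker3:9092"},
		Topic:   "logs",
		GroupID: "log-readers",
	})
	defer r.Close()

	m, err := r.ReadMessage(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("read offset %d from partition %d", m.Offset, m.Partition)
}
```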
11. Replication
Replication ensures fault tolerance by duplicating partition data across brokers.
- Key Features:
- Each partition has a leader and multiple replicas.
- Kafka automatically promotes one of the replicas to leader if the current leader fails.
- Example: A partition with a replication factor of `3` ensures data is available even if two brokers fail.
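The replication factor is set when a topic is created. Reusing the controller-connection pattern from the Topics sketch, `ReplicationFactor: 3` below asks Kafka to keep three copies of every partition (names and addresses are assumptions, and the cluster needs at least three brokers):

```go
package main

import (
	"log"
	"net"
	"strconv"

	"github.com/segmentio/kafka-go"
)

func must(err error) {
	if err != nil {
		log.Fatal(err)
	}
}

func main() {
	conn, err := kafka.Dial("tcp", "localhost:9092")
	must(err)
	defer conn.Close()

	controller, err := conn.Controller()
	must(err)
	ctrl, err := kafka.Dial("tcp", net.JoinHostPort(controller.Host, strconv.Itoa(controller.Port)))
	must(err)
	defer ctrl.Close()

	// Three replicas per partition: the leader plus two followers.
	must(ctrl.CreateTopics(kafka.TopicConfig{
		Topic:             "payments",
		NumPartitions:     6,
		ReplicationFactor: 3,
	}))
}
```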
12. Retention Policy
Kafka retains messages based on configured policies (time or size).
- Key Features:
- Messages can be retained for a specific duration or until the topic size exceeds a limit.
- Consumers can read older messages if they haven’t been deleted.
- Example: A topic configured with a retention policy of `7 days` keeps data available for a week.
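Retention is a per-topic configuration. In the creation sketch below, `retention.ms` of 604800000 milliseconds equals the 7 days from the example (topic name and address are assumptions):

```go
package main

import (
	"log"
	"net"
	"strconv"

	"github.com/segmentio/kafka-go"
)

func must(err error) {
	if err != nil {
		log.Fatal(err)
	}
}

func main() {
	conn, err := kafka.Dial("tcp", "localhost:9092")
	must(err)
	defer conn.Close()

	controller, err := conn.Controller()
	must(err)
	ctrl, err := kafka.Dial("tcp", net.JoinHostPort(controller.Host, strconv.Itoa(controller.Port)))
	must(err)
	defer ctrl.Close()

	// retention.ms = 604800000 ms = 7 days of history per partition.
	must(ctrl.CreateTopics(kafka.TopicConfig{
		Topic:             "audit-events",
		NumPartitions:     3,
		ReplicationFactor: 1,
		ConfigEntries: []kafka.ConfigEntry{
			{ConfigName: "retention.ms", ConfigValue: "604800000"},
		},
	}))
}
```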
13. Compaction
Log compaction removes old messages with the same key, retaining only the latest value.
- Key Features:
- Helps manage storage efficiently.
- Ensures consumers always have the latest state for each key.
- Example: A topic `user-profiles` uses compaction to store only the most recent update to each user's profile.
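Compaction is enabled the same way, via the `cleanup.policy` topic config. Only the `CreateTopics` call changes relative to the retention sketch above (names and address remain assumptions):

```go
package main

import (
	"log"
	"net"
	"strconv"

	"github.com/segmentio/kafka-go"
)

func must(err error) {
	if err != nil {
		log.Fatal(err)
	}
}

func main() {
	conn, err := kafka.Dial("tcp", "localhost:9092")
	must(err)
	defer conn.Close()

	controller, err := conn.Controller()
	must(err)
	ctrl, err := kafka.Dial("tcp", net.JoinHostPort(controller.Host, strconv.Itoa(controller.Port)))
	must(err)
	defer ctrl.Close()

	// cleanup.policy=compact keeps only the newest value per key.
	must(ctrl.CreateTopics(kafka.TopicConfig{
		Topic:             "user-profiles",
		NumPartitions:     3,
		ReplicationFactor: 1,
		ConfigEntries: []kafka.ConfigEntry{
			{ConfigName: "cleanup.policy", ConfigValue: "compact"},
		},
	}))
}
```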
14. Streams API
The Kafka Streams API (a Java library) is used for processing and transforming data within Kafka.
- Key Features:
- Allows building applications that consume, process, and produce messages.
- Supports stateful operations.
- Example: A streaming application aggregates clickstream data to calculate real-time website statistics.
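Since Kafka Streams has no official Go port, a common Go analogue is a consume-transform-produce loop. The sketch below keeps a simple in-memory count per key (unlike Streams' fault-tolerant state stores) and republishes each event to a downstream topic; all topic and group names are assumptions:

```go
package main

import (
	"context"
	"log"
	"strings"

	"github.com/segmentio/kafka-go"
)

func main() {
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},
		Topic:   "clicks",
		GroupID: "click-stats",
	})
	defer r.Close()

	w := &kafka.Writer{
		Addr:  kafka.TCP("localhost:9092"),
		Topic: "clicks-normalized",
	}
	defer w.Close()

	// Stateful step: count events per key in memory. Note this state
	// is lost on restart, unlike a Kafka Streams state store.
	counts := map[string]int{}

	for {
		m, err := r.ReadMessage(context.Background())
		if err != nil {
			log.Fatal(err)
		}
		key := string(m.Key)
		counts[key]++

		// Transform step: normalize the payload, then produce downstream.
		out := kafka.Message{
			Key:   m.Key,
			Value: []byte(strings.ToLower(string(m.Value))),
		}
		if err := w.WriteMessages(context.Background(), out); err != nil {
			log.Fatal(err)
		}
		log.Printf("key %s seen %d times", key, counts[key])
	}
}
```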
Summary of Use Case Examples
- Order Processing: Producers publish to `orders`; consumers in a group process each order for billing and shipping.
- Real-Time Analytics: Producers send website logs to a `logs` topic; stream processors analyze user behavior.
- Event Sourcing: A topic stores events for rebuilding application state when needed.
Conclusion
Kafka's core building blocks (topics, partitions, producers, consumers, and consumer groups), together with features like replication, retention, and compaction, provide a solid foundation for building reliable, scalable streaming applications.
Keep Coding! 😊