How Kafka Stores Data Internally

Kafka’s Disk Storage: The Core Idea

When you think about how Apache Kafka stores data, it’s easy to get lost in the distributed nature, the replication, and the high throughput. But at its heart, Kafka is pretty straightforward about how it keeps messages on disk. It’s all about logs.

Each Kafka topic is split into partitions. And each partition is simply an ordered, immutable sequence of records. Kafka stores each partition as a sequence of files on the broker’s local filesystem. These files are called log segments.

Log Segments Explained

Imagine a massive append-only log. That’s essentially what a partition is. Kafka doesn’t try to be clever by using databases or complex indexing schemes for the primary storage of messages. It just writes them sequentially.

Each partition directory on a broker contains multiple files. These files are the log segments. A log segment file has a naming convention based on the base offset it starts with. For example, you might see files like 00000000000000000000.log and 00000000000000000010.log.

The number 00000000000000000000 is the base offset. This means the first message in this segment has an offset of 0. The next file, 00000000000000000010.log, starts with an offset of 10. This implies that all messages from offset 0 up to (but not including) offset 10 are stored in the first file.
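The naming rule above can be sketched in a few lines of Python. This is only an illustration: the helper names segment_filename and find_segment are made up for this example, and the only facts taken from the text are the 20-digit zero-padded file names and the rule that a segment holds offsets from its base offset up to the next segment’s base offset.

```python
from bisect import bisect_right


def segment_filename(base_offset, suffix=".log"):
    """Zero-pad the base offset to 20 digits, matching the file names shown above."""
    return f"{base_offset:020d}{suffix}"


def find_segment(base_offsets, target_offset):
    """Return the base offset of the segment that would hold target_offset.

    base_offsets must be sorted in ascending order.
    """
    i = bisect_right(base_offsets, target_offset) - 1
    if i < 0:
        raise ValueError(f"offset {target_offset} is before the first segment")
    return base_offsets[i]


bases = [0, 10]  # the two segments from the example above
print(segment_filename(find_segment(bases, 7)))   # 00000000000000000000.log
print(segment_filename(find_segment(bases, 10)))  # 00000000000000000010.log
```

Because the file names sort in offset order, finding the right segment is a binary search over the base offsets rather than a scan through the files.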

Why Log Segments?

This segmentation approach serves a crucial purpose: it makes log retention cheap. Kafka brokers don’t keep data forever. They have retention policies, either based on time (e.g., keep data for 7 days) or size (e.g., keep the partition size below 10GB). Retention is applied a whole segment at a time: when a log segment is no longer needed according to these policies, its files are simply deleted.

This simple append-and-delete mechanism is efficient. Appending to the end of a file is a fast, sequential operation. Deleting an old segment is just a file removal, so under the delete-based retention described here the broker never has to rewrite existing data or run a garbage collection pass that might slow down message ingestion.
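As a rough sketch of the time-based and size-based policies described above, the following Python picks which segments to drop. Everything here is an assumption made for illustration: the Segment class, its last_modified_ms field, and the segments_to_delete function are hypothetical, the sketch always leaves the newest (active) segment alone, and the exact rules a real broker applies differ in detail.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    base_offset: int
    size_bytes: int
    last_modified_ms: int  # hypothetical field: when the segment was last written


def segments_to_delete(segments, now_ms, retention_ms=None, retention_bytes=None):
    """Pick whole segments to delete, oldest first.

    segments must be ordered oldest to newest; in this sketch the last
    one is treated as the active segment and is never selected.
    """
    closed = list(segments[:-1])
    doomed = []
    # Time-based: drop the oldest segments whose age exceeds the limit.
    if retention_ms is not None:
        while closed and now_ms - closed[0].last_modified_ms > retention_ms:
            doomed.append(closed.pop(0))
    # Size-based: keep dropping the oldest segments while the partition is too big.
    if retention_bytes is not None:
        remaining = sum(s.size_bytes for s in segments) - sum(s.size_bytes for s in doomed)
        while closed and remaining > retention_bytes:
            oldest = closed.pop(0)
            remaining -= oldest.size_bytes
            doomed.append(oldest)
    return doomed


segments = [Segment(0, 100, 0), Segment(10, 100, 5_000), Segment(20, 100, 9_000)]
expired = segments_to_delete(segments, now_ms=10_000, retention_ms=6_000)
print([s.base_offset for s in expired])  # [0]
```

Note that nothing inside a segment is ever inspected or rewritten; the decision is made per file, which is what keeps retention cheap.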

Index Files

So, how does Kafka efficiently find a specific message if it’s just writing to files? It uses index files. Alongside each .log segment file, you’ll find a corresponding .index file (e.g., 00000000000000000000.index).

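These index files are not a full index of every single message. Instead, they are a sparse index: each entry maps a message offset (stored relative to the segment’s base offset) to its byte position within the .log file. Entries are spaced by the amount of data written rather than by message count; by default a new entry is added after roughly every 4 KB of log data (the index.interval.bytes setting). For a large log segment, the index therefore covers only a small fraction of its messages.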

When a consumer requests messages from a specific offset, Kafka uses the index file to quickly locate the approximate position of that message within the .log file. Then, it scans a small portion of the .log file to find the exact message.

Let’s look at a simplified example of what an index file might look like. Suppose we have a .log file starting at offset 0 and containing messages with offsets 0, 1, 2, 3, 4, 5. A sparse index might only store entries for offsets 0 and 3:

# Offset | Position in .log file
# ------ | ---------------------
# 0      | 0 (start of file)
# 3      | <byte offset of message 3>

If a consumer asks for offset 2, Kafka sees that offset 0 is the last indexed offset at or before 2. It jumps to the position of offset 0 in the .log file and reads forward until it finds the message with offset 2.
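The lookup just described (binary search in the sparse index, then a short forward scan) might look like the following toy model. Several things are assumptions for the sake of the example: the record framing (an 8-byte offset and a 4-byte length in front of each payload) is invented here and is not Kafka’s on-disk record format, the function names build_log and read_at are hypothetical, and the index is built every third record only to reproduce the two entries from the example above.

```python
import struct
from bisect import bisect_right

# Toy framing, not Kafka's real format: 8-byte offset, 4-byte payload length.
HEADER = struct.Struct(">qi")


def build_log(payloads, index_every=3):
    """Write records back to back and index only every index_every-th one."""
    log = bytearray()
    index = []  # sparse list of (offset, byte position in log)
    for offset, payload in enumerate(payloads):
        if offset % index_every == 0:
            index.append((offset, len(log)))
        log += HEADER.pack(offset, len(payload)) + payload
    return bytes(log), index


def read_at(log, index, target):
    """Find the last indexed offset <= target, then scan forward to target."""
    indexed_offsets = [offset for offset, _ in index]
    i = bisect_right(indexed_offsets, target) - 1
    if i < 0:
        raise KeyError(target)
    pos = index[i][1]
    while pos < len(log):
        offset, length = HEADER.unpack_from(log, pos)
        pos += HEADER.size
        if offset == target:
            return log[pos:pos + length]
        pos += length  # skip this payload and move to the next record
    raise KeyError(target)


log, index = build_log([b"m0", b"m1", b"m2", b"m3", b"m4", b"m5"])
print(index)                 # [(0, 0), (3, 42)]
print(read_at(log, index, 2))  # b'm2'
```

The scan never has to cover more than the gap between two index entries, so a sparser index trades a smaller index file for a slightly longer scan.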

Other Files

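Besides .log and .index files, you’ll also find files with the .timeindex extension. These are similar to the offset index, but they map timestamps to message offsets within the segment; the offset index then turns that offset into a position in the .log file. This is useful for time-based operations, such as finding the first message at or after a given timestamp.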
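A time-based lookup can be sketched the same way as an offset lookup. The version below is a simplification under stated assumptions: timestamps within the segment never decrease, each sparse entry is simply the (timestamp, offset) of a sampled record, and the function name offset_for_timestamp is hypothetical. The real time index has more subtle semantics than this.

```python
from bisect import bisect_left


def offset_for_timestamp(time_index, records, target_ts):
    """Return the first offset whose timestamp is >= target_ts, or None.

    time_index: sparse, sorted list of (timestamp, offset) entries.
    records:    every (offset, timestamp) pair in the segment, in offset order.
    Assumes timestamps never decrease as offsets increase.
    """
    timestamps = [ts for ts, _ in time_index]
    # Last index entry strictly before the target: the answer lies after it.
    i = bisect_left(timestamps, target_ts) - 1
    start_offset = time_index[i][1] if i >= 0 else records[0][0]
    for offset, ts in records:
        if offset >= start_offset and ts >= target_ts:
            return offset
    return None


records = [(0, 100), (1, 100), (2, 200), (3, 200), (4, 300), (5, 400)]
time_index = [(100, 0), (200, 3)]  # entries sampled at offsets 0 and 3
print(offset_for_timestamp(time_index, records, 250))  # 4
```

In this sketch the loop over records stands in for the combination of the offset index and a forward scan of the .log file.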

Putting It Together

When Kafka receives a message for a particular topic-partition, it appends that message to the current active log segment file for that partition. It does not add an index entry for every message; entries are appended to the index files only periodically, which is what keeps them sparse. Consumers fetch messages by providing the offset they are interested in. Kafka uses the index files to quickly find the relevant data within the log segments on disk.
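To tie the write path together, here is a small in-memory model of a partition that appends to an active segment, starts a new segment when the current one is full, and adds index entries only occasionally. It is a sketch, not broker code: the ToyPartition class, its segment_bytes and index_interval_bytes parameters, and the tiny limits used are all chosen for illustration, and details such as always indexing the first record of a segment are simplifications.

```python
class ToyPartition:
    """In-memory model: append to the active segment, roll when it is full."""

    def __init__(self, segment_bytes=64, index_interval_bytes=32):
        self.segment_bytes = segment_bytes
        self.index_interval_bytes = index_interval_bytes
        self.next_offset = 0
        self.segments = []
        self._roll()

    def _roll(self):
        # A new segment is named after the offset of its first message.
        self.segments.append({
            "base_offset": self.next_offset,
            "log": bytearray(),
            "index": [],  # sparse (relative offset, byte position) entries
            "bytes_since_index": 0,
        })

    def append(self, payload):
        active = self.segments[-1]
        # Roll to a fresh segment if this payload would overflow the active one.
        if active["log"] and len(active["log"]) + len(payload) > self.segment_bytes:
            self._roll()
            active = self.segments[-1]
        # Add an index entry only once enough bytes have been written.
        if not active["index"] or active["bytes_since_index"] >= self.index_interval_bytes:
            relative = self.next_offset - active["base_offset"]
            active["index"].append((relative, len(active["log"])))
            active["bytes_since_index"] = 0
        active["log"] += payload
        active["bytes_since_index"] += len(payload)
        offset = self.next_offset
        self.next_offset += 1
        return offset


partition = ToyPartition()
for _ in range(10):
    partition.append(b"x" * 16)
print([s["base_offset"] for s in partition.segments])  # [0, 4, 8]
print(partition.segments[0]["index"])                  # [(0, 0), (2, 32)]
```

Only the last segment in the list ever receives writes; the earlier ones are closed and can be handed to retention as whole files.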

This layered approach, using sequential writes for the primary data store and sparse indexes for quick lookups, is what makes Kafka so performant and scalable. It’s a clever, simple design that holds up remarkably well under heavy load.