Command
- Single-broker cluster
bin/zookeeper-server-start.sh config/zookeeper.properties <– no need this if using external zookeeper
bin/kafka-server-start.sh config/server.properties
bin/kafka-topics.sh –create –zookeeper localhost:2181 –replication-factor 1 –partitions 1 –topic test
bin/kafka-topics.sh –create –zookeeper localhost:2181 –replication-factor 1 –partitions 1 –topic test-deadleter
bin/kafka-topics.sh –list –zookeeper localhost:2181
bin/kafka-console-producer.sh –broker-list localhost:9092 –topic test
bin/kafka-console-consumer.sh –zookeeper localhost:2181 –topic test –from-beginning
server.properties
listeners=PLAINTEXT://:9092 #listener point for producer
zookeeper.connect=localhost:2181 // <– zookeeper client port
- multi-broker cluster
First we make a config file for each of the brokers:
> cp config/server.properties config/server-1.properties
> cp config/server.properties config/server-2.properties
Now edit these new files and set the following properties:
config/server-1.properties:
broker.id=1
listeners=PLAINTEXT://:9093
log.dir=/tmp/kafka-logs-1
config/server-2.properties:
broker.id=2
listeners=PLAINTEXT://:9094
log.dir=/tmp/kafka-logs-2
bin/kafka-server-start.sh config/server.properties &
bin/kafka-server-start.sh config/server-1.properties &
bin/kafka-server-start.sh config/server-2.properties &
bin/kafka-topics.sh –create –zookeeper localhost:2181 –replication-factor 3 –partitions 1 –topic my-replicated-topic
bin/kafka-console-producer.sh –broker-list localhost:9092 –topic my-replicated-topic
bin/kafka-console-consumer.sh –zookeeper localhost:2181 –from-beginning –topic my-replicated-topic
- Cluster tools
- Kafka provides tools for a controlled shutdown of Kafka brokers. If the broker has the lead partition shut down, this tool transfers the leadership proactively to other in-sync replicas on another broker. If there is no in-sync replica available, the tool will fail to shut down the broker in order to ensure no data is lost.
The following is the format for using this tool:
[root@localhost kafka_2.9.2-0.8.1.1]# bin/kafka-run-class.sh kafka.admin.ShutdownBroker –zookeeper <zookeeper_host:port/namespace> –broker <brokerID> –num.retries 3 –retry.interval.ms 100
- Kafka provides a tool that is used to maintain a balanced distribution of lead replicas within the Kafka cluster across available brokers.
The following is the format for using this tool:
[root@localhost kafka_2.9.2-0.8.1.1]# bin/kafka-preferred-replica-election.sh –zookeeper <zookeeper_host:port/namespace>
- move partitions across brokers
[root@localhost kafka_2.9.2-0.8.1.1]# cat topics-for-new-server.json
{“partitions”:
[{“topic”: “kafkatopic”,
{“topic”: “kafkatopic1”}],
“version”:1
}
[root@localhost kafka_2.9.2-0.8.1.1]# bin/kafka-reassign-partitions.sh –zookeeper localhost:2181
–topics-to-move-json-file topics-for-new-server.json –broker-list “4,5” -–generate new-topic-reassignment.json
- Kafka provides a mirror maker tool for mirroring the source cluster into a target cluster.
[root@localhost kafka_2.9.2-0.8.1.1]# bin/kafka-run-class.sh kafka.tools.MirrorMaker –consumer.config sourceClusterConsumer.config –num.streams 2 –producer.config targetClusterProducer.config –whitelist=”.*”
- Delete topic
- Add this to server.property
delete.topic.enable=true
- bin/kafka-topics.sh –zookeeper localhost:2181 –delete –topic test
- The layout and contents of the __consumer_offsets topics is a Kafka implementation detail and you should not try to access it directly. Instead use the ConsumerMetadata, OffsetCommit and OffsetFetch API requests
Concept
- Topic
A topic is a category or feed name to which messages are published. For each topic, the ‘ cluster maintains a partitioned log that looks like this:
Each partition is an ordered, immutable sequence of messages that is continually appended to—a commit log. The messages in the partitions are each assigned a sequential id number called the offset that uniquely identifies each message within the partition, This offset is controlled by the consumer
The Kafka cluster retains all published messages—whether or not they have been consumed—for a configurable period of time
A message within a topic is consumed by a single process (consumer) within the consumer group and, if the requirement is such that a single message is to be consumed by multiple consumers, all these consumers need to be kept in different consumer groups. Consumers always consume messages from a particular partition sequentially and also acknowledge the message offset
- Distribution
The partitions of the log are distributed over the servers in the Kafka cluster with each server handling data and requests for a share of the partitions. Each partition is replicated across a configurable number of servers for fault tolerance.
- Producers
Producers publish data to the topics of their choice. The producer is responsible for choosing which message to assign to which partition within the topic.
- Consumers
Messaging traditionally has two models: queuing and publish-subscribe. In a queue, a pool of consumers may read from a server and each message goes to one of them; in publish-subscribe the message is broadcast to all consumers. Kafka offers a single consumer abstraction that generalizes both of these—the consumer group.
Consumers label themselves with a consumer group name, and each message published to a topic is delivered to one consumer instance within each subscribing consumer group
If all the consumer instances have the same consumer group, then this works just like a traditional queue balancing load over the consumers.
If all the consumer instances have different consumer groups, then this works like publish-subscribe and all messages are broadcast to all consumers.
- Message ordering
Kafka has stronger ordering guarantees than a traditional messaging system
Kafka is able to provide both ordering guarantees and load balancing over a pool of consumer processes. This is achieved by assigning the partitions in the topic to the consumers in the consumer group so that each partition is consumed by exactly one consumer in the group
Kafka only provides a total order over messages within a partition, not between different partitions in a topic. Per-partition ordering combined with the ability to partition data by key is sufficient for most applications. However, if you require a total order over messages this can be achieved with a topic that has only one partition, though this will mean only one consumer process per consumer group.
- Guarantees
At a high-level Kafka gives the following guarantees:
Messages sent by a producer to a particular topic partition will be appended in the order they are sent. That is, if a message M1 is sent by the same producer as a message M2, and M1 is sent first, then M1 will have a lower offset than M2 and appear earlier in the log.
A consumer instance sees messages in the order they are stored in the log.
For a topic with replication factor N, we will tolerate up to N-1 server failures without losing any messages committed to the log.
- Each partition available on either of the servers acts as the leader and has zero or more servers acting as followers. Here the leader is responsible for handling all read and write requests for the partition while the followers asynchronously replicate data from the leader. Kafka dynamically maintains a set of in-sync replicas (ISR) that are caught-up to the leader and always persist the latest ISR set to ZooKeeper. In if the leader fails, one of the followers (in-sync replicas) will automatically become the new leader. In a Kafka cluster, each server plays a dual role; it acts as a leader for some of its partitions and also a follower for other partitions. This ensures the load balance within the Kafka cluster.
- Broker
In line with Kafka’s design, brokers are stateless, which means the message state of any consumed message is maintained within the message consumer, and the Kafka broker does not maintain a record of what is consumed by whom
There are multiple possible ways to deliver messages, such as:
- At most once—Messages may be lost but are never redelivered.
- At least once—Messages are never lost but may be redelivered.
- Exactly once—this is what people actually want, each message is delivered once and only once.
When publishing a message we have a notion of the message being “committed” to the log. Once a published message is committed it will not be lost. These are not the strongest possible semantics for publishers
For consumers, Kafka guarantees that the message will be delivered at least once by reading the messages, processing the messages, and finally saving their position. If the consumer process crashes after processing messages but before saving their position, another consumer process takes over the topic partition and may receive the first few messages, which are already processed.
- For the cases where network bandwidth is a bottleneck, Kafka provides a message group compression feature for efficient message delivery
compression.codec = null/snappy/gzip