#onenote# kafka



  • Single-broker cluster


bin/zookeeper-server-start.sh config/zookeeper.properties  <– no need this if using external zookeeper

bin/kafka-server-start.sh config/server.properties

bin/kafka-topics.sh –create –zookeeper localhost:2181 –replication-factor 1 –partitions 1 –topic test

bin/kafka-topics.sh –create –zookeeper localhost:2181 –replication-factor 1 –partitions 1 –topic test-deadleter

bin/kafka-topics.sh –list –zookeeper localhost:2181


bin/kafka-console-producer.sh –broker-list localhost:9092 –topic test

bin/kafka-console-consumer.sh –zookeeper localhost:2181 –topic test –from-beginning





listeners=PLAINTEXT://:9092     #listener point for producer

zookeeper.connect=localhost:2181   // <– zookeeper client port





  • multi-broker cluster


First we make a config file for each of the brokers:

> cp config/server.properties config/server-1.properties
> cp config/server.properties config/server-2.properties

Now edit these new files and set the following properties:



bin/kafka-server-start.sh config/server.properties &

bin/kafka-server-start.sh config/server-1.properties &

bin/kafka-server-start.sh config/server-2.properties &


bin/kafka-topics.sh –create –zookeeper localhost:2181 –replication-factor 3 –partitions 1 –topic my-replicated-topic


bin/kafka-console-producer.sh –broker-list localhost:9092 –topic my-replicated-topic

bin/kafka-console-consumer.sh –zookeeper localhost:2181 –from-beginning –topic my-replicated-topic



  • Cluster tools



      • Kafka provides tools for a controlled shutdown of Kafka brokers. If the broker has the lead partition shut down, this tool transfers the leadership proactively to other in-sync replicas on another broker. If there is no in-sync replica available, the tool will fail to shut down the broker in order to ensure no data is lost.


The following is the format for using this tool:


[root@localhost kafka_2.9.2-]# bin/kafka-run-class.sh kafka.admin.ShutdownBroker –zookeeper <zookeeper_host:port/namespace> –broker <brokerID> –num.retries 3 –retry.interval.ms 100



      • Kafka provides a tool that is used to maintain a balanced distribution of lead replicas within the Kafka cluster across available brokers.


The following is the format for using this tool:


[root@localhost kafka_2.9.2-]# bin/kafka-preferred-replica-election.sh –zookeeper <zookeeper_host:port/namespace>



      • move partitions across brokers

[root@localhost kafka_2.9.2-]# cat topics-for-new-server.json


[{“topic”: “kafkatopic”,

{“topic”: “kafkatopic1”}],




[root@localhost kafka_2.9.2-]# bin/kafka-reassign-partitions.sh –zookeeper localhost:2181

–topics-to-move-json-file topics-for-new-server.json –broker-list “4,5” -–generate new-topic-reassignment.json



      •  Kafka provides a mirror maker tool for mirroring the source cluster into a target cluster.

[root@localhost kafka_2.9.2-]# bin/kafka-run-class.sh kafka.tools.MirrorMaker –consumer.config sourceClusterConsumer.config –num.streams 2 –producer.config targetClusterProducer.config –whitelist=”.*”


      1. Delete topic
        1. Add this to server.property


      1. bin/kafka-topics.sh –zookeeper localhost:2181 –delete –topic test


      1. The layout and contents of the __consumer_offsets topics is a Kafka implementation detail and you should not try to access it directly. Instead use the ConsumerMetadata, OffsetCommit and OffsetFetch API requests






      1. Topic

A topic is a category or feed name to which messages are published. For each topic, the ‘   cluster maintains a partitioned log that looks like this:

Each partition is an ordered, immutable sequence of messages that is continually appended to—a commit log. The messages in the partitions are each assigned a sequential id number called the offset that uniquely identifies each message within the partition,  This offset is controlled by the consumer

The Kafka cluster retains all published messages—whether or not they have been consumed—for a configurable period of time


A message within a topic is consumed by a single process (consumer) within the consumer group and, if the requirement is such that a single message is to be consumed by multiple consumers, all these consumers need to be kept in different consumer groups. Consumers always consume messages from a particular partition sequentially and also acknowledge the message offset



  • Distribution


The partitions of the log are distributed over the servers in the Kafka cluster with each server handling data and requests for a share of the partitions. Each partition is replicated across a configurable number of servers for fault tolerance.




  • Producers


Producers publish data to the topics of their choice. The producer is responsible for choosing which message to assign to which partition within the topic.



  • Consumers


Messaging traditionally has two models: queuing and publish-subscribe. In a queue, a pool of consumers may read from a server and each message goes to one of them; in publish-subscribe the message is broadcast to all consumers. Kafka offers a single consumer abstraction that generalizes both of these—the consumer group.


Consumers label themselves with a consumer group name, and each message published to a topic is delivered to one consumer instance within each subscribing consumer group


If all the consumer instances have the same consumer group, then this works just like a traditional queue balancing load over the consumers.


If all the consumer instances have different consumer groups, then this works like publish-subscribe and all messages are broadcast to all consumers.



  • Message ordering


Kafka has stronger ordering guarantees than a traditional messaging system

Kafka is able to provide both ordering guarantees and load balancing over a pool of consumer processes. This is achieved by assigning the partitions in the topic to the consumers in the consumer group so that each partition is consumed by exactly one consumer in the group


Kafka only provides a total order over messages within a partition, not between different partitions in a topic. Per-partition ordering combined with the ability to partition data by key is sufficient for most applications. However, if you require a total order over messages this can be achieved with a topic that has only one partition, though this will mean only one consumer process per consumer group.



  • Guarantees



At a high-level Kafka gives the following guarantees:


Messages sent by a producer to a particular topic partition will be appended in the order they are sent. That is, if a message M1 is sent by the same producer as a message M2, and M1 is sent first, then M1 will have a lower offset than M2 and appear earlier in the log.


A consumer instance sees messages in the order they are stored in the log.


For a topic with replication factor N, we will tolerate up to N-1 server failures without losing any messages committed to the log.



      1. Each partition available on either of the servers acts as the leader and has zero or more servers acting as followers. Here the leader is responsible for handling all read and write requests for the partition while the followers asynchronously replicate data from the leader. Kafka dynamically maintains a set of in-sync replicas (ISR) that are caught-up to the leader and always persist the latest ISR set to ZooKeeper. In if the leader fails, one of the followers (in-sync replicas) will automatically become the new leader. In a Kafka cluster, each server plays a dual role; it acts as a leader for some of its partitions and also a follower for other partitions. This ensures the load balance within the Kafka cluster.


      1. Broker

In line with Kafka’s design, brokers are stateless, which means the message state of any consumed message is maintained within the message consumer, and the Kafka broker does not maintain a record of what is consumed by whom


There are multiple possible ways to deliver messages, such as:


      • At most once—Messages may be lost but are never redelivered.
      • At least once—Messages are never lost but may be redelivered.
      • Exactly once—this is what people actually want, each message is delivered once and only once.



When publishing a message we have a notion of the message being “committed” to the log. Once a published message is committed it will not be lost.  These are not the strongest possible semantics for publishers



For consumers, Kafka guarantees that the message will be delivered at least once by reading the messages, processing the messages, and finally saving their position. If the consumer process crashes after processing messages but before saving their position, another consumer process takes over the topic partition and may receive the first few messages, which are already processed.


      1. For the cases where network bandwidth is a bottleneck, Kafka provides a message group compression feature for efficient message delivery

compression.codec  = null/snappy/gzip





Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s