This tutorial is adapted from Web Age course Confluent Kafka for System Administrators.
Regarding OS Platforms for Confluent Platform
Linux is the primary platform for deploying Confluent Kafka. macOS is supported for testing and development purposes. Windows is not a supported platform. Confluent Platform can be deployed on-premises or in cloud environments such as AWS, Google Cloud Platform, and Azure.
Confluent Platform currently supports these Linux OSs: RHEL/CentOS 7.x, Debian 8/9, and Ubuntu 16.04 LTS/18.04 LTS.
An updated list of supported OSs can be found here:
Confluent Platform runs on Java 8. Java 7 and earlier are no longer supported. Java 9/10 are not supported. The HDFS connector does not support Java 11.
Confluent Platform can be run for development purposes on a machine with 4 GB of RAM. System requirements for each component of the platform are typically higher in production. For example: Control Center needs 300 GB of storage, 32 GB of RAM, and 8 or more CPU cores; brokers need multi-TB storage, 64 GB of RAM, and dual 12-core CPUs; KSQL needs SSD storage, 20 GB of RAM, and 4 cores.
A complete list of components and requirements can be found here:
Confluent can be run on dedicated hardware by installing the platform locally or in the cloud by going to https://confluent.cloud. Confluent can be installed locally from platform packages (installs the entire platform at once) or individual component packages (installs individual components).
Install images for the Confluent platform can be obtained from the downloads page here:
You can choose to download any of the following formats:
For development and testing, follow the steps below:
For Production, follow the steps below:
Docker is a technology that runs applications inside containers. With Docker, software such as Confluent Platform can be configured and run inside a container without altering the setup and configuration of the underlying operating system. Containers for Confluent Platform components can be created from images available on Docker Hub. The Docker software must be installed and the Docker engine must be running in order to download Docker images and run Docker containers.
docker pull confluentinc/cp-kafka
docker run -d --net=host confluentinc/cp-kafka
Full instructions for running Confluent Platform on Docker can be found here:
For a usable installation you will typically need to run at least ZooKeeper and Kafka:
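# Start ZooKeeper (the original listing is truncated; --net=host and the image name are assumed)
docker run -d \
--net=host \
-e ZOOKEEPER_CLIENT_PORT=32181 \
-e ZOOKEEPER_TICK_TIME=2000 \
confluentinc/cp-zookeeper
# Start a Kafka broker that registers with the ZooKeeper instance above
docker run -d \
--net=host \
-e KAFKA_ZOOKEEPER_CONNECT=localhost:32181 \
-e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://localhost:29092 \
confluentinc/cp-kafka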
When planning a Kafka cluster, the following two areas should be considered:
Sizing for throughput
Sizing for storage
Know the expected throughput of your Producer(s) and Consumer(s). The system is only as fast as its weakest link. Consider the size of the produced messages. Know how many consumers will be consuming from the Topics/Partitions, and at what rate each consumer can process messages of that size.
Increasing the number of brokers and configuring replication across brokers are mechanisms for achieving parallelism and higher throughput.
They also help to achieve high availability (HA). HA may not be a factor when running Dev and Test environments, but in a Production environment it is strongly recommended to deploy multiple brokers (3+). ZooKeeper plays a critical role in broker cluster management by keeping track of which brokers are leaving the cluster and which new ones are joining it. Leader election and configuration management are also the responsibility of ZooKeeper. ZooKeeper should also be deployed as an HA cluster. The recommendation for sizing your ZooKeeper cluster is to use:
1 instance for Dev/Test Environments, 3 instances to plan for 1 node failure and 5 instances to plan for 2 node failures.
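The instance counts above are consistent with a majority-quorum rule: to keep a majority alive after f node failures, an ensemble needs 2f + 1 members. A minimal shell sketch of that arithmetic follows; the function name `zk_ensemble_size` is hypothetical and used only for illustration.

```shell
# Hypothetical helper: ensemble size needed to tolerate a given number of node failures.
# A majority of nodes must survive, so the ensemble needs 2 * failures + 1 members.
zk_ensemble_size() {
  echo $(( 2 * $1 + 1 ))
}

for failures in 0 1 2; do
  echo "tolerate $failures failure(s): $(zk_ensemble_size "$failures") instance(s)"
done
```

This reproduces the 1, 3, and 5 instance recommendations for 0, 1, and 2 tolerated failures.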
The number of partitions depends on the desired throughput and the degree of parallelism that your Producer/Consumer ecosystem can support.
Generally speaking, increasing the number of partitions on a given topic increases your throughput linearly. The throughput bottleneck could end up being the rate at which your Producer can produce or the rate at which your consumers can consume.
Simple formula to size for topics and partitions:
Let's say the desired Throughput is “t”.
Max Producer throughput is “p”
Max Consumer throughput is “c”.
Number of Partitions = max ( t/p, t/c).
A rule of thumb often used is to have at least as many partitions as the number of consumers in the largest consumer group.
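As a worked illustration of the formula above, the sketch below plugs in made-up numbers: t = 100 MB/s, p = 10 MB/s, and c = 20 MB/s. It assumes p and c are measured per partition, and it rounds each ratio up because a topic cannot have a fractional partition. The helper name `ceil_div` is hypothetical.

```shell
# Illustrative numbers only (MB/s); not taken from the course material.
t=100   # desired total throughput
p=10    # max producer throughput, assumed to be measured per partition
c=20    # max consumer throughput, assumed to be measured per partition

# Integer ceiling division: ceil(a / b) = (a + b - 1) / b
ceil_div() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

by_producer=$(ceil_div "$t" "$p")   # t/p
by_consumer=$(ceil_div "$t" "$c")   # t/c

# Number of partitions = max(t/p, t/c)
if [ "$by_producer" -gt "$by_consumer" ]; then
  partitions=$by_producer
else
  partitions=$by_consumer
fi

echo "partitions needed: $partitions"
```

With these inputs the producer side is the limiting factor (10 partitions versus 5), so the topic would be sized at 10 partitions.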
Factors to consider when sizing storage on a broker are the number of topics, the number of partitions per topic, the desired replication factor, message sizes, the retention period, and the rate at which messages are expected to be published to and consumed from the broker.
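These factors can be combined into a rough back-of-the-envelope estimate. The sketch below covers a single topic and uses illustrative inputs only. It assumes data is spread evenly across brokers and ignores compression, index files, and free-space headroom, so treat the result as a starting point rather than a sizing rule. For several topics, repeat the calculation per topic and add the results.

```shell
# Illustrative inputs only; substitute measured values for a real cluster.
msgs_per_sec=1000        # publish rate for the topic
avg_msg_bytes=1024       # average message size
retention_days=7         # retention period
replication_factor=3     # desired replication factor
brokers=3                # number of brokers sharing the data

retention_secs=$(( retention_days * 24 * 60 * 60 ))

# Bytes retained across the cluster, counting every replica.
total_bytes=$(( msgs_per_sec * avg_msg_bytes * retention_secs * replication_factor ))

# Spread evenly over the brokers and convert to whole GiB (rounded up).
gib=$(( 1024 * 1024 * 1024 ))
per_broker_gib=$(( (total_bytes / brokers + gib - 1) / gib ))

echo "approx. storage per broker: ${per_broker_gib} GiB"
```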
You'll probably want to use data from other sources or export data from Kafka to other systems. For many systems, instead of writing custom integration code you can use Kafka Connect to import or export data. Kafka Connect is a tool included with Kafka that imports data into Kafka and exports data from it. It is an extensible tool that runs connectors, which implement the custom logic for interacting with an external system.
In standalone mode, Kafka Connect is started here with three configuration files as parameters. The first configures the Kafka Connect process itself. Each remaining file specifies a connector to create and includes a unique connector name, the connector class to instantiate, and any other configuration required by the connector.
etc/kafka/connect-standalone.properties – this is the configuration for the Kafka Connect process, containing common configuration such as the Kafka brokers to connect to and the serialization format for data.
etc/kafka/connect-file-source.properties – specifies a file source connector. Data is read from the source file and written to a topic configured in this configuration file.
etc/kafka/connect-file-sink.properties – specifies a file sink connector. Data is read from a topic and written to a text file specified in the configuration.
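To make the connector files more concrete, the sketch below writes a source and a sink configuration to a scratch directory. The values mirror the file-connector examples that ship with Apache Kafka, but the connector names, file paths, and topic name should be treated as assumptions to adapt. The launcher path in the trailing comment also assumes a particular install layout.

```shell
# Write illustrative connector configs to a scratch directory so that
# nothing in an existing installation is overwritten.
conf_dir=$(mktemp -d)

# Source connector: reads lines from a file and publishes them to a topic.
cat > "$conf_dir/connect-file-source.properties" <<'EOF'
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=test.txt
topic=connect-test
EOF

# Sink connector: reads from the same topic and appends to a text file.
cat > "$conf_dir/connect-file-sink.properties" <<'EOF'
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=test.sink.txt
topics=connect-test
EOF

echo "connector configs written to $conf_dir"

# Standalone mode takes the worker config first, then one file per connector.
# The launcher path below is an assumption about the install layout:
# bin/connect-standalone etc/kafka/connect-standalone.properties \
#   "$conf_dir/connect-file-source.properties" \
#   "$conf_dir/connect-file-sink.properties"
```

Note that the source connector uses the `topic` key while the sink connector uses `topics`, since a sink can read from more than one topic.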
Apache Kafka comes with default configuration files that you can modify to support a single-broker or multi-broker configuration. It also comes with client tools, such as the producer, the consumer, and Kafka Connect. You can use Apache Kafka from various development tools and frameworks, such as Spring Boot and Node.js.