For a long time I have been interested in Apache Kafka and its applications. Unfortunately, owing to circumstances, work and other personal endeavours, I was not able to really dive into the matter until spring 2019. In April I finally finished the Udemy course “Apache Kafka for Beginners”.
At work, my exposure to Kafka had been limited: we were (ultimately) publishing messages onto a Kafka topic using Oracle Service Bus. In practice, however, this was a Java-built integration, as we were just pushing the messages onto a JMS queue, and an MDB (message-driven bean) listening on that queue propagated the messages to the Kafka cluster.
After completing the first training I became interested in Kafka, especially in its role in real-time event systems, and I decided to take another course, this time on Kafka Streams. I was a bit disappointed that this course focussed quite heavily on Java development and, as an exception, I decided to abandon it before completing it. During one of the Kafka Meetups, I found out that Confluent was offering a very interesting alternative to programming the Kafka Streams API in Java, viz. KSQL.
If you have not yet heard of KSQL, read on! KSQL is the streaming SQL engine that is provided as part of the Confluent Platform and licensed under its Community License. KSQL lets you approach stream processing on Kafka data like executing SQL statements on a relational database.
So, the good parts are:
- much easier than programming the Kafka Streams API
- very accessible using a SQL-like language for defining entities based on a Kafka topic
- both a CLI (KSQL) and a graphical client (through Confluent’s Control Center) are offered
And does it have any bad parts? Mmm, come to think of it: you probably can’t implement all possible scenarios using “just” KSQL, so for more advanced applications you will still need to fall back on programming the Kafka Streams API in Java … but it’s dead simple to get started!
Just as in SQL you do not have to know the exact steps the query engine performs to access the data and arrive at the result, using KSQL insulates the user from having to program the Streams API – KSQL will invoke the required APIs based on your declarative input!
Bits & Bites – Not Bytes
The company that employs me, SynTouch, greatly values knowledge sharing. We’re encouraged to organize hands-on workshops called “Bits & Bites” for our colleagues and interested co-workers, customers etc. The “Bits” stand for the technical aspects; the “Bites” represent the dinner we have, as these events typically start at four o’clock in the afternoon and run until eight or nine in the evening.
This September, a colleague and I organized one of these workshops, in which he introduced us to Event-Driven Architectures as seen from his perspective as an architect. I was responsible for introducing Kafka and KSQL/Kafka Streams, and for creating the material for the hands-on part. We’ve made this material (in Dutch) available on GitHub.
Applications written using Kafka Streams usually consume data from Kafka, perform some slicing and dicing, and publish the transformed data to another Kafka topic. The processing can consist of any number of operations; some of the more popular ones are:
- transformation – perform simple scalar transformations on data
- aggregation – aggregate data on specific fields and compute derived properties like mean values, sums etc.
- windowed data – data can be windowed to have a moving time frame for calculations
- simple projections and filtering – for tailoring the data to your wishes
- joins – just as in SQL, data can be enriched by performing JOINs to other data sources
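To make these operations a little more concrete, here is a minimal sketch in plain Python. It does not use Kafka or the Kafka Streams API at all; the order events, the `customer_city` lookup table and the `process` function are all hypothetical names made up for illustration, and the windowing is simplified to fixed (tumbling) windows of 60 seconds.

```python
from collections import defaultdict

# Hypothetical order events as they might arrive on a topic:
# (timestamp in seconds, customer, amount in euros)
events = [
    (0, "alice", 10.0),
    (20, "bob", 5.0),
    (45, "alice", 7.5),
    (70, "bob", 2.5),
    (95, "alice", 4.0),
]

# Hypothetical lookup table used to enrich the events (the "join")
customer_city = {"alice": "Eindhoven", "bob": "Utrecht"}


def process(events, window_size=60, min_amount=3.0):
    """Filter, transform, join and aggregate events per tumbling window."""
    totals = defaultdict(int)
    for ts, customer, amount in events:
        if amount < min_amount:                # filtering: drop small orders
            continue
        cents = round(amount * 100)            # scalar transformation: euros to cents
        city = customer_city[customer]         # join: enrich with the customer's city
        window_start = ts - ts % window_size   # windowing: assign to a fixed window
        totals[(window_start, city)] += cents  # aggregation: sum per window and city
    return dict(totals)


result = process(events)
print(result)
```

In a real streams application the events would never be available as a complete list, so the aggregate would be updated continuously as records arrive; the sketch only shows what each operation contributes to the result.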
The Kafka Streams concepts used to define a streams application are sources, sinks and processors, which are combined into a topology.
Basically, source processors make the data available to your streams application, sink processors write the data back to Kafka and the other processors do anything in between.
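The idea of a topology can be imitated with a chain of Python generators. This is only an analogy under stated assumptions: the function names (`source`, `non_empty`, `uppercase`, `sink`) are hypothetical, the "topics" are ordinary Python lists, and none of this is the actual Kafka Streams API.

```python
def source(records):
    """Source processor: makes the records of an input topic available."""
    for record in records:
        yield record


def non_empty(stream):
    """Intermediate processor: drops records with an empty value."""
    for key, value in stream:
        if value:
            yield key, value


def uppercase(stream):
    """Intermediate processor: transforms the value of each record."""
    for key, value in stream:
        yield key, value.upper()


def sink(stream, topic):
    """Sink processor: writes the records to an output topic (a list here)."""
    for record in stream:
        topic.append(record)


# Stand-ins for Kafka topics
input_topic = [("k1", "hello"), ("k2", ""), ("k3", "kafka")]
output_topic = []

# Wire the processors together: source -> non_empty -> uppercase -> sink
sink(uppercase(non_empty(source(input_topic))), output_topic)
print(output_topic)
```

Each record flows through the whole chain one at a time, which loosely mirrors how records travel from the source processor through the intermediate processors to the sink.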
For the love of Beer
For the company’s tenth anniversary, a number of my colleagues (and fellow beer aficionados) came up with the plan to brew a special anniversary beer. As they say, the rest is history. In the following years, the recipe was further refined and new beer styles were added to the list of SynBier. At our events, SynBier has become a modern classic that is highly appreciated by colleagues, customers and other consumers.
Although I do have an M.Sc. in Chemistry, I am not really into the brewing process, as I never had a knack for practical laboratory work. I am, however, an avid beer consumer on weekends and holidays. So, when thinking about assignments for the practical part of a “Bits & Bites” session, beer-related scenarios spring to mind quite easily.
So … What is KSQL?
KSQL is Confluent’s “Streaming SQL Engine for Kafka”. Basically, this means you can manipulate your data in Kafka using SQL-like statements. You do not need to program the Kafka Streams API in Java (although KSQL does use the API under the hood). As a “consumer” you do not have to specify the processing steps imperatively; instead, you declaratively specify the manipulations to be performed on the data in the Kafka topic. As in SQL, all kinds of built-in functions are already provided, and if these are not sufficient, you can still bring out the ol’ Java IDE and roll your own User Defined Functions (for operations on a single record) or User Defined Aggregate Functions (for aggregates).
In my next blog post, I would like to show the power of KSQL through one of the (slightly reworked) assignments I created for the workshop.