Ditch the Spreadsheets and Build a Model-Driven Power App

Author – Nick Doelman

Introduction

Excel sheets are an easy way to track business data, but there are pitfalls to this approach. The Power Platform provides a low-code, no-code way to build robust business applications. Power Platform business applications can be further expanded with other tools and technologies, such as business analysis tools and Artificial Intelligence.

The Spreadsheet

Up until the 1970s, computers were limited to universities, governments, and large enterprises.  The invention of the desktop computer along with spreadsheet software was a revolutionary step for all small and medium businesses.

Almost 50 years later, there are many unique business data tracking and transaction requirements that are still implemented using tools like Microsoft Excel.

And why not?  Microsoft Excel is fast and easy.  You create a new worksheet and start entering data.  If you are a bit more advanced, you can create some charts and macros to speed up your work or perform some analysis.

However, there are many pitfalls.  Excel has an upper limit on the amount of data you can store.  Usually, only one person can effectively enter and update data on a worksheet.  While there are some locking and privacy features, for the most part, a user of an Excel worksheet has access to all the data.

Humans are humans.  A slight keyboard slip can change or delete data, often without the user even realizing it.

I had a client a while back lose months’ worth of data because someone accidentally overwrote an important Excel file that was tracking key business data.

As a business, we no longer keep piles of cash in a box in our office; it is kept safe in a bank.  Shouldn’t we treat our valuable data the same way?

Business Applications

There are hundreds of business applications on the market today for managing data for a variety of industries and needs.  Accounting, project management, customer relationship, and sales tracking are a few examples.  These systems have evolved from desktop client-server to cloud browser-based systems that can be accessed from anywhere on any device.  Furthermore, many of these systems have very flexible personalization tools and application programming interfaces (APIs) that allow businesses to tailor these applications to their specific needs.

However, there are some instances where a business has very specific requirements that are not served by available business applications yet are too complicated to manage using a spreadsheet.  Building a custom software application from the ground up requires a tremendous investment of time and resources.

Low-code, No-code Platforms

The last couple of years have seen the rise of low-code development platforms.  The idea is to provide tools that allow business users to build robust business applications without complex software coding.

The Microsoft Power Platform is an example of a low-code, no-code platform that is the assembly and evolution of several tools and technologies.  Much of the platform traces its roots to Microsoft Dynamics CRM, a customer relationship management application that provided flexible tools to extend CRM applications.  Many businesses benefited from Dynamics CRM as an “anything” relationship management platform (aka xRM), and this helped shape many of the tools available on the Microsoft Power Platform today.

Build a Model-driven Power App to Replace a Business-critical spreadsheet

While there are a few different options for building solutions on the Power Platform (canvas apps, Power Automate, portals), the next section will focus on taking an example Excel-based application and creating a corresponding model-driven Power App.

Model-driven Power Apps are designed around a model of your business data.

Business Requirement

In this example, the requirement is to track the allocation of subsidies for student employment placements.  This is modeled after a real-world project I recently implemented, simplified quite a bit to highlight the main points in this post.

In this example, we have an Excel worksheet that is tracking each placement request.  We have columns to contain the employer information, the employer contact, student information, and of course information on the subsidy itself.

While the Excel sheet is workable, there are all of the issues I mentioned above.  We also see that there is some potential duplication (e.g. ABC company has multiple requests).  Chances are that ABC Company also interacts with the organization for other projects and initiatives (more on that later).

If we want to build a management app, where can we start?

Step by Step

To try this out, if you have an existing Microsoft 365 (Office 365) subscription, you can start to build your app using the Power Platform Community plan.  The community plan is not meant for production use, but you can build your app and test it out.  We will discuss later how to license Power Apps and deploy the app to production environments.

The process will provide a Power Platform developer environment for you to build apps.  You will arrive at the Power Apps Maker Portal.  While you might be tempted to click the “Model-driven app from blank” in the Make your own app section, we should lay a bit of groundwork and design first.

When creating Power Apps, it is a best practice to create the assets in the context of Solutions.  This will allow us to package everything and move it to a test or production environment.  There is also a full application lifecycle management (ALM) story to go along with this but let’s keep it to the basics.  When you click on solutions you will be prompted to create a database.

You will need to pick the currency and you might want to install sample apps and data to take a look at some examples of model-driven apps.

This process will create a Dataverse database.  Dataverse is a database but with many more features and options for building applications, automating processes, and securing and storing data.  The core technology is Azure SQL, but Dataverse is much, much more.

When the process is complete, you will see a listing of existing solutions, both administrative and samples.

To build our app, we will create a new solution.  We can give it a name, and we can also create a publisher that will further personalize the app.  The solution will have a version that can help with the management of updates and new versions.

Designing the Data Model

Now that we have the foundation, a bit of pre-planning goes a long way to build an effective app.  If we look at our spreadsheet app, we can begin to categorize our data and see we have employer info, employer contacts, student contacts, and placement data.  If we think further, an employer’s contact info is the same as the student’s contact (name, phone, email).  There also might be a situation (although rare) where the student contact is the same person as the employer contact.  Whether it’s a white-board or Visio, we can begin to design our data model.

Ideally, we should establish any fields (columns) as well.  Now that we have identified our data, we need to see how the parts are related to each other and “link them up”.

Other Design Considerations

Now that we have the data model, we should consider the design of other aspects of our app.  These can be User Interface considerations (and you should understand the basic structure of how model-driven user interfaces are constructed) as well as security, automation, reporting, and integration.  Understanding how these will work to meet the requirements will make the actual construction of the app smoother and improve user acceptance.

Building the App

Understanding the Common Data Model

Before we go off and start adding new tables to our solution, we might want to understand what the Dataverse can give us before we build.  Most of the applications I have built over the years need to track companies and contacts.  It would be tedious to always have to build new tables with company name, address, email, phone, etc.  The Common Data Model provides pre-built “common” tables with common functionality (e.g. email integration) so we don’t have to re-invent the wheel each time we create an app.  Looking in the Tables section of the maker portal, we can find items like “Accounts” (which have the structure to track company information) as well as “Contacts” (which have the structure to track people information, and can integrate with Microsoft Outlook).

However, searching further, we find there is no “Placement” table in the Common Data Model (though there are accelerators available that cover a variety of scenarios).  For our app, we will create our own Placement table.

We will add the existing tables to our solution.

And then we will add a new table to the solution. 

We will create our Placement table, and with the existing Account and Contact tables, we will have created the basis of our data model from our design.

For the tables we create, there will be several auto-generated fields (created on, modified on, etc.) but we need to add the specific fields (columns) to track our specific information.  There are a variety of field “types” (text, number, date, etc.) that will help enforce formats and data accuracy.  We can also create calculated or rollup fields to aggregate information.

Once the columns are created, we need to link the various tables together with lookup fields.  This is done using “relationships”.  To link to our Placement table, we will choose “Many to One” to link to the Account and Contact tables (Many Placements can be linked to One Account) or vice versa.

The next step is to create the user interface.  Model-driven apps provide a basic framework of “Views” and “Forms” to display and allow users to interact with the data.  This is one of the main differentiators from “canvas” Power Apps that provide a blank canvas where a user needs to align and build out various controls. 

A series of views is pre-generated for each table, and we can either modify these existing views or create our own.  A view is very similar to an Excel worksheet in that it shows data in a series of rows and columns.

We will have a lot of flexibility on column view width, placement, sorting, and filtering.  A model-driven app can have many views as well as embed views in model-driven forms (next).

We can design forms to show the details of a particular record in our app.  The form structure is made up of tabs and sections and generally displays fields but can be extended to show embedded views and forms from related records.

We can also modify our Account and Contact forms to better suit our application.

Once we have defined our tables, views, and forms, we can dive deeper and add charts, dashboards, and business process flows to further energize our apps.  For now, we have enough for a basic app, and we can create the app as part of our solution.

I created the app with the solution as a base, which added the appropriate tables.  From here we can further define which tables, forms, views, dashboards, etc. will be part of our app.

We can build more than one app and have them reside in the same environment.  Another department could build an app for, say, a training program that shares data with our Placement app.  This is the power of the Dataverse: centralizing core business information.

Within our app designer, we will need to define our site map, which is the main navigation framework for model-driven apps.  Here we can break down the navigation to Areas, Groups, and Sub-Areas. 

We can also place links to dashboards or even external browser-based applications.

Once we create our Sitemap, we can publish our app and try it out.

Now we have a working business application where we can track our Placements.

Notice that at no time did I add “code” or complex functionality.  Most of the application development was point and click.

Here are some of the immediate benefits of ditching the spreadsheet and using a model-driven app:

  • Data can be entered and updated in a consistent, easy-to-use interface
  • The application is automatically multi-user
  • The application can be accessed from almost anywhere
  • Security roles can be created to allow certain users to only access parts of the data
  • Data can be exported to Word or Excel Templates
  • The app is responsive and will adapt to different devices
  • Data is backed up on the Microsoft cloud

Expanding the App

The app can be further extended and expanded using tools from the Power Platform.  Users could create Power BI reports analyzing the data tracked in the app.  Users can also create AI Builder models to predict trends or perform other analysis.  Users can automate many tedious or complex business processes using Power Automate.  Information can be surfaced to external stakeholders using Power Apps portals.  More specific, task-based apps can be created using canvas Power Apps to interact with the Dataverse information captured in the app.

As applications evolve, there will be requirements where we need to involve a professional developer.  The Power Platform has published APIs where pro developers can further extend the functionality of the platform with plug-ins, custom actions, and user interface controls.

Deployment

The entire solution can be exported and imported into a production Dataverse environment.  Many of the deployment steps can be automated using Azure DevOps.

Licensing

Microsoft licensing can be complex, and Power Apps subscriptions are no exception.  The core licensing for Power Apps is either a “per user” license that provides a user with an unlimited number of apps or a “per app” license that allows a user access to two apps plus portal access.  The subscriptions also allow for a certain amount of database storage and compute capacity.  However, if you consider the return on investment for these apps, there is tremendous value over trusting the data to a spreadsheet or investing in building a fully customized solution.  Information on licensing can be found here: https://docs.microsoft.com/power-platform/admin/pricing-billing-SKUs

Conclusion

The next time you have a business requirement that needs data tracking or a database application, do not be tempted by Excel.  Consider using the Power Platform and creating a model-driven Power App.  The platform is not only great for your custom apps but is also what now powers many of Microsoft’s own Dynamics 365 applications.  While some technical skill is required, you don’t need to be a hard-core developer to build apps or to build a career on the Power Platform.  The PL-900 certification is a great way to start learning more about the Power Platform.

Functional Programming in Python

This tutorial is adapted from the Web Age course Introduction to Python Programming.

1.1 What is Functional Programming?

Functional programming reduces problems to a set of function calls.

The functions used, referred to as Pure functions, follow these rules:

  • Only produce a result
  • Do not modify the parameters that were passed in
  • Do not produce any side effects
  • Always produce the same result when called with the same parameters

Another condition of functional programming is that all data be immutable.
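
To make these rules concrete, here is a minimal sketch contrasting a pure function with an impure one (the function names and values are made up for illustration):

# Pure: the result depends only on the parameters, and nothing else is touched
def add_tax(price, rate):
  return price * (1 + rate)

# Impure: modifies the list that was passed in (a side effect)
def add_tax_in_place(prices, rate):
  for i in range(len(prices)):
    prices[i] = prices[i] * (1 + rate)

prices = [10.0, 20.0]
print(add_tax(10.0, 0.05))  # 10.5, and prices is unchanged
add_tax_in_place(prices, 0.05)
print(prices)               # [10.5, 21.0], the original data was mutated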

1.2 Benefits of Functional Programming

Solutions implemented using functional programming have several advantages:

  • Pure functions are easier to test
  • Programs follow a predictable flow
  • Debugging is easier
  • Functional building blocks allow for high-level programming
  • Parallel execution is easier to implement

1.3 Functions as Data

In Python, functions can be assigned to variables and passed as arguments. This allows for:

  • Passing functions as parameters to other functions
  • Returning functions as the result of a function

Built-in Python functions such as map() and filter(), and methods such as list.sort(), take functions as parameters. Functions that accept other functions as parameters or return functions as their result are referred to as higher-order functions.
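
As a quick illustration (the function names here are invented for the example), a function can be passed as a parameter like any other value, and a function can build and return a new function:

def shout(text):
  return text.upper() + '!'

def apply_twice(func, value):  # takes a function as a parameter
  return func(func(value))

def make_multiplier(n):        # returns a new function
  def multiply(x):
    return x * n
  return multiply

print(apply_twice(shout, 'hi'))  # prints: HI!!
triple = make_multiplier(3)
print(triple(10))                # prints: 30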

1.4 Using Map Function

The map() function applies a specified function to each item in an iterable data type (list, string, dictionary, set).

map( transform_function, iterable )

The transform_function passed to map takes an item from the iterable as a parameter and returns data related to the item but modified in some way.

def toUpper(item):
  return item.upper()

The following map function call converts all items to upper case characters:

states =['Arizona', 'Georgia','New York', 'Texas']
map_result = map( toUpper, states)
list_result = list(map_result)
print(list_result)
# prints:['ARIZONA', 'GEORGIA', 'NEW YORK', 'TEXAS']

map() returns a map object, which is then converted to a list using the list() constructor function.

1.5 Using Filter Function

The filter() function checks each item in an iterable (list, string, dictionary, set) against a condition and outputs a new iterable that only includes items that pass the check.

filter( check_function, iterable )

The check_function passed to filter takes an item from the iterable as a parameter and returns True|False depending on whether the item conforms to the given condition.

def lengthCheck(item, maxlen=7):
  return len(item) <= maxlen

The following filter function call keeps only the items that are seven characters long or shorter:

states = ['Arizona', 'Oklahoma', 'Utah', 'Texas']
filter_result = filter(lengthCheck, states)
list_result = list(filter_result)
print(list_result)
# prints: ['Arizona', 'Utah', 'Texas']

filter() returns a filter object, which is then converted to a list using the list() constructor function.

1.6 Lambda expressions

Lambda expressions in Python:

  • Are a special syntax for creating anonymous functions
  • Are limited to a single line of code
  • Return the result of their single expression by default
  • Are typically defined in-line where they will be used

Some example lambda expressions:

lambda x : x * x     # multiply parameter by itself
lambda y : y * 2     # multiply parameter by 2 
lambda z : z['name'] # get value from dict with 
                     # key = 'name'

Normally a lambda expression is placed in code where it will be used:

list(map(lambda x:x*x, [1,2,3]))
# outputs [1, 4, 9]

Lambda expressions can be tested as shown here by supplying a parameter in parentheses:

(lambda x : x * x)(5) # returns 25

1.7 List.sort() Using Lambda Expression

List.sort() takes two optional keyword parameters:

list.sort(key=None, reverse=False)
    • key = function that returns the value to be sorted on
    • reverse = True/False to indicate sort order

In the code below a lambda expression is used to provide the function for the ‘key’ parameter:

list1 = [{'name':'Jack', 'age':25},
{'name':'Cindy', 'age':17},
{'name':'Stuart', 'age':20}]

list1.sort(key=lambda item: item['name'],  
           reverse=False)
print(list1)
# outputs: [{'name': 'Cindy', 'age': 17}, 
# {'name': 'Jack', 'age': 25}, 
# {'name': 'Stuart', 'age': 20}]

The lambda expression returns the value of each dictionary’s ‘name’ key, which is what the list is sorted on.

Notes

Python’s sorted() function is similar to list.sort() except that you need to pass in the list as the first parameter and that sorted() returns a new sorted list, which must then be assigned to a variable:

list_sorted = sorted(list1, key=lambda item: item['name'], reverse=False)

1.8 Difference Between Simple Loops and map/filter Type Functions

Loops intended to modify an iterable’s items often do so by mutating the original iterable:

# loop over and mutate iterable
list1 = [ 'c', 'f', 'a', 'e', 'b' ]
for i in range(len(list1)):
  list1[i] = list1[i].upper()

The problems with this are:

  • The original list is no longer available as it has been mutated
  • This code would not work with immutable sequences like tuples

The equivalent code using map fixes both of these issues:

# map creates new list based on original
list1 = [ 'c', 'f', 'a', 'e', 'b' ]
list2 = list(map(lambda x: x.upper(), list1))

Here is an example of the same map function being used with a tuple:

tup1 = ( 'c', 'f', 'a', 'e', 'b' )
tup2 = tuple(map(lambda x:x.upper(), tup1))

 

1.9 Additional Functions

Python includes many functions developers can use out of the box rather than have to hand-code them:

any(seq) # iterate boolean seq, returns True if at least one item is true

all(seq) # iterate boolean seq, returns True if all items are true

max(seq) # iterate sequence and calculate max value

min(seq) # iterate sequence and calculate min value

sum(list) # iterate sequence and calculate sum

len(seq) # return length of strings, # items in list, etc.

input() # get input from user

randint(0,10) # generate a random number between 0 and 10

Counter(seq) # create dictionary with freq of each seq member

Notes:

randint() is part of the random module which must be imported:
import random as r

r.randint(0,10)

Counter is part of the collections module and must be imported:

import collections as c

c.Counter('asdffg')
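
Putting a few of these together (the sample data is made up for illustration):

import collections as c

scores = [72, 88, 95, 60, 88]

print(max(scores))                  # 95
print(min(scores))                  # 60
print(sum(scores))                  # 403
print(len(scores))                  # 5
print(any(s > 90 for s in scores))  # True
print(all(s > 50 for s in scores))  # True
print(c.Counter(scores))            # Counter({88: 2, 72: 1, 95: 1, 60: 1})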

1.10 General Rules for Creating Functions

Since creating functions is a big part of functional programming, we will reiterate here some of the rules we learned earlier (a short example follows the list):

  • Name functions appropriately
  • Limit the function to a single responsibility
  • Include a docstring
  • Always return a value
  • Limit functions to 50 lines or less
  • Make the function ‘idempotent’ and ‘pure’ if possible
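
Here is a small function that follows these rules (the name, docstring, and data are only illustrative):

def average_order_value(order_totals):
  """Return the average of a sequence of order totals, or 0.0 if it is empty."""
  if len(order_totals) == 0:
    return 0.0
  return sum(order_totals) / len(order_totals)

print(average_order_value([20.0, 30.0, 40.0]))  # 30.0
print(average_order_value([]))                  # 0.0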

1.11 Summary

In this tutorial, we covered:

  • What is Functional Programming
  • Functional Programming Benefits
  • Using Map Function
  • Using Filter Function
  • Lambda Expressions
  • List.sort() using Lambda
  • Loops vs. map/filter
  • Additional Functions
  • General Rules for Creating Functions

Building Data Pipelines in Kafka

This tutorial is adapted from Web Age course Kafka for Application Developers Training.

1.1 Building Data Pipelines

Data pipelines can involve various use cases:

  • Building a data pipeline where Apache Kafka is one of the two endpoints. For example, getting data from Kafka to S3 or getting data from MongoDB into Kafka. 
  • Building a pipeline between two different systems but using Kafka as an intermediary. For example, getting data from Twitter to Elasticsearch by sending the data first from Twitter to Kafka and then from Kafka to Elasticsearch. 

The main value Kafka provides to data pipelines is its ability to serve as a very large, reliable buffer between various stages in the pipeline, effectively decoupling producers and consumers of data within the pipeline. This decoupling, combined with reliability, security, and efficiency, makes Kafka a good fit for most data pipelines.

1.2 Considerations When Building Data Pipelines

  • Timeliness
  • Reliability
  • High and varying throughput
  • Data formats
  • Transformations
  • Security
  • Failure handling
  • Coupling and agility

1.3 Timeliness

Good data integration systems can support different timeliness requirements for different pipelines. Kafka makes the migration between different timetables easier as business requirements change. Kafka is a scalable and reliable streaming data platform that can be used to support anything from near-real-time pipelines to hourly batches. Producers can write to Kafka as frequently as needed, and consumers can read and deliver the latest events as they arrive. Consumers can also work in batches when required, for example running every hour, connecting to Kafka, and reading the events that accumulated during the previous hour. Kafka acts as a buffer that decouples the time-sensitivity requirements between producers and consumers. Producers can write events in real time while consumers process batches of events, or vice versa. The consumption rate is driven entirely by consumers.
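
As a rough sketch of the “run every hour and read what has accumulated” pattern, here is a batch-style consumer written with the kafka-python client; the topic name, group id, and broker address are assumptions for the example:

# Hourly batch job: read whatever accumulated since the last run, then exit.
import json
from kafka import KafkaConsumer

def process(event):
  print(event)  # placeholder for real work, e.g. loading a warehouse

consumer = KafkaConsumer(
  'placement-events',                  # assumed topic name
  bootstrap_servers='localhost:9092',  # assumed broker address
  group_id='hourly-batch-loader',      # offsets are tracked per consumer group
  auto_offset_reset='earliest',
  enable_auto_commit=False,
  value_deserializer=lambda v: json.loads(v.decode('utf-8')),
  consumer_timeout_ms=10000)           # stop iterating when no new messages arrive

for message in consumer:
  process(message.value)

consumer.commit()  # record progress so the next run continues from here
consumer.close()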

1.4 Reliability

System failures lasting more than a few seconds can be hugely disruptive, especially when the timeliness requirement is closer to the few-milliseconds end of the spectrum. Data integration systems should avoid single points of failure and allow for fast and automatic recovery from all sorts of failure events. Data pipelines are often the way data arrives in business-critical systems. Another important consideration for reliability is delivery guarantees; Kafka offers reliable, guaranteed delivery.

1.5 High and Varying Throughput

The data pipelines should be able to scale to very high throughput. They should also be able to adapt if throughput suddenly increases or decreases. With Kafka acting as a buffer between producers and consumers, we no longer need to couple consumer throughput to producer throughput. If producer throughput exceeds that of the consumer, data will accumulate in Kafka until the consumer can catch up. Kafka’s ability to scale by adding consumers or producers independently allows us to scale either side of the pipeline dynamically and independently to match the changing requirements. Kafka is a high-throughput distributed system capable of processing hundreds of megabytes per second on even modest clusters. Kafka also focuses on parallelizing the work and not just scaling it out. Parallelizing means it allows data sources and sinks to split the work between multiple threads of execution and use the available CPU resources even when running on a single machine. Kafka also supports several types of compression, allowing users and admins to control the use of network and storage resources as the throughput requirements increase.

1.6 Data Formats

A good data integration platform allows for and reconciles different data formats and data types. The data types supported vary among different databases and other storage systems. For example, you may be loading XML and relational data into Kafka and then need to convert the data to JSON when writing it out. Kafka itself and the Connect APIs are completely agnostic when it comes to data formats. Producers and consumers can use any serializer to represent data in any format that works for you. Kafka Connect has its own in-memory objects that include data types and schemas, but it allows for pluggable converters so these records can be stored in any format. Many sources and sinks have a schema; we can read the schema from the source along with the data, store it, and use it to validate compatibility or even update the schema in the sink database. For example, if someone adds a column in MySQL, the pipeline can make sure the column gets added to Hive too as new data is loaded into it. When writing data from Kafka to external systems, sink connectors are responsible for the format in which the data is written to the external system. Some connectors choose to make this format pluggable. For example, the HDFS connector allows a choice between Avro and Parquet formats.
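
Because the clients are format agnostic, the serialization choice lives entirely in producer and consumer configuration. Below is a minimal sketch using the kafka-python client that sends records as JSON; the broker address and topic name are assumptions:

import json
from kafka import KafkaProducer

# The producer just sends bytes; using JSON here is purely our serializer choice.
producer = KafkaProducer(
  bootstrap_servers='localhost:9092',  # assumed broker address
  value_serializer=lambda v: json.dumps(v).encode('utf-8'))

producer.send('placements', {'employer': 'ABC Company', 'student': 'Jane Doe'})
producer.flush()
producer.close()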

1.7 Transformations

There are generally two schools of building data pipelines:

  • ETL (Extract-Transform-Load)
  • ELT (Extract-Load-Transform)

ETL – It means the data pipeline is responsible for making modifications to the data as it passes through. It has the perceived benefit of saving time and storage because you don’t need to store the data, modify it, and store it again. However, it shifts the burden of computation and storage to the data pipeline itself, which may or may not be desirable. The transformations that happen to the data in the pipeline also tie the hands of those who wish to process the data further down the pipe. If users require access to fields that were dropped during transformation, the pipeline needs to be rebuilt and historical data will require reprocessing (assuming it is available).

ELT – It means the data pipeline does only minimal transformation (mostly around data type conversion), with the goal of making sure the data that arrives at the target is as similar as possible to the source data. These are also called high-fidelity pipelines or data-lake architectures. In these systems, the target system collects “raw data” and all required processing is done at the target system. Users of the target system have access to all the data. These systems also tend to be easier to troubleshoot since all data processing is limited to one system rather than split between the pipeline and additional applications. The trade-off is that the transformations take CPU and storage resources at the target system.

1.8 Security

In terms of data pipelines, the main security concerns are:

  • Encryption – the data going through the pipe should be encrypted. This is mainly a concern for data pipelines that cross datacenter boundaries.
  • Authorization – Who is allowed to make modifications to the pipelines?
  • Authentication – If the data pipeline needs to read or write from access-controlled locations, can it authenticate properly?

Kafka allows encrypting data on the wire, as it is piped from sources to Kafka and from Kafka to sinks. It also supports authentication (via SASL) and authorization. Kafka’s encryption feature ensures the sensitive data can’t be piped into less secured systems by someone unauthorized. Kafka also provides an audit log to track access—unauthorized and authorized. With some extra coding, it is also possible to track where the events in each topic came from and who modified them, so you can provide the entire lineage for each record.

1.9 Failure Handling

It is important to plan for failure handling in advance, such as:

  • Can we prevent faulty records from ever making it into the pipeline?
  • Can we recover from records that cannot be parsed?
  • Can bad records get fixed (perhaps by a human) and reprocessed?
  • What if the bad event looks exactly like a normal event and you only discover the problem a few days later?

Because Kafka stores all events for long periods of time, it is possible to go back in time and recover from errors when needed.

1.10 Coupling and Agility

One of the most important goals of data pipelines is to decouple the data sources and data targets.

There are multiple ways accidental coupling can happen:

  • Ad-hoc pipelines
  • Loss of metadata
  • Extreme processing

1.11 Ad-hoc Pipelines

Some companies end up building a custom pipeline for each pair of applications they want to connect.

For example:

  • Use Logstash to dump logs to Elasticsearch
  • Use Flume to dump logs to HDFS
  • Use GoldenGate to get data from Oracle to HDFS
  • Use Informatica to get data from MySQL and XMLs to Oracle

This tightly couples the data pipeline to the specific endpoints and creates a mess of integration points that requires significant effort to deploy, maintain, and monitor. Data pipelines should only be built for systems where they are really required.

1.12 Loss of Metadata

If the data pipeline doesn’t preserve schema metadata and does not allow for schema evolution, you end up tightly coupling the software producing the data at the source and the software that uses it at the destination. Without schema information, both software products need to include information on how to parse the data and interpret it. For example, if data flows from Oracle to HDFS and a DBA adds a new field in Oracle, then without preserved schema information and support for schema evolution, either every app that reads data from HDFS will break or all the developers will need to upgrade their applications at the same time. Neither option is agile. With support for schema evolution in the pipeline, each team can modify their applications at their own pace without worrying that things will break down the line.

1.13 Extreme Processing

Some processing/transformation of data is inherent to data pipelines. Too much processing ties all the downstream systems to decisions made when building the pipelines. For example, which fields to preserve, how to aggregate data. This often leads to constant changes to the pipeline as requirements of downstream applications change, which isn’t agile, efficient, or safe. The more agile way is to preserve as much of the raw data as possible and allow downstream apps to make their own decisions regarding data processing and aggregation.

1.15 Kafka Connect Versus Producer and Consumer

When writing to Kafka or reading from Kafka, you have a choice between using the traditional producer and consumer clients and using the Connect API and its connectors. Use the Kafka clients when you can modify the code of the application that you want to connect to Kafka and when you want to either push data into Kafka or pull data from Kafka. Use Connect to connect Kafka to datastores that you did not write and whose code you cannot or will not modify. Connect is used to pull data from an external datastore into Kafka or push data from Kafka to an external store. For datastores where a connector already exists, Connect can be used by non-developers, who will only need to configure the connectors. Connect is recommended because it provides out-of-the-box features like configuration management, offset storage, parallelization, error handling, support for different data types, and standard management REST APIs. If you need to connect Kafka to a datastore and a connector does not exist yet, you can choose between writing an app using the Kafka clients or the Connect API. Writing a small app that connects Kafka to a datastore sounds simple, but there are many little details around data types and configuration that make the task non-trivial. Kafka Connect handles most of this for you, allowing you to focus on transporting data to and from the external stores.
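
To illustrate the “configure rather than code” side, the sketch below registers a FileStreamSource connector by posting its configuration to the Kafka Connect REST API; the Connect worker URL, file path, and topic name are assumptions for the example:

import json
import requests

# A connector is just configuration handed to a running Connect worker.
connector = {
  'name': 'local-file-source',
  'config': {
    'connector.class': 'org.apache.kafka.connect.file.FileStreamSourceConnector',
    'tasks.max': '1',
    'file': '/tmp/placements.txt',  # assumed source file
    'topic': 'placements'           # assumed target topic
  }
}

response = requests.post(
  'http://localhost:8083/connectors',  # assumed Connect worker URL
  headers={'Content-Type': 'application/json'},
  data=json.dumps(connector))
print(response.status_code, response.json())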

1.16 Summary

  • Kafka can be used to implement data pipelines
  • When designing data pipelines, various factors should be considered, such as timeliness, reliability, throughput, data formats, security, and coupling.
  • One of the most important Kafka features is its ability to reliably deliver messages even in the face of failures.