Fixing a fundamental flaw in Qlik Sense that costs your developers and analysts up to 50% of their time each month.
Let's admit it: for all the advantages of the Qlik platform, it has one extremely weak yet very important area, the construction of data models. We have all seen the commercials where users drag “circles” onto one another, build an associative data model, and immediately start working with it.
At first glance, everything is great: there are no primary/foreign keys, no relationship types, no relationship directions. But everyone who works seriously with Qlik knows that this scenario only suits primitive models, with one fact table in the center and a bunch of dictionary (dimension) tables around it.
Want to analyze data in a plan-versus-actual scenario, where the data lives in two tables? You already need to devise a scheme with a composite key.
The Plans and Sales tables are connected by a composite key of date and user ID.
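As an illustration, such a composite key can be built in the load script roughly like this (a sketch only; the table, field, and file names are assumptions, not taken from a real model):

```
// Hypothetical plan-vs-actual model: both tables get the same
// synthetic text key, so Qlik associates them on exactly one field.
Plans:
LOAD
    PlanDate & '|' & UserID AS %PlanFactKey,
    PlanDate,
    PlanAmount
FROM [lib://Data/plans.qvd] (qvd);

Sales:
LOAD
    SaleDate & '|' & UserID AS %PlanFactKey,
    SaleDate,
    SaleAmount
FROM [lib://Data/sales.qvd] (qvd);
```

Note that UserID is folded into the key and not loaded as a plain field in both tables; otherwise Qlik would create a second association on the shared field name. Even so, the model still carries two separate date fields, PlanDate and SaleDate.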
In fact, in this model dates are represented by two different fields, which means you cannot properly visualize them on a single date axis.
What if one of the tables has not one date field, but two? Then you need to create a canonical date field, and this:
1) Is practically impossible to do in the “bubble builder”;
2) Is a concept that has to be understood first, which puts it beyond a simple business user.
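A common script-level workaround is the canonical date pattern with a date bridge; a minimal sketch, assuming a hypothetical Orders table and invented field names, looks like this:

```
// Hypothetical Orders table with two dates per row.
Orders:
LOAD OrderID, OrderDate, ShipDate, Amount
FROM [lib://Data/orders.qvd] (qvd);

// The bridge maps every order to each of its dates, tagged by type,
// so one master calendar built on CanonicalDate serves both fields.
// The second LOAD auto-concatenates because its field set is identical.
DateBridge:
LOAD OrderID, OrderDate AS CanonicalDate, 'Ordered' AS DateType
RESIDENT Orders;
LOAD OrderID, ShipDate AS CanonicalDate, 'Shipped' AS DateType
RESIDENT Orders;
```

A chart can then use CanonicalDate as the single date axis and filter by DateType; none of this is something the “bubble builder” can express.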
Actually, this is just the tip of the iceberg. The fact is that, due to the peculiarities of the associative engine, you can end up in a situation where you physically cannot assemble a data model in which tables are connected directly to each other.
Tasks, notes, and events can be linked to contacts, deals, and companies. Good luck assembling that in the Data manager. By the way, behind the scenes there is also a Users table connected to every table in this model.
Anything can happen here, from circular references to the inability to express the model logic you need. And even if you manage to assemble a model that works now, adding new data later can lead you into a dead end, because your tasks have outgrown the capabilities of the Data manager.
Therefore, data models in analytical applications ultimately come down to the Star topology: tables are not connected to each other directly, but through a central bridge containing, for each table of the model, a set of keys linking it to all the other tables. And this is where the part that costs you serious money and time begins.
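In its simplest form, generating such a bridge by hand looks something like the following sketch (all table, field, and file names are hypothetical):

```
// Facts keep only a surrogate key; the shared dimensions move
// into the central LinkTable.
Plans:
LOAD DateID & '|' & UserID AS %LinkKey, PlanAmount
FROM [lib://Data/plans.qvd] (qvd);

Sales:
LOAD DateID & '|' & UserID AS %LinkKey, SaleAmount
FROM [lib://Data/sales.qvd] (qvd);

// The bridge holds every distinct key combination from every fact,
// so the facts never associate with each other directly.
LinkTable:
LOAD DISTINCT DateID & '|' & UserID AS %LinkKey, DateID, UserID
FROM [lib://Data/plans.qvd] (qvd);
CONCATENATE (LinkTable)
LOAD DISTINCT DateID & '|' & UserID AS %LinkKey, DateID, UserID
FROM [lib://Data/sales.qvd] (qvd);
```

Even two tables require four loads; every new fact table adds more, and every developer writes this boilerplate slightly differently, which is exactly where the inconsistency creeps in.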
The problem is that you must write a Qlik script to generate the link table. That in itself takes time. Worse, scripts are written by programmers, that is, by people, which leads to business application architectures that are difficult to maintain. See for yourself.
Fragment of the script for creating a data model
We have a number of analytical tables (views) prepared by some upstream tool. When creating an application in Qlik, we need to write a data-model creation script. If we build several applications on the same set of views, we will get mutually incompatible models with 99% probability. Why? Because of the human factor.
We have always hired highly qualified developers for our projects. They were great at writing complex data-transformation scripts; in other words, they effectively solved distinctive, creative, non-trivial tasks. But when it comes to models, uniformity, standardization, and repeatability become what matters.
Look: you have created data models for different departments that solve the task, but each of them is written in its own way. In one place we joined the dictionaries, in another we did not; in one place fields were renamed; in another we built a link block that works only inside that particular model. And suddenly we need to add an array of data from a new source to each of the models. We will have to do this three times, adapting to the specifics of each application.
We not only spent time writing the code; we spend three times as much on supporting, improving, and scaling it. And we are approaching the moment when we will have to refactor the entire solution to bring it to a single form. We depend on the people who wrote these models, because only they know exactly how everything works there. Or maybe they no longer know, because several months have passed, and they built the solution to close the immediate problem, without thinking about further scaling.
It would seem that standards should save the situation. But actually, no. If a person writes the code, they will always do it a little differently, and you will have to spend resources on enforcing the standards and bringing solutions in line with them.
Having faced this situation 4 years ago, we decided that clear guidelines would help us develop applications more easily and faster. After writing them, we realized that the actions we expected from developers could be automated. Thus began the story of creating an automatic data-model generator for Qlik: Super Star Data Model. In the next article, we will share the results we got, how it works, and how you can apply our experience in your business.
2. How to work with the Star data model topology without burying yourself.
Due to the nature of Qlik's associative engine, the Star topology is the only practical way to implement complex data models. This approach is well described in the “Unified Star Schema” paradigm by Francesco Puppini and Bill Inmon. Unlike the relational model, in the Star topology tables are not connected to each other directly, but through a central bridge (a link table).
The star can be built in two variants. In the first, the central table contains the fields used to calculate measures together with the common dimensions, and the dimension tables hang around it like the rays of a star. This variant optimizes cross-table calculations, but it may not work with complex models, especially in one-to-many link scenarios, because of data duplication in the central table.
In the second, only the common dimensions (key fields) are placed in the central table, along with additional link attributes. This allows you to implement complex connection scenarios, because a relationship can be described not by one specific value but by several. For example, you can aggregate the amount of sales by employee depending on the user's role: by the current responsible person, or by whoever created the deal.
The data tables are joined to the link table by their primary keys.
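A sketch of such a role-aware bridge for the deals example above (table and field names are hypothetical, not taken from a real model):

```
Deals:
LOAD DealID, ResponsibleID, CreatorID, Amount
FROM [lib://Data/deals.qvd] (qvd);

// One bridge row per (deal, role): the same deal links to the
// responsible user and to the creator under different role labels.
LinkTable:
LOAD DealID, ResponsibleID AS UserID, 'Responsible' AS RoleType
RESIDENT Deals;
CONCATENATE (LinkTable)
LOAD DealID, CreatorID AS UserID, 'Creator' AS RoleType
RESIDENT Deals;

// The raw user references leave the fact table; every association
// to Users now goes through the bridge.
DROP FIELDS ResponsibleID, CreatorID FROM Deals;
```

In a chart, `Sum({<RoleType={'Responsible'}>} Amount)` aggregates sales by the current responsible user, while `Sum({<RoleType={'Creator'}>} Amount)` aggregates the same deals by whoever created them; the set modifier is needed because without it each deal would be counted once per role.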