a query folding?” “Does your query fold?”… Maybe someone asked you those questions, but you were like: “Query…Whaaaat?!”
Or, maybe you’ve heard about query folding in Power BI, but didn’t know how to take advantage of it in real-life scenarios.
If you recognized yourself in (at least) one of the two situations specified above, then please continue reading this article.
Fine, you are curious to find out what a Query folding is. But, first things first…Before you proceed, we need to establish some theoretical foundations, which will put the Query folding feature in the proper context.
Data Shaping
and why it is one of the key concepts in the data preparation phase. Now, I would like to expand on that in a (maybe) unusual way:
I guess you all know about the book written by Thomas More, called “Utopia”.
In that story, everything is perfect and everyone is satisfied. In an ideal world, let’s call it “Data Utopia”, we have clean, high-quality data that just flies into our reports “as-is”, without needing to perform any kind of face-lifting or transformations along the way. Unfortunately, “Data Utopia” can exist only in books — the reality is crueler — as we have to deal with numerous challenges while nurturing our data.
That being said, one of the key concepts that we have to absorb is Data Shaping. Data shaping is the process you should perform once you get familiar with your data, and become aware of possible pitfalls within the data you are planning to use in your business intelligence solution.
I’ve intentionally used the term “Business Intelligence” instead of “Power BI”, as this is a general concept that should be used outside of Power BI solutions too.
In simple words, data shaping is the process of data consolidation, BEFORE it becomes part of your data model. The key thing to keep in mind is the word: BEFORE! So, one would perform data shaping before the data goes into the report itself. Data shaping can be done at different places, and, depending on where you apply data shaping techniques, at different points in time during the data preparation process.
WHERE should you perform data shaping?
Source Database — This is the most obvious choice and in most cases the most desirable scenario. It is based on traditional data warehousing principles of Extracting-Transforming-Loading (ETL) data. In this scenario, you define what data you want to extract (not all data from the database is needed, and it’s usually not a good idea to import all the data). Then, you decide if your data needs to be transformed along the way, to suit your reporting needs better — for example, do you want to perform currency conversion, or do you need to conform country and city names?
Do you recognize the city in the following image?
Yes, it’s New York. Or, is it NYC? Or, is it New York City? Which one of these three names is correct? Yes, all of them are correct — but if you import the data into your data model like this, you will get incorrect results — as each New York, NYC, and New York City will be treated as a separate entity. This, and many more potential caveats, need to be solved during the Data Shaping phase, and that’s why it’s important to spend some time massaging your data.
Power Query
If you don’t perform data transformations on the source side, the next station is Power Query — it’s the built-in tool within Power BI, that enables you to perform all kinds of transformations to your data. According to Microsoft’s official documentation, you can apply more than 300 different transformations!
The key advantage of Power Query is that you can perform complex data transformations with little or no coding skills! Additionally, all steps you’ve applied during the data transformation process are being saved, so every time you refresh your dataset, those steps will be automatically applied to shape your data and prepare it for consumption via reports.
Under the hood of Power Query is a Mashup engine, that enables your data shaping to run smoothly. Power Query uses a very powerful M language for data manipulation. And, now you are probably asking yourselves, what does all this story about data shaping, Power Query, Mashup engine, M language, etc. have to do with Query folding? I don’t blame you, it’s a fair question, but we will come back soon to answer it.
What is a Query folding?
For some data sources, such as relational databases, but also non-relational data sources, for example, OData, AD, or Exchange, the Mashup engine is able to “translate” M language to a language that the underlying data source will “understand” — in most cases, it’s SQL.
By pushing complex calculations and transformations directly to a source, Power Query leverages the capabilities of the robust relational database engines, that are built to cope with large volumes of data in the most efficient way.
That ability of Power Query’s Mashup engine to create a single SQL statement combining all M statements behind your transformations is what we call Query folding.
Or, let`s make it simple: if the Mashup engine is able to generate a single SQL query that is going to be executed on the data source side, we say that the query folds.
Data sources that support Query folding
As already mentioned, the most obvious beneficiary of query folding is relational database sources, such as SQL Server, Oracle, or MySQL. However, it`s not just that SQL databases take advantage of the query folding concept. Essentially, any data source that supports some kind of querying language can possibly take advantage of query folding. Those other data sources are OData, SSAS, SharePoint lists, Exchange, and Entra ID.
On the other hand, when you use data sources such as Excel files, BLOB storage files, flat files, etc. in your Power BI datasets, the query can’t fold.
Data Transformations that support Query folding
However, when it comes to data sources that support query folding in general, it’s important to keep in mind that not all transformations can be folded and pushed to a data source. So, just to be clear, the fact that a SQL database supports query folding doesn’t necessarily mean that your query will fold! There are some Power Query transformations that simply can`t be pushed to a SQL database engine.
Very often, some subtle differences in the Power Query transformations can be decisive in the final outcome, and whether your query will fold or not. I’ll show you a few of those subtle differences in the following sections.
Generally speaking, the following transformations, when applied in Power Query, can be “translated” to a single SQL statement:
- Removing columns
- Renaming columns
- Filtering rows, with static values or Power Query parameters, as they are treated as WHERE clause predicates in SQL
- Grouping and summarizing, which are equivalent to SQL’s Group by clause
- Merging of foldable queries based on the same source, as this operation can be translated to JOIN in SQL. When I said, merging of foldable queries — that means it will work if you are joining two SQL server tables, but it will not work if you are trying to join a SQL table and an Excel file
- Appending foldable queries based on the same source — this transformation relates to the UNION ALL operator in SQL
- Adding custom columns with simple logic. What does simple logic mean? Using M functions that have equivalents in SQL language, for example, mathematical functions, or text manipulation functions
- Pivot and Unpivot transformations
On the other hand, some transformations that will prevent the query from folding are:
- Merging queries based on different sources, as explained previously
- Appending (union-ing) queries based on different sources — similar logic as in the previous case
- Adding custom columns with complex logic or using some M functions that don’t have a counterpart in SQL
- Adding index columns
- Changing a column data type. This one is a typical “it depends” case. I will show you soon what it depends on, but just keep in mind that changing a column data type can be both a foldable and a non-foldable transformation
Now, let’s examine why it is important to achieve this behavior — or, maybe it’s better to say, why should you care if the query folds or not?
Why should you care about Query folding?
When you’re using Import mode in Power BI, the data refresh process will work more efficiently when the query folds, both in terms of refresh speed and resource consumption.
If you are working with DirectQuery or Dual storage mode, as you are targeting the SQL database directly, all your transformations MUST fold — or your solution will not work.
Finally, query folding is also of key importance for Incremental refresh — it’s so important that Power BI will warn you once it determines that query folding can’t be achieved. It will not break your incremental refresh “per-se”, but without query folding in place, an incremental refresh wouldn’t serve its main purpose — to reduce the amount of data that needs to be refreshed in your data model — as without query folding, Mashup engine needs to retrieve all data from the source and then apply subsequent steps to filter the data.
With all these in mind, you should tend to achieve query folding whenever possible.
Slow report — don’t blame Query folding!
One important disclaimer here, and this is one of the key takeaways from this series of blog posts: if your report is slow, or your visuals need a lot of time to render, or your data model size is large, query folding has nothing to do with it!
Only if your data refresh or incremental refresh is slow and inefficient, you should investigate your Power Query steps in more depth.
All or nothing?
A few more things to keep in mind regarding query folding. It’s not an all-or-nothing process. That means if you have, let’s say, 10 transformation steps within Power Query, and your query folds until the 6th step, you will still get some benefit from partial query folding. However, once the query folding is broken, it can’t be achieved anymore.
To simplify, if you have 10 transformation steps, and your query folding is broken in the 5th step, all previous steps will fold, but once the folding is broken, it can’t be achieved again, even if you have transformations that support query folding by default in steps 6 to 10 — like in our example where filtering should be a foldable step, these steps will not fold. Keep that in mind, and try to push all non-foldable steps down the pipeline as much as possible.
How do you know if the query folds?
Ok, now we are not rookies anymore. We know what query folding is, why we should strive to achieve it, and some subtle tricks that can make a huge difference.
Now, it’s time to learn how to check if the specific query folds or not. The first and most obvious way is to right-click on the step and check what the View Native Query option looks like.
If it’s greyed out, this step probably does not fold. On the other hand, if you are able to click on this option, that means that your query will fold. I guess you are maybe confused with the word: PROBABLY!
But, that’s the proper word, as you can’t be 100% sure that if the View Native Query option is disabled, your query doesn’t fold. I will show you later how this option can trick us into thinking that the query folding was broken, even though, in reality, folding actually occurs.
Instead, when you want to be sure if your query folds or not, you can use the Query Diagnostics feature within Power Query Editor, or SQL Server Profiler, like a good old and reliable way to check the queries sent to a database by the Power BI engine.
Additionally, there is a cool feature in Power Query Online, where each step is marked with the icon that shows if that step folds, does not fold, or is unknown. As I said, this feature is available only in Power Query Online at this moment, so let’s hope that the Power BI team will implement it in the Desktop version soon.
The devil is in the details…
Fine…You’ve probably heard about the saying that the devil is in the details. Now, it’s time to understand how little nuances can make a big difference in our data transformation process.
Let’s start with one of the most curious cases in Power Query editor…
Devil #1 — Merge Join
This one is very interesting, as you will hardly assume what is happening in the background. Let’s say that I want to combine two of my queries into one. I will use the Adventure Works sample database, and I need to merge the FactInternet Sales and DimCustomer tables.
I’ll remove some of the columns from my fact table, and keep only the CustomerKey column, as this is a foreign key to a DimCustomer table, and the Sales Amount column. I will join the DimCustomer table as it is, without any additional steps before merging.
Merging tables is equivalent to JOIN operation in SQL. Essentially, we choose the column on which we want to perform MERGE operation, and the type of join (left, outer, or inner).
The problem is that by default, when you’re merging two queries, Power Query will generate a nested join statement, which can’t be properly translated in SQL.
If I go to the Tools tab and click on Diagnose Step, I can see that the Mashup engine fired two separate queries to my underlying SQL Server database — in other words, those two queries couldn’t be executed as a single SQL statement, and that means that query didn’t fold!
How do we solve this? Let’s just choose a blank query and write our M code by hand to achieve exactly the same result.
The key thing is that we will use a similar, but still different M function: Table.Join.
All function arguments are exactly the same as previously, and let’s now check the outcome.
You remember once I told you that when the View Native Query is greyed out, your query probably doesn’t fold, but it’s not 100% correct. And, this is a good example. If you take a look at View Native Query, it still shows that our query doesn’t fold…
…but let’s go to Diagnostics and check if that’s true.
Oh, boy, we were tricked — this step indeed folded! As you can see in the illustration above, we have a single SQL query generated and sent to a SQL Server source database to be executed.
So, we found two devils in this example — the first one was a join type, which we were able to solve by tweaking the automatically generated M code. And, the other one was the incorrect behavior of the View Native Query option. I’ll show you in the next part of the series one more example when View Native Query lies.
Query folding in Power BI — tricks, lies & ultimate performance test
I assume you are now familiar with the concept of query folding in Power BI, and especially with its importance for data refresh and incremental refresh processes. We’ve also started to scratch some interesting behaviors of Power Query transformations, and in this final part of the article, I will show you a few more interesting findings.
Finally, we will wrap it up with the ultimate performance test — I will show you the exact numbers behind two identical queries — one folds, and the other does not!
Changing Data types
One of the most common transformations in Power Query is changing data type. It’s a well-known best practice to use proper data types in your data model — for example, if you don’t need hours, minutes, and seconds level of granularity in your reports, you should be better off getting rid of them and changing the data type of that column from Date/Time to Date only.
However, the road to hell is paved with good intentions:)…So, let me show you one subtle difference that can cause your query to become damn slow, even though you’ve stuck with the recommendation to use a proper data type!
As you can spot in the illustration above, my OrderDate column is of Date/Time data type. And, I want to switch it to Date only. There are (at least) two possible options to do this — the first one is to right-click on the column, expand the drop-down for the Change Type option (like I did in the illustration), and select Date type (just below the Date/Time):
A few important things happened here, so let me explain each of those:
- In the Applied Steps pane, you can notice that our transformation step had been recorded
- In the column itself, you can see that the time portion disappeared
- When I’ve opened the View Native Query dialog box, you can see that the Mashup engine nicely translated our transformation to a T-SQL CONVERT() function
- The M formula applied to this transformation step is: Table.TransformColumnTypes()
Let’s now examine the other option to change data type of our column:
Just below our previous Change Type option, there is a Transform option. Once you expand the drop-down, you can see the Date Only transformation. Let’s click on it and check what happens:
Looks quite similar, does it? But, let’s walk through all the things that happened now:
- Instead of the Changed Type step, we now have a step called Extracted Date
- The column itself looks exactly the same as in the previous example — no time part in there
- Ooops, the query doesn’t fold anymore! As you can see, the View Native Query option is greyed out!
- This time, M formula applied is: Table.TransformColumns()
So, one single different word in the M formula (Table.TransformColumnTypes vs Table.TransformColumns) affected our query so hard that it couldn’t be translated to SQL!
Takeover from this story: be careful, and watch out when you’re choosing options for changing data types!
Liar, Liar…
I’ve promised in the previous part of the article that I will show you one more example when the View Native Query option can fool you into thinking that query folding was broken, even if in reality it’s not true…
Let’s say that we want to keep only the top X rows from our table. In my case, I want to preserve the top 2000 rows from my fact table:
Once I’ve applied this step and checked the View Native Query, I can realize that my query folds, as my transformation was translated to a TOP clause in SQL:
Now, let’s say that I want to apply Absolute value transformation on my Sales Amount column. Normally, this transformation easily folds, as there is an ABS function in T-SQL:
However, if I right-click on this step, I will see that the View Native Query option is greyed out, so I would assume that this step broke my query folding!
Let’s check this in our Query Diagnostics tool:
Oh, my God! This step folded indeed! So, we were tricked by the View Native Query option again!
The key takeover here is: whenever you’re assuming that a specific transformation step can be folded (like in this example, when we knew that SQL has an ABS function to support our transformation), double-check what really happens under the hood!
The ultimate performance test
Ok, if I didn’t manage to convince you so far, why you should strive to achieve query folding, let me now pull my last ace up my sleeve!
I want to show you the difference in data refresh performance between the queries that return exactly the same results — one of them folds, and the other does not!
Test #1 Query folding ON
For this testing, I’ll use the FactOnlineSales table from the Contoso sample database. This table has around 12.6 million rows, and it’s good to demonstrate the magnitude of importance of the query folding concept.
In the first example, I’ve applied 9 different transformation steps, and all of them are foldable, as you can see in the following illustration:
Don’t pay attention to the SQL code that the Mashup engine generated: if you are a SQL professional, of course, you could write much more optimal SQL code — however, keep in mind that with auto-generated scripts by the Mashup engine, you are not getting the most optimal SQL — you are just getting correct SQL!
I will hit Close & Apply and turn on my stopwatch to measure how much time my data refresh lasts.
This query took 32 seconds to load 2.8 million records in my Power BI report. Data was loaded in batches of 100.000–150.000 records, which is a good indicator that the query folding is in place.
Test #2 Query folding OFF
Now, I will go back to Power Query Editor, and intentionally break query folding at the 3rd step (remember the example above with changing Date/Time type to Date), using the transformation for which I know that is not foldable:
Truth to be said, I will achieve a partial folding here, as first two steps will fold, but all subsequent steps after the Extracted Date transformation will not fold!
Let’s turn on the stopwatch again and check what happens:
The first thing to notice: this query took 4 minutes and 41 seconds to load into our Power BI report, which is approximately 10 times more than in our previous case when the query folded. This time, batches of loaded data were between 10.000 and 20.000 records.
But, what’s even more concerning — you can see that the total number of records loaded was almost 11 million!!! Instead of 2.8 million in the previous example! Why is it happening? Well, in the previous sections, I explained that when the Mashup engine can’t translate M language to SQL, it needs to pull ALL the data (from the moment when the query folding was broken), and THEN apply transformations on the whole chunk of imported data!
The final result is exactly the same — we have 2.830.017 records in our Power BI report — but, with query folding in place, all necessary transformations were performed on the SQL database side, and the Mashup engine got an already prepared data set. While in the second scenario, after we broke the query folding, the Mashup engine pulled the whole remaining rows (approx. 11 million), and only after that was it able to apply other transformation steps.
And, this was just a basic example, with one single table, and not so big in terms of data volume! Simply imagine the magnitude of implications on a larger dataset, with multiple tables in it.
Conclusion
Well, we covered a lot in this article. We learned about the data shaping concept, we introduced Power Query fundamentals, and we also learned what query folding is and why we should do our best to achieve it.
I’ve also shared with you some basic examples and neat tricks on how to achieve query folding in some common use cases.
In the end, please be aware that the query folding is a work in progress, and folks from the Power BI team are constantly improving this feature. So, it can happen that some of the issues with query folding I’ve shown you here are resolved in the meantime. Therefore, be sure to stay up to date with the latest improvements.
Thanks for reading!
Source link
#Query #Folding #Power #Care