BI Office Query Speed Optimization
by the Pyramid Analytics’ Dev Team
Given the vast number of query formations possible with MDX, together with the almost endless cube designs and calculation combinations available, there is often a significant challenge building the fastest queries possible automatically. To address this problem, BI Office has deployed several optimization switches to let end users adjust the way a query is built to deliver the fastest possible result
Several of the following optimization features are already available in various versions of BI Office. The document below, however, addresses the latest implementation of these items in 6.31 onwards.
Keeping it Simple
Given the large number of optimization choices available in BI Office, we recommend applying the new “Query Options” button (on the Query ribbon) to a slow query, which will enable each of the optimizations covered in this document. If performance does not improve, then we recommend turning each item off, one by one, until the optimal formulation is found for your specific circumstance. The query options include:
- Optimize Columns
- Optimize Rows
- Measure Optimization
- Optimize Totals
- Context Heuristics
- Sub Query Mode
Column and Row Optimization
Selecting the “Optimize Columns” or “Optimize Rows” switches (or both) employs the NONEMPTY MDX function to reduce the number of blanks in the query result. By removing sparsity from the query, performance can be dramatically improved in many cases. This is most notable when two or more hierarchies, each from different dimensions, are used on a given query axis (columns or rows).
When “Optimize Columns” and “Optimize Rows” switches are used, we recommend turning on “Measure Optimization” as well (see below).
Key things to note:
- NONEMPTY can sometimes return no results depending on the cube structure and/or calculation logic. This can typically happen when:
- One or more ALL levels are empty.
- MDX scope statements programmatically return no results for the query selections.
- The fact table and dimension tables intersect with no results for the selected dimensions and the default model/cube measures.
- NONEMPTY can sometimes make queries run slower. This is often seen when determining sparsity takes longer than running the query itself. As such, this optimization should only be used on large queries.
- Using NONEMPTY on an axis (columns or rows) that has only one selected hierarchy is sometimes counterproductive. In these cases, it should generally not be used.
Because of the wide variance in results and the complexity in automatically determining if they are appropriate or not, the optimization switches cannot be applied automatically and should be left off unless specifically needed.
Column and row optimization can be triggered from the “Optimize” buttons in the “Columns” or “Rows” on the Query ribbon; or by the “Query Options”’ button also found in the Query ribbon.
Filter and N-of-N Optimization
The NONEMPTY switch can also be triggered by clicking the “Filter” and “N-of-N” buttons, found on the Query ribbon, and checking Optimize in the dialogs. These are simply shortcuts to the row and column optimization functionality described above. Running filters and N-of-N with selections with the NONEMPTY applied can produce significant improvements in performance. Review more details on the value of N-of-N HERE.
NONEMPTY vs NON EMPTY
NONEMPTY and NON EMPTY are two different functions. The NONEMPTY optimization, described above, attempts to eliminate sparsity in the query as part of the query’s functional design. In contrast, the NON EMPTY clause, applied to each axis, attempts to eliminate blank rows or columns from the final data set, after the result has been generated. The difference is usually minimal unless the cube is very large, the query result set is large, or there are complex calculations involved.
The “Measure Optimization” switch will calibrate the way column or row optimization operates. If measures are placed on the column or row axis, the NONEMPTY function often performs far better if the specific measures in the queries are injected into its formulation. Since the switch uses heuristics to determine applicability and activation, it should always be left on when using column or row optimization. Further, testing has shown that at worst it does not make the standard optimization NONEMPTY formulation any slower.
Note that this does not affect totals optimization (below).
“Measure Optimization” can be triggered from the “Query Options” button in the Query ribbon. BI Office is able to detect when it is generally optimal to apply this function, so it should be left on by default.
The “Optimize Totals” switch also employs the NONEMPTY function, but does so only within the total operation when it is used in a query. Using totals optimization does not trigger column or row optimization as well (and vice versa). The optimization, as described above, can have a dramatic improvement on totaling performance and overall query speed. Unfortunately, it also has the same drawbacks, so its use needs to be checked carefully.
Note that the NONEMPTY will not be deployed if there is only one hierarchy in the totals function.
Totals optimization can be triggered from the “Totals” menus on the Home or Query ribbons; from the “Query Options” button in the Query ribbon; or under the “Advanced Options” tab in the “More Totals Options…” dialog box from the “Totals” menu.
The “Context Heuristics” optimization switch will inject the EXISTING MDX function explicitly into the query as needed. The heuristic engine will determine if it is needed to eliminate redundant query cycles that are sometimes produced in the Microsoft MDX engine when using two or more hierarchies from the same dimension on a specific axis (column or row).
By employing this function, highly nested queries with sibling hierarchies can see dramatic improvements in performance. This function should not be confused with NONEMPTY—which removes sparsity from the cross join of hierarchies from two or more different dimensions. In a few cases, it can make queries slower—in which case it should be turned off.
“Context Heuristics” can be triggered from the “Query Options” button in the Query ribbon. BI Office is able to detect when it is generally optimal to apply this function, so it should be left on by default.
The MDX “Sub Query Mode” capability is provided to focus the main query on the relevant part of the model/cube. In many cases, it can improve performance, especially when attempting to filter or slice a query. BI Office, in some query situations, uses sub queries automatically to solve requests when there is no alternative method.
However, sub queries in the Microsoft engine come with numerous drawbacks that make them impractical as a universal solution to all query types.
Key things to note:
- Sub queries cannot contain any calculations or sets. This severely limits them to queries with natural elements and basic selections only. Workarounds to this problem exist, but severely impact the breadth and depth of the calculation capabilities afforded in BI Office with MDX.
- Sub queries often perform no better than the standard WHERE clause, especially when single elements from dimensions are chosen as the filter. They have a greater impact when multiple elements from a given hierarchy are chosen as the filter.
- Sub query filter selections can be undone by certain main query selections and functions—at times producing erroneous results.
- If context is needed in the main query and it also affects the slicers, then the context (WHERE) needs to applied twice: in the sub query and the main query. This can produce complex query statements and unintended results.
Because of the wide variance in results and the complexity in automatically determining if it is appropriate or not, the sub query switch cannot be applied automatically and should be left off unless specifically needed. Users can, however, elect to make sub querying the default option from the user option section in BI Office.
Sub querying can be triggered from the “Query Options” button in the Query ribbon.
Other Optimization Tips
For End Users
1. Turn “Query Pre-check” off: This BI Office feature is designed to protect users from accidentally running large queries and depleting server resources. The pre-check attempts to measure the size of the possible rows and columns in the result set and will block overly large queries from executing. By switching this feature off, the pre-check process, which can take much time to calculate, can be skipped.
“Query Pre-check” can be turned on or off from the Query Ribbon.
2. Use the “Flat” grid visualization: BI Office allows users to choose from six grid styles. If the query result is large, the “Flat” grid will deliver the fastest rendering time, while the “Multi” grid will produce the longest. Switching to the “Flat” grid for large queries will improve performance.
Grid visualization styles can be set from the “Type” button on the Grid ribbon.
3. Build better aggregations using Usage Based Optimization in SSAS: This feature applies only to OLAP cubes. By sampling queries against your cubes in Analysis Services, you’ll be able to use the sampled data to drive better aggregation designs in the cubes. Applying these aggregations can also dramatically improve the performance of your cubes as long running, but highly demanded queries, are optimized inside the cube’s structure.
4. Check IIS compression settings on the web server: As of 6.31, the recommendation is to turn OFF both static and dynamic settings for the BI Office website—they usually add processing overhead without any particular value. Note that in many cases, leaving it on for the “HTML” and “MobileApp” virtual application nodes in the site can have a positive impact. If you’re unsure, these should be left ON.
Obliquely related to this topic, we noticed some other optimizations:
- If you're using a tabular model, slim it down as much as possible, removing unused or irrelevant columns. This makes it lighter and faster for queries.
- Sometimes, using custom sets in the query make it faster, especially on Tabular (perhaps a Microsoft bug).
- If you're embedding BioXL content into a BioPoint, remove the unused slicers from the BioXL sources. It will make the rendering of the underlying query faster in BI Office.
- Highly nested queries (many dimensional selections on rows or columns) makes Tabular queries really slow. So cutting it down to roughly 2 on columns or 2 on rows produces a far better response.