

Dimension design: A different perspective

Objective: The objective of this post is to simplify the understanding of InfoCube dimension design and to show how to decide on dimensions based on the repetition of the data held in the dimension tables.


Pre-requisites: An InfoCube has already been created and activated, and is filled with data that will be used for the analysis of its dimension tables.

Dimension-to-Fact Ratio Computation: This ratio expresses the number of records in a dimension table as a percentage of the number of records in the fact table, in other words, how large a dimension table is relative to the fact table. Mathematically, the equation is:

          Ratio (%) = (Number of rows in dimension table × 100) / (Number of rows in fact table)

Dimension Table Design Concept: We have read and heard over and over again that characteristics should go into the same dimension if they share a 1:1 or 1:M relationship, and into separate dimensions if they share an M:M relationship. What are these 1:1 and 1:M relations? They describe the relationship the characteristics share with one another.
For instance, if one plant can have only one storage location and one storage location can belong to only one plant at any given point in time, the relationship between them is 1:1.
If one functional location can have many pieces of equipment, but each piece of equipment can belong to only one functional location, the relationship between functional location and equipment is 1:M.
If one sales order can contain many materials and one material can appear in many different sales orders, there is no dependency between the two, and the relationship is many-to-many, or M:M.

Challenges in understanding the relationships: We SAP BI consultants often depend on the functional consultants to help us with the relationships shared between these characteristics/fields. Due to time constraints we generally cannot dedicate time to explaining the purpose of this exercise to the functional consultants, and understanding these relationships thoroughly takes a lot of time.


Scenario: The InfoCube ZPFANLSYS had a few dimensions that were far larger than the preferred 20% ratio. It had to be redesigned so that every dimension came in under the 20% ratio.
This ratio can either be derived manually, by comparing the number of entries in the desired dimension table (/BIC/D<InfoCube name><dimension number>) with the number of entries in the fact table (/BIC/F<InfoCube name> or /BIC/E<InfoCube name>), or the program SAP_INFOCUBE_DESIGNS can be executed in SE38, which reports this ratio for all dimensions of all InfoCubes in the system.
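
If you want to script the manual check instead of running SAP_INFOCUBE_DESIGNS, a small ABAP report along the following lines can do the counting. This is only a sketch: the report name Z_DIM_RATIO is made up, and the hard-coded table names are taken from the ZPFANLSYS example discussed below, following the /BIC/D<cube><n>, /BIC/F<cube> and /BIC/E<cube> naming convention.

REPORT z_dim_ratio.

* Sketch: dimension-to-fact ratio for dimension 2 of the example
* InfoCube ZPFANLSYS. Adjust the table names for your own InfoCube.
DATA: lv_dim_rows  TYPE i,
      lv_f_rows    TYPE i,
      lv_e_rows    TYPE i,
      lv_fact_rows TYPE i,
      lv_ratio     TYPE p DECIMALS 2.

* Rows in the dimension table
SELECT COUNT( * ) FROM /bic/dzpfanlsys2 INTO lv_dim_rows.

* Rows in the uncompressed (F) and compressed (E) fact tables
SELECT COUNT( * ) FROM /bic/fzpfanlsys INTO lv_f_rows.
SELECT COUNT( * ) FROM /bic/ezpfanlsys INTO lv_e_rows.
lv_fact_rows = lv_f_rows + lv_e_rows.

* Ratio = rows in dimension table x 100 / rows in fact table
IF lv_fact_rows > 0.
  lv_ratio = lv_dim_rows * 100 / lv_fact_rows.
ENDIF.

WRITE: / 'Dimension rows :', lv_dim_rows,
       / 'Fact rows      :', lv_fact_rows,
       / 'Ratio (%)      :', lv_ratio.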

SAP_INFOCUBE_DESIGNS:
[Screenshot: SAP_INFOCUBE_DESIGNS output for InfoCube ZPFANLSYS]
We can see from the report that the total number of rows in the fact table is 643850. Dimension 2 (/BIC/DZPFANLSYS2) has around 640430 rows, which is about 99% (99.49%) of the fact table rows, and Dimension 4 (/BIC/DZPFANLSYS4) has around 196250 rows, which is about 30% (30.48%) of the fact table rows.

Infocube ZPFANLSYS:
[Screenshot: dimension structure of InfoCube ZPFANLSYS]

Approach:

Step 1: Analyse the dimension table /BIC/DZPFANLSYS2 to plan how to reduce its number of records.
/BIC/DZPFANLSYS2
[Screenshot: number of entries in /BIC/DZPFANLSYS2]

Fact table:
[Screenshot: number of entries in the fact table]
The dimension table holds one record more than the fact table.
View the data in table /BIC/DZPFANLSYS2 (the table for dimension 2) in SE12 and sort it on all fields. This sorting helps us spot rows that repeat the same values across many columns, which in turn leads to an understanding of the relationships between the characteristics (the columns of the dimension table).
[Screenshot: contents of /BIC/DZPFANLSYS2 sorted on all fields]

Identifying the relationships:
Once the sorting is done, we need to look for values that repeat across the columns. If all the columns had the same data, the repeating records could have been collapsed into a single row with one dimension ID. The repetition is caused by one or more columns that contribute a unique value to each row; if such columns are removed from the table, the number of rows comes down.
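As a quick cross-check of this reasoning, you can count the distinct values per SID column directly in the dimension table: a column whose distinct count is close to the total row count is the one forcing a new dimension ID for nearly every record. A minimal sketch, assuming the table and column names from the example below (the report name is made up):

REPORT z_dim_cardinality.

* Sketch: compare distinct-value counts of the suspect SID columns of
* /BIC/DZPFANLSYS2 with the total number of rows in the table.
DATA: lv_total TYPE i,
      lv_abnum TYPE i,
      lv_nplda TYPE i.

SELECT COUNT( * )                   FROM /bic/dzpfanlsys2 INTO lv_total.
SELECT COUNT( DISTINCT sid_zabnum ) FROM /bic/dzpfanlsys2 INTO lv_abnum.
SELECT COUNT( DISTINCT sid_0nplda ) FROM /bic/dzpfanlsys2 INTO lv_nplda.

WRITE: / 'Total rows          :', lv_total,
       / 'Distinct SID_ZABNUM :', lv_abnum,
       / 'Distinct SID_0NPLDA :', lv_nplda.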

In the screenshot below I have highlighted in green the rows that repeat themselves under new dimension IDs; only two columns, SID_ZABNUM and SID_0NPLDA, have a new value in every row. Because these two columns take a new value for every row, the remaining columns repeat themselves, which inflates the size of the dimension table. It can therefore safely be said that these two columns do not belong in this dimension, so the related characteristics (ZABNUM and 0NPLDA) need to be moved out of it.
A few rows repeat themselves for most of the columns but occasionally take a new value in some columns, as highlighted in yellow in the screenshot below. This indicates that those columns share a 1:M relationship with the columns whose values repeat, and they can be left in the same dimension.
[Screenshot: sorted dimension table with repeating rows highlighted in green and yellow]
Conclusion: The columns marked in green belong in this dimension table, and the columns marked in red need to go into other dimension tables.
[Screenshot: dimension table columns marked green (keep) and red (move to other dimensions)]
Step 2: Create a copy infocube C_ZPFAN and create new dimensions to accommodate ZABNUM and 0NPLDA.
[Screenshot: dimensions of the copy InfoCube C_ZPFAN]
ZABNUM was added to dimension C_ZPFAN8 and 0NPLDA was added to C_ZPFAN7. These were marked as line-item dimensions, since each contains only one characteristic.
The issue with dimension 4 was analysed in the same way, and other dimensions were changed to improve the situation.

After the changes, the data was loaded into the copy InfoCube C_ZPFAN, and the dimension table /BIC/DC_ZPFAN2 was found to contain 40286 records.
[Screenshot: number of entries in /BIC/DC_ZPFAN2]

Ratio: 40286 × 100 / 657400 ≈ 6.13 %


SAP_INFOCUBE_DESIGNS:
[Screenshot: SAP_INFOCUBE_DESIGNS output for the copy InfoCube C_ZPFAN]

Dimension 2 of the copy InfoCube: /BIC/DC_ZPFAN2
[Screenshot: contents of /BIC/DC_ZPFAN2]
Even now there are a few repeated rows and columns, but the ratio is within 20%. We can create up to 13 dimensions in an InfoCube, but it is always better to keep a dimension or two free for future enhancements.

Hope this was helpful.

All about.... SAP BW / BI Data Load Performance Analysis and Tuning

SAP BW Data Load Performance Analysis and Tuning

By: Ron Silberstein

Overview:

The staging process of any significant volume of data into SAP BW presents challenges to system resource utilization and timeliness of data. This session discusses the causes of data load performance issues, highlights the troubleshooting process, and offers tuning solutions to help maximize throughput. Many aspects of data load performance analysis and tuning are covered including extraction, packaging, transformation, parallel processing, as well as change run and aggregate rollup.




All about BEX Query Performance....

Checklist for Query Performance

By: Neelam

1. If exclusions exist, make sure they exist in the global filter area. Try to remove exclusions by subtracting out inclusions.

2. Use Constant Selection to ignore filters in order to move more filters to the global filter area. (Use ABAPer to test and validate that this ensures better code)

3. Within structures, make sure the filters are ordered with the highest-level filter first.

4. Check code for all exit variables used in a report.

5. Move Time restrictions to a global filter whenever possible.

6. Within structures, use user exit variables to calculate things like QTD and YTD. This should generate better code than using overlapping restrictions to achieve the same thing. (Use ABAPer to test and validate that this ensures better code; a sketch of such an exit variable follows this checklist.)

7. When queries are written on MultiProviders, restrict to the InfoProvider in the global filter whenever possible. MultiProvider (MultiCube) queries require additional database table joins to read data compared with queries against standard InfoCubes (InfoProviders), so hard-coding the InfoProvider in the global filter eliminates this overhead.

8. Move all global calculated and restricted key figures to local ones so that you can analyze which filters can be removed and moved to the global definition of the query. You can then change the calculated key figure and go back to using the global calculated key figure if desired.

9. If Alternative UOM solution is used, turn off query cache.

10. Set the read mode of the query based on whether it is static or dynamic. Reading data during navigation minimizes the impact on the R/3 database and application server resources, because only the data that the user requires is retrieved. For queries involving large hierarchies with many nodes, it is wise to select the "Read data during navigation and when expanding the hierarchy" option to avoid reading data for hierarchy nodes that are not expanded. Reserve the "Read all data" mode for special queries, for instance when a majority of the users need a given query to slice and dice against all dimensions, or when the data is needed for data mining. This mode places heavy demand on database and memory resources and might impact other SAP BW processes and tasks.

11. Turn off formatting and results rows to minimize Frontend time whenever possible.

12. Check for nested hierarchies. Always a bad idea.

13. If “Display as hierarchy” is being used, look for other options to remove it to increase performance.

14. Use Constant Selection instead of SUMCT and SUMGT within formulas.

15. Review the order of restrictions in formulas. Apply as many restrictions as possible before calculations, and try to avoid calculations before restrictions.

16. Check Sequential vs Parallel read on Multiproviders.

17. Turn off warning messages on queries.

18. Check to see if performance improves by removing text display (Use ABAPer to test and validate that this ensures better code).

19. Check to see where currency conversions are happening if they are used.

20. Check aggregation and exception aggregation on calculated key figures. Calculation before aggregation is generally slower and should not be used unless explicitly needed.

21. Avoid Cell Editor use if at all possible.

22. Make sure queries are regenerated in production using RSRT after changes to statistics, consistency changes, or aggregates.

23. Within the free characteristics, filter on the least granular objects first and make sure those come first in the order.

24. Leverage characteristics or navigational attributes rather than hierarchies. Using a hierarchy requires reading temporary hierarchy tables and creates additional overhead compared to characteristics and navigational attributes. Therefore, characteristics or navigational attributes result in significantly better query performance than hierarchies, especially as the size of the hierarchy (e.g., the number of nodes and levels) and the complexity of the selection criteria increase.

25. If hierarchies are used, minimize the number of nodes to include in the query results. Including all nodes in the query results (even the ones that are not needed or blank) slows down the query processing. The “not assigned” nodes in the hierarchy should be filtered out, and you should use a variable to reduce the number of hierarchy nodes selected.
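
For item 6 above, customer-exit variables are implemented in the include ZXRSRU01 (customer exit EXIT_SAPLRRS0_001 of enhancement RSR00001). The sketch below shows the general shape of such an exit filling a year-to-date interval; the variable name ZYTD_RANGE is an assumption and the interface structures can differ slightly between releases, so treat it as an outline rather than a drop-in implementation.

* Include ZXRSRU01 - customer exit for BEx variables
* (enhancement RSR00001, function EXIT_SAPLRRS0_001).
* Sketch only: fills an assumed exit variable ZYTD_RANGE
* with a year-to-date interval on a date characteristic.
DATA: ls_range TYPE rrrangesid,
      lv_first TYPE sy-datum.

CASE i_vnam.
  WHEN 'ZYTD_RANGE'.
    IF i_step = 2.                  " step 2 = after the variable popup
      lv_first      = sy-datum.
      lv_first+4(4) = '0101'.       " 1st of January of the current year
      CLEAR ls_range.
      ls_range-sign = 'I'.          " include
      ls_range-opt  = 'BT'.         " between
      ls_range-low  = lv_first.
      ls_range-high = sy-datum.     " today
      APPEND ls_range TO e_t_range.
    ENDIF.
ENDCASE.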