By Martin Grob
Introduction
In reality dimensions in an InfoCube are often designed by business terms (like material, customer etc.) This often leads to the impression that InfoCube dimensions should be designed based on business constraints. This although should not be the leading criteria and shouldn't drive the decision.
Aside from the datavolume which depends on the granularity of the data in the InfoCube, performance is very much depending on how the InfoObject are arranged in the dimensions. Although this has no impact on the size of the fact table it certainly has one on the size of the dimensions.
How is a dimension then designed?
The main goal distributing the InfoObjects in their dimensions must be to keep the dimensions as small as possible. The decision on how many dimension and what InfoObjects go where is purely technical driven. In some cases this matches the organisational view but this would only be a conicidence and not the goal.
There is a few guidelines that should be considered assigning InfoObjects to dimensions:
- Use as many dimensions as necessary but it's more important to minimize dimension size rather than the number of dimensions.
- Within the dimension only characteristics that have a 1:n relation should be added (e.g. material and product hierarchy)
- Within a dimension there shouldn't be n:m relations. (e.g. product hierarchy and customer)
- Document level InfoObjects or big characteristics should be designed as Line-Item dimensions. Line item dimensions are not a true dimensions they have a direct link between the fact table and the SID table.
- The most selective characteristics should be at the top of the dimension table
- Don't mix characteristics with values that change frequently causing large dimension tables. (e.g. material and promotions)
- Consider also to combine unrelated characteristics it can improve performance by reducing the number of table joins. (you only have 13 dimensions so combine the small ones)
As a help the report (SE38) SAP_INFOCUBE_DESIGNS can be used.
This yellow marked dimension should be converted into a line item dimension if it contains a document level characteristic or it is simply bad design.
The maximum number of entries a dimension potentially can have is calculated through the cartesian product of all SID's. (e.g. 10'000 customer and 1'000 product hierarchies lead to 10'000'000 possible combinations in the dimension table. It's unlikely that this is going to happen and while designing the dimension this should also be considered - analyzing the possibilities of all customers buying all products in this case.
In cases where there is an m:m relationship it usually means there is a missing entity between those two and therefore they should be stored in different dimensions.
Once data is loaded into the InfoCube a check on the actual number of records loaded into the dimension table vs. the number of record in the fact table should be done. As a rule of thumb the ratio should be between 1:10 and 1:20.
Degenerated Dimensions
If a large dimension table reaches almost the size of the fact table when measured the number of rows in the tables it's a degenerated dimension. The OLAP processer has to join two big tables which is bad for the query perfromance. Such dimensions can be marked as Line Item Dimensions causing the database not to create an actual dimension table. Checking the table /BIC/F<INFOCUBE> will then show that instead of the DMID dimension key the SID of the degenerated dimension table is placed in the fact table. (Field name RSSID). With this a join of the two tables is eliminated. Those dimensions can only hold one InfoObject as a 1:1 relationship must exist between the SID value and the DIMID.
Dimensions with a lot of unique values can be set to High Cardinality which changes the method of indexing dimensions. (ORA DB only) This results in a switch from a bitmap index to a B-Tree index.
Defining a dimension as Line Item Dimension / High Cardinality
Conclusion
Finding the optimal model and balancing the size and the number of dimensions is a delicate excercise.
Dimensions in MultiProvider do not have to follow the underlying InfoCubes definitions. Those can be focused on the end users need and be structrured by the organizations meaning. This does not affect the performance as the MultiProvider does not have a physically existing datamodel on the database.
Designing the dimension in an InfoCube correctly can have a significant improvement on performance!