Showing posts with label enterprise_data_warehouse_architecture. Show all posts
Showing posts with label enterprise_data_warehouse_architecture. Show all posts

By Martin Grob


Introduction


In reality dimensions in an InfoCube are often designed by business terms (like material, customer etc.) This often leads to the impression that InfoCube dimensions should be designed based on business constraints. This although should not be the leading criteria and shouldn't drive the decision. 
Aside from the datavolume which depends on the granularity of the data in the InfoCube, performance is very much depending on how the InfoObject are arranged in the dimensions. Although this has no impact on the size of the fact table it certainly has one on the size of the dimensions.


How is a dimension then designed?

The main goal distributing the InfoObjects in their dimensions must be to keep the dimensions as small as possible. The decision on how many dimension and what InfoObjects go where is purely technical driven. In some cases this matches the organisational view but this would only be a conicidence and not the goal.

There is a few guidelines that should be considered assigning InfoObjects to dimensions:
  • Use as many dimensions as necessary but it's more important to minimize dimension size rather than the number of dimensions.
  • Within the dimension only characteristics that have a 1:n relation should be added (e.g. material and product hierarchy)
  • Within a dimension there shouldn't be n:m relations. (e.g. product hierarchy and customer)
  • Document level InfoObjects or big characteristics should be designed as Line-Item dimensions. Line item dimensions are not a true dimensions they have a direct link between the fact table and the SID table. 
  • The most selective characteristics should be at the top of the dimension table
  • Don't mix characteristics with values that change frequently causing large dimension tables. (e.g. material and promotions)
  • Consider also to combine unrelated characteristics it can improve performance by reducing the number of table joins. (you only have 13 dimensions so combine the small ones)

As a help the report (SE38) SAP_INFOCUBE_DESIGNS can be used.
image001.png
This yellow marked dimension should be converted into a line item dimension if it contains a document level characteristic or it is simply bad design.

The maximum number of entries  a dimension potentially can have is calculated through the cartesian product of all SID's. (e.g. 10'000 customer and 1'000 product hierarchies lead to 10'000'000 possible combinations in the dimension table. It's unlikely that this is going to happen and while designing the dimension this should also be considered - analyzing the possibilities of all customers buying all products in this case.
In cases where there is an m:m relationship it usually means there is a missing entity between those two and therefore they should be stored in different dimensions.
Once data is loaded into the InfoCube a check on the actual number of records loaded into the dimension table vs. the number of record in the fact table should be done. As a rule of thumb the ratio should be between 1:10 and 1:20.


Degenerated Dimensions

If a large dimension table reaches almost the size of the fact table when measured the number of rows in the tables it's a degenerated dimension. The OLAP processer has to join two big tables which is bad for the query perfromance. Such dimensions can be marked as Line Item Dimensions causing the database not to create an actual dimension table. Checking the table /BIC/F<INFOCUBE> will then show that instead of the DMID dimension key the SID of the degenerated dimension table is placed in the fact table. (Field name RSSID). With this a join of the two tables is eliminated. Those dimensions can only hold one InfoObject as a 1:1 relationship must exist between the SID value and the DIMID.
Dimensions with a lot of unique values can be set to High Cardinality which changes the method of indexing dimensions. (ORA DB only) This results in a switch from a bitmap index to a B-Tree index.
image001-1.png
Defining a dimension as Line Item Dimension / High Cardinality


Conclusion

Finding the optimal model and balancing the size and the number of dimensions is a delicate excercise.
Dimensions in MultiProvider do not have to follow the underlying InfoCubes definitions. Those can be focused on the end users need and be structrured by the organizations meaning. This does not affect the performance as the MultiProvider does not have a physically existing datamodel on the database.    
Designing the dimension in an InfoCube correctly can have a significant improvement on performance!

By: Manohar Delampadi          scn.sap.com

Objective: The objective of this post is to simplify the understanding on dimension designs of an infocube and to decide upon the dimensions based on the repetition of the data held in the dimension tables.

Pre-requisites: An infocube is already created and active, and filled will data, which will be used for analysis of dimension tables.

Dimension to Fact Ratio Computation: This ratio is a percentage figure of the number of records that exists in the dimension table to the number of records in fact table or what percentage of fact table size is a dimension table. Mathematically putting it down, the equation would be as below:

          Ratio = No of rows in Dimension table X 100 / No of rows in Fact Table

Dimension Table Design Concept: We have been reading and hearing over and over again that the characteristics should be added into a dimension if there exists a 1:1 or 1:M relation and they should be in separate dimension if there exists a M:M relation. What is this 1:1 or 1: M? This is the relation which the characteristics share among each other.
For instance if one Plant can have only one Storage Location and one storage location can belong to only one plant at any given point of time, then the relation shared among them is 1:1.
If 1 Functional Location can have many equipment but one equipment can belong to only one functional location then the relation shared between the functional location and Equipment is 1:M.
If 1 sales order can have many materials and one material can exist in different sales orders then there absolutely is no dependence among these two and the relation between these two is many to many or M: M.

Challenges in understanding the relationship: Often we SAP BI consultants depend on the Functional consultants to help us out with the relationship shared between these characteristics / fields. Due to time constraint we generally cannot dedicate time to educate the functional consultants on the purpose of this exercise, and it takes a lot of time to understand this relationship thoroughly.


Scenario: An infocube ZPFANALYSIS had few dimensions which were way larger than the preferred 20% ratio. This had to be redesigned such that the performance was under 20% ratio.
This ratio could be either manually derived by checking the number of entries in the desired dimension table (/BIC/D<infocube name><dimension number>) to the fact table (/BIC/F<Infocube Name> or /BIC/E<Infocube name>) or a program SAP_INFOCUBE_DESIGNS can be executed in SE38 which reports this ratio for all the dimensions, for all the infocubes in the system.

SAP_INFOCUBE_DESIGNS:
1.jpg
We can find from the report that the total number of rows in the fact table is 643850. Dimension 2 (/BIC/DZPFANLSYS2) has around 640430 rows, which is 99% (99.49%)of the fact table rows and Dimension 4(/BIC/DZPFANLSYS4) has around 196250 rows, which is 30%  (30.48%)of the fact table rows.

Infocube ZPFANLSYS:
ZPFANLSYS.jpg

Approach:

Step 1: Analysis of the dimension table /BIC/DZPFANALSYS2 to plan on reducing the number of records.
/BIC/DZPFANLSYS2
3.jpg

Fact table:
4.jpg
Dimension table holds 1 record more than the fact table.
View the data in the table /BIC/DZPFANLSYS2 (Table related to Dimension 2) in SE12 and sort all the fields. This sorting will help us spot the rows which have repeated values for many columns, which will eventually lead to understanding the relationship between the characteristics (columns in dimension table).
5.jpg

Identifying the relationships:
Once the sorting is done we need to look out for the number of values that repeat across the columns. All the records which repeat could have been displayed in a single row with one dimension id assigned if all the columns had same data. The repetition is a result of one or more columns which contribute a unique value to each row. Such columns if removed from the table then the number of rows in the table will come down.

In the below screenshot I’ve highlighted the rows in green that were repeating themselves with new dimension IDs, as only 2 columns SID_ZABNUM and SID_0NPLDA have new values for every row. These two columns having new values for every row have resulted in rest of the columns repeating themselves and in turn increasing the data size in the dimension table. Hence it can be easily said that these two columns do not belong in this dimension tables, so the related characteristics (ZABNUM and 0NPLDA) need to be removed out of this dimension.
Few rows could be found which repeat themselves for most of the columns, but have a new value once in a while for some columns, as highlighted in yellow in the below screenshot. This indicates that these columns share a 1:M relation with the rest of the columns with repeated rows and these could be left in the same dimension.
6.jpg
Conclusion: The columns marked in green belong to this dimension tables and the columns marked in red needs to be in other dimension tables.
7.jpg
Step 2: Create a copy infocube C_ZPFAN and create new dimensions to accommodate ZABNUM and 0NPLDA.
8.jpg
ZABNUM was added to dimension C_ZPFAN8 and 0NPLDA was added to C_ZPFAN7. These were marked as line item dimensions as they have only one characteristic under them.
Analysed the issue with dimension 4 in the similar way and changed other dimensions to help the situation.

Post changes, loaded the data into the copy infocube C_ZPFAN and found the number of records in the dimension table /BIC/DC_ZPFAN2 to be 40286.
9.jpg

Ratio: 40286 / 657400 * 100 = 6.12 %


SAP_INFOCUBE_DESIGNS:
10.jpg

Dimension2 of the copy infocube: /BIC/DC_ZPFAN2
11.jpg
Even now there a few repeated rows and columns, but the ratio is within 20%. We can create up to 13 dimensions, but it is always better to keep a dimension or two free for future enhancements.

Hope this was helpful.

By: CSM Reddy from SCN

Run Program RSBKDTPREPAIR

RSDG_TRFN_ACTIVATE

RSDG_CUBE_ACTIVATE             Activation of InfoCubes

RSDG_ODSO_ACTIVATE             Activation of all ODS Objects

RSDG_IOBJ_ACTIVATE             Activation of all InfoObjects

RSDG_MPRO_ACTIVATE             Activating Multiproviders

RS_COMSTRU_ACTIVATE_ALL         Activate all inactive Communication Structures

RS_TRANSTRU_ACTIVATE_ALL         Activate Transfer Structure

RSAU_UPDR_REACTIVATE_ALL         Activate Update Rules
  
RRHI_HIERARCHY_ACTIVATE         Activate Hierarchies

SAP_AGGREGATES_ACTIVATE_FILL     Activating and Filling the Aggregates of an InfoCube

RS_PERS_ACTIVATE             Activating Personalization in Bex(Inactive are highlighted)

RSDDS_AGGREGATES_MAINTAIN —     attribute change run

RSSM_SET_REPAIR_FULL_REQUEST — changes full update to repair fulll request

RSTRSNSTRU_ACTIVATE_ALL–     activating transfer structure

RSPC_PROCESS_FINISH —         process chain details

RSDG_IOBJ_REORG         Repair InfoObjects

RSDG_IOBJ_REORG_TEXTS         Reorganization of Texts for InfoObjects

RSDG_MPRO_ACTIVATE         Activating Multiproviders

RSDG_MPRO_COPY             Make Multiprovider Copies

RSDG_MPRO_DELETE         Deleting Multiproviders

RS_COMSTRU_ACTIVATE_ALL     Activate all inactive Communication Structures

RS_TRANSTRU_ACTIVATE_ALL     Activate Transfer Structure

RSAU_UPDR_REACTIVATE_ALL     Activate Update Rules

RRHI_HIERARCHY_ACTIVATE     Activate Hierarchies

SAP_AGGREGATES_ACTIVATE_FILL    Activating and Filling the Aggregates of an InfoCube

SAP_AGGREGATES_DEACTIVATE     Deactivating the Aggregates of an InfoCube

RS_PERS_ACTIVATE         Activating Personalization in Bex(Inactive are highlighted)

RSSM_SET_REPAIR_FULL_FLAG     Convert Full Requests to Repair Full Requests

SAP_INFOCUBE_DESIGNS         Print a List of Cubes in The System and Their Layouts

SAP_ANALYZE_ALL_INFOCUBES     Create DB Statstics for all InfoCubes

SAP_CREATE_E_FACTTABLES     Create Missing E-Fact Tables for InfoCubes and Aggregates

SAP_DROP_EMPTY_FPARTITIONS     Locate/Remove Unused or Empty partitions of F-Fact Table

SAP_DROP_TMPTABLES         Remove Temperory Database Objects



Function Modules within BW.

Function Module Description (Function Group RRMX)

RRMX_WORKBOOK_DELETE         Delete BW Workbooks permanently from Roles & Favourites

RRMX_WORKBOOK_LIST_GET         Get list of all Workbooks

RRMX_WORKBOOK_QUERIES_GET     Get list of queries in a workbook

RRMX_QUERY_WHERE_USED_GET     Lists where a query has been used

RRMX_JUMP_TARGET_GET         Get list of all Jump Targets

RRMX_JUMP_TARGET_DELETE     Delete Jump Targets



Function Module Description

MONI_TIME_CONVERT         Used for Time Conversions.

CONVERT_TO_LOCAL_CURRENCY    Convert Foreign Currency to Local Currecny.

CONVERT_TO_FOREIGN_CURRENCY     Convert Local Currency to Foreign Currency.

TERM_TRANSLATE_TO_UPPER_CASE     Used to convert all texts to UPPERCASE

UNIT_CONVERSION_SIMPLE         Used to convert any unit to another unit. (Ref. table : T006)

TZ_GLOBAL_TO_LOCAL         Used to convert timestamp to local time

FISCPER_FROM_CALMONTH_CALC     Convert 0CALMONTH or 0CALDAY to Financial Year or Period

RSAX_BIW_GET_DATA_SIMPLE     Generic Extraction via Function Module

RSAU_READ_MASTER_DATA         Used in Data Transformations to read master data InfoObjects

Attribute tables:
·         Attribute tbl for Time Independent attributes:
·         /BI*/P<characteristic_name>
·         stored with characteristic values

Attribute tbl for Time Dependent attributes:
·         /BI*/Q<characteristic_name>
·         Fields DATETO & DATEFROM are included in time dependent attribute tbl.
·         stored with characteristic values

Dimension tables:
·         Dimension tbls (i.e. DIM tables): /BI*/D<Cube_name><dim.no.>
·         stores the DIMID, the pointer between fact tbl & master data tbl
·         data is inserted during upload of transact.data (data is never changed, only inserted)
·         Examples:
o    /bic/D(cube name)P is the package dimension of a content cube
o    /bic/D(cube name)U is the unit dimension of a content cube
o    /bic/D(cube name)T is the time dimension of a content cube
o    /bic/D(cube name)I is the user defined dimension of a content cube


External Hierarchy tables:
·         /BI*/I*, /BI*/J*, /BI*/H*, /BI*/K*
·         /BI0/0P...
·         are tables that occur in the course of an optimized preprocessing that contains many tables.
·         bic/H(object name) hierarchy data of object
·         For more information see Note 514907.


Fact tables:
·         In SAP BW, there are two fact tables for including transaction data for Basis InfoCubes: the F and the E fact tables.
o    /bic/F(cube name) is the F-fact table of a content cube
o    /bic/E(cube name) is the E-fact table of a content cube
·         The Fact tbl is the central tbl of the InfoCube. Here key figures (e.g. sales volume) & pointers to the dimension tbls are stored (dim tbls, in turn, point to the SID tbls).
·         If you upload data into an InfoCube, it is always written into the F-fact table.
·         If you compress the data, the data is shifted from the F-fact table to the E-fact table.
·         The F-fact tables for aggregates are always empty, since aggregates are compressed automatically
·         After a changerun, the F-fact table can have entries as well as when you use the functionality 'do not compress requests‘ for Aggregates.
·         E-fact tbl is optimized for Reading => good for Queries
·         F-fact tbl is optimized for Writing => good for Loads
·         see Note 631668 


Master Data tables
·         /BI0/P<char_name>
·         /bic/M(object name) master data of object
·         Master data tables are independent of any InfoCube
·         Master data & master data details (attributes, texts & hierarchies) are stored.
·         Master data table stores all time independent attributes (display & navigational attribues)


Navigational attributes tables:
·         SID Attribute table for time independent navigational attributes: /BI*/X<characteristic_name>
·         SID Attribute tbl for time dependent navigational attributes: /BI*/Y<characteristic_name>
·         Nav.attribs can be used for naviagtion purposes (filtering, drill down).
·         The attribs are not stored as char values but as SIDs (master data IDs).


P table:
·         P-table only gets filled if you load master data explicitly.
·         As soon as the SID table is populated, the P tbl is populated as well


SID table:
·         SID tbl: /BI*/S<characteristic>
·         stores the char value (eg customer number C95) & the SID. The SID is the pointer that is used to link the master data tbls & the dimension tbls. The SID is generated during the upload (uniqueness is guaranteed by a number range obj).
·         Data is inserted during the upload of master data or of transactional data
S table gets filled whenever transaction gets loaded. That means if any new data is there for that object in the transactions then SID table gets fillled.


Text table:
·         Text tbl: /BI*/T<characteristic>
·         stores the text for the chars
·         data is inserted & changed during the upload of text data attribs for the InfoObject
·         stored either language dependent or independent