Common Production Failures Encountered in BW / BI Production Support

Summary
This knowledge brief is a quick reference guide to help BW consultants solve complex production issues.
Author: Devakar Reddy TatiReddy

1 Transactional RFC Error (tRFC) – Non-Updated IDocs in the Source System.
1.1 Why does the error occur?
• A tRFC (transactional Remote Function Call) error occurs whenever LUWs (Logical Units of Work) are not transferred from the source system to the destination system.
1.2 What happens when this error occurs?
• A message appears at the bottom of the “Status” tab in RSMO. It reads “tRFC Error in Source System”, “tRFC Error in Data Warehouse”, or simply “tRFC Error”, depending on the system from which the data is being extracted.
• Sometimes IDocs also get stuck on the R/3 side because no processes were available to process them.
1.3 What can be the possible actions to be carried out?
• When this error is encountered, first try a complete refresh (F6) in RSMO and check whether the system clears the LUWs on its own.
• If the error persists after a couple of refreshes, follow the steps below promptly, because the load may otherwise fail with a short dump.
• In RSMO, go to the menu Environment -> Transact. RFC -> In the Source System. The system prompts you to log on to the source system.
• After you log on, a selection screen appears with “Date”, “User Name”, and tRFC options.
• Executing with F8 lists all stuck LUWs. The “Status Text” appears in red for stuck LUWs that are not being processed, and the “Target System” for those LUWs should be “WP1CL015”, the Bose BW production system. Do not execute any LUW whose “Target System” is not “WP1CL015”.
• Select the LUWs identified this way, then right-click and choose “Execute” (or press F6) so that they are cleared and the load on the BW side completes successfully.
• When IDocs are stuck, log on to R/3, run transaction BD87, and expand ‘IDoc in inbound processing’ for IDoc status 64 (IDoc ready to be transferred to application). Place the cursor on the error message (for IDoc type RSRQST only) and choose Process (F8). This pushes any stuck IDocs through on R/3.
• Monitor the load until it completes successfully, then complete any remaining loads in the process chain.
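The selection rule in the steps above (execute only stuck, red LUWs whose target system is the BW production system) can be sketched in Python. This is an illustration only: the `Luw` record and its field names are hypothetical stand-ins for the columns of the tRFC list, and the LUWs are still executed in the SAP GUI.

```python
from dataclasses import dataclass

# "Target System" of the BW production system named in the text.
BW_TARGET_SYSTEM = "WP1CL015"


@dataclass
class Luw:
    """Hypothetical stand-in for one row of the tRFC list in the source system."""
    tid: str            # transaction ID of the LUW
    status_red: bool    # True if the "Status Text" is shown in red (stuck)
    target_system: str  # value of the "Target System" column


def luws_to_execute(luws: list[Luw], target: str = BW_TARGET_SYSTEM) -> list[str]:
    """Return the IDs of stuck (red) LUWs whose target system is `target`.

    LUWs for any other target system are skipped, mirroring the rule
    "do not execute any LUW whose Target System is not WP1CL015".
    """
    return [luw.tid for luw in luws if luw.status_red and luw.target_system == target]


if __name__ == "__main__":
    rows = [
        Luw("0A1B", True, "WP1CL015"),   # stuck, BW production: execute
        Luw("0A1C", True, "OTHERSYS"),   # stuck, other system: leave alone
        Luw("0A1D", False, "WP1CL015"),  # not stuck: nothing to do
    ]
    print(luws_to_execute(rows))  # ['0A1B']
```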


2 Time Stamp Error.
2.1 Why does the error occur?
• The “Time Stamp” Error occurs when the Transfer Rules/Structure (TR/TS) are internally inactive in
the system.
• It can also occur whenever DataSources are changed on the R/3 side or DataMarts are changed on the BW side. In that case the transfer rules (TR) show an active status when checked, but they are not actually active, because the time stamps of the DataSource and the transfer rules differ.
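The point of 2.1 is that the displayed status alone is not enough: the time stamps have to match as well. A minimal Python sketch of that check, assuming hypothetical fields for the displayed status and the two time stamps (the text does not say where BW stores them):

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TransferRuleState:
    """Hypothetical snapshot of one set of transfer rules (TR/TS)."""
    shown_active: bool       # status displayed in RSA1
    tr_timestamp: datetime   # time stamp stored with the transfer rules
    ds_timestamp: datetime   # time stamp of the (replicated) DataSource


def needs_reactivation(state: TransferRuleState) -> bool:
    """True if the TR/TS must be activated again.

    Covers both cases from the text: the rules are shown as inactive, or they
    are shown as active but their time stamp differs from the DataSource's.
    """
    return (not state.shown_active) or state.tr_timestamp != state.ds_timestamp
```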
2.2 What happens when this error occurs?
• The message appears in the Job Overview in RSMO, or in the “Display Message” option of the process in the process chain (PC).
• Check the Transfer Rules in RSA1, Administrator Workbench.
2.3 What can be the possible actions to be carried out?
• Whenever we get this error, we first check the transfer rules (TR) in the Administrator Workbench. Check whether each rule is inactive and, if so, activate it.
• First replicate the relevant DataSource: right-click the source system of the DataSource -> Replicate DataSources.
• In such cases we can also execute the ABAP report “RS_TRANSTRU_ACTIVATE_ALL”. It asks for the source system name, the InfoSource name, and two checkboxes. To activate only the TR/TS that are locked, select the “LOCK” option; to activate only the TR/TS that are inactive, select “Only Inactive”.
• When executed, the report reactivates the TR/TS within that InfoSource even if they are already active.
• Now re-trigger the InfoPackage.
• Monitor the load until it completes successfully, then complete any remaining loads in the process chain.


3 Errors Caused by a Short Dump.
3.1 Why does the error occur?
• When a job fails with a “Time Out” error, the job has stopped for some reason while the request is still in yellow status, so the time-out limit is reached. This leads to a short dump in the system, either in R/3 or in BW.
• A short dump may also occur if the type of the incoming data does not match. For example, if a date field is not in the format specified in BW, the load may produce a short dump instead of an error message, every time the load is triggered.
3.2 What happens when this error occurs?
• A “Time Out” error is raised after the time specified in InfoPackage -> Time Out settings (which may differ between InfoPackages). Before that time is reached, a short dump may occur in the BW system or in the R/3 source system.
• The message appears in the Job Overview in RSMO, or in the “Display Message” option of the process in the PC.
3.3 What can be the possible actions to be carried out?
• A “Time Out” error usually results in a short dump. To check the short dump, go to Environment -> Short Dump -> In the Data Warehouse (or -> In the Source System).
• Alternatively, use transaction ST22 in the source system or BW system and choose the relevant option to display the short dumps for the specific date and time. Read the complete analysis of the short dump in detail before taking any action.
• For a “Time Out” error, check whether the time-out occurred after the extraction finished. The data may have been extracted completely before the short dump occurred; in that case the data does not need to be extracted again.
• To check whether the extraction completed, look at “Extraction” in the “Details” tab of the Job Overview. For a full load from R/3, we can also check the number of records in RSA3 in R/3 and confirm that the same number of records was loaded into BW.
• The short dump may show the runtime error "CALL_FUNCTION_SEND_ERROR", caused by a time-out on the R/3 side.
• In such cases, proceed as follows:
• If the data was extracted completely, change the QM status from yellow to green. If a cube is being loaded, create the indexes; for an ODS, activate the request.
• If the data was not extracted completely, change the QM status from yellow to red, then re-trigger the load and monitor it.
• Monitor the load until it completes successfully, then complete any remaining loads in the process chain.
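The decision in 3.3 can be summarised as a small function. A Python sketch, assuming the completeness check is done by hand (the “Extraction” entry in the Details tab, or RSA3 against the BW record count for a full load) and that the target is either a cube or an ODS; the function only lists the actions and changes nothing in the system.

```python
def timeout_actions(extraction_complete: bool, target_type: str) -> list[str]:
    """Return the follow-up actions for a request left in yellow after a time-out.

    extraction_complete: result of the completeness check described in 3.3.
    target_type: "CUBE" or "ODS" (other targets are not covered by the text).
    """
    if target_type not in ("CUBE", "ODS"):
        raise ValueError(f"unsupported target type: {target_type}")
    if extraction_complete:
        # Data is all there: only the status and the follow-up step are missing.
        follow_up = "create indexes" if target_type == "CUBE" else "activate request"
        return ["set QM status yellow -> green", follow_up]
    # Data is incomplete: mark the request as failed and load again.
    return ["set QM status yellow -> red", "re-trigger load"]


def full_load_complete(rsa3_count: int, bw_count: int) -> bool:
    """Full load only: complete if BW received as many records as RSA3 shows in R/3."""
    return rsa3_count == bw_count
```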


4 Job Cancellation in R/3 Source System.
4.1 Why does the error occur?
• This error is encountered when the job in the R/3 system is cancelled. The cause may be a problem in the system; sometimes other jobs running in parallel take up all the processes and the job is cancelled on the R/3 side.
• The error may or may not be caused by a time-out. A system hardware problem can also cause it.
4.2 What happens when this error occurs?
• The exact error message is "Job termination in source system". It may also read “The background job for data selection in the source system has been terminated”; both messages mean the same thing. Sometimes the message is “Job Termination due to System Shutdown”.
• The message appears in the Job Overview in RSMO, or in the “Display Message” option of the process in the PC.
4.3 What can be the possible actions to be carried out?
• First, check the job status in the source system via Environment -> Job Overview -> In the Source System. You may be asked to log on to the R/3 source system. After logging on, check that the pre-entered selections are relevant and then execute. This shows the exact status of the job; it should show “X” under Canceled.
• The job name generally starts with “BIREQU_” followed by a system-generated number.
• Once we have confirmed that the error was caused by the job cancellation, check the status of the ODS or cube on its Manage screen. The latest request should show QM status red.
• Because the job has been cancelled and is no longer active, the load must be re-triggered from BW.
• We first delete the Red request from the manage tab of the InfoProvider and then re-trigger the
InfoPackage.
• Monitor the load until it completes successfully, then complete any remaining loads in the process chain.
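Identifying the relevant job comes down to filtering the job overview on the “BIREQU_” prefix and the cancelled status, and the reload is only done once the latest request is red. A Python sketch over a hypothetical list of (job name, status) pairs as read from the job overview; the status strings are assumptions, not values returned by any SAP API.

```python
JOB_PREFIX = "BIREQU_"  # prefix of BW extraction jobs in the source system


def cancelled_extraction_jobs(jobs: list[tuple[str, str]]) -> list[str]:
    """Return the names of BIREQU_ jobs whose status is 'Canceled'.

    jobs: (job name, status) pairs copied from the job overview; the status
    strings ('Released', 'Active', 'Finished', 'Canceled', ...) are
    hypothetical labels.
    """
    return [name for name, status in jobs
            if name.startswith(JOB_PREFIX) and status == "Canceled"]


def should_reload(job_cancelled: bool, latest_qm_status: str) -> bool:
    """Delete the red request and re-trigger the InfoPackage only if the job
    was cancelled and the latest request shows QM status red."""
    return job_cancelled and latest_qm_status == "red"
```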


5 Incorrect data in PSA.
5.1 Why does the error occur?
• Sometimes the data coming into BW has an incorrect format, or a few records contain incorrect entries. For example, a value was expected in upper case but arrived in lower case, or a value was expected to be numeric but was alphanumeric.
• The load may be a flat file load or a load from R/3, but most often it is the flat file provided by the users that has the incorrect format.
5.2 What happens when this error occurs?
• The error message appears in the Job Overview and indicates what needs to be done.
• The message at the bottom of the “Header” tab of the Job Overview in RSMO reads “PSA Pflege” (German for “PSA maintenance”), which gives you a direct link to the PSA data.
5.3 What can be the possible actions to be carried out?
• Once the error is confirmed, check the “Details” tab of the Job Overview to see which record and field contain the error and what is wrong with the data.
• After confirming from “Extraction” in the “Details” tab that the data was completely extracted, we can see exactly which record and field hold the erroneous data. We can also compare the data with the PSA data of the previous successful load to check its validity.
• In the PSA, the erroneous record is shown with a red traffic light. To change data in the PSA, the request must first be deleted from the Manage screen of the InfoProvider; only then does the system allow the PSA data to be changed.
• After correcting the field entry in the PSA record, save it. Then reconstruct the same request from the Manage screen; the request must have QM status “Green” before it can be reconstructed.
• This updates the records in the request again.
• Monitor the load until it completes successfully, then complete any remaining loads in the process chain.
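The two format problems named in 5.1 (lower case where upper case is expected, alphanumeric where a number is expected) can also be caught before a flat file is handed over for loading. A Python sketch under assumed conditions: the file is semicolon-separated with a header line, and `MATERIAL` and `QUANTITY` are hypothetical examples of an upper-case field and a numeric field.

```python
import csv
import io

# Hypothetical field rules: column name -> check every value must pass.
RULES = {
    "MATERIAL": lambda v: v == v.upper(),  # expected in upper case
    "QUANTITY": lambda v: v.isdigit(),     # expected numeric (digits only, a simplification)
}


def find_bad_records(text: str, delimiter: str = ";") -> list[tuple[int, str, str]]:
    """Return (record number, field, value) for every value that breaks a rule.

    Record 1 is the first data row after the header line.
    """
    errors = []
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    for recno, row in enumerate(reader, start=1):
        for field, check in RULES.items():
            value = row.get(field) or ""  # missing column or short row -> empty string
            if not check(value):
                errors.append((recno, field, value))
    return errors
```

An empty result means the file passes these two checks; any hit gives the record and field to correct before loading, instead of editing the PSA afterwards.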


6 ODS Activation Failed.
6.1 Why does the error occur?
• During a data load into an ODS, the data is sometimes extracted and loaded completely, but the ODS activation then fails with a status 9 error.
• Activation can also fail due to a lack of resources or because of an existing failed request in the ODS. For master data loads, an existing failed request is not a problem.
• It also happens when there are rollback segment errors in the Oracle database, which raise error ORA-00060. During activation, data is read from the active data table and then either inserted or updated; while this happens, system deadlocks occur and Oracle is unable to extend the extents.
6.2 What happens when this error occurs?
• The exact error message looks like “Request REQU_3ZGI6LEA5MSAHIROA4QUTCOP8, data package 000012 incorrect with status 9 in RSODSACTREQ”. It is sometimes accompanied by a “Communication error (RFC call) occurred” error, which is actually due to a system error.
• The message appears in the Job Overview in RSMO, or in the “Display Message” option of the process in the PC.
• The exact error message is “ODS Activation Failed”.
6.3 What can be the possible actions to be carried out?
• When this error occurs, the data may or may not have been loaded completely; only the activation fails. The job details show which data package failed during activation.
• Try once more to activate the ODS manually. Do not change the QM status here: it is green in the monitor but red in the data target, and it turns green once the data is activated.
• To activate the failed request, click the “Activate” button at the bottom. This opens another window listing only the requests that are not activated. Select the request, check the corresponding options at the bottom, and click “Start”.
• This schedules a background job that activates the selected request.
• Monitor the load until it completes successfully, then complete any remaining loads in the process chain.
• If this does not work, check the data package size specified in the InfoPackage under InfoPackage -> Scheduler -> DataS. Default Data Transfer, and reduce the maximum size of the data package there so that the activation can complete successfully.
• Once the data package size is reduced, re-trigger the load and reload the complete data.
• Before starting the manual activation, it is very important to check for an existing failed (red) request. If there is one, delete it first.
• The error occurs in the first place because, at that point in time, the system cannot run the activation via four parallel processes (this parameter is set in transaction RSCUSTA2). It is resolved later because the resources are free by then, so the activation completes successfully.
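Reducing the maximum package size does not reduce the volume of the load; it splits the same volume into more, smaller packages, which per the text are easier for the system to process during activation. A Python sketch of that arithmetic; the volume and sizes are illustrative only, and both arguments are assumed to use the same unit as the InfoPackage setting (the text does not state the unit).

```python
def package_count(total_size: int, max_package_size: int) -> int:
    """Number of data packages needed when no package may exceed max_package_size."""
    if max_package_size <= 0:
        raise ValueError("max_package_size must be positive")
    # Ceiling division with integers, so no rounding issues for large volumes.
    return (total_size + max_package_size - 1) // max_package_size


if __name__ == "__main__":
    total = 1_000_000  # illustrative volume of one load
    for max_size in (100_000, 50_000, 25_000):
        # Halving the maximum package size doubles the number of packages.
        print(max_size, package_count(total, max_size))
```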


7 Caller 70 is missing.
7.1 Why does the error occur?
• This error normally occurs when BW encounters an error that it cannot classify. There can be multiple reasons:
o When master data is loaded for the first time, the system creates SIDs. If it is unable to create SIDs for the records in the data package, this error can occur.
o If the indexes of the cube are not deleted before the load, the system may raise the Caller 70 error.
o When transactional data that has master data as one of its characteristics is loaded and the value does not exist in the master data table, this error occurs: the system can have difficulty creating SIDs for the master data while also loading the transactional data.
o If an ODS activation runs in parallel with another ODS activation, the system may classify the error as Caller 70 because no processes were free for that ODS activation.
o It also occurs when reads and writes happen at the same time on the active data table of an ODS. For example, if an ODS is being activated while data is also being loaded into the same ODS, the system may classify the error as Caller 70.
o It is a system error, visible under the “Status” tab in the Job Overview.
7.2 What happens when this error occurs?
• The exact error message is “System response "Caller 70" is missing”.
• The error may also log a short dump in the system, which can be checked at "Environment -> Short dump -> In the Data Warehouse".
7.3 What can be the possible actions to be carried out?
• If master data is being loaded for the first time, reduce the data package size and reload the InfoPackage, since processing sometimes depends on the data package size. We can also try to split the load into several smaller loads.
• If the error occurs in the cube load then we can try to delete the indexes of the cube and then reload
the data again.
• If transactional and master data are being loaded together and this error occurs, reduce the data package size and try reloading, since the system may have difficulty creating SIDs and loading data at the same time. Alternatively, load the master data first and then the transactional data.
• If the error occurs during ODS activation because no processes are free for it, define the processes for ODS activation in transaction RSCUSTA2.
• If the error is caused by simultaneous reads and writes on the ODS, change the schedule of the data loads so that they do not overlap with the activation.
• Once we are sure that the data has not been extracted completely, delete the red request from the Manage screen of the InfoProvider and re-trigger the InfoPackage.
• Monitor the load until it completes successfully, then complete any remaining loads in the process chain.
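One cause listed in 7.1 is transactional data whose characteristic values do not yet exist in the master data table, so SIDs would have to be created during the transactional load. Such values can be spotted beforehand by comparing the two sets; loading the master data first removes them. A Python sketch with hypothetical in-memory data standing in for the master data table and the incoming records.

```python
def missing_master_values(transactions: list[dict], characteristic: str,
                          master_values: set[str]) -> set[str]:
    """Return values of `characteristic` that occur in the transactional records
    but not in the master data, i.e. values that still need SIDs."""
    return {row[characteristic] for row in transactions} - set(master_values)


if __name__ == "__main__":
    # Hypothetical example: customer C9 is not yet in the master data.
    tx = [{"0CUSTOMER": "C1", "AMOUNT": 10}, {"0CUSTOMER": "C9", "AMOUNT": 5}]
    print(missing_master_values(tx, "0CUSTOMER", {"C1", "C2"}))  # {'C9'}
```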


8 Attribute Change Run Failed – ALEREMOTE was locked.
8.1 Why does the error occur?
• During master data loads, a lock is sometimes set by the system user ALEREMOTE.
• This normally occurs when the hierarchy/attribute change run (HACR) is running for another master data (MD) load and the system tries to carry out HACR for the new MD as well. This is a scheduling problem.
8.2 What happens when this error occurs?
• The message appears in the Job Overview in RSMO, or in the “Display Message” option of the process in the PC.
• The exact error message looks like “User ALEREMOTE locked the load of master data for characteristic 0CUSTOMER”. This example is for the 0CUSTOMER load; the characteristic named depends on the master data InfoObject being loaded.
8.3 What can be the possible actions to be carried out?
• Read the complete error message and its long text, which tell you exactly which master data is locked by user ALEREMOTE.
• The lock was set because the timing of the load and the HACR clashed. First check RSA1 -> Tools -> HACR, which lists the InfoObjects for which HACR is currently running. Only after that run has finished, go to transaction SM12, which offers a few options and a couple of default entries. Listing the locks displays all locks that are set. Delete only the lock for the specific entry; otherwise a running load may fail because its lock was released.
• Select the lock that caused the failure and click Delete to release it. Take care not to delete the lock of an active running job. Preferably, avoid this solution altogether.
• When the HACR for the other master data finishes, trigger the attribute change run for this master data.
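The characteristic to look for in the HACR list and in SM12 can be read straight from the error message, and the preferred action (wait for the running HACR, then trigger the attribute change run) follows from it. A Python sketch, assuming the message has exactly the wording quoted in 8.2; other languages or release-specific wordings would need a different pattern, and the set of running HACRs is read by hand from RSA1 -> Tools -> HACR.

```python
import re
from typing import Optional

# Wording as quoted in 8.2; only the characteristic name varies.
LOCK_MSG = re.compile(
    r"User ALEREMOTE locked the load of master data for characteristic (\w+)"
)


def locked_characteristic(message: str) -> Optional[str]:
    """Return the InfoObject named in an ALEREMOTE lock message, or None."""
    match = LOCK_MSG.search(message)
    return match.group(1) if match else None


def next_step(message: str, hacr_running: set[str]) -> str:
    """Suggest the next step, preferring to wait over deleting locks in SM12."""
    characteristic = locked_characteristic(message)
    if characteristic is None:
        return "not an ALEREMOTE lock message"
    if hacr_running:
        return "wait: HACR still running for " + ", ".join(sorted(hacr_running))
    return f"trigger attribute change run for {characteristic}"
```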


9 SAP R/3 Extraction Job Failed.
Certain jobs in R/3 are triggered by events created there. These events are raised from SAP BW by an ABAP program attached to the process chain. The extract job also triggers an extract status job, which sends the status (success or failure) back to BW. It is therefore important that both the extract job and the extract status job complete: only when these jobs have completed in R/3 is the extraction triggered in R/3 by the InfoPackage from BW. Errors may occur in either the extract job or the extract status job.
9.1 What happens when this error occurs?
• The exact error message can normally be seen in the source system where the extraction runs. In BW, the ABAP program process in the PC fails.
• This process is placed before the InfoPackage, so if the extraction program in R/3 is still running, incomplete, or failed, the InfoPackage is not triggered. It is therefore very important to monitor such loads through RSPC rather than RSMO.
9.2 What can be the possible actions to be carried out?
• Log on to the source system and check transaction SM37, which shows the exact status of the job running in R/3.
• Enter the exact job name, user, and date, choose the relevant options, and execute. The list shows the active job with that name; you may also find another job scheduled for the next load, a cancelled job, or a previously finished job. The active job is the one currently running.
• If the job's “Delay (sec.)” is increasing instead of its “Duration (sec.)”, there is a problem with the extraction job: it is not running but delayed.
• Sometimes there is no active job, only a job in finished status with the current date and time.
• Both the extract job and the extract status job need to be checked, because the extract job may have finished while the extract status job failed and therefore did not send a success status to BW, even though the extraction was complete. In such cases, manually set the status of the extract program process in the PC in BW to green with the custom FM “ZRSPC_ABAP_FINISH”: execute it with the correct name of the program process variant and status “F”. This turns the process green and triggers the subsequent loads. Before doing this, check the PC logs in detail to make sure that no previous extract program process is still running or pending in the BW system.
• Monitor the PC until the loads complete successfully.
• If the ABAP process within the PC needs to be turned red so that the PC can be re-triggered, execute the FM “ZRSPC_ABAP_FINISH” with the specific variant and job status “R”, which turns the ABAP process red.
• This is usually needed when the extraction job was cancelled in R/3, another job is in released state, and the BW ABAP process is still yellow. We can then turn the ABAP process red via the FM and re-trigger the PC.
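The two situations in which the custom FM “ZRSPC_ABAP_FINISH” is used, plus the SM37 “delay instead of duration” symptom, can be written as decision functions. A Python sketch only: the status strings are hypothetical labels for what SM37 and RSPC show (a failed status job is modelled as "Canceled"), and the function returns the status to pass to the FM ('F' or 'R') or None when the chain should just be monitored. It does not call the FM.

```python
from typing import Optional


def is_waiting(delay_before: int, delay_after: int,
               duration_before: int, duration_after: int) -> bool:
    """Two SM37 readings of the same job: a growing delay with an unchanged
    duration means the job is not running but delayed."""
    return delay_after > delay_before and duration_after == duration_before


def abap_finish_status(extract_job: str, status_job: str,
                       released_job_exists: bool, abap_process: str,
                       previous_process_pending: bool) -> Optional[str]:
    """Decide which status, if any, to pass to ZRSPC_ABAP_FINISH.

    extract_job, status_job: SM37 status ("Active", "Finished", "Canceled", ...).
    abap_process: colour of the ABAP program process in the BW chain.
    previous_process_pending: True if the PC logs show an earlier extract
        program process still running or pending.
    """
    # Extraction finished but the status job failed, so BW never received
    # the success status: set the process green with 'F'.
    if (extract_job == "Finished" and status_job == "Canceled"
            and not previous_process_pending):
        return "F"
    # Extraction job cancelled, another job already released and the BW ABAP
    # process stuck in yellow: set it red with 'R', then re-trigger the PC.
    if extract_job == "Canceled" and released_job_exists and abap_process == "yellow":
        return "R"
    # Anything else (for example the job is still active): keep monitoring.
    return None
```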


10 File not found (System Command for file check failed).
10.1 Why does the error occur?
• The system command process is placed in the PC before the InfoPackage process, so it checks for the flat file on the application server before the InfoPackage is triggered. This ensures that a flat file is available to upload when the load starts.
• If the file is not available, the system command process fails and the InfoPackage is not triggered. It is therefore very important to monitor the PC through RSPC.
10.2 What happens when this error occurs?
• The system command process in the PC turns red, and the failed UNIX script returns a specific return code that indicates the failure.
10.3 What can be the possible actions to be carried out?
• A failed system command process indicates that the file is not present. Right-click the process and choose “Display Message” to see the failed script, then check its return code: if the exit status is -1, the script failed and the process turns red; otherwise it turns green in the PC.
• Check the script carefully for this exit status before concluding that the file really was not available.
• Once it is confirmed that the file is not available, take the appropriate action.
• Identify the person responsible for transferring the file by FTP to the application server. The error message in the process already sends a mail to that person, but we also need to send a mail about the issue.
• The Process Chains which are having the system command Process in them, and the corresponding
actions to be taken.
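The check performed by the system command process can be illustrated with a short Python equivalent. The real chain runs a UNIX script whose content is not shown here, so the existence test below is an assumption about what it checks; the functions follow the convention from 10.3 of returning -1 on failure and mapping that to a red process.

```python
import os


def file_check_status(path: str) -> int:
    """Return 0 if the flat file exists, -1 if it does not (the failure code from 10.3)."""
    return 0 if os.path.isfile(path) else -1


def process_colour(exit_status: int) -> str:
    """Colour of the system command process in the PC for a given exit status."""
    return "red" if exit_status == -1 else "green"
```

For example, `process_colour(file_check_status("/interface/in/sales.csv"))` (a hypothetical path) gives "red" until the file has been transferred to the application server.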


11 Tablespace Issue.

11.1 Why does the error occur?
• Often, particularly during HACR, the program needs a lot of temporary tablespace [PSAPTEMP] while realigning aggregates. If there is a large amount of data to process and Oracle is unable to extend the tablespace, it produces a dump.
• This normally happens when many aggregates are created on the same day, or when there is a large change in the incoming master data or hierarchy, so that a large amount of temporary space is needed for the realignment.
• Also, whenever PSAPODS (which houses many tables) is full, data loads and ODS activations stop, causing failures.
11.2 What happens when this error occurs?
• Errors ORA-01653 and ORA-01688 relate to tablespace problems; the ORA error indicates that the tablespace needs to be increased.
11.3 What can be the possible actions to be carried out?
• If the tablespace is full, contact the Basis team and ask for an increase in the size of the tablespace.
• The tablespace is increased by changing the parameters that allocate space, which are defined for individual tables.
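The Oracle errors mentioned in this brief point to different sections. A Python sketch that pulls the ORA number out of a message (also when it is written with spaces, as in “ORA - 01653”) and maps it to the matching section; the mapping covers only the three codes named in the text, and the messages passed in are whatever was copied from the short dump or job log.

```python
import re
from typing import Optional

ORA_PATTERN = re.compile(r"ORA\s*-\s*(\d{5})")

# Only the codes mentioned in this brief.
ORA_HINTS = {
    "00060": "deadlock during ODS activation (section 6)",
    "01653": "tablespace full, contact Basis (section 11)",
    "01688": "tablespace full, contact Basis (section 11)",
}


def ora_hint(message: str) -> Optional[str]:
    """Return a hint for the first ORA code in the message, or None if there is none."""
    match = ORA_PATTERN.search(message)
    if match is None:
        return None
    return ORA_HINTS.get(match.group(1), "other Oracle error (not covered here)")
```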


12 How is it possible to restart a process chain at a failed step/request?
Sometimes, it doesn't help to just set a request to green status in order to run the process chain from that
step on to the end.
You need to set the failed request/step to green in the database and also raise the event that makes the process chain run to the end from the next request/step on.
To do this, open the messages of the failed step by right-clicking it and selecting 'Display Messages'.
In the popup, click the 'Chain' tab.
In a parallel session, go to transaction SE16 for table RSPCPROCESSLOG and display the entries with the following selections:
1. copy the variant from the popup to the variante of table rspcprocesslog
2. copy the instance from the popup to the instance of table rspcprocesslog
3. copy the start date from the popup to the batchdate of table rspcprocesslog
Press F8 to display the entries of table rspcprocesslog.
Now open another session and go to transaction SE37. Enter RSPC_PROCESS_FINISH as the function module name and run the FM in test mode.
Copy the entries of table RSPCPROCESSLOG into the input parameters of the function module as follows:
1. rspcprocesslog-log_id -> i_logid
2. rspcprocesslog-type -> i_type
3. rspcprocesslog-variante -> i_variant
4. rspcprocesslog-instance -> i_instance
5. enter 'G' for parameter i_state (sets the status to green).
Now press F8 to run the FM.
The process is now set to green, the next process in the chain starts, and the chain can run to the end.
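The lookup and the field-to-parameter mapping in the steps above are mechanical, which is why they are easy to get wrong by hand. A Python sketch of both, assuming the log entries have already been copied from SE16 into dictionaries keyed by the RSPCPROCESSLOG field names used above; the values in the example are made up.

```python
from typing import Optional

# RSPCPROCESSLOG field -> RSPC_PROCESS_FINISH import parameter, as listed in the steps.
FIELD_TO_PARAM = {
    "log_id": "i_logid",
    "type": "i_type",
    "variante": "i_variant",
    "instance": "i_instance",
}


def select_log_entry(entries: list[dict], variant: str, instance: str,
                     batchdate: str) -> Optional[dict]:
    """Return one log entry matching the values from the 'Chain' tab, or None.

    Like the SELECT SINGLE in the report, only one entry is returned even if
    several rows qualify.
    """
    for entry in entries:
        if (entry["variante"], entry["instance"], entry["batchdate"]) == (variant, instance, batchdate):
            return entry
    return None


def finish_parameters(log_entry: dict, state: str = "G") -> dict:
    """Build the parameters for RSPC_PROCESS_FINISH; 'G' (green) is the default, as in step 5."""
    params = {param: log_entry[field] for field, param in FIELD_TO_PARAM.items()}
    params["i_state"] = state
    return params
```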
ABAP PROGRAM:
*&---------------------------------------------------------------------*
*& Report ZRSPC_PROCESS_FINISH *
*& *
*&---------------------------------------------------------------------*
************************************************************************
* Author: Jesper Christensen
* Date: Mar 22nd 2006
* Type: Executable Program
* Purpose/Description : Restart process chain after a failed request
*
************************************************************************
* MODIFICATION LOG
************************************************************************
* Date | Change Number | Initials | Description
************************************************************************
* 03/22/06 JMCHRIS Program created
*
*
************************************************************************
REPORT zrspc_process_finish.

* Selection screen: values from the 'Chain' tab of the failed step.
* Parameters are prefixed with P_ so that they cannot be confused with
* the RSPCPROCESSLOG columns of the same name in the WHERE clause
* (the original parameter INSTANCE had the same name as column INSTANCE).
PARAMETERS: p_vari  TYPE rspc_variant  OBLIGATORY,
            p_inst  TYPE rspc_instance OBLIGATORY,
            p_date  TYPE sy-datum      OBLIGATORY,
            p_state TYPE rspc_state    OBLIGATORY DEFAULT 'G'.

DATA: ls_pclog TYPE rspcprocesslog.

START-OF-SELECTION.
* Read the process log entry of the failed step
  SELECT SINGLE * FROM rspcprocesslog INTO ls_pclog
    WHERE variante  = p_vari
      AND instance  = p_inst
      AND batchdate = p_date.

  IF sy-subrc = 0.
*   Set the status of the process (default 'G' = green) so that the
*   following process in the chain is started
    CALL FUNCTION 'RSPC_PROCESS_FINISH'
      EXPORTING
        i_logid       = ls_pclog-log_id
        i_type        = ls_pclog-type
        i_variant     = ls_pclog-variante
        i_instance    = ls_pclog-instance
        i_state       = p_state
        i_batchdate   = ls_pclog-batchdate
      EXCEPTIONS
        error_message = 1.
    IF sy-subrc <> 0.
*     Show the message raised by the function module as information
      MESSAGE ID sy-msgid TYPE 'I' NUMBER sy-msgno
        WITH sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
    ENDIF.
  ELSE.
*   No log entry matches the selection
    MESSAGE e000(ybw_usr_mon) WITH
      'Process selected does not exist ' ' - Check your entry'.
  ENDIF.





