Friday, April 13, 2012

Fine Tuning Incremental Updates using LOAD PRUNE

If you are like most people, you probably use the LOAD_AND_AGGREGATE cube script that is automatically created by Analytic Workspace Manager. Fine tuning the update process simply involves filtering the fact table for new or changed rows.

With a little bit of effort, you can improve update times by writing your own cube processing script. You can also use MV log tables to automatically captured changes made to the fact table and use them as the data sources to cube updates.

AWM defines and makes the LOAD_AND_AGGREGATE script the default script of the cube. If you don’t specify a different script, LOAD_AND_AGGREGATE is automatically used as shown in the following example (note that the script references the OLAPTRAIN.SALES_CUBE but does not including the USING clause).

BEGIN
DBMS_CUBE.BUILD('OLAPTRAIN.SALES_CUBE','C',false,4,true,true,false);
END;
/

This script will run the LOAD PARALLEL and SOLVE PARALLEL commands. What this means is that for each partition, the database will LOAD data from the fact table/view and then SOLVE (aggregate) data. If you have specified a value for parallel that is greater than 1, partitions will be processed in parallel (in the example above, 4 processes). AWM also provides the ability to set the refresh method (C, or complete, in the above example).

LOAD_AND_AGGREGATE is a good choice for a full build, but it might not be the best choice for an incremental update. If you are simply updating the cube with changes within a few recent partitions (e.g., yesterday or this month), the LOAD PRUNE command is probably better than LOAD PARALLEL.

LOAD PRUNE will first query the fact table or view to first determine which partition will have new data using a SELECT DISTINCT. It will then only generate LOAD commands for those partitions that will have records loaded into them.
Let’s run through an update scenario. Make the following assumptions:

* The time dimension has months for 2008 through 2012 and the cube is partitioned by month. The cube will have 60 partitions.

* You have loaded data into the cube for January 2008 through March 2012.

* It’s now time to load data for April 2012. This data has been inserted into the fact table.

* You have mapped the cube to a view. For the April 2012 update, you have added a filter to the view so that it returns data only for April.

If you use the LOAD_AND_AGGREGATE script and choose the FAST SOLVE refresh method, the database will really to the following:

BEGIN
DBMS_CUBE.BUILD('OLAPTRAIN.SALES_CUBE USING (LOAD PARALLEL, SOLVE PARALLEL)','S',false,4,true,true,false);
END;
/

With LOAD PARALLEL, the database will process the LOAD command for each partition (all 60). Since it’s selecting from a view that’s filtered out all but April 2012, 59 partitions will have no new or changed data. Although it doesn’t take a long time to load 0 rows and figure out that a SOLVE is not required, it still adds up if there are a lot of partitions.

With LOAD PRUNE, the database will determine that a LOAD is only required for April 2012. The LOAD step is skipped for all other partitions. While you will still see the SOLVE for all partitions, it doesn’t really do any work because no rows were loaded into the partition. An example using LOAD PRUNE follows.

BEGIN
DBMS_CUBE.BUILD('OLAPTRAIN.SALES_CUBE USING (LOAD PRUNE, SOLVE PARALLEL)','S',false,2,true,true,false);
END;
/

If you would like a script that walked through a complete example using the OLAPTRAIN schema, including the use of an MV log table to automatically capture changes to the fact table, send me an email william.endress@oracle.com with a link to this posting.

3 comments:

Eddie said...

Hi Bud,

I have a cube with information of 2010,2011 and 2012.

I tried to implement LOAD PRUNE option, it works for new added records but doesn't update data that already exists in the cube,

I'm mapping a view filtered by month

This is the script:

BEGIN
DBMS_CUBE.BUILD('FINANCIAL.GL_CUBE USING (LOAD PRUNE, SOLVE PARALLEL)','S',false,8,true,true,false);
END;

Is it correct?

Thanks in advance

Bud Endress said...

LOAD PRUNE will do a select distinct on the column representing the partitioned dimension. If there are rows for the partition, the partition will be processed. If there are no rows in the fact table/view for that partition, that partition will not be processed.

Look to see that the updated data is not filtered out in your view.

Bud Endress said...

LOAD PRUNE will do a select distinct on the column representing the partitioned dimension. If there are rows for the partition, the partition will be processed. If there are no rows in the fact table/view for that partition, that partition will not be processed.

Look to see that the updated data is not filtered out in your view.