Monday, May 14, 2012

Creating Custom (Calculated) Members

If you have ever wanted to report on the total of two or more dimension members you have probably done so using in SQL using SUM ... GROUP BY. 

Let's look at an example using the OLAPTRAIN schema.  Suppose you want the total of the states in the northeastern United States.  In OLAPTRAIN, that would include Massachusetts, Rhode Island, New York and Pennsylvania..

If you did this with SQL, it might look something like this.

SELECT t.calendar_year_long_descr AS calendar_year,
  g.country_long_description      AS country,
  p.all_products_long_descri      AS all_products,
  c.all_channels_long_descri      AS all_channels,
  SUM(f.sales)
FROM time_calendar_view t,
  product_standard_view p,
  geography_regional_view g,
  channel_sales_channel_view c,
  sales_cube_view f
WHERE t.dim_key                 = f.time
AND p.dim_key                   = f.product
AND g.dim_key                   = f.geography
AND c.dim_key                   = f.channel
AND t.level_name                = 'CALENDAR_YEAR'
AND p.level_name                = 'ALL_PRODUCTS'
AND c.level_name                = 'ALL_CHANNELS'
AND g.level_name                = 'STATE_PROVINCE'
AND g.state_province_long_desc IN ('Massachusetts','Rhode Island','New York', 'Pennsylvania')
AND t.calendar_year_long_descr  = 'CY2009'
AND g.country_long_description  = 'United States'
GROUP BY t.calendar_year_long_descr,
  g.country_long_description,
  p.all_products_long_descri,
  c.all_channels_long_descri;

And the query would return as

CY2009    United States    All Products    All Channels    10828869.07

A different way to approach this is to create a custom member that is the aggregate of other members.  In this case, the custom member is added to the dimension and can be used just like any other dimension member.  The only real difference is that a custom member is not within a hierarchy and does not belong to a level.  The advantages are that the custom member is available to all users (unless you control access, more on that later), they work with all of the cube's aggregation rules (e.g., first, last, hierarchical weighted average and so on), they work seamlessly with calculated measures and they are available in all tools (e.g., Excel PivotTables).

Custom aggregates are created using the dbms_cube.import program.  Note that the dimension keys are numeric in OLAPTRAIN.  (Sorry for posting this sample as an image ... blogger wasn't happy about displaying XML.  To view the full example option the image in a new tab or window).




I can view custom members in the dimension or hierarchy view with a query such as this:

SELECT * FROM geography_view WHERE member_type = 'C';

I can query the cube view with a simplified query (note the lack of SUM ... GROUP BY).

SELECT t.long_description AS time,
  g.long_description      AS geog,
  p.long_description      AS product,
  c.long_description      AS channel,
  f.sales
FROM time_calendar_view t,
  product_standard_view p,
  geography_regional_view g,
  channel_sales_channel_view c,
  sales_cube_view f
WHERE t.dim_key                 = f.time
AND p.dim_key                   = f.product
AND g.dim_key                   = f.geography
AND c.dim_key                   = f.channel
AND t.level_name                = 'CALENDAR_YEAR'
AND p.level_name                = 'ALL_PRODUCTS'
AND c.level_name                = 'ALL_CHANNELS'
AND g.long_description          = 'Northeast US'
AND t.calendar_year_long_descr  = 'CY2009';

And the query returns Northeast US rather than a modified version of United States.

CY2009    Northeast US    All Products    All Channels    10828869.07

If it is appropriate for the custom members to be visable to all users, there's nothing else to do.  If you want to control access to custom member, Virtual Private Database is probably a good choice.  

If you would like a sample script that creates a few custom members and sets up a VPD policy to control access, send me an email at william.endress@oracle.com.







Friday, April 13, 2012

Fine Tuning Incremental Updates using LOAD PRUNE

If you are like most people, you probably use the LOAD_AND_AGGREGATE cube script that is automatically created by Analytic Workspace Manager. Fine tuning the update process simply involves filtering the fact table for new or changed rows.

With a little bit of effort, you can improve update times by writing your own cube processing script. You can also use MV log tables to automatically captured changes made to the fact table and use them as the data sources to cube updates.

AWM defines and makes the LOAD_AND_AGGREGATE script the default script of the cube. If you don’t specify a different script, LOAD_AND_AGGREGATE is automatically used as shown in the following example (note that the script references the OLAPTRAIN.SALES_CUBE but does not including the USING clause).

BEGIN
DBMS_CUBE.BUILD('OLAPTRAIN.SALES_CUBE','C',false,4,true,true,false);
END;
/

This script will run the LOAD PARALLEL and SOLVE PARALLEL commands. What this means is that for each partition, the database will LOAD data from the fact table/view and then SOLVE (aggregate) data. If you have specified a value for parallel that is greater than 1, partitions will be processed in parallel (in the example above, 4 processes). AWM also provides the ability to set the refresh method (C, or complete, in the above example).

LOAD_AND_AGGREGATE is a good choice for a full build, but it might not be the best choice for an incremental update. If you are simply updating the cube with changes within a few recent partitions (e.g., yesterday or this month), the LOAD PRUNE command is probably better than LOAD PARALLEL.

LOAD PRUNE will first query the fact table or view to first determine which partition will have new data using a SELECT DISTINCT. It will then only generate LOAD commands for those partitions that will have records loaded into them.
Let’s run through an update scenario. Make the following assumptions:

* The time dimension has months for 2008 through 2012 and the cube is partitioned by month. The cube will have 60 partitions.

* You have loaded data into the cube for January 2008 through March 2012.

* It’s now time to load data for April 2012. This data has been inserted into the fact table.

* You have mapped the cube to a view. For the April 2012 update, you have added a filter to the view so that it returns data only for April.

If you use the LOAD_AND_AGGREGATE script and choose the FAST SOLVE refresh method, the database will really to the following:

BEGIN
DBMS_CUBE.BUILD('OLAPTRAIN.SALES_CUBE USING (LOAD PARALLEL, SOLVE PARALLEL)','S',false,4,true,true,false);
END;
/

With LOAD PARALLEL, the database will process the LOAD command for each partition (all 60). Since it’s selecting from a view that’s filtered out all but April 2012, 59 partitions will have no new or changed data. Although it doesn’t take a long time to load 0 rows and figure out that a SOLVE is not required, it still adds up if there are a lot of partitions.

With LOAD PRUNE, the database will determine that a LOAD is only required for April 2012. The LOAD step is skipped for all other partitions. While you will still see the SOLVE for all partitions, it doesn’t really do any work because no rows were loaded into the partition. An example using LOAD PRUNE follows.

BEGIN
DBMS_CUBE.BUILD('OLAPTRAIN.SALES_CUBE USING (LOAD PRUNE, SOLVE PARALLEL)','S',false,2,true,true,false);
END;
/

If you would like a script that walked through a complete example using the OLAPTRAIN schema, including the use of an MV log table to automatically capture changes to the fact table, send me an email william.endress@oracle.com with a link to this posting.

Tuesday, April 3, 2012

Excel and OLAP: ODBC vs. MDX

A question that I often get is "what is the difference between using ODBC and the MDX Provider for Oracle OLAP (from Simba Technologies) to query Oracle cubes"? Given that the Oracle cube is easily queried with SQL, it's a reasonable question.
The answer really boils down to leveraging meta data and automatic query generation.
With ODBC, it's up to the Excel user to write a SQL query to fetch data from the cube. Data can be returned in tabular format or a pivot table. When the data is viewed in a pivot table Excel will aggregate data, sometimes with unexpected results. For example Excel might choose to aggregate a measure such as Sales with COUNT or might try to SUM a measure such as Sales YTD Percent Change. Neither make any sense. It's up to the user to get it right.
With the MDX Provider, Excel understands what all the columns mean. It understands dimensions, hierarchies and levels. It's understand the difference between a key and a label. It knows what a measure is. It allows the server to calculate the data. Query generation is automatic. Business users just choose hierarchies and measures and the MDX Provider does the rest.
Here's a list of some of the advantages of using the MDX Provider for Oracle OLAP as compared to using ODBC and writing your own SQL.


Monday, April 2, 2012

Oracle OLAP Exadata Performance Demonstration

For a great paper on Oracle OLAP running on Exadata, see:

http://www.oracle.com/technetwork/database/options/olap/olap-exadata-x2-2-performance-1429042.pdf

The Executive Overview section of this paper provides an introduction:

This paper describes a performance demonstration of the OLAP Option to the Oracle Database running on an X2-2 Exadata Database Machine half rack. It shows how Oracle OLAP cubes can be used to enhance the performance and analytic content of the data warehouse and business intelligence solutions, supporting a demanding user community with ultrafast query and rich analytic content.

The demonstration represents users of a business intelligence application using SQL to query an Oracle OLAP cube that has been enhanced with a variety of analytic measures. The cube contains data loaded from a fact table with more than 1 billion rows.

Utilizing Exadata features such as Smart Flash Cache, Oracle Database supported a community of 50 concurrent users querying the cube with queries that are typical of those executed from a business intelligence tool such as Oracle Business Intelligence Enterprise Edition.

With each user querying the database non-stop (without waits between queries) with median query times ranged from .03 to .58 seconds, average query times ranged from .26 to 2.32 seconds, and 95 percent of queries returned in 1.5 to 5.5 seconds, depending on the type of query.

Query performance can be attributed to highly optimized data types and Exadata Smart Flash Cache. Cubes are designed for fast access to random data points, using features such as array-based storage, cost-based aggregation, and joined cube scans. Exadata Smart Flash Cache contributes significantly to cube query performance, virtually eliminating IO wait for the high volume, random IO typically seen with cube queries.

Monday, January 17, 2011

Script for Time Dimension Table

Note - This blog post was updated on Nov. 14, 2012 with a new script.  This as been simplified a bit and includes half year.

One of the more common requests I get is a script for creating time dimension tables for Oracle OLAP. The following script will create a time dimension table for a standard calendar. It starts by creating a table with dimension members and labels. The second part of the script fills in end date and time span attributes. The section that creates end date and time span can be easily adapted for completing other calendars (e.g., fiscal) where the members have already been filled in.


--
-- Create time dimension table for a standard calendar year (day, month,
-- quarter, half year and year).
--
-- Drop table.
--
--DROP TABLE time_calendar_dim;
--
-- Create time dimension table for calendar year.
--
-- First day if the next day after TO_DATE('31/12/2010','DD/MM/YYYY').
-- Number of days is set in CONNECT BY level <= 365.
--
-- Values for end date and time span attributes are place holders. They need
-- to be filled in correctly later in this script.
--
CREATE TABLE time_calendar_dim AS
WITH base_calendar AS
  (SELECT CurrDate          AS Day_ID,
    1                       AS Day_Time_Span,
    CurrDate                AS Day_End_Date,
    TO_CHAR(CurrDate,'Day') AS Week_Day_Full,
    TO_CHAR(CurrDate,'DY')  AS Week_Day_Short,
    TO_NUMBER(TRIM(leading '0'
  FROM TO_CHAR(CurrDate,'D'))) AS Day_Num_of_Week,
    TO_NUMBER(TRIM(leading '0'
  FROM TO_CHAR(CurrDate,'DD'))) AS Day_Num_of_Month,
    TO_NUMBER(TRIM(leading '0'
  FROM TO_CHAR(CurrDate,'DDD'))) AS Day_Num_of_Year,
    UPPER(TO_CHAR(CurrDate,'Mon')
    || '-'
    || TO_CHAR(CurrDate,'YYYY')) AS Month_ID,
    TO_CHAR(CurrDate,'Mon')
    || ' '
    || TO_CHAR(CurrDate,'YYYY') AS Month_Short_Desc,
    RTRIM(TO_CHAR(CurrDate,'Month'))
    || ' '
    || TO_CHAR(CurrDate,'YYYY') AS Month_Long_Desc,
    TO_CHAR(CurrDate,'Mon')     AS Month_Short,
    TO_CHAR(CurrDate,'Month')   AS Month_Long,
    TO_NUMBER(TRIM(leading '0'
  FROM TO_CHAR(CurrDate,'MM'))) AS Month_Num_of_Year,
    'Q'
    || UPPER(TO_CHAR(CurrDate,'Q')
    || '-'
    || TO_CHAR(CurrDate,'YYYY'))     AS Quarter_ID,
    TO_NUMBER(TO_CHAR(CurrDate,'Q')) AS Quarter_Num_of_Year,
    CASE
      WHEN TO_NUMBER(TO_CHAR(CurrDate,'Q')) <= 2
      THEN 1
      ELSE 2
    END AS half_num_of_year,
    CASE
      WHEN TO_NUMBER(TO_CHAR(CurrDate,'Q')) <= 2
      THEN 'H'
        || 1
        || '-'
        || TO_CHAR(CurrDate,'YYYY')
      ELSE 'H'
        || 2
        || '-'
        || TO_CHAR(CurrDate,'YYYY')
    END                      AS half_of_year_id,
    TO_CHAR(CurrDate,'YYYY') AS Year_ID
  FROM
    (SELECT level n,
      -- Calendar starts at the day after this date.
      TO_DATE('31/12/2010','DD/MM/YYYY') + NUMTODSINTERVAL(level,'DAY') CurrDate
    FROM dual
      -- Change for the number of days to be added to the table.
      CONNECT BY level <= 365
    )
  )
SELECT day_id,
  day_time_span,
  day_end_date,
  week_day_full,
  week_day_short,
  day_num_of_week,
  day_num_of_month,
  day_num_of_year,
  month_id,
  COUNT(*) OVER (PARTITION BY month_id)    AS Month_Time_Span,
  MAX(day_id) OVER (PARTITION BY month_id) AS Month_End_Date,
  month_short_desc,
  month_long_desc,
  month_short,
  month_long,
  month_num_of_year,
  quarter_id,
  COUNT(*) OVER (PARTITION BY quarter_id)    AS Quarter_Time_Span,
  MAX(day_id) OVER (PARTITION BY quarter_id) AS Quarter_End_Date,
  quarter_num_of_year,
  half_num_of_year,
  half_of_year_id,
  COUNT(*) OVER (PARTITION BY half_of_year_id)    AS Half_Year_Time_Span,
  MAX(day_id) OVER (PARTITION BY half_of_year_id) AS Half_Year_End_Date,
  year_id,
  COUNT(*) OVER (PARTITION BY year_id)    AS Year_Time_Span,
  MAX(day_id) OVER (PARTITION BY year_id) AS Year_End_Date
FROM base_calendar
ORDER BY day_id;
);
--
COMMIT;

Thursday, December 2, 2010

Simba previews Cognos8 Analysis Studio accessing Oracle Database OLAP Option cubes

Hot on the heels of support for BusinessObjects Voyager, and in addition to the native Excel 2003/2007/2010 pivot table access, Simba are previewing the same connectivity for Cognos8 Analysis Studio - the dimensionally aware UI in the Cognos BI suite.

Together with the unique SQL access to the same multidimensional data & calculations in Oracle Database OLAP cubes (meaning that *any* tool or application capable of connecting to Oracle and issuing simple SQL can leverage the power of Database OLAP - like Oracle Application Express for example), plus the existing support for Oracle's own BI tools including
together with the big functionality and performance improvements in 11g , there is now every reason to move to Oracle Database 11gR2 and to fully exploit the OLAP Option - whatever your choice of front end tool(s).

For Cognos fans: Here is the Video on YouTube:


More information, see the Simba website : http://www.simba.com/MDX-Provider-for-Oracle-OLAP.htm

Tuesday, November 16, 2010

Simba previews Oracle OLAP MDX Provider connectivity to SAP BusinessObjects Voyager

Simba technologies have released a short video to preview 'Using MDX Provider for Oracle OLAP to directly connect SAP BusinessObjects Voyager to Oracle Database OLAP Option'

This will be a great capability for users of both Oracle OLAP and BusinessObjects and will futher extend the reach of Oracle database embedded OLAP cubes.





You can get more details on the Simba website