Friday, March 7, 2008

OLAP Workshop 7 : Creating Calculated Measures

Most data warehouses contain a lot of data but also conversely contain very little information. Many DW teams are content to publish basic facts to their user communities and then leave those communities to fend for themselves in turning data into information. For example, it is not uncommon to see basic measures such as sales, costs, and stock in many data warehouse schemas, but in reality these types of measures are completely useless in themselves. Business users are not interested per se in the value of sales today and this can be seen in the general press as well when reporting key trading periods such as Thanksgiving in the US and Christmas in Europe. Both business and financial communities are both more interested in sales compared to the same time last year, rate of growth of sales, sales compared to forecast. In other words most people are actually interested in calculated measures such as ratios and percentages derived from the base data.

Therefore, the key question for many data warehouse teams is how to create and manage these types of calculations?

In this workshop we will explore some the calculated measures that can be quickly and easily created in your analytic workspace (AW) to enrich its analytic content for end users.

One of the powerful features of the Oracle OLAP technology is the ability to efficiently and easily create business calculations. Oracle OLAP contains a powerful calculation engine that allows you to extend the analytic content of your AW by adding into it some useful business calculations as calculated measures. Some of these business calculations are simple and some of are a lot more involved. However, none are complex from an end-user perspective, although many of them are challenging to traditional relational-only databases. This is especially true when the calculations are numerous, and when many of the queries are ad hoc and unpredictable in nature.

Calculated measures are, as the name suggests, calculated from other measures available in the AW. They are implemented as formulas in the AW; that is, their definition is saved, but no calculated data is stored. The calculations happen at run time when a query requires it. Calculated measures are derived from the contents of other measures, including stored measures as well as measures that are calculated at run time. The calculated measures that you define in the AW are indistinguishable to end users from the stored measures into which data has been loaded and stored in the AW. All measures, according to the dimensional model presented to the end user, are identical. This promotes ease of use by end users.

There is generally a trade-off between precomputing and storing measures in the AW versus calculating them at query time. However, Oracle AWs are very efficient at preserving query performance at very fast levels, even when there are many calculated measures that are resolved dynamically. It is not uncommon for Oracle OLAP customers to implement multidimensional cubes with many hundreds or even thousands of calculated measures and key performance indicators (KPIs), which are calculated at query time from a relatively small number of physically stored and aggregated (or partially aggregated) measures. It is a striking characteristic of AWs in the Oracle database that query performance generally remains consistent even as data volumes and calculation complexity increase.

So how do calculated measures works and what happens when the dimensionality of the source measures does not match exactly?

In the example below a measure called Revenue is a calculated measure based on two other measures: quantity and price. The calculation itself is simple: quantity × price. Notice that the resulting dimensionality of Revenue is inherited from the two measures involved in the calculation. When you use measures with different dimensionality in a calculation, the result always contains the superset of the dimensions of the base measures. The multidimensional data model handles this automatically. You do not have to worry about the possibility that different measures in your AW have different shapes or dimensionality. You specify the calculation rule in the wizard, and the engine automatically resolves the dimensionality. One obvious requirement is that one dimension must be in common for the result to make sense.

In this example, Quantity is dimensioned by Time, Product, and Customer but Unit Price is not dimensioned by Customer. When Oracle OLAP is asked to calculate quantity × price, it uses its knowledge of the dimensional model to automatically handle the calculation of Revenue for all customers, even though there is no separate price stored for each customer. If there is not a separate price for each customer, then there must be a single price for all customers. Price does not vary by customer. As Oracle OLAP performs the calculation quantity × price, it applies the appropriate price for the particular product and time dimension intersections being calculated.

There are two methods for creating a calculated measure
  • Wizard and template method
  • Free format
Using the Calculation Wizard
By default both AWM and OWB provide a calculation wizard to help define the most common types of business calculaltions. There are four categories of calculations:
  • Basic
  • Advanced
  • Prior/Future comparison
  • Time Frame
The image below shows the wizard screen and the list of templates within each category (Note there are some changes with AWM 11g).

Creating a Share Calculation
The Share template prompts you for the components you need to specify the calculation:
Share Of: A measure or calculated measure that is dimensioned by the Product dimension (in this example)

  • For: The dimension for which the share is to be calculated
  • In: The hierarchy to be used while calculating the share for the selected dimension
  • As a Percent of: The dimension member to be used as a baseline to calculate the share. Select one of the following choices:
    • Total: Specifies that the baseline consists of the total of all items on the level that is associated with the current member (that is, the item for which the share is being calculated). This option is disabled for a dimension that has no hierarchies.
    • Parent: Specifies that the baseline consists of the total on the level of the parent for the current member (that is, the item for which the share is being calculated). This option is disabled for a dimension that has no hierarchies.
    • Level: Specifies that the baseline consists of the total of a level to be specified. Choosing this item requires the selection of a value in the associated drop-down list. This list displays the names of levels from the selected hierarchy for the selected dimension that are available for calculating the share. This option is disabled for a dimension that has no hierarchies.
    • Member: Specifies that the baseline consists of the total for a dimension member to be specified. Choosing this item requires the selection of a value in the associated drop-down list. This list displays the names of the dimension members that are available for calculating the share. This type of calculation applies to measures only.
Note that although the most common use of this template is to express the share as a “% of Total” or as a “% of Parent” in the chosen hierarchy, a specific member can be used as the baseline of the calculation. This is useful if you want to compare members of the dimension in question to a specific benchmark or model member, such as an established market leading product, flagship store, or key competitor.

So, as can be seen with this share calculation template it may be necessary to create multiple calculated measures using the same template to provide different results, such as shown below:

This report is showing the Budget Profit base measure and three Share calculations: Share of Product Total, Share of Product Level, and Share of a Product Member (Hardware). Note how the “% of Total,” “% of Level,” and “% of HW ” (hardware) category measures behave differently as the user drills down the product hierarchy. Also note that the third share measure on the report is base-lined to a specific product: the Hardware category. The report shows how Hardware compares to the other categories: Electronics, Peripherals and Accessories, Photo, and Software/Other.

Creating a % Different Prior Period Calculation
Using the “Percent Difference from Prior Period” calculated measure template, you can create a calculated measure that is useful to indicate growth or decline of a business over time. This calculation template is found in the Prior/Future Time Period calculation type folder. This template accepts input for the following items to calculate the percentage difference from a prior period:
  • Measure: Select a measure or a dimension member for which you want to calculate the percentage difference from the prior period.
  • Over: If there is more than one time dimension, then a box appears to enable the selection of the proper Time dimension. Otherwise, the default Time dimension is used.
  • In: Select the hierarchy for the specified dimension.
  • From: Choose one of the following items to indicate the previous time period that the comparison is to be based on:
    • Year ago: Use if your measure is to compare performance with the same time period from the previous year
    • Period ago: Use if your measure is to compare performance with the previous period at the same level in the Time hierarchy
    • Number of periods or years ago: Use if your measure is to calculate a comparison with a time period of a specified number (entered in the number box) of periods ago, at a particular level (such as Year, Quarter, or Month)

In a report this would look like this:

This report contains calculations of a number of alternative percentage differences from prior periods. All the measures automatically handle the situation in which the user needs to drill down into the time dimension and look at time periods at different levels. A single calculated measure in the AW can be used at any level of time, by any query tool, including SQL tools.

Note the following:
  • The Last Year calculation works at all levels of time, and compares each time period with the same time period 1 year ago.
  • The Last Period calculation works at all levels of time, and compares each time period with the previous period at the same level.
  • The 3 Months Ago calculation works at the appropriate levels of time (in this case, Month and Quarter because a quarter is made up of three months), and compares each time period with the same time period 3 months ago (which is equivalent to one quarter ago).
  • Similar calculations can be easily generated for Costs, Quantity, Profit, and Budget measures.

Creating a Moving Average Calculation
The Moving Average calculated measure template enables you to create moving averages over any of the measures in your AW. Moving averages are very useful when you analyze volatile data, because they smooth out the peaks and troughs and enable you to more easily visualize the trend in data. In the Moving Average template, you are asked to provide the following input:
  • Measure: Select the measure for which you want to calculate a moving average.
  • Over Time In: If there is more than one time dimension, then a box appears to enable the selection of the proper time dimension. Otherwise, the default time dimension is used. In identifies the hierarchy for the specified dimension.
  • Include Previous: Enter the number of periods to be used for the calculation.
  • An example of this calculation is as follows:
  • Moving average of sales for the last three months = (Jan sales + Feb sales + March sales) / 3
Note: Similar pages are used for Moving Totals, Moving Maximums, and Moving Minimums.

Below is a combination graph showing how moving averages can be a useful way of smoothing out volatile data, thus enabling you to see the trends in data more easily.
One line is a moving six-month average, and the other line is a three-month average.

Modifying a Calculated Measure
Existing calculated measures can be edited from within AWM 10g. The descriptions and the calculation details can be changed. To change a calculated measure, click the calculated measure in the Model view. You see the general information displayed on the right. You can:
  1. Make changes to labels and description. You can change the labels and description, but not the name.
  2. Click the Launch Calculation Editor button to change the details of the calculated measure. You can change the details, but not the type, of the measure.

Managing Calculated Measures
My recommendation is always to create your calculated measures in a separate cube. This helps insulate you from changes to the physical implementation of your base cubes. For example, if you want to change the storage definition for a cube AWM forces you to delete the cube, which if it contains calculated measures means these are also delete and need to be recreated. By keeping your calculated measures in a separate cube it is possible to delete and a rebuild a cube without impacting the calculated measures. Assuming of course you do not change the dimensionality.

There are other ways to resolve this issue (deleting a cube but keeping the calculated measures):
  • Save the calculated measure to an XML template
  • Hack the XML definition for the cube
Saving the calculated measures to an XML file is always a good idea since this creates a backup of the definition. However, you can only save one measure at a time which is fine if you create the XML template when you define the calculated measure but not so good if you have lots and lots of calculated measures in a cube and then you decide to save them all to XML templates.

Hacking the XML is not something I would normally recommend, however, it is possible to move calculated measures from one template file to another using notepad. Again, assuming you do not change the dimensionality (as some measures may refer to specific dimensions and/or levels and/or hierarchies) you can cut & paste the XML. The calculated measures are all defined in the last but one block of the XML definition, using the tag “DerivedMeasure”. Simply copy all the DerivedMeasure blocks to your new cube XML template and reload that template to restore all your calculated measures. This works for 10gR2 but has not been tested with 11g.

Creating custom calculated Measures
Oracle OLAP Option has a very powerful calculation engine that supports a huge library of functions:
  • Numeric Functions
  • Time Series Functions
  • Text Functions
  • Financial Functions
  • Statistical Functions
  • Date and Time Functions
  • Aggregation Functions
  • Data Type Conversion Functions
Any these functions can be used to create a custom calculated measure. To get more information on these various functions you can refer to the Oracle OLAP DML Reference 10g Release 2 documentation:

and for 11g:

Here is a very simple example of how to create a custom calculated measure. Lets create a measure to show the percent variance for a measure, sales revenue, based on the prior-period. To do this we need to use the lagpct function. The LAGPCT function returns the percentage difference between the value of a dimensioned variable or expression at a specified offset of a dimension prior to the current value of that dimension and the current value of the dimensioned variable or expression.

The syntax for the function is :
  • LAGPCT(variable, n, [dimension], [STATUS|NOSTATUS|limit-clause] )
  • Variable - A variable or expression that is dimensioned by dimension.
  • ‘n’ - The offset (that is, the number of dimension values) to lag. LAGPCT uses this value to determine the number of values that LAGPCT should go back in dimension to retrieve the value of variable. Typically, n is a positive INTEGER that indicates the number of time periods (or dimension values) before the current one. When you specify a negative value for n, it indicates the number of time periods after the current one. In this case, LAGPCT compares the current value of the time series with a subsequent value.
  • Dimension - The dimension along which the lag occurs. While this can be any dimension, it is typically a hierarchical time dimension of type TEXT that is limited to a single level (for example, the month or year level) or a dimension with a type of DAY, WEEK, MONTH, QUARTER or YEAR. When variable has a dimension with a type of DAY, WEEK, MONTH, QUARTER, or YEAR and you want LAGPCT to use that dimension, you can omit the dimension argument.
  • Status can be one of the following:
    • STATUS - Specifies that LAGPCT should use the current status list (that is, only the dimension values currently in status in their current status order) when computing the lag.
    • NOSTATUS - (Default) Specifies that LAGPCT should use the default status (that is, a list all the dimension values in their original order) when computing the lag.
    • limit-clause - Specifies that LAGPCT should use the default status limited by limit-clause when computing the lag. You can use any valid LIMIT clause (see the entry for the LIMIT command for further information). To specify that LAGPCT should use the current status limited by limit-clause when computing the lag, specify a LIMIT function for limit-clause.
Based on this syntax, the format of our function would be as follows:
  • lagpct(sales_revenue, 1, TIME, LEVELREL TIME_LEVELREL)
since sales_revenue is the variable, we need to offset by one period to get the prior period, the dimension for the lag is Time and the limit clause is based on the standard form level object TIME_LEVELREL which ensures the correct prior period is selected based on the level of the dimension member, so months are only compared to months and quarters only compared to quarters and so on.

How do you install a Custom Calculation?
There are two ways to add a custom calculation to a cube:
  • Special XML Template
  • Excel Utility
It is possible to use an XML template file to define a custom calculated measure. As we noted in the XML definition of a cube containing a calculated measure the tag “DerivedMeasure” is used to denote a calculated measure. The template needs to have the following fields:
  • Name
  • LongName
  • ShortName
  • PluralName
  • Id
  • DataType
  • IsInternal
  • UseGlobalIndex
  • ForceCalc
  • ForceOrder
  • SparseType
  • AutoSolve
  • IsValid
  • ExpressionText
So for our example the following entries would be required:

  • Name="SR_PPV_PCT"
  • LongName="Sales Revenue Prior Period % Variance"
  • ShortName=" Sales Rev Prior Period % Var"
  • PluralName=" Sales Revenue Prior Period % Variance"
  • DataType="Decimal"
  • isInternal="false"
  • UseGlobalIndex="false"
  • ForceCalc="false"
  • ForceOrder="false"
  • SparseType="STANDARD"
  • AutoSolve="DEFAULT"
  • IsValid="true"
  • ExpressionText=" lagpct(sales_revenue, 1, TIME, LEVELREL TIME_LEVELREL)"/>

Note the following:

  • Name is can must match the “custom_calculated_measure_name” value in the Id tag.
  • Id is derived as follows:
    • Cube_name.custom_calculated_measure_name.MEASURE
  • ExpressionText can refer either to the AWM Object View names or the physical objects from the ModelView. It can be more efficient to refer directly to the stored variables rather than using the standard form objects since this involves and additional layer of processing that is not always necessary. But start by referring to the standard form objects and check query performance before pointing directly to the base storage objects.
Fortunately, there is a much easier way to install a custom calculated measure. On OTN there is an Excel utility to that can help. See the link on the OLAP OTN Home Page, “Creating OLAP Calculations using Excel”:

Follow the instructions in the readme file and then open the spreadsheet included in the zip. This utility can be used to install both custom and standard calculations (those generated by the Calculated Measure Wizard), which makes installing calculations into a cube a quick and simple exercise. However, you do need to understand how the underlying functions are implemented as some of the templates require you to provide inputs such as “offset”, “start”, “stop” and “step”. Now the example worksheet provided does include examples for each of the types of templates, which makes it much easier to understand the values required for some of these templates. Using Excel is a good way to back up all your calculation definitions and makes it very easy to install the calculations into different environments, such as test, training,QA, production etc.

To use this utility follow these steps:

Step 1: Define your connection

This should match the details you set in AWM to connect to your analytic workspace.

Step 2: Select an AW

Once the connection is established the next stage is to select an AW. Each user is not limited to owning and/or using just one AW. In most implementations an OLAP user may have access to multiple AWs. Therefore, it is important to select the required AW before creating any calculations. A pulldown list of available AWs is provided just below the “Connection Details” button.

Step 3: Defining the calculation type
This utility will allow you to create both custom and pre-defined calculations. The column headed “Calculation Type” can be toggled between two values:
  • Template
  • Equation

The “Template” option will install one of the calculations from the AWM Calc Wizard and the “Equation” option allows you to define a free format equation.

Step 4: Basic details.
Each measure needs to have a name (which is the physical storage name for the measure so it cannot contain spaces or certain characters such as %, $, £ etc.), long label a short label and be assigned to a specific cube.

A pulldown list can be used to select the target cube. The next column to the right allows you to assign the measure to a measure folder.

Step 4a: Template Calculations
If you are defining a calculation based on a template, a pulldown list of available templates is available in the column marked “Calculation Template”

At this point it is a good idea to refer to the sample worksheet as this show you how to complete the additional columns to the right that manage the arguments for the templates:

The inputs are:
  • Base measure – a pulldown list of all available measures is provided
  • Dimension – the base dimension, which for most of the templates tends to be Time (pulldown list is available)
  • Hierarchy – the main hierarchy from the time dimension (pulldown list is available)
  • Level – the target level from the time dimension (pulldown list is available)
  • Other numeric arguments determined by the type of template

All this information is taken from the Calculation Wizard so if you want to check your inputs simply run the calculation wizard in AWM and note the inputs for the specific template.

Step 4b: Equation Templates:
To create a custom calculation set the calculation type to “Equation” and then in the “Free Form Equation” column enter the formula using either the standard form object names or the physical storage object names. The equations can be one of three basic data types:
  • Decimal
  • Integer
  • Text
In the example below (taken from the sample spreadsheet) two calculated measures are defined, one decimal and one text:

Measure name = PROFIT
Data Type = DECIMAL

Measure Name = HOW_IS_MARGIN
Data Type = TEXT

Step 5: Installing the Calculated Measures
Once all your calculations are defined, simply on the “Define Calculations” button.

This will launch a command window where the OLAP AW XML Java API is used to load the calculated measures defined in the worksheet into the target AW. During the installation process, feedback is written to the command window. An errors will be visible in this window and need to be resolved before trying the installation process again. This utility will overwrite an existing calculated measure so updating an AW is quick and easy since there is no need to first delete any existing calculated measures.

Once the measures have been deployed I would recommend starting AWM and checking all the calculated measures were correctly installed and do in fact return data. Sometimes it is easier to do this via the OLAP Worksheet, especially checking the data, since you can limit the various dimensions to a nice small subset of the data.

It is possible for a calculated measure to be installed and visible in AWM but not physically present. Which seems a little odd. This usually implies an issue with the naming convention, which allowed the object to be added to the metadata catalog, but the physical name generated an error for some reason. Easiest solution is to delete the calculated measure using AWM and try again after checking the name in the Excel worksheet.

1 comment:

deeps said...

Can you just explain how can you find the moving average in Oracle 11g. Because there is no option of selecting period to include in moving average in Oracle 11g. Can u please explain it.