Monday, January 21, 2008

OLAP Workshop 5 : Building Cubes

In the last series of Workshops, we started to look at building the dimensions to support our data model. Each dimension contained levels and a hierarchy. The purpose of a hierarchy is to provide the relationships for summarization of measures in the cube and to make navigating multiple levels of data easy and intuitive for the end user. The next stage is to start building cubes.

Creating Cubes

What Are Cubes?

Cubes are containers of measures (facts). They simply provide a convenient way of collecting up measures with the same dimensions. Therefore, all measures in a cube are candidates for being processed together at all stages: data loading, aggregation, and storage. Cubes are only visible to the cube builder (end users only see the measures they contain) and simplify the setup and maintenance of measures in AWM.

Creating Cubes
To create a cube, right-click the Cubes node in the navigator, and then select the Create Cube option.

Note: You can also create a cube from a cube template if you have a template available.

The Create Cube window appears, as shown below:

The Create Cube wizard provides a tabbed page interface that enables you to specify the logical model and processing options for a cube. The best way to use this wizard is to always work from left to right across the various tabs.

General Tabbed Page
On the General tabbed page, enter the basic information about the cube:
  • Provide the cube with a distinct name and provide the short and long label descriptions. Note – the name of the cube cannot be changed once the cube has been created.
  • Identify the dimensionality of the cube by using the arrow keys to move dimensions from the Available Dimension list to the Selected Dimension list. After you define the dimensionality, all measures that you create based on this cube will have the same dimensionality. Note – the dimensionality of the cube cannot be changed once the cube has been created.
Remember that Oracle OLAP supports cubes of different dimensionality. Therefore, you do not need to select all the dimensions listed in the panel marked ‘Available Dimension’.

The tick box “Use Default Aggregation Plan for Cube Aggregation” allows you to shortcut the process of creating of measures by applying the settings defined at the cube level to all measures within the cube. As we will see later, defining measures is an almost identical process as defining cube.

Translations Tabbed Page
Enables you to provide long and short descriptions for the cube in each language that the AW supports. Although there are other tabs within the cube wizard, at this point it is possible to ignore all the other tabs and allow the AWM to default all the other features.

Adding Measures to a Cube
Base measures store the facts collected about your business. Dimensions logically organize the edges of a measure, and the body of the measure contains data values. Each measure belongs to a particular cube, and by default all the settings for a measure (such as dimensions) are inherited from the cube.

Right clicking on the Measures node in the navigator can create a measure. Next select the Create Measure option.

This will then launch the wizard to create the measure:

General Tabbed Page
On the General tabbed page, you create a name and add label information. Long labels are used by most OLAP clients for display. If you do not specify a value for the long label, then it defaults to the measure’s name. Once the measure is defined you cannot change the name of the measure. If you delete a measure all the data associated with that measure is lost.

Other Tabbed Pages
The Translations tabbed page enables you to provide long and short descriptions for the measure in each language that the AW supports. The other tabbed pages (Implementation Details, Rules, and so on) enable the selection of certain measure-specific processing options other than the settings that are applied by the definition of the cube. These tabbed pages are examined in the following workshop.

At this point it possible to simply create the measure and allow AWM to default all the other settings.

Loading Data into a Cube.
After creating logical objects, you can map them to relational data sources in the Oracle database. Afterward, you can load data into your analytic workspace by using the Maintain Analytic Workspace Wizard.

Step 1 – Mapping Data Sources
To map your measures to a data source, perform these steps:

1. In the navigator, choose Mappings for the cube that contains the measure that you want to map. A list of schemas appears. Find the schema to which you want to map your measure, and then click the + button.

2. Select either Tables or Views, depending on what you are mapping to.

3. Find the table or view name and double-click, or drag it to the mapping canvas. When on the canvas, the structure of the table is visible.

Note: If you want to see the data in the table or view, right-click the name of the table or view, and then select the View Data option.

My recommendation is never to map directly to a fact table. Always use a view as this allows to you to fine tune the load process. For example by using a view you can select to load a single time period, which can be useful when you are trying to manage some of the more advanced settings and you are using an iterative development approach. As you will see later, using a view can make the data take stage (i.e. the initial build of the cube) easier to plan and manage.

4. Drag the cursor from the column name in the relational source to the destination object name in the measure. The image below shows a completed mapping.

Note: The mapping canvas enables you to map the contents of the source data to any level of dimensions. Here, because Budgets are set by product, and by channel for each month, map them to the Month, Product and Channel levels. In the next lesson there is advice techniques for managing situations where source data for different cubes and measures is loaded at different leaf levels of detail.

Step 2 - Loading Data into the Cube
AWM contains a data maintenance wizard to help you create a job to load data into your cubes. The job both loads and aggregates the data within the cube as a single job. You can load:
  • All mapped objects in the analytic workspace
  • All mapped measures in a cube including the dimensions
  • All mapped measures in a cube excluding the dimensions
  • Individually mapped measures
To load data, right-click the desired object name into which you are loading data, and then select the Maintain option.

In this screenshot, the Budgets cube is maintained. This results in the loading of data for all the dimensions that organize the cube and all the mapped measures associated with the cube.

The Maintenance Wizard takes you through a set of steps to load data from the mapped relational objects to the multidimensional objects in the AW.

Step 1 of the wizard, you identify the objects for which data is to be loaded. If you choose cubes, all the measures for the cube are selected. Alternatively, you can choose a specific measure of a cube. After a measure or cube is selected, the associated dimensions are automatically selected as well. AWM, by default, selects the related dimensions for the cube. This is because AWM is dimensionally aware, and knows that the dimensions must exist and be populated in order for measures to be loaded (the dimensions organize the measures physically not just logically in an AW, so they must be maintained before the measure data can be loaded).

Note – My personal preference is not to maintain dimensions at the same time as processing the cube. This goes back to the old days of Express Server when it was best practice to load dimensions first and then load data as a separate job. The reason for this two-step process was to ensure efficient storage of a measure. With the OLAP Option I am not sure if this should still be considered best practice but old habits die-hard.

From this screen it possible to simply click on the “Finish” button and the job will run immediately. Alternatively you can step through the two other screens to set some additional processing options:

Step 2 allows you to determine how previously load data should be managed as well as new data. For the moment, simply ignore this screen, all will be explained in the next workshop.

Step 3 allows you to determine when to run the job. For the moment simply use the default option to run the job immediately. Again, the other options will be explained in the next workshop.

After the loading of data is completed, you can view the report which is shown below (this is the 10g report, the 11g report provides a lot more detail):

After successful completion, the data in your AW is ready to be analyzed.

Note - All the maintenance logging goes into the XML_LOAD_LOG table (for 10g, with 11g there have been some changes which will be explained in a later post), which belongs to the OLAPSYS user. This table can be reviewed later, if required. There is a lot of information in this log, but some of it can be hidden. Always make sure ALL your records were correctly loaded. The log file will tell you if any were rejected, but unfortunately it will not tell you why or which records. The usual reasons are:
  • missing dimension members
  • invalid data due to data type errors

Viewing the Results
After data is loaded, you can preview it by using the Data Viewer. To see the data, right-click the name of the measure or cube that you want to view, and then select View Data from the submenu.

A tabular report appears. If you view a cube, all measures in the cube are displayed. In the Data Viewer, you can:
  • Drill up or down on the dimension values
  • Pivot or rotate the view of the data by dragging the edges (rows, columns, and pages) to new positions
  • Use the query builder to slice and dice the data
This basic crosstab control is used extensively in Oracle Business Intelligence tools, including OracleBI Beans, Discoverer Plus OLAP, and administrative tools such as AWM and Oracle Warehouse Builder. Also, third-party tools and applications sold by Oracle partners that use the OracleBI Beans technology use this same user interface.

Note - When you are developing an analytic workspace always check your data after it has loaded. Do not just assume the data is correct. It always good practice to go back to the fact table and make sure the totals from the source data match the totals in the OLAP cube.

In end-user tools and applications, more functionality (such as formatting, colour coding, and cell actions) is enabled in Discoverer Plus OLAP, as you see in the lesson titled “Building Analytical Reports with OracleBI Discoverer Plus OLAP.”

Note: From the File menu within the Data Viewer, or from the Query Builder tool,
you can access the Oracle OLAP Query Builder. This query wizard is used throughout Oracle Business Intelligence tools.

As shown in the image above, you drill down on data to the lowest levels of detail by clicking the arrow icon to the left of the dimension value. Notice that the measure appears to the user as fully aggregated at all level combinations of all dimensions. This is an important feature of the Oracle OLAP dimensional model. All data is presented to the end user as if it is already aggregated and calculated, even if some or all of the data being displayed is being calculated on the fly.

For example, some of the budget data has been pre-aggregated during the maintenance task, and some of it is being calculated dynamically. The end user cannot tell the difference, and does not need to know. The AW contains the data and the calculation logic and presents the results that the user needs. From the technical perspective, not even the query behind this crosstab needs to know whether the measure cells being requested are pre-computed or not. The query simply requests these cells from the database, and the AW engine performs any calculations required at query time.

In some cases you may need to move beyond the default settings described in this workshop. Therefore, in the next workshop we will look at the other tabs that are part of the Cube wizard. These tabs control sparsity, compression and partitioning features, aggregation rules, and summarization strategies.

No comments: