Monday, January 14, 2008

OLAP Workshop 4 : Managing Different Types of Hierarchies

In the previous posting we started to look at building our first analytic workspace using Analytic Workspace Manager. At this stage don’t forget that we can also use Warehouse Builder to perform the same tasks and in many cases, especially on large-scale projects, this will be the product of choice for designing, building and maintaining your analytic workspaces.

At the end of the last workshop we had defined a simple time dimension and examined the various components that make up a dimension:
  • Levels
  • Hierarchies
  • Attributes
In this workshop we are going to look in more detail at the types of hierarchies that you might need to design and map within your environments.

Most dimensions will have at least one hierarchy, but Oracle OLAP does also support completely flat dimensions where no hierarchy exists. Although this is rare it does occur in some cases, but it is always wise to have an “All Members” level for these types of dimensions as this will allow business users to pivot these types of dimensions out of their query by selecting that top level. Otherwise their queries will always be pinned to a single dimension member within the page dimension.

A hierarchy defines a set of parentage relationships between all or some of a dimension's members:
  • Used for rollups of data.
  • Used for end-user navigation; e.g., drill-down.
While multiple hierarchies are supported each member can have only one parent within each hierarchy. Lets look at some basic examples:

In the first image we have a traditional level based hierarchy where each child has a parent at the next level up in the hierarchy. Although the number of children at each node may, and usually does, differ between nodes. The second image shows another type of level based hierarchy that is some times referred to as a “Skip Level” hierarchy. This is where a leaf node links to a higher-level parent above its next most obvious level. Oracle database can support skip-level relationships within relational hierarchies, however, this is limited to skipping to only one specific level. Oracle OLAP is able to support skip-levels across multiple levels, as seen here:

Oracle OLAP is able to manage these types of relationships quickly and easily because all types of hierarchies are effectively stored as parent-child relationships. A derivation of the skip-level hierarchy is the “Ragged” hierarchy. This is where leaf-nodes are located at different levels within the hierarchy. Obviously this can have an impact on the data loading and aggregation plans, however, Oracle OLAP is more than capable of handling this type of scenario in just the same way as any other level based hierarchy.

Of course you can combine some of these structures to create more complicated relationships such as a “Ragged-Skip” level hierarchy. These more complex structures are also supported.

The last type of hierarchy shown above is a simple flat hierarchy, which as explained earlier may or may not be an ideal type of dimension to model depending on how your business users plan to build queries. In all these cases, the hierarchy is defined based on levels.

One type of hierarchy not shown, but which is supported Oracle OLAP, is “Value” based hierarchies, of which the typical Employee/HR table is the most common example. This type of hierarchy contains no levels and is dealt with as a pure parent-child relationship. In this case the level names are converted into attributes to help business users define the queries.

Across all these types of hierarchies there are some simple rules that need to be followed. It is recommended you create at least one top level on each of your hierarchies. Although some types of dimensions, such as time, will require multiple top levels such as Years.

What you cannot do is have a child owned by multiple parents within the same hierarchy as shown below. In this case, you would need to create two separate hierarchies to manage the relationships separately.

The interesting part here, is the basic design of the dimension and its related levels, hierarchies and attributes is largely consistent across all these different types of structures. The only real different is between level and value-based relationships where value-based dimensions do not contain levels. Fortunately, the dimension loading routines manage these types of dimension structures transparently.

The next step, having defined our dimensions and their associated hierarchies, is to map the source data to the actual dimension itself. To help with this process, and to accommodate some of these more complex relationships, the AWM Mapping Editor allows for three types of source data:
  • Star format source table
  • Snowflake collection of source tables
  • Other
Which just allows you to use just about any type of relational schema design as a source in the mapping editor.

The Mapping Editor

The mapping editor is laid is comprised of four main areas:

  • 1 – The mapping editor is launched from the main navigator. There is a mapping editor for each dimension and each cube.
  • 2 – Schema List: lists the available tables, views and synonyms where the owner of the AW has been granted SELECT privilege.
  • 3 – The mapping Canvas: dragging tables views and.or synonym on to the mapping canvas makes it available for use within a mapping.
  • 4 – Mapping Control: Controls the type of layout, which includes:
    • Star schema
    • Snowflake schema
    • Other

In the following sections we will look at how to use the mapping editor to manage different schema layouts to model different types of dimensions and hierarchies.

Types of Dimension Source Tables/Views
Firstly, a quick best practice tip. Personally, I always find it useful to map to views rather than directly to tables. This provides more control over the data passed into the data loader (which can be useful for testing), especially when trying to perform incremental updates from a fact table. But we will look at this in more detail when reviewing processes for designing and building cubes.

A star schema provides one table or view with columns containing member id's representing all levels of a hierarchy for each dimension. Each row in the table specifies a branch in the hierarchy. Additional columns identify additional attributes for each level, such as long and short descriptions. In the case of a time dimension, additional attributes will be required to provide information on end date and time span for each level.

Where a hierarchy is unbalanced and contains skip-levels, or is ragged, or is a combination of both, some rows may contain blank entries in specific columns.

OLAP dimension member ids must be unique within a level, which is normal in relational models, but they may also need to be unique across levels as well. In fact most people forget or try to ignore this requirement and often hit problems later when loading data into their cubes. OLAP stores dimension members as a single continuous list of ids. If your source keys are not unique across levels then you must take the option of generating surrogate keys as stated in the previous workshop.

Enabling the surrogate key option appends the level name to the member id, which should then guarantee uniqueness. However, this is only possible with value-based hierarchies. If your dimension requires a value based hierarchy you must use natural keys.

In summary:
  • Natural keys:
    • Created in the AW as is from the source table or view (except numeric, dates become text).
    • If source table had months 1, 2, 3 then the AW dimension values would be '1', '2', '3'.
  • Surrogate keys:
    • The level name is prefixed to the source table or view id value.
    • If source table had months 1, 2, 3 then the AW dimension values would be 'MONTH_1', 'MONTH_2', 'MONTH_3'
Mapping a Star Based Schema

The steps for using a star schema are:
  • Use natural or surrogate keys allowed
    • Must use surrogate keys if dimension values are not unique across levels.
  • Define levels and a level-based hierarchy.
  • In the mapping editor choose Star Schema as the Type of Dimension Table(s).
Dimension Objects used in the Mapping
The Mapping Editor allows mapping from the source table to the member and attributes at each level. Each attribute is shown as a separate entry in the dimension object in the editor. The editor will not allow mappings from more than one column to each element, although AWM 11g removes the restriction by allowing simple transformations to be performed during the data loading process.
Here is an example of a completed mapping for the Product dimension. Note the long and short description attributes share the same source (so it is possible to map a source column to multiple target attributes).

Some query tools will differentiate between long and short descriptions. For example both Discoverer and the OLAP Spreadsheet Addin for Excel will use short descriptions for dimensions used as column headers and long descriptions when the dimension is used in the page or row edge.

If you do not provide a long and./or short description the data loader will default to using the dimension key to populate these attributes.

A snowflake schema provides separate tables or views for each levels of a hierarchy. Each row in the table specifies a level in the hierarchy with an additional column to link to each parent across the various hierarchies. The same basic requirements apply as for star schemas in terms of uniqueness.

Mapping a Snowflake Based Schema

The steps for using a snowflake schema are:
  • Natural or surrogate keys allowed
    • Must use surrogate keys if dim values are not unique across levels.
  • Define levels and a level-based hierarchy.
  • Choose Snowflake schema as the Type of Dimension Table(s).

Dimension Objects used in the Mapping
The mapping editor has to be switched to “Snowflake” mode using the pulldown selection dialog at the top of the editor. The mapping canvas will then change to allow you to map each member, its parent and associated attributes at each level.

As with the Star schema mapping process, the snowflake mapping editor will not allow mappings from more than one column to each element i.e., map from a single source table or view per level. Here is a completed snowflake mapping:

Collection of Tables
The basic snowflake schema can be taken a stage further by moving the various attributes, such as descriptions etc, to separate tables. This follows a more 3NF approach to data storage and although it looks more complicated it can easily be managed within AWM's mapping editor.

Mapping a Collection Based Schema

In this format natural or surrogate keys can still be used within the dimension. To map a collection of tables as described above the mapping editor needs to be switched to “Other” mode.

When mapping the tables to the dimension, the normal rules still apply. The Mapping Editor only allows mapping to member, parent for dim values and member, value for attributes at each level. It will not allow mappings from more than one column to each element, but you can map from an arbitrary set of source tables and/or views, which have member and value columns.

Value Based (Parent Child)
This is probably the most simple type of relationship to manage from a mapping perspective. Likely sources for this type of mapping are other AWs, where the source data is from an OLAP enabled SQL view, or another multi-dimensional engine.

The source for this type of relationship is normally a two-column table that provides the key and the parent for each child. Other columns are used to provide additional attributes.

In this case, natural keys must be used to define the dimension, since there are no level identifiers that can be used to construct the surrogate key. In this scenario it is possible to use any of the mapping editor options (star, snowflake, or other) to construct the mapping.

Things to remember when designing a parent-child/value based hierarchy. Firstly, there are no levels, therefore, certain calculations, such as share, are not possible. A parent-child hierarchy cannot be used in the partition statement of a cube because there is no level identifier to act as the partition key.

However, it is possible to provide a pseudo level identified by creating a level attribute. This allows users to create selections using the attribute in the normal way. In some cases, a value based hierarchy may be the only way to manage an unbalanced hierarchy, where not all branches have the same number of dimension members).

Flat List
Another version of the parent-child/value based hierarchy is the flat-list dimension. In this scenario, the dimension has no hierarchies and is simply a flat list of dimension members. Personally, I would not recommend building this type of dimension simply because there is no top level. This makes it difficult for business users to pivot the dimension out of the query since they have to pin the dimension to a specific member when it is hidden. This can make the query process more complicated for business users to understand.

In most cases I would suggest that a flat list hierarchy where no top level is possible is a likely candidate for migration to a series of measures within a cube. This is something you should seriously consider before creating a flat list dimension.

The dimension itself can have a hierarchy based on a single level. This provides the flexibility to use either surrogate or natural keys. If the dimension is designed with no levels and no hierarchy then only natural keys are available.

Skip, Ragged and Ragged-Skip Level Hiearchies
Ragged is a special form of skip. The diagram below shows the various scenarios that can be found in many dimensions. It is highly likely that at least one dimension in a data model will have one or all of these scenarios.

The question is how can such a structure be represented within a relational table?

Using an across format structure, where a skip level occurs one or more columns are left blank within a specific row. However, within a skip level there is a common leaf node that denotes the lowest level of the hierarchy. From the leaf node to the top level, certain columns that relate to parents of the leaf node are left blank. As shown below (in this case the ID columns are not shown but follow the same pattern)

This type of layout is difficult, if not possible, for most SQL based query tools to manage. However, recent additions to the SQL language has allowed skip-level hierarchies to be partially managed using normal query methods. However, it is only possible to skip one level within a single hierarchy. Fortunately, OLAP does not enforce this constraint.

To map this type of hierarchy use a normal star schema approach. The OLAP engine will manage the complexity of the relationships for you.

For a ragged hierarchy, the leaf node will occur at any or all-intervening levels within a hierarchy. Again null values will appear in certain columns within each row.

When defining a ragged hierarchy within a dimension wse natural keys and create a level-based hierarchy(ies). Within the dimension-mapping editor map the source table as a star. But for the cube mapping the fact table requires a little more work. It is necessary to map the key for the ragged dimension to all levels in the dimension, which have leaf values (or, to be safe, map to all levels). This is shown below

Ragged Skip Level
In this scenario, looking at the image at the start of this sectio, we can see the leaves are not always at lowest level; there are some intervening nulls However, this simply a combination of the two types of hierarchies we have already reviewed. The source table would look something like this:

In this scenario the same rules apply as before:
  • Use natural keys, level-based hierarchy(ies).
  • Map as a star.
  • When map the fact table, map its key to all levels in the dimension which have leaf values.

In the next posting in this series we will consider how to design and create cubes.

1 comment:


Great Post, im just started to view OLAP or DataWareHouse?, but in Sql Server Analysis services, now im searching info about creating structure of a dimension, because i have that work so iam here reading your post, the first part, that is theory about Hierarchies is well explained, congratulations