Thursday, May 6, 2010

Time Dimensions with Hourly Time Periods

I was working on an application last week that required time series analysis at Hour, Day, Month, Quarter and Year levels. Two interesting things came out of this application.

First, a little implementation detail. The data was supplied in the fact and dimension tables at the Hour level with a TIMESTAMP data type. As you might expect then, there were time periods at the hour level such as:

02-JAN-10 AM
02-JAN-10 AM
02-JAN-10 PM
02-JAN-10 PM
02-JAN-10 PM

In my first attempt at building the time dimension I loaded hours directly from TIMESTAMP data type. In that case, the members at Hour level were loaded into the dimension stripped of the hour (e.g., 02-JAN-10). Since this isn't what I wanted, I converted the hours into a CHAR as follows:

CREATE VIEW time_dim_view AS
TO_CHAR(hour_id, 'DD-MON-YYYY HH24') AS hour_id,
TO_CHAR(hour_id, 'DD-MON-YYYY HH24') AS hour_desc,
hour_id AS hour_end_date,
.. and so on.

This gave me dimension members at hour as follows:

01-JAN-2010 00
01-JAN-2010 01
01-JAN-2010 02
01-JAN-2010 03
01-JAN-2010 04

That worked just fine. I did the same for the descriptions (so that they would be more easily readable by end users) and added a corresponding column to a fact view so that the time view and fact view joined correctly on the TO_CHAR(...) columns.

For the TIME SPAN attribute, I used a fractional value of DAY (0.041667, which is 1/24th of a day). I read the DATETIME into the END DATE attribute as is (no conversion required). From there on, everything worked perfectly (cube builds, time series calculations, etc).

If you happen to look at the END DATE attribute from the OLAP DML side, be sure to wrap the END_DATE object in a TO_CHAR function so that you see the hours. Otherwise, you will see only the day in most cases (it depends on the NLS_DATE_FORMAT setting for the session). For example:


The other thing that was interesting has more to do with the application design. As so often happens, the customer was inclined to build one cube with all history at the hour level (two years of history). When examining the reporting requirements, however, it turned out that hour level analysis very rarely occurs more than 2 month back. Almost all of the reporting looking back over the two years was at the day level or higher (that is, not hourly level reporting).

We could have built the one cube (two years, hour and higher), but most of the processing of hour level data would have been a waste because users don't look at the older data at that level. Instead, we built a very efficient application with two cubes. One cube contained only three months of data at the hour, day, month, quarter and year levels. Another cube contained two years of history starting at the day level.

Presentation of the data is mostly done using Oracle Business Intelligence Enterprise Edition(via SQL to the cube). Some reports examine hourly level data. Other reports examine more aggregate data over longer time periods. Time series calculations (e.g., period to date, moving average, etc.) were added to both cubes and made available in the OBIEE reports.

Occasionally, a user will want to drill from day to hour more than three months back. To support this, OBIEE was set up to drill from day (in the two year cube) to hour in the fact table. The only compromise was that the time series calculations of the cube were not available when drilling to hour in the fact table. That didn't matter to these users.

From the end user perspective, the fact that there were two cubes instead of one (as well as a fact table) was completely irrelevant since OBIEE presented all data in reports in a single dashboard. From a processing perspective, the system was much more efficient and manageable as compared to the single big cube approach.

It is very worthwhile to keep this lesson in mind when you design your applications. Pay careful attention to reporting requirements and build cubes that meet those requirements. You can tie multiple cubes together in a tool such as OBIEE. This approach is often much better then building a single cube every level of detail.

In this case, the example is about what level of detail is in which cube. The same concept applies to dimensions. You might find it much more efficient to build Cube 1 with dimensions A, B, C and D and Cube 2 with dimensions A, B, E and F rather than one big cube with all dimensions.

No comments: