免费注册 查看新帖 |

Chinaunix

  平台 论坛 博客 文库
最近访问板块 发新帖
查看: 3387 | 回复: 0

Dimension Embellishments(转英文资料欣赏) [复制链接]

论坛徽章:
4
金牛座
日期:2014-08-21 12:58:152015年辞旧岁徽章
日期:2015-03-03 16:54:152015亚冠之本尤德科
日期:2015-05-22 00:05:18数据库技术版块每日发帖之星
日期:2015-06-23 22:20:00
发表于 2008-10-27 21:24 |显示全部楼层
Kimball Design Tip #53: Dimension Embellishments
By Bob Becker
When developing dimensional models, we strive to create robust dimension tables decorated with a
rich set of descriptive attributes. The more relevant attributes we pack into dimensions, the greater
the users’ ability to evaluate their business in new and creative ways. This is especially true when
building a customer-centric dimension.

We encourage you to embed intellectual capital in dimensional models. Rather than applying
business rules to the data at the analytical layer (often using Excel), derivations and groupings
required by the business should be captured in the data so they’re consistent and easily shared
across analysts regardless of their tools. Of course, this necessitates understanding what the
business is doing with data above and beyond what’s captured in the operational source. However,
it’s through this understanding and inclusion of derived attributes (and metrics) that the data
warehouse adds value.

As we deliver a wide variety of analytic goodies in the customer dimension, we sometimes become
victims of our own success. Inevitably, the business wants to track changes for all these interesting
attributes. Assuming we have a customer dimension with millions of rows, we need to use minidimensions
to track customer attribute changes. Our old friend, the type 2 slowly changing dimension
technique, isn’t effective due to the large number of additional rows required to support all the
change.
The mini-dimension technique uses a separate dimension(s) for the attributes that frequently change.
We might build a mini-dimension for customer demographic attributes, such as own/rent home,
presence of children, and income level. This dimension would contain a row for every unique
combination of these attributes observed in the data. The static and less frequently changing
attributes are kept in our large base customer dimension. The fact table captures the relationship of
the base customer dimension and demographic mini-dimension as the fact rows are loaded.
It is not unusual for organizations dealing with consumer-level data to create a series of related minidimensions.
A financial services organization might have mini-dimensions for customer scores,
delinquency statuses, behavior segmentations, and credit bureau attributes. The appropriate minidimensions
along with the base customer dimension are tied together via their foreign key
relationship in the fact table rows. The mini-dimensions effectively track changes and also provide
smaller points of entry into the fact tables. They are particularly useful when analysis does not
require consumer-specific detail.

Users often want to analyze customers without analyzing metrics in a fact table, especially when
comparing customer counts based on specific attribute criteria. It’s often advantageous to include the
currently-assigned surrogate keys for the customer mini-dimensions in the base customer dimension
to facilitate this analysis without requiring joins to the fact table. A simple database view or
materialized view provides a complete picture of the current view of the customer dimension. In this
case, be careful not to attempt to track the mini-dimension surrogate keys as type 2 slowly changing
dimension attributes. This will put you right back at the beginning with a large customer dimension
growing out of control with too frequent type 2 changes.

Another dimension embellishment is to add aggregated performance metrics to the customer
dimension, such as total net purchases last year. While we normally consider performance metrics to
be best handled as facts in fact tables (and they should certainly be there!), we are populating them
© Copyright Kimball Group, 2004. All rights reserved. 2 of 2

in the dimension to support constraining and labeling, not for use in numeric calculations. Business
users will appreciate the inclusion of these metrics for analyses. Of course, populating these
attributes in our dimension table places additional demands on the data staging system. We must
ensure these aggregated attributes are accurate and consistent.

An alternative and/or complementary approach to storing the actual aggregated performance metrics
is grouping the aggregated values into range buckets or segments, such as identifying a credit card
customer as a balance revolver or transactor. This is likely to be of greater analytic value than the
actual aggregated values and has the added benefit of assuring a consistent segment definition
across the organization. This approach works particular well in combination with the mini-dimension
technique.
您需要登录后才可以回帖 登录 | 注册

本版积分规则 发表回复

  

北京盛拓优讯信息技术有限公司. 版权所有 京ICP备16024965号-6 北京市公安局海淀分局网监中心备案编号:11010802020122 niuxiaotong@pcpop.com 17352615567
未成年举报专区
中国互联网协会会员  联系我们:huangweiwei@itpub.net
感谢所有关心和支持过ChinaUnix的朋友们 转载本站内容请注明原作者名及出处

清除 Cookies - ChinaUnix - Archiver - WAP - TOP