Document Your ETL System (repost of the original English document)

Posted 2008-10-10 22:47
www.kimballgroup.com Number 65, March 6, 2005
Design Tip #65 Document Your ETL System

By Joy Mundy
Whether you use an ETL tool or hand-code your ETL system, it’s a piece of software like any other
and needs to be documented. As your data warehouse evolves, the ETL system evolves in step; you
and your colleagues need to be able to quickly understand both the entire system architecture and
the gritty details.

There’s a widespread myth that ETL tools are self-documenting. This is true only in comparison with
hand-coded systems. Don’t buy into this myth: you need to develop an overall, consistent architecture
for your ETL system. You also need to document that system. Yes, that means writing a document.

The first step in building a maintainable ETL system is to STOP and think about what you’re doing.
How can you modularize the system? How will those modules fit together into an overall flow?
Develop your system so that you use a separate package, flow, module (or whatever your tool calls it)
for each table in the data warehouse. Write a document that describes the overall approach – this can
be a few pages, plus a screenshot or two.

Design a template module and group like activities together. The template should clearly identify
which widgets are associated with extracts, transformations, lookups, conformation, dimension
change management, and final delivery of the target table. Then, document this template flow in
painstaking detail, including screenshots. The documentation should focus on what’s going on, not on
the detailed properties of each step or task.
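
As a rough illustration of the template idea, here is a minimal Python sketch for a hand-coded system. Everything in it is an assumption made for the example: the `EtlModule` class, the stage names and their order (adapted from the list of activities above), and the stubbed customer data. It is not taken from any ETL tool. The point is only that every module exposes the same named stages in the same order, and that a short "what's going on" summary can be read off the module itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]

# Stage names adapted from the template description; the fixed order is an assumption.
STAGES = ("extract", "transform", "lookup", "conform", "change_management", "deliver")


@dataclass
class EtlModule:
    """Hypothetical template: one module per target table, same stages everywhere."""
    target_table: str
    steps: Dict[str, Step] = field(default_factory=dict)

    def run(self, rows=None) -> List[Row]:
        """Run the stages that are present, always in template order."""
        rows = list(rows or [])
        for stage in STAGES:
            step = self.steps.get(stage)
            if step is not None:
                rows = step(rows)
        return rows

    def describe(self) -> str:
        """Summarize what each stage does, using the step docstrings."""
        lines = [f"Module for {self.target_table}"]
        for stage in STAGES:
            step = self.steps.get(stage)
            if step is None:
                doc = "not used"
            else:
                doc = (step.__doc__ or "undocumented").strip()
            lines.append(f"  {stage}: {doc}")
        return "\n".join(lines)


def extract_customers(rows: List[Row]) -> List[Row]:
    """Read customer rows from the source (stubbed with literals here)."""
    return [{"id": 1, "name": " ada "}, {"id": 2, "name": "grace"}]


def clean_names(rows: List[Row]) -> List[Row]:
    """Trim whitespace and title-case customer names."""
    return [{**r, "name": str(r["name"]).strip().title()} for r in rows]


dim_customer = EtlModule(
    "dim_customer",
    {"extract": extract_customers, "transform": clean_names},
)
print(dim_customer.describe())
```

Because the stage order lives in one place, a reader who knows the template knows where to look in any module, which is the hand-coded analogue of putting the extract logic in the top-left of every flow.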

Next, use the templates to build out the modules for each dimension and fact table. If you can control
layout within your ETL tool, make the modules look similar, so people can look in the top-left for the
extract logic, and can more easily understand the squiggly mess in the middle. The modules for each
dimension table should look really similar to each other; likewise for fact tables. Remember: it’s only a
foolish consistency that’s the hobgoblin of small minds. The table-specific documentation should
focus on what’s different from the standard template. Don’t repeat the details; highlight what’s
important. Pepper your ETL system with annotations, if your ETL tool supports them.
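
One way to keep table-specific documentation focused on the differences is to generate that part from short stage descriptions. The sketch below is an assumption-laden example, not a feature of any ETL tool: the `template_differences` helper, the dictionaries, and the stage descriptions (including the "Type 1" and "Type 2" wording) are all invented for illustration.

```python
def template_differences(template, module):
    """Return only the stages whose description differs from the template.

    Both arguments map a stage name to a one-line description. The result
    maps each differing stage to a (template_text, module_text) pair; a
    stage missing on one side shows up as None on that side.
    """
    stages = list(template) + [s for s in module if s not in template]
    return {
        s: (template.get(s), module.get(s))
        for s in stages
        if template.get(s) != module.get(s)
    }


# Hypothetical descriptions for the standard dimension template.
TEMPLATE = {
    "extract": "Full pull from the source table",
    "change_management": "Type 1 overwrite",
    "deliver": "Bulk load into the target table",
}

# Hypothetical descriptions for one specific dimension module.
DIM_CUSTOMER = {
    "extract": "Full pull from the source table",
    "change_management": "Type 2 history on address",
    "deliver": "Bulk load into the target table",
}

for stage, (standard, actual) in template_differences(TEMPLATE, DIM_CUSTOMER).items():
    print(f"{stage}: template says {standard!r}, this module does {actual!r}")
```

Only the change-management stage is reported here, so the table-specific write-up stays short and the unchanged stages are covered once, in the template document.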

Finally, your ETL tool may support some form of self-documentation. Use this feature, but consider it
an appendix to the real document as it’s either relatively lame (screenshots) or overwhelmingly
detailed (all the properties of all the objects); it’s not, in our experience, particularly useful.