ETL Interview Questions (please post answers if you know them)

1. Analysis
a. What is a logical data mapping and what does it mean to the ETL team?
b. What are the primary goals of the data discovery phase of the data warehouse project?
c. How is the system-of-record determined?

2. Architecture
a. What are the four basic data flow steps of an ETL process?
b. What are the permissible data structures for the data staging area? Briefly describe the pros and cons of each.
c. When should data be written to disk for safekeeping during ETL?

3. Extraction
a. Describe techniques for extracting from heterogeneous data sources.
b. What is the best approach for handling ERP source data?
c. Explain the pros and cons of communicating with databases natively versus through ODBC.
d. Describe three change data capture (CDC) practices and the pros and cons of each.

4. Data Quality
a. What are the four broad categories of data quality checks? Provide an implementation technique for each.
b. At which stage of the ETL should data be profiled?
c. What are the essential deliverables of the data quality portion of ETL?
d. How can data quality be quantified in the data warehouse?

5. Building Mappings
a. What are surrogate keys? Explain how the surrogate key pipeline works.
b. Why do dates require special treatment during the ETL process?
c. Explain the three basic delivery steps for conformed dimensions.
d. Name the three fundamental fact grains and describe an ETL approach for each.
e. How are bridge tables delivered to classify groups of dimension records associated to a single fact?
f. How does late-arriving data affect dimensions and facts? Share techniques for handling each.

6. Metadata
a. Describe the different types of ETL metadata and provide examples of each.
b. Share acceptable mechanisms for capturing operational metadata.
c. Offer techniques for sharing business and technical metadata.

7. Optimization
a. State the primary types of tables found in a data warehouse and the order in which they must be loaded to enforce referential integrity.
b. What are the characteristics of the four levels of the ETL support model?
c. What steps do you take to determine the bottleneck of a slow-running ETL process? Describe how to estimate the load time of a large ETL job.
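
As a starting point for question 5a, here is a minimal sketch of a surrogate key pipeline, assuming each dimension's natural-key-to-surrogate-key mapping has already been loaded into memory. The function name, the `<dimension>_id` / `<dimension>_key` column naming, and the reserved "unknown" key of 0 are all hypothetical choices made for illustration, not part of any particular ETL tool.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical convention: surrogate key 0 is reserved for "unknown" members.
UNKNOWN_KEY = 0


def surrogate_key_pipeline(
    fact_rows: Iterable[dict],
    lookups: Dict[str, Dict[str, int]],
) -> Tuple[List[dict], List[dict]]:
    """Swap the natural keys in each incoming fact row for surrogate keys.

    `lookups` maps a dimension name (e.g. "customer") to a dictionary of
    natural key -> surrogate key. Each fact row is assumed to carry the
    natural key in a column named "<dimension>_id"; the output row carries
    the surrogate key in "<dimension>_key" instead.

    Returns (resolved_rows, suspect_rows). A row is "suspect" when at least
    one natural key had no match and was replaced by UNKNOWN_KEY.
    """
    resolved: List[dict] = []
    suspect: List[dict] = []
    for row in fact_rows:
        out = dict(row)  # copy so the caller's row is not modified
        missing = False
        for dimension, key_map in lookups.items():
            natural = out.pop(f"{dimension}_id")
            surrogate = key_map.get(natural)
            if surrogate is None:
                surrogate = UNKNOWN_KEY
                missing = True
            out[f"{dimension}_key"] = surrogate
        (suspect if missing else resolved).append(out)
    return resolved, suspect


if __name__ == "__main__":
    lookups = {
        "customer": {"C001": 1, "C002": 2},
        "product": {"P-9": 10},
    }
    facts = [
        {"customer_id": "C001", "product_id": "P-9", "amount": 25.0},
        {"customer_id": "C999", "product_id": "P-9", "amount": 5.0},
    ]
    resolved, suspect = surrogate_key_pipeline(facts, lookups)
    print(resolved)
    print(suspect)
```

Routing unmatched rows to a separate list is only one option; they could equally be rejected, or a placeholder dimension row could be created so the fact can be loaded and corrected later, which ties into question 5f on late-arriving data.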
TAG:
ETL
interview
safekeeping