May 18, 2012

ETL Architecture in Depth with Kimball University, September 17-20, 2012 (Melbourne)

Discover How To Future-proof Your Data Warehouse Investment

This course makes sure that you understand all the factors necessary for effectively designing the back room of a data warehouse that can gracefully evolve over time as your needs mature and new technologies become available.

Altis Consulting is excited to announce that Ralph Kimball and Bob Becker are delivering a 4-day ETL Architecture in Depth course in Melbourne on September 17-20, 2012.

This course is intended for the data warehouse designer who has identified the sources of data and the target end users and is ready to start implementing.

Above all, this course tries to guarantee that you don’t overlook a critical requirement. For example, you dare not design your data warehouse while ignoring:

  • Compliance
  • Integration of diverse sources
  • Increasingly demanding real-time pressures
  • The time variance of your major dimensions such as customer and product
  • Being able to resume or back out a partially completed load
  • Having 100% certainty that you have captured all the changes in the source systems
  • And a host of other requirements that you will learn about in this course

Even if you don’t have an immediate need for every item on our list, over time it is likely that you will. By the end of this course, you will understand how your data warehouse ETL system can be built to anticipate all of these requirements.

This is not a microscopic code-oriented implementation class. Rather, it is an architecture class for the designer who must keep a broad perspective, and who needs to know what the latest technologies and techniques make possible. The course is organised around 34 necessary ETL subsystems which are developed in detail as the course progresses. See the course outline below for the names of the 34 subsystems.

In this course, you will circle around a series of design issues, starting with the first steps of extraction and continuing through to the final delivery of properly formatted data suitable for your BI tool. In this four-day class, each student builds, on paper, a comprehensive ETL system based on a realistic, complex example. All 34 subsystems are included.

Every attendee will receive a copy of The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data, co-authored by Ralph Kimball and Joe Caserta.

Who Should Attend ETL Architecture in Depth?

This course is designed for data warehouse implementers who are responsible for building the back room, or ETL portion, of a data warehouse environment. This includes:

  • ETL Developers
  • ETL Architects
  • Data warehouse operational staff
  • Compliance tracking data warehouse professionals
  • Real time data warehouse designers

Prerequisites for ETL Architecture in Depth

Familiarity with the basic principles of dimensional modeling is necessary since dimensional models are designed as the ultimate ETL deliverables. Students can gain this familiarity by reading the following articles written for Data Management Review:

The rest of the Data Management Review series is also recommended reading. The class will include brief reviews of dimensional modeling principles so that everyone has the same vocabulary.

Course Details

ETL Architecture in Depth

Sep 17-20, 2012

Bayview Eden Hotel, Melbourne

For more information, please email us at training@altis.com.au

ETL Architecture in Depth Full Course Outline

(Numbered items refer to the 34 subsystems taught in this course)

Day 1: Surrounding The Requirements

  • Business needs
  • Compliance
  • Data profiling
  • Security
  • Latency (daily, hourly, seconds, instantaneous)
  • Archiving (recent history, very long term)
  • End user profiles (developers, business end users, analysts)
  • Skills (traditional EDW, new Big Data systems)
  • Licenses
  • Coding vs. tool choice
  • The restaurant analogy
  • Data types used in ETL systems
  • Data Profiling
  • Source to target map
  • Access methods, source types (including new Big Data)
  • Software, techniques
  • Change data capture
  • Extract window
  • Immediate transformations
  • (3) Extract staging table designs, table types, retention, backup
  • (22) Job scheduler
  • (22) Exception handling architecture
  • (23) Backup
  • (24) Recovery and restart
  • Historical versus incremental load
  • Team Responsibilities

Day 2

Cleaning

  • (4) Data quality architecture
  • (4) Data quality screens
  • (5) Error event fact table
  • (6) Audit dimension, compliance tracking
  • (28) Sorting
  • Module designs: (7) customer deduplication, address validation
  • Final clean data table designs
  • (8) Conforming
  • Definition of conformed dimensions and facts
  • Using the matrix
  • Master data management
  • Mapping incompatible structures into common structure
  • (25) Version control
  • (26) System and version migration, testing and regression
  • (27) Workflow monitor
  • (22) Job scheduler
  • (29) Lineage and dependency analyzer
  • (30) Problem escalation system

Modifying your ETL architecture for real time

  • The Hot Partition
  • Streaming ETL vs. batch ETL
  • Streaming delivery, query, reporting, dashboards, notifications
  • EAI architecture (Enterprise Application Integration)
  • MBETL architecture (Micro Batch ETL)
  • EII architecture (Enterprise Information Integration)

Modifying your ETL architecture for Big Data predictive analytics

  • Extreme size
  • Extreme integration
  • Massively distributed
  • No standard schema
  • MapReduce, Hadoop, Pig, Hive, HBase
  • When to export to conventional RDBMS

Day 3: Building the ETL System

Delivering Dimension Tables

  • (9) Time variance designs (Slowly Changing Dimensions)
  • (10) Surrogate key generator
  • (15) Multi-valued dimensions, bridge tables
  • (11) Hierarchical dimensions (fixed, variable, ragged), bridge tables II
  • (12) Special dimensions
  • Date / Time dimensions
  • Junk dimensions
  • Mini-dimensions
  • Small dimensions
  • User maintained dimensions
  • Shrunken dimensions
  • Outrigger dimensions
  • Behavior tags
  • Step dimensions
  • Super type / Sub type dimensions
  • Study groups
  • Special cases (extreme dimensionality, extreme dimension width, many incompatible members)

Day 4

Delivering Fact Tables

  • (13) Fact table builder (transaction, periodic, accumulating and consolidated)
  • (14) Surrogate key pipeline
  • Referential integrity
  • Graceful extensibility (add attributes, add facts, add dimensions to existing schemas)
  • (16) Late arriving dimension and fact data
  • (17) The dimension manager, responsibilities and procedures, real time complexities
  • (18) The fact provider, responsibilities and procedures, real time complexities
  • (19) Aggregations
  • (20) Feeding OLAP cubes
  • (21) Data Integration manager (feeding data mining, presentation layer extracts, 3rd party flat files)

Development and Operations

  • (31) Parallel processing and pipelining
  • (32) Security
  • (33) Compliance
  • (34) Metadata
  • Metadata context
  • Process metadata
  • Technical metadata
  • Business metadata
  • Metadata Options
  • Metadata Strategy

About Kimball University
Kimball University is the definitive source for dimensional data warehouse education. They provide the highest quality and most practical education consistent with their instructors’ books and extensive experience in the dimensional approach. You’ll learn from the best in the business.
