Ten Mistakes To Avoid
The staff of The Data Warehousing Institute called
upon experts across the industry and held meetings
in several cities with active data warehousing project
managers and IS executives to develop
a compendium of the "ten mistakes to avoid for
data warehousing managers." This article contains
about 65 percent of the complete document.
1. Starting With The Wrong Sponsorship Chain
The right sponsorship chain includes two key individuals
above the data warehousing manager. At the top is an
executive sponsor with a great deal of money to invest
in effective use of information. A good sponsor, however,
is not the only person required in the reporting chain
above the warehousing manager.
When a data warehousing project craters, the cause
can sometimes be traced to the lack of a key individual
between the sponsor and the data warehousing manager.
That person is often called the project "driver"
because he or she keeps the project moving in the right
direction and ensures the schedule is kept. A good driver
is a business person with three essential characteristics:
(1) s/he has already earned the respect of the other
executives, (2) s/he has a healthy skepticism about
technology, and (3) s/he is decisive but flexible.
2. Setting Expectations That You Cannot Meet And
Frustrating Executives At The Moment Of Truth
Data warehousing projects have at least two phases:
(1) the selling phase in which you attempt to persuade
people that they can expect to get wonderful access
to the right data through simple, graphical delivery
tools, and (2) the struggle to meet the expectations you
have raised in phase one. Data warehouses do not give
users all the information they need.
All data warehousing is, by necessity, domain specific,
which means it focuses on a particular set of business
information. Worse still, many warehouses are loaded
with summary information - not detail. If a question
asked by an executive requires more detail or requires
information from outside the domain, the answer is often,
"we haven't loaded that information, but we can,
it will just cost (a bunch) and take (many) weeks."
Executives focus their frustration on the person who
made the promises.
3. Engaging In Politically Naive Behavior (e.g.,
Saying "This Will Help Managers Make Better Decisions")
A foolish error made by many data warehousing managers
is promoting the value of their data warehouse with
arguments to the effect of, "This will help managers
make better decisions." When a self-respecting
manager hears those words, the natural reaction is "This
person thinks we have not been making good decisions
and that his/her system is going to 'fix' us."
From that point on, that manager is very, very hard
to please.
Most experienced CIOs know that the objective of data
warehousing is the same one that fueled the fourth generation
language boom of the late seventies, and the EIS craze
of the late eighties - giving end users better access
to important information. Fourth generation languages
have had a long and useful life, but EIS had a quick
rise and a quicker fall. Why? One possible answer is
that 4GLs were sold as tools to get data while EIS were
promoted as change agents that would improve business
and enable better management decisions. That raised
political issues, and made enemies out of potential
supporters.
4. Loading The Warehouse With Information "Just
Because It Was Available."
Some inexperienced data warehousing managers send a
list of tables and data elements to end users along
with a request asking, "which of these elements
should be included in the warehouse?" Sometimes
they ask for categories such as 'essential', 'important',
and 'nice-to-have'. They get back long lists of marginally
useful information that radically expand the data warehouse
storage requirements and, more importantly, slow responsiveness.
Extraneous data buries important information. Faced
with the need to dig through long guides to find the
right field name, and having to deal with multiple versions
of the same information, users quickly grow frustrated
and may even give up entirely.
5. Believing That Data Warehousing Database Design
Is The Same As Transactional Database Design
Data warehousing is fundamentally different from transaction
processing. The goal here is to access aggregates -
sums, averages, trends, and more. Another difference
is the user. In transaction processing, a programmer
develops a query that will be used tens of thousands
of times. In data warehousing, an end-user develops
the query and may use it only one time. Data warehousing
databases are often denormalized to make them easier
to navigate for infrequent users.
An even more fundamental difference is in content.
Where transactional systems usually contain only the
basic data, data warehousing users increasingly expect
to find aggregates and time-series information already
calculated for them and ready for immediate display.
That's the impetus behind the multi-dimensional database
market.
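The difference in content can be sketched in a few lines. The example below is a minimal illustration, not a recommended design: it uses Python's built-in sqlite3 module, and the table names (order_line, sales_by_region_month), the columns, and the sample rows are all invented for this sketch. The transactional table holds one row per event, while the warehouse table holds totals that have already been calculated at load time.

```python
import sqlite3


def build_demo(conn):
    """Create a transactional table and a precomputed summary table.

    All table names, columns, and rows are hypothetical sample data.
    """
    conn.executescript("""
        CREATE TABLE order_line (
            order_id INTEGER, region TEXT, month TEXT, amount REAL
        );
        CREATE TABLE sales_by_region_month (
            region TEXT, month TEXT, total_amount REAL, order_count INTEGER
        );
    """)
    rows = [
        (1, "East", "M01", 100.0),
        (2, "East", "M01", 250.0),
        (3, "West", "M01", 75.0),
        (4, "East", "M02", 300.0),
    ]
    conn.executemany("INSERT INTO order_line VALUES (?, ?, ?, ?)", rows)
    # The aggregation happens once, at load time, not in every user query.
    conn.execute("""
        INSERT INTO sales_by_region_month
        SELECT region, month, SUM(amount), COUNT(*)
        FROM order_line
        GROUP BY region, month
    """)


def monthly_totals(conn, region):
    """Read the ready-made time series for one region."""
    cur = conn.execute(
        "SELECT month, total_amount FROM sales_by_region_month "
        "WHERE region = ? ORDER BY month",
        (region,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
build_demo(conn)
print(monthly_totals(conn, "East"))
```

An infrequent user reads sales_by_region_month with a simple lookup and never has to write the GROUP BY. The price is extra storage and a load step that must be rerun when the detail changes.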
6. Choosing A Data Warehousing Manager Who Is Technology-Oriented
Rather Than User-Oriented
"The biggest mistake I ever made was putting that
propeller-head in as the manager of the project."
Those are the exact words from the driver at a large
oil company, explaining how the user-hostile project
manager had made so many people angry that the entire
project was in danger of being scrapped.
Do not let his words tar all technologists. Some make
excellent project managers and can serve as effective
data warehousing managers; however, many cannot. Data
warehousing is a service business - not a storage business - and
making clients angry is a near perfect method of destroying
a service business.
7. Focusing On Traditional Internal Record-Oriented
Data and Ignoring The Potential Value of External Data
and of Text, Images, and - Potentially - Sound And Video
A White House study of commercial executives showed
that the very highest executives rely on outside data
(news, telephone calls from associates, etc.) for more
than 95 percent of all the information they use. Because
of their focus on external sources of information, senior
executives sometimes see data warehouses as irrelevant.
Therefore, it's valuable to extend the project focus
to include external information.
In addition, consider expanding the forms of information
available through the warehouse. Users are starting
to ask, "Where's the copy of the contract (image)
that explains the information behind the data? And where's
the ad (image) that ran in that magazine? Where's the
tape (audio or video) of the key competitor at a recent
conference talking about its business strategy? Where's
the recent product launch (video)?" This is the
age of television. Traditional alphanumeric data is
two generations behind the current technology.
8. Delivering Data With Overlapping And Confusing
Definitions
The Achilles heel of data warehousing is the requirement
to gain consensus on data definitions. Conflicting definitions
each have champions, and they are not easily reconciled.
Many of the most stubborn definitions have been constructed
by managers to reflect data in a way that makes their
department look effective. To the finance manager, sales
means the net of revenue less returns. Sales to the
distribution people is what needs to be delivered. Sales
to the sales organization is the amount committed by
clients.
One organization reported twenty-seven different definitions
of sales. Executives do not give up their definitions
without a fight, and few data warehousing managers are
in a position to bully executives into agreement. Solving
this problem is one of the most important tasks of the
data warehousing driver. If it is not solved, users
will not have confidence in the information they are
getting. Worse, they may embarrass themselves by using
the wrong data - in which case, they will inevitably
blame the data warehouse.
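To make the conflict concrete, here is a small sketch of how the three definitions above give three different "sales" figures from the same orders. Everything in it is an assumption made for illustration: the Order fields, the function names, and the sample amounts do not come from any real system. Publishing each figure under its own explicit name is one possible way to keep users from picking up the wrong number. It does not replace the consensus work described above.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical order record; all fields are illustrative."""
    committed: float    # amount the client has committed to
    to_deliver: float   # value of goods that still need to be delivered
    revenue: float      # revenue booked so far
    returns: float      # value of goods returned


def sales_finance(orders):
    """Finance view: revenue less returns."""
    return sum(o.revenue - o.returns for o in orders)


def sales_distribution(orders):
    """Distribution view: what needs to be delivered."""
    return sum(o.to_deliver for o in orders)


def sales_committed(orders):
    """Sales organization view: the amount committed by clients."""
    return sum(o.committed for o in orders)


orders = [
    Order(committed=1000.0, to_deliver=400.0, revenue=600.0, returns=50.0),
    Order(committed=500.0, to_deliver=500.0, revenue=0.0, returns=0.0),
]

# Each figure carries the definition in its label, never a bare "sales".
report = {
    "sales, net of returns (finance)": sales_finance(orders),
    "sales, to be delivered (distribution)": sales_distribution(orders),
    "sales, committed (sales organization)": sales_committed(orders),
}
for label, value in report.items():
    print(f"{label}: {value}")
```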
9. Believing The Performance, Capacity, And Scalability
Promises
At a recent conference, CIOs from three companies - a
manufacturer, a retailer, and a service company - described
their data warehousing efforts. Although the three data
warehouses were very different, all three ran into an
identical problem. Within four months of getting started,
each of the CIOs unexpectedly had to purchase at least
one additional processor of a size equal to or larger
than the largest computer that they had originally purchased
for data warehousing. They simply ran out of power.
Two of the three had failed to budget for the addition,
and found themselves with a serious problem. The third
had budgeted for unforeseen difficulties, and was able
to adapt.
A very common capacity problem arises in networking.
One company reported that it sized a network to support
an image warehouse, but discovered that the network
was soon overwhelmed. The surprise was that the images
were not at fault. The problem turned out to be network
traffic for data transfer between the end-user application
and the database of indices on the server. The images
moved fast, but the process of finding the right one
clogged the network. Network overloads are a very common
surprise in client/server systems in general and in
data warehousing systems in particular.
10. Believing That Once The Data Warehouse Is Up
and Running, Your Problems Are Finished
Each happy data warehouse user asks for new data and
tells others about the 'great new tool.' And they, too,
ask for more data to be added. And all of them want
it immediately. At the same time, each performance or
delivery problem results in a high-pressure search for
additional technology or a new process. Thus the data
warehousing team needs to maintain high energy over
long periods of time. A common error is to place data
warehousing in the hands of project-oriented people
who believe that they will be able to set it up once
and have it run itself. Data warehousing is a journey,
not a destination.
11. Focusing On Ad Hoc Data Mining And Periodic
Reporting.*
This is a subtle error, but an important one. Fixing
it may transform a data warehousing manager from a data
librarian into a hero. The natural progression of information
in a data warehouse is (1) extract the data from legacy
systems, clean it, and feed it to the warehouse, (2)
support ad hoc reporting until you learn what people
want, and then (3) convert the ad hoc reports into regularly
scheduled reports. That's the natural progression, but
it isn't the best progression. It ignores the fact that
managers are busy and that reports are liabilities rather
than assets unless the recipients have time to read
the reports.
Alert systems can be a better approach and they can
make a data warehouse mission-critical. Alert systems
monitor the data flowing into the warehouse and inform
all key people with the need to know, as soon as a critical
event takes place. Harris Semiconductor's industry-leading
manufacturing alert server, for example, monitors patterns
in semiconductor test data, and screams loudly (via
email) when wafer characteristics anywhere in the world
(Malaysia, Singapore, or three US sites) creep too far
from the ideal. Rethink the manager's need: Does he
or she really want reports? Or would an alert system
be better?
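The core of an alert system can be very small. The sketch below is a hypothetical illustration and not a description of the Harris Semiconductor server: the Reading type, the ideal and tolerance values, and the sample readings are all invented. It checks each incoming reading against an ideal value and keeps only those that have crept too far from it. A real system would then send the resulting messages by email or pager instead of printing them.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One measurement arriving at the warehouse (hypothetical shape)."""
    site: str
    value: float


def find_alerts(readings, ideal, tolerance):
    """Return the readings that deviate from the ideal by more than tolerance."""
    return [r for r in readings if abs(r.value - ideal) > tolerance]


def format_alert(reading, ideal):
    """Build the message that would be sent to the people who need to know."""
    return f"ALERT {reading.site}: value {reading.value} is off ideal {ideal}"


# Invented sample data; the threshold values are assumptions.
incoming = [
    Reading("Malaysia", 10.2),
    Reading("Singapore", 11.1),
    Reading("US-1", 9.4),
]
alerts = find_alerts(incoming, ideal=10.0, tolerance=0.5)
for reading in alerts:
    print(format_alert(reading, 10.0))
```

Only the exceptions reach a manager, so nothing has to be read unless something has gone wrong.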
* You'll find eleven 'mistakes' on our list. Believing
there are only ten mistakes to avoid is also a mistake,
so we've given you eleven to keep you on your toes.