Saturday, 27 July 2013

Greenfield Data Warehouse - Problems faced and solutions found






















I've been leading the building of a greenfield data warehouse - among many other things - for an SME over the last 18 months. My team work closely with an excellent Analytics department who are responsible - among many other things - of translating the data we provide into usable MI for stakeholders. 

We recently presented the stakeholder focused, agile methodology we've adopted to another part of the organisation who require a reboot of their current data management practices and will be building a data warehouse of their own either with our guidance or under our direct control.

The two slides featured here show the before and after position. Although it has to be said that our project is far from finished we have nonetheless built a solid foundation on which we can expand. Both from a technical and business point of view our solution is very pragmatic. We knew all along that anything we built, whether process or technical solution, would have to be modular so it could adapt to the changing environment present in the vibrant, fast-moving and maturing SME in which we operate.


The Difficulties

As the business has grown the number of systems has grown and so has the complexity.

This creates challenges unique to us but comparable to challenges faced by all businesses of our size and composition.

We began back in January 2012 and at the time we didn't have many of the foundations we required.

  1. Our data is mostly insurance premium and claims but from many different sources.
  2. Within that data there were technical differences but also semantic differences so the same thing is known by different labels in different systems
  3. There was no existing method of sharing data with colleagues from a central source
  4. Even if there had been an MI Portal there was no central source for reporting from which such a portal could have been fed
  5. The DBA and Analytics team worked together but none of us was dedicated full time to this project
  6. We didn’t have a plan or a PM to drive it forward
  7. And everything was needed yesterday

The Solutions

  1. We identified an existing mid-range under-utilized server and declared it our reporting environment. Where ever possible we switch reporting – which was still running on our live servers – onto this environment. It gave us the power to run resource intensive processes away from any system that serves customers.
  2. We were already a business run on Microsoft software by colleagues comfortable with that environment. It made sense to use SQL Server as our Business Intelligence back-end.
  3. We already had data distributed across multiple servers and we expect that environment to get more complex over time. We created replica’s of all the reporting data sources on our reporting environment. Now we had all the data in one place but it was still in different formats.
  4. Next we identified key business areas – led by stakeholder requirements – and we built data marts for over each process. We did this sometimes across multiple data sources so we had to create a translation layer between the inputs and the outputs.
  5. We then shared the data to users via Microsoft Excel, as it was and remains today the easiest tool we have to get data in front of consumers. The Excel workbooks we provide are dynamic cubes which means they are connected to the reporting databases and function like massive pivot tables.
  6. The early developments were really only straw men that we used to create a feedback with users until we were happy we had accurate data in the shape required to satisfy “all reasonable questions, and some unreasonable questions too.” Without an Agile approach we wouldn't have progressed as fast.
  7. We work closely with the stakeholders. Without there constant input, testing and feedback we may have followed the unfortunate path that many IT geeks take of building a gold-plated technical solution which is of no use to the business.
  8. We've never pretended that the data we present is perfect. Recently we uncovered a few glaring mistakes, not with our development, but in an underlying third-party database. Before we found that issue we were blind to it. Now we have our single view of the facts we’re discovering errors in previous reports and processes.

In Summary

It's horses for courses. As a business intelligence architect it's important to balance stakeholders and technical needs. There are many technical elements of our data warehouse which I would love to spend more time on as I love that type of work. But the opportunity to build a data warehouse solution from scratch comes with an implicit responsibility to provide ROI from day 1.

2 comments:

  1. Lovely blog i just survey web sites but your blog is so good it just great enough informative for people. I like it. Rail sided warehouse chicago

    ReplyDelete