Monday, January 5, 2009

Microsoft Analysis Services migration case study

This case study presents lessons learned from migrating a large SQL Server Analysis Services database from version 2000 to version 2005: hundreds of gigabytes in size, hundreds of users, and thousands of user MDX queries.

It describes:
• how the MDX syntax changed
• how the design of the analytical database changed
• how the ProClarity repository with user queries was updated
• how the analytical processing was designed
• which utilities were useful in the migration process

Source system description


Before describing the migration process, let us briefly describe the system before migration (the source system). Data warehouse users access data either through ProClarity Analytics Server 6.3 (PAS), which stores about 20,000 user queries, or via ProClarity Desktop Professional. PAS is a web application that enables advanced ad-hoc queries, supports some publishing features, etc. Its repository is stored on SQL Server 2000.
Microsoft Analysis Services (SSAS) 2000 is the server component for OLAP queries. A few reports are accessed through Reporting Services 2000. Only a few users access the Oracle relational layer via SQL or through Reporting Services reports.

Target system


The end-client applications will not change, but the analytical database is subject to many changes. Many of them are not visible to business users, yet they are necessary when the server platform changes to Microsoft Analysis Services 2005. The main drivers of the change are:
• enhanced monitoring – with SSAS 2000, tracing was impossible
• enhanced processing capabilities – e.g. the ProcessIndexes option for aggregation processing
• supportability – support for SSAS 2000 ended in July 2007
• scalability – easier hardware scaling and an easier upgrade path to SSAS 2008

Migration tasks


We expected major issues to come from the following areas:
• User queries
  o Marking, out of the large set of PAS queries, only those that are used and valid. Unfortunately, PAS has no support for such a task.
  o MDX syntax changes (date keys, dot notation)
  o PAS query upgrade (refreshing the changed underlying MDX queries)
• Structural (design) changes of the OLAP database
  o Replacement of obsolete features (virtual dimensions, virtual cubes, etc.)
  o Duplicates in key columns
  o Redesign of the server calculations (usage of the Scope statement)
• Processing workflow application – the processing model was completely redesigned and is accessed via Analysis Management Objects (AMO).


Model of PAS entities


Of the wide range of PAS functionality, our effort focused on the so-called Briefing Book Server. From a user's point of view, an MDX query is a page, and pages are organized into books and libraries. The superclass for a page is a Briefing Book Element. Correspondingly, the relational repository contains the tables BookElements, Books, Libraries and MDStores.




Pages and books have a more detailed structure, as can be seen in the PAS documentation. These structures can be accessed either via the PAS object model or via the XML stored in the tables mentioned above.



The BriefingBook object refers to the Book. The MDStore object should contain all metadata from all pages within a book. Our experience shows that the MDStore can sometimes be ignored – we did so in the BookCreator application described later.
Digging into a BriefingPage (or shortly, a page), there is a very important section called Commands, which always holds exactly one Command. A command is defined by an MDX query and has some additional attributes that are used in the web pages, e.g. for showing which dimensions are combined in the background. These attributes are visible under the DSH (Data Source Hub) tag, whose cardinality is always 1:1. So one page has exactly one Command object (with the MDX as a property); the command has exactly one object referencing the Axes class and exactly one object referencing the Columns class, but many dimensions can lie underneath it.



Marking of user queries


Neither our customer nor we believed that all 20,000 elements on PAS were important. Together we defined a rule identifying an active user, which we applied to the SSAS query log; queries of users not matching the rule were ignored.

The next step was validation of the remaining MDX queries, for which we developed a small C# utility (called ASCMD2000). We could not use a linked server to SSAS because many of the queries were longer than 4000 bytes.

The ASCMD2000 utility (its core step is sketched below):
• Runs an MDX command
• Measures the duration of the command – to sort out long-running queries and to compare queries between the source and target systems
• Retrieves a checksum of the result set, or the returned error – MDX queries ending in an error were ignored in later steps
• Runs the query against both SSAS 2000 and 2005
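
The validation core of such a utility is small. Here is a minimal C# sketch of it, assuming ADOMD.NET (Microsoft.AnalysisServices.AdomdClient) as the client library; the actual ASCMD2000 internals, its logging and its SQL Server Agent wrapper are not shown in this post:

using System;
using System.Diagnostics;
using System.Security.Cryptography;
using System.Text;
using Microsoft.AnalysisServices.AdomdClient;

static class MdxValidator
{
    // Runs one MDX command and returns its duration together with a
    // checksum of the flattened result set, or the error message when
    // the query fails. Failed queries were ignored in later steps.
    public static void RunMdx(string connectionString, string mdx,
                              out long durationMs, out string checksumOrError)
    {
        Stopwatch watch = Stopwatch.StartNew();
        try
        {
            using (AdomdConnection connection = new AdomdConnection(connectionString))
            {
                connection.Open();
                AdomdCommand command = new AdomdCommand(mdx, connection);
                CellSet cellSet = command.ExecuteCellSet();

                // Flatten the cell values so that identical results always
                // produce identical checksums on both servers.
                StringBuilder flat = new StringBuilder();
                foreach (Cell cell in cellSet.Cells)
                    flat.Append(cell.Value).Append('|');

                using (MD5 md5 = MD5.Create())
                    checksumOrError = Convert.ToBase64String(
                        md5.ComputeHash(Encoding.UTF8.GetBytes(flat.ToString())));
            }
        }
        catch (Exception ex)
        {
            checksumOrError = "ERROR: " + ex.Message;
        }
        watch.Stop();
        durationMs = watch.ElapsedMilliseconds;
    }
}

Hashing the flattened cell values makes result sets comparable between the 2000 and 2005 servers without having to store the results themselves.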

This utility was scheduled multiple times via SQL Server Agent, and all results were stored in a SQL Server table. It turned out to be a long-running task (without parallelism we were able to test about 50 MDX commands per hour), and some commands that succeeded in ASCMD2000 still failed in ProClarity. The main error issues we faced were:
• Connection problems – sometimes the connection simply failed; when repeated, the command executed without problems. Bear in mind that we had to query the production system – we had no SSAS 2000 lab that could hold the 300 GB production OLAP database.
• Dimension member not found – the older the query, the higher the probability that it referenced a non-existent dimension member (or one existing under a different parent, when a composite key is used)

In the end we filtered out 40 percent of the queries as obsolete or invalid. An extra output of this marking step was the OLAP query log, which we used for Usage-Based Optimization.

MDX syntax changes


When migrating MDX from SSAS 2000 to 2005, we found these syntax changes:
• the format of date key columns,
• the notation for composite key columns. Duplicate values within level key columns (the attribute that makes up the level) are handled in a different way – SSAS 2005 takes the first occurrence, and all members with that level key become its descendants, whereas SSAS 2000 takes the whole path across all levels as the composite key.

For example, date/time formatting in MDX for SSAS 2000 was locale-specific, whereas in 2005 it is international:

--2000: 
[Time].[Monthly].[Day].&[16.7.2006]
--2005: 
[Time].[Monthly].[Day].&[2006-07-16T00:00:00]

With key columns, in 2005 the correct level must be referenced (in our example, LevelX), and the composite values in the path (LevelX has a composite key column) are not separated with dots:

--2000:
[Dimension].[TopLevel].&[Level1Field].&[Level2Field].&[LevelXField]
--2005:
[Dimension].[Hierarchy].[LevelX].&[Level1Field]&[Level2Field]&[LevelXField]

For the syntax changes we had to develop a quite sophisticated utility based on regular expressions and configured via XML. As you may know, regular expressions in T-SQL must be implemented via a .NET assembly, because T-SQL has no native support for them. We also found that .NET XML deserialization is not fully supported in the SQL Server 2005 hosting environment.
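
To illustrate what one of the configured rules had to express, here is a C# sketch converting the locale-specific 2000 date key into the 2005 ISO form. The d.M.yyyy pattern matches our locale; in the real utility such pattern/replacement pairs were read from the XML configuration:

using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class MdxDateKeyRewriter
{
    // Matches 2000-style date keys such as &[16.7.2006].
    static readonly Regex DateKey = new Regex(@"&\[(\d{1,2})\.(\d{1,2})\.(\d{4})\]");

    // Rewrites them into the 2005 form &[2006-07-16T00:00:00].
    public static string Rewrite(string mdx)
    {
        return DateKey.Replace(mdx, m =>
        {
            DateTime date = new DateTime(
                int.Parse(m.Groups[3].Value),   // year
                int.Parse(m.Groups[2].Value),   // month
                int.Parse(m.Groups[1].Value));  // day
            // "s" is the sortable ISO 8601 format used by SSAS 2005 keys.
            return "&[" + date.ToString("s", CultureInfo.InvariantCulture) + "]";
        });
    }
}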

Testing approach


During the migration we developed a set of tests that ran frequently. Each test case always consisted of a pair of scripts: one for the source (production) environment and one for the target (migrated) environment. The frequency of testing depended on data consistency – we ran the tests whenever we had comparable data in both environments.

The first set of test cases covered dimension compatibility. The results of an AllMembers script for the production database should be the same as for the database on the development server (with exceptions, of course, where the dimension design was changed). The dimension tests revealed some interesting issues with key columns.

For new dimensions with multiple levels we had to take special care when defining the composite keys on all levels.
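
Such a dimension test can be as simple as the following MDX sketch, where [Dimension].[Hierarchy] and [Cube] are placeholders for the real names. The same script is run against both environments and the returned member lists are compared:

--Dimension compatibility test (same script on 2000 and 2005)
SELECT { [Measures].DefaultMember } ON COLUMNS,
       [Dimension].[Hierarchy].AllMembers ON ROWS
FROM [Cube]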

The second set of test cases was developed for measure results. Here our effort went hand in hand with the processing workflow, because it was difficult to prepare compatible data in the development database – many changes in the production database still affected historical data.

The test scripts were run through the ASCMD2000 utility described above.

Structural changes in analytical database


Most structural changes were technically driven. For example, virtual dimensions can now be replaced with dimension attributes. However, when a dimension has more than one hierarchy or an additional attribute (which behaves like an additional hierarchy), every member or level reference must name a hierarchy.

For this purpose we developed a replacement facility configured with pairs of old and new text. A priority can be specified when one replacement must happen before another.
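
A minimal C# sketch of such a facility follows; the class and property names are illustrative, and in the real utility the pairs and priorities were loaded from the XML configuration:

using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// One find/replace pair from the XML configuration; lower Priority
// values are applied first.
class Replacement
{
    public int Priority;
    public string Pattern;      // regular expression to search for
    public string ReplaceWith;  // replacement text
}

static class ReplacementFacility
{
    // Applies all configured replacements to one piece of text in priority order.
    public static string Apply(string text, IEnumerable<Replacement> rules)
    {
        foreach (Replacement rule in rules.OrderBy(r => r.Priority))
            text = Regex.Replace(text, rule.Pattern, rule.ReplaceWith);
        return text;
    }
}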

The biggest effort in the SSAS redesign went into the calculations. The source system was special in its number of calculated members. Most of the calculations belong to the measures dimension, but there is also a special dimension used to modify the meaning of measures. For explanation, there were several groups of modificators:
• time – for example, the month-to-date modificator shows, for [Measures].CurrentMember, only the MTD time slice
• budget – slices for in-budget/out-of-budget
• promo – sales coming from promotions, etc.
• balance – average and closing balance modificators

For these modificators we decided to use the Scope statement whenever suitable. Our experience shows that Scope is sometimes hard to predict and simply does not work. For example, we wanted the [Measures Modificator] dimension to have all its physical members coming from a table; during the implementation they all turned out to be calculated members instead.
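
For illustration, a time modificator such as MTD can be written with Scope roughly as follows. This is a sketch only – the member and hierarchy names are placeholders for our real ones:

--Month-to-date modificator implemented with Scope (names are illustrative)
SCOPE ([Measures Modificator].[Modificator].&[MTD]);
    THIS = Aggregate(
        PeriodsToDate([Time].[Monthly].[Month], [Time].[Monthly].CurrentMember),
        ([Measures Modificator].[Modificator].&[None], [Measures].CurrentMember));
END SCOPE;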

Another note concerns the order in which Scope statements are implemented. There seem to be a couple of recurring situations in which a previously working Scope stops working; such problems are often solved by reordering the calculations.

Finally, we found that our AS2005 database is twice the size of the AS2000 production one.

PAS queries update


Our task was to change the user queries (about 12,000 BriefingPages, i.e. queries) with minimal impact on the user side.

At the beginning we wanted to apply the replacement facility not only to the MDX queries (a property of the Command object in the BriefingPage object) but to the whole BriefingPage and MDStore objects (via their XML interface). The main problem was our limited understanding of, and the limited documentation for, the whole XML interface.

A solution based fully on the object model approach was, in its time consumption, very similar to running all the queries during the marking phase mentioned above.

The combination of direct database modifications with PAS object model modifications proved the most suitable. We developed another utility, called BookCreator, based on the ProClarity desktop components (ActiveX), which we used to connect to the PAS server, create books, etc. On the whole it enabled us to:
• Create flat BriefingBooks, i.e. books whose pages are not organized into folders
• Add BriefingPages and MDStores into a flat BriefingBook from a set of MDX queries belonging to one book

For long-running MDX we used a fake query that was later replaced with the 2005 MDX. For this replacement we had to develop another utility that could replace the BookElements.Xmldata attribute, which is of the NTEXT data type on MS SQL Server. Our experiment with the UPDATETEXT and WRITETEXT commands failed because of time and disk space consumption – changing 1,000 records took 3 hours and 50 GB. Our final solution was another utility (BlobUpdator) based on ADO 2.8 and the ADODB.Stream object. The modification took minutes and the repository grew by only hundreds of MB.
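
The post cannot reproduce BlobUpdator exactly, but the idea can be sketched in C# over classic ADO: a server-side cursor writes the new XML into the NTEXT column in chunks instead of rewriting the whole value the way UPDATETEXT does. The AppendChunk pattern shown here is a stand-in for the original ADODB.Stream wiring, and the key column name is illustrative:

using System;
using ADODB; // COM reference: Microsoft ActiveX Data Objects 2.8 Library

static class BlobUpdatorSketch
{
    // Replaces the NTEXT Xmldata of one BookElements row.
    public static void UpdateXmldata(string connectionString, int bookElementId,
                                     string newXml)
    {
        Connection connection = new Connection();
        connection.Open(connectionString, "", "", 0);

        Recordset recordset = new Recordset();
        recordset.Open("SELECT Xmldata FROM BookElements WHERE BookElementID = "
                       + bookElementId,
                       connection, CursorTypeEnum.adOpenKeyset,
                       LockTypeEnum.adLockOptimistic,
                       (int)CommandTypeEnum.adCmdText);

        // Write the large value in chunks so ADO streams it to the server
        // instead of materializing one huge statement.
        Field field = recordset.Fields["Xmldata"];
        const int chunkSize = 8192;
        for (int i = 0; i < newXml.Length; i += chunkSize)
            field.AppendChunk(newXml.Substring(i, Math.Min(chunkSize, newXml.Length - i)));

        recordset.Update();
        recordset.Close();
        connection.Close();
    }
}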

The flat structure of the newly created books was then changed via a T-SQL stored procedure that created BookElements of folder type and performed other updates (author, library, etc.), all based on the production version of the books.

Processing workflow application


Specific requirements led to the development of another utility (OlapProcessor) that should:
• be driven via Oracle tables – these interface tables contain data about what type of change occurred on which dimension table, or for which dates fact table records were added (all partitions are time-based)
• process dimensions with either ProcessUpdate or ProcessAdd, based on the type of change in the interface table
• create and process partitions (weekly, monthly, quarterly, yearly) based on the date of change in the interface table; the changed data has to be processed immediately after its change in Oracle (a message about the change is obtained via a special Oracle package)
• reprocess aggregations during idle time
• be error-aware – some processing states have to be detected and handled


Processing is heavily based on AMO, and it uses the CaptureLog mechanism. We settled for small compromises in setting the MaxParallel tag and in detecting which aggregations were invalidated during a dimension ProcessUpdate.
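
A minimal AMO sketch of the dimension-processing part follows. The server, database and dimension names are placeholders, and the real OlapProcessor adds partition creation, idle-time aggregation reprocessing and richer error handling on top of this:

using System;
using System.Collections.Generic;
using Microsoft.AnalysisServices; // AMO

static class OlapProcessorSketch
{
    // Processes changed dimensions in one transactional, parallel batch.
    // With CaptureXml set, AMO records the XMLA of each Process call
    // instead of executing it immediately; ExecuteCaptureLog then runs
    // the whole recorded batch at once.
    public static void ProcessChangedDimensions(string serverName,
        string databaseName, IDictionary<string, ProcessType> changes)
    {
        Server server = new Server();
        server.Connect(serverName);
        try
        {
            Database database = server.Databases.FindByName(databaseName);

            server.CaptureXml = true;
            foreach (KeyValuePair<string, ProcessType> change in changes)
                database.Dimensions.FindByName(change.Key).Process(change.Value);
            server.CaptureXml = false;

            // transactional = true, parallel = true
            XmlaResultCollection results = server.ExecuteCaptureLog(true, true);
            foreach (XmlaResult result in results)
                foreach (XmlaMessage message in result.Messages)
                    if (message is XmlaError)
                        throw new ApplicationException(message.Description);
        }
        finally
        {
            server.Disconnect();
        }
    }
}

Here the changes dictionary maps a dimension name to ProcessType.ProcessUpdate or ProcessType.ProcessAdd, mirroring the type-of-change information read from the Oracle interface tables.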


Robert Srna, 2008
Adastra Corporation
www.adastragrp.com

A revised version of this post can be found at sqlservercentral.com.
