Monday, July 15, 2013

XML Editors - Oxygen vs. XML Spy

About our team

Working with XML documents is part of the daily routine when you are a member of an integration team. We design "SOA" services (in quotes because of the maturity of our SOA governance) on the IBM WebSphere platform. We decided to improve our hammer - the XML editor - and ran an evaluation. We evaluated the following product versions:
• Oxygen XML Editor 14.2 Enterprise edition, build 2013051016
• Altova XMLSpy Enterprise edition, version 2013 rel. 2

Results

Oxygen best fits our needs in most criteria. Among our top-priority features Oxygen is better in 4 cases (Altova only in 2). Oxygen also wins on list price (Oxygen EE €540 ($698), Altova EE €799) and on internal support (for each tool there is a responsible person in our company; Altova's responsible person is based abroad).
The following picture shows which features we consider most important.
Below you can see in which features Oxygen is better.

Evaluation approach

We compared about 25 criteria of both products and put marks into an Excel sheet (columns C and D; mark 1 means best support, mark 5 means no support at all). The importance of each criterion was entered into the green columns by different colleagues (value 10 for a critical feature, value 0 for an unimportant one).
Preview of the evaluation Excel sheet

XSD editor features

Oxygen is a very strong editor with options oriented to source code (compared to Altova's text-like experience). It has excellent support for refactoring and for navigation among imported/included XML files (hold CTRL and point with the mouse, like in PL/SQL Developer). Oxygen is able to show annotations from imported/included XSDs.
Altova is not able to show inherited annotations. Its editing options enable basic code completion and are text oriented. Altova generates sample data more easily (directly on the selected XSD; in Oxygen the path to the XSD must be specified).
Both tools have very good support for visual editing of XSD.

Webservices support

Altova has better visual WSDL tools, while Oxygen offers outlining.

SOAP support is similar.

XSLT/XPATH support

In Oxygen the XSLT debugger is available in a dedicated XSLT perspective.

Oxygen presents the results of XPath expressions graphically very well. Altova can apply expressions to multiple XML files.

HTML and DOCBOOK editors

The toolset in Oxygen is richer than in Altova.

Sunday, January 2, 2011

Lessons learned with Bonita Open Solution

Why BPM
IT operations and IT development in our company are outsourced. This has several consequences, among others a lack of knowledge of how our business processes are automated by software solutions.
Business processes are often documented in various styles. We decided to take advantage of BPM tools, where process documentation is tightly coupled with process implementation.
We started with a study that compared many open source BPM tools (ODE, WSO2, jBPM, INTALIO, OW2, SARASVATI, ENHYDRA, TAVERNA, ACTIVE BPEL, RUNA, BOS). The Bonita Open Solution BPM tool was chosen as the winner for our POC (proof of concept).
The POC automates the application audit (APA) process. APA is a poll where our business and IT users fill in various questionnaires. The answers are then compiled into reports of threats and opportunities and reports of portfolio clusters; the answers are also an important source for our application portfolio documentation.

BPM in Bonita
In Bonita we implemented a process as can be seen in the following picture. The details of the current APA process are not important - we just want to highlight some of its parts.





On the picture you can see that:
• a process has its starting (green) and ending point (red)
• a process flow is denoted with arrows – a flow can be conditional (it is good practice to note the condition)
• an activity can have more entry points – in such a case a gate (diamond sign) is used – e.g. the NotifyDistribute activity has a preceding gate
• an activity can have a human character – there is a form GUI which allows a human to interact with the process; in such a case there is a figure sign in the top left corner
• an activity can have other functionalities, which are called connectors – see the socket plug sign in the top right corner

Bonita allows you to design a process – it has its own design studio. Bonita also allows you to manage running process instances (called cases) in a component named Bonita User Experience (a management console), and Bonita also has an API.

Positives
The following chapters are listed by importance from our personal point of view.
1. Process documentation
As you can see, this is the major function of BPM tools. It is pretty easy to understand a process, even from a user perspective.
2. Human tasks
Bonita has very strong support for user forms – a case has simple web forms. The design of a form in the studio and the resulting web form are shown in the following pictures. The "Odeslat" ("Submit") button is a submit button – the human activity is finished when it is pressed. The "Zpet" ("Back") button returns to the previous form – in this activity there are two forms designed for user interaction.



A GUI component has layout settings, data settings (in the picture you can see the variables used for component initialization or for saving) and validation settings (you can write a Java class used for validation purposes).




3. Monitor console
It is pretty easy to see running cases in Bonita User Experience. It is a GUI where an administrator can see all processes, cases and users and has basic management functionality. The GUI can also be accessed by other users – they can see the history of the cases they participated in, or find the tasks to be done in their inbox.





4. Web services connector
Connectors enable the designer to trigger procedures or component calls during activity events. There can be multiple connectors on an activity (e.g. a database call during the initialization event, a Groovy script execution after the submit event etc.).
We appreciate the web service client connector - it makes it pretty easy to invoke a web service. You can see that the activity "Aktualizace technickych charakteristik" ("Update of technical characteristics") has 4 connectors: reading facts from a database and sending a web service request, both during the activity initialization event "On Ready"; after the submit event a Groovy script serializes a variable and then a database save connector is launched.
The web service detail shows how the request and connection are configured – we use variables for reuse. Later I will discuss problems with reuse – this web service is responsible for user notification. The notification also contains the URL of the task. Each activity has a dynamic URL which can be resolved only during the activity's lifetime (see the getStepURL method of the Bonita API).




5. Actor assignment
With Bonita you can also set actors at run time. You simply set a variable that is global for the whole case (process instance).

Negatives
1. Testability, configuration and bug fixing
Testing a process is not easy. There are times when you need your process to run without, for example, notifying a user. In the APA process we use a connector for that, and it would be nice to be able to enable or disable a specific connector via configuration.
Bug fixing is possible, but you have to save a new version of your process. The fixed version then runs next to the version with the bug. A simple replacement is not possible, because as long as at least one case of a process is running you are not allowed to delete the process.

2. Error messages
The error messages are poor and you often cannot tell where the problem lies. Sometimes you get more information from the log, but not always. You can, for example, run into the following errors - see our workarounds.
2.1. Error "Cannot seal oracle class" when starting a new case. This means a deployment problem - you have duplicate JDBC libraries – you have to remove the library reference from the process definition.
2.2. Error "Could not execute JDBC batch update". This means a problem in the server backend database. Bonita needs a database for storing processes, users and cases. We ran into a problem when Bonita restarted the numbering of cases from the beginning while we were already at least at number 87 (I'm still not sure how it happened). Fortunately there is a database constraint that prevents some duplicity in case numbering.

Unfortunately, finished cases are not present in the BN_PROC_INST table, so the second error causes already-finished cases to be overwritten - together with their variables. In the following picture notice that after the activity "Aktualizace technickych charakteristik" there should be the NotifyDistribute activity and the End1 activity (the default name for the final activity).
The database constraint did not find the case number in the database and allowed a case with the same ID (name plus number) to start again. You can see the activity "Nastaveni hlasovani" ("Voting setup"), which is the first human activity in the APA process, shown as the current activity task. Notice also that the variable values from the first case run are missing. Luckily we store the results of APA in a custom database table!
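When we were debugging this, it helped to see which case numbers the engine still knows about. The following is only a sketch: BN_PROC_INST is the real table name, but the column names (PROCESS_NAME_, INSTANCE_NB_) are my guesses and must be checked against your Bonita schema before you run anything like this.

-- Sketch: highest case number the engine currently knows about, per process.
-- BN_PROC_INST is a real Bonita 5.x table; the column names below are assumptions.
SELECT PROCESS_NAME_, MAX(INSTANCE_NB_) AS max_case_number
FROM BN_PROC_INST
GROUP BY PROCESS_NAME_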

And there are many other smaller bugs – every product has bugs.




3. Reuse of connectors
Reuse helps you design a solution once and use it multiple times. Our process has a Groovy routine which normalizes text (capital letters, national characters etc.) and which we use in several activities. Groovy reuse is cool. On the other hand, we miss the possibility to reuse the mentioned notification web service client connector. Currently there are 4 almost identical definitions of a web service call, and our APA process is not even very complicated.
Another option we are missing would be the ability to enable or disable a connector at run time.

4. Deployment
Deployment of Bonita on a server is not trivial – and neither is the deployment of a process. From the studio you can start a case locally; in the background you have a Jetty server on, let's say, Windows XP - everything seems fine.
Your server environment, however, may not have Jetty and may not run Windows. We have Tomcat on CentOS Linux, and this is a source of many problems - for example with national characters or operating system calls (we wanted to configure the process via OS environment variables).
In general, for process deployment you need two Java binaries for each process – both of them produced by the BOS development studio:
4.1. the WAR binary which is hosted on the application server
4.2. the BAR binary which is loaded into Bonita management console (User Experience)
Deployment of fixes often means deleting the BAR and undeploying the WAR, and vice versa - especially when you change the graphical design, e.g. the web forms for human activities.
It is very important to choose the embedded version of the WAR deployment. In the development studio you will need to eliminate library duplicities via the lib references of the process.
See the dialog for building a WAR file.




Summary
The product is relatively stable and under active development. We started to play with version 5.1.1 in May 2010, and now in December 2010 there is version 5.3.1. We are running APA on version 5.2.2.
Bonita is still not mature enough for a business-critical process where you cannot lose any transaction, e.g. customer orders.

Robert Srna, 2011
Links to Bonita

Monday, January 5, 2009

Microsoft Analysis Services migration case study

This study shows some lessons learned when migrating a large SQL Server Analysis Services database from version 2000 to version 2005 - a database with hundreds of gigabytes of data, hundreds of users and thousands of user MDX queries.

It describes:
• how the MDX syntax changed
• how the design of the analytical database changed
• how the ProClarity repository with user queries was updated
• how the analytical processing was designed
• which utilities were useful in the migration process

Source system description


Before describing the migration process, let's describe the system before the migration (the source system) a little bit. Data warehouse users access data through ProClarity Analytics Server 6.3 (PAS), which stores about 20,000 user queries, or via ProClarity Desktop Professional. PAS is a web application that enables advanced ad-hoc queries, supports some publishing features etc. Its repository is stored on SQL Server 2000.
Microsoft Analysis Services (SSAS) 2000 is the server component for OLAP queries. A few reports are accessed through Reporting Services 2000. Only a few users access the Oracle relational layer via SQL or via reports on Reporting Services.

Target system


The end client applications will not be changed, but the analytical database is subject to many changes. Many of them are not visible from the business users' side but are necessary when the server platform changes to Microsoft Analysis Services 2005. The main drivers of the change are:
• enhanced monitoring – with SSAS 2000 tracing was impossible
• enhanced processing capabilities – the ProcessIndex method for aggregation processing
• supportability – support of SSAS 2000 ended in July 2007
• scalability – easier hardware scalability and an easier upgrade path to SSAS 2008

Migration tasks


We expected that the major issues would come from the following areas:
• user queries
  o marking, out of the bunch of PAS queries, only the used and valid ones (unfortunately PAS has no support for such a task)
  o MDX syntax changes (date key, dot notation)
  o upgrade of the PAS queries (refresh of the changed underlying MDX queries)
• structural (design) changes of the OLAP database
  o replacement of obsolete features (virtual dimensions, virtual cubes etc.)
  o duplicities in key columns
  o redesign of the server calculations (usage of the Scope statement)
• the processing workflow application – the processing model was completely redesigned and is accessed via Analysis Management Objects (AMO).


Model of PAS entities


Of the wide functionality of PAS, our effort focused on the so-called Briefing Book Server. From the user's point of view an MDX query is a page, and pages are organized into books and libraries. The superclass of a page is a Briefing Book Element. Correspondingly, the relational repository contains the tables BookElements, Books, Libraries and MDStores.

 



Pages and books have a more detailed structure, as can be seen in the PAS documentation. These structures can be accessed either via the PAS object model or via the XML which is stored in the tables mentioned above.
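For a quick look you can also peek at the stored XML directly with T-SQL. A minimal sketch, assuming the repository sits on SQL Server 2000: the BookElements table and its Xmldata NTEXT column are real (see the BlobUpdator chapter below), but the Name column and the filter are only illustrative.

-- Preview the XML stored for a few briefing book elements (sketch only).
SELECT TOP 10
Name,
CAST(Xmldata AS NVARCHAR(4000)) AS XmlPreview -- NTEXT must be cast; long XML is truncated here
FROM dbo.BookElements
WHERE Name LIKE '%Sales%' -- hypothetical filter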


 

The BriefingBook object refers to the Book. The MDStore object should contain all the metadata from all pages within a book. Our experience shows that MDStore can sometimes be ignored – we did so in the BookCreator application described later.
When digging into a BriefingPage (or shortly a page), there is a very important section called Commands which always holds exactly one Command. A command is defined via an MDX query and has some additional attributes that are used in the web pages, e.g. for showing which dimensions are combined in the background. These attributes are visible under the DSH (Data Source Hub) tag and their cardinality is always 1:1. So one page has exactly one command object (with the MDX as a property), which has exactly one object referencing the Axes class and exactly one object referencing the Columns class, but it can have many dimensions lying under it.


 

Marking of user queries


Neither our customer nor we believed that all 20,000 elements on PAS were important. Together we defined a rule for what counts as an active user and applied it to the SSAS query log; queries of users who did not pass the rule were ignored.
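Just to illustrate the idea, such a rule can be a simple aggregation over the query log. The sketch below assumes the SSAS 2000 query log has been loaded into a SQL Server table called dbo.QueryLog with the usual MSOLAP_User and StartTime columns; the thresholds are made up for illustration and are not the rule we actually agreed on with the customer.

-- Sketch of an "active user" rule over the SSAS query log (thresholds are illustrative).
SELECT MSOLAP_User
FROM dbo.QueryLog
WHERE StartTime >= DATEADD(MONTH, -6, GETDATE())
GROUP BY MSOLAP_User
HAVING COUNT(*) >= 20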

The next step was the validation of the remaining MDX queries. We developed a small C# utility (called ASCMD2000). We could not use a linked server to SSAS because many of the queries were longer than 4,000 bytes.

The ASCMD2000 utility:
• runs an MDX command
• measures the duration of that command – to sort out long-running queries and to compare the queries against the target system
• retrieves a checksum of the resultset, or the returned error – MDX commands with errors were ignored in later steps
• can query both SSAS 2000 and 2005

This utility was scheduled multiple times via SQL Server Agent and all results were stored in a SQL Server table (a sketch of such a table follows the list below). It turned out to be a long-lasting task (when not running in parallel we were able to test about 50 MDX commands per hour), and some commands that succeeded in ASCMD2000 still failed in ProClarity. The main error issues we faced were:
• Connection problems – sometimes the connection simply failed; when repeated, the command executed without problems. Imagine – we had to query the production system, because we had no SSAS 2000 lab that could hold a copy of the 300 GB production OLAP system.
• Dimension member not found – the older the query, the higher the probability that it quotes a non-existing dimension member (or one existing with a different parent when using a composite key).
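For completeness, the results table and the triage over it can look roughly like this. The table and column names are only illustrative, not the exact schema we used:

-- Illustrative results table for ASCMD2000 runs (names are assumptions).
CREATE TABLE dbo.MdxRunResult (
MdxId INT NOT NULL,
Environment VARCHAR(10) NOT NULL, -- 'SSAS2000' or 'SSAS2005'
DurationMs INT NULL,
ResultChecksum INT NULL,
ErrorText NVARCHAR(400) NULL
)

-- Queries that failed or ran suspiciously long on the source system.
SELECT MdxId, DurationMs, ErrorText
FROM dbo.MdxRunResult
WHERE Environment = 'SSAS2000'
AND (ErrorText IS NOT NULL OR DurationMs > 600000) -- longer than 10 minutes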

Finally, we filtered out 40 percent of the queries as obsolete or invalid. An extra output of this marking step was the OLAP query log, which we used for Usage-Based Optimization.

MDX syntax changes


When migrating MDX from SSAS 2000 to 2005 we found some syntax changes:
• the format of the date key columns
• the notation of composite key columns. Duplicity within level key columns (the attribute that makes up the level) is handled in a different way – SSAS 2005 takes the first occurrence and all members with that level key become its descendants, whereas SSAS 2000 takes the whole path across all levels as the composite key.

For example, date-time formatting in MDX 2000 was locale specific, while MDX 2005 uses an international format:

--2000: 
[Time].[Monthly].[Day].&[16.7.2006]
--2005: 
[Time].[Monthly].[Day].&[2006-07-16T00:00:00]

With key columns, the correct level must be quoted in 2005 (in our example LevelX), and the composite values in the path (LevelX has a composite key column) are not separated by a dot:

--2000:
[Dimension].[TopLevel].&[Level1Field].&[Level2Field].&[LevelXField]
--2005:
[Dimension].[Hierarchy].[LevelX].&[Level1Field]&[Level2Field]&[LevelXField]

For the syntax changes we had to develop a quite sophisticated utility based on regular expressions and configured via XML. As you know, regular expressions in T-SQL must be implemented via a .NET assembly because they are not implemented in T-SQL itself. We also found that .NET deserialization is not supported in the SQL Server 2005 hosting environment.
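For illustration, this is roughly how a CLR-based regex replace can be registered and used from T-SQL. The assembly, class and function names (MdxMigration, RegexHelper, dbo.RegexReplace) are hypothetical and are not the names from our utility; the assembly itself would have to be compiled from a small C# class with a static method marked as a SqlFunction.

-- Register a hypothetical CLR assembly exposing a regex replace function.
CREATE ASSEMBLY MdxMigration FROM 'C:\migration\MdxMigration.dll'
WITH PERMISSION_SET = SAFE
GO
CREATE FUNCTION dbo.RegexReplace
(
@input NVARCHAR(MAX),
@pattern NVARCHAR(4000),
@replacement NVARCHAR(4000)
)
RETURNS NVARCHAR(MAX)
AS EXTERNAL NAME MdxMigration.[RegexHelper].RegexReplace
GO
-- Example: drop the dots between composite key parts (2000 -> 2005 notation).
SELECT dbo.RegexReplace(
N'[Dimension].[TopLevel].&[Level1Field].&[Level2Field].&[LevelXField]',
N'\]\.&\[',
N']&[')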

Testing approach


During the migration we developed a set of tests that ran frequently. There was always a pair of scripts in each test case: one on the source (production) environment and a second on the target (migrated) environment. The frequency of the tests depended on data consistency – we ran the tests when we had comparable data in both environments.

The first set of test cases was for dimension compatibility. The results of an AllMembers script for the production database should be the same as for the database on the development server. Of course there are exceptions where a dimension design was changed. The dimension tests showed us some interesting issues with key columns.

For new dimensions with multiple levels we had to take special care in defining the composite keys on all levels.

The second set of test cases was developed for measure results. Here our effort went hand in hand with the processing workflow, because it was difficult to prepare compatible data in the development database. A lot of changes in the production database still hit the historical data.

The testing scripts were run through the ASCMD2000 utility described above.
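Since both environments write into the same results table, the comparison itself stays in T-SQL. Again a sketch only, reusing the illustrative MdxRunResult table from the marking chapter:

-- Compare source and target checksums for each tested MDX command (sketch).
SELECT s.MdxId,
s.ResultChecksum AS Checksum2000,
t.ResultChecksum AS Checksum2005
FROM dbo.MdxRunResult AS s
JOIN dbo.MdxRunResult AS t ON t.MdxId = s.MdxId
WHERE s.Environment = 'SSAS2000'
AND t.Environment = 'SSAS2005'
AND ISNULL(s.ResultChecksum, -1) <> ISNULL(t.ResultChecksum, -1) -- -1 is just a sentinel for failed runs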

Structural changes in analytical database


Most structural changes were technically driven. For example, virtual dimensions can now be replaced with dimension attributes. However, when a dimension has more than one hierarchy or an additional attribute (which behaves like an additional hierarchy), each member or level reference must quote a hierarchy.

For this purpose we developed a replacement facility that works with pairs of old and new text. A priority can be specified when one replacement must happen before another.

The biggest effort in the SSAS redesign was the redesign of the calculations. The source system was special in its number of calculated members. Most of the calculations belong to the measures dimension, but there is also a special dimension used to modify the meaning of measures. To explain, there were several groups of modifiers:
• time – for example the month-to-date modifier shows for [Measures].CurrentMember only the MTD time slice
• budget – a slice for in-budget/out-of-budget
• promo – sales coming from promotions etc.
• balance – average and closing balance modifiers

For these modifiers we decided to use the Scope statement whenever suitable. Our findings show that Scope is sometimes hard to predict and simply does not work. For example, we wanted the [Measures Modificator] dimension to have all physical members coming from a table; during the implementation it turned out they all ended up as calculated members.

Our other note concerns the order of Scope statements. There seem to be a couple of patterns where a currently working Scope stops working; such problems are often solved by reordering the calculations.

Finally, we found that our AS2005 database is twice as big as the AS2000 production one.

PAS queries update


Our task was to change the user queries (about 12,000 BriefingPages, i.e. queries) with minimal impact on the user side.

At the beginning we wanted to apply the replacement facility not only to the MDX queries (a property of the Command object in the BriefingPage object) but to the whole BriefingPage and MDStore objects (via their XML interface). The main problem was our understanding of, and the documentation for, the whole XML interface.

A solution based fully on the object model approach was, in terms of time consumption, very similar to running all the queries during the marking phase mentioned above.

The combination of direct database modifications with PAS object model modifications proved to be the most suitable. We developed another utility called BookCreator. The utility was based on the ProClarity desktop components (ActiveX), which we used to connect to the PAS server, create books etc. On the whole it enabled us to:
• create flat BriefingBooks where pages were without folders
• add BriefingPages and MDStores into a flat BriefingBook from a set of MDX queries belonging to one book

For long-running MDX we used a fake query that was later replaced with the 2005 MDX. For this replacement we had to develop another utility that could replace the BookElements.Xmldata attribute, which is of the NTEXT data type on MS SQL Server. Our experiment with the UPDATETEXT and WRITETEXT commands failed because of time and disk space consumption - changing 1,000 records took 3 hours and 50 GB. Our final solution was yet another utility (BlobUpdator) based on ADO 2.8 with the ADODB.Stream object. The modification took minutes and the repository grew only by hundreds of MB.

The flat structure of the newly created books was then changed via a T-SQL stored procedure that created BookElements of folder type and did other updates (author, library etc.), all based on the production version of the books.
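Just to give an idea of what the procedure did, here is a very rough sketch. Every column name in it (BookID, ParentID, ElementType, Name) is an assumption made for illustration; the real PAS repository columns differ, so treat this as an outline rather than something to run against your repository.

-- Sketch only: create a folder element and hang the flat pages under it.
-- All column names are assumptions; the real PAS repository schema differs.
DECLARE @BookId INT, @FolderId INT
SET @BookId = 42 -- hypothetical book

INSERT INTO dbo.BookElements (BookID, ParentID, ElementType, Name)
VALUES (@BookId, NULL, 'Folder', N'Sales reports')
SET @FolderId = SCOPE_IDENTITY()

UPDATE dbo.BookElements
SET ParentID = @FolderId
WHERE BookID = @BookId
AND ElementType = 'Page'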

Processing workflow application


Specific requirements led to the development of another utility (OlapProcessor) that should:
• be driven via Oracle tables – these interface tables contained data about what type of change happened on which dimension table, or for which dates fact table records were added (all partitions are time based)
• process dimensions either with ProcessUpdate or ProcessAdd, based on the type of change in the interface table
• create and process partitions (weekly, monthly, quarterly, yearly) based on the date of change in the interface table; changed data has to be processed immediately after its change in Oracle (a message about the change was obtained via a special Oracle package)
• reprocess aggregations during idle time
• be error aware – some processing states have to be detected and handled


Processing is heavily based on AMO and uses the CaptureLog object. We had to make small compromises when setting the MaxParallel tag and when detecting which aggregations were affected by a dimension ProcessUpdate.


 
Robert Srna, 2008
Adastra Corporation
www.adastragrp.com

A revised version of this post can be found at sqlservercentral.com

Monday, June 9, 2008

Olap Time Dimension Pains (will continue)

During our last project we ran into two big problems with the time dimension in Analysis Services 2005. Today I will describe the problem with a key attribute of the datetime database type.

What is interesting about the following members of the time dimension?



Correct! December 2000 must not be a member of the 2001 calendar year.
How is that possible? Well, it must be a bug in Analysis Services, but it is quite easy to eliminate. Still not satisfied? Here are some simple steps to reproduce the problem in Adventure Works:
  1. Run the two SQL inserts into the relational Adventure Works database (Script 1 at the bottom).
  2. Run Script 2 at the bottom to create a new sister time dimension. Note: the sister dimension [Date Test] must be added into the cube manually via BI studio. (An alternative step is to change the key column of the Date attribute of the Date dimension - but you will run into a lot of problems with measure groups.)
  3. Fully process this new sister time dimension.
  4. Process the Adventure Works cube (I had to remove all the data mining objects to make full processing work).
  5. Run Script 3 to experience the strange behavior - also visible in the next picture.
  6. Change the MemberNamesUnique property and retry the previous step. And there you go (like in My Big Fat Greek Wedding): toggling MemberNamesUnique between True and False is the "feature" that matters when using a natural datetime key for the key attribute.




Script 1:

--1.1.2001
INSERT INTO dbo.[DimTime] (
[FullDateAlternateKey],
[DayNumberOfWeek],
[EnglishDayNameOfWeek],
[SpanishDayNameOfWeek],
[FrenchDayNameOfWeek],
[DayNumberOfMonth],
[DayNumberOfYear],
[WeekNumberOfYear],
[EnglishMonthName],
[SpanishMonthName],
[FrenchMonthName],
[MonthNumberOfYear],
[CalendarQuarter],
[CalendarYear],
[CalendarSemester],
[FiscalQuarter],
[FiscalYear],
[FiscalSemester]
) VALUES (
/* FullDateAlternateKey - datetime */ '20010101',
/* DayNumberOfWeek - tinyint */ 6,
/* EnglishDayNameOfWeek - nvarchar(10) */ N'Friday',
/* SpanishDayNameOfWeek - nvarchar(10) */ N'Viernes',
/* FrenchDayNameOfWeek - nvarchar(10) */ N'Vendredi',
/* DayNumberOfMonth - tinyint */ 1,
/* DayNumberOfYear - smallint */ 1,
/* WeekNumberOfYear - tinyint */ 1,
/* EnglishMonthName - nvarchar(10) */ N'January',
/* SpanishMonthName - nvarchar(10) */ N'Enero',
/* FrenchMonthName - nvarchar(10) */ N'Janvier',
/* MonthNumberOfYear - tinyint */ 1,
/* CalendarQuarter - tinyint */ 1,
/* CalendarYear - char(4) */ '2001',
/* CalendarSemester - tinyint */ 1,
/* FiscalQuarter - tinyint */ 3,
/* FiscalYear - char(4) */ '2000',
/* FiscalSemester - tinyint */ 2 )
--31.12.2000
INSERT INTO dbo.[DimTime] (
[FullDateAlternateKey],
[DayNumberOfWeek],
[EnglishDayNameOfWeek],
[SpanishDayNameOfWeek],
[FrenchDayNameOfWeek],
[DayNumberOfMonth],
[DayNumberOfYear],
[WeekNumberOfYear],
[EnglishMonthName],
[SpanishMonthName],
[FrenchMonthName],
[MonthNumberOfYear],
[CalendarQuarter],
[CalendarYear],
[CalendarSemester],
[FiscalQuarter],
[FiscalYear],
[FiscalSemester]
) VALUES (
/* FullDateAlternateKey - datetime */ '20001231',
/* DayNumberOfWeek - tinyint */ 6,
/* EnglishDayNameOfWeek - nvarchar(10) */ N'Thursday',
/* SpanishDayNameOfWeek - nvarchar(10) */ N'Jueves',
/* FrenchDayNameOfWeek - nvarchar(10) */ N'Jeudi',
/* DayNumberOfMonth - tinyint */ 31,
/* DayNumberOfYear - smallint */ 366,
/* WeekNumberOfYear - tinyint */ 52,
/* EnglishMonthName - nvarchar(10) */ N'December',
/* SpanishMonthName - nvarchar(10) */ N'Diciembre',
/* FrenchMonthName - nvarchar(10) */ N'Décembre',
/* MonthNumberOfYear - tinyint */ 12,
/* CalendarQuarter - tinyint */ 4,
/* CalendarYear - char(4) */ '2000',
/* CalendarSemester - tinyint */ 2,
/* FiscalQuarter - tinyint */ 2,
/* FiscalYear - char(4) */ '2000',
/* FiscalSemester - tinyint */ 1 )

Script 2 (it is too long, so here it is as an attachment):

Script 3:

with member Measures.KeyName
as
[Date Test].Calendar.CurrentMember.UNIQUE_NAME
member Measures.KeyCaption
as
[Date Test].Calendar.CurrentMember.MEMBER_CAPTION
select {Measures.KeyName, Measures.KeyCaption} on columns,
{[Date Test].Calendar.[Date].&[2000-12-31T00:00:00]:[Date Test].Calendar.[Date].&[2001-01-01T00:00:00]} on rows
from [Adventure Works];

Thursday, June 5, 2008

Checksum pains

Are you using the T-SQL CHECKSUM function in MSSQL 2005? If so, be careful with decimal (numeric) types. Try this example:

SELECT CHECKSUM(1000.0),
       CHECKSUM(100.0)



Shouldn't the values be different? Of course they should. Rewrite the SQL above with the following casts:

SELECT CHECKSUM(CAST(1000.0 AS VARCHAR(50))),
       CHECKSUM(CAST(100.0 AS VARCHAR(50)))


or you can try other examples. The positive side is that the casting costs only a little performance:

DECLARE @i DECIMAL(20,5), @c INT
SET @i=0;

WHILE (@i<500000)
BEGIN
SET @i=@i+1; --loop with the increment alone: 7 s

SET @c=CHECKSUM(CAST(@i AS VARCHAR(50))); --10 s
--SET @c=CHECKSUM(CAST(@i AS FLOAT)); --10 s
--SET @c=CHECKSUM(@i + CAST(0 AS FLOAT)); --10 s
--SET @c=CHECKSUM(@i); --10 s

END;
END;