XML Report Generators, Part 2

Based on comments to my earlier post, I put further work into Crystal Reports XI and SQL Server Reporting Services. Both seem capable-enough in terms of handling the data volumes and seem to have similar ease-of-use as far as the ultimate end-user goes (the inexperienced programmers who will be charged with writing reports). My take is that Crystal has more flexibility in terms of output formats, which might influence my choice.

However, there’s another wrinkle. The data is stored as an XML blob in the database. These are fairly hefty documents, 1-4MB apiece, with quite complex schema. But I don’t want to report on a single one of these blobs, I want to report on all of them. So my data model is not-quite-XML and not-quite-relational.

For instance, to use the cliched example, every record in the database contains data such as:

<?xml version='1.0' encoding='utf-8'?>
<author>Arnold Aarnison</author>
<author>Bob Billson</author>

And the reports we’re looking for might be “Total number of sales of books by author” which would seem to necessitate either:

  1. Joining the data in multiple database rows (i.e., make a new XML document that wraps the per-database-row <book> element in a root element called, say, <books>); or
  2. Mapping the XML schema into a relational view (i.e., create tables for Books, Authors, and Sales); or
  3. Reporting that can somehow combine SQL and XPath in a single query (e.g., “Select count(myXmlColumn.Xpath(/book/sales/text()) from myTable group by myXmlColumn.Xpath(/book/authors/author/text()”)

Option (1) is easy enough, but the resulting document is going to be massive — gigabytes. I’ve never worked with XML documents of that size and worry (prematurely?) about the capacity / performance of feeding a report generator such a volume of XML.

Option (2), which is essentially the reverse of (1), would solve the reporting problem (since once it’s in a relational structure, any reporting tool will work) and is do-able using Altova’s XMLSpy “Create DB Structure from XML Schema” but I’m concerned about the amount of manual labor required (I have to define dozens of foreign keys relationships by hand — but is that a one-time cost that’s no big deal?).

Option (3) would be the ideal, but I don’t think there are any reporting tools that work that way.

Any real-world experiences with similar challenges would be appreciated…