Datawatch Report Analytics Blog

Our blog covers all aspects of report analytics, report mining, data mining and how and why to liberate the data trapped in static reports and business documents.

Report Analytics Categories

Previous Datawatch Blog Posts

Datawatch Blog Authors

Jim Ward
Mike Sloger
Phil Storey
Scott Noble
Michael Morrison
Don Mulroy
Phil Lafond
Carol Giadone
  • Excel Strategies: When is a Spreadsheet not a Spreadsheet?

    by Scott Noble

    Scott NobleMicrosoft Excel has come a long way from the simple table provided in version 1.0.  These days Excel is more frequently used to create formatted reports designed for human eyes than flat tables designed for calculation.  Such visually rich spreadsheets can be near impossible to consume in Monarch as a data source.  Take the example below where mixed data types are presented as a report with hierarchical, non-linear special relationships, no column names and merged cells.

    Sample image

    Prerequisite file conversion to overcome challenging Excel data sources

    By far the easiest way to deal with this data source in Monarch would be to print it to file and model the static result.  But what do you do to automate ingestion for Monarch Data Pump ?

    Excel does not have a command line option that will encompass print to file, and developing code to utilise Excel components in background to output a static file is proprietary and beyond the scope of most Datawatch customers.  The resulting solution would require significant effort from a skilled IT department and end up inflexible and high maintenance.  Fortunately, several intrepid developers have created off-the-self XLS and XLSX to PDF converters with command line execution.  The programs range from $30 USD to $300 USD and vary greatly in true value, though  I have found a genuine freeware utility for simple conversion tasks (contact me at blog (at) datawatch (dot) com for product names).

    The command line structure for most of these utilities is straight-forward and well documented.  Most use wildcards and support basic renaming of files as they are processed. Many of these utilities also have rich output variables such as page size, worksheet and data range selection and file splitting to name a few.

    Using one of these utilities I was able to satisfy a potential customer who was being sent three Excel workbooks from head office each day, all looking like the one above, and needed to combine a subset of data from all three into one spreadsheet for routine operations.  ”Standard operating procedure” was copy and paste by three permanent staff to provision the department’s daily information — a task that left the department a day behind the delivery of the data.

    Using Data Pump’s prerequisite tab in the XPRJ editor, a command line was set to execute the conversion, and he resulting PDF files used as source reports.  The aggregation of the spreadsheet data was automated without the need for complex scripting.    As the command line accepted wildcards and output defined files names, new source files could be encompassed without change to the Data Pump prerequisite.  New or altered reports could be modelled by financial staff that understood the information (as it should be) and easily incorporated into the automation.

    While this is certainly not the only way the non-tabular Excel “documents” could be processed, using the simple third party tool allowed the department to achieve the required result without impact to established work practices and within the skill sets of the departmental staff for both implementation and on-going use of the new process.

    – Scott
    @datawatch

    4 Comments

4 Responses to “Excel Strategies: When is a Spreadsheet not a Spreadsheet?”

  1. DataExploiter said on

    Exciting to realize I’m not the only one that sees this as a classic overlooked source of valuable data. In my experience companies receiving daily , weekly, or monthly preformatted spreadsheets to answer one question and many times for one small group or individual user. The potential for accruing additional company improving benefits by using RA on these Excel reports is profound. Most RA/Monarch users overlook this rich source of painstakingly(expensive/labor intensive) created sources of valuable information as something to leverage. Many incorrectly still think of RA/Monarch only as text mining tool for ERP reports.

    As you know and in my experience the immediate return by applying RA on presentation formatted Excel reports for further value, safeguarding, accumulating, adding additional attributes, denormalizing for relational analysis, and making it available to other departments, functional experts to uncover/create additional benefits is significant.
    For example the print screen of the Excel report in your post is most likely audited for accuracy , customized and improved by trained capable workers that could become significantly more valuable not just for finance consolidating or executives reviewing the final consolidation of the data but instead, supply chain, marketing, executive committee, engineering could all have their own customized view of the data without changing the results using RA with a click of a button not detach attached file, print, throw in the garbage or file in drawer. Centralize, unlock and free the hard worked data in these preformatted daily, weekly, or monthly Excel reports to anyone in the organization that could use, discover new insight, or improve. Just like a good BI tool, OLAP, software package or data warehouse you’re not replacing just making it better and available to the entire talent pool in the organization with the help of RA.

    For example, using RA to exploit the preformatted Excel repots , marketing can have their view of the data or research a new idea using a filter. aggregate in a summary. crosstab or chart all predefined or created on the fly by the department or end user. The same would apply to the executives or supply chain group. The old way of consolidating, creating different reports, panicking and wasting time (money) consolidating historical reports for a one time query for senior management is wasteful to the employee and company. You get the idea not replacing all the Excel work just making the effort even more useful. Break down the political walls and the vested interest in doing things the old way and the company and the employees will quickly realize huge rewarding gains/improvements.

    BTW, I print to an XPS file format available on any windows machine. PDF file format do have an advantage as when I tried to consolidate multiple XPS files(hundreds) into one and it was impossible for me. Countless ways and each situation is unique as you know.

    John

  2. John,

    Thank you for elaborating this point. Your take on the subject is spot on; those reports are often among the most valuable assets of a business and are being left out of the business information chain.

    Beyond that, liberating the data in preformatted Excel reports is only half the story. While being able to utilise such work product in a cross-functional fashion within a business satisfies operational needs, many (most ?) businesses relying on such practices allow those assets to go unmanaged. There are usually multiple copies of the same report taking up space in eMail and file systems, changing over time as they are re-used for situational purposes, with no version control or security applied.

    By coupling Monarch Data Pump with Monarch Enterprise Server, the same simple pre-processing logic will enable:
    • browser based distribution of the results
    • further ad-hoc analysis
    • application of more advanced analytical tools
    • data visualisation
    • field level data security
    • and on-demand aggregation of historical data

    Along with all the governance, security and compliance benefits from an industrial strength archive solution; with the original Excel report stored alongside the extracted and transformed data.

    Hopefully, discussions like this will help get the word out.

    Scott

    PS: I’d have rather used XPS as well, but there is a dearth of decent command line XLSX to XPS converters. I haven’t encountered the scale limitation you ellude to though, have you reported that to our support centre ?

  3. Gareth Horton

    Gareth Horton said on

    Hi Scott,

    Here’s an MSDN sample for code to programmatically convert Excel files to PDF or XPS just using the Excel object model:

    http://msdn.microsoft.com/en-us/library/bb407651.aspx

    This way you don’t need any additional software other than Excel 2007 / 2010, plus you get the option to export to the superior (for parsing purposes anyway) XPS format.

    Gareth

    Gareth

  4. DataExploiter said on

    Just speaking from my real world experiences.

    “Centralize, unlock and free the hard worked data in these preformatted daily, weekly, or monthly Excel reports to anyone in the organization that could use, discover new insight, or improve. Just like a good BI tool, OLAP, software package or data warehouse you’re not replacing just making it better and available to the entire talent pool in the organization with the help of RA.”

    Oh yes .. the fixed time period data requirement. For example it could be executive management’s request for daily sales or other statistics consolidated by country or company . So a small group is dedicated to the collection and consolidation process. Detach excel file attachments, open copy and paste the data, consolidate and format further, Then take the time and hope amounts reconcile leaving little or no time to question or analyze the results submitted. Using the centralized data pump with RA now give these same employees the opportunity to develop their analytical skills discovering approaches to better analyze, distribute more timely, accumulate historically for improved value from the effort used to submit the excel reports.

    There are no winners with the old method, employee or the company. But with the centralized Report Analytic approach the employee can develop and continually improve transferable skills and the company benefits from a higher return on the employee labor.

    Personal agendas gets in the way and block the realization these powerful benefits.

    John

Leave a Reply

You must be logged in to post a comment.