Yesterday’s post touched on the importance of being able to use spreadsheets well, and particularly Microsoft Excel. I thought I would expand on that subject here.
I saw copies of Lotus 1-2-3 back in college and certainly read about it. The same went for VisiCalc. I started using random knockoff spreadsheet programs around 1987 while I was in the Army, on my own computer. I started using Quattro Pro for work and home beginning in the late 80s and continuing through the early 90s. One of the Turbo Pascal Toolbox products included code for a spreadsheet, so I saw some of how they were constructed from the inside. I started using Word to replace Professional Write around 1991 because it was so clean and simple for a true WYSIWYG product, and have used it ever since. Quattro was a great product, though, and I didn’t give it up until the bundling of Microsoft Office, which included Word, which I already knew and liked, obviated it. From there it as Excel all the way.
Over time I ended up doing huge amounts of analysis and tracking in Excel. I’ve used it for everything from project tracking (doing pseudo Gannt charts, among other things) to cost accounting and forecasting, lightweight simulation, lightweight Monte Carlo analysis, data conditioning, statistical characterization, plotting and charting, scientific and engineering calculations, inventory management, list-making and knowledge transfer, and earned value management. In short, I’ve used it for almost everything, and it’s rare that a day goes by when I don’t use it.
The Aircraft Maintenance Model I supported for a few years used to write output data directly into formatted tables in different tabs of Excel workbooks, which I spent quite a bit of time rearranging and improving. Eventually, however, some of the data sets we were working with grew beyond the 2-gigabyte limit Excel can handle. At that point we had to change the program so it wrote out comma-delimited text files (CSV files) and additional programs that would parse the data in the files and perform the same calculations. Users obviously should never use functions they don’t understand, but recreating those functions in code demonstrates an even deeper level of understanding of what they do. Indeed, I often reverse-engineer Excel mathematical functions in Excel to make sure I understand the version and context of a function I want to use completely.
Over the years I’ve seen job ads looking for people who can do data analysis in Excel. They usually make a big deal about being able to use the VLOOKUP function (they never seem to mention the HLOOKUP function). I suppose that’s fine if you want someone who’s just starting out, but in reality that function, along with pivot tables, is pretty trivial.
One of the projects I did with RTR Technologies involved reverse-engineering a fairly complex spreadsheet used to justify the need for about 27,000 staff for a particular function in a particular government agency. There was nothing difficult going on in the spreadsheet, but its sheer size and the amount of data it tried to track made it a configuration nightmare for the individual who created it and had been maintaining it for a few years. At the time I described this individual’s efforts as “heroic” given his level of knowledge of Excel and his overall training in computing in general. In the process of taking the workbook apart I saw many opportunities for including meta-calculations that could be used to check the integrity and accuracy of the work, especially as it was updated and extended from year to year. I tend to include such checks and verifications in all my work but this individual simply didn’t have the background for it, and neither did anyone else in the organization. We ended up writing a standalone program to recreate all of the functionality of the spreadsheet, and on delivery were able to demonstrate that we matched over 100,000 input and calculated values exactly. (That’s an entirely different story. Ask me about it over coffee sometime.)
A separate analyst put together an even more complex spreadsheet to calculate staffing needs for a different branch of the same agency. It took forever to hire that person back and by the time they did so my company’s contract had ended. (We can talk about that one over a second cup of said coffee.) I never look at that workbook in much detail but it used a lot of flexible, indirect addressing schemes that were quite clever. I’ve used those in my own analyses but they generally don’t come up much. That fact that they even exist in the program, though, is pretty impressive.
As an aside, it turns out that Excel and its other spreadsheet brethren constitute a meaningful form of programming, known as cell-oriented dataflow programming. This is a subclass of declarative programming, which is contrasted with the more traditional declarative approach. See here for more information.