Three Plates of Spaghetti

Spaghetti is great for a meal but not so much for code.  I’ve had to deal with three major plates of spaghetti code in my career.  Here’s how they went and what I learned.

I got to write all my code from scratch for the first few years of my working life, which was nice.  The one brush I had with anything that looked like spaghetti was quickly dispatched.  When I first started writing model-predictive control systems for steel mills I had to adapt my predecessor’s code for one or two jobs, but each took only a week or so because those jobs were similar to several existing systems.  Beyond that, I had decided on my first day to discard his code and build my own from scratch.

Several years later, though, I took custody of a control system that had been adapted and hacked over by several different people over the course of a decade.  I got to mentor the team that wrote the replacement but I was the high-level language guy that got to keep the old one running.

Our parent company’s equipment was originally run by Programmable Logic Controllers (PLCs) but it was cheaper to use a PC and some simple serial I/O devices.  The person charged with writing the original PC software took a shortcut — he mimicked the PLC ladder logic in C++ code, though on an orderly, object-oriented basis.  It was clever and quick and it worked, but those who edited the code after him did it differently.  The first person added code in a consciously high-level way, as C++ that looked like nice C++.  The person or people who worked on it after that seemed to add patches any old way they wanted.  The mods I made were always in the style of the affected location.  It was hard enough to make fixes and mods without breaking anything; imposing order in a major way was a lower priority on a system that was due for replacement.
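For readers who haven’t seen ladder logic, the style is easy to picture: a PLC re-evaluates every “rung” of boolean logic on each scan cycle, top to bottom, and mimicking that in procedural code means one assignment per rung inside a scan loop.  Here is a minimal sketch of that style, rendered in Python for brevity (the system described was C++, and every signal name below is invented):

```python
# Hypothetical sketch of ladder-logic-style control code.  Each "rung"
# is one boolean assignment, and the whole ladder is re-evaluated top
# to bottom on every scan cycle, exactly as a PLC would do it.

def scan(io):
    """Evaluate every rung once, top to bottom, like one PLC scan."""
    # Rung 1: latch the combustion fan on while start is pressed or the
    # fan is already running, unless stop is pressed.
    io["fan"] = (io["start_pb"] or io["fan"]) and not io["stop_pb"]

    # Rung 2: open the gas valve only with the fan running, airflow
    # proven, and no over-temperature condition.
    io["gas_valve"] = io["fan"] and io["air_proven"] and not io["over_temp"]

    # Rung 3: alarm if the valve is commanded open without proven air.
    io["alarm"] = io["gas_valve"] and not io["air_proven"]
    return io

io = {"start_pb": True, "stop_pb": False, "fan": False,
      "air_proven": True, "over_temp": False,
      "gas_valve": False, "alarm": False}
scan(io)                # first scan: fan latches, valve opens
io["start_pb"] = False
scan(io)                # the latch holds the fan on after release
```

The appeal of the style is that each rung maps one-for-one onto the original ladder diagram, so an electrician and a programmer can read the same logic; the trouble starts when later edits abandon that convention.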

It was actually a pretty nice system.  It allocated memory for the controls for each furnace after reading a configuration file.  That file had 250 possible configuration settings and the international language file had another 700 entries.  The innards of the thing, though, were difficult to tease apart and doing so took more than a year.  Requests for fixes and mods came in at irregular intervals and it took a while for the internal structure to become clear to me.  It was ultimately replaced by the new version and I made sure that team included a framework for documentation and configuration management from the beginning.  Today I would do this in an even more aggressive and organized fashion.

I encountered another mess at my next job.  In this case the PC software had two components, a driver unit that handled the communications and much of the configuration and a UI unit that allowed the user to configure and use interface screens.  The UI unit was just a hair buggy but it was licensed from a third-party developer and we could neither see nor modify its code.  That was a problem before I was there, while I was there, and apparently long after I was there.  The driver section was just entropic in general.  I could always bang on it to get it to do what it needed to do (it took a few late nights) but I never intuited any real structure.  One of my colleagues who worked on it full time ended up getting a better grip and ultimately added a whole new structure to it.  He did some nice work.

I also had to write installers using InstallShield but, at least for that version, there seemed to be two different paradigms for doing things.  The installer was implemented in one when I started but the other seemed much more straightforward and I eventually converted to it.  We also discovered that neither the PC software nor InstallShield dealt with Windows ME very well, so we simply chose not to support it.  It was unlikely anyone was going to try to run control software on that OS anyway.  I learned a lot more about how the Windows OS worked, that’s for sure.

The final tangle I encountered was a simulation of aircraft maintenance that had existed in various forms since the 1960s.  It was still supported by an engineer who was close to 80 years old and others who had worked on it over the years (the owners of the company) were all in their late 50s.  The owners had become managers and didn’t get into the code much but when they and the senior engineer did work on it they tended to just patch it.

This code was a problem for a bunch of reasons, some of which were legitimate.  The model itself was modified over the years to cover many different operations.  In particular it had code and variables related to operations on aircraft carriers (usually involving constraints of landing and service locations).  The analyses we were running did not consider carrier ops explicitly and the code had never been consciously reworked to remove the relevant references.  Indeed, after so many years the requirements and core modeling assumptions had changed on numerous occasions (sometimes for very good reasons) so the code naturally reflected all of that drift.

The people who did most of the work on the code were primarily concerned with the analyses they were performing and the code itself was a means and not an end.  They were all smart and capable but not always motivated by the latest ideas in design, consistency, modularity, and configuration management.  The code was so intertwined that changes often had unexpected side effects.  Testing was not exceptionally thorough and never had been.

Another issue was that the code was written in a rather obscure language called GPSS.  It is a discrete-event simulation language that reads like assembly language on the surface, though a great deal more happens behind the scenes.  If you want to create really organized code in this language you have to work at it.  It’s very easy to let entropy creep in, and the result looks especially chaotic to practitioners with more contemporary sensibilities.  It didn’t help either that this particular code was the most complex that had ever been written in the language.  It was big and had a lot of moving parts.
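To give a flavor of what those terse GPSS blocks hide: a classic single-server queue takes only a handful of them (GENERATE, SEIZE, ADVANCE, RELEASE, TERMINATE), each quietly driving a future-event list.  Here is a hypothetical Python sketch of that hidden machinery — not GPSS itself — using fixed times for clarity:

```python
import heapq

def single_server_queue(interarrival, service, horizon):
    """Event-list simulation of what a five-block GPSS model
    (GENERATE / SEIZE / ADVANCE / RELEASE / TERMINATE) does behind
    the scenes: one server, FIFO queue, fixed times for clarity.
    Assumes interarrival > 0 so the simulation always advances."""
    events = [(0.0, "arrive")]      # future-event list: (time, kind)
    queue = 0                       # transactions waiting for the server
    busy = False
    served = 0
    while events:
        t, kind = heapq.heappop(events)
        if t > horizon:
            break
        if kind == "arrive":
            # GENERATE: schedule the next arrival.
            heapq.heappush(events, (t + interarrival, "arrive"))
            if busy:
                queue += 1          # SEIZE blocks: transaction waits
            else:
                busy = True         # SEIZE succeeds; ADVANCE schedules service
                heapq.heappush(events, (t + service, "depart"))
        else:                       # RELEASE + TERMINATE
            served += 1
            if queue:
                queue -= 1
                heapq.heappush(events, (t + service, "depart"))
            else:
                busy = False
    return served

print(single_server_queue(interarrival=2.0, service=1.0, horizon=10.0))  # -> 5
```

A full GPSS model adds storages, priorities, parameters, and statistics on top of this, which is exactly why a large model in the language accumulates so many moving parts.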

Substantially restructuring the code or replacing it entirely was going to be difficult for both financial and political reasons.  The financial limitation was that the team was funded by incremental, year-by-year contracts (with the occasional bump for a one-off analysis), so there was never time to do much more than what was needed.  The first political limitation was that the software team members were all subcontractors and the prime contractor was always angling for more involvement.  If the software were ever rewritten from scratch the prime might be able to ease the sub right out of the picture.  The other political limitation was that a replacement would have to demonstrate its credibility anew with the customer, even though it would likely be a superior product.

To the good, the team in charge of the software was fully aware of all these issues and did what they could over time to hammer things into shape.  Old bits were slowly and carefully excised and various operations and structures were refactored to make them more consistent and approachable.  A series of regression tests was developed over time in the form of multiple input configurations that started simply and progressively exercised more and more of the code’s capabilities.  The biggest improvement was the creation of a wrapper framework written in C# that managed all the inputs and outputs of the model, of which there were many, in many different formats.  That wrapper had problems of its own but they were minor in comparison.
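That kind of regression ladder is easy to sketch.  The version below is a hypothetical Python illustration, not the team’s actual harness: run_model stands in for the real simulation, and the case names, outputs, and golden values are all invented.

```python
# Sketch of a regression ladder: input configurations ordered from
# trivial to full-featured, each paired with a stored "golden" output.

def regress(run_model, cases, goldens):
    """Run each configuration and diff its output against the golden copy.

    cases   -- config names ordered simplest first
    goldens -- dict mapping config name -> expected output text
    Returns the list of configurations whose output has drifted.
    """
    failures = []
    for cfg in cases:
        if run_model(cfg) != goldens[cfg]:
            failures.append(cfg)
    return failures

# A stub standing in for the real simulation binary:
def fake_model(cfg):
    outputs = {"01_one_jet.cfg": "sorties=4\n",
               "02_two_shops.cfg": "sorties=7\n"}
    return outputs[cfg]

goldens = {"01_one_jet.cfg": "sorties=4\n",
           "02_two_shops.cfg": "sorties=9\n"}   # second golden is stale

print(regress(fake_model, ["01_one_jet.cfg", "02_two_shops.cfg"], goldens))
# -> ['02_two_shops.cfg']
```

The ordering matters: when a change breaks something, the simplest failing configuration usually points straight at the cause.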

The system was subject to a formal configuration management process and most of the documentation of the configuration, structure, and operation of the system was updated fairly regularly.  The structure of the code itself was not externally documented but everything else was.

So what is the takeaway from all this?  First, there are at least some legitimate reasons for spaghetti code to exist.  It would be better if code never got into that state but once it gets there it can be difficult to replace.  Much of modern software development practice has to do with managing complexity (everything from test-driven development to source code control and build automation systems to functional programming) and that goes a long way to preventing this kind of pastafication.  Most of the rest is experience and good management.

If you are confronted with a plate of spaghetti try to do the following:

  • Be careful at first.  Don’t try to make radical changes; you don’t yet know what you might break.
  • Read any existing documentation in any form (e.g., user manuals, configuration guides, maintenance logs, developer notes, data descriptions, source code control system notes, and so on).
  • Learn what you can from any people that may still be available.
  • Document whatever you can.
  • Try to identify the underlying structure or structures.
  • If any configuration, input, or output processes can be automated then make it happen.  Try to remove ways to make mistakes.
  • Add various kinds of error checking and validation if they are lacking.
  • Make sure you document your own efforts to figure things out.  If you approach things systematically you are likely to receive more support and understanding.
  • Make sure you secure access to the tools needed to build and manage the system.  That can be a challenge in itself, especially on older systems.
  • Try to discern the underlying principles at work in the system.  What is it for?  How can it be made more consistent?
  • If you’re holding the fort while a replacement is being built, make sure you share your findings with the team doing that work.
  • Refactor slowly but try to impose some order over time.

I’m sure there are more good ideas.  What suggestions would you add?
