Automatically Generating Code

A programmer’s efforts to create code can be greatly enhanced by automating as many operations as possible. This process reaches its zenith when the code itself can be generated automatically. I’ve done this on a handful of occasions in both direct and indirect ways.

Curve-Fitting Tool: I’ve discussed my work with curve fitting over the course of several recent posts. The goal was always to generate functions, or at least sections of code, that could be dropped into a larger project. All of the segment functions for the properties of saturated steam and liquid water were generated by one or another of the tools I wrote. I hope to expand those tools to generate segments and functions in a wide variety of languages going forward. I’ve also identified a number of improvements I can make to the generated code, the documentation, and the process of verification.
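A minimal sketch of what such a generator might look like, emitting a fitted segment as a C++ function in Horner form. The function name, coefficients, and range below are invented for illustration; a real tool would take them from the fitting step.

```python
def emit_segment_cpp(name, coeffs, lo, hi):
    """Emit a fitted polynomial segment as C++ source, in Horner form.

    coeffs are in ascending order: coeffs[0] + coeffs[1]*x + ...
    """
    # Build the nested Horner expression: ((c3*x + c2)*x + c1)*x + c0
    expr = repr(coeffs[-1])
    for c in reversed(coeffs[:-1]):
        expr = f"({expr} * x + {c!r})"
    return (
        f"// Valid for {lo} <= x <= {hi}\n"
        f"double {name}(double x) {{\n"
        f"    return {expr};\n"
        f"}}\n"
    )

# Hypothetical segment for a saturated-steam property curve:
print(emit_segment_cpp("sat_pressure_seg1", [1.0, 0.5, -0.02], 273.15, 373.15))
```

Emitting Horner form rather than explicit powers is a small extra win: the generated code needs only multiplies and adds, no `pow` calls.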

Automatic Matrix Solution Code Generator: While working at Bricmont I ended up doing a lot of things by hand over and over. The control systems I built all did the same things in principle but the details of each application were just different enough that generalizing and automating their creation did not seem to make sense. If I had stayed there longer I might have changed my mind. I did leverage previous work by adapting it to new situations instead of building each system from scratch. That way I was at least able to identify and implement improvements during each project.

There was one exception, however. I was able to automate the generation of the matrix solution code for each project. In general the block of code was always the same; the only variances were in the number and configuration of nodes, and those could be handled by parameters. That said, the matrix calculations chewed up perhaps 80% of the CPU time required on some systems, so streamlining those bits of code represented the greatest possible opportunity to improve the system’s efficiency. To that end I employed extreme loop unrolling: writing out all of the explicit calculations carried out by the tightly looped matrix solution code, with every array index expressed as a constant. That gets rid of all of the work of incrementing loop counters and calculating indirect addresses. The method saved around 30% of execution time in this application, but at the cost of requiring many, many more lines of code. The solution to a 147×8 symmetric banded matrix expanded to about 12,000 lines. The host system was far more constrained by calculation speed than by any kind of memory, so this was a good trade-off.

The code was generated automatically (in less than a second) by inserting write statements after each line of the matrix calculation that performed any kind of multiplication or summation. Each inserted statement wrote out the operation carried out by the line above it in the desired target language (C++, FORTRAN, or Pascal/Delphi at that time), with all array indices written out as constants. Run the matrix code once and it writes out the loop-unrolled code in the language of choice.
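The trick is that the loop counters only run at generation time; the emitted code carries their values as constants. A toy sketch of the idea, using plain Gaussian elimination rather than the banded solver described above, with invented variable names and C++ as the target:

```python
def emit_unrolled_solver(n):
    """Walk the solver's loop structure once, emitting each arithmetic
    operation as a C++ statement with the indices baked in as constants."""
    lines = ["// Auto-generated: unrolled elimination and back substitution"]
    # Forward elimination: indices k, i, j exist only at generation time.
    for k in range(n - 1):
        for i in range(k + 1, n):
            lines.append(f"f = a[{i}][{k}] / a[{k}][{k}];")
            for j in range(k, n):
                lines.append(f"a[{i}][{j}] -= f * a[{k}][{j}];")
            lines.append(f"b[{i}] -= f * b[{k}];")
    # Back substitution, with each dot product written out explicitly.
    for i in range(n - 1, -1, -1):
        terms = " + ".join(f"a[{i}][{j}] * x[{j}]" for j in range(i + 1, n))
        rhs = f"b[{i}] - ({terms})" if terms else f"b[{i}]"
        lines.append(f"x[{i}] = ({rhs}) / a[{i}][{i}];")
    return "\n".join(lines)

print(emit_unrolled_solver(3))
```

Even at n = 3 the output makes the size trade-off obvious; the line count grows rapidly with the system dimension, which is how a 147×8 banded solve balloons to thousands of lines.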

Once the code that consumed the vast majority of the run time was made as efficient as possible, there wasn’t much value in trying to wring significant performance gains out of the rest of the code. That being the case (and since I always follow guidelines that keep code reasonably efficient), I was able to concentrate on making the remainder of the code (you know, the other 38,000 lines of a 50,000-line project) as clear, modular, organized, understandable, and maintainable as possible.

I’ll include this one as an honorable mention:

Automated Fluid Model Documentation and Coefficient Generator: The project management process used by Westinghouse was solid overall (it was even rated highly in a formal audit conducted by consultants from HP while I was there) but that doesn’t mean there weren’t problems. One monkey wrench in the system caused me to have to rewrite a particularly long document several times. After about the third time I wrote a program that allowed me to enter information about all of the components of the system to be modeled, and the system then generated text with appropriate sections, equations, variable definitions, introductory blurbs, and so on. The system also calculated the values of all of the constant coefficients that were to be used in the model (in the equations defined) and formatted them in tables where appropriate. I briefly toyed with extending the system to automate the generation of model code, but the contract ended before I got very far.

I’ve written other tools and modular systems, but the system descriptions they generated were more like parametric descriptions than native code. The main system runs based on the parameters but doesn’t spit out standalone code. I’m sure there’s a good philosophical discussion of the nature of that demarcation in there somewhere.

I’ve also been involved with systems that can write code which can then be incorporated into the running system on the fly. This requires the ability to generate code that knows what variables and connections are available in the main system, to compile or interpret the resulting code or script, and to integrate the results into the main system. My original concept, intended for use in the fluid modeling tool, was for the program to write out the bulk of the code for calculations of pressure, energy, flow, accumulation, concentration, transport, state changes, instrumentation, user interaction, and so on. The system would also have to let the user write additional sections of code to handle special situations in a model that can’t be covered by the standard methods. At the time I planned to have the system write the entire body of standard and custom code at once in an integrated way, and then compile and run it by hand.

Today I would try to automate the process even further. I succeeded in doing this in a small way when I wrote my system to simulate operations in medical offices. From within the main program and user interface (written in C++), it was able to generate the required parametric input files, kick off the execution process (a simulation written in SLX that reads the input files and generates several output files), gather the results, and allow the user to review them and spawn the separate output animation process (a Proof session that reads the animation file generated by the SLX simulation run).
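The orchestration pattern itself is simple: write the input files, launch the external run, wait, then harvest the outputs. A hedged sketch, with the file names, parameter format, and executable name all invented for illustration (the real tool was C++ driving SLX and Proof):

```python
import subprocess
from pathlib import Path

def run_simulation(params, workdir, command=("slx_model.exe",)):
    """Write the input file, run the external simulator, read its outputs."""
    workdir = Path(workdir)
    # 1. Generate the parametric input file the simulator expects.
    inp = workdir / "model.in"
    inp.write_text("\n".join(f"{k} = {v}" for k, v in params.items()))
    # 2. Kick off the external run and wait for it to finish.
    subprocess.run([*command, str(inp)], check=True, cwd=workdir)
    # 3. Collect whatever output files the run produced.
    return {p.name: p.read_text() for p in sorted(workdir.glob("*.out"))}
```

Making the command a parameter keeps the orchestration logic testable with a stand-in process, which is also a reasonable design for swapping simulators later.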

More recently I worked with a development team that created a system to calculate staffing requirements based on a range of inspections of different types of traffic through the various land, air, and sea ports of entry. The program read in all of the provided raw arrival data from a number of sources, but calculating the number of staff required meant combining those figures with information about the number of staff and the range of durations required for each inspection process. The former information came from a range of agency collection processes while the latter was gathered from subject matter experts and operational experience. These bits of information were originally combined in an Excel spreadsheet. A spreadsheet is, in fact, a kind of computer language processor, but its calculations are expressed in a declarative, dataflow, cell-oriented paradigm.
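The core arithmetic of that kind of combination can be sketched in a few lines. The formula and the numbers here are invented for illustration; the real model was far more detailed.

```python
def staff_required(arrivals, staff_per_inspection, minutes_per_inspection,
                   productive_minutes_per_officer=480):
    """Combine arrival counts with staffing and duration figures
    to estimate officers needed. All figures are illustrative."""
    workload = arrivals * staff_per_inspection * minutes_per_inspection
    # Round up: you can't schedule a fraction of an officer.
    return -(-workload // productive_minutes_per_officer)

# e.g., 1,200 daily arrivals, 1 officer per inspection, 2 minutes each:
print(staff_required(1200, 1, 2))  # 2,400 minutes of work -> 5 officers
```

The spreadsheet expressed exactly this kind of calculation cell by cell; the replacement tool expressed it as named operations on named data items.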

The replacement system was written in C# and provided means of identifying individual data items by name and sometimes by the location (port of entry) they applied to. These items could then be retrieved, used in various mathematical and accumulation operations, and stored as desired. The instructions for doing so were written out as user scripts in valid C#. While C# code can be more daunting to users than spreadsheet operations, the tool took great pains to hide as much of that complexity as possible. An individual writing scripts could get a lot done by following the patterns shown in a minimum number of examples, while an advanced user could accomplish almost anything. Once all of the scripts were written, the tool could compile them so the calculations could be run within the tool itself. C#, Microsoft’s answer to Java, is, like Java, compiled to an intermediate bytecode which the runtime then translates into machine code, and that compilation step can be invoked from within a running UI. Accomplishing the same thing in a purely compiled language would involve compiling the added snippets of code into a .DLL or something similar, with each snippet embedded in a wrapper unit and a function call with a known name. It wouldn’t be impossible, just a bit more limited.
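The shape of the user-scripting arrangement can be sketched in Python (the real tool compiled C# scripts at runtime; the item names and API here are invented). The point is that a script author sees only a tiny data-access surface, not the host application:

```python
class ItemStore:
    """Named data items plus a minimal API exposed to user scripts."""

    def __init__(self):
        self._items = {}

    def set(self, name, value):
        self._items[name] = value

    def get(self, name):
        return self._items[name]

    def run_script(self, script):
        # Expose only get/set to the script, mirroring how the tool hid
        # everything but the data-access patterns from script authors.
        exec(script, {"get": self.get, "set": self.set})

store = ItemStore()
store.set("arrivals.SEA", 500)
store.set("arrivals.LAX", 1300)
store.run_script("set('arrivals.total', get('arrivals.SEA') + get('arrivals.LAX'))")
print(store.get("arrivals.total"))  # 1800
```

In the C# version the same effect comes from compiling the script text at runtime and handing it an object with the equivalent get/set methods.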
