Prettying Up the 3D Graph Output

Here I’ve prettied the graph up so the lines change color every 100 degrees F, and I’ve added a legend. I suppose I could choose consecutive colors with a bit more contrast.

Posted in Tools and methods | Tagged , , , , , | Leave a comment

Graphing the Matrix Output

Here I continually run the heating routine for a piece with an initial temperature of 70 degrees in a furnace set to 2300 degrees. The temperature scale, expressed vertically, runs from 0-2500 degrees. The bottom of the piece, which sits on a solid hearth through which no heat is transferred, is to the lower right.

The graph is rendered using a shortcut isometric technique which obviates the need for matrix multiplications and perspective transformations.

Posted in Tools and methods | Tagged , , , , , | Leave a comment

How Quickly Can the Matrix Be Solved On Different Machines?

Today I ran 2750 iterations of the matrix on some additional machines.

My 64GB iPad 3 runs a 1 GHz, dual-core, 32-bit, ARM Cortex-A9 and ran the test in 1.308 seconds on load and about 1.15 seconds on rerun.

My MacBook Pro from 2013 runs a 2.7 GHz Core i7 and did the initial run in 0.09-0.130 seconds and re-runs in as little as 0.070 seconds. That’s almost as fast as my 2015 notebook PC.

I wasn’t kidding about digging up my old Palm Pre, either. I finally got it charged up and running but found it would only run up to 500 iterations before it would hang. I therefore created another version of the test which runs only 500 iterations, as shown below.

The Pre Plus, a 500 MHz ARM Cortex-A8, managed to grind through 500 iterations in 4.77 seconds. At that rate it would take 25.85 seconds to do 2750 iterations. That is slower than what a good desktop machine from the late 1990s could do. A couple of thoughts occur to me, however. The Palm Pre was introduced to great fanfare but had numerous problems from the beginning. What was left of Palm didn’t have the resources to get the software into the best shape when it was released (the ability to record video was only enabled after a handful of OS updates). I would not be surprised if the code for the browser or JavaScript implementation were less than optimal. The company was acquired by HP, which then managed to drive the final nails into Palm’s coffin.

Still, the matrix code doesn’t make use of any special or new features of HTML, CSS, or JavaScript and did manage to run to completion, so I guess that’s an interesting finding on its own. It’s kind of like the poodles in tutus that used to walk on their hind legs on the old Ed Sullivan Show. You weren’t impressed that they did it well, what was impressive is that they could do it at all.

Posted in Tools and methods | Tagged , , , , | Leave a comment

How Quickly Can the Matrix Be Solved?

The solution was finally made to run last week. Today the question is how fast the thing runs. My feel for the answer to this question has to do with the context in which I first asked it.

From 1994 to 2000 the fastest desktop computers I worked with maxed out at 400 MHz. That was true for PCs and the more expensive DEC Vax and Alpha machines we were running. By contrast the laptop I’m working on uses an Intel Core i7-4510U CPU with a base speed of 2.00 GHz. That would be five times faster on its own. However, that number is the base speed of each core, and depending on the number running the speed can theoretically be boosted to as much as 3.1 GHz. The Windows system page indicates that the fastest core may be running at 2.6 GHz.

After that we can consider that each core in a current CPU is likely to be further optimized in terms of its internal architecture, the number of clock cycles it takes to perform each operation, the amount of available cache memory, the speed at which data can be moved in and out of RAM, and other considerations. The current chip and OS is also 64 bits while the older ones were only 32 bits, though that consideration may not have much effect on the speed of calculations.

I don’t know how to evaluate the overhead incurred by the operating system itself, but running JavaScript in a browser at least carries the overhead of the browser.

Next we can consider the number of times the matrix code had to be run for every model iteration. The matrix had to be run at least once to update the current temperature of each piece in the furnace during each time step. The number of pieces in a furnace could range from, say, 8 to 144 depending on the application, so lets pick 50 for giggles. This number doesn’t really matter because the bulk of the work is done when trying to predict what happens in the future.

Control systems based on model-predictive simulation work by starting from current conditions and calculating what will happen in the future. If the results obtained by the simulation match the desired results then the control settings need not be changed. However, if the predicted results do not match the desired results the program has to figure out what control inputs to change. Without going into detail, the furnace Level 2 systems I wrote would march the pieces out of the furnace at the current average pace of movement through the furnace, figure out what the discharge average and differential temperatures (the difference between the temperatures of the hottest and coldest nodes in a particular piece, which value had to be below a specified threshold) would be, compare the predicted result with the desired result, and adjust the control settings.

The control settings adjusted by the Level 2 system were the temperature setpoints for each zone in the furnace. A furnace might have anywhere from one to six (or more) independently controlled zones. It was up to the Level 1 systems to do the detailed work necessary to make this happen. They used PLCs to control gas flows, fuel-air ratios, pressures, movements, cooling, opening and closing of doors, operation of exhaust flues, safety systems, and so on. They were a whole world of their own and I was always thoroughly impressed with the people who built those systems.

For this example I’ve assumed that there are three effective zones of control in the furnace, and the predicted temperature of at least one piece has to be calculated for each zone. The “critical” piece in each zone was the one that was logically identified to be the hardest to heat. Pieces could be a bit over temperature of more thoroughly soaked (lower than target differential temperature, but if they were too cold or not sufficiently soaked they could break a mill stand, which would slow the operation down). As the pieces are being marched out of the furnace the prediction code assumes that the Level 1 system will move the furnace temperatures to their setpoints for each zone if they are not already there. I ran successive iterations of the prediction code until I either got the discharge temperatures to within a few degrees of their targets or until the number of iterations gets up to five. Greater accuracy could be achieved during later time steps but the model could never be allowed to run longer than its designated time step.

Next I considered the pace of movement through the furnace. If the furnace was running slowly it might take two hours for a piece to get through the furnace, and that would mean 180 40-seecond time steps. So, at 3 pieces times 5 iterations times 180 steps we have 2700 matrix calculations. More are certainly possible.

I’ve set the example up to run 2750 iterations, which would be enough to update 50 pieces plus predict three as described. The code runs when the page is first loaded and again whenever the button is clicked. I noticed that clicking the button to repeat the test yields notably better results, so there must be some overhead associated with the initial load of the page.

I would have expected the code to take 5-10 seconds to run, but on my laptop, with the file served from local disk, the initial result was 0.08 seconds and about 0.05 when the button is clicked to rerun. This represents a performance improvement of at least 100, which is kind of mind-blowing to me.

Served over a fast connection from the website I get results of about 0.085 seconds on load and 0.052 seconds on rerun.

On my iPhone 5S I get 0.352 seconds on load and 0.230 seconds on rerun. That may be even more mind-blowing.

Tomorrow I’ll try this on a few other systems to see how it feels. What results are you getting on your setup?

Posted in Tools and methods | Tagged , , , , , , | Leave a comment

Finally Solving the Matrix

Once we get all the parameters calculated and the matrix loaded, then we can finally solve the thing. Here’s how the code looks:

We start with a 7×7-node workpiece where the interior nodes are two inches wide and two inches high. The surface nodes are half as wide or high, meaning the cross-section area of each node is four square inches in the interior of the piece, one square inch on the four corners of the piece, and two square inches for the other surface nodes. The length of the surface exposed to the furnace is zero for interior nodes, zero for the bottom middle nodes (the piece is sitting on a solid hearth through which it is assumed no heat transfer occurs), one inch for the bottom nodes on each side, and two inches for the remaining surface nodes (the two corner nodes include one inch across the top surface and one inch down the side surface). The center of each interior node is assumed to be in the middle of the node, one inch from each side. The “center” of each surface node is assumed to be on the middle of the outer surface of the node. The “center” of the outer corner nodes is assumed to be on the outermost corner. This way there are always exactly two inches between the centers of each node, so the heat conduction terms make sense and the radiation can “see” the “center” of each node.

I have done it this way from the beginning and found that the readings of surface temperature on pieces which were actually heated in furnace matched the predicted values with impressive accuracy.

If all nodes are initialized to 70 degrees F, then exposed to a furnace at 1800 degrees F for successive 40-second time steps, we get the following results:

Posted in Tools and methods | Tagged , , | Leave a comment

Dimensions and Other Considerations Before We Solve the Matrix

Yesterday’s post showed how some of the constants and parameters are initialized or calculated. Today we’ll describe a few more.

This exercise defines a cross-section of a steel billet that is 8 inches by 8 inches and divided into five nodes high and five nodes wide. I found it convenient to assume the surface nodes were only half the height or width of the middle nodes, and to assume the center of surface nodes is on the outer edge or corner of the node, or in the middle of the node for central nodes. This means that the centers are always equidistant both across and up. The surface nodes also receive conducted heat and radiated heat at the same location. Nodes on the corners have one quarter the cross-sectional area of the fully central nodes, and nodes on the surface but not on the corner have one half the area of the fully central nodes.

Posted in Tools and methods | Tagged , , | Leave a comment

Static and Dynamic Coefficients in Matrix Thermal Solutions

Recalling yesterday’s initial heat balance equation for each node:

it should be noted that the values of both the specific heat and the thermal conductivity of steel are themselves functions of temperature. That means that these values need to be calculated for every node, every time a temperature matrix is loaded and solved. Finding those values means interpolating from a table of values as a function of temperature (English units). The values of the other constants are shown as well.

The values for view factors are affected by shadowing caused by the structures the workpieces are sitting on (typically beams, a solid hearth, or rollers) and the workpieces each piece is sitting beside, if there is a gap between them. The values for the bottom and side views assume a solid hearth and notional effects of shading from neighboring pieces. I have written systems that considered a changing configuration of support structure over the length of a furnace and variable-width gaps between workpieces.

Posted in Tools and methods | Tagged , , , | Leave a comment

A Specific Example of Equations To Be Solved By Matrix

Dividing a cross-section of a steel billet into nodes, we can start by looking at the energy balance for each node:

All this equation is saying is that the change in the quantity of energy in a given node per unit time is equal to the sum of the energy flows into and out of the node. The quantity to the left is the accumulation terms while the quantities on the right side of the equation are terms for radiative heat transfer from the environment to the surface of the steel, conductive heat transfer between nodes within the steel, convective heat transfer between the external atmosphere and the steel, and inductive heat transfer into the steel.

Let’s start by breaking the mass in the accumulator term down as follows:

We continue by eliminating the convection and induction terms because they are comparatively small or nonexistent, and then by dividing all the remaining terms by the unit area:

Next we can expand terms:

Steal a first order dependent variable from the fourth order term:

Substituting back into the previous equation yields:

Gathering terms we get:

The coefficients for the Tn terms and the Tc terms on the left side of the equation fill the initialized coefficient matrix while the terms on the right side of the equation fill the initialized source matrix. There is one T term for each node. The radiation transfer terms drop out for any node that isn’t on the surface of the piece of steel being heated. The conduction terms drop out between any two nodes that aren’t internally connected.

This equation gets written for each node individually and then the system of equations gets solved. All of the Tn terms for the new time step end up in the source matrix at the end. Note that the temperatures are all represented in absolute degrees.

Posted in Tools and methods | Tagged , , | Leave a comment

Implementing the Efficient Matrix Solution for Special Classes of Engineering Problems

Yesterday I described how to set up a matrix for efficient solution, today I’ll describe how to actually execute the solution. I plan to go back and revisit the intermediate steps but I wanted to show how this series of techniques pays off. I also wanted to leverage this work to describe my feel for the amount of computing power that might be available for JavaScript execution on various computing platforms, which discussion started on Monday.

Here is an implementation of the symmetric banded matrix solution written in C, which is very easily converted into similar languages (Java, JavaScript, etc.). I chose this example because it was working code from a project I did back in the 90s, and I know it’s correct. A few things can be done to make it more compact (using the += formation, for instance) and there may be other clever rearrangements that take advantage of unique aspects of various languages, compilers, implementations, and so on, but mostly it’s ok to let the compilers do their magic on the code as it is. I also chose C because I can write text out to a file easily. The same would be true if I wrote the code in Java or any other language and ran it directly so it could write to disk. This could also be done in JavaScript but you’d have to figure out what to do with the output (here’s a discussion of one possibility).

This code is also repurposed from the FORTRAN I originally used at Westinghouse while building simulator models. That’s why you see all the capital letters and really short control variable names. The names of the constants were my own twist on Hungarian notation, which has been depopularized in favor of CamelCase names that don’t try to say anything about data types, since refactoring and other operations may override whatever type you were using to begin with. There is also the potential for the default size of standard variables types to vary on different platforms or implementations. The policy on the simulator was that the first letter of a variable represented a specific property (e.g., P=pressure, T=temperature, Y=admittance, X=concentration, V=volume). The next three letters represented the various plant systems (e.g., RWU = Reactor Water Cleanup, OGS = Offgas). That left us four whole letters to differentiate the various examples of each type (say, temperatures) in a system. It was common for systems of the era to limit variable names to eight characters, not to mention 8.3 file names or something similar. I adopted that basic system for my own thermodynamic models and control systems while I worked at Bricmont. The “itc” prefix meant short integer, thermal model, constant. To that I might have added an “a” for array. “u” as the first letter would indicate a structure (or record, Java and JavaScript just use objects), “r” was for floating point numbers, “f” for double-precision floats, “s” for character strings, “l” for logical/boolean, and so on.

There’s one more bizarre thing I did here. C, Java, and the like use zero-based arrays while Pascal and FORTRAN use one based arrays (Pascal’s situation is even more complex but…). When I translated this to C I preserved its “one-basedness.” It shouldn’t be too much trouble to update.

Here’s the solution code in its most basic form. The ZMATX array is the 2-dimensional coefficient matrix and ZSRCE is the 1-dimensional source or driver matrix. This matrix will hold the results when the process completes. Both matrices must be loaded as described yesterday before the function is run. The ZAUX matrix is used on a temporary basis during execution.

Now comes the trick. In the next block of code I’ve embedded statements that write the previous line of code out to a file, with array indices replaced with the constant values for that iteration. I don’t write out any of the loop or control statements or any of the ancillary calculations. I only write out the statements that do actual matrix calculations.

Running this function generates the code you can download via this link. It’s over 11,000 lines and over 500k in storage, so it clearly can’t be displayed directly on a web page, either as a listing or as client-side code (JavaScript). This particular method is appropriate for a dedicated computer, or perhaps a server, if the number of concurrent sessions is limited. Note that I’ve also added a bit of wrapper information, but the beginning of the generated code is indicated with a comment. The automated code generator can easily be expanded to include any desired wrapper code and commentary. Also note that this version of extreme or total loop unrolling can be used to realize maximum efficiency, but it is also possible to use these techniques on a limited basis. You’ll still save a respectable amount of time, but at the cost of far less output code. The balance is up to you.

Here are the first hundred-plus lines of the output as an illustration:

Writing the code out longhand this way saves all of the extra calculations associated with incrementing loop variables, jumping to the beginning of loops, and calculating the indirect indices of the various array variables. This cuts execution time for array solutions by thirty percent. Since matrix calculations make up the bulk of the model system’s running time, these modifications represent the maximum performance improvement possible. That means there was no point in trying to identify major performance improvements elsewhere in the system. It was more important, then, to make sure the rest of the code was clear and maintainable.

Posted in Software | Tagged , , , | Leave a comment

Efficient Matrix Solutions for Special Classes of Engineering Problems

There are a large class of analogous engineering problems that may be addressed with a particular mathematical solution. Fluid systems where volumes containing fluid are connected by fluid flows, mechanical systems where masses are connected by springs and dashpots (which you can think of shock absorbers), electrical systems with nodes having voltage levels that are connected by currents, and thermal systems where nodes containing energy are connected by energy flows are all analyzed and solved in similar ways. The change in any quantity at a node is a function of the flows into and out of the node, the quantities stored in the node, the extent of the node, and the behavioral properties of the node. The specific equations may differ; current flow is a function of voltage differential over electrical resistance while fluid flow is a function of the square root of the pressure differential over the fluid flow resistance.

Let’s take a look at a simple fluid system.

This system has four internal nodes, a pump, a collection of valves, an orifice plate, and connections to external systems. The tank, which contains a liquid section and a gaseous section, might be part of the system as it is logically defined, but it may or may not be considered part of the system that is included in a particular mathematical solution. In this case we’ll consider the bottom of the tank to be an external node.

To keep things really, really simple for today lets skip to a series of equations written int terms of the pressure at each node. There are several possible components to each coefficient. There can be what I’ll call an accumulation term, which has to do with each node, a resistance or conductance term which has to do with the connections between nodes, external driver terms which have to do with influences from outside systems, and internal modification terms which describe how things can change within the system. I’ll break this down in detail in the not-to-distant future.

A1P1 + B1P2 + C1P3 + D1P4 = E1dP/dt + F1
A2P1 + B2P2 + C2P3 + D2P4 = E2dP/dt + F2
A3P1 + B3P2 + C3P3 + D3P4 = E3dP/dt + F3
A4P1 + B4P2 + C4P3 + D4P4 = E4dP/dt + F4

The coefficients A through F may be different in each equation and may be composed of different values. If there is no connection between two nodes, then the coefficients describing connections between them will be zero. Nodes are always connected to themselves, so the diagonal terms in the coefficient matrix (elements 1,1, 2,2, 3,3, and 4,4) will always have values. The accumulation term will be written in the form of (accumulation term)(Pnew – Pold)/(increment of time) = (sum of flow effects) + (sum of internal effects). If we express the values and coefficients in matrix form we get the following:

Here we see that the diagonal terms have an accumulation term, flow terms (some of which are driven by external connections), and in some cases internal effects (nodes 1 and 2 which represent the intake and discharge of a centrifugal pump). The off-diagonal nodes only have information about flow terms between connected nodes. Note that node 1 is not connected to node 3 or 4, so those terms are zero, while node 2 is connected to nodes 3 and 4, so those terms are non-zero. The results matrix has the old half of the accumulation term and, if appropriate, driver terms related to external connections.

So, do we see anything special about this matrix? You might notice that the coefficients on opposite sides of the diagonal are the same. When this is true the matrix is said to be symmetric. This is nice because methods have been developed to solve symmetric matrix problems more efficiently than general matrix problems. It’s hard to see in this case but another thing you might notice is that a lot of the elements in the off-diagonal corners are filled with zeros. If those zeros make up nice, triangular patterns the matrix is said to be banded, and more efficient solutions have been developed for those cases as well. I’ve illustrated the banding in the figure below.

If a matrix is both symmetric and banded we can use even more efficient algorithms, if we rearrange the matrices as shown in the figure below.

Let’s walk through this again with a larger system. Let’s say I want to model the flow of heat through a cross-section of material I’m trying to heat up. Each node has an area, an accumulated amount of energy, energy flowing to and from connected nodes (by conduction), and energy flowing to and from the external environment (by radiation and convection).

All of the energy flows are driven by temperature differentials (and other things), so we can write a series of equations for each node in terms of its temperature. The figure below shows a streamlined form of the equations without node or coefficient subscripts. The accumulation terms are represented by the coefficient A, external flows by E, and transport flows by T. I used X as the variable to be solved for, which in this case is temperature. The symmetry and banding are both very visible in this matrix:

In this case we have accumulation and transfer terms in every entry on the main diagonal, as well as external driver terms for the surface nodes. The solution matrix has accumulation terms for each node and external driver terms for each surface node. Some of the surface nodes are missing transfer terms.

If we rearrange to get rid of the superfluous information we end up with the following:

That, as I’ve discussed, looks like it could be solved much more quickly than the original matrix, doesn’t it?

The banding is very clear in this example, and it’s also clear that the configuration of nodes ensures that we have the narrowest bands possible. I proved this to myself once by calculating the narrowest possible band for every permutation of node order. This may not be a very important insight for such a regular and predictable configuration, but for unpredictable and irregular configurations the exercise is definitely worthwhile. You can name the nodes in any order you’d like, but you want to drop them into the matrix in the order that will yield the narrowest bands and the most efficient solution. This was done for the simulators built at Westinghouse. The Interactive Model Builder tool they created would do this automatically, and engineers who understood the technique applied it to their own models. I know I did for the model I wrote for the Offgas system.

I’ve described previously that there are ways to adjust the code to make even this efficient solution technique run faster. I’ll describe that in detail tomorrow.

Posted in Tools and methods | Tagged , , , , , | Leave a comment