Yesterday I described how to set up a matrix for efficient solution, today I’ll describe how to actually execute the solution. I plan to go back and revisit the intermediate steps but I wanted to show how this series of techniques pays off. I also wanted to leverage this work to describe my feel for the amount of computing power that might be available for JavaScript execution on various computing platforms, which discussion started on Monday.

Here is an implementation of the symmetric banded matrix solution written in C, which is very easily converted into similar languages (Java, JavaScript, etc.). I chose this example because it was working code from a project I did back in the 90s, and I know it’s correct. A few things can be done to make it more compact (using the += formation, for instance) and there may be other clever rearrangements that take advantage of unique aspects of various languages, compilers, implementations, and so on, but mostly it’s ok to let the compilers do their magic on the code as it is. I also chose C because I can write text out to a file easily. The same would be true if I wrote the code in Java or any other language and ran it directly so it could write to disk. This could also be done in JavaScript but you’d have to figure out what to do with the output (here’s a discussion of one possibility).

This code is also repurposed from the FORTRAN I originally used at Westinghouse while building simulator models. That’s why you see all the capital letters and really short control variable names. The names of the constants were my own twist on Hungarian notation, which has been depopularized in favor of CamelCase names that don’t try to say anything about data types, since refactoring and other operations may override whatever type you were using to begin with. There is also the potential for the default size of standard variables types to vary on different platforms or implementations. The policy on the simulator was that the first letter of a variable represented a specific property (e.g., P=pressure, T=temperature, Y=admittance, X=concentration, V=volume). The next three letters represented the various plant systems (e.g., RWU = Reactor Water Cleanup, OGS = Offgas). That left us four whole letters to differentiate the various examples of each type (say, temperatures) in a system. It was common for systems of the era to limit variable names to eight characters, not to mention 8.3 file names or something similar. I adopted that basic system for my own thermodynamic models and control systems while I worked at Bricmont. The “itc” prefix meant short **i**nteger, **t**hermal model, **c**onstant. To that I might have added an “a” for array. “u” as the first letter would indicate a structure (or record, Java and JavaScript just use objects), “r” was for floating point numbers, “f” for double-precision floats, “s” for character strings, “l” for logical/boolean, and so on.

There’s one more bizarre thing I did here. C, Java, and the like use zero-based arrays while Pascal and FORTRAN use one based arrays (Pascal’s situation is even more complex but…). When I translated this to C I preserved its “one-basedness.” It shouldn’t be too much trouble to update.

Here’s the solution code in its most basic form. The ZMATX array is the 2-dimensional coefficient matrix and ZSRCE is the 1-dimensional source or driver matrix. This matrix will hold the results when the process completes. Both matrices must be loaded as described yesterday before the function is run. The ZAUX matrix is used on a temporary basis during execution.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 |
const short itcNodes = 147; //it wasn't 49 nodes in a 7x7 configuration, it was 21x7 and 147 nodes. const short itcBandWid = 8; //this is the narrow number of nodes on a side plus one float raZMATX[itcNodes+1][itcBandWid+1]; float raZSRCE[itcNodes+1]; float raZAUX[itcNodes+1]; void CalcMatrix(void) { float C; short N1,K,K1,K2,K3,NI,H,I,J; N1 = itcNodes - 1; for (K = 1; K <= N1; K++) { C = raZMATX[K][1]; K1 = K + 1; NI = K1 + itcBandWid - 2; H = NI; if (H > itcNodes) H = itcNodes; for (J = 2; J <= itcBandWid; J++) raZAUX[J] = raZMATX[K][J]; for (J = K1; J <= H; J++) { K2 = J - K + 1; raZMATX[K][K2] = raZMATX[K][K2]/C; } raZSRCE[K] = raZSRCE[K]/C; for (I = K1; I <= H; I++) { K2 = I - K1 + 2; C = raZAUX[K2]; if (fabs(C) > 1.0E-10) { for (J = I; J <= H; J++) { K2 = J - I + 1; K3 = J - K + 1; raZMATX[I][K2] = raZMATX[I][K2] - C*raZMATX[K][K3]; } raZSRCE[I] = raZSRCE[I] - C*raZSRCE[K]; } } } raZSRCE[itcNodes] = raZSRCE[itcNodes]/raZMATX[itcNodes][1]; for (I = 1; I <= N1; I++) { K = itcNodes - I; K1 = K + 1; NI = K1 + itcBandWid - 2; if (NI < itcNodes) H = NI; else H = itcNodes; for (J = K1; J <= H; J++) { K2 = J - K + 1; raZSRCE[K] = raZSRCE[K] - raZMATX[K][K2] * raZSRCE[J]; } } } |

Now comes the trick. In the next block of code I’ve embedded statements that write the previous line of code out to a file, with array indices replaced with the constant values for that iteration. I don’t write out any of the loop or control statements or any of the ancillary calculations. I only write out the statements that do actual matrix calculations.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 |
void MakeMatrix(int iLayer, int iBandWid) { int N1,NI,JJ,KK,K1,K2,K3,II,L,length,err; float C; FILE *f; char s[100]; f = fopen("MATRIX.COD","w+"); N1 = iLayer - 1; for (KK = 1; KK <= N1; KK++) { sprintf(s,"C = raZMATX[%2d][1];\n",KK); length = strlen(s); err = fwrite(&s,length,1,f); sprintf(s,"C = raZMATX[%2d][1];\n",KK); length = strlen(s); err = fwrite(&s,length,1,f); K1 = KK + 1; NI = K1 + iBandWid - 2; L = NI; if (L > iLayer) L = iLayer; for (JJ = 2; JJ <= iBandWid; JJ++) { sprintf(s,"raZAUX[%2d] = raZMATX[%2d][%2d];\n",JJ,KK,JJ); length = strlen(s); err = fwrite(&s,length,1,f); } for (JJ = K1; JJ <= L; JJ++) { K2 = JJ - KK + 1; sprintf(s,"raZMATX[%2d][%2d] /= C;\n",KK,K2); length = strlen(s); err = fwrite(&s,length,1,f); } sprintf(s,"raZSRCE[%2d] = raZSRCE[%2d]/C;\n",KK,KK); length = strlen(s); err = fwrite(&s,length,1,f); for (II = K1; II <= L; II++) { K2 = II - K1 + 2; sprintf(s,"C = raZAUX[%2d];\n",K2); length = strlen(s); err = fwrite(&s,length,1,f); sprintf(s,"if (fabs(C) <= 1.0E-10) C = 0.0\n{\n"); length = strlen(s); err = fwrite(&s,length,1,f); for (JJ = II; JJ <= L; JJ++) { K2 = JJ - II + 1; K3 = JJ - KK + 1; sprintf(s," raZMATX(II,K2) -= C*raZMATX(KK,K3);\n",II,K2,KK,K3); length = strlen(s); err = fwrite(&s,length,1,f); } sprintf(s," raZSRCE[II] -= C*raZSRCE[KK];\n}\n",II,KK); length = strlen(s); err = fwrite(&s,length,1,f); } } sprintf(s,"raZSRCE[%2d] /= raZMATX[%2d][1];\n",iLayer,iLayer); length = strlen(s); err = fwrite(&s,length,1,f); for(II = 1; II <= N1; II++) { KK = iLayer - II; K1 = KK + 1; NI = K1 + iBandWid - 2; sprintf(s,"if (NI < %2d)\n",iLayer); length = strlen(s); err = fwrite(&s,length,1,f); sprintf(s," L = NI;\nelse\n"); length = strlen(s); err = fwrite(&s,length,1,f); sprintf(s," L = %2d;\n",iLayer); length = strlen(s); err = fwrite(&s,length,1,f); for (JJ = K1; JJ <= L; JJ++) { K2 = JJ - KK + 1; sprintf(s,"raZSRCE[KK] -= raZMATX[KK][K2] * raZSRCE[JJ];\n",KK,KK,K2,JJ); length = strlen(s); err = fwrite(&s,length,1,f); } } fclose(f); }; //MakeMatrix; |

Running this function generates the code you can download via this link. It’s over 11,000 lines and over 500k in storage, so it clearly can’t be displayed directly on a web page, either as a listing or as client-side code (JavaScript). This particular method is appropriate for a dedicated computer, or perhaps a server, if the number of concurrent sessions is limited. Note that I’ve also added a bit of wrapper information, but the beginning of the generated code is indicated with a comment. The automated code generator can easily be expanded to include any desired wrapper code and commentary. Also note that this version of extreme or total loop unrolling can be used to realize maximum efficiency, but it is also possible to use these techniques on a limited basis. You’ll still save a respectable amount of time, but at the cost of far less output code. The balance is up to you.

Here are the first hundred-plus lines of the output as an illustration:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 |
C = raZMATX[ 1][1]; raZAUX[ 2] = raZMATX[ 1][ 2]; raZAUX[ 3] = raZMATX[ 1][ 3]; raZAUX[ 4] = raZMATX[ 1][ 4]; raZAUX[ 5] = raZMATX[ 1][ 5]; raZAUX[ 6] = raZMATX[ 1][ 6]; raZAUX[ 7] = raZMATX[ 1][ 7]; raZAUX[ 8] = raZMATX[ 1][ 8]; raZMATX[ 1][ 2] = raZMATX[ 1][ 2]/C; raZMATX[ 1][ 3] = raZMATX[ 1][ 3]/C; raZMATX[ 1][ 4] = raZMATX[ 1][ 4]/C; raZMATX[ 1][ 5] = raZMATX[ 1][ 5]/C; raZMATX[ 1][ 6] = raZMATX[ 1][ 6]/C; raZMATX[ 1][ 7] = raZMATX[ 1][ 7]/C; raZMATX[ 1][ 8] = raZMATX[ 1][ 8]/C; raZSRCE[ 1] = raZSRCE[ 1]/C; C = raZAUX[ 2]; if ((C > 1.0E-10) || (C < -1.0E-10)) { raZMATX[ 2][ 1] = raZMATX[ 2][ 1] - C*raZMATX[ 1][ 2]; raZMATX[ 2][ 2] = raZMATX[ 2][ 2] - C*raZMATX[ 1][ 3]; raZMATX[ 2][ 3] = raZMATX[ 2][ 3] - C*raZMATX[ 1][ 4]; raZMATX[ 2][ 4] = raZMATX[ 2][ 4] - C*raZMATX[ 1][ 5]; raZMATX[ 2][ 5] = raZMATX[ 2][ 5] - C*raZMATX[ 1][ 6]; raZMATX[ 2][ 6] = raZMATX[ 2][ 6] - C*raZMATX[ 1][ 7]; raZMATX[ 2][ 7] = raZMATX[ 2][ 7] - C*raZMATX[ 1][ 8]; raZSRCE[ 2] = raZSRCE[ 2] - C*raZSRCE[ 1]; } C = raZAUX[ 3]; if ((C > 1.0E-10) || (C < -1.0E-10)) { raZMATX[ 3][ 1] = raZMATX[ 3][ 1] - C*raZMATX[ 1][ 3]; raZMATX[ 3][ 2] = raZMATX[ 3][ 2] - C*raZMATX[ 1][ 4]; raZMATX[ 3][ 3] = raZMATX[ 3][ 3] - C*raZMATX[ 1][ 5]; raZMATX[ 3][ 4] = raZMATX[ 3][ 4] - C*raZMATX[ 1][ 6]; raZMATX[ 3][ 5] = raZMATX[ 3][ 5] - C*raZMATX[ 1][ 7]; raZMATX[ 3][ 6] = raZMATX[ 3][ 6] - C*raZMATX[ 1][ 8]; raZSRCE[ 3] = raZSRCE[ 3] - C*raZSRCE[ 1]; } C = raZAUX[ 4]; if ((C > 1.0E-10) || (C < -1.0E-10)) { raZMATX[ 4][ 1] = raZMATX[ 4][ 1] - C*raZMATX[ 1][ 4]; raZMATX[ 4][ 2] = raZMATX[ 4][ 2] - C*raZMATX[ 1][ 5]; raZMATX[ 4][ 3] = raZMATX[ 4][ 3] - C*raZMATX[ 1][ 6]; raZMATX[ 4][ 4] = raZMATX[ 4][ 4] - C*raZMATX[ 1][ 7]; raZMATX[ 4][ 5] = raZMATX[ 4][ 5] - C*raZMATX[ 1][ 8]; raZSRCE[ 4] = raZSRCE[ 4] - C*raZSRCE[ 1]; } C = raZAUX[ 5]; if ((C > 1.0E-10) || (C < -1.0E-10)) { raZMATX[ 5][ 1] = raZMATX[ 5][ 1] - C*raZMATX[ 1][ 5]; raZMATX[ 5][ 2] = raZMATX[ 5][ 2] - C*raZMATX[ 1][ 6]; raZMATX[ 5][ 3] = raZMATX[ 5][ 3] - C*raZMATX[ 1][ 7]; raZMATX[ 5][ 4] = raZMATX[ 5][ 4] - C*raZMATX[ 1][ 8]; raZSRCE[ 5] = raZSRCE[ 5] - C*raZSRCE[ 1]; } C = raZAUX[ 6]; if ((C > 1.0E-10) || (C < -1.0E-10)) { raZMATX[ 6][ 1] = raZMATX[ 6][ 1] - C*raZMATX[ 1][ 6]; raZMATX[ 6][ 2] = raZMATX[ 6][ 2] - C*raZMATX[ 1][ 7]; raZMATX[ 6][ 3] = raZMATX[ 6][ 3] - C*raZMATX[ 1][ 8]; raZSRCE[ 6] = raZSRCE[ 6] - C*raZSRCE[ 1]; } C = raZAUX[ 7]; if ((C > 1.0E-10) || (C < -1.0E-10)) { raZMATX[ 7][ 1] = raZMATX[ 7][ 1] - C*raZMATX[ 1][ 7]; raZMATX[ 7][ 2] = raZMATX[ 7][ 2] - C*raZMATX[ 1][ 8]; raZSRCE[ 7] = raZSRCE[ 7] - C*raZSRCE[ 1]; } C = raZAUX[ 8]; if ((C > 1.0E-10) || (C < -1.0E-10)) { raZMATX[ 8][ 1] = raZMATX[ 8][ 1] - C*raZMATX[ 1][ 8]; raZSRCE[ 8] = raZSRCE[ 8] - C*raZSRCE[ 1]; } C = raZMATX[ 2][1]; raZAUX[ 2] = raZMATX[ 2][ 2]; raZAUX[ 3] = raZMATX[ 2][ 3]; raZAUX[ 4] = raZMATX[ 2][ 4]; raZAUX[ 5] = raZMATX[ 2][ 5]; raZAUX[ 6] = raZMATX[ 2][ 6]; raZAUX[ 7] = raZMATX[ 2][ 7]; raZAUX[ 8] = raZMATX[ 2][ 8]; raZMATX[ 2][ 2] = raZMATX[ 2][ 2]/C; raZMATX[ 2][ 3] = raZMATX[ 2][ 3]/C; raZMATX[ 2][ 4] = raZMATX[ 2][ 4]/C; raZMATX[ 2][ 5] = raZMATX[ 2][ 5]/C; raZMATX[ 2][ 6] = raZMATX[ 2][ 6]/C; raZMATX[ 2][ 7] = raZMATX[ 2][ 7]/C; raZMATX[ 2][ 8] = raZMATX[ 2][ 8]/C; raZSRCE[ 2] = raZSRCE[ 2]/C; C = raZAUX[ 2]; if ((C > 1.0E-10) || (C < -1.0E-10)) { raZMATX[ 3][ 1] = raZMATX[ 3][ 1] - C*raZMATX[ 2][ 2]; raZMATX[ 3][ 2] = raZMATX[ 3][ 2] - C*raZMATX[ 2][ 3]; raZMATX[ 3][ 3] = raZMATX[ 3][ 3] - C*raZMATX[ 2][ 4]; raZMATX[ 3][ 4] = raZMATX[ 3][ 4] - C*raZMATX[ 2][ 5]; raZMATX[ 3][ 5] = raZMATX[ 3][ 5] - C*raZMATX[ 2][ 6]; raZMATX[ 3][ 6] = raZMATX[ 3][ 6] - C*raZMATX[ 2][ 7]; raZMATX[ 3][ 7] = raZMATX[ 3][ 7] - C*raZMATX[ 2][ 8]; raZSRCE[ 3] = raZSRCE[ 3] - C*raZSRCE[ 2]; } |

Writing the code out longhand this way saves all of the extra calculations associated with incrementing loop variables, jumping to the beginning of loops, and calculating the indirect indices of the various array variables. This cuts execution time for array solutions by thirty percent. Since matrix calculations make up the bulk of the model system’s running time, these modifications represent the maximum performance improvement possible. That means there was no point in trying to identify major performance improvements elsewhere in the system. It was more important, then, to make sure the rest of the code was clear and maintainable.