Graph Project: Automatically Generating Axis Labels

Up until now the graph object has required that the parameters governing the generation of axis tick labels be specified explicitly. However, it would be nice for the graph to be able to generate reasonable value labels on its own given the high and low ends of the range of data it’s supposed to plot. I decided that a) I wanted to make this happen and b) I’m not overly mad about the way I’ve seen it done by other software.

I have a few more scenarios to test (I haven’t looked at ranges that cross zero, i.e., that have both negative and positive values) but I did come up with a method that should generate roundish values across somewhere between six and eleven cycles.

I began with the observation that the size of the chosen interval should be based on the span of values (the difference between the highest and lowest values in a data set) and not on the magnitude of the values. Basically, I calculate the span, divide it by six, and then massage that value until it looks like a fairly rounded value of the appropriate magnitude. The code listing is shown farther down.
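To make the idea concrete, here is a minimal Python sketch of that calculation. It is not the code listing from this project: the function name rough_interval, the target_cycles parameter, and the choice to snap the result to 1, 2, or 5 times a power of ten are all assumptions made for illustration, since the exact "massaging" could be done in other ways.

```python
import math

def rough_interval(low, high, target_cycles=6):
    """Return a roundish tick interval for the range [low, high].

    Hypothetical helper: divides the span by target_cycles, then snaps
    the result up to 1, 2, 5, or 10 times a power of ten (an assumed
    rounding rule, not necessarily the one used in the graph project).
    """
    span = high - low
    if span <= 0:
        raise ValueError("high must be greater than low")
    raw = span / target_cycles
    # Power of ten at or just below the raw interval.
    magnitude = 10.0 ** math.floor(math.log10(raw))
    fraction = raw / magnitude  # roughly in the range [1, 10)
    for step in (1, 2, 5, 10):
        if fraction <= step:
            return step * magnitude
    # Only reached if rounding error pushes fraction slightly above 10.
    return 10 * magnitude
```

Note that only the span enters the calculation: a range of 0 to 1000 and a range of 5,000,000 to 5,001,000 would get the same interval.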

I wrote some code and tried a few range values but realized I needed to be more systematic and exhaustive, so I created an Excel worksheet that tested combinations of base values from 10^-6 to 10^6 and ranges from 10^-8 to 10^8, with some randomization thrown in for the first significant digit. It implemented the code in spreadsheet form and listed the expected results. You can see the patterns in the image, starting from column T and moving right.

As you can see, some combinations of base and range yield only two or three values for the range, while other combinations yield up to a dozen (and other testing has indicated that more may be possible). I therefore found it necessary to add extra checks that divide the interval if there are too few values and enlarge the interval if there are too many. That said, I also found that the code behaves just a little bit differently than the spreadsheet does. The code gives better results; I think the problem in the spreadsheet is in columns K and L, which correspond to lines 7 to 15 in the code snippet. Even so, the adjustments are occasionally still needed.
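A rough Python sketch of that kind of extra check follows. The names count_cycles and adjust_interval, the 6 to 11 window taken from the target mentioned earlier, and the choice of halving and doubling as the adjustment are assumptions for illustration only, not the project's actual logic.

```python
import math

def count_cycles(low, high, interval):
    """Number of intervals needed to cover [low, high] when both ends
    are pushed outward to the nearest multiple of interval."""
    first = math.floor(low / interval)
    last = math.ceil(high / interval)
    return last - first

def adjust_interval(low, high, interval,
                    min_cycles=6, max_cycles=11, max_passes=20):
    """Halve the interval while there are too few cycles and double it
    while there are too many. max_passes is a safety limit."""
    for _ in range(max_passes):
        cycles = count_cycles(low, high, interval)
        if cycles < min_cycles:
            interval /= 2   # too few values: use a finer interval
        elif cycles > max_cycles:
            interval *= 2   # too many values: use a coarser interval
        else:
            break
    return interval
```

One trade-off with plain halving and doubling is that the result is not always as round as the starting value (500 becomes 250 and then 125, for example), so a real implementation might prefer to step through a fixed list of round values instead.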

So far I’ve just written the code to generate the range values but I have not yet extended this to draw the values generated. That’s going to be interesting because the calculations generate unexpected results when some of the values cannot be represented exactly in binary floating point. Who can tell what’s going to happen when you think you’re supposed to get 0.002999, 0.003009, and 0.000001 but you actually get 0.0029990000000000004, 0.0030080000000000016, and 0.0000010000000000000002? Another issue is that combinations of very large and very small numbers (e.g., 4,700,000,000.0000007) cannot be represented exactly; the least significant digits get lost.
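Both effects are easy to reproduce. The Python sketch below assumes standard double-precision floats; the helper names ticks_by_addition and ticks_by_index are made up for this illustration. It compares building tick values by repeated addition with computing each one from its index and rounding to a known number of decimals, which is one possible workaround rather than the approach settled on here.

```python
def ticks_by_addition(start, interval, count):
    """Build tick values by repeatedly adding the interval.
    Rounding error accumulates from one tick to the next."""
    ticks = []
    value = start
    for _ in range(count):
        ticks.append(value)
        value += interval
    return ticks

def ticks_by_index(start, interval, count, decimals):
    """Compute each tick from its index and round it to a known number
    of decimals, so errors do not build up across ticks."""
    return [round(start + i * interval, decimals) for i in range(count)]

# The fourth value drifts away from 0.3 when built by addition.
drifted = ticks_by_addition(0.0, 0.1, 4)
clean = ticks_by_index(0.0, 0.1, 4, 1)

# A large value with a tiny fractional part: near 4.7 billion, adjacent
# doubles are 2**-20 (about 0.00000095) apart, so the trailing digits
# snap to the nearest representable value.
big = 4_700_000_000.0000007
gap = big - 4_700_000_000.0
```

Here drifted[3] differs from 0.3 while clean[3] equals it, and gap comes out as exactly 2**-20 rather than 0.0000007.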

Annoyances like this came up when I wrote code to generate graphs in the early 90s and they come up now, but that’s part of the game, isn’t it? Computers do what they do and you have to work around that. The formatting routines for displaying the tick values may or may not take care of these issues so we’ll see how it goes. If they don’t, then I’m going to add some extra manipulations.
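As a sketch of how a formatting routine might hide the stray digits, the Python snippet below picks the number of decimal places from the interval and formats each tick value to that precision. The function names decimals_for and format_tick are hypothetical, and the approach assumes the interval has a single significant digit (1, 2, or 5 times a power of ten); an interval such as 0.25 would need one more decimal place than this gives it.

```python
import math

def decimals_for(interval):
    """Decimal places needed to show an interval that has one
    significant digit. The small epsilon keeps floating-point noise in
    the logarithm from adding an extra decimal place."""
    return max(0, math.ceil(-math.log10(interval) - 1e-9))

def format_tick(value, interval):
    """Format a tick value with just enough decimals for the interval."""
    return f"{value:.{decimals_for(interval)}f}"
```

With this, a computed value of 0.0029990000000000004 at an interval of 0.000001 is displayed as 0.002999, and 1200.0 at an interval of 200 is displayed as 1200.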

I’ll keep testing this going forward and will describe any further modifications I identify.
