I recently wrote about how data is collected and used in the different phases of my business analysis framework. After giving the most recent version of my presentation on the subject I was asked for clarification about how the data is used, so I wanted to write about that today.
I want to start by pointing out that data comes in many forms. It can be highly numeric, which is more the case for simulations and physical systems, and it can be highly descriptive, which is often more the case for business systems. Make no mistake, though, it’s all data. I’ll describe how data came into play in several different jobs I did to illustrate this point.
Sprout-Bauer (now Andritz) (details)
My first engineering job was as a process engineer for an engineering and services firm serving the pulp and paper industry. Our company (which was part of Combustion Engineering at the time, before being acquired by Andritz) sold turnkey mechanical pulping lines based on thermomechanical refiners. I did heat and material balances and drawings for sales proposals, and pulp quality and mill audits to show that we were making our quality and quantity guarantees and to serve as a basis for making process improvement recommendations. Data came in three major forms:
- Pulp Characteristics: Quite a few of these are defined in categories like freeness (Canadian Standard Freeness is approximately the coolest empirical measure of anything, ever!), fiber properties, strength properties, chemical composition, optical properties, and cleanliness. We’d assess the effectiveness of the process by tracking the progression of about twenty of these properties at various points in the production line(s). I spent my first month in the company’s research facility in Springfield, Ohio learning about the properties they typically used. It seems that a lot of these measures have been automated, which must be really helpful for analysts at the plants. It used to be that I’d go to a plant to draw samples from various points in the process (you’d have to incrementally fill buckets at sample ports about hourly over the course of a day), then dewater them, seal them in plastic bags, label them, and ship them off to Springfield, where the lab techs would analyze everything and report the results. Different species of trees and hard and soft woods required different processing as well, and that was all part of the art and science of designing and analyzing pulping processes. One time we had to send wood chips back in 55-gallon drums. Somehow this involved huge bags and a pulley hanging over the side of a 100-foot-high building. My partners held me by the back of my belt as I leaned out to move the pulley in closer so we could feed a rope through it. So yeah, data.
- Process volumes and contents: Pulp-making is a continuous process so everything is expressed on a rate basis. For example, if the plant was supposed to produce 280 air dried metric tons per day it might have a process flow somewhere with 30,000 gallons per minute at 27% consistency (the percentage of the mass flow composed of wood fiber with the remainder being steam, a few noncondensable gases, chemicals like liquors and bleaches, and some dirt or other junk). Don’t do the math, I’m just giving examples here. The flow conditions also included temperatures (based on total energy or enthalpy), and pressures, which allowed calculation of the volume flows and thus the equipment and pipe sizing needed to support the desired flow rates. The thermodynamic properties of water (liquid and gaseous) are a further class of data needed to perform these calculations. They’ve been compiled by researchers over the years. The behavior of flow through valves, fittings, and pipe is another form of data that has been compiled over time.
- The specifications and sizes of different pieces of equipment were also part of the data describing each system. Many pieces of equipment came in standard sizes because it was too difficult to make custom-sized versions. This was especially true of pulp refiners, which came in several different configurations. Other items were custom made for each application. Examples of these included conveyors, screw de-watering presses, and liquid phase separators. Some items, like screens and cleaners, were made in standard sizes and various numbers of them were used in parallel to support the desired flow rates. Moreover, the screens and cleaners would often be arranged in multiple phases. I didn’t calculate flows based on equipment sizes for the most part, I calculated them based on the need to produce a certain amount of pulp. The equipment and piping would later be sized to support the specified flows.
- The fourth item in this three-item list is money. In the end, every designed process had to be analyzed in terms of installed, fixed, and operating costs vs. benefits from sales. I didn’t do those calculations as a young engineer but they were definitely done as we and our customers worked out the business cases. I certainly saw how our proposals were put together and had a decent idea of what things cost. I’d learn a lot more about how those things are done in later jobs.
All these data elements have to be obtained through observation, testing, research, and elicitation (and sometimes negotiation), and all must be understood to analyze the process.
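To make the rate-basis idea concrete, here is a minimal sketch of the kind of mass balance described above. It uses the 280 air dried metric tons per day and the 27% consistency from the example; the 30,000 gallons per minute figure is not used. The 90% oven-dry convention for "air dried" pulp and the function name are assumptions made for illustration, not details from the original proposals.

```python
# Assumption: "air dried" pulp is taken as 90% oven-dry fiber by mass.
AIR_DRY_FIBER_FRACTION = 0.90


def total_mass_flow_t_per_day(air_dried_t_per_day, consistency):
    """Total stream mass flow (t/day) needed to carry the given production.

    consistency is the fraction of the mass flow that is wood fiber.
    """
    if not 0.0 < consistency <= 1.0:
        raise ValueError("consistency must be a fraction in (0, 1]")
    oven_dry_fiber = air_dried_t_per_day * AIR_DRY_FIBER_FRACTION
    return oven_dry_fiber / consistency


production = 280.0                                  # air dried t/day, from the text
fiber = production * AIR_DRY_FIBER_FRACTION         # oven-dry fiber, t/day
stream = total_mass_flow_t_per_day(production, 0.27)
non_fiber = stream - fiber                          # steam, gases, chemicals, dirt

print(f"fiber {fiber:.1f} t/day, total {stream:.1f} t/day, other {non_fiber:.1f} t/day")
```

Converting the total mass flow to a volume flow would additionally need the density of the stream at its temperature and pressure, which is where the compiled thermodynamic properties come in.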
Westinghouse Nuclear Simulator Division (details)
Here I leveraged a lot of the experience I gained analyzing fluid flows and doing thermodynamic analyses in the paper industry. Examples of how I/we incorporated thermodynamic properties are here, here, and here. In this case the discovery was done for the modellers in that the elements to be simulated were already identified. This meant that we started with data collection, which we performed in two phases. We started by visiting the plant control room and recording the readings on every dial and indicator and the position of every switch, button, and dial. This gave us an indication of a few of the flows, pressures, and temperatures at different points in the system. The remainder of those values had to be calculated based on the equipment layouts and the properties of the fluids.
- Flow characteristics: These were mostly based on the physical properties of water and steam but we sometimes had to consider noncondensables, especially when they made up the bulk of the flow, as they did in the building spaces and the offgas system I worked on. We also had to consider concentrations of particulates like boron and radioactive elements. The radiation was tracked as an abstract emittance level that decayed over time. We didn’t worry about the different kinds of radiation and the particles associated with them. (As much as I’ve thought about this work in the years since I did it I find it fascinating that I never really “got” this detail until just now as I’m writing this.) As mentioned above, the thermodynamic properties of the relevant fluids have all been discovered and compiled over the years.
- Process volumes and contents: The flow rates were crucial. They were driven by pressure differentials and pump characteristics and affected by the equipment the fluid flowed through.
- The specifications and sizes of different pieces of equipment were also part of the data describing each system. We needed to do detailed research through a library of ten thousand documents to learn the dimensions and behavior of all the pipes, equipment items, and even rooms in the containment structure.
Beyond the variables describing process states and equipment characteristics, the simulation equations required a huge number of coefficients. These all had to be calculated from the steady-state values of the different process conditions. I spent so much time doing calculations and updating documents that I found it necessary to create a tool to manage and automate the process.
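As a hedged illustration of deriving a coefficient from steady-state values, the sketch below assumes a simple flow relation in which flow is proportional to the square root of the pressure drop. That form, the function names, and the numbers are assumptions chosen for the example; the actual simulator equations are not given here.

```python
import math


def flow_coefficient(steady_flow, steady_dp):
    """Back out K from steady-state data, assuming flow = K * sqrt(dp)."""
    if steady_dp <= 0.0:
        raise ValueError("steady-state pressure drop must be positive")
    return steady_flow / math.sqrt(steady_dp)


def flow_from_dp(k, dp):
    """Flow at an off-design pressure drop; the sign follows the sign of dp."""
    return math.copysign(k * math.sqrt(abs(dp)), dp)


# Hypothetical steady-state reading: 120 flow units at a 16-unit pressure drop.
k = flow_coefficient(steady_flow=120.0, steady_dp=16.0)

print(k)                      # coefficient fixed once from steady state
print(flow_from_dp(k, 4.0))   # flow when the pressure drop falls to a quarter
```

The point is only that each coefficient is pinned down once from known steady-state conditions and then reused as the process conditions move away from them.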
Another important usage of data was in the interfaces between models. These had to be negotiated and documented, and the different models had to be scheduled to run in a way that would minimize race conditions as each model updated its calculations in real-time.
In this position I did the kind of business process analysis and name-address-phone number-account number programming I’d been trying to avoid, since I was a hard core mechanical engineer, and all. Who knew I’d learn so much and end up loving it? This position’s contrast to most others I worked in the first half of my career taught me more about performing purposeful business analysis than any other single thing I did. I’m not sure I understood the oeuvre as a whole at the time, but it certainly gave me a solid grounding and a lot of confidence for things I did later. Here I write about how I integrated various insights and experiences over time to become the analyst and designer that I am today.
The FileNet document imaging system is used to scan documents so their images can be moved around almost for free while the original hardcopies are warehoused. We’d do a discovery process to map an organization’s activities, say, the disability underwriting section of an insurance company, to find out what was going on. We interviewed representatives of the groups of workers in each process area to identify each of the possible actions they could take in response to receipt of any type of document or group of documents. This gave us the nouns and verbs of the process. Once we knew that, we’d gather up the adjectives of the process, the data that describe the activities, the entities processed (the documents), and the results generated. We gathered the necessary data mostly through interviews and reviews of historical records.
The first phase of the effort involved a cost-benefit analysis that only assessed the volumes and process times associated with the section’s activities. Since this was an estimate, we collected process times through a minimum of observation and from descriptions given by SMEs. As a cross-check we reviewed whether our findings made sense in light of what we knew about how many documents were processed daily by each group of workers and the amount of time taken per action. Since the total amount of time spent per day per worker usually came to just around eight hours, we assumed our estimates were on target.
The next step was to identify which actions no longer needed to be carried out by workers, since all operations involving the physical documents were automated. We also estimated the time needed for the new actions of scanning and indexing the documents as they arrived. Finally, given assumptions for average pay rates for each class of worker, we were able to calculate the cost of running the As-Is process and the To-Be automated process and estimate the total savings that would be realized. We ended up having a long discussion about whether we’d save one minute per review or two minutes per review of collated customer files by the actual underwriting analysts, which was the most important single activity in the entire process. We ultimately determined that we could make a sufficient economic case for the FileNet solution by assuming a time savings of only one minute per review. The customer engaged two competitors, each of whom performed similar analyses, and our solution was judged to realize the greater net savings, about thirty percent per year on labor costs.
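The arithmetic behind that cost-benefit analysis can be sketched roughly as follows. Every number, worker class, and action name below is hypothetical and chosen only so the totals are easy to follow; the sketch shows the cross-check of hours per worker and the As-Is versus To-Be cost comparison, including a one-minute saving per underwriter review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    worker_class: str
    docs_per_day: int       # volume handled by the whole group each day
    minutes_as_is: float    # time per document in the current process
    minutes_to_be: float    # time per document after automation (0 if eliminated)


# Hypothetical staffing and pay assumptions.
HEADCOUNT = {"clerk": 2, "underwriter": 4}
HOURLY_RATE = {"clerk": 20.0, "underwriter": 40.0}

ACTIONS = [
    Action("sort and route paper", "clerk", 600, 1.5, 0.0),
    Action("scan and index", "clerk", 600, 0.0, 0.5),
    Action("review collated file", "underwriter", 120, 15.0, 14.0),
]


def hours_per_worker(actions, minutes_field):
    """Cross-check: daily hours per worker in each class (should be near 8)."""
    minutes = {}
    for a in actions:
        spent = a.docs_per_day * getattr(a, minutes_field)
        minutes[a.worker_class] = minutes.get(a.worker_class, 0.0) + spent
    return {cls: m / 60.0 / HEADCOUNT[cls] for cls, m in minutes.items()}


def daily_cost(actions, minutes_field):
    """Total daily labor cost for the chosen process version."""
    return sum(
        a.docs_per_day * getattr(a, minutes_field) / 60.0 * HOURLY_RATE[a.worker_class]
        for a in actions
    )


as_is_hours = hours_per_worker(ACTIONS, "minutes_as_is")
as_is_cost = daily_cost(ACTIONS, "minutes_as_is")
to_be_cost = daily_cost(ACTIONS, "minutes_to_be")
savings_fraction = (as_is_cost - to_be_cost) / as_is_cost

print(as_is_hours)
print(f"As-Is {as_is_cost:.2f}, To-Be {to_be_cost:.2f}, savings {savings_fraction:.1%}")
```

Changing `minutes_to_be` for the review action from 14.0 to 13.0 is all it takes to explore the "one minute or two" question, which is why that single assumption drew so much discussion.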
The data items identified and analyzed were similar to those I worked with in my previous positions. They were:
- Document characteristics: The type of document determined how it needed to be processed. The documents had to be collated into patient files, mostly containing medical records, that would be reviewed and scored. This would determine the overall risk for a pool of employees a potential customer company wanted to provide disability coverage for. The insurer’s analysis would determine whether it would agree to provide coverage and what rate would be charged for that pool.
- Process volumes and contents: These flows were defined in terms of documents per day per worker and per operation, with the total number arriving for processing each day being known.
- The number and kind of workers in each department is analogous to the equipment described in the systems above. The groups of workers determined the volume of documents that could be processed and the types of transformations and calculations that could be carried out.
Once the initial phase was completed we examined the documents more closely to determine exactly what information had to be captured from them in order to collate them into files for each employee and how those could be grouped with the correct company pool. This information was to be captured as part of the scanning and indexing operation. The documents would be scanned and automatically assigned a master index number. Then, an index operation would follow which involved reading information identifying an employee so it could be entered into a form on screen. Other information was entered on different screens about the applying company and its employee roster. The scores for each employee file, as assigned by the underwriters, also had to be included. The data items needed to design the user and control screens all had to be identified and characterized.
The work I did at Bricmont was mapped a little bit differently than the work at my previous jobs. It still falls into the three main classifications in a sense but I’m going to describe things differently in this case. For additional background, I describe some of the detailed thermodynamic calculations I perform here.
- Material properties of metals being heated or even melted: As in previous descriptions, the properties of materials are obtained from published research. Examples of properties determined as a function of temperature are thermal conductivity and specific heat capacity. Examples of properties that remained constant were density, the emissivity of (steel) workpieces, and the Stefan-Boltzmann constant.
- Geometry of furnaces and metal workpieces being heated: The geometry of each workpiece determines the layout of the nodal network within it. The geometry of the furnace determines how heat, and therefore temperature, is distributed at different locations. The location of workpieces relative to each other determines the amount of heat radiation that can be transferred to different sections of the surface of the workpieces (this obviously doesn’t apply for heating by electrical induction). This determines viewing angles and shadows cast.
- Temperatures and energy inputs: Energy is transferred from furnaces to workpieces usually by radiative heat transfer, except in the cases where electric induction heating is used. Heat transfer is a function of temperature differential (technically the difference between the fourth power of the absolute temperature of the furnace and the fourth power of the absolute temperature of the workpiece) for radiative heating methods and a function of the electrical inputs minus losses for inductive methods.
- Contents of messages received from and sent to external systems: Information received from external systems included messages about the nature of workpieces or materials loaded into a furnace, the values of instrument readings (especially thermocouples that measure temperature), other physical indicators that result in the movement of workpieces through the furnace, and permissions to discharge heated workpieces to the rolling mill, if applicable. Information forwarded to other systems included messages to casters or slab storage to send new workpieces, messages to the rolling mill about the workpiece being discharged, messages to the low-level control system defining what the new temperature or movement setpoints should be, and messages to higher-level administrative and analytic systems about all activities.
- Data logged and retrieved for historical analysis: Furnace control systems stored a wide range of operating, event, and status data that could be retrieved for further analysis.
The messages relayed between systems employed a wide variety of inter-process communication methods.
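The fourth-power relationship mentioned under temperatures and energy inputs can be sketched in a few lines. This is a simplified gray-body expression that ignores the viewing angles and shadows described in the geometry item; the emissivity of 0.8 and the temperatures are assumed illustrative values, not figures from an actual furnace model.

```python
STEFAN_BOLTZMANN = 5.670374419e-8  # W / (m^2 K^4)


def radiative_heat_flux(t_furnace_k, t_workpiece_k, emissivity=0.8):
    """Net radiative flux to the workpiece surface in W/m^2.

    Temperatures are absolute (kelvin). The emissivity default is an
    assumed value, and view factors are deliberately left out.
    """
    return emissivity * STEFAN_BOLTZMANN * (t_furnace_k**4 - t_workpiece_k**4)


# The same 100 K difference transfers far more heat at high temperature.
cold_case = radiative_heat_flux(400.0, 300.0)
hot_case = radiative_heat_flux(1500.0, 1400.0)

print(f"{cold_case:.0f} W/m^2 vs {hot_case:.0f} W/m^2")
```

This is why the simple temperature difference is not enough on its own: the driving force depends on where on the absolute scale the two temperatures sit.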
American Auto-Matrix (details)
In this position I worked with low-level controllers that exchanged messages with each other to control HVAC devices, manage other devices and setpoints, and record and analyze historical data. The platforms I worked on were mostly different from those at previous jobs but in principle they did the same things. This was mostly interesting because of the low-level granularity of the controllers used and the variety of communication protocols employed.
The major difference between the work I did with these two companies and all the previous work I’d done is that I switched from doing continuous simulation to discrete-event simulation. I discuss some of the differences here, though I could always go into more detail. At a high level the projects I worked on incorporated the same three major classes of data as what I’ve described above (characteristics of the items being processed, the flow and content of items being processed, and the characteristics of the subsystems where the processing occurs). However, while discrete-event simulation can be deterministic, its real power comes from being able to analyze stochastic processes. This is accomplished by employing Monte Carlo methods. I described how those work in great detail yesterday.
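A very small sketch can show the difference between a deterministic run and a Monte Carlo study. It models a single processing station with one queue and advances item by item, which is a deliberately tiny stand-in rather than a full event-calendar engine. The exponential distributions, the function names, and all parameter values are assumptions made for illustration.

```python
import random
import statistics


def simulate_mean_wait(n_items, draw_interarrival, draw_service):
    """Mean time items wait in queue at a single first-come, first-served station."""
    wait = 0.0
    total = 0.0
    for _ in range(n_items):
        total += wait
        # The next item's wait grows by this item's service time and
        # shrinks by the gap before the next arrival, never below zero.
        wait = max(0.0, wait + draw_service() - draw_interarrival())
    return total / n_items


def monte_carlo(n_runs, n_items, mean_interarrival, mean_service, seed=0):
    """Repeat the stochastic run many times and summarize the spread."""
    rng = random.Random(seed)
    results = [
        simulate_mean_wait(
            n_items,
            lambda: rng.expovariate(1.0 / mean_interarrival),
            lambda: rng.expovariate(1.0 / mean_service),
        )
        for _ in range(n_runs)
    ]
    return statistics.mean(results), statistics.stdev(results)


# Deterministic case: fixed times, so one run tells the whole story.
fixed = simulate_mean_wait(100, lambda: 1.0, lambda: 0.5)

# Stochastic case: same station, random times, many replications.
mean_wait, spread = monte_carlo(
    n_runs=100, n_items=500, mean_interarrival=1.0, mean_service=0.8, seed=42
)

print(fixed, mean_wait, spread)
```

With fixed times and spare capacity nothing ever waits, while the randomized version produces queues even though the station is, on average, fast enough. That gap is the kind of behavior a deterministic analysis cannot reveal.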
To review, here are the major classes of data that describe a process and are generated by a process:
- Properties of items or materials being processed
  - physical properties of materials that affect processing
  - information content of items that affect processing
  - states of items or materials that affect processing
  - contents of messages that affect processing
- Volumes of items or materials being processed
- Characteristics of equipment or activities doing the processing
- Financial costs and benefits
- Output data, items, materials, behaviors, or decisions generated by the process
- Historical data recorded for later analysis