A Simulationist’s Framework for Business Analysis: Round Four

Yesterday I gave my talk at the Orlando IIBA chapter meetup. The updated presentation is here:

https://www.rpchurchill.com/presentations/zSimFrameForBA_Orlando/SimFrameForBA.html

This presentation was a bit different from previous ones because I’ve begun to add some new material dealing with what I call The Unified Theory of Business Analysis. I call it this because the surveys I’ve conducted (previous combined results are here) show that the contexts in which people perform business analysis can vary widely. This effort has been my attempt to get a handle on the practice and communicate my findings to an audience.

Specifically, I’m coming to the conclusion that the reason things look so different to different practitioners is that most people are only trying to solve a limited problem most of the time. My framework describes a process for conducting a full-scope engagement from beginning to end, and most people — and certainly most business analysts — typically find themselves involved in a limited subset of such efforts.

Let’s look at a possible map of an entire (small) business or physical process.

Let’s also consider the process of completing an end-to-end project as a whole.

The first question we might ask is: How can the 50 BA techniques from the BABOK be applied? We can imagine that some techniques are best applied to certain subparts of an operational process and during certain phases of a project process.

The next question to ask might be Where do the most commonly used software tools fit? Excel appears to be the most common tool BAs use to compile and manipulate information, while Jira and similar systems are used for communication and tracking. Visio, Word, Confluence, Outlook, and SharePoint are also used a lot by BAs.

Another source of confusion might be that a BA does not always participate in all phases of a project. He or she might be involved only in the requirements phase or the user testing phase or the discovery phase or even in a limited subset of phases. I’ve pointed out previously that all projects and efforts perform all of these phases implicitly, but some may be streamlined or omitted as the size of the effort scales down. The entire process takes place implicitly or explicitly even if any one participant or group only sees a small fraction of the activity.

A further consideration is whether the BA is looking at the function of a part of a process or the characteristics of what will perform that function. For example, the abstract part of a solution might involve calculations to be performed, messages to be passed into and out of a process component, and items to be stored and retrieved. The concrete part of a solution might involve determining the qualities the server must have to carry out the abstract functions with sufficient speed and reliability. That is, sometimes the BA will evaluate what something has to do and sometimes what something has to be.

The point is that organizations exist to provide value, and it takes many different kinds of people to make that happen. Business analysts are generally right in the middle of the action.

* * * * *

I also got a few more survey responses, the results of which are reported below.

List 5-8 steps you take during a typical project.

  1. planning
  2. elicitation
  3. requirements
  4. specification writing
  5. QA
  6. UAT
  1. identify the problem
  1. studying subject matter
  2. planning
  3. elicitation
  4. functional specification writing
  5. documentation
  1. identify stakeholders
  2. assess working approach (Waterfall, Agile, Hybrid)
  3. determine current state of requirements and maturity of project vision
  4. interview stakeholders
  5. write and validate requirements
  1. problem definition
  2. value definition
  3. decomposition
  4. dependency analysis
  5. solution assessment
  1. process mapping
  2. stakeholder interviews
  3. write use cases
  4. document requirements
  5. research

List some steps you took on a weird or non-standard project.

  • A description of “weird” usually goes along with a particular person I am working with rather than a project. Some people like things done a certain way or they need things handed to them or their ego stroked. I accommodate all kinds of idiosyncrasies so that I can get the project done on time.
  • data dictionary standardization
  • document requirements after development work has begun
  • For a client who was unable to clearly explain their business processes and where several SMEs had to be consulted to form the whole picture, I drew workflows to identify inputs/outputs, figure out where the gaps in our understanding existed, and identify the common paths and edge cases.
  • investigate vendor-provided code for business process flows
  • reverse code engineering
  • review production incident logs
  • team up with PM to develop a plan to steer the sponsor in the right direction
  • track progress in PowerPoint because the sponsor insisted on it
  • train the team how to read use case diagrams

Name three software tools you use most.

  • Visio (5)
  • Jira (4)
  • Excel (3)
  • Confluence (2)
  • Google Docs (2)
  • email (1)
  • Gephi (dependency graphing) (1)
  • Google Calendar (1)
  • MS Teams (1)
  • OneNote (1)
  • Version One (1)
  • Visual Studio Team Services (now Azure DevOps) (VSTS) (1)
  • Word (1)

Name three non-software techniques you use most.

  • critical questioning
  • critical questioning (ask why five times), funnel questioning
  • data analysis
  • informal planning poker
  • interviews
  • meeting facilitation (prepare an agenda, define goals, manage time wisely, ensure notes are taken and action items documented)
  • observation
  • Post-It notes (For any type of planning or breaking down of a subject, I use different colored Post-Its, writing with a Sharpie, on the wall. This allows me to physically see an idea from any distance. I can also move and categorize at will. When done, take a picture.)
  • process mapping
  • relationship building
  • requirements verification and validation
  • stakeholder analysis
  • stakeholder interviews
  • visual modeling
  • whiteboarding
  • wireframe

Name the goals of a couple of different projects.

  • automate a manual form with a workflow
  • automate the process of return goods authorizations
  • build out end user-owned applications into IT managed services
  • develop a new process to audit projects in flight
  • develop an interface between two systems
  • implement data interface with two systems
  • implement software for a new client
  • implement vendor software with customizations
  • integrate a new application with current systems/vendors
  • merge multiple applications
  • migrate to a new system
  • redesign a system process to match current business needs
  • update the e-commerce portion of a website to accept credit and debit cards

These findings fit in nicely with previously collected survey data.


Computer Simulation

I gave this talk on computer simulation at the Mensa Regional Gathering in Orlando on Sunday, January 20, 2019. The slides for the presentation are here.

I give a brief description of the major types of simulation and discuss some applications of each.


Monitoring System Health and Availability, and Logging: Part 4

One more piece of context must be added to the discussion I’ve written up here, here, and here, and that is the place of these operations in the 7-layer OSI communications model.

The image above is copied from and linked to the appropriate Wikipedia page. The clarification I’m making is that, in general, all of the operations I’m describing with respect to monitoring and logging take place strictly at layer 7, the application layer. This is the level of the communication process that application programmers deal with in most cases, particularly when working on top of protocol stacks like TCP/IP and higher-level protocols like HTTP, even if that code writes specific information into message headers along with the messages.

Some applications will work with communications at lower levels. For example, I’ve worked with serial communications in real-time C++ code triggered by hardware interrupts, where the operations at layers 5 and 6 were handled in the application, while the operations at layers 1 through 4 were handled by default in hardware and firmware; routing wasn’t a consideration because serial is just a point-to-point operation. Even in that case, the monitoring and logging actions are performed (philosophically) at the application layer (layer 7).

Finally, it’s also possible to monitor the configuration of certain items at lower levels. Examples are ports, URLs, IP addresses, security certificates, authorization credentials, machine names, software version numbers (including languages, databases, and operating systems), and other items that may affect the success or failure of communications. Misconfiguration of these items is likely to result in a complete inability to communicate (e.g., incorrect network settings) or strange side effects (e.g., incorrect language versions, especially for virtual machines supporting Java, JavaScript/Node, PHP, and the like).
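To make that concrete, here is a minimal C++ sketch of a startup sanity check over such configuration items. The structure, field names, and checks are illustrative assumptions rather than anything from a particular system.

    // Notional sketch: sanity-checking connection configuration before use.
    // The structure, field names, and checks are illustrative only.
    #include <iostream>
    #include <string>
    #include <vector>

    struct ConnectionConfig {
        std::string hostName;        // machine name or IP address
        int         port = 0;        // TCP port
        std::string baseUrl;         // service endpoint
        std::string certPath;        // security certificate location
        std::string runtimeVersion;  // expected language/VM version
    };

    // Returns human-readable problems; an empty list means the basics look sane.
    std::vector<std::string> validateConfig(const ConnectionConfig& cfg) {
        std::vector<std::string> problems;
        if (cfg.hostName.empty())              problems.push_back("host name not set");
        if (cfg.port <= 0 || cfg.port > 65535) problems.push_back("port out of range");
        if (cfg.baseUrl.empty())               problems.push_back("base URL not set");
        if (cfg.certPath.empty())              problems.push_back("certificate path not set");
        if (cfg.runtimeVersion.empty())        problems.push_back("runtime version not recorded");
        return problems;
    }

    int main() {
        ConnectionConfig cfg;                      // deliberately incomplete example
        cfg.port = 80;
        cfg.baseUrl = "http://example.local/api";  // hypothetical endpoint
        for (const auto& p : validateConfig(cfg))
            std::cout << "config warning: " << p << '\n';
    }

Checks like these won’t catch everything, but they do catch the misconfigurations that otherwise show up as the hard-to-diagnose failures described above.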


Book: Facts and Fallacies of Software Engineering

This week I read the book Facts and Fallacies of Software Engineering by Robert L. Glass, in preparation for attending a new meetup devoted to discussing classic books about software development. The author opines that the information in the book is ageless, and that remains mostly true some fifteen years after its initial release. The book mentions Agile development once in passing, right near the end, and as much as that management technique was meant to address some of the issues discussed in the book, that method’s pioneers have gained a deep appreciation for what Agile (and Scrum) do well and what they don’t do so well. With that in mind they’ve moved on to newer approaches. (One should also bear in mind that every approach should be modified to local conditions and used as a guideline rather than a brittle, formalist approach to be followed to the letter above all.)

One of the observations Glass makes is that the history of software development is littered with new methods, paradigms, and so on, that promise to revolutionize the field by orders of magnitude. The truth, he suggests, is that the innovations, while valuable, tend to improve things by no more than forty percent, a far cry from “orders of magnitude.”

And why is this so? Mostly, it’s because software is incredibly complex, a fact that is too rarely understood or appreciated. This complexity is embedded in every step and aspect of software development and engineering, and an improvement in any individual facet–no matter how great–can have only so much effect on the practice as a whole. Some facets of the process are (my list): identifying customer/business/operational requirements, identifying system requirements, identifying user requirements, language features, low-level computer science methodology (classic, foundational algorithms), interface features, architecture and interface paradigms, testing methodologies, governance schemes, and tools and languages to accomplish all of these.

I spent five years analyzing the logistics of aircraft maintenance and supply, where individual aircraft were represented less as individual entities in their own right and more as collections of, as my mentor described it to me, “several thousand parts flying in close formation!” Of those thousands of parts, each one had a rate at which it needed to be maintained and/or replaced. Even if some parts almost never required maintenance while others required a lot more, the sheer number of parts overall meant that making even major improvements to the reliability or maintainability of a handful of the most “troublesome” parts could have only a limited effect on the overall maintainability and support overhead for any aircraft or group of aircraft. This was often a source of disappointment for our customers, and I can see why. The same must be true of evangelists for new techniques and ideas in software engineering. Nonetheless there will always be evangelists. They have things to sell and they might not know about the historical trajectories of previous ideas.

Still, forty-percent improvements are not trivial by any means, and a long succession of such improvements has had an incredible impact on our ability to produce software. Not only are we producing larger and more complex software systems more quickly, but more people are creating far more of them.

That said, it also remains true that a large proportion of software projects fail, and that happens for many reasons. I can’t remember if Glass says this directly, but I would say that the reasons for most failures don’t have to do so much with the quality or efforts of individual programmers (though the quality of individual teams is quite important, more on that in a bit), but rather have to do with not identifying the correct problem in the first place. I am particularly sensitive to this because, as my website’s tagline suggests, solving the right problem is what I consider to be my “superpower.”

Correctly identifying the problem to be solved entails several parts: understanding the customer’s process, figuring out how to abstract the important aspects of that process so it can be automated and improved, and coming up with a high-level architecture plan in which to implement it. Having a good governance methodology helps as well, but it is rarely the source of failure. The biggest reason for failure of any project, of course, is poor team dynamics, which affects every aspect of the process from discovery and design through implementation. I couldn’t find the exact quote, but I read somewhere that the outcome of battle is usually determined in the minds of the commanders. What’s more, the outcome is often determined before the battle even begins. This is akin to saying that if you don’t know what problem you are trying to solve you are likely doomed before you start.

There is some tension between system architecture on the one hand and flexibility and adaptability on the other. Starting with the architecture seems like a top-down approach, while attempting to use Agile and Scrum methods to elucidate the requirements on the fly seems like a bottom-up approach. It’s not that either approach can’t work in its pure form in limited circumstances, but the tension between the two should be resolved using a hybrid approach. It’s perfectly OK, and even desirable, to have a good concept of the overall architecture as early as possible, so the entire effort can work within that framework. It’s also OK, and also desirable, to continually gather feedback from the customer as the project proceeds so course corrections can be made.

The key, in my mind, is to always come up with a flexible and modular architecture that can be easily adapted to situations as they are identified. This is why I always strive to break down a customer’s process into its most basic components. Once that’s done I can identify common and repetitive themes which can be addressed by common building blocks of functionality. I can then design and implement a system based on the smallest number of building blocks, which can be combined in numerous ways with minimal customization, to address the problems at hand. The more modular the solution, the simpler the building blocks, and the less customization is required, the more the result can be made efficient, robust, flexible, maintainable, and approachable and comprehensible (by both the user and the builder/maintainer). (This does not mean that software can be universalized; some uniqueness is unavoidable and necessary, but that is another topic the book discusses.)

That is always my way of thinking, anyway. So how does this comport with Glass’s observations?

In his discussion of software quality (itself a potentially elusive term) he talks about a list of -ilities: portability, reliability, efficiency, usability, testability, understandability, and modifiability. All of the items on this list match observations I’ve made, with the exception of portability, which turns out to be a special case I’ll revisit shortly. I don’t think this means I’m especially insightful; I think it means that anyone who’s seen and done enough in the field over a long enough period of time is likely to come to similar conclusions.

That said, Glass enumerates 55 facts and ten fallacies. He classifies his facts into categories as follows:

  • Twelve of the facts are simply little known. They haven’t been forgotten; many people haven’t heard of them. But they are, I would assert, fundamentally important.
  • Eleven of them are pretty well accepted, but no one seems to act on them.
  • Eight of them are accepted, but we don’t agree on how–or whether–to fix the problems they represent.
  • Six of them are probably totally accepted by most people, with no controversy and little forgetting.
  • Five of them, many people will flat-out disagree with.
  • Five of them are accepted by many people, but a few wildly disagree, making them quite controversial.

And yes, he knows that doesn’t add up to 55 and explains why.

The interesting thing about his observations is that he provides a context for each and explains the nature of the associated controversies, if there are any. I don’t recall having a strong disagreement with anything he described, which may or may not be meaningful.

In my parenthetical about uniqueness I note that different solutions have to address different needs. This necessarily limits how general any piece of software can be. This also affects portability. Smaller concepts are portable. These have to do with specific data structures, optimizations, and so on. They are often the stuff of pure computer science, at least in its early days. Larger concepts are less portable. It might be a good idea to share lessons about super-efficient sorting mechanisms across languages and platforms, but it’s less necessary for hardcore analytic simulations to run on an iPhone. Portability is not always the Grail, and in many cases it needn’t be worried about much or at all.

Glass also relates the timeless observation that a solution should have as little complexity as possible, but not less than it actually needs. This is a critical point that is obvious to those with experience but not obvious to others. If you need power, and if you need flexibility, and if you need to be able to deal with a wide range of truly different considerations, then you have to include custom approaches to each one. You can generalize as much as possible, but unique processes and properties must be represented as needed. Think of this like compression software (think PKZip): some naive observers have claimed that they can compress and compress and compress a data set down to a single byte. Ahh, but how much information would you lose? Is the process reversible? Clearly, since there are more than 256 unique data sets we might want to compress, we would need more than 256 possible representations in our compressed output. Compression processes make use of repeating patterns in the source data. Once all the substitutions are made, no further compression is possible. The compression stops when the compressed output appears completely random. There are situations where “lossy” compression is acceptable (JPEG is a lossy compression mechanism). In these cases the character of the input is sufficiently maintained in an abridged output, but the compression process is not reversible. A JPEG image still looks pretty good to the viewer, even if it’s not as clear as the original RAW image.

An insight that seems counterintuitive but makes perfect sense once you think about it is this: research suggests that about sixty percent of all software engineering effort goes into maintenance. So far, so good, right? So is this a good thing or a bad thing? It turns out that, of this sixty percent, almost forty percent is devoted to adding new features or making things easier to use. Only about seventeen percent is devoted to fixing things that are outright broken. (The other few percent have to do with migrating to new hardware, hosts, tools, systems, or whatever, as old ones become obsolete.) In this context, performing more and more “maintenance” on a software system turns out to be an indication of its quality. A well-designed system that does its job and can be modified over a long period of time without breaking is evidence that it must be all the things I listed earlier (i.e., modular, efficient, robust, flexible, maintainable, and approachable and comprehensible). That’s a good thing. Poorly designed and brittle systems often don’t get used long enough to absorb a lot of maintenance effort.

As much as I stated that the outcome of battle is often decided in the minds of commanders, I should also point out that there are different levels of commanders. The author does describe how (non-technical) management has to provide an environment for deeply knowledgeable technologists to succeed–and then has to get out of the way. At that point the technological “commanders” have to come up with a good architecture, good algorithms, a good way of working with customers, and so on. The key is to be able to get the various parties to understand and work well with each other. To this extent the quality of the members of the team is extremely important. Now, there is a Pareto distribution (eighty percent of the effect is generated by twenty percent of the sources) in the quality of software practitioners, just like there is a Pareto distribution of competence or effect in every other area of life. Sturgeon’s Law (“My dear sir, ninety percent of everything is crap!”) may even be said to apply. That means that not all practitioners are equal. Some are not just twenty or fifty percent better than others; they might be ten or twenty times better than others. (Don’t even get me started about hiring by unicorn-like lists of specific technologies in lieu of considering experience, ability to learn, and adaptability…)

This has a few ancillary effects, like the observation that (the way I heard it when I worked at Westinghouse) “Nine women can’t have a baby in a month.” This says that throwing bodies at a problem may not only fail to speed up the solution, it may actually slow the solution down. Local knowledge, experience, and continuity among team members are extremely valuable. I’ve always said that I would rather build a team from a smaller number of experienced practitioners than from a large number of less experienced ones. Skill, knowledge, good communication, and trust are extremely important and should be fostered at every opportunity. If you have a lot of turnover you’re probably doing it wrong (either hiring the wrong people or not properly incentivizing the right ones to stick around). This does not mean that some people don’t have to go; some problems you just can’t fix. That happens.

I will say that in the rare instances where I’ve hired people to work on teams of my choosing in situations I understood intimately, I had a very high rate of success (essentially 100%). I achieved this by not being overly specific about the requirements and experience I was looking for. What I was doing was so specific and “long-haired” that I knew I was going to have to teach them a lot when they came on board. What I looked for instead was raw intelligence combined with flexibility and adaptability (e.g., I didn’t care which languages they knew, but I wanted them to know at least two; they didn’t necessarily need to come from the same industry, but they did need to come from a technical background with a certain amount of heft, etc.).

The second section of the book, which describes ten fallacies, was a bit less interesting to me. The most important one, I think, has to do with the mistaken notion of managing only by things you can explicitly measure. While identifying specific, efficient metrics can be very important, there remains a soft aspect of management that cannot be overlooked. I think everyone is in agreement on this.

The bottom line is that the book felt appropriate and familiar to me. Even if I didn’t know all the practitioners, details, and history of every item in the book, certainly most of the material was recognizable based on my own experience, and it all had the ring of verisimilitude or “truthiness.” I think this book is useful for experienced practitioners, to contextualize their own experience, but it may be even more valuable to newer practitioners. They might not get the same things out of it. What they might get instead is a guide to things to look for. Rather than having things seem familiar because you’ve already experienced them, they might seem familiar when you experience them for the first time.

They say that experience is something you get right after you needed it. Wisdom, by contrast, is knowing how to apply knowledge before you need to. Forewarned is forearmed, right? I think this book contains a lot of useful wisdom.


Monitoring System Health and Availability, and Logging: Part 3

Now that we’ve described how to sense error states and something about how to record logging information on systems running multiple processes, we’ll go into some deeper aspects of these activities. We’ll first discuss storing information so errors can be recovered from and reconstructed. We’ll then discuss errors from nested calls in multi-process systems.

Recovering From Errors Without Losing Information and Events

We’ve described how communications work between processes on a single machine and across multiple machines. If an original message is not successfully sent or received for any reason, the operation of the receiving or downstream process will be compromised. If no other related events occur in the downstream process, then the downstream action will not be carried out. If the message is passed in support of some other downstream action that does occur, however, then the downstream action will be carried out with missing information (which might, for example, require the use of manually entered or default values in place of what wasn’t received). An interesting instance of the latter case is manufacturing systems, where a physical workpiece may move from one station to another while the informational messages are not forwarded along with it. This may mean that the workpiece in the downstream station will have to be processed without identifying information, processing instructions, and so on.

There are a few ways to overcome this situation:

  • Multiple retries: This involves re-sending the message from an upstream process to a downstream process until an acknowledgment of receipt (and perhaps completion) comes back to the upstream process. This operation fails when the upstream process itself fails. It may also be limited if new messages must be sent from an upstream process to a downstream process before the previous message is successfully sent.
  • Queueing requests: This involves storing the messages sent downstream so repeated attempts can be made to get them all handled. Storing in a volatile queue (e.g., in memory) may fail if the upstream process fails. Storing in a non-volatile queue (e.g., on disk) is more robust. The use of queues may also be limited if the order in which messages are sent is important, though including timestamp information may overcome those limits.
  • Pushing vs. Pulling: The upstream process can queue and/or retry sending the messages downstream until they all get handled. The downstream system can also fetch current and missed messages from the upstream system. It’s up to the pushing or pulling system to keep track of which actions or messages have been successfully handled and which must still be dealt with. (A sketch of these queueing and retry ideas follows this list.)
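Here is a minimal C++ sketch of the queueing and retry ideas, assuming a non-volatile queue kept in a simple one-message-per-line file. The file name, message text, and the sendToDownstream stand-in are all hypothetical.

    // Sketch of a non-volatile outgoing message queue with ordered retries.
    // One message per line is a simplification; a real system would need
    // framing, timestamps, and probably an idempotency key per message.
    #include <deque>
    #include <fstream>
    #include <string>
    #include <utility>

    // Hypothetical stand-in for the real transport call; returns true only
    // when the downstream process confirms receipt.
    bool sendToDownstream(const std::string& msg) { (void)msg; return false; }

    class OutgoingQueue {
    public:
        explicit OutgoingQueue(std::string path) : path_(std::move(path)) { load(); }

        void enqueue(const std::string& msg) {
            pending_.push_back(msg);
            persist();                    // survives an upstream process failure
        }

        void retryAll() {
            // Preserve original order; stop at the first failure so ordering holds.
            while (!pending_.empty()) {
                if (!sendToDownstream(pending_.front())) break;
                pending_.pop_front();
                persist();                // record each confirmed hand-off
            }
        }

    private:
        void persist() const {
            std::ofstream out(path_, std::ios::trunc);
            for (const auto& m : pending_) out << m << '\n';
        }
        void load() {
            std::ifstream in(path_);
            for (std::string line; std::getline(in, line); ) pending_.push_back(line);
        }
        std::string path_;
        std::deque<std::string> pending_;
    };

    int main() {
        OutgoingQueue queue("outgoing_messages.txt");        // illustrative file name
        queue.enqueue("piece 1234: use reheat profile A");   // made-up message
        queue.retryAll();  // unconfirmed messages stay queued for the next attempt
    }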

There may be special considerations depending on the nature of the system being designed. Some systems are send-only by nature. This may be a function of the communication protocol itself or just a facet of the system’s functional design.

In time-sensitive systems some information or actions may “age out.” This means they might not be able to be used in any meaningful context as events are happening, but keeping the information around may be useful for reconstructing events after the fact. This may be done by hand or in some automated way by functions that continually sweep for unprocessed items that may be correlated with known events.

For example, an upstream process may forward a message to a downstream process in conjunction with a physical workpiece. The message is normally received by the downstream system ahead of the physical workpiece so that it may be associated with the workpiece when it is received by the downstream system. If the message isn’t received before the physical piece, the downstream process may assign a temporary ID and default processing values to the piece. If the downstream process receives the associated message while the physical piece is still being worked on in the downstream process, then it can be properly associated and the correct instructions have a better chance of being carried out. The operating logs of the downstream process can also be edited or appended as needed. If the downstream process receives the associated message after the physical piece has left, then all the downstream system can do is log the information, and possibly pass it down to the next downstream process, in hopes that it will eventually catch up with the physical piece.

Another interesting case arises when the communication (or message- or control-passing) process fails on the return trip, after all of the desired actions were successfully completed downstream. Those downstream actions might include permanent side effects like the writing of database entries and construction of complex, volatile data structures. The queuing/retry mechanisms have to be smart enough to ensure that the desired operations aren’t repeated if they have actually been completed.

A system will ideally be robust enough to ensure that no data or events ever get lost, and that they are all handled exactly the right number of times without duplication. Database systems that adhere to the ACID model have these qualities.

Properly Dealing With Nested Errors

Systems that pass messages or control through nested levels of functionality and then receive responses in return need a message mechanism that clearly indicates what went right or wrong. More to the point, since the happy-path functionality is most likely to work reasonably well, particular care must be taken to communicate a complete contextual description of any errors encountered.

Consider the following function:

A properly constructed function would return detailed errors from every identifiable point of failure, and a general error for any unidentifiable failure. (This drawing should also be expanded to include reporting on calculation and other internal errors.) This generally isn’t difficult in the inline code over which the function has control, but what happens if control is passed to a nested function of some kind? And what if that function is every bit as complex as the calling function? In that case the error that is returned should include information identifying the point of failure in the calling function, with a unique error code and/or text description (how verbose you want to be depends on the system), and within that should be embedded the same information returned from the called function. Doing this gives a form of stack trace for errors (this level of detail generally isn’t needed for successfully traversed happy paths) and a very, very clear understanding of what went wrong, where, and why. If the relevant processes can each perform their own logging they should also do so, particularly if the different bits of functionality reside on different machines, as would be the case in a microservices architecture, but scanning error logs across different systems can be problematic. Being able to record errors at a higher level makes understanding process flows a little more tractable and could save a whole lot of forensic reconstruction of the crime.
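As a minimal C++ sketch of what that wrapped error information might look like (the structure, codes, and messages are made up for illustration):

    // Sketch of nested error context: each layer wraps the error returned by
    // the layer below it, producing a stack-trace-like chain for the logs.
    #include <iostream>
    #include <memory>
    #include <string>

    struct ErrorInfo {
        int code = 0;                      // 0 means success
        std::string where;                 // which function or step failed
        std::string detail;                // human-readable description
        std::shared_ptr<ErrorInfo> cause;  // error returned by the nested call, if any
    };

    // Flatten the chain into a single log line, outermost context first.
    std::string describe(const ErrorInfo& e) {
        std::string text = e.where + " [" + std::to_string(e.code) + "]: " + e.detail;
        if (e.cause) text += " <- " + describe(*e.cause);
        return text;
    }

    // Stand-in for a nested call that fails (code and message are made up).
    ErrorInfo fetchRecord() {
        ErrorInfo e;
        e.code = 404; e.where = "fetchRecord"; e.detail = "record not found in store";
        return e;
    }

    // The calling function wraps the nested failure in its own context.
    ErrorInfo processOrder() {
        ErrorInfo nested = fetchRecord();
        if (nested.code != 0) {
            ErrorInfo wrapped;
            wrapped.code = 2101; wrapped.where = "processOrder";
            wrapped.detail = "could not load order record";
            wrapped.cause = std::make_shared<ErrorInfo>(nested);
            return wrapped;
        }
        return ErrorInfo{};                // success
    }

    int main() {
        std::cout << describe(processOrder()) << '\n';
        // prints the outer context followed by the nested cause
    }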

Another form of nesting is internal operations that deal with multiple items, whether in arrays or in some other kind of grouped structure. This is especially important if separate, complex, brittle, nested operations are to be performed on each item, and where each can either complete or fail to complete with completely different return conditions (and associated error codes and messages). In this case the calling function should return information describing the outcome of processing for each element (especially those that returned errors), so only those items can be queued and/or retried as needed. This can get very complicated if that group of items is processed at several different steps in the calling function, and different items can return different errors not only within a single operation, but across multiple operations. That said, once an item generates an error at an early step, it probably shouldn’t continue to be part of the group being processed at a later step. It should instead be queued and retried at the appropriate level of nested functionality.
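A small C++ sketch of that per-item reporting idea; processOne is a made-up stand-in for the real per-item work, and the error code is arbitrary:

    // Sketch: process a batch and report one outcome per item, so the caller
    // can queue and retry only the items that actually failed.
    #include <iostream>
    #include <string>
    #include <vector>

    struct ItemResult {
        std::string itemId;
        int errorCode = 0;     // 0 = processed successfully
        std::string detail;    // populated only on failure
    };

    // Stand-in for the real per-item work; here every third item "fails".
    ItemResult processOne(const std::string& itemId) {
        static int count = 0;
        ItemResult r;
        r.itemId = itemId;
        if (++count % 3 == 0) { r.errorCode = 501; r.detail = "simulated failure"; }
        return r;
    }

    std::vector<ItemResult> processBatch(const std::vector<std::string>& itemIds) {
        std::vector<ItemResult> results;
        results.reserve(itemIds.size());
        for (const auto& id : itemIds) {
            results.push_back(processOne(id));
            // An item that fails an early step should be excluded from later
            // steps and retried through the queue instead.
        }
        return results;
    }

    int main() {
        for (const auto& r : processBatch({"A-1", "A-2", "A-3", "A-4"}))
            if (r.errorCode != 0)
                std::cout << r.itemId << " failed: " << r.detail << '\n';
    }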

Further Considerations

One more way to clear things up is to break larger functions down into smaller ones where possible. There are arguments for keeping a series of operations in a single function if they make sense logically and in the context of the framework being used, but there are arguments for clarity, simplicity, separation of concerns, modularity, and understandability as well. Whatever choice you make, know that you’re making it and do it on purpose.

If it feels like we’re imposing a lot of overhead to do error checking, monitoring, reporting, and so on, consider the kind of system we might be building. In a tightly controlled simulation system used for analysis, where calculation speed is the most important consideration, the level of monitoring and so on can be greatly reduced if it is known that the system is properly configured. In a production business or manufacturing system, however, the main considerations are going to be robustness, security, and customer service. Execution speed is far less likely to be the overriding consideration. In that case, avoiding the loss of data and events is the main goal of the system’s operation.


Monitoring System Health and Availability, and Logging: Part 2

Continuing yesterday’s discussion of monitoring and logging, I wanted to work through a few specific cases and try to illustrate how things can be sensed in an organized way, and especially how errors of various types can be detected. To that end, let’s start with a slightly simplified situation as shown here, where we have a single main process running on each machine, and the monitoring, communication, and logging are all built in to each process.

As always, functionality on the local machine is straightforward. In this simplified case, things are either working or they aren’t, and that should be reflected in whatever logs are or are not written. Sensing the state and history of the remote machine is what’s interesting.

First, let’s imagine a remote system that supports only one type of communication. That channel must be able to support messaging for many functions, including whatever operations it’s carrying out, remote system administration (if appropriate), reporting on current status, and reporting on historical data. The remote system must be able to interpret the incoming messages so it can reply with an appropriate response. Most importantly it has to be able to sense which of the four kinds of messages it’s receiving. Let’s look at each function in turn.

  • Normal Operations: The incoming message must include enough information to support the desired operation. Messages will include commands, operating parameters, switches, data records, and so on.
  • Remote System Administration: Not much of this will typically happen. If the remote machine is complex and has a full, independent operating system, then it is likely to be administered manually at the machine or through a standard remote interface, different from the operational interface we’re thinking about now. Admin commands using this channel are likely to be few and simple, and include commands like stop, start, reboot, reset communications, and simple things like that. I include this mostly for completeness.
  • Report Current State: This is mostly a way to query and report on the current state of the system. The incoming command is likely to be quite simple. The response will be only as complex as needed to describe the running status of the system and its components. In the case of a single running process as shown here, there might not be much to report. It could be as simple as “Ping!” “Yup!” That said, the standard query may also include possible alarm conditions, process counts, current operating parameters for dashboards, and more.
  • Report Historical Data: This could involve reporting a summary of events that have been logged since the last scan, over a defined time period, or meeting specified criteria. The reply might be lengthy and involve multiple send operations, or may involve sending one or more files back in their entirety. (A dispatch sketch follows this list.)
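A minimal C++ sketch of routing those four kinds of messages through a single channel; the message-type field, handler names, and reply strings are all notional placeholders:

    // Sketch: one communication channel routing the four message categories.
    // The type field, handlers, and reply text are placeholders.
    #include <iostream>
    #include <string>

    enum class MessageType { Operation, AdminCommand, StatusQuery, HistoryQuery, Unknown };

    // Placeholder handlers; real ones would do the work described above.
    std::string handleOperation(const std::string& body) { return "operation accepted: " + body; }
    std::string handleAdmin(const std::string& body)     { return "admin command run: " + body; }
    std::string reportCurrentStatus()                     { return "status: OK, no alarms"; }
    std::string reportHistory(const std::string& crit)    { return "history for: " + crit; }

    std::string dispatch(MessageType type, const std::string& body) {
        switch (type) {
            case MessageType::Operation:    return handleOperation(body);
            case MessageType::AdminCommand: return handleAdmin(body);     // stop, start, reset, ...
            case MessageType::StatusQuery:  return reportCurrentStatus(); // could be as simple as "Yup!"
            case MessageType::HistoryQuery: return reportHistory(body);   // logs since last scan, etc.
            default:                        return "error: unrecognized message type";
        }
    }

    int main() {
        std::cout << dispatch(MessageType::StatusQuery, "") << '\n';
        std::cout << dispatch(MessageType::HistoryQuery, "since last scan") << '\n';
    }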

Some setups may involve a separate communication channel using a different protocol and supporting different functions. Some of this was covered above and yesterday.

Now let’s look at what can be sensed on the local and remote systems in some kind of logical order:

Condition | Current State Sensed | Historical State Sensed
No problems, everything working | Normal status returned, no errors reported | Normal logs written, no errors reported
Local process not running | Current status not accessible or not changing | Local log file has no entries for time period, or file missing
Local process detectable program error | Error condition reported | Error condition logged
Error writing to disk | Error condition detected and reported | Local log file has no entries for time period, or file corrupted or missing
Error packing outgoing message | Can be detected and reported normally | Can be detected and logged normally
Error connecting to remote system (not found / wrong address / can’t be reached, incorrect authentication, etc.) | Error from remote system received and reported (if it is running, else timeout on no connection made) | Error from remote system received and logged (if it is running, else timeout on no connection made)
Error sending to remote system (connection physically or logically down) | Error from remote system received and reported (if it is running, else timeout on no connection made) | Error from remote system received and logged (if it is running, else timeout on no connection made)
Remote system fails to receive message | Request timeout reported | Request timeout logged
Remote system error unpacking message | Error from remote system received and reported | Error from remote system received and logged
Remote system error validating message values | Error from remote system received and reported | Error from remote system received and logged
Remote system error packing reply values | Error from remote system received and reported (if it sends the relevant error) | Error from remote system received and logged (if it sends the relevant error)
Remote system error connecting | Assume this is obviated once message channel open | Assume this is obviated once message channel open
Remote system error sending reply | Request timeout reported | Request timeout logged
Remote system detectable program error | Error from remote system received and reported | Error from remote system received and logged
Remote system error writing to disk | Error from remote system received and reported (if it sends the relevant error) | Entries missing in remote log, or remote file corrupted or missing
Remote system not running (OS or host running) | Error from remote system received and reported if sent by host/OS, otherwise timeout | Error from remote system received and logged if sent by host/OS, otherwise timeout
Remote system not running (entire system down) | Report unable to connect or timeout on reply | Log unable to connect or timeout on reply

Returning to yesterday’s more complex case, if the remote system supports several independent processes and a separate monitoring process, then there are a couple of other cases to consider.

Condition | Current State Sensed | Historical State Sensed
Remote monitor working, individual remote processes generating errors or not running | Normal status returned, relevant process errors reported | Normal status returned, relevant process errors logged
Remote monitor not running, separate from communications | Normal status returned, report monitor counter or heartbeat not updating | Normal status returned, log monitor counter or heartbeat not updating
Remote monitor not running, with embedded communications | Error from remote system received and reported if sent by host/OS, otherwise timeout reported | Error from remote system received and logged if sent by host/OS, otherwise timeout logged

A further complication arises if the remote system is actually a cluster of individual machines supporting a single, scalable process. It is assumed in this case that the cluster management mechanism and its interface allow interactions to proceed the same way as if the system was running on a single machine. Alternatively, the cluster manager will be capable of reporting specialized error messages conveying appropriate information.


Monitoring System Health and Availability, and Logging: Part 1

Ongoing Monitoring, or, What’s Happening Now?

Any system with multiple processes or machines linked by communication channels must address real-time and communication issues. One of the important issues in managing such a system is monitoring it to ensure all of the processes are running and talking to each other. This is a critical (and often overlooked) aspect of system design and operation. This article describes a number of approaches and considerations. Chief among these is maintaining good coverage while keeping the overhead as low as possible. That is, do no less than you need to do, but no more.

First let me start with a system diagram I’ve probably been overfond of sharing. It describes a Level 2, model-predictive, supervisory control system of a type I implemented many times for gas-fired reheat furnaces in steel mills all over North America and Asia.

The blue boxes represent the different running processes that made up the Level 2 system, most of which communicated via a shared memory area reserved by the Load Map program. The system needed a way to monitor the health of all the relevant processes, so each of the processes continually updated a counter in shared memory, and the Program Monitor process scanned them all at appropriate intervals. We referred to the counters, and the sensing of them, as the system’s heartbeat, and sometimes even represented the status with a colored heart somewhere on the UI. If any counter failed to update, the Program Monitor flagged an alarm and attempted to restart the offending process. Alarm states were displayed on the Level 2 UI (part of the Level 2 Model process) and also in the Program Monitor process, and were logged to disk at regular intervals. There were some additional subtleties, but that was the gist of it. The communication method was straightforward, the logic was dead simple, and the mechanism did the job.
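The shared-memory details of that system are specific to it, but the counter-and-monitor idea can be sketched in a few lines of C++; the process count, the alarm handling, and the use of a plain array rather than a reserved shared-memory area are simplifications for illustration.

    // Sketch of the heartbeat idea: each process bumps its own counter, and the
    // monitor flags any counter that has not changed since the last scan.
    #include <atomic>
    #include <cstddef>
    #include <iostream>

    constexpr std::size_t kNumProcesses = 8;
    std::atomic<unsigned long> heartbeat[kNumProcesses];  // one counter per process

    // Called periodically from within process 'id' while it is healthy.
    void pulse(std::size_t id) { ++heartbeat[id]; }

    // The Program Monitor's scan: compare against the values seen last time.
    void scan(unsigned long (&lastSeen)[kNumProcesses]) {
        for (std::size_t i = 0; i < kNumProcesses; ++i) {
            unsigned long now = heartbeat[i].load();
            if (now == lastSeen[i]) {
                std::cout << "ALARM: process " << i << " heartbeat stalled\n";
                // here the monitor would raise an alarm and try a restart
            }
            lastSeen[i] = now;
        }
    }

    int main() {
        unsigned long lastSeen[kNumProcesses] = {};
        pulse(0); pulse(1);   // only processes 0 and 1 pulsed in this interval
        scan(lastSeen);       // flags the processes whose counters did not move
    }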

This was effective, but limited.

This system was great for monitoring processes running on a single machine and regularly logging the status updates on a local hard disk. Such a system should log the time all processes start and when they stop, if the stoppage can be sensed, otherwise the stoppage time should be able to be inferred from the periodically updated logs.

This kind of system can also log errors encountered for other operational conditions, e.g., temperatures, flows, or pressures too high or low, programs taking too long to run (model slipping real-time), and so on, as well as failures trying to connect with, read from, and write to external systems over network links. External systems, particularly the main HMI (Level 1 system) provided by our company as the main control and historical interface to the industrial furnaces, needed to be able to monitor and log the health and status of the Level 2 system as well.

If all the communications between Level 1 and Level 2 were working, the Level 1 system could display and log all reported status items from the Level 2 system. If the communications are working but one or more processes aren’t working on the Level 2 side, the Level 1 system might be able to report and log that something is out of whack with Level 2, but it might not be able to trust the details, since they may not be getting calculated or reported properly. Please refer to yesterday’s discussion of documenting and implementing inter-process communications to get a feel for what might be involved.

The point is that one needs to make sure the correct information is captured, logged, and displayed, for the right people, in the right context.

If a system is small and all of the interfaces are close together (the Level 1 HMI computer and Level 2 computer often sat on the same desk or console, and if that wasn’t the case the Level 2 computer was almost always in an adjacent or very nearby room) then it’s easy to review anything you might need. This gets a lot more difficult if a system is larger, more dispersed, and more complicated. In that case you want to arrange for centralized logging and monitoring of as many statuses and operations as possible.

Let’s look at a general case of what might happen in one local computer and one connected, remote computer, with respect to monitoring and logging. Consider the following diagram:

Note that the details here are notional. They are intended to represent a general collection of common functions, though their specific arrangement and configuration may vary widely. I drew this model in a way that was inspired by a specific system I built (many times), but other combinations are possible. For example, one machine might support only a single process with an embedded communication channel. One machine might support numerous processes that each include their own embedded communication channel. The monitoring functions may operate independently or as part of a functional process. Imagine a single server providing an HTTP interface (a kind of web server), supporting a single function or service, where the monitoring function is itself embedded in the communication channel. One may also imagine a virtualized service running across multiple machines with a single, logical interface.

Starting with the local system, as long as the system is running, and as long as the monitor process is running (this refers to a general concept and not a specific implementation), the system should be able to generate reliable status information in real-time. If the local UI is available a person will be able to read the information directly. If the disk or persistent media are available the system can log information locally, and the log can then be read and reviewed locally.

The more interesting aspect of monitoring system health and availability involves communicating with remote systems. Let’s start by looking at the communication process. A message must be sent from the local system to a remote system to initiate an action or receive a response. Sending a message involves the following steps:

  1. Pack: Message headers and bodies must be populated with values, depending on the nature of the physical connection and information protocol. Some types of communications, like serial RS-232 or RS-485, might not have headers while routed IP communications definitely will. Some or all of the population of the header is performed automatically but the payload or body of the message must always be populated explicitly by the application software. Message bodies may be structured to adhere to user-defined protocols within industry-standard protocols, with the PUP and PHP standards defined by American Auto-Matrix for serial communication with its HVAC control products serving as an example. HTTP, XML, and JSON are other examples of standard protocols-within-protocols.
  2. Connect: This involves establishing a communications channel with the remote system. This isn’t germane to hard-wired communications like serial, but it definitely is for routed network communications. Opening a channel may involve a series of steps involving identifying the correct machine name, IP address, and communications port, and then providing explicit authentication credentials (i.e., login and password). Failure to open a channel may be sensed by receipt of an error message. Failure of the remote system to respond at all is generally sensed by timing out.
  3. Send: This is the process of forwarding the actual message to the communications channel if it is not part of the Connect step just described. Individual messages are sometimes combined with embedded access request information because they are meant to function as standalone events with one sent request and one received reply. In other cases the Connect step sets up a channel over which an ongoing two-way conversation is conducted. The communications subsystems may report errors, or communications may cease altogether, which again is sensed by timing out.
  4. Receive: This is the process of physically receiving the information from the communications link. The protocol handling system generally takes care of this, so the user-written software only has to process the fully received message. The drivers and subsystems generally handle the accretion of data off the wire.
  5. Unpack: The receiving system has to parse the received message to break it down into its relevant parts. The process for doing so depends on the protocol, message structure, and implementation language.
  6. Validate: The receiving system can also check the received message components to ensure that the values are in range or otherwise appropriate. This can be done at the level of the business operation or at the level of verifying the correctness of the communication itself. An example of verifying correct transmission of serial information is a CRC check, where the sending process calculates a value for the packet to be sent and embeds it in the packet. The receiving system then performs the same calculation and, if it generates the same value, proceeds on the assumption that the received packet is probably correct. (A small sketch of packing and validation follows this list.)
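Here is a minimal C++ sketch of the Pack and Validate steps, using a simple additive checksum and a made-up trailer format in place of a real CRC and message protocol:

    // Sketch of Pack and Validate: the sender appends a checksum, the receiver
    // recomputes and compares it. A simple additive checksum and an ad hoc
    // "|" trailer stand in for a real CRC and message format.
    #include <cstdint>
    #include <iostream>
    #include <string>

    std::uint32_t checksum(const std::string& payload) {
        std::uint32_t sum = 0;
        for (unsigned char c : payload) sum += c;
        return sum;
    }

    // Pack: payload plus a small trailer carrying the checksum.
    std::string pack(const std::string& body) {
        return body + "|" + std::to_string(checksum(body));
    }

    // Validate: recompute the checksum and compare with the transmitted value.
    bool validate(const std::string& packet, std::string& bodyOut) {
        std::size_t sep = packet.rfind('|');
        if (sep == std::string::npos) return false;
        bodyOut = packet.substr(0, sep);
        return std::to_string(checksum(bodyOut)) == packet.substr(sep + 1);
    }

    int main() {
        std::string body;
        std::string packet = pack("zone=3;temperature=1250");  // made-up payload
        std::cout << (validate(packet, body) ? "packet OK" : "packet corrupt") << '\n';
    }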

I’ve drawn the reply process from the remote system to the local system as a mirror of the sending process, but in truth it’s probably simpler, because the connect process is mostly obviated. All of the connections and permissions should have been worked through as part of the local-to-remote connection process.

If the communications channels are working properly we can then discuss monitoring of remote systems. Real-time or current status values can be obtained by request from a remote system, based on what processes that machine or system or service is intended to support. As discussed above, this can be done via a query of a specialized monitoring subsystem or via the standard service interface that supports many kinds of queries.

In one example of a system I wrote, the Level 2 system communicated with the Level 1 system by writing a file to a local RAM disk that the Level 1 system would read, and reading a file from that RAM disk that the Level 1 system would write. The read-write processes were mutex-locked using separate status files. The file written by the Level 2 system was written by the Model process and included information compiled by the Program Monitor process. The Level 1 system knew the Level 2 system was operating if the counter in the file was being continually updated. It knew the Level 2 system had no alarm states if the Program Monitor process was working and seeing all the process counters update. It knew the Level 2 system was available to assume control if it was running, there were no alarms, and the model ready flag was set. The Level 1 system could read that information directly in a way that was appropriate for that method of communication. In other applications we used FTP, DECMessageQ, direct IP, shared memory, and database query communications with external systems. The same considerations apply for each.

An HTTP or any other interface might support a function that causes a status response to be sent, instead of whatever other response is normally requested. Status information might be obtained from remote systems using entirely different forms of communication. The ways to monitor status of remote systems are practically endless, but a few guidelines should be followed.

The overhead of monitoring the status of remote systems should be kept as light as possible. In general, but especially if there are multiple remote systems, a minimum number of queries should be made to request current statuses. If those results are to be made visible to a large number of people (or processes), they should be captured in an intermediate source that can be pinged much more often, and even automatically updated. For example, rather than creating a web page, each instance of which continuously pings all available systems, there should be a single process that continuously pings all the systems and then updates a web dashboard. All external users would then link to the dashboard, which can be configured to refresh automatically at intervals. That keeps the network load on the working systems down as much as possible.

Logging, or, What Happened Yesterday?

So far we’ve mostly discussed keeping an eye on the current operating state of local and remote systems. We’ve also briefly touched on logging to persistent media on a local machine.

There are many ways to log data on a local machine. They mostly involve appending increments of data to a log file in some form that can be reviewed. In FORTRAN and C++ I tended to write records of binary data, with multiple variant records for header data, to keep the record size as small as possible. That’s less effective in languages like JavaScript (and Node and their variants), so the files tend to be larger as they are written out as text in YAML, XML, or a similar format. It is also possible to log information to a database.

A few considerations for writing out log files are:

  • A means must be available to read and review them in appropriate formats.
  • Utilities can (and should) be written to sort, characterize, and compile statistical results from them, if appropriate.
  • The logs should have consistent naming conventions (for files, directories, and database tables). I’m very fond of using date and time formats that sort alphanumerically (e.g., LogType_YYYYMMDD_HHMMSS; a small sketch follows this list).
  • Care should be taken to ensure logs don’t fill up or otherwise size out, which can result in the loss of valuable information. More on this below.
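As a small C++ sketch of that naming convention (the log type and the .log extension are just examples):

    // Sketch: build a log file name that sorts alphanumerically by date and
    // time, e.g. EventLog_20190131_142500.log (name and extension are examples).
    #include <ctime>
    #include <iostream>
    #include <string>

    std::string logFileName(const std::string& logType) {
        std::time_t now = std::time(nullptr);
        std::tm local = *std::localtime(&now);
        char stamp[32];
        std::strftime(stamp, sizeof(stamp), "%Y%m%d_%H%M%S", &local);
        return logType + "_" + stamp + ".log";
    }

    int main() {
        std::cout << logFileName("EventLog") << '\n';
    }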

Logging events on a local system is straightforward, but accessing logs on remote systems is another challenge. Doing so may require that the normal communication channel for that system give access to the logs, or that a different method of access be implemented. This may be complicated by the volume of information involved.

It gets even more interesting if you want to log information that captures activities across multiple systems and machines. Doing this requires that all processes write to a centralized data repository of some kind. This can impose a significant overhead, so care must be taken to ensure that too much data isn’t written into the logging stream.

Here are a few ways to minimize or otherwise control the amount of data that gets logged:

  • Write no more information to the logs than is needed.
  • If very large entries are to be made, but the same information is likely to be repeated often (e.g., a full stack trace on a major failure condition), find a way to log a message saying something along the lines of, “same condition as last time,” and only do the full dump at intervals. That is, try not to repeatedly store verbose information.
  • Implement flexible levels of logging. These can include verbose, intermediate, terse, or none. Make the settings apply to any individual system or machine or to the system in its entirety.
  • Sometimes you want to be able to track the progress of items as they move through a system. In this case, the relevant data packages can include a flag that controls whether or not a logging operation is carried out for that item. That way you can set the flag for a test item or a specific class of items of interest, but not log events for every item or transaction. Flags can be added temporarily to data structures or buried in existing structures using tricks like masking a high, unused bit on one structure member (see the sketch after this list).
  • If logs are in danger of overflowing, make sure they can roll over to a secondary or other succeeding log (e.g., YYYYMMDD_A, YYYYMMDD_B, YYYYMMDD_C, and so on), rather than just failing, stopping, or overwriting existing logs.
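
As a small sketch of the last two ideas, here is one way a logging call might honor both a system-wide level and a per-item trace flag masked into a high, unused bit of an existing numeric status field. The level names, bit position, and fields are assumptions for illustration, not a prescription.

    // log-control.js -- global log levels plus a per-item trace flag.
    const LEVELS = { none: 0, terse: 1, intermediate: 2, verbose: 3 };
    let currentLevel = LEVELS.terse;                      // system-wide setting

    // Borrow a high, unused bit of an existing status field as a trace flag.
    const TRACE_BIT = 0x80000000;
    const setTrace   = (status) => (status | TRACE_BIT) >>> 0;
    const isTraced   = (status) => (status & TRACE_BIT) !== 0;
    const realStatus = (status) => (status & ~TRACE_BIT) >>> 0;

    // Log if the message level passes the global setting, or the item is flagged.
    function logItem(level, item, msg) {
      if (LEVELS[level] <= currentLevel || isTraced(item.status)) {
        console.log(`[${level}] item ${item.id} (status ${realStatus(item.status)}): ${msg}`);
      }
    }

    const normalItem = { id: 101, status: 3 };
    const tracedItem = { id: 102, status: setTrace(3) };  // a test item of interest

    logItem('verbose', normalItem, 'queued');   // suppressed at the terse setting
    logItem('verbose', tracedItem, 'queued');   // logged because the item is flagged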

Well, that’s a fairly lengthy discussion. What would you add or change?


Designing and Documenting Inter-Process Communication Links

I’ve touched on the subject of inter-process communication previously here and here, but now that I’m charged with doing a lot of this formally I wanted to discuss the process in more detail. I’m sharing this in outline form to keep the verbiage down and level of hierarchical organization up.

The following discussion attempts to outline everything that could reasonably be documented about an inter-process communication link. As a practical matter, a shorthand will be worked out over time, especially for elements that are repeated often (e.g., a standard way of performing YAML- or JSON-formatted transfers). The methods associated with those standard practices should be documented somewhere and then referenced whenever they are used for an individually documented connection. A rough sketch of how one such description might be captured in machine-readable form appears at the end of this post.

  • General: Overall context description
    • Message Name: Name / Title of message
    • Description: Text description of message
    • Architecture Diagram: Graphical representation that shows context and connections
    • Reason / Conditions: Context of why message is sent and conditions under which it happens; note whether operation is to send or receive
  • Source: Sending Entity
    • Sending System: Name, identifiers, addresses
    • Sending Process / Module / Procedure / Function: Specific process initiating communication
    • System Owner: Developer/Owner name, department/organization, email, phone, etc.
  • Message Creation Procedure: Describe how message gets created (e.g., populated from scratch, modified from another message, etc.)
    • Load Procedure: How elements are populated, ranged, formatted, etc.
    • Verification Procedure: How package is reviewed for correctness (might be analogous to CRC check, if applicable)
  • Payload: Message Content
    • Message Class: General type of message with respect to protocol and hardware (e.g., JSON, XML, IP packet, File Transfer, etc.)
    • Grammar: Structure of message, meaning of YAML/XML tags, header layout, and so on (e.g., YAML, XML, binary structure, text file, etc.)
    • Data Items: List of data items, variable names, types, sizes, and meanings (note language/platform dependency: C++ has hard data types and rules for data packing in structures and objects, JavaScript is potentially more flexible by platform)
      • Acceptable range of values: Important for certain data types in terms of size, content, and values
      • Header: Message header information (if accessible / controlled / modified by sender)
      • Body: Active payload
    • Message Size: Appropriate to messages that have a fixed structure and size (i.e., not applicable to flexibly formatted XML messages)
  • Transfer Process: Receiving/Accessed Entity
    • Access Method: Means of logging in to or connecting with destination system (if applicable)
    • Permissions: Passwords, usernames, access codes, lock/mutex methods
    • Reason: Why the destination system is accessed
  • Destination: Receiving Entity
    • Receiving System: Name, identifiers, addresses
    • Receiving Process / Module / Procedure / Function: Specific process receiving the communication
    • System Owner: Developer/Owner name, department/organization, email, phone, etc.
  • Methodology: How it’s done
    • Procedure Description: List of steps followed
      • Default Steps (happy path): Basic steps (e.g., connect, open, lock, transfer/send, verify, unlock, close, disconnect)
      • Fail / Retry Procedures: What happens when connect/disconnect, open/close, lock/unlock, read/write operations fail in terms of detection (error codes), retry, error logging, and complete failure
      • Error Checking / Verification: How procedure determines whether different operations were performed correctly
        • Pack / Load / Unpack / Unload: check values/ranges, resulting structure format, etc.
        • Send / Receive: check connection / communication status
    • Side Effects / Logs: Describe logging activity and other side effects

This information is diagrammed below. (I need to add some arrows to show the steps taken but they can be inferred for now.)

The small diagram below shows the full context of one interface, which as a practical matter is a two-way communication where each direction could be fully documented as described above.
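
To show what the outline looks like when filled in, here is a hypothetical example of one link description captured as a JavaScript object. Every system name, address, owner, and value below is invented for illustration; only the structure mirrors the outline above.

    // link-spec.js -- one inter-process communication link, per the outline.
    // All names, addresses, and values are made up for illustration.
    const orderStatusLink = {
      general: {
        messageName: 'OrderStatusUpdate',
        description: 'Notifies the dashboard service when an order changes state.',
        reasonConditions: 'Sent whenever an order transitions state; operation is send.',
      },
      source: {
        sendingSystem:  { name: 'order-service', address: '10.0.0.11' },
        sendingProcess: 'orderStateMachine.publishStatus()',
        systemOwner:    { name: 'J. Smith', org: 'Order Management', email: 'jsmith@example.com' },
      },
      messageCreation: {
        loadProcedure:         'Populated from scratch from the order record.',
        verificationProcedure: 'Required fields checked for presence and range before send.',
      },
      payload: {
        messageClass: 'JSON over HTTP POST',
        grammar:      'Flat JSON object; no header fields controlled by the sender.',
        dataItems: [
          { name: 'orderId', type: 'string', meaning: 'Unique order identifier' },
          { name: 'state',   type: 'string', meaning: 'One of: new, picked, shipped' },
        ],
      },
      destination: {
        receivingSystem:  { name: 'dashboard-service', address: '10.0.0.20' },
        receivingProcess: 'statusIntake.handlePost()',
        systemOwner:      { name: 'A. Jones', org: 'Operations', email: 'ajones@example.com' },
      },
      methodology: {
        defaultSteps:    ['connect', 'send', 'verify 200 response', 'disconnect'],
        failRetry:       'Retry up to 3 times with backoff; log and alert on final failure.',
        sideEffectsLogs: 'One terse log entry per send; verbose entry on failure.',
      },
    };

    module.exports = orderStatusLink;

Keeping descriptions in a form like this also makes it easier to generate the diagrams and to check that every field in the template has actually been filled in.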


Microsoft Excel: It’s More Than Just VLOOKUP

Yesterday’s post touched on the importance of being able to use spreadsheets well, and particularly Microsoft Excel. I thought I would expand on that subject here.

I saw copies of Lotus 1-2-3 back in college and certainly read about it. The same went for VisiCalc. I started using random knockoff spreadsheet programs around 1987, while I was in the Army, on my own computer. I used Quattro Pro for work and at home from the late 80s through the early 90s. One of the Turbo Pascal Toolbox products included code for a spreadsheet, so I saw some of how they were constructed from the inside. I started using Word to replace Professional Write around 1991 because it was so clean and simple for a true WYSIWYG product, and I have used it ever since. Quattro was a great product, though, and I didn’t give it up until the bundling of Microsoft Office (which included Word, a program I already knew and liked) made it redundant. From there it was Excel all the way.

Over time I ended up doing huge amounts of analysis and tracking in Excel. I’ve used it for everything from project tracking (doing pseudo Gantt charts, among other things) to cost accounting and forecasting, lightweight simulation, lightweight Monte Carlo analysis, data conditioning, statistical characterization, plotting and charting, scientific and engineering calculations, inventory management, list-making and knowledge transfer, and earned value management. In short, I’ve used it for almost everything, and it’s rare that a day goes by when I don’t use it.

The Aircraft Maintenance Model I supported for a few years used to write output data directly into formatted tables in different tabs of Excel workbooks, which I spent quite a bit of time rearranging and improving. Eventually, however, some of the data sets we were working with grew beyond the two-gigabyte limit Excel could handle. At that point we had to change the program to write out comma-delimited (CSV) text files, and write additional programs to parse the data in those files and perform the same calculations. Users obviously should never use functions they don’t understand, and recreating those functions in code demonstrates an even deeper level of understanding of what they do. Indeed, I often reverse-engineer Excel’s mathematical functions within Excel itself to make sure I completely understand the version and context of a function I want to use.

Over the years I’ve seen job ads looking for people who can do data analysis in Excel. They usually make a big deal about being able to use the VLOOKUP function (they never seem to mention the HLOOKUP function). I suppose that’s fine if you want someone who’s just starting out, but in reality that function, along with pivot tables, is pretty trivial.
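
For anyone who hasn’t dug into it, exact-match VLOOKUP amounts to a few lines of code, which is one way to see why it’s a low bar. This is a sketch of the behavior, not Excel’s implementation, and it ignores approximate matches, error values, and the other options the real functions support.

    // vlookup.js -- the essence of an exact-match VLOOKUP over an array of rows.
    function vlookup(value, table, colIndex) {
      // Scan down the first column; return the cell from the requested column.
      for (const row of table) {
        if (row[0] === value) return row[colIndex - 1];   // 1-based, as in Excel
      }
      return null;                                        // Excel would return #N/A
    }

    // HLOOKUP is the same idea turned sideways: scan across the first row.
    function hlookup(value, table, rowIndex) {
      for (let c = 0; c < table[0].length; c++) {
        if (table[0][c] === value) return table[rowIndex - 1][c];
      }
      return null;
    }

    const priceTable = [
      ['SKU-100', 'Widget',  4.25],
      ['SKU-200', 'Gadget', 12.00],
    ];

    console.log(vlookup('SKU-200', priceTable, 3));       // 12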

One of the projects I did with RTR Technologies involved reverse-engineering a fairly complex spreadsheet used to justify the need for about 27,000 staff for a particular function in a particular government agency. There was nothing difficult going on in the spreadsheet, but its sheer size and the amount of data it tried to track made it a configuration nightmare for the individual who created it and had been maintaining it for a few years. At the time I described this individual’s efforts as “heroic” given his level of knowledge of Excel and his overall training in computing in general. In the process of taking the workbook apart I saw many opportunities for including meta-calculations that could be used to check the integrity and accuracy of the work, especially as it was updated and extended from year to year. I tend to include such checks and verifications in all my work but this individual simply didn’t have the background for it, and neither did anyone else in the organization. We ended up writing a standalone program to recreate all of the functionality of the spreadsheet, and on delivery were able to demonstrate that we matched over 100,000 input and calculated values exactly. (That’s an entirely different story. Ask me about it over coffee sometime.)

A separate analyst put together an even more complex spreadsheet to calculate staffing needs for a different branch of the same agency. It took forever for the agency to hire that person back, and by the time it did, my company’s contract had ended. (We can talk about that one over a second cup of said coffee.) I never looked at that workbook in much detail, but it used a lot of flexible, indirect addressing schemes that were quite clever. I’ve used those in my own analyses but they generally don’t come up much. The fact that they were even used in the workbook, though, is pretty impressive.

As an aside, it turns out that Excel and its spreadsheet brethren constitute a meaningful form of programming, known as cell-oriented dataflow programming. This is a subclass of declarative programming, which is contrasted with the more traditional imperative approach. See here for more information.


Post University CIS Advisory Board Meeting 2018

Today I participated in the annual meeting of the advisory committee for the undergraduate CIS department at Post University in Waterbury, CT (as I have on previous occasions). Most of the students participate in the program remotely, which is a definite indication of current and future trends in the education space, but that doesn’t have much effect on the board’s discussions or recommendations. Rather, the discussions and recommendations centered on the material to be emphasized in the curriculum and the various concentrations offered.

Here’s what I wrote in response to the after action survey.

* * * * *

I thought the participants brought up a lot of good points in general. I was on fire to discuss business analysis subjects and did not mind in the least when the group stole my thunder by bringing it up before I could. I’ve been doing all the elements of it for my entire 30-year career and have had a devil of a time communicating my value and skill set to potential employers. However, the world finally seems to be getting it, and Post U should definitely make the context clear to its students.

Regarding certifications, yes, they might be a ridiculous and expensive cottage industry meant to part fools from their money, but I also observe that much of the educational model in this country and around the world is going to change at all levels. It’s going to be more self-directed, granular, gamified, customized, flipped, and streamlined into a vocational model with a specific end. About two-thirds of the second-rate little schools are going to disappear because the cost-benefit ratio for many subjects doesn’t work for the students. Legions of third-rate liberal arts professors are going to have to get real jobs when that happens, which will be to the good of everyone. Certifications are going to be part of the more granular and ongoing process of continuing education that adults will pursue as their interests and professional needs move them. Market forces will continually shape this process from the demand side as well. It might be nice if the university structured its normal degree program so graduating students were prepared to pass the relevant BA certification exams at an appropriate level. The IIBA has three levels (I have a CBAP) and is considering a fourth (higher) one. PMI has its own version, because it seems to have a cert for everything; I’m less familiar with its tiered structure, if any.

Whether students choose to avail themselves of an exam or certification or not, being conversant with the material will be useful to the curriculum designers at Post. I strongly agree that structuring coursework along the lines of performing a business analysis for a case-based scenario is a very good idea. It might be an even better idea to work through a handful of smaller cases and one larger one near the end. This is probably at least as valuable as exposing students to the concepts of Agile/Scrum/Kanban, Waterfall, and hybrid management approaches. You might also have someone come in and tell horror stories about how messy things get in the real world (see Rodney Dangerfield trying to straighten out the business professor early in the movie Back to School).

The other interesting subject was the use of Microsoft Excel. Being able to use the important aspects of this tool (and its free variants) is critical for any capable professional. People mentioned VLOOKUPs and pivot tables, but there are a lot of other useful capabilities, especially tables, which have a lot of powerful functionality built in that very, very few people appreciate.

In the end, every profession is about solving problems. Including instruction on a specific, ubiquitous tool and a general context for problem solving in the enterprise (which is always about solving customer problems in some way) is a solid approach for the near future.

Thank you very much for allowing me to participate in this exercise. I learn from it as much as I hope I provide some value for you.
