Unified Theory of Business Analysis: Part One

Defining and Distinguishing Between the Solution Effort and the Engagement Effort

There seems to be a great deal of confusion in describing what business analysts do, since they often seem to be doing such different things. I’ve confirmed this after having conducted a number of surveys of business analysts. I’ve also seen close to fifty presentations on the subject, taken a training course, earned a certification, followed a whole bunch of discussion on LinkedIn and elsewhere, and reviewed job ads until my eyes bled. I’ve thought about this subject a lot over the years and the answer has finally become rather clear to me.

I’ve worked through a number of engagements of various kinds. They’ve ranged from small to large, quick to long-lasting, and very ad hoc to very formal. What I noticed is that the smaller, shorter, less formal ones all involved the same activities as the larger, longer, formal ones; it’s just that they didn’t include every possible activity. And therein lies the discovery. I call it the Unified Theory of Business Analysis.

The reason so many BA efforts feel so different is that many efforts do not include every activity, or that any one BA may be involved in only a limited part of a larger effort. This is true both with respect to the multiple phases of an engagement through time and with respect to which components of a solution’s scope are addressed.

I’m convinced that all of the phases of a full-scale engagement are actually conducted during even the most slap-dash, ad hoc analysis effort. However, in many cases the context is so well understood that many of the steps I define in my business analysis framework are addressed implicitly, to the point where no special effort is taken to do everything you can do in every possible phase. You could have a big, formal effort with kickoff and close-out meetings for every phase, or you could hear that Dwayne in shipping hates the new data entry screen and why don’t Diane and Vinay shoot down, find out what the confusion is, and put together a quick fix for him. In the latter case the Intended Use and Conceptual Model phases are obviated and the remaining phases can be accomplished with minimal oversight, time, and fanfare. If an effort is simple enough it doesn’t require the overhead of requirements traceability matrices, change control boards, and a bunch of documentation (though the right people should be involved and the relevant documents and systems should be updated).

This is why things look different to BA practitioners and their managers and colleagues across the phases of an engagement through time, but it’s also possible that BAs will be involved in only a limited part of the solution effort, as I discussed and defined yesterday.

In short, I wanted to introduce this discussion by distinguishing between the solution effort, which is the process of defining and implementing the solution, and the engagement effort, which is the meta-process of managing the phases that support the solution. It may seem like a subtle distinction, but it seems important as I prepare for the next part of this discussion.


Components of a Solution

I wanted to define some terms and provide some context in preparation for the next few posts. Specifically, I’m describing the components of a solution. A process model is a model of a system that either exists now or will exist, whether it is implemented from scratch or modified from an existing system. I’m going to be referring to the work needed to effect the solution itself as the solution or the solution effort over the next few days.

As called out in the diagram directly below, a model is made up of many standard components. I’ll refer to these repeatedly in the descriptions that follow, so I want to start by describing the components here.

  • System (or Model): A system is the entire system or process being investigated. A model is a representation of that system or process.
  • Entity: Entities are items that move around in the system. They can represent information or physical objects, including people. Entities will sometimes pass through a system (like cars going through a car wash) and sometimes move only within a system (like aircraft in a maintenance simulation or employees serving customers in a barber shop). Entities may split or spawn new entities at various points within the system, and can also be merged or joined together.
  • Entry: Entities go into a system at entry points. Note that the entry can represent an exit point from an external system, and thus a connection with that system.
  • Station (or Process Block or Activity): These blocks are where the important activities and transformations happen. The important concepts associated with stations are processing time, diversion percentage (percentage of time entities being processed go to different destinations when they leave the station), and the nature of the transformations carried out on the entities being processed (if any). A wide variety of other characteristics and behaviors can be defined for these components, and those characteristics (like being open or closed, for example) can change as defined by a schedule.
  • Group (or Facility): A group of stations performing related operations is referred to as a facility. A toll plaza includes multiple stations that all do the same or very similar things. A loading dock that serves multiple trucks works the same way, where each space is a separate station. In a more information-driven setting, a roomful of customer service workers could be represented as individual stations, and collectively would be referred to as a group or facility.
  • Queue: A queue is where entities wait to be processed. FIFO, LIFO, and other types of queues can be defined. A queue can have a defined, finite capacity or an infinite capacity. If the queue has a physical component it might be given physical dimensions for length, area, or volume. A minimum traversal time may also be defined in such a case.
  • Path: A path defines how entities move from one non-path component to another. In logical or informational systems the time needed to traverse a path may be zero (or effectively zero), while length and speed may have to be defined for physical paths. The direction of movement along each path must also be specified. Paths are often one-way only, but can allow flow in both directions (although it’s often easier to include separate paths for items traveling in each direction).
  • Exit: Entities leave a system through exit components. Note that the exit can represent an entry point of an external system, and thus a connection with that system.
  • Bag (or Parking Lot): A bag is a special kind of queue where entities may be stored but may be processed at arbitrary times according to arbitrary criteria. They are used when their operations are not FIFO, LIFO, or some other obvious behavior. You can think of them like parking lots, where the residence time is determined by the amount of time it takes passengers to get out of the car, walk somewhere, conduct their business in some establishment, walk back to the car, get back in and buckled up, get their phone mounted and plugged in, take a swig of their coffee, set the GPS for their next destination, get their audiobook playing, readjust their mirrors, and finally start pulling out. In such a case the car will be represented by an entity that continues to take up space in the parking lot (or bag), while the passenger is a new entity that is spawned or split from the car entity, and then later merged or rejoined back with it.
  • Combined (Queue and Process): It is possible to define components that embody the characteristics of both a station and a queue. This is typically done for representational simplicity.
  • Resource (not shown): A resource is something that must be available to carry out an operation. Resources can be logical (like electricity or water, though they can be turned off or otherwise fail to be available) or they can be discrete, like mechanics in a repair shop. When a car pulls into a station or process block (representing a space in a garage) a mechanic has to go perform the service. Sometimes multiple mechanics are needed to carry out an operation. Sometimes different specialists are needed to perform certain actions. Discrete resources can be represented by entities in a system. If no, or not enough, resources are available, the process waits for them to become available, and only starts its clock when they do.
  • Resource Pool (not shown): Resources can be drawn from a collection, which itself is referred to as a pool. There can be one or multiple pools of resources for different purposes and the resources can have different characteristics. There may be different numbers and types of resources available at different times, and this can be defined according to a schedule.
  • Component: All of the items listed above — except the system itself — are referred to as system (or model) components.

That just describes some of the major items I tend to include in process models. Here are some of the others that are possible.

  • Event: Events are occurrences or changes of state that trigger a change to some other component in a system.
  • Decision Point: These control elements govern the movement of entities through the system, either logically or physically. The diversion percentage characteristic described for stations, above, is an example of embedding a decision point into another component. However, decision points can be included in models as standalone components. In my experience they are usually connected by paths and not other components, but that is an implementation detail.
  • Role: This represents a person or group involved in the process. Roles can be modeled as stations or resources, as described above, but they can also be included as other kinds of components, depending on the implementation.

If the process model is itself a simulation there is yet another layer of components that can be added. These are meta-elements that control the simulation and its components. Simulations are often called models, although the BABOK reserves the word model for the graphical representation of a system and associates simulation with the concept of process analysis.

  • Editing interface for the system: This allows a user to add, remove or reconfigure components within the system.
  • Editing interface for components: This allows a user to define or modify the operational characteristics of the components. This can include schedules, events, arrivals, and other items, as well as traditional elements like durations, dimensions, and capacities.
  • Operational interface: This allows a user to start, pause, and stop the simulation.
  • Data analysis capability: Simulations are generally implemented to generate a lot of output data. They can also require a lot of complex input data. Integrated analysis capabilities are sometimes included in simulation tools.
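
To tie this vocabulary together, here is a minimal sketch of how a model built from these components might look in code. It’s written in Python using only the standard library; the single Entry, FIFO Queue, one-server Station, and Exit, along with the arrival and service times, are illustrative assumptions rather than a definitive implementation, and the control layer just described is omitted for brevity.

```python
import heapq
import random

# Minimal process model: an Entry that creates entities, a FIFO Queue, a
# single-server Station with an exponential processing time, and an Exit.
# All parameters are illustrative.

random.seed(42)

ARRIVAL_MEAN = 4.0   # mean time between arrivals at the Entry
SERVICE_MEAN = 3.0   # mean processing time at the Station
SIM_END = 100.0      # stop creating new entities after this time

events = []          # future-event list: (time, sequence, kind, entity_id)
seq = 0

def schedule(time, kind, entity_id):
    global seq
    heapq.heappush(events, (time, seq, kind, entity_id))
    seq += 1

queue = []           # the Queue: entities waiting for the Station (FIFO)
station_busy = False
arrivals = {}        # entity_id -> time it entered the system
waits = []           # time each entity spent in the Queue

def start_service(now):
    """Pull the next waiting entity into the Station if it is free."""
    global station_busy
    if queue and not station_busy:
        entity_id = queue.pop(0)
        station_busy = True
        waits.append(now - arrivals[entity_id])
        schedule(now + random.expovariate(1.0 / SERVICE_MEAN), "depart", entity_id)

schedule(random.expovariate(1.0 / ARRIVAL_MEAN), "arrive", 0)

while events:
    now, _, kind, entity_id = heapq.heappop(events)
    if kind == "arrive":                     # entity appears at the Entry
        arrivals[entity_id] = now
        queue.append(entity_id)              # and joins the Queue
        start_service(now)
        if now < SIM_END:                    # schedule the next arrival
            schedule(now + random.expovariate(1.0 / ARRIVAL_MEAN), "arrive", entity_id + 1)
    elif kind == "depart":                   # entity leaves through the Exit
        station_busy = False
        start_service(now)

print(f"{len(waits)} entities served, mean wait {sum(waits) / len(waits):.2f}")
```

Even this toy model shows the behaviors the components are meant to capture: entities enter, wait, get processed, and leave, and the run produces output data worth analyzing.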

Combined Survey Results, Updated (March 2019)

With the addition of survey data taken at recent meetups in Orlando and Tampa, the combined results are updated here.

List 5-8 steps you take during a typical project.

These vary, but mostly follow a recognizable pattern, in rough keeping with what the BABOK describes.

  1. Requirements Gathering
  2. Initiation
  3. Testing
  4. QA
  5. Feedback
  6. User acceptance
  1. Requirement Elicitation
  2. UX Design
  3. Software Design for Testability
  1. Identify Business Goal
  2. ID Stakeholders
  3. Make sure necessary resources are available
  4. Create Project Schedule
  5. Conduct regular status meetings
  1. Meet with requester to learn needs/wants
  2. List details/wants/needs
  3. Rough draft of Project/proposed solutions
  4. Check in with requester on rough draft
  5. Make edits/adjustments | test
  6. Regularly schedule touch-point meeting
  7. Requirement analysis/design | functional/non-functional
  8. Determine stakeholders | user acceptance
  1. List the stakeholders
  2. Read through all documents available
  3. Create list of questions
  4. Meet regularly with the stakeholders
  5. Meet with developers
  6. Develop scenarios
  7. Ensure stakeholders endorse requirements
  8. other notes
    • SMART PM milestones
    • know players
    • feedback
    • analysis steps
    • no standard
  1. identify stakeholders / Stakeholder Analysis
  2. identify business objectives / goals
  3. identify use cases
  4. specify requirements
  5. interview Stakeholders
  1. project planning
  2. user group sessions
  3. individual meetings
  4. define business objectives
  5. define project scope
  6. prototype / wireframes
  1. identify audience / stakeholders
  2. identify purpose and scope
  3. develop plan
  4. define problem
  5. identify objective
  6. analyze problems / identify alternative solutions
  7. determine solution to go with
  8. design solution
  9. test solution
  1. gathering requirements
  2. assess stakeholder priorities
  3. data pull
  4. data scrub
  5. data analysis
  6. create summary presentation
  1. define objective
  2. research available resources
  3. define a solution
  4. gather its requirements
  5. define requirements
  6. validate and verify requirements
  7. work with developers
  8. coordinate building the solutions
  1. requirements elicitation
  2. requirements analysis
  3. get consensus
  4. organizational architecture assessment
  5. plan BA activities
  6. assist UAT
  7. requirements management
  8. define problem to be solved
  1. understand the business need of the request
  2. understand why the need is important – what is the benefit/value?
  3. identify the stakeholders affected by the request
  4. identify system and process impacts of the change (complexity of the change)
  5. understand the cost of the change
  6. prioritize the request in relation to other requests/needs
  7. elicit business requirements
  8. obtain signoff on business requests / validate requests
  1. understanding requirements
  2. writing user stories
  3. participating in Scrums
  4. testing stories
  1. research
  2. requirements meetings/elicitation
  3. document requirements
  4. requirements approvals
  5. estimation with developers
  6. consult with developers
  7. oversee UAT
  8. oversee business transition
  1. brainstorming
  2. interview project owner(s)
  3. understand current state
  4. understand need / desired state
  5. simulate / shadow
  6. inquire about effort required from technical team
  1. scope, issue determination, planning
  2. define issues
  3. define assumptions
  4. planning
  5. communication
  6. analysis – business and data modeling
  1. gather data
  2. sort
  3. define
  4. organize
  5. examples, good and bad
  1. document analysis
  2. interviews
  3. workshops
  4. BRD walkthroughs
  5. item tracking
  1. ask questions
  2. gather data
  3. clean data
  4. run tests
  5. interpret results
  6. visualize results
  7. provide conclusions
  1. understand current state
  2. understand desired state
  3. gap analysis
  4. understand end user
  5. help customer update desired state/vision
  6. deliver prioritized value iteratively
  1. define goals and objectives
  2. model As-Is
  3. identify gaps/requirements
  4. model To-Be
  5. define business rules
  6. conduct impact analysis
  7. define scope
  8. identify solution / how
  1. interview project sponsor
  2. interview key stakeholders
  3. read relevant information about the issue
  4. form business plan
  5. communicate and get buy-in
  6. goals, objectives, and scope
  1. stakeholder analysis
  2. requirements gathering
  3. requirements analysis
  4. requirements management – storage and updates
  5. communication – requirements and meetings
  1. analyze evidence
  2. design application
  3. develop prototype
  4. implement product
  5. evaluate product
  6. train users
  7. upgrade functionality
  1. read material from previous similar projects
  2. talk to sponsors
  3. web search on topic
  4. play with current system
  5. ask questions
  6. draw BPMs
  7. write use cases
  1. document current process
  2. identify users
  3. meet with users; interview
  4. review current documentation
  5. present proposed solution or iteration
  1. meeting with stakeholders
  2. outline scope
  3. research
  4. write requirements
  5. meet and verify with developers
  6. test in development and production
  7. outreach and maintenance with stakeholders
  1. As-Is analysis (current state)
  2. write lightweight business case
  3. negotiate with stakeholders
  4. write user stories
  5. User Acceptance Testing
  6. cry myself to sleep 🙂
  1. initiation
  2. elicitation
  3. discussion
  4. design / user stories / use cases
  5. sign-off
  6. sprints
  7. testing / QA
  8. user acceptance testing
  1. planning
  2. elicitation
  3. requirements
  4. specification writing
  5. QA
  6. UAT
  1. identify the problem
  1. studying subject matter
  2. planning
  3. elicitation
  4. functional specification writing
  5. documentation
  1. identify stakeholders
  2. assess working approach (Waterfall, Agile, Hybrid)
  3. determine current state of requirements and maturity of project vision
  4. interview stakeholders
  5. write and validate requirements
  1. problem definition
  2. value definition
  3. decomposition
  4. dependency analysis
  5. solution assessment
  1. process mapping
  2. stakeholder interviews
  3. write use cases
  4. document requirements
  5. research
  1. listen – to stakeholders and customers
  2. analyze – documents, data, etc. to understand things further
  3. repeat back what I’m hearing to make sure I’m understanding correctly
  4. synthesize – the details
  5. document – as needed (e.g., Visio diagrams, PowerPoint decks, Word, tool, etc.)
  6. solution
  7. help with implementing
  8. assess and improve – if/as needed
  1. understand the problem
  2. understand the environment
  3. gather the requirements
  4. align with IT on design
  5. test
  6. train
  7. deploy
  8. follow-up
  1. watch how it is currently done
  2. listen to clients’ pain points
  3. define goals of project
  1. critical path tasks
  2. pros/cons of tasks
  3. impacts
  4. risks
  5. goals
  1. discovery – high level
  2. analysis / evaluation
  3. presentation of options
  4. requirements gathering
  5. epic / feature / story definition
  6. prioritization
  1. who is driving the requirements?
  2. focus on what is needed for project
  3. who is going to use the product?
  1. elicit requirements
  2. hold focus groups
  3. create mock-ups
  4. test
  5. write user stories
  1. analyze
  2. document process
  3. identify waste (Lean)
  4. communicate
  5. document plan / changes
  1. meeting
  2. documentation
  3. strategy
  4. execution plan
  5. reporting plan

List some steps you took on a weird or non-standard project.

I’m not sure these responses are too out of the ordinary. I think that once BAs become more experienced they see these are just variations on familiar themes. That said, most practitioners won’t understand this until they actually have that experience. This work is an attempt to illuminate the larger picture for them ahead of time. It is also, of course, interesting to hear what people think is unexpected or nonstandard.

  • Steps:
    1. Why is there a problem? Is there a problem?
    2. What can change? How can I change it?
    3. How to change the process for lasting results
  • A description of “weird” usually goes along with a particular person I am working with rather than a project. Some people like things done a certain way or they need things handed to them or their ego stroked. I accommodate all kinds of idiosyncrasies so that I can get the project done on time.
  • adjustments in project resources
  • after initial interview, began prototyping and iterated through until agreed upon design
  • built a filter
  • create mock-ups and gather requirements
  • create strategy to hit KPIs
  • data dictionary standardization
  • describing resource needs to the customer so they better understand how much work actually needs to happen and that there isn’t enough staff
  • design sprint
  • design thinking
  • developers and I create requirements as desired
  • document requirements after development work has begun
  • documented non-value steps in a process new to me
  • explained project structure to stakeholders
  • For a client who was unable to clearly explain their business processes and where several SMEs had to be consulted to form the whole picture, I drew workflows to identify inputs/outputs, figure out where the gaps in our understanding existed, and identify the common paths and edge cases.
  • guided solutioning
  • identified handoffs between different contractors
  • identify end results
  • interview individuals rather than host meetings
  • investigate vendor-provided code for business process flows
  • iterative development and delivery
  • made timeline promises to customers without stakeholder buy-in/signoff
  • make executive decisions without stakeholder back-and-forth
  • moved heavy equipment
  • observe people doing un-automated process
  • personally evaluate how committed mgt was to what they said they wanted
  • phased delivery / subject areas
  • physically simulate each step of an operational process
  • regular status reports to CEO
  • resources and deliverables
  • reverse code engineering
  • review production incident logs
  • simulation
  • start with techniques from junior team members
  • starting a project without getting agreed funding from various units
  • statistical modeling
  • surveys
  • team up with PM to develop a plan to steer the sponsor in the right direction
  • town halls
  • track progress in PowerPoint because the sponsor insisted on it
  • train the team how to read use case diagrams
  • translating training documents into Portuguese
  • travel to affiliate sites to understand their processes
  • understanding cultural and legal requirements in a foreign country
  • use a game
  • using a ruler to estimate level of effort to digitize paper contracts in filing cabinets gathered over 40 years
  • work around manager who was afraid of change – had to continually demonstrate the product, ease of use, and savings
  • worked with a mechanic
  • write requirements for what had been developed

Name three software tools you use most.

This is probably the most interesting part of the survey for me. Excel, MS Office products, and collaboration and communication products get the most use by BAs, which shouldn’t be surprising. Actual programming tools are almost exclusively the domain of developers, it seems. I’ve done both, but in general there doesn’t seem to be a lot of crossover. The same goes for hardcore use of database tools.

  • Excel (24)
  • Jira (14)
  • Visio (14)
  • Word (13)
  • Confluence (8)
  • Outlook (7)
  • SharePoint (6)
  • Azure DevOps (5)
  • MS Team Foundation Server (4)
  • PowerPoint (4)
  • email (3)
  • Google Docs (3)
  • MS Dynamics (2)
  • MS Visual Studio (2)
  • Notepad (2)
  • OneNote (2)
  • SQL Server (2)
  • Version One (2)
  • Adobe Reader (1)
  • all MS products (1)
  • ARC / Knowledge Center(?) (Client Internal Tests) (1)
  • Basecamp (1)
  • Blueprint (1)
  • Bullhorn (1)
  • CRM (1)
  • database, spreadsheet, or requirement tool for managing requirements (1)
  • Doors (1)
  • Enbevu(?) (Mainframe) (1)
  • Enterprise Architect (1)
  • Gephi (dependency graphing) (1)
  • Google Calendar (1)
  • Google Drawings (1)
  • illustration / design program for diagrams (1)
  • Kingsway Soft (1)
  • LucidChart (1)
  • MS Office (1)
  • MS Office tools (1)
  • MS Project (1)
  • MS Word developer tools (1)
  • NUnit (1)
  • Power BI (1)
  • Process 98 (1)
  • Python (1)
  • R (1)
  • requirements repositories, e.g., RRC, RTC (1)
  • RoboHelp (1)
  • Scribe (1)
  • Scrumhow (?) (1)
  • Siebel (1)
  • Skype (1)
  • Slack (1)
  • SnagIt (1)
  • SQL (1)
  • Tableau (1)
  • Visible Analyst (1)

Name three non-software techniques you use most.

These should roughly correspond to the 50 business analysis techniques described in the BABOK, but a lot of creativity and personal interaction is also in evidence. It’s also likely that more of these items could be aggregated, since they are so close, but for now I’m trying to preserve the original language of the survey respondents as much as possible.

  • communication (3)
  • interviews (3)
  • meetings (2)
  • process mapping (2)
  • prototyping (2)
  • relationship building (2)
  • wireframing (2)
  • “play package” (1)
  • 1-on-1 meetings to elicit requirements (1)
  • active listening (1)
  • analysis (1)
  • analyze audience (1)
  • apply knowledge of psychology to figure out how to approach the various personalities (1)
  • business process analysis (1)
  • calculator (1)
  • change management (1)
  • coffees with customers (1)
  • coffees with teams (1)
  • collaboration (1)
  • conference calls (1)
  • conflict resolution and team building (1)
  • costing out the requests (1)
  • critical questioning (1)
  • critical questioning (ask why five times), funnel questioning (1)
  • data analysis (1)
  • data modeling (1)
  • decomposition (1)
  • design thinking (1)
  • develop scenarios (1)
  • development efforts (1)
  • diagramming/modeling (1)
  • documentation (1)
  • documenting notes/decisions (1)
  • drinking (1)
  • elicitation (1)
  • expectation level setting (1)
  • face-to-face technique (1)
  • facilitation (1)
  • fishbone diagram (1)
  • Five Whys (1)
  • focus groups (1)
  • handwritten note-taking (1)
  • hermeneutics / interpretation of text (1)
  • impact analysis (1)
  • individual meetings (1)
  • informal planning poker (1)
  • initial mockups / sketches (1)
  • interview end user (1)
  • interview stakeholders (1)
  • interview users (1)
  • interviewing (1)
  • JAD sessions (Joint Application Development Sessions) (1)
  • listening (1)
  • lists (1)
  • meeting facilitation (prepare an agenda, define goals, manage time wisely, ensure notes are taken and action items documented) (1)
  • notes (1)
  • observation (1)
  • organize (1)
  • paper (1)
  • pen and paper (1)
  • phone calls and face-to-face meetings (1)
  • Post-It notes (Any time of planning or breaking down of a subject, I use different colored Post-Its, writing with a Sharpie, on the wall. This allows me to physically see an idea from any distance. I can also move and categorize at will. When done, take a picture.) (1)
  • prioritization (MOSCOW) (1)
  • process decomposition (1)
  • process design (1)
  • process flow diagrams (1)
  • process modeling (1)
  • prototyping (can be on paper) (1)
  • recognize what are objects (nouns) and actions (verbs) (1)
  • requirements meetings (1)
  • requirements verification and validation (1)
  • responsibility x collaboration using index cards (1)
  • rewards (food, certificates) (1)
  • Scrum Ceremonies (1)
  • Scrums (1)
  • shadowing (1)
  • sketching (1)
  • spreadsheets (1)
  • stakeholder analysis (1)
  • stakeholder engagement (1)
  • stakeholder engagement – visioning to execution and post-assessment (1)
  • stakeholder interviews (1)
  • surveys (1)
  • swim lanes (1)
  • taking notes (1)
  • test application (1)
  • training needs analysis (1)
  • use paper models / process mapping (1)
  • user group sessions (1)
  • user stories (1)
  • visual modeling (1)
  • whiteboard diagrams (1)
  • whiteboard workflows (1)
  • whiteboarding (1)
  • workflows (1)
  • working out (1)

Name the goals of a couple of different projects.

  • add enhancements to work flow app
  • adhere to regulatory requirements
  • adjusting solution to accommodate the needs of a new/different user base
  • automate a manual form with a workflow
  • automate a manual login/password generation and dissemination to users
  • automate a manual process
  • automate a manual process, reduce time and staff to accomplish a standard organizational function
  • automate a paper-based contract digitization process
  • automate and ease reporting (new tool)
  • automate highly administrative, easily repeatable processes which have wide reach
  • automate new process
  • automate the contract management process
  • automate the process of return goods authorizations
  • automate workflow
  • automate workflows
  • automation
  • block or restore delivery service to areas affected by disasters
  • bring foreign locations into a global system
  • build out end user-owned applications into IT managed services
  • business process architecture
  • clear bottlenecks
  • consolidate master data
  • create a “how-to” manual for training condo board members
  • create a means to store and manage condo documentation
  • create a reporting mechanism for healthcare enrollments
  • data change/update
  • data migration
  • design processes
  • develop a new process to audit projects in flight
  • develop an interface between two systems
  • develop data warehouse
  • develop effort tracking process
  • develop new functionality
  • develop new software
  • document current inquiry management process
  • enhance system performance
  • establish standards for DevOps
  • establish vision for various automation
  • I work for teams implementing Dynamics CRM worldwide. I specialize in data migration and integration.
  • implement data interface with two systems
  • implement new software solution
  • implement software for a new client
  • implement vendor software with customizations
  • improve a business process
  • improve system usability
  • improve the usage of internal and external data
  • improve user interface
  • include new feature on mobile application
  • increase revenue and market share
  • integrate a new application with current systems/vendors
  • maintain the MD Product Evaluation List (online)
  • map geographical data
  • merge multiple applications
  • migrate to a new system
  • move manual Excel reports online
  • process data faster
  • process HR data and store records
  • provide business recommendations
  • recover fuel-related cost fluctuations
  • redesign
  • redesign a system process to match current business needs
  • reduce technical debt
  • re-engineer per actual user requirements
  • reimplement solution using newer technology
  • replace current analysis tool with new one
  • replace manual tools with applications
  • replatform legacy system
  • simplify / redesign process
  • simplify returns for retailer and customer
  • standardize / simplify a process or interface
  • system integration
  • system integration / database syncing
  • technical strategy for product
  • transform the customer experience (inside and outside)
  • update a feature on mobile app
  • update the e-commerce portion of a website to accept credit and debit cards

Business Analysis Survey: Round Five

I collected more survey responses at the recent Tampa IIBA meetup, and the results are posted below.

List 5-8 steps you take during a typical project.

  1. listen – to stakeholders and customers
  2. analyze – documents, data, etc. to understand things further
  3. repeat back what I’m hearing to make sure I’m understanding correctly
  4. synthesize – the details
  5. document – as needed (e.g., Visio diagrams, PowerPoint decks, Word, tool, etc.)
  6. solution
  7. help with implementing
  8. assess and improve – if/as needed
  1. understand the problem
  2. understand the environment
  3. gather the requirements
  4. align with IT on design
  5. test
  6. train
  7. deploy
  8. follow-up
  1. watch how it is currently done
  2. listen to clients’ pain points
  3. define goals of project
  1. critical path tasks
  2. pros/cons of tasks
  3. impacts
  4. risks
  5. goals
  1. discovery – high level
  2. analysis / evaluation
  3. presentation of options
  4. requirements gathering
  5. epic / feature / story definition
  6. prioritization
  1. who is driving the requirements?
  2. focus on what is needed for project
  3. who is going to use the product?
  1. elicit requirements
  2. hold focus groups
  3. create mock-ups
  4. test
  5. write user stories
  1. analyze
  2. document process
  3. identify waste (Lean)
  4. communicate
  5. document plan / changes
  1. meeting
  2. documentation
  3. strategy
  4. execution plan
  5. reporting plan

List some steps you took on a weird or non-standard project.

  • built a filter
  • create strategy to hit KPIs
  • design sprint
  • design thinking
  • identify end results
  • moved heavy equipment
  • resources and deliverables
  • start with techniques from junior team members
  • translating training documents into Portuguese
  • understanding cultural and legal requirements in a foreign country
  • worked with a mechanic

Name three software tools you use most.

  • Excel (6)
  • Outlook (5)
  • Jira (3)
  • Azure DevOps (2)
  • MS Dynamics (2)
  • MS Teams (2)
  • SharePoint (2)
  • Bullhorn (1)
  • Confluence (1)
  • Kingsway Soft (1)
  • MS Office tools (1)
  • Power BI (1)
  • PowerPoint (1)
  • Scribe (1)
  • Siebel (1)
  • Skype (1)
  • Slack (1)
  • Version One (1)
  • Visio (1)
  • Visual Studio Team Services (now Azure DevOps) (VSTS) (1)
  • Word (1)

Name three non-software techniques you use most.

  • change management
  • coffees with customers
  • coffees with teams
  • collaboration
  • conference calls
  • design thinking
  • development efforts
  • documenting notes/decisions
  • drinking
  • face-to-face technique
  • focus groups
  • interviewing
  • interviews
  • prioritization (MOSCOW)
  • process design
  • prototyping (can be on paper)
  • relationship building
  • sketching
  • stakeholder engagement
  • stakeholder engagement – visioning to execution and post-assessment
  • working out

Name the goals of a couple of different projects.

  • add enhancements to work flow app
  • automate highly administrative, easily repeatable processes which have wide reach
  • automate workflow
  • automate workflows
  • bring foreign locations into a global system
  • business process architecture
  • consolidate master data
  • design processes
  • develop new software
  • establish standards for DevOps
  • establish vision for various automation
  • I work for teams implementing Dynamics CRM worldwide. I specialize in data migration and integration.
  • improve the usage of internal and external data
  • replace manual tools with applications
  • simplify / redesign process
  • standardize / simplify a process or interface
  • technical strategy for product
  • transform the customer experience (inside and outside)

Communicating With Leaders

Yesterday I attended the Tampa IIBA’s chapter meeting to see the presentation given by Kara Sundar, titled Communicating With Leaders: Increase Your Productivity and Influence. The presentation was very effective and based on materials from the book Change Intelligence by Barbara Trautlein.

Ms. Sundar described how people with different viewpoints contribute to and shape projects in different ways, based on their individual concerns, roles, and outlooks. Three primary axes are defined, as shown in the first figure below. The three axes are defined as “Head,” “Heart,” and “Hands,” which you can otherwise think of as thinking, feeling, and doing.

Seven roles are then defined with respect to these axes as follows. Primary concerns of each role are listed in the bullets and in the second image below.

  • Visionary: Centered on high end of head axis
    • Strategy, Goals, Objectives, Opportunities, Outcomes
  • Champion: Between high ends of head and heart axes
    • Impact, Stakeholders, Benefits, Performance, Success, Results
  • Coach: Centered on high end of heart axis
    • Engagement, Development, Team Dynamics, Buy-In, Clients & Customers, Feedback
  • Facilitator: Between high ends of heart and hands axes
    • Process, Workflow, Evaluation, Learning Plan, Training Material, Documentation
  • Executer: Centered on high end of hands axis
    • Plan, Metrics, Deadline, Testing, Tickets/Tracking, Budget
  • Driver: Between high ends of hands and head axes
    • Plan, Execution, Milestones, KPIs, Decisions, Dashboards, Commitment
  • Adapter: Centered around the origin at the intersection of all axes
    • Essentially a hybrid of all the other roles

The context of the presentation was that different kinds of people approach activities in different ways. Understanding the different possibilities can provide some insight into why your co-workers, customers, and stakeholders might see things differently than you do, and also suggest guidance for bridging those gaps. In that regard it serves sort of the same function as the Myers-Briggs Type Indicator (MBTI) which, whether you like it or not, at least gives some appreciation that people can be different and how they might be understood and approached. (This book describes potential MBTI applications in the workplace.) No such representation is going to be perfect, but each can serve as a useful point of departure for thinking about things.

Barbara Trautlein included a radar chart of the prevalence of the outlooks in the general population, which she probably derived from survey data she collected. It showed the occurrence from most to least prevalent to be (roughly) Champion (22%), Coach (20%), Visionary (17%), Adapter (15%), Executer (11%), Driver (8%), and Facilitator (7%). Graphically, this skews up and left on the triangular plot. One might be tempted to observe that more people seem to care about how the work gets done than about actually doing it, but, when Ms. Sundar tried to survey the room, the BA practitioners there skewed hard to the lower right. (I’ve done so much of all these I’ve become an adapter, but I definitely skewed towards the practitioner-doer end of things when I started out.)

I generally like frameworks that try to classify concepts in a useful way, but it took me a while to let this sink in. My original impression was that this classification was the flip side of the Nine SDLC Cross-Functional Areas breakdown defined by Kim Hardy that I wrote about a couple of years ago. Ms. Hardy defines nine team roles in terms of realizing different functional parts of a solution (Business Value, User Experience, Process Performance, Development Process, System Value, System Integrity, Implementation, Application Architecture, and Technical Architecture). These are more the whats of a project, which is contrasted with the schema Ms. Sundar described, which defines the hows. There’s a hair of overlap, but I think the formulations are trying to get at different things. Both, of course, are useful and thought-provoking.

This made me think about my own analytic and procedural framework, which defines a series of iterative phases that guide progress through a project or engagement. My phases are Planning, Intended Use, Assumptions/Capabilities/Risks, Conceptual Model, Data Sources, Requirements, Design, Implementation, Test, and Acceptance. (This can be shortened to Intended Use, Conceptual Model, Requirements, Design, Implementation, and Test.) This framework tries to incorporate the hows and the whats through time. It’s not a stretch to map the Change Intelligence or SDLC Cross-Functional roles to the phases in my framework.

In fact, let’s try it. Not only am I going to try to map the Change Intelligence and SDLC roles to the phases in my framework, I’m going to map yet another set of defined roles to it, based on the roles I’ve served in over the years.

  • Bob’s Framework Phase: Planning
    • CI Roles: Driver, Executer, Coach, Visionary, Champion
    • SDLC Roles: Business Value, System Value, Application Architecture
    • Bob’s Roles: Systems Analyst, Project/Program Manager, Full Life Cycle Engineer
  • Bob’s Framework Phase: Intended Use
    • CI Roles: Facilitator, Visionary, Champion
    • SDLC Roles: Business Value, System Value, Application Architecture, Technical Architecture
    • Bob’s Roles: Systems Analyst, Project/Program Manager, Full Life Cycle Engineer
  • Bob’s Framework Phase: Assumptions/Risks
    • CI Roles: Driver, Visionary, Champion
    • SDLC Roles: Business Value, System Value, Application Architecture
    • Bob’s Roles: Systems Analyst, Project/Program Manager, Full Life Cycle Engineer
  • Bob’s Framework Phase: Conceptual Model
    • CI Roles: Coach, Facilitator, Champion
    • SDLC Roles: Business Value, System Value, Application Architecture
    • Bob’s Roles: Systems Analyst, Product Owner, Discovery Lead
  • Bob’s Framework Phase: Data Sources
    • CI Roles: Driver, Executer, Facilitator, Champion
    • SDLC Roles: Business Value, User Experience, System Value, System Integrity, Application Architecture, Technical Architecture
    • Bob’s Roles: Systems Analyst, Product Owner, Data Collector, Data Analyst
  • Bob’s Framework Phase: Requirements
    • CI Roles: Driver, Executer, Coach, Facilitator
    • SDLC Roles: Business Value, User Experience, Process Performance, System Value, System Integrity, Application Architecture, Technical Architecture
    • Bob’s Roles: Systems Analyst, System Architect, Software Engineer, Product Owner, Project/Program Manager, Full Life Cycle Engineer
  • Bob’s Framework Phase: Design
    • CI Roles: Executer, Coach, Facilitator
    • SDLC Roles: Business Value, User Experience, Process Performance, Development Process, System Value, System Integrity, Implementation, Application Architecture, Technical Architecture
    • Bob’s Roles: Systems Analyst, System Architect, Software Engineer, Product Owner, Project/Program Manager, Full Life Cycle Engineer
  • Bob’s Framework Phase: Implementation
    • CI Roles: Driver, Executer, Coach, Facilitator
    • SDLC Roles: User Experience, Process Performance, Development Process, System Integrity, Implementation, Application Architecture, Technical Architecture
    • Bob’s Roles: Systems Analyst, System Architect, Software Engineer, Product Owner, Project/Program Manager, Full Life Cycle Engineer
  • Bob’s Framework Phase: Test
    • CI Roles: Driver, Executer, Coach, Facilitator
    • SDLC Roles: User Experience, Process Performance, Development Process, System Integrity, Implementation
    • Bob’s Roles: Systems Analyst, System Architect, Software Engineer, Product Owner, Project/Program Manager, Full Life Cycle Engineer
  • Bob’s Framework Phase: Acceptance
    • CI Roles: Driver, Executer, Coach, Facilitator, Visionary, Champion
    • SDLC Roles: Business Value, User Experience, Process Performance, System Value, System Integrity, Application Architecture, Technical Architecture
    • Bob’s Roles: Systems Analyst, System Architect, Product Owner, Project/Program Manager, Full Life Cycle Engineer

This is not the cleanest of exercises, as you can see, which is an indication that every role and outlook applies across a variety of activities. Some of the roles I listed on my Roles page are so general, or apply so far outside this context, that I didn’t include them here. They are Tech Lead, Simulation Engineer, Operations Research Analyst, Control Systems Engineer, Field Engineer, and Process Improvement Specialist. The Independent VV&A activity from my Portfolio page indicates that I should also define a role of Test and V&V Agent, since I’ve served in that role in many projects and contexts, even if I haven’t done it in a specialized, standalone capacity in an organization. The Program Manager role should probably be applied only to the beginning and ending phases of individual efforts, leaving Project Managers to handle the details during each one.

Other roles, like ScrumMaster and Scrum Developer, can overlap the above roles and phases in different ways. Other functional specialties haven’t been included in these discussions, either. They include roles like Database Analyst/Developer, Graphic Artist, and UI/UX Designer, and perhaps even Business Analyst (I see a systems analyst as being more general). The number of roles that can be defined is potentially endless, which probably contributes to some of the confusion that exists in the practice of business analysis. To that end I’ve been working on a Unified Theory of Business Analysis, which I’ll be writing about in the coming days.


A Simulationist’s Framework for Business Analysis: Round Four

Yesterday I gave my talk at the Orlando IIBA chapter meetup. The updated presentation is here:

https://www.rpchurchill.com/presentations/zSimFrameForBA_Orlando/SimFrameForBA.html

This presentation was a bit different than previous ones because I’ve begun to add some new material dealing with what I call The Unified Theory of Business Analysis. I call it this because the surveys I’ve conducted (previous combined results are here) show that the contexts in which people perform business analysis can vary widely. This effort has been my attempt to get a handle on the practice and communicate my findings to an audience.

Specifically, I’m coming to the conclusion that the reason things look so different to different practitioners is because most people are only trying to solve a limited problem most of the time. My framework describes a process for conducting a full scope engagement from beginning to end, and most people — and certainly most business analysts — typically find themselves involved in a limited subset of such efforts.

Let’s look at a possible map of an entire (small) business or physical process.

Let’s also consider the process of completing an end-to-end process as a whole.

The first question we might ask is How can the 50 BA Techniques from the BABOK be applied? We can imagine that some techniques are best applied to certain subparts of an operational process and during certain phases in time of a project process.

The next question to ask might be Where do the most commonly used software tools fit? Excel appears to be the most common tool BAs use to compile and manipulate information, while Jira and similar systems are used for communication and tracking. Visio, Word, Confluence, Outlook, and SharePoint are also used a lot by BAs.

Another source of confusion might be that a BA does not always participate in all phases of a project. He or she might be involved only in the requirements phase or the user testing phase or the discovery phase or even in a limited subset of phases. I’ve pointed out previously that all projects and efforts perform all of these phases implicitly, but some may be streamlined or omitted as the size of the effort scales down. The entire process takes place implicitly or explicitly even if any one participant or group only sees a small fraction of the activity.

A further consideration is whether the BA is looking at the function of a part of a process or the characteristics of what will perform that function. For example, the abstract part of a solution might involve calculations to be performed, messages to be passed into and out of a process component, and items to be stored and retrieved. The concrete part of a solution might involve determining the qualities the server must have to carry out the abstract functions with sufficient speed and reliability. That is, sometimes the BA will evaluate what something has to do and sometimes what something has to be.

The point is that organizations exist to provide value, and it takes many different kinds of people to make that happen. Business analysts are generally right in the middle of the action.

* * * * *

I also got a few more survey responses, the results of which are reported below.

List 5-8 steps you take during a typical project.

  1. planning
  2. elicitation
  3. requirements
  4. specification writing
  5. QA
  6. UAT
  1. identify the problem
  1. studying subject matter
  2. planning
  3. elicitation
  4. functional specification writing
  5. documentation
  1. identify stakeholders
  2. assess working approach (Waterfall, Agile, Hybrid)
  3. determine current state of requirements and maturity of project vision
  4. interview stakeholders
  5. write and validate requirements
  1. problem definition
  2. value definition
  3. decomposition
  4. dependency analysis
  5. solution assessment
  1. process mapping
  2. stakeholder interviews
  3. write use cases
  4. document requirements
  5. research

List some steps you took on a weird or non-standard project.

  • A description of “weird” usually goes along with a particular person I am working with rather than a project. Some people like things done a certain way or they need things handed to them or their ego stroked. I accommodate all kinds of idiosyncrasies so that I can get the project done on time.
  • data dictionary standardization
  • document requirements after development work has begun
  • For a client who was unable to clearly explain their business processes and where several SMEs had to be consulted to form the whole picture, I drew workflows to identify inputs/outputs, figure out where the gaps in our understanding existed, and identify the common paths and edge cases.
  • investigate vendor-provided code for business process flows
  • reverse code engineering
  • review production incident logs
  • team up with PM to develop a plan to steer the sponsor in the right direction
  • track progress in PowerPoint because the sponsor insisted on it
  • train the team how to read use case diagrams

Name three software tools you use most.

  • Visio (5)
  • Jira (4)
  • Excel (3)
  • Confluence (2)
  • Google Docs (2)
  • email (1)
  • Gephi (dependency graphing) (1)
  • Google Calendar (1)
  • MS Teams (1)
  • OneNote (1)
  • Version One (1)
  • Visual Studio Team Services (now Azure DevOps) (VSTS) (1)
  • Word (1)

Name three non-software techniques you use most.

  • critical questioning
  • critical questioning (ask why five times), funnel questioning
  • data analysis
  • informal planning poker
  • interviews
  • meeting facilitation (prepare an agenda, define goals, manage time wisely, ensure notes are taken and action items documented)
  • observation
  • Post-It notes (Any time of planning or breaking down of a subject, I use different colored Post-Its, writing with a Sharpie, on the wall. This allows me to physically see an idea from any distance. I can also move and categorize at will. When done, take a picture.)
  • process mapping
  • relationship building
  • requirements verification and validation
  • stakeholder analysis
  • stakeholder interviews
  • visual modeling
  • whiteboarding
  • wireframe

Name the goals of a couple of different projects.

  • automate a manual form with a workflow
  • automate the process of return goods authorizations
  • build out end user-owned applications into IT managed services
  • develop a new process to audit projects in flight
  • develop an interface between two systems
  • implement data interface with two systems
  • implement software for a new client
  • implement vendor software with customizations
  • integrate a new application with current systems/vendors
  • merge multiple applications
  • migrate to a new system
  • redesign a system process to match current business needs
  • update the e-commerce portion of a website to accept credit and debit cards

These findings fit in nicely with previously collected survey data.


Computer Simulation

I gave this talk on computer simulation at the Mensa Regional Gathering in Orlando on Sunday, January 20, 2019. The slides for the presentation are here.

I give a brief description of the major types of simulation and discuss some applications of each.


Monitoring System Health and Availability, and Logging: Part 4

One more piece of context must be added to the discussion I’ve written up here, here, and here, and that is the place of these operations in the 7-layer OSI communications model.

The image above is copied from and linked to the appropriate Wikipedia page. The clarification I’m making is that, in general, all of the operations I’m describing with respect to monitoring and logging take place strictly at layer 7, the application layer. This is the level of the communication process that application programmers deal with in most cases, particularly when working with TCP/IP and higher-level protocols like HTTP, even if that code writes specific information into message headers along with the messages.

Some applications will work with communications at lower levels. For example, I’ve worked with serial communications in real-time C++ code triggered by hardware interrupts, where the operations at layers 5 and 6 were handled in the application, but the operations at layers 1 through 4 were handled by default in hardware and firmware; routing isn’t a consideration because serial is just a point-to-point connection. Even in that case, the monitoring and logging actions are performed (philosophically) at the application layer (layer 7).

Finally, it’s also possible to monitor the configuration of certain items at lower levels. Examples are ports, URLs, IP addresses, security certificates, authorization credentials, machine names, software version numbers (including languages, databases, operating systems), and other items that may affect the success or failure of communications. Misconfiguration of these items is likely to result in complete inability to communicate (e.g., incorrect network settings) or strange side-effects (e.g., incorrect language versions, especially for virtual machines supporting Java, JavaScript/Node, PHP, and the like).
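
As a rough illustration of checking some of these items from the application layer, here is a small Python sketch (standard library only) that confirms a TCP port is reachable and reports how long a server’s TLS certificate has left before it expires. The host and port are placeholders; a real monitor would read its targets from configuration and feed the results into whatever logging and alerting already exists.

```python
import socket
import ssl
from datetime import datetime, timezone

# Illustrative target only; a real monitor would read these from configuration.
HOST, PORT = "example.com", 443

def check_port(host, port, timeout=5.0):
    """Confirm basic reachability (a transport-layer concern checked from layer 7)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as err:
        print(f"{host}:{port} unreachable: {err}")
        return False

def check_certificate(host, port, timeout=5.0):
    """Report how long the server's TLS certificate remains valid."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(cert["notAfter"]),
                                     tz=timezone.utc)
    remaining = expires - datetime.now(timezone.utc)
    print(f"{host}:{port} certificate expires {expires:%Y-%m-%d} "
          f"({remaining.days} days left)")

if check_port(HOST, PORT):
    check_certificate(HOST, PORT)
```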


Monitoring System Health and Availability, and Logging: Part 3

Now that we’ve described how to sense error states and something about how to record logging information on systems running multiple processes, we’ll go into some deeper aspects of these activities. We’ll first discuss storing information so errors can be recovered from and reconstructed. We’ll then discuss errors from nested calls in multi-process systems.

Recovering From Errors Without Losing Information and Events

We’ve described how communications work between processes on a single machine and across multiple machines. If an original message is not successfully sent or received for any reason, the operation of the receiving or downstream process will be compromised. If no other related events occur in the downstream process, then the downstream action will not be carried out. If the message is passed in support of some other downstream action that does occur, however, then the downstream action will be carried out with missing information (that might, for example, require the use of manually entered or default values in the place of what wasn’t received). An interesting instance of the latter case is manufacturing systems where a physical workpiece may move from one station to another while the informational messages are not forwarded along with it. This may mean that the workpiece in the downstream station will have to be processed without identifying information, processing instructions, and so on.

There are a few ways to overcome this situation:

  • Multiple retries: This involves re-sending the message from an upstream process to a downstream process until a successful receipt (and completion?) is received by the upstream process. This operation fails when the upstream process itself fails. It may also be limited if new messages must be sent from an upstream process to a downstream process before the previous message is successfully sent.
  • Queueing requests: This involves storing the messages sent downstream so repeated attempts can be made to get them all handled. Storing in a volatile queue (e.g., in memory) may fail if the upstream process fails. Storing in a non-volatile queue (e.g., on disk) is more robust. The use of queues may also be limited if the order in which messages are sent is important, though including timestamp information may overcome those limits.
  • Pushing vs. Pulling: The upstream process can queue and/or retry sending the messages downstream until they all get handled. The downstream system can also fetch current and missed messages from the upstream system. It’s up to the pushing or pulling system to keep track of which actions or messages have been successfully handled and which must still be dealt with. (A small sketch of the queue-and-retry idea follows this list.)
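
Here is that sketch: a minimal, non-volatile outbound queue written in Python with SQLite as the durable store. The send_downstream() function is a placeholder for whatever transport the real system uses, and the table layout and retry policy are illustrative assumptions rather than a definitive design; ordering is preserved by the stored timestamps, as noted above.

```python
import json
import sqlite3
import time

# Durable outbound queue: messages are persisted before any delivery attempt,
# so they survive a failure of the upstream process. Illustrative sketch only.

db = sqlite3.connect("outbound_queue.db")
db.execute("""CREATE TABLE IF NOT EXISTS outbound (
                  id INTEGER PRIMARY KEY AUTOINCREMENT,
                  created REAL,
                  payload TEXT,
                  delivered INTEGER DEFAULT 0)""")

def send_downstream(payload: str) -> bool:
    """Placeholder transport; must return True only on confirmed receipt."""
    raise NotImplementedError

def enqueue(message: dict):
    """Persist the message first, then let the flusher worry about delivery."""
    db.execute("INSERT INTO outbound (created, payload) VALUES (?, ?)",
               (time.time(), json.dumps(message)))
    db.commit()

def flush(max_attempts: int = 3):
    """Retry undelivered messages in creation order."""
    rows = db.execute("SELECT id, payload FROM outbound WHERE delivered = 0 "
                      "ORDER BY created").fetchall()
    for row_id, payload in rows:
        for attempt in range(max_attempts):
            try:
                if send_downstream(payload):
                    db.execute("UPDATE outbound SET delivered = 1 WHERE id = ?",
                               (row_id,))
                    db.commit()
                    break
            except Exception:
                time.sleep(2 ** attempt)   # simple backoff before the next try
```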

There may be special considerations depending on the nature of the system being designed. Some systems are send-only by nature. This may be a function of the communication protocol itself or just a facet of the system’s functional design.

In time-sensitive systems some information or actions may “age out.” This means the information can no longer be used in any meaningful context as events are happening, but keeping it around may still be useful for reconstructing events after the fact. That reconstruction may be done by hand or in some automated way, by functions that continually sweep for unprocessed items that can be correlated with known events.

For example, an upstream process may forward a message to a downstream process in conjunction with a physical workpiece. The message is normally received by the downstream system ahead of the physical workpiece so that it may be associated with the workpiece when it arrives. If the message isn’t received before the physical piece, the downstream process may assign a temporary ID and default processing values to the piece. If the downstream process receives the associated message while the physical piece is still being worked on, then the message can be properly associated and the correct instructions have a better chance of being carried out. The operating logs of the downstream process can also be edited or appended as needed. If the downstream process receives the associated message after the physical piece has left, then all the downstream system can do is log the information, and possibly pass it down to the next downstream process, in hopes that it will eventually catch up with the physical piece.
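A quick sketch of that reconciliation logic, in Python. The WorkOrder fields, the default “recipe,” and the TEMP-n ID scheme are all invented for illustration; a real manufacturing system would have much richer identifying and processing information.

```python
# Sketch of associating a late-arriving message with a workpiece already in the station.
# WorkOrder fields, the default "recipe," and the TEMP-n ID scheme are all invented here.
import itertools
from dataclasses import dataclass, field
from typing import Optional

_temp_ids = itertools.count(1)

@dataclass
class WorkOrder:
    piece_id: str
    instructions: dict = field(default_factory=lambda: {"recipe": "default"})
    provisional: bool = False

def piece_arrived(message: Optional[dict]) -> WorkOrder:
    """If the message beat the piece here, use it; otherwise assign a temporary ID and defaults."""
    if message is not None:
        return WorkOrder(piece_id=message["piece_id"], instructions=message["instructions"])
    return WorkOrder(piece_id=f"TEMP-{next(_temp_ids)}", provisional=True)

def message_arrived(order: WorkOrder, message: dict, piece_still_here: bool) -> None:
    """Reconcile a provisional order if the piece is still being worked; otherwise log and forward."""
    if piece_still_here and order.provisional:
        order.piece_id = message["piece_id"]
        order.instructions = message["instructions"]
        order.provisional = False
    else:
        print("late message for departed piece; logging and forwarding:", message["piece_id"])

order = piece_arrived(message=None)  # the piece beat its message to the station
message_arrived(order, {"piece_id": "WP-1234", "instructions": {"recipe": "B7"}}, piece_still_here=True)
print(order)  # WorkOrder(piece_id='WP-1234', instructions={'recipe': 'B7'}, provisional=False)
```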

Another interesting case arises when the communication (or message- or control-passing) process fails on the return trip, after all of the desired actions were successfully completed downstream. Those downstream actions might include permanent side-effects like the writing of database entries and the construction of complex, volatile data structures. The queuing/retry mechanisms have to be smart enough to detect that the desired operations have actually been completed, so they aren’t repeated.

A system will ideally be robust enough to ensure that no data or events ever get lost, and that each is handled exactly the right number of times, without duplication. Database systems that adhere to the ACID model provide these guarantees within a transaction; across processes and machines, the usual approximation is at-least-once delivery combined with handlers that are safe to repeat.
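One common way to get there is to make the downstream handler idempotent, so a retried message simply replays the stored result instead of repeating its side-effects. Here’s a minimal sketch, assuming the upstream process stamps each request with a unique operation ID; the in-memory store and the write_to_database() stand-in are illustrative only, and a real system would persist the completed-operation record.

```python
# Sketch of an idempotent downstream handler: retries replay the stored result
# instead of repeating the side-effects. The operation_id stamp and the in-memory
# store are assumptions; a real system would persist the completed-operation record.
completed_operations = {}  # operation_id -> stored result

def write_to_database(payload: dict) -> int:
    """Stand-in for the real, permanent side-effect."""
    return len(payload)

def handle_request(operation_id: str, payload: dict) -> dict:
    """Perform the side-effecting work at most once; resend the stored result on retries."""
    if operation_id in completed_operations:
        return completed_operations[operation_id]  # reply was lost earlier? just resend it
    result = {"status": "ok", "rows_written": write_to_database(payload)}
    completed_operations[operation_id] = result
    return result

print(handle_request("op-001", {"a": 1, "b": 2}))  # does the work
print(handle_request("op-001", {"a": 1, "b": 2}))  # retry: same reply, no repeated work
```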

Properly Dealing With Nested Errors

Systems that pass messages or control through nested levels of functionality and then receive responses in return need a messaging mechanism that clearly indicates what went right or wrong. More to the point, since the happy path functionality is most likely to work reasonably well, particular care must be taken to communicate a complete contextual description of any errors encountered.

Consider the following function:

A properly constructed function would return detailed errors from every identifiable point of failure, and a general error for any failure it can’t identify. (This drawing should also be expanded to include reporting on calculation and other internal errors.) This generally isn’t difficult in the inline code over which the function has control, but what happens if control is passed to a nested function of some kind? And what if that function is every bit as complex as the calling function? In that case the returned error should include information identifying the point of failure in the calling function, with a unique error code and/or text description (how verbose you want to be depends on the system), and embedded within that should be the same kind of information returned from the called function. Doing this gives a form of stack trace for errors (this level of detail generally isn’t needed for successfully traversed happy paths) and a very, very clear understanding of what went wrong, where, and why.

If the relevant processes can each perform their own logging they should also do so, particularly if the different bits of functionality reside on different machines, as would be the case in a microservices architecture, but scanning error logs across different systems can be problematic. Being able to record errors at a higher level makes understanding process flows a little more tractable and could save a whole lot of forensic reconstruction of the crime.
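Here’s a minimal sketch of that wrapping idea in Python, using exception chaining. The error codes and function names are invented for illustration; the mechanism to notice is that the outer error embeds the inner one, so the caller sees both the outer and inner points of failure in a single report.

```python
# Sketch of nested error context via exception chaining. Error codes and function
# names are invented; the point is that the outer error embeds the inner one.
class OperationError(Exception):
    def __init__(self, code: str, detail: str):
        super().__init__(f"{code}: {detail}")
        self.code = code

def parse_record(raw: str) -> dict:
    """Inner (called) function: fails with its own code and description."""
    try:
        key, value = raw.split("=", 1)
    except ValueError as exc:
        raise OperationError("E201", f"malformed record {raw!r}") from exc
    return {key: value}

def process_message(raw: str) -> dict:
    """Outer (calling) function: wraps the nested error with its own context."""
    try:
        return parse_record(raw)
    except OperationError as exc:
        raise OperationError("E100", f"process_message failed -> {exc}") from exc

try:
    process_message("no-equals-sign")
except OperationError as exc:
    print(exc)  # E100: process_message failed -> E201: malformed record 'no-equals-sign'
```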

Another form of nesting is internal operations that deal with multiple items, whether in arrays or in some other kind of grouped structure. This is especially important if separate, complex, brittle, nested operations are to be performed on each item, and where each can either complete or fail to complete with completely different return conditions (and associated error codes and messages). In this case the calling function should return information describing the outcome of processing for each element (especially those that returned errors), so only those items can be queued and/or retried as needed. This can get very complicated if that group of items is processed at several different steps in the calling function, and different items can return different errors not only within a single operation, but across multiple operations. That said, once an item generates an error at an early step, it probably shouldn’t continue to be part of the group being processed at a later step. It should instead be queued and retried at some level of nested functionality.
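A small sketch of per-item outcome tracking, again in Python. The step functions and error text are placeholders; the behavior to notice is that items failing an early step drop out of later steps and come back with the error that stopped them, so only those items get queued for retry.

```python
# Sketch of per-item outcome tracking across multiple steps. Step functions and
# error text are placeholders; failed items drop out of later steps and are
# returned with the error that stopped them so only they get queued for retry.
def step_validate(item: dict) -> None:
    if "id" not in item:
        raise ValueError("missing id")

def step_transform(item: dict) -> None:
    item["id"] = str(item["id"]).upper()

def process_batch(items):
    """Return (items that finished every step, items to retry with their first error)."""
    survivors, retry_queue = list(items), []
    for step in (step_validate, step_transform):
        still_ok = []
        for item in survivors:
            try:
                step(item)
                still_ok.append(item)
            except Exception as exc:
                retry_queue.append((item, f"{step.__name__}: {exc}"))
        survivors = still_ok  # items that already failed don't see later steps
    return survivors, retry_queue

done, to_retry = process_batch([{"id": "abc"}, {"name": "no id here"}])
print(done)      # [{'id': 'ABC'}]
print(to_retry)  # [({'name': 'no id here'}, 'step_validate: missing id')]
```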

Further Considerations

One more way to clear things up is to break larger functions down into smaller ones where possible. There are arguments for keeping a series of operations in a single function if they make sense logically and in the context of the framework being used, but there are arguments for clarity, simplicity, separation of concerns, modularity, and understandability as well. Whatever choice you make, know that you’re making it and do it on purpose.

If it feels like we’re imposing a lot of overhead to do error checking, monitoring, reporting, and so on, consider the kind of system we might be building. In a tightly controlled simulation system used for analysis, where calculation speed is the most important consideration, the level of monitoring can be greatly reduced if it is known that the system is properly configured. In a production business or manufacturing system, however, the main considerations are going to be robustness, security, and customer service. Execution speed is far less likely to be the overriding consideration. In that case, avoiding the loss of data and events is the main goal of the system’s operation.

Posted in Software | Tagged , , , , , , , | Leave a comment

Monitoring System Health and Availability, and Logging: Part 2

Continuing yesterday’s discussion of monitoring and logging, I wanted to work through a few specific cases and try to illustrate how things can be sensed in an organized way, and especially how errors of various types can be detected. To that end, let’s start with a slightly simplified situation as shown here, where we have a single main process running on each machine, and the monitoring, communication, and logging are all built into each process.

As always, functionality on the local machine is straightforward. In this simplified case, things are either working or they aren’t, and that should be reflected in whatever logs are or are not written. Sensing the state and history of the remote machine is what’s interesting.

First, let’s imagine a remote system that supports only one type of communication. That channel must be able to support messaging for many functions, including whatever operations it’s carrying out, remote system administration (if appropriate), reporting on current status, and reporting on historical data. The remote system must be able to interpret the incoming messages so it can reply with an appropriate response. Most importantly, it has to be able to sense which of the four kinds of messages it’s receiving. Let’s look at each function in turn; a minimal dispatch sketch follows the list.

  • Normal Operations: The incoming message must include enough information to support the desired operation. Messages will include commands, operating parameters, switches, data records, and so on.
  • Remote System Administration: Not much of this will typically happen. If the remote machine is complex and has a full, independent operating system, then it is likely to be administered manually at the machine or through a standard remote interface, different from the operational interface we’re thinking about now. Admin commands using this channel are likely to be few and simple: stop, start, reboot, reset communications, and the like. I include this mostly for completeness.
  • Report Current State: This is mostly a way to query and report on the current state of the system. The incoming command is likely to be quite simple. The response will be only as complex as needed to describe the running status of the system and its components. In the case of a single running process as shown here, there might not be much to report. It could be as simple as “Ping!” “Yup!” That said, the standard query may also include possible alarm conditions, process counts, current operating parameters for dashboards, and more.
  • Report Historical Data: This could involve reporting a summary of events that have been logged since the last scan, over a defined time period, or matching specified criteria. The reply might be lengthy and involve multiple send operations, or may involve sending one or more files back in their entirety.
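Here’s the minimal dispatch sketch mentioned above, in Python. The “kind” field and the handler bodies are assumptions about the message format, not a prescription; the only point is that the remote process has one reliable way to tell the four message types apart and route each to its handler.

```python
# Sketch of a single-channel remote process telling the four message kinds apart.
# The "kind" field and the handler bodies are assumptions about the message format.
def handle_operation(msg): return {"status": "ok", "handled": msg.get("command")}
def handle_admin(msg):     return {"status": "ok", "action": msg.get("action")}
def handle_status(msg):    return {"status": "ok", "alarms": [], "process_count": 1}
def handle_history(msg):   return {"status": "ok", "entries": []}  # would read logs here

DISPATCH = {
    "operation": handle_operation,
    "admin": handle_admin,
    "status": handle_status,
    "history": handle_history,
}

def handle_incoming(msg: dict) -> dict:
    """Route the message to the right handler, or reply with an error the sender can act on."""
    handler = DISPATCH.get(msg.get("kind"))
    if handler is None:
        return {"status": "error", "detail": f"unknown message kind {msg.get('kind')!r}"}
    return handler(msg)

print(handle_incoming({"kind": "status"}))  # {'status': 'ok', 'alarms': [], 'process_count': 1}
```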

Some setups may involve a separate communication channel using a different protocol and supporting different functions. Some of this was covered above and yesterday.

Now let’s look at what can be sensed on the local and remote systems in some kind of logical order:

Condition | Current State Sensed | Historical State Sensed
No problems, everything working | Normal status returned, no errors reported | Normal logs written, no errors reported
Local process not running | Current status not accessible or not changing | Local log file has no entries for time period, or file missing
Local process detectable program error | Error condition reported | Error condition logged
Error writing to disk | Error condition detected and reported | Local log file has no entries for time period, or file corrupted or missing
Error packing outgoing message | Can be detected and reported normally | Can be detected and logged normally
Error connecting to remote system (not found / wrong address / can’t be reached, incorrect authentication, etc.) | Error from remote system received and reported (if it is running, else timeout on no connection made) | Error from remote system received and logged (if it is running, else timeout on no connection made)
Error sending to remote system (connection physically or logically down) | Error from remote system received and reported (if it is running, else timeout on no connection made) | Error from remote system received and logged (if it is running, else timeout on no connection made)
Remote system fails to receive message | Request timeout reported | Request timeout logged
Remote system error unpacking message | Error from remote system received and reported | Error from remote system received and logged
Remote system error validating message values | Error from remote system received and reported | Error from remote system received and logged
Remote system error packing reply values | Error from remote system received and reported (if it sends the relevant error) | Error from remote system received and logged (if it sends the relevant error)
Remote system error connecting | Assume this is obviated once the message channel is open | Assume this is obviated once the message channel is open
Remote system error sending reply | Request timeout reported | Request timeout logged
Remote system detectable program error | Error from remote system received and reported | Error from remote system received and logged
Remote system error writing to disk | Error from remote system received and reported (if it sends the relevant error) | Entries missing in remote log, or remote file corrupted or missing
Remote system not running (OS or host running) | Error from remote system received and reported if sent by host/OS, otherwise timeout | Error from remote system received and logged if sent by host/OS, otherwise timeout
Remote system not running (entire system down) | Report unable to connect or timeout on reply | Log unable to connect or timeout on reply
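To make the local side of that table concrete, here’s a sketch of how a few of those conditions might be distinguished in Python: a bad address, a refused connection, and a timeout all look different at the socket level. The host, port, timeout value, and condition labels are illustrative.

```python
# Sketch of how the local side might classify a few of the conditions above at the
# socket level. The host, port, timeout, and condition labels are illustrative.
import socket

def probe_remote(host: str, port: int, timeout: float = 2.0) -> str:
    """Map low-level connection outcomes onto table-style condition labels."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected; request current status next"
    except socket.gaierror:
        return "error connecting: not found / wrong address"
    except ConnectionRefusedError:
        return "host/OS reachable but remote process not running"
    except socket.timeout:
        return "timeout on no connection made (system down or unreachable)"
    except OSError as exc:
        return f"error connecting: {exc}"

print(probe_remote("localhost", 9))  # port 9 is usually closed, so expect "process not running"
```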

Returning to yesterday’s more complex case, if the remote system supports several independent processes and a separate monitoring process, then there are a few other cases to consider.

Condition | Current State Sensed | Historical State Sensed
Remote monitor working, individual remote processes generating errors or not running | Normal status returned, relevant process errors reported | Normal status returned, relevant process errors logged
Remote monitor not running, separate from communications | Normal status returned, report monitor counter or heartbeat not updating | Normal status returned, log monitor counter or heartbeat not updating
Remote monitor not running, with embedded communications | Error from remote system received and reported if sent by host/OS, otherwise timeout reported | Error from remote system received and logged if sent by host/OS, otherwise timeout logged
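The “counter or heartbeat not updating” check in the middle row is worth a tiny sketch. Assuming the monitor process writes a timestamp (or increments a counter) somewhere the communications side can read, staleness detection is just a comparison against a threshold; the threshold value below is invented for illustration.

```python
# Sketch of the "heartbeat not updating" check from the table. Assumes the monitor
# process writes a timestamp somewhere the communications side can read; the
# staleness threshold is an invented value.
import time

HEARTBEAT_STALE_AFTER = 10.0  # seconds

def monitor_status(last_heartbeat: float, now: float) -> str:
    """Report the monitor as down if its heartbeat timestamp has stopped advancing."""
    if now - last_heartbeat > HEARTBEAT_STALE_AFTER:
        return "monitor counter/heartbeat not updating"
    return "monitor healthy"

print(monitor_status(last_heartbeat=time.time() - 30, now=time.time()))  # stale -> monitor down
```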

A further complication arises if the remote system is actually a cluster of individual machines supporting a single, scalable process. It is assumed in this case that the cluster management mechanism and its interface allow interactions to proceed the same way as if the system were running on a single machine. Alternatively, the cluster manager will be capable of reporting specialized error messages conveying appropriate information.

Posted in Software | Tagged , , , , , | Leave a comment