Balanced Scorecard

I have never explicitly used this technique, but I have certainly examined problems, systems, and opportunities from all the perspectives contained within it. This is true of many techniques described in the BABOK. I often note that every formalized technique or analytical framework is just an organized way of getting you to do what you should be doing anyway, if you were particularly experienced, creative, or thorough. I further frequently note that all described techniques can be and are criticized. While these techniques can inform and clarify the thinking of an experienced practitioner, less experienced practitioners have to start somewhere, and this is an interesting technique for directing analysis to many important considerations.

As someone who has studied economics for much of my life, and especially in the last two decades, I appreciate that economics is the study of choices made under conditions of scarcity and that, contrary to many people’s impressions, those choices do not always involve money. It may be similarly tempting to evaluate all business (or, more generally, organizational) activities in terms of money, but the balanced scorecard technique forces the analyst to look at other areas. A monetary value can be placed on every activity, sure, but you have to drill down through many layers of cause and effect to see what those values might be.

The balanced scorecard explores a business or system across four dimensions:

  • Financial: Inspiring analysts to look at considerations other than finance does not mean that finance doesn’t need to be considered. After all, if an organization or a process continually loses money, it won’t be around for long. (Accounting, by the way, along with the recognition that one of the three functions of money is to serve as a unit of account, is one of the great discoveries of humankind.)
  • Learning and Growth: This encompasses employee training, corporate culture, and all manner of innovation and improvement. It can even involve training the customers. Improvements in financial performance can be traced back through all of these activities to see how they affect the final outcomes.
  • Business Process: This involves measuring the performance of people and processes internally, and customer satisfaction externally.
  • Customer: This dimension focuses on the customers’ needs and satisfaction.

There is some overlap between those areas, but the point is to get analysts to focus on each area specifically.

Analysis of each of those dimensions involves:

  • Objectives: What is the organization trying to accomplish?
  • Measures: How can the organization determine whether it is succeeding?
  • Targets: These must be expressed in terms of things that can be measured (in theory). TQM defines “quality” as adherence to requirements (i.e., a result either meets defined standards or it does not). This attempts to turn continuum problems into discrete (yes/no, pass/fail) problems.
  • Initiatives: What activities will the organization take to improve or maximize performance within each dimension?

It may be helpful to construct a table, but since multiple objectives, measures, targets, and initiatives can be defined for each dimension, that’s probably overkill.
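For anyone who does want to capture the results somewhere, even a simple nested structure can hold multiple entries per dimension without forcing everything into a rigid grid. Here is a minimal sketch; the dimension names come from the technique, while the example objectives, measures, targets, and initiatives are hypothetical placeholders.

```python
# A minimal balanced scorecard captured as a data structure. Example entries are hypothetical.
scorecard = {
    "Financial": [{
        "objective": "Reduce cost per order processed",
        "measure": "Average handling cost per order",
        "target": "Under $4.00 by Q4",
        "initiative": "Automate manual invoice matching",
    }],
    "Customer": [{
        "objective": "Improve customer satisfaction",
        "measure": "Quarterly satisfaction survey score",
        "target": "At least 4.2 out of 5",
        "initiative": "Add self-service order tracking",
    }],
    "Business Process": [{
        "objective": "Shorten order turnaround",
        "measure": "Median time from order receipt to shipment",
        "target": "Within 24 hours",
        "initiative": "Rebalance work across fulfillment stations",
    }],
    "Learning and Growth": [{
        "objective": "Broaden cross-training",
        "measure": "Share of staff certified on two or more stations",
        "target": "At least 75 percent",
        "initiative": "Monthly rotation and training program",
    }],
}

# Each dimension holds a list because multiple objectives, measures, targets, and
# initiatives can be defined per dimension, as noted above.
```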

Again, this isn’t a method I’ve ever explicitly used. As the BABOK notes, it does not take the place of other absolutely necessary forms of analysis. I believe its main utility is to inspire analysts to consider things from different points of view.

I once had the insight that most of the stuff in the training materials for my Scrum certifications was kind of beside the point, but that flipping through the three slim notebooks of course notes one morning every couple of weeks might help keep some ideas fresh in my mind. Reviewing the list of fifty BABOK techniques may serve a similar purpose.


Concept Modeling

A concept model, or conceptual model, is an abstract representation of an organization, process, system, or product. It relates the nouns and verbs and other categorizations within and between elements. It can take any arbitrary form, in contrast with the prescribed rules for formatting a mind map. It can include labels on the nodes and on the connections between them. It uses vocabulary germane to the industry, project, and engagement team.
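Since a concept model in this sense is essentially labeled nodes connected by labeled relationships in the domain’s own vocabulary, even a list of noun–verb–noun triples captures the idea. Here is a minimal sketch using hypothetical vocabulary loosely in the spirit of the border-crossing example discussed below.

```python
# A minimal concept model: domain nouns as nodes, domain verbs as labeled connections.
# The vocabulary here is hypothetical.
nodes = ["Commercial Vehicle", "Primary Booth", "Secondary Inspection", "Exit Gate"]

# (source noun, verb phrase on the connection, target noun)
connections = [
    ("Commercial Vehicle", "arrives at", "Primary Booth"),
    ("Primary Booth", "refers vehicle to", "Secondary Inspection"),
    ("Primary Booth", "releases vehicle to", "Exit Gate"),
    ("Secondary Inspection", "releases vehicle to", "Exit Gate"),
]

for source, verb, target in connections:
    print(f"{source} --[{verb}]--> {target}")
```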

I think of two possible definitions of what makes a concept model. In my framework, any abstract representation created during the conceptual model phase is a conceptual model. This is usually generated after the discovery operation. The other context, per the BABOK, is more general, and has to do with the makeup and contents of the model (which is usually a drawing or figure but should include all descriptive materials that communicate similar information).

I have often tended to construct these in the form of process diagrams, but they can also take the form of architecture diagrams, hierarchy diagrams, representational diagrams, and hybrids of these and others. Certain drawings tend to be produced during specific phases I define in the framework. Concept models, it should come as no surprise, are produced to reflect the As-Is state in the conceptual model phase. They can also be produced to represent the abstract To-Be state in the requirements phase. Model diagrams produced in the design phase tend to be more concrete in their more detailed description of the To-Be state. Diagrams from the implementation and test phases will be even more concrete, as they will reflect the As-Built state (which itself becomes the new As-Is state).

The foregoing discussion of abstractness and concreteness is a little different from what a concept model is meant to represent. Concept models are necessarily abstract in that they are not intended to show exact details of physical objects, even if they are otherwise drawn to scale. The figure below shows a simulation model built for a land border crossing. It is laid out on a scale drawing of the ground layout of the facility, so the correct distances and movement times are modeled. This particular diagram represents the As-Is state, but a To-Be diagram prepared for the design or implementation phase would look exactly the same. What makes it a concept model is that it shows the nouns and verbs of the customer’s process using its preferred vocabulary.

A diagram of this layout, by itself, would not be considered to be a concept model.

This model might be considered a little more conceptual as it shows less detail.

The next series of diagrams shows different conceptual representations of different systems.


Extended Engagement Life Cycle

When creating a new project, engagement, or product from scratch, we should think in terms of its full life cycle. For this reason I have drawn the diagram of my business analysis engagement framework with two extra cycles. One shows an extended period of operation and maintenance, where the originally delivered capability is exercised by the customer for its intended use. During this time the system may be maintained, modified, and updated. When the delivered capability as a whole is no longer useful, it is retired and replaced. This is shown as a separate and final phase of a long-term engagement.

Here are all the possible phases in the context of a full life cycle engagement, with the two additional phases added. The new phases occur after the initial engagement has been closed out, including final delivery, acceptance, and handover to the customer. Note that different internal and external vendors or consultants may serve on different engagement teams during the initial engagement phases and the extended support and retirement phases.

The Framework:
  • Project Planning
  • Intended Use
  • Assumptions, Capabilities, and Risks and Impacts
  • Conceptual Model (As-Is State)
  • Data Sources, Collection, and Conditioning
  • Requirements (To-Be State: Abstract)
    • Functional (What it Does)
    • Non-Functional (What it Is, plus Maintenance and Governance)
  • Design (To-Be State: Detailed)
  • Implementation
  • Test
    • Operation, Usability, and Outputs (Verification)
    • Outputs and Fitness for Purpose (Validation)
  • Acceptance (Accreditation)
  • Project Close
  • Operation and Maintenance
  • End-of-Life and Replacement

Here is the above list in the more stylized and streamlined form I show in the main figure(s).

The Framework: Simplified
  •   Intended Use
  •   Conceptual Model (As-Is State)
  •   Data Sources, Collection, and Conditioning
  •   Requirements (To-Be State: Abstract)
    • Functional (What it Does)
    • Non-Functional (What it Is, plus Maintenance and Governance)
  •   Design (To-Be State: Detailed)
  •   Implementation
  •   Test
    • Operation, Usability, and Outputs (Verification)
    • Outputs and Fitness for Purpose (Validation)
  •   Operation and Maintenance
  •   End-of-Life and Replacement

Discussion of these topics is always a bit vague in the BABOK, because the business analysis oeuvre explicitly differentiates itself from the project management oeuvre (see here for a discussion of the overlaps), and also because the BABOK tries not to be overly prescriptive. It gives you a whole bunch of techniques and contexts and ways of thinking about things, but it never says, “Do A, then do B, then do C.” My framework may appear to be at least a little bit prescriptive, but it isn’t really. It really just codifies language that is already in the BABOK, and it gives practitioners a better feel for what they’re doing and when and why. I discuss how the main phases all occur in different contexts and management approaches here. I touch on the same things in my standard presentation on my overall framework.

In keeping with this amorphousness and flexibility, I’ll point out that individual modifications to an existing, deployed capability are themselves independent engagements with all six of the standard phases. This is true even if there are many such modifications over a long period of time, and even if individual ones are small and informal enough not to require large-scale efforts in every phase. I include the figure below for illustration of the idea.

Indeed, the End-of-Life and Replacement phase is likely to be a full engagement on its own.
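Here is a minimal sketch of that full life cycle, treating the initial build, each later modification, and the eventual replacement as engagements that each run through the core phases. The phase names come from the simplified list above; the engagement names and loop are purely illustrative.

```python
# Core phases from the simplified framework list above.
CORE_PHASES = [
    "Intended Use",
    "Conceptual Model (As-Is State)",
    "Data Sources, Collection, and Conditioning",
    "Requirements (To-Be State: Abstract)",
    "Design (To-Be State: Detailed)",
    "Implementation",
    "Test",
]

def run_engagement(name):
    """Walk one engagement through the core phases (iteration within and between phases omitted)."""
    for phase in CORE_PHASES:
        print(f"{name}: {phase}")

# Full life cycle: the initial delivery, a series of modifications made during operation
# and maintenance, and finally an End-of-Life and Replacement engagement of its own.
run_engagement("Initial engagement")
for change in ["Modification 1", "Modification 2"]:   # arbitrarily many over the years
    run_engagement(change)
run_engagement("End-of-Life and Replacement")
```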


Survey or Questionnaire

Surveys and questionnaires are excellent tools for gathering data, opinions, needs, and observations from large groups of respondents in a relatively short time. They come in many forms, with individual questions being either open-ended, where respondents can provide any type of answer they want, or closed-ended, where respondents must choose from a fixed group of possible responses. The former may require significantly more effort and interpretation to process, while the latter are more amenable to automation.

I’ve incorporated surveys in my own work here and studied specific types of surveys while preparing for my Lean Six Sigma Black Belt exam. I discuss a particularly memorable decision that used survey techniques, at least in part, here. The treatment of this subject in the BABOK is outstanding, so I’m mostly going to paraphrase its descriptions.

The process for conducting surveys is outlined below; a small sketch of collating closed-ended responses follows the list.

  1. Prepare the survey or questionnaire and plan for how the data is going to be collected and processed.
    • Define the objective: Determine what information you hope to gather to support the decision(s) you are trying to make.
    • Define the target survey group: Identify the relevant audience to be queried. This may involve anything from a broad customer base to a narrow job function.
    • Choose the correct format: Determine the type of survey questions to ask to get the types of responses you need. You should consider the level of engagement of the audience you are polling. Highly engaged customers and employees may be willing to spend a lot of time and effort providing detailed and voluminous responses, while you may design a lighter and more streamlined process in hopes of gaining sufficient responses from less engaged participants.
    • Select a sample group if appropriate: You may want to or have to choose a subset of a larger group to survey. This may require statistical structuring across many demographics to keep the results from being skewed.
    • Select distribution and collection methods: Determine how the surveys will be sent and how the answers will be returned and processed.
    • Set the target response rate and response end time: Determine the minimum number of responses you need for the effort to be considered valid, and the time window within which responses will be accepted.
    • Determine if additional activities are needed to support the effort: Additional work may need to be done to design the questions or interpret the responses. This work may involve interviews and other techniques.
    • Write the questions: These will be based on the information you need and the decision processes you hope to support. The size, format, and complexity of the survey will be determined by the audience you plan to query and the information you hope to obtain.
    • Test the survey or questionnaire: This involves testing of the mechanics of distribution of questions and collection and processing of responses (verification), and also the methods of assessing the responses for correctness and applicability (validation).
  2. Distribute the survey or questionnaire while considering:
    • the urgency of obtaining the responses,
    • the level of security required, and
    • the location of the respondents.
  3. Gather the responses and document the results.
    • Collate the responses.
    • Summarize the results.
    • Review the details and identify any emerging themes.
    • Formulate categories for breaking down the data.
    • Arrange the results into actionable segments.
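As a small illustration of the planning and collation steps for a closed-ended question, the sketch below checks the response rate against a target and tallies the answers. The question, options, responses, and thresholds are all hypothetical.

```python
from collections import Counter

QUESTION = "How satisfied are you with the new intake process?"
OPTIONS = ["Very satisfied", "Satisfied", "Neutral", "Dissatisfied", "Very dissatisfied"]

# Hypothetical returns from a small distribution list.
responses = ["Satisfied", "Neutral", "Satisfied", "Very satisfied", "Dissatisfied", "Satisfied"]
surveys_sent = 20
target_response_rate = 0.25   # minimum share of responses for the effort to be considered valid

response_rate = len(responses) / surveys_sent
if response_rate < target_response_rate:
    print(f"Response rate {response_rate:.0%} is below the {target_response_rate:.0%} target.")

# Collate and summarize the closed-ended answers.
tally = Counter(responses)
for option in OPTIONS:
    count = tally.get(option, 0)
    print(f"{option}: {count} ({count / len(responses):.0%})")
```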

When writing about BABOK techniques on this blog I don’t generally go into their strengths and weaknesses. I figure if you understand the techniques well enough you should be able to reason through most working situations or potential questions on certification exams. However, there are some well known issues with this process that aren’t mentioned in the BABOK.

The first issue is selection bias in the respondents, which can take many forms. One example is that people may be more likely to offer responses (or reviews) when they are angry with a product or service. That might yield useful information about complaints, but it may not give an accurate reading of the overall level of satisfaction. Another example is that the nature of the survey may tend to elicit responses from people in excess of their proportion in the overall population. Many requests for responses published in magazines yielded notoriously skewed figures for the prevalence of certain behaviors. In addition, respondents may be inclined to stretch the truth by telling a tall tale or two. The method of polling may skew the results as well. Several studies over recent decades have identified potential issues with telephone canvassing, especially in advance of political elections. If certain demographics are less likely to be at home at certain hours, do not have landlines, or tend to screen calls or hang up on pollsters, the validity of such polls can be severely compromised.
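Here is a small numeric illustration of that self-selection effect; all of the numbers are invented. If dissatisfied customers are several times more likely to respond, the survey average drifts well away from the true average even though every individual answer is honest.

```python
import random
random.seed(1)

# "True" population: 80% satisfied (score 5), 20% dissatisfied (score 1).
population = [5] * 800 + [1] * 200
true_average = sum(population) / len(population)          # 4.2

# Dissatisfied customers are assumed to be four times more likely to respond.
def responds(score):
    return random.random() < (0.40 if score == 1 else 0.10)

respondents = [score for score in population if responds(score)]
survey_average = sum(respondents) / len(respondents)

print(f"True average satisfaction: {true_average:.2f}")
print(f"Surveyed average (skewed): {survey_average:.2f}")   # noticeably lower
```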

Another issue is that people may simply lie. This can happen when the results aren’t sufficiently confidential or when respondents don’t wish to seem mean, prejudiced, or otherwise unpleasant. It’s also possible when describing affinities for things people have no talent for. Respondents may enjoy the idea of singing popular music in a band, but it’s not going to matter much if they can’t carry a tune in a bucket. (I experienced a couple of these in a career assessment survey in high school. I don’t know if my responses skewed any wider results since the exercise was intended to illuminate our own interests and abilities, but if they were trying to do anything else it couldn’t have been good.)

A major problem with polling, especially in certain subjects, is that the questions may tend to lead subjects toward certain responses. This may not be purposeful, in which case testing, review, and revision should be applied to correct any deficiencies, but we are probably all aware of polls that are structured to drive public opinion rather than accurately reflect it.


Mind Mapping

A mind map is a particular type of diagram used for taking notes, organizing thoughts, and understanding hierarchies. Business analysts can use any type of diagram that aids understanding and communication between participants in any engagement. (My website is littered with different kinds of diagrams.)

Mind maps, however, are a very specific type of diagram. A mind map is essentially a tree diagram that tends to be laid out in a radial pattern. These diagrams can incorporate many elements and variations to enhance clarity and information content, including colors, images, shapes, text formats, line styles and thicknesses, and probably more. The Wikipedia article on the subject includes many examples, plus additional history and background. A salient feature of a mind map is that it cannot have any cross-links. Diagrams that include those (and other features) are often referred to as concept maps.
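The tree-only constraint is easy to see in a data structure: a nested representation has exactly one path from the central topic to any node, so cross-links simply cannot be expressed. Here is a minimal sketch with a hypothetical topic.

```python
# A mind map as a strict tree: a central topic with nested branches and no cross-links.
mind_map = {
    "Plan team offsite": {
        "Venue": {"Downtown hotel": {}, "Lakeside retreat": {}},
        "Agenda": {"Workshops": {}, "Team exercises": {}},
        "Budget": {"Travel": {}, "Catering": {}},
    }
}

def node_count(tree):
    """Count nodes in the nested-dict tree; each node appears exactly once."""
    return sum(1 + node_count(children) for children in tree.values())

# Because each node is reachable from exactly one parent, this structure cannot express
# a cross-link (say, from "Budget" to "Lakeside retreat"); a diagram that needs such
# links is closer to a concept map than a mind map.
print(node_count(mind_map))   # 10
```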

Numerous software applications for drawing and editing mind maps exist for a variety of environments and devices.

Many diagram types can be formatted as mind maps, as in the following example.

Fishbone diagram recast as a mind map:

I first encountered mind mapping techniques when I worked with a PhD principal investigator at one of the national labs around 2007 or so. As someone who prefers using unlined paper for taking notes, as a way to remove constraints on where and what type of information (text, diagrams, tables, relationship links) can be included, I was naturally curious about the idea. But I have never been sufficiently curious to actually use it.

Studies of the effectiveness of the technique seem to indicate that it doesn’t bring major benefits in certain note-taking situations, but any technique that works for a given person should be used. And, as the examples in the Wikipedia article demonstrate, they sure can make some pretty pictures!


Functional Decomposition

My first engineering job was as a process engineer in the paper industry, where I designed and analyzed large industrial systems that ran this…

through this…

to make this…

We can break the system down (or build it up) like this…

using components like this…

and this…

which can further be broken down like this…

…into as much detail as you’d like to get into.

When tracking items through an engagement…

they can be decomposed as requirements are further elaborated and defined.

Remember that RTMs must crosslink horizontally across phases and vertically to define the logical, hierarchical relationships of the solution elements.
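As a minimal sketch of that crosslinking, an RTM entry might carry both horizontal links (the same item traced across phases) and vertical links (its place in the decomposition hierarchy). The identifiers and artifacts below are hypothetical.

```python
# One hypothetical RTM entry with horizontal (cross-phase) and vertical (hierarchical) links.
rtm = {
    "REQ-012": {
        "parent": "REQ-001",                # vertical: decomposed from a higher-level requirement
        "children": ["REQ-012.1", "REQ-012.2"],
        "horizontal": {                      # horizontal: the same item traced across phases
            "intended_use": "Goal-003",
            "conceptual_model": "Process step 4.2",
            "requirement": "The system shall record arrival times per lane",
            "design": "DES-031 (arrival logging service)",
            "implementation": "module arrivals.py",
            "test": ["TC-118", "TC-119"],
        },
    },
}
```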

Here is a high level example of a hierarchical breakdown of a large system. Imagine this being rotated ninety degrees counter-clockwise and plotted vertically in the RTM shown above.

If a system can be described by known equations, each term can be analyzed by identifying every possible effect that could make any individual variable larger or smaller, while also considering terms that may drop out entirely. The first equation is explicit and formal while the latter serves as more of a mnemonic.
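Here is a minimal sketch of that kind of term-by-term examination, using a hypothetical throughput relationship rather than any equation from the figures: perturb each variable in turn and note whether it pushes the result up or down, which also flags terms whose effect is negligible.

```python
# Hypothetical relationship; the variables and values are illustrative only.
def throughput(units_per_cycle, cycles_per_hour, availability, yield_fraction):
    return units_per_cycle * cycles_per_hour * availability * yield_fraction

baseline = dict(units_per_cycle=10, cycles_per_hour=30, availability=0.92, yield_fraction=0.97)
base_value = throughput(**baseline)

for variable in baseline:
    bumped = dict(baseline, **{variable: baseline[variable] * 1.05})   # +5% on one variable
    change = throughput(**bumped) - base_value
    print(f"+5% {variable:<16} changes throughput by {change:+.1f} units/hour")
```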

This diagram shows the flow of calculations in a large spreadsheet, which is just another form of very long equation.

Behaviors and decisions can be analyzed down to very low levels.

Process models can be analyzed from the top down…

and from the bottom up. Multiple operations can take place within a location, station, or subprocess…

and those can be broken down in exacting detail.

Large systems can be broken down to understand contexts and details. Each element in the diagram below is its own, highly complex entity involving the work of multiple creators and integration of a myriad of materials and technologies.

Analyzing all aspects of a systemic capability, potentially across multiple products, can highlight commonalities and differences, and can help identify opportunities to plug gaps, regularize techniques, increase modularization, and so on.

This type of diagram is a common tool to perform root cause analysis. The categories of the “ribs” could be remembered using 5 Ms and an E.

Employing many different modes of decomposition gives many possible perspectives and insights. This helps ensure that analyses will be thorough and robust.

The BABOK identifies the following categorizations on the subject, many of which are discussed above. Please consult the relevant section of the BABOK for further details.

  1. Decomposition Objectives
    • Measuring and Managing
    • Designing
    • Analyzing
    • Estimating and Forecasting
    • Reusing
    • Optimization
    • Substitution
    • Encapsulation
  2. Subjects of Decomposition
    • Business Outcomes
    • Work to be Done
    • Business Process
    • Function
    • Business Unit
    • Solution Component
    • Activity
    • Products and Services
    • Decisions
  3. Level of Decomposition
    • Per the examples above, decomposition can continue down through as many levels as make sense for a given analysis.
  4. Representation of Decomposition Results
    • Tree diagrams
    • Nested diagrams
    • Use Case diagrams
    • Flow diagrams
    • State Transition diagrams
    • Cause-Effect diagrams
    • Decision Trees
    • Mind Maps
    • Component diagrams
    • Decision Model and Notation

Data Modeling

From the BABOK:

A data model describes the entities, classes or data objects relevant to a domain, the attributes that are used to describe them, and the relationships among them to provide a common set of semantics for analysis and implementation.

I’ve written about data in many contexts, but I usually start by pointing out that data is identified through the processes of discovery, which identifies the nouns and verbs of a process (the BABOK refers to these as entities), and data collection, which describes the adjectives and adverbs of a process (the BABOK refers to these as attributes). The BABOK further describes relationships or associations between entities and attributes (entities-attributes, entities-entities, attributes-attributes). Finally, this information is often represented in the form of diagrams.

Different types of data models are generated during different phases of an engagement (per my six-phase, iterative framework).

The conceptual data model is created during the conceptual model phase (oooh, there’s a shock!). This shows how the business thinks of its data, and these diagrams are produced as the result of the discovery and data collection processes mentioned above. This work may be folded into other phases if the engagement is meant to build something new, as opposed to modifying (or simulating) something that already exists.

The logical data model is typically developed during the requirements and design phases. This is an extension or abstraction of the conceptual data model that describes the relationships and rules for normalization that help govern and ensure the integrity of the data representation.

The physical data model is defined during the implementation phase. This shows how the data is physically and logically arranged in memory, files structures, databases, and so on.
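To make the three levels concrete, here is a minimal sketch of how the same hypothetical “Customer places Order” idea might be expressed at each level. The entities, attributes, and DDL are illustrative only and are not taken from any of the figures.

```python
# Conceptual: entities and a named relationship, no attributes yet.
conceptual = {
    "entities": ["Customer", "Order"],
    "relationships": [("Customer", "places", "Order")],
}

# Logical: attributes, keys, and cardinality added; still platform-independent.
logical = {
    "Customer": {"attributes": ["customer_id (PK)", "name", "email"]},
    "Order":    {"attributes": ["order_id (PK)", "customer_id (FK)", "order_date", "total"]},
    "cardinality": "Customer 1 : N Order",
}

# Physical: how it lands in a specific database (tables, types, constraints).
physical_ddl = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    order_date  TEXT,
    total       REAL
);
"""
```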

There are many ways to list and describe data in diagrams.

This diagram shows the nouns and some implied verbs of a system, sometimes using slightly different verbiage.

Here is a representation of the attributes associated with each identified entity.

Here is a simple representation of the physical location of data in an implemented system.

The header listing below shows a detailed description of the shared memory area from the diagram above.

Here is a more complicated and explicit representation of data in a database.

image linked from a paper on researchgate.net, may be subject to copyright

The BABOK describes two specific types of diagrams, an Entity-Relationship Diagram using Crow’s Foot notation, and a Class Diagram from UML. I recommend researching these two types of diagrams as questions about them may arise on the CBAP exam and other exams.


Prototyping

This BABOK technique involves creating something that allows investigation of one or more aspects of the solution being developed. This creation can be physical, in the case of mock-ups meant to illustrate a concept or explore ergonomics or test a subsystem or plan manufacturability, or abstract, in the case of diagrams or storyboards or process descriptions or user interface designs.

Prototypes can be produced as throw-aways, which means they are only temporary creations. How many ventures started from drawing on a napkin in a restaurant? They can be functional, which means they actually perform at least some aspect of the end solution. Many famous examples of these can be found in museums, up to complete experimental aircraft. A series of prototypes can be created as the proposed design evolves over multiple iterations. Think of all the prototypes made to test the thousands of items needed for the moon landings.

They can be used to demonstrate a proof of principle or proof of concept. These are created to explore the new applications of tools, technologies, discoveries, or arrangements. Sometimes the tools or technologies are being used for the first time by anyone, but usually they are just used by teams to test fitness for the present purpose, or to demonstrate that the team can use them. A coworker of mine built a mock-up of a novel walking mechanism for a steel reheat furnace — out of cardboard, pipe cleaners, and toothpicks. It allowed all viewers to quickly grasp its simplicity. Moving the walking carriage by hand inside the outer shell clearly demonstrated that a simple mechanical movement could robustly and reliably achieve the desired results in a harsh industrial environment.

Prototypes can be created to explore the usability of solutions by their intended customers, including of software GUIs and physical interfaces on things like consumer electronics. GUI mock-ups can be created with varying degrees of functionality and visual appeal using tools like rapid builders (Borland’s GUI tools were terrific for this), Balsamiq, Visio, or whiteboard drawings. The original computer mouse was made of wood.

Some prototypes test the visual aspects of a proposed solution, including color, arrangement, font size and shape, and other visual cues. Examples include signage, warning labels, product packaging, and industrial design. Some clever (read: cynical and slightly evil) manufacturers realized that making potato peelers with brown handles made users more likely to throw them away with their potato peels, so they would have to buy more!

Functional prototypes allow testing the operation of a proposed solution in whole or in part. Think of docking ports on spacecraft, or of computer algorithms.

Models and simulations are a potentially powerful, and also potentially complex form of prototyping. One of my past companies used a version of my suggestion for its advertising slogan: “We do it a thousand times so you do it right the first time.” These can involve 3D models, process models, visual models, and so on.

Prototypes can be used in every part of the engagement and product life cycle.


Scope Modeling

I think an argument can be made that this technique should more properly be called “scope determination” or “scope identification.” I also observe that determinations of scope happen during the conceptual modeling phase, when you discover what’s happening in an existing process, or during requirements and design, when defining aspects of the solution. However, since those can all be thought of as forms of modeling, maybe I can live with it.

I’ve discussed elements of scope determination previously (here and here). The BABOK goes into a lot of detail in its section on scope modeling, but in the end I think it really boils down to whether something is either in scope or out of scope of your work, or within one part or another of your work.

The BABOK mentions the contexts of control (who does or is responsible for what), need (who needs what), solution (what part of the system does what), and change (what does and does not change), and under relationships between logical components it lists Parent-Child or Composition Subset, Function-Responsibility, Supplier-Consumer, and Cause-Effect. I think of these, though, as points of interface that fall out of the normal course of work. Knowing where to draw the boundaries between different components, functions, subsystems, organizations, teams, microservices, and so on is a crucial part of understanding and architecting systems.
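One way to make that “points of interface” framing concrete is to mark components as in or out of scope, type the relationships between them, and list the relationships that cross the boundary. Here is a minimal sketch with hypothetical components.

```python
# Hypothetical components and relationships; the crossing relationships are the points of interface.
in_scope = {"Order Intake", "Fulfillment", "Billing"}
out_of_scope = {"Payment Gateway", "Carrier Network"}

relationships = [
    ("Order Intake", "Fulfillment", "supplier-consumer"),
    ("Fulfillment", "Carrier Network", "supplier-consumer"),
    ("Billing", "Payment Gateway", "function-responsibility"),
    ("Order Intake", "Billing", "cause-effect"),
]

# Relationships with exactly one end in scope cross the boundary and need explicit
# agreements (data formats, responsibilities, service levels).
points_of_interface = [
    r for r in relationships if (r[0] in in_scope) != (r[1] in in_scope)
]
print(points_of_interface)
```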

The BABOK also lists emergent as a type of relationship, to describe the possibility that unexpected behaviors can arise from the interactions within complex systems. While this is undoubtedly true, I don’t see that it has much to do with scope.

Another aspect of scope, in my experience, has to do with the phase of assumptions, capabilities, and risks and impacts, which implicitly proceeds in conjunction with the conceptual model, requirements, and design phases. And this primarily concerns assumptions. An effect or data source may be within the accepted scope, but there may be valid reasons why you need to simplify mechanisms or assume values.


Data Dictionary

Maintaining a dictionary of data items is a very good idea for engagements of sufficiently wide scope and involving enough participants. This greatly simplifies and clarifies communication and mutual understanding among all participants, and should fall out of the work naturally while continuously iterating within and between phases.

Data can be classified in many different ways, and all of them can be included in the description of each data item; a sketch of a single dictionary entry follows the two lists below.

The context of data (my own arbitrary term) describes the item’s conceptual place within a system. (I described this in a webinar.)

  • System Description: data that describes the physical or conceptual components of a process (tends to be low volume and describes mostly fixed characteristics)
  • Operating Data: data that describe the detailed behavior of the components of the system over time (tends to be high volume and analyzed statistically); these components include both subprocesses within the system and items that are processed by the system as they enter, move through or within, and exit the system.
  • Governing Parameters: thresholds for taking action (control setpoints, business rules, largely automated or automatable)
  • Generated Output: data produced by the system that guides business actions (KPIs, management dashboards, drives human-in-the-loop actions, not automatable)

The class of data (my own arbitrary term again) describes the nature of each data item.

  • Measure: A label for the value describing what it is or represents
  • Type of Data:
    • numeric: intensive value (temperature, velocity, rate, density – characteristic of material that doesn't depend on the amount present) vs. extensive value (quantity of energy, mass, count – characteristic of material that depends on amount present)
    • text or string value: names, addresses, descriptions, memos, IDs
    • enumerated types: color, classification, type
    • logical: yes/no, true/false
  • Continuous vs. Discrete: most numeric values are continuous but counting values, along with all non-numeric values, are discrete
  • Deterministic vs. Stochastic: values intended to represent specific states (possibly as a function of other values) vs. groups or ranges of values that represent possible random outcomes
  • Possible Range of Values: numeric ranges or defined enumerated values, along with format limitations (e.g., credit card numbers, phone numbers, postal addresses)
  • Goal Values: higher is better, lower is better, defined/nominal is better
  • Samples Required: the number of observations that should be made to obtain an accurate characterization of possible values or distributions
  • Source and Availability: where and whether the data can be obtained and whether assumptions may have to be made in its absence
  • Verification and Authority: how the data can be verified (for example, data items provided by approved individuals or organizations may be considered authoritative)
  • Relationship to Other Data Items: This involves situations where data items come in defined sets (from documents, database records, defined structures, and the like), and where there may be value dependencies between items.
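Pulling the context and class fields together, a single data dictionary entry might look like the following minimal sketch. The item, its values, and its related items are hypothetical.

```python
# One hypothetical data dictionary entry combining the "context" and "class" fields above.
data_dictionary = {
    "lane_processing_time": {
        "context": "Operating Data",
        "measure": "Time to process one vehicle at a primary booth, in seconds",
        "type": "numeric (intensive)",
        "continuous_or_discrete": "continuous",
        "deterministic_or_stochastic": "stochastic",
        "possible_range": "10 to 600 seconds",
        "goal": "lower is better",
        "samples_required": 300,
        "source_and_availability": "timestamp logs; assume a distribution if unavailable",
        "verification_and_authority": "cross-check against supervisor shift reports",
        "related_items": ["vehicle_type", "booth_id"],
    },
}
```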

It is also important to identify approaches for conditioning data. Complete data may not be available, and for good reason: keeping records is sometimes rightly judged to be less important than accomplishing other tasks. Here are some options for dealing with missing data (from the Udemy course R Programming: Advanced Analytics In R For Data Science by Kirill Eremenko); a short sketch of a few of them follows the list:

  • Predict with 100% accuracy from accompanying information or independent research.
  • Leave record as is, e.g., if data item is not needed or if analytical method takes this into account.
  • Remove record entirely.
  • Replace with mean or median.
  • Fill in by exploring correlations and similarities.
  • Introduce dummy variable for “missingness” and see if any insights can be gleaned from that subset.
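Here is a minimal sketch of a few of those options, assuming pandas is available; the column names and values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"shift": ["A", "B", "A", "C"], "units": [120, None, 95, None]})

# Option: replace missing values with the mean (or median) of the observed values.
df["units_mean_filled"] = df["units"].fillna(df["units"].mean())

# Option: introduce a dummy variable for "missingness" and examine that subset separately.
df["units_missing"] = df["units"].isna().astype(int)

# Option: remove the incomplete records entirely.
df_complete_only = df.dropna(subset=["units"])

print(df)
```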

More considerations for conditioning data include the following (a short sketch of the first two appears after the list):

  • Data from different sources may need to be regularized so they all have the same units, formats, and so on. (This is a big part of ETL efforts.) Note that an entire Mars probe was lost because two teams did not ensure the interface between two systems used consistent units.
  • Sanity checks should be performed for internal consistency (e.g., a month’s worth of hourly totals should match the total reported for the month).
  • Conversely, analysts should be aware that seasonality and similar effects mean subsets of larger collections of data may vary over time.
  • Data items should be reviewed to see if reporting methods or formats have changed over time.
  • Data sources should be documented for points of contact, frequency of issue, permissions and sign-offs, procedures for obtaining alternate data, and so on.
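Here is a minimal sketch of the first two considerations, unit regularization and a roll-up sanity check; the units, numbers, and tolerance are hypothetical.

```python
def to_metric_tons(value, unit):
    """Regularize masses reported in different units to metric tons."""
    factors = {"metric_tons": 1.0, "short_tons": 0.907185, "kilograms": 0.001}
    return value * factors[unit]

# Hypothetical hourly production figures reported in short tons by one source.
hourly_totals_tons = [to_metric_tons(v, "short_tons") for v in [12.0, 11.5, 12.3]]

reported_month_total_tons = 8950.0   # hypothetical figure from a monthly report
rollup = sum(hourly_totals_tons)     # in practice, a full month of hourly records

if abs(rollup - reported_month_total_tons) > 0.01 * reported_month_total_tons:
    print("Hourly totals do not reconcile with the reported monthly total; investigate.")
```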