New-Age Five Questions >> Find the Time That is Money by Asking Duration Questions
By Richard G. Lamb, PE, CPA, ICBB; Analytics4Strategy.com
Any operational process is a network of stages such as shown in Figure 1. An action item enters each stage and remains for some duration and then exits when the action is completed. The collective durations across the stages have a direct connection to the firm’s reported earnings and return on investment (ROI).
It is also noteworthy that the optimization of process stages has been largely left untouched by operational excellence projects. The reason is that the practices for duration analytics and management have not yet been brought into the practices of process improvement. Traditional practices to assess stages cannot give us the insight to truly work the problem. They report backlog, average time in the stage and aging. The response to the findings has been limited to removing the laggards from the backlog until some average time is reached. That is just not nearly good enough!
This article explains how to ask and answer duration questions along the stages of an operational process. Of course, the first question is which of the typically many stages most touch the firm’s financials. Once identified, they are explored for redesign and control. This article will explain how the exploration is done.
There are three facets of exploration. The first is to view and evaluate the “baseline shape” of duration and exiting events in a stage. The second is to identify the process variables that explain the baseline shape. The third is to look at duration with respect to multiple exiting events.
The questioning of process stages is done with analytics variously called duration, event history, and survival and hazards. Regardless of name, the article will begin by giving its readers the fundamental necessary grasp of what is actually being modeled. It will then explain the three facets of exploration.
However, as always, the explanation we all want in the end is how to conduct the exploration. The article “DMAIC Done the New-Age Way” explains how the duration questioning of this article, one of five core types of questioning (relationship, difference, time series, duration and apparency), is woven into the stages to define, measure, analyze, improve and control processes. Although presented in the context of DMAIC, the explanation is universal to any problem-solving sequence.
How It Works
Asking and answering duration questions of process stages is done with regression. However, regressions for duration have a special nature in order to deal with the reality of stages.
The outcome of some action items is not yet known. Others may be removed for a myriad of reasons rather than one. We may lose track of some others.
Furthermore, the outcome variable of duration is not a number or probability. It is a construction of enter, exit and event.
The menu of regression models for duration is non-parametric and parametric. “Non-parametric” means that no particular probability distribution is required to make the fit. All models are available in the software “R” (https://www.r-project.org/).
The non-parametric models serve most purposes. In fact, they are used to confirm the fit of the parametric models. They are the Cox Regression (coxreg), Cox Proportional Hazard (coxph), Cox Mixed Effects (coxme), and Cumulative Incidence (cminc) models.
The core parametric model is the Proportional Hazard Regression (phreg) model. The phreg model allows us to fit the event history to Weibull, Lognormal, Loglogistic, Extreme Value, Gompertz and Piecewise Constant Hazard distributions. The Piecewise Constant Hazard distribution tends to work when the other distributions cannot be made to fit. All are validated against the non-parametric models for fit by overlaying the respective parametric cumulative hazard plot on the non-parametric plot.
All of the models conduct the same calculation and the best fitting one is selected from amongst them. Accordingly, the calculation and associated evaluations will be the subject of the article rather than the individual models.
The first calculation is to determine what the chance (hazard) is of an exit (event) once an item has remained in the stage a length of time such as days. At each event occasion, the number of occurring events is divided by the number of action items in the stage “just” before the event occasion. The second calculation is simply a running total of the first calculation. The third calculation rolls the first calculation over to compute the probability of action items remaining in the stage until “just” before each event occasion. The full suite of calculations is demonstrated in Figures 3 and 4.
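In symbols, the three calculations can be sketched as follows. The notation is an assumption added here, not the article's: t_i is an event occasion, d_i is the number of events at that occasion, and n_i is the number of action items in the stage just before it. These are the standard Nelson-Aalen and Kaplan-Meier forms, which appear to match the description above.

```latex
\begin{aligned}
h(t_i) &= \frac{d_i}{n_i}
  && \text{(hazard: chance of exit at occasion } t_i\text{)} \\
H(t)   &= \sum_{t_i \le t} h(t_i)
  && \text{(cumulative hazard: running total of the hazards)} \\
S(t)   &= \prod_{t_i \le t} \bigl(1 - h(t_i)\bigr)
  && \text{(survival: probability of remaining in the stage beyond } t\text{)}
\end{aligned}
```

Taking the product over t_i < t instead gives the probability of remaining until “just” before an event occasion.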
However, before the calculations are executed by modeling, the improvement team must make a two-dimensional decision. As shown in Figure 2, the team must establish the age and calendar window for its analysis. Both are set as is relevant to the exploration at hand. For example, we may want to exclude rush cases from the study.
In the figure, the box represents two explorations made at different calendar time spans. Each starts and ends with a date. Ages of the items included in the studies are 0 to 40 days for the first study and 6 to 40 days for the second.
In both cases we see action items that entered the stage before the study window, but had not yet exited the stage when the study began. We see others that entered the study window after the study began.
In the second of the windows, some items entered the stage during the study, but were not included in the study because they had an exit event before the age set at six days. In contrast, they would have been included in the first study.
With respect to the windows, the figure shows that some items had an exit event (+). Others did not because they did not exit the stage before the study ended. They are called censored (c). Action items that left the study for a reason other than the study’s exit event are also marked as censored. The solid lines in Figure 2 are the response variable to the duration regression: enter age, exit age and either “+” or “c.”
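The windowing described above can be sketched in a few lines. The models named in this article live in R. Plain Python is used here only to make the bookkeeping visible. The function name, the field names, the "served" label for the study's exit event and the dates are all hypothetical. Ages are counted in whole days.

```python
from datetime import date


def to_response(entered, exited, reason, win_start, win_end,
                min_age=0, max_age=40, exit_event="served"):
    """Return (enter_age, exit_age, status) for one action item, or None
    if the item is never observed inside the age and calendar window.

    status is "+" for the study's exit event and "c" for censored.
    exited is None when the item is still in the stage.
    """
    # Observation starts at the later of the window start and the minimum age.
    enter_age = max((win_start - entered).days, min_age, 0)

    # Observation ends at the earliest of: the item's exit, the window end,
    # or the maximum age admitted to the study.
    age_at_exit = (exited - entered).days if exited is not None else None
    candidates = [(win_end - entered).days, max_age]
    if age_at_exit is not None:
        candidates.append(age_at_exit)
    exit_age = min(candidates)

    if exit_age <= enter_age:
        return None  # exited before it came under observation, or entered too late

    # An event counts only if the item's own exit ended the observation
    # and the exit was the study's exit event; everything else is censored.
    is_event = (age_at_exit is not None
                and age_at_exit == exit_age
                and reason == exit_event)
    return (enter_age, exit_age, "+" if is_event else "c")


if __name__ == "__main__":
    start, end = date(2024, 1, 1), date(2024, 1, 31)
    # Entered before the window opened, served inside it.
    print(to_response(date(2023, 12, 22), date(2024, 1, 11), "served", start, end))
    # Still in the stage when the study ended: censored.
    print(to_response(date(2024, 1, 21), None, None, start, end))
    # Served at age 3: included at min_age=0, excluded at min_age=6.
    print(to_response(date(2024, 1, 10), date(2024, 1, 13), "served", start, end))
    print(to_response(date(2024, 1, 10), date(2024, 1, 13), "served", start, end,
                      min_age=6))
```

The last two calls mirror the two studies in the figure: the same item is kept when ages start at 0 and dropped when ages start at six days.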
Figure 3 depicts the next two steps that take place in the model. The left side shows the duration of a hypothetical set of five items with three events that occurred in a study’s frame. Shown are the calculations of hazard (chance) for the event occasions at days 1, 4 and 6.
The right side of Figure 3 shows what is called the hazard function—we could call it the “chance” function. The individual calculated hazards roll over to the hazard function as shown. Now we see a plot of the probability of an event at each event occasion. It is the first of three functions.
Next, behind the curtain of the model, the cumulative hazard and survival functions are constructed from the hazard function. Figure 4-left shows that the individual hazards are added to form the cumulative hazard function. Figure 4-right shows that the individual hazards also roll over to form the survival function. The calculations are shown for each function.
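The three functions can be reproduced by hand. Below is a minimal sketch in plain Python, assuming each action item is recorded as (enter age, exit age, status). The five records are hypothetical and are not the data behind Figures 3 and 4; they only share the pattern of five items with events at days 1, 4 and 6. Survival here is the probability of remaining beyond each event occasion.

```python
def duration_functions(records):
    """records: list of (enter_age, exit_age, status), status "+" or "c".

    Returns one row per event occasion:
    (age, at_risk, events, hazard, cumulative_hazard, survival).
    """
    event_ages = sorted({exit_age for _, exit_age, status in records
                         if status == "+"})
    rows = []
    cumulative_hazard = 0.0
    survival = 1.0
    for age in event_ages:
        # Items in the stage "just" before this occasion.
        at_risk = sum(1 for enter, exit_age, _ in records
                      if enter < age <= exit_age)
        events = sum(1 for _, exit_age, status in records
                     if exit_age == age and status == "+")
        hazard = events / at_risk          # first calculation
        cumulative_hazard += hazard        # second: running total
        survival *= 1.0 - hazard           # third: roll the hazards over
        rows.append((age, at_risk, events, hazard, cumulative_hazard, survival))
    return rows


if __name__ == "__main__":
    # Hypothetical stage: events at days 1, 4 and 6; two censored items.
    items = [(0, 1, "+"), (0, 3, "c"), (0, 4, "+"), (0, 6, "+"), (0, 8, "c")]
    for age, n, d, h, big_h, s in duration_functions(items):
        print(f"day {age}: at risk {n}, events {d}, "
              f"hazard {h:.3f}, cum. hazard {big_h:.3f}, survival {s:.3f}")
```

Note how the item censored at day 3 still counts in the denominator at day 1 but not at day 4. That is how censored items contribute without being treated as events.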
These are the principles of calculation behind the duration questions. They operate behind the curtains of every one of the previously listed models. If you understand the principles, then you are in a position to ask and answer duration questions of the stages of an operational process.
The As-Is and To-Be Baseline Shapes
Now we understand the three functions and the source individual durations upon which they are built. The process improvement team can evaluate the percent of items that remain in the subject stage for some duration and the chance of exiting the stage “just” after the duration.
The insight allows us to ask and answer questions directed at the “as-is” and “to-be” baseline shapes of the three functions. To jump the gap, the firm will explore and redesign the operational processes that are expressed in the shapes.
Figure 5 shows what are called the baseline shapes of the survival, hazard and cumulative hazard functions. The baselines for the as-is and to-be shapes are generated just as explained in the previous section.
Our first questioning is directed at what the as-is shapes tell us. We see a stage in which virtually 100 percent of the action items exit by day 30 to 35. The chance of exit increases slowly with time, but sharply after 30 days. However, leading up to the sharp change, the slower but apparent rate of change seen in the survival and cumulative hazard plots indicates randomness that is contradictory to a first-in, first-served process.
This insight into the as-is raises questions about what the shape of the curves should be. Do we want the currently experienced duration until almost 100 percent exit to be our case? Is the duration of 30 to 35 days acceptable if the action items in the stage can actually be served in about 25 days?
The next case-specific question is whether the downward shape of the survival plot is what we want. Instead, do we want to improve the operation so that it much more strongly indicates a first-in, first-served operation?
A third issue appears in the two hazard functions. Can we accept that the oldest action items seem to become stagnant rather than increasingly more likely to be served? The as-is shows that the chance of exiting turns down rather than upward; that is, it increases at a decreasing rate.
Although not the case of Figure 5, there could be another observed pattern. Some items may be given a “rush” treatment upon entering the stage. That would appear as a spike in the first several days of the hazard function. A spike followed by a steady decline in the survival plot may show that the stage is being gamed rather than following the operational rules of engagement.
Ultimately, the team will define the to-be shape and, in turn, improve the operational processes as needed to make it happen. Periodically the analyses will be done as a control to assure that the process improvements actually work and are being complied with.
The to-be shapes are overlaid on the as-is plots of Figure 5. The overlays depict a first-in, first-served operation in which rushed and expedited action items are strongly controlled.
The to-be survival plot has a very small decline until late in the stage. At that time, the chance of exit events increases at an increasing rate. Almost 100 percent of the items will not remain in the stage beyond 25 days.
How the Variables Play
The previous section focused on interpreting the shapes of the baseline functions. The purpose of the insight is to specify the “to-be” shape of the baseline with respect to the impact of the stage on the firm’s financials.
The natural next step is to redesign the process stage. Now we need to know which variables across the system of operational processes are significantly related to the chance of stage events. In other words, which are related to the event calculations shown in Figure 3-left?
Within the significant variables, where are the splits between survival functions, that is, multiple distributions? Are any of the variables interrelated? Do any of the categorical variables have dependence within their levels while being independent across variables?
We make the determinations by regression modeling. Figure 6 shows the case of exploring a single significant categorical variable with two levels.
To readily reveal what the figure shows, we would build a model one variable at a time. The message of the graph is that Grade is significant to the event calculation. The Analyst II grade has a shorter survival function than Analyst I—serves and completes the action items in queue sooner. In technical-speak, the survival functions have different probability distributions.
Notice the trend between the two plots. A separation begins and grows starting at 0 until approximately 28 days of duration. Accordingly, we would want to explore the respective grades for opportunities to reshape the baseline.
What is seen in the plots is confirmed with the regression model. A snippet of the regression report is included in the figure. It shows that the variable is significant to the stage’s operational duration. There is a significantly different relative chance of an event with respect to the baseline. This is verified by the p-value (less than 0.025). The magnitude of the difference is quantified by exp(coef). The “Score (logrank) Test” reports a significant difference between the survival functions of each grade.
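To see what a log-rank comparison of two survival functions does, the classical two-group test can be worked by hand. The sketch below is plain Python and is not the computation behind the regression report in Figure 6. It assumes every item is observed from age 0 (no delayed entry), and the durations are hypothetical. The p-value comes from the chi-square distribution with one degree of freedom.

```python
import math


def logrank_two_groups(group_a, group_b):
    """Classical two-group log-rank test.

    Each group is a list of (duration, status), status "+" or "c".
    Returns (chi_square, p_value) on one degree of freedom.
    """
    pooled = ([(d, s, 0) for d, s in group_a]
              + [(d, s, 1) for d, s in group_b])
    event_times = sorted({d for d, s, _ in pooled if s == "+"})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Items still in the stage "just" before t, by group.
        n_a = sum(1 for d, _, g in pooled if g == 0 and d >= t)
        n_b = sum(1 for d, _, g in pooled if g == 1 and d >= t)
        d_a = sum(1 for d, s, g in pooled if g == 0 and d == t and s == "+")
        d_b = sum(1 for d, s, g in pooled if g == 1 and d == t and s == "+")
        n, d_all = n_a + n_b, d_a + d_b

        observed_a += d_a
        # Events expected in group A if both groups shared one hazard.
        expected_a += d_all * n_a / n
        if n > 1:
            variance += (d_all * (n_a / n) * (n_b / n)
                         * (n - d_all) / (n - 1))

    if variance == 0.0:
        return 0.0, 1.0
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


if __name__ == "__main__":
    # Hypothetical durations in days for two analyst grades.
    analyst_ii = [(d, "+") for d in range(1, 9)]
    analyst_i = [(d, "+") for d in range(21, 29)]
    chi_square, p_value = logrank_two_groups(analyst_ii, analyst_i)
    print(f"chi-square = {chi_square:.2f}, p-value = {p_value:.5f}")
```

A small p-value says the two groups do not share one survival function. The test gives no magnitude; in the regression report that is the role of exp(coef).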
Figure 7 shows what the alternative looks like. It is a categorical variable with three levels. Notice that the levels overlay closely until about day 20 and only loosely thereafter.
The snippet from the regression report confirms the visual assessment. It shows that the variable is not significant to the stage’s duration. Therefore, the variable is not a candidate through which to reshape the baseline functions.
The comparisons of Departments B and C to Department A have p-values too large to indicate a significantly different relative chance of an event with respect to the baseline. Meanwhile, the p-value of the Score (logrank) Test is too large to indicate survival functions with different probability distributions.
We have shown the graphic analysis of categorical variables to test for significance and the splits for multiple survival distributions. What if we attached a continuous variable formed with variables from the human resources system to explore if there is a relationship between analyst time-in-grade and duration?
To answer the question, we would still read the p-value for relative chance as the measure of significance to duration in the stage. However, if we wish to explore for the splits to multiple distributions, we must transform the continuous variable to be a categorical variable. This is done by cutting the range of the continuous variable into logical distribution groups and then subjecting them to the same analysis as we did for grade and department.
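Cutting a continuous variable into groups is a small step, sketched here in plain Python. The cut points and labels for time-in-grade are hypothetical; in practice the team would choose them from what is logical for the variable.

```python
from bisect import bisect_right


def cut_into_groups(value, cut_points, labels):
    """Map a continuous value to a group label.

    cut_points must be ascending; labels needs one more entry than
    cut_points. A value equal to a cut point falls in the upper group.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("labels must have one more entry than cut_points")
    return labels[bisect_right(cut_points, value)]


if __name__ == "__main__":
    # Hypothetical time-in-grade in months, cut at 1 and 3 years.
    cuts = [12, 36]
    names = ["under 1 year", "1 to 3 years", "over 3 years"]
    for months in (5, 12, 40):
        print(months, "->", cut_into_groups(months, cuts, names))
```

Each action item then carries the group label as a categorical variable, and the groups can be compared in the same way as grade and department.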
Ultimately, all variables would be combined in a single model. The objective is to test and prune variables for significance in the context of all legitimate candidates.
This also allows us to explore whether there are interactions between the variables. Between variables, are there different relative chances of an event along and at the levels of the significant variables? If so, the team will explore the interactions for insight and design boundaries.
A final exploration may be in order. Are the relative chances of an event actually independent or dependent with respect to the levels of one or more of the process variables? An example is events that are not independent of department.
This question is answered by trial-and-error comparison of “battling” regression models. The fit with the Cox Mixed Effects model is built one scenario at a time and compared to a parallel non-mixed-effect model. Whether one fit is better or the two are equal tells the story.
If mixed effects are determined, the baseline can be stratified on the distinction. For example, stratify on departments and evaluate each separately.
Competing Events in the Stage
So far, we have gained insight with respect to stages with a single exiting event. However, it is conceivable that there are stages with more than one exiting event. Alternatively, we may want to evaluate for departures from exit policies.
Now we are probing what is called competing events. If one event occurs, none of the others can occur.
Figure 8 is such a case. There are two competing events—first-in, first-served (FIFS) and expedited. The events are also grouped by department.
Consequently, there are newly possible questions to ask of a process stage. Is there a significant number of expedited cases as compared to the mandated first-in, first-served policy? Is the level of expedited cases acceptable for each department? Is there a pattern of rush exits in the first two days? Are the first-in, first-served cases exiting in a timely manner?
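Questions such as these are answered from cumulative incidence curves, one per competing event. The sketch below computes them by hand in plain Python using the standard Aalen-Johansen form. This is an assumption about what a cumulative incidence model reports, not a reproduction of Figure 8. The records and the labels "fifs" and "expedited" are hypothetical, and every item is assumed to be observed from age 0.

```python
def cumulative_incidence(records, event_types):
    """records: list of (duration, outcome); outcome is one of
    event_types or "c" for censored.

    Returns a list of (time, {event_type: cumulative incidence}).
    """
    times = sorted({d for d, outcome in records if outcome in event_types})
    still_in_stage = 1.0   # probability of no exit of any type so far
    running = {k: 0.0 for k in event_types}
    rows = []
    for t in times:
        at_risk = sum(1 for d, _ in records if d >= t)
        exits_now = 0
        for k in event_types:
            d_k = sum(1 for d, outcome in records if d == t and outcome == k)
            # Chance of reaching t with no exit, then exiting by event k.
            running[k] += still_in_stage * d_k / at_risk
            exits_now += d_k
        still_in_stage *= 1.0 - exits_now / at_risk
        rows.append((t, dict(running)))
    return rows


if __name__ == "__main__":
    # Hypothetical stage with one expedited exit and one censored item.
    stage = [(2, "expedited"), (5, "fifs"), (7, "c"), (9, "fifs"), (12, "fifs")]
    for t, incidence in cumulative_incidence(stage, ["fifs", "expedited"]):
        print(f"day {t}: " + ", ".join(f"{k} {v:.2f}" for k, v in incidence.items()))
```

Because an exit by one event removes the item from the chance of any other, the curves add up to the share of items that have exited by any route. Running the function separately on each department's records gives the per-department comparison.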
From the answers, we may want to go down the trail already explained for single-event stages. We can look into the plots singly or in combinations around single events. Thence, the team can assess the curves as baselines, multiple survival distributions, interactions and mixed effects.
We now know there is an analytic for gaining deep insight out of asking and answering duration questions of the stages along any operational process. Imagine how your firm’s earnings and ROI can be affected when the power of insight is targeted on the stages throughout an operational process that are the critical or constraint paths in operational performance.
Sources for self-directed learning: Discovering Statistics Using R, Field and Miles, 2012 | Multilevel Modeling Using R, Holmes, 2014 | Machine Learning with R, Lantz, 2015 | ggplot2, Elegant Graphics for Data Analysis, Wickham, 2016 | Introductory Time Series with R, Cowpertwait and Metcalfe, 2009 | Event History Analysis with R, Broström, 2012 | Package “tsoutliers,” Javier López-de-Lacalle, 2017