A LinkedIn discussion started by Tham Nguyen Khoa asks:
Why [are] control limits on control chart are [sic] drawn at 3s?
Control limits on a control chart are commonly drawn at 3s from the center line because 3-sigma limits are a good balance point between two types of errors:
Type I or alpha errors occur when a point falls outside the control limits even though no special cause is operating. The result is a witch-hunt for special causes and adjustment of things here and there. The tampering usually distorts a stable process as well as wasting time and energy.
Type II or beta errors occur when you miss a special cause because the chart isn’t sensitive enough to detect it. In this case, you will go along unaware that the problem exists and thus unable to root it out.
Are there any more reasons?
The discussion goes on at great length (48 comments at the time of writing), but I’ll just post my comment here:
Things like type I and type II errors apply to enumerative statistics. Control charts are analytic statistical tools, so these terms do not apply here. Type I and Type II errors can be stated with precision because, as enumerative statistics, inferences based on them apply to a static population. Analytic statistics, in contrast, are used to make inferences about the future performance of a dynamic process. Errors related to inferences about the future can never be precisely calculated.
That being said, it is certainly true that tampering occurs when a process that is not being influenced by special causes of variation is changed as if it were, and that tampering makes matters worse. When we want to determine if a special cause is present in a process, we make use of data to help us decide. No matter what the data show, there is always a chance that we mistakenly conclude that a special cause exists (or doesn’t exist). It’s obvious that the further a data point is from the “norm,” the smaller the probability that we’ll mistakenly conclude that a special cause is present. Shewhart did not base control limits on precise calculations of Type I or Type II error. He based them on the fact that, in practice, engineers at Western Electric were able to easily identify the special cause of variation when observations fell 3 or more sigma from the long-term mean. They were more challenged to find a special cause for observations closer to the mean.
Think about it like this: if you created a list of everything that caused a process to change even a small amount, you would have a very, very long list. You could never pin down the one big thing from this long list, because there is no one big thing. But if you ask for a list of everything that caused a process to change a lot, say by 3 sigma, that list would be relatively short. In between these two extremes are changes of intermediate magnitude and lists that vary between the long “any change list” and the short “3-sigma change list.” Just where to draw the line depends on a large number of things, such as the cost of checking out the possible causes on the list, the cost of missing something, the frequency with which changes of a given magnitude occur, etc. As a default starting point we can use 3-sigma to trigger our special cause search, if for no other reason than this has worked pretty well for 93 years. But that doesn’t mean that it should be accepted as dogma. What we are solving for are lines (control limits) that minimize total costs. In the end, it’s a management decision, hopefully one that’s based on facts and data.
A student of mine had numerous questions about the various statistics used in Six Sigma. Here is my response to him in an open email:
The questions you are asking regarding “Where do these statistics come from?” require entire courses in statistics to answer. In Lean Six Sigma we take information from a dozen or so statistics courses, project management courses, psychology courses, business courses, mathematics courses, etc. and put it into an action framework that can be used to make fast improvements. We probably present less than 10% of the information you would receive if you sat through all of these courses, but we do so in less than 5% of the time it would take to complete all of these courses. It’s a tradeoff. We make the greatest compromises in the field of statistics. We discuss the use and interpretation of a select subset of statistics, and answer the question “where do these statistics come from?” by saying “they come from computer software.” While most are satisfied with this answer, some find the answer to be most unsatisfying. Judging from your questions, I suspect you are in the latter group.
Assuming you don’t have the time or the desire to take all of the courses relating to the Lean Six Sigma body of knowledge, but still seek answers about the specific statistics you asked about, I recommend the E-Handbook of Statistical Methods. This reference source is free and very comprehensive. It’s easy to search and will give you the answers you seek. For example, I searched on the term sum of squares, which you asked about, and the search returned pages on the half-normal probability plot (your question about fig. 10.26), 1-way ANOVA (several of your questions were about these calculations), and several other related topics. A search on ss interaction provides answers to your question about the calculation of this intermediate statistic.
Sorry I can’t address all of your questions via email, but perhaps the reference above will start you on your way to answers. I had the same questions when I started learning about quality improvement nearly 45 years ago, and I am still looking for answers to questions today. Have fun!
Tom Pyzdek
I held a Webinar for Pyzdek Institute students entitled Statistical Surprises and Absurdities. Topics discussed included sampling bias, misused and misleading averages, distorting results by use of selective data weighting, selective reporting, missing information, distorted graphics, Say What? and So What? statistics, and much more! Here’s the recording
Here’s a link to the slides presented in the webinar.
The Pyzdek Institute has announced that it is giving away a complete Statistics course with registration for any of its Six Sigma or Lean Six Sigma Green Belt or Black Belt training courses. The statistics course, which includes 4 DVDs and two follow-along printed guides, consists of 24 lectures of 30 minutes each. Part 1 (12 lectures) covers all of the subjects commonly included in a college introductory statistics course. Part 2 (12 lectures) explores a wide variety of applications of statistical methods. These challenging yet accessible lectures assume no background in mathematics beyond basic algebra. While most introductory college statistics courses stress technical problem solving and plugging data into formulae, this course focuses on the logical foundations and underlying strategies of statistical reasoning, illustrated with plenty of examples. Professor Michael Starbird walks you through the most important equations, but his emphasis is on the role of statistics in daily life, giving you a broad overview of how statistical tools are employed in risk assessment, college admissions, drug testing, fraud investigation, and a host of other applications.
This offer is good only while supplies last. Click here to register or to get additional details.
The Laney p’ Control Chart is an exciting innovation in statistical process control (SPC). The classic control charts for attributes data (p-charts, u-charts, etc.) are based on assumptions about the underlying distribution of their data (binomial or Poisson). Inherent in those assumptions is the further assumption that the “parameter” (mean) of the distribution is constant over time. In real applications, this is not always true (some days it rains and some days it does not). This is especially noticeable when the subgroup sizes are very large. Until now, the solution has been to treat the observations as variables in an individuals chart. Unfortunately, this produces flat control limits even if the subgroup sizes vary. David B. Laney developed an innovative approach to this situation which has come to be known as the Laney p’ chart (p-prime chart). It is a universal technique that is applicable whether the parameter is stable or not.
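To make the idea concrete, here is a minimal sketch of how p’ limits are usually computed in the published formulation of the method (the article itself does not spell out the arithmetic, so treat the steps as my summary, not as David’s own code). The function name, variable names, and the sample counts are made up for illustration. Each subgroup proportion is converted to a z-score using the ordinary binomial sigma, the variation of those z-scores is estimated from their average moving range, and the usual p-chart limits are then widened or narrowed by that factor.

```python
import math
from statistics import mean

def laney_p_prime_limits(defectives, sizes):
    """Return (p_bar, sigma_z, limits) for a Laney p' chart.

    limits is a list of (lower, upper) pairs, one per subgroup.
    """
    if len(defectives) != len(sizes) or len(sizes) < 2:
        raise ValueError("need two or more subgroups of matching length")
    p_bar = sum(defectives) / sum(sizes)
    if not 0.0 < p_bar < 1.0:
        raise ValueError("overall proportion must be strictly between 0 and 1")

    # Binomial (within-subgroup) sigma for each subgroup size.
    sigma_p = [math.sqrt(p_bar * (1.0 - p_bar) / n) for n in sizes]
    # Standardize each subgroup proportion.
    z = [(d / n - p_bar) / s for d, n, s in zip(defectives, sizes, sigma_p)]
    # Estimate the sigma of the z-scores from their average moving range
    # (1.128 is the d2 constant for ranges of two values).
    moving_ranges = [abs(b - a) for a, b in zip(z, z[1:])]
    sigma_z = mean(moving_ranges) / 1.128

    # Limits vary with subgroup size and scale with sigma_z.
    limits = [
        (max(0.0, p_bar - 3.0 * s * sigma_z), min(1.0, p_bar + 3.0 * s * sigma_z))
        for s in sigma_p
    ]
    return p_bar, sigma_z, limits

# Hypothetical daily counts with large, unequal subgroup sizes.
defectives = [520, 480, 610, 450, 700]
sizes = [10000, 9000, 12000, 9500, 11000]
p_bar, sigma_z, limits = laney_p_prime_limits(defectives, sizes)
print(round(p_bar, 4), round(sigma_z, 2))
for lower, upper in limits:
    print(round(lower, 4), round(upper, 4))
```

When sigma_z is close to 1, the result is essentially an ordinary p-chart. A value well above 1 indicates more subgroup-to-subgroup variation than the binomial model allows, and the limits widen accordingly while still varying with subgroup size.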

David B. Laney
David B. Laney worked for 33 years at BellSouth as Director of Statistical Methodology. He is a pioneer at BellSouth in TQM, DOE, and Six Sigma. David’s p-prime chart is an innovation that is being used in a wide variety of areas. It is now included in many statistical applications, such as Minitab and SigmaXL. David is enjoying retirement with his family in the Birmingham, Alabama area.
Session #1, 1:00 PM Eastern Time. Click here to register.
Session #2, 7:00 PM Eastern Time. Click here to register.
Click here to view a video recording of David’s webinar.
Click here to access the NIST/SEMATECH e-Handbook of Statistical Methods. NIST is an agency of the US Department of Commerce, so this work was undertaken at public expense. It covers literally every statistical tool used in Lean Six Sigma, and many, many more. It includes hundreds of case studies and examples. Best of all, it’s free! Enjoy!
In the movie “The Graduate,” the new graduate is told by a would-be mentor to remember only one word as he heads out into the world: Plastics. Times have changed. Hal Varian, the chief economist at Google, says, “I keep saying that the sexy job in the next 10 years will be statisticians. And I’m not kidding.” Statistical methods are being used by a larger cross-section of people in a wider variety of industries than ever before. There are numerous reasons for this. Nearly everyone has what was once considered to be a supercomputer sitting on their desktop. Powerful statistical software is widely available, including popular packages like Minitab, JMP, SAS and SPSS, and extremely powerful free software. Oracle’s Crystal Ball software makes it possible to create a statistical distribution for any cell in a spreadsheet, making statistical simulation a snap. While becoming more sophisticated, the software is also becoming easier to use. Output is increasingly graphical and easier to explain to laypersons. The number of people trained in Lean Six Sigma methods is growing rapidly. There is an enormous amount of data saved in public and corporate data warehouses. The list goes on and on.
It isn’t necessary that all items on the list be checked off, but the list is useful in evaluating whether an activity qualifies as Statistical Engineering or if it’s merely another clever use of statistics. The important thing isn’t the label we apply, but the improvement that can be achieved by properly using statistical methods along with science and technology to achieve a challenging goal.
I’m an advocate of using the I-chart as the default control chart. If I am teaching statistical process control (SPC) and can only teach one chart, the I-chart is always the one that I teach. It’s the only control chart I cover in my Lean Six Sigma Green Belt training. It’s the only chart that I teach in Process Excellence Leadership training. It’s the only chart I use if the data I’m looking at are reasonably close to symmetric (note that I didn’t say “normal”), unless I have some compelling need for greater sensitivity. I teach that the I-chart is the “Swiss army knife” of control charts.
But I still sometimes use other control charts.
Organizations don’t do SPC for the fun of it. They do it because it helps them achieve their goals. Organizations exist to produce things of value for the benefit of customers, investors, and employees. They do this by transforming inputs into outputs of higher value via processes. They can do this better if they minimize variability of outcomes, which can best be accomplished by controlling the sources of variation in the inputs and processes. This is where SPC comes in. SPC is a methodology that uses statistical guidelines to help separate “special cause” and “common cause” variation. If a special cause of variation exists, it signals the need to act. Special cause variation is defined as a change of such a large magnitude that its cause can probably be identified if looked for at once. SPC operationally defines such a change as a measurement result more than 3 standard deviations from the process mean for whatever process metric is being monitored.
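For readers who want to see the 3-standard-deviation rule in action, here is a minimal sketch of I-chart limits. It assumes the conventional approach of estimating sigma from the average moving range divided by 1.128, which the text above does not state explicitly. The function names and the data are made up for illustration.

```python
from statistics import mean

def i_chart_limits(values):
    """Return (lower, center, upper) limits for an individuals chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    center = mean(values)
    # Moving ranges: absolute differences between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    # Average moving range / 1.128 estimates sigma (d2 for ranges of two).
    sigma_hat = mean(moving_ranges) / 1.128
    return center - 3.0 * sigma_hat, center, center + 3.0 * sigma_hat

def out_of_control_points(values):
    """Return the indexes of observations beyond the 3-sigma limits."""
    lower, _, upper = i_chart_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical measurements; the ninth value is deliberately unusual.
data = [10.2, 9.8, 10.1, 10.0, 9.9, 10.3, 9.7, 10.1, 13.5, 10.0]
print(i_chart_limits(data))
print(out_of_control_points(data))
```

A point flagged by out_of_control_points is only a signal to go looking for a cause. The chart does not say what the cause is.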
A problem might exist if the process generates measurements that are highly skewed, even when it is not being influenced by special causes of variations. Such processes are quite common in the real world. For example, nearly all measurements produced by geometric dimensioning and tolerancing are skewed, as are measurements of time-based phenomena such as those encountered in services industries including the healthcare and hospitality industries. Highly skewed distributions produce a relatively high percentage of results more than 3 standard deviations from the mean even if no special causes exist. In other words, they produce many “false alarms” that will trigger a search for a problem when there is no problem. The false alarms may even lead to tampering, thereby causing a stable process to become unstable.
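To put a rough number on the false alarm problem, the sketch below compares the chance of a point falling more than 3 standard deviations from the mean for a normal distribution and for an exponential distribution. The exponential is my own stand-in for a highly skewed but stable process, and the calculation uses the true mean and standard deviation instead of values estimated from data, so treat it as an illustration only. The function names are hypothetical.

```python
import math

def normal_rate_beyond_3_sigma():
    """Two-sided P(|Z| > 3) for a normal distribution."""
    return math.erfc(3.0 / math.sqrt(2.0))

def exponential_rate_beyond_3_sigma():
    """P(X outside mean +/- 3 sd) for an exponential distribution.

    With rate 1 the mean and sd are both 1, so the limits are -2 and 4.
    The variable cannot be negative, so only the upper tail contributes.
    """
    return math.exp(-4.0)

normal_rate = normal_rate_beyond_3_sigma()
skewed_rate = exponential_rate_beyond_3_sigma()
print(f"normal:      {normal_rate:.4%}")
print(f"exponential: {skewed_rate:.4%}")
print(f"ratio:       {skewed_rate / normal_rate:.1f}")
```

Under these assumptions the skewed process signals several times more often than a normal one would, even though nothing special is happening.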
The skewed distribution problem is exacerbated by using I-charts. I-charts are relatively insensitive to moderate departures from normality, and very insensitive if the non-normality still produces a symmetric distribution. But for the data described above, this is not the case. If you use the I-chart for these data you will experience many false alarms. It’s just that simple.
The problem is to determine if a process is or is not being influenced by special causes of variation. A process distribution might appear as skewed because of special cause outliers, or because it naturally produces skewed data. The I-chart treats all data beyond 3 sigma as outliers; it doesn’t help you separate the natural, common cause process outcomes from special cause outcomes. Is the point beyond 3 sigma an outlying chicken, or a common cause egg? I.e., is the process being influenced by special causes, or only common causes? If the process data are naturally skewed you can’t answer this question using an I-chart.
The solution that I recommend is to begin your investigation with averages charts, also known as x-bar charts. Averages tend to have distributions that are approximately normal, even if the individual values are skewed. This means that, for a process with a skewed distribution that is not influenced by special causes, subgroup averages are much more likely than individual values to stay within 3 standard deviations of their mean. It’s the best of both worlds: few false alarms, but still sensitive to special causes. If you have a nice run of subgroup averages without a special cause, plot a histogram of the data and see if the distribution looks skewed or symmetric. If the latter, you can use I-charts with confidence. If the former, stick with averages charts, or find a statistician or Master Black Belt to help you find a more advanced solution.
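Here is a small sketch of why averaging helps. It again assumes an exponential distribution as a stand-in for a stable but skewed process, and it uses the true mean and standard deviation of the averages instead of limits estimated from subgroup ranges, so the numbers are illustrative only. The sum of n such values follows an Erlang distribution, whose tail probability has a simple closed form. The function names are hypothetical.

```python
import math

def erlang_sf(s, n):
    """P(S > s), where S is the sum of n independent Exp(1) values."""
    if s <= 0:
        return 1.0
    return math.exp(-s) * sum(s**k / math.factorial(k) for k in range(n))

def false_alarm_rate(n):
    """P(average of n Exp(1) values falls outside its mean +/- 3 sd).

    The average has mean 1 and sd 1/sqrt(n); multiplying through by n
    puts the limits on the scale of the sum: n +/- 3*sqrt(n).
    """
    upper = n + 3.0 * math.sqrt(n)
    lower = n - 3.0 * math.sqrt(n)
    above = erlang_sf(upper, n)
    below = max(0.0, 1.0 - erlang_sf(lower, n))  # zero when lower <= 0
    return above + below

for n in (1, 2, 5, 10):
    print(n, f"{false_alarm_rate(n):.4%}")
```

With n = 1 this is the I-chart case. The rate falls as the subgroup size grows, although for a distribution this skewed it stays above the normal-theory figure for small subgroups, which is one reason to look at the histogram before switching back to I-charts.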
Before ending this article, I’d like to address another pet peeve of mine. I believe that too many teachers of SPC obsess on the need for normality. They confuse normality with the absence of special causes, also known as statistical control. I usually attribute this misunderstanding to a lack of experience with the real world, where normal distributions are so rare as to be virtually non-existent. By insisting on normality we encourage tampering and all of the problems associated with this approach to “process management.”
On the other hand, I am also impatient with people who insist that all non-normality be ignored. These individuals advocate using I-charts in all situations, regardless of the risk of false alarms. This attitude may also be due to a lack of experience. However, I’ve seen SPC lose its credibility when concerned process owners look for special causes over and over again without finding them. Like the boy who cried “Wolf!”, out-of-control signals become something to ignore. Eventually so does SPC.
My approach, which favors the I-chart but doesn’t make its use dogma, provides a rational middle ground.