Archive for the ‘Statistical Tools for Six Sigma’ Category

Kurtosis

Wednesday, February 23rd, 2011
[Figure: Old Timer Kurtosis Equation]

[Figure: Modern Equation for Kurtosis]

A student asked me whether the kurtosis of the normal distribution is 0 or 3. It seems that I'd said both at different times. Strangely, the answer is that the kurtosis of a normal distribution is sometimes 0 and sometimes 3. If you look at Wikipedia you see this old-timer equation for kurtosis, where μ4 is the fourth moment about the mean and σ is the standard deviation. As Wikipedia explains, this is sometimes used as the definition of kurtosis in older works. Wikipedia goes on to say that kurtosis is more commonly defined as the fourth cumulant divided by the square of the second cumulant. This equals the fourth moment about the mean divided by the square of the variance, minus 3, and is also known as excess kurtosis. The “minus 3” at the end of this formula is often explained as a correction to make the kurtosis of the normal distribution equal to zero.
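The two definitions can be written out as follows. This is a reconstruction from the definitions in the paragraph above, with μ4 the fourth moment about the mean, σ the standard deviation, and κ2 and κ4 the second and fourth cumulants; the labels β2 and γ2 are conventional names, not taken from the original post:

```latex
\beta_2 = \frac{\mu_4}{\sigma^4}
\qquad\qquad
\gamma_2 = \frac{\kappa_4}{\kappa_2^{2}} = \frac{\mu_4}{\sigma^4} - 3
```

For a normal distribution μ4 = 3σ^4, so β2 = 3 and γ2 = 0, which is where both answers to the student's question come from.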

Regardless of whether you use 0 or 3 as the number representing the kurtosis of the normal distribution, kurtosis is traditionally described as a measure of the peakedness or flatness of a distribution. A high-kurtosis distribution has a sharper peak and longer, fatter tails, while a low-kurtosis distribution has a more rounded peak and shorter, thinner tails. Distributions that are more peaked than the normal are called leptokurtic, while those that are flatter than the normal are called platykurtic. To help you remember the difference, think of the platypus, a flat critter if ever there was one: platykurtic. The kangaroo, which loves to leapto (groan), strikes a much more peaked pose!
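To see both conventions side by side, here is a minimal Python sketch that computes the old-timer kurtosis and the excess kurtosis from population (divide-by-n) moments. The random seed, the sample sizes, and the choice of a uniform sample (platykurtic) and a Laplace sample (leptokurtic) are assumptions for illustration only. The theoretical old-timer values are 3 for the normal, 1.8 for the uniform, and 6 for the Laplace, and the excess kurtosis is 3 lower in each case, so the simulated results should land near those numbers.

```python
import random

def kurtosis(data):
    """Return (old-timer kurtosis, excess kurtosis) using population moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment (variance)
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    beta2 = m4 / m2 ** 2                         # old-timer definition: normal -> 3
    return beta2, beta2 - 3                      # excess kurtosis: normal -> 0

rng = random.Random(1)
normal = [rng.gauss(0, 1) for _ in range(100_000)]
uniform = [rng.uniform(0, 1) for _ in range(100_000)]
# The difference of two independent exponentials follows a Laplace distribution
laplace = [rng.expovariate(1) - rng.expovariate(1) for _ in range(100_000)]

for name, sample in [("normal", normal), ("uniform", uniform), ("laplace", laplace)]:
    b2, g2 = kurtosis(sample)
    print(f"{name:8s} kurtosis = {b2:5.2f}   excess kurtosis = {g2:5.2f}")
```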

As for my answer to the student, it seems obvious (at least to me) that I used 0 back in the old days, and 3 now. Makes sense, right? Either that, or I made a mistake.

[Figure: Platypus, a platykurtic animal]

[Figure: Kangaroo, a leptokurtic animal]


Why I Hate Hypothesis Testing

Tuesday, February 8th, 2011

If it were up to me, statistical hypothesis inference testing would be replaced entirely by confidence intervals. Both methods provide exactly the same information; however:

  • Confidence intervals are graphical
  • Confidence intervals are able to compare more than two hypotheses
  • In any real-world situation the null hypothesis is always false. If your sample size is large enough you will always reject the null hypothesis. This often leads to positively silly behavior, such as many government regulations. Newspaper headlines are even sillier.
  • The null hypothesis (literally, the hypothesis that there is no difference) is boring and uninteresting. Hypothesis testing promotes poor science by encouraging researchers to run one experiment and compare the results to this boring alternative. It would generally be better practice to develop and compare several hypotheses with each other.
  • Confidence intervals are less confusing to students, lay persons, and quite frankly most statistics instructors
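The third bullet is easy to demonstrate. The Python sketch below assumes, hypothetically, a practically trivial true shift of 0.01 in a process with a standard deviation of 1, uses a simple normal-theory z test, and for simplicity treats the observed difference as equal to the true one. As the sample size grows, the p-value collapses toward zero, while the 95% confidence interval keeps showing that the shift is only about 0.01, which is what a decision maker actually needs to know.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))

true_difference = 0.01  # hypothetical, practically meaningless shift in the mean
sigma = 1.0             # hypothetical process standard deviation

for n in (100, 10_000, 1_000_000):
    se = sigma / math.sqrt(n)           # standard error of the mean difference
    z = true_difference / se
    lo, hi = true_difference - 1.96 * se, true_difference + 1.96 * se
    print(f"n={n:>9,}  z={z:6.2f}  p={two_sided_p(z):.4g}  95% CI=({lo:+.4f}, {hi:+.4f})")
```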

If you do some research you’ll find quite a body of literature complaining about the hypothesis testing approach. This is my small contribution to that cause.


Has the Process Mean Changed?

Tuesday, February 8th, 2011

Here’s an exercise from Pyzdek Institute Green Belt training. A pharmaceutical company has developed an IV drip device with an advertised drip rate of 5 drops per minute. A sample of 10 “drippers” is taken from the process and tested by counting the number of drops that occur during a 10-minute span. The average for each dripper is found by dividing the total drops by 10. The results (average drops per minute) are:

4.9
5.1
4.6
5.0
5.1
4.7
4.3
4.7
4.6
5.0

Use the t test to conduct a hypothesis test and answer this question at a 95% confidence level: “Is the process producing IV drip devices that average 5 drops per minute?” Then use a confidence interval to answer the same question.

This video shows a way to answer these questions using the QI Macros software.
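For readers without QI Macros, here is a minimal Python sketch of both approaches using only the standard library. The two-sided 95% critical value of t with 9 degrees of freedom (2.262) is hard-coded from a t table rather than computed, which is an assumption of this sketch.

```python
import math
import statistics

# Average drops per minute for the 10 sampled drippers (from the exercise)
drips = [4.9, 5.1, 4.6, 5.0, 5.1, 4.7, 4.3, 4.7, 4.6, 5.0]
target = 5.0    # advertised drip rate
t_crit = 2.262  # two-sided 95% critical value of t, 9 degrees of freedom (t table)

n = len(drips)
mean = statistics.mean(drips)
s = statistics.stdev(drips)  # sample standard deviation (n - 1 divisor)
se = s / math.sqrt(n)        # standard error of the mean

# One-sample t test of H0: mean = 5
t_stat = (mean - target) / se
# 95% confidence interval for the process mean
ci_low, ci_high = mean - t_crit * se, mean + t_crit * se

print(f"mean = {mean:.3f}  s = {s:.4f}  t = {t_stat:.3f}")
print(f"95% CI for the mean: ({ci_low:.3f}, {ci_high:.3f})")
print("Reject H0" if abs(t_stat) > t_crit else "Fail to reject H0")
print("5 is outside the CI" if not ci_low <= target <= ci_high else "5 is inside the CI")
```

With these data the mean is 4.80 and t is about -2.41, beyond the ±2.262 critical values, and the 95% interval runs from about 4.61 to 4.99, which excludes 5. The test and the interval reach the same conclusion, but the interval also shows how far off the process is.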


Non-normal capability and negative z scores

Thursday, October 28th, 2010

Here’s a question from a Pyzdek Institute Green Belt student:

Question: I’m currently looking at data for my Green Belt project. The primary metric is turn-around time, and the data are non-normal. I have run the Distribution ID plot and determined that the closest fits for my data are a Weibull distribution and a normal distribution with a Box-Cox transformation with a lambda of 0.15. Which do you believe would be better for calculating my process capability? The process is highly variable and is not currently monitored across the value stream, leading to huge variation. A USL of 100 days has been determined through VOC; using this, I get the following Z.Bench: -0.16 using the Box-Cox transformation and -0.18 using the Weibull. Could you please advise on the best course of action for determining the sigma level of this process (which is fully expected to be low due to the nature of how the process is administered)?

Answer: Any time you have a negative Z score, things are pretty bad. I presume the error rate is extremely high. Many Six Sigma experts suggest that you not bother with Z scores or sigma levels for extremely poor processes. In this case I’m okay with just knowing the historical error rate and not asking for the Z score or historical sigma level.
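To see how bad a negative Z.Bench is, it helps to convert it back into a defect fraction. The sketch below assumes Z.Bench is defined as in Minitab: the standard normal z value whose upper-tail area equals the total fraction beyond the specification limits, so P(defect) = 1 - Φ(Z.Bench). The helper name is hypothetical.

```python
from statistics import NormalDist

def defect_fraction_from_zbench(z_bench):
    """Fraction beyond spec implied by Z.Bench, assuming P(defect) = 1 - Phi(Z.Bench)."""
    return 1 - NormalDist().cdf(z_bench)

# Z.Bench values reported by the student
for label, z in [("Box-Cox", -0.16), ("Weibull", -0.18)]:
    frac = defect_fraction_from_zbench(z)
    print(f"{label}: Z.Bench = {z:+.2f} -> about {frac:.1%} beyond the USL")
```

Under that assumption, both Z.Bench values imply that roughly 56 to 57 percent of turnaround times exceed the 100-day USL, and any negative Z.Bench means more than half of the output is out of spec.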

As for whether the Weibull or Box-Cox is the better choice, it really doesn’t matter, although you might want to know the p-value for the fitted curve for each (i.e., is the lack of fit significant?). Both methods are examples of what is called “curve fitting.” Both are just trying to find equations for curves that do a decent job of describing the distribution of the data. In scientific and engineering work, equations are often based on an established understanding of how nature works, which is superior to mere curve fitting. In business we usually must rely on curve fitting to describe our processes and our data, but there’s no real reason to prefer one curve-fitting method over another as long as both fit the data well.
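The Box-Cox route can be sketched as follows: transform both the data and the USL, then measure the distance from the mean to the USL in standard deviations on the transformed scale. With only an upper specification limit, this one-sided Z is the quantity Z.Bench reports. The turnaround times below are hypothetical, and the helper names are illustrative rather than any software package's API.

```python
import math
import statistics

def box_cox(x, lam):
    """Box-Cox power transform of a positive value x (natural log when lam == 0)."""
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam

def z_upper_transformed(data, usl, lam):
    """Z distance from the mean to the USL, computed after transforming data and USL."""
    t = [box_cox(x, lam) for x in data]
    return (box_cox(usl, lam) - statistics.mean(t)) / statistics.stdev(t)

# Hypothetical turnaround times in days, for illustration only
times = [12, 35, 48, 60, 75, 90, 104, 120, 150, 210]
print(f"Z(USL) after Box-Cox, lambda = 0.15: {z_upper_transformed(times, 100, 0.15):.2f}")
```

Because Z is unchanged by any positive linear rescaling of the transformed values, using the plain power x**λ instead of (x**λ - 1)/λ gives the same answer.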


A capability index question

Thursday, October 21st, 2010

The following question came to me from a student in my online Six Sigma Black Belt course:

QUESTION: I am calculating Ppk’s for various processes and some values are, for example -1.2 and 1.2. Is it acceptable to take the absolute value for -1.2 and say that both process performances are equal even though the values are less than the desired value of 1.33? My second question is how to interpret various Ppk values of, say, 0.1 and 0.6? I would say the process is not capable of meeting the specification. Is it possible to differentiate in words small differences in Ppk values?

ANSWER

No, you can’t treat the absolute values as equivalent. Ppk = -1.2 indicates that the process average is outside one of the specification limits. Ppk = +1.2 indicates a process that is vastly better than Ppk = -1.2. Some texts assume that both (USL - x̄) and (x̄ - LSL) are positive; that assumption simply reflects the fact that if either one is not positive, the process is so poor that capability analysis serves no purpose.
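A minimal Python sketch of the usual formula, Ppk = min(USL - x̄, x̄ - LSL) / (3s), with s the overall sample standard deviation, shows why the sign matters. The specification limits and the two three-point data sets are hypothetical, built so the two processes have Ppk values of equal magnitude and opposite sign.

```python
import statistics

def ppk(data, lsl, usl):
    """Ppk = min(USL - mean, mean - LSL) / (3 * s), with s the overall sample std dev."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return min(usl - mean, mean - lsl) / (3 * s)

LSL, USL = 10, 20             # hypothetical specification limits
inside_specs = [17, 18, 19]   # hypothetical process: average 18, inside the specs
above_usl = [21, 22, 23]      # hypothetical process: average 22, above the USL

for name, data in [("inside specs", inside_specs), ("above USL", above_usl)]:
    print(f"{name}: Ppk = {ppk(data, LSL, USL):+.2f}")
```

Both values have the same magnitude, but in the second case the average sits beyond the USL, so most of that output would be nonconforming.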

As for comparing capability indexes that are relatively close to one another in magnitude, you need to be able to calculate the confidence intervals of the capability indexes to make the comparison. This article describes the calculations. It gets a little deep and we don’t require that Black Belts learn this. It’s more in the domain of Master Black Belts, Quality Engineers, and Statisticians.
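For readers curious about what that math looks like, one widely used large-sample approximation, due to Bissell and assuming normally distributed data, gives an approximate 100(1 - α)% confidence interval for Cpk from a sample of size n. It is offered only as an illustration and may not be the method used in the article mentioned above:

```latex
\hat{C}_{pk} \pm z_{1-\alpha/2}\,\sqrt{\frac{1}{9n} + \frac{\hat{C}_{pk}^{2}}{2(n-1)}}
```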

Rather than learning the advanced math for problems like this, I like to suggest that Black Belts (and MBBs, for that matter) use resampling. This involves setting up a spreadsheet with your raw data, calculating whatever statistic you’re interested in from the raw data, then randomly sampling the raw data with replacement and calculating the same statistic on the resampled data many times, say 100 or more. The spread of the statistic across the resamples gives an approximate confidence interval, and the estimate becomes more precise the more resamples you use. To use this approach with your problem, create a spreadsheet with the raw data for both processes and calculate the difference in the two Cpks (0.6 - 0.1 = 0.5). Then resample and calculate the difference for each resample 100 or more times. The percentage of resamples with a difference greater than 0 is an estimate of the confidence you have that there is a real difference between the two processes.
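The resampling recipe above can be sketched in Python instead of a spreadsheet. The raw data for both processes, the specification limits, the seeds, and the 1,000 resamples are hypothetical choices for illustration; Cpk is computed here with the overall sample standard deviation, as a simple spreadsheet would.

```python
import random
import statistics

def cpk(data, lsl, usl):
    """Cpk = min(USL - mean, mean - LSL) / (3 * s), using the sample standard deviation."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return min(usl - mean, mean - lsl) / (3 * s)

def resampled_confidence(a, b, lsl, usl, n_resamples=1000, seed=0):
    """Fraction of resamples in which Cpk(a) - Cpk(b) > 0."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_resamples):
        ra = rng.choices(a, k=len(a))  # resample process A with replacement
        rb = rng.choices(b, k=len(b))  # resample process B with replacement
        if cpk(ra, lsl, usl) - cpk(rb, lsl, usl) > 0:
            wins += 1
    return wins / n_resamples

# Hypothetical raw data and spec limits, for illustration only
gen = random.Random(42)
process_a = [gen.gauss(100, 10) for _ in range(50)]
process_b = [gen.gauss(115, 15) for _ in range(50)]
LSL, USL = 70, 130

diff = cpk(process_a, LSL, USL) - cpk(process_b, LSL, USL)
print(f"Observed difference in Cpk: {diff:.2f}")
print(f"Confidence that A is more capable: {resampled_confidence(process_a, process_b, LSL, USL):.0%}")
```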

Here’s a spreadsheet that uses resampling to estimate the confidence interval on Cpk. The pseudo-lower-confidence-limit on Cpk is in the Cpk Min cell; the upper limit is in the Cpk Max cell. I used a test data set with 50 values from a random normal universe (mean = 100, sigma = 10) and 100 resamples for these calculations. If you’re running Excel 2003, you can repeatedly press F9 to see how these limits change with a new set of resamples.
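A rough Python equivalent of that spreadsheet follows the same setup: 50 values from a normal universe with mean 100 and sigma 10, and 100 resamples, with Cpk Min and Cpk Max taken as the smallest and largest resampled Cpk. The specification limits are not stated in the post, so the ones below are hypothetical, and changing the seed plays the role of pressing F9.

```python
import random
import statistics

def cpk(data, lsl, usl):
    """Cpk = min(USL - mean, mean - LSL) / (3 * s), using the sample standard deviation."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return min(usl - mean, mean - lsl) / (3 * s)

rng = random.Random(2011)                        # change the seed to mimic pressing F9
data = [rng.gauss(100, 10) for _ in range(50)]   # test set: mean = 100, sigma = 10
LSL, USL = 70, 130                               # hypothetical spec limits

# 100 resamples of the raw data, drawn with replacement
resampled = [cpk(rng.choices(data, k=len(data)), LSL, USL) for _ in range(100)]
cpk_min, cpk_max = min(resampled), max(resampled)
print(f"Cpk = {cpk(data, LSL, USL):.2f}  Cpk Min = {cpk_min:.2f}  Cpk Max = {cpk_max:.2f}")
```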


How to Lie With Statistics-Schwab Chart

Sunday, May 30th, 2010

[Figure: Chart of Schwab fund's investment results]

Normally, when the bottom axis represents time periods, the more recent time periods are on the right side. Not so with this graphic of the Schwab U.S. Treasury Money Fund dated April 30, 2010: on this chart the most recent period is on the left, not the right. Nice try, Schwab!

When using graphics in quality and process improvement, care must be taken not to inadvertently “lie” with statistics. Sometimes even the best get it wrong.


Wikipedia Entry Quality Causes Assessed

Friday, March 5th, 2010

Two researchers at the University of Arizona performed a study to determine why some Wikipedia articles rate high in terms of quality while others score lower. Sudha Ram, a professor in UA’s Eller College of Management, and Jun Liu, a graduate student in the management information systems (MIS) department, found that entries on Wikipedia, the world’s largest open-access online encyclopedia, gain greater quality with contributions from people in many different roles. Their work in this area received a “Best Paper Award” at the Workshop on Information Technology and Systems, held in conjunction with the International Conference on Information Systems (ICIS).

Wikipedia has an internal quality rating system for entries, with featured articles at the top, followed by A, B, and C-level entries. Ram and Liu randomly collected 400 articles at each quality level and applied a data provenance model they developed in an earlier paper. “What was missing was an explanation for why some articles are of high quality and others are not,” Ram said. “We investigated the relationship between collaboration and data quality.”

To generate the best-quality entries, she says, people in many different roles must collaborate. Ram and Liu suggest that the results of this study should spark the design of software tools that can help improve quality. “A software tool could prompt contributors to justify their insertions by adding links,” she said, “and down the line, other software tools could encourage specific role setting and collaboration patterns to improve overall quality.”


Carbon Cycle Feedback Effect Adjusted Downward

Friday, January 29th, 2010

In a letter published in the journal Nature (Nature 463, 527-530 (28 January 2010)) entitled “Ensemble reconstruction constraints on the global carbon cycle sensitivity to climate,” the authors discuss the processes controlling the carbon flux and carbon storage of the atmosphere, ocean and terrestrial biosphere. These processes are likely to provide a positive feedback leading to amplified anthropogenic (i.e., human-caused) warming. But the magnitude of the climate sensitivity of the global carbon cycle, and thus of its positive feedback strength, is under debate, giving rise to large uncertainties in global warming projections. The paper describes a study designed to quantitatively estimate the feedback parameter, γ, from pre-industrial CO2 estimates derived from “proxies” such as ice cores.

The authors’ conclusion:

“We find that γ is about twice as likely to fall in the lowermost than in the uppermost quartile of their range. Our results are incompatibly lower (P < 0.05) than recent pre-industrial empirical estimates of ~40 p.p.m.v. CO2 per °C, and correspondingly suggest ~80% less potential amplification of ongoing global warming.” (italics added.)

In short, the carbon cycle feedback effect is weaker than climate researchers formerly thought. This will require a revision of the simulation models used to forecast climate change and will, in all likelihood, lower the projected impact of human activity on the climate. A reduction in amplification of around 80% could result in dramatically lower projected impact.

All models are wrong, but some models are useful. Corollary: apply models with care and always temper their interpretation with sound judgment.


A Question of Sampling

Friday, November 20th, 2009

A reader asks:

“I want to apply SPC methods to find out whether my production process is in control. If all the data available are batch-to-batch, is it rational to construct subgroups based on batch-to-batch data? What conclusions can I draw from batch-to-batch data? Any suggestions? Thank you very much.”

The answer is, maybe. I’d need a more complete description of your process so I can figure out what you mean. For example, I don’t know if your process is chemical, mechanical, or electrical. I don’t know if batches are arbitrarily created by filling a container from a larger container. Et cetera.

The guiding principle is called rational subgrouping. Your control chart should compare long-term variability against control limits based on short-term variability. The underlying premise is that in a stable process the long-term variability won’t exceed the short-term variability unless something substantial has changed in the process, i.e., a special cause. Usually this would mean basing your control limits on within-batch variation and plotting batch-to-batch results against these limits. However, for some processes this doesn’t work because there’s too little variation within a batch compared to between batches. For example, in a homogeneous chemical solution the within-batch variation may be minuscule. The solution in these cases is to use individuals control charts and base the control limits on moving ranges from subgroups formed by consecutive observations. And if your data are autocorrelated (i.e., observations taken at close to the same time are correlated), then the sampling interval of the individuals chart will need to be adjusted.
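The individuals-chart approach can be sketched in a few lines of Python. The batch results are hypothetical, and the 2.66 factor is the usual 3/d2 constant for moving ranges of two consecutive observations (d2 = 1.128). This sketch does not handle the autocorrelation adjustment mentioned above.

```python
def individuals_chart_limits(x):
    """Return (LCL, center line, UCL) for an individuals chart based on moving ranges."""
    moving_ranges = [abs(b - a) for a, b in zip(x, x[1:])]  # ranges of consecutive pairs
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    center = sum(x) / len(x)
    spread = 2.66 * mr_bar  # 2.66 = 3 / d2, with d2 = 1.128 for subgroups of size 2
    return center - spread, center, center + spread

# Hypothetical batch-to-batch results, in production order
batches = [10.2, 10.4, 10.1, 10.3, 10.2, 10.5, 10.3, 12.5, 10.2, 10.4]
lcl, cl, ucl = individuals_chart_limits(batches)
signals = [(i, v) for i, v in enumerate(batches, start=1) if not lcl <= v <= ucl]
print(f"LCL = {lcl:.2f}  CL = {cl:.2f}  UCL = {ucl:.2f}  out-of-limit batches: {signals}")
```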

Take a look at your process and see if this works for you.

PS: You may also wish to look at the article by John David Kendrick.


Jumping to Statistical Conclusions

Tuesday, September 8th, 2009

Have you attributed your results to the right base data?

It may come as a surprise that the biggest challenge facing black belts and master black belts is usually not in selecting the best statistical technique for analyzing a particular data set. Most statistical techniques work fairly well even if the underlying assumptions are not precisely correct. If a black belt supplements the numerical analysis with graphical evaluation, the chance of making grossly erroneous decisions is negligible.

A mistake that is far more serious, and also far more common, is comparing the results of a study to the wrong base data. These “apples to oranges” comparisons often lead to poor decisions and, worse still, to inaccurate beliefs that can derail faith in the Six Sigma approach itself. A recent incident with a client brought this point home for me.

The situation involved a project in the sales organization of a software company. The company had several sales teams and wanted to know if a new approach to closing the sale would improve the rate of closing sales. The company didn’t have a Six Sigma program, and the project was planned and carried out without black belts. The results were presented to management in a classic form: a bar chart (see Figure 1). The team had declared victory, and management, convinced by the “data,” prepared to revamp the sales training to incorporate the new approach companywide. All of the leaders looked forward to the bottom-line improvement they’d see from a 29-percent improvement in the sales closing rate.

Figure 1: Sales Closing Rate Improved by New Approach

All of the leaders, that is, except Lorraine. She’d received green belt training from her previous employer, and she’d seen enough black belt presentations to know that the analysis of the sales team was seriously flawed. It was undeniable that the project team’s sales close rate was 2.53 percentage points higher than the sales close rate for the rest of the sales department during the 16 weeks of the test, and, yes, those 2.53 points did represent a 29-percent improvement over the 8.83-percent rate for the rest of the department. Despite these “facts” and the air of scientific objectivity surrounding the analysis, Lorraine had many unanswered questions. She asked management to delay any decision until she could explore these questions with a Six Sigma consultant. That’s where things stood when I entered the picture.
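The arithmetic behind the 29-percent figure is easy to check, and it helps to keep percentage points (the absolute difference between two rates) separate from percent (the change relative to the baseline rate). A short Python check using the numbers quoted above:

```python
rest_rate = 8.83    # closing rate (%) for the rest of the sales department
gain_points = 2.53  # project team's advantage, in percentage points

team_rate = rest_rate + gain_points            # project team's closing rate (%)
relative_gain = gain_points / rest_rate * 100  # improvement relative to the baseline (%)
print(f"Team rate: {team_rate:.2f}%  absolute gain: {gain_points} points  "
      f"relative gain: {relative_gain:.0f}%")
```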

Table 1: Old vs. New Closing Rates

Lorraine viewed the analysis as important because it would demonstrate that the Six Sigma approach could be applied in this service company, something that skeptical managers didn’t believe. In a meeting with the sales team leader, I was presented with the data shown in Table 1. As often happens, this summary data was all that was available; for a variety of reasons (but chiefly due to a time constraint) the number of sales calls used to compute these rates could not be obtained.

If you are a black belt or master black belt, or just statistically inclined, please take a couple of minutes before reading the remainder of this column to think about the data and jot down how you’d proceed from here.

When dealing with the data in Table 1, it’s tempting to apply a statistical technique such as a paired t-test. Using Microsoft Excel, it’s a simple matter to compute the t-statistic, which is 4.55, a highly significant result. Statistical purists would ask whether the data are approximately normal, along with an endless variety of other technical questions about the data. I would argue, however, that all of this is premature and, ultimately, beside the point. The first order of business is to determine whether we are comparing apples to apples.
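For reference, the paired t computation is sketched below in Python with tiny hypothetical numbers, not the article's data. The column's point stands: a large t says nothing about whether the comparison group is the right one.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic: mean of the paired differences divided by its standard error."""
    diffs = [a - b for b, a in zip(before, after)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical weekly closing rates (%), paired by week, for illustration only
rest_of_dept = [8.0, 9.0, 8.5, 9.5]
project_team = [10.0, 11.5, 10.5, 12.0]
t = paired_t(rest_of_dept, project_team)
print(f"paired t = {t:.2f} with {len(rest_of_dept) - 1} degrees of freedom")
```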

Table 2: Apples-to-Apples Comparison

Further discussion revealed that the company had not two but nine sales teams, all of the same size. A further complication was that the teams sold different products. More probing uncovered the fact that four of the eight other teams sold a product mix similar to that of the team using the new closing method. At this point it appeared that, to make an apples-to-apples comparison, you would assess the results of these five teams for the 16-week project. Descriptive statistics are shown in Table 2.

Table 3: Data Groups

Further analysis using nonparametric methods indicated that there are three distinct groups in these data (see Table 3).
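The column doesn't say which nonparametric method was used. One common choice for comparing several teams is the Kruskal-Wallis test, sketched below in Python as an illustration only (an assumption, not necessarily the author's method). It computes the H statistic from pooled ranks, giving tied values their average rank, without a tie correction. H is compared with the chi-square critical value for k - 1 degrees of freedom, where k is the number of teams (about 9.49 for five teams at the 5% level). Kruskal-Wallis only says whether the teams differ; sorting them into distinct groups, as in Table 3, would need follow-up pairwise comparisons.

```python
from itertools import chain

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic from pooled ranks (average ranks for ties, no tie correction)."""
    pooled = sorted(chain.from_iterable(groups))
    n = len(pooled)
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1                            # extend the run of tied values
        rank_of[pooled[i]] = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        i = j + 1
    total = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

# Hypothetical weekly closing rates (%) for three teams, for illustration only
team_a = [8.1, 8.9, 9.4, 8.6, 9.0]
team_b = [8.8, 9.2, 8.7, 9.5, 9.1]
team_c = [11.2, 10.8, 11.9, 10.5, 11.4]
h = kruskal_wallis_h(team_a, team_b, team_c)
print(f"H = {h:.2f} (compare with 5.99, the 5% chi-square critical value for 2 df)")
```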

Table 3 presents a decidedly different picture than was originally given to management. The new closing method now appears to be no better than normal. Still, there are bright spots. Assuming that teams 5 and 8 aren’t oranges being compared to apples, potential gains should be possible from discovering why team 5 performs under the norm, and why team 8 outperforms the norm. More information might also be obtained by plotting the 16 weeks over time to identify trends and other patterns. Using the Six Sigma approach, the information can be converted to knowledge, the knowledge to action, and the action to an improved bottom line. It’s more work than the old standby, the bar chart, but it’s worth it.

The complete data file used in this article is posted at www.pyzdek.com/2000-05.xls . The challenge is to analyze the data in a number of different ways to determine how the different analyses would affect management decisions. Send your results to me for inclusion in a future column.
