Probability analysis is what is used to obtain levels of confidence. I already showed the probability analysis. Statistical significance testing is a probability analysis like that for the coin problems. It uses the experimental and control data to construct a model that stands in for the population probability distribution (which is unknown) in a probability calculation. When you do a t test, you are calculating the probability of obtaining a single value from a control population. If that probability is low, like the probability of getting a tails from a >95%-heads coin, then you assign a 95% confidence to the result.
To see this it is useful to consider a limiting case: a t test done when the number of controls is very large.
Let Ns be the number of experiments that got the treatment and Np be the number of control experiments. The null hypothesis in this case is that the treatment had no effect. That is, the treatment sample is hypothesized to be equivalent to a random sample of Ns lots drawn from the control population. One might proceed by doing a t test with assumed equal variance.
In this case you calculate t as follows:
- t = (SA – CA)/Se (1)
Here SA is the sample (treatment) average, CA is the control average, and Se is the standard error of the difference between the two averages.
Se is given by:
- Se = Spool (1/Ns + 1/Np)^0.5 (2)
Here Spool is the pooled standard deviation of the controls (Sp) and treatments (Ss):
- Spool = {((Ns-1)*Ss^2 + (Np-1)*Sp^2)/(Ns + Np - 2)}^0.5 (3)
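The t calculation above can be sketched in pure Python; the function name and the sample data in the usage line are illustrative, not from any real experiment:

```python
import math
import statistics

def pooled_t(sample, control):
    """Equal-variance two-sample t statistic, following the formulas above."""
    n_s, n_p = len(sample), len(control)
    sa, ca = statistics.mean(sample), statistics.mean(control)
    ss, sp = statistics.stdev(sample), statistics.stdev(control)  # N - 1 dof each
    # Pooled standard deviation of treatments and controls
    spool = math.sqrt(((n_s - 1) * ss**2 + (n_p - 1) * sp**2) / (n_s + n_p - 2))
    # Standard error of the difference between the two averages
    se = spool * math.sqrt(1 / n_s + 1 / n_p)
    return (sa - ca) / se

print(pooled_t([11.0, 12.0, 13.0], [10.0, 11.0, 12.0]))  # about 1.22
```

This is the same pooled (equal-variance) statistic computed by standard library routines such as scipy.stats.ttest_ind with equal_var=True.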
For Np >> Ns, (Ns-1)*Ss^2 << (Np-1)*Sp^2, so the first term can be neglected and equation 3 can be approximated as
- Spool = {(Np-1)*Sp^2/(Ns + Np - 2)}^0.5 (4)
Similarly, for very large Np, both (Np - 1) and (Ns + Np - 2) can be approximated as Np, so equation 4 can be written as
- Spool = (Np*Sp^2/Np)^0.5 = Sp (5)
Substituting 5 into 2 gives
- Se = Sp (1/Ns + 1/Np)^0.5 (6)
For very large Np, 1/Np is negligible compared to 1/Ns and 6 can be written as:
- Se = Sp/Ns^0.5 (7)
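The limit can be checked numerically. The treatment size, treatment standard deviation, and control standard deviation below are illustrative assumptions chosen only to show the convergence:

```python
import math

n_s, ss = 15, 2.1   # assumed treatment sample size and standard deviation
sp = 2.0            # assumed control standard deviation

for n_p in (30, 300, 30000):
    # Full pooled standard error versus the large-Np limit derived above
    spool = math.sqrt(((n_s - 1) * ss**2 + (n_p - 1) * sp**2) / (n_s + n_p - 2))
    se_full = spool * math.sqrt(1 / n_s + 1 / n_p)
    se_limit = sp / math.sqrt(n_s)
    print(n_p, round(se_full, 4), round(se_limit, 4))
```

As Np grows, the full expression converges on Sp/Ns^0.5, confirming that with enough controls the pooled formula reduces to the control standard error alone.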
What this says is that for very large Np, you can obtain a very good estimate of sigma, the standard deviation of the population of all lots, from the very large collection of historical lots. The degrees of freedom (dof) are so large that the t-distribution is no longer sensitive to dof, and there is no need to use the extra 14 dof from the experimental sample to improve the estimate of the population standard deviation (sigma). In this case a test of a sample of Ns lots would compare the sample average to the control average, using the control standard error in the calculation of the t statistic. A t value of 1.5 means the two averages are 1.5 standard errors apart. With this information the statistical analysis is finished. To obtain a value of p, you must do a probability calculation like the one for the coins, except you use the t-distribution instead of the binomial distribution.
Another way of looking at this is to recast the controls as Np/Ns successive averages of Ns lots each. This distribution of averages will show the same mean as the original control data and a standard deviation equal to the standard error of the average. In this case you are calculating the probability that a single value (the experiment average) came from the control population of averages. If the sample average is 3 standard deviations out, the probability that it belongs in the control population is small. It is not zero; you could have gotten lucky, and there is a p chance of that happening. Hence one concludes that the null hypothesis (the sample average belongs in the population of averaged controls) is rejected with confidence 1 - p.
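The population-of-averages picture can be simulated directly. The control mean, standard deviation, and sample sizes below are assumed values chosen only for illustration:

```python
import random
import statistics

random.seed(1)
n_s, n_runs = 15, 20000
mu, sigma = 100.0, 2.0   # assumed control population parameters

# Recast the controls as a population of averages of Ns lots each.
averages = [statistics.mean(random.gauss(mu, sigma) for _ in range(n_s))
            for _ in range(n_runs)]

se = sigma / n_s**0.5
# Fraction of control averages that land at least 3 standard errors
# above the mean: small, but not zero.
p = sum(a >= mu + 3 * se for a in averages) / n_runs
print(p)
```

The simulated fraction sits near the one-sided normal tail value (about 0.0013), which is exactly the "you could have gotten lucky" probability described above.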
Statistical significance testing is NOT the analysis of patterns. It is a straightforward probability calculation that makes use of a model for the population probability distribution, a model generated from the experimental data themselves. This model is used to calculate probabilities. The goodness of the model depends on the number of values used to “fit” the model.
Recall from algebra that N data points can define up to N parameters. A typical probability model uses two parameters, an average and standard deviation. To calculate the second you need a value for the first. You do this by taking an average of the data, but doing this “uses up” one of your N degrees of freedom leaving N-1 independent values (deviations from the average) that can be used to calculate the standard deviation. This is why N-1 shows up frequently for dof.
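The "used up" degree of freedom can be shown with a few numbers (the data values here are arbitrary illustrations):

```python
import statistics

data = [4.0, 7.0, 6.0, 3.0, 5.0]
avg = statistics.mean(data)
deviations = [x - avg for x in data]

# Computing the average "uses up" one degree of freedom: the deviations
# are constrained to sum to zero, so only N - 1 of them are free to vary.
print(sum(deviations))   # 0.0

n = len(data)
s2 = sum(d * d for d in deviations) / (n - 1)   # divide by N - 1, not N
print(s2, statistics.variance(data))            # both give 2.5
```

Dividing by N - 1 rather than N is exactly the correction the paragraph above describes, and it is what library routines like statistics.variance apply by default.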