When n is less than 30 and the population standard deviation is known, what is the appropriate distribution?

The t-distribution describes the standardized distances of sample means to the population mean when the population standard deviation is not known, and the observations come from a normally distributed population.

Is the t-distribution the same as the Student’s t-distribution?

Yes.

What’s the key difference between the t- and z-distributions?

The standard normal or z-distribution assumes that you know the population standard deviation. The t-distribution is based on the sample standard deviation.

The t-distribution is similar to a normal distribution. It has a precise mathematical definition. Instead of diving into complex math, let’s look at the useful properties of the t-distribution and why it is important in analyses.

  • Like the normal distribution, the t-distribution has a smooth shape.
  • Like the normal distribution, the t-distribution is symmetric. If you think about folding it in half at the mean, each side will be the same.
  • Like a standard normal distribution (or z-distribution), the t-distribution is centered at zero. Its mean is zero whenever the mean exists (with one degree of freedom, the mean is undefined).
  • The normal distribution assumes that the population standard deviation is known. The t-distribution does not make this assumption.
  • The t-distribution is defined by the degrees of freedom. These are related to the sample size (for a one-sample t-test, the degrees of freedom equal n - 1).
  • The t-distribution is most useful for small sample sizes, when the population standard deviation is not known, or both.
  • As the sample size increases, the t-distribution becomes more similar to a normal distribution.
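To check these properties numerically, here is a minimal Python sketch (not from the original article) that evaluates the standard formula for the t density and the standard normal density using only the standard library. The function names `t_pdf` and `normal_pdf` are our own hypothetical helpers. It prints the height of each curve at 0 and at 3, so you can see the t-distribution become taller in the middle and thinner in the tails as the degrees of freedom grow.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma keeps the normalizing constant finite even for large df
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def normal_pdf(x: float) -> float:
    """Density of the standard normal (z) distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

if __name__ == "__main__":
    for df in (1, 10, 30, 1000):
        print(f"t, df={df:>4}: height at 0 = {t_pdf(0, df):.4f}, at 3 = {t_pdf(3, df):.5f}")
    print(f"z          : height at 0 = {normal_pdf(0):.4f}, at 3 = {normal_pdf(3):.5f}")
```

Because `t_pdf` depends on `x` only through `x * x`, the curve is symmetric about zero by construction. The df=1000 row is included to show the t density nearly matching the z density.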

Consider the following graph comparing three t-distributions with a standard normal distribution:

Figure 1: Three t-distributions and a standard normal (z-) distribution.

All of the distributions have a smooth shape. All are symmetric. All are centered at zero.

The shape of the t-distribution depends on the degrees of freedom. The curves with more degrees of freedom are taller and have thinner tails. All three t-distributions have “heavier tails” than the z-distribution.

You can see how the curves with more degrees of freedom are more like a z-distribution. Compare the pink curve with one degree of freedom to the green curve for the z-distribution. The t-distribution with one degree of freedom is shorter and has thicker tails than the z-distribution. Then compare the blue curve with 10 degrees of freedom to the green curve for the z-distribution. These two distributions are very similar.

A common rule of thumb is that for a sample size of at least 30, one can use the z-distribution in place of a t-distribution. Figure 2 below shows a t-distribution with 30 degrees of freedom and a z-distribution. The figure uses a dotted-line green curve for z, so that you can see both curves. This similarity is one reason why a z-distribution is used in statistical methods in place of a t-distribution when sample sizes are sufficiently large.

Figure 2: z-distribution and t-distribution with 30 degrees of freedom
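The rule of thumb can also be checked by comparing critical values. The sketch below is an illustration under our own assumptions, not the article's method: it computes the t cumulative distribution by integrating the density with Simpson's rule and inverts it by bisection. The helpers `t_cdf` and `t_quantile` are hypothetical names, and the bisection assumes the quantile lies between 0 and 100, which holds for the usual alpha levels. The z critical value comes from `statistics.NormalDist` in Python's standard library.

```python
import math
from statistics import NormalDist

def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(T <= x) for Student's t with df degrees of freedom.

    Integrates the density over [0, |x|] with Simpson's rule (steps must be
    even) and uses symmetry about zero for the rest.
    """
    # Normalizing constant of the t density; lgamma avoids overflow for large df
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(u: float) -> float:
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps
    total = density(0.0) + density(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x > 0 else 0.5 - area

def t_quantile(p: float, df: int) -> float:
    """Value t with P(T <= t) = p, by bisection on [0, 100] (assumes p > 0.5)."""
    lo, hi = 0.0, 100.0
    for _ in range(45):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    z = NormalDist().inv_cdf(0.975)
    for df in (1, 10, 21, 30):
        print(f"df={df:>2}: two-tailed 5% critical t = {t_quantile(0.975, df):.3f}")
    print(f"two-tailed 5% critical z = {z:.3f}")
```

In practice you would use a statistics library such as SciPy (`scipy.stats.t.ppf`). The hand-rolled version only keeps the example free of dependencies.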

When you perform a t-test, you check if your test statistic is a more extreme value than expected from the t-distribution.

For a two-tailed test, you look at both tails of the distribution. Figure 3 below shows the decision process for a two-tailed test. The curve is a t-distribution with 21 degrees of freedom. The value from the t-distribution with α = 0.05/2 = 0.025 is 2.080. For a two-tailed test, you reject the null hypothesis if the absolute value of the test statistic is larger than the reference value. If the test statistic value is either in the lower tail or in the upper tail, you reject the null hypothesis. If the test statistic is within the two reference lines, then you fail to reject the null hypothesis.

Figure 3: Decision process for a two-tailed test
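The two-tailed decision can be sketched in a few lines. This is a minimal, dependency-free illustration, not the article's procedure: `t_cdf` integrates the t density with Simpson's rule, `t_quantile` inverts it by bisection (assuming the critical value lies below 100), and `two_tailed_test` is a hypothetical helper. The test statistics in the example are made up for illustration.

```python
import math

def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about zero."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(u: float) -> float:
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps is even, as Simpson's rule requires
    total = density(0.0) + density(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_quantile(p: float, df: int) -> float:
    """Bisection for t with P(T <= t) = p, searching [0, 100] (assumes p > 0.5)."""
    lo, hi = 0.0, 100.0
    for _ in range(45):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if t_cdf(mid, df) < p else (lo, mid)
    return (lo + hi) / 2

def two_tailed_test(t_stat: float, df: int, alpha: float = 0.05) -> str:
    """Critical-value decision for a two-tailed t-test."""
    critical = t_quantile(1 - alpha / 2, df)  # about 2.080 when df = 21, alpha = 0.05
    return "reject H0" if abs(t_stat) > critical else "fail to reject H0"

if __name__ == "__main__":
    for t_stat in (2.5, -2.5, 1.3):  # hypothetical test statistics
        print(t_stat, two_tailed_test(t_stat, df=21))
```

Comparing `abs(t_stat)` with the critical value covers both tails at once, which is the same as checking whether the statistic falls outside the two reference lines.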

For a one-tailed test, you look at only one tail of the distribution. For example, Figure 4 below shows the decision process for a one-tailed test that looks at the upper tail. The curve is again a t-distribution with 21 degrees of freedom. For a one-tailed test, the value from the t-distribution with α = 0.05 is 1.721. You reject the null hypothesis if the test statistic is larger than the reference value. If the test statistic is below the reference line, then you fail to reject the null hypothesis.

Figure 4: Decision process for a one-tailed test
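For the upper-tailed case, a similar sketch applies. It reuses the same hypothetical Simpson's-rule `t_cdf` and bisection `t_quantile` helpers (our own, not from the article) and adds a hypothetical `upper_tailed_test` that returns both the one-tailed p-value and the critical-value decision, so you can see that the two approaches agree.

```python
import math

def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about zero."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(u: float) -> float:
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps is even, as Simpson's rule requires
    total = density(0.0) + density(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_quantile(p: float, df: int) -> float:
    """Bisection for t with P(T <= t) = p, searching [0, 100] (assumes p > 0.5)."""
    lo, hi = 0.0, 100.0
    for _ in range(45):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if t_cdf(mid, df) < p else (lo, mid)
    return (lo + hi) / 2

def upper_tailed_test(t_stat: float, df: int, alpha: float = 0.05) -> tuple[float, str]:
    """One-tailed (upper) t-test: returns (p-value, decision)."""
    critical = t_quantile(1 - alpha, df)  # about 1.721 when df = 21, alpha = 0.05
    p_value = 1 - t_cdf(t_stat, df)       # area in the upper tail beyond t_stat
    decision = "reject H0" if t_stat > critical else "fail to reject H0"
    return p_value, decision

if __name__ == "__main__":
    for t_stat in (1.9, 1.5, -2.5):  # hypothetical test statistics
        p, decision = upper_tailed_test(t_stat, df=21)
        print(f"t = {t_stat:5.2f}: p = {p:.4f}, {decision}")
```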

Most people use software to perform the calculations needed for t-tests. But many statistics books still show t-tables, so understanding how to use a table might be helpful. The steps below describe how to use a typical t-table.

  1. Identify whether the table is for one-tailed or two-tailed tests. Then, decide whether you have a one-tailed or a two-tailed test. The columns for a t-table identify different alpha levels.
    If you have a table for a one-tailed test, you can still use it for a two-tailed test. If you set α = 0.05 for your two-tailed test and have only a one-tailed table, then use the column for α = 0.025.
  2. Identify the degrees of freedom for your data. The rows of a t-table correspond to different degrees of freedom. Most tables go up to 30 degrees of freedom and then stop. The tables assume people will use a z-distribution for larger sample sizes.
  3. Find the cell in the table at the intersection of your α level and degrees of freedom. This is the t-distribution value. Compare your statistic to the t-distribution value and make the appropriate conclusion.
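The table steps can be mimicked in code. The sketch below is an assumption-laden illustration rather than a reproduction of any published table: it computes one-tailed critical values with the same hypothetical Simpson's-rule `t_cdf` and bisection `t_quantile` helpers, and the hypothetical `table_value` function applies step 1 by using the α/2 column when the test is two-tailed.

```python
import math

def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about zero."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(u: float) -> float:
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps is even, as Simpson's rule requires
    total = density(0.0) + density(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_quantile(p: float, df: int) -> float:
    """Bisection for t with P(T <= t) = p, searching [0, 100] (assumes p > 0.5)."""
    lo, hi = 0.0, 100.0
    for _ in range(45):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if t_cdf(mid, df) < p else (lo, mid)
    return (lo + hi) / 2

def table_value(df: int, alpha: float, two_tailed: bool) -> float:
    """Look up a one-tailed t-table cell; a two-tailed alpha uses the alpha/2 column."""
    column = alpha / 2 if two_tailed else alpha
    return t_quantile(1 - column, df)

if __name__ == "__main__":
    columns = (0.05, 0.025)  # one-tailed alpha levels (table columns)
    print("df  " + "".join(f"{a:>9}" for a in columns))
    for df in (1, 5, 10, 21, 30):  # table rows
        print(f"{df:<4}" + "".join(f"{t_quantile(1 - a, df):9.3f}" for a in columns))
```

For example, `table_value(21, 0.05, two_tailed=True)` reads the α = 0.025 column in the df = 21 row, matching the 2.080 reference value used for the two-tailed test.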


The central limit theorem states that if you have a population with mean μ and standard deviation σ and take sufficiently large random samples from the population with replacement, then the distribution of the sample means will be approximately normal. This holds regardless of whether the source population is normal or skewed, provided the sample size is sufficiently large (usually n > 30). If the population is normal, then the theorem holds even for samples smaller than 30. It also holds for a binomial population with smaller samples, provided that min(np, n(1-p)) is at least 5, where n is the sample size and p is the probability of success in the population. This means that we can use the normal probability model to quantify uncertainty when making inferences about a population mean based on the sample mean.

For the random samples we take from the population, we can compute the mean of the sample means:

$\mu_{\bar{X}} = \mu$

and the standard deviation of the sample means:

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

Before illustrating the use of the Central Limit Theorem (CLT), we will first illustrate the result. For the result of the CLT to hold, the sample must be sufficiently large (n > 30). Again, there are two exceptions to this. If the population is normal, then the result holds for samples of any size (i.e., the sampling distribution of the sample means will be approximately normal even for samples of size less than 30). If the population is binomial, the result holds for smaller samples as long as min(np, n(1-p)) is at least 5.

Central Limit Theorem with a Normal Population

The figure below illustrates a normally distributed characteristic, X, in a population in which the population mean is 75 with a standard deviation of 8.

If we take simple random samples (with replacement) of size n=10 from the population and compute the mean for each of the samples, the distribution of sample means should be approximately normal according to the Central Limit Theorem. Note that the sample size (n=10) is less than 30, but the source population is normally distributed, so this is not a problem. The distribution of the sample means is illustrated below. Note that the horizontal axis is different from the previous illustration, and that the range is narrower.

The mean of the sample means is 75 and the standard deviation of the sample means is approximately 2.5, computed as follows:

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{8}{\sqrt{10}} \approx 2.53$

If we were to take samples of n=5 instead of n=10, we would get a similar distribution, but the variation among the sample means would be larger. In fact, when we did this, the mean of the sample means was 75 and their standard deviation was 3.6, close to the theoretical value 8/√5 ≈ 3.58.
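A small simulation can reproduce this behaviour. The sketch below is ours, not the authors' procedure: it assumes the population is modeled as an exact normal distribution with mean 75 and standard deviation 8 (rather than a finite list of individuals), so drawing independent values plays the role of sampling with replacement. The function name `sample_means`, the seed, and the 10,000 repetitions are arbitrary choices.

```python
import random
import statistics

def sample_means(mu: float, sigma: float, n: int, reps: int, seed: int = 1) -> list[float]:
    """Means of `reps` random samples of size n drawn from Normal(mu, sigma)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

if __name__ == "__main__":
    for n in (10, 5):
        means = sample_means(75, 8, n, reps=10_000)
        print(f"n={n:>2}: mean of sample means = {statistics.fmean(means):.2f}, "
              f"SD of sample means = {statistics.stdev(means):.2f}, "
              f"sigma/sqrt(n) = {8 / n ** 0.5:.2f}")
```

The simulated standard deviation of the sample means should land close to σ/√n for each sample size, and it is larger for n=5 than for n=10.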

Central Limit Theorem with a Dichotomous Outcome

Now suppose we measure a characteristic, X, in a population and that this characteristic is dichotomous (e.g., success of a medical procedure: yes or no) with 30% of the population classified as a success (i.e., p=0.30) as shown below.

The Central Limit Theorem applies even to binomial populations like this provided that the minimum of np and n(1-p) is at least 5, where "n" refers to the sample size, and "p" is the probability of "success" on any given trial. In this case, we will take samples of n=20 with replacement, so min(np, n(1-p)) = min(20(0.3), 20(0.7)) = min(6, 14) = 6. Therefore, the criterion is met.
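The criterion is simple arithmetic, sketched below as a hypothetical helper `clt_binomial_check` (our name, not from the source) that returns min(np, n(1-p)) and whether it meets the threshold of 5 used in the text. It evaluates both sample sizes discussed in this section.

```python
def clt_binomial_check(n: int, p: float, threshold: float = 5) -> tuple[float, bool]:
    """Return min(np, n(1-p)) and whether it reaches the rule-of-thumb threshold."""
    smaller = min(n * p, n * (1 - p))
    return smaller, smaller >= threshold

if __name__ == "__main__":
    for n in (20, 10):
        value, ok = clt_binomial_check(n, 0.30)
        print(f"n={n}: min(np, n(1-p)) = {value:g}, criterion met: {ok}")
```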

We saw previously that the population mean and standard deviation for a binomial distribution are:

Mean:

$\mu = p = 0.30$

Standard deviation:

$\sigma = \sqrt{p(1-p)} = \sqrt{0.30 \times 0.70} \approx 0.46$

The distribution of sample means based on samples of size n=20 is shown below.

The mean of the sample means is

$\mu_{\bar{X}} = p = 0.30$

and the standard deviation of the sample means is:

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.30 \times 0.70}{20}} \approx 0.10$

Now, instead of taking samples of n=20, suppose we take simple random samples (with replacement) of size n=10. Note that in this scenario we do not meet the sample size requirement for the Central Limit Theorem (i.e., min(np, n(1-p)) = min(10(0.3), 10(0.7)) = min(3, 7) = 3, which is less than 5). The distribution of sample means based on samples of size n=10 is not quite normal; the sample size must be larger for the distribution to approach normality.
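Because the number of successes in a sample of size n follows a binomial distribution, the exact sampling distribution of the sample mean (the sample proportion) can be computed with `math.comb` instead of simulated. The sketch below does this for n=20 and n=10. Using skewness as a gauge of departure from normality (a normal distribution has skewness 0) is our choice, not the article's, and the helper names are hypothetical.

```python
import math

def proportion_sampling_distribution(n: int, p: float) -> list[tuple[float, float]]:
    """Exact (value, probability) pairs for the mean of n independent 0/1 outcomes.

    The number of successes k is Binomial(n, p), so the sample mean k/n
    takes only the values 0, 1/n, ..., 1.
    """
    return [(k / n, math.comb(n, k) * p**k * (1 - p) ** (n - k)) for k in range(n + 1)]

def summarize(dist: list[tuple[float, float]]) -> tuple[float, float, float]:
    """Mean, standard deviation, and skewness of a discrete distribution."""
    mean = sum(x * w for x, w in dist)
    sd = math.sqrt(sum((x - mean) ** 2 * w for x, w in dist))
    skew = sum((x - mean) ** 3 * w for x, w in dist) / sd**3
    return mean, sd, skew

if __name__ == "__main__":
    for n in (20, 10):
        mean, sd, skew = summarize(proportion_sampling_distribution(n, 0.30))
        print(f"n={n}: mean = {mean:.3f}, SD = {sd:.3f}, skewness = {skew:.3f}")
```

Both sample sizes give a mean of 0.30 and a standard deviation equal to √(p(1-p)/n), but the skewness is larger for n=10, which is one way to see why that distribution looks less normal.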

Central Limit Theorem with a Skewed Distribution

The Poisson distribution is another probability model that is useful for modeling discrete variables such as the number of events occurring during a given time interval. For example, suppose you typically receive about 4 spam emails per day, but the number varies from day to day. Today you happened to receive 5 spam emails. What is the probability of that happening, given that the typical rate is 4 per day? The Poisson probability is:

$P(X) = \frac{e^{-\mu}\mu^{X}}{X!}$

Mean = μ

Standard deviation = $\sqrt{\mu}$

The mean for the distribution is μ (the average or typical rate), "X" is the actual number of events that occur ("successes"), and "e" is the constant approximately equal to 2.71828. So, in the example above, the probability of receiving exactly 5 spam emails when the typical rate is 4 per day is

$P(5) = \frac{e^{-4}\, 4^{5}}{5!} \approx 0.1563$
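The Poisson formula translates directly into code. In this minimal sketch, `poisson_pmf` is a hypothetical helper name; it evaluates the formula above for the spam-email example and prints the mean and standard deviation, μ and √μ.

```python
import math

def poisson_pmf(x: int, mu: float) -> float:
    """P(X = x) for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu**x / math.factorial(x)

if __name__ == "__main__":
    mu = 4  # typical number of spam emails per day, from the example
    print(f"P(X = 5) = {poisson_pmf(5, mu):.4f}")
    print(f"mean = {mu}, SD = {math.sqrt(mu):.2f}")
```

Summing `poisson_pmf(k, 4)` over k = 0, 1, 2, ... gives a total of 1 (up to rounding), and summing `k * poisson_pmf(k, 4)` gives the mean of 4.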

Now let's consider another Poisson distribution, with μ=3 and σ=√3≈1.73. The distribution is shown in the figure below.

This population is not normally distributed, but the Central Limit Theorem will apply if the samples are sufficiently large (about n = 30 or more). In fact, if we take samples of size n=30, we obtain sample means distributed as shown in the first graph below, with a mean of 3 and a standard deviation of 0.32 (1.73/√30 ≈ 0.32). In contrast, with small samples of n=10, we obtain sample means distributed as shown in the second graph. Note that n=10 does not meet the criterion for the Central Limit Theorem, and the small samples give a distribution that is not quite normal. Also note that the standard deviation of the sample means (also called the "standard error") is larger with smaller samples, because it is obtained by dividing the population standard deviation by the square root of the sample size. Another way of thinking about this is that extreme values have less impact on the sample mean when the sample size is large.

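The Poisson example can be simulated the same way as the normal one. This sketch is our illustration, not the authors' code: to stay within the standard library it draws Poisson values with Knuth's multiplication method (adequate for a small mean such as 3), and the helper names `poisson_draw` and `poisson_sample_means`, the seed, and the 5,000 repetitions are arbitrary assumptions.

```python
import math
import random
import statistics

def poisson_draw(mu: float, rng: random.Random) -> int:
    """One Poisson(mu) draw via Knuth's multiplication method (fine for small mu)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:  # count extra uniforms needed before the product drops below e^-mu
        k += 1
        prod *= rng.random()
    return k

def poisson_sample_means(mu: float, n: int, reps: int, seed: int = 1) -> list[float]:
    """Means of `reps` random samples of size n from a Poisson(mu) population."""
    rng = random.Random(seed)
    return [statistics.fmean(poisson_draw(mu, rng) for _ in range(n)) for _ in range(reps)]

if __name__ == "__main__":
    mu = 3
    for n in (30, 10):
        means = poisson_sample_means(mu, n, reps=5_000)
        print(f"n={n}: mean = {statistics.fmean(means):.2f}, "
              f"SD = {statistics.stdev(means):.2f}, "
              f"sigma/sqrt(n) = {math.sqrt(mu / n):.2f}")
```

For n=30 the standard deviation of the simulated means should be near 0.32, and for n=10 near 0.55, illustrating that the standard error shrinks as the sample size grows.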