Chapter 10 Assignment One-Sample Tests Of Hypotheses

• Statistical Test for Population Proportion and Population Mean
• Statistical and Practical Significances
• Using a Confidence Interval to Draw a Conclusion About a Two-tailed Test

An Introduction to Statistical Methods and Data Analysis (see Course Schedule).

Six Steps to Conducting a Statistical Test

1. The null and alternative hypotheses
2. Level of significance $$\alpha$$
3. Test statistic
4. Compute the p-value
5. Check whether to reject the null hypothesis by comparing p-value to $$\alpha$$
6. Conclusion in words

A reminder of what a p-value is in hypothesis testing: the p-value is the probability of obtaining a value of the test statistic as extreme as, or more extreme than, the observed value, assuming that the null hypothesis is true.
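For a test statistic that follows the standard normal distribution under the null hypothesis, this definition can be sketched in Python with the standard library's `statistics.NormalDist` (Python 3.8+). The function name `z_p_value` and the `tail` labels are our own illustrative choices, not part of the course material.

```python
from statistics import NormalDist


def z_p_value(z_star: float, tail: str) -> float:
    """P-value of an observed z statistic, assuming Z ~ N(0, 1) under H0."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return std_normal.cdf(z_star)            # P(Z <= z*)
    if tail == "right":
        return std_normal.cdf(-z_star)           # P(Z >= z*), by symmetry
    if tail == "two":
        return 2 * std_normal.cdf(-abs(z_star))  # area in both tails
    raise ValueError("tail must be 'left', 'right', or 'two'")


print(round(z_p_value(-1.2, "two"), 4))  # 0.2301
```

The "more extreme" part of the definition is what changes with the tail: left-tailed tests count values at or below z*, right-tailed tests values at or above it, and two-tailed tests both.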

Caution: The p-value is sometimes also referred to as the level of significance. Be aware that $$\alpha$$ (alpha) is also called the level of significance, which makes the terminology confusing. $$\alpha$$ is the preset level of significance, whereas the p-value is the observed level of significance. The p-value is, in fact, a summary statistic that translates the observed value of the test statistic into a probability that is easy to interpret.

Example:  Online Purchases

An e-commerce research company claims that 60% or more of graduate students have bought merchandise online. A consumer group is suspicious of the claim and thinks that the proportion is lower than 60%. A random sample of 80 graduate students shows that only 22 students have ever done so. Is there enough evidence to show that the true proportion is lower than 60%? Conduct the test at a 10% Type I error rate, and use both the p-value and rejection region approaches.

Work out your answers to the questions below and then compare them with the answers that follow.

Set up the hypotheses for the consumer advocate, described above. Specify whether it is a left-tailed test, right-tailed test, or a two-tailed test.


Test on population proportion:

$$H_0: p = 0.6$$ (stands for $$p \geq 0.6$$)

$$H_a: p < 0.6$$

It is a left-tailed test.

Now at this point we want to check whether the sample size is large enough so that we can use the one-proportion z-test.

Check the conditions: 1) $$np_0 \geq 5$$ and 2) $$n(1 - p_0) \geq 5$$.

$$np_0 = 80 \times 0.6 = 48 \geq 5$$

$$n(1 - p_0) = 80 \times 0.4 = 32 \geq 5$$
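The large-sample check above is a single boolean condition, sketched here in Python. The function name `large_sample_ok` is hypothetical, and the threshold of 5 is the one this lesson uses (other texts use 10).

```python
def large_sample_ok(n: int, p0: float, minimum: float = 5) -> bool:
    """True when both n*p0 and n*(1 - p0) reach the minimum expected count."""
    return n * p0 >= minimum and n * (1 - p0) >= minimum


print(large_sample_ok(80, 0.6))  # True: 48 and 32 are both at least 5
```

Note that the check uses the hypothesized $$p_0$$, not the sample proportion, because the z approximation is needed under the null hypothesis.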

Thus, we can use the z approximation.

Next, we can calculate the test statistic.

$\hat{p} = \frac{22}{80} = 0.275$

$Z^{*} = \frac{\hat{p}-p_{0}}{\sqrt{\frac{p_{0}(1-p_{0})}{n}}} = \frac{0.275-0.6}{\sqrt{\frac{0.6\times0.4}{80}}} = -5.93$
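The same arithmetic as a small Python function (the name `one_proportion_z` is our own). It reproduces the formula above term by term, with the standard error computed from $$p_0$$ rather than $$\hat{p}$$.

```python
from math import sqrt


def one_proportion_z(successes: int, n: int, p0: float) -> float:
    """Z* = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)  # uses p0, the null value
    return (p_hat - p0) / standard_error


z_star = one_proportion_z(22, 80, 0.6)
print(round(z_star, 2))  # -5.93
```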

Next find the rejection region for the level of significance and also the p-value for the test statistic.

The example states a 10% Type I error rate which corresponds to an alpha value of 0.10.

Because the test is left-tailed, we want the critical value from the Z-table with an area of 0.10 to its left. This is -1.28, so the rejection region is $$Z^* \leq -1.28$$.

The p-value is $$P(Z \leq Z^{*}) = P(Z \leq -5.93) \approx 0$$. This Z* is beyond the range of the Z-table, so the p-value is essentially zero.

Finally, use the test statistic and p-value to make decision and overall conclusion.

With the test statistic Z* of -5.93 falling in the rejection region (i.e., less than -1.28), we will reject the null hypothesis.

With the p-value close to zero and being less than $$\alpha$$ of 0.10 we will reject the null hypothesis. We have statistical evidence at the 10% level of significance to conclude that fewer than 60% of graduate students have purchased merchandise online.
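Both decision rules can be sketched side by side in Python, replacing the Z-table lookups with `statistics.NormalDist` (Python 3.8+). With the same $$\alpha$$, the rejection-region rule and the p-value rule always reach the same decision; the sketch simply makes that visible for this example.

```python
from statistics import NormalDist

alpha = 0.10
z_star = -5.93  # test statistic from the online-purchase example

std_normal = NormalDist()
critical_value = std_normal.inv_cdf(alpha)  # z with area alpha to its left
p_value = std_normal.cdf(z_star)            # left-tailed p-value, P(Z <= z*)

reject_by_region = z_star <= critical_value  # rejection-region approach
reject_by_p_value = p_value <= alpha         # p-value approach

print(round(critical_value, 2), reject_by_region, reject_by_p_value)  # -1.28 True True
```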

To determine whether the p-value is small, we compare it to the preset level of significance, which is the probability of a Type I error. Recall that a Type I error, rejecting the null hypothesis when that null hypothesis is true, is the more serious error. Think of finding an innocent person guilty.

When we specify our hypotheses, we should have some idea of what size Type I error we can tolerate. It is denoted as $$\alpha$$. A conventional choice of $$\alpha$$ is 0.05. Values ranging from 0.001 to 0.1 are also common and the choice of $$\alpha$$ depends on the problem one is working on.

Alternatively, we can summarize the data by reporting the p-value and let users decide whether or not to reject $$H_0$$ at their own subjectively chosen $$\alpha$$ values. The p-value can also be called the observed level of significance, and our book sometimes refers to it simply as the level of significance. Because that may cause confusion, we recommend always calling it the p-value and reserving the term level of significance for the preset $$\alpha$$ value.
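To see why $$\alpha$$ is the Type I error rate, the following minimal simulation repeatedly samples from a population in which $$H_0$$ is true and counts how often a two-tailed z-test rejects it. The population (normal, known $$\sigma$$), the values $$\mu_0 = 10$$, $$\sigma = 3$$, $$n = 25$$, and the function name are all hypothetical assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulated_type_i_error(alpha: float, mu0: float = 10.0, sigma: float = 3.0,
                           n: int = 25, trials: int = 10_000, seed: int = 1) -> float:
    """Fraction of trials in which a TRUE H0: mu = mu0 is rejected (two-tailed z-test).

    Assumes a normal population with known sigma (hypothetical values).
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where H0 is actually true.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z_star = (fmean(sample) - mu0) / (sigma / sqrt(n))
        p_value = 2 * std_normal.cdf(-abs(z_star))
        if p_value <= alpha:
            rejections += 1  # this rejection is a Type I error
    return rejections / trials


print(simulated_type_i_error(0.05))  # close to 0.05
```

A smaller $$\alpha$$ lowers this rate but makes it harder to reject a false null hypothesis, which is why the choice depends on the problem.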

Example: Emergency Room Wait Time

The administrator at your local hospital states that on weekends the average wait time for emergency room visits is 10 minutes. Based on discussions you have had with friends who have complained about how long they waited to be seen in the ER over a weekend, you dispute the administrator's claim. You decide to test your hypothesis. Over the course of a few weekends, you record the wait time for 40 randomly selected patients. The average wait time for these 40 patients is 11 minutes with a standard deviation of 3 minutes. Do you have enough evidence to support your hypothesis that the average ER wait time exceeds 10 minutes? You opt to conduct the test at a 5% level of significance.

Work out your answers to the questions below and then compare them with the answers that follow.

Set up the hypotheses for the example described above. Specify whether it is a left-tailed test, right-tailed test, or a two-tailed test.


Test on population mean:

$$H_0: \mu = 10$$

$$H_a: \mu > 10$$

It is a right-tailed test.

At this point we want to check whether the data is approximately normal so we can use the one-mean t-test.

Check the condition.

Since we do not have the actual wait times to check normality, we will consider the sample size. With a sample size of 40, we exceed our minimum requirement of 30 and can proceed with the test.

Next, we can calculate the test statistic.

$t^{*}=\frac{\bar{x}-\mu_{0}}{S/\sqrt{n}}=\frac{11-10}{3/\sqrt{40}}=2.11$
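A direct Python transcription of this formula (the function name `one_mean_t` is our own); the only inputs are the summary statistics given in the example.

```python
from math import sqrt


def one_mean_t(x_bar: float, mu0: float, s: float, n: int) -> float:
    """t* = (x_bar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (x_bar - mu0) / (s / sqrt(n))


t_star = one_mean_t(11, 10, 3, 40)
print(round(t_star, 2))  # 2.11
```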

Next find the rejection region for the level of significance and also the p-value for the test statistic.

The example states a 5% level of significance, so $$\alpha = 0.05$$.

Because the test is right-tailed, we want the critical value from the t-table with an area of 0.05 to its right. The degrees of freedom are n - 1 = 39. Since 39 is not on the table, we use the closest value that does not exceed it, which is 35. With 35 degrees of freedom, the critical value is 1.69. Therefore, the rejection region is $$t^* \geq 1.69$$.

The p-value is $$P(t \geq t^{*}) = P(t \geq 2.11)$$. This t* is not in the table for 35 df, but it falls between the table t-values of 2.030 and 2.438. These t-values correspond to right-tail probabilities of 0.025 and 0.01, so the p-value for our t* of 2.11 is between 0.01 and 0.025.
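Software can use the exact 39 degrees of freedom instead of the table's 35. Python's standard library has no t distribution (SciPy's `scipy.stats.t` is the usual choice), so this sketch, which is our own and not the course's method, integrates the t density numerically with Simpson's rule and finds the critical value by bisection. All function names are hypothetical.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = lgamma((df + 1) / 2) - lgamma(df / 2)  # lgamma avoids overflow for large df
    return exp(log_coef) / sqrt(df * pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_right_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T >= t) = 0.5 - integral of the density from 0 to t (composite Simpson's rule)."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_critical_right(alpha: float, df: int) -> float:
    """t value with right-tail area alpha, by bisection on [0, 10] (assumes 0 < alpha < 0.5)."""
    lo, hi = 0.0, 10.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_right_tail(mid, df) > alpha:
            lo = mid  # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


df = 40 - 1
t_star = (11 - 10) / (3 / sqrt(40))
p_value = t_right_tail(t_star, df)
critical = t_critical_right(0.05, df)
print(f"df={df}, t*={t_star:.3f}, p-value={p_value:.4f}, critical={critical:.3f}")
```

The exact p-value lands inside the 0.01 to 0.025 bracket read from the table, and the 39-df critical value is slightly smaller than the 35-df table value of 1.69, which is why using the closest smaller df is the conservative choice.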

Finally, use the test statistic and p-value to make decision and overall conclusion.

With the test statistic t* of 2.11 falling in the rejection region (i.e., greater than 1.69), we will reject the null hypothesis.

With the p-value between 0.01 and 0.025, and therefore less than $$\alpha$$ of 0.05, we will reject the null hypothesis. We have statistical evidence at the 5% level of significance to conclude that the average emergency room wait time at the hospital is more than 10 minutes.

Statistical and Practical Significances

Our decision in this last example was to reject the null hypothesis and conclude that the average wait time exceeds 10 minutes. However, our sample mean of 11 minutes wasn't far from 10. So what do you think of our conclusion? Yes, statistically there was a difference at the 5% level of significance, but are we that "impressed" with the results? That is, do you think 11 minutes is really that much different from 10 minutes? Because we are sampling data, we have to expect some error in our results; even if the true mean wait time were 10 minutes, it would be extremely unlikely for our sample mean to be exactly 10 minutes. This is the difference between statistical significance and practical significance. The former is the result produced from the sample data, while the latter is the practical application of those results.

Words of Caution

Critics of hypothesis-testing procedures have observed that a population mean is rarely exactly equal to the value in the null hypothesis and hence, by obtaining a large enough sample, virtually any null hypothesis can be rejected. Thus, it is important to distinguish between statistical significance and practical significance.
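The critics' point can be illustrated numerically. This sketch holds a hypothetical, practically trivial difference fixed (sample mean 10.2 against $$\mu_0 = 10$$, standard deviation 3) and only increases the sample size, using a large-sample normal approximation for a right-tailed test. The numbers and the function name are assumptions for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def right_tail_p(x_bar: float, mu0: float, s: float, n: int) -> float:
    """Large-sample right-tailed p-value for H0: mu = mu0 (normal approximation)."""
    z_star = (x_bar - mu0) / (s / sqrt(n))
    return NormalDist().cdf(-z_star)


# Same 0.2-minute difference and same spread; only the sample size grows.
for n in (40, 400, 4_000, 40_000):
    print(n, right_tail_p(10.2, 10.0, 3.0, n))
```

The effect is the same in every row, yet the p-value shrinks toward zero as n grows, so a small p-value alone says nothing about whether the effect is large enough to matter.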

Statistical significance is concerned with whether an observed effect is due to chance and practical significance means that the observed effect is large enough to be useful in the real world.


Another one-proportion example: this example uses $$\pi$$ in place of p to represent the proportion. Note that this changes nothing in the overall testing process.

A pharmaceutical company claims that a new treatment is successful in reducing fever in more than 60% of the cases. The treatment was tried on 40 randomly selected cases and 11 were successful. Do you think the company's claim is valid? (Can you reject the company's claim?)
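Once you have set up your hypotheses, you can check your arithmetic with this short sketch. It computes only the large-sample conditions and the test statistic, which are the same whichever direction you choose for the alternative; the rejection region, p-value, and conclusion are left to you.

```python
from math import sqrt

n, successes, pi0 = 40, 11, 0.60

# Large-sample conditions: n*pi0 and n*(1 - pi0) should both be at least 5.
print(n * pi0, n * (1 - pi0))

pi_hat = successes / n
z_star = (pi_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)
print(round(pi_hat, 3), round(z_star, 2))
```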

Work this out yourself and then review the video (no sound) below:

Using a Confidence Interval to Draw a Conclusion About a Two-tailed Test

The primary purpose of a confidence interval is to estimate some unknown parameter.  A secondary use of confidence intervals is to support decisions in hypothesis testing, especially when the test is two-tailed.  The essence of this method is to compare the hypothesized value to the confidence interval.  If the hypothesized value falls within the interval we fail to reject the null hypothesis.  If the hypothesized value falls outside the interval we reject the null hypothesis.  Let's look at a couple of examples.

For the two-tailed test:

$$H_0: \mu = \mu_0$$
$$H_a: \mu \ne \mu_0$$

The null hypothesis will be rejected at level $$\alpha$$ if and only if the value $$\mu_0$$ does not fall within the $$(1 - \alpha)$$ confidence interval for $$\mu$$.

To show how to use a confidence interval to draw a conclusion about a two-tailed test, recall our lumber example from the lesson on confidence intervals. A 95% confidence interval for the mean lumber length was 8.03 feet to 8.57 feet.

For our two-tailed test the hypotheses were:

$$H_0: \mu = 8.5$$
$$H_a: \mu \ne 8.5$$

Since 8.5 falls within the 95% confidence interval, we cannot reject the null hypothesis at level 0.05.  In general, if the null value falls within the confidence interval we fail to reject the null hypothesis.  If the null value falls outside the confidence interval then we would reject the null hypothesis.
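The confidence-interval rule is a simple containment check, sketched below with a hypothetical function `reject_by_ci`. The second part checks the equivalence with a two-tailed z-test on made-up numbers (sample mean 10, standard error 1), not on the lumber data, whose raw values are not given here.

```python
from statistics import NormalDist


def reject_by_ci(lower: float, upper: float, mu0: float) -> bool:
    """Two-tailed decision at level alpha from a (1 - alpha) confidence interval."""
    return not (lower <= mu0 <= upper)


print(reject_by_ci(8.03, 8.57, 8.5))  # False: fail to reject H0: mu = 8.5

# Equivalence check with a z-based interval and z-test (hypothetical numbers).
alpha, x_bar, se = 0.05, 10.0, 1.0
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
lower, upper = x_bar - z_crit * se, x_bar + z_crit * se
for mu0 in (7, 8, 9, 10, 11, 12, 13):
    p_value = 2 * NormalDist().cdf(-abs((x_bar - mu0) / se))
    assert (p_value <= alpha) == reject_by_ci(lower, upper, mu0)
```

The equivalence holds because the interval and the test use the same critical value and the same standard error; it requires matching $$\alpha$$ and a two-tailed test.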

It is possible to use a one-sided confidence bound to draw a conclusion about a one-sided test, but you have to be very careful about obtaining the one-sided confidence bound.

Chapter 10 – One-Sample Tests of Hypothesis

1. a. Two-tailed, because the alternate hypothesis does not indicate a direction.
   b. Reject $$H_0$$ when z does not fall in the region from -1.96 to 1.96.
   c. -1.2, found by $$z = \frac{49 - 50}{5/\sqrt{36}}$$
   d. Fail to reject $$H_0$$.
   e. p = 0.2302, found by 2(0.5000 - 0.3849). There is a 23.02% chance of finding a z value this large by "sampling error" when $$H_0$$ is true.
2. a. One-tailed, because the alternate hypothesis indicates a greater-than direction.
   b. Reject $$H_0$$ when z > 2.05.
   c. 4, found by $$z = \frac{12 - 10}{3/\sqrt{36}}$$
   d. Reject $$H_0$$ and conclude that $$\mu > 10$$.
   e. The p-value is close to 0, so there is very little chance of finding a z value this large when $$H_0$$ is true.
3. a. One-tailed, because the alternate hypothesis indicates a greater-than direction.
   b. Reject $$H_0$$ when z > 1.65.
   c. 1.2, found by $$z = \frac{21 - 20}{5/\sqrt{36}}$$
   d. Fail to reject $$H_0$$ at the 0.05 significance level.
   e. p = 0.1151, found by 0.5000 - 0.3849. There is an 11.51% chance of finding a z value this large or larger by "sampling error" when $$H_0$$ is true.
4. a. One-tailed, because the alternate hypothesis indicates a less-than direction.
   b. Reject $$H_0$$ when z < -1.88.
   c. -2.67, found by $$z = \frac{215 - 220}{15/\sqrt{64}}$$
   d. Reject $$H_0$$ and conclude that the population mean is less than 220 at the 0.03 significance level.
   e. p = 0.0038, found by 0.5000 - 0.4962. There is less than a 0.5% chance of finding a z value this small or smaller when $$H_0$$ is true.
5. a. $$H_0: \mu = 60{,}000$$, $$H_1: \mu \ne 60{,}000$$
   b. Reject $$H_0$$ if z < -1.96 or z > 1.96.
   c. -0.69, found by $$z = \frac{59{,}500 - 60{,}000}{5000/\sqrt{48}}$$
   d. Do not reject $$H_0$$.
   e. p = 0.4902, found by 2(0.5000 - 0.2549). Crosset's experience is not different from that claimed by the manufacturer. If $$H_0$$ is true, the probability of finding a value more extreme than this is 0.4902.
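The z values in these answers all come from the same formula, so they can be rechecked in a few lines of Python. The function and dictionary names are our own; the sketch checks only the z values, since the listed p-values were read from a z-table using z rounded to two decimals and will differ slightly from exact software values.

```python
from math import sqrt


def z_for_mean(x_bar: float, mu0: float, sigma: float, n: int) -> float:
    """z = (x_bar - mu0) / (sigma / sqrt(n))."""
    return (x_bar - mu0) / (sigma / sqrt(n))


# (sample mean, hypothesized mean, sigma, n) for exercises 1 to 5
exercises = {
    1: (49, 50, 5, 36),
    2: (12, 10, 3, 36),
    3: (21, 20, 5, 36),
    4: (215, 220, 15, 64),
    5: (59_500, 60_000, 5_000, 48),
}
z_values = {number: round(z_for_mean(*args), 2) for number, args in exercises.items()}
print(z_values)
```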
