Hypothesis Testing (high-frequency topic, often examined)

Statistics — Paper 2 · AQA AS Mathematics (sample) · Drill 1 of 2
Why this topic matters. Hypothesis testing with the binomial distribution appears on AQA AS Paper 2 (and on A-level Paper 3) in almost every sitting, carrying 7–12 marks. The conclusion-in-context mark is one of the most consistently dropped marks in AS Mathematics: writing “reject H₀” without a real-world sentence loses the final accuracy mark. Using the full 5-step method on every question gives you access to every mark, including that one.

The scaffold to use (SHCCC Framework)

  1. State: write H₀ and H₁ in terms of p, the probability parameter.
  2. Hypothesise: state the test statistic distribution under H₀ (e.g. X ~ B(n, p₀)).
  3. Compute: find P(X ≤ x) or P(X ≥ x) using the Binomial distribution.
  4. Compare: compare the probability to the significance level α.
  5. Conclude: write a sentence in context that refers to the real-world claim.
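Steps 2 to 4 of the framework can be sketched in Python for a one-tailed binomial test. This is a minimal illustration using only the standard library (`math.comb`, Python 3.8 or later); the function `one_tailed_binomial_test` and its parameter names are our own, not an AQA-specified method, and step 5 still has to be written in words.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def one_tailed_binomial_test(n: int, p0: float, x: int, alpha: float, upper: bool):
    """Steps 2-4 for H0: p = p0 against a one-sided H1.

    upper=True  -> H1: p > p0, p-value = P(X >= x)
    upper=False -> H1: p < p0, p-value = P(X <= x)
    Returns (p_value, reject_h0).
    """
    # Step 2: under H0, X ~ B(n, p0).
    # Step 3: add up the tail probability in the direction of H1.
    if upper:
        p_value = sum(binom_pmf(k, n, p0) for k in range(x, n + 1))
    else:
        p_value = sum(binom_pmf(k, n, p0) for k in range(0, x + 1))
    # Step 4: compare with the significance level.
    return p_value, p_value <= alpha


# Q1 below: 8 heads in 10 tosses, H1: p > 0.5, 5% level.
p_value, reject = one_tailed_binomial_test(n=10, p0=0.5, x=8, alpha=0.05, upper=True)
print(f"p-value = {p_value:.4f}, reject H0: {reject}")
# prints: p-value = 0.0547, reject H0: False
```

Calling it with `upper=False` handles lower-tailed tests such as Q4 and Q7.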
Foundational — recall and single-step application

Q1. A coin is tossed 10 times and lands heads 8 times. Test, at the 5% significance level, whether there is evidence that the coin is biased towards heads.

Foundational
[6 marks]
Method to use Consider SHCCC Framework — set up H₀: p = 0.5 (fair) and H₁: p > 0.5 (biased towards heads); find P(X ≥ 8) under H₀.
M1 H₀: p = 0.5 (coin is fair).   H₁: p > 0.5 (coin biased towards heads). One-tailed test. AO1 — stating hypotheses correctly Common mistake: Writing H₁: p ≠ 0.5 (two-tailed) when the question says “biased towards heads”.
M1 Under H₀: X ~ B(10, 0.5). Observed: x = 8. For one-tailed upper test: P(X ≥ 8). AO1 — setting up the binomial model
A1 P(X ≥ 8) = P(X=8) + P(X=9) + P(X=10) = 10C8⋅(0.5)¹⁰ + 10C9⋅(0.5)¹⁰ + (0.5)¹⁰ = (45+10+1)/1024 = 56/1024 ≈ 0.0547. AO1 — computing the tail probability
A1 0.0547 > 0.05 (significance level). Do not reject H₀. AO2 — comparing probability to significance level Common mistake: Rejecting H₀ when the probability is greater than the significance level.
A1 There is insufficient evidence at the 5% level to conclude that the coin is biased towards heads. AO2 — conclusion in context — the mark most often dropped
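A1 Note: had 9 heads been observed, P(X ≥ 9) = 11/1024 ≈ 0.0107 < 0.05 and we would reject H₀ (10 heads gives P(X ≥ 10) = 1/1024, smaller still). So the critical region is X ≥ 9, and 8 heads falls just outside it. AO2 — understanding the critical boundary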
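Because p = 0.5 under H₀, each of the 1024 possible toss sequences is equally likely, so the tail probabilities in Q1 can be computed exactly. A short sketch using Python's `fractions.Fraction` (a convenience for exact arithmetic, not something the exam expects) checks both P(X ≥ 8) and P(X ≥ 9):

```python
from fractions import Fraction
from math import comb

N = 10                 # number of tosses
HALF = Fraction(1, 2)  # p under H0, kept exact


def upper_tail(x: int) -> Fraction:
    """Exact P(X >= x) for X ~ B(10, 1/2)."""
    return sum((comb(N, k) * HALF**N for k in range(x, N + 1)), Fraction(0))


for x in (8, 9):
    tail = upper_tail(x)
    # Fraction prints in lowest terms, so 56/1024 appears as 7/128.
    print(x, tail, float(tail), tail <= Fraction(5, 100))
# prints:
# 8 7/128 0.0546875 False
# 9 11/1024 0.0107421875 True
```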

Q2. State the null and alternative hypotheses for each scenario: (a) A manufacturer claims 30% of items are defective; a quality controller believes the rate is lower. (b) A die is suspected of landing on six more than 1/6 of the time.

Foundational
[3 marks]
Method to use Consider SHCCC Framework — identify the parameter p (probability of event) and write H₀ as the baseline claim; H₁ reflects the suspected deviation direction.
M1 (a) p = probability an item is defective. H₀: p = 0.3; H₁: p < 0.3. (One-tailed lower test.) AO1 — writing hypotheses in terms of p
A1 (b) p = probability of landing on six. H₀: p = 1/6; H₁: p > 1/6. (One-tailed upper test.) AO1 — writing hypotheses with correct direction Common mistake: Writing H₁: p ≠ 1/6 when a specific direction is indicated.
A1 In both cases H₀ states the “no change” or “baseline” value of p; H₁ states the direction of the suspected change. AO2 — explaining the purpose of each hypothesis

Q3. Under H₀: X ~ B(20, 0.4). Find P(X ≤ 5).

Foundational
[2 marks]
Method to use Consider P-Value Approach — use the binomial cumulative distribution: P(X ≤ 5) under B(20, 0.4) using tables or a calculator.
M1 P(X ≤ 5) under B(20, 0.4) = Σ (k = 0 to 5) of 20Ck⋅(0.4)ᵏ⋅(0.6)²⁰⁻ᵏ. Use binomial CDF tables or a calculator. AO1 — identifying the required cumulative probability
A1 P(X ≤ 5) ≈ 0.1256 (to 4 d.p.). [Exact: 0.12560…] AO1 — correct probability value
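The summation in the M1 line is what a calculator's binomial cumulative function evaluates. A minimal Python sketch, assuming only the standard library (the name `binom_cdf` is our own):

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ B(n, p), summing the probability function term by term."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


prob = binom_cdf(5, 20, 0.4)
print(round(prob, 4))  # prints 0.1256
```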

Q4. A seed producer claims 70% of seeds germinate. A gardener plants 15 seeds and only 7 germinate. Test at the 10% significance level whether the germination rate is lower than claimed.

Foundational
[6 marks]
Method to use Consider SHCCC Framework — H₀: p = 0.7; H₁: p < 0.7; find P(X ≤ 7) under B(15, 0.7) and compare to 0.10.
M1 H₀: p = 0.7 (claimed rate). H₁: p < 0.7 (gardener believes rate is lower). One-tailed lower test. AO1 — stating hypotheses
M1 Under H₀: X ~ B(15, 0.7). Observed: 7 germinated. P-value = P(X ≤ 7). AO1 — setting up the calculation
A1 P(X ≤ 7) = 0.0500 (from Binomial tables, B(15,0.7)). [Exact: 0.05001…] AO1 — computing the tail probability
A1 0.0500 ≤ 0.10 (significance level). Reject H₀. AO2 — comparison and decision
A1 There is sufficient evidence at the 10% significance level to suggest the germination rate is lower than 70%. AO2 — conclusion in context — must reference germination, not just p Common mistake: Writing “we reject H₀” without a context sentence loses the final mark on every AQA hypothesis testing question.
A1 Interpretation: if the true rate were 70%, the probability of seeing 7 or fewer germinations in 15 would be only about 5%, which is below the 10% threshold ⇒ reject. AO2 — explaining the reasoning in words

Q5. Define: (a) the critical region; (b) the actual significance level; (c) a Type I error.

Foundational
[3 marks]
Method to use Consider Critical Region Approach — these are definitions — recall them precisely using the language the examiner expects.
M1 (a) The critical region is the set of values of the test statistic for which H₀ would be rejected. It lies in the tail(s) of the distribution and is chosen so that, under H₀, its total probability is as close as possible to the significance level without exceeding it. AO1 — defining critical region precisely
A1 (b) The actual significance level is the probability that the test statistic falls in the critical region, given that H₀ is true: P(X in CR | H₀). Because the binomial distribution is discrete, it is usually less than the nominal significance level. AO2 — distinguishing nominal and actual significance level
A1 (c) A Type I error is rejecting H₀ when it is in fact true. Its probability equals the actual significance level. AO2 — correct definition of Type I error

Q6. X ~ B(12, p). Under H₀: p = 0.25 and H₁: p > 0.25. Find the critical region at the 5% significance level, and state the actual significance level.

Foundational
[4 marks]
Method to use Consider Critical Region Approach — find the smallest integer c such that P(X ≥ c) ≤ 0.05 under B(12, 0.25); the critical region is X ≥ c.
M1 Find P(X ≥ c) for B(12, 0.25). P(X ≥ c) = 1 − P(X ≤ c−1). AO1 — setting up the critical region search
M1 Try c=6: P(X≥6) = 1−P(X≤5) = 1−0.9456 = 0.0544 > 0.05. Try c=7: P(X≥7) = 1−0.9857 = 0.0143 ≤ 0.05. AO1 — evaluating at candidate critical values Common mistake: Using P(X ≥ 6) without checking the next value; c=6 does not give a probability ≤ 0.05.
A1 Critical region: X ≥ 7. AO1 — correct critical region
A1 Actual significance level: P(X ≥ 7 | p=0.25) = 0.0143 (1.43%). This is less than 5% because the discrete distribution cannot achieve exactly 5%. AO2 — correct actual significance level with explanation
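The search in the method line (increase c until P(X ≥ c) first drops to 5% or below) can be written as a short loop. A sketch in Python using only `math.comb`; `upper_critical_region` is a hypothetical helper name, not a library function:

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_critical_region(n: int, p0: float, alpha: float):
    """Return (c, actual_level) for the smallest c with P(X >= c) <= alpha.

    P(X >= c) decreases as c increases, so the first c that passes is the
    smallest one. Returns (None, 0.0) if even P(X = n) exceeds alpha.
    """
    for c in range(n + 1):
        tail = sum(binom_pmf(k, n, p0) for k in range(c, n + 1))
        if tail <= alpha:
            return c, tail
    return None, 0.0


c, actual = upper_critical_region(n=12, p0=0.25, alpha=0.05)
print(f"Critical region: X >= {c}; actual significance level = {actual:.4f}")
```

Because it returns the probability of the region it finds, the same call also gives the actual significance level.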
Mid-tier — application and analysis (exam-bulk questions)

Q7. A supermarket claims that 60% of its customers spend more than £50 per visit. In a random sample of 20 customers, 9 spend more than £50. Test at the 5% significance level whether the true proportion is less than 60%.

Mid-tier
[7 marks]
Method to use Consider SHCCC Framework — one-tailed lower test; H₀: p=0.6; H₁: p<0.6; find P(X ≤ 9) under B(20, 0.6).
M1 Let p = probability a customer spends >£50. H₀: p = 0.6. H₁: p < 0.6. One-tailed lower test at 5%. AO1 — stating hypotheses with context definition of p
M1 Under H₀: X ~ B(20, 0.6). Observed x = 9. P-value = P(X ≤ 9). AO1 — identifying test statistic and tail
A1 P(X ≤ 9) = 0.1275 (from B(20, 0.6) tables). AO1 — correct probability
A1 0.1275 > 0.05. Do not reject H₀. AO2 — correct comparison
A1 There is insufficient evidence at the 5% significance level to conclude that the proportion of customers spending more than £50 is less than 60%. AO2 — conclusion in context
A1 Note: 9/20 = 45% is below 60%, but the sample is too small to rule out chance at this significance level. AO2 — interpreting the result for the business context
A1 The p-value of 12.75% means: even if p=0.6 were true, we’d see 9 or fewer about 1 in 8 times by chance. Not rare enough to reject. AO2 — explaining the p-value in words
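The “1 in 8” reading of the p-value can be checked by simulation: generate many samples of 20 customers in a world where H₀ is true and count how often 9 or fewer spend more than £50. A rough Python sketch; the seed and the number of simulated samples are arbitrary choices:

```python
import random
from math import comb

n, p0, observed = 20, 0.6, 9

# Exact p-value: P(X <= 9) for X ~ B(20, 0.6).
exact = sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(observed + 1))

# Simulate samples of 20 customers assuming H0 (p = 0.6) is true,
# and count how often 9 or fewer spend more than £50.
rng = random.Random(2024)  # fixed seed so the run is repeatable
samples = 50_000
count = sum(
    sum(rng.random() < p0 for _ in range(n)) <= observed for _ in range(samples)
)

print(round(exact, 4), count / samples)
```

The simulated proportion should land close to the exact value and will shift slightly with the seed, which is the chance variation the test is guarding against.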

Q8. A two-tailed test is performed at the 10% significance level. The test statistic X ~ B(25, p) under H₀: p = 0.3, H₁: p ≠ 0.3. Find the critical region.

Mid-tier
[6 marks]
Method to use Consider Critical Region Approach — for a two-tailed test, split alpha equally: find the lower CR (P(X ≤ c₁) ≤ 0.05) and upper CR (P(X ≥ c₂) ≤ 0.05).
M1 Two-tailed test, α=10%. Split: 5% in each tail. Under B(25, 0.3). AO2 — two-tailed split strategy
M1 Lower tail: find largest c₁ s.t. P(X ≤ c₁) ≤ 0.05. P(X≤3)=0.0332≤0.05; P(X≤4)=0.0905>0.05. So c₁=3. AO1 — finding lower critical value
A1 Upper tail: find smallest c₂ s.t. P(X ≥ c₂) ≤ 0.05. P(X≥11)=1−P(X≤10)=1−0.9022=0.0978>0.05; P(X≥12)=1−P(X≤11)=1−0.9558=0.0442≤0.05. So c₂=12 (the smallest c with P(X≥c)≤0.05). AO1 — finding upper critical value Common mistake: Using the full 10% in each tail (giving a total Type I error rate of 20%).
A1 Critical region: X ≤ 3 or X ≥ 12. AO1 — correct critical region statement
A1 Actual significance level: P(X≤3)+P(X≥12) = 0.0332+0.0442 = 0.0774 using the 4 d.p. table values (0.0775, about 7.7%, from unrounded probabilities). AO2 — actual significance level is less than 10%
A1 For example, if X=10 were observed: 10 is not in the critical region (3 < 10 < 12), so H₀ would not be rejected. AO2 — applying the test to a concrete observation
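The two-tailed search splits α equally and then works inwards from each end. A minimal Python sketch, assuming equal tails as in the method line; `two_tailed_critical_region` is a name chosen for this illustration:

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_tailed_critical_region(n: int, p0: float, alpha: float):
    """Largest c1 with P(X <= c1) <= alpha/2 and smallest c2 with
    P(X >= c2) <= alpha/2 under B(n, p0). Returns (c1, c2, actual_level).
    c1 = -1 means no lower region; c2 = n + 1 means no upper region."""
    half = alpha / 2
    probs = [binom_pmf(k, n, p0) for k in range(n + 1)]
    # Lower tail: extend while P(X <= c1 + 1) stays within alpha/2.
    c1, lower = -1, 0.0
    while c1 + 1 <= n and lower + probs[c1 + 1] <= half:
        c1 += 1
        lower += probs[c1]
    # Upper tail: extend while P(X >= c2 - 1) stays within alpha/2.
    c2, upper = n + 1, 0.0
    while c2 - 1 >= 0 and upper + probs[c2 - 1] <= half:
        c2 -= 1
        upper += probs[c2]
    return c1, c2, lower + upper


c1, c2, actual = two_tailed_critical_region(n=25, p0=0.3, alpha=0.10)
print(f"Critical region: X <= {c1} or X >= {c2}; actual level = {actual:.4f}")
```

The function can be reused for the other two-tailed questions by changing `n`, `p0` and `alpha`.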

Q9. An author believes that 40% of readers prefer digital books. She surveys 18 readers and finds 4 prefer digital. Perform a hypothesis test at the 5% significance level.

Mid-tier
[7 marks]
Method to use Consider SHCCC Framework — one-tailed lower test (or consider whether a two-tailed test is appropriate based on the question wording); find P(X ≤ 4) under B(18, 0.4).
M1 p = proportion who prefer digital. H₀: p=0.4 (author’s belief). H₁: p<0.4 (survey suggests lower). One-tailed. AO1 — hypotheses with context
M1 X ~ B(18, 0.4) under H₀. Observed x=4. P-value = P(X ≤ 4). AO1 — test setup
A1 P(X ≤ 4) = 0.0942 (from B(18, 0.4) tables or calculator). AO1 — probability value
A1 0.0942 > 0.05. Do not reject H₀. AO2 — comparison
A1 There is insufficient evidence at the 5% level to suggest that fewer than 40% of readers prefer digital books. AO2 — conclusion in context
A1 4/18 ≈ 22%, well below 40%, yet the p-value (0.0942) is still above 5% because the sample is small. AO2 — interpreting the near-boundary result
A1 At the 10% level, P(X≤4)=0.0942 < 0.10, and we would reject H₀. The conclusion is level-sensitive. AO2 — level sensitivity

Q10. X ~ B(30, p). A researcher tests H₀: p = 0.2 against H₁: p > 0.2 at the 1% significance level. She observes X = 12. (a) Find the p-value. (b) State the conclusion.

Mid-tier
[6 marks]
Method to use Consider P-Value Approach — p-value for upper one-tailed test = P(X ≥ 12) under B(30, 0.2); compare to 0.01.
M1 P-value = P(X ≥ 12) under B(30, 0.2) = 1 − P(X ≤ 11). AO1 — computing the p-value for an upper test
A1 P(X ≤ 11) ≈ 0.9905. P-value = 1 − 0.9905 = 0.0095. AO1 — correct p-value
M1 Compare: 0.0095 ≤ 0.01 (significance level). AO2 — comparison
A1 Reject H₀. AO1 — correct decision
A1 There is sufficient evidence at the 1% level to conclude that p > 0.2. AO2 — conclusion in context
A1 Note: the p-value 0.0095 is below both 1% and 5%, so the test rejects H₀ at either level here. Always compare against the level the question specifies. AO2 — level sensitivity explanation

Q11. A hospital claims 35% of patients wait more than 4 hours. An inspector samples 40 patients; 22 waited more than 4 hours. Test at 5% whether the true proportion exceeds 35%.

Mid-tier
[7 marks]
Method to use Consider SHCCC Framework — upper one-tailed test; H₀: p=0.35; H₁: p>0.35; find P(X ≥ 22) under B(40, 0.35).
M1 p = probability a patient waits >4 hours. H₀: p=0.35. H₁: p>0.35. One-tailed upper test, α=5%. AO1 — hypotheses
M1 X ~ B(40, 0.35) under H₀. Observed x=22. P-value = P(X ≥ 22) = 1 − P(X ≤ 21). AO1 — test setup
A1 P(X ≤ 21) ≈ 0.9925. P-value = 1 − 0.9925 = 0.0075. AO1 — p-value
A1 0.0075 ≤ 0.05. Reject H₀. AO2 — comparison and decision
A1 There is sufficient evidence at the 5% level to conclude that the proportion of patients waiting more than 4 hours exceeds 35%. AO2 — conclusion in context — reference to the hospital waiting-time context
A1 22/40=55%≫35%; combined with p-value=0.75%, this is strong evidence against H₀ (it also rejects at the 1% level, since 0.0075 ≤ 0.01). AO2 — contextualising the strength of evidence
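A1 A Type I error here would mean concluding that more than 35% of patients wait over 4 hours when the true proportion is exactly 35%. Its probability is the actual significance level, which depends on the critical region, not on the observed p-value: at 5% the critical region is X ≥ 20 (since P(X ≥ 19) ≈ 0.0699 > 0.05), so P(Type I error) = P(X ≥ 20 | p = 0.35) ≈ 0.0363 (about 3.6%). AO2 — Type I error interpretation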
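The p-value and the probability of a Type I error are easy to blur. A short Python sketch (standard library only; `upper_tail` is our own helper) computes both for Q11: the p-value comes from the observed x = 22, while the Type I error probability comes from the 5% critical region.

```python
from math import comb


def upper_tail(c: int, n: int, p: float) -> float:
    """P(X >= c) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


n, p0, alpha, observed = 40, 0.35, 0.05, 22

# The p-value depends on the data: P(X >= 22) under H0.
p_value = upper_tail(observed, n, p0)

# The Type I error probability depends only on the test: find the critical
# region first (smallest c with P(X >= c) <= alpha), then take its probability.
c = next(c for c in range(n + 1) if upper_tail(c, n, p0) <= alpha)
type_1 = upper_tail(c, n, p0)

print(f"p-value = {p_value:.4f}")
print(f"critical region X >= {c}, P(Type I error) = {type_1:.4f}")
```

It finds the critical region X ≥ 20 with P(Type I error) of roughly 0.036, several times larger than the p-value of about 0.0075.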

Q12. Two-tailed test: X ~ B(15, p). H₀: p=0.5, H₁: p≠0.5, α=10%. (a) Find the critical region. (b) An observed value of X=3 is obtained. State the conclusion.

Mid-tier
[6 marks]
Method to use Consider Critical Region Approach — split α into 5% per tail; find critical values at each end of B(15, 0.5).
M1 (a) B(15,0.5). Lower: P(X≤c₁)≤0.05. P(X≤3)=0.0176≤0.05; P(X≤4)=0.0592>0.05. So c₁=3. AO1 — lower critical value
A1 Upper: by symmetry (p=0.5), c₂=15−3=12. P(X≥12)=P(X≤3)=0.0176≤0.05 ✓. AO2 — using symmetry correctly Common mistake: Not recognising the symmetry of B(n, 0.5) and recomputing from scratch.
A1 Critical region: X ≤ 3 or X ≥ 12. AO1 — correct CR
A1 Actual significance level: 2×0.0176 = 0.0352 (3.52%). AO2 — actual sig level
A1 (b) X=3 falls in the critical region (X≤3). Reject H₀. AO2 — applying the CR to the observation
A1 There is sufficient evidence at the 10% level to conclude that p ≠ 0.5. AO2 — two-tailed conclusion in context

Q13. A biologist claims that, under certain conditions, exactly 25% of tadpoles survive to become frogs. She observes that in a sample of 20, only 1 survives. Test at 5% whether this provides evidence against the claim. State clearly whether you use a one-tailed or two-tailed test and justify your choice.

Mid-tier
[7 marks]
Method to use Consider P-Value Approach — the phrase “against the claim” gives no direction, which suggests a two-tailed test; because the observation lies in the lower tail, a lower one-tailed test is also defensible. Either way, justify your choice.
M1 p = probability a tadpole survives. H₀: p=0.25. H₁: p≠0.25 (two-tailed, as the question asks about evidence “against the claim” without specifying direction). AO2 — choosing and justifying test direction Common mistake: Using a one-tailed test without stating and justifying the direction.
M1 X ~ B(20, 0.25) under H₀. Observed x=1. Lower tail p-value = P(X≤1). AO1 — selecting the relevant tail
A1 P(X≤1) = P(X=0)+P(X=1) = (0.75)²⁰ + 20(0.25)(0.75)¹⁹ ≈ 0.00317+0.02114 = 0.0243. AO1 — computing the probability
A1 Two-tailed p-value = 2×0.0243 = 0.0486 ≤ 0.05. Reject H₀. AO2 — doubling for two-tailed
A1 There is sufficient evidence at the 5% level to conclude that the survival probability differs from 25%. AO2 — conclusion in context with correct two-tailed language
A1 Alternative: use one-tailed at 5% with H₁: p<0.25. P(X≤1)=0.0243<0.05. Reject H₀. Conclusion: evidence the survival rate is less than 25%. (Both approaches acceptable if justified.) AO2 — alternative valid approach
A1 The observation (1 in 20 = 5%) is far below the claimed 25%, consistent with the rejection result. AO2 — contextual plausibility check
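Doubling the one-tail probability and comparing it with α gives the same decision as comparing the single tail with α/2. A small Python check of that equivalence for Q13, using only `math.comb`:

```python
from math import comb

n, p0, alpha, observed = 20, 0.25, 0.05, 1

# P(X <= 1) under H0: X ~ B(20, 0.25); the observation is in the lower tail.
lower_tail = sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(observed + 1))

# Two equivalent decision rules for a two-tailed test:
reject_by_doubling = 2 * lower_tail <= alpha  # compare 2 x tail with alpha
reject_by_halving = lower_tail <= alpha / 2   # compare the tail with alpha / 2

print(round(lower_tail, 4), reject_by_doubling, reject_by_halving)
# prints: 0.0243 True True
```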

Q14. The probability of a randomly selected voter supporting a policy is thought to be 0.6. In a survey of 25 voters, x support the policy. The hypothesis test H₀: p=0.6 vs H₁: p≠0.6 at 5% is performed. Find the critical region and, for any observation in the critical region, state the form of the conclusion.

Mid-tier
[6 marks]
Method to use Consider Critical Region Approach — two-tailed test, 2.5% in each tail, B(25, 0.6).
M1 Two-tailed, 5%. Each tail 2.5%. Under B(25, 0.6). AO2 — two-tailed split
M1 Lower: P(X≤c)≤0.025 under B(25,0.6). P(X≤10)=0.0344>0.025; P(X≤9)=0.0132≤0.025. c₁=9. AO1 — lower critical value
A1 Upper: P(X≥c)≤0.025. P(X≥20)=1−P(X≤19)=1−0.9706=0.0294>0.025. P(X≥21)=1−P(X≤20)=1−0.9905=0.0095≤0.025. c₂=21. AO1 — upper critical value
A1 Critical region: X ≤ 9 or X ≥ 21. AO1 — correct CR
A1 If X is in the CR: “There is sufficient evidence at the 5% significance level to reject the claim that 60% of voters support the policy. The evidence suggests the true proportion differs from 60%.” AO2 — model conclusion language for two-tailed test
A1 Actual significance level: P(X≤9)+P(X≥21)=0.0132+0.0095=0.0227. AO2 — actual significance level

Q15. Explain the difference between a one-tailed and two-tailed hypothesis test, and give an example of a context that would require each type.

Mid-tier
[4 marks]
Method to use Consider SHCCC Framework — the direction of H₁ determines the tail; justify using the research question.
M2 A one-tailed test is used when there is a specific direction of interest in H₁ (e.g. “is the proportion greater than p₀?” or “less than p₀?”). Example: a drug company tests whether a new medicine increases the recovery rate above 40% ⇒ H₁: p > 0.4. All of α is in the upper tail. AO2 — one-tailed explanation with example
A2 A two-tailed test is used when the alternative is simply that p differs from p₀, without specifying direction. Example: a coin is tested for any bias ⇒ H₁: p ≠ 0.5. α is split equally between both tails, so an observation in either direction must be more extreme to be significant than it would in a one-tailed test at the same α. This makes H₀ harder to reject, which is appropriate when the direction of any deviation is not known in advance. AO2 — two-tailed explanation with example and context

Q16. A company claims its customer service resolves 80% of complaints at first contact. An audit of 30 complaints finds 20 resolved. Using the critical region approach, test at the 5% significance level whether the resolution rate has fallen.

Mid-tier
[6 marks]
Method to use Consider Critical Region Approach — one-tailed lower test; find critical region for B(30, 0.8) at the 5% lower tail.
M1 p = resolution probability. H₀: p=0.8. H₁: p<0.8. One-tailed lower test, α=5%. AO1 — hypotheses
M1 Under B(30, 0.8). Find c s.t. P(X≤c)≤0.05. P(X≤19)=0.0256≤0.05; P(X≤20)=0.0611>0.05. Critical region: X≤19. AO1 — finding critical region for B(30,0.8)
A1 Critical region: X ≤ 19. Actual significance level = P(X≤19 | p=0.8) = 0.0256. AO1 — CR and actual sig level
A1 Observed X=20. 20 is NOT in the critical region (20 > 19). Do not reject H₀. AO2 — applying CR correctly
A1 There is insufficient evidence at the 5% level to conclude the resolution rate has fallen below 80%. AO2 — conclusion in context
A1 20/30 ≈ 67%, substantially below 80%, yet the small sample means we cannot rule out chance variation at 5%. AO2 — contextual remark on sample size
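For a lower-tailed test the critical value is the largest c with P(X ≤ c) ≤ α. A Python sketch of the search in Q16, assuming only the standard library (`lower_tail` is a helper name chosen here):

```python
from math import comb


def lower_tail(c: int, n: int, p: float) -> float:
    """P(X <= c) for X ~ B(n, p); 0 when c < 0."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


n, p0, alpha, observed = 30, 0.8, 0.05, 20

# Largest c with P(X <= c) <= alpha; default=-1 means "no critical region".
c = max((c for c in range(n + 1) if lower_tail(c, n, p0) <= alpha), default=-1)

print(f"critical region X <= {c}, actual significance level = {lower_tail(c, n, p0):.4f}")
print("reject H0" if observed <= c else "do not reject H0")
```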
Stretch — discriminators, twist-traps, top-band signals

Q17. A coin is tested for bias. In 50 tosses, 31 heads are obtained. (a) Perform a two-tailed test at the 5% level. (b) Find the critical region. (c) State the probability of a Type I error.

Stretch
[9 marks]
Method to use Consider Critical Region Approach — two-tailed, B(50,0.5); find both tail probabilities; specify the CR and its actual probability.
M1 p = P(head). H₀: p=0.5. H₁: p≠0.5. Two-tailed, 5%. Under B(50,0.5). AO1 — hypotheses
M1 (a) P-value: observed X=31. P(X≥31) under B(50,0.5). P(X≥31) = 1−P(X≤30). AO1 — identifying the tail for the observed value
A1 P(X≤30) = 0.9405 (B(50,0.5) tables). P(X≥31) = 1−0.9405 = 0.0595. AO1 — correct single tail probability
A1 Two-tailed p-value = 2×0.0595 = 0.1190. 0.1190 > 0.05. Do not reject H₀. AO2 — doubling and comparison
A1 There is insufficient evidence at the 5% level that the coin is biased. AO2 — conclusion in context
M1 (b) Critical region: find the largest c₁ with P(X≤c₁)≤0.025 and the smallest c₂ with P(X≥c₂)≤0.025. P(X≤17)=0.0164≤0.025 but P(X≤18)=0.0325>0.025, so c₁=17; by symmetry of B(50, 0.5), c₂=50−17=33 (P(X≥33)=0.0164). AO1 — finding CR
A1 Critical region: X ≤ 17 or X ≥ 33. AO1 — correct CR
A1 (c) P(Type I error) = actual sig level = P(X≤17)+P(X≥33) = 2×0.0164 = 0.0328 (3.28%). AO2 — Type I error probability
A1 X=31 is not in the CR (17<31<33) — consistent with not rejecting H₀. AO2 — linking CR to conclusion

Q18. A random variable X ~ B(n, p). Under H₀: p = 1/3. A one-tailed upper test is performed at 5%. If n = 40, find the critical region and the actual significance level. Comment on whether the test is sensitive to departures from p = 1/3.

Stretch
[7 marks]
Method to use Consider Critical Region Approach — find the smallest c such that P(X ≥ c) ≤ 0.05 under B(40, 1/3); compute the actual significance level.
M1 Under B(40, 1/3). Find the smallest c such that P(X≥c)≤0.05. Mean = 40/3 ≈ 13.3, so the critical value lies well above 13. AO1 — identifying the search region above the mean
A1 P(X≥18) = 1−P(X≤17) ≈ 0.0832 > 0.05. P(X≥19) = 1−P(X≤18) ≈ 0.0441 ≤ 0.05. So c=19. AO1 — correct critical value
A1 Critical region: X ≥ 19. Actual significance level ≈ 0.0441 (4.41%). AO1 — CR and actual sig level
A1 To reject H₀ we need at least 19 successes in 40 trials, an observed proportion of 47.5% when the claimed rate is 33.3%. Only fairly large increases in p are likely to be detected with n = 40. AO2 — sensitivity analysis
A1 For modest departures (e.g. p = 0.45 rather than 1/3), the test may lack power; a larger n would give a more sensitive test. AO3 — extending to test design
A1 The actual significance level of 4.41% is below 5% because the binomial distribution is discrete, so no critical value gives exactly 5%. AO2 — interpreting the discrete effect
A1 If X=22 were observed: 22≥19 ⇒ in CR ⇒ reject H₀. Conclude there is evidence at the 5% level that p > 1/3. AO2 — applying the CR to a concrete example
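The critical value for B(40, 1/3) can be checked by summing the upper tail directly, and the same helper shows how the chance of rejecting H₀ grows as the true p moves away from 1/3 (the power of the test, which goes slightly beyond AS content). A Python sketch; the alternative values 0.40, 0.45 and 0.50 are illustrative choices, not part of the question:

```python
from math import comb


def upper_tail(c: int, n: int, p: float) -> float:
    """P(X >= c) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


n, p0, alpha = 40, 1 / 3, 0.05

# Smallest c with P(X >= c) <= alpha under H0.
c = next(c for c in range(n + 1) if upper_tail(c, n, p0) <= alpha)
print(f"critical region X >= {c}, actual significance level = {upper_tail(c, n, p0):.4f}")

# Probability of landing in the critical region if p is really larger than 1/3.
# These alternative values are illustrative choices, not part of the question.
for p_true in (0.40, 0.45, 0.50):
    print(f"p = {p_true:.2f}: P(reject H0) = {upper_tail(c, n, p_true):.3f}")
```

It reports c = 19 with an actual level of about 0.044; the rejection probabilities under the alternatives show how sensitive a 40-trial test is.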

Q19. A researcher conducts 20 independent hypothesis tests, each at the 5% significance level, when all null hypotheses are true. What is the expected number of Type I errors? What is the probability of at least one Type I error? What does this suggest about multiple testing?

Stretch
[6 marks]
Method to use Consider P-Value Approach — treat each Type I error as a Bernoulli trial with p=0.05; use Binomial for expectation and complement for “at least one”.
M1 Each test has P(Type I error) = 0.05 (actual sig level, assumed equal to nominal here). Let T = number of Type I errors. T ~ B(20, 0.05). AO2 — modelling Type I errors
A1 E(T) = 20 × 0.05 = 1. On average, 1 spurious rejection even when all H₀ are true. AO1 — expected number
A1 P(T ≥ 1) = 1 − P(T=0) = 1 − (0.95)²⁰ ≈ 1 − 0.3585 = 0.6415. AO1 — probability of at least one Type I error
A1 There is a 64% chance of at least one false positive across 20 tests, even when every H₀ is true. AO2 — interpreting the result
A1 This illustrates the multiple testing problem: performing many tests inflates the overall Type I error rate far above the nominal 5%. Corrections (e.g. Bonferroni) are used in practice. AO3 — connecting to wider statistical concepts
A1 AQA AS will not ask for Bonferroni, but explaining the multiple-testing problem in context is the kind of understanding that contextual questions reward. AO2 — exam-relevance note
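Both answers follow from T ~ B(20, 0.05), and a quick simulation makes the 64% figure concrete. A Python sketch assuming the 20 tests are independent, as in the question; the seed and the number of simulated batches are arbitrary:

```python
import random

tests, alpha = 20, 0.05
expected = tests * alpha                   # E(T) for T ~ B(20, 0.05)
p_at_least_one = 1 - (1 - alpha) ** tests  # 1 - P(T = 0)
print(expected, round(p_at_least_one, 4))  # prints: 1.0 0.6415

# Monte Carlo check: in each batch of 20 true-H0 tests, each test
# independently makes a Type I error with probability alpha.
rng = random.Random(1)
trials = 100_000
hits = sum(any(rng.random() < alpha for _ in range(tests)) for _ in range(trials))
print(hits / trials)
```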

Q20. A clinical trial tests whether a new drug increases the success rate of a treatment above the current 45%. Of 60 patients, 36 respond successfully. (a) Perform the test at 1%. (b) Find the minimum number of successes in 60 trials required to reject H₀ at 1%. (c) A critic argues a one-tailed test is inappropriate. Explain why and how a two-tailed test changes the conclusion.

Stretch
[10 marks]
Method to use Consider SHCCC Framework — for (a) find P(X≥36) under B(60, 0.45); for (b) find the critical region at 1%; for (c) discuss tail choice.
M1 (a) p = P(success). H₀: p=0.45. H₁: p>0.45 (drug increases success). One-tailed upper, α=1%. AO1 — hypotheses
M1 X ~ B(60, 0.45). Observed x=36. P-value = P(X≥36). AO1 — test setup
A1 P(X≥36) = 1−P(X≤35) ≈ 1−0.9861 = 0.0139. 0.0139 > 0.01. Do not reject H₀ at 1%. AO1 — p-value and decision
A1 There is insufficient evidence at the 1% level to conclude the drug increases the success rate above 45%. AO2 — conclusion in context
M1 (b) Find c: P(X≥c) ≤ 0.01. P(X≥36)=1−P(X≤35)=1−0.9861=0.0139>0.01; P(X≥37)=1−P(X≤36)=1−0.9931=0.0069≤0.01. Minimum: 37 successes. AO1 — critical value for (b)
M1 (c) The one-tailed test assumes we had prior reason to expect an increase. If we didn’t (or if the drug could harm as well as help), a two-tailed test is more appropriate. AO3 — justifying test direction
A1 Two-tailed at 1%: each tail 0.5%. Two-tailed p-value = 2×0.0139 = 0.0278 > 0.01. Same conclusion (do not reject), but H₀ is even harder to reject with the two-tailed test. AO2 — two-tailed comparison
A1 The choice of tail affects the critical region and p-value. AQA questions usually specify the direction; if they don’t, justify your choice of one or two-tailed explicitly. AO2 — exam strategy note
A1 With the same data, a one-tailed test at 5% would reject H₀ (p-value 0.0139 < 0.05). The significance level is a design choice, fixed before the data are collected, not a mathematical fact. AO3 — deeper insight
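Parts (a) to (c) can be checked together: the one-tailed p-value, the smallest number of successes that rejects at 1%, and how the decision changes with the level and with doubling for a two-tailed test. A Python sketch with a hypothetical helper `upper_tail`:

```python
from math import comb


def upper_tail(c: int, n: int, p: float) -> float:
    """P(X >= c) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


n, p0, observed = 60, 0.45, 36

# (a) one-tailed p-value, H1: p > 0.45
p_value = upper_tail(observed, n, p0)

# (b) smallest number of successes with P(X >= c) <= 0.01
c = next(c for c in range(n + 1) if upper_tail(c, n, p0) <= 0.01)
print(f"p-value = {p_value:.4f}; minimum successes to reject at 1%: {c}")

# (c) decisions at two levels, one-tailed vs two-tailed (doubled p-value)
for alpha in (0.01, 0.05):
    print(f"alpha = {alpha}: one-tailed reject = {p_value <= alpha}, "
          f"two-tailed reject = {2 * p_value <= alpha}")
```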
Gradora: a real sample pack. The full version is personalised per student and launches autumn 2026.