Statistics -- Diagnostic Tests
Unit Tests
UT-1: Averages and Spread from Grouped Data
Question: The grouped frequency table shows the marks of 60 students in a maths test:
| Mark | Frequency |
|---|---|
| 0–20 | 4 |
| 20–40 | 11 |
| 40–60 | 18 |
| 60–80 | 20 |
| 80–100 | 7 |
(a) Calculate an estimate for the mean mark. (b) Identify the modal class. (c) Find the median class and estimate the median. (d) Explain why the mean is only an estimate.
Solution:
(a) Using midpoints: 10, 30, 50, 70, 90.
$\text{Estimated mean} = \frac{10 \times 4 + 30 \times 11 + 50 \times 18 + 70 \times 20 + 90 \times 7}{60} = \frac{3300}{60} = 55$ marks.
(b) Modal class: 60–80, the class with midpoint 70 (highest frequency: 20).
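(c) There are 60 students, so the median is the average of the 30th and 31st values. Cumulative frequencies: 4, 15, 33, 53, 60. The 30th and 31st values fall in the 40–60 class, which covers the 16th to 33rd values.
Interpolating linearly at position $n/2 = 30$: estimated median $= 40 + \frac{30 - 15}{18} \times 20 \approx 56.7$ marks.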
(d) The mean is an estimate because we use the midpoint of each class as a representative value, assuming all values in a class are evenly distributed. In reality, values may be clustered at one end of the class interval, making the midpoint an approximation.
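The grouped-data calculations in this solution can be checked with a short script. This is a minimal sketch, not a definitive method: the class boundaries (0–20 up to 80–100) are an assumption inferred from the midpoints 10, 30, 50, 70, 90, and the median is interpolated at position $n/2$, which is one common convention. The helper name `estimate_median` is hypothetical.

```python
# Grouped-data estimates for UT-1.
# Assumption: class boundaries 0-20, ..., 80-100, inferred from the midpoints.
bounds = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
freqs = [4, 11, 18, 20, 7]

n = sum(freqs)
midpoints = [(lo + hi) / 2 for lo, hi in bounds]

# Estimated mean: each class is represented by its midpoint.
est_mean = sum(m * f for m, f in zip(midpoints, freqs)) / n

# Modal class: the class with the highest frequency.
modal_class = bounds[freqs.index(max(freqs))]


def estimate_median(bounds, freqs):
    """Linear interpolation at position n/2 inside the median class."""
    target = sum(freqs) / 2
    cumulative = 0
    for (lo, hi), f in zip(bounds, freqs):
        if cumulative + f >= target:
            return lo + (target - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("empty frequency table")


est_median = estimate_median(bounds, freqs)
print(est_mean, modal_class, round(est_median, 1))
```

Interpolating at position $(n + 1)/2$ instead would give a slightly larger median estimate; both are approximations for the reason given in part (d).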
UT-2: Cumulative Frequency and Box Plots
Question: The cumulative frequency table shows the heights of 80 students:
| Height (cm) | Cumulative Frequency |
|---|---|
| | 5 |
| | 12 |
| | 28 |
| | 52 |
| | 68 |
| | 76 |
| | 80 |
(a) Draw a cumulative frequency curve. (b) Find the median, lower quartile, and upper quartile. (c) Calculate the interquartile range. (d) A student is 182 cm tall. Is this an outlier? Justify.
Solution:
(b) Median (40th value): From the curve, approximately 163 cm. Lower quartile (20th value): approximately 158 cm. Upper quartile (60th value): approximately 169 cm.
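(c) Interquartile range $= Q_3 - Q_1 = 169 - 158 = 11$ cm.
(d) An outlier is defined as a value more than $1.5 \times \text{IQR}$ below $Q_1$ or above $Q_3$. $1.5 \times 11 = 16.5$. Upper fence $= 169 + 16.5 = 185.5$ cm. Since $182 < 185.5$, 182 cm is not an outlier.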
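The $1.5 \times \text{IQR}$ rule from part (d) can be sketched in a few lines. The function name `outlier_fences` is hypothetical, and the quartiles are the approximate values read from the curve in part (b), so the fences are approximate too.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences: q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles read from the cumulative frequency curve in part (b).
lower, upper = outlier_fences(158, 169)

height = 182
is_outlier = height < lower or height > upper
print(lower, upper, is_outlier)  # 141.5 185.5 False
```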
UT-3: Scatter Graphs and Correlation
Question: A teacher records the number of hours of revision and the exam score for 10 students:
| Hours | 2 | 3 | 5 | 1 | 4 | 6 | 3 | 7 | 5 | 8 |
|---|---|---|---|---|---|---|---|---|---|---|
| Score | 35 | 45 | 55 | 25 | 50 | 70 | 40 | 75 | 60 | 85 |
(a) Plot the data and describe the correlation. (b) Draw a line of best fit and use it to estimate the score for a student who revised for 4.5 hours. (c) Explain why extrapolation beyond 8 hours would be unreliable. (d) Calculate the mean hours and mean score, then estimate the gradient of the line of best fit.
Solution:
(a) The scatter graph shows strong positive correlation — as revision hours increase, exam scores increase consistently.
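(b) The line of best fit passes approximately through (2, 35) and (8, 85). Gradient $= \frac{85 - 35}{8 - 2} = \frac{50}{6} \approx 8.33$ marks per hour. $y$-intercept $= 35 - 8.33 \times 2 \approx 18.3$.
For 4.5 hours: $y \approx 8.33 \times 4.5 + 18.3 \approx 55.8$. Estimated score: 56.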
(c) Extrapolation is unreliable because: (1) The linear trend may not continue beyond the observed data range. (2) There may be diminishing returns to revision (beyond a certain point, more hours may not improve scores proportionally). (3) Other factors may become dominant (fatigue, burnout). (4) There are no data points to validate the trend in that region.
(d) Mean hours $= \frac{44}{10} = 4.4$. Mean score $= \frac{540}{10} = 54$.
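The line of best fit passes through the mean point $(\bar{x}, \bar{y}) = (4.4, 54)$. Using another point near the extremes, e.g., $(8, 85)$: gradient $= \frac{85 - 54}{8 - 4.4} = \frac{31}{3.6} \approx 8.6$ marks per hour.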
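As a cross-check on the line drawn by eye, the sketch below computes the mean point and a least-squares gradient. Least squares is a swapped-in technique here, not the method the solution uses, so its gradient will differ slightly from the hand estimates. The function name `predict` is hypothetical.

```python
hours = [2, 3, 5, 1, 4, 6, 3, 7, 5, 8]
scores = [35, 45, 55, 25, 50, 70, 40, 75, 60, 85]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Least-squares gradient S_xy / S_xx (assumption: used only as a comparison
# with the by-eye line of best fit in the solution).
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
s_xx = sum((x - mean_x) ** 2 for x in hours)
gradient = s_xy / s_xx

# The fitted line passes through the mean point, which fixes the intercept.
intercept = mean_y - gradient * mean_x


def predict(x):
    """Estimated score for x hours of revision (valid only within 1 to 8 hours)."""
    return gradient * x + intercept


print(mean_x, mean_y, round(gradient, 2), round(predict(4.5), 1))
```

The fitted gradient is about 8.4 marks per hour, between the two hand estimates, and the prediction for 4.5 hours is close to the value read from the drawn line.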
Integration Tests
IT-1: Statistical Investigation (with Number)
Question: A company tests the lifetime (in hours) of two brands of lightbulb:
Brand A: 420, 480, 510, 390, 450, 520, 470, 440, 500, 460

Brand B: 380, 410, 360, 400, 430, 370, 420, 390, 440, 350
(a) Calculate the mean, median, and range for each brand. (b) Calculate the interquartile range for each brand. (c) Which brand would you recommend and why? (d) The company claims “Brand A bulbs last on average 50% longer than Brand B.” Is this claim justified? Calculate the percentage difference.
Solution:
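(a) Brand A: Sorted: 390, 420, 440, 450, 460, 470, 480, 500, 510, 520. Mean $= \frac{4640}{10} = 464$ hours. Median $= \frac{460 + 470}{2} = 465$ hours. Range $= 520 - 390 = 130$ hours.
Brand B: Sorted: 350, 360, 370, 380, 390, 400, 410, 420, 430, 440. Mean $= \frac{3950}{10} = 395$ hours. Median $= \frac{390 + 400}{2} = 395$ hours. Range $= 440 - 350 = 90$ hours.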
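(b) Taking the quartiles at positions $n/4 = 2.5$ and $3n/4 = 7.5$ (the mean of the 2nd and 3rd, and of the 7th and 8th, sorted values): Brand A: Q1 = (420+440)/2 = 430, Q3 = (480+500)/2 = 490. IQR = 490 - 430 = 60 hours. Brand B: Q1 = (360+370)/2 = 365, Q3 = (410+420)/2 = 415. IQR = 415 - 365 = 50 hours. Other quartile conventions give different Q1 and Q3 values, but the same IQRs for these data.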
(c) Brand A is recommended because it has a higher mean (464 vs 395) and median (465 vs 395), meaning on average it lasts longer. Brand B has a smaller range (90 vs 130), suggesting more consistency, but the lower average makes it less attractive.
(d) Percentage difference $= \frac{464 - 395}{395} \times 100\% = \frac{69}{395} \times 100\% \approx 17.5\%$.
The claim of “50% longer” is not justified. Brand A lasts approximately 17.5% longer on average, not 50%.
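The summary statistics and the percentage comparison can be reproduced with Python's standard `statistics` module. This is a minimal sketch; the helper name `summarise` is hypothetical.

```python
from statistics import mean, median

brand_a = [420, 480, 510, 390, 450, 520, 470, 440, 500, 460]
brand_b = [380, 410, 360, 400, 430, 370, 420, 390, 440, 350]


def summarise(data):
    """Return (mean, median, range) for a list of lifetimes in hours."""
    return mean(data), median(data), max(data) - min(data)


mean_a, median_a, range_a = summarise(brand_a)
mean_b, median_b, range_b = summarise(brand_b)

# Percentage by which Brand A's mean lifetime exceeds Brand B's.
pct_longer = (mean_a - mean_b) / mean_b * 100

print(mean_a, median_a, range_a)
print(mean_b, median_b, range_b)
print(round(pct_longer, 1))
```

The percentage is taken relative to Brand B's mean because the claim compares Brand A against Brand B.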
IT-2: Histograms and Frequency Density (with Geometry)
Question: The histogram shows the distribution of weights of 50 parcels:
| Weight (kg) | Frequency |
|---|---|
| 0–2 | 8 |
| 2–5 | 15 |
| 5–10 | 18 |
| 10–20 | 9 |
(a) Calculate the frequency density for each class. (b) A parcel weighing 3.5 kg costs 4 to send and a parcel weighing 12 kg costs 12. Assuming a linear pricing model, find the formula for cost $C$ in terms of weight $w$. (c) Calculate the total cost to send all 50 parcels, using the estimated total weight. (d) Calculate the proportion of parcels that cost more than 8.
Solution:
(a) Class widths: 2, 3, 5, 10. Frequency densities: $8/2 = 4$, $15/3 = 5$, $18/5 = 3.6$, $9/10 = 0.9$.
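(b) Linear model: $C = mw + c$. When $w = 3.5$, $C = 4$: $3.5m + c = 4$. When $w = 12$, $C = 12$: $12m + c = 12$. Subtracting: $8.5m = 8$, so $m = 0.941$. Then $c = 4 - 3.5 \times 0.941 = 0.706$.
$C = 0.94w + 0.71$ (to 2 d.p.).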
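(c) Using class midpoints 1, 3.5, 7.5, and 15 kg, estimated total weight $= 1 \times 8 + 3.5 \times 15 + 7.5 \times 18 + 15 \times 9 = 330.5$ kg. Each of the 50 parcels incurs the fixed charge $c$, so total cost $= m \times 330.5 + 50c \approx 0.941 \times 330.5 + 50 \times 0.706 \approx 346$.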
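(d) $C > 8$ when $0.94w + 0.71 > 8$. $0.94w > 7.29$. $w > 7.76$ kg.
Parcels in the 10–20 class: 9 parcels (all weigh more than 7.76 kg). Estimated parcels in the 5–10 class above 7.76 kg, assuming weights are spread evenly across the class: $\frac{10 - 7.76}{5} \times 18 \approx 8.06$ parcels.
Total $\approx 9 + 8.06 = 17.06$ parcels. Proportion $\approx \frac{17.06}{50} \approx 0.34$.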
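Parts (a) and (b) reduce to two small calculations, sketched below. Only the class widths, frequencies, and the two pricing points stated in the solution are used. The helper name `line_through` is hypothetical.

```python
widths = [2, 3, 5, 10]
freqs = [8, 15, 18, 9]

# Frequency density = frequency / class width.
densities = [f / w for f, w in zip(freqs, widths)]


def line_through(p1, p2):
    """Return (gradient, intercept) of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)
    return m, y1 - m * x1


# Pricing points from part (b): (weight in kg, cost).
m, c = line_through((3.5, 4), (12, 12))
print(densities, round(m, 3), round(c, 3))
```

Keeping `m` and `c` unrounded until the final step avoids the small drift that comes from reusing the 2 d.p. formula in later parts.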
IT-3: Probability and Statistics Combined (with Algebra)
Question: A bag contains red and blue counters. Two counters are drawn at random without replacement. The probability of drawing two red counters is $\frac{1}{6}$. The probability of drawing two blue counters is $\frac{1}{3}$. (a) Let $r$ be the number of red counters and $b$ the number of blue counters. Write two equations in $r$ and $b$. (b) Solve the equations to find $r$ and $b$. (c) Calculate the probability of drawing one counter of each colour. (d) If 3 counters are drawn without replacement, calculate the probability that exactly 2 are red.
Solution:
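(a) Total counters $n = r + b$. $P(\text{two red}) = \frac{r(r-1)}{n(n-1)} = \frac{1}{6}$. $P(\text{two blue}) = \frac{b(b-1)}{n(n-1)} = \frac{1}{3}$.
(b) From $P(\text{two red})$: $6r(r-1) = n(n-1)$. From $P(\text{two blue})$: $3b(b-1) = n(n-1)$. So $6r(r-1) = 3b(b-1)$, giving $2r(r-1) = b(b-1)$.
Also: $\frac{1}{6} + \frac{1}{3} + P(\text{one of each}) = 1$, so $P(\text{one of each}) = 1 - \frac{1}{6} - \frac{1}{3} = \frac{1}{2}$.
$P(\text{one of each}) = \frac{2rb}{n(n-1)} = \frac{1}{2}$.
Dividing $\frac{2rb}{n(n-1)} = \frac{1}{2}$ by $\frac{r(r-1)}{n(n-1)} = \frac{1}{6}$: $\frac{2rb}{r(r-1)} = 3$. $\frac{2b}{r-1} = 3$. $2b = 3(r-1)$. $b = \frac{3(r-1)}{2}$.
Substituting into $2r(r-1) = b(b-1)$: $2r(r-1) = \frac{3(r-1)}{2}\left(\frac{3(r-1)}{2} - 1\right)$.
$2r(r-1) = \frac{3(r-1)(3r-5)}{4}$.
$8r(r-1) = 3(r-1)(3r-5)$. $(r-1)(8r - 9r + 15) = 0$. $(r-1)(15 - r) = 0$.
$r = 15$ (since $r = 1$ gives $b = 0$, but then $P(\text{two blue}) = 0$, not $\frac{1}{3}$).
$b = \frac{3 \times 14}{2} = 21$. Total $n = 15 + 21 = 36$.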
Check: $P(\text{two red}) = \frac{15 \times 14}{36 \times 35} = \frac{210}{1260} = \frac{1}{6}$ $\checkmark$. $P(\text{two blue}) = \frac{21 \times 20}{36 \times 35} = \frac{420}{1260} = \frac{1}{3}$ $\checkmark$.
(c) $P(\text{one of each}) = \frac{15 \times 21}{36 \times 35} \times 2 = \frac{630}{1260} = \frac{1}{2}$.
(d) $P(\text{exactly 2 red in 3}) = \frac{\binom{15}{2} \times \binom{21}{1}}{\binom{36}{3}} = \frac{105 \times 21}{7140} = \frac{2205}{7140} = \frac{21}{68} \approx 0.309$.
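The values $r = 15$ and $b = 21$ can be verified with exact arithmetic. The sketch below uses `fractions.Fraction` and `math.comb` from the standard library (`math.comb` needs Python 3.8 or later); the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

r, b = 15, 21
n = r + b

# Two counters drawn without replacement: count unordered pairs.
p_two_red = Fraction(comb(r, 2), comb(n, 2))
p_two_blue = Fraction(comb(b, 2), comb(n, 2))
p_one_each = Fraction(r * b, comb(n, 2))

# Exactly 2 red among 3 drawn: choose 2 of the reds and 1 of the blues.
p_two_red_of_three = Fraction(comb(r, 2) * comb(b, 1), comb(n, 3))

print(p_two_red, p_two_blue, p_one_each, p_two_red_of_three)
# 1/6 1/3 1/2 21/68
```

Counting unordered selections with `comb` gives the same probabilities as multiplying the sequential draw probabilities used in the check above.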
Summary
The key principles covered in this topic are linked in the sub-pages above. Focus on understanding the definitions, applying the formulas or frameworks, and evaluating strengths and limitations of each approach.
Common Pitfalls
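- Confusing terms that look similar but mean different things, such as frequency and frequency density (IT-2), or range and interquartile range (IT-1).
- Overlooking the assumptions behind a result, such as values being spread evenly within a class (UT-1) or a linear trend continuing beyond the observed data (UT-3).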