Statistics and Probability
Higher Statistics
Measures of Central Tendency and Spread
Mean: $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$
The mean is the balance point of the data. It is sensitive to outliers: a single extreme value can shift the mean significantly.
Median: The middle value when the data is sorted. For $n$ data points, if $n$ is odd, the median is the $\frac{n+1}{2}$th value. If $n$ is even, it is the average of the $\frac{n}{2}$th and $\left(\frac{n}{2}+1\right)$th values. The median is robust to outliers.
Mode: The most frequently occurring value. A data set can be unimodal, bimodal, or multimodal.
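Standard Deviation: $s = \sqrt{\dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
The divisor $n-1$ (Bessel’s correction) makes $s^2$ an unbiased estimate of the population variance. With $n$ in the denominator, the sample standard deviation systematically underestimates the population parameter.
For grouped data with frequencies $f_i$: $\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i}$ and $s = \sqrt{\dfrac{\sum f_i (x_i - \bar{x})^2}{\sum f_i - 1}}$
Computational formula (avoids subtracting the mean from each value): $s = \sqrt{\dfrac{1}{n-1}\left(\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}\right)}$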
Example: Calculate the mean and standard deviation of the following data:
| $x$ | 2 | 4 | 6 | 8 | 10 |
|---|---|---|---|---|---|
| $f$ | 3 | 5 | 8 | 4 | 2 |
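One way to check the arithmetic for this example is the short Python sketch below. It assumes the first table row holds the values and the second row holds their frequencies, and it uses the computational formula with the $n-1$ divisor; the variable names are illustrative only.

```python
import math

# Values and frequencies read from the example table (assumed layout:
# first row = values x, second row = frequencies f).
values = [2, 4, 6, 8, 10]
freqs = [3, 5, 8, 4, 2]

n = sum(freqs)                                            # total frequency
sum_fx = sum(f * x for x, f in zip(values, freqs))        # sum of f * x
sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))   # sum of f * x^2

mean = sum_fx / n
# Computational formula with the n - 1 divisor (Bessel's correction).
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
std_dev = math.sqrt(variance)

print(f"n = {n}, mean = {mean:.3f}, s = {std_dev:.3f}")
```

With these figures the sketch gives a mean of about 5.73 and a sample standard deviation of about 2.33.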
Transformation properties of the mean and standard deviation. If $y = ax + b$, where $a$ and $b$ are constants: $\bar{y} = a\bar{x} + b$ and $s_y = |a|\,s_x$.
This is useful when data needs to be converted between units (e.g., Celsius to Fahrenheit).
Interquartile Range and Box Plots
The interquartile range (IQR) is $Q_3 - Q_1$, where $Q_1$ is the 25th percentile and $Q_3$ is the 75th percentile.
Box plots (box-and-whisker diagrams) display five key statistics: minimum, $Q_1$, median, $Q_3$, and maximum. The box spans from $Q_1$ to $Q_3$ with the median marked inside. Whiskers extend to the most extreme data points within $1.5 \times \mathrm{IQR}$ of the quartiles.
Outlier detection: A value is a potential outlier if it falls below $Q_1 - 1.5 \times \mathrm{IQR}$ or above $Q_3 + 1.5 \times \mathrm{IQR}$.
Example: Data set: .
. Median .
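Lower half: $Q_1 = 7$.
Upper half: $Q_3 = 18$.
$\mathrm{IQR} = 18 - 7 = 11$.
Upper fence: $Q_3 + 1.5 \times \mathrm{IQR} = 18 + 16.5 = 34.5$.
Since the largest value in the data set exceeds 34.5, it is an outlier.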
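The fence calculation can be sketched in Python as below. The quartiles are found with the median-of-halves method used in this section. The data values are made up for illustration (chosen so that the quartiles come out as 7 and 18); they are not the original data set, and the function names are hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]               # values below the median position
    upper = xs[half + (n % 2):]     # skip the middle value when n is odd
    return median(lower), median(xs), median(upper)

def outliers(data):
    """Return the values lying outside the 1.5 * IQR fences."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical data, not taken from the text.
sample = [3, 5, 7, 8, 12, 13, 14, 18, 21, 40]
print(quartiles(sample))
print(outliers(sample))
```

Other quartile conventions (for example, interpolated percentiles) can give slightly different fences for the same data.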
Probability
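Basic Rules: $0 \le P(A) \le 1$, $P(A') = 1 - P(A)$, and $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
For mutually exclusive events: $P(A \cap B) = 0$, so $P(A \cup B) = P(A) + P(B)$.
For independent events: $P(A \cap B) = P(A) \times P(B)$.
Conditional Probability: $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$
Bayes’ Theorem: $P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$
This theorem is foundational in statistics, machine learning, and medical testing. It allows you to “invert” conditional probabilities: if you know $P(B \mid A)$ but need $P(A \mid B)$, Bayes’ theorem provides the bridge.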
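Example: In a school, 60% of students study Maths, 40% study Physics, and 25% study both. A student is chosen at random.
Let $A$ = studies Maths and $B$ = studies Physics, so $P(A) = 0.6$, $P(B) = 0.4$, and $P(A \cap B) = 0.25$.
(a) What is the probability they study at least one of Maths or Physics?
$P(A \cup B) = 0.6 + 0.4 - 0.25 = 0.75$
(b) Given that a student studies Physics, what is the probability they also study Maths?
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)} = \dfrac{0.25}{0.4} = 0.625$
(c) Are the events “studies Maths” and “studies Physics” independent?
$P(A) \times P(B) = 0.6 \times 0.4 = 0.24$
But $P(A \cap B) = 0.25$. Since $0.25 \neq 0.24$, the events are not independent.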
Example: A medical test has a 95% true positive rate ($P(\mathrm{positive} \mid \mathrm{disease}) = 0.95$) and a 2% false positive rate ($P(\mathrm{positive} \mid \mathrm{no\ disease}) = 0.02$). If 1% of the population has the condition, find $P(\mathrm{disease} \mid \mathrm{positive})$.
Let $D$ = has disease, $T$ = tests positive.
$P(D \mid T) = \dfrac{P(T \mid D)\,P(D)}{P(T \mid D)\,P(D) + P(T \mid D')\,P(D')} = \dfrac{0.95 \times 0.01}{0.95 \times 0.01 + 0.02 \times 0.99} = \dfrac{0.0095}{0.0293} \approx 0.324$
Despite a 95% true positive rate, only about 32% of positive tests indicate actual disease, due to the low prevalence (base rate fallacy).
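To see how strongly the answer depends on the base rate, the calculation can be wrapped in a small Python function. This is a sketch only: the function name `posterior` and the extra prevalence values of 5% and 20% are assumptions added for illustration, not part of the example.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive) from Bayes' theorem."""
    true_pos = sensitivity * prior                  # P(T | D) * P(D)
    false_pos = false_positive_rate * (1 - prior)   # P(T | D') * P(D')
    return true_pos / (true_pos + false_pos)

# 1% is the prevalence from the example; 5% and 20% are illustrative.
for prevalence in (0.01, 0.05, 0.20):
    p = posterior(prevalence, sensitivity=0.95, false_positive_rate=0.02)
    print(f"prevalence {prevalence:.0%}: P(disease | positive) = {p:.3f}")
```

The same test becomes far more informative as the prevalence rises, which is the point of the base rate warning.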
Probability Trees
A probability tree diagram is useful for multi-stage experiments. Each branch represents a possible outcome at each stage, with probabilities labelled on the branches.
Rules for probability trees:
- The probabilities on branches from a single node sum to 1.
- To find the probability of a path through the tree, multiply the probabilities along the path.
- To find the probability of an event, add the probabilities of all paths leading to that event.
Example: A bag contains 3 red and 5 blue balls. Two balls are drawn without replacement. Find the probability that both are red.
First draw: $P(\mathrm{red}) = \frac{3}{8}$.
Second draw (given first was red): $P(\mathrm{red}) = \frac{2}{7}$.
$P(\mathrm{both\ red}) = \frac{3}{8} \times \frac{2}{7} = \frac{6}{56} = \frac{3}{28}$
Binomial Distribution
A binomial distribution $X \sim B(n, p)$ describes the number of successes in $n$ independent trials, each with probability of success $p$: $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$ for $k = 0, 1, \ldots, n$.
Conditions for a binomial distribution:
- Fixed number of trials
- Each trial has exactly two outcomes (success/failure)
- Trials are independent
- Probability of success is constant across trials
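Mean: $E(X) = np$
Variance: $\mathrm{Var}(X) = np(1-p)$
Proof of $E(X) = np$. Let $X_i$ be the indicator variable for the $i$th trial: $X_i = 1$ if success, $X_i = 0$ if failure. Then $X = \sum_{i=1}^{n} X_i$ and $E(X_i) = p$, so by linearity of expectation $E(X) = \sum_{i=1}^{n} E(X_i) = np$.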
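Example: A fair die is rolled 8 times. Find the probability of getting exactly 3 sixes.
$X \sim B\left(8, \frac{1}{6}\right)$, so $P(X = 3) = \binom{8}{3}\left(\frac{1}{6}\right)^3\left(\frac{5}{6}\right)^5 \approx 0.104$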
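Binomial probabilities like this one can be checked with a few lines of Python. The sketch below uses only the standard library; the helper names `binomial_pmf` and `binomial_cdf` are hypothetical, not from any particular package.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p), summing the individual probabilities."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

# Fair die rolled 8 times, exactly 3 sixes.
p_three_sixes = binomial_pmf(3, 8, 1 / 6)
print(f"P(X = 3) = {p_three_sixes:.4f}")

# Cumulative probabilities are just sums of pmf terms.
print(f"P(X <= 3) = {binomial_cdf(3, 8, 1 / 6):.4f}")
```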
Example: . Find .
This is most easily evaluated using a calculator or statistical tables.
Normal Distribution
A continuous random variable $X$ has a normal distribution, $X \sim N(\mu, \sigma^2)$, if its probability density function is: $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Why the normal distribution is ubiquitous. The Central Limit Theorem states that the sum (or average) of a large number of independent, identically distributed random variables with finite variance is approximately normally distributed, regardless of the original distribution. This is why measurements of natural phenomena (heights, blood pressure, measurement errors) tend to be normally distributed.
Standard Normal: $Z = \dfrac{X - \mu}{\sigma}$, where $Z \sim N(0, 1)$.
Properties:
- The curve is symmetric about $x = \mu$
- Approximately 68% of data lies within one standard deviation of the mean
- Approximately 95% within two standard deviations
- Approximately 99.7% within three standard deviations
Key standard normal values:
| $z$ | $P(Z \le z)$ |
|---|---|
| 1.00 | 0.8413 |
| 1.645 | 0.9500 |
| 1.96 | 0.9750 |
| 2.00 | 0.9772 |
| 2.326 | 0.9900 |
| 2.576 | 0.9950 |
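Example: The heights of Scottish men are normally distributed with mean 175 cm and standard deviation 8 cm. Find the probability that a randomly chosen man is taller than 185 cm.
$P(X > 185) = P\left(Z > \dfrac{185 - 175}{8}\right) = P(Z > 1.25)$
From standard normal tables, $P(Z < 1.25) = 0.8944$, so $P(X > 185) = 1 - 0.8944 = 0.1056$.
Example: Exam scores are normally distributed with mean 60 and standard deviation 12. The top 10% of candidates receive an A grade. Find the minimum score for an A grade.
We need $x$ such that $P(X > x) = 0.10$, i.e., $P(Z < z) = 0.90$.
From tables, $z \approx 1.282$.
$x = 60 + 1.282 \times 12 \approx 75.4$
The minimum A grade score is approximately 75.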
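Both normal calculations above can be reproduced without tables using `statistics.NormalDist` from the Python standard library (available from Python 3.8). The variable names are illustrative.

```python
from statistics import NormalDist

# Heights example: P(X > 185) for X ~ N(175, 8^2).
heights = NormalDist(mu=175, sigma=8)
p_taller = 1 - heights.cdf(185)
print(f"P(X > 185) = {p_taller:.4f}")

# Exam example: the score exceeded by only the top 10% of candidates,
# i.e. the 90th percentile of N(60, 12^2).
scores = NormalDist(mu=60, sigma=12)
cutoff = scores.inv_cdf(0.90)
print(f"Minimum A grade score = {cutoff:.1f}")
```

Here `cdf` plays the role of the table lookup $P(Z < z)$ and `inv_cdf` the reverse lookup.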
Normal Approximation to the Binomial
When $n$ is large and $p$ is not too close to 0 or 1, $X \sim B(n, p)$ can be approximated by $Y \sim N(np,\, np(1-p))$.
Rule of thumb: Use the normal approximation when $np$ and $n(1-p)$ are both greater than about 5.
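Continuity correction: Since the binomial is discrete and the normal is continuous, apply a continuity correction: $P(X \le k) \approx P(Y < k + 0.5)$ and $P(X \ge k) \approx P(Y > k - 0.5)$, where $Y \sim N(np,\, np(1-p))$.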
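The effect of the continuity correction can be seen by comparing the approximation with the exact binomial probability. The sketch below uses hypothetical parameters ($n = 50$, $p = 0.4$, $k = 18$) chosen only for illustration, and the helper names are assumptions.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p, continuity=True):
    """Normal approximation to P(X <= k), optionally with the +0.5 correction."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    x = k + 0.5 if continuity else k
    return NormalDist(mu, sigma).cdf(x)

# Hypothetical parameters, not taken from the text.
n, p, k = 50, 0.4, 18
exact = binomial_cdf(k, n, p)
with_cc = normal_approx_cdf(k, n, p)
without_cc = normal_approx_cdf(k, n, p, continuity=False)

print(f"exact                 : {exact:.4f}")
print(f"normal, with +0.5     : {with_cc:.4f}")
print(f"normal, no correction : {without_cc:.4f}")
```

For these values the corrected approximation lands much closer to the exact probability than the uncorrected one.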
Example: . Approximate .
, .
Example: . Use the normal approximation with continuity correction to Estimate .
, .
With continuity correction:
Correlation and Regression
Product Moment Correlation Coefficient (PMCC): $r = \dfrac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}$, where $S_{xy} = \sum x_i y_i - n\bar{x}\bar{y}$, $S_{xx} = \sum x_i^2 - n\bar{x}^2$, and $S_{yy} = \sum y_i^2 - n\bar{y}^2$.
Properties:
- $r = 1$: perfect positive linear correlation
- $r = -1$: perfect negative linear correlation
- $r = 0$: no linear correlation (but there may be a non-linear relationship)
Important caveat: Correlation does not imply causation. Two variables may be strongly correlated because they are both driven by a third variable (confounding variable).
Least Squares Regression Line: $y = a + bx$
where $b = \dfrac{S_{xy}}{S_{xx}}$ and $a = \bar{y} - b\bar{x}$.
Interpretation: The slope $b$ tells you the expected change in $y$ for a one-unit increase in $x$.
Extrapolation warning: The regression line is reliable only within the range of the observed data. Predicting outside this range (extrapolation) is unreliable because the linear relationship may not hold.
Example: Calculate the PMCC for the following data:
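| $x$ | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| $y$ | 2 | 5 | 6 | 9 | 11 |
$\bar{x} = 3$, $\bar{y} = 6.6$.
$S_{xy} = \sum xy - n\bar{x}\bar{y} = 121 - 5(3)(6.6) = 22$.
$S_{xx} = \sum x^2 - n\bar{x}^2 = 55 - 5(3)^2 = 10$.
$S_{yy} = \sum y^2 - n\bar{y}^2 = 267 - 5(6.6)^2 = 49.2$.
$r = \dfrac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \dfrac{22}{\sqrt{10 \times 49.2}} \approx 0.992$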
There is a very strong positive linear correlation.
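The summary statistics in this example can be verified with a short Python sketch. It assumes the first table row is $x$ and the second is $y$; the variable names are illustrative.

```python
from math import sqrt

# Data from the example table (assumed: first row x, second row y).
xs = [1, 2, 3, 4, 5]
ys = [2, 5, 6, 9, 11]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Summary statistics in the "sum minus n * mean * mean" form.
s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
s_xx = sum(x * x for x in xs) - n * x_bar ** 2
s_yy = sum(y * y for y in ys) - n * y_bar ** 2

r = s_xy / sqrt(s_xx * s_yy)
print(f"S_xy = {s_xy:.1f}, S_xx = {s_xx:.1f}, S_yy = {s_yy:.1f}, r = {r:.3f}")
```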
Advanced Higher Statistics
Probability Distributions
Expected Value and Variance: $E(X) = \sum x_i\, P(X = x_i)$
$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$
Properties of expectation:
- $E(X + Y) = E(X) + E(Y)$ (always, even if $X$ and $Y$ are dependent)
- If $X$ and $Y$ are independent: $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$
- If $X$ and $Y$ are dependent: $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$
Example: A biased coin lands on heads with probability $p$. Let $X$ be the number of heads in 3 tosses.
$X \sim B(3, p)$.
$E(X) = 3p, \quad \mathrm{Var}(X) = 3p(1-p)$
Discrete Probability Distributions
Uniform distribution: $P(X = x_i) = \dfrac{1}{n}$ for $n$ equally likely outcomes.
$E(X) = \frac{1}{n}\sum x_i, \quad \mathrm{Var}(X) = \frac{1}{n}\sum x_i^2 - \left(\frac{1}{n}\sum x_i\right)^2$
Example: A fair die is rolled. Find $E(X)$ and $\mathrm{Var}(X)$.
$E(X) = \dfrac{1+2+3+4+5+6}{6} = \dfrac{7}{2}, \quad E(X^2) = \dfrac{1+4+9+16+25+36}{6} = \dfrac{91}{6}$
$\mathrm{Var}(X) = \frac{91}{6} - \frac{49}{4} = \frac{182 - 147}{12} = \frac{35}{12} \approx 2.917$
The Geometric Distribution
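The geometric distribution models the number of trials until the first success in a sequence of independent Bernoulli trials with success probability $p$: $P(X = k) = (1-p)^{k-1}p$ for $k = 1, 2, 3, \ldots$
Mean: $E(X) = \dfrac{1}{p}$
Variance: $\mathrm{Var}(X) = \dfrac{1-p}{p^2}$
Proof of $E(X) = \dfrac{1}{p}$.
$E(X) = \sum_{k=1}^{\infty} k(1-p)^{k-1}p$
Using the identity $\sum_{k=1}^{\infty} k r^{k-1} = \dfrac{1}{(1-r)^2}$ for $|r| < 1$ with $r = 1 - p$:
$E(X) = \dfrac{p}{\left(1 - (1-p)\right)^2} = \dfrac{p}{p^2} = \dfrac{1}{p}$
Example: A die is rolled until a 6 appears. Find the probability that more than 4 rolls are needed.
$P(X > 4) = 1 - P(X \le 4) = 1 - \sum_{k=1}^{4}\left(\frac{5}{6}\right)^{k-1}\frac{1}{6} = \frac{625}{1296} \approx 0.482$
Alternatively: $P(X > 4) = \left(\frac{5}{6}\right)^4$. This is because $X > 4$ means the first 4 rolls are all failures.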
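Both routes to $P(X > 4)$, and the mean $1/p$, can be checked numerically. The sketch below is illustrative: the function names are hypothetical, and the infinite sum for the mean is truncated at 2000 terms, which is an approximation.

```python
def geometric_pmf(k, p):
    """P(X = k): first success on trial k."""
    return (1 - p) ** (k - 1) * p

def geometric_tail(k, p):
    """P(X > k): the first k trials are all failures."""
    return (1 - p) ** k

p = 1 / 6  # probability of rolling a 6

tail_direct = geometric_tail(4, p)
tail_by_sum = 1 - sum(geometric_pmf(k, p) for k in range(1, 5))
print(f"P(X > 4) directly    : {tail_direct:.4f}")
print(f"P(X > 4) by 1 - sum  : {tail_by_sum:.4f}")

# Truncated version of the series for E(X); the tail beyond 2000 terms
# is negligible for p = 1/6.
approx_mean = sum(k * geometric_pmf(k, p) for k in range(1, 2000))
print(f"E(X) is approximately {approx_mean:.4f} (theory: {1 / p:.4f})")
```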
The Poisson Distribution (Introduction)
The Poisson distribution models the number of events occurring in a fixed interval of time or space, given that events occur independently at a constant average rate $\lambda$: $P(X = k) = \dfrac{e^{-\lambda}\lambda^k}{k!}$ for $k = 0, 1, 2, \ldots$
Mean: $E(X) = \lambda$
Variance: $\mathrm{Var}(X) = \lambda$
The Poisson distribution is a limiting case of the binomial when $n \to \infty$ and $p \to 0$ with $np = \lambda$ fixed.
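The limiting relationship can be illustrated numerically: holding $np = \lambda$ fixed while $n$ grows, the binomial probability approaches the Poisson one. In this sketch $\lambda = 4$ and $k = 6$ are illustrative values and the function names are hypothetical.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return exp(-lam) * lam ** k / factorial(k)

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

lam, k = 4, 6  # illustrative rate and count

# Keep n * p = lam fixed while n increases.
for n in (10, 100, 1000, 10000):
    print(f"n = {n:>5}: binomial P(X = {k}) = {binomial_pmf(k, n, lam / n):.5f}")

print(f"Poisson  P(X = {k}) = {poisson_pmf(k, lam):.5f}")
```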
Example: A call centre receives an average of 4 calls per minute. Find the probability of receiving exactly 6 calls in a given minute.
$X \sim \mathrm{Po}(4)$, so $P(X = 6) = \dfrac{e^{-4}\,4^6}{6!} \approx 0.104$
Hypothesis Testing
Steps:
- State the null hypothesis $H_0$ and alternative hypothesis $H_1$
- State the significance level $\alpha$
- Calculate the test statistic
- Determine the critical region or $p$-value
- Make a decision and state the conclusion in context
Type I and Type II errors:
- Type I error: Rejecting $H_0$ when it is true (false positive). Probability = $\alpha$.
- Type II error: Failing to reject $H_0$ when it is false (false negative). Probability = $\beta$.
- The power of a test is $1 - \beta$: the probability of correctly rejecting a false $H_0$.
Trade-off between errors. Decreasing $\alpha$ (making the test more conservative) increases $\beta$ (making it harder to detect a real effect). For a given test and effect size, the only way to decrease both simultaneously is to increase the sample size.
Example: A manufacturer claims that the mean weight of bags of sugar is 500 g. A sample of 16 bags has mean weight 497 g with standard deviation 5 g. Test at the 5% significance level whether the mean weight differs from 500 g.
$H_0: \mu = 500$, $H_1: \mu \neq 500$.
Test statistic: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{497 - 500}{5/\sqrt{16}} = \dfrac{-3}{1.25} = -2.4$.
Critical values for the $t$-distribution with 15 degrees of freedom at 5% (two-tailed): approximately $\pm 2.131$.
Since $|-2.4| = 2.4 > 2.131$, we reject $H_0$. There is sufficient evidence to suggest the mean weight differs from 500 g.
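The test statistic and decision for this example can be sketched in Python as follows. The critical value is hard-coded from $t$-tables (two-tailed, 5%, 15 degrees of freedom) rather than computed, and the function and constant names are hypothetical.

```python
from math import sqrt

def one_sample_t(sample_mean, mu0, sample_sd, n):
    """t statistic for a one-sample test of H0: mu = mu0."""
    return (sample_mean - mu0) / (sample_sd / sqrt(n))

t_stat = one_sample_t(sample_mean=497, mu0=500, sample_sd=5, n=16)

# Table value for 15 degrees of freedom, 5% two-tailed (assumed from tables).
T_CRIT_TWO_TAILED_15DF = 2.131

reject = abs(t_stat) > T_CRIT_TWO_TAILED_15DF
print(f"t = {t_stat:.3f}, reject H0: {reject}")
```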
Chi-Squared Test
Used to test for association between categorical variables.
$\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i}$
where $O_i$ is the observed frequency and $E_i$ is the expected frequency.
Conditions: All expected frequencies should be at least 5. If any $E_i < 5$, combine categories.
Example: A survey investigates whether there is an association between gender and preferred subject among 200 students:
| | Maths | Science | English | Total |
|---|---|---|---|---|
| Male | 40 | 35 | 15 | 90 |
| Female | 30 | 25 | 55 | 110 |
| Total | 70 | 60 | 70 | 200 |
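$H_0$: there is no association between gender and preferred subject. $H_1$: there is an association.
Expected frequencies: $E_{ij} = \dfrac{\mathrm{row\ total} \times \mathrm{column\ total}}{\mathrm{grand\ total}}$. For the male row: $31.5$, $27$, $31.5$. For the female row: $38.5$, $33$, $38.5$.
$\chi^2 = \dfrac{(40 - 31.5)^2}{31.5} + \dfrac{(35 - 27)^2}{27} + \dfrac{(15 - 31.5)^2}{31.5} + \dfrac{(30 - 38.5)^2}{38.5} + \dfrac{(25 - 33)^2}{33} + \dfrac{(55 - 38.5)^2}{38.5} \approx 24.19$
Degrees of freedom: $(2 - 1)(3 - 1) = 2$.
Critical value at 5% for 2 df: 5.991.
Since $24.19 > 5.991$, we reject $H_0$. There is significant evidence of an association.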
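The expected frequencies and the test statistic for this table can be recomputed with plain Python. This is a sketch using only the observed counts from the table; the variable names are illustrative and the critical value is taken from the text rather than computed.

```python
# Observed counts: rows = Male, Female; columns = Maths, Science, English.
observed = [
    [40, 35, 15],
    [30, 25, 55],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E_ij = row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_squared = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

CRITICAL_5_PERCENT_2DF = 5.991  # table value quoted in the text
print(f"expected = {expected}")
print(f"chi-squared = {chi_squared:.2f}, df = {dof}")
print(f"reject H0: {chi_squared > CRITICAL_5_PERCENT_2DF}")
```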
One-Tailed vs Two-Tailed Tests
- Two-tailed test: $H_1$ specifies that the parameter differs from the hypothesised value $\mu_0$ in either direction (e.g., $H_1: \mu \neq \mu_0$). The significance level is split between both tails: $\alpha/2$ in each.
- One-tailed test: $H_1$ specifies a direction (e.g., $H_1: \mu > \mu_0$ or $H_1: \mu < \mu_0$). The entire significance level is in one tail, making the test more powerful for detecting an effect in that direction.
Example: A machine is supposed to fill bottles with 500 ml. A sample of 20 bottles has mean 498 ml with standard deviation 4 ml. Test at the 5% level whether the machine is underfilling.
$H_0: \mu = 500$, $H_1: \mu < 500$ (one-tailed).
$t = \dfrac{498 - 500}{4/\sqrt{20}} \approx -2.236$.
Critical value for the $t$-distribution with 19 df at 5% (one-tailed): approximately $-1.729$.
Since $-2.236 < -1.729$, we reject $H_0$. There is sufficient evidence that the machine is underfilling.
Note: If this were a two-tailed test, the critical values would be approximately $\pm 2.093$ and $|-2.236| > 2.093$, so we would still reject $H_0$ in this case. However, the one-tailed test has a lower threshold, making it easier to detect a difference in the specified direction.
Coefficient of Determination
The coefficient of determination $R^2$ represents the proportion of the variance in $y$ that is explained by the linear relationship with $x$.
$R^2 = 1 - \dfrac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}$
where $\hat{y}_i$ are the predicted values from the regression line. For simple linear regression, $R^2 = r^2$.
Interpretation: If $R^2 = 0.85$, then 85% of the variation in $y$ is accounted for by the linear regression on $x$. The remaining 15% is due to other factors or random variation.
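As a rough illustration, the sketch below fits a least squares line and computes $R^2$ from the residual and total sums of squares. The data values are hypothetical (made up for this example) and the function names are assumptions.

```python
def fit_line(xs, ys):
    """Return intercept a and slope b of the least squares line y = a + b x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

def r_squared(xs, ys):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    a, b = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical measurements, not taken from the text.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
print(f"R^2 = {r_squared(xs, ys):.4f}")
```

A value close to 1 here reflects that the made-up points lie almost exactly on a straight line.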
Worked Examples
See the examples integrated throughout the sections above.
Common Pitfalls
- Using $n$ instead of $n-1$ for sample standard deviation: The sample standard deviation uses $n-1$ (Bessel’s correction) in the denominator. Using $n$ gives a biased estimate that systematically underestimates the population standard deviation.
- Confusing $P(A \mid B)$ with $P(B \mid A)$: These are not the same. A classic example: $P(\mathrm{disease} \mid \mathrm{positive\ test})$ is much lower than $P(\mathrm{positive\ test} \mid \mathrm{disease})$ because the base rate of the disease matters. Use Bayes’ theorem if needed.
- Forgetting continuity correction: When approximating a discrete distribution (e.g., binomial) with a continuous one (normal), apply a continuity correction. Without it, your approximation can be significantly off.
- Incorrect expected frequencies: In a chi-squared test, expected frequencies must be calculated correctly using row and column totals. Each $E_{ij}$ is the product of its row total and column total, divided by the grand total.
- Not stating hypotheses: Always explicitly state $H_0$ and $H_1$ before performing a hypothesis test. The conclusion must be stated in the context of the problem.
- Interpreting correlation as causation: A strong correlation between $x$ and $y$ does not mean $x$ causes $y$. There may be a confounding variable, or the relationship may be spurious.
- Extrapolating beyond the data range: The regression line is only valid within the range of observed data. Predicting outside this range is unreliable.
- Assuming normality without justification: The normal approximation to the binomial requires both $np$ and $n(1-p)$ to be large enough (a common threshold is 5). For small $n$ or extreme $p$, use the exact binomial distribution.
- Confusing one-tailed and two-tailed tests: A two-tailed test has a critical region split between both tails. The significance level is shared between the two tails, so each tail has $\alpha/2$. Using a two-tailed test when a one-tailed test is appropriate reduces the power of the test.
Practice Questions
1. The weights of packets of crisps are normally distributed with mean 35 g and standard deviation 1.5 g. Find the probability that a randomly chosen packet weighs between 33 g and 37 g.
2. . Find .
3. A company tests whether a new drug reduces blood pressure. In a sample of 25 patients, the mean reduction was 4.2 mmHg with standard deviation 3.1 mmHg. Test at the 1% significance level whether the drug reduces blood pressure (one-tailed test).
4. Calculate the PMCC for the following data and interpret the result:
| $x$ | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| $y$ | 2 | 5 | 6 | 9 | 11 |
5. In a group of 100 students, 55 study Maths, 40 study Chemistry, and 20 study both. A student is selected at random. Find the probability that they study neither subject.
6. A die is rolled until a 6 appears. Find the probability that more than 4 rolls are needed.
7. For the data in question 4, find the equation of the least squares regression line of $y$ on $x$ and use it to predict $y$ when .
8. A survey of 300 people classified by age group and voting preference gives with 4 degrees of freedom. Test at the 5% significance level whether there is an association between age and voting preference.
9. A machine produces bolts with mean length 10 cm and standard deviation 0.1 cm. Find the probability that a randomly selected bolt is between 9.85 cm and 10.15 cm.
10. . Use the normal approximation with continuity correction to estimate .
11. A medical test has a 95% true positive rate and a 2% false positive rate. If 1% of the population has the condition, find the probability that a person who tests positive actually has the condition.
12. Explain the difference between a Type I error and a Type II error in the context of a hypothesis test.
13. Find $E(X)$ and $\mathrm{Var}(X)$ for the probability distribution:
| $x$ | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| $P(X = x)$ | 0.1 | 0.3 | 0.4 | 0.2 |
14. Two dice are rolled. Let $X$ be the sum. Find $E(X)$ and $\mathrm{Var}(X)$.
15. A bag contains 4 red and 6 blue balls. Three balls are drawn without replacement. Find the probability that exactly two are red.
16. Heights are normally distributed with mean 170 cm and standard deviation 10 cm. Find the height that only 5% of people exceed.
Summary
This topic covers the mathematical techniques and concepts related to statistics and probability, including key theorems, methods, and problem-solving approaches.
Key concepts include:
- measures of central tendency and spread
- probability rules, conditional probability, and Bayes’ theorem
- probability distributions (binomial, normal, geometric, Poisson)
- hypothesis testing, including chi-squared tests
- correlation and regression
Regular practice with a variety of question types is essential to build fluency and confidence in applying these mathematical techniques.