The Null Hypothesis


Hypothesis testing is a method used to determine whether there is enough evidence in a sample of data to infer that a certain condition holds true for the entire population. It starts with a null hypothesis (H₀), which assumes no effect or difference, and uses a p-value to measure how likely the observed data would occur if H₀ were true. If the p-value is below a chosen threshold (like 0.05), the null hypothesis is rejected, suggesting that the observed effect is statistically significant.

Contents:
  • Null Hypothesis vs. Alternative Hypothesis
  • What a P-Value Is
  • How they are used together in statistical hypothesis testing
  • When and why they are useful
  • Limitations and when not to use them
What Is the Null Hypothesis?

The null hypothesis (often written as H₀) is a starting assumption in statistics. It represents the idea that there is no effect, no difference, or nothing unusual happening in your data.

  • It’s the idea you are trying to challenge.
  • It is the opposite of what you are trying to demonstrate.

The property you are trying to test for is the alternative hypothesis (H₁ or Hₐ), which is the claim you are testing for statistical significance.

For example:

Thesis: Trading signal X has a higher probability of predicting direction correctly than random entry.

  • H₀ = Trading signal X has no benefit over a random entry trading signal.
  • H₁ = Trading signal X does have a benefit over a random entry trading signal.
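
As a rough illustration, here is a minimal sketch of how this claim could be tested with a binomial test, assuming hypothetical numbers (200 trades, 120 correct directional calls) and SciPy's binomtest:

    from scipy import stats

    # Hypothetical numbers: signal X called direction correctly on
    # 120 of 200 trades. Under H0 (no benefit over random entry),
    # the expected hit rate is 0.5.
    n_trades, n_correct = 200, 120

    # One-sided test: H1 says the signal's hit rate exceeds 0.5.
    result = stats.binomtest(n_correct, n_trades, p=0.5, alternative="greater")
    print(f"p-value: {result.pvalue:.4f}")  # below 0.05 here, so we would reject H0
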
What Is a P-Value?

The p-value is a number between 0 and 1 that tells you how likely it is to observe your data if the null hypothesis were true.

In simple terms:

The smaller the p-value, the less likely your data would be if the null hypothesis were true.

More precisely, the p-value is the probability of observing data at least as extreme as yours, assuming H₀ is true. The lower the value, the stronger the evidence against H₀ in favor of the alternative hypothesis H₁, and the greater the statistical significance of the observed effect.
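
To make this concrete, a small simulation can show what the p-value measures: how often data at least as extreme as ours would occur in a world where H₀ holds. This sketch assumes an illustrative experiment of 100 coin flips in which 60 heads were observed:

    import numpy as np

    rng = np.random.default_rng(42)

    # Illustrative data: 60 heads in 100 flips. Under H0 the coin is fair.
    observed_heads = 60

    # Simulate many experiments where H0 is true, then ask how often the
    # result is at least as extreme as what we actually observed.
    simulated = rng.binomial(n=100, p=0.5, size=100_000)
    p_value = np.mean(simulated >= observed_heads)
    print(f"simulated one-sided p-value: {p_value:.4f}")  # roughly 0.03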

How Null Hypothesis and P-Values Work Together in Hypothesis Testing

The basic steps of a hypothesis test:

1. State the Hypotheses

  • Null Hypothesis (H₀): e.g., μ = 0
  • Alternative Hypothesis (H₁): e.g., μ ≠ 0

2. Choose a Significance Level (α):

  • This is your threshold for rejecting H₀.
  • Common choices: α = 0.05 (5%), 0.01 (1%)

3. Collect Data and Run a Test:

  • Use statistical tests: t-test, chi-square test, ANOVA, etc.
  • These give you a p-value

4. Compare p-value to α:

  • If p ≤ α → Reject H₀ (evidence against the null)
  • If p > α → Fail to reject H₀ (not enough evidence)

Example:

Let’s say you’re testing whether a new teaching method improves scores.

  • H₀: No improvement in test scores (mean = 70)
  • H₁: Improvement (mean > 70)
  • α = 0.05
  • You get a p-value of 0.01

Since 0.01 < 0.05 → Reject H₀ → The new method likely works.
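
A minimal sketch of this test in Python, assuming some made-up scores for the class taught with the new method and SciPy's one-sample t-test:

    import numpy as np
    from scipy import stats

    # Hypothetical scores from a class taught with the new method.
    scores = np.array([74, 71, 78, 69, 75, 72, 77, 73, 76, 70])

    # H0: mean = 70, H1: mean > 70 (one-sided test).
    t_stat, p_value = stats.ttest_1samp(scores, popmean=70, alternative="greater")
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

    alpha = 0.05
    if p_value <= alpha:
        print("Reject H0: the new method likely improves scores.")
    else:
        print("Fail to reject H0: not enough evidence of improvement.")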

Note that it is the significance level, not the p-value, that represents your error threshold: α is the false-positive risk you are willing to accept, i.e. the probability of rejecting H₀ when H₀ is actually true.

The significance level (α) you choose determines how strict the test is. 1% and 5% are the most common choices; 10% is sometimes used but is generally considered less reliable, since it accepts a 10% chance of a false positive, i.e. declaring H₁ statistically significant when in fact we should fail to reject H₀.

When to Use P-Value Testing
  • You want to test a specific claim or effect in a data sample.
  • You have enough data to make probabilistic conclusions.
  • You want to make objective, data-driven decisions.
  • You are comparing two or more groups, and want to know if differences are real or due to chance.
Limitations & When Not to Use It

1. Misinterpretation of the P-Value

  • P-value ≠ probability the null hypothesis is true.
  • It’s about the probability of your data given H₀ is true, not the other way around.

2. Statistical Significance ≠ Practical Significance

  • A result may be “statistically significant” but have tiny or meaningless real-world impact.
  • E.g., a drug improves recovery by 0.1% with p=0.01. Is it worth it?

3. P-Hacking / Multiple Testing

  • If you run lots of tests, some will come up significant just by chance.
  • This inflates false positives unless you correct for it (e.g., Bonferroni correction).
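
A quick simulation makes the problem visible. Here all 100 null hypotheses are true, so every "significant" result is a false positive; the Bonferroni correction simply divides α by the number of tests:

    import numpy as np

    rng = np.random.default_rng(0)
    alpha, n_tests = 0.05, 100

    # When H0 is true, p-values are uniformly distributed on [0, 1].
    p_values = rng.uniform(size=n_tests)

    print("naive 'significant' results:", np.sum(p_values < alpha))  # ~5 false positives
    print("after Bonferroni:", np.sum(p_values < alpha / n_tests))   # usually 0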

4. Overreliance on Arbitrary Thresholds

  • People treat 0.05 like a sacred cutoff, but it’s just a convention.
  • P=0.049 → “Significant!” vs. P=0.051 → “Not significant!” is a silly binary.

5. Small Sample Sizes

  • With too little data, you might miss real effects (false negatives).
  • With too much data, even tiny effects become “significant”.
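
The second point is easy to demonstrate with simulated data: a mean difference of just 0.1 points becomes highly "significant" once the sample is large enough:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    # A tiny true effect: population mean 70.1 instead of 70, sd 10.
    sample = rng.normal(loc=70.1, scale=10, size=1_000_000)

    # With a million observations even this negligible difference
    # produces a very small p-value.
    t_stat, p_value = stats.ttest_1samp(sample, popmean=70)
    print(f"p = {p_value:.2e}")  # "significant", yet the effect is only 0.1 points
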
How Effective Is It?

Strengths

  • Well-established and widely used in scientific and business research.
  • Offers a clear framework for testing assumptions.
  • Works well when the assumptions (normality, independence, etc.) hold.

Weaknesses

  • Easy to misuse or misinterpret.
  • Can be gamed (e.g., by cherry-picking results).
  • Doesn’t provide the size or direction of an effect (you need confidence intervals or effect sizes for that).
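
For the last point, here is a sketch of the usual complements to a p-value, using two hypothetical groups of scores: Cohen's d for effect size and a 95% confidence interval for the mean difference:

    import numpy as np
    from scipy import stats

    # Two hypothetical groups of test scores.
    group_a = np.array([74, 71, 78, 69, 75, 72, 77, 73, 76, 70], dtype=float)
    group_b = np.array([70, 68, 72, 66, 71, 69, 73, 67, 70, 68], dtype=float)

    # Cohen's d: mean difference scaled by the pooled standard deviation.
    pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
    cohens_d = (group_a.mean() - group_b.mean()) / pooled_sd
    print(f"Cohen's d = {cohens_d:.2f}")

    # 95% confidence interval for the mean difference (equal-variance t),
    # which conveys both direction and plausible size of the effect.
    diff = group_a.mean() - group_b.mean()
    n_a, n_b = len(group_a), len(group_b)
    se = pooled_sd * np.sqrt(1 / n_a + 1 / n_b)
    t_crit = stats.t.ppf(0.975, df=n_a + n_b - 2)
    print(f"95% CI for the difference: ({diff - t_crit * se:.2f}, {diff + t_crit * se:.2f})")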