Power analysis (English)




Revision as of 15:57, 29 January 2020

This text is an edited version of the AMC sample size calculation manual [1]. It provides a practical guide to sample size calculations used in clinical research. After reading the manual, a researcher will know:

  • why power analysis is used to plan and evaluate medical research
  • what power and statistical significance mean
  • what information is needed for a sample size calculation
  • where to find the information needed
  • how to perform a simple sample size calculation
  • how to write down a power calculation.

In addition, the manual contains two practical examples of sample size calculations.

Why perform a sample size calculation?

The main reasons for performing a sample size calculation are ethical. If the number of subjects tested in a study is too small to detect the effect being investigated, the subjects will have been exposed to the risks of participating in the study in vain.

Such a study can easily result in a false negative conclusion. On the other hand, testing too many subjects may also lead to undesirable situations. If an intervention turns out to be effective, too many subjects have missed out on it. If the intervention is not effective, too many subjects have been exposed to an ineffective intervention. For these reasons, researchers planning a trial should always consider what number of subjects would be appropriate to answer the study question. A sample size calculation prior to a study helps to determine the number of subjects that is both necessary and sufficient. Moreover, it helps one to focus on a clinically relevant effect, instead of following the erroneous strategy of testing as many subjects as needed to reach statistical significance for an irrelevant effect.

The CONSORT statement (a guideline for reporting clinical trials) states that a researcher should calculate the study size beforehand and should report this calculation in the methods section of the resulting scientific paper. The AMC Medical Ethics Board (MEC) and the Animal Experiments Committee (DEC) also ask for a power calculation in the approval process. The same holds for most study grant applications (e.g., ZonMW).

Finally, the logistical planning of a study benefits from a sample size calculation.

Power and statistical significance

The term ‘power’ pops up everywhere in medical research, and certainly in sample size calculations. The term is often interpreted as a synonym for the number of patients tested in a study. ‘Our study did not have enough power to control for possible confounders’ is understood as ‘you didn’t test enough patients to account for several effects’. ‘Our study had 80% power to detect an OR of 1.1 at a significance level of 5%’ is understood as ‘you have tested enough patients to pick up a possible effect’. Although these interpretations are not entirely wrong, we need to understand the exact meaning of power in order to use the concept in a sample size calculation. Formally:

    The power of a study testing the null hypothesis H0 against the alternative hypothesis H1 is the probability that the test (based on a sample from the population) rejects H0, given that H0 is false (in the whole population).

So the power is the chance of correctly rejecting a null hypothesis (rejecting a null hypothesis given that it should be rejected). In most tests H0 is stated as ‘no difference between groups’ or ‘no effect of the intervention’, for example H0 = ‘no difference in survival between the treated and the control group’, so rejecting H0 means you have reason to believe there is a difference. In other words, the power reflects the ability to pick up an effect that is present in a population using a test based on a sample from that population (true positive).

The power of a study is closely related to β, the probability of a so-called type II error: failing to reject H0 when it is false. The power of a study is 1 − β, so it is the probability of rightfully rejecting H0 (see Table 1). The table also shows the significance level α. Alpha is the probability of falsely rejecting H0, i.e., falsely picking up an effect (false positive). Note that α only concerns situations in which no true effect exists in the population.

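
The definitions of power and α can be made concrete with a small simulation. The sketch below is not part of the AMC manual; it assumes a hypothetical two-group study with normally distributed outcomes, a known standard deviation, and a two-sided z-test, and the function name `rejection_rate` and all parameter values are chosen purely for illustration. It repeats the study many times and counts how often H0 is rejected, once when no effect exists (which estimates α) and once when an effect exists (which estimates the power).

```python
import random
from statistics import NormalDist


def rejection_rate(true_diff, n_per_group, sigma=1.0, alpha=0.05,
                   n_sim=2000, seed=1):
    """Fraction of simulated studies in which a two-sided z-test rejects H0.

    Assumes (for illustration only) normal outcomes with known sigma.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # critical value, about 1.96
    se = sigma * (2 / n_per_group) ** 0.5          # SE of the difference in means
    rejections = 0
    for _ in range(n_sim):
        control = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        treated = [rng.gauss(true_diff, sigma) for _ in range(n_per_group)]
        diff = sum(treated) / n_per_group - sum(control) / n_per_group
        if abs(diff / se) > z_crit:
            rejections += 1
    return rejections / n_sim


# H0 is true (no effect): the rejection rate estimates the type I error, alpha.
type_i_error = rejection_rate(true_diff=0.0, n_per_group=63)

# H0 is false (true difference of 0.5 SD): the rejection rate estimates the power.
power = rejection_rate(true_diff=0.5, n_per_group=63)

print(f"estimated type I error: {type_i_error:.3f}")
print(f"estimated power:        {power:.3f}")
```

Under these assumptions the first estimate should land near 0.05 and the second near 0.80, with some simulation noise. The type II error β is then estimated as one minus the second number.
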
In a sample size calculation one determines the number of patients needed to test the hypothesis with large enough power and small enough significance level. In this way one protects oneself against false negative and false positive conclusions.

Table 1: Possible conclusions and errors of a study in relation to the truth.

Study conclusion                      | Whole population: effect exists (H1 is true) | Whole population: no effect exists (H0 is true)
Effect observed (H1 appears true)     | true positive, power (1 − β)                 | false positive, type I error (α)
No effect observed (H0 appears true)  | false negative, type II error (β)            | true negative (1 − α)
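
As an illustration of how the required number of patients follows from the chosen power and significance level, the sketch below uses the common normal-approximation formula for comparing two means with equal group sizes, n per group = 2 (z_{1−α/2} + z_{1−β})² σ² / δ². This formula and the example numbers are assumptions added here for illustration; they are not taken from the text above. Here δ is the smallest clinically relevant difference and σ the (assumed known) standard deviation of the outcome; the function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Patients needed per group for a two-sided comparison of two means.

    Normal approximation with equal group sizes; delta is the difference
    to detect and sigma the common standard deviation (both assumed).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # guards against false positives
    z_beta = NormalDist().inv_cdf(power)           # guards against false negatives
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


# Illustrative numbers: detect a difference of 5 units when the SD is 10.
n = n_per_group(delta=5.0, sigma=10.0)
print(n)  # 63 patients per group

# Asking for more power at the same alpha requires more patients.
n_high_power = n_per_group(delta=5.0, sigma=10.0, power=0.90)
print(n_high_power)  # 85 patients per group
```

The formula shows the trade-off directly: a smaller α or a larger power increases the z-values and therefore the sample size, and halving the difference to detect quadruples the required number of patients. Dedicated software, or a calculation based on the t-distribution, may give slightly larger numbers.
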


References

  1. van Geloven N, Dijkgraaf M, Tanck M, Reitsma J. AMC biostatistics manual - Sample size calculation. 2009. Amsterdam: Academic Medical Center.
