
Case-Control Study


Case-control study designs are used to estimate the relative risk for a disease from a specific risk factor. The estimate is the odds ratio, which is a good estimate of the relative risk, especially when the disease is rare. Case-control studies are useful when epidemiologists investigate an outbreak of a disease because the study design is powerful enough to identify the cause of the outbreak even when the sample size is small. Attributable risks may also be calculated.
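To see why the odds ratio tracks the relative risk when the disease is rare, here is a minimal Python sketch. The risks used (0.002 vs. 0.001, and 0.40 vs. 0.20) are hypothetical numbers chosen for illustration, and the function names are not from the original text.

```python
def relative_risk(risk_exposed: float, risk_unexposed: float) -> float:
    """Ratio of disease risk in the exposed to disease risk in the unexposed."""
    return risk_exposed / risk_unexposed


def odds_ratio_from_risks(risk_exposed: float, risk_unexposed: float) -> float:
    """Ratio of the odds of disease in the exposed to the odds in the unexposed."""
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return odds_exposed / odds_unexposed


# Hypothetical risks: both scenarios have a relative risk of 2.
scenarios = {
    "rare disease": (0.002, 0.001),
    "common disease": (0.40, 0.20),
}

for label, (p1, p0) in scenarios.items():
    rr = relative_risk(p1, p0)
    oratio = odds_ratio_from_risks(p1, p0)
    print(f"{label}: RR = {rr:.3f}, OR = {oratio:.3f}")
```

In the rare-disease scenario the two measures nearly coincide (an odds ratio of about 2.002 against a relative risk of 2), while in the common-disease scenario the odds ratio (about 2.667) overstates the relative risk.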

The approach for a case-control study is straightforward.  Case-control studies begin by enrolling persons based upon their current disease status. Previous exposure status is subsequently determined for each case and control.   However, because these studies collect data after disease has already occurred,  they are considered retrospective, which is a limitation.  While a case-control study design offers less support for a causation hypothesis than the longer and more expensive cohort design, it does provide stronger evidence than a cross-sectional study.

Below is a 2 × 2 table for case-control data:

               Cases (Number)    Controls (Number)    Total Exposure (Number)
Exposed        A                 B                    Total Exposed
Not Exposed    C                 D                    Total Not Exposed
               Total Cases       Total Controls       Total

With case-control studies,  we essentially work down the columns of the 2 × 2 table.  Cases are identified first, then controls. The investigator then determines whether cases and controls were exposed or not exposed to the risk factor. We calculate the odds of exposure among cases (A/C) and the odds of exposure among controls (B/D).  The odds ratio is then (A/C)/(B/D), which simplifies, after cross-multiplication, to (A*D)/(B*C).
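The calculation described above can be sketched in a few lines of Python. The cell counts (40, 20, 60, 80) are hypothetical, and the function name is an illustrative choice rather than part of the original lesson.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2 x 2 case-control table.

    a: exposed cases       b: exposed controls
    c: unexposed cases     d: unexposed controls
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when B or C is zero")
    return (a * d) / (b * c)


# Hypothetical counts, for illustration only.
a, b, c, d = 40, 20, 60, 80

odds_exposure_cases = a / c       # odds of exposure among cases (A/C)
odds_exposure_controls = b / d    # odds of exposure among controls (B/D)

# The ratio of the two odds equals the cross-product form (A*D)/(B*C).
print(odds_exposure_cases / odds_exposure_controls)
print(odds_ratio(a, b, c, d))
```

Both print statements give the same value (about 2.67 for these counts), which is the cross-multiplication step in the text.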

Think About It!


Why can't we determine the incidence rate from a case-control study?

We have selected cases and controls from a population, often an unknown population. For example, we might enroll patients in a hospital, but we don't really know the size of the general population that would have come to the hospital. Also, we have not followed persons at risk to monitor the development of disease. Furthermore, the investigator selects the number of cases relative to the number of controls.
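The last point can be illustrated numerically. In the sketch below (hypothetical counts, illustrative function names), the investigator doubles the number of controls while keeping their exposure distribution the same. The apparent "risk" among the exposed, A/(A+B), changes with that arbitrary choice, but the odds ratio does not.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product odds ratio (A*D)/(B*C) for a 2 x 2 case-control table."""
    return (a * d) / (b * c)


def apparent_risk_in_exposed(a: int, b: int) -> float:
    """Share of exposed subjects who are cases; NOT a valid risk estimate here."""
    return a / (a + b)


# Hypothetical study: 100 cases (40 exposed) and 100 controls (20 exposed).
a, c = 40, 60
b, d = 20, 80

# Same cases, but the investigator enrolls twice as many controls.
b2, d2 = 2 * b, 2 * d

print(apparent_risk_in_exposed(a, b), apparent_risk_in_exposed(a, b2))
print(odds_ratio(a, b, c, d), odds_ratio(a, b2, c, d2))
```

The first line prints two different proportions (about 0.67 and 0.50), showing that the quantity depends on the sampling scheme. The second line prints the same odds ratio twice.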

A most critical and often controversial component of a case-control study is the selection of the controls.  Controls must be comparable to cases in every way except that they do not have the disease.  Preferably controls are drawn from the same population as the cases.  Some studies, though, draw the controls from a different data source. For example, cases may be detected from a disease registry but the controls are selected randomly from another data source. Controls should be selected without regard to their exposure status (e.g., exposed/non-exposed), but may be sampled proportional to their time at risk (which is called density sampling).

There are two basic types of case-control studies, distinguished by the method used to select controls.  The first is a non-matched case-control study in which we enroll controls without regard to the number or characteristics of the cases.    In this study design, the number of controls does not necessarily equal the number of cases.  For example, we may enroll 105 cases and 178 controls. Analytic methods for non-matched case-control studies include:

  • Chi-square 2 × 2 analysis;
  • Mantel-Haenszel statistic (This test combines the stratum-specific 2 × 2 tables into a single summary that adjusts for the stratifying variable; it assumes a similar effect in each stratum, so effect modification should be assessed separately)
  • Fisher’s Exact test (This test is used if an expected cell size is <5)
  • Unconditional logistic regression (The method is used to simultaneously adjust for multiple confounders; a multivariable analysis).
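Two of the methods listed above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not a replacement for a statistics package: the chi-square statistic is the uncorrected Pearson version (no continuity correction), the p-value uses the one-degree-of-freedom identity p = erfc(sqrt(statistic / 2)), and all counts and function names are hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Uncorrected Pearson chi-square for a 2 x 2 table (1 degree of freedom).

    Returns (statistic, p_value, smallest_expected_count).
    Assumes every row and column total is greater than zero.
    """
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    expected = [[row_totals[i] * col_totals[j] / n for j in range(2)]
                for i in range(2)]
    statistic = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
                    for i in range(2) for j in range(2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    smallest_expected = min(min(row) for row in expected)
    return statistic, p_value, smallest_expected


def mantel_haenszel_odds_ratio(strata) -> float:
    """Pooled odds ratio across strata; each stratum is a tuple (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical unstratified table.
statistic, p_value, smallest_expected = chi_square_2x2(40, 20, 60, 80)
print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}")
if smallest_expected < 5:
    print("An expected count is below 5; Fisher's exact test is preferred.")

# Hypothetical strata (for example, two age groups).
print(mantel_haenszel_odds_ratio([(10, 5, 20, 30), (30, 15, 40, 50)]))
```

The check on the smallest expected count mirrors the rule of thumb in the list: when it falls below 5, the chi-square approximation is unreliable and Fisher's exact test is the usual alternative.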

The other basic type is a matched case-control study.  In a matched study, we enroll controls based upon some characteristic(s) of the case.  For example, we might match the sex of the control to the sex of the case.  The idea in matching is to match upon a potential confounding variable in order to remove the confounding effect.  (We will look at how matching occurs in the example below.)

There are two basic types of matched designs: one-to-n matching (i.e., one case to one control, or one case to a specific number of controls) and frequency-matching, where matching is based upon the distributions of the characteristics among the cases.  For example, 40% of the cases are women so we choose the controls such that 40% of the controls are women.

In an analysis of a matched study design, only discordant pairs are used.  A discordant pair occurs when the exposure status of the case differs from the exposure status of the control.  Analytic methods for matched case-control studies include conditional logistic regression, conditioned upon the matching.

To review, for a simple non-matched case control study, you find a case, determine whether the person is exposed or not. Find a control; determine their exposure status. The data can be summarized in a 2 × 2 table as below:

               Cases (Number)    Controls (Number)
Exposed        A                 B
Not Exposed    C                 D

In contrast, the matched case-control study has linked a case to a control based on matching of one or more variables. The summary table will differ for a matched case-control study.

Let's look at an example.  Suppose we plan to match cases to controls by gender and age (+/- 5 years).  We first identify the following case:

Case: Male, 45 years of age (Patient 1); Exposure status: Exposed

If this were a non-matched study, the case would be counted in cell A in the preceding table because he is exposed. However, in the age- and gender-matched case-control study we must also find a male control within five years of age. Searching in the appropriate control population, we locate the following control:

Control: Male 48 years of age (Person 47); Exposure status: Exposed

If Person 47 were counted in an unmatched study, he would belong in cell B of the preceding table.  In a matched case-control study however, we are interested in results for the matched pair.  The data from Patient 1 and Person 47 are linked for the duration of the study. The appropriate table for the matched study is depicted below. Where do Patient 1 and Person 47 belong?

  

                                  Cases
                                  Exposed (Number)       Not Exposed (Number)       Total (Number)
Controls   Exposed (Number)       A (Concordant Pair)    B (Discordant Pair)        Total Exposed Controls
           Not Exposed (Number)   C (Discordant Pair)    D (Concordant Pair)        Total Not Exposed Controls
                                  Total Exposed Cases    Total Not Exposed Cases    Total

Patient 1 is a case and he is exposed, so he fits into either cell A or cell C. Based upon his control's status we determine which cell is the correct placement for this pair. Patient 1's control is exposed, therefore Patient 1 and Person 47 fit into cell A as a pair. This is a concordant pair because both are exposed. Concordancy is based upon exposure status. In a matched case-control study, the cell counts represent pairs, not individuals.  In the statistical analysis, only the discordant pairs are important.  Cells B and C contribute to the odds ratio in a matched design. Cells A and D do not contribute to the odds ratio. If the risk for disease is increased due to exposure, C will be greater than B.
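The pair-based bookkeeping can be sketched in Python as follows. The cell labels follow the matched table above (B: only the control is exposed; C: only the case is exposed). The pair counts are hypothetical, the function names are illustrative, and the two formulas used are the standard matched-pair estimates (odds ratio = C/B, McNemar chi-square = (C − B)² / (B + C) with one degree of freedom), which are assumptions added here rather than formulas stated in the lesson.

```python
import math
from collections import Counter


def tally_pairs(pairs) -> Counter:
    """Count matched pairs by cell; each pair is (case_exposed, control_exposed)."""
    counts = Counter()
    for case_exposed, control_exposed in pairs:
        if case_exposed and control_exposed:
            counts["A"] += 1      # concordant: both exposed
        elif control_exposed:
            counts["B"] += 1      # discordant: only the control exposed
        elif case_exposed:
            counts["C"] += 1      # discordant: only the case exposed
        else:
            counts["D"] += 1      # concordant: neither exposed
    return counts


def matched_odds_ratio(counts: Counter) -> float:
    """Matched-pair odds ratio, using only the discordant cells (C / B)."""
    return counts["C"] / counts["B"]


def mcnemar_chi_square(counts: Counter):
    """Uncorrected McNemar statistic and its 1-df p-value."""
    b, c = counts["B"], counts["C"]
    statistic = (c - b) ** 2 / (b + c)
    return statistic, math.erfc(math.sqrt(statistic / 2))


# Hypothetical data: 10 A pairs, 5 B pairs, 15 C pairs, 20 D pairs.
# The Patient 1 / Person 47 pair would be one of the (True, True) entries.
pairs = ([(True, True)] * 10 + [(False, True)] * 5
         + [(True, False)] * 15 + [(False, False)] * 20)

counts = tally_pairs(pairs)
print(dict(counts))
print(matched_odds_ratio(counts))
print(mcnemar_chi_square(counts))
```

Changing the number of concordant pairs (A or D) in this sketch leaves both results unchanged, which is the sense in which only the discordant pairs matter.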

Think About It!


Can you think of more than one reason why a matched case-control study could take longer to complete than an unmatched study?

First you must identify matched controls, sometimes more than one per case. Second, since only the discordant pairs contribute to the statistical analysis, achieving a desired statistical power depends on obtaining a particular number of discordant pairs.

Think About It!


Why bother with matching if it means a longer case-control study?

We match to eliminate the possibility of the relationship being confounded by the matching variable because both the case and the control are similar for that variable.  In the above example, we control for confounding from age or sex because we matched on age and sex.  We don't want to match on too many variables because it will cause an extreme delay in the completion of the study.

When performing statistical analysis, the matched variables are not included in the statistical model.

(In a cohort study, confounding is dealt with by including the terms in the model to adjust for their effects. In a matched case-control study, the adjustment for this confounding has been made through the matching.)

We will learn more about designing a cohort study later in this course. Below is a table comparing the advantages and disadvantages of the cohort design with those of the case-control design.

Quick Comparison of Cohort and Case-control Studies

Cohort Study

  • Can calculate incidence rate, risk, and relative risk
  • Potentially greater strength for causal investigations
  • Expensive
  • Long-term study
  • Large sample size required
  • Efficient design for rare exposure
  • Good for multiple outcomes
  • Less potential for recall bias
  • More potential for loss to follow-up
  • Possibly generalizable
  • Allows examination of natural course of disease, survival

Case-Control Study

  • Only estimates relative risk
  • Potentially weaker causal investigation
  • Inexpensive
  • Short-term study
  • Can be powerful with small sample of cases
  • Efficient design for rare disease
  • Good for multiple exposures
  • More potential for recall bias
  • Less potential for loss to follow-up
  • Probably not generalizable
  • Does not allow examination of natural course of disease, survival

Check out this example

Serum Carotenoids and Risk of Cervical Intraepithelial Neoplasia in Southwestern American Indian Women

Schiff, M. et al. (2001). Serum Carotenoids and Risk of Cervical Intraepithelial Neoplasia in Southwestern American Indian Women. Cancer Epidemiology, Biomarkers & Prevention, Vol. 10, 1219–1222.

Notice the high study participation rate.

How were cases and controls determined?

Is this a matched study?

Take a look at Table 1: how many cases and how many controls?

Are the demographic characteristics similar for cases and controls?

Are any different?

What statistical method is used to analyze the data?

How do the results support the conclusions (Table 2 and conclusions)?

If you have questions about this study design or the results, ask in the Week 6 General Discussion.


What is a Case-Control Study?

A case-control study is a retrospective study that looks back in time to find the relative risk between a specific exposure (e.g. second hand tobacco smoke) and an outcome (e.g. cancer). A control group of people who do not have the disease or who did not experience the event is used for comparison. The goal is to figure out the relationship between risk factors and disease or outcome and estimate the odds of an individual getting a disease or experiencing an event.

Case-control studies have four main steps:

  1. The study begins by enrolling people who already have a certain disease or outcome.
  2. A second control group of similar size is sampled, preferably from a population identical in every way except that they don’t have the disease or condition being studied. They should not be selected because of an exposure status.
  3. People are asked about their exposure to risk factors.
  4. Finally, an odds ratio is calculated.

The odds ratio is also used to figure out whether a particular exposure (like eating processed meat) is a risk factor for a particular outcome (like colon cancer).



The two types of case-control studies are:
  • Non-matched case-control study: this is the simplest form. Find a person with the disease and enroll them in the study. Then enroll a control and determine their exposure status.
  • Matched case-control: Find a person with the disease and enroll them in the study. Match the person for some characteristic (e.g. sex, age, weight) with a control. This can eliminate or minimize confounding variables. However, it generally results in a longer study; the more characteristics being “matched”, the longer the study takes.

Advantages and Disadvantages

Advantages
A case-control study is often the best choice for rare conditions or diseases. Let’s say 10 people in Duval county in Florida had a particularly rare disease. Random sampling for a cohort study would involve large numbers of people and may not pick up any of the diseased people at all. With a case-control study, all 10 people who have the disease can be identified (assuming they are in a medical database) and enrolled in the study. Random sampling could then be used on the non-diseased population to form the control group.
Other Advantages:

  • Short term study that doesn’t require waiting for events to happen, as they have already occurred.
  • Inexpensive.
  • Multiple risk factors can be studied at the same time.
  • Quickly establishes associations between risk factors and disease. This can be especially useful with disease outbreaks, as causes can be identified with small sample sizes.
  • Stronger than cross-sectional studies for establishing causation.

Disadvantages:

  • Control groups can be difficult to find.
  • Results can easily be tainted by recall bias, where people with the disease or condition are more likely to remember past details compared to people who don’t have the disease or condition.
  • Is weaker than a cohort study for establishing causation.
  • Usually not generalizable.

Examples from Real Life

  1. This study for non-Hodgkin lymphoma found a connection between the disease and inflammatory disorders like Sjögrens, Celiac and rheumatoid arthritis.
  2. This study investigated how increased consumption of fruits and vegetables protects against Cervical Intraepithelial Neoplasia.
  3. This INTERHEART study looked at second hand tobacco smoke and increased risk of myocardial infarction.