Student Perspectives: Verification Bias in Diagnostic Test Accuracy Studies with Conditional Reference Standards

A post by Vera Hudak, PhD student on the Compass programme.

Introduction

To evaluate the accuracy of a new diagnostic test (the `index test’), the ideal approach is to compare it against an error-free reference standard, known as the gold standard.
However, gold standards may be unavailable, invasive, or costly. In such situations, a possible approach is to condition gold standard testing on the outcome of some initial imperfect reference standard test(s).

We focus on a conditional testing design which we refer to as `check the negatives’. Here, all participants receive the index test (Test A) and an imperfect reference standard (Test B), then those testing negative on Test B are followed up with the gold standard (GS). The diagnostic accuracy of Test A is assessed against observed disease status, derived from the test sequence combining Test B and the GS. Figure 1 illustrates this design.

Figure 1: Test sequence for `check the negatives’.
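
In code form, the observed disease status works as follows (our reading of the design in Figure 1; the function name is ours):

# Observed disease status in a 'check the negatives' design (sketch).
# Participants positive on Test B are classified as diseased without
# gold standard follow-up; Test B negatives are resolved by the GS.

def observed_disease_status(test_b_positive: bool, gs_positive: bool | None) -> bool:
    if test_b_positive:
        return True           # no GS follow-up; taken as diseased
    return bool(gs_positive)  # GS result decides for Test B negatives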

Now, if Test B were 100% specific, the `check the negatives’ design would lead to unbiased estimates of the sensitivity and specificity of Test A, given by:

\text{Sensitivity} = \frac{\text{TP}}{\text{TP} + \text{FN}} \quad \text{and} \quad \text{Specificity} = \frac{\text{TN}}{\text{TN} + \text{FP}}.
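
As a concrete illustration (the counts below are invented), the naive estimates are simple functions of the observed two-by-two counts:

# Naive (uncorrected) accuracy estimates from an observed 2x2 table.

def naive_estimates(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float]:
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

se, sp = naive_estimates(tp=90, fn=10, fp=20, tn=180)
print(f"sensitivity = {se:.3f}, specificity = {sp:.3f}")  # 0.900, 0.900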

However, as Test B is an imperfect test, this is unlikely to be the case, and bias can be anticipated. We quantified this bias and proposed a Bayesian adjustment method.

Probabilities in `Check the Negatives’ Studies

Let A_{se} and A_{sp} denote the true sensitivity and specificity of Test A, B_{sp} the true specificity of Test B, and \pi the prevalence of the condition in the study population. We also introduce c_0 as the covariance of errors between Test A and Test B in the disease-free population.

In a `check the negatives’ design, we do not observe the complete set of outcomes from the index test, the imperfect reference standard, and the gold standard. Instead, we only observe a reduced set of results: each participant’s outcome on Test A and their observed disease status, as determined by the conditional diagnostic test sequence combining Test B and the GS. Table 1 shows the probabilities associated with these observed outcomes under conditional dependence between Test A and Test B [1]. The corresponding probabilities under conditional independence can be obtained by setting c_0 = 0.

Table 1: Probabilities observed in a `check the negatives’ study.
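
For readers without access to the table, the four cell probabilities can be written out as follows (our reconstruction; it is consistent with the bias expressions derived below). Writing p_1, \ldots, p_4 for the probabilities of the observed TP, FN, FP and TN cells respectively:

p_1 = \pi A_{se} + (1-\pi)\left[(1-A_{sp})(1-B_{sp}) + c_0\right],
p_2 = \pi (1-A_{se}) + (1-\pi)\left[A_{sp}(1-B_{sp}) - c_0\right],
p_3 = (1-\pi)\left[(1-A_{sp})B_{sp} - c_0\right],
p_4 = (1-\pi)\left[A_{sp} B_{sp} + c_0\right].

Note that every truly diseased participant is classified correctly by the test sequence (either Test B is positive, or the GS is applied), so only the specificity of Test B, and not its sensitivity, enters these expressions.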

Quantifying Bias

We used the probabilities from Table 1 to find closed-form expressions for the naive estimates of the sensitivity (\widehat{A_{se}}) and specificity (\widehat{A_{sp}}) of Test A, and hence the bias. The bias in the naive estimate of specificity is as follows:

\text{Bias}(\widehat{A_{sp}}) = \frac{c_0}{B_{sp}}.
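
To see this, note that as the sample size grows the naive estimator converges to

\widehat{A_{sp}} \to \frac{p_4}{p_3 + p_4} = \frac{(1-\pi)(A_{sp} B_{sp} + c_0)}{(1-\pi) B_{sp}} = A_{sp} + \frac{c_0}{B_{sp}},

using the cell probabilities given above; the sensitivity bias below follows in the same way from \widehat{A_{se}} \to p_1 / (p_1 + p_2).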

Under conditional independence (c_0 = 0), the naive specificity estimate is unbiased. If, however, Tests A and B are conditionally dependent among the disease-free population, then there is a bias which depends on B_{sp} and c_0.

Similarly, the bias in the naive estimate of the sensitivity of Test A can be expressed as:

\text{Bias}(\widehat{A_{se}}) = \frac{(1-\pi)\left((1-A_{sp})(1-B_{sp}) + c_0 - A_{se}(1-B_{sp})\right)}{(1-\pi)(1-B_{sp}) + \pi}.

Assuming conditional independence in the disease-free population (c_0 = 0), Figure 2 shows \text{Bias}(\widehat{A_{se}}) as a function of B_{sp}, for selected values of \pi, A_{se} and A_{sp}.

Figure 2: Bias in the naive estimate of the sensitivity of Test A against the specificity of Test B for different values of Test A sensitivity, specificity and disease prevalence.

As expected from the study design, the bias tends to 0 as B_{sp} tends to 1. Its magnitude increases as the accuracy of Test A improves, i.e. when A_{se} or A_{sp} is larger, and as prevalence decreases. When \pi = 0.9, the bias is almost negligible. However, it can be substantial in some scenarios, particularly under low prevalence, even when Test B has high specificity (e.g. over 95%).
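
These patterns are easy to reproduce numerically; below is a short sketch under conditional independence (the parameter values are illustrative):

# Closed-form bias in the naive estimate of Test A's sensitivity (c_0 = 0).

def sensitivity_bias(pi: float, a_se: float, a_sp: float, b_sp: float) -> float:
    num = (1 - pi) * ((1 - a_sp) * (1 - b_sp) - a_se * (1 - b_sp))
    den = (1 - pi) * (1 - b_sp) + pi
    return num / den

# High prevalence: almost negligible.
print(sensitivity_bias(pi=0.9, a_se=0.9, a_sp=0.9, b_sp=0.95))  # approx -0.004
# Low prevalence: substantial, despite Test B being 95% specific.
print(sensitivity_bias(pi=0.1, a_se=0.9, a_sp=0.9, b_sp=0.95))  # approx -0.248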

Bias Adjustment

We proposed that a Bayesian model with an informative prior on the specificity of Test B can be used to adjust for the bias in the naive estimate of the sensitivity of Test A, under conditional independence (c_0 = 0).

We let \boldsymbol{x} = (TP, FN, FP, TN) be the data reported by a `check the negatives’ study evaluating Test A. Then \boldsymbol{x} \sim \text{Multinomial}(\boldsymbol{p}, n), where n is the number of participants in the study, and the probabilities \boldsymbol{p} = (p_1, p_2, p_3, p_4) are as specified in Table 1, with c_0 = 0.

Suppose we have prior information about the specificity of Test B, represented by a Beta prior distribution. Then a bias-adjusted estimate of A_{se} can be obtained by fitting this multinomial model to the data, with vague \text{Beta}(1,1) priors on the remaining three parameters, A_{se}, A_{sp}, and \pi.
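
To illustrate how such a model might be fitted, here is a minimal sketch using PyMC (our choice of tool, not necessarily the authors' implementation; the counts and the informative prior below are invented):

import numpy as np
import pymc as pm

# Hypothetical counts (TP, FN, FP, TN) from a 'check the negatives' study.
x = np.array([85, 15, 60, 840])
n = int(x.sum())

with pm.Model():
    a_se = pm.Beta("A_se", 1, 1)  # vague priors on the parameters of interest
    a_sp = pm.Beta("A_sp", 1, 1)
    pi = pm.Beta("pi", 1, 1)
    # Informative prior on Test B specificity: mean 0.95, sd approx 0.015.
    b_sp = pm.Beta("B_sp", 190, 10)

    # Cell probabilities from Table 1 with c_0 = 0.
    p = pm.math.stack([
        pi * a_se + (1 - pi) * (1 - a_sp) * (1 - b_sp),  # TP
        pi * (1 - a_se) + (1 - pi) * a_sp * (1 - b_sp),  # FN
        (1 - pi) * (1 - a_sp) * b_sp,                    # FP
        (1 - pi) * a_sp * b_sp,                          # TN
    ])
    pm.Multinomial("x", n=n, p=p, observed=x)

    trace = pm.sample()  # the posterior for A_se gives the bias-adjusted estimate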

Simulation Study

We assessed this adjustment method through a simulation study under two scenarios: one where the informative prior is correctly centred on the true specificity of Test B, and one where it underestimates the truth by 5%, to examine the impact of moderate prior misspecification. The precision of the prior (its standard deviation) is also varied to assess the impact of increasing prior uncertainty. In Figure 3, we present some results from this simulation study for the correctly centred prior case. These results show that a correctly centred prior consistently eliminates bias under high precision and reduces it under lower precision.

Figure 3: Crude and adjusted bias in the sensitivity of Test A using correctly centred priors, for some parameter combinations.

Although not shown here, we found that overly pessimistic priors can over-correct, increasing absolute bias, especially when initial bias is small. This risk is mitigated when the informative prior is less precise. We are currently writing up this simulation study as a paper to be submitted for publication soon.
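
For intuition, a single replicate of the data-generating step of such a simulation might look like this (the true values and study size are our own choices):

import numpy as np

rng = np.random.default_rng(1)

# True parameter values for one low-prevalence scenario, with c_0 = 0.
pi, a_se, a_sp, b_sp = 0.1, 0.9, 0.9, 0.95
p = [
    pi * a_se + (1 - pi) * (1 - a_sp) * (1 - b_sp),  # TP
    pi * (1 - a_se) + (1 - pi) * a_sp * (1 - b_sp),  # FN
    (1 - pi) * (1 - a_sp) * b_sp,                    # FP
    (1 - pi) * a_sp * b_sp,                          # TN
]

tp, fn, fp, tn = rng.multinomial(5000, p)
crude = tp / (tp + fn)  # naive estimate of A_se
print(crude - a_se)     # crude bias; close to -0.248 in this scenario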

Future Work

Further work could be done to explore adjustment under conditional dependence between tests, or situations in which the third test in the sequence, here the GS, is imperfect.

References

[1] Pamela M. Vacek. The effect of conditional dependence on the evaluation of diagnostic tests. Biometrics, 41(4):959–968, December 1985.
