Blog Post

Blinded Variance Estimation for Sample Size Adjustments

November 13, 2024

Interim sample size adjustments, and the many approaches to them, are a frequent discussion point between Sponsors and statisticians during protocol development. One such approach is a blinded assessment of variance, favored by some Sponsors because it carries no alpha penalty. Below, we discuss how this method works, its pros and cons, and whether it might be appropriate for your protocol.

How does it work?  

For a normally distributed endpoint, the most common method estimates the variance of the treatment difference using 1) the total sample size, 2) the variance of the available data (all of it pooled, ignoring the treatment groups, since we're still blinded), and 3) the assumed difference between the groups that was used in the power calculation. If this estimated variance is higher than the one used in the power calculation, the sample size is increased accordingly.
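As a rough sketch of the arithmetic, here is a simple version of the adjustment, assuming equal allocation and a two-sample z-test sample size formula. This is an illustration of the idea, not the exact Gould–Shih procedure, and the function name and defaults are ours:

```python
from math import ceil
from statistics import NormalDist, variance

def adjusted_sample_size(pooled_data, delta, alpha=0.05, power=0.9):
    """Blinded sample-size re-estimate (simple adjustment, equal allocation).

    With equal groups whose means are delta apart, the pooled (blinded)
    variance overstates the within-group variance by about delta**2 / 4,
    so we subtract that before recomputing n per group.
    """
    s2_pooled = variance(pooled_data)                  # blinded, lumped variance
    s2_within = max(s2_pooled - delta**2 / 4, 1e-12)   # adjusted estimate
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    # Standard two-sample z-test sample size per group
    n_per_group = 2 * s2_within * (z_a + z_b) ** 2 / delta ** 2
    return ceil(n_per_group)
```

With a pooled variance of 1.25 and an assumed 1-unit difference, the adjusted within-group variance is 1.0, giving the familiar 22 per group at 90% power.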

While this seems straightforward enough, one of the early papers on this methodology (Gould and Shih, 1992) contains the following caution: 

“However, if the alternative hypothesis H1: µ1 – µ2 = Δ is true and if n is large enough so that X̄1 – X̄2 is reasonably close to Δ…” 

Translating from statistics into plain English: the assumed difference between the treatment groups from our power calculation needs to be close to the actual difference at the end of the study for this to work. But how close is “reasonably close”? We’re never given a clear definition. 

Without knowing how close is “reasonably close”, it’s easy to see how this process could go wrong. Take the illustrations below: two treatment group distributions are shown, one in red and one in blue, with the pooled distribution (i.e., ignoring the treatment groups because we’re assessing blinded data) in green. Example 1 on the left (“Trial 1”) has a treatment difference between blue and red of 1 unit; example 2 on the right (“Trial 2”) has a difference of 2 units. 

[Figure: side-by-side density plots for Trial 1 (1-unit difference) and Trial 2 (2-unit difference), each showing the red and blue treatment-group distributions and the pooled distribution in green]

If this were data from two drug trials, it’s clear that Trial 2 is doing far better than Trial 1 – yet, looking at the green curves, the apparent variance from the blinded data is wider in Trial 2, all other things being equal. If the power calculation assumed a 1-unit difference between the means and the observed data had a 2-unit difference, the blinded interim would conclude that a sample size increase was needed for Trial 2, despite the drug overperforming expectations. 
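The arithmetic behind the picture: for an equal mix of two normal distributions with common variance s² and means Δ apart, the pooled variance works out to s² + Δ²/4, so a larger true effect inflates the blinded variance. A small illustrative check (function name is ours):

```python
def pooled_variance(s2, delta):
    """Variance of an equal 50/50 mix of N(0, s2) and N(delta, s2).

    The mixture adds delta**2 / 4 to the within-group variance, which
    is why the green (pooled) curve is wider in Trial 2 than Trial 1.
    """
    return s2 + delta**2 / 4

for label, delta in [("Trial 1", 1.0), ("Trial 2", 2.0)]:
    print(label, pooled_variance(1.0, delta))
```

With s² = 1, Trial 1’s pooled variance is 1.25 while Trial 2’s is 2.0: doubling the effect from 1 to 2 units raised the apparent blinded variance by 60%.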

The good news for blinded variance sample size adjustments: 

The above example is a deliberate exaggeration: the effect size literally doubled while the apparent variance rose only modestly. This “backwards” read is only noticeable under “all things equal” assumptions and out-of-this-world effect sizes. Across the range of effect sizes commonly seen in clinical trials (standardized effects of roughly 0.2 to 0.6), the blinded formula’s estimates of the treatment-difference variance track the true unblinded variance quite well. If the unblinded treatment difference exactly matches the assumed one, the blinded variance estimate recovers the true variance exactly – no problems there – and even with fairly large misspecifications of the treatment difference, the blinded estimate does reasonably well. 
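Using the same mixture identity, the bias of the blinded within-group variance estimate is simply (Δ_true² − Δ_assumed²)/4, which stays small at common effect sizes. An illustrative check, assuming a true within-group variance of 1 (the function and the 50% misspecification are ours):

```python
def blinded_estimate(d_true, d_assumed, s2=1.0):
    """Expected blinded variance estimate when the assumed treatment
    difference (d_assumed) differs from the true one (d_true)."""
    pooled = s2 + d_true**2 / 4        # expected pooled (blinded) variance
    return pooled - d_assumed**2 / 4   # simple blinded adjustment

# Even a 50% misspecification of the effect barely moves the estimate
# at common standardized effect sizes:
for d_true in (0.2, 0.4, 0.6):
    est = blinded_estimate(d_true, d_assumed=1.5 * d_true)
    print(d_true, round(est, 4))
```

At an effect size of 0.2, assuming 0.3 instead leaves the variance estimate within about 1% of the truth; even at 0.6 versus an assumed 0.9, the estimate is off by only about 11%.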

The bad news for blinded variance sample size adjustments: 

Usually, though, it’s not the variance we should be worried about. In simulated data where the treatment substantially underperformed the power calculation but remained in the “promising zone” of other sample size adjustment approaches (Mehta and Pocock, 2011), this formula would not increase the sample size as long as the sample variance looked fine, leaving the conditional power quite low. Treatment underperformance is common when moving from Phase II to Phase III, so this approach can miss a crucial opportunity to increase the sample size. Conversely, if the treatment wildly over-performs, the “backwards” read may trigger a small sample size increase that was never needed. Finally, unlike unblinded approaches to sample size adjustment, this method has no way to detect that the trial might be able to stop early for efficacy. 
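To see the size of the missed opportunity, compare the planned sample size with what an underperforming effect would actually require. This is an illustrative calculation using a standard two-sample z-test formula; the specific effect sizes are hypothetical:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, s2=1.0, alpha=0.05, power=0.9):
    """Two-sample z-test sample size per group for difference delta."""
    z = NormalDist().inv_cdf
    return ceil(2 * s2 * (z(1 - alpha / 2) + z(power)) ** 2 / delta ** 2)

# Planned for a difference of 0.5; the drug actually delivers 0.35
# (underperforming, but plausibly still "promising"). The variance is
# on target, so a blinded check would leave n alone:
print("planned n per group:", n_per_group(0.5))
print("actually needed:    ", n_per_group(0.35))
```

With the variance exactly as assumed, an effect of 0.35 instead of 0.5 roughly doubles the sample size actually needed for 90% power, yet the blinded variance check sees nothing to adjust.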

The bottom line: 

Blinded variance estimation generally does what it says: it recovers the variance of the treatment difference from the pooled variance reasonably well. However, it cannot respond to the more common scenario in clinical trials where the treatment underperforms, and it risks wasting resources when the treatment overperforms. Given how small the alpha penalty is under unblinded approaches, unblinded methods are typically the better choice for late-phase clinical trials, where an investment in increasing the sample size is most relevant. 

Need help choosing the right interim analysis for your trials? Contact us to speak with one of our Biometrics Regulatory Experts. 

References:

Gould, A. L., & Shih, W. J. (1992). Sample size re-estimation without unblinding for normally distributed outcomes with unknown variance. Communications in Statistics – Theory and Methods, 21(10), 2833–2853.  

Mehta, C. R., & Pocock, S. J. (2011). Adaptive increase in sample size when interim results are promising: a practical guide with examples. Statistics in Medicine, 30(28), 3267–3284. 

Bridget Daly, MS, JD, Senior Biostatistician, has eight years of experience providing statistical support for Phase 1-4 clinical trials, with six years of experience providing statistical support for integrated analyses (ISS/ISE). Ms. Daly has supported three NDA submissions, including work on pivotal clinical trials and continued post-submission support through FDA review. Ms. Daly’s experience includes consulting on protocol development, writing detailed statistical analysis plans, preparing CDISC-compliant specifications for analysis (ADaM) databases, creating CDISC-compliant submissions packages, specifying and performing statistical analyses, preparing displays and reports, communicating with sponsors, and providing safety evaluations for data monitoring committees. She has also provided support for a face-to-face FDA meeting with the Division of Anesthesiology, Addiction Medicine, and Pain Medicine (DAAP). She has collaborated with clinical researchers and medical writers on a diverse selection of indications, including acute pain, fibromyalgia, chronic kidney disease-associated pruritus, alopecia, and hyperhidrosis.