Statisticians and Critical Variable Review Help Streamline Data Management and Clinical Operations Activities
October 24, 2016
Heather Kopetskie, MS, is a Senior Biostatistician at Rho. She has over 10 years of experience in statistical planning, analysis, and reporting for Phase 1, 2 and 3 clinical trials and observational studies. Her research experience includes over 8 years focusing on solid organ and cell transplantation through work on the Immune Tolerance Network (ITN) and Clinical Trials in Organ Transplantation (CTOT) project. In addition, Heather serves as Rho’s biostatistics operational service leader, an internal expert sharing biostatistical industry trends, best practices, processes and training.
Assessing risk and using it to determine the focus of clinical trial activities has been an important goal in clinical research for a number of years now. One way statisticians can contribute to this is through critical variable review. In critical variable review, statisticians map the case report form (CRF) to the primary and secondary endpoints.
This review is important for several reasons and affects all team members managing the clinical data. First, it ensures up front that all of the data needed for the planned analyses are being collected. While this seems obvious, in an unfortunate number of cases it is not until the end of the study that teams discover that some of the information needed was never collected. Second, data managers can incorporate critical variables into the data management plan to focus edit checks, cross checks, and data cleaning activities on forms containing critical variables. Third, clinical monitors can focus the clinical monitoring plan on critical variables and ensure key variables are reviewed not only during on-site visits but also during remote data monitoring.
Statisticians, data managers, and clinical monitors should start reviewing data as early as possible after first patient first visit, with a focus on critical variables. The initial focus should be a report showing, for each critical variable, how many subjects are expected to have the data, how many have completed data entry, and how many are missing the variable. These reports can be used to identify issues early in a trial and determine how to address them, for example through site retraining, process changes when an assessment isn’t standard of care at a site, or handling of protocol deviations resulting from missing data. As the study progresses, descriptive statistics can be computed on the critical variables for investigators to review and confirm that the study is progressing as expected without unblinding the study.
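As a rough illustration of the early report described above, here is a minimal Python sketch that counts expected, entered, and missing values for each critical variable. The function name, the record layout (one dictionary per subject), and the variable names `biopsy_date` and `egfr` are hypothetical and chosen only for this example. In practice the expected-subject sets would come from the visit schedule and the data would come from an EDC export.

```python
def completeness_report(records, critical_vars, expected):
    """Summarize data entry status for each critical variable.

    records:       list of dicts, one per subject (assumed layout)
    critical_vars: names of the critical variables to report on
    expected:      dict mapping variable -> set of subject IDs that
                   should have a value by now
    """
    report = {}
    for var in critical_vars:
        expected_ids = expected[var]
        # A value counts as entered if it is present and not blank.
        entered = sum(
            1
            for rec in records
            if rec["subject_id"] in expected_ids
            and rec.get(var) not in (None, "")
        )
        report[var] = {
            "expected": len(expected_ids),
            "entered": entered,
            "missing": len(expected_ids) - entered,
        }
    return report


# Made-up example data, for illustration only.
records = [
    {"subject_id": "001", "biopsy_date": "2016-03-01", "egfr": 62.0},
    {"subject_id": "002", "biopsy_date": "", "egfr": 58.5},
    {"subject_id": "003", "biopsy_date": None, "egfr": None},
]
expected = {
    "biopsy_date": {"001", "002", "003"},
    "egfr": {"001", "002"},  # subject 003 is not yet due for this lab
}

report = completeness_report(records, ["biopsy_date", "egfr"], expected)
for var, counts in report.items():
    print(var, counts)
```

The sketch assumes one record per subject. A real study with repeated visits would need the same counts broken out by visit.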
As data management evolves from data primarily being collected in an EDC system to data being collected from multiple sources such as EDC, ePRO systems, electronic health records, and central laboratories, additional strategies need to be implemented to ensure a clean, integrated database for analysis. Instead of data managers doing all of the data cleaning, data managers, programmers, statisticians, and clinical monitors will need to collaborate. All members of the team should meet regularly to discuss progress and develop tools that will facilitate cleaning across multiple data sources. Below we outline a few strategies we’ve piloted for collaboratively reviewing data soon after database launch. Early looks at the data can provide a sense of how sites are entering data, and dealing with issues as they arise will prevent a large volume of dirty data at the end of the study.
One strategy is an evaluation of all free text fields completed in the database. Sites may be entering data in the wrong place or collecting data that is not needed, both of which can be fixed through site retraining. This review can also highlight fields that need to be added to the CRFs or other updates that are needed.
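A free text review can start from a simple listing of every non-blank free text entry, grouped by field, so reviewers can scan for misplaced or unnecessary data. The following is a minimal sketch. The function name, the record layout, and the field names `ae_comment` and `other_specify` are hypothetical.

```python
from collections import defaultdict


def free_text_listing(records, free_text_fields):
    """Group non-blank free text entries by field for manual review.

    Returns a dict mapping field -> list of (site, subject_id, text).
    """
    listing = defaultdict(list)
    for rec in records:
        for field in free_text_fields:
            # Treat None and whitespace-only entries as blank.
            text = (rec.get(field) or "").strip()
            if text:
                listing[field].append((rec["site"], rec["subject_id"], text))
    return dict(listing)


# Made-up example data, for illustration only.
records = [
    {"site": "A", "subject_id": "001",
     "ae_comment": "Dose held per PI", "other_specify": ""},
    {"site": "A", "subject_id": "002",
     "ae_comment": "   ", "other_specify": "Weight 70 kg"},
    {"site": "B", "subject_id": "003",
     "ae_comment": None, "other_specify": None},
]

listing = free_text_listing(records, ["ae_comment", "other_specify"])
for field, entries in listing.items():
    for site, subject_id, text in entries:
        print(field, site, subject_id, text)
```

In this made-up data, the `other_specify` entry contains a weight, which is the kind of misplaced value the review is meant to surface.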
Another strategy is code book review. A code book is a file that provides descriptive statistics on all fields in the EDC system and can be reviewed by all members of the study team. This is an easy way to identify outliers by data field and site-to-site differences. (Codebook examples and macros are available on GitHub.)
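To make the idea concrete, here is a minimal Python sketch of a code book for numeric fields, summarized by site. It is not the macro set referenced above. The function name, the record layout, and the field name `weight_kg` are assumptions made for illustration, and only the standard library is used.

```python
import statistics
from collections import defaultdict


def codebook(records, numeric_fields, site_key="site"):
    """Return one summary row per (field, site) for numeric fields."""
    values = defaultdict(list)  # (field, site) -> non-missing values
    totals = defaultdict(int)   # (field, site) -> number of records
    for rec in records:
        for field in numeric_fields:
            key = (field, rec[site_key])
            totals[key] += 1
            if rec.get(field) is not None:
                values[key].append(rec[field])

    rows = []
    for key in sorted(totals):
        vals = values[key]
        rows.append({
            "field": key[0],
            "site": key[1],
            "n": len(vals),
            "missing": totals[key] - len(vals),
            "min": min(vals) if vals else None,
            "median": statistics.median(vals) if vals else None,
            "max": max(vals) if vals else None,
        })
    return rows


# Made-up example data, for illustration only.
records = [
    {"site": "A", "weight_kg": 70},
    {"site": "A", "weight_kg": 72},
    {"site": "A", "weight_kg": 700},  # likely a data entry error
    {"site": "B", "weight_kg": 65},
    {"site": "B", "weight_kg": None},
]

rows = codebook(records, ["weight_kg"])
for row in rows:
    print(row)
```

Here the maximum of 700 at site A stands out against a median of 72, which is the kind of outlier a code book review is meant to catch.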
Statisticians and programmers can also compile data across multiple sources to identify which data fields are missing (e.g., ePRO not entered), which information doesn’t reconcile (e.g., biopsy date in EDC versus the specimen collection system), and which deviations may be expected based on data sources outside of EDC. They can then provide one succinct report to the data managers, who can use it when communicating with the site to reconcile and update data.
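A cross-source check of this kind can be sketched in a few lines of Python. The example below compares a single field between an EDC extract and a specimen collection extract, and returns one list of discrepancies. The function name, the data layout (dictionaries keyed by subject ID), and the field name `biopsy_date` are hypothetical.

```python
def reconcile(edc, specimen, field):
    """Compare one field between two sources keyed by subject ID.

    Returns a list of (subject_id, issue) tuples, sorted by subject ID.
    """
    issues = []
    for sid in sorted(set(edc) | set(specimen)):
        if sid not in specimen:
            issues.append((sid, "missing in specimen system"))
        elif sid not in edc:
            issues.append((sid, "missing in EDC"))
        elif edc[sid].get(field) != specimen[sid].get(field):
            issues.append((
                sid,
                f"{field} mismatch: EDC={edc[sid].get(field)} "
                f"vs specimen={specimen[sid].get(field)}",
            ))
    return issues


# Made-up example data, for illustration only.
edc = {
    "001": {"biopsy_date": "2016-03-01"},
    "002": {"biopsy_date": "2016-03-05"},
    "003": {"biopsy_date": "2016-03-09"},
}
specimen = {
    "001": {"biopsy_date": "2016-03-01"},
    "002": {"biopsy_date": "2016-03-06"},
    "004": {"biopsy_date": "2016-03-12"},
}

issues = reconcile(edc, specimen, "biopsy_date")
for subject_id, issue in issues:
    print(subject_id, issue)
```

The sketch assumes dates are stored in the same string format in both sources. Real extracts would usually need dates parsed and subject identifiers harmonized before comparison.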
Additionally, constant communication between team members can bring to light common themes that clinical monitors are seeing during their visits, data managers are seeing through queries, and statisticians are observing during data preparation. This allows for early action, which can minimize the time spent at the end of the study cleaning the data and locking the database.
One thing that has become abundantly clear is that a risk-based approach to clinical trials requires close collaboration between disciplines. Data managers, clinical monitors, and statisticians must work together in ways they have not in the past. Traditional models that rely on functionally aligned silos will not allow risk-based approaches to succeed.