Phen2Test: Enhancing Genetic Testing Decisions for Pediatric Rare Disorders

Rare diseases, though individually uncommon, collectively impact millions of people worldwide. Most of these conditions manifest in childhood, with genetic factors being the primary cause in about 80% of cases. Children with rare genetic disorders often endure prolonged diagnostic journeys characterized by numerous specialist visits, misdiagnoses, and the resulting anxiety and financial strain. A genetic diagnosis is crucial for understanding the disease’s cause, informing prognosis, optimizing management plans, and establishing appropriate support networks.

The genetic testing pathway in common use today is tiered. For children with developmental delays, it starts with Chromosomal Microarray (CMA) and testing for Fragile X syndrome. If these first-tier tests are inconclusive, second-tier gene panels targeting multiple genes based on clinical symptoms are used. When these also fail to yield a diagnosis, the third-tier tests, exome sequencing or genome sequencing (ES/GS), are employed, providing higher diagnostic yields albeit at higher costs.

In conventional practice, general pediatricians must refer patients to genetic specialists for evaluation and test ordering, which leads to delays. The choice between gene panels and ES/GS is complex, and pediatricians with limited genetic training face significant challenges in making it. The Phen2Test model in this study aims to support these pediatricians by predicting a geneticist’s test choice, potentially expediting the diagnostic process.

Introduction and Background

The Burden of Rare Diseases

Rare diseases collectively affect millions of individuals worldwide, with a significant proportion presenting in childhood. Genetic factors are the primary cause in about 80% of these cases. The diagnostic journey for children with rare genetic disorders is often long and arduous, involving numerous specialist visits, misdiagnoses, and significant emotional and financial strain on families. A timely genetic diagnosis is essential for understanding the disease’s cause, guiding treatment, and providing appropriate support.

Timely diagnosis helps reduce unnecessary testing and hospital visits, thereby alleviating the anxiety and uncertainty faced by the families of affected children. Genetic information also plays a pivotal role in identifying potential treatment options, participating in clinical trials, and connecting with patient support groups. The establishment of a clear genetic diagnosis is not just beneficial for the individual patient but also critical for advancing our understanding of these rare conditions and contributing to the broader field of genomic medicine.

Current Genetic Testing Pathways

The current genetic testing pathway for pediatric rare disorders typically follows a tiered approach. Initial tests include Chromosomal Microarray (CMA) and Fragile X syndrome testing for children with developmental delays. If these tests are inconclusive, second-tier gene panels targeting multiple genes based on clinical symptoms are used. When these also fail to yield a diagnosis, third-tier tests such as whole-exome sequencing (ES) or whole-genome sequencing (GS) are employed, offering higher diagnostic yields but at a higher cost.

The tiered approach is designed to manage costs and improve diagnostic efficiency. CMA, the first step, is relatively inexpensive and can detect large genetic anomalies. Gene panels are more comprehensive and target specific genes linked to the patient’s symptoms. However, they may miss rare or novel mutations. ES and GS provide a broader examination of the patient’s genome, increasing the likelihood of identifying rare genetic disorders but are more costly. This tiered strategy is effective in a high-resource setting but often results in patients undergoing multiple tests over extended periods, delaying definitive diagnosis and treatment.

Challenges for General Pediatricians

In general practice, primary care pediatricians often encounter significant challenges when making decisions about genetic testing due to their limited genetic training. The complex choice between initiating gene panels versus opting directly for whole-exome or whole-genome sequencing (ES/GS) typically necessitates referrals to genetic specialists, resulting in further delays. Phen2Test, the model developed in this study, aims to bridge this expertise gap by predicting the type of genetic test a specialist would likely recommend, thereby potentially expediting the diagnostic process.

Pediatricians may face difficulties interpreting genetic test results or understanding the clinical implications of various genetic abnormalities. Additionally, the time and resource constraints in general practice settings further hinder effective decision-making. By employing the Phen2Test model, pediatricians can make more informed genetic testing recommendations, reduce the time to diagnosis, and minimize the burden on families. This model not only aids in clinical decision-making but also helps optimize resource utilization in healthcare facilities by reducing unnecessary specialist referrals and repeated tests.

Methodology and Model Development

Data Collection and Curation

The study involved substantial data collection and preprocessing from electronic health records (EHRs) at CUIMC and CHOP. The initial cohorts comprised individuals who underwent genetic testing advised by geneticists. Tests not conducted for diagnostic purposes and individuals aged 19 and older were excluded. Genetic test orders were categorized as gene panel only, gene panel followed by ES/GS, or direct ES/GS, with the latter two categories grouped as ES/GS for the study’s purposes. Historical test decisions were adjusted to align with contemporary ACMG recommendations, which advocate for ES/GS in cases involving congenital anomalies and developmental disorders.

A comprehensive dataset was pivotal to model development, which required meticulous curation to ensure accuracy and relevance. Data included detailed patient demographics, clinical symptoms, and genetic test outcomes. To conform with current standards, historical data had to be realigned with up-to-date guidelines, reflecting the ACMG’s latest recommendations. This alignment process was crucial for the model to predict genetic testing recommendations accurately based on real-world contemporary clinical practices.
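The curation rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's pipeline: the `GeneticTestOrder` record, its field names, the category strings, and the way the ACMG realignment is expressed as a single boolean flag are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical category names. The source only states that orders
# ending in ES/GS were grouped together as the ES/GS class.
ES_GS_CATEGORIES = {"es_gs_direct", "panel_then_es_gs"}


@dataclass
class GeneticTestOrder:
    patient_id: str
    age_years: float
    purpose: str            # e.g. "diagnostic" or "carrier_screening"
    category: str           # "panel_only", "panel_then_es_gs", "es_gs_direct"
    acmg_indication: bool   # assumed flag: congenital anomaly or developmental disorder


def curate(orders):
    """Apply the inclusion criteria and attach a binary label (1 = ES/GS, 0 = gene panel)."""
    labeled = []
    for order in orders:
        if order.purpose != "diagnostic":
            continue  # keep diagnostic tests only
        if order.age_years >= 19:
            continue  # pediatric cohort: drop individuals aged 19 and older
        # Assumed realignment: an ACMG indication overrides a historical panel order.
        is_es_gs = order.category in ES_GS_CATEGORIES or order.acmg_indication
        labeled.append((order.patient_id, 1 if is_es_gs else 0))
    return labeled


orders = [
    GeneticTestOrder("p1", 2.0, "diagnostic", "es_gs_direct", False),
    GeneticTestOrder("p2", 25.0, "diagnostic", "panel_only", False),
    GeneticTestOrder("p3", 7.0, "carrier_screening", "panel_only", False),
    GeneticTestOrder("p4", 10.0, "diagnostic", "panel_only", False),
    GeneticTestOrder("p5", 4.0, "diagnostic", "panel_only", True),
    GeneticTestOrder("p6", 12.0, "diagnostic", "panel_then_es_gs", False),
]
cohort = curate(orders)
```

In this toy cohort, `p2` and `p3` are dropped by the inclusion criteria, and `p5` is relabeled as ES/GS because of the assumed ACMG flag.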

Feature Engineering

For model training, features were extracted from both structured and unstructured EHR data. From structured data, condition concepts were mapped to phecodes and aggregated into frequency counts. Demographic information, including age, sex, and race, was also collated. From unstructured clinical notes, phenotypic features matching phecode terms were extracted, incorporating a negation detection model to exclude negative mentions. Additional features included counts of clinical notes to reflect overall healthcare utilization.

This process ensured that the model could capture a comprehensive picture of each patient’s clinical and demographic profile. The inclusion of phenotypic features from clinical notes was particularly important as it provided a more detailed representation of the patient’s health status than structured data alone could offer. By using a negation detection model, the influence of irrelevant or incorrect information was minimized, enhancing the overall quality of the input data. The additional features reflecting healthcare utilization provided context on the intensity and frequency of medical interactions, which could be indicative of more complex cases requiring ES/GS.
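A minimal sketch of this feature construction follows, assuming an upstream step has already mapped condition concepts to phecodes and tagged each note mention as negated or not. The mapping table, the placeholder phecode names, and the function names are hypothetical and stand in for whatever the study actually used.

```python
from collections import Counter

# Hypothetical concept-to-phecode mapping; real mappings are far larger.
CONCEPT_TO_PHECODE = {"c1": "PHE_A", "c2": "PHE_A", "c3": "PHE_B"}


def structured_counts(condition_concepts):
    """Aggregate structured condition concepts into phecode frequency counts."""
    counts = Counter()
    for concept in condition_concepts:
        phecode = CONCEPT_TO_PHECODE.get(concept)
        if phecode is not None:  # unmapped concepts are ignored
            counts[phecode] += 1
    return counts


def note_counts(mentions):
    """Count phecode mentions from notes, skipping negated ones.

    `mentions` is a list of (phecode, is_negated) pairs, assumed to come
    from an NLP extractor combined with a negation detection model.
    """
    return Counter(phecode for phecode, is_negated in mentions if not is_negated)


def build_feature_row(concepts, mentions, n_notes, vocabulary):
    """Concatenate structured counts, note-derived counts, and the note count."""
    structured = structured_counts(concepts)
    from_notes = note_counts(mentions)
    row = [structured[p] for p in vocabulary]
    row += [from_notes[p] for p in vocabulary]
    row.append(n_notes)  # proxy for overall healthcare utilization
    return row


vocabulary = ["PHE_A", "PHE_B"]
row = build_feature_row(
    concepts=["c1", "c2", "c1", "c9"],
    mentions=[("PHE_B", False), ("PHE_A", True), ("PHE_B", False)],
    n_notes=5,
    vocabulary=vocabulary,
)
```

Here the negated mention of `PHE_A` contributes nothing to the note-derived counts, which is the effect the negation detection step is meant to have.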

Model Training and Optimization

Multiple feature sets were prepared and used to train Logistic Regression, Random Forest, and XGBoost classifiers. A nested cross-validation approach was employed for hyperparameter tuning and model evaluation. Techniques like SMOTE and class weight adjustments were used to manage class imbalance, while Principal Component Analysis (PCA) was tested for feature reduction. The model achieving the highest average AUPRC on the validation set using structured EHR data and demographics was selected as the optimal model, referred to as Phen2Test.

The various algorithms and techniques employed in the training process were aimed at ensuring the model’s robustness and accuracy. Logistic Regression provided a simple yet effective baseline, while Random Forest and XGBoost offered more sophisticated approaches capable of capturing complex patterns in the data. Cross-validation ensured that the model’s performance generalized well to unseen data, reducing the risk of overfitting. Managing class imbalance was critical since the proportion of cases recommended for ES/GS versus gene panels could skew the model’s predictions. By evaluating different feature sets and applying techniques like PCA, the model could maintain high performance while being computationally efficient.
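The nested cross-validation idea can be sketched with scikit-learn as follows. This is an illustrative setup, not the study's configuration: the data are synthetic, the fold counts and hyperparameter grid are arbitrary, and class weighting is shown in place of SMOTE (which would require the separate imbalanced-learn package).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic, imbalanced stand-in for the phecode feature matrix (assumption).
X, y = make_classification(
    n_samples=200, n_features=20, weights=[0.3, 0.7], random_state=0
)

# Inner loop tunes hyperparameters; outer loop estimates performance.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)

search = GridSearchCV(
    estimator=RandomForestClassifier(class_weight="balanced", random_state=0),
    param_grid={"n_estimators": [25, 50], "max_depth": [3, None]},
    scoring="average_precision",  # average precision as the AUPRC estimate
    cv=inner_cv,
)

# Each outer fold refits the whole grid search on its training portion,
# so the held-out fold never influences hyperparameter selection.
outer_scores = cross_val_score(
    search, X, y, scoring="average_precision", cv=outer_cv
)
mean_auprc = float(outer_scores.mean())
```

Keeping tuning inside the inner loop is what lets the outer-fold average serve as a less optimistic performance estimate than a single tuned-and-tested split.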

Results and Performance Evaluation

Data Insights

Data for 1005 individuals from CUIMC indicated that those recommended for ES/GS were younger on average than those recommended for gene panels. The external validation cohort from CHOP consisted of 997 individuals, with similar proportions recommended for ES/GS versus gene panel testing. This comparison underscored the consistency of phenotype-driven recommendations across institutions with potentially varied clinical practices.

Analyzing the data revealed significant patterns that informed the model’s predictions. Younger patients were more frequently recommended for ES/GS, likely due to the higher suspicion of congenital or early-onset genetic conditions. This pattern held true across different cohorts, indicating a robust trend that the model could leverage. Additionally, the similar proportions of ES/GS recommendations in external validations suggested that the Phen2Test model’s recommendations were aligned with broader clinical practices, further validating its reliability.

Model Performance

The optimal Random Forest model, which used structured phecode-based features, significantly outperformed models built on HPO-based and combined feature sets, achieving an AUROC of 0.823 and an AUPRC of 0.918 on internal testing. Features such as neurological, genetic, and congenital abnormalities were among the most predictive of ES/GS recommendations. The model was further validated against an expert-determined ground truth, where it performed comparably to genetic specialists and significantly better than a general pediatrician.

The high performance of the Random Forest model demonstrated its ability to accurately distinguish between cases warranting ES/GS and those for gene panels. This performance was particularly notable given the complex nature of the data and the variability in clinical presentations. By focusing on key phenotypic features, the model could effectively prioritize cases for advanced genetic testing. The validation against expert recommendations provided additional confidence in the model’s utility, suggesting that it could serve as a reliable decision support tool in clinical practice.
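For readers less familiar with the two reported metrics, the sketch below computes AUROC and AUPRC (as average precision) from scratch on a handful of made-up scores. The data are purely illustrative and unrelated to the study's results; the function names are my own.

```python
def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / true_positives


# Toy example: 1 = ES/GS recommended, 0 = gene panel recommended.
labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]

toy_auroc = auroc(labels, scores)              # 5 of 6 pairs ordered correctly
toy_auprc = average_precision(labels, scores)  # (1 + 1 + 0.75) / 3
```

One point worth keeping in mind when reading an AUPRC value: a classifier that scores at random is expected to achieve an AUPRC close to the prevalence of the positive class, so the figure is best interpreted against that baseline rather than against 0.5.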

Sensitivity Analysis

The model was validated across different time periods and demographic subgroups. Performance improved in more recent years, likely reflecting evolving clinical practice and guideline adoption. The model’s robustness across demographic groups was also confirmed, with stable performance regardless of age, sex, or race. This consistency suggested that the Phen2Test model could be broadly applicable across diverse patient populations, enhancing its generalizability and utility.

Sensitivity analyses were crucial in confirming that the model’s predictions were reliable across varying contexts. By validating the model over different timeframes, the study could confirm that Phen2Test adapted well to changes in clinical practice and maintained its performance despite shifts in guidelines. Additionally, the model’s consistent performance across demographic groups highlighted its potential to minimize biases and ensure equitable decision-making in genetic testing recommendations. This robustness indicated that Phen2Test could be a valuable tool in various clinical settings, supporting diverse patient populations.
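A subgroup check of this kind can be sketched as a per-group AUROC computation. The records, group labels, and function names below are hypothetical, and the study may have used different metrics or groupings.

```python
from collections import defaultdict


def auroc(labels, scores):
    """Pairwise AUROC; returns None when a group lacks one of the classes."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        return None
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def auroc_by_subgroup(records):
    """records: iterable of (subgroup, true_label, predicted_score)."""
    grouped = defaultdict(lambda: ([], []))
    for subgroup, label, score in records:
        grouped[subgroup][0].append(label)
        grouped[subgroup][1].append(score)
    return {g: auroc(labels, scores) for g, (labels, scores) in grouped.items()}


# Made-up predictions tagged with a demographic attribute.
records = [
    ("F", 1, 0.9), ("F", 0, 0.3), ("F", 1, 0.4), ("F", 0, 0.5),
    ("M", 1, 0.8), ("M", 0, 0.2),
    ("U", 1, 0.7),  # single-class group: AUROC is undefined
]
per_group = auroc_by_subgroup(records)
```

The same function works for time-period stratification by passing the year or era of the test order as the subgroup key. Small subgroups give noisy estimates, so in practice such comparisons are usually reported with confidence intervals.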

Cost-Effectiveness Analysis

A brief cost-benefit analysis highlighted Phen2Test’s potential economic advantages. When applied to the CHOP cohort, Phen2Test demonstrated significant cost savings compared to both a tiered testing approach (gene panels followed by ES/GS if negative) and a direct ES/GS testing approach. This analysis suggested that using Phen2Test could optimize resource utilization by prioritizing cases with a high likelihood of benefiting from ES/GS, thus avoiding unnecessary intermediate tests and reducing overall healthcare costs.

The cost-effectiveness analysis underscored the practical benefits of integrating Phen2Test into clinical workflows. By reducing the need for multiple tests, the model could streamline the diagnostic pathway, saving not just costs but also valuable time for patients and their families. These savings could be substantial, given the high costs associated with advanced genetic testing and the potential for reducing hospital visits and associated healthcare expenses. The analysis reinforced the model’s potential as a cost-effective solution, enhancing its appeal to healthcare providers and payers.
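The comparison among the three strategies can be made concrete with a small calculation. Everything numeric here is a placeholder: the unit costs are arbitrary (not real prices), the five patients are invented, and the field `panel_negative` assumes we know whether a panel would have been non-diagnostic, which in a real analysis would have to be estimated.

```python
# Placeholder costs in arbitrary units (assumptions, not real prices).
PANEL_COST = 1.0
ES_GS_COST = 3.0


def cost_direct(patients):
    """Everyone receives ES/GS up front."""
    return ES_GS_COST * len(patients)


def cost_tiered(patients):
    """Everyone receives a panel; ES/GS follows only if the panel is negative."""
    total = 0.0
    for p in patients:
        total += PANEL_COST
        if p["panel_negative"]:
            total += ES_GS_COST
    return total


def cost_model_guided(patients):
    """Predicted ES/GS cases skip the panel; the rest follow the tiered path."""
    total = 0.0
    for p in patients:
        if p["predicted_es_gs"]:
            total += ES_GS_COST
        else:
            total += PANEL_COST
            if p["panel_negative"]:
                total += ES_GS_COST
    return total


patients = [
    {"predicted_es_gs": True, "panel_negative": True},
    {"predicted_es_gs": True, "panel_negative": True},
    {"predicted_es_gs": False, "panel_negative": False},
    {"predicted_es_gs": False, "panel_negative": False},
    {"predicted_es_gs": False, "panel_negative": True},
]

direct_total = cost_direct(patients)        # 5 * 3
tiered_total = cost_tiered(patients)        # 5 * 1 + 3 * 3
guided_total = cost_model_guided(patients)  # 3 + 3 + 1 + 1 + (1 + 3)
```

In this toy setup the model-guided path is cheapest because it avoids panels that would have been followed by ES/GS anyway. Whether that holds in practice depends on the actual cost ratio, the panel's diagnostic yield, and how often the model's predictions are wrong.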

Discussion

The use of clinical phenotypes from EHRs to guide genetic testing represents a substantial step forward in precision medicine for rare pediatric disorders. Phen2Test effectively integrates phenotypic data to inform genetic testing decisions, aligning closely with expert recommendations and offering potential cost savings. This aligns with the growing reliance on genomic technologies in clinical settings and the need for efficient decision support tools to bridge the expertise gap among general pediatricians.

The integration of phenotypic data from EHRs is a testament to the potential of leveraging big data and machine learning in healthcare. By accurately predicting genetic testing recommendations, Phen2Test can support more timely and precise diagnoses, ultimately improving patient outcomes. Furthermore, the model’s alignment with expert recommendations suggests it could serve as a valuable second opinion, empowering pediatricians to make more informed decisions. As genetic testing becomes increasingly integral to clinical care, tools like Phen2Test will be essential in optimizing these processes and ensuring that patients receive the most appropriate care.

Limitations and Future Directions

Despite its promising performance, the model’s applicability in lower-resource settings remains untested, and its exclusion of photo/video inputs limits its scope. Financial considerations and resource availability are real-world factors not accounted for by Phen2Test. The tool’s integration into routine practice also hinges on acceptance by clinicians and payers, necessitating further prospective validation studies.

Looking ahead, addressing these limitations will be crucial for maximizing the impact of Phen2Test. Future research should explore the model’s performance in diverse clinical environments, including lower-resource settings, to ensure its broad applicability. Incorporating additional data types, such as clinical images or videos, could further enhance the model’s predictive capabilities. Moreover, engaging with stakeholders, including clinicians and payers, will be vital for integrating Phen2Test into everyday clinical practice and realizing its full potential in improving genetic testing pathways for pediatric rare disorders.

Overall, the Phen2Test model represents a significant advance in pediatric rare disease diagnostics. It offers an efficient, cost-effective, and expert-aligned decision support system that could make genetic testing recommendations more timely and accurate. Faster and more accurate diagnoses can in turn lead to better patient outcomes and more efficient use of healthcare resources.
