AI Uncovers Hidden Self-Harm Histories in Medical Records

As a leading figure in biopharma innovation and research, Ivan Kairatov has spent years navigating the intersection of technology and patient care. His expertise lies in transforming complex data sets into actionable clinical insights, particularly within high-stakes environments like mental health and drug development. In this conversation, we explore the systemic challenges of tracking patient history, the hidden patterns within medical records that even experienced clinicians might overlook, and the revolutionary machine learning methods being used to bridge the gap between documented data and clinical reality.

Standard diagnosis codes capture roughly one-fourth of clinically documented self-harm histories. How does this significant visibility gap alter our understanding of patient needs and the overall planning for mental health services?

When we look at the data from more than 1.3 million veterans, the discrepancy is staggering and, frankly, sobering for those of us in health informatics. Diagnosis codes showed a self-harm prevalence of only 1.85%, while our deeper analysis put the true estimated figure closer to 7.9%, more than four times the visible rate. In other words, for every patient we “see” through standard billing codes, there are roughly three others whose histories remain essentially invisible to high-level administrative searches. From a systems-level perspective, if we count only what is easy to find in a database, we substantially underestimate the actual demand for life-saving mental health resources. This isn’t just a statistical error; it is a fundamental planning failure that leaves clinicians and health systems ill-equipped to support those at the highest risk.
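
The arithmetic behind these figures can be made explicit with a minimal Python sketch. It uses only the two percentages quoted in the answer above; the variable names are illustrative and not taken from the study.

```python
# Figures quoted in the interview (as fractions of the cohort).
coded_prevalence = 0.0185      # self-harm prevalence visible in diagnosis codes
estimated_prevalence = 0.079   # estimated true prevalence

# Share of the estimated cases that diagnosis codes actually capture.
capture_fraction = coded_prevalence / estimated_prevalence

# How many times larger the estimated rate is than the coded rate.
ratio = estimated_prevalence / coded_prevalence

# Estimated uncoded patients for every coded patient.
hidden_per_coded = ratio - 1

print(f"captured by codes: {capture_fraction:.1%}")      # 23.4%
print(f"estimated / coded: {ratio:.2f}x")                # 4.27x
print(f"uncoded per coded patient: {hidden_per_coded:.2f}")  # 3.27
```

The 23.4% capture fraction is the "roughly one-fourth" in the question, and the 3.27 uncoded patients per coded patient is the "three others" in the answer.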

With some electronic health records containing more than 500,000 lines of notes, it is impossible for a clinician to review everything during a standard visit. In what ways does this information overload contribute to what you describe as a “systems-level visibility problem”?

The sheer volume of data in modern healthcare has become a double-edged sword: the most critical details are often buried under a mountain of routine documentation. Imagine a clinician trying to prepare for a twenty-minute appointment while facing a record that spans half a million lines; it is a physical and cognitive impossibility to find every nuanced mention of past self-harm or behavioral risk. The information technically “exists” within the system, but it is functionally lost because it isn’t surfaced in a way that a human can process in real time. We call this a systems-level visibility problem because the failure doesn’t lie with the individual doctor’s diligence, but with how the record is structured and summarized. Without better tools to distill these massive histories, the most important predictors of a patient’s future well-being remain hidden in plain sight.

You utilized a novel machine learning method called PULSNAR to address these inconsistencies. Could you explain how this approach handles the “messy” nature of real-world medical records compared to traditional models?

Traditional machine learning usually requires a “gold standard” where every case is clearly labeled as a “yes” or a “no,” but medical data is rarely that clean because a missing code does not equate to the absence of a condition. PULSNAR, which stands for Positive Unlabeled Learning Selected Not At Random, was specifically designed to thrive in this environment of uncertainty. It starts by learning from the patients who definitely have a code and then calculates the probability that similar patterns exist among the “unlabeled” patients who lack that specific tag. By acknowledging that certain cases are more likely to be coded than others, the model can estimate the true prevalence of a condition without making the false assumption that uncoded patients are healthy. This allows us to account for the “noisy” reality of clinical practice, where a patient might have all the risk factors and behaviors associated with self-harm, even if a formal diagnosis was never entered into the system.
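
To illustrate why uneven coding matters, here is a toy Python sketch. It is not PULSNAR. It is a much simpler correction that assumes every true case is equally likely to be coded (often called the "selected completely at random" assumption), applied to two invented cohorts. The function names, strata, and patient counts are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from collections import defaultdict


def make_records(stratum, n_patients, n_coded):
    """Build (stratum, coded) pairs: the first n_coded patients carry a code."""
    return [(stratum, i < n_coded) for i in range(n_patients)]


def scar_prevalence_estimate(records):
    """Estimate true prevalence from coded/uncoded records, assuming one
    uniform coding rate for all true cases.

    Hypothetical simplification: the stratum with the highest coded fraction
    is assumed to consist entirely of true positives, so its coded fraction
    approximates the coding rate c = P(coded | truly positive).
    """
    totals = defaultdict(int)
    coded = defaultdict(int)
    for stratum, is_coded in records:
        totals[stratum] += 1
        coded[stratum] += int(is_coded)
    coded_fraction = {s: coded[s] / totals[s] for s in totals}
    c = max(coded_fraction.values())
    observed_prevalence = sum(coded.values()) / len(records)
    return observed_prevalence / c


# Toy cohort A (invented): every true case is coded at the same 40% rate.
#   high_risk: 100 patients, all truly positive, 40 coded
#   low_risk:  900 patients, 50 truly positive, 20 coded
cohort_a = make_records("high_risk", 100, 40) + make_records("low_risk", 900, 20)

# Toy cohort B (invented): same true cases, but low-risk cases coded at 10%.
cohort_b = make_records("high_risk", 100, 40) + make_records("low_risk", 900, 5)

true_prevalence = 150 / 1000  # 150 true cases in both toy cohorts
estimate_a = scar_prevalence_estimate(cohort_a)
estimate_b = scar_prevalence_estimate(cohort_b)

print(f"true prevalence:          {true_prevalence:.3f}")
print(f"estimate, uniform coding: {estimate_a:.3f}")
print(f"estimate, uneven coding:  {estimate_b:.3f}")
```

In cohort A the simple correction recovers the true 15% prevalence. In cohort B, where low-risk cases are coded less often, the same correction lands near 11% and still undercounts. That bias is the kind of "not at random" coding the answer above describes PULSNAR as designed to handle.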

Even when self-harm appeared in diagnosis codes, it was missing from the problem list in nearly 78% of cases. Why is the problem list failing to serve its purpose as a primary flag for clinical teams?

The problem list is intended to be a concise, high-level summary of a patient’s most vital conditions, yet our study found that only 22.6% of veterans with a self-harm diagnosis actually had it listed there. This tells us that even when a condition is recognized and coded for billing or administrative reasons, it often fails to migrate to the summary fields that clinicians rely on most during active care. It highlights a massive disconnect in how data is maintained: problem lists are often updated inconsistently, or are managed by multiple providers who may not feel responsible for documenting mental health history. When more than three out of four patients with a known history of self-harm have that information missing from their primary summary, the safety net we rely on to flag suicide risk is effectively broken. This fragmentation forces clinicians to hunt through the record, which, as we’ve discussed, is a losing battle when faced with hundreds of thousands of lines of notes.

Past self-harm is a primary predictor of future risk, but it also influences how we treat conditions like PTSD, depression, and traumatic brain injury. How does a more complete record change the clinical approach to these interconnected issues?

A patient’s history of self-harm isn’t just an isolated data point; it is a lens through which we must view their entire clinical profile, including depression, PTSD, bipolar disorder, and substance use. When a care team has a complete picture, they can make much more informed decisions about medication safety, the intensity of follow-up care, and the specific therapeutic interventions that might be required. For instance, knowing a patient has a history of self-harm might change how a clinician manages a traumatic brain injury or how they prioritize social work involvement. Without this context, we are essentially treating the symptoms in a vacuum, which increases the risk of overlooking the underlying volatility that could lead to a crisis. Having that documented history visible and quantified allows for a holistic approach that acknowledges the complex interplay between mental health and physical trauma.

The PULSNAR method has already been applied to detect under-coded opioid use disorder. What other areas of medicine do you believe are most in need of this kind of “hidden data” discovery?

Our work with opioid use disorder and self-harm is really just the tip of the iceberg, as we are already extending this methodology to conditions like sleep disorders, unrecognized PTSD, and bipolar disorder. These are areas where the medical record often shows an incomplete or “blurry” picture, either due to the stigma surrounding the diagnosis or the complexity of the symptoms. We are particularly interested in identifying patterns of behavior or secondary clinical indicators that point toward a condition long before a formal code is ever applied. By finding these “missing” cases at scale, we can help health systems better estimate the true burden of disease and identify specific patient records that warrant a much closer manual review by a specialist. The ultimate goal is to move beyond just tracking what is coded and toward a model where the data itself helps us find the patients who are falling through the cracks of standard documentation.

What is your forecast for the role of machine learning in bridging the gap between clinical notes and formal diagnosis codes over the next decade?

I believe that over the next ten years, machine learning will transition from a retrospective research tool to a real-time “clinical co-pilot” that continuously audits the electronic health record for inconsistencies. We will see systems that don’t just store data but actively interpret it, alerting a clinician when a patient’s behavioral notes or medication history suggest a condition like self-harm that hasn’t been officially coded yet. This will move us away from the current “systems-level visibility problem” toward a more transparent environment where the 500,000 lines of text are no longer a barrier but a source of proactive insight. Ultimately, the integration of tools like PULSNAR will ensure that a patient’s story, no matter how deeply it is buried in the notes, is always present and accounted for at the point of care, which is the only way we can truly deliver personalized and safe mental health services.
