Simplifying population health management and the identification of social determinants with natural language processing

June 20, 2019

By Elizabeth Marshall

The healthcare industry has come a long way in its appreciation of non-clinical factors impacting a patient’s overall well-being, such as social determinants of health (SDoH).

However, the industry has made less progress when it comes to gathering information on individual patients’ social determinants, analyzing the details and — most importantly — translating the findings into actionable information that healthcare organizations (HCOs) can use to improve population health management.

One reason for this lag is that SDoH information frequently isn’t readily available to payer and provider organizations when important treatment decisions are being made. Too often, clinicians are unaware of key SDoH information until after a patient’s health has been negatively affected.

Frustratingly, HCOs often already have SDoH data in their patient records but struggle to analyze it and use it for quality reporting because it is essentially trapped as unstructured data (e.g., free text within clinical notes). Indeed, it’s estimated that 80% of clinical data is unstructured and difficult to analyze. Clinician narratives, nurse notes, radiology reports, discharge summaries and patient-reported information can contribute a wealth of useful clinical information, but the details are rarely easy to access.

To unlock value from this unstructured data, more HCOs are looking to Natural Language Processing (NLP) technology, which makes unstructured data usable by automating the identification and extraction of key concepts from large volumes of clinical documentation. HCOs can then transform this information into structured data to guide more informed treatment decisions.

Why SDoH data is key for developing population health strategies
Before designing effective population health management programs, HCOs must first understand the health risks that their patient populations face. Once HCOs have stratified their population by level of risk, they can initiate evidence-based care plans and safety-net programs to improve outcomes for at-risk patients and implement preventive programs for all patients.

However, unless HCOs account for SDoH factors, the accuracy of risk assessments and patient stratification is compromised. That’s because, according to the Centers for Disease Control and Prevention, only 10 percent of factors affecting premature death are related to clinical care, and 30 percent of factors relate to genetics. This means that 60 percent of factors impacting premature death are based on a combination of social/environmental factors (20 percent) and behavior (40 percent).

For HCOs contemplating the launch of new population health programs, data analysis is often challenging due to the heterogeneous nature of patient-related data. Accessing and analyzing unstructured data is difficult without advanced technologies such as NLP – at least without resorting to expensive and time-consuming manual chart reviews.

However, the ability to automatically extract precise data from unstructured text is invaluable for HCOs that are transitioning to value-based contracting arrangements. By employing NLP, HCOs can look at structured and unstructured data together for a complete, 360-degree view of their patient populations. They can then identify and extract specific details to assess risk or improve population health. Additionally, NLP can enable clinicians to assess critical lifestyle details and specific behaviors of individual patients, such as smoking and alcohol consumption, and gain insights into living arrangements, access to care and mobility status.

Using NLP to improve diabetes population health
Consider how NLP can help an accountable care organization (ACO) better understand its patient population’s risk for diabetes, a condition that, along with prediabetes, affects more than 100 million Americans and exacts a high cost on the U.S. health system. The cost of care for people with diabetes averages about $16,752 a year and accounts for approximately 25% of healthcare dollars spent in the U.S., according to a study in Diabetes Care.

Diabetes risk is closely associated with social and economic factors, and is more common among non-white populations, with black, Hispanic, and Native American populations experiencing the disease at much higher rates than whites. An analysis of structured data can detail risk factors tied to weight, race and age, but is likely to miss additional risk factors that could be noted in free text within physicians’ notes.

By leveraging NLP to analyze patient records, however, the ACO could identify a host of additional risk factors impacting its patient population, such as limited access to proper medications and healthy foods, barriers to physical activity, high stress levels and social isolation.

Additionally, many early symptoms of diabetes appear in unstructured data, such as mentions of excessive thirst or hunger, frequent urination, fatigue and blurred vision. Further, laboratory values such as hemoglobin A1c and blood glucose levels are markers that may appear in free-text lab reports and be missed when relying solely on structured data.
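
To make this kind of extraction concrete, here is a minimal sketch in Python of pulling symptom mentions, SDoH cues and hemoglobin A1c values out of a free-text note. It is only an illustration under stated assumptions: the term lists, the category names and the function extract_findings are hypothetical, and simple phrase and pattern matching stands in for what a full clinical NLP system would do.

```python
import re

# Hypothetical, illustrative term lists; a production system would rely on
# curated clinical terminologies rather than a handful of phrases.
SYMPTOM_TERMS = [
    "excessive thirst",
    "excessive hunger",
    "frequent urination",
    "fatigue",
    "blurred vision",
]

# Hypothetical SDoH categories mapped to example phrases.
SDOH_TERMS = {
    "food_access": ["food insecurity", "cannot afford healthy food"],
    "medication_access": ["cannot afford medication", "ran out of insulin"],
    "social_isolation": ["lives alone", "socially isolated"],
    "transportation": ["no transportation", "no car"],
}

# Matches "HbA1c 7.2%", "A1c of 10.5", "hemoglobin A1c: 6.9 %", and similar.
A1C_PATTERN = re.compile(
    r"\b(?:hemoglobin\s+a1c|hba1c|a1c)\b[^0-9]{0,20}(\d{1,2}(?:\.\d)?)\s*%?",
    re.IGNORECASE,
)


def extract_findings(note: str) -> dict:
    """Return symptom mentions, SDoH categories and A1c values found in a note."""
    text = note.lower()
    symptoms = [term for term in SYMPTOM_TERMS if term in text]
    sdoh = sorted(
        category
        for category, phrases in SDOH_TERMS.items()
        if any(phrase in text for phrase in phrases)
    )
    a1c_values = [float(m.group(1)) for m in A1C_PATTERN.finditer(note)]
    return {"symptoms": symptoms, "sdoh": sdoh, "a1c_values": a1c_values}


if __name__ == "__main__":
    example_note = (
        "Pt reports frequent urination and blurred vision. "
        "Lives alone, no transportation to pharmacy. HbA1c 7.2% last month."
    )
    print(extract_findings(example_note))
```

The output is a small structured record per note that could be joined with the structured data an ACO already holds. Note what the sketch leaves out: it does not handle negation ("denies fatigue"), misspellings, abbreviations or context such as family history, which is where dedicated NLP technology earns its keep.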

Armed with this enriched picture of patient population risk, the ACO is much better positioned to develop a population health program that more precisely identifies patients’ care needs.

Social factors contribute significantly to individual patients’ health and well-being, but frequently aren’t accounted for in population health programs simply because HCOs don’t know what they don’t know about certain patients. NLP enables payers and providers to mine unstructured data for hidden insights, such as important SDoH information buried in free-text clinical notes, and advance their population health management efforts.

About the author: Elizabeth Marshall, MD, MBA, is the director of clinical analytics at Linguamatics.