Achilles’ heel for AI Failure in Drug Development!

by Applied Informatics

Of the many subsets of healthcare, none shows more exuberance over AI’s emergence than drug development. AI promises to solve pharma’s top two challenges: it can potentially shorten the drug development pipeline and cut expenses by over 50%. Biopharma, known for the massive operational costs of bringing a drug to market, is about to get a radical makeover, and CEOs are on the frontlines of the AI phenomenon.


But can AI deliver on its promises to drug developers? Most likely, but a monumental problem with potentially catastrophic consequences lurks behind the scenes and, unless resolved, stands in the way of AI’s success in accelerating drug development. What is this Achilles’ heel that can potentially undermine one of the greatest advancements for the pharmaceutical industry?


Pharmaceutical companies, big or small, rely extensively on their data to make strategic development decisions. But what is it about pharma data that threatens to undermine AI’s potential for solving the industry’s toughest problems?


In a recent Kaggle (a Google company) survey of its 1.3-million-member data science community, respondents were asked about the biggest barriers they face in operationalizing Machine Learning. The most common answer was “dirty data.” According to The Data Warehousing Institute (TDWI), dirty data ends up costing U.S. companies around $600 billion every year.


In pharma, that cost is far higher once risks to public health are factored into the equation. A sharp increase in FDA warning letters to drug companies citing inadequate data integrity underscores how widespread the problem appears to be in the industry.


“It’s a huge story right now,” says Barbara Unger, a former quality and regulatory affairs manager at Eli Lilly & Co. and Amgen who started a data integrity consulting firm in 2014. Data system remediation projects are underway across the industry, at drug firms and at the contract services firms that manufacture active ingredients and finish drugs for them. Efforts are both time-consuming and costly, Unger says. “And regulatory agencies are aware that it doesn’t happen overnight.”


As increased regulatory scrutiny drives demand for data integrity, pharmaceutical companies have become keenly aware of the barriers they must overcome to avoid costly penalties. Even with the rise of big data analytics in recent years, key measures to resolve the problem have had only marginal success. To complicate things, a new wave of AI technologies has taken center stage with the promise to cut drug development costs and time to market.


The hope of bringing life-saving drugs to the market quicker and cheaper has top pharma executives at the edge of their seats, and rightfully so. TechEmergence, an AI market research company, revealed that 50% of industry executives surveyed expect wide-scale adoption of AI in just seven years! In fact, in the last two years, as drug manufacturers compete for AI market dominance, big pharma companies like Johnson & Johnson ($76 billion), Roche ($54 billion), Gilead Sciences ($26 billion), Novartis ($50 billion), Bristol-Myers Squibb ($21 billion), Teva ($22 billion), Sanofi ($41 billion), AstraZeneca ($23 billion), Eli Lilly ($23 billion), Amgen ($23 billion), AbbVie ($28 billion), Bayer ($29 billion), Merck & Co. ($40 billion), GlaxoSmithKline ($39 billion), and Pfizer ($53 billion) have shelled out more than $350 billion in AI investments and mergers.


With this kind of industry buy-in, it’s safe to conclude AI is here to stay. But what about the unresolved data integrity issue? Can the industry overcome its data integrity problems and FDA scrutiny to successfully adopt AI? Cautiously, yes, but not without one important caveat!


Within AI’s landscape, another impressive player may offer light at the end of the tunnel for pharma’s data integrity debacle. As a prerequisite to AI, Machine Learning can be used to validate the accuracy of the data within an organization.


When dirty data is used as the historical data that trains a predictive model, the model produces faulty new data. When that newly generated data is then fed back to the model to make future predictions, the results can be dire. Machine Learning offers a unique way to intercept and resolve this pervasive data integrity challenge, and it should be employed before AI implementation, during the historical data phase when data is being gathered and cleaned.
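To make the effect concrete, here is a minimal sketch, assuming a made-up dose and response dataset and a plain least-squares line as the "predictive model". The numbers, the variable names, and the choice of model are all illustrative assumptions, not data or methods from any real study. It shows how a single mistyped value in the historical data can change what the model predicts.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(slope, intercept, x):
    """Apply the fitted line to a new input."""
    return slope * x + intercept


# Hypothetical historical data: response is exactly twice the dose.
doses = [10, 20, 30, 40, 50]
clean_responses = [20, 40, 60, 80, 100]
# Same data with one entry error: the last value has an extra zero.
dirty_responses = [20, 40, 60, 80, 1000]

clean_slope, clean_intercept = fit_line(doses, clean_responses)
dirty_slope, dirty_intercept = fit_line(doses, dirty_responses)

# Compare what each model predicts for a dose it has not seen.
print("clean model at dose 60:", predict(clean_slope, clean_intercept, 60))
print("dirty model at dose 60:", predict(dirty_slope, dirty_intercept, 60))
```

One bad record out of five is enough to pull the fitted line far from the true relationship, which is why cleaning belongs before training rather than after.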


At Applied Informatics, we have been developing innovative approaches to ensure that data is cleaned and curated automatically using Machine Learning. Here is an example:

The dirty site data (with missing or erroneous values) is sent through the Applied ML pipeline, which performs probabilistic inference using the data in the tables and an external reference database to produce a clean dataset of sites. Such cleaned site data is one of the prerequisites for clinical trial site selection.
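The general idea of repairing records against a reference database can be sketched in a few lines. This is not the Applied ML pipeline: it swaps probabilistic inference for a much cruder stand-in, a string similarity score from Python's standard `difflib` module with a fixed confidence cutoff. The reference records, field names, and the 0.8 cutoff are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical external reference database of known-good site records.
REFERENCE_SITES = [
    {"name": "Boston General Hospital", "city": "Boston", "country": "US"},
    {"name": "Lyon University Clinic", "city": "Lyon", "country": "FR"},
]


def similarity(a, b):
    """Case-insensitive similarity in [0, 1]; a crude stand-in for a match probability."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def clean_site(record, reference=REFERENCE_SITES, threshold=0.8):
    """Return (cleaned_record, score).

    If the best reference match scores below the threshold, the record
    is returned unchanged so that a person can review it instead.
    """
    name = record.get("name") or ""
    best = max(reference, key=lambda ref: similarity(name, ref["name"]))
    score = similarity(name, best["name"])
    cleaned = dict(record)
    if score < threshold:
        return cleaned, score
    cleaned["name"] = best["name"]  # repair the misspelled name
    for field in ("city", "country"):
        if not cleaned.get(field):  # fill only missing values
            cleaned[field] = best[field]
    return cleaned, score


dirty = {"name": "Bostn General Hospitl", "city": "", "country": None}
cleaned, score = clean_site(dirty)
print(cleaned, round(score, 2))
```

A real pipeline would weigh evidence from several fields at once and would also have to decide when an existing value is wrong rather than just missing. The sketch only shows the basic shape: score candidates, repair when confident, and leave the rest for review.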

Once the data is cleaned, the next major challenge involves integrating data from disparate sources. Traditional methods rely on various heuristic string-matching approaches that do not scale. Machine Learning offers a new paradigm: it learns patterns that enable semi-automated integration of disparate data sources quickly and efficiently.
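The difference between a hand-tuned heuristic and a learned rule can be illustrated with about the simplest learned model possible: a single similarity cutoff fit to labeled example pairs instead of being guessed by hand. This is only a sketch under stated assumptions. The labeled pairs, the function names, and the review margin are hypothetical, and a production system would learn from many features rather than one score.

```python
from difflib import SequenceMatcher

# Hypothetical labeled pairs from two sources: (name_a, name_b, is_same_entity).
LABELED_PAIRS = [
    ("Pfizer Inc.", "Pfizer Inc", True),
    ("Eli Lilly and Company", "Eli Lilly & Co.", True),
    ("Novartis AG", "Roche Holding AG", False),
    ("Merck & Co.", "Amgen Inc.", False),
]


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def learn_threshold(labeled_pairs):
    """Pick the cutoff that classifies the most labeled pairs correctly."""
    scored = [(similarity(a, b), is_match) for a, b, is_match in labeled_pairs]
    candidates = sorted({s for s, _ in scored})

    def correct(cutoff):
        return sum((s >= cutoff) == is_match for s, is_match in scored)

    return max(candidates, key=correct)


def link(a, b, threshold, margin=0.1):
    """Auto-decide clear cases; send borderline pairs to a human reviewer."""
    s = similarity(a, b)
    if s >= threshold + margin:
        return "match"
    if s < threshold - margin:
        return "non-match"
    return "review"


threshold = learn_threshold(LABELED_PAIRS)
print(link("Pfizer Inc.", "Pfizer Inc", threshold))
print(link("Novartis AG", "Amgen Inc.", threshold))
```

The "review" band is what makes the process semi-automated: the model handles the clear cases, and people handle only the ambiguous ones.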


Let’s face it, getting data integrity right can be daunting for any pharma company, but consider the pain of getting it wrong. The quality demands for clean data in manufacturing drugs have always been steep, and error-ridden data can carry deadly consequences. If poor data quality was a challenge before, in the new era of AI the demands for clean data are exponentially higher, and “garbage in, garbage out” is now amplified a thousand-fold!


As a disruptive technology, AI brings a new frontier of exploration and discovery to the drug development world. It offers promise and hope to help solve critical challenges that have plagued and stagnated the industry for generations. Scott Gottlieb, Commissioner of the Food and Drug Administration, speaking at the CNBC Healthy Returns Conference in New York on March 28th, 2018, emphasized that “AI holds enormous promise for the future of medicine.” He continued, “We’re actively developing a new regulatory framework to promote innovation in this space, and support the use of AI-based technologies.”


AI is going to stick around. Sooner or later your organization will begin to explore the possibilities AI can offer your company. Embarking on this journey will require a sound strategy to resolve any data integrity issues before implementation. If you are facing challenges in data quality, integration, and prediction, APPLIED ML can help. Try our free one-month proof-of-concept to see if Machine Learning works for you.

