Active Pharmacovigilance using Electronic Health Records – Introduction to Mayhem – Part 1

by Shweta Mishra

Research shows that drug-related events account for up to 50% of the adverse events that occur during hospital stays each year, which significantly increases costs and length of stay. About one fourth of adverse drug reactions are caused by drug-drug interactions that result from concomitant use, meaning that the patient takes multiple medications at the same time. Electronic medical records (EMRs) are a rich source of clinical information that can be scanned for medical events such as adverse reactions; in other words, they can aid pharmacovigilance.

However, EMRs and clinical notes come with their own challenges and barriers to adoption in pharmacovigilance. The main issues are:

  • Amount and structure of content: Clinical notes contain a large amount of unstructured content, including details of the disease, its prophylaxis, its management, and recommendations for care. Curating these clinical descriptions manually is labor intensive, which calls for automation.
  • Privacy: Clinical notes contain identifying information, such as names, dates, and locations.
  • Accessibility: Privacy issues make care organizations reluctant to share clinical notes.

Natural language processing (NLP) systems like cTAKES can help overcome these challenges.

In one of our new projects at Applied, we worked on clinical text mining for a client using proprietary algorithms that we created by enhancing libraries like cTAKES. The system we built is known as Mayhem.

Before discussing how natural language processing software such as cTAKES and Mayhem is being put to use in pharmacovigilance, it helps to understand a few technical terms.


What is Pharmacovigilance?  

Pharmacovigilance is another name for drug safety surveillance. Examples of what it entails include detecting associations between a drug and an adverse event, and finding associations between an adverse event and drug–drug interactions. To market or test a drug in most countries, drug regulatory authorities require companies to compile and submit adverse event data, also called pharmacovigilance reports.


What is Natural Language Processing (NLP)? 

According to the Open Health Natural Language Processing Consortium “Natural language processing (NLP) – by definition – is a field of computer science and linguistics concerned with the interactions between computers and human (natural) languages.”

Let me explain that in terms of clinical documentation. The clinical and research medical community creates, manages, and uses a wide variety of semi-structured and unstructured textual documents. These documents contain information that can be useful for:

  • performing research
  • reviewing quality
  • detecting outbreaks
  • improving standards of care
  • evaluating treatment outcomes

All in all, these documents aid decision making in the clinical field.

Biomedical NLP systems extract structured information from textual reports, which facilitates search, comparison, classification, and summarization.

A good example that makes this more concrete is defining the inclusion and exclusion criteria in clinical trials. In clinical research, investigators are required to outline inclusion and exclusion criteria based on the requirements of a clinical trial. Inclusion criteria are characteristics that subjects must have to be included in the study, and exclusion criteria are characteristics that disqualify patients from being included. Investigators recruit patients based on these criteria.

Natural language processing techniques can parse clinical notes to detect drug–adverse event associations and adverse events associated with drug–drug interactions, which helps in defining inclusion and exclusion criteria.
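
As a minimal sketch of how structured output from clinical notes could feed inclusion and exclusion screening, the Python below filters a few patient records. The record fields, the criteria, and the patient data are all illustrative assumptions; none of them come from cTAKES or Mayhem.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Hypothetical structured summary extracted from one patient's notes."""
    patient_id: str
    age: int
    conditions: set = field(default_factory=set)
    medications: set = field(default_factory=set)


def is_eligible(record, required_conditions, excluded_medications, min_age=18):
    """Return True if the record meets every inclusion rule and no exclusion rule."""
    meets_inclusion = record.age >= min_age and required_conditions <= record.conditions
    hits_exclusion = bool(excluded_medications & record.medications)
    return meets_inclusion and not hits_exclusion


# Made-up records for illustration only.
patients = [
    PatientRecord("P1", 54, {"hypertension"}, {"lisinopril"}),
    PatientRecord("P2", 61, {"hypertension"}, {"warfarin"}),
    PatientRecord("P3", 16, {"hypertension"}, set()),
]

eligible = [
    p.patient_id
    for p in patients
    if is_eligible(p, required_conditions={"hypertension"}, excluded_medications={"warfarin"})
]
print(eligible)
```

In this made-up cohort only P1 qualifies: P2 is excluded for taking warfarin, and P3 is below the assumed minimum age.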

So, NLP in the clinical domain can be used for:

  • Patient cohort identification
  • Clinical decision support
  • Healthcare quality research
  • Personalized medicine
  • Bio-surveillance
  • Drug development
  • Text summarization


What is cTAKES?

cTAKES, the clinical Text Analysis and Knowledge Extraction System, is one such NLP software system. It extracts information from the clinical text in electronic medical records. Its components are specifically trained for the clinical domain, which allows it to identify the meaning and form of medical terms used in clinical notes, as well as the relationships between them. For example, it ingests a clinical note and categorizes information such as the patient’s current location, the patient’s disease, the location of the disease in the body, the prescribed medications, the prescribing doctor, medications that are not effective, and the next appointment.

It uses the Unified Medical Language System (UMLS) as a knowledge source. (“The UMLS is a set of files and software that brings together many health and biomedical vocabularies and standards to enable interoperability between computer systems.”) cTAKES was released as open source. It builds on existing open-source technologies: the Unstructured Information Management Architecture (UIMA) framework and the OpenNLP toolkit.


How can cTAKES help overcome barriers to using clinical notes in Pharmacovigilance?  

One of the features of cTAKES is its ability to ‘read’ through plain-text notes, extract concepts from them, and transform those concepts into structured and normalized information. In more technical terms, it parses information from the text and maps it to concepts in the Unified Medical Language System (UMLS).
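
To make "structured and normalized" a little more concrete, here is a deliberately tiny Python stand-in for dictionary-based concept lookup. It is not the cTAKES API. The dictionary, the category labels, and the `CONCEPT-00x` identifiers are invented placeholders, and they are not real UMLS identifiers.

```python
import re

# Hypothetical mini-dictionary: surface form -> (placeholder ID, preferred name, category).
# The IDs are invented placeholders, NOT real UMLS identifiers.
CONCEPTS = {
    "heart attack": ("CONCEPT-001", "myocardial infarction", "disease"),
    "myocardial infarction": ("CONCEPT-001", "myocardial infarction", "disease"),
    "aspirin": ("CONCEPT-002", "aspirin", "medication"),
    "headache": ("CONCEPT-003", "headache", "symptom"),
}


def extract_concepts(note):
    """Find dictionary terms in free text and return normalized records in text order."""
    found = []
    for surface, (concept_id, preferred, category) in CONCEPTS.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, note, flags=re.IGNORECASE):
            found.append({
                "text": match.group(0),
                "start": match.start(),
                "concept_id": concept_id,
                "preferred_name": preferred,
                "category": category,
            })
    return sorted(found, key=lambda item: item["start"])


# Made-up note for illustration only.
note = "Patient had a heart attack in 2019. Takes Aspirin daily; reports headache."
concepts = extract_concepts(note)
for c in concepts:
    print(c["text"], "->", c["preferred_name"], f"({c['category']}, {c['concept_id']})")
```

The point of the sketch is the shape of the output: different surface forms ("heart attack", "myocardial infarction") collapse to one concept record, and that is what makes downstream search and counting possible.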


Features of cTAKES

Firstly, cTAKES is fast and powerful. It can process 50,000 clinical notes in an hour, which makes working through large amounts of unstructured data time efficient. It is also portable and can run on any computer platform.
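
For a sense of scale, the quoted throughput can be converted into per-note figures. The short Python calculation below assumes a constant processing rate, and the one-million-note corpus is a hypothetical size chosen only for illustration.

```python
NOTES_PER_HOUR = 50_000  # throughput figure quoted above

notes_per_second = NOTES_PER_HOUR / 3600
ms_per_note = 3_600_000 / NOTES_PER_HOUR


def hours_to_process(note_count, notes_per_hour=NOTES_PER_HOUR):
    """Estimate wall-clock hours for a corpus, assuming a constant rate."""
    return note_count / notes_per_hour


print(f"{notes_per_second:.1f} notes per second")
print(f"{ms_per_note:.0f} ms per note")
print(f"{hours_to_process(1_000_000):.0f} hours for one million notes")
```

At that rate, a note takes about 72 milliseconds on average, and a corpus of one million notes would take roughly 20 hours.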

cTAKES provides several features that aid the parsing of medical terms from a clinical note and hence aid pharmacovigilance. It can discover clinical concepts, events, and attributes, and the relationships between them. The following screenshots of a mock clinical note make these features easy to understand:


  • Co-reference resolution: cTAKES can recognize terms that have the same meaning but are presented differently. For instance, it can recognize the word “biopsy” even if it is presented as “biopsies”.



  • Location identification: cTAKES can “read” a clinical note and identify the relationship between a disease and its location in the body.



  • Temporal linking: Clinical notes almost always contain a plan of care or recommendations for the patient, including when the patient should schedule the next appointment. cTAKES can decipher and connect the timelines mentioned in a clinical note.


  • Negation and uncertainty detection: cTAKES can detect negation and uncertainty expressed in clinical notes. For example, if the note says “Crocin not helping in pain relief”, cTAKES can detect that Crocin was not effective for that particular patient.
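
Three of the features above can be sketched with very simple rules. The Python below is a toy illustration and not how cTAKES implements them. The suffix rule, the cue list, the token window, and the note date are all assumptions made for the example, and a real system would rely on trained models and curated lexicons. Location identification is left out because it needs anatomical knowledge that does not fit in a short sketch.

```python
import re
from datetime import date, timedelta

NEGATION_CUES = {"no", "not", "denies", "without"}  # illustrative, far from complete


def normalize_term(word):
    """Map an '-ies' plural to its '-y' singular (naive rule, e.g. biopsies -> biopsy)."""
    word = word.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    return word


def resolve_follow_up(text, note_date):
    """Turn a relative phrase such as 'in 2 weeks' into a calendar date, or None."""
    match = re.search(r"\bin (\d+) (day|week)s?\b", text, flags=re.IGNORECASE)
    if match is None:
        return None
    amount = int(match.group(1))
    days = amount * 7 if match.group(2).lower() == "week" else amount
    return note_date + timedelta(days=days)


def is_negated(text, term, window=3):
    """Return True if a negation cue appears within `window` tokens before `term`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if term.lower() not in tokens:
        return False
    position = tokens.index(term.lower())
    preceding = tokens[max(0, position - window):position]
    return any(token in NEGATION_CUES for token in preceding)


# Hypothetical inputs for illustration only.
print(normalize_term("biopsies") == normalize_term("biopsy"))
print(resolve_follow_up("Follow up in 2 weeks.", date(2024, 1, 1)))
print(is_negated("Crocin not helping in pain relief", "helping"))
```

Each function is easy to fool. For example, the suffix rule mishandles words like "series", and the negation window ignores sentence boundaries, which is part of why purpose-built clinical NLP components exist.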


Currently, cTAKES is used by hospitals such as Boston Children’s Hospital, Cincinnati Children’s Hospital, and Mayo Clinic, as well as by various universities.

Going a step further in transforming pharmacovigilance and drug safety operations, a client company is using artificial intelligence not simply to track, but also to predict and solve potential problems, with the aim of improving drug quality and patient outcomes. Our team at Applied is excited to be a part of this project!

Stay tuned for Part 2 of this blog to learn how Mayhem helped us in building this pharmacovigilance system.


