HaVoc – A Dev-friendly Health Vocabulary API for the UMLS
by Sharib Khan
We are really excited to officially launch the HaVoc API. HaVoc (Health Vocabulary) is our live REST API that supports several health vocabulary operations so that you can easily leverage the knowledge within the UMLS in your applications.
First, why did we build HaVoc, and why could it be a game changer for your healthcare applications?
One of the common functions in any health application is the ability to search for information using different medical terms. For example, on a consumer health website, users may be interested in searching for articles on, say, “Prostate cancer”. The problem is that the medical concept “Prostate cancer” has several terms (synonyms) that are used to represent it. Some patients may use the exact phrase “Prostate cancer”, while others may use “prostate ca”, “ca of the prostate”, “carcinoma prostate”, “prostate carcinoma” and so on. This makes it difficult to provide a search system that can handle all these different terms and retrieve the same set of records. Yes, you can do partial matches, remove stop words and maintain a list of abbreviation mappings, but it soon becomes unmanageable (imagine doing this for all medical disease and treatment terms).
The other common, and even more difficult, task is building your system to handle class-based queries. That is, what if the user wants to search for information not by one specific disease but by a group of diseases? For example, they want to see everything that falls under “Cardiovascular Diseases”. That means you need to be able to retrieve anything related to Myocardial Infarction, Angina, Congestive Heart Failure, Rheumatic Heart Disease and so on. Few developers will have the medical knowledge needed to expand a class-based query into such child concepts.
Well, there are more such use cases (like knowing what a term in one medical vocabulary is called in another), but you get the general idea. Doing semantic operations on medical information can get messy very quickly.
So this is where HaVoc and the UMLS come in. UMLS stands for the Unified Medical Language System – it is a huge metathesaurus that has been painstakingly maintained and expanded by the National Library of Medicine (kudos to you folks up there). They have collated different medical vocabularies and linked terms across them to create a large dictionary of more than 2 million medical terms and even more (12 million) relationships. However, it is not easy to use the UMLS – just being able to download its 50 GB+ of data and understand the different tables and their relationships is a developer’s nightmare (see our blog on working with the UMLS). Many a PhD has been earned working with this behemoth system.
You get the picture – all this great medical information exists but has largely been in a format that has been difficult for HIT developers to use easily and integrate in their products.
This is the gap that HaVoc aims to address. The HaVoc API gives you access to all the biomedical terminologies in the UMLS and the concepts and relationships that exist therein. That means the 50 GB+ of complex UMLS data can now be accessed via simple RESTful API calls. We have been using this API internally within several of our applications at Applied – iHealth patient portal, TrialX (the iConnect clinical trial platform), and Ask dory clinical trials finder. We had previously open sourced the implementation after winning an ONC software challenge in 2013.
Now, thanks to some awesome effort over the last few months by Nadeem, our semantic data engineer, we have been able to expose the API as a cloud service for developers. And we are officially going live today at the HIMSS Connected Health Conference.
So what exactly does HaVoc allow you to do, and why do we think this is the next big thing for smart Health IT developers?
Let’s start with a simple example: say you want to know all the terms that exist for the concept “Multiple Myeloma”. Simple. You make an API call (GET /concepts) for “Multiple Myeloma” and it will return the CUI for this concept in the UMLS. CUI is the “concept unique identifier” in the UMLS. Next, you use this CUI to get all the synonyms of this concept (using GET /concepts/:cui/synonyms) across all source biomedical vocabularies in the UMLS. You can even choose to filter by certain source vocabularies only.
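The two-step lookup described above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the host `https://havoc.example.com` is a placeholder, the CUI used in the usage note is a dummy value, and the query parameter names `q` and `sources` are hypothetical. Only the paths `/concepts` and `/concepts/:cui/synonyms` come from the description above.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Placeholder host (assumption): substitute the base URL issued with your API access.
BASE_URL = "https://havoc.example.com"


def concept_search_url(term, base_url=BASE_URL):
    """Build the GET /concepts URL for a search term.

    The parameter name `q` is an assumption made for illustration.
    """
    return f"{base_url}/concepts?{urlencode({'q': term})}"


def synonyms_url(cui, sources=None, base_url=BASE_URL):
    """Build the GET /concepts/:cui/synonyms URL.

    `sources` is a hypothetical filter naming source vocabularies.
    """
    url = f"{base_url}/concepts/{quote(cui, safe='')}/synonyms"
    if sources:
        url += "?" + urlencode({"sources": ",".join(sources)})
    return url


def get_json(url):
    """Fetch a URL and decode its JSON body (authentication is omitted here)."""
    with urlopen(url) as response:
        return json.load(response)
```

With these helpers, `concept_search_url("Multiple Myeloma")` gives the first call, and `synonyms_url("C0000000", ["MSH"])` gives the second one for a dummy CUI, restricted to one vocabulary. Passing either URL to `get_json` would perform the request, assuming the service returns JSON.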
Similarly, the API allows you to do hierarchical queries, i.e. get the children of a given concept, and by specifying optional parameters you can get children at different depth levels. You can also walk up the terminology tree using the GET /concepts/:cui/parents API call, which returns all the parents of a given CUI. The API supports the following calls.
So how can all this help you as a developer? Well, going back to the two use cases described above, we think the API can truly make end applications semantic and smart. Here is how.
1. To retrieve synonyms and thus enable semantic search
Typically, systems store medical data as it is entered. So if you have a record that pertains to, say, prostate cancer, you are likely to save and index it under the lexical term “prostate cancer”. When you need to retrieve it, as explained above, most systems are limited to matching simple lexical variants of the term “prostate cancer”, or they build their own dictionaries to add terms like “cancer of prostate” that semantically mean the same thing. With HaVoc, you no longer need to do this. You can simply get all the synonyms of “Prostate Cancer” from the API and index your record with all of them. Alternatively, you can use the API calls to parse your user’s search query, get its synonyms, and perform an index lookup in your system with all these synonyms combined in an SQL “OR” query to get back all records in your system for this concept. With just a simple API call, you can now support user queries like “prostate cancer”, “cancer of the prostate”, “prostate ca” or any such variant that exists in the UMLS.
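As a rough sketch of the “OR” query idea, the snippet below turns a list of synonyms into a parameterized SQL query and runs it against an in-memory SQLite table. The table name `records`, the column `indexed_term` and the sample rows are all made up for illustration, and the synonym list is hard-coded with terms from the examples above rather than fetched from the API.

```python
import sqlite3


def build_synonym_query(synonyms, table="records", column="indexed_term"):
    """Return (sql, params) matching rows whose term equals any synonym.

    Placeholders keep the search text out of the SQL string itself.
    The table and column names are hypothetical and must be trusted values.
    """
    if not synonyms:
        raise ValueError("at least one synonym is required")
    terms = [s.lower() for s in synonyms]
    where = " OR ".join(f"LOWER({column}) = ?" for _ in terms)
    sql = f"SELECT id, {column} FROM {table} WHERE {where} ORDER BY id"
    return sql, terms


def demo():
    """Run the query against a tiny in-memory table of sample rows."""
    # Stand-in for the synonyms an API call might return.
    synonyms = ["Prostate cancer", "prostate ca", "carcinoma prostate"]
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, indexed_term TEXT)")
    conn.executemany(
        "INSERT INTO records (indexed_term) VALUES (?)",
        [("prostate ca",), ("Carcinoma prostate",), ("angina",)],
    )
    sql, params = build_synonym_query(synonyms)
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return rows
```

Calling `demo()` returns the two prostate-related rows and skips the angina row, even though neither stored term is the exact phrase “prostate cancer”. For long synonym lists, an `IN (...)` clause or indexing the synonyms at write time may be a better fit than a chain of `OR` conditions.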
2. Hierarchical Lookups
This takes the semantic search capability one step further. As explained above, if you need to find all records pertaining to Cardiovascular Diseases, you can now do that with a simple HaVoc API call. You use the GET /concepts/:cui/children API call to get all children of the concept “cardiovascular disease”. For each child concept in the list that is returned, you can run queries in your system to retrieve the matching records. That’s it.
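The class-based lookup might look something like the sketch below. The real children call is replaced by a stub function so the example runs offline. The response shape (a list of objects with a `cui` field), the short identifiers such as `CVD` and `MI`, and the sample records are all assumptions made for illustration, not actual HaVoc output or real CUIs.

```python
def expand_to_children(cui, fetch_children):
    """Return the class CUI followed by the CUIs of its children, without duplicates.

    `fetch_children` is any callable that takes a CUI and returns a list of
    dicts with a "cui" key; that response shape is an assumption.
    """
    seen = [cui]
    for child in fetch_children(cui):
        if child["cui"] not in seen:
            seen.append(child["cui"])
    return seen


def records_for_class(cui, fetch_children, records):
    """Select local records tagged with the class concept or any of its children."""
    wanted = set(expand_to_children(cui, fetch_children))
    return [record for record in records if record["cui"] in wanted]


def fake_fetch_children(cui):
    """Stand-in for GET /concepts/:cui/children; the identifiers are made up."""
    tree = {
        "CVD": [
            {"cui": "MI", "name": "Myocardial Infarction"},
            {"cui": "ANGINA", "name": "Angina"},
        ]
    }
    return tree.get(cui, [])


# Hypothetical local records, each tagged with a concept identifier.
SAMPLE_RECORDS = [
    {"id": 1, "cui": "MI", "title": "Recovering after a heart attack"},
    {"id": 2, "cui": "ASTHMA", "title": "Managing asthma"},
    {"id": 3, "cui": "ANGINA", "title": "Understanding angina"},
    {"id": 4, "cui": "CVD", "title": "Cardiovascular disease overview"},
]
```

Here `records_for_class("CVD", fake_fetch_children, SAMPLE_RECORDS)` picks up the heart attack, angina and overview records and leaves out the asthma one. Swapping the stub for a function that calls the live API would leave the rest of the code unchanged, which is the reason for passing the fetcher in as an argument.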
A couple of API calls and you can leverage years of medical knowledge in seconds and make your systems that much smarter!
We are super excited about this. Drop us a line with feedback and suggestions. And heck yeah, get your API access here and give it a good spin!