UMLS Metathesaurs Tool – Part 3 (Loading UMLS Schema/Data to MySQL)

by Nadeem Nazeer

In this post, we will discuss how to load subset of UMLS Metathesaurus into a MySQL database by using the scripts and files created by MetamorphoSys. To generate the necessary files and scripts, select MySQL database in “Write Database Load Scripts” section of “Output Options” tab, as discussed in the previous post here.

So before we start, we need:

  1. MySQL up and running.
  2. Scripts to create Schema and Indexes.
  3. Data Files to load.

For anything related to MySQL check MySQL docs.

Scripts to Create Schema and Indexes:

Scripts for creating schema and indexes are located in the sub-folder named “config/2014AB/DB_RRF”  inside the main folder, where all files were extracted in 3rd step of the first part of this tutorial. You can also have a quick recap of Rich Release format here.

So where are data files located?

If you are following this tutorial series, and want to refresh your memory, go through the 7th step of second part of this tutorial  to know the destination location of the extracted files.

Creating a Database:

In order to load your subset into a MySQL database, you will have to either use an existing database or create a new one. Two important considerations to account for when creating a MySQL database are, the default character set and collation settings. NLM docs recommend using UTF8 and the utf8_unicode_ci collation setting. For example:

CREATE DATABASE IF NOT EXISTS umls CHARACTER SET utf8 COLLATE utf8_unicode_ci;

Now, UMLS docs here recommend doing all creating/loading stuff by using bash/shell scripts, but many are instances where we just want to play around with only some of the tables or we just want to do it in a simple, traditional way of importing delimited data files as we always do it in MySQL.

So as an example we set up one of tables:

We will create schema, indexes. and load data for MRCONSO table (containing Concepts, Concept Names, and their sources). Click HERE!  to read more about other files as well.

So, lets get started:

  1. Open up the file named mysql_tables.sql in “config/2014AB/DB_RRF/Mysql5.5/mysql_tables.sql”, and here you can see schema definition for various tables to create.
  2. Copy the part for MRCONSO:
    DROP TABLE IF EXISTS MRCONSO;
    CREATE TABLE MRCONSO (
    CUI char(8) NOT NULL,
    LAT char(3) NOT NULL,
    ……………………………
    ) CHARACTER SET utf8;
  3. For creating indexes we open file named mysql_tables.sql in “config/2014AB/DB_RRF/Mysql5.5/mysql_indexes.sql”, and copy the part for MRCONSO again:
    CREATE INDEX X_MRCONSO_CUI ON MRCONSO(CUI);ALTER TABLE MRCONSO ADD CONSTRAINT X_MRCONSO_PK PRIMARY KEY BTREE (AUI);CREATE INDEX X_MRCONSO_SUI ON MRCONSO(SUI);…………………………………
  4. And for loading the data files, note down data files folder location as discussed above, and then:
    again open file named mysql_tables.sql in “config/2014AB/DB_RRF/Mysql5.5/mysql_tables.sql” and copy the lines for loading data file which are just after the table definition.
    load data local infile ‘MRCONSO.RRF’ into table MRCONSO fields terminated by ‘|’ ESCAPED BY ” lines terminated by …………………………….

So, in this way we create MRCONSO table with indexes and load data in it. We can now use it as needed in our customized apps and tools.

At Applied, we are using this in one of our apps, Health Vocabulary Rest API, and our open source projects that you may check out via git here.

Drop a comment here for any queries.

5 thoughts on “UMLS Metathesaurs Tool – Part 3 (Loading UMLS Schema/Data to MySQL)”

  1. Thank you so much for all of these insights. I’m new into MySQL and when I reached the last part “loading data” I had a syntax error :

    Error Code: 1064. You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ‘@LINE_TERMINATION@ (@cui,@lat,@ts,@lui,@stt,@sui,@ispref,@aui,@saui,@scui,@sdui,’ at line 1

    I’ve tried to copy the line as it is, as well as putting the data files’ path in place of ‘MRCONSO.RRF’ in the syntax but both didn’t work.

    1. I solved it!
      The reason this was happening to me is because “‘@LINE_TERMINATION@ ” needed to be replaced with ‘\n’.

  2. Interesting article, especially about Blockchain. FHIR is locked in to the EHR and legacy HL7 V2. Serial isn’t really that hard to deal with anyway, so REST, sort of.

    I will be speaking about Blockchain in healthcare at the Blockchain Symposium June 9th 2016 in Redwood city, drop by and say hi
    http://events.42tek.com/

  3. Great post. I loved your analysis. It’s spot on. I agree with your call to open up the APIs. That’s what ONC should be doing with their next spec.

    One more thing: we need good reference architectures in this space. I’d love to see your blockchain architecture.

  4. Great, it was of much help for me to install UMLS for my SQL server. But during creating tables, one query didn’t succeed:

    “load data local infile ‘MRCXT.RRF’ into table MRCXT fields terminated by ‘|’ ESCAPED BY ” lines terminated by ‘\n’

    the file MRCXT.RFF is simply absent in the folder. Other queries were successfully run. Is this normal, or am I missing something?

Leave a Reply

Your email address will not be published. Required fields are marked *

Data Science & PopHealth

Methods, tools, systems for healthcare data analysis

Contact us now

Healthcare Informatics Solutions

Healthcare IT news, developments, opinions and solutions

Contact us now

Popular Posts