The UMLS Metathesaurus is a compilation of names, relationships, and associated information from a variety of biomedical naming systems representing different views of biomedical practice or research. The Metathesaurus is organized by meaning, and the fundamental unit in the Metathesaurus is the concept. Differing names for a biomedical meaning are linked in a single Metathesaurus concept. Extensive additional information describing semantic characteristics, occurrence in machine-readable information sources, and how concepts co-occur in these sources is also provided, enabling a greater comprehension of the concept in its various contexts. The Metathesaurus is not a standardized vocabulary; it is a tool for maximizing the usefulness of existing vocabularies. It serves as a knowledge source for developers of biomedical information applications and as a powerful resource for biomedical information specialists.
Applications exploiting the hierarchical relations recorded in the Unified Medical Language System (UMLS) Metathesaurus suffer from the presence of inconsistencies in these relations. A formal approach to identifying and eliminating circular hierarchical relations has been proposed in previous work, leading to the creation of a directed acyclic Metathesaurus graph. However, this approach is at best semi-automatic and its implementation is far from trivial. A simpler, alternative approach consists in avoiding loops while traversing the Metathesaurus graph by preventing nodes from being visited twice. Our objective is to evaluate the benefit of the formal approach to eliminating cycles over a naïve approach to avoiding them. To this end, we compared the size and semantic coherence of sets of descendants obtained by both approaches. 12% of the concepts with descendants exhibit some differences. The formal approach significantly reduces the number of descendants in these cases. The benefits in terms of semantic coherence are
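The naïve approach mentioned in this abstract, collecting descendants while refusing to visit any node twice, can be sketched as follows. This is a minimal illustration and not the authors' implementation; the `children` mapping and the concept labels `A` to `D` are hypothetical.

```python
from collections import deque

def descendants(children, start):
    """Collect all descendants of `start`, never visiting a node twice.

    `children` maps each concept to a list of its direct children.
    """
    seen = {start}            # nodes already reached, including the start
    queue = deque([start])
    result = set()
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in seen:   # skip anything already visited
                seen.add(child)
                result.add(child)
                queue.append(child)
    return result

# Hypothetical graph containing a cycle: A -> B -> C -> A, plus C -> D.
toy_graph = {"A": ["B"], "B": ["C"], "C": ["A", "D"]}
print(sorted(descendants(toy_graph, "A")))  # ['B', 'C', 'D']
```

The traversal terminates despite the cycle, but the cycle is still in the data: starting from `B` in the same graph reports `A` as a descendant, so avoiding a loop is not the same as removing it.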
The HL7 Vocabulary Technical Committee (TC) was organized to select and maintain the vocabulary used in HL7 messages. The goal is to make implementations of the Version 3 HL7 Standard more plug-and-play compatible. In order to make the vocabulary readily accessible to the public, HL7 is collaborating with the U.S. National Library of Medicine (NLM) to include HL7 vocabulary in the Unified Medical Language System (UMLS) Metathesaurus. This article describes a proposal for how HL7 data elements and coded values can be represented accurately in the relational tables of the UMLS Metathesaurus.
I previously developed methods for identifying cases of multiple synonymous concepts (redundancy) and concepts with multiple meanings (ambiguity) and applied them to the 1995 UMLS Metathesaurus. These methods use semantic approaches (including knowledge about word synonymy and the semantic types assigned to concepts) to complement the standard lexical approaches. In this paper, I describe the results of their application to the 2001 Metathesaurus and examine their implications for the evolution of the UMLS.
OBJECTIVES: Assess query expansion using thesaurus relationships and definitions in the UMLS Metathesaurus for improving searching performance. METHODS: The queries from a MEDLINE test collection (OHSUMED) were expanded using synonym, hierarchical, and related term information as well as term definitions from the UMLS Metathesaurus. Documents were retrieved from a word-statistical retrieval system and assessed for recall and precision based on relevance judgments from the test collection. RESULTS: All types of query expansion degraded aggregate retrieval performance as measured by recall and precision, although 38.6% of the queries with synonym expansion and up to 29.7% of the queries with hierarchical expansion showed improvement. CONCLUSIONS: Thesaurus-based query expansion causes a decline in retrieval performance generally but improves it in specific instances. Further research must focus on identifying instances where performance improves and how it can be exploited by real users.
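The evaluation described in this abstract can be illustrated with a small sketch of synonym expansion and per-query precision and recall. Everything here is an assumption made for illustration: the documents, the synonym table, and the relevance judgments are invented, and simple boolean phrase matching stands in for the word-statistical retrieval system used in the study.

```python
def expand_query(terms, synonyms):
    """Return the query terms plus any synonyms listed for them."""
    expanded = list(terms)
    for term in terms:
        for syn in synonyms.get(term, []):
            if syn not in expanded:
                expanded.append(syn)
    return expanded

def retrieve(query_terms, documents):
    """Return ids of documents containing at least one (lowercase) query term."""
    return {doc_id for doc_id, text in documents.items()
            if any(term in text.lower() for term in query_terms)}

def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query from two sets of document ids."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical toy collection and relevance judgments.
documents = {
    1: "Heart attack treatment in the emergency room",
    2: "Outcomes after myocardial infarction",
    3: "Dietary fiber and digestion",
}
synonyms = {"heart attack": ["myocardial infarction"]}
relevant = {1, 2}

base = ["heart attack"]
expanded = expand_query(base, synonyms)
print(precision_recall(retrieve(base, documents), relevant))
print(precision_recall(retrieve(expanded, documents), relevant))
```

The toy data happens to show a query where expansion raises recall. It does not reproduce the aggregate result reported above, where expansion degraded performance overall.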
The entire collection of 11.5 million MEDLINE abstracts was processed to extract 549 million noun phrases using a shallow syntactic parser. English language strings in the 2002 and 2001 releases of the UMLS Metathesaurus were then matched against these phrases using flexible matching techniques. 34% of the Metathesaurus names (occurring in 30% of the concepts) were found in the titles and abstracts of articles in the literature. The matching concepts are fairly evenly chemical and non-chemical in nature and span a wide spectrum of semantic types. This paper details the approach taken and the results of the analysis.
The Unified Medical Language System® (UMLS) Metathesaurus contains records arranged by concept or meaning. Each concept contains a unique identifier (CUI) that can be used to track the concept over time. Since the January 2001 release, the Metathesaurus has included the file MRCUI that contains mappings for CUIs that disappear. This paper describes the processes that facilitated this effort and the ongoing effort to find suitable mappings for concepts whose meanings no longer exist in the Metathesaurus. This study highlights the need to identify missed synonymy prior to a release. It also shows a need to work more closely with source providers to identify the closest match in the Metathesaurus when they eliminate terms from their vocabularies.
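A mapping of retired CUIs to current ones can be used to bring stored identifiers up to date. The sketch below assumes the mappings have already been loaded into a plain dictionary; it does not parse MRCUI itself, and the identifiers, the `resolve_cui` helper, and the possibility of chained mappings are all assumptions made for illustration.

```python
def resolve_cui(cui, current_cuis, mapping):
    """Follow retired-to-replacement mappings until a current CUI is found.

    Returns None when no mapping exists or the mappings loop back on themselves.
    """
    seen = set()
    while cui not in current_cuis:
        if cui in seen or cui not in mapping:
            return None       # dead end or circular mapping
        seen.add(cui)
        cui = mapping[cui]
    return cui

# Hypothetical identifiers, not real Metathesaurus CUIs.
current_cuis = {"C9000003", "C9000004"}
mapping = {"C9000001": "C9000002", "C9000002": "C9000003"}

print(resolve_cui("C9000001", current_cuis, mapping))  # C9000003
print(resolve_cui("C9000004", current_cuis, mapping))  # C9000004
print(resolve_cui("C9000099", current_cuis, mapping))  # None
```

Returning `None` rather than guessing keeps unmapped identifiers visible, which matches the abstract's point that some retired concepts still need a suitable mapping to be found.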
The Unified Medical Language System (UMLS) is being designed to provide uniform access to computer-based resources in biomedicine. For the foreseeable future, the foundation of the UMLS will be a metathesaurus of concepts, synthesized from existing biomedical nomenclatures. Meta-1, the first version of the Metathesaurus, will contain all of MeSH, a selection of terms from primary care, clinical medicine, and other domains, and all terms from SNOMED, ICD-9-CM, and CPT-4 which “match” them -- about 30,000 terms. In addition, Meta-1 will contain information about the occurrence and co-occurrence of its terms in selected resources, such as MEDLINE. As Meta-1 will contain about 100MB of terms and relationships, it is unlikely that it will be “printed.” Instead, some UMLS applications will support Metathesaurus browsing. One way of browsing Meta-1 will be via the Apple Macintosh® application called HyperCard®. A demonstration of a HyperCard interface, called Meta-Card™ will first acquaint viewers with the contents of the pre-human-review version of Meta-1, and second, illustrate how an object-oriented interface can be programmed to support various visual metaphors, e.g. “click-to-get-more-information,” and “click-to-follow-a-semantic-link...
The Unified Medical Language System (UMLS) is being designed to provide uniform access to computer-based resources in biomedicine. For the foreseeable future, the foundation of the UMLS will be a metathesaurus of concepts, synthesized from existing sources, including MeSH, SNOMED, ICD-9-CM, CPT-4, DSM-III and other biomedical nomenclatures and classification systems. In Meta-1, the first version of the Metathesaurus, the synthesis is being implemented using a three-part methodology: 1) Concept names (terms) and intra-source relationships, such as synonymy, have been extracted from each source, and converted to a homogeneous representation; 2) inter-source lexical matches have been used to combine terms from different sources into Metathesaurus entries; and 3) some 30,000 of these entries, those containing MeSH terms and a selected sample of terms from other domains, will be reviewed by humans, enhanced, and modified, as appropriate. This methodology must eventually support incremental development and an audit trail, and it must preserve relationships added during human review. The 30,000 Meta-1 entries will contain in excess of 60,000 biomedical terms, and these terms will participate in more than 100,000 thesaurus relationships. These “normative” relationships will be supplemented by “empirical” relationships computed from certain UMLS resources. The first of the empirical relationships will be counts of the occurrence and co-occurrence of Meta-1 concepts in MEDLINE.
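Step 2 of the methodology in this abstract, combining terms from different sources through lexical matches, can be sketched roughly as below. The normalization used here (lowercasing, dropping punctuation, sorting words) is an assumption chosen for illustration and is not claimed to be the matching procedure used for Meta-1; the source labels and terms are hypothetical.

```python
import re
from collections import defaultdict

def normalize(term):
    """Reduce a term to a lowercase, punctuation-free, word-order-independent key."""
    words = re.findall(r"[a-z0-9]+", term.lower())
    return " ".join(sorted(words))

def merge_by_lexical_match(source_terms):
    """Group (source, term) pairs whose normalized forms are identical."""
    entries = defaultdict(list)
    for source, term in source_terms:
        entries[normalize(term)].append((source, term))
    return dict(entries)

# Hypothetical input: two sources naming the same thing differently.
source_terms = [
    ("SourceA", "Breast Neoplasms"),
    ("SourceB", "Neoplasms, Breast"),
    ("SourceB", "Heart Failure"),
]
for key, members in merge_by_lexical_match(source_terms).items():
    print(key, members)
```

A purely lexical key like this can merge terms that should stay separate or miss true synonyms, which is one reason the abstract pairs the automated step with human review.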
The Unified Medical Language System (UMLS) is intended to support uniform access of machine-readable biomedical information resources. The foundation of the UMLS is a Metathesaurus, which will link terms in different biomedical nomenclatures. Because the resources and nomenclatures continue to evolve, the Metathesaurus must evolve with them. Thus, an important criterion for the design of the Metathesaurus is the graceful accommodation of change. A model of such accommodation is presented. A key design decision was the representation of the Metathesaurus, and prospective updates, as a database of “facts.” Particular emphasis is placed on database operations that use the results of inter-nomenclature lexical matching to collapse entries from different nomenclatures into Metathesaurus entries. The implementation of this model for a simplified version of the Metathesaurus is described.
The Unified Medical Language System Metathesaurus represents the results of a synthesis of existing biomedical naming systems (thesauri). The naming and other information about the meanings in the Metathesaurus can be used to find the preferred naming of that meaning in the source chosen by the user, by exploiting the property of semantic locality. The aspects of semantic locality in the Metathesaurus which can be thus exploited are the terms, the semantic types, the use of that term in a source context, and the co-occurrence of terms in MEDLINE. To find how a meaning is named in the source of choice, a user must exploit one of these aspects of semantic locality, entering a term somehow related to the term being sought, and navigating to the preferred term. While the first three of these aspects of semantic locality are normative, the last is empirical. Testing of the utility of the aspects of semantic locality in information retrieval would require a uniform interface with (1) no Metathesaurus, (2) the Metathesaurus without the aspects in question, and (3) the Metathesaurus including all the aspects. Other potential uses of empirically derived semantic locality include defining or suggesting potentially relevant concepts in a given situation.
We propose a method for resolving ambiguities encountered when mapping free text to the UMLS Metathesaurus. Much of the research in medical informatics involves the manipulation of free text. The Metathesaurus contains extensive information which supports solutions to problems encountered while processing such text. After discussing the process of mapping free text to the Metathesaurus and describing the ambiguities which are often the result of such mapping, we provide examples of rules designed to eliminate mapping ambiguities. These rules refer to the context in which the ambiguity occurs and crucially depend on semantic types obtained from the Metathesaurus. We have conducted a preliminary test of the methodology and the results obtained indicate that the rules successfully resolve ambiguity around 80% of the time.
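The kind of rule this abstract describes, one that looks at the surrounding context and at the semantic types of the candidate concepts, might take a form like the sketch below. The abstract does not give the actual rules, so the trigger words, the rule format, and the candidate list are all hypothetical.

```python
def disambiguate(candidates, context_words, rules):
    """Pick one candidate concept using context-triggered semantic type rules.

    candidates: list of (concept_name, semantic_type) pairs.
    rules: list of (trigger_words, preferred_semantic_type) pairs.
    Returns the chosen candidate, or None if no rule resolves the ambiguity.
    """
    context = {word.lower() for word in context_words}
    for triggers, preferred_type in rules:
        if context & triggers:
            matches = [c for c in candidates if c[1] == preferred_type]
            if len(matches) == 1:
                return matches[0]
    return None

# Hypothetical candidates for the ambiguous string "cold".
candidates = [
    ("Common Cold", "Disease or Syndrome"),
    ("Cold Temperature", "Natural Phenomenon or Process"),
]
# Hypothetical rules: context words that favor one semantic type.
rules = [
    ({"symptoms", "patient", "cough"}, "Disease or Syndrome"),
    ({"storage", "exposure", "weather"}, "Natural Phenomenon or Process"),
]

print(disambiguate(candidates, ["patient", "with", "a", "cold"], rules))
print(disambiguate(candidates, ["cold", "storage", "of", "samples"], rules))
```

When no rule fires, the function returns `None` instead of guessing, so unresolved cases remain visible for later handling.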
The Metathesaurus is a machine-created, human-edited and enhanced synthesis of authoritative biomedical terminologies. Its formal properties permit it to be a) exploited by computers, and b) modified and enhanced without compromising that usage. If further constraints were imposed on the existence and identity of Metathesaurus relationships, i.e., if every Metathesaurus concept had a "genus" and a "differentia," then the Metathesaurus could be converted into an "Aristotelian Hierarchy." In this sense, a genus is a concept that classifies another concept, and a differentia is a concept that distinguishes the classified concept from all other concepts in the same class. Since, in principle, these constraints would make the Metathesaurus easier to leverage and maintain computationally, it is interesting to ask to what degree the maintenance and enhancement procedures now in place are producing a Metathesaurus that is also an "Aristotelian Hierarchy." Given a liberal interpretation of the current Metathesaurus schema, the proportion of the Metathesaurus that is "Aristotelian" in each annual version is increasing in spite of dramatic concurrent increases in the number of Metathesaurus concepts. Without formality there is no modifiability nor scalability. We need formal methods and computer-based tools that can help us with the task [of controlled medical vocabulary construction]. We need research in which controlled vocabulary development is the focus rather than a stepping stone for work on other theories and applications.
The third version of the UMLS Metathesaurus, Meta-1.2, to be released in October 1992, will have a simpler schema and simpler distribution formats than the first two versions, Meta-1.0 and Meta-1.1, released in October 1990 and 1991, respectively. For one thing, it will have only a single kind of entry (Concept), rather than three (Concept, Related, and Synonym). Further, the Relational Format will consist of four logical relations, or tables, instead of the nearly three score different tables used to represent the same kind of information in Meta-1.1. These four tables will contain, respectively, (1) the names of each concept, (2) the relationships between concepts, (3) attributes of the concepts, and (4) a word-based index into the concept names. We argue that the new schema and formats provide a better conceptual model of the Metathesaurus, and represent the information contained there more uniformly. Even though these changes are incremental and evolutionary, both users and software developers should find Meta-1.2 significantly easier to understand, and the information contained in it significantly easier to use.
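The four-table layout described in this abstract can be pictured with a small in-memory database. This is only a sketch of the general shape: the table names, column names, identifiers, and rows are all hypothetical and are not the actual Meta-1.2 file or field names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept_name (concept_id TEXT, name TEXT);                      -- (1) names
CREATE TABLE relationship (concept_id_1 TEXT, rel TEXT, concept_id_2 TEXT);  -- (2) relationships
CREATE TABLE attribute    (concept_id TEXT, attr_name TEXT, attr_value TEXT);-- (3) attributes
CREATE TABLE word_index   (word TEXT, concept_id TEXT);                      -- (4) word index
""")

# Hypothetical rows for two made-up concepts, X1 and X2.
conn.executemany("INSERT INTO concept_name VALUES (?, ?)",
                 [("X1", "Heart Failure"), ("X1", "Cardiac Failure"), ("X2", "Heart")])
conn.executemany("INSERT INTO relationship VALUES (?, ?, ?)",
                 [("X1", "related_to", "X2")])
conn.executemany("INSERT INTO attribute VALUES (?, ?, ?)",
                 [("X1", "semantic_type", "Disease or Syndrome")])
conn.executemany("INSERT INTO word_index VALUES (?, ?)",
                 [("heart", "X1"), ("failure", "X1"), ("cardiac", "X1"), ("heart", "X2")])

# Use the word index to find every name of each concept indexed under "cardiac".
rows = conn.execute(
    """SELECT n.name
       FROM word_index AS w
       JOIN concept_name AS n ON n.concept_id = w.concept_id
       WHERE w.word = ?
       ORDER BY n.name""",
    ("cardiac",),
).fetchall()
print(rows)
```

With a single kind of entry, every lookup follows the same path from a word to a concept identifier to that concept's names, relationships, and attributes.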
A successful medical informatics program helps its users to match their information needs as closely and efficiently as possible to the capabilities of the system. CHARTLINE is a computer program whose input is a free text, "natural language" patient chart in ASCII format. Using the UMLS Metathesaurus Knowledge Sources, CHARTLINE can suggest bibliographic references relevant to the patient case described in the chart. The program does not attempt to "understand" the natural language content of the chart. CHARTLINE only recognizes UMLS Metathesaurus Main Concept terms (or their synonyms) as they occur in the medical text, since those terms represent the tokens used to index the literature. The program depends on user feedback to determine which topics of a large number of potentially relevant subjects are of interest to the user.
This paper advances a detailed exploration of the complex relationships among terms, concepts, and synonymy in the UMLS (Unified Medical Language System) Metathesaurus, and proposes the study and understanding of the Metathesaurus from a model-theoretic perspective. Initial sections provide the background and motivation for such an approach, and a careful informal treatment of these notions is offered as a context and basis for the formal analysis. What emerges from this is a set of puzzles and confusions in the Metathesaurus and its literature pertaining to synonymy and its relation to terms and concepts. A model theory for a segment of the Metathesaurus is then constructed, and its adequacy relative to the informal treatment is demonstrated. Finally, it is shown how this approach clarifies and addresses the puzzles educed from the informal discussion, and how the model-theoretic perspective may be employed to evaluate some fundamental criticisms of the Metathesaurus. For users of the UMLS, two significant results of this analysis are a rigorous clarification of the different senses of synonymy that appear in treatments of the Metathesaurus and an illustration of the dangers in computing inferences involving ambiguous terms.
The Unified Medical Language System (UMLS) Metathesaurus is widely used for biomedical natural language processing (NLP) tasks. In this study, we systematically analyzed UMLS Metathesaurus terms by analyzing their occurrences in over 18 million MEDLINE abstracts. Our goals were: 1. analyze the frequency and syntactic distribution of Metathesaurus terms in MEDLINE; 2. create a filtered UMLS Metathesaurus based on the MEDLINE analysis; 3. augment the UMLS Metathesaurus so that each term is associated with metadata on its MEDLINE frequency and syntactic distribution statistics. After MEDLINE frequency-based filtering, the augmented UMLS Metathesaurus contains 518,835 terms and is roughly 13% of its original size. We have shown that the syntactic and frequency information is useful to identify errors in the Metathesaurus. This filtered and augmented UMLS Metathesaurus can potentially be used to improve efficiency and precision of UMLS-based information retrieval and NLP tasks.
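Frequency-based filtering of the kind described in this abstract can be sketched in a few lines. This is an illustration under stated assumptions: the abstracts and terms are invented, whole-phrase case-insensitive matching stands in for whatever matching the study used, the `min_count` threshold is arbitrary, and the syntactic distribution statistics are left out.

```python
import re

def term_frequencies(terms, abstracts):
    """Count case-insensitive whole-phrase occurrences of each term."""
    counts = {}
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        counts[term] = sum(len(pattern.findall(text)) for text in abstracts)
    return counts

def filter_terms(counts, min_count=1):
    """Keep terms seen at least `min_count` times, attaching the count as metadata."""
    return {term: {"corpus_count": n} for term, n in counts.items() if n >= min_count}

# Hypothetical toy corpus and term list.
abstracts = [
    "Aspirin reduces fever.",
    "Fever and aspirin use in children.",
]
terms = ["aspirin", "fever", "zymogen granule"]

counts = term_frequencies(terms, abstracts)
print(counts)
print(filter_terms(counts, min_count=1))
```

Terms that never occur in the corpus drop out of the filtered set, while the surviving terms carry their counts for later use.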
A critical knowledge source being developed as part of the NLM's UMLS Project is a biomedical thesaurus, called the Metathesaurus. Central to the Metathesaurus will be inter-term relationships, across several biomedical nomenclatures and classification systems, which are derivable from lexical mapping techniques. Previous UMLS research on intervocabulary mapping elaborated these techniques. During the Fall of 1988, they were extended and used to build META-0, a 2,000-concept demonstration Metathesaurus. META-0 was composed primarily of the most frequently occurring MEDLINE index terms from MeSH, and MeSH will be the main source of concepts for META-1, the initial public version of the Metathesaurus. Review of META-0 suggested several refinements to the methodology for building META-1. These include labeling MeSH Entry Terms as lexical variants or synonyms before linking them to other sources. Later work refined algorithmic methods which detect lexical variants in MeSH. The META-1 lexical mapping methodology derives from this research.
The National Library of Medicine's Unified Medical Language System Metathesaurus contains the richest single corpus of biomedical names in existence. Yet, developers wishing to make use of the Metathesaurus will be confronted by users who want to add local terminology and relationships not already represented there. We urge developers to fill those needs, while, at the same time, they plan for the many consequences of unilateral Metathesaurus enhancement. Foremost among these consequences is the need to maintain local enhancements across subsequent releases of the Metathesaurus. These problems are illustrated via examples of candidate Metathesaurus enhancement terms in use at the Columbia-Presbyterian Medical Center (CPMC), at the Mayo Clinic, and in Current Disease Descriptions (CDD). Sharing and reuse of Metathesaurus enhancement methods may permit local enhancements to be used at other sites, and it may permit the global Metathesaurus utilization effort to benefit from economies of scale.
The UMLS Metathesaurus, the largest thesaurus in the biomedical domain, provides a representation of biomedical knowledge consisting of concepts classified by semantic type and both hierarchical and non-hierarchical relationships among the concepts. This knowledge has proved useful for many applications including decision support systems, management of patient records, information retrieval (IR) and data mining. Gaining effective access to the knowledge is critical to the success of these applications. This paper describes MetaMap, a program developed at the National Library of Medicine (NLM) to map biomedical text to the Metathesaurus or, equivalently, to discover Metathesaurus concepts referred to in text. MetaMap uses a knowledge-intensive approach based on symbolic, natural language processing (NLP) and computational linguistic techniques. Besides being applied to both IR and data mining, MetaMap is one of the foundations of NLM's Indexing Initiative System, which is being applied to both semi-automatic and fully automatic indexing of the biomedical literature at the library.