Text mining of very large document sets has become feasible in recent years: it is now possible to search millions of text documents for the occurrence of certain keywords within a matter of hours. In the following, I present an approach for extracting mentions of Medical Subject Headings from MEDLINE’s 2010 baseline corpus of […]
July 25, 2010
The U.S. National Library of Medicine produces the Medical Subject Headings (MeSH) thesaurus, which covers a wide variety of medical terms, cross-references between MeSH records, and link-outs to medical literature. MeSH is available for download, with the dataset formatted either in XML or in a proprietary ASCII-based file format (.bin). In this blog post, I […]
November 1, 2010