bioknack’s pubmed2ensembl Query Wrapper

Posted on June 17, 2011



This is the first post in a new “mini” series on tool features and research results. Unlike the previous long-winded blog posts, each post in the mini series gives only a very short description of the topic, shows an example and then concludes.

bioknack’s collection of tools has been extended with a small command-line tool that retrieves the PubMed IDs associated with a given gene. The information comes from www.pubmed2ensembl.org: the user specifies only a species (Homo sapiens by default) and an Ensembl gene ID, and the tool queries pubmed2ensembl’s data sources “Entrez Gene”, “MEDLINE”, “PMC”, “EMBL BLAST”, “EMBL XREF” and “text2genome” (see this poster for a description of the data sources). Results are printed as tab-separated values (TSV): the first column gives the BioMart attribute that was queried, and the second column gives a PubMed ID that was returned for that attribute.
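Because the output is plain TSV, it is easy to post-process. As a minimal sketch (not part of bioknack; the method name `pmids_by_attribute` is hypothetical), the following Ruby snippet groups the PubMed IDs by the BioMart attribute in the first column, assuming one attribute/PubMed ID pair per line:

```ruby
# Minimal sketch (hypothetical helper, not part of bioknack):
# group the tool's TSV output by the BioMart attribute in column 1.
def pmids_by_attribute(tsv)
  groups = {}
  tsv.each_line do |line|
    attribute, pmid = line.chomp.split("\t", 2)
    # Skip blank lines and lines without a second (PubMed ID) column
    next if pmid.nil? || pmid.strip.empty?
    (groups[attribute] ||= []) << pmid.strip
  end
  groups
end

# A few lines copied from the Query 1 output
sample = <<~TSV
  blast56_c100t_flat_blast56_c100t_pmid_1090\t8524414
  embl_flat_embl_pmid_1092\t8640236
  entrez_flat_entrez_pmid_1094\t1072445
  entrez_flat_entrez_pmid_1094\t7581463
TSV

# Print how many PubMed IDs each attribute returned
pmids_by_attribute(sample).each do |attribute, pmids|
  puts "#{attribute}: #{pmids.size} PubMed ID(s)"
end
```

On live results, one could pipe the tool’s output into a script that calls `pmids_by_attribute($stdin.read)`.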

Examples

Query 1: Retrieve the PubMed IDs for the Ensembl gene ENSG00000139618 in the default species, Homo sapiens

bk_pubmed2ensembl.rb -g ENSG00000139618

Output:

blast56_c100t_flat_blast56_c100t_pmid_1090	8524414
blast56_c100t_flat_blast56_c100t_pmid_1090	8640236
blast56_c100t_flat_blast56_c100t_pmid_1090	12100744
blast56_c100t_flat_blast56_c100t_pmid_1090	14722926
embl_flat_embl_pmid_1092	8640236
entrez_flat_entrez_pmid_1094	1072445
entrez_flat_entrez_pmid_1094	7581463
entrez_flat_entrez_pmid_1094	7597059
entrez_flat_entrez_pmid_1094	8091231
[...]
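In the excerpt above, PubMed ID 8640236 is returned by two different attributes (blast56_c100t_flat_blast56_c100t_pmid_1090 and embl_flat_embl_pmid_1092). A minimal sketch for finding such overlaps is to invert the output, mapping each PubMed ID to the attributes that returned it. The method name `shared_pmids` is hypothetical, and since the output above is truncated, the sample below only covers the lines shown:

```ruby
# Minimal sketch (hypothetical helper, not part of bioknack):
# map each PubMed ID to the attributes that returned it and keep
# only the IDs returned by more than one attribute.
def shared_pmids(tsv)
  attributes_by_pmid = {}
  tsv.each_line do |line|
    attribute, pmid = line.chomp.split("\t", 2)
    next if pmid.nil? || pmid.strip.empty?
    (attributes_by_pmid[pmid.strip] ||= []) << attribute
  end
  attributes_by_pmid
    .transform_values(&:uniq)
    .select { |_pmid, attributes| attributes.size > 1 }
end

# The lines shown in the Query 1 excerpt
excerpt = <<~TSV
  blast56_c100t_flat_blast56_c100t_pmid_1090\t8524414
  blast56_c100t_flat_blast56_c100t_pmid_1090\t8640236
  blast56_c100t_flat_blast56_c100t_pmid_1090\t12100744
  blast56_c100t_flat_blast56_c100t_pmid_1090\t14722926
  embl_flat_embl_pmid_1092\t8640236
  entrez_flat_entrez_pmid_1094\t1072445
  entrez_flat_entrez_pmid_1094\t7581463
  entrez_flat_entrez_pmid_1094\t7597059
  entrez_flat_entrez_pmid_1094\t8091231
TSV

p shared_pmids(excerpt)
```

For this excerpt, the only ID reported is 8640236; on the full output, more overlaps may show up.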

Query 2: Retrieve the PubMed IDs for the Ensembl gene FBgn0001325 in Drosophila melanogaster (species name dmelanogaster)

bk_pubmed2ensembl.rb -s dmelanogaster -g FBgn0001325

Output:

entrez_flat_entrez_pmid_1094	1327756
entrez_flat_entrez_pmid_1094	1346367
entrez_flat_entrez_pmid_1094	1346608
entrez_flat_entrez_pmid_1094	1348871
entrez_flat_entrez_pmid_1094	1423595
entrez_flat_entrez_pmid_1094	1438276
entrez_flat_entrez_pmid_1094	1451665
entrez_flat_entrez_pmid_1094	1457465
entrez_flat_entrez_pmid_1094	1463605
entrez_flat_entrez_pmid_1094	1480489
[...]
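To look up several genes in one go, the tool can be driven from a script. The sketch below is an assumption-laden illustration rather than part of bioknack: the helpers `query_args` and `query_genes` are hypothetical, they use only the `-s` and `-g` flags shown in the examples, and they assume bk_pubmed2ensembl.rb is executable and on the PATH:

```ruby
require "open3"

# Hypothetical helper (not part of bioknack): build the command line for
# one query, using only the -s (species) and -g (gene) flags shown above.
def query_args(gene, species = nil)
  args = ["bk_pubmed2ensembl.rb"]
  args += ["-s", species] if species # omit -s to use the default species
  args + ["-g", gene]
end

# Hypothetical helper: run the tool once per gene and collect the raw TSV.
# Assumes the script is on the PATH and www.pubmed2ensembl.org is reachable.
# Returns { gene => raw TSV output }.
def query_genes(genes, species = nil)
  genes.map do |gene|
    # Passing an argument list (not one string) avoids shell quoting issues
    output, status = Open3.capture2(*query_args(gene, species))
    raise "bk_pubmed2ensembl.rb failed for #{gene}" unless status.success?
    [gene, output]
  end.to_h
end
```

For example, `query_genes(["FBgn0001325"], "dmelanogaster")` would run the same query as Query 2. Running the genes one after another keeps the load on the pubmed2ensembl server low.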

Query 3: List the available species

bk_pubmed2ensembl.rb -l

Output:

Available species:
  acarolinensis
  btaurus
  celegans
  cfamiliaris
  choffmanni
  cintestinalis
  cjacchus
  cporcellus
  csavignyi
  [...]
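The species names in this listing are the values accepted by `-s`. As a minimal sketch (the method name `parse_species` is hypothetical, and the parser assumes the format shown above: a header line ending in a colon followed by one indented name per line), the listing can be turned into an array, for instance to check a species name before querying:

```ruby
# Minimal sketch (hypothetical helper, not part of bioknack): turn the
# `bk_pubmed2ensembl.rb -l` listing into an array of species names.
# Assumes a header line ending in ":" followed by one indented name per line.
def parse_species(listing)
  listing.each_line
         .map(&:strip)
         .reject { |line| line.empty? || line.end_with?(":") }
end

# The first lines of the Query 3 output
listing = <<~OUT
  Available species:
    acarolinensis
    btaurus
    celegans
OUT

species = parse_species(listing)
puts species.include?("btaurus") ? "btaurus is available" : "btaurus is not available"
```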

Acknowledgements