Literature Mining

We have developed large-scale text-mining software that has processed all 26 million MEDLINE records to create a weighted network of linked entities, e.g., genes, diseases, phenotypes, chemicals, lipids, metabolites, and aging-related concepts (Wren et al., 2004; Wren & Garner, 2004). The network can be used to identify associations that are implied by published relationships but have not themselves been reported (i.e., implicit relationships). Figure 1 shows an example: we originally used our programs to search for drugs that had no published relationship with cardiac hypertrophy, yet were predicted to affect it based on the published relationships they shared with it. Chlorpromazine was the top candidate identified, and when tested experimentally in a mouse model of isoproterenol-induced cardiac hypertrophy, it significantly reduced cardiac hypertrophy (Wren, 2004).
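
The idea of an implicit relationship can be illustrated with a minimal sketch. This is not our production software: the published method weights relationships and ranks candidates statistically, whereas the sketch below simply counts shared intermediates in an unweighted network. The function names (build_network, implicit_candidates) and the placeholder entities are assumptions made for illustration only.

```python
from collections import defaultdict


def build_network(pairs):
    """Build an undirected adjacency map from (entity, entity) pairs."""
    network = defaultdict(set)
    for a, b in pairs:
        network[a].add(b)
        network[b].add(a)
    return network


def implicit_candidates(network, target):
    """Rank entities with no direct link to `target` by shared intermediates.

    A candidate is implicitly related to the target when both are linked
    to the same intermediate entity but not to each other.
    """
    direct = network[target]
    shared = defaultdict(set)
    for intermediate in direct:
        for candidate in network[intermediate]:
            # Skip the target itself and anything already directly linked.
            if candidate != target and candidate not in direct:
                shared[candidate].add(intermediate)
    # Most shared intermediates first; ties broken alphabetically.
    return sorted(shared.items(), key=lambda kv: (-len(kv[1]), kv[0]))


# Placeholder entities only; these are NOT real literature relationships.
toy_pairs = [
    ("target", "concept_1"), ("target", "concept_2"), ("target", "concept_3"),
    ("drug_A", "concept_1"), ("drug_A", "concept_2"),
    ("drug_B", "concept_3"),
    ("drug_C", "target"), ("drug_C", "concept_1"),
]
network = build_network(toy_pairs)
ranked = implicit_candidates(network, "target")
for name, via in ranked:
    print(name, sorted(via))
```

In this toy network, drug_C is excluded because it already has a direct link to the target, while drug_A ranks above drug_B because it shares two intermediates rather than one.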

The programs we have developed can be used for several purposes:

  1. Identify shared relationships among a set of entities (e.g., differentially expressed genes in a microarray experiment)
  2. Find how two entities are connected in MEDLINE, or find implied relationships for an entity of interest (Figure 1)
  3. Evaluate how your genes of interest connect to concepts, or measure the literature “cohesiveness” of a set of genes (e.g., random gene sets have different network structures than experimental ones)
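
As a rough illustration of the first and third uses, the sketch below counts the outside entities that several members of a gene set share, and scores cohesion as the fraction of member pairs that are directly linked. Both measures are simplifying assumptions made for this example; the published approach uses a weighted network and compares against random sets, which is omitted here. The function names and placeholder entities are hypothetical.

```python
from collections import Counter
from itertools import combinations


def shared_relationships(network, entity_set, min_members=2):
    """List outside entities linked to at least `min_members` set members."""
    counts = Counter()
    for member in entity_set:
        for neighbor in network.get(member, set()):
            if neighbor not in entity_set:
                counts[neighbor] += 1
    hits = [(name, n) for name, n in counts.items() if n >= min_members]
    # Most widely shared first; ties broken alphabetically.
    return sorted(hits, key=lambda kv: (-kv[1], kv[0]))


def cohesion(network, entity_set):
    """Fraction of member pairs that are directly linked to each other."""
    pairs = list(combinations(sorted(entity_set), 2))
    if not pairs:
        return 0.0
    linked = sum(1 for a, b in pairs if b in network.get(a, set()))
    return linked / len(pairs)


# Placeholder entities only; these are NOT real literature relationships.
toy_network = {
    "gene_1": {"gene_2", "concept_X", "concept_Y"},
    "gene_2": {"gene_1", "concept_X"},
    "gene_3": {"concept_X", "concept_Y"},
    "concept_X": {"gene_1", "gene_2", "gene_3"},
    "concept_Y": {"gene_1", "gene_3"},
}
gene_set = {"gene_1", "gene_2", "gene_3"}

shared = shared_relationships(toy_network, gene_set)
score = cohesion(toy_network, gene_set)
print(shared)
print(round(score, 3))
```

To judge whether a score is meaningful, it would need to be compared with scores from random gene sets of the same size, as noted in item 3.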


Wren, J.D., Bekeredjian, R., Stewart, J.A., Shohet, R.V., and Garner, H.R. (2004). Knowledge discovery by automated identification and ranking of implicit relationships. Bioinformatics 20, 389-398.

Wren, J.D. and Garner, H.R. (2004). Shared relationship analysis: ranking set cohesion and commonalities within a literature-derived relationship network. Bioinformatics 20, 191-198.

Wren, J.D. (2004). Extending the mutual information measure to rank inferred literature relationships. BMC Bioinformatics 5, 145.