Table of Contents: Text Mining



Across all realms of the sciences and beyond, the rapid growth in the number of works published digitally presents new challenges and opportunities for making sense of this wealth of textual information. The maturing field of Text Mining aims to solve problems concerning the retrieval, extraction and analysis of unstructured information in digital text, and to revolutionize how scientists access and interpret data that might otherwise remain buried in the literature.

Here, PLOS acknowledges the growing body of work in the area of Text Mining by bringing together major reviews and new research studies published in PLOS journals to create the PLOS Text Mining Collection. It is no coincidence that research in Text Mining in PLOS journals is burgeoning: the widespread uptake of the Open Access publishing model developed by PLOS and other publishers now makes it easier than ever to obtain, mine and redistribute data from published texts. The launch of the PLOS Text Mining Collection complements related PLOS Collections on Open Access and Altmetrics, and further underscores the importance of the PLOS Application Programming Interface, which provides an open interface for mining PLOS journal content.

The Collection is now open across the PLOS journals to all authors who wish to submit research or reviews in this area. Articles are presented below in order of publication date, and new articles will be added to the Collection as they are published.


Open Access: Taking Full Advantage of the Content

Philip E. Bourne, J. Lynn Fink, Mark Gerstein

Messages from ISCB

Getting Started in Text Mining

K. Bretonnel Cohen, Lawrence Hunter

Getting Started in Text Mining: Part Two

Andrey Rzhetsky, Michael Seringhaus, Mark B. Gerstein



Facts from Text—Is Text Mining Ready to Deliver?

Dietrich Rebholz-Schuhmann, Harald Kirsch, Francisco Couto


Tough Mining

Steven Dickman

Research Articles

Large-Scale Event Extraction from Literature with Multi-Level Gene Normalization

Sofie Van Landeghem, Jari Björne, Chih-Hsuan Wei, Kai Hakala, Sampo Pyysalo, Sophia Ananiadou, Hung-Yu Kao, Zhiyong Lu, Tapio Salakoski, Yves Van de Peer, Filip Ginter

Text Mining Effectively Scores and Ranks the Literature for Improving Chemical-Gene-Disease Curation at the Comparative Toxicogenomics Database

Allan Peter Davis, Thomas C. Wiegers, Robin J. Johnson, Jean M. Lay, Kelley Lennon-Hopkins, Cynthia Saraceni-Richards, Daniela Sciaky, Cynthia Grondin Murphy, Carolyn J. Mattingly

Approximate Subgraph Matching-Based Literature Mining for Biomedical Events and Relations

Haibin Liu, Lawrence Hunter, Vlado Kešelj, Karin Verspoor

The Expression of Emotions in 20th Century Books

Alberto Acerbi, Vasileios Lampos, Philip Garnett, R. Alexander Bentley

Automated Authorship Attribution Using Advanced Signal Classification Techniques

Maryam Ebrahimpour, Tālis J. Putniņš, Matthew J. Berryman, Andrew Allison, Brian W.-H. Ng, Derek Abbott

Utilizing Descriptive Statements from the Biodiversity Heritage Library to Expand the Hymenoptera Anatomy Ontology

Katja C. Seltmann, Zsolt Pénzes, Matthew J. Yoder, Matthew A. Bertone, Andrew R. Deans

Connecting the Dots between PubMed Abstracts

M. Shahriar Hossain, Joseph Gresock, Yvette Edmonds, Richard Helm, Malcolm Potts, Naren Ramakrishnan

Differences among Major Taxa in the Extent of Ecological Knowledge across Four Major Ecosystems

Rebecca Fisher, Nancy Knowlton, Russell E. Brainard, M. Julian Caley

pubmed2ensembl: A Resource for Mining the Biological Literature on Genes

Joachim Baran, Martin Gerner, Maximilian Haeussler, Goran Nenadic, Casey M. Bergman

Using Electronic Patient Records to Discover Disease Correlations and Stratify Patient Cohorts

Francisco S. Roque, Peter B. Jensen, Henriette Schmock, Marlene Dalgaard, Massimo Andreatta, Thomas Hansen, Karen Søeby, Søren Bredkjær, Anders Juul, Thomas Werge, Lars J. Jensen, Søren Brunak

A Network of Genes, Genetic Disorders, and Brain Areas

Satoru Hayasaka, Christina E. Hugenschmidt, Paul J. Laurienti

Using Workflows to Explore and Optimise Named Entity Recognition for Chemistry

BalaKrishna Kolluru, Lezan Hawizy, Peter Murray-Rust, Junichi Tsujii, Sophia Ananiadou

Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches

Kevin W. Boyack, David Newman, Russell J. Duhon, Richard Klavans, Michael Patek, Joseph R. Biberstine, Bob Schijvenaars, André Skupin, Nianli Ma, Katy Börner

Database Citation in Full Text Biomedical Articles

Şenay Kafkas, Jee-Hyub Kim, Johanna R. McEntyre

The SPECIES and ORGANISMS Resources for Fast and Accurate Identification of Taxonomic Names in Text

Evangelos Pafilis, Sune P. Frankild, Lucia Fanini, Sarah Faulwetter, Christina Pavloudi, Aikaterini Vasileiadou, Christos Arvanitidis, Lars Juhl Jensen

Phrasal Paraphrase Based Question Reformulation for Archived Question Retrieval

Yu Zhang, Wei-Nan Zhang, Ke Lu, Rongrong Ji, Fanglin Wang, Ting Liu

‘HypothesisFinder:’ A Strategy for the Detection of Speculative Statements in Scientific Text

Ashutosh Malhotra, Erfan Younesi, Harsha Gurulingappa, Martin Hofmann-Apitius

Learning to Recognize Phenotype Candidates in the Auto-Immune Literature Using SVM Re-Ranking

Nigel Collier, Mai-vu Tran, Hoang-quynh Le, Quang-Thuy Ha, Anika Oellrich, Dietrich Rebholz-Schuhmann

Collective Instance-Level Gene Normalization on the IGN Corpus

Hong-Jie Dai, Johnny Chi-Yang Wu, Richard Tzong-Han Tsai

Books Average Previous Decade of Economic Misery

R. Alexander Bentley, Alberto Acerbi, Paul Ormerod, Vasileios Lampos

Biological Diversity in the Patent System

Paul Oldham, Stephen Hall, Oscar Forero