Mistaken Identifiers: Gene Name Errors Can Be Introduced Inadvertently When Using Excel in Bioinformatics
Barry R Zeeberg, Joseph Riss, David W Kane, Kimberly J Bussey, Edward Uchio, Marston Linehan, J. Carl Barrett and John N Weinstein
BMC Bioinformatics 2004, 5:80 doi:10.1186/1471-2105-5-80
Abstract:
Background
When processing microarray data sets, we recently noticed that
some gene names were being changed inadvertently to non-gene names.
Results
A little detective work traced the problem to default date format
conversions and floating-point format conversions in the very useful Excel program
package. The date conversions affect at least 30 gene names; the floating-point
conversions affect at least 2,000 if Riken identifiers are included. These conversions
are irreversible; the original gene names cannot be recovered.
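The two conversion types described above can be illustrated with a minimal Python sketch that flags identifiers at risk before they ever reach a spreadsheet. The function name, the month-prefix list, and the regular expressions are illustrative assumptions, not the authors' scripts; the exact set of strings Excel converts depends on version and locale settings, so this is a rough screen rather than a definitive rule.

```python
import re

# Assumed, non-exhaustive list of month-like prefixes that a spreadsheet
# may interpret as the start of a date (e.g. SEPT2 displayed as 2-Sep).
MONTH_PREFIXES = (
    "JAN", "FEB", "MAR", "MARCH", "APR", "MAY", "JUN",
    "JUL", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC",
)

# Month-like prefix, optional hyphen, then a one- or two-digit number.
DATE_LIKE = re.compile(
    r"^(?:%s)-?\d{1,2}$" % "|".join(MONTH_PREFIXES), re.IGNORECASE
)

# Digits, a single 'E', then more digits: reads as scientific notation
# (e.g. a Riken-style identifier such as 2310009E13).
FLOAT_LIKE = re.compile(r"^\d+(?:\.\d+)?E[+-]?\d+$", re.IGNORECASE)


def classify_identifier(name):
    """Return 'date', 'float', or None for a single identifier string."""
    name = name.strip()
    if DATE_LIKE.match(name):
        return "date"
    if FLOAT_LIKE.match(name):
        return "float"
    return None


if __name__ == "__main__":
    for gene in ["SEPT2", "MARCH1", "DEC1", "2310009E13", "TP53"]:
        print(gene, classify_identifier(gene))
```

Running such a check on the identifier column before import gives a list of names to protect, for example by importing that column as text.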
Conclusions
Users of Excel for analyses involving gene names should be aware
of this problem, which can cause genes, including medically important ones, to
be lost from view and which has contaminated even carefully curated public databases.
We provide work-arounds and scripts for circumventing the problem.
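As one hedged illustration of what a screening script might look like (this is a sketch under stated assumptions, not the scripts provided with the paper), the Python below scans an identifier column of a CSV export for values that already look like the output of a date or floating-point conversion. The function name `find_converted`, the column name, and the patterns are hypothetical; because the conversions are irreversible, the sketch can only flag suspect rows, not restore the original names.

```python
import csv
import io
import re

# Assumed shape of a date-converted gene name, e.g. "2-Sep" or "1-Mar".
CONVERTED_DATE = re.compile(
    r"^\d{1,2}-(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$",
    re.IGNORECASE,
)

# Assumed shape of a float-converted identifier, e.g. "2.31E+13".
CONVERTED_FLOAT = re.compile(r"^\d\.\d+E\+\d+$")


def find_converted(csv_text, column):
    """Return (data_row_number, value) pairs for suspect cells.

    data_row_number is 1-based and counts rows after the header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    hits = []
    for row_number, row in enumerate(reader, start=1):
        value = (row.get(column) or "").strip()
        if CONVERTED_DATE.match(value) or CONVERTED_FLOAT.match(value):
            hits.append((row_number, value))
    return hits


if __name__ == "__main__":
    sample = "gene,value\nTP53,1.0\n2-Sep,2.0\n2.31E+13,3.0\n"
    for row_number, value in find_converted(sample, "gene"):
        print(f"row {row_number}: suspect identifier {value!r}")
```

Flagged rows would then need to be traced back to the original, unconverted source file to recover the correct gene names.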