Genomics and Bioinformatics Group

2002 Publication

Pharmacogenomic Analysis: Correlating Molecular Substructure Classes with Microarray Gene Expression Data

Blower, P., Jr., Yang, C., Fligner, M.A., Verducci, J.S., Yu, L., Richman, S., and Weinstein, J.N.

The Pharmacogenomics Journal (Nature) 2002; 2:259-271.

Abstract: Genomic studies are producing large databases of molecular information on cancers and other cell and tissue types. Hence, we have the opportunity to link these accumulating data to the drug discovery processes. Our previous efforts at "information-intensive" molecular pharmacology have focused on the relationship between patterns of gene expression and patterns of drug activity. In the present study, we take the process a step further, relating gene expression patterns not just to the drugs as entities, but to ~27,000 substructures and other chemical features within the drugs. This coupling of genomic information with structure-based data mining can be used to identify classes of compounds for which detailed experimental structure-activity studies may be fruitful. Using a systematic substructure analysis coupled with statistical correlations of compound activity with differential gene expression, we have identified two subclasses of quinones whose patterns of activity in the National Cancer Institute's 60-cell line screening panel (NCI-60) correlate strongly with the expression patterns of particular genes: (i) the growth inhibitory patterns of an electron-withdrawing subclass of benzodithiophenedione-containing compounds over the NCI-60 are highly correlated with the expression patterns of Rab7 and other melanoma-specific genes; (ii) the inhibitory patterns of indolonaphthoquinone-containing compounds are highly correlated with the expression patterns of the hematopoietic lineage-specific gene HS1 and other leukemia genes. As illustrated by these proof-of-principle examples, we introduce here a set of conceptual tools and fluent computational methods for projecting directly from gene expression patterns to drug substructures and vice versa.
The analysis is presented in terms of the NCI-60 cell lines and microarray-based gene expression patterns, but the concept and methods are broadly applicable to other large-scale pharmacogenomic database sets as well. The approach (SAT, for Structure-Activity-Target) provides a systematic way to mine databases for the design of further structure-activity studies, particularly to aid in target and lead identification.

Note: To download the compressed databases, right-click and choose 'save as' (PC) or hold down 'control' and click (Mac).
Dataset: This is a copy of the 9,704-cDNA database available at http://discover.nci.nih.gov. (Download compressed: gz format)
Dataset: This dataset contains a 3,748-gene subset of the 9,704-cDNA database. The first column is the Washington University Clone ID number, and the second column is a brief description from the data source. The subset was selected to include only genes whose identities had been sequence-verified (see Ross et al., Nat Genet 2000; 24:227-35) and which had <10% missing data values over the 60 cell lines. (Download compressed: gz format)
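The missing-data criterion described above can be sketched as a simple row filter. This is only an illustration under stated assumptions, not the procedure actually used for the paper: the clone names and values are made up, missing values are represented as `None`, and the threshold is applied as a strict "less than 10%" test.

```python
def fraction_missing(values):
    """Return the fraction of entries in one gene's row that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def filter_rows(rows, max_missing=0.10):
    """Keep rows whose missing fraction is strictly below max_missing."""
    return {
        clone_id: values
        for clone_id, values in rows.items()
        if fraction_missing(values) < max_missing
    }


# Toy rows over 60 cell lines (hypothetical clone IDs and values).
rows = {
    "clone_A": [1.0] * 60,               # nothing missing
    "clone_B": [None] * 5 + [0.5] * 55,  # 5/60, about 8.3% missing
    "clone_C": [None] * 6 + [0.5] * 54,  # 6/60 = 10% missing, not below the cutoff
}

kept = filter_rows(rows)
print(sorted(kept))  # prints ['clone_A', 'clone_B']
```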
Dataset: This set contains standardized gene expression values for the 3,748-gene subset. The gene expression values were standardized into Z-scores by subtracting the row-mean and dividing by the row-wise standard deviation. (Download compressed: gz format)
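The row-wise standardization described above can be written out as a short sketch. Two details are assumptions, since the page does not state them: the sample standard deviation (n - 1 denominator) is used, and missing values (`None` here) are skipped when computing the mean and standard deviation and left missing in the output.

```python
from statistics import mean, stdev


def zscore_row(values):
    """Standardize one row: subtract the row mean, divide by the row SD.

    Assumptions: sample SD (n - 1 denominator); None marks a missing value
    and is passed through unchanged. A constant row (SD of zero) would
    raise ZeroDivisionError and needs separate handling.
    """
    present = [v for v in values if v is not None]
    row_mean = mean(present)
    row_sd = stdev(present)
    return [None if v is None else (v - row_mean) / row_sd for v in values]


# Toy row (made-up values) with one missing entry.
row = [1.0, 2.0, 3.0, None, 4.0, 5.0]
z = zscore_row(row)
print(z)
```

After standardization, the non-missing entries of each row have mean 0 and standard deviation 1, so rows measured on different scales can be compared directly.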
Dataset: This set contains activity values for the set of 4,463 compounds that have been tested in the NCI Developmental Therapeutics Program's sulforhodamine B assay two or more times and for which we have structure records. The activity values are based on the concentration needed for 50% growth inhibition (GI50) and are listed as -log(GI50). (Download compressed: gz format)
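As a small worked example of the -log(GI50) scale: the sketch below assumes a base-10 logarithm and a GI50 expressed in molar units, neither of which is spelled out on this page. Under those assumptions, a more potent compound (lower GI50) gets a larger activity value.

```python
import math


def neg_log_gi50(gi50_molar):
    """Convert a GI50 concentration to -log10(GI50).

    Assumptions: base-10 logarithm, concentration in mol/L.
    """
    if gi50_molar <= 0:
        raise ValueError("GI50 must be a positive concentration")
    return -math.log10(gi50_molar)


# Made-up concentrations: 1 micromolar and 10 nanomolar.
for gi50 in (1e-6, 1e-8):
    print(gi50, round(neg_log_gi50(gi50), 3))
```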
Dataset: This set contains the activity values for the set of 4,463 compounds, standardized into Z-scores by subtracting the row-mean and dividing by the row-wise standard deviation. (Download compressed: gz format)
Dataset: This set contains structure records for the set of 4,463 compounds in MDL's SDF file format; see http://www.mdli.com for details of the structure format. (Download compressed: gz format)
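For readers unfamiliar with SDF files, the sketch below shows one way to split such a file into per-compound records. It relies only on the SDF convention that each record ends with a line containing `$$$$` and that the first line of a record is the molecule title. The toy text is a placeholder, not a valid structure, and nothing here reflects the actual contents of the downloadable file.

```python
def split_sdf_records(text):
    """Split SDF text into records, using the '$$$$' line as the terminator.

    Any trailing text after the last '$$$$' line is ignored.
    """
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return records


def record_title(record):
    """Return the first line of a record (the molecule title), or '' if empty."""
    lines = record.splitlines()
    return lines[0] if lines else ""


# Placeholder text standing in for two records; the structure blocks are omitted.
toy_sdf = "\n".join([
    "toy-compound-1",
    "(structure block omitted in this toy example)",
    "$$$$",
    "toy-compound-2",
    "(structure block omitted in this toy example)",
    "$$$$",
])

records = split_sdf_records(toy_sdf)
print(len(records), [record_title(r) for r in records])
```

A gzip-compressed file can be read as text with the standard library, for example `gzip.open(path, "rt").read()`, before passing the result to `split_sdf_records`; `path` here is a hypothetical local file name. For real work, a cheminformatics toolkit is a safer choice than hand-rolled parsing.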
Dataset: This set contains Pearson correlation coefficients for three quinones (NSC 656238, NSC 661223 and NSC 682991) and the 476 genes we selected using the Studentized range test on the 7 cell panels for which the average expression level was significantly high or low. The first column is the Washington University Clone ID number, and the remaining columns contain compound-gene correlation coefficients. (Download compressed: gz format)
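A compound-gene correlation of the kind listed in this dataset can be illustrated with a plain Pearson coefficient computed over cell lines. This is a minimal sketch with made-up numbers. Dropping cell lines where either value is missing (`None`) is an assumption, since the page does not say how missing data were handled.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles over cell lines.

    Assumption: cell lines where either value is None are dropped pairwise.
    Constant profiles (zero variance) would raise ZeroDivisionError.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / sqrt(sxx * syy)


# Made-up profiles over five cell lines: compound activity and gene expression.
activity = [5.2, 6.1, None, 7.4, 4.8]
expression = [0.3, 0.9, 1.5, 2.0, -0.1]
print(round(pearson(activity, expression), 3))
```

Because the Pearson coefficient is unchanged by shifting and rescaling either profile, it gives the same value on the raw rows and on their Z-score standardized versions.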
