Download Excel spreadsheet of periodic transcripts identified in at least two data sets by SPM
Download Excel spreadsheet of periodic transcripts identified in only one data set by SPM
A statistical modeling approach has been devised to search large microarray data sets for genes that show a transcriptional response to a stimulus. To illustrate this strategy, we derived a Single Pulse Model (SPM) that describes the profile expected for a periodically transcribed gene, and used it to search for budding yeast transcripts that follow this profile. Using objective criteria, this method identifies 81% of the known periodic transcripts, as well as 1088 genes that show significant periodicity in at least one of the three data sets analyzed. Of the genes identified as periodic by Spellman et al., 1998 (MBOC 9, 3273), 65% are also on this list of 1088 genes. Only one quarter of these 1088 genes show significant oscillations in at least two data sets and can therefore be classified as periodic with high confidence.
The primary assumptions of SPM are that cell cycle regulated transcripts peak only once per cycle and that these pulses occur at the same times in consecutive cycles. To reduce the impact of noise, peaks and troughs must also persist for at least two data points. We estimated the cell cycle span for each data set using a set of 104 known cell cycle regulated genes. As expected, the cycle span differs for each synchrony method: 58 min for the alpha factor synchrony, 115 min for the cdc15 cells, and 85 min for the cdc28 culture. For each periodic transcript, SPM provides estimates of the mean activation and deactivation times and of the induced and basal expression levels, together with statistical measures of the quality of these estimates. This website provides Excel spreadsheets containing this information.
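To make the two core assumptions concrete, the sketch below models an idealized single-pulse profile: expression sits at a basal level, switches to an elevated level between the activation and deactivation times, and repeats at the same phase every cycle. It is a simplified illustration under stated assumptions, not the published SPM fitting procedure (which is a statistical model fitted to noisy data). The class `SinglePulse`, the function `pulse_has_min_points`, the example times, and the 7-minute sampling grid are all hypothetical; the two-data-point rule is checked on the first cycle only, as a simplification.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SinglePulse:
    """Idealized single-pulse profile (hypothetical helper, not the published SPM code)."""
    cycle_span: float  # cycle span in minutes, e.g. 58 for the alpha factor data set
    act: float         # activation time (min); assumes 0 <= act < deact <= cycle_span
    deact: float       # deactivation time (min)
    basal: float       # basal expression level (log ratio)
    elev: float        # elevated expression level (log ratio)

    def expected(self, t: float) -> float:
        # One pulse per cycle at an invariant phase: only t modulo the cycle span matters.
        phase = t % self.cycle_span
        return self.elev if self.act <= phase < self.deact else self.basal


def pulse_has_min_points(model: SinglePulse, times: list[float], min_points: int = 2) -> bool:
    # Simplified noise rule, checked on the first cycle only: the peak (elevated)
    # and the trough (basal) must each cover at least `min_points` sampled times.
    first_cycle = [t for t in times if 0 <= t < model.cycle_span]
    peak = sum(1 for t in first_cycle if model.act <= t < model.deact)
    trough = len(first_cycle) - peak
    return peak >= min_points and trough >= min_points


# Usage with a hypothetical 7-minute sampling grid (0 to 119 min).
times = list(range(0, 120, 7))
model = SinglePulse(cycle_span=58, act=10, deact=25, basal=-0.5, elev=1.0)
print(model.expected(70))                  # 70 mod 58 = 12, inside the pulse -> 1.0
print(pulse_has_min_points(model, times))  # peak sampled at 14 and 21 min -> True
```

Because the pulse is defined by its phase within the cycle, the same activation and deactivation times describe every cycle, which is what allows a single pair of times per data set to be reported.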
A threshold of |Z| > 5 was set to identify transcripts with significant oscillations, and a test statistic (χ²) was used to identify profiles that depart significantly from SPM. Transcripts that fall below the Z threshold, or for which χ² > 11.3, are not considered periodic and are not listed. The R² values represent the fraction of the variation in the data that is explained by the model. Where a transcript meets SPM criteria in only one of the three data sets, the activation times and expression levels are set to 0.0 for the other two data sets.

The data for the alpha factor synchrony are listed first, followed by the cdc15 and cdc28 data. For each data set, the first three columns provide the statistical values. They are followed by the activation (act) and deactivation (deact) times in minutes, and then by the basal and elevated (elev) expression levels, given as log ratios. Two spreadsheets are provided: the first gives data for transcripts that are significantly periodic in two or three data sets; the second shows transcripts that meet SPM criteria for periodicity in only one of the three data sets.
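The selection rules above can be sketched as a small filter. This is a minimal illustration assuming that the Z and χ² values for each data set have already been computed; the names `meets_spm`, `r_squared`, `assign_spreadsheet`, and the data-set labels are hypothetical and do not come from the spreadsheets. R² is computed with the standard definition 1 − SS_residual / SS_total, which is assumed here to match "the fraction of the data variation that can be explained by the model".

```python
from typing import Dict, Optional, Sequence, Tuple

# Cutoffs stated in the text.
Z_THRESHOLD = 5.0   # |Z| > 5: significant oscillation
CHI2_CUTOFF = 11.3  # chi2 > 11.3: significant departure from SPM
DATA_SETS = ("alpha", "cdc15", "cdc28")  # hypothetical labels, in spreadsheet order


def meets_spm(z: float, chi2: float) -> bool:
    """True if a transcript passes both SPM criteria in one data set."""
    return abs(z) > Z_THRESHOLD and chi2 <= CHI2_CUTOFF


def r_squared(observed: Sequence[float], fitted: Sequence[float]) -> float:
    """Fraction of variation explained: 1 - SS_residual / SS_total (assumed definition)."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((y - mean) ** 2 for y in observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return 1.0 - ss_res / ss_tot


def assign_spreadsheet(stats: Dict[str, Tuple[float, float]]) -> Optional[str]:
    """Map per-data-set (Z, chi2) pairs to the spreadsheet a transcript would appear in."""
    n_periodic = sum(1 for ds in DATA_SETS if ds in stats and meets_spm(*stats[ds]))
    if n_periodic >= 2:
        return "periodic in two or three data sets"
    if n_periodic == 1:
        return "periodic in one data set"
    return None  # not periodic in any data set: not listed


# Usage with made-up statistics for one transcript.
example = {"alpha": (6.2, 3.1), "cdc15": (-5.5, 8.0), "cdc28": (2.0, 4.0)}
print(assign_spreadsheet(example))  # alpha and cdc15 pass -> two or three data sets
```

Note that the Z test uses the absolute value, so strongly negative Z scores also count as significant oscillations, while the χ² test works in the opposite direction: a large χ² rejects the transcript.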
For further information, please refer to:
Statistical modeling of large microarray data sets to identify stimulus-response profiles.
PNAS 98: 5631-5636 (2001).
Lue Ping Zhao¹,³, Ross Prentice¹, Linda Breeden²,³
¹Division of Public Health Sciences, ²Division of Basic Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave. North, Seattle, WA 98006, U.S.A. ³All correspondence should be addressed to Linda Breeden (lbreeden@fhcrc.org) or Lue Ping Zhao (lzhao@fhcrc.org).