A | |
---|---|
1 | |
2 | Data collected by Heather Piwowar |
3 | hpiwowar@gmail.com |
4 | This data is made available under a CC0 waiver |
5 | Please attribute according to academic norms :) |
6 | |
7 | For more info: |
8 | http://researchremix.wordpress.com/2011/02/18/early_results/ |
A | B | C | D | E | F | |
---|---|---|---|---|---|---|
1 | | 2007 | 2008 | 2009 | 2010 | |
2 | Hits in PubMed for "gene expression profiling"[mesh] | 7,056 | 8,003 | 8,889 | 7,397 | |
3 | Hits in PMC for "gene expression profiling"[mesh] | 1,631 | 2,575 | 3,163 | 1,866 | |
4 | % of PubMed in PMC | 23% | 32% | 36% | 25% | |
5 | ||||||
6 | ||||||
7 | | 2007 | 2008 | 2009 | 2010 | TOTAL |
8 | Use by data-producing authors: | |||||
9 | mentions of 2007 GEO accessions in PMC | 472 | 221 | 83 | 60 | 836 |
10 | extrapolation to all PubMed | 2,042 | 687 | 233 | 238 | 3,200 |
11 | ||||||
12 | ||||||
13 | Use by NON data-producing authors: | |||||
14 | mentions of 2007 GEO accessions in PMC | 12 | 59 | 117 | 150 | 338 |
15 | extrapolation to all PubMed | 52 | 183 | 329 | 595 | 1,159 |
16 | ||||||
17 | ||||||
18 | Number of GSE submissions to GEO in 2007 | 2711 | ||||
19 | Per number of submissions, Original and secondary use by NON data-producing authors: | |||||
20 | mentions of 2007 GEO accessions in PMC | 0.004 | 0.022 | 0.043 | 0.055 | 0.125 |
21 | extrapolation to all PubMed | 0.019 | 0.068 | 0.121 | 0.219 | 0.427 |
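
The "extrapolation to all PubMed" rows appear to scale each year's PMC count by that year's ratio of PubMed hits to PMC hits, and the per-submission rows divide by the 2,711 GSE submissions. The sketch below is a reconstruction under that assumption, not a formula stated in the sheet; the variable and function names are illustrative.

```python
# Values copied from the table above (columns 2007-2010).
pubmed_hits = [7056, 8003, 8889, 7397]
pmc_hits = [1631, 2575, 3163, 1866]
mentions_by_producers = [472, 221, 83, 60]
mentions_by_others = [12, 59, 117, 150]
gse_submissions_2007 = 2711


def extrapolate(pmc_mentions):
    """Scale PMC counts by the PubMed/PMC hit ratio per year (assumed formula)."""
    return [
        mentions * pubmed / pmc
        for mentions, pubmed, pmc in zip(pmc_mentions, pubmed_hits, pmc_hits)
    ]


producers_all = extrapolate(mentions_by_producers)
others_all = extrapolate(mentions_by_others)

# Rounded per-year extrapolations for both author groups.
print([round(x) for x in producers_all])
print([round(x) for x in others_all])

# Totals for non-producing authors, normalized per 2007 GSE submission.
print(round(sum(mentions_by_others) / gse_submissions_2007, 3))
print(round(sum(others_all) / gse_submissions_2007, 3))
```

With the unrounded PMC share (rather than the displayed 23%, 32%, 36%, 25%), the rounded results agree with the extrapolation and per-submission rows in the table.
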
A | B | C | D | E | F | |
---|---|---|---|---|---|---|
1 | 656 | Number of instances of reuse across all PMC articles | ||||
2 | 364 | Number of accessions with at least one reuse in PMC | ||||
3 | 2711 | Number of GSE accessions in 2007 | ||||
4 | 0.134267798 | Proportion of accessions that led to reuse | ||||
5 | ||||||
6 | ||||||
7 | ||||||
8 | ||||||
9 | reuse by others | 1 | ||||
10 | ||||||
11 | Count of accession | |||||
12 | gse | Total | % of all reuses | running % of reuses | rank | rank percentile of all 2007 datasets |
13 | ['7390'] | 24 | 3.66% | 3.66% | 1 | 0.04% |
14 | ['6532'] | 20 | 3.05% | 6.71% | 2 | 0.07% |
15 | ['7307'] | 16 | 2.44% | 9.15% | 3 | 0.11% |
16 | ['6919'] | 10 | 1.52% | 10.67% | 4 | 0.15% |
17 | ['4107'] | 9 | 1.37% | 12.04% | 5 | 0.18% |
18 | ['6893'] | 9 | 1.37% | 13.41% | 6 | 0.22% |
19 | ['6740'] | 7 | 1.07% | 14.48% | 7 | 0.26% |
20 | ['5620'] | 7 | 1.07% | 15.55% | 8 | 0.30% |
21 | ['6901'] | 7 | 1.07% | 16.62% | 9 | 0.33% |
22 | ['6691'] | 6 | 0.91% | 17.53% | 10 | 0.37% |
23 | ['6764'] | 6 | 0.91% | 18.45% | 11 | 0.41% |
24 | ['7951'] | 6 | 0.91% | 19.36% | 12 | 0.44% |
25 | ['7448'] | 5 | 0.76% | 20.12% | 13 | 0.48% |
26 | ['5629'] | 5 | 0.76% | 20.88% | 14 | 0.52% |
27 | ['5632'] | 5 | 0.76% | 21.65% | 15 | 0.55% |
28 | ['5764'] | 5 | 0.76% | 22.41% | 16 | 0.59% |
29 | ['5859'] | 5 | 0.76% | 23.17% | 17 | 0.63% |
30 | ['6838'] | 5 | 0.76% | 23.93% | 18 | 0.66% |
31 | ['6908'] | 5 | 0.76% | 24.70% | 19 | 0.70% |
32 | ['8671'] | 5 | 0.76% | 25.46% | 20 | 0.74% |
33 | ['6536'] | 5 | 0.76% | 26.22% | 21 | 0.77% |
34 | ['8401'] | 5 | 0.76% | 26.98% | 22 | 0.81% |
35 | ['7761'] | 4 | 0.61% | 27.59% | 23 | 0.85% |
36 | ['5621'] | 4 | 0.61% | 28.20% | 24 | 0.89% |
37 | ['5623'] | 4 | 0.61% | 28.81% | 25 | 0.92% |
38 | ['5624'] | 4 | 0.61% | 29.42% | 26 | 0.96% |
39 | ['5630'] | 4 | 0.61% | 30.03% | 27 | 1.00% |
40 | ['5631'] | 4 | 0.61% | 30.64% | 28 | 1.03% |
41 | ['5634'] | 4 | 0.61% | 31.25% | 29 | 1.07% |
42 | ['5847'] | 4 | 0.61% | 31.86% | 30 | 1.11% |
43 | ['6719'] | 4 | 0.61% | 32.47% | 31 | 1.14% |
44 | ['7621'] | 4 | 0.61% | 33.08% | 32 | 1.18% |
45 | ['8024'] | 4 | 0.61% | 33.69% | 33 | 1.22% |
46 | ['8501'] | 4 | 0.61% | 34.30% | 34 | 1.25% |
47 | ['9566'] | 4 | 0.61% | 34.91% | 35 | 1.29% |
48 | ['9832'] | 4 | 0.61% | 35.52% | 36 | 1.33% |
49 | ['6008'] | 4 | 0.61% | 36.13% | 37 | 1.36% |
50 | ['1986'] | 3 | 0.46% | 36.59% | 38 | 1.40% |
51 | ['4183'] | 3 | 0.46% | 37.04% | 39 | 1.44% |
52 | ['5287'] | 3 | 0.46% | 37.50% | 40 | 1.48% |
53 | ['5628'] | 3 | 0.46% | 37.96% | 41 | 1.51% |
54 | ['5949'] | 3 | 0.46% | 38.41% | 42 | 1.55% |
55 | ['6269'] | 3 | 0.46% | 38.87% | 43 | 1.59% |
56 | ['6477'] | 3 | 0.46% | 39.33% | 44 | 1.62% |
57 | ['6783'] | 3 | 0.46% | 39.79% | 45 | 1.66% |
58 | ['6872'] | 3 | 0.46% | 40.24% | 46 | 1.70% |
59 | ['6916'] | 3 | 0.46% | 40.70% | 47 | 1.73% |
60 | ['7146'] | 3 | 0.46% | 41.16% | 48 | 1.77% |
61 | ['7378'] | 3 | 0.46% | 41.62% | 49 | 1.81% |
62 | ['7576'] | 3 | 0.46% | 42.07% | 50 | 1.84% |
63 | ['7670'] | 3 | 0.46% | 42.53% | 51 | 1.88% |
64 | ['7763'] | 3 | 0.46% | 42.99% | 52 | 1.92% |
65 | ['7904'] | 3 | 0.46% | 43.45% | 53 | 1.95% |
66 | ['9006'] | 3 | 0.46% | 43.90% | 54 | 1.99% |
67 | ['3165'] | 2 | 0.30% | 44.21% | 55 | 2.03% |
68 | ['4302'] | 2 | 0.30% | 44.51% | 56 | 2.07% |
69 | ['5327'] | 2 | 0.30% | 44.82% | 57 | 2.10% |
70 | ['5462'] | 2 | 0.30% | 45.12% | 58 | 2.14% |
71 | ['5615'] | 2 | 0.30% | 45.43% | 59 | 2.18% |
72 | ['5617'] | 2 | 0.30% | 45.73% | 60 | 2.21% |
73 | ['5720'] | 2 | 0.30% | 46.04% | 61 | 2.25% |
74 | ['5791'] | 2 | 0.30% | 46.34% | 62 | 2.29% |
75 | ['5843'] | 2 | 0.30% | 46.65% | 63 | 2.32% |
76 | ['5862'] | 2 | 0.30% | 46.95% | 64 | 2.36% |
77 | ['5961'] | 2 | 0.30% | 47.26% | 65 | 2.40% |
78 | ['6236'] | 2 | 0.30% | 47.56% | 66 | 2.43% |
79 | ['6292'] | 2 | 0.30% | 47.87% | 67 | 2.47% |
80 | ['6385'] | 2 | 0.30% | 48.17% | 68 | 2.51% |
81 | ['6481'] | 2 | 0.30% | 48.48% | 69 | 2.55% |
82 | ['6596'] | 2 | 0.30% | 48.78% | 70 | 2.58% |
83 | ['6624'] | 2 | 0.30% | 49.09% | 71 | 2.62% |
84 | ['6710'] | 2 | 0.30% | 49.39% | 72 | 2.66% |
85 | ['6731'] | 2 | 0.30% | 49.70% | 73 | 2.69% |
86 | ['6791'] | 2 | 0.30% | 50.00% | 74 | 2.73% |
87 | ['6802'] | 2 | 0.30% | 50.30% | 75 | 2.77% |
88 | ['6858'] | 2 | 0.30% | 50.61% | 76 | 2.80% |
89 | ['6883'] | 2 | 0.30% | 50.91% | 77 | 2.84% |
90 | ['7069'] | 2 | 0.30% | 51.22% | 78 | 2.88% |
91 | ['7112'] | 2 | 0.30% | 51.52% | 79 | 2.91% |
92 | ['7181'] | 2 | 0.30% | 51.83% | 80 | 2.95% |
93 | ['7196'] | 2 | 0.30% | 52.13% | 81 | 2.99% |
94 | ['7329'] | 2 | 0.30% | 52.44% | 82 | 3.02% |
95 | ['7333'] | 2 | 0.30% | 52.74% | 83 | 3.06% |
96 | ['7540'] | 2 | 0.30% | 53.05% | 84 | 3.10% |
97 | ['7765'] | 2 | 0.30% | 53.35% | 85 | 3.14% |
98 | ['7896'] | 2 | 0.30% | 53.66% | 86 | 3.17% |
99 | ['7902'] | 2 | 0.30% | 53.96% | 87 | 3.21% |
100 | ['7929'] | 2 | 0.30% | 54.27% | 88 | 3.25% |
101 | ['7930'] | 2 | 0.30% | 54.57% | 89 | 3.28% |
102 | ['8004'] | 2 | 0.30% | 54.88% | 90 | 3.32% |
103 | ['8052'] | 2 | 0.30% | 55.18% | 91 | 3.36% |
104 | ['8218'] | 2 | 0.30% | 55.49% | 92 | 3.39% |
105 | ['8311'] | 2 | 0.30% | 55.79% | 93 | 3.43% |
106 | ['8365'] | 2 | 0.30% | 56.10% | 94 | 3.47% |
107 | ['8514'] | 2 | 0.30% | 56.40% | 95 | 3.50% |
108 | ['8650'] | 2 | 0.30% | 56.71% | 96 | 3.54% |
109 | ['8668'] | 2 | 0.30% | 57.01% | 97 | 3.58% |
110 | ['8700'] | 2 | 0.30% | 57.32% | 98 | 3.61% |
111 | ['8759'] | 2 | 0.30% | 57.62% | 99 | 3.65% |
112 | ['8835'] | 2 | 0.30% | 57.93% | 100 | 3.69% |
113 | ['8884'] | 2 | 0.30% | 58.23% | 101 | 3.73% |
114 | ['8910'] | 2 | 0.30% | 58.54% | 102 | 3.76% |
115 | ['8919'] | 2 | 0.30% | 58.84% | 103 | 3.80% |
116 | ['8945'] | 2 | 0.30% | 59.15% | 104 | 3.84% |
117 | ['8977'] | 2 | 0.30% | 59.45% | 105 | 3.87% |
118 | ['8990'] | 2 | 0.30% | 59.76% | 106 | 3.91% |
119 | ['9254'] | 2 | 0.30% | 60.06% | 107 | 3.95% |
120 | ['9574'] | 2 | 0.30% | 60.37% | 108 | 3.98% |
121 | ['9650'] | 2 | 0.30% | 60.67% | 109 | 4.02% |
122 | ['9877'] | 2 | 0.30% | 60.98% | 110 | 4.06% |
123 | ['7302'] | 2 | 0.30% | 61.28% | 111 | 4.09% |
124 | ['7956'] | 2 | 0.30% | 61.59% | 112 | 4.13% |
125 | ['6210'] | 1 | 0.15% | 61.74% | 113 | 4.17% |
126 | ['7186'] | 1 | 0.15% | 61.89% | 114 | 4.21% |
127 | ['8608'] | 1 | 0.15% | 62.04% | 115 | 4.24% |
128 | ['8842'] | 1 | 0.15% | 62.20% | 116 | 4.28% |
129 | ['1993'] | 1 | 0.15% | 62.35% | 117 | 4.32% |
130 | ['2888'] | 1 | 0.15% | 62.50% | 118 | 4.35% |
131 | ['3629'] | 1 | 0.15% | 62.65% | 119 | 4.39% |
132 | ['3748'] | 1 | 0.15% | 62.80% | 120 | 4.43% |
133 | ['3990'] | 1 | 0.15% | 62.96% | 121 | 4.46% |
134 | ['4091'] | 1 | 0.15% | 63.11% | 122 | 4.50% |
135 | ['4327'] | 1 | 0.15% | 63.26% | 123 | 4.54% |
136 | ['4494'] | 1 | 0.15% | 63.41% | 124 | 4.57% |
137 | ['4582'] | 1 | 0.15% | 63.57% | 125 | 4.61% |
138 | ['4654'] | 1 | 0.15% | 63.72% | 126 | 4.65% |
139 | ['4716'] | 1 | 0.15% | 63.87% | 127 | 4.68% |
140 | ['4786'] | 1 | 0.15% | 64.02% | 128 | 4.72% |
141 | ['4859'] | 1 | 0.15% | 64.18% | 129 | 4.76% |
142 | ['4992'] | 1 | 0.15% | 64.33% | 130 | 4.80% |
143 | ['5054'] | 1 | 0.15% | 64.48% | 131 | 4.83% |
144 | ['5108'] | 1 | 0.15% | 64.63% | 132 | 4.87% |
145 | ['5140'] | 1 | 0.15% | 64.79% | 133 | 4.91% |
146 | ['5145'] | 1 | 0.15% | 64.94% | 134 | 4.94% |
147 | ['5180'] | 1 | 0.15% | 65.09% | 135 | 4.98% |
148 | ['5265'] | 1 | 0.15% | 65.24% | 136 | 5.02% |
149 | ['5310'] | 1 | 0.15% | 65.40% | 137 | 5.05% |
150 | ['5381'] | 1 | 0.15% | 65.55% | 138 | 5.09% |
151 | ['5460'] | 1 | 0.15% | 65.70% | 139 | 5.13% |
152 | ['5559'] | 1 | 0.15% | 65.85% | 140 | 5.16% |
153 | ['5595'] | 1 | 0.15% | 66.01% | 141 | 5.20% |
154 | ['5622'] | 1 | 0.15% | 66.16% | 142 | 5.24% |
155 | ['5633'] | 1 | 0.15% | 66.31% | 143 | 5.27% |
156 | ['5652'] | 1 | 0.15% | 66.46% | 144 | 5.31% |
157 | ['5658'] | 1 | 0.15% | 66.62% | 145 | 5.35% |
158 | ['5685'] | 1 | 0.15% | 66.77% | 146 | 5.39% |
159 | ['5715'] | 1 | 0.15% | 66.92% | 147 | 5.42% |
160 | ['5722'] | 1 | 0.15% | 67.07% | 148 | 5.46% |
161 | ['5745'] | 1 | 0.15% | 67.23% | 149 | 5.50% |
162 | ['5749'] | 1 | 0.15% | 67.38% | 150 | 5.53% |
163 | ['5784'] | 1 | 0.15% | 67.53% | 151 | 5.57% |
164 | ['5808'] | 1 | 0.15% | 67.68% | 152 | 5.61% |
165 | ['5828'] | 1 | 0.15% | 67.84% | 153 | 5.64% |
166 | ['5923'] | 1 | 0.15% | 67.99% | 154 | 5.68% |
167 | ['6013'] | 1 | 0.15% | 68.14% | 155 | 5.72% |
168 | ['6043'] | 1 | 0.15% | 68.29% | 156 | 5.75% |
169 | ['6090'] | 1 | 0.15% | 68.45% | 157 | 5.79% |
170 | ['6130'] | 1 | 0.15% | 68.60% | 158 | 5.83% |
171 | ['6151'] | 1 | 0.15% | 68.75% | 159 | 5.86% |
172 | ['6162'] | 1 | 0.15% | 68.90% | 160 | 5.90% |
173 | ['6171'] | 1 | 0.15% | 69.05% | 161 | 5.94% |
174 | ['6181'] | 1 | 0.15% | 69.21% | 162 | 5.98% |
175 | ['6192'] | 1 | 0.15% | 69.36% | 163 | 6.01% |
176 | ['6195'] | 1 | 0.15% | 69.51% | 164 | 6.05% |
177 | ['6238'] | 1 | 0.15% | 69.66% | 165 | 6.09% |
178 | ['6267'] | 1 | 0.15% | 69.82% | 166 | 6.12% |
179 | ['6271'] | 1 | 0.15% | 69.97% | 167 | 6.16% |
180 | ['6364'] | 1 | 0.15% | 70.12% | 168 | 6.20% |
181 | ['6448'] | 1 | 0.15% | 70.27% | 169 | 6.23% |
182 | ['6461'] | 1 | 0.15% | 70.43% | 170 | 6.27% |
183 | ['6474'] | 1 | 0.15% | 70.58% | 171 | 6.31% |
184 | ['6476'] | 1 | 0.15% | 70.73% | 172 | 6.34% |
185 | ['6509'] | 1 | 0.15% | 70.88% | 173 | 6.38% |
186 | ['6514'] | 1 | 0.15% | 71.04% | 174 | 6.42% |
187 | ['6526'] | 1 | 0.15% | 71.19% | 175 | 6.46% |
188 | ['6542'] | 1 | 0.15% | 71.34% | 176 | 6.49% |
189 | ['6547'] | 1 | 0.15% | 71.49% | 177 | 6.53% |
190 | ['6558'] | 1 | 0.15% | 71.65% | 178 | 6.57% |
191 | ['6569'] | 1 | 0.15% | 71.80% | 179 | 6.60% |
192 | ['6571'] | 1 | 0.15% | 71.95% | 180 | 6.64% |
193 | ['6593'] | 1 | 0.15% | 72.10% | 181 | 6.68% |
194 | ['6595'] | 1 | 0.15% | 72.26% | 182 | 6.71% |
195 | ['6621'] | 1 | 0.15% | 72.41% | 183 | 6.75% |
196 | ['6625'] | 1 | 0.15% | 72.56% | 184 | 6.79% |
197 | ['6634'] | 1 | 0.15% | 72.71% | 185 | 6.82% |
198 | ['6640'] | 1 | 0.15% | 72.87% | 186 | 6.86% |
199 | ['6675'] | 1 | 0.15% | 73.02% | 187 | 6.90% |
200 | ['6688'] | 1 | 0.15% | 73.17% | 188 | 6.93% |
201 | ['6690'] | 1 | 0.15% | 73.32% | 189 | 6.97% |
202 | ['6712'] | 1 | 0.15% | 73.48% | 190 | 7.01% |
203 | ['6720'] | 1 | 0.15% | 73.63% | 191 | 7.05% |
204 | ['6734'] | 1 | 0.15% | 73.78% | 192 | 7.08% |
205 | ['6741'] | 1 | 0.15% | 73.93% | 193 | 7.12% |
206 | ['6754'] | 1 | 0.15% | 74.09% | 194 | 7.16% |
207 | ['6770'] | 1 | 0.15% | 74.24% | 195 | 7.19% |
208 | ['6772'] | 1 | 0.15% | 74.39% | 196 | 7.23% |
209 | ['6784'] | 1 | 0.15% | 74.54% | 197 | 7.27% |
210 | ['6794'] | 1 | 0.15% | 74.70% | 198 | 7.30% |
211 | ['6798'] | 1 | 0.15% | 74.85% | 199 | 7.34% |
212 | ['6800'] | 1 | 0.15% | 75.00% | 200 | 7.38% |
213 | ['6822'] | 1 | 0.15% | 75.15% | 201 | 7.41% |
214 | ['6836'] | 1 | 0.15% | 75.30% | 202 | 7.45% |
215 | ['6841'] | 1 | 0.15% | 75.46% | 203 | 7.49% |
216 | ['6881'] | 1 | 0.15% | 75.61% | 204 | 7.52% |
217 | ['6887'] | 1 | 0.15% | 75.76% | 205 | 7.56% |
218 | ['6903'] | 1 | 0.15% | 75.91% | 206 | 7.60% |
219 | ['6907'] | 1 | 0.15% | 76.07% | 207 | 7.64% |
220 | ['6914'] | 1 | 0.15% | 76.22% | 208 | 7.67% |
221 | ['6930'] | 1 | 0.15% | 76.37% | 209 | 7.71% |
222 | ['6934'] | 1 | 0.15% | 76.52% | 210 | 7.75% |
223 | ['6955'] | 1 | 0.15% | 76.68% | 211 | 7.78% |
224 | ['6965'] | 1 | 0.15% | 76.83% | 212 | 7.82% |
225 | ['6967'] | 1 | 0.15% | 76.98% | 213 | 7.86% |
226 | ['6969'] | 1 | 0.15% | 77.13% | 214 | 7.89% |
227 | ['6976'] | 1 | 0.15% | 77.29% | 215 | 7.93% |
228 | ['6996'] | 1 | 0.15% | 77.44% | 216 | 7.97% |
229 | ['7007'] | 1 | 0.15% | 77.59% | 217 | 8.00% |
230 | ['7012'] | 1 | 0.15% | 77.74% | 218 | 8.04% |
231 | ['7023'] | 1 | 0.15% | 77.90% | 219 | 8.08% |
232 | ['7029'] | 1 | 0.15% | 78.05% | 220 | 8.12% |
233 | ['7032'] | 1 | 0.15% | 78.20% | 221 | 8.15% |
234 | ['7094'] | 1 | 0.15% | 78.35% | 222 | 8.19% |
235 | ['7097'] | 1 | 0.15% | 78.51% | 223 | 8.23% |
236 | ['7118'] | 1 | 0.15% | 78.66% | 224 | 8.26% |
237 | ['7123'] | 1 | 0.15% | 78.81% | 225 | 8.30% |
238 | ['7124'] | 1 | 0.15% | 78.96% | 226 | 8.34% |
239 | ['7127'] | 1 | 0.15% | 79.12% | 227 | 8.37% |
240 | ['7148'] | 1 | 0.15% | 79.27% | 228 | 8.41% |
241 | ['7152'] | 1 | 0.15% | 79.42% | 229 | 8.45% |
242 | ['7153'] | 1 | 0.15% | 79.57% | 230 | 8.48% |
243 | ['7208'] | 1 | 0.15% | 79.73% | 231 | 8.52% |
244 | ['7211'] | 1 | 0.15% | 79.88% | 232 | 8.56% |
245 | ['7226'] | 1 | 0.15% | 80.03% | 233 | 8.59% |
246 | ['7230'] | 1 | 0.15% | 80.18% | 234 | 8.63% |
247 | ['7236'] | 1 | 0.15% | 80.34% | 235 | 8.67% |
248 | ['7238'] | 1 | 0.15% | 80.49% | 236 | 8.71% |
249 | ['7247'] | 1 | 0.15% | 80.64% | 237 | 8.74% |
250 | ['7305'] | 1 | 0.15% | 80.79% | 238 | 8.78% |
251 | ['7316'] | 1 | 0.15% | 80.95% | 239 | 8.82% |
252 | ['7332'] | 1 | 0.15% | 81.10% | 240 | 8.85% |
253 | ['7339'] | 1 | 0.15% | 81.25% | 241 | 8.89% |
254 | ['7348'] | 1 | 0.15% | 81.40% | 242 | 8.93% |
255 | ['7372'] | 1 | 0.15% | 81.55% | 243 | 8.96% |
256 | ['7377'] | 1 | 0.15% | 81.71% | 244 | 9.00% |
257 | ['7392'] | 1 | 0.15% | 81.86% | 245 | 9.04% |
258 | ['7407'] | 1 | 0.15% | 82.01% | 246 | 9.07% |
259 | ['7414'] | 1 | 0.15% | 82.16% | 247 | 9.11% |
260 | ['7436'] | 1 | 0.15% | 82.32% | 248 | 9.15% |
261 | ['7473'] | 1 | 0.15% | 82.47% | 249 | 9.18% |
262 | ['7483'] | 1 | 0.15% | 82.62% | 250 | 9.22% |
263 | ['7493'] | 1 | 0.15% | 82.77% | 251 | 9.26% |
264 | ['7509'] | 1 | 0.15% | 82.93% | 252 | 9.30% |
265 | ['7524'] | 1 | 0.15% | 83.08% | 253 | 9.33% |
266 | ['7556'] | 1 | 0.15% | 83.23% | 254 | 9.37% |
267 | ['7578'] | 1 | 0.15% | 83.38% | 255 | 9.41% |
268 | ['7586'] | 1 | 0.15% | 83.54% | 256 | 9.44% |
269 | ['7648'] | 1 | 0.15% | 83.69% | 257 | 9.48% |
270 | ['7660'] | 1 | 0.15% | 83.84% | 258 | 9.52% |
271 | ['7678'] | 1 | 0.15% | 83.99% | 259 | 9.55% |
272 | ['7688'] | 1 | 0.15% | 84.15% | 260 | 9.59% |
273 | ['7694'] | 1 | 0.15% | 84.30% | 261 | 9.63% |
274 | ['7704'] | 1 | 0.15% | 84.45% | 262 | 9.66% |
275 | ['7743'] | 1 | 0.15% | 84.60% | 263 | 9.70% |
276 | ['7764'] | 1 | 0.15% | 84.76% | 264 | 9.74% |
277 | ['7774'] | 1 | 0.15% | 84.91% | 265 | 9.77% |
278 | ['7775'] | 1 | 0.15% | 85.06% | 266 | 9.81% |
279 | ['7808'] | 1 | 0.15% | 85.21% | 267 | 9.85% |
280 | ['7815'] | 1 | 0.15% | 85.37% | 268 | 9.89% |
281 | ['7822'] | 1 | 0.15% | 85.52% | 269 | 9.92% |
282 | ['7841'] | 1 | 0.15% | 85.67% | 270 | 9.96% |
283 | ['7846'] | 1 | 0.15% | 85.82% | 271 | 10.00% |
284 | ['7864'] | 1 | 0.15% | 85.98% | 272 | 10.03% |
285 | ['7880'] | 1 | 0.15% | 86.13% | 273 | 10.07% |
286 | ['7895'] | 1 | 0.15% | 86.28% | 274 | 10.11% |
287 | ['7944'] | 1 | 0.15% | 86.43% | 275 | 10.14% |
288 | ['7946'] | 1 | 0.15% | 86.59% | 276 | 10.18% |
289 | ['7965'] | 1 | 0.15% | 86.74% | 277 | 10.22% |
290 | ['8023'] | 1 | 0.15% | 86.89% | 278 | 10.25% |
291 | ['8051'] | 1 | 0.15% | 87.04% | 279 | 10.29% |
292 | ['8111'] | 1 | 0.15% | 87.20% | 280 | 10.33% |
293 | ['8128'] | 1 | 0.15% | 87.35% | 281 | 10.37% |
294 | ['8159'] | 1 | 0.15% | 87.50% | 282 | 10.40% |
295 | ['8191'] | 1 | 0.15% | 87.65% | 283 | 10.44% |
296 | ['8238'] | 1 | 0.15% | 87.80% | 284 | 10.48% |
297 | ['8249'] | 1 | 0.15% | 87.96% | 285 | 10.51% |
298 | ['8251'] | 1 | 0.15% | 88.11% | 286 | 10.55% |
299 | ['8270'] | 1 | 0.15% | 88.26% | 287 | 10.59% |
300 | ['8286'] | 1 | 0.15% | 88.41% | 288 | 10.62% |
301 | ['8325'] | 1 | 0.15% | 88.57% | 289 | 10.66% |
302 | ['8333'] | 1 | 0.15% | 88.72% | 290 | 10.70% |
303 | ['8379'] | 1 | 0.15% | 88.87% | 291 | 10.73% |
304 | ['8425'] | 1 | 0.15% | 89.02% | 292 | 10.77% |
305 | ['8441'] | 1 | 0.15% | 89.18% | 293 | 10.81% |
306 | ['8528'] | 1 | 0.15% | 89.33% | 294 | 10.84% |
307 | ['8555'] | 1 | 0.15% | 89.48% | 295 | 10.88% |
308 | ['8557'] | 1 | 0.15% | 89.63% | 296 | 10.92% |
309 | ['8586'] | 1 | 0.15% | 89.79% | 297 | 10.96% |
310 | ['8596'] | 1 | 0.15% | 89.94% | 298 | 10.99% |
311 | ['8597'] | 1 | 0.15% | 90.09% | 299 | 11.03% |
312 | ['8607'] | 1 | 0.15% | 90.24% | 300 | 11.07% |
313 | ['8621'] | 1 | 0.15% | 90.40% | 301 | 11.10% |
314 | ['8625'] | 1 | 0.15% | 90.55% | 302 | 11.14% |
315 | ['8634'] | 1 | 0.15% | 90.70% | 303 | 11.18% |
316 | ['8658'] | 1 | 0.15% | 90.85% | 304 | 11.21% |
317 | ['8667'] | 1 | 0.15% | 91.01% | 305 | 11.25% |
318 | ['8692'] | 1 | 0.15% | 91.16% | 306 | 11.29% |
319 | ['8716'] | 1 | 0.15% | 91.31% | 307 | 11.32% |
320 | ['8753'] | 1 | 0.15% | 91.46% | 308 | 11.36% |
321 | ['8757'] | 1 | 0.15% | 91.62% | 309 | 11.40% |
322 | ['8762'] | 1 | 0.15% | 91.77% | 310 | 11.43% |
323 | ['8786'] | 1 | 0.15% | 91.92% | 311 | 11.47% |
324 | ['8788'] | 1 | 0.15% | 92.07% | 312 | 11.51% |
325 | ['8841'] | 1 | 0.15% | 92.23% | 313 | 11.55% |
326 | ['8853'] | 1 | 0.15% | 92.38% | 314 | 11.58% |
327 | ['8855'] | 1 | 0.15% | 92.53% | 315 | 11.62% |
328 | ['8872'] | 1 | 0.15% | 92.68% | 316 | 11.66% |
329 | ['8894'] | 1 | 0.15% | 92.84% | 317 | 11.69% |
330 | ['8920'] | 1 | 0.15% | 92.99% | 318 | 11.73% |
331 | ['8961'] | 1 | 0.15% | 93.14% | 319 | 11.77% |
332 | ['8970'] | 1 | 0.15% | 93.29% | 320 | 11.80% |
333 | ['8981'] | 1 | 0.15% | 93.45% | 321 | 11.84% |
334 | ['9015'] | 1 | 0.15% | 93.60% | 322 | 11.88% |
335 | ['9103'] | 1 | 0.15% | 93.75% | 323 | 11.91% |
336 | ['9138'] | 1 | 0.15% | 93.90% | 324 | 11.95% |
337 | ['9151'] | 1 | 0.15% | 94.05% | 325 | 11.99% |
338 | ['9164'] | 1 | 0.15% | 94.21% | 326 | 12.03% |
339 | ['9166'] | 1 | 0.15% | 94.36% | 327 | 12.06% |
340 | ['9210'] | 1 | 0.15% | 94.51% | 328 | 12.10% |
341 | ['9290'] | 1 | 0.15% | 94.66% | 329 | 12.14% |
342 | ['9311'] | 1 | 0.15% | 94.82% | 330 | 12.17% |
343 | ['9342'] | 1 | 0.15% | 94.97% | 331 | 12.21% |
344 | ['9444'] | 1 | 0.15% | 95.12% | 332 | 12.25% |
345 | ['9452'] | 1 | 0.15% | 95.27% | 333 | 12.28% |
346 | ['9463'] | 1 | 0.15% | 95.43% | 334 | 12.32% |
347 | ['9465'] | 1 | 0.15% | 95.58% | 335 | 12.36% |
348 | ['9476'] | 1 | 0.15% | 95.73% | 336 | 12.39% |
349 | ['9499'] | 1 | 0.15% | 95.88% | 337 | 12.43% |
350 | ['9561'] | 1 | 0.15% | 96.04% | 338 | 12.47% |
351 | ['9633'] | 1 | 0.15% | 96.19% | 339 | 12.50% |
352 | ['9692'] | 1 | 0.15% | 96.34% | 340 | 12.54% |
353 | ['9716'] | 1 | 0.15% | 96.49% | 341 | 12.58% |
354 | ['9738'] | 1 | 0.15% | 96.65% | 342 | 12.62% |
355 | ['9751'] | 1 | 0.15% | 96.80% | 343 | 12.65% |
356 | ['9776'] | 1 | 0.15% | 96.95% | 344 | 12.69% |
357 | ['9782'] | 1 | 0.15% | 97.10% | 345 | 12.73% |
358 | ['9803'] | 1 | 0.15% | 97.26% | 346 | 12.76% |
359 | ['9874'] | 1 | 0.15% | 97.41% | 347 | 12.80% |
360 | ['9954'] | 1 | 0.15% | 97.56% | 348 | 12.84% |
361 | ['5083'] | 1 | 0.15% | 97.71% | 349 | 12.87% |
362 | ['6583'] | 1 | 0.15% | 97.87% | 350 | 12.91% |
363 | ['7413'] | 1 | 0.15% | 98.02% | 351 | 12.95% |
364 | ['7415'] | 1 | 0.15% | 98.17% | 352 | 12.98% |
365 | ['7525'] | 1 | 0.15% | 98.32% | 353 | 13.02% |
366 | ['7827'] | 1 | 0.15% | 98.48% | 354 | 13.06% |
367 | ['7831'] | 1 | 0.15% | 98.63% | 355 | 13.09% |
368 | ['7971'] | 1 | 0.15% | 98.78% | 356 | 13.13% |
369 | ['8142'] | 1 | 0.15% | 98.93% | 357 | 13.17% |
370 | ['8163'] | 1 | 0.15% | 99.09% | 358 | 13.21% |
371 | ['8252'] | 1 | 0.15% | 99.24% | 359 | 13.24% |
372 | ['8371'] | 1 | 0.15% | 99.39% | 360 | 13.28% |
373 | ['9137'] | 1 | 0.15% | 99.54% | 361 | 13.32% |
374 | ['9520'] | 1 | 0.15% | 99.70% | 362 | 13.35% |
375 | ['7303'] | 1 | 0.15% | 99.85% | 363 | 13.39% |
376 | ['7664'] | 1 | 0.15% | 100.00% | 364 | 13.43% |
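
The percentage columns in the ranking above seem to follow from three quantities: the per-accession reuse count, the 656 total reuse instances, and the 2,711 GSE accessions from 2007. The sketch below recomputes them for the first four ranks. The formulas are inferred from the values shown, and the function name `summarize` is illustrative.

```python
from itertools import accumulate

total_reuses = 656      # instances of reuse across all PMC articles
total_datasets = 2711   # GSE accessions in 2007
top_counts = [24, 20, 16, 10]  # reuse counts for ranks 1-4, from the table


def summarize(counts, total_reuses, total_datasets):
    """Return share, running share, and rank percentile for ranked counts."""
    rows = []
    for rank, (count, running) in enumerate(zip(counts, accumulate(counts)), start=1):
        rows.append({
            "rank": rank,
            "pct_of_reuses": round(100 * count / total_reuses, 2),
            "running_pct": round(100 * running / total_reuses, 2),
            "rank_percentile": round(100 * rank / total_datasets, 2),
        })
    return rows


rows = summarize(top_counts, total_reuses, total_datasets)
for row in rows:
    print(row)

# Proportion of 2007 accessions with at least one reuse in PMC.
print(364 / total_datasets)
```
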
A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | accession | data_release_date | gse_accession | gds_accession | data_deposit_pmids | data_reuse_pmcid | data_reuse_pmid | this_submit_authors | this_reuse_authors | authors_name_intersection | reuse_journal | reuse_year | reuse_date_published | is_on_geos_reuse_list | attribution_excerpts | manually annotated as data reuse | manually annotated as data deposit | manually annotated as a passing mention | group | author_overlap | considered_third_party_reuse |
2 | GDS2515 | 1/1/2007 | ['6210'] | ['2515'] | [u'17141629'] | 2761245 | [u'19818710'] | ['', 'Huntgeburth', 'Barbatelli', 'Cinti', 'Lowell', 'Choi', 'Lin', 'Vianna', 'Shulman', 'Spiegelman', 'Kim', 'Krauss', 'Tzameli', 'Coppari'] | ['Kumar', 'Juan', 'Marx', 'Young', 'Sartorelli'] | [] | Mol Cell | 2009 | 10/9/2009 | 0 | (DM, differentiation medium). Genome-wide expression profiling also indicates that the primary miR-199/214 transcript is up-regulated during C2C12 cell differentiation ( Caretti et al., 2006 ) (GEO, GDS2515{{tag}}--MENTION-- record, Mm.29567, http://www.ncbi.nlm.nih.gov ). Increased miR-199 and 214 accumulation in fully confluent cells coincided with initial Ezh2 protein reduction ( Figure 1F , 100% GM) and cell diffe | 0 | 0 | 1 | NOT pmc_gds | 0 | 0 |
3 | GDS2518 | 1/12/2007 | ['6710'] | ['2518'] | [u'16858420'] | 2718700 | [u'19680446'] | ['Heubach', 'Mrowietz', 'Reischl', 'Beekman', u'Ternes', u'Sturzebecher', 'St\xc3\xbcrzebecher', u'Bauer', 'Schwenke'] | ['Abecasis', 'Krueger', 'Feng', 'Zhang', 'Sun', 'Bowcock', 'Nair', 'Begovich', 'Callis-Duffin', 'Elder', 'Goldgar', 'Schrodi', 'Stuart', 'Soltani-Arabshahi'] | [] | PLoS Genet | 2009 | 2009 Aug | 0 | the most important candidate genes at this locus. It has been observed that the transcription of C6orf10 in keratinocytes can be triggered by TNF-α (Gene Expression Omnibus dataset number: GDS1289) [39] , an important proinflammatory cytokine in the pathogenesis of psoriasis, although the function of the C6orf10 product is not known. Nevertheless, other genes cannot be exc|ceptor, NKG2D. In psoriasis, it has been shown that MICA is down-regulated in lesional skin compared with non-lesional skin (p = 0.007, Gene Expression Omnibus dataset number: GDS2518{{tag}}--REUSE--) [41] . The under-expression of the MICA protein might allow the unwanted cells to escape the cytolysis by NK or CD8+ T-cells, resulting in keratinocyte proliferation and t | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
4 | GDS2545 | 1/30/2007 | ['6919'] | ['2545'] | [u'17430594', u'15254046'] | 2781753 | [u'19736252'] | ['', 'Michalopoulos', 'Jing', 'Ma', 'Luo', 'McDonald', 'Becich', 'Finkelstein', 'Thomas', 'Bisceglia', 'Liu', 'Ren', 'Nelson', 'Liang', 'Dhir', 'Landsittel', 'Yu', 'Lyons-Weiler', 'Monzon', 'Chandran'] | ['Tsuda', 'Kayano', 'Mamitsuka', 'Takigawa', 'Shiga'] | [] | Bioinformatics | 2009 | 11/1/2009 | 1 | ng the lowest P -value for each gene pair of the 10 interactions in Table 4 . For example, for COX6C and UBA1, the gene pair of the first interaction of Table 4 , we found a switching mechanism in GDS2960_1 with the P -value of −3.9532, showing the statistical significance of this mechanism. This directly indicates that there must exist a switching mechanism in expression between these two |ank Gene pair #datasets GDS P - value #ex. #ex. Annotation from GEO class 1 class 2 1 {COX6C,UBA1} 117 GDS2960_1 −3.9532 60 41 Marfan syndrome: cultured skin fibroblasts 2 {RERE,TNFRSF1A} 284 GDS2736_25 −5.9049 19 15 Malignant fibrous histiocytoma and various soft tissue sarcomas 3 {ATP5D,ITCH} 324 GDS1875_3 −5.1235 27 24 Host cell response to HIV-1 Vpr-induced cell cycle arrest|GDS2733_1 −7.9996 17 17 Cytosine arabinoside effect on Ewing's sarcoma cell line 5 {NCSTN,HSPA5} 102 GDS2545{{tag}}--REUSE--_5 −6.4398 63 25 Metastatic prostate cancer (HG-U95A) 6 {NDUFA8,NDUFA6} 142 GDS2733_4 −4.7027 17 16 Cytosine arabinoside effect on Ewing's sarcoma cell line 7 {ALS2,SLC25A6} 108 GDS1627_2 −3.2808 16 15 Breast cancer cell lines response to chemotherapeutic drugs 8 {|P5J} 418 GDS2960_1 −3.1628 60 41 Marfan syndrome: cultured skin fibroblasts 9 {COX6C,UBA1} 117 GDS2960_1 −3.9532 60 41 Marfan syndrome: cultured skin fibroblasts 10 {NDUFA10,COX4} 232 GDS2643_9 −6.2133 13 12 Waldenstrom's macroglobulinemia: B lymphocytes and plasma cells For each gene pair of 10 interactions in Table 4 , the number of datasets obtained from GEO, the GDS which g|ng genes. 3 In each GDS, if it has more than two classes or replicated experiments, we consider all possible pairwise combinations of them. We then name generated multiple datasets from one GDS (e.g. GDS2960) those like GDS2960_1, GDS2960_2, etc. This results in that the number of datasets we used could be >36. The actual number of datasets for each gene pair is shown in Table 5 . 4 For each g | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
5 | GDS2545 | 1/30/2007 | ['6919'] | ['2545'] | [u'17430594', u'15254046'] | 2877722 | [u'20523739'] | ['', 'Michalopoulos', 'Jing', 'Ma', 'Luo', 'McDonald', 'Becich', 'Finkelstein', 'Thomas', 'Bisceglia', 'Liu', 'Ren', 'Nelson', 'Liang', 'Dhir', 'Landsittel', 'Yu', 'Lyons-Weiler', 'Monzon', 'Chandran'] | ['Geman', 'Eddy', 'Price', 'Hood'] | [] | PLoS Comput Biol | 2010 | 5/27/2010 | 0 | rom patients with different stages of prostate disease. The gene expression data, originally reported by Yu et al. [24] and publically available in the NCBI Gene Expression Omnibus (GDS2545{{tag}}--REUSE--), contains 108 human prostate samples: 18 samples of normal prostate tissue (NP) from organ donors, 65 primary prostate tumor (PT) samples, and 25 metastatic prostate tumor (MT) samples. The findin | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
6 | GDS2546 | 1/30/2007 | ['6919'] | ['2546'] | [u'17430594', u'15254046'] | 2887649 | [u'20563261'] | ['', 'Michalopoulos', 'Jing', 'Ma', 'Luo', 'McDonald', 'Becich', 'Finkelstein', 'Thomas', 'Bisceglia', 'Liu', 'Ren', 'Nelson', 'Liang', 'Dhir', 'Landsittel', 'Yu', 'Lyons-Weiler', 'Monzon', 'Chandran'] | ['Cher', 'Sheng', 'Kandagatla', 'Singareddy', 'Cai', 'Chinni', 'Kropinski'] | [] | Transl Oncol | 2010 | 6/1/2010 | 0 | nd CXCR4 Expression in Human Benign Prostate and Prostate Cancer Tissue Expression profile data sets for human benign and PC tissue were queried for ERG and CXCR4 expression using the Gene Expression Omnibus database ( http://www.ncbi.nlm.nih.gov/geo/ ). This record was deposited by Yu et al. as previously described [ 24 ]. We extracted the gene expression values for ERG and CXCR4 for benign prostate ( n|ure 1 ERG and CXCR4 were highly expressed in TMPRSS2-ERG fusion-positive cell and prostate tumor cells, and ERG binds to CXCR4 promoter. (A) Expression array data for ERG and CXCR4 were obtained from GDS2546{{tag}}--REUSE-- record from Gene Expression Omnibus database. Mann-Whitney (more ...) Figure 1 ERG and CXCR4 were highly expressed in TMPRSS2-ERG fusion-positive cell and prostate tumor cells, and ERG binds to | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
7 | GDS2546 | 1/30/2007 | ['6919'] | ['2546'] | [u'17430594', u'15254046'] | 3009682 | [u'21144054'] | ['', 'Michalopoulos', 'Jing', 'Ma', 'Luo', 'McDonald', 'Becich', 'Finkelstein', 'Thomas', 'Bisceglia', 'Liu', 'Ren', 'Nelson', 'Liang', 'Dhir', 'Landsittel', 'Yu', 'Lyons-Weiler', 'Monzon', 'Chandran'] | ['Goodglick', 'Chia', 'Knutzen', 'Zhou', 'Bagryanova', 'Liu', 'Liebeskind', 'Mah', 'Maresh', 'Horvath', 'Alavi'] | ['Liu'] | BMC Cancer | 2010 | 12/13/2010 | 0 | er Two publically available datasets were used to examine AGR2 gene expression in human prostate samples [ 31 , 32 ]. The first dataset was generated using Affymetrix U95B Array (GEO Accession number GDS2546{{tag}}--REUSE--) [ 31 ]. It included 66 prostate cancer tissues, 17 normal prostate tissues, and 25 metastatic prostate tumor samples obtained from 4 patients. The second data set was generated using a cDNA microa|imary tumor tissues, 41 matched normal prostate tissues, and 9 unmatched pelvic lymph node metastases. For all datasets we used the peer-reviewed normalization procedures described by the authors. We downloaded the normalized data from the GEO database. Prostate tissue microarray analysis Formalin-fixed, paraffin-embedded archival tumor specimens were obtained from 187 patients who underwent radical retro | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
8 | GDS2547 | 1/30/2007 | ['6919'] | ['2547'] | [u'17430594', u'15254046'] | 2988752 | [u'21044312'] | ['', 'Michalopoulos', 'Jing', 'Ma', 'Luo', 'McDonald', 'Becich', 'Finkelstein', 'Thomas', 'Bisceglia', 'Liu', 'Ren', 'Nelson', 'Liang', 'Dhir', 'Landsittel', 'Yu', 'Lyons-Weiler', 'Monzon', 'Chandran'] | ['Pettaway', 'Gorlov', 'Logothetis', 'Byun', 'Gorlova', 'Zhao', 'Maity', 'Navone', 'Troncoso', 'Sircar'] | [] | BMC Cancer | 2010 | 11/2/2010 | 0 | ensively used to classify cancers by gene-expression signatures [ 1 - 3 ]. It has also been used for predicting response to treatment [ 4 - 6 ] and prognosis [ 7 - 9 ]. Ample gene-expression data are publicly available. Two major databases are Oncomine http://www.oncomine.org and the Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/ ). Oncomine is a rapidly growing compendium of more than |10, comprised 35 datasets related to different aspects of prostate tumorigenesis. The GEO database is part of the open National Center for Biotechnology Information (NCBI) resources. GEO, the largest publicly available repository of gene-expression data [ 10 ], was established in 2000 to house and distribute those data to researchers [ 11 ]. This resource, however, is not generally used efficiently becaus|. In silico validation: primary tumors vs. distant prostate metastases The results of 2 studies comparing gene expression in primary tumors and distant metastases were recently published (GEO dataset GDS2547{{tag}}--REUSE--) [ 20 , 21 ]. We estimated the overlap between the genes expressed differently in primary tumors vs. distant metastases and the candidate genes our analysis identified. Experimental validation of t|ciated with prostate cancer development. Recently Nakagawa et al . [ 24 ] published the results of a large retrospective case-control study of systemic progression of prostate cancer (GEO dataset ID GSE10645). Those authors identified 68 genes that are significantly up-regulated or down-regulated in patients with relapse defined by prostate-specific antigen concentration and systemic progression (dete|this correlation is unlikely to explain more than 10% of the observed variation in Z scores (data not shown). Conclusions In conclusion, our multilevel meta-analysis efficiently combined almost all publicly available gene-expression data on prostate cancer and allowed identification of candidate genes associated with prostate tumor development. The results of several in silico validation tests of the |of candidate genes we have generated may be a useful resource for researchers studying the molecular mechanisms underlying prostate cancer development. Abbreviations and acronyms GEO: Gene Expression Omnibus database; GO: Gene Ontology database; TGF-β: transforming growth factor beta; CAPC: clinically advanced prostate cancer; BPH: benign prostatic hypertrophy; Competing interests The authors dec|O: mining tens of millions of expression profiles--database and tools update Nucleic Acids Res 2007 35 Database issue D760 D765 10.1093/nar/gkl887 17099226 Edgar R Domrachev M Lash AE Gene Expression Omnibus: NCBI gene expression and hybridization array data repository Nucleic Acids Res 2002 30 207 210 10.1093/nar/30.1.207 11752295 Gur-Dedeoglu B Konu O Kir S Ozturk AR Bozkurt B Ergul G Yulug IG A resamp | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
9 | GDS2604 | 3/5/2007 | ['6013'] | ['2604'] | [u'17331233'] | 2737627 | [u'19753302'] | ['Nymark', 'Korpela', 'Knuutila', 'Hollm\xc3\xa9n', 'Lahti', 'Lindholm', 'Ruosaari', 'Anttila', 'Kaski', 'Kinnula', u'Hollmen'] | ['Citro', 'Baldi', 'Vicidomini', 'Calogero', 'Menegozzo', 'Crispi', 'Mellone', 'Facciolo', 'Pierantoni', 'Fasano', 'Santini', 'Cobellis', 'Vincenzi', 'Meccariello'] | [] | PLoS One | 2009 | 9/15/2009 | 0 | AND pmc_gds | 0 | 1 | ||||
10 | GDS2609 | 3/20/2007 | ['4107'] | ['2609'] | [u'17317818'] | 2872883 | [u'20187943'] | ['Eu', '', 'Hong', 'Cheah', 'Ho'] | ['Whitfield', 'Weng', 'DeLisi', 'Hu', 'Yang', 'Hung'] | [] | Genome Biol | 2010 | 2010 | 0 | , these two are among those whose currently available cancer expression data in the GEO database have adequate sample size for statistical testing. Case study I: colon cancer dataset The dataset [GEO:GDS2609{{tag}}--REUSE--] [ 16 ] consists of 10 normal and 12 early onset colorectal cancer samples. Since the mutual influence (Equation 1) of two genes depends on the correlation between their expression levels, the TIF|gration for these two cancers. Case study II: small cell lung cancer dataset The small cell lung cancer dataset consists of 19 normal and 15 primary small cell lung cancer samples collected from [GEO:GSE1037] [ 35 ]. The ten genes with highest TIF scores among 201 pathways are listed in Table 3 . These genes are associated with cell cycle (growth and division), apoptosis, immune response and metabol | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
11 | GDS2609 | 3/20/2007 | ['4107'] | ['2609'] | [u'17317818'] | 3008745 | [u'21203531'] | ['Eu', '', 'Hong', 'Cheah', 'Ho'] | ['Koenig', 'Kerick', 'Fischer', 'Barmeyer', 'Bertram', 'Garshasbi', 'Trappe', 'Boerno', 'Lappe', 'Schweiger', 'Lehrach', 'Herrmann', 'Roehr', 'Timmermann', 'Seemann', 'Wunderlich', 'Isau', 'Kuss', 'Zatloukal', 'Werber'] | [] | PLoS One | 2010 | 12/22/2010 | 0 | sequencing we performed 13 Genome Sequencer FLX runs, which produced over 558 million bases and 1.43 million reads per run. Reads were aligned to the human reference genome, NCBI build 36 ( http://hgdownload.cse.ucsc.edu/goldenPath/hg18/ ), using GS Reference Mapper Version 2.0.0.12 (Roche). The best matches in the genome were used as the location for the reads with multiple matches. Only unique reads wi|ele frequencies or average heterozygosity >0.01. Variants were subjected to many comparisons with external data sources. Most data sources are integrated in the UCSC genome browser ( http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/ ) or were derived from websites like the 1000 Genomes Project ( ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/2009_04/ ) the gene ontology data ( http://arch|ved if their conservation score was greater or equal 2.0 (0.975 quantil of all conservation scores). For gene expression healthy control samples from http://www.ncbi.nlm.nih.gov/sites/GDSbrowser?acc=GDS2609{{tag}}--REUSE-- were used. We calculated genewise mean expression values across all samples and used the first quartile as threshold to determine gene expression. Pathway analyses were performed with the ingenuit | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
12 | GDS2614 | 3/29/2007 | ['7333'] | ['2614'] | [u'17397913'] | 2646459 | [u'19066325'] | ['Ransom', 'Srivastava', 'Schwartz', 'Tsuchihashi', 'Muth', 'Li', 'Zhao', 'McManus', 'Vedantham', 'von'] | ['Lakoski', 'Vandenbergh', 'Lionikas', 'McClearn', 'Spicer', 'Blizard', 'Vasilopoulos', 'Stout', 'Vogler', 'Gerhard', 'Klein', 'Mack', 'Griffith', 'Larsson'] | [] | Physiol Genomics | 2009 | 2/2/2009 | 1 | istinct in the B6 strain (Supplemental Fig. S1). 1 Variants of the following genes were predicted (see materials and methods ) to have probably damaging effects: E130016E03Rik, expressed in heart (GDS2614{{tag}}), embryonic kidney (GDS1583), smooth muscle cells (GDS799); B230312A22Rik, expressed in embryonic kidney (GDS1583), heart (GDS2304), aortic smooth muscle cells (GDS2704); Nipsnap3a , expressed in |heart (GDS1228), aortic smooth muscle cells (GDS2704), kidney (GDS1583). The following variants are predicted to be possibly damaging: Tln1 , expressed in heart (GDS2614{{tag}}, GDS1228), embryonic kidney (GDS1748), aortic smooth muscle (GDS2704); Mdn1 , expressed in heart (GDS1228), embryonic kidney (GDS1583), smooth muscle cells (GDS799); Ltb4dh , expressed in heart (GDS1080, GDS2727), embryonic kidney ( | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
13 | GDS2617 | 3/10/2007 | ['6883'] | ['2617'] | [u'17229949'] | 2537559 | [u'18793472'] | ['Wang', 'Clarke', 'Dalerba', 'Lewicki', 'Liu', 'Gurney', 'Sherlock', 'Chen', 'Shedden', 'Hoey'] | ['Weil', 'Ptitsyn', 'Thamm'] | [] | BMC Bioinformatics | 2008 | 8/12/2008 | 1 | variation over samples, comparing max/min and max - min with predefined values and excluding genes not obeying both conditions. The resulting data are available at . Colorectal cancer data sets The GDS756 dataset provided insight the progression of cancer from primary tumor growth to metastasis by comparison of gene expression in SW480, a primary tumor colon cancer cell line, to that in SW620, an iso|ed out for each cell line and six hybridized arrays obtained. Raw data were analyzed using two microarray analysis software packages, dChip (13) and R-Robust Microarray Analysis (R-RMA) (14). We have downloaded and used these data sets from GEO (GDS756 and GDS1780). Each data set contains 22283 features (probesets) and 6 columns (samples) representing two contrast classes, each with three replicate experi | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
14 | GDS2617 | 3/10/2007 | ['6883'] | ['2617'] | [u'17229949'] | 2582597 | [u'18842630'] | ['Wang', 'Clarke', 'Dalerba', 'Lewicki', 'Liu', 'Gurney', 'Sherlock', 'Chen', 'Shedden', 'Hoey'] | ['Ding', 'Zheng', 'Zhu', 'Wang', 'Hao', 'Tu', 'Li', 'Liu', 'Fan', 'Dong', 'Chen', 'Jiang', 'Thiesen'] | ['Chen', 'Liu', 'Wang'] | Nucleic Acids Res | 2008 | 2008 Nov | 1 | and all- trans -retinoic acid (ATRA) on acute myeloblastic leukemia cells, OCI/AML2. A total of four microarray assays were done in their experiments (data can be downloaded from the gene expression omnibus by ID GDS1215), one array was treated with VPA and another with vehicle. These two were analyzed using our method. After a similarity search, the top 10 chemicals with highest scores were presented (| the effect of hypoxia on ‘gene expression’ in MCF7 cell line. Six microarray assays in their experiments (three replicates for hypoxia treatment and normoxia treatment, respectively, GDS2758) were analyzed using our method. Search results are presented ( Table 2 ). All top 10 agents show fully positively correlation with the query, and most of them (8 of 10) are reported to have a tigh|therapeutics. This case is taken from a work ( 28 ) that analyzed expression changes in breast cancer cells having high tumorigenic capacity. Nine microarray assays (three normal and six tumorigenic, GDS2617{{tag}}--REUSE--) were analyzed using our method. Top 10 hits are presented ( Table 3 ). Trichostatin A is histone deacetylase inhibitor, which has long been investigated as a potential antitumor agent against brea|election. Global similarity search using only a small fraction of genes may cause insufficient information usage, which may lead to instabilities of search result. Here we took the data already used (GDS1215: VPA treatment versus vehicle) to illustrate the problems that may arise when using Connectivity Map improperly. 
Table 4 shows both chemicals appears in top 10 and their ranks are highly diverse | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
15 | GDS2622 | 3/1/2007 | ['6784'] | ['2622'] | [u'17322878'] | 2992512 | [u'21073694'] | ['Segal', 'Rechavi', 'Shay', 'Mills', 'Tarcic', 'Lu', 'Lahad', 'Vaisman', u'Ami', u'Yosef', 'Zhang', 'Amariglio', u'Eytan', 'Jacob-Hirsch', 'Domany', 'Citri', 'Yarden', 'Alon', u'Ido', 'Amit', 'Katz', 'Siwak', u'Tal'] | ['Komurov', 'Ram'] | [] | BMC Syst Biol | 2010 | 11/12/2010 | 0 | at the distribution of well-studied proteins across the EV spectrum is relatively uniform so as to allow for the detection of statistically significant patterns. Methods Datasets The ExpO dataset was downloaded from the web site for Expression Project for Oncology ( http://www.intgen.org/expo/ ). Each column in the final dataset of 2158 samples was first normalized by quantile normalization, and then each|nd log2 transformed. EV values were determined as statistical variance value of a gene across all the samples in the normalized dataset. The CK compendium was derived from datasets in Gene Expression Omnibus: GDS649 (IL1 treatment of HUVEC cells), GDS1290 (TGF-beta treatment of Th1 and Th2 cells), GDS1249 (arachidonic acid treatment of dendritic cells), GDS2516 (interferon treatment of endothelial and fi|(retinoic acid treatment of sebocyte cells), GDS1926 (leukotriene and thrombin treatment of endothelial cells), GDS2626 (EGF and HRG treatment of MCF7 cells), GDS2422 (FGF2 treatment of fibroblasts), GDS2484 (TNF-alpha treatment of endothelial cells), GDS2622{{tag}}--REUSE-- (EGF treatment of MCF10A cells), GDS3217 (estradiol treatment of MCF7), GDS2090 (sphingosine treatment of glioblastoma cell line), GDS855 (TGF-be | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
16 | GDS2623 | 3/1/2007 | ['6783'] | ['2623'] | [u'17322878'] | 2952859 | [u'20484370'] | ['Segal', 'Rechavi', 'Shay', 'Mills', 'Tarcic', 'Lu', 'Lahad', 'Vaisman', u'Ami', u'Yosef', 'Zhang', 'Amariglio', u'Eytan', 'Jacob-Hirsch', 'Domany', 'Citri', 'Yarden', 'Alon', u'Ido', 'Amit', 'Katz', 'Siwak', u'Tal'] | ['Gijsbers', 'De', 'Ceulemans', 'Bartholomeeusen', 'Debyser'] | [] | Nucleic Acids Res | 2010 | 2010 Oct | 0 | ther analysis. Data analysis The RefSeq data set and all Encyclopedia of DNA Elements (ENCODE) data sets were obtained from the UCSC website ( http://genome.ucsc.edu/ ). The HIV-1 integration set was downloaded from the Bushman Lab website ( http://microb230.med.upenn.edu/ ) ( 28 ). All data analysis was performed with in-house-written Python scripts ( http://www.python.org/ ). To determine the distributi| inside one TU and directly upstream of a second TU, the weight of a base was assigned to the former TU since LEDGF/p75 islands do not overlap with TSSs. Correlation with transcriptional activity Two publicly available HG-U133A Affymetrix HeLa cell line expression array data sets (GSM156764 from data set GSE9750 and GSM156764 from data set GDS2623{{tag}}--REUSE--) were obtained from the NCBI GEO data set browser ( www.nc | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
17 | GDS2635 | 3/16/2007 | ['5764'] | ['2635'] | [u'17389037'] | 2687718 | [u'19243594'] | ['Bouchal', 'Skarda', 'Srovnal', 'Klein', 'Wei', 'Murray', 'Hajduch', 'Baumforth', 'Kolar', 'Ehrmann', 'Fridman', 'Dziechciarkova', 'Turashvili'] | ['Eiseler', 'Goodison', 'Yan', 'D\xc3\xb6ppler', 'Storz'] | [] | Breast Cancer Res | 2009 | 2009 | 0 | tigation of other publicly available microarray datasets on the NCBI Gene Expression Omnibus (GEO) showed that PRKD1 is detected at appreciable levels in normal lobular and ductal breast cells [GEO:GDS2635{{tag}}--MENTION--] [ 45 ], in atypical hyperplasia [GEO:GDS1250] [ 46 ] and in the cancerous lesions invasive ductal and lobular carcinomas [GEO:GDS2635{{tag}}--REUSE--] [ 45 ], suggesting that PKD1 expression is indeed decreased w | 1 | 0 | 1 | NOT pmc_gds | 0 | 1 |
18 | GDS2635 | 3/16/2007 | ['5764'] | ['2635'] | [u'17389037'] | 2631593 | [u'19116033'] | ['Bouchal', 'Skarda', 'Srovnal', 'Klein', 'Wei', 'Murray', 'Hajduch', 'Baumforth', 'Kolar', 'Ehrmann', 'Fridman', 'Dziechciarkova', 'Turashvili'] | ['Ergul', 'Yulug', 'Gur-Dedeoglu', 'Kir', 'Bozkurt', 'Ozturk', 'Konu'] | [] | BMC Cancer | 2008 | 12/30/2008 | 1 | used as an estimate of the measure of expression. Data retrieval and analysis for validation studies The ".cel files" of the three publicly available independent microarray gene expression data sets, GDS2635{{tag}}--REUSE-- [ 5 ], GDS2250 [ 7 ] and GDS1329 [ 4 ], were downloaded from GEO [ 28 ] and processed by the BRB-ARRAYTOOLS [ 26 ]. All three datasets were obtained using the Affymetrix HGU133A or HGU133 Plus 2.0 |lation to their normal ductal and lobular cells (n = 10). The authors identified multiple genes differentially expressed in comparisons between ductal and lobular tumor and normal cells [ 5 ]. In the GDS2250 study, a gene expression array-based analysis of three breast tumor subtypes, i.e., sporadic basal-like cancer (BLC), BRCA-associated breast cancer, and non-BLC, was performed. They used 47 human b|s for the meta-gene lists, DN (Ductal/Normal) and LN (Lobular/Normal). Study GEO ID Class Meta gene-list DN LN N T Accuracy (%) Number of genes r DN Accuracy (%) Number of genes r LN Turashvili [ 5 ] GDS2635{{tag}}--REUSE-- 10 10 93 57 0.85 80 49 0.87 Richardson [ 7 ] GDS2250 7 40 100 145 0.86 100 96 0.78 Karnoub [ 8 ] GSE8977 15 7 95.5 109 0.72 95.5 89 0.81 Normal (N) and tumor (T) sample sizes, accuracy of predictio|tive gene set differentially expressed between tumor and normal cells (Additional file 9 ). Twenty-eight genes from the DN or LN meta-gene lists intersected with the three other microarray datasets (GDS2635{{tag}}--REUSE--, GDS2250, and GD1329); 17 of which were differentially expressed between basal vs. non-basal and/or ER status (Additional file 9 ). 
For example, ADAMTS1 , ATF3 , IGFBP6 , PRNP , EGFR , FN1 ,|< 0.05; Additional file 9 ). Validation of ductal vs. lobular meta-gene list Comparison of fold-change values of the DL meta-gene list consisting of 65 genes with that of the Turashvili's DL list (GDS2635{{tag}}--REUSE--) resulted in a high degree of correlation (r = 0.53; p < 0.001), suggesting that the direction and magnitude of expression change between the IDC and ILC samples were largely consistent bet | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
19 | GDS2643 | 3/1/2007 | ['6691'] | ['2643'] | [u'17252022'] | 2715883 | [u'19657382'] | [u'Hern\xe1ndez', u'S\xe1nchez', 'Ocio', 'Arcos', 'San', 'Guti\xc3\xa9rrez', 'Fermi\xc3\xb1\xc3\xa1n', 'S\xc3\xa1nchez', 'de', 'Maiso', u'De', u'Fermi\xf1an', 'Hern\xc3\xa1ndez', u'Guti\xe9rrez', 'Delgado'] | ['Agarwal', 'Hu'] | [] | PLoS One | 2009 | 8/6/2009 | 1 | fication. Profile1 or Signature * Profile2 * MeSH term for profile1/signature MeSH term for profile2 Level of matched disease in MeSH tree # Enrichment score (correlation coefficient) GDS1956.0.3 GDS1956.0.6 Amyotrophic Lateral Sclerosis Muscular Dystrophy, Emery-Dreifuss 2 1.29 GDS2118.0.1 GDS2118.0.2 Anemia, Refractory Anemia, Sideroblastic 4 1.58 GDS2118.0.3 GDS2397.0.1 Anemia, Refra|321.0.2 Barrett Esophagus Adenocarcinoma 0 1.59 GDS2190.0.1 GDS810.0.2 Bipolar Disorder Alzheimer Disease 1 1.06 GDS2250.0.3 GDS2418.0.1 Carcinoma, Basal Cell Cervical Intraepithelial Neoplasia 4 1.2 GDS651.0.1 GDS651.0.2 Cardiomyopathy, Dilated Cardiomyopathy, Restrictive 3 1.42 GDS1989.1.7.0 GDS2418.0.1 Cervical Intraepithelial Neoplasia Lymphatic Metastasis 1 1.01 GDS1615.0.1 GDS1615.0.2 Colitis, Ul| (0.86) GDS2200.0.1 GDS2200.0.2 Keratosis Carcinoma, Squamous Cell 0 1.5 GDS1989.1.6.0 GDS1989.1.7.0 Lymphatic Metastasis Melanoma 1 1.49 GDS1989.1.2.0 GDS1989.1.7.0 Lymphatic Metastasis Nevus 1 1.17 GDS2643{{tag}}--REUSE--.0.1.5 GDS2643{{tag}}--REUSE--.0.3.5 Multiple Myeloma Waldenstrom Macroglobulinemia 4 1.08 GDS1956.0.2 GDS1956.0.8 Muscular Dystrophy, Duchenne Dermatomyositis 3 (0.73) GDS1956.0.6 GDS1956.0.7 Muscular Dystrophy, E|6.0.5 Myopathy, Central Core Muscular Dystrophy, Facioscapulohumeral 3 (0.57) GDS1375.0.1 GDS1375.0.2 Nevus Melanoma 3 1.5 GDS1746.2.3.0 GDS1746.2.4.0 Prostatic Hyperplasia Prostatic Neoplasms 3 1.09 GDS1439.0.1 GDS1439.0.2 Prostatic Neoplasms Neoplasm Metastasis 1 1.09 GDS1282.0.1 GDS1282.0.2 Wilms Tumor Sarcoma, Clear Cell 2 1.1 Results from both the enrichment score and correlation coefficient metho | 
1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
20 | GDS2643 | 3/1/2007 | ['6691'] | ['2643'] | [u'17252022'] | 2781753 | [u'19736252'] | [u'Hern\xe1ndez', u'S\xe1nchez', 'Ocio', 'Arcos', 'San', 'Guti\xc3\xa9rrez', 'Fermi\xc3\xb1\xc3\xa1n', 'S\xc3\xa1nchez', 'de', 'Maiso', u'De', u'Fermi\xf1an', 'Hern\xc3\xa1ndez', u'Guti\xe9rrez', 'Delgado'] | ['Tsuda', 'Kayano', 'Mamitsuka', 'Takigawa', 'Shiga'] | [] | Bioinformatics | 2009 | 11/1/2009 | 1 | ng the lowest P -value for each gene pair of the 10 interactions in Table 4 . For example, for COX6C and UBA1, the gene pair of the first interaction of Table 4 , we found a switching mechanism in GDS2960_1 with the P -value of −3.9532, showing the statistical significance of this mechanism. This directly indicates that there must exist a switching mechanism in expression between these two |ank Gene pair #datasets GDS P - value #ex. #ex. Annotation from GEO class 1 class 2 1 {COX6C,UBA1} 117 GDS2960_1 −3.9532 60 41 Marfan syndrome: cultured skin fibroblasts 2 {RERE,TNFRSF1A} 284 GDS2736_25 −5.9049 19 15 Malignant fibrous histiocytoma and various soft tissue sarcomas 3 {ATP5D,ITCH} 324 GDS1875_3 −5.1235 27 24 Host cell response to HIV-1 Vpr-induced cell cycle arrest|GDS2733_1 −7.9996 17 17 Cytosine arabinoside effect on Ewing's sarcoma cell line 5 {NCSTN,HSPA5} 102 GDS2545_5 −6.4398 63 25 Metastatic prostate cancer (HG-U95A) 6 {NDUFA8,NDUFA6} 142 GDS2733_4 −4.7027 17 16 Cytosine arabinoside effect on Ewing's sarcoma cell line 7 {ALS2,SLC25A6} 108 GDS1627_2 −3.2808 16 15 Breast cancer cell lines response to chemotherapeutic drugs 8 {|P5J} 418 GDS2960_1 −3.1628 60 41 Marfan syndrome: cultured skin fibroblasts 9 {COX6C,UBA1} 117 GDS2960_1 −3.9532 60 41 Marfan syndrome: cultured skin fibroblasts 10 {NDUFA10,COX4} 232 GDS2643{{tag}}--REUSE--_9 −6.2133 13 12 Waldenstrom's macroglobulinemia: B lymphocytes and plasma cells For each gene pair of 10 interactions in Table 4 , the number of datasets obtained from GEO, the GDS which g|ng genes. 
3 In each GDS, if it has more than two classes or replicated experiments, we consider all possible pairwise combinations of them. We then name generated multiple datasets from one GDS (e.g. GDS2960) those like GDS2960_1, GDS2960_2, etc. This results in that the number of datasets we used could be >36. The actual number of datasets for each gene pair is shown in Table 5 . 4 For each g | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
21 | GDS2643 | 3/1/2007 | ['6691'] | ['2643'] | [u'17252022'] | 2967746 | [u'21044363'] | [u'Hern\xe1ndez', u'S\xe1nchez', 'Ocio', 'Arcos', 'San', 'Guti\xc3\xa9rrez', 'Fermi\xc3\xb1\xc3\xa1n', 'S\xc3\xa1nchez', 'de', 'Maiso', u'De', u'Fermi\xf1an', 'Hern\xc3\xa1ndez', u'Guti\xe9rrez', 'Delgado'] | ['Ding', 'Huang', 'Zhang', 'Jin', 'Payne', 'Keen-Circle', 'Ozer', 'Borlawsky', 'Xiang'] | [] | BMC Bioinformatics | 2010 | 10/28/2010 | 0 | we focused on the co-expression network involving ZAP70, a well characterized biomarker for CLL. We selected 23 microarray datasets corresponding to multiple types of cancer from the Gene Expression Omnibus (GEO) and used the frequent network mining algorithm CODENSE to identify highly connected gene co-expression networks spanning the entire genome, then evaluated the genes in the co-expression network|a similar approach by studying genes co-expressed with ZAP70 in multiple datasets. Specifically, we selected 23 microarray datasets corresponding to multiple types of cancers from the Gene Expression Omnibus (GEO) and used the CODENSE algorithm to identify highly connected gene co-expression network spanning the entire genome. We then narrowed down the gene list in the co-expression networks in which ZAP|e top 10 enriched biological functions of Network 17 genes using IPA. Identify genes in the ZAP70 co-expression network with differential expression levels between different IgV H mutation groups in GDS1454 dataset Among these 51 genes, we further selected genes whose expression levels can predict IgV H mutation status using the three steps outlined in the Method section. Table 1 summarizes the gen| same group. 
It is worth noting that although our selection criteria does not include the more conservative multiple t-test compensation methods such as Bonferroni test, out of the 12651 probesets in GDS1454, only 190 satisfied our criteria with 122 up-regulated and 68 down-regulated, which constitute a reasonable set of genes for further screening. In addition, in the GeneCards database, we identified|recently been identified as a potential CLL prognostic biomarker in another experimental study[ 13 ]. Validate the prognostic capability of the identified biomarkers using new CLL microarray dataset (GSE10138) Figure 4 shows the Kaplan-Meier curves of patient time-to-treatment (TTT) from microarray data (GSE10138) of 61 CLL patients, using ZAP70 and all above identified biomarker candidates as featur|05, figure not shown). Interestingly, with or without KLRK1, the patient grouping results and p-values stay the same. Figure 4 The Kaplan-Meier curves of the two groups of CLL patients in the dataset GSE10138 using unsupervised K-mean clustering. The biomarkers used to generate the survival curves are: ZAP70, LAG3, IL2RB, CD247, CD8A and KLRK1. Discussion As shown in Figure 5 , except for KLRK1, all |g a prognostic biomarker for CLL. CD8A and CD247 : Both are T-cell surface antigens, but expression of CD8A on B-cells has been reported in CLL patients [ 18 , 19 ]. Since the samples for the data in GDS1454 are generated from mononuclear cells including both T-cells and B-cells, it is not clear what the origin of these molecules is. Regardless, they demonstrate comparable capacity in predicting IgV H |information. Methods Data selection We initiated this project by querying the GEO database using the term "chronic lymphocytic leukemia" [ 21 ]. Five GDS dataset results were returned from the query: GDS2676, GDS2643{{tag}}--REUSE--, GDS2501, GDS1454, and GDS1388. 
We filtered these results to identify datasets comparing patients in different groups; yielding the GDS1454 and GDS1388 data sets. GDS1454 is particularly i| it contains data obtained from the mononuclear cells of 111 subjects (11 normal subjects, 49 CLL patients without IgV H mutation, and 51 CLL patients with IgV H mutations). These GDS datasets were downloaded for analysis. In addition, a recently available CLL microarray dataset GSE10138 containing 68 patients was used to validate the biomarkers identified in the paper. Among them, the clinical informat|ctivity ratios r > 0.4 (i.e., given a co-expression network with K nodes and L edges, r = L /( K ( K-1 )/ 2 )) were selected for further analysis. Test selected genes on a CLL dataset (GDS1454) using supervised methods For the genes in the co-expression networks that included ZAP70, we further selected a subset of genes as potential prognosis markers for CLL by identifying genes whose ex|ur approach to doing so includes three steps (also see Figure 1 ): 1. We compared their expression levels between the 49 patients without IgV H mutation and the 51 patients with IgV H mutations in GDS1454 and selected genes which demonstrated significant differential expression between the two groups. 2. The genes selected in step 1 were further tested for their capability of predicting IgV H mutat| It allows us to select a subset of genes that can effectively distinguish the two groups of subjects (IgV H unmutated vs. IgV H mutated). Cross-validate the prognostic biomarkers with CLL dataset (GSE10138) Unsupervised K-mean clustering (K=2) was performed 100 times (to ensure convergence and avoid local optimal results) on CLL microarray dataset GSE10138 using the expression levels of ZAP70, IL2RB | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
22 | GDS2643 | 3/1/2007 | ['6691'] | ['2643'] | [u'17252022'] | 2621253 | [u'19139413'] | [u'Hern\xe1ndez', u'S\xe1nchez', 'Ocio', 'Arcos', 'San', 'Guti\xc3\xa9rrez', 'Fermi\xc3\xb1\xc3\xa1n', 'S\xc3\xa1nchez', 'de', 'Maiso', u'De', u'Fermi\xf1an', 'Hern\xc3\xa1ndez', u'Guti\xe9rrez', 'Delgado'] | ['Zhong', 'Wu', 'Pfeifer', 'Rauch', 'Riggs'] | [] | Proc Natl Acad Sci U S A | 2009 | 1/20/2009 | 1 | AND pmc_gds | 0 | 1 | ||||
23 | GDS2649 | 2/23/2007 | ['6740'] | ['2649'] | [u'17251300'] | 2872390 | [u'20360561'] | ['Kovacs', 'Loutfy', 'Der', 'Halpenny', 'Wilkins', u'Loufty', 'Yang', 'Heisler', u'Covacs', 'Hyrcza', 'Ostrowski'] | ['Liu', 'Zhou', 'Huang'] | [] | Proc Natl Acad Sci U S A | 2010 | 4/13/2010 | 0 | Abstract article-meta The rapid accumulation of gene expression data has offered unprecedented opportunities to study human diseases. The National Center for Biotechnology Information Gene Expression Omnibus is currently the largest database that systematically documents the genome-wide molecular basis of diseases. However, thus far, this resource has been far from fully utilized. This paper describes th|ferences  The rapid accumulation of high-throughput genomic data offers an unprecedented opportunity to study human diseases. The National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) ( 1 ) with more than 330,000 gene expression profiles and an annual growth rate of 150%, is currently the largest database of its kind. The GEO systematically documents the molecular basis of m|rmance after Stage II refinement. ( B ) An example illustrating the error correction by the Stage II refinement. The query profile studies uterine leiomyomas obtained from fibroid afflicted patients (GDS484). The profile is annotated with four concepts by UMLS text mapping: Connective/Soft Tissue Neoplasm, Muscle tissue neoplasm, fibroid tumor, and uterine fibroids. The Stage I diagnosis predicted four| prediction is later corrected by Stage II refinement. ( C ) The figure presents the 110 disease classes and their hierarchical relationships. The red nodes represent diagnosed disease concepts for GDS563: (1) Nervous system disorder (2) Neuromuscular diseases (3) Myopathy (4) Musculoskeletal diseases (5) Congenital, Hereditary, and Neonatal diseases and abnormalities (CHNDA) (6) Genetic diseases, in| prediction performance decreases with the data reduction. Table 1. 
Prediction result of a subset of prevalent diseases We further exemplify the performance of our approach using the NCBI GEO dataset GDS563. This dataset was produced to identify modifying factors and pathogenic pathways involved in Duchenne Muscular Dystrophy (DMD). It consists of 24 microarrays from two subsets: 12 from DMD patients a|mance is shown in SI Text . A closer examination of the results shows further interesting features of our method. One example comes from the result for a query profiling the T-cells of HIV patients (GDS2649{{tag}}). Even though HIV is not included in the 110 disease classes of our diagnosis database due to the lack of sufficient training data, we obtain the relevant concept RNA virus infection that can descr|.org/cgi/content/full/0912043107/DCSupplemental . References 1. Edgar R, Domrachev M, Lash AE. Gene expression omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002; 30 :207–210. [ PMC free article ] [ PubMed ] 2. Horton PB, Kiseleva L, Fujibuchi W. RaPiDS: an algorith|ssion: directed search of large microarray compendia. Bioinformatics. 2007; 23 (20):2692–2699. [ PubMed ] 5. Zhu Y, et al. GEOmetadb: powerful alternative search engine for the Gene Expression Omnibus. Bioinformatics. 2008; 24 (23):2798–2800. [ PMC free article ] [ PubMed ] 6. Shah NH, et al. Ontology-driven indexing of public datasets for translational bioinformatics. BMC Bioinformatics|e expression: directed search of large microarray compendia. Bioinformatics. 2007 Oct 15; 23(20):2692-9. [Bioinformatics. 2007] GEOmetadb: powerful alternative search engine for the Gene Expression Omnibus. Bioinformatics. 2008 Dec 1; 24(23):2798-800. [Bioinformatics. 2008] Ontology-driven indexing of public datasets for translational bioinformatics. BMC Bioinformatics. 2009 Feb 5; 10 Suppl 2():S1. 
| 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
24 | GDS2649 | 2/23/2007 | ['6740'] | ['2649'] | [u'17251300'] | 2688935 | [u'19393085'] | ['Kovacs', 'Loutfy', 'Der', 'Halpenny', 'Wilkins', u'Loufty', 'Yang', 'Heisler', u'Covacs', 'Hyrcza', 'Ostrowski'] | ['Gohlke', 'Thomas', 'Parham', 'Portier', 'Stopper'] | [] | Genome Biol | 2009 | 2009 | 0 | AND pmc_gds | 0 | 1 | ||||
25 | GDS2649 | 2/23/2007 | ['6740'] | ['2649'] | [u'17251300'] | 2837949 | [u'19548271'] | ['Kovacs', 'Loutfy', 'Der', 'Halpenny', 'Wilkins', u'Loufty', 'Yang', 'Heisler', u'Covacs', 'Hyrcza', 'Ostrowski'] | ['Hudig', 'Lowe', 'Tamang', 'Alves', 'Elliott', 'Leong', 'Marshall', 'Redelman'] | [] | Cell Biochem Funct | 2009 | 2009 Jul | 0 | e detection threshold, the relative amount of colipase mRNA in CTLs was less than 1 × 10 −6 that of the pancreas. However, a search of the NCBI’s GEO Profiles Gene Expression Omnibus database indicated that under certain conditions T lymphocytes will express colipase. (See Geo Profiles: GDS993, GDS2649{{tag}}--MENTION-- and GDS1336.) We wondered if an unknown factor associated with the exception l | 0 | 0 | 1 | NOT pmc_gds | 0 | 0 |
26 | GDS2654 | 1/24/2007 | ['6238'] | ['2654'] | [u'15960800'] | 2629779 | [u'19094235'] | ['Carter', 'Fuchs', 'Swaroop', 'Greenhall', 'Yoshida', 'Helton', 'Barlow', 'Lockhart'] | ['Chen', 'Thosar', 'Mallelwar', 'Butte', 'Venkatasubrahmanyam'] | [] | BMC Bioinformatics | 2008 | 12/18/2008 | 1 | 1,515 GEO data sets (GDS) spanning 231 microarray platforms (GPL) and 42 species with the latest probe-gene annotations. We performed all group versus group comparisons within each GDS. For example, GDS2654{{tag}}--REUSE--, a study of age-related neurological senescence in mice, was annotated with 4 experimental variables, including disease state, strain, tissue, and age. There were 2 groups in the disease state, res|≤ 0.05, fold > 2) (Additional File 1 , [ 13 ]). The most significant change in Nanog expression was 1000-fold upregulation in ES cells compared to hematopoietic stem cells (q = 0.006, GDS2718). Within the top 15 studies where Nanog was up-regulated by more than 25 fold, all except two tissue comparisons were studies of preimplantation embryonic developments in different strains of mice.|ls from male with consistent and severe teratozoospermia, a condition in which less than 4 percent of sperm cells are morphologically normal, compared to controls (q < 0.0005, rank = 0.5%ile, GDS2697). It was also significantly downregulated in the peripheral blood mononuclear cells (PBMC) from malaria infected patients compared to uninfected controls (fold = 4.5, rank = 3.3%ile, q < 0.|ig. 1 , [ 15 ]). It was 13.3-fold downregulated in primary malignant melanoma samples (q < 0.0005) and 2.2-fold downregulated in benign skin nevi (q = 0.004) compared to normal skin samples (GDS1375). It was also significantly downregulated in metastatic prostate cancer tumors compared to benign tumors (fold = 3.6, q = 0.022) and clinically localized tumors (fold = 3.5, q = 0.024, GDS1439). It|xpression was significantly affected by certain drugs (Additional File 2 , [ 19 ]). 
For example, it was 6.4-fold upregulated in non-small cell lung cancer after treatment with gemcitabine (q = 0.03, GDS2777). It was also 2.9-fold downregulated in PBMC of malaria infected patients after treatment with chloroquine (q = 0.003, GDS2362). Its expression in ES cells was downregulated after treatment with re|0, q = 0.006), Oct4 (fold = 333, q = 0.006), Sox2 (fold = 1000, q = 0.006) and LIN28 (fold = 143, q = 0.006) were all significantly upregulated in ES cells compared to hematopoietic stem cells (GDS2718). This confirmed their roles in ES cell development. An age comparison showed that Nanog (fold = 2.5, q = 0.006), Oct4 (fold = 3.2, q = 0.036), Sox2 (fold = 9.7, q = 0.013) and LIN28 (fold | (fold = 1.14, q = 0.01), Oct4 (fold = 1.1, q = 0.029), Sox2 (fold = 1.1, q = 0.006) and LIN28 (fold = 1.1, q = 0.024) were all downregulated in stearoyl-CoA desaturase 1 (Scd1) deficient mice (GDS1517). Considering they were only 1.2-fold downregulated after the knockout of Oct4 in mice (GDS1824), these changes are still significant. This finding suggests that Scd1 might be an upstream regulat|.8, q < 0.0005), Sox2 (fold = 1.5, q = 0.018) and LIN28 (fold = 2.3, q = 0.033) were all significantly downregulated in primary malignant melanoma samples compared to normal skin samples (GDS1375). To further investigate the role of ES cell factors in diseases, we performed another multiple gene search on Nanog , Oct4 and Sox2 and found that they were all significantly upregulated in|< 0.0005), Oct4 (fold = 2.3, q < 0.0005) and Sox2 (fold = 3.0, q < 0.0005) were all significantly downregulated in the PBMC from malaria infected patients compared to controls (GDS2362). The result indicates that key regulators of pluripotency are down-regulated in the response to malarial infection. Figure 2 Comparisons resulting differential expression of Nanog , Oct4 and S | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
27 | GDS2655 | 3/26/2007 | ['6236'] | ['2655'] | [u'17405831'] | 2875035 | [u'20097653'] | ['Lee', 'Josleyn', 'Miller', 'Gherman', 'Cam', 'Danner', 'Goh'] | ['Ruppin', 'Sharan', 'Shlomi', 'Tuller', 'Waldman'] | [] | Nucleic Acids Res | 2010 | 2010 May | 0 | e). GE data All expression data was downloaded from Gene Expression Omnibus ( 34 ) ( http://www.ncbi.nlm.nih.gov/geo/ ). Human tissues (including fetal tissues): we used the GE of Su et al. ( 35 ) (GDS596). As the original data set is redundant (i.e. it includes similar tissues; for example, more than 20 of the tissues are from different parts of the brain) we focused our analysis on 30 (out of 79) n|ssues ( Supplementary Table S2 ). Other GE sets: fetal and adult circulating blood reticulocytes (GDS2655{{tag}}--REUSE--), Mouse tissues (GDS592), Mouse fetal and adult liver (GSE13149), Mouse embryonic stem cells (GDS2666), Yeast (GDS772, wild type), Chimpanzee (GSE7540), Rat (GDS589, three strains), E. coli (GSE6836), D. melanogaster (GSE7763) and C. elegans (GSE8004). We averaged technical repeats and probes | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
28 | GDS2677 | 4/25/2007 | ['7588'] | ['2677'] | [u'16822547'] | 2735178 | [u'19609302'] | ['Fu', 'Shinnick'] | ['Myllykallio', 'Ren', 'Flament', 'Lavigne', 'Meslet-Cladiere', 'Ladenstein', 'Norais', 'K\xc3\xbchn', 'Briffotaux'] | [] | EMBO J | 2009 | 8/19/2009 | 0 | ion forks ( Gari et al , 2008 ). NucS proteins may thus play a specialized role in processing of ssDNA extremities created by archaeal Hef proteins ( Komori et al , 2004 ). Finally, gene expression omnibus records GDS2677{{tag}}--MENTION-- and GDS326 ( www.ncbi.nih.gov ) indicate that in M. tuberculosis , NucS (Rv1321) is expressed in stressed cells. We propose a functional model of how the interactions of P. abyssi | 0 | 0 | 1 | NOT pmc_gds | 0 | 0 |
29 | GDS2678 | 4/26/2007 | ['7540'] | ['2678'] | [u'14557539'] | 2592720 | [u'18811952'] | ['Preuss', 'Kudo', 'Redmond', 'Geschwind', 'Zapala', 'C\xc3\xa1ceres', 'Lachuer', u'C\xe1ceres', 'Barlow', 'Lockhart'] | ['Ruppin', 'Kupiec', 'Tuller'] | [] | Genome Biol | 2008 | 2008 | 1 | cal and 12 subcortical). The mouse gene expression appears in Additional data file 13. The gene expression of P. troglodytes (chimp) was downloaded from the Gene Expression Omnibus database [ 48 ] (GDS2678{{tag}}--REUSE-- record). It included 12 brain tissues; one of them was subcortical and the others were all cortical. Estimating gene age An estimation of gene age was obtained following the procedure described in | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
30 | GDS2697 | 4/27/2007 | ['6872'] | ['2697'] | [u'17327269'] | 2629779 | [u'19094235'] | ['Krawetz', 'Diamond', 'Goodrich', 'Strader', 'Quintana', 'Platts', 'Rockett', 'Dix', 'Rawe', 'Thompson', 'Chemes'] | ['Chen', 'Thosar', 'Mallelwar', 'Butte', 'Venkatasubrahmanyam'] | [] | BMC Bioinformatics | 2008 | 12/18/2008 | 1 | 1,515 GEO data sets (GDS) spanning 231 microarray platforms (GPL) and 42 species with the latest probe-gene annotations. We performed all group versus group comparisons within each GDS. For example, GDS2654, a study of age-related neurological senescence in mice, was annotated with 4 experimental variables, including disease state, strain, tissue, and age. There were 2 groups in the disease state, res|#x02264; 0.05, fold > 2) (Additional File 1 , [ 13 ]). The most significant change in Nanog expression was 1000-fold upregulation in ES cells compared to hematopoietic stem cells (q = 0.006, GDS2718). Within the top 15 studies where Nanog was up-regulated by more than 25 fold, all except two tissue comparisons were studies of preimplantation embryonic developments in different strains of mice.|ls from male with consistent and severe teratozoospermia, a condition in which less than 4 percent of sperm cells are morphologically normal, compared to controls (q < 0.0005, rank = 0.5%ile, GDS2697{{tag}}--REUSE--). It was also significantly downregulated in the peripheral blood mononuclear cells (PBMC) from malaria infected patients compared to uninfected controls (fold = 4.5, rank = 3.3%ile, q < 0.|ig. 1 , [ 15 ]). It was 13.3-fold downregulated in primary malignant melanoma samples (q < 0.0005) and 2.2-fold downregulated in benign skin nevi (q = 0.004) compared to normal skin samples (GDS1375). It was also significantly downregulated in metastatic prostate cancer tumors compared to benign tumors (fold = 3.6, q = 0.022) and clinically localized tumors (fold = 3.5, q = 0.024, GDS1439). 
It|xpression was significantly affected by certain drugs (Additional File 2 , [ 19 ]). For example, it was 6.4-fold upregulated in non-small cell lung cancer after treatment with gemcitabine (q = 0.03, GDS2777). It was also 2.9-fold downregulated in PBMC of malaria infected patients after treatment with chloroquine (q = 0.003, GDS2362). Its expression in ES cells was downregulated after treatment with re|0, q = 0.006), Oct4 (fold = 333, q = 0.006), Sox2 (fold = 1000, q = 0.006) and LIN28 (fold = 143, q = 0.006) were all significantly upregulated in ES cells compared to hematopoietic stem cells (GDS2718). This confirmed their roles in ES cell development. An age comparison showed that Nanog (fold = 2.5, q = 0.006), Oct4 (fold = 3.2, q = 0.036), Sox2 (fold = 9.7, q = 0.013) and LIN28 (fold | (fold = 1.14, q = 0.01), Oct4 (fold = 1.1, q = 0.029), Sox2 (fold = 1.1, q = 0.006) and LIN28 (fold = 1.1, q = 0.024) were all downregulated in stearoyl-CoA desaturase 1 (Scd1) deficient mice (GDS1517). Considering they were only 1.2-fold downregulated after the knockout of Oct4 in mice (GDS1824), these changes are still significant. This finding suggests that Scd1 might be an upstream regulat|.8, q < 0.0005), Sox2 (fold = 1.5, q = 0.018) and LIN28 (fold = 2.3, q = 0.033) were all significantly downregulated in primary malignant melanoma samples compared to normal skin samples (GDS1375). To further investigate the role of ES cell factors in diseases, we performed another multiple gene search on Nanog , Oct4 and Sox2 and found that they were all significantly upregulated in|03c; 0.0005), Oct4 (fold = 2.3, q < 0.0005) and Sox2 (fold = 3.0, q < 0.0005) were all significantly downregulated in the PBMC from malaria infected patients compared to controls (GDS2362). The result indicates that key regulators of pluripotency are down-regulated in the response to malarial infection. 
Figure 2 Comparisons resulting differential expression of Nanog , Oct4 and S | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
31 | GDS2704 | 3/1/2007 | ['6526'] | ['2704'] | [u'17327490'] | 2646459 | [u'19066325'] | ['Shirvani', 'Mookanamparambil', 'Chin', 'Ramoni'] | ['Lakoski', 'Vandenbergh', 'Lionikas', 'McClearn', 'Spicer', 'Blizard', 'Vasilopoulos', 'Stout', 'Vogler', 'Gerhard', 'Klein', 'Mack', 'Griffith', 'Larsson'] | [] | Physiol Genomics | 2009 | 2/2/2009 | 1 | istinct in the B6 strain (Supplemental Fig. S1). 1 Variants of the following genes were predicted (see materials and methods ) to have probably damaging effects: E130016E03Rik, expressed in heart (GDS2614), embryonic kidney (GDS1583), smooth muscle cells (GDS799); B230312A22Rik, expressed in embryonic kidney (GDS1583), heart (GDS2304), aortic smooth muscle cells (GDS2704{{tag}}); Nipsnap3a , expressed in |heart (GDS1228), aortic smooth muscle cells (GDS2704{{tag}}), kidney (GDS1583). The following variants are predicted to be possibly damaging: Tln1 , expressed in heart (GDS2614, GDS1228), embryonic kidney (GDS1748), aortic smooth muscle (GDS2704{{tag}}); Mdn1 , expressed in heart (GDS1228), embryonic kidney (GDS1583), smooth muscle cells (GDS799); Ltb4dh , expressed in heart (GDS1080, GDS2727), embryonic kidney ( | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
32 | GDS2718 | 4/20/2007 | ['7069'] | ['2718'] | [u'17448993'] | 2629779 | [u'19094235'] | ['Harel', 'Mirny', 'Reizis', 'Doetsch', 'Hou', 'Arenzana', 'Galan-Caridad'] | ['Chen', 'Thosar', 'Mallelwar', 'Butte', 'Venkatasubrahmanyam'] | [] | BMC Bioinformatics | 2008 | 12/18/2008 | 1 | 1,515 GEO data sets (GDS) spanning 231 microarray platforms (GPL) and 42 species with the latest probe-gene annotations. We performed all group versus group comparisons within each GDS. For example, GDS2654, a study of age-related neurological senescence in mice, was annotated with 4 experimental variables, including disease state, strain, tissue, and age. There were 2 groups in the disease state, res|#x02264; 0.05, fold > 2) (Additional File 1 , [ 13 ]). The most significant change in Nanog expression was 1000-fold upregulation in ES cells compared to hematopoietic stem cells (q = 0.006, GDS2718{{tag}}--REUSE--). Within the top 15 studies where Nanog was up-regulated by more than 25 fold, all except two tissue comparisons were studies of preimplantation embryonic developments in different strains of mice.|ls from male with consistent and severe teratozoospermia, a condition in which less than 4 percent of sperm cells are morphologically normal, compared to controls (q < 0.0005, rank = 0.5%ile, GDS2697). It was also significantly downregulated in the peripheral blood mononuclear cells (PBMC) from malaria infected patients compared to uninfected controls (fold = 4.5, rank = 3.3%ile, q < 0.|ig. 1 , [ 15 ]). It was 13.3-fold downregulated in primary malignant melanoma samples (q < 0.0005) and 2.2-fold downregulated in benign skin nevi (q = 0.004) compared to normal skin samples (GDS1375). It was also significantly downregulated in metastatic prostate cancer tumors compared to benign tumors (fold = 3.6, q = 0.022) and clinically localized tumors (fold = 3.5, q = 0.024, GDS1439). It|xpression was significantly affected by certain drugs (Additional File 2 , [ 19 ]). 
For example, it was 6.4-fold upregulated in non-small cell lung cancer after treatment with gemcitabine (q = 0.03, GDS2777). It was also 2.9-fold downregulated in PBMC of malaria infected patients after treatment with chloroquine (q = 0.003, GDS2362). Its expression in ES cells was downregulated after treatment with re|0, q = 0.006), Oct4 (fold = 333, q = 0.006), Sox2 (fold = 1000, q = 0.006) and LIN28 (fold = 143, q = 0.006) were all significantly upregulated in ES cells compared to hematopoietic stem cells (GDS2718{{tag}}--REUSE--). This confirmed their roles in ES cell development. An age comparison showed that Nanog (fold = 2.5, q = 0.006), Oct4 (fold = 3.2, q = 0.036), Sox2 (fold = 9.7, q = 0.013) and LIN28 (fold | (fold = 1.14, q = 0.01), Oct4 (fold = 1.1, q = 0.029), Sox2 (fold = 1.1, q = 0.006) and LIN28 (fold = 1.1, q = 0.024) were all downregulated in stearoyl-CoA desaturase 1 (Scd1) deficient mice (GDS1517). Considering they were only 1.2-fold downregulated after the knockout of Oct4 in mice (GDS1824), these changes are still significant. This finding suggests that Scd1 might be an upstream regulat|.8, q < 0.0005), Sox2 (fold = 1.5, q = 0.018) and LIN28 (fold = 2.3, q = 0.033) were all significantly downregulated in primary malignant melanoma samples compared to normal skin samples (GDS1375). To further investigate the role of ES cell factors in diseases, we performed another multiple gene search on Nanog , Oct4 and Sox2 and found that they were all significantly upregulated in|03c; 0.0005), Oct4 (fold = 2.3, q < 0.0005) and Sox2 (fold = 3.0, q < 0.0005) were all significantly downregulated in the PBMC from malaria infected patients compared to controls (GDS2362). The result indicates that key regulators of pluripotency are down-regulated in the response to malarial infection. Figure 2 Comparisons resulting differential expression of Nanog , Oct4 and S | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
33 | GDS2719 | 1/31/2007 | ['6916'] | ['2719'] | [u'15496517'] | 3022631 | [u'21172034'] | ['', 'Griswold', 'Shima', 'Uzumcu', 'Skinner', 'Small'] | ['Engreitz', 'Thathoo', 'Dudley', 'Altman', 'Morgan', 'Chen', 'Butte'] | [] | BMC Bioinformatics | 2010 | 12/21/2010 | 0 | Results Discussion Conclusions Methods Authors' contributions Supplementary Material References Abstract article-meta Background With the expansion of public repositories such as the Gene Expression Omnibus (GEO), we are rapidly cataloging cellular transcriptional responses to diverse experimental conditions. Methods that query these repositories based on gene expression content, rather than textual ann|bations and phenotypes. Public repositories provide a wealth of data amenable to this task. The largest of these repositories, the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) [ 7 ], now contains over 400,000 individual samples from more than 17,000 experiments detailing the molecular characteristics of diverse cell types, diseases, and drug treatments. The European |ogene. We compared DE profiles using Pearson correlation and applied hierarchical clustering to find that the profiles cluster primarily by disease and tissue. One GEO Series appears more than once: GSE3790 provides three profiles that cluster together, comparing normal to diseased tissue in cerebellum, frontal cortex, and caudate nucleus. First we compared the effects of data representation on our a|elated biological processes and perturbations. Figure 4B shows a multi-experiment cluster examining gonad development in mouse, consisting of differential expression profiles from GDS2098, GDS2203, and GDS2719{{tag}}--REUSE--. Each profile compares gonad tissue at two developmental stages, between 10 and 18 days post coitum. 
The highly significant associations between the testis (GDS2098) and ovary| the hypothesis that experiments that differentially express similar genes and pathways would also share functional phenotypic relationships. To assess the utility of this approach, we used data from GDS1824 to investigate the effects of Nanog knockdown in embryonic stem cells (ESCs) [ 30 ]. We created a DE profile comparing Nanog knockdown to control in mouse ESCs, and queried all GEO DataSets to iden| induced by removal of LIF (leukemia inhibitory factor) [ 33 ], a cytokine necessary to maintain the undifferentiated state of ESCs [ 34 ]. The Nanog knockdown search also identified comparisons from GDS1823, also from Loh et al. [ 30 ], where ESC differentiation was induced by drug treatment with retinoic acid (RA) or hexa-methylene-bis-acetamide (HMBA). Figure 5 Search results for Nanog knockdown . F|proportional to the contribution that the gene makes to the final correlation score, and thus is a function of the magnitude as well as the significance of differential expression. (A) Comparison of GSE18326 : FoxO3 null versus wild type and GDS2758: normoxia versus hypoxia. (B) Comparison of GDS1824: Nanog knockdown versus control and GDS1688: non-small cell adenocarcinoma versus small cell cancer. A| cell cycle arrest, differentiation, and detoxification of reactive oxygen species (ROS) [ 41 , 42 ]. We created a DE profile comparing wild type to FoxO3 -/- adult mice using normalized data from GSE18326 . A query of GEO DE profiles yielded numerous significant results, the most significant of which are shown in Figure 7. Several matching profiles (Results 2, 3, 5 and 9) implic|MEFs), and furthermore that this activation requires functional HIF-1 α [ 43 ]. Renault et al. also found that FoxO3 is required for the expression of hypoxia-dependent genes in NSCs [ 40 ]. GDS2162 (Result 5) compares p300 and CBP null MEFs in response to dipyridyl (DP) or control (EtOH). 
DP, a hypoxia mimetic, induces HIF-1 α [ 44 ] and thus potentially FoxO3. In all four hypoxia-re|e [ 59 ] to evaluate the performance of various data processing methods. GEO DataSet search To search GEO for experiments with similar transcriptional patterns, we indexed all GEO DataSets (GDSs). We downloaded processed data from GEO and used the GDS "Value type" field to transform the data to log 2 space. Each GDS is manually annotated with one or more factors, e.g., "disease state" or "time", which ou|ck here for file (11M, PDF) Additional file 4 Legend for differential expression profile network . Legend for Additional file 3 . Click here for file (11K, PDF) Additional file 5 FXYD6 Expression in GDS1824 . GEO Gene profile for FXYD6 in GDS1824. See http://www.ncbi.nlm.nih.gov/sites/entrez?db=geo&term=GDS1824[ACCN]+fxyd6 . Click here for file (5.5K, PNG) Additional file 6 FXYD6 Expression i | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
34 | GDS2727 | 3/8/2007 | ['7196'] | ['2727'] | [u'17488637'] | 2646459 | [u'19066325'] | [u'Gigu\xe8re', 'Wilson', 'Evans', 'Huss', 'Alaynick', 'Gigu\xc3\xa8re', 'Dufour', 'Kelly', 'Blanchette', 'Downes'] | ['Lakoski', 'Vandenbergh', 'Lionikas', 'McClearn', 'Spicer', 'Blizard', 'Vasilopoulos', 'Stout', 'Vogler', 'Gerhard', 'Klein', 'Mack', 'Griffith', 'Larsson'] | [] | Physiol Genomics | 2009 | 2/2/2009 | 1 | istinct in the B6 strain (Supplemental Fig. S1). 1 Variants of the following genes were predicted (see materials and methods ) to have probably damaging effects: E130016E03Rik, expressed in heart (GDS2614), embryonic kidney (GDS1583), smooth muscle cells (GDS799); B230312A22Rik, expressed in embryonic kidney (GDS1583), heart (GDS2304), aortic smooth muscle cells (GDS2704); Nipsnap3a , expressed in |heart (GDS1228), aortic smooth muscle cells (GDS2704), kidney (GDS1583). The following variants are predicted to be possibly damaging: Tln1 , expressed in heart (GDS2614, GDS1228), embryonic kidney (GDS1748), aortic smooth muscle (GDS2704); Mdn1 , expressed in heart (GDS1228), embryonic kidney (GDS1583), smooth muscle cells (GDS799); Ltb4dh , expressed in heart (GDS1080, GDS2727{{tag}}), embryonic kidney ( | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
35 | GDS2728 | 3/6/2007 | ['7181'] | ['2728'] | [u'17483311'] | 2881867 | [u'20539758'] | ['Wischhusen', 'Brawanski', 'Lohmeier', 'Proescholdt', 'Bogdahn', 'Aigner', 'Beier', 'Hau', 'Oefner'] | ['Duvall', 'Irvin', 'Zhang', 'Wheeler', 'Zhai', 'Black', 'Panwar', 'Seksenyan', 'Jouanneau', 'Sarayba'] | [] | PLoS One | 2010 | 6/7/2010 | 0 | ) Principal Component Analyses focused on discrete gene lists were plotted in GeneSpring GX 7.3, and group clusters circled, on: 59 GBMs from UCLA database (“UCLA GBM”; GEO accession #GSE4412), 12 GBMs from 6 patients collected before and after DC vaccination (“vaccinated GBM”; GEO accession #GSE9166); 10 GBMs from 5 patients collected before and after standard radiation|c;control GBM”; GEO accession #GSE9166) (red); CD133 - and CD133 + CSCs from 6 University of Regensberg GBM patients [29] (“UR GSC”; GEO accession #GDS2728{{tag}}--REUSE--) (green); stem cell media-cultured GBM lines from 2 Henry Ford Hospital patients (“HFH CS lines”; GEO accession #GSE4536); murine GL26 glioma samples recovered and cultured ≤|pression ( Fig. S1A ). GL26B6V exhibited parallel trends in all analogous gene lists (right column). (B) Primary GBM microarray expression values from 200 Henry Ford Hospital patients (GEO accession #GSE4536) were assessed for similarity to averaged expression values of 6 UCLA glioma CSCs by determining Pearson's coefficients across 54,674 transcripts, and arranged in order of ascending coefficient val|ne.0010974.g002 Figure 2 Regulation of stem-like gene expression in proportion to anti-tumor T cell activity. 
(A) CSC similarity (Pearson's coefficient for similarity to GSCs – GEO accession #GDS2728{{tag}}--REUSE-- - across all transcripts) from 200 Henry Ford Hospital GBM patients (GEO accession #GSE4536) and 6 CSMC GBM patients was assessed and found to be statistically identical, demonstrating absence of r|M”, total n = 200), and cultured GBM lines grown in stem cell media (“HFH CS lines”, total n = 23) as indicated (GEO accession #GSE4536 for both), were arranged in groups with increasing global CSC similarity as in Fig. 1B (n = 20/group for GBM, groups A-J; n = 5 for 4 groups, and n|GSCs across all transcripts), and CD133 expression, were determined for GBM from HFH (n = 200) and high-grade gliomas from UCLA (n = 45; GEO accession #GSE4412), plotted against each other for each individual sample, trendlines generated, and r and P values determined as depicted. CSC similarity correlated significantly with CD133 expression within two se|arity, to distinguish non-fractionated GBM from CD133 − or CD133 + GSCs (29) (GEO accession #GDS2728{{tag}}--REUSE--) or from stem cell media-cultured GBM lines from 2 patients (37) (GEO accession #GSE4536); was determined (P<0.01 denoted by red asterisk). Unlike CD133 expression, only GSC similarity distinguished CD133 + or CD133 GSCs (from multiple sources) from surgical GBM sample|fficient for similarity to GSCs - GEO accession #GDS2728{{tag}} - across all transcripts), and CD133 expression, were determined for de novo GBM, secondary GBM, and grade 3 gliomas from UCLA (GEO accession #GSE4412) and each parameter assessed for inter-group differences by one-tailed T-test (P<0.01 denoted by red asterisk). CD133 expression has been shown to be highest in de novo GBM (29), and this w | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
36 | GDS2733 | 3/1/2007 | ['6930'] | ['2733'] | [u'17425403'] | 2781753 | [u'19736252'] | ['Lessnick', 'Stegmaier', 'Wright', 'Wong', 'Kung', 'Chow', 'Peck', 'Golub', 'Ross'] | ['Tsuda', 'Kayano', 'Mamitsuka', 'Takigawa', 'Shiga'] | [] | Bioinformatics | 2009 | 11/1/2009 | 1 | ng the lowest P -value for each gene pair of the 10 interactions in Table 4 . For example, for COX6C and UBA1, the gene pair of the first interaction of Table 4 , we found a switching mechanism in GDS2960_1 with the P -value of −3.9532, showing the statistical significance of this mechanism. This directly indicates that there must exist a switching mechanism in expression between these two |ank Gene pair #datasets GDS P - value #ex. #ex. Annotation from GEO class 1 class 2 1 {COX6C,UBA1} 117 GDS2960_1 −3.9532 60 41 Marfan syndrome: cultured skin fibroblasts 2 {RERE,TNFRSF1A} 284 GDS2736_25 −5.9049 19 15 Malignant fibrous histiocytoma and various soft tissue sarcomas 3 {ATP5D,ITCH} 324 GDS1875_3 −5.1235 27 24 Host cell response to HIV-1 Vpr-induced cell cycle arrest|GDS2733{{tag}}--REUSE--_1 −7.9996 17 17 Cytosine arabinoside effect on Ewing's sarcoma cell line 5 {NCSTN,HSPA5} 102 GDS2545_5 −6.4398 63 25 Metastatic prostate cancer (HG-U95A) 6 {NDUFA8,NDUFA6} 142 GDS2733{{tag}}--REUSE--_4 −4.7027 17 16 Cytosine arabinoside effect on Ewing's sarcoma cell line 7 {ALS2,SLC25A6} 108 GDS1627_2 −3.2808 16 15 Breast cancer cell lines response to chemotherapeutic drugs 8 {|P5J} 418 GDS2960_1 −3.1628 60 41 Marfan syndrome: cultured skin fibroblasts 9 {COX6C,UBA1} 117 GDS2960_1 −3.9532 60 41 Marfan syndrome: cultured skin fibroblasts 10 {NDUFA10,COX4} 232 GDS2643_9 −6.2133 13 12 Waldenstrom's macroglobulinemia: B lymphocytes and plasma cells For each gene pair of 10 interactions in Table 4 , the number of datasets obtained from GEO, the GDS which g|ng genes. 3 In each GDS, if it has more than two classes or replicated experiments, we consider all possible pairwise combinations of them. 
We then name generated multiple datasets from one GDS (e.g. GDS2960) those like GDS2960_1, GDS2960_2, etc. This results in that the number of datasets we used could be >36. The actual number of datasets for each gene pair is shown in Table 5 . 4 For each g | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
37 | GDS2736 | 4/1/2007 | ['6481'] | ['2736'] | [u'17464315'] | 2781753 | [u'19736252'] | ['Kawai', 'Nemoto', 'Ichikawa', 'Toyama', 'Seki', 'Ohta', 'Hasegawa', 'Nakayama', 'Yoshida', 'Takahashi'] | ['Tsuda', 'Kayano', 'Mamitsuka', 'Takigawa', 'Shiga'] | [] | Bioinformatics | 2009 | 11/1/2009 | 1 | ng the lowest P -value for each gene pair of the 10 interactions in Table 4 . For example, for COX6C and UBA1, the gene pair of the first interaction of Table 4 , we found a switching mechanism in GDS2960_1 with the P -value of −3.9532, showing the statistical significance of this mechanism. This directly indicates that there must exist a switching mechanism in expression between these two |ank Gene pair #datasets GDS P - value #ex. #ex. Annotation from GEO class 1 class 2 1 {COX6C,UBA1} 117 GDS2960_1 −3.9532 60 41 Marfan syndrome: cultured skin fibroblasts 2 {RERE,TNFRSF1A} 284 GDS2736{{tag}}--REUSE--_25 −5.9049 19 15 Malignant fibrous histiocytoma and various soft tissue sarcomas 3 {ATP5D,ITCH} 324 GDS1875_3 −5.1235 27 24 Host cell response to HIV-1 Vpr-induced cell cycle arrest|GDS2733_1 −7.9996 17 17 Cytosine arabinoside effect on Ewing's sarcoma cell line 5 {NCSTN,HSPA5} 102 GDS2545_5 −6.4398 63 25 Metastatic prostate cancer (HG-U95A) 6 {NDUFA8,NDUFA6} 142 GDS2733_4 −4.7027 17 16 Cytosine arabinoside effect on Ewing's sarcoma cell line 7 {ALS2,SLC25A6} 108 GDS1627_2 −3.2808 16 15 Breast cancer cell lines response to chemotherapeutic drugs 8 {|P5J} 418 GDS2960_1 −3.1628 60 41 Marfan syndrome: cultured skin fibroblasts 9 {COX6C,UBA1} 117 GDS2960_1 −3.9532 60 41 Marfan syndrome: cultured skin fibroblasts 10 {NDUFA10,COX4} 232 GDS2643_9 −6.2133 13 12 Waldenstrom's macroglobulinemia: B lymphocytes and plasma cells For each gene pair of 10 interactions in Table 4 , the number of datasets obtained from GEO, the GDS which g|ng genes. 3 In each GDS, if it has more than two classes or replicated experiments, we consider all possible pairwise combinations of them. 
We then name generated multiple datasets from one GDS (e.g. GDS2960) those like GDS2960_1, GDS2960_2, etc. This results in that the number of datasets we used could be >36. The actual number of datasets for each gene pair is shown in Table 5 . 4 For each g | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
38 | GDS2737 | 5/1/2007 | ['6364'] | ['2737'] | [u'17510236'] | 2872390 | [u'20360561'] | ['Nezhat', '', 'Burney', 'Giudice', 'Vo', 'Hamilton', 'Talbi', 'Nyegaard', 'Lessey'] | ['Liu', 'Zhou', 'Huang'] | [] | Proc Natl Acad Sci U S A | 2010 | 4/13/2010 | 0 | Abstract article-meta The rapid accumulation of gene expression data has offered unprecedented opportunities to study human diseases. The National Center for Biotechnology Information Gene Expression Omnibus is currently the largest database that systematically documents the genome-wide molecular basis of diseases. However, thus far, this resource has been far from fully utilized. This paper describes th|ferences  The rapid accumulation of high-throughput genomic data offers an unprecedented opportunity to study human diseases. The National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) ( 1 ) with more than 330,000 gene expression profiles and an annual growth rate of 150%, is currently the largest database of its kind. The GEO systematically documents the molecular basis of m|rmance after Stage II refinement. ( B ) An example illustrating the error correction by the Stage II refinement. The query profile studies uterine leiomyomas obtained from fibroid afflicted patients (GDS484). The profile is annotated with four concepts by UMLS text mapping: Connective/Soft Tissue Neoplasm, Muscle tissue neoplasm, fibroid tumor, and uterine fibroids. The Stage I diagnosis predicted four| prediction is later corrected by Stage II refinement. ( C ) The figure presents the 110 disease classes and their hierarchical relationships. The red nodes represent diagnosed disease concepts for GDS563: (1) Nervous system disorder (2) Neuromuscular diseases (3) Myopathy (4) Musculoskeletal diseases (5) Congenital, Hereditary, and Neonatal diseases and abnormalities (CHNDA) (6) Genetic diseases, in| prediction performance decreases with the data reduction. Table 1. 
Prediction result of a subset of prevalent diseases We further exemplify the performance of our approach using the NCBI GEO dataset GDS563. This dataset was produced to identify modifying factors and pathogenic pathways involved in Duchenne Muscular Dystrophy (DMD). It consists of 24 microarrays from two subsets: 12 from DMD patients a|mance is shown in SI Text . A closer examination of the results shows further interesting features of our method. One example comes from the result for a query profiling the T-cells of HIV patients (GDS2649). Even though HIV is not included in the 110 disease classes of our diagnosis database due to the lack of sufficient training data, we obtain the relevant concept RNA virus infection that can descr|.org/cgi/content/full/0912043107/DCSupplemental . Other Sections Abstract Results Discussion Methods Supplementary Material References References 1. Edgar R, Domrachev M, Lash AE. Gene expression omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002; 30 :207–210. [ PMC free article ] [ PubMed ] 2. Horton PB, Kiseleva L, Fujibuchi W. RaPiDS: an algorith|ssion: directed search of large microarray compendia. Bioinformatics. 2007; 23 (20):2692–2699. [ PubMed ] 5. Zhu Y, et al. GEOmetadb: powerful alternative search engine for the Gene Expression Omnibus. Bioinformatics. 2008; 24 (23):2798–2800. [ PMC free article ] [ PubMed ] 6. Shah NH, et al. Ontology-driven indexing of public datasets for translational bioinformatics. BMC Bioinformatics|e expression: directed search of large microarray compendia. Bioinformatics. 2007 Oct 15; 23(20):2692-9. [Bioinformatics. 2007] GEOmetadb: powerful alternative search engine for the Gene Expression Omnibus. Bioinformatics. 2008 Dec 1; 24(23):2798-800. [Bioinformatics. 2008] Ontology-driven indexing of public datasets for translational bioinformatics. BMC Bioinformatics. 2009 Feb 5; 10 Suppl 2():S1. |e expression: directed search of large microarray compendia. 
Bioinformatics. 2007 Oct 15; 23(20):2692-9. [Bioinformatics. 2007] GEOmetadb: powerful alternative search engine for the Gene Expression Omnibus. Bioinformatics. 2008 Dec 1; 24(23):2798-800. [Bioinformatics. 2008] Ontology-driven indexing of public datasets for translational bioinformatics. BMC Bioinformatics. 2009 Feb 5; 10 Suppl 2():S1. | 0 | 0 | 0 | NOT pmc_gds | 0 | 1 |
39 | GDS2739 | 3/28/2007 | ['7377'] | ['2739'] | [u'17591970'] | 2862708 | [u'20454660'] | ['Allred', 'Lee', 'Mohsin', 'Tsimelzon', 'Medina', 'Wu', 'Mao'] | ['Reyes-Vald\xc3\xa9s', 'Herrera-Estrella', 'Mart\xc3\xadnez'] | [] | PLoS One | 2010 | 5/3/2010 | 0 | tic, enlarged lobular units (HELUs; 8 samples) from RNA samples obtained by microdissection [Lee, 2007 #108]. The data were downloaded from the GEO database at the NCBI (GEO accession GDS2739{{tag}}--REUSE--; http://www.ncbi.nlm.nih.gov/ ) in July 2008. This dataset has expression information for 34,702 human genes in each of the 16 arrays. Information theory and statistical analyses If we consider th | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
40 | GDS2744 | 5/12/2007 | ['7765'] | ['2744'] | [u'17517823'] | 2688935 | [u'19393085'] | ['Yoon', 'Wang', 'Zhang', 'Choi', 'Taylor', 'Hsu', 'Chen', 'Hankinson'] | ['Gohlke', 'Thomas', 'Parham', 'Portier', 'Stopper'] | [] | Genome Biol | 2009 | 2009 | 0 | | | | | AND pmc_gds | 0 | 1 |
41 | GDS2765 | 6/13/2007 | ['5140'] | ['2765'] | [u'17416441'] | 2784758 | [u'19925686'] | [u'H\xf6lter', 'Bender', 'Schmidt', 'Quintanilla-Martinez', 'Gailus-Durner', 'Ruthsatz', 'Mijalski', 'H\xc3\xb6lter', 'Rujescu', 'Fuchs', 'Schneider', 'de', 'Wurst', u'Vogt', u'Hrab\xe9', 'Vogt-Weisenhorn', 'Haack', 'Becker', 'Klopstock', 'Beckers', 'Mader', 'Genius', 'Irmler'] | ['Wan', 'Kiseleva', 'Horton', 'Mamitsuka', 'Harada'] | [] | Source Code Biol Med | 2009 | 11/20/2009 | 1 | ts from GEO. The table shows the dimensions of the data set in the second and third columns. In these experiments, we use the entire data set, irrespective of any additional information. For example, GDS596 contains 158 experiments of two sets of replicates. Thus, it would be more appropriate to apply the system to only 79 of the experiments. Nevertheless, the purpose of these results is to demonstrate|uch smaller. Table 4 Dimensions of GEO test data and running time and memory usage of build-mst . Data set size Execution of build-mst Data set Experiments Probes Elapsed time (s) Memory usage (MB) GDS596 158 22283 23.28 201.391 GDS1962 180 54681 66.65 383.980 GDS2765{{tag}}--REUSE-- 13 45101 1.03 28.305 GDS2771 192 22283 33.03 276.352 GDS3069 12 22283 0.52 25.000 GDS3216 12 22810 0.56 25.758 The dimensions of the G|unning times are reported as seconds and averaged across 5 trials. The running time and memory usage of build-mst are shown as the last two columns of Table 4 . The longest time is associated with GDS1962, which takes just over 1 minute. As expected, the running time is more dependent on the number of experiments than the number of probes, as shown by comparing the results of GDS596 with GDS2765{{tag}}--REUSE--. As| the number of virtual processors range from 1 to 8. The files generated were Portable Network Graphics files (PNG) at a resolution of 300 dots per inch (DPI). (Higher resolutions require more time.) 
GDS2765{{tag}}--REUSE--: Effect of Creatine on Mice As an example, we apply both HAMSTER and hierarchical clustering to the data set GDS2765{{tag}}--REUSE--, where researchers investigated the effect creatine has on the expression level |iment type (untreated or treated), except in the sense that we chose the color associated with each experiment based on its type (untreated in red; treated in blue). In Figure 9 , the dendrogram for GDS2765{{tag}}--REUSE-- shows at least two groups - the five controls on the far left, followed by five treated samples. Four of the treated samples appear as one such group in the center. As with the example dendrogram o|ent. This is because an MST can have a node connected to any number of other nodes. In a dendrogram, though, the structure is restricted to recursive, pair-wise relationships. Figure 9 Dendrogram for GDS2765{{tag}} . The dendrogram was constructed using Euclidean distance and single linkage. The control samples are in red; the treated ones are in blue. Figure 10 MST 0 for GDS2765{{tag}} . The first MST generated usi|experiments remain by themselves and all other experiments have merged into node 8. The node has the default color and shape attributes since it has a mix of both attribute types. Figure 11 MST 6 for GDS2765{{tag}}--REUSE-- . The node colors correspond to those of Figure 10. So far, control and treated samples have not yet mixed within any node. The normalized association score is 97.78. Figure 12 MST 9 for GDS2765{{tag}}--REUSE-- . |ically, while the modified one now peaks in the middle. The remaining two scoring schemes (Gaps and ANOVA) grow steadily and peak after the half-way point. Figure 13 Comparison of scoring schemes for GDS2765{{tag}}--REUSE-- . A comparison of the scoring schemes available to HAMSTER corresponding to the MSTs of Figures 10, 11, and 12. 
All schemes represent scores as a percentage of the maximum to facilitate easy compar|w well the experiments of a data set cluster, regardless of the clustering method. We suggest that users try the scheme which is most suitable for their needs, based on the definitions given earlier. GDS3069: Various brain tumors Our next sample data set is GDS3069 which was used to analyze 12 primary brain tumors based on their histological diagnoses [ 30 ]. Unlike the previous data set of two distinc|wo samples (GSM215423 and GSM215425) are grouped together in the same sub-tree, while the third (GSM215422) is combined with the four samples to its right at an earlier step. Figure 14 Dendrogram for GDS3069 . This data set reports on the expression levels of 12 primary brain tumors, divided into 5 categories. The colors assigned to them are: high grade glioblastoma (red), high grade gliosarcoma (yello| set, occurs between the two high grade gliosarcoma (yellow) samples. This example demonstrates the use of MSTs for data sets where the sample categories can be placed on a scale. Figure 15 MST 0 for GDS3069 . The nodes of the first MST have been colored in the same way as Figure 14. Euclidean distance with average linkage has been selected. GDS3216: Whole seedling roots response to salinity stress: ti|each other. Then, the experiments would be ordered based on time. For an MST, others have shown that a chain of experiments would be formed, ordered by the time-index [ 22 ]. Figure 16 Dendrogram for GDS3216 . The study associated with this data set measured the response A. thaliana seedling roots have to sa | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
42 | GDS2777 | 1/31/2007 | ['6914'] | ['2777'] | [u'17483357'] | 2629779 | [u'19094235'] | [u'Valencia', u'Sanders', 'Yen', 'Tooker', 'Ng', 'Negro-Vilar', 'Hermann', u'Prudente', u'Wan-Ching', u'Luo'] | ['Chen', 'Thosar', 'Mallelwar', 'Butte', 'Venkatasubrahmanyam'] | [] | BMC Bioinformatics | 2008 | 12/18/2008 | 1 | 1,515 GEO data sets (GDS) spanning 231 microarray platforms (GPL) and 42 species with the latest probe-gene annotations. We performed all group versus group comparisons within each GDS. For example, GDS2654, a study of age-related neurological senescence in mice, was annotated with 4 experimental variables, including disease state, strain, tissue, and age. There were 2 groups in the disease state, res|≤ 0.05, fold > 2) (Additional File 1 , [ 13 ]). The most significant change in Nanog expression was 1000-fold upregulation in ES cells compared to hematopoietic stem cells (q = 0.006, GDS2718). Within the top 15 studies where Nanog was up-regulated by more than 25 fold, all except two tissue comparisons were studies of preimplantation embryonic developments in different strains of mice.|ls from male with consistent and severe teratozoospermia, a condition in which less than 4 percent of sperm cells are morphologically normal, compared to controls (q < 0.0005, rank = 0.5%ile, GDS2697). It was also significantly downregulated in the peripheral blood mononuclear cells (PBMC) from malaria infected patients compared to uninfected controls (fold = 4.5, rank = 3.3%ile, q < 0.|ig. 1 , [ 15 ]). It was 13.3-fold downregulated in primary malignant melanoma samples (q < 0.0005) and 2.2-fold downregulated in benign skin nevi (q = 0.004) compared to normal skin samples (GDS1375). It was also significantly downregulated in metastatic prostate cancer tumors compared to benign tumors (fold = 3.6, q = 0.022) and clinically localized tumors (fold = 3.5, q = 0.024, GDS1439). It|xpression was significantly affected by certain drugs (Additional File 2 , [ 19 ]). 
For example, it was 6.4-fold upregulated in non-small cell lung cancer after treatment with gemcitabine (q = 0.03, GDS2777{{tag}}--REUSE--). It was also 2.9-fold downregulated in PBMC of malaria infected patients after treatment with chloroquine (q = 0.003, GDS2362). Its expression in ES cells was downregulated after treatment with re|0, q = 0.006), Oct4 (fold = 333, q = 0.006), Sox2 (fold = 1000, q = 0.006) and LIN28 (fold = 143, q = 0.006) were all significantly upregulated in ES cells compared to hematopoietic stem cells (GDS2718). This confirmed their roles in ES cell development. An age comparison showed that Nanog (fold = 2.5, q = 0.006), Oct4 (fold = 3.2, q = 0.036), Sox2 (fold = 9.7, q = 0.013) and LIN28 (fold | (fold = 1.14, q = 0.01), Oct4 (fold = 1.1, q = 0.029), Sox2 (fold = 1.1, q = 0.006) and LIN28 (fold = 1.1, q = 0.024) were all downregulated in stearoyl-CoA desaturase 1 (Scd1) deficient mice (GDS1517). Considering they were only 1.2-fold downregulated after the knockout of Oct4 in mice (GDS1824), these changes are still significant. This finding suggests that Scd1 might be an upstream regulat|.8, q < 0.0005), Sox2 (fold = 1.5, q = 0.018) and LIN28 (fold = 2.3, q = 0.033) were all significantly downregulated in primary malignant melanoma samples compared to normal skin samples (GDS1375). To further investigate the role of ES cell factors in diseases, we performed another multiple gene search on Nanog , Oct4 and Sox2 and found that they were all significantly upregulated in|< 0.0005), Oct4 (fold = 2.3, q < 0.0005) and Sox2 (fold = 3.0, q < 0.0005) were all significantly downregulated in the PBMC from malaria infected patients compared to controls (GDS2362). The result indicates that key regulators of pluripotency are down-regulated in the response to malarial infection. Figure 2 Comparisons resulting differential expression of Nanog , Oct4 and S | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
43 | GDS2825 | 5/23/2007 | ['6712'] | ['2825'] | [u'17379704'] | 2850364 | [u'20170516'] | ['H\xc3\xa9braud', u'M\xf8retr\xf8', 'Langsrud', u'Skj\xe6ret', 'M\xc3\xb8retr\xc3\xb8', 'Chafsey', 'Rudi', 'Skjaeret', u'Hebraud', 'Bore', 'Chambon', 'Moen'] | ['Mongiov\xc3\xac', 'Di', 'Giugno', 'Shasha', 'Pulvirenti', 'Ferro'] | [] | BMC Bioinformatics | 2010 | 2/19/2010 | 0 | gene expressions. We extracted The largest connected component from the complete network, available with the supplementary material of [ 13 ]. Gene expression profiles of 22 samples of the experiment GDS2825{{tag}}--REUSE-- (freely available from NCBI [ 14 ]), concerning the analysis of E. Coli K12 strains adapted to grow in benzalkonium chloride (a commonly used disinfectant and preservative) were used. We discretize | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
44 | GDS2832 | 5/25/2007 | ['7896'] | ['2832'] | [u'18393631'] | 2720182 | [u'19584098'] | ['Avery', 'Shepherd', u'Inniss', 'Moore', 'Heath'] | ['Paradowska', 'Arpanahi', 'Miller', 'Steger', 'Krawetz', 'Brinkworth', 'Tedder', 'Saida', 'Iles', 'Platts'] | [] | Genome Res | 2009 | 2009 Aug | 1 | AND pmc_gds | 0 | 1 | ||||
45 | GDS2922 | 6/28/2007 | ['5180'] | ['2922'] | [u'17502243'] | 2809109 | [u'20098615'] | ['McKellar', 'Majumdar', 'Miller', 'Sreekumar', 'Ballman', 'Bolander', 'Unnikrishnan', 'Sarkar', 'Sundt'] | ['Mendelsohn', 'Montefusco', 'Housman', 'Payne', 'Kapur', 'Hedgepeth', 'Wooten', 'Huggins', 'Iyer'] | [] | PLoS One | 2010 | 1/21/2010 | 0 | ed syndromes. Second, we identified genes differentially expressed in the aorta of subjects with BAV compared with a normal tricuspid aortic valve (TAV) from the Gene Expression Omnibus (GEO) dataset GDS2922{{tag}}--REUSE--. [25] Analysis of these arrays was carried out by both parametric (limma) and non-parametric (RankProd) methods. These two approaches predominantly selected different gene-sets rel|tion (see Table S1 ). Third, we included the most significantly altered genes (n = 41) detected in the peripheral blood of patients with TAA compared with normal individuals (GSE9106). [26] Finally, we included chromosomal loci linked with BAV by microsatellite-based study [9] , [27] . Through our primary analysis of microarray| by STRING, CANDID, and FitSNPs. ENG was initially selected for inclusion by RankProd analysis that found differential expression in aortic aneurismal tissue from patients with BAV compared with TAV (GDS2922{{tag}}--REUSE--). Haplotype analysis identified one block including a conservative coding region variant (ENG-T343T, rs3739817) associated with BAV (p-value 5.88×10 −04 , OR 2.79) ( Figure 5 ). ENG|opment of BAV [27] . These core terms produced a basic protein interaction network consisting of 124 proteins (and genomic locations). 2) Addition of prioritized genes : We analyzed GDS2922{{tag}}--REUSE-- and GSE9106 separately using limma [48] and RankProd [49] applying default settings within the statistical analysis program R [50] . 3) STRING | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
46 | GDS2951 | 5/18/2007 | ['7833'] | ['2951'] | [u'19008490'] | 2776015 | [u'19852844'] | ['Winchester', 'Maxwell', 'Gleadle', 'Smith', 'Brooks', 'Elvidge', 'Talbot', 'Liu', 'Ragoussis', 'Glenny', 'Robbins'] | ['Nonaka', 'Ivanov', 'Roth', 'Tsao', 'Liu', 'Ivanova', 'Wistuba', 'Pass', 'Prudkin'] | ['Liu'] | Mol Cancer | 2009 | 10/24/2009 | 1 | c link. As a result, we found in the NCBI Gene Expression Omnibus several experimental assays performed on different cell types that consistently showed TUSC2 mRNA suppression by hypoxia (see records GDS2759, GDS2018, GDS2760, and GDS2951{{tag}}--REUSE-- for FUS1/TUSC2 at ). Based on these data and the dual role of TUSC2 as an immune regulator and tumor suppressor it may be suggested that TUSC2 suppression during tum | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
47 | GDS2952 | 7/7/2007 | ['7524'] | ['2952'] | [] | 2889041 | [u'20439746'] | [u'Mulcahy', u'Molloy', u"O'Gara", u'Adams'] | ['Pawelec', 'Goldmann', 'Uddin', 'de', 'Koenen', 'Wildman', 'Aiello', 'Galea'] | [] | Proc Natl Acad Sci U S A | 2010 | 5/18/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
48 | GDS2958 | 6/1/2007 | ['7562'] | ['2958'] | [u'17560336'] | 2851999 | [u'20308550'] | ['Jiao', 'Xie', 'Mellinghoff', 'Kennedy', 'Rose', 'Palaskas', 'Tran', 'Wu', 'Sawyers', 'Getz', 'Vivanco', 'Finn', 'Davis', 'Loda', 'Golub'] | ['Mulholland', 'Kubek', 'Mischel', 'Versele', 'Cloughesy', 'Vivanco', 'Palaskas', 'Rohle', 'Wu', 'Tanaka', 'Perera', 'Rajasekaran', 'Hsueh', 'Dikic', 'Iwanami', 'Dang', 'Oldrini', 'Liau', 'Evans', 'Mellinghoff', 'Wolle', 'Brennan', 'Kuga'] | ['Mellinghoff', 'Wu', 'Palaskas', 'Vivanco'] | Proc Natl Acad Sci U S A | 2010 | 4/6/2010 | 0 | the A431 and HCC827 isogenic pairs, but did not find any statistically significant differences between PTEN-intact and PTEN-deficient sublines (GEO, http://www.ncbi.nlm.nih.gov/geo/ , accession no. GDS2958{{tag}}--REUSE-- ). We next examined the effect of PTEN on ligand-induced EGFR activation and subsequent signal termination. Down-regulation of the EGFR protein in response to EGF was blunted in the A431 subline wi | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
49 | GDS2960 | 9/14/2007 | ['8759'] | ['2960'] | [u'17850668'] | 2677677 | [u'19436755'] | ['Ruzzo', 'Schwartz', 'Milewicz', 'Jaeger', 'Morale', 'Yao', u'Morales', 'Francke', 'Mulvihill', 'Emond'] | ['Devriendt', 'Nitsch', 'Thienpont', 'Thorrez', 'Tranchevent', 'Moreau', 'Van'] | [] | PLoS One | 2009 | 2009 | 1 | osis) we have determined small neighborhoods of 150 or 20 neighboring genes, because for these data sets we obtained the best signal for small neighborhoods (data not shown) after applying the Fisher omnibus statistics (see Materials and Methods for more details). However, for one data set (Becker muscular dystrophy) we have determined a larger neighborhood of 2000 neighboring genes due to high signal |onsidered). We wanted the size of the neighborhood to be dependent on the disruption we found in the network. We determined this size by analyzing the observed signals obtained by applying the Fisher omnibus statistic to the list of candidate genes for different neighborhood sizes, and choosing the size for which we caught the best signal as the most reliable one. For three data sets (FXS, MFS, CF), we d|ghborhood size must be computed independently. Therefore, we first run the analysis for various value of β (from 0.001 to 0.5) and, for each, measure the signal captured by using the Fisher omnibus statistic ( ) on the rankings produced. We then generate a new p -value from the statistic S for each β using the χ 2 distribution. The value of the parameter β with t|. By contrast, for inappropriate neighborhood sizes, all candidates will have uniformly distributed p -values, leading to a low statistic S . Table S9 illustrates the signals derived by the Fisher omnibus meta-analysis using the example of FXS [11] , leading to an appropriate neighborhood size of approximately 150 genes for this data set. Data sets FXS Mendelian di|ion 1 (FMR1, OMIM *309550) Phenotype: mental retardation, macroorchidism, and distinct facial features Expression data: Nishimura et al. 
(2007) [11] (GEO accession number: GSE7316). Lymphoblastoid cell cultures from patients with confirmed FMR1 full mutation (CGG repeat expansion). Platform: Agilent-012391 Whole Human Genome Oligo Microarray G4112A MFS |ature, disproportionately long limbs and digits, joint laxity, eye anomalies and progressive cardiovascular problems. Expression data: Yao et al. (2007) [12] (GEO accession number GDS2960{{tag}}--REUSE--). Fibroblast cultures from patients with confirmed FBN1 missense (9) and nonsense (7) mutations as well as one multi-exon deletion. Platform: Research Genetics (Invitrogen) - GF211 Microarray Filte|[43] ) Phenotype: chronic obstructive lung disease, bronchiectasia, and exocrine pancreatic insufficiency Expression data: Wright et al. (2006) [13] (GEO accession number GDS2143). Analysis of the nasal respiratory epithelium of cystic fibrosis (CF) patients with mild (4) or severe (5) lung disease. Platform: Affymetrix GeneChip Human Genome U133 Array Set HG-U133B BMD |ophin (DMD, OMIM *300377) Phenotype: muscle wasting and weakness, and in some cases with mental impairment. Expression data: Bakay et al. (2006) [14] (GEO accession number GDS2855). Analysis of muscle biopsy specimens from patients with various muscle diseases. Platform: Affymetrix GeneChip Human Genome U133 Array Set HG-U133B Stein-Levental syndrome Me|in-Levental syndrome [26] . Phenotype: obesity, hyperandrogenism and chronic anovulation Expression data: Cortón et al. (2007) [22] (GEO accession number: GDS2084). Omental fat biopsy from patients. Unconfirmed disorder etiology. Platform: Affymetrix GeneChip Human Genome U133 Array Set HG-U133A Supporting Information Supplementary Materials S1 (0.07 MB DO|the example of data set 1 (FXS [11] ). The neighborhood size is controlled by a weighting function (w = exp(−β⋅r)). Applying the Fisher omnibus meta-analysis (S = ∑−2 ln (p-value)) for each parameter β, new p-values are generated from a χ² distribution. 
The parameter β | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
50 | GDS2960 | 9/14/2007 | ['8759'] | ['2960'] | [u'17850668'] | 2781753 | [u'19736252'] | ['Ruzzo', 'Schwartz', 'Milewicz', 'Jaeger', 'Morale', 'Yao', u'Morales', 'Francke', 'Mulvihill', 'Emond'] | ['Tsuda', 'Kayano', 'Mamitsuka', 'Takigawa', 'Shiga'] | [] | Bioinformatics | 2009 | 11/1/2009 | 1 | ng the lowest P -value for each gene pair of the 10 interactions in Table 4 . For example, for COX6C and UBA1, the gene pair of the first interaction of Table 4 , we found a switching mechanism in GDS2960{{tag}}--REUSE--_1 with the P -value of −3.9532, showing the statistical significance of this mechanism. This directly indicates that there must exist a switching mechanism in expression between these two |ank Gene pair #datasets GDS P - value #ex. #ex. Annotation from GEO class 1 class 2 1 {COX6C,UBA1} 117 GDS2960{{tag}}--REUSE--_1 −3.9532 60 41 Marfan syndrome: cultured skin fibroblasts 2 {RERE,TNFRSF1A} 284 GDS2736_25 −5.9049 19 15 Malignant fibrous histiocytoma and various soft tissue sarcomas 3 {ATP5D,ITCH} 324 GDS1875_3 −5.1235 27 24 Host cell response to HIV-1 Vpr-induced cell cycle arrest|GDS2733_1 −7.9996 17 17 Cytosine arabinoside effect on Ewing's sarcoma cell line 5 {NCSTN,HSPA5} 102 GDS2545_5 −6.4398 63 25 Metastatic prostate cancer (HG-U95A) 6 {NDUFA8,NDUFA6} 142 GDS2733_4 −4.7027 17 16 Cytosine arabinoside effect on Ewing's sarcoma cell line 7 {ALS2,SLC25A6} 108 GDS1627_2 −3.2808 16 15 Breast cancer cell lines response to chemotherapeutic drugs 8 {|P5J} 418 GDS2960{{tag}}--REUSE--_1 −3.1628 60 41 Marfan syndrome: cultured skin fibroblasts 9 {COX6C,UBA1} 117 GDS2960{{tag}}--REUSE--_1 −3.9532 60 41 Marfan syndrome: cultured skin fibroblasts 10 {NDUFA10,COX4} 232 GDS2643_9 −6.2133 13 12 Waldenstrom's macroglobulinemia: B lymphocytes and plasma cells For each gene pair of 10 interactions in Table 4 , the number of datasets obtained from GEO, the GDS which g|ng genes. 
3 In each GDS, if it has more than two classes or replicated experiments, we consider all possible pairwise combinations of them. We then name generated multiple datasets from one GDS (e.g. GDS2960{{tag}}--REUSE--) those like GDS2960{{tag}}--REUSE--_1, GDS2960{{tag}}--REUSE--_2, etc. This results in that the number of datasets we used could be >36. The actual number of datasets for each gene pair is shown in Table 5 . 4 For each g | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
51 | GDS3069 | 8/7/2007 | ['8692'] | ['3069'] | [u'17726534'] | 2784758 | [u'19925686'] | ['Puskar', 'Petzold', 'Santiago', 'Lao', 'Papagiannakopoulos', 'Lee', 'Kornblum', 'Kosik', 'Liu', 'Doyle', 'Nelson', 'Clay', 'Qi', 'Shraiman'] | ['Wan', 'Kiseleva', 'Horton', 'Mamitsuka', 'Harada'] | [] | Source Code Biol Med | 2009 | 11/20/2009 | 1 | ts from GEO. The table shows the dimensions of the data set in the second and third columns. In these experiments, we use the entire data set, irrespective of any additional information. For example, GDS596 contains 158 experiments of two sets of replicates. Thus, it would be more appropriate to apply the system to only 79 of the experiments. Nevertheless, the purpose of these results is to demonstrate|uch smaller. Table 4 Dimensions of GEO test data and running time and memory usage of build-mst . Data set size Execution of build-mst Data set Experiments Probes Elapsed time (s) Memory usage (MB) GDS596 158 22283 23.28 201.391 GDS1962 180 54681 66.65 383.980 GDS2765 13 45101 1.03 28.305 GDS2771 192 22283 33.03 276.352 GDS3069{{tag}}--REUSE-- 12 22283 0.52 25.000 GDS3216 12 22810 0.56 25.758 The dimensions of the G|unning times are reported as seconds and averaged across 5 trials. The running time and memory usage of build-mst are shown as the last two columns of Table 4 . The longest time is associated with GDS1962, which takes just over 1 minute. As expected, the running time is more dependent on the number of experiments than the number of probes, as shown by comparing the results of GDS596 with GDS2765. As| the number of virtual processors range from 1 to 8. The files generated were Portable Network Graphics files (PNG) at a resolution of 300 dots per inch (DPI). (Higher resolutions require more time.) 
GDS2765: Effect of Creatine on Mice As an example, we apply both HAMSTER and hierarchical clustering to the data set GDS2765, where researchers investigated the effect creatine has on the expression level |iment type (untreated or treated), except in the sense that we chose the color associated with each experiment based on its type (untreated in red; treated in blue). In Figure 9 , the dendrogram for GDS2765 shows at least two groups - the five controls on the far left, followed by five treated samples. Four of the treated samples appear as one such group in the center. As with the example dendrogram o|ent. This is because an MST can have a node connected to any number of other nodes. In a dendrogram, though, the structure is restricted to recursive, pair-wise relationships. Figure 9 Dendrogram for GDS2765 . The dendrogram was constructed using Euclidean distance and single linkage. The control samples are in red; the treated ones are in blue. Figure 10 MST 0 for GDS2765 . The first MST generated usi|experiments remain by themselves and all other experiments have merged into node 8. The node has the default color and shape attributes since it has a mix of both attribute types. Figure 11 MST 6 for GDS2765 . The node colors correspond to those of Figure 10. So far, control and treated samples have not yet mixed within any node. The normalized association score is 97.78. Figure 12 MST 9 for GDS2765 . |ically, while the modified one now peaks in the middle. The remaining two scoring schemes (Gaps and ANOVA) grow steadily and peak after the half-way point. Figure 13 Comparison of scoring schemes for GDS2765 . A comparison of the scoring schemes available to HAMSTER corresponding to the MSTs of Figures 10, 11, and 12. All schemes represent scores as a percentage of the maximum to facilitate easy compar|w well the experiments of a data set cluster, regardless of the clustering method. 
We suggest that users try the scheme which is most suitable for their needs, based on the definitions given earlier. GDS3069{{tag}}: Various brain tumors Our next sample data set is GDS3069{{tag}}--REUSE-- which was used to analyze 12 primary brain tumors based on their histological diagnoses [ 30 ]. Unlike the previous data set of two distinc|wo samples (GSM215423 and GSM215425) are grouped together in the same sub-tree, while the third (GSM215422) is combined with the four samples to its right at an earlier step. Figure 14 Dendrogram for GDS3069{{tag}} . This data set reports on the expression levels of 12 primary brain tumors, divided into 5 categories. The colors assigned to them are: high grade glioblastoma (red), high grade gliosarcoma (yello| set, occurs between the two high grade gliosarcoma (yellow) samples. This example demonstrates the use of MSTs for data sets where the sample categories can be placed on a scale. Figure 15 MST 0 for GDS3069{{tag}} . The nodes of the first MST have been colored in the same way as Figure 14. Euclidean distance with average linkage has been selected. GDS3216: Whole seedling roots response to salinity stress: ti|each other. Then, the experiments would be ordered based on time. For an MST, others have shown that a chain of experiments would be formed, ordered by the time-index [ 22 ]. Figure 16 Dendrogram for GDS3216 . The study associated with this data set measured the response A. thaliana seedling roots have to salinity in a time-course experiment. The nodes are colored according to the colors of the spect|een chosen. The top dendrogram is the default one produced by agnes. By swapping the branches at the internal nodes indicated in red in this dendrogram, the one below is produced. Figure 17 MST 0 for GDS3216 . The nodes of the first MST have been colored in the same way as Figure 16. Manhattan distance with average linkage has been selected. 
The images show that the replicates are generally adjacent to|er V Hrabé de Angelis M Wurst W Schmidth J Klopstock T Creatine improves health and survival of mice Neurobiology of Aging 2008 29 9 1404 1411 http://www.ncbi.nlm.nih.gov/sites/GDSbrowser?acc=GDS2765 10.1016/j.neurobiolaging.2007.03.001 17416441 Liu T Papagiannakopoulos T Puskar K Qi S Santiago F Clay W Lao K Lee Y Nelson SF Kornblum HI Doyle F Petzold L Shraiman B Kosik KS Detection of a micro | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
52 | GDS3203 | 6/27/2007 | ['8286'] | ['3203'] | [u'18276018'] | 3002353 | [u'21122122'] | ['Liu', 'Eksarko', 'Pope', 'Huang', 'Shi'] | ['Sendelbach', 'Lorkowski', 'Maess'] | [] | BMC Mol Biol | 2010 | 12/1/2010 | 0 | Three studies using a monocyte-to-macrophage maturation model involving either THP-1 or primary human cells were available (GEO entries GDS3554, GDS3203{{key}}--REUSE-- and GDS2430) [32-34]. Unfortunately, these microarray raw data online provide average signal intensities for more than 20.000 probes but lack information on the specificity of the signal and whether the respective probe of the corresponding Affymetrix microarray was called present or absent. Thus, a comparative assessment of our selection of reference genes using these microarray data has severe limitations | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
53 | GSE10005 | 12/29/2007 | ['10005'] | [] | [u'18284684'] | 2263044 | [u'18284684'] | ['Yang', 'Padgett', 'Lee', 'Edery'] | ['Yang', 'Padgett', 'Lee', 'Edery'] | ['Yang', 'Padgett', 'Lee', 'Edery'] | BMC Genomics | 2008 | 2/19/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
54 | GSE10009 | 12/21/2007 | ['10009'] | [] | [u'18160665'] | 2275028 | [u'18160665'] | ['', 'Lam', 'Rosenwald', 'Farinha', 'Wright', 'Gascoyne', 'Chan', 'Staudt', 'Dang', 'Davis', 'Lenz'] | ['Lam', 'Rosenwald', 'Farinha', 'Wright', 'Gascoyne', 'Chan', 'Staudt', 'Dang', 'Davis', 'Lenz'] | ['Lam', 'Rosenwald', 'Farinha', 'Staudt', 'Gascoyne', 'Chan', 'Wright', 'Davis', 'Lenz', 'Dang'] | Blood | 2008 | 4/1/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
55 | GSE1154 | 5/29/2007 | ['1154'] | [] | [u'16432259', u'16981998'] | 1590023 | [u'16981998'] | [u'Schloetterer', 'Harr', 'Schl\xc3\xb6tterer'] | ['Harr', 'Schl\xc3\xb6tterer'] | ['Harr', 'Schl\xc3\xb6tterer'] | BMC Microbiol | 2006 | 9/18/2006 | 0 | AND pmc_gds | 1 | 0 | ||||
56 | GSE1986 | 2/23/2007 | ['1986'] | ['3052'] | [] | 2238883 | [u'17932064'] | [u'Murphy', u'Lindsley'] | ['Hayashi', 'Shibaoka', 'Kinoshita', 'Obayashi', 'Saeki', 'Ohta'] | [] | Nucleic Acids Res | 2008 | 2008 Jan | 1 | rks, where genes are selected from highly coexpressed pairs up to a defined number (30 genes for GO network and 1000 genes for tissue-specific network). Tissue-specific gene expression data Data from GSE3526 for human ( 21 ) and GSE1986{{tag}}--REUSE-- for mouse in NCBI GEO were used for the tissue-specific gene expression graph on the gene page . After RMA normalization, the probe intensities were averaged for each | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
57 | GSE1986 | 2/23/2007 | ['1986'] | ['3052'] | [] | 2847524 | [u'20164257'] | [u'Murphy', u'Lindsley'] | ['Wang', 'Zhang', 'Shen', 'Long', 'Marchiando', 'Raleigh', 'Turner', 'Sasaki'] | [] | Mol Biol Cell | 2010 | 4/1/2010 | 0 | 2005 ). Using the updated annotation file for the Affymetrix Mouse Genome 430 2.0 Array, Mouse4302_Mm_ENSG ( Dai et al. , 2005 ), we reanalyzed the tissue expression profile database GEO series GSE1986{{tag}}--REUSE-- ( Barrett et al. , 2007 ) using GCRMA and Bioconductor ( Gentleman et al. , 2004 ; Wu and Irizarry, 2004 ) and extracted the expression information for marvelD3, tricellulin, and occludin | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
58 | GSE1986 | 2/23/2007 | ['1986'] | ['3052'] | [] | 2711087 | [u'19538736'] | [u'Murphy', u'Lindsley'] | ['Nedorezov', 'Karmazin', 'Forabosco', 'Ottolenghi', 'Uda', 'Cao', 'Piao', 'Schlessinger', 'Garcia-Ortiz', 'Cole', 'Pelosi', 'Omari'] | [] | BMC Dev Biol | 2009 | 6/18/2009 | 0 | AND pmc_gds | 0 | 1 | ||||
59 | GSE1993 | 5/26/2007 | ['1993'] | [] | [u'18445660'] | 2743637 | [u'19732454'] | ['Freeman', 'Plant', 'Wayland', 'Oulas', 'Liu', 'Backlund', 'Collins', 'Petalidis', 'Poirazi', 'Happerfield'] | ['Park', 'Dreyfuss', 'Johnson'] | [] | Mol Cancer | 2009 | 9/4/2009 | 0 | e or more GBMs and AAs in patient samples, regardless of platform. Table 1 Summary of the data sets Petalidis Phillips Sun Tso TOTAL AA 19 21 19 9 68 GBM 39 56 81 45 221 TOTAL 58 77 100 54 289 GEO ID GSE1993{{tag}}--REUSE-- GSE4271 GSE4290 (at UCLA) Affy chip U133A U133A and U133B U133 plus 2.0 U133A Journal Mol Cancer Ther Cancer Cell Cancer Cell Cancer Research Year 2008 2006 2006 2006 Statistical approach to combin | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
60 | GSE1993 | 5/26/2007 | ['1993'] | [] | [u'18445660'] | 2819720 | [u'18445660'] | ['Freeman', 'Plant', 'Wayland', 'Oulas', 'Liu', 'Backlund', 'Collins', 'Petalidis', 'Poirazi', 'Happerfield'] | ['Freeman', 'Plant', 'Wayland', 'Oulas', 'Liu', 'Backlund', 'Collins', 'Petalidis', 'Poirazi', 'Happerfield'] | ['Freeman', 'Plant', 'Wayland', 'Oulas', 'Liu', 'Backlund', 'Collins', 'Petalidis', 'Poirazi', 'Happerfield'] | Mol Cancer Ther | 2008 | 2008 May | 0 | AND pmc_gds | 1 | 0 | ||||
61 | GSE2165 | 7/20/2007 | ['2165'] | [] | [] | 2493024 | [u'18682807'] | [u'Kaminski', u'Zhou', u'Mustari', u'Hatala', u'Gong', u'Cheng'] | ['Kaminski', 'Zhou', 'Mustari', 'Hatala', 'Gong', 'Howell', 'Cheng'] | [u'Gong', u'Zhou', u'Mustari', u'Hatala', u'Kaminski', u'Cheng'] | Mol Vis | 2008 | 8/4/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
62 | GSE2627 | 10/31/2007 | ['2627'] | [] | [u'18068629'] | 2148463 | [u'18068629'] | ['Earl', 'Ahmed', 'Blenkiron', 'Bell', 'Massie', 'Swanton', 'Ibrahim', 'Mills', 'Caldas', 'McGeoch', 'Brenton', 'Vias', 'Downward', 'Nicke', 'Crawford', 'Temple', 'Laskey', 'Iyer'] | ['Earl', 'Ahmed', 'Blenkiron', 'Bell', 'Massie', 'Swanton', 'Ibrahim', 'Mills', 'Caldas', 'McGeoch', 'Brenton', 'Vias', 'Downward', 'Nicke', 'Crawford', 'Temple', 'Laskey', 'Iyer'] | ['Earl', 'Ahmed', 'Swanton', 'Bell', 'Massie', 'Ibrahim', 'Mills', 'Caldas', 'Blenkiron', 'McGeoch', 'Brenton', 'Vias', 'Downward', 'Nicke', 'Crawford', 'Temple', 'Laskey', 'Iyer'] | Cancer Cell | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
63 | GSE2888 | 4/30/2007 | ['2888'] | [] | [] | 2890980 | [u'20473674'] | [u'Schultze', u'Debey'] | ['Schork', 'Hovatta', 'Lillie', 'Zapala', 'Winn', 'Risbrough'] | [] | Mamm Genome | 2010 | 2010 Jun | 0 | AND pmc_gds | 0 | 1 | ||||
64 | GSE3165 | 5/1/2007 | ['3165'] | [] | [u'17493263', u'18782450'] | 2686096 | [u'19503830'] | ['Van', 'Herschkowitz', 'Yin', 'Quackenbush', 'Rasmussen', 'Simin', 'Weigman', 'Palazzo', 'Glazer', 'Brown', 'Bastein', 'Hu', 'Fan', 'Khramtsov', 'Mikaelian', 'He', 'Jones', 'Usary', 'Green', 'Furth', 'Backlund', 'Assefnia', 'Olopade', 'Kopelovich', 'Chandrasekharan', 'Churchill', 'Bernard', 'Perou'] | ['Lindvall', 'Zylstra', 'Evans', 'West', 'Dykema', 'Furge', 'Williams'] | [] | PLoS One | 2009 | 6/5/2009 | 0 | s repeated twice and included mammary epithelial cells collected from 4 Lrp6 +/− and 5 Lrp6 +/+ females. Expression profiling Processed expression chip used in the GSE3744 dataset contained multiple probes that mapped to LRP6. Therefore in this dataset, the average LRP6 expression value was computed for each sample and used in subsequent analysis. In the GSE3165{{tag}}--REUSE-- da | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
65 | GSE3165 | 5/1/2007 | ['3165'] | [] | [u'17493263', u'18782450'] | 2879567 | [u'20346151'] | ['Van', 'Herschkowitz', 'Yin', 'Quackenbush', 'Rasmussen', 'Simin', 'Weigman', 'Palazzo', 'Glazer', 'Brown', 'Bastein', 'Hu', 'Fan', 'Khramtsov', 'Mikaelian', 'He', 'Jones', 'Usary', 'Green', 'Furth', 'Backlund', 'Assefnia', 'Olopade', 'Kopelovich', 'Chandrasekharan', 'Churchill', 'Bernard', 'Perou'] | ['Smyth', 'Lim', 'Visvader', 'Lindeman', 'Bouras', 'Yagita', 'Wu', 'Vaillant', 'Asselin-Labat', 'Pal'] | [] | Breast Cancer Res | 2010 | 2010 | 0 | AND pmc_gds | 0 | 1 | ||||
66 | GSE3165 | 5/1/2007 | ['3165'] | [] | [u'17493263', u'18782450'] | 1929138 | [u'17493263'] | ['Van', 'Herschkowitz', 'Yin', 'Quackenbush', 'Rasmussen', 'Simin', 'Weigman', 'Palazzo', 'Glazer', 'Brown', 'Bastein', 'Hu', 'Fan', 'Khramtsov', 'Mikaelian', 'He', 'Jones', 'Usary', 'Green', 'Furth', 'Backlund', 'Assefnia', 'Olopade', 'Kopelovich', 'Chandrasekharan', 'Churchill', 'Bernard', 'Perou'] | ['Van', 'Herschkowitz', 'Yin', 'Quackenbush', 'Rasmussen', 'Simin', 'Weigman', 'Palazzo', 'Glazer', 'Brown', 'Bastein', 'Hu', 'Khramtsov', 'Mikaelian', 'Jones', 'Usary', 'Green', 'Furth', 'Backlund', 'Assefnia', 'Olopade', 'Kopelovich', 'Chandrasekharan', 'Churchill', 'Bernard', 'Perou'] | ['Van', 'Herschkowitz', 'Yin', 'Quackenbush', 'Rasmussen', 'Simin', 'Weigman', 'Palazzo', 'Glazer', 'Brown', 'Bastein', 'Hu', 'Khramtsov', 'Mikaelian', 'Jones', 'Usary', 'Green', 'Furth', 'Backlund', 'Assefnia', 'Olopade', 'Kopelovich', 'Chandrasekharan', 'Churchill', 'Bernard', 'Perou'] | Genome Biol | 2007 | 2007 | 1 | AND pmc_gds | 1 | 0 | ||||
67 | GSE3165 | 5/1/2007 | ['3165'] | [] | [u'17493263', u'18782450'] | 2614508 | [u'18782450'] | ['Van', 'Herschkowitz', 'Yin', 'Quackenbush', 'Rasmussen', 'Simin', 'Weigman', 'Palazzo', 'Glazer', 'Brown', 'Bastein', 'Hu', 'Fan', 'Khramtsov', 'Mikaelian', 'He', 'Jones', 'Usary', 'Green', 'Furth', 'Backlund', 'Assefnia', 'Olopade', 'Kopelovich', 'Chandrasekharan', 'Churchill', 'Bernard', 'Perou'] | ['Perou', 'Herschkowitz', 'Fan', 'He'] | ['Perou', 'Herschkowitz', 'Fan', 'He'] | Breast Cancer Res | 2008 | 2008 | 0 | AND pmc_gds | 1 | 0 | ||||
68 | GSE3178 | 2/20/2007 | ['3178'] | [] | [u'17150101'] | 1764759 | [u'17150101'] | ['', 'Oh', 'Perou', 'Herschkowitz', 'Troester', 'Hoadley', 'Barbier', 'He'] | ['Oh', 'Perou', 'Herschkowitz', 'Troester', 'Hoadley', 'Barbier', 'He'] | ['Oh', 'Perou', 'Herschkowitz', 'Troester', 'Hoadley', 'Barbier', 'He'] | BMC Cancer | 2006 | 12/6/2006 | 0 | AND pmc_gds | 1 | 0 | ||||
69 | GSE3579 | 10/1/2007 | ['3579'] | [] | [u'17919333'] | 2174954 | [u'17919333'] | ['', 'Koga', 'Taboada', 'van', 'Godschalk', 'Acedillo', 'Gilbert', 'Endtz', 'Nash', 'Yuki'] | ['Koga', 'Taboada', 'van', 'Godschalk', 'Acedillo', 'Gilbert', 'Endtz', 'Nash', 'Yuki'] | ['Koga', 'Taboada', 'van', 'Godschalk', 'Acedillo', 'Gilbert', 'Endtz', 'Nash', 'Yuki'] | BMC Genomics | 2007 | 10/5/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
70 | GSE3583 | 6/20/2007 | ['3583'] | ['2911'] | [u'17708681'] | 1950164 | [u'17708681'] | ['Lee', 'Cashorali', 'Gusella', 'MacDonald', 'Kohane', 'Ivanova', 'Seong'] | ['Lee', 'Cashorali', 'Gusella', 'MacDonald', 'Kohane', 'Ivanova', 'Seong'] | ['Lee', 'Cashorali', 'Gusella', 'MacDonald', 'Kohane', 'Ivanova', 'Seong'] | PLoS Genet | 2007 | 2007 Aug | 0 | AND pmc_gds | 1 | 0 | ||||
71 | GSE3629 | 2/28/2007 | ['3629'] | [] | [u'17255260'] | 2797819 | [u'19961616'] | ['Kobunai', 'Sugimoto', 'Watanabe', 'Kazama', 'Yokoyama', 'Ajioka', 'Okayama', 'Muto', 'Nagawa', 'Oka', 'Hata', 'Toda', 'Tanaka', 'Konishi', 'Yamamoto', 'Kanazawa', 'Sasaki', 'Kojima'] | ['Geman', 'Price', 'Edelman', 'Toia', 'Zhang'] | [] | BMC Genomics | 2009 | 12/5/2009 | 0 | s Classification Task Tissue Source Samples (Positive/Negative) GEO ID # Probes GI Stromal Tumor vs Leiomyosarcoma GI Biopsy 68 (37/31) N/A 43,931 Crohn's Disease vs Healthy Controls PBMC 101 (59/42) GDS1615 22,283 Ischemic vs Idiopathic Cardiomyopathy Cardiac Biopsy 194 (86/108) GSE5406 22,283 Type I Diabetes vs Healthy Controls PBMC 105 (81/24) GSE9006 22,283 Type II Diabetes vs Healthy Controls PBMC| Ulcerative Colitis W/WO Transformation Colon Biopsy 54 (11/43) GSE3629{{tag}}--REUSE-- 54,681 Gram-Negative vs Gram-Positive Infection PBMC 73 (29/44) GSE6269 22,283 Gram-Negative vs Viral Infection PBMC 62 (18/44) GSE6269 22,283 HIV Infection vs Healthy Controls PBMC 86 (74/12) GDS1449 8793 Microarray gene expression datasets obtained from the Gene Expression Omnibus. Transcriptional analysis was performed on either | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
72 | GSE3657 | 1/5/2007 | ['3657'] | [] | [u'17220229'] | 1899381 | [u'17220229'] | ['Vazquez-Torres', 'McClelland', 'Evans', 'Troxell', 'Hassan', 'Fink', 'Jones-Carson', 'Libby', 'Porwollik'] | ['Vazquez-Torres', 'McClelland', 'Evans', 'Troxell', 'Hassan', 'Fink', 'Jones-Carson', 'Libby', 'Porwollik'] | ['Vazquez-Torres', 'McClelland', 'Evans', 'Troxell', 'Hassan', 'Fink', 'Jones-Carson', 'Libby', 'Porwollik'] | J Bacteriol | 2007 | 2007 Mar | 0 | AND pmc_gds | 1 | 0 | ||||
73 | GSE3748 | 2/8/2007 | ['3748'] | [] | [u'17360649', u'17462724'] | 2853688 | [u'20223837'] | ['Siepka', 'Yoo', u'Straume', 'Lee', 'Zhang', 'Miller', 'Esser', 'Takahashi', 'Antoch', 'Kumar', 'Hayes', 'Park', 'Andrews', 'Song', 'Walker', 'Hu', 'McDearmon', 'Panda', 'Hogenesch'] | ['Shiraishi', 'Kimura', 'Okada'] | [] | Bioinformatics | 2010 | 4/15/2010 | 0 | riction. 4.1 Data We adopted the time-course gene expression profile of Mus musculus circadian liver cells as experimental data, which is available at Gene Expression Omnibus (GEO, Accession number GSE3748{{tag}}--REUSE--). Samples were collected every 4 h for 48 h, for a total of 12 time points. We focused on 853 circadian genes, the list of which is available from Table 1 of the supporting information of Miller | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
74 | GSE3751 | 2/7/2007 | ['3751'] | [] | [u'17360649', u'17462724'] | 1802006 | [u'17360649'] | ['Siepka', 'Yoo', u'Straume', 'Lee', 'Zhang', 'Miller', 'Esser', 'Takahashi', 'Antoch', 'Kumar', 'Hayes', 'Park', 'Andrews', 'Song', 'Walker', 'Hu', 'McDearmon', 'Panda', 'Hogenesch'] | ['Zhang', 'Miller', 'Esser', 'Takahashi', 'Antoch', 'Hayes', 'Andrews', 'Walker', 'McDearmon', 'Panda', 'Hogenesch'] | ['Zhang', 'Miller', 'Esser', 'Takahashi', 'Antoch', 'Hayes', 'Andrews', 'Walker', 'McDearmon', 'Panda', 'Hogenesch'] | Proc Natl Acad Sci U S A | 2007 | 2/27/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
75 | GSE3845 | 12/1/2007 | ['3845'] | [] | [u'18055696'] | 2113026 | [u'18055696'] | ['Segal', 'Zhang', 'Sinha', 'Kawahara', 'Chang', 'Adler'] | ['Segal', 'Zhang', 'Sinha', 'Kawahara', 'Chang', 'Adler'] | ['Segal', 'Zhang', 'Sinha', 'Kawahara', 'Chang', 'Adler'] | Genes Dev | 2007 | 12/15/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
76 | GSE3950 | 7/12/2007 | ['3950'] | [] | [u'17621275'] | 2360382 | [u'17712314'] | ['Franc', 'Thomas', 'Delys', 'Maenhaut', 'Tronko', 'Bogdanova', 'Detours', 'Libert', 'Dumont'] | ['Franc', 'Delys', 'Thomas', 'Weiss', 'Maenhaut', 'Bogdanova', 'Detours', 'Libert', 'Dumont'] | ['Franc', 'Delys', 'Thomas', 'Maenhaut', 'Bogdanova', 'Detours', 'Libert', 'Dumont'] | Br J Cancer | 2007 | 9/17/2007 | 0 | tail of BRAF-RET/PTC status determination, RNA processing and microarray data preprocessing is available in Supplementary information file S3 . Microarray data are available from the Gene Expression Omnibus ( www.ncbi.nlm.nih.gov/geo ), accession number GSE3950{{tag}} . Comparison of microarray platforms Jarzab et al (2005) data were downloaded from www.genomika.pl/thyroidcancer/PTCCancerRes.html . We |ssigned randomly to samples and counting how many runs produced classification error below the error obtained on the actual data. Derivation of the γ -radiation vs H 2 O 2 signature We downloaded the Supplementary data set S2 of Amundson et al (2005) from the Oncogene web site ( www.nature.com/onc/index.html ). Genes with expression values differing by 1.5-fold between the 2.5 Gy |A-damaging agents: H 2 O 2 , radiation (neutron and γ- rays at 2.5 and 8 Gy), adriamycin, arsenite, campothecin, CdCl 2 , cisplatin, methyl methanesulphonate and UVB (280−320 nm). We downloaded the expression data published with the paper and produced the hierarchical clustering shown in Figure 4 (see online Materials and Methods). The responses to 200 μ M H 2 O 2 and to 2.5 G | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
77 | GSE3990 | 1/6/2007 | ['3990'] | ['2769'] | [u'17028315'] | 2785812 | [u'19917117'] | ['Deng', 'Meller'] | ['Kroll', 'Barkema', 'Carlon'] | [] | Algorithms Mol Biol | 2009 | 11/16/2009 | 1 | d to the local physical neighbors and to the nearest-neighbor free energy. Table 1 Overview over organisms and number of CEL-files analyzed Organism GEO # Chiptype (dimension) # files min A. Thaliana GSE4847 ATH1-121501 (712 × 712) 18 0.0259 GSE7642 ATH1-121501 (712 × 712) 12 0.0544 GSE9311 ATH1-121501 (712 × 712) 8 0.0546 C. Elegans GSE6547 Celegans (712 × 712) 25 0.036|× 712) 7 0.0396 D. Melanogaster GSE3990{{tag}}--REUSE-- Drosophila_2 (732 × 732) 6 0.0620 GSE6558 DrosGenome1 (640 × 640) 24 0.0605 D. Rerio GSE4859 Zebrafish (712 × 712) 8 0.0357 E. Coli GSE11779 E_coli_2 (478 × 478) 3 0.0869 GSE2928 Ecoli (544 × 544) 12 0.0172 GSE6195 E_coli_2 (478 × 478) 4 0.0664 H. Sapiens GSE10433 HG-U133A_2 (732 × 732) 12 0.0757 GSE5054|-U133A (712 × 712) 14 0.0296 GSE8514 HG-U133_Plus_2 (1164 × 1164) 15 0.0738 GSE11897 MOE430A (712 × 712) 11 0.0640 MOE430B (712 × 712) Mouse430_2 (1002 × 1002) GSE6210 Mouse430_2 12 0.0594 GSE6297 Mouse430_2 24 0.0325 O. Sativa GSE15071 Rice (1164 × 1164) 20 0.1157 R. Norvegicus GSE4494 RG_U34A (534 × 534) 59 0.0488 GSE7493 Rat230_2 (834 ×|2 (834 × 834) 4 0.0640 S. Aureus GSE7944 S_aureus (602 × 602) 6 0.0746 S. Cerevisiae GSE6073 YG_S98 (534 × 534) 12 0.0283 GSE8379 YG_S98 (534 × 534) 8 0.0180 X. Laevis GSE3368 Xenopus_laevis (712 × 712) 20 0.0514 Table 2 Optimized parameter values as obtained from the minimization of Eq. (3). A. Thaliana C. Elegans D. Melanogaster D. Rerio GEO no GSE4847 GSE7642|d in this work. Figure 5 Computed background vs. low expressed genes data . Experimental results (solid black line) and theoretical prediction (dashed red line) of probeset a) 256610_at (A. Thaliana, GSE4847, GSM109107.CEL) and b) 175270_at (C. Elegans, GSE8159, GSM201995.CEL).
Discussion We have presented a background subtraction scheme for Affymetrix GeneExpression arrays which is both, accurate and | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
78 | GSE3990 | 1/6/2007 | ['3990'] | ['2769'] | [u'17028315'] | 1698640 | [u'17028315'] | ['Deng', 'Meller'] | ['Deng', 'Meller'] | ['Deng', 'Meller'] | Genetics | 2006 | 2006 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
79 | GSE3990 | 1/6/2007 | ['3990'] | ['2769'] | [u'17028315'] | 2691757 | [u'19307603'] | ['Deng', 'Meller'] | ['Koya', 'Deng', 'Meller', 'Kong'] | ['Deng', 'Meller'] | Genetics | 2009 | 2009 Jun | 0 | AND pmc_gds | 1 | 0 | ||||
80 | GSE4020 | 1/11/2007 | ['4020'] | [] | [u'17335568'] | 1829156 | [u'17335568'] | ['Pera', 'Hawes', 'Nikolic-Paterson', 'Grimmond', 'Stamp', 'Laslett', 'Haylock', u'David', 'Gardiner', 'Lin', 'Wormald'] | ['Pera', 'Hawes', 'Nikolic-Paterson', 'Grimmond', 'Stamp', 'Laslett', 'Haylock', 'Gardiner', 'Lin', 'Wormald'] | ['Pera', 'Hawes', 'Nikolic-Paterson', 'Grimmond', 'Stamp', 'Laslett', 'Haylock', 'Gardiner', 'Lin', 'Wormald'] | BMC Dev Biol | 2007 | 3/2/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
81 | GSE4044 | 1/15/2007 | ['4044'] | [] | [u'16507160'] | 1431733 | [u'16507160'] | ['Schroder', 'Kasukawa', 'Taylor', 'Kawai', 'Carninci', 'Grimmond', u'Crowe', 'Lo', 'Kai', 'Himes', 'Katayama', 'Wells', 'Chalk', u'chalk', 'Waddell', 'Faulkner', 'Hume', 'Forrest', 'Kawaji', 'Hayashizaki'] | ['Schroder', 'Kasukawa', 'Taylor', 'Kawai', 'Carninci', 'Grimmond', 'Lo', 'Kai', 'Himes', 'Katayama', 'Wells', 'Chalk', 'Waddell', 'Faulkner', 'Hume', 'Forrest', 'Kawaji', 'Hayashizaki'] | ['Schroder', 'Kawai', 'Taylor', 'Carninci', 'Grimmond', 'Lo', 'Kai', 'Himes', 'Katayama', 'Wells', 'Kawaji', 'Chalk', 'Waddell', 'Faulkner', 'Hume', 'Forrest', 'Kasukawa', 'Hayashizaki'] | Genome Biol | 2006 | 2006 | 0 | AND pmc_gds | 1 | 0 | ||||
82 | GSE4064 | 1/19/2007 | ['4064'] | [] | [u'17407579'] | 1853087 | [u'17407579'] | ['Nieto-Díaz', 'Nieto-Sampedro', u'Nieto-Diaz', 'Pita-Thomas'] | ['Nieto-Díaz', 'Nieto-Sampedro', 'Pita-Thomas'] | ['Nieto-Díaz', 'Nieto-Sampedro', 'Pita-Thomas'] | BMC Genomics | 2007 | 4/3/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
83 | GSE4091 | 12/1/2007 | ['4091'] | [] | [u'17347439'] | 2790488 | [u'19834459'] | ['Mito', 'Henikoff'] | ['van', 'Hogan', 'Braunschweig', 'Pagie'] | [] | EMBO J | 2009 | 12/2/2009 | 0 | AND pmc_gds | 0 | 1 | ||||
84 | GSE4107 | 3/20/2007 | ['4107'] | ['2609'] | [u'17317818'] | 2869450 | [u'19383890'] | ['Eu', '', 'Hong', 'Cheah', 'Ho'] | ['Guess', 'Quaranta', 'Lafleur', 'Weidow'] | [] | Cancer Epidemiol Biomarkers Prev | 2009 | 2009 May | 0 | an be used to distinguish between early- and advanced-stage cancer specimens and shed light on mechanistic questions raised by previous studies. Statistical analyses of human microarray data from the publicly available expression project in Oncology (expO) dataset, including examination of the distributions of Ln-332 subunit mRNA levels, identified a significant decrease in the Ln-332 β3:γ|standardized Affymetrix microarray platform commonly used for transcriptional analysis ( 20 , 21 ). The expression project in Oncology (expO) dataset (National Center for Biotechnology Information, GSE2109 ), compiled by the International Genomics Consortium, was chosen for data-mining due to the high number ( n = 870) of tumors from different tissue types and various pathologic stages available at |y experiments were done per manufacturer’s protocol by the International Genomics Consortium for their Expression Project for Oncology, using Affymetrix HG-U33 Plus 2.0 chips. This dataset is publicly available via the National Center for Biotechnology Information Gene Expression Omnibus website ( http://www.ncbi.nlm.nih.gov/geo/ ), under accession number GSE2109 . Expression data and associated |samples with associated Pstage data listed were considered ( n = 710). Further, because “normal” (i.e., noncancerous) controls were not a part of the expO study, a separate dataset ( GSE4107{{tag}}--REUSE-- ; n = 10), obtained using the same methodology, was used as a baseline for our studies. 
Data downloaded by the Vanderbilt Microarray Shared Resource were provided as a single text file and subseq|g ( A ) log-transformed LAMC2 versus LAMC2* mRNA expression levels of non-carcinoma ( n = 56), low-stage ( n = 163), and metastatic ( n = 151) cancer samples from expO microarray dataset (NCBI GEO GSE 2109) or ( B ) log-transformed (more ...) Figure 1 Box and whisker plots showing ( A ) log-transformed LAMC2 versus LAMC2* mRNA expression levels of non-carcinoma ( n = 56), low-stage ( n = 163), |± 0.3221. Because there were no normal (non–cancer) tissues sampled in the expO dataset, we used normal colon samples from the National Center for Biotechnology Information GEO’s GSE4107{{tag}}--REUSE-- dataset for a baseline comparison with expO’s colon cancer samples. These samples were analyzed exactly as the samples in the expO dataset. Interestingly, LAMC2 and LAMC2* transcript level|ancer ( n = 41) to metastatic cancer ( n = 29; B ). Discussion Advanced analyses of a previously existing publicly available database, consisting of standardized Affymetrix mRNA measurements from various cancer tissue types, enabled us to examine critical aspects of Ln-332 expression largely overlooked in commonl | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
85 | GSE4107 | 3/20/2007 | ['4107'] | ['2609'] | [u'17317818'] | 2732297 | [u'18990722'] | ['Eu', '', 'Hong', 'Cheah', 'Ho'] | ['Tarca', 'Mittal', 'Romero', 'Kusanovic', 'Draghici', 'Kim', 'Hassan', 'Khatri'] | [] | Bioinformatics | 2009 | 1/1/2009 | 0 | dataset compares 12 colorectal cancer samples with 10 normal samples (Hong et al. , 2007 ) using Affymetrix HG-U133 Plus 2.0. microarray platform. This dataset is available via the Gene Expression Omnibus (ID= GSE4107{{tag}}--REUSE-- ) and it will be referred to as the Colorectal cancer dataset. Several pathways are known to be relevant to the colorectal cancer, including the colorectal pathway itself, the PPAR | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
86 | GSE4107 | 3/20/2007 | ['4107'] | ['2609'] | [u'17317818'] | 2637897 | [u'19146704'] | ['Eu', '', 'Hong', 'Cheah', 'Ho'] | ['Jen', 'Lin', 'Tung', 'Wang', 'Hsu'] | [] | BMC Genomics | 2009 | 1/16/2009 | 0 | biological function interpretation. Methods Data collections and preprocessing Six independent data sets (Normal, HCC 1 , HCC 2 , Tumor 1 , Tumor 2 , Tumor 3 ), including one normal tissue data set (GSE3526), two HCC data sets (E-TABM-36 and GSE6764), and data sets for other three tumor types: thyroid cancer (GSE3678), colon cancer(GSE4107{{tag}}--REUSE--) and lung cancer (GSE7670), were downloaded from two public ar | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
87 | GSE4107 | 3/20/2007 | ['4107'] | ['2609'] | [u'17317818'] | 2955048 | [u'20846437'] | ['Eu', '', 'Hong', 'Cheah', 'Ho'] | ['Carter', 'Xu'] | [] | BMC Bioinformatics | 2010 | 9/16/2010 | 0 | 144 609 Hochberg 0 144 609 SidakSD 0 144 614 BH 0 407 5552 BY 0 227 2221 qvalue 0 407 6108 SAM 0 0 5330 Bayes 0 0 5705 EDR 5 593 4810 The raw cel files of these three data sets [ 21 , 23 , 25 ] were downloaded from the NCBI GEO database (GSE7146, GSE7333, GSE4107{{tag}}--REUSE--) and were preprocessed by the GC-RMA method. Two groups in each data set were tested by two-tailed t test assuming equal variance. All multiple|0 543 1319 891 1628 1724 TN 3774 4238 3424 3870 3110 3011 FN 91 118 80 98 75 72 TPR 0.4972 0.3481 0.5580 0.4586 0.5856 0.6022 FPR 0.2061 0.1136 0.2781 0.1871 0.3436 0.3641 The expression data set was downloaded from http://www.ambystoma.org and was preprocessed by the RMA method [ 38 ]. Differentially expressed genes (DEGs) were detected at the significance level of 0.05 by the EDR method and the other |ining 7129 probe sets. Only three genes were reported to be regulated by insulin in human muscle cell using a Wilcoxon signed rank test after filtering removed 5952 probe sets. The raw cel files were downloaded from the NCBI GEO database (GSE7146) containing data that are MIAME compliant as detailed on the MGED Society website http://www.mged.org/Workgroups/MIAME/miame.html The GC-RMA algorithm was used|d type and three miR-1-2 knockout mice at postnatal days 10 were compared for gene expression levels using Affymetrix mouse genome 430 2.0 array that contains 45101 probe sets. The raw cel files were downloaded from the NCBI GEO database (GSE7333) and were preprocessed by GC-RMA algorithm. With this data set, the EDR method was compared with 11 other multiple test procedures (Figure 2 ) at the same signi|ing the GeneChip U133-Plus 2.0 Array [ 25 ]. 
Twelve tumor specimens and ten adjacent grossly normal-appearing tissues from at least 8 cm away were collected for RNA extraction. The raw cel files were downloaded from the NCBI GEO database (GSE4107{{tag}}--REUSE--) and were preprocessed by GC-RMA algorithm. With this data set, the EDR method was compared with the other 11 multiple test procedures (Figure 2 ) at the same s|dent limb regeneration [ 26 ]. The same RNA samples were detected by Ambystoma GeneChip and 454 cDNA sequencing. There are total 4844 probe sets (TGs) on this GeneChip array. The raw cel files were downloaded from the public Ambystoma Microarray Database [ 37 ]. Detailed information of these data files and the DEGs confirmation by 454 cDNA sequencing were described in the original study [ 26 ]. The RMA | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
88 | GSE4107 | 3/20/2007 | ['4107'] | ['2609'] | [u'17317818'] | 2981572 | [u'21085593'] | ['Eu', '', 'Hong', 'Cheah', 'Ho'] | ['Bader', 'Emili', 'Stueker', 'Merico', 'Isserlin'] | [] | PLoS One | 2010 | 11/15/2010 | 0 | d according to a statistic measuring difference in one experimental condition versus another. Ranked lists are analyzed for enrichment in known sets of functionally related genes (e.g. pathways) from publicly accessible databases. An enrichment map is drawn, representing the enrichment results as a network of gene-sets (nodes) related by their similarity (edges), with enrichment significance encoded by th|es weighted to consider the most informative genes in the gene set (such as the most differentially expressed). Materials and Methods Microarray data analysis All microarray gene expression data were downloaded from the NCBI GEO (Gene Expression Omnibus) database. The raw .CEL files were processed with the rma statistical model for gene expression signals, using the Bioconductor [54] |ar separation of samples from different classes). Enrichment analysis was performed after conversion from Affymetrix to NCBI Entrez-Gene identifiers, utilizing the Bioconductor hgu133plus2 package (downloaded March 2009). Estrogen treatment of breast cancer cells The microarray data (GSE11352) were originally composed of 18 samples, with 3 replicates for every one of the 6 classes (3 time-points for tre|ets satisfying these enrichment thresholds: nominal p-value<0.001, FDR<5%. The enrichment map overlap coefficient was set to 0.5. Early Onset Colon Cancer The microarray data (GSE4107{{tag}}--REUSE--) are composed of 22 samples, with 10 normal and 12 colon cancer samples (colonic mucosa surgical samples). The data set was analyzed using GSEA, t-test , 2000 gene-set permutations. The enrichme|ing thresholds: nominal p-value<0.001, FDR<5%. The overlap coefficient was set to 0.5. 
Gene-set pre-processing Human Gene Ontology (GO) [11] annotations were downloaded from Bioconductor, org.Hs.eg.db package (March 2009). In order to maximize the coverage of GO annotations, no evidence code based filter was applied. Terms annotating more than 500 or less than 1 | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
89 | GSE4107 | 3/20/2007 | ['4107'] | ['2609'] | [u'17317818'] | 2949900 | [u'20875095'] | ['Eu', '', 'Hong', 'Cheah', 'Ho'] | ['Dawany', 'Tozeren'] | [] | BMC Bioinformatics | 2010 | 9/27/2010 | 0 | he individual datasets usually small in size, but the inferences made from individual studies are often inconsistent with similar studies [ 1 ]. As thousands of microarray samples have accumulated in publicly accessible databases in the last decade [ 2 - 4 ], several statistical methods have been developed to allow for the combination and comparison of data from multiple sources. Among the many methodolog|ons based on hypergeometric test. Table 1 Overview of datasets used and distribution of microarray samples Analysis Tissue Accession # Normal Cancer Platform IV1/IV2/SAM1/SAM2 Colon E-MTAB-57 22 25 A GSE4107{{tag}}--REUSE-- 10 12 P2 GSE4183 8 15 P2 Kidney E-TABM-282 11 16 P2 GSE11024† 12 60 P2 GSE11151 3 57 P2 GSE14762† 12 10 P2 GSE15641 23 57 A GSE6344 10 10 A GSE7023 12 35 P2 Liver GSE14323 19 47 A/A|2 49 58 A GSE7670 27 27 A Pancreas E-MEXP-1121† 6 17 A E-MEXP-950 11 14 A GSE15471 39 39 P2 GSE16515 15 36 P2 Total: 294 619 SAM2 Colon E-MEXP-1224 0 55 A E-MEXP-383 0 36 A E-TABM-176 55 0 P2 GSE12945 0 36 A GSE17538 0 232 P2 Kidney GSE10320 0 144 A GSE11904 0 21 A2 Liver E-TABM-292 0 32 A E-TABM-36 0 57 A GSE9843 0 69 P2 Lung GSE10445 0 72 P2 GSE12667 0 75 P2 Total: 55 829 IV2 Colon GSE6988 28|5E-257 No data - 262 2.34E-299 * Only 338 genes are used for colon IV1 Moreover, to assess the effect of the refRMA method in normalizing data, three samples from different colon datasets (E-MTAB-57, GSE4107{{tag}}--REUSE-- and GSE4183) were chosen. The expression values for the three arrays were obtained based on classical RMA and refRMA normalization techniques. 
Quantile-quantile plots were produced to compare the d|election A total of 31 Affymetrix microarray datasets containing 1,768 unique samples from human cancer (1,429) and corresponding healthy control tissues (339) were collected from the Gene Expression Omnibus (GEO; [ 2 , 3 ] and Array Express [ 4 ] online repositories (Additional File 2 ). Samples were selected for 5 different tissue types: colon, kidney, liver, lung and pancreas, then categorized into c|ms and the conversion of data to Entrez IDs resulted in the study of varying number of genes per dataset as well as different total overlap with the common Affymetrix platform (shown in parentheses); GSE6988: 9,072 (5,834) genes, GSE3: 12,452 (6,598) genes, GSE7367: 2118 (1,301) genes, GSE2088: 13754 (7,038) genes, and GSE8596: 6740 (4,330) genes. The datasets contained cancer versus normal samples fro|NCBI GEO: mining tens of millions of expression profiles--database and tools update Nucleic Acids Res 2007 35 Database D760 765 10.1093/nar/gkl887 17099226 Edgar R Domrachev M Lash AE Gene Expression Omnibus: NCBI gene expression and hybridization array data repository Nucleic Acids Res 2002 30 1 207 210 10.1093/nar/30.1.207 11752295 Brazma A Parkinson H Sarkans U Shojatalab M Vilo J Abeygunawardena N Ho | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
90 | GSE4107 | 3/20/2007 | ['4107'] | ['2609'] | [u'17317818'] | 2944782 | [u'20885780'] | ['Eu', '', 'Hong', 'Cheah', 'Ho'] | ['Klassen', 'Khush', 'Morgan', 'Valantine', 'Dudley', 'Li', 'Sigdel', 'Sarwal', 'Kambham', 'Chen', 'Butte', 'Hsieh', 'Caohuu'] | [] | PLoS Comput Biol | 2010 | 9/23/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
91 | GSE4183 | 10/31/2007 | ['4183'] | [] | [u'19461970', u'20087348'] | 2949900 | [u'20875095'] | ['', 'Szallasi', 'Wichmann', 'Spisák', 'Eklund', 'Krenács', 'Valcz', 'Gyorffy', 'Tulassay', 'Molnar', 'Tóth', 'Molnár', 'Lage', 'Sipos', 'Galamb', 'Solymosi'] | ['Dawany', 'Tozeren'] | [] | BMC Bioinformatics | 2010 | 9/27/2010 | 0 | he individual datasets usually small in size, but the inferences made from individual studies are often inconsistent with similar studies [ 1 ]. As thousands of microarray samples have accumulated in publicly accessible databases in the last decade [ 2 - 4 ], several statistical methods have been developed to allow for the combination and comparison of data from multiple sources. Among the many methodolog|ons based on hypergeometric test. Table 1 Overview of datasets used and distribution of microarray samples Analysis Tissue Accession # Normal Cancer Platform IV1/IV2/SAM1/SAM2 Colon E-MTAB-57 22 25 A GSE4107 10 12 P2 GSE4183{{tag}}--REUSE-- 8 15 P2 Kidney E-TABM-282 11 16 P2 GSE11024† 12 60 P2 GSE11151 3 57 P2 GSE14762† 12 10 P2 GSE15641 23 57 A GSE6344 10 10 A GSE7023 12 35 P2 Liver GSE14323 19 47 A/A|2 49 58 A GSE7670 27 27 A Pancreas E-MEXP-1121† 6 17 A E-MEXP-950 11 14 A GSE15471 39 39 P2 GSE16515 15 36 P2 Total: 294 619 SAM2 Colon E-MEXP-1224 0 55 A E-MEXP-383 0 36 A E-TABM-176 55 0 P2 GSE12945 0 36 A GSE17538 0 232 P2 Kidney GSE10320 0 144 A GSE11904 0 21 A2 Liver E-TABM-292 0 32 A E-TABM-36 0 57 A GSE9843 0 69 P2 Lung GSE10445 0 72 P2 GSE12667 0 75 P2 Total: 55 829 IV2 Colon GSE6988 28|5E-257 No data - 262 2.34E-299 * Only 338 genes are used for colon IV1 Moreover, to assess the effect of the refRMA method in normalizing data, three samples from different colon datasets (E-MTAB-57, GSE4107 and GSE4183{{tag}}--REUSE--) were chosen. The expression values for the three arrays were obtained based on classical RMA and refRMA normalization techniques.
Quantile-quantile plots were produced to compare the d|election A total of 31 Affymetrix microarray datasets containing 1,768 unique samples from human cancer (1,429) and corresponding healthy control tissues (339) were collected from the Gene Expression Omnibus (GEO; [ 2 , 3 ] and Array Express [ 4 ] online repositories (Additional File 2 ). Samples were selected for 5 different tissue types: colon, kidney, liver, lung and pancreas, then categorized into c|ms and the conversion of data to Entrez IDs resulted in the study of varying number of genes per dataset as well as different total overlap with the common Affymetrix platform (shown in parentheses); GSE6988: 9,072 (5,834) genes, GSE3: 12,452 (6,598) genes, GSE7367: 2118 (1,301) genes, GSE2088: 13754 (7,038) genes, and GSE8596: 6740 (4,330) genes. The datasets contained cancer versus normal samples fro|NCBI GEO: mining tens of millions of expression profiles--database and tools update Nucleic Acids Res 2007 35 Database D760 765 10.1093/nar/gkl887 17099226 Edgar R Domrachev M Lash AE Gene Expression Omnibus: NCBI gene expression and hybridization array data repository Nucleic Acids Res 2002 30 1 207 210 10.1093/nar/30.1.207 11752295 Brazma A Parkinson H Sarkans U Shojatalab M Vilo J Abeygunawardena N Ho | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
92 | GSE4183 | 10/31/2007 | ['4183'] | [] | [u'19461970', u'20087348'] | 2948500 | [u'20957034'] | ['', 'Szallasi', 'Wichmann', 'Spisák', 'Eklund', 'Krenács', 'Valcz', 'Gyorffy', 'Tulassay', 'Molnar', 'Tóth', 'Molnár', 'Lage', 'Sipos', 'Galamb', 'Solymosi'] | ['Jarosz', 'Rubel', 'Oledzki', 'Paziewska', 'Pachlewski', 'Goryca', 'Skrzypczak', 'Mikula', 'Ostrowsk'] | [] | PLoS One | 2010 | 10/1/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
93 | GSE4183 | 10/31/2007 | ['4183'] | [] | [u'19461970', u'20087348'] | 2944782 | [u'20885780'] | ['', 'Szallasi', 'Wichmann', 'Spisák', 'Eklund', 'Krenács', 'Valcz', 'Gyorffy', 'Tulassay', 'Molnar', 'Tóth', 'Molnár', 'Lage', 'Sipos', 'Galamb', 'Solymosi'] | ['Klassen', 'Khush', 'Morgan', 'Valantine', 'Dudley', 'Li', 'Sigdel', 'Sarwal', 'Kambham', 'Chen', 'Butte', 'Hsieh', 'Caohuu'] | [] | PLoS Comput Biol | 2010 | 9/23/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
94 | GSE4183 | 10/31/2007 | ['4183'] | [] | [u'19461970', u'20087348'] | 2680989 | [u'19461970'] | ['', 'Szallasi', 'Wichmann', 'Spisák', 'Eklund', 'Krenács', 'Valcz', 'Gyorffy', 'Tulassay', 'Molnar', 'Tóth', 'Molnár', 'Lage', 'Sipos', 'Galamb', 'Solymosi'] | ['Lage', 'Molnar', 'Szallasi', 'Eklund', 'Gyorffy'] | ['Szallasi', 'Gyorffy', 'Molnar', 'Eklund', 'Lage'] | PLoS One | 2009 | 5/21/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
95 | GSE4284 | 12/20/2007 | ['4284'] | [] | [u'17332498'] | 1855038 | [u'17332498'] | ['', 'Eshaghi', 'Li', 'Peng', 'Liu', 'Chu', 'Karuturi'] | ['Eshaghi', 'Li', 'Peng', 'Liu', 'Chu', 'Karuturi'] | ['Eshaghi', 'Li', 'Peng', 'Liu', 'Chu', 'Karuturi'] | Mol Biol Cell | 2007 | 2007 May | 0 | AND pmc_gds | 1 | 0 | ||||
96 | GSE4286 | 10/5/2007 | ['4286'] | ['3018'] | [u'17485520'] | 2118572 | [u'17485520'] | ['Duisters', 'Heymans', 'Pinto', 'van', 'Debets', 'Bertrand', 'Kubben', 'Schroen', 'Schellings', 'Schwake', 'Saftig', 'Janssen', 'Leenders', 'Høydal'] | ['Duisters', 'Heymans', 'Pinto', 'van', 'Debets', 'Bertrand', 'Kubben', 'Schroen', 'Schellings', 'Schwake', 'Saftig', 'Janssen', 'Leenders', 'Høydal'] | ['Duisters', 'Heymans', 'Pinto', 'van', 'Schwake', 'Bertrand', 'Kubben', 'Schroen', 'Schellings', 'Saftig', 'Debets', 'Leenders', 'Høydal', 'Janssen'] | J Exp Med | 2007 | 5/14/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
97 | GSE4302 | 10/12/2007 | ['4302'] | [] | [u'17898169'] | 2967749 | [u'21044366'] | ['Pantoja', 'Donnelly', 'Barker', 'Woodruff', 'Sidhu', 'Fahy', 'Erle', 'Yang', 'Dao-Pick', 'Boushey', 'Yamamoto', 'Ellwanger', 'Dolganov'] | ['Yousif', 'Mbagwu', 'Ohno-Machado', 'Lacson'] | [] | BMC Bioinformatics | 2010 | 10/28/2010 | 0 | es/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Background The amount of data deposited in the Gene Expression Omnibus (GEO) has expanded significantly. It is important to ensure that these data are properly annotated with clinical data and descriptions of experimental conditions so that they can be useful for future|istency. Association between relevant variables, however, was adequate. 10–12 March 2010 2010 AMIA Summit on Translational Bioinformatics San Francisco, CA, USA Background The Gene Expression Omnibus (GEO) project was initiated by the National Center for Biotechnology Information (NCBI) to serve as a repository for gene expression data [ 1 , 2 ]. In addition to GEO, there are several other large-| 400,000 samples. There has been an ever growing interest in large microarray repositories for several reasons: (a) Microarray data are required by funding agencies and scientific journals to be made publicly accessible; (b) such repositories enable researchers to view data from other research groups; and (c) with proper pre-processing, such repositories may allow researchers to formulate and test hypothe|viously described [ 14 , 15 ]. The annotation tool used for this research was developed to facilitate human annotation by allowing easy access between the data descriptions and measurements that were downloaded from GEO and appropriate scientific publications from Pubmed [ 13 ]. The annotators are able to read the study descriptions that researchers deposited in GEO, as well as individual sample descripti|, and the results are displayed in Table 3 . 
Table 4 shows all the studies’ goals and the number of samples in each of the 17 annotated studies. Table 3 Coverage of Asthma variables in GDS GSE 470 GSE 473 GSE 3183 GSE 3004 Total Agent 100% 0% 100% 100% 17.4% Disease State 100% 100% 0% 0% 88.2% Time 100% 0% 100% 0% 12.7% Other 0% 100% 0% 0% 82.5% No. of Samples 12 175 15 10 212 Table 4 Annota|dy No. of Samples Topic/Title GSE8052 404 Determinants of susceptibility to childhood asthma GSE473 175 Defining diagnostic genes from purified CD4+ blood cells that have specific diagnostic profiles GSE4302{{tag}}--REUSE-- 118 Profiling of airway epithelial cells GSE3184 40 Murine airway hyperresponsiveness GSE483 39 Allergic response to ragweed GSE1301 24 Mechanisms by which IL-13 elicits the symptoms of asthma GSE8|fects of exercise on gene expression GSE6858 16 Expression data from experimental murine asthma GSE3183 15 Early cytokine-mediated mechanisms that lead to asthma GSE470 12 Asthma exacerbatory factors GSE9465 12 Pulmonary responses to ambient particulate matter GSE3004 10 Effects of allergen challenge on airway cell gene expression GSE2276 9 Effect of PGE receptor subtype agonist on an asthma model GSE4|d inhaler 697 24.1 Disease frequency 627 31.7 Gender 489 46.7 Atopic 425 53.7 Tissue 403 56.1 Challenge 0 1.0 The consistency of the studies in the asthma domain was also measured. In one such study (GSE4302{{tag}}--REUSE--), the data for 32 asthmatics randomized to a placebo-controlled trial of fluticasone propionate were examined. The authors use the generic name “fluticasone propionate” within both | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
98 | GSE4302 | 10/12/2007 | ['4302'] | [] | [u'17898169'] | 2957424 | [u'20976054'] | ['Pantoja', 'Donnelly', 'Barker', 'Woodruff', 'Sidhu', 'Fahy', 'Erle', 'Yang', 'Dao-Pick', 'Boushey', 'Yamamoto', 'Ellwanger', 'Dolganov'] | ['Shamir', 'Karp', 'Ulitsky', 'Krishnamurthy'] | [] | PLoS One | 2010 | 10/19/2010 | 0 | pone.0013367.t001 Table 1 Gene expression datasets used in this study. Dataset KEGG pathway Reference GEO accession Number of cases Number of controls AD Alzheimer's disease (AD) [41] GSE5281 10 13 ASTHMA Asthma [46] GSE4302{{tag}}--REUSE-- 42 28 PYLORI Epithelial cell signaling in Helicobacter pylori infection - GSE5081 8 8 HD Huntington's disease (HD) [48] GSE3790 38 3|-GLIOBLASTOMA Pathways in cancer [47] GSE4290 77 23 SUN-ASTROCYTOMA Pathways in cancer [47] GSE4290 26 23 SUN-OLIGODENDROGLIOMA Pathways in cancer [47] GSE4290 50 23 ESTILO-OTSCC Pathways in cancer [44] GSE13601 31 26 YE-OTSCC Pathways in cancer [45] GSE9844 26 12 MORAN-PD Parkinson's disease (PD) [42] GSE83|SLE Systemic lupus erythematosus (SLE) [49] GSE8650 38 21 Each dataset contained a comparison of sick individuals and healthy controls. All the data were obtained from Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/ ). We first evaluated the performance of different variants of our algorithm and found that DEGAS usually identified the smallest pathways ( Text S1 and Figu | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
99 | GSE4302 | 10/12/2007 | ['4302'] | [] | [u'17898169'] | 2753788 | [u'19628779'] | ['Pantoja', 'Donnelly', 'Barker', 'Woodruff', 'Sidhu', 'Fahy', 'Erle', 'Yang', 'Dao-Pick', 'Boushey', 'Yamamoto', 'Ellwanger', 'Dolganov'] | ['Nguyenvu', 'Huang', 'Nakagami', 'Woodruff', 'Eisley', 'Park', 'Fahy', 'Erle', 'Barbeau', 'Verhaeghe'] | ['Erle', 'Woodruff', 'Fahy'] | Am J Respir Crit Care Med | 2009 | 10/1/2009 | 0 | nalyses of FOXA2 , FOXA3 , MUC5AC , and CLCA1 expression in bronchial epithelial cells from subjects with asthma and control subjects, microarray data generated in our previous study ( 27 ) were downloaded from the Gene Expression Omnibus ( www.ncbi.nlm.nih.gov/geo ; accession number GSE4302{{tag}}--REUSE-- ). Significance testing was performed by Student t test or by analysis of variance and Tukey-Kramer posttes | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
100 | GSE4302 | 10/12/2007 | ['4302'] | [] | [u'17898169'] | 2742757 | [u'19483109'] | ['Pantoja', 'Donnelly', 'Barker', 'Woodruff', 'Sidhu', 'Fahy', 'Erle', 'Yang', 'Dao-Pick', 'Boushey', 'Yamamoto', 'Ellwanger', 'Dolganov'] | ['Jia', 'Modrek', 'Woodruff', 'Fahy', 'Arron', 'Abbas', 'Koth', 'Choy', 'Ellwanger'] | ['Ellwanger', 'Woodruff', 'Fahy'] | Am J Respir Crit Care Med | 2009 | 9/1/2009 | 0 | ray analyses on epithelial brushings had been performed as described previously ( 9 ). These data are available in MIAME-compliant format at GEO ( http://www.ncbi.nlm.nih.gov/geo/ , accession number GSE4302{{tag}}--DEPOSIT-- ). Additional real-time PCR (qPCR) analyses were performed using RNA from homogenized bronchial biopsies (34 patients with asthma and 14 healthy control subjects) and from lavage macrophages (14 pa | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
101 | GSE4302 | 10/12/2007 | ['4302'] | [] | [u'17898169'] | 2000427 | [u'17898169'] | ['Pantoja', 'Donnelly', 'Barker', 'Woodruff', 'Sidhu', 'Fahy', 'Erle', 'Yang', 'Dao-Pick', 'Boushey', 'Yamamoto', 'Ellwanger', 'Dolganov'] | ['Pantoja', 'Donnelly', 'Barker', 'Woodruff', 'Sidhu', 'Fahy', 'Erle', 'Yang', 'Dao-Pick', 'Boushey', 'Yamamoto', 'Ellwanger', 'Dolganov'] | ['Donnelly', 'Barker', 'Woodruff', 'Fahy', 'Erle', 'Sidhu', 'Yang', 'Pantoja', 'Dao-Pick', 'Boushey', 'Yamamoto', 'Ellwanger', 'Dolganov'] | Proc Natl Acad Sci U S A | 2007 | 10/2/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
102 | GSE4327 | 2/21/2007 | ['4327'] | ['2607'] | [u'17303130'] | 2680807 | [u'19416532'] | ['Biagi', 'Bordoni', 'Di', 'Pession', 'Franzoni', 'Danesi', 'Astolfi', 'Morandi'] | ['Thomas', 'Zhang', 'Murphy', 'Gohlke', 'Mattingly', 'Davis', 'Rosenstein', 'Portier', 'Becker'] | [] | BMC Syst Biol | 2009 | 5/5/2009 | 1 | bal gene expression datasets utilized for validation of metabolic syndrome and neuropsychiatric subnetworks METABOLIC SYNDROME Condition Species Tissue GEO Acc . Reference obese/lean Human adipocytes GSE2508 [ 30 ] obese/lean Mouse adipocytes GSE4692 [ 31 ] Familial combined hyperlipedemia Human monocytes GSE11393 [ 32 ] Treatment Species Tissue GEO Acc . Reference Fenofibrate Rat liver GSE8251 [ 3|amide Rat liver GSE3952 [ 34 ] 9-cis retinoic acid Rat liver GSE3952 [ 34 ] Targretin Rat liver GSE3952 [ 34 ] Vitamin A deficient diet Rat liver GSE1600 [ 35 ] Omega 3 fatty acids Rat cardiomyocytes GSE4327{{tag}}--REUSE-- [ 36 ] Thiazolidinediones Human 3T3-L1 adipocytes GSE1458 [ 37 ] Atorvastatin Human monocytes GSE11393 [ 32 ] Cyfluthrin Human astrocytes GSE5023 [ 38 ] NEUROPSYCHIATRIC DISORDERS Condition Specie| 39 ] Depression Human prefrontal cortex GSE12654 [ 39 ] Schizophrenia Human prefrontal cortex GSE12654 [ 39 ] Schizophrenia Human frontal cortex E-MEXP-857 [ 40 ] Anxiety Mouse various brain regions GSE3327 [ 41 ] Autism Human lymphoblastoid cell lines GSE7329 [ 42 ] Autism Human whole blood GSE6575 [ 43 ] Treatment Species Tissue GEO Acc . Reference Chlorpyrifos Human astrocytes GSE5023 [ 38 ] Ch | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
103 | GSE4346 | 12/7/2007 | ['4346'] | [] | [u'17620351'] | 2044516 | [u'17620351'] | ['Reitzer', 'Pearson', 'Wang', 'Laurence', 'Rasko', 'Hansen', 'Blick'] | ['Reitzer', 'Pearson', 'Wang', 'Laurence', 'Rasko', 'Hansen', 'Blick'] | ['Reitzer', 'Pearson', 'Wang', 'Laurence', 'Rasko', 'Hansen', 'Blick'] | Infect Immun | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
104 | GSE4348 | 12/7/2007 | ['4348'] | [] | [u'17620351'] | 2044516 | [u'17620351'] | ['Reitzer', 'Pearson', 'Wang', 'Laurence', 'Rasko', 'Hansen', 'Blick'] | ['Reitzer', 'Pearson', 'Wang', 'Laurence', 'Rasko', 'Hansen', 'Blick'] | ['Reitzer', 'Pearson', 'Wang', 'Laurence', 'Rasko', 'Hansen', 'Blick'] | Infect Immun | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
105 | GSE4407 | 2/27/2007 | ['4407'] | ['3147'] | [u'17296732'] | 1899932 | [u'17296732'] | ['Okuno', 'Kozutsumi', 'Takematsu', 'Yamaji', 'Miyake', 'Tsujimoto', 'Naito', 'Koyama', 'Sugai', 'Hashimoto', 'Itohara', 'Yamamoto', 'Kawasaki', 'Suzuki', 'Fujinawa'] | ['Okuno', 'Kozutsumi', 'Takematsu', 'Yamaji', 'Miyake', 'Tsujimoto', 'Naito', 'Koyama', 'Sugai', 'Hashimoto', 'Itohara', 'Yamamoto', 'Kawasaki', 'Suzuki', 'Fujinawa'] | ['Okuno', 'Kozutsumi', 'Takematsu', 'Yamaji', 'Miyake', 'Tsujimoto', 'Naito', 'Koyama', 'Sugai', 'Hashimoto', 'Fujinawa', 'Yamamoto', 'Kawasaki', 'Suzuki', 'Itohara'] | Mol Cell Biol | 2007 | 2007 Apr | 0 | AND pmc_gds | 1 | 0 | ||||
106 | GSE4422 | 8/1/2007 | ['4422'] | [] | [u'17727724'] | 2072952 | [u'17727724'] | ['', 'Frank', 'Li', 'Cavener', 'Iida', 'McGrath'] | ['Cavener', 'Frank', 'Iida', 'McGrath', 'Li'] | ['Frank', 'Cavener', 'Iida', 'McGrath', 'Li'] | BMC Cell Biol | 2007 | 8/29/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
107 | GSE4494 | 7/28/2007 | ['4494'] | ['2841'] | [u'17517326'] | 2785812 | [u'19917117'] | ['Strother', 'McClintick', 'Carr', 'Kimpel', 'Edenberg', 'Liang', 'McBride'] | ['Kroll', 'Barkema', 'Carlon'] | [] | Algorithms Mol Biol | 2009 | 11/16/2009 | 1 | d to the local physical neighbors and to the nearest-neighbor free energy. Table 1 Overview over organisms and number of CEL-files analyzed Organism GEO # Chiptype (dimension) # files min A. Thaliana GSE4847 ATH1-121501 (712 × 712) 18 0.0259 GSE7642 ATH1-121501 (712 × 712) 12 0.0544 GSE9311 ATH1-121501 (712 × 712) 8 0.0546 C. Elegans GSE6547 Celegans (712 × 712) 25 0.036|00d7; 712) 7 0.0396 D. Melanogaster GSE3990 Drosophila_2 (732 × 732) 6 0.0620 GSE6558 DrosGenome1 (640 × 640) 24 0.0605 D. Rerio GSE4859 Zebrafish (712 × 712) 8 0.0357 E. Coli GSE11779 E_coli_2 (478 × 478) 3 0.0869 GSE2928 Ecoli (544 × 544) 12 0.0172 GSE6195 E_coli_2 (478 × 478) 4 0.0664 H. Sapiens GSE10433 HG-U133A_2 (732 × 732) 12 0.0757 GSE5054|-U133A (712 × 712) 14 0.0296 GSE8514 HG-U133_Plus_2 (1164 × 1164) 15 0.0738 GSE11897 MOE430A (712 × 712) 11 0.0640 MOE430B (712 × 712) Mouse430_2 (1002 × 1002) GSE6210 Mouse430_2 12 0.0594 GSE6297 Mouse430_2 24 0.0325 O. Sativa GSE15071 Rice (1164 × 1164) 20 0.1157 R. Norvegicus GSE4494{{tag}}--REUSE-- RG_U34A (534 × 534) 59 0.0488 GSE7493 Rat230_2 (834 ×|2 (834 × 834) 4 0.0640 S. Aureus GSE7944 S_aureus (602 × 602) 6 0.0746 S. Cerevisiae GSE6073 YG_S98 (534 × 534) 12 0.0283 GSE8379 YG_S98 (534 × 534) 8 0.0180 X. Laevis GSE3368 Xenopus_laevis (712 × 712) 20 0.0514 Table 2 Optimized parameter values as obtained from the minimization of Eq. (3). A. Thaliana C. Elegans D. Melanogaster D. Rerio GEO no GSE4847 GSE7642|d in this work. Figure 5 Computed background vs. low expressed genes data . Experimental results (solid black line) and theoretical prediction (dashed red line) of probeset a) 256610_at (A. Thaliana, GSE4847, GSM109107.CEL) and b) 175270_at (C. Elegans, GSE8159, GSM201995.CEL). 
Discussion We have presented a background subtraction scheme for Affymetrix GeneExpression arrays which is both, accurate and | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
108 | GSE4494 | 7/28/2007 | ['4494'] | ['2841'] | [u'17517326'] | 1976291 | [u'17517326'] | ['Strother', 'McClintick', 'Carr', 'Kimpel', 'Edenberg', 'Liang', 'McBride'] | ['Strother', 'McClintick', 'Carr', 'Kimpel', 'Edenberg', 'Liang', 'McBride'] | ['Strother', 'McClintick', 'Carr', 'Kimpel', 'Edenberg', 'Liang', 'McBride'] | Alcohol | 2007 | 2007 Mar | 0 | AND pmc_gds | 1 | 0 | ||||
109 | GSE4524 | 8/25/2007 | ['4524'] | [] | [u'18588690'] | 2459201 | [u'18588690'] | ['Villa', 'Reis', 'Martins', 'Neves', 'Hirata', u'Col\xf3', 'Colo', 'Boccardo', 'Termini', 'Esteves'] | ['Villa', 'Reis', 'Martins', 'Neves', 'Hirata', 'Colo', 'Boccardo', 'Termini', 'Esteves'] | ['Villa', 'Martins', 'Neves', 'Reis', 'Boccardo', 'Colo', 'Hirata', 'Termini', 'Esteves'] | BMC Med Genomics | 2008 | 6/27/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
110 | GSE4557 | 3/27/2007 | ['4557'] | [] | [u'17947375'] | 2234583 | [u'17947375'] | ['Carter-Su', 'Maures', 'Schwartz', 'Jin', 'Huo', 'Rabbani', 'Chen'] | ['Carter-Su', 'Maures', 'Schwartz', 'Jin', 'Huo', 'Rabbani', 'Chen'] | ['Carter-Su', 'Maures', 'Schwartz', 'Jin', 'Huo', 'Rabbani', 'Chen'] | Mol Endocrinol | 2008 | 2008 Feb | 0 | AND pmc_gds | 1 | 0 | ||||
111 | GSE4562 | 5/1/2007 | ['4562'] | [] | [u'17511876'] | 1899176 | [u'17511876'] | ['Jayaraman', 'Wood', 'Lee'] | ['Jayaraman', 'Wood', 'Lee'] | ['Jayaraman', 'Wood', 'Lee'] | BMC Microbiol | 2007 | 5/18/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
112 | GSE4567 | 1/26/2007 | ['4567'] | ['2565'] | [u'17450221'] | 1852686 | [u'17450221'] | ['Karoly', 'Hyseni', 'Dailey', 'Huang', 'Li'] | ['Karoly', 'Hyseni', 'Dailey', 'Huang', 'Li'] | ['Karoly', 'Hyseni', 'Dailey', 'Huang', 'Li'] | Environ Health Perspect | 2007 | 2007 Apr | 0 | AND pmc_gds | 1 | 0 | ||||
113 | GSE4577 | 5/4/2007 | ['4577'] | [] | [u'17466061'] | 1868913 | [u'17466061'] | ['Skovgaard', u'Hornsh\xf8j', 'Jensen', u'S\xf8rensen', 'S\xc3\xb8rensen', 'Heegaard', 'Mortensen', 'Bendixen', 'Hedegaard', 'Hornsh\xc3\xb8j'] | ['Skovgaard', 'Jensen', 'S\xc3\xb8rensen', 'Heegaard', 'Mortensen', 'Bendixen', 'Hedegaard', 'Hornsh\xc3\xb8j'] | ['Skovgaard', 'Jensen', 'S\xc3\xb8rensen', 'Heegaard', 'Mortensen', 'Bendixen', 'Hedegaard', 'Hornsh\xc3\xb8j'] | Acta Vet Scand | 2007 | 4/27/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
114 | GSE4582 | 7/31/2007 | ['4582'] | [] | [] | 2605471 | [u'18925948'] | [u'Vial', u'Winzeler', u'Le', u'Zhou'] | ['Dufayard', 'Gascuel', 'Br\xc3\xa9h\xc3\xa9lin'] | [] | BMC Bioinformatics | 2008 | 10/16/2008 | 0 | each data source; (3) a procedure that combines all data sources to provide a global summary and confidence estimate of the predictions. Gonna has been applied to all P. falciparum genes using most publicly available transcriptome, proteome and interactome data sources. Gonna provides predictions for numerous genes without any annotations. For example, 2 434 genes without any annotations in the Biologic|minator pairs of the confidence score. This prevents the GDB from pollution by irrelevant or too noisy data sources. Results Data To produce the PlasmoDraft database, Gonna has been applied to most publicly available postgenomic data sources we were aware. 9 transcriptomic (microarray), 1 proteomic (mass-spectrometry), and 1 protein-protein interaction data sets were used. Below is a short description o| developmental cycle. Measurements for ~5 300 genes. • LE07: A transcriptomic data set analysing the parasite response to choline analog T4 during the intraerythrocytic life cycle. See series GSE4582{{tag}}--REUSE-- in the NCBI Gene Expression Omnibus . Measurements for ~5 100 genes. • LE04: Le Roch et al. (2004) data set [ 31 , 32 ]. A proteomic data set that covers 7 stages of the entire cycle of st|aCount et al. (2005) data set [ 33 ]. A protein-protein interaction data set. Measurements for ~1 300 genes. The Gene Ontology file (revision 5.754) and the gene annotations file (revision 1.54) were downloaded from the GO website. Accessing the database Users can access the predictions by browsing the database or querying for a specific gene, GO term, or keyword. Results are displayed using three types o|efficient and we used two sets of parameters ( K , K' ): ( K = 6, K' = 4) and ( K = 6, K' = 2). 
All annotations different from IEA, ISS and RCA were used (gene annotation file revision 1.1323, downloaded from the GO website), which involves 4 165 genes characterized in the BP ontology, and a total of 1 220 different GO terms. The TDR s were estimated for each GO term by cross-validation. Figure 5 | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
115 | GSE4595 | 3/26/2007 | ['4595'] | [] | [u'17244347'] | 1796866 | [u'17244347'] | ['Cavallaro', 'Pantelidou', 'Santama', 'Lederer', 'Torrisi'] | ['Cavallaro', 'Pantelidou', 'Santama', 'Lederer', 'Torrisi'] | ['Cavallaro', 'Pantelidou', 'Santama', 'Lederer', 'Torrisi'] | BMC Genomics | 2007 | 1/23/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
116 | GSE4609 | 4/5/2007 | ['4609'] | [] | [u'17559657'] | 1904434 | [u'17559657'] | ['Marti', 'Okamoto', 'Carvalho', u'V\xeancio', 'V\xc3\xaancio', 'Moreira-Filho'] | ['Carvalho', 'Marti', 'Moreira-Filho', 'V\xc3\xaancio', 'Okamoto'] | ['Carvalho', 'Marti', 'Moreira-Filho', 'Okamoto', 'V\xc3\xaancio'] | Cancer Cell Int | 2007 | 6/8/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
117 | GSE4624 | 3/27/2007 | ['4624'] | [] | [u'17378698'] | 1796639 | [u'17378698'] | ['Somogyi', 'Baron', 'Roy', 'Kelvin', 'Busque', 'Rineau', 'Perreault', 'Cho', u'Sekaly', 'Chagnon', 'Greller', 'S\xc3\xa9kaly', 'Cameron', 'Wilkinson'] | ['Somogyi', 'Baron', 'Roy', 'Kelvin', 'Busque', 'Rineau', 'Perreault', 'Cho', 'Chagnon', 'Greller', 'S\xc3\xa9kaly', 'Cameron', 'Wilkinson'] | ['Somogyi', 'Wilkinson', 'Baron', 'Perreault', 'Kelvin', 'Busque', 'Rineau', 'Cho', 'Chagnon', 'Greller', 'S\xc3\xa9kaly', 'Cameron', 'Roy'] | PLoS Med | 2007 | 2007 Jan | 0 | AND pmc_gds | 1 | 0 | ||||
118 | GSE4628 | 1/5/2007 | ['4628'] | [] | [u'17190607'] | 2719772 | [u'18313399'] | ['', 'Shaw', 'Sahni', 'Iannucculli', 'Edwards', 'Moroz', 'Lovell', 'Liu', 'Farmerie', 'Heyland', 'Sheng', 'Kandel', 'Ju', 'Nguyen', 'Ha', 'Yu', 'Russo', 'Puthanveettil', 'Kalachikov', 'Knudsen', 'Kohn', 'Panchin', 'Chen', 'Jezzini'] | ['Moroz', 'Panchin'] | ['Moroz', 'Panchin'] | Biochem Biophys Res Commun | 2008 | 5/9/2008 | 0 | ilent Technologies using 60-mer oligonucleotide sequences designed from each nonredundant sequence in the Aplysia EST database [ 10 ]. The data have been deposited in NCBI’s Gene Expression Omnibus (GEO) and are accessible through the GEO Series accession number GSE4628{{tag}}--DEPOSIT-- [ 10 ]. Some experimental modifications are discussed in the text and supplements 1–7 . | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
119 | GSE4628 | 1/5/2007 | ['4628'] | [] | [u'17190607'] | 2939879 | [u'20856805'] | ['', 'Shaw', 'Sahni', 'Iannucculli', 'Edwards', 'Moroz', 'Lovell', 'Liu', 'Farmerie', 'Heyland', 'Sheng', 'Kandel', 'Ju', 'Nguyen', 'Ha', 'Yu', 'Russo', 'Puthanveettil', 'Kalachikov', 'Knudsen', 'Kohn', 'Panchin', 'Chen', 'Jezzini'] | ['Griffitt', 'Kroll', 'Brown-Peterson', 'Feswick', 'Liu', 'Barber', 'Denslow', 'Glazer', 'Spade'] | ['Liu'] | PLoS One | 2010 | 9/15/2010 | 0 | AND pmc_gds | 1 | 0 | ||||
120 | GSE4654 | 4/1/2007 | ['4654'] | [] | [u'17417638'] | 2861062 | [u'20338033'] | ['Iyer', 'Killion', 'Hu'] | ['Henriksson', 'Xue-Franz\xc3\xa9n', 'Johnsson', 'Wright', 'Brodin', 'B\xc3\xbcrglin'] | [] | BMC Genomics | 2010 | 3/25/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
121 | GSE4716 | 1/20/2007 | ['4716'] | [] | [u'15064725'] | 2689870 | [u'19393097'] | ['Yanagisawa', 'Mitsudomi', 'Some', 'Koshikawa', 'Tomida', 'Harano', 'Osada', 'Ogura', 'Yatabe', 'Takahashi'] | ['Nakai', 'Mizuno', 'Kitada', 'Sarai'] | [] | BMC Med Genomics | 2009 | 4/24/2009 | 1 | as possible. All tables were relationally linked and stored in the MySQL server. Table 1 Dataset content from PrognoScan Dataset Cancer type Subtype Cohort Author/Contributor Array type n Data source GSE13507 Bladder cancer Transitional cell carcinoma Cheongju Kim Human-6 v2 n = 165 GEO GSE5287 Bladder cancer Aarhus (1995–2004) Als et al . [ 10 ] HG-U133A n = 30 GEO GSE12417-GPL570 Blood cance|-GPL96 Blood cancer AML AMLCG (1999–2003) Metzeler et al . [ 11 ] HG-U133A n = 163 GEO GSE12417-GPL97 Blood cancer AML AMLCG (1999–2003) Metzeler et al . [ 11 ] HG-U133B n = 163 GEO GSE8970 Blood cancer AML San Diego Raponi et al . [ 12 ] HG-U133A n = 34 GEO GSE4475 Blood cancer B-cell lymphoma Berlin (2003–2005) Hummel et al . [ 13 ] HG-U133A n = 158 GEO E-TABM-346 Blood ca|E2658 Blood cancer Multiple myeloma Arkansas Zhan et al . [ 15 ] HG-U133_Plus_2 n = 559 GEO E-TABM-158 Breast cancer UCSF, CPMC (1989–1997) Chin et al . [ 16 ] HG-U133A n = 129 ArrayExpress GSE11121 Breast cancer Mainz (1988–1998) Schmidt et al . [ 17 ] HG-U133A n = 200 GEO GSE1378 Breast cancer MGH (1987–2000) Ma et al . [ 18 ] Arcturus 22 k n = 60 GEO GSE1379 Breast cancer|6-GPL96 Breast cancer Stockholm (1994–1996) Pawitan et al . [ 19 ] HG-U133A n = 159 GEO GSE1456-GPL97 Breast cancer Stockholm (1994–1996) Pawitan et al . [ 19 ] HG-U133B n = 159 GEO GSE2034 Breast cancer Rotterdam (1980–1995) Wang et al . [ 20 ] HG-U133A n = 286 GEO GSE2990 Breast cancer Uppsala, Oxford Sotiriou et al . [ 21 ] HG-U133A n = 187 GEO GSE3143 Breast cancer Duke |GSE3494-GPL96 Breast cancer Uppsala (1987–1989) Miller et al . [ 23 ] HG-U133A n = 236 GEO GSE3494-GPL97 Breast cancer Uppsala (1987–1989) Miller et al . 
[ 23 ] HG-U133B n = 236 GEO GSE4922-GPL96 Breast cancer Uppsala (1987–1989) Ivshina et al . [ 24 ] HG-U133A n = 249 GEO GSE4922-GPL97 Breast cancer Uppsala (1987–1989) Ivshina et al . [ 24 ] HG-U133B n = 249 GEO GSE|SE7378 Breast cancer UCSF Zhou et al . [ 26 ] U133AAofAv2 n = 54 GEO GSE7390 Breast cancer Uppsala, Oxford, Stockholm, IGR, GUYT, CRH (1980–1998) Desmedt et al . [ 27 ] HG-U133A n = 198 GEO GSE7849 Breast cancer Duke (1990–2001) Anders et al . [ 28 ] HG-U95A n = 76 GEO GSE9195 Breast cancer GUYT2 Loi et al . [ 25 ] HG-U133_Plus_2 n = 77 GEO GSE9893 Breast cancer Montpellier, Bordeau|man 21 K V12.0 n = 155 GEO GSE11595 Esophagus cancer Adenocarcinoma Sutton Giddings CRUKDMF_22 K_v1.0.0 n = 34 GEO GSE7696 Glioma Glioblastoma Lausanne Murat et al . [ 30 ] HG-U133_Plus_2 n = 70 GEO GSE4271-GPL96 Glioma MDA Phillips et al . [ 31 ] HG-U133A n = 77 GEO GSE4271-GPL97 Glioma MDA Phillips et al . [ 31 ] HG-U133B n = 77 GEO GSE2837 Head and neck cancer Squamous cell carcinoma VUMC, VAMC, |Adenocarcinoma Harvard Beer et al . [ 33 ] HG-U95A n = 84 Author's web site MICHIGAN-LC Lung cancer Adenocarcinoma Michigan (1994–2000) Beer et al . [ 33 ] HuGeneFL n = 86 Author's web site GSE11117 Lung cancer NSCLC Basel Baty Novachip human 34.5 k n = 41 GEO GSE3141 Lung cancer NSCLC Duke Bild et al . [ 22 ] HG-U133_Plus_2 n = 111 GEO GSE4716{{tag}}--REUSE---GPL3694 Lung cancer NSCLC Nagoya (1995–|da et al . [ 34 ] GF200 n = 50 GEO GSE4716{{tag}}--REUSE---GPL3696 Lung cancer NSCLC Nagoya (1995–1996) Tomida et al . [ 34 ] GF201 n = 50 GEO GSE8894 Lung cancer NSCLC Seoul Son HG-U133_Plus_2 n = 138 GEO GSE4573 Lung cancer Squamous cell carcinoma Michigan (1991–2002) Raponi et al . [ 35 ] HG-U133A n = 129 GEO DUKE-OC Ovarian cancer Duke Bild et al . [ 22 ] HG-U133A n = 134 Author's web site GSE8|horts. Datasets come from a number of different institutions around the world, and patient backgrounds differ. 
In addition, several datasets are based on specific subpopulations, for example, dataset GSE2034 is from lymph node-negative breast cancers, and GSE5287 is from cisplatin-containing chemotherapy-treated bladder cancers. Hence, it is possible that the specific association between gene expressio|subsequent care may affect the clinical course of a patient. 3) Experimental factors. Expression measurement of microarray is subject to various factors at the experiment level. Microdissection (e.g. GSE1378) would reduce contamination of mRNAs from non-cancer cells [ 57 ]. Formalin fixation of a sample (e.g. GSE2873) influences the quality of mRNAs [ 58 ]. Array type (e.g. Affymetrix, cDNA microarrays | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
122 | GSE4741 | 1/9/2007 | ['4741'] | [] | [u'17190839'] | 2118424 | [u'17190839'] | ['Somogyi', 'Sekaly', 'Van', 'Balderas', 'Riou', 'Campos-Gonzalez', 'Yassine-Diab', 'Haddad', 'Gimmig', 'Greller', 'Kelvin', 'Cameron', 'Shi', 'Wilkinson', 'Gagnon'] | ['Somogyi', 'Sekaly', 'Van', 'Balderas', 'Riou', 'Campos-Gonzalez', 'Yassine-Diab', 'Haddad', 'Gimmig', 'Greller', 'Kelvin', 'Cameron', 'Shi', 'Wilkinson', 'Gagnon'] | ['Somogyi', 'Van', 'Balderas', 'Riou', 'Campos-Gonzalez', 'Yassine-Diab', 'Haddad', 'Gimmig', 'Greller', 'Sekaly', 'Cameron', 'Shi', 'Wilkinson', 'Kelvin', 'Gagnon'] | J Exp Med | 2007 | 1/22/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
123 | GSE4763 | 8/8/2007 | ['4763'] | [] | [u'17892325'] | 2811030 | [u'19906696'] | ['Yi', 'Baylin', 'Van', 'Ahuja', 'Gl\xc3\xb6ckner', 'Cope', 'Toyota', 'Jair', 'Herman', 'Chan', 'Schuebel', 'Ting', 'Chen', 'van', 'Imai', 'Yu', 'Suzuki'] | ['Serre', 'Ting', 'Lee'] | ['Ting'] | Nucleic Acids Res | 2010 | 2010 Jan | 0 | cells was used for annotating CTCF binding sites. Gene expression analysis Differential gene expression data from HCT116, DAC-treated HCT116, and DKO cells were obtained from Gene Expression Omnibus (GSE4763{{tag}}--REUSE--). For each probe on the Agilent microarray, the mean log2 change in gene expression was calculated for the DAC-treated HCT116/HCT116 (chemical demethylation) and DKO/HCT116 (genetic demethylation) | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
124 | GSE4763 | 8/8/2007 | ['4763'] | [] | [u'17892325'] | 1988850 | [u'17892325'] | ['Yi', 'Baylin', 'Van', 'Ahuja', 'Gl\xc3\xb6ckner', 'Cope', 'Toyota', 'Jair', 'Herman', 'Chan', 'Schuebel', 'Ting', 'Chen', 'van', 'Imai', 'Yu', 'Suzuki'] | ['Yi', 'Baylin', 'Van', 'Ahuja', 'Gl\xc3\xb6ckner', 'Cope', 'Toyota', 'Jair', 'Herman', 'Chan', 'Schuebel', 'Ting', 'Chen', 'van', 'Imai', 'Yu', 'Suzuki'] | ['Yi', 'Van', 'Ahuja', 'Gl\xc3\xb6ckner', 'Cope', 'Toyota', 'Jair', 'Herman', 'Chan', 'Schuebel', 'Ting', 'van', 'Chen', 'Imai', 'Baylin', 'Yu', 'Suzuki'] | PLoS Genet | 2007 | 2007 Sep | 0 | AND pmc_gds | 1 | 0 | ||||
125 | GSE4763 | 8/8/2007 | ['4763'] | [] | [u'17892325'] | 2429944 | [u'18507500'] | ['Yi', 'Baylin', 'Van', 'Ahuja', 'Gl\xc3\xb6ckner', 'Cope', 'Toyota', 'Jair', 'Herman', 'Chan', 'Schuebel', 'Ting', 'Chen', 'van', 'Imai', 'Yu', 'Suzuki'] | ['Yi', 'Van', 'Velculescu', 'Ahuja', 'Cope', 'Glockner', 'Chan', 'Schuebel', 'Chen', 'Baylin', 'Herman'] | ['Yi', 'Van', 'Ahuja', 'Cope', 'Chan', 'Schuebel', 'Chen', 'Baylin', 'Herman'] | PLoS Med | 2008 | 5/27/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
126 | GSE4766 | 5/16/2007 | ['4766'] | [] | [u'17472752'] | 1929140 | [u'17472752'] | ['Van', 'Boyd', 'Freedman', 'Azzam', 'Haugen', 'Meyer'] | ['Van', 'Boyd', 'Freedman', 'Azzam', 'Haugen', 'Meyer'] | ['Van', 'Boyd', 'Azzam', 'Freedman', 'Haugen', 'Meyer'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
127 | GSE4771 | 5/4/2007 | ['4771'] | [] | [u'17299593'] | 1790703 | [u'17299593'] | ['Collart', 'Smith', 'Ramis'] | ['Collart', 'Smith', 'Ramis'] | ['Collart', 'Smith', 'Ramis'] | PLoS One | 2007 | 2/14/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
128 | GSE4777 | 5/5/2007 | ['4777'] | [] | [] | 1790703 | [u'17299593'] | [u'Collart', u'Smith', u'Ramis'] | ['Collart', 'Smith', 'Ramis'] | ['Collart', 'Smith', 'Ramis'] | PLoS One | 2007 | 2/14/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
129 | GSE4782 | 9/30/2007 | ['4782'] | [] | [] | 2080788 | [u'17652031'] | [u'Gayle', u'Mark', u'Eyster', u'Martin'] | ['Gayle', 'Martin', 'Eyster', 'Mark'] | ['Gayle', 'Martin', 'Eyster', 'Mark'] | Vascul Pharmacol | 2007 | 2007 Oct | 0 | eSpring software normalized the expression of each gene to the median and each slide to the 50 th percentile. The DNA microarray data presented herein have been deposited at the NCBI Gene Expression Omnibus (GEO, < www.ncbi.nlm.nih.gov/geo >) as recommended by MIAME standards ( Brazma et al., 2001 ) and can be accessed through GEO Series accession number GSE4782{{tag}}--DEPOSIT-- . 2.4. Real time RT-PCR | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
130 | GSE4786 | 5/1/2007 | ['4786'] | ['2681'] | [u'16890326'] | 2702675 | [u'18155270'] | ['Yamasoba', 'Tanokura', 'Someya', 'Prolla', 'Weindruch'] | ['Swindell'] | [] | Mech Ageing Dev | 2008 | 2008 Mar | 1 | 2. Materials and Methods Microarray datasets were obtained from Gene Expression Omnibus ( Barrett et al., 2007 ), ArrayExpress ( Parkinson et al., 2007 ) or directly from the contact author of published studies. All datasets were generated from experiments that evaluated the effects of |aximum value associated with each gene is plotted in the figure. 3.4. Effects of CR and Aging in Liver Five differential expression signatures associated with aging in liver were generated using data downloaded from the Gene Expression Omnibus and ArrayExpress databases ( GSE3129 , GSE3150 , EMEXP153, EMEXP839) ( Amador-Noguez et al., 2004 ; Boyleston et al., 2006 ; Niedernhofer et al., 2006 ). The sig|ficant overlap was found (lvr4a, lvr4b, lvr10 and lvr13). Figure 10 Effects of CR and resveratrol in liver. The effects of resveratrol in liver were evaluated by Baur et al. (2006) (Gene Expression Omnibus series GSE6089 ). Differential expression signatures from data generated by Baur et al. (2006) were compared to those|ity of Michigan Department of Pathology. Helpful comments and suggestions were provided by two anonymous reviewers. The author thanks laboratories for providing microarray data to the Gene Expression Omnibus and ArrayExpress databases, as well as researchers who responded to requests for experimental data (Yoshikazu Higami, Yinghe Hu, Patricia L. Mote, Thomas A. Prolla, Steven R. Spindler, James M. Vann, | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
131 | GSE4821 | 7/31/2007 | ['4821'] | [] | [] | 2530823 | [u'18234529'] | [u'Burger', u'Baker', u'Muzyczka', u'Mandel', u'Lopez'] | ['Burger', 'Baker', 'Muzyczka', 'Lopez', 'Mandel'] | ['Lopez', 'Burger', 'Baker', 'Mandel', 'Muzyczka'] | Neurobiol Learn Mem | 2008 | 2008 May | 0 | azma, Hingamp et al. 2001 ), Affymetrix DAT and CEL. files, as well as the TXT file that results from the former two, as well as the Dchip expression matrix have been deposited at the Gene Expression Omnibus website (GEO: http://www.ncbi.nlm.nih.gov/geo/ Accession series record number: GSE4821{{tag}}--DEPOSIT-- ). 2.6. Pathway analysis For the 85 learning-associated genes, we used a combination of bioinformatics softwa | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
132 | GSE4857 | 7/31/2007 | ['4857'] | [] | [u'17344319'] | 1874664 | [u'17344319'] | ['Mendrzyk', 'Radlwimmer', 'Schlaeger', 'Benner', 'Lichter', 'Scheurlen', 'Kulozik', 'Pfister', 'Wittmann'] | ['Mendrzyk', 'Radlwimmer', 'Schlaeger', 'Benner', 'Lichter', 'Scheurlen', 'Kulozik', 'Pfister', 'Wittmann'] | ['Mendrzyk', 'Radlwimmer', 'Schlaeger', 'Benner', 'Lichter', 'Scheurlen', 'Kulozik', 'Pfister', 'Wittmann'] | Nucleic Acids Res | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
133 | GSE4859 | 9/1/2007 | ['4859'] | ['2928', '2927'] | [u'17884332'] | 2785812 | [u'19917117'] | ['Hessner', 'Carvan', 'Hutz', u'King', 'Struble', 'Rise', 'Heiden'] | ['Kroll', 'Barkema', 'Carlon'] | [] | Algorithms Mol Biol | 2009 | 11/16/2009 | 1 | d to the local physical neighbors and to the nearest-neighbor free energy. Table 1 Overview over organisms and number of CEL-files analyzed Organism GEO # Chiptype (dimension) # files min A. Thaliana GSE4847 ATH1-121501 (712 × 712) 18 0.0259 GSE7642 ATH1-121501 (712 × 712) 12 0.0544 GSE9311 ATH1-121501 (712 × 712) 8 0.0546 C. Elegans GSE6547 Celegans (712 × 712) 25 0.036|00d7; 712) 7 0.0396 D. Melanogaster GSE3990 Drosophila_2 (732 × 732) 6 0.0620 GSE6558 DrosGenome1 (640 × 640) 24 0.0605 D. Rerio GSE4859{{tag}}--REUSE-- Zebrafish (712 × 712) 8 0.0357 E. Coli GSE11779 E_coli_2 (478 × 478) 3 0.0869 GSE2928 Ecoli (544 × 544) 12 0.0172 GSE6195 E_coli_2 (478 × 478) 4 0.0664 H. Sapiens GSE10433 HG-U133A_2 (732 × 732) 12 0.0757 GSE5054|-U133A (712 × 712) 14 0.0296 GSE8514 HG-U133_Plus_2 (1164 × 1164) 15 0.0738 GSE11897 MOE430A (712 × 712) 11 0.0640 MOE430B (712 × 712) Mouse430_2 (1002 × 1002) GSE6210 Mouse430_2 12 0.0594 GSE6297 Mouse430_2 24 0.0325 O. Sativa GSE15071 Rice (1164 × 1164) 20 0.1157 R. Norvegicus GSE4494 RG_U34A (534 × 534) 59 0.0488 GSE7493 Rat230_2 (834 ×|2 (834 × 834) 4 0.0640 S. Aureus GSE7944 S_aureus (602 × 602) 6 0.0746 S. Cerevisiae GSE6073 YG_S98 (534 × 534) 12 0.0283 GSE8379 YG_S98 (534 × 534) 8 0.0180 X. Laevis GSE3368 Xenopus_laevis (712 × 712) 20 0.0514 Table 2 Optimized parameter values as obtained from the minimization of Eq. (3). A. Thaliana C. Elegans D. Melanogaster D. Rerio GEO no GSE4847 GSE7642|d in this work. Figure 5 Computed background vs. low expressed genes data . Experimental results (solid black line) and theoretical prediction (dashed red line) of probeset a) 256610_at (A. Thaliana, GSE4847, GSM109107.CEL) and b) 175270_at (C. Elegans, GSE8159, GSM201995.CEL). 
Discussion We have presented a background subtraction scheme for Affymetrix GeneExpression arrays which is both, accurate and | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
134 | GSE4859 | 9/1/2007 | ['4859'] | ['2928', '2927'] | [u'17884332'] | 2693207 | [u'17884332'] | ['Hessner', 'Carvan', 'Hutz', u'King', 'Struble', 'Rise', 'Heiden'] | ['Hessner', 'Carvan', 'Hutz', 'Struble', 'Rise', 'Heiden'] | ['Hessner', 'Carvan', 'Hutz', 'Struble', 'Rise', 'Heiden'] | Reprod Toxicol | 2008 | 2008 Jan | 0 | AND pmc_gds | 1 | 0 | ||||
135 | GSE4903 | 9/12/2007 | ['4903'] | [] | [u'17822691'] | 2254939 | [u'17822691'] | ['Kern', 'Wessels', 'Chintalapudi', 'Phelps', 'Wirrig', 'Hoffman', 'Fresco', 'Snarr', 'Barth', 'Mjaatvedt', "O'neal", 'Trusk', 'Argraves', 'Toole'] | ['Kern', 'Wessels', 'Chintalapudi', 'Phelps', 'Wirrig', 'Hoffman', 'Fresco', 'Snarr', 'Barth', 'Mjaatvedt', "O'neal", 'Trusk', 'Argraves', 'Toole'] | ['Kern', 'Wessels', 'Chintalapudi', 'Phelps', 'Toole', 'Hoffman', 'Fresco', 'Snarr', 'Barth', 'Mjaatvedt', "O'neal", 'Trusk', 'Argraves', 'Wirrig'] | Dev Biol | 2007 | 10/15/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
136 | GSE4918 | 11/21/2007 | ['4918'] | [] | [u'18030337'] | 2065904 | [u'18030337'] | [u'Hornsh\xf8j', 'Hedegaard', u'S\xf8rensen', 'S\xc3\xb8rensen', 'Bendixen', 'Panitz', 'Conley', 'Hornsh\xc3\xb8j'] | ['Hedegaard', 'S\xc3\xb8rensen', 'Bendixen', 'Panitz', 'Conley', 'Hornsh\xc3\xb8j'] | ['Hedegaard', 'S\xc3\xb8rensen', 'Bendixen', 'Panitz', 'Conley', 'Hornsh\xc3\xb8j'] | PLoS One | 2007 | 11/21/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
137 | GSE4923 | 7/31/2007 | ['4923'] | [] | [u'17466076'] | 1896016 | [u'17466076'] | ['Angulo', 'Beltran', 'Corominas', 'Serras', 'Pignatelli'] | ['Angulo', 'Beltran', 'Corominas', 'Serras', 'Pignatelli'] | ['Angulo', 'Beltran', 'Corominas', 'Serras', 'Pignatelli'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
138 | GSE4966 | 5/20/2007 | ['4966'] | [] | [u'17355627'] | 1852312 | [u'17355627'] | ['Rocha', 'Ulian', 'Menossi', 'Hemerly', u'Drummond', 'Medeiros', 'Figueira', 'Vinagre', 'Souza', 'Nishiyama', u'Nishiyama-Jr', 'Barsalobres', 'de', 'Vicentini', 'Silva-Filho', 'Galbiatti', 'V\xc3\xaancio', 'Rodrigues', 'Almeida', 'Zingaretti', 'Papini-Terzi', 'Duarte', u'V\xeancio'] | ['Medeiros', 'Barsalobres', 'Almeida', 'Rodrigues', 'Papini-Terzi', 'de', 'Vicentini', 'Vinagre', 'Figueira', 'Duarte', 'Rocha', 'Hemerly', 'Ulian', 'Souza', 'Nishiyama', 'Zingaretti', 'Galbiatti', 'Menossi', 'Silva-Filho', 'V\xc3\xaancio'] | ['Rodrigues', 'Vinagre', 'Barsalobres', 'Almeida', 'Hemerly', 'Papini-Terzi', 'de', 'Medeiros', 'Silva-Filho', 'Figueira', 'Duarte', 'Vicentini', 'Ulian', 'Souza', 'Nishiyama', 'Zingaretti', 'Galbiatti', 'Menossi', 'Rocha', 'V\xc3\xaancio'] | BMC Genomics | 2007 | 3/13/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
139 | GSE4971 | 5/20/2007 | ['4971'] | [] | [u'17355627'] | 1852312 | [u'17355627'] | ['Rocha', 'Ulian', 'Menossi', 'Hemerly', 'Medeiros', 'Figueira', 'Vinagre', 'Souza', 'Nishiyama', u'Nishiyama-Jr', 'Barsalobres', u'di', 'de', 'Vicentini', 'Silva-Filho', 'Galbiatti', 'V\xc3\xaancio', 'Rodrigues', 'Almeida', 'Zingaretti', 'Papini-Terzi', 'Duarte', u'V\xeancio'] | ['Medeiros', 'Barsalobres', 'Almeida', 'Rodrigues', 'Papini-Terzi', 'de', 'Vicentini', 'Vinagre', 'Figueira', 'Duarte', 'Rocha', 'Hemerly', 'Ulian', 'Souza', 'Nishiyama', 'Zingaretti', 'Galbiatti', 'Menossi', 'Silva-Filho', 'V\xc3\xaancio'] | ['Rodrigues', 'Vinagre', 'Barsalobres', 'Almeida', 'Hemerly', 'Papini-Terzi', 'de', 'Medeiros', 'Silva-Filho', 'Figueira', 'Duarte', 'Vicentini', 'Ulian', 'Souza', 'Nishiyama', 'Zingaretti', 'Galbiatti', 'Menossi', 'Rocha', 'V\xc3\xaancio'] | BMC Genomics | 2007 | 3/13/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
140 | GSE4972 | 6/1/2007 | ['4972'] | [] | [u'17534440'] | 1871610 | [u'17534440'] | ['Zhou', 'Lambert', 'Haddad', 'Morcillo', 'Xue', 'Chen', 'White'] | ['Zhou', 'Lambert', 'Haddad', 'Morcillo', 'Xue', 'Chen', 'White'] | ['Zhou', 'Lambert', 'Haddad', 'Morcillo', 'Xue', 'Chen', 'White'] | PLoS One | 2007 | 5/30/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
141 | GSE4992 | 12/29/2007 | ['4992'] | [] | [] | 2693232 | [u'18841463'] | [u'Paredes', u'Papoutsakis', u'Alsaker'] | ['Alexe', 'Tan', 'Reiss'] | [] | Breast Cancer Res Treat | 2009 | 2009 Jun | 0 | ood that TGFβ signaling is active in that tumor. Each bar represents an individual tumor specimen. Three different breast cancer expression profile data sets were analyzed. These included the GSE_2034 data of 286 specimens described by Wang et al. [ 146 ], the GSE_4992{{tag}}--REUSE-- data of 249 specimens described by Ivshina et al. [ 181 ] and the GSE_7390 data on 165 specimens reported by Desmedt et al. [ 1|reast cancer subsets. As shown in Fig. 3B , the TBRS MSKCC was strongly positively associated with tumors in the HER2(NI), BA2 and LA1 subsets. These results were validated across three independent publicly available breast cancer expression data sets from different centers [ 147 , 180 , 181 ]. Moreover, the TBRS CINJ gave identical results to that developed by Padua et al. [ 147 ]. The principal dif | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
142 | GSE5054 | 8/24/2007 | ['5054'] | ['2939', '2938'] | [u'17640998'] | 2785812 | [u'19917117'] | ['Doherty', 'Van', 'Wang', 'Gauger', 'Baker', 'Fan', 'Kuick'] | ['Kroll', 'Barkema', 'Carlon'] | [] | Algorithms Mol Biol | 2009 | 11/16/2009 | 1 | d to the local physical neighbors and to the nearest-neighbor free energy. Table 1 Overview over organisms and number of CEL-files analyzed Organism GEO # Chiptype (dimension) # files min A. Thaliana GSE4847 ATH1-121501 (712 × 712) 18 0.0259 GSE7642 ATH1-121501 (712 × 712) 12 0.0544 GSE9311 ATH1-121501 (712 × 712) 8 0.0546 C. Elegans GSE6547 Celegans (712 × 712) 25 0.036|× 712) 7 0.0396 D. Melanogaster GSE3990 Drosophila_2 (732 × 732) 6 0.0620 GSE6558 DrosGenome1 (640 × 640) 24 0.0605 D. Rerio GSE4859 Zebrafish (712 × 712) 8 0.0357 E. Coli GSE11779 E_coli_2 (478 × 478) 3 0.0869 GSE2928 Ecoli (544 × 544) 12 0.0172 GSE6195 E_coli_2 (478 × 478) 4 0.0664 H. Sapiens GSE10433 HG-U133A_2 (732 × 732) 12 0.0757 GSE5054{{tag}}--REUSE--|-U133A (712 × 712) 14 0.0296 GSE8514 HG-U133_Plus_2 (1164 × 1164) 15 0.0738 GSE11897 MOE430A (712 × 712) 11 0.0640 MOE430B (712 × 712) Mouse430_2 (1002 × 1002) GSE6210 Mouse430_2 12 0.0594 GSE6297 Mouse430_2 24 0.0325 O. Sativa GSE15071 Rice (1164 × 1164) 20 0.1157 R. Norvegicus GSE4494 RG_U34A (534 × 534) 59 0.0488 GSE7493 Rat230_2 (834 ×|2 (834 × 834) 4 0.0640 S. Aureus GSE7944 S_aureus (602 × 602) 6 0.0746 S. Cerevisiae GSE6073 YG_S98 (534 × 534) 12 0.0283 GSE8379 YG_S98 (534 × 534) 8 0.0180 X. Laevis GSE3368 Xenopus_laevis (712 × 712) 20 0.0514 Table 2 Optimized parameter values as obtained from the minimization of Eq. (3). A. Thaliana C. Elegans D. Melanogaster D. Rerio GEO no GSE4847 GSE7642|d in this work. Figure 5 Computed background vs. low expressed genes data . Experimental results (solid black line) and theoretical prediction (dashed red line) of probeset a) 256610_at (A. Thaliana, GSE4847, GSM109107.CEL) and b) 175270_at (C. Elegans, GSE8159, GSM201995.CEL).
Discussion We have presented a background subtraction scheme for Affymetrix GeneExpression arrays which is both, accurate and | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
143 | GSE5083 | 3/27/2007 | ['5083'] | [] | [] | 1871584 | [u'17428342'] | [''] | ['Zhu', 'Wan', 'Xue', 'Tu', 'Leng', 'Wang', 'Zhang', 'Shen', 'Sun', 'Ding', 'Li', 'Peng', 'Liu', 'Yang', 'Jin', 'Dong', 'Chen', 'Ma', 'Qian', 'Yu', 'Xu'] | [] | BMC Genomics | 2007 | 4/11/2007 | 0 | nBank under accession numbers: [ DW405580 – DW407270 and DW678211 – DW711189 ]. The microarray related data were submitted to Gene Expression Omnibus (GEO) under accession number: [ GSE5083{{tag}}--DEPOSIT-- ] Authors' contributions TL carried out the T. rubrum functional genomics studies, EST sequence analysis, cDNA microarray preparation, the gene expression analysis and participated in drafting th | 0 | 1 | 0 | NOT pmc_gds | 0 | 1 |
144 | GSE5087 | 1/1/2007 | ['5087'] | [] | [] | 2246292 | [u'17927820'] | [u'Zhang', u'He'] | ['Feng', 'Qin', 'Jiang', 'Cheng', 'Zhang', 'Tang', 'Wei', 'Lu', 'Cao', 'Liang', 'Chen', 'Liao', 'He', 'Xu'] | [u'Zhang', u'He'] | Genome Biol | 2007 | 2007 | 0 | average linkage algorithm for aCGH analysis. All the aCGH data can be accessed at the National Center for Biotechnology Information Gene Expression Omnibus (GEO) database [ 90 ] with accession number GSE5087{{tag}}--DEPOSIT--. Putative AHD CDSs identified by aCGH were examined by PCR using the primers designed within CDSs in strain 8004. The oligonucleotide primers and the PCR results are shown in Additional data file 3 | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
145 | GSE5091 | 6/16/2007 | ['5091'] | [] | [u'17555569'] | 1920521 | [u'17555569'] | ['Houlihan', 'Secombes', 'Zou', 'Martin'] | ['Houlihan', 'Secombes', 'Zou', 'Martin'] | ['Houlihan', 'Secombes', 'Zou', 'Martin'] | BMC Genomics | 2007 | 6/7/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
146 | GSE5108 | 2/15/2007 | ['5108'] | ['3092'] | [u'17462640'] | 2752458 | [u'19735579'] | ['Klinkova', 'Hansen', 'Eyster', 'Kennedy'] | ['Zhao', 'He', 'Wang', 'Pan', 'Bai'] | [] | Reprod Biol Endocrinol | 2009 | 9/8/2009 | 1 | microarray raw or normalized data are available. Finally six public gene expression data sets were involved in our study, which assessed endometriosis transcripts on a genome-wide basis. In data set GSE7307, total 677 samples from more than 90 distinct tissue types were processed, but only the profiles related to endometriosis and eutopic endometrium were considered here. The data generated from human| Characteristics of datasets included in the studies. First Author or Contributor Chip GEO Accession Experimental design Classification Probes Number of samples Disease Normal Sha [ 4 ] U133 PLUS 2.0 GSE7846 unpaired, HEECS ovarian 54K 5 5 Burney [ 14 ] U133 PLUS 2.0 GSE6364 unpaired, tissues Ovarian, peritoneal, rectovaginal 54K 21 16 Eyster [ 15 ] CodeLink GSE5108{{tag}}--REUSE-- paired, tissues Ovarian, peritoneal |nd adjusted, normalized and log2 probe-set intensities calculated using the Robust Multichip Averaging (RMA) algorithm in affy package [ 23 , 24 ], and the Codelink arrays normalizations performed in GSE5108{{tag}}--REUSE-- were retained. Genes which cannot be mapped to any KEGG pathway identified were excluded from the further analysis. The interquartile range (IQR) was used as a measure of variability. From the resu | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
147 | GSE5120 | 4/25/2007 | ['5120'] | [] | [u'17559304'] | 1891326 | [u'17559304'] | ['Blokesch', 'Schoolnik'] | ['Blokesch', 'Schoolnik'] | ['Blokesch', 'Schoolnik'] | PLoS Pathog | 2007 | 2007 Jun | 0 | AND pmc_gds | 1 | 0 | ||||
148 | GSE5145 | 2/20/2007 | ['5145'] | ['2628'] | [u'17213369'] | 2880288 | [u'20459635'] | ['Boss\xc3\xa9', 'Hudson', u'Boss\xe9', 'Maghni'] | ['Patel', 'Butte'] | [] | BMC Med Genomics | 2010 | 5/6/2010 | 0 | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 | |
149 | GSE5170 | 6/27/2007 | ['5170'] | [] | [u'17074098'] | 1635984 | [u'17074098'] | ['Kaukinen', u'M\xe4ki', 'Tuimala', 'Kainulainen', 'M\xc3\xa4ki', 'Juuti-Uusitalo'] | ['Juuti-Uusitalo', 'Kainulainen', 'Tuimala', 'Kaukinen', 'M\xc3\xa4ki'] | ['Juuti-Uusitalo', 'Kainulainen', 'Tuimala', 'Kaukinen', 'M\xc3\xa4ki'] | BMC Genomics | 2006 | 10/31/2006 | 0 | AND pmc_gds | 1 | 0 | ||||
150 | GSE5181 | 6/22/2007 | ['5181'] | [] | [u'18003923'] | 2141819 | [u'18003923'] | ['Telonis-Scott', 'Wayne', 'Harshman', 'Nuzhdin', 'Bono', 'Kopp', 'McIntyre'] | ['Telonis-Scott', 'Wayne', 'Harshman', 'Nuzhdin', 'Bono', 'Kopp', 'McIntyre'] | ['Wayne', 'Harshman', 'Telonis-Scott', 'Nuzhdin', 'Kopp', 'Bono', 'McIntyre'] | Proc Natl Acad Sci U S A | 2007 | 11/20/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
151 | GSE5183 | 6/22/2007 | ['5183'] | [] | [u'18003923'] | 2141819 | [u'18003923'] | ['Telonis-Scott', 'Wayne', 'Harshman', 'Nuzhdin', 'Bono', 'Kopp', 'McIntyre'] | ['Telonis-Scott', 'Wayne', 'Harshman', 'Nuzhdin', 'Bono', 'Kopp', 'McIntyre'] | ['Wayne', 'Harshman', 'Telonis-Scott', 'Nuzhdin', 'Kopp', 'Bono', 'McIntyre'] | Proc Natl Acad Sci U S A | 2007 | 11/20/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
152 | GSE5188 | 3/9/2007 | ['5188'] | [] | [u'17121677'] | 1687195 | [u'17121677'] | ['Bento', 'Deshusses', 'Schrenzel', 'Fran\xc3\xa7ois', u'Galle', 'Koessler', u'Francois', 'Huyghe', 'Charbonnier', 'Renzoni', 'Lew', 'Zimmermann-Ivol', 'Stahl-Zeng', 'Masselot', 'Gall\xc3\xa9', 'Vaudaux', 'Vaezzadeh', 'Binz', 'Scherl', 'Sanchez', 'Fischer', 'Hochstrasser'] | ['Sanchez', 'Charbonnier', 'Fischer', 'Renzoni', 'Bento', 'Deshusses', 'Binz', 'Schrenzel', 'Vaudaux', 'Scherl', 'Vaezzadeh', 'Fran\xc3\xa7ois', 'Masselot', 'Lew', 'Zimmermann-Ivol', 'Koessler', 'Stahl-Zeng', 'Gall\xc3\xa9', 'Huyghe', 'Hochstrasser'] | ['Sanchez', 'Charbonnier', 'Fischer', 'Renzoni', 'Bento', 'Deshusses', 'Binz', 'Vaudaux', 'Schrenzel', 'Vaezzadeh', 'Fran\xc3\xa7ois', 'Masselot', 'Scherl', 'Lew', 'Zimmermann-Ivol', 'Koessler', 'Stahl-Zeng', 'Gall\xc3\xa9', 'Huyghe', 'Hochstrasser'] | BMC Genomics | 2006 | 11/22/2006 | 0 | AND pmc_gds | 1 | 0 | ||||
153 | GSE5189 | 6/22/2007 | ['5189'] | [] | [u'18003923'] | 2141819 | [u'18003923'] | [u'Barmina', 'Telonis-Scott', 'Wayne', 'Harshman', 'Nuzhdin', 'Bono', 'Kopp', 'McIntyre'] | ['Telonis-Scott', 'Wayne', 'Harshman', 'Nuzhdin', 'Bono', 'Kopp', 'McIntyre'] | ['Wayne', 'Harshman', 'Telonis-Scott', 'Nuzhdin', 'Kopp', 'Bono', 'McIntyre'] | Proc Natl Acad Sci U S A | 2007 | 11/20/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
154 | GSE5190 | 6/22/2007 | ['5190'] | [] | [u'18003923'] | 2141819 | [u'18003923'] | [u'Barmina', 'Telonis-Scott', 'Wayne', 'Harshman', 'Nuzhdin', 'Bono', 'Kopp', 'McIntyre'] | ['Telonis-Scott', 'Wayne', 'Harshman', 'Nuzhdin', 'Bono', 'Kopp', 'McIntyre'] | ['Wayne', 'Harshman', 'Telonis-Scott', 'Nuzhdin', 'Kopp', 'Bono', 'McIntyre'] | Proc Natl Acad Sci U S A | 2007 | 11/20/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
155 | GSE5194 | 6/29/2007 | ['5194'] | [] | [u'17220249'] | 1828789 | [u'17220249'] | ['Harwood', 'Rey', 'Heiniger'] | ['Harwood', 'Rey', 'Heiniger'] | ['Harwood', 'Rey', 'Heiniger'] | Appl Environ Microbiol | 2007 | 2007 Mar | 0 | AND pmc_gds | 1 | 0 | ||||
156 | GSE5210 | 7/2/2007 | ['5210'] | [] | [u'17416649'] | 1913351 | [u'17416649'] | ['Grossman', 'Britton', 'K\xc3\xbcster-Sch\xc3\xb6ck', u'Kuester-Schoeck', 'Auchtung'] | ['Grossman', 'Britton', 'K\xc3\xbcster-Sch\xc3\xb6ck', 'Auchtung'] | ['Grossman', 'Britton', 'K\xc3\xbcster-Sch\xc3\xb6ck', 'Auchtung'] | J Bacteriol | 2007 | 2007 Jun | 0 | AND pmc_gds | 1 | 0 | ||||
157 | GSE5221 | 7/6/2007 | ['5221'] | [] | [u'18242515'] | 2262951 | [u'18242514'] | ['Lam', u'Kouros-Mehr', 'Poon', 'Chu', 'Ng', u'Werb', 'Yang', 'Fan', 'Ho', 'Lau', 'Yu', 'Ngai'] | ['Ewald', 'Kouros-Mehr', 'Littlepage', 'Egeblad', 'Slorach', 'Werb', 'Bechis', 'Ho', 'Pai'] | ['Werb', 'Kouros-Mehr', 'Ho'] | Cancer Cell | 2008 | 2008 Feb | 0 | Accession numbers Microarray data were submitted to the GEO Omnibus Repository with accession numbers GSE5223 and GSE5221{{tag}}--REUSE--. | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
158 | GSE5223 | 7/6/2007 | ['5223'] | [] | [u'18242515'] | 2262951 | [u'18242514'] | ['Lam', u'Kouros-Mehr', 'Poon', 'Chu', 'Ng', u'Werb', 'Yang', 'Fan', 'Ho', 'Lau', 'Yu', 'Ngai'] | ['Ewald', 'Kouros-Mehr', 'Littlepage', 'Egeblad', 'Slorach', 'Werb', 'Bechis', 'Ho', 'Pai'] | ['Werb', 'Kouros-Mehr', 'Ho'] | Cancer Cell | 2008 | 2008 Feb | 0 | Accession numbers Microarray data were submitted to the GEO Omnibus Repository with accession numbers GSE5223{{tag}}--REUSE-- and GSE5221. | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
159 | GSE5265 | 6/1/2007 | ['5265'] | [] | [u'17875704'] | 2856677 | [u'20419098'] | ['Yoo', 'Shih', 'Tang', 'Desprez', 'Wakefield', 'Parks', 'Nam', 'Vu', 'Mamura', 'Michalowska', 'Du', 'Ooshima', 'Anver'] | ['Madan', 'Yoon', 'Fang', 'Lin', 'Foltz', 'Yan', 'Kim', 'Hwang', 'Hood'] | [] | PLoS One | 2010 | 4/19/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
160 | GSE5287 | 8/1/2007 | ['5287'] | [] | [u'17671123'] | 2689870 | [u'19393097'] | ['', 'Toldbod', 'Mansilla', 'Orntoft', 'von', 'Dyrskj\xc3\xb8t', 'Koed', 'Sengel\xc3\xb8v', 'Ulh\xc3\xb8i', 'Als', 'Jensen'] | ['Nakai', 'Mizuno', 'Kitada', 'Sarai'] | [] | BMC Med Genomics | 2009 | 4/24/2009 | 1 | as possible. All tables were relationally linked and stored in the MySQL server. Table 1 Dataset content from PrognoScan Dataset Cancer type Subtype Cohort Author/Contributor Array type n Data source GSE13507 Bladder cancer Transitional cell carcinoma Cheongju Kim Human-6 v2 n = 165 GEO GSE5287{{tag}}--REUSE-- Bladder cancer Aarhus (1995–2004) Als et al . [ 10 ] HG-U133A n = 30 GEO GSE12417-GPL570 Blood cance|-GPL96 Blood cancer AML AMLCG (1999–2003) Metzeler et al . [ 11 ] HG-U133A n = 163 GEO GSE12417-GPL97 Blood cancer AML AMLCG (1999–2003) Metzeler et al . [ 11 ] HG-U133B n = 163 GEO GSE8970 Blood cancer AML San Diego Raponi et al . [ 12 ] HG-U133A n = 34 GEO GSE4475 Blood cancer B-cell lymphoma Berlin (2003–2005) Hummel et al . [ 13 ] HG-U133A n = 158 GEO E-TABM-346 Blood ca|E2658 Blood cancer Multiple myeloma Arkansas Zhan et al . [ 15 ] HG-U133_Plus_2 n = 559 GEO E-TABM-158 Breast cancer UCSF, CPMC (1989–1997) Chin et al . [ 16 ] HG-U133A n = 129 ArrayExpress GSE11121 Breast cancer Mainz (1988–1998) Schmidt et al . [ 17 ] HG-U133A n = 200 GEO GSE1378 Breast cancer MGH (1987–2000) Ma et al . [ 18 ] Arcturus 22 k n = 60 GEO GSE1379 Breast cancer|6-GPL96 Breast cancer Stockholm (1994–1996) Pawitan et al . [ 19 ] HG-U133A n = 159 GEO GSE1456-GPL97 Breast cancer Stockholm (1994–1996) Pawitan et al . [ 19 ] HG-U133B n = 159 GEO GSE2034 Breast cancer Rotterdam (1980–1995) Wang et al . [ 20 ] HG-U133A n = 286 GEO GSE2990 Breast cancer Uppsala, Oxford Sotiriou et al . [ 21 ] HG-U133A n = 187 GEO GSE3143 Breast cancer Duke |GSE3494-GPL96 Breast cancer Uppsala (1987–1989) Miller et al . 
[ 23 ] HG-U133A n = 236 GEO GSE3494-GPL97 Breast cancer Uppsala (1987–1989) Miller et al . [ 23 ] HG-U133B n = 236 GEO GSE4922-GPL96 Breast cancer Uppsala (1987–1989) Ivshina et al . [ 24 ] HG-U133A n = 249 GEO GSE4922-GPL97 Breast cancer Uppsala (1987–1989) Ivshina et al . [ 24 ] HG-U133B n = 249 GEO GSE|SE7378 Breast cancer UCSF Zhou et al . [ 26 ] U133AAofAv2 n = 54 GEO GSE7390 Breast cancer Uppsala, Oxford, Stockholm, IGR, GUYT, CRH (1980–1998) Desmedt et al . [ 27 ] HG-U133A n = 198 GEO GSE7849 Breast cancer Duke (1990–2001) Anders et al . [ 28 ] HG-U95A n = 76 GEO GSE9195 Breast cancer GUYT2 Loi et al . [ 25 ] HG-U133_Plus_2 n = 77 GEO GSE9893 Breast cancer Montpellier, Bordeau|man 21 K V12.0 n = 155 GEO GSE11595 Esophagus cancer Adenocarcinoma Sutton Giddings CRUKDMF_22 K_v1.0.0 n = 34 GEO GSE7696 Glioma Glioblastoma Lausanne Murat et al . [ 30 ] HG-U133_Plus_2 n = 70 GEO GSE4271-GPL96 Glioma MDA Phillips et al . [ 31 ] HG-U133A n = 77 GEO GSE4271-GPL97 Glioma MDA Phillips et al . [ 31 ] HG-U133B n = 77 GEO GSE2837 Head and neck cancer Squamous cell carcinoma VUMC, VAMC, |Adenocarcinoma Harvard Beer et al . [ 33 ] HG-U95A n = 84 Author's web site MICHIGAN-LC Lung cancer Adenocarcinoma Michigan (1994–2000) Beer et al . [ 33 ] HuGeneFL n = 86 Author's web site GSE11117 Lung cancer NSCLC Basel Baty Novachip human 34.5 k n = 41 GEO GSE3141 Lung cancer NSCLC Duke Bild et al . [ 22 ] HG-U133_Plus_2 n = 111 GEO GSE4716-GPL3694 Lung cancer NSCLC Nagoya (1995–|da et al . [ 34 ] GF200 n = 50 GEO GSE4716-GPL3696 Lung cancer NSCLC Nagoya (1995–1996) Tomida et al . [ 34 ] GF201 n = 50 GEO GSE8894 Lung cancer NSCLC Seoul Son HG-U133_Plus_2 n = 138 GEO GSE4573 Lung cancer Squamous cell carcinoma Michigan (1991–2002) Raponi et al . [ 35 ] HG-U133A n = 129 GEO DUKE-OC Ovarian cancer Duke Bild et al . [ 22 ] HG-U133A n = 134 Author's web site GSE8|horts. Datasets come from a number of different institutions around the world, and patient backgrounds differ. 
In addition, several datasets are based on specific subpopulations, for example, dataset GSE2034 is from lymph node-negative breast cancers, and GSE5287{{tag}}--REUSE-- is from cisplatin-containing chemotherapy-treated bladder cancers. Hence, it is possible that the specific association between gene expressio|subsequent care may affect the clinical course of a patient. 3) Experimental factors. Expression measurement of microarray is subject to various factors at the experiment level. Microdissection (e.g. GSE1378) would reduce contamination of mRNAs from non-cancer cells [ 57 ]. Formalin fixation of a sample (e.g. GSE2873) influences the quality of mRNAs [ 58 ]. Array type (e.g. Affymetrix, cDNA microarrays | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
161 | GSE5287 | 8/1/2007 | ['5287'] | [] | [u'17671123'] | 2707631 | [u'19609451'] | ['', 'Toldbod', 'Mansilla', 'Orntoft', 'von', 'Dyrskj\xc3\xb8t', 'Koed', 'Sengel\xc3\xb8v', 'Ulh\xc3\xb8i', 'Als', 'Jensen'] | ['Jiang', 'Lee', 'Zhang', 'Song', 'Liu', 'Zhao', 'Fan'] | [] | PLoS One | 2009 | 7/17/2009 | 1 | (MD Anderson cancer center database) [14] and UCSF-2 (Stanford microarray database) [15] and three HGG (grade III and GBM combined) sets from the cohorts UCLA (GEO GDS1975) [3] , MDA (GEO GDS1815) [4] , and CMBC (BROAD institute database) [2] ( Table 1 ). Among the five cohorts, UCLA, UCSF-1 and MDA have 35, 34, and|a I (63); II (20) d UM-HLM [23] Oligos Affymetrix 56 66 (10) 118 a I (160); II (48) d Bladder AUH [24] Oligos Affymetrix 13 NA 30 a III+IV (30) c Ovarium MNI(GSE8842) Spotted cDNA 80 52 (12) 13 a I (68) d a Death. b Metastasis. c Tumor grade. d Tumor stage. NA, not available. m, month. Yr, year. Ref, reference. Using the median OS as a cutoff for each cohort, w|iction of the three gene classifiers for patients with other tumor types, we obtained 12 cohorts including five breast cancer cohorts: GIS (ArrayExpress E-GEOD-3494) [16] , CRCM (GEO GSE9893) [17] , SUSM (Stanford microarray database) [18] , NCI (Rosetta inpharmatics inc database) [19] , EMC (GEO GSE2034) [20] , five l| , PCH (GEO GSE5843) [22] , CAN/DF (caArray) [23] , MSK (caArray) [23] , UM-HLM (caArray) [23] , one bladder cancer cohort AUH (GEO GSE5287{{tag}}--REUSE--) [24] , and one ovarian cancer cohort MNI (GEO GSE8842) with microarray expression data and clinicopathogical information publicly available (detailed in Materials and methods ) (|y stage I) from EMC [20] ; one bladder cancer set of 30 advanced bladder cancers from AUH [24] ; one ovarian tumor set of 68 stage I ovarian carcinomas from MNI (GEO GSE8842). For the two breast cancer cohorts NCI and EMC, where the overall survival times were unavailable, time to distant metastasis was used instead. 
For all the cohorts, we used normalized microarray d | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
162 | GSE5287 | 8/1/2007 | ['5287'] | [] | [u'17671123'] | 2863163 | [u'20334636'] | ['', 'Toldbod', 'Mansilla', 'Orntoft', 'von', 'Dyrskj\xc3\xb8t', 'Koed', 'Sengel\xc3\xb8v', 'Ulh\xc3\xb8i', 'Als', 'Jensen'] | ['Rouam', 'Moreau', 'Bro\xc3\xabt'] | [] | BMC Bioinformatics | 2010 | 3/24/2010 | 0 | e sizes which samples were hybridized on a same platform (Affymetrix HU133 Plus 2.0 or HU133A ; Affymetrix, Santa Clara, CA, USA). The datasets are publicly available on the GEO site under the labels GSE2034, GSE1456, GSE11121, GSE4573, GSE5287{{tag}}, GSE4271, GSE4412 and GSE19234, respectively, and they are briefly described below. GSE2034 cohort, breast cancer [ 24 ] This series includes 286 lymph-node neg|llow-up. The median metastasis-free survival time was 80 months. The two years metastasis-free survival was 83.9% [79.8%; 88.3%], and the five years metastasis-free survival was 66.7% [61.4%; 72.4%]. GSE1456 cohort, breast cancer [ 25 ] This series comprises 159 primary breast cancer patients (referred as Stockholm cohort). Metastasis-free survival measured the time from initial therapy until the first|llow-up. The median metastasis-free survival time was 80 months. The two years metastasis-free survival was 87.9% [83.0%, 93.2%], and the five years metastasis-free survival was 77.6% [71.3%, 84.4%]. GSE11121 cohort, breast cancer [ 26 ] This series is composed of 200 lymph node-negative breast cancer patients who were not treated by systemic therapy after surgery. Metastasis-free survival was defined |low-up. The median metastasis-free survival time was 149 months. The two years metastasis-free survival was 92.9% [89.3%; 96.5%], and the five years metastasis-free survival was 85.4% [80.6%; 90.6%]. GSE4573 cohort, lung cancer [ 27 ] This series comprises 129 patients with different stages of squamous cell carcinomas, who underwent surgery resection of the lung. Overall survival was defined as the tim|y until death or last follow-up. 
The median overall survival time was 63 months. The two years overall survival was 70.5% [63.1%; 78.9%], and the five years overall survival was 56.8% [48.3%; 66.7%]. GSE5287{{tag}}--REUSE-- cohort, bladder cancer [ 28 ] This series is composed of 30 patients who received chemotherapy. Overall survival was defined as the time from first chemotherapy to death or last follow-up. The medi|resection to death or last follow-up. The median overall-survival was 21 months. The two years overall-survival was 45.5% [35.6%, 58.1%], and the five years overall-survival was 22.6% [14.7%, 34.9%]. GSE4412 cohort, glioma [ 30 ] This series includes 85 patients who suffered of glioma of grade III or IV of any histologic type. The overall survival corresponded to the time from inclusion for surgical tr | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
163 | GSE5287 | 8/1/2007 | ['5287'] | [] | [u'17671123'] | 2268699 | [u'18237400'] | ['', 'Toldbod', 'Mansilla', 'Orntoft', 'von', 'Dyrskj\xc3\xb8t', 'Koed', 'Sengel\xc3\xb8v', 'Ulh\xc3\xb8i', 'Als', 'Jensen'] | ['Christensen', 'Orntoft', 'Dyrskj\xc3\xb8t', 'Thykjaer', 'Wiuf', 'Herbsleb', 'Borre'] | ['Orntoft', 'Dyrskj\xc3\xb8t'] | BMC Cancer | 2008 | 1/31/2008 | 0 | om twenty-three patients suffering from T2-4 muscle invasive bladder tumors were also included. For description of these data, see [ 11 ]. All original data are found at [ 12 ] with accession numbers GSE3167 and GSE5287{{tag}}--DEPOSIT--. Informed consent was obtained from all enrolled patients and the protocol was approved by the Scientific Ethical Committee of Aarhus County. Gene expression was measured using Affymetr | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
164 | GSE5302 | 4/25/2007 | ['5302'] | [] | [u'17559304'] | 1891326 | [u'17559304'] | ['Blokesch', 'Schoolnik'] | ['Blokesch', 'Schoolnik'] | ['Blokesch', 'Schoolnik'] | PLoS Pathog | 2007 | 2007 Jun | 0 | AND pmc_gds | 1 | 0 | ||||
165 | GSE5303 | 4/25/2007 | ['5303'] | [] | [u'17559304'] | 1891326 | [u'17559304'] | ['Blokesch', 'Schoolnik'] | ['Blokesch', 'Schoolnik'] | ['Blokesch', 'Schoolnik'] | PLoS Pathog | 2007 | 2007 Jun | 0 | AND pmc_gds | 1 | 0 | ||||
166 | GSE5310 | 3/2/2007 | ['5310'] | [] | [u'17242213'] | 2796272 | [u'19633971'] | ['Collins', 'Jewell', 'Cidlowski', 'Lewis-Tuffin', 'Bienstock'] | ['Kino', 'Chrousos', 'Su'] | [] | Cell Mol Life Sci | 2009 | 2009 Nov | 0 | corticoids. Helix 12 changes its localization dramatically upon binding to ligands, playing a critical role in the formation of a binding surface for the coactivator (LXXLL) motif. Image sources were downloaded from the RCSB Protein Data Bank ( http://www.rcsb.org ), while the images were created using the MacPyMOL software. Yellow bold arrow: ligand-binding pocket; white arrow: Helix 12; white arrowhead: We have compared the microarray results obtained by us and those of others [10,11], and found that the 2 studies share 78 genes modulated by overexpression of GR_ (Table 1). --REUSE-- | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
167 | GSE5310 | 3/2/2007 | ['5310'] | [] | [u'17242213'] | 1820503 | [u'17242213'] | ['Collins', 'Jewell', 'Cidlowski', 'Lewis-Tuffin', 'Bienstock'] | ['Collins', 'Jewell', 'Cidlowski', 'Lewis-Tuffin', 'Bienstock'] | ['Collins', 'Jewell', 'Cidlowski', 'Bienstock', 'Lewis-Tuffin'] | Mol Cell Biol | 2007 | 2007 Mar | 0 | AND pmc_gds | 1 | 0 | ||||
168 | GSE5321 | 4/30/2007 | ['5321'] | [] | [] | 2329912 | [u'18241855'] | [u'Kong', u'Pignoni', u'Yang-Zhou', u'Ranade'] | ['McDonald', 'Kong', 'Yang-Zhou', 'Cook', 'Pignoni', 'Ranade'] | [u'Kong', u'Pignoni', u'Yang-Zhou', u'Ranade'] | Dev Biol | 2008 | 3/15/2008 | 0 | in immunofluorescent stainings or β-Galactosidase activity measurements ( Lee and Carthew, 2003 ). Microarray analysis Microarray data have been deposited in NCBI’s Gene Expression Omnibus with the GEO Series accession number GSE5321{{tag}}--DEPOSIT-- ( http://www.ncbi.nlm.nih.gov/geo/ ). High density oligonucleotide microarrays covering the Drosophila melanogaster genome (DrosGenome1) from Affymetr | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
169 | GSE5323 | 8/22/2007 | ['5323'] | [] | [u'17137508'] | 1693924 | [u'17137508'] | ['Nielsen', 'Xue-Franz\xc3\xa9n', 'Kjaerulff', 'Wright', u'Kjarulff', u'Xue-Franzen', 'Holmberg'] | ['Holmberg', 'Wright', 'Nielsen', 'Xue-Franz\xc3\xa9n', 'Kjaerulff'] | ['Holmberg', 'Wright', 'Nielsen', 'Xue-Franz\xc3\xa9n', 'Kjaerulff'] | BMC Genomics | 2006 | 11/30/2006 | 0 | AND pmc_gds | 1 | 0 | ||||
170 | GSE5325 | 4/7/2007 | ['5325'] | [] | [u'17452630'] | 2481503 | [u'18559090'] | ['Koujak', 'Johansson', 'Maurer', 'Hibshoosh', 'Memeo', 'Ferrando', 'Ringn\xc3\xa9r', 'Malmstr\xc3\xb6m', 'Rosen', 'Parsons', 'Bendahl', 'Holm', 'Saal', 'She', 'Borg', u'Malmstrom', 'Gruvberger-Saal', u'Ringner', 'Isola'] | ['Hegardt', 'Grabau', 'Honeth', 'Ringn\xc3\xa9r', 'Bendahl', 'Saal', 'L\xc3\xb6vgren', 'Fern\xc3\xb6', 'Borg', 'Gruvberger-Saal'] | ['Borg', 'Bendahl', 'Gruvberger-Saal', 'Saal', 'Ringn\xc3\xa9r'] | Breast Cancer Res | 2008 | 2008 | 0 | rt, mRNA expression analysis has previously been performed using cDNA microarrays with 27,648 reporters [ 22 , 23 ]. The microarray data for these 168 tumors are available through the Gene Expression Omnibus database (accession numbers GSE6577 and GSE5325{{tag}}--DEPOSIT--). Data pre-processing and filtering for the selected 168 tumors were performed using the BioArray Software Environment [ 24 ] as previously described [ | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
171 | GSE5325 | 4/7/2007 | ['5325'] | [] | [u'17452630'] | 1855070 | [u'17452630'] | ['Koujak', 'Johansson', 'Maurer', 'Hibshoosh', 'Memeo', 'Ferrando', 'Ringn\xc3\xa9r', 'Malmstr\xc3\xb6m', 'Rosen', 'Parsons', 'Bendahl', 'Holm', 'Saal', 'She', 'Borg', u'Malmstrom', 'Gruvberger-Saal', u'Ringner', 'Isola'] | ['Koujak', 'Johansson', 'Maurer', 'Hibshoosh', 'Memeo', 'Ferrando', 'Ringn\xc3\xa9r', 'Malmstr\xc3\xb6m', 'Rosen', 'Parsons', 'Bendahl', 'Holm', 'Saal', 'She', 'Borg', 'Gruvberger-Saal', 'Isola'] | ['Koujak', 'Johansson', 'Maurer', 'Hibshoosh', 'Memeo', 'Ferrando', 'Ringn\xc3\xa9r', 'Malmstr\xc3\xb6m', 'Rosen', 'Parsons', 'Bendahl', 'Holm', 'Saal', 'She', 'Borg', 'Gruvberger-Saal', 'Isola'] | Proc Natl Acad Sci U S A | 2007 | 5/1/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
172 | GSE5327 | 5/14/2007 | ['5327'] | [] | [u'17420468'] | 2563019 | [u'18803878'] | ['van', 'Wang', 'Minn', 'Zhang', 'Gupta', 'Bos', 'Nuyten', 'Nguyen', 'Massagu\xc3\xa9', 'Padua', u'Massague', 'Kreike', 'Foekens', 'Ishwaran'] | ['Sims', 'Pepper', 'Miller', 'Clarke', 'Hey', 'Okoniewski', 'Howell', 'Smethurst'] | [] | BMC Med Genomics | 2008 | 9/21/2008 | 1 | ed in this study. Datasets No. Tumours Array express/GEO ID GeneChip ER+ Age Tumour Size (cm) FU (years) Reference Chin et al. 2006 114 E-TABM U133AA 67% 51 2.3 6.1 [ 16 ] Desmedt et al. 2007 198 GSE7390 U133A 68% 47 2.0 13.6 [ 17 ] Farmer et al 2005 49 GSE1561 U133A 58% - - - [ 11 ] Ivshina et al. 2006 249 GSE4922 U133A 85% 63 2.0 9.9 [ 18 ] Loi et al. 2007 119, 87 GSE6532 U133A, U133 plus2.|t al. 2007 58 GSE5327{{tag}}--REUSE-- U133A 0% - - 7.2 [ 33 ] Pawitan et al. 2005 159 GSE1456 U133A 83% 58 $ 2.2 $ 7.1 [ 19 ] Richardson et al. 40 GSE3744 U133 plus2.0 38% - - - [ 10 ] Sotiriou et al. 2006 101* GSE2990 U133A 71% 60 2.0 5.8 [ 20 ] Wang et al. 2005 286 GSE2034 U133A 73% 52 - 7.2 [ 52 ] Continuous variables (age, size and follow up) are given as median values, except where indicated $ the mean | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
173 | GSE5327 | 5/14/2007 | ['5327'] | [] | [u'17420468'] | 3008357 | [u'19483725'] | ['van', 'Wang', 'Minn', 'Zhang', 'Gupta', 'Bos', 'Nuyten', 'Nguyen', 'Massagu\xc3\xa9', 'Padua', u'Massague', 'Kreike', 'Foekens', 'Ishwaran'] | ['Baba', 'Gatza', 'Matsumura', 'Nevins', 'Murphy', 'Andrechek', 'Yao', 'Mori', 'Kim', 'Chang'] | [] | Oncogene | 2009 | 8/6/2009 | 0 | AND pmc_gds | 0 | 1 | ||||
174 | GSE5327 | 5/14/2007 | ['5327'] | [] | [u'17420468'] | 2077336 | [u'17894856'] | ['van', 'Wang', 'Minn', 'Zhang', 'Gupta', 'Bos', 'Nuyten', 'Nguyen', 'Massagu\xc3\xa9', 'Padua', u'Massague', 'Kreike', 'Foekens', 'Ishwaran'] | ['Wang', 'Zhang', 'Foekens', 'Klijn', 'Martens', 'Yu', 'Sieuwerts', 'Smid'] | ['Foekens', 'Wang', 'Zhang'] | BMC Cancer | 2007 | 9/25/2007 | 1 | in the signatures, genes in each of the signatures were mapped to GOBP. Data availability The microarray data analyzed in this paper have been submitted to the NCBI/Genbank GEO database (series entry GSE2034 for the first 286 patients, and GSE5327{{tag}}--REUSE-- for the additional 58 patients). The microarray and clinical data used for the independent validation testing set analysis were obtained from the GEO databas | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
175 | GSE5327 | 5/14/2007 | ['5327'] | [] | [u'17420468'] | 2527336 | [u'18684329'] | ['van', 'Wang', 'Minn', 'Zhang', 'Gupta', 'Bos', 'Nuyten', 'Nguyen', 'Massagu\xc3\xa9', 'Padua', u'Massague', 'Kreike', 'Foekens', 'Ishwaran'] | ['Reyal', 'Wessels', 'van', 'Reinders', 'Horlings'] | ['van'] | BMC Genomics | 2008 | 8/6/2008 | 1 | l measured on Human Genome HG U133A Affymetrix arrays and normalized using the same protocol. The datasets were downloaded from NCBI's Gene Expression Omnibus (GEO, ) with the following identifiers; GSE6532 [ 24 ], GSE3494 [ 18 ], GSE1456 [ 23 ], GSE7390 [ 4 ] and GSE5327{{tag}}--REUSE-- [ 22 ]. The Chin et al. [ 25 ] data set was downloaded from ArrayExpress ( , identifier E-TABM-158). To ensure comparability betw | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
176 | GSE5327 | 5/14/2007 | ['5327'] | [] | [u'17420468'] | 2602602 | [u'19104654'] | ['van', 'Wang', 'Minn', 'Zhang', 'Gupta', 'Bos', 'Nuyten', 'Nguyen', 'Massagu\xc3\xa9', 'Padua', u'Massague', 'Kreike', 'Foekens', 'Ishwaran'] | ['Remy', 'Didier', 'Granjeaud', 'Imbert', 'Bergon', 'Nguyen', 'Puthier', 'Lopez', 'Textoris'] | ['Nguyen'] | PLoS One | 2008 | 2008 | 1 | al Biology/Genomics Computational Biology/Transcriptional Regulation TranscriptomeBrowser: A Powerful and Flexible Toolbox to Explore Productively the Transcriptional Landscape of the Gene Expression Omnibus Database GEO Datamining with TBrowser Lopez Fabrice 1 2 Textoris Julien 1 2 5 Bergon Aurélie 1 2 Didier Gilles 2 3 Remy Elisabeth 2 3 Granjeaud Samuel 1 2 Imbert Jean 1 2 Nguyen Catherine 1 2|. Methodology We used a modified version of the Markov clustering algorithm to systematically extract clusters of co-regulated genes from hundreds of microarray datasets stored in the Gene Expression Omnibus database (n = 1,484). This approach led to the definition of 18,250 transcriptional signatures (TS) that were tested for functional enrichment using the DAVID knowledgebase. O|data, most generally deposited in MIAME-compliant public databases, constitute an unprecedented source of knowledge for biologists [1] . As an example, until now, the Gene Expression Omnibus repository (GEO) host approximately 8,000 experiments encompassing about 200,000 biological samples analyzed using various high through-put technologies [2] . Consequently, this repr|iated DKNN value (with k being set typically to 100 for microarrays containing 10 to 50k elements). Distributions of DKNN values observed with both an artificial and a real dataset (Complex9RN200 and GSE1456 respectively, see thereafter for a description) are shown in Figure S3A and S3B (solid curve). The asymmetrical shape of the distribution observed in Figure S3B suggests the presence of a parti|s ability to extract relevant informations from a noisy environment. 
However, a range of optimal values for inflation parameter needs to be defined to get the best results. Performances of DBF-MCL on GSE1456 dataset Next, DBF-MCL was tested with microarray data to explore its effectiveness in finding clusters of co-regulated genes. To this end, we used the microarray data from Pawitan et al. |sed for analysis. Figure S5B , shows the number of informative genes obtained with various k values. Again, two phases were observed suggesting that regions with heterogeneous densities exist in the GSE1456 dataset. As expected, the transition from dense to sparse regions was less marked than in the artificial dataset. A k value of 100 was chosen to allow the extraction of a large part of data that ca| microarray dataset. Interestingly, in all cases meaningful partitioning results were obtained using inflation parameter set to 2. 10.1371/journal.pone.0004001.g001 Figure 1 Results obtained with the GSE1456 dataset. DBF-MCL was run with GSE1456 as input (k = 100, FDR = 10%, S 1..3 , Inflation = 2). (A) Hierarchical cluster|ected genes. (D) The graph after MCL partitioning. Each point is colored according to its associated class. (E) Correspondence between hierarchical clustering and DBF-MCL results. (F) TS obtained for GSE1456 (G) Functional enrichment associated with these TS. Systematic extraction of TS We next applied DBF-MCL algorithm to all experiments performed on human, mouse and rat Affymetrix microarrays and ava|he results panel (2), the information panel (3), the plugins panel (4) and the plugin display panel (5). This example shows the expression profiles of genes contained in the TS CBE3881EB derived from GSE469 (“Temporal profiling in muscle regeneration”). Annotation panel shows that this TS is highly enriched in genes related to ATP synthesis. Meta-analysis of public microarray data using|e 1 Transcriptional signatures containing Affymetrix probes for ESR1, GATA3 and FOXA1. 
TS ID 1 Genes 2 Probes 2 Samples 2 Sample type GSE ID GPL ID Author PubMed IDs 0F2635383 1190 1572 23 Cell lines GSE6569 GPL96 Huang F et al 2007 17332353 3DE64836D 102 143 62 Tissue GSE7904 GPL570 unpublished 2007 - 59A18E225 690 893 121 Both GSE2603 GPL96 Minn AJ et al 2005 16049480 6C975B20B 88 96 26 Tissue GSE677|lein A et al 2007 17410534 6C975B290 88 96 26 Tissue GSE6596 GPL96 Klein A et al 2007 17410534 7150E17F6 868 1032 34 Cell lines GSE4668 GPL96 Coser KR et al 2003 14610279 8059848B4 200 250 251 Tissue GSE3494 GPL96 Miller LD et al 2005 16141321 84E5E1077 694 883 198 Tissue GSE7390 GPL96 Desmedt C et al 2007 17545524 8F69864F9 68 82 95 Tissue GSE5847 GPL96 Boersma BJ et al 2007 17999412 A151D5695 297 361|et al 2007 17420468 B79B1C0B9 270 380 47 Tissue GSE3744 GPL570 Richardson AL et al 2006 16473279 BDB6D8700 550 679 104 Tissue GSE3726 GPL96 Chowdary D et al 2006 16436632 D8F0B528C 125 152 159 Tissue GSE1456 GPL96 Pawitan Y et al 2005 16280042 E2E620F40 448 616 129 Tissue GSE5460 GPL570 unpublished 2007 - EA9669A21 219 251 158 Tissue GSE3143 GPL91 Bild AH et al 2006 16273092 F310ACC36 519 646 49 Tissue|[15] and in genes related to the PIR keyword “multigene family”. Furthermore, several signatures, of Table 2 are related to melanoma and six of them were observed in the GSE7127 experiment [16] . Although data from Table 2 would deserve further analysis they are most likely related to gain or loss of genetic material in tumors. Indeed, gain of 8q is freq|<1.10−20 ) for any of the human cytoband tested. TS ID 1 Enrich. 
2 Cytoband q.value Sample type GSE ID GPL ID Authors PubMed ID 3DA3C8345 24% 17q12-q21 1.7.10 −39 Skin GSE5667 GPL97 Plager DA et al 2007 17181634 43CC3EF57 9% 8q24.3 7.0.10 −32 Melanoma GSE7153 GPL570 Unpublished 2007 - 60E29DA83 16% 8q24.3 6.8.10 −24 Melanoma GSE7127 GPL570| GPL570 Johansson P et al 2007 17516929 60E6B4129 35% 20p13 1.6.10 −26 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60E96FF1E 28% 6p21.3 1.2.10 −28 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60EC95F6A 17% 7q22.1 6.3.10 −31 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60EEBD669 32% 11q23.3 1.4.10 −26 Melan|127 GPL570 Johansson P et al 2007 17516929 B4C95CF18 42% 8q24.3 1.1.10 −36 Ovary GSE6008 GPL96 Hendrix ND et al 2006 16452189 A93ED6519 16% 11q23.3 6.9.10 −23 Melanoma GSE7152 GPL570 Packer LM et al 2007 17450523 A93DB01ED 11% 7q22.1 9.5.10 −30 Melanoma GSE7152 GPL570 Packer LM et al 2007 17450523 1 Transcriptional signature ID. 2 Enrichment: Proportion o|ial platforms ( e.g. National Cancer Institute, Vanderbilt Microarray Shared Resource, Genome Institute of Singapore), several of them being related to the MicroArray Quality Control (MAQC) project (GSE5350) [26] . However, to date, systematic analysis of all experiments performed on these platforms has not been done. The flexibility of our approach also makes it possible to integrate|ry service made this work possible. Materials and Methods Microarray data retrieval Human mouse and rat microarray data derived from 30 Affymetrix microarray platforms (Supplementary Table S1 ) were downloaded from the GEO ftp site and retrieved in seriesMatrix file format ( ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/ ). SeriesMatrix are summary text files related to a GEO series Experiment (GSE) t|ile. Figure S3 Distributions of DKNN values. 
Observed DKNN values (solid line) and of a set of simulated DKNN values S (dotted line) are shown for (A) the Complex9RN200 artificial dataset and (B) the GSE1456 microarray dataset. (9.01 MB TIF) Click here for additional data file. Figure S4 Colors correspond to the clusters found using the corresponding algorithm (A) The whole dataset (9,112 points). (B) |s input using a range of k values (FDR = 10%, S1..3, Inflation = 1.2). (B) DBF-MCL was run with several microarray datasets as input (including GSE1456) using a range of k values (FDR = 10%, S1..3, Inflation = 2). (8.72 MB TIF) Click here for additional data file. Figure S6 The TBMap plugin. |for technical assistance. References 1 Stoeckert CJ Causton HC Ball CA 2002 Microarray databases: standards and ontologies. Nat Genet 32 Suppl 469 73 12454640 2 Barrett T Edgar R 2006 Gene expression omnibus: microarray data storage, submission, retrieval, and analysis. Methods Enzymol 411 352 69 16939800 3 Diehn M Sherlock G Binkley G Jin H Matese JC 2003 SOURCE: a unified genomic resource of functional | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
177 | GSE5327 | 5/14/2007 | ['5327'] | [] | [u'17420468'] | 2529088 | [u'18755890'] | ['van', 'Wang', 'Minn', 'Zhang', 'Gupta', 'Bos', 'Nuyten', 'Nguyen', 'Massagu\xc3\xa9', 'Padua', u'Massague', 'Kreike', 'Foekens', 'Ishwaran'] | ['Boersma', 'Klijn', 'Look', 'de', 'Wiemer', 'Martens', 'Foekens', 'Sieuwerts', 'Smid'] | ['Foekens'] | Proc Natl Acad Sci U S A | 2008 | 9/2/2008 | 0 | ormed using the statistical package STATA, release 10 (Stata). Pathway Analysis. Affymetrix microarray gene expression data (HG-U133A chips) previously deposited in the NCBI/GEO database (entries GSE2034 and GSE5327{{tag}}--REUSE-- ) were available from all 184 ER + patients and 114 ER − patients. As done previously ( 2 , 26 ), gene expression signals were calculated using Affymetrix GeneChip analysis | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
178 | GSE5327 | 5/14/2007 | ['5327'] | [] | [u'17420468'] | 1871856 | [u'17420468'] | ['van', 'Wang', 'Minn', 'Zhang', 'Gupta', 'Bos', 'Nuyten', 'Nguyen', 'Massagu\xc3\xa9', 'Padua', u'Massague', 'Kreike', 'Foekens', 'Ishwaran'] | ['van', 'Wang', 'Minn', 'Zhang', 'Gupta', 'Bos', 'Nuyten', 'Nguyen', 'Massagu\xc3\xa9', 'Padua', 'Kreike', 'Foekens', 'Ishwaran'] | ['van', 'Wang', 'Minn', 'Zhang', 'Gupta', 'Bos', 'Nuyten', 'Nguyen', 'Massagu\xc3\xa9', 'Padua', 'Kreike', 'Foekens', 'Ishwaran'] | Proc Natl Acad Sci U S A | 2007 | 4/17/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
179 | GSE5327 | 5/14/2007 | ['5327'] | [] | [u'17420468'] | 2749247 | [u'19573813'] | ['van', 'Wang', 'Minn', 'Zhang', 'Gupta', 'Bos', 'Nuyten', 'Nguyen', 'Massagu\xc3\xa9', 'Padua', u'Massague', 'Kreike', 'Foekens', 'Ishwaran'] | ['Wang', 'Zhang', 'Massagu\xc3\xa9', 'Gerald', 'Hudis', 'Norton', 'Foekens', 'Smid'] | ['Massagu\xc3\xa9', 'Foekens', 'Wang', 'Zhang'] | Cancer Cell | 2009 | 7/7/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
180 | GSE5339 | 1/1/2007 | ['5339'] | ['2535'] | [u'17459161'] | 1865536 | [u'17459161'] | ['Antao-Menezes', 'Pluta', 'Thomas', 'Ingram', 'Wallace', 'Mangum', 'Turpin', u'Menezes', 'Bonner'] | ['Antao-Menezes', 'Pluta', 'Thomas', 'Ingram', 'Wallace', 'Mangum', 'Turpin', 'Bonner'] | ['Antao-Menezes', 'Pluta', 'Thomas', 'Ingram', 'Wallace', 'Mangum', 'Turpin', 'Bonner'] | Respir Res | 2007 | 4/25/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
181 | GSE5342 | 2/5/2007 | ['5342'] | ['2602'] | [u'17266762'] | 1839133 | [u'17266762'] | ['Bosetti', 'Prabhu', 'Becker', 'Toscano', 'Langenbach'] | ['Bosetti', 'Prabhu', 'Becker', 'Toscano', 'Langenbach'] | ['Bosetti', 'Becker', 'Toscano', 'Prabhu', 'Langenbach'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
182 | GSE5345 | 1/1/2007 | ['5345'] | [] | [u'17263875'] | 1800835 | [u'17263875'] | ['Louro', 'Amaral', 'Festa', 'Reis', 'Nakaya', 'Verjovski-Almeida', 'da', 'Sogayar'] | ['Louro', 'Amaral', 'Festa', 'Reis', 'Nakaya', 'Verjovski-Almeida', 'da', 'Sogayar'] | ['Louro', 'Amaral', 'Festa', 'Reis', 'Nakaya', 'Verjovski-Almeida', 'da', 'Sogayar'] | BMC Biol | 2007 | 1/30/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
183 | GSE5381 | 3/1/2007 | ['5381'] | [] | [] | 2748096 | [u'19728865'] | [u'Gerrish', u'Blanchard', u'Auman', u'Chou', u'Huang', u'Jayadev'] | ['Makiguchi', 'Feng', 'Okuno', 'Tamon', 'Tsujimoto', 'Araki', 'Kunimoto', 'Niijima'] | [] | BMC Genomics | 2009 | 9/3/2009 | 1 | from stomach subregions (Additional file 4 -CSV-Gene expression profile of mast cells pooled from mouse stomach subregions) [ 20 ]. In the score-ordered results of query-1 (P-value < 0.01), GSE1827 (titled "Waldman Bladder tumors") was ranked in fourth. Moreover, the top 10 entries showed appropriate annotations related to tumors, inflammatory and immune responses (Table 2 ). For the query-2|ts, and seven entries among the top 10 were observed using rat liver samples (Table 3 ). The biological relationships among the top 10 results of query-3 (P-value < 0.02) were not clear, but GSE6192 (titled "Gene expression changes during murine mucosal mast cell in vitro differentiation") was found out in the twelfth rank (Table 4 ). These findings indicate the general applicability of GEM-T| search using the gene expression profile of human bladder cancer - top 10 entries sorted by similarity scores with lowest P-value < 0.01. 
Series Platform Description Similarity score P-value GSE6112 GPL4475 Tubercolosis and healthy infected patients PBMC_TB_vs_Pool_LTBI 1 0.0054 GSE3901 GPL3279 Response of quiescent human fibroblasts to different growth factors and serum 0.96 0.0004 GSE1726 GP|2567 Human breast tumor 0.854 0.0063 GSE60 GPL174 Diffuse large B-cell lymphoma 0.851 0.0029 GSE838 GPL564 Individual-specific variation of gene expression in peripheral blood leukocytes 0.845 0.0057 GSE3176 GPL1528 p53 In Inflamatory Stress Response 0.815 0.0001 GSE344 GPL273 Spotted long oligonucleotide arrays 0.813 0.0097 GSE7965 GPL3991 Blood and Adipose tissue samples 0.805 1.50E-03 The results w| the gene expression profile of rat chemical hepatocarcinogenesis - top 10 entries sorted by similarity scores with lowest P-value < 0.01. Series Platform Description Similarity score P-value GSE5337 GPL890 Gene Expression Profiling In Rat Smooth Muscle Cells Modulated by Rapamycin and Paclitaxel. 1 1.00E-04 GSE5860 GPL890 Gene expression analysis of rat livers after exposure to acetaminophen 0|en (APAP) Rat Liver Test Gene Expression Data Set 0.895 1.00E-04 GSE5381{{tag}}--REUSE-- GPL890 Gene expression analysis of liver and kidney following methapyrilene treatment in male Sprague-Dawley rats 0.685 0.0015 GSE791 GPL542 GH inj old liver (1-7) 0.657 5.00E-04 GSE4270 GPL890 Aging Induced Alterations in Hepatic Gene Expression of the Male Fisher Rat 0.637 0.0045 GSE3608 GPL3076 Renal medullary genes in salt-sen|ssion profile of mast cells pooled from mouse stomach subregions - top 20 entries sorted by similarity scores with lowest P-value < 0.02. 
Series Platform Description Similarity score P-value GSE3088 GPL2510 Expression profiling of Muscle tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null backgrounds 1 1.00E-04 GSE2814 GPL2510 Expression profiling of liver tissue from (C57BL/6J ×|time course: regulation of uterine genes by estradiol in ovariectomized mice 0.874 5.00E-04 GSE8104 GPL5137 Primary macrophage response to L. monocytogenes and bacteria-derived ligands 0.855 1.70E-03 GSE8100 GPL5137 WT and myd88-/- macrophage response to WT and hly- L. monocytogenes 0.847 6.00E-04 GSE2220 GPL1832 Genetic variation of gene expression is tissue specific in inbred mice 0.795 1.00E-02 GSE4|stren Behaves as a Weak Estrogen Rather than a Non-genomic Selective Activator in the Mouse Uterus 0.792 0.0034 GSE7029 GPL2510 Zfp90 Transgenic Signature in Mouse White Adipose Tissue 0.788 2.50E-03 GSE7615 GPL2884 Cancer Process Study 0.752 0.0107 GSE7600 GPL2884 Atm-/-, mTerc-/-, p53-/- triple knock-out lymphoma vs normal mouse DNA (GPL2884) 0.752 1.13E-02 GSE3086 GPL2510 Expression profiling of Adi|ose tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null backgrounds 0.726 0.009 GSE6192 GPL891 Gene expression changes during murine mucosal mast cell in vitro differentiation 0.709 1.76E-02 GSE4248 GPL891 Identification of genes regulated by RORg in mouse thymus 0.705 1.99E-02 GSE3087 GPL2510 Expression profiling of brain tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null background|nput Reduction 0.605 8.70E-03 GSE1013 GPL967 Gene Expression Profile of NHE1 Null Mutation 0.552 0.0134 GSE8625 GPL5530 Comparison of undifferentiated ES cell lines HM1, IMT11, SHBL6.3 0.488 1.82E-02 GSE8528 GPL5369 Expression analysis of gene differentially expressed in the developping ovary 0.455 0.0011 GSE3289 GPL2828 Chronic hypoxia alters the level, maturation and control of gene expression in mou | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
184 | GSE5381 | 3/1/2007 | ['5381'] | [] | [] | 1852695 | [u'17450226'] | [u'Gerrish', u'Blanchard', u'Auman', u'Chou', u'Huang', u'Jayadev'] | ['Paules', 'Gerrish', 'Blanchard', 'Auman', 'Jayadev', 'Huang', 'Chou'] | [u'Gerrish', u'Blanchard', u'Auman', u'Chou', u'Huang', u'Jayadev'] | Environ Health Perspect | 2007 | 2007 Apr | 0 | e National Center for Biotechnology Information’s Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/ ; Edgar et al. 2002 ) and are accessible through GEO Series accession number GSE5381{{tag}}--REUSE--. Results Histopathology A detailed histopathologic analysis of the rat livers from methapyrilene administration has been described previously ( Hamadeh et al. 2002 ). Briefly summarized, high-dose | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
185 | GSE5383 | 6/7/2007 | ['5383'] | [] | [u'17526797'] | 1932840 | [u'17526797'] | ['Borden', 'Paredes', 'Spath', u'Jones', 'Sillers', 'Senger', u'Cheng', 'Papoutsakis'] | ['Borden', 'Paredes', 'Spath', 'Sillers', 'Senger', 'Papoutsakis'] | ['Borden', 'Paredes', 'Spath', 'Sillers', 'Senger', 'Papoutsakis'] | Appl Environ Microbiol | 2007 | 2007 Jul | 0 | AND pmc_gds | 1 | 0 | ||||
186 | GSE5384 | 6/7/2007 | ['5384'] | [] | [u'17526797'] | 1932840 | [u'17526797'] | ['Borden', 'Paredes', 'Spath', u'Jones', 'Sillers', 'Senger', u'Cheng', 'Papoutsakis'] | ['Borden', 'Paredes', 'Spath', 'Sillers', 'Senger', 'Papoutsakis'] | ['Borden', 'Paredes', 'Spath', 'Sillers', 'Senger', 'Papoutsakis'] | Appl Environ Microbiol | 2007 | 2007 Jul | 0 | AND pmc_gds | 1 | 0 | ||||
187 | GSE5394 | 12/26/2007 | ['5394'] | [] | [u'18945907'] | 2585504 | [u'18945907'] | ['Seidl', 'Harris', 'Rubel', 'Lurie', 'Iguchi'] | ['Seidl', 'Harris', 'Rubel', 'Lurie', 'Iguchi'] | ['Seidl', 'Harris', 'Rubel', 'Lurie', 'Iguchi'] | J Neurosci | 2008 | 10/22/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
188 | GSE5401 | 7/3/2007 | ['5401'] | [] | [u'17578578'] | 1924859 | [u'17578578'] | ['Butler', 'Nevin', 'Lovley', 'Zhou', 'He'] | ['Butler', 'Nevin', 'Lovley', 'Zhou', 'He'] | ['Butler', 'Nevin', 'Lovley', 'Zhou', 'He'] | BMC Genomics | 2007 | 6/19/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
189 | GSE5422 | 6/1/2007 | ['5422'] | [] | [u'17506876'] | 1885259 | [u'17506876'] | ['Adjaye', 'Greber', 'Lehrach'] | ['Adjaye', 'Greber', 'Lehrach'] | ['Adjaye', 'Greber', 'Lehrach'] | BMC Dev Biol | 2007 | 5/16/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
190 | GSE5423 | 6/1/2007 | ['5423'] | [] | [u'17506876'] | 1885259 | [u'17506876'] | ['Adjaye', 'Greber', 'Lehrach'] | ['Adjaye', 'Greber', 'Lehrach'] | ['Adjaye', 'Greber', 'Lehrach'] | BMC Dev Biol | 2007 | 5/16/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
191 | GSE5452 | 3/29/2007 | ['5452'] | [] | [u'17386095'] | 1868932 | [u'17386095'] | ['Louro', 'Amaral', u'Almeida', 'Reis', 'Moreira', 'Nakaya', 'Verjovski-Almeida', 'Fachel', 'Lopes', 'da', 'El-Jundi'] | ['Louro', 'Amaral', 'Reis', 'Moreira', 'Nakaya', 'Verjovski-Almeida', 'Fachel', 'Lopes', 'da', 'El-Jundi'] | ['Louro', 'Amaral', 'Reis', 'Moreira', 'Nakaya', 'Verjovski-Almeida', 'Fachel', 'Lopes', 'da', 'El-Jundi'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
192 | GSE5453 | 3/29/2007 | ['5453'] | [] | [u'17386095'] | 1868932 | [u'17386095'] | ['Louro', 'Amaral', u'Almeida', 'Reis', 'Moreira', 'Nakaya', 'Verjovski-Almeida', 'Fachel', 'Lopes', 'da', 'El-Jundi'] | ['Louro', 'Amaral', 'Reis', 'Moreira', 'Nakaya', 'Verjovski-Almeida', 'Fachel', 'Lopes', 'da', 'El-Jundi'] | ['Louro', 'Amaral', 'Reis', 'Moreira', 'Nakaya', 'Verjovski-Almeida', 'Fachel', 'Lopes', 'da', 'El-Jundi'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
193 | GSE5456 | 7/17/2007 | ['5456'] | [] | [u'17331240'] | 2684550 | [u'19397818'] | ['L\xc3\xa9ger', 'Hocquette', 'Bernard', 'Cassar-Malek', u'L\xe9ger', 'Passelaigue'] | ['Picard', 'Chevalier', 'Hocquette', 'Cassar-Malek', 'Meunier', 'Chelh', 'Reecy'] | ['Cassar-Malek', 'Hocquette'] | BMC Genomics | 2009 | 4/27/2009 | 0 | performed according to recently proposed standards (MIAME consortium). Data were incorporated into the BASE database and the NCBI Gene Expression Omnibus (GEO) and are accessible through GEO Series GSE5561 and GSE5456{{tag}}--DEPOSIT--. Total RNA was extracted from muscle tissue samples with TRIZOL ® reagent (Life Technologies) according to the manufacturer's recommendation. The RNA was then purified and trea | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
194 | GSE5456 | 7/17/2007 | ['5456'] | [] | [u'17331240'] | 1831773 | [u'17331240'] | ['L\xc3\xa9ger', 'Hocquette', 'Bernard', 'Cassar-Malek', u'L\xe9ger', 'Passelaigue'] | ['L\xc3\xa9ger', 'Bernard', 'Cassar-Malek', 'Passelaigue', 'Hocquette'] | ['L\xc3\xa9ger', 'Bernard', 'Cassar-Malek', 'Passelaigue', 'Hocquette'] | BMC Genomics | 2007 | 3/1/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
195 | GSE5458 | 5/2/2007 | ['5458'] | ['2702'] | [u'16923961'] | 1636731 | [u'16923961'] | ['Lyu', 'Azarova', 'Wang', 'Lin', 'Liu', 'Cai'] | ['Lyu', 'Azarova', 'Wang', 'Lin', 'Liu', 'Cai'] | ['Lyu', 'Azarova', 'Wang', 'Lin', 'Liu', 'Cai'] | Mol Cell Biol | 2006 | 2006 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
196 | GSE5460 | 5/26/2007 | ['5460'] | [] | [u'18297396'] | 2602602 | [u'19104654'] | ['Richardson', 'Lu', 'Wang', 'Iglehart', 'Zhang'] | ['Remy', 'Didier', 'Granjeaud', 'Imbert', 'Bergon', 'Nguyen', 'Puthier', 'Lopez', 'Textoris'] | [] | PLoS One | 2008 | 2008 | 1 | iated DKNN value (with k being set typically to 100 for microarrays containing 10 to 50k elements). Distributions of DKNN values observed with both an artificial and a real dataset (Complex9RN200 and GSE1456 respectively, see thereafter for a description) are shown in Figure S3A and S3B (solid curve). The asymmetrical shape of the distribution observed in Figure S3B suggests the presence of a parti|s ability to extract relevant informations from a noisy environment. However, a range of optimal values for inflation parameter needs to be defined to get the best results. Performances of DBF-MCL on GSE1456 dataset Next, DBF-MCL was tested with microarray data to explore its effectiveness in finding clusters of co-regulated genes. To this end, we used the microarray data from Pawitan et al. |sed for analysis. Figure S5B , shows the number of informative genes obtained with various k values. Again, two phases were observed suggesting that regions with heterogeneous densities exist in the GSE1456 dataset. As expected, the transition from dense to sparse regions was less marked than in the artificial dataset. A k value of 100 was chosen to allow the extraction of a large part of data that ca| microarray dataset. Interestingly, in all cases meaningful partitioning results were obtained using inflation parameter set to 2. 10.1371/journal.pone.0004001.g001 Figure 1 Results obtained with the GSE1456 dataset. DBF-MCL was run with GSE1456 as input (k = 100, FDR = 10%, S 1..3 , Inflation = 2). (A) Hierarchical cluster|ected genes. (D) The graph after MCL partitioning. Each point is colored according to its associated class. (E) Correspondence between hierarchical clustering and DBF-MCL results. 
(F) TS obtained for GSE1456 (G) Functional enrichment associated with these TS. Systematic extraction of TS We next applied DBF-MCL algorithm to all experiments performed on human, mouse and rat Affymetrix microarrays and ava|he results panel (2), the information panel (3), the plugins panel (4) and the plugin display panel (5). This example shows the expression profiles of genes contained in the TS CBE3881EB derived from GSE469 (“Temporal profiling in muscle regeneration”). Annotation panel shows that this TS is highly enriched in genes related to ATP synthesis. Meta-analysis of public microarray data using|e 1 Transcriptional signatures containing Affymetrix probes for ESR1, GATA3 and FOXA1. TS ID 1 Genes 2 Probes 2 Samples 2 Sample type GSE ID GPL ID Author PubMed IDs 0F2635383 1190 1572 23 Cell lines GSE6569 GPL96 Huang F et al 2007 17332353 3DE64836D 102 143 62 Tissue GSE7904 GPL570 unpublished 2007 - 59A18E225 690 893 121 Both GSE2603 GPL96 Minn AJ et al 2005 16049480 6C975B20B 88 96 26 Tissue GSE677|lein A et al 2007 17410534 6C975B290 88 96 26 Tissue GSE6596 GPL96 Klein A et al 2007 17410534 7150E17F6 868 1032 34 Cell lines GSE4668 GPL96 Coser KR et al 2003 14610279 8059848B4 200 250 251 Tissue GSE3494 GPL96 Miller LD et al 2005 16141321 84E5E1077 694 883 198 Tissue GSE7390 GPL96 Desmedt C et al 2007 17545524 8F69864F9 68 82 95 Tissue GSE5847 GPL96 Boersma BJ et al 2007 17999412 A151D5695 297 361|et al 2007 17420468 B79B1C0B9 270 380 47 Tissue GSE3744 GPL570 Richardson AL et al 2006 16473279 BDB6D8700 550 679 104 Tissue GSE3726 GPL96 Chowdary D et al 2006 16436632 D8F0B528C 125 152 159 Tissue GSE1456 GPL96 Pawitan Y et al 2005 16280042 E2E620F40 448 616 129 Tissue GSE5460{{tag}}--REUSE-- GPL570 unpublished 2007 - EA9669A21 219 251 158 Tissue GSE3143 GPL91 Bild AH et al 2006 16273092 F310ACC36 519 646 49 Tissue|[15] and in genes related to the PIR keyword “multigene family”. 
Furthermore, several signatures, of Table 2 are related to melanoma and six of them were observed in the GSE7127 experiment [16] . Although data from Table 2 would deserve further analysis they are most likely related to gain or loss of genetic material in tumors. Indeed, gain of 8q is freq|<1.10−20 ) for any of the human cytoband tested. TS ID 1 Enrich. 2 Cytoband q.value Sample type GSE ID GPL ID Authors PubMed ID 3DA3C8345 24% 17q12-q21 1.7.10 −39 Skin GSE5667 GPL97 Plager DA et al 2007 17181634 43CC3EF57 9% 8q24.3 7.0.10 −32 Melanoma GSE7153 GPL570 Unpublished 2007 - 60E29DA83 16% 8q24.3 6.8.10 −24 Melanoma GSE7127 GPL570| GPL570 Johansson P et al 2007 17516929 60E6B4129 35% 20p13 1.6.10 −26 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60E96FF1E 28% 6p21.3 1.2.10 −28 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60EC95F6A 17% 7q22.1 6.3.10 −31 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60EEBD669 32% 11q23.3 1.4.10 −26 Melan|127 GPL570 Johansson P et al 2007 17516929 B4C95CF18 42% 8q24.3 1.1.10 −36 Ovary GSE6008 GPL96 Hendrix ND et al 2006 16452189 A93ED6519 16% 11q23.3 6.9.10 −23 Melanoma GSE7152 GPL570 Packer LM et al 2007 17450523 A93DB01ED 11% 7q22.1 9.5.10 −30 Melanoma GSE7152 GPL570 Packer LM et al 2007 17450523 1 Transcriptional signature ID. 2 Enrichment: Proportion o|ial platforms ( e.g. National Cancer Institute, Vanderbilt Microarray Shared Resource, Genome Institute of Singapore), several of them being related to the MicroArray Quality Control (MAQC) project (GSE5350) [26] . However, to date, systematic analysis of all experiments performed on these platforms has not been done. The flexibility of our approach also makes it possible to integrate|ile. Figure S3 Distributions of DKNN values. Observed DKNN values (solid line) and of a set of simulated DKNN values S (dotted line) are shown for (A) the Complex9RN200 artificial dataset and (B) the GSE1456 microarray dataset. 
(9.01 MB TIF) Click here for additional data file. Figure S4 Colors correspond to the clusters found using the corresponding algorithm (A) The whole dataset (9,112 points). (B) |s input using a range of k values (FDR = 10%, S1..3, Inflation = 1.2). (B) DBF-MCL was run with several microarray datasets as input (including GSE1456) using a range of k values (FDR = 10%, S1..3, Inflation = 2). (8.72 MB TIF) Click here for additional data file. Figure S6 The TBMap plugin. |for technical assistance. References 1 Stoeckert CJ Causton HC Ball CA 2002 Microarray databases: standards and ontologies. Nat Genet 32 Suppl 469 73 12454640 2 Barrett T Edgar R 2006 Gene expression omnibus: microarray data storage, submission, retrieval, and analysis. Methods Enzymol 411 352 69 16939800 3 Diehn M Sherlock G Binkley G Jin H Matese JC 2003 SOURCE: a unified genomic resource of functional | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
197 | GSE5460 | 5/26/2007 | ['5460'] | [] | [u'18297396'] | 2872436 | [u'20335537'] | ['Richardson', 'Lu', 'Wang', 'Iglehart', 'Zhang'] | ['Potti', 'Gatza', 'Lucas', 'Nevins', 'Barry', 'Kelley', 'Datto', 'Mathey-Prevot', 'Kim', 'Crawford', 'Wang'] | ['Wang'] | Proc Natl Acad Sci U S A | 2010 | 4/13/2010 | 0 | sion Materials and Methods Supplementary Material References Materials and Methods Human Breast Tumor Samples and Cancer Cell Lines. A total of 1,143 patient samples from 10 independent datasets ( GSE1456 , GSE1561 , GSE2034 , GSE3494 , GSE3744 , GSE4922 , GSE5460{{tag}}--REUSE-- , GSE5764 , GSE6596 , and E-TABM-158) were analyzed ( 9 , 32 – 40 ). The validation dataset ( n = 547) was derived from | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
198 | GSE5460 | 5/26/2007 | ['5460'] | [] | [u'18297396'] | 2883591 | [u'20548947'] | ['Richardson', 'Lu', 'Wang', 'Iglehart', 'Zhang'] | ['Wagoner', 'Friedl', 'Gunsalus', 'Schoenike', 'Roopra', 'Richardson'] | ['Richardson'] | PLoS Genet | 2010 | 6/10/2010 | 0 | g002 Figure 2 The 24-gene signature detects loss of REST function in breast tumors. (A) Hierarchical clustering analysis is performed on gene expression microarray data from 129 breast cancer tumors (GSE5460{{tag}}--REUSE--) across the 24-gene REST gene signature. Five tumors show a concerted overexpression of REST target genes, suggesting a loss of REST repression. (B) The expression of genes significantly upregulate|<0.001, FDR q-value <0.01). 10.1371/journal.pgen.1000979.g003 Figure 3 Gene set enrichment analysis of REST–less tumors. Gene Set Enrichment Analysis of the breast tumor dataset GSE5460{{tag}}--REUSE-- shows induction of REST target genes in REST–less tumors using three separate lists of experimentally defined REST target genes. (A) The “REST gene signature” 24-gene set co|igure 4C ). 10.1371/journal.pgen.1000979.g004 Figure 4 REST mRNA levels in breast tissue. (A) Mean REST mRNA levels were assessed in REST–less and RESTfl breast tumors from microarray dataset GSE5460{{tag}}--REUSE--. All error bars represent standard error. (B) Mean REST mRNA levels were compared in normal and tumor tissues across three independent datasets, all of which show a statistically significant increa|t GDS2250, representing three distinct tumor types in addition to normal tissue. (C) REST mRNA data is presented from three independent datasets broken down by stratified by stage (E-TABM-158) grade (GSE6532) and eventual relapse (GSE2034). There is no significant difference in REST mRNA levels across any of these conditions. REST–less tumors show increased levels of the REST splice variant RES|T–less tumors, but not in any of the REST competent tumors. 
(B) Quantitative real-time RTPCR analysis of REST4 levels (relative to actin), in nine tumors represented in the microarray dataset GSE5460{{tag}}--REUSE--. REST4 mRNA, was detected in REST–less, but not RESTfl tumors after 35 cycles of amplification. (C) Patients with REST–less breast tumors in the superseries GSE6532, as defined by t|ure, show a significant decrease in their disease free survival with respect to their RESTfl counterparts (p<0.01). We then used the 24-gene signature to classify the breast tumor superseries GSE6532 into REST–less and RESTfl tumors and determined how REST status associated with patient outcome ( Figure 5C ) [26] . This analysis shows that REST–less tumors ident|n data were obtained from the NCBI Gene Expression Omnibus, and are identified by their GEO dataset record number. Dataset E-TABM-276 was downloaded from the Ensembl ArrayExpress. Analysis of dataset GSE6532 was performed to determine the aggressiveness of tumors identified as being REST–less using the gene signature method. All samples from this dataset that included information on duration of|cell lines: HEK-293, MCF10a, and T47D cells. (0.10 MB XLS) Click here for additional data file. Table S2 This table describes the gene list determined to be associated with RESTless tumors in dataset GSE5460{{tag}}--REUSE-- by class comparison. This list of genes represents all of the genes identified as more highly expressed in RESTless tumors with respect to RESTfl tumors in the GSE5460{{tag}}--REUSE-- dataset (p<e-7) by us| upon REST knockdown in at least one of three cell lines. (0.08 MB XLS) Click here for additional data file. Table S3 This table provides the gene sets used in gene set enrichment analysis of dataset GSE5460{{tag}}--REUSE--. (0.06 MB XLS) Click here for additional data file. We would like to thank John Svaren, Caroline Alexander, and members of the Roopra lab for advice with the manuscript. This manuscript is dedicate | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
199 | GSE5460 | 5/26/2007 | ['5460'] | [] | [u'18297396'] | 2909213 | [u'20668531'] | ['Richardson', 'Lu', 'Wang', 'Iglehart', 'Zhang'] | ['Selfors', 'Wang', 'Mills', 'Frye', 'Natesan', 'Lu', 'Shrestha', 'Polyak', 'Yao', 'Iida', 'Zou', 'Irie', 'Richardson', 'Brugge', 'Hahn', 'Epstein'] | ['Richardson', 'Lu', 'Wang'] | PLoS One | 2010 | 7/23/2010 | 0 | gene were estimated by averaging copy numbers from all SNPs found within the gene structure and flanking 100-kbp regions. All of the raw data were deposited into Oncomine and Gene Expression Omnibus (GSE19399) and are publicly available: ( https://www.oncomine.com/resource/login.html and http://www.ncbi.nlm.nih.gov/sites/entrez?db=gds&term=GSE19399Accession&cmd=search) . FISH analysis|er Cytogenetics Core Facility (P30 CA006516). Microarray analyses Breast tumor subtype analyses Boxplots showing the level of PTK6 mRNA in breast tumor subtypes were derived from three data sets: (1) GSE1992 downloaded from the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/projects/geo/ ) [48] , (2) Rosetta Inpharmatics ( http://www.rii.com/publications/2002/vantveer.html |[50] , [51] . Boxplots were generated in Matlab and p-values (ANOVA) were calculated in JMP 7.0. Classification of tumors in the Van't Veer data set was obtained from GEO (GSE4382) [52] . For GSE5460{{tag}}--REUSE--, raw expression values obtained from Affymetrix GENECHIP software were additionally analyzed using DNA-Chip analyzer (dChip) custom software ( www.dchip.org ). |y the 226 samples classified as ER+ by Van't Veer et al. were used. For the Wang set, PTK6 expression values and time to recurrence data were downloaded from the Gene Expression Omnibus (GEO, GSE2034). Samples were divided into three equal tertiles: 95 tumors with highest PTK6 level (>242.3); 96 tumors with intermediate PTK6 level (between 126.3 and 243.7) and 95 tumors with low | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
200 | GSE5460 | 5/26/2007 | ['5460'] | [] | [u'18297396'] | 2788761 | [u'19878873'] | ['Richardson', 'Lu', 'Wang', 'Iglehart', 'Zhang'] | ['Ni', 'Gu', 'Hahn', 'Brown', 'Zhang', 'Kaelin', 'Bommi-Reddy', 'Li', 'Cheung', 'Liu', 'Polyak', 'Richardson', 'Geisen', 'Root', 'Boehm', 'Luo'] | ['Richardson', 'Zhang'] | Cancer Cell | 2009 | 11/6/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
201 | GSE5462 | 12/10/2007 | ['5462'] | ['3116'] | [u'17885619'] | 2845644 | [u'20361040'] | ['Hampton', 'Evans', 'Miller', 'Larionov', 'Renshaw', 'Murray', 'Krause', 'Anderson', 'Ho', 'Walker', 'White', 'Dixon'] | ['Daigle', 'Cushman', 'McLaughlin', 'Tsao', 'Altman', 'Reaven', 'Cam', 'Deng'] | [] | PLoS Comput Biol | 2010 | 3/26/2010 | 0 | nificant DE genes from each dataset in its entirety; this resulted in 1122 (12.3%), 588 (4.4%), and 6002 (29.9%) DE genes for the prostate cancer, letrozole treatment (GEO ID: GSE5462{{tag}}--REUSE--), and colorectal cancer (GSE8671) datasets, respectively. After downloading the three corresponding knowledge compendia (minus the highly replicated datasets) and running SVD on each, we determined|ouse transcriptomes. Proc Natl Acad Sci U S A 99 4465 70 11904358 3 IGC 2008 expo (expression project for oncology). URL http://www.intgen.org/expo 4 Edgar R Domrachev M Lash AE 2002 Gene expression omnibus: Ncbi gene expression and hybridization array data repository. Nucleic Acids Res 30 207 10 11752295 5 Ein-Dor L Zuk O Domany E 2006 Thousands of samples are needed to generate a robust gene list for | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
202 | GSE5462 | 12/10/2007 | ['5462'] | ['3116'] | [u'17885619'] | 2831002 | [u'20064233'] | ['Hampton', 'Evans', 'Miller', 'Larionov', 'Renshaw', 'Murray', 'Krause', 'Anderson', 'Ho', 'Walker', 'White', 'Dixon'] | ['Berger', 'De', 'Pierre', 'Depiereux', 'Bareke', 'Gaigneaux'] | [] | BMC Bioinformatics | 2010 | 1/11/2010 | 0 | using the R package GCRMA [ 20 ]. As the benchmark is tested gene by gene, a pre-treatment including all Affybatch objects globally was not needed. Table 2 Datasets list Dataset Number of replicates GSE10072 107 GSE10760 98 GSE1561 49 GSE1922 49 GSE3790FC 65 GSE3790CN 70 GSE3790CB 54 GSE3846 108 GSE3910 70 GSE3912 113 GSE5388 61 GSE5392 82 GSE5462{{tag}}--REUSE-- 116 GSE5580 42 GSE5847 95 GSE646-7 93 GSE643-5 126 GSE|6 38 GSE9874b-f 60 GSE9874 60 GSE9877 47 GSE994 75 Datasets used for construction of the initial matrix. The number of replicates is the number of microarrays in the experiment. Giant datasets (e.g., GSE3790 with 202 replicates in three different brain regions) were first split into subsets according to their biological content. The datasets were then sampled as follows: when the number of replicates w | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
203 | GSE5462 | 12/10/2007 | ['5462'] | ['3116'] | [u'17885619'] | 2949641 | [u'20646288'] | ['Hampton', 'Evans', 'Miller', 'Larionov', 'Renshaw', 'Murray', 'Krause', 'Anderson', 'Ho', 'Walker', 'White', 'Dixon'] | ['Miller', 'Larionov'] | ['Miller', 'Larionov'] | Breast Cancer Res | 2010 | 2010 | 0 | 13 - 16 ]. The method adjusts for background noise on chips and summarizes data into expression values, one number per gene per sample. Primary microarray data are available from the Gene Expression Omnibus [ 17 ] with series numbers [GEO:GSE5462{{tag}}--DEPOSIT--] and [GEO:GSE20181]). Genes associated with oestrogen regulation and proliferation Marker genes classically associated with oestrogen regulation were KIAA0101|th C Smyth G Tierney L Yang JY Zhang J Bioconductor: Open software development for computational biology and bioinformatics Genome Biology 2004 5 R80 10.1186/gb-2004-5-10-r80 15461798 Gene Expression Omnibus http://www.ncbi.nlm.nih.gov/geo/ Itoh T Karlsberg K Kijima I Yuan YC Smith D Ye J Chen S Letrozole-, anastrozole- and tamoxifen-responsive genes in MCF-7aro cells: a microarray approach Mol Cancer Re | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
204 | GSE5469 | 4/16/2007 | ['5469'] | [] | [u'17537510'] | 2225591 | [u'17537510'] | ['Wu', 'Pal'] | ['Wu', 'Pal'] | ['Wu', 'Pal'] | Dev Comp Immunol | 2008 | 2008 | 0 | AND pmc_gds | 1 | 0 | ||||
205 | GSE5473 | 8/8/2007 | ['5473'] | [] | [u'17727719'] | 2928167 | [u'20682026'] | ['', 'Lam', 'Ng', 'Chari', 'Lonergan', 'MacAulay'] | ['Ng', 'Lam', 'Zhu', 'Chan', 'Tsao', 'Chari', 'Coe', 'Lonergan', 'Pikor', 'MacAulay'] | ['Chari', 'Lam', 'Lonergan', 'MacAulay', 'Ng'] | BMC Med Genomics | 2010 | 8/3/2010 | 0 | with six libraries representing lung squamous cell carcinoma and five libraries representing carcinoma in situ. This data can be found at the GEO database with the following series accession numbers: GSE3707, GSE5473{{tag}}--DEPOSIT--, and GSE7898. All samples were acquired under approval by the University of British Columbia - British Columbia Cancer Agency Research Ethics Board (UBC-BCCA-REB) and all subjects provided| 14 ] and NormFinder [ 29 ], and the variance of cycle threshold difference (dCt) across all 15 tumor/matched non-malignant sample pairs were the approaches used to determine constancy. Analysis of publicly available microarray datasets Lung NEPS genes were used to re-normalize two publicly available microarray datasets. Microarray data were obtained from GEO at NCBI under accession numbers GSE10072 [|ying these criteria, eight probes were used (Additional file 5 ), which represented genes PPP1CB , B2M , RPL4 , CAPZB , ATP5J , RAB5C , NDUFA1 , and HSPA1A . For the Agilent microarray data (GSE12428), all lung NEPS genes were represented on this microarray platform. Data was processed as described previously [ 31 ]. In the cases where lung NEPS genes were represented with multiple probes, the|he poorest is GAPDH . Demonstrating tissue specificity of reference genes To further our investigations regarding reference genes optimal for cancer cell biology, we expanded our analysis to include publicly available SAGE libraries representing normal and cancer tissue from both brain and breast. 
The results of this analysis clearly demonstrate that the reference genes identified in the lung dataset are|1.653 18 GAPDH 58 4.044 17 0.145 20 2.093 21 SLFN13 58 4.803 19 0.132 19 1.858 20 *Genes identified in this study are bolded Effect of reference genes on differential gene expression analysis Using a publicly available microarray dataset (GSE10072, [ 30 ]), differential expression analysis was performed using SAM [ 34 ]. Results from SAM were compared using the dataset normalized by MAS 5.0 alone, v|uch as Neuregulin and JAK/Stat [ 35 ], at a higher significance relative to analysis of the same dataset normalized by MAS 5.0 alone (Figure 3B ). Similarly, when evaluated using an additional publicly available lung cancer microarray dataset [ 31 ], we observe slight differences between the various pathways identified from analysis of differentially expressed genes derived from a NEPS -normalized | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
206 | GSE5473 | 8/8/2007 | ['5473'] | [] | [u'17727719'] | 2001199 | [u'17727719'] | ['', 'Lam', 'Ng', 'Chari', 'Lonergan', 'MacAulay'] | ['Chari', 'Lonergan', 'MacAulay', 'Lam', 'Ng'] | ['Chari', 'Lonergan', 'MacAulay', 'Lam', 'Ng'] | BMC Genomics | 2007 | 8/29/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
207 | GSE5473 | 8/8/2007 | ['5473'] | [] | [u'17727719'] | 2820080 | [u'20161782'] | ['', 'Lam', 'Ng', 'Chari', 'Lonergan', 'MacAulay'] | ['Lam', 'Wilson', 'Macaulay', 'Ng', 'Chari', 'Coe', 'Lonergan', 'Tsao'] | ['Chari', 'Lam', 'Lonergan', 'Ng'] | PLoS One | 2010 | 2/11/2010 | 0 | AND pmc_gds | 1 | 0 | ||||
208 | GSE5489 | 4/16/2007 | ['5489'] | [] | [u'17537510'] | 2225591 | [u'17537510'] | ['Wu', 'Pal'] | ['Wu', 'Pal'] | ['Wu', 'Pal'] | Dev Comp Immunol | 2008 | 2008 | 0 | AND pmc_gds | 1 | 0 | ||||
209 | GSE5501 | 2/1/2007 | ['5501'] | [] | [u'14623978', u'20053763'] | 2821037 | [u'20053763'] | ['Pommier', 'Young', 'Shankavaram', 'Lorenzi', 'Charboneau', 'Weinstein', 'Espina', 'Lee', 'Nishizuka', 'Munson', 'Ho', 'Reinhold', 'Kouros-Mehr', u'Munsen', 'Reimers', 'Ikediobi', 'Waltham', 'Major', 'Petricoin', 'Ziegler', 'Liotta', 'Bussey'] | ['Ikediobi', 'Weinstein', 'Pommier', 'Nishizuka', 'Shankavaram', 'Ziegler', 'Lorenzi', 'Ho', 'Bussey', 'Reimers', 'Reinhold'] | ['Ikediobi', 'Weinstein', 'Pommier', 'Nishizuka', 'Shankavaram', 'Ziegler', 'Lorenzi', 'Ho', 'Bussey', 'Reimers', 'Reinhold'] | Mol Cancer Ther | 2010 | 2010 Jan | 0 | AND pmc_gds | 1 | 0 | ||||
210 | GSE5503 | 10/31/2007 | ['5503'] | [] | [u'18178870'] | 2254543 | [u'18178870'] | ['King', 'Heller', 'Murphy', 'van', 'Grubin', 'Smith', 'Zakrzewski', 'Terwey', 'Suh', 'Liu', 'Borsotti', 'Kim', 'Chen', 'Alpdogan', 'Kochman'] | ['King', 'Heller', 'Murphy', 'van', 'Grubin', 'Smith', 'Zakrzewski', 'Terwey', 'Suh', 'Liu', 'Borsotti', 'Kim', 'Chen', 'Alpdogan', 'Kochman'] | ['King', 'Heller', 'Murphy', 'van', 'Grubin', 'Smith', 'Zakrzewski', 'Terwey', 'Suh', 'Liu', 'Borsotti', 'Kim', 'Chen', 'Alpdogan', 'Kochman'] | Blood | 2008 | 3/1/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
211 | GSE5504 | 8/10/2007 | ['5504'] | ['2856'] | [u'15668391', u'17913878'] | 2042192 | [u'17913878'] | ['Aderem', 'Stolovitzky', 'Zhou', 'Smith', 'Duggar', 'Kundaje', 'Held', 'Hood', 'Strobe', 'Haudenschild', 'Vasicek', 'Roach', 'Nissen'] | ['Aderem', 'Stolovitzky', 'Zhou', 'Smith', 'Held', 'Hood', 'Strobe', 'Haudenschild', 'Vasicek', 'Roach', 'Nissen'] | ['Aderem', 'Stolovitzky', 'Zhou', 'Smith', 'Held', 'Hood', 'Strobe', 'Haudenschild', 'Vasicek', 'Roach', 'Nissen'] | Proc Natl Acad Sci U S A | 2007 | 10/9/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
212 | GSE5510 | 8/9/2007 | ['5510'] | ['2857'] | [u'17242199'] | 1899882 | [u'17242199'] | ['Gerton', 'Wang', 'Kouadio', 'Davidson', 'Cheng', 'Goodheart', 'Page', 'Buffone'] | ['Gerton', 'Wang', 'Kouadio', 'Davidson', 'Cheng', 'Goodheart', 'Page', 'Buffone'] | ['Gerton', 'Wang', 'Kouadio', 'Davidson', 'Cheng', 'Goodheart', 'Page', 'Buffone'] | Mol Cell Biol | 2007 | 2007 Apr | 0 | AND pmc_gds | 1 | 0 | ||||
213 | GSE5543 | 2/10/2007 | ['5543'] | ['2682'] | [u'17460260'] | 2927432 | [u'20806045'] | ['Turner', 'Akinci', 'Budak', 'Wolosin'] | ['Wang', 'Zhang', 'Vellonen', 'Wolosin', 'Urtti', 'Reinach', 'Turner'] | ['Turner', 'Wolosin'] | Mol Vis | 2010 | 8/22/2010 | 0 | from the Affymetrix 95 A chip-based gene expression studies on intact human corneal epithelium from cadaver donor corneas by Turner et al. [ 20 ]. The data are available through the accession number GSE5543{{tag}}--REUSE-- at the NCBI Gene Expression Omnibus ( GEO ). The same U95 A microarray has been used to characterize global gene expression in stratified epithelia generated using the svHCECs. Briefly, svHCECs we | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
214 | GSE5543 | 2/10/2007 | ['5543'] | ['2682'] | [u'17460260'] | 2994346 | [u'21139686'] | ['Turner', 'Akinci', 'Budak', 'Wolosin'] | ['Urtti', 'Vellonen', 'Wolosin', 'Auvinen', 'Greco', 'Turner', 'Tervo', 'H\xc3\xa4kli'] | ['Turner', 'Wolosin'] | Mol Vis | 2010 | 10/15/2010 | 0 | AND pmc_gds | 1 | 0 | ||||
215 | GSE5543 | 2/10/2007 | ['5543'] | ['2682'] | [u'17460260'] | 2909883 | [u'17460260'] | ['Turner', 'Akinci', 'Budak', 'Wolosin'] | ['Turner', 'Akinci', 'Budak', 'Wolosin'] | ['Turner', 'Akinci', 'Budak', 'Wolosin'] | Invest Ophthalmol Vis Sci | 2007 | 2007 May | 0 | AND pmc_gds | 1 | 0 | ||||
216 | GSE5552 | 7/9/2007 | ['5552'] | ['2879'] | [u'17591798'] | 1951185 | [u'17591798'] | ['Englert', 'Lee', 'Jayaraman', 'Wood', 'Hegde', 'Bansal'] | ['Englert', 'Lee', 'Jayaraman', 'Wood', 'Hegde', 'Bansal'] | ['Englert', 'Lee', 'Jayaraman', 'Wood', 'Hegde', 'Bansal'] | Infect Immun | 2007 | 2007 Sep | 0 | AND pmc_gds | 1 | 0 | ||||
217 | GSE5558 | 8/17/2007 | ['5558'] | [] | [] | 1906760 | [u'17567914'] | [u'Sims', u'Joshi', u'Levy', u'Dean'] | ['Sims', 'Davies', 'Levy', 'Joshi', 'Dean'] | [u'Sims', u'Joshi', u'Levy', u'Dean'] | BMC Dev Biol | 2007 | 6/13/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
218 | GSE5559 | 1/1/2007 | ['5559'] | [] | [u'17382889'] | 2802188 | [u'20008927'] | [u'Abdullayev', 'Zhang', 'Smith', 'Loukinov', 'Ching', 'Green', 'Abdullaev', 'Kim', 'Ren', 'Lobanenkov'] | ['Kraus', 'Krishnakumar', 'Yang', 'Frizzell', 'Gamble'] | [] | Genes Dev | 2010 | 1/1/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
219 | GSE5559 | 1/1/2007 | ['5559'] | [] | [u'17382889'] | 2572726 | [u'17382889'] | [u'Abdullayev', 'Zhang', 'Smith', 'Loukinov', 'Ching', 'Green', 'Abdullaev', 'Kim', 'Ren', 'Lobanenkov'] | ['Zhang', 'Smith', 'Loukinov', 'Ching', 'Green', 'Abdullaev', 'Kim', 'Ren', 'Lobanenkov'] | ['Zhang', 'Smith', 'Loukinov', 'Ching', 'Green', 'Abdullaev', 'Kim', 'Ren', 'Lobanenkov'] | Cell | 2007 | 3/23/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
220 | GSE5561 | 8/18/2007 | ['5561'] | [] | [u'17547415'] | 2684550 | [u'19397818'] | ['Le', 'Renand', 'Hocquette', 'Bernard', 'Cassar-Malek', 'Dubroeucq', u'L\xe9ger', u'LeCunff'] | ['Picard', 'Chevalier', 'Hocquette', 'Cassar-Malek', 'Meunier', 'Chelh', 'Reecy'] | ['Cassar-Malek', 'Hocquette'] | BMC Genomics | 2009 | 4/27/2009 | 0 | performed according to recently proposed standards (MIAME consortium). Data were incorporated into the BASE database and the NCBI Gene Expression Omnibus (GEO) and are accessible through GEO Series GSE5561{{tag}}--DEPOSIT-- and GSE5456. Total RNA was extracted from muscle tissue samples with TRIZOL ® reagent (Life Technologies) according to the manufacturer's recommendation. The RNA was then purified and trea | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
221 | GSE5595 | 9/19/2007 | ['5595'] | [] | [u'17984051'] | 2748096 | [u'19728865'] | ['Wilson', 'Paules', 'Russo', 'Tennant', 'Li', 'Fannin', 'Houle', 'Boorman', 'Huang', 'Malarkey', 'Bushel', 'Heinloth', 'Ward', 'Watkins', 'Chou'] | ['Makiguchi', 'Feng', 'Okuno', 'Tamon', 'Tsujimoto', 'Araki', 'Kunimoto', 'Niijima'] | [] | BMC Genomics | 2009 | 9/3/2009 | 1 | f uncovering gene functions on a genomic scale. In recent years, vast amounts of gene expression data have been collected, much of which are available in public databases, such as the Gene Expression Omnibus (GEO). To date, most researchers have been manually retrieving data from databases through web browsers using accession numbers (IDs) or keywords, but gene-expression patterns are not considered when|r to support researchers to utilize the public gene expression data more effectively, we developed a web tool for finding similar gene expression data and generating its co-expression networks from a publicly available database. Results GEM-TREND, a web tool for searching gene expression data, allows users to search data from GEO using gene-expression signatures or gene expression ratio data as a query an|gene-expression pattern with those of GEO gene expression data. It could be a very useful resource for finding similar gene expression profiles and constructing its gene co-expression networks from a publicly available database. GEM-TREND was designed to be user-friendly and is expected to support knowledge discovery. GEM-TREND is freely available at . Background One of the major challenges in the post-g|vering gene functions on a genomic scale [ 1 ]. In recent years, vast amounts of gene expression data have been collected, much of which are available in public databases, such as the Gene Expression Omnibus (GEO) [ 2 ], ArrayExpress [ 3 ] and researchers' websites. These resources serve at least two purposes. 
One is as an archive of the data, which allows other researchers to confirm results that have b|eveloped a web tool named GEM-TREND (Gene Expression data Mining Toward Relevant Network Discovery) to automatically retrieve gene expression data across a wide range of microarray experiments in the publicly available GEO database by comparing gene-expression patterns between a query and the database entries. Subsequently, the system generates a gene co-expression network for retrieved gene expression da|, and each series links to GEO by clicking the GSE ID or GPL ID (Fig. 3e ). In addition, the series of interest can be selected for further processing. Both search results and selected series can be downloaded in CSV format. Figure 3 Screenshot of GEM-TREND . a) Query input area. The gene-expression signature, gene expression ratio data and text are accepted. Network IDs can be used to retrieve previous | from stomach subregions (Additional file 4 -CSV-Gene expression profile of mast cells pooled from mouse stomach subregions) [ 20 ]. In the score-ordered results of query-1 (P-value < 0.01), GSE1827 (titled "Waldman Bladder tumors") was ranked in fourth. Moreover, the top 10 entries showed appropriate annotations related to tumors, inflammatory and immune responses (Table 2 ). For the query-2|ts, and seven entries among the top 10 were observed using rat liver samples (Table 3 ). The biological relationships among the top 10 results of query-3 (P-value < 0.02) were not clear, but GSE6192 (titled "Gene expression changes during murine mucosal mast cell in vitro differentiation") was found out in the twelfth rank (Table 4 ). These findings indicate the general applicability of GEM-T| search using the gene expression profile of human bladder cancer - top 10 entries sorted by similarity scores with lowest P-value < 0.01. 
Series Platform Description Similarity score P-value GSE6112 GPL4475 Tubercolosis and healthy infected patients PBMC_TB_vs_Pool_LTBI 1 0.0054 GSE3901 GPL3279 Response of quiescent human fibroblasts to different growth factors and serum 0.96 0.0004 GSE1726 GP|2567 Human breast tumor 0.854 0.0063 GSE60 GPL174 Diffuse large B-cell lymphoma 0.851 0.0029 GSE838 GPL564 Individual-specific variation of gene expression in peripheral blood leukocytes 0.845 0.0057 GSE3176 GPL1528 p53 In Inflamatory Stress Response 0.815 0.0001 GSE344 GPL273 Spotted long oligonucleotide arrays 0.813 0.0097 GSE7965 GPL3991 Blood and Adipose tissue samples 0.805 1.50E-03 The results w| the gene expression profile of rat chemical hepatocarcinogenesis - top 10 entries sorted by similarity scores with lowest P-value < 0.01. Series Platform Description Similarity score P-value GSE5337 GPL890 Gene Expression Profiling In Rat Smooth Muscle Cells Modulated by Rapamycin and Paclitaxel. 1 1.00E-04 GSE5860 GPL890 Gene expression analysis of rat livers after exposure to acetaminophen 0|en (APAP) Rat Liver Test Gene Expression Data Set 0.895 1.00E-04 GSE5381 GPL890 Gene expression analysis of liver and kidney following methapyrilene treatment in male Sprague-Dawley rats 0.685 0.0015 GSE791 GPL542 GH inj old liver (1-7) 0.657 5.00E-04 GSE4270 GPL890 Aging Induced Alterations in Hepatic Gene Expression of the Male Fisher Rat 0.637 0.0045 GSE3608 GPL3076 Renal medullary genes in salt-sen|ssion profile of mast cells pooled from mouse stomach subregions - top 20 entries sorted by similarity scores with lowest P-value < 0.02. Series Platform Description Similarity score P-value GSE3088 GPL2510 Expression profiling of Muscle tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null backgrounds 1 1.00E-04 GSE2814 GPL2510 Expression profiling of liver tissue from (C57BL/6J ×|time course: regulation of uterine genes by estradiol in ovariectomized mice 0.874 5.00E-04 GSE8104 GPL5137 Primary macrophage response to L. 
monocytogenes and bacteria-derived ligands 0.855 1.70E-03 GSE8100 GPL5137 WT and myd88-/- macrophage response to WT and hly- L. monocytogenes 0.847 6.00E-04 GSE2220 GPL1832 Genetic variation of gene expression is tissue specific in inbred mice 0.795 1.00E-02 GSE4|stren Behaves as a Weak Estrogen Rather than a Non-genomic Selective Activator in the Mouse Uterus 0.792 0.0034 GSE7029 GPL2510 Zfp90 Transgenic Signature in Mouse White Adipose Tissue 0.788 2.50E-03 GSE7615 GPL2884 Cancer Process Study 0.752 0.0107 GSE7600 GPL2884 Atm-/-, mTerc-/-, p53-/- triple knock-out lymphoma vs normal mouse DNA (GPL2884) 0.752 1.13E-02 GSE3086 GPL2510 Expression profiling of Adi|ose tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null backgrounds 0.726 0.009 GSE6192 GPL891 Gene expression changes during murine mucosal mast cell in vitro differentiation 0.709 1.76E-02 GSE4248 GPL891 Identification of genes regulated by RORg in mouse thymus 0.705 1.99E-02 GSE3087 GPL2510 Expression profiling of brain tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null background|nput Reduction 0.605 8.70E-03 GSE1013 GPL967 Gene Expression Profile of NHE1 Null Mutation 0.552 0.0134 GSE8625 GPL5530 Comparison of undifferentiated ES cell lines HM1, IMT11, SHBL6.3 0.488 1.82E-02 GSE8528 GPL5369 Expression analysis of gene differentially expressed in the developping ovary 0.455 0.0011 GSE3289 GPL2828 Chronic hypoxia alters the level, maturation and control of gene expression in mou|language: Java, PHP Other requirements: Java 1.5.0 or higher License: The tool is available free of charge Any restrictions to of use by non-academics: None List of abbreviations GEO: Gene Expression Omnibus; GO: Gene Ontology; GSE: Series in GEO; GPL: Platform in GEO; MeSH: Medical Subject Headings. 
Authors' contributions CF designed the system and wrote the manuscript; MA gave comments and edited the m|l E Koller D Kim SK A Gene-Coexpression Network for Global Discovery of Conserved Genetic Modules Science 2003 302 249 255 12934013 10.1126/science.1087447 Edgar R Domrachev M Lash AE Gene Expression Omnibus: NCBI gene expression and hybridization array data repository Nucleic Acids Res 2002 30 207 210 11752295 10.1093/nar/30.1.207 Parkinson H Kapushesky M Shojatalab M Abeygunawardena N Coulson R Farne A | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
222 | GSE5595 | 9/19/2007 | ['5595'] | [] | [u'17984051'] | 2478688 | [u'18558008'] | ['Wilson', 'Paules', 'Russo', 'Tennant', 'Li', 'Fannin', 'Houle', 'Boorman', 'Huang', 'Malarkey', 'Bushel', 'Heinloth', 'Ward', 'Watkins', 'Chou'] | ['Paules', 'Heinloth', 'Zeng', 'Huang', 'Bushel'] | ['Paules', 'Heinloth', 'Huang', 'Bushel'] | BMC Genomics | 2008 | 6/16/2008 | 0 | The data is publicly available at the Chemical Effects in Biological Systems (CEBS) database under accession numbers 001-00002-0013-000-7, 001-00001-0021-000-5, 001-00001-0022-000-6, 001-00001-0028-000-2, 001-00001-0025-000-9, 001-00001-0024-000-8, 001-00001-0023-000-7, 001-00001-0026-000-0, 001-00001-0027-000-1 and 002-00001-0011-000-5 or at the Gene Expression Omnibus (GEO) database with accession number GSE5595{{tag}}--DEPOSIT--. | 0 | 0 | 0 | NOT pmc_gds | 1 | 0 |
223 | GSE5609 | 3/5/2007 | ['5609'] | [] | [u'17332508'] | 2743478 | [u'17332508'] | ['Lyons', 'Zhang', 'Li', 'Pankratz', 'Chen', 'Lavaute'] | ['Lyons', 'Zhang', 'Li', 'Pankratz', 'Chen', 'Lavaute'] | ['Lyons', 'Zhang', 'Li', 'Pankratz', 'Chen', 'Lavaute'] | Stem Cells | 2007 | 2007 Jun | 0 | AND pmc_gds | 1 | 0 | ||||
224 | GSE5615 | 1/8/2007 | ['5615'] | [] | [] | 2848248 | [u'20226034'] | [u'Emmerson', u'Brunner', u'Schildknecht', u'Townsend', u'N\xfcrnberger'] | ['Blomster', 'Saloj\xc3\xa4rvi', 'Wrzaczek', 'Jaspers', 'Vainonen', 'Kangasj\xc3\xa4rvi', 'Overmyer', 'Reddy'] | [] | BMC Genomics | 2010 | 3/12/2010 | 0 | n with decreasing short-wave cut-off in the UV range (UV-B experiment); E-MEXP-739, Syringolin A; E-MEXP-1797, Rotenone), Gene Expression Omnibus http://www.ncbi.nlm.nih.gov/geo/ (accession numbers GSE5615{{tag}}--REUSE--, Elicitors LPS, HrpZ, Flg22 and NPP1; GSE5685, Virulent and avirulent Pseudomonas syringae ; GSE9955, BTH experiment 1; GDS417 E. cichoracearum ; GSE5530, H 2 O 2 ; GSE5621, Cold time course expe| GSE5622, Osmotic stress time course experiment; GSE5623, Salt time course experiment; GSE5624, Drought time course experiment; GSE5722, O 3 ; GSE12887, Norflurazon; GSE10732, OPDA and Phytoprostane; GSE7112, ABA experiment 2) and The Integrated Microarray Database System http://ausubellab.mgh.harvard.edu/imds (Experiment name: BTH time course, BTH experiment 2). The raw Affymetrix data was preproces | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
225 | GSE5615 | 1/8/2007 | ['5615'] | [] | [] | 2811198 | [u'20126659'] | [u'Emmerson', u'Brunner', u'Schildknecht', u'Townsend', u'N\xfcrnberger'] | ['Kwezi', 'Ruzvidzo', 'Morse', 'Meier', 'Gehring', 'Donaldson'] | [] | PLoS One | 2010 | 1/26/2010 | 0 | change (log2) in expression of AtWAKL10 and all genes in the ECGG in response to selected microarray experiments. The experiments presented include; chitooctaose (30 minutes after treatment (mat), GSE8319), elf26 (30 mat, E-MEXP-547), flg22 (1 hour after treatment (hat), NASC-409), NPP1 (1 hat, GSE5615{{tag}}--REUSE--), HrpZ (1 hat, GSE5615{{tag}}--REUSE--), P. infestans (6 hat, NASC-123), B. graminis h (12 hat, GSE12856), E.|4 hat, NASC-120), BTH treatment (BTH vs. untreated (Col-0) and BTH ( npr1) vs. BTH (Col-0), 8 hat, NASC-392), mpk4 (At4g01370, E-MEXP-174), mkk1 (At4g26070) and mkk2 (At4g29810) double mutant (GSE10646), and CHX (3 hat, NASC-189). Details of the microarray experimental conditions are presented in Text S2 (Supporting Information). 10.1371/journal.pone.0008904.g007 Figure 7 Expression of AtWAKL|10 following pathogen and elicitor challenge. ( A ) Fold change in AtWAKL10 expression following incubation with the pathogen elicitors, chitooctaose (Col-0 and cerk1 (At3g21630) mutant, 30 mat, GSE8319); elf26 (30 mat, E-MEXP-547), flg22 (1 hat, NASC-409), NPP1 (1 hat, GSE5615{{tag}}--REUSE--), HrpZ (1 hat, GSE5615{{tag}}--REUSE--) and Syringolin A (12 hat, E-MEXP-739) as determined from microarray experiments. ( B ) Fold chang|owing challenge with P. infestans (6 hat, NASC-123), B. graminis h (12 hat, GSE12856), E. cichoracearum (3 dat, GSE431), G. orontii time course (Col-0 and eds16/ics1 mutant, 3, 5 and 7 dat, GSE13739), B. cinerea (18 and 48 hat, NASC-167) and OG treatment (1 hat, NASC-409) as determined from microarray experiments. ( C ) Semi-quantitative RT-PCR gel image illustrating AtWAKL10 expression o|rray experiments. 
Fold change in AtWAKL10 expression was also determined from microarray experiments in a number of pathogen-related mutants including mpk4 (E-MEXP-174), mkk1mkk2 double mutant (GSE10646) and cpr5 (At5g64930, GSE5745). ( B ) Semi-quantitative RT-PCR confirmed the induction of AtWAKL10 expression in response to 10 µM CHX (3 hat) relative to the DMSO control (C). UBQ w|.genevestigator.com ) using the stimulus and mutation tools [96] . In order to obtain greater resolution of gene expression profiles, the normalized microarray data were subsequently downloaded and analyzed for experiments that were found to induce differential expression of the genes. The data were downloaded from the following repository sites; NASCArrays ( http://affymetrix.arabidopsis | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
226 | GSE5617 | 1/22/2007 | ['5617'] | [] | [] | 2535664 | [u'18798691'] | [u'Kretsch', u'Emmerson', u'Schildknecht', u'Townsend'] | ['Hazen', 'Michael', 'Priest', 'Chory', 'Kay', 'Mockler', 'Breton'] | [] | PLoS Biol | 2008 | 9/16/2008 | 1 | data were fit using a linear model in the R Bioconductor limma package with a p < 0.01 cutoff. Datasets were downloaded from the ArrayExpress or GEO Web site: AtGenExpress light treatments GSE5617{{tag}}--REUSE-- and tissue (7-d-old cotyledons, hypocotyls, and roots), E-TABM-17; shade avoidance (low R/FR), E-MEXP-443 [ 36 ]; ckx1-ox , E-MEXP-344; DELLApenta ( ga1-3 gai-t6 rga-t2 rgl1-1 rg | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
227 | GSE5617 | 1/22/2007 | ['5617'] | [] | [] | 2660632 | [u'19244139'] | [u'Kretsch', u'Emmerson', u'Schildknecht', u'Townsend'] | ['Lee', 'Oh', 'Kang', 'Park', 'Choi', 'Yamaguchi', 'Kamiya'] | [] | Plant Cell | 2009 | 2009 Feb | 0 | AND pmc_gds | 0 | 1 | ||||
228 | GSE5620 | 1/8/2007 | ['5620'] | [] | [u'17376166'] | 2998528 | [u'21070630'] | ['Whitehead', 'Horak', 'Wanke', 'Kudla', u'D`Angelo', 'Kilian', u'Schildknecht', 'Weinl', "D'Angelo", u'Emmerson', 'Batistic', 'Harter', u'Townsend', 'Bornberg-Bauer'] | ['Lohse', 'Giorgi', 'Usadel', 'Bolger'] | [] | BMC Bioinformatics | 2010 | 11/11/2010 | 0 | quencing techniques promise to replace them in the near future [ 1 ], it would be a mistake to ignore the biological importance of the massive quantity of data already produced through this platform. Publicly available databases alone store a huge (and growing) quantity of microarray experiments (e.g. 338947 samples in Gene Expression Omnibus [ 2 ] and 251711 in ArrayExpress [ 3 ]), comprising hundreds of|technique both for differential gene expression analyses and for correlative studies based on microarray data. Methods Datasets In order to obtain a vast, robust and condition-independent dataset, we downloaded all Arabidopsis thaliana ATH1 microarrays available from GEO [ 2 ] and removed truncated or unreadable files and genomic DNA experiments. This dataset comprised 3707 arrays and is henceforth referr|sis dataset" . To test the abilities of RMA and tRMA to correctly cluster different tissue samples, we analyzed microarrays from the AtGenExpress stress study [ 31 ], contained in the Gene Expression Omnibus series GSE5620{{tag}}--REUSE---GSE5628. This dataset ( root-shoot dataset ) comprises 248 samples, evenly distributed in shoot and root tissues. To further assess sample classification performance of RMA and tRMA, w|obal view of gene activity and alternative splicing by deep sequencing of the human transcriptome Science 2008 321 956 960 10.1126/science.1160342 18599741 Edgar R Domrachev M Lash AE Gene Expression Omnibus: NCBI gene expression and hybridization array data repository Nucleic Acids Res 2002 30 207 210 10.1093/nar/30.1.207 11752295 Parkinson H Kapushesky M Shojatalab M Abeygunawardena N Coulson R Farne A | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
229 | GSE5620 | 1/8/2007 | ['5620'] | [] | [u'17376166'] | 1947970 | [u'17640358'] | ['Whitehead', 'Horak', 'Wanke', 'Kudla', u'D`Angelo', 'Kilian', u'Schildknecht', 'Weinl', "D'Angelo", u'Emmerson', 'Batistic', 'Harter', u'Townsend', 'Bornberg-Bauer'] | ['Agarwal', 'Kapoor', 'Tyagi', 'Arora', 'Singh', 'Ray'] | [] | BMC Genomics | 2007 | 7/18/2007 | 0 | Affymetrix GeneChip ® ATH1 Genome Array, from stages comparable to those used for rice was obtained from Gene Expression Omnibus (GEO) database at the NCBI under the series accession numbers GSE5620{{tag}}--REUSE--, GSE5621, GSE5623, GSE5624, GSE5629, GSE5630, GSE5631, GSE5632 and GSE5634. Total of 55 CEL files representing 21 stages of development as well as stress treatments were downloaded from [ 79 ] and |d 79 were studied using QPCR. Raw microarray data have been deposited in the Gene Expression Omnibus database at the National Center for Biotechnology Information under the series accession numbers GSE6893 and GSE6901. QPCR Real time PCR reactions were carried out using the same RNA samples, which were used for microarrays as described earlier [ 81 ]. In brief, primers were designed for all the genes | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
230 | GSE5620 | 1/8/2007 | ['5620'] | [] | [u'17376166'] | 2483354 | [u'18682831'] | ['Whitehead', 'Horak', 'Wanke', 'Kudla', u'D`Angelo', 'Kilian', u'Schildknecht', 'Weinl', "D'Angelo", u'Emmerson', 'Batistic', 'Harter', u'Townsend', 'Bornberg-Bauer'] | ['Kunova', 'Adamo', 'Meyer', 'Pinney', 'Westhead'] | [] | PLoS One | 2008 | 8/6/2008 | 0 | thin this software was used to identify groups of genes with mutually high levels of co-expression. The heat stress time series from AtGenExpress (Nover et al. Gene Expression Omnibus (GEO) Accession GSE5628) and the corresponding control time series (Townsend et al. GEO accession GSE5620{{tag}}--REUSE--) were used in detailed investigation of heat induction. These data sets were downloaded from GEO, expression summar | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
231 | GSE5620 | 1/8/2007 | ['5620'] | [] | [u'17376166'] | 2576257 | [u'18826656'] | ['Whitehead', 'Horak', 'Wanke', 'Kudla', u'D`Angelo', 'Kilian', u'Schildknecht', 'Weinl', "D'Angelo", u'Emmerson', 'Batistic', 'Harter', u'Townsend', 'Bornberg-Bauer'] | ['Kapoor', 'Lama', 'Tyagi', 'Nijhawan', 'Arora', 'Khurana'] | [] | BMC Genomics | 2008 | 10/1/2008 | 0 | oarray analysis of 17 stages was described previously [ 27 ]; deposited in the Gene Expression Omnibus database at the National Center for Biotechnology Information under the series accession numbers GSE6893 and GSE6901). Here, five more stages viz, Y-leaf, SAM (shoot apical meristem), P1-I (< 2 mm panicle), P1-II (0.2 to 0.5 mm panicle) and P1-III (5 to 10 mm panicle) have been added to emphas|® ATH1 Genome Array data for 21 stages (55 .cel files) comparable to that used for rice were downloaded from Gene Expression Omnibus (GEO) database at the NCBI under the series accession numbers GSE5620{{tag}}--REUSE--, GSE5621, GSE5623, GSE5624, GSE5629, GSE5630, GSE5631, GSE5632 and GSE5634. The data were imported in ArrayAssist™ (Stratagene, La Jolla, CA) microarray analysis software wherein GCRMA algo | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
232 | GSE5620 | 1/8/2007 | ['5620'] | [] | [u'17376166'] | 2882264 | [u'20423940'] | ['Whitehead', 'Horak', 'Wanke', 'Kudla', u'D`Angelo', 'Kilian', u'Schildknecht', 'Weinl', "D'Angelo", u'Emmerson', 'Batistic', 'Harter', u'Townsend', 'Bornberg-Bauer'] | ['Zhao', 'Ma'] | [] | J Exp Bot | 2010 | 2010 Jun | 0 | base at the NCBI website ( http://www.ncbi.nlm.nih.gov/geo/ ) and the Rice Functional Genomic Express Database ( http://signal.salk.edu/cgi-bin/RiceGE ). For temporal and spatial expression analysis (GSE6893), different stages of panicle and seed development were categorized according to panicle length and days after pollination, respectively, based on landmark developmental events as follows. (i) Pani| morphogenesis (S3); 11–20 DAP, embryo maturation (S4); 21–29 DAP, dormancy and desiccation tolerance (S5) ( Itoh et al. , 2005 , Jain et al. , 2007 ). For abiotic stress analysis (GSE6901), rice seedlings were transferred to a beaker containing 200 mM NaCl solution for salt stress, dried between folds of tissue paper at 28±1 °C in a culture room for d|of Arabidopsis AGP-encoding genes were downloaded using ‘Bulk Gene Download’ at Nottingham Arabidopsis Stock Centre's microarray database, and the results of developmental stages (GSE5629–5633) and stress treatments (GSE5620{{tag}}--REUSE--–5621 and 5623–5624) were used to analyse the expression of AGP-encoding genes in Arabidopsis ( http://affymetrix.arabidopsis.info/narr|les from UniGene at http://www.ncbi.nlm.nih.gov/unigene/ ; 12, MPSS tags, http://mpss.udel.edu/rice/ ; 13, the absolute signal values were downloaded at http://signal.salk.edu/cgi-bin/RiceGE ; 14, GSE6893, expression at various developmental stages; 15, GSE661, expression under ABA and GA treatments; 16, GSE6901, expression under abiotic stresses treatments. All PAST-rich proteins used for final ana|and OsAGP22 , and OsAGP25 and OsAGP26 ) ( Fig. 6 ). Fig. 6. Expression profiles of AGP-encoding genes in various rice organs and tissues at different development stages. The microarray data sets (GSE6893) of gene expression at various developmental stages were used for cluster display. A heat map representing hierarchical clustering of average log signal values of all the AGP-encoding genes in vari|6 OsAffx-28335-1-S1_at 5696.2 233.3 0.04 Y OsLLA7 Os-9342-1-S1_at 3120.1 1746.4 0.56 OsLLA8 OsAffx-18220-1-S1_x_at 87.0 212.5 2.44 Y a The probe ID is used on microarray plate GPL2025. The experiment GSE6893 was used for differential expression analysis. b MOV, the maximum absolute values of vegetative tissues. c MOR, the maximum absolute values of reproductive tissues. d The values of MOR divided by M|panicles. Expression analysis of rice AGP-encoding genes under abiotic stress, ABA, and GA treatments To investigate the abiotic stress response of rice AGP-encoding genes, the results of microarray (GSE6901) from 7-day-old seedlings subjected to drought, salt, and cold stresses were analysed. The data revealed that a total of 15 genes were significantly down- or up-regulated (<0.5 or >|h, and down-regulated by drought and salt stresses ( Fig. 8 ). Fig. 8. Expression profiles of rice AGP-encoding genes differentially expressed under abiotic stresses. The microarray data sets (GSE6901) of gene expression under various abiotic stresses were used for cluster display. The average log signal values of AGP-encoding genes under control and various stress conditions (indicated at the t|values) is shown at the bottom. CK, control; DS, drought stress; SS, salt stress; CS, cold stress. The results of ABA- and GA-treated callus were used to analyse the regulation of AGP-encoding genes (GSE661). It was found that many AGP-encoding genes were regulated by ABA and GA ( Supplementary Table S5 at JXB online). To verify the results of microarray under ABA and GA treatments, the transcriptio | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
233 | GSE5620 | 1/8/2007 | ['5620'] | [] | [u'17376166'] | 2993539 | [u'21044985'] | ['Whitehead', 'Horak', 'Wanke', 'Kudla', u'D`Angelo', 'Kilian', u'Schildknecht', 'Weinl', "D'Angelo", u'Emmerson', 'Batistic', 'Harter', u'Townsend', 'Bornberg-Bauer'] | ['Jhanwar', 'Garg', 'Tyagi', 'Jain'] | [] | DNA Res | 2010 | 2010 Dec | 0 | rice and TAIR database for Arabidopsis . Another approach used was to search for the proteins containing PFam domain PF00462, which encodes GRX domain. Further, the rice and Arabidopsis proteomes downloaded from RGAP (version 6.1) and TAIR (version 9), respectively, were searched using the hidden Markov model (HMM) profile (build 2.3.2) for GRX domain (PF00462) downloaded from the PFam database. The r| gene in a given library represents the quantitative estimate of the expression of that gene. 2.6. Expression analysis using microarray data For rice, the microarray data available at Gene Expression Omnibus (GEO) database under the series accession numbers GSE6893 (various stages of development), 21 GSE6901 (abiotic stress), 21 GSE4471 (arsenate stress), 22 GSE5167 (auxin and cytokinin treatment), 23|ays as described earlier. 21 For Arabidopsis , the microarray data for various stages of developments, hormone treatments and abiotic stress treatments corresponding to those analyzed for rice were downloaded from GEO (series accession numbers GSE5620{{tag}}--REUSE--, GSE5621, GSE5623, GSE5624 and GSE5629–GSE5634) and AtGenExpress ( http://www.arabidopsis.org/portals/expression/microarray/ATGenExpress.jsp ) dat | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
234 | GSE5620 | 1/8/2007 | ['5620'] | [] | [u'17376166'] | 2660624 | [u'19244141'] | ['Whitehead', 'Horak', 'Wanke', 'Kudla', u'D`Angelo', 'Kilian', u'Schildknecht', 'Weinl', "D'Angelo", u'Emmerson', 'Batistic', 'Harter', u'Townsend', 'Bornberg-Bauer'] | ['Dreos', 'Sugio', 'Maule', 'Aparicio'] | [] | Plant Cell | 2009 | 2009 Feb | 0 | AND pmc_gds | 0 | 1 | ||||
235 | GSE5621 | 1/8/2007 | ['5621'] | [] | [u'17376166'] | 2848248 | [u'20226034'] | ['Whitehead', 'Horak', 'Wanke', 'Kudla', u'D`Angelo', 'Kilian', u'Schildknecht', 'Weinl', "D'Angelo", u'Emmerson', 'Batistic', 'Harter', u'Townsend', 'Bornberg-Bauer'] | ['Blomster', 'Salojärvi', 'Wrzaczek', 'Jaspers', 'Vainonen', 'Kangasjärvi', 'Overmyer', 'Reddy'] | [] | BMC Genomics | 2010 | 3/12/2010 | 0 | n with decreasing short-wave cut-off in the UV range (UV-B experiment); E-MEXP-739, Syringolin A; E-MEXP-1797, Rotenone), Gene Expression Omnibus http://www.ncbi.nlm.nih.gov/geo/ (accession numbers GSE5615, Elicitors LPS, HrpZ, Flg22 and NPP1; GSE5685, Virulent and avirulent Pseudomonas syringae ; GSE9955, BTH experiment 1; GDS417 E. cichoracearum ; GSE5530, H 2 O 2 ; GSE5621{{tag}}--REUSE--, Cold time course expe| GSE5622, Osmotic stress time course experiment; GSE5623, Salt time course experiment; GSE5624, Drought time course experiment; GSE5722, O 3 ; GSE12887, Norflurazon; GSE10732, OPDA and Phytoprostane; GSE7112, ABA experiment 2) and The Integrated Microarray Database System http://ausubellab.mgh.harvard.edu/imds (Experiment name: BTH time course, BTH experiment 2). The raw Affymetrix data was preproces | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
236 | GSE5621 | 1/8/2007 | ['5621'] | [] | [u'17376166'] | 1947970 | [u'17640358'] | ['Whitehead', 'Horak', 'Wanke', 'Kudla', u'D`Angelo', 'Kilian', u'Schildknecht', 'Weinl', "D'Angelo", u'Emmerson', 'Batistic', 'Harter', u'Townsend', 'Bornberg-Bauer'] | ['Agarwal', 'Kapoor', 'Tyagi', 'Arora', 'Singh', 'Ray'] | [] | BMC Genomics | 2007 | 7/18/2007 | 0 | Affymetrix GeneChip ® ATH1 Genome Array, from stages comparable to those used for rice was obtained from Gene Expression Omnibus (GEO) database at the NCBI under the series accession numbers GSE5620, GSE5621{{tag}}--REUSE--, GSE5623, GSE5624, GSE5629, GSE5630, GSE5631, GSE5632 and GSE5634. Total of 55 CEL files representing 21 stages of development as well as stress treatments were downloaded from [ 79 ] and |d 79 were studied using QPCR. Raw microarray data have been deposited in the Gene Expression Omnibus database at the National Center for Biotechnology Information under the series accession numbers GSE6893 and GSE6901. QPCR Real time PCR reactions were carried out using the same RNA samples, which were used for microarrays as described earlier [ 81 ]. In brief, primers were designed for all the genes | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
237 | GSE5621 | 1/8/2007 | ['5621'] | [] | [u'17376166'] | 2576257 | [u'18826656'] | ['Whitehead', 'Horak', 'Wanke', 'Kudla', u'D`Angelo', 'Kilian', u'Schildknecht', 'Weinl', "D'Angelo", u'Emmerson', 'Batistic', 'Harter', u'Townsend', 'Bornberg-Bauer'] | ['Kapoor', 'Lama', 'Tyagi', 'Nijhawan', 'Arora', 'Khurana'] | [] | BMC Genomics | 2008 | 10/1/2008 | 0 | oarray analysis of 17 stages was described previously [ 27 ]; deposited in the Gene Expression Omnibus database at the National Center for Biotechnology Information under the series accession numbers GSE6893 and GSE6901). Here, five more stages viz, Y-leaf, SAM (shoot apical meristem), P1-I (< 2 mm panicle), P1-II (0.2 to 0.5 mm panicle) and P1-III (5 to 10 mm panicle) have been added to emphas|® ATH1 Genome Array data for 21 stages (55 .cel files) comparable to that used for rice were downloaded from Gene Expression Omnibus (GEO) database at the NCBI under the series accession numbers GSE5620, GSE5621{{tag}}--REUSE--, GSE5623, GSE5624, GSE5629, GSE5630, GSE5631, GSE5632 and GSE5634. The data were imported in ArrayAssist™ (Stratagene, La Jolla, CA) microarray analysis software wherein GCRMA algo | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
238 | GSE5621 | 1/8/2007 | ['5621'] | [] | [u'17376166'] | 2993539 | [u'21044985'] | ['Whitehead', 'Horak', 'Wanke', 'Kudla', u'D`Angelo', 'Kilian', u'Schildknecht', 'Weinl', "D'Angelo", u'Emmerson', 'Batistic', 'Harter', u'Townsend', 'Bornberg-Bauer'] | ['Jhanwar', 'Garg', 'Tyagi', 'Jain'] | [] | DNA Res | 2010 | 2010 Dec | 0 | rice and TAIR database for Arabidopsis . Another approach used was to search for the proteins containing PFam domain PF00462, which encodes GRX domain. Further, the rice and Arabidopsis proteomes downloaded from RGAP (version 6.1) and TAIR (version 9), respectively, were searched using the hidden Markov model (HMM) profile (build 2.3.2) for GRX domain (PF00462) downloaded from the PFam database. The r| gene in a given library represents the quantitative estimate of the expression of that gene. 2.6. Expression analysis using microarray data For rice, the microarray data available at Gene Expression Omnibus (GEO) database under the series accession numbers GSE6893 (various stages of development), 21 GSE6901 (abiotic stress), 21 GSE4471 (arsenate stress), 22 GSE5167 (auxin and cytokinin treatment), 23|ays as described earlier. 21 For Arabidopsis , the microarray data for various stages of developments, hormone treatments and abiotic stress treatments corresponding to those analyzed for rice were downloaded from GEO (series accession numbers GSE5620, GSE5621{{tag}}--REUSE--, GSE5623, GSE5624 and GSE5629–GSE5634) and AtGenExpress ( http://www.arabidopsis.org/portals/expression/microarray/ATGenExpress.jsp ) dat | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
239 | GSE5622 | 1/8/2007 | ['5622'] | [] | [u'17376166'] | 2848248 | [u'20226034'] | ['Whitehead', 'Horak', 'Wanke', 'Kudla', u'D`Angelo', 'Kilian', u'Schildknecht', 'Weinl', "D'Angelo", u'Emmerson', 'Batistic', 'Harter', u'Townsend', 'Bornberg-Bauer'] | ['Blomster', 'Salojärvi', 'Wrzaczek', 'Jaspers', 'Vainonen', 'Kangasjärvi', 'Overmyer', 'Reddy'] | [] | BMC Genomics | 2010 | 3/12/2010 | 0 | n with decreasing short-wave cut-off in the UV range (UV-B experiment); E-MEXP-739, Syringolin A; E-MEXP-1797, Rotenone), Gene Expression Omnibus http://www.ncbi.nlm.nih.gov/geo/ (accession numbers GSE5615, Elicitors LPS, HrpZ, Flg22 and NPP1; GSE5685, Virulent and avirulent Pseudomonas syringae ; GSE9955, BTH experiment 1; GDS417 E. cichoracearum ; GSE5530, H 2 O 2 ; GSE5621, Cold time course expe| GSE5622{{tag}}--REUSE--, Osmotic stress time course experiment; GSE5623, Salt time course experiment; GSE5624, Drought time course experiment; GSE5722, O 3 ; GSE12887, Norflurazon; GSE10732, OPDA and Phytoprostane; GSE7112, ABA experiment 2) and The Integrated Microarray Database System http://ausubellab.mgh.harvard.edu/imds (Experiment name: BTH time course, BTH experiment 2). The raw Affymetrix data was preproces | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
240 | GSE5623 | 1/8/2007 | ['5623'] | [] | [u'17376166'] | 2848248 | [u'20226034'] | ['Whitehead', 'Horak', 'Wanke', 'Kudla', u'D`Angelo', 'Kilian', u'Schildknecht', 'Weinl', "D'Angelo", u'Emmerson', 'Batistic', 'Harter', u'Townsend', 'Bornberg-Bauer'] | ['Blomster', 'Salojärvi', 'Wrzaczek', 'Jaspers', 'Vainonen', 'Kangasjärvi', 'Overmyer', 'Reddy'] | [] | BMC Genomics | 2010 | 3/12/2010 | 0 | n with decreasing short-wave cut-off in the UV range (UV-B experiment); E-MEXP-739, Syringolin A; E-MEXP-1797, Rotenone), Gene Expression Omnibus http://www.ncbi.nlm.nih.gov/geo/ (accession numbers GSE5615, Elicitors LPS, HrpZ, Flg22 and NPP1; GSE5685, Virulent and avirulent Pseudomonas syringae ; GSE9955, BTH experiment 1; GDS417 E. cichoracearum ; GSE5530, H 2 O 2 ; GSE5621, Cold time course expe| GSE5622, Osmotic stress time course experiment; GSE5623{{tag}}--REUSE--, Salt time course experiment; GSE5624, Drought time course experiment; GSE5722, O 3 ; GSE12887, Norflurazon; GSE10732, OPDA and Phytoprostane; GSE7112, ABA experiment 2) and The Integrated Microarray Database System http://ausubellab.mgh.harvard.edu/imds (Experiment name: BTH time course, BTH experiment 2). The raw Affymetrix data was preproces | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
241 | GSE5623 | 1/8/2007 | ['5623'] | [] | [u'17376166'] | 1947970 | [u'17640358'] | ['Whitehead', 'Horak', 'Wanke', 'Kudla', u'D`Angelo', 'Kilian', u'Schildknecht', 'Weinl', "D'Angelo", u'Emmerson', 'Batistic', 'Harter', u'Townsend', 'Bornberg-Bauer'] | ['Agarwal', 'Kapoor', 'Tyagi', 'Arora', 'Singh', 'Ray'] | [] | BMC Genomics | 2007 | 7/18/2007 | 0 | Affymetrix GeneChip ® ATH1 Genome Array, from stages comparable to those used for rice was obtained from Gene Expression Omnibus (GEO) database at the NCBI under the series accession numbers GSE5620, GSE5621, GSE5623{{tag}}--REUSE--, GSE5624, GSE5629, GSE5630, GSE5631, GSE5632 and GSE5634. Total of 55 CEL files representing 21 stages of development as well as stress treatments were downloaded from [ 79 ] and |d 79 were studied using QPCR. Raw microarray data have been deposited in the Gene Expression Omnibus database at the National Center for Biotechnology Information under the series accession numbers GSE6893 and GSE6901. QPCR Real time PCR reactions were carried out using the same RNA samples, which were used for microarrays as described earlier [ 81 ]. In brief, primers were designed for all the genes | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
242 | GSE5623 | 1/8/2007 | ['5623'] | [] | [u'17376166'] | 2576257 | [u'18826656'] | ['Whitehead', 'Horak', 'Wanke', 'Kudla', u'D`Angelo', 'Kilian', u'Schildknecht', 'Weinl', "D'Angelo", u'Emmerson', 'Batistic', 'Harter', u'Townsend', 'Bornberg-Bauer'] | ['Kapoor', 'Lama', 'Tyagi', 'Nijhawan', 'Arora', 'Khurana'] | [] | BMC Genomics | 2008 | 10/1/2008 | 0 | oarray analysis of 17 stages was described previously [ 27 ]; deposited in the Gene Expression Omnibus database at the National Center for Biotechnology Information under the series accession numbers GSE6893 and GSE6901). Here, five more stages viz, Y-leaf, SAM (shoot apical meristem), P1-I (< 2 mm panicle), P1-II (0.2 to 0.5 mm panicle) and P1-III (5 to 10 mm panicle) have been added to emphas|® ATH1 Genome Array data for 21 stages (55 .cel files) comparable to that used for rice were downloaded from Gene Expression Omnibus (GEO) database at the NCBI under the series accession numbers GSE5620, GSE5621, GSE5623{{tag}}--REUSE--, GSE5624, GSE5629, GSE5630, GSE5631, GSE5632 and GSE5634. The data were imported in ArrayAssist™ (Stratagene, La Jolla, CA) microarray analysis software wherein GCRMA algo | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
243 | GSE5623 | 1/8/2007 | ['5623'] | [] | [u'17376166'] | 2993539 | [u'21044985'] | ['Whitehead', 'Horak', 'Wanke', 'Kudla', u'D`Angelo', 'Kilian', u'Schildknecht', 'Weinl', "D'Angelo", u'Emmerson', 'Batistic', 'Harter', u'Townsend', 'Bornberg-Bauer'] | ['Jhanwar', 'Garg', 'Tyagi', 'Jain'] | [] | DNA Res | 2010 | 2010 Dec | 0 | rice and TAIR database for Arabidopsis . Another approach used was to search for the proteins containing PFam domain PF00462, which encodes GRX domain. Further, the rice and Arabidopsis proteomes downloaded from RGAP (version 6.1) and TAIR (version 9), respectively, were searched using the hidden Markov model (HMM) profile (build 2.3.2) for GRX domain (PF00462) downloaded from the PFam database. The r| gene in a given library represents the quantitative estimate of the expression of that gene. 2.6. Expression analysis using microarray data For rice, the microarray data available at Gene Expression Omnibus (GEO) database under the series accession numbers GSE6893 (various stages of development), 21 GSE6901 (abiotic stress), 21 GSE4471 (arsenate stress), 22 GSE5167 (auxin and cytokinin treatment), 23|ays as described earlier. 21 For Arabidopsis , the microarray data for various stages of developments, hormone treatments and abiotic stress treatments corresponding to those analyzed for rice were downloaded from GEO (series accession numbers GSE5620, GSE5621, GSE5623{{tag}}--REUSE--, GSE5624 and GSE5629–GSE5634) and AtGenExpress ( http://www.arabidopsis.org/portals/expression/microarray/ATGenExpress.jsp ) dat | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
244 | GSE5624 | 1/8/2007 | ['5624'] | [] | [u'17376166'] | 2848248 | [u'20226034'] | ['Whitehead', 'Horak', 'Wanke', 'Kudla', u'D`Angelo', 'Kilian', u'Schildknecht', 'Weinl', "D'Angelo", u'Emmerson', 'Batistic', 'Harter', u'Townsend', 'Bornberg-Bauer'] | ['Blomster', 'Salojärvi', 'Wrzaczek', 'Jaspers', 'Vainonen', 'Kangasjärvi', 'Overmyer', 'Reddy'] | [] | BMC Genomics | 2010 | 3/12/2010 | 0 | n with decreasing short-wave cut-off in the UV range (UV-B experiment); E-MEXP-739, Syringolin A; E-MEXP-1797, Rotenone), Gene Expression Omnibus http://www.ncbi.nlm.nih.gov/geo/ (accession numbers GSE5615, Elicitors LPS, HrpZ, Flg22 and NPP1; GSE5685, Virulent and avirulent Pseudomonas syringae ; GSE9955, BTH experiment 1; GDS417 E. cichoracearum ; GSE5530, H 2 O 2 ; GSE5621, Cold time course expe| GSE5622, Osmotic stress time course experiment; GSE5623, Salt time course experiment; GSE5624{{tag}}--REUSE--, Drought time course experiment; GSE5722, O 3 ; GSE12887, Norflurazon; GSE10732, OPDA and Phytoprostane; GSE7112, ABA experiment 2) and The Integrated Microarray Database System http://ausubellab.mgh.harvard.edu/imds (Experiment name: BTH time course, BTH experiment 2). The raw Affymetrix data was preproces | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
245 | GSE5624 | 1/8/2007 | ['5624'] | [] | [u'17376166'] | 1947970 | [u'17640358'] | ['Whitehead', 'Horak', 'Wanke', 'Kudla', u'D`Angelo', 'Kilian', u'Schildknecht', 'Weinl', "D'Angelo", u'Emmerson', 'Batistic', 'Harter', u'Townsend', 'Bornberg-Bauer'] | ['Agarwal', 'Kapoor', 'Tyagi', 'Arora', 'Singh', 'Ray'] | [] | BMC Genomics | 2007 | 7/18/2007 | 0 | Affymetrix GeneChip ® ATH1 Genome Array, from stages comparable to those used for rice was obtained from Gene Expression Omnibus (GEO) database at the NCBI under the series accession numbers GSE5620, GSE5621, GSE5623, GSE5624{{tag}}--REUSE--, GSE5629, GSE5630, GSE5631, GSE5632 and GSE5634. Total of 55 CEL files representing 21 stages of development as well as stress treatments were downloaded from [ 79 ] and |d 79 were studied using QPCR. Raw microarray data have been deposited in the Gene Expression Omnibus database at the National Center for Biotechnology Information under the series accession numbers GSE6893 and GSE6901. QPCR Real time PCR reactions were carried out using the same RNA samples, which were used for microarrays as described earlier [ 81 ]. In brief, primers were designed for all the genes | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
246 | GSE5624 | 1/8/2007 | ['5624'] | [] | [u'17376166'] | 2576257 | [u'18826656'] | ['Whitehead', 'Horak', 'Wanke', 'Kudla', u'D`Angelo', 'Kilian', u'Schildknecht', 'Weinl', "D'Angelo", u'Emmerson', 'Batistic', 'Harter', u'Townsend', 'Bornberg-Bauer'] | ['Kapoor', 'Lama', 'Tyagi', 'Nijhawan', 'Arora', 'Khurana'] | [] | BMC Genomics | 2008 | 10/1/2008 | 0 | oarray analysis of 17 stages was described previously [ 27 ]; deposited in the Gene Expression Omnibus database at the National Center for Biotechnology Information under the series accession numbers GSE6893 and GSE6901). Here, five more stages viz, Y-leaf, SAM (shoot apical meristem), P1-I (< 2 mm panicle), P1-II (0.2 to 0.5 mm panicle) and P1-III (5 to 10 mm panicle) have been added to emphas|® ATH1 Genome Array data for 21 stages (55 .cel files) comparable to that used for rice were downloaded from Gene Expression Omnibus (GEO) database at the NCBI under the series accession numbers GSE5620, GSE5621, GSE5623, GSE5624{{tag}}--REUSE--, GSE5629, GSE5630, GSE5631, GSE5632 and GSE5634. The data were imported in ArrayAssist™ (Stratagene, La Jolla, CA) microarray analysis software wherein GCRMA algo | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
247 | GSE5624 | 1/8/2007 | ['5624'] | [] | [u'17376166'] | 2993539 | [u'21044985'] | ['Whitehead', 'Horak', 'Wanke', 'Kudla', u'D`Angelo', 'Kilian', u'Schildknecht', 'Weinl', "D'Angelo", u'Emmerson', 'Batistic', 'Harter', u'Townsend', 'Bornberg-Bauer'] | ['Jhanwar', 'Garg', 'Tyagi', 'Jain'] | [] | DNA Res | 2010 | 2010 Dec | 0 | rice and TAIR database for Arabidopsis . Another approach used was to search for the proteins containing PFam domain PF00462, which encodes GRX domain. Further, the rice and Arabidopsis proteomes downloaded from RGAP (version 6.1) and TAIR (version 9), respectively, were searched using the hidden Markov model (HMM) profile (build 2.3.2) for GRX domain (PF00462) downloaded from the PFam database. The r| gene in a given library represents the quantitative estimate of the expression of that gene. 2.6. Expression analysis using microarray data For rice, the microarray data available at Gene Expression Omnibus (GEO) database under the series accession numbers GSE6893 (various stages of development), 21 GSE6901 (abiotic stress), 21 GSE4471 (arsenate stress), 22 GSE5167 (auxin and cytokinin treatment), 23|ays as described earlier. 21 For Arabidopsis , the microarray data for various stages of developments, hormone treatments and abiotic stress treatments corresponding to those analyzed for rice were downloaded from GEO (series accession numbers GSE5620, GSE5621, GSE5623, GSE5624{{tag}}--REUSE-- and GSE5629–GSE5634) and AtGenExpress ( http://www.arabidopsis.org/portals/expression/microarray/ATGenExpress.jsp ) dat | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
248 | GSE5628 | 1/8/2007 | ['5628'] | [] | [u'17376166'] | 2998528 | [u'21070630'] | ['Whitehead', 'Horak', 'Wanke', 'Kudla', u'von', 'Kilian', u'Schildknecht', 'Weinl', "D'Angelo", u'Emmerson', 'Batistic', 'Harter', u'Nover', u'Townsend', 'Bornberg-Bauer'] | ['Lohse', 'Giorgi', 'Usadel', 'Bolger'] | [] | BMC Bioinformatics | 2010 | 11/11/2010 | 0 | quencing techniques promise to replace them in the near future [ 1 ], it would be a mistake to ignore the biological importance of the massive quantity of data already produced through this platform. Publicly available databases alone store a huge (and growing) quantity of microarray experiments (e.g. 338947 samples in Gene Expression Omnibus [ 2 ] and 251711 in ArrayExpress [ 3 ]), comprising hundreds of|technique both for differential gene expression analyses and for correlative studies based on microarray data. Methods Datasets In order to obtain a vast, robust and condition-independent dataset, we downloaded all Arabidopsis thaliana ATH1 microarrays available from GEO [ 2 ] and removed truncated or unreadable files and genomic DNA experiments. This dataset comprised 3707 arrays and is henceforth referr|sis dataset" . To test the abilities of RMA and tRMA to correctly cluster different tissue samples, we analyzed microarrays from the AtGenExpress stress study [ 31 ], contained in the Gene Expression Omnibus series GSE5620-GSE5628{{tag}}--REUSE--. This dataset ( root-shoot dataset ) comprises 248 samples, evenly distributed in shoot and root tissues. To further assess sample classification performance of RMA and tRMA, w|obal view of gene activity and alternative splicing by deep sequencing of the human transcriptome Science 2008 321 956 960 10.1126/science.1160342 18599741 Edgar R Domrachev M Lash AE Gene Expression Omnibus: NCBI gene expression and hybridization array data repository Nucleic Acids Res 2002 30 207 210 10.1093/nar/30.1.207 11752295 Parkinson H Kapushesky M Shojatalab M Abeygunawardena N Coulson R Farne A | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
249 | GSE5628 | 1/8/2007 | ['5628'] | [] | [u'17376166'] | 2483354 | [u'18682831'] | ['Whitehead', 'Horak', 'Wanke', 'Kudla', u'von', 'Kilian', u'Schildknecht', 'Weinl', "D'Angelo", u'Emmerson', 'Batistic', 'Harter', u'Nover', u'Townsend', 'Bornberg-Bauer'] | ['Kunova', 'Adamo', 'Meyer', 'Pinney', 'Westhead'] | [] | PLoS One | 2008 | 8/6/2008 | 0 | thin this software was used to identify groups of genes with mutually high levels of co-expression. The heat stress time series from AtGenExpress (Nover et al. Gene Expression Omnibus (GEO) Accession GSE5628{{tag}}--REUSE--) and the corresponding control time series (Townsend et al. GEO accession GSE5620) were used in detailed investigation of heat induction. These data sets were downloaded from GEO, expression summar | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
250 | GSE5628 | 1/8/2007 | ['5628'] | [] | [u'17376166'] | 2660624 | [u'19244141'] | ['Whitehead', 'Horak', 'Wanke', 'Kudla', u'von', 'Kilian', u'Schildknecht', 'Weinl', "D'Angelo", u'Emmerson', 'Batistic', 'Harter', u'Nover', u'Townsend', 'Bornberg-Bauer'] | ['Dreos', 'Sugio', 'Maule', 'Aparicio'] | [] | Plant Cell | 2009 | 2009 Feb | 0 | AND pmc_gds | 0 | 1 | ||||
251 | GSE5629 | 1/8/2007 | ['5629'] | [] | [] | 1947970 | [u'17640358'] | [u'Schildknecht', u'Townsend', u'Weigel', u'Emmerson', u'Lohmann', u'Schmid'] | ['Agarwal', 'Kapoor', 'Tyagi', 'Arora', 'Singh', 'Ray'] | [] | BMC Genomics | 2007 | 7/18/2007 | 0 | Affymetrix GeneChip ® ATH1 Genome Array, from stages comparable to those used for rice was obtained from Gene Expression Omnibus (GEO) database at the NCBI under the series accession numbers GSE5620, GSE5621, GSE5623, GSE5624, GSE5629{{tag}}--REUSE--, GSE5630, GSE5631, GSE5632 and GSE5634. Total of 55 CEL files representing 21 stages of development as well as stress treatments were downloaded from [ 79 ] and |d 79 were studied using QPCR. Raw microarray data have been deposited in the Gene Expression Omnibus database at the National Center for Biotechnology Information under the series accession numbers GSE6893 and GSE6901. QPCR Real time PCR reactions were carried out using the same RNA samples, which were used for microarrays as described earlier [ 81 ]. In brief, primers were designed for all the genes | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
252 | GSE5629 | 1/8/2007 | ['5629'] | [] | [] | 2576257 | [u'18826656'] | [u'Schildknecht', u'Townsend', u'Weigel', u'Emmerson', u'Lohmann', u'Schmid'] | ['Kapoor', 'Lama', 'Tyagi', 'Nijhawan', 'Arora', 'Khurana'] | [] | BMC Genomics | 2008 | 10/1/2008 | 0 | oarray analysis of 17 stages was described previously [ 27 ]; deposited in the Gene Expression Omnibus database at the National Center for Biotechnology Information under the series accession numbers GSE6893 and GSE6901). Here, five more stages viz, Y-leaf, SAM (shoot apical meristem), P1-I (< 2 mm panicle), P1-II (0.2 to 0.5 mm panicle) and P1-III (5 to 10 mm panicle) have been added to emphas|® ATH1 Genome Array data for 21 stages (55 .cel files) comparable to that used for rice were downloaded from Gene Expression Omnibus (GEO) database at the NCBI under the series accession numbers GSE5620, GSE5621, GSE5623, GSE5624, GSE5629{{tag}}--REUSE--, GSE5630, GSE5631, GSE5632 and GSE5634. The data were imported in ArrayAssist™ (Stratagene, La Jolla, CA) microarray analysis software wherein GCRMA algo | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
253 | GSE5629 | 1/8/2007 | ['5629'] | [] | [] | 2882264 | [u'20423940'] | [u'Schildknecht', u'Townsend', u'Weigel', u'Emmerson', u'Lohmann', u'Schmid'] | ['Zhao', 'Ma'] | [] | J Exp Bot | 2010 | 2010 Jun | 0 | base at the NCBI website ( http://www.ncbi.nlm.nih.gov/geo/ ) and the Rice Functional Genomic Express Database ( http://signal.salk.edu/cgi-bin/RiceGE ). For temporal and spatial expression analysis (GSE6893), different stages of panicle and seed development were categorized according to panicle length and days after pollination, respectively, based on landmark developmental events as follows. (i) Pani| morphogenesis (S3); 11–20 DAP, embryo maturation (S4); 21–29 DAP, dormancy and desiccation tolerance (S5) ( Itoh et al. , 2005 , Jain et al. , 2007 ). For abiotic stress analysis (GSE6901), rice seedlings were transferred to a beaker containing 200 mM NaCl solution for salt stress, dried between folds of tissue paper at 28±1 °C in a culture room for d|of Arabidopsis AGP-encoding genes were downloaded using ‘Bulk Gene Download’ at Nottingham Arabidopsis Stock Centre's microarray database, and the results of developmental stages (GSE5629{{tag}}--REUSE--–5633) and stress treatments (GSE5620–5621 and 5623–5624) were used to analyse the expression of AGP-encoding genes in Arabidopsis ( http://affymetrix.arabidopsis.info/narr|les from UniGene at http://www.ncbi.nlm.nih.gov/unigene/ ; 12, MPSS tags, http://mpss.udel.edu/rice/ ; 13, the absolute signal values were downloaded at http://signal.salk.edu/cgi-bin/RiceGE ; 14, GSE6893, expression at various developmental stages; 15, GSE661, expression under ABA and GA treatments; 16, GSE6901, expression under abiotic stresses treatments. All PAST-rich proteins used for final ana|and OsAGP22 , and OsAGP25 and OsAGP26 ) ( Fig. 6 ). Fig. 6. Expression profiles of AGP-encoding genes in various rice organs and tissues at different development stages. 
The microarray data sets (GSE6893) of gene expression at various developmental stages were used for cluster display. A heat map representing hierarchical clustering of average log signal values of all the AGP-encoding genes in vari|6 OsAffx-28335-1-S1_at 5696.2 233.3 0.04 Y OsLLA7 Os-9342-1-S1_at 3120.1 1746.4 0.56 OsLLA8 OsAffx-18220-1-S1_x_at 87.0 212.5 2.44 Y a The probe ID is used on microarray plate GPL2025. The experiment GSE6893 was used for differential expression analysis. b MOV, the maximum absolute values of vegetative tissues. c MOR, the maximum absolute values of reproductive tissues. d The values of MOR divided by M|panicles. Expression analysis of rice AGP-encoding genes under abiotic stress, ABA, and GA treatments To investigate the abiotic stress response of rice AGP-encoding genes, the results of microarray (GSE6901) from 7-day-old seedlings subjected to drought, salt, and cold stresses were analysed. The data revealed that a total of 15 genes were significantly down- or up-regulated (<0.5 or >|x02009;h, and down-regulated by drought and salt stresses ( Fig. 8 ). Fig. 8. Expression profiles of rice AGP-encoding genes differentially expressed under abiotic stresses. The microarray data sets (GSE6901) of gene expression under various abiotic stresses were used for cluster display. The average log signal values of AGP-encoding genes under control and various stress conditions (indicated at the t|values) is shown at the bottom. CK, control; DS, drought stress; SS, salt stress; CS, cold stress. The results of ABA- and GA-treated callus were used to analyse the regulation of AGP-encoding genes (GSE661). It was found that many AGP-encoding genes were regulated by ABA and GA ( Supplementary Table S5 at JXB online). To verify the results of microarray under ABA and GA treatments, the transcriptio | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
254 | GSE5629 | 1/8/2007 | ['5629'] | [] | [] | 2993539 | [u'21044985'] | [u'Schildknecht', u'Townsend', u'Weigel', u'Emmerson', u'Lohmann', u'Schmid'] | ['Jhanwar', 'Garg', 'Tyagi', 'Jain'] | [] | DNA Res | 2010 | 2010 Dec | 0 | rice and TAIR database for Arabidopsis . Another approach used was to search for the proteins containing PFam domain PF00462, which encodes GRX domain. Further, the rice and Arabidopsis proteomes downloaded from RGAP (version 6.1) and TAIR (version 9), respectively, were searched using the hidden Markov model (HMM) profile (build 2.3.2) for GRX domain (PF00462) downloaded from the PFam database. The r| gene in a given library represents the quantitative estimate of the expression of that gene. 2.6. Expression analysis using microarray data For rice, the microarray data available at Gene Expression Omnibus (GEO) database under the series accession numbers GSE6893 (various stages of development), 21 GSE6901 (abiotic stress), 21 GSE4471 (arsenate stress), 22 GSE5167 (auxin and cytokinin treatment), 23|ays as described earlier. 21 For Arabidopsis , the microarray data for various stages of developments, hormone treatments and abiotic stress treatments corresponding to those analyzed for rice were downloaded from GEO (series accession numbers GSE5620, GSE5621, GSE5623, GSE5624 and GSE5629{{tag}}--REUSE--–GSE5634) and AtGenExpress ( http://www.arabidopsis.org/portals/expression/microarray/ATGenExpress.jsp ) dat | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
255 | GSE5629 | 1/8/2007 | ['5629'] | [] | [] | 3022907 | [u'21167079'] | [u'Schildknecht', u'Townsend', u'Weigel', u'Emmerson', u'Lohmann', u'Schmid'] | ['Wang', 'Tu', 'Guo', 'Hu', 'Peng', 'Li', 'Cui'] | [] | BMC Plant Biol | 2010 | 12/20/2010 | 0 | ns Authors' contributions Supplementary Material References Methods Database searches for OsCESA/CSL genes in rice The Hidden Markov Model (HMM) profile of the cellulose synthase domain (PF03552) was downloaded from PFam http://pfam.sanger.ac.uk/ . We employed a name search and the protein family ID PF03552 for the identification of OsCESA / CSL genes from the rice genome. Information about the chromos|l mutant" analysis. However, in the hierarchical cluster of the "artificial mutant" analysis, the expression data for regarding gene(s) or tissues were deleted. All Arabidopsis microarray data were downloaded from the Gene Expression Omnibus database http://www.ncbi.nlm.nih.gov/geo/ using the GSE series accession numbers GSE5629{{tag}}--REUSE-- , GSE5630 , GSE5631 , GSE5632 , GSE5633 and GSE5634 (Additional f | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
256 | GSE5630 | 1/8/2007 | ['5630'] | [] | [] | 1947970 | [u'17640358'] | [u'Schildknecht', u'Townsend', u'Weigel', u'Emmerson', u'Lohmann', u'Schmid'] | ['Agarwal', 'Kapoor', 'Tyagi', 'Arora', 'Singh', 'Ray'] | [] | BMC Genomics | 2007 | 7/18/2007 | 0 | Affymetrix GeneChip ® ATH1 Genome Array, from stages comparable to those used for rice was obtained from Gene Expression Omnibus (GEO) database at the NCBI under the series accession numbers GSE5620, GSE5621, GSE5623, GSE5624, GSE5629, GSE5630{{tag}}--REUSE--, GSE5631, GSE5632 and GSE5634. Total of 55 CEL files representing 21 stages of development as well as stress treatments were downloaded from [ 79 ] and |d 79 were studied using QPCR. Raw microarray data have been deposited in the Gene Expression Omnibus database at the National Center for Biotechnology Information under the series accession numbers GSE6893 and GSE6901. QPCR Real time PCR reactions were carried out using the same RNA samples, which were used for microarrays as described earlier [ 81 ]. In brief, primers were designed for all the genes | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
257 | GSE5630 | 1/8/2007 | ['5630'] | [] | [] | 2576257 | [u'18826656'] | [u'Schildknecht', u'Townsend', u'Weigel', u'Emmerson', u'Lohmann', u'Schmid'] | ['Kapoor', 'Lama', 'Tyagi', 'Nijhawan', 'Arora', 'Khurana'] | [] | BMC Genomics | 2008 | 10/1/2008 | 0 | oarray analysis of 17 stages was described previously [ 27 ]; deposited in the Gene Expression Omnibus database at the National Center for Biotechnology Information under the series accession numbers GSE6893 and GSE6901). Here, five more stages viz, Y-leaf, SAM (shoot apical meristem), P1-I (< 2 mm panicle), P1-II (0.2 to 0.5 mm panicle) and P1-III (5 to 10 mm panicle) have been added to emphas|00ae; ATH1 Genome Array data for 21 stages (55 .cel files) comparable to that used for rice were downloaded from Gene Expression Omnibus (GEO) database at the NCBI under the series accession numbers GSE5620, GSE5621, GSE5623, GSE5624, GSE5629, GSE5630{{tag}}--REUSE--, GSE5631, GSE5632 and GSE5634. The data were imported in ArrayAssist™ (Stratagene, La Jolla, CA) microarray analysis software wherein GCRMA algo | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
258 | GSE5630 | 1/8/2007 | ['5630'] | [] | [] | 3022907 | [u'21167079'] | [u'Schildknecht', u'Townsend', u'Weigel', u'Emmerson', u'Lohmann', u'Schmid'] | ['Wang', 'Tu', 'Guo', 'Hu', 'Peng', 'Li', 'Cui'] | [] | BMC Plant Biol | 2010 | 12/20/2010 | 0 | ns Authors' contributions Supplementary Material References Methods Database searches for OsCESA/CSL genes in rice The Hidden Markov Model (HMM) profile of the cellulose synthase domain (PF03552) was downloaded from PFam http://pfam.sanger.ac.uk/ . We employed a name search and the protein family ID PF03552 for the identification of OsCESA / CSL genes from the rice genome. Information about the chromos|l mutant" analysis. However, in the hierarchical cluster of the "artificial mutant" analysis, the expression data for regarding gene(s) or tissues were deleted. All Arabidopsis microarray data were downloaded from the Gene Expression Omnibus database http://www.ncbi.nlm.nih.gov/geo/ using the GSE series accession numbers GSE5629 , GSE5630{{tag}}--REUSE-- , GSE5631 , GSE5632 , GSE5633 and GSE5634 (Additional f | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
259 | GSE5630 | 1/8/2007 | ['5630'] | [] | [] | 2940909 | [u'20862292'] | [u'Schildknecht', u'Townsend', u'Weigel', u'Emmerson', u'Lohmann', u'Schmid'] | ['King', 'Love', 'Broadley', 'Bowen', 'Hammond', 'O', 'May', 'White', 'Graham'] | [] | PLoS One | 2010 | 9/16/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
260 | GSE5631 | 1/8/2007 | ['5631'] | [] | [] | 1947970 | [u'17640358'] | [u'Schildknecht', u'Townsend', u'Weigel', u'Emmerson', u'Lohmann', u'Schmid'] | ['Agarwal', 'Kapoor', 'Tyagi', 'Arora', 'Singh', 'Ray'] | [] | BMC Genomics | 2007 | 7/18/2007 | 0 | Affymetrix GeneChip ® ATH1 Genome Array, from stages comparable to those used for rice was obtained from Gene Expression Omnibus (GEO) database at the NCBI under the series accession numbers GSE5620, GSE5621, GSE5623, GSE5624, GSE5629, GSE5630, GSE5631{{tag}}--REUSE--, GSE5632 and GSE5634. Total of 55 CEL files representing 21 stages of development as well as stress treatments were downloaded from [ 79 ] and |d 79 were studied using QPCR. Raw microarray data have been deposited in the Gene Expression Omnibus database at the National Center for Biotechnology Information under the series accession numbers GSE6893 and GSE6901. QPCR Real time PCR reactions were carried out using the same RNA samples, which were used for microarrays as described earlier [ 81 ]. In brief, primers were designed for all the genes | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
261 | GSE5631 | 1/8/2007 | ['5631'] | [] | [] | 2576257 | [u'18826656'] | [u'Schildknecht', u'Townsend', u'Weigel', u'Emmerson', u'Lohmann', u'Schmid'] | ['Kapoor', 'Lama', 'Tyagi', 'Nijhawan', 'Arora', 'Khurana'] | [] | BMC Genomics | 2008 | 10/1/2008 | 0 | oarray analysis of 17 stages was described previously [ 27 ]; deposited in the Gene Expression Omnibus database at the National Center for Biotechnology Information under the series accession numbers GSE6893 and GSE6901). Here, five more stages viz, Y-leaf, SAM (shoot apical meristem), P1-I (< 2 mm panicle), P1-II (0.2 to 0.5 mm panicle) and P1-III (5 to 10 mm panicle) have been added to emphas|00ae; ATH1 Genome Array data for 21 stages (55 .cel files) comparable to that used for rice were downloaded from Gene Expression Omnibus (GEO) database at the NCBI under the series accession numbers GSE5620, GSE5621, GSE5623, GSE5624, GSE5629, GSE5630, GSE5631{{tag}}--REUSE--, GSE5632 and GSE5634. The data were imported in ArrayAssist™ (Stratagene, La Jolla, CA) microarray analysis software wherein GCRMA algo | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
262 | GSE5631 | 1/8/2007 | ['5631'] | [] | [] | 3022907 | [u'21167079'] | [u'Schildknecht', u'Townsend', u'Weigel', u'Emmerson', u'Lohmann', u'Schmid'] | ['Wang', 'Tu', 'Guo', 'Hu', 'Peng', 'Li', 'Cui'] | [] | BMC Plant Biol | 2010 | 12/20/2010 | 0 | ns Authors' contributions Supplementary Material References Methods Database searches for OsCESA/CSL genes in rice The Hidden Markov Model (HMM) profile of the cellulose synthase domain (PF03552) was downloaded from PFam http://pfam.sanger.ac.uk/ . We employed a name search and the protein family ID PF03552 for the identification of OsCESA / CSL genes from the rice genome. Information about the chromos|l mutant" analysis. However, in the hierarchical cluster of the "artificial mutant" analysis, the expression data for regarding gene(s) or tissues were deleted. All Arabidopsis microarray data were downloaded from the Gene Expression Omnibus database http://www.ncbi.nlm.nih.gov/geo/ using the GSE series accession numbers GSE5629 , GSE5630 , GSE5631{{tag}}--REUSE-- , GSE5632 , GSE5633 and GSE5634 (Additional f | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
263 | GSE5631 | 1/8/2007 | ['5631'] | [] | [] | 2940909 | [u'20862292'] | [u'Schildknecht', u'Townsend', u'Weigel', u'Emmerson', u'Lohmann', u'Schmid'] | ['King', 'Love', 'Broadley', 'Bowen', 'Hammond', 'O', 'May', 'White', 'Graham'] | [] | PLoS One | 2010 | 9/16/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
264 | GSE5632 | 1/8/2007 | ['5632'] | [] | [] | 3012471 | [u'21149731'] | [u'Schildknecht', u'Townsend', u'Weigel', u'Emmerson', u'Lohmann', u'Schmid'] | ['Yoo', 'Ma', 'Zahn', 'Wall', 'dePamphilis', 'Albert', 'Leebens-Mack', 'Altman', 'Gitzendanner', 'Soltis', 'Chanderbali', 'Brockington'] | [] | Proc Natl Acad Sci U S A | 2010 | 12/28/2010 | 0 | Microarray expression data for leaves and mature floral organs were extracted from data sets for Eschscholzia [Gene Expression Omnibus (GEO) accession no. GSE24237], Arabidopsis (GEO accession no. GSE5632{{key}}--REUSE--), Persea (GEO accession no. GSE13737), and Nuphar (GEO accession no. GSE23082) | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
265 | GSE5632 | 1/8/2007 | ['5632'] | [] | [] | 1947970 | [u'17640358'] | [u'Schildknecht', u'Townsend', u'Weigel', u'Emmerson', u'Lohmann', u'Schmid'] | ['Agarwal', 'Kapoor', 'Tyagi', 'Arora', 'Singh', 'Ray'] | [] | BMC Genomics | 2007 | 7/18/2007 | 0 | Affymetrix GeneChip ® ATH1 Genome Array, from stages comparable to those used for rice was obtained from Gene Expression Omnibus (GEO) database at the NCBI under the series accession numbers GSE5620, GSE5621, GSE5623, GSE5624, GSE5629, GSE5630, GSE5631, GSE5632{{tag}}--REUSE-- and GSE5634. Total of 55 CEL files representing 21 stages of development as well as stress treatments were downloaded from [ 79 ] and |d 79 were studied using QPCR. Raw microarray data have been deposited in the Gene Expression Omnibus database at the National Center for Biotechnology Information under the series accession numbers GSE6893 and GSE6901. QPCR Real time PCR reactions were carried out using the same RNA samples, which were used for microarrays as described earlier [ 81 ]. In brief, primers were designed for all the genes | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
266 | GSE5632 | 1/8/2007 | ['5632'] | [] | [] | 2576257 | [u'18826656'] | [u'Schildknecht', u'Townsend', u'Weigel', u'Emmerson', u'Lohmann', u'Schmid'] | ['Kapoor', 'Lama', 'Tyagi', 'Nijhawan', 'Arora', 'Khurana'] | [] | BMC Genomics | 2008 | 10/1/2008 | 0 | oarray analysis of 17 stages was described previously [ 27 ]; deposited in the Gene Expression Omnibus database at the National Center for Biotechnology Information under the series accession numbers GSE6893 and GSE6901). Here, five more stages viz, Y-leaf, SAM (shoot apical meristem), P1-I (< 2 mm panicle), P1-II (0.2 to 0.5 mm panicle) and P1-III (5 to 10 mm panicle) have been added to emphas|00ae; ATH1 Genome Array data for 21 stages (55 .cel files) comparable to that used for rice were downloaded from Gene Expression Omnibus (GEO) database at the NCBI under the series accession numbers GSE5620, GSE5621, GSE5623, GSE5624, GSE5629, GSE5630, GSE5631, GSE5632{{tag}}--REUSE-- and GSE5634. The data were imported in ArrayAssist™ (Stratagene, La Jolla, CA) microarray analysis software wherein GCRMA algo | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
267 | GSE5632 | 1/8/2007 | ['5632'] | [] | [] | 3022907 | [u'21167079'] | [u'Schildknecht', u'Townsend', u'Weigel', u'Emmerson', u'Lohmann', u'Schmid'] | ['Wang', 'Tu', 'Guo', 'Hu', 'Peng', 'Li', 'Cui'] | [] | BMC Plant Biol | 2010 | 12/20/2010 | 0 | ns Authors' contributions Supplementary Material References Methods Database searches for OsCESA/CSL genes in rice The Hidden Markov Model (HMM) profile of the cellulose synthase domain (PF03552) was downloaded from PFam http://pfam.sanger.ac.uk/ . We employed a name search and the protein family ID PF03552 for the identification of OsCESA / CSL genes from the rice genome. Information about the chromos|l mutant" analysis. However, in the hierarchical cluster of the "artificial mutant" analysis, the expression data for regarding gene(s) or tissues were deleted. All Arabidopsis microarray data were downloaded from the Gene Expression Omnibus database http://www.ncbi.nlm.nih.gov/geo/ using the GSE series accession numbers GSE5629 , GSE5630 , GSE5631 , GSE5632{{tag}}--REUSE-- , GSE5633 and GSE5634 (Additional f | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
268 | GSE5632 | 1/8/2007 | ['5632'] | [] | [] | 2843724 | [u'20352114'] | [u'Schildknecht', u'Townsend', u'Weigel', u'Emmerson', u'Lohmann', u'Schmid'] | ['Kramer', 'Borevitz', 'Hodges', 'Voelckel'] | [] | PLoS One | 2010 | 3/23/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
269 | GSE5633 | 1/8/2007 | ['5633'] | [] | [] | 3022907 | [u'21167079'] | [u'Schildknecht', u'Townsend', u'Weigel', u'Emmerson', u'Lohmann', u'Schmid'] | ['Wang', 'Tu', 'Guo', 'Hu', 'Peng', 'Li', 'Cui'] | [] | BMC Plant Biol | 2010 | 12/20/2010 | 0 | ns Authors' contributions Supplementary Material References Methods Database searches for OsCESA/CSL genes in rice The Hidden Markov Model (HMM) profile of the cellulose synthase domain (PF03552) was downloaded from PFam http://pfam.sanger.ac.uk/ . We employed a name search and the protein family ID PF03552 for the identification of OsCESA / CSL genes from the rice genome. Information about the chromos|l mutant" analysis. However, in the hierarchical cluster of the "artificial mutant" analysis, the expression data for regarding gene(s) or tissues were deleted. All Arabidopsis microarray data were downloaded from the Gene Expression Omnibus database http://www.ncbi.nlm.nih.gov/geo/ using the GSE series accession numbers GSE5629 , GSE5630 , GSE5631 , GSE5632 , GSE5633{{tag}}--REUSE-- and GSE5634 (Additional f | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
270 | GSE5634 | 1/8/2007 | ['5634'] | [] | [] | 1947970 | [u'17640358'] | [u'Schildknecht', u'Townsend', u'Weigel', u'Emmerson', u'Lohmann', u'Schmid'] | ['Agarwal', 'Kapoor', 'Tyagi', 'Arora', 'Singh', 'Ray'] | [] | BMC Genomics | 2007 | 7/18/2007 | 0 | Affymetrix GeneChip ® ATH1 Genome Array, from stages comparable to those used for rice was obtained from Gene Expression Omnibus (GEO) database at the NCBI under the series accession numbers GSE5620, GSE5621, GSE5623, GSE5624, GSE5629, GSE5630, GSE5631, GSE5632 and GSE5634{{tag}}--REUSE--. Total of 55 CEL files representing 21 stages of development as well as stress treatments were downloaded from [ 79 ] and |d 79 were studied using QPCR. Raw microarray data have been deposited in the Gene Expression Omnibus database at the National Center for Biotechnology Information under the series accession numbers GSE6893 and GSE6901. QPCR Real time PCR reactions were carried out using the same RNA samples, which were used for microarrays as described earlier [ 81 ]. In brief, primers were designed for all the genes | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
271 | GSE5634 | 1/8/2007 | ['5634'] | [] | [] | 2576257 | [u'18826656'] | [u'Schildknecht', u'Townsend', u'Weigel', u'Emmerson', u'Lohmann', u'Schmid'] | ['Kapoor', 'Lama', 'Tyagi', 'Nijhawan', 'Arora', 'Khurana'] | [] | BMC Genomics | 2008 | 10/1/2008 | 0 | oarray analysis of 17 stages was described previously [ 27 ]; deposited in the Gene Expression Omnibus database at the National Center for Biotechnology Information under the series accession numbers GSE6893 and GSE6901). Here, five more stages viz, Y-leaf, SAM (shoot apical meristem), P1-I (< 2 mm panicle), P1-II (0.2 to 0.5 mm panicle) and P1-III (5 to 10 mm panicle) have been added to emphas|00ae; ATH1 Genome Array data for 21 stages (55 .cel files) comparable to that used for rice were downloaded from Gene Expression Omnibus (GEO) database at the NCBI under the series accession numbers GSE5620, GSE5621, GSE5623, GSE5624, GSE5629, GSE5630, GSE5631, GSE5632 and GSE5634{{tag}}--REUSE--. The data were imported in ArrayAssist™ (Stratagene, La Jolla, CA) microarray analysis software wherein GCRMA algo | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
272 | GSE5634 | 1/8/2007 | ['5634'] | [] | [] | 2993539 | [u'21044985'] | [u'Schildknecht', u'Townsend', u'Weigel', u'Emmerson', u'Lohmann', u'Schmid'] | ['Jhanwar', 'Garg', 'Tyagi', 'Jain'] | [] | DNA Res | 2010 | 2010 Dec | 0 | rice and TAIR database for Arabidopsis . Another approach used was to search for the proteins containing PFam domain PF00462, which encodes GRX domain. Further, the rice and Arabidopsis proteomes downloaded from RGAP (version 6.1) and TAIR (version 9), respectively, were searched using the hidden Markov model (HMM) profile (build 2.3.2) for GRX domain (PF00462) downloaded from the PFam database. The r| gene in a given library represents the quantitative estimate of the expression of that gene. 2.6. Expression analysis using microarray data For rice, the microarray data available at Gene Expression Omnibus (GEO) database under the series accession numbers GSE6893 (various stages of development), 21 GSE6901 (abiotic stress), 21 GSE4471 (arsenate stress), 22 GSE5167 (auxin and cytokinin treatment), 23|ays as described earlier. 21 For Arabidopsis , the microarray data for various stages of developments, hormone treatments and abiotic stress treatments corresponding to those analyzed for rice were downloaded from GEO (series accession numbers GSE5620, GSE5621, GSE5623, GSE5624 and GSE5629–GSE5634{{tag}}--REUSE--) and AtGenExpress ( http://www.arabidopsis.org/portals/expression/microarray/ATGenExpress.jsp ) dat | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
273 | GSE5634 | 1/8/2007 | ['5634'] | [] | [] | 3022907 | [u'21167079'] | [u'Schildknecht', u'Townsend', u'Weigel', u'Emmerson', u'Lohmann', u'Schmid'] | ['Wang', 'Tu', 'Guo', 'Hu', 'Peng', 'Li', 'Cui'] | [] | BMC Plant Biol | 2010 | 12/20/2010 | 0 | ns Authors' contributions Supplementary Material References Methods Database searches for OsCESA/CSL genes in rice The Hidden Markov Model (HMM) profile of the cellulose synthase domain (PF03552) was downloaded from PFam http://pfam.sanger.ac.uk/ . We employed a name search and the protein family ID PF03552 for the identification of OsCESA / CSL genes from the rice genome. Information about the chromos|l mutant" analysis. However, in the hierarchical cluster of the "artificial mutant" analysis, the expression data for regarding gene(s) or tissues were deleted. All Arabidopsis microarray data were downloaded from the Gene Expression Omnibus database http://www.ncbi.nlm.nih.gov/geo/ using the GSE series accession numbers GSE5629 , GSE5630 , GSE5631 , GSE5632 , GSE5633 and GSE5634{{tag}}--REUSE-- (Additional f | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
274 | GSE5646 | 1/1/2007 | ['5646'] | [] | [u'17101968'] | 1693798 | [u'17101968'] | ['Scheifele', 'Mart\xc3\xadnez-Murillo', u'Martinez-Murillo', 'Wheelan', 'Irizarry', 'Boeke'] | ['Scheifele', 'Mart\xc3\xadnez-Murillo', 'Wheelan', 'Irizarry', 'Boeke'] | ['Scheifele', 'Mart\xc3\xadnez-Murillo', 'Wheelan', 'Irizarry', 'Boeke'] | Proc Natl Acad Sci U S A | 2006 | 11/21/2006 | 0 | AND pmc_gds | 1 | 0 | ||||
275 | GSE5652 | 9/19/2007 | ['5652'] | [] | [u'17984051'] | 2748096 | [u'19728865'] | ['', 'Wilson', 'Paules', 'Russo', 'Tennant', 'Li', 'Fannin', 'Houle', 'Boorman', 'Huang', 'Malarkey', 'Bushel', 'Heinloth', 'Ward', 'Watkins', 'Chou'] | ['Makiguchi', 'Feng', 'Okuno', 'Tamon', 'Tsujimoto', 'Araki', 'Kunimoto', 'Niijima'] | [] | BMC Genomics | 2009 | 9/3/2009 | 1 | f uncovering gene functions on a genomic scale. In recent years, vast amounts of gene expression data have been collected, much of which are available in public databases, such as the Gene Expression Omnibus (GEO). To date, most researchers have been manually retrieving data from databases through web browsers using accession numbers (IDs) or keywords, but gene-expression patterns are not considered when|r to support researchers to utilize the public gene expression data more effectively, we developed a web tool for finding similar gene expression data and generating its co-expression networks from a publicly available database. Results GEM-TREND, a web tool for searching gene expression data, allows users to search data from GEO using gene-expression signatures or gene expression ratio data as a query an|gene-expression pattern with those of GEO gene expression data. It could be a very useful resource for finding similar gene expression profiles and constructing its gene co-expression networks from a publicly available database. GEM-TREND was designed to be user-friendly and is expected to support knowledge discovery. GEM-TREND is freely available at . Background One of the major challenges in the post-g|vering gene functions on a genomic scale [ 1 ]. In recent years, vast amounts of gene expression data have been collected, much of which are available in public databases, such as the Gene Expression Omnibus (GEO) [ 2 ], ArrayExpress [ 3 ] and researchers' websites. These resources serve at least two purposes. 
One is as an archive of the data, which allows other researchers to confirm results that have b|eveloped a web tool named GEM-TREND (Gene Expression data Mining Toward Relevant Network Discovery) to automatically retrieve gene expression data across a wide range of microarray experiments in the publicly available GEO database by comparing gene-expression patterns between a query and the database entries. Subsequently, the system generates a gene co-expression network for retrieved gene expression da|, and each series links to GEO by clicking the GSE ID or GPL ID (Fig. 3e ). In addition, the series of interest can be selected for further processing. Both search results and selected series can be downloaded in CSV format. Figure 3 Screenshot of GEM-TREND . a) Query input area. The gene-expression signature, gene expression ratio data and text are accepted. Network IDs can be used to retrieve previous | from stomach subregions (Additional file 4 -CSV-Gene expression profile of mast cells pooled from mouse stomach subregions) [ 20 ]. In the score-ordered results of query-1 (P-value < 0.01), GSE1827 (titled "Waldman Bladder tumors") was ranked in fourth. Moreover, the top 10 entries showed appropriate annotations related to tumors, inflammatory and immune responses (Table 2 ). For the query-2|ts, and seven entries among the top 10 were observed using rat liver samples (Table 3 ). The biological relationships among the top 10 results of query-3 (P-value < 0.02) were not clear, but GSE6192 (titled "Gene expression changes during murine mucosal mast cell in vitro differentiation") was found out in the twelfth rank (Table 4 ). These findings indicate the general applicability of GEM-T| search using the gene expression profile of human bladder cancer - top 10 entries sorted by similarity scores with lowest P-value < 0.01. 
Series Platform Description Similarity score P-value GSE6112 GPL4475 Tubercolosis and healthy infected patients PBMC_TB_vs_Pool_LTBI 1 0.0054 GSE3901 GPL3279 Response of quiescent human fibroblasts to different growth factors and serum 0.96 0.0004 GSE1726 GP|2567 Human breast tumor 0.854 0.0063 GSE60 GPL174 Diffuse large B-cell lymphoma 0.851 0.0029 GSE838 GPL564 Individual-specific variation of gene expression in peripheral blood leukocytes 0.845 0.0057 GSE3176 GPL1528 p53 In Inflamatory Stress Response 0.815 0.0001 GSE344 GPL273 Spotted long oligonucleotide arrays 0.813 0.0097 GSE7965 GPL3991 Blood and Adipose tissue samples 0.805 1.50E-03 The results w| the gene expression profile of rat chemical hepatocarcinogenesis - top 10 entries sorted by similarity scores with lowest P-value < 0.01. Series Platform Description Similarity score P-value GSE5337 GPL890 Gene Expression Profiling In Rat Smooth Muscle Cells Modulated by Rapamycin and Paclitaxel. 1 1.00E-04 GSE5860 GPL890 Gene expression analysis of rat livers after exposure to acetaminophen 0|en (APAP) Rat Liver Test Gene Expression Data Set 0.895 1.00E-04 GSE5381 GPL890 Gene expression analysis of liver and kidney following methapyrilene treatment in male Sprague-Dawley rats 0.685 0.0015 GSE791 GPL542 GH inj old liver (1-7) 0.657 5.00E-04 GSE4270 GPL890 Aging Induced Alterations in Hepatic Gene Expression of the Male Fisher Rat 0.637 0.0045 GSE3608 GPL3076 Renal medullary genes in salt-sen|ssion profile of mast cells pooled from mouse stomach subregions - top 20 entries sorted by similarity scores with lowest P-value < 0.02. Series Platform Description Similarity score P-value GSE3088 GPL2510 Expression profiling of Muscle tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null backgrounds 1 1.00E-04 GSE2814 GPL2510 Expression profiling of liver tissue from (C57BL/6J �|time course: regulation of uterine genes by estradiol in ovariectomized mice 0.874 5.00E-04 GSE8104 GPL5137 Primary macrophage response to L. 
monocytogenes and bacteria-derived ligands 0.855 1.70E-03 GSE8100 GPL5137 WT and myd88-/- macrophage response to WT and hly- L. monocytogenes 0.847 6.00E-04 GSE2220 GPL1832 Genetic variation of gene expression is tissue specific in inbred mice 0.795 1.00E-02 GSE4|stren Behaves as a Weak Estrogen Rather than a Non-genomic Selective Activator in the Mouse Uterus 0.792 0.0034 GSE7029 GPL2510 Zfp90 Transgenic Signature in Mouse White Adipose Tissue 0.788 2.50E-03 GSE7615 GPL2884 Cancer Process Study 0.752 0.0107 GSE7600 GPL2884 Atm-/-, mTerc-/-, p53-/- triple knock-out lymphoma vs normal mouse DNA (GPL2884) 0.752 1.13E-02 GSE3086 GPL2510 Expression profiling of Adi|ose tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null backgrounds 0.726 0.009 GSE6192 GPL891 Gene expression changes during murine mucosal mast cell in vitro differentiation 0.709 1.76E-02 GSE4248 GPL891 Identification of genes regulated by RORg in mouse thymus 0.705 1.99E-02 GSE3087 GPL2510 Expression profiling of brain tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null background|nput Reduction 0.605 8.70E-03 GSE1013 GPL967 Gene Expression Profile of NHE1 Null Mutation 0.552 0.0134 GSE8625 GPL5530 Comparison of undifferentiated ES cell lines HM1, IMT11, SHBL6.3 0.488 1.82E-02 GSE8528 GPL5369 Expression analysis of gene differentially expressed in the developping ovary 0.455 0.0011 GSE3289 GPL2828 Chronic hypoxia alters the level, maturation and control of gene expression in mou|language: Java, PHP Other requirements: Java 1.5.0 or higher License: The tool is available free of charge Any restrictions to of use by non-academics: None List of abbreviations GEO: Gene Expression Omnibus; GO: Gene Ontology; GSE: Series in GEO; GPL: Platform in GEO; MeSH: Medical Subject Headings. 
Authors' contributions CF designed the system and wrote the manuscript; MA gave comments and edited the m|l E Koller D Kim SK A Gene-Coexpression Network for Global Discovery of Conserved Genetic Modules Science 2003 302 249 255 12934013 10.1126/science.1087447 Edgar R Domrachev M Lash AE Gene Expression Omnibus: NCBI gene expression and hybridization array data repository Nucleic Acids Res 2002 30 207 210 11752295 10.1093/nar/30.1.207 Parkinson H Kapushesky M Shojatalab M Abeygunawardena N Coulson R Farne A | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
276 | GSE5652 | 9/19/2007 | ['5652'] | [] | [u'17984051'] | 2084322 | [u'17984051'] | ['', 'Wilson', 'Paules', 'Russo', 'Tennant', 'Li', 'Fannin', 'Houle', 'Boorman', 'Huang', 'Malarkey', 'Bushel', 'Heinloth', 'Ward', 'Watkins', 'Chou'] | ['Wilson', 'Paules', 'Russo', 'Tennant', 'Li', 'Fannin', 'Houle', 'Boorman', 'Huang', 'Malarkey', 'Bushel', 'Heinloth', 'Ward', 'Watkins', 'Chou'] | ['Bushel', 'Wilson', 'Paules', 'Tennant', 'Li', 'Fannin', 'Houle', 'Chou', 'Huang', 'Malarkey', 'Ward', 'Heinloth', 'Russo', 'Watkins', 'Boorman'] | Proc Natl Acad Sci U S A | 2007 | 11/13/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
277 | GSE5658 | 1/28/2007 | ['5658'] | [] | [u'17239249'] | 2759528 | [u'19834616'] | [u'Jason', 'Lanier', 'Eng', 'Dykes', 'Turner', 'Fedtsova'] | ['Eppig', 'Graber', 'Wigglesworth', 'Hutchison', 'Salisbury'] | [] | PLoS One | 2009 | 10/16/2009 | 0 | polyadenylation in oocytes [29] led to the choice of cDNA generated with random primers for the GV and MII oocyte microarrays (referred to here as GV and MII , GEO Accession GSE5658{{tag}}--REUSE-- [22] ). In contrast, the microarrays from the Dicer-knockout experiment (referred to as MIIpa and MIIdko ) used the standard oligo-dT primed cDNA [20] . Rmodel |he expanded gene annotations. Microarray data Microarray data files for GV and MII datasets [22] were obtained from the Gene Expression Omnibus [39] (Accession GSE5668). Oligo-dT primed array data files for the MIIpa and MIIdko datasets were generously provided by Richard M. Schultz [20] . Identification of differences in mRNA processing with | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
278 | GSE5658 | 1/28/2007 | ['5658'] | [] | [u'17239249'] | 1796875 | [u'17239249'] | [u'Jason', 'Lanier', 'Eng', 'Dykes', 'Turner', 'Fedtsova'] | ['Turner', 'Fedtsova', 'Lanier', 'Dykes', 'Eng'] | ['Turner', 'Lanier', 'Fedtsova', 'Dykes', 'Eng'] | Neural Dev | 2007 | 1/19/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
279 | GSE5671 | 8/30/2007 | ['5671'] | [] | [u'18246200'] | 2214848 | [u'18246200'] | ['Jie', 'Miller', 'Gearhart', 'Christoforou', 'Hill', 'McCallion'] | ['Jie', 'Miller', 'Gearhart', 'Christoforou', 'Hill', 'McCallion'] | ['Jie', 'Miller', 'Gearhart', 'Christoforou', 'Hill', 'McCallion'] | J Clin Invest | 2008 | 2008 Mar | 0 | AND pmc_gds | 1 | 0 | ||||
280 | GSE5682 | 9/1/2007 | ['5682'] | [] | [u'17873882'] | 2600418 | [u'17873882'] | ['Agarwal', 'Barnes', 'Palomero', 'Castillo', u'CordonCardo', 'Real', 'Brown', 'Buteau', 'Caparros', 'Cordon-Cardo', u'Z\xfa\xf1iga-Pfl\xfccker', 'Basso', 'Bhagat', 'Parsons', 'Nagase', 'Ferrando', 'Ciofani', 'Sulis', 'Perkins', 'Cortina', u'Mishra', 'Z\xc3\xba\xc3\xb1iga-Pfl\xc3\xbccker', 'Dominguez'] | ['Real', 'Brown', 'Parsons', 'Ciofani', 'Sulis', 'Buteau', 'Ferrando', 'Barnes', 'Basso', 'Bhagat', 'Palomero', 'Perkins', 'Z\xc3\xba\xc3\xb1iga-Pfl\xc3\xbccker', 'Cortina', 'Caparros', 'Dominguez', 'Agarwal', 'Cordon-Cardo', 'Nagase', 'Castillo'] | ['Real', 'Brown', 'Ciofani', 'Sulis', 'Buteau', 'Ferrando', 'Barnes', 'Basso', 'Bhagat', 'Parsons', 'Cordon-Cardo', 'Palomero', 'Caparros', 'Cortina', 'Z\xc3\xba\xc3\xb1iga-Pfl\xc3\xbccker', 'Dominguez', 'Agarwal', 'Perkins', 'Nagase', 'Castillo'] | Nat Med | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
281 | GSE5683 | 4/20/2007 | ['5683'] | [] | [u'17442700'] | 1669715 | [u'17090591'] | ['McCuine', 'Tenzen', 'Zhong', 'Longabaugh', 'Giles', 'Wong', 'Vokes', 'McMahon', 'Davidson', 'Ji'] | ['Wong', 'Ji', 'Vokes'] | ['Wong', 'Ji', 'Vokes'] | Nucleic Acids Res | 2006 | 2006 | 0 | iption factors and provide guidelines for their future analysis. MATERIALS AND METHODS Data preparation To conduct the comparative study, we collected ChIP-chip data for Gli (mouse, GEO accession no. GSE5683{{tag}}--REUSE--) (S.A. Vokes, H. Ji, S. McCuine, T. Tenzen, S. Giles, S. Zhong, W.J.R. Longabaugh, E.H. Davison, W.H. Wong and A.P. McMahon, submitted for publication), estrogen receptor (human) ( 2 ), Oct4, Sox2 | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
282 | GSE5685 | 1/8/2007 | ['5685'] | [] | [] | 2848248 | [u'20226034'] | [u'Emmerson', u'Schildknecht', u'Dong', u'Townsend'] | ['Blomster', 'Saloj\xc3\xa4rvi', 'Wrzaczek', 'Jaspers', 'Vainonen', 'Kangasj\xc3\xa4rvi', 'Overmyer', 'Reddy'] | [] | BMC Genomics | 2010 | 3/12/2010 | 0 | n with decreasing short-wave cut-off in the UV range (UV-B experiment); E-MEXP-739, Syringolin A; E-MEXP-1797, Rotenone), Gene Expression Omnibus http://www.ncbi.nlm.nih.gov/geo/ (accession numbers GSE5615, Elicitors LPS, HrpZ, Flg22 and NPP1; GSE5685{{tag}}--REUSE--, Virulent and avirulent Pseudomonas syringae ; GSE9955, BTH experiment 1; GDS417 E. cichoracearum ; GSE5530, H 2 O 2 ; GSE5621, Cold time course expe| GSE5622, Osmotic stress time course experiment; GSE5623, Salt time course experiment; GSE5624, Drought time course experiment; GSE5722, O 3 ; GSE12887, Norflurazon; GSE10732, OPDA and Phytoprostane; GSE7112, ABA experiment 2) and The Integrated Microarray Database System http://ausubellab.mgh.harvard.edu/imds (Experiment name: BTH time course, BTH experiment 2). The raw Affymetrix data was preproces | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
283 | GSE5715 | 5/21/2007 | ['5715'] | [] | [u'17615178'] | 2945940 | [u'20840752'] | ['Humes', u'Carnale-Zambrano', 'Canale-Zambrano', 'Poffenberger', 'Haston', 'Cory'] | ['Ojeda', 'de', 'Nitsch', 'Gon\xc3\xa7alves', 'Moreau'] | [] | BMC Bioinformatics | 2010 | 9/14/2010 | 0 | n, heat kernel, and Arnoldi kernel approximation. For comparison purposes, a local measure based on the expression of the direct neighbors is also computed. We have benchmarked these strategies on 40 publicly available knockout experiments in mice, and performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expr| approach for scoring candidates is presented here for comparison purposes as a naïve strategy for network analysis of differential expression. We have benchmarked these four strategies on 40 publicly available data sets originated from Affymetrix chips on which mice with (simple) knockout genes were tested against controls. The raw cel files were downloaded from GEO [ 12 ]. For each data set we|egy based on a direct neighborhood analysis. These four network-based prioritization strategies for scoring candidate genes based on their differentially expressed neighborhood were benchmarked on 40 publicly available knockout experiments in mice. Performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expressi|ng candidate genes using network-based machine learning approaches even if no knowledge is available about the disease or phenotype. Methods Benchmark data The benchmark for this study consists of 40 publicly available data sets originated from Affymetrix chips on which mice with knockout genes were tested against controls. The raw cel files were downloaded from GEO [ 12 ]. Table 3 shows all data sets|ur benchmark. Table 3 The benchmark data. 
Gene Name GEO accession number Gene Name GEO accession number 1 Abca1 GSE5496 21 Mbnl1 GSE14691 2 Btk GSE2826 22 Mst1r, Ron GSE16629 3 Cav1 GSE10849 23 MyD88 GSE6688 4 Cav3 GSE10848 24 Nos3, eNos GSE1988 5 Cftr GSE5715{{tag}}--REUSE-- 25 Phgdh GSE8555 6 Clcn1 GSE14691 26 Pmp22 GSE1947 7 Cnr1 GSE7694 27 PPAR α GSE6864 8 Emd GSE5304 28 Prkag3, AMPK G3 GSE4065 9 Epas1, Hi|-2 GSE16067 29 Pthlh, Pthrp GSE17654 10 Esrra GSE7196 30 Rab3a GSE6527 11 Gap43 GSE12687 31 RasGrf1 GSE8425 12 Gnmt GSE9809 32 Rbm15 GSE12628 13 Hdac1 GSE5583 33 Runx GSE4911 14 Hdac2 GSE6770 34 Scd1 GSE2926 15 Hsf4 GSE12415 35 Slc26a4 GSE10587 16 Hspa1A, Hsp70.1 GSE11120 36 Srf GSE13333 17 Il6 GSE411 37 Tgm2 GSE10285 18 Lhx1, Lim1 GSE4230 38 Zc3h12a GSE14891 19 Lhx8 GSE11897 39 Zfp36, Tpp GSE5324 20 L | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
284 | GSE5720 | 3/26/2007 | ['5720'] | [] | [u'17339364', u'20053763'] | 2655965 | [u'18999108'] | ['Scherf', 'Pommier', 'Shankavaram', 'Lorenzi', 'Chary', 'Ikediobi', 'Lee', 'Cossman', 'Nishizuka', 'Scudiero', 'Ho', 'Reinhold', 'Weinstein', 'Dolginow', 'Reimers', 'Major', 'Petricoin', 'Kaldjian', 'Kahn', 'Ziegler', 'Morita', 'Liotta', 'Bussey'] | ['Liu', 'Clarke', 'Yoon', 'Li'] | [] | AMIA Annu Symp Proc | 2008 | 11/6/2008 | 1 | eriments. In this paper, we report our experience in exploring Gene Expression Data (MGED) ontology (MO) 9 and NCI Thesaurus for annotating breast cancer microarray data available at Gene Expression Omnibus (GEO) 10 . Specifically, we tailored NCI Thesaurus to obtain breast cancer microarray clinical ontology (BCM-CO), an ontology to capture breast cancer microarray clinical information. The coverage of|I Metathesaurus are used to provide terminology support to the public Web portal, http://cancer.gov , numerous portals supporting consortia, and other communities of researchers. The Gene Expression Omnibus (GEO) was initiated to serve as a public repository for a wide range of high-throughput experimental data, which includes data from single and dual channel microarray-based experiments measuring mRNA| The prototype coverage with respect to four categories is discussed below in detail. DiseaseState and Histology In the BCM-CO prototype, 82 histology nodes were mapped. We identified six series ( GSE2109 , GSE5949 , GSE5720{{tag}}--REUSE-- , GSE6595 , GSE1477 , and GSE7849 ) in which histology terms can be extracted using simple patterns. Overall, 52 terms were retrieved, among which 48 are histology terms, 2 | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
285 | GSE5720 | 3/26/2007 | ['5720'] | [] | [u'17339364', u'20053763'] | 2729377 | [u'19714244'] | ['Scherf', 'Pommier', 'Shankavaram', 'Lorenzi', 'Chary', 'Ikediobi', 'Lee', 'Cossman', 'Nishizuka', 'Scudiero', 'Ho', 'Reinhold', 'Weinstein', 'Dolginow', 'Reimers', 'Major', 'Petricoin', 'Kaldjian', 'Kahn', 'Ziegler', 'Morita', 'Liotta', 'Bussey'] | ['Potti', 'Chang', 'Mori', 'Andrechek', 'Nevins'] | [] | PLoS One | 2009 | 8/28/2009 | 0 | ypes. To validate the “basal-luminal” signature from cultured cell lines, we used three independent datasets for human in vivo breast cancer (GEO; http://www.ncbi.nlm.nih.gov/geo ; GSE1456, GSE1561 and GSE3744) [19] , [20] , [21] . Gene expression signatures for RAS and PI3K used in this study were generated by adenoviral overexpress|, among which 6,638 have chemical names (or equivalents). Conversion of gene expression signature to drug response We used the gene expression data of the NCI-60 on Affymetrix U133A/B chips from GEO (GSE5720{{tag}}--REUSE--) [22] for the analyses in this study. To select compounds, we calculate the Pearson correlation between the predicted probability for the phenotype of interest against the GI50 va|ependent datasets of human primary breast cancers. The predicted probability for basal (blue) or luminal (red) are shown in a heatmap with the labeling for the cell type classification by microarray (GSE1456), the immunoreactivity status for estrogen and progesterone receptor (GSE1561) or the status for basal subtype by cytokeratin expression patterns (GSE3744). C, D and E. Prediction for basal and lum|ated using Mann-Whitney U test. A bar indicates mean value for each group. The predicted probability for basal or luminal is shown with the labeling for the cell type classification by microarray (C; GSE1456), the immunoreactivity status for estrogen and progesterone receptor (D; GSE1561) or the status for basal subtype by cytokeratin expression patterns (E; GSE3744). 
Accuracy of the prediction was als | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
286 | GSE5720 | 3/26/2007 | ['5720'] | [] | [u'17339364', u'20053763'] | 2821037 | [u'20053763'] | ['Scherf', 'Pommier', 'Shankavaram', 'Lorenzi', 'Chary', 'Ikediobi', 'Lee', 'Cossman', 'Nishizuka', 'Scudiero', 'Ho', 'Reinhold', 'Weinstein', 'Dolginow', 'Reimers', 'Major', 'Petricoin', 'Kaldjian', 'Kahn', 'Ziegler', 'Morita', 'Liotta', 'Bussey'] | ['Ikediobi', 'Weinstein', 'Pommier', 'Nishizuka', 'Shankavaram', 'Ziegler', 'Lorenzi', 'Ho', 'Bussey', 'Reimers', 'Reinhold'] | ['Ikediobi', 'Weinstein', 'Pommier', 'Nishizuka', 'Shankavaram', 'Ziegler', 'Lorenzi', 'Ho', 'Bussey', 'Reimers', 'Reinhold'] | Mol Cancer Ther | 2010 | 2010 Jan | 0 | AND pmc_gds | 1 | 0 | ||||
287 | GSE5722 | 1/8/2007 | ['5722'] | [] | [] | 2848248 | [u'20226034'] | [u'Emmerson', u'Short', u'Schildknecht', u'Shirras', u'Townsend'] | ['Blomster', 'Saloj\xc3\xa4rvi', 'Wrzaczek', 'Jaspers', 'Vainonen', 'Kangasj\xc3\xa4rvi', 'Overmyer', 'Reddy'] | [] | BMC Genomics | 2010 | 3/12/2010 | 0 | n with decreasing short-wave cut-off in the UV range (UV-B experiment); E-MEXP-739, Syringolin A; E-MEXP-1797, Rotenone), Gene Expression Omnibus http://www.ncbi.nlm.nih.gov/geo/ (accession numbers GSE5615, Elicitors LPS, HrpZ, Flg22 and NPP1; GSE5685, Virulent and avirulent Pseudomonas syringae ; GSE9955, BTH experiment 1; GDS417 E. cichoracearum ; GSE5530, H 2 O 2 ; GSE5621, Cold time course expe| GSE5622, Osmotic stress time course experiment; GSE5623, Salt time course experiment; GSE5624, Drought time course experiment; GSE5722{{tag}}--REUSE--, O 3 ; GSE12887, Norflurazon; GSE10732, OPDA and Phytoprostane; GSE7112, ABA experiment 2) and The Integrated Microarray Database System http://ausubellab.mgh.harvard.edu/imds (Experiment name: BTH time course, BTH experiment 2). The raw Affymetrix data was preproces | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
288 | GSE5745 | 1/8/2007 | ['5745'] | [] | [] | 2811198 | [u'20126659'] | [u'Emmerson', u'Yang', u'Schildknecht', u'Dong', u'Townsend'] | ['Kwezi', 'Ruzvidzo', 'Morse', 'Meier', 'Gehring', 'Donaldson'] | [] | PLoS One | 2010 | 1/26/2010 | 0 | change (log2) in expression of AtWAKL10 and all genes in the ECGG in response to selected microarray experiments. The experiments presented include; chitooctaose (30 minutes after treatment (mat), GSE8319), elf26 (30 mat, E-MEXP-547), flg22 (1 hour after treatment (hat), NASC-409), NPP1 (1 hat, GSE5615), HrpZ (1 hat, GSE5615), P. infestans (6 hat, NASC-123), B. graminis h (12 hat, GSE12856), E.|4 hat, NASC-120), BTH treatment (BTH vs. untreated (Col-0) and BTH ( npr1) vs. BTH (Col-0), 8 hat, NASC-392), mpk4 (At4g01370, E-MEXP-174), mkk1 (At4g26070) and mkk2 (At4g29810) double mutant (GSE10646), and CHX (3 hat, NASC-189). Details of the microarray experimental conditions are presented in Text S2 (Supporting Information). 10.1371/journal.pone.0008904.g007 Figure 7 Expression of AtWAKL|10 following pathogen and elicitor challenge. ( A ) Fold change in AtWAKL10 expression following incubation with the pathogen elicitors, chitooctaose (Col-0 and cerk1 (At3g21630) mutant, 30 mat, GSE8319); elf26 (30 mat, E-MEXP-547), flg22 (1 hat, NASC-409), NPP1 (1 hat, GSE5615), HrpZ (1 hat, GSE5615) and Syringolin A (12 hat, E-MEXP-739) as determined from microarray experiments. ( B ) Fold chang|owing challenge with P. infestans (6 hat, NASC-123), B. graminis h (12 hat, GSE12856), E. cichoracearum (3 dat, GSE431), G. orontii time course (Col-0 and eds16/ics1 mutant, 3, 5 and 7 dat, GSE13739), B. cinerea (18 and 48 hat, NASC-167) and OG treatment (1 hat, NASC-409) as determined from microarray experiments. ( C ) Semi-quantitative RT-PCR gel image illustrating AtWAKL10 expression o|rray experiments. 
Fold change in AtWAKL10 expression was also determined from microarray experiments in a number of pathogen-related mutants including mpk4 (E-MEXP-174), mkk1mkk2 double mutant (GSE10646) and cpr5 (At5g64930, GSE5745{{tag}}--REUSE--). ( B ) Semi-quantitative RT-PCR confirmed the induction of AtWAKL10 expression in response to 10 µM CHX (3 hat) relative to the DMSO control (C). UBQ w|.genevestigator.com ) using the stimulus and mutation tools [96] . In order to obtain greater resolution of gene expression profiles, the normalized microarray data were subsequently downloaded and analyzed for experiments that were found to induce differential expression of the genes. The data were downloaded from the following repository sites; NASCArrays ( http://affymetrix.arabidopsis | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
289 | GSE5749 | 1/22/2007 | ['5749'] | [] | [] | 2847557 | [u'20230623'] | [u'Emmerson', u'Benfey', u'Schildknecht', u'Birnbaum', u'Townsend'] | ['Holman', 'Wilson', 'Kenobi', 'Holdsworth', 'Dryden', 'Hodgman', 'Wood'] | [] | Plant Methods | 2010 | 3/15/2010 | 0 | /www.ebi.ac.uk/arrayexpress/ with accession number [E-MEXP-2608]. Other data sets used in this manuscript were obtained from GEO http://www.ncbi.nlm.nih.gov/geo/ with the accession numbers: [GEO: GSE5749{{tag}}--REUSE-- ] [ 4 ], [GEO: GSE432 ] [ 24 ] and [GEO: GSE3350 ] [ 25 ]. Competing interests The authors declare that they have no competing interests. Authors' contributions TJH produced the material, generat | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
290 | GSE5764 | 3/16/2007 | ['5764'] | ['2635'] | [u'17389037'] | 2872436 | [u'20335537'] | ['Bouchal', 'Skarda', 'Srovnal', 'Klein', 'Wei', 'Murray', 'Hajduch', 'Baumforth', 'Kolar', 'Ehrmann', 'Fridman', 'Dziechciarkova', 'Turashvili'] | ['Potti', 'Gatza', 'Lucas', 'Nevins', 'Barry', 'Kelley', 'Datto', 'Mathey-Prevot', 'Kim', 'Crawford', 'Wang'] | [] | Proc Natl Acad Sci U S A | 2010 | 4/13/2010 | 0 | sion Materials and Methods Supplementary Material References Materials and Methods Human Breast Tumor Samples and Cancer Cell Lines. A total of 1,143 patient samples from 10 independent datasets ( GSE1456 , GSE1561 , GSE2034 , GSE3494 , GSE3744 , GSE4922 , GSE5460 , GSE5764{{tag}}--REUSE-- , GSE6596 , and E-TABM-158) were analyzed ( 9 , 32 – 40 ). The validation dataset ( n = 547) was derived from | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
291 | GSE5764 | 3/16/2007 | ['5764'] | ['2635'] | [u'17389037'] | 2896865 | [u'20625410'] | ['Bouchal', 'Skarda', 'Srovnal', 'Klein', 'Wei', 'Murray', 'Hajduch', 'Baumforth', 'Kolar', 'Ehrmann', 'Fridman', 'Dziechciarkova', 'Turashvili'] | ['Lei', 'Hou', 'Wang', 'Li'] | [] | J Biomed Biotechnol | 2010 | 2010 | 0 | network analysis. 2.6. Dataset To evaluate the performance of the proposed method, seven gene expression datasets were used in this study: Acute Lymphoblastic Leukemia (ALL) [ 17 ], Breast cancer 30 (GSE5764{{tag}}--REUSE--) [ 18 ], Breast cancer 22(GSE8977) [ 18 ], Colon cancer [ 19 ], Prostate cancer 102 [ 20 ], and Prostate cancer 34 [ 21 ]. The two pairs of cross-platform datasets were used to evaluate the general | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
292 | GSE5764 | 3/16/2007 | ['5764'] | ['2635'] | [u'17389037'] | 2799788 | [u'19995982'] | ['Bouchal', 'Skarda', 'Srovnal', 'Klein', 'Wei', 'Murray', 'Hajduch', 'Baumforth', 'Kolar', 'Ehrmann', 'Fridman', 'Dziechciarkova', 'Turashvili'] | ['Rubin', 'Green'] | [] | Proc Natl Acad Sci U S A | 2009 | 12/22/2009 | 0 | ith refs. 1 and 3 (breast and colorectal cancers), ref. 5 (glioblastoma multiforme), ref. 4 (pancreatic cancer), and ref. 2 (protein kinases). Gene sequences and CpG island annotations were downloaded from the University of California, Santa Cruz (UCSC) Genome Browser database, release hg18, except for Fig. 3 , which used hg17 ( 34 ). Where necessary, we used the UCSC liftOver tool to convert m| from human genome build 35 to build 36. For type-specific analyses, tumors were classified by tissue of origin. Normalized gene expression data were obtained from GEO ( 35 ) series accession number GSE5764{{tag}}--REUSE-- ( 31 ). Data Filtering. We obtained the list of genes sequenced in ref. 3 and their RefSeq ( 28 ) identifiers from the table of PCR primers in supplemental online material ( 3 ). We discarded|browser (release hg18) for 498 of 518 genes analyzed in ref. 2 . Point mutation positions reported in ref. 2 were manually curated to resolve inconsistencies between RefSeq sequences and sequences downloaded from the Cancer Genome Project (CGP) database ( http://www.sanger.ac.uk/genetics/CGP/Studies/Kinases/ ). For five genes, we manually corrected apparent frameshift errors in the RefSeq sequences. Ge | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
293 | GSE5764 | 3/16/2007 | ['5764'] | ['2635'] | [u'17389037'] | 1852112 | [u'17389037'] | ['Bouchal', 'Skarda', 'Srovnal', 'Klein', 'Wei', 'Murray', 'Hajduch', 'Baumforth', 'Kolar', 'Ehrmann', 'Fridman', 'Dziechciarkova', 'Turashvili'] | ['Bouchal', 'Skarda', 'Srovnal', 'Klein', 'Wei', 'Murray', 'Hajduch', 'Baumforth', 'Kolar', 'Ehrmann', 'Fridman', 'Dziechciarkova', 'Turashvili'] | ['Bouchal', 'Skarda', 'Srovnal', 'Klein', 'Wei', 'Murray', 'Hajduch', 'Baumforth', 'Kolar', 'Ehrmann', 'Fridman', 'Dziechciarkova', 'Turashvili'] | BMC Cancer | 2007 | 3/27/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
294 | GSE5766 | 11/10/2007 | ['5766'] | [] | [u'18234993'] | 2792233 | [u'18234993'] | ['Peterson', 'Farjo', 'Naash'] | ['Peterson', 'Farjo', 'Naash'] | ['Peterson', 'Farjo', 'Naash'] | Invest Ophthalmol Vis Sci | 2008 | 2008 Feb | 0 | AND pmc_gds | 1 | 0 | ||||
295 | GSE5771 | 8/29/2007 | ['5771'] | [] | [u'17082818'] | 1794695 | [u'17082818'] | ['Mathieu', 'Zhu', 'Tariq', 'Smathajitt', 'Habu', 'Paszkowski', 'Probst'] | ['Mathieu', 'Zhu', 'Tariq', 'Smathajitt', 'Habu', 'Paszkowski', 'Probst'] | ['Mathieu', 'Zhu', 'Tariq', 'Smathajitt', 'Habu', 'Paszkowski', 'Probst'] | EMBO Rep | 2006 | 2006 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
296 | GSE5773 | 1/24/2007 | ['5773'] | [] | [u'17189378'] | 1781356 | [u'17189378'] | ['Miller', 'Amores', 'Cresko', 'Johnson', 'Dunham'] | ['Miller', 'Amores', 'Cresko', 'Johnson', 'Dunham'] | ['Miller', 'Amores', 'Cresko', 'Johnson', 'Dunham'] | Genome Res | 2007 | 2007 Feb | 0 | AND pmc_gds | 1 | 0 | ||||
297 | GSE5781 | 1/7/2007 | ['5781'] | [] | [u'17236129'] | 1785342 | [u'17236129'] | ['Newbury-Ecob', 'Mundlos', 'Neumann', 'Megarbane', 'Seemanova', 'Habenicht', 'Klopocki', 'Strauss', 'Horn', 'Schulze', 'Trotier', 'Ullmann', 'Fleischhauer', 'K\xc3\xb6nig', u'K\xf6nig', 'Greenhalgh', 'Ott', 'Hall', 'Ropers'] | ['Newbury-Ecob', 'Mundlos', 'Neumann', 'Megarbane', 'Seemanova', 'Habenicht', 'Klopocki', 'Strauss', 'Horn', 'Schulze', 'Trotier', 'Ullmann', 'Fleischhauer', 'K\xc3\xb6nig', 'Greenhalgh', 'Ott', 'Hall', 'Ropers'] | ['Newbury-Ecob', 'Neumann', 'Mundlos', 'Seemanova', 'Habenicht', 'Klopocki', 'Strauss', 'Horn', 'Megarbane', 'Trotier', 'Ullmann', 'Fleischhauer', 'Schulze', 'K\xc3\xb6nig', 'Greenhalgh', 'Ott', 'Hall', 'Ropers'] | Am J Hum Genet | 2007 | 2007 Feb | 0 | AND pmc_gds | 1 | 0 | ||||
298 | GSE5784 | 8/6/2007 | ['5784'] | [] | [u'17637744'] | 2664703 | [u'19352461'] | ['', 'Kaneko', 'Ohira', 'Ishii', 'Pinkel', 'Oba', 'Hirata', 'Fridlyand', 'Feuerstein', 'Nakagawara', 'Tomioka', 'Yoshida', 'Isogai', 'Misra', 'Todo', 'Nakamura', 'Albertson'] | ['Shay', 'Domany', 'Reiner-Benaim', 'Hegi', 'Lambiv'] | [] | Cancer Inform | 2009 | 2009 | 0 | monstrate the method for three different public aCGH datasets from two different childhood neoplasms associated with the nervous system on three different BAC array platforms: Medulloblastoma—GSE8634; Neuroblastoma—GSE5784{{tag}}--REUSE-- 37 and GSE7230. 38 Results Algorithm Our method uses aCGH data to create a concise genomic description of each sample, including chromosomal status and appearance of|d separately but similarly, using the same method. Input The algorithm’s input is the raw log2 aCGH data, and the markers’ status. The raw log2 ratio data of chromosome 2p, taken from GSE7230, is presented in Figure 1A . Markers’ status is the assignment per marker per sample—loss (−1), normal (0) or gain (1). The status was set by the R package GLAD (Gain and L|tions matrix A, which has binary valued elements: A ms = 1 if the aCGH marker m was assigned a gain value on sample s, and A ms = 0 otherwise (the amplification matrix of chromosome 2p based on the GSE7230 data is shown in Fig. 1B ). A deletions matrix D is defined similarly: D ms = 1 if the aCGH marker m has a loss assignment on sample s, and D ms = 0 otherwise (deletion matrix is not shown). Mar| in which an entire chromosome arm is lost, the corresponding entries are replaced by NaNs in the deletion matrix D. Figure 1 displays the amplification volume calculation for chromosomal arm 2p in GSE7230 (Neuroblastoma). The height matrix H is actually the raw log2 ratio. H ms ( Fig. 1A ) is the measured aCGH log2 ratio value of marker m in sample s. A ms ( Fig. 1B ) is the amplification matrix|ntary Table 5. 
Significantly amplified markers appear in Supplementary Table 4, and amplifications in Supplementary Table 6. Medulloblastoma When applied to the Medulloblastoma dataset analyzed here (GSE8634) our method finds all the known chromosomal aberrations of this cancer, and several possibly new ones as well. Figure 2 displays the chromosome status map of the Medulloblastoma dataset, and the |ious chromosomal translocations in hematological malignancies. NPM1 was associated with centrosome duplication and the regulation of p53, and might have a role as a tumor suppressor. 47 This dataset (GSE8634) has not yet been published, but dataset GSE2139 that includes a subset of the samples 41 was analyzed for local aberrations. This publication included a list of amplifications and deletions. We s|were identified as significantly amplified by our method—MYCN, CDK6 and marker RP11–382A18. Marker RP11–382A18 is annotated near MYC region on chromosome 8q by the platform of GSE2139, used by. 41 MYC amplification and MYCN amplification are mutually exclusive. Nine of the amplifications reported there were not identified by our method. Four of their deletions included markers |as MYCN amplification and 1p deletion. Group 2B is characterized by 11q deletion, and to a lesser extent, 3p deletion. This classification explains most of the chromosomal arms associations found. In GSE5784{{tag}}--REUSE-- there are 15 amplifications (Supplementary Table 6B, 28 markers amplified, Supplementary Table 4B) and 115 deletions (Supplementary Table 3B, 245 markers deleted, Supplementary Table 5B). In GSE723|plified Supplementary Table 4C) and 49 deletions (Supplementary Table 5C, 87 markers deleted, Supplementary Table 3C). Three amplifications and 14 deletions are common to both Neuroblastoma datasets (GSE5784{{tag}}--REUSE--, GSE7230) ( Table 2 , Fig. 3C and D ). The first amplified region, which was separated into two regions in GSE7230, is on chromosome 2, and corresponds to the MYCN region. 
MYCN amplifications were|ance with this region being a known frequent normal copy number variation. 48 Eight of the common deletions correspond to the 1pter deletion, and this deletion was fractioned into eight deletions in GSE7230. Another common deletion is in the region of BRCA1, a known tumor suppressor gene. In GSE5784{{tag}}--REUSE--, several known tumor suppressor genes were deleted—APC, CDKN2A, RB1 and TGFBR1. Also, two regio| 11, that includes CCND1, FGF19, FGF3, FGF4 was amplified, as well as a region on chromosome 12 with ETV6. For GSE5784{{tag}}--REUSE--, no aberration list was given in the original publication 37 for comparison. In GSE7230, the ALK region on chromosome 2 was amplified. ALK was previously identified as having a role in Neuroblastoma. 49 The fumarate hydratase (FH) region was deleted in GSE7230. FH was shown to be a t|for the entire genome and for each chromosomal arm separately. We applied our method on three public datasets of childhood neoplasms associated with the nervous system—one of Medulloblastoma (GSE8634) and two of Neuroblastoma (GSE5784{{tag}}--REUSE--, GSE7230). In Medulloblastoma, we find five distinct sub groups. Two sub groups with isochromosome 17, one with many other chromosomal events (2), and one with fe|erwise it was left in the analysis. Potentially inaccurate location was identified for 17 to 144 markers per dataset, which constitute 0.7%–3.5% of the markers (see Table 1 ). We noticed for GSE8634 that many aberrations were highly correlated, and correlated to gender. Some of the samples were probably hybridized to opposite sex control samples. The 28 markers whose two sided t-test p-value b|te controlling procedures 10.1093/bioinformatics/btf877 Bioinformatics 2003 19 368 375 12584122 Figure 1 Calculation of the “volume” statistic for chromosomal arm 2p amplifications in GSE7230 (Neuroblastoma) A ) The height matrix H (raw data) of 2p, where each element (m, s) on 2p is the log2 ratio of aCGH marker m in sample s. 
Each row corresponds to a marker, and each column correspon|arked in A–E by red asterisks. For presentation only, values are truncated to [−1, 1]. Figure 2 Chromosomal status and aberrations in Medulloblastoma A) Chromosomal status of dataset GSE8634. Each row corresponds to a chromosomal arm. Due to space limitation, only every second arm is labelled. Since some chromosomes are telocentric (with short p arm), there is a change from p to q. Val|tion only, values are truncated to the range [−1, 1], rising from blue to red. Figure 3 Chromosomal status and aberrations common to both Neuroblastoma datasets Chromosomal status of datasets GSE5784{{tag}}--REUSE-- ( A ) and GSE7230 ( B ), and the aberrations common to both of Neuroblastoma datasets, shown for the patients of GSE5784{{tag}}--REUSE-- ( C ) and GSE7230 ( D ). Each column corresponds to a sample. Samples are ma|the range [−1, 1], rising from blue to red. Table 1 Array CGH datasets analyzed. Dataset Condition Samples # Markers # Markers amplified Amplifications Markers deleted Deletions CNV Removed # GSE8634 Medulloblastoma 80 6295 13 10 137 99 4 126 GSE5784{{tag}}--REUSE-- Neuroblastoma 236 2457 28 15 245 115 4 17 GSE7230 Neuroblastoma 82 4073 30 18 87 49 0 144 The datasets are recognized by their Gene Expression Omn | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
299 | GSE5788 | 8/27/2007 | ['5788'] | ['2908'] | [u'17713554'] | 2737286 | [u'19750229'] | ['Klein-Hitpass', 'D\xc3\xbchrsen', 'Harder', 'Martin-Subero', u'D\xfcrig', 'Boes', 'J\xc3\xb6ns', u'J\xf6ns', u'D\xfchrsen', 'Baudis', 'Siebert', u'Martin-Dubero', 'Bug', 'D\xc3\xbcrig'] | ['Ammerpohl', 'Wickham-Garcia', 'Pott', 'Tr\xc3\xbcmper', 'Dreyling', 'Alvarez', 'Siebert', 'Vater', 'Prosper', 'Seifert', 'Stein', 'Deckert', 'Suela', 'Cruz', 'Haferlach', 'K\xc3\xbcppers', 'Fan', 'Montesinos-Rongen', 'Du', 'Bug', 'Klapper', 'Gesk', 'Nagel', 'Harder', 'Martin-Subero', 'Hansmann', 'Dyer', 'Richter', 'D\xc3\xbcrig', 'Bibikova', 'Calasanz', 'Agirre', 'Br\xc3\xbcggemann', 'Hartmann', 'Rom\xc3\xa1n-G\xc3\xb3mez'] | ['Siebert', 'Harder', 'Martin-Subero', 'Bug', 'D\xc3\xbcrig'] | PLoS One | 2009 | 9/11/2009 | 0 | peripheral blood using data generated with the Affymetrix U133A array [26] (raw expression data has been deposited in a MIAME compliant format in the GEO database, accession number GSE5788{{tag}}--DEPOSIT--). A fold change (in log2 scale) was calculated between the mean of gene expression data per tag in T-PLLs and T-cell controls. Enrichment for polycomb repressor complex 2 (PRC2) marks and promoter | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
300 | GSE5790 | 9/1/2007 | ['5790'] | [] | [u'17522210'] | 1951294 | [u'17522210'] | ['Lukashevich', 'Crasta', 'Salvato', 'Zapata', u'Hammameieh', 'Swindells', 'Pauza', 'Sobral', 'Djavani', 'Hammamieh', 'Davis', u'Fagan', 'Fei', 'Folkerts', 'Bryant', 'Jett'] | ['Lukashevich', 'Crasta', 'Salvato', 'Zapata', 'Swindells', 'Pauza', 'Sobral', 'Djavani', 'Hammamieh', 'Davis', 'Fei', 'Folkerts', 'Bryant', 'Jett'] | ['Crasta', 'Salvato', 'Zapata', 'Swindells', 'Lukashevich', 'Sobral', 'Djavani', 'Hammamieh', 'Folkerts', 'Davis', 'Fei', 'Pauza', 'Bryant', 'Jett'] | J Virol | 2007 | 2007 Aug | 0 | AND pmc_gds | 1 | 0 | ||||
301 | GSE5791 | 1/1/2007 | ['5791'] | [] | [u'17456239'] | 2950125 | [u'20957185'] | ['Schweitzer', 'Blume', 'Staples', 'Clark', 'Lu', 'Chen', 'Williams', 'Wang'] | ['Marais', 'Vibranovski', 'Long', 'Landback', 'Zhang'] | [] | PLoS Biol | 2010 | 10/5/2010 | 0 | genes into the mammalian X chromosome, which may be independent of major chromosomal changes. X-Linked Young Genes Are Male-Biased, Whereas X-Linked Old Genes Are Not Based on human body index data (GSE7307, Materials and Methods ) and mouse tissue profiling data [27] at the NCBI GEO database [28] , we identified genes with sex-biased expression ( Materials and Meth| S5 ) motivated us to perform more thorough transcriptional profiling to get a more complete picture of how genes from this evolutionary period are transcribed. We investigated mouse exon atlas data (GSE15998) to ask whether X-linked genes are more frequently expressed in the tissue of interest across different age groups. We clustered tissues by the proportion of X-linked genes expressed versus the pr|uggests that stronger positive selection acts on rodents and could explain why the recent peak of gene gain ( Figure 2 ) began earlier in the mouse lineage than in the human. Materials and Methods We downloaded Ensembl [54] release 51 (November, 2008) as the basic gene dataset for our analyses. We used MySQL V5.0.45 to organize the data, BioPerl [55] and BioEnsembl |e defined as de novo. Expression Profiling In order to avoid non-specific probes and to cover more recently annotated genes, we used the customized array annotation files (released on November, 2008) downloaded from University of Michigan [62] , HGU133Plus2_Hs_ENSG (Affymetrix Human 133 plus 2) and Mouse4302_Mm_ENSG (Affymetrix Mouse Genome 430 2.0 Array) for human and mouse, respectively|at this dataset does not overlap with what we described in Table 3 , since Table 3 only presents genes with unique probes, which 19 of these 20 genes do not have. 
Branch-Specific Ka/Ks Analysis We downloaded the vertebrate-wide 44-way coding sequence alignment from UCSC. UCSC known genes mapping to multiple Ensembl genes were discarded. For Ensembl genes mapping to multiple UCSC known genes, we retaine|icting with the age were removed. Based on the species tree ( Figure 1 ), we estimated Ka/Ks for each branch using free ratio model in PAML [71] . Functional Enrichment Analysis We downloaded Gene Ontology (GO) annotations for Ensembl V51. We used the program analyze.pl V1.9 of TermFinder package [72] to identify those significant terms for new genes, with multiple tes|ginating on a given chromosome out of all genes originating during that evolutionary period, that is, in that phylogenetic branch. Since human and mouse chromosomes are not completely orthologous, we downloaded net chain information (table netMm9) between human and mouse from UCSC [74] and extracted the top mouse hit for each individual human chromosome. For example, the top hit in mouse|uman and N  = 0.90, r  = 0.008, and d  = 0.22 for mouse. Panel A is based on Affymetrix Research Exon Array data for humans (GSE5791{{tag}}--REUSE--), while panel B is based on the Affymetrix Mouse Exon Array Panel. For the former, since the raw CEL file is not available, we downloaded the processed data from GEO website [28] ,|ndance (RA) of nine control tissues in mice. (0.55 MB DOC) Click here for additional data file. Figure S7 Heatmap of expression enrichment in X chromosome and autosome based on human body index data (GSE7307). The axes are labeled as in Figure 6 of the main text. Note that branches 10, 11, and 12 were skipped since these branches have too few (<5) genes with unique probes on the X chromosome | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
302 | GSE5791 | 1/1/2007 | ['5791'] | [] | [u'17456239'] | 2636774 | [u'19154578'] | ['Schweitzer', 'Blume', 'Staples', 'Clark', 'Lu', 'Chen', 'Williams', 'Wang'] | ['Shah', 'Pallas'] | [] | BMC Bioinformatics | 2009 | 1/20/2009 | 0 | tes, and in turn the SI, under-estimating the extent of differential splicing. This highlights one of the pitfalls of the SI approach. A similar study was done on a tissue panel data set (GEO dataset GSE5791{{tag}}--REUSE--) using the Affymetrix Research Exon Arrays [ 12 ]. The design of this array differs to that of the HuEx 1.0ST arrays as they consisted of a GeneChip array set of 4 chips and each probeset had up to|t, cerebellum, heart, kidney, liver, muscle, pancreas, prostate, spleen, testes and thyroid) with three assay replicates per tissue. Samples are from a commercial source. The dataset is available for download from the Affymetrix website: Data Summarisation and Normalisation The exon-level and gene-level data was generated from the CEL files using the Affymetrix Power Tools (APT). The library files used w | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
303 | GSE5791 | 1/1/2007 | ['5791'] | [] | [u'17456239'] | 1950531 | [u'17626050'] | ['Schweitzer', 'Blume', 'Staples', 'Clark', 'Lu', 'Chen', 'Williams', 'Wang'] | ['Schweitzer', 'Blume', 'Clark', 'Poliakov', 'Minovitsky', 'Arribere', 'Dubchak', 'Marr', 'Das', 'Conboy', 'Yamamoto'] | ['Schweitzer', 'Blume', 'Clark'] | Nucleic Acids Res | 2007 | 2007 | 0 | ncluding annotated genes, cDNA sequences and exon prediction algorithms. Design information and microarray data is available at the GEO database ( http://www.ncbi.nlm.nih.gov/geo/ ; accession number: GSE5791{{tag}}--DEPOSIT--). Candidate muscle-enriched probesets were identified using the splicing index approach ( 26 , 36 , 37 ). Exon-level expression was normalized to the expression level of the parent gene by dividing | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
304 | GSE5791 | 1/1/2007 | ['5791'] | [] | [u'17456239'] | 1896007 | [u'17456239'] | ['Schweitzer', 'Blume', 'Staples', 'Clark', 'Lu', 'Chen', 'Williams', 'Wang'] | ['Schweitzer', 'Blume', 'Staples', 'Clark', 'Lu', 'Chen', 'Williams', 'Wang'] | ['Schweitzer', 'Blume', 'Staples', 'Clark', 'Lu', 'Chen', 'Williams', 'Wang'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
305 | GSE5793 | 2/8/2007 | ['5793'] | ['3252'] | [u'17096597'] | 1635533 | [u'17096597'] | ['Lee', 'Reinke', 'Ausubel', 'Chu', 'Kim', 'Troemel'] | ['Lee', 'Reinke', 'Ausubel', 'Chu', 'Kim', 'Troemel'] | ['Lee', 'Reinke', 'Ausubel', 'Chu', 'Kim', 'Troemel'] | PLoS Genet | 2006 | 11/10/2006 | 0 | AND pmc_gds | 1 | 0 | ||||
306 | GSE5794 | 10/9/2007 | ['5794'] | [] | [u'17895889'] | 2360455 | [u'17895889'] | ['Zhang', 'Kurashina', 'Ohyashiki', 'Takaku', 'Hamamura', 'Kobayashi'] | ['Zhang', 'Kurashina', 'Ohyashiki', 'Takaku', 'Hamamura', 'Kobayashi'] | ['Zhang', 'Kurashina', 'Ohyashiki', 'Takaku', 'Hamamura', 'Kobayashi'] | Br J Cancer | 2007 | 10/22/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
307 | GSE5801 | 2/8/2007 | ['5801'] | ['3253'] | [u'17096597'] | 1635533 | [u'17096597'] | ['Lee', 'Reinke', 'Ausubel', 'Chu', 'Kim', 'Troemel'] | ['Lee', 'Reinke', 'Ausubel', 'Chu', 'Kim', 'Troemel'] | ['Lee', 'Reinke', 'Ausubel', 'Chu', 'Kim', 'Troemel'] | PLoS Genet | 2006 | 11/10/2006 | 0 | AND pmc_gds | 1 | 0 | ||||
308 | GSE5802 | 9/8/2007 | ['5802'] | [] | [u'17567914'] | 1906760 | [u'17567914'] | ['Sims', 'Davies', 'Levy', 'Joshi', 'Dean'] | ['Sims', 'Davies', 'Levy', 'Joshi', 'Dean'] | ['Sims', 'Davies', 'Levy', 'Joshi', 'Dean'] | BMC Dev Biol | 2007 | 6/13/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
309 | GSE5806 | 9/10/2007 | ['5806'] | [] | [u'17293567'] | 1867337 | [u'17293567'] | ['', 'Winter', 'Bezhani', 'Kennedy', 'Wagner', 'Kwon', 'Su', 'Pfluger', 'Hershman'] | ['Winter', 'Bezhani', 'Kennedy', 'Wagner', 'Kwon', 'Su', 'Pfluger', 'Hershman'] | ['Winter', 'Bezhani', 'Kennedy', 'Wagner', 'Kwon', 'Su', 'Pfluger', 'Hershman'] | Plant Cell | 2007 | 2007 Feb | 0 | AND pmc_gds | 1 | 0 | ||||
310 | GSE5807 | 12/8/2007 | ['5807'] | [] | [] | 2064964 | [u'18000542'] | [u'Pillai', u'Ghosh'] | ['Maitra', 'Brahmachari', 'Pandey', 'Ghosh', 'Pillai'] | [u'Pillai', u'Ghosh'] | PLoS One | 2007 | 11/14/2007 | 0 | bed by Cheadle et al., [28] . Z ratio value ±1.96 was considered significant (p<0.05) [28] . Microarray data has been submitted to GEO (Accession no. GSE5807{{tag}}--DEPOSIT--). Northern analysis Total RNA was transferred to the nylon membrane after separation on a 1.5% agarose formaldehyde gel. Subsequently, radioactively labeled probe prepared from purified PCR | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
311 | GSE5808 | 6/7/2007 | ['5808'] | ['2756'] | [u'17538120'] | 2761903 | [u'19772654'] | ['Griffin', 'Moss', 'Zilliox'] | ['Gillis', 'Pavlidis'] | [] | BMC Bioinformatics | 2009 | 9/22/2009 | 0 | experiment name, organism part, array design and age category for the experiments are listed in each column. Experiments used for analysis . Gemma ID Name Organism part Array Design Age category 622 GSE8586 Umbilical cord GPL570 Prenatal 726 GSE9164 Foreskin cells GPL5876 Prenatal 233 GSE1397 Brain, heart GPL96 Prenatal 215 khatua-astrocytoma Brain GPL91 Child/young adult 218 pomeroy-embryonal Brain, |Child/young adult 555 GSE5808{{tag}}--REUSE-- Blood cell GPL96 Child/young adult 585 GSE7586 Placenta GPL570 Adult 178 GSE80 Muscle GPL91 Adult 633 GSE8607 Testis GPL91 Adult 275 GSE4757 Brain GPL570 Older adult 721 GSE8919 Brain GPL2700 Older adult 263 GSE5281 Brain GPL570 Older adult To allow the investigation of differential expression over age, we computed a relative rank-based measure of expression level for each | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
312 | GSE5808 | 6/7/2007 | ['5808'] | ['2756'] | [u'17538120'] | 1951064 | [u'17538120'] | ['Griffin', 'Moss', 'Zilliox'] | ['Griffin', 'Moss', 'Zilliox'] | ['Griffin', 'Moss', 'Zilliox'] | Clin Vaccine Immunol | 2007 | 2007 Jul | 0 | AND pmc_gds | 1 | 0 | ||||
313 | GSE5816 | 1/3/2007 | ['5816'] | [] | [u'17194187'] | 2841122 | [u'20205715'] | ['Euhus', 'Lam', 'Minna', 'Girard', 'Jiang', 'Pollack', 'Perou', 'Shames', 'Shay', 'Gazdar', 'Gao', 'Fong', 'Wong', 'Kim', 'Lewis', 'Nanda', 'Sato', 'Gerald', 'Shyr', 'Olopade', 'Shivapurkar'] | ['Vacher', 'Minna', 'Euhus', 'Tommasi', 'Bieche', 'Lewis', 'Latif', 'Pfeifer', 'Dobbins', 'Gentle', 'Dallol', 'Hill', 'Maher', 'Ward', 'Dansranjavin', 'Dammann', 'Hesson'] | ['Lewis', 'Minna', 'Euhus'] | Mol Cancer | 2010 | 3/5/2010 | 0 | The low stringency candidate list was compared to a publically available expression array data for an experiment demonstrating the re-expression of genes in the MCF7 breast cancer cell line after treatment with a de-methylating agent (GEO data base accession ID GSE5816{{tag}}--REUSE--). Genes in both the low stringency candidate list and the re-expressed list were then considered. | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
314 | GSE5816 | 1/3/2007 | ['5816'] | [] | [u'17194187'] | 1716188 | [u'17194187'] | ['Euhus', 'Lam', 'Minna', 'Girard', 'Jiang', 'Pollack', 'Perou', 'Shames', 'Shay', 'Gazdar', 'Gao', 'Fong', 'Wong', 'Kim', 'Lewis', 'Nanda', 'Sato', 'Gerald', 'Shyr', 'Olopade', 'Shivapurkar'] | ['Euhus', 'Lam', 'Minna', 'Girard', 'Jiang', 'Pollack', 'Perou', 'Shames', 'Shay', 'Gazdar', 'Gao', 'Fong', 'Wong', 'Kim', 'Lewis', 'Nanda', 'Sato', 'Gerald', 'Shyr', 'Olopade', 'Shivapurkar'] | ['Nanda', 'Lam', 'Minna', 'Euhus', 'Jiang', 'Perou', 'Pollack', 'Shay', 'Gao', 'Fong', 'Girard', 'Kim', 'Shames', 'Sato', 'Shivapurkar', 'Lewis', 'Olopade', 'Gerald', 'Gazdar', 'Wong', 'Shyr'] | PLoS Med | 2006 | 2006 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
315 | GSE5819 | 4/10/2007 | ['5819'] | [] | [u'19772648'] | 2761919 | [u'19772648'] | ['Abbruscato', 'Fumasoni', u'Brasilero', 'Berri', 'Faivre-Rampant', 'Morandini', 'P\xc3\xa8', 'Mizzi', 'Piffanelli', 'Kikuchi', 'Brasileiro', 'Satoh', u'P\xe8'] | ['Abbruscato', 'Fumasoni', 'Berri', 'Faivre-Rampant', 'Morandini', 'P\xc3\xa8', 'Mizzi', 'Piffanelli', 'Kikuchi', 'Brasileiro', 'Satoh'] | ['Abbruscato', 'Fumasoni', 'Berri', 'Faivre-Rampant', 'Morandini', 'P\xc3\xa8', 'Mizzi', 'Piffanelli', 'Kikuchi', 'Brasileiro', 'Satoh'] | BMC Plant Biol | 2009 | 9/22/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
316 | GSE5825 | 6/14/2007 | ['5825'] | ['2924'] | [u'17394237'] | 2254327 | [u'17394237'] | ['Newbold', 'Padilla-Banks', 'Lobenhofer', 'Snyder', 'Jefferson', 'Grissom'] | ['Newbold', 'Padilla-Banks', 'Lobenhofer', 'Snyder', 'Jefferson', 'Grissom'] | ['Newbold', 'Padilla-Banks', 'Lobenhofer', 'Snyder', 'Jefferson', 'Grissom'] | Mol Carcinog | 2007 | 2007 Sep | 0 | AND pmc_gds | 1 | 0 | ||||
317 | GSE5828 | 5/1/2007 | ['5828'] | [] | [u'17601969'] | 2694616 | [u'19362539'] | ['Pavey', 'Bowman', 'Fong', 'Yang', 'Larsen', 'Colosimo', 'Clarke', 'Hayward'] | ['Potti', 'Gatza', 'Bild', 'Wang', 'Nevins', 'West', 'Febbo', 'Carvalho', 'Mori', 'Chang', 'Lucas'] | [] | Mol Cell | 2009 | 4/10/2009 | 0 | AND pmc_gds | 0 | 1 | ||||
318 | GSE5834 | 7/1/2007 | ['5834'] | [] | [u'17679087'] | 2847180 | [u'17679087'] | ['Martens', 'Schotta', 'Pauler', 'Melikant', 'Regha', 'Sloane', 'Jenuwein', 'Huang', 'Warczok', 'Radolf', 'Barlow'] | ['Martens', 'Schotta', 'Pauler', 'Melikant', 'Regha', 'Sloane', 'Jenuwein', 'Huang', 'Warczok', 'Radolf', 'Barlow'] | ['Schotta', 'Pauler', 'Melikant', 'Regha', 'Sloane', 'Jenuwein', 'Huang', 'Radolf', 'Martens', 'Barlow', 'Warczok'] | Mol Cell | 2007 | 8/3/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
319 | GSE5840 | 9/14/2007 | ['5840'] | [] | [u'17178894'] | 2782370 | [u'19117983'] | ['Paik', 'Hartman-Frey', 'Huang', 'Abbosh', 'Li', 'Yan', 'Fan', 'Salisbury', 'Cheng', 'Chen', 'Nephew', 'Oyer'] | ['Steffen', 'Hilsenbeck', 'Ochsner', 'Chen', 'McKenna', 'Watkins'] | ['Chen'] | Cancer Res | 2009 | 1/1/2009 | 0 | t datasets at either time point. Moreover, relaxing the q -value cut -off to 0.2 resulted in only a modest increase in the number of genes in this intersection (data not shown here but available for download from the GEMS website). This initial result indicated that given the extent in variation across the datasets, traditional Venn analysis would be of limited use in arriving at a consensus gene express| Table 1 Studies selected for meta-analysis. Supplementary Material body Supplementary Click here to view. (4.1M, zip) Acknowledgments We thank the principal investigators who made their datasets publicly available. This work was supported by NIDDK NURSA U19 DK62434. | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
320 | GSE5840 | 9/14/2007 | ['5840'] | [] | [u'17178894'] | 2970572 | [u'21072189'] | ['Paik', 'Hartman-Frey', 'Huang', 'Abbosh', 'Li', 'Yan', 'Fan', 'Salisbury', 'Cheng', 'Chen', 'Nephew', 'Oyer'] | ['Wang', 'Shen', 'Li', 'Liu', 'Huang', 'Nephew'] | ['Nephew', 'Huang', 'Li'] | PLoS One | 2010 | 11/2/2010 | 0 | ulatory regions ( Figure 4A ), 2,000-bp upstream of the regulatory region, 2,000-bp downstream of the regulatory region, and 2,000-bp of randomly selected intergenic regions. The PhastCons scores are downloaded from UCSC Genome Browser and reflect the overall conservation among seventeen vertebrate species [23] . Importantly, the average conservation score in the TSS region and transcript|describing the Gamma distribution of genome-wide background signals. See Appendix S1 for detail procedures. Data and model availability All the data are made available in the NCBI Gene Expression Omnibus (GEO) database with accession number GSE21068 for the ChIP-seq data for RPol II and H3K4me2, and GSE5840{{tag}}--DEPOSIT-- for the microarray data for MCF7 and MCF7-T with and without E2 treatment. In addition, both t | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
321 | GSE5840 | 9/14/2007 | ['5840'] | [] | [u'17178894'] | 3003621 | [u'21108837'] | ['Paik', 'Hartman-Frey', 'Huang', 'Abbosh', 'Li', 'Yan', 'Fan', 'Salisbury', 'Cheng', 'Chen', 'Nephew', 'Oyer'] | ['Shen', 'Li', 'Liu', 'Huang', 'Nephew', 'Jeong'] | ['Nephew', 'Huang', 'Li'] | BMC Med Genomics | 2010 | 11/25/2010 | 0 | . Microarray Analysis Suite (MAS) version 5.0 was used for preprocessing. Experimental details were described in [ 20 ]. The data discussed in this paper have been deposited in NCBI's Gene Expression Omnibus http://www.ncbi.nlm.nih.gov/geo/ and are accessible through GEO Series accession number (GSE5840{{tag}}--DEPOSIT-- for gene expression and GSE25519 for methylation). Data We compared gene expression and DNA methylat | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
322 | GSE5843 | 5/1/2007 | ['5843'] | [] | [u'17082175'] | 2633354 | [u'19087325'] | ['Pavey', 'Bowman', 'Fong', 'Clarke', 'Passmore', 'Larsen', 'Hayward'] | ['Liang'] | [] | BMC Med Genomics | 2008 | 12/16/2008 | 1 | using human primary lung cancer specimens (no cell lines) under the search terms "human lung adenocarcinoma" or "human lung squamous carcinoma" as of July 1 st , 2007 were reviewed. Only one dataset, GSE7339, was removed from further analysis due to too many genes missing from the 17-gene signature for prediction analysis. The 14 th dataset GSE2514 in Table 1 that is a mouse gene expression dataset | and subject to analyses. Table 1 Summary of lung cancer datasets used in this study. Database GEO Platform Institute PMID Technology type Organism AD# SCC# Stage # Genes % Correct AD % Correct SCC 1 GSE3398 GPL2648/2778/2832 Stanford 11707590 spotted cDNA Human 41 17 I to III 17 93 94 2 NA* Affy HG-U95A DFCI 11707567 oligonucleotide Human 139 21 I to III 17 91 86 3 NA** Affy HG-U133A U Michigan 121182| I to III 17 95 NA 4 GSE3141 Affy U95A/HuGeneFL Duke 16899777 oligonucleotide Human 54 57 I to III 17 83 72 5 GSE4573 Affy HG-U133A U Michigan 16885343 oligonucleotide Human 0 129 I to III 17 NA 85 6 GSE1037 CHUGAI 41K CIH, Japan 15016488 spotted cDNA Human 12 0 NA 17 83 NA 7 GSE6253 Affy HG-U95A/U133AB Washington U 17194181 oligonucleotide Human 14 36 I 17 79 78 8 GSE3268 Affy HG-U133A UC Davis 161889| Human 0 5 NA 17 NA 100 9 GSE1987 Affy HG-U95A Tel Aviv U NA oligonucleotide Human 8 17 I to III 17 88 59 10 GSE6044 Affy HG-Focus U Duesseldorf, Germany NA oligonucleotide Human 10 10 NA 16 70 90 11 GSE7880 Affy HG-Focus Heinrich-Heine U, Germany NA oligonucleotide Human 25 18 IIIB/IV 16 92 83 12 GSE2514 Affy HG-U95A U Colorado 16314486 oligonucleotide Human 20 0 NA 15 100 NA 13 GSE5843{{tag}}--REUSE-- GSE5123 PC Hum|14486 oligonucleotide Mouse 44 0 NA 13 100 NA * ** Figure 1 A flow diagram outlines selection of 13 human lung cancer databases used in this study . This diagram does not include the 14 th dataset GSE2514 in Table 1 that is a mouse gene expression dataset and was not used to calculate the accuracy of prediction in the meta-analysis. DB, database. Clustering analysis and evaluation of classification | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
323 | GSE5843 | 5/1/2007 | ['5843'] | [] | [u'17082175'] | 2707631 | [u'19609451'] | ['Pavey', 'Bowman', 'Fong', 'Clarke', 'Passmore', 'Larsen', 'Hayward'] | ['Jiang', 'Lee', 'Zhang', 'Song', 'Liu', 'Zhao', 'Fan'] | [] | PLoS One | 2009 | 7/17/2009 | 1 | (MD Anderson cancer center database) [14] and UCSF-2 (Stanford microarray database) [15] and three HGG (grade III and GBM combined) sets from the cohorts UCLA (GEO GDS1975) [3] , MDA (GEO GDS1815) [4] , and CMBC (BROAD institute database) [2] ( Table 1 ). Among the five cohorts, UCLA, UCSF-1 and MDA have 35, 34, and|a I (63); II (20) d UM-HLM [23] Oligos Affymetrix 56 66 (10) 118 a I (160); II (48) d Bladder AUH [24] Oligos Affymetrix 13 NA 30 a III+IV (30) c Ovarium MNI(GSE8842) Spotted cDNA 80 52 (12) 13 a I (68) d a Death. b Metastasis. c Tumor grade. d Tumor stage. NA, not available. m, month. Yr, year. Ref, reference. Using the median OS as a cutoff for each cohort, w|iction of the three gene classifiers for patients with other tumor types, we obtained 12 cohorts including five breast cancer cohorts: GIS (ArrayExpress E-GEOD-3494) [16] , CRCM (GEO GSE9893) [17] , SUSM (Stanford microarray database) [18] , NCI (Rosetta inpharmatics inc database) [19] , EMC (GEO GSE2034) [20] , five l| , PCH (GEO GSE5843{{tag}}--REUSE--) [22] , CAN/DF (caArray) [23] , MSK (caArray) [23] , UM-HLM (caArray) [23] , one bladder cancer cohort AUH (GEO GSE5287) [24] , and one ovarian cancer cohort MNI (GEO GSE8842) with microarray expression data and clinicopathogical information publicly available (detailed in Materials and methods ) (|y stage I) from EMC [20] ; one bladder cancer set of 30 advanced bladder cancers from AUH [24] ; one ovarian tumor set of 68 stage I ovarian carcinomas from MNI (GEO GSE8842). For the two breast cancer cohorts NCI and EMC, where the overall survival times were unavailable, time to distant metastasis was used instead. For all the cohorts, we used normalized microarray d | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
324 | GSE5847 | 9/30/2007 | ['5847'] | ['3097', '3096'] | [u'17999412'] | 2716388 | [u'19651608'] | ['Yi', 'Boersma', 'Lee', 'Stephens', 'Luke', 'Yfantis', 'Reimers', 'Ambs', 'Ludwig', 'Weinstein'] | ['Neville', 'Clouston', 'Shin', 'Jungbluth', 'Miller', 'Grigoriadis', 'da', 'Chen', 'Lakhani', 'Caballero', 'Old', 'Hoek', 'Cebon', 'Simpson'] | [] | Proc Natl Acad Sci U S A | 2009 | 8/11/2009 | 0 | ccines. The aim of the present study was to investigate the expression of CT antigens in breast cancer. Using previously generated massively parallel signature sequencing (MPSS) data, together with 9 publicly available gene expression datasets, the expression pattern of CT antigens located on the X chromosome (CT-X) was interrogated. Whereas a minority of unselected breast cancers was found to contain CT- --REUSE-- | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
325 | GSE5847 | 9/30/2007 | ['5847'] | ['3097', '3096'] | [u'17999412'] | 2987696 | [u'19188147'] | ['Yi', 'Boersma', 'Lee', 'Stephens', 'Luke', 'Yfantis', 'Reimers', 'Ambs', 'Ludwig', 'Weinstein'] | ['Zhu', 'Edris', 'West', 'Espinosa', 'Montgomery', 'Varma', 'Beck', 'van', 'Li', 'Marinelli'] | [] | Clin Cancer Res | 2009 | 2/1/2009 | 0 | We utilized five publically available whole tumor breast cancer data sets (NKI (9), Perreard (10), GSE1379 (11), GSE1456 (12), GSE3494 (13)) that contain gene expression data on a total of 856 cases with clinical follow-up, and we utilized three laser capture microdissection (LCM) breast cancer datasets (GSE5847{{key}}--REUSE-- (14),GSE9014(15), GSE10797(16)), that contain gene expression data obtained separately from the stroma and epithelium (GSE5847 (14) and GSE10797 (16)) or solely from the stroma (GSE9014 (15)) from 126 cases. | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
326 | GSE5847 | 9/30/2007 | ['5847'] | ['3097', '3096'] | [u'17999412'] | 2602602 | [u'19104654'] | ['Yi', 'Boersma', 'Lee', 'Stephens', 'Luke', 'Yfantis', 'Reimers', 'Ambs', 'Ludwig', 'Weinstein'] | ['Remy', 'Didier', 'Granjeaud', 'Imbert', 'Bergon', 'Nguyen', 'Puthier', 'Lopez', 'Textoris'] | [] | PLoS One | 2008 | 2008 | 1 | iated DKNN value (with k being set typically to 100 for microarrays containing 10 to 50k elements). Distributions of DKNN values observed with both an artificial and a real dataset (Complex9RN200 and GSE1456 respectively, see thereafter for a description) are shown in Figure S3A and S3B (solid curve). The asymmetrical shape of the distribution observed in Figure S3B suggests the presence of a parti|s ability to extract relevant informations from a noisy environment. However, a range of optimal values for inflation parameter needs to be defined to get the best results. Performances of DBF-MCL on GSE1456 dataset Next, DBF-MCL was tested with microarray data to explore its effectiveness in finding clusters of co-regulated genes. To this end, we used the microarray data from Pawitan et al. |sed for analysis. Figure S5B , shows the number of informative genes obtained with various k values. Again, two phases were observed suggesting that regions with heterogeneous densities exist in the GSE1456 dataset. As expected, the transition from dense to sparse regions was less marked than in the artificial dataset. A k value of 100 was chosen to allow the extraction of a large part of data that ca| microarray dataset. Interestingly, in all cases meaningful partitioning results were obtained using inflation parameter set to 2. 10.1371/journal.pone.0004001.g001 Figure 1 Results obtained with the GSE1456 dataset. DBF-MCL was run with GSE1456 as input (k = 100, FDR = 10%, S 1..3 , Inflation = 2). (A) Hierarchical cluster|ected genes. (D) The graph after MCL partitioning. Each point is colored according to its associated class. (E) Correspondence between hierarchical clustering and DBF-MCL results. (F) TS obtained for GSE1456 (G) Functional enrichment associated with these TS. Systematic extraction of TS We next applied DBF-MCL algorithm to all experiments performed on human, mouse and rat Affymetrix microarrays and ava|he results panel (2), the information panel (3), the plugins panel (4) and the plugin display panel (5). This example shows the expression profiles of genes contained in the TS CBE3881EB derived from GSE469 (“Temporal profiling in muscle regeneration”). Annotation panel shows that this TS is highly enriched in genes related to ATP synthesis. Meta-analysis of public microarray data using|e 1 Transcriptional signatures containing Affymetrix probes for ESR1, GATA3 and FOXA1. TS ID 1 Genes 2 Probes 2 Samples 2 Sample type GSE ID GPL ID Author PubMed IDs 0F2635383 1190 1572 23 Cell lines GSE6569 GPL96 Huang F et al 2007 17332353 3DE64836D 102 143 62 Tissue GSE7904 GPL570 unpublished 2007 - 59A18E225 690 893 121 Both GSE2603 GPL96 Minn AJ et al 2005 16049480 6C975B20B 88 96 26 Tissue GSE677|lein A et al 2007 17410534 6C975B290 88 96 26 Tissue GSE6596 GPL96 Klein A et al 2007 17410534 7150E17F6 868 1032 34 Cell lines GSE4668 GPL96 Coser KR et al 2003 14610279 8059848B4 200 250 251 Tissue GSE3494 GPL96 Miller LD et al 2005 16141321 84E5E1077 694 883 198 Tissue GSE7390 GPL96 Desmedt C et al 2007 17545524 8F69864F9 68 82 95 Tissue GSE5847{{tag}}--REUSE-- GPL96 Boersma BJ et al 2007 17999412 A151D5695 297 361|et al 2007 17420468 B79B1C0B9 270 380 47 Tissue GSE3744 GPL570 Richardson AL et al 2006 16473279 BDB6D8700 550 679 104 Tissue GSE3726 GPL96 Chowdary D et al 2006 16436632 D8F0B528C 125 152 159 Tissue GSE1456 GPL96 Pawitan Y et al 2005 16280042 E2E620F40 448 616 129 Tissue GSE5460 GPL570 unpublished 2007 - EA9669A21 219 251 158 Tissue GSE3143 GPL91 Bild AH et al 2006 16273092 F310ACC36 519 646 49 Tissue|05b;15] and in genes related to the PIR keyword “multigene family”. Furthermore, several signatures, of Table 2 are related to melanoma and six of them were observed in the GSE7127 experiment [16] . Although data from Table 2 would deserve further analysis they are most likely related to gain or loss of genetic material in tumors. Indeed, gain of 8q is freq|<1.10−20 ) for any of the human cytoband tested. TS ID 1 Enrich. 2 Cytoband q.value Sample type GSE ID GPL ID Authors PubMed ID 3DA3C8345 24% 17q12-q21 1.7.10 −39 Skin GSE5667 GPL97 Plager DA et al 2007 17181634 43CC3EF57 9% 8q24.3 7.0.10 −32 Melanoma GSE7153 GPL570 Unpublished 2007 - 60E29DA83 16% 8q24.3 6.8.10 −24 Melanoma GSE7127 GPL570| GPL570 Johansson P et al 2007 17516929 60E6B4129 35% 20p13 1.6.10 −26 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60E96FF1E 28% 6p21.3 1.2.10 −28 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60EC95F6A 17% 7q22.1 6.3.10 −31 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60EEBD669 32% 11q23.3 1.4.10 −26 Melan|127 GPL570 Johansson P et al 2007 17516929 B4C95CF18 42% 8q24.3 1.1.10 −36 Ovary GSE6008 GPL96 Hendrix ND et al 2006 16452189 A93ED6519 16% 11q23.3 6.9.10 −23 Melanoma GSE7152 GPL570 Packer LM et al 2007 17450523 A93DB01ED 11% 7q22.1 9.5.10 −30 Melanoma GSE7152 GPL570 Packer LM et al 2007 17450523 1 Transcriptional signature ID. 2 Enrichment: Proportion o|ial platforms ( e.g. National Cancer Institute, Vanderbilt Microarray Shared Resource, Genome Institute of Singapore), several of them being related to the MicroArray Quality Control (MAQC) project (GSE5350) [26] . However, to date, systematic analysis of all experiments performed on these platforms has not been done. The flexibility of our approach also makes it possible to integrate|ile. Figure S3 Distributions of DKNN values. Observed DKNN values (solid line) and of a set of simulated DKNN values S (dotted line) are shown for (A) the Complex9RN200 artificial dataset and (B) the GSE1456 microarray dataset. (9.01 MB TIF) Click here for additional data file. Figure S4 Colors correspond to the clusters found using the corresponding algorithm (A) The whole dataset (9,112 points). (B) |s input using a range of k values (FDR = 10%, S1..3, Inflation = 1.2). (B) DBF-MCL was run with several microarray datasets as input (including GSE1456) using a range of k values (FDR = 10%, S1..3, Inflation = 2). (8.72 MB TIF) Click here for additional data file. Figure S6 The TBMap plugin. |for technical assistance. References 1 Stoeckert CJ Causton HC Ball CA 2002 Microarray databases: standards and ontologies. Nat Genet 32 Suppl 469 73 12454640 2 Barrett T Edgar R 2006 Gene expression omnibus: microarray data storage, submission, retrieval, and analysis. Methods Enzymol 411 352 69 16939800 3 Diehn M Sherlock G Binkley G Jin H Matese JC 2003 SOURCE: a unified genomic resource of functional | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
327 | GSE5847 | 9/30/2007 | ['5847'] | ['3097', '3096'] | [u'17999412'] | 2831002 | [u'20064233'] | ['Yi', 'Boersma', 'Lee', 'Stephens', 'Luke', 'Yfantis', 'Reimers', 'Ambs', 'Ludwig', 'Weinstein'] | ['Berger', 'De', 'Pierre', 'Depiereux', 'Bareke', 'Gaigneaux'] | [] | BMC Bioinformatics | 2010 | 1/11/2010 | 0 | using the R package GCRMA [ 20 ]. As the benchmark is tested gene by gene, a pre-treatment including all Affybatch objects globally was not needed. Table 2 Datasets list Dataset Number of replicates GSE10072 107 GSE10760 98 GSE1561 49 GSE1922 49 GSE3790FC 65 GSE3790CN 70 GSE3790CB 54 GSE3846 108 GSE3910 70 GSE3912 113 GSE5388 61 GSE5392 82 GSE5462 116 GSE5580 42 GSE5847{{tag}}--REUSE-- 95 GSE646-7 93 GSE643-5 126 GSE|6 38 GSE9874b-f 60 GSE9874 60 GSE9877 47 GSE994 75 Datasets used for construction of the initial matrix. The number of replicates is the number of microarrays in the experiment. Giant datasets (e.g., GSE3790 with 202 replicates in three different brain regions) were first split into subsets according to their biological content. The datasets were then sampled as follows: when the number of replicates w | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
328 | GSE5847 | 9/30/2007 | ['5847'] | ['3097', '3096'] | [u'17999412'] | 2861033 | [u'20346108'] | ['Yi', 'Boersma', 'Lee', 'Stephens', 'Luke', 'Yfantis', 'Reimers', 'Ambs', 'Ludwig', 'Weinstein'] | ['Noh', 'Lee', 'Kang', 'Ahn', 'Kim', 'Yu'] | ['Lee'] | BMC Cancer | 2010 | 3/26/2010 | 0 | 33A platform (GPL96) were downloaded from the Gene Expression Omnibus (GEO) database ( http://www.ncbi.nlm.nih.gov/projects/geo/ ). The samples included 1,715 cases of biopsied breast cancer tissues (GSE1456, GSE2034, GSE2990, GSE3494, GSE4922, GSE5364 and GSE11121) and 95 cases of laser-capture microdissected (LCM) breast cancer tissues (GSE5847{{tag}}--REUSE--). The latter 95 samples were considered to be positive c | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
329 | GSE5847 | 9/30/2007 | ['5847'] | ['3097', '3096'] | [u'17999412'] | 2964971 | [u'20978357'] | ['Yi', 'Boersma', 'Lee', 'Stephens', 'Luke', 'Yfantis', 'Reimers', 'Ambs', 'Ludwig', 'Weinstein'] | ['Yi', 'Boersma', 'Glynn', 'Lee', 'Stephens', 'Ridnour', 'Hudson', 'Yfantis', 'Wink', 'Switzer', 'Dorsey', 'Martin', 'Ambs'] | ['Yi', 'Boersma', 'Lee', 'Stephens', 'Ambs', 'Yfantis'] | J Clin Invest | 2010 | 11/1/2010 | 0 | Cel files with the normalized expression data were deposited in the GEO repository ( http://www.ncbi.nlm.nih.gov/projects/geo/; accession no.GSE5847{{key}}--DEPOSIT--). | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
330 | GSE5847 | 9/30/2007 | ['5847'] | ['3097', '3096'] | [u'17999412'] | 2638012 | [u'19225562'] | ['Yi', 'Boersma', 'Lee', 'Stephens', 'Luke', 'Yfantis', 'Reimers', 'Ambs', 'Ludwig', 'Weinstein'] | ['Yi', 'Boersma', 'Lee', 'Stephens', 'Tsai', 'Martin', 'Yfantis', 'Weissman', 'Howe', 'Reimers', 'Ambs', 'Williams'] | ['Yi', 'Boersma', 'Lee', 'Stephens', 'Yfantis', 'Reimers', 'Ambs'] | PLoS One | 2009 | 2009 | 0 | ale, NY). Labeled cRNA was hybridized onto Affymetrix HG-U133A GeneChips. Cel files with the normalized expression data, and additional tumor marker information, were deposited in the GEO repository (GSE5847{{tag}}--DEPOSIT--). Table S4 lists the GEO accession number for the Cel files of each sample. Analysis of gene expression data All chips were normalized with the robust multichip analysis procedure [45&#x | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
331 | GSE5859 | 1/7/2007 | ['5859'] | [] | [u'17206142'] | 2670604 | [u'19141711'] | ['Spielman', 'Morley', 'Bastone', 'Ewens', 'Cheung', 'Burdick'] | ['Urban', 'Kwok', 'Giacomini', 'Stryke', 'Ferrin', 'Johns', 'Yee', 'Castro', 'Hesselson', 'Tahara', 'Kawamoto'] | [] | J Pharmacol Exp Ther | 2009 | 2009 Apr | 0 | The OCTN2 mRNA expression level of the individual lymphoblastoid cell lines was obtained from the GEO database (accession numbers GSE5859{{tag}}--REUSE-- and GSE7761). | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
332 | GSE5859 | 1/7/2007 | ['5859'] | [] | [u'17206142'] | 2600931 | [u'18791227'] | ['Spielman', 'Morley', 'Bastone', 'Ewens', 'Cheung', 'Burdick'] | ['Kang', 'Ye', 'Eskin'] | [] | Genetics | 2008 | 2008 Dec | 0 | other known batch effects have been suggested ( Akey et al . 2007 ), our analysis identified more real cis associations than methods that explicitly correct for the batch effects. Our method is publicly available as an R package at http://genetics.cs.ucla.edu/ice MATERIALS AND METHODS Gene expression data and genetic maps|ed genomic annotations were mapped onto the genome to draw the genomewide eQTL maps. For BXD RI data sets, we obtained the hematopoietic stem cell (HSC) data from the GEO database with accession no. GSE2031 and the whole brain data set by request from the authors. Both data sets use the Affymetrix U74Av2 GeneChip platform and contain 12,422 probes. A total of 8596 probes were mapped onto the NCBI bui|tes for each strain, and the whole brain data set contains 64 samples over 28 strains, varying from one to four measurements per strain. The second-generation whole brain data set using M430v2 arrays downloaded from GeneNetwork ( http://www.genenetwork.org ) contains expression profiles over 45,102 probes across 30 BXD RI strains with up to six replicates per strain. Their expression values were normalize| ). We used the default settings and the batch-corrected expression levels were used to perform traditional eQTL mapping using the t -test. For surrogate variable analysis, we used the SVA R package downloaded from the author's website, identifying surrogate variables without the genotype data as suggested ( Leek and Storey 2007 ). The P -values are obtained using a linear model after correcting for | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
333 | GSE5859 | 1/7/2007 | ['5859'] | [] | [u'17206142'] | 2557141 | [u'18846218'] | ['Spielman', 'Morley', 'Bastone', 'Ewens', 'Cheung', 'Burdick'] | ['Ouyang', 'Krontiris', 'Smith'] | [] | PLoS One | 2008 | 2008 | 1 | s time taking advantage of the high-density HapMap Phase II data – within the block containing the reported peak SNPs. We applied public HapMap expression data across three major populations (GSE2552 [13] and GSE5859{{tag}}--REUSE-- [19] , based on the Affymetrix platform, and GSE6536 [20] , based on the Illumina platform). 10.1371/journal.pone.0003362.t001 T|in Ibadan, Nigeria; CHB: Han Chinese in Beijing, China; JPT: Japanese in Tokyo, Japan. 3 Relative position (upstream; up / downstream; dn) to initiation/termination sites. 4 Based on public data from GSE 6536 (Illumina platform) or GSE 2552 / GSE 5859{{tag}}--REUSE-- (Affymetrix platform). 5 Reported in Morley et al., Nature 430, 743–7 (2004). 6 Reported in Cheung et al., Nature 437, 1365–9 (2005).|es in the lineage), a SNP with the most complete genotypes (underlined) was chosen for testing association. The nominal p-values for these SNPs in each major population, based on expression data sets GSE6536 (Illumina platform) and GSE2552/GSE5859{{tag}}--REUSE-- (Affymetrix platform), are shown in order. The coalescent-based maximum likelihood tree structure and the regression of expression phenotypes are plotted at |ed the same evolutionarily conserved feature, we next asked whether their cis -regulatory phenotypes were, as expected, also conserved across populations. 
Based on one set of expression data in YRI (GSE6536, Illumina platform) and two sets data in CHB/JPT (GSE5859{{tag}}--REUSE--, Affymetrix platform; GSE6536, Illumina platform), we tested the association for all tagging SNPs ( Figure 4 and Supporting Information F|age or allelic imbalance (AI) assays, we added a further validation step by confirming the cis -association using an independent dataset having a relatively large sample size [33] (GSE8052; Affymetrix platform; 400 UK samples). Thirty of the 44 genes passed the genome-wide significance threshold (a LOD score of 6.076, corresponding to a false discovery rate of 0.05, as listed in supp|tion. Association analysis between each tagging SNP and two sets of HapMap expression data, based on two (Affymetrix and Illumina) platforms and across three HapMap populations, (GEO accession number GSE2552 [13] , GSE5859{{tag}}--REUSE-- [19] , and GSE6536 [20] ), was conducted by following the regression methods described in Cheung et al. [13] (dis | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
334 | GSE5859 | 1/7/2007 | ['5859'] | [] | [u'17206142'] | 2775716 | [u'19349310'] | ['Spielman', 'Morley', 'Bastone', 'Ewens', 'Cheung', 'Burdick'] | ['Kruse', 'Moya', 'Wendland', 'Murphy', 'Wheaton', 'Timpano', 'Ren-Patterson', 'Anavitarte'] | [] | Arch Gen Psychiatry | 2009 | 2009 Apr | 1 | To determine the role of genetic variation of SLC1A1 in OCD in a large case-control study and to better understand how SLC1A1 variation affects functionality. Design A case-control study. Setting Publicly accessible SLC1A1 expression and genotype data. Patients Three hundred twenty-five OCD probands and 662 ethnically and sex-matched controls. Interventions Probands were assessed with the Structured|REFERENCES METHODS SLC1A1 GENE EXPRESSION ANALYSES For heritability calculations, we extracted lymphoblastoid cell line SLC1A1 gene expression values (probe No. 213664_at) from the Gene Expression Omnibus data set GSE1485 . 22 All individuals had a “present” call for this probe; technical replicates were averaged and then log 2 -transformed. Family structures for the 14 three-generat|id cell line SLC1A1 expression values for the 60 HapMap Centre d'Etude du Polymorphisme Humain founders with northern and western European ancestry from 2 independent platforms: the Gene Expression Omnibus data set GSE5859{{tag}}--REUSE-- (Affymetrix platform, probe No. 213664_at; replicates were averaged and transformed as above) 24 and from the Gene Expression Variation Web site hosted by the Wellcome Trust Sange|Institute ( http://www.sanger.ac.uk/humgen/genevar ; Illumina platform, probe No. GI_31543625-S). 
25 Genotypes for all 90 individuals from the Centre d'Etude du Polymorphisme Humain HapMap data were downloaded from the HapMap Web site ( http://www.hapmap.org , release 22) for the entire SLC1A1 region plus 1 megabase (Mb) flanking both sides (corresponding to position 3 480 444 to 5 577 469 on chromosom| for single-locus and haplotypic associations with the software packages PLINK 26 and WHAP. 32 Conditional haplotype analyses were carried out as recently described after detection of a significant omnibus association. 28 Calculation of intermarker linkage disequilibrium and correction for multiple testing with 100 000 permutations for the 6 individual markers as well as for the 3-locus haplotype was |analyses varied slightly (SIR, n=198; hoarding factor, n=216). RESULTS SLC1A1 GENE EXPRESSION ANALYSES IN LYMPHOBLASTOID CELL LINES Using publicly available gene expression data, we observed that SLC1A1 is robustly expressed in lymphoblastoid cell lines. This prompted us to investigate whether SLC1A1 expression constitutes a heritable trait|th individually and as part of a haplotype, 17 and the haplotypic association has been replicated subsequently. 18 Figure 1 Regression of SLC1A1 expression on single-nucleotide polymorphisms using publicly accessible lymphoblastoid cell line data. Depicted here are 3 cis -acting expression quantitative trait loci that were nominally significant and had the same direction of (more ...) Figure 1 Regr|t shown). Table 1 Polymorphisms, Linkage Disequilibrium Pattern, and Single-Locus Association Tests in 325 OCD Probands and 662 Controls Haplotype analyses, in contrast, revealed a highly significant omnibus association of 3 of these markers, rs3087879, rs301430, and rs7858819, with OCD ( Table 2 ). In haplotype-specific tests (ie, when tested individually against all other haplotypes, with df =1), 3 of|interacting from within different haplotype blocks. 
This possibility might be addressed by genotyping markers at higher density or by resequencing large genomic regions. We have made extensive use of publicly accessible data with the aim of better understanding 1 facet of SLC1A1 functionality—gene expression. Some of the most extensive gene expression data sets currently available were obtained | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
335 | GSE5859 | 1/7/2007 | ['5859'] | [] | [u'17206142'] | 2782819 | [u'19925429'] | ['Spielman', 'Morley', 'Bastone', 'Ewens', 'Cheung', 'Burdick'] | ['Dolan', 'Zhang'] | [] | Curr Pharm Des | 2009 | 2009 | 1 | AND pmc_gds | 0 | 1 | ||||
336 | GSE5859 | 1/7/2007 | ['5859'] | [] | [u'17206142'] | 3005333 | [u'17206142'] | ['Spielman', 'Morley', 'Bastone', 'Ewens', 'Cheung', 'Burdick'] | ['Spielman', 'Morley', 'Bastone', 'Ewens', 'Cheung', 'Burdick'] | ['Spielman', 'Morley', 'Bastone', 'Ewens', 'Cheung', 'Burdick'] | Nat Genet | 2007 | 2007 Feb | 0 | AND pmc_gds | 1 | 0 | ||||
337 | GSE5862 | 4/19/2007 | ['5862'] | [] | [u'17551510'] | 2801700 | [u'20003344'] | ['Marelli', 'Aitchison', 'Smith', 'Marzolf', 'Saleem', 'Rachubinski', 'Ramsey', 'Hwang'] | ['Van', 'Marchi', 'Romualdi', 'M\xc3\xbcller', 'Radonjic', 'Calura', 'Cavalieri'] | [] | BMC Genomics | 2009 | 12/11/2009 | 0 | tails). Table 1 Meta-analysis data collection. PPARα signaling n° Reference Dataset Accession Number GEO/AE Org Tissue Technology PPARα signaling activated by WY14643 1 [ 55 ] GSE8302/E-GEOD-8302 Hs Liver Affymetrix 2 [ 55 ] GSE8302/E-GEOD-8302 Mm Liver Affymetrix 3 [ 55 ] GSE8302/E-GEOD-8302 Rn Liver Affymetrix PPARα signaling repressed using PPARα knokout mice |] GSE8291/E-GEOD-8291 Mm Liver Affymetrix 6 [ 55 ] GSE8292/E-GEOD-8292 Mm Liver Affymetrix 7 [ 55 ] GSE8295/E-GEOD-8295 Mm Liver Affymetrix PPARα signaling activated by High fat diet 8 [ 56 ] GSE8753/E-GEOD-8753 Mm Liver Affymetrix 9 [ 57 ] GSE6903/E-GEOD-6853 Mm Liver Affymetrix 10 [ 58 ] GSE8524/~ Mm Liver Affymetrix 11 [ 59 ] GSE1560/E-GEOD-1560 Mm Aorta Oligo Array 12 [ 60 ] GSE8700/E-GEOD-|Table 2 Validation data sets. 
n° PPARα signalling Reference Dataset Accession Number GEO/AE Org Tissue Technology 1 PPARα signaling activated by WY14643 (PPARα WY14643-GSE8396) [ 64 ] GSE8396/E-GEOD-8396 Mm Liver Affymetrix 2 PPARα signaling repressed using PPARα knokout mice (PPARα KO1-GSE8396) [ 64 ] GSE8396/E-GEOD-8396 Mm Liver Affymetrix 3 PPA|α signaling activated by High fat diet (HFD-E-MEXP-1755) [ 65 ] ~/E-MEXP-1755 Mm Liver Affymetrix 5 Oleate response repressed using knokout yeast of a transcription promoter (del_ADR1) [ 50 ] GSE5862{{tag}}--REUSE--/~ Sc ~ Oligo Array 6 Oleate response repressed using knokout yeast of a transcription promoter (del_PIP2) [ 50 ] GSE5862{{tag}}--REUSE--/~ Sc ~ Oligo Array 7 Oleate response repressed using knokout yeast of a tran|y 8 Oleate response activated using knokout yeast of a transcription repressor (del_OAF3) [ 50 ] GSE5862{{tag}}--REUSE--/~ Sc ~ Oligo Array 9 Oleate response activated by High fat diet (oleate_vs_low_glucose) [ 50 ] GSE5862{{tag}}--REUSE--/~ Sc ~ Oligo Array Statistical analysis of microarray data Gene expression of Affymetrix datasets were quantified and separately normalized using rma technique [ 17 ] and. EntrezGene Custom CDF f|ers of biological replicates as required for powerful inference. Fold change cut-off, filtered by variance coefficient, was used to select DEGs in those datasets with less than 3 replicates per gene (GSE8302, GSE9291 and GSE9290). Pathways analysis on DEGs Enrichment analysis on metabolic pathways was calculated for each dataset using Fisher exact test based on hypergeometric distribution with a p-val | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
338 | GSE5862 | 4/19/2007 | ['5862'] | [] | [u'17551510'] | 2680204 | [u'19216778'] | ['Marelli', 'Aitchison', 'Smith', 'Marzolf', 'Saleem', 'Rachubinski', 'Ramsey', 'Hwang'] | ['Chiogna', 'Risso', 'Massa', 'Romualdi'] | [] | BMC Bioinformatics | 2009 | 2/13/2009 | 0 | f the top ranking gene lists: 20, 50, 100, 500, and 600. 4.3.4 Real Data We used two cDNA expression datasets and two oligonucleotide datasets to validate our simulation results. All the datasets are publicly available at the GEO database. Baird et al. [ 24 ] (hereafter dataset A) studied expression profiling of 181 tumors representing various classes of bone and soft tissue sarcomas. In this study, we se|ference was obtained by pooling sarcoma cell lines. Expression datasets and platform annotation are available on the NCBI GEO database with platform identification number GPL1977 and reference series GSE2553. Urban et al. [ 25 ] (hereafter dataset B) analysed the rapamycin response in Saccharomyces cerevisiae. Global transcriptional analysis of rapamycin response was conducted on cells expressing eithe|M185498, GSM185503, GSM185504, GSM185518, GSM185519. Expression datasets and platform annotation are available on the NCBI GEO database with platform identification number GPL884 and reference series GSE7660. Smith et al. [ 26 ] (hereafter dataset C) studied the expression profiles of transcription factor deletion strains in the presence of oleate. mRNA levels in each of four deletion strains (delta_OA|onsidered only the delta_ADR1 samples. Expression datasets and platform annotation are available on the NCBI GEO database with platform identification number GPL4287 and GPL4303, and reference series GSE5862{{tag}}. De Pittà and colleagues [ 27 ] (hereafter dataset D) obtained expression profiling of bone marrow from paediatric patients with acute lymphoblastic leukemia (ALL) using a dedicated muscle |, Europe) prepared from male fetal skeletal muscle. 
Expression datasets and platform annotation are available on the NCBI GEO database with platform identification number GPL2011 and reference series GSE2604. By the analysis of four datasets, we are able to test the normalisation procedures in different situations: (i) either a large (dataset B) or a small (dataset C) proportion of genes expected to be | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
339 | GSE5862 | 4/19/2007 | ['5862'] | [] | [u'17551510'] | 1911199 | [u'17551510'] | ['Marelli', 'Aitchison', 'Smith', 'Marzolf', 'Saleem', 'Rachubinski', 'Ramsey', 'Hwang'] | ['Marelli', 'Aitchison', 'Smith', 'Marzolf', 'Saleem', 'Rachubinski', 'Ramsey', 'Hwang'] | ['Marelli', 'Aitchison', 'Smith', 'Marzolf', 'Saleem', 'Rachubinski', 'Ramsey', 'Hwang'] | Mol Syst Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
340 | GSE5863 | 4/19/2007 | ['5863'] | [] | [u'17551510'] | 1911199 | [u'17551510'] | ['Marelli', 'Aitchison', 'Smith', 'Marzolf', 'Saleem', 'Rachubinski', 'Ramsey', 'Hwang'] | ['Marelli', 'Aitchison', 'Smith', 'Marzolf', 'Saleem', 'Rachubinski', 'Ramsey', 'Hwang'] | ['Marelli', 'Aitchison', 'Smith', 'Marzolf', 'Saleem', 'Rachubinski', 'Ramsey', 'Hwang'] | Mol Syst Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
341 | GSE5869 | 1/1/2007 | ['5869'] | [] | [u'17132046'] | 1661682 | [u'17132046'] | ['Chen', 'Sonnenburg', 'Gordon'] | ['Chen', 'Sonnenburg', 'Gordon'] | ['Chen', 'Sonnenburg', 'Gordon'] | PLoS Biol | 2006 | 2006 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
342 | GSE5870 | 1/1/2007 | ['5870'] | [] | [u'17132046'] | 1661682 | [u'17132046'] | ['Chen', 'Sonnenburg', 'Gordon'] | ['Chen', 'Sonnenburg', 'Gordon'] | ['Chen', 'Sonnenburg', 'Gordon'] | PLoS Biol | 2006 | 2006 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
343 | GSE5872 | 2/1/2007 | ['5872'] | [] | [u'17283056'] | 1899935 | [u'17283056'] | ['', 'Tazi', 'Gabut', 'Dejardin', 'Soret'] | ['Tazi', 'Gabut', 'Dejardin', 'Soret'] | ['Tazi', 'Gabut', 'Dejardin', 'Soret'] | Mol Cell Biol | 2007 | 2007 Apr | 0 | AND pmc_gds | 1 | 0 | ||||
344 | GSE5882 | 2/13/2007 | ['5882'] | [] | [u'17261521'] | 1828832 | [u'17261521'] | [u'De', 'de', 'Saulnier', 'Kolida', 'Molenaar', 'Gibson'] | ['Kolida', 'Saulnier', 'de', 'Molenaar', 'Gibson'] | ['Kolida', 'Saulnier', 'de', 'Gibson', 'Molenaar'] | Appl Environ Microbiol | 2007 | 2007 Mar | 0 | AND pmc_gds | 1 | 0 | ||||
345 | GSE5884 | 1/1/2007 | ['5884'] | [] | [u'17189430'] | 1820458 | [u'17189430'] | ['Nicolas', 'Amiot', 'Robine', 'Gidrol', 'Barillot', 'Uematsu', 'Borde'] | ['Nicolas', 'Amiot', 'Robine', 'Gidrol', 'Barillot', 'Uematsu', 'Borde'] | ['Nicolas', 'Amiot', 'Robine', 'Gidrol', 'Barillot', 'Uematsu', 'Borde'] | Mol Cell Biol | 2007 | 2007 Mar | 0 | AND pmc_gds | 1 | 0 | ||||
346 | GSE5886 | 9/1/2007 | ['5886'] | [] | [u'18223091'] | 2293194 | [u'18223091'] | ['Lupo', 'Foga\xc3\xa7a', 'Zaini', 'Nakaya', 'da', 'V\xc3\xaancio'] | ['Lupo', 'Foga\xc3\xa7a', 'Zaini', 'Nakaya', 'da', 'V\xc3\xaancio'] | ['Lupo', 'Foga\xc3\xa7a', 'Zaini', 'Nakaya', 'da', 'V\xc3\xaancio'] | J Bacteriol | 2008 | 2008 Apr | 0 | AND pmc_gds | 1 | 0 | ||||
347 | GSE5888 | 9/1/2007 | ['5888'] | [] | [u'18223091'] | 2293194 | [u'18223091'] | ['', 'Lupo', 'Foga\xc3\xa7a', 'Zaini', 'Nakaya', 'da', 'V\xc3\xaancio'] | ['Lupo', 'Foga\xc3\xa7a', 'Zaini', 'Nakaya', 'da', 'V\xc3\xaancio'] | ['Lupo', 'Foga\xc3\xa7a', 'Zaini', 'Nakaya', 'da', 'V\xc3\xaancio'] | J Bacteriol | 2008 | 2008 Apr | 0 | AND pmc_gds | 1 | 0 | ||||
348 | GSE5913 | 1/3/2007 | ['5913'] | [] | [u'17213802'] | 2737644 | [u'19015069'] | ['Berenjeno', 'Bustelo', u'Nu\xf1ez', 'N\xc3\xba\xc3\xb1ez'] | ['Berenjeno', 'Bustelo'] | ['Berenjeno', 'Bustelo'] | Clin Transl Oncol | 2008 | 2008 Nov | 0 | e and by the US National Cancer Institute. All Spanish funding is co-sponsored by the European Union FEDER programme. Footnotes The genomic data of this work are deposited in the NCBI Gene Expression Omnibus database (Accession number: GSE5913{{tag}}--DEPOSIT-- ). References 1. Bustelo XR, Sauzeau V, Berenjeno IM. GTP-binding | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
349 | GSE5913 | 1/3/2007 | ['5913'] | [] | [u'17213802'] | 2084474 | [u'17213802'] | ['Berenjeno', 'Bustelo', u'Nu\xf1ez', 'N\xc3\xba\xc3\xb1ez'] | ['Berenjeno', 'Bustelo', 'N\xc3\xba\xc3\xb1ez'] | ['Berenjeno', 'Bustelo', 'N\xc3\xba\xc3\xb1ez'] | Oncogene | 2007 | 6/21/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
350 | GSE5914 | 5/11/2007 | ['5914'] | [] | [u'17560561'] | 2000702 | [u'17560561'] | ['Sullivan', 'Ko', 'Shaik', 'Piao', u'S.H.Ko', 'Sharov', 'Sharova', 'Hogan', 'Stewart'] | ['Sullivan', 'Ko', 'Shaik', 'Piao', 'Sharov', 'Sharova', 'Hogan', 'Stewart'] | ['Sullivan', 'Ko', 'Shaik', 'Piao', 'Sharov', 'Sharova', 'Hogan', 'Stewart'] | Dev Biol | 2007 | 7/15/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
351 | GSE5915 | 4/3/2007 | ['5915'] | [] | [u'17519037'] | 1888706 | [u'17519037'] | ['Cadrin-Girard', 'Nishida', 'St-Amand', 'Yoshioka', 'Kouadjo'] | ['Cadrin-Girard', 'Nishida', 'St-Amand', 'Yoshioka', 'Kouadjo'] | ['Cadrin-Girard', 'Nishida', 'St-Amand', 'Yoshioka', 'Kouadjo'] | BMC Genomics | 2007 | 5/22/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
352 | GSE5917 | 6/6/2007 | ['5917'] | [] | [u'17550995'] | 2204013 | [u'17953764'] | ['Paredes', 'Papoutsakis', u'Li', 'Huang', 'Miller'] | ['Paredes', 'Wang', 'Miller', 'Apostolidis', 'Fuhrken', 'Huang', 'Chen', 'Papoutsakis'] | ['Paredes', 'Papoutsakis', 'Huang', 'Miller'] | BMC Genomics | 2007 | 10/22/2007 | 0 | d [ 73 ]. For Mk vs. G comparison, the Mk/G value was first calculated for each experiment and then averaged. Raw and normalized data were deposited in the Gene Expression Omnibus [ 77 ] (Mk cells: GSE3839; G cells: GSE5917{{tag}}--DEPOSIT--). All subsequent data analysis was performed using the MultiExperiment Viewer 3.0 (MeV; Institute for Genomic Research, Rockville, MD) [ 74 ]. Differentially expressed genes were | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
353 | GSE5920 | 6/1/2007 | ['5920'] | [] | [] | 2267712 | [u'18186939'] | [u'Wang', u'Azaro', u'Hu', u'Yang', u'Li', u'Yue', u'Cui'] | ['Wang', 'Azaro', 'Hu', 'Yang', 'Li', 'Yue', 'Cui'] | ['Wang', 'Azaro', 'Hu', 'Yang', 'Li', 'Yue', 'Cui'] | BMC Genomics | 2008 | 1/10/2008 | 0 | ection procedure is schematically illustrated in Fig. 1 . Resulting data have been deposited to the NCBI's Gene Expression Omnibus (GEO) [ 31 ] and are accessible through GEO Series accession number GSE5920{{tag}}--DEPOSIT--. Figure 1 Schematic illustration of the high-throughput gene expression profiling procedure . Fluorescent labeling is indicated by an asterisk. Reproducibility of the high-throughput gene expressio|sion, remediation or response to treatments. Data discussed in this publication have been deposited in the NCBI's Gene Expression Omnibus [ 31 ] and are accessible through GEO Series accession number GSE5920{{tag}}--DEPOSIT--. Methods Cell lines and single cell preparation Human breast cancer cell line MCF-7 and ovarian cancer cell line NCI/ADR-RES were kindly provided by Drs. Jinming Yang, Hao Wu and William Hait [ 42 | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
354 | GSE5923 | 8/23/2007 | ['5923'] | ['2901'] | [u'17483316'] | 2976757 | [u'20925918'] | ['Ahr', 'Dietrich', 'Ellinger-Ziegelbauer', 'Stemmer'] | ['Jansen', 'Tesson', 'Breitling'] | [] | BMC Bioinformatics | 2010 | 10/6/2010 | 0 | WGCNA) to differential network analysis. We first describe the five steps involved in DiffCoEx and then, to illustrate the method's effectiveness, we present the results of an analysis performed on a publicly available dataset generated by Stemmer et al. [ 13 ]. Algorithm Our method builds on WGCNA [ 14 , 15 ], which is a framework for coexpression analysis. Identification of coexpression modules with WGC|. We identify modules of genes that are differentially coexpressed and, by using gene set enrichment analysis, we provide evidence for their biological relevance. Dataset Our dataset (Gene Expression Omnibus GEO GSE5923{{tag}}) contains Affymetrix gene expression profiles of renal cortex outer medulla in wild-type- and Eker rats treated with carcinogens. The dataset is a time course as the rats were treated wit | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
355 | GSE5926 | 2/27/2007 | ['5926'] | ['2925'] | [u'17484738'] | 2168061 | [u'17933919'] | ['Abbott', 'van', 'Knijnenburg', 'de', 'Pronk', 'Reinders'] | ['Pronk', 'Knijnenburg', 'De', 'Daran-Lapujade', 'Walker', 'Walsh', 'Reinders', 'Hazelwood', 'Daran'] | ['Knijnenburg', 'Pronk', 'Reinders'] | Appl Environ Microbiol | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
356 | GSE5929 | 7/31/2007 | ['5929'] | [] | [u'17785531'] | 1987344 | [u'17785531'] | ['', 'King', 'Van', 'Kaur', 'Hohmann', 'Schmid', 'Martin', 'Baliga', 'Pan', 'Reiss'] | ['King', 'Van', 'Kaur', 'Hohmann', 'Schmid', 'Martin', 'Baliga', 'Pan', 'Reiss'] | ['King', 'Van', 'Kaur', 'Hohmann', 'Schmid', 'Martin', 'Baliga', 'Pan', 'Reiss'] | Genome Res | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
357 | GSE5932 | 4/25/2007 | ['5932'] | [] | [u'17559304'] | 1891326 | [u'17559304'] | ['Blokesch', 'Schoolnik'] | ['Blokesch', 'Schoolnik'] | ['Blokesch', 'Schoolnik'] | PLoS Pathog | 2007 | 2007 Jun | 0 | AND pmc_gds | 1 | 0 | ||||
358 | GSE5934 | 4/25/2007 | ['5934'] | [] | [u'17559304'] | 1891326 | [u'17559304'] | ['Blokesch', 'Schoolnik'] | ['Blokesch', 'Schoolnik'] | ['Blokesch', 'Schoolnik'] | PLoS Pathog | 2007 | 2007 Jun | 0 | AND pmc_gds | 1 | 0 | ||||
359 | GSE5935 | 4/26/2007 | ['5935'] | [] | [u'17559304'] | 1891326 | [u'17559304'] | ['Blokesch', 'Schoolnik'] | ['Blokesch', 'Schoolnik'] | ['Blokesch', 'Schoolnik'] | PLoS Pathog | 2007 | 2007 Jun | 0 | AND pmc_gds | 1 | 0 | ||||
360 | GSE5938 | 3/1/2007 | ['5938'] | [] | [u'17389876'] | 1847951 | [u'17389876'] | ['Galitski', 'Carter', 'Thorsson', 'Shelby', 'Marzolf', 'Prinz', 'Neou'] | ['Galitski', 'Carter', 'Thorsson', 'Shelby', 'Marzolf', 'Prinz', 'Neou'] | ['Galitski', 'Carter', 'Thorsson', 'Shelby', 'Marzolf', 'Prinz', 'Neou'] | Mol Syst Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
361 | GSE5941 | 9/28/2007 | ['5941'] | [] | [u'17868468'] | 2045682 | [u'17868468'] | ['Lin', 'Eshaghi', 'Li', 'Liu', 'Chu', 'Karuturi'] | ['Lin', 'Eshaghi', 'Li', 'Liu', 'Chu', 'Karuturi'] | ['Lin', 'Eshaghi', 'Li', 'Liu', 'Chu', 'Karuturi'] | BMC Genomics | 2007 | 9/15/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
362 | GSE5943 | 12/1/2007 | ['5943'] | [] | [] | 2311291 | [u'18371188'] | [u'Rodenburg', u'Kramer', u'Keijer', u'Bovee-Oudenhoven'] | ['Vink', 'van', 'Rodenburg', 'Keijer', 'Bovee-Oudenhoven', 'Kramer'] | [u'Rodenburg', u'Kramer', u'Keijer', u'Bovee-Oudenhoven'] | BMC Genomics | 2008 | 3/27/2008 | 0 | s were normalized against the Cy3 reference as described previously [ 73 ]. The data have been deposited in NCBIs Gene Expression Omnibus [ 74 ] and are accessible through GEO Series accession number GSE5943{{tag}}--DEPOSIT--. The complete dataset is available in Additional files 4 and 5 . Fold changes calculations were performed in Microsoft Excel, fold change equals ratio FOS/control in the case of increase or equa | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
363 | GSE5947 | 1/26/2007 | ['5947'] | [] | [u'17227908'] | 2374994 | [u'17683608'] | ['Heremans', u'Kidder', u'Lutton', 'Panoskaltsis-Mortari', u'Sharov', u'Ulloa-Montoya', u'Crabbe', 'Nelson-Holte', u'Pauwelyn', 'Tolar', u'Hu', "O'Shaughnessy", 'Serafini', 'Oki', 'Burns', 'Jiang', 'Blazar', u'Ko', 'Frommer', u'Piao', 'Buckley', 'Dylla', 'Weissman', 'Verfaillie', 'Pelacho', 'Bryder', 'Fine', 'Rossi', u'Chase'] | ['Kidder', 'Luttun', 'Pauwelyn', 'Ko', 'Geraerts', 'Hu', 'Verfaillie', 'Piao', 'Sharov', 'Chase', 'Ulloa-Montoya', 'Crabbe'] | ['Kidder', 'Pauwelyn', 'Ko', 'Hu', 'Verfaillie', 'Piao', 'Sharov', 'Chase', 'Ulloa-Montoya', 'Crabbe'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
364 | GSE5947 | 1/26/2007 | ['5947'] | [] | [u'17227908'] | 2118428 | [u'17227908'] | ['Heremans', u'Kidder', u'Lutton', 'Panoskaltsis-Mortari', u'Sharov', u'Ulloa-Montoya', u'Crabbe', 'Nelson-Holte', u'Pauwelyn', 'Tolar', u'Hu', "O'Shaughnessy", 'Serafini', 'Oki', 'Burns', 'Jiang', 'Blazar', u'Ko', 'Frommer', u'Piao', 'Buckley', 'Dylla', 'Weissman', 'Verfaillie', 'Pelacho', 'Bryder', 'Fine', 'Rossi', u'Chase'] | ['Burns', 'Jiang', 'Blazar', 'Frommer', 'Weissman', 'Nelson-Holte', 'Heremans', 'Tolar', 'Panoskaltsis-Mortari', 'Bryder', 'Pelacho', 'Verfaillie', 'Serafini', 'Oki', 'Buckley', 'Dylla', 'Fine', 'Rossi', "O'Shaughnessy"] | ['Bryder', 'Burns', 'Jiang', 'Frommer', 'Nelson-Holte', 'Blazar', 'Weissman', 'Tolar', 'Panoskaltsis-Mortari', 'Heremans', 'Verfaillie', 'Pelacho', 'Serafini', 'Oki', 'Buckley', 'Dylla', 'Fine', 'Rossi', "O'Shaughnessy"] | J Exp Med | 2007 | 1/22/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
365 | GSE5949 | 8/6/2007 | ['5949'] | [] | [u'20053763'] | 2655965 | [u'18999108'] | ['Ikediobi', 'Weinstein', 'Pommier', 'Nishizuka', 'Shankavaram', 'Ziegler', 'Lorenzi', 'Ho', u'Kahn', 'Bussey', 'Reimers', 'Reinhold'] | ['Liu', 'Clarke', 'Yoon', 'Li'] | [] | AMIA Annu Symp Proc | 2008 | 11/6/2008 | 1 | eriments. In this paper, we report our experience in exploring Gene Expression Data (MGED) ontology (MO) 9 and NCI Thesaurus for annotating breast cancer microarray data available at Gene Expression Omnibus (GEO) 10 . Specifically, we tailored NCI Thesaurus to obtain breast cancer microarray clinical ontology (BCM-CO), an ontology to capture breast cancer microarray clinical information. The coverage of|I Metathesaurus are used to provide terminology support to the public Web portal, http://cancer.gov , numerous portals supporting consortia, and other communities of researchers. The Gene Expression Omnibus (GEO) was initiated to serve as a public repository for a wide range of high-throughput experimental data, which includes data from single and dual channel microarray-based experiments measuring mRNA| The prototype coverage with respect to four categories is discussed below in detail. DiseaseState and Histology In the BCM-CO prototype, 82 histology nodes were mapped. We identified six series ( GSE2109 , GSE5949{{tag}}--REUSE-- , GSE5720 , GSE6595 , GSE1477 , and GSE7849 ) in which histology terms can be extracted using simple patterns. Overall, 52 terms were retrieved, among which 48 are histology terms, 2 | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
366 | GSE5949 | 8/6/2007 | ['5949'] | [] | [u'20053763'] | 2755823 | [u'19796399'] | ['Ikediobi', 'Weinstein', 'Pommier', 'Nishizuka', 'Shankavaram', 'Ziegler', 'Lorenzi', 'Ho', u'Kahn', 'Bussey', 'Reimers', 'Reinhold'] | ['Phillips', 'Krauthammer', 'McCusker', 'Gonz\xc3\xa1lez', 'Finkelstein'] | [] | BMC Bioinformatics | 2009 | 10/1/2009 | 0 | om caTissue to enhance the analysis of gene expression data from caArray. We first set up grid instances for caTissue [ 27 ] and caArray [ 28 ] holding clinical data [ 29 ] and experimental data (GEO GSE5949{{tag}}--REUSE-- [ 30 ]) on the NCI-60 cell lines. We then generated the OWL ontologies for caTissue, caArray, and the NCIt concepts they use from the metadata available from those services. The URLs for these onto|related to every Hybridization caArray. The results of the query are in Additional file 8 . We used these diagnoses as labels in Figure 1 , a Principal Components Analysis (PCA) Projection of GEO GSE5949{{tag}}--REUSE-- in caArray. The diagnoses that are shown are far more specific than the usual "cancer type" that is available in the GEO data set. Using this technique, other statistical analyses can be performed | of clinical annotations that caTissue can be customized to contain. Figure 1 Principal components analysis . Projection of the first two principal components of gene expression microarray experiment GSE5949{{tag}}--REUSE-- from GEO. The clinical diagnoses for the biological source cell line were extracted from caTissue and joined using Corvus. The significance of this join is important: caArray and caTissue, while ba|sue and caArray can publish data via caGrid services. caTissue and caArray instances were deployed with caGrid services that published to the caGrid training grid. Expression data was loaded from GEO GSE5949{{tag}}--REUSE-- [ 30 ] by downloading the data and converting it into the MAGE-TAB format using the GEOImport and TabConverter tools from the tab2mage project [ 31 ]. 
Additional curation was needed to fix some ref| CellSpecimen . The Label used is found through the NCIt term nci:Label . Click here for file Additional file 7 SPARQL query used to retrieve the clinical diagnosis for the cell lines used in GEO GSE5949{{tag}}--REUSE--. The query takes advantage of the inferred relationship derived_from that is described in Additional file 6. Click here for file Additional file 8 Diagnoses for hybridizations, tab separated. Cli | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
367 | GSE5949 | 8/6/2007 | ['5949'] | [] | [u'20053763'] | 2762058 | [u'19828069'] | ['Ikediobi', 'Weinstein', 'Pommier', 'Nishizuka', 'Shankavaram', 'Ziegler', 'Lorenzi', 'Ho', u'Kahn', 'Bussey', 'Reimers', 'Reinhold'] | ['Balestrieri', 'Vanoni', 'Chiaradonna', 'Alberghina'] | [] | BMC Bioinformatics | 2009 | 10/15/2009 | 0 | Table 2 Gene expression profiling datasets of NCI60 cell lines and normal tissues analyzed in this study Reference Tissue of origin Number of transcriptional profiles GEO Number [ 31 ] NCI60 cells 60 GSE5949{{tag}}--REUSE-- Breast 0 - [ 106 ] CNS 2 GSE96 [ 107 ] Colon 4 GSE6731 [ 108 ] Blood 4 GSE1402 [ 106 ] Lung 2 GSE96 Skin 0 - [ 106 ] Ovary 3 GSE96 [ 106 ] Prostate 3 GSE96 [ 106 ] Kidney 3 GSE96 Gene expression pr|) at the National Center for Biotechnology Information (NCBI) website ( ) [ 105 ]. In particular, gene expression profiles of NCI60 cell collection (cancer samples) were recovered from GEO database ( GSE5949{{tag}}--REUSE--, [ 31 ]) in which the experimental data were obtained by using the Affymetrix HG-U95Av2 oligonucleotide array platform. For the analysis only results obtained by oligonucleotide arrays were consid|NA array platform. Therefore, also for normal tissue samples, the data used for the comparative analysis, were recovered from transcriptional profiles produced by using U95Av2 oligonucleotide array ( GSE96 [ 106 ], GSE6731 [ 107 ] and GSE1402 [ 108 ]). A total of 81 transcriptional profiles encompassing cancer cell lines with nine histological origins and samples from six normal tissues were reco | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
368 | GSE5949 | 8/6/2007 | ['5949'] | [] | [u'20053763'] | 2821037 | [u'20053763'] | ['Ikediobi', 'Weinstein', 'Pommier', 'Nishizuka', 'Shankavaram', 'Ziegler', 'Lorenzi', 'Ho', u'Kahn', 'Bussey', 'Reimers', 'Reinhold'] | ['Ikediobi', 'Weinstein', 'Pommier', 'Nishizuka', 'Shankavaram', 'Ziegler', 'Lorenzi', 'Ho', 'Bussey', 'Reimers', 'Reinhold'] | ['Ikediobi', 'Weinstein', 'Pommier', 'Nishizuka', 'Shankavaram', 'Ziegler', 'Lorenzi', 'Ho', 'Bussey', 'Reimers', 'Reinhold'] | Mol Cancer Ther | 2010 | 2010 Jan | 0 | AND pmc_gds | 1 | 0 | ||||
369 | GSE5956 | 10/1/2007 | ['5956'] | [] | [u'17708775'] | 2375002 | [u'17708775'] | ['Mackay', 'Yamamoto', 'Jordan', 'Morgan', 'Carbone'] | ['Mackay', 'Yamamoto', 'Jordan', 'Morgan', 'Carbone'] | ['Mackay', 'Yamamoto', 'Jordan', 'Morgan', 'Carbone'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
370 | GSE5961 | 4/18/2007 | ['5961'] | [] | [u'17409088'] | 3018144 | [u'21080971'] | ['Janes', 'Wiltshire', 'Bass', 'Batalov', 'Su', 'Delano', 'McClurg', 'Wu', 'Kohsaka', 'Shimomura', 'Walker', 'Takahashi'] | ['Cahan', 'Graubert'] | [] | BMC Genomics | 2010 | 11/17/2010 | 0 | Hypothalamus and adipose tissue expression data were obtained from GEO (accessions GSE5961{{key}}--REUSE-- and GSE8028, respectively) | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
371 | GSE5961 | 4/18/2007 | ['5961'] | [] | [u'17409088'] | 2642639 | [u'19091771'] | ['Janes', 'Wiltshire', 'Bass', 'Batalov', 'Su', 'Delano', 'McClurg', 'Wu', 'Kohsaka', 'Shimomura', 'Walker', 'Takahashi'] | ['Lam', 'Gatti', 'Rusyn', 'Nobel', 'Wright', 'Shabalin'] | [] | Bioinformatics | 2009 | 2/15/2009 | 0 | ://www.genenetwork.org/genotypes/BXD.geno ; further information is available at http://www.genenetwork.org/dbdoc/BXDGeno.html . 2.1.3 Hypothalamus gene expression data The mouse hypothalamus dataset GSE5961{{tag}}--REUSE-- was downloaded from the NCBI Gene Expression Omnibus website. These data are described in McClurg et al. ( 2007 ). The 58 CEL files were normalized using the gcrma package from Bioconductor (vers | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
372 | GSE5961 | 4/18/2007 | ['5961'] | [] | [u'17409088'] | 1893038 | [u'17409088'] | ['Janes', 'Wiltshire', 'Bass', 'Batalov', 'Su', 'Delano', 'McClurg', 'Wu', 'Kohsaka', 'Shimomura', 'Walker', 'Takahashi'] | ['Janes', 'Wiltshire', 'Bass', 'Batalov', 'Su', 'Delano', 'McClurg', 'Wu', 'Kohsaka', 'Shimomura', 'Walker', 'Takahashi'] | ['Janes', 'Wiltshire', 'Bass', 'Batalov', 'Su', 'Delano', 'McClurg', 'Wu', 'Kohsaka', 'Shimomura', 'Walker', 'Takahashi'] | Genetics | 2007 | 2007 May | 0 | AND pmc_gds | 1 | 0 | ||||
373 | GSE5962 | 6/1/2007 | ['5962'] | [] | [] | 2394769 | [u'17577413'] | [u'Coleman', u'Bianchi-Frias', u'Nelson', u'Pritchard', u'Mecham'] | ['Coleman', 'Mecham', 'Nelson', 'Pritchard', 'Bianchi-Frias'] | ['Coleman', 'Mecham', 'Nelson', 'Pritchard', 'Bianchi-Frias'] | Genome Biol | 2007 | 2007 | 0 | k where Y ijk is the percentage of stroma area observed from mouse K, lobe j, strain i. Data Microarray data for this study have been deposited in the Gene Expression Omnibus [ 47 ] under accession GSE5962{{tag}}--DEPOSIT--. Acknowledgements We thank members of the Nelson laboratory for helpful discussions. We thank Barbara Trask for critical review of the manuscript and helpful discussions. We thank Sarah Hawley for | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
374 | GSE5969 | 2/1/2007 | ['5969'] | [] | [u'17509148'] | 1929152 | [u'17509148'] | ['Firestein', 'Zhang', 'De', 'Nicolae', 'Gilad', 'Pinto'] | ['Firestein', 'Zhang', 'De', 'Nicolae', 'Gilad', 'Pinto'] | ['Firestein', 'Zhang', 'De', 'Nicolae', 'Gilad', 'Pinto'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
375 | GSE5972 | 7/31/2007 | ['5972'] | [] | [u'17537853'] | 1951379 | [u'17537853'] | ['Somogyi', 'Fang', 'McGeer', 'Greller', 'Danesh', 'Wilkinson', 'DeVries', 'Seneviratne', 'Kelvin', 'Poutanen', 'Persad', 'Muller', 'Brunton', 'Bosinger', 'Keshavjee', 'Bermejo-Martin', 'Richardson', 'Louie', 'Gold', 'Ran', 'Humar', 'Willey', 'Cameron', 'Loeb', 'Xu'] | ['Somogyi', 'Fang', 'McGeer', 'Greller', 'Danesh', 'Wilkinson', 'DeVries', 'Seneviratne', 'Kelvin', 'Poutanen', 'Persad', 'Muller', 'Brunton', 'Bosinger', 'Keshavjee', 'Bermejo-Martin', 'Richardson', 'Louie', 'Gold', 'Ran', 'Humar', 'Willey', 'Cameron', 'Loeb', 'Xu'] | ['Somogyi', 'Fang', 'McGeer', 'Greller', 'Danesh', 'Wilkinson', 'DeVries', 'Seneviratne', 'Kelvin', 'Poutanen', 'Persad', 'Muller', 'Brunton', 'Bosinger', 'Keshavjee', 'Bermejo-Martin', 'Richardson', 'Louie', 'Gold', 'Ran', 'Humar', 'Willey', 'Cameron', 'Loeb', 'Xu'] | J Virol | 2007 | 2007 Aug | 0 | AND pmc_gds | 1 | 0 | ||||
376 | GSE5975 | 10/25/2007 | ['5975'] | [] | [u'17317821'] | 2643030 | [u'18835033'] | ['', 'Qin', 'Tang', 'Wang', 'Jia', 'Shen', 'Sun', 'Ye', 'Budhu', 'Liu', 'Forgues', 'Lu', 'Chen'] | ['Zheng', 'Wang', 'Sengupta', 'Ried', 'Jia', 'Appella', 'Xiao', 'Li', 'Chilton', 'Cao', 'Kim', 'Deng', 'Xu'] | ['Wang', 'Jia'] | Cancer Cell | 2008 | 10/7/2008 | 0 | nd lowered levels in cancers. (D–F) SIRT1 expression levels in microarray data of 263 HCC samples, presented as raw log2 ratio (T/N) using previously described dataset (GEO accession number, GSE5975{{tag}}--REUSE-- ) (D), and bars (E). Realtime RT-PCR of 10 pairs of samples was also presented (F). Data shown is average ±SD. Bars: 100 µm for C. Next, we performed tissue array to compare SIRT1 p | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
377 | GSE5975 | 10/25/2007 | ['5975'] | [] | [u'17317821'] | 2786938 | [u'19812400'] | ['', 'Qin', 'Tang', 'Wang', 'Jia', 'Shen', 'Sun', 'Ye', 'Budhu', 'Liu', 'Forgues', 'Lu', 'Chen'] | ['Wang', 'Roessler', 'Qin', 'Lee', 'Tang', 'Lo', 'Croce', 'Budhu', 'Meltzer', 'Ambs', 'Forgues', 'Fan', 'Ji', 'Sun', 'Chen', 'Shi', 'Ng', 'Yu', 'Man'] | ['Qin', 'Wang', 'Tang', 'Sun', 'Budhu', 'Forgues', 'Chen'] | N Engl J Med | 2009 | 10/8/2009 | 0 | men and those from women, we globally analyzed the microRNA expression profiles of 241 patients in cohort 1, in which both tumor and nontumor microRNA microarray data were available (Gene Expression Omnibus [GEO] accession number, GSE6857 ). 21 To avoid potential confounding factors, an age-matched and balanced case set was used to identify microRNAs with different expression levels in men and women, |patients in cohort 1 with available microRNA and messenger RNA (mRNA) microarray data. The mRNA microarray data were based on the expression of approximately 21,000 mRNA genes (GEO accession number, GSE5975{{tag}}--REUSE-- ). 23 Multidimensional scaling analysis on the basis of the first three principal components of all genes revealed that a majority of patients with low miR-26 expression clustered separately from | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
378 | GSE5975 | 10/25/2007 | ['5975'] | [] | [u'17317821'] | 2828822 | [u'19150350'] | ['', 'Qin', 'Tang', 'Wang', 'Jia', 'Shen', 'Sun', 'Ye', 'Budhu', 'Liu', 'Forgues', 'Lu', 'Chen'] | ['Minato', 'Qin', 'Wang', 'Jia', 'Tang', 'Honda', 'Ye', 'Budhu', 'Wauthier', 'Reid', 'Forgues', 'Yang', 'Ji', 'Kaneko', 'Yamashita'] | ['Qin', 'Wang', 'Jia', 'Tang', 'Ye', 'Budhu', 'Forgues'] | Gastroenterology | 2009 | 2009 Mar | 0 | nd the Liver Disease Center of Kanazawa University Hospital, and the study was approved by the Institutional Review Board of the respective Institutes. The microarray data from clinical specimens are publicly available (GEO accession number, GSE5975{{tag}}--REUSE-- ) 27 . Array data from a total of 156 HCC cases (155 hepatitis B virus-positive) corresponding to two subtypes of HCC, i.e., HpSC-HCC and MH-HCC, were used | with molecular features of HpSC We re-evaluated the gene expression profiles that were uniquely associated with two recently identified prognostic subtypes of HCC, i.e., HpSC-HCC and MH-HCC, using a publicly available microarray dataset of 156 HCC cases (GEO accession number: GSE5975{{tag}}--REUSE-- ). Sixty cases were defined as HpSC-HCC with a poor prognosis while 96 cases were defined as MH-HCC with a good prognosis|f Energy grant (DE-FG02-02ER-63477). Sponsors have no role in the study design, data collection, analysis and interpretation. Footnotes Disclosures: No conflicts of interest exist. Microarray data: Publicly available at http://www.ncbi.nlm.nih.gov/geo/ (Accession number: GSE5975{{tag}}--REUSE-- ). | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
379 | GSE5981 | 10/22/2007 | ['5981'] | [] | [u'17351046'] | 1855875 | [u'17351046'] | ['Lee', 'Karimova', 'Piper', 'Busby', 'Ladant', 'Hobman', 'Kolb', 'Oshima', 'Westblade', 'Webster', 'Mitchell', u'Ogasawara'] | ['Lee', 'Karimova', 'Piper', 'Busby', 'Ladant', 'Hobman', 'Kolb', 'Oshima', 'Westblade', 'Webster', 'Mitchell'] | ['Lee', 'Karimova', 'Piper', 'Ladant', 'Busby', 'Hobman', 'Kolb', 'Oshima', 'Westblade', 'Webster', 'Mitchell'] | J Bacteriol | 2007 | 2007 May | 0 | AND pmc_gds | 1 | 0 | ||||
380 | GSE5984 | 3/22/2007 | ['5984'] | ['2627'] | [u'17318176'] | 1864965 | [u'17318176'] | ['Johansson', 'Bernhardsson', 'Stenberg', 'Larsson'] | ['Johansson', 'Bernhardsson', 'Stenberg', 'Larsson'] | ['Johansson', 'Bernhardsson', 'Stenberg', 'Larsson'] | EMBO J | 2007 | 5/2/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
381 | GSE5986 | 2/14/2007 | ['5986'] | [] | [u'17254555'] | 2808885 | [u'19808935'] | ['Ge', 'Jin', 'Lin', 'Xiao', 'Li', 'Zhao', 'Liang', 'Ruan'] | ['Wang', 'Zhou', 'Zhang', 'Su', 'Li', 'Liu', 'Ling', 'Yu'] | ['Li'] | Nucleic Acids Res | 2010 | 2010 Jan | 0 | as Arabidopsis thaliana , Oryza sativa , Populus trichocarpa , Glycine max , etc., it is desirable to develop a plant miRNA database through the integration of large amounts of information about publicly deposited miRNA data. The plant miRNA database (PMRD) integrates available plant miRNA data deposited in public databases, gleaned from the recent literature, and data generated in-house. This databa|alt and osmotic stresses) regulated miRNAs ( 35 ), maize miRNAs responding to salt stress in roots ( 36 ), miRNA expression profiles generated from Arabidopsis grown at different temperatures (GEO: GSE11535), and rice miRNAs responding to drought stress (GEO: GSE5986{{tag}}--MENTION--), etc. For example, in the data set for poplar cold stress, there were 75 probes for 168 miRNAs, and a total of 21 samples including se | 0 | 0 | 1 | NOT pmc_gds | 1 | 0 |
382 | GSE5988 | 6/10/2007 | ['5988'] | [] | [] | 2099488 | [u'18047719'] | [u'Baghdayan', u'Shankar', u'Kyker', u'Dozmorov', u'Hurst', u'Saban', u'Centola'] | ['Baghdayan', 'Shankar', 'Kyker', 'Dozmorov', 'Hurst', 'Saban', 'Centola'] | ['Baghdayan', 'Shankar', 'Kyker', 'Dozmorov', 'Hurst', 'Saban', 'Centola'] | BMC Bioinformatics | 2007 | 11/1/2007 | 0 | necessary to identify at this point every gene that shows a statistically increased probability of modulation. The expression levels of all 21,521 genes are listed on the public database GEO (Series GSE5988{{tag}}--DEPOSIT--). Hierarchical clustering The expression values of 239 VHV genes were clustered by Cluster 3.0 program [ 36 ], the next generation of the Cluster program developed by M. B. Eisen [ 37 ]. Genes were | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
383 | GSE5991 | 2/1/2007 | ['5991'] | [] | [u'17565378'] | 1888727 | [u'17565378'] | ['Hawkins', 'Bhonagiri', 'Speck', 'Lovett', 'Alvarado', 'Powder', 'Warchol', 'Bashiardes', 'Sajan'] | ['Hawkins', 'Bhonagiri', 'Speck', 'Lovett', 'Alvarado', 'Powder', 'Warchol', 'Bashiardes', 'Sajan'] | ['Hawkins', 'Bhonagiri', 'Speck', 'Lovett', 'Alvarado', 'Powder', 'Warchol', 'Bashiardes', 'Sajan'] | PLoS One | 2007 | 6/13/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
384 | GSE6005 | 4/10/2007 | ['6005'] | [] | [u'17416654'] | 1913364 | [u'17416654'] | ['Abee', 'de', 'Molenaar', 'van', 'Moezelaar'] | ['Abee', 'de', 'Molenaar', 'van', 'Moezelaar'] | ['van', 'de', 'Abee', 'Molenaar', 'Moezelaar'] | J Bacteriol | 2007 | 2007 Jun | 0 | AND pmc_gds | 1 | 0 | ||||
385 | GSE6008 | 4/9/2007 | ['6008'] | [] | [u'16452189', u'19843521'] | 2602602 | [u'19104654'] | [u'Akyol', 'Feng', 'Hendrix', u'Misek', u'Hanash', 'Schwartz', 'Kadikoy', 'Bommer', 'Iura', 'Cho', 'Wu', 'Giordano', u'Zhai', 'Fearon', 'Kuick', u'Katabuchi', u'Williams', 'Sikorski'] | ['Remy', 'Didier', 'Granjeaud', 'Imbert', 'Bergon', 'Nguyen', 'Puthier', 'Lopez', 'Textoris'] | [] | PLoS One | 2008 | 2008 | 1 | iated DKNN value (with k being set typically to 100 for microarrays containing 10 to 50k elements). Distributions of DKNN values observed with both an artificial and a real dataset (Complex9RN200 and GSE1456 respectively, see thereafter for a description) are shown in Figure S3A and S3B (solid curve). The asymmetrical shape of the distribution observed in Figure S3B suggests the presence of a parti|s ability to extract relevant informations from a noisy environment. However, a range of optimal values for inflation parameter needs to be defined to get the best results. Performances of DBF-MCL on GSE1456 dataset Next, DBF-MCL was tested with microarray data to explore its effectiveness in finding clusters of co-regulated genes. To this end, we used the microarray data from Pawitan et al. |sed for analysis. Figure S5B , shows the number of informative genes obtained with various k values. Again, two phases were observed suggesting that regions with heterogeneous densities exist in the GSE1456 dataset. As expected, the transition from dense to sparse regions was less marked than in the artificial dataset. A k value of 100 was chosen to allow the extraction of a large part of data that ca| microarray dataset. Interestingly, in all cases meaningful partitioning results were obtained using inflation parameter set to 2. 10.1371/journal.pone.0004001.g001 Figure 1 Results obtained with the GSE1456 dataset. DBF-MCL was run with GSE1456 as input (k = 100, FDR = 10%, S 1..3 , Inflation = 2). (A) Hierarchical cluster|ected genes. (D) The graph after MCL partitioning. 
Each point is colored according to its associated class. (E) Correspondence between hierarchical clustering and DBF-MCL results. (F) TS obtained for GSE1456 (G) Functional enrichment associated with these TS. Systematic extraction of TS We next applied DBF-MCL algorithm to all experiments performed on human, mouse and rat Affymetrix microarrays and ava|he results panel (2), the information panel (3), the plugins panel (4) and the plugin display panel (5). This example shows the expression profiles of genes contained in the TS CBE3881EB derived from GSE469 (“Temporal profiling in muscle regeneration”). Annotation panel shows that this TS is highly enriched in genes related to ATP synthesis. Meta-analysis of public microarray data using|e 1 Transcriptional signatures containing Affymetrix probes for ESR1, GATA3 and FOXA1. TS ID 1 Genes 2 Probes 2 Samples 2 Sample type GSE ID GPL ID Author PubMed IDs 0F2635383 1190 1572 23 Cell lines GSE6569 GPL96 Huang F et al 2007 17332353 3DE64836D 102 143 62 Tissue GSE7904 GPL570 unpublished 2007 - 59A18E225 690 893 121 Both GSE2603 GPL96 Minn AJ et al 2005 16049480 6C975B20B 88 96 26 Tissue GSE677|lein A et al 2007 17410534 6C975B290 88 96 26 Tissue GSE6596 GPL96 Klein A et al 2007 17410534 7150E17F6 868 1032 34 Cell lines GSE4668 GPL96 Coser KR et al 2003 14610279 8059848B4 200 250 251 Tissue GSE3494 GPL96 Miller LD et al 2005 16141321 84E5E1077 694 883 198 Tissue GSE7390 GPL96 Desmedt C et al 2007 17545524 8F69864F9 68 82 95 Tissue GSE5847 GPL96 Boersma BJ et al 2007 17999412 A151D5695 297 361|et al 2007 17420468 B79B1C0B9 270 380 47 Tissue GSE3744 GPL570 Richardson AL et al 2006 16473279 BDB6D8700 550 679 104 Tissue GSE3726 GPL96 Chowdary D et al 2006 16436632 D8F0B528C 125 152 159 Tissue GSE1456 GPL96 Pawitan Y et al 2005 16280042 E2E620F40 448 616 129 Tissue GSE5460 GPL570 unpublished 2007 - EA9669A21 219 251 158 Tissue GSE3143 GPL91 Bild AH et al 2006 16273092 F310ACC36 519 646 49 Tissue|05b;15] and in genes related to the 
PIR keyword “multigene family”. Furthermore, several signatures, of Table 2 are related to melanoma and six of them were observed in the GSE7127 experiment [16] . Although data from Table 2 would deserve further analysis they are most likely related to gain or loss of genetic material in tumors. Indeed, gain of 8q is freq|<1.10−20 ) for any of the human cytoband tested. TS ID 1 Enrich. 2 Cytoband q.value Sample type GSE ID GPL ID Authors PubMed ID 3DA3C8345 24% 17q12-q21 1.7.10 −39 Skin GSE5667 GPL97 Plager DA et al 2007 17181634 43CC3EF57 9% 8q24.3 7.0.10 −32 Melanoma GSE7153 GPL570 Unpublished 2007 - 60E29DA83 16% 8q24.3 6.8.10 −24 Melanoma GSE7127 GPL570| GPL570 Johansson P et al 2007 17516929 60E6B4129 35% 20p13 1.6.10 −26 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60E96FF1E 28% 6p21.3 1.2.10 −28 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60EC95F6A 17% 7q22.1 6.3.10 −31 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60EEBD669 32% 11q23.3 1.4.10 −26 Melan|127 GPL570 Johansson P et al 2007 17516929 B4C95CF18 42% 8q24.3 1.1.10 −36 Ovary GSE6008{{tag}}--REUSE-- GPL96 Hendrix ND et al 2006 16452189 A93ED6519 16% 11q23.3 6.9.10 −23 Melanoma GSE7152 GPL570 Packer LM et al 2007 17450523 A93DB01ED 11% 7q22.1 9.5.10 −30 Melanoma GSE7152 GPL570 Packer LM et al 2007 17450523 1 Transcriptional signature ID. 2 Enrichment: Proportion o|ial platforms ( e.g. National Cancer Institute, Vanderbilt Microarray Shared Resource, Genome Institute of Singapore), several of them being related to the MicroArray Quality Control (MAQC) project (GSE5350) [26] . However, to date, systematic analysis of all experiments performed on these platforms has not been done. The flexibility of our approach also makes it possible to integrate|ile. Figure S3 Distributions of DKNN values. Observed DKNN values (solid line) and of a set of simulated DKNN values S (dotted line) are shown for (A) the Complex9RN200 artificial dataset and (B) the GSE1456 microarray dataset. 
(9.01 MB TIF) Click here for additional data file. Figure S4 Colors correspond to the clusters found using the corresponding algorithm (A) The whole dataset (9,112 points). (B) |s input using a range of k values (FDR = 10%, S1..3, Inflation = 1.2). (B) DBF-MCL was run with several microarray datasets as input (including GSE1456) using a range of k values (FDR = 10%, S1..3, Inflation = 2). (8.72 MB TIF) Click here for additional data file. Figure S6 The TBMap plugin. |for technical assistance. References 1 Stoeckert CJ Causton HC Ball CA 2002 Microarray databases: standards and ontologies. Nat Genet 32 Suppl 469 73 12454640 2 Barrett T Edgar R 2006 Gene expression omnibus: microarray data storage, submission, retrieval, and analysis. Methods Enzymol 411 352 69 16939800 3 Diehn M Sherlock G Binkley G Jin H Matese JC 2003 SOURCE: a unified genomic resource of functional | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
386 | GSE6008 | 4/9/2007 | ['6008'] | [] | [u'16452189', u'19843521'] | 2620272 | [u'19014681'] | [u'Akyol', 'Feng', 'Hendrix', u'Misek', u'Hanash', 'Schwartz', 'Kadikoy', 'Bommer', 'Iura', 'Cho', 'Wu', 'Giordano', u'Zhai', 'Fearon', 'Kuick', u'Katabuchi', u'Williams', 'Sikorski'] | ['Tozeren', 'Gormley'] | [] | BMC Bioinformatics | 2008 | 11/17/2008 | 1 | or therapeutic targets in tissue-specific diseases. Table 1 Microarray datasets used in this study Tissue Phenotype Data Tissue No. of Samples Gene Expression Omnibus/Array Express Accn. # Adipose 10 GSE3526 Adrenal 20 GSE3526, GSE8514, GSE2316 Brain 89 GSE3526, GSE7621, GSE7307, GSE2361, E_AFMX-11, E-TABM-20, Colon 10 E-TABM-176, GSE8671, GSE9254, GSE9452 Epidermal 25 GSE1133, GSE2361, GSE3419, GSE352|38 E_AFMX-11, E-MIMR-27, GSE1133, GSE2240, GSE2361, GSE3526, GSE3585, GSE7307 Kidney 10 E_AFMX-11, GSE2004, GSE2361, GSE3526, GSE7392 Liver 10 E_AFMX-11, GSE2004, GSE3526, GSE6764 Lung 26 E-MEXP-231, GSE10072, GSE1133, GSE2361, GSE3526 Mammary 15 E-TABM-66, GSE2361, GSE3526, GSE7307, GSE7904 Muscle 64 GSE10760, GSE2328, GSE3526, GSE5110, GSE6798, GSE7307, GSE9103, Ovary 10 GSE2361, GSE3526, GSE6008{{tag}}--REUSE--, GS|ancreas 6 GSE1133, GSE2361, GSE7307 Peripheral blood 12 GSE7462, GSE8608, GSE8668, GSE8762, GSE9692 Small intestine 7 GSE2361, GSE7307 Spleen 12 GSE2004, GSE2361, GSE3526, GSE7307 Stomach 10 GSE2361, GSE3526, GSE7307 Testis 38 E_AFMX-11, GSE1133, GSE2361, GSE3218, GSE3526, GSE7307, GSE7808 Thymus 5 GSE1133, GSE2361, GSE7307 Infectious Disease Disease No. of Samples Gene Expression Omnibus/Array Express | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
387 | GSE6008 | 4/9/2007 | ['6008'] | [] | [u'16452189', u'19843521'] | 2679144 | [u'19440550'] | [u'Akyol', 'Feng', 'Hendrix', u'Misek', u'Hanash', 'Schwartz', 'Kadikoy', 'Bommer', 'Iura', 'Cho', 'Wu', 'Giordano', u'Zhai', 'Fearon', 'Kuick', u'Katabuchi', u'Williams', 'Sikorski'] | ['Børresen-Dale', 'Helland', 'Mills', 'Hennessy', 'Lahad', 'Liu', 'Lu', 'Schaner', 'Kristensen', 'Murph', 'Yu', 'Hall'] | [] | PLoS One | 2009 | 2009 | 1 | reating average linkage. The display of hierarchical clustering graphs utilized TreeView [23] . For hierarchical clusters, publicly-available ovarian cancer gene expression datasets (GSE6822 [24] , GSE6008{{tag}}--REUSE-- [25] , GSE10971 and GSE12418 [26] ) were downloaded from the NCBI Entrez GEO website ( http://www.ncbi.nlm.nih.gov/sites/entrez?db=| data suggested LPA influences the development of serous EOC and/or the 39-gene signature characterizes serous EOC. To test these hypotheses, we analyzed additional datasets including a test dataset (GSE6008{{tag}}--REUSE--; N = 103) [24] containing normal ovary specimens and ovarian tumors from endometrioid, mucinous, serous and clear cell types. Hierarchical clustering analy| of transcripts contained in the LPA gene signature were clustered and average linkages were calculated using the Cluster software. Results were visualized using the TreeView program (see methods ). GSE6008{{tag}}--REUSE--, N = 103. The strongly-positive LPA cluster, N = 19, (mostly red) is seen in the bracket on the far right corresponding to serous tumors. Statistical|**P<0.01 and ***P<0.001. To further verify that 39-gene signature characterizes serous EOC, we examined additional ovarian datasets. The dataset GSE6822 (N = 74) [25] contains a majority of serous specimens (N = 46, 62%) and includes benign, borderline and invasive tumors. It|umors of undetermined origin. 
Over half of all serous tumor cells are invasive (malignant) and the remaining are borderline (low malignant potential) or benign [14] . Another dataset GSE10971 (N = 37) contains samples from non-malignant fallopian tube epithelium and high-grade serous carcinoma. Hierarchical clustering divided the samples into two groups (data no|igure S2B ). Only one sample from the latter group was high-grade serous (N = 1, 4%). In summary, the data from our training set combined with the data presented using GSE6822, GSE10971 and GSE6008{{tag}}--REUSE-- datasets suggests that the LPA-signature characterizes serous EOC. Since evidence suggests the 39-gene signature characterizes serous EOC from ovarian datasets ( Figure 1 , F| and data not shown), we questioned whether it also classified prognosis in ovarian cancers. For this analysis, we acquired two ovarian cancer datasets containing patient outcomes. The first dataset (GSE12418; N = 54) [26] was used to examined the predictive value of the 39-gene signature and it contains serous samples from different stages. Hierarchical cluste|= 16, 53%) ( Figure 2C ). 10.1371/journal.pone.0005583.g002 Figure 2 The LPA signature corresponds to worsened outcome in ovarian cancer patients. (A) The patient dataset, GSE12418, N = 54, was downloaded from the NCBI Entrez GEO DataSets website and analyzed for gene expression changes among all included transcripts in the 39-gene signature. Hierarch|he microarray chip. Among these drivers, we sought to determine whether a singular transcript is responsible for LPA-positive signature characterization. Using the most complete dataset we acquired, (GSE6008{{tag}}--REUSE--; N = 103) [24] , we analyzed the drivers and determined whether classification of LPA-positive serous tumors could be achieved using only CLDN1 , CYR61 , |ed levels in serous (P<0.001) ( Figure 3H ). 10.1371/journal.pone.0005583.g003 Figure 3 CLDN1 is a biomarker for serous epithelial ovarian carcinoma specimens. 
Box-plot analyses of data from GSE6008{{tag}}--REUSE--, N = 103 show the drivers THBS1 (A, B), TNC (C), IL-8 (D), MUC1 (E, F), CYR61 (G), CLDN1 (H), KRT23 (I) and FN1 (J, K, L) normalized comparison of gene expre|inent categories. (1.60 MB TIF) Click here for additional data file. Figure S2 Various ovarian cancer datasets demonstrate that the 39-gene signature characterizes serous EOC. (A) The patient dataset GSE6822 (N = 74) [25] was examined with the genes available contained in the 39-gene signature and hierarchical clustering separated a group (N =|um carcinoma. (1.79 MB TIF) Click here for additional data file. Figure S3 CLDN-1 and cell adhesion-related proteins drive the clustering of the LPA transcriptomic signature. (A) The patient dataset, GSE6008{{tag}}--REUSE--, N = 103, was downloaded from the NCBI Entrez GEO DataSets website and analyzed using a previously-identified set of drivers, CLDN1, CYR61, FN1, IL-8, MUC1, THBS1, and TNC. | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
388 | GSE6008 | 4/9/2007 | ['6008'] | [] | [u'16452189', u'19843521'] | 2566571 | [u'18826610'] | [u'Akyol', 'Feng', 'Hendrix', u'Misek', u'Hanash', 'Schwartz', 'Kadikoy', 'Bommer', 'Iura', 'Cho', 'Wu', 'Giordano', u'Zhai', 'Fearon', 'Kuick', u'Katabuchi', u'Williams', 'Sikorski'] | ['Degeest', 'Watts', 'Rose', 'Holtan', 'Domann', 'Futscher'] | [] | BMC Med Genomics | 2008 | 9/30/2008 | 0 | expression study, Hendrix et al. analyzed 99 individual ovarian tumors: 35 stage I, 11 stage II, 44 stage III, 9 stage IV, and 4 normal ovary samples on Affymetrix HG_U133A GeneChips (gene expression omnibus accession number: GSE6008{{tag}}--DEPOSIT--). Of the 659 CpG-rich clones with DNA methylation changes in our CGI array data, 201 could be mapped to the U133A GeneChip using gene names: 126 were hyper-methylated with d | 0 | 1 | 0 | NOT pmc_gds | 0 | 1 |
389 | GSE6008 | 4/9/2007 | ['6008'] | [] | [u'16452189', u'19843521'] | 2655730 | [u'18945643'] | [u'Akyol', 'Feng', 'Hendrix', u'Misek', u'Hanash', 'Schwartz', 'Kadikoy', 'Bommer', 'Iura', 'Cho', 'Wu', 'Giordano', u'Zhai', 'Fearon', 'Kuick', u'Katabuchi', u'Williams', 'Sikorski'] | ['Liu', 'Kuick', 'Hanash', 'Richardson'] | ['Kuick', 'Hanash'] | Clin Immunol | 2009 | 2009 Feb | 0 | to compare groups, and fold-changes were estimated using the ratio of the group means, after replacing means of less than 50 with 50. The array data are available from NCBI’s Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/ ) using series accession number GSE6008{{tag}}--DEPOSIT-- . Flow Cytometric Analysis The following monoclonal antibodies were used: CYC-anti-CD3, FITC-anti-CD8, CYC-anti-CD4, F | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
390 | GSE6019 | 10/11/2007 | ['6019'] | ['3118'] | [u'18474652'] | 2446712 | [u'18474652'] | ['Jones', 'Engwerda', 'Zhou', 'Amante', 'McSweeney', 'Haque', 'Hill', 'Stanley', 'Randall', 'Boyle'] | ['Jones', 'Engwerda', 'Zhou', 'Amante', 'McSweeney', 'Haque', 'Hill', 'Stanley', 'Randall', 'Boyle'] | ['Jones', 'Engwerda', 'Zhou', 'Amante', 'McSweeney', 'Haque', 'Hill', 'Stanley', 'Randall', 'Boyle'] | Infect Immun | 2008 | 2008 Jul | 0 | AND pmc_gds | 1 | 0 | ||||
391 | GSE6024 | 5/1/2007 | ['6024'] | [] | [u'17439654'] | 1896003 | [u'17439654'] | ['Cai', 'Vaughn', 'von', 'Kim'] | ['Cai', 'Vaughn', 'von', 'Kim'] | ['Cai', 'Vaughn', 'von', 'Kim'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
392 | GSE6025 | 5/1/2007 | ['6025'] | [] | [u'17439654'] | 1896003 | [u'17439654'] | ['Cai', 'Vaughn', 'von', 'Kim'] | ['Cai', 'Vaughn', 'von', 'Kim'] | ['Cai', 'Vaughn', 'von', 'Kim'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
393 | GSE6028 | 2/1/2007 | ['6028'] | [] | [u'17085572'] | 1797385 | [u'17085572'] | ['Goetz', 'Mertins', 'Hillen', 'Ecke', 'Goebel', u'Müller-Altrock', 'Joseph', 'Sprehe', 'Müller-Altrock', 'Seidel'] | ['Goetz', 'Mertins', 'Hillen', 'Ecke', 'Goebel', 'Joseph', 'Sprehe', 'Müller-Altrock', 'Seidel'] | ['Goetz', 'Mertins', 'Hillen', 'Ecke', 'Goebel', 'Joseph', 'Sprehe', 'Müller-Altrock', 'Seidel'] | J Bacteriol | 2007 | 2007 Jan | 0 | AND pmc_gds | 1 | 0 | ||||
394 | GSE6037 | 1/1/2007 | ['6037'] | [] | [u'17138696'] | 1693961 | [u'17138696'] | [u'Kaemper', u'Mueller', 'Lessing', 'Winterberg', 'Schirawski', 'Müller', 'Kahmann', 'Kämper', 'Eichhorn'] | ['Lessing', 'Winterberg', 'Schirawski', 'Müller', 'Kahmann', 'Kämper', 'Eichhorn'] | ['Lessing', 'Winterberg', 'Schirawski', 'Müller', 'Kahmann', 'Kämper', 'Eichhorn'] | Plant Cell | 2006 | 2006 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
395 | GSE6038 | 1/1/2007 | ['6038'] | [] | [u'17138696'] | 1693961 | [u'17138696'] | [u'Kaemper', u'Mueller', 'Lessing', 'Winterberg', 'Schirawski', 'Müller', 'Kahmann', 'Kämper', 'Eichhorn'] | ['Lessing', 'Winterberg', 'Schirawski', 'Müller', 'Kahmann', 'Kämper', 'Eichhorn'] | ['Lessing', 'Winterberg', 'Schirawski', 'Müller', 'Kahmann', 'Kämper', 'Eichhorn'] | Plant Cell | 2006 | 2006 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
396 | GSE6039 | 1/1/2007 | ['6039'] | [] | [u'17138696'] | 1693961 | [u'17138696'] | [u'Kaemper', u'Mueller', 'Lessing', 'Winterberg', 'Schirawski', 'Müller', 'Kahmann', 'Kämper', 'Eichhorn'] | ['Lessing', 'Winterberg', 'Schirawski', 'Müller', 'Kahmann', 'Kämper', 'Eichhorn'] | ['Lessing', 'Winterberg', 'Schirawski', 'Müller', 'Kahmann', 'Kämper', 'Eichhorn'] | Plant Cell | 2006 | 2006 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
397 | GSE6041 | 10/8/2007 | ['6041'] | [] | [u'17704236'] | 2048735 | [u'17704236'] | ['Kolotilin', 'Bar-Or', 'Levin', 'Reuveni', 'Tadmor', 'Chen', 'Koltai', 'Nahon', 'Shlomo', 'Meir'] | ['Kolotilin', 'Bar-Or', 'Levin', 'Reuveni', 'Tadmor', 'Chen', 'Koltai', 'Nahon', 'Shlomo', 'Meir'] | ['Kolotilin', 'Bar-Or', 'Levin', 'Reuveni', 'Tadmor', 'Chen', 'Koltai', 'Nahon', 'Shlomo', 'Meir'] | Plant Physiol | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
398 | GSE6043 | 10/16/2007 | ['6043'] | [] | [u'17638893'] | 2683727 | [u'19401680'] | ['Smith', 'Li', 'Polunovsky', 'Peterson', 'Larsson', 'Avdulov', 'Bitterman', 'Issaenko'] | ['Stormo', 'Foat'] | [] | Mol Syst Biol | 2009 | 2009 | 1 | n it was in a Δ vts1 strain, Oberstrass et al (2006) used microarrays to look for mRNAs that were differentially expressed in a wild-type versus a Δ vts1 strain (Gene Expression Omnibus (GEO) accession GSE3859). They confirmed with Northern blots that a few transcripts were present at different levels between the two strains and contained predicted Vts1p-binding sites. Aviv et al |ree microarray time courses in Figure 3B examined gene expression in activated Drosophila eggs or embryos during early embryogenesis ( Pilot et al , 2006 ; Tadros et al , 2007 ; GEO accessions GSE8910, GSE3955). In both wild-type time courses, having high-affinity Smaug-binding sites correlates with reduced mRNA concentration starting at the 2–4 h time point and T1 (slow phase). This obs| the heavier fractions. Qin et al (2007) performed such microarray profiling of mRNA–ribosome association during a time course of the first 10 h of Drosophila development (GEO accession GSE5430). By examining the TFAPs of the Smaug specificities over two replicates each of a 0–2 h sample and a 4–6 h sample, we see that mRNAs that are bound by Smaug are being specifically e| activity that we see with the Smaug SCREs. The Dm3 and Dm4 specificities were detected using microarray data that compared expression in wild-type flies and flies lacking the Kep1 RBP (GEO accession GSE6086), suggesting that they may represent the specificity of Kep1. There were no significant themes among Gene Ontology, phenotype, or in situ annotations for Dm3 and Dm4, besides that their targets s|icing ( Fruscio et al , 2003 ; Robard et al , 2006 ). Finally, Dm5 and Dm6 both were detected using polysome association data from the early Drosophila embryo ( Qin et al , 2007 ; GEO accession GSE5430). 
Both have strong correlations during the 0–2 h time point and have almost no effect by 4–6 h. Dm5 has a strong positive correlation with the lightest fraction, suggesting that tra|c colorectal cancer cell line, SW620, versus a non-metastatic cell line from the same patient, SW480, as measured in a polysome association microarray study ( Provenzani et al , 2006 ; GEO accession GSE2509). Transcripts containing Hs2 SCREs are expressed at a lower level in U937 cells that have been exposed to 12-myristate 13-acetate (PMA) and caused to differentiate into a macrophage-like state ( Ki| correlate with increased association with ribosomes in human mammary epithelial cells, regardless of whether translation initiation factor 4F is overexpressed ( Larsson et al , 2007 ; GEO accession GSE6043{{tag}}--MENTION--). Although a Smaug homolog exists in the human genome, we did not detect a Smaug/Vts1p-like specificity in the data that we analyzed. Nevertheless, we could calculate TFAPs for the Drosophila Sma|ok for Smaug activities that were too weak to detect in the original search. Indeed, there were two RBP pull-down microarray studies, one for poly-pyrimidine tract binding protein (PTB; GEO accession GSE6021; Gama-Carvalho et al , 2006 ) and one for Staufen1 and Staufen2 ( Furic et al , 2008 ; GEO accessions GSE8437, GSE8438), where pulled-down mRNAs were enriched for Smaug-binding sites ( Figure 7C|about half of the genome. We used the mean 5′ and 3′ UTR lengths from the known half to provide approximate mRNA sequences for the unknown half. Drosophila transcript sequences were downloaded from FlyBase ( Wilson et al , 2008 ), and the longest transcript representing each gene in Drosophila melanogaster was used for further analysis. When estimating full-length mRNAs in other Dros|ype can have nearly identical sequences in multiple places in the genome and be represented by multiple microarray measurements, it can cause false correlations. 
Genome-wide expression data Data were downloaded from the NCBI GEO ( Barrett et al , 2007 ). All data were purged of extreme outliers by using the Grubbs' test ( Grubbs, 1969 ; P -value ⩽10 −9 ). We used datasets for further ana|m2 did not correlate with mRNA levels in similarly treated Δ smg eggs (not shown). Dm3 and Dm4 correlated with mRNA levels changing between wild-type and Δ kep1 flies (GEO accession GSE6086), suggesting that Dm3 and Dm4 may reflect the specificity of Kep1, an RNA-binding protein. ( C ) Dm5 and Dm6 were detected from microarray data measuring mRNA association with ribosomes in early dr | 1 | 0 | 1 | NOT pmc_gds | 0 | 1 |
399 | GSE6046 | 10/5/2007 | ['6046'] | [] | [u'17872505'] | 2040081 | [u'17872505'] | ['Shomron', 'Kitzman', 'Hornstein', 'Burge', 'Sandberg', u'Neilsen', 'Nielsen'] | ['Shomron', 'Kitzman', 'Hornstein', 'Burge', 'Sandberg', 'Nielsen'] | ['Shomron', 'Hornstein', 'Burge', 'Nielsen', 'Sandberg', 'Kitzman'] | RNA | 2007 | 2007 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
400 | GSE6050 | 2/1/2007 | ['6050'] | [] | [u'17565378'] | 1888727 | [u'17565378'] | ['Hawkins', 'Bhonagiri', 'Speck', 'Lovett', 'Alvarado', 'Powder', 'Warchol', 'Bashiardes', 'Sajan'] | ['Hawkins', 'Bhonagiri', 'Speck', 'Lovett', 'Alvarado', 'Powder', 'Warchol', 'Bashiardes', 'Sajan'] | ['Hawkins', 'Bhonagiri', 'Speck', 'Lovett', 'Alvarado', 'Powder', 'Warchol', 'Bashiardes', 'Sajan'] | PLoS One | 2007 | 6/13/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
401 | GSE6051 | 2/1/2007 | ['6051'] | [] | [u'17565378'] | 1888727 | [u'17565378'] | ['Hawkins', 'Bhonagiri', 'Speck', 'Lovett', 'Alvarado', 'Powder', 'Warchol', 'Bashiardes', 'Sajan'] | ['Hawkins', 'Bhonagiri', 'Speck', 'Lovett', 'Alvarado', 'Powder', 'Warchol', 'Bashiardes', 'Sajan'] | ['Hawkins', 'Bhonagiri', 'Speck', 'Lovett', 'Alvarado', 'Powder', 'Warchol', 'Bashiardes', 'Sajan'] | PLoS One | 2007 | 6/13/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
402 | GSE6052 | 2/1/2007 | ['6052'] | [] | [u'17565378'] | 1888727 | [u'17565378'] | ['Hawkins', 'Bhonagiri', 'Speck', 'Lovett', 'Alvarado', 'Powder', 'Warchol', 'Bashiardes', 'Sajan'] | ['Hawkins', 'Bhonagiri', 'Speck', 'Lovett', 'Alvarado', 'Powder', 'Warchol', 'Bashiardes', 'Sajan'] | ['Hawkins', 'Bhonagiri', 'Speck', 'Lovett', 'Alvarado', 'Powder', 'Warchol', 'Bashiardes', 'Sajan'] | PLoS One | 2007 | 6/13/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
403 | GSE6079 | 4/24/2007 | ['6079'] | [] | [u'17486137'] | 2673710 | [u'17486137'] | ['Jurisica', 'Brown', 'Emili', 'Cox', 'Kannan', 'Okubo', 'Frey', 'Kislinger', 'Rossant', 'Wigle', 'Hogan'] | ['Jurisica', 'Brown', 'Emili', 'Cox', 'Kannan', 'Okubo', 'Frey', 'Kislinger', 'Rossant', 'Wigle', 'Hogan'] | ['Jurisica', 'Brown', 'Emili', 'Cox', 'Okubo', 'Kannan', 'Frey', 'Kislinger', 'Rossant', 'Wigle', 'Hogan'] | Mol Syst Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
404 | GSE6090 | 4/24/2007 | ['6090'] | [] | [u'17496896'] | 2610483 | [u'19125200'] | ['McMichael', 'Schwartz', 'Baban', u'Batyka', 'Sharrocks', 'Hodges', 'Kessler', 'Edelmann', 'Moris', 'Davies', 'Drakesmith', 'Simmons'] | ['Battaglia', 'Rizzetto', 'Paola', 'Rocca-Serra', 'Beltrame', 'Gambineri', 'Cavalieri'] | [] | PLoS One | 2009 | 2009 | 1 | AND pmc_gds | 0 | 1 | ||||
405 | GSE6091 | 1/12/2007 | ['6091'] | [] | [u'17181398'] | 2268762 | [u'17181398'] | ['Komatsu', 'Semenza', 'Bosch-Marce', 'Hadjiargyrou'] | ['Komatsu', 'Semenza', 'Bosch-Marce', 'Hadjiargyrou'] | ['Komatsu', 'Semenza', 'Bosch-Marce', 'Hadjiargyrou'] | J Bone Miner Res | 2007 | 2007 Mar | 0 | AND pmc_gds | 1 | 0 | ||||
406 | GSE6096 | 1/8/2007 | ['6096'] | [] | [u'17142363'] | 1828658 | [u'17142363'] | ['Fong', 'Cohen', 'Hutchinson', 'Kao', 'Hu', 'Huang', 'Vroom'] | ['Fong', 'Cohen', 'Hutchinson', 'Kao', 'Hu', 'Huang', 'Vroom'] | ['Fong', 'Cohen', 'Hutchinson', 'Kao', 'Hu', 'Huang', 'Vroom'] | Appl Environ Microbiol | 2007 | 2007 Feb | 0 | AND pmc_gds | 1 | 0 | ||||
407 | GSE6097 | 1/3/2007 | ['6097'] | [] | [u'17085549'] | 1797412 | [u'17085549'] | ['Livny', 'Slamti', 'Waldor'] | ['Livny', 'Slamti', 'Waldor'] | ['Livny', 'Slamti', 'Waldor'] | J Bacteriol | 2007 | 2007 Jan | 0 | AND pmc_gds | 1 | 0 | ||||
408 | GSE6108 | 4/24/2007 | ['6108'] | [] | [u'17486137'] | 2673710 | [u'17486137'] | ['Jurisica', 'Brown', 'Emili', 'Cox', 'Kannan', 'Okubo', 'Frey', 'Kislinger', 'Rossant', 'Wigle', 'Hogan'] | ['Jurisica', 'Brown', 'Emili', 'Cox', 'Kannan', 'Okubo', 'Frey', 'Kislinger', 'Rossant', 'Wigle', 'Hogan'] | ['Jurisica', 'Brown', 'Emili', 'Cox', 'Okubo', 'Kannan', 'Frey', 'Kislinger', 'Rossant', 'Wigle', 'Hogan'] | Mol Syst Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
409 | GSE6110 | 8/30/2007 | ['6110'] | [] | [u'17670906'] | 2758575 | [u'17670906'] | ['Bridenbaugh', 'Sampson', 'Catania', 'Zawieja', 'Parrish', 'Vaidya', 'Bonventre', 'Akintola', 'Burghardt', 'Dearman', 'Chen'] | ['Bridenbaugh', 'Sampson', 'Catania', 'Zawieja', 'Parrish', 'Vaidya', 'Bonventre', 'Akintola', 'Burghardt', 'Dearman', 'Chen'] | ['Bridenbaugh', 'Sampson', 'Catania', 'Zawieja', 'Parrish', 'Bonventre', 'Akintola', 'Chen', 'Burghardt', 'Dearman', 'Vaidya'] | Am J Physiol Renal Physiol | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
410 | GSE6114 | 4/24/2007 | ['6114'] | [] | [u'17446861'] | 1868902 | [u'17446861'] | ['Durand-Dubief', 'Grunstein', 'Wright', 'Sinha', 'Bonilla', u'Fagerstr\xf6m-Billai', 'Fagerstr\xc3\xb6m-Billai', 'Ekwall'] | ['Durand-Dubief', 'Grunstein', 'Wright', 'Sinha', 'Bonilla', 'Fagerstr\xc3\xb6m-Billai', 'Ekwall'] | ['Durand-Dubief', 'Grunstein', 'Wright', 'Sinha', 'Bonilla', 'Fagerstr\xc3\xb6m-Billai', 'Ekwall'] | EMBO J | 2007 | 5/16/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
411 | GSE6128 | 8/20/2007 | ['6128'] | [] | [u'17663798'] | 2014778 | [u'17663798'] | ['Perou', 'Sartor', 'Carey', 'Troester', 'Sawyer', 'Bernard', 'Fan', 'Hoadley', 'Weigman', 'Rieger-House', 'He'] | ['Perou', 'Sartor', 'Carey', 'Troester', 'Sawyer', 'Bernard', 'Fan', 'Hoadley', 'Weigman', 'Rieger-House', 'He'] | ['Perou', 'Sartor', 'Carey', 'Troester', 'Sawyer', 'Bernard', 'Fan', 'Hoadley', 'Weigman', 'Rieger-House', 'He'] | BMC Genomics | 2007 | 7/31/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
412 | GSE6130 | 4/20/2007 | ['6130'] | [] | [u'17525107'] | 2822536 | [u'20158879'] | ['Parker', 'Ellis', 'Bernard', 'Perou', 'Quackenbush', 'Perreard', 'Bayer', 'Szabo', 'Mullins', 'Gauthier'] | ['Kirillov', 'Nikolsky', 'Bessarabova', 'Nikolskaya', 'Bugrim', 'Shi'] | [] | BMC Genomics | 2010 | 2/10/2010 | 0 | ion study, we confirmed the phenomenon of bimodality and the ability of bimodal genes to form co-expressed clusters using four datasets carried out on standard Affymetrix and Agilent array platforms: GSE1456 [ 29 ], GSE7390 [ 30 ], GSE4922 [ 31 ], and an Agilent data set (Table 1 ). The Agilent dataset was formed as a non-redundant set of 193 samples from four studies: GSE1992 [ 32 ], GSE2740 [ 33 ], |ression patterns. In all five datasets, bimodality was defined by τ = 2.64 and standard deviation over 25th percentile of the distribution. a Recognized genes for each platform. Sorlie295 GSE1456 GSE7390 GSE4922 Agilent set Platform cDNA Affymetrix Affymetrix Affymetrix Agilent Bimodal genes 2476 (10604 a ) 5075 (12017 a ) 5440 (12017 a ) 4874 (12017 a ) 4983 (13379 a ) First, we compared t|nes out of the array of 10604 genes [ 28 ]. Using these parameters, we calculated sets of bimodal genes using the validation datasets of 5075, 5440, 4872, and 4983 genes from the independent datasets GSE1456, GSE7390, GSE4922, and the Agilent data set respectively (Table 1 ). Figure 2 Bimodal genes . (A) Distribution of GRB7 expression among 295 patients (Sorlie295 dataset). The green line marks the t|#x02248;-1 and uGRB7 = 1. Binary intersections of the pairs of bimodal genes from different datasets are large and statistically significant (Table 2 ). The largest intersection was for the datasets GSE7390 and GSE1456 at 3587 common bimodal genes - 66% of all bimodal genes for GSE7390 and 70% of all bimodal genes for GSE1456. The datasets Sorlie295 and GSE4922 had the smallest intersection of 1121 co|ed to estimate p-values. 
SetA SetB All genes intersection Bimodal genes intersection Bimodal genes for set A a Bimodal genes for set B a p-value Agilent Sorlie295 9433 1237 3661 2219 8.81E-77 Agilent GSE1456 10301 1830 3961 4307 5.86E-13 Agilent GSE4922 10301 1799 3961 4099 2.14E-20 Agilent GSE7390 10301 1839 3961 4551 0.000154 Sorlie295 GSE1456 9367 1173 2223 3851 3.49E-37 Sorlie295 GSE4922 9367 1121 |al genes with synchronised expression in accordance with physiological conditions. Figure 4 Signal normalization for bimodal genes . (A) Expression profiles for genes FOXA1 and GATA3 in Sorlie295 and GSE1456 data sets before normalization. (B) Expression profiles for genes FOXA1 and GATA3 in Sorlie295 and GSE1456 data sets before normalization and after normalization. Signal normalization also helped t|ized expression of the same two genes, FOXA1 and GATA3, was compared between experiments run on two array platforms: cDNA array, Sorlie 295 [ 28 ] and Affymetrix (Affymetrix Human Genome U133A Array) GSE1456 [ 29 ]. The original expression profiles of the two genes had different intensity intervals (Figure 4A ), while the normalized expression values ranged between -1 and 1. (Figure 4B ). We generate In the validation study, we confirmed the phenomenon of bimodality and the ability of bimodal genes to form co-expressed clusters using four datasets carried out on standard Affymetrix and Agilent array platforms: GSE1456 [29], GSE7390 [30], GSE4922 [31], and an Agilent data set (Table 1). The Agilent dataset was formed as a non-redundant set of 193 samples from four studies: GSE1992 [32], GSE2740 [33], GSE2741 [34], and GSE6130{{key}}--REUSE-- [35]. | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
413 | GSE6151 | 1/22/2007 | ['6151'] | [] | [] | 2535664 | [u'18798691'] | [u'Dubos'] | ['Hazen', 'Michael', 'Priest', 'Chory', 'Kay', 'Mockler', 'Breton'] | [] | PLoS Biol | 2008 | 9/16/2008 | 1 | rmone mutants should preferentially affect genes that are normally expressed during the growth phase. To test this hypothesis, we analyzed the genes affected in selected phytohormone mutants by using publicly available Affymetrix microarray datasets in either the ArrayExpress or the Gene Expression Omnibus (GEO) databases ( Materials and Methods ). To identify the genes associated with the GA pathway, we | noise factor (δ). For mutant or condition comparisons, gcRMA normalized data were fit using a linear model in the R Bioconductor limma package with a p < 0.01 cutoff. Datasets were downloaded from the ArrayExpress or GEO Web site: AtGenExpress light treatments GSE5617 and tissue (7-d-old cotyledons, hypocotyls, and roots), E-TABM-17; shade avoidance (low R/FR), E-MEXP-443 [ 36 &|-344; DELLApenta ( ga1-3 gai-t6 rga-t2 rgl1-1 rgl2-1 ) mutant, E-MEXP-849 [ 24 ]; arf6-2 arf8-3 , GSE2848 [ 25 ]; ein5-1 , GPL198 [ 26 ]; abi1-1 , GSE6151{{tag}}--REUSE--; brx , E-MEXP-635&7 [ 27 ]. Whole datasets were downloaded from the respective Web sites, and all CEL files from a given experiment were normalized together, regardless of | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
414 | GSE6162 | 1/22/2007 | ['6162'] | [] | [u'15535861'] | 2765272 | [u'19546170'] | ['Twell', 'Honys'] | ['Zhang', 'Basler', 'Qeli', 'Grobei', 'Rehrauer', 'Grossniklaus', 'Brunner', 'Ahrens', 'Roschitzki'] | [] | Genome Res | 2009 | 2009 Oct | 0 | Honys and Twell 2004 ). Since the genome sequence and annotation have evolved significantly after the initial design of the probe sets for the ATH1 array (~2001), we reanalyzed these data sets using publicly available, stringently remapped Affymetrix probe sets to exclude probes that hybridize to multiple genome locations. Previous studies had shown that a high percentage of false hybridization results c|in Figure 4A as a dendrogram, and in Supplemental Figure S9 as a heatmap. Gene expression data sets Data from several previous transcriptomics studies on different stages of pollen development were downloaded and further processed: ArrayExpress accession no. E-MEXP-285 for the mature pollen data described in Pina et al. (2005) ; E-TABM-17 for the mature pollen data set from Schmid et al. (2005) ; Gene|inct gene models or protein sequences encoded by different gene models (class 3b), we remapped the Affymetrix probe sets against the TAIR7 database release using the custom CDF libraries (version 10) downloaded from the Brainarray microarray laboratory ( http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download_v10.asp ) and eliminated probe sets that could be mapped to multiple posi | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
415 | GSE6171 | 1/22/2007 | ['6171'] | [] | [] | 2729609 | [u'19638476'] | [u'Okamoto'] | ['Smirnoff', 'Jones', 'Mullineaux', 'Baker', 'Galvez-Valdivieso', 'Lawson', 'Davies', 'Asami', 'Truman', 'Slattery', 'Fryer'] | [] | Plant Cell | 2009 | 2009 Jul | 0 | L exposure and exogenous ABA application. A total of 816 genes were identified as representing a significant overlap between HL and ABA responses and were clustered with data from the Gene Expression Omnibus (GEO) and NASCARRAYS databases. The 30 min, 1 h, and 3 h ABA data come from NASCARRAYS-176, 4 h ABA from GSE7112 , 3 h ABA #2 from GSE6171{{tag}}--REUSE-- , and 3 h HL from GSE7743 . In this TREEVIEW repr|in HL-exposed ABA biosynthesis-deficient mutants. All mutants caused a significant reduction in the expression of the test genes compared with the wild-type controls ( Figure 2C ). A meta-analysis of publicly available microarray data for treatment of seedlings with ABA ( Goda et al., 2008 ) compared with data from HL-exposed seedlings ( Kleine et al., 2007 ; see Methods) revealed that 816 genes were core| genes was induced under both conditions, while 320 were suppressed in response to both treatments (see Supplemental Data Set 1 online). When expression data for these genes were clustered with other publicly available ABA treatment data ( Figure 2D ), a strong correlation was observed between 3 h of HL exposure and plants 3 or 4 h after ABA application at a variety of concentrations (uncentered correlati|iar ABA content varies ( Rossel et al., 2006 ). The primers used in this study for quantitative RT-PCR are given in Supplemental Table 2 online. Bioinformatics Data for Affymetrix ATH1 GeneChips were downloaded from the GEO repository ( http://www.ncbi.nlm.nih.gov/sites/entrez?db=gds ) and the NASCArrays database ( http://affymetrix.Arabidopsis.info/narrays/experimentbrowse.pl ). 
The HL exposure d|ed that contributed to the significant weighted similarity score were clustered together with other ABA treatment time points from the NASCARRAYS-176 data set and additional ABA treatment data sets ( GSE7112 and GSE6171{{tag}}--REUSE-- ). Hierarchical clustering was performed using CLUSTER ( Eisen et al., 1998 ) and visualized with the program TREEVIEW ( Eisen et al., 1998 ). Complete linkage clustering using an unc | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
416 | GSE6181 | 1/8/2007 | ['6181'] | [] | [] | 3016406 | [u'21143977'] | [u'Willats'] | ['Brochu', 'Grondin', 'Domingue', 'Lerat', 'Girard-Martel', 'Beaulieu', 'Duval', 'Beaudoin'] | [] | BMC Plant Biol | 2010 | 12/10/2010 | 0 | In order to compare gene expression data in IXBhab cells with those of TA(-)hab cells, raw microarray data (CEL file) from IXBhab cells available at GEO (GSE6181{{key}}--REUSE--) or NASC (NASCARRAYS-27) were analyzed using RMA and SAM with the Flexarray software. | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
417 | GSE6190 | 1/20/2007 | ['6190'] | [] | [u'17928405'] | 2096585 | [u'17928405'] | ['Pronk', 'Daran', 'Walsh', 'Tai', 'Daran-Lapujade'] | ['Pronk', 'Daran', 'Walsh', 'Tai', 'Daran-Lapujade'] | ['Pronk', 'Daran', 'Walsh', 'Daran-Lapujade', 'Tai'] | Mol Biol Cell | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
418 | GSE6192 | 1/31/2007 | ['6192'] | [] | [u'17292962'] | 2748096 | [u'19728865'] | ['Gorbe', 'Pocza', 'Falus', 'Jaeger', 'Toth', 'Gilicze', 'Wiener', u'Jeager', 'Kohalmi', 'Papp', 'Keszei'] | ['Makiguchi', 'Feng', 'Okuno', 'Tamon', 'Tsujimoto', 'Araki', 'Kunimoto', 'Niijima'] | [] | BMC Genomics | 2009 | 9/3/2009 | 1 | from stomach subregions (Additional file 4 -CSV-Gene expression profile of mast cells pooled from mouse stomach subregions) [ 20 ]. In the score-ordered results of query-1 (P-value < 0.01), GSE1827 (titled "Waldman Bladder tumors") was ranked in fourth. Moreover, the top 10 entries showed appropriate annotations related to tumors, inflammatory and immune responses (Table 2 ). For the query-2|ts, and seven entries among the top 10 were observed using rat liver samples (Table 3 ). The biological relationships among the top 10 results of query-3 (P-value < 0.02) were not clear, but GSE6192{{tag}}--REUSE-- (titled "Gene expression changes during murine mucosal mast cell in vitro differentiation") was found out in the twelfth rank (Table 4 ). These findings indicate the general applicability of GEM-T| search using the gene expression profile of human bladder cancer - top 10 entries sorted by similarity scores with lowest P-value < 0.01. 
Series Platform Description Similarity score P-value GSE6112 GPL4475 Tubercolosis and healthy infected patients PBMC_TB_vs_Pool_LTBI 1 0.0054 GSE3901 GPL3279 Response of quiescent human fibroblasts to different growth factors and serum 0.96 0.0004 GSE1726 GP|2567 Human breast tumor 0.854 0.0063 GSE60 GPL174 Diffuse large B-cell lymphoma 0.851 0.0029 GSE838 GPL564 Individual-specific variation of gene expression in peripheral blood leukocytes 0.845 0.0057 GSE3176 GPL1528 p53 In Inflamatory Stress Response 0.815 0.0001 GSE344 GPL273 Spotted long oligonucleotide arrays 0.813 0.0097 GSE7965 GPL3991 Blood and Adipose tissue samples 0.805 1.50E-03 The results w| the gene expression profile of rat chemical hepatocarcinogenesis - top 10 entries sorted by similarity scores with lowest P-value < 0.01. Series Platform Description Similarity score P-value GSE5337 GPL890 Gene Expression Profiling In Rat Smooth Muscle Cells Modulated by Rapamycin and Paclitaxel. 1 1.00E-04 GSE5860 GPL890 Gene expression analysis of rat livers after exposure to acetaminophen 0|en (APAP) Rat Liver Test Gene Expression Data Set 0.895 1.00E-04 GSE5381 GPL890 Gene expression analysis of liver and kidney following methapyrilene treatment in male Sprague-Dawley rats 0.685 0.0015 GSE791 GPL542 GH inj old liver (1-7) 0.657 5.00E-04 GSE4270 GPL890 Aging Induced Alterations in Hepatic Gene Expression of the Male Fisher Rat 0.637 0.0045 GSE3608 GPL3076 Renal medullary genes in salt-sen|ssion profile of mast cells pooled from mouse stomach subregions - top 20 entries sorted by similarity scores with lowest P-value < 0.02. Series Platform Description Similarity score P-value GSE3088 GPL2510 Expression profiling of Muscle tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null backgrounds 1 1.00E-04 GSE2814 GPL2510 Expression profiling of liver tissue from (C57BL/6J �|time course: regulation of uterine genes by estradiol in ovariectomized mice 0.874 5.00E-04 GSE8104 GPL5137 Primary macrophage response to L. 
monocytogenes and bacteria-derived ligands 0.855 1.70E-03 GSE8100 GPL5137 WT and myd88-/- macrophage response to WT and hly- L. monocytogenes 0.847 6.00E-04 GSE2220 GPL1832 Genetic variation of gene expression is tissue specific in inbred mice 0.795 1.00E-02 GSE4|stren Behaves as a Weak Estrogen Rather than a Non-genomic Selective Activator in the Mouse Uterus 0.792 0.0034 GSE7029 GPL2510 Zfp90 Transgenic Signature in Mouse White Adipose Tissue 0.788 2.50E-03 GSE7615 GPL2884 Cancer Process Study 0.752 0.0107 GSE7600 GPL2884 Atm-/-, mTerc-/-, p53-/- triple knock-out lymphoma vs normal mouse DNA (GPL2884) 0.752 1.13E-02 GSE3086 GPL2510 Expression profiling of Adi|ose tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null backgrounds 0.726 0.009 GSE6192{{tag}} GPL891 Gene expression changes during murine mucosal mast cell in vitro differentiation 0.709 1.76E-02 GSE4248 GPL891 Identification of genes regulated by RORg in mouse thymus 0.705 1.99E-02 GSE3087 GPL2510 Expression profiling of brain tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null background|nput Reduction 0.605 8.70E-03 GSE1013 GPL967 Gene Expression Profile of NHE1 Null Mutation 0.552 0.0134 GSE8625 GPL5530 Comparison of undifferentiated ES cell lines HM1, IMT11, SHBL6.3 0.488 1.82E-02 GSE8528 GPL5369 Expression analysis of gene differentially expressed in the developping ovary 0.455 0.0011 GSE3289 GPL2828 Chronic hypoxia alters the level, maturation and control of gene expression in mou | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
419 | GSE6195 | 4/26/2007 | ['6195'] | ['2753'] | [u'17483266'] | 2785812 | [u'19917117'] | ['Jayaraman', 'Wood', 'Lee', 'Bentley', 'Bansal'] | ['Kroll', 'Barkema', 'Carlon'] | [] | Algorithms Mol Biol | 2009 | 11/16/2009 | 1 | d to the local physical neighbors and to the nearest-neighbor free energy. Table 1 Overview over organisms and number of CEL-files analyzed Organism GEO # Chiptype (dimension) # files min A. Thaliana GSE4847 ATH1-121501 (712 × 712) 18 0.0259 GSE7642 ATH1-121501 (712 × 712) 12 0.0544 GSE9311 ATH1-121501 (712 × 712) 8 0.0546 C. Elegans GSE6547 Celegans (712 × 712) 25 0.036|00d7; 712) 7 0.0396 D. Melanogaster GSE3990 Drosophila_2 (732 × 732) 6 0.0620 GSE6558 DrosGenome1 (640 × 640) 24 0.0605 D. Rerio GSE4859 Zebrafish (712 × 712) 8 0.0357 E. Coli GSE11779 E_coli_2 (478 × 478) 3 0.0869 GSE2928 Ecoli (544 × 544) 12 0.0172 GSE6195{{tag}}--REUSE-- E_coli_2 (478 × 478) 4 0.0664 H. Sapiens GSE10433 HG-U133A_2 (732 × 732) 12 0.0757 GSE5054|-U133A (712 × 712) 14 0.0296 GSE8514 HG-U133_Plus_2 (1164 × 1164) 15 0.0738 GSE11897 MOE430A (712 × 712) 11 0.0640 MOE430B (712 × 712) Mouse430_2 (1002 × 1002) GSE6210 Mouse430_2 12 0.0594 GSE6297 Mouse430_2 24 0.0325 O. Sativa GSE15071 Rice (1164 × 1164) 20 0.1157 R. Norvegicus GSE4494 RG_U34A (534 × 534) 59 0.0488 GSE7493 Rat230_2 (834 ×|2 (834 × 834) 4 0.0640 S. Aureus GSE7944 S_aureus (602 × 602) 6 0.0746 S. Cerevisiae GSE6073 YG_S98 (534 × 534) 12 0.0283 GSE8379 YG_S98 (534 × 534) 8 0.0180 X. Laevis GSE3368 Xenopus_laevis (712 × 712) 20 0.0514 Table 2 Optimized parameter values as obtained from the minimization of Eq. (3). A. Thaliana C. Elegans D. Melanogaster D. Rerio GEO no GSE4847 GSE7642|d in this work. Figure 5 Computed background vs. low expressed genes data . Experimental results (solid black line) and theoretical prediction (dashed red line) of probeset a) 256610_at (A. Thaliana, GSE4847, GSM109107.CEL) and b) 175270_at (C. Elegans, GSE8159, GSM201995.CEL). 
Discussion We have presented a background subtraction scheme for Affymetrix GeneExpression arrays which is both, accurate and | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
420 | GSE6195 | 4/26/2007 | ['6195'] | ['2753'] | [u'17483266'] | 1932762 | [u'17483266'] | ['Jayaraman', 'Wood', 'Lee', 'Bentley', 'Bansal'] | ['Jayaraman', 'Wood', 'Lee', 'Bentley', 'Bansal'] | ['Jayaraman', 'Wood', 'Lee', 'Bentley', 'Bansal'] | Appl Environ Microbiol | 2007 | 2007 Jul | 0 | AND pmc_gds | 1 | 0 | ||||
421 | GSE6206 | 9/1/2007 | ['6206'] | ['3099'] | [u'19584263'] | 2994938 | [u'19584263'] | ['Liu', 'Knabb', 'Macleod', 'Spike'] | ['Liu', 'Knabb', 'Macleod', 'Spike'] | ['Liu', 'Knabb', 'Macleod', 'Spike'] | Mol Cancer Res | 2009 | 2009 Jul | 0 | AND pmc_gds | 1 | 0 | ||||
422 | GSE6209 | 12/18/2007 | ['6209'] | [] | [u'18657035'] | 2223452 | [u'18070897'] | ['Ghanny', 'Alvarez', 'Smith', 'Trevani', 'Pine', 'Soteropoulos', 'Cheng', 'Font\xc3\xa1n', 'Aris', u'Fontan'] | ['Font\xc3\xa1n', 'Aris', 'Soteropoulos', 'Smith', 'Ghanny'] | ['Font\xc3\xa1n', 'Aris', 'Soteropoulos', 'Smith', 'Ghanny'] | Infect Immun | 2008 | 2008 Feb | 0 | Data were deposited in the Gene Expression Omnibus repository under the accession number GSE6209{{tag}}--DEPOSIT--. | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
423 | GSE6210 | 1/1/2007 | ['6210'] | ['3197', '2515'] | [u'17141629'] | 2785812 | [u'19917117'] | ['', 'Huntgeburth', 'Barbatelli', 'Cinti', 'Lowell', 'Choi', 'Lin', 'Vianna', 'Shulman', 'Spiegelman', 'Kim', 'Krauss', 'Tzameli', 'Coppari'] | ['Kroll', 'Barkema', 'Carlon'] | [] | Algorithms Mol Biol | 2009 | 11/16/2009 | 1 | d to the local physical neighbors and to the nearest-neighbor free energy. Table 1 Overview over organisms and number of CEL-files analyzed Organism GEO # Chiptype (dimension) # files min A. Thaliana GSE4847 ATH1-121501 (712 × 712) 18 0.0259 GSE7642 ATH1-121501 (712 × 712) 12 0.0544 GSE9311 ATH1-121501 (712 × 712) 8 0.0546 C. Elegans GSE6547 Celegans (712 × 712) 25 0.036|00d7; 712) 7 0.0396 D. Melanogaster GSE3990 Drosophila_2 (732 × 732) 6 0.0620 GSE6558 DrosGenome1 (640 × 640) 24 0.0605 D. Rerio GSE4859 Zebrafish (712 × 712) 8 0.0357 E. Coli GSE11779 E_coli_2 (478 × 478) 3 0.0869 GSE2928 Ecoli (544 × 544) 12 0.0172 GSE6195 E_coli_2 (478 × 478) 4 0.0664 H. Sapiens GSE10433 HG-U133A_2 (732 × 732) 12 0.0757 GSE5054|-U133A (712 × 712) 14 0.0296 GSE8514 HG-U133_Plus_2 (1164 × 1164) 15 0.0738 GSE11897 MOE430A (712 × 712) 11 0.0640 MOE430B (712 × 712) Mouse430_2 (1002 × 1002) GSE6210{{tag}}--REUSE-- Mouse430_2 12 0.0594 GSE6297 Mouse430_2 24 0.0325 O. Sativa GSE15071 Rice (1164 × 1164) 20 0.1157 R. Norvegicus GSE4494 RG_U34A (534 × 534) 59 0.0488 GSE7493 Rat230_2 (834 ×|2 (834 × 834) 4 0.0640 S. Aureus GSE7944 S_aureus (602 × 602) 6 0.0746 S. Cerevisiae GSE6073 YG_S98 (534 × 534) 12 0.0283 GSE8379 YG_S98 (534 × 534) 8 0.0180 X. Laevis GSE3368 Xenopus_laevis (712 × 712) 20 0.0514 Table 2 Optimized parameter values as obtained from the minimization of Eq. (3). A. Thaliana C. Elegans D. Melanogaster D. Rerio GEO no GSE4847 GSE7642|d in this work. Figure 5 Computed background vs. low expressed genes data . Experimental results (solid black line) and theoretical prediction (dashed red line) of probeset a) 256610_at (A. 
Thaliana, GSE4847, GSM109107.CEL) and b) 175270_at (C. Elegans, GSE8159, GSM201995.CEL). Discussion We have presented a background subtraction scheme for Affymetrix GeneExpression arrays which is both, accurate and | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
424 | GSE6210 | 1/1/2007 | ['6210'] | ['3197', '2515'] | [u'17141629'] | 1764615 | [u'17141629'] | ['', 'Huntgeburth', 'Barbatelli', 'Cinti', 'Lowell', 'Choi', 'Lin', 'Vianna', 'Shulman', 'Spiegelman', 'Kim', 'Krauss', 'Tzameli', 'Coppari'] | ['Huntgeburth', 'Barbatelli', 'Cinti', 'Lowell', 'Choi', 'Lin', 'Vianna', 'Shulman', 'Spiegelman', 'Kim', 'Krauss', 'Tzameli', 'Coppari'] | ['Huntgeburth', 'Barbatelli', 'Shulman', 'Lowell', 'Choi', 'Lin', 'Vianna', 'Spiegelman', 'Cinti', 'Kim', 'Krauss', 'Tzameli', 'Coppari'] | Cell Metab | 2006 | 2006 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
425 | GSE6224 | 12/30/2007 | ['6224'] | [] | [u'17440621'] | 1849891 | [u'17440621'] | [u'D.', u'J.H.', u'M.', u'C.', 'Ekwall', 'Werler', 'Carr', 'Walfridsson', 'Boukaba', 'Murzina', 'Trewick', 'Laue', 'Bonilla', 'Cauwood', 'Allshire', 'Kouzarides', 'Opel', 'Lando'] | ['Ekwall', 'Werler', 'Carr', 'Walfridsson', 'Boukaba', 'Murzina', 'Trewick', 'Laue', 'Bonilla', 'Cauwood', 'Allshire', 'Kouzarides', 'Opel', 'Lando'] | ['Ekwall', 'Carr', 'Walfridsson', 'Boukaba', 'Murzina', 'Trewick', 'Laue', 'Bonilla', 'Werler', 'Allshire', 'Kouzarides', 'Cauwood', 'Opel', 'Lando'] | PLoS One | 2007 | 4/18/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
426 | GSE6226 | 1/9/2007 | ['6226'] | [] | [u'17319742'] | 1803011 | [u'17319742'] | ['Chun', 'Liu', 'Madhani'] | ['Chun', 'Liu', 'Madhani'] | ['Chun', 'Liu', 'Madhani'] | PLoS Pathog | 2007 | 2007 Feb | 0 | AND pmc_gds | 1 | 0 | ||||
427 | GSE6231 | 6/22/2007 | ['6231'] | [] | [u'17449658'] | 1951522 | [u'17449658'] | ['Gomes', 'Georg'] | ['Gomes', 'Georg'] | ['Gomes', 'Georg'] | Eukaryot Cell | 2007 | 2007 Jun | 0 | AND pmc_gds | 1 | 0 | ||||
428 | GSE6236 | 3/26/2007 | ['6236'] | ['2656', '2655'] | [u'17405831'] | 2975422 | [u'21047384'] | ['Lee', 'Josleyn', 'Miller', 'Gherman', 'Cam', 'Danner', 'Goh'] | ['Xu', 'Hu'] | [] | BMC Genomics | 2010 | 11/2/2010 | 0 | essed genes usually located in the sparse boundary region. Biases of popular algorithms for identifying differentially expressed genes are visually characterized. Experiments on 17 datasets from Gene Omnibus Database (GEO) with experimentally verified differentially expressed genes showed that DB pruning can significantly improve the prediction accuracy of popular identification algorithms such as t-test|dary region in the 2-D feature space of average gene expression (AG) versus average difference of gene expression (AD). Fig. 1 . shows the distribution of true DEGs in the 2D space for four datasets: GSE9499, GSE6342, GSE6740_1, and GSE6740_2 from GEO database [ 22 ]. Based on the fact that boundary region is characterized with scarcity of genes, a density based pruning algorithm is proposed here for p|s as collected by Kadota et. al. [ 20 ]. They collected 38 microarray datasets with experimentally determined true DEGS by real-time polymerase chain reaction (RT-PCR). Thirty six of the datasets are downloaded from GEO database [ 22 ]. Without losing generality, we experimented with 17 disease or dose response datasets of Homo sapiens out of the 36 GEO datasets (Table 1 ). The 17 datasets are reported j|how that real-world GEO datasets, especially historical ones, tend to have small sample size. Table 1 17 Datasets with 284 DEGs in total. Each dataset has 22833 genes. 
Dataset Conditions True DEG A B GSE1462 4 4 4 GSE1615_1 4 5 8 GSE1650 18 12 8 GSE2666_2 5 5 6 GSE3524 16 4 4 GSE3860 9 9 8 GSE4917 3 3 5 GSE5667_1 5 6 3 GSE6236{{tag}}--REUSE-- 14 14 7 GSE6344 10 10 19 GSE6740_1 10 10 40 GSE6740_2 10 10 62 GSE7146 6 6 6|GSE8441 11 11 9 GSE9499 15 7 77 GSE9574 15 14 5 The 17 Datasets used here cover a variety of biological or medical studies: GSE1462 (mitochondrial DNA mutations), GSE1615_1 (Valproic acid treatment), GSE1650 (chronic obstructive pulmonary disease), GSE2666_2(bone marrow Rho level effect), GSE3524 (tumor of epithelial tissue), GSE3860 (Hutchinson-Gilford progeria syndrome), GSE4917 (breast cancer), GSE5|67_1 (atopic dermatitis), GSE6236{{tag}}--REUSE-- (Adult vs. fetal reticulocyte transcriptome comparison), GSE6344 (renal cell carcinoma disease), GSE6740_1 (HIV-infection), GSE6740_2 (HIV-infection, disease state), GSE7146 (hyperinsulinaemic, does response), GSE7765 (dose response, DMSO or 100 nM Dioxin), GSE8441 (dietary intake response), GSE9574 (breast cancer), and GSE9499 (hypomorphic germline mutations). The div|ch as WAD and FC. To illustrate the bias of popular DEG identification algorithms, Fig. 2 shows true positive (TP), true negative (TN), false positive (FP), and false negative (FN) DEGs for dataset GSE9499 which has 77 true DEGs. Fig. 2 . (a) shows that fold change (FC) misses most true DEGs (FN genes), which are located in the region below the threshold average difference and with high expression le|obtaining a user-specified number of candidate DEGs. Table 2 Comparison of No. of missing true DEGs after DB pruning. ( N 0 = 4, R 0 = 0.0017) Total Gene: 22283 After DP-pruning True DEG DP missed GSE1462 2054 4 0 GSE1615_1 2449 8 3 GSE1650 1317 8 2 GSE2666_2 1618 6 2 GSE3524 814 4 0 GSE3860 2073 8 0 GSE4917 785 5 1 GSE5667_1 1316 3 0 GSE6236{{tag}}--REUSE-- 2231 7 0 GSE6344 3127 19 0 GSE6740_1 1183 40 1 GSE6740_2 |DEG to have high expression levels and high expression difference. 
Table 3 Ranks of true DEGs in original gene list and pruned gene list. Genes are sorted by four DEG identification algorithms on the GSE1577 dataset. Increase of ranks of true DEGs means that DB pruning have correctly filtered out many non-DEGs. t-test/tTest’ 1404/808 7/6 1321/768 3800/1713 4741/1975 3633/1659 4145/1828 606/388 |on level remains the same. And the average difference of expression can be defined as sum of average difference among pairwise comparisons. Our DB pruning is implemented using C++ and Perl and can be downloaded from http://mleg.cse.sc.edu/degprune . Authors’ contributions JH initiated the project, proposed the DB pruning idea, helped in experimental designs, and wrote the manuscript. J. X. develo | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
429 | GSE6238 | 1/24/2007 | ['6238'] | ['2654'] | [u'15960800'] | 1933513 | [u'17609390'] | ['Carter', 'Fuchs', 'Swaroop', 'Greenhall', 'Yoshida', 'Helton', 'Barlow', 'Lockhart'] | ['Schork', 'C\xc3\xa1ceres', 'Zapala', 'Greenhall', 'Libiger', 'Barlow', 'Lockhart'] | ['Greenhall', 'Barlow', 'Lockhart'] | Genome Res | 2007 | 2007 Aug | 0 | sion data from a study on inflammatory bowel disease were analyzed ( Burczynski et al. 2006 ). Data for peripheral blood samples on Affymetrix HG-U133A arrays were obtained from GEO accession number GSE3365 ( http://www.ncbi.nlm.nih.gov/projects/geo/ ) and directly from the investigators. The aim of the study was to identify gene expression signatures from peripheral blood mononuclear cells that coul|etions or insertions and genes with different splice forms ( Winzeler et al. 1998 ; Hu et al. 2001 ; Li and Wong 2001 ). We further illustrated the additional information that can be generated with publicly available data files that contain detailed clinical or phenotypic information. Using GeSNP, we identified several well-known inflammatory bowel disease candidate genes and many new, promising candida|act Results Discussion Methods References Methods Computer software The algorithm was written in standard ANSI C++ and compiled to run on UNIX. The extensively commented source code is available for download from Supplemental materials at the Genome Research Web site and the GeSNP Web site, http://porifera.ucsd.edu/~cabney/cgi-bin/geSNP.cgi . In addition, the GeSNP Web site hosts a user-friendly Web-b|within a probe set, a greater t -value, and/or multiple probe sets representing a single gene provide increased confidence that a true sequence difference exists for that gene. Annotation files were downloaded from Affymetrix ( http://www.affymetrix.com/analysis/index.affx ). 
Additional information on candidate genes was obtained from NCBI’s Entrez ( http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?|Standard protocols were used for the generation of cDNA from RNA. Primers were designed to amplify the regions defined by the Affymetrix probe set target sequences of the selected genes, which can be downloaded from the Affymetrix Analysis Center Web site. Standard PCR reactions were performed on an Applied Biosystems GeneAmp PCR System 9700, and PCR products were purified using the recommended procedures|e accessed at http://porifera.ucsd.edu/~cabney/cgi-bin/geSNP.cgi . The Affymetrix CEL files for the mouse studies and the human/chimpanzee array data have been submitted to GEO under accession nos. GSE6238{{tag}}--DEPOSIT-- and GSE7540 , respectively.] Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.6307307 Other Sections▼ Abstract Results Discuss | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
430 | GSE6245 | 2/1/2007 | ['6245'] | [] | [u'17360459'] | 1805699 | [u'17360459'] | ['Dominska', 'Buck', 'Petes', 'Lieb', 'Mieczkowski'] | ['Dominska', 'Buck', 'Petes', 'Lieb', 'Mieczkowski'] | ['Dominska', 'Buck', 'Petes', 'Lieb', 'Mieczkowski'] | Proc Natl Acad Sci U S A | 2007 | 3/6/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
431 | GSE6256 | 9/12/2007 | ['6256'] | [] | [u'17913887'] | 2000452 | [u'17913887'] | ['Schwegmann', 'Hide', 'Brombacher', 'Horsnell', 'Seoighe', 'Ryan', u'Seioghe', 'Kottmann', 'Flemming', 'Guler', 'Cutler', 'Arendse', 'Leitges'] | ['Schwegmann', 'Hide', 'Brombacher', 'Horsnell', 'Seoighe', 'Ryan', 'Kottmann', 'Flemming', 'Guler', 'Cutler', 'Arendse', 'Leitges'] | ['Schwegmann', 'Hide', 'Brombacher', 'Horsnell', 'Seoighe', 'Ryan', 'Kottmann', 'Flemming', 'Guler', 'Cutler', 'Arendse', 'Leitges'] | Proc Natl Acad Sci U S A | 2007 | 10/9/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
432 | GSE6262 | 10/8/2007 | ['6262'] | [] | [u'17584934'] | 2394771 | [u'17584934'] | ['Roydasgupta', 'Gajduskova', 'Snijders', 'Pinkel', 'Fridlyand', 'Kwek', 'Tokuyasu', 'Albertson'] | ['Roydasgupta', 'Gajduskova', 'Snijders', 'Pinkel', 'Fridlyand', 'Kwek', 'Tokuyasu', 'Albertson'] | ['Roydasgupta', 'Gajduskova', 'Snijders', 'Pinkel', 'Fridlyand', 'Kwek', 'Tokuyasu', 'Albertson'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
433 | GSE6267 | 6/7/2007 | ['6267'] | [] | [u'17764504'] | 1892921 | [u'17287356'] | ['Nettleton', 'Zhou', 'Zhang', 'Scanlon', 'Borsuk', 'Buckner', 'Smith', 'Timmermans', 'Beck', 'Ohtsu', 'Janick-Buckner', 'Chen', 'Emrich', 'Schnable'] | ['Vemuri', 'Eiteman', 'McEwen', 'Olsson', 'Nielsen'] | [] | Proc Natl Acad Sci U S A | 2007 | 2/13/2007 | 0 | AND pmc_gds | 0 | 1 | ||||
434 | GSE6267 | 6/7/2007 | ['6267'] | [] | [u'17764504'] | 2156186 | [u'17764504'] | ['Nettleton', 'Zhou', 'Zhang', 'Scanlon', 'Borsuk', 'Buckner', 'Smith', 'Timmermans', 'Beck', 'Ohtsu', 'Janick-Buckner', 'Chen', 'Emrich', 'Schnable'] | ['Nettleton', 'Zhou', 'Zhang', 'Scanlon', 'Borsuk', 'Buckner', 'Smith', 'Timmermans', 'Beck', 'Ohtsu', 'Janick-Buckner', 'Chen', 'Emrich', 'Schnable'] | ['Nettleton', 'Zhou', 'Zhang', 'Scanlon', 'Borsuk', 'Buckner', 'Smith', 'Timmermans', 'Beck', 'Ohtsu', 'Janick-Buckner', 'Chen', 'Emrich', 'Schnable'] | Plant J | 2007 | 2007 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
435 | GSE6269 | 3/16/2007 | ['6269'] | [] | [u'17105821'] | 2620272 | [u'19014681'] | ['', 'Ardura', 'Chung', 'Mejias', 'Palucka', 'Allman', 'Wittkowski', 'Glaser', 'Ramilo', 'Chaussabel', 'Banchereau', 'Piqueras'] | ['Tozeren', 'Gormley'] | [] | BMC Bioinformatics | 2008 | 11/17/2008 | 1 | luate switch-like gene expression patterns in high-dimensional datasets profiling diverse biological conditions. For this purpose, we compiled two large-scale gene expression microarray datasets from publicly available data repositories. The first dataset included samples spanning nineteen different tissue types from healthy donors. The second dataset included samples from donors with one of a number of i| genes may serve as candidate biomarkers or therapeutic targets in tissue-specific diseases. Table 1 Microarray datasets used in this study Tissue Phenotype Data Tissue No. of Samples Gene Expression Omnibus/Array Express Accn. # Adipose 10 GSE3526 Adrenal 20 GSE3526, GSE8514, GSE2316 Brain 89 GSE3526, GSE7621, GSE7307, GSE2361, E_AFMX-11, E-TABM-20, Colon 10 E-TABM-176, GSE8671, GSE9254, GSE9452 Epiderm|133, GSE2361, GSE3419, GSE3526, GSE7307 Heart 38 E_AFMX-11, E-MIMR-27, GSE1133, GSE2240, GSE2361, GSE3526, GSE3585, GSE7307 Kidney 10 E_AFMX-11, GSE2004, GSE2361, GSE3526, GSE7392 Liver 10 E_AFMX-11, GSE2004, GSE3526, GSE6764 Lung 26 E-MEXP-231, GSE10072, GSE1133, GSE2361, GSE3526 Mammary 15 E-TABM-66, GSE2361, GSE3526, GSE7307, GSE7904 Muscle 64 GSE10760, GSE2328, GSE3526, GSE5110, GSE6798, GSE7307, G| Ovary 10 GSE2361, GSE3526, GSE6008, GSE7307 Pancreas 6 GSE1133, GSE2361, GSE7307 Peripheral blood 12 GSE7462, GSE8608, GSE8668, GSE8762, GSE9692 Small intestine 7 GSE2361, GSE7307 Spleen 12 GSE2004, GSE2361, GSE3526, GSE7307 Stomach 10 GSE2361, GSE3526, GSE7307 Testis 38 E_AFMX-11, GSE1133, GSE2361, GSE3218, GSE3526, GSE7307, GSE7808 Thymus 5 GSE1133, GSE2361, GSE7307 Infectious Disease Disease No. of|on regions of DNA that code for switch-like genes and their promoter regions. 
Methods Datasets Microarray datasets used in this study were compiled from the online public repositories Gene Expression Omnibus (GEO) [ 53 ] and Array Express (AE) [ 54 ] as described in additional file 2 . All datasets were profiled on the HGU133A or its recently expanded version, the HGU133plus2 Affymetrix platforms. The da|ssi A Lee C Relative impact of nucleotide and copy number variation on gene expression phenotypes Science 2007 315 848 853 17289997 10.1126/science.1136678 Edgar R Domrachev M Lash AE Gene Expression Omnibus: NCBI gene expression and hybridization array data repository Nucleic acids research 2002 30 207 210 11752295 10.1093/nar/30.1.207 Parkinson H Kapushesky M Shojatalab M Abeygunawardena N Coulson R Fa | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
436 | GSE6269 | 3/16/2007 | ['6269'] | [] | [u'17105821'] | 2797819 | [u'19961616'] | ['', 'Ardura', 'Chung', 'Mejias', 'Palucka', 'Allman', 'Wittkowski', 'Glaser', 'Ramilo', 'Chaussabel', 'Banchereau', 'Piqueras'] | ['Geman', 'Price', 'Edelman', 'Toia', 'Zhang'] | [] | BMC Genomics | 2009 | 12/5/2009 | 0 | s Classification Task Tissue Source Samples (Positive/Negative) GEO ID # Probes GI Stromal Tumor vs Leiomyosarcoma GI Biopsy 68 (37/31) N/A 43,931 Crohn's Disease vs Healthy Controls PBMC 101 (59/42) GDS1615 22,283 Ischemic vs Idiopathic Cardiomyopathy Cardiac Biopsy 194 (86/108) GSE5406 22,283 Type I Diabetes vs Healthy Controls PBMC 105 (81/24) GSE9006 22,283 Type II Diabetes vs Healthy Controls PBMC| Ulcerative Colitis W/WO Transformation Colon Biopsy 54 (11/43) GSE3629 54,681 Gram-Negative vs Gram-Positive Infection PBMC 73 (29/44) GSE6269{{tag}}--REUSE-- 22,283 Gram-Negative vs Viral Infection PBMC 62 (18/44) GSE6269{{tag}}--REUSE-- 22,283 HIV Infection vs Healthy Controls PBMC 86 (74/12) GDS1449 8793 Microarray gene expression datasets obtained from the Gene Expression Omnibus. Transcriptional analysis was performed on either | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
437 | GSE6269 | 3/16/2007 | ['6269'] | [] | [u'17105821'] | 2911917 | [u'20576155'] | ['', 'Ardura', 'Chung', 'Mejias', 'Palucka', 'Allman', 'Wittkowski', 'Glaser', 'Ramilo', 'Chaussabel', 'Banchereau', 'Piqueras'] | ['Griffin', 'Layh-Schmitt', 'Hinze', 'Barnes', 'Thornton', 'Grom', 'Colbert', 'Glass', 'Aronow', 'Fall', 'Thompson', 'Mo'] | [] | Arthritis Res Ther | 2010 | 2010 | 0 | AND pmc_gds | 0 | 1 | ||||
438 | GSE6269 | 3/16/2007 | ['6269'] | [] | [u'17105821'] | 1801073 | [u'17105821'] | ['', 'Ardura', 'Chung', 'Mejias', 'Palucka', 'Allman', 'Wittkowski', 'Glaser', 'Ramilo', 'Chaussabel', 'Banchereau', 'Piqueras'] | ['Ardura', 'Chung', 'Mejias', 'Palucka', 'Allman', 'Wittkowski', 'Glaser', 'Ramilo', 'Chaussabel', 'Banchereau', 'Piqueras'] | ['Ardura', 'Chung', 'Mejias', 'Palucka', 'Allman', 'Wittkowski', 'Glaser', 'Banchereau', 'Chaussabel', 'Ramilo', 'Piqueras'] | Blood | 2007 | 3/1/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
439 | GSE6271 | 7/4/2007 | ['6271'] | [] | [] | 2262072 | [u'18261244'] | [u'Little', u'Taylor', u'Grimmond', u'Ovchinnikov', u'Woods', u'Ricardo', u'Rae', u'Sasmono', u'Hume', u'Campanale'] | ['Pompon', 'Juvan', 'Aggerbeck', 'Rezen', 'Roth', 'Rozman', 'Fon', 'Kuzman', 'Meyer'] | [] | BMC Genomics | 2008 | 2/11/2008 | 0 | AND pmc_gds | 0 | 1 | ||||
440 | GSE6273 | 1/1/2007 | ['6273'] | [] | [u'17277777', u'18927605'] | 2556089 | [u'18927605'] | ['Barrera', u'Calcar', 'Hawkins', 'Van', 'Fu', 'Heintzman', 'Hon', 'Weng', 'Ren', 'Ching', 'Green', 'Stuart', 'Qu', 'Crawford', 'Wang'] | ['Hon', 'Ren', 'Wang'] | ['Hon', 'Ren', 'Wang'] | PLoS Comput Biol | 2008 | 2008 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
441 | GSE6274 | 3/1/2007 | ['6274'] | [] | [u'17242211'] | 1820508 | [u'17242211'] | ['Bergstrom', 'Yao', 'Cao', 'Tanaka', 'Tapscott', 'Kooperberg'] | ['Bergstrom', 'Yao', 'Cao', 'Tanaka', 'Tapscott', 'Kooperberg'] | ['Bergstrom', 'Yao', 'Cao', 'Tanaka', 'Tapscott', 'Kooperberg'] | Mol Cell Biol | 2007 | 2007 Mar | 0 | AND pmc_gds | 1 | 0 | ||||
442 | GSE6289 | 12/29/2007 | ['6289'] | [] | [u'17337545'] | 1892849 | [u'17337545'] | ['Borden', u'Paredes', 'Papoutsakis'] | ['Borden', 'Papoutsakis'] | ['Borden', 'Papoutsakis'] | Appl Environ Microbiol | 2007 | 2007 May | 0 | AND pmc_gds | 1 | 0 | ||||
443 | GSE6291 | 1/26/2007 | ['6291'] | [] | [u'17227908'] | 2118428 | [u'17227908'] | ['Burns', 'Jiang', 'Blazar', 'Frommer', 'Weissman', 'Nelson-Holte', 'Heremans', u'Hu', 'Tolar', 'Panoskaltsis-Mortari', 'Bryder', u'Montoya', 'Pelacho', 'Verfaillie', 'Serafini', 'Oki', 'Buckley', 'Dylla', 'Fine', 'Rossi', "O'Shaughnessy"] | ['Burns', 'Jiang', 'Blazar', 'Frommer', 'Weissman', 'Nelson-Holte', 'Heremans', 'Tolar', 'Panoskaltsis-Mortari', 'Bryder', 'Pelacho', 'Verfaillie', 'Serafini', 'Oki', 'Buckley', 'Dylla', 'Fine', 'Rossi', "O'Shaughnessy"] | ['Bryder', 'Burns', 'Jiang', 'Frommer', 'Nelson-Holte', 'Blazar', 'Weissman', 'Tolar', 'Panoskaltsis-Mortari', 'Heremans', 'Verfaillie', 'Pelacho', 'Serafini', 'Oki', 'Buckley', 'Dylla', 'Fine', 'Rossi', "O'Shaughnessy"] | J Exp Med | 2007 | 1/22/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
444 | GSE6292 | 1/1/2007 | ['6292'] | [] | [] | 2884167 | [u'20423484'] | [u'Calcar', u'Barrera', u'Hawkins', u'Rosenfeld', u'Wang', u'Heintzman', u'Hon', u'Ye', u'Luna', u'Glass', u'Ching', u'Ren', u'Kim', u'Stuart', u'Webster', u'Harp'] | ['Stadler', 'Binder', 'Fasold'] | [] | BMC Bioinformatics | 2010 | 4/27/2010 | 0 | ] and gcRMA [ 4 ], dChip [ 5 ], MAS5 [ 6 ], Plier [ 7 ]; see, e.g., ref [ 8 ] for a mini-review). Figure 1 (a) Fluorescence image of a hybridized Affymetrix GeneChip Mouse Genome MG430 2.0 array (GEO GSE12545) . The chip surface divides into a grid of nearly 10 6 fluorescing probe spots. The bright horizontal stripe matches with probes the 25meric sequence of which starts with triple degenerated guani|del of rank r = 3 to the intensity data of the three data sets given in Table 2 . Table 2 Chip characteristics of selected data sets studied. Data Set HG133A_S Mouse ENCODE HG133P_Z HG133A_Z GEO a GSE1133 GSE12545 GSE6292{{tag}}--REUSE-- GSE3061 GSE3061 Chip type HG U133A MG 430 2.0 Human Tiling HG U133plus2 HG U133A # probes × 10 6 b ≈ 0.5 ≈ 1.0 ≈ 1.5 ≈ 1.2 ≈ 0.5 # |42.8 ⟨ log L N ⟩ chip e 2.0 2.3 1.1 1.94 2.09 log I max e 4.48 4.71 3.45 4.32 4.45 %( GGG ) 1 probes f 2% 1.9% 2% 2% 2% %( GGG ) 1 probesets f 20% 19% - 20% 20% a Gene expression omnibus (GEO) accession number b number of probes and of probesets per array c pseudo sets are assembled using five consecutive probes d percentage of absent probes per array e mean value of the logged n | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
445 | GSE6292 | 1/1/2007 | ['6292'] | [] | [] | 2802188 | [u'20008927'] | [u'Calcar', u'Barrera', u'Hawkins', u'Rosenfeld', u'Wang', u'Heintzman', u'Hon', u'Ye', u'Luna', u'Glass', u'Ching', u'Ren', u'Kim', u'Stuart', u'Webster', u'Harp'] | ['Kraus', 'Krishnakumar', 'Yang', 'Frizzell', 'Gamble'] | [] | Genes Dev | 2010 | 1/1/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
446 | GSE6300 | 11/14/2007 | ['6300'] | [] | [u'17553483'] | 2140239 | [u'17553483'] | ['Firth', 'Baker'] | ['Firth', 'Baker'] | ['Firth', 'Baker'] | Dev Biol | 2007 | 7/15/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
447 | GSE6302 | 2/28/2007 | ['6302'] | [] | [u'17327914'] | 1803021 | [u'17327914'] | ['Ihmels', 'Levy', 'Weinberger', 'Carmi', 'Friedlander', 'Barkai'] | ['Ihmels', 'Levy', 'Weinberger', 'Carmi', 'Friedlander', 'Barkai'] | ['Ihmels', 'Levy', 'Weinberger', 'Carmi', 'Friedlander', 'Barkai'] | PLoS One | 2007 | 2/28/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
448 | GSE6313 | 1/1/2007 | ['6313'] | [] | [u'17555589'] | 1899500 | [u'17555589'] | ['Punzo', 'Jenssen', 'Ohno-Machado', 'Cepko', 'Kuo', 'Liu', 'Hovig', 'Trimarchi'] | ['Punzo', 'Jenssen', 'Ohno-Machado', 'Cepko', 'Kuo', 'Liu', 'Hovig', 'Trimarchi'] | ['Punzo', 'Jenssen', 'Ohno-Machado', 'Cepko', 'Kuo', 'Liu', 'Hovig', 'Trimarchi'] | BMC Genomics | 2007 | 6/7/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
449 | GSE6318 | 3/1/2007 | ['6318'] | [] | [u'17684006'] | 1913252 | [u'17684006'] | ['Rossignol', 'Logue', 'Grenon', 'Butler', 'Lowndes', 'Reynolds'] | ['Rossignol', 'Logue', 'Grenon', 'Butler', 'Lowndes', 'Reynolds'] | ['Rossignol', 'Logue', 'Grenon', 'Butler', 'Lowndes', 'Reynolds'] | Antimicrob Agents Chemother | 2007 | 2007 Jul | 0 | AND pmc_gds | 1 | 0 | ||||
450 | GSE6319 | 8/1/2007 | ['6319'] | [] | [u'17664279'] | 2099221 | [u'17664279'] | ['Jin', 'Rodriguez', 'Wyrick', 'Stanton', 'Kitazono'] | ['Jin', 'Rodriguez', 'Wyrick', 'Stanton', 'Kitazono'] | ['Jin', 'Rodriguez', 'Wyrick', 'Stanton', 'Kitazono'] | Mol Cell Biol | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
451 | GSE6325 | 3/31/2007 | ['6325'] | ['2722'] | [u'17394671'] | 1851953 | [u'17394671'] | ['Wilson', 'Condamine', 'Ismail', 'Close', 'Walia', 'Cui', 'Xu'] | ['Wilson', 'Condamine', 'Ismail', 'Close', 'Walia', 'Cui', 'Xu'] | ['Wilson', 'Condamine', 'Ismail', 'Close', 'Walia', 'Cui', 'Xu'] | BMC Genomics | 2007 | 3/30/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
452 | GSE6325 | 3/31/2007 | ['6325'] | ['2722'] | [u'17394671'] | 2739230 | [u'19706179'] | ['Wilson', 'Condamine', 'Ismail', 'Close', 'Walia', 'Cui', 'Xu'] | ['Close', 'Wilson', 'Ismail', 'Cui', 'Walia'] | ['Close', 'Wilson', 'Ismail', 'Cui', 'Walia'] | BMC Genomics | 2009 | 8/25/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
453 | GSE6326 | 8/1/2007 | ['6326'] | [] | [u'17664279'] | 2099221 | [u'17664279'] | ['Jin', 'Rodriguez', 'Wyrick', 'Stanton', 'Kitazono'] | ['Jin', 'Rodriguez', 'Wyrick', 'Stanton', 'Kitazono'] | ['Jin', 'Rodriguez', 'Wyrick', 'Stanton', 'Kitazono'] | Mol Cell Biol | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
454 | GSE6327 | 8/1/2007 | ['6327'] | [] | [u'17664279'] | 2099221 | [u'17664279'] | ['Jin', 'Rodriguez', 'Wyrick', 'Stanton', 'Kitazono'] | ['Jin', 'Rodriguez', 'Wyrick', 'Stanton', 'Kitazono'] | ['Jin', 'Rodriguez', 'Wyrick', 'Stanton', 'Kitazono'] | Mol Cell Biol | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
455 | GSE6328 | 8/1/2007 | ['6328'] | [] | [u'17664279'] | 2099221 | [u'17664279'] | ['Jin', 'Rodriguez', 'Wyrick', 'Stanton', 'Kitazono'] | ['Jin', 'Rodriguez', 'Wyrick', 'Stanton', 'Kitazono'] | ['Jin', 'Rodriguez', 'Wyrick', 'Stanton', 'Kitazono'] | Mol Cell Biol | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
456 | GSE6331 | 8/1/2007 | ['6331'] | [] | [u'17664279'] | 2099221 | [u'17664279'] | ['', 'Stanton', 'Jin', 'Rodriguez', 'Wyrick', 'Kitazono'] | ['Jin', 'Rodriguez', 'Wyrick', 'Stanton', 'Kitazono'] | ['Jin', 'Rodriguez', 'Wyrick', 'Stanton', 'Kitazono'] | Mol Cell Biol | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
457 | GSE6339 | 11/6/2007 | ['6339'] | [] | [u'17968324'] | 2764086 | [u'19893615'] | ['Franc', 'Fontaine', u'Malthi\xe8ry', 'Triau', 'Houlgatte', 'Mirebeau-Prunier', 'Malthi\xc3\xa8ry', 'Rodien', 'Savagner'] | ['Franc', 'Raharijaona', 'Fontaine', 'Go\xc3\xabau-Brissonni\xc3\xa9re', 'Triau', 'Houlgatte', 'Mirebeau-Prunier', 'Rodien', 'Malthiery', 'Karayan-Tapon', 'Savagner', 'Mello'] | ['Franc', 'Fontaine', 'Triau', 'Houlgatte', 'Mirebeau-Prunier', 'Rodien', 'Savagner'] | PLoS One | 2009 | 10/29/2009 | 0 | le 1 . The Fontaine and the Giordano datasets are referred as the main datasets. All the datasets, generated by various microarray platforms ( Table 1 ), are publicly available. The Fontaine dataset (GSE6339{{tag}}--REUSE--), the He dataset (GSE3467) and the Reyes dataset (GSE3678) are hosted by the Gene Expression Omnibus (GEO) database of the U.S. National Center for Biotechnology Information. The Weber dataset was | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
458 | GSE6351 | 10/30/2007 | ['6351'] | [] | [u'17660462'] | 2752178 | [u'17660462'] | ['Saksela', 'Aaltonen', 'Aittom\xc3\xa4ki', 'Kokko', u'Aittom\xe4ki', 'Vahteristo'] | ['Aaltonen', 'Saksela', 'Kokko', 'Aittom\xc3\xa4ki', 'Vahteristo'] | ['Aaltonen', 'Saksela', 'Kokko', 'Aittom\xc3\xa4ki', 'Vahteristo'] | J Med Genet | 2007 | 2007 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
459 | GSE6360 | 10/8/2007 | ['6360'] | [] | [u'17584934'] | 2394771 | [u'17584934'] | ['Roydasgupta', 'Gajduskova', 'Snijders', 'Pinkel', 'Fridlyand', 'Kwek', 'Tokuyasu', 'Albertson'] | ['Roydasgupta', 'Gajduskova', 'Snijders', 'Pinkel', 'Fridlyand', 'Kwek', 'Tokuyasu', 'Albertson'] | ['Roydasgupta', 'Gajduskova', 'Snijders', 'Pinkel', 'Fridlyand', 'Kwek', 'Tokuyasu', 'Albertson'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
460 | GSE6364 | 5/1/2007 | ['6364'] | ['2737'] | [u'17510236'] | 2752458 | [u'19735579'] | ['Nezhat', '', 'Burney', 'Giudice', 'Vo', 'Hamilton', 'Talbi', 'Nyegaard', 'Lessey'] | ['Zhao', 'He', 'Wang', 'Pan', 'Bai'] | [] | Reprod Biol Endocrinol | 2009 | 9/8/2009 | 1 | microarray raw or normalized data are available. Finally six public gene expression data sets were involved in our study, which assessed endometriosis transcripts on a genome-wide basis. In data set GSE7307, total 677 samples from more than 90 distinct tissue types were processed, but only the profiles related to endometriosis and eutopic endometrium were considered here. The data generated from human| Characteristics of datasets included in the studies. First Author or Contributor Chip GEO Accession Experimental design Classification Probes Number of samples Disease Normal Sha [ 4 ] U133 PLUS 2.0 GSE7846 unpaired, HEECS ovarian 54K 5 5 Burney [ 14 ] U133 PLUS 2.0 GSE6364{{tag}}--REUSE-- unpaired, tissues Ovarian, peritoneal, rectovaginal 54K 21 16 Eyster [ 15 ] CodeLink GSE5108 paired, tissues Ovarian, peritoneal |nd adjusted, normalized and log2 probe-set intensities calculated using the Robust Multichip Averaging (RMA) algorithm in affy package [ 23 , 24 ], and the Codelink arrays normalizations performed in GSE5108 were retained. Genes which cannot be mapped to any KEGG pathway identified were excluded from the further analysis. The interquartile range (IQR) was used as a measure of variability. From the resu | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
461 | GSE6364 | 5/1/2007 | ['6364'] | ['2737'] | [u'17510236'] | 2744474 | [u'19692421'] | ['Nezhat', '', 'Burney', 'Giudice', 'Vo', 'Hamilton', 'Talbi', 'Nyegaard', 'Lessey'] | ['Nezhat', 'Burney', 'Giudice', 'Vo', 'Hamilton', 'Lessey', 'Aghajanova'] | ['Nezhat', 'Burney', 'Giudice', 'Vo', 'Hamilton', 'Lessey'] | Mol Hum Reprod | 2009 | 2009 Oct | 0 | with disease versus no disease are published as supplemental data on The Endocrine Society's Journals Online website at http://endo.endojournals.org . The data were submitted to the Gene Expression Omnibus database under the identifier GSE6364{{tag}}--DEPOSIT-- . To explore the biologic relationship between the differentially expressed messenger- and miRNAs identified in our analysis of ESE from women with versus witho | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
462 | GSE6385 | 1/15/2007 | ['6385'] | [] | [u'17220878'] | 2878307 | [u'20478020'] | ['Liu', 'Ozsolak', 'Fisher', 'Song'] | ['Tanaka', 'Nakai', 'Suzuki', 'Yamashita'] | [] | BMC Genomics | 2010 | 5/17/2010 | 0 | e 1 ). All positions of repetitive elements and RefSeq genes were obtained from "rmsk" files and the "refGene.txt" file in the UCSC database, respectively. The tiling array data for human promoters (GSE6385{{tag}}--REUSE--) were downloaded from Gene Expression Omnibus http://www.ncbi.nlm.nih.gov/geo/ [ 9 ]. About 40 million raw sequence reads of precise TSSs from HEK293 and MCF7 cells were obtained from DBTSS http: | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
463 | GSE6385 | 1/15/2007 | ['6385'] | [] | [u'17220878'] | 2802188 | [u'20008927'] | ['Liu', 'Ozsolak', 'Fisher', 'Song'] | ['Kraus', 'Krishnakumar', 'Yang', 'Frizzell', 'Gamble'] | [] | Genes Dev | 2010 | 1/1/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
464 | GSE6404 | 11/29/2007 | ['6404'] | [] | [u'17993546'] | 2230561 | [u'17993546'] | ['Kovacs', 'Fekete', 'Gonzalo', 'Schachtman', 'Fung', 'Marsh', 'Qiu', 'McIntyre', 'He'] | ['Kovacs', 'Fekete', 'Gonzalo', 'Schachtman', 'Fung', 'Marsh', 'Qiu', 'McIntyre', 'He'] | ['Kovacs', 'Fekete', 'Gonzalo', 'Schachtman', 'Fung', 'Marsh', 'Qiu', 'McIntyre', 'He'] | Plant Physiol | 2008 | 2008 Jan | 0 | AND pmc_gds | 1 | 0 | ||||
465 | GSE6408 | 5/1/2007 | ['6408'] | [] | [u'17701894'] | 1950826 | [u'17701894'] | [u'Cr\xe9au', 'Aubert', u'Bl\xe9haut', 'Dauphinot', u'A\xeft', 'Creau', 'Bl\xc3\xa9haut', 'Potier', 'Golfier', 'Rivals', 'Rossier', 'Prieur', 'Personnaz', 'Delabar', 'A\xc3\xaft', 'Robin'] | ['Aubert', 'Dauphinot', 'Creau', 'Bl\xc3\xa9haut', 'Potier', 'Golfier', 'Rivals', 'Rossier', 'Prieur', 'Personnaz', 'Delabar', 'A\xc3\xaft', 'Robin'] | ['Aubert', 'Dauphinot', 'Creau', 'Bl\xc3\xa9haut', 'Potier', 'Golfier', 'Rivals', 'Rossier', 'Prieur', 'Personnaz', 'Delabar', 'A\xc3\xaft', 'Robin'] | Am J Hum Genet | 2007 | 2007 Sep | 0 | AND pmc_gds | 1 | 0 | ||||
466 | GSE6409 | 1/25/2007 | ['6409'] | [] | [] | 1891340 | [u'17567999'] | [u'Jin', u'Iyengar', u'Farnham', u'Green', u"O'Geen"] | ['Jin', 'Iyengar', 'Farnham', 'Green', "O'Geen"] | ['Jin', 'Iyengar', 'Farnham', 'Green', "O'Geen"] | Genome Res | 2007 | 2007 Jun | 0 | rmations of OCT4 and SRY target genes To better understand the biological functions for these identified OCT4 and SRY target genes, we applied the FatiGO program ( Al-Shahrour et al. 2005 ), which is publicly available online at http://fatigo.bioinfo.cipf.es/ , to characterize the promoters that are bound by both OCT4 and SRY versus a group of promoters bound only by OCT4 or only by SRY ( Fig. 6 ). The F| the members of the Farnham laboratory for helpful discussions. Footnotes [Supplemental material is available online at www.genome.org . The OCT4, SRY, and KAP1 ChIP-chip data has been deposited in GSE6409{{tag}}--DEPOSIT-- .] Article is online at http://www.genome.org/cgi/doi/10.1101/gr.6006107 Other Sections▼ Abstract Results Discussion Methods References References Aerts S., Van Loo P., Thijs G., Moreau Y. | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
467 | GSE6411 | 2/2/2007 | ['6411'] | [] | [u'17255947'] | 1794385 | [u'17255947'] | ['van', 'Greil', 'de', 'Bussemaker'] | ['van', 'Greil', 'de', 'Bussemaker'] | ['van', 'Greil', 'de', 'Bussemaker'] | EMBO J | 2007 | 2/7/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
468 | GSE6414 | 4/26/2007 | ['6414'] | ['3238'] | [] | 1914191 | [u'17556519'] | [u'Le', u'Wang', u'Harada', u'Bui', u'Goldberg', u'Wagmaister'] | ['Le', 'Kawashima', 'Harada', 'Bui', 'Goldberg', 'Wagmaister'] | ['Wagmaister', 'Harada', 'Bui', 'Le', 'Goldberg'] | Plant Physiol | 2007 | 2007 Jun | 0 | parentheses refers to suspensor transcripts not detected in other seed compartments at the level of the GeneChip (i.e. suspensor-specific transcripts). Raw data were deposited in the Gene Expression Omnibus (GEO) as data series GSE6414{{tag}}--DEPOSIT-- ( http://www.ncbi.nlm.nih.gov/geo ) and can also be accessed at http://estdb.biology.ucla.edu/seed . D, Functional category distribution of soybean suspensor transcrip | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
469 | GSE6415 | 2/15/2007 | ['6415'] | [] | [u'17276656', u'20551132'] | 2926625 | [u'20551132'] | ['Lam', 'MacAulay', 'Meijer', 'Ylstra', 'Carvalho', 'Chari', 'Coe', 'Macaulay'] | ['Chari', 'Lam', 'MacAulay', 'Coe'] | ['Chari', 'Lam', 'MacAulay', 'Coe'] | Nucleic Acids Res | 2010 | 2010 Aug | 0 | AND pmc_gds | 1 | 0 | ||||
470 | GSE6423 | 11/27/2007 | ['6423'] | [] | [u'18261244'] | 2262072 | [u'18261244'] | ['Pompon', 'Juvan', 'Aggerbeck', 'Rezen', 'Roth', 'Rozman', 'Fon', 'Kuzman', 'Meyer'] | ['Pompon', 'Juvan', 'Aggerbeck', 'Rezen', 'Roth', 'Rozman', 'Fon', 'Kuzman', 'Meyer'] | ['Pompon', 'Juvan', 'Aggerbeck', 'Rezen', 'Roth', 'Rozman', 'Fon', 'Kuzman', 'Meyer'] | BMC Genomics | 2008 | 2/11/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
471 | GSE6444 | 1/1/2007 | ['6444'] | [] | [u'17158661'] | 1855744 | [u'17158661'] | ['Witkin', 'Gross', 'Rhodius', 'Guisbert', 'Ahuja'] | ['Witkin', 'Gross', 'Rhodius', 'Guisbert', 'Ahuja'] | ['Witkin', 'Gross', 'Rhodius', 'Guisbert', 'Ahuja'] | J Bacteriol | 2007 | 2007 Mar | 0 | AND pmc_gds | 1 | 0 | ||||
472 | GSE6447 | 11/27/2007 | ['6447'] | [] | [u'18261244'] | 2262072 | [u'18261244'] | ['Pompon', 'Juvan', 'Aggerbeck', 'Rezen', 'Roth', 'Rozman', 'Fon', u'Re\u017een', 'Kuzman', 'Meyer'] | ['Pompon', 'Juvan', 'Aggerbeck', 'Rezen', 'Roth', 'Rozman', 'Fon', 'Kuzman', 'Meyer'] | ['Pompon', 'Juvan', 'Aggerbeck', 'Rezen', 'Roth', 'Rozman', 'Fon', 'Kuzman', 'Meyer'] | BMC Genomics | 2008 | 2/11/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
473 | GSE6448 | 2/7/2007 | ['6448'] | [] | [u'17234794'] | 2778664 | [u'19863778'] | ['Climent', 'Lluch', 'Gray', 'Pinkel', 'Dimitrow', 'Fridlyand', 'Palacios', 'Siebert', 'Martinez-Climent', 'Albertson'] | ['Costa', 'van', 'ten', 'Eijk', 'Welsh', 'Schmitt', 'Ylstra', 'Narvaez'] | [] | BMC Genomics | 2009 | 10/28/2009 | 0 | AND pmc_gds | 0 | 1 | ||||
474 | GSE6451 | 2/22/2007 | ['6451'] | ['2653'] | [u'17360324'] | 1829287 | [u'17360324'] | ['Schultz', 'Johnson', 'Kern', 'Liu', 'Walker', 'Luesch'] | ['Schultz', 'Johnson', 'Kern', 'Liu', 'Walker', 'Luesch'] | ['Schultz', 'Johnson', 'Kern', 'Liu', 'Walker', 'Luesch'] | Proc Natl Acad Sci U S A | 2007 | 3/20/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
475 | GSE6452 | 5/31/2007 | ['6452'] | [] | [u'17577409'] | 1929063 | [u'17577409'] | ['Costa', 'Barchuk', 'Sim\xc3\xb5es', u'Paulino', 'Cristino', 'Maleszka', 'Kucharski'] | ['Costa', 'Barchuk', 'Sim\xc3\xb5es', 'Cristino', 'Maleszka', 'Kucharski'] | ['Costa', 'Barchuk', 'Sim\xc3\xb5es', 'Cristino', 'Maleszka', 'Kucharski'] | BMC Dev Biol | 2007 | 6/18/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
476 | GSE6461 | 4/30/2007 | ['6461'] | ['2698'] | [u'17418413'] | 2779485 | [u'19956606'] | ['Haldar', 'Lessnick', 'Hancock', u'Halder', 'Coffin', 'Capecchi'] | ['Kirsch', 'Yoon', 'Jacks', 'Dodd', 'Riedel', 'Lahat', 'Lazar', 'Mito', 'Brigman', 'Lev', 'Mukherjee', 'Eward', 'Hornicek', 'Stangenberg'] | [] | PLoS One | 2009 | 11/30/2009 | 0 | AND pmc_gds | 0 | 1 | ||||
477 | GSE6471 | 2/22/2007 | ['6471'] | [] | [u'17337561'] | 1892884 | [u'17337561'] | ['Wiedmann', 'McGann', 'Boor', 'Ivanek'] | ['Wiedmann', 'McGann', 'Boor', 'Ivanek'] | ['Wiedmann', 'McGann', 'Boor', 'Ivanek'] | Appl Environ Microbiol | 2007 | 2007 May | 0 | AND pmc_gds | 1 | 0 | ||||
478 | GSE6471 | 2/22/2007 | ['6471'] | [] | [u'17337561'] | 2688707 | [u'18713061'] | ['Wiedmann', 'McGann', 'Boor', 'Ivanek'] | ['Wiedmann', 'McGann', 'Boor', 'Ivanek', 'Raengpradub'] | ['Wiedmann', 'McGann', 'Boor', 'Ivanek'] | Foodborne Pathog Dis | 2008 | 2008 Aug | 0 | AND pmc_gds | 1 | 0 | ||||
479 | GSE6472 | 5/30/2007 | ['6472'] | [] | [] | 2491640 | [u'18605998'] | [u'Chen', u'Lee', u'Fang'] | ['Zheng', 'Ma', 'Shang', 'Liang', 'Chen', 'Liao'] | [u'Chen'] | BMC Genomics | 2008 | 7/7/2008 | 1 | trategy . The red lines represent the iteration between TFBSs prediction and literature mining. Differential genes The Venn diagram in Figure 2A shows the distribution of differential genes between GSE2370 (EBV - /normal) and GSE2371 (EBV + /EBV - ). In brief, of the 260 differentially up-regulated genes in GSE2371, 32 were also up-regulated and 14 others were found down-regulated in GSE2370. Of the |primary infection. Many of these genes have been discussed [ 18 ]. Figure 2 Venn diagrams of the differential genes identified from the data sets used . (A) Intersection of differential genes between GSE2370 and GSE2371; (B) Intersection of differential genes between R15/P1 and R1/P1 of GSE6472{{tag}}; (C) Intersection of differential genes between GSE2371 and GSE2149. Figure 2B shows the Venn diagram of di|1, DEK, RPS28 6 Down-regulated Up-regulated ITGA6, PPP2R2D, SMARCC1 3 Down-regulated Down-regulated DUSP1, ST5, APPBP1, DUSP6, TRIP12, PABPC1, TKT, CD9, IMPDH2, HOXA9 10 The 585 differential genes in GSE2371 (EBV + -NPC) and 729 genes in GSE2149 (EBV + -PEL) were integrated in Figure 2C . The intersection represents 45 overlapping meta-genes (named as meta-B, Table 2 ) expressed in both tumor types, |ndicates that all three cancer-related genes are more important to look at among all others. 
Table 3 List of the data sets used in this research Accession number Gene chips' type Samples (cell lines) GSE2370 7500 K microarray NPC(TW01, TW03, TW04, TW06, CGBM1)/normal nasal nucosal epithelia GSE2371 7500 K microarray EBV + /EBV - -NPC(TW01, TW03, TW04, TW06, CGBM1)/common reference RNAs GSE6472{{tag}} Agilent |x HG-133A EBV + /EBV - -PEL * Nine cell lines (BC-1, BC-2, BC-3, BC-5, BCBL-1, BCKN, IBL-4, PEL-5 and SM1) derived from patients with lymphomatous effusions and three PEL patient samples were used in GSE2149. With knowledge gathered by in-depth analysis, a detailed regulatory network was set up by joining newly identified meta-genes with related transcriptional factors. As shown in Figure 5 , many of |susceptibility and polymorphism. Methods Data sets Four data sets retrieved from the GEO database are listed in Table 3 and the open-access analysis tools selected are shown in Table 4 . Data sets GSE2370 [ 44 ] and GSE2371 [ 45 ] submitted by Lee contain 15 samples surveyed by the 7500 K microarray representing approximately 7411 distinct human transcripts expressed in five representative NPC cell |ed from a lympho-epitheliomatous undifferentiated NPC; TW04 and TW06 are derived from two distinct undifferentiated carcinomas; and CGBM1 line is derived from bone marrow metastatic NPC tumor tissue. GSE 2370 used the five EBV - NPC cell lines (labeled with cy5) mentioned above against normal nasal mucosal epithelial (labeled with cy3) as a control. GSE2371 used the same five EBV - NPC cell lines and| R15 are NA cells experienced EBV recurrent reactivation one and fifteen times induced artificially by sodium n -butyrate (SB) and 12- o -tetradecanoylphorbol-13-acetate (TPA), respectively. Dataset GSE2149, supplied by Fan [ 47 ] and based on the Affymetrix HG-133A microarray, has eleven samples (21 microarrays) from EBV + /EBV - -PEL. More information of the four data sets is shown in Table 3 . 
Tab| localization and tissue specificity Data preprocessing The raw data from each experiment was normalized using Lowess smoother (per spot and per chip: intensity-dependent normalization) for data sets GSE2370, GSE2371 and GSE6472{{tag}}--REUSE--, or using median over entire array for GSE2149 to minimize randomness of signals among microarrays and spots. To focus on high-quality and stronger hybrid signal spots, we excl| 100. Filtering on flags, which we required all present calls only, was applied to GSE2370 and GSE2371. Filtering on expression level with threshold of standard error average× 4 were used for GSE6472{{tag}}--REUSE--. Probes with 20% data points missing were then filtered out for GSE2149. Selection of differential genes We utilized GeneSpring GX 7.3.1 (Agilent technologies, US) to analyze two-channel data and B| ]. The following thresholds were used to obtain sets of differential genes as close to those described by the authors of the data sets as possible. The statistical comparison ( p < 0.05) of GSE2370 revealed that 1182 genes were differentially expressed, including 617 genes with greater than 1.765 fold-changes as an up-regulated group and 565 genes with less than -1.765-fold defined as a down-|ally expressed, including 260 genes showing greater than 1.25-fold as up-regulated group and 253 showing less than -1.25-fold as down-regulated group. The differential genes identified from analyzing GSE2370 and GSE2371 were designated as potential target genes of primary EBV infection. Up-regulated or down-regulated genes in GSE6472{{tag}} were identified using an absolute threshold of 1.5-fold. Then, the di|tial genes of R1/P1/R15/P15 were cross-compared to those from GSE2371 to obtain meta-A genes which are targeted by EBV and subjected to EBV reactivation of various duration and frequency. GSE2371 and GSE2149 come from EBV + /EBV - -NPC and EBV + /EBV - -PEL respectively. 
We collected the common differential meta-B genes infected by EBV between the two tumors by cross-comparing the gene sets obtained af | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
480 | GSE6474 | 1/15/2007 | ['6474'] | [] | [u'17308078'] | 2764428 | [u'19671526'] | ['S\xc3\xbcltmann', u'Sultmann', 'Mund', 'Musch', u'Buness', 'Kuner', 'Brueckner', 'Meister', u'Ruschhaupt', u'Poustka', 'Stresemann', 'Lyko'] | ['Xie', 'Tu', 'Li', 'Liu', 'Yu', 'Hua'] | [] | Nucleic Acids Res | 2009 | 2009 Oct | 1 | es the log 2 -transformed ratio of gene expression between treatment (after transfection) and control (before transfection). ( B ) mRNA level changes of miR-124’s putative non-target genes in GDS2657. Putative non-target genes are the whole set of genes found in GDS2657 minus the putative target genes. ( C ) Numbers of up- (up panel) and down-regulated (below panel) genes at different time poin|. The final TF–gene set included 130 338 relationships between 214 human TFs and 16 534 targets. For more information, see Supplementary Table 3 . MPGE datasets Five groups of MPGE datasets (GDS1858, GDS2657, GSE6474{{tag}}--REUSE--, GSE6838 and GSE7864) were downloaded from the GEO database; these groups include 53 individual datasets involving 19 miRNAs. The GDS1858 dataset group ( 2 ) includes data on HeLa|ansfection with wild type or mutant miR-1, miR-124, or miR-373. GDS2657 ( 13 ) includes gene expression profiles at seven time points (4, 8, 16, 24, 32, 72 and 120 h) after overexpression of miR-124. GSE6474{{tag}}--REUSE-- ( 14 ) includes four replicated measurements of gene expression changes after overexpression of let-7a. 
GSE6838 ( 15 ) includes gene expression data over a time course (6, 10, 14 and 24 h) after ov|mediators of miRNA-triggered regulation, summarized for each of 53 MPGE datasets Dataset group miRNA Cell line Time point K–S test P -value De-graded targets TF mediators Shuffling P -value GDS1858 miR-1 HeLa 12 1.2 e –15 91 ETS1, CREB1, YY1 0 24 4.7 e –15 107 TFAP2A, CREB1, YY1, SREBF1 0 miR-124 12 1.8 e –12 109 GLI3* 0.04 24 6.5 e –22 132 MLLT7, NKX6.1 0.03 m|02013;73 366 AHR*, RREB1 0 32 6.2 e –64 329 AHR*, SP1*, EGR1, RELA*, RREB1, NR3C1*, SP2 0 72 2.2 e –59 292 CREB1, SP1, ETS1, MLLT7, SP2 0 120 1.1 e –19 144 AHR*, SP1*, MLLT7 0 GSE6474{{tag}}--REUSE-- let-7a3 A549 Not known 1.1 e –2 1 PAX3, HOXA1, BACH2, EGR3, MYC 0.02 GSE6838 let-7c HCT116 Dicer−/− #2 24 1.5 e –54 211 MYC 0.05 miR-103 10 5.7 e –08 82 MEF2|– miR-195 10 4.3 e –30 137 SMAD7, NFATC3 0.05 24 7.4 e –31 98 FOXC1 0.08 miR-20 24 9.3 e –30 59 – – miR-215 24 2.6 e –11 38 – – GSE7864 miR-34a A549 H-1 term 24 1.0 e –29 112 E2F5*, YY1 0.03 HCT116 Dicer −/− #2 24 1.1 e –29 132 E2F3, YY1, NFE2L1 0.02 TOV21G H1-term 24 1.4 e –23 70 E2F5, BACH2|al or greater BIC score than that of the regressed Equation ( 2 ). Many of our predictions are supported by independent experimental studies. For example, our analyses of two MPGE datasets for miR-1 (GDS1858) predicted 130 primary targets; 50 (38.4%) of these targets appeared in TarBase ( 19 ), a database collecting experimentally validated miRNA targets. Some miRNAs, like let-7c, miR-16 and miR-17-5p,|ly (Please find these plots in the ‘wrapped results’ available at http://www.biosino.org/kanghu/~DCR/ supplementary file1.zip). With the exception of let-7a-3 in the A549 cell line (GSE6474{{tag}}--REUSE--), all miRNAs were found to have degradation-inducing ability in all surveyed situations, as the K–S test P -values were exclusively <0.001 ( Table 1 ). 
Some miRNAs, such as miR-124|ry targets account for a significant proportion of the observed mRNA level changes in an MPGE dataset ( Figure 1 C). For example, at the 32-h time point after miR-124 overexpression (dataset from the GDS2657 group), miRNA’s direct regulation could explain decreased MCs of only 181 genes; with our predicted two-layer regulatory model, the decreased MCs of an additional 98 genes and increased MCs| of another 42 genes were attributed to miRNA’s indirect regulation, raising the proportion of explainable MCs from 27.8 to 47.7%. The classifications of regulated genes at all time-points in GDS2657 are shown in Supplementary Table 8 , where a general trend is evident that the direct regulation decreases rapidly while the secondary regulation maintains at a considerable multitude, resulting i|tered on miRNA and mediated by TFs Figure 3 depicts a typical two-layer regulatory network, mined from an MPGE dataset measured at the 12-h time point after overexpression of miR-1 (dataset from the GDS1858 group). In addition to directly down-regulating 91 degraded targets (blue arrows), miR-1 overexpression causes expression changes in more than 100 non-target genes, possibly through translationally|O:005114, GO:000752; all FDRs < 0.29), consistent with the known fact that miR-1 is expressed selectively in heart and skeletal muscle. Similar analyses were performed on the miR-124 network (GDS2657, 32 h), resulting in the identification of 129 significant biological themes (FDR < 0.25), among which neuron apoptosis (GO:0051402) is in accordance with miR-124’s proven role in d | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
481 | GSE6476 | 7/4/2007 | ['6476'] | ['2803'] | [u'17609676'] | 2949890 | [u'20831831'] | ['Schultz', 'Miller', 'Pletcher', 'Cameron', 'Gulati'] | ['Monsonego', 'Rubin', 'Geifman'] | [] | BMC Bioinformatics | 2010 | 9/12/2010 | 0 | ound 0.456971* 0.505641 External side of plasma membrane 0.744269 0.508488* Not Found (A) Enrichment in genes with up-regulated expression in hippocampus treated with Fluoxetine, based on GEO profile GSE6476{{tag}}--REUSE--. All terms passed FDR > 0.25 and state the FDR values. (B) GEO profile GSE6675. All terms passed FDR < 0.25 and state the FDR values (Up-regulated in Control in comparison to FGF2 t|ture. † indicates terms that were not included in the NIGO subset. For three GEO expression profiles, the results of the analysis are described in detail (Table 1 ). In the analysis of the GSE6476{{tag}}--REUSE--, in which the effect of chronic Fluoxetine treatment on hippocampal gene expression was examined [ 10 ], eleven terms passed statistic filtering with NIGO and not with the full GO, out of which 7 w|ded in the full GO but not in NIGO passed the statistic cutoffs but this term ('Protein self-association') was functionally irrelevant and contributed very little to the analysis. For the GEO profile GSE6675, in which astroglial gene expression program elicited by fibroblast growth factor-2 was examined, four terms relevant to neural/immune systems passed statistic filtering with NIGO and not with the |ith the full GO. This is an example of how without the use of NIGO one would have to raise the cutoff to at least 0.55 in order for this term to appear in the analysis results. Functional analysis of GSE6509 with NIGO also revealed statistically significant terms that were missed when using the full GO. 
This experiment involved microarray expression profiling designed to explore the effect of RU486, a |ed both in NIGO and the full GO, received higher FDR values when NIGO was used (Table 2 ), contrary to our expectations from the impact of reducing the ontology size. For example, in the analysis of GSE8788 in which gene expression analysis was conducted using Trib1-deficient macrophages treated with LPS as compared to LPS-treated wild-type macrophages [ 12 ], 11 out of 14 terms (that passed statistic|ment in FDR values when using NIGO is partially due to the stochastic nature of the GSEA algorithm. To test this hypothesis, the same analysis was repeated three times with each of the ontologies for GSE8788. FDR values were averaged and a comparison of analysis results was performed based on these averaged FDR values. In accordance with our hypothesis, the averaging of FDR values improved the apparent|l GO and GO-slim in enrichment analysis using GSEA Unique to Subset Lowest FDR Value Experiment Cutoff used Full GO NIGO Generic GO slim Full GO NIGO Generic GO slim Neural- or immune-related studies GSE6509 P < 0.05 2 (2) 5 0 0 2 0 GSE6675 FDR < 0.25 0 6 0 0 1 0 GSE6476{{tag}}--REUSE-- FDR < 0.25 1 (1) 11 0 0 4 0 GSE3779 P < 0.05 7 (3) 13 1 (1) 0 1 5 GSE8425 P < 0.05 6 (1) 0 0 | 0 0 0 1 GSE8788 P < 0.01 6 (6) 0 0 11 3 1 GSE8788* P < 0.01 6 (6) 0 0 3 11 1 GSE9659 P < 0.01 6 (6) 0 NS A 4 28 NS A Non-neural- or immune-related studies (negative controls) GSE7407 P < 0.01 5 (5) 0 0 0 1 2 GSE2259 FDR < 0.25 3 (1) 0 1 (1) 1 2 1 GSE8191 P < 0.05 5 (2) 0 0 0 0 0 The enrichment analysis results of GSEA, providing the full GO, NIGO or GO-s| for each subset given in separate columns. *To test the effect of stochasticity, the analysis of this profile is based on 3-fold averaged FDR values from three independent GSEA analyses. A Profile GSE9659 involves rat tissue, and was not analyzed with the mouse GO-slim. 
Three non-neural- or immune-related datasets were used to test the performance of NIGO in functional analysis of microarray data in|. These datasets included the GEO expression profile, GSE7407, in which gene expression in heart tissue with cardiac specific over-expression of Sirt1 was examined [ 13 ], the GEO expression profile, GSE8191, in which the gene expression profile of mammary glands from pregnant mice was compared to that of mammary glands from lactating mice [ 14 ], and the GEO expression profile, GSE2259, in which gene |in enrichment analysis using the Fisher Exact Test Unique to Subset Lowest FDR Value Experiment Cutoff used Full GO NIGO Generic GO slim Full GO NIGO Generic GO slim Neural- or immune-related studies GSE6509 P < 0.0001 1 (1) 1 0 0 22 2 GSE6675 P < 0.05 1 (1) 0 11 (1) 0 2 0 GSE6476{{tag}}--REUSE-- P < 0.0001 0 2 2 0 15 3 GSE3779 P < 0.1 0 0 0 0 2 0 GSE8425 P < 0.1 0 0 0 0 0 0 GSE|0.001 0 2 0 0 9 1 GSE6136 P < 0.000001 0 1 0 0 37 7 GSE8788 P < 0.01 0 2 0 0 4 1 GSE9659 P < 0.001 0 3 NS A 0 33 NS A Non-neural- or immune-related studies (negative controls) GSE7407 P < 0.0000000001 33 (30) 0 0 9 0 0 GSE2259 P < 0.1 63 (35) 0 2 8 8 7 GSE8191 P < 0.001 6 (6) 7 6 (2) 0 16 3 Enrichment analysis results from Ontologizer, providing the full |ional GO subset (the full GO or generic slim) are also shown for each term ('Lowest P-Value'), with the number of terms with the lowest P-values for each subset given in separate columns. A Profile GSE9659 involves rat tissue, and was not analyzed with the mouse GO-slim. These results, together with the analysis of five additional neural/immune-related experiments not described in Table 1 (namely p| cell's activity. These results further demonstrate that analysis with NIGO can enhance interpretation of functional analysis results produced for relevant microarray datasets. In the analysis of the GSE6509 expression dataset, three relevant terms passed the statistical cutoff with NIGO but not with the full GO. 
These terms, 'viral envelope', 'viral infectious cycle' and 'viral capsid' are all terms r|et. Such terms received FDR or p-values that were very close (but larger) than the cutoff values used. This is partially explained by the stochastic nature of the GSEA algorithm. Indeed, for one set (GSE8788), we compared the raw results with averaged FDR values. Averaging dramatically decreased the number of such terms. Furthermore, we conducted a similar functional analysis using the Fisher Exact Tes|nous peptide antigen via MHC class I' and 'Positive regulation of T cell mediated cytotoxicity' without defeating the purpose of the slimming process. Yet these two terms were found to be enriched in GSE6476{{tag}}--REUSE--, and are crucial for generating a hypothesis based on the expression profile. This shows that GO slims may be complemented by small, yet fully detailed domain-specific subsets of GO. Figure 4 The U|plementary/NIGO/Supplementary.html . The ontologies were clipped using the Protégé 4.0 beta OWL editor [ 26 ]. For by-species filtering, annotation files for human, rat and mouse were downloaded (October 2008) from GOA-EBI [ 27 ]. Association files used for GSEA analysis were generated based on the GOA-EBI annotation files and in the format required by GSEA. In this format, each row repres|a analysis and functional analysis were conducted using the GenePattern [ 29 ], GSEA [ 9 ] (release 2.5) web servers, and Ontologizer [ 16 ] as follows: (1) for each study, raw data (.CEL files) were downloaded from GEO [ 30 ]; (2) expression files (.gct files) were created using the Gene Pattern Expression File Creator module; (3) where necessary (i.e. for expression files GSE6509, GSE6675, GSE6476{{tag}}--REUSE--, GSE7|was conducted using the GSEA module. GSEA was run three times for each dataset, using a different GO version for each run. For the full GO, we used the organism-specific GO subset. 
In the analysis of GSE8788, GSEA was run three times for each of the three ontologies and FDR values were averaged over the three runs. (5) Differentially-expressed genes were found using the Gene Pattern ComparativeMarkerSe|, to an input term (Figure 4 ). The connections of UMLS concepts to the tested GO term are provided by UMLS and defined within the UMLS data files. Microarray data sets All microarray data sets were downloaded from GEO at NCBI [ 30 ]. The GEO sets used in this study are described in Additional file 2 . Availability NIGO is freely available as Additional file 6 and for download from: http://bioinfo.bg|re included or excluded from NIGO. Click here for file Additional file 2 Microarray Datasets for Comparative analysis of NIGO, GO and GO-slim . This file contains a summary of the microarray datasets downloaded from GEO at NCBI and used to test the performance of NIGO. Click here for file Additional file 3 GSEA analysis results . This file contains a summary of the GSEA analysis results for each of the mi|etroviral infectivity through different mechanisms Am J Physiol Lung Cell Mol Physiol 2009 297 L538 545 10.1152/ajplung.00162.2009 19561138 The Gene Ontology Consortium http://www.geneontology.org/GO.downloads.ontology.shtml Web Ontology Language Guide http://www.w3.org/TR/owl-guide Protégé 4.0 beta http://protege.stanford.edu Barrell D Dimmer E Huntley RP Binns D O'Donovan C Apweiler R Th | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
482 | GSE6477 | 4/1/2007 | ['6477'] | [] | [u'17409404', u'19996089'] | 2901331 | [u'20634887'] | ['', 'Chng', 'Greipp', 'Yin', 'Price-Troska', 'Tiedemann', 'Schmidt', 'Henderson', 'Rajkumar', 'Basu', 'Bergsagel', 'Braggio', 'Que', 'Kim', 'Bryant', 'Kumar', 'Zhu', 'Gertz', 'Mousses', 'Kyle', 'Shi', 'Mulligan', 'Dispenzieri', 'Stewart', 'Chung', 'Fonseca', 'Ahmann', 'Vanwier', 'Perkins', 'Carpten', 'Lacy', 'Azorsa'] | ['Agarwal', 'Vanderkerken', 'Osterborg', 'Eriksson', 'Jernberg-Wiklund', 'Frykn\xc3\xa4s', 'Nilsson', 'Fristedt', 'Oberg', 'Deleu', 'Atadja', 'Kalushkova', 'Lemaire'] | [] | PLoS One | 2010 | 7/9/2010 | 0 | Figure S2A–C ). The selected genes were then additionally confirmed to be underexpressed in MM compared to normal counterpart plasma cells in an independent data set [22] (GSE6477{{tag}}--REUSE--, Figure S3 ). The INK4A/p16 gene was also included, as a previously reported target of the Polycomb group (PcG) proteins [23] . The actively transcribed genes RPL30 and GAPDH were|te as previously described [50] . The five selected target genes (CIITA, CXCL12, GATA2, CDH6 and ICSBP/IRF8) were compared with the Mayo clinic data set from Gene Expression Omnibus (GSE6477{{tag}}--REUSE--). The Mayo clinic dataset contains 15 normal bone marrows that were compared to 147 different grades of MM patients. Purification of MM cells from patient material and MM cell lines Heparinised bon| profile generated after a gene-set enrichment (GSE) analysis of the five PRC2 target genes (CIITA, GATA2, CDH6, CXCL12 and ICSBP/IRF8) when compared to Mayo clinic dataset (Chng, Kumar et al. 2007) (GSE6477{{tag}}--REUSE--) using software GSEA v 2.05 with FDR q-value 0.0181. (0.68 MB TIF) Click here for additional data file. Figure S4 DZNep and LBH589 deplete EZH2 protein expression in a concentration-dependent manne | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
483 | GSE6477 | 4/1/2007 | ['6477'] | [] | [u'17409404', u'19996089'] | 2713475 | [u'19443661'] | ['', 'Chng', 'Greipp', 'Yin', 'Price-Troska', 'Tiedemann', 'Schmidt', 'Henderson', 'Rajkumar', 'Basu', 'Bergsagel', 'Braggio', 'Que', 'Kim', 'Bryant', 'Kumar', 'Zhu', 'Gertz', 'Mousses', 'Kyle', 'Shi', 'Mulligan', 'Dispenzieri', 'Stewart', 'Chung', 'Fonseca', 'Ahmann', 'Vanwier', 'Perkins', 'Carpten', 'Lacy', 'Azorsa'] | ['Pitsillides', 'Runnels', 'Jia', 'Rollins', 'Lin', 'Ngo', 'Sacco', 'Roccaro', 'Thompson', 'Anderson', 'Ghobrial', 'Azab', 'Blotta', 'Melhem'] | [] | Blood | 2009 | 7/16/2009 | 0 | performed in ice-cold PBS. Samples were then analyzed with the use of flow cytometry. Gene expression profiling of RhoA, Rac1, and CDC42 Gene expression datasets from the Mayo Clinic (GEO accession GSE6477{{tag}}--REUSE-- ) were obtained from the Gene Expression Omnibus for analysis and generated by the use of the Affymetrix U133A platform. 21 The data pertaining to RhoA (probe ID 200059_s_at), Rac1 (probe ID 20864|and 3 MM patient samples demonstrating similar expression of both GTPases in all cell lines and patient samples. (B) Gene expression of the GTPases RhoA, Rac1, and CDC42, based on NIH Gene Expression Omnibus database under the accession number GSE6691 , demonstrating significant overexpression of RhoA and Rac1, but not CDC42, GTPases in MM samples compared with normal subjects. Gene expression profiling | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
484 | GSE6477 | 4/1/2007 | ['6477'] | [] | [u'17409404', u'19996089'] | 2531129 | [u'18700954'] | ['', 'Chng', 'Greipp', 'Yin', 'Price-Troska', 'Tiedemann', 'Schmidt', 'Henderson', 'Rajkumar', 'Basu', 'Bergsagel', 'Braggio', 'Que', 'Kim', 'Bryant', 'Kumar', 'Zhu', 'Gertz', 'Mousses', 'Kyle', 'Shi', 'Mulligan', 'Dispenzieri', 'Stewart', 'Chung', 'Fonseca', 'Ahmann', 'Vanwier', 'Perkins', 'Carpten', 'Lacy', 'Azorsa'] | ['Mosca', 'Lionetti', 'Agnelli', 'Deliliers', 'Andronache', 'Neri', 'Ronchetti', 'Fabris'] | [] | BMC Med Genomics | 2008 | 8/13/2008 | 1 | AND pmc_gds | 0 | 1 | ||||
485 | GSE6477 | 4/1/2007 | ['6477'] | [] | [u'17409404', u'19996089'] | 2255064 | [u'18242516'] | ['', 'Chng', 'Greipp', 'Yin', 'Price-Troska', 'Tiedemann', 'Schmidt', 'Henderson', 'Rajkumar', 'Basu', 'Bergsagel', 'Braggio', 'Que', 'Kim', 'Bryant', 'Kumar', 'Zhu', 'Gertz', 'Mousses', 'Kyle', 'Shi', 'Mulligan', 'Dispenzieri', 'Stewart', 'Chung', 'Fonseca', 'Ahmann', 'Vanwier', 'Perkins', 'Carpten', 'Lacy', 'Azorsa'] | ['Valdez', 'Palmer', 'Chng', 'Bergsagel', 'Haas', 'Fonseca', 'Sebag', 'Affer', 'Chesi', 'Robbiani', 'Cattoretti', 'Tiedemann', 'Kremer', 'Stewart'] | ['Tiedemann', 'Fonseca', 'Stewart', 'Chng', 'Bergsagel'] | Cancer Cell | 2008 | 2008 Feb | 1 | was extracted and gene expression profiling performed using the Affymetrix U133A genechip. Electronic Availability of the Data The gene expression data have been previously published (Gene expression omnibus accession number GSE6477{{tag}}--REUSE-- ). Another gene expression dataset comprising of 22 PC, 43 MGUS, 351 MM and 44 human myeloma cell lines performed on the U133plus2.0 gene chip was also analyzed (GEO accessi | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
486 | GSE6477 | 4/1/2007 | ['6477'] | [] | [u'17409404', u'19996089'] | 2868587 | [u'19135901'] | ['', 'Chng', 'Greipp', 'Yin', 'Price-Troska', 'Tiedemann', 'Schmidt', 'Henderson', 'Rajkumar', 'Basu', 'Bergsagel', 'Braggio', 'Que', 'Kim', 'Bryant', 'Kumar', 'Zhu', 'Gertz', 'Mousses', 'Kyle', 'Shi', 'Mulligan', 'Dispenzieri', 'Stewart', 'Chung', 'Fonseca', 'Ahmann', 'Vanwier', 'Perkins', 'Carpten', 'Lacy', 'Azorsa'] | ['Gao', 'Chng', 'Barrios', 'Kilimann', 'Vij', 'Monahan', "O'Neal", 'Hassan', 'Tomasson', 'Lee'] | ['Chng'] | Exp Hematol | 2009 | 2009 Feb | 0 | cy of primer/probes was shown. Microarray Expression Analysis Two datasets were analyzed. The Mayo Clinic dataset included 162 samples [ 26 ](101 MM, 24 SMM, 22 MGUS, and 15 normal PC’s; GEO GSE6477{{tag}}--REUSE-- ). Chromosome 13 status was determined by FISH. The MMRC dataset ( http://www.themmrc.org ) included 100 samples. Chromosome 13 status was determined by aCGH. Expression values were derived against|ession in MM cells. We anticipated that patient samples with del[ 13 ] would have lower NBEA expression than patient samples with without del[ 13 ]. We analyzed two large microarray data sets (Mayo GSE 6477{{tag}}--REUSE--) [ 26 ]and MMRC ( http://www.themmrc.org ; total n=262) for expression changes based on chromosome 13 status. In both datasets, NBEA transcript levels were significantly decreased in patient sam | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
487 | GSE6477 | 4/1/2007 | ['6477'] | [] | [u'17409404', u'19996089'] | 2977975 | [u'20220778'] | ['', 'Chng', 'Greipp', 'Yin', 'Price-Troska', 'Tiedemann', 'Schmidt', 'Henderson', 'Rajkumar', 'Basu', 'Bergsagel', 'Braggio', 'Que', 'Kim', 'Bryant', 'Kumar', 'Zhu', 'Gertz', 'Mousses', 'Kyle', 'Shi', 'Mulligan', 'Dispenzieri', 'Stewart', 'Chung', 'Fonseca', 'Ahmann', 'Vanwier', 'Perkins', 'Carpten', 'Lacy', 'Azorsa'] | ['Van', 'Chung', 'Bergsagel', 'Gertz', 'Baker', 'Keats', 'Carpten', 'Chng', 'Fonseca'] | ['Chng', 'Chung', 'Bergsagel', 'Gertz', 'Fonseca', 'Carpten'] | Leukemia | 2010 | 2010 Apr | 0 | Materials and methods aCGH and gene expression profiling datasets A Mayo Clinic dataset and the publicly available University of Arkansas Medical School (UAMS) dataset 18 comprising 64 and 67 newly diagnosed multiple myeloma, respectively, were analyzed. The patients from Mayo Clinic were predominantly|ch 16 097 unique map positions were defined. For the Mayo Clinic dataset, gene expression profiling was performed using the Affymetrix U133A V2 chip (data available from GEO through accession number GSE6477{{tag}}--REUSE-- ), whereas for the UAMS dataset, gene expression was performed using the Affymetrix U133plus 2.0 chip (data available from GEO through accession number GSE2658 ). Clinical validation dataset We st | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
488 | GSE6477 | 4/1/2007 | ['6477'] | [] | [u'17409404', u'19996089'] | 2083698 | [u'17692805'] | ['', 'Chng', 'Greipp', 'Yin', 'Price-Troska', 'Tiedemann', 'Schmidt', 'Henderson', 'Rajkumar', 'Basu', 'Bergsagel', 'Braggio', 'Que', 'Kim', 'Bryant', 'Kumar', 'Zhu', 'Gertz', 'Mousses', 'Kyle', 'Shi', 'Mulligan', 'Dispenzieri', 'Stewart', 'Chung', 'Fonseca', 'Ahmann', 'Vanwier', 'Perkins', 'Carpten', 'Lacy', 'Azorsa'] | ['Bergsagel', 'Chng', 'Greipp', 'Chesi', 'Price-Troska', 'Bruhn', 'Tiedemann', 'Brents', 'Schop', 'Dispenzieri', 'Sebag', 'Braggio', 'Keats', 'Carpten', 'Bryant', 'Barrett', 'Kumar', 'Zhu', 'Baker', 'Mancini', 'Henry', 'Shi', 'Mulligan', 'Stewart', 'Valdez', 'Fonseca', 'Ahmann', 'Fogle', 'Trent', 'Van'] | ['Kumar', 'Chng', 'Bergsagel', 'Fonseca', 'Ahmann', 'Braggio', 'Greipp', 'Price-Troska', 'Tiedemann', 'Carpten', 'Shi', 'Zhu', 'Mulligan', 'Dispenzieri', 'Bryant', 'Stewart'] | Cancer Cell | 2007 | 2007 Aug | 0 | letions observed in this patient sample. Interestingly, inactivation of TRAF3, cIAP1/2, and CYLD appears to be a common event in MM, as we identified a number of potential bi-allelic deletions in the publicly available dataset from Carrasco et al., based on a correlation between the aCGH copy number and the expression level ( Figure S4 )( Carrasco et al., 2006 ). Combining these two patient datasets did n|was determined by Annexin-V-Alexa 647 (Invitrogen) staining. Electronic availability of the Data A complete list of the patient samples and various tests are provided in Supplementary Table S6 . The publicly available datasets used in this paper are available in gene expression omnibus (GEO) or institutional websites as follows, Carrasco et al. GEP ( GSE4452 ) and aCGH ( http://genomic.dfci.harvard.edu/a he GEP data from our patient dataset is available in the GEO database under the accession number GSE6477{{tag}}--REUSE--. | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
489 | GSE6481 | 4/1/2007 | ['6481'] | ['2736'] | [u'17464315'] | 2779485 | [u'19956606'] | ['Kawai', 'Nemoto', 'Ichikawa', 'Toyama', 'Seki', 'Ohta', 'Hasegawa', 'Nakayama', 'Yoshida', 'Takahashi'] | ['Kirsch', 'Yoon', 'Jacks', 'Dodd', 'Riedel', 'Lahat', 'Lazar', 'Mito', 'Brigman', 'Lev', 'Mukherjee', 'Eward', 'Hornicek', 'Stangenberg'] | [] | PLoS One | 2009 | 11/30/2009 | 0 | AND pmc_gds | 0 | 1 | ||||
490 | GSE6482 | 1/1/2007 | ['6482'] | [] | [u'17349582'] | 2180156 | [u'17349582'] | ['Gao', 'La', 'Dittmer', 'Mesri', 'Asgari', 'Rafii', 'Eroles', 'Chiozzini', 'Mutlu', 'Hooper', 'Vincent', 'Hilsher', 'Cavallin', 'Duran'] | ['Gao', 'La', 'Dittmer', 'Mesri', 'Asgari', 'Rafii', 'Eroles', 'Chiozzini', 'Mutlu', 'Hooper', 'Vincent', 'Hilsher', 'Cavallin', 'Duran'] | ['Gao', 'La', 'Dittmer', 'Mesri', 'Rafii', 'Mutlu', 'Chiozzini', 'Duran', 'Eroles', 'Hooper', 'Vincent', 'Hilsher', 'Cavallin', 'Asgari'] | Cancer Cell | 2007 | 2007 Mar | 0 | AND pmc_gds | 1 | 0 | ||||
491 | GSE6488 | 4/30/2007 | ['6488'] | ['2685'] | [u'17369330'] | 1907119 | [u'17369330'] | ['Arp', '', 'Gelfand', 'Permina', 'Bottomley', 'Gvakharia', 'Sayavedra-Soto'] | ['Arp', 'Gelfand', 'Permina', 'Bottomley', 'Gvakharia', 'Sayavedra-Soto'] | ['Arp', 'Gelfand', 'Permina', 'Bottomley', 'Gvakharia', 'Sayavedra-Soto'] | Appl Environ Microbiol | 2007 | 2007 May | 0 | AND pmc_gds | 1 | 0 | ||||
492 | GSE6497 | 4/1/2007 | ['6497'] | [] | [u'18261221'] | 2391056 | [u'18216318'] | ['Lang', 'Salomon', 'M\xc3\xbcller-Tidow', 'Schlatter', 'Edemir', 'Eisenacher', 'Gabri\xc3\xabls', 'Kurian'] | ['Schr\xc3\xb6ter', 'Schlatter', 'Edemir', 'Reuter', 'Gabri\xc3\xabls', 'Borgulya', 'Neugebauer'] | ['Edemir', 'Gabri\xc3\xabls', 'Schlatter'] | J Am Soc Nephrol | 2008 | 2008 Mar | 0 | ere processed as per manufacturer's instructions ( http://www.affymetrix.com ). Data were analyzed using Affymetrix GCOS array analysis software. All data have been deposited in NCBIs Gene Expression Omnibus ( http://www.ncbi.nlm.nih.gov/geo ) and are accessible through GEO series accession number GSE6497{{tag}}--DEPOSIT-- . Identification of Differentially Expressed Genes Significant changes in the gene expression after | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
493 | GSE6497 | 4/1/2007 | ['6497'] | [] | [u'18261221'] | 2262896 | [u'18261221'] | ['Lang', 'Salomon', 'M\xc3\xbcller-Tidow', 'Schlatter', 'Edemir', 'Eisenacher', 'Gabri\xc3\xabls', 'Kurian'] | ['Lang', 'Salomon', 'M\xc3\xbcller-Tidow', 'Schlatter', 'Edemir', 'Eisenacher', 'Gabri\xc3\xabls', 'Kurian'] | ['Lang', 'Salomon', 'M\xc3\xbcller-Tidow', 'Schlatter', 'Edemir', 'Eisenacher', 'Gabri\xc3\xabls', 'Kurian'] | BMC Genomics | 2008 | 2/8/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
494 | GSE6503 | 6/1/2007 | ['6503'] | [] | [u'17676974'] | 1925137 | [u'17676974'] | ['', 'Gatza', 'Shaw', 'Fisk', 'Donehower', 'Chambers', 'Goodell'] | ['Gatza', 'Shaw', 'Fisk', 'Donehower', 'Chambers', 'Goodell'] | ['Gatza', 'Shaw', 'Fisk', 'Donehower', 'Chambers', 'Goodell'] | PLoS Biol | 2007 | 2007 Aug | 0 | AND pmc_gds | 1 | 0 | ||||
495 | GSE6506 | 11/21/2007 | ['6506'] | [] | [u'18371395'] | 2475548 | [u'18371395'] | ['Merchant', 'Weksberg', 'Sirin', 'Bowman', 'Bradfute', 'Lin', 'Shaw', 'Boles', 'Fisk', 'Chen', 'Chambers', 'Goodell', 'Tierney'] | ['Merchant', 'Weksberg', 'Sirin', 'Bowman', 'Bradfute', 'Lin', 'Shaw', 'Boles', 'Fisk', 'Chen', 'Chambers', 'Goodell', 'Tierney'] | ['Merchant', 'Weksberg', 'Sirin', 'Bowman', 'Bradfute', 'Lin', 'Shaw', 'Boles', 'Fisk', 'Chen', 'Chambers', 'Goodell', 'Tierney'] | Cell Stem Cell | 2007 | 2007 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
496 | GSE6507 | 12/11/2007 | ['6507'] | [] | [] | 2745168 | [u'19763250'] | [u'Smith-Richards', u'Kumar'] | ['Richards', 'Kumar'] | ['Kumar'] | J Nutrigenet Nutrigenomics | 2008 | 6/1/2008 | 0 | are v16.0 (Spotfire, Somerville, MA). Fold changes were calculated for each gene using the normalized signal values. Hybridization data and parameter information were deposited in the Gene Expression Omnibus (GEO) database ( http://www.ncbi.nlm.nih.gov/geo/ ) under the GEO series accession number GSE6507{{tag}}--DEPOSIT-- . Bioinformatics approach to candidate gene identification The Protein ANalysis THrough Evolutionary | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
497 | GSE6509 | 3/22/2007 | ['6509'] | [] | [u'17375196'] | 2949890 | [u'20831831'] | ['Glezer', 'Chernomoretz', 'Rivest', 'Plante', 'David'] | ['Monsonego', 'Rubin', 'Geifman'] | [] | BMC Bioinformatics | 2010 | 9/12/2010 | 0 | ound 0.456971* 0.505641 External side of plasma membrane 0.744269 0.508488* Not Found (A) Enrichment in genes with up-regulated expression in hippocampus treated with Fluoxetine, based on GEO profile GSE6476. All terms passed FDR > 0.25 and state the FDR values. (B) GEO profile GSE6675. All terms passed FDR < 0.25 and state the FDR values (Up-regulated in Control in comparison to FGF2 t|ture. † indicates terms that were not included in the NIGO subset. For three GEO expression profiles, the results of the analysis are described in detail (Table 1 ). In the analysis of the GSE6476, in which the effect of chronic Fluoxetine treatment on hippocampal gene expression was examined [ 10 ], eleven terms passed statistic filtering with NIGO and not with the full GO, out of which 7 w|ded in the full GO but not in NIGO passed the statistic cutoffs but this term ('Protein self-association') was functionally irrelevant and contributed very little to the analysis. For the GEO profile GSE6675, in which astroglial gene expression program elicited by fibroblast growth factor-2 was examined, four terms relevant to neural/immune systems passed statistic filtering with NIGO and not with the |ith the full GO. This is an example of how without the use of NIGO one would have to raise the cutoff to at least 0.55 in order for this term to appear in the analysis results. Functional analysis of GSE6509{{tag}}--REUSE-- with NIGO also revealed statistically significant terms that were missed when using the full GO. 
This experiment involved microarray expression profiling designed to explore the effect of RU486, a |ed both in NIGO and the full GO, received higher FDR values when NIGO was used (Table 2 ), contrary to our expectations from the impact of reducing the ontology size. For example, in the analysis of GSE8788 in which gene expression analysis was conducted using Trib1-deficient macrophages treated with LPS as compared to LPS-treated wild-type macrophages [ 12 ], 11 out of 14 terms (that passed statistic|ment in FDR values when using NIGO is partially due to the stochastic nature of the GSEA algorithm. To test this hypothesis, the same analysis was repeated three times with each of the ontologies for GSE8788. FDR values were averaged and a comparison of analysis results was performed based on these averaged FDR values. In accordance with our hypothesis, the averaging of FDR values improved the apparent|l GO and GO-slim in enrichment analysis using GSEA Unique to Subset Lowest FDR Value Experiment Cutoff used Full GO NIGO Generic GO slim Full GO NIGO Generic GO slim Neural- or immune-related studies GSE6509{{tag}}--REUSE-- P < 0.05 2 (2) 5 0 0 2 0 GSE6675 FDR < 0.25 0 6 0 0 1 0 GSE6476 FDR < 0.25 1 (1) 11 0 0 4 0 GSE3779 P < 0.05 7 (3) 13 1 (1) 0 1 5 GSE8425 P < 0.05 6 (1) 0 0 | 0 0 0 1 GSE8788 P < 0.01 6 (6) 0 0 11 3 1 GSE8788* P < 0.01 6 (6) 0 0 3 11 1 GSE9659 P < 0.01 6 (6) 0 NS A 4 28 NS A Non-neural- or immune-related studies (negative controls) GSE7407 P < 0.01 5 (5) 0 0 0 1 2 GSE2259 FDR < 0.25 3 (1) 0 1 (1) 1 2 1 GSE8191 P < 0.05 5 (2) 0 0 0 0 0 The enrichment analysis results of GSEA, providing the full GO, NIGO or GO-s| for each subset given in separate columns. *To test the effect of stochasticity, the analysis of this profile is based on 3-fold averaged FDR values from three independent GSEA analyses. A Profile GSE9659 involves rat tissue, and was not analyzed with the mouse GO-slim. 
Three non-neural- or immune-related datasets were used to test the performance of NIGO in functional analysis of microarray data in|. These datasets included the GEO expression profile, GSE7407, in which gene expression in heart tissue with cardiac specific over-expression of Sirt1 was examined [ 13 ], the GEO expression profile, GSE8191, in which the gene expression profile of mammary glands from pregnant mice was compared to that of mammary glands from lactating mice [ 14 ], and the GEO expression profile, GSE2259, in which gene |in enrichment analysis using the Fisher Exact Test Unique to Subset Lowest FDR Value Experiment Cutoff used Full GO NIGO Generic GO slim Full GO NIGO Generic GO slim Neural- or immune-related studies GSE6509{{tag}}--REUSE-- P < 0.0001 1 (1) 1 0 0 22 2 GSE6675 P < 0.05 1 (1) 0 11 (1) 0 2 0 GSE6476 P < 0.0001 0 2 2 0 15 3 GSE3779 P < 0.1 0 0 0 0 2 0 GSE8425 P < 0.1 0 0 0 0 0 0 GSE|0.001 0 2 0 0 9 1 GSE6136 P < 0.000001 0 1 0 0 37 7 GSE8788 P < 0.01 0 2 0 0 4 1 GSE9659 P < 0.001 0 3 NS A 0 33 NS A Non-neural- or immune-related studies (negative controls) GSE7407 P < 0.0000000001 33 (30) 0 0 9 0 0 GSE2259 P < 0.1 63 (35) 0 2 8 8 7 GSE8191 P < 0.001 6 (6) 7 6 (2) 0 16 3 Enrichment analysis results from Ontologizer, providing the full |ional GO subset (the full GO or generic slim) are also shown for each term ('Lowest P-Value'), with the number of terms with the lowest P-values for each subset given in separate columns. A Profile GSE9659 involves rat tissue, and was not analyzed with the mouse GO-slim. These results, together with the analysis of five additional neural/immune-related experiments not described in Table 1 (namely p| cell's activity. These results further demonstrate that analysis with NIGO can enhance interpretation of functional analysis results produced for relevant microarray datasets. 
In the analysis of the GSE6509{{tag}}--REUSE-- expression dataset, three relevant terms passed the statistical cutoff with NIGO but not with the full GO. These terms, 'viral envelope', 'viral infectious cycle' and 'viral capsid' are all terms r|et. Such terms received FDR or p-values that were very close (but larger) than the cutoff values used. This is partially explained by the stochastic nature of the GSEA algorithm. Indeed, for one set (GSE8788), we compared the raw results with averaged FDR values. Averaging dramatically decreased the number of such terms. Furthermore, we conducted a similar functional analysis using the Fisher Exact Tes|nous peptide antigen via MHC class I' and 'Positive regulation of T cell mediated cytotoxicity' without defeating the purpose of the slimming process. Yet these two terms were found to be enriched in GSE6476, and are crucial for generating a hypothesis based on the expression profile. This shows that GO slims may be complemented by small, yet fully detailed domain-specific subsets of GO. Figure 4 The U|plementary/NIGO/Supplementary.html . The ontologies were clipped using the Protégé 4.0 beta OWL editor [ 26 ]. For by-species filtering, annotation files for human, rat and mouse were downloaded (October 2008) from GOA-EBI [ 27 ]. Association files used for GSEA analysis were generated based on the GOA-EBI annotation files and in the format required by GSEA. In this format, each row repres|a analysis and functional analysis were conducted using the GenePattern [ 29 ], GSEA [ 9 ] (release 2.5) web servers, and Ontologizer [ 16 ] as follows: (1) for each study, raw data (.CEL files) were downloaded from GEO [ 30 ]; (2) expression files (.gct files) were created using the Gene Pattern Expression File Creator module; (3) where necessary (i.e. for expression files GSE6509{{tag}}--REUSE--, GSE6675, GSE6476, GSE7|was conducted using the GSEA module. GSEA was run three times for each dataset, using a different GO version for each run. 
For the full GO, we used the organism-specific GO subset. In the analysis of GSE8788, GSEA was run three times for each of the three ontologies and FDR values were averaged over the three runs. (5) Differentially-expressed genes were found using the Gene Pattern ComparativeMarkerSe|, to an input term (Figure 4 ). The connections of UMLS concepts to the tested GO term are provided by UMLS and defined within the UMLS data files. Microarray data sets All microarray data sets were downloaded from GEO at NCBI [ 30 ]. The GEO sets used in this study are described in Additional file 2 . Availability NIGO is freely available as Additional file 6 and for download from: http://bioinfo.bg|re included or excluded from NIGO. Click here for file Additional file 2 Microarray Datasets for Comparative analysis of NIGO, GO and GO-slim . This file contains a summary of the microarray datasets downloaded from GEO at NCBI and used to test the performance of NIGO. Click here for file Additional file 3 GSEA analysis results . This file contains a summary of the GSEA analysis results for each of the mi|etroviral infectivity through different mechanisms Am J Physiol Lung Cell Mol Physiol 2009 297 L538 545 10.1152/ajplung.00162.2009 19561138 The Gene Ontology Consortium http://www.geneontology.org/GO.downloads.ontology.shtml Web Ontology Language Guide http://www.w3.org/TR/owl-guide Protégé 4.0 beta http://protege.stanford.edu Barrell D Dimmer E Huntley RP Binns D O'Donovan C Apweiler R Th | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
498 | GSE6509 | 3/22/2007 | ['6509'] | [] | [u'17375196'] | 1819560 | [u'17375196'] | ['Glezer', 'Chernomoretz', 'Rivest', 'Plante', 'David'] | ['Glezer', 'Chernomoretz', 'Rivest', 'Plante', 'David'] | ['Glezer', 'Chernomoretz', 'Rivest', 'Plante', 'David'] | PLoS One | 2007 | 3/21/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
499 | GSE6510 | 3/1/2007 | ['6510'] | [] | [u'17337630'] | 1867377 | [u'17337630'] | ['Figueroa', 'Tongprasit', 'Gao', 'Zhao', 'Stolc', 'He', 'Deng', 'Lee'] | ['Figueroa', 'Tongprasit', 'Gao', 'Zhao', 'Stolc', 'He', 'Deng', 'Lee'] | ['Figueroa', 'Tongprasit', 'Gao', 'Zhao', 'Stolc', 'He', 'Deng', 'Lee'] | Plant Cell | 2007 | 2007 Mar | 0 | AND pmc_gds | 1 | 0 | ||||
500 | GSE6514 | 2/11/2007 | ['6514'] | ['2882'] | [u'17698924'] | 3022651 | [u'21187013'] | ['Mackiewicz', 'Jensen', 'Zimmerman', 'Galante', 'Romer', 'Churchill', 'Shockley', 'Naidoo', 'Baldwin', 'Pack'] | ['Withers', 'Thomason', 'Scott', 'Li'] | [] | BMC Syst Biol | 2010 | 12/27/2010 | 0 | orithm. This dataset has been deposited in EBI ArrayExpress Database with accession number E-BAIR-12 http://www.ebi.ac.uk/microarray-as/ae/ . The second dataset was obtained from the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/ , accession ID: GSE4671 ), which contains 28 Affymetrix mouse genome 4302 chips. Nine-week old male mice in this experiment were fed with either a control cho|7), retroperitoneal WAT was obtained from two control mice and two test mice. Microarray data were processed using GCRMA algorithm. The third dataset was also acquired from GEO with accession number GSE8831 , in which 20 female and 15 male C57BL/6J mice, fed by ad libitum , varied in body weight and insulin sensitivity were studied. Fasting blood glucose and serum insulin concentrations were measured|15 genes and fewer than 500 genes; the results returned were significant in the above four datasets as well as an additional dataset which included 45 sleep mice model with hypothalamus tissue (GEO: GSE6514{{tag}}--REUSE-- ). We then worked on those canonical pathway sets which passed the same filter only on BAIR fat-fed mice and human fat dataset about 20 lean and 19 obese person[ 22 ]. Protein-protein interaction n| Nnat include Aqp1, Sncg, Sulf2 and Cxcl9 . Additional file 1 , Figure S3 illustrates the expressions of the top four most correlated genes ( Gstt1, Ccdc80, Hfe Sod3 ) with Nnat in BAIR and GSE6571 datasets. Among these genes is only one transcription factor, Ebf1 which has previously been suggested to have a regulatory role in adipogenesis[ 24 ]. Another of the Nnat -correlated genes, L | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
501 | GSE6517 | 3/1/2007 | ['6517'] | [] | [u'17337630'] | 1867377 | [u'17337630'] | ['Figueroa', 'Tongprasit', 'Gao', 'Zhao', 'Stolc', 'He', 'Deng', 'Lee'] | ['Figueroa', 'Tongprasit', 'Gao', 'Zhao', 'Stolc', 'He', 'Deng', 'Lee'] | ['Figueroa', 'Tongprasit', 'Gao', 'Zhao', 'Stolc', 'He', 'Deng', 'Lee'] | Plant Cell | 2007 | 2007 Mar | 0 | AND pmc_gds | 1 | 0 | ||||
502 | GSE6519 | 3/10/2007 | ['6519'] | ['2652'] | [u'17426818'] | 1847718 | [u'17426818'] | ['Brenna', 'Kothapalli', 'Anthony', 'Hsieh', 'Nathanielsz', 'Pan'] | ['Brenna', 'Kothapalli', 'Anthony', 'Hsieh', 'Nathanielsz', 'Pan'] | ['Brenna', 'Kothapalli', 'Anthony', 'Hsieh', 'Nathanielsz', 'Pan'] | PLoS One | 2007 | 4/11/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
503 | GSE6523 | 3/5/2007 | ['6523'] | [] | [u'17304213'] | 1817633 | [u'17304213'] | ['Yamada', 'Shirahige', 'Nakagawa', 'Katou', 'Itoh', 'Masukata', 'Hayashi', u'Itou', 'Tazumi', 'Takahashi'] | ['Yamada', 'Shirahige', 'Nakagawa', 'Katou', 'Itoh', 'Masukata', 'Hayashi', 'Tazumi', 'Takahashi'] | ['Yamada', 'Shirahige', 'Nakagawa', 'Katou', 'Itoh', 'Masukata', 'Hayashi', 'Tazumi', 'Takahashi'] | EMBO J | 2007 | 3/7/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
504 | GSE6532 | 3/1/2007 | ['6532'] | [] | [u'17401012', u'18498629', u'20479250'] | 2972666 | [u'20975711'] | ['Gonzalez-Angulo', 'Larsimont', 'Ellis', 'Loi', 'Mills', 'Hennessy', 'Bardelli', 'Majjaj', 'Buyse', 'Foekens', 'Durbecq', 'Pusztai', 'Sotiriou', 'Haibe-Kains', 'Desmedt', 'Harris', 'Lallemand', 'Tutt', 'Gillett', 'Berns', 'Ryder', 'Bergh', 'Reid', 'Piccart', 'Delorenzi', 'Bontempi', 'Jansen', 'Wirapati', 'Daidone', 'Pierotti', 'Phillips', 'Klijn', 'Symmans', 'McArthur', 'Gillet', 'Speed'] | ['Wang', 'Li', "O'Connor-McCourt", 'Purisima', 'Collins', 'Deng', 'Lenferink', 'Cui'] | [] | Nat Commun | 2010 | 7/13/2010 | 0 | The breast cancer microarray data sets used were from Wang et al.15 (Wang cohort or data set, Affymetrix arrays), Chang et al.6 (Chang cohort or data set, cDNA arrays), van 't Veer et al.5 (van 't Veer cohort or data set, cDNA arrays), Miller et al.21 (Miller cohort or data set, Affymetrix arrays) and from several other Affymetrix array data sets with the following NCBI GEO (http://www.ncbi.nlm.nih.gov/geo/) IDs:GSE11121, GSE1456, GSE6532{{key}}--REUSE--, GSE9151, GSE7378 and GSE12093. | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
505 | GSE6532 | 3/1/2007 | ['6532'] | [] | [u'17401012', u'18498629', u'20479250'] | 2906483 | [u'20584321'] | ['Gonzalez-Angulo', 'Larsimont', 'Ellis', 'Loi', 'Mills', 'Hennessy', 'Bardelli', 'Majjaj', 'Buyse', 'Foekens', 'Durbecq', 'Pusztai', 'Sotiriou', 'Haibe-Kains', 'Desmedt', 'Harris', 'Lallemand', 'Tutt', 'Gillett', 'Berns', 'Ryder', 'Bergh', 'Reid', 'Piccart', 'Delorenzi', 'Bontempi', 'Jansen', 'Wirapati', 'Daidone', 'Pierotti', 'Phillips', 'Klijn', 'Symmans', 'McArthur', 'Gillet', 'Speed'] | ['San', 'Feliu', 'S\xc3\xa1nchez-Navarro', 'Hardisson', 'Redondo', 'Espinosa', 'Gonz\xc3\xa1lez', 'G\xc3\xa1mez-Pozo', 'L\xc3\xb3pez', 'Cejas', 'Zamora', 'Pinto', 'Madero', 'Angel'] | [] | BMC Cancer | 2010 | 6/28/2010 | 0 | independent databases that are available online were used as validation sets: 1) NKI [ 8 ], downloaded from the Rosetta Inpharmatics Web page http://www.rii.com/publications/2002/nejm.html , 2) SWE (GSE1456)[ 20 ], 3) UPP (GSE4922)[ 21 ], and 4) LOI (GSE6532{{tag}}--REUSE--) [ 22 ], downloaded from the NCBI GEO data repository http://www.ncbi.nlm.nih.gov/projects/geo/index.cgi . To apply our qRT-PCR reduced gene sco|ing distant metastasis-free survival (DMFS) for patients included in four databases available online . We subsequently analyzed the performance of the 8-gene Score in three additional data sets: SWE (GSE1456)[ 20 ], UPP (GSE4922)[ 21 ], and LOI (GSE6532{{tag}}--REUSE--)[ 22 ]. Although the use of other external databases does not constitute a formal validation, it may provide insight about the performance of the gene | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
506 | GSE6532 | 3/1/2007 | ['6532'] | [] | [u'17401012', u'18498629', u'20479250'] | 2917039 | [u'20584310'] | ['Gonzalez-Angulo', 'Larsimont', 'Ellis', 'Loi', 'Mills', 'Hennessy', 'Bardelli', 'Majjaj', 'Buyse', 'Foekens', 'Durbecq', 'Pusztai', 'Sotiriou', 'Haibe-Kains', 'Desmedt', 'Harris', 'Lallemand', 'Tutt', 'Gillett', 'Berns', 'Ryder', 'Bergh', 'Reid', 'Piccart', 'Delorenzi', 'Bontempi', 'Jansen', 'Wirapati', 'Daidone', 'Pierotti', 'Phillips', 'Klijn', 'Symmans', 'McArthur', 'Gillet', 'Speed'] | ['Baumbach', 'Gehrmann', 'Schormann', 'Maccoux', 'Sickmann', 'Freis', 'Ickstadt', 'Hengstler', 'Franckenstein', 'Wilhelm', 'Geppert', 'Rahnenf\xc3\xbchrer', 'Cadenas', 'Schug', 'Schumann', 'Schmidt', 'Hermes'] | [] | Breast Cancer Res | 2010 | 2010 | 0 | x HG-U133A array and the GeneChip System as described [ 31 ]. These data have been deposited in the National Center for Biotechnology Information Gene Expression Omnibus (GEO) and are accessible [GEO:GSE11121]. Results obtained from the Mainz cohort were validated in two previously published microarray datasets. Two breast cancer Affymetrix HG-U133A microarray datasets including patient outcome informa|re downloaded from the National Center for Biotechnology Information GEO data repository. The first dataset, the Rotterdam cohort [ 32 ], represents 180 lymph node-negative relapse-free patients [GEO:GSE2034] and 106 lymph node-negative patients that developed a distant metastasis. None of these patients had received systemic neoadjuvant or adjuvant therapy (Rotterdam cohort). The original data were re| dataset, the Transbig cohort, consists of 302 samples from breast cancer patients that remained untreated in the adjuvant setting after surgery [ 33 , 34 ]. GEO sample record numbers of samples [GEO:GSE6532{{tag}}--REUSE--, GEO:GSE7390] used for analysis are listed in the supplementary tables previously published by Schmidt and colleagues [ 31 ]. Raw .cel file data were processed by MAS 5.0 using a TGT of 500. 
Ethica | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
507 | GSE6532 | 3/1/2007 | ['6532'] | [] | [u'17401012', u'18498629', u'20479250'] | 2527336 | [u'18684329'] | ['Gonzalez-Angulo', 'Larsimont', 'Ellis', 'Loi', 'Mills', 'Hennessy', 'Bardelli', 'Majjaj', 'Buyse', 'Foekens', 'Durbecq', 'Pusztai', 'Sotiriou', 'Haibe-Kains', 'Desmedt', 'Harris', 'Lallemand', 'Tutt', 'Gillett', 'Berns', 'Ryder', 'Bergh', 'Reid', 'Piccart', 'Delorenzi', 'Bontempi', 'Jansen', 'Wirapati', 'Daidone', 'Pierotti', 'Phillips', 'Klijn', 'Symmans', 'McArthur', 'Gillet', 'Speed'] | ['Reyal', 'Wessels', 'van', 'Reinders', 'Horlings'] | [] | BMC Genomics | 2008 | 8/6/2008 | 1 | l measured on Human Genome HG U133A Affymetrix arrays and normalized using the same protocol. The datasets were downloaded from NCBI's Gene Expression Omnibus (GEO, ) with the following identifiers; GSE6532{{tag}}--REUSE-- [ 24 ], GSE3494 [ 18 ], GSE1456 [ 23 ], GSE7390 [ 4 ] and GSE5327 [ 22 ]. The Chin et al. [ 25 ] data set was downloaded from ArrayExpress ( , identifier E-TABM-158). To ensure comparability betw | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
508 | GSE6532 | 3/1/2007 | ['6532'] | [] | [u'17401012', u'18498629', u'20479250'] | 2563019 | [u'18803878'] | ['Gonzalez-Angulo', 'Larsimont', 'Ellis', 'Loi', 'Mills', 'Hennessy', 'Bardelli', 'Majjaj', 'Buyse', 'Foekens', 'Durbecq', 'Pusztai', 'Sotiriou', 'Haibe-Kains', 'Desmedt', 'Harris', 'Lallemand', 'Tutt', 'Gillett', 'Berns', 'Ryder', 'Bergh', 'Reid', 'Piccart', 'Delorenzi', 'Bontempi', 'Jansen', 'Wirapati', 'Daidone', 'Pierotti', 'Phillips', 'Klijn', 'Symmans', 'McArthur', 'Gillet', 'Speed'] | ['Sims', 'Pepper', 'Miller', 'Clarke', 'Hey', 'Okoniewski', 'Howell', 'Smethurst'] | [] | BMC Med Genomics | 2008 | 9/21/2008 | 1 | ed in this study. Datasets No. Tumours Array express/GEO ID GeneChip ER+ Age Tumour Size (cm) FU (years) Reference Chin et al. 2006 114 E-TABM U133AA 67% 51 2.3 6.1 [ 16 ] Desmedt et al. 2007 198 GSE7390 U133A 68% 47 2.0 13.6 [ 17 ] Farmer et al 2005 49 GSE1561 U133A 58% - - - [ 11 ] Ivshina et al. 2006 249 GSE4922 U133A 85% 63 2.0 9.9 [ 18 ] Loi et al. 2007 119, 87 GSE6532{{tag}}--REUSE-- U133A, U133 plus2.|t al. 2007 58 GSE5327 U133A 0% - - 7.2 [ 33 ] Pawitan et al. 2005 159 GSE1456 U133A 83% 58 $ 2.2 $ 7.1 [ 19 ] Richardson et al. 40 GSE3744 U133 plus2.0 38% - - - [ 10 ] Sotiriou et al. 2006 101* GSE2990 U133A 71% 60 2.0 5.8 [ 20 ] Wang et al. 2005 286 GSE2034 U133A 73% 52 - 7.2 [ 52 ] Continuous variables (age, size and follow up) are given as median values, except where indicated $ the mean | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
509 | GSE6532 | 3/1/2007 | ['6532'] | [] | [u'17401012', u'18498629', u'20479250'] | 2689133 | [u'19451693'] | ['Gonzalez-Angulo', 'Larsimont', 'Ellis', 'Loi', 'Mills', 'Hennessy', 'Bardelli', 'Majjaj', 'Buyse', 'Foekens', 'Durbecq', 'Pusztai', 'Sotiriou', 'Haibe-Kains', 'Desmedt', 'Harris', 'Lallemand', 'Tutt', 'Gillett', 'Berns', 'Ryder', 'Bergh', 'Reid', 'Piccart', 'Delorenzi', 'Bontempi', 'Jansen', 'Wirapati', 'Daidone', 'Pierotti', 'Phillips', 'Klijn', 'Symmans', 'McArthur', 'Gillet', 'Speed'] | ['Parker', 'Bierie', 'Chung', 'Shyr', 'Cheng', 'Stover', 'Moses', 'Aakre', 'Chytil'] | [] | J Clin Invest | 2009 | 2009 Jun | 1 | cancer tissues with well-documented clinical data related to tumor size, LN involvement, ER status, treatment regimen, and time of relapse detection over a 10-year period if present (Gene Expression Omnibus ID: GSE10886 , GSE4922 , GSE6532{{tag}}--REUSE-- , and GSE2845 ) ( 22 – 28 ). Using the clinical data and gene expression profiles represented by these 4 data sets, we have been able to determine that th|ression signatures with human gene profiling and clinical status data were performed using normalized data representing 1,319 patients from 4 independent, previously reported studies (Gene Expression Omnibus ID: GSE10886 , GSE4922 , GSE6532{{tag}}--REUSE-- , and GSE2845 ) ( 22 – 28 ). Probes were median centered across each dataset to minimize platform effects. Gene symbols were assigned using the manufactur | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
510 | GSE6532 | 3/1/2007 | ['6532'] | [] | [u'17401012', u'18498629', u'20479250'] | 2689196 | [u'19445687'] | ['Gonzalez-Angulo', 'Larsimont', 'Ellis', 'Loi', 'Mills', 'Hennessy', 'Bardelli', 'Majjaj', 'Buyse', 'Foekens', 'Durbecq', 'Pusztai', 'Sotiriou', 'Haibe-Kains', 'Desmedt', 'Harris', 'Lallemand', 'Tutt', 'Gillett', 'Berns', 'Ryder', 'Bergh', 'Reid', 'Piccart', 'Delorenzi', 'Bontempi', 'Jansen', 'Wirapati', 'Daidone', 'Pierotti', 'Phillips', 'Klijn', 'Symmans', 'McArthur', 'Gillet', 'Speed'] | ['Kim'] | [] | BMC Bioinformatics | 2009 | 5/16/2009 | 1 | -centered and pooled into a single data set of 1,418 samples. Each color above the heatmap represents each data set. Table 1 Data sets analyzed in this study Data set Total ER+ ER- Survival Reference GSE1456 159 99 40 RFS [ 13 ] GSE2603 82 57 42 DMFS [ 14 ] GSE3494 236 213 34 DMFS [ 15 ] GSE6532{{tag}}--REUSE-- 306 262 45 DMFS [ 16 , 17 ] GSE7378 54 54 0 DMFS [ 18 ] GSE7390 198 134 64 DMFS [ 19 ] GSE11121 129 200 0 RF | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
511 | GSE6532 | 3/1/2007 | ['6532'] | [] | [u'17401012', u'18498629', u'20479250'] | 2689870 | [u'19393097'] | ['Gonzalez-Angulo', 'Larsimont', 'Ellis', 'Loi', 'Mills', 'Hennessy', 'Bardelli', 'Majjaj', 'Buyse', 'Foekens', 'Durbecq', 'Pusztai', 'Sotiriou', 'Haibe-Kains', 'Desmedt', 'Harris', 'Lallemand', 'Tutt', 'Gillett', 'Berns', 'Ryder', 'Bergh', 'Reid', 'Piccart', 'Delorenzi', 'Bontempi', 'Jansen', 'Wirapati', 'Daidone', 'Pierotti', 'Phillips', 'Klijn', 'Symmans', 'McArthur', 'Gillet', 'Speed'] | ['Nakai', 'Mizuno', 'Kitada', 'Sarai'] | [] | BMC Med Genomics | 2009 | 4/24/2009 | 1 | ccess and analyze without an effective analysis platform. Description To take advantage of public resources in full, a database named "PrognoScan" has been developed. This is 1) a large collection of publicly available cancer microarray datasets with clinical annotation, as well as 2) a tool for assessing the biological relationship between gene expression and prognosis. PrognoScan employs the minimum P |-analysis of multiple datasets. Conclusion PrognoScan provides a powerful platform for evaluating potential tumor markers and therapeutic targets and would accelerate cancer research. The database is publicly accessible at . Background A number of genes are recognized as being potentially relevant to cancers. One way to evaluate such genes is to assess their relationship to prognosis. At present, many ca|transformation [ 6 ], and MYC for tumor maintenance [ 7 ], and provided a rationale for the application to gene expression. Thus, we developed "PrognoScan", a database featuring a large collection of publicly available cancer microarray datasets with clinical annotation and a tool for assessing the relationship between gene expression and prognosis using the minimum P -value approach. This database enabl|ll accelerate cancer research. 
Construction and content Data collection Cancer microarray datasets with clinical annotation were intensively collected from the public domain including Gene Expression Omnibus (GEO) [ 8 ], ArrayExpress [ 9 ] and individual laboratory web sites, under the following criteria: 1) includes patient information on survival event and time, 2) contains large enough sample sizes to|as possible. All tables were relationally linked and stored in the MySQL server. Table 1 Dataset content from PrognoScan Dataset Cancer type Subtype Cohort Author/Contributor Array type n Data source GSE13507 Bladder cancer Transitional cell carcinoma Cheongju Kim Human-6 v2 n = 165 GEO GSE5287 Bladder cancer Aarhus (1995–2004) Als et al . [ 10 ] HG-U133A n = 30 GEO GSE12417-GPL570 Blood cance|-GPL96 Blood cancer AML AMLCG (1999–2003) Metzeler et al . [ 11 ] HG-U133A n = 163 GEO GSE12417-GPL97 Blood cancer AML AMLCG (1999–2003) Metzeler et al . [ 11 ] HG-U133B n = 163 GEO GSE8970 Blood cancer AML San Diego Raponi et al . [ 12 ] HG-U133A n = 34 GEO GSE4475 Blood cancer B-cell lymphoma Berlin (2003–2005) Hummel et al . [ 13 ] HG-U133A n = 158 GEO E-TABM-346 Blood ca|E2658 Blood cancer Multiple myeloma Arkansas Zhan et al . [ 15 ] HG-U133_Plus_2 n = 559 GEO E-TABM-158 Breast cancer UCSF, CPMC (1989–1997) Chin et al . [ 16 ] HG-U133A n = 129 ArrayExpress GSE11121 Breast cancer Mainz (1988–1998) Schmidt et al . [ 17 ] HG-U133A n = 200 GEO GSE1378 Breast cancer MGH (1987–2000) Ma et al . [ 18 ] Arcturus 22 k n = 60 GEO GSE1379 Breast cancer|6-GPL96 Breast cancer Stockholm (1994–1996) Pawitan et al . [ 19 ] HG-U133A n = 159 GEO GSE1456-GPL97 Breast cancer Stockholm (1994–1996) Pawitan et al . [ 19 ] HG-U133B n = 159 GEO GSE2034 Breast cancer Rotterdam (1980–1995) Wang et al . [ 20 ] HG-U133A n = 286 GEO GSE2990 Breast cancer Uppsala, Oxford Sotiriou et al . [ 21 ] HG-U133A n = 187 GEO GSE3143 Breast cancer Duke |GSE3494-GPL96 Breast cancer Uppsala (1987–1989) Miller et al . 
[ 23 ] HG-U133A n = 236 GEO GSE3494-GPL97 Breast cancer Uppsala (1987–1989) Miller et al . [ 23 ] HG-U133B n = 236 GEO GSE4922-GPL96 Breast cancer Uppsala (1987–1989) Ivshina et al . [ 24 ] HG-U133A n = 249 GEO GSE4922-GPL97 Breast cancer Uppsala (1987–1989) Ivshina et al . [ 24 ] HG-U133B n = 249 GEO GSE|SE7378 Breast cancer UCSF Zhou et al . [ 26 ] U133AAofAv2 n = 54 GEO GSE7390 Breast cancer Uppsala, Oxford, Stockholm, IGR, GUYT, CRH (1980–1998) Desmedt et al . [ 27 ] HG-U133A n = 198 GEO GSE7849 Breast cancer Duke (1990–2001) Anders et al . [ 28 ] HG-U95A n = 76 GEO GSE9195 Breast cancer GUYT2 Loi et al . [ 25 ] HG-U133_Plus_2 n = 77 GEO GSE9893 Breast cancer Montpellier, Bordeau|man 21 K V12.0 n = 155 GEO GSE11595 Esophagus cancer Adenocarcinoma Sutton Giddings CRUKDMF_22 K_v1.0.0 n = 34 GEO GSE7696 Glioma Glioblastoma Lausanne Murat et al . [ 30 ] HG-U133_Plus_2 n = 70 GEO GSE4271-GPL96 Glioma MDA Phillips et al . [ 31 ] HG-U133A n = 77 GEO GSE4271-GPL97 Glioma MDA Phillips et al . [ 31 ] HG-U133B n = 77 GEO GSE2837 Head and neck cancer Squamous cell carcinoma VUMC, VAMC, |Adenocarcinoma Harvard Beer et al . [ 33 ] HG-U95A n = 84 Author's web site MICHIGAN-LC Lung cancer Adenocarcinoma Michigan (1994–2000) Beer et al . [ 33 ] HuGeneFL n = 86 Author's web site GSE11117 Lung cancer NSCLC Basel Baty Novachip human 34.5 k n = 41 GEO GSE3141 Lung cancer NSCLC Duke Bild et al . [ 22 ] HG-U133_Plus_2 n = 111 GEO GSE4716-GPL3694 Lung cancer NSCLC Nagoya (1995–|da et al . [ 34 ] GF200 n = 50 GEO GSE4716-GPL3696 Lung cancer NSCLC Nagoya (1995–1996) Tomida et al . [ 34 ] GF201 n = 50 GEO GSE8894 Lung cancer NSCLC Seoul Son HG-U133_Plus_2 n = 138 GEO GSE4573 Lung cancer Squamous cell carcinoma Michigan (1991–2002) Raponi et al . [ 35 ] HG-U133A n = 129 GEO DUKE-OC Ovarian cancer Duke Bild et al . [ 22 ] HG-U133A n = 134 Author's web site GSE8|en in red font. Each dataset has a link to the public domain where the raw data is archived. 
By clicking a probe ID in the summary table, a detailed report for the test is displayed. The table can be downloaded in a tab delimited file from the button at bottom. Figure 2 PrognoScan screenshot and sample search results (part 2) . (A) Annotation table. Row headings are color-coded. For example, headings of d| cancer prognosis and MCTS1 to brain, blood, breast and lung cancer prognosis for the first time. PrognoScan aims to fulfill such substantial practical requirements. Regarding survival analysis using publicly available microarray datasets, several considerations exist: 1) Cohorts. Datasets come from a number of different institutions around the world, and patient backgrounds differ. In addition, several d|subsequent care may affect the clinical course of a patient. 3) Experimental factors. Expression measurement of microarray is subject to various factors at the experiment level. Microdissection (e.g. GSE1378) would reduce contamination of mRNAs from non-cancer cells [ 57 ]. Formalin fixation of a sample (e.g. GSE2873) influences the quality of mRNAs [ 58 ]. Array type (e.g. Affymetrix, cDNA microarrays | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
512 | GSE6532 | 3/1/2007 | ['6532'] | [] | [u'17401012', u'18498629', u'20479250'] | 2718903 | [u'19619298'] | ['Gonzalez-Angulo', 'Larsimont', 'Ellis', 'Loi', 'Mills', 'Hennessy', 'Bardelli', 'Majjaj', 'Buyse', 'Foekens', 'Durbecq', 'Pusztai', 'Sotiriou', 'Haibe-Kains', 'Desmedt', 'Harris', 'Lallemand', 'Tutt', 'Gillett', 'Berns', 'Ryder', 'Bergh', 'Reid', 'Piccart', 'Delorenzi', 'Bontempi', 'Jansen', 'Wirapati', 'Daidone', 'Pierotti', 'Phillips', 'Klijn', 'Symmans', 'McArthur', 'Gillet', 'Speed'] | ['Buechler'] | [] | BMC Cancer | 2009 | 7/20/2009 | 1 | y of questionable benefit. Methods Patient cohorts and data analysis The microarray datasets used here were obtained from the Gene Expression Omnibus http://www.ncbi.nlm.nih.gov/geo/ , specifically, GSE4922 (UPPS), GSE6532{{tag}}--REUSE-- (OXFD, GUYT), GSE7390 (TRANSBIG), GSE9195 (GUYT2), and GSE11121 (MZ). The codes used for the cohorts in this paper are given in parentheses. Two independent cohorts were obtained fr|he combination of OXFT and OFXU from GSE6532{{tag}}. (GSE6532{{tag}} contains additional cohorts, coded KIT and KIU, however these cohorts were excluded since many of the patients in these groups are also found in GSE4922.) None of the patients received adjuvant chemotherapy. A summary of the clinical traits of the patients is found in Table 1 . Complete descriptions can be found in the references at the Gene Expre|us was assessed by different methods across the cohorts. A "+" after the code for a cohort denotes the set of ER+ samples. In all cohorts, the survival endpoint used was distant metastasis, except in GSE4922, in which it was local recurrence or metastasis. Data on metastasis for most of the samples in this cohort are found in GSE6532{{tag}}--REUSE--. All survival data was censored to 10 years so as not to distort the |ix GeneChip platform hgu133a or hgu133plus2 . 
Table 1 Summary of the patient cohorts used in this study Uppsala Transbig Guys 1 Oxford Guys 2 Mainz Code UPPS TRANSBIG GUYT OXFD GUYT2 MZ GEO Series GSE4922 GSE7390 GSE6532{{tag}}--REUSE-- GSE6532{{tag}}--REUSE-- GSE9195 GSE11121 array hgu133a hgu133a hgu133plus2 hgu133a hgu133plus2 hgu133a # samples 249 198 87 178 77 200 # ER+ 200 138 85 144 77 169 LN+/-/? on ER+ 62/132/6 0/138/0 56|onsidering some probes as binary variables. The AP4 test for metastasis in ER+ breast cancer Derivation of the AP4 model The AP4 model is derived with the ER+ samples in two cohorts as training sets, GSE4922 (denoted UPPS+) and GSE7390 (TRANSBIG+). An initial set of 100 significant probes is identified as follows: Working in UPPS+, 100 training sets are selected, each containing 2 / 3 of the samples th | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
513 | GSE6532 | 3/1/2007 | ['6532'] | [] | [u'17401012', u'18498629', u'20479250'] | 2838864 | [u'20184770'] | ['Gonzalez-Angulo', 'Larsimont', 'Ellis', 'Loi', 'Mills', 'Hennessy', 'Bardelli', 'Majjaj', 'Buyse', 'Foekens', 'Durbecq', 'Pusztai', 'Sotiriou', 'Haibe-Kains', 'Desmedt', 'Harris', 'Lallemand', 'Tutt', 'Gillett', 'Berns', 'Ryder', 'Bergh', 'Reid', 'Piccart', 'Delorenzi', 'Bontempi', 'Jansen', 'Wirapati', 'Daidone', 'Pierotti', 'Phillips', 'Klijn', 'Symmans', 'McArthur', 'Gillet', 'Speed'] | ['Dewhirst', 'Garcia-Blanco', 'Pearson', 'Robinson', 'Dinan'] | [] | BMC Bioinformatics | 2010 | 2/25/2010 | 0 | nges in alternative mRNA processing had been undertaken for any of these oncogenes. We examined an oncogene over-expression microarray dataset published by Nevins and colleagues [ 23 ] (GEO accession GSE3151) to demonstrate SplicerAV's ability to detect oncogene driven changes in alternative processing. In this experiment, activated HRAS, SRC, E2F3, activated β-catenin (CTNNB1), MYC, or green f|d colleagues profiled 87 Tamoxifen treated, estrogen receptor (ER) positive tumors obtained from Guys Hospital, London (GUYT) using the Affymetrix HG-U133 PLUS2 Genechip™[ 50 ] (GEO accession GSE6532{{tag}}--REUSE--, RMA normalized). Using this dataset, we examined changes in probeset expression between low grade (I, n = 17) and high grade (III, n = 16) breast tumors. Analysis was limited to probesets present |dation datasets to examine whether specific isoform changes observed in high grade tumors were also associated with poor patient prognosis (see methods). Previous datasets generated by Miller [ 51 ] (GSE3494) and Pawitan [ 52 ] (GSE1456) have independently profiled breast tumor gene expression using the Affymetrix U133 A and B microarrays (probeset intensities were estimated using MAS5 [ 53 ]). These s | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
514 | GSE6532 | 3/1/2007 | ['6532'] | [] | [u'17401012', u'18498629', u'20479250'] | 2922198 | [u'20678237'] | ['Gonzalez-Angulo', 'Larsimont', 'Ellis', 'Loi', 'Mills', 'Hennessy', 'Bardelli', 'Majjaj', 'Buyse', 'Foekens', 'Durbecq', 'Pusztai', 'Sotiriou', 'Haibe-Kains', 'Desmedt', 'Harris', 'Lallemand', 'Tutt', 'Gillett', 'Berns', 'Ryder', 'Bergh', 'Reid', 'Piccart', 'Delorenzi', 'Bontempi', 'Jansen', 'Wirapati', 'Daidone', 'Pierotti', 'Phillips', 'Klijn', 'Symmans', 'McArthur', 'Gillet', 'Speed'] | ['Yang', 'Campain'] | [] | BMC Bioinformatics | 2010 | 8/3/2010 | 0 | all considered datasets, it should be understood that predicting ER status using gene expression data is not the same as immunohistochemistry. These datasets include the Farmer et al. dataset [ 29 ] (GSE1561) which utilises the Affymetrix U133A platform with 49 samples, comprising of 27 +ve and 22 -ve samples. The Loi et al. dataset [ 30 ] contains Affymetrix samples from three platforms, U133 (A,B) an|some of which underwent treatment and some which did not. Samples from platform U133A which did not experience any treatment are used in this study, which totalled 126 with 86 +ve and 40 -ve samples (GSE6532{{tag}}--REUSE--). Ivshina et al. [ 31 ], developed breast cancer samples on Affymentrix U133A arrays, 200 in total corresponding to 49 +ve and 151 -ve samples (GSE4922). The performance of the meta-analysis method | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
515 | GSE6532 | 3/1/2007 | ['6532'] | [] | [u'17401012', u'18498629', u'20479250'] | 2719918 | [u'19620787'] | ['Gonzalez-Angulo', 'Larsimont', 'Ellis', 'Loi', 'Mills', 'Hennessy', 'Bardelli', 'Majjaj', 'Buyse', 'Foekens', 'Durbecq', 'Pusztai', 'Sotiriou', 'Haibe-Kains', 'Desmedt', 'Harris', 'Lallemand', 'Tutt', 'Gillett', 'Berns', 'Ryder', 'Bergh', 'Reid', 'Piccart', 'Delorenzi', 'Bontempi', 'Jansen', 'Wirapati', 'Daidone', 'Pierotti', 'Phillips', 'Klijn', 'Symmans', 'McArthur', 'Gillet', 'Speed'] | ['Ramnarayanan', 'Datta', 'Han', 'Zhang', 'Karuturi', 'Lim', 'Chan', 'Govindarajan', 'Palanisamy', 'Liu', 'Phua', 'Wong', 'Miller', 'Tam', 'Leong', 'George'] | [] | J Clin Invest | 2009 | 2009 Aug | 0 | Raw microarray data (from Affymetrix U133A and U133B GeneChips) were retrieved from Gene Expression Omnibus: the Uppsala cohort (Gene Expression Omnibus accession number GSE3494) (54), the Stockholm cohort (Gene Expression Omnibus accession number GSE1456) (55), the Oxford cohort (Gene Expression Omnibus accession number GSE6532{{tag}}--REUSE--) (56), and the Singapore cohort (Gene Expression Omnibus accession number GSE4922) (5). | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
516 | GSE6532 | 3/1/2007 | ['6532'] | [] | [u'17401012', u'18498629', u'20479250'] | 2883591 | [u'20548947'] | ['Gonzalez-Angulo', 'Larsimont', 'Ellis', 'Loi', 'Mills', 'Hennessy', 'Bardelli', 'Majjaj', 'Buyse', 'Foekens', 'Durbecq', 'Pusztai', 'Sotiriou', 'Haibe-Kains', 'Desmedt', 'Harris', 'Lallemand', 'Tutt', 'Gillett', 'Berns', 'Ryder', 'Bergh', 'Reid', 'Piccart', 'Delorenzi', 'Bontempi', 'Jansen', 'Wirapati', 'Daidone', 'Pierotti', 'Phillips', 'Klijn', 'Symmans', 'McArthur', 'Gillet', 'Speed'] | ['Wagoner', 'Friedl', 'Gunsalus', 'Schoenike', 'Roopra', 'Richardson'] | [] | PLoS Genet | 2010 | 6/10/2010 | 0 | g002 Figure 2 The 24-gene signature detects loss of REST function in breast tumors. (A) Hierarchical clustering analysis is performed on gene expression microarray data from 129 breast cancer tumors (GSE5460) across the 24-gene REST gene signature. Five tumors show a concerted overexpression of REST target genes, suggesting a loss of REST repression. (B) The expression of genes significantly upregulate|<0.001, FDR q-value <0.01). 10.1371/journal.pgen.1000979.g003 Figure 3 Gene set enrichment analysis of REST–less tumors. Gene Set Enrichment Analysis of the breast tumor dataset GSE5460 shows induction of REST target genes in REST–less tumors using three separate lists of experimentally defined REST target genes. (A) The “REST gene signature” 24-gene set co|igure 4C ). 10.1371/journal.pgen.1000979.g004 Figure 4 REST mRNA levels in breast tissue. (A) Mean REST mRNA levels were assessed in REST–less and RESTfl breast tumors from microarray dataset GSE5460. All error bars represent standard error. (B) Mean REST mRNA levels were compared in normal and tumor tissues across three independent datasets, all of which show a statistically significant increa|t GDS2250, representing three distinct tumor types in addition to normal tissue. 
(C) REST mRNA data is presented from three independent datasets broken down by stratified by stage (E-TABM-158) grade (GSE6532{{tag}}--REUSE--) and eventual relapse (GSE2034). There is no significant difference in REST mRNA levels across any of these conditions. REST–less tumors show increased levels of the REST splice variant RES|T–less tumors, but not in any of the REST competent tumors. (B) Quantitative real-time RTPCR analysis of REST4 levels (relative to actin), in nine tumors represented in the microarray dataset GSE5460. REST4 mRNA, was detected in REST–less, but not RESTfl tumors after 35 cycles of amplification. (C) Patients with REST–less breast tumors in the superseries GSE6532{{tag}}--REUSE--, as defined by t|ure, show a significant decrease in their disease free survival with respect to their RESTfl counterparts (p<0.01). We then used the 24-gene signature to classify the breast tumor superseries GSE6532{{tag}}--REUSE-- into REST–less and RESTfl tumors and determined how REST status associated with patient outcome ( Figure 5C ) [26] . This analysis shows that REST–less tumors ident|n data were obtained from the NCBI Gene Expression Omnibus, and are identified by their GEO dataset record number. Dataset E-TABM-276 was downloaded from the Ensembl ArrayExpress. Analysis of dataset GSE6532{{tag}}--REUSE-- was performed to determine the aggressiveness of tumors identified as being REST–less using the gene signature method. All samples from this dataset that included information on duration of|cell lines: HEK-293, MCF10a, and T47D cells. (0.10 MB XLS) Click here for additional data file. Table S2 This table describes the gene list determined to be associated with RESTless tumors in dataset GSE5460 by class comparison. This list of genes represents all of the genes identified as more highly expressed in RESTless tumors with respect to RESTfl tumors in the GSE5460 dataset (p<e-7) by us| upon REST knockdown in at least one of three cell lines. 
(0.08 MB XLS) Click here for additional data file. Table S3 This table provides the gene sets used in gene set enrichment analysis of dataset GSE5460. (0.06 MB XLS) Click here for additional data file. We would like to thank John Svaren, Caroline Alexander, and members of the Roopra lab for advice with the manuscript. This manuscript is dedicate | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
517 | GSE6532 | 3/1/2007 | ['6532'] | [] | [u'17401012', u'18498629', u'20479250'] | 2945993 | [u'20831783'] | ['Gonzalez-Angulo', 'Larsimont', 'Ellis', 'Loi', 'Mills', 'Hennessy', 'Bardelli', 'Majjaj', 'Buyse', 'Foekens', 'Durbecq', 'Pusztai', 'Sotiriou', 'Haibe-Kains', 'Desmedt', 'Harris', 'Lallemand', 'Tutt', 'Gillett', 'Berns', 'Ryder', 'Bergh', 'Reid', 'Piccart', 'Delorenzi', 'Bontempi', 'Jansen', 'Wirapati', 'Daidone', 'Pierotti', 'Phillips', 'Klijn', 'Symmans', 'McArthur', 'Gillet', 'Speed'] | ['Schramm', 'Reinelt', 'Eils', 'Wiesberg', 'K\xc3\xb6nig', 'Surmann', 'Oswald'] | [] | BMC Med Genomics | 2010 | 9/10/2010 | 0 | alonate kinase (EC 2.7.1.36). Analyzing a second dataset To verify our findings we analyzed an independent gene expression dataset of breast cancer [ 36 , 37 ] consisting of 414 samples. The data was downloaded from Array Express (GSE6532{{tag}}--REUSE--, http://www.ebi.ac.uk/microarray-as/ae/ ) and normalized as described in Methods. We divided the dataset into 250 tumors showing a favorable prognosis and 61 tumors wit|ed study [ 5 ] of breast-cancer samples of 295 women (diagnosis between 1984 and 1995) with age ≤ 53 years and no previous history of cancer, except for non-melanoma skin cancer. The data was downloaded from Rosetta Inpharmatics at http://www.rii.com/publications/2002/nejm.html . The gene expression profiles were derived by using oligonucleotide microarrays from Agilent Technologies http://www.a | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
518 | GSE6532 | 3/1/2007 | ['6532'] | [] | [u'17401012', u'18498629', u'20479250'] | 2946304 | [u'20825665'] | ['Gonzalez-Angulo', 'Larsimont', 'Ellis', 'Loi', 'Mills', 'Hennessy', 'Bardelli', 'Majjaj', 'Buyse', 'Foekens', 'Durbecq', 'Pusztai', 'Sotiriou', 'Haibe-Kains', 'Desmedt', 'Harris', 'Lallemand', 'Tutt', 'Gillett', 'Berns', 'Ryder', 'Bergh', 'Reid', 'Piccart', 'Delorenzi', 'Bontempi', 'Jansen', 'Wirapati', 'Daidone', 'Pierotti', 'Phillips', 'Klijn', 'Symmans', 'McArthur', 'Gillet', 'Speed'] | ['Sims', 'Dexter', 'Mackay', 'Mitsopoulos', 'Grigoriadis', 'Ahmad', 'Zvelebil'] | [] | BMC Syst Biol | 2010 | 9/8/2010 | 0 | cally relevant cancer phenotypes. Methods Microarray data normalisation Microarray gene expression data for five of the breast cancer datasets used in this study, were obtained from the GEO database (GSE6532{{tag}}--REUSE--, GSE1456, GSE3494, GSE7390, GSE2034) [ 27 ]. The paired gene expression and array comparative genomic hybridization data for 43 ER+ tumours [ 20 ] was downloaded from the database referenced therei|h the ridge parameter set by leave-one-out cross-validation in the training set (values ranged from 25 to 120). Competitive selection was carried out on the merged dataset of 793 ER+ samples from the GSE6532{{tag}}--REUSE--, GSE1456, GSE3494, GSE7390 and GSE2034 datasets. One hundred random sample sets, each with 396 tumours, were drawn from the pool. The ridge regression model was then built up selecting the RMG at e|umor suppressor blocks cell cycle progression and inhibits cyclin D1 accumulation Mol Cell Biol 2002 22 4309 4318 10.1128/MCB.22.12.4309-4318.2002 12024041 Edgar R Domrachev M Lash AE Gene Expression Omnibus: NCBI gene expression and hybridization array data repository Nucleic Acids Res 2002 30 207 210 10.1093/nar/30.1.207 11752295 Bolstad BM Irizarry RA Astrand M Speed TP A comparison of normalization m | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
519 | GSE6532 | 3/1/2007 | ['6532'] | [] | [u'17401012', u'18498629', u'20479250'] | 2528137 | [u'18728668'] | ['Gonzalez-Angulo', 'Larsimont', 'Ellis', 'Loi', 'Mills', 'Hennessy', 'Bardelli', 'Majjaj', 'Buyse', 'Foekens', 'Durbecq', 'Pusztai', 'Sotiriou', 'Haibe-Kains', 'Desmedt', 'Harris', 'Lallemand', 'Tutt', 'Gillett', 'Berns', 'Ryder', 'Bergh', 'Reid', 'Piccart', 'Delorenzi', 'Bontempi', 'Jansen', 'Wirapati', 'Daidone', 'Pierotti', 'Phillips', 'Klijn', 'Symmans', 'McArthur', 'Gillet', 'Speed'] | ['Helms', 'Wang', 'Liedtke', 'Korsching', 'Chan', 'Vogt', 'Schlotter', 'Kemming', 'Brandt', 'Buerger', 'Pospisil'] | [] | Br J Cancer | 2008 | 9/2/2008 | 0 | sing the Efron approximation from the ‘survival' package within the statistical data analysis environment R version 2.5.0 ( R Development Core Team, 2007 ). Analysis of microarray data The publicly available data set of Loi et al (2007) was used for correlation analyses of SQLE expression and survival. The cel files (GEO accession no. GSE6532{{tag}}--REUSE-- ) were downloaded from the NCBI GEO Database ( |ation of prognostic value of SQLE expression in an independent sample set To confirm the prognostic value of SQLE mRNA, we analysed an independent validation cohort. Our findings hold true in this publicly available set of Affymetrix-based expression data ( Loi et al , 2007 ). In a cohort of 162 ER+ lymph node-negative patients, tumours with the highest SQLE expression indicated a highly signi | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
520 | GSE6532 | 3/1/2007 | ['6532'] | [] | [u'17401012', u'18498629', u'20479250'] | 2662705 | [u'19267539'] | ['Gonzalez-Angulo', 'Larsimont', 'Ellis', 'Loi', 'Mills', 'Hennessy', 'Bardelli', 'Majjaj', 'Buyse', 'Foekens', 'Durbecq', 'Pusztai', 'Sotiriou', 'Haibe-Kains', 'Desmedt', 'Harris', 'Lallemand', 'Tutt', 'Gillett', 'Berns', 'Ryder', 'Bergh', 'Reid', 'Piccart', 'Delorenzi', 'Bontempi', 'Jansen', 'Wirapati', 'Daidone', 'Pierotti', 'Phillips', 'Klijn', 'Symmans', 'McArthur', 'Gillet', 'Speed'] | ['Subramanian', 'Piening', 'Wang', 'Paulovich'] | [] | Radiat Res | 2009 | 2009 Feb | 0 | as weighted, and 1000 permutations were used for each analysis. Curation of Expression Data Sets from the Literature The NKI expression data set and patients’ clinical information ( 14 ) were downloaded from the Stanford University public repository ( http://microarray-pubs.stanford.edu/wound_NKI/explore.html ). There are 244 arrays in total with 24,136 clones on each array. Global normalization w|absolute deviation) to 1]. A second expression data set described by Loi et. al. ( 21 ) was downloaded from the Gene Expression Omnibus database ( http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE6532{{tag}}--REUSE-- , “LUMINAL. Rdata”). There are 277 arrays (from patients who received adjuvant tamoxifen treatment) with 44,928 clones on each array. The data set representing irradiated mammary tu | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
521 | GSE6532 | 3/1/2007 | ['6532'] | [] | [u'17401012', u'18498629', u'20479250'] | 2892466 | [u'20500820'] | ['Gonzalez-Angulo', 'Larsimont', 'Ellis', 'Loi', 'Mills', 'Hennessy', 'Bardelli', 'Majjaj', 'Buyse', 'Foekens', 'Durbecq', 'Pusztai', 'Sotiriou', 'Haibe-Kains', 'Desmedt', 'Harris', 'Lallemand', 'Tutt', 'Gillett', 'Berns', 'Ryder', 'Bergh', 'Reid', 'Piccart', 'Delorenzi', 'Bontempi', 'Jansen', 'Wirapati', 'Daidone', 'Pierotti', 'Phillips', 'Klijn', 'Symmans', 'McArthur', 'Gillet', 'Speed'] | ['Hellwig', 'Gehrmann', 'Hengstler', 'Rahnenf\xc3\xbchrer', 'Schormann', 'Schmidt'] | [] | BMC Bioinformatics | 2010 | 5/25/2010 | 0 | ormalization of the raw data was done using RMA [ 24 ] from the Bioconductor [ 25 ] package affy [ 26 ]. The raw .cel files are deposited at the NCBI GEO data repository [ 27 ] with accession number GSE11121. Results We analyze and compare distributions of bimodality measures. All methods presented in the methodology section are applied to the Mainz cohort study. For all bimodality measures we present|ur analysis on an other free available data set. The raw .cel files and clinical parameters of the Rotterdam cohort [ 28 , 29 ] were downloaded from the NCBI GEO data repository with accession number GSE2034 (n = 286). Here, the outlier-sum statistic and the likelihood ratio also have the smallest p-values of the logrank test (see Additional file 2 : Supplemental Figure 2). For the bimodality index th|gnificant enrichment ( p < 10 -7 ). To determine whether our results also hold for known prognostic subgroups we used a pooled cohort of 766 patients from different free available data sets: GSE11121 (n = 200), GSE2034 (n = 286), GSE7390 (n = 177) [ 30 ] and GSE6532{{tag}}--REUSE-- (n = 103) [ 31 , 32 ]. We look at three subgroups defined by the expression of the two genes ESR1 and erbB2, namely ESR1+/erbB2- | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
522 | GSE6532 | 3/1/2007 | ['6532'] | [] | [u'17401012', u'18498629', u'20479250'] | 3012716 | [u'21209904'] | ['Gonzalez-Angulo', 'Larsimont', 'Ellis', 'Loi', 'Mills', 'Hennessy', 'Bardelli', 'Majjaj', 'Buyse', 'Foekens', 'Durbecq', 'Pusztai', 'Sotiriou', 'Haibe-Kains', 'Desmedt', 'Harris', 'Lallemand', 'Tutt', 'Gillett', 'Berns', 'Ryder', 'Bergh', 'Reid', 'Piccart', 'Delorenzi', 'Bontempi', 'Jansen', 'Wirapati', 'Daidone', 'Pierotti', 'Phillips', 'Klijn', 'Symmans', 'McArthur', 'Gillet', 'Speed'] | ['Quackenbush', 'Holton', 'Rubio', 'Cheng', 'Pak', 'Iglehart', 'Prendergast', 'Mar', 'Culhane', 'Aryee', 'Bentink', 'Glinskii', 'Cai', 'Hahn', 'Chittenden', 'Howe', 'Holmes', 'Taylor', 'Sultana', 'Lanahan', 'Schwede', 'Zhao'] | [] | PLoS One | 2010 | 12/30/2010 | 0 | on of human cancers, 411 probesets with a fold-change ≥2 in SAM analysis of GIPC1 depleted MDA-MB231 cells compared to control cells (GIPC1 signature; Table S1 ) were used to interrogate two publicly available and clinically annotated breast and ovarian cancer datasets with the Bioconductor package, globaltest [27] . A large merged breast cancer DNA microarray dataset with 689 |human colorectal cancer cells. By using RNAi to deplete GIPC1 mRNA in MDA-MB-231 cells we were able to identify a wide range of genes whose expression was altered. We compared this GIPC1 signature to publicly available breast and ovarian cancer gene expression datasets for which well-annotated phenotype and outcome data were available. We found strong correlation between the GIPC1 signature and a number o|h individual EASE GO term. Analysis of clinical breast and ovarian cancer public datasets The clinical relevance of the GIPC1 KD signature (n = 411 probesets) was evaluated in publicly available breast and ovarian cancer gene expression data which were downloaded from the Gene Expression Omnibus database at NCBI. 
After excluding patients with missing clinical data, the breast can|ined by merging the datasets GSE6532{{tag}}--REUSE-- [30] , GSE4922 [29] , and GSE7390 [28] . The ovarian cancer dataset contained 274 gene expression profiles from GSE9891 [31] . Association between gene expression of the GIPC1 KD signature and each clinical variable in the breast and ovarian cancer datasets were evaluated using globaltest [| were size, grade, type (malignant versus. low malignant potential) and overall survival. Analysis of merged MDA-MB231 GIPC1 KD and HMEC oncogene signature dataset An HMEC oncogene signature dataset, GSE3151 [23] was merged with the MDA-MB231 GIPC1 KD dataset. The merged dataset was normalized with RMA [20] using the Bioconductor package, affy . A meta-analysis was p | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
523 | GSE6532 | 3/1/2007 | ['6532'] | [] | [u'17401012', u'18498629', u'20479250'] | 2917034 | [u'20569502'] | ['Gonzalez-Angulo', 'Larsimont', 'Ellis', 'Loi', 'Mills', 'Hennessy', 'Bardelli', 'Majjaj', 'Buyse', 'Foekens', 'Durbecq', 'Pusztai', 'Sotiriou', 'Haibe-Kains', 'Desmedt', 'Harris', 'Lallemand', 'Tutt', 'Gillett', 'Berns', 'Ryder', 'Bergh', 'Reid', 'Piccart', 'Delorenzi', 'Bontempi', 'Jansen', 'Wirapati', 'Daidone', 'Pierotti', 'Phillips', 'Klijn', 'Symmans', 'McArthur', 'Gillet', 'Speed'] | ['Sims', 'Taylor', 'Harrison', 'Muir', 'Langdon', 'Cameron', 'Kuske', 'Dixon', 'Liang', 'Walker', 'Faratian'] | [] | Breast Cancer Res | 2010 | 2010 | 0 | AND pmc_gds | 0 | 1 | ||||
524 | GSE6532 | 3/1/2007 | ['6532'] | [] | [u'17401012', u'18498629', u'20479250'] | 2553442 | [u'18635567'] | ['Gonzalez-Angulo', 'Larsimont', 'Ellis', 'Loi', 'Mills', 'Hennessy', 'Bardelli', 'Majjaj', 'Buyse', 'Foekens', 'Durbecq', 'Pusztai', 'Sotiriou', 'Haibe-Kains', 'Desmedt', 'Harris', 'Lallemand', 'Tutt', 'Gillett', 'Berns', 'Ryder', 'Bergh', 'Reid', 'Piccart', 'Delorenzi', 'Bontempi', 'Jansen', 'Wirapati', 'Daidone', 'Pierotti', 'Phillips', 'Klijn', 'Symmans', 'McArthur', 'Gillet', 'Speed'] | ['Sotiriou', 'Haibe-Kains', 'Bontempi', 'Desmedt'] | ['Sotiriou', 'Haibe-Kains', 'Bontempi', 'Desmedt'] | Bioinformatics | 2008 | 10/1/2008 | 1 | k prediction methods on more than 1000 patients. This is made possible thanks to the recent publications of several large microarray datasets in gene expression databases, such as the Gene Expression Omnibus (GEO, Barrett et al. , 2005 ) An important outcome of the analysis is that, in spite of the large number of samples, there is no statistical evidence that complex methods outperform the simplest BC| 283 common probes), called VDX (Wang et al. , 2005 ), TBG (Desmedt et al. , 2007 ), TAM (Haibe-Kains et al. , 2008 ; Loi et al. , 2007 ) and UPP (Miller et al. , 2005 ). These datasets are publicly available from the GEO database 2 through accession numbers GSE2034, GSE7390, GSE6532{{tag}}--REUSE--/GSE9195 and GSE3494, respectively. VDX includes the gene expressions of 286 untreated node-negative BC patients |ene signatures and found that the signatures had many pathways in common such as cell cycle, regulation of cell cycle, mitosis, apoptosis, etc. Our group also investigated in a large meta-analysis of publicly available gene expression data, how different gene lists may give rise to signatures with equivalent prognostic performance and found by dissecting these signatures according to the main molecular pr | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
525 | GSE6532 | 3/1/2007 | ['6532'] | [] | [u'17401012', u'18498629', u'20479250'] | 2706265 | [u'19552798'] | ['Gonzalez-Angulo', 'Larsimont', 'Ellis', 'Loi', 'Mills', 'Hennessy', 'Bardelli', 'Majjaj', 'Buyse', 'Foekens', 'Durbecq', 'Pusztai', 'Sotiriou', 'Haibe-Kains', 'Desmedt', 'Harris', 'Lallemand', 'Tutt', 'Gillett', 'Berns', 'Ryder', 'Bergh', 'Reid', 'Piccart', 'Delorenzi', 'Bontempi', 'Jansen', 'Wirapati', 'Daidone', 'Pierotti', 'Phillips', 'Klijn', 'Symmans', 'McArthur', 'Gillet', 'Speed'] | ['Sotiriou', 'Haibe-Kains', 'Loi', 'McArthur', 'Piccart', 'Lallemand', 'Speed', 'Conus'] | ['Sotiriou', 'Haibe-Kains', 'Loi', 'McArthur', 'Piccart', 'Speed', 'Lallemand'] | BMC Med Genomics | 2009 | 6/24/2009 | 0 | n previously used in another study (methods and demographics are described in Loi et al . [ 6 ] with the raw data available at the Gene Expression Omnibus (GEO) repository database , accession code GSE6532{{tag}}--REUSE--. Microarray analysis was performed with Affymetrix™ U133A Genechips ® (Affymetrix, Santa Clara, CA) according to Affymetrix™ protocols. There were 99 tumors classified as h | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
526 | GSE6532 | 3/1/2007 | ['6532'] | [] | [u'17401012', u'18498629', u'20479250'] | 2895626 | [u'20500821'] | ['Gonzalez-Angulo', 'Larsimont', 'Ellis', 'Loi', 'Mills', 'Hennessy', 'Bardelli', 'Majjaj', 'Buyse', 'Foekens', 'Durbecq', 'Pusztai', 'Sotiriou', 'Haibe-Kains', 'Desmedt', 'Harris', 'Lallemand', 'Tutt', 'Gillett', 'Berns', 'Ryder', 'Bergh', 'Reid', 'Piccart', 'Delorenzi', 'Bontempi', 'Jansen', 'Wirapati', 'Daidone', 'Pierotti', 'Phillips', 'Klijn', 'Symmans', 'McArthur', 'Gillet', 'Speed'] | ['Loi', 'Haviv', 'Abraham', 'Kowalczyk', 'Zobel'] | ['Loi'] | BMC Bioinformatics | 2010 | 5/25/2010 | 0 | . Methods We explored a range of methods for extracting gene sets. These statistics are described below; we first discuss the data used. Data We used five breast cancer datasets from NCBI GEO [ 23 ]: GSE2034 [ 24 ], GSE4922 [ 25 ], GSE6532{{tag}}--REUSE-- [ 26 , 27 ], GSE7390 [ 28 ], and GSE11121 [ 29 ]. All five are Affymetrix HG-U133A microarrays (some datasets include other platforms; these platforms were excluded)|15 remaining probesets. The datasets were independently normalised (see Additional File 1 ). Data Composition The data contains both lymph-node-negative and node-positive breast cancer patients. For GSE7390, GSE11121, and GSE2034, none of the patients received adjuvant treatment. For GSE6532{{tag}}--REUSE-- and GSE4922, some patients received adjuvant therapy; these were removed from the data. The data contains patie|e considered noninformative and were removed from the data, as shown in Table 1 . Table 1 Sample sizes and breakdown by class Dataset Good Obs. Removed Obs. < 5 years ≥ 5 years Total GSE2034 82 165 247 8 GSE4922 30 103 133 9 GSE6532{{tag}}--REUSE-- 21 91 112 25 GSE7390 36 154 190 8 GSE11121 28 154 182 18 Observations (samples) were removed if they were censored before the 5-year cutoff. 
Gene Sets We u|bility of the ranks, we used the percentile bootstrap to sample the observations with replacement, generating a bootstrap distribution for the centroid weights for genes and gene sets in one dataset (GSE4922). Since there are 22,215 genes and only 5414 gene sets, a reduced gene list was derived by training a centroid classifier on the GSE11121 dataset and selecting the 5414 genes with the highest absol|t alone between them; gene set features are more stable. Figure 3 Bootstrap . Mean and 2.5%/97.5% of the ranks of genes and gene sets (set centroid statistic), over 5000 bootstrap replications of the GSE4922 dataset. The features have been sorted by their mean rank. Concordance of Datasets We were interested in how the different datasets agreed on the importance of the features (genes or gene sets). We|sed on publications of expression profiles, rarely using more than dozens of samples. To see whether different MSigDB categories were more useful for predicting metastasis, we combined four datasets (GSE2034, GSE4922, GSE6532{{tag}}--REUSE--, and GSE7390) into a single training set. A separate centroid classifier was trained on each gene set, using the set-centroid statistic, and the gene sets were then ranked by thei| negative correlations. Figure 6 Kolmogorov-Smirnov analysis . Kolmogorov-Smirnov enrichment for MSigDB categories, using the set-centroid statistic. A AUC and spline smooth for each set, tested on GSE11121. B Number of mapped probesets in each set, on log 2 scale, and spline smooth. C Two-sample Kolmogorov-Smirnov Brownian-bridge for each MSigDB category ( p -values: C1: 1.44 × 10 -4 , | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
527 | GSE6532 | 3/1/2007 | ['6532'] | [] | [u'17401012', u'18498629', u'20479250'] | 2423197 | [u'18498629'] | ['Gonzalez-Angulo', 'Larsimont', 'Ellis', 'Loi', 'Mills', 'Hennessy', 'Bardelli', 'Majjaj', 'Buyse', 'Foekens', 'Durbecq', 'Pusztai', 'Sotiriou', 'Haibe-Kains', 'Desmedt', 'Harris', 'Lallemand', 'Tutt', 'Gillett', 'Berns', 'Ryder', 'Bergh', 'Reid', 'Piccart', 'Delorenzi', 'Bontempi', 'Jansen', 'Wirapati', 'Daidone', 'Pierotti', 'Phillips', 'Klijn', 'Symmans', 'McArthur', 'Gillet', 'Speed'] | ['Sotiriou', 'Daidone', 'Delorenzi', 'Berns', 'Ellis', 'Jansen', 'Haibe-Kains', 'Loi', 'Desmedt', 'Ryder', 'Reid', 'Piccart', 'Gillet', 'Bontempi', 'Pierotti', 'Wirapati', 'Foekens', 'Tutt', 'Lallemand'] | ['Sotiriou', 'Daidone', 'Berns', 'Ellis', 'Jansen', 'Gillet', 'Haibe-Kains', 'Loi', 'Desmedt', 'Ryder', 'Reid', 'Piccart', 'Delorenzi', 'Bontempi', 'Pierotti', 'Wirapati', 'Foekens', 'Tutt', 'Lallemand'] | BMC Genomics | 2008 | 5/22/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
528 | GSE6532 | 3/1/2007 | ['6532'] | [] | [u'17401012', u'18498629', u'20479250'] | 2496892 | [u'18714337'] | ['Gonzalez-Angulo', 'Larsimont', 'Ellis', 'Loi', 'Mills', 'Hennessy', 'Bardelli', 'Majjaj', 'Buyse', 'Foekens', 'Durbecq', 'Pusztai', 'Sotiriou', 'Haibe-Kains', 'Desmedt', 'Harris', 'Lallemand', 'Tutt', 'Gillett', 'Berns', 'Ryder', 'Bergh', 'Reid', 'Piccart', 'Delorenzi', 'Bontempi', 'Jansen', 'Wirapati', 'Daidone', 'Pierotti', 'Phillips', 'Klijn', 'Symmans', 'McArthur', 'Gillet', 'Speed'] | ['McArthur', 'Musgrove', 'Sergio', 'Alles', 'Ormandy', 'Loi', 'Inman', 'Sch\xc3\xbctte', 'Anderson', 'Butt', 'Pinese', 'Sutherland', 'Caldon', 'Gardiner-Garden'] | ['McArthur', 'Loi'] | PLoS One | 2008 | 8/20/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
529 | GSE6532 | 3/1/2007 | ['6532'] | [] | [u'17401012', u'18498629', u'20479250'] | 2890442 | [u'20479250'] | ['Gonzalez-Angulo', 'Larsimont', 'Ellis', 'Loi', 'Mills', 'Hennessy', 'Bardelli', 'Majjaj', 'Buyse', 'Foekens', 'Durbecq', 'Pusztai', 'Sotiriou', 'Haibe-Kains', 'Desmedt', 'Harris', 'Lallemand', 'Tutt', 'Gillett', 'Berns', 'Ryder', 'Bergh', 'Reid', 'Piccart', 'Delorenzi', 'Bontempi', 'Jansen', 'Wirapati', 'Daidone', 'Pierotti', 'Phillips', 'Klijn', 'Symmans', 'McArthur', 'Gillet', 'Speed'] | ['Gonzalez-Angulo', 'Sotiriou', 'Larsimont', 'Ellis', 'Haibe-Kains', 'Loi', 'Mills', 'Symmans', 'Tutt', 'McArthur', 'Phillips', 'Bardelli', 'Hennessy', 'Majjaj', 'Piccart', 'Pusztai', 'Speed', 'Durbecq', 'Gillett', 'Lallemand'] | ['Gonzalez-Angulo', 'Sotiriou', 'Larsimont', 'Ellis', 'Haibe-Kains', 'Loi', 'Mills', 'Symmans', 'Hennessy', 'McArthur', 'Phillips', 'Bardelli', 'Tutt', 'Piccart', 'Majjaj', 'Pusztai', 'Speed', 'Durbecq', 'Gillett', 'Lallemand'] | Proc Natl Acad Sci U S A | 2010 | 6/1/2010 | 0 | AND pmc_gds | 1 | 0 | ||||
530 | GSE6534 | 12/28/2007 | ['6534'] | [] | [u'18093333'] | 2234262 | [u'18093333'] | ['Gonzalez', 'Vodkin'] | ['Gonzalez', 'Vodkin'] | ['Gonzalez', 'Vodkin'] | BMC Genomics | 2007 | 12/19/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
531 | GSE6536 | 2/9/2007 | ['6536'] | [] | [u'17289997'] | 2557141 | [u'18846218'] | ['Deloukas', 'Redon', 'Lee', 'Stranger', 'Hurles', 'Dermitzakis', 'de', 'Ingle', 'Thorne', 'Beazley', 'Dunning', 'Tyler-Smith', 'Scherer', 'Tavar\xc3\xa9', u'Tavare', 'Forrest', 'Bird', 'Carter'] | ['Ouyang', 'Krontiris', 'Smith'] | [] | PLoS One | 2008 | 2008 | 1 | s time taking advantage of the high-density HapMap Phase II data – within the block containing the reported peak SNPs. We applied public HapMap expression data across three major populations (GSE2552 [13] and GSE5859 [19] , based on the Affymetrix platform, and GSE6536{{tag}}--REUSE-- [20] , based on the Illumina platform). 10.1371/journal.pone.0003362.t001 T|in Ibadan, Nigeria; CHB: Han Chinese in Beijing, China; JPT: Japanese in Tokyo, Japan. 3 Relative position (upstream; up / downstream; dn) to initiation/termination sites. 4 Based on public data from GSE 6536{{tag}}--REUSE-- (Illumina platform) or GSE 2552 / GSE 5859 (Affymetrix platform). 5 Reported in Morley et al., Nature 430, 743–7 (2004). 6 Reported in Cheung et al., Nature 437, 1365–9 (2005).|es in the lineage), a SNP with the most complete genotypes (underlined) was chosen for testing association. The nominal p-values for these SNPs in each major population, based on expression data sets GSE6536{{tag}}--REUSE-- (Illumina platform) and GSE2552/GSE5859 (Affymetrix platform), are shown in order. The coalescent-based maximum likelihood tree structure and the regression of expression phenotypes are plotted at |ed the same evolutionarily conserved feature, we next asked whether their cis -regulatory phenotypes were, as expected, also conserved across populations. 
Based on one set of expression data in YRI (GSE6536{{tag}}--REUSE--, Illumina platform) and two sets data in CHB/JPT (GSE5859, Affymetrix platform; GSE6536{{tag}}--REUSE--, Illumina platform), we tested the association for all tagging SNPs ( Figure 4 and Supporting Information F|age or allelic imbalance (AI) assays, we added a further validation step by confirming the cis -association using an independent dataset having a relatively large sample size [33] (GSE8052; Affymetrix platform; 400 UK samples). Thirty of the 44 genes passed the genome-wide significance threshold (a LOD score of 6.076, corresponding to a false discovery rate of 0.05, as listed in supp|tion. Association analysis between each tagging SNP and two sets of HapMap expression data, based on two (Affymetrix and Illumina) platforms and across three HapMap populations, (GEO accession number GSE2552 [13] , GSE5859 [19] , and GSE6536{{tag}}--REUSE-- [20] ), was conducted by following the regression methods described in Cheung et al. [13] (dis | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
532 | GSE6536 | 2/9/2007 | ['6536'] | [] | [u'17289997'] | 2654149 | [u'19300510'] | ['Deloukas', 'Redon', 'Lee', 'Stranger', 'Hurles', 'Dermitzakis', 'de', 'Ingle', 'Thorne', 'Beazley', 'Dunning', 'Tyler-Smith', 'Scherer', 'Tavar\xc3\xa9', u'Tavare', 'Forrest', 'Bird', 'Carter'] | ['Hennah', 'Porteous'] | [] | PLoS One | 2009 | 2009 | 1 | s from 10 kb upstream of the immediately adjacent gene TSNAX to 10 kb downstream of DISC1 for association to expression values of DISC1. Expression values were from four publicly available data sets (GSE6536{{tag}}--REUSE-- in NCBI GEO) drawn from a common set of 210 lymphoblastoid cell lines from the four HapMap population cohorts; 60 CEU, 60 YRI, 45 CHB and 45 JPT [16] . Of the 754 variants tested, |s included in the HapMap project. Expression data The expression data came from three independent laboratories and can be obtained from the NCBI GEO database under the following identification codes: GDS2106 [17] , GDS1048 [18] , and GSE6536{{tag}}--REUSE-- [16] . One group had performed a replication analysis for their samples (GDS2106), which we used here as a fourt|cal replicate for the observations in GDS2106. Data were derived from different gene chip platforms: GDS2106 used Affymetrix GeneChip Human HG-Focus Target Array, GDS1048 used a Rosetta platform, and GSE6536{{tag}}--REUSE-- used the Illumina Sentrix Human-6 Expression BeadChip. The Affymetrix platform was re-annotated using a custom GeneChip library file (CDF file) [32] . Each study tested a different|n the expression levels of DISC1 using the one-way ANOVA function in SPSS (version 14.0 for Windows). This was performed for all four HapMap populations using the expression data from the GEO dataset GSE6536{{tag}}--REUSE--. It was predicted that if a genomic area truly regulates DISC1 expression, then that region would be detected to associate in all populations. These findings were mapped back on to the linkage dise | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
533 | GSE6536 | 2/9/2007 | ['6536'] | [] | [u'17289997'] | 2844390 | [u'20187971'] | ['Deloukas', 'Redon', 'Lee', 'Stranger', 'Hurles', 'Dermitzakis', 'de', 'Ingle', 'Thorne', 'Beazley', 'Dunning', 'Tyler-Smith', 'Scherer', 'Tavar\xc3\xa9', u'Tavare', 'Forrest', 'Bird', 'Carter'] | ['Lian', 'Hsiao', 'Hsieh', 'Fann'] | [] | BMC Bioinformatics | 2010 | 2/27/2010 | 0 | upplementary material). Ethnic population data: HapMap To validate the performance of eQTL identification models for GWAS data, we reanalyzed the gene expression dataset from Gene Expression Omnibus (GSE6536{{tag}}--REUSE--) using 60 unrelated CEU and 90 CHB + JPT individuals. Recent studies have demonstrated that probe sequences including SNPs would influence the hybridization on microarrays and cause false cis eQT | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
534 | GSE6536 | 2/9/2007 | ['6536'] | [] | [u'17289997'] | 2735003 | [u'19742321'] | ['Deloukas', 'Redon', 'Lee', 'Stranger', 'Hurles', 'Dermitzakis', 'de', 'Ingle', 'Thorne', 'Beazley', 'Dunning', 'Tyler-Smith', 'Scherer', 'Tavar\xc3\xa9', u'Tavare', 'Forrest', 'Bird', 'Carter'] | ['Giacomini', 'Stryke', 'Picard', 'Ferrin', 'Shima', 'Gow', 'Sj\xc3\xb6din', 'Ha', 'Poon', 'Kroetz', 'Ma', 'Choi', 'Castro', 'Chen', 'Matsson', 'Tahara', 'Myers', 'Roelofs', 'Kwok', 'Johns', 'Yee', 'Hesselson', 'Fukushima', 'Kobayashi'] | [] | PLoS One | 2009 | 9/9/2009 | 0 | m the HapMap website ( http://www.hapmap.org ), and normalized expression data from http://www.sanger.ac.uk/humgen/genevar/ . This data has been deposited in the MIAME database with accession number GSE6536{{tag}}--DEPOSIT--. Associations were calculated using PLINK v1.04 [42] ( http://pngu.mgh.harvard.edu/purcell/plink/ ) for 46 common SNPs (MAF≥5%) that were found in both datasets, a | 0 | 1 | 0 | NOT pmc_gds | 0 | 1 |
535 | GSE6536 | 2/9/2007 | ['6536'] | [] | [u'17289997'] | 2782819 | [u'19925429'] | ['Deloukas', 'Redon', 'Lee', 'Stranger', 'Hurles', 'Dermitzakis', 'de', 'Ingle', 'Thorne', 'Beazley', 'Dunning', 'Tyler-Smith', 'Scherer', 'Tavar\xc3\xa9', u'Tavare', 'Forrest', 'Bird', 'Carter'] | ['Dolan', 'Zhang'] | [] | Curr Pharm Des | 2009 | 2009 | 1 | AND pmc_gds | 0 | 1 | ||||
536 | GSE6536 | 2/9/2007 | ['6536'] | [] | [u'17289997'] | 2244606 | [u'18190704'] | ['Deloukas', 'Redon', 'Lee', 'Stranger', 'Hurles', 'Dermitzakis', 'de', 'Ingle', 'Thorne', 'Beazley', 'Dunning', 'Tyler-Smith', 'Scherer', 'Tavar\xc3\xa9', u'Tavare', 'Forrest', 'Bird', 'Carter'] | ['G\xc3\xb3mez', 'Rodr\xc3\xadguez-Santiago', 'Aguil\xc3\xb3', 'Pujana', 'Estivill', 'Abril', 'Maxwell', 'de', 'Nunes', 'Hern\xc3\xa1ndez', 'Capell\xc3\xa1', 'Condom', 'Sol\xc3\xa9', 'Moreno', 'P\xc3\xa9rez-Jurado', 'Armengol', 'Gruber'] | ['de'] | BMC Genomics | 2008 | 1/11/2008 | 1 | uals in total (60 Utah residents with ancestry from northern and western Europe; 45 Han Chinese in Beijing; 45 Japanese in Tokyo; and 60 Yoruba in Ibadan Nigeria; Gene Expression Omnibus (GEO) record GSE6536{{tag}}--REUSE--) [ 42 ]. Transcriptional differences were scanned between the 128 and 129 Mb of chromosome 8, corresponding to ~1,530 SNPs (NCBI build 35). Scans were performed in R with the SNPassoc package [ 68 | 70 ]. Microarray gene expression analysis Using the HapMap lymphocyte expression data [ 42 ] and the prostate cancer data of Tomlins et al. [ 25 ], matrix series were downloaded from GEO references GSE6536{{tag}}--REUSE-- and GSE6099, respectively. Using the Singh et al. [ 30 ] raw data, background correction, normalization and averaging of expression values were performed with the robust multi-array average (RMA) |PI) tolerance was set to 0.20 and the mutual information (MI) threshold was 0.05. Normalized data sets of MYC/Myc-driven cellular transformation and tumorigenesis were downloaded from the GEO records GSE3151 and GSE3158 [ 35 , 36 ]. Gene probes were matched using the NetAffx (Affymetrix) tool and differentially expressed probes were identified by calculating two-tailed t -test P values. Genotyping a | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
537 | GSE6536 | 2/9/2007 | ['6536'] | [] | [u'17289997'] | 2683249 | [u'17873874'] | ['Deloukas', 'Redon', 'Lee', 'Stranger', 'Hurles', 'Dermitzakis', 'de', 'Ingle', 'Thorne', 'Beazley', 'Dunning', 'Tyler-Smith', 'Scherer', 'Tavar\xc3\xa9', u'Tavare', 'Forrest', 'Bird', 'Carter'] | ['Dimas', 'Deloukas', 'Stranger', 'Flicek', 'Ingle', 'Dermitzakis', 'Beazley', 'Dunning', 'Tavar\xc3\xa9', 'Koller', 'Montgomery', 'Nica', 'Forrest', 'Bird'] | ['Deloukas', 'Stranger', 'Dermitzakis', 'Ingle', 'Beazley', 'Dunning', 'Tavar\xc3\xa9', 'Forrest', 'Bird'] | Nat Genet | 2007 | 2007 Oct | 0 | the genic and immediate intergenic regions. The association data will become available at the Ensembl web site in the October 2007 as Distributed Annotation System (DAS) tracks to enable browsing and downloading. Finally, we have attempted to analyze effects in trans by adopting a “candidate variants approach” assigning prior relevance to those SNPs already known to be associated with c|ermutations were performed to assess significance of the nominal p-values 29 , 30 . Accession Numbers The expression data reported in this paper have been previously deposited in the Gene Expression Omnibus (GEO) ( http://www.ncbi.nlm.nih.gov/geo ) database (Series Accession Number GSE6536{{tag}}--REUSE-- 19 ). Supplementary Material body 1 Click here to view. (834K, pdf) 2 Click here to view. (1.3M, pdf) 3 Click he | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
538 | GSE6536 | 2/9/2007 | ['6536'] | [] | [u'17289997'] | 2213701 | [u'18208332'] | ['Deloukas', 'Redon', 'Lee', 'Stranger', 'Hurles', 'Dermitzakis', 'de', 'Ingle', 'Thorne', 'Beazley', 'Dunning', 'Tyler-Smith', 'Scherer', 'Tavar\xc3\xa9', u'Tavare', 'Forrest', 'Bird', 'Carter'] | ['Stranger', 'Dermitzakis', 'Lovell', 'Leongamornlert', 'Johnston', 'Ross'] | ['Dermitzakis', 'Stranger'] | PLoS Genet | 2008 | 2008 Jan | 0 | ls. Final data points for each gene are the mean of the four normalized hybridisation values. Log 2 transformed mRNA expression values were used throughout except where otherwise stated. Data can be downloaded from http://www.sanger.ac.uk/humgen/genevar/ and the Gene Expression Omnibus ( http://www.ncbi.nlm.nih.gov/geo/ ) entry GSE6536{{tag}}--DEPOSIT--. Selection of expression data for analysis. We established an appro | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
539 | GSE6538 | 12/1/2007 | ['6538'] | [] | [u'19925645'] | 2791104 | [u'19925645'] | ['Moreau', 'Van', 'Hannes', 'De', 'Vermeesch', 'Allemeersch'] | ['Moreau', 'Van', 'Hannes', 'De', 'Vermeesch', 'Allemeersch'] | ['Moreau', 'Van', 'Hannes', 'De', 'Vermeesch', 'Allemeersch'] | BMC Bioinformatics | 2009 | 11/19/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
540 | GSE6541 | 11/15/2007 | ['6541'] | [] | [u'18197293'] | 2199289 | [u'18197293'] | ['Mandavilli', 'Thomas', 'Madamanchi', 'Christiani', 'Runge', 'Costa', 'Gilmour', 'Schladweiler', 'Ledbetter', 'Jaskot', 'Wallenborn', 'Kodavanti', 'Richards', 'Nyska', 'Peddada', 'Karoly'] | ['Mandavilli', 'Thomas', 'Madamanchi', 'Christiani', 'Runge', 'Costa', 'Gilmour', 'Schladweiler', 'Ledbetter', 'Jaskot', 'Wallenborn', 'Kodavanti', 'Richards', 'Nyska', 'Peddada', 'Karoly'] | ['Mandavilli', 'Thomas', 'Madamanchi', 'Christiani', 'Kodavanti', 'Runge', 'Costa', 'Gilmour', 'Schladweiler', 'Ledbetter', 'Wallenborn', 'Jaskot', 'Richards', 'Nyska', 'Peddada', 'Karoly'] | Environ Health Perspect | 2008 | 2008 Jan | 0 | AND pmc_gds | 1 | 0 | ||||
541 | GSE6542 | 12/7/2007 | ['6542'] | [] | [u'17411344'] | 2998870 | [u'21151841'] | ['Wijnen', 'Boothroyd', 'Young', 'Naef', 'Saez'] | ['Tominaga'] | [] | Bioinform Biol Insights | 2010 | 11/22/2010 | 0 | nformation criterion. 4 The combinatorial search does not require a long computational time for most DNA microarray time series datasets found on the web, such as the datasets in the Gene Expression Omnibus 10 and ArrayExpress. 11 We improve the peridicity detection performance of the piccolo method by introducing Akaike’s Information Criterion (AIC) 4 instead of BIC, and demonstrate its perfo| the piccolo method than for the other methods. Detection of circadian rhythm Data The five detection methods are applied to experimentally observed DNA microarray data taken from the Gene Expression Omnibus online database by NCBI, NIH, 10 to detect genes (probes) which have 24-hour periodicity, or ‘circadian rhythm’. The P -values obtained by the Kolmogorov-Smirnov test for the normal|ts in which the data length of each time series is 7 or less. Biological description of datasets All twelve DNA microarray datasets are time series observations intending to analyze circadian rhythm. GDS1629 is a set of fourty five samples of a immortalized suprachiasmatic nucleus cell line of normal rat for 42 hours, every 6 hours (eight time points). The dataset contains five or six samples for each |me point. We only use one of them, whose sample ID is the largest. GDS2110 is a set of six samples of normal Macaca mulatta adult females adrenal glands for 20 hours, every 4 hours (six time points). GDS2232 is a set of twenty four samples of normal mouse adrenal glands for 44 hours, every 4 hours (twelve time points). The dataset contains two samples for each time point. We only use one of them, which|al mouse aortae for 44 hours, every 4 hours (twelve time points). The dataset contains two samples for the first time point. We only use one of them, which appears earlier in the published data file. GSE3424 is a set of eight samples of normal Arabidopsis thaliana for 20 hours, every 4 hours (six time points). The dataset contains two samples for two time points (0-hour and 12-hour). We only use one of|/B), for all datasets. Ratios of S in Table 3 , which is the number of probes detected by the piccolo method but not by other four methods, to the number of total probes in each dataset are 0.333 (GDS1629) to 0.776 (GSE6542{{tag}}--REUSE--_3). This means that using the piccolo method we find that 77.6% of all probes in GSE6542{{tag}}--REUSE--_3 are under the influence of circadian oscillation mechanisms but other four methods cann|r hand, ratios of the numbers of probes detected by one or more of the other four methods but not detected by the piccolo method to the number of total probes in each dataset are in the range of 0.0 (GSE6542{{tag}}--REUSE--_2, GSE6542{{tag}}--REUSE--_4, GSE6542{{tag}}--REUSE--_6) to 0.0418 (GDS404), or less than 5% (data not shown). The time series of a probe detected by only the piccolo method is plotted in each panel in Figure 3 (one probe is ch|argest ratio between the maximum power and the second largest power is plotted. Ranges of sampling time points are different by datasets. Datasets and its time ranges are: Top (left to right)—GDS1629 (44 h), GDS2110 (20 h), GDS2232 (44 h), Second (left to right)—GDS404 (44 h), GSE3424 (20 h), GSE6542{{tag}}--REUSE--_1 (20 h), Third (left to right)—GSE6542{{tag}}--REUSE--_2 (20 h), GSE6542{{tag}}--REUSE--_3 (20 h), GSE6542{{tag}}--REUSE--_4 (|ies. Table 2. P -values and their standard deviations (sd) for the normality test of the distribution of powers and logarithms of powers of time series data in datasets taken from the Gene Expression Omnibus database. N T Int. Power Log of power GDS1629 6346 8 6 0.898 (sd: 0.127) 0.911 (sd: 0.112) GDS2110 14904 6 4 0.936 (sd: 0.0712) 0.936 (sd: 0.0712) GDS2232 29109 12 4 0.845 (sd: 0.172) 0.828 (sd: 0.18|sd: 0.0751) 0.936 (sd: 0.0719) GDS404 6484 12 4 0.871 (sd: 0.153) 0.865 (sd: 0.159) GSE6542{{tag}}--REUSE---1 11699 6 4 0.936 (sd: 0.0712) 0.936 (sd: 0.0712) GSE6542{{tag}}--REUSE---2 11699 6 4 0.936 (sd: 0.0720) 0.936 (sd: 0.0718) GSE6542{{tag}}--REUSE---3 11699 6 4 0.945 (sd: 0.0682) 0.944 (sd: 0.0681) GSE6542{{tag}}--REUSE---4 11699 6 4 0.931 (sd: 0.0735) 0.932 (sd: 0.0735) GSE6542{{tag}}--REUSE---5 11699 12 4 0.877 (sd: 0.145) 0.875 (sd: 0.147) GSE6542{{tag}}--REUSE---6 11699 6 4 0.937 (sd: 0|ectively. The annotated probes are labeled with the GO term ‘circadian rhythm’ in the chip definition files of the microarrays. C Quantile Q test Ahdesmäki Piccolo/B Piccolo S GDS1629 22 146 / 1 60 / 0 121 / 0 163 / 1 2231 / 7 1981 GDS2110 26 – 667 / 0 457 / 2 0 / 0 10658 / 16 9745 GDS2232 37 4005 / 1 3053 / 3 5343 / 4 9118 / 11 23057 / 28 10892 GSE3424 33 – 2853| 0 / 0 19233 / 29 15837 GDS404 12 714 / 2 497 / 2 655 / 3 752 / 3 4044 / 6 2670 GSE6542{{tag}}--REUSE---1 28 – 529 / 0 401 / 1 0 / 0 8554 / 23 7829 GSE6542{{tag}}--REUSE---2 28 – 413 / 0 339 / 2 0 / 0 7706 / 23 7118 GSE6542{{tag}}--REUSE---3 28 – 720 / 0 340 / 1 0 / 0 9939 / 23 9099 GSE6542{{tag}}--REUSE---4 28 – 495 / 0 361 / 0 0 / 0 8924 / 23 8235 GSE6542{{tag}}--REUSE---5 28 799 / 3 623 / 0 656 / 5 1046 / 4 6038 / 19 4335 GSE6542{{tag}}--REUSE---6 28 – 5 | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
542 | GSE6542 | 12/7/2007 | ['6542'] | [] | [u'17411344'] | 1847695 | [u'17411344'] | ['Wijnen', 'Boothroyd', 'Young', 'Naef', 'Saez'] | ['Wijnen', 'Boothroyd', 'Young', 'Naef', 'Saez'] | ['Saez', 'Boothroyd', 'Young', 'Naef', 'Wijnen'] | PLoS Genet | 2007 | 4/6/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
543 | GSE6543 | 7/26/2007 | ['6543'] | [] | [u'17652709'] | 1983368 | [u'17652709'] | ['Stone', 'McGlinn', 'Budak', 'Tobias', 'Khurana', 'Baldwin'] | ['Stone', 'McGlinn', 'Budak', 'Tobias', 'Khurana', 'Baldwin'] | ['Stone', 'McGlinn', 'Budak', 'Tobias', 'Khurana', 'Baldwin'] | Invest Ophthalmol Vis Sci | 2007 | 2007 Aug | 0 | AND pmc_gds | 1 | 0 | ||||
544 | GSE6547 | 4/27/2007 | ['6547'] | ['2751'] | [u'17368442'] | 2785812 | [u'19917117'] | ['Fay', 'Kirienko'] | ['Kroll', 'Barkema', 'Carlon'] | [] | Algorithms Mol Biol | 2009 | 11/16/2009 | 1 | d to the local physical neighbors and to the nearest-neighbor free energy. Table 1 Overview over organisms and number of CEL-files analyzed Organism GEO # Chiptype (dimension) # files min A. Thaliana GSE4847 ATH1-121501 (712 × 712) 18 0.0259 GSE7642 ATH1-121501 (712 × 712) 12 0.0544 GSE9311 ATH1-121501 (712 × 712) 8 0.0546 C. Elegans GSE6547{{tag}}--REUSE-- Celegans (712 × 712) 25 0.036|00d7; 712) 7 0.0396 D. Melanogaster GSE3990 Drosophila_2 (732 × 732) 6 0.0620 GSE6558 DrosGenome1 (640 × 640) 24 0.0605 D. Rerio GSE4859 Zebrafish (712 × 712) 8 0.0357 E. Coli GSE11779 E_coli_2 (478 × 478) 3 0.0869 GSE2928 Ecoli (544 × 544) 12 0.0172 GSE6195 E_coli_2 (478 × 478) 4 0.0664 H. Sapiens GSE10433 HG-U133A_2 (732 × 732) 12 0.0757 GSE5054|-U133A (712 × 712) 14 0.0296 GSE8514 HG-U133_Plus_2 (1164 × 1164) 15 0.0738 GSE11897 MOE430A (712 × 712) 11 0.0640 MOE430B (712 × 712) Mouse430_2 (1002 × 1002) GSE6210 Mouse430_2 12 0.0594 GSE6297 Mouse430_2 24 0.0325 O. Sativa GSE15071 Rice (1164 × 1164) 20 0.1157 R. Norvegicus GSE4494 RG_U34A (534 × 534) 59 0.0488 GSE7493 Rat230_2 (834 ×|2 (834 × 834) 4 0.0640 S. Aureus GSE7944 S_aureus (602 × 602) 6 0.0746 S. Cerevisiae GSE6073 YG_S98 (534 × 534) 12 0.0283 GSE8379 YG_S98 (534 × 534) 8 0.0180 X. Laevis GSE3368 Xenopus_laevis (712 × 712) 20 0.0514 Table 2 Optimized parameter values as obtained from the minimization of Eq. (3). A. Thaliana C. Elegans D. Melanogaster D. Rerio GEO no GSE4847 GSE7642|d in this work. Figure 5 Computed background vs. low expressed genes data . Experimental results (solid black line) and theoretical prediction (dashed red line) of probeset a) 256610_at (A. Thaliana, GSE4847, GSM109107.CEL) and b) 175270_at (C. Elegans, GSE8159, GSM201995.CEL). Discussion We have presented a background subtraction scheme for Affymetrix GeneExpression arrays which is both, accurate and | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
545 | GSE6548 | 6/22/2007 | ['6548'] | [] | [u'17573968'] | 1929103 | [u'17573968'] | ['Nicoulaz', 'Bonnefoi', 'Brisken', 'Iggo', 'Andr\xc3\xa9', 'Fiche', u'Andr\xe9', 'Duss'] | ['Nicoulaz', 'Bonnefoi', 'Brisken', 'Iggo', 'Andr\xc3\xa9', 'Fiche', 'Duss'] | ['Nicoulaz', 'Bonnefoi', 'Brisken', 'Iggo', 'Andr\xc3\xa9', 'Fiche', 'Duss'] | Breast Cancer Res | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
546 | GSE6557 | 5/19/2007 | ['6557'] | [] | [u'17510629'] | 1894767 | [u'17510629'] | ['Walfridsson', 'Gustafsson', 'Khorosjutina', 'Ekwall', 'Matikainen'] | ['Walfridsson', 'Gustafsson', 'Khorosjutina', 'Ekwall', 'Matikainen'] | ['Walfridsson', 'Gustafsson', 'Khorosjutina', 'Ekwall', 'Matikainen'] | EMBO J | 2007 | 6/20/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
547 | GSE6558 | 4/16/2007 | ['6558'] | ['2830'] | [u'17584255'] | 2785812 | [u'19917117'] | ['S\xc3\xb8rensen', 'Loeschcke', 'Nielsen', u'S\xf8rensen'] | ['Kroll', 'Barkema', 'Carlon'] | [] | Algorithms Mol Biol | 2009 | 11/16/2009 | 1 | d to the local physical neighbors and to the nearest-neighbor free energy. Table 1 Overview over organisms and number of CEL-files analyzed Organism GEO # Chiptype (dimension) # files min A. Thaliana GSE4847 ATH1-121501 (712 × 712) 18 0.0259 GSE7642 ATH1-121501 (712 × 712) 12 0.0544 GSE9311 ATH1-121501 (712 × 712) 8 0.0546 C. Elegans GSE6547 Celegans (712 × 712) 25 0.036|00d7; 712) 7 0.0396 D. Melanogaster GSE3990 Drosophila_2 (732 × 732) 6 0.0620 GSE6558{{tag}}--REUSE-- DrosGenome1 (640 × 640) 24 0.0605 D. Rerio GSE4859 Zebrafish (712 × 712) 8 0.0357 E. Coli GSE11779 E_coli_2 (478 × 478) 3 0.0869 GSE2928 Ecoli (544 × 544) 12 0.0172 GSE6195 E_coli_2 (478 × 478) 4 0.0664 H. Sapiens GSE10433 HG-U133A_2 (732 × 732) 12 0.0757 GSE5054|-U133A (712 × 712) 14 0.0296 GSE8514 HG-U133_Plus_2 (1164 × 1164) 15 0.0738 GSE11897 MOE430A (712 × 712) 11 0.0640 MOE430B (712 × 712) Mouse430_2 (1002 × 1002) GSE6210 Mouse430_2 12 0.0594 GSE6297 Mouse430_2 24 0.0325 O. Sativa GSE15071 Rice (1164 × 1164) 20 0.1157 R. Norvegicus GSE4494 RG_U34A (534 × 534) 59 0.0488 GSE7493 Rat230_2 (834 ×|2 (834 × 834) 4 0.0640 S. Aureus GSE7944 S_aureus (602 × 602) 6 0.0746 S. Cerevisiae GSE6073 YG_S98 (534 × 534) 12 0.0283 GSE8379 YG_S98 (534 × 534) 8 0.0180 X. Laevis GSE3368 Xenopus_laevis (712 × 712) 20 0.0514 Table 2 Optimized parameter values as obtained from the minimization of Eq. (3). A. Thaliana C. Elegans D. Melanogaster D. Rerio GEO no GSE4847 GSE7642|d in this work. Figure 5 Computed background vs. low expressed genes data . Experimental results (solid black line) and theoretical prediction (dashed red line) of probeset a) 256610_at (A. Thaliana, GSE4847, GSM109107.CEL) and b) 175270_at (C. Elegans, GSE8159, GSM201995.CEL). Discussion We have presented a background subtraction scheme for Affymetrix GeneExpression arrays which is both, accurate and | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
548 | GSE6559 | 1/2/2007 | ['6559'] | [] | [u'17440097'] | 2239246 | [u'17440097'] | ['Adams', 'Zhang', u'Ricky', 'Khavari', 'Tao', 'Ridky'] | ['Tao', 'Ridky', 'Adams', 'Khavari', 'Zhang'] | ['Tao', 'Ridky', 'Adams', 'Khavari', 'Zhang'] | Cancer Res | 2007 | 4/15/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
549 | GSE6564 | 3/2/2007 | ['6564'] | [] | [u'17335352'] | 1808074 | [u'17335352'] | ['Greil', 'de', 'van'] | ['Greil', 'de', 'van'] | ['Greil', 'de', 'van'] | PLoS Genet | 2007 | 3/2/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
550 | GSE6565 | 7/6/2007 | ['6565'] | [] | [u'17565682'] | 1906768 | [u'17565682'] | ['Krakow', 'Cohn', 'Funari', 'Nelson', 'Chen', 'Day'] | ['Krakow', 'Cohn', 'Funari', 'Nelson', 'Chen', 'Day'] | ['Krakow', 'Cohn', 'Funari', 'Nelson', 'Chen', 'Day'] | BMC Genomics | 2007 | 6/12/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
551 | GSE6569 | 3/5/2007 | ['6569'] | [] | [u'17332353'] | 2602602 | [u'19104654'] | ['Clark', 'Fairchild', 'Han', 'Platero', 'Wong', 'Shaw', 'Huang', 'Reeves', 'Lee'] | ['Remy', 'Didier', 'Granjeaud', 'Imbert', 'Bergon', 'Nguyen', 'Puthier', 'Lopez', 'Textoris'] | [] | PLoS One | 2008 | 2008 | 1 | iated DKNN value (with k being set typically to 100 for microarrays containing 10 to 50k elements). Distributions of DKNN values observed with both an artificial and a real dataset (Complex9RN200 and GSE1456 respectively, see thereafter for a description) are shown in Figure S3A and S3B (solid curve). The asymmetrical shape of the distribution observed in Figure S3B suggests the presence of a parti|s ability to extract relevant informations from a noisy environment. However, a range of optimal values for inflation parameter needs to be defined to get the best results. Performances of DBF-MCL on GSE1456 dataset Next, DBF-MCL was tested with microarray data to explore its effectiveness in finding clusters of co-regulated genes. To this end, we used the microarray data from Pawitan et al. |sed for analysis. Figure S5B , shows the number of informative genes obtained with various k values. Again, two phases were observed suggesting that regions with heterogeneous densities exist in the GSE1456 dataset. As expected, the transition from dense to sparse regions was less marked than in the artificial dataset. A k value of 100 was chosen to allow the extraction of a large part of data that ca| microarray dataset. Interestingly, in all cases meaningful partitioning results were obtained using inflation parameter set to 2. 10.1371/journal.pone.0004001.g001 Figure 1 Results obtained with the GSE1456 dataset. DBF-MCL was run with GSE1456 as input (k = 100, FDR = 10%, S 1..3 , Inflation = 2). (A) Hierarchical cluster|ected genes. (D) The graph after MCL partitioning. Each point is colored according to its associated class. (E) Correspondence between hierarchical clustering and DBF-MCL results. (F) TS obtained for GSE1456 (G) Functional enrichment associated with these TS. Systematic extraction of TS We next applied DBF-MCL algorithm to all experiments performed on human, mouse and rat Affymetrix microarrays and ava|he results panel (2), the information panel (3), the plugins panel (4) and the plugin display panel (5). This example shows the expression profiles of genes contained in the TS CBE3881EB derived from GSE469 (“Temporal profiling in muscle regeneration”). Annotation panel shows that this TS is highly enriched in genes related to ATP synthesis. Meta-analysis of public microarray data using|e 1 Transcriptional signatures containing Affymetrix probes for ESR1, GATA3 and FOXA1. TS ID 1 Genes 2 Probes 2 Samples 2 Sample type GSE ID GPL ID Author PubMed IDs 0F2635383 1190 1572 23 Cell lines GSE6569{{tag}}--REUSE-- GPL96 Huang F et al 2007 17332353 3DE64836D 102 143 62 Tissue GSE7904 GPL570 unpublished 2007 - 59A18E225 690 893 121 Both GSE2603 GPL96 Minn AJ et al 2005 16049480 6C975B20B 88 96 26 Tissue GSE677|lein A et al 2007 17410534 6C975B290 88 96 26 Tissue GSE6596 GPL96 Klein A et al 2007 17410534 7150E17F6 868 1032 34 Cell lines GSE4668 GPL96 Coser KR et al 2003 14610279 8059848B4 200 250 251 Tissue GSE3494 GPL96 Miller LD et al 2005 16141321 84E5E1077 694 883 198 Tissue GSE7390 GPL96 Desmedt C et al 2007 17545524 8F69864F9 68 82 95 Tissue GSE5847 GPL96 Boersma BJ et al 2007 17999412 A151D5695 297 361|et al 2007 17420468 B79B1C0B9 270 380 47 Tissue GSE3744 GPL570 Richardson AL et al 2006 16473279 BDB6D8700 550 679 104 Tissue GSE3726 GPL96 Chowdary D et al 2006 16436632 D8F0B528C 125 152 159 Tissue GSE1456 GPL96 Pawitan Y et al 2005 16280042 E2E620F40 448 616 129 Tissue GSE5460 GPL570 unpublished 2007 - EA9669A21 219 251 158 Tissue GSE3143 GPL91 Bild AH et al 2006 16273092 F310ACC36 519 646 49 Tissue|05b;15] and in genes related to the PIR keyword “multigene family”. Furthermore, several signatures, of Table 2 are related to melanoma and six of them were observed in the GSE7127 experiment [16] . Although data from Table 2 would deserve further analysis they are most likely related to gain or loss of genetic material in tumors. Indeed, gain of 8q is freq|<1.10−20 ) for any of the human cytoband tested. TS ID 1 Enrich. 2 Cytoband q.value Sample type GSE ID GPL ID Authors PubMed ID 3DA3C8345 24% 17q12-q21 1.7.10 −39 Skin GSE5667 GPL97 Plager DA et al 2007 17181634 43CC3EF57 9% 8q24.3 7.0.10 −32 Melanoma GSE7153 GPL570 Unpublished 2007 - 60E29DA83 16% 8q24.3 6.8.10 −24 Melanoma GSE7127 GPL570| GPL570 Johansson P et al 2007 17516929 60E6B4129 35% 20p13 1.6.10 −26 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60E96FF1E 28% 6p21.3 1.2.10 −28 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60EC95F6A 17% 7q22.1 6.3.10 −31 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60EEBD669 32% 11q23.3 1.4.10 −26 Melan|127 GPL570 Johansson P et al 2007 17516929 B4C95CF18 42% 8q24.3 1.1.10 −36 Ovary GSE6008 GPL96 Hendrix ND et al 2006 16452189 A93ED6519 16% 11q23.3 6.9.10 −23 Melanoma GSE7152 GPL570 Packer LM et al 2007 17450523 A93DB01ED 11% 7q22.1 9.5.10 −30 Melanoma GSE7152 GPL570 Packer LM et al 2007 17450523 1 Transcriptional signature ID. 2 Enrichment: Proportion o|ial platforms ( e.g. National Cancer Institute, Vanderbilt Microarray Shared Resource, Genome Institute of Singapore), several of them being related to the MicroArray Quality Control (MAQC) project (GSE5350) [26] . However, to date, systematic analysis of all experiments performed on these platforms has not been done. The flexibility of our approach also makes it possible to integrate|ile. Figure S3 Distributions of DKNN values. Observed DKNN values (solid line) and of a set of simulated DKNN values S (dotted line) are shown for (A) the Complex9RN200 artificial dataset and (B) the GSE1456 microarray dataset. (9.01 MB TIF) Click here for additional data file. Figure S4 Colors correspond to the clusters found using the corresponding algorithm (A) The whole dataset (9,112 points). (B) |s input using a range of k values (FDR = 10%, S1..3, Inflation = 1.2). (B) DBF-MCL was run with several microarray datasets as input (including GSE1456) using a range of k values (FDR = 10%, S1..3, Inflation = 2). (8.72 MB TIF) Click here for additional data file. Figure S6 The TBMap plugin. |for technical assistance. References 1 Stoeckert CJ Causton HC Ball CA 2002 Microarray databases: standards and ontologies. Nat Genet 32 Suppl 469 73 12454640 2 Barrett T Edgar R 2006 Gene expression omnibus: microarray data storage, submission, retrieval, and analysis. Methods Enzymol 411 352 69 16939800 3 Diehn M Sherlock G Binkley G Jin H Matese JC 2003 SOURCE: a unified genomic resource of functional | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
552 | GSE6569 | 3/5/2007 | ['6569'] | [] | [u'17332353'] | 2943622 | [u'20660011'] | ['Clark', 'Fairchild', 'Han', 'Platero', 'Wong', 'Shaw', 'Huang', 'Reeves', 'Lee'] | ['Taschereau', 'Plaisier', 'Wong', 'Graeber'] | ['Wong'] | Nucleic Acids Res | 2010 | 9/1/2010 | 0 | res are then characterized using algorithms that measure statistical enrichment for genes in particular pathways, with particular functions or with particular structural characteristics attained from publicly available databases. The statistical significance of enrichment is typically determined using the hypergeometric distribution or equivalently the one-tailed version of Fisher’s exact test. Al| and homologs were identified using HomoloGene; only the probe with the highest absolute signed t -test P -value within those with matching UniGene identifiers was kept in the collapsing step. Data downloaded from Gene-expression Omnibus (GEO) ( 19 ): MPAKT prostate cancer mouse model, GSE1413; breast tumors with ER, PR and HER2 status, GSE2603; MMTV-HER2/neu mouse model, GSE2528; BCR-ABL transfected ce|0912; mammary stem cell, GSE3711; KRAS2 overexpression cell line, GSE3151; lung tumors with KRAS2 status, GSE3141; imatinib treatment in leukemia patients, GSE2535; dasatinib treatment in cell lines, GSE9633 and GSE6569{{tag}}--REUSE--; castration and testosterone treatment in mice, GSE5901; gedunin treatment in prostate cancer cell line, GSE5506. Data downloaded from Array Express ( 20 ): imatinib treatment in leukem|orodeoxyglucose-positron emission tomography (FDG-PET) imaging (N. Palaskas et al. , submitted for publication). Gene-expression information from microarray data repositories such as Gene-expression Omnibus ( 19 ) and ArrayExpress ( 53 ) as well as from more specialized resources such as the drug response profile database Cmap ( 10 ) can be readily converted to ranked list inputs for our program. Thus, |dependency Ann Stat 2001 29 1165 1188 18 Dudoit S Shaffer JP Boldrick JC Multiple hypothesis testing in microarray experiments Stat. Sci. 2003 18 71 103 19 Edgar R Domrachev M Lash AE Gene Expression Omnibus: NCBI gene expression and hybridization array data repository Nucleic Acids Res. 2002 30 207 210 11752295 20 Brazma A Parkinson H Sarkans U Shojatalab M Vilo J Abeygunawardena N Holloway E Kapushesky | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
553 | GSE6571 | 7/1/2007 | ['6571'] | [] | [u'18300050'] | 3022651 | [u'21187013'] | ['Jakob', 'Beckers', 'Bitterle', 'Beck-Speier', 'Frankenberger', 'Maier', u'Rieger', 'Horsch', 'St\xc3\xb6ger', 'Krauss-Etschmann', 'Hofer', 'Schulz', 'H\xc3\xbcltner', 'Alessandrini', 'Diabat\xc3\xa9', 'Behrendt', 'Ziesenis', u'St\xf6ger'] | ['Withers', 'Thomason', 'Scott', 'Li'] | [] | BMC Syst Biol | 2010 | 12/27/2010 | 0 | orithm. This dataset has been deposited in EBI ArrayExpress Database with accession number E-BAIR-12 http://www.ebi.ac.uk/microarray-as/ae/ . The second dataset was obtained from the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/ , accession ID: GSE4671 ), which contains 28 Affymetrix mouse genome 4302 chips. Nine-week old male mice in this experiment were fed with either a control cho|7), retroperitoneal WAT was obtained from two control mice and two test mice. Microarray data were processed using GCRMA algorithm. The third dataset was also acquired from GEO with accession number GSE8831 , in which 20 female and 15 male C57BL/6J mice, fed by ad libitum , varied in body weight and insulin sensitivity were studied. Fasting blood glucose and serum insulin concentrations were measured|15 genes and fewer than 500 genes; the results returned were significant in the above four datasets as well as an additional dataset which included 45 sleep mice model with hypothalamus tissue (GEO: GSE6514 ). We then worked on those canonical pathway sets which passed the same filter only on BAIR fat-fed mice and human fat dataset about 20 lean and 19 obese person[ 22 ]. Protein-protein interaction n| Nnat include Aqp1, Sncg, Sulf2 and Cxcl9 . Additional file 1 , Figure S3 illustrates the expressions of the top four most correlated genes ( Gstt1, Ccdc80, Hfe Sod3 ) with Nnat in BAIR and GSE6571{{tag}}--REUSE-- datasets. Among these genes is only one transcription factor, Ebf1 which has previously been suggested to have a regulatory role in adipogenesis[ 24 ]. Another of the Nnat -correlated genes, L | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
554 | GSE6572 | 4/20/2007 | ['6572'] | [] | [u'18495935'] | 2820453 | [u'20053297'] | ['Wang', 'Zhang', 'Jin', 'Lin', 'Yan', 'Ping', 'Elmerich', 'Li', 'Peng', 'Liu', 'Yao', 'Yang', 'Chen', 'Geng', 'He', 'Yu', 'Lu', 'Zhan', 'Dou'] | ['Han', 'Zhang', 'Jin', 'Lin', 'Ping', 'Li', 'Peng', 'Chen', 'Yang', 'Fan', 'Cheng', 'Yan', 'Lu', 'Zhan', 'Dou'] | ['Yang', 'Zhang', 'Jin', 'Lin', 'Ping', 'Li', 'Peng', 'Yan', 'Chen', 'Lu', 'Zhan', 'Dou'] | BMC Genomics | 2010 | 1/7/2010 | 0 | AND pmc_gds | 1 | 0 | ||||
555 | GSE6572 | 4/20/2007 | ['6572'] | [] | [u'18495935'] | 2396677 | [u'18495935'] | ['Wang', 'Zhang', 'Jin', 'Lin', 'Yan', 'Ping', 'Elmerich', 'Li', 'Peng', 'Liu', 'Yao', 'Yang', 'Chen', 'Geng', 'He', 'Yu', 'Lu', 'Zhan', 'Dou'] | ['Wang', 'Zhang', 'Jin', 'Lin', 'Yan', 'Ping', 'Elmerich', 'Li', 'Peng', 'Liu', 'Yao', 'Yang', 'Chen', 'Geng', 'He', 'Yu', 'Lu', 'Zhan', 'Dou'] | ['Wang', 'Zhang', 'Jin', 'Lin', 'Ping', 'Elmerich', 'Li', 'Peng', 'Liu', 'Yao', 'Yang', 'Chen', 'Yan', 'Geng', 'He', 'Yu', 'Lu', 'Zhan', 'Dou'] | Proc Natl Acad Sci U S A | 2008 | 5/27/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
556 | GSE6577 | 4/1/2007 | ['6577'] | ['2827'] | [u'17404078'] | 2481503 | [u'18559090'] | ['Hegardt', 'Ed\xc3\xa9n', 'Malmstr\xc3\xb6m', u'Ferno', u'Eden', 'Bendahl', 'Saal', 'Fern\xc3\xb6', u'Malmstrom', 'Borg', 'Laakso', 'Gruvberger-Saal', 'Peterson', 'Isola'] | ['Hegardt', 'Grabau', 'Honeth', 'Ringn\xc3\xa9r', 'Bendahl', 'Saal', 'L\xc3\xb6vgren', 'Fern\xc3\xb6', 'Borg', 'Gruvberger-Saal'] | ['Hegardt', 'Bendahl', 'Saal', 'Fern\xc3\xb6', 'Borg', 'Gruvberger-Saal'] | Breast Cancer Res | 2008 | 2008 | 0 | rt, mRNA expression analysis has previously been performed using cDNA microarrays with 27,648 reporters [ 22 , 23 ]. The microarray data for these 168 tumors are available through the Gene Expression Omnibus database (accession numbers GSE6577{{tag}}--DEPOSIT-- and GSE5325). Data pre-processing and filtering for the selected 168 tumors were performed using the BioArray Software Environment [ 24 ] as previously described [ | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
557 | GSE6578 | 3/14/2007 | ['6578'] | ['2826'] | [u'17389363'] | 1851616 | [u'17389363'] | ['Portoghese', 'Lunzer', 'Hebbel', 'Yekkirala'] | ['Portoghese', 'Lunzer', 'Hebbel', 'Yekkirala'] | ['Portoghese', 'Lunzer', 'Hebbel', 'Yekkirala'] | Proc Natl Acad Sci U S A | 2007 | 4/3/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
558 | GSE6583 | 12/1/2007 | ['6583'] | [] | [] | 2048692 | [u'17905899'] | [''] | ['Zhang', 'Ouyang', 'Chua', 'Catala', 'Hu', 'Abreu', 'Seo'] | [] | Plant Cell | 2007 | 2007 Sep | 0 | nning were performed at the Genomic Resource Center (The Rockefeller University; http://www.rockefeller.edu/genomics ). All microarray data will be available in the public repository Gene Expression Omnibus upon publication ( http://www.ncbi.nlm.nih.gov/geo/ ) under the accession number GSE6583{{tag}}--DEPOSIT-- . Statistical Analysis of the Microarray Analysis Statistical analysis of microarray data was performed using | 0 | 1 | 0 | NOT pmc_gds | 0 | 1 |
559 | GSE6584 | 7/30/2007 | ['6584'] | ['2889'] | [u'17655762'] | 2323217 | [u'17655762'] | ['Araujo', 'Kleinman', 'Nel', 'Gong', 'Lusis', 'Horvath', 'Li', 'Zhao', 'Sioutas', 'Barajas'] | ['Araujo', 'Kleinman', 'Nel', 'Gong', 'Lusis', 'Horvath', 'Li', 'Zhao', 'Sioutas', 'Barajas'] | ['Araujo', 'Kleinman', 'Nel', 'Gong', 'Lusis', 'Horvath', 'Li', 'Zhao', 'Sioutas', 'Barajas'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
560 | GSE6586 | 1/31/2007 | ['6586'] | [] | [u'17259983'] | 3008549 | [u'17259983'] | ['', 'Wang', 'Medvid', 'Blelloch', 'Jaenisch', 'Melton'] | ['Medvid', 'Blelloch', 'Melton', 'Wang', 'Jaenisch'] | ['Medvid', 'Melton', 'Blelloch', 'Wang', 'Jaenisch'] | Nat Genet | 2007 | 2007 Mar | 0 | AND pmc_gds | 1 | 0 | ||||
561 | GSE6593 | 1/8/2007 | ['6593'] | ['2521'] | [u'17047147'] | 2843641 | [u'20102610'] | ['Chen', 'Shivdasani', 'Hu'] | ['Rassart', 'Ben-David', 'Legault', 'Voisin', 'Ospina'] | [] | BMC Med Genomics | 2010 | 1/26/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
562 | GSE6593 | 1/8/2007 | ['6593'] | ['2521'] | [u'17047147'] | 1794061 | [u'17047147'] | ['Chen', 'Shivdasani', 'Hu'] | ['Chen', 'Shivdasani', 'Hu'] | ['Chen', 'Shivdasani', 'Hu'] | Blood | 2007 | 2/15/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
563 | GSE6594 | 4/30/2007 | ['6594'] | ['2684'] | [u'17369330'] | 1907119 | [u'17369330'] | ['Arp', '', 'Gelfand', 'Permina', 'Bottomley', 'Gvakharia', 'Sayavedra-Soto'] | ['Arp', 'Gelfand', 'Permina', 'Bottomley', 'Gvakharia', 'Sayavedra-Soto'] | ['Arp', 'Gelfand', 'Permina', 'Bottomley', 'Gvakharia', 'Sayavedra-Soto'] | Appl Environ Microbiol | 2007 | 2007 May | 0 | AND pmc_gds | 1 | 0 | ||||
564 | GSE6595 | 1/1/2007 | ['6595'] | [] | [u'17234177'] | 2655965 | [u'18999108'] | ['Schultz', 'Han', 'Vassena', 'Gao', 'Latham', 'Baldwin'] | ['Liu', 'Clarke', 'Yoon', 'Li'] | [] | AMIA Annu Symp Proc | 2008 | 11/6/2008 | 1 | eriments. In this paper, we report our experience in exploring Gene Expression Data (MGED) ontology (MO) 9 and NCI Thesaurus for annotating breast cancer microarray data available at Gene Expression Omnibus (GEO) 10 . Specifically, we tailored NCI Thesaurus to obtain breast cancer microarray clinical ontology (BCM-CO), an ontology to capture breast cancer microarray clinical information. The coverage of|I Metathesaurus are used to provide terminology support to the public Web portal, http://cancer.gov , numerous portals supporting consortia, and other communities of researchers. The Gene Expression Omnibus (GEO) was initiated to serve as a public repository for a wide range of high-throughput experimental data, which includes data from single and dual channel microarray-based experiments measuring mRNA| The prototype coverage with respect to four categories is discussed below in detail. DiseaseState and Histology In the BCM-CO prototype, 82 histology nodes were mapped. We identified six series ( GSE2109 , GSE5949 , GSE5720 , GSE6595{{tag}}--REUSE-- , GSE1477 , and GSE7849 ) in which histology terms can be extracted using simple patterns. Overall, 52 terms were retrieved, among which 48 are histology terms, 2 | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
565 | GSE6596 | 1/12/2007 | ['6596'] | [] | [u'17410534'] | 2602602 | [u'19104654'] | ['J\xc3\xbcrgens', 'Klein', 'Schmutzler', 'Arnold', 'Meindl', 'Scherneck', 'Niederacher', 'Seitz', 'Graessmann', 'Wessel', 'Petersen'] | ['Remy', 'Didier', 'Granjeaud', 'Imbert', 'Bergon', 'Nguyen', 'Puthier', 'Lopez', 'Textoris'] | [] | PLoS One | 2008 | 2008 | 1 | iated DKNN value (with k being set typically to 100 for microarrays containing 10 to 50k elements). Distributions of DKNN values observed with both an artificial and a real dataset (Complex9RN200 and GSE1456 respectively, see thereafter for a description) are shown in Figure S3A and S3B (solid curve). The asymmetrical shape of the distribution observed in Figure S3B suggests the presence of a parti|s ability to extract relevant informations from a noisy environment. However, a range of optimal values for inflation parameter needs to be defined to get the best results. Performances of DBF-MCL on GSE1456 dataset Next, DBF-MCL was tested with microarray data to explore its effectiveness in finding clusters of co-regulated genes. To this end, we used the microarray data from Pawitan et al. |sed for analysis. Figure S5B , shows the number of informative genes obtained with various k values. Again, two phases were observed suggesting that regions with heterogeneous densities exist in the GSE1456 dataset. As expected, the transition from dense to sparse regions was less marked than in the artificial dataset. A k value of 100 was chosen to allow the extraction of a large part of data that ca| microarray dataset. Interestingly, in all cases meaningful partitioning results were obtained using inflation parameter set to 2. 10.1371/journal.pone.0004001.g001 Figure 1 Results obtained with the GSE1456 dataset. DBF-MCL was run with GSE1456 as input (k = 100, FDR = 10%, S 1..3 , Inflation = 2). (A) Hierarchical cluster|ected genes. (D) The graph after MCL partitioning. Each point is colored according to its associated class. 
(E) Correspondence between hierarchical clustering and DBF-MCL results. (F) TS obtained for GSE1456 (G) Functional enrichment associated with these TS. Systematic extraction of TS We next applied DBF-MCL algorithm to all experiments performed on human, mouse and rat Affymetrix microarrays and ava|he results panel (2), the information panel (3), the plugins panel (4) and the plugin display panel (5). This example shows the expression profiles of genes contained in the TS CBE3881EB derived from GSE469 (“Temporal profiling in muscle regeneration”). Annotation panel shows that this TS is highly enriched in genes related to ATP synthesis. Meta-analysis of public microarray data using|e 1 Transcriptional signatures containing Affymetrix probes for ESR1, GATA3 and FOXA1. TS ID 1 Genes 2 Probes 2 Samples 2 Sample type GSE ID GPL ID Author PubMed IDs 0F2635383 1190 1572 23 Cell lines GSE6569 GPL96 Huang F et al 2007 17332353 3DE64836D 102 143 62 Tissue GSE7904 GPL570 unpublished 2007 - 59A18E225 690 893 121 Both GSE2603 GPL96 Minn AJ et al 2005 16049480 6C975B20B 88 96 26 Tissue GSE677|lein A et al 2007 17410534 6C975B290 88 96 26 Tissue GSE6596{{tag}}--REUSE-- GPL96 Klein A et al 2007 17410534 7150E17F6 868 1032 34 Cell lines GSE4668 GPL96 Coser KR et al 2003 14610279 8059848B4 200 250 251 Tissue GSE3494 GPL96 Miller LD et al 2005 16141321 84E5E1077 694 883 198 Tissue GSE7390 GPL96 Desmedt C et al 2007 17545524 8F69864F9 68 82 95 Tissue GSE5847 GPL96 Boersma BJ et al 2007 17999412 A151D5695 297 361|et al 2007 17420468 B79B1C0B9 270 380 47 Tissue GSE3744 GPL570 Richardson AL et al 2006 16473279 BDB6D8700 550 679 104 Tissue GSE3726 GPL96 Chowdary D et al 2006 16436632 D8F0B528C 125 152 159 Tissue GSE1456 GPL96 Pawitan Y et al 2005 16280042 E2E620F40 448 616 129 Tissue GSE5460 GPL570 unpublished 2007 - EA9669A21 219 251 158 Tissue GSE3143 GPL91 Bild AH et al 2006 16273092 F310ACC36 519 646 49 Tissue|05b;15] and in genes related to the PIR keyword “multigene family”. 
Furthermore, several signatures, of Table 2 are related to melanoma and six of them were observed in the GSE7127 experiment [16] . Although data from Table 2 would deserve further analysis they are most likely related to gain or loss of genetic material in tumors. Indeed, gain of 8q is freq|<1.10−20 ) for any of the human cytoband tested. TS ID 1 Enrich. 2 Cytoband q.value Sample type GSE ID GPL ID Authors PubMed ID 3DA3C8345 24% 17q12-q21 1.7.10 −39 Skin GSE5667 GPL97 Plager DA et al 2007 17181634 43CC3EF57 9% 8q24.3 7.0.10 −32 Melanoma GSE7153 GPL570 Unpublished 2007 - 60E29DA83 16% 8q24.3 6.8.10 −24 Melanoma GSE7127 GPL570| GPL570 Johansson P et al 2007 17516929 60E6B4129 35% 20p13 1.6.10 −26 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60E96FF1E 28% 6p21.3 1.2.10 −28 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60EC95F6A 17% 7q22.1 6.3.10 −31 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60EEBD669 32% 11q23.3 1.4.10 −26 Melan|127 GPL570 Johansson P et al 2007 17516929 B4C95CF18 42% 8q24.3 1.1.10 −36 Ovary GSE6008 GPL96 Hendrix ND et al 2006 16452189 A93ED6519 16% 11q23.3 6.9.10 −23 Melanoma GSE7152 GPL570 Packer LM et al 2007 17450523 A93DB01ED 11% 7q22.1 9.5.10 −30 Melanoma GSE7152 GPL570 Packer LM et al 2007 17450523 1 Transcriptional signature ID. 2 Enrichment: Proportion o|ial platforms ( e.g. National Cancer Institute, Vanderbilt Microarray Shared Resource, Genome Institute of Singapore), several of them being related to the MicroArray Quality Control (MAQC) project (GSE5350) [26] . However, to date, systematic analysis of all experiments performed on these platforms has not been done. The flexibility of our approach also makes it possible to integrate|ile. Figure S3 Distributions of DKNN values. Observed DKNN values (solid line) and of a set of simulated DKNN values S (dotted line) are shown for (A) the Complex9RN200 artificial dataset and (B) the GSE1456 microarray dataset. 
(9.01 MB TIF) Click here for additional data file. Figure S4 Colors correspond to the clusters found using the corresponding algorithm (A) The whole dataset (9,112 points). (B) |s input using a range of k values (FDR = 10%, S1..3, Inflation = 1.2). (B) DBF-MCL was run with several microarray datasets as input (including GSE1456) using a range of k values (FDR = 10%, S1..3, Inflation = 2). (8.72 MB TIF) Click here for additional data file. Figure S6 The TBMap plugin. |for technical assistance. References 1 Stoeckert CJ Causton HC Ball CA 2002 Microarray databases: standards and ontologies. Nat Genet 32 Suppl 469 73 12454640 2 Barrett T Edgar R 2006 Gene expression omnibus: microarray data storage, submission, retrieval, and analysis. Methods Enzymol 411 352 69 16939800 3 Diehn M Sherlock G Binkley G Jin H Matese JC 2003 SOURCE: a unified genomic resource of functional | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
566 | GSE6596 | 1/12/2007 | ['6596'] | [] | [u'17410534'] | 2872436 | [u'20335537'] | ['J\xc3\xbcrgens', 'Klein', 'Schmutzler', 'Arnold', 'Meindl', 'Scherneck', 'Niederacher', 'Seitz', 'Graessmann', 'Wessel', 'Petersen'] | ['Potti', 'Gatza', 'Lucas', 'Nevins', 'Barry', 'Kelley', 'Datto', 'Mathey-Prevot', 'Kim', 'Crawford', 'Wang'] | [] | Proc Natl Acad Sci U S A | 2010 | 4/13/2010 | 0 | sion Materials and Methods Supplementary Material References Materials and Methods Human Breast Tumor Samples and Cancer Cell Lines. A total of 1,143 patient samples from 10 independent datasets ( GSE1456 , GSE1561 , GSE2034 , GSE3494 , GSE3744 , GSE4922 , GSE5460 , GSE5764 , GSE6596{{tag}}--REUSE-- , and E-TABM-158) were analyzed ( 9 , 32 – 40 ). The validation dataset ( n = 547) was derived from | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
567 | GSE6604 | 1/16/2007 | ['6604'] | [] | [u'17430594', u'15254046', u'15892885'] | 1865555 | [u'17430594'] | ['Michalopoulos', 'Jing', 'Ma', 'Luo', 'McDonald', 'Becich', 'Finkelstein', 'Thomas', 'Gilbertson', 'Bisceglia', 'Liu', 'Ren', 'Nelson', 'Liang', 'Dhir', 'Landsittel', 'Yu', 'Lyons-Weiler', 'Monzon', 'Chandran'] | ['Michalopoulos', 'Ma', 'Becich', 'Bisceglia', 'Liang', 'Dhir', 'Lyons-Weiler', 'Monzon', 'Chandran'] | ['Michalopoulos', 'Ma', 'Becich', 'Bisceglia', 'Liang', 'Dhir', 'Lyons-Weiler', 'Monzon', 'Chandran'] | BMC Cancer | 2007 | 4/12/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
568 | GSE6605 | 1/16/2007 | ['6605'] | [] | [u'17430594', u'17071602'] | 1865555 | [u'17430594'] | ['Jelezcova', 'Michalopoulos', 'McHale', 'Ma', 'Becich', 'Acquafondata', 'Bisceglia', 'Liang', 'Chiosea', 'Sobol', 'Dhir', 'Lyons-Weiler', 'Monzon', 'Chandran'] | ['Michalopoulos', 'Ma', 'Becich', 'Bisceglia', 'Liang', 'Dhir', 'Lyons-Weiler', 'Monzon', 'Chandran'] | ['Michalopoulos', 'Ma', 'Becich', 'Bisceglia', 'Liang', 'Dhir', 'Lyons-Weiler', 'Monzon', 'Chandran'] | BMC Cancer | 2007 | 4/12/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
569 | GSE6606 | 1/16/2007 | ['6606'] | [] | [u'17430594', u'15254046', u'15892885', u'17071602'] | 1865555 | [u'17430594'] | ['Jing', 'Liu', 'Ren', 'Liang', 'Sobol', 'Monzon', 'Luo', 'Michalopoulos', 'Dhir', 'Yu', 'Jelezcova', 'Ma', 'McHale', 'Bisceglia', 'Nelson', 'Lyons-Weiler', 'Chandran', 'Thomas', 'McDonald', 'Chiosea', 'Finkelstein', 'Gilbertson', 'Landsittel', 'Becich', 'Acquafondata'] | ['Michalopoulos', 'Ma', 'Becich', 'Bisceglia', 'Liang', 'Dhir', 'Lyons-Weiler', 'Monzon', 'Chandran'] | ['Michalopoulos', 'Ma', 'Becich', 'Bisceglia', 'Liang', 'Dhir', 'Lyons-Weiler', 'Monzon', 'Chandran'] | BMC Cancer | 2007 | 4/12/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
570 | GSE6607 | 12/29/2007 | ['6607'] | [] | [] | 2396644 | [u'18485203'] | [u'Papoutsakis', u'Wang'] | ['Papoutsakis', 'Wang', 'Windgassen'] | [u'Papoutsakis', u'Wang'] | BMC Genomics | 2008 | 5/16/2008 | 0 | ontology assignment) was carried out using 'MultiExperiment Viewer (MeV)' from The Institute for Genomic Research (TIGR) [ 56 ]. Raw and normalized data were deposited in the Gene Expression Omnibus (GSE6607{{tag}}--DEPOSIT-- (CD3+ T-cell experiment), GSE7571 (CD4+ T-cell experiment) and GSE7572 (CD8+ T-cell experiment)) [ 57 ]. Within each population (three biological replicates using cells from different donors), mult | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
571 | GSE6607 | 12/29/2007 | ['6607'] | [] | [] | 2600644 | [u'18947405'] | [u'Papoutsakis', u'Wang'] | ['Papoutsakis', 'Wang', 'Windgassen'] | [u'Papoutsakis', u'Wang'] | BMC Med Genomics | 2008 | 10/23/2008 | 0 | tering, and Gene Ontology assignment) with 'MultiExperiment Viewer (MeV)' from The Institute for Genomic Research (TIGR) [ 11 ]. Raw and normalized data were deposited in the Gene Expression Omnibus (GSE6607{{tag}}--DEPOSIT-- (CD3+ T-cell experiment), GSE7571 (CD4+ T-cell experiment) and GSE7572 (CD8+ T-cell experiment)) [ 12 ]. Within each population (three biological replicates using cells from three different donors) | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
572 | GSE6608 | 1/16/2007 | ['6608'] | [] | [u'17430594', u'15254046', u'15892885'] | 1865555 | [u'17430594'] | ['Michalopoulos', 'Jing', 'Ma', 'Luo', 'McDonald', 'Becich', 'Finkelstein', 'Thomas', 'Gilbertson', 'Bisceglia', 'Liu', 'Ren', 'Nelson', 'Liang', 'Dhir', 'Landsittel', 'Yu', 'Lyons-Weiler', 'Monzon', 'Chandran'] | ['Michalopoulos', 'Ma', 'Becich', 'Bisceglia', 'Liang', 'Dhir', 'Lyons-Weiler', 'Monzon', 'Chandran'] | ['Michalopoulos', 'Ma', 'Becich', 'Bisceglia', 'Liang', 'Dhir', 'Lyons-Weiler', 'Monzon', 'Chandran'] | BMC Cancer | 2007 | 4/12/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
573 | GSE6611 | 8/31/2007 | ['6611'] | [] | [u'17369403'] | 1820944 | [u'17369403'] | ['Ni', u'OBrien', 'Blume', 'Clark', 'Preston', 'Nobida', 'Shiue', "O'Brien", 'Donohue', 'Grate', 'Ares'] | ['Ni', 'Blume', 'Clark', 'Preston', 'Nobida', 'Shiue', "O'Brien", 'Donohue', 'Grate', 'Ares'] | ['Ni', 'Blume', 'Clark', 'Preston', 'Nobida', 'Shiue', "O'Brien", 'Donohue', 'Grate', 'Ares'] | Genes Dev | 2007 | 3/15/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
574 | GSE6621 | 1/20/2007 | ['6621'] | [] | [u'16874319'] | 2094713 | [u'17915019'] | ['Nott', 'Little', 'Chan', 'Cotsapas', 'Cowley', 'Williams'] | ['Swindell'] | [] | BMC Genomics | 2007 | 10/3/2007 | 0 | n in treatment A (relative to treatment B ). The RNA source for all treatments was liver and the value n refers to the number of independent biological replicates available for each treatment. a GSE3129, MG-U74Av2, Boyleston et al. [20] b GSE3150, MG 430 2.0, Boyleston et al. [20] c EMEXP153, MOE 430A, Amador-Noguez et al. [19] d GSE988, MG-U74Av2, Rowland et al. [26] e GSE5959, MG-U74Av2, Adamo e|30A, Amador-Noguez et al. [87] g GSE1093, MG-U74Av2, Tsuchiya et al. [24] h GSE2431, MG-U74Av2, Dhahbi et al. [28] i EMEXP490, MG-U74Av2, Heishi et al. [88] j GSE363, MG-U74Av2, Recinos et al. [89] k GSE3889, MG 430 2.0, Flowers et al. [90] l EMEXP839, MG 430 2.0, Niedernhofer et al. [91] The Snell, Ames and Little dwarf mutants were associated with large affects on gene expression at every age examine|pregulation, while green squares indicate significant downregulation. Williams et al. [ 30 ] have recently reported on baseline expression levels of liver organs from 31 BxD mouse strains (GEO Series GSE6621{{tag}}). The lifespan of 21 of these strains had previously been measured by Gelman et al. [ 31 ]. Using these two data sources, the relationship between expression levels of candidate genes and mean life|didate gene expression versus mean lifespan I . The expression level of nine candidate genes was examined among 21 BxD mouse strains. Expression data was generated by Williams et al. [30] (GEO series GSE6621{{tag}}--REUSE--). Lifespans of BxD strains were assayed by Gelman et al. [31]. The dashed horizontal line indicates the average gene expression level for each gene, while the solid line represents the least-square | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
575 | GSE6624 | 1/25/2007 | ['6624'] | [] | [u'17259635'] | 2862456 | [u'18229712'] | ['Farnham', "O'Geen", 'Jin', 'Bieda', 'Krig', 'Green', 'Yaswen'] | ['Chun', 'Kuan', 'Kele\xc5\x9f'] | [] | Pac Symp Biocomput | 2008 | 2008 | 0 | We provide an illustration of CMARRT with a ZNF217 ChIP-chip data tiling the ENCODE regions (available from Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/)14 with accession number GSE6624{{key}}--REUSE--). The ENCODE regions were tiled at a density of one 50-mer every 38 bp, leading to ~ 380,000 50-mer probes on the array. We analyze two different replicates of this dataset separately and compare the analysis on these single replicates | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
576 | GSE6624 | 1/25/2007 | ['6624'] | [] | [u'17259635'] | 2802188 | [u'20008927'] | ['Farnham', "O'Geen", 'Jin', 'Bieda', 'Krig', 'Green', 'Yaswen'] | ['Kraus', 'Krishnakumar', 'Yang', 'Frizzell', 'Gamble'] | [] | Genes Dev | 2010 | 1/1/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
577 | GSE6625 | 1/25/2007 | ['6625'] | [] | [u'17259635'] | 2802188 | [u'20008927'] | ['Farnham', "O'Geen", 'Jin', 'Bieda', 'Krig', 'Green', 'Yaswen'] | ['Kraus', 'Krishnakumar', 'Yang', 'Frizzell', 'Gamble'] | [] | Genes Dev | 2010 | 1/1/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
578 | GSE6626 | 1/4/2007 | ['6626'] | [] | [] | 2715804 | [u'19451228'] | [u'Ma', u'Cormack'] | ['Ma', 'Johnson', 'Whiteway', 'Domergue', 'Rigby', 'Pan', 'Cormack'] | [u'Ma', u'Cormack'] | Mol Cell Biol | 2009 | 2009 Aug | 0 | AND pmc_gds | 1 | 0 | ||||
579 | GSE6629 | 9/1/2007 | ['6629'] | [] | [u'17693573'] | 1950897 | [u'17693573'] | ['van', 'Gierman', 'Geerts', 'Goetze', 'Versteeg', 'Indemans', 'Seppen', 'Koster'] | ['van', 'Gierman', 'Geerts', 'Goetze', 'Versteeg', 'Indemans', 'Seppen', 'Koster'] | ['van', 'Gierman', 'Geerts', 'Goetze', 'Versteeg', 'Indemans', 'Seppen', 'Koster'] | Genome Res | 2007 | 2007 Sep | 0 | AND pmc_gds | 1 | 0 | ||||
580 | GSE6631 | 1/20/2007 | ['6631'] | ['2520'] | [u'15170515'] | 2848541 | [u'20369013'] | ['Brown', 'Delacure', 'Elango', 'Zhang', 'Kuriakose', 'McMunn-Coffran', u'Elongo', 'Qiu', 'Hsu', 'Chen', 'Sikora', 'He'] | ['Hasina', 'Xing', 'Lee', 'Zhang', 'Gerstein', 'Lussier', 'Cheng', 'Li', 'Wu', 'Yang', 'Fan', 'Huang', 'Weichselbaum', 'Lingen'] | ['Zhang'] | PLoS Comput Biol | 2010 | 4/1/2010 | 0 | nd Table 1 in Text S2 ). We validated this method using two independent cancer expression profiling experiments in GEO comprised of paired mRNA and microRNA expressions for tumors and normal tissue (GSE2564 [18] : multiple epithelial cancer; GSE8126 [19] : prostate cancer). IMRE-predicted downregulated microRNAs that are exclusively inferred from mRNA expression and m| analyze two independent HNSCC mRNA microarray datasets for predicting deregulated microRNAs from genome-wide mRNA expression (Supporting Figure 2 in Text S1 , Materials and Methods ): first , the GSE6631{{tag}}--REUSE-- set that provides differential mRNA gene expression between 22 HNSCC non-microdissected patient tumor samples and their paired normal squamous tissues [21] , and second , the GSE2|ted node-positive HNSCC tumors of the hypopharynx. We noted that vast majority of the known microRNAs had at least one putative target in the top 500 deregulated genes of the HNSCC expression arrays (GSE6631{{tag}}--REUSE--), with a median of 19 targets. Therefore, it is unfeasible to manually select microRNA candidates from their deregulated targets for biological validation. Applying IMRE method to each dataset, we | HNSCC. ( A ) Enriched gene targets of 46 microRNAs among inheritable cancer genes in OMIM are significantly overlapping with 34 predictions of deregulated microRNAs based on HNSCC expression arrays (GSE6631{{tag}}--REUSE--, GSE2379; Figure S3; Table S2 and Table S3), yielding ten prioritized microRNAs ( P  = 2.33×10 −4 ). P : Cumulative hypergeometric Statistics. §: mi|'s B-cell lymphoma. 
( G ) Comparison of miR-204 expression between six types of human epithelial cancers tissues and their respective normal tissues was conducted using the microRNA profiling dataset GSE2564 [18] ( P-values were calculated using two-tail unpaired t-test; “n” indicates number of patients; error bars represent mean ± standard error of the mean, |ificantly reduced ( Figure 1F ). Further, paired comparison of miR-204 expression between 6 types of adenocarcinomas and their respective normal tissues was conducted using the microRNA array dataset GSE2564 [18] . miR-204 was significantly down-regulated in breast ( P  = 0.014), kidney ( P  = 0.004) and prostate ( P  =&#x|rgets are significantly related through their biological functions Among the 1,088 putative miR-204 targets predicted in the miRNOME, 34 mRNA transcripts that were significantly upregulated in HNSCC (GSE6631{{tag}}) led to the enrichment of miR-204 (Figure 2A and Table 6 in Text S2 ). We first conducted statistical functional enrichment analyses using Gene Ontology (GO) [28] and found a num|cted targets of miR-204 in HNSCC are significantly related via their molecular or biological functions. ( A ) Enrichment of 34 miR-204 gene targets between 382 differentially upregulated HNSCC genes (GSE6631{{tag}}) and 1088 putative miR-204 targets predicted by sequence-based microRNA target prediction databases (miRNOME); ( B–C ) Determination of mRNA expression of “functionally prioritized |contains 44,695 protein-protein interactions and 7,321 predicted human genes targets for the 532 microRNAs in the miRNOME. We subsequently could map 260 out of 382 (68%) up-regulated genes in GSE6631{{tag}} to the PPIN (refer to as “HNSCC PPIN”), of which 24 were miR-204 targets predicted in miRNOME. We next computed the empirical probability of interactions among these 260 genes in th| as compared to those found in the empirical distribution ( Materials and Methods ). 
As a result, we identified a protein regulatory network in HNSCC consisting of 56 prioritized upregulated genes in GSE6631{{tag}} at a low false discovery rate of 7% ( Figure 3 and, Materials and Methods ) (referred to as “prioritized HNSCC PPIN”). Among the 24 miR-204 targets mapped to the genome-sc|ore the clinical relevance of miR-204 down regulation in HNSCC, we conducted an unbiased hierarchical clustering analysis of 60 HNSCC tumors harvested from representative anatomical sites of HNSCC in GSE686 [41] based on the mRNA expression pattern of 34 miR-204 targets identified in GSE6631{{tag}}--REUSE-- [21] ( Materials and Methods ). The original study reported a 582-gene signa|dentified a subtype of HNSCC tumors exhibiting an EGFR-pathway signature and miR-204 was deregulated in other squamous and epithelial tumors. miR-204 functional targets classified 60 HNSCC tumors in (GSE686) [41] microarray based on their intrinsic properties (Methods). P - values were obtained using a Fisher's exact test; *: censored data. Discussion Here, we developed an e|istical analyses (Figure 1A, Figure 1G, Figure 5 and Supporting Figure 3 in Text S1 ) Microarray datasets were downloaded from NCBI GEO database. The .cel file of HNSCC mRNA transcription array sets GSE6631{{tag}}--REUSE-- [21] and GSE2379 [22] were processed using the Bioconductor Package [65] implementation of GCRMA in R Software [66] . To ident|5;2 and False Discovery Rate (FDR) ≤0.0006 (Figure1A and Supporting Figure 3 in Text S1 ). The association of miR-204 targets with clinical parameters was analyzed using HNSCC mRNA array set GSE686 [41] . The intensity ratios of red to green channel of the predicted miR204 targets were retrieved from GSE686 dataset. Missing values were assigned a constant value of 0. Redundant|d by “*” in Figure 5 ). To determine the miR-204 expression status in epithelial tumors ( Figure 1G ), the expression values of miR-204 were extracted from microRNA array set GSE2564 [18] . 
Only six solid tumor types, colon, kidney, prostate, uterus, lung and breast that contained more than one samples in both tumor and the respective norm tissue were included |1 in Text S2 . miRNOME contains 534 distinct human microRNAs, 17,343 predicted putative microRNA gene targets and 444,558 distinct microRNA-Target relationships. Specifically, 5110 distinct genes in GSE6631{{tag}} and 5131 distinct genes in GSE2379 are respectively targeted by 531 and 530 microRNAs in the miRNOME. Imputed microRNA regulation based on weighted ranked expression and putative microRNA targets (|this microRNA. Further, the scores are adjusted for the cardinality (count of genes) of each gene-set ( e.g. cardinality of T i,j is | T i,j |; Equation 3). follows a normal distribution in both GSE6631{{tag}} and GSE2379 (data not shown). An empirical Student T-test for unequal variances was performed to compare Δ for each microRNA between cancer and normal tissue (Bioconductor package twili| Additionally, we developed a model that estimates the probability of occurrence of an observed Single Protein Network arising from the upregulated gene list between HNSCC and normal paired tissue in GSE6631{{tag}}. Each of these unregulated HNSCC gene was translated to its corresponding protein identifier in the network (HNSCC protein). Each HNSCC protein was mapped to each of the rest HNSCC proteins accordi|d; ( Figure 3 ) was predicted from SPAN in the “genome-scale PPIN” with a FDR of 7.14% for the links between labeled genes and of 10.15% for upregulated HNSCC genes in GSE6631{{tag}}. The resulting network was drawn using Cytoscape [84] . Details on the protein interaction dataset supporting each pair of protein interactions are provided in Table 12 in Text S2 | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
581 | GSE6634 | 1/25/2007 | ['6634'] | [] | [u'17259635'] | 2802188 | [u'20008927'] | ['Farnham', '', "O'Geen", 'Jin', 'Bieda', 'Krig', 'Green', 'Yaswen'] | ['Kraus', 'Krishnakumar', 'Yang', 'Frizzell', 'Gamble'] | [] | Genes Dev | 2010 | 1/1/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
582 | GSE6635 | 3/1/2007 | ['6635'] | [] | [u'17207281'] | 1779778 | [u'17207281'] | ['Ehrenkaufer', 'Singh', 'Hackney', 'Ali'] | ['Ehrenkaufer', 'Singh', 'Hackney', 'Ali'] | ['Ehrenkaufer', 'Singh', 'Hackney', 'Ali'] | BMC Genomics | 2007 | 1/5/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
583 | GSE6639 | 4/21/2007 | ['6639'] | [] | [u'17311107'] | 1797416 | [u'17311107'] | ['Petroulakis', 'Mamane', 'Martineau', 'Rajasekhar', 'Sato', 'Sonenberg', 'Larsson'] | ['Petroulakis', 'Mamane', 'Martineau', 'Rajasekhar', 'Sato', 'Sonenberg', 'Larsson'] | ['Petroulakis', 'Mamane', 'Martineau', 'Rajasekhar', 'Sato', 'Sonenberg', 'Larsson'] | PLoS One | 2007 | 2/21/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
584 | GSE6640 | 11/8/2007 | ['6640'] | [] | [u'17994089'] | 2633540 | [u'19139403'] | ['Sturgill', 'Parisi', 'Oliver', 'Kumar', 'Zhang'] | ['Bedford', 'Hartl'] | [] | Proc Natl Acad Sci U S A | 2009 | 1/27/2009 | 0 | ne-Expression Data. Present-day gene-expression levels for all 7 Drosophila species were based upon data from Zhang et al. ( 34 ). Raw hybridization data were obtained from the Gene Expression Omnibus under accession GSE6640{{tag}}--REUSE-- ( http://www.ncbi.nlm.nih.gov/geo/ ; accessed March 2008). For each array, we took the log 2 intensities of its probes and normalized these intensities to have mean 0 and v | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
585 | GSE6640 | 11/8/2007 | ['6640'] | [] | [u'17994089'] | 2386140 | [u'17994090'] | ['Sturgill', 'Parisi', 'Oliver', 'Kumar', 'Zhang'] | ['Sturgill', 'Parisi', 'Oliver', 'Zhang'] | ['Sturgill', 'Parisi', 'Oliver', 'Zhang'] | Nature | 2007 | 11/8/2007 | 1 | stract METHODS SUMMARY METHODS Supplementary Material References METHODS SUMMARY Array data sources Expression data for sex-sorted whole adults of the seven Drosophila species 8 are from GEO 19 ( GSE6640{{tag}}--REUSE-- ). Data for D. melanogaster gonadectomized male and female carcass on the FlyGEM platform were published previously 2 , GEO ( GSE442 ). Affymetrix data for bgcn − and UAS- os , bgcn &#| Wang PJ, McCarrey JR, Yang F, Page DC. An abundance of X-linked genes expressed in spermatogonia. Nature Genet. 2001; 27 :422–426. [ PubMed ] 19. Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002; 30 :207–210. [ PMC free article ] [ PubMed ] 20. Lyne R, et al. FlyMine: An integrated database for Dr|1] Paucity of genes on the Drosophila X chromosome showing male-biased expression. Science. 2003 Jan 31; 299(5607):697-700. [Science. 2003] See more articles cited in this paragraph Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002 Jan 1; 30(1):207-10. [Nucleic Acids Res. 2002] Paucity of genes on the Drosophila X chromosome showing male-bi | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
586 | GSE6640 | 11/8/2007 | ['6640'] | [] | [u'17994089'] | 2386141 | [u'17994089'] | ['Sturgill', 'Parisi', 'Oliver', 'Kumar', 'Zhang'] | ['Sturgill', 'Parisi', 'Oliver', 'Kumar', 'Zhang'] | ['Sturgill', 'Parisi', 'Oliver', 'Kumar', 'Zhang'] | Nature | 2007 | 11/8/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
587 | GSE6640 | 11/8/2007 | ['6640'] | [] | [u'17994089'] | 2820458 | [u'20051121'] | ['Sturgill', 'Parisi', 'Oliver', 'Kumar', 'Zhang'] | ['Oliver', 'Zhang'] | ['Oliver', 'Zhang'] | BMC Genomics | 2010 | 1/5/2010 | 0 | AND pmc_gds | 1 | 0 | ||||
588 | GSE6645 | 6/24/2007 | ['6645'] | [] | [u'17761519'] | 2190616 | [u'17761519'] | ['Rao', 'McLean', 'Shin', 'Wang', 'Song', 'Sherrer', 'Uhl', 'Blau', "D'Amour", 'Brimble', 'Zhan', 'Galeano', 'Nelson', 'Schulz', 'Chen', 'Ware', 'Dauphin', 'Robins', 'Chesnut'] | ['Rao', 'McLean', 'Shin', 'Wang', 'Song', 'Sherrer', 'Uhl', 'Blau', "D'Amour", 'Brimble', 'Zhan', 'Galeano', 'Nelson', 'Schulz', 'Chen', 'Ware', 'Dauphin', 'Robins', 'Chesnut'] | ['McLean', 'Sherrer', 'Shin', 'Wang', 'Song', 'Rao', 'Uhl', 'Blau', "D'Amour", 'Brimble', 'Zhan', 'Galeano', 'Nelson', 'Schulz', 'Chen', 'Ware', 'Dauphin', 'Robins', 'Chesnut'] | Blood | 2007 | 12/1/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
589 | GSE6647 | 1/6/2007 | ['6647'] | ['2625'] | [u'17317628'] | 1858660 | [u'17317628'] | ['Singer', 'Jacobson', 'Dong', 'Zenklusen', 'Li', 'He'] | ['Singer', 'Jacobson', 'Dong', 'Zenklusen', 'Li', 'He'] | ['Singer', 'Jacobson', 'Dong', 'Zenklusen', 'Li', 'He'] | Mol Cell | 2007 | 2/23/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
590 | GSE6653 | 1/9/2007 | ['6653'] | ['2975'] | [u'19615063'] | 2724489 | [u'19615063'] | ['Showe', 'Agosto-Perez', 'Qin', 'Huang', 'Chan', 'Lin', 'Souriraj', 'Potter', 'Yan', 'Liyanarachchi', 'Cheng', 'Saltz', 'Nephew', 'Nikonova', 'Davuluri', 'Balch'] | ['Showe', 'Agosto-Perez', 'Qin', 'Huang', 'Chan', 'Lin', 'Souriraj', 'Potter', 'Yan', 'Liyanarachchi', 'Cheng', 'Saltz', 'Nephew', 'Nikonova', 'Davuluri', 'Balch'] | ['Agosto-Perez', 'Qin', 'Chan', 'Showe', 'Lin', 'Souriraj', 'Potter', 'Yan', 'Balch', 'Liyanarachchi', 'Cheng', 'Saltz', 'Nephew', 'Nikonova', 'Davuluri', 'Huang'] | BMC Syst Biol | 2009 | 7/17/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
591 | GSE6675 | 1/9/2007 | ['6675'] | [] | [] | 2949890 | [u'20831831'] | [u'Mandell'] | ['Monsonego', 'Rubin', 'Geifman'] | [] | BMC Bioinformatics | 2010 | 9/12/2010 | 0 | ound 0.456971* 0.505641 External side of plasma membrane 0.744269 0.508488* Not Found (A) Enrichment in genes with up-regulated expression in hippocampus treated with Fluoxetine, based on GEO profile GSE6476. All terms passed FDR > 0.25 and state the FDR values. (B) GEO profile GSE6675{{tag}}--REUSE--. All terms passed FDR < 0.25 and state the FDR values (Up-regulated in Control in comparison to FGF2 t|ture. † indicates terms that were not included in the NIGO subset. For three GEO expression profiles, the results of the analysis are described in detail (Table 1 ). In the analysis of the GSE6476, in which the effect of chronic Fluoxetine treatment on hippocampal gene expression was examined [ 10 ], eleven terms passed statistic filtering with NIGO and not with the full GO, out of which 7 w|ded in the full GO but not in NIGO passed the statistic cutoffs but this term ('Protein self-association') was functionally irrelevant and contributed very little to the analysis. For the GEO profile GSE6675{{tag}}--REUSE--, in which astroglial gene expression program elicited by fibroblast growth factor-2 was examined, four terms relevant to neural/immune systems passed statistic filtering with NIGO and not with the |ith the full GO. This is an example of how without the use of NIGO one would have to raise the cutoff to at least 0.55 in order for this term to appear in the analysis results. Functional analysis of GSE6509 with NIGO also revealed statistically significant terms that were missed when using the full GO. This experiment involved microarray expression profiling designed to explore the effect of RU486, a |ed both in NIGO and the full GO, received higher FDR values when NIGO was used (Table 2 ), contrary to our expectations from the impact of reducing the ontology size. 
For example, in the analysis of GSE8788 in which gene expression analysis was conducted using Trib1-deficient macrophages treated with LPS as compared to LPS-treated wild-type macrophages [ 12 ], 11 out of 14 terms (that passed statistic|ment in FDR values when using NIGO is partially due to the stochastic nature of the GSEA algorithm. To test this hypothesis, the same analysis was repeated three times with each of the ontologies for GSE8788. FDR values were averaged and a comparison of analysis results was performed based on these averaged FDR values. In accordance with our hypothesis, the averaging of FDR values improved the apparent|l GO and GO-slim in enrichment analysis using GSEA Unique to Subset Lowest FDR Value Experiment Cutoff used Full GO NIGO Generic GO slim Full GO NIGO Generic GO slim Neural- or immune-related studies GSE6509 P < 0.05 2 (2) 5 0 0 2 0 GSE6675{{tag}}--REUSE-- FDR < 0.25 0 6 0 0 1 0 GSE6476 FDR < 0.25 1 (1) 11 0 0 4 0 GSE3779 P < 0.05 7 (3) 13 1 (1) 0 1 5 GSE8425 P < 0.05 6 (1) 0 0 | 0 0 0 1 GSE8788 P < 0.01 6 (6) 0 0 11 3 1 GSE8788* P < 0.01 6 (6) 0 0 3 11 1 GSE9659 P < 0.01 6 (6) 0 NS A 4 28 NS A Non-neural- or immune-related studies (negative controls) GSE7407 P < 0.01 5 (5) 0 0 0 1 2 GSE2259 FDR < 0.25 3 (1) 0 1 (1) 1 2 1 GSE8191 P < 0.05 5 (2) 0 0 0 0 0 The enrichment analysis results of GSEA, providing the full GO, NIGO or GO-s| for each subset given in separate columns. *To test the effect of stochasticity, the analysis of this profile is based on 3-fold averaged FDR values from three independent GSEA analyses. A Profile GSE9659 involves rat tissue, and was not analyzed with the mouse GO-slim. Three non-neural- or immune-related datasets were used to test the performance of NIGO in functional analysis of microarray data in|. 
These datasets included the GEO expression profile, GSE7407, in which gene expression in heart tissue with cardiac specific over-expression of Sirt1 was examined [ 13 ], the GEO expression profile, GSE8191, in which the gene expression profile of mammary glands from pregnant mice was compared to that of mammary glands from lactating mice [ 14 ], and the GEO expression profile, GSE2259, in which gene |in enrichment analysis using the Fisher Exact Test Unique to Subset Lowest FDR Value Experiment Cutoff used Full GO NIGO Generic GO slim Full GO NIGO Generic GO slim Neural- or immune-related studies GSE6509 P < 0.0001 1 (1) 1 0 0 22 2 GSE6675{{tag}}--REUSE-- P < 0.05 1 (1) 0 11 (1) 0 2 0 GSE6476 P < 0.0001 0 2 2 0 15 3 GSE3779 P < 0.1 0 0 0 0 2 0 GSE8425 P < 0.1 0 0 0 0 0 0 GSE|0.001 0 2 0 0 9 1 GSE6136 P < 0.000001 0 1 0 0 37 7 GSE8788 P < 0.01 0 2 0 0 4 1 GSE9659 P < 0.001 0 3 NS A 0 33 NS A Non-neural- or immune-related studies (negative controls) GSE7407 P < 0.0000000001 33 (30) 0 0 9 0 0 GSE2259 P < 0.1 63 (35) 0 2 8 8 7 GSE8191 P < 0.001 6 (6) 7 6 (2) 0 16 3 Enrichment analysis results from Ontologizer, providing the full |ional GO subset (the full GO or generic slim) are also shown for each term ('Lowest P-Value'), with the number of terms with the lowest P-values for each subset given in separate columns. A Profile GSE9659 involves rat tissue, and was not analyzed with the mouse GO-slim. These results, together with the analysis of five additional neural/immune-related experiments not described in Table 1 (namely p| cell's activity. These results further demonstrate that analysis with NIGO can enhance interpretation of functional analysis results produced for relevant microarray datasets. In the analysis of the GSE6509 expression dataset, three relevant terms passed the statistical cutoff with NIGO but not with the full GO. These terms, 'viral envelope', 'viral infectious cycle' and 'viral capsid' are all terms r|et. 
Such terms received FDR or p-values that were very close (but larger) than the cutoff values used. This is partially explained by the stochastic nature of the GSEA algorithm. Indeed, for one set (GSE8788), we compared the raw results with averaged FDR values. Averaging dramatically decreased the number of such terms. Furthermore, we conducted a similar functional analysis using the Fisher Exact Tes|nous peptide antigen via MHC class I' and 'Positive regulation of T cell mediated cytotoxicity' without defeating the purpose of the slimming process. Yet these two terms were found to be enriched in GSE6476, and are crucial for generating a hypothesis based on the expression profile. This shows that GO slims may be complemented by small, yet fully detailed domain-specific subsets of GO. Figure 4 The U|plementary/NIGO/Supplementary.html . The ontologies were clipped using the Protégé 4.0 beta OWL editor [ 26 ]. For by-species filtering, annotation files for human, rat and mouse were downloaded (October 2008) from GOA-EBI [ 27 ]. Association files used for GSEA analysis were generated based on the GOA-EBI annotation files and in the format required by GSEA. In this format, each row repres|a analysis and functional analysis were conducted using the GenePattern [ 29 ], GSEA [ 9 ] (release 2.5) web servers, and Ontologizer [ 16 ] as follows: (1) for each study, raw data (.CEL files) were downloaded from GEO [ 30 ]; (2) expression files (.gct files) were created using the Gene Pattern Expression File Creator module; (3) where necessary (i.e. for expression files GSE6509, GSE6675{{tag}}--REUSE--, GSE6476, GSE7|was conducted using the GSEA module. GSEA was run three times for each dataset, using a different GO version for each run. For the full GO, we used the organism-specific GO subset. In the analysis of GSE8788, GSEA was run three times for each of the three ontologies and FDR values were averaged over the three runs. 
(5) Differentially-expressed genes were found using the Gene Pattern ComparativeMarkerSe|, to an input term (Figure 4 ). The connections of UMLS concepts to the tested GO term are provided by UMLS and defined within the UMLS data files. Microarray data sets All microarray data sets were downloaded from GEO at NCBI [ 30 ]. The GEO sets used in this study are described in Additional file 2 . Availability NIGO is freely available as Additional file 6 and for download from: http://bioinfo.bg|re included or excluded from NIGO. Click here for file Additional file 2 Microarray Datasets for Comparative analysis of NIGO, GO and GO-slim . This file contains a summary of the microarray datasets downloaded from GEO at NCBI and used to test the performance of NIGO. Click here for file Additional file 3 GSEA analysis results . This file contains a summary of the GSEA analysis results for each of the mi|etroviral infectivity through different mechanisms Am J Physiol Lung Cell Mol Physiol 2009 297 L538 545 10.1152/ajplung.00162.2009 19561138 The Gene Ontology Consortium http://www.geneontology.org/GO.downloads.ontology.shtml Web Ontology Language Guide http://www.w3.org/TR/owl-guide Protégé 4.0 beta http://protege.stanford.edu Barrell D Dimmer E Huntley RP Binns D O'Donovan C Apweiler R Th | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
592 | GSE6678 | 1/9/2007 | ['6678'] | [] | [] | 2204004 | [u'18021406'] | [u'Hofmann'] | ['Hofmann', 'Lu', 'Qiao'] | [u'Hofmann'] | BMC Neurosci | 2007 | 11/16/2007 | 0 | Statistical Algorithm [ 55 ]. All .CEL, .CHP, and signal call files are available for download in Additional File 8 , and all raw data are available in the Gene Expression Omnibus [ 51 ], record no. GSE6678{{tag}}--DEPOSIT--. Filtering and normalization Probe-set level data generated by GCOS v1.4 software (Affymetrix) was collated into the software package BRB Array Tools, developed by the Biometric Research Branch of | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
593 | GSE6679 | 12/26/2007 | ['6679'] | [] | [u'17510634'] | 1888674 | [u'17510634'] | ['Major', 'Parisien', 'Maquat', 'Furic', 'Kim', 'DesGroseillers'] | ['Major', 'Parisien', 'Maquat', 'Furic', 'Kim', 'DesGroseillers'] | ['Major', 'Parisien', 'Maquat', 'Furic', 'Kim', 'DesGroseillers'] | EMBO J | 2007 | 6/6/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
594 | GSE6682 | 2/1/2007 | ['6682'] | [] | [u'17298187', u'17299599', u'17400893'] | 1914114 | [u'17496106'] | ['Grant', 'Sullivan', 'Kasschau', 'Howell', 'Chapman', 'Fahlgren', 'Carrington', 'Law', 'Dangl', 'Givan', 'Cumbie'] | ['Carrington', 'Henz', 'Kasschau', 'Weigel', 'Lohmann', 'Schmid', 'Cumbie'] | ['Carrington', 'Cumbie', 'Kasschau'] | Plant Physiol | 2007 | 2007 Jul | 1 | neral difference between the expression profiles of cis-NATs and nonoverlapping transcript pairs, we calculated the pairwise Pearson correlation coefficient (PCC) for these transcript pairs from four publicly available data sets generated by the AtGenExpress initiative. The first set comprised data from 234 arrays that capture expression of 78 different tissue samples assayed in triplicate throughout deve|lementary Material References MATERIALS AND METHODS Mapping of Transcript Pairs The XML file containing the latest annotation (version 6) of Arabidopsis ( Arabidopsis thaliana ) pseudochromosomes was downloaded from the TAIR FTP server ( ftp://ftp.arabidopsis.org/home/tair/ ). Start and stop position of the transcription units along with information on the strand that encodes a mRNA and the gene descripti| written in Java. Histograms (bin size 0.1), ranking, and comparisons of PCCs between individual microarray data sets were created in Microsoft Excel. Microarray Analysis All microarray data used are publicly available. Data for correlation analysis were from the AtGenExpress initiative (available from TAIR). Microarray data of small RNA biogenesis mutants ( Allen et al., 2005 ) were obtained from Nationa|e in expression estimate for a probe set to be considered to be robustly differentially expressed. Mapping MPSS Tags and Small RNA Sequences to cis-NATs All MPSS tags and small RNA sequences used are publicly available. MPSS tags were downloaded from the Arabidopsis MPSS database ( http://mpss.udel.edu/at/ ; Meyers et al., 2004a , 2004b ; Lu et al., 2005 ). 
Small RNAs sequences (ecotype Columbia) from | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
595 | GSE6682 | 2/1/2007 | ['6682'] | [] | [u'17298187', u'17299599', u'17400893'] | 1867363 | [u'17400893'] | ['Grant', 'Sullivan', 'Kasschau', 'Howell', 'Chapman', 'Fahlgren', 'Carrington', 'Law', 'Dangl', 'Givan', 'Cumbie'] | ['Sullivan', 'Kasschau', 'Chapman', 'Fahlgren', 'Howell', 'Carrington', 'Givan', 'Cumbie'] | ['Sullivan', 'Kasschau', 'Chapman', 'Fahlgren', 'Howell', 'Carrington', 'Givan', 'Cumbie'] | Plant Cell | 2007 | 2007 Mar | 0 | AND pmc_gds | 1 | 0 | ||||
596 | GSE6682 | 2/1/2007 | ['6682'] | [] | [u'17298187', u'17299599', u'17400893'] | 1820830 | [u'17298187'] | ['Grant', 'Sullivan', 'Kasschau', 'Howell', 'Chapman', 'Fahlgren', 'Carrington', 'Law', 'Dangl', 'Givan', 'Cumbie'] | ['Sullivan', 'Kasschau', 'Chapman', 'Fahlgren', 'Carrington', 'Givan', 'Cumbie'] | ['Sullivan', 'Kasschau', 'Chapman', 'Fahlgren', 'Carrington', 'Givan', 'Cumbie'] | PLoS Biol | 2007 | 2007 Mar | 0 | AND pmc_gds | 1 | 0 | ||||
597 | GSE6682 | 2/1/2007 | ['6682'] | [] | [u'17298187', u'17299599', u'17400893'] | 1790633 | [u'17299599'] | ['Grant', 'Sullivan', 'Kasschau', 'Howell', 'Chapman', 'Fahlgren', 'Carrington', 'Law', 'Dangl', 'Givan', 'Cumbie'] | ['Grant', 'Sullivan', 'Kasschau', 'Howell', 'Chapman', 'Fahlgren', 'Carrington', 'Law', 'Dangl', 'Givan', 'Cumbie'] | ['Grant', 'Sullivan', 'Kasschau', 'Howell', 'Chapman', 'Fahlgren', 'Carrington', 'Law', 'Dangl', 'Givan', 'Cumbie'] | PLoS One | 2007 | 2/14/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
598 | GSE6685 | 1/19/2007 | ['6685'] | ['2683'] | [u'17142403', u'20038686'] | 1855752 | [u'17142403'] | ['Davies', 'Mohn', 'Eltis', 'Hara', 'Stewart'] | ['Davies', 'Mohn', 'Eltis', 'Hara'] | ['Davies', 'Mohn', 'Eltis', 'Hara'] | J Bacteriol | 2007 | 2007 Mar | 0 | AND pmc_gds | 1 | 0 | ||||
599 | GSE6686 | 7/5/2007 | ['6686'] | ['2840'] | [u'17598924'] | 1925113 | [u'17598924'] | ['Gehl', 'Hojman', 'Zibert', 'Eriksen', u'Moller', 'Gissel'] | ['Gehl', 'Hojman', 'Zibert', 'Eriksen', 'Gissel'] | ['Gehl', 'Hojman', 'Zibert', 'Eriksen', 'Gissel'] | BMC Mol Biol | 2007 | 6/29/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
600 | GSE6687 | 3/8/2007 | ['6687'] | [] | [u'17337637'] | 1865649 | [u'17337637'] | ['Steinleitner', 'Malli', 'Vormann', 'Schweyen', 'Stadler', 'Graier', 'Wiesenberger'] | ['Steinleitner', 'Malli', 'Vormann', 'Schweyen', 'Stadler', 'Graier', 'Wiesenberger'] | ['Steinleitner', 'Malli', 'Vormann', 'Schweyen', 'Stadler', 'Graier', 'Wiesenberger'] | Eukaryot Cell | 2007 | 2007 Apr | 0 | AND pmc_gds | 1 | 0 | ||||
601 | GSE6688 | 3/27/2007 | ['6688'] | ['2650'] | [u'17374847'] | 2945940 | [u'20840752'] | ['Lang', 'Rodr\xc3\xadguez', 'Wagner', 'Dietrich', 'Mages', 'Miethke', 'Wantia', u'Rodr\xedguez'] | ['Ojeda', 'de', 'Nitsch', 'Gon\xc3\xa7alves', 'Moreau'] | [] | BMC Bioinformatics | 2010 | 9/14/2010 | 0 | n, heat kernel, and Arnoldi kernel approximation. For comparison purposes, a local measure based on the expression of the direct neighbors is also computed. We have benchmarked these strategies on 40 publicly available knockout experiments in mice, and performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expr| approach for scoring candidates is presented here for comparison purposes as a naïve strategy for network analysis of differential expression. We have benchmarked these four strategies on 40 publicly available data sets originated from Affymetrix chips on which mice with (simple) knockout genes were tested against controls. The raw cel files were downloaded from GEO [ 12 ]. For each data set we|egy based on a direct neighborhood analysis. These four network-based prioritization strategies for scoring candidate genes based on their differentially expressed neighborhood were benchmarked on 40 publicly available knockout experiments in mice. Performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expressi|ng candidate genes using network-based machine learning approaches even if no knowledge is available about the disease or phenotype. Methods Benchmark data The benchmark for this study consists of 40 publicly available data sets originated from Affymetrix chips on which mice with knockout genes were tested against controls. The raw cel files were downloaded from GEO [ 12 ]. Table 3 shows all data sets|ur benchmark. Table 3 The benchmark data. 
Gene Name GEO accession number Gene Name GEO accession number 1 Abca1 GSE5496 21 Mbnl1 GSE14691 2 Btk GSE2826 22 Mst1r, Ron GSE16629 3 Cav1 GSE10849 23 MyD88 GSE6688{{tag}}--REUSE-- 4 Cav3 GSE10848 24 Nos3, eNos GSE1988 5 Cftr GSE5715 25 Phgdh GSE8555 6 Clcn1 GSE14691 26 Pmp22 GSE1947 7 Cnr1 GSE7694 27 PPAR α GSE6864 8 Emd GSE5304 28 Prkag3, AMPK G3 GSE4065 9 Epas1, Hi|-2 GSE16067 29 Pthlh, Pthrp GSE17654 10 Esrra GSE7196 30 Rab3a GSE6527 11 Gap43 GSE12687 31 RasGrf1 GSE8425 12 Gnmt GSE9809 32 Rbm15 GSE12628 13 Hdac1 GSE5583 33 Runx GSE4911 14 Hdac2 GSE6770 34 Scd1 GSE2926 15 Hsf4 GSE12415 35 Slc26a4 GSE10587 16 Hspa1A, Hsp70.1 GSE11120 36 Srf GSE13333 17 Il6 GSE411 37 Tgm2 GSE10285 18 Lhx1, Lim1 GSE4230 38 Zc3h12a GSE14891 19 Lhx8 GSE11897 39 Zfp36, Tpp GSE5324 20 L | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
602 | GSE6690 | 3/27/2007 | ['6690'] | ['2651'] | [u'17374847'] | 2949890 | [u'20831831'] | ['Lang', 'Rodr\xc3\xadguez', 'Wagner', 'Dietrich', 'Mages', 'Miethke', 'Wantia', u'Rodr\xedguez'] | ['Monsonego', 'Rubin', 'Geifman'] | [] | BMC Bioinformatics | 2010 | 9/12/2010 | 0 | ound 0.456971* 0.505641 External side of plasma membrane 0.744269 0.508488* Not Found (A) Enrichment in genes with up-regulated expression in hippocampus treated with Fluoxetine, based on GEO profile GSE6476. All terms passed FDR > 0.25 and state the FDR values. (B) GEO profile GSE6675. All terms passed FDR < 0.25 and state the FDR values (Up-regulated in Control in comparison to FGF2 t|ture. † indicates terms that were not included in the NIGO subset. For three GEO expression profiles, the results of the analysis are described in detail (Table 1 ). In the analysis of the GSE6476, in which the effect of chronic Fluoxetine treatment on hippocampal gene expression was examined [ 10 ], eleven terms passed statistic filtering with NIGO and not with the full GO, out of which 7 w|ded in the full GO but not in NIGO passed the statistic cutoffs but this term ('Protein self-association') was functionally irrelevant and contributed very little to the analysis. For the GEO profile GSE6675, in which astroglial gene expression program elicited by fibroblast growth factor-2 was examined, four terms relevant to neural/immune systems passed statistic filtering with NIGO and not with the |ith the full GO. This is an example of how without the use of NIGO one would have to raise the cutoff to at least 0.55 in order for this term to appear in the analysis results. Functional analysis of GSE6509 with NIGO also revealed statistically significant terms that were missed when using the full GO. 
This experiment involved microarray expression profiling designed to explore the effect of RU486, a |ed both in NIGO and the full GO, received higher FDR values when NIGO was used (Table 2 ), contrary to our expectations from the impact of reducing the ontology size. For example, in the analysis of GSE8788 in which gene expression analysis was conducted using Trib1-deficient macrophages treated with LPS as compared to LPS-treated wild-type macrophages [ 12 ], 11 out of 14 terms (that passed statistic|ment in FDR values when using NIGO is partially due to the stochastic nature of the GSEA algorithm. To test this hypothesis, the same analysis was repeated three times with each of the ontologies for GSE8788. FDR values were averaged and a comparison of analysis results was performed based on these averaged FDR values. In accordance with our hypothesis, the averaging of FDR values improved the apparent|l GO and GO-slim in enrichment analysis using GSEA Unique to Subset Lowest FDR Value Experiment Cutoff used Full GO NIGO Generic GO slim Full GO NIGO Generic GO slim Neural- or immune-related studies GSE6509 P < 0.05 2 (2) 5 0 0 2 0 GSE6675 FDR < 0.25 0 6 0 0 1 0 GSE6476 FDR < 0.25 1 (1) 11 0 0 4 0 GSE3779 P < 0.05 7 (3) 13 1 (1) 0 1 5 GSE8425 P < 0.05 6 (1) 0 0 | 0 0 0 1 GSE8788 P < 0.01 6 (6) 0 0 11 3 1 GSE8788* P < 0.01 6 (6) 0 0 3 11 1 GSE9659 P < 0.01 6 (6) 0 NS A 4 28 NS A Non-neural- or immune-related studies (negative controls) GSE7407 P < 0.01 5 (5) 0 0 0 1 2 GSE2259 FDR < 0.25 3 (1) 0 1 (1) 1 2 1 GSE8191 P < 0.05 5 (2) 0 0 0 0 0 The enrichment analysis results of GSEA, providing the full GO, NIGO or GO-s| for each subset given in separate columns. *To test the effect of stochasticity, the analysis of this profile is based on 3-fold averaged FDR values from three independent GSEA analyses. A Profile GSE9659 involves rat tissue, and was not analyzed with the mouse GO-slim. 
Three non-neural- or immune-related datasets were used to test the performance of NIGO in functional analysis of microarray data in|. These datasets included the GEO expression profile, GSE7407, in which gene expression in heart tissue with cardiac specific over-expression of Sirt1 was examined [ 13 ], the GEO expression profile, GSE8191, in which the gene expression profile of mammary glands from pregnant mice was compared to that of mammary glands from lactating mice [ 14 ], and the GEO expression profile, GSE2259, in which gene |in enrichment analysis using the Fisher Exact Test Unique to Subset Lowest FDR Value Experiment Cutoff used Full GO NIGO Generic GO slim Full GO NIGO Generic GO slim Neural- or immune-related studies GSE6509 P < 0.0001 1 (1) 1 0 0 22 2 GSE6675 P < 0.05 1 (1) 0 11 (1) 0 2 0 GSE6476 P < 0.0001 0 2 2 0 15 3 GSE3779 P < 0.1 0 0 0 0 2 0 GSE8425 P < 0.1 0 0 0 0 0 0 GSE|0.001 0 2 0 0 9 1 GSE6136 P < 0.000001 0 1 0 0 37 7 GSE8788 P < 0.01 0 2 0 0 4 1 GSE9659 P < 0.001 0 3 NS A 0 33 NS A Non-neural- or immune-related studies (negative controls) GSE7407 P < 0.0000000001 33 (30) 0 0 9 0 0 GSE2259 P < 0.1 63 (35) 0 2 8 8 7 GSE8191 P < 0.001 6 (6) 7 6 (2) 0 16 3 Enrichment analysis results from Ontologizer, providing the full |ional GO subset (the full GO or generic slim) are also shown for each term ('Lowest P-Value'), with the number of terms with the lowest P-values for each subset given in separate columns. A Profile GSE9659 involves rat tissue, and was not analyzed with the mouse GO-slim. These results, together with the analysis of five additional neural/immune-related experiments not described in Table 1 (namely p| cell's activity. These results further demonstrate that analysis with NIGO can enhance interpretation of functional analysis results produced for relevant microarray datasets. In the analysis of the GSE6509 expression dataset, three relevant terms passed the statistical cutoff with NIGO but not with the full GO. 
These terms, 'viral envelope', 'viral infectious cycle' and 'viral capsid' are all terms r|et. Such terms received FDR or p-values that were very close (but larger) than the cutoff values used. This is partially explained by the stochastic nature of the GSEA algorithm. Indeed, for one set (GSE8788), we compared the raw results with averaged FDR values. Averaging dramatically decreased the number of such terms. Furthermore, we conducted a similar functional analysis using the Fisher Exact Tes|nous peptide antigen via MHC class I' and 'Positive regulation of T cell mediated cytotoxicity' without defeating the purpose of the slimming process. Yet these two terms were found to be enriched in GSE6476, and are crucial for generating a hypothesis based on the expression profile. This shows that GO slims may be complemented by small, yet fully detailed domain-specific subsets of GO. Figure 4 The U|plementary/NIGO/Supplementary.html . The ontologies were clipped using the Protégé 4.0 beta OWL editor [ 26 ]. For by-species filtering, annotation files for human, rat and mouse were downloaded (October 2008) from GOA-EBI [ 27 ]. Association files used for GSEA analysis were generated based on the GOA-EBI annotation files and in the format required by GSEA. In this format, each row repres|a analysis and functional analysis were conducted using the GenePattern [ 29 ], GSEA [ 9 ] (release 2.5) web servers, and Ontologizer [ 16 ] as follows: (1) for each study, raw data (.CEL files) were downloaded from GEO [ 30 ]; (2) expression files (.gct files) were created using the Gene Pattern Expression File Creator module; (3) where necessary (i.e. for expression files GSE6509, GSE6675, GSE6476, GSE7|was conducted using the GSEA module. GSEA was run three times for each dataset, using a different GO version for each run. For the full GO, we used the organism-specific GO subset. 
In the analysis of GSE8788, GSEA was run three times for each of the three ontologies and FDR values were averaged over the three runs. (5) Differentially-expressed genes were found using the Gene Pattern ComparativeMarkerSe|, to an input term (Figure 4 ). The connections of UMLS concepts to the tested GO term are provided by UMLS and defined within the UMLS data files. Microarray data sets All microarray data sets were downloaded from GEO at NCBI [ 30 ]. The GEO sets used in this study are described in Additional file 2 . Availability NIGO is freely available as Additional file 6 and for download from: http://bioinfo.bg|re included or excluded from NIGO. Click here for file Additional file 2 Microarray Datasets for Comparative analysis of NIGO, GO and GO-slim . This file contains a summary of the microarray datasets downloaded from GEO at NCBI and used to test the performance of NIGO. Click here for file Additional file 3 GSEA analysis results . This file contains a summary of the GSEA analysis results for each of the mi|etroviral infectivity through different mechanisms Am J Physiol Lung Cell Mol Physiol 2009 297 L538 545 10.1152/ajplung.00162.2009 19561138 The Gene Ontology Consortium http://www.geneontology.org/GO.downloads.ontology.shtml Web Ontology Language Guide http://www.w3.org/TR/owl-guide Protégé 4.0 beta http://protege.stanford.edu Barrell D Dimmer E Huntley RP Binns D O'Donovan C Apweiler R Th | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
603 | GSE6691 | 3/1/2007 | ['6691'] | ['2643'] | [u'17252022'] | 2713475 | [u'19443661'] | [u'Hern\xe1ndez', u'S\xe1nchez', 'Ocio', 'Arcos', 'San', 'Guti\xc3\xa9rrez', 'Fermi\xc3\xb1\xc3\xa1n', 'S\xc3\xa1nchez', 'de', 'Maiso', u'De', u'Fermi\xf1an', 'Hern\xc3\xa1ndez', u'Guti\xe9rrez', 'Delgado'] | ['Pitsillides', 'Runnels', 'Jia', 'Rollins', 'Lin', 'Ngo', 'Sacco', 'Roccaro', 'Thompson', 'Anderson', 'Ghobrial', 'Azab', 'Blotta', 'Melhem'] | [] | Blood | 2009 | 7/16/2009 | 0 | performed in ice-cold PBS. Samples were then analyzed with the use of flow cytometry. Gene expression profiling of RhoA, Rac1, and CDC42 Gene expression datasets from the Mayo Clinic (GEO accession GSE6477 ) were obtained from the Gene Expression Omnibus for analysis and generated by the use of the Affymetrix U133A platform. 21 The data pertaining to RhoA (probe ID 200059_s_at), Rac1 (probe ID 20864|and 3 MM patient samples demonstrating similar expression of both GTPases in all cell lines and patient samples. (B) Gene expression of the GTPases RhoA, Rac1, and CDC42, based on NIH Gene Expression Omnibus database under the accession number GSE6691{{tag}}--REUSE-- , demonstrating significant overexpression of RhoA and Rac1, but not CDC42, GTPases in MM samples compared with normal subjects. Gene expression profiling | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
604 | GSE6691 | 3/1/2007 | ['6691'] | ['2643'] | [u'17252022'] | 2531129 | [u'18700954'] | [u'Hern\xe1ndez', u'S\xe1nchez', 'Ocio', 'Arcos', 'San', 'Guti\xc3\xa9rrez', 'Fermi\xc3\xb1\xc3\xa1n', 'S\xc3\xa1nchez', 'de', 'Maiso', u'De', u'Fermi\xf1an', 'Hern\xc3\xa1ndez', u'Guti\xe9rrez', 'Delgado'] | ['Mosca', 'Lionetti', 'Agnelli', 'Deliliers', 'Andronache', 'Neri', 'Ronchetti', 'Fabris'] | [] | BMC Med Genomics | 2008 | 8/13/2008 | 1 | AND pmc_gds | 0 | 1 | ||||
605 | GSE6697 | 1/16/2007 | ['6697'] | [] | [u'17405863'] | 1851046 | [u'17405863'] | ['Ittrich', u'Gr\xf6ne', 'Wang', u'Sch\xfctz', 'Kueffer', 'Hotz-Wagenblatt', 'Arribas', 'Lemberger', 'Jonnakuty', 'Li', 'Gretz', 'Kenzelmann', 'Hergenhahn', 'Gr\xc3\xb6ne', 'Sch\xc3\xbctz', u'K\xfcffer', 'Hollstein', 'Schmid', 'Maertens'] | ['Ittrich', 'Wang', 'Kueffer', 'Hotz-Wagenblatt', 'Arribas', 'Lemberger', 'Jonnakuty', 'Li', 'Gretz', 'Kenzelmann', 'Hergenhahn', 'Gr\xc3\xb6ne', 'Sch\xc3\xbctz', 'Hollstein', 'Schmid', 'Maertens'] | ['Ittrich', 'Wang', 'Kueffer', 'Hotz-Wagenblatt', 'Arribas', 'Lemberger', 'Jonnakuty', 'Li', 'Gretz', 'Hollstein', 'Hergenhahn', 'Gr\xc3\xb6ne', 'Sch\xc3\xbctz', 'Kenzelmann', 'Schmid', 'Maertens'] | Proc Natl Acad Sci U S A | 2007 | 4/10/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
606 | GSE6698 | 1/16/2007 | ['6698'] | [] | [u'17405863'] | 1851046 | [u'17405863'] | ['Ittrich', u'Gr\xf6ne', 'Wang', u'Sch\xfctz', 'Kueffer', 'Hotz-Wagenblatt', 'Arribas', 'Lemberger', 'Jonnakuty', 'Li', 'Gretz', 'Kenzelmann', 'Hergenhahn', 'Gr\xc3\xb6ne', 'Sch\xc3\xbctz', u'K\xfcffer', 'Hollstein', 'Schmid', 'Maertens'] | ['Ittrich', 'Wang', 'Kueffer', 'Hotz-Wagenblatt', 'Arribas', 'Lemberger', 'Jonnakuty', 'Li', 'Gretz', 'Kenzelmann', 'Hergenhahn', 'Gr\xc3\xb6ne', 'Sch\xc3\xbctz', 'Hollstein', 'Schmid', 'Maertens'] | ['Ittrich', 'Wang', 'Kueffer', 'Hotz-Wagenblatt', 'Arribas', 'Lemberger', 'Jonnakuty', 'Li', 'Gretz', 'Hollstein', 'Hergenhahn', 'Gr\xc3\xb6ne', 'Sch\xc3\xbctz', 'Kenzelmann', 'Schmid', 'Maertens'] | Proc Natl Acad Sci U S A | 2007 | 4/10/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
607 | GSE6700 | 6/1/2007 | ['6700'] | ['2747'] | [u'17420277'] | 1900060 | [u'17420277'] | ['Englert', 'Klattig', 'Sierig', 'Kruspe', 'Besenbeck'] | ['Englert', 'Klattig', 'Sierig', 'Kruspe', 'Besenbeck'] | ['Klattig', 'Sierig', 'Englert', 'Kruspe', 'Besenbeck'] | Mol Cell Biol | 2007 | 2007 Jun | 0 | AND pmc_gds | 1 | 0 | ||||
608 | GSE6703 | 4/2/2007 | ['6703'] | [] | [u'18923020', u'18253465'] | 1939914 | [u'18253465'] | ['', 'Ambrose', 'Macknight', 'Herridge', 'McNoe', 'Day'] | ['McNoe', 'Macknight', 'Day'] | ['McNoe', 'Macknight', 'Day'] | Int J Plant Genomics | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
609 | GSE6703 | 4/2/2007 | ['6703'] | [] | [u'18923020', u'18253465'] | 2593665 | [u'18923020'] | ['', 'Ambrose', 'Macknight', 'Herridge', 'McNoe', 'Day'] | ['Herridge', 'Macknight', 'Ambrose', 'Day'] | ['Herridge', 'Macknight', 'Ambrose', 'Day'] | Plant Physiol | 2008 | 2008 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
610 | GSE6707 | 6/6/2007 | ['6707'] | [] | [u'17526728'] | 1952100 | [u'17526728'] | ['Durant', 'Pugh'] | ['Durant', 'Pugh'] | ['Durant', 'Pugh'] | Mol Cell Biol | 2007 | 2007 Aug | 0 | AND pmc_gds | 1 | 0 | ||||
611 | GSE6709 | 1/19/2007 | ['6709'] | [] | [u'17264217'] | 1794314 | [u'17264217'] | ['Anderton', 'Van', 'Wilbrink', 'Eltis', 'Hara', 'Yam', u'Geize', 'Davies', 'Dijkhuizen', 'Heuser', 'Mohn', 'Sim'] | ['Anderton', 'Van', 'Wilbrink', 'Eltis', 'Hara', 'Yam', 'Davies', 'Dijkhuizen', 'Heuser', 'Mohn', 'Sim'] | ['Anderton', 'Van', 'Wilbrink', 'Eltis', 'Hara', 'Yam', 'Davies', 'Dijkhuizen', 'Heuser', 'Mohn', 'Sim'] | Proc Natl Acad Sci U S A | 2007 | 2/6/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
612 | GSE6710 | 1/12/2007 | ['6710'] | ['2518'] | [u'16858420'] | 2860497 | [u'20436964'] | ['Heubach', 'Mrowietz', 'Reischl', 'Beekman', u'Ternes', u'Sturzebecher', 'St\xc3\xbcrzebecher', u'Bauer', 'Schwenke'] | ['Montaner', 'Dopazo'] | [] | PLoS One | 2010 | 4/27/2010 | 0 | The first experiment consisted of the comparison of lessional and non lessional skin samples in atopic dermatitis patients [32] (data were obtained from the GEO database, accession: GSE5667). The second experiment compared affected and unaffected skin in psoriatic patients [31] (GEO database, accession: GSE6710{{tag}}--REUSE--). Separated gene expression analyses of these two datase | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
613 | GSE6711 | 12/20/2007 | ['6711'] | ['3432'] | [u'17682054'] | 2168898 | [u'17682054'] | ['Collins', 'Lu', 'Cidlowski', 'Grissom'] | ['Collins', 'Lu', 'Cidlowski', 'Grissom'] | ['Collins', 'Lu', 'Cidlowski', 'Grissom'] | Mol Cell Biol | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
614 | GSE6714 | 11/11/2007 | ['6714'] | [] | [u'17994021'] | 2365887 | [u'17994021'] | ['Nechaev', 'Parker', 'Zeitlinger', u'Shau', 'Muse', 'Gilchrist', 'Adelman', 'Shah', 'Grissom'] | ['Nechaev', 'Parker', 'Zeitlinger', 'Muse', 'Gilchrist', 'Adelman', 'Shah', 'Grissom'] | ['Nechaev', 'Parker', 'Zeitlinger', 'Muse', 'Gilchrist', 'Adelman', 'Shah', 'Grissom'] | Nat Genet | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
615 | GSE6717 | 5/20/2007 | ['6717'] | [] | [u'17557819'] | 1952027 | [u'17557819'] | ['Cecchini', 'Gorton', 'Geary'] | ['Cecchini', 'Gorton', 'Geary'] | ['Cecchini', 'Gorton', 'Geary'] | J Bacteriol | 2007 | 2007 Aug | 0 | AND pmc_gds | 1 | 0 | ||||
616 | GSE6719 | 2/15/2007 | ['6719'] | ['2632'] | [u'17293362'] | 2923530 | [u'20353606'] | ['Kamada-Nobusada', 'Makita', 'Sakakibara', 'Kojima', 'Hirose'] | ['Narsai', 'Ivanova', 'Whelan', 'Ng'] | [] | BMC Plant Biol | 2010 | 3/31/2010 | 0 | To compile the entire publically available Affymetrix rice microarray (as at 1st August 2009), all experiments containing CEL files were downloaded from the Gene Expression Omnibus within the National Centre for Biotechnology Information database or from the MIAME ArrayExpress database http://www.ebi.ac.uk/arrayexpress/. --REUSE-- | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
617 | GSE6719 | 2/15/2007 | ['6719'] | ['2632'] | [u'17293362'] | 2736006 | [u'19571305'] | ['Kamada-Nobusada', 'Makita', 'Sakakibara', 'Kojima', 'Hirose'] | ['Millar', 'Carroll', 'Narsai', 'Ivanova', 'Whelan', 'Howell'] | [] | Plant Physiol | 2009 | 2009 Sep | 1 | or rice development and growth, we investigated the expression of the 63 transcription factors defined as aerobic ( Fig. 6A ) and the 71 transcription factors defined as anaerobic ( Fig. 6B ) in all publicly available Affymetrix rice microarray data (68 different conditions). Analysis of the expression pattern of the 63 aerobic transcription factors reveals that young root tissue, stigma, and anther have|ess ( Yanhui et al., 2006 ). Thus, many of the aerobic transcription factors overlap with responses to stress. Figure 6. Analysis of aerobic and anaerobic responsive transcription factor genes across publicly available rice arrays. All 63 aerobic and 71 anaerobic responsive genes encoding transcription factors were hierarchically clustered across the germination, switch, (more ...) Figure 6. Analysis o|anges across different tissues, under different conditions, and compare these with the transcript abundance profiles generated from this study, rice array data were retrieved from the Gene Expression Omnibus within the National Center for Biotechnology Information database as described previously ( Howell et al., 2009 ) with the addition of data derived from cytokinin treatment of roots and leaves ( GSE6|en examined in the whole genome and the 12 subsets. 3′ UTR Sequence Analysis The full genome 3′ UTR and 5′ UTR sequences available from The Institute for Genomic Research were downloaded and filtered to retain only the 3′ UTRs. However, this only included a total of 3,027 UTRs available for the “whole genome.” Taking this small number into consideration, it |d metabolism and lipid metabolism-related functions in the core anaerobic set. Supplemental Figure S5. Hierarchical clustering analysis of the aerobic and anaerobic transcription factor genes across publicly available rice arrays. Supplemental Table S1. All 29,087 expressed genes, GC-RMA normalized values, and these values normalized to maximum. Supplemental Table S2. The number of differentially expre | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
618 | GSE6719 | 2/15/2007 | ['6719'] | ['2632'] | [u'17293362'] | 2825235 | [u'20109239'] | ['Kamada-Nobusada', 'Makita', 'Sakakibara', 'Kojima', 'Hirose'] | ['Ghanashyam', 'Jain', 'Bhattacharjee'] | [] | BMC Genomics | 2010 | 1/29/2010 | 0 | olved in a particular biological process. In the second approach, we used the microarray data for various tissues/organs and developmental stages available at GEO database under the accession numbers GSE6893 and GSE7951. The series GSE6893 includes microarray data from 45 hybridizations representing three biological replicates each of 15 different tissues/organs and developmental stages [ 30 ], whereas| GSE7951 includes the microarray data from 12 hybridizations representing 9 different tissue samples [ 33 ]. Because three biological replicates were available only for stigma and ovary in the series GSE7951 dataset, only these data were used in this analysis. All the tissues/organs and developmental stages for which microarray data was analyzed in this study are summarized in Additional file 5 . The |ted [ 6 , 40 ]. To study the effect of various abiotic stresses (desiccation, salt, cold and arsenate) on the expression profiles of GST genes, microarray data available under series accession number GSE6901 [ 30 ] was analyzed. Differential expression analysis for rice seedlings treated with different abiotic stresses (desiccation, salt and cold) as compared to mock-treated control seedlings was perfo|vely. The control seedlings were kept in water for 3 h, at 28 ± 1°C. Microarray data analysis The microarray data publicly available at GEO database under the series accession numbers GSE6893 (expression data for reproductive development), GSE7951 (expression profiling of stigma), GSE6901 (expression data for stress treatment), GSE4471 (expression data from rice varieties Azucena and Ba| arsenate), GSE5167 (expression data for auxin and cytokinin response), GSE6719{{tag}}--REUSE-- (expression data for cytokinin response), GSE7256 (expression data for virulent infection by Magnaporthe grisea ), and GSE10373 (expression data for interaction with the parasitic plant Striga hermonthica ) were used for expression analysis of rice GST genes. The entire microarray experiments used in this study are listed | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
619 | GSE6719 | 2/15/2007 | ['6719'] | ['2632'] | [u'17293362'] | 2528094 | [u'18650402'] | ['Kamada-Nobusada', 'Makita', 'Sakakibara', 'Kojima', 'Hirose'] | ['Liu', 'Zhao', 'Lu', 'Han', 'Huang'] | [] | Plant Physiol | 2008 | 2008 Sep | 0 | a sequences, by BACs directly or by 87 assembled contigs, were performed. The alignment results of BGI 93-11 contigs and Nipponbare pseudomolecules, which were generated by the software nucmer, were downloaded using the GFF Dumper on the TIGR Genome Browser. We found that a small quantity of anchor results were self-contradictory; that is, two 93-11 contigs that localized on the same location yielded opp| more than 100 bp were further confirmed by BLAST2. The indica Guangluai 4 BACs were obtained from http://www.ncgr.ac.cn/chinese/databasei.htm . The genomic sequences of japonica Nipponbare were downloaded from http://www.tigr.org/tdb/e2k1/osa1 , and the indica 93-11 sequences were downloaded from ftp://ftp.genomics.org.cn . Mining of TIPs in the Rice Genome For each insertion region identified a| known TE repeat databases using RepeatMasker, as described above. Those elements, which were composed of a single LTR, were recognized as solo LTR retroelements. EST Analysis and Gene Prediction All publicly available rice ESTs were obtained from the National Center for Biotechnology Information EST database ( http://www.ncbi.nlm.nih.gov/projects/dbEST/ ). Full-length cDNAs of both KOME ( http://red.dna.|t the two gene fragments of indica XIP-I separated by a TE insertion. The probes in the two probe sets were remapped to the rice genomes, Nipponbare pseudomolecules and 93-11 contigs, by BLASTN. We downloaded the microarray data files of each experiment from the GEO Web site ( http://www.ncbi.nlm.nih.gov/geo/ ). Overall, there are 57 chips of indica IR64 (45 from GSE6893 and 12 from GSE6901 ) and 4 | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
620 | GSE6720 | 2/15/2007 | ['6720'] | ['2633'] | [u'17293362'] | 2528094 | [u'18650402'] | ['Kamada-Nobusada', 'Makita', 'Sakakibara', 'Kojima', 'Hirose'] | ['Liu', 'Zhao', 'Lu', 'Han', 'Huang'] | [] | Plant Physiol | 2008 | 2008 Sep | 0 | a sequences, by BACs directly or by 87 assembled contigs, were performed. The alignment results of BGI 93-11 contigs and Nipponbare pseudomolecules, which were generated by the software nucmer, were downloaded using the GFF Dumper on the TIGR Genome Browser. We found that a small quantity of anchor results were self-contradictory; that is, two 93-11 contigs that localized on the same location yielded opp| more than 100 bp were further confirmed by BLAST2. The indica Guangluai 4 BACs were obtained from http://www.ncgr.ac.cn/chinese/databasei.htm . The genomic sequences of japonica Nipponbare were downloaded from http://www.tigr.org/tdb/e2k1/osa1 , and the indica 93-11 sequences were downloaded from ftp://ftp.genomics.org.cn . Mining of TIPs in the Rice Genome For each insertion region identified a| known TE repeat databases using RepeatMasker, as described above. Those elements, which were composed of a single LTR, were recognized as solo LTR retroelements. EST Analysis and Gene Prediction All publicly available rice ESTs were obtained from the National Center for Biotechnology Information EST database ( http://www.ncbi.nlm.nih.gov/projects/dbEST/ ). Full-length cDNAs of both KOME ( http://red.dna.|t the two gene fragments of indica XIP-I separated by a TE insertion. The probes in the two probe sets were remapped to the rice genomes, Nipponbare pseudomolecules and 93-11 contigs, by BLASTN. We downloaded the microarray data files of each experiment from the GEO Web site ( http://www.ncbi.nlm.nih.gov/geo/ ). Overall, there are 57 chips of indica IR64 (45 from GSE6893 and 12 from GSE6901 ) and 4 | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
621 | GSE6727 | 1/13/2007 | ['6727'] | [] | [u'19615063'] | 2724489 | [u'19615063'] | ['Showe', 'Agosto-Perez', 'Qin', 'Huang', 'Chan', 'Lin', 'Souriraj', 'Potter', 'Yan', 'Liyanarachchi', 'Cheng', 'Saltz', 'Nephew', 'Nikonova', 'Davuluri', 'Balch'] | ['Showe', 'Agosto-Perez', 'Qin', 'Huang', 'Chan', 'Lin', 'Souriraj', 'Potter', 'Yan', 'Liyanarachchi', 'Cheng', 'Saltz', 'Nephew', 'Nikonova', 'Davuluri', 'Balch'] | ['Agosto-Perez', 'Qin', 'Chan', 'Showe', 'Lin', 'Souriraj', 'Potter', 'Yan', 'Balch', 'Liyanarachchi', 'Cheng', 'Saltz', 'Nephew', 'Nikonova', 'Davuluri', 'Huang'] | BMC Syst Biol | 2009 | 7/17/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
622 | GSE6731 | 3/1/2007 | ['6731'] | ['2642'] | [u'17262812'] | 2762058 | [u'19828069'] | ['Parmigiani', 'Cope', 'Chakravarti', 'Maitra', 'Wu', 'Harris', 'Bayless', 'Brant', 'Dassopoulos'] | ['Balestrieri', 'Vanoni', 'Chiaradonna', 'Alberghina'] | [] | BMC Bioinformatics | 2009 | 10/15/2009 | 0 | Table 2 Gene expression profiling datasets of NCI60 cell lines and normal tissues analyzed in this study Reference Tissue of origin Number of transcriptional profiles GEO Number [ 31 ] NCI60 cells 60 GSE5949 Breast 0 - [ 106 ] CNS 2 GSE96 [ 107 ] Colon 4 GSE6731{{tag}}--REUSE-- [ 108 ] Blood 4 GSE1402 [ 106 ] Lung 2 GSE96 Skin 0 - [ 106 ] Ovary 3 GSE96 [ 106 ] Prostate 3 GSE96 [ 106 ] Kidney 3 GSE96 Gene expression pr|) at the National Center for Biotechnology Information (NCBI) website ( ) [ 105 ]. In particular, gene expression profiles of NCI60 cell collection (cancer samples) were recovered from GEO database ( GSE5949, [ 31 ]) in which the experimental data were obtained by using the Affymetrix HG-U95Av2 oligonucleotide array platform. For the analysis only results obtained by oligonucleotide arrays were consid|NA array platform. Therefore, also for normal tissue samples, the data used for the comparative analysis, were recovered from transcriptional profiles produced by using U95Av2 oligonucleotide array ( GSE96 [ 106 ], GSE6731{{tag}}--REUSE-- [ 107 ] and GSE1402 [ 108 ]). A total of 81 transcriptional profiles encompassing cancer cell lines with nine histological origins and samples from six normal tissues were reco | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
623 | GSE6731 | 3/1/2007 | ['6731'] | ['2642'] | [u'17262812'] | 2944782 | [u'20885780'] | ['Parmigiani', 'Cope', 'Chakravarti', 'Maitra', 'Wu', 'Harris', 'Bayless', 'Brant', 'Dassopoulos'] | ['Klassen', 'Khush', 'Morgan', 'Valantine', 'Dudley', 'Li', 'Sigdel', 'Sarwal', 'Kambham', 'Chen', 'Butte', 'Hsieh', 'Caohuu'] | [] | PLoS Comput Biol | 2010 | 9/23/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
624 | GSE6734 | 3/8/2007 | ['6734'] | [] | [u'17346786'] | 2825236 | [u'20113528'] | ['Sachidanandam', 'Brennecke', 'Hannon', 'Stark', 'Kellis', 'Aravin', 'Dus'] | ['Korbie', 'Mattick', 'Hansen', 'Makunin', 'Jung'] | [] | BMC Genomics | 2010 | 2/1/2010 | 0 | As frequently cover the entire length of annotated snoRNAs or tRNAs, which suggests that other loci specifying similar ncRNAs can be identified by clusters of short RNA sequences. Results We combined publicly available datasets of tens of millions of short RNA sequence tags from Drosophila melanogaster , and mapped them to the Drosophila genome. Approximately 6 million perfectly mapping sequence tags w|snoRNAs, as well as a number of novel ncRNAs. Results Compilation of short RNA sequence reads into tag-contigs We obtained 10,846,433 sequence tags comprising 55,894,809 reads from 12 Gene Expression Omnibus (GEO) datasets (Table 1 ) derived from 90 experiments performed on Drosophila cell lines and tissues. Approximately 6 million tags were perfectly mapped to the D. melanogaster genome, excluding |ee Methods). As a measure of expression level, each TC was assigned a tag-depth score based on the maximum number of overlapping reads covering any part of the locus (Fig. 1 ) (see Methods). Table 1 Publicly available short RNA sequencing datasets on D. malanogaster GEO accession No. of tags Mappable References GSE10277 23252 12057 [ 14 ] GSE10515 49878 12096 [ 15 ] GSE10790 347861 30780 [ 19 ] GSE10794|19 255670 381508 [ 9 ] GSE11086 1277025 1509771 [ 13 ] GSE11624 6643474 3125323 [ 12 ] GSE6734{{tag}}--REUSE-- 32160 34362 [ 8 ] GSE7448 753797 452471 [ 17 ] GSE9138 13299 13294 [ 20 ] GSE9389 59906 32472 [ 18 , 9 ] GSE12527 2967 817 [ 11 ] total 10846433 6297373 Figure 1 Compilation of a tag-contig . Contiguously overlapping tags (grey arrows) were assembled into a tag-contig (TC) (block arrow). The tag-depth is the |rts of existing transposons generating siRNAs or piRNAs. Conclusions Several studies investigating the population of small RNAs have yielded millions of sequence reads. In this study, we combined all publicly available sequence data from Drosophila melanogaster short RNA into hundreds of thousands tag-contigs and associated subsets of them with known ncRNAs such as snoRNAs and tRNAs. The characteristic | miRbase release 12.0 [ 25 ]. Repeats were annotated using RepeatMasker [ 43 ] in FlyBase 5.12. Mapping of sequence tags We obtained all public available deep-sequencing datasets from Gene Expression Omnibus database at National Center for Biotechnology Information http://www.ncbi.nlm.nih.gov/geo in SOFT format (Table 1 ). These sequences were subsequently mapped to the genome of D. melanogaster usi | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
625 | GSE6734 | 3/8/2007 | ['6734'] | [] | [u'17346786'] | 2882632 | [u'19395010'] | ['Sachidanandam', 'Brennecke', 'Hannon', 'Stark', 'Kellis', 'Aravin', 'Dus'] | ['Sachidanandam', 'Brennecke', 'Hannon', 'McCombie', 'Stark', 'Malone', 'Dus'] | ['Sachidanandam', 'Dus', 'Brennecke', 'Stark', 'Hannon'] | Cell | 2009 | 5/1/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
626 | GSE6736 | 2/1/2007 | ['6736'] | [] | [u'17189329'] | 1803716 | [u'17189329'] | ['Roney', 'Khatibi', 'Westwood'] | ['Roney', 'Khatibi', 'Westwood'] | ['Roney', 'Khatibi', 'Westwood'] | Plant Physiol | 2007 | 2007 Feb | 0 | AND pmc_gds | 1 | 0 | ||||
627 | GSE6738 | 1/17/2007 | ['6738'] | [] | [u'17268553'] | 1852848 | [u'17268553'] | ['Hernandez', 'Hekkelman', 'van', '', 'Wehrens', 'Grummt', 'Stunnenberg', 'Hulsen', 'Denissov', 'Voit'] | ['Hernandez', 'Hekkelman', 'van', 'Wehrens', 'Grummt', 'Stunnenberg', 'Hulsen', 'Denissov', 'Voit'] | ['Hernandez', 'Hekkelman', 'van', 'Wehrens', 'Grummt', 'Stunnenberg', 'Hulsen', 'Denissov', 'Voit'] | EMBO J | 2007 | 2/21/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
628 | GSE6739 | 2/11/2007 | ['6739'] | [] | [u'17293863', u'21177964'] | 2753834 | [u'17293863'] | ['Desai', 'Egelhofer', 'Ikegami', 'Rechtsteiner', 'Liu', 'Jensen', 'Zhang', 'Whittle', 'Kolasinska-Zwierz', 'Takasaki', 'Green', 'Iniguez', 'Taing', 'Lieb', 'Shin', 'Rosenbaum', 'Vielle', 'Ercan', 'Ahringer', 'Giresi', 'Cheung', 'Dernburg', 'Strome', 'Latorre'] | ['Lieb', 'Zhang', 'Giresi', 'Whittle', 'Green', 'Ercan'] | ['Lieb', 'Zhang', 'Giresi', 'Whittle', 'Green', 'Ercan'] | Nat Genet | 2007 | 2007 Mar | 0 | AND pmc_gds | 1 | 0 | ||||
629 | GSE6739 | 2/11/2007 | ['6739'] | [] | [u'17293863', u'21177964'] | 2522285 | [u'18787694'] | ['Desai', 'Egelhofer', 'Ikegami', 'Rechtsteiner', 'Liu', 'Jensen', 'Zhang', 'Whittle', 'Kolasinska-Zwierz', 'Takasaki', 'Green', 'Iniguez', 'Taing', 'Lieb', 'Shin', 'Rosenbaum', 'Vielle', 'Ercan', 'Ahringer', 'Giresi', 'Cheung', 'Dernburg', 'Strome', 'Latorre'] | ['McClinic', 'Lieb', 'Zhang', 'Whittle', 'Green', 'Kelly', 'Ercan'] | ['Lieb', 'Whittle', 'Green', 'Ercan', 'Zhang'] | PLoS Genet | 2008 | 9/12/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
630 | GSE6739 | 2/11/2007 | ['6739'] | [] | [u'17293863', u'21177964'] | 2783177 | [u'19853451'] | ['Desai', 'Egelhofer', 'Ikegami', 'Rechtsteiner', 'Liu', 'Jensen', 'Zhang', 'Whittle', 'Kolasinska-Zwierz', 'Takasaki', 'Green', 'Iniguez', 'Taing', 'Lieb', 'Shin', 'Rosenbaum', 'Vielle', 'Ercan', 'Ahringer', 'Giresi', 'Cheung', 'Dernburg', 'Strome', 'Latorre'] | ['Lieb', 'Dick', 'Ercan'] | ['Lieb', 'Ercan'] | Curr Biol | 2009 | 11/17/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
631 | GSE6740 | 2/23/2007 | ['6740'] | ['2649'] | [u'17251300'] | 2975422 | [u'21047384'] | ['Kovacs', 'Loutfy', 'Der', 'Halpenny', 'Wilkins', u'Loufty', 'Yang', 'Heisler', u'Covacs', 'Hyrcza', 'Ostrowski'] | ['Xu', 'Hu'] | [] | BMC Genomics | 2010 | 11/2/2010 | 0 | essed genes usually located in the sparse boundary region. Biases of popular algorithms for identifying differentially expressed genes are visually characterized. Experiments on 17 datasets from Gene Omnibus Database (GEO) with experimentally verified differentially expressed genes showed that DB pruning can significantly improve the prediction accuracy of popular identification algorithms such as t-test|dary region in the 2-D feature space of average gene expression (AG) versus average difference of gene expression (AD). Fig. 1 . shows the distribution of true DEGs in the 2D space for four datasets: GSE9499, GSE6342, GSE6740{{tag}}--REUSE--_1, and GSE6740{{tag}}--REUSE--_2 from GEO database [ 22 ]. Based on the fact that boundary region is characterized with scarcity of genes, a density based pruning algorithm is proposed here for p|s as collected by Kadota et. al. [ 20 ]. They collected 38 microarray datasets with experimentally determined true DEGS by real-time polymerase chain reaction (RT-PCR). Thirty six of the datasets are downloaded from GEO database [ 22 ]. Without losing generality, we experimented with 17 disease or dose response datasets of Homo sapiens out of the 36 GEO datasets (Table 1 ). The 17 datasets are reported j|how that real-world GEO datasets, especially historical ones, tend to have small sample size. Table 1 17 Datasets with 284 DEGs in total. Each dataset has 22833 genes. Dataset Conditions True DEG A B GSE1462 4 4 4 GSE1615_1 4 5 8 GSE1650 18 12 8 GSE2666_2 5 5 6 GSE3524 16 4 4 GSE3860 9 9 8 GSE4917 3 3 5 GSE5667_1 5 6 3 GSE6236 14 14 7 GSE6344 10 10 19 GSE6740{{tag}}--REUSE--_1 10 10 40 GSE6740{{tag}}--REUSE--_2 10 10 62 GSE7146 6 6 6|GSE8441 11 11 9 GSE9499 15 7 77 GSE9574 15 14 5 The 17 Datasets used here cover a variety of biological or medical studies: GSE1462 (mitochondrial DNA mutations), GSE1615_1 (Valproic acid treatment), GSE1650 (chronic obstructive pulmonary disease), GSE2666_2(bone marrow Rho level effect), GSE3524 (tumor of epithelial tissue), GSE3860 (Hutchinson-Gilford progeria syndrome), GSE4917 (breast cancer), GSE5|67_1 (atopic dermatitis), GSE6236 (Adult vs. fetal reticulocyte transcriptome comparison), GSE6344 (renal cell carcinoma disease), GSE6740{{tag}}--REUSE--_1 (HIV-infection), GSE6740{{tag}}--REUSE--_2 (HIV-infection, disease state), GSE7146 (hyperinsulinaemic, does response), GSE7765 (dose response, DMSO or 100 nM Dioxin), GSE8441 (dietary intake response), GSE9574 (breast cancer), and GSE9499 (hypomorphic germline mutations). The div|ch as WAD and FC. To illustrate the bias of popular DEG identification algorithms, Fig. 2 shows true positive (TP), true negative (TN), false positive (FP), and false negative (FN) DEGs for dataset GSE9499 which has 77 true DEGs. Fig. 2 . (a) shows that fold change (FC) misses most true DEGs (FN genes), which are located in the region below the threshold average difference and with high expression le|obtaining a user-specified number of candidate DEGs. Table 2 Comparison of No. of missing true DEGs after DB pruning. ( N 0 = 4, R 0 = 0.0017) Total Gene: 22283 After DP-pruning True DEG DP missed GSE1462 2054 4 0 GSE1615_1 2449 8 3 GSE1650 1317 8 2 GSE2666_2 1618 6 2 GSE3524 814 4 0 GSE3860 2073 8 0 GSE4917 785 5 1 GSE5667_1 1316 3 0 GSE6236 2231 7 0 GSE6344 3127 19 0 GSE6740{{tag}}--REUSE--_1 1183 40 1 GSE6740{{tag}}--REUSE--_2 |DEG to have high expression levels and high expression difference. Table 3 Ranks of true DEGs in original gene list and pruned gene list. Genes are sorted by four DEG identification algorithms on the GSE1577 dataset. Increase of ranks of true DEGs means that DB pruning have correctly filtered out many non-DEGs. t-test/tTest’ 1404/808 7/6 1321/768 3800/1713 4741/1975 3633/1659 4145/1828 606/388 |on level remains the same. And the average difference of expression can be defined as sum of average difference among pairwise comparisons. Our DB pruning is implemented using C++ and Perl and can be downloaded from http://mleg.cse.sc.edu/degprune . Authors’ contributions JH initiated the project, proposed the DB pruning idea, helped in experimental designs, and wrote the manuscript. J. X. develo | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
632 | GSE6740 | 2/23/2007 | ['6740'] | ['2649'] | [u'17251300'] | 2620272 | [u'19014681'] | ['Kovacs', 'Loutfy', 'Der', 'Halpenny', 'Wilkins', u'Loufty', 'Yang', 'Heisler', u'Covacs', 'Hyrcza', 'Ostrowski'] | ['Tozeren', 'Gormley'] | [] | BMC Bioinformatics | 2008 | 11/17/2008 | 1 | luate switch-like gene expression patterns in high-dimensional datasets profiling diverse biological conditions. For this purpose, we compiled two large-scale gene expression microarray datasets from publicly available data repositories. The first dataset included samples spanning nineteen different tissue types from healthy donors. The second dataset included samples from donors with one of a number of i| genes may serve as candidate biomarkers or therapeutic targets in tissue-specific diseases. Table 1 Microarray datasets used in this study Tissue Phenotype Data Tissue No. of Samples Gene Expression Omnibus/Array Express Accn. # Adipose 10 GSE3526 Adrenal 20 GSE3526, GSE8514, GSE2316 Brain 89 GSE3526, GSE7621, GSE7307, GSE2361, E_AFMX-11, E-TABM-20, Colon 10 E-TABM-176, GSE8671, GSE9254, GSE9452 Epiderm|133, GSE2361, GSE3419, GSE3526, GSE7307 Heart 38 E_AFMX-11, E-MIMR-27, GSE1133, GSE2240, GSE2361, GSE3526, GSE3585, GSE7307 Kidney 10 E_AFMX-11, GSE2004, GSE2361, GSE3526, GSE7392 Liver 10 E_AFMX-11, GSE2004, GSE3526, GSE6764 Lung 26 E-MEXP-231, GSE10072, GSE1133, GSE2361, GSE3526 Mammary 15 E-TABM-66, GSE2361, GSE3526, GSE7307, GSE7904 Muscle 64 GSE10760, GSE2328, GSE3526, GSE5110, GSE6798, GSE7307, G| Ovary 10 GSE2361, GSE3526, GSE6008, GSE7307 Pancreas 6 GSE1133, GSE2361, GSE7307 Peripheral blood 12 GSE7462, GSE8608, GSE8668, GSE8762, GSE9692 Small intestine 7 GSE2361, GSE7307 Spleen 12 GSE2004, GSE2361, GSE3526, GSE7307 Stomach 10 GSE2361, GSE3526, GSE7307 Testis 38 E_AFMX-11, GSE1133, GSE2361, GSE3218, GSE3526, GSE7307, GSE7808 Thymus 5 GSE1133, GSE2361, GSE7307 Infectious Disease Disease No. of|on regions of DNA that code for switch-like genes and their promoter regions. Methods Datasets Microarray datasets used in this study were compiled from the online public repositories Gene Expression Omnibus (GEO) [ 53 ] and Array Express (AE) [ 54 ] as described in additional file 2 . All datasets were profiled on the HGU133A or its recently expanded version, the HGU133plus2 Affymetrix platforms. The da|ssi A Lee C Relative impact of nucleotide and copy number variation on gene expression phenotypes Science 2007 315 848 853 17289997 10.1126/science.1136678 Edgar R Domrachev M Lash AE Gene Expression Omnibus: NCBI gene expression and hybridization array data repository Nucleic acids research 2002 30 207 210 11752295 10.1093/nar/30.1.207 Parkinson H Kapushesky M Shojatalab M Abeygunawardena N Coulson R Fa | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
633 | GSE6740 | 2/23/2007 | ['6740'] | ['2649'] | [u'17251300'] | 2893168 | [u'20525252'] | ['Kovacs', 'Loutfy', 'Der', 'Halpenny', 'Wilkins', u'Loufty', 'Yang', 'Heisler', u'Covacs', 'Hyrcza', 'Ostrowski'] | ['Varshavsky', 'Gottlieb', 'Horn', 'Linial'] | [] | BMC Bioinformatics | 2010 | 6/3/2010 | 0 | 38 , 39 ]. Data collections are: (i) Gene expression measurements taken from skin tissues including 7 normal skin tissues, 18 benign melanocytic lesions and 45 malignant melanoma [ 28 ] (series entry GSE3189); (ii) HIV dataset (series entry GSE6740{{tag}}--REUSE--), containing gene expression measurements from 20 CD4+ and 20 CD8+ T cells from HIV patients at different clinical stages; (iii) Hepatitis C (series entry G | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
634 | GSE6740 | 2/23/2007 | ['6740'] | ['2649'] | [u'17251300'] | 3002992 | [u'21187905'] | ['Kovacs', 'Loutfy', 'Der', 'Halpenny', 'Wilkins', u'Loufty', 'Yang', 'Heisler', u'Covacs', 'Hyrcza', 'Ostrowski'] | ['Kanduri', 'Santoni', 'Castiglione', 'Hovig', 'Clancy', 'Pedicini', 'Barrenäs', 'Benson'] | [] | PLoS Comput Biol | 2010 | 12/16/2010 | 0 | cells in health and disease We proceeded to examine how the in silico findings related to in vitro studies of T-cells from healthy controls and patients with different T-cell related diseases. We downloaded several sets of gene expression microarray data from the public domain to test whether Th1 and Th2 genes were inversely correlated in T-cell related diseases. If Th1 and Th2 cells are antagonists w| was considered irrelevant. Compilation and analysis of gene expression microarray data To examine whether Th1 and Th2 gene activation patterns denoted two opposed pathways, gene expression data were downloaded from Gene Expression Omnibus ( http://www.ncbi.nlm.nih.gov/geo/ ). Datasets were selected based on the criteria that they i) measured mRNA expression from CD4+ cells from healthy controls o|-cell related diseases ( e.g. , SLE) and ii) that there were at least 5 samples per disease or per controls, ( Table 5 ). 10.1371/journal.pcbi.1001032.t005 Table 5 Gene expression microarray datasets downloaded from the Gene Expression Omnibus repository. GEO Accession Number Disorder GSE4588 Systemic Lupus Erythematosus (SLE), Rheumatoid Arthritis (RA) GSE6740{{tag}}--REUSE-- HIV GSE8835 B cell chronic lymphocytic leuke| GSE9927 Type I HIV (HIV-I) GSE10586 Type 1 Diabetes (T1D) GSE12079 Hypereosinophilic syndrome GSE13732 Clinically Isolated Syndrome - Multiple Sclerosis GSE14317 Adult T-cell leukemia/lymphoma (ATL) GSE14924 Acute Myeloid Leukaemia (AML) GSE17354 Adenosine deaminase (ADA) - Severe combined immunodeficiency (SCID) (Therapy treated) Differentially expressed genes between patients and controls in each di | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
635 | GSE6740 | 2/23/2007 | ['6740'] | ['2649'] | [u'17251300'] | 2944782 | [u'20885780'] | ['Kovacs', 'Loutfy', 'Der', 'Halpenny', 'Wilkins', u'Loufty', 'Yang', 'Heisler', u'Covacs', 'Hyrcza', 'Ostrowski'] | ['Klassen', 'Khush', 'Morgan', 'Valantine', 'Dudley', 'Li', 'Sigdel', 'Sarwal', 'Kambham', 'Chen', 'Butte', 'Hsieh', 'Caohuu'] | [] | PLoS Comput Biol | 2010 | 9/23/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
636 | GSE6741 | 5/17/2007 | ['6741'] | ['2893'] | [u'17581126'] | 2998477 | [u'21083928'] | ['Harwood', 'Alvarez-Ortega'] | ['Ehrlich', 'Mazurie', 'Parker', 'Pitts', 'Roe', 'Folsom', 'Richards', 'Stewart'] | [] | BMC Microbiol | 2010 | 11/17/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
637 | GSE6747 | 4/1/2007 | ['6747'] | [] | [u'17502615'] | 2258866 | [u'18203827'] | [u'Whitmann', 'Moore', 'Leigh', 'Hendrickson', 'Haydock', 'Whitman'] | ['Söll', 'Leigh', 'Liu', 'Hendrickson', 'Porat', 'Rosas-Sandoval', 'Whitman'] | ['Hendrickson', 'Whitman', 'Leigh'] | J Bacteriol | 2008 | 2008 Mar | 0 | arison involved four technical replicates as described previously ( 33 ). Gene expression ratios were viewed using a TIGR MultiExperiment viewer ( 24 ). Data are available at the NCBI Gene Expression Omnibus (GEO) through accession numbers GSE6747{{tag}}--DEPOSIT-- and GSE8728 . 16S rRNA abundance was calculated by multiplying the yield of total RNA (μg per OD 660 per ml of culture) by 0.24. Real-time RT-PCR r | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
638 | GSE6747 | 4/1/2007 | ['6747'] | [] | [u'17502615'] | 1885605 | [u'17502615'] | [u'Whitmann', 'Moore', 'Leigh', 'Hendrickson', 'Haydock', 'Whitman'] | ['Hendrickson', 'Haydock', 'Moore', 'Leigh', 'Whitman'] | ['Hendrickson', 'Haydock', 'Moore', 'Leigh', 'Whitman'] | Proc Natl Acad Sci U S A | 2007 | 5/22/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
639 | GSE6751 | 12/31/2007 | ['6751'] | [] | [u'17716309'] | 2670555 | [u'17716309'] | ['Belusko', 'Lalla', 'Demmer', 'Celenti', 'Roth', 'Yang', 'Wolf', 'Pavlidis', 'Papapanou', 'Sedaghatfar'] | ['Belusko', 'Lalla', 'Demmer', 'Celenti', 'Roth', 'Yang', 'Wolf', 'Pavlidis', 'Papapanou', 'Sedaghatfar'] | ['Belusko', 'Lalla', 'Demmer', 'Celenti', 'Roth', 'Yang', 'Wolf', 'Pavlidis', 'Papapanou', 'Sedaghatfar'] | J Clin Periodontol | 2007 | 2007 Sep | 0 | AND pmc_gds | 1 | 0 | ||||
640 | GSE6752 | 2/20/2007 | ['6752'] | [] | [u'17430594'] | 1865555 | [u'17430594'] | ['Michalopoulos', 'Ma', 'Becich', 'Bisceglia', 'Liang', 'Dhir', 'Lyons-Weiler', 'Monzon', 'Chandran'] | ['Michalopoulos', 'Ma', 'Becich', 'Bisceglia', 'Liang', 'Dhir', 'Lyons-Weiler', 'Monzon', 'Chandran'] | ['Michalopoulos', 'Ma', 'Becich', 'Bisceglia', 'Liang', 'Dhir', 'Lyons-Weiler', 'Monzon', 'Chandran'] | BMC Cancer | 2007 | 4/12/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
641 | GSE6754 | 2/20/2007 | ['6754'] | [] | [u'17322880'] | 2823693 | [u'20092660'] | ['Bölte', 'Van', 'Smith', 'Samango-Sprouse', 'Senman', 'Carone', 'Bailey', 'Weeks', 'Wilkinson', 'Schuster', 'Brown', 'Schellenberg', 'Klei', 'Devlin', 'Munson', 'Silverman', 'Koop', 'Spence', 'Barnby', 'de', 'Zwaigenbaum', 'Herbert', 'Roberts', 'Hijmans', 'Hollander', 'Nelson', 'Monaco', 'Toma', 'Salt', 'Gilliam', 'McMahon', 'Wijsman', 'Szatmari', 'Rogé', 'Pickles', 'Guter', 'Volkmar', 'Stodgell', 'Wright', 'State', 'Bryson', 'Betancur', 'Rutter', 'Bucan', 'Sheffield', 'Gillberg', 'Parr', 'Berney', 'Cook', 'Francis', 'Coon', 'Meyer', 'Vieland', 'Tepper', 'Corsello', 'Tauber', 'Baird', 'Green', 'Davis', 'Lotspeich', 'Hallmayer', 'Scherer', 'Dawson', 'Lamb', 'Honeyman', 'Fombonne', 'Wallace', 'Ledbetter', 'Maestrini', 'Brian', 'Cantor', 'Mangin', 'Gilbert', 'Mantoulan', 'Staal', 'McConachie', 'Blasi', 'Skaug', 'Goedken', 'Hus', 'Bartlett', 'Constantino', 'Felder', 'Abramson', 'Hutchinson', 'Pericak-Vance', 'Flodman', 'Vincent', 'Bolton', 'Weisblatt', 'Feuk', 'Segre', 'Feineis-Matthews', 'Sousa', 'Qian', 'Papanikolaou', 'Marshall', 'Leventhal', 'Tsiantis', 'Lese-Martin', 'Wassink', 'Minshew', 'Le', 'Bacchelli', 'Liu', 'Paterson', 'Lord', 'Thompson', 'Estes', 'Poustka', 'Klauck', 'Lajonchere', 'Haines', 'Leboyer', 'Langemeijer', 'Schmötzer', 'Kemner', 'Sykes', 'Piven', 'Sutcliffe', 'Yu', 'Jones', 'Cuccaro', 'Herbrecht', 'Buxbaum', 'Folstein', 'Aldred', 'Wittemeyer', 'Tanzi', 'Shih', 'Miller', 'Rodier', 'Bourgeron', 'Geschwind', 'Kobayashi', 'Korvatska'] | ['Toplak', 'Demsar', 'Curk', 'Zupan'] | [] | BMC Genomics | 2010 | 1/22/2010 | 0 | nibus Gene Expression Omnibus [ 8 ] was considered for SNP data sets that contain at least 200 samples with approximately equal case/control distribution. Five data sets met these criteria: • GSE6754{{tag}}--REUSE-- [ 9 ] describing families with two individuals affected by autism spectrum disorders. Individuals were classified as affected (2459 samples) or unaffected (3473 samples) and described with around 1|E8054 [ 10 ] comprising 901 SNPs for each of the 121 cancerous samples and 87 controls. • GSE8055 [ 10 ] comprising 1,189 SNPs for each of the 141 cancerous samples and 89 controls. • GSE7226 [ 11 ] with platform designation GPL2004, comprising around 50,000 SNPs for each of the 102 samples from mentally retarded children and 213 controls from their unaffected siblings or parents. | entire set of SNP pairs, this limited our studies to about 2,000 SNPs. Therefore we only considered the first 2,000 SNPs of each data set, and a stratified sample of 500 individuals was used for the GSE6754{{tag}}--REUSE-- data set. Experimental methodology Feature scoring assigns interaction scores to all pairs of SNPs, resulting in a ranked list of SNP pairs. Either pairs of SNPs with scores exceeding a certain thr|ion to direct scoring and scoring with replication groups we report results obtained with bootstrap sampling. Click here for file Additional file 3 Performance graphs for differently sized subsets of GSE6754{{tag}} . Performance graphs for data subsets of 100, 200, 500, 1000, 2000, and 5000 samples drawn from GSE6754{{tag}}--REUSE--. Click here for file Additional file 4 Source code and data sets . Source code and data sets | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
642 | GSE6764 | 6/21/2007 | ['6764'] | [] | [u'17393520'] | 2620272 | [u'19014681'] | ['Mazzaferro', 'Schwartz', 'Zhang', 'Khitrov', 'Friedman', 'Bruix', 'Thung', 'Wurmbach', 'Bottinger', 'Fiel', 'Llovet', 'Chen', 'Roayaie', 'Waxman'] | ['Tozeren', 'Gormley'] | [] | BMC Bioinformatics | 2008 | 11/17/2008 | 1 | or therapeutic targets in tissue-specific diseases. Table 1 Microarray datasets used in this study Tissue Phenotype Data Tissue No. of Samples Gene Expression Omnibus/Array Express Accn. # Adipose 10 GSE3526 Adrenal 20 GSE3526, GSE8514, GSE2316 Brain 89 GSE3526, GSE7621, GSE7307, GSE2361, E_AFMX-11, E-TABM-20, Colon 10 E-TABM-176, GSE8671, GSE9254, GSE9452 Epidermal 25 GSE1133, GSE2361, GSE3419, GSE352|38 E_AFMX-11, E-MIMR-27, GSE1133, GSE2240, GSE2361, GSE3526, GSE3585, GSE7307 Kidney 10 E_AFMX-11, GSE2004, GSE2361, GSE3526, GSE7392 Liver 10 E_AFMX-11, GSE2004, GSE3526, GSE6764{{tag}}--REUSE-- Lung 26 E-MEXP-231, GSE10072, GSE1133, GSE2361, GSE3526 Mammary 15 E-TABM-66, GSE2361, GSE3526, GSE7307, GSE7904 Muscle 64 GSE10760, GSE2328, GSE3526, GSE5110, GSE6798, GSE7307, GSE9103, Ovary 10 GSE2361, GSE3526, GSE6008, GS|ancreas 6 GSE1133, GSE2361, GSE7307 Peripheral blood 12 GSE7462, GSE8608, GSE8668, GSE8762, GSE9692 Small intestine 7 GSE2361, GSE7307 Spleen 12 GSE2004, GSE2361, GSE3526, GSE7307 Stomach 10 GSE2361, GSE3526, GSE7307 Testis 38 E_AFMX-11, GSE1133, GSE2361, GSE3218, GSE3526, GSE7307, GSE7808 Thymus 5 GSE1133, GSE2361, GSE7307 Infectious Disease Disease No. of Samples Gene Expression Omnibus/Array Express | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
643 | GSE6764 | 6/21/2007 | ['6764'] | [] | [u'17393520'] | 2990751 | [u'21124904'] | ['Mazzaferro', 'Schwartz', 'Zhang', 'Khitrov', 'Friedman', 'Bruix', 'Thung', 'Wurmbach', 'Bottinger', 'Fiel', 'Llovet', 'Chen', 'Roayaie', 'Waxman'] | ['Hoshida'] | [] | PLoS One | 2010 | 11/23/2010 | 0 | ancer Training set 97 Hu25K** (b) [11] Test set 49 HuGeneFL* (a) [12] 3. Liver cirrhosis in human and rat Training set 23 HG-U133plus2* GSE6764{{tag}}--REUSE-- [13] Test set 12 Rat Genome 230* GSE13747 - 4. Multiple tissue types (breast, lung, prostate, colon) Training set 51 HG-U95A* (a) [14] , [15|005b;14] , [15] 5. Molecular subclasses of breast cancer Training set 295 Stanford cDNA (c) [16] Test set 1 (“TransBig”) 198 HG-U133A* GSE7390 [18] Test set 2 (“Wang”) 286 HG-U133A* GSE2034 [19] Test set 3 (“Weigelt”) 53 Human WG6*** E-TABM-543|ce in cross-species prediction. We first defined a human liver cirrhosis signature including 801 up-regulated and 445 down-regulated genes in comparison between 13 cirrhotic and 10 normal livers from publicly available dataset [13] ( Table 1 ). We then tested whether the signature was presented in another publicly available dataset of gene-expression profiles of rat liver cirrhosis induc| of existing genomic signatures for their potential value as reliable medical diagnostics. The NTP methodology is implemented as Nearest Template Prediction module of GenePattern analysis toolkit and publicly available from www.broadinstitute.org/genepattern . Materials and Methods Data preprocessing We utilized data sets already normalized in the respective studies. Multiple probes corresponding a singl|its sample-wise mean and sample standard deviation in each dataset to adjust range of gene expression level between training and test datasets. All datasets and class labels used for the analysis are publicly available at http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi . 
The author thanks Joshua Gould, Heidi Kuehn, Barbara Hill, and Michael Reich for technical help and Stefano Monti, DR Mani, a | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
644 | GSE6764 | 6/21/2007 | ['6764'] | [] | [u'17393520'] | 2637897 | [u'19146704'] | ['Mazzaferro', 'Schwartz', 'Zhang', 'Khitrov', 'Friedman', 'Bruix', 'Thung', 'Wurmbach', 'Bottinger', 'Fiel', 'Llovet', 'Chen', 'Roayaie', 'Waxman'] | ['Jen', 'Lin', 'Tung', 'Wang', 'Hsu'] | [] | BMC Genomics | 2009 | 1/16/2009 | 0 | biological function interpretation. Methods Data collections and preprocessing Six independent data sets (Normal, HCC 1 , HCC 2 , Tumor 1 , Tumor 2 , Tumor 3 ), including one normal tissue data set (GSE3526), two HCC data sets (E-TABM-36 and GSE6764{{tag}}--REUSE--), and data sets for other three tumor types: thyroid cancer (GSE3678), colon cancer(GSE4107) and lung cancer (GSE7670), were downloaded from two public ar | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
645 | GSE6764 | 6/21/2007 | ['6764'] | [] | [u'17393520'] | 2750086 | [u'19366792'] | ['Mazzaferro', 'Schwartz', 'Zhang', 'Khitrov', 'Friedman', 'Bruix', 'Thung', 'Wurmbach', 'Bottinger', 'Fiel', 'Llovet', 'Chen', 'Roayaie', 'Waxman'] | ['Lee', 'Park', 'Thorgeirsson', 'Woo', 'Ishikawa', 'Kim'] | [] | Cancer Res | 2009 | 5/1/2009 | 0 | assay. Screening of Therapeutic Drugs using in trans correlated gene signatures Raw data of Connectivity Map (build01) with two different Affymetrix platforms ( i.e. , hgu133a and HT-hgu133a) were downloaded from authors’ website ( www.broad.mit.edu/cmap/ ) and normalized independently using the RMA method implemented in Bioconductor ( http://www.bioconductor.org ). For each of the in trans g|lot for the average copy numbers of the corCNA genes at 1q vs. 8q in 15 HCC dataset. C-D , Scatter plots for the average gene expression levels of corCNA genes at 1q vs. 8q in 139 HCC ( C ) and GSE6764{{tag}}--REUSE-- dataset ( D ). Pathological phenotypes of cirrhotic liver (n = 13, black), dysplastic nodules (n = 17, blue), early HCC (n = 18, pink), and advanced HCC (n = 17, red) are indicated with different |portance of the early dysregulation of these genes in HCC development. The prognostic values of the 50 or 30 1q/8q corCNA genes were further validated by two independent HCC datasets (SNU ( 22 ) and GSE6764{{tag}}--REUSE-- ( 2 )) using class prediction algorithms (for details see SI Methods). The tumor classes defined by the expression similarity of the 50 or 30 genes could successfully predict the prognostic outcom|q corCNA genes were also correlated in our 139 HCC dataset ( r =0.333, P =6.17 × 10 −5 , permutation P =0.042; Fig. 3 C ), and this was validated in an independent 35 HCC dataset ( GSE6764{{tag}}--REUSE-- , r =0.398, P =0.017; Fig. 3 D ). 
In addition, the result from GSE6764{{tag}}--REUSE-- also showed that the coexpression levels of the 1q and 8q genes corresponded with the pathological staging of the liver ( | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
646 | GSE6764 | 6/21/2007 | ['6764'] | [] | [u'17393520'] | 2949900 | [u'20875095'] | ['Mazzaferro', 'Schwartz', 'Zhang', 'Khitrov', 'Friedman', 'Bruix', 'Thung', 'Wurmbach', 'Bottinger', 'Fiel', 'Llovet', 'Chen', 'Roayaie', 'Waxman'] | ['Dawany', 'Tozeren'] | [] | BMC Bioinformatics | 2010 | 9/27/2010 | 0 | he individual datasets usually small in size, but the inferences made from individual studies are often inconsistent with similar studies [ 1 ]. As thousands of microarray samples have accumulated in publicly accessible databases in the last decade [ 2 - 4 ], several statistical methods have been developed to allow for the combination and comparison of data from multiple sources. Among the many methodolog|ons based on hypergeometric test. Table 1 Overview of datasets used and distribution of microarray samples Analysis Tissue Accession # Normal Cancer Platform IV1/IV2/SAM1/SAM2 Colon E-MTAB-57 22 25 A GSE4107 10 12 P2 GSE4183 8 15 P2 Kidney E-TABM-282 11 16 P2 GSE11024† 12 60 P2 GSE11151 3 57 P2 GSE14762† 12 10 P2 GSE15641 23 57 A GSE6344 10 10 A GSE7023 12 35 P2 Liver GSE14323 19 47 A/A|2 49 58 A GSE7670 27 27 A Pancreas E-MEXP-1121† 6 17 A E-MEXP-950 11 14 A GSE15471 39 39 P2 GSE16515 15 36 P2 Total: 294 619 SAM2 Colon E-MEXP-1224 0 55 A E-MEXP-383 0 36 A E-TABM-176 55 0 P2 GSE12945 0 36 A GSE17538 0 232 P2 Kidney GSE10320 0 144 A GSE11904 0 21 A2 Liver E-TABM-292 0 32 A E-TABM-36 0 57 A GSE9843 0 69 P2 Lung GSE10445 0 72 P2 GSE12667 0 75 P2 Total: 55 829 IV2 Colon GSE6988 28|5E-257 No data - 262 2.34E-299 * Only 338 genes are used for colon IV1 Moreover, to assess the effect of the refRMA method in normalizing data, three samples from different colon datasets (E-MTAB-57, GSE4107 and GSE4183) were chosen. The expression values for the three arrays were obtained based on classical RMA and refRMA normalization techniques. 
Quantile-quantile plots were produced to compare the d|election A total of 31 Affymetrix microarray datasets containing 1,768 unique samples from human cancer (1,429) and corresponding healthy control tissues (339) were collected from the Gene Expression Omnibus (GEO; [ 2 , 3 ] and Array Express [ 4 ] online repositories (Additional File 2 ). Samples were selected for 5 different tissue types: colon, kidney, liver, lung and pancreas, then categorized into c|ms and the conversion of data to Entrez IDs resulted in the study of varying number of genes per dataset as well as different total overlap with the common Affymetrix platform (shown in parentheses); GSE6988: 9,072 (5,834) genes, GSE3: 12,452 (6,598) genes, GSE7367: 2118 (1,301) genes, GSE2088: 13754 (7,038) genes, and GSE8596: 6740 (4,330) genes. The datasets contained cancer versus normal samples fro|NCBI GEO: mining tens of millions of expression profiles--database and tools update Nucleic Acids Res 2007 35 Database D760 765 10.1093/nar/gkl887 17099226 Edgar R Domrachev M Lash AE Gene Expression Omnibus: NCBI gene expression and hybridization array data repository Nucleic Acids Res 2002 30 1 207 210 10.1093/nar/30.1.207 11752295 Brazma A Parkinson H Sarkans U Shojatalab M Vilo J Abeygunawardena N Ho | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
647 | GSE6764 | 6/21/2007 | ['6764'] | [] | [u'17393520'] | 2783293 | [u'19861515'] | ['Mazzaferro', 'Schwartz', 'Zhang', 'Khitrov', 'Friedman', 'Bruix', 'Thung', 'Wurmbach', 'Bottinger', 'Fiel', 'Llovet', 'Chen', 'Roayaie', 'Waxman'] | ['Mas', 'Bornstein', 'Maluf', 'David', 'Archer', 'Fisher'] | [] | Cancer Epidemiol Biomarkers Prev | 2009 | 2009 Nov | 0 | AND pmc_gds | 0 | 1 | ||||
648 | GSE6764 | 6/21/2007 | ['6764'] | [] | [u'17393520'] | 2944782 | [u'20885780'] | ['Mazzaferro', 'Schwartz', 'Zhang', 'Khitrov', 'Friedman', 'Bruix', 'Thung', 'Wurmbach', 'Bottinger', 'Fiel', 'Llovet', 'Chen', 'Roayaie', 'Waxman'] | ['Klassen', 'Khush', 'Morgan', 'Valantine', 'Dudley', 'Li', 'Sigdel', 'Sarwal', 'Kambham', 'Chen', 'Butte', 'Hsieh', 'Caohuu'] | ['Chen'] | PLoS Comput Biol | 2010 | 9/23/2010 | 0 | AND pmc_gds | 1 | 0 | ||||
649 | GSE6768 | 2/26/2007 | ['6768'] | [] | [u'17263897'] | 1851391 | [u'17263897'] | ['Krogh', 'Tanner', u'Ferno', 'Kataja', 'Bendahl', 'Kauraniemi', 'Jumppanen', 'Fernö', 'Lundin', 'Gruvberger-Saal', 'Borg', 'Isola'] | ['Krogh', 'Tanner', 'Kataja', 'Bendahl', 'Kauraniemi', 'Jumppanen', 'Fernö', 'Lundin', 'Gruvberger-Saal', 'Borg', 'Isola'] | ['Krogh', 'Tanner', 'Kataja', 'Bendahl', 'Kauraniemi', 'Jumppanen', 'Fernö', 'Lundin', 'Gruvberger-Saal', 'Borg', 'Isola'] | Breast Cancer Res | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
650 | GSE6770 | 1/18/2007 | ['6770'] | ['2624'] | [u'17322895'] | 2945940 | [u'20840752'] | ['Zhu', 'Wang', 'Zhang', 'Trivedi', 'Yin', 'Goettlicher', 'Wurst', 'Floss', 'Ferrari', 'Epstein', 'Noppinger', 'Abrams', 'Gruber', 'Luo'] | ['Ojeda', 'de', 'Nitsch', 'Gonçalves', 'Moreau'] | [] | BMC Bioinformatics | 2010 | 9/14/2010 | 0 | n, heat kernel, and Arnoldi kernel approximation. For comparison purposes, a local measure based on the expression of the direct neighbors is also computed. We have benchmarked these strategies on 40 publicly available knockout experiments in mice, and performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expr| approach for scoring candidates is presented here for comparison purposes as a naïve strategy for network analysis of differential expression. We have benchmarked these four strategies on 40 publicly available data sets originated from Affymetrix chips on which mice with (simple) knockout genes were tested against controls. The raw cel files were downloaded from GEO [ 12 ]. For each data set we|egy based on a direct neighborhood analysis. These four network-based prioritization strategies for scoring candidate genes based on their differentially expressed neighborhood were benchmarked on 40 publicly available knockout experiments in mice. Performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expressi|ng candidate genes using network-based machine learning approaches even if no knowledge is available about the disease or phenotype. Methods Benchmark data The benchmark for this study consists of 40 publicly available data sets originated from Affymetrix chips on which mice with knockout genes were tested against controls. The raw cel files were downloaded from GEO [ 12 ]. Table 3 shows all data sets|ur benchmark. 
Table 3 The benchmark data. Gene Name GEO accession number Gene Name GEO accession number 1 Abca1 GSE5496 21 Mbnl1 GSE14691 2 Btk GSE2826 22 Mst1r, Ron GSE16629 3 Cav1 GSE10849 23 MyD88 GSE6688 4 Cav3 GSE10848 24 Nos3, eNos GSE1988 5 Cftr GSE5715 25 Phgdh GSE8555 6 Clcn1 GSE14691 26 Pmp22 GSE1947 7 Cnr1 GSE7694 27 PPAR α GSE6864 8 Emd GSE5304 28 Prkag3, AMPK G3 GSE4065 9 Epas1, Hi|-2 GSE16067 29 Pthlh, Pthrp GSE17654 10 Esrra GSE7196 30 Rab3a GSE6527 11 Gap43 GSE12687 31 RasGrf1 GSE8425 12 Gnmt GSE9809 32 Rbm15 GSE12628 13 Hdac1 GSE5583 33 Runx GSE4911 14 Hdac2 GSE6770{{tag}}--REUSE-- 34 Scd1 GSE2926 15 Hsf4 GSE12415 35 Slc26a4 GSE10587 16 Hspa1A, Hsp70.1 GSE11120 36 Srf GSE13333 17 Il6 GSE411 37 Tgm2 GSE10285 18 Lhx1, Lim1 GSE4230 38 Zc3h12a GSE14891 19 Lhx8 GSE11897 39 Zfp36, Tpp GSE5324 20 L | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
651 | GSE6771 | 1/23/2007 | ['6771'] | [] | [u'17343748'] | 1868939 | [u'17343748'] | ['Sisodiya', 'Yoon', 'Goldstein', 'Heinzen', 'Weale', 'Wood', 'Hulette', 'Sen', 'Welsh-Bohmer', 'Burke'] | ['Sisodiya', 'Yoon', 'Goldstein', 'Heinzen', 'Weale', 'Wood', 'Hulette', 'Sen', 'Welsh-Bohmer', 'Burke'] | ['Sisodiya', 'Yoon', 'Goldstein', 'Heinzen', 'Weale', 'Wood', 'Hulette', 'Sen', 'Welsh-Bohmer', 'Burke'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
652 | GSE6772 | 3/13/2007 | ['6772'] | [] | [u'17410534'] | 2602602 | [u'19104654'] | ['J\xc3\xbcrgens', 'Klein', 'Schmutzler', 'Arnold', 'Meindl', 'Scherneck', 'Niederacher', 'Seitz', 'Graessmann', 'Wessel', 'Petersen'] | ['Remy', 'Didier', 'Granjeaud', 'Imbert', 'Bergon', 'Nguyen', 'Puthier', 'Lopez', 'Textoris'] | [] | PLoS One | 2008 | 2008 | 1 | al Biology/Genomics Computational Biology/Transcriptional Regulation TranscriptomeBrowser: A Powerful and Flexible Toolbox to Explore Productively the Transcriptional Landscape of the Gene Expression Omnibus Database GEO Datamining with TBrowser Lopez Fabrice 1 2 Textoris Julien 1 2 5 Bergon Aurélie 1 2 Didier Gilles 2 3 Remy Elisabeth 2 3 Granjeaud Samuel 1 2 Imbert Jean 1 2 Nguyen Catherine 1 2|. Methodology We used a modified version of the Markov clustering algorithm to systematically extract clusters of co-regulated genes from hundreds of microarray datasets stored in the Gene Expression Omnibus database (n = 1,484). This approach led to the definition of 18,250 transcriptional signatures (TS) that were tested for functional enrichment using the DAVID knowledgebase. O|data, most generally deposited in MIAME-compliant public databases, constitute an unprecedented source of knowledge for biologists [1] . As an example, until now, the Gene Expression Omnibus repository (GEO) host approximately 8,000 experiments encompassing about 200,000 biological samples analyzed using various high through-put technologies [2] . Consequently, this repr|iated DKNN value (with k being set typically to 100 for microarrays containing 10 to 50k elements). Distributions of DKNN values observed with both an artificial and a real dataset (Complex9RN200 and GSE1456 respectively, see thereafter for a description) are shown in Figure S3A and S3B (solid curve). The asymmetrical shape of the distribution observed in Figure S3B suggests the presence of a parti|s ability to extract relevant informations from a noisy environment. 
However, a range of optimal values for inflation parameter needs to be defined to get the best results. Performances of DBF-MCL on GSE1456 dataset Next, DBF-MCL was tested with microarray data to explore its effectiveness in finding clusters of co-regulated genes. To this end, we used the microarray data from Pawitan et al. |sed for analysis. Figure S5B , shows the number of informative genes obtained with various k values. Again, two phases were observed suggesting that regions with heterogeneous densities exist in the GSE1456 dataset. As expected, the transition from dense to sparse regions was less marked than in the artificial dataset. A k value of 100 was chosen to allow the extraction of a large part of data that ca| microarray dataset. Interestingly, in all cases meaningful partitioning results were obtained using inflation parameter set to 2. 10.1371/journal.pone.0004001.g001 Figure 1 Results obtained with the GSE1456 dataset. DBF-MCL was run with GSE1456 as input (k = 100, FDR = 10%, S 1..3 , Inflation = 2). (A) Hierarchical cluster|ected genes. (D) The graph after MCL partitioning. Each point is colored according to its associated class. (E) Correspondence between hierarchical clustering and DBF-MCL results. (F) TS obtained for GSE1456 (G) Functional enrichment associated with these TS. Systematic extraction of TS We next applied DBF-MCL algorithm to all experiments performed on human, mouse and rat Affymetrix microarrays and ava|he results panel (2), the information panel (3), the plugins panel (4) and the plugin display panel (5). This example shows the expression profiles of genes contained in the TS CBE3881EB derived from GSE469 (“Temporal profiling in muscle regeneration”). Annotation panel shows that this TS is highly enriched in genes related to ATP synthesis. Meta-analysis of public microarray data using|e 1 Transcriptional signatures containing Affymetrix probes for ESR1, GATA3 and FOXA1. 
TS ID 1 Genes 2 Probes 2 Samples 2 Sample type GSE ID GPL ID Author PubMed IDs 0F2635383 1190 1572 23 Cell lines GSE6569 GPL96 Huang F et al 2007 17332353 3DE64836D 102 143 62 Tissue GSE7904 GPL570 unpublished 2007 - 59A18E225 690 893 121 Both GSE2603 GPL96 Minn AJ et al 2005 16049480 6C975B20B 88 96 26 Tissue GSE677|lein A et al 2007 17410534 6C975B290 88 96 26 Tissue GSE6596 GPL96 Klein A et al 2007 17410534 7150E17F6 868 1032 34 Cell lines GSE4668 GPL96 Coser KR et al 2003 14610279 8059848B4 200 250 251 Tissue GSE3494 GPL96 Miller LD et al 2005 16141321 84E5E1077 694 883 198 Tissue GSE7390 GPL96 Desmedt C et al 2007 17545524 8F69864F9 68 82 95 Tissue GSE5847 GPL96 Boersma BJ et al 2007 17999412 A151D5695 297 361|et al 2007 17420468 B79B1C0B9 270 380 47 Tissue GSE3744 GPL570 Richardson AL et al 2006 16473279 BDB6D8700 550 679 104 Tissue GSE3726 GPL96 Chowdary D et al 2006 16436632 D8F0B528C 125 152 159 Tissue GSE1456 GPL96 Pawitan Y et al 2005 16280042 E2E620F40 448 616 129 Tissue GSE5460 GPL570 unpublished 2007 - EA9669A21 219 251 158 Tissue GSE3143 GPL91 Bild AH et al 2006 16273092 F310ACC36 519 646 49 Tissue|05b;15] and in genes related to the PIR keyword “multigene family”. Furthermore, several signatures, of Table 2 are related to melanoma and six of them were observed in the GSE7127 experiment [16] . Although data from Table 2 would deserve further analysis they are most likely related to gain or loss of genetic material in tumors. Indeed, gain of 8q is freq|<1.10−20 ) for any of the human cytoband tested. TS ID 1 Enrich. 
2 Cytoband q.value Sample type GSE ID GPL ID Authors PubMed ID 3DA3C8345 24% 17q12-q21 1.7.10 −39 Skin GSE5667 GPL97 Plager DA et al 2007 17181634 43CC3EF57 9% 8q24.3 7.0.10 −32 Melanoma GSE7153 GPL570 Unpublished 2007 - 60E29DA83 16% 8q24.3 6.8.10 −24 Melanoma GSE7127 GPL570| GPL570 Johansson P et al 2007 17516929 60E6B4129 35% 20p13 1.6.10 −26 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60E96FF1E 28% 6p21.3 1.2.10 −28 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60EC95F6A 17% 7q22.1 6.3.10 −31 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60EEBD669 32% 11q23.3 1.4.10 −26 Melan|127 GPL570 Johansson P et al 2007 17516929 B4C95CF18 42% 8q24.3 1.1.10 −36 Ovary GSE6008 GPL96 Hendrix ND et al 2006 16452189 A93ED6519 16% 11q23.3 6.9.10 −23 Melanoma GSE7152 GPL570 Packer LM et al 2007 17450523 A93DB01ED 11% 7q22.1 9.5.10 −30 Melanoma GSE7152 GPL570 Packer LM et al 2007 17450523 1 Transcriptional signature ID. 2 Enrichment: Proportion o|ial platforms ( e.g. National Cancer Institute, Vanderbilt Microarray Shared Resource, Genome Institute of Singapore), several of them being related to the MicroArray Quality Control (MAQC) project (GSE5350) [26] . However, to date, systematic analysis of all experiments performed on these platforms has not been done. The flexibility of our approach also makes it possible to integrate|ry service made this work possible. Materials and Methods Microarray data retrieval Human mouse and rat microarray data derived from 30 Affymetrix microarray platforms (Supplementary Table S1 ) were downloaded from the GEO ftp site and retrieved in seriesMatrix file format ( ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/ ). SeriesMatrix are summary text files related to a GEO series Experiment (GSE) t|ile. Figure S3 Distributions of DKNN values. 
Observed DKNN values (solid line) and of a set of simulated DKNN values S (dotted line) are shown for (A) the Complex9RN200 artificial dataset and (B) the GSE1456 microarray dataset. (9.01 MB TIF) Click here for additional data file. Figure S4 Colors correspond to the clusters found using the corresponding algorithm (A) The whole dataset (9,112 points). (B) |s input using a range of k values (FDR = 10%, S1..3, Inflation = 1.2). (B) DBF-MCL was run with several microarray datasets as input (including GSE1456) using a range of k values (FDR = 10%, S1..3, Inflation = 2). (8.72 MB TIF) Click here for additional data file. Figure S6 The TBMap plugin. |for technical assistance. References 1 Stoeckert CJ Causton HC Ball CA 2002 Microarray databases: standards and ontologies. Nat Genet 32 Suppl 469 73 12454640 2 Barrett T Edgar R 2006 Gene expression omnibus: microarray data storage, submission, retrieval, and analysis. Methods Enzymol 411 352 69 16939800 3 Diehn M Sherlock G Binkley G Jin H Matese JC 2003 SOURCE: a unified genomic resource of functional | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
653 | GSE6773 | 1/23/2007 | ['6773'] | [] | [u'17343748'] | 1868939 | [u'17343748'] | ['Sisodiya', 'Yoon', 'Goldstein', 'Heinzen', 'Weale', 'Wood', 'Hulette', 'Sen', 'Welsh-Bohmer', 'Burke'] | ['Sisodiya', 'Yoon', 'Goldstein', 'Heinzen', 'Weale', 'Wood', 'Hulette', 'Sen', 'Welsh-Bohmer', 'Burke'] | ['Sisodiya', 'Yoon', 'Goldstein', 'Heinzen', 'Weale', 'Wood', 'Hulette', 'Sen', 'Welsh-Bohmer', 'Burke'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
654 | GSE6774 | 1/23/2007 | ['6774'] | [] | [u'17343748'] | 1868939 | [u'17343748'] | ['Sisodiya', 'Yoon', 'Goldstein', 'Heinzen', 'Weale', 'Wood', 'Hulette', 'Sen', 'Welsh-Bohmer', 'Burke'] | ['Sisodiya', 'Yoon', 'Goldstein', 'Heinzen', 'Weale', 'Wood', 'Hulette', 'Sen', 'Welsh-Bohmer', 'Burke'] | ['Sisodiya', 'Yoon', 'Goldstein', 'Heinzen', 'Weale', 'Wood', 'Hulette', 'Sen', 'Welsh-Bohmer', 'Burke'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
655 | GSE6775 | 1/23/2007 | ['6775'] | [] | [u'17343748'] | 1868939 | [u'17343748'] | ['Sisodiya', 'Yoon', 'Goldstein', 'Heinzen', 'Weale', 'Wood', 'Hulette', 'Sen', 'Welsh-Bohmer', 'Burke'] | ['Sisodiya', 'Yoon', 'Goldstein', 'Heinzen', 'Weale', 'Wood', 'Hulette', 'Sen', 'Welsh-Bohmer', 'Burke'] | ['Sisodiya', 'Yoon', 'Goldstein', 'Heinzen', 'Weale', 'Wood', 'Hulette', 'Sen', 'Welsh-Bohmer', 'Burke'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
656 | GSE6776 | 2/23/2007 | ['6776'] | [] | [u'17360575'] | 1838652 | [u'17360575'] | ['Srivastava', u'Donahoe', 'Baliga', 'Vuthoori', 'Kaur', 'Donohoe', 'Facciotti', 'Hood', 'Shannon', 'Bonneau', 'Pan', 'Reiss'] | ['Srivastava', 'Baliga', 'Vuthoori', 'Kaur', 'Donohoe', 'Facciotti', 'Hood', 'Shannon', 'Bonneau', 'Pan', 'Reiss'] | ['Srivastava', 'Baliga', 'Vuthoori', 'Kaur', 'Facciotti', 'Hood', 'Donohoe', 'Bonneau', 'Pan', 'Shannon', 'Reiss'] | Proc Natl Acad Sci U S A | 2007 | 3/13/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
657 | GSE6777 | 1/23/2007 | ['6777'] | [] | [u'17343748'] | 1868939 | [u'17343748'] | ['Sisodiya', 'Yoon', 'Goldstein', 'Heinzen', 'Weale', 'Wood', 'Hulette', 'Sen', 'Welsh-Bohmer', 'Burke'] | ['Sisodiya', 'Yoon', 'Goldstein', 'Heinzen', 'Weale', 'Wood', 'Hulette', 'Sen', 'Welsh-Bohmer', 'Burke'] | ['Sisodiya', 'Yoon', 'Goldstein', 'Heinzen', 'Weale', 'Wood', 'Hulette', 'Sen', 'Welsh-Bohmer', 'Burke'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
658 | GSE6778 | 1/23/2007 | ['6778'] | [] | [u'17343748'] | 1868939 | [u'17343748'] | ['Sisodiya', 'Yoon', 'Goldstein', 'Heinzen', 'Weale', 'Wood', 'Hulette', 'Sen', 'Welsh-Bohmer', 'Burke'] | ['Sisodiya', 'Yoon', 'Goldstein', 'Heinzen', 'Weale', 'Wood', 'Hulette', 'Sen', 'Welsh-Bohmer', 'Burke'] | ['Sisodiya', 'Yoon', 'Goldstein', 'Heinzen', 'Weale', 'Wood', 'Hulette', 'Sen', 'Welsh-Bohmer', 'Burke'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
659 | GSE6782 | 2/1/2007 | ['6782'] | [] | [u'17916165'] | 2433278 | [u'17916165'] | ['Kaufmann', 'Mollenkopf', 'Hurwitz', 'Besra', 'Darmoise', 'Schaible', 'Hahnke', 'Niemeyer'] | ['Kaufmann', 'Mollenkopf', 'Hurwitz', 'Besra', 'Darmoise', 'Schaible', 'Hahnke', 'Niemeyer'] | ['Schaible', 'Hurwitz', 'Besra', 'Darmoise', 'Kaufmann', 'Hahnke', 'Mollenkopf', 'Niemeyer'] | Immunology | 2008 | 2008 Jan | 0 | AND pmc_gds | 1 | 0 | ||||
660 | GSE6783 | 3/1/2007 | ['6783'] | ['2623'] | [u'17322878'] | 3013804 | [u'21037261'] | ['Segal', 'Rechavi', 'Shay', 'Mills', 'Tarcic', 'Lu', 'Lahad', 'Vaisman', u'Ami', u'Yosef', 'Zhang', 'Amariglio', u'Eytan', 'Jacob-Hirsch', 'Domany', 'Citri', 'Yarden', 'Alon', u'Ido', 'Amit', 'Katz', 'Siwak', u'Tal'] | ['Bo-Kai', 'Chang', 'Lee', 'Huang'] | [] | Nucleic Acids Res | 2011 | 2011 Jan | 0 | onsists of 22 283 probe set for 12 678 genes, is used to explore the co-expression of kinase and substrate genes. Gene expression data, including Esophageal cell response to low pH (GSE2144), Lung cancer cell line response to motexafin gadolinium (GSE2189), Cyanobacterial metabolite apratoxin A cytotoxic effect on colon adenocarcinoma cells (GSE2742), Interleukin 13 effect on bro|in effect on leukocytes (GSE3284), Blood response to various beverages (GSE3846) Androgen receptor modulator effect (GSE4636), Glucocorticoid receptor activation effect on breast cancer cells (GSE4917) and Epidermal growth factor effect on cervical carcinoma cell line (GSE6783{{tag}}--REUSE--), were quantified by Robust Multichip Average (RMA) algorithm ( 43 ). RMA quantification was performed by the justRMA |e experimentally verified data on protein phosphorylation and protein–protein interaction will be updated quarterly. The time-coursed microarray expression data collected from Gene Expression Omnibus (GEO) will also be updated quarterly. The resource is now freely available at http://RegPhos.mbc.nctu.edu.tw . SUPPLEMENTARY DATA Supplementary Data are available at NAR Online. FUNDING National Sc | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
661 | GSE6783 | 3/1/2007 | ['6783'] | ['2623'] | [u'17322878'] | 2424294 | [u'18463614'] | ['Segal', 'Rechavi', 'Shay', 'Mills', 'Tarcic', 'Lu', 'Lahad', 'Vaisman', u'Ami', u'Yosef', 'Zhang', 'Amariglio', u'Eytan', 'Jacob-Hirsch', 'Domany', 'Citri', 'Yarden', 'Alon', u'Ido', 'Amit', 'Katz', 'Siwak', u'Tal'] | ['Westerhoff', 'Herzel', 'Legewie', 'Blüthgen'] | [] | Mol Syst Biol | 2008 | 2008 | 1 | Microarray data Microarray data were collected from the Gene Expression Omnibus database ( Barrett et al , 2005 ) using R and bioconductor. Data sets with the following accession numbers were used: GDS896, GSE6783{{tag}}--REUSE--, GSE6462 (MAPK signalling); GDS854, GDS855, GSE5232 (TGFβ signalling); GSE3737, GSE6783{{tag}}--REUSE--, GSE6462 (PI3K/AKT signalling); GDS323, GDS1036, GDS1365, GDS1489 (JAK/STAT signalling); GDS1 | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
662 | GSE6787 | 6/5/2007 | ['6787'] | ['2757'] | [u'17557897'] | 1976369 | [u'17557897'] | ['Dibling', 'Macleod', 'Spike'] | ['Dibling', 'Macleod', 'Spike'] | ['Dibling', 'Macleod', 'Spike'] | Blood | 2007 | 9/15/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
663 | GSE6791 | 5/15/2007 | ['6791'] | [] | [u'17510386'] | 2575803 | [u'18669583'] | ['Marsit', 'Kelsey', 'Sengupta', 'Lambert', 'Newton', 'Smith', 'Ahlquist', 'Pyeon', 'den', 'Connor', 'Turek', 'Woodworth', 'Haugen'] | ['Farwell', 'Schwartz', 'Upton', 'M\xc3\xa9ndez', 'Yueh', 'Zhao', 'Fan', 'Futran', 'Houck', 'Chen', 'Lohavanichbutr', 'Doody'] | [] | Cancer Epidemiol Biomarkers Prev | 2008 | 2008 Aug | 0 | se two models using an internal independent testing set of 48 invasive OSCC and 10 controls and an external testing set of 42 head and neck squamous cell carcinoma (HNSCC) cases and 14 controls (GEO GSE6791{{tag}}--REUSE-- ), with sensitivity and specificity above 95%. These two models were also able to distinguish dysplasia (n=17) from control (n=35) tissue. Differential expression of these four genes was confirmed |ults We conducted two rounds of QC checks to evaluate whether to include results from each of the GeneChips. In the first round, recommendations made by Affymetrix ( http://www.affymetrix.com/support/downloads/manuals/data_analysis_fundamentals_manual.pdf ) were followed. In the second round, we used the “affyQCReport” and “affyPLM” software in the Bioconductor package ( ht|controls ( 10 - 12 ). Validating Prediction Models We validated the selected prediction models with our own independent validation dataset and an external validation dataset from GEO (Gene Expression Omnibus, www.ncbi.nlm.nih.gov/geo , GSE6791{{tag}}--REUSE-- containing 42 HNSCC cases and 14 controls) ( 13 ). CEL files from these datasets were extracted using gcRMA algorithm. ROC curves were drawn by applying the exp|sults to the prediction models. Comparison of Gene Expression of the Prediction Models in Different Tissues to Test the Specificity of the Models for OSCC We downloaded gene expression data from GEO GSE6791{{tag}}--REUSE-- for normal and tumor cervical tissue samples and GSE6044 for normal and tumor lung samples. 
We chose these datasets because: 1) they were generated using the same Affymetrix U133 GeneChip platfo|for peptidyl arginine deimminase type 1) emerged as the next set of markers that best separated OSCC from controls (AUC=0.99976). Table 2 Validation of predictive models using internal and external ( GSE6791{{tag}}--REUSE-- ) testing datasets When we applied the expression values from the testing datasets to the predictive models derived from our training dataset, the model with LAMC2 (probe set 207517_at) and COL4| external testing set (GEO GSE6791{{tag}}--REUSE-- ), respectively ( Table 2 ). The model with COL1A1 and PADI1 also was strongly predictive (AUC=0.99167 in our testing set, and AUC=0.97789 in the external GEO GSE6791{{tag}}--REUSE-- data set ( Table 2 ). Results on the testing of the other eight models against the internal and external datasets indicate that they also performed well in distinguishing OSCC from controls ( Tabl|es with overlapping risk factors. For each of these two predictive models, we compared the scores for cases and controls calculated from our testing dataset to the scores from the GEO HNSCC dataset ( GSE6791{{tag}}--REUSE-- ) and from the GEO cervical cancer and lung cancer data sets ( GSE6044 ) and their controls. The model containing LAMC2 and COL4A1 distinguished HNSCC from controls, but not cervical cancer nor|(top) and model COL1A1 and PADI1 (bottom). Box Whisker plots of logistic regression scores (y axis) for normal controls and cases in our own testing set (N: normal, DYS: dysplasia, T: OSCC), GEO GSE6791{{tag}}--REUSE-- head (more ...) Figure 2 Tissue specificity of model LAMC2 and COL4A1 (top) and model COL1A1 and PADI1 (bottom). 
Box Whisker plots of logistic regression scores (y axis) for normal cont|s in our own testing set (N: normal, DYS: dysplasia, T: OSCC), GEO GSE6791{{tag}}--REUSE-- head and neck normal controls (HN N) and cases (HN T), GEO GSE6791{{tag}}--REUSE-- cervical normal controls (C N) and cases (C T), and GEO GSE6044 lung normal controls (L N), lung squamous cell carcinoma (L SCC), lung adenocarcinoma (L AD) and lung small cell cancer (L SC). Comparison of gene expressions of invasive cancer with those of norm | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
664 | GSE6791 | 5/15/2007 | ['6791'] | [] | [u'17510386'] | 2816644 | [u'20087356'] | ['Marsit', 'Kelsey', 'Sengupta', 'Lambert', 'Newton', 'Smith', 'Ahlquist', 'Pyeon', 'den', 'Connor', 'Turek', 'Woodworth', 'Haugen'] | ['West', 'Harris', 'Buffa', 'Miller'] | [] | Br J Cancer | 2010 | 1/19/2010 | 0 | Cox multivariate analysis that includes the other significant clinical covariates and the hazard ratio (HR) of the HS is calculated. Data sets, data processing and annotation NCBI Gene Expression Omnibus ( http://www.ncbi.nlm.nih.gov/geo/ ) was searched for gene expression studies in cancer, published in peer-reviewed journals, where microarray were performed on frozen material extracted before chemo|tween data sets. Much of this inter-experimental variation is likely to reflect differences in both the patient populations and the processing of the biological material. For example, both data sets GSE6791{{tag}}--REUSE-- and GSE3494 , which showed a lower level of enrichment for hypoxia genes than others, featured samples with the highest proportions of tumour cells selected either by microdissection or visual sc | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
665 | GSE6791 | 5/15/2007 | ['6791'] | [] | [u'17510386'] | 2858285 | [u'17510386'] | ['Marsit', 'Kelsey', 'Sengupta', 'Lambert', 'Newton', 'Smith', 'Ahlquist', 'Pyeon', 'den', 'Connor', 'Turek', 'Woodworth', 'Haugen'] | ['Marsit', 'Kelsey', 'Sengupta', 'Lambert', 'Newton', 'Smith', 'Ahlquist', 'Pyeon', 'den', 'Connor', 'Turek', 'Woodworth', 'Haugen'] | ['Marsit', 'Newton', 'Sengupta', 'Lambert', 'Kelsey', 'Smith', 'Ahlquist', 'Turek', 'den', 'Connor', 'Pyeon', 'Woodworth', 'Haugen'] | Cancer Res | 2007 | 5/15/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
666 | GSE6794 | 1/23/2007 | ['6794'] | [] | [u'17276354'] | 2765082 | [u'19689276'] | ['Pissios', 'Flier', 'Friedman', 'Cheung', 'Finkel', 'Chen', 'Ohtsubo', 'Tzameli', 'Gavrilova', 'Rovira'] | ['Hackl', 'Bornstein', 'Prokesch', 'Trajanoski', 'Hakim-Weber'] | [] | Curr Med Chem | 2009 | 2009 | 0 | vant for the function of mature adipocytes. Fig. (2) Gene expression profiles of selected candidates from 3T3-L1 time series experiments. 1 Raw data were downloaded from Gene Expression Omnibus (GEO) GSE6794{{tag}}--REUSE--, normalized using GCRMA and annotated with recent annotation files for Mu11kA and Mu11kB downloaded from the Affymetrix website. Log2ratios are visualized for each time point in relation to the pre | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
667 | GSE6798 | 10/20/2007 | ['6798'] | ['3104'] | [u'17563058'] | 2620272 | [u'19014681'] | ['Glintborg', 'Kruse', 'Knudsen', 'Brusgaard', 'H\xc3\xb8jlund', u'H\xf8jlund', 'Beck-Nielsen', 'Skov', 'Tan', 'Jensen'] | ['Tozeren', 'Gormley'] | [] | BMC Bioinformatics | 2008 | 11/17/2008 | 1 | or therapeutic targets in tissue-specific diseases. Table 1 Microarray datasets used in this study Tissue Phenotype Data Tissue No. of Samples Gene Expression Omnibus/Array Express Accn. # Adipose 10 GSE3526 Adrenal 20 GSE3526, GSE8514, GSE2316 Brain 89 GSE3526, GSE7621, GSE7307, GSE2361, E_AFMX-11, E-TABM-20, Colon 10 E-TABM-176, GSE8671, GSE9254, GSE9452 Epidermal 25 GSE1133, GSE2361, GSE3419, GSE352|38 E_AFMX-11, E-MIMR-27, GSE1133, GSE2240, GSE2361, GSE3526, GSE3585, GSE7307 Kidney 10 E_AFMX-11, GSE2004, GSE2361, GSE3526, GSE7392 Liver 10 E_AFMX-11, GSE2004, GSE3526, GSE6764 Lung 26 E-MEXP-231, GSE10072, GSE1133, GSE2361, GSE3526 Mammary 15 E-TABM-66, GSE2361, GSE3526, GSE7307, GSE7904 Muscle 64 GSE10760, GSE2328, GSE3526, GSE5110, GSE6798{{tag}}--REUSE--, GSE7307, GSE9103, Ovary 10 GSE2361, GSE3526, GSE6008, GS|ancreas 6 GSE1133, GSE2361, GSE7307 Peripheral blood 12 GSE7462, GSE8608, GSE8668, GSE8762, GSE9692 Small intestine 7 GSE2361, GSE7307 Spleen 12 GSE2004, GSE2361, GSE3526, GSE7307 Stomach 10 GSE2361, GSE3526, GSE7307 Testis 38 E_AFMX-11, GSE1133, GSE2361, GSE3218, GSE3526, GSE7307, GSE7808 Thymus 5 GSE1133, GSE2361, GSE7307 Infectious Disease Disease No. of Samples Gene Expression Omnibus/Array Express | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
668 | GSE6799 | 12/7/2007 | ['6799'] | [] | [] | 2806043 | [u'20157543'] | [u'Riethman', u'Wright', u'Wei', u'Shay', u'Lou'] | ['Wei', 'Shay', 'Wright', 'Baur', 'Lou', 'Riethman', 'Voglauer'] | [u'Riethman', u'Wright', u'Wei', u'Shay', u'Lou'] | Aging (Albany NY) | 2009 | 7/17/2009 | 0 | To examine the correlation of gene expression and telomere shortening, we used a "Telo-Chip", a customized microarray containing 1,323 potential subtelomeric genes (within 1,000 kilobase pairs from the telomeres) representing all 92 telomere ends. The Telo-Chip also contained 92 random control genes, 12 housekeeping genes and 198 other genes (GEO Datasets, GSE6799{{tag}}--DEPOSIT--). | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
669 | GSE6800 | 9/28/2007 | ['6800'] | ['3105'] | [u'17880733'] | 2782370 | [u'19117983'] | ['Pusch', 'Wolfl', 'Gaube', 'Hamburger', 'Kroll'] | ['Steffen', 'Hilsenbeck', 'Ochsner', 'Chen', 'McKenna', 'Watkins'] | [] | Cancer Res | 2009 | 1/1/2009 | 0 | t datasets at either time point. Moreover, relaxing the q -value cut -off to 0.2 resulted in only a modest increase in the number of genes in this intersection (data not shown here but available for download from the GEMS website). This initial result indicated that given the extent in variation across the datasets, traditional Venn analysis would be of limited use in arriving at a consensus gene express| Table 1 Studies selected for meta-analysis. Supplementary Material body Supplementary Click here to view. (4.1M, zip) Acknowledgments We thank the principal investigators who made their datasets publicly available. This work was supported by NIDDK NURSA U19 DK62434. | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
670 | GSE6800 | 9/28/2007 | ['6800'] | ['3105'] | [u'17880733'] | 2194763 | [u'17880733'] | ['Pusch', 'Wolfl', 'Gaube', 'Hamburger', 'Kroll'] | ['Pusch', 'Wolfl', 'Gaube', 'Hamburger', 'Kroll'] | ['Pusch', 'Wolfl', 'Gaube', 'Hamburger', 'Kroll'] | BMC Pharmacol | 2007 | 9/20/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
671 | GSE6802 | 2/27/2007 | ['6802'] | ['2606'] | [u'17312161'] | 2644708 | [u'19055840'] | ['Lang', 'Mayer', 'Muehmer', 'Dalpke', 'Gueinzius', 'Heeg', 'Mages', 'Hess', 'Bals'] | ['Bagby', 'Sears', 'Kulesz-Martin', 'Pelz'] | [] | BMC Bioinformatics | 2008 | 12/4/2008 | 0 | catter plot to compare pairs of samples, we expect to see most transcripts centered along the diagonal line. When this is not the case, further normalization may be required. We have examined over 30 publicly available datasets, and found many to contain samples with systematic non-linear distortions apparent in their scatter plots. In this report, we will consider a variety of datasets demonstrating vari|ence of 5000 to 10000, 10000 to 15000, and 15000 to 20000. Top Row – GB dataset comparing Fanconi vs. Normal using MAS 5.0 processed data on left and RMA processed data on the right. On left, GSE6802{{tag}}--REUSE-- dataset comparing R vs. C, using RMA. On right, 339RS dataset comparing TA vs. C using RMA. Next, we look at the effect of different GRiS sizes on the detection of statistically significant genes. |toff applied to the respective summary and normalization methods. B-C) Color coding is modified so that blue genes are shown only in the left panel and red genes are shown only in the right panel. B) GSE6475 data comparing 6 AL (acne lesion) replicates to 6 AN (acne normal) replicates. Samples are plotted as in A. C) SS data comparing 6 mutant samples to 6 wild type (3 male and 3 female for each condit | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
672 | GSE6802 | 2/27/2007 | ['6802'] | ['2606'] | [u'17312161'] | 2958743 | [u'20798171'] | ['Lang', 'Mayer', 'Muehmer', 'Dalpke', 'Gueinzius', 'Heeg', 'Mages', 'Hess', 'Bals'] | ['Sama', 'Huynen'] | [] | Bioinformatics | 2010 | 11/1/2010 | 0 | DS 2.1 General PPI networks The PPI network used were built from an accumulation of human-curated PPIs obtained from the Biomolecular Interaction Network Database (BIND; Bader et al. , 2003 ) (data downloaded in October 2006), the HPRD (Peri et al. , 2003 ) (data of release 6 of January 2007), the IntAct database (Kerrien et al. , 2007 ) (downloaded in May 2007), the Molecular Interactions Database |actions between human proteins as HsapiensPPI. Furthermore, interologous PPIs were built using the orthologues datasets from the Ensembl genome browser (Hubbard et al. , 2007 ) (Ensembl release 44, downloaded on May 2007). These were combined with the HsapiensPPI dataset. We refer to this comprehensive dataset as AllspeciesPPI. The HsapiensPPI contains 53 807 interactions between 10 826 proteins. The Al|. Unless otherwise stated, the HsapiensPPI is used in this article as the general PPI network. 2.2 Disease and immune-related data All human disease genes were obtained from the Morbid Omim database (downloaded February 10, 2009 from ftp://ftp.ncbi.nih.gov/repository/OMIM/morbidmap ) (Sayers et al. , 2009 ). The HMPV infection data was obtained from (Bao et al. , 2008 ), as deposited in the NCBI Gene| epithelial cell treatment with the cytokine INFG (GSE1815) (Pawliczak et al. , 2005 ). Expression data of bronchial epithelial cells infected with respiratory pathogens like Chlamydia pneumonia (GSE7246) (Alvesalo et al. , 2008 ) and UV-irradiated airway-pathogens (for P.aeruginosa and RSV; GSE6802{{tag}}--REUSE--; Mayer et al. , 2007 ). A geometric average of all probes for a gene was used to represent the | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
673 | GSE6806 | 3/16/2007 | ['6806'] | [] | [u'17369397'] | 1820938 | [u'17369397'] | ["O'Carroll", u'Fuchou', 'Lao', 'Lee', 'Tang', 'Sun', 'Barton', 'Tarakhovsky', u'Masahiro', 'Surani', 'Kaneda', 'Hajkova'] | ["O'Carroll", 'Lao', 'Lee', 'Tang', 'Sun', 'Barton', 'Tarakhovsky', 'Surani', 'Kaneda', 'Hajkova'] | ["O'Carroll", 'Lao', 'Lee', 'Tang', 'Sun', 'Barton', 'Tarakhovsky', 'Surani', 'Kaneda', 'Hajkova'] | Genes Dev | 2007 | 3/15/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
674 | GSE6822 | 12/31/2007 | ['6822'] | [] | [] | 2679144 | [u'19440550'] | [u'Martinu', u'Le', u'Mes-Masson', u'Filali-Mouhim', u'Novak', u'Hudson', u'Tonin', u'Ponton', u'Provencher', u'Bachvarov', u'Ouellet'] | ['B\xc3\xb8rresen-Dale', 'Helland', 'Mills', 'Hennessy', 'Lahad', 'Liu', 'Lu', 'Schaner', 'Kristensen', 'Murph', 'Yu', 'Hall'] | [] | PLoS One | 2009 | 2009 | 1 | reating average linkage. The display of hierarchical clustering graphs utilized TreeView [23] . For hierarchical clusters, publicly-available ovarian cancer gene expression datasets (GSE6822{{tag}}--REUSE-- [24] , GSE6008 [25] , GSE10971 and GSE12418 [26] ) were downloaded from the NCBI Entrez GEO website ( http://www.ncbi.nlm.nih.gov/sites/entrez?db=| data suggested LPA influences the development of serous EOC and/or the 39-gene signature characterizes serous EOC. To test these hypotheses, we analyzed additional datasets including a test dataset (GSE6008; N = 103) [24] containing normal ovary specimens and ovarian tumors from endometrioid, mucinous, serous and clear cell types. Hierarchical clustering analy| of transcripts contained in the LPA gene signature were clustered and average linkages were calculated using the Cluster software. Results were visualized using the TreeView program (see methods ). GSE6008, N = 103. The strongly-positive LPA cluster, N = 19, (mostly red) is seen in the bracket on the far right corresponding to serous tumors. Statistical|#x0002a;*P<0.01 and ***P<0.001. To further verify that 39-gene signature characterizes serous EOC, we examined additional ovarian datasets. The dataset GSE6822{{tag}}--REUSE-- (N = 74) [25] contains a majority of serous specimens (N = 46, 62%) and includes benign, borderline and invasive tumors. It|umors of undetermined origin. Over half of all serous tumor cells are invasive (malignant) and the remaining are borderline (low malignant potential) or benign [14] . 
Another dataset GSE10971 (N = 37) contains samples from non-malignant fallopian tube epithelium and high-grade serous carcinoma. Hierarchical clustering divided the samples into two groups (data no|igure S2B ). Only one sample from the latter group was high-grade serous (N = 1, 4%). In summary, the data from our training set combined with the data presented using GSE6822{{tag}}--REUSE--, GSE10971 and GSE6008 datasets suggests that the LPA-signature characterizes serous EOC. Since evidence suggests the 39-gene signature characterizes serous EOC from ovarian datasets ( Figure 1 , F| and data not shown), we questioned whether it also classified prognosis in ovarian cancers. For this analysis, we acquired two ovarian cancer datasets containing patient outcomes. The first dataset (GSE12418; N = 54) [26] was used to examined the predictive value of the 39-gene signature and it contains serous samples from different stages. Hierarchical cluste|00a;= 16, 53%) ( Figure 2C ). 10.1371/journal.pone.0005583.g002 Figure 2 The LPA signature corresponds to worsened outcome in ovarian cancer patients. (A) The patient dataset, GSE12418, N = 54, was downloaded from the NCBI Entrez GEO DataSets website and analyzed for gene expression changes among all included transcripts in the 39-gene signature. Hierarch|he microarray chip. Among these drivers, we sought to determine whether a singular transcript is responsible for LPA-positive signature characterization. Using the most complete dataset we acquired, (GSE6008; N = 103) [24] , we analyzed the drivers and determined whether classification of LPA-positive serous tumors could be achieved using only CLDN1 , CYR61 , |ed levels in serous (P<0.001) ( Figure 3H ). 10.1371/journal.pone.0005583.g003 Figure 3 CLDN1 is a biomarker for serous epithelial ovarian carcinoma specimens. 
Box-plot analyses of data from GSE6008, N = 103 show the drivers THBS1 (A, B), TNC (C), IL-8 (D), MUC1 (E, F), CYR61 (G), CLDN1 (H), KRT23 (I) and FN1 (J, K, L) normalized comparison of gene expre|inent categories. (1.60 MB TIF) Click here for additional data file. Figure S2 Various ovarian cancer datasets demonstrate that the 39-gene signature characterizes serous EOC. (A) The patient dataset GSE6822{{tag}} (N = 74) [25] was examined with the genes available contained in the 39-gene signature and hierarchical clustering separated a group (N =&#|um carcinoma. (1.79 MB TIF) Click here for additional data file. Figure S3 CLDN-1 and cell adhesion-related proteins drive the clustering of the LPA transcriptomic signature. (A) The patient dataset, GSE6008, N = 103, was downloaded from the NCBI Entrez GEO DataSets website and analyzed using a previously-identified set of drivers, CLDN1, CYR61, FN1, IL-8, MUC1, THBS1, and TNC. | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
675 | GSE6836 | 1/22/2007 | ['6836'] | [] | [u'17214507'] | 2875035 | [u'20097653'] | ['Faith', 'Hayete', 'Wierzbowski', 'Cottarel', 'Gardner', 'Mogno', 'Collins', 'Kasif', 'Thaden'] | ['Ruppin', 'Sharan', 'Shlomi', 'Tuller', 'Waldman'] | [] | Nucleic Acids Res | 2010 | 2010 May | 0 | e). GE data All expression data was downloaded from Gene Expression Omnibus ( 34 ) ( http://www.ncbi.nlm.nih.gov/geo/ ). Human tissues (including fetal tissues): we used the GE of Su et al. ( 35 ) (GDS596). As the original data set is redundant (i.e. it includes similar tissues; for example, more than 20 of the tissues are from different parts of the brain) we focused our analysis on 30 (out of 79) n|ssues ( Supplementary Table S2 ). Other GE sets: fetal and adult circulating blood reticulocytes (GDS2655), Mouse tissues (GDS592), Mouse fetal and adult liver (GSE13149), Mouse embryonic stem cells (GDS2666), Yeast (GDS772, wild type), Chimpanzee (GSE7540), Rat (GDS589, three strains), E. coli (GSE6836{{tag}}--REUSE--), D. melanogaster (GSE7763) and C. elegans (GSE8004). We averaged technical repeats and probes | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
676 | GSE6838 | 2/1/2007 | ['6838'] | [] | [u'17242205'] | 3019229 | [u'21194446'] | ['Lim', 'Schelter', 'Jackson', 'Raymond', 'Johnson', 'Kibukawa', 'Cummins', 'Chau', 'Dai', 'Cleary', 'Linsley', 'Martin', 'Bartz', 'Carleton', 'Burchard'] | ['S\xc3\xa6trom', 'Saito'] | [] | BMC Bioinformatics | 2010 | 12/31/2010 | 0 | We downloaded the Jackson [25], Lim [6], Grimson [5], and Linsley [21] datasets from the Gene Expression Omnibus (GEO) database [GEO:GSE5814, GEO:GSE2075, GEO:GSE8501, GEO:GSE6838{{key}}--REUSE--] [38] and the Birmingham [24] dataset from the ArrayExpress database [ArrayExpress:E-MEXP-668] [39]. | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
677 | GSE6838 | 2/1/2007 | ['6838'] | [] | [u'17242205'] | 2764428 | [u'19671526'] | ['Lim', 'Schelter', 'Jackson', 'Raymond', 'Johnson', 'Kibukawa', 'Cummins', 'Chau', 'Dai', 'Cleary', 'Linsley', 'Martin', 'Bartz', 'Carleton', 'Burchard'] | ['Xie', 'Tu', 'Li', 'Liu', 'Yu', 'Hua'] | [] | Nucleic Acids Res | 2009 | 2009 Oct | 1 | es the log 2 -transformed ratio of gene expression between treatment (after transfection) and control (before transfection). ( B ) mRNA level changes of miR-124’s putative non-target genes in GDS2657. Putative non-target genes are the whole set of genes found in GDS2657 minus the putative target genes. ( C ) Numbers of up- (up panel) and down-regulated (below panel) genes at different time poin|. The final TF–gene set included 130 338 relationships between 214 human TFs and 16 534 targets. For more information, see Supplementary Table 3 . MPGE datasets Five groups of MPGE datasets (GDS1858, GDS2657, GSE6474, GSE6838{{tag}}--REUSE-- and GSE7864) were downloaded from the GEO database; these groups include 53 individual datasets involving 19 miRNAs. The GDS1858 dataset group ( 2 ) includes data on HeLa|ansfection with wild type or mutant miR-1, miR-124, or miR-373. GDS2657 ( 13 ) includes gene expression profiles at seven time points (4, 8, 16, 24, 32, 72 and 120 h) after overexpression of miR-124. GSE6474 ( 14 ) includes four replicated measurements of gene expression changes after overexpression of let-7a. 
GSE6838{{tag}}--REUSE-- ( 15 ) includes gene expression data over a time course (6, 10, 14 and 24 h) after ov|mediators of miRNA-triggered regulation, summarized for each of 53 MPGE datasets Dataset group miRNA Cell line Time point K–S test P -value De-graded targets TF mediators Shuffling P -value GDS1858 miR-1 HeLa 12 1.2 e –15 91 ETS1, CREB1, YY1 0 24 4.7 e –15 107 TFAP2A, CREB1, YY1, SREBF1 0 miR-124 12 1.8 e –12 109 GLI3* 0.04 24 6.5 e –22 132 MLLT7, NKX6.1 0.03 m|02013;73 366 AHR*, RREB1 0 32 6.2 e –64 329 AHR*, SP1*, EGR1, RELA*, RREB1, NR3C1*, SP2 0 72 2.2 e –59 292 CREB1, SP1, ETS1, MLLT7, SP2 0 120 1.1 e –19 144 AHR*, SP1*, MLLT7 0 GSE6474 let-7a3 A549 Not known 1.1 e –2 1 PAX3, HOXA1, BACH2, EGR3, MYC 0.02 GSE6838{{tag}}--REUSE-- let-7c HCT116 Dicer−/− #2 24 1.5 e –54 211 MYC 0.05 miR-103 10 5.7 e –08 82 MEF2|– miR-195 10 4.3 e –30 137 SMAD7, NFATC3 0.05 24 7.4 e –31 98 FOXC1 0.08 miR-20 24 9.3 e –30 59 – – miR-215 24 2.6 e –11 38 – – GSE7864 miR-34a A549 H-1 term 24 1.0 e –29 112 E2F5*, YY1 0.03 HCT116 Dicer −/− #2 24 1.1 e –29 132 E2F3, YY1, NFE2L1 0.02 TOV21G H1-term 24 1.4 e –23 70 E2F5, BACH2|al or greater BIC score than that of the regressed Equation ( 2 ). Many of our predictions are supported by independent experimental studies. For example, our analyses of two MPGE datasets for miR-1 (GDS1858) predicted 130 primary targets; 50 (38.4%) of these targets appeared in TarBase ( 19 ), a database collecting experimentally validated miRNA targets. Some miRNAs, like let-7c, miR-16 and miR-17-5p,|ly (Please find these plots in the ‘wrapped results’ available at http://www.biosino.org/kanghu/~DCR/ supplementary file1.zip). With the exception of let-7a-3 in the A549 cell line (GSE6474), all miRNAs were found to have degradation-inducing ability in all surveyed situations, as the K–S test P -values were exclusively <0.001 ( Table 1 ). 
Some miRNAs, such as miR-124|ry targets account for a significant proportion of the observed mRNA level changes in an MPGE dataset ( Figure 1 C). For example, at the 32-h time point after miR-124 overexpression (dataset from the GDS2657 group), miRNA’s direct regulation could explain decreased MCs of only 181 genes; with our predicted two-layer regulatory model, the decreased MCs of an additional 98 genes and increased MCs| of another 42 genes were attributed to miRNA’s indirect regulation, raising the proportion of explainable MCs from 27.8 to 47.7%. The classifications of regulated genes at all time-points in GDS2657 are shown in Supplementary Table 8 , where a general trend is evident that the direct regulation decreases rapidly while the secondary regulation maintains at a considerable multitude, resulting i|tered on miRNA and mediated by TFs Figure 3 depicts a typical two-layer regulatory network, mined from an MPGE dataset measured at the 12-h time point after overexpression of miR-1 (dataset from the GDS1858 group). In addition to directly down-regulating 91 degraded targets (blue arrows), miR-1 overexpression causes expression changes in more than 100 non-target genes, possibly through translationally|O:005114, GO:000752; all FDRs < 0.29), consistent with the known fact that miR-1 is expressed selectively in heart and skeletal muscle. Similar analyses were performed on the miR-124 network (GDS2657, 32 h), resulting in the identification of 129 significant biological themes (FDR < 0.25), among which neuron apoptosis (GO:0051402) is in accordance with miR-124’s proven role in d | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
678 | GSE6838 | 2/1/2007 | ['6838'] | [] | [u'17242205'] | 2909566 | [u'20508147'] | ['Lim', 'Schelter', 'Jackson', 'Raymond', 'Johnson', 'Kibukawa', 'Cummins', 'Chau', 'Dai', 'Cleary', 'Linsley', 'Martin', 'Bartz', 'Carleton', 'Burchard'] | ['Krogh', 'Marks', 'Jacobsen', 'Wen'] | [] | Genome Res | 2010 | 2010 Aug | 0 | utic small RNAs. Methods Experimental and sequence data Microarray expression data sets were obtained from the NCBI Gene Expression Omnibus: 11 different miRNA transfections in HeLa cells measured 24 h after transfection (accession nos. GSE2075 and GSE8501 ) ( Lim et al. 2005 ; Grimson et al. 2007 ), miR-124 transfection time-series | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
679 | GSE6838 | 2/1/2007 | ['6838'] | [] | [u'17242205'] | 2945792 | [u'20799968'] | ['Lim', 'Schelter', 'Jackson', 'Raymond', 'Johnson', 'Kibukawa', 'Cummins', 'Chau', 'Dai', 'Cleary', 'Linsley', 'Martin', 'Bartz', 'Carleton', 'Burchard'] | ['Agius', 'Sander', 'Koppal', 'Leslie', 'Betel'] | [] | Genome Biol | 2010 | 2010 | 0 | ated targets are missed. We note that this effect is not restricted to our particular choice of conservation measure or even to the mirSVR scoring system. We repeated the analysis with context scores downloaded from TargetScan and using their associated conservation scores ( P CT ) [ 26 ] and similarly found no improvement in detection rates of the most downregulated targets with increased P CT thresho|ing of microRNA regulation in a physiological context. Materials and methods Training and test data sets Training data The mRNA expression training data was taken from the Grimson et al. [ 8 ] [GEO:GSE8501] data set, containing expression arrays from HeLa cells transfected by miR-122a, miR-128a, miR-132, miR-133a, miR-142, miR-148b, miR-181a, miR-7, miR-9. Although mRNA expression was measured at 12 |on change after transfection was positive. Test data of microRNA transfection with mRNA expression measurements The mRNA expression test data set was taken from the Linsley et al. study [ 21 ] [GEO:GSE6838{{tag}}--REUSE--], which comprised expression data from let-7c, miR-103, miR-106b, miR-141, miR-15a, miR-16, miR-17-5p, miR-192, miR-20, miR-200a, and miR-215 transfection experiments (all measured after 24 h), and|-16, hsa-miR-30e-5p, hsa-miR-19b, hsa-miR-32, hsa-miR-20a and hsa-miR-21) and searched for their target sites in genes that are enriched in the AGO1-4 IPs. Microarray data from the IP experiments was downloaded from [ 36 ] and normalized using the GCRMA R package; log enrichment values were computed using the limma package. CLIP data Data was provided by private communication from the authors. 
Non-can|ing a weighted dynamic programming approach where matches in the seed regions have higher position-specific weights, resulting in alignments that strongly favor 5' base-pairing. 3' UTR sequences were downloaded from UCSC genome browser, with the longest UTR chosen from alternative isoforms. "Canonical target" sites are defined as sites that contain minimally a 6-mer perfect match at positions 2 to 7 of | for Selbach et al. test set, and IP enrichment for Landthaler et al. test set) were Z -score transformed. Context score and PITA scores Context scores values were computed using the source code downloaded from [ 39 ] that implements the regression model described in Grimson et al. 2007. Briefly, context score is composed of three regression values, which are specific to each seed class, that model|r, T1A 7-mer, m8 7-mer and 8-mer; the context score is computed as the sum of the three regression values specific to the seed class. The computed context scores are highly correlated with the scores downloaded from TargetScan release 5.0 (0.96 average Pearson correlation, see Additional file 1 , Table S1). PITA scores were computed with code downloaded from [ 40 ] using default parameters. Target sites | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
680 | GSE6838 | 2/1/2007 | ['6838'] | [] | [u'17242205'] | 2775596 | [u'19767416'] | ['Lim', 'Schelter', 'Jackson', 'Raymond', 'Johnson', 'Kibukawa', 'Cummins', 'Chau', 'Dai', 'Cleary', 'Linsley', 'Martin', 'Bartz', 'Carleton', 'Burchard'] | ['Zavolan', 'Hausser', 'Landthaler', 'Jaskiewicz', 'Gaidatzis'] | [] | Genome Res | 2009 | 2009 Nov | 0 | AND pmc_gds | 0 | 1 | ||||
681 | GSE6839 | 3/22/2007 | ['6839'] | ['2619'] | [u'17318176'] | 1864965 | [u'17318176'] | ['Johansson', 'Bernhardsson', 'Stenberg', 'Larsson'] | ['Johansson', 'Bernhardsson', 'Stenberg', 'Larsson'] | ['Johansson', 'Bernhardsson', 'Stenberg', 'Larsson'] | EMBO J | 2007 | 5/2/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
682 | GSE6841 | 1/31/2007 | ['6841'] | [] | [] | 2238769 | [u'18045468'] | [''] | ['Ralli\xc3\xa8re', 'Esquerr\xc3\xa9', 'Rescan', 'Montfort', 'Le', 'Hugot'] | [] | BMC Genomics | 2007 | 11/28/2007 | 0 | then refed for 4, 7, 11 and 36 days. At each time point, eight to nine fish were sampled giving in total 43 separate complex cDNA targets that were hybridized to 43 microarrays (GEO accession number: GSE6841{{tag}}--REUSE--). Unsupervised hierarchical clustering of gene expression patterns from all samples produced a consistent grouping of the samples according to the fish feeding conditions (i.e. fasting and 4, 7, 11|STER software and the results were visualized by TREEVIEW [ 11 ]. Data mining Rainbow trout sequences originating from INRA Agenae [ 35 ] and USDA [ 36 ] EST sequencing programs were used to generate publicly available contigs [ 38 ]. The 8th version (Om.8, released January 2006) was used for Blast X comparison against the Swiss-Prot database (January 2006) [ 39 ]. The score of each alignment was retrieve | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
683 | GSE6843 | 3/20/2007 | ['6843'] | [] | [u'20357053', u'17352797'] | 2375040 | [u'17900367'] | ['Van', 'Wang', 'Kampf', 'Band', 'Itoh', 'Lusis', 'Arnold', 'Clayton', 'Yang', 'Kim', 'Yehya', 'Replogle', 'Wade', 'Melamed', 'Schadt'] | ['Melamed', 'Arnold'] | ['Melamed', 'Arnold'] | Genome Biol | 2007 | 2007 | 0 | probe sets consisted of 9,692 for brain (465 Z, 9,227 autosomal), 8,737 for liver (415 Z, 8,322 autosomal), and 9,119 for heart (444 Z, 8,675 autosomal). Array data are available from Gene Expression Omnibus [ 41 ] (accession numbers GSE6843{{tag}}--DEPOSIT--, GSE6844, GSE6856). Gene positions on the Z chromosome were based on release 2.1 of the chicken genome [ 42 ]. Expression data analysis Statistical analyses were per|ia elegans ) and ostrich ( Struthio camelus ) and the process of sex chromosome differentiation in palaeognathous birds. Chromosoma 2007 116 159 173 17219176 10.1007/s00412-006-0088-y Gene Expression Omnibus Ensembl The R Project for Statistical Computing Resampling Stats Dennis G Jr Sherman BT Hosack DA Yang J Gao W Lane HC Lempicki RA DAVID: Database for Annotation, Visualization, and Integrated Discov | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
684 | GSE6843 | 3/20/2007 | ['6843'] | [] | [u'20357053', u'17352797'] | 2373894 | [u'17352797'] | ['Van', 'Wang', 'Kampf', 'Band', 'Itoh', 'Lusis', 'Arnold', 'Clayton', 'Yang', 'Kim', 'Yehya', 'Replogle', 'Wade', 'Melamed', 'Schadt'] | ['Van', 'Wang', 'Kampf', 'Clayton', 'Itoh', 'Lusis', 'Arnold', 'Band', 'Yang', 'Replogle', 'Yehya', 'Melamed', 'Schadt'] | ['Van', 'Wang', 'Kampf', 'Clayton', 'Itoh', 'Lusis', 'Arnold', 'Band', 'Yang', 'Replogle', 'Yehya', 'Melamed', 'Schadt'] | J Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
685 | GSE6843 | 3/20/2007 | ['6843'] | [] | [u'20357053', u'17352797'] | 2847754 | [u'20357053'] | ['Van', 'Wang', 'Kampf', 'Band', 'Itoh', 'Lusis', 'Arnold', 'Clayton', 'Yang', 'Kim', 'Yehya', 'Replogle', 'Wade', 'Melamed', 'Schadt'] | ['Itoh', 'Arnold', 'Clayton', 'Kim', 'Replogle', 'Wade'] | ['Itoh', 'Arnold', 'Clayton', 'Kim', 'Replogle', 'Wade'] | Genome Res | 2010 | 2010 Apr | 0 | AND pmc_gds | 1 | 0 | ||||
686 | GSE6844 | 3/20/2007 | ['6844'] | [] | [u'20357053', u'17352797'] | 2375040 | [u'17900367'] | ['Van', 'Wang', 'Kampf', 'Band', 'Itoh', 'Lusis', 'Arnold', 'Clayton', 'Yang', 'Kim', 'Yehya', 'Replogle', 'Wade', 'Melamed', 'Schadt'] | ['Melamed', 'Arnold'] | ['Melamed', 'Arnold'] | Genome Biol | 2007 | 2007 | 0 | probe sets consisted of 9,692 for brain (465 Z, 9,227 autosomal), 8,737 for liver (415 Z, 8,322 autosomal), and 9,119 for heart (444 Z, 8,675 autosomal). Array data are available from Gene Expression Omnibus [ 41 ] (accession numbers GSE6843, GSE6844{{tag}}--DEPOSIT--, GSE6856). Gene positions on the Z chromosome were based on release 2.1 of the chicken genome [ 42 ]. Expression data analysis Statistical analyses were per|ia elegans ) and ostrich ( Struthio camelus ) and the process of sex chromosome differentiation in palaeognathous birds. Chromosoma 2007 116 159 173 17219176 10.1007/s00412-006-0088-y Gene Expression Omnibus Ensembl The R Project for Statistical Computing Resampling Stats Dennis G Jr Sherman BT Hosack DA Yang J Gao W Lane HC Lempicki RA DAVID: Database for Annotation, Visualization, and Integrated Discov | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
687 | GSE6844 | 3/20/2007 | ['6844'] | [] | [u'20357053', u'17352797'] | 2373894 | [u'17352797'] | ['Van', 'Wang', 'Kampf', 'Band', 'Itoh', 'Lusis', 'Arnold', 'Clayton', 'Yang', 'Kim', 'Yehya', 'Replogle', 'Wade', 'Melamed', 'Schadt'] | ['Van', 'Wang', 'Kampf', 'Clayton', 'Itoh', 'Lusis', 'Arnold', 'Band', 'Yang', 'Replogle', 'Yehya', 'Melamed', 'Schadt'] | ['Van', 'Wang', 'Kampf', 'Clayton', 'Itoh', 'Lusis', 'Arnold', 'Band', 'Yang', 'Replogle', 'Yehya', 'Melamed', 'Schadt'] | J Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
688 | GSE6844 | 3/20/2007 | ['6844'] | [] | [u'20357053', u'17352797'] | 2847754 | [u'20357053'] | ['Van', 'Wang', 'Kampf', 'Band', 'Itoh', 'Lusis', 'Arnold', 'Clayton', 'Yang', 'Kim', 'Yehya', 'Replogle', 'Wade', 'Melamed', 'Schadt'] | ['Itoh', 'Arnold', 'Clayton', 'Kim', 'Replogle', 'Wade'] | ['Itoh', 'Arnold', 'Clayton', 'Kim', 'Replogle', 'Wade'] | Genome Res | 2010 | 2010 Apr | 0 | AND pmc_gds | 1 | 0 | ||||
689 | GSE6846 | 8/24/2007 | ['6846'] | ['3106'] | [u'18070348'] | 2234434 | [u'18070348'] | ['Matsuzaki', 'Ikegami', 'Wang', 'Whitsett', 'Xu'] | ['Matsuzaki', 'Ikegami', 'Wang', 'Whitsett', 'Xu'] | ['Matsuzaki', 'Ikegami', 'Wang', 'Whitsett', 'Xu'] | BMC Genomics | 2007 | 12/10/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
690 | GSE6847 | 10/10/2007 | ['6847'] | [] | [u'17660562'] | 2013730 | [u'17660562'] | ['Jobin-Robitaille', 'Genereaux', 'Brandl', 'Hoke', 'Hannam', 'Haniford', 'MacKenzie', u'Cote', 'Andrews', 'Guzzo', 'C\xc3\xb4t\xc3\xa9', u'Abrassart', 'Mutiu'] | ['Jobin-Robitaille', 'Genereaux', 'Brandl', 'Hoke', 'Hannam', 'Haniford', 'MacKenzie', 'Andrews', 'Guzzo', 'C\xc3\xb4t\xc3\xa9', 'Mutiu'] | ['Jobin-Robitaille', 'Genereaux', 'Brandl', 'Hoke', 'Hannam', 'Haniford', 'MacKenzie', 'Andrews', 'Guzzo', 'C\xc3\xb4t\xc3\xa9', 'Mutiu'] | Genetics | 2007 | 2007 Sep | 0 | AND pmc_gds | 1 | 0 | ||||
691 | GSE6850 | 6/15/2007 | ['6850'] | ['2766'] | [u'17456467'] | 2745720 | [u'17456467'] | ['Drosatos', 'Kardassis', 'Kypreos', 'Sanoudou', 'Zannis'] | ['Drosatos', 'Kardassis', 'Kypreos', 'Sanoudou', 'Zannis'] | ['Drosatos', 'Kardassis', 'Kypreos', 'Sanoudou', 'Zannis'] | J Biol Chem | 2007 | 7/6/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
692 | GSE6855 | 6/26/2007 | ['6855'] | [] | [u'17506877'] | 1906782 | [u'17506877'] | ['', 'Taliercio', 'Boykin'] | ['Taliercio', 'Boykin'] | ['Taliercio', 'Boykin'] | BMC Plant Biol | 2007 | 5/16/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
693 | GSE6856 | 3/20/2007 | ['6856'] | [] | [u'20357053', u'17352797'] | 2375040 | [u'17900367'] | ['Van', 'Wang', 'Kampf', 'Band', 'Itoh', 'Lusis', 'Arnold', 'Clayton', 'Yang', 'Kim', 'Yehya', 'Replogle', 'Wade', 'Melamed', 'Schadt'] | ['Melamed', 'Arnold'] | ['Melamed', 'Arnold'] | Genome Biol | 2007 | 2007 | 0 | probe sets consisted of 9,692 for brain (465 Z, 9,227 autosomal), 8,737 for liver (415 Z, 8,322 autosomal), and 9,119 for heart (444 Z, 8,675 autosomal). Array data are available from Gene Expression Omnibus [ 41 ] (accession numbers GSE6843, GSE6844, GSE6856{{tag}}--DEPOSIT--). Gene positions on the Z chromosome were based on release 2.1 of the chicken genome [ 42 ]. Expression data analysis Statistical analyses were per|ia elegans ) and ostrich ( Struthio camelus ) and the process of sex chromosome differentiation in palaeognathous birds. Chromosoma 2007 116 159 173 17219176 10.1007/s00412-006-0088-y Gene Expression Omnibus Ensembl The R Project for Statistical Computing Resampling Stats Dennis G Jr Sherman BT Hosack DA Yang J Gao W Lane HC Lempicki RA DAVID: Database for Annotation, Visualization, and Integrated Discov | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
694 | GSE6856 | 3/20/2007 | ['6856'] | [] | [u'20357053', u'17352797'] | 2373894 | [u'17352797'] | ['Van', 'Wang', 'Kampf', 'Band', 'Itoh', 'Lusis', 'Arnold', 'Clayton', 'Yang', 'Kim', 'Yehya', 'Replogle', 'Wade', 'Melamed', 'Schadt'] | ['Van', 'Wang', 'Kampf', 'Clayton', 'Itoh', 'Lusis', 'Arnold', 'Band', 'Yang', 'Replogle', 'Yehya', 'Melamed', 'Schadt'] | ['Van', 'Wang', 'Kampf', 'Clayton', 'Itoh', 'Lusis', 'Arnold', 'Band', 'Yang', 'Replogle', 'Yehya', 'Melamed', 'Schadt'] | J Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
695 | GSE6856 | 3/20/2007 | ['6856'] | [] | [u'20357053', u'17352797'] | 2847754 | [u'20357053'] | ['Van', 'Wang', 'Kampf', 'Band', 'Itoh', 'Lusis', 'Arnold', 'Clayton', 'Yang', 'Kim', 'Yehya', 'Replogle', 'Wade', 'Melamed', 'Schadt'] | ['Itoh', 'Arnold', 'Clayton', 'Kim', 'Replogle', 'Wade'] | ['Itoh', 'Arnold', 'Clayton', 'Kim', 'Replogle', 'Wade'] | Genome Res | 2010 | 2010 Apr | 0 | AND pmc_gds | 1 | 0 | ||||
696 | GSE6858 | 4/23/2007 | ['6858'] | ['2647'] | [u'17437023'] | 2258452 | [u'17921359'] | ['Perkins', 'Lu', 'Jain', 'Finn'] | ['Friedman', 'Novershtern', 'Itzhaki', 'Manor', 'Kaminski'] | [] | Am J Respir Cell Mol Biol | 2008 | 2008 Mar | 1 | sms of asthma in humans. A systems-level view of asthma that integrates multiple levels of molecular and functional information is needed. For this, we compiled a gene expression compendium from five publicly available mouse microarray datasets and a gene knowledge base of 4,305 gene annotation sets. Using this collection we generated a high-level map of the functional themes that characterize animal mode|ntegration of multiple levels of information ( 17 – 19 ) and identification of regulatory modules in complex tissue and in disease ( 20 ). In this study we create a global map of asthma using publicly available gene expression datasets from multiple sources and tools that allow integration of multiple levels of information, such as functional annotations and protein interactions ( Figure 1 ). In a|nerate Protein regulatory network. MATERIALS AND METHODS Datasets We searched NCBI Gene Expression Omnibus (GEO) for all in vivo asthma murine models gene expression datasets, publicly available by June 2006. Five datasets that passed our inclusion criteria ( see online supplement) were combined to gen|Wills-Karp and coworkers): 12 lung samples from IL-13 knockout (IL-13–KO) and BALB/cJ wild-type (WT) mice that were treated with house dust mite (HDM) or with PBS as control. RAG (GEO series GSE483 , Wills-Karp and colleagues): 7 lung samples from BALB/cJ mice that were treated with ragweed pollen protein plus Alum or with PBS as control. MAH (Murine Airway Hyperresponsiveness, GEO series GSE|T6 expressed only in epithelial cells); and ( 4 ) BALB/cJ WT. 
The datasets have been previously described ( 3 – 5 , 8 ) and recently reviewed by Rolph and coworkers ( 21 ). TEST (GEO series GSE6858{{tag}}--REUSE-- [22]): 16 lung samples taken from BALB/cJ recombinase-activating gene–deficient (RD) mice and from WT mice. Lungs were collected 1 day after challenging with ovalbumin or PBS as control. T|odule Network Validation We used TEST dataset to independently validate our analysis. This recently published dataset was generated on Affymetrix GeneChip Mouse Genome 430 2.0 Arrays. Cell files were downloaded and normalized using RMAExpress. To measure how well the modules predict the expression of their genes in the new dataset, we measured the correlation of the modules between the new dataset and the|s available on the interactive AsthmaMap website ( http://compbio.cs.huji.ac.il/AsthmaMap ). The website allows visualizing the sets along with the expression patterns of their genes. The sets can be downloaded in a format applicable for Genomica software. In addition, user-defined gene lists can be uploaded and analyzed for enrichment in respect to any of the sets available in this study. | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
697 | GSE6858 | 4/23/2007 | ['6858'] | ['2647'] | [u'17437023'] | 2967749 | [u'21044366'] | ['Perkins', 'Lu', 'Jain', 'Finn'] | ['Yousif', 'Mbagwu', 'Ohno-Machado', 'Lacson'] | [] | BMC Bioinformatics | 2010 | 10/28/2010 | 0 | es/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Background The amount of data deposited in the Gene Expression Omnibus (GEO) has expanded significantly. It is important to ensure that these data are properly annotated with clinical data and descriptions of experimental conditions so that they can be useful for future|istency. Association between relevant variables, however, was adequate. 10–12 March 2010 2010 AMIA Summit on Translational Bioinformatics San Francisco, CA, USA Background The Gene Expression Omnibus (GEO) project was initiated by the National Center for Biotechnology Information (NCBI) to serve as a repository for gene expression data [ 1 , 2 ]. In addition to GEO, there are several other large-| 400,000 samples. There has been an ever growing interest in large microarray repositories for several reasons: (a) Microarray data are required by funding agencies and scientific journals to be made publicly accessible; (b) such repositories enable researchers to view data from other research groups; and (c) with proper pre-processing, such repositories may allow researchers to formulate and test hypothe|viously described [ 14 , 15 ]. The annotation tool used for this research was developed to facilitate human annotation by allowing easy access between the data descriptions and measurements that were downloaded from GEO and appropriate scientific publications from Pubmed [ 13 ]. The annotators are able to read the study descriptions that researchers deposited in GEO, as well as individual sample descripti|, and the results are displayed in Table 3 . Table 4 shows all the studies’ goals and the number of samples in each of the 17 annotated studies. 
Table 3 Coverage of Asthma variables in GDS GSE 470 GSE 473 GSE 3183 GSE 3004 Total Agent 100% 0% 100% 100% 17.4% Disease State 100% 100% 0% 0% 88.2% Time 100% 0% 100% 0% 12.7% Other 0% 100% 0% 0% 82.5% No. of Samples 12 175 15 10 212 Table 4 Annota|dy No. of Samples Topic/Title GSE8052 404 Determinants of susceptibility to childhood asthma GSE473 175 Defining diagnostic genes from purified CD4+ blood cells that have specific diagnostic profiles GSE4302 118 Profiling of airway epithelial cells GSE3184 40 Murine airway hyperresponsiveness GSE483 39 Allergic response to ragweed GSE1301 24 Mechanisms by which IL-13 elicits the symptoms of asthma GSE8|fects of exercise on gene expression GSE6858{{tag}}--REUSE-- 16 Expression data from experimental murine asthma GSE3183 15 Early cytokine-mediated mechanisms that lead to asthma GSE470 12 Asthma exacerbatory factors GSE9465 12 Pulmonary responses to ambient particulate matter GSE3004 10 Effects of allergen challenge on airway cell gene expression GSE2276 9 Effect of PGE receptor subtype agonist on an asthma model GSE4|d inhaler 697 24.1 Disease frequency 627 31.7 Gender 489 46.7 Atopic 425 53.7 Tissue 403 56.1 Challenge 0 1.0 The consistency of the studies in the asthma domain was also measured. In one such study (GSE4302), the data for 32 asthmatics randomized to a placebo-controlled trial of fluticasone propionate were examined. The authors use the generic name “fluticasone propionate” within both | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
698 | GSE6858 | 4/23/2007 | ['6858'] | ['2647'] | [u'17437023'] | 1865580 | [u'17437023'] | ['Perkins', 'Lu', 'Jain', 'Finn'] | ['Perkins', 'Lu', 'Jain', 'Finn'] | ['Perkins', 'Lu', 'Jain', 'Finn'] | Mol Syst Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
699 | GSE6863 | 1/27/2007 | ['6863'] | ['2750'] | [u'18314479'] | 2768750 | [u'19832978'] | ['Munroe', 'Varesio', 'Elia', 'Eva', 'Puppo', 'Giovarelli', 'Cappello', 'Wu', 'Ricciardi', 'Fardin', 'Vanni'] | ['Varesio', 'Mosci', 'Barla', 'Verri', 'Rosasco', 'Fardin'] | ['Fardin', 'Varesio'] | BMC Genomics | 2009 | 10/15/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
700 | GSE6864 | 7/4/2007 | ['6864'] | ['2934'] | [u'17683626'] | 2945940 | [u'20840752'] | ['Hooiveld', 'de', u'Bunger', 'M\xc3\xbcller', 'van', 'B\xc3\xbcnger', u'Muller'] | ['Ojeda', 'de', 'Nitsch', 'Gon\xc3\xa7alves', 'Moreau'] | ['de'] | BMC Bioinformatics | 2010 | 9/14/2010 | 0 | n, heat kernel, and Arnoldi kernel approximation. For comparison purposes, a local measure based on the expression of the direct neighbors is also computed. We have benchmarked these strategies on 40 publicly available knockout experiments in mice, and performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expr| approach for scoring candidates is presented here for comparison purposes as a naïve strategy for network analysis of differential expression. We have benchmarked these four strategies on 40 publicly available data sets originated from Affymetrix chips on which mice with (simple) knockout genes were tested against controls. The raw cel files were downloaded from GEO [ 12 ]. For each data set we|egy based on a direct neighborhood analysis. These four network-based prioritization strategies for scoring candidate genes based on their differentially expressed neighborhood were benchmarked on 40 publicly available knockout experiments in mice. Performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expressi|ng candidate genes using network-based machine learning approaches even if no knowledge is available about the disease or phenotype. Methods Benchmark data The benchmark for this study consists of 40 publicly available data sets originated from Affymetrix chips on which mice with knockout genes were tested against controls. The raw cel files were downloaded from GEO [ 12 ]. Table 3 shows all data sets|ur benchmark. Table 3 The benchmark data. 
Gene Name GEO accession number Gene Name GEO accession number 1 Abca1 GSE5496 21 Mbnl1 GSE14691 2 Btk GSE2826 22 Mst1r, Ron GSE16629 3 Cav1 GSE10849 23 MyD88 GSE6688 4 Cav3 GSE10848 24 Nos3, eNos GSE1988 5 Cftr GSE5715 25 Phgdh GSE8555 6 Clcn1 GSE14691 26 Pmp22 GSE1947 7 Cnr1 GSE7694 27 PPAR α GSE6864{{tag}}--REUSE-- 8 Emd GSE5304 28 Prkag3, AMPK G3 GSE4065 9 Epas1, Hi|-2 GSE16067 29 Pthlh, Pthrp GSE17654 10 Esrra GSE7196 30 Rab3a GSE6527 11 Gap43 GSE12687 31 RasGrf1 GSE8425 12 Gnmt GSE9809 32 Rbm15 GSE12628 13 Hdac1 GSE5583 33 Runx GSE4911 14 Hdac2 GSE6770 34 Scd1 GSE2926 15 Hsf4 GSE12415 35 Slc26a4 GSE10587 16 Hspa1A, Hsp70.1 GSE11120 36 Srf GSE13333 17 Il6 GSE411 37 Tgm2 GSE10285 18 Lhx1, Lim1 GSE4230 38 Zc3h12a GSE14891 19 Lhx8 GSE11897 39 Zfp36, Tpp GSE5324 20 L | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
701 | GSE6864 | 7/4/2007 | ['6864'] | ['2934'] | [u'17683626'] | 1971072 | [u'17683626'] | ['Hooiveld', 'de', u'Bunger', 'M\xc3\xbcller', 'van', 'B\xc3\xbcnger', u'Muller'] | ['B\xc3\xbcnger', 'Hooiveld', 'de', 'van', 'M\xc3\xbcller'] | ['B\xc3\xbcnger', 'M\xc3\xbcller', 'de', 'Hooiveld', 'van'] | BMC Genomics | 2007 | 8/7/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
702 | GSE6865 | 3/27/2007 | ['6865'] | [] | [u'17322312'] | 2661003 | [u'19087282'] | ['van', 'Keijser', 'Montijn', 'Rauwerda', 'Brul', 'Schuren', 'Ter'] | ['Hofstede', 'van', 'Kuipers', 'Roerdink', 'Silvis', 'Blom'] | ['van'] | BMC Bioinformatics | 2008 | 12/16/2008 | 0 | nscriptome analysis from a study on the growth transitions of Bacillus subtilis . Data from this experiment was obtained from the Gene Expression Omnibus database from NCBI [ 18 ] (accession number: GSE6865{{tag}}--REUSE--). The authors of the original study applied a K-means clustering to reveal patterns of temporal gene expression. The optimum number of clusters was revealed by principal-component analysis and orde | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
703 | GSE6872 | 4/27/2007 | ['6872'] | ['2697'] | [u'17327269'] | 2811022 | [u'19906709'] | ['Krawetz', 'Diamond', 'Goodrich', 'Strader', 'Quintana', 'Platts', 'Rockett', 'Dix', 'Rawe', 'Thompson', 'Chemes'] | ['Kuznetsov', 'Grinchuk', 'Jenjaroenpun', 'Zhou', 'Orlov'] | [] | Nucleic Acids Res | 2010 | 2010 Jan | 1 | Affymetrix Chip U133 hybridization signal of mRNA of the SAT and random non-SAT mRNA pairs shows preferential over-expression genes in CASGP. 21 brain samples from epilepsy patients were used (GEO ID GSE4290). Using APMA-defined reliable set of Affymetrix target sequences, we calculated correlations of SAT pairs in several normal and cancer tissues. Statistical testing of the frequency distribution fun|APMA DB ( 30 ). Chip U133A&B microarray expression and clinical data (70 breast cancer patients with grade 1) were described by Ivshina et al. , ( 32 ) and were presented in GEO NCBI DB (ID: GSE4922). Gene Ontology analysis reveals specific functional categories of the genes in SAT pairs A total of 5473 DAVID IDs of our 7113 SAT RefSeq IDs were recognized by DAVID software ( http://david.abcc.|D3 SA gene pair in normal spermatogenesis and teratozoospermia. ( A ) Expression profiles for NDUFAF3, DALRD3, hsa-mir-let7d , DICER1 and BNC2 in N S and T Z retrieved from Genome Omnibus platform GDS1665. N S , patients with normal spermatogenesis ( n = 13); T Z , patients with teratozoospermia ( n = 8). ( B ) Scheme of hypothetical mechanism by which in normal spermatogenesis local activation |sion platforms to examine transcription profiles of 13 reproductively normal spermatozoal RNA samples, N S , and 8 spermatozoal RNA samples from severe teratozoospermia, T Z . Based on NCBI GEO data (GSE6872{{tag}}--REUSE--) reported in ( 43 ), we found that DALRD3 and NDUFAF3 are strongly co-expressed in sperm cells of normal individuals ( Figure 7 ). Additionally, these two SA genes showed significant positive corre | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
704 | GSE6872 | 4/27/2007 | ['6872'] | ['2697'] | [u'17327269'] | 2988811 | [u'21124965'] | ['Krawetz', 'Diamond', 'Goodrich', 'Strader', 'Quintana', 'Platts', 'Rockett', 'Dix', 'Rawe', 'Thompson', 'Chemes'] | ['Goto', 'Rennert', 'Nagashima', 'Kumamoto', 'Hussain', 'Saito', 'Horikawa', 'Harris', 'Furusato', 'Robles', 'Yokota', 'Baxendale', 'Trivers', 'Sesterhenn', 'Takenoshita', 'Okamura', 'Yamashita', 'Lee'] | [] | PLoS One | 2010 | 11/19/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
705 | GSE6874 | 3/1/2007 | ['6874'] | [] | [u'17407386'] | 1845155 | [u'17407386'] | ['', 'Chute', 'Nevins', 'Marshall', 'Ginsburg', 'Chao', 'Muramoto', 'Meadows', 'Dressman'] | ['Chute', 'Nevins', 'Marshall', 'Ginsburg', 'Chao', 'Muramoto', 'Meadows', 'Dressman'] | ['Chute', 'Nevins', 'Marshall', 'Ginsburg', 'Chao', 'Muramoto', 'Meadows', 'Dressman'] | PLoS Med | 2007 | 2007 Apr | 0 | AND pmc_gds | 1 | 0 | ||||
706 | GSE6881 | 2/5/2007 | ['6881'] | [] | [u'15496517'] | 2652405 | [u'19384428'] | ['Small', 'Skinner', 'Griswold', 'Shima', 'Uzumcu'] | ['Holt', 'Jans', 'Loveland', 'Efthymiadis', 'Hime', 'Ly-Huynh'] | [] | Curr Genomics | 2007 | 2007 Aug | 1 | tructural changes that transform the round spermatids into elongated spermatids and ultimately the mature spermatozoa is formed [ 43 , 48 ]. Based on previous in situ hybridization data [ 10 ] and publicly accessible Affymetrix array data summarised and presented herein (Fig. 3 ), mouse IMP α1 mRNA exhibits the most ubiquitous expression pattern during spermatogenesis, from the spermatogonium |ort [ 10 ]. Consistent with these data, an age series Affymetrix analysis of the IMP αs supports the theory that different importins have distinct functions during development (NCBI reference GSE6881{{tag}}--MENTION--, GDS605-6). The age series examined herein encompasses the period from mouse gonadal differentiation (E11.5-12.5dpp), to adulthood (~56dpp). Throughout this period, distinct gonadal cell types begi|expression profile of the mouse IMP αs in the developing mouse testis, encompassing the processes of embryonic gonadal development through to spermiogenesis in the adult (NCBI GEO references: GSE2736) [ 61 ]. Arbitrary Affymetrix expression values are provided. Fig. (4). Microarray based expression profile of the different IMP αs in subpopulations of adult mouse testis cells (NCBI GEO r | 1 | 0 | 1 | NOT pmc_gds | 0 | 1 |
707 | GSE6883 | 3/10/2007 | ['6883'] | ['2618', '2617'] | [u'17229949'] | 2880288 | [u'20459635'] | ['Wang', 'Clarke', 'Dalerba', 'Lewicki', 'Liu', 'Gurney', 'Sherlock', 'Chen', 'Shedden', 'Hoey'] | ['Patel', 'Butte'] | [] | BMC Med Genomics | 2010 | 5/6/2010 | 0 | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 | |
708 | GSE6887 | 5/9/2007 | ['6887'] | ['2735'] | [u'17488182'] | 2666812 | [u'19237447'] | ['Holmes', 'Lee', 'Yan', 'Critchley-Thorne', 'Nacu', 'Weber'] | ['Bindea', 'Pag\xc3\xa8s', 'Charoentong', 'Mlecnik', 'Hackl', 'Galon', 'Kirilovsky', 'Tosolini', 'Fridman', 'Trajanoski'] | [] | Bioinformatics | 2009 | 4/15/2009 | 0 | organized GO/pathway term network. It can analyze one or compare two lists of genes and comprehensively visualizes functionally grouped terms. A one-click update option allows ClueGO to automatically download the most recent GO/KEGG release at any time. ClueGO provides an intuitive representation of the analysis results and can be optionally used in conjunction with the GOlorize plug-in. Availability: h|al ). 2.2 Annotation sources To allow a fast analysis, ClueGO uses precompiled annotation files including GO, KEGG and BioCarta for a wide range of organisms. A one-click update feature automatically downloads the latest ontology and annotation sources and creates new precompiled files that are added to the existing ones. This ensures an up-to-date functional analysis. Additionally ClueGO can easily integ|mpares biological functions for clusters of genes we selected up- and down-regulated natural killer (NK) cell genes in healthy donors from an expression profile of human peripheral blood lymphocytes (GSE6887{{tag}}--REUSE--, Gene Expression Omnibus). For upregulated NK genes ClueGO revealed specific terms like ‘Natural killer cell mediated cytotoxicity’ in the group ‘Cellular defense response&# | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
709 | GSE6887 | 5/9/2007 | ['6887'] | ['2735'] | [u'17488182'] | 1865558 | [u'17488182'] | ['Holmes', 'Lee', 'Yan', 'Critchley-Thorne', 'Nacu', 'Weber'] | ['Holmes', 'Lee', 'Yan', 'Critchley-Thorne', 'Nacu', 'Weber'] | ['Holmes', 'Lee', 'Yan', 'Critchley-Thorne', 'Nacu', 'Weber'] | PLoS Med | 2007 | 2007 May | 0 | AND pmc_gds | 1 | 0 | ||||
710 | GSE6889 | 1/31/2007 | ['6889'] | [] | [u'17433778'] | 2080781 | [u'17433778'] | ['Smith', u'Zarraga', 'Rehren', 'Z\xc3\xa1rraga', 'Fontan', 'Walters'] | ['Smith', 'Rehren', 'Z\xc3\xa1rraga', 'Fontan', 'Walters'] | ['Smith', 'Rehren', 'Z\xc3\xa1rraga', 'Fontan', 'Walters'] | Tuberculosis (Edinb) | 2007 | 2007 Jul | 0 | AND pmc_gds | 1 | 0 | ||||
711 | GSE6890 | 4/23/2007 | ['6890'] | [] | [u'17420274'] | 1900058 | [u'17420274'] | ['Ondrej', 'van', 'Giromus', 'Mateos-Langerak', 'de', 'Goetze', 'Versteeg', 'Gierman', 'Indemans', 'Koster'] | ['Ondrej', 'van', 'Giromus', 'Mateos-Langerak', 'de', 'Goetze', 'Versteeg', 'Gierman', 'Indemans', 'Koster'] | ['Ondrej', 'van', 'Giromus', 'Mateos-Langerak', 'de', 'Goetze', 'Versteeg', 'Gierman', 'Indemans', 'Koster'] | Mol Cell Biol | 2007 | 2007 Jun | 0 | AND pmc_gds | 1 | 0 | ||||
712 | GSE6893 | 5/17/2007 | ['6893'] | [] | [u'17293439', u'19490115', u'19788421'] | 2709227 | [u'18937034'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['Alexandrov', 'Zhang', 'Freidin', 'Brover', 'Feldmann', 'Lu', 'Bouck', 'Swaller', 'Troukhan', 'Flavell', 'Tatarinova'] | [] | Plant Mol Biol | 2009 | 2009 Jan | 0 | rich and other genes in rice for which microarray chip data are available. We obtained a list of rice microarray experiments from NCBI GEO database ( http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE6893{{tag}}--REUSE-- (Jain et al. 2007 ) and GSE4438 (Walia et al. 2007 )). For each probe on an Oryza sativa 50K Affymetrix GeneChip Rice Genome Array ( http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL2025 ) | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
713 | GSE6893 | 5/17/2007 | ['6893'] | [] | [u'17293439', u'19490115', u'19788421'] | 2923530 | [u'20353606'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['Narsai', 'Ivanova', 'Whelan', 'Ng'] | [] | BMC Plant Biol | 2010 | 3/31/2010 | 0 | nd germinating seed Dry seed and anaerobic germination (up to 24 h) and switch conditions cv. Amaroo [ 21 ] E-MEXP-2267 3 36 Imbibed seed Aerobic and anaerobic grown coleoptiles cv. Nipponbare [ 27 ] GSE6908 2 4 Coleoptile Embryo, endosperm, leaf and root from 7-d seedling, 10-d seedling cv. Zhonghua [ 28 ] GSE11966 2 10 Embryo, endosperm, leaf and root from 7-d seedling, 10-d seedling Stigma, Ovary+7 |af, semi apical meristem, inflorescence, seed cv. IR64 [ 30 ] GSE6893{{tag}}--REUSE-- 3 45 Mature leaf, young leaf, semi apical meristem, inflorescence, seed ABIOTIC STRESS Drought, salt, cold stress cv. IR64 [ 30 ] GSE6901 3 12 Seedling Heat stress cv. Zhonghua [ 31 ] GSE14275 3 6 Seedling Salt stress on 2 cultivars; indica, FL478 (salt tolerant), indica, IR29 (salt sensitive) [ 32 ] GSE3053 3 11 Crown and growing po| tolerant) [ 33 ] GSE4438 3 24 Panicle initiation stage Salt stress on root using 4 cultivars; FL478 (salt tolerant), IR29 (salt sensitive), IR63731 (salt tolerant), Pokkali (salt tolerant) Not found GSE14403 3 23 Root Fe and P treatments cv. Nipponbare [ 34 ] GSE17245 2 16 Root Arsenate treatment cv. Azucena [ 35 ] GSE4471 3 12 Seedling Physical stress at roots tips cv. Bala [ 35 ] GSE10857 3 12 Root |pponbare (resistant), IAC165 (susceptible) [ 36 ] GSE10373 2 24 Root M.grisea blast fungus infection cv. Nipponbare [ 37 ] GSE7256 2 8 Leaf Rice stripe virus infection cv. WuYun3, KT95-418 Not found GSE11025 3 12 Seedling Infection with bacteria X.Oryzae pv. oryzicola and oryzae cv. Nipponbare Not found GSE16793 4 60 Whole-plant tissue HORMONE TREATMENT Cytokinin treatment on root and leaf cv. 
Nippo|more tissue/stress microarray experiments, notably, this included Actin1 (LOC_Os05g36290.1; Gene 14 in Table 2 ) which was not expressed in all 3 biological replicates of the semi apical meristem (GSE6901) (Figure 2C ). Similarly, a recent study in rice defined a set of 248 stably expressed genes across 40 developmental tissues that were analysed using Yale/BGI oligonucleotide microarrays [ 22 ]. O|stance. The clustering analysis and heatmap generation was carried out using Partek Genomics Suite, version 6.3 (Partek). For the Agilent microarray comparison, data was retrieved under the accession GSE8518 from the Gene Expression Omnibus within the National Centre for Biotechnology Information database. Analysis of orthologues The InParanoid: Eukaryotic Orthologue Groups database (version 7.0) was u | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
714 | GSE6893 | 5/17/2007 | ['6893'] | [] | [u'17293439', u'19490115', u'19788421'] | 2528094 | [u'18650402'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['Liu', 'Zhao', 'Lu', 'Han', 'Huang'] | [] | Plant Physiol | 2008 | 2008 Sep | 0 | a sequences, by BACs directly or by 87 assembled contigs, were performed. The alignment results of BGI 93-11 contigs and Nipponbare pseudomolecules, which were generated by the software nucmer, were downloaded using the GFF Dumper on the TIGR Genome Browser. We found that a small quantity of anchor results were self-contradictory; that is, two 93-11 contigs that localized on the same location yielded opp| more than 100 bp were further confirmed by BLAST2. The indica Guangluai 4 BACs were obtained from http://www.ncgr.ac.cn/chinese/databasei.htm . The genomic sequences of japonica Nipponbare were downloaded from http://www.tigr.org/tdb/e2k1/osa1 , and the indica 93-11 sequences were downloaded from ftp://ftp.genomics.org.cn . Mining of TIPs in the Rice Genome For each insertion region identified a| known TE repeat databases using RepeatMasker, as described above. Those elements, which were composed of a single LTR, were recognized as solo LTR retroelements. EST Analysis and Gene Prediction All publicly available rice ESTs were obtained from the National Center for Biotechnology Information EST database ( http://www.ncbi.nlm.nih.gov/projects/dbEST/ ). Full-length cDNAs of both KOME ( http://red.dna.|t the two gene fragments of indica XIP-I separated by a TE insertion. The probes in the two probe sets were remapped to the rice genomes, Nipponbare pseudomolecules and 93-11 contigs, by BLASTN. We downloaded the microarray data files of each experiment from the GEO Web site ( http://www.ncbi.nlm.nih.gov/geo/ ). Overall, there are 57 chips of indica IR64 (45 from GSE6893{{tag}}--REUSE-- and 12 from GSE6901 ) and 4 | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
715 | GSE6893 | 5/17/2007 | ['6893'] | [] | [u'17293439', u'19490115', u'19788421'] | 2694483 | [u'19372385'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['Tang', 'Paterson', 'Wang', 'Bowers'] | [] | Genome Res | 2009 | 2009 Jun | 0 | ay not actually contribute to gene conservation. Methods Inference of homologous quartets Rice and sorghum (version 1.0) sequences were downloaded from the RAP2 database ( http://rgp.dna.affrc.go.jp/E/index.html ) and Department of Energy Joint Genome Institute ( http://www.jgi.doe.gov/ ). We performed all-against-all BLASTP between rice and |alysis Genes in homologous quartets were linked to the PFAM domains (version 22) by running BLAST at E -value threshold 1 × 10 −5 . Expression analysis Rice gene expression data were downloaded from NCBI Gene Expression Omnibus ( GSE6893{{tag}}--REUSE-- ) ( Barrett et al. 2009 ), containing 45 Affymetrix microarray slides and for 15 samples (each having three replicates), which was generated with various|e between duplicated genes, the Pearson correlation coefficient was calculated for each pair with RAM measures. Sorghum bicolor transcript assemblies (48932 unigenes assembled from 203575 ESTs) were downloaded from TIGR Plant Transcript Assemblies database ( Childs et al. 2007 ). Each of the unigenes is composed of varying numbers of ESTs, which are used to approximate the number of times a particular ge | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
716 | GSE6893 | 5/17/2007 | ['6893'] | [] | [u'17293439', u'19490115', u'19788421'] | 2697655 | [u'19534757'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['Gu', 'Zhang', 'Li', 'Gao', 'Ge', 'Luo'] | [] | BMC Bioinformatics | 2009 | 6/16/2009 | 0 | velopment such as panicle and seed [ 39 ]. The raw data containing 45 slides for 15 samples with three replicates for each sample were downloaded from the NCBI Gene Expression Omnibus (GEO dataset ID GSE6893{{tag}}--REUSE--) [ 40 ]. First, we assessed the quality for all 45 arrays using the affyPLM package [ 41 ] of Bioconductor [ 42 ] and discarded one slide (GSM159192) because of the possible artefact, implied by th | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
717 | GSE6893 | 5/17/2007 | ['6893'] | [] | [u'17293439', u'19490115', u'19788421'] | 2719134 | [u'19535473'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['Christoffels', 'Ramamoorthy', 'Ramachandran', 'Jiang'] | [] | Plant Physiol | 2009 | 2009 Aug | 0 | complete genome sequences, various genome annotation tools have been developed and corresponding databases established. For example, several rice ( Oryza sativa ) genome annotation databases are now publicly available for research, such as the Michigan State University (MSU) rice genome annotation database (previously The Institute for Genomic Research rice genome annotation database, now moved to MSU; | analyses. Expression Analyses of Hypothetical Genes Microarray expression data from 15 samples were used to assess the expression of hypothetical genes. These data were obtained from Gene Expression Omnibus (GEO) data sets ( Barrett et al., 2007 ; http://www.ncbi.nlm.nih.gov/geo/ ) with accession number GSE6893{{tag}}--REUSE-- ( Jain et al., 2007 ). We determined whether a hypothetical gene was expressed or not usin|used for retrieving all annotated hypothetical genes ( Swarbreck et al., 2008 ). Draft indica rice ( Yu et al., 2002 ), sorghum ( Sorghum bicolor ), and maize ( Oryza sativa ) genome sequences were downloaded from the following Web sites: http://rice.big.ac.cn/rice/index2.jsp , http://www.phytozome.net/sorghum , and http://www.maizesequence.org/index.html , respectively. Detection of Duplication-Rela| segmental genome duplication ( http://rice.plantbiology.msu.edu/segmental_dup/index.shtml ; Lin et al., 2006 ) for rice. Tandemly duplicated rice genes were determined using predicted rice proteins downloaded from the MSU rice genome annotation database (release 6) according to the description by Rizzon et al. (2006) . 
Transposon-like transcripts (16,185), as defined in the annotation, were removed fro|related genes between them. Detection of Transposon/Retrotransposon-Related Expansion of Hypothetical Genes For genome-wide identification of LTR retrotransposon elements, whole genome sequences were downloaded from release 6 of the MSU rice genome annotation database and were then used for detection of full-length LTR retrotransposons by executing the LTR_Finder program ( Xu and Wang, 2007 ). On the othe|full-length and solo LTR retrotransposons ( Chaparro et al., 2007 ). For detection of putative retrogenes in release 6 of the MSU rice annotation database, all annotated non-TE protein sequences were downloaded from the database, and those protein sequences from genes with single exons were subjected to BLASTP searches against all non-TE-related protein sequences deduced from two or more exon-containing c | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
718 | GSE6893 | 5/17/2007 | ['6893'] | [] | [u'17293439', u'19490115', u'19788421'] | 2726226 | [u'19604350'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['Feng', 'Zhu', 'Wang', 'Zhang', 'Liu', 'Wu'] | [] | BMC Genomics | 2009 | 7/15/2009 | 0 | r, a site-specific posterior analysis was used to predict amino acid residues that were crucial for functional divergence. Investigation of transcription patterns Gene expression microarray datasets (GSE7951, GSE13161, GSE6893{{tag}}--REUSE--, GSE6908, and GSE6901 for rice; GSE680, GSE7641, and GSE8365 for Arabidopsis ) were downloaded from the GEO database in NCBI. The microarray data of rice include the analysis of | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
719 | GSE6893 | 5/17/2007 | ['6893'] | [] | [u'17293439', u'19490115', u'19788421'] | 2882264 | [u'20423940'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['Zhao', 'Ma'] | [] | J Exp Bot | 2010 | 2010 Jun | 0 | base at the NCBI website ( http://www.ncbi.nlm.nih.gov/geo/ ) and the Rice Functional Genomic Express Database ( http://signal.salk.edu/cgi-bin/RiceGE ). For temporal and spatial expression analysis (GSE6893{{tag}}), different stages of panicle and seed development were categorized according to panicle length and days after pollination, respectively, based on landmark developmental events as follows. (i) Pani| morphogenesis (S3); 11–20 DAP, embryo maturation (S4); 21–29 DAP, dormancy and desiccation tolerance (S5) ( Itoh et al. , 2005 , Jain et al. , 2007 ). For abiotic stress analysis (GSE6901), rice seedlings were transferred to a beaker containing 200 mM NaCl solution for salt stress, dried between folds of tissue paper at 28±1 °C in a culture room for d|of Arabidopsis AGP-encoding genes were downloaded using ‘Bulk Gene Download’ at Nottingham Arabidopsis Stock Centre's microarray database, and the results of developmental stages (GSE5629–5633) and stress treatments (GSE5620–5621 and 5623–5624) were used to analyse the expression of AGP-encoding genes in Arabidopsis ( http://affymetrix.arabidopsis.info/narr|les from UniGene at http://www.ncbi.nlm.nih.gov/unigene/ ; 12, MPSS tags, http://mpss.udel.edu/rice/ ; 13, the absolute signal values were downloaded at http://signal.salk.edu/cgi-bin/RiceGE ; 14, GSE6893{{tag}}--REUSE--, expression at various developmental stages; 15, GSE661, expression under ABA and GA treatments; 16, GSE6901, expression under abiotic stresses treatments. All PAST-rich proteins used for final ana|and OsAGP22 , and OsAGP25 and OsAGP26 ) ( Fig. 6 ). Fig. 6. 
Expression profiles of AGP-encoding genes in various rice organs and tissues at different development stages. The microarray data sets (GSE6893{{tag}}--REUSE--) of gene expression at various developmental stages were used for cluster display. A heat map representing hierarchical clustering of average log signal values of all the AGP-encoding genes in vari|6 OsAffx-28335-1-S1_at 5696.2 233.3 0.04 Y OsLLA7 Os-9342-1-S1_at 3120.1 1746.4 0.56 OsLLA8 OsAffx-18220-1-S1_x_at 87.0 212.5 2.44 Y a The probe ID is used on microarray plate GPL2025. The experiment GSE6893{{tag}} was used for differential expression analysis. b MOV, the maximum absolute values of vegetative tissues. c MOR, the maximum absolute values of reproductive tissues. d The values of MOR divided by M|panicles. Expression analysis of rice AGP-encoding genes under abiotic stress, ABA, and GA treatments To investigate the abiotic stress response of rice AGP-encoding genes, the results of microarray (GSE6901) from 7-day-old seedlings subjected to drought, salt, and cold stresses were analysed. The data revealed that a total of 15 genes were significantly down- or up-regulated (<0.5 or >|h, and down-regulated by drought and salt stresses ( Fig. 8 ). Fig. 8. Expression profiles of rice AGP-encoding genes differentially expressed under abiotic stresses. The microarray data sets (GSE6901) of gene expression under various abiotic stresses were used for cluster display. The average log signal values of AGP-encoding genes under control and various stress conditions (indicated at the t|values) is shown at the bottom. CK, control; DS, drought stress; SS, salt stress; CS, cold stress. The results of ABA- and GA-treated callus were used to analyse the regulation of AGP-encoding genes (GSE661). It was found that many AGP-encoding genes were regulated by ABA and GA ( Supplementary Table S5 at JXB online). 
To verify the results of microarray under ABA and GA treatments, the transcriptio | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
720 | GSE6893 | 5/17/2007 | ['6893'] | [] | [u'17293439', u'19490115', u'19788421'] | 2990762 | [u'21124849'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['Heazlewood', 'Oikawa', 'Ebert', 'Manisseri', 'Joshi', 'Scheller', 'Rennie'] | [] | PLoS One | 2010 | 11/23/2010 | 0 | ae with transcriptional data for dicots. Based on this data they proposed candidates of GT families involved in grass xylan synthesis. Recently, high-density Affymetrix array data for rice has become publicly available, thereby enabling more sensitive co-expression profiling analysis for rice [20] . A number of online tools are available for plant co-expression analysis [21|/IRX9-L clade, and two genes in the IRX14/IRX14-L clade ( Figure 2A ). We examined the expression of the rice GT43 and GT47D genes in different developmental stages using rice Affymetrix DNA array GSE6893{{tag}}--REUSE-- data [33] . Interestingly, the expression patterns could be clearly defined into two distinct groups ( Figure 2B ). One type of expression profile was strongly dependent on tissue | this gene in ATTED-II. The red and blue arrowheads show the genes used as baits for the final comparative co-expression analysis. The y-axes show raw expression values from rice Affymetrix DNA array GSE6893{{tag}}--REUSE-- data [33] . The x-axes show tissue type: R; Root_7d_seedling, ML; Mature_leaf, YL; Young_leaf, P1; Young_inflorescence_P1, P2-P6, Inflorescence stage P2 to P6; S1-S5, Seed stage S1|lopmental stage in Arabidopsis and rice were obtained from Arabidopsis Affymetrix DNA array data available from AtGeneExpress at TAIR ( http://www.arabidopsis.org ) and rice Affymetrix DNA array data GSE6893{{tag}}--REUSE-- available from Rice array database ( http://www.ricearray.org ) [20] . The Pfam database (ver. 
24.0) has a collection of 7677 unique protein functional domains based on Hidden Mark|m the rice genome annotation project ( http://rice.plantbiology.msu.edu ) [72] . Pfam domain profiling of Arabidopsis proteins To develop the Pfam domain-based algorithm in plants, we downloaded Arabidopsis gene product information from the AmiGO database ( http://amigo.geneontology.org ) for the following localization terms; GO:0005634 (3012 proteins; nucleus), GO:0005739 (1310 proteins; | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
721 | GSE6893 | 5/17/2007 | ['6893'] | [] | [u'17293439', u'19490115', u'19788421'] | 2245831 | [u'18065552'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['Nijhawan', 'Khurana', 'Tyagi', 'Jain'] | ['Nijhawan', 'Khurana', 'Tyagi', 'Jain'] | Plant Physiol | 2008 | 2008 Feb | 0 | ts (Affymetrix) using 5 μ g of high-quality total RNA as starting material for each sample as described earlier ( Jain et al., 2007 ). Affymetrix GeneChip Rice Genome Arrays (Gene Expression Omnibus platform accession no. GPL2025 ) were used for microarray analysis. For microarray data analysis, the image (cel) files were imported into ArrayAssist (version 5.0) software. Three biological replic|ponding FL-cDNA and ESTs. FL-cDNA and EST sequences showed minimal alignment over 90% length of the transcript with 95% identity. Microarray data from this article can be found in the Gene Expression Omnibus database at the NCBI under accession numbers GSE6893{{tag}} and GSE6901 . Supplemental Data The following materials are available in the online version of this article. Supplemental Figure S1. Position | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
722 | GSE6893 | 5/17/2007 | ['6893'] | [] | [u'17293439', u'19490115', u'19788421'] | 2825235 | [u'20109239'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['Ghanashyam', 'Jain', 'Bhattacharjee'] | ['Jain'] | BMC Genomics | 2010 | 1/29/2010 | 0 | olved in a particular biological process. In the second approach, we used the microarray data for various tissues/organs and developmental stages available at GEO database under the accession numbers GSE6893{{tag}}--REUSE-- and GSE7951. The series GSE6893{{tag}}--REUSE-- includes microarray data from 45 hybridizations representing three biological replicates each of 15 different tissues/organs and developmental stages [ 30 ], whereas| GSE7951 includes the microarray data from 12 hybridizations representing 9 different tissue samples [ 33 ]. Because three biological replicates were available only for stigma and ovary in the series GSE7951 dataset, only these data were used in this analysis. All the tissues/organs and developmental stages for which microarray data was analyzed in this study are summarized in Additional file 5 . The |ted [ 6 , 40 ]. To study the effect of various abiotic stresses (desiccation, salt, cold and arsenate) on the expression profiles of GST genes, microarray data available under series accession number GSE6901 [ 30 ] was analyzed. Differential expression analysis for rice seedlings treated with different abiotic stresses (desiccation, salt and cold) as compared to mock-treated control seedlings was perfo|vely. The control seedlings were kept in water for 3 h, at 28 ± 1°C. 
Microarray data analysis The microarray data publicly available at GEO database under the series accession numbers GSE6893{{tag}}--REUSE-- (expression data for reproductive development), GSE7951 (expression profiling of stigma), GSE6901 (expression data for stress treatment), GSE4471 (expression data from rice varieties Azucena and Ba| arsenate), GSE5167 (expression data for auxin and cytokinin response), GSE6719 (expression data for cytokinin response), GSE7256 (expression data for virulent infection by Magnaporthe grisea ), and GSE10373 (expression data for interaction with the parasitic plant Striga hermonthica ) were used for expression analysis of rice GST genes. The entire microarray experiments used in this study are listed | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
723 | GSE6893 | 5/17/2007 | ['6893'] | [] | [u'17293439', u'19490115', u'19788421'] | 2576257 | [u'18826656'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['Kapoor', 'Lama', 'Tyagi', 'Nijhawan', 'Arora', 'Khurana'] | ['Nijhawan', 'Khurana', 'Kapoor', 'Tyagi', 'Arora'] | BMC Genomics | 2008 | 10/1/2008 | 0 | oarray analysis of 17 stages was described previously [ 27 ]; deposited in the Gene Expression Omnibus database at the National Center for Biotechnology Information under the series accession numbers GSE6893{{tag}}--REUSE-- and GSE6901). Here, five more stages viz, Y-leaf, SAM (shoot apical meristem), P1-I (< 2 mm panicle), P1-II (0.2 to 0.5 mm panicle) and P1-III (5 to 10 mm panicle) have been added to emphas|® ATH1 Genome Array data for 21 stages (55 .cel files) comparable to that used for rice were downloaded from Gene Expression Omnibus (GEO) database at the NCBI under the series accession numbers GSE5620, GSE5621, GSE5623, GSE5624, GSE5629, GSE5630, GSE5631, GSE5632 and GSE5634. The data were imported in ArrayAssist™ (Stratagene, La Jolla, CA) microarray analysis software wherein GCRMA algo | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
724 | GSE6893 | 5/17/2007 | ['6893'] | [] | [u'17293439', u'19490115', u'19788421'] | 2993539 | [u'21044985'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['Jhanwar', 'Garg', 'Tyagi', 'Jain'] | ['Tyagi', 'Jain'] | DNA Res | 2010 | 2010 Dec | 0 | rice and TAIR database for Arabidopsis . Another approach used was to search for the proteins containing PFam domain PF00462, which encodes GRX domain. Further, the rice and Arabidopsis proteomes downloaded from RGAP (version 6.1) and TAIR (version 9), respectively, were searched using the hidden Markov model (HMM) profile (build 2.3.2) for GRX domain (PF00462) downloaded from the PFam database. The r| gene in a given library represents the quantitative estimate of the expression of that gene. 2.6. Expression analysis using microarray data For rice, the microarray data available at Gene Expression Omnibus (GEO) database under the series accession numbers GSE6893{{tag}}--REUSE-- (various stages of development), 21 GSE6901 (abiotic stress), 21 GSE4471 (arsenate stress), 22 GSE5167 (auxin and cytokinin treatment), 23|ays as described earlier. 21 For Arabidopsis , the microarray data for various stages of developments, hormone treatments and abiotic stress treatments corresponding to those analyzed for rice were downloaded from GEO (series accession numbers GSE5620, GSE5621, GSE5623, GSE5624 and GSE5629–GSE5634) and AtGenExpress ( http://www.arabidopsis.org/portals/expression/microarray/ATGenExpress.jsp ) dat | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
725 | GSE6893 | 5/17/2007 | ['6893'] | [] | [u'17293439', u'19490115', u'19788421'] | 1947970 | [u'17640358'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['Agarwal', 'Kapoor', 'Tyagi', 'Arora', 'Singh', 'Ray'] | ['Agarwal', 'Kapoor', 'Tyagi', 'Arora', 'Ray'] | BMC Genomics | 2007 | 7/18/2007 | 0 | Affymetrix GeneChip ® ATH1 Genome Array, from stages comparable to those used for rice was obtained from Gene Expression Omnibus (GEO) database at the NCBI under the series accession numbers GSE5620, GSE5621, GSE5623, GSE5624, GSE5629, GSE5630, GSE5631, GSE5632 and GSE5634. Total of 55 CEL files representing 21 stages of development as well as stress treatments were downloaded from [ 79 ] and |d 79 were studied using QPCR. Raw microarray data have been deposited in the Gene Expression Omnibus database at the National Center for Biotechnology Information under the series accession numbers GSE6893{{tag}}--DEPOSIT-- and GSE6901. QPCR Real time PCR reactions were carried out using the same RNA samples, which were used for microarrays as described earlier [ 81 ]. In brief, primers were designed for all the genes | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
726 | GSE6893 | 5/17/2007 | ['6893'] | [] | [u'17293439', u'19490115', u'19788421'] | 1851844 | [u'17293439'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Khurana', 'Ray'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Khurana', 'Ray'] | Plant Physiol | 2007 | 2007 Apr | 0 | AND pmc_gds | 1 | 0 | ||||
727 | GSE6894 | 8/28/2007 | ['6894'] | [] | [u'17881565'] | 2000506 | [u'17881565'] | ['Tsui', 'Leung', 'Zhang', 'Mifflin', 'Chan', 'Li', 'Ho', 'Yuen', 'Chen', 'Powell', 'Kosinski'] | ['Tsui', 'Leung', 'Zhang', 'Mifflin', 'Chan', 'Li', 'Ho', 'Yuen', 'Chen', 'Powell', 'Kosinski'] | ['Tsui', 'Leung', 'Zhang', 'Mifflin', 'Chan', 'Li', 'Ho', 'Yuen', 'Chen', 'Powell', 'Kosinski'] | Proc Natl Acad Sci U S A | 2007 | 9/25/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
728 | GSE6901 | 5/17/2007 | ['6901'] | [] | [u'17293439', u'19490115', u'19788421'] | 2633829 | [u'19074628'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['Millar', 'Usadel', 'Carroll', 'Narsai', 'Ivanova', 'Howell', 'Lohse', 'Whelan'] | [] | Plant Physiol | 2009 | 2009 Feb | 1 | later time points. This group was enriched in transcription factors and signal transduction components. A subset of these transiently expressed transcription factors were further interrogated across publicly available rice array data, indicating that some were only expressed during the germination process. Analysis of the 1-kb upstream regions of transcripts displaying similar changes in abundance identi| in the transcript abundance observed for over 6,000 genes in cluster 1). To determine if these were specific to the process of germination, we analyzed the expression of transcription factors across publicly available rice Affymetrix microarray data. These included analyses of over 30 microarrays from different tissues and stress treatments, and following normalization, all data were made relative to max|ession. A, Analysis of the expression profiles of 34 transcription factors that displayed between 70% and 100% of their maximum expression at 1 and 3 HAI in the germination time course, compared with publicly available array data for a variety of rice tissues and treatments. Boxed in yellow are transcription factors that appeared only to be induced during germination (i.e. in this study). Boxed in blue ar|calization. To date, only a few large-scale localization studies have been carried out, so less than 300 could be assigned in this way. 
In order to overcome this, all protein sequence information was downloaded for the 24,150 genes, and four primary sources were employed: (1) experimentally shown localization based on protein work ( Heazlewood et al., 2003 ; Howell et al., 2006 , 2007 ; Kleffmann et al|ss different tissues under different conditions and compare these with the germination transcript abundance profiles generated from this study, rice array data were retrieved from the Gene Expression Omnibus within the National Center for Biotechnology Information database. All data were MAS5.0 normalized and normalized against average ubiquitin expression for that array. These normalized array data were|r data relative to this. This normalization allowed cross-comparison of arrays from all of the different studies at once. The arrays analyzed included all of the arrays from this study, together with publicly available rice genome arrays carried out from different tissues/conditions, including 7-d-old seedlings that were untreated, drought stressed, salt stressed, or cold stressed ( GSE6901{{tag}}--REUSE-- ; Jain et al.| d following pollination, 10-d-old embryos, 10-d-old endosperms, seedling roots, seedling shoots, unpollinated stigmas (at antithesis), ovaries (at antithesis), mature anthers, and suspension cells ( GSE7951 ; Li et al., 2007 ); aerobically grown coleoptiles (4 d) and anoxically grown coleoptiles (4 d; GSE6908 ; Lasanthi-Kudahettige et al., 2007 ); crowns and growing points under salt stress and con|and tolerant mutants in subspecies indica and japonica ( GSE4438 ; Walia et al., 2007 ); crowns and growing points under control and salt stress conditions in subspecies indica and japonica (GDS1383; Walia et al., 2005 ); and leaves following biotic stress and control treatments ( GSE7256 ; Ribot et al., 2008 ). Promoter Motif Analysis Following expression analysis, distinct groups of transc|ome “peaking” subsets, where a peak is as defined above. 
3′ UTR Sequence Analysis The full genome 3′ UTR and 5′ UTR sequences are available from TIGR. This was downloaded and filtered to retain only the 3′ UTRs. However, this only added up to 3,027 UTRs available for the “whole genome.” Taking this small number into consideration, it was not |uer et al., 2005 ), and normalization of matched peak areas to the peak area of the internal standard, ribitol, and to fresh tissue weight of extracted samples. The MSRI library was constructed using publicly available AMDIS software (version 2.65) to extract MSRI information for authentic standard derivatives from standard runs and MSRI information for unknown analytes from representative analyses of com | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
729 | GSE6901 | 5/17/2007 | ['6901'] | [] | [u'17293439', u'19490115', u'19788421'] | 2633852 | [u'19010998'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['Taylor', 'Millar', 'Eubel', 'Narsai', 'Whelan', 'Huang'] | [] | Plant Physiol | 2009 | 2009 Feb | 1 | ntal stages and in response to stresses. To analyze the gene expression pattern, we extracted the available rice microarray data from the National Center for Biotechnology Information gene expression omnibus ( http://www.ncbi.nlm.nih.gov/geo ) of six independent studies with relevance for mitochondrial function ( Walia et al., 2005 , 2007 ; Jain et al., 2007 ; Lasanthi-Kudahettige et al., 2007 ; Li e|anges across different tissues and under different conditions and to compare these with the obtained germination transcript abundance profiles, rice array data were retrieved from the Gene Expression Omnibus within the National Center for Biotechnology Information database ( GSE6901{{tag}}--REUSE-- , GSE7951 GSE6908 , GSE4438 , GDS1383, and GSE7256 ). All data were MAS5.0 normalized and normalized against average u|all other data relative to this. This normalization allowed cross-comparison of arrays from all of the different studies at once. The arrays analyzed included all arrays from this study together with publicly available rice genome arrays carried out from different tissues/conditions. Hierarchical clustering across all of the arrays was carried out with average linkage clustering based on Euclidian distanc | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
730 | GSE6901 | 5/17/2007 | ['6901'] | [] | [u'17293439', u'19490115', u'19788421'] | 2923530 | [u'20353606'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['Narsai', 'Ivanova', 'Whelan', 'Ng'] | [] | BMC Plant Biol | 2010 | 3/31/2010 | 0 | nd germinating seed Dry seed and anaerobic germination (up to 24 h) and switch conditions cv. Amaroo [ 21 ] E-MEXP-2267 3 36 Imbibed seed Aerobic and anaerobic grown coleoptiles cv. Nipponbare [ 27 ] GSE6908 2 4 Coleoptile Embryo, endosperm, leaf and root from 7-d seedling, 10-d seedling cv. Zhonghua [ 28 ] GSE11966 2 10 Embryo, endosperm, leaf and root from 7-d seedling, 10-d seedling Stigma, Ovary+7 |af, semi apical meristem, inflorescence, seed cv. IR64 [ 30 ] GSE6893 3 45 Mature leaf, young leaf, semi apical meristem, inflorescence, seed ABIOTIC STRESS Drought, salt, cold stress cv. IR64 [ 30 ] GSE6901{{tag}}--REUSE-- 3 12 Seedling Heat stress cv. Zhonghua [ 31 ] GSE14275 3 6 Seedling Salt stress on 2 cultivars; indica, FL478 (salt tolerant), indica, IR29 (salt sensitive) [ 32 ] GSE3053 3 11 Crown and growing po| tolerant) [ 33 ] GSE4438 3 24 Panicle initiation stage Salt stress on root using 4 cultivars; FL478 (salt tolerant), IR29 (salt sensitive), IR63731 (salt tolerant), Pokkali (salt tolerant) Not found GSE14403 3 23 Root Fe and P treatments cv. Nipponbare [ 34 ] GSE17245 2 16 Root Arsenate treatment cv. Azucena [ 35 ] GSE4471 3 12 Seedling Physical stress at roots tips cv. Bala [ 35 ] GSE10857 3 12 Root |pponbare (resistant), IAC165 (susceptible) [ 36 ] GSE10373 2 24 Root M.grisea blast fungus infection cv. Nipponbare [ 37 ] GSE7256 2 8 Leaf Rice stripe virus infection cv. WuYun3, KT95-418 Not found GSE11025 3 12 Seedling Infection with bacteria X.Oryzae pv. oryzicola and oryzae cv. Nipponbare Not found GSE16793 4 60 Whole-plant tissue HORMONE TREATMENT Cytokinin treatment on root and leaf cv. 
Nippo|more tissue/stress microarray experiments, notably, this included Actin1 (LOC_Os05g36290.1; Gene 14 in Table 2 ) which was not expressed in all 3 biological replicates of the semi apical meristem (GSE6901{{tag}}) (Figure 2C ). Similarly, a recent study in rice defined a set of 248 stably expressed genes across 40 developmental tissues that were analysed using Yale/BGI oligonucleotide microarrays [ 22 ]. O|stance. The clustering analysis and heatmap generation was carried out using Partek Genomics Suite, version 6.3 (Partek). For the Agilent microarray comparison, data was retrieved under the accession GSE8518 from the Gene Expression Omnibus within the National Centre for Biotechnology Information database. Analysis of orthologues The InParanoid: Eukaryotic Orthologue Groups database (version 7.0) was u | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
731 | GSE6901 | 5/17/2007 | ['6901'] | [] | [u'17293439', u'19490115', u'19788421'] | 2528094 | [u'18650402'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['Liu', 'Zhao', 'Lu', 'Han', 'Huang'] | [] | Plant Physiol | 2008 | 2008 Sep | 0 | a sequences, by BACs directly or by 87 assembled contigs, were performed. The alignment results of BGI 93-11 contigs and Nipponbare pseudomolecules, which were generated by the software nucmer, were downloaded using the GFF Dumper on the TIGR Genome Browser. We found that a small quantity of anchor results were self-contradictory; that is, two 93-11 contigs that localized on the same location yielded opp| more than 100 bp were further confirmed by BLAST2. The indica Guangluai 4 BACs were obtained from http://www.ncgr.ac.cn/chinese/databasei.htm . The genomic sequences of japonica Nipponbare were downloaded from http://www.tigr.org/tdb/e2k1/osa1 , and the indica 93-11 sequences were downloaded from ftp://ftp.genomics.org.cn . Mining of TIPs in the Rice Genome For each insertion region identified a| known TE repeat databases using RepeatMasker, as described above. Those elements, which were composed of a single LTR, were recognized as solo LTR retroelements. EST Analysis and Gene Prediction All publicly available rice ESTs were obtained from the National Center for Biotechnology Information EST database ( http://www.ncbi.nlm.nih.gov/projects/dbEST/ ). Full-length cDNAs of both KOME ( http://red.dna.|t the two gene fragments of indica XIP-I separated by a TE insertion. The probes in the two probe sets were remapped to the rice genomes, Nipponbare pseudomolecules and 93-11 contigs, by BLASTN. We downloaded the microarray data files of each experiment from the GEO Web site ( http://www.ncbi.nlm.nih.gov/geo/ ). Overall, there are 57 chips of indica IR64 (45 from GSE6893 and 12 from GSE6901{{tag}}--REUSE-- ) and 4 | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
732 | GSE6901 | 5/17/2007 | ['6901'] | [] | [u'17293439', u'19490115', u'19788421'] | 2726226 | [u'19604350'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['Feng', 'Zhu', 'Wang', 'Zhang', 'Liu', 'Wu'] | [] | BMC Genomics | 2009 | 7/15/2009 | 0 | r, a site-specific posterior analysis was used to predict amino acid residues that were crucial for functional divergence. Investigation of transcription patterns Gene expression microarray datasets (GSE7951, GSE13161, GSE6893, GSE6908, and GSE6901{{tag}}--REUSE-- for rice; GSE680, GSE7641, and GSE8365 for Arabidopsis ) were downloaded from the GEO database in NCBI. The microarray data of rice include the analysis of | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
733 | GSE6901 | 5/17/2007 | ['6901'] | [] | [u'17293439', u'19490115', u'19788421'] | 2882264 | [u'20423940'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['Zhao', 'Ma'] | [] | J Exp Bot | 2010 | 2010 Jun | 0 | base at the NCBI website ( http://www.ncbi.nlm.nih.gov/geo/ ) and the Rice Functional Genomic Express Database ( http://signal.salk.edu/cgi-bin/RiceGE ). For temporal and spatial expression analysis (GSE6893), different stages of panicle and seed development were categorized according to panicle length and days after pollination, respectively, based on landmark developmental events as follows. (i) Pani| morphogenesis (S3); 11–20 DAP, embryo maturation (S4); 21–29 DAP, dormancy and desiccation tolerance (S5) ( Itoh et al. , 2005 , Jain et al. , 2007 ). For abiotic stress analysis (GSE6901{{tag}}), rice seedlings were transferred to a beaker containing 200 mM NaCl solution for salt stress, dried between folds of tissue paper at 28±1 °C in a culture room for d|of Arabidopsis AGP-encoding genes were downloaded using ‘Bulk Gene Download’ at Nottingham Arabidopsis Stock Centre's microarray database, and the results of developmental stages (GSE5629–5633) and stress treatments (GSE5620–5621 and 5623–5624) were used to analyse the expression of AGP-encoding genes in Arabidopsis ( http://affymetrix.arabidopsis.info/narr|les from UniGene at http://www.ncbi.nlm.nih.gov/unigene/ ; 12, MPSS tags, http://mpss.udel.edu/rice/ ; 13, the absolute signal values were downloaded at http://signal.salk.edu/cgi-bin/RiceGE ; 14, GSE6893, expression at various developmental stages; 15, GSE661, expression under ABA and GA treatments; 16, GSE6901{{tag}}, expression under abiotic stresses treatments. All PAST-rich proteins used for final ana|and OsAGP22 , and OsAGP25 and OsAGP26 ) ( Fig. 6 ). Fig. 6. Expression profiles of AGP-encoding genes in various rice organs and tissues at different development stages. 
The microarray data sets (GSE6893) of gene expression at various developmental stages were used for cluster display. A heat map representing hierarchical clustering of average log signal values of all the AGP-encoding genes in vari|6 OsAffx-28335-1-S1_at 5696.2 233.3 0.04 Y OsLLA7 Os-9342-1-S1_at 3120.1 1746.4 0.56 OsLLA8 OsAffx-18220-1-S1_x_at 87.0 212.5 2.44 Y a The probe ID is used on microarray plate GPL2025. The experiment GSE6893 was used for differential expression analysis. b MOV, the maximum absolute values of vegetative tissues. c MOR, the maximum absolute values of reproductive tissues. d The values of MOR divided by M|panicles. Expression analysis of rice AGP-encoding genes under abiotic stress, ABA, and GA treatments To investigate the abiotic stress response of rice AGP-encoding genes, the results of microarray (GSE6901{{tag}}) from 7-day-old seedlings subjected to drought, salt, and cold stresses were analysed. The data revealed that a total of 15 genes were significantly down- or up-regulated (<0.5 or >|h, and down-regulated by drought and salt stresses ( Fig. 8 ). Fig. 8. Expression profiles of rice AGP-encoding genes differentially expressed under abiotic stresses. The microarray data sets (GSE6901{{tag}}--REUSE--) of gene expression under various abiotic stresses were used for cluster display. The average log signal values of AGP-encoding genes under control and various stress conditions (indicated at the t|values) is shown at the bottom. CK, control; DS, drought stress; SS, salt stress; CS, cold stress. The results of ABA- and GA-treated callus were used to analyse the regulation of AGP-encoding genes (GSE661). It was found that many AGP-encoding genes were regulated by ABA and GA ( Supplementary Table S5 at JXB online). To verify the results of microarray under ABA and GA treatments, the transcriptio | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
734 | GSE6901 | 5/17/2007 | ['6901'] | [] | [u'17293439', u'19490115', u'19788421'] | 2761422 | [u'19758430'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['May', 'Vanholme', 'Davey', 'Swennen', 'Graham', 'Keulemans'] | [] | BMC Genomics | 2009 | 9/16/2009 | 1 | AND pmc_gds | 0 | 1 | ||||
735 | GSE6901 | 5/17/2007 | ['6901'] | [] | [u'17293439', u'19490115', u'19788421'] | 2576257 | [u'18826656'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['Kapoor', 'Lama', 'Tyagi', 'Nijhawan', 'Arora', 'Khurana'] | ['Nijhawan', 'Khurana', 'Kapoor', 'Tyagi', 'Arora'] | BMC Genomics | 2008 | 10/1/2008 | 0 | otein sequences of rice and Arabidopsis . Protein sequences from Drosophila, humans, yeast and C. elegans were included as outgroups. Plant-specific clades have been shaded. Protein sequences were downloaded from National Center for Biotechnology Information (NCBI). Accession numbers and abbreviations are as follows: HsDcr1, Homo sapiens Dicer-1 (NP_085124); DmDcr1, Drosophila melanogaster , Dicer-1|calization and phylogenetic analysis Name search and Hidden Markov Model (HMM) analysis was employed to search for Dicer-like, Argonautes and RDRs genes encoded in the rice genome. The sequences were downloaded from TIGR, release 5 . An HMM profile was generated using HMMER 2.1.1 software package which was then used to search the proteome database of rice available at TIGR using the Basic Local Alignme|the previously identified genes and on the basis of their phylogenetic relatedness to other members of the same family. ORF length, and details of encoded proteins (length, PI, molecular weight) were downloaded from TIGR. For proteins whose molecular weight and PI was not available in TIGR, Gene Runner program version 3.04 was used to calculate the same. Genes were localized on the chromosomes based on th|iles for 22 stages of vegetative and reproductive development and stress response in rice. Of these, microarray analysis of 17 stages was described previously [ 27 ]; deposited in the Gene Expression Omnibus database at the National Center for Biotechnology Information under the series accession numbers GSE6893 and GSE6901{{tag}}). 
Here, five more stages viz, Y-leaf, SAM (shoot apical meristem), P1-I (<|idopsis expression analysis To analyze the expression of Arabidopsis genes, Affymetrix GeneChip ® ATH1 Genome Array data for 21 stages (55 .cel files) comparable to that used for rice were downloaded from Gene Expression Omnibus (GEO) database at the NCBI under the series accession numbers GSE5620, GSE5621, GSE5623, GSE5624, GSE5629, GSE5630, GSE5631, GSE5632 and GSE5634. The data were imported| as reference. Click here for file Additional file 4 GCRMA normalized expression values for Dicer-like, Argonaute and RDR genes in Arabidopsis from microarray data in public domain (Gene Expression Omnibus, ). Click here for file Acknowledgements TIGR database resources are acknowledged for making available the detailed sequence information on rice. Expression data for Arabidopsis has been obtained | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
736 | GSE6901 | 5/17/2007 | ['6901'] | [] | [u'17293439', u'19490115', u'19788421'] | 2825235 | [u'20109239'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['Ghanashyam', 'Jain', 'Bhattacharjee'] | ['Jain'] | BMC Genomics | 2010 | 1/29/2010 | 0 | olved in a particular biological process. In the second approach, we used the microarray data for various tissues/organs and developmental stages available at GEO database under the accession numbers GSE6893 and GSE7951. The series GSE6893 includes microarray data from 45 hybridizations representing three biological replicates each of 15 different tissues/organs and developmental stages [ 30 ], whereas| GSE7951 includes the microarray data from 12 hybridizations representing 9 different tissue samples [ 33 ]. Because three biological replicates were available only for stigma and ovary in the series GSE7951 dataset, only these data were used in this analysis. All the tissues/organs and developmental stages for which microarray data was analyzed in this study are summarized in Additional file 5 . The |ted [ 6 , 40 ]. To study the effect of various abiotic stresses (desiccation, salt, cold and arsenate) on the expression profiles of GST genes, microarray data available under series accession number GSE6901{{tag}}--REUSE-- [ 30 ] was analyzed. Differential expression analysis for rice seedlings treated with different abiotic stresses (desiccation, salt and cold) as compared to mock-treated control seedlings was perfo|vely. The control seedlings were kept in water for 3 h, at 28 ± 1°C. 
Microarray data analysis The microarray data publicly available at GEO database under the series accession numbers GSE6893 (expression data for reproductive development), GSE7951 (expression profiling of stigma), GSE6901{{tag}}--REUSE-- (expression data for stress treatment), GSE4471 (expression data from rice varieties Azucena and Ba| arsenate), GSE5167 (expression data for auxin and cytokinin response), GSE6719 (expression data for cytokinin response), GSE7256 (expression data for virulent infection by Magnaporthe grisea ), and GSE10373 (expression data for interaction with the parasitic plant Striga hermonthica ) were used for expression analysis of rice GST genes. The entire microarray experiments used in this study are listed | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
737 | GSE6901 | 5/17/2007 | ['6901'] | [] | [u'17293439', u'19490115', u'19788421'] | 2993539 | [u'21044985'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['Jhanwar', 'Garg', 'Tyagi', 'Jain'] | ['Tyagi', 'Jain'] | DNA Res | 2010 | 2010 Dec | 0 | rice and TAIR database for Arabidopsis . Another approach used was to search for the proteins containing PFam domain PF00462, which encodes GRX domain. Further, the rice and Arabidopsis proteomes downloaded from RGAP (version 6.1) and TAIR (version 9), respectively, were searched using the hidden Markov model (HMM) profile (build 2.3.2) for GRX domain (PF00462) downloaded from the PFam database. The r| gene in a given library represents the quantitative estimate of the expression of that gene. 2.6. Expression analysis using microarray data For rice, the microarray data available at Gene Expression Omnibus (GEO) database under the series accession numbers GSE6893 (various stages of development), 21 GSE6901{{tag}}--REUSE-- (abiotic stress), 21 GSE4471 (arsenate stress), 22 GSE5167 (auxin and cytokinin treatment), 23|ays as described earlier. 21 For Arabidopsis , the microarray data for various stages of developments, hormone treatments and abiotic stress treatments corresponding to those analyzed for rice were downloaded from GEO (series accession numbers GSE5620, GSE5621, GSE5623, GSE5624 and GSE5629–GSE5634) and AtGenExpress ( http://www.arabidopsis.org/portals/expression/microarray/ATGenExpress.jsp ) dat | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
738 | GSE6901 | 5/17/2007 | ['6901'] | [] | [u'17293439', u'19490115', u'19788421'] | 1947970 | [u'17640358'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['Agarwal', 'Kapoor', 'Tyagi', 'Arora', 'Singh', 'Ray'] | ['Agarwal', 'Kapoor', 'Tyagi', 'Arora', 'Ray'] | BMC Genomics | 2007 | 7/18/2007 | 0 | Affymetrix GeneChip ® ATH1 Genome Array, from stages comparable to those used for rice was obtained from Gene Expression Omnibus (GEO) database at the NCBI under the series accession numbers GSE5620, GSE5621, GSE5623, GSE5624, GSE5629, GSE5630, GSE5631, GSE5632 and GSE5634. Total of 55 CEL files representing 21 stages of development as well as stress treatments were downloaded from [ 79 ] and |d 79 were studied using QPCR. Raw microarray data have been deposited in the Gene Expression Omnibus database at the National Center for Biotechnology Information under the series accession numbers GSE6893 and GSE6901{{tag}}--DEPOSIT--. QPCR Real time PCR reactions were carried out using the same RNA samples, which were used for microarrays as described earlier [ 81 ]. In brief, primers were designed for all the genes | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
739 | GSE6901 | 5/17/2007 | ['6901'] | [] | [u'17293439', u'19490115', u'19788421'] | 2245831 | [u'18065552'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['Nijhawan', 'Khurana', 'Tyagi', 'Jain'] | ['Nijhawan', 'Khurana', 'Tyagi', 'Jain'] | Plant Physiol | 2008 | 2008 Feb | 0 | ts (Affymetrix) using 5 μ g of high-quality total RNA as starting material for each sample as described earlier ( Jain et al., 2007 ). Affymetrix GeneChip Rice Genome Arrays (Gene Expression Omnibus platform accession no. GPL2025 ) were used for microarray analysis. For microarray data analysis, the image (cel) files were imported into ArrayAssist (version 5.0) software. Three biological replic|ponding FL-cDNA and ESTs. FL-cDNA and EST sequences showed minimal alignment over 90% length of the transcript with 95% identity. Microarray data from this article can be found in the Gene Expression Omnibus database at the NCBI under accession numbers GSE6893 and GSE6901{{tag}}--DEPOSIT-- . Supplemental Data The following materials are available in the online version of this article. Supplemental Figure S1. Position | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
740 | GSE6901 | 5/17/2007 | ['6901'] | [] | [u'17293439', u'19490115', u'19788421'] | 1851844 | [u'17293439'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Malik', 'Mohan', 'Khurana', 'Deveshwar', 'Ray'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Khurana', 'Ray'] | ['Agarwal', 'Kapoor', 'Sharma', 'Jain', 'Tyagi', 'Nijhawan', 'Arora', 'Khurana', 'Ray'] | Plant Physiol | 2007 | 2007 Apr | 0 | AND pmc_gds | 1 | 0 | ||||
741 | GSE6903 | 8/3/2007 | ['6903'] | ['3056'] | [u'17431181'] | 2801700 | [u'20003344'] | [u'Tumannov', 'Wang', 'Lo', 'Bamji', 'Getz', 'Fu', 'Yao', 'Reardon', 'Tumanov'] | ['Van', 'Marchi', 'Romualdi', 'Müller', 'Radonjic', 'Calura', 'Cavalieri'] | [] | BMC Genomics | 2009 | 12/11/2009 | 0 | tails). Table 1 Meta-analysis data collection. PPARα signaling n° Reference Dataset Accession Number GEO/AE Org Tissue Technology PPARα signaling activated by WY14643 1 [ 55 ] GSE8302/E-GEOD-8302 Hs Liver Affymetrix 2 [ 55 ] GSE8302/E-GEOD-8302 Mm Liver Affymetrix 3 [ 55 ] GSE8302/E-GEOD-8302 Rn Liver Affymetrix PPARα signaling repressed using PPARα knokout mice |] GSE8291/E-GEOD-8291 Mm Liver Affymetrix 6 [ 55 ] GSE8292/E-GEOD-8292 Mm Liver Affymetrix 7 [ 55 ] GSE8295/E-GEOD-8295 Mm Liver Affymetrix PPARα signaling activated by High fat diet 8 [ 56 ] GSE8753/E-GEOD-8753 Mm Liver Affymetrix 9 [ 57 ] GSE6903{{tag}}--REUSE--/E-GEOD-6853 Mm Liver Affymetrix 10 [ 58 ] GSE8524/~ Mm Liver Affymetrix 11 [ 59 ] GSE1560/E-GEOD-1560 Mm Aorta Oligo Array 12 [ 60 ] GSE8700/E-GEOD-|Table 2 Validation data sets. 
n° PPARα signalling Reference Dataset Accession Number GEO/AE Org Tissue Technology 1 PPARα signaling activated by WY14643 (PPARα WY14643-GSE8396) [ 64 ] GSE8396/E-GEOD-8396 Mm Liver Affymetrix 2 PPARα signaling repressed using PPARα knokout mice (PPARα KO1-GSE8396) [ 64 ] GSE8396/E-GEOD-8396 Mm Liver Affymetrix 3 PPA|α signaling activated by High fat diet (HFD-E-MEXP-1755) [ 65 ] ~/E-MEXP-1755 Mm Liver Affymetrix 5 Oleate response repressed using knokout yeast of a transcription promoter (del_ADR1) [ 50 ] GSE5862/~ Sc ~ Oligo Array 6 Oleate response repressed using knokout yeast of a transcription promoter (del_PIP2) [ 50 ] GSE5862/~ Sc ~ Oligo Array 7 Oleate response repressed using knokout yeast of a tran|y 8 Oleate response activated using knokout yeast of a transcription repressor (del_OAF3) [ 50 ] GSE5862/~ Sc ~ Oligo Array 9 Oleate response activated by High fat diet (oleate_vs_low_glucose) [ 50 ] GSE5862/~ Sc ~ Oligo Array Statistical analysis of microarray data Gene expression of Affymetrix datasets were quantified and separately normalized using rma technique [ 17 ] and. EntrezGene Custom CDF f|ers of biological replicates as required for powerful inference. Fold change cut-off, filtered by variance coefficient, was used to select DEGs in those datasets with less than 3 replicates per gene (GSE8302, GSE9291 and GSE9290). Pathways analysis on DEGs Enrichment analysis on metabolic pathways was calculated for each dataset using Fisher exact test based on hypergeometric distribution with a p-val | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
742 | GSE6904 | 10/31/2007 | ['6904'] | ['3107'] | [u'18021443'] | 2216081 | [u'18021443'] | ['Piontkivska', 'Mintz', 'Porterfield'] | ['Piontkivska', 'Mintz', 'Porterfield'] | ['Piontkivska', 'Mintz', 'Porterfield'] | BMC Neurosci | 2007 | 11/19/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
743 | GSE6907 | 3/1/2007 | ['6907'] | ['2780'] | [u'17547211'] | 2862047 | [u'20353558'] | ['Okabe', 'Shimazaki', 'Yokoo', 'Kawata'] | ['Lam', 'Gong', 'Mathavan', 'Ung', 'Korzh', 'Winata', 'Hlaing'] | [] | BMC Genomics | 2010 | 3/30/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
744 | GSE6908 | 3/19/2007 | ['6908'] | ['2631'] | [u'17369434'] | 2633829 | [u'19074628'] | ['Vitulli', 'Licausi', 'Beretta', 'Novi', 'Magneschi', 'Gonzali', 'Loreti', 'Perata', 'Lasanthi-Kudahettige', 'Alpi'] | ['Millar', 'Usadel', 'Carroll', 'Narsai', 'Ivanova', 'Howell', 'Lohse', 'Whelan'] | [] | Plant Physiol | 2009 | 2009 Feb | 1 | later time points. This group was enriched in transcription factors and signal transduction components. A subset of these transiently expressed transcription factors were further interrogated across publicly available rice array data, indicating that some were only expressed during the germination process. Analysis of the 1-kb upstream regions of transcripts displaying similar changes in abundance identi| in the transcript abundance observed for over 6,000 genes in cluster 1). To determine if these were specific to the process of germination, we analyzed the expression of transcription factors across publicly available rice Affymetrix microarray data. These included analyses of over 30 microarrays from different tissues and stress treatments, and following normalization, all data were made relative to max|ession. A, Analysis of the expression profiles of 34 transcription factors that displayed between 70% and 100% of their maximum expression at 1 and 3 HAI in the germination time course, compared with publicly available array data for a variety of rice tissues and treatments. Boxed in yellow are transcription factors that appeared only to be induced during germination (i.e. in this study). Boxed in blue ar|calization. To date, only a few large-scale localization studies have been carried out, so less than 300 could be assigned in this way. 
In order to overcome this, all protein sequence information was downloaded for the 24,150 genes, and four primary sources were employed: (1) experimentally shown localization based on protein work ( Heazlewood et al., 2003 ; Howell et al., 2006 , 2007 ; Kleffmann et al|ss different tissues under different conditions and compare these with the germination transcript abundance profiles generated from this study, rice array data were retrieved from the Gene Expression Omnibus within the National Center for Biotechnology Information database. All data were MAS5.0 normalized and normalized against average ubiquitin expression for that array. These normalized array data were|r data relative to this. This normalization allowed cross-comparison of arrays from all of the different studies at once. The arrays analyzed included all of the arrays from this study, together with publicly available rice genome arrays carried out from different tissues/conditions, including 7-d-old seedlings that were untreated, drought stressed, salt stressed, or cold stressed ( GSE6901 ; Jain et al.| d following pollination, 10-d-old embryos, 10-d-old endosperms, seedling roots, seedling shoots, unpollinated stigmas (at antithesis), ovaries (at antithesis), mature anthers, and suspension cells ( GSE7951 ; Li et al., 2007 ); aerobically grown coleoptiles (4 d) and anoxically grown coleoptiles (4 d; GSE6908{{tag}} ; Lasanthi-Kudahettige et al., 2007 ); crowns and growing points under salt stress and con|and tolerant mutants in subspecies indica and japonica ( GSE4438 ; Walia et al., 2007 ); crowns and growing points under control and salt stress conditions in subspecies indica and japonica (GDS1383; Walia et al., 2005 ); and leaves following biotic stress and control treatments ( GSE7256 ; Ribot et al., 2008 ). Promoter Motif Analysis Following expression analysis, distinct groups of transc|ome “peaking” subsets, where a peak is as defined above. 
3′ UTR Sequence Analysis The full genome 3′ UTR and 5′ UTR sequences are available from TIGR. This was downloaded and filtered to retain only the 3′ UTRs. However, this only added up to 3,027 UTRs available for the “whole genome.” Taking this small number into consideration, it was not |uer et al., 2005 ), and normalization of matched peak areas to the peak area of the internal standard, ribitol, and to fresh tissue weight of extracted samples. The MSRI library was constructed using publicly available AMDIS software (version 2.65) to extract MSRI information for authentic standard derivatives from standard runs and MSRI information for unknown analytes from representative analyses of com | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
745 | GSE6908 | 3/19/2007 | ['6908'] | ['2631'] | [u'17369434'] | 2633852 | [u'19010998'] | ['Vitulli', 'Licausi', 'Beretta', 'Novi', 'Magneschi', 'Gonzali', 'Loreti', 'Perata', 'Lasanthi-Kudahettige', 'Alpi'] | ['Taylor', 'Millar', 'Eubel', 'Narsai', 'Whelan', 'Huang'] | [] | Plant Physiol | 2009 | 2009 Feb | 1 | ntal stages and in response to stresses. To analyze the gene expression pattern, we extracted the available rice microarray data from the National Center for Biotechnology Information gene expression omnibus ( http://www.ncbi.nlm.nih.gov/geo ) of six independent studies with relevance for mitochondrial function ( Walia et al., 2005 , 2007 ; Jain et al., 2007 ; Lasanthi-Kudahettige et al., 2007 ; Li e|anges across different tissues and under different conditions and to compare these with the obtained germination transcript abundance profiles, rice array data were retrieved from the Gene Expression Omnibus within the National Center for Biotechnology Information database ( GSE6901 , GSE7951 GSE6908{{tag}}--REUSE-- , GSE4438 , GDS1383, and GSE7256 ). All data were MAS5.0 normalized and normalized against average u|all other data relative to this. This normalization allowed cross-comparison of arrays from all of the different studies at once. The arrays analyzed included all arrays from this study together with publicly available rice genome arrays carried out from different tissues/conditions. Hierarchical clustering across all of the arrays was carried out with average linkage clustering based on Euclidian distanc | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
746 | GSE6908 | 3/19/2007 | ['6908'] | ['2631'] | [u'17369434'] | 2923530 | [u'20353606'] | ['Vitulli', 'Licausi', 'Beretta', 'Novi', 'Magneschi', 'Gonzali', 'Loreti', 'Perata', 'Lasanthi-Kudahettige', 'Alpi'] | ['Narsai', 'Ivanova', 'Whelan', 'Ng'] | [] | BMC Plant Biol | 2010 | 3/31/2010 | 0 | nd germinating seed Dry seed and anaerobic germination (up to 24 h) and switch conditions cv. Amaroo [ 21 ] E-MEXP-2267 3 36 Imbibed seed Aerobic and anaerobic grown coleoptiles cv. Nipponbare [ 27 ] GSE6908{{tag}}--REUSE-- 2 4 Coleoptile Embryo, endosperm, leaf and root from 7-d seedling, 10-d seedling cv. Zhonghua [ 28 ] GSE11966 2 10 Embryo, endosperm, leaf and root from 7-d seedling, 10-d seedling Stigma, Ovary+7 |af, semi apical meristem, inflorescence, seed cv. IR64 [ 30 ] GSE6893 3 45 Mature leaf, young leaf, semi apical meristem, inflorescence, seed ABIOTIC STRESS Drought, salt, cold stress cv. IR64 [ 30 ] GSE6901 3 12 Seedling Heat stress cv. Zhonghua [ 31 ] GSE14275 3 6 Seedling Salt stress on 2 cultivars; indica, FL478 (salt tolerant), indica, IR29 (salt sensitive) [ 32 ] GSE3053 3 11 Crown and growing po| tolerant) [ 33 ] GSE4438 3 24 Panicle initiation stage Salt stress on root using 4 cultivars; FL478 (salt tolerant), IR29 (salt sensitive), IR63731 (salt tolerant), Pokkali (salt tolerant) Not found GSE14403 3 23 Root Fe and P treatments cv. Nipponbare [ 34 ] GSE17245 2 16 Root Arsenate treatment cv. Azucena [ 35 ] GSE4471 3 12 Seedling Physical stress at roots tips cv. Bala [ 35 ] GSE10857 3 12 Root |pponbare (resistant), IAC165 (susceptible) [ 36 ] GSE10373 2 24 Root M.grisea blast fungus infection cv. Nipponbare [ 37 ] GSE7256 2 8 Leaf Rice stripe virus infection cv. WuYun3, KT95-418 Not found GSE11025 3 12 Seedling Infection with bacteria X.Oryzae pv. oryzicola and oryzae cv. Nipponbare Not found GSE16793 4 60 Whole-plant tissue HORMONE TREATMENT Cytokinin treatment on root and leaf cv. 
Nippo|more tissue/stress microarray experiments, notably, this included Actin1 (LOC_Os05g36290.1; Gene 14 in Table 2 ) which was not expressed in all 3 biological replicates of the semi apical meristem (GSE6901) (Figure 2C ). Similarly, a recent study in rice defined a set of 248 stably expressed genes across 40 developmental tissues that were analysed using Yale/BGI oligonucleotide microarrays [ 22 ]. O|stance. The clustering analysis and heatmap generation was carried out using Partek Genomics Suite, version 6.3 (Partek). For the Agilent microarray comparison, data was retrieved under the accession GSE8518 from the Gene Expression Omnibus within the National Centre for Biotechnology Information database. Analysis of orthologues The InParanoid: Eukaryotic Orthologue Groups database (version 7.0) was u | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
747 | GSE6908 | 3/19/2007 | ['6908'] | ['2631'] | [u'17369434'] | 2528094 | [u'18650402'] | ['Vitulli', 'Licausi', 'Beretta', 'Novi', 'Magneschi', 'Gonzali', 'Loreti', 'Perata', 'Lasanthi-Kudahettige', 'Alpi'] | ['Liu', 'Zhao', 'Lu', 'Han', 'Huang'] | [] | Plant Physiol | 2008 | 2008 Sep | 0 | a sequences, by BACs directly or by 87 assembled contigs, were performed. The alignment results of BGI 93-11 contigs and Nipponbare pseudomolecules, which were generated by the software nucmer, were downloaded using the GFF Dumper on the TIGR Genome Browser. We found that a small quantity of anchor results were self-contradictory; that is, two 93-11 contigs that localized on the same location yielded opp| more than 100 bp were further confirmed by BLAST2. The indica Guangluai 4 BACs were obtained from http://www.ncgr.ac.cn/chinese/databasei.htm . The genomic sequences of japonica Nipponbare were downloaded from http://www.tigr.org/tdb/e2k1/osa1 , and the indica 93-11 sequences were downloaded from ftp://ftp.genomics.org.cn . Mining of TIPs in the Rice Genome For each insertion region identified a| known TE repeat databases using RepeatMasker, as described above. Those elements, which were composed of a single LTR, were recognized as solo LTR retroelements. EST Analysis and Gene Prediction All publicly available rice ESTs were obtained from the National Center for Biotechnology Information EST database ( http://www.ncbi.nlm.nih.gov/projects/dbEST/ ). Full-length cDNAs of both KOME ( http://red.dna.|t the two gene fragments of indica XIP-I separated by a TE insertion. The probes in the two probe sets were remapped to the rice genomes, Nipponbare pseudomolecules and 93-11 contigs, by BLASTN. We downloaded the microarray data files of each experiment from the GEO Web site ( http://www.ncbi.nlm.nih.gov/geo/ ). Overall, there are 57 chips of indica IR64 (45 from GSE6893 and 12 from GSE6901 ) and 4 | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
748 | GSE6908 | 3/19/2007 | ['6908'] | ['2631'] | [u'17369434'] | 2726226 | [u'19604350'] | ['Vitulli', 'Licausi', 'Beretta', 'Novi', 'Magneschi', 'Gonzali', 'Loreti', 'Perata', 'Lasanthi-Kudahettige', 'Alpi'] | ['Feng', 'Zhu', 'Wang', 'Zhang', 'Liu', 'Wu'] | [] | BMC Genomics | 2009 | 7/15/2009 | 0 | r, a site-specific posterior analysis was used to predict amino acid residues that were crucial for functional divergence. Investigation of transcription patterns Gene expression microarray datasets (GSE7951, GSE13161, GSE6893, GSE6908{{tag}}--REUSE--, and GSE6901 for rice; GSE680, GSE7641, and GSE8365 for Arabidopsis ) were downloaded from the GEO database in NCBI. The microarray data of rice include the analysis of | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
749 | GSE6908 | 3/19/2007 | ['6908'] | ['2631'] | [u'17369434'] | 1913783 | [u'17369434'] | ['Vitulli', 'Licausi', 'Beretta', 'Novi', 'Magneschi', 'Gonzali', 'Loreti', 'Perata', 'Lasanthi-Kudahettige', 'Alpi'] | ['Vitulli', 'Licausi', 'Beretta', 'Novi', 'Magneschi', 'Gonzali', 'Loreti', 'Perata', 'Lasanthi-Kudahettige', 'Alpi'] | ['Vitulli', 'Licausi', 'Beretta', 'Novi', 'Magneschi', 'Gonzali', 'Loreti', 'Perata', 'Lasanthi-Kudahettige', 'Alpi'] | Plant Physiol | 2007 | 2007 May | 0 | AND pmc_gds | 1 | 0 | ||||
750 | GSE6916 | 1/31/2007 | ['6916'] | ['2719'] | [u'15496517'] | 2944139 | [u'20819218'] | ['', 'Griswold', 'Shima', 'Uzumcu', 'Skinner', 'Small'] | ['Lam', 'Ye', 'Dasgupta', 'Li'] | [] | BMC Syst Biol | 2010 | 9/6/2010 | 0 | roach is highly efficient to identify evolutionarily conserved gene modules and novel genes in meiotic prophase. Methods Metagene construction Pairwise ortholog groups of yeast, mouse, and human were downloaded from Inparanoid, a database of eukaryotic orthologs [ 52 ]. Only seed orthologs found through a reciprocal best match between two genomes were kept for metagene construction. Three types of metagen|gned to only one metagene. Microarray expression profiles Four time-series microarray studies were selected to investigate global gene expression of meiotic prophase in yeast, mouse postnatal testis (GSE12769), mouse embryonic ovary (GSE6916{{tag}}--REUSE--), and human fetal ovary (GSE15431) [ 5 , 6 , 8 , 9 ]. These experiments all used the Affymetrix microarray platform. For the yeast experiment, gene expression was | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
751 | GSE6916 | 1/31/2007 | ['6916'] | ['2719'] | [u'15496517'] | 2711087 | [u'19538736'] | ['', 'Griswold', 'Shima', 'Uzumcu', 'Skinner', 'Small'] | ['Nedorezov', 'Karmazin', 'Forabosco', 'Ottolenghi', 'Uda', 'Cao', 'Piao', 'Schlessinger', 'Garcia-Ortiz', 'Cole', 'Pelosi', 'Omari'] | [] | BMC Dev Biol | 2009 | 6/18/2009 | 0 | AND pmc_gds | 0 | 1 | ||||
752 | GSE6917 | 7/18/2007 | ['6917'] | [] | [u'17616710'] | 3016711 | [u'17616710'] | ['Stoner', 'Nines', 'Dombkowski', 'Salagrama', 'Cukovic', 'Kresty', 'Reen', 'Mele'] | ['Stoner', 'Nines', 'Dombkowski', 'Salagrama', 'Cukovic', 'Kresty', 'Reen', 'Mele'] | ['Stoner', 'Nines', 'Dombkowski', 'Salagrama', 'Cukovic', 'Kresty', 'Reen', 'Mele'] | Cancer Res | 2007 | 7/1/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
753 | GSE6919 | 1/30/2007 | ['6919'] | ['2547', '2546', '2545'] | [u'17430594', u'15254046'] | 2584944 | [u'18941464'] | ['', 'Michalopoulos', 'Jing', 'Ma', 'Luo', 'McDonald', 'Becich', 'Finkelstein', 'Thomas', 'Bisceglia', 'Liu', 'Ren', 'Nelson', 'Liang', 'Dhir', 'Landsittel', 'Yu', 'Lyons-Weiler', 'Monzon', 'Chandran'] | ['Wang', 'Tong', 'Sun', 'Cai', 'Chen', 'Xie'] | [] | Br J Cancer | 2008 | 11/18/2008 | 0 | ession was changed with PCa progression After analysing the Cyr61 expression profile in the gene expression array-based database of human metastatic prostate tumours and primary prostate tumours ( GSE6919{{tag}}--REUSE-- ), we found that the normal prostate tissues showed very low level of Cyr61 expression, whereas the tumour-adjacent tissues and the tumour tissues demonstrated a significantly higher expression lev|tudy further showed that the metastasis of Du145 was inhibited by the silencing of Cyr61. A microarray analysis was conducted ( Bacac et al , 2006 ) (data accessible at NCBI GEO database, accession GSE5945 ) to evaluate the response of stromal cells to tumour invasion. Cyr61 expression level was found higher in the stromal cells from invasive prostate tumour tissue than that in the stromal cells from | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
754 | GSE6919 | 1/30/2007 | ['6919'] | ['2547', '2546', '2545'] | [u'17430594', u'15254046'] | 2935420 | [u'20823315'] | ['', 'Michalopoulos', 'Jing', 'Ma', 'Luo', 'McDonald', 'Becich', 'Finkelstein', 'Thomas', 'Bisceglia', 'Liu', 'Ren', 'Nelson', 'Liang', 'Dhir', 'Landsittel', 'Yu', 'Lyons-Weiler', 'Monzon', 'Chandran'] | ['Nam'] | [] | Bioinformatics | 2010 | 9/15/2010 | 0 | mal/cancer prostate expression data (Chandran et al. , 2007 ; Yu et al. , 2004 ). The data are available from the GEO database ( http://www.ncbi.nlm.nih.gov/projects/geo ) with the series number GSE6919{{tag}}--REUSE-- . We used the 171 samples with the platform of HG-U95A. The data are composed of 18 normal samples without any pathogenetic alterations, 63 normal samples from cells adjacent to prostate tumors, 65 | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
755 | GSE6919 | 1/30/2007 | ['6919'] | ['2547', '2546', '2545'] | [u'17430594', u'15254046'] | 2880288 | [u'20459635'] | ['', 'Michalopoulos', 'Jing', 'Ma', 'Luo', 'McDonald', 'Becich', 'Finkelstein', 'Thomas', 'Bisceglia', 'Liu', 'Ren', 'Nelson', 'Liang', 'Dhir', 'Landsittel', 'Yu', 'Lyons-Weiler', 'Monzon', 'Chandran'] | ['Patel', 'Butte'] | [] | BMC Med Genomics | 2010 | 5/6/2010 | 0 | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 | |
756 | GSE6919 | 1/30/2007 | ['6919'] | ['2547', '2546', '2545'] | [u'17430594', u'15254046'] | 2978222 | [u'20969778'] | ['', 'Michalopoulos', 'Jing', 'Ma', 'Luo', 'McDonald', 'Becich', 'Finkelstein', 'Thomas', 'Bisceglia', 'Liu', 'Ren', 'Nelson', 'Liang', 'Dhir', 'Landsittel', 'Yu', 'Lyons-Weiler', 'Monzon', 'Chandran'] | ['Berger', 'De', 'Pierre', 'Depiereux', 'Bareke', 'Gaigneaux', 'Habra'] | [] | BMC Bioinformatics | 2010 | 10/22/2010 | 0 | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 | |
757 | GSE6919 | 1/30/2007 | ['6919'] | ['2547', '2546', '2545'] | [u'17430594', u'15254046'] | 2880990 | [u'20433688'] | ['', 'Michalopoulos', 'Jing', 'Ma', 'Luo', 'McDonald', 'Becich', 'Finkelstein', 'Thomas', 'Bisceglia', 'Liu', 'Ren', 'Nelson', 'Liang', 'Dhir', 'Landsittel', 'Yu', 'Lyons-Weiler', 'Monzon', 'Chandran'] | ['Berger', 'Michiels', 'Pierre', 'Depiereux', 'DeHertogh', 'Bareke', 'DeMeulder', 'Gaigneaux'] | [] | BMC Cancer | 2010 | 4/30/2010 | 0 | Abstract The rapid accumulation of gene expression data has offered unprecedented opportunities to study human diseases. The National Center for Biotechnology Information Gene Expression Omnibus is currently the largest database that systematically documents the genome-wide molecular basis of diseases. However, thus far, this resource has been far from fully utilized. This paper describes th|ferences  The rapid accumulation of high-throughput genomic data offers an unprecedented opportunity to study human diseases. The National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) ( 1 ) with more than 330,000 gene expression profiles and an annual growth rate of 150%, is currently the largest database of its kind. The GEO systematically documents the molecular basis of m|rmance after Stage II refinement. ( B ) An example illustrating the error correction by the Stage II refinement. The query profile studies uterine leiomyomas obtained from fibroid afflicted patients (GDS484). The profile is annotated with four concepts by UMLS text mapping: Connective/Soft Tissue Neoplasm, Muscle tissue neoplasm, fibroid tumor, and uterine fibroids. The Stage I diagnosis predicted four| prediction is later corrected by Stage II refinement. ( C ) The figure presents the 110 disease classes and their hierarchical relationships. 
The red nodes represent diagnosed disease concepts for GDS563: (1) Nervous system disorder (2) Neuromuscular diseases (3) Myopathy (4) Musculoskeletal diseases (5) Congenital, Hereditary, and Neonatal diseases and abnormalities (CHNDA) (6) Genetic diseases, in| prediction performance decreases with the data reduction. Table 1. Prediction result of a subset of prevalent diseases We further exemplify the performance of our approach using the NCBI GEO dataset GDS563. This dataset was produced to identify modifying factors and pathogenic pathways involved in Duchenne Muscular Dystrophy (DMD). It consists of 24 microarrays from two subsets: 12 from DMD patients a|mance is shown in SI Text . A closer examination of the results shows further interesting features of our method. One example comes from the result for a query profiling the T-cells of HIV patients (GDS2649). Even though HIV is not included in the 110 disease classes of our diagnosis database due to the lack of sufficient training data, we obtain the relevant concept RNA virus infection that can descr|.org/cgi/content/full/0912043107/DCSupplemental . References 1. Edgar R, Domrachev M, Lash AE. Gene expression omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002; 30 :207–210. [ PMC free article ] [ PubMed ] 2. Horton PB, Kiseleva L, Fujibuchi W. RaPiDS: an algorith|ssion: directed search of large microarray compendia. Bioinformatics. 2007; 23 (20):2692–2699. [ PubMed ] 5. Zhu Y, et al. GEOmetadb: powerful alternative search engine for the Gene Expression Omnibus. Bioinformatics. 2008; 24 (23):2798–2800. [ PMC free article ] [ PubMed ] 6. Shah NH, et al. Ontology-driven indexing of public datasets for translational bioinformatics. BMC Bioinformatics|e expression: directed search of large microarray compendia. Bioinformatics. 2007 Oct 15; 23(20):2692-9. [Bioinformatics. 
2007] GEOmetadb: powerful alternative search engine for the Gene Expression Omnibus. Bioinformatics. 2008 Dec 1; 24(23):2798-800. [Bioinformatics. 2008] Ontology-driven indexing of public datasets for translational bioinformatics. BMC Bioinformatics. 2009 Feb 5; 10 Suppl 2():S1. |e expression: directed search of large microarray compendia. Bioinformatics. 2007 Oct 15; 23(20):2692-9. [Bioinformatics. 2007] GEOmetadb: powerful alternative search engine for the Gene Expression Omnibus. Bioinformatics. 2008 Dec 1; 24(23):2798-800. [Bioinformatics. 2008] Ontology-driven indexing of public datasets for translational bioinformatics. BMC Bioinformatics. 2009 Feb 5; 10 Suppl 2():S1. | 0 | 0 | 0 | NOT pmc_gds | 0 | 1 |
758 | GSE6919 | ['6919'] | ['2547', '2546', '2545'] | [] | 1865555 | [u'17430594'] | [''] | ['Michalopoulos', 'Ma', 'Becich', 'Bisceglia', 'Liang', 'Dhir', 'Lyons-Weiler', 'Monzon', 'Chandran'] | [] | BMC Cancer | 2007 | 4/12/2007 | 0 | AND pmc_gds | 0 | 1 | |||||
759 | GSE6921 | 2/6/2007 | ['6921'] | [] | [u'17372628'] | 1808428 | [u'17372628'] | ['Wang', 'Sasidharan', 'Gerstein', 'Li', 'Ronald', 'Chen', 'Tongprasit', 'Stolc', 'Korbel', 'Deng', 'He'] | ['Wang', 'Sasidharan', 'Gerstein', 'Li', 'Ronald', 'Chen', 'Tongprasit', 'Stolc', 'Korbel', 'Deng', 'He'] | ['Wang', 'Sasidharan', 'Gerstein', 'Li', 'Ronald', 'Chen', 'Tongprasit', 'Stolc', 'Korbel', 'Deng', 'He'] | PLoS One | 2007 | 3/14/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
760 | GSE6922 | 2/6/2007 | ['6922'] | [] | [u'17372628'] | 1808428 | [u'17372628'] | ['Wang', 'Sasidharan', 'Gerstein', 'Li', 'Ronald', 'Chen', 'Tongprasit', 'Stolc', 'Korbel', 'Deng', 'He'] | ['Wang', 'Sasidharan', 'Gerstein', 'Li', 'Ronald', 'Chen', 'Tongprasit', 'Stolc', 'Korbel', 'Deng', 'He'] | ['Wang', 'Sasidharan', 'Gerstein', 'Li', 'Ronald', 'Chen', 'Tongprasit', 'Stolc', 'Korbel', 'Deng', 'He'] | PLoS One | 2007 | 3/14/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
761 | GSE6925 | 9/21/2007 | ['6925'] | [] | [u'17765265'] | 2185545 | [u'17765265'] | ['Lee', 'Zhang', 'Peti', 'Palermino', 'Garc\xc3\xada-Contreras', 'Wood', u'Contreras', 'Page', 'Doshi'] | ['Lee', 'Zhang', 'Peti', 'Palermino', 'Garc\xc3\xada-Contreras', 'Wood', 'Page', 'Doshi'] | ['Lee', 'Zhang', 'Peti', 'Palermino', 'Garc\xc3\xada-Contreras', 'Wood', 'Page', 'Doshi'] | J Mol Biol | 2007 | 10/12/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
762 | GSE6927 | 6/21/2007 | ['6927'] | ['2933'] | [u'17307939'] | 1865734 | [u'17307939'] | ['Mans', 'Baker', 'Lamont', 'Handfield', 'Hasegawa', 'Lopez', 'Mao'] | ['Mans', 'Baker', 'Lamont', 'Handfield', 'Hasegawa', 'Lopez', 'Mao'] | ['Mans', 'Baker', 'Lamont', 'Handfield', 'Hasegawa', 'Lopez', 'Mao'] | Infect Immun | 2007 | 2007 May | 0 | AND pmc_gds | 1 | 0 | ||||
763 | GSE6930 | 3/1/2007 | ['6930'] | ['2733'] | [u'17425403'] | 1851624 | [u'17425403'] | ['Lessnick', 'Stegmaier', 'Wright', 'Wong', 'Kung', 'Chow', 'Peck', 'Golub', 'Ross'] | ['Lessnick', 'Stegmaier', 'Wright', 'Wong', 'Kung', 'Chow', 'Peck', 'Golub', 'Ross'] | ['Lessnick', 'Stegmaier', 'Wright', 'Wong', 'Kung', 'Chow', 'Peck', 'Golub', 'Ross'] | PLoS Med | 2007 | 2007 Apr | 0 | AND pmc_gds | 1 | 0 | ||||
764 | GSE6933 | 10/11/2007 | ['6933'] | [] | [u'17683608'] | 2374994 | [u'17683608'] | ['Kidder', 'Luttun', 'Pauwelyn', 'Ko', 'Geraerts', 'Hu', 'Verfaillie', u'Ulloa', 'Piao', 'Sharov', 'Chase', 'Ulloa-Montoya', 'Crabbe'] | ['Kidder', 'Luttun', 'Pauwelyn', 'Ko', 'Geraerts', 'Hu', 'Verfaillie', 'Piao', 'Sharov', 'Chase', 'Ulloa-Montoya', 'Crabbe'] | ['Kidder', 'Luttun', 'Pauwelyn', 'Ko', 'Geraerts', 'Hu', 'Verfaillie', 'Piao', 'Sharov', 'Chase', 'Ulloa-Montoya', 'Crabbe'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
765 | GSE6934 | 2/3/2007 | ['6934'] | [] | [u'19501082'] | 2711087 | [u'19538736'] | ['Potter', 'Little', 'Taylor', 'Brunskill', 'McMahon', 'Grimmond', 'Lesieur', 'Georgas', 'Rumballe', 'Valerius', 'Chiu', 'Tang', 'Aronow', 'Thiagarajan', 'Combes'] | ['Nedorezov', 'Karmazin', 'Forabosco', 'Ottolenghi', 'Uda', 'Cao', 'Piao', 'Schlessinger', 'Garcia-Ortiz', 'Cole', 'Pelosi', 'Omari'] | [] | BMC Dev Biol | 2009 | 6/18/2009 | 0 | AND pmc_gds | 0 | 1 | ||||
766 | GSE6944 | 11/2/2007 | ['6944'] | [] | [u'17880706'] | 2040161 | [u'17880706'] | ['Yao', 'Salem', 'Rexroad', 'Silverstein'] | ['Yao', 'Salem', 'Rexroad', 'Silverstein'] | ['Yao', 'Salem', 'Rexroad', 'Silverstein'] | BMC Genomics | 2007 | 9/19/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
767 | GSE6946 | 2/8/2007 | ['6946'] | [] | [] | 2533010 | [u'18713465'] | [u'Samac', u'VandenBosch', u'Sharopova', u'Chandran'] | ['Garvin', 'Samac', 'VandenBosch', 'Sharopova', 'Chandran'] | [u'Samac', u'VandenBosch', u'Sharopova', u'Chandran'] | BMC Plant Biol | 2008 | 8/19/2008 | 0 | down-regulated in Al-treated roots. In addition, we used a 2-fold change cut-off for the significant genes. Normalized and raw data have been submitted to NCBI Gene Expression Omnibus (Accession No. GSE6946{{tag}}--DEPOSIT--). In both lines, the expression of a majority of transcripts appeared unchanged at both time points with Al treatment. As shown in Figure 4 , more genes were significantly altered by ≥ 2.0 | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
768 | GSE6955 | 2/5/2007 | ['6955'] | ['2613'] | [u'17309881'] | 2873396 | [u'20444257'] | ['Pevsner', 'Ojeda', 'Frerking', 'Ohliger', 'Deng', 'Sherman', 'Banine', 'Dissen', 'Budden', 'Matagne'] | ['Korja', 'Edgren', 'Karhu', 'Mousses', 'Ojala', 'Kallioniemi', 'Wolf', 'Kilpinen', 'Haapasalo'] | [] | BMC Cancer | 2010 | 5/5/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
769 | GSE6965 | 2/8/2007 | ['6965'] | ['2749'] | [u'18426895'] | 2610483 | [u'19125200'] | ['Mezger', 'Loeffler', u'Kneitz', 'Wozniok', 'Einsele', u'L\xf6ffler', 'Blockhaus', 'Kurzai', 'Hebart'] | ['Battaglia', 'Rizzetto', 'Paola', 'Rocca-Serra', 'Beltrame', 'Gambineri', 'Cavalieri'] | [] | PLoS One | 2009 | 2009 | 1 | AND pmc_gds | 0 | 1 | ||||
770 | GSE6965 | 2/8/2007 | ['6965'] | ['2749'] | [u'18426895'] | 2443887 | [u'18426895'] | ['Mezger', 'Loeffler', u'Kneitz', 'Wozniok', 'Einsele', u'L\xf6ffler', 'Blockhaus', 'Kurzai', 'Hebart'] | ['Mezger', 'Loeffler', 'Wozniok', 'Einsele', 'Blockhaus', 'Kurzai', 'Hebart'] | ['Mezger', 'Loeffler', 'Wozniok', 'Einsele', 'Blockhaus', 'Kurzai', 'Hebart'] | Antimicrob Agents Chemother | 2008 | 2008 Jul | 0 | AND pmc_gds | 1 | 0 | ||||
771 | GSE6967 | 4/27/2007 | ['6967'] | ['2696'] | [u'17327269'] | 2988811 | [u'21124965'] | ['Krawetz', 'Diamond', 'Goodrich', 'Strader', 'Quintana', 'Platts', 'Rockett', 'Dix', 'Rawe', 'Thompson', 'Chemes'] | ['Goto', 'Rennert', 'Nagashima', 'Kumamoto', 'Hussain', 'Saito', 'Horikawa', 'Harris', 'Furusato', 'Robles', 'Yokota', 'Baxendale', 'Trivers', 'Sesterhenn', 'Takenoshita', 'Okamura', 'Yamashita', 'Lee'] | [] | PLoS One | 2010 | 11/19/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
772 | GSE6969 | 4/27/2007 | ['6969'] | [] | [u'17327269'] | 2453111 | [u'18522715'] | ['Krawetz', 'Diamond', 'Goodrich', 'Strader', 'Quintana', 'Platts', 'Rockett', 'Dix', 'Rawe', 'Thompson', 'Chemes'] | ['Loh', 'Wong', 'Eisenhaber'] | [] | Biol Direct | 2008 | 6/3/2008 | 0 | was hybridized in quadruplicates across 48 arrays on eight different Human-6 Expression BeadChips. The MT1-MMT (membrane type-1 matrix metalloproteinase) experiment This dataset available as NCBI GEO GSE5095 was complemented with replicate-specific standard errors and number of beads in a private communication by Vladislav S. Golubkov. In the experiment, 184B5 human normal mammary epithelial cells were|and 184B5 cell culture, following DNA-chip RNA expression profiling using Illumina Human-6 Expression BeadChips. The human spermatogenesis experiment The published expression profile dataset NCBI GEO GSE6969{{tag}}--REUSE-- from human ejaculates was used [ 8 ]. In the experiment, the samples were collected from 17 normal fertile men and 14 teratozoospermic men aged between 21 to 57. Upon RNA isolation of the spermatoz | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
773 | GSE6972 | 2/7/2007 | ['6972'] | [] | [u'18593933'] | 3033660 | [u'18593933'] | ['', 'Csokai', 'Thiemann', 'Wang', 'Dranchak', 'Sessler', 'Wei', 'Hacia', 'Hu', 'Magda', 'Ma', 'Lecane', 'Lynch'] | ['Csokai', 'Thiemann', 'Wang', 'Dranchak', 'Sessler', 'Wei', 'Hacia', 'Hu', 'Magda', 'Ma', 'Lecane', 'Lynch'] | ['Csokai', 'Lecane', 'Wang', 'Dranchak', 'Sessler', 'Wei', 'Hacia', 'Hu', 'Magda', 'Ma', 'Thiemann', 'Lynch'] | Cancer Res | 2008 | 7/1/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
774 | GSE6973 | 6/15/2007 | ['6973'] | [] | [u'17496079'] | 1951856 | [u'17496079'] | ['Ajdi\xc4\x87', 'Pham', u'Ajdic'] | ['Ajdi\xc4\x87', 'Pham'] | ['Ajdi\xc4\x87', 'Pham'] | J Bacteriol | 2007 | 2007 Jul | 0 | AND pmc_gds | 1 | 0 | ||||
775 | GSE6976 | 9/24/2007 | ['6976'] | [] | [u'17683528'] | 2717951 | [u'19545436'] | ['Carvalho', u'Park', 'Brodsky', 'Fox', 'Neretti', 'McKee', 'Silver', 'Meyer'] | ['Heber', 'Sick', 'Howard'] | [] | BMC Bioinformatics | 2009 | 6/22/2009 | 1 | ded from the NCBI GEO database [ 28 ]. A variety of Affymetrix GeneChip 3' Expression array types are represented in the dataset, including: ath1121501 (Arabidopsis, 248 chips; GEO accession numbers: GSE5770, GSE5759, GSE911 [ 29 ], GSE2538 [ 30 ], GSE3350 [ 31 ], GSE3416 [ 32 ], GSE5534, GSE5535, GSE5530, GSE5529, GSE5522, GSE5520, GSE1491 [ 33 ], GSE2169, GSE2473), hgu133a (human, 72 chips; GSE1420 [|), hgu95av2 (human, 51 chips; GSE1563 [ 35 ]), hgu95d (human, 22 chips; GSE1007 [ 36 ]), hgu95e (human, 21 chips; GSE1007), mgu74a (mouse, 60 chips; GSE76, GSE1912 [ 37 ]), mgu74av2 (mouse, 29 chips; GSE1947 [ 38 ], GSE1419 [ 39 , 40 ]), moe430a (mouse, 10 chips; GSE1873 [ 41 ]), mouse4302 (mouse, 20 chips; GSE5338 [ 42 ], GSE1871 [ 43 ]), rae230a (rat, 26 chips; GSE1918, GSE2470), and rgu34a (rat, 44 |xperiment. The second dataset consists of all of the exon array .CEL files available in the GEO database at the time of this analysis (540 .CEL files). Fourteen different experiments are represented (GSE10599 [ 47 ], GSE10666 [ 48 ], GSE11150 [ 49 ], GSE11344 [ 50 ], GSE11967 [ 51 ], GSE12064 [ 52 ], GSE6976{{tag}}--REUSE-- [ 53 ], GSE7760 [ 54 ], GSE7761 [ 55 ], GSE8945 [ 56 ], GSE9342, GSE9372 [ 57 ], GSE9385 [ 58 ] | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
776 | GSE6976 | 9/24/2007 | ['6976'] | [] | [u'17683528'] | 2374990 | [u'17683528'] | ['Carvalho', u'Park', 'Brodsky', 'Fox', 'Neretti', 'McKee', 'Silver', 'Meyer'] | ['Carvalho', 'Brodsky', 'Fox', 'Neretti', 'McKee', 'Silver', 'Meyer'] | ['Carvalho', 'Brodsky', 'Fox', 'Neretti', 'McKee', 'Silver', 'Meyer'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
777 | GSE6977 | 8/14/2007 | ['6977'] | [] | [u'17684567'] | 1934932 | [u'17684567'] | ['Eshaghi', 'Li', u'T.', 'Liu', 'Chu', u'R.', 'Karuturi'] | ['Liu', 'Chu', 'Karuturi', 'Eshaghi', 'Li'] | ['Karuturi', 'Liu', 'Chu', 'Eshaghi', 'Li'] | PLoS One | 2007 | 8/8/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
778 | GSE6980 | 4/9/2007 | ['6980'] | ['2640'] | [u'17418411'] | 1885943 | [u'17418411'] | ['Carrasco', 'Zheng', 'Sukhdeo', 'Anderson', 'Protopopov', 'Munshi', 'Ivanova', 'Protopopova', 'Sinha', 'Enos', 'Tonon', 'DePinho', 'Horner', 'Mani', 'Pinkus', 'Henderson'] | ['Carrasco', 'Zheng', 'Sukhdeo', 'Anderson', 'Protopopov', 'Munshi', 'Ivanova', 'Protopopova', 'Sinha', 'Enos', 'Tonon', 'DePinho', 'Horner', 'Mani', 'Pinkus', 'Henderson'] | ['Carrasco', 'Zheng', 'Munshi', 'Protopopov', 'Sukhdeo', 'Horner', 'Anderson', 'Ivanova', 'Tonon', 'DePinho', 'Protopopova', 'Enos', 'Mani', 'Sinha', 'Pinkus', 'Henderson'] | Cancer Cell | 2007 | 2007 Apr | 0 | AND pmc_gds | 1 | 0 | ||||
779 | GSE6984 | 2/8/2007 | ['6984'] | [] | [u'18701715'] | 2515620 | [u'18701715'] | ['Gawel', 'Mieczkowski', 'Argueso', 'Resnick', 'Westmoreland', 'Petes'] | ['Gawel', 'Mieczkowski', 'Argueso', 'Resnick', 'Westmoreland', 'Petes'] | ['Gawel', 'Mieczkowski', 'Argueso', 'Resnick', 'Westmoreland', 'Petes'] | Proc Natl Acad Sci U S A | 2008 | 8/19/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
780 | GSE6989 | 8/30/2007 | ['6989'] | [] | [u'17660305'] | 2075023 | [u'17660305'] | ['Gellissen', 'Lee', 'Oh', 'Kang', 'Park', 'Kwon', 'Sohn', 'Hur', 'Rhee'] | ['Gellissen', 'Lee', 'Oh', 'Kang', 'Park', 'Kwon', 'Sohn', 'Hur', 'Rhee'] | ['Gellissen', 'Lee', 'Oh', 'Kang', 'Park', 'Kwon', 'Sohn', 'Hur', 'Rhee'] | Appl Environ Microbiol | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
781 | GSE6991 | 2/10/2007 | ['6991'] | [] | [u'18701715'] | 2515620 | [u'18701715'] | ['Gawel', 'Mieczkowski', 'Argueso', 'Resnick', 'Westmoreland', 'Petes'] | ['Gawel', 'Mieczkowski', 'Argueso', 'Resnick', 'Westmoreland', 'Petes'] | ['Gawel', 'Mieczkowski', 'Argueso', 'Resnick', 'Westmoreland', 'Petes'] | Proc Natl Acad Sci U S A | 2008 | 8/19/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
782 | GSE6992 | 2/10/2007 | ['6992'] | [] | [u'18000553'] | 2064960 | [u'18000553'] | ['Pomposiello', 'Wholey', u'Chen', 'Blanchard', 'Conlon'] | ['Pomposiello', 'Wholey', 'Blanchard', 'Conlon'] | ['Pomposiello', 'Wholey', 'Blanchard', 'Conlon'] | PLoS One | 2007 | 11/14/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
783 | GSE6996 | 2/21/2007 | ['6996'] | [] | [u'17372628'] | 2736005 | [u'19641029'] | ['Wang', 'Sasidharan', 'Gerstein', 'Li', 'Ronald', 'Chen', 'Tongprasit', 'Stolc', 'Korbel', 'Deng', 'He'] | ['Buell', 'Lehti-Shiu', 'Prakash', 'Thibaud-Nissen', 'Zou', 'Shiu'] | [] | Plant Physiol | 2009 | 2009 Sep | 0 | umably functional and the number of Ψs that have evidence of expression based on (1) putative unique transcript (PUT), (2) MPSS tags, and (3) tiling array data. Rice and Arabidopsis PUTs were downloaded from PlantGDB (version 163a; Duvick et al., 2008 ) and were used to search against annotated genes and set II Ψs. PUTs are regarded as cognate transcripts for annotated genes and Ψ| are 97% or greater, (3) the aligned regions are 300 nucleotides or greater, and (4) the matched region is greater than 50% of the shorter sequence length. The MPSS tags for rice and Arabidopsis were downloaded ( http://mpss.udel.edu/ ; Nobuta et al., 2007 ) and mapped to the rice and Arabidopsis pseudomolecules (100% identity and 100% coverage). MPSS tags that mapped uniquely to functional genes or Ψ|s expression tags for the respective genic sequences. The third type of expression data set we examined was tiling array data for Arabidopsis (GEO: GSE601 ; Yamada et al., 2003 ) and for rice (GEO: GSE6996{{tag}}--REUSE-- ; Li et al., 2007 ). In both studies, the genomes were covered with the use of multiple arrays. A between-array normalization procedure was applied to each data set using the affyPLM package of Bi | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
784 | GSE6996 | 2/21/2007 | ['6996'] | [] | [u'17372628'] | 1808428 | [u'17372628'] | ['Wang', 'Sasidharan', 'Gerstein', 'Li', 'Ronald', 'Chen', 'Tongprasit', 'Stolc', 'Korbel', 'Deng', 'He'] | ['Wang', 'Sasidharan', 'Gerstein', 'Li', 'Ronald', 'Chen', 'Tongprasit', 'Stolc', 'Korbel', 'Deng', 'He'] | ['Wang', 'Sasidharan', 'Gerstein', 'Li', 'Ronald', 'Chen', 'Tongprasit', 'Stolc', 'Korbel', 'Deng', 'He'] | PLoS One | 2007 | 3/14/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
785 | GSE7003 | 12/5/2007 | ['7003'] | [] | [u'18055602'] | 2174893 | [u'18055602'] | ['Evans', 'Barton', 'Emery', u'Lye', 'Hou', 'Wenkel'] | ['Wenkel', 'Hou', 'Barton', 'Emery', 'Evans'] | ['Wenkel', 'Hou', 'Barton', 'Emery', 'Evans'] | Plant Cell | 2007 | 2007 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
786 | GSE7005 | 2/14/2007 | ['7005'] | [] | [u'17322329'] | 1855617 | [u'17322329'] | ['Ard\xc3\xb6', 'Hughes', 'Christensen', 'Rodr\xc3\xadguez', u'Rodr\xedguez', 'Steele', 'Wechter', u'Ard\xf6', 'Smeianov', 'Broadbent'] | ['Ard\xc3\xb6', 'Hughes', 'Christensen', 'Rodr\xc3\xadguez', 'Steele', 'Wechter', 'Smeianov', 'Broadbent'] | ['Ard\xc3\xb6', 'Hughes', 'Christensen', 'Rodr\xc3\xadguez', 'Steele', 'Wechter', 'Smeianov', 'Broadbent'] | Appl Environ Microbiol | 2007 | 2007 Apr | 0 | AND pmc_gds | 1 | 0 | ||||
787 | GSE7007 | 2/14/2007 | ['7007'] | [] | [u'17482132'] | 2671847 | [u'19404404'] | ['Charbord', 'Prieur', 'Laud-Duval', 'Tirode', 'Delorme', 'Delattre'] | ['Kofler', 'Kovar', 'Meltzer', 'Walker', 'Davis', 'Ban', 'Kauer'] | [] | PLoS One | 2009 | 2009 | 1 | AND pmc_gds | 0 | 1 | ||||
788 | GSE7009 | 3/1/2007 | ['7009'] | ['2781'] | [u'17586820'] | 1935013 | [u'17586820'] | ['Kulozik', 'Gehring', 'Viegas', 'Hentze', 'Breit'] | ['Kulozik', 'Gehring', 'Viegas', 'Hentze', 'Breit'] | ['Kulozik', 'Gehring', 'Viegas', 'Hentze', 'Breit'] | Nucleic Acids Res | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
789 | GSE7012 | 5/1/2007 | ['7012'] | [] | [u'17362910'] | 2865499 | [u'20370903'] | ['Gonzalez', 'Kusumi', 'Rappaport', 'Anderson', 'Pratt', 'William', 'Sewell', 'Saitta', 'Traas', 'Markov', 'Gibson'] | ['Lang', 'Krumsiek', 'Theis', 'Marr', 'Lutter'] | [] | BMC Genomics | 2010 | 4/6/2010 | 0 | All analyzed datasets were taken from the GEO [43] database: (i) The stem cell development (SCD) dataset consists of three cell lines (R1, J1, V6.5) differentiated into embryoid bodies (EB) at 11 time points from t = 0 h until t = d 14. From each time point and each cell line 3 technical replicates were measured (combination of three cell line differentiations GSE2972, GSE3749, GSE3231). (ii) Within the somitogenesis dataset (SG) gene expression was measured from synchronized C2C12 myoblasts at 13 timepoints from t = 0 h until t = 6 h (GSE7012{{key}}--REUSE--). (iii) The neurite outgrowth (NO) and regeneration dataset consists of transcriptional activity, measured from dorsal root ganglia during a time course of neurite outgrowth in vitro under two conditions: untreated and under potent inhibitory cue Semaphorin3A. Measurements were taken at 5 time points from t = 2 h until t = 40 h including two technical replicates (GSE9738). | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
790 | GSE7012 | 5/1/2007 | ['7012'] | [] | [u'17362910'] | 1899184 | [u'17362910'] | ['Gonzalez', 'Kusumi', 'Rappaport', 'Anderson', 'Pratt', 'William', 'Sewell', 'Saitta', 'Traas', 'Markov', 'Gibson'] | ['Gonzalez', 'Kusumi', 'Rappaport', 'Anderson', 'Pratt', 'William', 'Sewell', 'Saitta', 'Traas', 'Markov', 'Gibson'] | ['Gonzalez', 'Kusumi', 'Anderson', 'Rappaport', 'Pratt', 'William', 'Sewell', 'Saitta', 'Traas', 'Markov', 'Gibson'] | Dev Biol | 2007 | 5/1/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
791 | GSE7015 | 5/1/2007 | ['7015'] | [] | [u'17362910'] | 1899184 | [u'17362910'] | ['Gonzalez', 'Kusumi', 'Rappaport', 'Anderson', 'Pratt', 'William', 'Sewell', 'Saitta', 'Traas', 'Markov', 'Gibson'] | ['Gonzalez', 'Kusumi', 'Rappaport', 'Anderson', 'Pratt', 'William', 'Sewell', 'Saitta', 'Traas', 'Markov', 'Gibson'] | ['Gonzalez', 'Kusumi', 'Anderson', 'Rappaport', 'Pratt', 'William', 'Sewell', 'Saitta', 'Traas', 'Markov', 'Gibson'] | Dev Biol | 2007 | 5/1/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
792 | GSE7018 | 12/29/2007 | ['7018'] | [] | [u'17916255'] | 2099445 | [u'17916255'] | ['Guiguen', 'Fostier', 'Houlgatte', 'Baron', 'Montfort'] | ['Guiguen', 'Fostier', 'Houlgatte', 'Baron', 'Montfort'] | ['Fostier', 'Guiguen', 'Houlgatte', 'Baron', 'Montfort'] | BMC Genomics | 2007 | 10/4/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
793 | GSE7020 | 2/14/2007 | ['7020'] | ['2630'] | [u'17420462'] | 1849960 | [u'17420462'] | ['', 'Geiger', 'Kalfa', 'Daria', 'Koesters', 'Jegga', 'Diwan', 'Aronow', 'Baines', 'Molkentin', 'Macleod', 'Spike', 'Dorn', 'Odley', 'Pushkaran'] | ['Geiger', 'Kalfa', 'Daria', 'Koesters', 'Jegga', 'Diwan', 'Aronow', 'Baines', 'Molkentin', 'Macleod', 'Spike', 'Dorn', 'Odley', 'Pushkaran'] | ['Geiger', 'Kalfa', 'Daria', 'Koesters', 'Jegga', 'Diwan', 'Aronow', 'Baines', 'Molkentin', 'Macleod', 'Spike', 'Dorn', 'Odley', 'Pushkaran'] | Proc Natl Acad Sci U S A | 2007 | 4/17/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
794 | GSE7023 | 6/12/2007 | ['7023'] | [] | [u'17409424'] | 2949900 | [u'20875095'] | ['Lucin', 'Teh', 'Kahnoski', 'Koeman', 'Swiatek', 'Yang', 'Dykema', 'Furge', 'Chen'] | ['Dawany', 'Tozeren'] | [] | BMC Bioinformatics | 2010 | 9/27/2010 | 0 | he individual datasets usually small in size, but the inferences made from individual studies are often inconsistent with similar studies [ 1 ]. As thousands of microarray samples have accumulated in publicly accessible databases in the last decade [ 2 - 4 ], several statistical methods have been developed to allow for the combination and comparison of data from multiple sources. Among the many methodolog|ons based on hypergeometric test. Table 1 Overview of datasets used and distribution of microarray samples Analysis Tissue Accession # Normal Cancer Platform IV1/IV2/SAM1/SAM2 Colon E-MTAB-57 22 25 A GSE4107 10 12 P2 GSE4183 8 15 P2 Kidney E-TABM-282 11 16 P2 GSE11024† 12 60 P2 GSE11151 3 57 P2 GSE14762† 12 10 P2 GSE15641 23 57 A GSE6344 10 10 A GSE7023{{tag}}--REUSE-- 12 35 P2 Liver GSE14323 19 47 A/A|2 49 58 A GSE7670 27 27 A Pancreas E-MEXP-1121† 6 17 A E-MEXP-950 11 14 A GSE15471 39 39 P2 GSE16515 15 36 P2 Total: 294 619 SAM2 Colon E-MEXP-1224 0 55 A E-MEXP-383 0 36 A E-TABM-176 55 0 P2 GSE12945 0 36 A GSE17538 0 232 P2 Kidney GSE10320 0 144 A GSE11904 0 21 A2 Liver E-TABM-292 0 32 A E-TABM-36 0 57 A GSE9843 0 69 P2 Lung GSE10445 0 72 P2 GSE12667 0 75 P2 Total: 55 829 IV2 Colon GSE6988 28|5E-257 No data - 262 2.34E-299 * Only 338 genes are used for colon IV1 Moreover, to assess the effect of the refRMA method in normalizing data, three samples from different colon datasets (E-MTAB-57, GSE4107 and GSE4183) were chosen. The expression values for the three arrays were obtained based on classical RMA and refRMA normalization techniques. 
Quantile-quantile plots were produced to compare the d|election A total of 31 Affymetrix microarray datasets containing 1,768 unique samples from human cancer (1,429) and corresponding healthy control tissues (339) were collected from the Gene Expression Omnibus (GEO; [ 2 , 3 ] and Array Express [ 4 ] online repositories (Additional File 2 ). Samples were selected for 5 different tissue types: colon, kidney, liver, lung and pancreas, then categorized into c|ms and the conversion of data to Entrez IDs resulted in the study of varying number of genes per dataset as well as different total overlap with the common Affymetrix platform (shown in parentheses); GSE6988: 9,072 (5,834) genes, GSE3: 12,452 (6,598) genes, GSE7367: 2118 (1,301) genes, GSE2088: 13754 (7,038) genes, and GSE8596: 6740 (4,330) genes. The datasets contained cancer versus normal samples fro|NCBI GEO: mining tens of millions of expression profiles--database and tools update Nucleic Acids Res 2007 35 Database D760 765 10.1093/nar/gkl887 17099226 Edgar R Domrachev M Lash AE Gene Expression Omnibus: NCBI gene expression and hybridization array data repository Nucleic Acids Res 2002 30 1 207 210 10.1093/nar/30.1.207 11752295 Brazma A Parkinson H Sarkans U Shojatalab M Vilo J Abeygunawardena N Ho | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
795 | GSE7023 | 6/12/2007 | ['7023'] | [] | [u'17409424'] | 2518213 | [u'18773095'] | ['Lucin', 'Teh', 'Kahnoski', 'Koeman', 'Swiatek', 'Yang', 'Dykema', 'Furge', 'Chen'] | ['Teh', 'Metcalf', 'Kahnoski', 'Russell', 'Richard', 'Zhang', 'Houseman', 'Furge', 'Koeman', 'Swiatek', 'Kort', 'Matsuda', 'Dykema', 'Westphal', 'Petillo', 'Ohh', 'Tan', 'Vieillefond', 'Koelzer'] | ['Teh', 'Kahnoski', 'Koeman', 'Swiatek', 'Dykema', 'Furge'] | PLoS Genet | 2008 | 9/5/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
796 | GSE7023 | 6/12/2007 | ['7023'] | [] | [u'17409424'] | 3012009 | [u'21162720'] | ['Lucin', 'Teh', 'Kahnoski', 'Koeman', 'Swiatek', 'Yang', 'Dykema', 'Furge', 'Chen'] | ['Teh', 'Furge', 'Richard', 'Giraud', 'Aly', 'Denoux', 'Klomp', 'Zickert', 'MacKeigan', 'Yonneau', 'Chen', 'Yang', 'M\xc3\xa9jean', 'Dykema', 'Vasiliu', 'Niemi', 'Petillo', 'S\xc3\xa4\xc3\xa4f', 'Bergerheim', 'Gad', 'Nordenskj\xc3\xb6ld'] | ['Teh', 'Chen', 'Yang', 'Dykema', 'Furge'] | BMC Med Genomics | 2010 | 12/16/2010 | 0 | AND pmc_gds | 1 | 0 | ||||
797 | GSE7026 | 2/17/2007 | ['7026'] | [] | [u'18344982'] | 2841398 | [u'18344982'] | ['Lamb', 'Leonardson', 'Castellini', 'Edwards', 'Lusis', u'Mehrabian', 'MacNeil', 'Lum', 'Schadt', 'Zhang', 'Sieberts', 'Emilsson', 'Horvath', 'Zhu', 'Drake', 'Yang', 'Wang', u'Allayee', 'Champy', 'Ghazalpour', 'Chen', 'Doss', 'Pinto'] | ['Lamb', 'Leonardson', 'Castellini', 'Horvath', 'Zhu', 'Sieberts', 'Zhang', 'Edwards', 'Champy', 'Lusis', 'Drake', 'Schadt', 'Yang', 'Emilsson', 'MacNeil', 'Ghazalpour', 'Chen', 'Doss', 'Pinto', 'Lum', 'Wang'] | ['Lamb', 'Leonardson', 'Castellini', 'Horvath', 'Zhu', 'Schadt', 'Zhang', 'Sieberts', 'Edwards', 'Champy', 'Lusis', 'Drake', 'Yang', 'Emilsson', 'MacNeil', 'Ghazalpour', 'Chen', 'Doss', 'Pinto', 'Lum', 'Wang'] | Nature | 2008 | 3/27/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
798 | GSE7027 | 2/17/2007 | ['7027'] | [] | [u'18344982'] | 2841398 | [u'18344982'] | ['Lamb', 'Leonardson', 'Castellini', 'Horvath', 'Zhu', 'Sieberts', 'Zhang', 'Edwards', 'Champy', 'Lusis', 'Drake', 'Schadt', 'Yang', 'Emilsson', 'MacNeil', 'Ghazalpour', 'Chen', 'Doss', 'Pinto', 'Lum', 'Wang'] | ['Lamb', 'Leonardson', 'Castellini', 'Horvath', 'Zhu', 'Sieberts', 'Zhang', 'Edwards', 'Champy', 'Lusis', 'Drake', 'Schadt', 'Yang', 'Emilsson', 'MacNeil', 'Ghazalpour', 'Chen', 'Doss', 'Pinto', 'Lum', 'Wang'] | ['Lamb', 'Leonardson', 'Castellini', 'Horvath', 'Zhu', 'Schadt', 'Zhang', 'Sieberts', 'Edwards', 'Champy', 'Lusis', 'Drake', 'Yang', 'Emilsson', 'MacNeil', 'Ghazalpour', 'Chen', 'Doss', 'Pinto', 'Lum', 'Wang'] | Nature | 2008 | 3/27/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
799 | GSE7028 | 2/17/2007 | ['7028'] | [] | [u'18344982'] | 2841398 | [u'18344982'] | ['Lamb', 'Leonardson', 'Castellini', '', 'Edwards', 'Lusis', 'MacNeil', 'Lum', 'Schadt', 'Zhang', 'Sieberts', 'Emilsson', 'Horvath', 'Zhu', 'Drake', 'Yang', 'Wang', 'Champy', 'Ghazalpour', 'Chen', 'Doss', 'Pinto'] | ['Lamb', 'Leonardson', 'Castellini', 'Horvath', 'Zhu', 'Sieberts', 'Zhang', 'Edwards', 'Champy', 'Lusis', 'Drake', 'Schadt', 'Yang', 'Emilsson', 'MacNeil', 'Ghazalpour', 'Chen', 'Doss', 'Pinto', 'Lum', 'Wang'] | ['Lamb', 'Leonardson', 'Castellini', 'Horvath', 'Zhu', 'Schadt', 'Zhang', 'Sieberts', 'Edwards', 'Champy', 'Lusis', 'Drake', 'Yang', 'Emilsson', 'MacNeil', 'Ghazalpour', 'Chen', 'Doss', 'Pinto', 'Lum', 'Wang'] | Nature | 2008 | 3/27/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
800 | GSE7029 | 2/17/2007 | ['7029'] | [] | [u'18344982'] | 2748096 | [u'19728865'] | ['Lamb', 'Leonardson', 'Castellini', '', 'Edwards', 'Lusis', 'MacNeil', 'Lum', 'Schadt', 'Zhang', 'Sieberts', 'Emilsson', 'Horvath', 'Zhu', 'Drake', 'Yang', 'Wang', 'Champy', 'Ghazalpour', 'Chen', 'Doss', 'Pinto'] | ['Makiguchi', 'Feng', 'Okuno', 'Tamon', 'Tsujimoto', 'Araki', 'Kunimoto', 'Niijima'] | [] | BMC Genomics | 2009 | 9/3/2009 | 1 | from stomach subregions (Additional file 4 -CSV-Gene expression profile of mast cells pooled from mouse stomach subregions) [ 20 ]. In the score-ordered results of query-1 (P-value < 0.01), GSE1827 (titled "Waldman Bladder tumors") was ranked in fourth. Moreover, the top 10 entries showed appropriate annotations related to tumors, inflammatory and immune responses (Table 2 ). For the query-2|ts, and seven entries among the top 10 were observed using rat liver samples (Table 3 ). The biological relationships among the top 10 results of query-3 (P-value < 0.02) were not clear, but GSE6192 (titled "Gene expression changes during murine mucosal mast cell in vitro differentiation") was found out in the twelfth rank (Table 4 ). These findings indicate the general applicability of GEM-T| search using the gene expression profile of human bladder cancer - top 10 entries sorted by similarity scores with lowest P-value < 0.01. 
Series Platform Description Similarity score P-value GSE6112 GPL4475 Tubercolosis and healthy infected patients PBMC_TB_vs_Pool_LTBI 1 0.0054 GSE3901 GPL3279 Response of quiescent human fibroblasts to different growth factors and serum 0.96 0.0004 GSE1726 GP|2567 Human breast tumor 0.854 0.0063 GSE60 GPL174 Diffuse large B-cell lymphoma 0.851 0.0029 GSE838 GPL564 Individual-specific variation of gene expression in peripheral blood leukocytes 0.845 0.0057 GSE3176 GPL1528 p53 In Inflamatory Stress Response 0.815 0.0001 GSE344 GPL273 Spotted long oligonucleotide arrays 0.813 0.0097 GSE7965 GPL3991 Blood and Adipose tissue samples 0.805 1.50E-03 The results w| the gene expression profile of rat chemical hepatocarcinogenesis - top 10 entries sorted by similarity scores with lowest P-value < 0.01. Series Platform Description Similarity score P-value GSE5337 GPL890 Gene Expression Profiling In Rat Smooth Muscle Cells Modulated by Rapamycin and Paclitaxel. 1 1.00E-04 GSE5860 GPL890 Gene expression analysis of rat livers after exposure to acetaminophen 0|en (APAP) Rat Liver Test Gene Expression Data Set 0.895 1.00E-04 GSE5381 GPL890 Gene expression analysis of liver and kidney following methapyrilene treatment in male Sprague-Dawley rats 0.685 0.0015 GSE791 GPL542 GH inj old liver (1-7) 0.657 5.00E-04 GSE4270 GPL890 Aging Induced Alterations in Hepatic Gene Expression of the Male Fisher Rat 0.637 0.0045 GSE3608 GPL3076 Renal medullary genes in salt-sen|ssion profile of mast cells pooled from mouse stomach subregions - top 20 entries sorted by similarity scores with lowest P-value < 0.02. Series Platform Description Similarity score P-value GSE3088 GPL2510 Expression profiling of Muscle tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null backgrounds 1 1.00E-04 GSE2814 GPL2510 Expression profiling of liver tissue from (C57BL/6J ×|time course: regulation of uterine genes by estradiol in ovariectomized mice 0.874 5.00E-04 GSE8104 GPL5137 Primary macrophage response to L. 
monocytogenes and bacteria-derived ligands 0.855 1.70E-03 GSE8100 GPL5137 WT and myd88-/- macrophage response to WT and hly- L. monocytogenes 0.847 6.00E-04 GSE2220 GPL1832 Genetic variation of gene expression is tissue specific in inbred mice 0.795 1.00E-02 GSE4|stren Behaves as a Weak Estrogen Rather than a Non-genomic Selective Activator in the Mouse Uterus 0.792 0.0034 GSE7029{{tag}}--REUSE-- GPL2510 Zfp90 Transgenic Signature in Mouse White Adipose Tissue 0.788 2.50E-03 GSE7615 GPL2884 Cancer Process Study 0.752 0.0107 GSE7600 GPL2884 Atm-/-, mTerc-/-, p53-/- triple knock-out lymphoma vs normal mouse DNA (GPL2884) 0.752 1.13E-02 GSE3086 GPL2510 Expression profiling of Adi|ose tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null backgrounds 0.726 0.009 GSE6192 GPL891 Gene expression changes during murine mucosal mast cell in vitro differentiation 0.709 1.76E-02 GSE4248 GPL891 Identification of genes regulated by RORg in mouse thymus 0.705 1.99E-02 GSE3087 GPL2510 Expression profiling of brain tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null background|nput Reduction 0.605 8.70E-03 GSE1013 GPL967 Gene Expression Profile of NHE1 Null Mutation 0.552 0.0134 GSE8625 GPL5530 Comparison of undifferentiated ES cell lines HM1, IMT11, SHBL6.3 0.488 1.82E-02 GSE8528 GPL5369 Expression analysis of gene differentially expressed in the developping ovary 0.455 0.0011 GSE3289 GPL2828 Chronic hypoxia alters the level, maturation and control of gene expression in mou | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
801 | GSE7029 | 2/17/2007 | ['7029'] | [] | [u'18344982'] | 2841398 | [u'18344982'] | ['Lamb', 'Leonardson', 'Castellini', '', 'Edwards', 'Lusis', 'MacNeil', 'Lum', 'Schadt', 'Zhang', 'Sieberts', 'Emilsson', 'Horvath', 'Zhu', 'Drake', 'Yang', 'Wang', 'Champy', 'Ghazalpour', 'Chen', 'Doss', 'Pinto'] | ['Lamb', 'Leonardson', 'Castellini', 'Horvath', 'Zhu', 'Sieberts', 'Zhang', 'Edwards', 'Champy', 'Lusis', 'Drake', 'Schadt', 'Yang', 'Emilsson', 'MacNeil', 'Ghazalpour', 'Chen', 'Doss', 'Pinto', 'Lum', 'Wang'] | ['Lamb', 'Leonardson', 'Castellini', 'Horvath', 'Zhu', 'Schadt', 'Zhang', 'Sieberts', 'Edwards', 'Champy', 'Lusis', 'Drake', 'Yang', 'Emilsson', 'MacNeil', 'Ghazalpour', 'Chen', 'Doss', 'Pinto', 'Lum', 'Wang'] | Nature | 2008 | 3/27/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
802 | GSE7030 | 7/1/2007 | ['7030'] | [] | [u'18287491'] | 2287333 | [u'18287491'] | ['Chambrier', 'Guyon', 'Mbelo', 'Moing', 'Rogowsky', 'Deborde', 'Perez', 'Balzergue', 'Martin-Magniette', 'Cossegal'] | ['Chambrier', 'Guyon', 'Mbelo', 'Moing', 'Rogowsky', 'Deborde', 'Perez', 'Balzergue', 'Martin-Magniette', 'Cossegal'] | ['Guyon', 'Chambrier', 'Moing', 'Mbelo', 'Rogowsky', 'Deborde', 'Perez', 'Balzergue', 'Martin-Magniette', 'Cossegal'] | Plant Physiol | 2008 | 2008 Apr | 0 | AND pmc_gds | 1 | 0 | ||||
803 | GSE7032 | 5/31/2007 | ['7032'] | ['2743'] | [u'17360536'] | 2947508 | [u'20927376'] | ['Hamilton', 'Wahlestedt', 'Gimeno', 'Timmons', 'Nedergaard', 'Baar', 'Walden', 'Petrovic', 'Cannon', 'Lassmann', 'Wennmalm', 'Larsson'] | ['Wisotzkey', 'Halperin', 'Ronaghi', 'Wang', 'Alag', 'Wall', 'Park', 'Kupershmidt', 'Su', 'Sundaresh', 'Akhtari', 'Cui', 'Grewal', 'Shekar', 'Flynn'] | [] | PLoS One | 2010 | 9/29/2010 | 0 | ltiple lines of evidence. As an example of our strategy and the utility of this framework, we apply our data mining approach to explore the biology of brown fat within the context of the thousands of publicly available gene expression datasets. Conclusions Our work presents a practical strategy for organizing, mining, and correlating global collections of large-scale genomic data to explore normal and dis|roduction High-throughput technologies have become essential tools for biological researchers. The advent of “open biology” has led to an exponential growth of high-throughput data in publicly shared repositories, such as NCBI GEO, EBI Array Express, and the Stanford Microarray Database (SMD) [1] . The billions of data points collected within these repositories provide an |e this we derived a brown fat tissue gene expression signature from the mouse tissue atlas dataset containing genome-wide gene expression profiles of 61unique tissues and organs (NCBI GEO Accession # GSE1133) [21] . Each gene was ranked according to its fold change relative to the median of all mouse tissues (see Methods ). As a result we obtained a tissue signature that consisted of |nd cell types. We derived a brown preadipocytes signature consisting of 2,302 probesets (mouse MG_U74Av2 Affymetrix chip) by comparing gene expression of brown to white preadipocytes (GEO Accession # GSE7032{{tag}}--REUSE--, Table S4 ) [23] . 
Using this signature, we performed a correlation analysis across all normal tissues and cell types in NextBio ( Figure 5B ), and found that muscle stem cells we|designing new experiments to study white and brown fat biology. Our strategy also provides a method that addresses, at multiple levels, the data quality concerns that are often raised with respect to publicly available data. First, the data goes through rounds of preprocessing, quality control, and curation. Second, all analysis results are rank-ordered according to enrichment statistics. Finally, the met| Blat IC 2006 The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science 313 1929 1935 17008526 12 Edgar R Domrachev M Lash AE 2002 Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 30 207 210 11752295 13 Brazma A Parkinson H Sarkans U Shojatalab M Vilo J Abeygunawardena N Holloway E Kapushesky M Ke | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
804 | GSE7034 | 2/14/2007 | ['7034'] | [] | [] | 2225428 | [u'17925033'] | [u'Hanley', u'Porwollik', u'DeMarini', u'Ward', u'Swartz', u'McClelland', u'Warren'] | ['Hanley', 'Knapp', 'McClelland', 'DeMarini', 'Ward', 'Swartz', 'Porwollik', 'Warren'] | [u'Hanley', u'Porwollik', u'DeMarini', u'Ward', u'Swartz', u'McClelland', u'Warren'] | BMC Bioinformatics | 2007 | 10/9/2007 | 0 | 2 separate measurements (4 independent biological samples × 3 technical replicates per sample). These intensity data have been archived in the Gene Expression Omnibus [ 35 ], accession number GSE7034{{tag}}--REUSE--. Initial data analyses and concentration-related changes Data were first analyzed as described previously [ 16 ]. Briefly, spot intensities were quantified, the local background intensities were su | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
805 | GSE7035 | 4/19/2007 | ['7035'] | ['2705'] | [u'17482130'] | 2564847 | [u'17482130'] | ['Qu', 'Vafai', 'Girnun', 'Spiegelman', 'Bronson', 'Naseri', 'Alberta', 'Szwaya'] | ['Qu', 'Vafai', 'Girnun', 'Spiegelman', 'Bronson', 'Naseri', 'Alberta', 'Szwaya'] | ['Qu', 'Vafai', 'Girnun', 'Naseri', 'Bronson', 'Alberta', 'Szwaya', 'Spiegelman'] | Cancer Cell | 2007 | 2007 May | 0 | AND pmc_gds | 1 | 0 | ||||
806 | GSE7037 | 10/2/2007 | ['7037'] | [] | [u'19154620'] | 2657775 | [u'19154620'] | ['Mohammad', 'Singh', 'Sharma', u'Farhan'] | ['Mohammad', 'Singh', 'Sharma'] | ['Mohammad', 'Singh', 'Sharma'] | BMC Syst Biol | 2009 | 1/21/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
807 | GSE7045 | 2/23/2007 | ['7045'] | [] | [u'17360575'] | 2912892 | [u'20642854'] | ['Srivastava', 'Baliga', 'Vuthoori', 'Kaur', 'Donohoe', 'Facciotti', 'Hood', 'Shannon', 'Bonneau', 'Pan', 'Reiss'] | ['Baliga', 'Tenenbaum', 'Reiss', 'Bare', 'Koide'] | ['Baliga', 'Reiss'] | BMC Bioinformatics | 2010 | 7/19/2010 | 0 | ter inside a coding sequence of the succinate dehydrogenase operon Our genome browser was developed in conjunction with a study of the transcriptome structure of Halobacterium salinarum [ 19 ]. (GEO GSE13150) Transcription and protein-DNA binding were measured using whole-genome tiling arrays at several time points over the growth curve. This data was used to revise computationally predicted genes and|on factor TFBd. We assayed DNA binding for TFBd using chromatin immunoprecipitation followed by two different whole genome tiling array platforms, an in-house array with 500 base-pair resolution (GEO GSE7045{{tag}}--REUSE--) [ 20 ] and a higher resolution Nimblegen tiling array (GEO GPL8468). Part of the intent was to test the sufficiency of the 500 bp array to predict binding sites. Given the low resolution of this d | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
808 | GSE7045 | 2/23/2007 | ['7045'] | [] | [u'17360575'] | 1838652 | [u'17360575'] | ['Srivastava', 'Baliga', 'Vuthoori', 'Kaur', 'Donohoe', 'Facciotti', 'Hood', 'Shannon', 'Bonneau', 'Pan', 'Reiss'] | ['Srivastava', 'Baliga', 'Vuthoori', 'Kaur', 'Donohoe', 'Facciotti', 'Hood', 'Shannon', 'Bonneau', 'Pan', 'Reiss'] | ['Srivastava', 'Baliga', 'Vuthoori', 'Kaur', 'Facciotti', 'Hood', 'Donohoe', 'Bonneau', 'Pan', 'Shannon', 'Reiss'] | Proc Natl Acad Sci U S A | 2007 | 3/13/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
809 | GSE7048 | 2/17/2007 | ['7048'] | [] | [u'18682395'] | 2661400 | [u'18682395'] | ['Mulholland', 'Pearson', 'Smith', 'Monk', 'Poole'] | ['Mulholland', 'Pearson', 'Smith', 'Monk', 'Poole'] | ['Mulholland', 'Pearson', 'Smith', 'Monk', 'Poole'] | J Biol Chem | 2008 | 10/17/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
810 | GSE7049 | 3/15/2007 | ['7049'] | [] | [u'17400708'] | 1913798 | [u'17400708'] | ['Le', 'Kohler', 'Wincker', 'Fluch', 'Duplessis', 'Duchaussoy', 'Ningre', 'Frey', 'Couloux', 'Rinaldi', 'Martin'] | ['Le', 'Kohler', 'Wincker', 'Fluch', 'Duplessis', 'Duchaussoy', 'Ningre', 'Frey', 'Couloux', 'Rinaldi', 'Martin'] | ['Le', 'Kohler', 'Wincker', 'Fluch', 'Duplessis', 'Duchaussoy', 'Ningre', 'Frey', 'Couloux', 'Rinaldi', 'Martin'] | Plant Physiol | 2007 | 2007 May | 0 | AND pmc_gds | 1 | 0 | ||||
811 | GSE7062 | 4/17/2007 | ['7062'] | [] | [u'17439305'] | 1852588 | [u'17439305'] | ['Goodrich', 'Clarenz', 'Zhang', 'Pellegrini', 'Cokus', 'Bernatavichute', 'Jacobsen'] | ['Goodrich', 'Clarenz', 'Zhang', 'Pellegrini', 'Cokus', 'Bernatavichute', 'Jacobsen'] | ['Goodrich', 'Clarenz', 'Zhang', 'Pellegrini', 'Cokus', 'Bernatavichute', 'Jacobsen'] | PLoS Biol | 2007 | 2007 May | 0 | AND pmc_gds | 1 | 0 | ||||
812 | GSE7063 | 4/17/2007 | ['7063'] | [] | [u'17439305'] | 1852588 | [u'17439305'] | ['Goodrich', 'Clarenz', 'Zhang', 'Pellegrini', 'Cokus', 'Bernatavichute', 'Jacobsen'] | ['Goodrich', 'Clarenz', 'Zhang', 'Pellegrini', 'Cokus', 'Bernatavichute', 'Jacobsen'] | ['Goodrich', 'Clarenz', 'Zhang', 'Pellegrini', 'Cokus', 'Bernatavichute', 'Jacobsen'] | PLoS Biol | 2007 | 2007 May | 0 | AND pmc_gds | 1 | 0 | ||||
813 | GSE7064 | 4/17/2007 | ['7064'] | [] | [u'17439305'] | 1852588 | [u'17439305'] | ['Goodrich', 'Clarenz', 'Zhang', 'Pellegrini', 'Cokus', 'Bernatavichute', 'Jacobsen'] | ['Goodrich', 'Clarenz', 'Zhang', 'Pellegrini', 'Cokus', 'Bernatavichute', 'Jacobsen'] | ['Goodrich', 'Clarenz', 'Zhang', 'Pellegrini', 'Cokus', 'Bernatavichute', 'Jacobsen'] | PLoS Biol | 2007 | 2007 May | 0 | AND pmc_gds | 1 | 0 | ||||
814 | GSE7069 | 4/20/2007 | ['7069'] | ['2718'] | [u'17448993'] | 2945940 | [u'20840752'] | ['Harel', 'Mirny', 'Reizis', 'Doetsch', 'Hou', 'Arenzana', 'Galan-Caridad'] | ['Ojeda', 'de', 'Nitsch', 'Gon\xc3\xa7alves', 'Moreau'] | [] | BMC Bioinformatics | 2010 | 9/14/2010 | 0 | n, heat kernel, and Arnoldi kernel approximation. For comparison purposes, a local measure based on the expression of the direct neighbors is also computed. We have benchmarked these strategies on 40 publicly available knockout experiments in mice, and performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expr| approach for scoring candidates is presented here for comparison purposes as a naïve strategy for network analysis of differential expression. We have benchmarked these four strategies on 40 publicly available data sets originated from Affymetrix chips on which mice with (simple) knockout genes were tested against controls. The raw cel files were downloaded from GEO [ 12 ]. For each data set we|egy based on a direct neighborhood analysis. These four network-based prioritization strategies for scoring candidate genes based on their differentially expressed neighborhood were benchmarked on 40 publicly available knockout experiments in mice. Performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expressi|ng candidate genes using network-based machine learning approaches even if no knowledge is available about the disease or phenotype. Methods Benchmark data The benchmark for this study consists of 40 publicly available data sets originated from Affymetrix chips on which mice with knockout genes were tested against controls. The raw cel files were downloaded from GEO [ 12 ]. Table 3 shows all data sets|ur benchmark. Table 3 The benchmark data. 
Gene Name GEO accession number Gene Name GEO accession number 1 Abca1 GSE5496 21 Mbnl1 GSE14691 2 Btk GSE2826 22 Mst1r, Ron GSE16629 3 Cav1 GSE10849 23 MyD88 GSE6688 4 Cav3 GSE10848 24 Nos3, eNos GSE1988 5 Cftr GSE5715 25 Phgdh GSE8555 6 Clcn1 GSE14691 26 Pmp22 GSE1947 7 Cnr1 GSE7694 27 PPAR α GSE6864 8 Emd GSE5304 28 Prkag3, AMPK G3 GSE4065 9 Epas1, Hi|-2 GSE16067 29 Pthlh, Pthrp GSE17654 10 Esrra GSE7196 30 Rab3a GSE6527 11 Gap43 GSE12687 31 RasGrf1 GSE8425 12 Gnmt GSE9809 32 Rbm15 GSE12628 13 Hdac1 GSE5583 33 Runx GSE4911 14 Hdac2 GSE6770 34 Scd1 GSE2926 15 Hsf4 GSE12415 35 Slc26a4 GSE10587 16 Hspa1A, Hsp70.1 GSE11120 36 Srf GSE13333 17 Il6 GSE411 37 Tgm2 GSE10285 18 Lhx1, Lim1 GSE4230 38 Zc3h12a GSE14891 19 Lhx8 GSE11897 39 Zfp36, Tpp GSE5324 20 L | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
815 | GSE7069 | 4/20/2007 | ['7069'] | ['2718'] | [u'17448993'] | 1899089 | [u'17448993'] | ['Harel', 'Mirny', 'Reizis', 'Doetsch', 'Hou', 'Arenzana', 'Galan-Caridad'] | ['Harel', 'Mirny', 'Reizis', 'Doetsch', 'Hou', 'Arenzana', 'Galan-Caridad'] | ['Harel', 'Mirny', 'Reizis', 'Doetsch', 'Hou', 'Arenzana', 'Galan-Caridad'] | Cell | 2007 | 4/20/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
816 | GSE7070 | 5/15/2007 | ['7070'] | [] | [u'17590080'] | 1894823 | [u'17590080'] | [u'Maeurer', 'Mehlitz', 'M\xc3\xa4urer', 'Mollenkopf', 'Meyer'] | ['Mehlitz', 'M\xc3\xa4urer', 'Mollenkopf', 'Meyer'] | ['Mehlitz', 'M\xc3\xa4urer', 'Mollenkopf', 'Meyer'] | PLoS Pathog | 2007 | 2007 Jun | 0 | AND pmc_gds | 1 | 0 | ||||
817 | GSE7071 | 5/31/2007 | ['7071'] | [] | [u'17577400'] | 1925099 | [u'17577400'] | ['de', 'Jia', 'Yun', 'Ressom', 'Cheng', 'Mohanty', 'Bajic'] | ['de', 'Jia', 'Yun', 'Ressom', 'Cheng', 'Mohanty', 'Bajic'] | ['Jia', 'de', 'Ressom', 'Cheng', 'Mohanty', 'Bajic', 'Yun'] | BMC Genomics | 2007 | 6/18/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
818 | GSE7074 | 5/31/2007 | ['7074'] | [] | [u'20068025'] | 2579797 | [u'18451330'] | [u'Kisters', 'Bijnens', 'Daemen', u'Wijnands', u'Kuiper', 'Horrevoets', u'Volger', 'Verhaegh', 'Eijgelaar', u'Cleutjens', u'Fledderus', u'vD.'] | ['Nadler', 'Gleissner', 'Ley', 'Sanders'] | [] | Arterioscler Thromb Vasc Biol | 2008 | 2008 Jun | 0 | Affymetrix equipment and H133A chips as described. Gene expression data are available at the NCBI gene expression and hybridization array data repository ( http://www.ncbi.nlm.nih.gov/geo/ , series GSE7138 ). Real-time PCR Total RNA was isolated from cultured macrophages and foam cells using the RNEasy Mini Kit with DNase treatment. Reverse transcription was performed with the Omni-script RT Kit (all|human atherosclerotic plaques by laser capture microdissection showed 75% increased AR mRNA expression as measured by gene chip analysis (Volger OL et al., http://www.ncbi.nlm.nih.gov/geo/ , series GSE7074{{tag}}--MENTION-- ), supporting a role for the polyol pathway in human atherosclerosis. Our findings that oxLDL-induced AR upregulation increases oxidative stress suggest that the harmful effects clearly outweigh an | 0 | 0 | 1 | NOT pmc_gds | 0 | 0 |
819 | GSE7077 | 12/31/2007 | ['7077'] | [] | [u'17981215'] | 2515339 | [u'18698372'] | ['', 'Selvarajah', 'Maire', 'Squire', 'Bayani', 'Yoshimoto', 'Zielenska', 'Paderova'] | ['Maire', 'Squire', 'Sadikovic', 'Yoshimoto', 'Al-Romaih', 'Zielenska'] | ['Yoshimoto', 'Maire', 'Squire', 'Zielenska'] | PLoS One | 2008 | 7/30/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
820 | GSE7084 | 7/26/2007 | ['7084'] | ['2838'] | [u'17634102'] | 1934369 | [u'17634102'] | ['Kuivaniemi', 'Lenk', 'Gatalica', 'Weinsheimer', 'Tromp', 'Berguer'] | ['Kuivaniemi', 'Lenk', 'Gatalica', 'Weinsheimer', 'Tromp', 'Berguer'] | ['Kuivaniemi', 'Lenk', 'Gatalica', 'Weinsheimer', 'Tromp', 'Berguer'] | BMC Genomics | 2007 | 7/16/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
821 | GSE7094 | 2/23/2007 | ['7094'] | [] | [u'17328815'] | 2922100 | [u'20684756'] | ['Spindel', 'Norgren', 'Duan', 'Li'] | ['Arhondakis', 'Kossida', 'Kapasa'] | [] | Biol Direct | 2010 | 8/4/2010 | 0 | For the macaque, expression data were retrieved from embryonic stem cells [59], and from several adult tissues [60,61]{{key}}--REUSE--. | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
822 | GSE7094 | 2/23/2007 | ['7094'] | [] | [u'17328815'] | 2845654 | [u'20376170'] | ['Spindel', 'Norgren', 'Duan', 'Li'] | ['Zheng', 'Wang', 'Zhang', 'Uhl', 'Li', 'Liu', 'Lu', 'Cao', 'Du', 'Yu', 'Wei'] | ['Li'] | PLoS Comput Biol | 2010 | 3/26/2010 | 0 | ;only one unspliced Sus scrofa EST (BI343741) could be mapped to the first 3′ untranslated region (UTR) of FLJ33706 . The GEO [22] microarray database included a databset GSE7094{{tag}}--REUSE-- which profiled five tissues (cortex, fibroblast, pancreas, testis and thymus) in rhesus monkey. Re-analysis of the data showed low expression signal in Rhesus Macaque (normalized expression inten| monkey microarray data We found in Affymetrix Rhesus Macaque Genome Array a probeset MmugDNA.22336.1.S1 for the orthologous locus of FLJ33706 . We also found a GEO [22] dataset, GSE7094{{tag}}--REUSE--, which profiled five tissues (cortex, fibroblast, pancreas, testis and thymus) in a rhesus monkey with six replicates for each sample [39] . We downloaded GSE7094{{tag}}--REUSE-- raw array files f | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
823 | GSE7094 | 2/23/2007 | ['7094'] | [] | [u'17328815'] | 1819383 | [u'17328815'] | ['Spindel', 'Norgren', 'Duan', 'Li'] | ['Spindel', 'Norgren', 'Duan', 'Li'] | ['Spindel', 'Norgren', 'Duan', 'Li'] | BMC Genomics | 2007 | 2/28/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
824 | GSE7096 | 4/27/2007 | ['7096'] | [] | [u'17462098'] | 1867817 | [u'17462098'] | ['Hashizume', 'Hosoe', 'Ishiwata', 'Kaneyama', 'Kizaki', 'Ushizawa', 'Takahashi'] | ['Hashizume', 'Hosoe', 'Ishiwata', 'Kaneyama', 'Kizaki', 'Ushizawa', 'Takahashi'] | ['Hashizume', 'Hosoe', 'Ishiwata', 'Kaneyama', 'Kizaki', 'Ushizawa', 'Takahashi'] | Reprod Biol Endocrinol | 2007 | 4/27/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
825 | GSE7097 | 4/12/2007 | ['7097'] | ['3029'] | [u'17426248'] | 2821899 | [u'19351829'] | ['Van', 'Wang', 'Semizarov', 'Tahir', 'Lesniewski', 'Olejniczak', 'Sauter', 'Anderson'] | ['Rudin', 'Parmigiani', 'Marchionni', 'Rhodes', 'Devereux', 'Hierman', 'Daniel', 'Peacock', 'Dorsch', 'Watkins', 'Yung'] | [] | Cancer Res | 2009 | 4/15/2009 | 0 | AND pmc_gds | 0 | 1 | ||||
826 | GSE7098 | 3/15/2007 | ['7098'] | [] | [u'17400708'] | 1913798 | [u'17400708'] | ['Le', 'Kohler', 'Wincker', 'Fluch', 'Duplessis', 'Duchaussoy', 'Ningre', 'Frey', 'Couloux', 'Rinaldi', 'Martin'] | ['Le', 'Kohler', 'Wincker', 'Fluch', 'Duplessis', 'Duchaussoy', 'Ningre', 'Frey', 'Couloux', 'Rinaldi', 'Martin'] | ['Le', 'Kohler', 'Wincker', 'Fluch', 'Duplessis', 'Duchaussoy', 'Ningre', 'Frey', 'Couloux', 'Rinaldi', 'Martin'] | Plant Physiol | 2007 | 2007 May | 0 | AND pmc_gds | 1 | 0 | ||||
827 | GSE7110 | 6/15/2007 | ['7110'] | [] | [] | 1919478 | [u'17567617'] | [u'Schaub', u'Skvortsov', u'Curtis', u'Abdueva', u'Tavare'] | ['Schaub', 'Skvortsov', 'Tavar\xc3\xa9', 'Curtis', 'Abdueva'] | ['Schaub', 'Skvortsov', 'Curtis', 'Abdueva'] | Nucleic Acids Res | 2007 | 2007 | 0 | Raw CEL data have been deposited in NCBI's Gene Expression Omnibus [18] and are accessible through GEO Series accession number GSE7110{{tag}}--DEPOSIT--, while the processed results are provided as Supplementary Data. | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
828 | GSE7112 | 3/1/2007 | ['7112'] | ['2730'] | [u'18630751'] | 2848248 | [u'20226034'] | ['Schroeder', 'Hugouvieux', 'Kuhn'] | ['Blomster', 'Saloj\xc3\xa4rvi', 'Wrzaczek', 'Jaspers', 'Vainonen', 'Kangasj\xc3\xa4rvi', 'Overmyer', 'Reddy'] | [] | BMC Genomics | 2010 | 3/12/2010 | 0 | n with decreasing short-wave cut-off in the UV range (UV-B experiment); E-MEXP-739, Syringolin A; E-MEXP-1797, Rotenone), Gene Expression Omnibus http://www.ncbi.nlm.nih.gov/geo/ (accession numbers GSE5615, Elicitors LPS, HrpZ, Flg22 and NPP1; GSE5685, Virulent and avirulent Pseudomonas syringae ; GSE9955, BTH experiment 1; GDS417 E. cichoracearum ; GSE5530, H 2 O 2 ; GSE5621, Cold time course expe| GSE5622, Osmotic stress time course experiment; GSE5623, Salt time course experiment; GSE5624, Drought time course experiment; GSE5722, O 3 ; GSE12887, Norflurazon; GSE10732, OPDA and Phytoprostane; GSE7112{{tag}}--REUSE--, ABA experiment 2) and The Integrated Microarray Database System http://ausubellab.mgh.harvard.edu/imds (Experiment name: BTH time course, BTH experiment 2). The raw Affymetrix data was preproces | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
829 | GSE7112 | 3/1/2007 | ['7112'] | ['2730'] | [u'18630751'] | 2729609 | [u'19638476'] | ['Schroeder', 'Hugouvieux', 'Kuhn'] | ['Smirnoff', 'Jones', 'Mullineaux', 'Baker', 'Galvez-Valdivieso', 'Lawson', 'Davies', 'Asami', 'Truman', 'Slattery', 'Fryer'] | [] | Plant Cell | 2009 | 2009 Jul | 0 | L exposure and exogenous ABA application. A total of 816 genes were identified as representing a significant overlap between HL and ABA responses and were clustered with data from the Gene Expression Omnibus (GEO) and NASCARRAYS databases. The 30 min, 1 h, and 3 h ABA data come from NASCARRAYS-176, 4 h ABA from GSE7112{{tag}}--REUSE-- , 3 h ABA #2 from GSE6171 , and 3 h HL from GSE7743 . In this TREEVIEW repr|in HL-exposed ABA biosynthesis-deficient mutants. All mutants caused a significant reduction in the expression of the test genes compared with the wild-type controls ( Figure 2C ). A meta-analysis of publicly available microarray data for treatment of seedlings with ABA ( Goda et al., 2008 ) compared with data from HL-exposed seedlings ( Kleine et al., 2007 ; see Methods) revealed that 816 genes were core| genes was induced under both conditions, while 320 were suppressed in response to both treatments (see Supplemental Data Set 1 online). When expression data for these genes were clustered with other publicly available ABA treatment data ( Figure 2D ), a strong correlation was observed between 3 h of HL exposure and plants 3 or 4 h after ABA application at a variety of concentrations (uncentered correlati|iar ABA content varies ( Rossel et al., 2006 ). The primers used in this study for quantitative RT-PCR are given in Supplemental Table 2 online. Bioinformatics Data for Affymetrix ATH1 GeneChips were downloaded from the GEO repository ( http://www.ncbi.nlm.nih.gov/sites/entrez?db=gds ) and the NASCArrays database ( http://affymetrix.Arabidopsis.info/narrays/experimentbrowse.pl ). 
The HL exposure d|ed that contributed to the significant weighted similarity score were clustered together with other ABA treatment time points from the NASCARRAYS-176 data set and additional ABA treatment data sets ( GSE7112{{tag}}--REUSE-- and GSE6171 ). Hierarchical clustering was performed using CLUSTER ( Eisen et al., 1998 ) and visualized with the program TREEVIEW ( Eisen et al., 1998 ). Complete linkage clustering using an unc | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
830 | GSE7118 | 3/1/2007 | ['7118'] | [] | [u'19295514', u'18927605'] | 2802188 | [u'20008927'] | [u'Calcar', 'Hawkins', 'Heintzman', 'Hon', 'Stark', 'Liu', 'Ren', 'Crawford', u'Barrera', 'Lee', 'Zhang', 'Ye', 'Kheradpour', u'Glass', 'Ching', u'Kim', 'Kellis', 'Green', 'Stuart', u'Webster', 'Stewart', u'Rosenfeld', 'Wang', 'Thomson', u'Luna', 'Antosiewicz-Bourget', 'Harp', 'Lobanenkov'] | ['Kraus', 'Krishnakumar', 'Yang', 'Frizzell', 'Gamble'] | [] | Genes Dev | 2010 | 1/1/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
831 | GSE7118 | 3/1/2007 | ['7118'] | [] | [u'19295514', u'18927605'] | 2556089 | [u'18927605'] | [u'Calcar', 'Hawkins', 'Heintzman', 'Hon', 'Stark', 'Liu', 'Ren', 'Crawford', u'Barrera', 'Lee', 'Zhang', 'Ye', 'Kheradpour', u'Glass', 'Ching', u'Kim', 'Kellis', 'Green', 'Stuart', u'Webster', 'Stewart', u'Rosenfeld', 'Wang', 'Thomson', u'Luna', 'Antosiewicz-Bourget', 'Harp', 'Lobanenkov'] | ['Hon', 'Ren', 'Wang'] | ['Hon', 'Ren', 'Wang'] | PLoS Comput Biol | 2008 | 2008 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
832 | GSE7118 | 3/1/2007 | ['7118'] | [] | [u'19295514', u'18927605'] | 2910248 | [u'19295514'] | [u'Calcar', 'Hawkins', 'Heintzman', 'Hon', 'Stark', 'Liu', 'Ren', 'Crawford', u'Barrera', 'Lee', 'Zhang', 'Ye', 'Kheradpour', u'Glass', 'Ching', u'Kim', 'Kellis', 'Green', 'Stuart', u'Webster', 'Stewart', u'Rosenfeld', 'Wang', 'Thomson', u'Luna', 'Antosiewicz-Bourget', 'Harp', 'Lobanenkov'] | ['Hawkins', 'Lee', 'Stewart', 'Heintzman', 'Hon', 'Ren', 'Zhang', 'Ye', 'Kellis', 'Stark', 'Kheradpour', 'Liu', 'Ching', 'Antosiewicz-Bourget', 'Stuart', 'Thomson', 'Green', 'Lobanenkov', 'Crawford', 'Harp'] | ['Hawkins', 'Lee', 'Stewart', 'Zhang', 'Hon', 'Heintzman', 'Ye', 'Kellis', 'Stark', 'Kheradpour', 'Liu', 'Ching', 'Green', 'Antosiewicz-Bourget', 'Stuart', 'Thomson', 'Ren', 'Lobanenkov', 'Crawford', 'Harp'] | Nature | 2009 | 5/7/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
833 | GSE7120 | 10/2/2007 | ['7120'] | [] | [u'19154620'] | 2657775 | [u'19154620'] | ['Mohammad', u'Priyanka', 'Singh', 'Sharma', u'Farhan'] | ['Mohammad', 'Singh', 'Sharma'] | ['Mohammad', 'Singh', 'Sharma'] | BMC Syst Biol | 2009 | 1/21/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
834 | GSE7121 | 3/15/2007 | ['7121'] | [] | [u'17400708'] | 1913798 | [u'17400708'] | ['', 'Le', 'Kohler', 'Wincker', 'Fluch', 'Duplessis', 'Duchaussoy', 'Ningre', 'Frey', 'Couloux', 'Rinaldi', 'Martin'] | ['Le', 'Kohler', 'Wincker', 'Fluch', 'Duplessis', 'Duchaussoy', 'Ningre', 'Frey', 'Couloux', 'Rinaldi', 'Martin'] | ['Le', 'Kohler', 'Wincker', 'Fluch', 'Duplessis', 'Duchaussoy', 'Ningre', 'Frey', 'Couloux', 'Rinaldi', 'Martin'] | Plant Physiol | 2007 | 2007 May | 0 | AND pmc_gds | 1 | 0 | ||||
835 | GSE7122 | 2/23/2007 | ['7122'] | [] | [u'17343727'] | 1829401 | [u'17343727'] | ['Joosse', 'van', 'Nederlof'] | ['Joosse', 'van', 'Nederlof'] | ['Joosse', 'van', 'Nederlof'] | BMC Cancer | 2007 | 3/7/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
836 | GSE7123 | 2/27/2007 | ['7123'] | [] | [u'17267482'] | 2620272 | [u'19014681'] | ['Taylor', 'Sanda', 'Stephens', 'Tavis', 'Belle', 'Tsukahara', 'Li', 'Schaley', 'Edenberg', 'Howell', 'Brodsky', 'McClintick'] | ['Tozeren', 'Gormley'] | [] | BMC Bioinformatics | 2008 | 11/17/2008 | 1 | luate switch-like gene expression patterns in high-dimensional datasets profiling diverse biological conditions. For this purpose, we compiled two large-scale gene expression microarray datasets from publicly available data repositories. The first dataset included samples spanning nineteen different tissue types from healthy donors. The second dataset included samples from donors with one of a number of i| genes may serve as candidate biomarkers or therapeutic targets in tissue-specific diseases. Table 1 Microarray datasets used in this study Tissue Phenotype Data Tissue No. of Samples Gene Expression Omnibus/Array Express Accn. # Adipose 10 GSE3526 Adrenal 20 GSE3526, GSE8514, GSE2316 Brain 89 GSE3526, GSE7621, GSE7307, GSE2361, E_AFMX-11, E-TABM-20, Colon 10 E-TABM-176, GSE8671, GSE9254, GSE9452 Epiderm|133, GSE2361, GSE3419, GSE3526, GSE7307 Heart 38 E_AFMX-11, E-MIMR-27, GSE1133, GSE2240, GSE2361, GSE3526, GSE3585, GSE7307 Kidney 10 E_AFMX-11, GSE2004, GSE2361, GSE3526, GSE7392 Liver 10 E_AFMX-11, GSE2004, GSE3526, GSE6764 Lung 26 E-MEXP-231, GSE10072, GSE1133, GSE2361, GSE3526 Mammary 15 E-TABM-66, GSE2361, GSE3526, GSE7307, GSE7904 Muscle 64 GSE10760, GSE2328, GSE3526, GSE5110, GSE6798, GSE7307, G| Ovary 10 GSE2361, GSE3526, GSE6008, GSE7307 Pancreas 6 GSE1133, GSE2361, GSE7307 Peripheral blood 12 GSE7462, GSE8608, GSE8668, GSE8762, GSE9692 Small intestine 7 GSE2361, GSE7307 Spleen 12 GSE2004, GSE2361, GSE3526, GSE7307 Stomach 10 GSE2361, GSE3526, GSE7307 Testis 38 E_AFMX-11, GSE1133, GSE2361, GSE3218, GSE3526, GSE7307, GSE7808 Thymus 5 GSE1133, GSE2361, GSE7307 Infectious Disease Disease No. of|on regions of DNA that code for switch-like genes and their promoter regions. 
Methods Datasets Microarray datasets used in this study were compiled from the online public repositories Gene Expression Omnibus (GEO) [ 53 ] and Array Express (AE) [ 54 ] as described in additional file 2 . All datasets were profiled on the HGU133A or its recently expanded version, the HGU133plus2 Affymetrix platforms. The da|ssi A Lee C Relative impact of nucleotide and copy number variation on gene expression phenotypes Science 2007 315 848 853 17289997 10.1126/science.1136678 Edgar R Domrachev M Lash AE Gene Expression Omnibus: NCBI gene expression and hybridization array data repository Nucleic acids research 2002 30 207 210 11752295 10.1093/nar/30.1.207 Parkinson H Kapushesky M Shojatalab M Abeygunawardena N Coulson R Fa | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
837 | GSE7123 | 2/27/2007 | ['7123'] | [] | [u'17267482'] | 2546378 | [u'18691426'] | ['Taylor', 'Sanda', 'Stephens', 'Tavis', 'Belle', 'Tsukahara', 'Li', 'Schaley', 'Edenberg', 'Howell', 'Brodsky', 'McClintick'] | ['Xie', 'Shyr', 'Wei', 'Tu', 'Li', 'Huang'] | ['Li'] | J Transl Med | 2008 | 8/9/2008 | 0 | nal time-series microarray data used in this work is from a study of Milton W. Taylor which was published on Journal of Virology last year[ 4 ], and publicly available at GEO under accession number GSE7123{{tag}}--REUSE--. The initial data set consists of the gene expression profiles of 33 African-American and 36 Caucasian American patients with chronic HCV genotype 1 infection on day 0 (pretreatment), and 1, 2, 7, | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
838 | GSE7123 | 2/27/2007 | ['7123'] | [] | [u'17267482'] | 1866036 | [u'17267482'] | ['Taylor', 'Sanda', 'Stephens', 'Tavis', 'Belle', 'Tsukahara', 'Li', 'Schaley', 'Edenberg', 'Howell', 'Brodsky', 'McClintick'] | ['Taylor', 'Sanda', 'Stephens', 'Tavis', 'Belle', 'Tsukahara', 'Li', 'Schaley', 'Edenberg', 'Howell', 'Brodsky', 'McClintick'] | ['Taylor', 'Sanda', 'Stephens', 'Tavis', 'Belle', 'Tsukahara', 'Li', 'Schaley', 'Edenberg', 'Howell', 'Brodsky', 'McClintick'] | J Virol | 2007 | 2007 Apr | 0 | AND pmc_gds | 1 | 0 | ||||
839 | GSE7124 | 11/9/2007 | ['7124'] | ['3242'] | [] | 2846932 | [u'20236552'] | [u'Martin', u'Tyler', u'Dorrance', u'Maroof', u'Hoeschele'] | ['Ma', 'Ramachandran', 'Jiang'] | [] | BMC Evol Biol | 2010 | 3/18/2010 | 0 | u/ for rice and http://www.Arabidopsis.org/ for Arabidopsis . The soybean expression data under the pathogen Phytophthora sojae was downloaded from the GEO DataSets [ 86 ] with accession number GSE7124{{tag}}--REUSE--. The microarray data after the infection by soybean cyst nematode were obtained from the Arrayexpress database [ 87 ] with accession number E-MEXP-808 [ 98 ]. Lectin genes with at least 2-fold diff|e. Abbreviations ABA: Agaricus bisporus agglutinin; CRA: chitinase-related agglutinin; EEA: Euonymus europaeus agglutinin; EST: expression sequence tag; EUL: Euonymus lectin; GEO: gene expression omnibus; GNA: Galanthus nivalis agglutinin; HMM: hidden markov model; LysM: lysin motif; MPSS: massively parallel signature sequencing; MRCA: most recent common ancestor; MULE: Mutator -like transposable | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
840 | GSE7127 | 3/31/2007 | ['7127'] | [] | [u'17516929'] | 2602602 | [u'19104654'] | ['Johansson', u'Packer', u'Parsons', u'Stark', 'Pavey', u'Boyle', 'Hayward'] | ['Remy', 'Didier', 'Granjeaud', 'Imbert', 'Bergon', 'Nguyen', 'Puthier', 'Lopez', 'Textoris'] | [] | PLoS One | 2008 | 2008 | 1 | iated DKNN value (with k being set typically to 100 for microarrays containing 10 to 50k elements). Distributions of DKNN values observed with both an artificial and a real dataset (Complex9RN200 and GSE1456 respectively, see thereafter for a description) are shown in Figure S3A and S3B (solid curve). The asymmetrical shape of the distribution observed in Figure S3B suggests the presence of a parti|s ability to extract relevant informations from a noisy environment. However, a range of optimal values for inflation parameter needs to be defined to get the best results. Performances of DBF-MCL on GSE1456 dataset Next, DBF-MCL was tested with microarray data to explore its effectiveness in finding clusters of co-regulated genes. To this end, we used the microarray data from Pawitan et al. |sed for analysis. Figure S5B , shows the number of informative genes obtained with various k values. Again, two phases were observed suggesting that regions with heterogeneous densities exist in the GSE1456 dataset. As expected, the transition from dense to sparse regions was less marked than in the artificial dataset. A k value of 100 was chosen to allow the extraction of a large part of data that ca| microarray dataset. Interestingly, in all cases meaningful partitioning results were obtained using inflation parameter set to 2. 10.1371/journal.pone.0004001.g001 Figure 1 Results obtained with the GSE1456 dataset. DBF-MCL was run with GSE1456 as input (k = 100, FDR = 10%, S 1..3 , Inflation = 2). (A) Hierarchical cluster|ected genes. (D) The graph after MCL partitioning. Each point is colored according to its associated class. 
(E) Correspondence between hierarchical clustering and DBF-MCL results. (F) TS obtained for GSE1456 (G) Functional enrichment associated with these TS. Systematic extraction of TS We next applied DBF-MCL algorithm to all experiments performed on human, mouse and rat Affymetrix microarrays and ava|he results panel (2), the information panel (3), the plugins panel (4) and the plugin display panel (5). This example shows the expression profiles of genes contained in the TS CBE3881EB derived from GSE469 (“Temporal profiling in muscle regeneration”). Annotation panel shows that this TS is highly enriched in genes related to ATP synthesis. Meta-analysis of public microarray data using|e 1 Transcriptional signatures containing Affymetrix probes for ESR1, GATA3 and FOXA1. TS ID 1 Genes 2 Probes 2 Samples 2 Sample type GSE ID GPL ID Author PubMed IDs 0F2635383 1190 1572 23 Cell lines GSE6569 GPL96 Huang F et al 2007 17332353 3DE64836D 102 143 62 Tissue GSE7904 GPL570 unpublished 2007 - 59A18E225 690 893 121 Both GSE2603 GPL96 Minn AJ et al 2005 16049480 6C975B20B 88 96 26 Tissue GSE677|lein A et al 2007 17410534 6C975B290 88 96 26 Tissue GSE6596 GPL96 Klein A et al 2007 17410534 7150E17F6 868 1032 34 Cell lines GSE4668 GPL96 Coser KR et al 2003 14610279 8059848B4 200 250 251 Tissue GSE3494 GPL96 Miller LD et al 2005 16141321 84E5E1077 694 883 198 Tissue GSE7390 GPL96 Desmedt C et al 2007 17545524 8F69864F9 68 82 95 Tissue GSE5847 GPL96 Boersma BJ et al 2007 17999412 A151D5695 297 361|et al 2007 17420468 B79B1C0B9 270 380 47 Tissue GSE3744 GPL570 Richardson AL et al 2006 16473279 BDB6D8700 550 679 104 Tissue GSE3726 GPL96 Chowdary D et al 2006 16436632 D8F0B528C 125 152 159 Tissue GSE1456 GPL96 Pawitan Y et al 2005 16280042 E2E620F40 448 616 129 Tissue GSE5460 GPL570 unpublished 2007 - EA9669A21 219 251 158 Tissue GSE3143 GPL91 Bild AH et al 2006 16273092 F310ACC36 519 646 49 Tissue|05b;15] and in genes related to the PIR keyword “multigene family”. 
Furthermore, several signatures, of Table 2 are related to melanoma and six of them were observed in the GSE7127{{tag}}--REUSE-- experiment [16] . Although data from Table 2 would deserve further analysis they are most likely related to gain or loss of genetic material in tumors. Indeed, gain of 8q is freq|<1.10−20 ) for any of the human cytoband tested. TS ID 1 Enrich. 2 Cytoband q.value Sample type GSE ID GPL ID Authors PubMed ID 3DA3C8345 24% 17q12-q21 1.7.10 −39 Skin GSE5667 GPL97 Plager DA et al 2007 17181634 43CC3EF57 9% 8q24.3 7.0.10 −32 Melanoma GSE7153 GPL570 Unpublished 2007 - 60E29DA83 16% 8q24.3 6.8.10 −24 Melanoma GSE7127{{tag}}--REUSE-- GPL570| GPL570 Johansson P et al 2007 17516929 60E6B4129 35% 20p13 1.6.10 −26 Melanoma GSE7127{{tag}}--REUSE-- GPL570 Johansson P et al 2007 17516929 60E96FF1E 28% 6p21.3 1.2.10 −28 Melanoma GSE7127{{tag}}--REUSE-- GPL570 Johansson P et al 2007 17516929 60EC95F6A 17% 7q22.1 6.3.10 −31 Melanoma GSE7127{{tag}}--REUSE-- GPL570 Johansson P et al 2007 17516929 60EEBD669 32% 11q23.3 1.4.10 −26 Melan|127 GPL570 Johansson P et al 2007 17516929 B4C95CF18 42% 8q24.3 1.1.10 −36 Ovary GSE6008 GPL96 Hendrix ND et al 2006 16452189 A93ED6519 16% 11q23.3 6.9.10 −23 Melanoma GSE7152 GPL570 Packer LM et al 2007 17450523 A93DB01ED 11% 7q22.1 9.5.10 −30 Melanoma GSE7152 GPL570 Packer LM et al 2007 17450523 1 Transcriptional signature ID. 2 Enrichment: Proportion o|ial platforms ( e.g. National Cancer Institute, Vanderbilt Microarray Shared Resource, Genome Institute of Singapore), several of them being related to the MicroArray Quality Control (MAQC) project (GSE5350) [26] . However, to date, systematic analysis of all experiments performed on these platforms has not been done. The flexibility of our approach also makes it possible to integrate|ile. Figure S3 Distributions of DKNN values. 
Observed DKNN values (solid line) and of a set of simulated DKNN values S (dotted line) are shown for (A) the Complex9RN200 artificial dataset and (B) the GSE1456 microarray dataset. (9.01 MB TIF) Click here for additional data file. Figure S4 Colors correspond to the clusters found using the corresponding algorithm (A) The whole dataset (9,112 points). (B) |s input using a range of k values (FDR = 10%, S1..3, Inflation = 1.2). (B) DBF-MCL was run with several microarray datasets as input (including GSE1456) using a range of k values (FDR = 10%, S1..3, Inflation = 2). (8.72 MB TIF) Click here for additional data file. Figure S6 The TBMap plugin. |for technical assistance. References 1 Stoeckert CJ Causton HC Ball CA 2002 Microarray databases: standards and ontologies. Nat Genet 32 Suppl 469 73 12454640 2 Barrett T Edgar R 2006 Gene expression omnibus: microarray data storage, submission, retrieval, and analysis. Methods Enzymol 411 352 69 16939800 3 Diehn M Sherlock G Binkley G Jin H Matese JC 2003 SOURCE: a unified genomic resource of functional | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
841 | GSE7128 | 4/3/2007 | ['7128'] | [] | [u'17403935'] | 2118536 | [u'17403935'] | ['Shaffer', 'Staudt', 'Choi', 'Haddad', 'Calame', 'Kuo'] | ['Shaffer', 'Staudt', 'Choi', 'Haddad', 'Calame', 'Kuo'] | ['Shaffer', 'Staudt', 'Choi', 'Haddad', 'Calame', 'Kuo'] | J Exp Med | 2007 | 4/16/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
842 | GSE7130 | 10/22/2007 | ['7130'] | [] | [u'17952125'] | 2492386 | [u'17952125'] | ['Chaplin', 'Harada', 'Young', 'Caulee', 'Chelala', 'Lemoine', 'Bhakta', 'Baril'] | ['Chaplin', 'Harada', 'Young', 'Caulee', 'Chelala', 'Lemoine', 'Bhakta', 'Baril'] | ['Chaplin', 'Harada', 'Young', 'Caulee', 'Chelala', 'Lemoine', 'Bhakta', 'Baril'] | Oncogene | 2008 | 3/20/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
843 | GSE7137 | 4/2/2007 | ['7137'] | ['2687'] | [u'17403374'] | 1892530 | [u'17403374'] | ['Gray', 'Haldar', 'Wang', 'Orihuela', 'Jain', 'Hong', 'Fisch', 'Peroni', 'Kahn', 'Cline', 'Kim'] | ['Gray', 'Haldar', 'Wang', 'Orihuela', 'Jain', 'Hong', 'Fisch', 'Peroni', 'Kahn', 'Cline', 'Kim'] | ['Gray', 'Haldar', 'Wang', 'Orihuela', 'Hong', 'Fisch', 'Peroni', 'Kahn', 'Cline', 'Jain', 'Kim'] | Cell Metab | 2007 | 2007 Apr | 0 | AND pmc_gds | 1 | 0 | ||||
844 | GSE7138 | 3/28/2007 | ['7138'] | [] | [u'17244792'] | 2579797 | [u'18451330'] | ['Lee', 'Miller', 'Cho', 'Gleissner', u'Sashkin', 'Jain', 'Ley', 'Dunson', 'Shashkin'] | ['Nadler', 'Gleissner', 'Ley', 'Sanders'] | ['Gleissner', 'Ley'] | Arterioscler Thromb Vasc Biol | 2008 | 2008 Jun | 0 | Affymetrix equipment and H133A chips as described. Gene expression data are available at the NCBI gene expression and hybridization array data repository ( http://www.ncbi.nlm.nih.gov/geo/ , series GSE7138{{tag}}--DEPOSIT-- ). Real-time PCR Total RNA was isolated from cultured macrophages and foam cells using the RNEasy Mini Kit with DNase treatment. Reverse transcription was performed with the Omni-script RT Kit (all|human atherosclerotic plaques by laser capture microdissection showed 75% increased AR mRNA expression as measured by gene chip analysis (Volger OL et al., http://www.ncbi.nlm.nih.gov/geo/ , series GSE7074 ), supporting a role for the polyol pathway in human atherosclerosis. Our findings that oxLDL-induced AR upregulation increases oxidative stress suggest that the harmful effects clearly outweigh an | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
845 | GSE7145 | 4/30/2007 | ['7145'] | [] | [u'18453640'] | 2413281 | [u'18453640'] | [u'Gonz\xe1lez', 'Pavez', 'Silva', 'Pacheco', 'Cambiazo', 'Ib\xc3\xa1\xc3\xb1ez', 'Campos-Vargas', 'Orellana', 'Gonz\xc3\xa1lez', 'Gonz\xc3\xa1lez-Ag\xc3\xbcero', 'Meisel', u'Ib\xe1\xf1ez', u'Gonz\xe1lez-Ag\xfcero', 'Retamales'] | ['Pavez', 'Silva', 'Pacheco', 'Cambiazo', 'Ib\xc3\xa1\xc3\xb1ez', 'Campos-Vargas', 'Orellana', 'Gonz\xc3\xa1lez', 'Gonz\xc3\xa1lez-Ag\xc3\xbcero', 'Meisel', 'Retamales'] | ['Pavez', 'Silva', 'Pacheco', 'Cambiazo', 'Ib\xc3\xa1\xc3\xb1ez', 'Campos-Vargas', 'Orellana', 'Gonz\xc3\xa1lez', 'Gonz\xc3\xa1lez-Ag\xc3\xbcero', 'Meisel', 'Retamales'] | J Exp Bot | 2008 | 2008 | 0 | AND pmc_gds | 1 | 0 | ||||
846 | GSE7146 | 3/1/2007 | ['7146'] | ['2791', '2790'] | [u'17472435'] | 2975422 | [u'21047384'] | ['Storgaard', 'Ladd', 'Johansson', 'Jensen', 'Poulsen', 'Krook', 'Mazzini', 'Chutkow', 'Zierath', 'Altshuler', 'Groop', 'Parikh', 'Tornqvist', 'Bj\xc3\xb6rnholm', 'Saxena', 'Mootha', 'Schulze', 'Carlsson', 'Ridderstr\xc3\xa5le', 'Vaag', 'Lee'] | ['Xu', 'Hu'] | [] | BMC Genomics | 2010 | 11/2/2010 | 0 | essed genes usually located in the sparse boundary region. Biases of popular algorithms for identifying differentially expressed genes are visually characterized. Experiments on 17 datasets from Gene Omnibus Database (GEO) with experimentally verified differentially expressed genes showed that DB pruning can significantly improve the prediction accuracy of popular identification algorithms such as t-test|dary region in the 2-D feature space of average gene expression (AG) versus average difference of gene expression (AD). Fig. 1 . shows the distribution of true DEGs in the 2D space for four datasets: GSE9499, GSE6342, GSE6740_1, and GSE6740_2 from GEO database [ 22 ]. Based on the fact that boundary region is characterized with scarcity of genes, a density based pruning algorithm is proposed here for p|s as collected by Kadota et. al. [ 20 ]. They collected 38 microarray datasets with experimentally determined true DEGS by real-time polymerase chain reaction (RT-PCR). Thirty six of the datasets are downloaded from GEO database [ 22 ]. Without losing generality, we experimented with 17 disease or dose response datasets of Homo sapiens out of the 36 GEO datasets (Table 1 ). The 17 datasets are reported j|how that real-world GEO datasets, especially historical ones, tend to have small sample size. Table 1 17 Datasets with 284 DEGs in total. Each dataset has 22833 genes. 
Dataset Conditions True DEG A B GSE1462 4 4 4 GSE1615_1 4 5 8 GSE1650 18 12 8 GSE2666_2 5 5 6 GSE3524 16 4 4 GSE3860 9 9 8 GSE4917 3 3 5 GSE5667_1 5 6 3 GSE6236 14 14 7 GSE6344 10 10 19 GSE6740_1 10 10 40 GSE6740_2 10 10 62 GSE7146{{tag}}--REUSE-- 6 6 6|GSE8441 11 11 9 GSE9499 15 7 77 GSE9574 15 14 5 The 17 Datasets used here cover a variety of biological or medical studies: GSE1462 (mitochondrial DNA mutations), GSE1615_1 (Valproic acid treatment), GSE1650 (chronic obstructive pulmonary disease), GSE2666_2(bone marrow Rho level effect), GSE3524 (tumor of epithelial tissue), GSE3860 (Hutchinson-Gilford progeria syndrome), GSE4917 (breast cancer), GSE5|67_1 (atopic dermatitis), GSE6236 (Adult vs. fetal reticulocyte transcriptome comparison), GSE6344 (renal cell carcinoma disease), GSE6740_1 (HIV-infection), GSE6740_2 (HIV-infection, disease state), GSE7146{{tag}}--REUSE-- (hyperinsulinaemic, does response), GSE7765 (dose response, DMSO or 100 nM Dioxin), GSE8441 (dietary intake response), GSE9574 (breast cancer), and GSE9499 (hypomorphic germline mutations). The div|ch as WAD and FC. To illustrate the bias of popular DEG identification algorithms, Fig. 2 shows true positive (TP), true negative (TN), false positive (FP), and false negative (FN) DEGs for dataset GSE9499 which has 77 true DEGs. Fig. 2 . (a) shows that fold change (FC) misses most true DEGs (FN genes), which are located in the region below the threshold average difference and with high expression le|obtaining a user-specified number of candidate DEGs. Table 2 Comparison of No. of missing true DEGs after DB pruning. ( N 0 = 4, R 0 = 0.0017) Total Gene: 22283 After DP-pruning True DEG DP missed GSE1462 2054 4 0 GSE1615_1 2449 8 3 GSE1650 1317 8 2 GSE2666_2 1618 6 2 GSE3524 814 4 0 GSE3860 2073 8 0 GSE4917 785 5 1 GSE5667_1 1316 3 0 GSE6236 2231 7 0 GSE6344 3127 19 0 GSE6740_1 1183 40 1 GSE6740_2 |DEG to have high expression levels and high expression difference. 
Table 3 Ranks of true DEGs in original gene list and pruned gene list. Genes are sorted by four DEG identification algorithms on the GSE1577 dataset. Increase of ranks of true DEGs means that DB pruning have correctly filtered out many non-DEGs. t-test/tTest’ 1404/808 7/6 1321/768 3800/1713 4741/1975 3633/1659 4145/1828 606/388 |on level remains the same. And the average difference of expression can be defined as sum of average difference among pairwise comparisons. Our DB pruning is implemented using C++ and Perl and can be downloaded from http://mleg.cse.sc.edu/degprune . Authors’ contributions JH initiated the project, proposed the DB pruning idea, helped in experimental designs, and wrote the manuscript. J. X. develo | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
847 | GSE7146 | 3/1/2007 | ['7146'] | ['2791', '2790'] | [u'17472435'] | 2366062 | [u'18483554'] | ['Storgaard', 'Ladd', 'Johansson', 'Jensen', 'Poulsen', 'Krook', 'Mazzini', 'Chutkow', 'Zierath', 'Altshuler', 'Groop', 'Parikh', 'Tornqvist', 'Bj\xc3\xb6rnholm', 'Saxena', 'Mootha', 'Schulze', 'Carlsson', 'Ridderstr\xc3\xa5le', 'Vaag', 'Lee'] | ['Becker', 'Palsson'] | [] | PLoS Comput Biol | 2008 | 5/16/2008 | 1 | context-specific skeletal muscle models. Abbreviation Description Reference GEO Accession Number GB 3 patients before and 1 year after gastric bypass surgery (vastus lateralis). [27] GDS2089 GI 6 subjects before glucose/insulin infusion via clamp and 2 hours after beginning (vastus lateralis) [28] GSE7146{{tag}}--REUSE-- FO 24 subjects divided into 3 groups of eight: morbidly obese (MO | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
848 | GSE7146 | 3/1/2007 | ['7146'] | ['2791', '2790'] | [u'17472435'] | 2955048 | [u'20846437'] | ['Storgaard', 'Ladd', 'Johansson', 'Jensen', 'Poulsen', 'Krook', 'Mazzini', 'Chutkow', 'Zierath', 'Altshuler', 'Groop', 'Parikh', 'Tornqvist', 'Bj\xc3\xb6rnholm', 'Saxena', 'Mootha', 'Schulze', 'Carlsson', 'Ridderstr\xc3\xa5le', 'Vaag', 'Lee'] | ['Carter', 'Xu'] | [] | BMC Bioinformatics | 2010 | 9/16/2010 | 0 | 144 609 Hochberg 0 144 609 SidakSD 0 144 614 BH 0 407 5552 BY 0 227 2221 qvalue 0 407 6108 SAM 0 0 5330 Bayes 0 0 5705 EDR 5 593 4810 The raw cel files of these three data sets [ 21 , 23 , 25 ] were downloaded from the NCBI GEO database (GSE7146{{tag}}--REUSE--, GSE7333, GSE4107) and were preprocessed by the GC-RMA method. Two groups in each data set were tested by two-tailed t test assuming equal variance. All multiple|0 543 1319 891 1628 1724 TN 3774 4238 3424 3870 3110 3011 FN 91 118 80 98 75 72 TPR 0.4972 0.3481 0.5580 0.4586 0.5856 0.6022 FPR 0.2061 0.1136 0.2781 0.1871 0.3436 0.3641 The expression data set was downloaded from http://www.ambystoma.org and was preprocessed by the RMA method [ 38 ]. Differentially expressed genes (DEGs) were detected at the significance level of 0.05 by the EDR method and the other |ining 7129 probe sets. Only three genes were reported to be regulated by insulin in human muscle cell using a Wilcoxon signed rank test after filtering removed 5952 probe sets. The raw cel files were downloaded from the NCBI GEO database (GSE7146{{tag}}--REUSE--) containing data that are MIAME compliant as detailed on the MGED Society website http://www.mged.org/Workgroups/MIAME/miame.html The GC-RMA algorithm was used|d type and three miR-1-2 knockout mice at postnatal days 10 were compared for gene expression levels using Affymetrix mouse genome 430 2.0 array that contains 45101 probe sets. The raw cel files were downloaded from the NCBI GEO database (GSE7333) and were preprocessed by GC-RMA algorithm. 
With this data set, the EDR method was compared with 11 other multiple test procedures (Figure 2 ) at the same signi|ing the GeneChip U133-Plus 2.0 Array [ 25 ]. Twelve tumor specimens and ten adjacent grossly normal-appearing tissues from at least 8 cm away were collected for RNA extraction. The raw cel files were downloaded from the NCBI GEO database (GSE4107) and were preprocessed by GC-RMA algorithm. With this data set, the EDR method was compared with the other 11 multiple test procedures (Figure 2 ) at the same s|dent limb regeneration [ 26 ]. The same RNA samples were detected by Ambystoma GeneChip and 454 cDNA sequencing. There are total 4844 probe sets (TGs) on this GeneChip array. The raw cel files were downloaded from the public Ambystoma Microarray Database [ 37 ]. Detailed information of these data files and the DEGs confirmation by 454 cDNA sequencing were described in the original study [ 26 ]. The RMA | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
849 | GSE7146 | 3/1/2007 | ['7146'] | ['2791', '2790'] | [u'17472435'] | 1858708 | [u'17472435'] | ['Storgaard', 'Ladd', 'Johansson', 'Jensen', 'Poulsen', 'Krook', 'Mazzini', 'Chutkow', 'Zierath', 'Altshuler', 'Groop', 'Parikh', 'Tornqvist', 'Bj\xc3\xb6rnholm', 'Saxena', 'Mootha', 'Schulze', 'Carlsson', 'Ridderstr\xc3\xa5le', 'Vaag', 'Lee'] | ['Storgaard', 'Ladd', 'Johansson', 'Jensen', 'Poulsen', 'Krook', 'Mazzini', 'Chutkow', 'Zierath', 'Altshuler', 'Groop', 'Parikh', 'Tornqvist', 'Bj\xc3\xb6rnholm', 'Saxena', 'Mootha', 'Schulze', 'Carlsson', 'Ridderstr\xc3\xa5le', 'Vaag', 'Lee'] | ['Storgaard', 'Ladd', 'Johansson', 'Jensen', 'Mazzini', 'Krook', 'Tornqvist', 'Chutkow', 'Altshuler', 'Groop', 'Zierath', 'Bj\xc3\xb6rnholm', 'Saxena', 'Mootha', 'Carlsson', 'Ridderstr\xc3\xa5le', 'Schulze', 'Vaag', 'Parikh', 'Lee', 'Poulsen'] | PLoS Med | 2007 | 2007 May | 0 | AND pmc_gds | 1 | 0 | ||||
850 | GSE7148 | 11/2/2007 | ['7148'] | ['3207'] | [u'17854483'] | 2785812 | [u'19917117'] | ['Rose', 'Arevalo', 'Cacioppo', 'Cole', 'Hawkley', 'Sung'] | ['Kroll', 'Barkema', 'Carlon'] | [] | Algorithms Mol Biol | 2009 | 11/16/2009 | 1 | g features in the chip and 2) sequence-dependent affinities for non-specific hybridization fitted by an extended nearest-neighbor model. Conclusion The algorithm has been tested on 360 GeneChips from publicly available data of recent expression experiments. The algorithm is fast and accurate. Strong correlations between the fitted values for different experiments as well as between the free-energy paramet|value decomposition and we provide the details of the selected cost function. In Results a test of the algorithm on about 360 Genechips from recent (2006 onwards) experiments from the Gene Expression Omnibus (GEO) is presented. Finally, the advantages of this scheme and its overall performance as background subtraction method is highlighted. Methods Approach The general assumption is that the (natural) l|ding eigenvalues close to zero in machine precision. These are, however, rare, and were actually never found in the calculations presented here. Results We analyzed a total of 366 CEL-files which are publicly available from the GEO server http://www.ncbi.nlm.nih.gov/geo . Table 1 gives an overview of the distribution of CEL-files over the twelve different organisms considered in this study. The array s|d to the local physical neighbors and to the nearest-neighbor free energy. Table 1 Overview over organisms and number of CEL-files analyzed Organism GEO # Chiptype (dimension) # files min A. Thaliana GSE4847 ATH1-121501 (712 × 712) 18 0.0259 GSE7642 ATH1-121501 (712 × 712) 12 0.0544 GSE9311 ATH1-121501 (712 × 712) 8 0.0546 C. Elegans GSE6547 Celegans (712 × 712) 25 0.036|00d7; 712) 7 0.0396 D. Melanogaster GSE3990 Drosophila_2 (732 × 732) 6 0.0620 GSE6558 DrosGenome1 (640 × 640) 24 0.0605 D. 
Rerio GSE4859 Zebrafish (712 × 712) 8 0.0357 E. Coli GSE11779 E_coli_2 (478 × 478) 3 0.0869 GSE2928 Ecoli (544 × 544) 12 0.0172 GSE6195 E_coli_2 (478 × 478) 4 0.0664 H. Sapiens GSE10433 HG-U133A_2 (732 × 732) 12 0.0757 GSE5054|-U133A (712 × 712) 14 0.0296 GSE8514 HG-U133_Plus_2 (1164 × 1164) 15 0.0738 GSE11897 MOE430A (712 × 712) 11 0.0640 MOE430B (712 × 712) Mouse430_2 (1002 × 1002) GSE6210 Mouse430_2 12 0.0594 GSE6297 Mouse430_2 24 0.0325 O. Sativa GSE15071 Rice (1164 × 1164) 20 0.1157 R. Norvegicus GSE4494 RG_U34A (534 × 534) 59 0.0488 GSE7493 Rat230_2 (834 ×|2 (834 × 834) 4 0.0640 S. Aureus GSE7944 S_aureus (602 × 602) 6 0.0746 S. Cerevisiae GSE6073 YG_S98 (534 × 534) 12 0.0283 GSE8379 YG_S98 (534 × 534) 8 0.0180 X. Laevis GSE3368 Xenopus_laevis (712 × 712) 20 0.0514 Table 2 Optimized parameter values as obtained from the minimization of Eq. (3). A. Thaliana C. Elegans D. Melanogaster D. Rerio GEO no GSE4847 GSE7642|d In this section we compare the estimated background signal (as given in Eq. (2)) with the intensities of given probe sets corresponding to non-expressed genes in the samples analyzed. We start with publicly available data taken from spike-in experiments on HGU95A chips http://www.affymetrix.com where genes have been spiked-in at known concentrations, ranging from 0 to 1024 pM (picoMolar). The data at |d in this work. Figure 5 Computed background vs. low expressed genes data . Experimental results (solid black line) and theoretical prediction (dashed red line) of probeset a) 256610_at (A. Thaliana, GSE4847, GSM109107.CEL) and b) 175270_at (C. Elegans, GSE8159, GSM201995.CEL). Discussion We have presented a background subtraction scheme for Affymetrix GeneExpression arrays which is both, accurate and |e nearest-neighbor model. Existing algorithms are either of the first [ 4 ] or the second [ 6 , 7 , 9 ] category, but not both. 
The background subtraction scheme has been tested on 360 GeneChips from publicly available data of recent expression experiments. Since the fitted values for the same parameters in different experiments do not show much variation, the algorithm is robust and can be easily transfe | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
851 | GSE7148 | 11/2/2007 | ['7148'] | ['3207'] | [u'17854483'] | 2375027 | [u'17854483'] | ['Rose', 'Arevalo', 'Cacioppo', 'Cole', 'Hawkley', 'Sung'] | ['Rose', 'Arevalo', 'Cacioppo', 'Cole', 'Hawkley', 'Sung'] | ['Rose', 'Arevalo', 'Cacioppo', 'Cole', 'Hawkley', 'Sung'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
852 | GSE7149 | 10/1/2007 | ['7149'] | [] | [u'18237285'] | 2766661 | [u'19328852'] | ['Linser', 'Corena-McLeod', u'Corena-Mcleod', 'Vanekeris', u'VanEkeris', 'Neira'] | ['Linser', 'Heyland', 'Moroz', 'VanEkeris', 'Ribeiro', 'Neira'] | ['VanEkeris', 'Linser', 'Neira'] | Insect Biochem Mol Biol | 2009 | 2009 May-Jun | 0 | AND pmc_gds | 1 | 0 | ||||
853 | GSE7152 | 3/31/2007 | ['7152'] | [] | [u'17450523'] | 2602602 | [u'19104654'] | ['Pavey', 'Packer', u'Parsons', 'Stark', 'Ayub', 'Rizos', 'Boyle', 'Hayward'] | ['Remy', 'Didier', 'Granjeaud', 'Imbert', 'Bergon', 'Nguyen', 'Puthier', 'Lopez', 'Textoris'] | [] | PLoS One | 2008 | 2008 | 1 | iated DKNN value (with k being set typically to 100 for microarrays containing 10 to 50k elements). Distributions of DKNN values observed with both an artificial and a real dataset (Complex9RN200 and GSE1456 respectively, see thereafter for a description) are shown in Figure S3A and S3B (solid curve). The asymmetrical shape of the distribution observed in Figure S3B suggests the presence of a parti|s ability to extract relevant informations from a noisy environment. However, a range of optimal values for inflation parameter needs to be defined to get the best results. Performances of DBF-MCL on GSE1456 dataset Next, DBF-MCL was tested with microarray data to explore its effectiveness in finding clusters of co-regulated genes. To this end, we used the microarray data from Pawitan et al. |sed for analysis. Figure S5B , shows the number of informative genes obtained with various k values. Again, two phases were observed suggesting that regions with heterogeneous densities exist in the GSE1456 dataset. As expected, the transition from dense to sparse regions was less marked than in the artificial dataset. A k value of 100 was chosen to allow the extraction of a large part of data that ca| microarray dataset. Interestingly, in all cases meaningful partitioning results were obtained using inflation parameter set to 2. 10.1371/journal.pone.0004001.g001 Figure 1 Results obtained with the GSE1456 dataset. DBF-MCL was run with GSE1456 as input (k = 100, FDR = 10%, S 1..3 , Inflation = 2). (A) Hierarchical cluster|ected genes. (D) The graph after MCL partitioning. Each point is colored according to its associated class. 
(E) Correspondence between hierarchical clustering and DBF-MCL results. (F) TS obtained for GSE1456 (G) Functional enrichment associated with these TS. Systematic extraction of TS We next applied DBF-MCL algorithm to all experiments performed on human, mouse and rat Affymetrix microarrays and ava|he results panel (2), the information panel (3), the plugins panel (4) and the plugin display panel (5). This example shows the expression profiles of genes contained in the TS CBE3881EB derived from GSE469 (“Temporal profiling in muscle regeneration”). Annotation panel shows that this TS is highly enriched in genes related to ATP synthesis. Meta-analysis of public microarray data using|e 1 Transcriptional signatures containing Affymetrix probes for ESR1, GATA3 and FOXA1. TS ID 1 Genes 2 Probes 2 Samples 2 Sample type GSE ID GPL ID Author PubMed IDs 0F2635383 1190 1572 23 Cell lines GSE6569 GPL96 Huang F et al 2007 17332353 3DE64836D 102 143 62 Tissue GSE7904 GPL570 unpublished 2007 - 59A18E225 690 893 121 Both GSE2603 GPL96 Minn AJ et al 2005 16049480 6C975B20B 88 96 26 Tissue GSE677|lein A et al 2007 17410534 6C975B290 88 96 26 Tissue GSE6596 GPL96 Klein A et al 2007 17410534 7150E17F6 868 1032 34 Cell lines GSE4668 GPL96 Coser KR et al 2003 14610279 8059848B4 200 250 251 Tissue GSE3494 GPL96 Miller LD et al 2005 16141321 84E5E1077 694 883 198 Tissue GSE7390 GPL96 Desmedt C et al 2007 17545524 8F69864F9 68 82 95 Tissue GSE5847 GPL96 Boersma BJ et al 2007 17999412 A151D5695 297 361|et al 2007 17420468 B79B1C0B9 270 380 47 Tissue GSE3744 GPL570 Richardson AL et al 2006 16473279 BDB6D8700 550 679 104 Tissue GSE3726 GPL96 Chowdary D et al 2006 16436632 D8F0B528C 125 152 159 Tissue GSE1456 GPL96 Pawitan Y et al 2005 16280042 E2E620F40 448 616 129 Tissue GSE5460 GPL570 unpublished 2007 - EA9669A21 219 251 158 Tissue GSE3143 GPL91 Bild AH et al 2006 16273092 F310ACC36 519 646 49 Tissue|05b;15] and in genes related to the PIR keyword “multigene family”. 
Furthermore, several signatures, of Table 2 are related to melanoma and six of them were observed in the GSE7127 experiment [16] . Although data from Table 2 would deserve further analysis they are most likely related to gain or loss of genetic material in tumors. Indeed, gain of 8q is freq|<1.10−20 ) for any of the human cytoband tested. TS ID 1 Enrich. 2 Cytoband q.value Sample type GSE ID GPL ID Authors PubMed ID 3DA3C8345 24% 17q12-q21 1.7.10 −39 Skin GSE5667 GPL97 Plager DA et al 2007 17181634 43CC3EF57 9% 8q24.3 7.0.10 −32 Melanoma GSE7153 GPL570 Unpublished 2007 - 60E29DA83 16% 8q24.3 6.8.10 −24 Melanoma GSE7127 GPL570| GPL570 Johansson P et al 2007 17516929 60E6B4129 35% 20p13 1.6.10 −26 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60E96FF1E 28% 6p21.3 1.2.10 −28 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60EC95F6A 17% 7q22.1 6.3.10 −31 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60EEBD669 32% 11q23.3 1.4.10 −26 Melan|127 GPL570 Johansson P et al 2007 17516929 B4C95CF18 42% 8q24.3 1.1.10 −36 Ovary GSE6008 GPL96 Hendrix ND et al 2006 16452189 A93ED6519 16% 11q23.3 6.9.10 −23 Melanoma GSE7152{{tag}}--REUSE-- GPL570 Packer LM et al 2007 17450523 A93DB01ED 11% 7q22.1 9.5.10 −30 Melanoma GSE7152{{tag}}--REUSE-- GPL570 Packer LM et al 2007 17450523 1 Transcriptional signature ID. 2 Enrichment: Proportion o|ial platforms ( e.g. National Cancer Institute, Vanderbilt Microarray Shared Resource, Genome Institute of Singapore), several of them being related to the MicroArray Quality Control (MAQC) project (GSE5350) [26] . However, to date, systematic analysis of all experiments performed on these platforms has not been done. The flexibility of our approach also makes it possible to integrate|ile. Figure S3 Distributions of DKNN values. Observed DKNN values (solid line) and of a set of simulated DKNN values S (dotted line) are shown for (A) the Complex9RN200 artificial dataset and (B) the GSE1456 microarray dataset. 
(9.01 MB TIF) Click here for additional data file. Figure S4 Colors correspond to the clusters found using the corresponding algorithm (A) The whole dataset (9,112 points). (B) |s input using a range of k values (FDR = 10%, S1..3, Inflation = 1.2). (B) DBF-MCL was run with several microarray datasets as input (including GSE1456) using a range of k values (FDR = 10%, S1..3, Inflation = 2). (8.72 MB TIF) Click here for additional data file. Figure S6 The TBMap plugin. |for technical assistance. References 1 Stoeckert CJ Causton HC Ball CA 2002 Microarray databases: standards and ontologies. Nat Genet 32 Suppl 469 73 12454640 2 Barrett T Edgar R 2006 Gene expression omnibus: microarray data storage, submission, retrieval, and analysis. Methods Enzymol 411 352 69 16939800 3 Diehn M Sherlock G Binkley G Jin H Matese JC 2003 SOURCE: a unified genomic resource of functional | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
854 | GSE7153 | 10/1/2007 | ['7153'] | [] | [] | 2602602 | [u'19104654'] | [u'Pavey', u'Packer', u'Parsons', u'Stark', u'Boyle', u'Hayward'] | ['Remy', 'Didier', 'Granjeaud', 'Imbert', 'Bergon', 'Nguyen', 'Puthier', 'Lopez', 'Textoris'] | [] | PLoS One | 2008 | 2008 | 1 | iated DKNN value (with k being set typically to 100 for microarrays containing 10 to 50k elements). Distributions of DKNN values observed with both an artificial and a real dataset (Complex9RN200 and GSE1456 respectively, see thereafter for a description) are shown in Figure S3A and S3B (solid curve). The asymmetrical shape of the distribution observed in Figure S3B suggests the presence of a parti|s ability to extract relevant informations from a noisy environment. However, a range of optimal values for inflation parameter needs to be defined to get the best results. Performances of DBF-MCL on GSE1456 dataset Next, DBF-MCL was tested with microarray data to explore its effectiveness in finding clusters of co-regulated genes. To this end, we used the microarray data from Pawitan et al. |sed for analysis. Figure S5B , shows the number of informative genes obtained with various k values. Again, two phases were observed suggesting that regions with heterogeneous densities exist in the GSE1456 dataset. As expected, the transition from dense to sparse regions was less marked than in the artificial dataset. A k value of 100 was chosen to allow the extraction of a large part of data that ca| microarray dataset. Interestingly, in all cases meaningful partitioning results were obtained using inflation parameter set to 2. 10.1371/journal.pone.0004001.g001 Figure 1 Results obtained with the GSE1456 dataset. DBF-MCL was run with GSE1456 as input (k = 100, FDR = 10%, S 1..3 , Inflation = 2). (A) Hierarchical cluster|ected genes. (D) The graph after MCL partitioning. Each point is colored according to its associated class. (E) Correspondence between hierarchical clustering and DBF-MCL results. 
(F) TS obtained for GSE1456 (G) Functional enrichment associated with these TS. Systematic extraction of TS We next applied DBF-MCL algorithm to all experiments performed on human, mouse and rat Affymetrix microarrays and ava|he results panel (2), the information panel (3), the plugins panel (4) and the plugin display panel (5). This example shows the expression profiles of genes contained in the TS CBE3881EB derived from GSE469 (“Temporal profiling in muscle regeneration”). Annotation panel shows that this TS is highly enriched in genes related to ATP synthesis. Meta-analysis of public microarray data using|e 1 Transcriptional signatures containing Affymetrix probes for ESR1, GATA3 and FOXA1. TS ID 1 Genes 2 Probes 2 Samples 2 Sample type GSE ID GPL ID Author PubMed IDs 0F2635383 1190 1572 23 Cell lines GSE6569 GPL96 Huang F et al 2007 17332353 3DE64836D 102 143 62 Tissue GSE7904 GPL570 unpublished 2007 - 59A18E225 690 893 121 Both GSE2603 GPL96 Minn AJ et al 2005 16049480 6C975B20B 88 96 26 Tissue GSE677|lein A et al 2007 17410534 6C975B290 88 96 26 Tissue GSE6596 GPL96 Klein A et al 2007 17410534 7150E17F6 868 1032 34 Cell lines GSE4668 GPL96 Coser KR et al 2003 14610279 8059848B4 200 250 251 Tissue GSE3494 GPL96 Miller LD et al 2005 16141321 84E5E1077 694 883 198 Tissue GSE7390 GPL96 Desmedt C et al 2007 17545524 8F69864F9 68 82 95 Tissue GSE5847 GPL96 Boersma BJ et al 2007 17999412 A151D5695 297 361|et al 2007 17420468 B79B1C0B9 270 380 47 Tissue GSE3744 GPL570 Richardson AL et al 2006 16473279 BDB6D8700 550 679 104 Tissue GSE3726 GPL96 Chowdary D et al 2006 16436632 D8F0B528C 125 152 159 Tissue GSE1456 GPL96 Pawitan Y et al 2005 16280042 E2E620F40 448 616 129 Tissue GSE5460 GPL570 unpublished 2007 - EA9669A21 219 251 158 Tissue GSE3143 GPL91 Bild AH et al 2006 16273092 F310ACC36 519 646 49 Tissue|05b;15] and in genes related to the PIR keyword “multigene family”. 
Furthermore, several signatures, of Table 2 are related to melanoma and six of them were observed in the GSE7127 experiment [16] . Although data from Table 2 would deserve further analysis they are most likely related to gain or loss of genetic material in tumors. Indeed, gain of 8q is freq|<1.10−20 ) for any of the human cytoband tested. TS ID 1 Enrich. 2 Cytoband q.value Sample type GSE ID GPL ID Authors PubMed ID 3DA3C8345 24% 17q12-q21 1.7.10 −39 Skin GSE5667 GPL97 Plager DA et al 2007 17181634 43CC3EF57 9% 8q24.3 7.0.10 −32 Melanoma GSE7153{{tag}}--REUSE-- GPL570 Unpublished 2007 - 60E29DA83 16% 8q24.3 6.8.10 −24 Melanoma GSE7127 GPL570| GPL570 Johansson P et al 2007 17516929 60E6B4129 35% 20p13 1.6.10 −26 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60E96FF1E 28% 6p21.3 1.2.10 −28 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60EC95F6A 17% 7q22.1 6.3.10 −31 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60EEBD669 32% 11q23.3 1.4.10 −26 Melan|127 GPL570 Johansson P et al 2007 17516929 B4C95CF18 42% 8q24.3 1.1.10 −36 Ovary GSE6008 GPL96 Hendrix ND et al 2006 16452189 A93ED6519 16% 11q23.3 6.9.10 −23 Melanoma GSE7152 GPL570 Packer LM et al 2007 17450523 A93DB01ED 11% 7q22.1 9.5.10 −30 Melanoma GSE7152 GPL570 Packer LM et al 2007 17450523 1 Transcriptional signature ID. 2 Enrichment: Proportion o|ial platforms ( e.g. National Cancer Institute, Vanderbilt Microarray Shared Resource, Genome Institute of Singapore), several of them being related to the MicroArray Quality Control (MAQC) project (GSE5350) [26] . However, to date, systematic analysis of all experiments performed on these platforms has not been done. The flexibility of our approach also makes it possible to integrate|ile. Figure S3 Distributions of DKNN values. Observed DKNN values (solid line) and of a set of simulated DKNN values S (dotted line) are shown for (A) the Complex9RN200 artificial dataset and (B) the GSE1456 microarray dataset. 
(9.01 MB TIF) Click here for additional data file. Figure S4 Colors correspond to the clusters found using the corresponding algorithm (A) The whole dataset (9,112 points). (B) |s input using a range of k values (FDR = 10%, S1..3, Inflation = 1.2). (B) DBF-MCL was run with several microarray datasets as input (including GSE1456) using a range of k values (FDR = 10%, S1..3, Inflation = 2). (8.72 MB TIF) Click here for additional data file. Figure S6 The TBMap plugin. |for technical assistance. References 1 Stoeckert CJ Causton HC Ball CA 2002 Microarray databases: standards and ontologies. Nat Genet 32 Suppl 469 73 12454640 2 Barrett T Edgar R 2006 Gene expression omnibus: microarray data storage, submission, retrieval, and analysis. Methods Enzymol 411 352 69 16939800 3 Diehn M Sherlock G Binkley G Jin H Matese JC 2003 SOURCE: a unified genomic resource of functional | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
855 | GSE7155 | 7/30/2007 | ['7155'] | [] | [u'17936574'] | 2128036 | [u'17920205'] | ['Oros', 'Gyetvai', 'Kovacs', 'Szakall', 'Toth', 'Saito', 'Prasad', 'Vadasz'] | ['Oros', 'Smiley', 'Wang', 'Gyetvai', 'Kovacs', 'Toth', 'Saito', 'Figarsky', 'Mohan', 'Vadasz'] | ['Oros', 'Gyetvai', 'Kovacs', 'Toth', 'Saito', 'Vadasz'] | Neuroscience | 2007 | 11/9/2007 | 0 | We used the Vadasz/Saito Whole Brain B6vsC 430 2.0 May_2007 RMA Database (Gene Expression Omnibus series entry GSE7155{{tag}}--REUSE-- (http://www.ncbi.nlm.nih.gov/geo/) for establishing gene expression strain differences between C57BL/6ByJ and BALB/cJ | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
856 | GSE7167 | 6/8/2007 | ['7167'] | [] | [u'17623098'] | 1934918 | [u'17623098'] | ['Lian', 'Sherman', 'Glod', 'Hu', 'Jayapal'] | ['Lian', 'Sherman', 'Glod', 'Hu', 'Jayapal'] | ['Lian', 'Sherman', 'Glod', 'Hu', 'Jayapal'] | BMC Genomics | 2007 | 7/10/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
857 | GSE7169 | 7/30/2007 | ['7169'] | [] | [u'17536022'] | 2586676 | [u'17536022'] | ['Zhang', 'Hodor', 'Wauthier', 'Miles', 'Ray', 'Clodfelter', 'Waxman', 'Holloway'] | ['Zhang', 'Hodor', 'Wauthier', 'Miles', 'Ray', 'Clodfelter', 'Waxman', 'Holloway'] | ['Zhang', 'Hodor', 'Wauthier', 'Miles', 'Ray', 'Clodfelter', 'Waxman', 'Holloway'] | Physiol Genomics | 2007 | 9/19/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
858 | GSE7170 | 7/30/2007 | ['7170'] | [] | [u'17536022'] | 2586676 | [u'17536022'] | ['Zhang', 'Hodor', 'Wauthier', 'Miles', u'Gregory', 'Ray', 'Clodfelter', 'Waxman', 'Holloway'] | ['Zhang', 'Hodor', 'Wauthier', 'Miles', 'Ray', 'Clodfelter', 'Waxman', 'Holloway'] | ['Zhang', 'Hodor', 'Wauthier', 'Miles', 'Ray', 'Clodfelter', 'Waxman', 'Holloway'] | Physiol Genomics | 2007 | 9/19/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
859 | GSE7172 | 6/8/2007 | ['7172'] | [] | [u'17623098'] | 2367054 | [u'18461186'] | ['Lian', 'Sherman', 'Glod', 'Hu', 'Jayapal'] | ['Kok', 'Jayapal', 'Sherman', 'Griffin', 'Yap', 'Philp', 'Hu'] | ['Sherman', 'Hu', 'Jayapal'] | PLoS One | 2008 | 5/7/2008 | 0 | in mass spectrometry are provided in Table S3 . The complete microarray data has been deposited in a MIAME compliant manner at Gene Expression Omnibus ( http://www.ncbi.nlm.nih.gov/geo/ ): accession GSE7172{{tag}}--DEPOSIT--. Data analysis For microarray analysis, data points from technical replicate spots falling outside of mean ±1.2 times standard deviation were discarded for every gene. Genes with large abso | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
860 | GSE7172 | 6/8/2007 | ['7172'] | [] | [u'17623098'] | 1934918 | [u'17623098'] | ['Lian', 'Sherman', 'Glod', 'Hu', 'Jayapal'] | ['Lian', 'Sherman', 'Glod', 'Hu', 'Jayapal'] | ['Lian', 'Sherman', 'Glod', 'Hu', 'Jayapal'] | BMC Genomics | 2007 | 7/10/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
861 | GSE7175 | 8/5/2007 | ['7175'] | [] | [u'17369302'] | 1913326 | [u'17369302'] | ['', 'Kuramitsu', 'Nakagawa', 'Kashihara', 'Yokoyama', 'Shinkai', 'Kira'] | ['Kuramitsu', 'Nakagawa', 'Kashihara', 'Yokoyama', 'Shinkai', 'Kira'] | ['Kuramitsu', 'Nakagawa', 'Kashihara', 'Yokoyama', 'Shinkai', 'Kira'] | J Bacteriol | 2007 | 2007 May | 0 | AND pmc_gds | 1 | 0 | ||||
862 | GSE7181 | 3/6/2007 | ['7181'] | ['2728'] | [u'17483311'] | 2415112 | [u'18492260'] | ['Wischhusen', 'Brawanski', 'Lohmeier', 'Proescholdt', 'Bogdahn', 'Aigner', 'Beier', 'Hau', 'Oefner'] | ['Delattre', 'Ducray', 'Mokhtari', 'Lair', 'Paris', 'de', 'Sanson', 'Bi\xc3\xa8che', 'Idbaih', 'Marie', 'Hoang-Xuan', 'Thillet', 'Vidaud'] | [] | Mol Cancer | 2008 | 5/20/2008 | 0 | callosum (GSM175855, GSM175856, GSM175857, GSM175858, GSM176050) and 5 samples of cortex (GSM176049, GSM176344, GSM176345, GSM176346, GSM176347), available in the Gene Expression Omnibus repository (GSE7307) [ 6 ]. To compare the gene expression profile with glioblastomas cancer stem cells (CSC), we used the data of Beier et al. (GSE7181{{tag}}--REUSE--) [ 7 ]. All raw and normalized data files for the microarray ana|nal genes In order to better characterize the expression of neuronal genes in gliomas with complete 1p19q codeletion, we performed a new hierarchical clustering analysis with samples of normal brain (GSE7307), including grey matter (cortex) and white matter (corpus callosum). As glioblastomas expressed genes of neural cancer stem cells, we also included samples of glioblastoma cancer stem cells (GSE718|ma predict prognosis, delineate a pattern of disease progression, and resemble stages in neurogenesis Cancer Cell 2006 9 157 173 16530701 10.1016/j.ccr.2006.02.019 Gene Expression Omnibus repository (GSE7307) (http://www.ncbi.nlm.nih.gov/geo) Beier D Hau P Proescholdt M Lohmeier A Wischhusen J Oefner PJ Aigner L Brawanski A Bogdahn U Beier CP CD133(+) and CD133(-) glioblastoma-derived cancer stem cells | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
863 | GSE7182 | 4/3/2007 | ['7182'] | ['2706'] | [u'17485517'] | 2722172 | [u'19620627'] | ['Henke', 'Melchior', 'Klein', 'Giebel', 'Soh', 'von', 'Jumaa', 'Li', 'Feldhahn', 'Duy', 'Hofmann', 'M\xc3\xbcschen', u'M\xfcschen'] | ['J\xc3\xa4ck', 'Martinelli', 'Herzog', 'Klemm', 'von', 'Jumaa', 'Park', 'Trageser', 'Duy', 'Hofmann', 'M\xc3\xbcschen', 'Groffen', 'Storlazzi', 'Kim', 'Schuh', 'Nahar', 'Li', 'Gruber', 'Heisterkamp', 'Iacobucci'] | ['von', 'Jumaa', 'Li', 'Duy', 'Hofmann', 'M\xc3\xbcschen'] | J Exp Med | 2009 | 8/3/2009 | 0 | ) version 4.0. SNP calls were generated using GTYPE. Affymetrix CEL files were analyzed for genomic copy number variations using Partek Genomic Suite V and are available from GEO under accession no. GSE13612 . The underlying algorithm of CNAG strongly improves the signal-to-noise ratios of the final copy number output by correcting for length and GC content of the individual PCR products using quadrat|ated by normalizing to the mean value. Ratios were exported in Gene Cluster and visualized as a heat map with Java TreeView. Cel files from GeneChip arrays are available from GEO under accession no. GSE7182{{tag}}--DEPOSIT-- and were imported to the BRB Array Tool ( http://linus.nci.nih.gov/BRB-ArrayTools.html ) and processed using the RMA algorithm (Robust Multi-Array Average) for normalization and summarization. Aff | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
864 | GSE7182 | 4/3/2007 | ['7182'] | ['2706'] | [u'17485517'] | 2118573 | [u'17485517'] | ['Henke', 'Melchior', 'Klein', 'Giebel', 'Soh', 'von', 'Jumaa', 'Li', 'Feldhahn', 'Duy', 'Hofmann', 'M\xc3\xbcschen', u'M\xfcschen'] | ['Henke', 'Melchior', 'Klein', 'Giebel', 'Soh', 'von', 'Jumaa', 'Li', 'Feldhahn', 'Duy', 'Hofmann', 'M\xc3\xbcschen'] | ['Henke', 'Melchior', 'Klein', 'Giebel', 'Soh', 'von', 'Jumaa', 'Li', 'Feldhahn', 'Duy', 'Hofmann', 'M\xc3\xbcschen'] | J Exp Med | 2007 | 5/14/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
865 | GSE7186 | 4/15/2007 | ['7186'] | [] | [u'17410184'] | 2930948 | [u'20435627'] | ['Heldrup', 'Johansson', 'Ritz', 'Ed\xc3\xa9n', 'Olofsson', 'Fioretos', 'Andersson', 'Behrendtz', 'H\xc3\xb6glund', u'Ed\xe9n', u'H\xf6glund', 'Lindgren', u'R\xe5de', 'Lassen', u'Porwit-MacDonald', 'Fontes', 'R\xc3\xa5de', 'Porwit-Macdonald'] | ['Koeffler', 'Ruckert', 'Akagi', 'Weiss', 'Okamoto', 'Sanada', 'Haferlach', 'Nowak', 'Kato', 'Dugas', 'Ogawa', 'Kawamata'] | [] | Haematologica | 2010 | 2010 Sep | 0 | kemia by Oncomine. Gene expression analyses were performed on (i) 87 samples of B-ALL and (ii) 11 samples of T-ALL of the Andersson_Leukemia study ( http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE7186{{tag}}--REUSE-- ). FOXO3 expression was lower in both B-ALL and T-ALL samples than in normal bone marrow ( P = 1.6×10 −11 and 1.2×10 −08 , respectively). ALL(B) = B-cell type acute | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
866 | GSE7186 | 4/15/2007 | ['7186'] | [] | [u'17410184'] | 2650744 | [u'19223548'] | ['Heldrup', 'Johansson', 'Ritz', 'Ed\xc3\xa9n', 'Olofsson', 'Fioretos', 'Andersson', 'Behrendtz', 'H\xc3\xb6glund', u'Ed\xe9n', u'H\xf6glund', 'Lindgren', u'R\xe5de', 'Lassen', u'Porwit-MacDonald', 'Fontes', 'R\xc3\xa5de', 'Porwit-Macdonald'] | ['Grosveld', 'Wikenheiser-Brokamp', 'Cripe', 'Wells', 'Currier', 'Morris', 'Wise-Draper', 'Mintz-Cole', 'Simpson'] | [] | Cancer Res | 2009 | 3/1/2009 | 0 | , and in a subset of ovarian and adult bone marrow associated cancers. Decreased DEK expression in AML as observed here ( Fig. 1A , see GEO website http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE7186{{tag}}--MENTION-- ) contrasts with a previous report by the Knuutila laboratory ( 31 ) describing increased DEK expression. Interestingly, a striking difference between the two reports is patient age: DEK downregula | 0 | 0 | 1 | NOT pmc_gds | 0 | 0 |
867 | GSE7192 | 7/31/2007 | ['7192'] | [] | [u'17409064'] | 1893039 | [u'17409064'] | ['Vijayraghavan', 'Prasad', 'Yadav'] | ['Vijayraghavan', 'Prasad', 'Yadav'] | ['Vijayraghavan', 'Prasad', 'Yadav'] | Genetics | 2007 | 2007 May | 0 | AND pmc_gds | 1 | 0 | ||||
868 | GSE7194 | 3/8/2007 | ['7194'] | [] | [u'17562815'] | 2118649 | [u'17562815'] | ['Kawai', 'Kumar', 'van', 'Vogel', 'Akira', 'Goutagny', 'Fitzgerald', 'Young', 'Roberts', 'Ching', 'Kato', 'Savan', 'Perera', ''] | ['Kawai', 'Kumar', 'van', 'Vogel', 'Akira', 'Goutagny', 'Fitzgerald', 'Young', 'Roberts', 'Ching', 'Kato', 'Savan', 'Perera'] | ['Kawai', 'Kumar', 'van', 'Vogel', 'Akira', 'Goutagny', 'Fitzgerald', 'Young', 'Roberts', 'Ching', 'Kato', 'Savan', 'Perera'] | J Exp Med | 2007 | 7/9/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
869 | GSE7195 | 9/28/2007 | ['7195'] | [] | [u'17907811'] | 1994710 | [u'17907811'] | ['Brown', 'van', 'Wang', 'Rodriguez', 'Chi', 'Nuyten', 'Hastie', 'Mukherjee'] | ['Brown', 'van', 'Wang', 'Rodriguez', 'Chi', 'Nuyten', 'Hastie', 'Mukherjee'] | ['Brown', 'van', 'Wang', 'Rodriguez', 'Chi', 'Nuyten', 'Hastie', 'Mukherjee'] | PLoS Genet | 2007 | 2007 Sep | 0 | AND pmc_gds | 1 | 0 | ||||
870 | GSE7196 | 3/8/2007 | ['7196'] | ['2727'] | [u'17488637'] | 2945940 | [u'20840752'] | [u'Gigu\xe8re', 'Wilson', 'Evans', 'Huss', 'Alaynick', 'Gigu\xc3\xa8re', 'Dufour', 'Kelly', 'Blanchette', 'Downes'] | ['Ojeda', 'de', 'Nitsch', 'Gon\xc3\xa7alves', 'Moreau'] | [] | BMC Bioinformatics | 2010 | 9/14/2010 | 0 | n, heat kernel, and Arnoldi kernel approximation. For comparison purposes, a local measure based on the expression of the direct neighbors is also computed. We have benchmarked these strategies on 40 publicly available knockout experiments in mice, and performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expr| approach for scoring candidates is presented here for comparison purposes as a naïve strategy for network analysis of differential expression. We have benchmarked these four strategies on 40 publicly available data sets originated from Affymetrix chips on which mice with (simple) knockout genes were tested against controls. The raw cel files were downloaded from GEO [ 12 ]. For each data set we|egy based on a direct neighborhood analysis. These four network-based prioritization strategies for scoring candidate genes based on their differentially expressed neighborhood were benchmarked on 40 publicly available knockout experiments in mice. Performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expressi|ng candidate genes using network-based machine learning approaches even if no knowledge is available about the disease or phenotype. Methods Benchmark data The benchmark for this study consists of 40 publicly available data sets originated from Affymetrix chips on which mice with knockout genes were tested against controls. The raw cel files were downloaded from GEO [ 12 ]. Table 3 shows all data sets|ur benchmark. Table 3 The benchmark data. 
Gene Name GEO accession number Gene Name GEO accession number 1 Abca1 GSE5496 21 Mbnl1 GSE14691 2 Btk GSE2826 22 Mst1r, Ron GSE16629 3 Cav1 GSE10849 23 MyD88 GSE6688 4 Cav3 GSE10848 24 Nos3, eNos GSE1988 5 Cftr GSE5715 25 Phgdh GSE8555 6 Clcn1 GSE14691 26 Pmp22 GSE1947 7 Cnr1 GSE7694 27 PPAR α GSE6864 8 Emd GSE5304 28 Prkag3, AMPK G3 GSE4065 9 Epas1, Hi|-2 GSE16067 29 Pthlh, Pthrp GSE17654 10 Esrra GSE7196{{tag}}--REUSE-- 30 Rab3a GSE6527 11 Gap43 GSE12687 31 RasGrf1 GSE8425 12 Gnmt GSE9809 32 Rbm15 GSE12628 13 Hdac1 GSE5583 33 Runx GSE4911 14 Hdac2 GSE6770 34 Scd1 GSE2926 15 Hsf4 GSE12415 35 Slc26a4 GSE10587 16 Hspa1A, Hsp70.1 GSE11120 36 Srf GSE13333 17 Il6 GSE411 37 Tgm2 GSE10285 18 Lhx1, Lim1 GSE4230 38 Zc3h12a GSE14891 19 Lhx8 GSE11897 39 Zfp36, Tpp GSE5324 20 L | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
871 | GSE7197 | 4/4/2007 | ['7197'] | ['2694'] | [u'17468215'] | 1914135 | [u'17468215'] | ['Qin', 'Zhou', 'Sun', 'Li', 'Zhao', 'Huang'] | ['Qin', 'Zhou', 'Sun', 'Li', 'Zhao', 'Huang'] | ['Qin', 'Zhou', 'Sun', 'Li', 'Zhao', 'Huang'] | Plant Physiol | 2007 | 2007 Jul | 0 | AND pmc_gds | 1 | 0 | ||||
872 | GSE7200 | 10/5/2007 | ['7200'] | [] | [] | 2657775 | [u'19154620'] | [u'Singh', u'Sharma', u'Farhan'] | ['Mohammad', 'Singh', 'Sharma'] | ['Singh', 'Sharma'] | BMC Syst Biol | 2009 | 1/21/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
873 | GSE7206 | 6/1/2007 | ['7206'] | [] | [u'17428314'] | 1868918 | [u'17428314'] | ['Smeds', 'Gustafsson', 'Bergh', 'Liu', 'Miller', 'Lin', 'Tee', 'Str\xc3\xb6m', 'Thomsen', 'Vega', 'Kietz', 'Li'] | ['Smeds', 'Gustafsson', 'Bergh', 'Liu', 'Miller', 'Lin', 'Tee', 'Str\xc3\xb6m', 'Thomsen', 'Vega', 'Kietz', 'Li'] | ['Smeds', 'Bergh', 'Liu', 'Miller', 'Lin', 'Tee', 'Str\xc3\xb6m', 'Thomsen', 'Vega', 'Li', 'Kietz', 'Gustafsson'] | Breast Cancer Res | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
874 | GSE7208 | 4/21/2007 | ['7208'] | [] | [u'17638901'] | 2688022 | [u'19399471'] | ['Wu', 'Lewin', 'Platero', 'Fargnoli', 'Ayers'] | ['Mennerich', 'Buhr', 'Roepcke', 'Pilarsky', 'Br\xc3\xbcmmendorf', 'Groene', 'Rosenthal', 'Heinze', 'Staub', 'Weber', 'Castanos-Velez', 'Klaman', 'Mann', 'Hinzmann'] | [] | J Mol Med | 2009 | 2009 Jun | 1 | AND pmc_gds | 0 | 1 | ||||
875 | GSE7211 | 3/7/2007 | ['7211'] | [] | [] | 2608847 | [u'18931094'] | [u'Addepalli', u'Zhang', u'Rao', u'Hunt', u'Li', u'Yun', u'Falcone', u'Xu'] | ['Fukushima', 'Arita', 'Wada', 'Kanaya'] | [] | DNA Res | 2008 | 2008 Dec | 1 | or the model plant Arabidopsis thaliana ( A. thaliana ), >3000 gene-expression data have been measured by different research groups and stored in online repositories such as Gene Expression Omnibus (GEO), 1 The Arabidopsis Information Resource (TAIR), 2 and the Nottingham Arabidopsis Stock Centre Arrays (NASC). 3 Also available are the functional prediction tools based on gene co-expression,| The number of significant columns rapidly decreases as the λ increases, and contributing arrays are independent of the number of SVs. Most amplified array sets were the stamen development (GSE4733) and the Type III effectors on plant defense response (NASCarrays-59). Other significant arrays included profiles of early germinating seeds (ME00332), the response to bacterial-(LPS, HrpZ, Flg22) |000a0; 6 ). The correspondences for δ 4 and δ 5 were obscurer, but as their commonly highlighted experimental conditions we could recognize stamen development data set (accession, GSE4733) with gene sets for cytokinins 9- N -glucoside biosynthesis and cytokinins 7- N -glucoside biosynthesis. In summary, we could identify biological functions related to the largest five SVs, although|azuhiro Suwa, and Munehide Itoyama for assistance in classifying GeneChip data, and Tsuyoshi Kato for critical reading of our manuscript. References 1 Edgar R. Domrachev M. Lash A. E. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository Nucleic Acids Res. 2002 30 207 210 11752295 2 Zhang P. 
The Arabidopsis information resource (TAIR): a model organism database providing a|Other significant arrays included profiles of early germinating seeds (ME00332), the response to bacterial-(LPS, HrpZ, Flg22) and oomycete-(NPP1) derived elicitors (ME00319), oxidative stress (GSE7211{{tag}}--REUSE--) and alternative oxidases (GSE4113 and GSE2406) | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
876 | GSE7213 | 3/8/2007 | ['7213'] | [] | [u'17392359'] | 1900105 | [u'17392359'] | ['Hibbert', 'Mirro', 'Rulli', 'Pederson', 'Biswal', 'Rein'] | ['Hibbert', 'Mirro', 'Rulli', 'Pederson', 'Biswal', 'Rein'] | ['Hibbert', 'Mirro', 'Rulli', 'Pederson', 'Biswal', 'Rein'] | J Virol | 2007 | 2007 Jun | 0 | AND pmc_gds | 1 | 0 | ||||
877 | GSE7218 | 3/13/2007 | ['7218'] | [] | [u'17420266'] | 2118534 | [u'17420266'] | ['Peng', 'Takatsu', 'Goodnow', 'Horikawa', 'Pogue', 'Martin', 'Silver'] | ['Peng', 'Takatsu', 'Goodnow', 'Horikawa', 'Pogue', 'Martin', 'Silver'] | ['Peng', 'Takatsu', 'Goodnow', 'Horikawa', 'Pogue', 'Martin', 'Silver'] | J Exp Med | 2007 | 4/16/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
878 | GSE7224 | 12/31/2007 | ['7224'] | [] | [u'17620369'] | 1934526 | [u'17620369'] | ['Nares', 'Wahl', 'Wen', 'Rangel', 'Sauk', 'Munson', 'Moutsopoulos', 'Nikitakis', u'\xa0Munson'] | ['Nares', 'Wahl', 'Wen', 'Rangel', 'Sauk', 'Munson', 'Moutsopoulos', 'Nikitakis'] | ['Nares', 'Wahl', 'Wen', 'Rangel', 'Sauk', 'Munson', 'Moutsopoulos', 'Nikitakis'] | Am J Pathol | 2007 | 2007 Aug | 0 | AND pmc_gds | 1 | 0 | ||||
879 | GSE7226 | 3/12/2007 | ['7226'] | [] | [u'16909388'] | 2823693 | [u'20092660'] | ['Wilson', 'Van', 'Schein', 'Krzywinski', 'Li', 'McGillivray', 'Pugh', 'Delaney', 'Bailey', 'Farnoud', 'Asano', 'Ally', u'Varhol', 'Friedman', 'Holt', 'Zahir', 'Barber', 'Birch', 'Eydoux', 'Charest', 'Jones', 'Kennedy', 'Arbour', 'Nayar', 'Rajcan-Separovic', 'Go', 'Marra', u'Wong', 'Cao', 'Fernandes', 'Gibson', 'Flibotte', 'Schnerch', 'Baross', 'Yong', 'Armstrong', 'Siddiqui', 'Chan', 'Langlois', 'Brown-John'] | ['Toplak', 'Demsar', 'Curk', 'Zupan'] | [] | BMC Genomics | 2010 | 1/22/2010 | 0 | nibus Gene Expression Omnibus [ 8 ] was considered for SNP data sets that contain at least 200 samples with approximately equal case/control distribution. Five data sets met these criteria: • GSE6754 [ 9 ] describing families with two individuals affected by autism spectrum disorders. Individuals were classified as affected (2459 samples) or unaffected (3473 samples) and described with around 1|E8054 [ 10 ] comprising 901 SNPs for each of the 121 cancerous samples and 87 controls. • GSE8055 [ 10 ] comprising 1,189 SNPs for each of the 141 cancerous samples and 89 controls. • GSE7226{{tag}}--REUSE-- [ 11 ] with platform designation GPL2004, comprising around 50,000 SNPs for each of the 102 samples from mentally retarded children and 213 controls from their unaffected siblings or parents. | entire set of SNP pairs, this limited our studies to about 2,000 SNPs. Therefore we only considered the first 2,000 SNPs of each data set, and a stratified sample of 500 individuals was used for the GSE6754 data set. Experimental methodology Feature scoring assigns interaction scores to all pairs of SNPs, resulting in a ranked list of SNP pairs. Either pairs of SNPs with scores exceeding a certain thr|ion to direct scoring and scoring with replication groups we report results obtained with bootstrap sampling. Click here for file Additional file 3 Performance graphs for differently sized subsets of GSE6754 . 
Performance graphs for data subsets of 100, 200, 500, 1000, 2000, and 5000 samples drawn from GSE6754. Click here for file Additional file 4 Source code and data sets . Source code and data sets | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
880 | GSE7230 | 3/10/2007 | ['7230'] | [] | [] | 2664703 | [u'19352461'] | [''] | ['Shay', 'Domany', 'Reiner-Benaim', 'Hegi', 'Lambiv'] | [] | Cancer Inform | 2009 | 2009 | 0 | monstrate the method for three different public aCGH datasets from two different childhood neoplasms associated with the nervous system on three different BAC array platforms: Medulloblastoma—GSE8634; Neuroblastoma—GSE5784 37 and GSE7230{{tag}}--REUSE--. 38 Results Algorithm Our method uses aCGH data to create a concise genomic description of each sample, including chromosomal status and appearance of|d separately but similarly, using the same method. Input The algorithm’s input is the raw log2 aCGH data, and the markers’ status. The raw log2 ratio data of chromosome 2p, taken from GSE7230{{tag}}--REUSE--, is presented in Figure 1A . Markers’ status is the assignment per marker per sample—loss (−1), normal (0) or gain (1). The status was set by the R package GLAD (Gain and L|tions matrix A, which has binary valued elements: A ms = 1 if the aCGH marker m was assigned a gain value on sample s, and A ms = 0 otherwise (the amplification matrix of chromosome 2p based on the GSE7230{{tag}} data is shown in Fig. 1B ). A deletions matrix D is defined similarly: D ms = 1 if the aCGH marker m has a loss assignment on sample s, and D ms = 0 otherwise (deletion matrix is not shown). Mar| in which an entire chromosome arm is lost, the corresponding entries are replaced by NaNs in the deletion matrix D. Figure 1 displays the amplification volume calculation for chromosomal arm 2p in GSE7230{{tag}} (Neuroblastoma). The height matrix H is actually the raw log2 ratio. H ms ( Fig. 1A ) is the measured aCGH log2 ratio value of marker m in sample s. A ms ( Fig. 1B ) is the amplification matrix|ntary Table 5. Significantly amplified markers appear in Supplementary Table 4, and amplifications in Supplementary Table 6. 
Medulloblastoma When applied to the Medulloblastoma dataset analyzed here (GSE8634) our method finds all the known chromosomal aberrations of this cancer, and several possibly new ones as well. Figure 2 displays the chromosome status map of the Medulloblastoma dataset, and the |ious chromosomal translocations in hematological malignancies. NPM1 was associated with centrosome duplication and the regulation of p53, and might have a role as a tumor suppressor. 47 This dataset (GSE8634) has not yet been published, but dataset GSE2139 that includes a subset of the samples 41 was analyzed for local aberrations. This publication included a list of amplifications and deletions. We s|were identified as significantly amplified by our method—MYCN, CDK6 and marker RP11–382A18. Marker RP11–382A18 is annotated near MYC region on chromosome 8q by the platform of GSE2139, used by. 41 MYC amplification and MYCN amplification are mutually exclusive. Nine of the amplifications reported there were not identified by our method. Four of their deletions included markers |as MYCN amplification and 1p deletion. Group 2B is characterized by 11q deletion, and to a lesser extent, 3p deletion. This classification explains most of the chromosomal arms associations found. In GSE5784 there are 15 amplifications (Supplementary Table 6B, 28 markers amplified, Supplementary Table 4B) and 115 deletions (Supplementary Table 3B, 245 markers deleted, Supplementary Table 5B). In GSE723|plified Supplementary Table 4C) and 49 deletions (Supplementary Table 5C, 87 markers deleted, Supplementary Table 3C). Three amplifications and 14 deletions are common to both Neuroblastoma datasets (GSE5784, GSE7230{{tag}}--REUSE--) ( Table 2 , Fig. 3C and D ). The first amplified region, which was separated into two regions in GSE7230{{tag}}, is on chromosome 2, and corresponds to the MYCN region. MYCN amplifications were|ance with this region being a known frequent normal copy number variation. 
48 Eight of the common deletions correspond to the 1pter deletion, and this deletion was fractioned into eight deletions in GSE7230{{tag}}. Another common deletion is in the region of BRCA1, a known tumor suppressor gene. In GSE5784, several known tumor suppressor genes were deleted—APC, CDKN2A, RB1 and TGFBR1. Also, two regio| 11, that includes CCND1, FGF19, FGF3, FGF4 was amplified, as well as a region on chromosome 12 with ETV6. For GSE5784, no aberration list was given in the original publication 37 for comparison. In GSE7230{{tag}}, the ALK region on chromosome 2 was amplified. ALK was previously identified as having a role in Neuroblastoma. 49 The fumarate hydratase (FH) region was deleted in GSE7230{{tag}}. FH was shown to be a t|for the entire genome and for each chromosomal arm separately. We applied our method on three public datasets of childhood neoplasms associated with the nervous system—one of Medulloblastoma (GSE8634) and two of Neuroblastoma (GSE5784, GSE7230{{tag}}). In Medulloblastoma, we find five distinct sub groups. Two sub groups with isochromosome 17, one with many other chromosomal events (2), and one with fe|erwise it was left in the analysis. Potentially inaccurate location was identified for 17 to 144 markers per dataset, which constitute 0.7%–3.5% of the markers (see Table 1 ). We noticed for GSE8634 that many aberrations were highly correlated, and correlated to gender. Some of the samples were probably hybridized to opposite sex control samples. The 28 markers whose two sided t-test p-value b|te controlling procedures 10.1093/bioinformatics/btf877 Bioinformatics 2003 19 368 375 12584122 Figure 1 Calculation of the “volume” statistic for chromosomal arm 2p amplifications in GSE7230{{tag}} (Neuroblastoma) A ) The height matrix H (raw data) of 2p, where each element (m, s) on 2p is the log2 ratio of aCGH marker m in sample s. Each row corresponds to a marker, and each column correspon|arked in A–E by red asterisks. 
For presentation only, values are truncated to [−1, 1]. Figure 2 Chromosomal status and aberrations in Medulloblastoma A) Chromosomal status of dataset GSE8634. Each row corresponds to a chromosomal arm. Due to space limitation, only every second arm is labelled. Since some chromosomes are telocentric (with short p arm), there is a change from p to q. Val|tion only, values are truncated to the range [−1, 1], rising from blue to red. Figure 3 Chromosomal status and aberrations common to both Neuroblastoma datasets Chromosomal status of datasets GSE5784 ( A ) and GSE7230{{tag}} ( B ), and the aberrations common to both of Neuroblastoma datasets, shown for the patients of GSE5784 ( C ) and GSE7230{{tag}} ( D ). Each column corresponds to a sample. Samples are ma|the range [−1, 1], rising from blue to red. Table 1 Array CGH datasets analyzed. Dataset Condition Samples # Markers # Markers amplified Amplifications Markers deleted Deletions CNV Removed # GSE8634 Medulloblastoma 80 6295 13 10 137 99 4 126 GSE5784 Neuroblastoma 236 2457 28 15 245 115 4 17 GSE7230{{tag}}--REUSE-- Neuroblastoma 82 4073 30 18 87 49 0 144 The datasets are recognized by their Gene Expression Omn | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
881 | GSE7234 | 3/10/2007 | ['7234'] | [] | [] | 2923118 | [u'20670406'] | [u'Xie', u'Zhou', u'Sheng', u'Lin', u'Liu', u'Ou', u'Yuan', u'Deng', u'Luo'] | ['Wang', 'Tsai', 'Liu', 'Hou', 'Hung', 'Chen', 'Lee'] | ['Liu'] | J Biomed Sci | 2010 | 7/29/2010 | 0 | All microarray datasets in this paper are available at GEO under the accession no. of GSE723{{key}}--DEPOSIT-- and GSE9520. | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
882 | GSE7234 | 3/10/2007 | ['7234'] | [] | [] | 2837036 | [u'20178649'] | [u'Xie', u'Zhou', u'Sheng', u'Lin', u'Liu', u'Ou', u'Yuan', u'Deng', u'Luo'] | ['Wang', 'Chao', 'Liu', 'Wu', 'Wong', 'Liang', 'Hsu', 'Hsieh'] | ['Liu'] | BMC Genomics | 2010 | 2/24/2010 | 0 | AND pmc_gds | 1 | 0 | ||||
883 | GSE7236 | 3/14/2007 | ['7236'] | [] | [u'17273971'] | 2782819 | [u'19925429'] | ['Wurfel', 'Akey', 'Storey', 'Ronald', 'Madeoy', 'Strout'] | ['Dolan', 'Zhang'] | [] | Curr Pharm Des | 2009 | 2009 | 1 | AND pmc_gds | 0 | 1 | ||||
884 | GSE7238 | ['7238'] | [] | [] | 2211385 | [u'17977525'] | [''] | ['Mavrothalassitis', 'Wen', 'Tynan', 'M\xc3\xbanera', 'Oshima', 'Cecena', 'Williams'] | [] | Dev Biol | 2007 | 12/1/2007 | 0 | AND pmc_gds | 0 | 1 | |||||
885 | GSE7247 | 4/12/2007 | ['7247'] | [] | [u'19171878'] | 2610483 | [u'19125200'] | [u'Engelmayer', 'Xing', 'Komanduri', 'Li', 'Decker', 'Steiner', 'Yang', 'Shpall', 'Robinson'] | ['Battaglia', 'Rizzetto', 'Paola', 'Rocca-Serra', 'Beltrame', 'Gambineri', 'Cavalieri'] | [] | PLoS One | 2009 | 2009 | 1 | AND pmc_gds | 0 | 1 | ||||
886 | GSE7247 | 4/12/2007 | ['7247'] | [] | [u'19171878'] | 2676083 | [u'19171878'] | [u'Engelmayer', 'Xing', 'Komanduri', 'Li', 'Decker', 'Steiner', 'Yang', 'Shpall', 'Robinson'] | ['Xing', 'Komanduri', 'Li', 'Decker', 'Steiner', 'Yang', 'Shpall', 'Robinson'] | ['Xing', 'Komanduri', 'Li', 'Decker', 'Steiner', 'Yang', 'Shpall', 'Robinson'] | Blood | 2009 | 4/30/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
887 | GSE7248 | 3/14/2007 | ['7248'] | [] | [u'17571927'] | 1904365 | [u'17571927'] | ['Nettleton', 'Buckner', 'Scanlon', 'Borsuk', 'Elshire', 'Madi', 'Zhang', 'Beck', 'Janick-Buckner', 'Timmermans', 'Schnable'] | ['Nettleton', 'Buckner', 'Scanlon', 'Borsuk', 'Elshire', 'Madi', 'Zhang', 'Beck', 'Janick-Buckner', 'Timmermans', 'Schnable'] | ['Nettleton', 'Buckner', 'Scanlon', 'Borsuk', 'Elshire', 'Madi', 'Zhang', 'Beck', 'Janick-Buckner', 'Timmermans', 'Schnable'] | PLoS Genet | 2007 | 2007 Jun | 0 | AND pmc_gds | 1 | 0 | ||||
888 | GSE7262 | 5/21/2007 | ['7262'] | [] | [u'17559302'] | 1891328 | [u'17559302'] | ['Gissot', 'Ajioka', u'Jim', 'Greally', 'Kim', 'Kelly', u'John'] | ['Kelly', 'Greally', 'Gissot', 'Kim', 'Ajioka'] | ['Kelly', 'Gissot', 'Greally', 'Kim', 'Ajioka'] | PLoS Pathog | 2007 | 2007 Jun | 0 | AND pmc_gds | 1 | 0 | ||||
889 | GSE7270 | 10/12/2007 | ['7270'] | [] | [u'17978178'] | 2077015 | [u'17978178'] | ['Shim', 'Loda', 'Majumder', 'Sellers', 'Xu', 'Golub', 'Ross'] | ['Shim', 'Loda', 'Majumder', 'Sellers', 'Xu', 'Golub', 'Ross'] | ['Shim', 'Loda', 'Majumder', 'Sellers', 'Xu', 'Golub', 'Ross'] | Proc Natl Acad Sci U S A | 2007 | 11/6/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
890 | GSE7271 | 10/30/2007 | ['7271'] | ['3050'] | [u'18053684'] | 2396444 | [u'18053684'] | ['Hannah', 'Bajic', 'Klein'] | ['Hannah', 'Bajic', 'Klein'] | ['Hannah', 'Bajic', 'Klein'] | Brain Behav Immun | 2008 | 2008 May | 0 | AND pmc_gds | 1 | 0 | ||||
891 | GSE7272 | 3/16/2007 | ['7272'] | [] | [u'19238575'] | 2168447 | [u'17675385'] | ['Monchy', 'Zhang', 'Mergeay', u'BENOTMANE', 'Gang', u'MERGEAY', 'Taghavi', u'MONCHY', 'van', 'Greenberg'] | ['van', 'Taghavi', 'Benotmane', 'Janssen', 'Vallaeys', 'Mergeay', 'Monchy'] | ['Mergeay', 'van', 'Monchy', 'Taghavi'] | J Bacteriol | 2007 | 2007 Oct | 0 | d greater than 3. The expression ratios for pMOL28 and pMOL30 are shown in Table S2 in the supplemental material. Microarray accession number. Array data have been deposited at the Gene Expression Omnibus website ( http://www.ncbi.nlm.nih.gov/geo/ ) under accession number GSE7272{{tag}}--DEPOSIT-- . Nucleotide sequence accession numbers. The sequences obtained for plasmids pMOL28 and pMOL30 of C. metallidurans CH | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
892 | GSE7276 | 3/16/2007 | ['7276'] | [] | [u'17452364'] | 1888834 | [u'17452364'] | ['Ohta', 'Seki', 'Ogiwara', 'Kawashima', 'Onoda', 'Kugou', 'Iwahashi', 'Enomoto', 'Ui', 'Harata'] | ['Ohta', 'Seki', 'Ogiwara', 'Kawashima', 'Onoda', 'Kugou', 'Iwahashi', 'Enomoto', 'Ui', 'Harata'] | ['Seki', 'Ogiwara', 'Kawashima', 'Onoda', 'Kugou', 'Iwahashi', 'Enomoto', 'Ohta', 'Ui', 'Harata'] | Nucleic Acids Res | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
893 | GSE7302 | 5/1/2007 | ['7302'] | [] | [] | 2600583 | [u'18674933'] | [u'M\xe5nssone', u'Jacobsen'] | ['M\xc3\xa5nsson', 'Gurbuxani', 'Sigvardsson', 'Dias', 'Kee'] | [] | Immunity | 2008 | 8/15/2008 | 0 | . Probe level expression values were calculated with RMA ( Irizarry et al., 2003 ). Further analysis was performed using dChip ( www.dchip.org ). Array data are accessible through the gene expression omnibus (GEO; GSE8407 and GSE7302{{tag}}--DEPOSIT-- ). Transcription factor binding sites conserved between human and mouse genomic sequences were identified in the promoters of E2A-dependent lymphoid-associated genes and | 0 | 1 | 0 | NOT pmc_gds | 0 | 1 |
894 | GSE7302 | 5/1/2007 | ['7302'] | [] | [] | 2830985 | [u'20152025'] | [u'M\xe5nssone', u'Jacobsen'] | ['Qian', 'Hansson', 'Zetterblad', 'Zandi', 'Bryder', 'Paulsson', 'Lagergren', 'M\xc3\xa5nsson', 'Sigvardsson'] | [] | BMC Genomics | 2010 | 2/12/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
895 | GSE7303 | 4/12/2007 | ['7303'] | [] | [u'17486136'] | 2387222 | [u'18408019'] | ['Yekutieli', 'Segal', '', 'Rozovsky', 'Tuller', 'Li', 'Rencus-Lazar', 'Chor', 'Edgar', 'Chamovitz', 'Oron'] | ['Zhu', 'Zhang', 'Qian', 'Geerts', 'Xu'] | [] | Hum Reprod | 2008 | 2008 Jun | 1 | Affymetrix data for a set of 87 different normal human tissue types (the ‘human body index’ set) representing a total of 504 tissue samples were retrieved from public gene expression omnibus (GEO) data sets on the National Center for Biotechnology Information (NCBI) website ( Barrett et al ., 2005 , 2007 ). CEL data from the Affymetrix GeneChip Human Genome U133 Plus 2.0 array data set| MASS5.0 algorithm (Affymetrix, Santa Barbara, CA, USA). Annotations and clinical data for the tissue samples analyzed were available from http://www.ncbi.nlm.nih.gov/geo/query/ through its GEO ID: GSE7303{{tag}}. Human tissue collection Endometrial tissue was collected during routine endometrial biopsy from women with a normal menstrual cycle who were eligible for IVF and embryo transfer because of (partia | 0 | 0 | 0 | NOT pmc_gds | 0 | 1 |
896 | GSE7303 | 4/12/2007 | ['7303'] | [] | [u'17486136'] | 2673709 | [u'17486136'] | ['Yekutieli', 'Segal', '', 'Rozovsky', 'Tuller', 'Li', 'Rencus-Lazar', 'Chor', 'Edgar', 'Chamovitz', 'Oron'] | ['Yekutieli', 'Segal', 'Rozovsky', 'Tuller', 'Li', 'Rencus-Lazar', 'Chor', 'Edgar', 'Chamovitz', 'Oron'] | ['Yekutieli', 'Segal', 'Rozovsky', 'Tuller', 'Li', 'Rencus-Lazar', 'Chor', 'Edgar', 'Chamovitz', 'Oron'] | Mol Syst Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
897 | GSE7305 | 4/9/2007 | ['7305'] | ['2835'] | [u'17640886'] | 2752458 | [u'19735579'] | ['Rojas', 'Hever', 'Maki', 'Herrera', 'Zlotnik', 'Acosta', 'Grigoriadis', 'Marin', 'Roth', 'Hevezi', 'White', 'Conlon'] | ['Zhao', 'He', 'Wang', 'Pan', 'Bai'] | [] | Reprod Biol Endocrinol | 2009 | 9/8/2009 | 1 | microarray raw or normalized data are available. Finally six public gene expression data sets were involved in our study, which assessed endometriosis transcripts on a genome-wide basis. In data set GSE7307, total 677 samples from more than 90 distinct tissue types were processed, but only the profiles related to endometriosis and eutopic endometrium were considered here. The data generated from human| Characteristics of datasets included in the studies. First Author or Contributor Chip GEO Accession Experimental design Classification Probes Number of samples Disease Normal Sha [ 4 ] U133 PLUS 2.0 GSE7846 unpaired, HEECS ovarian 54K 5 5 Burney [ 14 ] U133 PLUS 2.0 GSE6364 unpaired, tissues Ovarian, peritoneal, rectovaginal 54K 21 16 Eyster [ 15 ] CodeLink GSE5108 paired, tissues Ovarian, peritoneal |nd adjusted, normalized and log2 probe-set intensities calculated using the Robust Multichip Averaging (RMA) algorithm in affy package [ 23 , 24 ], and the Codelink arrays normalizations performed in GSE5108 were retained. Genes which cannot be mapped to any KEGG pathway identified were excluded from the further analysis. The interquartile range (IQR) was used as a measure of variability. From the resu| able to highlight genes weakly connected to the phenotype which may be difficult to detect by using classical univariate statistics. We have performed gene set enrichment analysis of six independent publicly available gene expression data sets to understand in depth the common biological mechanisms involved in endometriosis. Our study compared the gene expression between lesion locations (ovarian vs. per | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
898 | GSE7305 | 4/9/2007 | ['7305'] | ['2835'] | [u'17640886'] | 1941489 | [u'17640886'] | ['Rojas', 'Hever', 'Maki', 'Herrera', 'Zlotnik', 'Acosta', 'Grigoriadis', 'Marin', 'Roth', 'Hevezi', 'White', 'Conlon'] | ['Rojas', 'Hever', 'Maki', 'Herrera', 'Zlotnik', 'Acosta', 'Grigoriadis', 'Marin', 'Roth', 'Hevezi', 'White', 'Conlon'] | ['Rojas', 'Hever', 'Maki', 'Herrera', 'Zlotnik', 'Acosta', 'Grigoriadis', 'Marin', 'Roth', 'Hevezi', 'White', 'Conlon'] | Proc Natl Acad Sci U S A | 2007 | 7/24/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
899 | GSE7306 | 9/12/2007 | ['7306'] | [] | [u'17717048'] | 1987340 | [u'17717048'] | ['Capkova', 'Jansa', 'Forejt', 'Ivanek', 'Homolka'] | ['Capkova', 'Jansa', 'Forejt', 'Ivanek', 'Homolka'] | ['Capkova', 'Forejt', 'Jansa', 'Ivanek', 'Homolka'] | Genome Res | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
900 | GSE7307 | 4/9/2007 | ['7307'] | [] | [] | 2936523 | [u'20838587'] | [u'Roth'] | ['Korn', 'Altshuler', 'McCarroll', 'Sklar', 'Daly', 'Purcell', 'Raychaudhuri'] | [] | PLoS Genet | 2010 | 9/9/2010 | 0 | inaccurate. In cases where too few events have been genotyped the asymptotic p -value can be replaced by a p -value based on robust permutation testing instead. We have implemented this test in the publicly available genetic data analysis software, PLINK [23] . CNV-enrichment-test is robust to skewed gene size, even if there are case-control differences between the size and rate of CNVs|ts of interest and also to precisely replicate reported results. Materials and Methods Compiling Gene Sets Brain-Expressed To identify genes with specific expression in the brain, we obtained a large publicly available human tissue expression microarray panel (GEO accession: GSE7307{{tag}}--REUSE--) [28] . We analyzed the data using the robust multi-array (RMA) method for background correction, normaliza|ted brain or spinal cord to the remaining 69 tissue profiles with a one-tailed Mann-Whitney rank-sum test. We identified those genes obtaining p <0.01 as preferentially expressed. Synapse We downloaded Gene Ontology [30] structure and annotations on December 2006. Since it was available, we used a previous version of Gene Ontology to ensure independence from the results of recen|[30] from model organisms. We identified those genes that were annotated with the ‘Synaptic Junction’ code (GO:0045202), or descendents of that code. Neuronal Activity We downloaded the list of genes within the category ‘Neuronal Activities’ (BP00166) listed in the Panther database [12] . Learning We downloaded the list of genes within the cate | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
901 | GSE7307 | 4/9/2007 | ['7307'] | [] | [] | 2620272 | [u'19014681'] | [u'Roth'] | ['Tozeren', 'Gormley'] | [] | BMC Bioinformatics | 2008 | 11/17/2008 | 1 | or therapeutic targets in tissue-specific diseases. Table 1 Microarray datasets used in this study Tissue Phenotype Data Tissue No. of Samples Gene Expression Omnibus/Array Express Accn. # Adipose 10 GSE3526 Adrenal 20 GSE3526, GSE8514, GSE2316 Brain 89 GSE3526, GSE7621, GSE7307{{tag}}--REUSE--, GSE2361, E_AFMX-11, E-TABM-20, Colon 10 E-TABM-176, GSE8671, GSE9254, GSE9452 Epidermal 25 GSE1133, GSE2361, GSE3419, GSE352|38 E_AFMX-11, E-MIMR-27, GSE1133, GSE2240, GSE2361, GSE3526, GSE3585, GSE7307{{tag}}--REUSE-- Kidney 10 E_AFMX-11, GSE2004, GSE2361, GSE3526, GSE7392 Liver 10 E_AFMX-11, GSE2004, GSE3526, GSE6764 Lung 26 E-MEXP-231, GSE10072, GSE1133, GSE2361, GSE3526 Mammary 15 E-TABM-66, GSE2361, GSE3526, GSE7307{{tag}}--REUSE--, GSE7904 Muscle 64 GSE10760, GSE2328, GSE3526, GSE5110, GSE6798, GSE7307{{tag}}--REUSE--, GSE9103, Ovary 10 GSE2361, GSE3526, GSE6008, GS|ancreas 6 GSE1133, GSE2361, GSE7307{{tag}}--REUSE-- Peripheral blood 12 GSE7462, GSE8608, GSE8668, GSE8762, GSE9692 Small intestine 7 GSE2361, GSE7307{{tag}}--REUSE-- Spleen 12 GSE2004, GSE2361, GSE3526, GSE7307{{tag}}--REUSE-- Stomach 10 GSE2361, GSE3526, GSE7307{{tag}}--REUSE-- Testis 38 E_AFMX-11, GSE1133, GSE2361, GSE3218, GSE3526, GSE7307{{tag}}--REUSE--, GSE7808 Thymus 5 GSE1133, GSE2361, GSE7307{{tag}}--REUSE-- Infectious Disease Disease No. of Samples Gene Expression Omnibus/Array Express | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
902 | GSE7307 | 4/9/2007 | ['7307'] | [] | [] | 2692004 | [u'19526064'] | [u'Roth'] | ['Mayburd'] | [] | PLoS One | 2009 | 6/15/2009 | 1 | as smaller scale datasets were downloaded from Global Expression Omnibus (GEO) platform at NCBI [34] . In particular, Expression Project for Oncology (expO) was downloaded as record GSE2109 at GEO database [35] . The data for normal expression (Human Body Index project) were downloaded as GSE7307{{tag}}--REUSE-- and GSE3526 [34] . Multiple smaller projects describing|for all major measurements (see GPL570 platform at GEO for more detail and annotation). Prokaryotic data were derived in Affymetrix GeneChip E. coli Antisense Genome Array platform (GPL199, dataset GDS1827). Experimental noise reduction Aggregating of multiple microarray experiments by diverse authors poses unique challenges due to a significant component of technical noise, overlaid with biological |layi for Nematode Drug Targets. PLoS ONE 2(11) e1189 18000556 34 https://expo.intgen.org/geo/ 35 http://www.intgen.org/expo_scientific_release.cfm 36 http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE7307{{tag}}--REUSE-- 37 http://www.itl.nist.gov/div898/handbook/index.htm 38 http://www.ncbi.nlm.nih.gov/sites/entrez 39 http://discover.nci.nih.gov/gominer/htgm.jsp 40 http://www.geneontology.org/amigo/help-front.shtm | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
903 | GSE7307 | 4/9/2007 | ['7307'] | [] | [] | 2752458 | [u'19735579'] | [u'Roth'] | ['Zhao', 'He', 'Wang', 'Pan', 'Bai'] | [] | Reprod Biol Endocrinol | 2009 | 9/8/2009 | 1 | microarray raw or normalized data are available. Finally six public gene expression data sets were involved in our study, which assessed endometriosis transcripts on a genome-wide basis. In data set GSE7307{{tag}}--REUSE--, total 677 samples from more than 90 distinct tissue types were processed, but only the profiles related to endometriosis and eutopic endometrium were considered here. The data generated from human| Characteristics of datasets included in the studies. First Author or Contributor Chip GEO Accession Experimental design Classification Probes Number of samples Disease Normal Sha [ 4 ] U133 PLUS 2.0 GSE7846 unpaired, HEECS ovarian 54K 5 5 Burney [ 14 ] U133 PLUS 2.0 GSE6364 unpaired, tissues Ovarian, peritoneal, rectovaginal 54K 21 16 Eyster [ 15 ] CodeLink GSE5108 paired, tissues Ovarian, peritoneal |nd adjusted, normalized and log2 probe-set intensities calculated using the Robust Multichip Averaging (RMA) algorithm in affy package [ 23 , 24 ], and the Codelink arrays normalizations performed in GSE5108 were retained. Genes which cannot be mapped to any KEGG pathway identified were excluded from the further analysis. The interquartile range (IQR) was used as a measure of variability. From the resu | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
904 | GSE7307 | 4/9/2007 | ['7307'] | [] | [] | 2250855 | [u'18297136'] | [u'Roth'] | ['Gerbens', 'van', 'Fehrmann', 'de', 'Crijns', 'Kamps', 'Hofstra', 'Weidenaar', 'Ter', 'Te'] | [] | PLoS One | 2008 | 2/20/2008 | 0 | n the first 50 TSR scores. We selected 621 samples, representing over 90 distinct tissue types from the publicly available human body index dataset from the Gene Expression Omnibus (Accession number: GSE7307{{tag}}--REUSE--). Supporting Information Table S1 (17.09 MB XLS) Click here for additional data file. Table S2 (12.07 MB XLS) Click here for additional data file. Table S3 (0.02 MB XLS) Click here for additional d | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
905 | GSE7307 | 4/9/2007 | ['7307'] | [] | [] | 2694358 | [u'19557189'] | [u'Roth'] | ['Rossin', 'Scolnick', 'Ng', 'Plenge', 'Xavier', 'Sklar', 'Altshuler', 'Daly', 'Purcell', 'Raychaudhuri'] | [] | PLoS Genet | 2009 | 2009 Jun | 0 | rily be enriched in this collection. To examine genes for tissue specific expression in the CNS system, we obtained a large publicly available human tissue expression microarray panel (GEO accession: GSE7307{{tag}}--REUSE--) [30] . We analyzed the data using the robust multi-array (RMA) method for background correction, normalization and polishing [55] . We filtered the data excluding | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
906 | GSE7307 | 4/9/2007 | ['7307'] | [] | [] | 2777408 | [u'19936228'] | [u'Roth'] | ['Tomoda', 'Conklin', 'Salomonis', 'Nakamura', 'Yamanaka'] | [] | PLoS One | 2009 | 11/10/2009 | 0 | http://www.ncbi.nlm.nih.gov/geo ) for the U133 plus 2.0 array platform from published datasets. This dataset consisted of normal adult human tissue samples from the human body index compendium study (GSE7307{{tag}}--REUSE--), hES cells, hiPS cells, and fibroblast lines (GSE9709 and GSE9832) [3] , [29] . For all datasets, one to three arbitrary biological replicates per tissue were use | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
907 | GSE7307 | 4/9/2007 | ['7307'] | [] | [] | 2858751 | [u'20346121'] | [u'Roth'] | ['Thorrez', 'Chang', 'Moreau', 'Tranchevent', 'Schuit'] | [] | BMC Genomics | 2010 | 3/26/2010 | 0 | dataset consisting of 70 microarrays covering 22 different murine tissues with 3-5 replicates per tissue was used as starting data. Data are accessible through the GEO database, with accession number GSE9954 [ 32 ]. The CEL files were analyzed using the affy library [ 33 ] of the BioConductor project [ 34 ] applying the Robust Multichip Average (RMA) function with default parameters (RMA background cor|k2 were cut out from the gel, purified and sent out for sequencing to Makrogen. Sequences were aligned to the mouse genome by BLAT. Normal human tissue data were retrieved from the Human Body Index (GSE7307{{tag}}--REUSE--, Neurocrine Bioscience, San Diego, California), of which a subset of 64 arrays were used, covering 18 different tissues with 3-5 replicates analogous to the murine dataset. Data were processed as d | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
908 | GSE7307 | 4/9/2007 | ['7307'] | [] | [] | 2988703 | [u'21047417'] | [u'Roth'] | ['Watkinson', 'Anastassiou', 'Varadan', 'Kim'] | [] | BMC Med Genomics | 2010 | 11/3/2010 | 0 | ata sets in the paper is given in Table 1 . They were identified by searching for rich data sets focused on a specific cancer in two public databases, The Cancer Genome Atlas and the Gene Expression Omnibus data depository. Furthermore, for the data sets initially used to infer the gene signature we required that they have well annotated staging information associated with the samples and that they cont|ists of Data sets in the paper Data set name Source Site GEO Accession Affymetrix platform Sample size TCGA Ovarian Cancer The Cancer Genome Atlas HT_HG-U133A a 377 CCR Ovarian Cancer Gene Expression Omnibus GSE9891 HG-U133_Plus_2 b 285 CCR Colon Cancer Gene Expression Omnibus GSE14333 HG-U133_Plus_2 290 Moffitt Colon Cancer Gene Expression Omnibus GSE17536 HG-U133_Plus_2 177 Singapore Gastric Cancer Gen|mnibus GSE15459 HG-U133_Plus_2 200 CCR Breast Cancer Gene Expression Omnibus GSE7390 HG-U133A c 198 Wang Breast Cancer Gene Expression Omnibus GSE2034 HG-U133A 286 Samsung Lung Cancer Gene Expression Omnibus GSE8894 HG-U133_Plus_2 138 Bild Lung Cancer Gene Expression Omnibus GSE3141 HG-U133_Plus_2 111 Neuroblastoma tumor Gene Expression Omnibus GSE3960 HG_U95Av2 102 Neoadjuvant Breast Cancer Gene Express|aging phenotype (specific to each cancer type) suggests that it could be used as a "proxy" of the MAF signature. This would allow us to improve on the gene list of Table 2 by making use of numerous publicly available gene expression data sets of cancers of many types, even without any staging information, as long as the MAF signature is present in a sizeable subset of them, aiming at finding the "inters|s-inhibiting therapeutic intervention targeting the MAF mechanism would be widely applicable to low-stage tumors. 
Conclusions In conclusion, we have shown that, using purely computational analysis of publicly available biological information, systems biology has revealed the core of a multi-cancer metastasis-associated gene expression signature. In the near future, a vast amount of additional information | mutual information with COL11A1 . Click here for file Additional file 5 Heat map of neuroblastoma data set . This file contains the result of hierarchical clustering for the neuroblastoma data set (GSE3960) using the MAF signature genes. Click here for file Additional file 6 Heat map of breast cancer data set using MAF signature genes . This file contains the result of hierarchical clustering for the|ure genes. Click here for file Additional file 7 Heat map of breast cancer data set using DCN metagene set . This file contains the result of hierarchical clustering for the breast cancer data set (GSE4779) using the DCN metagene set. Click here for file Acknowledgements Appreciation is expressed to Prof. Jessica Kandel, MD for helpful discussions. This work was supported by university inventor's ( | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
909 | GSE7307 | 4/9/2007 | ['7307'] | [] | [] | 2415112 | [u'18492260'] | [u'Roth'] | ['Delattre', 'Ducray', 'Mokhtari', 'Lair', 'Paris', 'de', 'Sanson', 'Bi\xc3\xa8che', 'Idbaih', 'Marie', 'Hoang-Xuan', 'Thillet', 'Vidaud'] | [] | Mol Cancer | 2008 | 5/20/2008 | 0 | callosum (GSM175855, GSM175856, GSM175857, GSM175858, GSM176050) and 5 samples of cortex (GSM176049, GSM176344, GSM176345, GSM176346, GSM176347), available in the Gene Expression Omnibus repository (GSE7307{{tag}}--REUSE--) [ 6 ]. To compare the gene expression profile with glioblastomas cancer stem cells (CSC), we used the data of Beier et al. (GSE7181) [ 7 ]. All raw and normalized data files for the microarray ana|nal genes In order to better characterize the expression of neuronal genes in gliomas with complete 1p19q codeletion, we performed a new hierarchical clustering analysis with samples of normal brain (GSE7307{{tag}}--REUSE--), including grey matter (cortex) and white matter (corpus callosum). As glioblastomas expressed genes of neural cancer stem cells, we also included samples of glioblastoma cancer stem cells (GSE718|ma predict prognosis, delineate a pattern of disease progression, and resemble stages in neurogenesis Cancer Cell 2006 9 157 173 16530701 10.1016/j.ccr.2006.02.019 Gene Expression Omnibus repository (GSE7307{{tag}}--REUSE--) (http://www.ncbi.nlm.nih.gov/geo) Beier D Hau P Proescholdt M Lohmeier A Wischhusen J Oefner PJ Aigner L Brawanski A Bogdahn U Beier CP CD133(+) and CD133(-) glioblastoma-derived cancer stem cells | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
910 | GSE7307 | 4/9/2007 | ['7307'] | [] | [] | 2950125 | [u'20957185'] | [u'Roth'] | ['Marais', 'Vibranovski', 'Long', 'Landback', 'Zhang'] | [] | PLoS Biol | 2010 | 10/5/2010 | 0 | genes into the mammalian X chromosome, which may be independent of major chromosomal changes. X-Linked Young Genes Are Male-Biased, Whereas X-Linked Old Genes Are Not Based on human body index data (GSE7307{{tag}}, Materials and Methods ) and mouse tissue profiling data [27] at the NCBI GEO database [28] , we identified genes with sex-biased expression ( Materials and Meth| S5 ) motivated us to perform more thorough transcriptional profiling to get a more complete picture of how genes from this evolutionary period are transcribed. We investigated mouse exon atlas data (GSE15998) to ask whether X-linked genes are more frequently expressed in the tissue of interest across different age groups. We clustered tissues by the proportion of X-linked genes expressed versus the pr|uggests that stronger positive selection acts on rodents and could explain why the recent peak of gene gain ( Figure 2 ) began earlier in the mouse lineage than in the human. Materials and Methods We downloaded Ensembl [54] release 51 (November, 2008) as the basic gene dataset for our analyses. We used MySQL V5.0.45 to organize the data, BioPerl [55] and BioEnsembl |e defined as de novo. Expression Profiling In order to avoid non-specific probes and to cover more recently annotated genes, we used the customized array annotation files (released on November, 2008) downloaded from University of Michigan [62] , HGU133Plus2_Hs_ENSG (Affymetrix Human 133 plus 2) and Mouse4302_Mm_ENSG (Affymetrix Mouse Genome 430 2.0 Array) for human and mouse, respectively|at this dataset does not overlap with what we described in Table 3 , since Table 3 only presents genes with unique probes, which 19 of these 20 genes do not have. 
Branch-Specific Ka/Ks Analysis We downloaded the vertebrate-wide 44-way coding sequence alignment from UCSC. UCSC known genes mapping to multiple Ensembl genes were discarded. For Ensembl genes mapping to multiple UCSC known genes, we retaine|icting with the age were removed. Based on the species tree ( Figure 1 ), we estimated Ka/Ks for each branch using free ratio model in PAML [71] . Functional Enrichment Analysis We downloaded Gene Ontology (GO) annotations for Ensembl V51. We used the program analyze.pl V1.9 of TermFinder package [72] to identify those significant terms for new genes, with multiple tes|ginating on a given chromosome out of all genes originating during that evolutionary period, that is, in that phylogenetic branch. Since human and mouse chromosomes are not completely orthologous, we downloaded net chain information (table netMm9) between human and mouse from UCSC [74] and extracted the top mouse hit for each individual human chromosome. For example, the top hit in mouse|uman and N  = 0.90, r  = 0.008, and d  = 0.22 for mouse. Panel A is based on Affymetrix Research Exon Array data for humans (GSE5791), while panel B is based on the Affymetrix Mouse Exon Array Panel. For the former, since the raw CEL file is not available, we downloaded the processed data from GEO website [28] ,|ndance (RA) of nine control tissues in mice. (0.55 MB DOC) Click here for additional data file. Figure S7 Heatmap of expression enrichment in X chromosome and autosome based on human body index data (GSE7307{{tag}}). The axes are labeled as in Figure 6 of the main text. Note that branches 10, 11, and 12 were skipped since these branches have too few (<5) genes with unique probes on the X chromosome | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
911 | GSE7307 | 4/9/2007 | ['7307'] | [] | [] | 2810306 | [u'20015385'] | [u'Roth'] | ['Tsai', 'Chang', 'Wang', 'Hu'] | [] | BMC Genomics | 2009 | 12/16/2009 | 0 | 122; HG-U133 Plus 2.0 GeneChip. Array data of normal CD133+ epithelial stem cells, which were used as a normal counterpart of cancer stem cells [ 39 ], isolated from benign prostatic hyperplasia were downloaded from the ArrayExpress database at the European Bioinformatics Institute ( http://www.ebi.ac.uk/microarray-as/ae/ ; Accession No. E-MEXP-993; array data files 1325504978.cel, 1325505459.cel and 1325| used). The gene expression profiles of EEC tissues of different stages were generated by the International Genomics Consortium (IGC) under the expO (Expression Project for Ontology) project and were downloaded from Gene Expression Omnibus (GEO http://www.ncbi.nlm.nih.gov/geo/ ; GSE2109). EEC array data were divided into training (n = 33; incl. all 4 stages) and testing cohorts (n = 15) (details in Table | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
912 | GSE7307 | 4/9/2007 | ['7307'] | [] | [] | 2882835 | [u'20479118'] | [u'Roth'] | ['Hosmalin', 'Dalod', 'Schwartz-Cornil', 'Guiton', 'Crozat', 'Boudinot', 'Storset', 'Ventre', 'Vu', 'Feuillet', 'Dutertre', 'Contreras', 'Marvel', 'Baranek'] | [] | J Exp Med | 2010 | 6/7/2010 | 0 | iglech mouse genes in 96 different cell types or tissues, in human (top) and mouse (bottom), respectively. The human data were retrieved from the GEO database, normal tissues and cell types from the GSE7307{{tag}}--REUSE-- dataset, PBMC-derived macrophages from GSE4883, monocyte-derived DCs from GSE7509, monocyte-derived macrophages from GSM213500, and alveolar macrophages from GSE2125, and blood and tonsil DC subset|) of the EBI ArrayExpress database. The data for the other leukocyte subsets directly isolated from normal human blood were described previously ( Du et al., 2006 ; Robbins et al., 2008 ) and can be downloaded from http://www-microarrays.u-strasbg.fr/files/datasetsE.php . The data for the mouse were downloaded from the BioGPS public database ( http://biogps.gnf.org ). Green circles, pDCs (dark, blood; l|ration and analysis of microarray data for mouse and human DC subsets were described elsewhere ( Robbins et al., 2008 ; Zucchini et al., 2008 ). All datasets used are public and their references for download from databases are given in the legends of figures. A compendium of human hgu133plus2 Affymetrix.CEL files was established, using Bioconductor (release 2.5) in the R statistical environment (version | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
913 | GSE7307 | 4/9/2007 | ['7307'] | [] | [] | 2975411 | [u'21047382'] | [u'Roth'] | ['Srivastava', 'Wang', 'Schwartz'] | [] | BMC Genomics | 2010 | 11/2/2010 | 0 | selective gene expression. Su et al. [ 6 ] used custom oligonucleotide arrays to examine the expression patterns of predicted genes across a panel of human and mouse tissues. The NCBI Gene Expression Omnibus (GEO at http://www.ncbi.nlm.nih.gov/geo/ ) has an Affymetrix microarray dataset for human body index of gene expression (GEO accession: GSE7307{{tag}}--REUSE--). Since each individual dataset does not contain a lar|be used to integrate the gene expression data from different microarray studies. Greco et al. [ 7 ] investigated tissue-selective expression patterns with an integrated dataset of microarray profiles publicly available at the GEO database. The relatively small dataset contained 195 expression profiles from six different microarray studies. The results suggested that gene expression data from Affymetrix Ge| used to identify brain, liver and testis-selective genes using a new computational method based on both microarray hybridization intensities and detection calls. The results further suggest that the publicly available microarray expression profiles from heterogeneous sources can be integrated into a single dataset for examining gene expression patterns across various tissues. Methods Collection and curat|ives, and show significant variations in data quality. To compile a compendium of high-quality microarray profiles for studying gene expression patterns, we manually curated the human microarray data publicly available in the NCBI GEO database (as of November 3, 2009). The following criteria were used to select microarray expression profiles in this study. 
First, the profiles had to be generated using the|hage 88 87 Heart 31 31 Tonsil 13 13 Lymph node 14 14 Blood (various cell types) 413 409 Other tissues 184 183 Microarray data normalization and integration Microarray raw data in CEL file format were downloaded from the GEO database, and then normalized by using the dChip software (available at http://www.dchip.org ). As a widely used tool for microarray data analysis, dChip can display and normalize CEL | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
914 | GSE7307 | 4/9/2007 | ['7307'] | [] | [] | 2821899 | [u'19351829'] | [u'Roth'] | ['Rudin', 'Parmigiani', 'Marchionni', 'Rhodes', 'Devereux', 'Hierman', 'Daniel', 'Peacock', 'Dorsch', 'Watkins', 'Yung'] | [] | Cancer Res | 2009 | 4/15/2009 | 0 | AND pmc_gds | 0 | 1 | ||||
915 | GSE7307 | 4/9/2007 | ['7307'] | [] | [] | 2654039 | [u'19134193'] | [u'Roth'] | ['Mazzucco', 'Faggian', 'Pignatti', 'Trabetti', 'Biscuola', 'Laveder', 'Patuzzo', 'Cagnin', 'Lanfranchi', 'Iafrancesco', 'Pasquali'] | [] | BMC Genomics | 2009 | 1/9/2009 | 0 | AND pmc_gds | 0 | 1 | ||||
916 | GSE7314 | 9/30/2007 | ['7314'] | [] | [] | 2570369 | [u'18811943'] | [u'Kuhar', u'Nettleton', u'Qu', u'Wang', u'Bearson', u'Couture', u'Lunney', u'Uthe', u'Tuggle', u'Dekkers'] | ['Kuhar', 'Nettleton', 'Qu', 'Wang', 'Bearson', 'Couture', 'Lunney', 'Uthe', 'Tuggle', 'Dekkers'] | ['Kuhar', 'Nettleton', 'Qu', 'Wang', 'Bearson', 'Couture', 'Lunney', 'Uthe', 'Tuggle', 'Dekkers'] | BMC Genomics | 2008 | 9/23/2008 | 0 | ith estimated FC < 2 (or estimated FC < 10 in some cases as noted in the Results section). Our microarray data have been submitted to the NCBI GEO database and the accession number is GSE7314{{tag}}--DEPOSIT--. Transcriptome determination and Sequence-based annotation of Probesets The transcriptome of normal and S . Choleraesuis infected MLN were determined as described previously [ 12 ]. Briefly, trans | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
917 | GSE7316 | 6/1/2007 | ['7316'] | [] | [u'17519220'] | 2677677 | [u'19436755'] | ['Sigman', 'Schanen', 'Vazquez-Lopez', 'Steindler', 'Alvarez-Retuerto', 'Geschwind', 'Martin', 'Pellegrini', 'Nishimura', 'Spence', 'Warren'] | ['Devriendt', 'Nitsch', 'Thienpont', 'Thorrez', 'Tranchevent', 'Moreau', 'Van'] | [] | PLoS One | 2009 | 2009 | 1 | osis) we have determined small neighborhoods of 150 or 20 neighboring genes, because for these data sets we obtained the best signal for small neighborhoods (data not shown) after applying the Fisher omnibus statistics (see Materials and Methods for more details). However, for one data set (Becker muscular dystrophy) we have determined a larger neighborhood of 2000 neighboring genes due to high signal |onsidered). We wanted the size of the neighborhood to be dependent on the disruption we found in the network. We determined this size by analyzing the observed signals obtained by applying the Fisher omnibus statistic to the list of candidate genes for different neighborhood sizes, and choosing the size for which we caught the best signal as the most reliable one. For three data sets (FXS, MFS, CF), we d|ghborhood size must be computed independently. Therefore, we first run the analysis for various value of β (from 0.001 to 0.5) and, for each, measure the signal captured by using the Fisher omnibus statistic ( ) on the rankings produced. We then generate a new p -value from the statistic S for each β using the χ 2 distribution. The value of the parameter β with t|. By contrast, for inappropriate neighborhood sizes, all candidates will have uniformly distributed p -values, leading to a low statistic S . Table S9 illustrates the signals derived by the Fisher omnibus meta-analysis using the example of FXS [11] , leading to an appropriate neighborhood size of approximately 150 genes for this data set. 
Data sets FXS Mendelian di|ion 1 (FMR1, OMIM *309550) Phenotype: mental retardation, macroorchidism, and distinct facial features Expression data: Nishimura et al. (2007) [11] (GEO accession number: GSE7316{{tag}}--REUSE--). Lymphoblastoid cell cultures from patients with confirmed FMR1 full mutation (CGG repeat expansion). Platform: Agilent-012391 Whole Human Genome Oligo Microarray G4112A MFS |ature, disproportionately long limbs and digits, joint laxity, eye anomalies and progressive cardiovascular problems. Expression data: Yao et al. (2007) [12] (GEO accession number GDS2960). Fibroblast cultures from patients with confirmed FBN1 missense (9) and nonsense (7) mutations as well as one multi-exon deletion. Platform: Research Genetics (Invitrogen) - GF211 Microarray Filte|b;43] ) Phenotype: chronic obstructive lung disease, bronchiectasia, and exocrine pancreatic insufficiency Expression data: Wright et al. (2006) [13] (GEO accession number GDS2143). Analysis of the nasal respiratory epithelium of cystic fibrosis (CF) patients with mild (4) or severe (5) lung disease. Platform: Affymetrix GeneChip Human Genome U133 Array Set HG-U133B BMD |ophin (DMD, OMIM *300377) Phenotype: muscle wasting and weakness, and in some cases with mental impairment. Expression data: Bakay et al. (2006) [14] (GEO accession number GDS2855). Analysis of muscle biopsy specimens from patients with various muscle diseases. Platform: Affymetrix GeneChip Human Genome U133 Array Set HG-U133B Stein-Levental syndrome Me|in-Levental syndrome [26] . Phenotype: obesity, hyperandrogenism and chronic anovulation Expression data: Cortón et al. (2007) [22] (GEO accession number: GDS2084). Omental fat biopsy from patients. Unconfirmed disorder etiology. Platform: Affymetrix GeneChip Human Genome U133 Array Set HG-U133A Supporting Information Supplementary Materials S1 (0.07 MB DO|the example of data set 1 (FXS [11] ). The neighborhood size is controlled by a weighting function (w = exp(−β⋅r). 
Applying the Fisher omnibus meta-analysis (S = ∑−2 ln (p-value)) for each parameter β, new p-values are generated from a χ² distribution. The parameter β | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
918 | GSE7318 | 4/20/2007 | ['7318'] | ['2691'] | [u'17449640'] | 1854842 | [u'17449640'] | ['Ninomiya', 'Arita', 'Shigenobu', 'Mukai', 'Hayashi', 'Kobayashi', 'Sato'] | ['Ninomiya', 'Arita', 'Shigenobu', 'Mukai', 'Hayashi', 'Kobayashi', 'Sato'] | ['Ninomiya', 'Arita', 'Shigenobu', 'Mukai', 'Hayashi', 'Kobayashi', 'Sato'] | Proc Natl Acad Sci U S A | 2007 | 5/1/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
919 | GSE7320 | 6/30/2007 | ['7320'] | [] | [u'17601824'] | 1955733 | [u'17601824'] | ['Bartel', 'Snyder', 'Axtell'] | ['Bartel', 'Snyder', 'Axtell'] | ['Bartel', 'Snyder', 'Axtell'] | Plant Cell | 2007 | 2007 Jun | 0 | AND pmc_gds | 1 | 0 | ||||
920 | GSE7324 | 3/21/2007 | ['7324'] | [] | [] | 1978186 | [u'17560331'] | [u'Speck', u'Bushweller', u'Cierpicki', u'J', u'Cheney', u'Roudaia', u'Laue', u'Liu', u'Klet', u'Chen', u'Hartman'] | ['Speck', 'Bushweller', 'Cierpicki', 'Cheney', 'Roudaia', 'Laue', 'Liu', 'Klet', 'Chen', 'Gaudet', 'Hartman'] | ['Speck', 'Bushweller', 'Cierpicki', 'Cheney', 'Roudaia', 'Laue', 'Liu', 'Klet', 'Chen', 'Hartman'] | Cancer Cell | 2007 | 2007 Jun | 0 | he microarray data discussed in this publication have been deposited in NCBIs Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) with accession number GSE7324{{tag}}--DEPOSIT-- | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
921 | GSE7329 | 6/1/2007 | ['7329'] | ['2824'] | [u'17519220'] | 2680807 | [u'19416532'] | ['', 'Sigman', 'Schanen', 'Vazquez-Lopez', 'Steindler', 'Alvarez-Retuerto', 'Geschwind', 'Martin', 'Pellegrini', 'Nishimura', 'Spence', 'Warren'] | ['Thomas', 'Zhang', 'Murphy', 'Gohlke', 'Mattingly', 'Davis', 'Rosenstein', 'Portier', 'Becker'] | [] | BMC Syst Biol | 2009 | 5/5/2009 | 1 | bal gene expression datasets utilized for validation of metabolic syndrome and neuropsychiatric subnetworks METABOLIC SYNDROME Condition Species Tissue GEO Acc . Reference obese/lean Human adipocytes GSE2508 [ 30 ] obese/lean Mouse adipocytes GSE4692 [ 31 ] Familial combined hyperlipedemia Human monocytes GSE11393 [ 32 ] Treatment Species Tissue GEO Acc . Reference Fenofibrate Rat liver GSE8251 [ 3|amide Rat liver GSE3952 [ 34 ] 9-cis retinoic acid Rat liver GSE3952 [ 34 ] Targretin Rat liver GSE3952 [ 34 ] Vitamin A deficient diet Rat liver GSE1600 [ 35 ] Omega 3 fatty acids Rat cardiomyocytes GSE4327 [ 36 ] Thiazolidinediones Human 3T3-L1 adipocytes GSE1458 [ 37 ] Atorvastatin Human monocytes GSE11393 [ 32 ] Cyfluthrin Human astrocytes GSE5023 [ 38 ] NEUROPSYCHIATRIC DISORDERS Condition Specie| 39 ] Depression Human prefrontal cortex GSE12654 [ 39 ] Schizophrenia Human prefrontal cortex GSE12654 [ 39 ] Schizophrenia Human frontal cortex E-MEXP-857 [ 40 ] Anxiety Mouse various brain regions GSE3327 [ 41 ] Autism Human lymphoblastoid cell lines GSE7329{{tag}}--REUSE-- [ 42 ] Autism Human whole blood GSE6575 [ 43 ] Treatment Species Tissue GEO Acc . Reference Chlorpyrifos Human astrocytes GSE5023 [ 38 ] Ch | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
922 | GSE7329 | 6/1/2007 | ['7329'] | ['2824'] | [u'17519220'] | 2637422 | [u'19214233'] | ['', 'Sigman', 'Schanen', 'Vazquez-Lopez', 'Steindler', 'Alvarez-Retuerto', 'Geschwind', 'Martin', 'Pellegrini', 'Nishimura', 'Spence', 'Warren'] | ['Berrettini', 'Wang', 'Bucan', 'Yang', 'Gregory', 'Hakonarson'] | [] | PLoS One | 2009 | 2009 | 1 | AND pmc_gds | 0 | 1 | ||||
923 | GSE7332 | 5/17/2007 | ['7332'] | [] | [u'17417652'] | 2865519 | [u'20463876'] | ['Bradbury', 'Socci', 'Studer', 'Barberi', 'Dincer', 'Panagiotakos'] | ['Stephan-Otto', 'Riester', 'Downey', 'Singer', 'Michor'] | [] | PLoS Comput Biol | 2010 | 5/6/2010 | 0 | the same study as well as expression data of 3 human embryonic stem cell lines (hESCs) and 3 hESC derived mesenchymal precursor lines (downloaded from NCBI Geo [47] accession number GSE7332{{tag}}--REUSE-- [48] ). We use gene expression data of AML [47] patient samples available within GEO (accession numbers GSE1159, GSE9476 [49] , GSE1729 [|GSE12417 [51] ). The breast cancer dataset is also compiled from Microarray data published in GEO with dataset numbers GSE7390 [16] , GSE2990 [15] , GSE3494 [17] , and GSE9574 [23] . A problem of micrarray meta-analyses is that the different dataset sources may introduce a bias. We therefore applied hierachical cluster | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
924 | GSE7333 | 3/29/2007 | ['7333'] | ['2614'] | [u'17397913'] | 2955048 | [u'20846437'] | ['Ransom', 'Srivastava', 'Schwartz', 'Tsuchihashi', 'Muth', 'Li', 'Zhao', 'McManus', 'Vedantham', 'von'] | ['Carter', 'Xu'] | [] | BMC Bioinformatics | 2010 | 9/16/2010 | 0 | 144 609 Hochberg 0 144 609 SidakSD 0 144 614 BH 0 407 5552 BY 0 227 2221 qvalue 0 407 6108 SAM 0 0 5330 Bayes 0 0 5705 EDR 5 593 4810 The raw cel files of these three data sets [ 21 , 23 , 25 ] were downloaded from the NCBI GEO database (GSE7146, GSE7333{{tag}}--REUSE--, GSE4107) and were preprocessed by the GC-RMA method. Two groups in each data set were tested by two-tailed t test assuming equal variance. All multiple|0 543 1319 891 1628 1724 TN 3774 4238 3424 3870 3110 3011 FN 91 118 80 98 75 72 TPR 0.4972 0.3481 0.5580 0.4586 0.5856 0.6022 FPR 0.2061 0.1136 0.2781 0.1871 0.3436 0.3641 The expression data set was downloaded from http://www.ambystoma.org and was preprocessed by the RMA method [ 38 ]. Differentially expressed genes (DEGs) were detected at the significance level of 0.05 by the EDR method and the other |ining 7129 probe sets. Only three genes were reported to be regulated by insulin in human muscle cell using a Wilcoxon signed rank test after filtering removed 5952 probe sets. The raw cel files were downloaded from the NCBI GEO database (GSE7146) containing data that are MIAME compliant as detailed on the MGED Society website http://www.mged.org/Workgroups/MIAME/miame.html The GC-RMA algorithm was used|d type and three miR-1-2 knockout mice at postnatal days 10 were compared for gene expression levels using Affymetrix mouse genome 430 2.0 array that contains 45101 probe sets. The raw cel files were downloaded from the NCBI GEO database (GSE7333{{tag}}--REUSE--) and were preprocessed by GC-RMA algorithm. With this data set, the EDR method was compared with 11 other multiple test procedures (Figure 2 ) at the same signi|ing the GeneChip U133-Plus 2.0 Array [ 25 ]. 
Twelve tumor specimens and ten adjacent grossly normal-appearing tissues from at least 8 cm away were collected for RNA extraction. The raw cel files were downloaded from the NCBI GEO database (GSE4107) and were preprocessed by GC-RMA algorithm. With this data set, the EDR method was compared with the other 11 multiple test procedures (Figure 2 ) at the same s|dent limb regeneration [ 26 ]. The same RNA samples were detected by Ambystoma GeneChip and 454 cDNA sequencing. There are total 4844 probe sets (TGs) on this GeneChip array. The raw cel files were downloaded from the public Ambystoma Microarray Database [ 37 ]. Detailed information of these data files and the DEGs confirmation by 454 cDNA sequencing were described in the original study [ 26 ]. The RMA | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
925 | GSE7336 | 4/2/2007 | ['7336'] | [] | [u'17409185'] | 1847597 | [u'17409185'] | ['Ballinger', 'Fischer', 'Penterman', 'Huh', 'Henikoff', 'Zilberman'] | ['Ballinger', 'Fischer', 'Penterman', 'Huh', 'Henikoff', 'Zilberman'] | ['Ballinger', 'Fischer', 'Penterman', 'Huh', 'Henikoff', 'Zilberman'] | Proc Natl Acad Sci U S A | 2007 | 4/17/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
926 | GSE7337 | 10/11/2007 | ['7337'] | [] | [u'17724083'] | 2169066 | [u'17724083'] | ['Wyrick', 'Parra'] | ['Wyrick', 'Parra'] | ['Wyrick', 'Parra'] | Mol Cell Biol | 2007 | 2007 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
927 | GSE7338 | 10/11/2007 | ['7338'] | [] | [u'17724083'] | 2169066 | [u'17724083'] | ['Wyrick', 'Parra'] | ['Wyrick', 'Parra'] | ['Wyrick', 'Parra'] | Mol Cell Biol | 2007 | 2007 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
928 | GSE7339 | 3/30/2007 | ['7339'] | [] | [] | 2633354 | [u'19087325'] | [u'Seki', u'Sugimoto', u'Takiguchi', u'Nimura', u'Fujisawa', u'Moriya', u'Iyoda', u'Kato', u'Kasai', u'Hashida'] | ['Liang'] | [] | BMC Med Genomics | 2008 | 12/16/2008 | 1 | using human primary lung cancer specimens (no cell lines) under the search terms "human lung adenocarcinoma" or "human lung squamous carcinoma" as of July 1 st , 2007 were reviewed. Only one dataset, GSE7339{{tag}}--REUSE--, was removed from further analysis due to too many genes missing from the 17-gene signature for prediction analysis. The 14 th dataset GSE2514 in Table 1 that is a mouse gene expression dataset | and subject to analyses. Table 1 Summary of lung cancer datasets used in this study. Database GEO Platform Institute PMID Technology type Organism AD# SCC# Stage # Genes % Correct AD % Correct SCC 1 GSE3398 GPL2648/2778/2832 Stanford 11707590 spotted cDNA Human 41 17 I to III 17 93 94 2 NA* Affy HG-U95A DFCI 11707567 oligonucleotide Human 139 21 I to III 17 91 86 3 NA** Affy HG-U133A U Michigan 121182| I to III 17 95 NA 4 GSE3141 Affy U95A/HuGeneFL Duke 16899777 oligonucleotide Human 54 57 I to III 17 83 72 5 GSE4573 Affy HG-U133A U Michigan 16885343 oligonucleotide Human 0 129 I to III 17 NA 85 6 GSE1037 CHUGAI 41K CIH, Japan 15016488 spotted cDNA Human 12 0 NA 17 83 NA 7 GSE6253 Affy HG-U95A/U133AB Washington U 17194181 oligonucleotide Human 14 36 I 17 79 78 8 GSE3268 Affy HG-U133A UC Davis 161889| Human 0 5 NA 17 NA 100 9 GSE1987 Affy HG-U95A Tel Aviv U NA oligonucleotide Human 8 17 I to III 17 88 59 10 GSE6044 Affy HG-Focus U Duesseldorf, Germany NA oligonucleotide Human 10 10 NA 16 70 90 11 GSE7880 Affy HG-Focus Heinrich-Heine U, Germany NA oligonucleotide Human 25 18 IIIB/IV 16 92 83 12 GSE2514 Affy HG-U95A U Colorado 16314486 oligonucleotide Human 20 0 NA 15 100 NA 13 GSE5843 GSE5123 PC Hum|14486 oligonucleotide Mouse 44 0 NA 13 100 NA * ** Figure 1 A flow diagram outlines selection of 13 human lung 
cancer databases used in this study . This diagram does not include the 14 th dataset GSE2514 in Table 1 that is a mouse gene expression dataset and was not used to calculate the accuracy of prediction in the meta-analysis. DB, database. Clustering analysis and evaluation of classification | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
929 | GSE7348 | 4/22/2007 | ['7348'] | [] | [u'17538624'] | 2949756 | [u'20858252'] | ['Hargreaves', 'Foster', 'Medzhitov'] | ['Perumal', 'Kollipara'] | [] | Immunome Res | 2010 | 9/21/2010 | 0 | and a subset of these targets clearly showed the pro-inflammatory gene expression pattern corresponding to LPS tolerance. Methods Differentially expressed genes The Foster et al microarray dataset (GSE7348{{tag}}--REUSE--) was downloaded from the NCBI-GEO database [ 33 ] via FTP protocol. This data set was derived from murine (C57BL/6 strain) bone marrow macrophages left untreated (N), stimulated with LPS for 24 ho|> one, and have the t-test p-value of <0.05. In this study, we utilized random mouse genes as background and they were selected using the RSAT tool [ 17 ]. The Mages et al [ 10 ] data was downloaded from GEO (GSE8621) and appropriate differential gene expression filters (based on fold change) were applied to compare with the pro-inflammatory transcriptional targets identified in our analysis o | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
930 | GSE7350 | 11/20/2007 | ['7350'] | [] | [u'18024519'] | 2223708 | [u'18024519'] | ['Bootsma', 'Hermans', 'Kuipers', 'de', 'Hendriksen', 'Estevão', 'Hoogenboezem'] | ['Bootsma', 'Hermans', 'Kuipers', 'de', 'Hendriksen', 'Estevão', 'Hoogenboezem'] | ['Bootsma', 'Hermans', 'Kuipers', 'de', 'Hendriksen', 'Estevão', 'Hoogenboezem'] | J Bacteriol | 2008 | 2008 Jan | 0 | AND pmc_gds | 1 | 0 | ||||
931 | GSE7351 | 6/1/2007 | ['7351'] | [] | [u'17616980'] | 1904468 | [u'17616980'] | ['Russell', 'Herold', 'Adryan', 'Bartkuhn', 'Renkawitz', 'Kwong', 'White', 'Holohan'] | ['Russell', 'Herold', 'Adryan', 'Bartkuhn', 'Renkawitz', 'Kwong', 'White', 'Holohan'] | ['Russell', 'Herold', 'Adryan', 'Bartkuhn', 'Renkawitz', 'Kwong', 'White', 'Holohan'] | PLoS Genet | 2007 | 2007 Jul | 0 | AND pmc_gds | 1 | 0 | ||||
932 | GSE7359 | 7/31/2007 | ['7359'] | [] | [u'17513503'] | 1913740 | [u'17513503'] | ['Goeres', 'Van', 'Zhang', 'Fauver', u'Milash', 'Sieburth', u'Dalley', 'Spencer'] | ['Goeres', 'Van', 'Zhang', 'Fauver', 'Sieburth', 'Spencer'] | ['Goeres', 'Van', 'Zhang', 'Fauver', 'Sieburth', 'Spencer'] | Plant Cell | 2007 | 2007 May | 0 | AND pmc_gds | 1 | 0 | ||||
933 | GSE7364 | 3/29/2007 | ['7364'] | [] | [u'17470268'] | 1868757 | [u'17470268'] | ['Auer', 'Yang', 'Kornacker', 'McHugh', 'Gastier-Foster', 'Wenger', 'Newsom', 'Nowak', 'Singh', 'Yu'] | ['Auer', 'Yang', 'Kornacker', 'McHugh', 'Gastier-Foster', 'Wenger', 'Newsom', 'Nowak', 'Singh', 'Yu'] | ['Auer', 'Kornacker', 'McHugh', 'Gastier-Foster', 'Wenger', 'Yang', 'Newsom', 'Nowak', 'Singh', 'Yu'] | BMC Genomics | 2007 | 4/30/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
934 | GSE7366 | 11/28/2007 | ['7366'] | [] | [u'18043742'] | 2084198 | [u'18043742'] | ['Bito', 'Kanagawa', u'Kimura', u'Iizumi', 'Fukuda', 'Suzuki', 'Nakamura', 'Carninci', 'Murakami', 'Otomo', 'Murata', 'Takahashi-Iida', 'Kikuta', 'Kanamori', 'Yamagata', 'Kawai', 'Ninomiya', 'Doi', 'Choi', 'Kishimoto', 'Kurita', 'Tagami', 'Arakawa', 'Matsubara', 'Nagata', 'Ito', 'Hirozane-Kishikawa', 'Fujitsuka', 'Matsumoto', 'Nagamura', 'Kamiya', 'Kikuchi', 'Yamamoto', 'Satoh', 'Sasaki', 'Hayashizaki'] | ['Bito', 'Kanagawa', 'Fukuda', 'Suzuki', 'Nakamura', 'Carninci', 'Murakami', 'Otomo', 'Murata', 'Takahashi-Iida', 'Kikuta', 'Kanamori', 'Yamagata', 'Kawai', 'Ninomiya', 'Doi', 'Choi', 'Kishimoto', 'Kurita', 'Tagami', 'Arakawa', 'Matsubara', 'Nagata', 'Ito', 'Hirozane-Kishikawa', 'Fujitsuka', 'Matsumoto', 'Nagamura', 'Kamiya', 'Kikuchi', 'Yamamoto', 'Satoh', 'Sasaki', 'Hayashizaki'] | ['Bito', 'Kanagawa', 'Fukuda', 'Suzuki', 'Nakamura', 'Carninci', 'Murakami', 'Otomo', 'Murata', 'Takahashi-Iida', 'Kikuta', 'Kanamori', 'Yamagata', 'Kawai', 'Ninomiya', 'Doi', 'Choi', 'Kishimoto', 'Kurita', 'Tagami', 'Arakawa', 'Matsubara', 'Nagata', 'Ito', 'Hirozane-Kishikawa', 'Fujitsuka', 'Matsumoto', 'Nagamura', 'Kamiya', 'Kikuchi', 'Yamamoto', 'Satoh', 'Sasaki', 'Hayashizaki'] | PLoS One | 2007 | 11/28/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
935 | GSE7372 | 3/28/2007 | ['7372'] | [] | [] | 2732365 | [u'19447789'] | [u'Zhong', u'Euskirchen', u'Snyder', u'Rozowsky', u'Gerstein'] | ['Choi', 'Ghosh', 'Nesvizhskii', 'Qin'] | [] | Bioinformatics | 2009 | 7/15/2009 | 1 | hIP-chip, offering cost-effective genome-wide coverage and resolution up to a single base pair. For many well-studied TFs, both ChIP-seq and ChIP-chip experiments have been applied and their data are publicly available. Previous analyses have revealed substantial technology-specific binding signals despite strong correlation between the two sets of results. Therefore, it is of interest to see whether the |g. ChIP-chip, can be a valuable source to complement the weakness of the sequencing technology. For many of the existing ChIP-seq data, ChIP-chip experiments have also been conducted and the data are publicly available. It is desirable to take advantage of existing ChIP-chip datasets to assist TFBS identification using ChIP-seq. While such a joint analysis has a promise, it is a challenging task to accoun|the human genome using ChIP-seq. ChIP-seq data for the treated and control cell lines were available from the Illumina web site and an unpublished ChIP-chip data was also available in Gene Expression Omnibus ( GSE7372{{tag}}--REUSE-- ). Since the array platform used in the ChIP-chip data (Nimblegen ENCODE tiling arrays) does not cover the whole genome, this section focuses on the 10 ENCODE regions each spanning 5 millio | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
936 | GSE7374 | 11/28/2007 | ['7374'] | [] | [u'18043742'] | 2084198 | [u'18043742'] | ['Bito', 'Kanagawa', u'Kimura', u'Iizumi', 'Fukuda', 'Suzuki', 'Nakamura', 'Carninci', 'Murakami', 'Otomo', 'Murata', 'Takahashi-Iida', 'Kikuta', 'Kanamori', 'Yamagata', 'Kawai', 'Ninomiya', 'Doi', 'Choi', 'Kishimoto', 'Kurita', 'Tagami', 'Arakawa', 'Matsubara', 'Nagata', 'Ito', 'Hirozane-Kishikawa', 'Fujitsuka', 'Matsumoto', 'Nagamura', 'Kamiya', 'Kikuchi', 'Yamamoto', 'Satoh', 'Sasaki', 'Hayashizaki'] | ['Bito', 'Kanagawa', 'Fukuda', 'Suzuki', 'Nakamura', 'Carninci', 'Murakami', 'Otomo', 'Murata', 'Takahashi-Iida', 'Kikuta', 'Kanamori', 'Yamagata', 'Kawai', 'Ninomiya', 'Doi', 'Choi', 'Kishimoto', 'Kurita', 'Tagami', 'Arakawa', 'Matsubara', 'Nagata', 'Ito', 'Hirozane-Kishikawa', 'Fujitsuka', 'Matsumoto', 'Nagamura', 'Kamiya', 'Kikuchi', 'Yamamoto', 'Satoh', 'Sasaki', 'Hayashizaki'] | ['Bito', 'Kanagawa', 'Fukuda', 'Suzuki', 'Nakamura', 'Carninci', 'Murakami', 'Otomo', 'Murata', 'Takahashi-Iida', 'Kikuta', 'Kanamori', 'Yamagata', 'Kawai', 'Ninomiya', 'Doi', 'Choi', 'Kishimoto', 'Kurita', 'Tagami', 'Arakawa', 'Matsubara', 'Nagata', 'Ito', 'Hirozane-Kishikawa', 'Fujitsuka', 'Matsumoto', 'Nagamura', 'Kamiya', 'Kikuchi', 'Yamamoto', 'Satoh', 'Sasaki', 'Hayashizaki'] | PLoS One | 2007 | 11/28/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
937 | GSE7376 | 10/23/2007 | ['7376'] | [] | [u'17712417'] | 1940319 | [u'17712417'] | [u'Rubie', 'Tolosi', 'Kamradt', 'Wahrheit', 'Jung', 'Wullich', 'Davis', 'Rahnenfuehrer', u'Rahnenfuhrer', 'Meltzer', 'Stoeckle', 'Walker', 'Schilling'] | ['Tolosi', 'Kamradt', 'Wahrheit', 'Jung', 'Wullich', 'Davis', 'Rahnenfuehrer', 'Meltzer', 'Stoeckle', 'Walker', 'Schilling'] | ['Tolosi', 'Kamradt', 'Wahrheit', 'Jung', 'Wullich', 'Davis', 'Rahnenfuehrer', 'Meltzer', 'Stoeckle', 'Walker', 'Schilling'] | PLoS One | 2007 | 8/22/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
938 | GSE7377 | 3/28/2007 | ['7377'] | ['2739'] | [u'17591970'] | 1941596 | [u'17591970'] | ['Allred', 'Lee', 'Mohsin', 'Tsimelzon', 'Medina', 'Wu', 'Mao'] | ['Allred', 'Lee', 'Mohsin', 'Tsimelzon', 'Medina', 'Wu', 'Mao'] | ['Allred', 'Lee', 'Mohsin', 'Tsimelzon', 'Medina', 'Wu', 'Mao'] | Am J Pathol | 2007 | 2007 Jul | 0 | AND pmc_gds | 1 | 0 | ||||
939 | GSE7378 | 4/2/2007 | ['7378'] | [] | [u'17407600', u'18631401'] | 2972666 | [u'20975711'] | ['Gray', 'Benz', 'Eppenberger', 'Moore', 'Dairkee', 'Chew', 'Yau', 'Zhou', 'Eppenberger-Castori'] | ['Wang', 'Li', "O'Connor-McCourt", 'Purisima', 'Collins', 'Deng', 'Lenferink', 'Cui'] | [] | Nat Commun | 2010 | 7/13/2010 | 0 | The breast cancer microarray data sets used were from Wang et al.15 (Wang cohort or data set, Affymetrix arrays), Chang et al.6 (Chang cohort or data set, cDNA arrays), van 't Veer et al.5 (van 't Veer cohort or data set, cDNA arrays), Miller et al.21 (Miller cohort or data set, Affymetrix arrays) and from several other Affymetrix array data sets with the following NCBI GEO (http://www.ncbi.nlm.nih.gov/geo/) IDs:GSE11121, GSE1456, GSE6532, GSE9151, GSE7378{{key}}--REUSE-- and GSE12093. | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
940 | GSE7378 | 4/2/2007 | ['7378'] | [] | [u'17407600', u'18631401'] | 2689196 | [u'19445687'] | ['Gray', 'Benz', 'Eppenberger', 'Moore', 'Dairkee', 'Chew', 'Yau', 'Zhou', 'Eppenberger-Castori'] | ['Kim'] | [] | BMC Bioinformatics | 2009 | 5/16/2009 | 1 | -centered and pooled into a single data set of 1,418 samples. Each color above the heatmap represents each data set. Table 1 Data sets analyzed in this study Data set Total ER+ ER- Survival Reference GSE1456 159 99 40 RFS [ 13 ] GSE2603 82 57 42 DMFS [ 14 ] GSE3494 236 213 34 DMFS [ 15 ] GSE6532 306 262 45 DMFS [ 16 , 17 ] GSE7378{{tag}}--REUSE-- 54 54 0 DMFS [ 18 ] GSE7390 198 134 64 DMFS [ 19 ] GSE11121 129 200 0 RF | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
941 | GSE7378 | 4/2/2007 | ['7378'] | [] | [u'17407600', u'18631401'] | 2689870 | [u'19393097'] | ['Gray', 'Benz', 'Eppenberger', 'Moore', 'Dairkee', 'Chew', 'Yau', 'Zhou', 'Eppenberger-Castori'] | ['Nakai', 'Mizuno', 'Kitada', 'Sarai'] | [] | BMC Med Genomics | 2009 | 4/24/2009 | 1 | as possible. All tables were relationally linked and stored in the MySQL server. Table 1 Dataset content from PrognoScan Dataset Cancer type Subtype Cohort Author/Contributor Array type n Data source GSE13507 Bladder cancer Transitional cell carcinoma Cheongju Kim Human-6 v2 n = 165 GEO GSE5287 Bladder cancer Aarhus (1995–2004) Als et al . [ 10 ] HG-U133A n = 30 GEO GSE12417-GPL570 Blood cance|-GPL96 Blood cancer AML AMLCG (1999–2003) Metzeler et al . [ 11 ] HG-U133A n = 163 GEO GSE12417-GPL97 Blood cancer AML AMLCG (1999–2003) Metzeler et al . [ 11 ] HG-U133B n = 163 GEO GSE8970 Blood cancer AML San Diego Raponi et al . [ 12 ] HG-U133A n = 34 GEO GSE4475 Blood cancer B-cell lymphoma Berlin (2003–2005) Hummel et al . [ 13 ] HG-U133A n = 158 GEO E-TABM-346 Blood ca|E2658 Blood cancer Multiple myeloma Arkansas Zhan et al . [ 15 ] HG-U133_Plus_2 n = 559 GEO E-TABM-158 Breast cancer UCSF, CPMC (1989–1997) Chin et al . [ 16 ] HG-U133A n = 129 ArrayExpress GSE11121 Breast cancer Mainz (1988–1998) Schmidt et al . [ 17 ] HG-U133A n = 200 GEO GSE1378 Breast cancer MGH (1987–2000) Ma et al . [ 18 ] Arcturus 22 k n = 60 GEO GSE1379 Breast cancer|6-GPL96 Breast cancer Stockholm (1994–1996) Pawitan et al . [ 19 ] HG-U133A n = 159 GEO GSE1456-GPL97 Breast cancer Stockholm (1994–1996) Pawitan et al . [ 19 ] HG-U133B n = 159 GEO GSE2034 Breast cancer Rotterdam (1980–1995) Wang et al . [ 20 ] HG-U133A n = 286 GEO GSE2990 Breast cancer Uppsala, Oxford Sotiriou et al . [ 21 ] HG-U133A n = 187 GEO GSE3143 Breast cancer Duke |GSE3494-GPL96 Breast cancer Uppsala (1987–1989) Miller et al . [ 23 ] HG-U133A n = 236 GEO GSE3494-GPL97 Breast cancer Uppsala (1987–1989) Miller et al . 
[ 23 ] HG-U133B n = 236 GEO GSE4922-GPL96 Breast cancer Uppsala (1987–1989) Ivshina et al . [ 24 ] HG-U133A n = 249 GEO GSE4922-GPL97 Breast cancer Uppsala (1987–1989) Ivshina et al . [ 24 ] HG-U133B n = 249 GEO GSE|SE7378{{tag}}--REUSE-- Breast cancer UCSF Zhou et al . [ 26 ] U133AAofAv2 n = 54 GEO GSE7390 Breast cancer Uppsala, Oxford, Stockholm, IGR, GUYT, CRH (1980–1998) Desmedt et al . [ 27 ] HG-U133A n = 198 GEO GSE7849 Breast cancer Duke (1990–2001) Anders et al . [ 28 ] HG-U95A n = 76 GEO GSE9195 Breast cancer GUYT2 Loi et al . [ 25 ] HG-U133_Plus_2 n = 77 GEO GSE9893 Breast cancer Montpellier, Bordeau|man 21 K V12.0 n = 155 GEO GSE11595 Esophagus cancer Adenocarcinoma Sutton Giddings CRUKDMF_22 K_v1.0.0 n = 34 GEO GSE7696 Glioma Glioblastoma Lausanne Murat et al . [ 30 ] HG-U133_Plus_2 n = 70 GEO GSE4271-GPL96 Glioma MDA Phillips et al . [ 31 ] HG-U133A n = 77 GEO GSE4271-GPL97 Glioma MDA Phillips et al . [ 31 ] HG-U133B n = 77 GEO GSE2837 Head and neck cancer Squamous cell carcinoma VUMC, VAMC, |Adenocarcinoma Harvard Beer et al . [ 33 ] HG-U95A n = 84 Author's web site MICHIGAN-LC Lung cancer Adenocarcinoma Michigan (1994–2000) Beer et al . [ 33 ] HuGeneFL n = 86 Author's web site GSE11117 Lung cancer NSCLC Basel Baty Novachip human 34.5 k n = 41 GEO GSE3141 Lung cancer NSCLC Duke Bild et al . [ 22 ] HG-U133_Plus_2 n = 111 GEO GSE4716-GPL3694 Lung cancer NSCLC Nagoya (1995–|da et al . [ 34 ] GF200 n = 50 GEO GSE4716-GPL3696 Lung cancer NSCLC Nagoya (1995–1996) Tomida et al . [ 34 ] GF201 n = 50 GEO GSE8894 Lung cancer NSCLC Seoul Son HG-U133_Plus_2 n = 138 GEO GSE4573 Lung cancer Squamous cell carcinoma Michigan (1991–2002) Raponi et al . [ 35 ] HG-U133A n = 129 GEO DUKE-OC Ovarian cancer Duke Bild et al . [ 22 ] HG-U133A n = 134 Author's web site GSE8|horts. Datasets come from a number of different institutions around the world, and patient backgrounds differ. 
In addition, several datasets are based on specific subpopulations, for example, dataset GSE2034 is from lymph node-negative breast cancers, and GSE5287 is from cisplatin-containing chemotherapy-treated bladder cancers. Hence, it is possible that the specific association between gene expressio|subsequent care may affect the clinical course of a patient. 3) Experimental factors. Expression measurement of microarray is subject to various factors at the experiment level. Microdissection (e.g. GSE1378) would reduce contamination of mRNAs from non-cancer cells [ 57 ]. Formalin fixation of a sample (e.g. GSE2873) influences the quality of mRNAs [ 58 ]. Array type (e.g. Affymetrix, cDNA microarrays | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
942 | GSE7378 | 4/2/2007 | ['7378'] | [] | [u'17407600', u'18631401'] | 2575534 | [u'18631401'] | ['Gray', 'Benz', 'Eppenberger', 'Moore', 'Dairkee', 'Chew', 'Yau', 'Zhou', 'Eppenberger-Castori'] | ['Yau', 'Benz'] | ['Yau', 'Benz'] | Breast Cancer Res | 2008 | 2008 | 1 | AND pmc_gds | 1 | 0 | ||||
943 | GSE7378 | 4/2/2007 | ['7378'] | [] | [u'17407600', u'18631401'] | 2216076 | [u'17850661'] | ['Gray', 'Benz', 'Eppenberger', 'Moore', 'Dairkee', 'Chew', 'Yau', 'Zhou', 'Eppenberger-Castori'] | ['Roydasgupta', 'Gray', 'Hubbard', 'Moore', 'Dairkee', 'Chew', 'Yau', 'Schittulli', 'Tommasi', 'Benz', 'Fridlyand', 'Fedele', 'Paradiso', 'Albertson'] | ['Gray', 'Moore', 'Dairkee', 'Chew', 'Yau', 'Benz'] | Breast Cancer Res | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
944 | GSE7378 | 4/2/2007 | ['7378'] | [] | [u'17407600', u'18631401'] | 1852565 | [u'17407600'] | ['Gray', 'Benz', 'Eppenberger', 'Moore', 'Dairkee', 'Chew', 'Yau', 'Zhou', 'Eppenberger-Castori'] | ['Gray', 'Benz', 'Eppenberger', 'Moore', 'Dairkee', 'Chew', 'Yau', 'Zhou', 'Eppenberger-Castori'] | ['Gray', 'Eppenberger', 'Moore', 'Dairkee', 'Chew', 'Yau', 'Benz', 'Zhou', 'Eppenberger-Castori'] | BMC Cancer | 2007 | 4/3/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
945 | GSE7385 | 4/1/2007 | ['7385'] | [] | [u'17407552'] | 1896006 | [u'17407552'] | ['Pugh', 'Huisinga'] | ['Pugh', 'Huisinga'] | ['Pugh', 'Huisinga'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
946 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2716388 | [u'19651608'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Neville', 'Clouston', 'Shin', 'Jungbluth', 'Miller', 'Grigoriadis', 'da', 'Chen', 'Lakhani', 'Caballero', 'Old', 'Hoek', 'Cebon', 'Simpson'] | [] | Proc Natl Acad Sci U S A | 2009 | 8/11/2009 | 0 | ccines. The aim of the present study was to investigate the expression of CT antigens in breast cancer. Using previously generated massively parallel signature sequencing (MPSS) data, together with 9 publicly available gene expression datasets, the expression pattern of CT antigens located on the X chromosome (CT-X) was interrogated. Whereas a minority of unselected breast cancers was found to contain CT- --REUSE-- | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
947 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2711113 | [u'19563679'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Mosley', 'Keri'] | [] | BMC Cancer | 2009 | 6/29/2009 | 0 | lthough its smaller sample size led to fewer statistically significant results. Background Over the past decade, a large number of global gene expression data sets of human breast cancers have become publicly available [ 1 - 6 ]. These data sets have provided a wealth of information for the generation and testing of biological and clinical hypotheses [ 7 ]. Clinical and pathological factors with relevance| Methods Previously published microarray data sets Global gene expression and clinical data (including estrogen receptor status and metastasis recurrence latencies) were analyzed in four independent, publicly available breast cancer gene expression data sets. The Netherlands Clinical Institute (NKI2) data set contains data on 295 women with early stage breast cancer (downloaded from http://www.rii.com/pu|negative disease [ 1 ] (GEO series GSE2034). The KJX64 and KJ125 data sets contain data on 189 women, 64 of which were treated with tamoxifen, with primary operable invasive breast cancer (GEO series GSE2990) [ 6 ]. The TRANSBIG data set contains data for 183 untreated women from the Bordet Institute (GEO series GSE7390{{tag}}--REUSE--) [ 20 , 21 ]. All probes from each data set were used in the analyses except in the| two-sided, and a p-value less than 0.01 was considered statistically significant. 
Results Intrinsic bias in gene expression data sets We sought to ascertain whether there may be intrinsic bias among publicly available breast cancer gene expression data sets that would influence the likelihood of observing a significant difference in metastatic tumor recurrence latencies based on gene expression patterns | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
948 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2917039 | [u'20584310'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Baumbach', 'Gehrmann', 'Schormann', 'Maccoux', 'Sickmann', 'Freis', 'Ickstadt', 'Hengstler', 'Franckenstein', 'Wilhelm', 'Geppert', 'Rahnenführer', 'Cadenas', 'Schug', 'Schumann', 'Schmidt', 'Hermes'] | [] | Breast Cancer Res | 2010 | 2010 | 0 | x HG-U133A array and the GeneChip System as described [ 31 ]. These data have been deposited in the National Center for Biotechnology Information Gene Expression Omnibus (GEO) and are accessible [GEO:GSE11121]. Results obtained from the Mainz cohort were validated in two previously published microarray datasets. Two breast cancer Affymetrix HG-U133A microarray datasets including patient outcome informa|re downloaded from the National Center for Biotechnology Information GEO data repository. The first dataset, the Rotterdam cohort [ 32 ], represents 180 lymph node-negative relapse-free patients [GEO:GSE2034] and 106 lymph node-negative patients that developed a distant metastasis. None of these patients had received systemic neoadjuvant or adjuvant therapy (Rotterdam cohort). The original data were re| dataset, the Transbig cohort, consists of 302 samples from breast cancer patients that remained untreated in the adjuvant setting after surgery [ 33 , 34 ]. GEO sample record numbers of samples [GEO:GSE6532, GEO:GSE7390{{tag}}--REUSE--] used for analysis are listed in the supplementary tables previously published by Schmidt and colleagues [ 31 ]. Raw .cel file data were processed by MAS 5.0 using a TGT of 500. Ethica | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
949 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2500166 | [u'18714348'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Patrick', 'Fournier', 'Bissell', 'Martin'] | [] | PLoS One | 2008 | 8/20/2008 | 1 | sing GeneSpring GX 7.3 software. The Wang dataset, consisting of the microarray profiles of 286 human breast tumors with associated clinical data [12] , was obtained from GEO (Series GSE2034). The downloaded data were transformed to set measurements less than 25 to 25, chips and genes were median normalized and median polished. The Sorlie dataset, with microarray profiles of 118 human |n reducing the significance of the results relative to the two larger and more complete datasets. Data for the first ten patients of Desmedt et al [14] were obtained from GEO record GSE7390{{tag}}--REUSE--. 10.1371/journal.pone.0002994.g001 Figure 1 The 22 gene 3D signature predicts survival in the microarray datasets of Wang, et al. , and Sorlie, et al. The 22 gene signature and unsupervised hiera | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
950 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2527336 | [u'18684329'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Reyal', 'Wessels', 'van', 'Reinders', 'Horlings'] | [] | BMC Genomics | 2008 | 8/6/2008 | 1 | l measured on Human Genome HG U133A Affymetrix arrays and normalized using the same protocol. The datasets were downloaded from NCBI's Gene Expression Omnibus (GEO, ) with the following identifiers; GSE6532 [ 24 ], GSE3494 [ 18 ], GSE1456 [ 23 ], GSE7390{{tag}}--REUSE-- [ 4 ] and GSE5327 [ 22 ]. The Chin et al. [ 25 ] data set was downloaded from ArrayExpress ( , identifier E-TABM-158). To ensure comparability betw | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
951 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2563019 | [u'18803878'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Sims', 'Pepper', 'Miller', 'Clarke', 'Hey', 'Okoniewski', 'Howell', 'Smethurst'] | [] | BMC Med Genomics | 2008 | 9/21/2008 | 1 | ed in this study. Datasets No. Tumours Array express/GEO ID GeneChip ER+ Age Tumour Size (cm) FU (years) Reference Chin et al. 2006 114 E-TABM U133AA 67% 51 2.3 6.1 [ 16 ] Desmedt et al. 2007 198 GSE7390{{tag}}--REUSE-- U133A 68% 47 2.0 13.6 [ 17 ] Farmer et al 2005 49 GSE1561 U133A 58% - - - [ 11 ] Ivshina et al. 2006 249 GSE4922 U133A 85% 63 2.0 9.9 [ 18 ] Loi et al. 2007 119, 87 GSE6532 U133A, U133 plus2.|t al. 2007 58 GSE5327 U133A 0% - - 7.2 [ 33 ] Pawitan et al. 2005 159 GSE1456 U133A 83% 58 $ 2.2 $ 7.1 [ 19 ] Richardson et al. 40 GSE3744 U133 plus2.0 38% - - - [ 10 ] Sotiriou et al. 2006 101* GSE2990 U133A 71% 60 2.0 5.8 [ 20 ] Wang et al. 2005 286 GSE2034 U133A 73% 52 - 7.2 [ 52 ] Continuous variables (age, size and follow up) are given as median values, except where indicated $ the mean | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
952 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2602602 | [u'19104654'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Remy', 'Didier', 'Granjeaud', 'Imbert', 'Bergon', 'Nguyen', 'Puthier', 'Lopez', 'Textoris'] | [] | PLoS One | 2008 | 2008 | 1 | iated DKNN value (with k being set typically to 100 for microarrays containing 10 to 50k elements). Distributions of DKNN values observed with both an artificial and a real dataset (Complex9RN200 and GSE1456 respectively, see thereafter for a description) are shown in Figure S3A and S3B (solid curve). The asymmetrical shape of the distribution observed in Figure S3B suggests the presence of a parti|s ability to extract relevant informations from a noisy environment. However, a range of optimal values for inflation parameter needs to be defined to get the best results. Performances of DBF-MCL on GSE1456 dataset Next, DBF-MCL was tested with microarray data to explore its effectiveness in finding clusters of co-regulated genes. To this end, we used the microarray data from Pawitan et al. |sed for analysis. Figure S5B , shows the number of informative genes obtained with various k values. Again, two phases were observed suggesting that regions with heterogeneous densities exist in the GSE1456 dataset. As expected, the transition from dense to sparse regions was less marked than in the artificial dataset. A k value of 100 was chosen to allow the extraction of a large part of data that ca| microarray dataset. Interestingly, in all cases meaningful partitioning results were obtained using inflation parameter set to 2. 10.1371/journal.pone.0004001.g001 Figure 1 Results obtained with the GSE1456 dataset. DBF-MCL was run with GSE1456 as input (k = 100, FDR = 10%, S 1..3 , Inflation = 2). (A) Hierarchical cluster|ected genes. 
(D) The graph after MCL partitioning. Each point is colored according to its associated class. (E) Correspondence between hierarchical clustering and DBF-MCL results. (F) TS obtained for GSE1456 (G) Functional enrichment associated with these TS. Systematic extraction of TS We next applied DBF-MCL algorithm to all experiments performed on human, mouse and rat Affymetrix microarrays and ava|he results panel (2), the information panel (3), the plugins panel (4) and the plugin display panel (5). This example shows the expression profiles of genes contained in the TS CBE3881EB derived from GSE469 (“Temporal profiling in muscle regeneration”). Annotation panel shows that this TS is highly enriched in genes related to ATP synthesis. Meta-analysis of public microarray data using|e 1 Transcriptional signatures containing Affymetrix probes for ESR1, GATA3 and FOXA1. TS ID 1 Genes 2 Probes 2 Samples 2 Sample type GSE ID GPL ID Author PubMed IDs 0F2635383 1190 1572 23 Cell lines GSE6569 GPL96 Huang F et al 2007 17332353 3DE64836D 102 143 62 Tissue GSE7904 GPL570 unpublished 2007 - 59A18E225 690 893 121 Both GSE2603 GPL96 Minn AJ et al 2005 16049480 6C975B20B 88 96 26 Tissue GSE677|lein A et al 2007 17410534 6C975B290 88 96 26 Tissue GSE6596 GPL96 Klein A et al 2007 17410534 7150E17F6 868 1032 34 Cell lines GSE4668 GPL96 Coser KR et al 2003 14610279 8059848B4 200 250 251 Tissue GSE3494 GPL96 Miller LD et al 2005 16141321 84E5E1077 694 883 198 Tissue GSE7390{{tag}}--REUSE-- GPL96 Desmedt C et al 2007 17545524 8F69864F9 68 82 95 Tissue GSE5847 GPL96 Boersma BJ et al 2007 17999412 A151D5695 297 361|et al 2007 17420468 B79B1C0B9 270 380 47 Tissue GSE3744 GPL570 Richardson AL et al 2006 16473279 BDB6D8700 550 679 104 Tissue GSE3726 GPL96 Chowdary D et al 2006 16436632 D8F0B528C 125 152 159 Tissue GSE1456 GPL96 Pawitan Y et al 2005 16280042 E2E620F40 448 616 129 Tissue GSE5460 GPL570 unpublished 2007 - EA9669A21 219 251 158 Tissue GSE3143 GPL91 Bild AH et al 2006 16273092 F310ACC36 
519 646 49 Tissue|[15] and in genes related to the PIR keyword “multigene family”. Furthermore, several signatures, of Table 2 are related to melanoma and six of them were observed in the GSE7127 experiment [16] . Although data from Table 2 would deserve further analysis they are most likely related to gain or loss of genetic material in tumors. Indeed, gain of 8q is freq|<1.10−20 ) for any of the human cytoband tested. TS ID 1 Enrich. 2 Cytoband q.value Sample type GSE ID GPL ID Authors PubMed ID 3DA3C8345 24% 17q12-q21 1.7.10 −39 Skin GSE5667 GPL97 Plager DA et al 2007 17181634 43CC3EF57 9% 8q24.3 7.0.10 −32 Melanoma GSE7153 GPL570 Unpublished 2007 - 60E29DA83 16% 8q24.3 6.8.10 −24 Melanoma GSE7127 GPL570| GPL570 Johansson P et al 2007 17516929 60E6B4129 35% 20p13 1.6.10 −26 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60E96FF1E 28% 6p21.3 1.2.10 −28 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60EC95F6A 17% 7q22.1 6.3.10 −31 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60EEBD669 32% 11q23.3 1.4.10 −26 Melan|127 GPL570 Johansson P et al 2007 17516929 B4C95CF18 42% 8q24.3 1.1.10 −36 Ovary GSE6008 GPL96 Hendrix ND et al 2006 16452189 A93ED6519 16% 11q23.3 6.9.10 −23 Melanoma GSE7152 GPL570 Packer LM et al 2007 17450523 A93DB01ED 11% 7q22.1 9.5.10 −30 Melanoma GSE7152 GPL570 Packer LM et al 2007 17450523 1 Transcriptional signature ID. 2 Enrichment: Proportion o|ial platforms ( e.g. National Cancer Institute, Vanderbilt Microarray Shared Resource, Genome Institute of Singapore), several of them being related to the MicroArray Quality Control (MAQC) project (GSE5350) [26] . However, to date, systematic analysis of all experiments performed on these platforms has not been done. The flexibility of our approach also makes it possible to integrate|ile. Figure S3 Distributions of DKNN values.
Observed DKNN values (solid line) and of a set of simulated DKNN values S (dotted line) are shown for (A) the Complex9RN200 artificial dataset and (B) the GSE1456 microarray dataset. (9.01 MB TIF) Click here for additional data file. Figure S4 Colors correspond to the clusters found using the corresponding algorithm (A) The whole dataset (9,112 points). (B) |s input using a range of k values (FDR = 10%, S1..3, Inflation = 1.2). (B) DBF-MCL was run with several microarray datasets as input (including GSE1456) using a range of k values (FDR = 10%, S1..3, Inflation = 2). (8.72 MB TIF) Click here for additional data file. Figure S6 The TBMap plugin. |for technical assistance. References 1 Stoeckert CJ Causton HC Ball CA 2002 Microarray databases: standards and ontologies. Nat Genet 32 Suppl 469 73 12454640 2 Barrett T Edgar R 2006 Gene expression omnibus: microarray data storage, submission, retrieval, and analysis. Methods Enzymol 411 352 69 16939800 3 Diehn M Sherlock G Binkley G Jin H Matese JC 2003 SOURCE: a unified genomic resource of functional | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
953 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2689196 | [u'19445687'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Kim'] | [] | BMC Bioinformatics | 2009 | 5/16/2009 | 1 | -centered and pooled into a single data set of 1,418 samples. Each color above the heatmap represents each data set. Table 1 Data sets analyzed in this study Data set Total ER+ ER- Survival Reference GSE1456 159 99 40 RFS [ 13 ] GSE2603 82 57 42 DMFS [ 14 ] GSE3494 236 213 34 DMFS [ 15 ] GSE6532 306 262 45 DMFS [ 16 , 17 ] GSE7378 54 54 0 DMFS [ 18 ] GSE7390{{tag}}--REUSE-- 198 134 64 DMFS [ 19 ] GSE11121 129 200 0 RF | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
954 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2689870 | [u'19393097'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Nakai', 'Mizuno', 'Kitada', 'Sarai'] | [] | BMC Med Genomics | 2009 | 4/24/2009 | 1 | as possible. All tables were relationally linked and stored in the MySQL server. Table 1 Dataset content from PrognoScan Dataset Cancer type Subtype Cohort Author/Contributor Array type n Data source GSE13507 Bladder cancer Transitional cell carcinoma Cheongju Kim Human-6 v2 n = 165 GEO GSE5287 Bladder cancer Aarhus (1995–2004) Als et al . [ 10 ] HG-U133A n = 30 GEO GSE12417-GPL570 Blood cance|-GPL96 Blood cancer AML AMLCG (1999–2003) Metzeler et al . [ 11 ] HG-U133A n = 163 GEO GSE12417-GPL97 Blood cancer AML AMLCG (1999–2003) Metzeler et al . [ 11 ] HG-U133B n = 163 GEO GSE8970 Blood cancer AML San Diego Raponi et al . [ 12 ] HG-U133A n = 34 GEO GSE4475 Blood cancer B-cell lymphoma Berlin (2003–2005) Hummel et al . [ 13 ] HG-U133A n = 158 GEO E-TABM-346 Blood ca|E2658 Blood cancer Multiple myeloma Arkansas Zhan et al . [ 15 ] HG-U133_Plus_2 n = 559 GEO E-TABM-158 Breast cancer UCSF, CPMC (1989–1997) Chin et al . [ 16 ] HG-U133A n = 129 ArrayExpress GSE11121 Breast cancer Mainz (1988–1998) Schmidt et al . [ 17 ] HG-U133A n = 200 GEO GSE1378 Breast cancer MGH (1987–2000) Ma et al . [ 18 ] Arcturus 22 k n = 60 GEO GSE1379 Breast cancer|6-GPL96 Breast cancer Stockholm (1994–1996) Pawitan et al . [ 19 ] HG-U133A n = 159 GEO GSE1456-GPL97 Breast cancer Stockholm (1994–1996) Pawitan et al . [ 19 ] HG-U133B n = 159 GEO GSE2034 Breast cancer Rotterdam (1980–1995) Wang et al . [ 20 ] HG-U133A n = 286 GEO GSE2990 Breast cancer Uppsala, Oxford Sotiriou et al . [ 21 ] HG-U133A n = 187 GEO GSE3143 Breast cancer Duke |GSE3494-GPL96 Breast cancer Uppsala (1987–1989) Miller et al . 
[ 23 ] HG-U133A n = 236 GEO GSE3494-GPL97 Breast cancer Uppsala (1987–1989) Miller et al . [ 23 ] HG-U133B n = 236 GEO GSE4922-GPL96 Breast cancer Uppsala (1987–1989) Ivshina et al . [ 24 ] HG-U133A n = 249 GEO GSE4922-GPL97 Breast cancer Uppsala (1987–1989) Ivshina et al . [ 24 ] HG-U133B n = 249 GEO GSE|SE7378 Breast cancer UCSF Zhou et al . [ 26 ] U133AAofAv2 n = 54 GEO GSE7390{{tag}}--REUSE-- Breast cancer Uppsala, Oxford, Stockholm, IGR, GUYT, CRH (1980–1998) Desmedt et al . [ 27 ] HG-U133A n = 198 GEO GSE7849 Breast cancer Duke (1990–2001) Anders et al . [ 28 ] HG-U95A n = 76 GEO GSE9195 Breast cancer GUYT2 Loi et al . [ 25 ] HG-U133_Plus_2 n = 77 GEO GSE9893 Breast cancer Montpellier, Bordeau|man 21 K V12.0 n = 155 GEO GSE11595 Esophagus cancer Adenocarcinoma Sutton Giddings CRUKDMF_22 K_v1.0.0 n = 34 GEO GSE7696 Glioma Glioblastoma Lausanne Murat et al . [ 30 ] HG-U133_Plus_2 n = 70 GEO GSE4271-GPL96 Glioma MDA Phillips et al . [ 31 ] HG-U133A n = 77 GEO GSE4271-GPL97 Glioma MDA Phillips et al . [ 31 ] HG-U133B n = 77 GEO GSE2837 Head and neck cancer Squamous cell carcinoma VUMC, VAMC, |Adenocarcinoma Harvard Beer et al . [ 33 ] HG-U95A n = 84 Author's web site MICHIGAN-LC Lung cancer Adenocarcinoma Michigan (1994–2000) Beer et al . [ 33 ] HuGeneFL n = 86 Author's web site GSE11117 Lung cancer NSCLC Basel Baty Novachip human 34.5 k n = 41 GEO GSE3141 Lung cancer NSCLC Duke Bild et al . [ 22 ] HG-U133_Plus_2 n = 111 GEO GSE4716-GPL3694 Lung cancer NSCLC Nagoya (1995–|da et al . [ 34 ] GF200 n = 50 GEO GSE4716-GPL3696 Lung cancer NSCLC Nagoya (1995–1996) Tomida et al . [ 34 ] GF201 n = 50 GEO GSE8894 Lung cancer NSCLC Seoul Son HG-U133_Plus_2 n = 138 GEO GSE4573 Lung cancer Squamous cell carcinoma Michigan (1991–2002) Raponi et al . [ 35 ] HG-U133A n = 129 GEO DUKE-OC Ovarian cancer Duke Bild et al . [ 22 ] HG-U133A n = 134 Author's web site GSE8|horts. Datasets come from a number of different institutions around the world, and patient backgrounds differ. 
In addition, several datasets are based on specific subpopulations, for example, dataset GSE2034 is from lymph node-negative breast cancers, and GSE5287 is from cisplatin-containing chemotherapy-treated bladder cancers. Hence, it is possible that the specific association between gene expressio|subsequent care may affect the clinical course of a patient. 3) Experimental factors. Expression measurement of microarray is subject to various factors at the experiment level. Microdissection (e.g. GSE1378) would reduce contamination of mRNAs from non-cancer cells [ 57 ]. Formalin fixation of a sample (e.g. GSE2873) influences the quality of mRNAs [ 58 ]. Array type (e.g. Affymetrix, cDNA microarrays | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
955 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2718903 | [u'19619298'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Buechler'] | [] | BMC Cancer | 2009 | 7/20/2009 | 1 | y of questionable benefit. Methods Patient cohorts and data analysis The microarray datasets used here were obtained from the Gene Expression Omnibus http://www.ncbi.nlm.nih.gov/geo/ , specifically, GSE4922 (UPPS), GSE6532 (OXFD, GUYT), GSE7390{{tag}}--REUSE-- (TRANSBIG), GSE9195 (GUYT2), and GSE11121 (MZ). The codes used for the cohorts in this paper are given in parentheses. Two independent cohorts were obtained fr|he combination of OXFT and OFXU from GSE6532. (GSE6532 contains additional cohorts, coded KIT and KIU, however these cohorts were excluded since many of the patients in these groups are also found in GSE4922.) None of the patients received adjuvant chemotherapy. A summary of the clinical traits of the patients is found in Table 1 . Complete descriptions can be found in the references at the Gene Expre|us was assessed by different methods across the cohorts. A "+" after the code for a cohort denotes the set of ER+ samples. In all cohorts, the survival endpoint used was distant metastasis, except in GSE4922, in which it was local recurrence or metastasis. Data on metastasis for most of the samples in this cohort are found in GSE6532. All survival data was censored to 10 years so as not to distort the |ix GeneChip platform hgu133a or hgu133plus2 . Table 1 Summary of the patient cohorts used in this study Uppsala Transbig Guys 1 Oxford Guys 2 Mainz Code UPPS TRANSBIG GUYT OXFD GUYT2 MZ GEO Series GSE4922 GSE7390{{tag}}--REUSE-- GSE6532 GSE6532 GSE9195 GSE11121 array hgu133a hgu133a hgu133plus2 hgu133a hgu133plus2 hgu133a # samples 249 198 87 178 77 200 # ER+ 200 138 85 144 77 169 LN+/-/? 
on ER+ 62/132/6 0/138/0 56|onsidering some probes as binary variables. The AP4 test for metastasis in ER+ breast cancer Derivation of the AP4 model The AP4 model is derived with the ER+ samples in two cohorts as training sets, GSE4922 (denoted UPPS+) and GSE7390{{tag}}--REUSE-- (TRANSBIG+). An initial set of 100 significant probes is identified as follows: Working in UPPS+, 100 training sets are selected, each containing 2 / 3 of the samples th | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
956 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2734555 | [u'19640299'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Freudenberg', 'Joshi', 'Medvedovic', 'Hu'] | [] | BMC Bioinformatics | 2009 | 7/29/2009 | 1 | n data with and without prior variance-rescaling (see methods). Clustering algorithms were used to cluster data from four independent breast cancer gene expression datasets with GEO accession numbers GSE1456 [ 29 ], GSE3494 [ 28 ], GSE7390{{tag}}--REUSE-- [ 31 ], and GSE11121 [ 30 ]. For each clustering algorithm the total number of genes (y-axis) with the CLEAN score higher than the given threshold was plotted agains|stance based and Pearson's correlation based hierarchical clustering with and without prior variance-rescaling of the data, across four independent human breast cancer datasets (GEO expression series GSE1456 [ 29 ], GSE3494 [ 28 ], GSE7390{{tag}}--REUSE-- [ 30 ], and GSE11121 [ 31 ]). For all six algorithms, the hierarchical clustering was constructed using the average linkage principle. The number of genes common in |gher areas under the curve imply the higher functional coherence. Reproducibility and the comparison with cluster-wide scores We used the same four independent breast cancer gene expression datasets (GSE1456 [ 29 ], GSE3494 [ 28 ], GSE7390{{tag}}--REUSE-- [ 31 ], and GSE11121 [ 30 ]) and a study comparing tissue-specific gene expression patterns in mouse and human [ 32 ] to investigate reproducibility of the CLEAN sco|lts by comparing the correlation between the CLEAN scores (Figure 3A ) to correlation of pair-wise distances used to construct the hierarchical clustering of genes (Figure 3B ) in the two datasets (GSE3494 and GSE7390{{tag}}). 
In this analysis pairwise distances are based on the Bayesian posterior pairwise probabilities (PPPs) produced by the CSIMM algorithm [ 22 ]. Significantly increased correlation for t|n general. Figure 3 Integrating cluster analysis and functional knowledge . Genes were clustered using the CSIMM [ 22 ] algorithm and variance-scaled data from two independent breast cancer datasets (GSE3494 [ 28 ] and GSE7390{{tag}}--REUSE-- [ 31 ]), and CLEAN scores were computed for both clusterings. The number of genes common in both datasets after filtering was 8,567. A) The gene-specific CLEAN scores for the two|lgorithms was assessed by calculating all pairwise Pearson's correlation coefficients between scores for all algorithms applied to four independent human breast cancer datasets (GEO expression series GSE1456 [ 29 ], GSE3494 [ 28 ], GSE7390{{tag}}--REUSE-- [ 31 ], and GSE11121 [ 30 ]). Rows and columns in this symmetric heatmap represent specific scores for a specific clustering in a specific dataset in the heatmap. Th|and the cwCLEAN scores produced by the CLEAN R package. Figure 7 shows a screenshot of the new viewer we named Functional TreeView (FTreeview) displaying CLEAN results for the breast cancer dataset GSE3494 [ 28 ]. Panel 1 displays the per-gene functional coherence scores for individual category types. The broader the red bars are the higher is the score. Green indicates statistically non-significant |l coherence scores and the interactive Java-based viewer Functional TreeView (FTreeView). The figure shows a screenshot of the fTreeView session displaying CLEAN results for one breast cancer dataset GSE3494 [ 28 ]. fTreeView was developed from the original Java TreeView [ 38 ] by adding panel 3, which displays functional cluster annotations generated by the CLEAN R package. This functionality enables | cluster-based scores. 
Methods Data Preprocessing, Gene Selection and Clustering Raw data files (Affymetrix HG-U133A CEL files) of four independent human breast cancer datasets (GEO expression series GSE1456 [ 29 ], GSE3494 [ 28 ], GSE7390{{tag}}--REUSE-- [ 31 ], and GSE11121 [ 30 ]) were downloaded from the public repository GEO [ 33 ]. Each dataset was RMA-preprocessed [ 47 ] separately using the Entrez Gene-based c|an ('Brainarray') [ 48 ]. Preprocessed data files of a large-scale tissue expression data set [ 32 ] were also downloaded from the same repository. The tissues included both human (GEO dataset record GDS596) and mouse (GDS592). For genes with multiple probes per Entrez gene ID, in each case, the probeset with the highest median expression value per probeset was selected as the representative probeset f| We applied a mild variation filter using Cancer Outlier Profiler Analysis (COPA, 95 th percentile) [ 34 ] to select the top 10,000 genes to be clustered in each of the human breast cancer datasets (GSE1456, GSE3494, GSE7390{{tag}}, GSE11121). In each dataset expression values were centered by setting the median value of each gene to zero (subtracting the gene-specific medians) and clustering analyses were p | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
957 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2693232 | [u'18841463'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Alexe', 'Tan', 'Reiss'] | [] | Breast Cancer Res Treat | 2009 | 2009 Jun | 0 | ood that TGFβ signaling is active in that tumor. Each bar represents an individual tumor specimen. Three different breast cancer expression profile data sets were analyzed. These included the GSE_2034 data of 286 specimens described by Wang et al. [ 146 ], the GSE_4992 data of 249 specimens described by Ivshina et al. [ 181 ] and the GSE_7390{{tag}}--REUSE-- data on 165 specimens reported by Desmedt et al. [ 1|reast cancer subsets. As shown in Fig. 3B , the TBRS MSKCC was strongly positively associated with tumors in the HER2(NI), BA2 and LA1 subsets. These results were validated across three independent publicly available breast cancer expression data sets from different centers [ 147 , 180 , 181 ]. Moreover, the TBRS CINJ gave identical results to that developed by Padua et al. [ 147 ]. The principal dif | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
958 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2822536 | [u'20158879'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Kirillov', 'Nikolsky', 'Bessarabova', 'Nikolskaya', 'Bugrim', 'Shi'] | [] | BMC Genomics | 2010 | 2/10/2010 | 0 | ion study, we confirmed the phenomenon of bimodality and the ability of bimodal genes to form co-expressed clusters using four datasets carried out on standard Affymetrix and Agilent array platforms: GSE1456 [ 29 ], GSE7390{{tag}}--REUSE-- [ 30 ], GSE4922 [ 31 ], and an Agilent data set (Table 1 ). The Agilent dataset was formed as a non-redundant set of 193 samples from four studies: GSE1992 [ 32 ], GSE2740 [ 33 ], |ression patterns. In all five datasets, bimodality was defined by τ = 2.64 and standard deviation over 25th percentile of the distribution. a Recognized genes for each platform. Sorlie295 GSE1456 GSE7390{{tag}}--REUSE-- GSE4922 Agilent set Platform cDNA Affymetrix Affymetrix Affymetrix Agilent Bimodal genes 2476 (10604 a ) 5075 (12017 a ) 5440 (12017 a ) 4874 (12017 a ) 4983 (13379 a ) First, we compared t|nes out of the array of 10604 genes [ 28 ]. Using these parameters, we calculated sets of bimodal genes using the validation datasets of 5075, 5440, 4872, and 4983 genes from the independent datasets GSE1456, GSE7390{{tag}}--REUSE--, GSE4922, and the Agilent data set respectively (Table 1 ). Figure 2 Bimodal genes . (A) Distribution of GRB7 expression among 295 patients (Sorlie295 dataset). The green line marks the t|≈-1 and uGRB7 = 1. Binary intersections of the pairs of bimodal genes from different datasets are large and statistically significant (Table 2 ).
The largest intersection was for the datasets GSE7390{{tag}}--REUSE-- and GSE1456 at 3587 common bimodal genes - 66% of all bimodal genes for GSE7390{{tag}}--REUSE-- and 70% of all bimodal genes for GSE1456. The datasets Sorlie295 and GSE4922 had the smallest intersection of 1121 co|ed to estimate p-values. SetA SetB All genes intersection Bimodal genes intersection Bimodal genes for set A a Bimodal genes for set B a p-value Agilent Sorlie295 9433 1237 3661 2219 8.81E-77 Agilent GSE1456 10301 1830 3961 4307 5.86E-13 Agilent GSE4922 10301 1799 3961 4099 2.14E-20 Agilent GSE7390{{tag}}--REUSE-- 10301 1839 3961 4551 0.000154 Sorlie295 GSE1456 9367 1173 2223 3851 3.49E-37 Sorlie295 GSE4922 9367 1121 |al genes with synchronised expression in accordance with physiological conditions. Figure 4 Signal normalization for bimodal genes . (A) Expression profiles for genes FOXA1 and GATA3 in Sorlie295 and GSE1456 data sets before normalization. (B) Expression profiles for genes FOXA1 and GATA3 in Sorlie295 and GSE1456 data sets before normalization and after normalization. Signal normalization also helped t|ized expression of the same two genes, FOXA1 and GATA3, was compared between experiments run on two array platforms: cDNA array, Sorlie 295 [ 28 ] and Affymetrix (Affymetrix Human Genome U133A Array) GSE1456 [ 29 ]. The original expression profiles of the two genes had different intensity intervals (Figure 4A ), while the normalized expression values ranged between -1 and 1. (Figure 4B ). We generate | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
959 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2887810 | [u'20504364'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Eklund', 'Spjuth', 'Wikberg'] | [] | BMC Bioinformatics | 2010 | 5/26/2010 | 0 | od of making correct clinical prognostic and treatment decisions. Methods Transcriptomics data: Predicting distant metastasis development in breast cancer patients The datasets with accession numbers GSE2034, GSE7390{{tag}}--REUSE--, GSE4922, GSE2990, and GSE1456 were retrieved with the GEO Web service [ 19 ]. The datasets were originally published in Wang et al . [ 13 ], Desmedt et al . [ 20 ], Miller et al . [ 14 | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
960 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2988703 | [u'21047417'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Watkinson', 'Anastassiou', 'Varadan', 'Kim'] | [] | BMC Med Genomics | 2010 | 11/3/2010 | 0 | ata sets in the paper is given in Table 1 . They were identified by searching for rich data sets focused on a specific cancer in two public databases, The Cancer Genome Atlas and the Gene Expression Omnibus data depository. Furthermore, for the data sets initially used to infer the gene signature we required that they have well annotated staging information associated with the samples and that they cont|ists of Data sets in the paper Data set name Source Site GEO Accession Affymetrix platform Sample size TCGA Ovarian Cancer The Cancer Genome Atlas HT_HG-U133A a 377 CCR Ovarian Cancer Gene Expression Omnibus GSE9891 HG-U133_Plus_2 b 285 CCR Colon Cancer Gene Expression Omnibus GSE14333 HG-U133_Plus_2 290 Moffitt Colon Cancer Gene Expression Omnibus GSE17536 HG-U133_Plus_2 177 Singapore Gastric Cancer Gen|mnibus GSE15459 HG-U133_Plus_2 200 CCR Breast Cancer Gene Expression Omnibus GSE7390{{tag}}--REUSE-- HG-U133A c 198 Wang Breast Cancer Gene Expression Omnibus GSE2034 HG-U133A 286 Samsung Lung Cancer Gene Expression Omnibus GSE8894 HG-U133_Plus_2 138 Bild Lung Cancer Gene Expression Omnibus GSE3141 HG-U133_Plus_2 111 Neuroblastoma tumor Gene Expression Omnibus GSE3960 HG_U95Av2 102 Neoadjuvant Breast Cancer Gene Express|aging phenotype (specific to each cancer type) suggests that it could be used as a "proxy" of the MAF signature. 
This would allow us to improve on the gene list of Table 2 by making use of numerous publicly available gene expression data sets of cancers of many types, even without any staging information, as long as the MAF signature is present in a sizeable subset of them, aiming at finding the "inters|s-inhibiting therapeutic intervention targeting the MAF mechanism would be widely applicable to low-stage tumors. Conclusions In conclusion, we have shown that, using purely computational analysis of publicly available biological information, systems biology has revealed the core of a multi-cancer metastasis-associated gene expression signature. In the near future, a vast amount of additional information | mutual information with COL11A1 . Click here for file Additional file 5 Heat map of neuroblastoma data set . This file contains the result of hierarchical clustering for the neuroblastoma data set (GSE3960) using the MAF signature genes. Click here for file Additional file 6 Heat map of breast cancer data set using MAF signature genes . This file contains the result of hierarchical clustering for the|ure genes. Click here for file Additional file 7 Heat map of breast cancer data set using DCN metagene set . This file contains the result of hierarchical clustering for the breast cancer data set (GSE4779) using the DCN metagene set. Click here for file Acknowledgements Appreciation is expressed to Prof. Jessica Kandel, MD for helpful discussions. This work was supported by university inventor's ( | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
961 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2990751 | [u'21124904'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Hoshida'] | [] | PLoS One | 2010 | 11/23/2010 | 0 | ancer Training set 97 Hu25K** (b) [11] Test set 49 HuGeneFL* (a) [12] 3. Liver cirrhosis in human and rat Training set 23 HG-U133plus2* GSE6764 [13] Test set 12 Rat Genome 230* GSE13747 - 4. Multiple tissue types (breast, lung, prostate, colon) Training set 51 HG-U95A* (a) [14] , [15|[14] , [15] 5. Molecular subclasses of breast cancer Training set 295 Stanford cDNA (c) [16] Test set 1 (“TransBig”) 198 HG-U133A* GSE7390{{tag}}--REUSE-- [18] Test set 2 (“Wang”) 286 HG-U133A* GSE2034 [19] Test set 3 (“Weigelt”) 53 Human WG6*** E-TABM-543|ce in cross-species prediction. We first defined a human liver cirrhosis signature including 801 up-regulated and 445 down-regulated genes in comparison between 13 cirrhotic and 10 normal livers from publicly available dataset [13] ( Table 1 ). We then tested whether the signature was presented in another publicly available dataset of gene-expression profiles of rat liver cirrhosis induc| of existing genomic signatures for their potential value as reliable medical diagnostics. The NTP methodology is implemented as Nearest Template Prediction module of GenePattern analysis toolkit and publicly available from www.broadinstitute.org/genepattern . Materials and Methods Data preprocessing We utilized data sets already normalized in the respective studies. Multiple probes corresponding a singl|its sample-wise mean and sample standard deviation in each dataset to adjust range of gene expression level between training and test datasets. All datasets and class labels used for the analysis are publicly available at http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi . The author thanks Joshua Gould, Heidi Kuehn, Barbara Hill, and Michael Reich for technical help and Stefano Monti, DR Mani, a | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
962 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2996978 | [u'20731868'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Mefford'] | [] | BMC Genomics | 2010 | 8/23/2010 | 0 | in this paper and gene clusters and signatures induced on the same data by other methods and programs. Table 2 Data sets. dataset source n survival endpoint median/average follow-up reference Uppsala GSE3494 251 BCSS 10.2 years Miller 2005 Stockholm GSE1456 159 DMFS 6.1 years Pawitan 2005 TRANSBIG GSE7390{{tag}}--REUSE-- 198 DMFS 13.6 years Desmedt 2007 NKI Chang 295 DMFS 6.7 years VandeVijver 2002 Data sets were down|n of partition "size" and "tolerance" parameters, but, as the algorithm increases the partition size, and relaxes the stringency of what qualifies as a match, these sets tend to quickly merge into an omnibus stromal set. Despite this, the importance of distinguishing the smaller sets becomes apparent in the survival analysis: five of the six stromal gene sets are associated with increased survival, sever|ignature for node-negative breast cancer patients in the TRANSBIG multicenter independent validation series Clin Cancer Res 2007 13 11 3207 3214 10.1158/1078-0432.CCR-06-2765 17545524 Gene expression omnibus (GEO) http://www.ncbi.nlm.nih.gov/geo/ Chang HY Nuyten DS Sneddon JB Hastie T Tibshirani R Sorlie T Dai H He YD van't Veer LJ Bartelink H Robustness, scalability, and integration of a wound-response | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
963 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2865519 | [u'20463876'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Stephan-Otto', 'Riester', 'Downey', 'Singer', 'Michor'] | [] | PLoS Comput Biol | 2010 | 5/6/2010 | 0 | the same study as well as expression data of 3 human embryonic stem cell lines (hESCs) and 3 hESC derived mesenchymal precursor lines (downloaded from NCBI Geo [47] accession number GSE7332 [48] ). We use gene expression data of AML [47] patient samples available within GEO (accession numbers GSE1159, GSE9476 [49] , GSE1729 [|GSE12417 [51] ). The breast cancer dataset is also compiled from Microarray data published in GEO with dataset numbers GSE7390{{tag}}--REUSE-- [16] , GSE2990 [15] , GSE3494 [17] , and GSE9574 [23] . A problem of micrarray meta-analyses is that the different dataset sources may introduce a bias. We therefore applied hierachical cluster | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
964 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2946304 | [u'20825665'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Sims', 'Dexter', 'Mackay', 'Mitsopoulos', 'Grigoriadis', 'Ahmad', 'Zvelebil'] | [] | BMC Syst Biol | 2010 | 9/8/2010 | 0 | cally relevant cancer phenotypes. Methods Microarray data normalisation Microarray gene expression data for five of the breast cancer datasets used in this study, were obtained from the GEO database (GSE6532, GSE1456, GSE3494, GSE7390{{tag}}--REUSE--, GSE2034) [ 27 ]. The paired gene expression and array comparative genomic hybridization data for 43 ER+ tumours [ 20 ] was downloaded from the database referenced therei|h the ridge parameter set by leave-one-out cross-validation in the training set (values ranged from 25 to 120). Competitive selection was carried out on the merged dataset of 793 ER+ samples from the GSE6532, GSE1456, GSE3494, GSE7390{{tag}}--REUSE-- and GSE2034 datasets. One hundred random sample sets, each with 396 tumours, were drawn from the pool. The ridge regression model was then built up selecting the RMG at e|umor suppressor blocks cell cycle progression and inhibits cyclin D1 accumulation Mol Cell Biol 2002 22 4309 4318 10.1128/MCB.22.12.4309-4318.2002 12024041 Edgar R Domrachev M Lash AE Gene Expression Omnibus: NCBI gene expression and hybridization array data repository Nucleic Acids Res 2002 30 207 210 10.1093/nar/30.1.207 11752295 Bolstad BM Irizarry RA Astrand M Speed TP A comparison of normalization m | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
965 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2879562 | [u'20211017'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Gray', 'Gu', 'Bayani', 'Hu', 'Lenburg', 'Blakely', 'Huang', 'Sadanandam', 'Pai', 'Mao'] | [] | Breast Cancer Res | 2010 | 2010 | 0 | tients with breast tumors was examined in existing microarray data sets of primary tumor samples that had been profiled with an Affymetrix microarray assay (either HG-U133A or HG U133 Plus 2.0) ((GEO:GSE1456), (GEO:GSE7390{{tag}}--REUSE--), (GEO:GSE2034), (GEO:GSE4922)) or Agilent oligo microarray (Santa Clara, CA, USA)(Table 3 ). Probe 218726_at and 20366 (GenBank: NM_018410 ) were used to measure HJURP expressio|. The process data from GEO website were downloaded for analysis. Table 3 Information of gene expression datasets used in this study Dataset GEO access number or web location Radiotherapy Reference 1 GSE1456 Not available [ 21 ] 2 GSE7390{{tag}}--REUSE-- Not available [ 22 ] 3 NKI [ 26 ] 82.4% patients [ 23 ] 4 GSE2034 86.7% patients [ 24 ] 5 GSE4922 Not available [ 25 ] HJURP shRNA construct The shRNA sequences were|is assessed by Affymetrix microarray. HJURP expression is measured as log 2 (probe intensities). The microarray data were found in Gene Expression Omnibus (GEO) database GEO accession numbers [GEO:GSE10780] [ 16 ]. HJURP mRNA level is an independent prognostic biomarker for poor clinical outcome We assessed the association between HJURP mRNA levels and clinical factors and outcomes using a cohort|urvival was validated in three independent cohorts of patients with breast cancer. Parts (a), (b) and (c) show the Kaplan-Meier survival curves for disease-free and overall survival in Dataset 1 (GSE1456), Dataset 2 (GSE7390{{tag}}--REUSE--) and Dataset 3 (NKI) respectively. 
The P -values shown were obtained from a long-rank test. Figure 5 Validation of the association between HJURP mRNA and disease-free surviv|ease-free survival was further validated in two independent cohorts of patients with breast cancer. Parts (a) and (b) show the Kaplan-Meier survival curves for disease-free survival in Dataset 4 (GSE2034) and Dataset 5 (GSE4922). The P -values shown were obtained from a long-rank test. Finally, we investigated whether HJURP mRNA levels were an independent prognostic factor over molecular subtype | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
966 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2892466 | [u'20500820'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Hellwig', 'Gehrmann', 'Hengstler', 'Rahnenf\xc3\xbchrer', 'Schormann', 'Schmidt'] | [] | BMC Bioinformatics | 2010 | 5/25/2010 | 0 | ormalization of the raw data was done using RMA [ 24 ] from the Bioconductor [ 25 ] package affy [ 26 ]. The raw .cel files are deposited at the NCBI GEO data repository [ 27 ] with accession number GSE11121. Results We analyze and compare distributions of bimodality measures. All methods presented in the methodology section are applied to the Mainz cohort study. For all bimodality measures we present|ur analysis on an other free available data set. The raw .cel files and clinical parameters of the Rotterdam cohort [ 28 , 29 ] were downloaded from the NCBI GEO data repository with accession number GSE2034 (n = 286). Here, the outlier-sum statistic and the likelihood ratio also have the smallest p-values of the logrank test (see Additional file 2 : Supplemental Figure 2). For the bimodality index th|gnificant enrichment ( p < 10 -7 ). To determine whether our results also hold for known prognostic subgroups we used a pooled cohort of 766 patients from different free available data sets: GSE11121 (n = 200), GSE2034 (n = 286), GSE7390{{tag}}--REUSE-- (n = 177) [ 30 ] and GSE6532 (n = 103) [ 31 , 32 ]. We look at three subgroups defined by the expression of the two genes ESR1 and erbB2, namely ESR1+/erbB2- | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
967 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 3012716 | [u'21209904'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Quackenbush', 'Holton', 'Rubio', 'Cheng', 'Pak', 'Iglehart', 'Prendergast', 'Mar', 'Culhane', 'Aryee', 'Bentink', 'Glinskii', 'Cai', 'Hahn', 'Chittenden', 'Howe', 'Holmes', 'Taylor', 'Sultana', 'Lanahan', 'Schwede', 'Zhao'] | [] | PLoS One | 2010 | 12/30/2010 | 0 | on of human cancers, 411 probesets with a fold-change ≥2 in SAM analysis of GIPC1 depleted MDA-MB231 cells compared to control cells (GIPC1 signature; Table S1 ) were used to interrogate two publicly available and clinically annotated breast and ovarian cancer datasets with the Bioconductor package, globaltest [27] . A large merged breast cancer DNA microarray dataset with 689 |human colorectal cancer cells. By using RNAi to deplete GIPC1 mRNA in MDA-MB-231 cells we were able to identify a wide range of genes whose expression was altered. We compared this GIPC1 signature to publicly available breast and ovarian cancer gene expression datasets for which well-annotated phenotype and outcome data were available. We found strong correlation between the GIPC1 signature and a number o|h individual EASE GO term. Analysis of clinical breast and ovarian cancer public datasets The clinical relevance of the GIPC1 KD signature (n = 411 probesets) was evaluated in publicly available breast and ovarian cancer gene expression data which were downloaded from the Gene Expression Omnibus database at NCBI. After excluding patients with missing clinical data, the breast can|ined by merging the datasets GSE6532 [30] , GSE4922 [29] , and GSE7390{{tag}}--REUSE-- [28] . The ovarian cancer dataset contained 274 gene expression profiles from GSE9891 [31] . Association between gene expression of the GIPC1 KD signature and each clinical variable in the breast and ovarian cancer datasets were evaluated using globaltest [| were size, grade, type (malignant versus. low malignant potential) and overall survival. Analysis of merged MDA-MB231 GIPC1 KD and HMEC oncogene signature dataset An HMEC oncogene signature dataset, GSE3151 [23] was merged with the MDA-MB231 GIPC1 KD dataset. The merged dataset was normalized with RMA [20] using the Bioconductor package, affy . A meta-analysis was p | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
968 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2917034 | [u'20569502'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Sims', 'Taylor', 'Harrison', 'Muir', 'Langdon', 'Cameron', 'Kuske', 'Dixon', 'Liang', 'Walker', 'Faratian'] | [] | Breast Cancer Res | 2010 | 2010 | 0 | AND pmc_gds | 0 | 1 | ||||
969 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2913987 | [u'20630075'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Costa', 'Santos', 'Garaulet', 'Lorz', 'Due\xc3\xb1as', 'Buitrago-P\xc3\xa9rez', 'Mart\xc3\xadnez-Cruz', 'Paramio', 'Segrelles', 'Garc\xc3\xada-Escudero', 'Saiz-Ladera'] | [] | Mol Cancer | 2010 | 7/14/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
970 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2972286 | [u'20964848'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Liu', 'Gold', 'Wang', 'Miecznikowski', 'Sucheston'] | ['Wang'] | BMC Cancer | 2010 | 10/21/2010 | 0 | mple size and the availability of gene expression microarray data derived from RNA extracted from breast cancer tumors with sufficient follow-up data. At the time of this analysis, five datasets were publicly available; we reference these by primary author: Desmedt [ 7 ] (data accessible at NCBI GEO database [ 8 ], accession GSE7390{{tag}}--REUSE--), Miller [ 9 ] (accession GSE3494), Pawitan [ 10 ] (accession GSE1456), |sis approach shows promising evidence that genetic pathways can further stratify survival across datasets. Methods Data Collection and Pre-processing The breast cancer microarray datasets were either downloaded from the NCBI GEO database or provided by the authors through their public websites. Among the five datasets, three were based on the Affymetrix U133 platform, one on the Affymetrix U95 platform, a|nical demographics for each of the datasets is provided in the Additional File 1 . Table 1 Microarray Dataset Summary Dataset Total Samples Array Description Total Probes Years of Diagnosis Desmedt (GSE7390{{tag}}--REUSE--) 198 Affymetrix U133A 22283 1980-1998 Miller (GSE3494) 251 Affymetrix U133A 22283 1987-1989 Pawitan (GSE1456) 159 Affymetrix U133 22283 1994-1996 VAN DE Vijver 295 Agilent 24481* 1984-1995 Bild (G| comparative meta-analyses offer researchers an exciting opportunity to obtain generalizable results with appropriate statistical power. There are several examples of meta-analyses and re-analysis of publicly available datasets related to breast cancer research [ 31 - 33 ]. 
However, there are challenges to consider when performing a meta-analysis, including inter-study differences, lack of variables in co|st cancer patients in the TRANSBIG multicenter independent validation series Clinical Cancer Research 2007 13 11 3207 10.1158/1078-0432.CCR-06-2765 17545524 Edgar R Domrachev M Lash A Gene Expression Omnibus: NCBI gene expression and hybridization array data repository Nucleic acids research 2002 30 207 10.1093/nar/30.1.207 11752295 Miller L Smeds J George J Vega V Vergara L Ploner A Pawitan Y Hall P Kla | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
971 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2553442 | [u'18635567'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Sotiriou', 'Haibe-Kains', 'Bontempi', 'Desmedt'] | ['Sotiriou', 'Haibe-Kains', 'Desmedt'] | Bioinformatics | 2008 | 10/1/2008 | 1 | et al. , 2007 ), TAM (Haibe-Kains et al. , 2008 ; Loi et al. , 2007 ) and UPP (Miller et al. , 2005 ). These datasets are publicly available from the GEO database 2 through accession numbers GSE2034, GSE7390{{tag}}--REUSE--, GSE6532/GSE9195 and GSE3494, respectively. VDX includes the gene expressions of 286 untreated node-negative BC patients and was used to build GENE76 and to validate GGI (see end of Secti | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
972 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2533026 | [u'18717985'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Sotiriou', "Van't", 'Haibe-Kains', 'Desmedt', 'Bontempi', 'Piccart', 'Piette', 'Cardoso', 'Buyse'] | ['Sotiriou', 'Haibe-Kains', 'Desmedt', 'Piccart', 'Piette', 'Cardoso', 'Buyse'] | BMC Genomics | 2008 | 8/21/2008 | 0 | expression and clinical data Gene expression and clinical data of TRANSBIG series [ 7 , 8 ] were retrieved from EMBL-EBI ArrayExpress ( , accession number E-TABM-77) and NCBI GEO ( , accession number GSE7390{{tag}}--REUSE--) databases, for the validation of the 70-gene signature (TBAGD) and of the 76-gene signature (TBVDX), respectively. The original TRANSBIG series included 309 patients for whom the 70-gene signature | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
973 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2895626 | [u'20500821'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Loi', 'Haviv', 'Abraham', 'Kowalczyk', 'Zobel'] | ['Loi'] | BMC Bioinformatics | 2010 | 5/25/2010 | 0 | . Methods We explored a range of methods for extracting gene sets. These statistics are described below; we first discuss the data used. Data We used five breast cancer datasets from NCBI GEO [ 23 ]: GSE2034 [ 24 ], GSE4922 [ 25 ], GSE6532 [ 26 , 27 ], GSE7390{{tag}}--REUSE-- [ 28 ], and GSE11121 [ 29 ]. All five are Affymetrix HG-U133A microarrays (some datasets include other platforms; these platforms were excluded)|15 remaining probesets. The datasets were independently normalised (see Additional File 1 ). Data Composition The data contains both lymph-node-negative and node-positive breast cancer patients. For GSE7390{{tag}}, GSE11121, and GSE2034, none of the patients received adjuvant treatment. For GSE6532 and GSE4922, some patients received adjuvant therapy; these were removed from the data. The data contains patie|e considered noninformative and were removed from the data, as shown in Table 1 . Table 1 Sample sizes and breakdown by class Dataset Good Obs. Removed Obs. < 5 years ≥ 5 years Total GSE2034 82 165 247 8 GSE4922 30 103 133 9 GSE6532 21 91 112 25 GSE7390{{tag}} 36 154 190 8 GSE11121 28 154 182 18 Observations (samples) were removed if they were censored before the 5-year cutoff. Gene Sets We u|bility of the ranks, we used the percentile bootstrap to sample the observations with replacement, generating a bootstrap distribution for the centroid weights for genes and gene sets in one dataset (GSE4922). 
Since there are 22,215 genes and only 5414 gene sets, a reduced gene list was derived by training a centroid classifier on the GSE11121 dataset and selecting the 5414 genes with the highest absol|t alone between them; gene set features are more stable. Figure 3 Bootstrap . Mean and 2.5%/97.5% of the ranks of genes and gene sets (set centroid statistic), over 5000 bootstrap replications of the GSE4922 dataset. The features have been sorted by their mean rank. Concordance of Datasets We were interested in how the different datasets agreed on the importance of the features (genes or gene sets). We|sed on publications of expression profiles, rarely using more than dozens of samples. To see whether different MSigDB categories were more useful for predicting metastasis, we combined four datasets (GSE2034, GSE4922, GSE6532, and GSE7390{{tag}}--REUSE--) into a single training set. A separate centroid classifier was trained on each gene set, using the set-centroid statistic, and the gene sets were then ranked by thei| negative correlations. Figure 6 Kolmogorov-Smirnov analysis . Kolmogorov-Smirnov enrichment for MSigDB categories, using the set-centroid statistic. A AUC and spline smooth for each set, tested on GSE11121. B Number of mapped probesets in each set, on log 2 scale, and spline smooth. C Two-sample Kolmogorov-Smirnov Brownian-bridge for each MSigDB category ( p -values: C1: 1.44 × 10 -4 , | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
974 | GSE7390 | 6/11/2007 | ['7390'] | [] | [u'17545524'] | 2917035 | [u'20569503'] | ['Sotiriou', 'Lidereau', 'Delorenzi', 'Klijn', 'Viale', 'Wang', u'Saghatchian', 'Zhang', 'Haibe-Kains', 'Loi', 'Desmedt', 'Bergh', 'Buyse', 'Harris', 'Piccart', 'Piette', 'Cardoso', 'Ellis', "d'Assignies", 'Foekens', 'Lallemand'] | ['Gonzalez-Angulo', 'Lluch', 'Casa', 'Fu', 'Brown', 'Zhang', 'Mills', 'Hilsenbeck', 'Hennessy', 'Schiff', 'Osborne', 'Gray', 'Creighton', 'Lee'] | ['Zhang'] | Breast Cancer Res | 2010 | 2010 | 0 | e Berkeley National Laboratory at the University of California at San Francisco. Gene-expression analysis Gene-transcription profiling datasets were obtained from previous studies (CMap build01, [GEO:GSE5258]; van de Vijver (available at [ 27 ]); Loi, [GEO:GSE9195]; Wang, [GEO:GSE2034]; Desmedt, [GEO:GSE7390{{tag}}--REUSE--]; Neve (available at [ 28 ]). Of the 134 ER + tumors in the Desmedt dataset, 28 were also repr | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
975 | GSE7391 | 6/23/2007 | ['7391'] | [] | [u'18687881'] | 2694283 | [u'19557186'] | ['Ponting', 'Hehir-Kwa', 'Nguyen', 'Veltman', 'Webber', 'Pfundt'] | ['de', 'Ponting', 'Hehir-Kwa', 'Nguyen', 'Veltman', 'Webber'] | ['Nguyen', 'Veltman', 'Hehir-Kwa', 'Ponting', 'Webber'] | PLoS Genet | 2009 | 2009 Jun | 0 | is last set is described in Nguyen et al. [53] and, together with the Koolen et al. [52] MR–associated CNV data, are available from the Gene Expression Omnibus ( http://www.ncbi.nlm.nih.gov/geo/ ) with accession number GSE7391{{tag}}--REUSE--. Combined, these apparently benign CNVs represent 430 Mb of unique sequence (14.0% of the total NCBI35 human genome assembly | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
976 | GSE7391 | 6/23/2007 | ['7391'] | [] | [u'18687881'] | 2577867 | [u'18687881'] | ['Ponting', 'Hehir-Kwa', 'Nguyen', 'Veltman', 'Webber', 'Pfundt'] | ['Ponting', 'Hehir-Kwa', 'Nguyen', 'Veltman', 'Webber', 'Pfundt'] | ['Ponting', 'Hehir-Kwa', 'Nguyen', 'Veltman', 'Webber', 'Pfundt'] | Genome Res | 2008 | 2008 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
977 | GSE7392 | 3/30/2007 | ['7392'] | [] | [u'17397532'] | 2620272 | [u'19014681'] | [u'Cosio', u'Griffin', 'Park', 'Stegall', u'Grande'] | ['Tozeren', 'Gormley'] | [] | BMC Bioinformatics | 2008 | 11/17/2008 | 1 | or therapeutic targets in tissue-specific diseases. Table 1 Microarray datasets used in this study Tissue Phenotype Data Tissue No. of Samples Gene Expression Omnibus/Array Express Accn. # Adipose 10 GSE3526 Adrenal 20 GSE3526, GSE8514, GSE2316 Brain 89 GSE3526, GSE7621, GSE7307, GSE2361, E_AFMX-11, E-TABM-20, Colon 10 E-TABM-176, GSE8671, GSE9254, GSE9452 Epidermal 25 GSE1133, GSE2361, GSE3419, GSE352|38 E_AFMX-11, E-MIMR-27, GSE1133, GSE2240, GSE2361, GSE3526, GSE3585, GSE7307 Kidney 10 E_AFMX-11, GSE2004, GSE2361, GSE3526, GSE7392{{tag}}--REUSE-- Liver 10 E_AFMX-11, GSE2004, GSE3526, GSE6764 Lung 26 E-MEXP-231, GSE10072, GSE1133, GSE2361, GSE3526 Mammary 15 E-TABM-66, GSE2361, GSE3526, GSE7307, GSE7904 Muscle 64 GSE10760, GSE2328, GSE3526, GSE5110, GSE6798, GSE7307, GSE9103, Ovary 10 GSE2361, GSE3526, GSE6008, GS|ancreas 6 GSE1133, GSE2361, GSE7307 Peripheral blood 12 GSE7462, GSE8608, GSE8668, GSE8762, GSE9692 Small intestine 7 GSE2361, GSE7307 Spleen 12 GSE2004, GSE2361, GSE3526, GSE7307 Stomach 10 GSE2361, GSE3526, GSE7307 Testis 38 E_AFMX-11, GSE1133, GSE2361, GSE3218, GSE3526, GSE7307, GSE7808 Thymus 5 GSE1133, GSE2361, GSE7307 Infectious Disease Disease No. of Samples Gene Expression Omnibus/Array Express | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
978 | GSE7392 | 3/30/2007 | ['7392'] | [] | [u'17397532'] | 1852103 | [u'17397532'] | [u'Cosio', u'Griffin', 'Park', 'Stegall', u'Grande'] | ['Park', 'Stegall'] | ['Park', 'Stegall'] | BMC Genomics | 2007 | 3/30/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
979 | GSE7393 | 12/23/2007 | ['7393'] | [] | [u'17999778'] | 2258198 | [u'17999778'] | ['', 'Joseph-Strauss', 'Simchen', 'Barkai', 'Zenvirth'] | ['Joseph-Strauss', 'Simchen', 'Barkai', 'Zenvirth'] | ['Joseph-Strauss', 'Simchen', 'Barkai', 'Zenvirth'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
980 | GSE7395 | 4/19/2007 | ['7395'] | [] | [u'17705881', u'18286283'] | 2206718 | [u'17705881'] | [u'Lo', 'Cavallo', 'Calogero', u'Cirenei', 'Forni', 'Cordero', 'Saviozzi', 'Quaglino'] | ['Cavallo', 'Calogero', 'Cordero', 'Forni'] | ['Cavallo', 'Calogero', 'Cordero', 'Forni'] | Breast Cancer Res | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
981 | GSE7407 | 3/31/2007 | ['7407'] | ['2658'] | [u'17446436'] | 2949890 | [u'20831831'] | ['Tian', 'Alcendor', 'Holle', 'Wagner', 'Gao', 'Zhai', 'Sadoshima', 'Zablocki', 'Yu', 'Vatner'] | ['Monsonego', 'Rubin', 'Geifman'] | [] | BMC Bioinformatics | 2010 | 9/12/2010 | 0 | ound 0.456971* 0.505641 External side of plasma membrane 0.744269 0.508488* Not Found (A) Enrichment in genes with up-regulated expression in hippocampus treated with Fluoxetine, based on GEO profile GSE6476. All terms passed FDR > 0.25 and state the FDR values. (B) GEO profile GSE6675. All terms passed FDR < 0.25 and state the FDR values (Up-regulated in Control in comparison to FGF2 t|ture. † indicates terms that were not included in the NIGO subset. For three GEO expression profiles, the results of the analysis are described in detail (Table 1 ). In the analysis of the GSE6476, in which the effect of chronic Fluoxetine treatment on hippocampal gene expression was examined [ 10 ], eleven terms passed statistic filtering with NIGO and not with the full GO, out of which 7 w|ded in the full GO but not in NIGO passed the statistic cutoffs but this term ('Protein self-association') was functionally irrelevant and contributed very little to the analysis. For the GEO profile GSE6675, in which astroglial gene expression program elicited by fibroblast growth factor-2 was examined, four terms relevant to neural/immune systems passed statistic filtering with NIGO and not with the |ith the full GO. This is an example of how without the use of NIGO one would have to raise the cutoff to at least 0.55 in order for this term to appear in the analysis results. Functional analysis of GSE6509 with NIGO also revealed statistically significant terms that were missed when using the full GO. 
This experiment involved microarray expression profiling designed to explore the effect of RU486, a |ed both in NIGO and the full GO, received higher FDR values when NIGO was used (Table 2 ), contrary to our expectations from the impact of reducing the ontology size. For example, in the analysis of GSE8788 in which gene expression analysis was conducted using Trib1-deficient macrophages treated with LPS as compared to LPS-treated wild-type macrophages [ 12 ], 11 out of 14 terms (that passed statistic|ment in FDR values when using NIGO is partially due to the stochastic nature of the GSEA algorithm. To test this hypothesis, the same analysis was repeated three times with each of the ontologies for GSE8788. FDR values were averaged and a comparison of analysis results was performed based on these averaged FDR values. In accordance with our hypothesis, the averaging of FDR values improved the apparent|l GO and GO-slim in enrichment analysis using GSEA Unique to Subset Lowest FDR Value Experiment Cutoff used Full GO NIGO Generic GO slim Full GO NIGO Generic GO slim Neural- or immune-related studies GSE6509 P < 0.05 2 (2) 5 0 0 2 0 GSE6675 FDR < 0.25 0 6 0 0 1 0 GSE6476 FDR < 0.25 1 (1) 11 0 0 4 0 GSE3779 P < 0.05 7 (3) 13 1 (1) 0 1 5 GSE8425 P < 0.05 6 (1) 0 0 | 0 0 0 1 GSE8788 P < 0.01 6 (6) 0 0 11 3 1 GSE8788* P < 0.01 6 (6) 0 0 3 11 1 GSE9659 P < 0.01 6 (6) 0 NS A 4 28 NS A Non-neural- or immune-related studies (negative controls) GSE7407{{tag}}--REUSE-- P < 0.01 5 (5) 0 0 0 1 2 GSE2259 FDR < 0.25 3 (1) 0 1 (1) 1 2 1 GSE8191 P < 0.05 5 (2) 0 0 0 0 0 The enrichment analysis results of GSEA, providing the full GO, NIGO or GO-s| for each subset given in separate columns. *To test the effect of stochasticity, the analysis of this profile is based on 3-fold averaged FDR values from three independent GSEA analyses. A Profile GSE9659 involves rat tissue, and was not analyzed with the mouse GO-slim. 
Three non-neural- or immune-related datasets were used to test the performance of NIGO in functional analysis of microarray data in|. These datasets included the GEO expression profile, GSE7407{{tag}}--REUSE--, in which gene expression in heart tissue with cardiac specific over-expression of Sirt1 was examined [ 13 ], the GEO expression profile, GSE8191, in which the gene expression profile of mammary glands from pregnant mice was compared to that of mammary glands from lactating mice [ 14 ], and the GEO expression profile, GSE2259, in which gene |in enrichment analysis using the Fisher Exact Test Unique to Subset Lowest FDR Value Experiment Cutoff used Full GO NIGO Generic GO slim Full GO NIGO Generic GO slim Neural- or immune-related studies GSE6509 P < 0.0001 1 (1) 1 0 0 22 2 GSE6675 P < 0.05 1 (1) 0 11 (1) 0 2 0 GSE6476 P < 0.0001 0 2 2 0 15 3 GSE3779 P < 0.1 0 0 0 0 2 0 GSE8425 P < 0.1 0 0 0 0 0 0 GSE|0.001 0 2 0 0 9 1 GSE6136 P < 0.000001 0 1 0 0 37 7 GSE8788 P < 0.01 0 2 0 0 4 1 GSE9659 P < 0.001 0 3 NS A 0 33 NS A Non-neural- or immune-related studies (negative controls) GSE7407{{tag}}--REUSE-- P < 0.0000000001 33 (30) 0 0 9 0 0 GSE2259 P < 0.1 63 (35) 0 2 8 8 7 GSE8191 P < 0.001 6 (6) 7 6 (2) 0 16 3 Enrichment analysis results from Ontologizer, providing the full |ional GO subset (the full GO or generic slim) are also shown for each term ('Lowest P-Value'), with the number of terms with the lowest P-values for each subset given in separate columns. A Profile GSE9659 involves rat tissue, and was not analyzed with the mouse GO-slim. These results, together with the analysis of five additional neural/immune-related experiments not described in Table 1 (namely p| cell's activity. These results further demonstrate that analysis with NIGO can enhance interpretation of functional analysis results produced for relevant microarray datasets. 
In the analysis of the GSE6509 expression dataset, three relevant terms passed the statistical cutoff with NIGO but not with the full GO. These terms, 'viral envelope', 'viral infectious cycle' and 'viral capsid' are all terms r|et. Such terms received FDR or p-values that were very close (but larger) than the cutoff values used. This is partially explained by the stochastic nature of the GSEA algorithm. Indeed, for one set (GSE8788), we compared the raw results with averaged FDR values. Averaging dramatically decreased the number of such terms. Furthermore, we conducted a similar functional analysis using the Fisher Exact Tes|nous peptide antigen via MHC class I' and 'Positive regulation of T cell mediated cytotoxicity' without defeating the purpose of the slimming process. Yet these two terms were found to be enriched in GSE6476, and are crucial for generating a hypothesis based on the expression profile. This shows that GO slims may be complemented by small, yet fully detailed domain-specific subsets of GO. Figure 4 The U|plementary/NIGO/Supplementary.html . The ontologies were clipped using the Protégé 4.0 beta OWL editor [ 26 ]. For by-species filtering, annotation files for human, rat and mouse were downloaded (October 2008) from GOA-EBI [ 27 ]. Association files used for GSEA analysis were generated based on the GOA-EBI annotation files and in the format required by GSEA. In this format, each row repres|a analysis and functional analysis were conducted using the GenePattern [ 29 ], GSEA [ 9 ] (release 2.5) web servers, and Ontologizer [ 16 ] as follows: (1) for each study, raw data (.CEL files) were downloaded from GEO [ 30 ]; (2) expression files (.gct files) were created using the Gene Pattern Expression File Creator module; (3) where necessary (i.e. for expression files GSE6509, GSE6675, GSE6476, GSE7|was conducted using the GSEA module. GSEA was run three times for each dataset, using a different GO version for each run. 
For the full GO, we used the organism-specific GO subset. In the analysis of GSE8788, GSEA was run three times for each of the three ontologies and FDR values were averaged over the three runs. (5) Differentially-expressed genes were found using the Gene Pattern ComparativeMarkerSe|, to an input term (Figure 4 ). The connections of UMLS concepts to the tested GO term are provided by UMLS and defined within the UMLS data files. Microarray data sets All microarray data sets were downloaded from GEO at NCBI [ 30 ]. The GEO sets used in this study are described in Additional file 2 . Availability NIGO is freely available as Additional file 6 and for download from: http://bioinfo.bg|re included or excluded from NIGO. Click here for file Additional file 2 Microarray Datasets for Comparative analysis of NIGO, GO and GO-slim . This file contains a summary of the microarray datasets downloaded from GEO at NCBI and used to test the performance of NIGO. Click here for file Additional file 3 GSEA analysis results . This file contains a summary of the GSEA analysis results for each of the mi|etroviral infectivity through different mechanisms Am J Physiol Lung Cell Mol Physiol 2009 297 L538 545 10.1152/ajplung.00162.2009 19561138 The Gene Ontology Consortium http://www.geneontology.org/GO.downloads.ontology.shtml Web Ontology Language Guide http://www.w3.org/TR/owl-guide Protégé 4.0 beta http://protege.stanford.edu Barrell D Dimmer E Huntley RP Binns D O'Donovan C Apweiler R Th | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
982 | GSE7413 | 4/3/2007 | ['7413'] | [] | [] | 2329846 | [u'18403594'] | [u'DeMore'] | ['Patterson', 'Perou', 'Tanner', 'Klauber-DeMore', 'Hu', 'Livasy', 'Gabrielli', 'Bhati', 'Fan', 'Moore', 'Reynolds', 'Ketelsen'] | [] | Am J Pathol | 2008 | 2008 May | 0 | versus green channel intensity. 12 The UNC Microarray database ( https://genome.unc.edu/ ) was used to perform the filtering and preprocessing. All data have been deposited into the Gene Expression Omnibus ( http://www.ncbi.nlm.nih.gov/geo ) under the accession number GSE7413{{tag}}--DEPOSIT-- . A two-class SAM (Significance Analysis of Microarrays, http://www-stat.stanford.edu/~tibs/SAM/ ) 13 , 14 was performed to i | 0 | 1 | 0 | NOT pmc_gds | 0 | 1 |
983 | GSE7414 | 4/20/2007 | ['7414'] | [] | [u'17446352'] | 2926599 | [u'20385573'] | ['Fejes-Toth', 'Sachidanandam', 'Aravin', 'Hannon', 'Girard'] | ['Hohjoh', 'Ohnishi', 'Watanabe', 'Totoki', 'Tokunaga', 'Sakaki', 'Yamamoto', 'Toyoda', 'Sasaki'] | [] | Nucleic Acids Res | 2010 | 2010 Aug | 0 | ding to various repeats such as rRNA, tRNA, retrotransposon and DNA transposon, genomic positions of the repeats were retrieved from the University of California, Santa Cruz (UCSC) website ( http://hgdownload.cse.ucsc.edu/downloads.html ) and compared with the genomic positions of small RNAs. If the genomic position of a particular small RNA overlapped with any repeats by at least 15 nt, this small RNA wa|gs) and mRNAs based on sequence similarity, the sequences of these RNAs were extracted from the flat files (sequence and annotation files) of GenBank ( ftp://ftp.ncbi.nih.gov/genbank/ ) and sequences downloaded from the following databases: tRNAs, Genomic tRNA Database ( http://lowelab.ucsc.edu/GtRNAdb ); snoRNAs, snoRNA database ( http://www-snorna.biotoul.fr ) and RNA database ( http://jsm-research.imb.|seq Genes ( ftp://ftp.ncbi.nih.gov/refseq ) and Ensembl Genes ( http://www.ensembl.org ). Blastn searches were then performed using the small RNA sequences determined in this study as queries and the downloaded sequences as a database. After the annotation of small RNA clones, small RNA clusters were identified and characterized using in-house programs based on our previous study ( 13 ). We also carried o and the Gene Expression Omnibus (GEO) database (accession number: GSE7414{{key}}--REUSE--) | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
984 | GSE7415 | 4/6/2007 | ['7415'] | [] | [] | 2613928 | [u'18990226'] | [''] | ['Juskeviciute', 'Vadigepalli', 'Hoek'] | [] | BMC Genomics | 2008 | 11/6/2008 | 0 | Elmer, Waltham, MA). Raw quantitated array data was normalized using the print-tip lowess and scale normalization algorithms [ 52 ]. MIAME compliant microarray data are deposited at , accession # GSE7415{{tag}}--DEPOSIT-- (PHx) and GSE9137 (sham). ANOVA model The normalized gene expression data was analyzed using a mixed-effects ANOVA response model for each gene using the statistical software package in R follow | 0 | 1 | 0 | NOT pmc_gds | 0 | 1 |
985 | GSE7420 | 11/7/2007 | ['7420'] | [] | [u'18344421'] | 2330305 | [u'18344421'] | ['Jauneau', 'Marquis', 'Minic', 'Jouanin', 'Renou', u'Martin-Magniette', 'Martinez', 'Saulnier', 'Ch\xc3\xa1vez', 'Cobbett', 'Bitton', 'Goffner', 'Fulton', 'Ranocha'] | ['Jauneau', 'Marquis', 'Minic', 'Jouanin', 'Renou', 'Martinez', 'Saulnier', 'Ch\xc3\xa1vez', 'Cobbett', 'Bitton', 'Goffner', 'Fulton', 'Ranocha'] | ['Marquis', 'Minic', 'Bitton', 'Jouanin', 'Renou', 'Martinez', 'Saulnier', 'Ch\xc3\xa1vez', 'Cobbett', 'Jauneau', 'Goffner', 'Fulton', 'Ranocha'] | Plant Physiol | 2008 | 2008 May | 0 | AND pmc_gds | 1 | 0 | ||||
986 | GSE7421 | 11/7/2007 | ['7421'] | [] | [u'18344421'] | 2330305 | [u'18344421'] | ['Jauneau', 'Marquis', 'Minic', 'Jouanin', 'Renou', u'Martin-Magniette', 'Martinez', 'Saulnier', 'Ch\xc3\xa1vez', 'Cobbett', 'Bitton', 'Goffner', 'Fulton', 'Ranocha'] | ['Jauneau', 'Marquis', 'Minic', 'Jouanin', 'Renou', 'Martinez', 'Saulnier', 'Ch\xc3\xa1vez', 'Cobbett', 'Bitton', 'Goffner', 'Fulton', 'Ranocha'] | ['Marquis', 'Minic', 'Bitton', 'Jouanin', 'Renou', 'Martinez', 'Saulnier', 'Ch\xc3\xa1vez', 'Cobbett', 'Jauneau', 'Goffner', 'Fulton', 'Ranocha'] | Plant Physiol | 2008 | 2008 May | 0 | AND pmc_gds | 1 | 0 | ||||
987 | GSE7428 | 11/9/2007 | ['7428'] | [] | [u'17934521'] | 2650419 | [u'17934521'] | ['', 'Plant', 'Pearson', 'Vogazianou', 'Ichimura', 'Langford', 'Liu', 'Baird', 'Collins', 'Gregory', 'B\xc3\xa4cklund'] | ['Plant', 'Pearson', 'Vogazianou', 'Ichimura', 'Langford', 'Liu', 'Baird', 'Collins', 'Gregory', 'B\xc3\xa4cklund'] | ['Plant', 'Pearson', 'Vogazianou', 'Ichimura', 'Langford', 'Liu', 'Baird', 'Collins', 'Gregory', 'B\xc3\xa4cklund'] | Oncogene | 2008 | 3/27/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
988 | GSE7432 | 7/13/2007 | ['7432'] | ['3505'] | [u'17630276'] | 1955696 | [u'17630276'] | ['Stepanova', 'Alonso', 'Likhacheva', 'Yun'] | ['Stepanova', 'Alonso', 'Likhacheva', 'Yun'] | ['Stepanova', 'Alonso', 'Likhacheva', 'Yun'] | Plant Cell | 2007 | 2007 Jul | 0 | AND pmc_gds | 1 | 0 | ||||
989 | GSE7436 | 4/4/2007 | ['7436'] | ['2932'] | [u'17425807'] | 2856677 | [u'20419098'] | ['Kalathur', 'Ranganathan', 'Agrawal', 'Kondaiah', 'Chavalmane', 'Bhushan', 'Takahashi'] | ['Madan', 'Yoon', 'Fang', 'Lin', 'Foltz', 'Yan', 'Kim', 'Hwang', 'Hood'] | [] | PLoS One | 2010 | 4/19/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
990 | GSE7436 | 4/4/2007 | ['7436'] | ['2932'] | [u'17425807'] | 1858692 | [u'17425807'] | ['Kalathur', 'Ranganathan', 'Agrawal', 'Kondaiah', 'Chavalmane', 'Bhushan', 'Takahashi'] | ['Kalathur', 'Ranganathan', 'Agrawal', 'Kondaiah', 'Chavalmane', 'Bhushan', 'Takahashi'] | ['Kalathur', 'Ranganathan', 'Agrawal', 'Kondaiah', 'Chavalmane', 'Bhushan', 'Takahashi'] | BMC Genomics | 2007 | 4/11/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
991 | GSE7439 | 9/20/2007 | ['7439'] | [] | [u'17635870'] | 2044543 | [u'17635870'] | ['Kendall', 'Rasko', 'Sperandio'] | ['Kendall', 'Rasko', 'Sperandio'] | ['Kendall', 'Rasko', 'Sperandio'] | Infect Immun | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
992 | GSE7447 | 7/1/2007 | ['7447'] | [] | [] | 2988648 | [u'18451147'] | [u'Ito', u'Meltzer', u'Sato'] | ['Agarwal', 'Ito', 'Jin', 'Yang', 'Olaru', 'Kan', 'Meltzer', 'David', 'Shimada', 'Mori', 'Hamilton', 'Cheng', 'Paun', 'Abraham', 'Sato'] | [u'Ito', u'Meltzer', u'Sato'] | Cancer Res | 2008 | 5/1/2008 | 0 | All processed and raw data are available in Minimum Information about Microarray Gene Experiment-compliant format via the Gene Expression Omnibus.6 Accession numbers are GSE7447{{key}}--DEPOSIT--, GSM180360, GSM180361, GSM180362, GSM180363, GSM180364, GSM180365, GSM180366, GSM180367, and GSM180368. | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
993 | GSE7448 | 6/10/2007 | ['7448'] | [] | [u'17989254'] | 2751786 | [u'19732433'] | ['', 'Lai', 'Stark', 'Kellis', 'Bartel', 'Johnston', 'Ruby'] | ['Menzel', 'Zhou', 'Hu', 'Khaitovich', 'Yan', 'Chen', 'Xu'] | [] | BMC Genomics | 2009 | 9/4/2009 | 1 | me as for known miRNAs. Mapping fly sequencing data to the Drosophila melanogaster genome Roche/454 and Solexa sequences of the Drosophila melanogaster small RNA libraries were downloaded from GEO [GSE7448{{tag}}--REUSE--] [ 29 ] and [GSE11624] [ 50 ], and mapped to the Drosophila melanogaster genome (dm3, BDGP Release 5 from UCSC) using SOAP [ 51 ], respectively. Only sequences perfectly matching the genome and w | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
994 | GSE7448 | 6/10/2007 | ['7448'] | [] | [u'17989254'] | 2825236 | [u'20113528'] | ['', 'Lai', 'Stark', 'Kellis', 'Bartel', 'Johnston', 'Ruby'] | ['Korbie', 'Mattick', 'Hansen', 'Makunin', 'Jung'] | [] | BMC Genomics | 2010 | 2/1/2010 | 0 | As frequently cover the entire length of annotated snoRNAs or tRNAs, which suggests that other loci specifying similar ncRNAs can be identified by clusters of short RNA sequences. Results We combined publicly available datasets of tens of millions of short RNA sequence tags from Drosophila melanogaster , and mapped them to the Drosophila genome. Approximately 6 million perfectly mapping sequence tags w|snoRNAs, as well as a number of novel ncRNAs. Results Compilation of short RNA sequence reads into tag-contigs We obtained 10,846,433 sequence tags comprising 55,894,809 reads from 12 Gene Expression Omnibus (GEO) datasets (Table 1 ) derived from 90 experiments performed on Drosophila cell lines and tissues. Approximately 6 million tags were perfectly mapped to the D. melanogaster genome, excluding |ee Methods). As a measure of expression level, each TC was assigned a tag-depth score based on the maximum number of overlapping reads covering any part of the locus (Fig. 1 ) (see Methods). Table 1 Publicly available short RNA sequencing datasets on D. malanogaster GEO accession No. of tags Mappable References GSE10277 23252 12057 [ 14 ] GSE10515 49878 12096 [ 15 ] GSE10790 347861 30780 [ 19 ] GSE10794|19 255670 381508 [ 9 ] GSE11086 1277025 1509771 [ 13 ] GSE11624 6643474 3125323 [ 12 ] GSE6734 32160 34362 [ 8 ] GSE7448{{tag}}--REUSE-- 753797 452471 [ 17 ] GSE9138 13299 13294 [ 20 ] GSE9389 59906 32472 [ 18 , 9 ] GSE12527 2967 817 [ 11 ] total 10846433 6297373 Figure 1 Compilation of a tag-contig . Contiguously overlapping tags (grey arrows) were assembled into a tag-contig (TC) (block arrow). The tag-depth is the |rts of existing transposons generating siRNAs or piRNAs. 
Conclusions Several studies investigating the population of small RNAs have yielded millions of sequence reads. In this study, we combined all publicly available sequence data from Drosophila melanogaster short RNA into hundreds of thousands tag-contigs and associated subsets of them with known ncRNAs such as snoRNAs and tRNAs. The characteristic | miRbase release 12.0 [ 25 ]. Repeats were annotated using RepeatMasker [ 43 ] in FlyBase 5.12. Mapping of sequence tags We obtained all public available deep-sequencing datasets from Gene Expression Omnibus database at National Center for Biotechnology Information http://www.ncbi.nlm.nih.gov/geo in SOFT format (Table 1 ). These sequences were subsequently mapped to the genome of D. melanogaster usi | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
995 | GSE7448 | 6/10/2007 | ['7448'] | [] | [u'17989254'] | 2795490 | [u'19948966'] | ['', 'Lai', 'Stark', 'Kellis', 'Bartel', 'Johnston', 'Ruby'] | ['Th\xc3\xa9odore', 'Boug\xc3\xa9', 'Sismeiro', 'Antoniewski', 'Poisot', 'Berry', 'Fagegaltier', 'Copp\xc3\xa9e', 'Voinnet'] | [] | Proc Natl Acad Sci U S A | 2009 | 12/15/2009 | 0 | hila . Results and Discussion We examined the length distribution of TE-matching small RNAs in publicly available small RNA libraries from the fly soma ( 4 ), see Materials and Methods ). We found that a dramatic shift in the size of repeat-derived small RNAs occurs during development: the greater pop|all RNA sequence files from staged collections of 0–1 h ( GSM180330 ) and 12–24 h ( GSM180333 ) embryos, pupae ( GSM180336 ), adult heads ( GSM180328 ) and S2 cells ( GSM180337 ) were downloaded from GEO under accession nos. GPL5061 and GSE7448{{tag}}--REUSE-- . P19 and NLS-P19 bound RNAs as well as small RNAs from S2 cells and stably transformed P19 and NLS-P19 S2 cells were cloned using the DGE-Small | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
996 | GSE7448 | 6/10/2007 | ['7448'] | [] | [u'17989254'] | 2082411 | [u'18043761'] | ['', 'Lai', 'Stark', 'Kellis', 'Bartel', 'Johnston', 'Ruby'] | ['Sandmann', 'Cohen'] | [] | PLoS One | 2007 | 11/28/2007 | 0 | ing data was published for small RNA libraries cloned from ten different samples, including the major developmental stages of Drosophila melanogaster development [12] [GEO:GSE7448{{tag}}]. This allowed us to compare our experimental data with an independent unbiased global expression analysis. We mapped these small RNAs to the non-repetitive part of the Drosophila genome |ing T4 polynucleotide kinase (NEB). Mapping small RNA sequencing data to predicted loci Sequences detected in Drosophila small RNA libraries [12] were obtained from GEO [GEO:GSE7448{{tag}}--REUSE--] and mapped to the putative pre-miRNA sequences using megablast [42] with wordsize 14 and a score cutoff of 16. Only perfectly matched sequence matches were retained. The | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
997 | GSE7448 | 6/10/2007 | ['7448'] | [] | [u'17989254'] | 2704076 | [u'19474147'] | ['', 'Lai', 'Stark', 'Kellis', 'Bartel', 'Johnston', 'Ruby'] | ['Taft', 'Carninci', 'Lassmann', 'Glazov', 'Mattick', 'Hayashizaki'] | [] | RNA | 2009 | 2009 Jul | 0 | MATERIALS AND METHODS Additional small RNA data sets Small RNAs were obtained from 26 publicly available small RNA deep sequencing libraries (identifiers are listed in parentheses): human THP-1 small RNAs (DNA Database of Japan, AIAAA0000001–AIAAT0000001) ( Taft et al. 2009 ); mouse WT|7 ; Czech et al. 2008 ); Arabidopsis ARGONAUTE4 and ARGONAUTE7 IPs (NCBI GEO, GSE12037 ) ( Montgomery et al. 2008 ); and Argonaute-1 IPs from WT and cid14 Δ mutant S. pombe (NCBI GEO, GSE311595 ) ( Buhler et al. 2008 ). Small nucleolar RNA annotations SnoRNA annotations were compiled from multiple sources. Human snoRNA annotations were obtained through the small RNA UCSC Genome Browser | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
998 | GSE7448 | 6/10/2007 | ['7448'] | [] | [u'17989254'] | 2698667 | [u'18376413'] | ['', 'Lai', 'Stark', 'Kellis', 'Bartel', 'Johnston', 'Ruby'] | ['Lai', 'Phillips', 'Tyler', 'Duan', 'Chou', 'Okamura'] | ['Lai'] | Nat Struct Mol Biol | 2008 | 2008 Apr | 0 | ted to the prevailing view that miRNA* species are, by and large, rare RNAs. More recent analysis of > 1 million small RNA sequences that aligned to the D. melanogaster genome (GEO dataset GSE7448{{tag}}--MENTION-- ) not only identified new miRNA genes, but also yielded a nearly comprehensive set of cloned miRNA* species 25 . These data permitted detailed analyses of miRNA* biology. Inspection of 316,927 miRN | 0 | 0 | 1 | NOT pmc_gds | 1 | 0 |
999 | GSE7448 | 6/10/2007 | ['7448'] | [] | [u'17989254'] | 2151012 | [u'18172163'] | ['', 'Lai', 'Stark', 'Kellis', 'Bartel', 'Johnston', 'Ruby'] | ['Lai', 'Chung', 'Hagen', 'Hannon', 'Tyler', 'Berezikov', 'Okamura'] | ['Lai'] | Genes Dev | 2008 | 1/1/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1000 | GSE7448 | 6/10/2007 | ['7448'] | [] | [u'17989254'] | 2099593 | [u'17989254'] | ['', 'Lai', 'Stark', 'Kellis', 'Bartel', 'Johnston', 'Ruby'] | ['Lai', 'Stark', 'Kellis', 'Bartel', 'Johnston', 'Ruby'] | ['Lai', 'Stark', 'Kellis', 'Bartel', 'Johnston', 'Ruby'] | Genome Res | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1001 | GSE7449 | 7/27/2007 | ['7449'] | [] | [u'17652178'] | 1935027 | [u'17652178'] | ['Shah', 'Hopkins', 'Graves', 'Hollenhorst'] | ['Shah', 'Hopkins', 'Graves', 'Hollenhorst'] | ['Shah', 'Hopkins', 'Graves', 'Hollenhorst'] | Genes Dev | 2007 | 8/1/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1002 | GSE7451 | 12/31/2007 | ['7451'] | [] | [] | 2856841 | [u'17968930'] | [u'Wong', u'Wang', u'Hu'] | ['Elashoff', 'Meijer', 'Xie', 'Zhou', 'Loo', 'Hu', 'Ieong', 'Wong', 'Kallenberg', 'Henry', 'Vissink', 'Pijpe', 'Yu', 'Wang'] | [u'Wong', u'Wang', u'Hu'] | Arthritis Rheum | 2007 | 2007 Nov | 0 | data we obtained into a Minimum Information About a Microarray Experiment (MIAME)−compliant database (available at http://www.mged.org/workgroups/MIAME/miame.html ); the accession number is GSE7451{{tag}}--DEPOSIT-- . Statistical analysis for the mRNA study The expression microarrays were scanned, and the fluorescence intensity was measured using Microarray Suite 5.0 software (Affymetrix). The arrays were then | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1003 | GSE7453 | 4/6/2007 | ['7453'] | [] | [u'17910761'] | 2078596 | [u'17910761'] | ['Barbry', 'Renesto', 'Raoult', 'Crapoulet', 'La'] | ['Barbry', 'Renesto', 'Raoult', 'Crapoulet', 'La'] | ['Barbry', 'Renesto', 'Raoult', 'Crapoulet', 'La'] | BMC Genomics | 2007 | 10/2/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1004 | GSE7454 | 4/5/2007 | ['7454'] | [] | [u'17845729'] | 2034371 | [u'17845729'] | ['Hughes', 'Squire', 'Wang', 'Xue', 'Somers', 'Bayani', u'Al', 'Al-Romaih', 'Prasad', 'Zielenska', 'Cutz'] | ['Hughes', 'Squire', 'Wang', 'Xue', 'Somers', 'Bayani', 'Al-Romaih', 'Prasad', 'Zielenska', 'Cutz'] | ['Hughes', 'Somers', 'Wang', 'Xue', 'Squire', 'Bayani', 'Al-Romaih', 'Prasad', 'Zielenska', 'Cutz'] | Cancer Cell Int | 2007 | 9/10/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1005 | GSE7463 | 5/31/2007 | ['7463'] | ['2785'] | [u'17505532'] | 2614415 | [u'19077237'] | ['McDonald', 'Bowen', 'Schubert', 'Benigno', 'Matyunina', 'Moreno', 'Logani', 'Dickerson'] | ['McDonald', 'Logani', 'Benigno', 'Osunkoya', 'Moreno', 'Laycock', 'Scharer'] | ['McDonald', 'Logani', 'Moreno', 'Benigno'] | J Transl Med | 2008 | 12/11/2008 | 0 | explanation of patient samples and microarray hybridization and normalization techniques is described elsewhere [ 22 ]. The complete dataset is available at the NCBI GEO website ( , accession number GSE7463{{tag}}--DEPOSIT--) and at the author's website . Cell Culture and Drug Treatment PTX10 and 1A9 cells were cultured in RPMI media (Mediatech, Herndon, VA) supplemented with 10% fetal bovine serum and grown in 5% CO | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1006 | GSE7465 | 8/31/2007 | ['7465'] | [] | [u'17720827'] | 2075049 | [u'17720827'] | ['Wiedmann', 'Boor', 'Chan', 'Raengpradub'] | ['Wiedmann', 'Boor', 'Chan', 'Raengpradub'] | ['Wiedmann', 'Boor', 'Chan', 'Raengpradub'] | Appl Environ Microbiol | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
1007 | GSE7469 | 7/1/2007 | ['7469'] | [] | [u'17404513'] | 2753794 | [u'17597816'] | ['Mullen', 'Kaufmann', u'Sharpless', 'Zhou', 'Paules', 'Lobenhofer', 'Elkon', 'Chou', 'Hurban', 'Simpson', 'Bushel'] | ['Thomas', 'Kaufmann', 'Qu', 'Moore', 'Zhou', 'Hao', 'Helms-Deaton', 'Ibrahim', 'Sharpless', 'Shields', 'Liu', 'Scott', 'Nevis', 'Cordeiro-Stone', 'Simpson'] | ['Simpson', 'Kaufmann', 'Zhou', u'Sharpless'] | J Invest Dermatol | 2008 | 2008 Jan | 0 | ed, gridded images were uploaded to the UNC microarray database ( http://genome.unc.edu/ ). All primary data from this work are available at that site and have been deposited into the gene expression omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/ ) under accession number GSE7469{{tag}}--DEPOSIT-- . The array data were retrieved from the UNC microarray database, under the following criteria: (1) only reliable spots, as d | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1008 | GSE7472 | 10/18/2007 | ['7472'] | [] | [u'17850650'] | 2048963 | [u'17850650'] | ['Vink', 'van', 'Rodenburg', 'Keijer', 'Bovee-Oudenhoven', 'Roosing', 'Katan', 'Kramer'] | ['Vink', 'van', 'Rodenburg', 'Keijer', 'Bovee-Oudenhoven', 'Roosing', 'Katan', 'Kramer'] | ['Vink', 'van', 'Rodenburg', 'Keijer', 'Bovee-Oudenhoven', 'Roosing', 'Katan', 'Kramer'] | BMC Microbiol | 2007 | 9/12/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1009 | GSE7473 | 5/1/2007 | ['7473'] | [] | [u'17379603'] | 2671044 | [u'19188435'] | ['Bertrand-Michel', 'Bioulac-Sage', 'Boulanger', 'Rebouissou', 'Auffray', 'Balabaud', u'Terc\xe9', 'Imbeaud', 'Zucman-Rossi', 'Terc\xc3\xa9'] | ['Rios', 'Talianidis', 'Boj', 'Ferrer', 'Martin', 'Servitja', 'Guigo'] | [] | Diabetes | 2009 | 2009 May | 0 | bouissou et al. ( 17 ). To relate expression ratios of bound genes versus all genes, we reprocessed the published human hepatocellular adenoma and control tissue HG-U133A Affymetrix chip dataset (GEO GSE7473{{tag}}--REUSE--) with RMA using identical conditions as for the mouse chip datasets. Genomic binding analysis. We used the genomic binding datasets in human hepatocytes and mouse liver genes reported by Odom et al | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1010 | GSE7475 | 6/30/2007 | ['7475'] | ['2878'] | [u'17656681'] | 2176127 | [u'17656681'] | ['Kobzik', 'Dahl', 'Lim', 'Leme', 'Mariani', 'Yang', 'Fedulov'] | ['Kobzik', 'Dahl', 'Lim', 'Leme', 'Mariani', 'Yang', 'Fedulov'] | ['Kobzik', 'Dahl', 'Lim', 'Leme', 'Mariani', 'Yang', 'Fedulov'] | Am J Respir Cell Mol Biol | 2008 | 2008 Jan | 0 | AND pmc_gds | 1 | 0 | ||||
1011 | GSE7477 | 6/15/2007 | ['7477'] | [] | [] | 2241611 | [u'17967175'] | [u'Qi', u'Wick', u'Whittam', u'Bergholz'] | ['Riordan', 'Ouellette', 'Whittam', 'Bergholz', 'Qi', 'Wick'] | [u'Qi', u'Wick', u'Whittam', u'Bergholz'] | BMC Microbiol | 2007 | 10/29/2007 | 0 | with 4 replicates (i.e. 4 independent cultures that were each sampled 10 times) for a total of 40 samples. The resulting array data have been deposited in the NCBI Gene Expression Omnibus, accession GSE7477{{tag}}--DEPOSIT--. The samples collected from the mid-exponential phase 3-hour time point (Fig. 1 ) were used as a common reference for hybridization and analysis for all subsequent time points in a replicated refe|ePix 6.0 (Molecular Devices). The 3 h sample served as a common reference and was hybridized with all subsequent samples. Array data have been deposited at the NCBI Gene Expression Omnibus (Accession GSE7477{{tag}}--DEPOSIT--). Quantitative real-time PCR Expression levels of 14 ORFs determined to be differentially expressed (p < 0.00001) were verified by quantitative real-time PCR (Q-PCR). Primer pairs were desi | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1012 | GSE7483 | 4/11/2007 | ['7483'] | [] | [] | 2748096 | [u'19728865'] | [u'deBlois', u'Wang', u'Schwartz', u'Pritchard', u'Marchand', u'Adams', u'Duan', u'Amon'] | ['Makiguchi', 'Feng', 'Okuno', 'Tamon', 'Tsujimoto', 'Araki', 'Kunimoto', 'Niijima'] | [] | BMC Genomics | 2009 | 9/3/2009 | 1 | f uncovering gene functions on a genomic scale. In recent years, vast amounts of gene expression data have been collected, much of which are available in public databases, such as the Gene Expression Omnibus (GEO). To date, most researchers have been manually retrieving data from databases through web browsers using accession numbers (IDs) or keywords, but gene-expression patterns are not considered when|r to support researchers to utilize the public gene expression data more effectively, we developed a web tool for finding similar gene expression data and generating its co-expression networks from a publicly available database. Results GEM-TREND, a web tool for searching gene expression data, allows users to search data from GEO using gene-expression signatures or gene expression ratio data as a query an|gene-expression pattern with those of GEO gene expression data. It could be a very useful resource for finding similar gene expression profiles and constructing its gene co-expression networks from a publicly available database. GEM-TREND was designed to be user-friendly and is expected to support knowledge discovery. GEM-TREND is freely available at . Background One of the major challenges in the post-g|vering gene functions on a genomic scale [ 1 ]. In recent years, vast amounts of gene expression data have been collected, much of which are available in public databases, such as the Gene Expression Omnibus (GEO) [ 2 ], ArrayExpress [ 3 ] and researchers' websites. These resources serve at least two purposes. 
One is as an archive of the data, which allows other researchers to confirm results that have b|eveloped a web tool named GEM-TREND (Gene Expression data Mining Toward Relevant Network Discovery) to automatically retrieve gene expression data across a wide range of microarray experiments in the publicly available GEO database by comparing gene-expression patterns between a query and the database entries. Subsequently, the system generates a gene co-expression network for retrieved gene expression da|, and each series links to GEO by clicking the GSE ID or GPL ID (Fig. 3e ). In addition, the series of interest can be selected for further processing. Both search results and selected series can be downloaded in CSV format. Figure 3 Screenshot of GEM-TREND . a) Query input area. The gene-expression signature, gene expression ratio data and text are accepted. Network IDs can be used to retrieve previous | from stomach subregions (Additional file 4 -CSV-Gene expression profile of mast cells pooled from mouse stomach subregions) [ 20 ]. In the score-ordered results of query-1 (P-value < 0.01), GSE1827 (titled "Waldman Bladder tumors") was ranked in fourth. Moreover, the top 10 entries showed appropriate annotations related to tumors, inflammatory and immune responses (Table 2 ). For the query-2|ts, and seven entries among the top 10 were observed using rat liver samples (Table 3 ). The biological relationships among the top 10 results of query-3 (P-value < 0.02) were not clear, but GSE6192 (titled "Gene expression changes during murine mucosal mast cell in vitro differentiation") was found out in the twelfth rank (Table 4 ). These findings indicate the general applicability of GEM-T| search using the gene expression profile of human bladder cancer - top 10 entries sorted by similarity scores with lowest P-value < 0.01. 
Series Platform Description Similarity score P-value GSE6112 GPL4475 Tubercolosis and healthy infected patients PBMC_TB_vs_Pool_LTBI 1 0.0054 GSE3901 GPL3279 Response of quiescent human fibroblasts to different growth factors and serum 0.96 0.0004 GSE1726 GP|2567 Human breast tumor 0.854 0.0063 GSE60 GPL174 Diffuse large B-cell lymphoma 0.851 0.0029 GSE838 GPL564 Individual-specific variation of gene expression in peripheral blood leukocytes 0.845 0.0057 GSE3176 GPL1528 p53 In Inflamatory Stress Response 0.815 0.0001 GSE344 GPL273 Spotted long oligonucleotide arrays 0.813 0.0097 GSE7965 GPL3991 Blood and Adipose tissue samples 0.805 1.50E-03 The results w| the gene expression profile of rat chemical hepatocarcinogenesis - top 10 entries sorted by similarity scores with lowest P-value < 0.01. Series Platform Description Similarity score P-value GSE5337 GPL890 Gene Expression Profiling In Rat Smooth Muscle Cells Modulated by Rapamycin and Paclitaxel. 1 1.00E-04 GSE5860 GPL890 Gene expression analysis of rat livers after exposure to acetaminophen 0|en (APAP) Rat Liver Test Gene Expression Data Set 0.895 1.00E-04 GSE5381 GPL890 Gene expression analysis of liver and kidney following methapyrilene treatment in male Sprague-Dawley rats 0.685 0.0015 GSE791 GPL542 GH inj old liver (1-7) 0.657 5.00E-04 GSE4270 GPL890 Aging Induced Alterations in Hepatic Gene Expression of the Male Fisher Rat 0.637 0.0045 GSE3608 GPL3076 Renal medullary genes in salt-sen|ssion profile of mast cells pooled from mouse stomach subregions - top 20 entries sorted by similarity scores with lowest P-value < 0.02. Series Platform Description Similarity score P-value GSE3088 GPL2510 Expression profiling of Muscle tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null backgrounds 1 1.00E-04 GSE2814 GPL2510 Expression profiling of liver tissue from (C57BL/6J ×|time course: regulation of uterine genes by estradiol in ovariectomized mice 0.874 5.00E-04 GSE8104 GPL5137 Primary macrophage response to L. 
monocytogenes and bacteria-derived ligands 0.855 1.70E-03 GSE8100 GPL5137 WT and myd88-/- macrophage response to WT and hly- L. monocytogenes 0.847 6.00E-04 GSE2220 GPL1832 Genetic variation of gene expression is tissue specific in inbred mice 0.795 1.00E-02 GSE4|stren Behaves as a Weak Estrogen Rather than a Non-genomic Selective Activator in the Mouse Uterus 0.792 0.0034 GSE7029 GPL2510 Zfp90 Transgenic Signature in Mouse White Adipose Tissue 0.788 2.50E-03 GSE7615 GPL2884 Cancer Process Study 0.752 0.0107 GSE7600 GPL2884 Atm-/-, mTerc-/-, p53-/- triple knock-out lymphoma vs normal mouse DNA (GPL2884) 0.752 1.13E-02 GSE3086 GPL2510 Expression profiling of Adi|ose tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null backgrounds 0.726 0.009 GSE6192 GPL891 Gene expression changes during murine mucosal mast cell in vitro differentiation 0.709 1.76E-02 GSE4248 GPL891 Identification of genes regulated by RORg in mouse thymus 0.705 1.99E-02 GSE3087 GPL2510 Expression profiling of brain tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null background|nput Reduction 0.605 8.70E-03 GSE1013 GPL967 Gene Expression Profile of NHE1 Null Mutation 0.552 0.0134 GSE8625 GPL5530 Comparison of undifferentiated ES cell lines HM1, IMT11, SHBL6.3 0.488 1.82E-02 GSE8528 GPL5369 Expression analysis of gene differentially expressed in the developping ovary 0.455 0.0011 GSE3289 GPL2828 Chronic hypoxia alters the level, maturation and control of gene expression in mou|language: Java, PHP Other requirements: Java 1.5.0 or higher License: The tool is available free of charge Any restrictions to of use by non-academics: None List of abbreviations GEO: Gene Expression Omnibus; GO: Gene Ontology; GSE: Series in GEO; GPL: Platform in GEO; MeSH: Medical Subject Headings. 
Authors' contributions CF designed the system and wrote the manuscript; MA gave comments and edited the m|l E Koller D Kim SK A Gene-Coexpression Network for Global Discovery of Conserved Genetic Modules Science 2003 302 249 255 12934013 10.1126/science.1087447 Edgar R Domrachev M Lash AE Gene Expression Omnibus: NCBI gene expression and hybridization array data repository Nucleic Acids Res 2002 30 207 210 11752295 10.1093/nar/30.1.207 Parkinson H Kapushesky M Shojatalab M Abeygunawardena N Coulson R Farne A | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1013 | GSE7485 | 8/1/2007 | ['7485'] | [] | [u'17827287'] | 2168698 | [u'17827287'] | ['Schaefer', 'Antunes', 'Stevens', 'Ferreira', 'Qin', 'Ruby', 'Greenberg'] | ['Schaefer', 'Antunes', 'Stevens', 'Ferreira', 'Qin', 'Ruby', 'Greenberg'] | ['Schaefer', 'Antunes', 'Stevens', 'Ferreira', 'Qin', 'Ruby', 'Greenberg'] | J Bacteriol | 2007 | 2007 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
1014 | GSE7492 | 11/29/2007 | ['7492'] | [] | [u'18024685'] | 2168136 | [u'17933929'] | ['Wiedmann', 'Boor', 'Raengpradub'] | ['Loss', 'Orsi', 'Boor', 'Schwab', 'Hu', 'Raengpradub', 'Wiedmann'] | ['Wiedmann', 'Boor', 'Raengpradub'] | Appl Environ Microbiol | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1015 | GSE7492 | 11/29/2007 | ['7492'] | [] | [u'18024685'] | 2223194 | [u'18024685'] | ['Wiedmann', 'Boor', 'Raengpradub'] | ['Wiedmann', 'Boor', 'Raengpradub'] | ['Wiedmann', 'Boor', 'Raengpradub'] | Appl Environ Microbiol | 2008 | 2008 Jan | 0 | AND pmc_gds | 1 | 0 | ||||
1016 | GSE7493 | 4/11/2007 | ['7493'] | ['2823'] | [u'17463094'] | 2785812 | [u'19917117'] | ['Cleveland', 'Lobsiger', 'Boill\xc3\xa9e'] | ['Kroll', 'Barkema', 'Carlon'] | [] | Algorithms Mol Biol | 2009 | 11/16/2009 | 1 | d to the local physical neighbors and to the nearest-neighbor free energy. Table 1 Overview over organisms and number of CEL-files analyzed Organism GEO # Chiptype (dimension) # files min A. Thaliana GSE4847 ATH1-121501 (712 × 712) 18 0.0259 GSE7642 ATH1-121501 (712 × 712) 12 0.0544 GSE9311 ATH1-121501 (712 × 712) 8 0.0546 C. Elegans GSE6547 Celegans (712 × 712) 25 0.036|00d7; 712) 7 0.0396 D. Melanogaster GSE3990 Drosophila_2 (732 × 732) 6 0.0620 GSE6558 DrosGenome1 (640 × 640) 24 0.0605 D. Rerio GSE4859 Zebrafish (712 × 712) 8 0.0357 E. Coli GSE11779 E_coli_2 (478 × 478) 3 0.0869 GSE2928 Ecoli (544 × 544) 12 0.0172 GSE6195 E_coli_2 (478 × 478) 4 0.0664 H. Sapiens GSE10433 HG-U133A_2 (732 × 732) 12 0.0757 GSE5054|-U133A (712 × 712) 14 0.0296 GSE8514 HG-U133_Plus_2 (1164 × 1164) 15 0.0738 GSE11897 MOE430A (712 × 712) 11 0.0640 MOE430B (712 × 712) Mouse430_2 (1002 × 1002) GSE6210 Mouse430_2 12 0.0594 GSE6297 Mouse430_2 24 0.0325 O. Sativa GSE15071 Rice (1164 × 1164) 20 0.1157 R. Norvegicus GSE4494 RG_U34A (534 × 534) 59 0.0488 GSE7493{{tag}}--REUSE-- Rat230_2 (834 ×|2 (834 × 834) 4 0.0640 S. Aureus GSE7944 S_aureus (602 × 602) 6 0.0746 S. Cerevisiae GSE6073 YG_S98 (534 × 534) 12 0.0283 GSE8379 YG_S98 (534 × 534) 8 0.0180 X. Laevis GSE3368 Xenopus_laevis (712 × 712) 20 0.0514 Table 2 Optimized parameter values as obtained from the minimization of Eq. (3). A. Thaliana C. Elegans D. Melanogaster D. Rerio GEO no GSE4847 GSE7642|d in this work. Figure 5 Computed background vs. low expressed genes data . Experimental results (solid black line) and theoretical prediction (dashed red line) of probeset a) 256610_at (A. Thaliana, GSE4847, GSM109107.CEL) and b) 175270_at (C. Elegans, GSE8159, GSM201995.CEL). 
Discussion We have presented a background subtraction scheme for Affymetrix GeneExpression arrays which is both, accurate and | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1017 | GSE7495 | 4/12/2007 | ['7495'] | [] | [u'17496105'] | 1914138 | [u'17496105'] | ['Ruelland', 'Burketov\xc3\xa1', 'Zachowski', 'Flemr', 'Renou', 'Krinke', 'Valentov\xc3\xa1', 'Vergnolle', 'Taconnat', u'Martin-Magniette'] | ['Ruelland', 'Burketov\xc3\xa1', 'Zachowski', 'Flemr', 'Renou', 'Krinke', 'Valentov\xc3\xa1', 'Vergnolle', 'Taconnat'] | ['Ruelland', 'Burketov\xc3\xa1', 'Zachowski', 'Flemr', 'Renou', 'Krinke', 'Valentov\xc3\xa1', 'Vergnolle', 'Taconnat'] | Plant Physiol | 2007 | 2007 Jul | 0 | AND pmc_gds | 1 | 0 | ||||
1018 | GSE7496 | 10/18/2007 | ['7496'] | [] | [u'17850650'] | 2048963 | [u'17850650'] | ['Vink', 'van', 'Rodenburg', 'Keijer', 'Bovee-Oudenhoven', 'Roosing', 'Katan', 'Kramer'] | ['Vink', 'van', 'Rodenburg', 'Keijer', 'Bovee-Oudenhoven', 'Roosing', 'Katan', 'Kramer'] | ['Vink', 'van', 'Rodenburg', 'Keijer', 'Bovee-Oudenhoven', 'Roosing', 'Katan', 'Kramer'] | BMC Microbiol | 2007 | 9/12/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1019 | GSE7499 | 10/15/2007 | ['7499'] | [] | [u'17916261'] | 2222645 | [u'17916261'] | ['Ward', 'Caley', 'Stone', 'Kassahn', 'Crozier'] | ['Ward', 'Caley', 'Stone', 'Kassahn', 'Crozier'] | ['Ward', 'Caley', 'Stone', 'Kassahn', 'Crozier'] | BMC Genomics | 2007 | 10/5/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1020 | GSE7504 | 4/12/2007 | ['7504'] | [] | [u'17764573'] | 2025592 | [u'17764573'] | ['', 'Mantri', 'Coram', 'Pang', 'Ford'] | ['Mantri', 'Coram', 'Pang', 'Ford'] | ['Mantri', 'Coram', 'Pang', 'Ford'] | BMC Genomics | 2007 | 9/2/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1021 | GSE7508 | 4/13/2007 | ['7508'] | ['2883'] | [u'17545577'] | 1933515 | [u'17545577'] | ['Ciuffi', 'Leipzig', 'Bushman', 'Wang', 'Berry'] | ['Ciuffi', 'Leipzig', 'Bushman', 'Wang', 'Berry'] | ['Ciuffi', 'Wang', 'Leipzig', 'Berry', 'Bushman'] | Genome Res | 2007 | 2007 Aug | 0 | AND pmc_gds | 1 | 0 | ||||
1022 | GSE7509 | 4/30/2007 | ['7509'] | [] | [u'17502666'] | 2882835 | [u'20479118'] | ['', 'Steinman', 'Banerjee', 'Ravetch', 'Dhodapkar', 'Matayeva', 'Veri', 'Connolly', 'Kukreja'] | ['Hosmalin', 'Dalod', 'Schwartz-Cornil', 'Guiton', 'Crozat', 'Boudinot', 'Storset', 'Ventre', 'Vu', 'Feuillet', 'Dutertre', 'Contreras', 'Marvel', 'Baranek'] | [] | J Exp Med | 2010 | 6/7/2010 | 0 | iglech mouse genes in 96 different cell types or tissues, in human (top) and mouse (bottom), respectively. The human data were retrieved from the GEO database, normal tissues and cell types from the GSE7307 dataset, PBMC-derived macrophages from GSE4883, monocyte-derived DCs from GSE7509{{tag}}--REUSE--, monocyte-derived macrophages from GSM213500, and alveolar macrophages from GSE2125, and blood and tonsil DC subset|) of the EBI ArrayExpress database. The data for the other leukocyte subsets directly isolated from normal human blood were described previously ( Du et al., 2006 ; Robbins et al., 2008 ) and can be downloaded from http://www-microarrays.u-strasbg.fr/files/datasetsE.php . The data for the mouse were downloaded from the BioGPS public database ( http://biogps.gnf.org ). Green circles, pDCs (dark, blood; l|ration and analysis of microarray data for mouse and human DC subsets were described elsewhere ( Robbins et al., 2008 ; Zucchini et al., 2008 ). All datasets used are public and their references for download from databases are given in the legends of figures. A compendium of human hgu133plus2 Affymetrix.CEL files was established, using Bioconductor (release 2.5) in the R statistical environment (version | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1023 | GSE7509 | 4/30/2007 | ['7509'] | [] | [u'17502666'] | 2118610 | [u'17502666'] | ['', 'Steinman', 'Banerjee', 'Ravetch', 'Dhodapkar', 'Matayeva', 'Veri', 'Connolly', 'Kukreja'] | ['Steinman', 'Banerjee', 'Ravetch', 'Dhodapkar', 'Matayeva', 'Veri', 'Connolly', 'Kukreja'] | ['Steinman', 'Banerjee', 'Ravetch', 'Dhodapkar', 'Matayeva', 'Veri', 'Connolly', 'Kukreja'] | J Exp Med | 2007 | 6/11/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1024 | GSE7514 | 10/14/2007 | ['7514'] | [] | [u'17933929'] | 2168136 | [u'17933929'] | ['Loss', 'Orsi', 'Boor', 'Schwab', 'Hu', 'Raengpradub', 'Wiedmann'] | ['Loss', 'Orsi', 'Boor', 'Schwab', 'Hu', 'Raengpradub', 'Wiedmann'] | ['Loss', 'Orsi', 'Boor', 'Schwab', 'Hu', 'Raengpradub', 'Wiedmann'] | Appl Environ Microbiol | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1025 | GSE7517 | 11/3/2007 | ['7517'] | [] | [u'17965207'] | 2168140 | [u'17965207'] | ['Oliver', 'Orsi', 'Boor', 'Palmer', 'Raengpradub', 'Hu', 'Wiedmann'] | ['Oliver', 'Orsi', 'Boor', 'Palmer', 'Raengpradub', 'Hu', 'Wiedmann'] | ['Oliver', 'Orsi', 'Boor', 'Palmer', 'Raengpradub', 'Hu', 'Wiedmann'] | Appl Environ Microbiol | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1026 | GSE7518 | 4/17/2007 | ['7518'] | [] | [u'17616735'] | 1955693 | [u'17616735'] | ['Molina', 'Kahmann'] | ['Molina', 'Kahmann'] | ['Molina', 'Kahmann'] | Plant Cell | 2007 | 2007 Jul | 0 | AND pmc_gds | 1 | 0 | ||||
1027 | GSE7523 | 6/26/2007 | ['7523'] | [] | [] | 2375008 | [u'17727723'] | [u'Liu', u'Song'] | ['Zhu', 'Johnson', 'Zhang', 'Song', 'Li', 'Liu', 'Manrai', 'Chen'] | [u'Liu', u'Song'] | Genome Biol | 2007 | 2007 | 0 | Lowess normalized data; (e) MA2C (Simple) normalized data; (f) MA2C (Robust C = 2) normalized data. Different colors correspond to different samples. Spike-in experiment We used the data (GEO GSE7523{{tag}}--REUSE--) from a recent spike-in experiment to test MA2C. The spike-in samples contained 96 clones in the ENCODE region of approximately 500 bp, at 8 different concentrations corresponding to (2 n + 1)-fol | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1028 | GSE7525 | 5/20/2007 | ['7525'] | [] | [u'18713793'] | 2238714 | [u'18197176'] | ['', 'Erdogan', 'Zhang', 'Ullmann', 'Schubert', 'Tzschach', 'Larsen', 'Jurkatis', 'Tommerup', 'Chen', 'T\xc3\xbcmer', 'Jacobsen', 'Ropers'] | ['Bezalel', 'Kaganovich', 'Weinberger', 'Barkai', 'Tirosh'] | [] | Mol Syst Biol | 2008 | 2008 | 0 | rue for the evolution of gene expression and to examine the possible contribution to phenotypic adaptation and speciation. Materials and methods Prediction of changes in human TF-binding sequences We downloaded the mammalian promoters multiple alignments from the UCSC genome browser. We focused on proximal promoters (1 kb) of human, chimpanzee, mouse and rat and searched them for the presence of exact mat|rved in chimpanzee and mouse (or rat). Prediction of changes in yeast TF-binding sequences Promoters (600 bp) of S. cerevisiae , S. paradoxus , S. mikatae , S. kudriavzevii and S. bayanus were downloaded from SGD and aligned by clustelw. We searched these promoters for the presence of TF sequence motifs as defined by MacIsaac et al (2006) . For each TF, we calculated the position weight matrix s|ae array and each subsequent array, and divided the log 2 expression ratios of that array by the regression coefficient. Raw and normalized expression data are available at the GEO (Gene Expression Omnibus, http://www.ncbi.nlm.nih.gov/geo/ ) database with the series accession GSE7525{{tag}}--DEPOSIT--. Differential expression in the mating response To identify genes that are differentially expressed among the three spe | 0 | 1 | 0 | NOT pmc_gds | 0 | 1 |
1029 | GSE7536 | 8/14/2007 | ['7536'] | [] | [u'17660535'] | 2013721 | [u'17660535'] | ['Warchol', 'Lovett', 'Sajan'] | ['Warchol', 'Lovett', 'Sajan'] | ['Warchol', 'Lovett', 'Sajan'] | Genetics | 2007 | 2007 Sep | 0 | AND pmc_gds | 1 | 0 | ||||
1030 | GSE7540 | 4/26/2007 | ['7540'] | ['2678'] | [u'14557539'] | 2875035 | [u'20097653'] | ['Preuss', 'Kudo', 'Redmond', 'Geschwind', 'Zapala', 'C\xc3\xa1ceres', 'Lachuer', u'C\xe1ceres', 'Barlow', 'Lockhart'] | ['Ruppin', 'Sharan', 'Shlomi', 'Tuller', 'Waldman'] | [] | Nucleic Acids Res | 2010 | 2010 May | 0 | e). GE data All expression data was downloaded from Gene Expression Omnibus ( 34 ) ( http://www.ncbi.nlm.nih.gov/geo/ ). Human tissues (including fetal tissues): we used the GE of Su et al. ( 35 ) (GDS596). As the original data set is redundant (i.e. it includes similar tissues; for example, more than 20 of the tissues are from different parts of the brain) we focused our analysis on 30 (out of 79) n|ssues ( Supplementary Table S2 ). Other GE sets: fetal and adult circulating blood reticulocytes (GDS2655), Mouse tissues (GDS592), Mouse fetal and adult liver (GSE13149), Mouse embryonic stem cells (GDS2666), Yeast (GDS772, wild type), Chimpanzee (GSE7540{{tag}}--REUSE--), Rat (GDS589, three strains), E. coli (GSE6836), D. melanogaster (GSE7763) and C. elegans (GSE8004). We averaged technical repeats and probes | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1031 | GSE7540 | 4/26/2007 | ['7540'] | ['2678'] | [u'14557539'] | 2756411 | [u'18849986'] | ['Preuss', 'Kudo', 'Redmond', 'Geschwind', 'Zapala', 'C\xc3\xa1ceres', 'Lachuer', u'C\xe1ceres', 'Barlow', 'Lockhart'] | ['Iwamoto', 'Konopka', 'Oldham', 'Langfelder', 'Geschwind', 'Kato', 'Horvath'] | ['Geschwind'] | Nat Neurosci | 2008 | 2008 Nov | 0 | rray data Microarray data from human cerebral cortex, caudate nucleus and cerebellum were gathered from nine published studies 2 , 4 - 7 , 19 - 22 . Raw data from these studies are available in GEO ( GSE1572 (ref. 22 ), GSE3790 (ref. 6 ), GSE5392 (ref. 7 ), GSE7540{{tag}}--REUSE-- (ref. 19 ), GSE12649 (ref. 4 ) and GSE12654 (ref. 5 ) at http://www.ncbi.nlm.nih.gov/geo/ ) or ArrayExpress (E-AFMX-1 (re | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1032 | GSE7540 | 4/26/2007 | ['7540'] | ['2678'] | [u'14557539'] | 1933513 | [u'17609390'] | ['Preuss', 'Kudo', 'Redmond', 'Geschwind', 'Zapala', 'C\xc3\xa1ceres', 'Lachuer', u'C\xe1ceres', 'Barlow', 'Lockhart'] | ['Schork', 'C\xc3\xa1ceres', 'Zapala', 'Greenhall', 'Libiger', 'Barlow', 'Lockhart'] | ['C\xc3\xa1ceres', 'Zapala', 'Lockhart', 'Barlow'] | Genome Res | 2007 | 2007 Aug | 0 | sion data from a study on inflammatory bowel disease were analyzed ( Burczynski et al. 2006 ). Data for peripheral blood samples on Affymetrix HG-U133A arrays were obtained from GEO accession number GSE3365 ( http://www.ncbi.nlm.nih.gov/projects/geo/ ) and directly from the investigators. The aim of the study was to identify gene expression signatures from peripheral blood mononuclear cells that coul|etions or insertions and genes with different splice forms ( Winzeler et al. 1998 ; Hu et al. 2001 ; Li and Wong 2001 ). We further illustrated the additional information that can be generated with publicly available data files that contain detailed clinical or phenotypic information. Using GeSNP, we identified several well-known inflammatory bowel disease candidate genes and many new, promising candida|act Results Discussion Methods References Methods Computer software The algorithm was written in standard ANSI C++ and compiled to run on UNIX. The extensively commented source code is available for download from Supplemental materials at the Genome Research Web site and the GeSNP Web site, http://porifera.ucsd.edu/~cabney/cgi-bin/geSNP.cgi . In addition, the GeSNP Web site hosts a user-friendly Web-b|within a probe set, a greater t -value, and/or multiple probe sets representing a single gene provide increased confidence that a true sequence difference exists for that gene. Annotation files were downloaded from Affymetrix ( http://www.affymetrix.com/analysis/index.affx ). 
Additional information on candidate genes was obtained from NCBI’s Entrez ( http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?|Standard protocols were used for the generation of cDNA from RNA. Primers were designed to amplify the regions defined by the Affymetrix probe set target sequences of the selected genes, which can be downloaded from the Affymetrix Analysis Center Web site. Standard PCR reactions were performed on an Applied Biosystems GeneAmp PCR System 9700, and PCR products were purified using the recommended procedures|e accessed at http://porifera.ucsd.edu/~cabney/cgi-bin/geSNP.cgi . The Affymetrix CEL files for the mouse studies and the human/chimpanzee array data have been submitted to GEO under accession nos. GSE6238 and GSE7540{{tag}}--DEPOSIT-- , respectively.] Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.6307307 | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1033 | GSE7554 | 5/11/2007 | ['7554'] | [] | [u'17500590'] | 1876501 | [u'17500590'] | ['Renne', 'Riva', 'Skalsky', 'Baker', 'Maldonado', 'Samols', 'Lopez'] | ['Renne', 'Riva', 'Skalsky', 'Baker', 'Maldonado', 'Samols', 'Lopez'] | ['Renne', 'Riva', 'Skalsky', 'Baker', 'Maldonado', 'Samols', 'Lopez'] | PLoS Pathog | 2007 | 5/11/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1034 | GSE7556 | 8/24/2007 | ['7556'] | ['2963'] | [u'17726699'] | 2988731 | [u'21044322'] | ['Lam', 'Lee', 'Zhang', 'Chari', 'Buys', 'Ling', 'MacAulay'] | ['Ali', 'Hooks', 'Altman', 'Hurst', 'Murph', 'Callihan'] | [] | Mol Cancer | 2010 | 11/2/2010 | 0 | hased from Ambion (Austin, TX). RGS plasmids were purchased from the UMR cDNA Resource Center (Rolla, MO). Bioinformatics Gene expression profiling data were acquired through the NCBI Gene Expression Omnibus (GEO) DataSets. The datasets GSE7556{{tag}}--REUSE-- [ 15 ], GSE15709 [ 16 ] and GSE2058 (unpublished) were downloaded and mined using Microsoft Excel prior to further analysis. Hierarchical clustering analyses was |ossible roles for RGS proteins in ovarian cancer chemoresistance. To determine whether altered RGS expression correlates with acquired chemoresistance, we assessed RGS expression in multiple datasets downloaded from the NCBI Gene Expression Omnibus DataSets that contain whole-genome expression data in cultured ovarian cancer cell lines before and after acquired chemoresistance. Dataset GSE15709 describes | RGS transcripts and the cisplatin-resistant phenotype (Figure 1A ). We further compared the level of expression from individual RGS transcripts in parental A2780 and cisplatin-resistant cells using GSE15709 and determined that RGS2, RGS5, RGS10, and RGS17 were significantly lower in resistant cells than in parental cells (Figure 1A , ***p < 0.001, *p < 0.05). Multiple probes for RGS5|;ΔCt method. *p < 0.05, normalized control vs. RGS groups. To confirm these results, we assessed RGS expression using an additional independent microarray gene expression dataset. In GSE7556{{tag}}--REUSE--, transcript expression in SKOV-3 cells with acquired chemoresistance to vincristine was compared to parental vincristine-sensitive cells [ 15 ]. Comparison of the changes among RGS transcripts conf | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1035 | GSE7557 | 8/31/2007 | ['7557'] | [] | [u'17965775'] | 2040320 | [u'17965775'] | ['Jones', 'Broaddus', 'Wolters', 'Markovics', 'Cambier', 'Araya', 'Finkbeiner', 'Sheppard', 'Barzcak', 'Erle', 'Xiao', 'Hill', 'Jablons', 'Nishimura'] | ['Jones', 'Broaddus', 'Wolters', 'Markovics', 'Cambier', 'Araya', 'Finkbeiner', 'Sheppard', 'Barzcak', 'Erle', 'Xiao', 'Hill', 'Jablons', 'Nishimura'] | ['Jones', 'Broaddus', 'Wolters', 'Markovics', 'Cambier', 'Araya', 'Finkbeiner', 'Sheppard', 'Barzcak', 'Erle', 'Xiao', 'Hill', 'Jablons', 'Nishimura'] | J Clin Invest | 2007 | 2007 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
1036 | GSE7559 | 7/31/2007 | ['7559'] | [] | [u'17785531'] | 1987344 | [u'17785531'] | ['King', 'Van', 'Kaur', 'Hohmann', 'Schmid', 'Martin', 'Baliga', 'Pan', 'Reiss'] | ['King', 'Van', 'Kaur', 'Hohmann', 'Schmid', 'Martin', 'Baliga', 'Pan', 'Reiss'] | ['King', 'Van', 'Kaur', 'Hohmann', 'Schmid', 'Martin', 'Baliga', 'Pan', 'Reiss'] | Genome Res | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
1037 | GSE7560 | 11/10/2007 | ['7560'] | [] | [u'18158900'] | 2193228 | [u'18158900'] | ['', 'McLaughlin', 'Monet', 'Pidoux', 'Bonilla', 'Allshire', 'Richardson', 'Hamilton', 'Dunleavy', 'Ekwall'] | ['McLaughlin', 'Monet', 'Pidoux', 'Bonilla', 'Allshire', 'Richardson', 'Hamilton', 'Dunleavy', 'Ekwall'] | ['McLaughlin', 'Monet', 'Pidoux', 'Hamilton', 'Allshire', 'Richardson', 'Bonilla', 'Dunleavy', 'Ekwall'] | Mol Cell | 2007 | 12/28/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1038 | GSE7563 | 10/1/2007 | ['7563'] | [] | [u'17711579'] | 1988831 | [u'17711579'] | ['Soderlund', 'Brunborg', 'Lindeman', 'Andreassen', 'Komada', 'Olsen', 'Duale'] | ['Soderlund', 'Brunborg', 'Lindeman', 'Andreassen', 'Komada', 'Olsen', 'Duale'] | ['Soderlund', 'Brunborg', 'Lindeman', 'Andreassen', 'Komada', 'Olsen', 'Duale'] | Mol Cancer | 2007 | 8/21/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1039 | GSE7564 | 8/24/2007 | ['7564'] | [] | [u'17666523'] | 2658633 | [u'18535165'] | ['Angers', 'Xie', 'Gimble', 'Kang', 'Grissom', 'Wu', 'Jetten', 'Beak', 'Collins', 'Wada'] | ['Kang', 'Jetten', 'Wada', 'Xie'] | ['Kang', 'Jetten', 'Wada', 'Xie'] | Exp Biol Med (Maywood) | 2008 | 2008 Oct | 0 | of RORα and/or RORγ also affect the expression of many other genes in the liver (for the complete microarray data, see http://www.ncbi.nlm.nih.gov/geo/ (GEO Series accession number GSE7564{{tag}}--DEPOSIT-- )). Moreover, RORα sg/sg mice, but not the RORγ null mice, showed lower plasma triglyceride and cholesterol levels than the control mice. In contrast, RORγ null mice, but n | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1040 | GSE7567 | 7/30/2007 | ['7567'] | ['3070'] | [u'17601827'] | 1955718 | [u'17601827'] | ['Jiang', 'Ono', 'Takatsuji', 'Sugano', 'Nakayama', 'Toki', 'Shimono'] | ['Jiang', 'Ono', 'Takatsuji', 'Sugano', 'Nakayama', 'Toki', 'Shimono'] | ['Jiang', 'Ono', 'Takatsuji', 'Sugano', 'Nakayama', 'Toki', 'Shimono'] | Plant Cell | 2007 | 2007 Jun | 0 | AND pmc_gds | 1 | 0 | ||||
1041 | GSE7569 | 4/24/2007 | ['7569'] | [] | [u'17578908'] | 1899476 | [u'17578908'] | ['Kasukawa', 'Yamada', 'Hardin', 'Takahashi', 'Itoh', 'Matsumoto', 'Tanimura', 'Ueda', 'Houl', 'Dauwalder', 'Ukai-Tadenuma', u'Tadenuma', 'Uno'] | ['Kasukawa', 'Yamada', 'Hardin', 'Takahashi', 'Itoh', 'Matsumoto', 'Tanimura', 'Ueda', 'Houl', 'Dauwalder', 'Ukai-Tadenuma', 'Uno'] | ['Kasukawa', 'Yamada', 'Hardin', 'Takahashi', 'Itoh', 'Matsumoto', 'Tanimura', 'Houl', 'Dauwalder', 'Ueda', 'Ukai-Tadenuma', 'Uno'] | Genes Dev | 2007 | 7/1/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1042 | GSE7570 | 4/26/2007 | ['7570'] | [] | [u'18829985'] | 2593676 | [u'18829985'] | ['', 'Venger', 'Less', 'Blum', 'Morin', 'Aharoni', 'Eshed', 'Malitsky', 'Elbaz'] | ['Venger', 'Less', 'Blum', 'Morin', 'Aharoni', 'Eshed', 'Malitsky', 'Elbaz'] | ['Venger', 'Less', 'Blum', 'Morin', 'Eshed', 'Aharoni', 'Malitsky', 'Elbaz'] | Plant Physiol | 2008 | 2008 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1043 | GSE7571 | 12/29/2007 | ['7571'] | [] | [] | 2396644 | [u'18485203'] | [u'Papoutsakis', u'Wang', u'Windgassen'] | ['Papoutsakis', 'Wang', 'Windgassen'] | ['Papoutsakis', 'Wang', 'Windgassen'] | BMC Genomics | 2008 | 5/16/2008 | 0 | ontology assignment) was carried out using 'MultiExperiment Viewer (MeV)' from The Institute for Genomic Research (TIGR) [ 56 ]. Raw and normalized data were deposited in the Gene Expression Omnibus (GSE6607 (CD3+ T-cell experiment), GSE7571{{tag}}--DEPOSIT-- (CD4+ T-cell experiment) and GSE7572 (CD8+ T-cell experiment)) [ 57 ]. Within each population (three biological replicates using cells from different donors), mult | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1044 | GSE7571 | 12/29/2007 | ['7571'] | [] | [] | 2600644 | [u'18947405'] | [u'Papoutsakis', u'Wang', u'Windgassen'] | ['Papoutsakis', 'Wang', 'Windgassen'] | ['Papoutsakis', 'Wang', 'Windgassen'] | BMC Med Genomics | 2008 | 10/23/2008 | 0 | ical clustering, and Gene Ontology assignment) with 'MultiExperiment Viewer (MeV)' from The Institute for Genomic Research (TIGR) [ 11 ]. Raw and normalized data were deposited in the Gene Expression Omnibus (GSE6607 (CD3+ T-cell experiment), GSE7571{{tag}}--DEPOSIT-- (CD4+ T-cell experiment) and GSE7572 (CD8+ T-cell experiment)) [ 12 ]. Within each population (three biological replicates using cells from three different |J Liang W Bhagabati N Braisted J Klapa M Currier T Thiagarajan M TM4: a free, open-source system for microarray data management and analysis Biotechniques 2003 34 374 378 12613259 The Gene Expression Omnibus The Gene Ontology Consortium Website Hosack DA Dennis G Jr Sherman BT Lane HC Lempicki RA Identifying biological themes within lists of genes with EASE Genome Biol 2003 4 R70 14519205 10.1186/gb-2003 | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1045 | GSE7572 | 12/29/2007 | ['7572'] | [] | [] | 2396644 | [u'18485203'] | [u'Papoutsakis', u'Wang', u'Windgassen'] | ['Papoutsakis', 'Wang', 'Windgassen'] | ['Papoutsakis', 'Wang', 'Windgassen'] | BMC Genomics | 2008 | 5/16/2008 | 0 | ontology assignment) was carried out using 'MultiExperiment Viewer (MeV)' from The Institute for Genomic Research (TIGR) [ 56 ]. Raw and normalized data were deposited in the Gene Expression Omnibus (GSE6607 (CD3+ T-cell experiment), GSE7571 (CD4+ T-cell experiment) and GSE7572{{tag}}--DEPOSIT-- (CD8+ T-cell experiment)) [ 57 ]. Within each population (three biological replicates using cells from different donors), mult | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1046 | GSE7572 | 12/29/2007 | ['7572'] | [] | [] | 2600644 | [u'18947405'] | [u'Papoutsakis', u'Wang', u'Windgassen'] | ['Papoutsakis', 'Wang', 'Windgassen'] | ['Papoutsakis', 'Wang', 'Windgassen'] | BMC Med Genomics | 2008 | 10/23/2008 | 0 | tering, and Gene Ontology assignment) with 'MultiExperiment Viewer (MeV)' from The Institute for Genomic Research (TIGR) [ 11 ]. Raw and normalized data were deposited in the Gene Expression Omnibus (GSE6607 (CD3+ T-cell experiment), GSE7571 (CD4+ T-cell experiment) and GSE7572{{tag}}--DEPOSIT-- (CD8+ T-cell experiment)) [ 12 ]. Within each population (three biological replicates using cells from three different donors) | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1047 | GSE7573 | 4/26/2007 | ['7573'] | [] | [u'17494765'] | 1895976 | [u'17494765'] | ['Liao', 'Hyduke', 'Tran', 'Jarboe', 'Chou'] | ['Liao', 'Hyduke', 'Tran', 'Jarboe', 'Chou'] | ['Liao', 'Hyduke', 'Tran', 'Jarboe', 'Chou'] | Proc Natl Acad Sci U S A | 2007 | 5/15/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1048 | GSE7576 | 5/25/2007 | ['7576'] | [] | [u'17510325'] | 2831334 | [u'19969546'] | [u'Hackermueller', 'Hackerm\xc3\xbcller', 'Cheng', 'Stadler', 'Hofacker', 'Willingham', 'Tammana', 'Dumais', 'Dike', 'Gingeras', 'Nix', 'Duttagupta', 'Ganesh', 'Hertel', 'Drenkow', 'Ghosh', 'Helt', 'Kapranov', u'Madhavan', 'Sementchenko', 'Bell', 'Piccolboni', 'Cheung', 'Patel'] | ['Chen'] | [] | Nucleic Acids Res | 2010 | 2010 Mar | 0 | ates. The profiled RNAs were polyadenylated and >200 nt. The preprocessed probe signal and the transcribed regions (transfrags) were downloaded from the GEO database with the accession number GSE7576{{tag}}--REUSE-- ( http://www.ncbi.nlm.nih.gov/geo ). The signal thresholds for the transfrags correspond to a 4% false discovery rate. The probe intensity was further quantile-normalized for the cytosol and nucleu | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1049 | GSE7576 | 5/25/2007 | ['7576'] | [] | [u'17510325'] | 2919708 | [u'20385588'] | [u'Hackermueller', 'Hackerm\xc3\xbcller', 'Cheng', 'Stadler', 'Hofacker', 'Willingham', 'Tammana', 'Dumais', 'Dike', 'Gingeras', 'Nix', 'Duttagupta', 'Ganesh', 'Hertel', 'Drenkow', 'Ghosh', 'Helt', 'Kapranov', u'Madhavan', 'Sementchenko', 'Bell', 'Piccolboni', 'Cheung', 'Patel'] | ['Staller', 'Silva', 'Grosso', 'Fel\xc3\xadcio-Silva', 'Mollet', 'Carmo-Fonseca', 'Ben-Dov', 'Eleut\xc3\xa9rio', 'Alves'] | [] | Nucleic Acids Res | 2010 | 2010 Aug | 0 | tched against first exons, including the region 200 nt upstream. Tiling array data Tiling array data covering the non-repetitive portion of the human genome ( 25 ) were obtained from NCBI Geo Dataset GSE7576{{tag}}--REUSE--. These data related to the 2004 human genome assembly sequence (NCBI Build 35, hg17); therefore, the genomic coordinates of the tiling array probe signals were lifted to the March 2006 human genome | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1050 | GSE7576 | 5/25/2007 | ['7576'] | [] | [u'17510325'] | 2423295 | [u'18378698'] | [u'Hackermueller', 'Hackerm\xc3\xbcller', 'Cheng', 'Stadler', 'Hofacker', 'Willingham', 'Tammana', 'Dumais', 'Dike', 'Gingeras', 'Nix', 'Duttagupta', 'Ganesh', 'Hertel', 'Drenkow', 'Ghosh', 'Helt', 'Kapranov', u'Madhavan', 'Sementchenko', 'Bell', 'Piccolboni', 'Cheung', 'Patel'] | ['Subtil-Rodr\xc3\xadguez', 'Quiles', 'Beato', 'Jordan', 'Mill\xc3\xa1n-Ari\xc3\xb1o', 'Ballar\xc3\xa9'] | [] | Mol Cell Biol | 2008 | 2008 Jun | 0 | AND pmc_gds | 0 | 1 | ||||
1051 | GSE7578 | 5/1/2007 | ['7578'] | ['2724'] | [u'17452456'] | 2570201 | [u'18701473'] | ['Wiederschain', 'Taraszka', 'Jackson', 'Wang', 'Johnson', 'Fordjour', 'Morrissey', 'Bettano', 'Deeds', 'Mosher', 'Lengauer', 'Jones', 'Chen', 'Benson'] | ['Triche', 'van', 'Abdueva', 'Douglas', 'Peng', 'Shimada', 'Hsu', 'Hung', 'Cooper', 'Lawlor'] | [] | Cancer Res | 2008 | 8/15/2008 | 0 | AND pmc_gds | 0 | 1 | ||||
1052 | GSE7580 | 5/9/2007 | ['7580'] | [] | [u'17579518'] | 1892575 | [u'17579518'] | ['Bulski', u'Tanurdzic', 'Jiang', 'Dedhia', 'Agier', 'Tanurdzi\xc4\x87', u'Carasquillo', 'Lippman', 'McCombie', 'Colot', 'Doerge', 'Martienssen', 'Carrasquillo', 'Rabinowicz', 'Vaughn'] | ['Bulski', 'Jiang', 'Dedhia', 'Agier', 'Tanurdzi\xc4\x87', 'Lippman', 'McCombie', 'Colot', 'Doerge', 'Martienssen', 'Carrasquillo', 'Rabinowicz', 'Vaughn'] | ['Bulski', 'Jiang', 'Dedhia', 'Agier', 'Tanurdzi\xc4\x87', 'Lippman', 'McCombie', 'Colot', 'Doerge', 'Martienssen', 'Carrasquillo', 'Rabinowicz', 'Vaughn'] | PLoS Biol | 2007 | 2007 Jul | 0 | AND pmc_gds | 1 | 0 | ||||
1053 | GSE7586 | 7/1/2007 | ['7586'] | ['2822'] | [u'17579077'] | 2761903 | [u'19772654'] | ['Duffy', 'Lachowitzer', 'Fried', 'Mutabingwa', 'Muehlenbachs'] | ['Gillis', 'Pavlidis'] | [] | BMC Bioinformatics | 2009 | 9/22/2009 | 0 | experiment name, organism part, array design and age category for the experiments are listed in each column. Experiments used for analysis . Gemma ID Name Organism part Array Design Age category 622 GSE8586 Umbilical cord GPL570 Prenatal 726 GSE9164 Foreskin cells GPL5876 Prenatal 233 GSE1397 Brain, heart GPL96 Prenatal 215 khatua-astrocytoma Brain GPL91 Child/young adult 218 pomeroy-embryonal Brain, |Child/young adult 555 GSE5808 Blood cell GPL96 Child/young adult 585 GSE7586{{tag}}--REUSE-- Placenta GPL570 Adult 178 GSE80 Muscle GPL91 Adult 633 GSE8607 Testis GPL91 Adult 275 GSE4757 Brain GPL570 Older adult 721 GSE8919 Brain GPL2700 Older adult 263 GSE5281 Brain GPL570 Older adult To allow the investigation of differential expression over age, we computed a relative rank-based measure of expression level for each | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1054 | GSE7599 | 6/5/2007 | ['7599'] | [] | [u'17515920'] | 2614768 | [u'19050074'] | ['Feng', 'Gutierrez', 'Lin', 'Rowe', 'Zaghlul', 'Stratton', 'Kabbarah', 'Look', 'Futreal', 'Histen', 'Maser', "O'Neil", 'Fielding', 'Aster', 'Edkins', u'Depinho', 'Perna', 'Goldstone', 'Protopopov', 'Jiang', 'Foroni', 'Wong', 'Nogueira', 'DePinho', 'Martin', 'Mani', 'Choudhury', 'Wiedemeyer', 'Duke', 'Mansour', 'Wang', 'Stevens', 'Brennan', 'McNamara', 'Ivanova', 'Chin', 'Campbell'] | ['Kimmelman', 'Aguirre', 'Feng', 'Ponugoti', 'Chu', 'Redston', 'Nabioullin', 'Paik', 'Futreal', 'Zheng', 'Zhang', 'Protopopova', 'Hahn', 'Deroo', 'Protopopov', 'Klimstra', 'Tsao', 'Yang', 'DePinho', 'McGrath', 'Wang', 'Yeo', 'Xiao', 'Hezel', 'Sahin', 'Ivanova', 'Chin', 'Ying'] | ['Feng', 'Wang', 'Protopopov', 'Futreal', 'Ivanova', 'Chin', 'DePinho'] | Proc Natl Acad Sci U S A | 2008 | 12/9/2008 | 0 | High resolution aCGH and bioinformatic analysis identified recurrent copy number alterations (CNAs) in a collection of 14 human PDAC tumors and 15 cell lines [Fig. 1A and supporting information (SI) Table S1] (42){{tag}}--MENTION-- | 0 | 0 | 1 | NOT pmc_gds | 1 | 0 |
1055 | GSE7600 | 6/5/2007 | ['7600'] | [] | [u'17515920'] | 2748096 | [u'19728865'] | ['Feng', 'Gutierrez', 'Lin', 'Rowe', 'Zaghlul', 'Stratton', 'Kabbarah', 'Look', 'Futreal', 'Histen', 'Maser', "O'Neil", 'Fielding', 'Aster', 'Edkins', u'Depinho', 'Perna', 'Goldstone', 'Protopopov', 'Jiang', 'Foroni', 'Wong', 'Nogueira', 'DePinho', 'Martin', 'Mani', 'Choudhury', 'Wiedemeyer', 'Duke', 'Mansour', 'Wang', 'Stevens', 'Brennan', 'McNamara', 'Ivanova', 'Chin', 'Campbell'] | ['Makiguchi', 'Feng', 'Okuno', 'Tamon', 'Tsujimoto', 'Araki', 'Kunimoto', 'Niijima'] | ['Feng'] | BMC Genomics | 2009 | 9/3/2009 | 1 | from stomach subregions (Additional file 4 -CSV-Gene expression profile of mast cells pooled from mouse stomach subregions) [ 20 ]. In the score-ordered results of query-1 (P-value < 0.01), GSE1827 (titled "Waldman Bladder tumors") was ranked in fourth. Moreover, the top 10 entries showed appropriate annotations related to tumors, inflammatory and immune responses (Table 2 ). For the query-2|ts, and seven entries among the top 10 were observed using rat liver samples (Table 3 ). The biological relationships among the top 10 results of query-3 (P-value < 0.02) were not clear, but GSE6192 (titled "Gene expression changes during murine mucosal mast cell in vitro differentiation") was found out in the twelfth rank (Table 4 ). These findings indicate the general applicability of GEM-T| search using the gene expression profile of human bladder cancer - top 10 entries sorted by similarity scores with lowest P-value < 0.01. 
Series Platform Description Similarity score P-value GSE6112 GPL4475 Tubercolosis and healthy infected patients PBMC_TB_vs_Pool_LTBI 1 0.0054 GSE3901 GPL3279 Response of quiescent human fibroblasts to different growth factors and serum 0.96 0.0004 GSE1726 GP|2567 Human breast tumor 0.854 0.0063 GSE60 GPL174 Diffuse large B-cell lymphoma 0.851 0.0029 GSE838 GPL564 Individual-specific variation of gene expression in peripheral blood leukocytes 0.845 0.0057 GSE3176 GPL1528 p53 In Inflamatory Stress Response 0.815 0.0001 GSE344 GPL273 Spotted long oligonucleotide arrays 0.813 0.0097 GSE7965 GPL3991 Blood and Adipose tissue samples 0.805 1.50E-03 The results w| the gene expression profile of rat chemical hepatocarcinogenesis - top 10 entries sorted by similarity scores with lowest P-value < 0.01. Series Platform Description Similarity score P-value GSE5337 GPL890 Gene Expression Profiling In Rat Smooth Muscle Cells Modulated by Rapamycin and Paclitaxel. 1 1.00E-04 GSE5860 GPL890 Gene expression analysis of rat livers after exposure to acetaminophen 0|en (APAP) Rat Liver Test Gene Expression Data Set 0.895 1.00E-04 GSE5381 GPL890 Gene expression analysis of liver and kidney following methapyrilene treatment in male Sprague-Dawley rats 0.685 0.0015 GSE791 GPL542 GH inj old liver (1-7) 0.657 5.00E-04 GSE4270 GPL890 Aging Induced Alterations in Hepatic Gene Expression of the Male Fisher Rat 0.637 0.0045 GSE3608 GPL3076 Renal medullary genes in salt-sen|ssion profile of mast cells pooled from mouse stomach subregions - top 20 entries sorted by similarity scores with lowest P-value < 0.02. Series Platform Description Similarity score P-value GSE3088 GPL2510 Expression profiling of Muscle tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null backgrounds 1 1.00E-04 GSE2814 GPL2510 Expression profiling of liver tissue from (C57BL/6J ×|time course: regulation of uterine genes by estradiol in ovariectomized mice 0.874 5.00E-04 GSE8104 GPL5137 Primary macrophage response to L. 
monocytogenes and bacteria-derived ligands 0.855 1.70E-03 GSE8100 GPL5137 WT and myd88-/- macrophage response to WT and hly- L. monocytogenes 0.847 6.00E-04 GSE2220 GPL1832 Genetic variation of gene expression is tissue specific in inbred mice 0.795 1.00E-02 GSE4|stren Behaves as a Weak Estrogen Rather than a Non-genomic Selective Activator in the Mouse Uterus 0.792 0.0034 GSE7029 GPL2510 Zfp90 Transgenic Signature in Mouse White Adipose Tissue 0.788 2.50E-03 GSE7615 GPL2884 Cancer Process Study 0.752 0.0107 GSE7600{{tag}}--REUSE-- GPL2884 Atm-/-, mTerc-/-, p53-/- triple knock-out lymphoma vs normal mouse DNA (GPL2884) 0.752 1.13E-02 GSE3086 GPL2510 Expression profiling of Adi|ose tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null backgrounds 0.726 0.009 GSE6192 GPL891 Gene expression changes during murine mucosal mast cell in vitro differentiation 0.709 1.76E-02 GSE4248 GPL891 Identification of genes regulated by RORg in mouse thymus 0.705 1.99E-02 GSE3087 GPL2510 Expression profiling of brain tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null background|nput Reduction 0.605 8.70E-03 GSE1013 GPL967 Gene Expression Profile of NHE1 Null Mutation 0.552 0.0134 GSE8625 GPL5530 Comparison of undifferentiated ES cell lines HM1, IMT11, SHBL6.3 0.488 1.82E-02 GSE8528 GPL5369 Expression analysis of gene differentially expressed in the developping ovary 0.455 0.0011 GSE3289 GPL2828 Chronic hypoxia alters the level, maturation and control of gene expression in mou | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1056 | GSE7601 | 4/26/2007 | ['7601'] | [] | [u'17524146'] | 2206345 | [u'17524146'] | ['Kuipers', 'Klos', 'Zeidler', 'Rihl', 'Hess', 'Schrader'] | ['Kuipers', 'Klos', 'Zeidler', 'Rihl', 'Hess', 'Schrader'] | ['Schrader', 'Klos', 'Zeidler', 'Rihl', 'Hess', 'Kuipers'] | Arthritis Res Ther | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1057 | GSE7606 | 6/5/2007 | ['7606'] | [] | [u'17515920'] | 2875381 | [u'20520718'] | ['Feng', 'Gutierrez', 'Lin', 'Rowe', 'Zaghlul', 'Stratton', 'Kabbarah', 'Look', 'Futreal', 'Histen', 'Maser', "O'Neil", 'Fielding', 'Aster', 'Edkins', u'Depinho', 'Perna', 'Goldstone', 'Protopopov', 'Jiang', 'Foroni', 'Wong', 'Nogueira', 'DePinho', 'Martin', 'Mani', 'Choudhury', 'Wiedemeyer', 'Duke', 'Mansour', 'Wang', 'Stevens', 'Brennan', 'McNamara', 'Ivanova', 'Chin', 'Campbell'] | ['Nazarian', 'Chin', 'Duncan', 'Wagner', 'Feng', 'Xiao', 'Wu', 'Nogueira', 'Cordon-Cardo', 'Bosenberg', 'Kwong', 'Brennan', 'Scott', 'Kabbarah', 'Ramaswamy', 'Granter', 'Golub'] | ['Nogueira', 'Feng', 'Chin', 'Kabbarah', 'Brennan'] | PLoS One | 2010 | 5/24/2010 | 0 | clinical and histopathologic characteristics of these samples are summarized in Supplemental Tables S1 and S2 , and the array-CGH profiles are available online at GEO under super-series accession #GSE7606{{tag}}--DEPOSIT--. Raw array-CGH profiles were processed by a modified circular binary segmentation (CBS) algorithm [11] , [12] , and copy number aberrations (CNAs), represented by |taining 22,500 elements designed for expression profiling (Human 1A V2, Agilent Technologies). All data is MIAME compliant, and the raw data has been deposited in to GEO under super-series accession #GSE7606{{tag}}--DEPOSIT--. Using NCBI Build 35, 16,097 unique map positions were defined with a median interval between mapped elements of 54.8 Kb. Fluorescence ratios of scanned images were calculated as the average of two | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1058 | GSE7615 | 6/5/2007 | ['7615'] | [] | [u'17515920'] | 2748096 | [u'19728865'] | ['', 'Feng', 'Gutierrez', 'Lin', 'Rowe', 'Zaghlul', 'Stratton', 'Kabbarah', 'Look', 'Futreal', 'Histen', 'Maser', "O'Neil", 'Fielding', 'Aster', 'Edkins', 'Perna', 'Goldstone', 'Protopopov', 'Jiang', 'Foroni', 'Wong', 'Nogueira', 'DePinho', 'Martin', 'Mani', 'Choudhury', 'Wiedemeyer', 'Duke', 'Mansour', 'Wang', 'Stevens', 'Brennan', 'McNamara', 'Ivanova', 'Chin', 'Campbell'] | ['Makiguchi', 'Feng', 'Okuno', 'Tamon', 'Tsujimoto', 'Araki', 'Kunimoto', 'Niijima'] | ['Feng'] | BMC Genomics | 2009 | 9/3/2009 | 1 | from stomach subregions (Additional file 4 -CSV-Gene expression profile of mast cells pooled from mouse stomach subregions) [ 20 ]. In the score-ordered results of query-1 (P-value < 0.01), GSE1827 (titled "Waldman Bladder tumors") was ranked in fourth. Moreover, the top 10 entries showed appropriate annotations related to tumors, inflammatory and immune responses (Table 2 ). For the query-2|ts, and seven entries among the top 10 were observed using rat liver samples (Table 3 ). The biological relationships among the top 10 results of query-3 (P-value < 0.02) were not clear, but GSE6192 (titled "Gene expression changes during murine mucosal mast cell in vitro differentiation") was found out in the twelfth rank (Table 4 ). These findings indicate the general applicability of GEM-T| search using the gene expression profile of human bladder cancer - top 10 entries sorted by similarity scores with lowest P-value < 0.01. 
Series Platform Description Similarity score P-value GSE6112 GPL4475 Tubercolosis and healthy infected patients PBMC_TB_vs_Pool_LTBI 1 0.0054 GSE3901 GPL3279 Response of quiescent human fibroblasts to different growth factors and serum 0.96 0.0004 GSE1726 GP|2567 Human breast tumor 0.854 0.0063 GSE60 GPL174 Diffuse large B-cell lymphoma 0.851 0.0029 GSE838 GPL564 Individual-specific variation of gene expression in peripheral blood leukocytes 0.845 0.0057 GSE3176 GPL1528 p53 In Inflamatory Stress Response 0.815 0.0001 GSE344 GPL273 Spotted long oligonucleotide arrays 0.813 0.0097 GSE7965 GPL3991 Blood and Adipose tissue samples 0.805 1.50E-03 The results w| the gene expression profile of rat chemical hepatocarcinogenesis - top 10 entries sorted by similarity scores with lowest P-value < 0.01. Series Platform Description Similarity score P-value GSE5337 GPL890 Gene Expression Profiling In Rat Smooth Muscle Cells Modulated by Rapamycin and Paclitaxel. 1 1.00E-04 GSE5860 GPL890 Gene expression analysis of rat livers after exposure to acetaminophen 0|en (APAP) Rat Liver Test Gene Expression Data Set 0.895 1.00E-04 GSE5381 GPL890 Gene expression analysis of liver and kidney following methapyrilene treatment in male Sprague-Dawley rats 0.685 0.0015 GSE791 GPL542 GH inj old liver (1-7) 0.657 5.00E-04 GSE4270 GPL890 Aging Induced Alterations in Hepatic Gene Expression of the Male Fisher Rat 0.637 0.0045 GSE3608 GPL3076 Renal medullary genes in salt-sen|ssion profile of mast cells pooled from mouse stomach subregions - top 20 entries sorted by similarity scores with lowest P-value < 0.02. Series Platform Description Similarity score P-value GSE3088 GPL2510 Expression profiling of Muscle tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null backgrounds 1 1.00E-04 GSE2814 GPL2510 Expression profiling of liver tissue from (C57BL/6J ×|time course: regulation of uterine genes by estradiol in ovariectomized mice 0.874 5.00E-04 GSE8104 GPL5137 Primary macrophage response to L. 
monocytogenes and bacteria-derived ligands 0.855 1.70E-03 GSE8100 GPL5137 WT and myd88-/- macrophage response to WT and hly- L. monocytogenes 0.847 6.00E-04 GSE2220 GPL1832 Genetic variation of gene expression is tissue specific in inbred mice 0.795 1.00E-02 GSE4|stren Behaves as a Weak Estrogen Rather than a Non-genomic Selective Activator in the Mouse Uterus 0.792 0.0034 GSE7029 GPL2510 Zfp90 Transgenic Signature in Mouse White Adipose Tissue 0.788 2.50E-03 GSE7615{{tag}}--REUSE-- GPL2884 Cancer Process Study 0.752 0.0107 GSE7600 GPL2884 Atm-/-, mTerc-/-, p53-/- triple knock-out lymphoma vs normal mouse DNA (GPL2884) 0.752 1.13E-02 GSE3086 GPL2510 Expression profiling of Adi|ose tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null backgrounds 0.726 0.009 GSE6192 GPL891 Gene expression changes during murine mucosal mast cell in vitro differentiation 0.709 1.76E-02 GSE4248 GPL891 Identification of genes regulated by RORg in mouse thymus 0.705 1.99E-02 GSE3087 GPL2510 Expression profiling of brain tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null background|nput Reduction 0.605 8.70E-03 GSE1013 GPL967 Gene Expression Profile of NHE1 Null Mutation 0.552 0.0134 GSE8625 GPL5530 Comparison of undifferentiated ES cell lines HM1, IMT11, SHBL6.3 0.488 1.82E-02 GSE8528 GPL5369 Expression analysis of gene differentially expressed in the developping ovary 0.455 0.0011 GSE3289 GPL2828 Chronic hypoxia alters the level, maturation and control of gene expression in mou | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1059 | GSE7615 | 6/5/2007 | ['7615'] | [] | [u'17515920'] | 2150982 | [u'18070937'] | ['', 'Feng', 'Gutierrez', 'Lin', 'Rowe', 'Zaghlul', 'Stratton', 'Kabbarah', 'Look', 'Futreal', 'Histen', 'Maser', "O'Neil", 'Fielding', 'Aster', 'Edkins', 'Perna', 'Goldstone', 'Protopopov', 'Jiang', 'Foroni', 'Wong', 'Nogueira', 'DePinho', 'Martin', 'Mani', 'Choudhury', 'Wiedemeyer', 'Duke', 'Mansour', 'Wang', 'Stevens', 'Brennan', 'McNamara', 'Ivanova', 'Chin', 'Campbell'] | ['Feng', 'Moreau', 'Maser', 'Tchinda', "O'Neil", 'Neuberg', 'DeAngelo', 'Li', 'Liu', 'Look', 'Wong', 'DePinho', 'Gutierrez', 'Silverman', 'Chin', 'McKenna', 'Lee', 'Kutok', 'Rothstein'] | ['Feng', 'Maser', 'Gutierrez', "O'Neil", 'Wong', 'Chin', 'DePinho', 'Look'] | J Exp Med | 2007 | 12/24/2007 | 0 | 44K (cell lines) or 244K (primary samples) 60-mer oligonucleotides (Agilent Technologies), as previously described ( 33 ). Microarray data have been deposited in the GEO database under accession no. GSE7615{{tag}}--DEPOSIT-- . FISH. FISH was performed with a commercially available probe mix containing a MYB (6q23) probe labeled with spectrum aqua and a centromeric probe for chromosome 6 labeled with spectrum orang | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1060 | GSE7615 | 6/5/2007 | ['7615'] | [] | [u'17515920'] | 2714968 | [u'17515920'] | ['', 'Feng', 'Gutierrez', 'Lin', 'Rowe', 'Zaghlul', 'Stratton', 'Kabbarah', 'Look', 'Futreal', 'Histen', 'Maser', "O'Neil", 'Fielding', 'Aster', 'Edkins', 'Perna', 'Goldstone', 'Protopopov', 'Jiang', 'Foroni', 'Wong', 'Nogueira', 'DePinho', 'Martin', 'Mani', 'Choudhury', 'Wiedemeyer', 'Duke', 'Mansour', 'Wang', 'Stevens', 'Brennan', 'McNamara', 'Ivanova', 'Chin', 'Campbell'] | ['Feng', 'Gutierrez', 'Lin', 'Rowe', 'Zaghlul', 'Stratton', 'Kabbarah', 'Look', 'Futreal', 'Histen', 'Maser', "O'Neil", 'Fielding', 'Aster', 'Edkins', 'Perna', 'Goldstone', 'Protopopov', 'Jiang', 'Foroni', 'Wong', 'Nogueira', 'DePinho', 'Martin', 'Mani', 'Choudhury', 'Wiedemeyer', 'Duke', 'Mansour', 'Wang', 'Stevens', 'Brennan', 'McNamara', 'Ivanova', 'Chin', 'Campbell'] | ['Feng', 'Gutierrez', 'Lin', 'Rowe', 'Zaghlul', 'Stratton', 'Kabbarah', 'Look', 'Futreal', 'Histen', 'Maser', "O'Neil", 'Fielding', 'Aster', 'Edkins', 'Perna', 'Goldstone', 'Protopopov', 'Jiang', 'Foroni', 'Wong', 'Nogueira', 'DePinho', 'Martin', 'Mani', 'Choudhury', 'Wiedemeyer', 'Duke', 'Mansour', 'Wang', 'Stevens', 'Brennan', 'McNamara', 'Ivanova', 'Chin', 'Campbell'] | Nature | 2007 | 6/21/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1061 | GSE7620 | 4/27/2007 | ['7620'] | [] | [u'17616984'] | 1913099 | [u'17616984'] | ['Fischetti', 'Kirk', 'Schuch', 'Euler', 'Ryan'] | ['Fischetti', 'Kirk', 'Schuch', 'Euler', 'Ryan'] | ['Fischetti', 'Kirk', 'Schuch', 'Euler', 'Ryan'] | PLoS Comput Biol | 2007 | 2007 Jul | 0 | AND pmc_gds | 1 | 0 | ||||
1062 | GSE7621 | 6/14/2007 | ['7621'] | ['2821'] | [u'17571925'] | 2620272 | [u'19014681'] | ['Lesnick', 'Papapetropoulos', 'Maraganore', 'Shehadeh', 'de', 'Ffrench-Mullen', 'Mash', 'Rocca', 'Henley', 'Ahlskog'] | ['Tozeren', 'Gormley'] | [] | BMC Bioinformatics | 2008 | 11/17/2008 | 1 | or therapeutic targets in tissue-specific diseases. Table 1 Microarray datasets used in this study Tissue Phenotype Data Tissue No. of Samples Gene Expression Omnibus/Array Express Accn. # Adipose 10 GSE3526 Adrenal 20 GSE3526, GSE8514, GSE2316 Brain 89 GSE3526, GSE7621{{tag}}--REUSE--, GSE7307, GSE2361, E_AFMX-11, E-TABM-20, Colon 10 E-TABM-176, GSE8671, GSE9254, GSE9452 Epidermal 25 GSE1133, GSE2361, GSE3419, GSE352|38 E_AFMX-11, E-MIMR-27, GSE1133, GSE2240, GSE2361, GSE3526, GSE3585, GSE7307 Kidney 10 E_AFMX-11, GSE2004, GSE2361, GSE3526, GSE7392 Liver 10 E_AFMX-11, GSE2004, GSE3526, GSE6764 Lung 26 E-MEXP-231, GSE10072, GSE1133, GSE2361, GSE3526 Mammary 15 E-TABM-66, GSE2361, GSE3526, GSE7307, GSE7904 Muscle 64 GSE10760, GSE2328, GSE3526, GSE5110, GSE6798, GSE7307, GSE9103, Ovary 10 GSE2361, GSE3526, GSE6008, GS|ancreas 6 GSE1133, GSE2361, GSE7307 Peripheral blood 12 GSE7462, GSE8608, GSE8668, GSE8762, GSE9692 Small intestine 7 GSE2361, GSE7307 Spleen 12 GSE2004, GSE2361, GSE3526, GSE7307 Stomach 10 GSE2361, GSE3526, GSE7307 Testis 38 E_AFMX-11, GSE1133, GSE2361, GSE3218, GSE3526, GSE7307, GSE7808 Thymus 5 GSE1133, GSE2361, GSE7307 Infectious Disease Disease No. of Samples Gene Expression Omnibus/Array Express | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1063 | GSE7621 | 6/14/2007 | ['7621'] | ['2821'] | [u'17571925'] | 2654916 | [u'19305504'] | ['Lesnick', 'Papapetropoulos', 'Maraganore', 'Shehadeh', 'de', 'Ffrench-Mullen', 'Mash', 'Rocca', 'Henley', 'Ahlskog'] | ['Matigian', 'Silburn', 'Wells', 'Chalk', 'Anderson', 'Mellick', 'Mackay-Sim', 'Sutherland'] | [] | PLoS One | 2009 | 2009 | 0 | CB , PT 4 4 4 MSA U133A RMA ANOVA Zhang 15965975 - SN ,PT, BA9 15 19 - U133A RMA ANOVA+FDR Miller 16143538 - SN, STR 6&4 8&4 - CodeLink CodeLink Student t-test Moran 16344956 GSE8397 LSN, MSN, SFG 15&9&5 8&7&3 - U133A+B GC-RMA+MAS5+PLIER two-class unpaired+FDR Lesnick 17571925 GSE7621{{tag}}--REUSE-- SN 16 9 - U133 plus 2.0 MAS5 A|n on a single gene or pathway basis. Materials and Methods We conducted literature searches in National Center for Biotechnology Information (NCBI) PubMed and dataset searches in NCBI gene expression omnibus and ArrayExpress (EBI) [48] , [49] to identify all reported microarray studies that explored differential gene expression in Parkinson disease. Studies satisfied th | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1064 | GSE7621 | 6/14/2007 | ['7621'] | ['2821'] | [u'17571925'] | 2957424 | [u'20976054'] | ['Lesnick', 'Papapetropoulos', 'Maraganore', 'Shehadeh', 'de', 'Ffrench-Mullen', 'Mash', 'Rocca', 'Henley', 'Ahlskog'] | ['Shamir', 'Karp', 'Ulitsky', 'Krishnamurthy'] | [] | PLoS One | 2010 | 10/19/2010 | 0 | pone.0013367.t001 Table 1 Gene expression datasets used in this study. Dataset KEGG pathway Reference GEO accession Number of cases Number of controls AD Alzheimer's disease (AD) [41] GSE5281 10 13 ASTHMA Asthma [46] GSE4302 42 28 PYLORI Epithelial cell signaling in Helicobacter pylori infection - GSE5081 8 8 HD Huntington's disease (HD) [48] GSE3790 38 3|-GLIOBLASTOMA Pathways in cancer [47] GSE4290 77 23 SUN-ASTROCYTOMA Pathways in cancer [47] GSE4290 26 23 SUN-OLIGODENDROGLIOMA Pathways in cancer [47] GSE4290 50 23 ESTILO-OTSCC Pathways in cancer [44] GSE13601 31 26 YE-OTSCC Pathways in cancer [45] GSE9844 26 12 MORAN-PD Parkinson's disease (PD) [42] GSE83|SLE Systemic lupus erythematosus (SLE) [49] GSE8650 38 21 Each dataset contained a comparison of sick individuals and healthy controls. All the data were obtained from Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/ ). We first evaluated the performance of different variants of our algorithm and found that DEGAS usually identified the smallest pathways ( Text S1 and Figu | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1065 | GSE7621 | 6/14/2007 | ['7621'] | ['2821'] | [u'17571925'] | 2944782 | [u'20885780'] | ['Lesnick', 'Papapetropoulos', 'Maraganore', 'Shehadeh', 'de', 'Ffrench-Mullen', 'Mash', 'Rocca', 'Henley', 'Ahlskog'] | ['Klassen', 'Khush', 'Morgan', 'Valantine', 'Dudley', 'Li', 'Sigdel', 'Sarwal', 'Kambham', 'Chen', 'Butte', 'Hsieh', 'Caohuu'] | [] | PLoS Comput Biol | 2010 | 9/23/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
1066 | GSE7621 | 6/14/2007 | ['7621'] | ['2821'] | [u'17571925'] | 2817002 | [u'20161708'] | ['Lesnick', 'Papapetropoulos', 'Maraganore', 'Shehadeh', 'de', 'Ffrench-Mullen', 'Mash', 'Rocca', 'Henley', 'Ahlskog'] | ['Singer', 'Papapetropoulos', 'Wang', 'Vance', 'Shehadeh', 'Guevara', 'Yu'] | ['Shehadeh', 'Papapetropoulos'] | PLoS One | 2010 | 2/8/2010 | 0 | AND pmc_gds | 1 | 0 | ||||
1067 | GSE7621 | 6/14/2007 | ['7621'] | ['2821'] | [u'17571925'] | 1904362 | [u'17571925'] | ['Lesnick', 'Papapetropoulos', 'Maraganore', 'Shehadeh', 'de', 'Ffrench-Mullen', 'Mash', 'Rocca', 'Henley', 'Ahlskog'] | ['Lesnick', 'Papapetropoulos', 'Maraganore', 'Shehadeh', 'de', 'Ffrench-Mullen', 'Mash', 'Rocca', 'Henley', 'Ahlskog'] | ['Lesnick', 'Papapetropoulos', 'Maraganore', 'Shehadeh', 'de', 'Ffrench-Mullen', 'Mash', 'Rocca', 'Henley', 'Ahlskog'] | PLoS Genet | 2007 | 2007 Jun | 0 | AND pmc_gds | 1 | 0 | ||||
1068 | GSE7622 | 4/27/2007 | ['7622'] | [] | [u'18366709'] | 2323391 | [u'18366709'] | ['Sreevatsan', 'Zhu', 'Paustian', 'Robbe-Austerman', 'Kapur', 'Bannantine'] | ['Sreevatsan', 'Zhu', 'Paustian', 'Robbe-Austerman', 'Kapur', 'Bannantine'] | ['Sreevatsan', 'Zhu', 'Paustian', 'Robbe-Austerman', 'Kapur', 'Bannantine'] | BMC Genomics | 2008 | 3/20/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1069 | GSE7625 | 10/1/2007 | ['7625'] | [] | [u'18055543'] | 2111120 | [u'18055543'] | ['Noguchi', 'Yoshikawa', 'Abe', 'Mizushima', 'Nakagama', 'Shang', 'Akatsuka', 'Ohara', 'Naito', 'Toyokuni', 'Liu', u'Kodama', 'Izumiya', 'Dutta'] | ['Noguchi', 'Yoshikawa', 'Abe', 'Mizushima', 'Nakagama', 'Shang', 'Akatsuka', 'Ohara', 'Naito', 'Toyokuni', 'Liu', 'Izumiya', 'Dutta'] | ['Noguchi', 'Yoshikawa', 'Abe', 'Mizushima', 'Nakagama', 'Shang', 'Akatsuka', 'Ohara', 'Naito', 'Toyokuni', 'Liu', 'Izumiya', 'Dutta'] | Am J Pathol | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1070 | GSE7627 | 5/4/2007 | ['7627'] | [] | [u'17470555'] | 1951470 | [u'17470555'] | ['Lieb', 'Liang', 'Hogan', 'Fang', 'Zhang'] | ['Lieb', 'Liang', 'Hogan', 'Fang', 'Zhang'] | ['Lieb', 'Liang', 'Hogan', 'Fang', 'Zhang'] | Mol Cell Biol | 2007 | 2007 Jul | 0 | AND pmc_gds | 1 | 0 | ||||
1071 | GSE7631 | 12/19/2007 | ['7631'] | [] | [u'18180456'] | 2206617 | [u'18180456'] | ['Coruzzi', 'Birnbaum', 'Dean', 'Gutierrez', 'Gifford'] | ['Coruzzi', 'Birnbaum', 'Dean', 'Gutierrez', 'Gifford'] | ['Coruzzi', 'Birnbaum', 'Dean', 'Gutierrez', 'Gifford'] | Proc Natl Acad Sci U S A | 2008 | 1/15/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1072 | GSE7644 | 7/9/2007 | ['7644'] | [] | [u'17578907'] | 1899475 | [u'17578907'] | ['', 'Rosbash', 'McDonald', 'Stoleru', 'Kadener', 'Nawathean'] | ['McDonald', 'Stoleru', 'Kadener', 'Nawathean', 'Rosbash'] | ['McDonald', 'Stoleru', 'Kadener', 'Nawathean', 'Rosbash'] | Genes Dev | 2007 | 7/1/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1073 | GSE7646 | 7/9/2007 | ['7646'] | [] | [u'17578907'] | 1899475 | [u'17578907'] | ['', 'Rosbash', 'McDonald', 'Stoleru', 'Kadener', 'Nawathean'] | ['McDonald', 'Stoleru', 'Kadener', 'Nawathean', 'Rosbash'] | ['McDonald', 'Stoleru', 'Kadener', 'Nawathean', 'Rosbash'] | Genes Dev | 2007 | 7/1/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1074 | GSE7648 | 10/2/2007 | ['7648'] | ['3001'] | [u'17550900'] | 1951960 | [u'17470548'] | ['Landschulz', 'Haasch', 'Wilcox', 'Lin', 'Rondinone', 'Jung', 'Jacobson', 'Nguyen', 'Bush', 'Yang', 'Voorbach', 'Brodjian', 'Collins', 'Trevillyan', 'Surowy', 'Doktor'] | ['Kasler', 'Verdin'] | [] | Mol Cell Biol | 2007 | 2007 Jul | 0 | AND pmc_gds | 0 | 1 | ||||
1075 | GSE7650 | 5/1/2007 | ['7650'] | [] | [u'17675379'] | 2045235 | [u'17675379'] | ['Terabayashi', 'Miyakoshi', 'Shintani', 'Nojiri', 'Yamane', 'Kai'] | ['Terabayashi', 'Miyakoshi', 'Shintani', 'Nojiri', 'Yamane', 'Kai'] | ['Terabayashi', 'Miyakoshi', 'Shintani', 'Nojiri', 'Yamane', 'Kai'] | J Bacteriol | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
1076 | GSE7651 | 7/9/2007 | ['7651'] | [] | [u'17578907'] | 1899475 | [u'17578907'] | ['', 'Rosbash', 'McDonald', 'Stoleru', 'Kadener', 'Nawathean'] | ['McDonald', 'Stoleru', 'Kadener', 'Nawathean', 'Rosbash'] | ['McDonald', 'Stoleru', 'Kadener', 'Nawathean', 'Rosbash'] | Genes Dev | 2007 | 7/1/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1077 | GSE7652 | 7/9/2007 | ['7652'] | [] | [u'17578907'] | 1899475 | [u'17578907'] | ['', 'Rosbash', 'McDonald', 'Stoleru', 'Kadener', 'Nawathean'] | ['McDonald', 'Stoleru', 'Kadener', 'Nawathean', 'Rosbash'] | ['McDonald', 'Stoleru', 'Kadener', 'Nawathean', 'Rosbash'] | Genes Dev | 2007 | 7/1/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1078 | GSE7653 | 7/9/2007 | ['7653'] | [] | [u'17578907'] | 1899475 | [u'17578907'] | ['', 'Rosbash', 'McDonald', 'Stoleru', 'Kadener', 'Nawathean'] | ['McDonald', 'Stoleru', 'Kadener', 'Nawathean', 'Rosbash'] | ['McDonald', 'Stoleru', 'Kadener', 'Nawathean', 'Rosbash'] | Genes Dev | 2007 | 7/1/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1079 | GSE7660 | 4/28/2007 | ['7660'] | [] | [u'17560372'] | 2680204 | [u'19216778'] | ['Urban', 'Wanke', 'Huber', 'De', 'Lippman', 'Broach', 'Mukhopadhyay', 'Anrather', 'Soulard', 'Riezman', 'Loewith', 'Ammerer', 'Hall', 'Deloche'] | ['Chiogna', 'Risso', 'Massa', 'Romualdi'] | [] | BMC Bioinformatics | 2009 | 2/13/2009 | 0 | f the top ranking gene lists: 20, 50, 100, 500, and 600. 4.3.4 Real Data We used two cDNA expression datasets and two oligonucleotide datasets to validate our simulation results. All the datasets are publicly available at the GEO database. Baird et al. [ 24 ] (hereafter dataset A) studied expression profiling of 181 tumors representing various classes of bone and soft tissue sarcomas. In this study, we se|ference was obtained by pooling sarcoma cell lines. Expression datasets and platform annotation are available on the NCBI GEO database with platform identification number GPL1977 and reference series GSE2553. Urban et al. [ 25 ] (hereafter dataset B) analysed the rapamycin response in Saccharomyces cerevisiae. Global transcriptional analysis of rapamycin response was conducted on cells expressing eithe|M185498, GSM185503, GSM185504, GSM185518, GSM185519. Expression datasets and platform annotation are available on the NCBI GEO database with platform identification number GPL884 and reference series GSE7660{{tag}}--REUSE--. Smith et al. [ 26 ] (hereafter dataset C) studied the expression profiles of transcription factor deletion strains in the presence of oleate. mRNA levels in each of four deletion strains (delta_OA|onsidered only the delta_ADR1 samples. Expression datasets and platform annotation are available on the NCBI GEO database with platform identification number GPL4287 and GPL4303, and reference series GSE5862. 
De Pittà and colleagues [ 27 ] (hereafter dataset D) obtained expression profiling of bone marrow from paediatric patients with acute lymphoblastic leukemia (ALL) using a dedicated muscle |, Europe) prepared from male fetal skeletal muscle. Expression datasets and platform annotation are available on the NCBI GEO database with platform identification number GPL2011 and reference series GSE2604. By the analysis of four datasets, we are able to test the normalisation procedures in different situations: (i) either a large (dataset B) or a small (dataset C) proportion of genes expected to be | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1080 | GSE7664 | 4/30/2007 | ['7664'] | ['2778'] | [u'17572062'] | 2644708 | [u'19055840'] | ['King', 'Arbieva', 'Prabhakar', 'Jayaraman', 'Gavin', 'Gillis'] | ['Bagby', 'Sears', 'Kulesz-Martin', 'Pelz'] | [] | BMC Bioinformatics | 2008 | 12/4/2008 | 0 | catter plot to compare pairs of samples, we expect to see most transcripts centered along the diagonal line. When this is not the case, further normalization may be required. We have examined over 30 publicly available datasets, and found many to contain samples with systematic non-linear distortions apparent in their scatter plots. In this report, we will consider a variety of datasets demonstrating vari|ence of 5000 to 10000, 10000 to 15000, and 15000 to 20000. Top Row – GB dataset comparing Fanconi vs. Normal using MAS 5.0 processed data on left and RMA processed data on the right. On left, GSE6802 dataset comparing R vs. C, using RMA. On right, 339RS dataset comparing TA vs. C using RMA. Next, we look at the effect of different GRiS sizes on the detection of statistically significant genes. |toff applied to the respective summary and normalization methods. B-C) Color coding is modified so that blue genes are shown only in the left panel and red genes are shown only in the right panel. B) GSE6475 data comparing 6 AL (acne lesion) replicates to 6 AN (acne normal) replicates. Samples are plotted as in A. C) SS data comparing 6 mutant samples to 6 wild type (3 male and 3 female for each condit | 0 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1081 | GSE7669 | 8/30/2007 | ['7669'] | ['2931'] | [u'17594488'] | 2206335 | [u'17594488'] | ['Kinne', 'Pohlers', 'Beyer', 'Koczan', 'Wilhelm', 'Thiesen'] | ['Kinne', 'Pohlers', 'Beyer', 'Koczan', 'Wilhelm', 'Thiesen'] | ['Kinne', 'Pohlers', 'Beyer', 'Koczan', 'Wilhelm', 'Thiesen'] | Arthritis Res Ther | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1082 | GSE7670 | 8/30/2007 | ['7670'] | [] | [u'17540040'] | 2831002 | [u'20064233'] | ['Huang', 'Lin', 'Su', 'Wu', 'Liang', 'Hsu', 'Chen', 'Chang', 'Whang-Peng'] | ['Berger', 'De', 'Pierre', 'Depiereux', 'Bareke', 'Gaigneaux'] | [] | BMC Bioinformatics | 2010 | 1/11/2010 | 0 | Data collection (cel files) was performed using Gene Expression Omnibus [19] on the Affymetrix platform HG-U133a (Human Genome model U133a). This collection consists of 34 datasets (table 2) for which there are at least 15 replicates for each of 2 different experimental conditions. {{key}}--REUSE-- | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1083 | GSE7670 | 8/30/2007 | ['7670'] | [] | [u'17540040'] | 2949900 | [u'20875095'] | ['Huang', 'Lin', 'Su', 'Wu', 'Liang', 'Hsu', 'Chen', 'Chang', 'Whang-Peng'] | ['Dawany', 'Tozeren'] | [] | BMC Bioinformatics | 2010 | 9/27/2010 | 0 | he individual datasets usually small in size, but the inferences made from individual studies are often inconsistent with similar studies [ 1 ]. As thousands of microarray samples have accumulated in publicly accessible databases in the last decade [ 2 - 4 ], several statistical methods have been developed to allow for the combination and comparison of data from multiple sources. Among the many methodolog|ons based on hypergeometric test. Table 1 Overview of datasets used and distribution of microarray samples Analysis Tissue Accession # Normal Cancer Platform IV1/IV2/SAM1/SAM2 Colon E-MTAB-57 22 25 A GSE4107 10 12 P2 GSE4183 8 15 P2 Kidney E-TABM-282 11 16 P2 GSE11024† 12 60 P2 GSE11151 3 57 P2 GSE14762† 12 10 P2 GSE15641 23 57 A GSE6344 10 10 A GSE7023 12 35 P2 Liver GSE14323 19 47 A/A|2 49 58 A GSE7670{{tag}}--REUSE-- 27 27 A Pancreas E-MEXP-1121† 6 17 A E-MEXP-950 11 14 A GSE15471 39 39 P2 GSE16515 15 36 P2 Total: 294 619 SAM2 Colon E-MEXP-1224 0 55 A E-MEXP-383 0 36 A E-TABM-176 55 0 P2 GSE12945 0 36 A GSE17538 0 232 P2 Kidney GSE10320 0 144 A GSE11904 0 21 A2 Liver E-TABM-292 0 32 A E-TABM-36 0 57 A GSE9843 0 69 P2 Lung GSE10445 0 72 P2 GSE12667 0 75 P2 Total: 55 829 IV2 Colon GSE6988 28|5E-257 No data - 262 2.34E-299 * Only 338 genes are used for colon IV1 Moreover, to assess the effect of the refRMA method in normalizing data, three samples from different colon datasets (E-MTAB-57, GSE4107 and GSE4183) were chosen. The expression values for the three arrays were obtained based on classical RMA and refRMA normalization techniques. 
Quantile-quantile plots were produced to compare the d|election A total of 31 Affymetrix microarray datasets containing 1,768 unique samples from human cancer (1,429) and corresponding healthy control tissues (339) were collected from the Gene Expression Omnibus (GEO; [ 2 , 3 ] and Array Express [ 4 ] online repositories (Additional File 2 ). Samples were selected for 5 different tissue types: colon, kidney, liver, lung and pancreas, then categorized into c|ms and the conversion of data to Entrez IDs resulted in the study of varying number of genes per dataset as well as different total overlap with the common Affymetrix platform (shown in parentheses); GSE6988: 9,072 (5,834) genes, GSE3: 12,452 (6,598) genes, GSE7367: 2118 (1,301) genes, GSE2088: 13754 (7,038) genes, and GSE8596: 6740 (4,330) genes. The datasets contained cancer versus normal samples fro|NCBI GEO: mining tens of millions of expression profiles--database and tools update Nucleic Acids Res 2007 35 Database D760 765 10.1093/nar/gkl887 17099226 Edgar R Domrachev M Lash AE Gene Expression Omnibus: NCBI gene expression and hybridization array data repository Nucleic Acids Res 2002 30 1 207 210 10.1093/nar/30.1.207 11752295 Brazma A Parkinson H Sarkans U Shojatalab M Vilo J Abeygunawardena N Ho | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1084 | GSE7670 | 8/30/2007 | ['7670'] | [] | [u'17540040'] | 2821899 | [u'19351829'] | ['Huang', 'Lin', 'Su', 'Wu', 'Liang', 'Hsu', 'Chen', 'Chang', 'Whang-Peng'] | ['Rudin', 'Parmigiani', 'Marchionni', 'Rhodes', 'Devereux', 'Hierman', 'Daniel', 'Peacock', 'Dorsch', 'Watkins', 'Yung'] | [] | Cancer Res | 2009 | 4/15/2009 | 0 | AND pmc_gds | 0 | 1 | ||||
1085 | GSE7670 | 8/30/2007 | ['7670'] | [] | [u'17540040'] | 2745680 | [u'19761563'] | ['Huang', 'Lin', 'Su', 'Wu', 'Liang', 'Hsu', 'Chen', 'Chang', 'Whang-Peng'] | ['Chang', 'Ramoni'] | ['Chang'] | BMC Bioinformatics | 2009 | 9/17/2009 | 1 | problems and perform automatic discrimination of lung cancer subtypes. The classifier is trained by a Duke University data [ 22 ], which is available on Gene Expression Omnibus with accession number GSE3141, in a total of 58 ACs and 53 SCCs. The lung specimens are assayed by Affymetrix HG-U133A. Figure 2 shows the function-dependent transcriptional network inferred from the data. Of the 22,283 gene |e of 10-fold cross validation achieves 98.5% accuracy. We further test the classification accuracy of the network on seven independent study populations with Gene Expression Omnibus accession numbers GSE10072, GSE7670{{tag}}--REUSE--, GSE12667, GSE4824, GSE2109, GSE4573, and GSE6253, for a total of 422 samples, 232 AC and 190 SCC, from subjects of Caucasian, Asian and African descent representing 84.6%, 6.9%, and 2.8%|xpression patterns in peripheral blood cells are expected to assist the diagnosis of TAA. The data used to derive the classifier is publicly available on Gene Expression Omnibus with accession number GSE9106 [ 39 ], which involves 36 cases and 25 controls for training purpose. Peripheral blood samples were collected at Yale-New Haven Hospital. Gene expression experiments were carried out by Applied Bio|isite to TAA. The 10-fold cross validation of the classifier yields 97% accuracy. We further examine the classifier on the independent samples, 24 cases and 9 controls, also included in the Yale data GSE9106. The accuracy on these independent samples achieves 82%, demonstrating good performance of our approach. Figure 3 The functional dependence network for TAA diagnosis. There are 346 genes selected | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1086 | GSE7670 | 8/30/2007 | ['7670'] | [] | [u'17540040'] | 2865773 | [u'20458363'] | ['Huang', 'Lin', 'Su', 'Wu', 'Liang', 'Hsu', 'Chen', 'Chang', 'Whang-Peng'] | ['Chen', 'Liu', 'Xiong', 'Rayner', 'Li'] | ['Chen'] | Cancer Inform | 2010 | 3/26/2010 | 0 | Genome U133 Array Set HG-U133A. A total of 66 samples were used for microarray analysis, including pair-wise samples from 27 patients. 10 The accession number in the Gene Expression Omnibus (GEO) is GSE7670{{tag}}--REUSE--. The protein-protein interaction data was downloaded from the Human Protein Reference Database (HPRD) (09/01/2007 release). Entropy minimization The joint association of gene pair expression states | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1087 | GSE7670 | 8/30/2007 | ['7670'] | [] | [u'17540040'] | 2637897 | [u'19146704'] | ['Huang', 'Lin', 'Su', 'Wu', 'Liang', 'Hsu', 'Chen', 'Chang', 'Whang-Peng'] | ['Jen', 'Lin', 'Tung', 'Wang', 'Hsu'] | ['Lin', 'Hsu'] | BMC Genomics | 2009 | 1/16/2009 | 0 | biological function interpretation. Methods Data collections and preprocessing Six independent data sets (Normal, HCC 1 , HCC 2 , Tumor 1 , Tumor 2 , Tumor 3 ), including one normal tissue data set (GSE3526), two HCC data sets (E-TABM-36 and GSE6764), and data sets for other three tumor types: thyroid cancer (GSE3678), colon cancer(GSE4107) and lung cancer (GSE7670{{tag}}--REUSE--), were downloaded from two public ar | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1088 | GSE7670 | 8/30/2007 | ['7670'] | [] | [u'17540040'] | 2659802 | [u'19337377'] | ['Huang', 'Lin', 'Su', 'Wu', 'Liang', 'Hsu', 'Chen', 'Chang', 'Whang-Peng'] | ['Hong', 'Lee', 'Whang-Peng', 'Su', 'Wu', 'Huang', 'Chou', 'Cheng', 'Chen', 'Lai'] | ['Chen', 'Whang-Peng', 'Wu', 'Su', 'Huang'] | PLoS One | 2009 | 2009 | 0 | 1-1 , CL 1-5 [51] and CL 1-5 -F4 [52] . The data have been deposited in NCBIs Gene Expression Omnibus (GEO) and are accessible through GEO series accession number GSE7670{{tag}}--DEPOSIT--. No patients had previously received any neoadjuvant treatment, e.g. chemotherapy, before the surgery. The study protocol had the approval of the ethics committee at Taipei Veterans General Hospita | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1089 | GSE7670 | 8/30/2007 | ['7670'] | [] | [u'17540040'] | 1894975 | [u'17540040'] | ['Huang', 'Lin', 'Su', 'Wu', 'Liang', 'Hsu', 'Chen', 'Chang', 'Whang-Peng'] | ['Huang', 'Lin', 'Su', 'Wu', 'Liang', 'Hsu', 'Chen', 'Chang', 'Whang-Peng'] | ['Huang', 'Lin', 'Su', 'Wu', 'Liang', 'Hsu', 'Chen', 'Chang', 'Whang-Peng'] | BMC Genomics | 2007 | 6/1/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1090 | GSE7673 | 7/7/2007 | ['7673'] | [] | [u'17558410'] | 2723786 | [u'18346918'] | ['Singer', u'Carrol', 'Carroll', 'Greally', 'Kuo', 'Dierov', 'Ranuncolo', 'Green', 'Polo', 'Melnick'] | ['Polo', 'Ranuncolo', 'Melnick'] | ['Polo', 'Ranuncolo', 'Melnick'] | Blood Cells Mol Dis | 2008 | 2008 Jul-Aug | 0 | enomic array representing the CHEK1 genomic locus with overlapping 50-mer oligonucleotides (Nimblegen Systems, Madison, WI). The array design and complete results are available on the Gene Expression Omnibus (GEO) website accession number GSE7673{{tag}}--DEPOSIT-- . Specific BCL6 binding to genomic regions was detected by determining the fold enrichment of a five-oligonucleotide sliding window over input. BCL6 binding wa | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1091 | GSE7678 | 8/20/2007 | ['7678'] | [] | [u'17656095'] | 2447788 | [u'18463138'] | ['Feng', 'Qin', 'Love', 'Gerin', 'Kaczorowski', 'Bommer', u'Kacsorowski', 'Cho', 'MacDougald', 'Giordano', 'Zhai', 'Fearon', 'Kuick', 'Moore'] | ['Brandt', "'t", 'den', 'Villerius', 'Ivliev'] | [] | Nucleic Acids Res | 2008 | 7/1/2008 | 1 | 2018;p53[Title] or tp53[Title]’ yielded 30 GEO Series (as on 12 January 2008). Searching for GEO Series with the same query via PubMed added 11 more experiments. Four of these 11 experiments (GSE2155, GSE3072, GSE7678{{tag}}--REUSE-- and GSE8023) contain neither of the terms ‘p53’ or ‘tp53’ in their titles nor in the GEO annotation (except for the ‘citation’ fiel | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1092 | GSE7683 | 7/4/2007 | ['7683'] | ['2802'] | [u'17603917'] | 1929075 | [u'17603917'] | ['Beier', 'James', 'Tuckermann', 'Underhill', 'Ulici'] | ['Beier', 'James', 'Tuckermann', 'Underhill', 'Ulici'] | ['Beier', 'James', 'Tuckermann', 'Underhill', 'Ulici'] | BMC Genomics | 2007 | 7/1/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1093 | GSE7685 | 9/1/2007 | ['7685'] | [] | [u'20084171'] | 2805713 | [u'20084171'] | ['Stanton', 'James', 'Agoston', 'Ulici', 'Beier', 'Underhill'] | ['Stanton', 'James', 'Agoston', 'Ulici', 'Beier', 'Underhill'] | ['Stanton', 'James', 'Agoston', 'Ulici', 'Beier', 'Underhill'] | PLoS One | 2010 | 1/13/2010 | 0 | AND pmc_gds | 1 | 0 | ||||
1094 | GSE7685 | 9/1/2007 | ['7685'] | [] | [u'20084171'] | 2810323 | [u'20111593'] | ['Stanton', 'James', 'Agoston', 'Ulici', 'Beier', 'Underhill'] | ['Hoenselaar', 'James', 'Ulici', 'Beier'] | ['Beier', 'James', 'Ulici'] | PLoS One | 2010 | 1/25/2010 | 0 | AND pmc_gds | 1 | 0 | ||||
1095 | GSE7688 | 7/28/2007 | ['7688'] | [] | [u'18042645'] | 2582686 | [u'19043553'] | ['Barrera', 'Cavenee', 'Zhang', 'Arden', 'Smith', 'Li', 'Green', 'Ren'] | ['Huber', 'Toedling'] | [] | PLoS Comput Biol | 2008 | 2008 Nov | 0 | We obtained the data from the GEO repository [19] (accession GSE7688{{tag}}--REUSE--). | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1096 | GSE7688 | 7/28/2007 | ['7688'] | [] | [u'18042645'] | 2134779 | [u'18042645'] | ['Barrera', 'Cavenee', 'Zhang', 'Arden', 'Smith', 'Li', 'Green', 'Ren'] | ['Barrera', 'Cavenee', 'Zhang', 'Arden', 'Smith', 'Li', 'Green', 'Ren'] | ['Barrera', 'Cavenee', 'Zhang', 'Arden', 'Smith', 'Li', 'Green', 'Ren'] | Genome Res | 2008 | 2008 Jan | 0 | AND pmc_gds | 1 | 0 | ||||
1097 | GSE7689 | 8/29/2007 | ['7689'] | [] | [u'17937502'] | 2014791 | [u'17937502'] | ['S\xc3\xa9verac', u'S\xe9m\xe9riva', u'S\xe9verac', 'Perrin', 'S\xc3\xa9natore', 'S\xc3\xa9m\xc3\xa9riva', 'Zeitouni', u'S\xe9natore', 'Aknin'] | ['S\xc3\xa9verac', 'Perrin', 'S\xc3\xa9natore', 'S\xc3\xa9m\xc3\xa9riva', 'Zeitouni', 'Aknin'] | ['S\xc3\xa9verac', 'Perrin', 'S\xc3\xa9natore', 'S\xc3\xa9m\xc3\xa9riva', 'Zeitouni', 'Aknin'] | PLoS Genet | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
1098 | GSE7693 | 5/5/2007 | ['7693'] | [] | [] | 2996082 | [u'20681019'] | [u'Harfouche', u'Amiot', u'Lemaitre', u'Martin', u'Vaigot', u'Rachidi'] | ['Harfouche', 'Rigaud', 'Lemaitre', 'Rachidi', 'Fortunel', 'Martin', 'Marie', 'Vaigot', 'Moratille'] | [u'Harfouche', u'Martin', u'Vaigot', u'Rachidi', u'Lemaitre'] | Stem Cells | 2010 | 2010 Sep | 0 | s software ( http://www.ingenuity.com ). A complete description of the microarrays used in this study is available in the GEO database ( http://www.ncbi.nlm.nih.gov/geo/ ) under the GEO accession no. GSE7693{{tag}}--DEPOSIT--. Quantitative Reverse Transcription Polymerase Chain Reaction Analysis Three hours after radiation exposure, total RNA was extracted from the two populations using the PicoPure Isolation kit (Arctu | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1099 | GSE7694 | 6/11/2007 | ['7694'] | ['2820'] | [u'17556587'] | 2945940 | [u'20840752'] | ['Gaffal', 'Steuder', 'Schlicker', 'Petrosino', 'Karsak', 'Zimmer', 'Di', 'Wang-Eckhardt', 'Starowicz', u'T\xfcting', 'Buettner', 'Cravatt', 'Mechoulam', 'Werner', 'Date', 'Rehnelt', 'T\xc3\xbcting'] | ['Ojeda', 'de', 'Nitsch', 'Gon\xc3\xa7alves', 'Moreau'] | [] | BMC Bioinformatics | 2010 | 9/14/2010 | 0 | n, heat kernel, and Arnoldi kernel approximation. For comparison purposes, a local measure based on the expression of the direct neighbors is also computed. We have benchmarked these strategies on 40 publicly available knockout experiments in mice, and performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expr| approach for scoring candidates is presented here for comparison purposes as a naïve strategy for network analysis of differential expression. We have benchmarked these four strategies on 40 publicly available data sets originated from Affymetrix chips on which mice with (simple) knockout genes were tested against controls. The raw cel files were downloaded from GEO [ 12 ]. For each data set we|egy based on a direct neighborhood analysis. These four network-based prioritization strategies for scoring candidate genes based on their differentially expressed neighborhood were benchmarked on 40 publicly available knockout experiments in mice. Performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expressi|ng candidate genes using network-based machine learning approaches even if no knowledge is available about the disease or phenotype. Methods Benchmark data The benchmark for this study consists of 40 publicly available data sets originated from Affymetrix chips on which mice with knockout genes were tested against controls. The raw cel files were downloaded from GEO [ 12 ]. Table 3 shows all data sets|ur benchmark. Table 3 The benchmark data. Gene Name GEO accession number Gene Name GEO accession number 1 Abca1 GSE5496 21 Mbnl1 GSE14691 2 Btk GSE2826 22 Mst1r, Ron GSE16629 3 Cav1 GSE10849 23 MyD88 GSE6688 4 Cav3 GSE10848 24 Nos3, eNos GSE1988 5 Cftr GSE5715 25 Phgdh GSE8555 6 Clcn1 GSE14691 26 Pmp22 GSE1947 7 Cnr1 GSE7694{{tag}}--REUSE-- 27 PPAR α GSE6864 8 Emd GSE5304 28 Prkag3, AMPK G3 GSE4065 9 Epas1, Hi|-2 GSE16067 29 Pthlh, Pthrp GSE17654 10 Esrra GSE7196 30 Rab3a GSE6527 11 Gap43 GSE12687 31 RasGrf1 GSE8425 12 Gnmt GSE9809 32 Rbm15 GSE12628 13 Hdac1 GSE5583 33 Runx GSE4911 14 Hdac2 GSE6770 34 Scd1 GSE2926 15 Hsf4 GSE12415 35 Slc26a4 GSE10587 16 Hspa1A, Hsp70.1 GSE11120 36 Srf GSE13333 17 Il6 GSE411 37 Tgm2 GSE10285 18 Lhx1, Lim1 GSE4230 38 Zc3h12a GSE14891 19 Lhx8 GSE11897 39 Zfp36, Tpp GSE5324 20 L | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1100 | GSE7695 | 9/30/2007 | ['7695'] | [] | [u'17921306'] | 2168620 | [u'17921306'] | ['Hidalgo-Arroyo', 'Cariss', 'Constantinidou', 'Hobman', 'Patel', 'Penn', 'Avison'] | ['Hidalgo-Arroyo', 'Cariss', 'Constantinidou', 'Hobman', 'Patel', 'Penn', 'Avison'] | ['Hidalgo-Arroyo', 'Cariss', 'Constantinidou', 'Hobman', 'Penn', 'Patel', 'Avison'] | J Bacteriol | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1101 | GSE7704 | 8/15/2007 | ['7704'] | ['2869'] | [u'17724070'] | 2998477 | [u'21083928'] | ['Nguyen', 'Matthews', 'Kang', 'Hoang', 'Son'] | ['Ehrlich', 'Mazurie', 'Parker', 'Pitts', 'Roe', 'Folsom', 'Richards', 'Stewart'] | [] | BMC Microbiol | 2010 | 11/17/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
1102 | GSE7704 | 8/15/2007 | ['7704'] | ['2869'] | [u'17724070'] | 2168270 | [u'17724070'] | ['Nguyen', 'Matthews', 'Kang', 'Hoang', 'Son'] | ['Nguyen', 'Matthews', 'Kang', 'Hoang', 'Son'] | ['Nguyen', 'Matthews', 'Kang', 'Hoang', 'Son'] | Infect Immun | 2007 | 2007 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
1103 | GSE7708 | 5/5/2007 | ['7708'] | ['2782'] | [u'17566103'] | 1965528 | [u'17566103'] | ['Nickols', 'Dervan'] | ['Nickols', 'Dervan'] | ['Nickols', 'Dervan'] | Proc Natl Acad Sci U S A | 2007 | 6/19/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1104 | GSE7742 | 5/8/2007 | ['7742'] | [] | [u'18270588'] | 2223071 | [u'18270588'] | ['Dolinay', 'Kaminski', 'Choi', 'Hoetzel', 'Wu', 'Ifedigbo', 'Watkins', 'Szilasi', 'Ryter', 'Kaynar'] | ['Dolinay', 'Kaminski', 'Choi', 'Hoetzel', 'Wu', 'Ifedigbo', 'Watkins', 'Szilasi', 'Ryter', 'Kaynar'] | ['Dolinay', 'Kaminski', 'Choi', 'Hoetzel', 'Wu', 'Ifedigbo', 'Watkins', 'Szilasi', 'Ryter', 'Kaynar'] | PLoS One | 2008 | 2/13/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1105 | GSE7743 | 5/9/2007 | ['7743'] | [] | [u'17478635'] | 2729609 | [u'19638476'] | ['Kindgren', 'Benedict', 'Kleine', 'Strand', 'Hendrickson'] | ['Smirnoff', 'Jones', 'Mullineaux', 'Baker', 'Galvez-Valdivieso', 'Lawson', 'Davies', 'Asami', 'Truman', 'Slattery', 'Fryer'] | [] | Plant Cell | 2009 | 2009 Jul | 0 | L exposure and exogenous ABA application. A total of 816 genes were identified as representing a significant overlap between HL and ABA responses and were clustered with data from the Gene Expression Omnibus (GEO) and NASCARRAYS databases. The 30 min, 1 h, and 3 h ABA data come from NASCARRAYS-176, 4 h ABA from GSE7112 , 3 h ABA #2 from GSE6171 , and 3 h HL from GSE7743{{tag}}--REUSE-- . In this TREEVIEW repr|in HL-exposed ABA biosynthesis-deficient mutants. All mutants caused a significant reduction in the expression of the test genes compared with the wild-type controls ( Figure 2C ). A meta-analysis of publicly available microarray data for treatment of seedlings with ABA ( Goda et al., 2008 ) compared with data from HL-exposed seedlings ( Kleine et al., 2007 ; see Methods) revealed that 816 genes were core| genes was induced under both conditions, while 320 were suppressed in response to both treatments (see Supplemental Data Set 1 online). When expression data for these genes were clustered with other publicly available ABA treatment data ( Figure 2D ), a strong correlation was observed between 3 h of HL exposure and plants 3 or 4 h after ABA application at a variety of concentrations (uncentered correlati|iar ABA content varies ( Rossel et al., 2006 ). The primers used in this study for quantitative RT-PCR are given in Supplemental Table 2 online. Bioinformatics Data for Affymetrix ATH1 GeneChips were downloaded from the GEO repository ( http://www.ncbi.nlm.nih.gov/sites/entrez?db=gds ) and the NASCArrays database ( http://affymetrix.Arabidopsis.info/narrays/experimentbrowse.pl ). The HL exposure d|ed that contributed to the significant weighted similarity score were clustered together with other ABA treatment time points from the NASCARRAYS-176 data set and additional ABA treatment data sets ( GSE7112 and GSE6171 ). Hierarchical clustering was performed using CLUSTER ( Eisen et al., 1998 ) and visualized with the program TREEVIEW ( Eisen et al., 1998 ). Complete linkage clustering using an unc | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1106 | GSE7743 | 5/9/2007 | ['7743'] | [] | [u'17478635'] | 1914119 | [u'17478635'] | ['Kindgren', 'Benedict', 'Kleine', 'Strand', 'Hendrickson'] | ['Kindgren', 'Benedict', 'Kleine', 'Strand', 'Hendrickson'] | ['Kindgren', 'Benedict', 'Kleine', 'Strand', 'Hendrickson'] | Plant Physiol | 2007 | 2007 Jul | 0 | AND pmc_gds | 1 | 0 | ||||
1107 | GSE7745 | 5/10/2007 | ['7745'] | [] | [] | 2761415 | [u'19761587'] | [u'Troelsen', u'Olsen', u'M\xf8ller', u'Bressendorff', u'Nielsen'] | ['Troelsen', 'Olsen', 'Bressendorff', 'Boyd', 'M\xc3\xb8ller'] | ['Troelsen', 'Olsen', 'Bressendorff'] | BMC Gastroenterol | 2009 | 9/17/2009 | 0 | rce bioconductor project http://www.bioconductor.org [ 38 ]. The gene expression data is available at NCBI's GEO http://www.ncbi.nlm.nih.gov/sites/entrez?db=gds under the series accession number: GSE7745{{tag}}--DEPOSIT-- . Western blot Western blot analyses were performed by separating nuclear extracts prepared as described in [ 39 ] from undifferentiated pre-confluent and differentiated 10-days post-confluent Caco|nd acetylated histone H3 data is available is available at http://gastro.sund.ku.dk/chipchip/ and at NCBI's GEO http://www.ncbi.nlm.nih.gov/sites/entrez?db=gds under the series accession number: GSE7745{{tag}}--DEPOSIT-- . ChIP qPCR analysis Verification of enrichment due to immunoprecipitation with HNF4α- and HA-antibody was done with real-time PCR using the LightCycler (Roche) according to the manufacture|4α mRNA level increases 7.8-fold during Caco-2 differentiation in the Affymetrix Gene chip analysis (accessible at http://www.ncbi.nlm.nih.gov/sites/entrez?db=gds ; series accession number: GSE7745{{tag}}--DEPOSIT-- ). Also at the protein level a significant increase in the HNF4α expression was detected (Figure 1A ). In order to investigate the role of HNF4α in differentiated intestinal cells,|nt of the HNF4α IP DNA compared to genomic DNA (IgG intron). Array data can be downloaded from NCBI's GEO http://www.ncbi.nlm.nih.gov/sites/entrez?db=gds under the series accession number: GSE7745{{tag}}--DEPOSIT-- . Figure 6 A model of some of the components in an HNF4α regulated transcription factor network regulating intestinal expression of genes during cellular differentiation . Competing interes | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1108 | GSE7745 | 5/10/2007 | ['7745'] | [] | [] | 2641037 | [u'19151602'] | [u'Troelsen', u'Olsen', u'M\xf8ller', u'Bressendorff', u'Nielsen'] | ['Mougey', 'Irvin', 'Feng', 'Lima', 'Castro'] | [] | Pharmacogenet Genomics | 2009 | 2009 Feb | 0 | [ PMC free article ] [ PubMed ] 43. Nielsen M, Bressendorff S, Møller J, Olsen J, Troelsen JT. Mapping of HNF4a binding sites, acetylation of histone H3 and expression in Caco2 cells. [ GSE7745{{tag}}--MENTION-- ] National Center for Biotechnology Information. Geo Datasets. 2008 44. Expert Panel Report 3 (EPR-3): Guidelines for the Diagnosis and Management of Asthma-Summary Report 2007. J Allergy Clin | 0 | 0 | 1 | NOT pmc_gds | 0 | 0 |
1109 | GSE7745 | 5/10/2007 | ['7745'] | [] | [] | 2905877 | [u'19443739'] | [u'Troelsen', u'Olsen', u'M\xf8ller', u'Bressendorff', u'Nielsen'] | ['Newburger', 'Berger', 'Metzler', 'Coburn', 'Chan', 'Kuznetsov', 'Jaeger', 'Gehrke', 'Hughes', 'Vedenko', 'Talukder', 'Morris', 'Chen', 'Badis', 'Wang', 'Bulyk', 'Philippakis'] | [] | Science | 2009 | 6/26/2009 | 0 | ) all bound genomic regions in ChIP-chip data, or (C) those bound regions lacking primary motif k -mers, as compared to randomly selected sequences was calculated ( 5 ) for Hnf4a (GEO accession # GSE7745{{tag}}--MENTION-- ). ChIP-chip ‘bound’ peaks were identified according to the criteria of that study ( 28 ). A window size of 500 bp with a step size of 100 bp was used. The GOMER thresholds used are|ritten permission of AAAS. Materials and methods are available as supporting material on Science Online. PBM data are available at http://the_brain.bwh.harvard.edu/pbms/webworks/ and also via the publicly available UniPROBE database ( 31 ). Supporting Online Material www.sciencemag.org Materials and Methods Supporting Text Figs. S1 to S15 Tables S1 to S3 Supplementary References Websites: (1) Sequen | 0 | 0 | 1 | NOT pmc_gds | 0 | 0 |
1110 | GSE7748 | 8/29/2007 | ['7748'] | [] | [u'18004281'] | 2792174 | [u'19887575'] | [u'Perdersen', 'Mitalipov', 'Byrne', 'Pedersen', 'Gokhale', 'Clepper', 'Nelson', 'Wolf', 'Sanger'] | ['Tanay', 'Palsson', 'Reynisd\xc3\xb3ttir', 'Cohen', 'Mitalipov', 'Dighe', 'Landan'] | ['Mitalipov'] | Genome Res | 2009 | 2009 Dec | 1 | AND pmc_gds | 1 | 0 | ||||
1111 | GSE7757 | 5/31/2007 | ['7757'] | ['2819'] | [u'17587440'] | 1925098 | [u'17587440'] | ['Campo', 'Basso', 'Li', 'Liu', 'Trentin', 'Kohlmann', 'Zangrando', 'te'] | ['Campo', 'Basso', 'Li', 'Liu', 'Trentin', 'Kohlmann', 'Zangrando', 'te'] | ['Campo', 'Basso', 'Li', 'Liu', 'Trentin', 'Kohlmann', 'Zangrando', 'te'] | BMC Genomics | 2007 | 6/22/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1112 | GSE7761 | 6/6/2007 | ['7761'] | [] | [u'17701890', u'20442332'] | 2670604 | [u'19141711'] | ['Dolan', 'Schweitzer', 'Blume', 'Clark', 'Gamazon', 'Cox', 'Shukla', 'Duan', 'Huang', 'Kistner', 'Chen'] | ['Urban', 'Kwok', 'Giacomini', 'Stryke', 'Ferrin', 'Johns', 'Yee', 'Castro', 'Hesselson', 'Tahara', 'Kawamoto'] | [] | J Pharmacol Exp Ther | 2009 | 2009 Apr | 0 | The OCTN2 mRNA expression level of the individual lymphoblastoid cell lines was obtained from the GEO database (accession numbers GSE5859 and GSE7761{{tag}}--REUSE--). | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1113 | GSE7761 | 6/6/2007 | ['7761'] | [] | [u'17701890', u'20442332'] | 2858682 | [u'20421931'] | ['Dolan', 'Schweitzer', 'Blume', 'Clark', 'Gamazon', 'Cox', 'Shukla', 'Duan', 'Huang', 'Kistner', 'Chen'] | ['Gilissen', 'Hehir-Kwa', 'de', 'Ponting', 'Webber', 'Brunner', 'Veltman', 'Wieskamp', 'Pfundt'] | [] | PLoS Comput Biol | 2010 | 4/22/2010 | 0 | 05d; . The stable expression was calculated via the standard deviation of log 2 intensities across 176 Hapmap cell lines (CEU and YRI) hybridized onto an Affymetrix GeneChip Human Exon 1.0 ST array (GSE7761{{tag}}--REUSE--). Supporting Information Figure S1 Workflow used to develop the classifier. The classifier is able to distinguish between MR CNVs and benign CNVs based upon solely genomic features without the use | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1114 | GSE7761 | 6/6/2007 | ['7761'] | [] | [u'17701890', u'20442332'] | 2717951 | [u'19545436'] | ['Dolan', 'Schweitzer', 'Blume', 'Clark', 'Gamazon', 'Cox', 'Shukla', 'Duan', 'Huang', 'Kistner', 'Chen'] | ['Heber', 'Sick', 'Howard'] | [] | BMC Bioinformatics | 2009 | 6/22/2009 | 1 | ded from the NCBI GEO database [ 28 ]. A variety of Affymetrix GeneChip 3' Expression array types are represented in the dataset, including: ath1121501 (Arabidopsis, 248 chips; GEO accession numbers: GSE5770, GSE5759, GSE911 [ 29 ], GSE2538 [ 30 ], GSE3350 [ 31 ], GSE3416 [ 32 ], GSE5534, GSE5535, GSE5530, GSE5529, GSE5522, GSE5520, GSE1491 [ 33 ], GSE2169, GSE2473), hgu133a (human, 72 chips; GSE1420 [|), hgu95av2 (human, 51 chips; GSE1563 [ 35 ]), hgu95d (human, 22 chips; GSE1007 [ 36 ]), hgu95e (human, 21 chips; GSE1007), mgu74a (mouse, 60 chips; GSE76, GSE1912 [ 37 ]), mgu74av2 (mouse, 29 chips; GSE1947 [ 38 ], GSE1419 [ 39 , 40 ]), moe430a (mouse, 10 chips; GSE1873 [ 41 ]), mouse4302 (mouse, 20 chips; GSE5338 [ 42 ], GSE1871 [ 43 ]), rae230a (rat, 26 chips; GSE1918, GSE2470), and rgu34a (rat, 44 |xperiment. The second dataset consists of all of the exon array .CEL files available in the GEO database at the time of this analysis (540 .CEL files). Fourteen different experiments are represented (GSE10599 [ 47 ], GSE10666 [ 48 ], GSE11150 [ 49 ], GSE11344 [ 50 ], GSE11967 [ 51 ], GSE12064 [ 52 ], GSE6976 [ 53 ], GSE7760 [ 54 ], GSE7761{{tag}}--REUSE-- [ 55 ], GSE8945 [ 56 ], GSE9342, GSE9372 [ 57 ], GSE9385 [ 58 ] | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1115 | GSE7761 | 6/6/2007 | ['7761'] | [] | [u'17701890', u'20442332'] | 2680361 | [u'19066393'] | ['Dolan', 'Schweitzer', 'Blume', 'Clark', 'Gamazon', 'Cox', 'Shukla', 'Duan', 'Huang', 'Kistner', 'Chen'] | ['Pui', 'Evans', 'Yang', 'Mullighan', 'Downing', 'French', 'Cheng', 'Raimondi', 'Relling'] | [] | Blood | 2009 | 5/7/2009 | 0 | 9 probe sets for further analysis. 38 Gene expression analysis of the CEU and YRI cell lines using the Affymetrix GeneChip Human Exon 1.0 ST Array comprising approximately 1.4 million probe sets was downloaded and processed as previously described ( http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE7761{{tag}}--REUSE-- ). 39 Somatic leukemia copy number variation DNA was extracted from diagnostic bone marrow of 82 p|oducts/software/specific/gtype.affx ). SNPs with call rates of less than 95% in all patients and minor allele frequencies of less than 1% were filtered out, leaving 447 287 SNPs for further analysis. Publicly available genotyping data for the 87 CEU cell lines and 89 YRI cell lines were downloaded from the International HapMap Project website, release 22 ( www.hapmap.org ). Inherited CNV Methods used for |served in the 176 cell lines. There was no overlap in the top 7 genes for patient leukemia cells versus normal lymphoblastoid cell lines. SNP genotype versus MTXPG accumulation analysis. Using the publicly available genotyping data from the International HapMap project for the CEU and YRI cell lines, we found that 8310 SNP genotypes were significantly associated ( P < .01) with MTXPG accumulat | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1116 | GSE7761 | 6/6/2007 | ['7761'] | [] | [u'17701890', u'20442332'] | 2567052 | [u'18496134'] | ['Dolan', 'Schweitzer', 'Blume', 'Clark', 'Gamazon', 'Cox', 'Shukla', 'Duan', 'Huang', 'Kistner', 'Chen'] | ['Dolan', 'Zhang', 'Cox', 'Duan', 'Huang', 'Kistner', 'Bleibel'] | ['Dolan', 'Kistner', 'Duan', 'Huang', 'Cox'] | Pharmacogenet Genomics | 2008 | 2008 Jun | 0 | 0 ST arrays (Affymetrix Laboratory, Affymetrix Inc., Santa Clara, California, USA) with interrogation of greater than 17000 human genes. Genome-wide association was performed between over two million publicly available HapMap SNPs and gene expression. Main results The expression of two PGRN-VIPs ( GSTT1 and GSTM1 ) are significantly associated with SNPs within 2.5 Mb of the genes; whereas the expression|orthern and western Europe) and YRI (Yoruba in Ibadan, Nigeria) using Affymetrix GeneChip Human Exon 1.0 ST array (exon array), which interrogates more than 17 000 human genes [ 2 ]. Over two million publicly available HapMap single nucleotide polymorphisms (SNPs) ( www.hapmap.org ) [ 3 ] and gene expression (GenBank accession no: GSE7761{{tag}}--MENTION-- ) are available for these cell lines. In this report, we demonstra|m the Coriell Institute for Medical Research (Camden, New Jersey, USA). Cell lines were maintained and diluted as described [ 4 ]. Genotype and gene expression association analysis SNP genotypes were downloaded from the International HapMap database ( http://www.HapMap.org ) (release 22). Only 2 098 437 and 2 286 186 SNPs that passed Mendelian error checks and have a minor allele frequency greater than 5%| with 550 000 tag SNPs was considered statistically significant. The 550 000 tag SNPs were selected based on Illumina HumanHap550 BeadChip design using HapMap SNP data [ 6 ] ( http://www.illumina.com/downloads/HUMANHAP550_TechBull.pdf ). Evaluation of very important pharmacogenes selected by Pharmacogenetic Research Network To examine whether the PGRN-VIPs were probed on the exon array, the gene symbols o|general linear model (false discovery rate less than 0.05). CEU, CEPH, Utah residents with ancestry from northern and western Europe; YRI, Yoruba in Ibadan. GWA was performed between over two million publicly available HapMap SNPs ( www.hapmap.org ) [ 3 ] and gene expression in LCLs to identify the relationship between genetic variants and gene expression for the PGRN-VIPs. As a result of the observed dif|ects on PGRN-VIP expression. Given the important role that these PGRN-VIPs play in drug therapy, these SNPs may impact drug therapy as well. The dense genotyping and gene expression data we have made publicly available for the HapMap cell lines allow systematic GWA analysis that would not be possible otherwise. We recognize that there are likely transcriptional, pretranslational, and/or posttranslational | 0 | 0 | 1 | NOT pmc_gds | 1 | 0 |
1117 | GSE7761 | 6/6/2007 | ['7761'] | [] | [u'17701890', u'20442332'] | 2774468 | [u'19622575'] | ['Dolan', 'Schweitzer', 'Blume', 'Clark', 'Gamazon', 'Cox', 'Shukla', 'Duan', 'Huang', 'Kistner', 'Chen'] | ['Dolan', "O'Donnell"] | ['Dolan'] | Clin Cancer Res | 2009 | 8/1/2009 | 0 | motherapeutic agent, each cell line’s unique sensitivity to drug-induced cell growth inhibition can be measured as a phenotype ( 59 ). These phenotypes can then be subjected to GWAS using the publicly available HapMap genotypes to identify potential SNPs contributing to cytotoxicity. As a means to prioritize SNPs that act through their effect on gene expression, we have also made publicly availabl|Disclosure of Potential Conflicts of Interest No potential conflicts of interest were disclosed. 3 The International HapMap Project (2008), http://www.hapmap.org/abouthapmap.html . 4 Gene Expression Omnibus, GSE7761{{tag}}--MENTION-- , National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (2007), http://www.ncbi.nlm.nih.gov/geo/ . Abstract Pharmacoge | 0 | 0 | 1 | NOT pmc_gds | 1 | 0 |
1118 | GSE7761 | 6/6/2007 | ['7761'] | [] | [u'17701890', u'20442332'] | 2731557 | [u'18677484'] | ['Dolan', 'Schweitzer', 'Blume', 'Clark', 'Gamazon', 'Cox', 'Shukla', 'Duan', 'Huang', 'Kistner', 'Chen'] | ['Eileen', 'Mirkov', 'Rosner', 'Hennessy', 'Liu', 'Shukla', 'Nagasubramanian', 'Ratain', 'Ram\xc3\xadrez', 'Cook', 'Innocenti', 'Bleibel'] | ['Shukla'] | Cancer Chemother Pharmacol | 2009 | 2009 Apr | 0 | founding interpretations of gene-expression variation, we removed data from exons for which probe sets contained two or more probes harboring SNPs before summarizing expression (GEO accession number GSE7761{{tag}}--DEPOSIT-- ). Sample size and power calculation We calculated the number of LCLs needed to select 20 CC and 20 CT + TT LCLs for the camptothecin study. We relied on previous studies in Caucasians Trikka et al | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1119 | GSE7761 | 6/6/2007 | ['7761'] | [] | [u'17701890', u'20442332'] | 2652364 | [u'19109566'] | ['Dolan', 'Schweitzer', 'Blume', 'Clark', 'Gamazon', 'Cox', 'Shukla', 'Duan', 'Huang', 'Kistner', 'Chen'] | ['Dolan', 'Lamba', 'Delaney', 'Mi', 'Duan', 'Huang', 'Kistner', 'Hartford'] | ['Dolan', 'Duan', 'Huang', 'Kistner'] | Blood | 2009 | 3/5/2009 | 0 | h to cellular susceptibility to ara-C in 2 distinct populations. A similar approach has been used in evaluating pharmacodynamic genes important in cisplatin and etoposide through the use of LCLs with publicly available genotypic data. 28 , 29 These cell lines provide a well-controlled, reproducible system free from confounding factors, such as patient comorbidities and drug-drug interactions. Furthermore| that treatment date. The relative levels of ara-CTP were then analyzed by t test according to DCK SNP genotypes. Whole-genome analysis of genotype and cytotoxicity association SNP genotypes were downloaded from the International HapMap database, release 22 ( www.HapMap.org ). SNPs with evidence of Mendelian allele transmission errors and those with a minor allele frequency less than 5% were filtered |2212;6 , which is corrected by the number of transcript clusters tested. The analysis was performed independently for each population. All raw exon-array data have been deposited into Gene Expression Omnibus (accession no. GSE7761{{tag}}--DEPOSIT-- ). 36 Analysis of gene expression and cytotoxicity To examine the relationship between gene expression and sensitivity to ara-C, general linear models were constructed as prev| differences in gene expression between populations. Am J Hum Genet. 2008; 82 :631–640. [ PMC free article ] [ PubMed ] 36. National Center for Biotechnology Information. Geo: gene expression omnibus. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi . 37. Suarez-Kurtz G. Pharmacogenomics in admixed populations. Trends Pharmacol Sci. 2005; 26 :196–201. [ PubMed ] 38. Hoefen RJ, Berk BC. Th | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1120 | GSE7761 | 6/6/2007 | ['7761'] | [] | [u'17701890', u'20442332'] | 2714371 | [u'18451141'] | ['Dolan', 'Schweitzer', 'Blume', 'Clark', 'Gamazon', 'Cox', 'Shukla', 'Duan', 'Huang', 'Kistner', 'Chen'] | ['Fackenthal', 'Dolan', 'Kistner', 'Duan', 'Huang', 'Delaney', 'Das', 'Bleibel'] | ['Dolan', 'Duan', 'Huang', 'Kistner'] | Cancer Res | 2008 | 5/1/2008 | 0 | er Treatment, National Cancer Institute (Bethesda, MD). Genotype and cytotoxicity association analysis Cell growth inhibition was tested in 175 LCLs as described previously ( 17 ). SNP genotypes were downloaded from the International HapMap database 5 (release 21) and filtered. Details for the SNP filtration criteria can be found in our previous publication ( 16 ). A total of 387,417 SNPs having a high m|their contribution in generating the exon array data on the CEU and YRI cell lines. Footnotes The gene expression data described in this manuscript has been deposited into GEO (GenBank accession no. GSE7761{{tag}}--DEPOSIT-- ). The phenotype data has been deposited in http://www.pharmgkb.org/ (PS206925). 4 Coriell Institute for Medical Research, http://ccr.coriell.org/ . 5 International HapMap Project, http://www.h | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1121 | GSE7761 | 6/6/2007 | ['7761'] | [] | [u'17701890', u'20442332'] | 2743011 | [u'18765826'] | ['Dolan', 'Schweitzer', 'Blume', 'Clark', 'Gamazon', 'Cox', 'Shukla', 'Duan', 'Huang', 'Kistner', 'Chen'] | ['Hartford', 'Duan', 'Dolan', 'Huang', 'Kistner'] | ['Dolan', 'Duan', 'Huang', 'Kistner'] | Mol Cancer Ther | 2008 | 2008 Sep | 0 | ata to identify potentially functional SNPs associated with chemotherapy-induced cytotoxicity ( 21 , 22 ). To this end, we chose to apply our model using the International HapMap cell lines, because publicly available dense genotyping allows systemic genome-wide association analysis that would not be possible in other systems. The focus of this study is to identify a definable set of genetic variants and|. We determined cell proliferation rate by evaluating cell growth at 72 h for each cell line without drug addition. Genotype and Cytotoxicity Association Analysis SNP genotypes of YRI population were downloaded from the International HapMap database 5 (Release 22, nonredundant and rs_strand version), and 2,286,186 SNPs with minor allele frequencies >5% and no Mendelian inheritance transmission er|s using Affymetrix GeneChip Human Exon 1.0 ST array (Exon Array) as described previously ( 22 ). The gene expression data described in this article has been deposited into GEO (GenBank accession no. GSE7761{{tag}}--DEPOSIT-- ). Significant SNPs generated from the genotype-cytotoxicity association in YRI were tested for their association with gene expression using the QTDT test with gender as a covariate as described pr | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1122 | GSE7761 | 6/6/2007 | ['7761'] | [] | [u'17701890', u'20442332'] | 2889115 | [u'20442332'] | ['Dolan', 'Schweitzer', 'Blume', 'Clark', 'Gamazon', 'Cox', 'Shukla', 'Duan', 'Huang', 'Kistner', 'Chen'] | ['Gamazon', 'Cox', 'Huang', 'Dolan'] | ['Gamazon', 'Cox', 'Huang', 'Dolan'] | Proc Natl Acad Sci U S A | 2010 | 5/18/2010 | 0 | AND pmc_gds | 1 | 0 | ||||
1123 | GSE7761 | 6/6/2007 | ['7761'] | [] | [u'17701890', u'20442332'] | 1950832 | [u'17701890'] | ['Dolan', 'Schweitzer', 'Blume', 'Clark', 'Gamazon', 'Cox', 'Shukla', 'Duan', 'Huang', 'Kistner', 'Chen'] | ['Dolan', 'Schweitzer', 'Blume', 'Clark', 'Shukla', 'Duan', 'Huang', 'Kistner', 'Chen'] | ['Dolan', 'Schweitzer', 'Blume', 'Clark', 'Shukla', 'Duan', 'Huang', 'Kistner', 'Chen'] | Am J Hum Genet | 2007 | 2007 Sep | 0 | AND pmc_gds | 1 | 0 | ||||
1124 | GSE7762 | 6/14/2007 | ['7762'] | ['2815'] | [u'17598886'] | 2394777 | [u'17598886'] | ['Przewlocki', 'Korostynski', 'Piechota', 'Solecki', 'Kaminska'] | ['Przewlocki', 'Korostynski', 'Piechota', 'Solecki', 'Kaminska'] | ['Przewlocki', 'Korostynski', 'Piechota', 'Solecki', 'Kaminska'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1125 | GSE7763 | 6/21/2007 | ['7763'] | ['2784'] | [u'17534367'] | 2875035 | [u'20097653'] | ['Wang', 'Chintapalli', 'Dow'] | ['Ruppin', 'Sharan', 'Shlomi', 'Tuller', 'Waldman'] | [] | Nucleic Acids Res | 2010 | 2010 May | 0 | e). GE data All expression data was downloaded from Gene Expression Omnibus ( 34 ) ( http://www.ncbi.nlm.nih.gov/geo/ ). Human tissues (including fetal tissues): we used the GE of Su et al. ( 35 ) (GDS596). As the original data set is redundant (i.e. it includes similar tissues; for example, more than 20 of the tissues are from different parts of the brain) we focused our analysis on 30 (out of 79) n|ssues ( Supplementary Table S2 ). Other GE sets: fetal and adult circulating blood reticulocytes (GDS2655), Mouse tissues (GDS592), Mouse fetal and adult liver (GSE13149), Mouse embryonic stem cells (GDS2666), Yeast (GDS772, wild type), Chimpanzee (GSE7540), Rat (GDS589, three strains), E. coli (GSE6836), D. melanogaster (GSE7763{{tag}}--REUSE--) and C. elegans (GSE8004). We averaged technical repeats and probes | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1126 | GSE7763 | 6/21/2007 | ['7763'] | ['2784'] | [u'17534367'] | 2668767 | [u'19412336'] | ['Wang', 'Chintapalli', 'Dow'] | ['Johansson', 'Svensson', 'Lundberg', 'Rydén', 'Stenberg', 'Larsson'] | [] | PLoS Genet | 2009 | 2009 May | 0 | AND pmc_gds | 0 | 1 | ||||
1127 | GSE7763 | 6/21/2007 | ['7763'] | ['2784'] | [u'17534367'] | 2838750 | [u'20305719'] | ['Wang', 'Chintapalli', 'Dow'] | ['Morrow', 'Innocenti'] | [] | PLoS Biol | 2010 | 3/16/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
1128 | GSE7764 | 5/15/2007 | ['7764'] | ['2957'] | [u'17540585'] | 2827956 | [u'20197626'] | ['Presti', 'French', 'Bredemeyer', 'Cai', 'Cao', 'Fehniger', 'Ley'] | ['Eberlein', 'Homann', 'Rosen', 'Golden-Mason', 'Nguyen', 'Victorino'] | [] | J Clin Invest | 2010 | 3/1/2010 | 0 | data and analysis. We used deposited data from gene array analyses of resting and IL-15–stimulated murine NK cells, performed by Fehniger et al. ( 59 ), from the NCBI GEO Web site (accession GSE7764{{tag}}--REUSE-- ). For the purpose of our investigations, we downloaded the raw data, performed MAS5 normalizations, and determined mRNA expression levels for all murine chemokines. The only statistically signific | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1129 | GSE7765 | 5/12/2007 | ['7765'] | ['2745', '2744'] | [u'17517823'] | 2975422 | [u'21047384'] | ['Yoon', 'Wang', 'Zhang', 'Choi', 'Taylor', 'Hsu', 'Chen', 'Hankinson'] | ['Xu', 'Hu'] | [] | BMC Genomics | 2010 | 11/2/2010 | 0 | essed genes usually located in the sparse boundary region. Biases of popular algorithms for identifying differentially expressed genes are visually characterized. Experiments on 17 datasets from Gene Omnibus Database (GEO) with experimentally verified differentially expressed genes showed that DB pruning can significantly improve the prediction accuracy of popular identification algorithms such as t-test|dary region in the 2-D feature space of average gene expression (AG) versus average difference of gene expression (AD). Fig. 1 . shows the distribution of true DEGs in the 2D space for four datasets: GSE9499, GSE6342, GSE6740_1, and GSE6740_2 from GEO database [ 22 ]. Based on the fact that boundary region is characterized with scarcity of genes, a density based pruning algorithm is proposed here for p|s as collected by Kadota et. al. [ 20 ]. They collected 38 microarray datasets with experimentally determined true DEGS by real-time polymerase chain reaction (RT-PCR). Thirty six of the datasets are downloaded from GEO database [ 22 ]. Without losing generality, we experimented with 17 disease or dose response datasets of Homo sapiens out of the 36 GEO datasets (Table 1 ). The 17 datasets are reported j|how that real-world GEO datasets, especially historical ones, tend to have small sample size. Table 1 17 Datasets with 284 DEGs in total. Each dataset has 22833 genes. 
Dataset Conditions True DEG A B GSE1462 4 4 4 GSE1615_1 4 5 8 GSE1650 18 12 8 GSE2666_2 5 5 6 GSE3524 16 4 4 GSE3860 9 9 8 GSE4917 3 3 5 GSE5667_1 5 6 3 GSE6236 14 14 7 GSE6344 10 10 19 GSE6740_1 10 10 40 GSE6740_2 10 10 62 GSE7146 6 6 6|GSE8441 11 11 9 GSE9499 15 7 77 GSE9574 15 14 5 The 17 Datasets used here cover a variety of biological or medical studies: GSE1462 (mitochondrial DNA mutations), GSE1615_1 (Valproic acid treatment), GSE1650 (chronic obstructive pulmonary disease), GSE2666_2(bone marrow Rho level effect), GSE3524 (tumor of epithelial tissue), GSE3860 (Hutchinson-Gilford progeria syndrome), GSE4917 (breast cancer), GSE5|67_1 (atopic dermatitis), GSE6236 (Adult vs. fetal reticulocyte transcriptome comparison), GSE6344 (renal cell carcinoma disease), GSE6740_1 (HIV-infection), GSE6740_2 (HIV-infection, disease state), GSE7146 (hyperinsulinaemic, does response), GSE7765{{tag}}--REUSE-- (dose response, DMSO or 100 nM Dioxin), GSE8441 (dietary intake response), GSE9574 (breast cancer), and GSE9499 (hypomorphic germline mutations). The div|ch as WAD and FC. To illustrate the bias of popular DEG identification algorithms, Fig. 2 shows true positive (TP), true negative (TN), false positive (FP), and false negative (FN) DEGs for dataset GSE9499 which has 77 true DEGs. Fig. 2 . (a) shows that fold change (FC) misses most true DEGs (FN genes), which are located in the region below the threshold average difference and with high expression le|obtaining a user-specified number of candidate DEGs. Table 2 Comparison of No. of missing true DEGs after DB pruning. ( N 0 = 4, R 0 = 0.0017) Total Gene: 22283 After DP-pruning True DEG DP missed GSE1462 2054 4 0 GSE1615_1 2449 8 3 GSE1650 1317 8 2 GSE2666_2 1618 6 2 GSE3524 814 4 0 GSE3860 2073 8 0 GSE4917 785 5 1 GSE5667_1 1316 3 0 GSE6236 2231 7 0 GSE6344 3127 19 0 GSE6740_1 1183 40 1 GSE6740_2 |DEG to have high expression levels and high expression difference. 
Table 3 Ranks of true DEGs in original gene list and pruned gene list. Genes are sorted by four DEG identification algorithms on the GSE1577 dataset. Increase of ranks of true DEGs means that DB pruning have correctly filtered out many non-DEGs. t-test/tTest’ 1404/808 7/6 1321/768 3800/1713 4741/1975 3633/1659 4145/1828 606/388 |on level remains the same. And the average difference of expression can be defined as sum of average difference among pairwise comparisons. Our DB pruning is implemented using C++ and Perl and can be downloaded from http://mleg.cse.sc.edu/degprune . Authors’ contributions JH initiated the project, proposed the DB pruning idea, helped in experimental designs, and wrote the manuscript. J. X. develo | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1130 | GSE7774 | 10/31/2007 | ['7774'] | [] | [u'18509161'] | 2711087 | [u'19538736'] | ['Choi', 'Rajkovic', 'Xin', 'Ballow'] | ['Nedorezov', 'Karmazin', 'Forabosco', 'Ottolenghi', 'Uda', 'Cao', 'Piao', 'Schlessinger', 'Garcia-Ortiz', 'Cole', 'Pelosi', 'Omari'] | [] | BMC Dev Biol | 2009 | 6/18/2009 | 0 | AND pmc_gds | 0 | 1 | ||||
1131 | GSE7774 | 10/31/2007 | ['7774'] | [] | [u'18509161'] | 2710541 | [u'18509161'] | ['Choi', 'Rajkovic', 'Xin', 'Ballow'] | ['Choi', 'Rajkovic', 'Xin', 'Ballow'] | ['Choi', 'Rajkovic', 'Xin', 'Ballow'] | Biol Reprod | 2008 | 2008 Sep | 0 | AND pmc_gds | 1 | 0 | ||||
1132 | GSE7775 | 10/31/2007 | ['7775'] | [] | [u'18509161'] | 2711087 | [u'19538736'] | [u'Berger', u'Qin', 'Choi', 'Xin', 'Rajkovic', u'Bulyk', 'Ballow'] | ['Nedorezov', 'Karmazin', 'Forabosco', 'Ottolenghi', 'Uda', 'Cao', 'Piao', 'Schlessinger', 'Garcia-Ortiz', 'Cole', 'Pelosi', 'Omari'] | [] | BMC Dev Biol | 2009 | 6/18/2009 | 0 | AND pmc_gds | 0 | 1 | ||||
1133 | GSE7775 | 10/31/2007 | ['7775'] | [] | [u'18509161'] | 2710541 | [u'18509161'] | [u'Berger', u'Qin', 'Choi', 'Xin', 'Rajkovic', u'Bulyk', 'Ballow'] | ['Choi', 'Rajkovic', 'Xin', 'Ballow'] | ['Choi', 'Rajkovic', 'Xin', 'Ballow'] | Biol Reprod | 2008 | 2008 Sep | 0 | AND pmc_gds | 1 | 0 | ||||
1134 | GSE7785 | 5/15/2007 | ['7785'] | [] | [u'14500831'] | 2000902 | [u'17708771'] | ['Fu', 'Raaka', 'Dimitrov', 'Downey', 'Cam', 'Spitznagel', 'Tan', u'Gershengorn', 'Lempicki', 'Xu'] | ['Stiles', 'Lu', 'Lee', 'Cam'] | ['Cam'] | BMC Genomics | 2007 | 8/20/2007 | 0 | of RNA extraction, slide configuration (left vs. right), dye, and array were removed. Cy5 and Cy3 channel signals were then averaged for each sample. All raw and processed data are available in GEO (GSE7785{{tag}}--DEPOSIT--). Real-time PCR Expression Analysis An inventoried assay for the gene TXNL2 was obtained from Applied Biosystems (Hs01582641_g1) and was used to test for the TXNL2.aAug05 and TXNL2.bAug05 transcrip | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1135 | GSE7796 | 5/31/2007 | ['7796'] | [] | [u'17653275'] | 1920555 | [u'17653275'] | ['Wilczek', 'Schellenberg', 'Sangster', 'Lindquist', 'McLellan', 'Kong', 'Bahrami', 'Kelley', 'Queitsch', 'Watanabe'] | ['Wilczek', 'Schellenberg', 'Sangster', 'Lindquist', 'McLellan', 'Kong', 'Bahrami', 'Kelley', 'Queitsch', 'Watanabe'] | ['Wilczek', 'Schellenberg', 'Sangster', 'Lindquist', 'McLellan', 'Kong', 'Bahrami', 'Kelley', 'Queitsch', 'Watanabe'] | PLoS One | 2007 | 7/25/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1136 | GSE7805 | 7/31/2007 | ['7805'] | [] | [u'18312645'] | 2322995 | [u'18312645'] | ['Govoroun', 'Elis', 'Monget', 'Batellier', 'Blesbois', 'Balzergue', 'Martin-Magniette', 'Couty'] | ['Govoroun', 'Elis', 'Monget', 'Batellier', 'Blesbois', 'Balzergue', 'Martin-Magniette', 'Couty'] | ['Govoroun', 'Batellier', 'Monget', 'Elis', 'Blesbois', 'Balzergue', 'Martin-Magniette', 'Couty'] | BMC Genomics | 2008 | 2/29/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1137 | GSE7806 | 9/1/2007 | ['7806'] | [] | [] | 2803434 | [u'20161838'] | [u'Hassig', u'Dozier', u'Massari'] | ['Noble', 'Zhang', 'Rao', 'Dozier', 'Massari', 'Herbert', 'Nguyen', 'Hassig', 'Rozenkrants', 'Symons', 'Jenkins', 'Shiau', 'Gahman'] | [u'Hassig', u'Dozier', u'Massari'] | Curr Chem Genomics | 2008 | 9/27/2008 | 0 | l other inflammatory mediators normally upregulated by cytokines, ( TNFα, IL1β, MCP-2 and I-TAC ) were downregulated by compound 1 relative to compound 2 or vehicle ( GEO submission GSE7806{{tag}}--DEPOSIT-- ). Gene expression was not universally reduced however, for example, transcripts of IL15 (2.4-fold), chemokine orphan receptor 1 ( CMKOR1, 11.2-fold) and ATP-binding cassette protein 1 ( ABCA1, | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1138 | GSE7808 | 9/25/2007 | ['7808'] | [] | [u'17881722', u'18434627'] | 2620272 | [u'19014681'] | ['Koukoui', 'Sullivan', 'Thimon', 'Calvo', 'Légaré'] | ['Tozeren', 'Gormley'] | [] | BMC Bioinformatics | 2008 | 11/17/2008 | 1 | or therapeutic targets in tissue-specific diseases. Table 1 Microarray datasets used in this study Tissue Phenotype Data Tissue No. of Samples Gene Expression Omnibus/Array Express Accn. # Adipose 10 GSE3526 Adrenal 20 GSE3526, GSE8514, GSE2316 Brain 89 GSE3526, GSE7621, GSE7307, GSE2361, E_AFMX-11, E-TABM-20, Colon 10 E-TABM-176, GSE8671, GSE9254, GSE9452 Epidermal 25 GSE1133, GSE2361, GSE3419, GSE352|38 E_AFMX-11, E-MIMR-27, GSE1133, GSE2240, GSE2361, GSE3526, GSE3585, GSE7307 Kidney 10 E_AFMX-11, GSE2004, GSE2361, GSE3526, GSE7392 Liver 10 E_AFMX-11, GSE2004, GSE3526, GSE6764 Lung 26 E-MEXP-231, GSE10072, GSE1133, GSE2361, GSE3526 Mammary 15 E-TABM-66, GSE2361, GSE3526, GSE7307, GSE7904 Muscle 64 GSE10760, GSE2328, GSE3526, GSE5110, GSE6798, GSE7307, GSE9103, Ovary 10 GSE2361, GSE3526, GSE6008, GS|ancreas 6 GSE1133, GSE2361, GSE7307 Peripheral blood 12 GSE7462, GSE8608, GSE8668, GSE8762, GSE9692 Small intestine 7 GSE2361, GSE7307 Spleen 12 GSE2004, GSE2361, GSE3526, GSE7307 Stomach 10 GSE2361, GSE3526, GSE7307 Testis 38 E_AFMX-11, GSE1133, GSE2361, GSE3218, GSE3526, GSE7307, GSE7808{{tag}}--REUSE-- Thymus 5 GSE1133, GSE2361, GSE7307 Infectious Disease Disease No. of Samples Gene Expression Omnibus/Array Express | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1139 | GSE7814 | 10/1/2007 | ['7814'] | ['3145'] | [u'17991715'] | 2111112 | [u'17991715'] | ['Lovegrove', 'Gharib', 'Kain', 'Patel', 'Liles', 'Hawkes'] | ['Lovegrove', 'Gharib', 'Kain', 'Patel', 'Liles', 'Hawkes'] | ['Lovegrove', 'Gharib', 'Kain', 'Patel', 'Liles', 'Hawkes'] | Am J Pathol | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1140 | GSE7815 | 5/23/2007 | ['7815'] | [] | [u'18371336'] | 2941458 | [u'20862250'] | ['Eminli', '', 'Utikal', 'Xie', 'Tchieu', 'Stadtfeld', 'Plath', 'Maherali', 'Arnold', 'Jaenisch', 'Sridharan', 'Yachechko', 'Hochedlinger'] | ['Izpis\xc3\xbaa', 'Barrero', 'Paramonov', 'Bou\xc3\xa9'] | [] | PLoS One | 2010 | 9/17/2010 | 0 | sor genes differing in iPSCs from ESCs, and which could be the source of higher risks. Materials and Methods Gene expression analysis The datasets used for the human analyses are: Takahashi et al. (GSE9561) [5] ; Yu et al. (GSE9071) [7] ; Park et al. (GSE9832) [6] ; Zhao et al. (GSE12922) [52] ; Masaki et al. (GSE9709) �|2390) [30] ; Aasen et al. (GSE12583) [37] ; Huangfu et al. (pers. comm.) [53] ; Lowry et al. (GSE9865) [54] ; Ebert et al. (GSE13828) [15] ; Yu et al. (GSE15148) [55] ; Soldner et al. (GSE14711) [11] . The datasets used for the mouse analyses are: Takahashi et al. (GSE525|7815{{tag}}--REUSE--) [30] ; Feng et al. (GSE13211) [56] ; Sridharan et al. (GSE14012) [35] ; Wernig et al. (E-MEXP1037) [4] ; Chen et al . (GSE15267); Zhou et al . (GSE16062) [57] ; Zhao et al . (GSE16925) [20] ; Kang et al . (GSE17004) [19] ; Heng et al . (GSE19023) [58] ; Ic|(wpr =  pr *weight). Next, the average percentrank and the average weighted percentrank were identified for the replicates of each sample. In addition, for the dataset GSE7841 we have averaged the available iPSCs samples (day2, day16, day17 and day18). For the dataset E-MEXP-1037 we have averaged iPSCs samples (clones 8 and 18). For the dataset GSE13211 we have averaged |OSCE (clones 8 and 13) and iPSCs OSE (clones T8 and T9) samples. For the dataset GSE14012 we have averaged ESCs (v6.5 and E14), MEFs (male and female) and iPSCs (1D4 and 2D4) samples. For the dataset GSE15267 we have averaged ESCs (CGR8 and R1), iPSCs reprogrammed with four factors (S2C12 and S2C16) and iPSCs reprogrammed with 3 factors (S53C1 and S53C5). 
For the dataset GSE19023 we have averaged MEFs |teworthy that the bivalent genes profiles of the iPSC lines described to contribute to viable mice through tetraploid complementation assay (the most stringent proof of pluripotency available so far, GSE16925 and GSE17004) have the highest correlation coefficients when compared with the ESC lines. As expected, the correlation between bivalent genes profiles of fibroblasts and ESCs is very low and espec|es whose expression in iPSCs could restrict or at least bias the differentiation potential. Encouragingly, the iPSC lines that were shown to generate viable mice by tetraploid complementation assays (GSE16925 and GSE17004) express none to very few of such genes, whereas the first iPSCs generated that did not contribute to the germline (GSE5259), as well as the partially reprogrammed iPSC lines (GSE1401|umber of these potentially troublesome genes ( Figure 4 ). For example, the partially reprogrammed iPSC lines 1A2 and 1B3 (GSE14012), as well as the Fbx15KO iPSC line, which showed a limited potency (GSE5259), express Hoxc8, which is a homeodomain gene important for early embryogenesis, especially for neural development, and whose expression level is normally tightly regulated [78] an|133 plus 2.0 (GPL570) Affy HG-U133 plus 2.0 (GPL570) Phalanx Human one aray (GPL6254) Affy HG-U133 plus 2.0 (GPL570) Affy HG-U133 plus 2.0 (GPL570) Affy HG-U133 plus 2.0 (GPL570) GEO accession number GSE12390 Pers. Comm GSE9865 GSE12583 GSE12922 GSE13828 GSE15148 GSE14711 Corr coeff whole array iPS/ES: average (min-max) Primary iPS: 0.988 (0.984–0.989) Secondary iPS: 0.991 (0.990–0.991) | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1141 | GSE7822 | 11/2/2007 | ['7822'] | [] | [u'17968032'] | 2748095 | [u'19709427'] | ['Feng', 'Protopopov', 'Ivanova', 'Nogueira', 'Greshock', 'Nathanson', 'Chin', 'Weber', 'Perna'] | ['Heringa', 'van', 'Hettling', 'Pirovano', 'Binsl'] | [] | BMC Genomics | 2009 | 8/26/2009 | 0 | strointestinal stromal tumors (GIST). These were analyzed using 3 K BAC and PAC arrays where only spots with signal intensities of at least two times the background intensities were included ([ 17 ], GSE5336, see Additional file 2 ). The third dataset includes samples from 4 human melanoma cell lines, which were analyzed using Agilent 44 K oligonucleotide-based CGH arrays ([ 18 ], GSE7822{{tag}}--REUSE--, see Additio | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1142 | GSE7826 | 5/22/2007 | ['7826'] | ['2885'] | [u'17652167'] | 2880477 | [u'17652167'] | ['Zer', 'Shin', 'Sachs'] | ['Zer', 'Shin', 'Sachs'] | ['Zer', 'Shin', 'Sachs'] | Physiol Genomics | 2007 | 10/22/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1143 | GSE7827 | 6/7/2007 | ['7827'] | [] | [] | 2654745 | [u'18988674'] | [u'Puttaiah', u'Rudraiah', u'Samji'] | ['Jayaram', 'Priyanka', 'Sridaran', 'Medhamurthy'] | [] | Endocrinology | 2009 | 2009 Mar | 0 | s collected from monkeys treated with VEH or CET for 24 h; the hybridization details and individual CEL and CHP files have been deposited at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE8371 . Microarray analysis results revealed that inhibition of pituitary LH secretion by CET treatment significantly ( P < 0.05) affected the expression of 3949 genes (>2-fold change wi|he CL of monkeys treated with CET plus PBS and CET plus rhLH, hybridization details, and individual CEL and CHP files have been deposited at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE7827{{tag}}--DEPOSIT-- . Microarray analysis results revealed that replacement of exogenous rhLH after inhibition of pituitary LH secretion by CET treatment significantly ( P < 0.05) affected the expression of 4|3b1; -treated monkeys. Microarray comparison analysis, hybridization details, and individual CEL and CHP files have been deposited online at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE7971 . Microarray analysis results revealed that PGF 2α treatment significantly ( P < 0.05) affected the expression of 2290 genes in the CL (>2-fold change with Benjamini and H|lysis data generated for different stages of the CL of the rhesus monkey deposited recently in the public domain by Bogan et al . ( 70 ) at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE10367 . The analysis of microarray data between late midluteal phase and very late luteal phase by GeneSifter software revealed that 2882 genes were differentially regulated (1280 and 1522 up-regulation | 0 | 1 | 0 | NOT pmc_gds | 0 | 1 |
1144 | GSE7831 | 12/13/2007 | ['7831'] | [] | [u'18029397'] | 2694999 | [u'19536338'] | ['Ertl', 'Weninger', 'Hensley', 'Cavanagh', 'Rendl', 'Hunter', 'Tobias', 'Iparraguirre', 'Masek', 'von'] | ['Rabadan', 'Greenbaum', 'Levine'] | [] | PLoS One | 2009 | 6/18/2009 | 0 | se MOE430v2 Gene Chip Microarray (Affymatrix, Santa Clara, CA, USA). This data is publicly available from the NCBI Gene Expression Omnibus ( http://www.ncbi.nlm.nih.gov/geo/ ) under the record number GSE7831{{tag}}--DEPOSIT--. As in that work, we use the expression data generated 4 hours after stimulation to analyze the innate response. For subsequent analysis, the coding regions for all mouse genes were needed. All of | 0 | 1 | 0 | NOT pmc_gds | 0 | 1 |
1145 | GSE7833 | 5/18/2007 | ['7833'] | ['2951'] | [u'19008490'] | 2636937 | [u'19008490'] | ['Winchester', 'Maxwell', 'Gleadle', 'Smith', 'Brooks', 'Elvidge', 'Talbot', 'Liu', 'Ragoussis', 'Glenny', 'Robbins'] | ['Winchester', 'Maxwell', 'Gleadle', 'Smith', 'Brooks', 'Elvidge', 'Talbot', 'Liu', 'Ragoussis', 'Glenny', 'Robbins'] | ['Winchester', 'Maxwell', 'Gleadle', 'Smith', 'Brooks', 'Elvidge', 'Talbot', 'Liu', 'Ragoussis', 'Glenny', 'Robbins'] | J Appl Physiol | 2009 | 2009 Jan | 0 | AND pmc_gds | 1 | 0 | ||||
1146 | GSE7837 | 8/31/2007 | ['7837'] | [] | [] | 2516568 | [u'18709148'] | [u'Orner', u'Benninghoff', u'Tilton', u'Hendricks', u'Pereira', u'Williams'] | ['Orner', 'Benninghoff', 'Carpenter', 'Tilton', 'Hendricks', 'Pereira', 'Williams'] | [u'Orner', u'Benninghoff', u'Tilton', u'Hendricks', u'Pereira', u'Williams'] | Environ Health Perspect | 2008 | 2008 Aug | 0 | ppm DHEA, 500 (low) and 1,800 (high) ppm PFOA, or 5 ppm E 2 in the diet compared with control animals. Supplemental raw data files are available online through Gene Expression Omnibus accession no. GSE7837{{tag}}--DEPOSIT-- ( NCBI 2008b ). Array hybridizations were performed with a common reference sample using dye swapping, and final fold-change values were calculated as a ratio to control animals. Bidirectional hier | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1147 | GSE7838 | 7/1/2007 | ['7838'] | [] | [] | 2922944 | [u'19828638'] | [u'Igarashi', u'Lowe', u'Nochi', u'Yoshida', u'Takayama', u'Kiyono', u'Terahara', u'Kurokawa', u'Yuki'] | ['Kumar', 'Taylor', 'Nochi', 'Kiyono', 'Knoop', 'Yagita', 'Sakthivel', 'Akiba', 'Butler', 'Williams'] | [u'Nochi', u'Kiyono'] | J Immunol | 2009 | 11/1/2009 | 0 | low sorted PP M cells and villous enterocytes revealed that both of these intestinal epithelial cell types express mRNA for RANK ( 35 ) (gene expression data for RANK archived in NCBI Gene Expression Omnibus under accession number GSE7838{{tag}}--MENTION-- ; http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE7838{{tag}}--MENTION-- ). While the induction of M cells by RANKL appears to be mediated by direct action of RANKL on RANK-expre | 0 | 0 | 1 | NOT pmc_gds | 1 | 0 |
1148 | GSE7841 | 6/7/2007 | ['7841'] | [] | [u'17554338'] | 2941458 | [u'20862250'] | ['Ichisaka', 'Okita', 'Yamanaka'] | ['Izpis\xc3\xbaa', 'Barrero', 'Paramonov', 'Bou\xc3\xa9'] | [] | PLoS One | 2010 | 9/17/2010 | 0 | sor genes differing in iPSCs from ESCs, and which could be the source of higher risks. Materials and Methods Gene expression analysis The datasets used for the human analyses are: Takahashi et al. (GSE9561) [5] ; Yu et al. (GSE9071) [7] ; Park et al. (GSE9832) [6] ; Zhao et al. (GSE12922) [52] ; Masaki et al. (GSE9709) �|2390) [30] ; Aasen et al. (GSE12583) [37] ; Huangfu et al. (pers. comm.) [53] ; Lowry et al. (GSE9865) [54] ; Ebert et al. (GSE13828) [15] ; Yu et al. (GSE15148) [55] ; Soldner et al. (GSE14711) [11] . The datasets used for the mouse analyses are: Takahashi et al. (GSE525|7815) [30] ; Feng et al. (GSE13211) [56] ; Sridharan et al. (GSE14012) [35] ; Wernig et al. (E-MEXP1037) [4] ; Chen et al . (GSE15267); Zhou et al . (GSE16062) [57] ; Zhao et al . (GSE16925) [20] ; Kang et al . (GSE17004) [19] ; Heng et al . (GSE19023) [58] ; Ic|(wpr =  pr *weight). Next, the average percentrank and the average weighted percentrank were identified for the replicates of each sample. In addition, for the dataset GSE7841{{tag}}--REUSE-- we have averaged the available iPSCs samples (day2, day16, day17 and day18). For the dataset E-MEXP-1037 we have averaged iPSCs samples (clones 8 and 18). For the dataset GSE13211 we have averaged |OSCE (clones 8 and 13) and iPSCs OSE (clones T8 and T9) samples. For the dataset GSE14012 we have averaged ESCs (v6.5 and E14), MEFs (male and female) and iPSCs (1D4 and 2D4) samples. For the dataset GSE15267 we have averaged ESCs (CGR8 and R1), iPSCs reprogrammed with four factors (S2C12 and S2C16) and iPSCs reprogrammed with 3 factors (S53C1 and S53C5). 
For the dataset GSE19023 we have averaged MEFs |teworthy that the bivalent genes profiles of the iPSC lines described to contribute to viable mice through tetraploid complementation assay (the most stringent proof of pluripotency available so far, GSE16925 and GSE17004) have the highest correlation coefficients when compared with the ESC lines. As expected, the correlation between bivalent genes profiles of fibroblasts and ESCs is very low and espec|es whose expression in iPSCs could restrict or at least bias the differentiation potential. Encouragingly, the iPSC lines that were shown to generate viable mice by tetraploid complementation assays (GSE16925 and GSE17004) express none to very few of such genes, whereas the first iPSCs generated that did not contribute to the germline (GSE5259), as well as the partially reprogrammed iPSC lines (GSE1401|umber of these potentially troublesome genes ( Figure 4 ). For example, the partially reprogrammed iPSC lines 1A2 and 1B3 (GSE14012), as well as the Fbx15KO iPSC line, which showed a limited potency (GSE5259), express Hoxc8, which is a homeodomain gene important for early embryogenesis, especially for neural development, and whose expression level is normally tightly regulated [78] an|133 plus 2.0 (GPL570) Affy HG-U133 plus 2.0 (GPL570) Phalanx Human one aray (GPL6254) Affy HG-U133 plus 2.0 (GPL570) Affy HG-U133 plus 2.0 (GPL570) Affy HG-U133 plus 2.0 (GPL570) GEO accession number GSE12390 Pers. Comm GSE9865 GSE12583 GSE12922 GSE13828 GSE15148 GSE14711 Corr coeff whole array iPS/ES: average (min-max) Primary iPS: 0.988 (0.984–0.989) Secondary iPS: 0.991 (0.990–0.991) | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1149 | GSE7842 | 10/9/2007 | ['7842'] | [] | [u'17922911'] | 2246288 | [u'17922911'] | ['Tavaré', 'Ellis', 'Blenkiron', 'Goldstein', 'Thorne', 'Dunning', u'Tavaré', 'Teschendorff', 'Chin', 'Caldas', 'Miska', 'Green', 'Barbosa-Morais', 'Spiteri'] | ['Tavaré', 'Ellis', 'Blenkiron', 'Goldstein', 'Thorne', 'Dunning', 'Teschendorff', 'Chin', 'Caldas', 'Miska', 'Green', 'Barbosa-Morais', 'Spiteri'] | ['Miska', 'Ellis', 'Blenkiron', 'Goldstein', 'Thorne', 'Dunning', 'Tavaré', 'Green', 'Caldas', 'Teschendorff', 'Chin', 'Barbosa-Morais', 'Spiteri'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1150 | GSE7846 | 10/31/2007 | ['7846'] | ['3060'] | [u'17956924'] | 2752458 | [u'19735579'] | ['Lang', '', 'Zhang', 'Sun', 'Lin', 'Sha', 'Wu', 'Lei', 'Chen'] | ['Zhao', 'He', 'Wang', 'Pan', 'Bai'] | [] | Reprod Biol Endocrinol | 2009 | 9/8/2009 | 1 | microarray raw or normalized data are available. Finally six public gene expression data sets were involved in our study, which assessed endometriosis transcripts on a genome-wide basis. In data set GSE7307, total 677 samples from more than 90 distinct tissue types were processed, but only the profiles related to endometriosis and eutopic endometrium were considered here. The data generated from human| Characteristics of datasets included in the studies. First Author or Contributor Chip GEO Accession Experimental design Classification Probes Number of samples Disease Normal Sha [ 4 ] U133 PLUS 2.0 GSE7846{{tag}}--REUSE-- unpaired, HEECS ovarian 54K 5 5 Burney [ 14 ] U133 PLUS 2.0 GSE6364 unpaired, tissues Ovarian, peritoneal, rectovaginal 54K 21 16 Eyster [ 15 ] CodeLink GSE5108 paired, tissues Ovarian, peritoneal |nd adjusted, normalized and log2 probe-set intensities calculated using the Robust Multichip Averaging (RMA) algorithm in affy package [ 23 , 24 ], and the Codelink arrays normalizations performed in GSE5108 were retained. Genes which cannot be mapped to any KEGG pathway identified were excluded from the further analysis. The interquartile range (IQR) was used as a measure of variability. From the resu | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1151 | GSE7850 | 5/25/2007 | ['7850'] | [] | [u'17525199'] | 2573398 | [u'19122891'] | ['Babra', 'Rosenbaum', 'Choi', 'Smith', 'Davies', 'Zamora', 'Chipps', 'Powers', 'Planck', 'Pan'] | ['Rosenbaum', 'Choi', 'Binek', 'Smith', 'Appukuttan', 'Stout', 'Planck'] | ['Choi', 'Smith', 'Rosenbaum', 'Planck'] | J Ocul Biol Dis Infor | 2008 | 2008 | 0 | ng duplicate cultures of endothelial cells, from which RNA was extracted separately for hybridization to independent arrays). These data may be found in the Gene Expression Omnibus repository [ 13 ] (GSE7850{{tag}}--REUSE--: see retinal/choroidal endothelial cell_human 4/5/6_unstimulated_replicate1/2). Information about the experimental procedure and data normalization and processing were previously published [ 3 ]. S | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1152 | GSE7864 | 7/17/2007 | ['7864'] | [] | [u'17554337'] | 2764428 | [u'19671526'] | ['Lowe', 'Jackson', 'Magnus', 'Xuan', 'Lim', 'de', 'Hannon', 'Zender', 'Xue', 'Cleary', 'Linsley', 'Liang', 'Chen', 'Ridzon', 'He'] | ['Xie', 'Tu', 'Li', 'Liu', 'Yu', 'Hua'] | [] | Nucleic Acids Res | 2009 | 2009 Oct | 1 | es the log 2 -transformed ratio of gene expression between treatment (after transfection) and control (before transfection). ( B ) mRNA level changes of miR-124’s putative non-target genes in GDS2657. Putative non-target genes are the whole set of genes found in GDS2657 minus the putative target genes. ( C ) Numbers of up- (up panel) and down-regulated (below panel) genes at different time poin|. The final TF–gene set included 130 338 relationships between 214 human TFs and 16 534 targets. For more information, see Supplementary Table 3 . MPGE datasets Five groups of MPGE datasets (GDS1858, GDS2657, GSE6474, GSE6838 and GSE7864{{tag}}--REUSE--) were downloaded from the GEO database; these groups include 53 individual datasets involving 19 miRNAs. The GDS1858 dataset group ( 2 ) includes data on HeLa|ansfection with wild type or mutant miR-1, miR-124, or miR-373. GDS2657 ( 13 ) includes gene expression profiles at seven time points (4, 8, 16, 24, 32, 72 and 120 h) after overexpression of miR-124. GSE6474 ( 14 ) includes four replicated measurements of gene expression changes after overexpression of let-7a. 
GSE6838 ( 15 ) includes gene expression data over a time course (6, 10, 14 and 24 h) after ov|mediators of miRNA-triggered regulation, summarized for each of 53 MPGE datasets Dataset group miRNA Cell line Time point K–S test P -value De-graded targets TF mediators Shuffling P -value GDS1858 miR-1 HeLa 12 1.2 e –15 91 ETS1, CREB1, YY1 0 24 4.7 e –15 107 TFAP2A, CREB1, YY1, SREBF1 0 miR-124 12 1.8 e –12 109 GLI3* 0.04 24 6.5 e –22 132 MLLT7, NKX6.1 0.03 m|02013;73 366 AHR*, RREB1 0 32 6.2 e –64 329 AHR*, SP1*, EGR1, RELA*, RREB1, NR3C1*, SP2 0 72 2.2 e –59 292 CREB1, SP1, ETS1, MLLT7, SP2 0 120 1.1 e –19 144 AHR*, SP1*, MLLT7 0 GSE6474 let-7a3 A549 Not known 1.1 e –2 1 PAX3, HOXA1, BACH2, EGR3, MYC 0.02 GSE6838 let-7c HCT116 Dicer−/− #2 24 1.5 e –54 211 MYC 0.05 miR-103 10 5.7 e –08 82 MEF2|– miR-195 10 4.3 e –30 137 SMAD7, NFATC3 0.05 24 7.4 e –31 98 FOXC1 0.08 miR-20 24 9.3 e –30 59 – – miR-215 24 2.6 e –11 38 – – GSE7864{{tag}}--REUSE-- miR-34a A549 H-1 term 24 1.0 e –29 112 E2F5*, YY1 0.03 HCT116 Dicer −/− #2 24 1.1 e –29 132 E2F3, YY1, NFE2L1 0.02 TOV21G H1-term 24 1.4 e –23 70 E2F5, BACH2|al or greater BIC score than that of the regressed Equation ( 2 ). Many of our predictions are supported by independent experimental studies. For example, our analyses of two MPGE datasets for miR-1 (GDS1858) predicted 130 primary targets; 50 (38.4%) of these targets appeared in TarBase ( 19 ), a database collecting experimentally validated miRNA targets. Some miRNAs, like let-7c, miR-16 and miR-17-5p,|ly (Please find these plots in the ‘wrapped results’ available at http://www.biosino.org/kanghu/~DCR/ supplementary file1.zip). With the exception of let-7a-3 in the A549 cell line (GSE6474), all miRNAs were found to have degradation-inducing ability in all surveyed situations, as the K–S test P -values were exclusively <0.001 ( Table 1 ). 
Some miRNAs, such as miR-124|ry targets account for a significant proportion of the observed mRNA level changes in an MPGE dataset ( Figure 1 C). For example, at the 32-h time point after miR-124 overexpression (dataset from the GDS2657 group), miRNA’s direct regulation could explain decreased MCs of only 181 genes; with our predicted two-layer regulatory model, the decreased MCs of an additional 98 genes and increased MCs| of another 42 genes were attributed to miRNA’s indirect regulation, raising the proportion of explainable MCs from 27.8 to 47.7%. The classifications of regulated genes at all time-points in GDS2657 are shown in Supplementary Table 8 , where a general trend is evident that the direct regulation decreases rapidly while the secondary regulation maintains at a considerable multitude, resulting i|tered on miRNA and mediated by TFs Figure 3 depicts a typical two-layer regulatory network, mined from an MPGE dataset measured at the 12-h time point after overexpression of miR-1 (dataset from the GDS1858 group). In addition to directly down-regulating 91 degraded targets (blue arrows), miR-1 overexpression causes expression changes in more than 100 non-target genes, possibly through translationally|O:005114, GO:000752; all FDRs < 0.29), consistent with the known fact that miR-1 is expressed selectively in heart and skeletal muscle. Similar analyses were performed on the miR-124 network (GDS2657, 32 h), resulting in the identification of 129 significant biological themes (FDR < 0.25), among which neuron apoptosis (GO:0051402) is in accordance with miR-124’s proven role in d | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1153 | GSE7868 | 9/1/2007 | ['7868'] | ['3111'] | [u'17679089'] | 2726827 | [u'19632176'] | ['Brown', 'J\xc3\xa4nne', 'Wang', 'Carroll', 'Keeton', 'Li', 'Liu', 'Chinnaiyan', 'Pienta'] | ['J\xc3\xa4nne', 'Han', 'Mehra', 'Li', 'Liu', 'Chinnaiyan', 'Rubin', 'Fiore', 'Regan', 'Brown', 'Balk', 'Zhang', 'Loda', 'Wu', 'Fiorentino', 'Manrai', 'Yuan', 'True', 'Meyer', 'Wang', 'Yu', 'Kantoff', 'Lupien', 'Carroll', 'Beroukhim', 'Chen', 'Xu'] | ['Brown', 'J\xc3\xa4nne', 'Wang', 'Carroll', 'Li', 'Liu', 'Chinnaiyan'] | Cell | 2009 | 7/23/2009 | 0 | The expression raw data for LNCaP cells in the presence or absence of androgen was from our previous work (Wang et al., 2007) (GEO dataset GSE7868{{tag}}--REUSE--). | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1154 | GSE7872 | 6/1/2007 | ['7872'] | [] | [u'19295514'] | 2910248 | [u'19295514'] | [u'Calcar', 'Hawkins', 'Heintzman', 'Hon', 'Stark', 'Liu', 'Ren', 'Crawford', u'Barrera', 'Lee', 'Zhang', 'Ye', 'Kheradpour', u'Glass', 'Ching', u'Kim', 'Kellis', 'Green', 'Stuart', u'Webster', 'Stewart', u'Rosenfeld', u'Wang', 'Thomson', u'Luna', 'Antosiewicz-Bourget', 'Harp', 'Lobanenkov'] | ['Hawkins', 'Lee', 'Stewart', 'Heintzman', 'Hon', 'Ren', 'Zhang', 'Ye', 'Kellis', 'Stark', 'Kheradpour', 'Liu', 'Ching', 'Antosiewicz-Bourget', 'Stuart', 'Thomson', 'Green', 'Lobanenkov', 'Crawford', 'Harp'] | ['Hawkins', 'Lee', 'Stewart', 'Zhang', 'Hon', 'Heintzman', 'Ye', 'Kellis', 'Stark', 'Kheradpour', 'Liu', 'Ching', 'Green', 'Antosiewicz-Bourget', 'Stuart', 'Thomson', 'Ren', 'Lobanenkov', 'Crawford', 'Harp'] | Nature | 2009 | 5/7/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
1155 | GSE7873 | 5/23/2007 | ['7873'] | [] | [u'18179715'] | 2266709 | [u'18179715'] | ['Jones', 'Nuzhdin', 'Ye', 'Mezey'] | ['Jones', 'Nuzhdin', 'Ye', 'Mezey'] | ['Jones', 'Nuzhdin', 'Ye', 'Mezey'] | BMC Evol Biol | 2008 | 1/7/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1156 | GSE7874 | 8/22/2007 | ['7874'] | [] | [u'19597182'] | 2745848 | [u'19597182'] | ['Noh', 'Goh', 'Oneal', 'Miller', 'Russell', 'Meier', 'Byrnes', 'Lee', 'Kiefer', 'Gantt', 'Rognerud', 'Dean', 'Ou', 'Sripichai', 'Bhanu', 'Tanno'] | ['Noh', 'Goh', 'Oneal', 'Miller', 'Russell', 'Meier', 'Byrnes', 'Lee', 'Kiefer', 'Gantt', 'Rognerud', 'Dean', 'Ou', 'Sripichai', 'Bhanu', 'Tanno'] | ['Noh', 'Tanno', 'Oneal', 'Miller', 'Russell', 'Meier', 'Byrnes', 'Lee', 'Kiefer', 'Gantt', 'Rognerud', 'Dean', 'Ou', 'Sripichai', 'Bhanu', 'Goh'] | Blood | 2009 | 9/10/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
1157 | GSE7875 | 10/3/2007 | ['7875'] | [] | [u'17912369'] | 1991598 | [u'17912369'] | ['Gill', 'Paolino', 'Fayard', u'Hollander', 'Holl\xc3\xa4nder', 'Hynx', 'Hemmings'] | ['Gill', 'Paolino', 'Fayard', 'Holl\xc3\xa4nder', 'Hynx', 'Hemmings'] | ['Gill', 'Paolino', 'Fayard', 'Holl\xc3\xa4nder', 'Hynx', 'Hemmings'] | PLoS One | 2007 | 10/3/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1158 | GSE7878 | 8/1/2007 | ['7878'] | [] | [u'18320023'] | 2254495 | [u'18320023'] | ['Giaccone', 'Jassem', 'Gallegos', 'Meijer', 'Floor', 'Rodriguez', 'Ylstra', 'Smit', 'Roepman', 'Beebe', 'Neckers', 'Mooi', 'Niklinski', u'GallegosRuiz', 'van', 'Muley'] | ['Giaccone', 'Jassem', 'Gallegos', 'Meijer', 'Floor', 'Rodriguez', 'Ylstra', 'Smit', 'Roepman', 'Beebe', 'Neckers', 'Mooi', 'Niklinski', 'van', 'Muley'] | ['Giaccone', 'Jassem', 'Gallegos', 'Meijer', 'Floor', 'Rodriguez', 'Ylstra', 'Smit', 'Roepman', 'Beebe', 'Neckers', 'Mooi', 'Niklinski', 'van', 'Muley'] | PLoS One | 2008 | 3/5/2008 | 1 | AND pmc_gds | 1 | 0 | ||||
1159 | GSE7880 | 6/4/2007 | ['7880'] | [] | [] | 2633354 | [u'19087325'] | [u'Garcia-Pardillos', u'Neukirchen', u'Rohrbeck', u'Haas', u'Geddert', u'Rosskopf', u'Rohr', u'Kronenwett'] | ['Liang'] | [] | BMC Med Genomics | 2008 | 12/16/2008 | 1 | using human primary lung cancer specimens (no cell lines) under the search terms "human lung adenocarcinoma" or "human lung squamous carcinoma" as of July 1 st , 2007 were reviewed. Only one dataset, GSE7339, was removed from further analysis due to too many genes missing from the 17-gene signature for prediction analysis. The 14 th dataset GSE2514 in Table 1 that is a mouse gene expression dataset | and subject to analyses. Table 1 Summary of lung cancer datasets used in this study. Database GEO Platform Institute PMID Technology type Organism AD# SCC# Stage # Genes % Correct AD % Correct SCC 1 GSE3398 GPL2648/2778/2832 Stanford 11707590 spotted cDNA Human 41 17 I to III 17 93 94 2 NA* Affy HG-U95A DFCI 11707567 oligonucleotide Human 139 21 I to III 17 91 86 3 NA** Affy HG-U133A U Michigan 121182| I to III 17 95 NA 4 GSE3141 Affy U95A/HuGeneFL Duke 16899777 oligonucleotide Human 54 57 I to III 17 83 72 5 GSE4573 Affy HG-U133A U Michigan 16885343 oligonucleotide Human 0 129 I to III 17 NA 85 6 GSE1037 CHUGAI 41K CIH, Japan 15016488 spotted cDNA Human 12 0 NA 17 83 NA 7 GSE6253 Affy HG-U95A/U133AB Washington U 17194181 oligonucleotide Human 14 36 I 17 79 78 8 GSE3268 Affy HG-U133A UC Davis 161889| Human 0 5 NA 17 NA 100 9 GSE1987 Affy HG-U95A Tel Aviv U NA oligonucleotide Human 8 17 I to III 17 88 59 10 GSE6044 Affy HG-Focus U Duesseldorf, Germany NA oligonucleotide Human 10 10 NA 16 70 90 11 GSE7880{{tag}}--REUSE-- Affy HG-Focus Heinrich-Heine U, Germany NA oligonucleotide Human 25 18 IIIB/IV 16 92 83 12 GSE2514 Affy HG-U95A U Colorado 16314486 oligonucleotide Human 20 0 NA 15 100 NA 13 GSE5843 GSE5123 PC Hum|14486 oligonucleotide Mouse 44 0 NA 13 100 NA * ** Figure 1 A flow diagram outlines selection of 13 human lung cancer 
databases used in this study . This diagram does not include the 14 th dataset GSE2514 in Table 1 that is a mouse gene expression dataset and was not used to calculate the accuracy of prediction in the meta-analysis. DB, database. Clustering analysis and evaluation of classification | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1160 | GSE7889 | 5/26/2007 | ['7889'] | [] | [] | 1978075 | [u'17668008'] | [u'Le', u'Palakodeti', u'Lucas', u'Fernald', u'Young', u'Jiang'] | ['Le', 'Jiang', 'Lucas', 'Fernald', 'Young', 'Palakodeti'] | ['Le', 'Jiang', 'Lucas', 'Fernald', 'Young', 'Palakodeti'] | EMBO Rep | 2007 | 2007 Aug | 0 | ing up to 95°C with a slope of 0.1°C/s. Data deposition. The data discussed in this publication have been deposited in National Center for Biotechnology Information's Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/ ) and are accessible through GEO Series accession number GSE7889{{tag}}--DEPOSIT-- . Supplementary information is available at EMBO reports online ( http://www.nature.com/em | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1161 | GSE7890 | 11/26/2007 | ['7890'] | ['3071'] | [u'17989729'] | 2933038 | [u'17989729'] | ['Opalenik', 'Smith', 'Williams', 'Boone', 'Russell'] | ['Opalenik', 'Smith', 'Williams', 'Boone', 'Russell'] | ['Opalenik', 'Smith', 'Williams', 'Boone', 'Russell'] | J Invest Dermatol | 2008 | 2008 May | 0 | AND pmc_gds | 1 | 0 | ||||
1162 | GSE7891 | 12/21/2007 | ['7891'] | ['3150'] | [u'17956998'] | 2276652 | [u'17956998'] | ['Knepper', 'Pisitkun', 'Uawithya', 'Ruttenberg'] | ['Knepper', 'Pisitkun', 'Uawithya', 'Ruttenberg'] | ['Knepper', 'Pisitkun', 'Uawithya', 'Ruttenberg'] | Physiol Genomics | 2008 | 1/17/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1163 | GSE7895 | 10/2/2007 | ['7895'] | [] | [u'17894889'] | 2831002 | [u'20064233'] | ['Beane', 'Spira', 'Brody', 'Liu', 'Lenburg', 'Sebastiani'] | ['Berger', 'De', 'Pierre', 'Depiereux', 'Bareke', 'Gaigneaux'] | [] | BMC Bioinformatics | 2010 | 1/11/2010 | 0 | Data collection (cel files) was performed using Gene Expression Omnibus [19] on the Affymetrix platform HG-U133a (Human Genome model U133a). This collection consists of 34 datasets (table 2) for which there are at least 15 replicates for each of 2 different experimental conditions. {{key}}--REUSE-- | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1164 | GSE7895 | 10/2/2007 | ['7895'] | [] | [u'17894889'] | 2375039 | [u'17894889'] | ['Beane', 'Spira', 'Brody', 'Liu', 'Lenburg', 'Sebastiani'] | ['Beane', 'Spira', 'Brody', 'Liu', 'Lenburg', 'Sebastiani'] | ['Beane', 'Spira', 'Brody', 'Liu', 'Lenburg', 'Sebastiani'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1165 | GSE7896 | 5/25/2007 | ['7896'] | ['2832'] | [u'18393631'] | 2837036 | [u'20178649'] | ['Avery', 'Shepherd', u'Inniss', 'Moore', 'Heath'] | ['Wang', 'Chao', 'Liu', 'Wu', 'Wong', 'Liang', 'Hsu', 'Hsieh'] | [] | BMC Genomics | 2010 | 2/24/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
1166 | GSE7899 | 12/21/2007 | ['7899'] | [] | [u'17922897'] | 2246287 | [u'17922897'] | ['Ashley', 'Franco', 'Orr', 'Chan', 'Thomson', 'Vanpoucke', 'Grace', 'Williams', 'Hayward'] | ['Ashley', 'Franco', 'Orr', 'Chan', 'Thomson', 'Vanpoucke', 'Grace', 'Williams', 'Hayward'] | ['Ashley', 'Franco', 'Orr', 'Chan', 'Thomson', 'Vanpoucke', 'Grace', 'Williams', 'Hayward'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1167 | GSE7902 | 6/27/2007 | ['7902'] | [] | [u'17597760'] | 2565505 | [u'18941637'] | ['McKay', '', 'Evans', 'Tesar', 'Brook', 'Gardner', 'Chenoweth', 'Davies', 'Mack'] | ['Terskikh', 'Maurer', 'Mercola', 'Nelson', 'Oshima', 'Bajpai', 'Cece\xc3\xb1a'] | [] | PLoS One | 2008 | 2008 | 1 | nd K18 (26×) in EpiSC while RNAs for Nanog, Sox2 and Oct4 are similar in the two cell types. C, coordinate variation of K8 and K18 RNAs in individual array values of ES and EpiSC samples from GSE7902{{tag}}. D, strong correlation between K18 and Jun RNA levels. Jun is a component of the AP-1 transcription factor activity previously identified as important in the induction of K18 RNA in differentiation|ate regulation of pairs of type I and II keratins is not known. Materials and Methods RNA expression analysis Primary RNA expression data from mouse ES cells [35] (Agilent platform, GSE3231), human ES cells [17] (Illumina platform), ( www.stemcellcommunity.org ) and mouse EpiSC and ES cells (Agilent platform, GSE7902{{tag}}--REUSE--) [10] were downloaded and examin | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1168 | GSE7902 | 6/27/2007 | ['7902'] | [] | [u'17597760'] | 2483735 | [u'18665239'] | ['McKay', '', 'Evans', 'Tesar', 'Brook', 'Gardner', 'Chenoweth', 'Davies', 'Mack'] | ['Tsuda', 'Tamai', 'Ochiya', 'Hayashi', 'Teratani', 'Ogawa', 'Ueda', 'Shimizu', 'Kawamata'] | [] | PLoS One | 2008 | 7/30/2008 | 0 | lls ( Figure S8B and Table S3 ). Thus, microarray analysis shows that the gene expression profile of our rES cells partially resembles mES cells. Furthermore, using a database (GEO accession number GSE7902{{tag}}--REUSE--), the expression pattern of rES cells was compared with that of mES cells and EpiSCs [34] . The cluster analyses of rES cells, mES cells and EpiSCs were performed by sorting 2866 a | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1169 | GSE7904 | 5/30/2007 | ['7904'] | [] | [] | 2602602 | [u'19104654'] | [u'Richardson'] | ['Remy', 'Didier', 'Granjeaud', 'Imbert', 'Bergon', 'Nguyen', 'Puthier', 'Lopez', 'Textoris'] | [] | PLoS One | 2008 | 2008 | 1 | iated DKNN value (with k being set typically to 100 for microarrays containing 10 to 50k elements). Distributions of DKNN values observed with both an artificial and a real dataset (Complex9RN200 and GSE1456 respectively, see thereafter for a description) are shown in Figure S3A and S3B (solid curve). The asymmetrical shape of the distribution observed in Figure S3B suggests the presence of a parti|s ability to extract relevant informations from a noisy environment. However, a range of optimal values for inflation parameter needs to be defined to get the best results. Performances of DBF-MCL on GSE1456 dataset Next, DBF-MCL was tested with microarray data to explore its effectiveness in finding clusters of co-regulated genes. To this end, we used the microarray data from Pawitan et al. |sed for analysis. Figure S5B , shows the number of informative genes obtained with various k values. Again, two phases were observed suggesting that regions with heterogeneous densities exist in the GSE1456 dataset. As expected, the transition from dense to sparse regions was less marked than in the artificial dataset. A k value of 100 was chosen to allow the extraction of a large part of data that ca| microarray dataset. Interestingly, in all cases meaningful partitioning results were obtained using inflation parameter set to 2. 10.1371/journal.pone.0004001.g001 Figure 1 Results obtained with the GSE1456 dataset. DBF-MCL was run with GSE1456 as input (k = 100, FDR = 10%, S 1..3 , Inflation = 2). (A) Hierarchical cluster|ected genes. (D) The graph after MCL partitioning. Each point is colored according to its associated class. (E) Correspondence between hierarchical clustering and DBF-MCL results. 
(F) TS obtained for GSE1456 (G) Functional enrichment associated with these TS. Systematic extraction of TS We next applied DBF-MCL algorithm to all experiments performed on human, mouse and rat Affymetrix microarrays and ava|he results panel (2), the information panel (3), the plugins panel (4) and the plugin display panel (5). This example shows the expression profiles of genes contained in the TS CBE3881EB derived from GSE469 (“Temporal profiling in muscle regeneration”). Annotation panel shows that this TS is highly enriched in genes related to ATP synthesis. Meta-analysis of public microarray data using|e 1 Transcriptional signatures containing Affymetrix probes for ESR1, GATA3 and FOXA1. TS ID 1 Genes 2 Probes 2 Samples 2 Sample type GSE ID GPL ID Author PubMed IDs 0F2635383 1190 1572 23 Cell lines GSE6569 GPL96 Huang F et al 2007 17332353 3DE64836D 102 143 62 Tissue GSE7904{{tag}}--REUSE-- GPL570 unpublished 2007 - 59A18E225 690 893 121 Both GSE2603 GPL96 Minn AJ et al 2005 16049480 6C975B20B 88 96 26 Tissue GSE677|lein A et al 2007 17410534 6C975B290 88 96 26 Tissue GSE6596 GPL96 Klein A et al 2007 17410534 7150E17F6 868 1032 34 Cell lines GSE4668 GPL96 Coser KR et al 2003 14610279 8059848B4 200 250 251 Tissue GSE3494 GPL96 Miller LD et al 2005 16141321 84E5E1077 694 883 198 Tissue GSE7390 GPL96 Desmedt C et al 2007 17545524 8F69864F9 68 82 95 Tissue GSE5847 GPL96 Boersma BJ et al 2007 17999412 A151D5695 297 361|et al 2007 17420468 B79B1C0B9 270 380 47 Tissue GSE3744 GPL570 Richardson AL et al 2006 16473279 BDB6D8700 550 679 104 Tissue GSE3726 GPL96 Chowdary D et al 2006 16436632 D8F0B528C 125 152 159 Tissue GSE1456 GPL96 Pawitan Y et al 2005 16280042 E2E620F40 448 616 129 Tissue GSE5460 GPL570 unpublished 2007 - EA9669A21 219 251 158 Tissue GSE3143 GPL91 Bild AH et al 2006 16273092 F310ACC36 519 646 49 Tissue|[15] and in genes related to the PIR keyword “multigene family”. 
Furthermore, several signatures, of Table 2 are related to melanoma and six of them were observed in the GSE7127 experiment [16] . Although data from Table 2 would deserve further analysis they are most likely related to gain or loss of genetic material in tumors. Indeed, gain of 8q is freq|<1.10−20 ) for any of the human cytoband tested. TS ID 1 Enrich. 2 Cytoband q.value Sample type GSE ID GPL ID Authors PubMed ID 3DA3C8345 24% 17q12-q21 1.7.10 −39 Skin GSE5667 GPL97 Plager DA et al 2007 17181634 43CC3EF57 9% 8q24.3 7.0.10 −32 Melanoma GSE7153 GPL570 Unpublished 2007 - 60E29DA83 16% 8q24.3 6.8.10 −24 Melanoma GSE7127 GPL570| GPL570 Johansson P et al 2007 17516929 60E6B4129 35% 20p13 1.6.10 −26 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60E96FF1E 28% 6p21.3 1.2.10 −28 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60EC95F6A 17% 7q22.1 6.3.10 −31 Melanoma GSE7127 GPL570 Johansson P et al 2007 17516929 60EEBD669 32% 11q23.3 1.4.10 −26 Melan|127 GPL570 Johansson P et al 2007 17516929 B4C95CF18 42% 8q24.3 1.1.10 −36 Ovary GSE6008 GPL96 Hendrix ND et al 2006 16452189 A93ED6519 16% 11q23.3 6.9.10 −23 Melanoma GSE7152 GPL570 Packer LM et al 2007 17450523 A93DB01ED 11% 7q22.1 9.5.10 −30 Melanoma GSE7152 GPL570 Packer LM et al 2007 17450523 1 Transcriptional signature ID. 2 Enrichment: Proportion o|ial platforms ( e.g. National Cancer Institute, Vanderbilt Microarray Shared Resource, Genome Institute of Singapore), several of them being related to the MicroArray Quality Control (MAQC) project (GSE5350) [26] . However, to date, systematic analysis of all experiments performed on these platforms has not been done. The flexibility of our approach also makes it possible to integrate|ile. Figure S3 Distributions of DKNN values. Observed DKNN values (solid line) and of a set of simulated DKNN values S (dotted line) are shown for (A) the Complex9RN200 artificial dataset and (B) the GSE1456 microarray dataset. 
(9.01 MB TIF) Click here for additional data file. Figure S4 Colors correspond to the clusters found using the corresponding algorithm (A) The whole dataset (9,112 points). (B) |s input using a range of k values (FDR = 10%, S1..3, Inflation = 1.2). (B) DBF-MCL was run with several microarray datasets as input (including GSE1456) using a range of k values (FDR = 10%, S1..3, Inflation = 2). (8.72 MB TIF) Click here for additional data file. Figure S6 The TBMap plugin. |for technical assistance. References 1 Stoeckert CJ Causton HC Ball CA 2002 Microarray databases: standards and ontologies. Nat Genet 32 Suppl 469 73 12454640 2 Barrett T Edgar R 2006 Gene expression omnibus: microarray data storage, submission, retrieval, and analysis. Methods Enzymol 411 352 69 16939800 3 Diehn M Sherlock G Binkley G Jin H Matese JC 2003 SOURCE: a unified genomic resource of functional | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1170 | GSE7904 | 5/30/2007 | ['7904'] | [] | [] | 2620272 | [u'19014681'] | [u'Richardson'] | ['Tozeren', 'Gormley'] | [] | BMC Bioinformatics | 2008 | 11/17/2008 | 1 | or therapeutic targets in tissue-specific diseases. Table 1 Microarray datasets used in this study Tissue Phenotype Data Tissue No. of Samples Gene Expression Omnibus/Array Express Accn. # Adipose 10 GSE3526 Adrenal 20 GSE3526, GSE8514, GSE2316 Brain 89 GSE3526, GSE7621, GSE7307, GSE2361, E_AFMX-11, E-TABM-20, Colon 10 E-TABM-176, GSE8671, GSE9254, GSE9452 Epidermal 25 GSE1133, GSE2361, GSE3419, GSE352|38 E_AFMX-11, E-MIMR-27, GSE1133, GSE2240, GSE2361, GSE3526, GSE3585, GSE7307 Kidney 10 E_AFMX-11, GSE2004, GSE2361, GSE3526, GSE7392 Liver 10 E_AFMX-11, GSE2004, GSE3526, GSE6764 Lung 26 E-MEXP-231, GSE10072, GSE1133, GSE2361, GSE3526 Mammary 15 E-TABM-66, GSE2361, GSE3526, GSE7307, GSE7904{{tag}}--REUSE-- Muscle 64 GSE10760, GSE2328, GSE3526, GSE5110, GSE6798, GSE7307, GSE9103, Ovary 10 GSE2361, GSE3526, GSE6008, GS|ancreas 6 GSE1133, GSE2361, GSE7307 Peripheral blood 12 GSE7462, GSE8608, GSE8668, GSE8762, GSE9692 Small intestine 7 GSE2361, GSE7307 Spleen 12 GSE2004, GSE2361, GSE3526, GSE7307 Stomach 10 GSE2361, GSE3526, GSE7307 Testis 38 E_AFMX-11, GSE1133, GSE2361, GSE3218, GSE3526, GSE7307, GSE7808 Thymus 5 GSE1133, GSE2361, GSE7307 Infectious Disease Disease No. of Samples Gene Expression Omnibus/Array Express | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1171 | GSE7904 | 5/30/2007 | ['7904'] | [] | [] | 2794997 | [u'19934316'] | [u'Richardson'] | ['Wilpan', 'Mills', 'Kamdar', 'Alley', 'Wright', 'Graber', 'Singh', 'Schott'] | [] | Cancer Res | 2009 | 12/15/2009 | 0 | AND pmc_gds | 0 | 1 | ||||
1172 | GSE7905 | 6/4/2007 | ['7905'] | ['3113'] | [u'19014478'] | 2645369 | [u'19014478'] | ['Rakhmatulin', 'Blake', u'Canales', 'Samaha', 'Nikolsky', 'Guryanov', 'Nikolskaya', 'Dosymbekov', u'Kelly', 'Serebriyskaya', 'Li', u'Frances', 'Sviridov', 'Bugrim', 'Shi', 'Dezso', 'Brennan', u'Julie', u'Yongming'] | ['Rakhmatulin', 'Blake', 'Samaha', 'Nikolsky', 'Guryanov', 'Nikolskaya', 'Dosymbekov', 'Serebriyskaya', 'Li', 'Sviridov', 'Bugrim', 'Shi', 'Dezso', 'Brennan'] | ['Rakhmatulin', 'Blake', 'Samaha', 'Nikolsky', 'Guryanov', 'Nikolskaya', 'Dosymbekov', 'Serebriyskaya', 'Li', 'Sviridov', 'Bugrim', 'Shi', 'Dezso', 'Brennan'] | BMC Biol | 2008 | 11/12/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1173 | GSE7908 | 5/30/2007 | ['7908'] | [] | [] | 2529311 | [u'18710484'] | [u'Jafari', u'Mallard', u'Nino-Soto'] | ['Mallard', 'Nino-Soto', 'Bridle', 'Jozani'] | [u'Mallard', u'Nino-Soto'] | BMC Res Notes | 2008 | 6/23/2008 | 0 | no acid residue substitution. cDNA Microarray experiments Complete and detailed information on microarray experimental protocols, the datasets and the platform were submitted to GEO (accession number GSE7908{{tag}}--DEPOSIT--). Experiments are described according to the MIAME standard [ 11 ]. Heterologous hybridizations (bovine probes – porcine targets) were performed to compare the three groups representing def | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1174 | GSE7929 | 10/2/2007 | ['7929'] | [] | [u'18505921'] | 2880990 | [u'20433688'] | ['Subramanian', 'Ramaswamy', 'Hoshida', 'Shen', 'Mesirov', 'Wagner', 'Hynes', 'Xu', 'Brunet', 'Ross'] | ['Berger', 'Michiels', 'Pierre', 'Depiereux', 'DeHertogh', 'Bareke', 'DeMeulder', 'Gaigneaux'] | [] | BMC Cancer | 2010 | 4/30/2010 | 0 | vailable 15 head and neck squamous cell carcinoma VS. 3 corresponding lymph node metastases HG-UgeneFL AE Available 12 head and neck squamous cell carcinoma VS. 11 corresponding lymph node metastases GSE1056 HG-U95Av2 GEO Not available 2 human hepatocellular carcinoma under hypoxia for 2 hours VS. 2 control human hepatocellular carcinoma HG-U95Av2 GEO Not available 2 human hepatocellular carcinoma unde| GSE2280 HG-U133A GEO Available 22 squamous cell carcinoma of the oral cavity VS. 5 corresponding lymph node metastases GSE2603 HG-U133A GEO Available 100 primary breast cancer VS. 21 lung metastases GSE3325 HG-U133Plus2.0 GEO Available 7 primary prostate cancer VS. 6 metastases GSE4086 HG-U133Plus2.0 GEO Available 2 human Burkitt's lymphoma under hypoxia VS. 2 control human Burkitt's lymphoma GSE468 H|ure VS. 12 samples from culture of cutaneous metastasis of melanoma HG-U133B GEO Not available 3 samples from normal melanocyte culture VS. 12 samples from culture of cutaneous metastasis of melanoma GSE4843 HG-U133Plus2.0 GEO Not available 45 samples from culture of cutaneous melanoma metastasis GSE6369 HG-U133Plus2.0 GEO Available 1 primary prostate carcinoma VS. 1 metastatic prostate carcinoma GSE69|VS. 25 metastatic prostate tumors HG-U95B GEO Available 66 primary prostate tumors VS. 25 metastatic prostate tumors HG-U95C GEO Available 65 primary prostate tumors VS. 25 metastatic prostate tumors GSE7929{{tag}}--REUSE-- HG-U133A GEO Available 11 poorly metastatic melanoma VS. 21 highly metastatic melanoma GSE7930 HG-U133A GEO Available 3 poorly metastatic prostate tumors VS. 
3 highly metastatic prostate tumors GSE| with GCRMA [ 29 ] with the default parameters. Processing was performed with the Window Welch t test [ 35 ]. Due to a low number of conditions or of replicates to be statistically useful, datasets GSE4843 and GSE6369 could not be analyzed individually. These individual analyses provided one gene list for each dataset or sub-dataset. For each gene list, we ranked the genes in ascending order of the p|h director of FNRS (Fonds National de la Recherche Scientifique, Belgium). We thank J.J. LaPres (Biochemistry and Molecular Biology, Michigan State University, East Lansing) for providing the dataset GSE1056 and K.S. Hoek (Department of Dermatology, University Hospital of Zürich, Zürich) for providing the datasets GSE4840 and GSE4843. Friedl P Wolf K Tumour-cell invasion and migration: | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1175 | GSE7929 | 10/2/2007 | ['7929'] | [] | [u'18505921'] | 2978222 | [u'20969778'] | ['Subramanian', 'Ramaswamy', 'Hoshida', 'Shen', 'Mesirov', 'Wagner', 'Hynes', 'Xu', 'Brunet', 'Ross'] | ['Berger', 'De', 'Pierre', 'Depiereux', 'Bareke', 'Gaigneaux', 'Habra'] | [] | BMC Bioinformatics | 2010 | 10/22/2010 | 0 | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 | |
1176 | GSE7929 | 10/2/2007 | ['7929'] | [] | [u'18505921'] | 2756991 | [u'18505921'] | ['Subramanian', 'Ramaswamy', 'Hoshida', 'Shen', 'Mesirov', 'Wagner', 'Hynes', 'Xu', 'Brunet', 'Ross'] | ['Subramanian', 'Ramaswamy', 'Hoshida', 'Shen', 'Mesirov', 'Wagner', 'Hynes', 'Xu', 'Brunet', 'Ross'] | ['Subramanian', 'Xu', 'Hoshida', 'Shen', 'Mesirov', 'Wagner', 'Hynes', 'Ramaswamy', 'Brunet', 'Ross'] | Mol Cancer Res | 2008 | 2008 May | 0 | AND pmc_gds | 1 | 0 | ||||
1177 | GSE7930 | 7/2/2007 | ['7930'] | ['2865'] | [u'17640904'] | 2880990 | [u'20433688'] | ['Shen', 'Barry', 'Whittaker', 'Bronson', 'Hynes', 'Crowley', 'Wong', 'Kissil', 'Haack'] | ['Berger', 'Michiels', 'Pierre', 'Depiereux', 'DeHertogh', 'Bareke', 'DeMeulder', 'Gaigneaux'] | [] | BMC Cancer | 2010 | 4/30/2010 | 0 | vailable 15 head and neck squamous cell carcinoma VS. 3 corresponding lymph node metastases HG-UgeneFL AE Available 12 head and neck squamous cell carcinoma VS. 11 corresponding lymph node metastases GSE1056 HG-U95Av2 GEO Not available 2 human hepatocellular carcinoma under hypoxia for 2 hours VS. 2 control human hepatocellular carcinoma HG-U95Av2 GEO Not available 2 human hepatocellular carcinoma unde| GSE2280 HG-U133A GEO Available 22 squamous cell carcinoma of the oral cavity VS. 5 corresponding lymph node metastases GSE2603 HG-U133A GEO Available 100 primary breast cancer VS. 21 lung metastases GSE3325 HG-U133Plus2.0 GEO Available 7 primary prostate cancer VS. 6 metastases GSE4086 HG-U133Plus2.0 GEO Available 2 human Burkitt's lymphoma under hypoxia VS. 2 control human Burkitt's lymphoma GSE468 H|ure VS. 12 samples from culture of cutaneous metastasis of melanoma HG-U133B GEO Not available 3 samples from normal melanocyte culture VS. 12 samples from culture of cutaneous metastasis of melanoma GSE4843 HG-U133Plus2.0 GEO Not available 45 samples from culture of cutaneous melanoma metastasis GSE6369 HG-U133Plus2.0 GEO Available 1 primary prostate carcinoma VS. 1 metastatic prostate carcinoma GSE69|VS. 25 metastatic prostate tumors HG-U95B GEO Available 66 primary prostate tumors VS. 25 metastatic prostate tumors HG-U95C GEO Available 65 primary prostate tumors VS. 25 metastatic prostate tumors GSE7929 HG-U133A GEO Available 11 poorly metastatic melanoma VS. 21 highly metastatic melanoma GSE7930{{tag}}--REUSE-- HG-U133A GEO Available 3 poorly metastatic prostate tumors VS. 
3 highly metastatic prostate tumors GSE| with GCRMA [ 29 ] with the default parameters. Processing was performed with the Window Welch t test [ 35 ]. Due to a low number of conditions or of replicates to be statistically useful, datasets GSE4843 and GSE6369 could not be analyzed individually. These individual analyses provided one gene list for each dataset or sub-dataset. For each gene list, we ranked the genes in ascending order of the p|h director of FNRS (Fonds National de la Recherche Scientifique, Belgium). We thank J.J. LaPres (Biochemistry and Molecular Biology, Michigan State University, East Lansing) for providing the dataset GSE1056 and K.S. Hoek (Department of Dermatology, University Hospital of Zürich, Zürich) for providing the datasets GSE4840 and GSE4843. Friedl P Wolf K Tumour-cell invasion and migration: | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1178 | GSE7930 | 7/2/2007 | ['7930'] | ['2865'] | [u'17640904'] | 2978222 | [u'20969778'] | ['Shen', 'Barry', 'Whittaker', 'Bronson', 'Hynes', 'Crowley', 'Wong', 'Kissil', 'Haack'] | ['Berger', 'De', 'Pierre', 'Depiereux', 'Bareke', 'Gaigneaux', 'Habra'] | [] | BMC Bioinformatics | 2010 | 10/22/2010 | 0 | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 | |
1179 | GSE7930 | 7/2/2007 | ['7930'] | ['2865'] | [u'17640904'] | 1924789 | [u'17640904'] | ['Shen', 'Barry', 'Whittaker', 'Bronson', 'Hynes', 'Crowley', 'Wong', 'Kissil', 'Haack'] | ['Shen', 'Barry', 'Whittaker', 'Bronson', 'Hynes', 'Crowley', 'Wong', 'Kissil', 'Haack'] | ['Shen', 'Barry', 'Whittaker', 'Bronson', 'Hynes', 'Crowley', 'Wong', 'Kissil', 'Haack'] | Proc Natl Acad Sci U S A | 2007 | 7/31/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1180 | GSE7931 | 8/23/2007 | ['7931'] | [] | [u'17711596'] | 2323221 | [u'17711596'] | ['Tobe', 'Ooka', 'Watanabe', 'Hayashi', 'Kuhara', 'Asadulghani', 'Nougayr\xc3\xa8de', 'Nakayama', 'Oswald', 'Kurokawa', 'Tashiro', 'Ogura', 'Terajima'] | ['Tobe', 'Ooka', 'Watanabe', 'Hayashi', 'Kuhara', 'Asadulghani', 'Nougayr\xc3\xa8de', 'Nakayama', 'Oswald', 'Kurokawa', 'Tashiro', 'Ogura', 'Terajima'] | ['Tobe', 'Ooka', 'Watanabe', 'Kuhara', 'Asadulghani', 'Hayashi', 'Nougayr\xc3\xa8de', 'Nakayama', 'Oswald', 'Kurokawa', 'Tashiro', 'Ogura', 'Terajima'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1181 | GSE7944 | 5/30/2007 | ['7944'] | ['3146', '2814'] | [u'18179590'] | 2785812 | [u'19917117'] | ['Feng', 'Wang', 'Deng', 'Guo', 'Fan', 'Xiang', 'Yu', 'He'] | ['Kroll', 'Barkema', 'Carlon'] | [] | Algorithms Mol Biol | 2009 | 11/16/2009 | 1 | d to the local physical neighbors and to the nearest-neighbor free energy. Table 1 Overview over organisms and number of CEL-files analyzed Organism GEO # Chiptype (dimension) # files min A. Thaliana GSE4847 ATH1-121501 (712 × 712) 18 0.0259 GSE7642 ATH1-121501 (712 × 712) 12 0.0544 GSE9311 ATH1-121501 (712 × 712) 8 0.0546 C. Elegans GSE6547 Celegans (712 × 712) 25 0.036|× 712) 7 0.0396 D. Melanogaster GSE3990 Drosophila_2 (732 × 732) 6 0.0620 GSE6558 DrosGenome1 (640 × 640) 24 0.0605 D. Rerio GSE4859 Zebrafish (712 × 712) 8 0.0357 E. Coli GSE11779 E_coli_2 (478 × 478) 3 0.0869 GSE2928 Ecoli (544 × 544) 12 0.0172 GSE6195 E_coli_2 (478 × 478) 4 0.0664 H. Sapiens GSE10433 HG-U133A_2 (732 × 732) 12 0.0757 GSE5054|-U133A (712 × 712) 14 0.0296 GSE8514 HG-U133_Plus_2 (1164 × 1164) 15 0.0738 GSE11897 MOE430A (712 × 712) 11 0.0640 MOE430B (712 × 712) Mouse430_2 (1002 × 1002) GSE6210 Mouse430_2 12 0.0594 GSE6297 Mouse430_2 24 0.0325 O. Sativa GSE15071 Rice (1164 × 1164) 20 0.1157 R. Norvegicus GSE4494 RG_U34A (534 × 534) 59 0.0488 GSE7493 Rat230_2 (834 ×|2 (834 × 834) 4 0.0640 S. Aureus GSE7944{{tag}}--REUSE-- S_aureus (602 × 602) 6 0.0746 S. Cerevisiae GSE6073 YG_S98 (534 × 534) 12 0.0283 GSE8379 YG_S98 (534 × 534) 8 0.0180 X. Laevis GSE3368 Xenopus_laevis (712 × 712) 20 0.0514 Table 2 Optimized parameter values as obtained from the minimization of Eq. (3). A. Thaliana C. Elegans D. Melanogaster D. Rerio GEO no GSE4847 GSE7642|d in this work. Figure 5 Computed background vs. low expressed genes data . Experimental results (solid black line) and theoretical prediction (dashed red line) of probeset a) 256610_at (A. Thaliana, GSE4847, GSM109107.CEL) and b) 175270_at (C. Elegans, GSE8159, GSM201995.CEL). 
Discussion We have presented a background subtraction scheme for Affymetrix GeneExpression arrays which is both, accurate and | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1182 | GSE7946 | 5/30/2007 | ['7946'] | [] | [u'17471469', u'18959792'] | 2994899 | [u'21152100'] | ['Oosting', 'van', 'de', 'Szuhai', 'Eilers', 'Lips', 'Morreau', 'Karsten', 'Nanya', 'Ogawa', 'Tollenaar'] | ['Amos', 'Tuna', 'Martens', 'Zhu', 'Smid'] | [] | PLoS One | 2010 | 11/30/2010 | 0 | the region of mutated genes in various cancers, thereby identifying the region for next-generation sequencing. Methods/Principal Findings We retrieved large genomic data sets from the Gene Expression Omnibus database to perform genome-wide analysis of aUPD in breast tumor samples and cell lines using approaches that can reliably detect aUPD. aUPD was identified in 52.29% of the tumor samples. The|cted using AsCNAR/CNAGv3 software ( http://genome.umin.jp ) [34] . The raw data (CEL files) of the Affymetrix GeneChip DNA-mapping microarrays from six sets of breast cancer samples; GSE3743 [5] , GSE7545 [35] , GSE10099 [36] , GSE16619 [37] , GSE19399 [38] and GSE13696) [39] were re|ression Omnibus (GEO) database ( http://www.ncbi.nih.nlm.gov/geo ). The analysis was done by using non-self controls with sex-matched reference samples from HapMap data and from previously published, publicly available datasets; GSE14656 [40] , GSE14860 [41] , GSE10922 [42] , GSE11417 [43] , GSE10092 [44] , GSE9611 | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1183 | GSE7946 | 5/30/2007 | ['7946'] | [] | [u'17471469', u'18959792'] | 2584339 | [u'18959792'] | ['Oosting', 'van', 'de', 'Szuhai', 'Eilers', 'Lips', 'Morreau', 'Karsten', 'Nanya', 'Ogawa', 'Tollenaar'] | ['Oosting', 'van', 'de', 'Lips', 'Morreau', 'Karsten', 'Eilers', 'Tollenaar'] | ['Oosting', 'van', 'de', 'Lips', 'Morreau', 'Karsten', 'Eilers', 'Tollenaar'] | BMC Cancer | 2008 | 10/29/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1184 | GSE7951 | 6/30/2007 | ['7951'] | [] | [u'17556504'] | 2923530 | [u'20353606'] | ['Kong', 'Yang', 'Xu', 'Xue', 'Li'] | ['Narsai', 'Ivanova', 'Whelan', 'Ng'] | [] | BMC Plant Biol | 2010 | 3/31/2010 | 0 | To compile the entire publically available Affymetrix rice microarray (as at 1st August 2009), all experiments containing CEL files were downloaded from the Gene Expression Omnibus within the National Centre for Biotechnology Information database or from the MIAME ArrayExpress database http://www.ebi.ac.uk/arrayexpress/. --REUSE-- | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1185 | GSE7951 | 6/30/2007 | ['7951'] | [] | [u'17556504'] | 2633829 | [u'19074628'] | ['Kong', 'Yang', 'Xu', 'Xue', 'Li'] | ['Millar', 'Usadel', 'Carroll', 'Narsai', 'Ivanova', 'Howell', 'Lohse', 'Whelan'] | [] | Plant Physiol | 2009 | 2009 Feb | 1 | later time points. This group was enriched in transcription factors and signal transduction components. A subset of these transiently expressed transcription factors were further interrogated across publicly available rice array data, indicating that some were only expressed during the germination process. Analysis of the 1-kb upstream regions of transcripts displaying similar changes in abundance identi| in the transcript abundance observed for over 6,000 genes in cluster 1). To determine if these were specific to the process of germination, we analyzed the expression of transcription factors across publicly available rice Affymetrix microarray data. These included analyses of over 30 microarrays from different tissues and stress treatments, and following normalization, all data were made relative to max|ession. A, Analysis of the expression profiles of 34 transcription factors that displayed between 70% and 100% of their maximum expression at 1 and 3 HAI in the germination time course, compared with publicly available array data for a variety of rice tissues and treatments. Boxed in yellow are transcription factors that appeared only to be induced during germination (i.e. in this study). Boxed in blue ar|calization. To date, only a few large-scale localization studies have been carried out, so less than 300 could be assigned in this way. 
In order to overcome this, all protein sequence information was downloaded for the 24,150 genes, and four primary sources were employed: (1) experimentally shown localization based on protein work ( Heazlewood et al., 2003 ; Howell et al., 2006 , 2007 ; Kleffmann et al|ss different tissues under different conditions and compare these with the germination transcript abundance profiles generated from this study, rice array data were retrieved from the Gene Expression Omnibus within the National Center for Biotechnology Information database. All data were MAS5.0 normalized and normalized against average ubiquitin expression for that array. These normalized array data were|r data relative to this. This normalization allowed cross-comparison of arrays from all of the different studies at once. The arrays analyzed included all of the arrays from this study, together with publicly available rice genome arrays carried out from different tissues/conditions, including 7-d-old seedlings that were untreated, drought stressed, salt stressed, or cold stressed ( GSE6901 ; Jain et al.| d following pollination, 10-d-old embryos, 10-d-old endosperms, seedling roots, seedling shoots, unpollinated stigmas (at antithesis), ovaries (at antithesis), mature anthers, and suspension cells ( GSE7951{{tag}} ; Li et al., 2007 ); aerobically grown coleoptiles (4 d) and anoxically grown coleoptiles (4 d; GSE6908 ; Lasanthi-Kudahettige et al., 2007 ); crowns and growing points under salt stress and con|and tolerant mutants in subspecies indica and japonica ( GSE4438 ; Walia et al., 2007 ); crowns and growing points under control and salt stress conditions in subspecies indica and japonica (GDS1383; Walia et al., 2005 ); and leaves following biotic stress and control treatments ( GSE7256 ; Ribot et al., 2008 ). Promoter Motif Analysis Following expression analysis, distinct groups of transc|ome “peaking” subsets, where a peak is as defined above. 
3′ UTR Sequence Analysis The full genome 3′ UTR and 5′ UTR sequences are available from TIGR. This was downloaded and filtered to retain only the 3′ UTRs. However, this only added up to 3,027 UTRs available for the “whole genome.” Taking this small number into consideration, it was not |uer et al., 2005 ), and normalization of matched peak areas to the peak area of the internal standard, ribitol, and to fresh tissue weight of extracted samples. The MSRI library was constructed using publicly available AMDIS software (version 2.65) to extract MSRI information for authentic standard derivatives from standard runs and MSRI information for unknown analytes from representative analyses of com | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1186 | GSE7951 | 6/30/2007 | ['7951'] | [] | [u'17556504'] | 2633852 | [u'19010998'] | ['Kong', 'Yang', 'Xu', 'Xue', 'Li'] | ['Taylor', 'Millar', 'Eubel', 'Narsai', 'Whelan', 'Huang'] | [] | Plant Physiol | 2009 | 2009 Feb | 1 | ntal stages and in response to stresses. To analyze the gene expression pattern, we extracted the available rice microarray data from the National Center for Biotechnology Information gene expression omnibus ( http://www.ncbi.nlm.nih.gov/geo ) of six independent studies with relevance for mitochondrial function ( Walia et al., 2005 , 2007 ; Jain et al., 2007 ; Lasanthi-Kudahettige et al., 2007 ; Li e|anges across different tissues and under different conditions and to compare these with the obtained germination transcript abundance profiles, rice array data were retrieved from the Gene Expression Omnibus within the National Center for Biotechnology Information database ( GSE6901 , GSE7951{{tag}}--REUSE-- GSE6908 , GSE4438 , GDS1383, and GSE7256 ). All data were MAS5.0 normalized and normalized against average u|all other data relative to this. This normalization allowed cross-comparison of arrays from all of the different studies at once. The arrays analyzed included all arrays from this study together with publicly available rice genome arrays carried out from different tissues/conditions. Hierarchical clustering across all of the arrays was carried out with average linkage clustering based on Euclidian distanc | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1187 | GSE7951 | 6/30/2007 | ['7951'] | [] | [u'17556504'] | 2825235 | [u'20109239'] | ['Kong', 'Yang', 'Xu', 'Xue', 'Li'] | ['Ghanashyam', 'Jain', 'Bhattacharjee'] | [] | BMC Genomics | 2010 | 1/29/2010 | 0 | olved in a particular biological process. In the second approach, we used the microarray data for various tissues/organs and developmental stages available at GEO database under the accession numbers GSE6893 and GSE7951{{tag}}--REUSE--. The series GSE6893 includes microarray data from 45 hybridizations representing three biological replicates each of 15 different tissues/organs and developmental stages [ 30 ], whereas| GSE7951{{tag}}--REUSE-- includes the microarray data from 12 hybridizations representing 9 different tissue samples [ 33 ]. Because three biological replicates were available only for stigma and ovary in the series GSE7951{{tag}}--REUSE-- dataset, only these data were used in this analysis. All the tissues/organs and developmental stages for which microarray data was analyzed in this study are summarized in Additional file 5 . The |ted [ 6 , 40 ]. To study the effect of various abiotic stresses (desiccation, salt, cold and arsenate) on the expression profiles of GST genes, microarray data available under series accession number GSE6901 [ 30 ] was analyzed. Differential expression analysis for rice seedlings treated with different abiotic stresses (desiccation, salt and cold) as compared to mock-treated control seedlings was perfo|vely. The control seedlings were kept in water for 3 h, at 28 ± 1°C. 
Microarray data analysis The microarray data publicly available at GEO database under the series accession numbers GSE6893 (expression data for reproductive development), GSE7951{{tag}}--REUSE-- (expression profiling of stigma), GSE6901 (expression data for stress treatment), GSE4471 (expression data from rice varieties Azucena and Ba| arsenate), GSE5167 (expression data for auxin and cytokinin response), GSE6719 (expression data for cytokinin response), GSE7256 (expression data for virulent infection by Magnaporthe grisea ), and GSE10373 (expression data for interaction with the parasitic plant Striga hermonthica ) were used for expression analysis of rice GST genes. The entire microarray experiments used in this study are listed | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1188 | GSE7951 | 6/30/2007 | ['7951'] | [] | [u'17556504'] | 2528094 | [u'18650402'] | ['Kong', 'Yang', 'Xu', 'Xue', 'Li'] | ['Liu', 'Zhao', 'Lu', 'Han', 'Huang'] | [] | Plant Physiol | 2008 | 2008 Sep | 0 | a sequences, by BACs directly or by 87 assembled contigs, were performed. The alignment results of BGI 93-11 contigs and Nipponbare pseudomolecules, which were generated by the software nucmer, were downloaded using the GFF Dumper on the TIGR Genome Browser. We found that a small quantity of anchor results were self-contradictory; that is, two 93-11 contigs that localized on the same location yielded opp| more than 100 bp were further confirmed by BLAST2. The indica Guangluai 4 BACs were obtained from http://www.ncgr.ac.cn/chinese/databasei.htm . The genomic sequences of japonica Nipponbare were downloaded from http://www.tigr.org/tdb/e2k1/osa1 , and the indica 93-11 sequences were downloaded from ftp://ftp.genomics.org.cn . Mining of TIPs in the Rice Genome For each insertion region identified a| known TE repeat databases using RepeatMasker, as described above. Those elements, which were composed of a single LTR, were recognized as solo LTR retroelements. EST Analysis and Gene Prediction All publicly available rice ESTs were obtained from the National Center for Biotechnology Information EST database ( http://www.ncbi.nlm.nih.gov/projects/dbEST/ ). Full-length cDNAs of both KOME ( http://red.dna.|t the two gene fragments of indica XIP-I separated by a TE insertion. The probes in the two probe sets were remapped to the rice genomes, Nipponbare pseudomolecules and 93-11 contigs, by BLASTN. We downloaded the microarray data files of each experiment from the GEO Web site ( http://www.ncbi.nlm.nih.gov/geo/ ). Overall, there are 57 chips of indica IR64 (45 from GSE6893 and 12 from GSE6901 ) and 4 | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1189 | GSE7951 | 6/30/2007 | ['7951'] | [] | [u'17556504'] | 2726226 | [u'19604350'] | ['Kong', 'Yang', 'Xu', 'Xue', 'Li'] | ['Feng', 'Zhu', 'Wang', 'Zhang', 'Liu', 'Wu'] | [] | BMC Genomics | 2009 | 7/15/2009 | 0 | r, a site-specific posterior analysis was used to predict amino acid residues that were crucial for functional divergence. Investigation of transcription patterns Gene expression microarray datasets (GSE7951{{tag}}--REUSE--, GSE13161, GSE6893, GSE6908, and GSE6901 for rice; GSE680, GSE7641, and GSE8365 for Arabidopsis ) were downloaded from the GEO database in NCBI. The microarray data of rice include the analysis of | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1190 | GSE7955 | 10/15/2007 | ['7955'] | ['2998'] | [] | 2626604 | [u'19017407'] | [u'Wright', u'Crofton', u'Harrill', u'Li'] | ['Crofton', 'Tornero-Velez', 'Wright', 'Radio', 'Harrill', 'Mundy', 'Li'] | [u'Wright', u'Crofton', u'Harrill', u'Li'] | BMC Genomics | 2008 | 11/18/2008 | 0 | ioD and Cre ) increased linearly on all arrays. Gene expression profiles for this experiment have been archived in the NCBI Gene Expression Omnibus (GEO) repository with the series accession number GSE7955{{tag}}--DEPOSIT--. Microarray Data Analysis Expression summaries were calculated using RMAExpress © v4.7 (University of California at Berkeley). Consistent with previous reports, Robust Multiarray Average ( | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1191 | GSE7956 | 10/2/2007 | ['7956'] | [] | [u'18505921'] | 2978222 | [u'20969778'] | ['Subramanian', 'Ramaswamy', 'Hoshida', 'Shen', 'Mesirov', 'Wagner', 'Hynes', 'Xu', 'Brunet', 'Ross'] | ['Berger', 'De', 'Pierre', 'Depiereux', 'Bareke', 'Gaigneaux', 'Habra'] | [] | BMC Bioinformatics | 2010 | 10/22/2010 | 0 | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 | |
1192 | GSE7956 | 10/2/2007 | ['7956'] | [] | [u'18505921'] | 2880990 | [u'20433688'] | ['Subramanian', 'Ramaswamy', 'Hoshida', 'Shen', 'Mesirov', 'Wagner', 'Hynes', 'Xu', 'Brunet', 'Ross'] | ['Berger', 'Michiels', 'Pierre', 'Depiereux', 'DeHertogh', 'Bareke', 'DeMeulder', 'Gaigneaux'] | [] | BMC Cancer | 2010 | 4/30/2010 | 0 | Abstract article-meta The rapid accumulation of gene expression data has offered unprecedented opportunities to study human diseases. The National Center for Biotechnology Information Gene Expression Omnibus is currently the largest database that systematically documents the genome-wide molecular basis of diseases. However, thus far, this resource has been far from fully utilized. This paper describes th|ferences  The rapid accumulation of high-throughput genomic data offers an unprecedented opportunity to study human diseases. The National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) ( 1 ) with more than 330,000 gene expression profiles and an annual growth rate of 150%, is currently the largest database of its kind. The GEO systematically documents the molecular basis of m|rmance after Stage II refinement. ( B ) An example illustrating the error correction by the Stage II refinement. The query profile studies uterine leiomyomas obtained from fibroid afflicted patients (GDS484). The profile is annotated with four concepts by UMLS text mapping: Connective/Soft Tissue Neoplasm, Muscle tissue neoplasm, fibroid tumor, and uterine fibroids. The Stage I diagnosis predicted four| prediction is later corrected by Stage II refinement. ( C ) The figure presents the 110 disease classes and their hierarchical relationships. 
The red nodes represent diagnosed disease concepts for GDS563: (1) Nervous system disorder (2) Neuromuscular diseases (3) Myopathy (4) Musculoskeletal diseases (5) Congenital, Hereditary, and Neonatal diseases and abnormalities (CHNDA) (6) Genetic diseases, in| prediction performance decreases with the data reduction. Table 1. Prediction result of a subset of prevalent diseases We further exemplify the performance of our approach using the NCBI GEO dataset GDS563. This dataset was produced to identify modifying factors and pathogenic pathways involved in Duchenne Muscular Dystrophy (DMD). It consists of 24 microarrays from two subsets: 12 from DMD patients a|mance is shown in SI Text . A closer examination of the results shows further interesting features of our method. One example comes from the result for a query profiling the T-cells of HIV patients (GDS2649). Even though HIV is not included in the 110 disease classes of our diagnosis database due to the lack of sufficient training data, we obtain the relevant concept RNA virus infection that can descr|.org/cgi/content/full/0912043107/DCSupplemental . References 1. Edgar R, Domrachev M, Lash AE. Gene expression omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002; 30 :207–210. [ PMC free article ] [ PubMed ] 2. Horton PB, Kiseleva L, Fujibuchi W. RaPiDS: an algorith|ssion: directed search of large microarray compendia. Bioinformatics. 2007; 23 (20):2692–2699. [ PubMed ] 5. Zhu Y, et al. GEOmetadb: powerful alternative search engine for the Gene Expression Omnibus. Bioinformatics. 2008; 24 (23):2798–2800. [ PMC free article ] [ PubMed ] 6. Shah NH, et al. Ontology-driven indexing of public datasets for translational bioinformatics. BMC Bioinformatics|e expression: directed search of large microarray compendia. Bioinformatics. 2007 Oct 15; 23(20):2692-9. [Bioinformatics. 
2007] GEOmetadb: powerful alternative search engine for the Gene Expression Omnibus. Bioinformatics. 2008 Dec 1; 24(23):2798-800. [Bioinformatics. 2008] Ontology-driven indexing of public datasets for translational bioinformatics. BMC Bioinformatics. 2009 Feb 5; 10 Suppl 2():S1. | 0 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1193 | GSE7956 | 10/2/2007 | ['7956'] | [] | [u'18505921'] | 2756991 | [u'18505921'] | ['Subramanian', 'Ramaswamy', 'Hoshida', 'Shen', 'Mesirov', 'Wagner', 'Hynes', 'Xu', 'Brunet', 'Ross'] | ['Subramanian', 'Ramaswamy', 'Hoshida', 'Shen', 'Mesirov', 'Wagner', 'Hynes', 'Xu', 'Brunet', 'Ross'] | ['Subramanian', 'Xu', 'Hoshida', 'Shen', 'Mesirov', 'Wagner', 'Hynes', 'Ramaswamy', 'Brunet', 'Ross'] | Mol Cancer Res | 2008 | 2008 May | 0 | AND pmc_gds | 1 | 0 | ||||
1194 | GSE7957 | 10/8/2007 | ['7957'] | ['3040'] | [u'17923522'] | 2168342 | [u'17923522'] | ['Birkland', 'McGuire', 'Parks', 'Gharib', 'Kassim', 'Mecham'] | ['Birkland', 'McGuire', 'Parks', 'Gharib', 'Kassim', 'Mecham'] | ['Birkland', 'McGuire', 'Parks', 'Gharib', 'Kassim', 'Mecham'] | Infect Immun | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1195 | GSE7965 | 6/13/2007 | ['7965'] | [] | [u'18344981', u'20062063'] | 2748096 | [u'19728865'] | ['Lamb', 'Leonardson', 'Gunnarsdottir', 'Thorsteinsdottir', 'Gulcher', 'Fossdal', 'Bjornsdottir', 'Jonasdottir', 'Leifsson', 'Schadt', 'Zhang', 'L\xc3\xb8chen', 'Reitman', 'Holm', 'Emilsson', 'Hald', 'Stoltenberg', 'Mouy', 'Steinthorsdottir', 'Stefansson', 'Gretarsdottir', 'Zhu', 'Kristjansson', 'Gudbjartsson', 'Thorgeirsson', 'Hveem', 'Gudjonsson', 'Wilsgaard', 'Arnar', 'Magnusson', 'Nj\xc3\xb8lstad', 'Styrkarsdottir', 'Gislason', 'Reynisdottir', 'Thorleifsson', 'Mathiesen', 'Helgason', 'Carlson', 'Nyrnes', 'Kong', 'Helgadottir', 'Eiriksdottir', 'Stefansdottir', 'Zink', 'Walters'] | ['Makiguchi', 'Feng', 'Okuno', 'Tamon', 'Tsujimoto', 'Araki', 'Kunimoto', 'Niijima'] | [] | BMC Genomics | 2009 | 9/3/2009 | 1 | from stomach subregions (Additional file 4 -CSV-Gene expression profile of mast cells pooled from mouse stomach subregions) [ 20 ]. In the score-ordered results of query-1 (P-value < 0.01), GSE1827 (titled "Waldman Bladder tumors") was ranked in fourth. Moreover, the top 10 entries showed appropriate annotations related to tumors, inflammatory and immune responses (Table 2 ). For the query-2|ts, and seven entries among the top 10 were observed using rat liver samples (Table 3 ). The biological relationships among the top 10 results of query-3 (P-value < 0.02) were not clear, but GSE6192 (titled "Gene expression changes during murine mucosal mast cell in vitro differentiation") was found out in the twelfth rank (Table 4 ). These findings indicate the general applicability of GEM-T| search using the gene expression profile of human bladder cancer - top 10 entries sorted by similarity scores with lowest P-value < 0.01. 
Series Platform Description Similarity score P-value GSE6112 GPL4475 Tubercolosis and healthy infected patients PBMC_TB_vs_Pool_LTBI 1 0.0054 GSE3901 GPL3279 Response of quiescent human fibroblasts to different growth factors and serum 0.96 0.0004 GSE1726 GP|2567 Human breast tumor 0.854 0.0063 GSE60 GPL174 Diffuse large B-cell lymphoma 0.851 0.0029 GSE838 GPL564 Individual-specific variation of gene expression in peripheral blood leukocytes 0.845 0.0057 GSE3176 GPL1528 p53 In Inflamatory Stress Response 0.815 0.0001 GSE344 GPL273 Spotted long oligonucleotide arrays 0.813 0.0097 GSE7965{{tag}}--REUSE-- GPL3991 Blood and Adipose tissue samples 0.805 1.50E-03 The results w| the gene expression profile of rat chemical hepatocarcinogenesis - top 10 entries sorted by similarity scores with lowest P-value < 0.01. Series Platform Description Similarity score P-value GSE5337 GPL890 Gene Expression Profiling In Rat Smooth Muscle Cells Modulated by Rapamycin and Paclitaxel. 1 1.00E-04 GSE5860 GPL890 Gene expression analysis of rat livers after exposure to acetaminophen 0|en (APAP) Rat Liver Test Gene Expression Data Set 0.895 1.00E-04 GSE5381 GPL890 Gene expression analysis of liver and kidney following methapyrilene treatment in male Sprague-Dawley rats 0.685 0.0015 GSE791 GPL542 GH inj old liver (1-7) 0.657 5.00E-04 GSE4270 GPL890 Aging Induced Alterations in Hepatic Gene Expression of the Male Fisher Rat 0.637 0.0045 GSE3608 GPL3076 Renal medullary genes in salt-sen|ssion profile of mast cells pooled from mouse stomach subregions - top 20 entries sorted by similarity scores with lowest P-value < 0.02. 
Series Platform Description Similarity score P-value GSE3088 GPL2510 Expression profiling of Muscle tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null backgrounds 1 1.00E-04 GSE2814 GPL2510 Expression profiling of liver tissue from (C57BL/6J|time course: regulation of uterine genes by estradiol in ovariectomized mice 0.874 5.00E-04 GSE8104 GPL5137 Primary macrophage response to L. monocytogenes and bacteria-derived ligands 0.855 1.70E-03 GSE8100 GPL5137 WT and myd88-/- macrophage response to WT and hly- L. monocytogenes 0.847 6.00E-04 GSE2220 GPL1832 Genetic variation of gene expression is tissue specific in inbred mice 0.795 1.00E-02 GSE4|stren Behaves as a Weak Estrogen Rather than a Non-genomic Selective Activator in the Mouse Uterus 0.792 0.0034 GSE7029 GPL2510 Zfp90 Transgenic Signature in Mouse White Adipose Tissue 0.788 2.50E-03 GSE7615 GPL2884 Cancer Process Study 0.752 0.0107 GSE7600 GPL2884 Atm-/-, mTerc-/-, p53-/- triple knock-out lymphoma vs normal mouse DNA (GPL2884) 0.752 1.13E-02 GSE3086 GPL2510 Expression profiling of Adi|ose tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null backgrounds 0.726 0.009 GSE6192 GPL891 Gene expression changes during murine mucosal mast cell in vitro differentiation 0.709 1.76E-02 GSE4248 GPL891 Identification of genes regulated by RORg in mouse thymus 0.705 1.99E-02 GSE3087 GPL2510 Expression profiling of brain tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null background|nput Reduction 0.605 8.70E-03 GSE1013 GPL967 Gene Expression Profile of NHE1 Null Mutation 0.552 0.0134 GSE8625 GPL5530 Comparison of undifferentiated ES cell lines HM1, IMT11, SHBL6.3 0.488 1.82E-02 GSE8528 GPL5369 Expression analysis of gene differentially expressed in the developping ovary 0.455 0.0011 GSE3289 GPL2828 Chronic hypoxia alters the level, maturation and control of gene expression in mou | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1196 | GSE7966 | 7/25/2007 | ['7966'] | [] | [u'18577215'] | 2443805 | [u'18577215'] | ['Muthaiyan', 'McDowell', 'Giotis', 'Blair', u'Wilkinsons', 'Wilkinson'] | ['Wilkinson', 'Muthaiyan', 'Blair', 'McDowell', 'Giotis'] | ['Muthaiyan', 'Wilkinson', 'McDowell', 'Blair', 'Giotis'] | BMC Microbiol | 2008 | 6/24/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1197 | GSE7967 | 11/1/2007 | ['7967'] | [] | [u'18039032'] | 2082467 | [u'18039032'] | ['Bhattacharya', 'Valiathan', 'Soontararuks', 'Svensson', 'Kandjanapa', 'Navasumrit', 'Nookabkaew', 'Samson', 'Fry', 'Mahidol', 'Ruchirawat', 'Hogan', 'Luo'] | ['Bhattacharya', 'Valiathan', 'Soontararuks', 'Svensson', 'Kandjanapa', 'Navasumrit', 'Nookabkaew', 'Samson', 'Fry', 'Mahidol', 'Ruchirawat', 'Hogan', 'Luo'] | ['Bhattacharya', 'Valiathan', 'Soontararuks', 'Svensson', 'Kandjanapa', 'Navasumrit', 'Nookabkaew', 'Samson', 'Fry', 'Mahidol', 'Ruchirawat', 'Hogan', 'Luo'] | PLoS Genet | 2007 | 2007 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
1198 | GSE7971 | 6/2/2007 | ['7971'] | ['3039'] | [] | 2654745 | [u'18988674'] | [u'Puttaiah', u'Rudraiah', u'Samji'] | ['Jayaram', 'Priyanka', 'Sridaran', 'Medhamurthy'] | [] | Endocrinology | 2009 | 2009 Mar | 0 | s collected from monkeys treated with VEH or CET for 24 h; the hybridization details and individual CEL and CHP files have been deposited at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE8371 . Microarray analysis results revealed that inhibition of pituitary LH secretion by CET treatment significantly ( P < 0.05) affected the expression of 3949 genes (>2-fold change wi|he CL of monkeys treated with CET plus PBS and CET plus rhLH, hybridization details, and individual CEL and CHP files have been deposited at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE7827 . Microarray analysis results revealed that replacement of exogenous rhLH after inhibition of pituitary LH secretion by CET treatment significantly ( P < 0.05) affected the expression of 4|α-treated monkeys. Microarray comparison analysis, hybridization details, and individual CEL and CHP files have been deposited online at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE7971{{tag}}--DEPOSIT-- . Microarray analysis results revealed that PGF 2α treatment significantly ( P < 0.05) affected the expression of 2290 genes in the CL (>2-fold change with Benjamini and H|lysis data generated for different stages of the CL of the rhesus monkey deposited recently in the public domain by Bogan et al . ( 70 ) at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE10367 . The analysis of microarray data between late midluteal phase and very late luteal phase by GeneSifter software revealed that 2882 genes were differentially regulated (1280 and 1522 up-regulation | 0 | 1 | 0 | NOT pmc_gds | 0 | 1 |
1199 | GSE8004 | 6/25/2007 | ['8004'] | [] | [u'17612406'] | 2875035 | [u'20097653'] | ['Miller', 'Von', 'Fox', 'Roy', 'Watson', 'Olszewski', 'Spencer'] | ['Ruppin', 'Sharan', 'Shlomi', 'Tuller', 'Waldman'] | [] | Nucleic Acids Res | 2010 | 2010 May | 0 | e). GE data All expression data was downloaded from Gene Expression Omnibus ( 34 ) ( http://www.ncbi.nlm.nih.gov/geo/ ). Human tissues (including fetal tissues): we used the GE of Su et al. ( 35 ) (GDS596). As the original data set is redundant (i.e. it includes similar tissues; for example, more than 20 of the tissues are from different parts of the brain) we focused our analysis on 30 (out of 79) n|ssues ( Supplementary Table S2 ). Other GE sets: fetal and adult circulating blood reticulocytes (GDS2655), Mouse tissues (GDS592), Mouse fetal and adult liver (GSE13149), Mouse embryonic stem cells (GDS2666), Yeast (GDS772, wild type), Chimpanzee (GSE7540), Rat (GDS589, three strains), E. coli (GSE6836), D. melanogaster (GSE7763) and C. elegans (GSE8004{{tag}}--REUSE--). We averaged technical repeats and probes | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1200 | GSE8004 | 6/25/2007 | ['8004'] | [] | [u'17612406'] | 2556388 | [u'18836535'] | ['Miller', 'Von', 'Fox', 'Roy', 'Watson', 'Olszewski', 'Spencer'] | ['Abbott', 'Rubin', 'Halpern', 'Ramsey', 'Stephan', 'Hen', 'Alter'] | [] | PLoS One | 2008 | 10/6/2008 | 1 | AND pmc_gds | 0 | 1 | ||||
1201 | GSE8004 | 6/25/2007 | ['8004'] | [] | [u'17612406'] | 2323220 | [u'17612406'] | ['Miller', 'Von', 'Fox', 'Roy', 'Watson', 'Olszewski', 'Spencer'] | ['Miller', 'Von', 'Fox', 'Roy', 'Watson', 'Olszewski', 'Spencer'] | ['Miller', 'Von', 'Fox', 'Roy', 'Watson', 'Olszewski', 'Spencer'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1202 | GSE8005 | 12/26/2007 | ['8005'] | [] | [u'18213484'] | 2921062 | [u'18213484'] | ['Neumann', 'Auslander', 'Ophir', 'Tom', 'Chalifa-Caspi', 'Yudkovski', 'Reinhardt', 'Herut'] | ['Neumann', 'Auslander', 'Ophir', 'Tom', 'Chalifa-Caspi', 'Yudkovski', 'Reinhardt', 'Herut'] | ['Neumann', 'Auslander', 'Ophir', 'Tom', 'Chalifa-Caspi', 'Yudkovski', 'Reinhardt', 'Herut'] | Mar Biotechnol (NY) | 2008 | 2008 May-Jun | 0 | AND pmc_gds | 1 | 0 | ||||
1203 | GSE8006 | 6/6/2007 | ['8006'] | ['2968'] | [u'18439305'] | 2408602 | [u'18439305'] | ['Ohnishi', 'Watanabe', 'Ishige', 'Anjiki', 'Imamura', 'Iizuka', 'Hioki', 'Takashima', 'Nishiyama', 'Yamamoto', 'Munakata'] | ['Ohnishi', 'Watanabe', 'Ishige', 'Anjiki', 'Imamura', 'Iizuka', 'Hioki', 'Takashima', 'Nishiyama', 'Yamamoto', 'Munakata'] | ['Ohnishi', 'Watanabe', 'Ishige', 'Anjiki', 'Imamura', 'Iizuka', 'Hioki', 'Takashima', 'Nishiyama', 'Yamamoto', 'Munakata'] | BMC Genomics | 2008 | 4/26/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1204 | GSE8014 | 10/1/2007 | ['8014'] | [] | [u'17586671'] | 1950988 | [u'17586671'] | ['Veening', 'Lulko', 'Beekman', 'Bron', 'Smits', 'Buist', 'Blom', 'Kuipers'] | ['Veening', 'Lulko', 'Beekman', 'Bron', 'Smits', 'Buist', 'Blom', 'Kuipers'] | ['Veening', 'Lulko', 'Beekman', 'Bron', 'Smits', 'Buist', 'Blom', 'Kuipers'] | Appl Environ Microbiol | 2007 | 2007 Aug | 0 | AND pmc_gds | 1 | 0 | ||||
1205 | GSE8019 | 11/12/2007 | ['8019'] | [] | [u'17925015'] | 2148066 | [u'17925015'] | ['Hakvoort', 'van', 'Vermeulen', 'Gilhuijs-Pederson', 'Nikolsky', 'Evelo', u'Sokolovi\u0107', 'Wehkamp', 'Sokolovi\xc4\x87', 'Lamers'] | ['Hakvoort', 'van', 'Vermeulen', 'Gilhuijs-Pederson', 'Nikolsky', 'Evelo', 'Wehkamp', 'Sokolovi\xc4\x87', 'Lamers'] | ['Hakvoort', 'van', 'Vermeulen', 'Gilhuijs-Pederson', 'Nikolsky', 'Evelo', 'Wehkamp', 'Sokolovi\xc4\x87', 'Lamers'] | BMC Genomics | 2007 | 10/9/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1206 | GSE8023 | 12/20/2007 | ['8023'] | [] | [u'17975013'] | 2447788 | [u'18463138'] | ['Krejci', 'Geiger', 'Andreassen', 'Wunderlich', 'Schleimer', 'Jansen', 'Mulloy', 'Chou'] | ['Brandt', "'t", 'den', 'Villerius', 'Ivliev'] | [] | Nucleic Acids Res | 2008 | 7/1/2008 | 1 | ‘p53[Title] or tp53[Title]’ yielded 30 GEO Series (as on 12 January 2008). Searching for GEO Series with the same query via PubMed added 11 more experiments. Four of these 11 experiments (GSE2155, GSE3072, GSE7678 and GSE8023{{tag}}--REUSE--) contain neither of the terms ‘p53’ or ‘tp53’ in their titles nor in the GEO annotation (except for the ‘citation’ fiel | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1207 | GSE8023 | 12/20/2007 | ['8023'] | [] | [u'17975013'] | 2234055 | [u'17975013'] | ['Krejci', 'Geiger', 'Andreassen', 'Wunderlich', 'Schleimer', 'Jansen', 'Mulloy', 'Chou'] | ['Krejci', 'Geiger', 'Andreassen', 'Wunderlich', 'Schleimer', 'Jansen', 'Mulloy', 'Chou'] | ['Krejci', 'Geiger', 'Andreassen', 'Wunderlich', 'Jansen', 'Schleimer', 'Mulloy', 'Chou'] | Blood | 2008 | 2/15/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1208 | GSE8024 | 6/25/2007 | ['8024'] | [] | [u'17603471'] | 2657164 | [u'19094206'] | ['Wernig', 'Xie', 'Nusbaum', 'Alvarez', 'Bernstein', 'Lander', 'Lee', 'Issac', 'Mendenhall', 'Brockman', 'Kim', 'Koche', 'Presser', 'Jaffe', 'Lieberman', 'Mikkelsen', 'Ku', 'Russ', "O'Donovan", 'Giannoukos', 'Jaenisch', 'Meissner'] | ['Chepelev', 'Won', 'Ren', 'Wang'] | [] | BMC Bioinformatics | 2008 | 12/18/2008 | 0 | e unambiguously assigned to a gene, namely located 2.5 Kb within an annotated TSS. Mikkelsen et. al conducted replicate measurements of gene expression in the same cell lines (GEO accession number is GSE8024{{tag}}--REUSE--). There were 13482 unique genes in their experiments. The numbers of active and inactive genes in each cell line were counted using the majority rule in the replicate experiments and the genes with Based on the gene expression measured by Mikkelsen et al., we randomly selected 200 active and 200 inactive promoters in the ES cells as the training set. | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1209 | GSE8024 | 6/25/2007 | ['8024'] | [] | [u'17603471'] | 2941458 | [u'20862250'] | ['Wernig', 'Xie', 'Nusbaum', 'Alvarez', 'Bernstein', 'Lander', 'Lee', 'Issac', 'Mendenhall', 'Brockman', 'Kim', 'Koche', 'Presser', 'Jaffe', 'Lieberman', 'Mikkelsen', 'Ku', 'Russ', "O'Donovan", 'Giannoukos', 'Jaenisch', 'Meissner'] | ['Izpis\xc3\xbaa', 'Barrero', 'Paramonov', 'Bou\xc3\xa9'] | [] | PLoS One | 2010 | 9/17/2010 | 0 | sor genes differing in iPSCs from ESCs, and which could be the source of higher risks. Materials and Methods Gene expression analysis The datasets used for the human analyses are: Takahashi et al. (GSE9561) [5] ; Yu et al. (GSE9071) [7] ; Park et al. (GSE9832) [6] ; Zhao et al. (GSE12922) [52] ; Masaki et al. (GSE9709) |2390) [30] ; Aasen et al. (GSE12583) [37] ; Huangfu et al. (pers. comm.) [53] ; Lowry et al. (GSE9865) [54] ; Ebert et al. (GSE13828) [15] ; Yu et al. (GSE15148) [55] ; Soldner et al. (GSE14711) [11] . The datasets used for the mouse analyses are: Takahashi et al. (GSE525|7815) [30] ; Feng et al. (GSE13211) [56] ; Sridharan et al. (GSE14012) [35] ; Wernig et al. (E-MEXP1037) [4] ; Chen et al . (GSE15267); Zhou et al . (GSE16062) [57] ; Zhao et al . (GSE16925) [20] ; Kang et al . (GSE17004) [19] ; Heng et al . (GSE19023) [58] ; Ic|(wpr =  pr *weight). Next, the average percentrank and the average weighted percentrank were identified for the replicates of each sample. In addition, for the dataset GSE7841 we have averaged the available iPSCs samples (day2, day16, day17 and day18). For the dataset E-MEXP-1037 we have averaged iPSCs samples (clones 8 and 18). For the dataset GSE13211 we have averaged |OSCE (clones 8 and 13) and iPSCs OSE (clones T8 and T9) samples. For the dataset GSE14012 we have averaged ESCs (v6.5 and E14), MEFs (male and female) and iPSCs (1D4 and 2D4) samples. 
For the dataset GSE15267 we have averaged ESCs (CGR8 and R1), iPSCs reprogrammed with four factors (S2C12 and S2C16) and iPSCs reprogrammed with 3 factors (S53C1 and S53C5). For the dataset GSE19023 we have averaged MEFs |teworthy that the bivalent genes profiles of the iPSC lines described to contribute to viable mice through tetraploid complementation assay (the most stringent proof of pluripotency available so far, GSE16925 and GSE17004) have the highest correlation coefficients when compared with the ESC lines. As expected, the correlation between bivalent genes profiles of fibroblasts and ESCs is very low and espec|es whose expression in iPSCs could restrict or at least bias the differentiation potential. Encouragingly, the iPSC lines that were shown to generate viable mice by tetraploid complementation assays (GSE16925 and GSE17004) express none to very few of such genes, whereas the first iPSCs generated that did not contribute to the germline (GSE5259), as well as the partially reprogrammed iPSC lines (GSE1401|umber of these potentially troublesome genes ( Figure 4 ). For example, the partially reprogrammed iPSC lines 1A2 and 1B3 (GSE14012), as well as the Fbx15KO iPSC line, which showed a limited potency (GSE5259), express Hoxc8, which is a homeodomain gene important for early embryogenesis, especially for neural development, and whose expression level is normally tightly regulated [78] an|133 plus 2.0 (GPL570) Affy HG-U133 plus 2.0 (GPL570) Phalanx Human one aray (GPL6254) Affy HG-U133 plus 2.0 (GPL570) Affy HG-U133 plus 2.0 (GPL570) Affy HG-U133 plus 2.0 (GPL570) GEO accession number GSE12390 Pers. Comm GSE9865 GSE12583 GSE12922 GSE13828 GSE15148 GSE14711 Corr coeff whole array iPS/ES: average (min-max) Primary iPS: 0.988 (0.984–0.989) Secondary iPS: 0.991 (0.990–0.991) | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1210 | GSE8024 | 6/25/2007 | ['8024'] | [] | [u'17603471'] | 2765275 | [u'19687145'] | ['Wernig', 'Xie', 'Nusbaum', 'Alvarez', 'Bernstein', 'Lander', 'Lee', 'Issac', 'Mendenhall', 'Brockman', 'Kim', 'Koche', 'Presser', 'Jaffe', 'Lieberman', 'Mikkelsen', 'Ku', 'Russ', "O'Donovan", 'Giannoukos', 'Jaenisch', 'Meissner'] | ['Andersson', 'Rada-Iglesias', 'Wadelius', 'Komorowski', 'Enroth'] | [] | Genome Res | 2009 | 2009 Oct | 0 | l. 1996 ; Kogan and Trifonov 2005 ). Positioned nucleosomes at these junctions were proposed to protect splice sites at exon starts from mutations ( Kogan and Trifonov 2005 ). In this letter, we use publicly available data from genome-wide studies to examine: (1) if nucleosomes are positioned at internal exons; (2) if certain histone modifications are preferentially found at internal exons compared to fi|ns Methods References Results and Discussion Nucleosomes are well positioned at internal exons independent of transcription level In order to investigate intragenic nucleosome positioning we analyzed publicly available sequencing data from human CD4 + T-cells ( Schones et al. 2008 ) and Caenorhabditis elegans cells ( Valouev et al. 2008 ). We constructed footprints of these nucleosome signals centered |odified in a transcription-dependent manner, rather than remodeled. For humans, genome-wide data for 38 different histone methylations ( Barski et al. 2007 ) and acetylations ( Wang et al. 2008 ) are publicly available. These data are generated from the same cell type, i.e., CD4 + T-cells, as the nucleosome data ( Schones et al. 2008 ). Human gene-expression measurements for this cell type are also avail|clusions Methods References Methods Annotations Nucleosome and histone modification data for H. sapiens resting CD4 + T-cells ( Barski et al. 2007 ; Schones et al. 2008 ; Wang et al. 2008 ) were publicly available in hg18 (March 2006) coordinates. The exon-expression ( GSE11384 ) ( Oberdoerffer et al. 2008 ) and gene-expression data ( Su et al. 
2004 ) for human CD4 + T-cells were annotated to Ensemb|e6 (May 2008) coordinates, and we used the corresponding Ensembl database, release 50 (WS190), as the source of annotations. The Mus musculus embryonic stem cell data ( Mikkelsen et al. 2007 ) were publicly available in mm8 (Feb. 2006) coordinates. Gene annotations for the array platform used for the mouse expression experiment ( GSE8024{{tag}}--REUSE-- ) ( Mikkelsen et al. 2007 ) were extracted from the Ensembl databa|ording to expression We categorized the Ensembl genes into three groups according to gene-expression measurements from the human CD4 + T-cells ( Su et al. 2004 ) and the mouse embryonic stem cells ( GSE8024{{tag}}--REUSE-- ) ( Mikkelsen et al. 2007 ), respectively. High-expressed genes were defined with an expression level above one standard deviation over the mean. The groups of medium- and low-expressed genes were | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1211 | GSE8024 | 6/25/2007 | ['8024'] | [] | [u'17603471'] | 2904778 | [u'20657823'] | ['Wernig', 'Xie', 'Nusbaum', 'Alvarez', 'Bernstein', 'Lander', 'Lee', 'Issac', 'Mendenhall', 'Brockman', 'Kim', 'Koche', 'Presser', 'Jaffe', 'Lieberman', 'Mikkelsen', 'Ku', 'Russ', "O'Donovan", 'Giannoukos', 'Jaenisch', 'Meissner'] | ['Laframboise', 'Adams', 'Akhtar-Zaidi', 'Schnetz', 'Tesar', 'Wei', 'Handoko', 'Flicek', 'Bartels', 'Scacheri', 'Fisher', 'Pereira', 'Crawford'] | [] | PLoS Genet | 2010 | 7/15/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
1212 | GSE8024 | 6/25/2007 | ['8024'] | [] | [u'17603471'] | 2829121 | [u'20107439'] | ['Wernig', 'Xie', 'Nusbaum', 'Alvarez', 'Bernstein', 'Lander', 'Lee', 'Issac', 'Mendenhall', 'Brockman', 'Kim', 'Koche', 'Presser', 'Jaffe', 'Lieberman', 'Mikkelsen', 'Ku', 'Russ', "O'Donovan", 'Giannoukos', 'Jaenisch', 'Meissner'] | ['Wernig', 'Ostermeier', 'Vierbuchen', 'Kokubu', 'Pang', 'S\xc3\xbcdhof'] | ['Wernig'] | Nature | 2010 | 2/25/2010 | 0 | searched for genes specifically expressed in neural tissues. Those) were selected based on published expression arrays of MEFs, ES cells and neural progenitor cells retrieved from the Gene Expression Omnibus database ( GSE8024{{tag}} , http://www.ncbi.nlm.nih.gov/gds ) and the EST Profile function of NCBI’s Unigene database ( http://www.ncbi.nlm.nih.gov/unigene ). cDNAs for the factors included in the | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1213 | GSE8024 | 6/25/2007 | ['8024'] | [] | [u'17603471'] | 2921165 | [u'17603471'] | ['Wernig', 'Xie', 'Nusbaum', 'Alvarez', 'Bernstein', 'Lander', 'Lee', 'Issac', 'Mendenhall', 'Brockman', 'Kim', 'Koche', 'Presser', 'Jaffe', 'Lieberman', 'Mikkelsen', 'Ku', 'Russ', "O'Donovan", 'Giannoukos', 'Jaenisch', 'Meissner'] | ['Wernig', 'Xie', 'Nusbaum', 'Alvarez', 'Bernstein', 'Lander', 'Lee', 'Issac', 'Mendenhall', 'Brockman', 'Kim', 'Koche', 'Presser', 'Jaffe', 'Lieberman', 'Mikkelsen', 'Ku', 'Russ', "O'Donovan", 'Giannoukos', 'Jaenisch', 'Meissner'] | ['Wernig', 'Xie', 'Nusbaum', 'Alvarez', 'Bernstein', 'Lander', 'Lee', 'Issac', 'Mendenhall', 'Brockman', 'Kim', 'Koche', 'Presser', 'Jaffe', 'Lieberman', 'Mikkelsen', 'Ku', 'Russ', "O'Donovan", 'Giannoukos', 'Jaenisch', 'Meissner'] | Nature | 2007 | 8/2/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1214 | GSE8025 | 12/1/2007 | ['8025'] | [] | [] | 2575512 | [u'18680595'] | [u'Nambiar', u'Fox', u'Carey', u'Schauer', u'Borenshtein', u'Groff', u'Fry'] | ['Nambiar', 'Fox', 'Carey', 'Schauer', 'Borenshtein', 'Groff', 'Fry'] | ['Nambiar', 'Fox', 'Carey', 'Schauer', 'Borenshtein', 'Groff', 'Fry'] | Genome Biol | 2008 | 2008 | 0 | nal groups was determined with FatiGO as indicated above. Raw data and normalized microarray expression data have been deposited at the Gene Expression Omnibus (GEO) [ 88 ] under the accession number GSE8025{{tag}}--DEPOSIT--. TaqMan quantitative RT-PCR Total RNA (5 μg) was used to generate cDNA with SuperScriptII RT (Invitrogen) as recommended by the manufacturer. cDNA (100 ng) was amplified in a 25 μl | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1215 | GSE8035 | 10/16/2007 | ['8035'] | [] | [u'17933919'] | 2168061 | [u'17933919'] | ['Pronk', 'Knijnenburg', 'De', 'Daran-Lapujade', u'Daran-lapujade', 'Walker', 'Walsh', 'Reinders', 'Hazelwood', 'Daran'] | ['Pronk', 'Knijnenburg', 'De', 'Daran-Lapujade', 'Walker', 'Walsh', 'Reinders', 'Hazelwood', 'Daran'] | ['Pronk', 'Knijnenburg', 'De', 'Daran-Lapujade', 'Walker', 'Daran', 'Walsh', 'Reinders', 'Hazelwood'] | Appl Environ Microbiol | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1216 | GSE8044 | 6/8/2007 | ['8044'] | ['2813'] | [u'17618855'] | 2564846 | [u'17618855'] | ['Langin', 'Yang', 'Rohas', 'Uldry', 'Chin', 'Spiegelman', 'Kajimura', 'Seale', 'Tavernier'] | ['Langin', 'Yang', 'Rohas', 'Uldry', 'Chin', 'Spiegelman', 'Kajimura', 'Seale', 'Tavernier'] | ['Uldry', 'Langin', 'Rohas', 'Yang', 'Chin', 'Spiegelman', 'Kajimura', 'Seale', 'Tavernier'] | Cell Metab | 2007 | 2007 Jul | 0 | AND pmc_gds | 1 | 0 | ||||
1217 | GSE8047 | 6/30/2007 | ['8047'] | [] | [u'17612405'] | 1940012 | [u'17612405'] | ['Ehrenkaufer', 'Singh', 'Eichinger'] | ['Ehrenkaufer', 'Singh', 'Eichinger'] | ['Ehrenkaufer', 'Singh', 'Eichinger'] | BMC Genomics | 2007 | 7/5/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1218 | GSE8048 | 6/8/2007 | ['8048'] | [] | [u'17877838'] | 2077334 | [u'17877838'] | ['Qu', 'Song', 'Hu', 'Chen', u'Bao', 'Yu'] | ['Chen', 'Yu', 'Hu', 'Qu', 'Song'] | ['Chen', 'Yu', 'Hu', 'Qu', 'Song'] | BMC Plant Biol | 2007 | 9/19/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1219 | GSE8051 | 11/28/2007 | ['8051'] | ['3074'] | [u'17986358'] | 2917432 | [u'20642857'] | ['Pittelkow', 'Meaney', 'Brackenbury', 'Peng', 'Ohms', 'Wilson', 'Hill', 'Sandow', 'Grayson'] | ['Green', 'Han', 'Yu', 'Qiao', 'Huang'] | [] | BMC Syst Biol | 2010 | 7/20/2010 | 0 | publications [ 17 - 24 ]. SNP signals that have been replicated in multiple large-scale association or GWAS were obtained from ref. [ 19 , 20 ]. Gene expression microarray datasets were obtained from GSE8051{{tag}}--REUSE--, GSE703, and GSE4707 for HT, GSE16415 and ref. [ 13 , 14 , 35 ] for T2D. Compilation of HT and T2D phenotypes HT and T2D phenotypes were manually selected and mapped to the phenotype ontology terms | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1220 | GSE8051 | 11/28/2007 | ['8051'] | ['3074'] | [u'17986358'] | 2219888 | [u'17986358'] | ['Pittelkow', 'Meaney', 'Brackenbury', 'Peng', 'Ohms', 'Wilson', 'Hill', 'Sandow', 'Grayson'] | ['Pittelkow', 'Meaney', 'Brackenbury', 'Peng', 'Ohms', 'Wilson', 'Hill', 'Sandow', 'Grayson'] | ['Pittelkow', 'Meaney', 'Wilson', 'Brackenbury', 'Peng', 'Ohms', 'Hill', 'Sandow', 'Grayson'] | BMC Genomics | 2007 | 11/7/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1221 | GSE8052 | 7/1/2007 | ['8052'] | [] | [u'17611496'] | 2557141 | [u'18846218'] | ['Cookson', 'von', 'Frischer', 'Vogelberg', 'Dixon', 'Liang', 'Bufe', 'Farrall', 'Abecasis', 'Depner', 'Rietschel', 'Illig', 'Lathrop', 'Simma', 'Weiland', 'Kabesch', 'Heath', 'Wong', 'Gut', 'Strachan', 'Heinzmann', 'Moffatt', 'Willis-Owen'] | ['Ouyang', 'Krontiris', 'Smith'] | [] | PLoS One | 2008 | 2008 | 1 | s time taking advantage of the high-density HapMap Phase II data – within the block containing the reported peak SNPs. We applied public HapMap expression data across three major populations (GSE2552 [13] and GSE5859 [19] , based on the Affymetrix platform, and GSE6536 [20] , based on the Illumina platform). 10.1371/journal.pone.0003362.t001 T|in Ibadan, Nigeria; CHB: Han Chinese in Beijing, China; JPT: Japanese in Tokyo, Japan. 3 Relative position (upstream; up / downstream; dn) to initiation/termination sites. 4 Based on public data from GSE 6536 (Illumina platform) or GSE 2552 / GSE 5859 (Affymetrix platform). 5 Reported in Morley et al., Nature 430, 743–7 (2004). 6 Reported in Cheung et al., Nature 437, 1365–9 (2005).|es in the lineage), a SNP with the most complete genotypes (underlined) was chosen for testing association. The nominal p-values for these SNPs in each major population, based on expression data sets GSE6536 (Illumina platform) and GSE2552/GSE5859 (Affymetrix platform), are shown in order. The coalescent-based maximum likelihood tree structure and the regression of expression phenotypes are plotted at |ed the same evolutionarily conserved feature, we next asked whether their cis -regulatory phenotypes were, as expected, also conserved across populations. 
Based on one set of expression data in YRI (GSE6536, Illumina platform) and two sets data in CHB/JPT (GSE5859, Affymetrix platform; GSE6536, Illumina platform), we tested the association for all tagging SNPs ( Figure 4 and Supporting Information F|age or allelic imbalance (AI) assays, we added a further validation step by confirming the cis -association using an independent dataset having a relatively large sample size [33] (GSE8052{{tag}}--REUSE--; Affymetrix platform; 400 UK samples). Thirty of the 44 genes passed the genome-wide significance threshold (a LOD score of 6.076, corresponding to a false discovery rate of 0.05, as listed in supp|tion. Association analysis between each tagging SNP and two sets of HapMap expression data, based on two (Affymetrix and Illumina) platforms and across three HapMap populations, (GEO accession number GSE2552 [13] , GSE5859 [19] , and GSE6536 [20] ), was conducted by following the regression methods described in Cheung et al. [13] (dis | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1222 | GSE8052 | 7/1/2007 | ['8052'] | [] | [u'17611496'] | 2967749 | [u'21044366'] | ['Cookson', 'von', 'Frischer', 'Vogelberg', 'Dixon', 'Liang', 'Bufe', 'Farrall', 'Abecasis', 'Depner', 'Rietschel', 'Illig', 'Lathrop', 'Simma', 'Weiland', 'Kabesch', 'Heath', 'Wong', 'Gut', 'Strachan', 'Heinzmann', 'Moffatt', 'Willis-Owen'] | ['Yousif', 'Mbagwu', 'Ohno-Machado', 'Lacson'] | [] | BMC Bioinformatics | 2010 | 10/28/2010 | 0 | es/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Background The amount of data deposited in the Gene Expression Omnibus (GEO) has expanded significantly. It is important to ensure that these data are properly annotated with clinical data and descriptions of experimental conditions so that they can be useful for future|istency. Association between relevant variables, however, was adequate. 10–12 March 2010 2010 AMIA Summit on Translational Bioinformatics San Francisco, CA, USA Background The Gene Expression Omnibus (GEO) project was initiated by the National Center for Biotechnology Information (NCBI) to serve as a repository for gene expression data [ 1 , 2 ]. In addition to GEO, there are several other large-| 400,000 samples. There has been an ever growing interest in large microarray repositories for several reasons: (a) Microarray data are required by funding agencies and scientific journals to be made publicly accessible; (b) such repositories enable researchers to view data from other research groups; and (c) with proper pre-processing, such repositories may allow researchers to formulate and test hypothe|viously described [ 14 , 15 ]. The annotation tool used for this research was developed to facilitate human annotation by allowing easy access between the data descriptions and measurements that were downloaded from GEO and appropriate scientific publications from Pubmed [ 13 ]. 
The annotators are able to read the study descriptions that researchers deposited in GEO, as well as individual sample descripti|, and the results are displayed in Table 3 . Table 4 shows all the studies’ goals and the number of samples in each of the 17 annotated studies. Table 3 Coverage of Asthma variables in GDS GSE 470 GSE 473 GSE 3183 GSE 3004 Total Agent 100% 0% 100% 100% 17.4% Disease State 100% 100% 0% 0% 88.2% Time 100% 0% 100% 0% 12.7% Other 0% 100% 0% 0% 82.5% No. of Samples 12 175 15 10 212 Table 4 Annota|dy No. of Samples Topic/Title GSE8052{{tag}}--REUSE-- 404 Determinants of susceptibility to childhood asthma GSE473 175 Defining diagnostic genes from purified CD4+ blood cells that have specific diagnostic profiles GSE4302 118 Profiling of airway epithelial cells GSE3184 40 Murine airway hyperresponsiveness GSE483 39 Allergic response to ragweed GSE1301 24 Mechanisms by which IL-13 elicits the symptoms of asthma GSE8|fects of exercise on gene expression GSE6858 16 Expression data from experimental murine asthma GSE3183 15 Early cytokine-mediated mechanisms that lead to asthma GSE470 12 Asthma exacerbatory factors GSE9465 12 Pulmonary responses to ambient particulate matter GSE3004 10 Effects of allergen challenge on airway cell gene expression GSE2276 9 Effect of PGE receptor subtype agonist on an asthma model GSE4|d inhaler 697 24.1 Disease frequency 627 31.7 Gender 489 46.7 Atopic 425 53.7 Tissue 403 56.1 Challenge 0 1.0 The consistency of the studies in the asthma domain was also measured. In one such study (GSE4302), the data for 32 asthmatics randomized to a placebo-controlled trial of fluticasone propionate were examined. The authors use the generic name “fluticasone propionate” within both | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1223 | GSE8058 | 12/1/2007 | ['8058'] | ['3140'] | [u'17959600'] | 2739662 | [u'17959600'] | ['Melov', 'Vantipalli', 'Hubbard', 'Lithgow', 'McColl', 'Killilea'] | ['Melov', 'Vantipalli', 'Hubbard', 'Lithgow', 'McColl', 'Killilea'] | ['Melov', 'Vantipalli', 'Hubbard', 'Lithgow', 'McColl', 'Killilea'] | J Biol Chem | 2008 | 1/4/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1224 | GSE8059 | 8/7/2007 | ['8059'] | ['3191'] | [u'17623099'] | 1959522 | [u'17623099'] | ['Schmitz', 'Zhou', u'd\u2019Amore', 'Chan', 'Xiao', 'Iqbal', 'Geng', 'Dybkaer', "d'Amore"] | ['Schmitz', 'Zhou', 'Chan', 'Xiao', 'Iqbal', 'Geng', 'Dybkaer', "d'Amore"] | ['Schmitz', 'Zhou', 'Chan', 'Xiao', 'Iqbal', 'Geng', 'Dybkaer', "d'Amore"] | BMC Genomics | 2007 | 7/10/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1225 | GSE8076 | 7/30/2007 | ['8076'] | [] | [] | 2121100 | [u'17719025'] | [u'Elkahloun', u'Feldman', u'Pei', u'Noushmehr'] | ['Costa', 'Ouspenskaia', 'Elkahloun', 'Pei', 'Feldman', 'Noushmehr'] | [u'Elkahloun', u'Feldman', u'Noushmehr', u'Pei'] | Dev Biol | 2007 | 10/1/2007 | 0 | < 5% (p < 0.05) ( Wei et al., 2004 ). The microarray data is available at NCBI GEO via the following link: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?token=dbuhxkcecoegari&acc= GSE8076{{tag}}--DEPOSIT-- To identify classes of enriched GeneOntology (GO) terms, we selected the subset of genes with RefSeq IDs, and either mean intensity log2 ratios > 1 (2 fold enrichment – up regulate | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1226 | GSE8081 | 8/1/2007 | ['8081'] | [] | [u'17565696'] | 2394768 | [u'17565696'] | ['Rozenhek', 'Mohammad', 'Babak', 'Barash', 'van', 'Fagnani', 'Zhang', 'Misquitta', 'Ip', 'Hughes', 'Frey', 'Blencowe', 'Saltzman', 'Willaime-Morawek', 'Shai', 'Pan', 'Lee'] | ['Rozenhek', 'Mohammad', 'Babak', 'Barash', 'van', 'Fagnani', 'Zhang', 'Misquitta', 'Ip', 'Hughes', 'Frey', 'Blencowe', 'Saltzman', 'Willaime-Morawek', 'Shai', 'Pan', 'Lee'] | ['Rozenhek', 'Mohammad', 'Babak', 'Barash', 'van', 'Fagnani', 'Zhang', 'Misquitta', 'Ip', 'Hughes', 'Frey', 'Blencowe', 'Saltzman', 'Willaime-Morawek', 'Shai', 'Pan', 'Lee'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1227 | GSE8084 | 6/19/2007 | ['8084'] | [] | [u'17959654'] | 2175336 | [u'17959654'] | ['Jayapal', 'Mehra', 'Karypis', 'Hu', 'Charaniya', 'Lian'] | ['Jayapal', 'Mehra', 'Karypis', 'Hu', 'Charaniya', 'Lian'] | ['Jayapal', 'Mehra', 'Karypis', 'Hu', 'Charaniya', 'Lian'] | Nucleic Acids Res | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1228 | GSE8086 | 6/19/2007 | ['8086'] | [] | [u'17959654'] | 2175336 | [u'17959654'] | ['Jayapal', 'Mehra', 'Karypis', 'Hu', 'Charaniya', 'Lian'] | ['Jayapal', 'Mehra', 'Karypis', 'Hu', 'Charaniya', 'Lian'] | ['Jayapal', 'Mehra', 'Karypis', 'Hu', 'Charaniya', 'Lian'] | Nucleic Acids Res | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1229 | GSE8088 | 10/16/2007 | ['8088'] | [] | [u'17933919'] | 2168061 | [u'17933919'] | ['Pronk', 'Knijnenburg', 'De', 'Daran-Lapujade', 'Walker', 'Walsh', 'Reinders', 'Hazelwood', 'Daran'] | ['Pronk', 'Knijnenburg', 'De', 'Daran-Lapujade', 'Walker', 'Walsh', 'Reinders', 'Hazelwood', 'Daran'] | ['Pronk', 'Knijnenburg', 'De', 'Daran-Lapujade', 'Walker', 'Daran', 'Walsh', 'Reinders', 'Hazelwood'] | Appl Environ Microbiol | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1230 | GSE8089 | 10/16/2007 | ['8089'] | [] | [u'17933919'] | 2168061 | [u'17933919'] | ['Pronk', 'Knijnenburg', 'De', 'Daran-Lapujade', 'Walker', 'Walsh', 'Reinders', 'Hazelwood', 'Daran'] | ['Pronk', 'Knijnenburg', 'De', 'Daran-Lapujade', 'Walker', 'Walsh', 'Reinders', 'Hazelwood', 'Daran'] | ['Pronk', 'Knijnenburg', 'De', 'Daran-Lapujade', 'Walker', 'Daran', 'Walsh', 'Reinders', 'Hazelwood'] | Appl Environ Microbiol | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1231 | GSE8092 | 9/20/2007 | ['8092'] | [] | [u'17878939'] | 1975675 | [u'17878939'] | ['King', 'Mertsola', 'Heikkinen', 'Mooi', 'Kallonen', 'Sara', 'Saarinen', 'Soini', 'He'] | ['King', 'Mertsola', 'Heikkinen', 'Mooi', 'Kallonen', 'Sara', 'Saarinen', 'Soini', 'He'] | ['King', 'Saarinen', 'Mooi', 'Kallonen', 'Sara', 'Heikkinen', 'Mertsola', 'Soini', 'He'] | PLoS One | 2007 | 9/19/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1232 | GSE8093 | 12/17/2007 | ['8093'] | [] | [u'18039366'] | 2231380 | [u'18039366'] | ['Zhang', 'Guo', 'Hu', 'Liu', 'Li', 'Deng', 'He'] | ['Zhang', 'Guo', 'Hu', 'Liu', 'Li', 'Deng', 'He'] | ['Zhang', 'Guo', 'Hu', 'Liu', 'Li', 'Deng', 'He'] | BMC Genomics | 2007 | 11/26/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1233 | GSE8096 | 7/12/2007 | ['8096'] | ['2810'] | [u'16849555'] | 2746025 | [u'19417140'] | ['Kenny', 'Bissell', 'Martin', 'Fournier', 'Yaswen', 'Bosch', 'Xhaja'] | ['Bissell', 'Fournier', 'Yaswen', 'Fata', 'Martin'] | ['Bissell', 'Fournier', 'Yaswen', 'Martin'] | Cancer Res | 2009 | 5/15/2009 | 0 | ding to the manufacturer’s instructions. mRNA profiling The full microarray results were published in a previous study ( 33 ) and data can be retrieved at the public database links GEOSeries GSE8096{{tag}}--REUSE-- and ArrayExpress E-MEXP-1006. In short, cell samples were harvested in duplicate at 3, 5, and 7 days post-seeding in lrECM. Purified total cellular RNA was biotin-labeled and hybridized to human o | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1234 | GSE8099 | 7/19/2007 | ['8099'] | [] | [u'17612404'] | 2323219 | [u'17612404'] | ['DeRisi', 'Shock', 'Fischer'] | ['DeRisi', 'Shock', 'Fischer'] | ['DeRisi', 'Shock', 'Fischer'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1235 | GSE8105 | 6/20/2007 | ['8105'] | [] | [u'17665085'] | 2583241 | [u'18649190'] | ['Cohen', 'Chrisman', 'Spray', 'Scemes', 'Iacobas', 'Suadicani'] | ['Iacobas', 'Spray', 'Scemes', 'Urban-Maldonado'] | ['Iacobas', 'Spray', 'Scemes'] | Cell Commun Adhes | 2008 | 2008 May | 0 | ED) and data complying with the Minimum Information About Microarray Experiments (MIAME; Brazma et al. 2001 ) have been deposited in the National Center for Biotechnology Information Gene Expression Omnibus database ( http://www.ncbi.nlm.nih.gov/geo ) as series GSE8105{{tag}}--DEPOSIT-- . Of the 8039 quantified genes in all three conditions, 8.2% were up-regulated and 5.7% down-regulated in KO and 6.2% up-regulated and | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1236 | GSE8107 | 6/19/2007 | ['8107'] | [] | [u'17959654'] | 2444045 | [u'18628968'] | ['Jayapal', 'Mehra', 'Karypis', 'Hu', 'Charaniya', 'Lian'] | ['Takano', 'Mehra', 'Charaniya', 'Hu'] | ['Mehra', 'Charaniya', 'Hu'] | PLoS One | 2008 | 7/16/2008 | 0 | oral transcriptome data for M145 wild-type in a batch liquid culture of modified R5 medium was obtained from a public repository, Gene Expression Omnibus [27] (GEO accession number: GSE8107{{tag}}--REUSE--) [28] . Steady state and dynamic analysis of mathematical model To obtain numerical solutions the differential equations were solved using the stiff differential equations solvers | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1237 | GSE8107 | 6/19/2007 | ['8107'] | [] | [u'17959654'] | 2267785 | [u'18230178'] | ['Jayapal', 'Mehra', 'Karypis', 'Hu', 'Charaniya', 'Lian'] | ['Jayapal', 'Kyung', 'Sherman', 'Glod', 'Hu', 'Charaniya', 'Lian', 'Mehra'] | ['Lian', 'Hu', 'Mehra', 'Charaniya', 'Jayapal'] | BMC Genomics | 2008 | 1/29/2008 | 0 | SK4425). Microarray data availability Microarray data used in this study has been made available at NCBI – Gene Expression Omnibus [ 52 ] in a MIAME-compliant manner: series accession numbers GSE8107{{tag}}--DEPOSIT-- (M145) and GSE8160 (YSK4425). Authors' contributions WL and KPJ carried out liquid culture experiments, printed DNA microarrays, performed transcriptome experiments, overall data analysis, interpre | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1238 | GSE8107 | 6/19/2007 | ['8107'] | [] | [u'17959654'] | 2175336 | [u'17959654'] | ['Jayapal', 'Mehra', 'Karypis', 'Hu', 'Charaniya', 'Lian'] | ['Jayapal', 'Mehra', 'Karypis', 'Hu', 'Charaniya', 'Lian'] | ['Jayapal', 'Mehra', 'Karypis', 'Hu', 'Charaniya', 'Lian'] | Nucleic Acids Res | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1239 | GSE8108 | 6/14/2007 | ['8108'] | [] | [] | 2175336 | [u'17959654'] | [u'Hu', u'Jayapal'] | ['Jayapal', 'Mehra', 'Karypis', 'Hu', 'Charaniya', 'Lian'] | [u'Hu', u'Jayapal'] | Nucleic Acids Res | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1240 | GSE8109 | 6/14/2007 | ['8109'] | [] | [] | 2175336 | [u'17959654'] | [u'Lian', u'Hu', u'Jayapal'] | ['Jayapal', 'Mehra', 'Karypis', 'Hu', 'Charaniya', 'Lian'] | [u'Lian', u'Hu', u'Jayapal'] | Nucleic Acids Res | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1241 | GSE8110 | 6/14/2007 | ['8110'] | [] | [] | 2175336 | [u'17959654'] | [u'Lian', u'Hu', u'Jayapal'] | ['Jayapal', 'Mehra', 'Karypis', 'Hu', 'Charaniya', 'Lian'] | [u'Lian', u'Hu', u'Jayapal'] | Nucleic Acids Res | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1242 | GSE8111 | 10/22/2007 | ['8111'] | [] | [u'17890903'] | 2760818 | [u'19578060'] | ['Menacho-Marquez', 'Gadea', 'Ari\xc3\xb1o', u'Murgu\xeda', u'Ari\xf1o', 'Murgu\xc3\xada', 'Perez-Valle', u'P\xe9rez'] | ['Zamdborg', 'Ma'] | [] | Nucleic Acids Res | 2009 | 2009 Sep | 0 | A four-sample dataset of cDNA microarrays, which hybridized four biological replicates of GCN2 c constitutively active mutant samples to a common reference wild-type sample, was retrieved from GEO (GSE8111{{tag}}--REUSE--) ( 41 ). The data is the log transformed gene expression ratios between mutant and wild-type, and was used by MotifExpress as the response to fit multivariate regression model. Three motifs were se|e RAP1 motif) and the RRB regulon (strongly associated with the PAC motif) ( 42 ) are both known to be repressed by treatment with rapamycin, which is also known to induce Gcn4p synthesis ( 40 ). The GSE8111{{tag}}--REUSE-- dataset shows a transcription profile quite similar to rapamycin treatment, with the RRB and RP genes downregulated, and amino-acid biogenesis genes upregulated. Analysis of genes used for motif di|n2/4p. A three-sample dataset of cDNA microarrays, comparing wild-type cells in midlog-phase grown at 30°C to wild-type cells heat-shocked at 39°C for 15 min, was downloaded from GEO (GSE7665) ( 43 ). The data was the log-transformed gene expression ratios between heat shock and control conditions, and was used by MotifExpress as the response to fit a multivariate regression model. A se | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1243 | GSE8114 | 6/13/2007 | ['8114'] | [] | [] | 2709617 | [u'19500374'] | [u'Sorensson-Nystrom', u'Ballermann', u'Fierlbeck'] | ['Granqvist', 'Ballermann', 'Nystr\xc3\xb6m', 'Kulak', 'Fierlbeck'] | [u'Ballermann', u'Fierlbeck'] | BMC Nephrol | 2009 | 6/5/2009 | 0 | laboratory [ 22 ]. Human Glomerular SAGE Database Content The complete human glomerular SAGE library was deposited in the Gene Expression Omnibus http://www.ncbi.nlm.nih.gov/geo/ repository (record GSE8114{{tag}}--DEPOSIT--, Accession # GSM199994) and in the SAGE Genie collection http://cgap.nci.nih.gov/SAGE as "LSAGE_Kidney_Glomeruli_Normal_B_bjballer1". It consists of 22,907, unique 17 bp tag sequences and the abs | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1244 | GSE8115 | 7/1/2007 | ['8115'] | [] | [u'18302802'] | 2703986 | [u'19474337'] | ['Holmes', 'Rolfe', 'Goffard', 'Imin', 'Weiller'] | ['Goffard', 'Frickey', 'Weiller'] | ['Goffard', 'Weiller'] | Nucleic Acids Res | 2009 | 7/1/2009 | 0 | gene expression of meristematic and non-meristematic root tissues ( 26 ). The data have been deposited in NCBI's Gene Expression Omnibus ( 34 ) and are accessible through GEO series accession number GSE8115{{tag}}--DEPOSIT--. Following normalization, differentially expressed probe sets were identified by evaluating the log 2 ratio between the two conditions. All probe sets that differed by more than a 2-fold differenc | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1245 | GSE8115 | 7/1/2007 | ['8115'] | [] | [u'18302802'] | 2277415 | [u'18302802'] | ['Holmes', 'Rolfe', 'Goffard', 'Imin', 'Weiller'] | ['Holmes', 'Rolfe', 'Goffard', 'Imin', 'Weiller'] | ['Holmes', 'Rolfe', 'Goffard', 'Imin', 'Weiller'] | BMC Plant Biol | 2008 | 2/27/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1246 | GSE8119 | 6/15/2007 | ['8119'] | [] | [u'17526780'] | 1932806 | [u'17526780'] | ['Rine', 'Mongodin', 'Ashby', 'Dimster-Denk', 'Nelson'] | ['Rine', 'Mongodin', 'Ashby', 'Dimster-Denk', 'Nelson'] | ['Rine', 'Mongodin', 'Ashby', 'Dimster-Denk', 'Nelson'] | Appl Environ Microbiol | 2007 | 2007 Jul | 0 | AND pmc_gds | 1 | 0 | ||||
1247 | GSE8121 | 6/14/2007 | ['8121'] | [] | [u'17932561'] | 2014731 | [u'17932561'] | ['', 'Thomas', 'Cvijanovich', 'Shanley', 'Kalyanaraman', 'Doctor', 'Penfil', 'Barnes', 'Lin', 'Aronow', 'Wong', 'Sakthivel', 'Tofil', 'Monaco', 'Allen', 'Odoms'] | ['Thomas', 'Cvijanovich', 'Shanley', 'Kalyanaraman', 'Doctor', 'Penfil', 'Barnes', 'Lin', 'Aronow', 'Wong', 'Sakthivel', 'Tofil', 'Monaco', 'Allen', 'Odoms'] | ['Thomas', 'Cvijanovich', 'Shanley', 'Kalyanaraman', 'Doctor', 'Penfil', 'Barnes', 'Lin', 'Aronow', 'Wong', 'Sakthivel', 'Tofil', 'Monaco', 'Allen', 'Odoms'] | Mol Med | 2007 | 2007 Sep-Oct | 0 | AND pmc_gds | 1 | 0 | ||||
1248 | GSE8122 | 11/28/2007 | ['8122'] | [] | [u'18042546'] | 2736560 | [u'18042546'] | ['Liao', 'Blackshear', 'Cuthbertson', 'Birnbaumer'] | ['Liao', 'Blackshear', 'Cuthbertson', 'Birnbaumer'] | ['Liao', 'Blackshear', 'Cuthbertson', 'Birnbaumer'] | J Biol Chem | 2008 | 2/1/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1249 | GSE8128 | 8/1/2007 | ['8128'] | [] | [u'17637833'] | 2582619 | [u'18812397'] | ['Hanspers', 'Tingley', 'Vranizan', 'Babbitt', 'Doniger', 'Fong', 'Conklin', 'Ferrin', 'Young', 'Hu', 'Zambon', 'Bacchetti', 'Nord', 'Skarnes'] | ['Schn\xc3\xbctgen', 'von', 'Lutz', 'Wurst', 'Horn', 'Floss', 'Noppinger', 'De-Zolt', 'Hansen'] | [] | Nucleic Acids Res | 2008 | 2008 Nov | 1 | e’ when sequence tags mapped the noncoding (antisense) strand of annotated protein-coding or EST genes. Gene expression values for trapped ENSEMBL genes were derived from GEO series accession GSE8128{{tag}}--REUSE-- ( 9 ) ( http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE8128{{tag}}--REUSE-- ). For mapping antisense insertions to regions of naturally occurring antisense transcripts, we used the annotations assembled in t|gure 7. Box-Plot of absolute gene expression levels of genes trapped with the FlipRosaβgeo vectors. Gene expression values for trapped ENSEMBL genes were derived from the GEO series accession GSE8128{{tag}}--REUSE-- ( 9 ). Box boundaries represent the first and third quartiles (Q. 25 , Q. 75 ). The median is indicated by the horizontal line dividing the interquartile range. Upper and lower ticks indicate the 1|aβgeo 9.95 5999 1436 80.7 FlipRosaβgeo a 14.71 5526 1909 74.3 eFlipRosaβgeo a 7.92 6278 1157 84.4 a Absolute gene expression values were obtained from the GEO series accession GSE8128{{tag}}--REUSE-- ( http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE8128{{tag}}--REUSE-- ). DISCUSSION In the present study we have developed and validated a novel class of conditional gene trap vectors that activate gene expr | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1250 | GSE8128 | 8/1/2007 | ['8128'] | [] | [u'17637833'] | 1910612 | [u'17637833'] | ['Hanspers', 'Tingley', 'Vranizan', 'Babbitt', 'Doniger', 'Fong', 'Conklin', 'Ferrin', 'Young', 'Hu', 'Zambon', 'Bacchetti', 'Nord', 'Skarnes'] | ['Hanspers', 'Tingley', 'Vranizan', 'Babbitt', 'Doniger', 'Fong', 'Conklin', 'Ferrin', 'Young', 'Hu', 'Zambon', 'Bacchetti', 'Nord', 'Skarnes'] | ['Hanspers', 'Tingley', 'Vranizan', 'Babbitt', 'Conklin', 'Fong', 'Ferrin', 'Young', 'Hu', 'Doniger', 'Zambon', 'Bacchetti', 'Nord', 'Skarnes'] | PLoS One | 2007 | 7/18/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1251 | GSE8131 | 9/1/2007 | ['8131'] | [] | [u'18950541'] | 2605756 | [u'18950541'] | [u'Holmes', 'Rolfe', 'Goffard', 'Imin', 'Nizamidin'] | ['Rolfe', 'Goffard', 'Imin', 'Nizamidin'] | ['Rolfe', 'Goffard', 'Imin', 'Nizamidin'] | BMC Plant Biol | 2008 | 10/27/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1252 | GSE8142 | 9/1/2007 | ['8142'] | [] | [] | 2644711 | [u'19192291'] | [u'Marcelo'] | ['Rutitzky', 'Casal', 'Ghiglione', 'Yanovsky', 'Cur\xc3\xa1'] | [] | BMC Genomics | 2009 | 2/3/2009 | 0 | ried out by the TIGR Expression Profiling Service [ 15 ]. All raw and normalized microarray data is available at: 1) the Solanaceae Gene Expression Database (ID 47 and 52), and 2) The Gene Expression Omnibus (accession number GSE8142{{tag}}--DEPOSIT--). Genes were considered to be regulated by photoperiod or phyB if the average of the log 2 (LD/SD) or (WT/α- PHYB ) ratio was: 1) significantly different from 0 (on | 0 | 1 | 0 | NOT pmc_gds | 0 | 1 |
1253 | GSE8155 | 8/15/2007 | ['8155'] | [] | [] | 2217673 | [u'18235850'] | [u'Ranganathan', u'Datu', u'Gasser', u'Nagaraj', u'Ong', u"O'Donoghue", u'McInnes', u'Loukas'] | ['Ranganathan', 'Datu', 'Gasser', 'Nagaraj', 'Ong', "O'Donoghue", 'McInnes', 'Loukas'] | ['Ranganathan', 'Datu', 'Gasser', 'Nagaraj', 'Ong', "O'Donoghue", 'McInnes', 'Loukas'] | PLoS Negl Trop Dis | 2008 | 1/9/2008 | 0 | ata not shown). The data discussed in this publication have been deposited in NCBIs Gene Expression Omnibus ( http://www.ncbi.nlm.nih.gov/geo/ ) and are accessible through GEO Series accession number GSE8155{{tag}}--DEPOSIT-- 10.1371/journal.pntd.0000130.g003 Figure 3 Magnitude (M) versus Amplitude (A) plot of array data summarized by population. Data points are colour-coded to highlight the difference between signifi | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1254 | GSE8159 | 6/18/2007 | ['8159'] | ['2786'] | [u'15780142'] | 2785812 | [u'19917117'] | ['Shaffer', 'Vidal', 'Moore', 'Miller', 'Von', 'Fox', 'Dupuy', 'Olszewski', 'Barlow'] | ['Kroll', 'Barkema', 'Carlon'] | [] | Algorithms Mol Biol | 2009 | 11/16/2009 | 1 | d to the local physical neighbors and to the nearest-neighbor free energy. Table 1 Overview over organisms and number of CEL-files analyzed Organism GEO # Chiptype (dimension) # files min A. Thaliana GSE4847 ATH1-121501 (712 × 712) 18 0.0259 GSE7642 ATH1-121501 (712 × 712) 12 0.0544 GSE9311 ATH1-121501 (712 × 712) 8 0.0546 C. Elegans GSE6547 Celegans (712 × 712) 25 0.036|00d7; 712) 7 0.0396 D. Melanogaster GSE3990 Drosophila_2 (732 × 732) 6 0.0620 GSE6558 DrosGenome1 (640 × 640) 24 0.0605 D. Rerio GSE4859 Zebrafish (712 × 712) 8 0.0357 E. Coli GSE11779 E_coli_2 (478 × 478) 3 0.0869 GSE2928 Ecoli (544 × 544) 12 0.0172 GSE6195 E_coli_2 (478 × 478) 4 0.0664 H. Sapiens GSE10433 HG-U133A_2 (732 × 732) 12 0.0757 GSE5054|-U133A (712 × 712) 14 0.0296 GSE8514 HG-U133_Plus_2 (1164 × 1164) 15 0.0738 GSE11897 MOE430A (712 × 712) 11 0.0640 MOE430B (712 × 712) Mouse430_2 (1002 × 1002) GSE6210 Mouse430_2 12 0.0594 GSE6297 Mouse430_2 24 0.0325 O. Sativa GSE15071 Rice (1164 × 1164) 20 0.1157 R. Norvegicus GSE4494 RG_U34A (534 × 534) 59 0.0488 GSE7493 Rat230_2 (834 ×|2 (834 × 834) 4 0.0640 S. Aureus GSE7944 S_aureus (602 × 602) 6 0.0746 S. Cerevisiae GSE6073 YG_S98 (534 × 534) 12 0.0283 GSE8379 YG_S98 (534 × 534) 8 0.0180 X. Laevis GSE3368 Xenopus_laevis (712 × 712) 20 0.0514 Table 2 Optimized parameter values as obtained from the minimization of Eq. (3). A. Thaliana C. Elegans D. Melanogaster D. Rerio GEO no GSE4847 GSE7642|d in this work. Figure 5 Computed background vs. low expressed genes data . Experimental results (solid black line) and theoretical prediction (dashed red line) of probeset a) 256610_at (A. Thaliana, GSE4847, GSM109107.CEL) and b) 175270_at (C. Elegans, GSE8159{{tag}}--REUSE--, GSM201995.CEL). 
Discussion We have presented a background subtraction scheme for Affymetrix GeneExpression arrays which is both, accurate and | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1255 | GSE8159 | 6/18/2007 | ['8159'] | ['2786'] | [u'15780142'] | 2323220 | [u'17612406'] | ['Shaffer', 'Vidal', 'Moore', 'Miller', 'Von', 'Fox', 'Dupuy', 'Olszewski', 'Barlow'] | ['Miller', 'Von', 'Fox', 'Roy', 'Watson', 'Olszewski', 'Spencer'] | ['Olszewski', 'Miller', 'Von', 'Fox'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1256 | GSE8160 | 6/23/2007 | ['8160'] | [] | [u'17959654'] | 2267785 | [u'18230178'] | ['Jayapal', 'Mehra', 'Karypis', 'Hu', 'Charaniya', 'Lian'] | ['Jayapal', 'Kyung', 'Sherman', 'Glod', 'Hu', 'Charaniya', 'Lian', 'Mehra'] | ['Lian', 'Hu', 'Mehra', 'Charaniya', 'Jayapal'] | BMC Genomics | 2008 | 1/29/2008 | 0 | SK4425). Microarray data availability Microarray data used in this study has been made available at NCBI – Gene Expression Omnibus [ 52 ] in a MIAME-compliant manner: series accession numbers GSE8107 (M145) and GSE8160{{tag}}--DEPOSIT-- (YSK4425). Authors' contributions WL and KPJ carried out liquid culture experiments, printed DNA microarrays, performed transcriptome experiments, overall data analysis, interpre | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1257 | GSE8160 | 6/23/2007 | ['8160'] | [] | [u'17959654'] | 2175336 | [u'17959654'] | ['Jayapal', 'Mehra', 'Karypis', 'Hu', 'Charaniya', 'Lian'] | ['Jayapal', 'Mehra', 'Karypis', 'Hu', 'Charaniya', 'Lian'] | ['Jayapal', 'Mehra', 'Karypis', 'Hu', 'Charaniya', 'Lian'] | Nucleic Acids Res | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1258 | GSE8163 | 12/1/2007 | ['8163'] | [] | [] | 2220043 | [u'18024232'] | [''] | ['Han', 'Golding', 'Varmuza', 'Kuzmin', 'Latham', 'Mann'] | [] | Gene Expr Patterns | 2008 | 2008 Jan | 0 | Institute for labeling and hybridization to Affymetrix MOE 430 A and B chips, using a protocol described in Iscove et al., 2002 . Microarray data have been deposited with NCBI GEO, accession number GSE8163{{tag}}--DEPOSIT-- . The experiment (Stembase experiment 235) is part of a large collection of microarray experiments aimed at analyzing stem cell function ( Perez-Iratxeta, et al., 2005 ). Microarray data were analy | 0 | 1 | 0 | NOT pmc_gds | 0 | 1 |
1259 | GSE8170 | 8/1/2007 | ['8170'] | [] | [u'17623099'] | 1959522 | [u'17623099'] | ['Schmitz', 'Zhou', u'd\u2019Amore', 'Chan', 'Xiao', 'Iqbal', 'Geng', 'Dybkaer', "d'Amore"] | ['Schmitz', 'Zhou', 'Chan', 'Xiao', 'Iqbal', 'Geng', 'Dybkaer', "d'Amore"] | ['Schmitz', 'Zhou', 'Chan', 'Xiao', 'Iqbal', 'Geng', 'Dybkaer', "d'Amore"] | BMC Genomics | 2007 | 7/10/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1260 | GSE8174 | 6/21/2007 | ['8174'] | [] | [u'16702414'] | 2780416 | [u'19956538'] | [u'Robert', 'Springer', 'Stupar'] | ['Nettleton', 'Yeh', 'Springer', 'Fu', 'Barbazuk', 'Jia', 'Schnable', 'Richmond', 'Wu', 'Rosenbaum', 'Ji', 'Ying', 'Iniguez', 'Jeddeloh', 'Kitzman'] | ['Springer'] | PLoS Genet | 2009 | 2009 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
1261 | GSE8174 | 6/21/2007 | ['8174'] | [] | [u'16702414'] | 2365949 | [u'18402703'] | [u'Robert', 'Springer', 'Stupar'] | ['Springer', 'Gardiner', 'Stupar', 'Haun', 'Oldre', 'Chandler'] | ['Springer', 'Stupar'] | BMC Plant Biol | 2008 | 4/10/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1262 | GSE8188 | 12/1/2007 | ['8188'] | [] | [u'17660570'] | 2034640 | [u'17660570'] | ['Makarevitch', 'Springer', 'Barbazuk', 'Stupar', 'Haun', 'Kaeppler', 'Iniguez'] | ['Makarevitch', 'Springer', 'Barbazuk', 'Stupar', 'Haun', 'Kaeppler', 'Iniguez'] | ['Makarevitch', 'Springer', 'Barbazuk', 'Stupar', 'Haun', 'Kaeppler', 'Iniguez'] | Genetics | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
1263 | GSE8191 | 6/21/2007 | ['8191'] | ['2843'] | [u'17338830'] | 2949890 | [u'20831831'] | ['Anderson', 'Rudolph', 'Neville', 'McManaman'] | ['Monsonego', 'Rubin', 'Geifman'] | [] | BMC Bioinformatics | 2010 | 9/12/2010 | 0 | ound 0.456971* 0.505641 External side of plasma membrane 0.744269 0.508488* Not Found (A) Enrichment in genes with up-regulated expression in hippocampus treated with Fluoxetine, based on GEO profile GSE6476. All terms passed FDR > 0.25 and state the FDR values. (B) GEO profile GSE6675. All terms passed FDR < 0.25 and state the FDR values (Up-regulated in Control in comparison to FGF2 t|ture. † indicates terms that were not included in the NIGO subset. For three GEO expression profiles, the results of the analysis are described in detail (Table 1 ). In the analysis of the GSE6476, in which the effect of chronic Fluoxetine treatment on hippocampal gene expression was examined [ 10 ], eleven terms passed statistic filtering with NIGO and not with the full GO, out of which 7 w|ded in the full GO but not in NIGO passed the statistic cutoffs but this term ('Protein self-association') was functionally irrelevant and contributed very little to the analysis. For the GEO profile GSE6675, in which astroglial gene expression program elicited by fibroblast growth factor-2 was examined, four terms relevant to neural/immune systems passed statistic filtering with NIGO and not with the |ith the full GO. This is an example of how without the use of NIGO one would have to raise the cutoff to at least 0.55 in order for this term to appear in the analysis results. Functional analysis of GSE6509 with NIGO also revealed statistically significant terms that were missed when using the full GO. This experiment involved microarray expression profiling designed to explore the effect of RU486, a |ed both in NIGO and the full GO, received higher FDR values when NIGO was used (Table 2 ), contrary to our expectations from the impact of reducing the ontology size. 
For example, in the analysis of GSE8788 in which gene expression analysis was conducted using Trib1-deficient macrophages treated with LPS as compared to LPS-treated wild-type macrophages [ 12 ], 11 out of 14 terms (that passed statistic|ment in FDR values when using NIGO is partially due to the stochastic nature of the GSEA algorithm. To test this hypothesis, the same analysis was repeated three times with each of the ontologies for GSE8788. FDR values were averaged and a comparison of analysis results was performed based on these averaged FDR values. In accordance with our hypothesis, the averaging of FDR values improved the apparent|l GO and GO-slim in enrichment analysis using GSEA Unique to Subset Lowest FDR Value Experiment Cutoff used Full GO NIGO Generic GO slim Full GO NIGO Generic GO slim Neural- or immune-related studies GSE6509 P < 0.05 2 (2) 5 0 0 2 0 GSE6675 FDR < 0.25 0 6 0 0 1 0 GSE6476 FDR < 0.25 1 (1) 11 0 0 4 0 GSE3779 P < 0.05 7 (3) 13 1 (1) 0 1 5 GSE8425 P < 0.05 6 (1) 0 0 | 0 0 0 1 GSE8788 P < 0.01 6 (6) 0 0 11 3 1 GSE8788* P < 0.01 6 (6) 0 0 3 11 1 GSE9659 P < 0.01 6 (6) 0 NS A 4 28 NS A Non-neural- or immune-related studies (negative controls) GSE7407 P < 0.01 5 (5) 0 0 0 1 2 GSE2259 FDR < 0.25 3 (1) 0 1 (1) 1 2 1 GSE8191{{tag}}--REUSE-- P < 0.05 5 (2) 0 0 0 0 0 The enrichment analysis results of GSEA, providing the full GO, NIGO or GO-s| for each subset given in separate columns. *To test the effect of stochasticity, the analysis of this profile is based on 3-fold averaged FDR values from three independent GSEA analyses. A Profile GSE9659 involves rat tissue, and was not analyzed with the mouse GO-slim. Three non-neural- or immune-related datasets were used to test the performance of NIGO in functional analysis of microarray data in|. 
These datasets included the GEO expression profile, GSE7407, in which gene expression in heart tissue with cardiac specific over-expression of Sirt1 was examined [ 13 ], the GEO expression profile, GSE8191{{tag}}--REUSE--, in which the gene expression profile of mammary glands from pregnant mice was compared to that of mammary glands from lactating mice [ 14 ], and the GEO expression profile, GSE2259, in which gene |in enrichment analysis using the Fisher Exact Test Unique to Subset Lowest FDR Value Experiment Cutoff used Full GO NIGO Generic GO slim Full GO NIGO Generic GO slim Neural- or immune-related studies GSE6509 P < 0.0001 1 (1) 1 0 0 22 2 GSE6675 P < 0.05 1 (1) 0 11 (1) 0 2 0 GSE6476 P < 0.0001 0 2 2 0 15 3 GSE3779 P < 0.1 0 0 0 0 2 0 GSE8425 P < 0.1 0 0 0 0 0 0 GSE|0.001 0 2 0 0 9 1 GSE6136 P < 0.000001 0 1 0 0 37 7 GSE8788 P < 0.01 0 2 0 0 4 1 GSE9659 P < 0.001 0 3 NS A 0 33 NS A Non-neural- or immune-related studies (negative controls) GSE7407 P < 0.0000000001 33 (30) 0 0 9 0 0 GSE2259 P < 0.1 63 (35) 0 2 8 8 7 GSE8191{{tag}}--REUSE-- P < 0.001 6 (6) 7 6 (2) 0 16 3 Enrichment analysis results from Ontologizer, providing the full |ional GO subset (the full GO or generic slim) are also shown for each term ('Lowest P-Value'), with the number of terms with the lowest P-values for each subset given in separate columns. A Profile GSE9659 involves rat tissue, and was not analyzed with the mouse GO-slim. These results, together with the analysis of five additional neural/immune-related experiments not described in Table 1 (namely p| cell's activity. These results further demonstrate that analysis with NIGO can enhance interpretation of functional analysis results produced for relevant microarray datasets. In the analysis of the GSE6509 expression dataset, three relevant terms passed the statistical cutoff with NIGO but not with the full GO. These terms, 'viral envelope', 'viral infectious cycle' and 'viral capsid' are all terms r|et. 
Such terms received FDR or p-values that were very close (but larger) than the cutoff values used. This is partially explained by the stochastic nature of the GSEA algorithm. Indeed, for one set (GSE8788), we compared the raw results with averaged FDR values. Averaging dramatically decreased the number of such terms. Furthermore, we conducted a similar functional analysis using the Fisher Exact Tes|nous peptide antigen via MHC class I' and 'Positive regulation of T cell mediated cytotoxicity' without defeating the purpose of the slimming process. Yet these two terms were found to be enriched in GSE6476, and are crucial for generating a hypothesis based on the expression profile. This shows that GO slims may be complemented by small, yet fully detailed domain-specific subsets of GO. Figure 4 The U|plementary/NIGO/Supplementary.html . The ontologies were clipped using the Protégé 4.0 beta OWL editor [ 26 ]. For by-species filtering, annotation files for human, rat and mouse were downloaded (October 2008) from GOA-EBI [ 27 ]. Association files used for GSEA analysis were generated based on the GOA-EBI annotation files and in the format required by GSEA. In this format, each row repres|a analysis and functional analysis were conducted using the GenePattern [ 29 ], GSEA [ 9 ] (release 2.5) web servers, and Ontologizer [ 16 ] as follows: (1) for each study, raw data (.CEL files) were downloaded from GEO [ 30 ]; (2) expression files (.gct files) were created using the Gene Pattern Expression File Creator module; (3) where necessary (i.e. for expression files GSE6509, GSE6675, GSE6476, GSE7|was conducted using the GSEA module. GSEA was run three times for each dataset, using a different GO version for each run. For the full GO, we used the organism-specific GO subset. In the analysis of GSE8788, GSEA was run three times for each of the three ontologies and FDR values were averaged over the three runs. 
(5) Differentially-expressed genes were found using the Gene Pattern ComparativeMarkerSe|, to an input term (Figure 4 ). The connections of UMLS concepts to the tested GO term are provided by UMLS and defined within the UMLS data files. Microarray data sets All microarray data sets were downloaded from GEO at NCBI [ 30 ]. The GEO sets used in this study are described in Additional file 2 . Availability NIGO is freely available as Additional file 6 and for download from: http://bioinfo.bg|re included or excluded from NIGO. Click here for file Additional file 2 Microarray Datasets for Comparative analysis of NIGO, GO and GO-slim . This file contains a summary of the microarray datasets downloaded from GEO at NCBI and used to test the performance of NIGO. Click here for file Additional file 3 GSEA analysis results . This file contains a summary of the GSEA analysis results for each of the mi|etroviral infectivity through different mechanisms Am J Physiol Lung Cell Mol Physiol 2009 297 L538 545 10.1152/ajplung.00162.2009 19561138 The Gene Ontology Consortium http://www.geneontology.org/GO.downloads.ontology.shtml Web Ontology Language Guide http://www.w3.org/TR/owl-guide Protégé 4.0 beta http://protege.stanford.edu Barrell D Dimmer E Huntley RP Binns D O'Donovan C Apweiler R Th | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1264 | GSE8191 | 6/21/2007 | ['8191'] | ['2843'] | [u'17338830'] | 2225983 | [u'18039394'] | ['Anderson', 'Rudolph', 'Neville', 'McManaman'] | ['German', 'Pollard', 'Neville', 'Rudolph', 'Lemay'] | ['Neville', 'Rudolph'] | BMC Syst Biol | 2007 | 11/27/2007 | 0 | arly lactation by L1 and L2, full lactation by L9, and involution by I2. These data have been deposited in NCBI's Gene Expression Omnibus [ 43 ] and are accessible through GEO Series accession number GSE8191{{tag}}--DEPOSIT--. Microarray data analysis GeneSpring GX 7.3.1 was used to analyze the data. First, GC-RMA preprocessing was applied to all CEL files. Signal intensity values were normalized as follows. Values belo | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1265 | GSE8193 | 9/12/2007 | ['8193'] | [] | [u'17850661', u'18631401'] | 2575534 | [u'18631401'] | ['Roydasgupta', 'Gray', 'Hubbard', 'Moore', 'Dairkee', 'Chew', 'Yau', 'Schittulli', 'Tommasi', 'Benz', 'Fridlyand', 'Fedele', 'Paradiso', 'Albertson'] | ['Yau', 'Benz'] | ['Yau', 'Benz'] | Breast Cancer Res | 2008 | 2008 | 1 | AND pmc_gds | 1 | 0 | ||||
1266 | GSE8193 | 9/12/2007 | ['8193'] | [] | [u'17850661', u'18631401'] | 2216076 | [u'17850661'] | ['Roydasgupta', 'Gray', 'Hubbard', 'Moore', 'Dairkee', 'Chew', 'Yau', 'Schittulli', 'Tommasi', 'Benz', 'Fridlyand', 'Fedele', 'Paradiso', 'Albertson'] | ['Roydasgupta', 'Gray', 'Hubbard', 'Moore', 'Dairkee', 'Chew', 'Yau', 'Schittulli', 'Tommasi', 'Benz', 'Fridlyand', 'Fedele', 'Paradiso', 'Albertson'] | ['Roydasgupta', 'Gray', 'Hubbard', 'Moore', 'Dairkee', 'Chew', 'Yau', 'Schittulli', 'Tommasi', 'Benz', 'Fridlyand', 'Fedele', 'Paradiso', 'Albertson'] | Breast Cancer Res | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1267 | GSE8194 | 6/21/2007 | ['8194'] | [] | [u'16702414'] | 2048729 | [u'17766400'] | ['', 'Springer', 'Stupar'] | ['Springer', 'Hermanson', 'Stupar'] | ['Springer', 'Stupar'] | Plant Physiol | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
1268 | GSE8197 | 9/1/2007 | ['8197'] | [] | [u'17921288'] | 2168612 | [u'17921288'] | ['', 'Alm', 'Joachimiak', 'Zhou', 'Stahl', 'Wall', 'Hazen', 'Borglin', 'Joyner', 'Arkin', 'Yang', 'Huang', 'Stolyar', 'He'] | ['Alm', 'Joachimiak', 'Zhou', 'Stahl', 'Wall', 'Hazen', 'Borglin', 'Joyner', 'Arkin', 'Yang', 'Huang', 'Stolyar', 'He'] | ['Alm', 'Joachimiak', 'Zhou', 'Stahl', 'Wall', 'Hazen', 'Borglin', 'Joyner', 'Arkin', 'Yang', 'Huang', 'Stolyar', 'He'] | J Bacteriol | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1269 | GSE8218 | 6/22/2007 | ['8218'] | [] | [u'20663908'] | 2244693 | [u'18283340'] | ['Xia', 'Wang', 'Jia', 'Mercola', 'Sawyers', 'Yao', 'Wang-Rodriquez', 'McClelland'] | ['Laxman', 'Prensner', 'Mehra', 'Helgeson', 'Chinnaiyan', 'Rubin', 'Cao', 'Varambally', 'Shah', 'Tomlins', 'Yu'] | [] | Neoplasia | 2008 | 2008 Feb | 1 | oncomine.org ) [ 28 ] as molecular concepts, using all features on the Agilent Whole Human Genome Oligo Microarray as the null set. For the assessment of prostate-specific gene expression, the expO ( GSE2109 ) and Shyamsundar normal tissue [ 29 ] data sets were accessed using the Oncomine database. Cancer and normal tissue types are defined in Table W3 . For the assessment of prostate cell type expres|he OCM. The two most significantly enriched concepts in our under-expressed in VCaP-siERG signature were two signatures of genes over-expressed in ETS-positive versus -negative prostate cancers ( GSE8218{{tag}}--REUSE-- , OR = 5.73, P = 2.5 × 10^-19 and Vanaja et al. [ 40 ], OR = 3.49, P = 3.9 × 10^-11 ) ( Figure 3 c ). All other over-expressed in ETS-positive versus -negative prostate cancer signatures [ 3|essed in VCaP-siERG signature using the OCM. Consistent with the results described above, all under-expressed in ETS-positive versus -negative prostate cancer signatures in the Oncomine database ( GSE8218{{tag}}--REUSE-- and [ 33,40–42 ]) were enriched in our over-expressed in VCaP-siERG signature (OR = 6.41–2.71, P = 5.2 × 10^-15 to 7.0 × 10^-5 ). Intriguingly, OCM analysis revealed that the most significant | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1270 | GSE8218 | 6/22/2007 | ['8218'] | [] | [u'20663908'] | 2732022 | [u'18538735'] | ['Xia', 'Wang', 'Jia', 'Mercola', 'Sawyers', 'Yao', 'Wang-Rodriquez', 'McClelland'] | ['Lilja', 'Demichelis', 'Gerald', 'Tsodikov', 'Wei', 'Reuter', 'Mehra', 'Eggener', 'Chinnaiyan', 'Rubin', 'Al-Ahmadie', 'Varambally', 'Shah', 'Tomlins', 'Stenman', 'Andr\xc3\xa9n', 'Fall', 'Hotakainen', 'Laxman', 'Johnson', 'Rhodes', 'Perner', 'Morris', 'Cao', 'Eastham', 'Yu', 'Kantoff', 'Bjartell', 'Helgeson', 'Fine', 'Scardino'] | [] | Cancer Cell | 2008 | 2008 Jun | 1 | AND pmc_gds | 0 | 1 | ||||
1271 | GSE8226 | 7/13/2007 | ['8226'] | [] | [] | 2917912 | [u'17823450'] | [u'Benninghoff', u'Williams'] | ['Benninghoff', 'Williams'] | ['Benninghoff', 'Williams'] | Toxicol Sci | 2008 | 2008 Jan | 0 | dy in the Enzyme immunoassay (data not shown). Data Filtering and Comparison of Numbers of Genes Regulated by Experimental Treatments All raw gene expression data are available at the Gene Expression Omnibus online microarray data repository ( http://www.ncbi.nlm.nih.gov/geo/ , accession GSE8226{{tag}}--DEPOSIT-- ). Calculated fold-change ratios of experimental samples compared to reference after background subtraction a|f Environmental Health and Safety (ES03850, ES07060, ES00210, and ES013534). Footnotes SUPPLEMENTARY DATA Supplementary data are available online at www.toxsci.oupjournals.org and at GEO Expression Omnibus at www.ncbi.nlm.nih.gov/geo/ . Parts of this manuscript were presented at 14th North America International Society for the Study of Xenobiotics Meeting, Rio Grande, Puerto Rico, 22–26 Octobe | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1272 | GSE8227 | 6/23/2007 | ['8227'] | [] | [] | 2494569 | [u'17991461'] | [u'Kudryavteseva', u'Andersen', u'Monuki', u'Soto', u'Lin', u'Flanagan', u'Lu', u'Mannik', u'Xu', u'Yu', u'Spencer', u'Wang'] | ['Andersen', 'Wang', 'Soto', 'Lin', 'Kudryavtseva', 'Flanagan', 'Lu', 'Mannik', 'Monuki', 'Yu', 'Spencer', 'Xu'] | ['Andersen', 'Wang', 'Soto', 'Lin', 'Flanagan', 'Lu', 'Mannik', 'Monuki', 'Yu', 'Spencer', 'Xu'] | Dev Biol | 2007 | 12/15/2007 | 0 | ify the significantly differentially expressed genes into biological process ontology categories ( Thomas et al., 2003 ). Primary microarray data from this paper is accessible at NCBI Gene Expression Omnibus ( www.ncbi.nlm.nih.gov/geo ), GSE8227{{tag}}--DEPOSIT-- . Fluorescence Activated Cell Sorting (FACS) Back skins from 8-week-old wild type and DN-Clim transgenic mice were dissected and treated overnight with 0.25% tr | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1273 | GSE8231 | 11/8/2007 | ['8231'] | [] | [u'17848203'] | 2375026 | [u'17848203'] | ['Fukushige', 'Brodigan', 'Miller', 'Von', 'Fox', 'Krause', 'McDermott', 'Watson'] | ['Fukushige', 'Brodigan', 'Miller', 'Von', 'Fox', 'Krause', 'McDermott', 'Watson'] | ['Fukushige', 'Brodigan', 'Miller', 'Von', 'Fox', 'Krause', 'McDermott', 'Watson'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1274 | GSE8238 | 6/22/2007 | ['8238'] | ['2783'] | [u'17127748'] | 2785812 | [u'19917117'] | ['Stanfield', 'Skinner', 'Nilsson'] | ['Kroll', 'Barkema', 'Carlon'] | [] | Algorithms Mol Biol | 2009 | 11/16/2009 | 1 | g features in the chip and 2) sequence-dependent affinities for non-specific hybridization fitted by an extended nearest-neighbor model. Conclusion The algorithm has been tested on 360 GeneChips from publicly available data of recent expression experiments. The algorithm is fast and accurate. Strong correlations between the fitted values for different experiments as well as between the free-energy paramet|value decomposition and we provide the details of the selected cost function. In Results a test of the algorithm on about 360 Genechips from recent (2006 onwards) experiments from the Gene Expression Omnibus (GEO) is presented. Finally, the advantages of this scheme and its overall performance as background subtraction method is highlighted. Methods Approach The general assumption is that the (natural) l|ding eigenvalues close to zero in machine precision. These are, however, rare, and were actually never found in the calculations presented here. Results We analyzed a total of 366 CEL-files which are publicly available from the GEO server http://www.ncbi.nlm.nih.gov/geo . Table 1 gives an overview of the distribution of CEL-files over the twelve different organisms considered in this study. The array s|d to the local physical neighbors and to the nearest-neighbor free energy. Table 1 Overview over organisms and number of CEL-files analyzed Organism GEO # Chiptype (dimension) # files min A. Thaliana GSE4847 ATH1-121501 (712 × 712) 18 0.0259 GSE7642 ATH1-121501 (712 × 712) 12 0.0544 GSE9311 ATH1-121501 (712 × 712) 8 0.0546 C. Elegans GSE6547 Celegans (712 × 712) 25 0.036|00d7; 712) 7 0.0396 D. Melanogaster GSE3990 Drosophila_2 (732 × 732) 6 0.0620 GSE6558 DrosGenome1 (640 × 640) 24 0.0605 D. Rerio GSE4859 Zebrafish (712 × 712) 8 0.0357 E. 
Coli GSE11779 E_coli_2 (478 × 478) 3 0.0869 GSE2928 Ecoli (544 × 544) 12 0.0172 GSE6195 E_coli_2 (478 × 478) 4 0.0664 H. Sapiens GSE10433 HG-U133A_2 (732 × 732) 12 0.0757 GSE5054|-U133A (712 × 712) 14 0.0296 GSE8514 HG-U133_Plus_2 (1164 × 1164) 15 0.0738 GSE11897 MOE430A (712 × 712) 11 0.0640 MOE430B (712 × 712) Mouse430_2 (1002 × 1002) GSE6210 Mouse430_2 12 0.0594 GSE6297 Mouse430_2 24 0.0325 O. Sativa GSE15071 Rice (1164 × 1164) 20 0.1157 R. Norvegicus GSE4494 RG_U34A (534 × 534) 59 0.0488 GSE7493 Rat230_2 (834 ×|2 (834 × 834) 4 0.0640 S. Aureus GSE7944 S_aureus (602 × 602) 6 0.0746 S. Cerevisiae GSE6073 YG_S98 (534 × 534) 12 0.0283 GSE8379 YG_S98 (534 × 534) 8 0.0180 X. Laevis GSE3368 Xenopus_laevis (712 × 712) 20 0.0514 Table 2 Optimized parameter values as obtained from the minimization of Eq. (3). A. Thaliana C. Elegans D. Melanogaster D. Rerio GEO no GSE4847 GSE7642|d In this section we compare the estimated background signal (as given in Eq. (2)) with the intensities of given probe sets corresponding to non-expressed genes in the samples analyzed. We start with publicly available data taken from spike-in experiments on HGU95A chips http://www.affymetrix.com where genes have been spiked-in at known concentrations, ranging from 0 to 1024 pM (picoMolar). The data at |d in this work. Figure 5 Computed background vs. low expressed genes data . Experimental results (solid black line) and theoretical prediction (dashed red line) of probeset a) 256610_at (A. Thaliana, GSE4847, GSM109107.CEL) and b) 175270_at (C. Elegans, GSE8159, GSM201995.CEL). Discussion We have presented a background subtraction scheme for Affymetrix GeneExpression arrays which is both, accurate and |e nearest-neighbor model. Existing algorithms are either of the first [ 4 ] or the second [ 6 , 7 , 9 ] category, but not both. The background subtraction scheme has been tested on 360 GeneChips from publicly available data of recent expression experiments. 
Since the fitted values for the same parameters in different experiments do not show much variation, the algorithm is robust and can be easily transfe | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1275 | GSE8249 | 8/2/2007 | ['8249'] | [] | [u'17660561'] | 2711087 | [u'19538736'] | ['Haynie', 'Castrillon', 'Shidler', 'Gallardo', 'Akbay', u'Schidler', 'Shirley', 'Contreras', 'John', 'Ward'] | ['Nedorezov', 'Karmazin', 'Forabosco', 'Ottolenghi', 'Uda', 'Cao', 'Piao', 'Schlessinger', 'Garcia-Ortiz', 'Cole', 'Pelosi', 'Omari'] | [] | BMC Dev Biol | 2009 | 6/18/2009 | 0 | AND pmc_gds | 0 | 1 | ||||
1276 | GSE8249 | 8/2/2007 | ['8249'] | [] | [u'17660561'] | 2013718 | [u'17660561'] | ['Haynie', 'Castrillon', 'Shidler', 'Gallardo', 'Akbay', u'Schidler', 'Shirley', 'Contreras', 'John', 'Ward'] | ['Haynie', 'Castrillon', 'Shidler', 'Gallardo', 'Akbay', 'Shirley', 'Contreras', 'John', 'Ward'] | ['Haynie', 'Castrillon', 'Shidler', 'Gallardo', 'Akbay', 'Shirley', 'Contreras', 'John', 'Ward'] | Genetics | 2007 | 2007 Sep | 0 | AND pmc_gds | 1 | 0 | ||||
1277 | GSE8251 | 6/23/2007 | ['8251'] | [] | [u'17557906'] | 2680807 | [u'19416532'] | ['Fielden', 'Gollub', 'Brennan'] | ['Thomas', 'Zhang', 'Murphy', 'Gohlke', 'Mattingly', 'Davis', 'Rosenstein', 'Portier', 'Becker'] | [] | BMC Syst Biol | 2009 | 5/5/2009 | 1 | bal gene expression datasets utilized for validation of metabolic syndrome and neuropsychiatric subnetworks METABOLIC SYNDROME Condition Species Tissue GEO Acc . Reference obese/lean Human adipocytes GSE2508 [ 30 ] obese/lean Mouse adipocytes GSE4692 [ 31 ] Familial combined hyperlipedemia Human monocytes GSE11393 [ 32 ] Treatment Species Tissue GEO Acc . Reference Fenofibrate Rat liver GSE8251{{tag}}--REUSE-- [ 3|amide Rat liver GSE3952 [ 34 ] 9-cis retinoic acid Rat liver GSE3952 [ 34 ] Targretin Rat liver GSE3952 [ 34 ] Vitamin A deficient diet Rat liver GSE1600 [ 35 ] Omega 3 fatty acids Rat cardiomyocytes GSE4327 [ 36 ] Thiazolidinediones Human 3T3-L1 adipocytes GSE1458 [ 37 ] Atorvastatin Human monocytes GSE11393 [ 32 ] Cyfluthrin Human astrocytes GSE5023 [ 38 ] NEUROPSYCHIATRIC DISORDERS Condition Specie| 39 ] Depression Human prefrontal cortex GSE12654 [ 39 ] Schizophrenia Human prefrontal cortex GSE12654 [ 39 ] Schizophrenia Human frontal cortex E-MEXP-857 [ 40 ] Anxiety Mouse various brain regions GSE3327 [ 41 ] Autism Human lymphoblastoid cell lines GSE7329 [ 42 ] Autism Human whole blood GSE6575 [ 43 ] Treatment Species Tissue GEO Acc . Reference Chlorpyrifos Human astrocytes GSE5023 [ 38 ] Ch | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1278 | GSE8252 | 6/23/2007 | ['8252'] | ['2863'] | [u'18958736'] | 2830949 | [u'20181063'] | ['Boutros', 'Pacitto', 'Uetrecht', 'Popovic'] | ['de', 'Jensen', 'Gecz', 'Janecke', 'Rujirabanjerd', 'Raynaud', 'Hamel', 'Stricker', 'Hackett', 'Bartenschlager', 'N\xc3\xbcmann', 'Tzschach', 'Fryns', 'Kuss', 'Ropers', 'Chelly', 'Sp\xc3\xb6rle', 'Nelson'] | [] | Pathogenetics | 2010 | 2/2/2010 | 0 | ariant' algorithm for normalisation of the expression data. All micro array data is 'Minimum Information About a Microarray Experiment' (MIAME) compliant and stored under the GEO series access number GSE8252{{tag}}--DEPOSIT-- http://www.ncbi.nlm.nih.gov/geo/ . Using the Illumina Sentrix Human-6 v2 platform RNA from 12 patients with mutations in KDM5C and 5 control lymphoblastoid cell lines were analysed. cDNA synthesis | 0 | 1 | 0 | NOT pmc_gds | 0 | 1 |
1279 | GSE8269 | 7/24/2007 | ['8269'] | [] | [u'17617897'] | 1941732 | [u'17617897'] | ['Bethin', 'Zhao', 'Curtis', 'Koon', 'Soper'] | ['Bethin', 'Zhao', 'Curtis', 'Koon', 'Soper'] | ['Bethin', 'Zhao', 'Curtis', 'Koon', 'Soper'] | Reprod Biol Endocrinol | 2007 | 7/6/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1280 | GSE8270 | 7/13/2007 | ['8270'] | [] | [u'17626884'] | 2758738 | [u'19797771'] | ['Lindroos', 'Shirahige', 'Wedahl', 'Katou', 'Str\xc3\xb6m', 'Sj\xc3\xb6gren', 'Karlsson', u'Strom', u'Sjogren'] | ['Mistry', 'Megee', 'Wang', 'Guacci', 'Kogut'] | [] | Genes Dev | 2009 | 10/1/2009 | 0 | AND pmc_gds | 0 | 1 | ||||
1281 | GSE8287 | 8/31/2007 | ['8287'] | [] | [u'17878305'] | 2000517 | [u'17878305'] | ['Shaulsky', 'Min', 'Van', 'Kuspa', 'Alexander'] | ['Shaulsky', 'Min', 'Van', 'Kuspa', 'Alexander'] | ['Shaulsky', 'Kuspa', 'Van', 'Min', 'Alexander'] | Proc Natl Acad Sci U S A | 2007 | 9/25/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1282 | GSE8290 | 7/24/2007 | ['8290'] | [] | [u'18288265'] | 2801700 | [u'20003344'] | ['Carlberg', 'Matilainen', 'de', 'Sanderson', 'Kersten', 'M\xc3\xbcller', 'Rakhshandehroo', 'Stienstra', u'Muller'] | ['Van', 'Marchi', 'Romualdi', 'M\xc3\xbcller', 'Radonjic', 'Calura', 'Cavalieri'] | ['M\xc3\xbcller'] | BMC Genomics | 2009 | 12/11/2009 | 0 | dicted PPAR targets [ 13 ]. Our study complemented the genome-wide analysis conducted to date by adding a meta-analysis performed across species of expression data related to PPARα signaling. Publicly available gene expression studies selected for our meta-analysis included experiments addressing molecular response to high fat diet, PPARα activation by various stimuli and gene expression i|tails). Table 1 Meta-analysis data collection. PPARα signaling n° Reference Dataset Accession Number GEO/AE Org Tissue Technology PPARα signaling activated by WY14643 1 [ 55 ] GSE8302/E-GEOD-8302 Hs Liver Affymetrix 2 [ 55 ] GSE8302/E-GEOD-8302 Mm Liver Affymetrix 3 [ 55 ] GSE8302/E-GEOD-8302 Rn Liver Affymetrix PPARα signaling repressed using PPARα knokout mice |] GSE8291/E-GEOD-8291 Mm Liver Affymetrix 6 [ 55 ] GSE8292/E-GEOD-8292 Mm Liver Affymetrix 7 [ 55 ] GSE8295/E-GEOD-8295 Mm Liver Affymetrix PPARα signaling activated by High fat diet 8 [ 56 ] GSE8753/E-GEOD-8753 Mm Liver Affymetrix 9 [ 57 ] GSE6903/E-GEOD-6853 Mm Liver Affymetrix 10 [ 58 ] GSE8524/~ Mm Liver Affymetrix 11 [ 59 ] GSE1560/E-GEOD-1560 Mm Aorta Oligo Array 12 [ 60 ] GSE8700/E-GEOD-|Table 2 Validation data sets. 
n° PPARα signalling Reference Dataset Accession Number GEO/AE Org Tissue Technology 1 PPARα signaling activated by WY14643 (PPARα WY14643-GSE8396) [ 64 ] GSE8396/E-GEOD-8396 Mm Liver Affymetrix 2 PPARα signaling repressed using PPARα knokout mice (PPARα KO1-GSE8396) [ 64 ] GSE8396/E-GEOD-8396 Mm Liver Affymetrix 3 PPA|α signaling activated by High fat diet (HFD-E-MEXP-1755) [ 65 ] ~/E-MEXP-1755 Mm Liver Affymetrix 5 Oleate response repressed using knokout yeast of a transcription promoter (del_ADR1) [ 50 ] GSE5862/~ Sc ~ Oligo Array 6 Oleate response repressed using knokout yeast of a transcription promoter (del_PIP2) [ 50 ] GSE5862/~ Sc ~ Oligo Array 7 Oleate response repressed using knokout yeast of a tran|y 8 Oleate response activated using knokout yeast of a transcription repressor (del_OAF3) [ 50 ] GSE5862/~ Sc ~ Oligo Array 9 Oleate response activated by High fat diet (oleate_vs_low_glucose) [ 50 ] GSE5862/~ Sc ~ Oligo Array Statistical analysis of microarray data Gene expression of Affymetrix datasets were quantified and separately normalized using rma technique [ 17 ] and. EntrezGene Custom CDF f|ers of biological replicates as required for powerful inference. Fold change cut-off, filtered by variance coefficient, was used to select DEGs in those datasets with less than 3 replicates per gene (GSE8302, GSE9291 and GSE9290). Pathways analysis on DEGs Enrichment analysis on metabolic pathways was calculated for each dataset using Fisher exact test based on hypergeometric distribution with a p-val | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1283 | GSE8290 | 7/24/2007 | ['8290'] | [] | [u'18288265'] | 2233741 | [u'18288265'] | ['Carlberg', 'Matilainen', 'de', 'Sanderson', 'Kersten', 'M\xc3\xbcller', 'Rakhshandehroo', 'Stienstra', u'Muller'] | ['Carlberg', 'Matilainen', 'de', 'Sanderson', 'Kersten', 'M\xc3\xbcller', 'Rakhshandehroo', 'Stienstra'] | ['Carlberg', 'Matilainen', 'de', 'Sanderson', 'Kersten', 'M\xc3\xbcller', 'Rakhshandehroo', 'Stienstra'] | PPAR Res | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1284 | GSE8291 | 7/24/2007 | ['8291'] | [] | [u'18288265'] | 2801700 | [u'20003344'] | ['Carlberg', u'Rakhshandehro', 'Matilainen', 'de', 'Sanderson', 'Kersten', 'M\xc3\xbcller', 'Rakhshandehroo', 'Stienstra', u'Muller'] | ['Van', 'Marchi', 'Romualdi', 'M\xc3\xbcller', 'Radonjic', 'Calura', 'Cavalieri'] | ['M\xc3\xbcller'] | BMC Genomics | 2009 | 12/11/2009 | 0 | tails). Table 1 Meta-analysis data collection. PPARα signaling n° Reference Dataset Accession Number GEO/AE Org Tissue Technology PPARα signaling activated by WY14643 1 [ 55 ] GSE8302/E-GEOD-8302 Hs Liver Affymetrix 2 [ 55 ] GSE8302/E-GEOD-8302 Mm Liver Affymetrix 3 [ 55 ] GSE8302/E-GEOD-8302 Rn Liver Affymetrix PPARα signaling repressed using PPARα knokout mice |] GSE8291{{tag}}--REUSE--/E-GEOD-8291{{tag}}--REUSE-- Mm Liver Affymetrix 6 [ 55 ] GSE8292/E-GEOD-8292 Mm Liver Affymetrix 7 [ 55 ] GSE8295/E-GEOD-8295 Mm Liver Affymetrix PPARα signaling activated by High fat diet 8 [ 56 ] GSE8753/E-GEOD-8753 Mm Liver Affymetrix 9 [ 57 ] GSE6903/E-GEOD-6853 Mm Liver Affymetrix 10 [ 58 ] GSE8524/~ Mm Liver Affymetrix 11 [ 59 ] GSE1560/E-GEOD-1560 Mm Aorta Oligo Array 12 [ 60 ] GSE8700/E-GEOD-|Table 2 Validation data sets. 
n° PPARα signalling Reference Dataset Accession Number GEO/AE Org Tissue Technology 1 PPARα signaling activated by WY14643 (PPARα WY14643-GSE8396) [ 64 ] GSE8396/E-GEOD-8396 Mm Liver Affymetrix 2 PPARα signaling repressed using PPARα knokout mice (PPARα KO1-GSE8396) [ 64 ] GSE8396/E-GEOD-8396 Mm Liver Affymetrix 3 PPA|α signaling activated by High fat diet (HFD-E-MEXP-1755) [ 65 ] ~/E-MEXP-1755 Mm Liver Affymetrix 5 Oleate response repressed using knokout yeast of a transcription promoter (del_ADR1) [ 50 ] GSE5862/~ Sc ~ Oligo Array 6 Oleate response repressed using knokout yeast of a transcription promoter (del_PIP2) [ 50 ] GSE5862/~ Sc ~ Oligo Array 7 Oleate response repressed using knokout yeast of a tran|y 8 Oleate response activated using knokout yeast of a transcription repressor (del_OAF3) [ 50 ] GSE5862/~ Sc ~ Oligo Array 9 Oleate response activated by High fat diet (oleate_vs_low_glucose) [ 50 ] GSE5862/~ Sc ~ Oligo Array Statistical analysis of microarray data Gene expression of Affymetrix datasets were quantified and separately normalized using rma technique [ 17 ] and. EntrezGene Custom CDF f|ers of biological replicates as required for powerful inference. Fold change cut-off, filtered by variance coefficient, was used to select DEGs in those datasets with less than 3 replicates per gene (GSE8302, GSE9291 and GSE9290). Pathways analysis on DEGs Enrichment analysis on metabolic pathways was calculated for each dataset using Fisher exact test based on hypergeometric distribution with a p-val | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1285 | GSE8291 | 7/24/2007 | ['8291'] | [] | [u'18288265'] | 2233741 | [u'18288265'] | ['Carlberg', u'Rakhshandehro', 'Matilainen', 'de', 'Sanderson', 'Kersten', 'M\xc3\xbcller', 'Rakhshandehroo', 'Stienstra', u'Muller'] | ['Carlberg', 'Matilainen', 'de', 'Sanderson', 'Kersten', 'M\xc3\xbcller', 'Rakhshandehroo', 'Stienstra'] | ['Carlberg', 'Matilainen', 'de', 'Sanderson', 'Kersten', 'M\xc3\xbcller', 'Rakhshandehroo', 'Stienstra'] | PPAR Res | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1286 | GSE8292 | 7/24/2007 | ['8292'] | [] | [u'18288265'] | 2801700 | [u'20003344'] | ['Carlberg', 'Matilainen', 'de', 'Sanderson', 'Kersten', 'M\xc3\xbcller', 'Rakhshandehroo', 'Stienstra', u'Muller'] | ['Van', 'Marchi', 'Romualdi', 'M\xc3\xbcller', 'Radonjic', 'Calura', 'Cavalieri'] | ['M\xc3\xbcller'] | BMC Genomics | 2009 | 12/11/2009 | 0 | tails). Table 1 Meta-analysis data collection. PPARα signaling n° Reference Dataset Accession Number GEO/AE Org Tissue Technology PPARα signaling activated by WY14643 1 [ 55 ] GSE8302/E-GEOD-8302 Hs Liver Affymetrix 2 [ 55 ] GSE8302/E-GEOD-8302 Mm Liver Affymetrix 3 [ 55 ] GSE8302/E-GEOD-8302 Rn Liver Affymetrix PPARα signaling repressed using PPARα knokout mice |] GSE8291/E-GEOD-8291 Mm Liver Affymetrix 6 [ 55 ] GSE8292{{tag}}--REUSE--/E-GEOD-8292{{tag}}--REUSE-- Mm Liver Affymetrix 7 [ 55 ] GSE8295/E-GEOD-8295 Mm Liver Affymetrix PPARα signaling activated by High fat diet 8 [ 56 ] GSE8753/E-GEOD-8753 Mm Liver Affymetrix 9 [ 57 ] GSE6903/E-GEOD-6853 Mm Liver Affymetrix 10 [ 58 ] GSE8524/~ Mm Liver Affymetrix 11 [ 59 ] GSE1560/E-GEOD-1560 Mm Aorta Oligo Array 12 [ 60 ] GSE8700/E-GEOD-|Table 2 Validation data sets. 
n° PPARα signalling Reference Dataset Accession Number GEO/AE Org Tissue Technology 1 PPARα signaling activated by WY14643 (PPARα WY14643-GSE8396) [ 64 ] GSE8396/E-GEOD-8396 Mm Liver Affymetrix 2 PPARα signaling repressed using PPARα knokout mice (PPARα KO1-GSE8396) [ 64 ] GSE8396/E-GEOD-8396 Mm Liver Affymetrix 3 PPA|α signaling activated by High fat diet (HFD-E-MEXP-1755) [ 65 ] ~/E-MEXP-1755 Mm Liver Affymetrix 5 Oleate response repressed using knokout yeast of a transcription promoter (del_ADR1) [ 50 ] GSE5862/~ Sc ~ Oligo Array 6 Oleate response repressed using knokout yeast of a transcription promoter (del_PIP2) [ 50 ] GSE5862/~ Sc ~ Oligo Array 7 Oleate response repressed using knokout yeast of a tran|y 8 Oleate response activated using knokout yeast of a transcription repressor (del_OAF3) [ 50 ] GSE5862/~ Sc ~ Oligo Array 9 Oleate response activated by High fat diet (oleate_vs_low_glucose) [ 50 ] GSE5862/~ Sc ~ Oligo Array Statistical analysis of microarray data Gene expression of Affymetrix datasets were quantified and separately normalized using rma technique [ 17 ] and. EntrezGene Custom CDF f|ers of biological replicates as required for powerful inference. Fold change cut-off, filtered by variance coefficient, was used to select DEGs in those datasets with less than 3 replicates per gene (GSE8302, GSE9291 and GSE9290). Pathways analysis on DEGs Enrichment analysis on metabolic pathways was calculated for each dataset using Fisher exact test based on hypergeometric distribution with a p-val | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1287 | GSE8295 | 7/24/2007 | ['8295'] | [] | [u'18288265'] | 2801700 | [u'20003344'] | ['Carlberg', 'Matilainen', 'de', 'Sanderson', 'Kersten', 'M\xc3\xbcller', 'Rakhshandehroo', 'Stienstra', u'Muller'] | ['Van', 'Marchi', 'Romualdi', 'M\xc3\xbcller', 'Radonjic', 'Calura', 'Cavalieri'] | ['M\xc3\xbcller'] | BMC Genomics | 2009 | 12/11/2009 | 0 | tails). Table 1 Meta-analysis data collection. PPARα signaling n° Reference Dataset Accession Number GEO/AE Org Tissue Technology PPARα signaling activated by WY14643 1 [ 55 ] GSE8302/E-GEOD-8302 Hs Liver Affymetrix 2 [ 55 ] GSE8302/E-GEOD-8302 Mm Liver Affymetrix 3 [ 55 ] GSE8302/E-GEOD-8302 Rn Liver Affymetrix PPARα signaling repressed using PPARα knokout mice |] GSE8291/E-GEOD-8291 Mm Liver Affymetrix 6 [ 55 ] GSE8292/E-GEOD-8292 Mm Liver Affymetrix 7 [ 55 ] GSE8295{{tag}}--REUSE--/E-GEOD-8295{{tag}}--REUSE-- Mm Liver Affymetrix PPARα signaling activated by High fat diet 8 [ 56 ] GSE8753/E-GEOD-8753 Mm Liver Affymetrix 9 [ 57 ] GSE6903/E-GEOD-6853 Mm Liver Affymetrix 10 [ 58 ] GSE8524/~ Mm Liver Affymetrix 11 [ 59 ] GSE1560/E-GEOD-1560 Mm Aorta Oligo Array 12 [ 60 ] GSE8700/E-GEOD-|Table 2 Validation data sets. 
n° PPARα signalling Reference Dataset Accession Number GEO/AE Org Tissue Technology 1 PPARα signaling activated by WY14643 (PPARα WY14643-GSE8396) [ 64 ] GSE8396/E-GEOD-8396 Mm Liver Affymetrix 2 PPARα signaling repressed using PPARα knokout mice (PPARα KO1-GSE8396) [ 64 ] GSE8396/E-GEOD-8396 Mm Liver Affymetrix 3 PPA|α signaling activated by High fat diet (HFD-E-MEXP-1755) [ 65 ] ~/E-MEXP-1755 Mm Liver Affymetrix 5 Oleate response repressed using knokout yeast of a transcription promoter (del_ADR1) [ 50 ] GSE5862/~ Sc ~ Oligo Array 6 Oleate response repressed using knokout yeast of a transcription promoter (del_PIP2) [ 50 ] GSE5862/~ Sc ~ Oligo Array 7 Oleate response repressed using knokout yeast of a tran|y 8 Oleate response activated using knokout yeast of a transcription repressor (del_OAF3) [ 50 ] GSE5862/~ Sc ~ Oligo Array 9 Oleate response activated by High fat diet (oleate_vs_low_glucose) [ 50 ] GSE5862/~ Sc ~ Oligo Array Statistical analysis of microarray data Gene expression of Affymetrix datasets were quantified and separately normalized using rma technique [ 17 ] and. EntrezGene Custom CDF f|ers of biological replicates as required for powerful inference. Fold change cut-off, filtered by variance coefficient, was used to select DEGs in those datasets with less than 3 replicates per gene (GSE8302, GSE9291 and GSE9290). Pathways analysis on DEGs Enrichment analysis on metabolic pathways was calculated for each dataset using Fisher exact test based on hypergeometric distribution with a p-val | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1288 | GSE8295 | 7/24/2007 | ['8295'] | [] | [u'18288265'] | 2233741 | [u'18288265'] | ['Carlberg', 'Matilainen', 'de', 'Sanderson', 'Kersten', 'M\xc3\xbcller', 'Rakhshandehroo', 'Stienstra', u'Muller'] | ['Carlberg', 'Matilainen', 'de', 'Sanderson', 'Kersten', 'M\xc3\xbcller', 'Rakhshandehroo', 'Stienstra'] | ['Carlberg', 'Matilainen', 'de', 'Sanderson', 'Kersten', 'M\xc3\xbcller', 'Rakhshandehroo', 'Stienstra'] | PPAR Res | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1289 | GSE8298 | 8/9/2007 | ['8298'] | [] | [u'18083910'] | 2217651 | [u'18083910'] | ['Lenarz-Wyatt', 'Katagiri', 'Van', 'Weisberg', 'Sato'] | ['Lenarz-Wyatt', 'Katagiri', 'Van', 'Weisberg', 'Sato'] | ['Lenarz-Wyatt', 'Katagiri', 'Van', 'Weisberg', 'Sato'] | Plant Cell | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1290 | GSE8301 | 10/3/2007 | ['8301'] | [] | [u'18020713'] | 2077892 | [u'18020713'] | ['', 'Johansson', 'Pettersson', 'Stenberg', 'Larsson'] | ['Johansson', 'Pettersson', 'Stenberg', 'Larsson'] | ['Johansson', 'Pettersson', 'Stenberg', 'Larsson'] | PLoS Genet | 2007 | 2007 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
1291 | GSE8302 | 7/24/2007 | ['8302'] | [] | [u'18288265'] | 2801700 | [u'20003344'] | ['Carlberg', 'Matilainen', 'de', 'Sanderson', 'Kersten', 'M\xc3\xbcller', 'Rakhshandehroo', 'Stienstra', u'Muller'] | ['Van', 'Marchi', 'Romualdi', 'M\xc3\xbcller', 'Radonjic', 'Calura', 'Cavalieri'] | ['M\xc3\xbcller'] | BMC Genomics | 2009 | 12/11/2009 | 0 | tails). Table 1 Meta-analysis data collection. PPARα signaling n° Reference Dataset Accession Number GEO/AE Org Tissue Technology PPARα signaling activated by WY14643 1 [ 55 ] GSE8302{{tag}}--REUSE--/E-GEOD-8302{{tag}}--REUSE-- Hs Liver Affymetrix 2 [ 55 ] GSE8302{{tag}}--REUSE--/E-GEOD-8302{{tag}}--REUSE-- Mm Liver Affymetrix 3 [ 55 ] GSE8302{{tag}}--REUSE--/E-GEOD-8302{{tag}}--REUSE-- Rn Liver Affymetrix PPARα signaling repressed using PPARα knokout mice |] GSE8291/E-GEOD-8291 Mm Liver Affymetrix 6 [ 55 ] GSE8292/E-GEOD-8292 Mm Liver Affymetrix 7 [ 55 ] GSE8295/E-GEOD-8295 Mm Liver Affymetrix PPARα signaling activated by High fat diet 8 [ 56 ] GSE8753/E-GEOD-8753 Mm Liver Affymetrix 9 [ 57 ] GSE6903/E-GEOD-6853 Mm Liver Affymetrix 10 [ 58 ] GSE8524/~ Mm Liver Affymetrix 11 [ 59 ] GSE1560/E-GEOD-1560 Mm Aorta Oligo Array 12 [ 60 ] GSE8700/E-GEOD-|Table 2 Validation data sets. 
n° PPARα signalling Reference Dataset Accession Number GEO/AE Org Tissue Technology 1 PPARα signaling activated by WY14643 (PPARα WY14643-GSE8396) [ 64 ] GSE8396/E-GEOD-8396 Mm Liver Affymetrix 2 PPARα signaling repressed using PPARα knokout mice (PPARα KO1-GSE8396) [ 64 ] GSE8396/E-GEOD-8396 Mm Liver Affymetrix 3 PPA|α signaling activated by High fat diet (HFD-E-MEXP-1755) [ 65 ] ~/E-MEXP-1755 Mm Liver Affymetrix 5 Oleate response repressed using knokout yeast of a transcription promoter (del_ADR1) [ 50 ] GSE5862/~ Sc ~ Oligo Array 6 Oleate response repressed using knokout yeast of a transcription promoter (del_PIP2) [ 50 ] GSE5862/~ Sc ~ Oligo Array 7 Oleate response repressed using knokout yeast of a tran|y 8 Oleate response activated using knokout yeast of a transcription repressor (del_OAF3) [ 50 ] GSE5862/~ Sc ~ Oligo Array 9 Oleate response activated by High fat diet (oleate_vs_low_glucose) [ 50 ] GSE5862/~ Sc ~ Oligo Array Statistical analysis of microarray data Gene expression of Affymetrix datasets were quantified and separately normalized using rma technique [ 17 ] and. EntrezGene Custom CDF f|ers of biological replicates as required for powerful inference. Fold change cut-off, filtered by variance coefficient, was used to select DEGs in those datasets with less than 3 replicates per gene (GSE8302{{tag}}--REUSE--, GSE9291 and GSE9290). Pathways analysis on DEGs Enrichment analysis on metabolic pathways was calculated for each dataset using Fisher exact test based on hypergeometric distribution with a p-val | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1292 | GSE8302 | 7/24/2007 | ['8302'] | [] | [u'18288265'] | 2233741 | [u'18288265'] | ['Carlberg', 'Matilainen', 'de', 'Sanderson', 'Kersten', 'M\xc3\xbcller', 'Rakhshandehroo', 'Stienstra', u'Muller'] | ['Carlberg', 'Matilainen', 'de', 'Sanderson', 'Kersten', 'M\xc3\xbcller', 'Rakhshandehroo', 'Stienstra'] | ['Carlberg', 'Matilainen', 'de', 'Sanderson', 'Kersten', 'M\xc3\xbcller', 'Rakhshandehroo', 'Stienstra'] | PPAR Res | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1293 | GSE8302 | 7/24/2007 | ['8302'] | [] | [u'18288265'] | 2578827 | [u'18680712'] | ['Carlberg', 'Matilainen', 'de', 'Sanderson', 'Kersten', 'M\xc3\xbcller', 'Rakhshandehroo', 'Stienstra', u'Muller'] | ['Vidal', 'Han', 'Hao', 'Lin', 'Li', 'Liu', 'Hill'] | [] | Cell Metab | 2008 | 2008 Aug | 0 | n. Previous transcriptional profiling studies have demonstrated that Wy14,643 induces the expression of many peroxisomal and mitochondrial fat oxidation genes in cultured hepatocytes (Gene Expression Omnibus dataset GSE8302{{tag}}--MENTION-- ). To determine the extent to which BAF60a regulates the expression of PPARα target genes, we performed microarray analysis on primary hepatocytes transduced with GFP or BAF6|yloxapol (500 mg/kg) through tail vein. Tail blood was sampled 30, 60, and 90 minutes after injection and assayed for TG concentrations. Microarray data has been deposited to the NCBI Gene Expression Omnibus database ( GSE9523 ). Supplementary Material body 01 Click here to view. (107K, pdf) 02 Click here to view. (222K, xls) Acknowledgements We thank S. Gu and A. Baker for technical assistance, L. Chang | 0 | 0 | 1 | NOT pmc_gds | 0 | 0 |
1294 | GSE8305 | 8/1/2007 | ['8305'] | [] | [u'17671091'] | 1935030 | [u'17671091'] | ['Bermejo', 'Shirahige', 'Katou', 'Foiani', 'Doksani', 'Tanaka', 'Capra'] | ['Bermejo', 'Shirahige', 'Katou', 'Foiani', 'Doksani', 'Tanaka', 'Capra'] | ['Bermejo', 'Shirahige', 'Katou', 'Doksani', 'Foiani', 'Tanaka', 'Capra'] | Genes Dev | 2007 | 8/1/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1295 | GSE8308 | 9/1/2007 | ['8308'] | [] | [u'17766400'] | 2048729 | [u'17766400'] | ['', 'Springer', 'Hermanson', 'Stupar'] | ['Springer', 'Hermanson', 'Stupar'] | ['Springer', 'Hermanson', 'Stupar'] | Plant Physiol | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
1296 | GSE8311 | 6/27/2007 | ['8311'] | [] | [] | 2978079 | [u'21085707'] | [u'Rubenstein'] | ['Wiltshire', 'Beyer', 'Miller', 'Kempermann', 'Pletcher', 'Overall', 'Su', 'Michaelson', 'Loguercio', 'Walker'] | [] | PLoS One | 2010 | 11/10/2010 | 0 | Usf1 and Atf6 Over-expression Datasets Plaisier et al. [44] studied overexpression of Usf1 in HEK293T cells in vitro to ascertain the genes downstream of Usf1 (GEO accession: GSE17300). CEL files from the repository were processed as described in [44] . Differential expression between Usf1 and empty vector control transfected cells was analyzed using a Studen|ology data at Mouse Genome Informatics (MGI 4.32, December 2009; http://www.informatics.jax.org ). For Atf6 , overexpression was studied in mouse heart tissue [46] (GEO accession: GSE8311{{tag}}--REUSE--). We selected 607 genes exhibiting at least 2-fold differential expression (p≤0.001). Analysis of Tissue Specificity in Gene Expression The GNF Mouse GeneAtlas V3 was used as an independent | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1297 | GSE8311 | 6/27/2007 | ['8311'] | [] | [] | 2581427 | [u'18799476'] | [u'Rubenstein'] | ['Labosky', 'Golden', 'Cho', 'Nasrallah', 'Marsh', 'Fulp'] | [] | Hum Mol Genet | 2008 | 12/1/2008 | 1 | AND pmc_gds | 0 | 1 | ||||
1298 | GSE8317 | 6/28/2007 | ['8317'] | [] | [u'17725815'] | 2375004 | [u'17725815'] | ['Relman', 'Brown', 'Hensley', 'Wahl-Jensen', 'Young', 'Jahrling', 'Reed', 'Geisbert', 'Rubins', 'Daddario'] | ['Relman', 'Brown', 'Hensley', 'Wahl-Jensen', 'Young', 'Jahrling', 'Reed', 'Geisbert', 'Rubins', 'Daddario'] | ['Relman', 'Brown', 'Hensley', 'Wahl-Jensen', 'Young', 'Jahrling', 'Reed', 'Geisbert', 'Rubins', 'Daddario'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1299 | GSE8325 | 6/29/2007 | ['8325'] | ['2971'] | [u'18240145'] | 2687992 | [u'19376825'] | ['Zoubeidi', 'Andersen', 'Gleave', 'Sowery', 'Ettinger', 'Hadaschik', u'Baybik', u'Hurtado-Coll', 'Roberge'] | ['Lane', 'Law', 'Pavlidis', 'French', 'Xu'] | [] | Bioinformatics | 2009 | 6/15/2009 | 1 | ctions revealed many related predictions for a single disorder, possibly explaining the low precision. An example is the experiment ‘Cytotoxic activity of HTI-286 in prostate cancer’ (GSE8325{{tag}}--REUSE--). Predicted annotations for this study were ‘Malignant neoplasm of prostate’, ‘Malignant Neoplasms’, ‘Refractory Carcinoma’ and finally ‘Pros | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1300 | GSE8326 | 6/29/2007 | ['8326'] | [] | [] | 2597350 | [u'18469141'] | [u'Stepanov', u'Neale', u'Nitiss'] | ['Stepanov', 'Neale', 'Nitiss'] | ['Stepanov', 'Neale', 'Nitiss'] | Mol Pharmacol | 2008 | 2008 Aug | 0 | ations (March, 2007) were obtained from the Affymetrix website ( http://www.affymetrix.com/analysis/index.affx ). All microarray data have been submitted to GEO ( http://www.ncbi.nlm.nih.gov/geo/ ) ( GSE8326{{tag}}--DEPOSIT-- ). Three replicate cultures of each yeast strain were used to identify differentially expressed transcripts. Signal values were log2-transformed prior to analysis. The local pooled error (LPE) t-te|D:repressor fusions, and that deletion of PDR1 enhanced the effectiveness of the Pdr1DBD:repressor fusions. The full set of expression data is available on-line ( http://www.ncbi.nlm.nih.gov/geo/ , ( GSE8326{{tag}}--DEPOSIT-- ). Figure 3 Expression analysis in cells containing pdr1DBD-fusion constructs. Expression analysis in cells containing pdr1DBD-fusion constructs Figure 3 Expression analysis in cells containing pdr|conditions reported here, we also analyzed the effect of deletion of PDR1 in cells carrying an empty vector. The full set of expression data is available on-line ( http://www.ncbi.nlm.nih.gov/geo/ , GSE8326{{tag}}--DEPOSIT-- ). We confirmed some of the results obtained with the microarray studies by Northern analysis using probes for two target genes, PDR5 , and YOR1 . Both genes are well-established PDR1 targets. | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1301 | GSE8333 | 6/30/2007 | ['8333'] | [] | [u'17327916'] | 2994899 | [u'21152100'] | ['Moreau', 'Look', 'Neuberg', 'Diller', 'Fox', 'Maris', 'Meyerson', 'Li', 'Fortina', 'Attiyeh', 'George'] | ['Amos', 'Tuna', 'Martens', 'Zhu', 'Smid'] | [] | PLoS One | 2010 | 11/30/2010 | 0 | the region of mutated genes in various cancers, thereby identifying the region for next-generation sequencing. Methods/Principal Findings We retrieved large genomic data sets from the Gene Expression Omnibus database to perform genome-wide analysis of aUPD in breast tumor samples and cell lines using approaches that can reliably detect aUPD. aUPD was identified in 52.29% of the tumor samples. The|cted using AsCNAR/CNAGv3 software ( http://genome.umin.jp ) [34] . The raw data (CEL files) of the Affymetrix GeneChip DNA-mapping microarrays from six sets of breast cancer samples; GSE3743 [5] , GSE7545 [35] , GSE10099 [36] , GSE16619 [37] , GSE19399 [38] and GSE13696) [39] were re|ression Omnibus (GEO) database ( http://www.ncbi.nih.nlm.gov/geo ). The analysis was done by using non-self controls with sex-matched reference samples from HapMap data and from previously published, publicly available datasets; GSE14656 [40] , GSE14860 [41] , GSE10922 [42] , GSE11417 [43] , GSE10092 [44] , GSE9611 | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1302 | GSE8340 | 12/1/2007 | ['8340'] | [] | [] | 2861709 | [u'20442785'] | [u'Lamb', u'Maynard', u'Schaefer', u'Wynn', u'Crow', u'Davies', u'Pesce'] | ['Lamb', 'Maynard', 'Riner', 'Wynn', 'Crow', 'Walls', 'Schaefer', 'Davies', 'Pesce'] | [u'Lamb', u'Maynard', u'Schaefer', u'Wynn', u'Crow', u'Davies', u'Pesce'] | PLoS Pathog | 2010 | 4/29/2010 | 0 | ïve CD4 + T cells, we used a whole genome microarray to compare transcript levels in liver tissue from non-infected RAG-1 -/- and OT-II/RAG-1 -/- mice (NCBI GEO series accession number GSE8340{{tag}}.) Using this approach, we identified 165 genes that were differentially expressed only in the presence of naïve CD4 + T cells ( Figure 3A and Table S1 ). None of these genes are |were considered significant. Accession numbers The gene expression data discussed in this manuscript have been deposited in the NCBI's GEO database and are available under GEO series accession number GSE8340{{tag}}--DEPOSIT--. Supporting Information Table S1 Differential gene expression in liver tissue of RAG-1 -/- and OT-II/RAG-1 -/- mice. (0.29 MB DOC) Click here for additional data file. Table S2 Biological Functio | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1303 | GSE8348 | 7/12/2007 | ['8348'] | [] | [u'17725816'] | 2174512 | [u'17725816'] | ['Teusink', 'de', 'Serrano', 'Bron', 'Wels', 'Molenaar', 'Smid'] | ['Teusink', 'de', 'Serrano', 'Bron', 'Wels', 'Molenaar', 'Smid'] | ['Teusink', 'de', 'Serrano', 'Bron', 'Wels', 'Molenaar', 'Smid'] | Microb Cell Fact | 2007 | 8/28/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1304 | GSE8352 | 7/7/2007 | ['8352'] | [] | [u'17724126'] | 2118696 | [u'17724126'] | ['McKinstry', 'Lee', 'Weng', 'Huston', 'Golech', 'Swain'] | ['McKinstry', 'Lee', 'Weng', 'Huston', 'Golech', 'Swain'] | ['McKinstry', 'Lee', 'Weng', 'Huston', 'Golech', 'Swain'] | J Exp Med | 2007 | 9/3/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1305 | GSE8362 | 9/5/2007 | ['8362'] | [] | [u'17631629'] | 2045161 | [u'17631629'] | ['Bootsma', 'Hermans', 'Bijlsma', 'Kuipers', 'de', 'Burghout', 'Kloosterman'] | ['Bootsma', 'Hermans', 'Bijlsma', 'Kuipers', 'de', 'Burghout', 'Kloosterman'] | ['Bootsma', 'Hermans', 'Bijlsma', 'Kuipers', 'de', 'Burghout', 'Kloosterman'] | J Bacteriol | 2007 | 2007 Sep | 0 | AND pmc_gds | 1 | 0 | ||||
1306 | GSE8365 | 7/15/2007 | ['8365'] | [] | [u'17683202'] | 2688271 | [u'19210792'] | ['Harmer', 'Covington'] | ['Quisel', 'Hazen', 'Borevitz', 'Gendron', 'Naef', 'Ecker', 'Chen', 'Kay'] | [] | Genome Biol | 2009 | 2/11/2009 | 0 | ry rate 5% p -value threshold (-) are considered to have a circadian rhythm. In addition to these consistencies, we compared the tiling array dataset with a similarly produced 2-day time course [GEO:GSE8365{{tag}}--REUSE--] [ 34 ] hybridized to the Affymetrix ATH1 gene array. The spectral analysis for each gene on the gene array was plotted against all of the features for that transcript on the tilling array. While c|se strand arrays using the affy Bioconductor package in R according to Bolstad et al [ 70 ]. The Affymetrix AtTILE1 Genechip data (.CEL files) have been deposited at the Gene Expression Omnibus [GEO:GSE13814]. Fourier/spectral analysis Hybridization efficiencies of oligonucleotide probes on tiling arrays vary considerably and some probes tend to be unresponsive. Thus, to avoid spurious decreases of si | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1307 | GSE8365 | 7/15/2007 | ['8365'] | [] | [u'17683202'] | 2726226 | [u'19604350'] | ['Harmer', 'Covington'] | ['Feng', 'Zhu', 'Wang', 'Zhang', 'Liu', 'Wu'] | [] | BMC Genomics | 2009 | 7/15/2009 | 0 | r, a site-specific posterior analysis was used to predict amino acid residues that were crucial for functional divergence. Investigation of transcription patterns Gene expression microarray datasets (GSE7951, GSE13161, GSE6893, GSE6908, and GSE6901 for rice; GSE680, GSE7641, and GSE8365{{tag}}--REUSE-- for Arabidopsis ) were downloaded from the GEO database in NCBI. The microarray data of rice include the analysis of | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1308 | GSE8365 | 7/15/2007 | ['8365'] | [] | [u'17683202'] | 1939880 | [u'17683202'] | ['Harmer', 'Covington'] | ['Harmer', 'Covington'] | ['Harmer', 'Covington'] | PLoS Biol | 2007 | 2007 Aug | 0 | AND pmc_gds | 1 | 0 | ||||
1309 | GSE8371 | 7/6/2007 | ['8371'] | [] | [] | 2654745 | [u'18988674'] | [u'Puttaiah', u'Rudraiah', u'Samji'] | ['Jayaram', 'Priyanka', 'Sridaran', 'Medhamurthy'] | [] | Endocrinology | 2009 | 2009 Mar | 0 | s collected from monkeys treated with VEH or CET for 24 h; the hybridization details and individual CEL and CHP files have been deposited at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE8371{{tag}}--DEPOSIT-- . Microarray analysis results revealed that inhibition of pituitary LH secretion by CET treatment significantly ( P < 0.05) affected the expression of 3949 genes (>2-fold change wi|he CL of monkeys treated with CET plus PBS and CET plus rhLH, hybridization details, and individual CEL and CHP files have been deposited at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE7827 . Microarray analysis results revealed that replacement of exogenous rhLH after inhibition of pituitary LH secretion by CET treatment significantly ( P < 0.05) affected the expression of 4|α -treated monkeys. Microarray comparison analysis, hybridization details, and individual CEL and CHP files have been deposited online at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE7971 . Microarray analysis results revealed that PGF 2α treatment significantly ( P < 0.05) affected the expression of 2290 genes in the CL (>2-fold change with Benjamini and H|lysis data generated for different stages of the CL of the rhesus monkey deposited recently in the public domain by Bogan et al . ( 70 ) at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE10367 . The analysis of microarray data between late midluteal phase and very late luteal phase by GeneSifter software revealed that 2882 genes were differentially regulated (1280 and 1522 up-regulation | 0 | 1 | 0 | NOT pmc_gds | 0 | 1 |
1310 | GSE8379 | 7/6/2007 | ['8379'] | ['2804'] | [u'17616518'] | 2785812 | [u'19917117'] | ['Slattery', 'Heideman', 'Liko'] | ['Kroll', 'Barkema', 'Carlon'] | [] | Algorithms Mol Biol | 2009 | 11/16/2009 | 1 | d to the local physical neighbors and to the nearest-neighbor free energy. Table 1 Overview over organisms and number of CEL-files analyzed Organism GEO # Chiptype (dimension) # files min A. Thaliana GSE4847 ATH1-121501 (712 × 712) 18 0.0259 GSE7642 ATH1-121501 (712 × 712) 12 0.0544 GSE9311 ATH1-121501 (712 × 712) 8 0.0546 C. Elegans GSE6547 Celegans (712 × 712) 25 0.036|× 712) 7 0.0396 D. Melanogaster GSE3990 Drosophila_2 (732 × 732) 6 0.0620 GSE6558 DrosGenome1 (640 × 640) 24 0.0605 D. Rerio GSE4859 Zebrafish (712 × 712) 8 0.0357 E. Coli GSE11779 E_coli_2 (478 × 478) 3 0.0869 GSE2928 Ecoli (544 × 544) 12 0.0172 GSE6195 E_coli_2 (478 × 478) 4 0.0664 H. Sapiens GSE10433 HG-U133A_2 (732 × 732) 12 0.0757 GSE5054|-U133A (712 × 712) 14 0.0296 GSE8514 HG-U133_Plus_2 (1164 × 1164) 15 0.0738 GSE11897 MOE430A (712 × 712) 11 0.0640 MOE430B (712 × 712) Mouse430_2 (1002 × 1002) GSE6210 Mouse430_2 12 0.0594 GSE6297 Mouse430_2 24 0.0325 O. Sativa GSE15071 Rice (1164 × 1164) 20 0.1157 R. Norvegicus GSE4494 RG_U34A (534 × 534) 59 0.0488 GSE7493 Rat230_2 (834 ×|2 (834 × 834) 4 0.0640 S. Aureus GSE7944 S_aureus (602 × 602) 6 0.0746 S. Cerevisiae GSE6073 YG_S98 (534 × 534) 12 0.0283 GSE8379{{tag}}--REUSE-- YG_S98 (534 × 534) 8 0.0180 X. Laevis GSE3368 Xenopus_laevis (712 × 712) 20 0.0514 Table 2 Optimized parameter values as obtained from the minimization of Eq. (3). A. Thaliana C. Elegans D. Melanogaster D. Rerio GEO no GSE4847 GSE7642|d in this work. Figure 5 Computed background vs. low expressed genes data . Experimental results (solid black line) and theoretical prediction (dashed red line) of probeset a) 256610_at (A. Thaliana, GSE4847, GSM109107.CEL) and b) 175270_at (C. Elegans, GSE8159, GSM201995.CEL).
Discussion We have presented a background subtraction scheme for Affymetrix GeneExpression arrays which is both, accurate and | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1311 | GSE8381 | 7/11/2007 | ['8381'] | [] | [u'17878319'] | 2323447 | [u'17878319'] | ['Evans', 'Mansergh', 'Perry', 'Elford', 'Wells'] | ['Evans', 'Mansergh', 'Perry', 'Elford', 'Wells'] | ['Mansergh', 'Perry', 'Evans', 'Elford', 'Wells'] | Physiol Genomics | 2007 | 12/19/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1312 | GSE8392 | 10/10/2007 | ['8392'] | [] | [u'17937500'] | 2014789 | [u'17937500'] | ['Preuss', 'Borevitz', 'Jones-Rhoades', u'Justin'] | ['Preuss', 'Borevitz', 'Jones-Rhoades'] | ['Preuss', 'Borevitz', 'Jones-Rhoades'] | PLoS Genet | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
1313 | GSE8393 | 7/31/2007 | ['8393'] | [] | [] | 2287379 | [u'18272145'] | [u'Birren', u'Moon'] | ['Birren', 'Moon'] | ['Birren', 'Moon'] | Dev Biol | 2008 | 3/15/2008 | 0 | re period ( Supplementary Table 1 ). Overall, these changes represented approximately 1.6% of the probe set. Microarray data can be accessed at http://www.ncbi.nlm.nih.gov/geo/ (Accession numbers: GSE8393{{tag}}--DEPOSIT-- , GSM206651 , GSM206655 ). We verified the expression changes of 14 genes using real-time PCR. The direction of target-induced changes in mRNA levels correlated with microarray analysis with a gr | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1314 | GSE8401 | 10/2/2007 | ['8401'] | [] | [u'18505921'] | 2831002 | [u'20064233'] | ['Subramanian', 'Ramaswamy', 'Hoshida', 'Shen', 'Mesirov', 'Wagner', 'Hynes', 'Xu', 'Brunet', 'Ross'] | ['Berger', 'De', 'Pierre', 'Depiereux', 'Bareke', 'Gaigneaux'] | [] | BMC Bioinformatics | 2010 | 1/11/2010 | 0 | Data collection (cel files) was performed using Gene Expression Omnibus [19] on the Affymetrix platform HG-U133a (Human Genome model U133a). This collection consists of 34 datasets (table 2) for which there are at least 15 replicates for each of 2 different experimental conditions. {{key}}--REUSE-- | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1315 | GSE8401 | 10/2/2007 | ['8401'] | [] | [u'18505921'] | 2978222 | [u'20969778'] | ['Subramanian', 'Ramaswamy', 'Hoshida', 'Shen', 'Mesirov', 'Wagner', 'Hynes', 'Xu', 'Brunet', 'Ross'] | ['Berger', 'De', 'Pierre', 'Depiereux', 'Bareke', 'Gaigneaux', 'Habra'] | [] | BMC Bioinformatics | 2010 | 10/22/2010 | 0 | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 | |
1316 | GSE8401 | 10/2/2007 | ['8401'] | [] | [u'18505921'] | 2978353 | [u'20621981'] | ['Subramanian', 'Ramaswamy', 'Hoshida', 'Shen', 'Mesirov', 'Wagner', 'Hynes', 'Xu', 'Brunet', 'Ross'] | ['Koman', 'Unl\xc3\xbc', 'Nist\xc3\xa9r', 'Kavak'] | [] | Nucleic Acids Res | 2010 | 11/1/2010 | 0 | Es, we point to the distinct expression of a subgroup of the keratin gene family compared to epidermal differentiation complex (EDC) ( 12 ) expression in skin cancers. Finally we make all of our data publicly available in an online tool, oncoreveal ( http://www.oncoreveal.org ). METHODS Data collection and processing In total 170 normal and 132 cancer SAGE libraries representing 32 different types of canc|m and connective tissue tumors. This data has been published in at least 32 separate articles (see Supplementary Tables S2 and 3 for lists and references). Of the 37 microarray data sets, 25 were downloaded from Oncomine. The rest was collected from the Entrez GEO database. In order to make the GEO data comparable to oncomine data, data was log2 transformed and median was set to 0 by subtracting | less than three expression data points were ignored in the calculations. To define NA-RIDGEs, we used the processed expression values from a wide histologically normal human tissue expression study (GSE2361) from Entrez GEO. ( 16 ) To assign P- value thresholds to this average score (to assess which scores are significantly high), we calculated the average R s score for a gene with N random genes f|significant average Spearman scores ( P >  0.05) where N stands for the window size of the non-RIDGE region. The melanoma metastasis dataset was adapted from Entrez GEO record GSE8401{{tag}}--REUSE-- ( 17 ); which represents RNAs of fresh frozen tissue samples from either primary melanomas or melanoma metastasis samples. RESULTS We performed a gene and SAGE tag-centric meta-analysis of cancer g | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1317 | GSE8401 | 10/2/2007 | ['8401'] | [] | [u'18505921'] | 2880990 | [u'20433688'] | ['Subramanian', 'Ramaswamy', 'Hoshida', 'Shen', 'Mesirov', 'Wagner', 'Hynes', 'Xu', 'Brunet', 'Ross'] | ['Berger', 'Michiels', 'Pierre', 'Depiereux', 'DeHertogh', 'Bareke', 'DeMeulder', 'Gaigneaux'] | [] | BMC Cancer | 2010 | 4/30/2010 | 0 | 0 | 0 | 0 | NOT pmc_gds | 0 | 1 | |
1318 | GSE8401 | 10/2/2007 | ['8401'] | [] | [u'18505921'] | 3008357 | [u'19483725'] | ['Subramanian', 'Ramaswamy', 'Hoshida', 'Shen', 'Mesirov', 'Wagner', 'Hynes', 'Xu', 'Brunet', 'Ross'] | ['Baba', 'Gatza', 'Matsumura', 'Nevins', 'Murphy', 'Andrechek', 'Yao', 'Mori', 'Kim', 'Chang'] | [] | Oncogene | 2009 | 8/6/2009 | 0 | AND pmc_gds | 0 | 1 | ||||
1319 | GSE8401 | 10/2/2007 | ['8401'] | [] | [u'18505921'] | 2756991 | [u'18505921'] | ['Subramanian', 'Ramaswamy', 'Hoshida', 'Shen', 'Mesirov', 'Wagner', 'Hynes', 'Xu', 'Brunet', 'Ross'] | ['Subramanian', 'Ramaswamy', 'Hoshida', 'Shen', 'Mesirov', 'Wagner', 'Hynes', 'Xu', 'Brunet', 'Ross'] | ['Subramanian', 'Xu', 'Hoshida', 'Shen', 'Mesirov', 'Wagner', 'Hynes', 'Ramaswamy', 'Brunet', 'Ross'] | Mol Cancer Res | 2008 | 2008 May | 0 | AND pmc_gds | 1 | 0 | ||||
1320 | GSE8407 | 8/1/2007 | ['8407'] | [] | [u'18371379'] | 2600583 | [u'18674933'] | ['Bryder', u'M\xe5nsson', u'Kwok', 'Weissman', 'Attema', 'Chan', 'Pronk', 'M\xc3\xa5nsson', 'Norddahl', 'Sigvardsson', 'Rossi', u'Logi'] | ['M\xc3\xa5nsson', 'Gurbuxani', 'Sigvardsson', 'Dias', 'Kee'] | ['M\xc3\xa5nsson', 'Sigvardsson'] | Immunity | 2008 | 8/15/2008 | 0 | . Probe level expression values were calculated with RMA ( Irizarry et al., 2003 ). Further analysis was performed using dChip ( www.dchip.org ). Array data are accessible through the gene expression omnibus (GEO; GSE8407{{tag}}--DEPOSIT-- and GSE7302 ). Transcription factor binding sites conserved between human and mouse genomic sequences were identified in the promoters of E2A-dependent lymphoid-associated genes and | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1321 | GSE8407 | 8/1/2007 | ['8407'] | [] | [u'18371379'] | 2830985 | [u'20152025'] | ['Bryder', u'M\xe5nsson', u'Kwok', 'Weissman', 'Attema', 'Chan', 'Pronk', 'M\xc3\xa5nsson', 'Norddahl', 'Sigvardsson', 'Rossi', u'Logi'] | ['Qian', 'Hansson', 'Zetterblad', 'Zandi', 'Bryder', 'Paulsson', 'Lagergren', 'M\xc3\xa5nsson', 'Sigvardsson'] | ['M\xc3\xa5nsson', 'Bryder', 'Sigvardsson'] | BMC Genomics | 2010 | 2/12/2010 | 0 | AND pmc_gds | 1 | 0 | ||||
1322 | GSE8416 | 8/6/2007 | ['8416'] | [] | [u'17646409', u'20007543'] | 2782329 | [u'19775282'] | ['Neuberg', 'Hardwick', 'Draetta', 'Winter', 'Strack', 'Sears', "O'Neil", 'Rao', 'von', 'Meijerink', 'Ahn', 'Li', 'Welcker', 'Sanda', 'Clurman', 'Gutierrez', 'Pieters', 'Larson', 'Tibbitts', 'Grim', 'Look'] | ['Draetta', 'Van', 'Winter', 'Majumder', 'Elbi', 'Tammam', 'Reilly', 'Leach', "O'Neil", 'Dai', 'Efferson', 'Kunii', 'Kim', 'Ware', 'Hardwick', 'Qu', 'Look', 'Strack', 'Bristow', 'Gorenstein', 'Zhao', 'Angagaw', 'Rao', 'Kohl', 'Nikov', 'Scott', 'Kenific'] | ['Hardwick', 'Draetta', 'Winter', 'Strack', "O'Neil", 'Rao', 'Look'] | Br J Pharmacol | 2009 | 2009 Nov | 0 | treated with either GSI or DMSO and performed an unsupervised analysis of mRNA profiles from 13 T-ALL cell lines using Ingenuity® software (submitted in the GEO database, accession number # GSE8416{{tag}}--DEPOSIT-- ). Genes that were up-regulated upon GSI treatment and correlated with GSI sensitivity included those involved in promoting apoptosis and PPARα signalling ( Figure 6A left panel). Striking | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1323 | GSE8416 | 8/6/2007 | ['8416'] | [] | [u'17646409', u'20007543'] | 2118656 | [u'17646409'] | ['Neuberg', 'Hardwick', 'Draetta', 'Winter', 'Strack', 'Sears', "O'Neil", 'Rao', 'von', 'Meijerink', 'Ahn', 'Li', 'Welcker', 'Sanda', 'Clurman', 'Gutierrez', 'Pieters', 'Larson', 'Tibbitts', 'Grim', 'Look'] | ['Hardwick', 'Draetta', 'Winter', 'Strack', 'Sears', "O'Neil", 'Rao', 'Meijerink', 'Welcker', 'Clurman', 'Pieters', 'Tibbitts', 'Grim', 'Look'] | ['Hardwick', 'Draetta', 'Winter', 'Strack', 'Sears', "O'Neil", 'Rao', 'Meijerink', 'Welcker', 'Clurman', 'Pieters', 'Tibbitts', 'Grim', 'Look'] | J Exp Med | 2007 | 8/6/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1324 | GSE8425 | 7/13/2007 | ['8425'] | ['2816'] | [u'17321057'] | 2949890 | [u'20831831'] | ['Santos', u'De', 'de', 'Porteros', u'Fern\xe1ndez-Medarde', 'Fuster', 'Fern\xc3\xa1ndez-Medarde', u'Nu\xf1ez', 'N\xc3\xba\xc3\xb1ez'] | ['Monsonego', 'Rubin', 'Geifman'] | [] | BMC Bioinformatics | 2010 | 9/12/2010 | 0 | ound 0.456971* 0.505641 External side of plasma membrane 0.744269 0.508488* Not Found (A) Enrichment in genes with up-regulated expression in hippocampus treated with Fluoxetine, based on GEO profile GSE6476. All terms passed FDR > 0.25 and state the FDR values. (B) GEO profile GSE6675. All terms passed FDR < 0.25 and state the FDR values (Up-regulated in Control in comparison to FGF2 t|ture. † indicates terms that were not included in the NIGO subset. For three GEO expression profiles, the results of the analysis are described in detail (Table 1 ). In the analysis of the GSE6476, in which the effect of chronic Fluoxetine treatment on hippocampal gene expression was examined [ 10 ], eleven terms passed statistic filtering with NIGO and not with the full GO, out of which 7 w|ded in the full GO but not in NIGO passed the statistic cutoffs but this term ('Protein self-association') was functionally irrelevant and contributed very little to the analysis. For the GEO profile GSE6675, in which astroglial gene expression program elicited by fibroblast growth factor-2 was examined, four terms relevant to neural/immune systems passed statistic filtering with NIGO and not with the |ith the full GO. This is an example of how without the use of NIGO one would have to raise the cutoff to at least 0.55 in order for this term to appear in the analysis results. Functional analysis of GSE6509 with NIGO also revealed statistically significant terms that were missed when using the full GO. 
This experiment involved microarray expression profiling designed to explore the effect of RU486, a |ed both in NIGO and the full GO, received higher FDR values when NIGO was used (Table 2 ), contrary to our expectations from the impact of reducing the ontology size. For example, in the analysis of GSE8788 in which gene expression analysis was conducted using Trib1-deficient macrophages treated with LPS as compared to LPS-treated wild-type macrophages [ 12 ], 11 out of 14 terms (that passed statistic|ment in FDR values when using NIGO is partially due to the stochastic nature of the GSEA algorithm. To test this hypothesis, the same analysis was repeated three times with each of the ontologies for GSE8788. FDR values were averaged and a comparison of analysis results was performed based on these averaged FDR values. In accordance with our hypothesis, the averaging of FDR values improved the apparent|l GO and GO-slim in enrichment analysis using GSEA Unique to Subset Lowest FDR Value Experiment Cutoff used Full GO NIGO Generic GO slim Full GO NIGO Generic GO slim Neural- or immune-related studies GSE6509 P < 0.05 2 (2) 5 0 0 2 0 GSE6675 FDR < 0.25 0 6 0 0 1 0 GSE6476 FDR < 0.25 1 (1) 11 0 0 4 0 GSE3779 P < 0.05 7 (3) 13 1 (1) 0 1 5 GSE8425{{tag}}--REUSE-- P < 0.05 6 (1) 0 0 | 0 0 0 1 GSE8788 P < 0.01 6 (6) 0 0 11 3 1 GSE8788* P < 0.01 6 (6) 0 0 3 11 1 GSE9659 P < 0.01 6 (6) 0 NS A 4 28 NS A Non-neural- or immune-related studies (negative controls) GSE7407 P < 0.01 5 (5) 0 0 0 1 2 GSE2259 FDR < 0.25 3 (1) 0 1 (1) 1 2 1 GSE8191 P < 0.05 5 (2) 0 0 0 0 0 The enrichment analysis results of GSEA, providing the full GO, NIGO or GO-s| for each subset given in separate columns. *To test the effect of stochasticity, the analysis of this profile is based on 3-fold averaged FDR values from three independent GSEA analyses. A Profile GSE9659 involves rat tissue, and was not analyzed with the mouse GO-slim. 
Three non-neural- or immune-related datasets were used to test the performance of NIGO in functional analysis of microarray data in|. These datasets included the GEO expression profile, GSE7407, in which gene expression in heart tissue with cardiac specific over-expression of Sirt1 was examined [ 13 ], the GEO expression profile, GSE8191, in which the gene expression profile of mammary glands from pregnant mice was compared to that of mammary glands from lactating mice [ 14 ], and the GEO expression profile, GSE2259, in which gene |in enrichment analysis using the Fisher Exact Test Unique to Subset Lowest FDR Value Experiment Cutoff used Full GO NIGO Generic GO slim Full GO NIGO Generic GO slim Neural- or immune-related studies GSE6509 P < 0.0001 1 (1) 1 0 0 22 2 GSE6675 P < 0.05 1 (1) 0 11 (1) 0 2 0 GSE6476 P < 0.0001 0 2 2 0 15 3 GSE3779 P < 0.1 0 0 0 0 2 0 GSE8425{{tag}}--REUSE-- P < 0.1 0 0 0 0 0 0 GSE|0.001 0 2 0 0 9 1 GSE6136 P < 0.000001 0 1 0 0 37 7 GSE8788 P < 0.01 0 2 0 0 4 1 GSE9659 P < 0.001 0 3 NS A 0 33 NS A Non-neural- or immune-related studies (negative controls) GSE7407 P < 0.0000000001 33 (30) 0 0 9 0 0 GSE2259 P < 0.1 63 (35) 0 2 8 8 7 GSE8191 P < 0.001 6 (6) 7 6 (2) 0 16 3 Enrichment analysis results from Ontologizer, providing the full |ional GO subset (the full GO or generic slim) are also shown for each term ('Lowest P-Value'), with the number of terms with the lowest P-values for each subset given in separate columns. A Profile GSE9659 involves rat tissue, and was not analyzed with the mouse GO-slim. These results, together with the analysis of five additional neural/immune-related experiments not described in Table 1 (namely p| cell's activity. These results further demonstrate that analysis with NIGO can enhance interpretation of functional analysis results produced for relevant microarray datasets. In the analysis of the GSE6509 expression dataset, three relevant terms passed the statistical cutoff with NIGO but not with the full GO. 
These terms, 'viral envelope', 'viral infectious cycle' and 'viral capsid' are all terms r|et. Such terms received FDR or p-values that were very close (but larger) than the cutoff values used. This is partially explained by the stochastic nature of the GSEA algorithm. Indeed, for one set (GSE8788), we compared the raw results with averaged FDR values. Averaging dramatically decreased the number of such terms. Furthermore, we conducted a similar functional analysis using the Fisher Exact Tes|nous peptide antigen via MHC class I' and 'Positive regulation of T cell mediated cytotoxicity' without defeating the purpose of the slimming process. Yet these two terms were found to be enriched in GSE6476, and are crucial for generating a hypothesis based on the expression profile. This shows that GO slims may be complemented by small, yet fully detailed domain-specific subsets of GO. Figure 4 The U|plementary/NIGO/Supplementary.html . The ontologies were clipped using the Protégé 4.0 beta OWL editor [ 26 ]. For by-species filtering, annotation files for human, rat and mouse were downloaded (October 2008) from GOA-EBI [ 27 ]. Association files used for GSEA analysis were generated based on the GOA-EBI annotation files and in the format required by GSEA. In this format, each row repres|a analysis and functional analysis were conducted using the GenePattern [ 29 ], GSEA [ 9 ] (release 2.5) web servers, and Ontologizer [ 16 ] as follows: (1) for each study, raw data (.CEL files) were downloaded from GEO [ 30 ]; (2) expression files (.gct files) were created using the Gene Pattern Expression File Creator module; (3) where necessary (i.e. for expression files GSE6509, GSE6675, GSE6476, GSE7|was conducted using the GSEA module. GSEA was run three times for each dataset, using a different GO version for each run. For the full GO, we used the organism-specific GO subset. 
In the analysis of GSE8788, GSEA was run three times for each of the three ontologies and FDR values were averaged over the three runs. (5) Differentially-expressed genes were found using the Gene Pattern ComparativeMarkerSe|, to an input term (Figure 4 ). The connections of UMLS concepts to the tested GO term are provided by UMLS and defined within the UMLS data files. Microarray data sets All microarray data sets were downloaded from GEO at NCBI [ 30 ]. The GEO sets used in this study are described in Additional file 2 . Availability NIGO is freely available as Additional file 6 and for download from: http://bioinfo.bg|re included or excluded from NIGO. Click here for file Additional file 2 Microarray Datasets for Comparative analysis of NIGO, GO and GO-slim . This file contains a summary of the microarray datasets downloaded from GEO at NCBI and used to test the performance of NIGO. Click here for file Additional file 3 GSEA analysis results . This file contains a summary of the GSEA analysis results for each of the mi|etroviral infectivity through different mechanisms Am J Physiol Lung Cell Mol Physiol 2009 297 L538 545 10.1152/ajplung.00162.2009 19561138 The Gene Ontology Consortium http://www.geneontology.org/GO.downloads.ontology.shtml Web Ontology Language Guide http://www.w3.org/TR/owl-guide Protégé 4.0 beta http://protege.stanford.edu Barrell D Dimmer E Huntley RP Binns D O'Donovan C Apweiler R Th | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1325 | GSE8425 | 7/13/2007 | ['8425'] | ['2816'] | [u'17321057'] | 2945940 | [u'20840752'] | ['Santos', u'De', 'de', 'Porteros', u'Fern\xe1ndez-Medarde', 'Fuster', 'Fern\xc3\xa1ndez-Medarde', u'Nu\xf1ez', 'N\xc3\xba\xc3\xb1ez'] | ['Ojeda', 'de', 'Nitsch', 'Gon\xc3\xa7alves', 'Moreau'] | ['de'] | BMC Bioinformatics | 2010 | 9/14/2010 | 0 | n, heat kernel, and Arnoldi kernel approximation. For comparison purposes, a local measure based on the expression of the direct neighbors is also computed. We have benchmarked these strategies on 40 publicly available knockout experiments in mice, and performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expr| approach for scoring candidates is presented here for comparison purposes as a naïve strategy for network analysis of differential expression. We have benchmarked these four strategies on 40 publicly available data sets originated from Affymetrix chips on which mice with (simple) knockout genes were tested against controls. The raw cel files were downloaded from GEO [ 12 ]. For each data set we|egy based on a direct neighborhood analysis. These four network-based prioritization strategies for scoring candidate genes based on their differentially expressed neighborhood were benchmarked on 40 publicly available knockout experiments in mice. Performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expressi|ng candidate genes using network-based machine learning approaches even if no knowledge is available about the disease or phenotype. Methods Benchmark data The benchmark for this study consists of 40 publicly available data sets originated from Affymetrix chips on which mice with knockout genes were tested against controls. The raw cel files were downloaded from GEO [ 12 ]. Table 3 shows all data sets|ur benchmark. 
Table 3 The benchmark data. Gene Name GEO accession number Gene Name GEO accession number 1 Abca1 GSE5496 21 Mbnl1 GSE14691 2 Btk GSE2826 22 Mst1r, Ron GSE16629 3 Cav1 GSE10849 23 MyD88 GSE6688 4 Cav3 GSE10848 24 Nos3, eNos GSE1988 5 Cftr GSE5715 25 Phgdh GSE8555 6 Clcn1 GSE14691 26 Pmp22 GSE1947 7 Cnr1 GSE7694 27 PPAR α GSE6864 8 Emd GSE5304 28 Prkag3, AMPK G3 GSE4065 9 Epas1, Hi|-2 GSE16067 29 Pthlh, Pthrp GSE17654 10 Esrra GSE7196 30 Rab3a GSE6527 11 Gap43 GSE12687 31 RasGrf1 GSE8425{{tag}}--REUSE-- 12 Gnmt GSE9809 32 Rbm15 GSE12628 13 Hdac1 GSE5583 33 Runx GSE4911 14 Hdac2 GSE6770 34 Scd1 GSE2926 15 Hsf4 GSE12415 35 Slc26a4 GSE10587 16 Hspa1A, Hsp70.1 GSE11120 36 Srf GSE13333 17 Il6 GSE411 37 Tgm2 GSE10285 18 Lhx1, Lim1 GSE4230 38 Zc3h12a GSE14891 19 Lhx8 GSE11897 39 Zfp36, Tpp GSE5324 20 L | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1326 | GSE8426 | 12/28/2007 | ['8426'] | [] | [u'17988385'] | 2258177 | [u'17988385'] | ['Longo', 'Zonderman', 'Li', 'Zhang', 'Poosala', 'Ibe', 'Firman', 'Mattson', 'Wood', 'Prabhu', 'Duan', 'Becker', 'Xu', 'Zhan', 'Brenneman'] | ['Longo', 'Zonderman', 'Li', 'Zhang', 'Poosala', 'Ibe', 'Firman', 'Mattson', 'Wood', 'Prabhu', 'Duan', 'Becker', 'Xu', 'Zhan', 'Brenneman'] | ['Longo', 'Zonderman', 'Zhang', 'Wood', 'Ibe', 'Li', 'Mattson', 'Brenneman', 'Prabhu', 'Poosala', 'Becker', 'Firman', 'Duan', 'Zhan', 'Xu'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1327 | GSE8427 | 7/20/2007 | ['8427'] | [] | [u'18070919'] | 2258740 | [u'18070919'] | ['Shen', 'Renou', 'Zhao', u'Martin-Magniette', 'Dong', u'Taconnat', 'Soubigou-Taconnat', 'Steinmetz', 'Xu'] | ['Shen', 'Renou', 'Zhao', 'Dong', 'Soubigou-Taconnat', 'Steinmetz', 'Xu'] | ['Shen', 'Renou', 'Zhao', 'Dong', 'Soubigou-Taconnat', 'Steinmetz', 'Xu'] | Mol Cell Biol | 2008 | 2008 Feb | 0 | AND pmc_gds | 1 | 0 | ||||
1328 | GSE8428 | 7/11/2007 | ['8428'] | [] | [u'17699609'] | 2752274 | [u'19607825'] | ['Maves', 'Waskiewicz', 'Tyler', 'Cao', 'Paul', 'Tapscott', 'Moens'] | ['Tyler', 'Tapscott', 'Maves', 'Moens'] | ['Tyler', 'Tapscott', 'Maves', 'Moens'] | Dev Biol | 2009 | 9/15/2009 | 0 | d a False Discovery Rate of <0.05 to identify genes whose expression is significantly dependent on Pbx function. Array data are available at NCBI GEO ( www.ncbi.nlm.nih.gov/geo/ ), accession GSE8428{{tag}}--DEPOSIT-- . Fig. 2 Validation of microarray-identified Pbx-dependent heart genes. (A–R) RNA in situ expression of genes from Table 1 in (A,C,E,G,I,K,M,O,Q) wild-type control or (B,D,F,H,J,L,N,P,R) |ed Pbx-dependent genes in all tissues ( Maves et al., 2007 ). From the gene lists generated from the arrays, we identified the genes that are expressed in wild-type heart precursors, according to the publicly available RNA in situ database ( Thisse and Thisse, 2005 ), or that are known to regulate vertebrate heart development ( Table 1 ). For each of these heart genes, we performed RNA in situ hybridizati | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1329 | GSE8429 | 7/20/2007 | ['8429'] | [] | [u'18070919'] | 2258740 | [u'18070919'] | ['Shen', 'Renou', 'Zhao', u'Martin-Magniette', 'Dong', u'Taconnat', 'Soubigou-Taconnat', 'Steinmetz', 'Xu'] | ['Shen', 'Renou', 'Zhao', 'Dong', 'Soubigou-Taconnat', 'Steinmetz', 'Xu'] | ['Shen', 'Renou', 'Zhao', 'Dong', 'Soubigou-Taconnat', 'Steinmetz', 'Xu'] | Mol Cell Biol | 2008 | 2008 Feb | 0 | AND pmc_gds | 1 | 0 | ||||
1330 | GSE8431 | 12/18/2007 | ['8431'] | [] | [u'18164278'] | 2988767 | [u'18164278'] | ['Gudas', 'Su'] | ['Gudas', 'Su'] | ['Gudas', 'Su'] | Biochem Pharmacol | 2008 | 3/1/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1331 | GSE8441 | 7/11/2007 | ['8441'] | ['2868'] | [u'17490972'] | 2975422 | [u'21047384'] | ['Campbell', 'Fleet', 'Carnell', 'Thalacker-Mercer', 'Craig'] | ['Xu', 'Hu'] | [] | BMC Genomics | 2010 | 11/2/2010 | 0 | essed genes usually located in the sparse boundary region. Biases of popular algorithms for identifying differentially expressed genes are visually characterized. Experiments on 17 datasets from Gene Omnibus Database (GEO) with experimentally verified differentially expressed genes showed that DB pruning can significantly improve the prediction accuracy of popular identification algorithms such as t-test|dary region in the 2-D feature space of average gene expression (AG) versus average difference of gene expression (AD). Fig. 1 . shows the distribution of true DEGs in the 2D space for four datasets: GSE9499, GSE6342, GSE6740_1, and GSE6740_2 from GEO database [ 22 ]. Based on the fact that boundary region is characterized with scarcity of genes, a density based pruning algorithm is proposed here for p|s as collected by Kadota et. al. [ 20 ]. They collected 38 microarray datasets with experimentally determined true DEGS by real-time polymerase chain reaction (RT-PCR). Thirty six of the datasets are downloaded from GEO database [ 22 ]. Without losing generality, we experimented with 17 disease or dose response datasets of Homo sapiens out of the 36 GEO datasets (Table 1 ). The 17 datasets are reported j|how that real-world GEO datasets, especially historical ones, tend to have small sample size. Table 1 17 Datasets with 284 DEGs in total. Each dataset has 22833 genes. 
Dataset Conditions True DEG A B GSE1462 4 4 4 GSE1615_1 4 5 8 GSE1650 18 12 8 GSE2666_2 5 5 6 GSE3524 16 4 4 GSE3860 9 9 8 GSE4917 3 3 5 GSE5667_1 5 6 3 GSE6236 14 14 7 GSE6344 10 10 19 GSE6740_1 10 10 40 GSE6740_2 10 10 62 GSE7146 6 6 6|GSE8441{{tag}}--REUSE-- 11 11 9 GSE9499 15 7 77 GSE9574 15 14 5 The 17 Datasets used here cover a variety of biological or medical studies: GSE1462 (mitochondrial DNA mutations), GSE1615_1 (Valproic acid treatment), GSE1650 (chronic obstructive pulmonary disease), GSE2666_2(bone marrow Rho level effect), GSE3524 (tumor of epithelial tissue), GSE3860 (Hutchinson-Gilford progeria syndrome), GSE4917 (breast cancer), GSE5|67_1 (atopic dermatitis), GSE6236 (Adult vs. fetal reticulocyte transcriptome comparison), GSE6344 (renal cell carcinoma disease), GSE6740_1 (HIV-infection), GSE6740_2 (HIV-infection, disease state), GSE7146 (hyperinsulinaemic, does response), GSE7765 (dose response, DMSO or 100 nM Dioxin), GSE8441{{tag}}--REUSE-- (dietary intake response), GSE9574 (breast cancer), and GSE9499 (hypomorphic germline mutations). The div|ch as WAD and FC. To illustrate the bias of popular DEG identification algorithms, Fig. 2 shows true positive (TP), true negative (TN), false positive (FP), and false negative (FN) DEGs for dataset GSE9499 which has 77 true DEGs. Fig. 2 . (a) shows that fold change (FC) misses most true DEGs (FN genes), which are located in the region below the threshold average difference and with high expression le|obtaining a user-specified number of candidate DEGs. Table 2 Comparison of No. of missing true DEGs after DB pruning. ( N 0 = 4, R 0 = 0.0017) Total Gene: 22283 After DP-pruning True DEG DP missed GSE1462 2054 4 0 GSE1615_1 2449 8 3 GSE1650 1317 8 2 GSE2666_2 1618 6 2 GSE3524 814 4 0 GSE3860 2073 8 0 GSE4917 785 5 1 GSE5667_1 1316 3 0 GSE6236 2231 7 0 GSE6344 3127 19 0 GSE6740_1 1183 40 1 GSE6740_2 |DEG to have high expression levels and high expression difference. 
Table 3 Ranks of true DEGs in original gene list and pruned gene list. Genes are sorted by four DEG identification algorithms on the GSE1577 dataset. Increase of ranks of true DEGs means that DB pruning have correctly filtered out many non-DEGs. t-test/tTest’ 1404/808 7/6 1321/768 3800/1713 4741/1975 3633/1659 4145/1828 606/388 |on level remains the same. And the average difference of expression can be defined as sum of average difference among pairwise comparisons. Our DB pruning is implemented using C++ and Perl and can be downloaded from http://mleg.cse.sc.edu/degprune . Authors’ contributions JH initiated the project, proposed the DB pruning idea, helped in experimental designs, and wrote the manuscript. J. X. develo | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1332 | GSE8450 | 11/30/2007 | ['8450'] | [] | [u'18079700'] | 2206136 | [u'18079700'] | ['Genier', 'Vaur', 'Drogat', 'Bernard', 'Schmidt', 'Uhlmann', 'Dheur', 'Ekwall', 'Javerzat'] | ['Genier', 'Vaur', 'Drogat', 'Bernard', 'Schmidt', 'Uhlmann', 'Dheur', 'Ekwall', 'Javerzat'] | ['Genier', 'Vaur', 'Drogat', 'Bernard', 'Schmidt', 'Uhlmann', 'Dheur', 'Ekwall', 'Javerzat'] | EMBO J | 2008 | 1/9/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1333 | GSE8462 | 7/13/2007 | ['8462'] | [] | [u'17848203'] | 2375026 | [u'17848203'] | ['Fukushige', 'Brodigan', 'Miller', 'Von', 'Fox', 'Krause', 'McDermott', 'Watson'] | ['Fukushige', 'Brodigan', 'Miller', 'Von', 'Fox', 'Krause', 'McDermott', 'Watson'] | ['Fukushige', 'Brodigan', 'Miller', 'Von', 'Fox', 'Krause', 'McDermott', 'Watson'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1334 | GSE8473 | 7/18/2007 | ['8473'] | [] | [u'17715369'] | 2043398 | [u'17715369'] | ['Edgerton', 'Jang', 'Vylkova', 'Nayyar', 'Li'] | ['Edgerton', 'Jang', 'Vylkova', 'Nayyar', 'Li'] | ['Nayyar', 'Edgerton', 'Vylkova', 'Jang', 'Li'] | Eukaryot Cell | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
1335 | GSE8478 | 7/14/2007 | ['8478'] | ['3122', '3121', '3120'] | [u'17977147', u'17951393'] | 2223695 | [u'17993534'] | ['Moser', 'Fischer', 'Friberg', 'Pessi', 'Lindemann', 'Hauser', 'Rehrauer', 'Hennecke', 'Ahrens'] | ['Hacker', 'Aktas', 'Narberhaus', 'Sohlenkamp', 'Geiger'] | [] | J Bacteriol | 2008 | 2008 Jan | 0 | ere expressed in B. japonicum (Fig. 4). This finding was confirmed by primer extension analysis (data not shown) and by recently deposited microarray data in the Gene Expression Omnibus (GEO) database ( http://www.ncbi.nlm.nih.gov/geo/ ; accession number GSE8478{{tag}}--MENTION-- ). Consistent with our previous primer extension analysis ( 30 ), significant expression between 200 and 300 MU was detec|ere obtained, recent microarray studies had shown that expression of these genes did not change under microaerobic conditions (NCBI GEO database; http://www.ncbi.nlm.nih.gov/geo/ ; accession number GSE8478{{tag}} ). Construction and phospholipid profiling of B. japonicum PC biosynthesis mutants. To examine the phenotypic importance of each putative PC biosynthesis gene, we constructed a series of indiv | 0 | 0 | 1 | NOT pmc_gds | 0 | 0 |
1336 | GSE8480 | 11/1/2007 | ['8480'] | [] | [u'17873074'] | 2074979 | [u'17873074'] | ['Eltis', 'Sales', 'Liu', 'Wood', 'Sharp', 'LeBlanc', 'Mohn', 'Alvarez-Cohen'] | ['Eltis', 'Sales', 'Liu', 'Wood', 'Sharp', 'LeBlanc', 'Mohn', 'Alvarez-Cohen'] | ['Eltis', 'Sales', 'Liu', 'Wood', 'Sharp', 'LeBlanc', 'Mohn', 'Alvarez-Cohen'] | Appl Environ Microbiol | 2007 | 2007 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
1337 | GSE8481 | 7/17/2007 | ['8481'] | [] | [u'18545679'] | 2398777 | [u'18545679'] | ['Watanabe', 'Ishii', 'Kami', 'Umezawa', 'Naito', 'Saito', 'Matsumoto', 'Makino', 'Shiojima', 'Komuro', 'Toyoda', 'Takahashi'] | ['Watanabe', 'Ishii', 'Kami', 'Umezawa', 'Naito', 'Saito', 'Matsumoto', 'Makino', 'Shiojima', 'Komuro', 'Toyoda', 'Takahashi'] | ['Watanabe', 'Ishii', 'Kami', 'Umezawa', 'Naito', 'Saito', 'Matsumoto', 'Makino', 'Shiojima', 'Komuro', 'Toyoda', 'Takahashi'] | PLoS One | 2008 | 6/11/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1338 | GSE8482 | 9/1/2007 | ['8482'] | [] | [u'17675387'] | 2168666 | [u'17675387'] | ['Tikh', 'Paquette', 'Yarwood', 'Greenberg', 'Volper'] | ['Tikh', 'Paquette', 'Yarwood', 'Greenberg', 'Volper'] | ['Tikh', 'Paquette', 'Yarwood', 'Greenberg', 'Volper'] | J Bacteriol | 2007 | 2007 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
1339 | GSE8484 | 3/1/2007 | ['8484'] | [] | [u'16569449'] | 2912668 | [u'20689746'] | ['Asgharpour', 'Baba', 'Martino-Catt', 'Evans', 'Ehrenkaufer', 'Hamano', 'Fei', 'Crasta', 'Houpt', 'Gilchrist', 'Okada', 'Petri', 'Stroup', 'Singh', 'Nozaki', 'Mann', 'Trapaidze'] | ['Bousquet', 'Moore', 'Zhang', 'Gilchrist', 'Lannigan', 'Petri', 'Mann'] | ['Gilchrist', 'Petri', 'Mann'] | MBio | 2010 | 5/18/2010 | 0 | d amebae and amebae isolated from the cecal lumen 1 day after infection were downloaded from the National Institutes of Health Gene Expression Omnibus (accession numbers: platform, GPL9693; data set, GSE8484{{tag}}--REUSE--) ( 11 , 34 ). The data were reanalyzed using the reannotated E. histolytica genome (accession no. NZ_AAFB00000000; http://amoebadb.org/ ), the new Array Data Analysis and Management Sys | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1340 | GSE8486 | 11/19/2007 | ['8486'] | [] | [u'18243105'] | 2669738 | [u'18243105'] | ['Weng', 'Crawford', 'Furey', 'Meltzer', 'Shulha', 'Davis', 'Margulies', 'Boyle'] | ['Weng', 'Crawford', 'Furey', 'Meltzer', 'Shulha', 'Davis', 'Margulies', 'Boyle'] | ['Weng', 'Crawford', 'Furey', 'Meltzer', 'Shulha', 'Davis', 'Margulies', 'Boyle'] | Cell | 2008 | 1/25/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1341 | GSE8487 | 10/29/2007 | ['8487'] | [] | [u'17692804'] | 2730509 | [u'17692804'] | ['Kuehl', '', 'Santra', 'Gabrea', 'Wright', 'Barlogie', 'Williams', 'Hurt', 'Dave', 'Lenz', 'Hanamura', 'Demchenko', 'Shaughnessy', 'Zhao', 'Zhan', 'Davis', 'Staudt', 'Bellamy', 'Stephens', 'Xiao', 'Annunziata', 'Tan', 'Dang'] | ['Kuehl', 'Santra', 'Gabrea', 'Wright', 'Barlogie', 'Williams', 'Hurt', 'Dave', 'Lenz', 'Hanamura', 'Demchenko', 'Shaughnessy', 'Zhao', 'Zhan', 'Davis', 'Staudt', 'Bellamy', 'Stephens', 'Xiao', 'Annunziata', 'Tan', 'Dang'] | ['Kuehl', 'Santra', 'Gabrea', 'Wright', 'Barlogie', 'Demchenko', 'Williams', 'Hurt', 'Dave', 'Lenz', 'Hanamura', 'Staudt', 'Shaughnessy', 'Zhao', 'Zhan', 'Davis', 'Bellamy', 'Stephens', 'Xiao', 'Annunziata', 'Tan', 'Dang'] | Cancer Cell | 2007 | 2007 Aug | 0 | AND pmc_gds | 1 | 0 | ||||
1342 | GSE8489 | 7/19/2007 | ['8489'] | [] | [u'18978777'] | 2596672 | [u'18978777'] | ['Ma', 'Johnson', u'Anton', u'Medina', u'Nguyen', 'Wong', 'Ji', 'Jiang', 'Myers'] | ['Ma', 'Johnson', 'Wong', 'Ji', 'Jiang', 'Myers'] | ['Ma', 'Johnson', 'Wong', 'Ji', 'Jiang', 'Myers'] | Nat Biotechnol | 2008 | 2008 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
1343 | GSE8493 | 7/18/2007 | ['8493'] | [] | [u'18154652'] | 2605562 | [u'19116666'] | ['Rodrigues', 'Nunes', 'de', 'da', 'Machado', 'Shida', 'Ribeiro', 'Coletta-Filho'] | ['Civerolo', 'Nakaya', 'Van', 'Vasconcelos', 'de', 'Kitajima', 'Chen', 'Souza', 'Paula'] | ['de'] | PLoS One | 2008 | 2008 | 0 | tomatic and asymptomatic plants isolated in South American (using as reference the 9a5c strain genome) (for further information, refer to da Silva et al [27] and GEO database, entry GSE8493{{tag}}--REUSE--). Briefly, all six strains were from a culture collection and collected from sweet orange trees: four (56a, 9.12c, 187b, and 36f) were obtained from CVC-affected trees, and represent the most preva|50% absents) Equal (50% absents) Equal (30% absents) Equal * Presence is given in terms of number of copies of each ORF within phages (data extracted from GEO database GSE8493{{tag}}--REUSE--). Region xfp4 is present in low copy number in both symptomatic and non-symptomatic strains (56a, 9.12c and CV21) and in equal copy numbers in the other strains. The principal feature of this regio| the expression profile under stress conditions, particularly under heat shock conditions (for further details, refer to [28] – [30] and GEO database entries GSE6619, GSE4161, GSE3044 and GSE4960). These microarray series were re-analyzed with the focus on the prophage-like regions ( Table 3 ). 10.1371/journal.pone.0004059.t003 Table 3 Phage-related genes diffe|; ↑ XF1869 phage-related protein ↑ a Expression under different medium growth conditions (3G10R against PW) (data extracted from da Silva et al [27] and GEO database GSE6619). b Expression under heat shock response, at 40°C, when compared to normal conditions of temperature (25°C) (data extracted from Koide et al [30] and GEO database |1). c Expression under heat shock response, at 40°C, when compared to normal conditions of temperature (29°C) (data extracted from Vencio et al [57] and GEO database GSE3044). 
d Expression of mutant strain (rpoE) of the strain J1a12 (against 9a5c array), under heat shock response, at 40°C, when compared to normal conditions of temperature (25°C) (data e|microarray datasets were performed in order to study the gene expression pattern of prophage-like elements in CVC strain in different heat shock conditions. Microarray data were extracted from series GSE3044, GSE4161, GSE4960, GSE6619 and GSE8493{{tag}}--REUSE-- [27] – [30] on NCBI's Gene Expression Omnibus (GEO) database ( http://www.ncbi.nlm.nih.gov/geo ) site [55&#|ially gene contents were identified by using the Significance Analysis of Microarray (SAM) method [56] with a false discovery rate (FDR) <2–5%. The data from GSE4161 was obtained direct from the original paper [30] supplementary materials ( http://blasto.iq.usp.br/˜tkoide/Xylella/Heat_shock ). Supporting Information Figure S1 Multiple | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1344 | GSE8501 | 7/18/2007 | ['8501'] | [] | [u'17612493'] | 3019229 | [u'21194446'] | ['Lim', 'Garrett-Engele', 'Farh', 'Bartel', 'Johnston', 'Grimson'] | ['S\xc3\xa6trom', 'Saito'] | [] | BMC Bioinformatics | 2010 | 12/31/2010 | 0 | We downloaded the Jackson [25], Lim [6], Grimson [5], and Linsley [21] datasets from the Gene Expression Omnibus (GEO) database [GEO:GSE5814, GEO:GSE2075, GEO:GSE8501{{key}}--REUSE--, GEO:GSE6838] [38] and the Birmingham [24] dataset from the ArrayExpress database [ArrayExpress:E-MEXP-668] [39]. | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1345 | GSE8501 | 7/18/2007 | ['8501'] | [] | [u'17612493'] | 2945792 | [u'20799968'] | ['Lim', 'Garrett-Engele', 'Farh', 'Bartel', 'Johnston', 'Grimson'] | ['Agius', 'Sander', 'Koppal', 'Leslie', 'Betel'] | [] | Genome Biol | 2010 | 2010 | 0 | ated targets are missed. We note that this effect is not restricted to our particular choice of conservation measure or even to the mirSVR scoring system. We repeated the analysis with context scores downloaded from TargetScan and using their associated conservation scores ( P CT ) [ 26 ] and similarly found no improvement in detection rates of the most downregulated targets with increased P CT thresho|ing of microRNA regulation in a physiological context. Materials and methods Training and test data sets Training data The mRNA expression training data was taken from the Grimson et al. [ 8 ] [GEO:GSE8501{{tag}}--REUSE--] data set, containing expression arrays from HeLa cells transfected by miR-122a, miR-128a, miR-132, miR-133a, miR-142, miR-148b, miR-181a, miR-7, miR-9. Although mRNA expression was measured at 12 |on change after transfection was positive. Test data of microRNA transfection with mRNA expression measurements The mRNA expression test data set was taken from the Linsley et al. study [ 21 ] [GEO:GSE6838], which comprised expression data from let-7c, miR-103, miR-106b, miR-141, miR-15a, miR-16, miR-17-5p, miR-192, miR-20, miR-200a, and miR-215 transfection experiments (all measured after 24 h), and|-16, hsa-miR-30e-5p, hsa-miR-19b, hsa-miR-32, hsa-miR-20a and hsa-miR-21) and searched for their target sites in genes that are enriched in the AGO1-4 IPs. Microarray data from the IP experiments was downloaded from [ 36 ] and normalized using the GCRMA R package; log enrichment values were computed using the limma package. CLIP data Data was provided by private communication from the authors. 
Non-can|ing a weighted dynamic programming approach where matches in the seed regions have higher position-specific weights, resulting in alignments that strongly favor 5' base-pairing. 3' UTR sequences were downloaded from UCSC genome browser, with the longest UTR chosen from alternative isoforms. "Canonical target" sites are defined as sites that contain minimally a 6-mer perfect match at positions 2 to 7 of | for Selbach et al. test set, and IP enrichment for Landthaler et al. test set) were Z -score transformed. Context score and PITA scores Context scores values were computed using the source code downloaded from [ 39 ] that implements the regression model described in Grimson et al. 2007. Briefly, context score is composed of three regression values, which are specific to each seed class, that model|r, T1A 7-mer, m8 7-mer and 8-mer; the context score is computed as the sum of the three regression values specific to the seed class. The computed context scores are highly correlated with the scores downloaded from TargetScan release 5.0 (0.96 average Pearson correlation, see Additional file 1 , Table S1). PITA scores were computed with code downloaded from [ 40 ] using default parameters. Target sites | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1346 | GSE8501 | 7/18/2007 | ['8501'] | [] | [u'17612493'] | 2909566 | [u'20508147'] | ['Lim', 'Garrett-Engele', 'Farh', 'Bartel', 'Johnston', 'Grimson'] | ['Krogh', 'Marks', 'Jacobsen', 'Wen'] | [] | Genome Res | 2010 | 2010 Aug | 0 | utic small RNAs. Methods Experimental and sequence data Microarray expression data sets were obtained from the NCBI Gene Expression Omnibus: 11 different miRNA transfections in HeLa cells measured 24 h after transfection (accession nos. GSE2075 and GSE8501{{tag}}--REUSE-- ) ( Lim et al. 2005 ; Grimson et al. 2007 ), miR-124 transfection time-series | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1347 | GSE8501 | 7/18/2007 | ['8501'] | [] | [u'17612493'] | 2775596 | [u'19767416'] | ['Lim', 'Garrett-Engele', 'Farh', 'Bartel', 'Johnston', 'Grimson'] | ['Zavolan', 'Hausser', 'Landthaler', 'Jaskiewicz', 'Gaidatzis'] | [] | Genome Res | 2009 | 2009 Nov | 0 | AND pmc_gds | 0 | 1 | ||||
1348 | GSE8509 | 7/24/2007 | ['8509'] | [] | [u'18184805'] | 2206600 | [u'18184805'] | ['Jones', 'Sharopova', 'Zhang', 'VandenBosch', 'Walker', 'Lohar'] | ['Jones', 'Sharopova', 'Zhang', 'VandenBosch', 'Walker', 'Lohar'] | ['Jones', 'Sharopova', 'Zhang', 'VandenBosch', 'Walker', 'Lohar'] | Proc Natl Acad Sci U S A | 2008 | 1/15/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1349 | GSE8510 | 8/1/2007 | ['8510'] | [] | [u'18000064'] | 2141839 | [u'18000064'] | ['Guillemin', 'Mays', 'de', 'Mills', 'Parks', 'Hobbs', 'Gilkes', 'Solomon', 'Wong', 'Guidez', 'Jovanovic', 'Grimwade', 'Pandolfi'] | ['Guillemin', 'Mays', 'de', 'Mills', 'Parks', 'Hobbs', 'Gilkes', 'Solomon', 'Wong', 'Guidez', 'Jovanovic', 'Grimwade', 'Pandolfi'] | ['Guillemin', 'Mays', 'de', 'Mills', 'Parks', 'Gilkes', 'Solomon', 'Hobbs', 'Guidez', 'Jovanovic', 'Grimwade', 'Wong', 'Pandolfi'] | Proc Natl Acad Sci U S A | 2007 | 11/20/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1350 | GSE8512 | 7/19/2007 | ['8512'] | [] | [u'18197246'] | 2732277 | [u'18753155'] | ['Smith', 'Chakrabarti', 'Peng', 'Bhasin', 'Chen', 'Kulkarni'] | ['Chen', 'Smith', 'Wang', 'Zhang'] | ['Chen', 'Smith'] | Bioinformatics | 2008 | 11/1/2008 | 0 | in local statistics for SPCA, GSEA and Fisher's exact methods, see details below. The expression data with 22 283 transcripts were obtained from Affymetrix U133a GeneChip platform (GEO Accession No. GSE2034 ). We first mapped these transcripts to EntrezGene ID and then associated them with GO biological process categories. In order to reduce the redundancy in GO, we further removed all child categorie|in objective was to identify gene sets associated with variations in lesion scores. Affymetrix 430v2 expression data from 93 female and 114 male mice were used for this experiment (GEO Accession No. GSE8512{{tag}}--REUSE-- ). Each sample had 22 174 expressed transcripts. After mapping these transcripts to EntrezGene ID and associating them with GO biological process categories, there were 9744 genes, mapped to 255 GO | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1351 | GSE8512 | 7/19/2007 | ['8512'] | [] | [u'18197246'] | 2174529 | [u'18197246'] | ['Smith', 'Chakrabarti', 'Peng', 'Bhasin', 'Chen', 'Kulkarni'] | ['Smith', 'Chakrabarti', 'Peng', 'Bhasin', 'Chen', 'Kulkarni'] | ['Smith', 'Chakrabarti', 'Peng', 'Bhasin', 'Chen', 'Kulkarni'] | PLoS One | 2008 | 1/16/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1352 | GSE8514 | 7/19/2007 | ['8514'] | ['2860'] | [u'17911395'] | 2620272 | [u'19014681'] | ['Mariniello', 'Shibata', 'Ye', 'Rainey', 'Mantero'] | ['Tozeren', 'Gormley'] | [] | BMC Bioinformatics | 2008 | 11/17/2008 | 1 | or therapeutic targets in tissue-specific diseases. Table 1 Microarray datasets used in this study Tissue Phenotype Data Tissue No. of Samples Gene Expression Omnibus/Array Express Accn. # Adipose 10 GSE3526 Adrenal 20 GSE3526, GSE8514{{tag}}--REUSE--, GSE2316 Brain 89 GSE3526, GSE7621, GSE7307, GSE2361, E_AFMX-11, E-TABM-20, Colon 10 E-TABM-176, GSE8671, GSE9254, GSE9452 Epidermal 25 GSE1133, GSE2361, GSE3419, GSE352|38 E_AFMX-11, E-MIMR-27, GSE1133, GSE2240, GSE2361, GSE3526, GSE3585, GSE7307 Kidney 10 E_AFMX-11, GSE2004, GSE2361, GSE3526, GSE7392 Liver 10 E_AFMX-11, GSE2004, GSE3526, GSE6764 Lung 26 E-MEXP-231, GSE10072, GSE1133, GSE2361, GSE3526 Mammary 15 E-TABM-66, GSE2361, GSE3526, GSE7307, GSE7904 Muscle 64 GSE10760, GSE2328, GSE3526, GSE5110, GSE6798, GSE7307, GSE9103, Ovary 10 GSE2361, GSE3526, GSE6008, GS|ancreas 6 GSE1133, GSE2361, GSE7307 Peripheral blood 12 GSE7462, GSE8608, GSE8668, GSE8762, GSE9692 Small intestine 7 GSE2361, GSE7307 Spleen 12 GSE2004, GSE2361, GSE3526, GSE7307 Stomach 10 GSE2361, GSE3526, GSE7307 Testis 38 E_AFMX-11, GSE1133, GSE2361, GSE3218, GSE3526, GSE7307, GSE7808 Thymus 5 GSE1133, GSE2361, GSE7307 Infectious Disease Disease No. of Samples Gene Expression Omnibus/Array Express | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1353 | GSE8514 | 7/19/2007 | ['8514'] | ['2860'] | [u'17911395'] | 2785812 | [u'19917117'] | ['Mariniello', 'Shibata', 'Ye', 'Rainey', 'Mantero'] | ['Kroll', 'Barkema', 'Carlon'] | [] | Algorithms Mol Biol | 2009 | 11/16/2009 | 1 | d to the local physical neighbors and to the nearest-neighbor free energy. Table 1 Overview over organisms and number of CEL-files analyzed Organism GEO # Chiptype (dimension) # files min A. Thaliana GSE4847 ATH1-121501 (712 × 712) 18 0.0259 GSE7642 ATH1-121501 (712 × 712) 12 0.0544 GSE9311 ATH1-121501 (712 × 712) 8 0.0546 C. Elegans GSE6547 Celegans (712 × 712) 25 0.036|00d7; 712) 7 0.0396 D. Melanogaster GSE3990 Drosophila_2 (732 × 732) 6 0.0620 GSE6558 DrosGenome1 (640 × 640) 24 0.0605 D. Rerio GSE4859 Zebrafish (712 × 712) 8 0.0357 E. Coli GSE11779 E_coli_2 (478 × 478) 3 0.0869 GSE2928 Ecoli (544 × 544) 12 0.0172 GSE6195 E_coli_2 (478 × 478) 4 0.0664 H. Sapiens GSE10433 HG-U133A_2 (732 × 732) 12 0.0757 GSE5054|-U133A (712 × 712) 14 0.0296 GSE8514{{tag}}--REUSE-- HG-U133_Plus_2 (1164 × 1164) 15 0.0738 GSE11897 MOE430A (712 × 712) 11 0.0640 MOE430B (712 × 712) Mouse430_2 (1002 × 1002) GSE6210 Mouse430_2 12 0.0594 GSE6297 Mouse430_2 24 0.0325 O. Sativa GSE15071 Rice (1164 × 1164) 20 0.1157 R. Norvegicus GSE4494 RG_U34A (534 × 534) 59 0.0488 GSE7493 Rat230_2 (834 ×|2 (834 × 834) 4 0.0640 S. Aureus GSE7944 S_aureus (602 × 602) 6 0.0746 S. Cerevisiae GSE6073 YG_S98 (534 × 534) 12 0.0283 GSE8379 YG_S98 (534 × 534) 8 0.0180 X. Laevis GSE3368 Xenopus_laevis (712 × 712) 20 0.0514 Table 2 Optimized parameter values as obtained from the minimization of Eq. (3). A. Thaliana C. Elegans D. Melanogaster D. Rerio GEO no GSE4847 GSE7642|d in this work. Figure 5 Computed background vs. low expressed genes data . Experimental results (solid black line) and theoretical prediction (dashed red line) of probeset a) 256610_at (A. Thaliana, GSE4847, GSM109107.CEL) and b) 175270_at (C. Elegans, GSE8159, GSM201995.CEL). 
Discussion We have presented a background subtraction scheme for Affymetrix GeneExpression arrays which is both, accurate and | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1354 | GSE8519 | 8/9/2007 | ['8519'] | [] | [u'17761794'] | 1964862 | [u'17761794'] | ['Wanner', 'Walden', 'Nichols', 'Brockmann', 'Robertson', 'Luetje'] | ['Wanner', 'Walden', 'Nichols', 'Brockmann', 'Robertson', 'Luetje'] | ['Wanner', 'Walden', 'Nichols', 'Brockmann', 'Robertson', 'Luetje'] | Proc Natl Acad Sci U S A | 2007 | 9/4/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1355 | GSE8524 | 11/30/2007 | ['8524'] | ['3078'] | [u'18042831'] | 2801700 | [u'20003344'] | ['Dijk', 'Mohren', 'van', u'Willems', u'M\xfcller', 'de', 'Smit', 'Mariman', 'M\xc3\xbcller', 'Boekschoten'] | ['Van', 'Marchi', 'Romualdi', 'M\xc3\xbcller', 'Radonjic', 'Calura', 'Cavalieri'] | ['M\xc3\xbcller'] | BMC Genomics | 2009 | 12/11/2009 | 0 | tails). Table 1 Meta-analysis data collection. PPARα signaling n° Reference Dataset Accession Number GEO/AE Org Tissue Technology PPARα signaling activated by WY14643 1 [ 55 ] GSE8302/E-GEOD-8302 Hs Liver Affymetrix 2 [ 55 ] GSE8302/E-GEOD-8302 Mm Liver Affymetrix 3 [ 55 ] GSE8302/E-GEOD-8302 Rn Liver Affymetrix PPARα signaling repressed using PPARα knokout mice |] GSE8291/E-GEOD-8291 Mm Liver Affymetrix 6 [ 55 ] GSE8292/E-GEOD-8292 Mm Liver Affymetrix 7 [ 55 ] GSE8295/E-GEOD-8295 Mm Liver Affymetrix PPARα signaling activated by High fat diet 8 [ 56 ] GSE8753/E-GEOD-8753 Mm Liver Affymetrix 9 [ 57 ] GSE6903/E-GEOD-6853 Mm Liver Affymetrix 10 [ 58 ] GSE8524{{tag}}--REUSE--/~ Mm Liver Affymetrix 11 [ 59 ] GSE1560/E-GEOD-1560 Mm Aorta Oligo Array 12 [ 60 ] GSE8700/E-GEOD-|Table 2 Validation data sets. 
n° PPARα signalling Reference Dataset Accession Number GEO/AE Org Tissue Technology 1 PPARα signaling activated by WY14643 (PPARα WY14643-GSE8396) [ 64 ] GSE8396/E-GEOD-8396 Mm Liver Affymetrix 2 PPARα signaling repressed using PPARα knokout mice (PPARα KO1-GSE8396) [ 64 ] GSE8396/E-GEOD-8396 Mm Liver Affymetrix 3 PPA|α signaling activated by High fat diet (HFD-E-MEXP-1755) [ 65 ] ~/E-MEXP-1755 Mm Liver Affymetrix 5 Oleate response repressed using knokout yeast of a transcription promoter (del_ADR1) [ 50 ] GSE5862/~ Sc ~ Oligo Array 6 Oleate response repressed using knokout yeast of a transcription promoter (del_PIP2) [ 50 ] GSE5862/~ Sc ~ Oligo Array 7 Oleate response repressed using knokout yeast of a tran|y 8 Oleate response activated using knokout yeast of a transcription repressor (del_OAF3) [ 50 ] GSE5862/~ Sc ~ Oligo Array 9 Oleate response activated by High fat diet (oleate_vs_low_glucose) [ 50 ] GSE5862/~ Sc ~ Oligo Array Statistical analysis of microarray data Gene expression of Affymetrix datasets were quantified and separately normalized using rma technique [ 17 ] and. EntrezGene Custom CDF f|ers of biological replicates as required for powerful inference. Fold change cut-off, filtered by variance coefficient, was used to select DEGs in those datasets with less than 3 replicates per gene (GSE8302, GSE9291 and GSE9290). Pathways analysis on DEGs Enrichment analysis on metabolic pathways was calculated for each dataset using Fisher exact test based on hypergeometric distribution with a p-val | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1356 | GSE8527 | 10/1/2007 | ['8527'] | ['3041'] | [u'17709418'] | 2168309 | [u'17709418'] | ['Egmont-Petersen', 'Bootsma', 'Hermans'] | ['Egmont-Petersen', 'Bootsma', 'Hermans'] | ['Egmont-Petersen', 'Bootsma', 'Hermans'] | Infect Immun | 2007 | 2007 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
1357 | GSE8528 | 11/21/2007 | ['8528'] | [] | [u'18094359'] | 2748096 | [u'19728865'] | ['Paredes', 'De', 'Kerr', 'M\xc3\xa9ndez', 'Merchant-Larios', 'Ojeda'] | ['Makiguchi', 'Feng', 'Okuno', 'Tamon', 'Tsujimoto', 'Araki', 'Kunimoto', 'Niijima'] | [] | BMC Genomics | 2009 | 9/3/2009 | 1 | from stomach subregions (Additional file 4 -CSV-Gene expression profile of mast cells pooled from mouse stomach subregions) [ 20 ]. In the score-ordered results of query-1 (P-value < 0.01), GSE1827 (titled "Waldman Bladder tumors") was ranked in fourth. Moreover, the top 10 entries showed appropriate annotations related to tumors, inflammatory and immune responses (Table 2 ). For the query-2|ts, and seven entries among the top 10 were observed using rat liver samples (Table 3 ). The biological relationships among the top 10 results of query-3 (P-value < 0.02) were not clear, but GSE6192 (titled "Gene expression changes during murine mucosal mast cell in vitro differentiation") was found out in the twelfth rank (Table 4 ). These findings indicate the general applicability of GEM-T| search using the gene expression profile of human bladder cancer - top 10 entries sorted by similarity scores with lowest P-value < 0.01. 
Series Platform Description Similarity score P-value GSE6112 GPL4475 Tubercolosis and healthy infected patients PBMC_TB_vs_Pool_LTBI 1 0.0054 GSE3901 GPL3279 Response of quiescent human fibroblasts to different growth factors and serum 0.96 0.0004 GSE1726 GP|2567 Human breast tumor 0.854 0.0063 GSE60 GPL174 Diffuse large B-cell lymphoma 0.851 0.0029 GSE838 GPL564 Individual-specific variation of gene expression in peripheral blood leukocytes 0.845 0.0057 GSE3176 GPL1528 p53 In Inflamatory Stress Response 0.815 0.0001 GSE344 GPL273 Spotted long oligonucleotide arrays 0.813 0.0097 GSE7965 GPL3991 Blood and Adipose tissue samples 0.805 1.50E-03 The results w| the gene expression profile of rat chemical hepatocarcinogenesis - top 10 entries sorted by similarity scores with lowest P-value < 0.01. Series Platform Description Similarity score P-value GSE5337 GPL890 Gene Expression Profiling In Rat Smooth Muscle Cells Modulated by Rapamycin and Paclitaxel. 1 1.00E-04 GSE5860 GPL890 Gene expression analysis of rat livers after exposure to acetaminophen 0|en (APAP) Rat Liver Test Gene Expression Data Set 0.895 1.00E-04 GSE5381 GPL890 Gene expression analysis of liver and kidney following methapyrilene treatment in male Sprague-Dawley rats 0.685 0.0015 GSE791 GPL542 GH inj old liver (1-7) 0.657 5.00E-04 GSE4270 GPL890 Aging Induced Alterations in Hepatic Gene Expression of the Male Fisher Rat 0.637 0.0045 GSE3608 GPL3076 Renal medullary genes in salt-sen|ssion profile of mast cells pooled from mouse stomach subregions - top 20 entries sorted by similarity scores with lowest P-value < 0.02. Series Platform Description Similarity score P-value GSE3088 GPL2510 Expression profiling of Muscle tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null backgrounds 1 1.00E-04 GSE2814 GPL2510 Expression profiling of liver tissue from (C57BL/6J �|time course: regulation of uterine genes by estradiol in ovariectomized mice 0.874 5.00E-04 GSE8104 GPL5137 Primary macrophage response to L. 
monocytogenes and bacteria-derived ligands 0.855 1.70E-03 GSE8100 GPL5137 WT and myd88-/- macrophage response to WT and hly- L. monocytogenes 0.847 6.00E-04 GSE2220 GPL1832 Genetic variation of gene expression is tissue specific in inbred mice 0.795 1.00E-02 GSE4|stren Behaves as a Weak Estrogen Rather than a Non-genomic Selective Activator in the Mouse Uterus 0.792 0.0034 GSE7029 GPL2510 Zfp90 Transgenic Signature in Mouse White Adipose Tissue 0.788 2.50E-03 GSE7615 GPL2884 Cancer Process Study 0.752 0.0107 GSE7600 GPL2884 Atm-/-, mTerc-/-, p53-/- triple knock-out lymphoma vs normal mouse DNA (GPL2884) 0.752 1.13E-02 GSE3086 GPL2510 Expression profiling of Adi|ose tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null backgrounds 0.726 0.009 GSE6192 GPL891 Gene expression changes during murine mucosal mast cell in vitro differentiation 0.709 1.76E-02 GSE4248 GPL891 Identification of genes regulated by RORg in mouse thymus 0.705 1.99E-02 GSE3087 GPL2510 Expression profiling of brain tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null background|nput Reduction 0.605 8.70E-03 GSE1013 GPL967 Gene Expression Profile of NHE1 Null Mutation 0.552 0.0134 GSE8625 GPL5530 Comparison of undifferentiated ES cell lines HM1, IMT11, SHBL6.3 0.488 1.82E-02 GSE8528{{tag}}--REUSE-- GPL5369 Expression analysis of gene differentially expressed in the developping ovary 0.455 0.0011 GSE3289 GPL2828 Chronic hypoxia alters the level, maturation and control of gene expression in mou | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1358 | GSE8538 | 7/25/2007 | ['8538'] | [] | [u'17725831'] | 2020488 | [u'17725831'] | [u'No\xe9', 'Esteve', 'No\xc3\xa9', u'Montserrat', 'Romero', 'Remesar', 'Salas', 'Ciudad', u'Xavier'] | ['Esteve', 'No\xc3\xa9', 'Romero', 'Remesar', 'Salas', 'Ciudad'] | ['Esteve', 'No\xc3\xa9', 'Romero', 'Remesar', 'Salas', 'Ciudad'] | BMC Genomics | 2007 | 8/28/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1359 | GSE8544 | 10/11/2007 | ['8544'] | [] | [u'17921307'] | 2168947 | [u'17921307'] | ['Ahn', 'Wen', 'Burne'] | ['Ahn', 'Wen', 'Burne'] | ['Ahn', 'Wen', 'Burne'] | J Bacteriol | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1360 | GSE8547 | 12/31/2007 | ['8547'] | [] | [u'17999357'] | 2276350 | [u'17999357'] | ['Konopka', 'Coppola', 'Oldham', 'Ren', 'Geschwind', 'Vernes', 'Fisher', 'Ou', 'Bomar', 'Spiteri'] | ['Konopka', 'Coppola', 'Oldham', 'Ren', 'Geschwind', 'Vernes', 'Fisher', 'Ou', 'Bomar', 'Spiteri'] | ['Konopka', 'Coppola', 'Oldham', 'Ren', 'Geschwind', 'Vernes', 'Fisher', 'Ou', 'Bomar', 'Spiteri'] | Am J Hum Genet | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1361 | GSE8555 | 7/25/2007 | ['8555'] | ['2874'] | [u'18228065'] | 2945940 | [u'20840752'] | ['Azuma', 'Sayano', 'Kuhara', 'Furuya', 'Hirabayashi', 'Kawakami', 'Yang', 'Yoshida', 'Tanaka'] | ['Ojeda', 'de', 'Nitsch', 'Gon\xc3\xa7alves', 'Moreau'] | [] | BMC Bioinformatics | 2010 | 9/14/2010 | 0 | n, heat kernel, and Arnoldi kernel approximation. For comparison purposes, a local measure based on the expression of the direct neighbors is also computed. We have benchmarked these strategies on 40 publicly available knockout experiments in mice, and performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expr| approach for scoring candidates is presented here for comparison purposes as a naïve strategy for network analysis of differential expression. We have benchmarked these four strategies on 40 publicly available data sets originated from Affymetrix chips on which mice with (simple) knockout genes were tested against controls. The raw cel files were downloaded from GEO [ 12 ]. For each data set we|egy based on a direct neighborhood analysis. These four network-based prioritization strategies for scoring candidate genes based on their differentially expressed neighborhood were benchmarked on 40 publicly available knockout experiments in mice. Performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expressi|ng candidate genes using network-based machine learning approaches even if no knowledge is available about the disease or phenotype. Methods Benchmark data The benchmark for this study consists of 40 publicly available data sets originated from Affymetrix chips on which mice with knockout genes were tested against controls. The raw cel files were downloaded from GEO [ 12 ]. Table 3 shows all data sets|ur benchmark. Table 3 The benchmark data. 
Gene Name GEO accession number Gene Name GEO accession number 1 Abca1 GSE5496 21 Mbnl1 GSE14691 2 Btk GSE2826 22 Mst1r, Ron GSE16629 3 Cav1 GSE10849 23 MyD88 GSE6688 4 Cav3 GSE10848 24 Nos3, eNos GSE1988 5 Cftr GSE5715 25 Phgdh GSE8555{{tag}}--REUSE-- 6 Clcn1 GSE14691 26 Pmp22 GSE1947 7 Cnr1 GSE7694 27 PPAR α GSE6864 8 Emd GSE5304 28 Prkag3, AMPK G3 GSE4065 9 Epas1, Hi|-2 GSE16067 29 Pthlh, Pthrp GSE17654 10 Esrra GSE7196 30 Rab3a GSE6527 11 Gap43 GSE12687 31 RasGrf1 GSE8425 12 Gnmt GSE9809 32 Rbm15 GSE12628 13 Hdac1 GSE5583 33 Runx GSE4911 14 Hdac2 GSE6770 34 Scd1 GSE2926 15 Hsf4 GSE12415 35 Slc26a4 GSE10587 16 Hspa1A, Hsp70.1 GSE11120 36 Srf GSE13333 17 Il6 GSE411 37 Tgm2 GSE10285 18 Lhx1, Lim1 GSE4230 38 Zc3h12a GSE14891 19 Lhx8 GSE11897 39 Zfp36, Tpp GSE5324 20 L | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1362 | GSE8557 | 9/20/2007 | ['8557'] | [] | [u'17936709'] | 2586088 | [u'19079572'] | ['Alekseyenko', 'Kuroda', 'Park', 'Workman', 'Larschan', 'Li', 'Peng', 'Yang', 'Gortchakov'] | ['Straub', 'Grimaud', 'Mitterweger', 'Becker', 'Gilfillan'] | [] | PLoS Genet | 2008 | 2008 Dec | 1 | AND pmc_gds | 0 | 1 | ||||
1363 | GSE8563 | 10/23/2007 | ['8563'] | [] | [u'17947641'] | 2572942 | [u'18838677'] | ['Mathis', 'Gray', 'Venanzi', 'Benoist'] | ['Mathis', 'Benoist', 'Melamed', 'Venanzi'] | ['Mathis', 'Benoist', 'Venanzi'] | Proc Natl Acad Sci U S A | 2008 | 10/14/2008 | 1 | AND pmc_gds | 1 | 0 | ||||
1364 | GSE8567 | 12/20/2007 | ['8567'] | [] | [u'18074030'] | 2111048 | [u'18074030'] | ['Iwamoto', 'Bundo', 'Saito', 'Nakano', 'Kato', 'Ukai', 'Ueda', 'Hashimoto'] | ['Iwamoto', 'Bundo', 'Saito', 'Nakano', 'Kato', 'Ukai', 'Ueda', 'Hashimoto'] | ['Iwamoto', 'Bundo', 'Saito', 'Nakano', 'Kato', 'Ukai', 'Ueda', 'Hashimoto'] | PLoS One | 2007 | 12/12/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1365 | GSE8573 | 9/13/2007 | ['8573'] | [] | [u'17901201'] | 2592540 | [u'19079590'] | ['Schurr', 'Wilson', 'Nickerson', 'Baker-Coleman', 'Ruggles', 'Radabaugh', 'Hing', 'Cheng', 'Shah', 'McClelland', 'Catella', 'Goulart', 'Kilcoyne', 'McCracken', 'Hammond', 'Devich', 'Parra', 'Joshi', 'Ott', 'Stafford', 'Norwood', 'Stefanyshyn-Piper', 'Hunt', 'Pierson', 'Nelson', 'Quick', 'Richter', 'Morici', 'Fernandez', 'Vogel', 'Tsaprailis', 'Nelman-Gonzalez', 'H\xc3\xb6ner', 'Porwollik', 'Buchanan', 'Ramamurthy', 'Dumars', 'Allen', 'Rupert', 'Stodieck', 'Bober'] | ['Schurr', 'Wilson', 'Nickerson', 'Sarker', 'Smith', 'Ruggles', 'Quick', 'Radabaugh', 'Hing', 'Cheng', 'Gorie', 'Shah', 'McClelland', 'Porter', 'Catella', 'Benjamin', 'Goulart', 'CdeBaca', 'Devich', 'Parra', 'Porwollik', 'Norwood', 'Stefanyshyn-Piper', 'Hunt', 'Barrila', 'Crabb\xc3\xa9', 'Davis', 'Leys', 'McCracken', 'Richter', 'Morici', 'Tsaprailis', 'Nelman-Gonzalez', 'H\xc3\xb6ner', 'Ott', 'Buchanan', 'Pierson', 'Dumars', 'Mergeay', 'Rupert', 'Bober', 'Narayan'] | ['Schurr', 'Wilson', 'Nickerson', 'Quick', 'Radabaugh', 'Hing', 'Cheng', 'Shah', 'McClelland', 'Catella', 'Goulart', 'McCracken', 'Devich', 'Parra', 'Ott', 'Norwood', 'Stefanyshyn-Piper', 'Hunt', 'Pierson', 'Ruggles', 'Richter', 'Morici', 'Tsaprailis', 'Nelman-Gonzalez', 'H\xc3\xb6ner', 'Porwollik', 'Buchanan', 'Dumars', 'Rupert', 'Bober'] | PLoS One | 2008 | 2008 | 0 | [16] , [17] . The microarray data reported in this paper have been deposited in the Gene Expression Omnibus (GEO) database, www.ncbi.nlm.nih.gov/geo (accession no. GSE8573{{tag}}--DEPOSIT--). Multidimensional protein identification (MudPIT) analysis via tandem mass spectrometry coupled to dual nano-liquid chromatography (LC-LC-MS/MS) Acetone-protein precipitates from whole cell lysate | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1366 | GSE8573 | 9/13/2007 | ['8573'] | [] | [u'17901201'] | 2042201 | [u'17901201'] | ['Schurr', 'Wilson', 'Nickerson', 'Baker-Coleman', 'Ruggles', 'Radabaugh', 'Hing', 'Cheng', 'Shah', 'McClelland', 'Catella', 'Goulart', 'Kilcoyne', 'McCracken', 'Hammond', 'Devich', 'Parra', 'Joshi', 'Ott', 'Stafford', 'Norwood', 'Stefanyshyn-Piper', 'Hunt', 'Pierson', 'Nelson', 'Quick', 'Richter', 'Morici', 'Fernandez', 'Vogel', 'Tsaprailis', 'Nelman-Gonzalez', 'H\xc3\xb6ner', 'Porwollik', 'Buchanan', 'Ramamurthy', 'Dumars', 'Allen', 'Rupert', 'Stodieck', 'Bober'] | ['Schurr', 'Wilson', 'Nickerson', 'Baker-Coleman', 'Ruggles', 'Radabaugh', 'Hing', 'Cheng', 'Shah', 'McClelland', 'Catella', 'Goulart', 'Kilcoyne', 'McCracken', 'Hammond', 'Devich', 'Parra', 'Joshi', 'Ott', 'Stafford', 'Norwood', 'Stefanyshyn-Piper', 'Hunt', 'Pierson', 'Nelson', 'Quick', 'Richter', 'Morici', 'Fernandez', 'Vogel', 'Tsaprailis', 'Nelman-Gonzalez', 'H\xc3\xb6ner', 'Porwollik', 'Buchanan', 'Ramamurthy', 'Dumars', 'Allen', 'Rupert', 'Stodieck', 'Bober'] | ['Schurr', 'Wilson', 'Nickerson', 'Baker-Coleman', 'Ruggles', 'Radabaugh', 'Hing', 'Cheng', 'Shah', 'McClelland', 'Norwood', 'Goulart', 'Kilcoyne', 'McCracken', 'Hammond', 'Devich', 'Parra', 'Joshi', 'Ott', 'Stafford', 'Catella', 'Stefanyshyn-Piper', 'Hunt', 'Pierson', 'Nelson', 'Quick', 'Richter', 'Morici', 'Fernandez', 'Vogel', 'Tsaprailis', 'Nelman-Gonzalez', 'H\xc3\xb6ner', 'Porwollik', 'Buchanan', 'Ramamurthy', 'Dumars', 'Allen', 'Rupert', 'Stodieck', 'Bober'] | Proc Natl Acad Sci U S A | 2007 | 10/9/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1367 | GSE8585 | 12/1/2007 | ['8585'] | [] | [] | 2494808 | [u'18434436'] | [u'Khan', u'Rogers', u'McDonald', u'Vadigepalli', u'Schwaber', u'Gao'] | ['Khan', 'Rogers', 'McDonald', 'Vadigepalli', 'Schwaber', 'Gao'] | ['Khan', 'Rogers', 'McDonald', 'Vadigepalli', 'Schwaber', 'Gao'] | Am J Physiol Regul Integr Comp Physiol | 2008 | 2008 Jul | 0 | e log transformed and normalized to the on-chip universal reference ( 30 ) using median-scale correction. The microarray data are available at the supplemental site, as well as in the Gene Expression Omnibus with accession no. GSE8585{{tag}}--DEPOSIT-- . Testing for significant gene expression was performed using an ANOVA design ( 49 ): normalized signal intensity = treatment + timepoint + treatme| associated with the acute hypertensive stimulus. All the microarray data are available in the supplement to this paper at http://www.dbi.tju.edu/ajp-hypertension , as well as in the Gene Expression Omnibus with accession no. GSE8585{{tag}}--DEPOSIT-- . On the basis of earlier c-fos results in the literature showing the earliest responses at 40 min, we hypothesized that our initial 60-min time point would capture an ear | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1368 | GSE8586 | 7/26/2007 | ['8586'] | ['3356'] | [u'17916252'] | 2761903 | [u'19772654'] | ['Allred', 'Van', 'Sun', 'Cohen', 'Leviton', 'Kohane'] | ['Gillis', 'Pavlidis'] | [] | BMC Bioinformatics | 2009 | 9/22/2009 | 0 | experiment name, organism part, array design and age category for the experiments are listed in each column. Experiments used for analysis . Gemma ID Name Organism part Array Design Age category 622 GSE8586{{tag}}--REUSE-- Umbilical cord GPL570 Prenatal 726 GSE9164 Foreskin cells GPL5876 Prenatal 233 GSE1397 Brain, heart GPL96 Prenatal 215 khatua-astrocytoma Brain GPL91 Child/young adult 218 pomeroy-embryonal Brain, |Child/young adult 555 GSE5808 Blood cell GPL96 Child/young adult 585 GSE7586 Placenta GPL570 Adult 178 GSE80 Muscle GPL91 Adult 633 GSE8607 Testis GPL91 Adult 275 GSE4757 Brain GPL570 Older adult 721 GSE8919 Brain GPL2700 Older adult 263 GSE5281 Brain GPL570 Older adult To allow the investigation of differential expression over age, we computed a relative rank-based measure of expression level for each | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1369 | GSE8586 | 7/26/2007 | ['8586'] | ['3356'] | [u'17916252'] | 2246284 | [u'17916252'] | ['Allred', 'Van', 'Sun', 'Cohen', 'Leviton', 'Kohane'] | ['Allred', 'Van', 'Sun', 'Cohen', 'Leviton', 'Kohane'] | ['Allred', 'Van', 'Sun', 'Cohen', 'Leviton', 'Kohane'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1370 | GSE8596 | 7/27/2007 | ['8596'] | [] | [] | 2949900 | [u'20875095'] | [u'Kiyokawa', u'Okita', u'Fujimoto', u'Horiuchi', u'Taguchi', u'Umezawa', u'Nakaijima', u'Miyagawa', u'Hata', u'Katagiri', u'Sato'] | ['Dawany', 'Tozeren'] | [] | BMC Bioinformatics | 2010 | 9/27/2010 | 0 | he individual datasets usually small in size, but the inferences made from individual studies are often inconsistent with similar studies [ 1 ]. As thousands of microarray samples have accumulated in publicly accessible databases in the last decade [ 2 - 4 ], several statistical methods have been developed to allow for the combination and comparison of data from multiple sources. Among the many methodolog|ons based on hypergeometric test. Table 1 Overview of datasets used and distribution of microarray samples Analysis Tissue Accession # Normal Cancer Platform IV1/IV2/SAM1/SAM2 Colon E-MTAB-57 22 25 A GSE4107 10 12 P2 GSE4183 8 15 P2 Kidney E-TABM-282 11 16 P2 GSE11024† 12 60 P2 GSE11151 3 57 P2 GSE14762† 12 10 P2 GSE15641 23 57 A GSE6344 10 10 A GSE7023 12 35 P2 Liver GSE14323 19 47 A/A|2 49 58 A GSE7670 27 27 A Pancreas E-MEXP-1121† 6 17 A E-MEXP-950 11 14 A GSE15471 39 39 P2 GSE16515 15 36 P2 Total: 294 619 SAM2 Colon E-MEXP-1224 0 55 A E-MEXP-383 0 36 A E-TABM-176 55 0 P2 GSE12945 0 36 A GSE17538 0 232 P2 Kidney GSE10320 0 144 A GSE11904 0 21 A2 Liver E-TABM-292 0 32 A E-TABM-36 0 57 A GSE9843 0 69 P2 Lung GSE10445 0 72 P2 GSE12667 0 75 P2 Total: 55 829 IV2 Colon GSE6988 28|5E-257 No data - 262 2.34E-299 * Only 338 genes are used for colon IV1 Moreover, to assess the effect of the refRMA method in normalizing data, three samples from different colon datasets (E-MTAB-57, GSE4107 and GSE4183) were chosen. The expression values for the three arrays were obtained based on classical RMA and refRMA normalization techniques. 
Quantile-quantile plots were produced to compare the d|election A total of 31 Affymetrix microarray datasets containing 1,768 unique samples from human cancer (1,429) and corresponding healthy control tissues (339) were collected from the Gene Expression Omnibus (GEO; [ 2 , 3 ] and Array Express [ 4 ] online repositories (Additional File 2 ). Samples were selected for 5 different tissue types: colon, kidney, liver, lung and pancreas, then categorized into c|ms and the conversion of data to Entrez IDs resulted in the study of varying number of genes per dataset as well as different total overlap with the common Affymetrix platform (shown in parentheses); GSE6988: 9,072 (5,834) genes, GSE3: 12,452 (6,598) genes, GSE7367: 2118 (1,301) genes, GSE2088: 13754 (7,038) genes, and GSE8596{{tag}}--REUSE--: 6740 (4,330) genes. The datasets contained cancer versus normal samples fro|NCBI GEO: mining tens of millions of expression profiles--database and tools update Nucleic Acids Res 2007 35 Database D760 765 10.1093/nar/gkl887 17099226 Edgar R Domrachev M Lash AE Gene Expression Omnibus: NCBI gene expression and hybridization array data repository Nucleic Acids Res 2002 30 1 207 210 10.1093/nar/30.1.207 11752295 Brazma A Parkinson H Sarkans U Shojatalab M Vilo J Abeygunawardena N Ho | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1371 | GSE8596 | 7/27/2007 | ['8596'] | [] | [] | 2268432 | [u'18212050'] | [u'Kiyokawa', u'Okita', u'Fujimoto', u'Horiuchi', u'Taguchi', u'Umezawa', u'Nakaijima', u'Miyagawa', u'Hata', u'Katagiri', u'Sato'] | ['Kiyokawa', 'Umezawa', 'Fujimoto', 'Horiuchi', 'Taguchi', 'Okita', 'Nakaijima', 'Miyagawa', 'Hata', 'Toyoda', 'Katagiri', 'Sato'] | [u'Kiyokawa', u'Okita', u'Fujimoto', u'Horiuchi', u'Taguchi', u'Umezawa', u'Nakaijima', u'Miyagawa', u'Hata', u'Katagiri', u'Sato'] | Mol Cell Biol | 2008 | 2008 Apr | 0 | nted in five fields per membrane as described in Materials and Methods. *, P < 0.05. Microarray data accession numbers. Microarray data have been deposited in the Gene Expression Omnibus database GEO ( www.ncbi.nlm.nih.gov/geo ) (accession numbers GSE8665 and GSE8596{{tag}}--DEPOSIT-- ). RESULTS EWS|search grants (the 3rd-Term Comprehensive 10-Year Strategy for Cancer Control [H19-010], Research on Children and Families [H18-005 and H19-003], Research on Human Genome Tailor Made, and Research on Publicly Essential Drugs and Medical Devices [H18-005]) and a grant for child health and development from the Ministry of Health, Labor and Welfare of Japan, JSPS (Kakenhi 18790263). This work was also suppor | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1372 | GSE8597 | 9/1/2007 | ['8597'] | ['3315'] | [u'17986456'] | 2782370 | [u'19117983'] | ['Mader', 'Laperri\xc3\xa8re', u'Laperri\xe8re', 'Aid', 'White', 'Bourdeau', u'Desch\xeanes', 'Desch\xc3\xaanes'] | ['Steffen', 'Hilsenbeck', 'Ochsner', 'Chen', 'McKenna', 'Watkins'] | [] | Cancer Res | 2009 | 1/1/2009 | 0 | t datasets at either time point. Moreover, relaxing the q -value cut -off to 0.2 resulted in only a modest increase in the number of genes in this intersection (data not shown here but available for download from the GEMS website). This initial result indicated that given the extent in variation across the datasets, traditional Venn analysis would be of limited use in arriving at a consensus gene express| Table 1 Studies selected for meta-analysis. Supplementary Material body Supplementary Click here to view. (4.1M, zip) Acknowledgments We thank the principal investigators who made their datasets publicly available. This work was supported by NIDDK NURSA U19 DK62434. | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1373 | GSE8597 | 9/1/2007 | ['8597'] | ['3315'] | [u'17986456'] | 2248750 | [u'17986456'] | ['Mader', 'Laperri\xc3\xa8re', u'Laperri\xe8re', 'Aid', 'White', 'Bourdeau', u'Desch\xeanes', 'Desch\xc3\xaanes'] | ['Mader', 'Laperri\xc3\xa8re', 'Aid', 'White', 'Bourdeau', 'Desch\xc3\xaanes'] | ['Mader', 'Laperri\xc3\xa8re', 'Aid', 'White', 'Bourdeau', 'Desch\xc3\xaanes'] | Nucleic Acids Res | 2008 | 2008 Jan | 0 | AND pmc_gds | 1 | 0 | ||||
1374 | GSE8601 | 7/27/2007 | ['8601'] | [] | [u'17722985'] | 1950957 | [u'17722985'] | ['', 'Grant', 'Stoeckert', 'Guttman', 'Diskin', 'Mies', 'Baldwin', 'Dudycz-Sulicz'] | ['Grant', 'Stoeckert', 'Guttman', 'Diskin', 'Mies', 'Baldwin', 'Dudycz-Sulicz'] | ['Grant', 'Stoeckert', 'Guttman', 'Diskin', 'Mies', 'Baldwin', 'Dudycz-Sulicz'] | PLoS Genet | 2007 | 2007 Aug | 0 | AND pmc_gds | 1 | 0 | ||||
1375 | GSE8607 | 7/31/2007 | ['8607'] | ['2842'] | [u'16158187'] | 2761903 | [u'19772654'] | ['Klein-Hitpass', 'Kliesch', 'Brehm', 'Neuvians', 'von', 'Winterhager', 'Gr\xc3\xbcmmer', 'Dushaj', 'Grobholz', 'Bergmann', 'Gashaw', 'Schmid'] | ['Gillis', 'Pavlidis'] | [] | BMC Bioinformatics | 2009 | 9/22/2009 | 0 | experiment name, organism part, array design and age category for the experiments are listed in each column. Experiments used for analysis . Gemma ID Name Organism part Array Design Age category 622 GSE8586 Umbilical cord GPL570 Prenatal 726 GSE9164 Foreskin cells GPL5876 Prenatal 233 GSE1397 Brain, heart GPL96 Prenatal 215 khatua-astrocytoma Brain GPL91 Child/young adult 218 pomeroy-embryonal Brain, |Child/young adult 555 GSE5808 Blood cell GPL96 Child/young adult 585 GSE7586 Placenta GPL570 Adult 178 GSE80 Muscle GPL91 Adult 633 GSE8607{{tag}}--REUSE-- Testis GPL91 Adult 275 GSE4757 Brain GPL570 Older adult 721 GSE8919 Brain GPL2700 Older adult 263 GSE5281 Brain GPL570 Older adult To allow the investigation of differential expression over age, we computed a relative rank-based measure of expression level for each | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1376 | GSE8608 | 8/1/2007 | ['8608'] | ['2866'] | [u'18084737'] | 2620272 | [u'19014681'] | ['Lang', 'Hoffmann', 'Ziegler-Heitbrock', 'Hofer', 'Mages', 'Colige', 'Frankenberger', 'Meyer'] | ['Tozeren', 'Gormley'] | [] | BMC Bioinformatics | 2008 | 11/17/2008 | 1 | or therapeutic targets in tissue-specific diseases. Table 1 Microarray datasets used in this study Tissue Phenotype Data Tissue No. of Samples Gene Expression Omnibus/Array Express Accn. # Adipose 10 GSE3526 Adrenal 20 GSE3526, GSE8514, GSE2316 Brain 89 GSE3526, GSE7621, GSE7307, GSE2361, E_AFMX-11, E-TABM-20, Colon 10 E-TABM-176, GSE8671, GSE9254, GSE9452 Epidermal 25 GSE1133, GSE2361, GSE3419, GSE352|38 E_AFMX-11, E-MIMR-27, GSE1133, GSE2240, GSE2361, GSE3526, GSE3585, GSE7307 Kidney 10 E_AFMX-11, GSE2004, GSE2361, GSE3526, GSE7392 Liver 10 E_AFMX-11, GSE2004, GSE3526, GSE6764 Lung 26 E-MEXP-231, GSE10072, GSE1133, GSE2361, GSE3526 Mammary 15 E-TABM-66, GSE2361, GSE3526, GSE7307, GSE7904 Muscle 64 GSE10760, GSE2328, GSE3526, GSE5110, GSE6798, GSE7307, GSE9103, Ovary 10 GSE2361, GSE3526, GSE6008, GS|ancreas 6 GSE1133, GSE2361, GSE7307 Peripheral blood 12 GSE7462, GSE8608{{tag}}--REUSE--, GSE8668, GSE8762, GSE9692 Small intestine 7 GSE2361, GSE7307 Spleen 12 GSE2004, GSE2361, GSE3526, GSE7307 Stomach 10 GSE2361, GSE3526, GSE7307 Testis 38 E_AFMX-11, GSE1133, GSE2361, GSE3218, GSE3526, GSE7307, GSE7808 Thymus 5 GSE1133, GSE2361, GSE7307 Infectious Disease Disease No. of Samples Gene Expression Omnibus/Array Express | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1377 | GSE8608 | 8/1/2007 | ['8608'] | ['2866'] | [u'18084737'] | 2770025 | [u'19835593'] | ['Lang', 'Hoffmann', 'Ziegler-Heitbrock', 'Hofer', 'Mages', 'Colige', 'Frankenberger', 'Meyer'] | ['Schramm', 'Ziegler-Heitbrock', 'Stanzel', 'Hofer', 'Eder', 'Frankenberger', 'Seidel'] | ['Frankenberger', 'Hofer', 'Ziegler-Heitbrock'] | Part Fibre Toxicol | 2009 | 10/16/2009 | 0 | mRNA expression In earlier experiments using gene expression arrays, we noted a decreased expression of CYP1B1 mRNA in monocyte-derived macrophages (MDM) of patients with COPD ( ; accession number GSE8608{{tag}}--MENTION--). Since exposure of particles plays a major role in the etiology of this disease, we studied the effect of particles on cells of the monocyte/macrophage lineage. To cover a wider range of particle | 0 | 0 | 1 | NOT pmc_gds | 1 | 0 |
1378 | GSE8612 | 8/2/2007 | ['8612'] | [] | [u'17892551'] | 2104538 | [u'17892551'] | ['Sutton', 'Abidi', 'Evans', 'Hene', 'Sreenu', 'Vuong', 'Davis', 'Rowland-Jones'] | ['Sutton', 'Abidi', 'Evans', 'Hene', 'Sreenu', 'Vuong', 'Davis', 'Rowland-Jones'] | ['Sutton', 'Abidi', 'Evans', 'Hene', 'Sreenu', 'Vuong', 'Davis', 'Rowland-Jones'] | BMC Genomics | 2007 | 9/24/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1379 | GSE8613 | 7/28/2007 | ['8613'] | [] | [u'17785431'] | 2169065 | [u'17785431'] | ['Winston', 'Hickman'] | ['Winston', 'Hickman'] | ['Winston', 'Hickman'] | Mol Cell Biol | 2007 | 2007 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
1380 | GSE8621 | 8/10/2007 | ['8621'] | [] | [u'18086374'] | 2949756 | [u'20858252'] | ['Mages', 'Lang', 'Dietrich'] | ['Perumal', 'Kollipara'] | [] | Immunome Res | 2010 | 9/21/2010 | 0 | and a subset of these targets clearly showed the pro-inflammatory gene expression pattern corresponding to LPS tolerance. Methods Differentially expressed genes The Foster et al microarray dataset (GSE7348) was downloaded from the NCBI-GEO database [ 33 ] via FTP protocol. This data set was derived from murine (C57BL/6 strain) bone marrow macrophages left untreated (N), stimulated with LPS for 24 ho|> one, and have the t-test p-value of <0.05. In this study, we utilized random mouse genes as background and they were selected using the RSAT tool [ 17 ]. The Mages et al [ 10 ] data was downloaded from GEO (GSE8621{{tag}}--REUSE--) and appropriate differential gene expression filters (based on fold change) were applied to compare with the pro-inflammatory transcriptional targets identified in our analysis o | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1381 | GSE8624 | 12/27/2007 | ['8624'] | [] | [u'18199684'] | 2262958 | [u'18199684'] | ['', 'Rodriguez', 'Tapia', 'Roy', 'Meirelles', 'Allen', 'Benn', 'Aragon', 'Werner-Washburne', 'Davidson', 'Joe'] | ['Rodriguez', 'Tapia', 'Roy', 'Meirelles', 'Allen', 'Benn', 'Aragon', 'Werner-Washburne', 'Davidson', 'Joe'] | ['Rodriguez', 'Tapia', 'Roy', 'Meirelles', 'Allen', 'Benn', 'Aragon', 'Werner-Washburne', 'Davidson', 'Joe'] | Mol Biol Cell | 2008 | 2008 Mar | 0 | AND pmc_gds | 1 | 0 | ||||
1382 | GSE8625 | 8/3/2007 | ['8625'] | [] | [u'19134196'] | 2748096 | [u'19728865'] | ['Wride', 'Mansergh', 'Evans', 'Hurley', 'Hunter', 'Daly'] | ['Makiguchi', 'Feng', 'Okuno', 'Tamon', 'Tsujimoto', 'Araki', 'Kunimoto', 'Niijima'] | [] | BMC Genomics | 2009 | 9/3/2009 | 1 | f uncovering gene functions on a genomic scale. In recent years, vast amounts of gene expression data have been collected, much of which are available in public databases, such as the Gene Expression Omnibus (GEO). To date, most researchers have been manually retrieving data from databases through web browsers using accession numbers (IDs) or keywords, but gene-expression patterns are not considered when|r to support researchers to utilize the public gene expression data more effectively, we developed a web tool for finding similar gene expression data and generating its co-expression networks from a publicly available database. Results GEM-TREND, a web tool for searching gene expression data, allows users to search data from GEO using gene-expression signatures or gene expression ratio data as a query an|gene-expression pattern with those of GEO gene expression data. It could be a very useful resource for finding similar gene expression profiles and constructing its gene co-expression networks from a publicly available database. GEM-TREND was designed to be user-friendly and is expected to support knowledge discovery. GEM-TREND is freely available at . Background One of the major challenges in the post-g|vering gene functions on a genomic scale [ 1 ]. In recent years, vast amounts of gene expression data have been collected, much of which are available in public databases, such as the Gene Expression Omnibus (GEO) [ 2 ], ArrayExpress [ 3 ] and researchers' websites. These resources serve at least two purposes. 
One is as an archive of the data, which allows other researchers to confirm results that have b|eveloped a web tool named GEM-TREND (Gene Expression data Mining Toward Relevant Network Discovery) to automatically retrieve gene expression data across a wide range of microarray experiments in the publicly available GEO database by comparing gene-expression patterns between a query and the database entries. Subsequently, the system generates a gene co-expression network for retrieved gene expression da|, and each series links to GEO by clicking the GSE ID or GPL ID (Fig. 3e ). In addition, the series of interest can be selected for further processing. Both search results and selected series can be downloaded in CSV format. Figure 3 Screenshot of GEM-TREND . a) Query input area. The gene-expression signature, gene expression ratio data and text are accepted. Network IDs can be used to retrieve previous | from stomach subregions (Additional file 4 -CSV-Gene expression profile of mast cells pooled from mouse stomach subregions) [ 20 ]. In the score-ordered results of query-1 (P-value < 0.01), GSE1827 (titled "Waldman Bladder tumors") was ranked in fourth. Moreover, the top 10 entries showed appropriate annotations related to tumors, inflammatory and immune responses (Table 2 ). For the query-2|ts, and seven entries among the top 10 were observed using rat liver samples (Table 3 ). The biological relationships among the top 10 results of query-3 (P-value < 0.02) were not clear, but GSE6192 (titled "Gene expression changes during murine mucosal mast cell in vitro differentiation") was found out in the twelfth rank (Table 4 ). These findings indicate the general applicability of GEM-T| search using the gene expression profile of human bladder cancer - top 10 entries sorted by similarity scores with lowest P-value < 0.01. 
Series Platform Description Similarity score P-value GSE6112 GPL4475 Tubercolosis and healthy infected patients PBMC_TB_vs_Pool_LTBI 1 0.0054 GSE3901 GPL3279 Response of quiescent human fibroblasts to different growth factors and serum 0.96 0.0004 GSE1726 GP|2567 Human breast tumor 0.854 0.0063 GSE60 GPL174 Diffuse large B-cell lymphoma 0.851 0.0029 GSE838 GPL564 Individual-specific variation of gene expression in peripheral blood leukocytes 0.845 0.0057 GSE3176 GPL1528 p53 In Inflamatory Stress Response 0.815 0.0001 GSE344 GPL273 Spotted long oligonucleotide arrays 0.813 0.0097 GSE7965 GPL3991 Blood and Adipose tissue samples 0.805 1.50E-03 The results w| the gene expression profile of rat chemical hepatocarcinogenesis - top 10 entries sorted by similarity scores with lowest P-value < 0.01. Series Platform Description Similarity score P-value GSE5337 GPL890 Gene Expression Profiling In Rat Smooth Muscle Cells Modulated by Rapamycin and Paclitaxel. 1 1.00E-04 GSE5860 GPL890 Gene expression analysis of rat livers after exposure to acetaminophen 0|en (APAP) Rat Liver Test Gene Expression Data Set 0.895 1.00E-04 GSE5381 GPL890 Gene expression analysis of liver and kidney following methapyrilene treatment in male Sprague-Dawley rats 0.685 0.0015 GSE791 GPL542 GH inj old liver (1-7) 0.657 5.00E-04 GSE4270 GPL890 Aging Induced Alterations in Hepatic Gene Expression of the Male Fisher Rat 0.637 0.0045 GSE3608 GPL3076 Renal medullary genes in salt-sen|ssion profile of mast cells pooled from mouse stomach subregions - top 20 entries sorted by similarity scores with lowest P-value < 0.02. Series Platform Description Similarity score P-value GSE3088 GPL2510 Expression profiling of Muscle tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null backgrounds 1 1.00E-04 GSE2814 GPL2510 Expression profiling of liver tissue from (C57BL/6J ×|time course: regulation of uterine genes by estradiol in ovariectomized mice 0.874 5.00E-04 GSE8104 GPL5137 Primary macrophage response to L. 
monocytogenes and bacteria-derived ligands 0.855 1.70E-03 GSE8100 GPL5137 WT and myd88-/- macrophage response to WT and hly- L. monocytogenes 0.847 6.00E-04 GSE2220 GPL1832 Genetic variation of gene expression is tissue specific in inbred mice 0.795 1.00E-02 GSE4|stren Behaves as a Weak Estrogen Rather than a Non-genomic Selective Activator in the Mouse Uterus 0.792 0.0034 GSE7029 GPL2510 Zfp90 Transgenic Signature in Mouse White Adipose Tissue 0.788 2.50E-03 GSE7615 GPL2884 Cancer Process Study 0.752 0.0107 GSE7600 GPL2884 Atm-/-, mTerc-/-, p53-/- triple knock-out lymphoma vs normal mouse DNA (GPL2884) 0.752 1.13E-02 GSE3086 GPL2510 Expression profiling of Adi|ose tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null backgrounds 0.726 0.009 GSE6192 GPL891 Gene expression changes during murine mucosal mast cell in vitro differentiation 0.709 1.76E-02 GSE4248 GPL891 Identification of genes regulated by RORg in mouse thymus 0.705 1.99E-02 GSE3087 GPL2510 Expression profiling of brain tissue from (C57BL/6J × C3H/HeJ)F2 mice on ApoE null background|nput Reduction 0.605 8.70E-03 GSE1013 GPL967 Gene Expression Profile of NHE1 Null Mutation 0.552 0.0134 GSE8625{{tag}}--REUSE-- GPL5530 Comparison of undifferentiated ES cell lines HM1, IMT11, SHBL6.3 0.488 1.82E-02 GSE8528 GPL5369 Expression analysis of gene differentially expressed in the developping ovary 0.455 0.0011 GSE3289 GPL2828 Chronic hypoxia alters the level, maturation and control of gene expression in mou|language: Java, PHP Other requirements: Java 1.5.0 or higher License: The tool is available free of charge Any restrictions to of use by non-academics: None List of abbreviations GEO: Gene Expression Omnibus; GO: Gene Ontology; GSE: Series in GEO; GPL: Platform in GEO; MeSH: Medical Subject Headings. 
Authors' contributions CF designed the system and wrote the manuscript; MA gave comments and edited the m|l E Koller D Kim SK A Gene-Coexpression Network for Global Discovery of Conserved Genetic Modules Science 2003 302 249 255 12934013 10.1126/science.1087447 Edgar R Domrachev M Lash AE Gene Expression Omnibus: NCBI gene expression and hybridization array data repository Nucleic Acids Res 2002 30 207 210 11752295 10.1093/nar/30.1.207 Parkinson H Kapushesky M Shojatalab M Abeygunawardena N Coulson R Farne A | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1383 | GSE8625 | 8/3/2007 | ['8625'] | [] | [u'19134196'] | 2656490 | [u'19134196'] | ['Wride', 'Mansergh', 'Evans', 'Hurley', 'Hunter', 'Daly'] | ['Wride', 'Mansergh', 'Evans', 'Hurley', 'Hunter', 'Daly'] | ['Wride', 'Mansergh', 'Evans', 'Hurley', 'Hunter', 'Daly'] | BMC Dev Biol | 2009 | 1/9/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
1384 | GSE8633 | 8/3/2007 | ['8633'] | [] | [u'19630270'] | 2581452 | [u'18487372'] | ['Calonge', 'Tong', 'Beuerman', 'Gao', 'Diebold', 'Stern'] | ['Cook', 'Barney', 'Graziano', 'Stahl'] | [] | Invest Ophthalmol Vis Sci | 2008 | 2008 Sep | 0 | rge experiments, it is essential to confirm any research using cell lines with parallel studies in primary cells. This is supported by gene expression profiles deposited in GenBank (accession number GSE8633{{tag}}--MENTION-- ), which compare conjunctival tissue with primary HCECs and the IOBA-NHC and ChWK cell lines (Gene Expression Omnibus Web site). These data suggested that, globally, primary HCECs clustered more cl | 0 | 0 | 1 | NOT pmc_gds | 0 | 0 |
1385 | GSE8634 | 8/1/2007 | ['8634'] | [] | [u'19255330'] | 2664703 | [u'19352461'] | ['Mendrzyk', 'Toedt', 'Joos', 'Scheurlen', 'Remke', 'Radlwimmer', 'Benner', 'Devens', 'Gerber', 'Reifenberger', 'Rutkowski', 'Lichter', 'Wiestler', 'Felsberg', 'Kulozik', 'Korshunov', 'Pfister', 'Wittmann'] | ['Shay', 'Domany', 'Reiner-Benaim', 'Hegi', 'Lambiv'] | [] | Cancer Inform | 2009 | 2009 | 0 | monstrate the method for three different public aCGH datasets from two different childhood neoplasms associated with the nervous system on three different BAC array platforms: Medulloblastoma—GSE8634{{tag}}--REUSE--; Neuroblastoma—GSE5784 37 and GSE7230. 38 Results Algorithm Our method uses aCGH data to create a concise genomic description of each sample, including chromosomal status and appearance of|d separately but similarly, using the same method. Input The algorithm’s input is the raw log2 aCGH data, and the markers’ status. The raw log2 ratio data of chromosome 2p, taken from GSE7230, is presented in Figure 1A . Markers’ status is the assignment per marker per sample—loss (−1), normal (0) or gain (1). The status was set by the R package GLAD (Gain and L|tions matrix A, which has binary valued elements: A ms = 1 if the aCGH marker m was assigned a gain value on sample s, and A ms = 0 otherwise (the amplification matrix of chromosome 2p based on the GSE7230 data is shown in Fig. 1B ). A deletions matrix D is defined similarly: D ms = 1 if the aCGH marker m has a loss assignment on sample s, and D ms = 0 otherwise (deletion matrix is not shown). Mar| in which an entire chromosome arm is lost, the corresponding entries are replaced by NaNs in the deletion matrix D. Figure 1 displays the amplification volume calculation for chromosomal arm 2p in GSE7230 (Neuroblastoma). The height matrix H is actually the raw log2 ratio. H ms ( Fig. 1A ) is the measured aCGH log2 ratio value of marker m in sample s. A ms ( Fig. 1B ) is the amplification matrix|ntary Table 5. 
Significantly amplified markers appear in Supplementary Table 4, and amplifications in Supplementary Table 6. Medulloblastoma When applied to the Medulloblastoma dataset analyzed here (GSE8634{{tag}}--REUSE--) our method finds all the known chromosomal aberrations of this cancer, and several possibly new ones as well. Figure 2 displays the chromosome status map of the Medulloblastoma dataset, and the |ious chromosomal translocations in hematological malignancies. NPM1 was associated with centrosome duplication and the regulation of p53, and might have a role as a tumor suppressor. 47 This dataset (GSE8634{{tag}}) has not yet been published, but dataset GSE2139 that includes a subset of the samples 41 was analyzed for local aberrations. This publication included a list of amplifications and deletions. We s|were identified as significantly amplified by our method—MYCN, CDK6 and marker RP11–382A18. Marker RP11–382A18 is annotated near MYC region on chromosome 8q by the platform of GSE2139, used by. 41 MYC amplification and MYCN amplification are mutually exclusive. Nine of the amplifications reported there were not identified by our method. Four of their deletions included markers |as MYCN amplification and 1p deletion. Group 2B is characterized by 11q deletion, and to a lesser extent, 3p deletion. This classification explains most of the chromosomal arms associations found. In GSE5784 there are 15 amplifications (Supplementary Table 6B, 28 markers amplified, Supplementary Table 4B) and 115 deletions (Supplementary Table 3B, 245 markers deleted, Supplementary Table 5B). In GSE723|plified Supplementary Table 4C) and 49 deletions (Supplementary Table 5C, 87 markers deleted, Supplementary Table 3C). Three amplifications and 14 deletions are common to both Neuroblastoma datasets (GSE5784, GSE7230) ( Table 2 , Fig. 3C and D ). The first amplified region, which was separated into two regions in GSE7230, is on chromosome 2, and corresponds to the MYCN region. 
MYCN amplifications were|ance with this region being a known frequent normal copy number variation. 48 Eight of the common deletions correspond to the 1pter deletion, and this deletion was fractioned into eight deletions in GSE7230. Another common deletion is in the region of BRCA1, a known tumor suppressor gene. In GSE5784, several known tumor suppressor genes were deleted—APC, CDKN2A, RB1 and TGFBR1. Also, two regio| 11, that includes CCND1, FGF19, FGF3, FGF4 was amplified, as well as a region on chromosome 12 with ETV6. For GSE5784, no aberration list was given in the original publication 37 for comparison. In GSE7230, the ALK region on chromosome 2 was amplified. ALK was previously identified as having a role in Neuroblastoma. 49 The fumarate hydratase (FH) region was deleted in GSE7230. FH was shown to be a t|for the entire genome and for each chromosomal arm separately. We applied our method on three public datasets of childhood neoplasms associated with the nervous system—one of Medulloblastoma (GSE8634{{tag}}--REUSE--) and two of Neuroblastoma (GSE5784, GSE7230). In Medulloblastoma, we find five distinct sub groups. Two sub groups with isochromosome 17, one with many other chromosomal events (2), and one with fe|erwise it was left in the analysis. Potentially inaccurate location was identified for 17 to 144 markers per dataset, which constitute 0.7%–3.5% of the markers (see Table 1 ). We noticed for GSE8634{{tag}}--REUSE-- that many aberrations were highly correlated, and correlated to gender. Some of the samples were probably hybridized to opposite sex control samples. The 28 markers whose two sided t-test p-value b|te controlling procedures 10.1093/bioinformatics/btf877 Bioinformatics 2003 19 368 375 12584122 Figure 1 Calculation of the “volume” statistic for chromosomal arm 2p amplifications in GSE7230 (Neuroblastoma) A ) The height matrix H (raw data) of 2p, where each element (m, s) on 2p is the log2 ratio of aCGH marker m in sample s. 
Each row corresponds to a marker, and each column correspon|arked in A–E by red asterisks. For presentation only, values are truncated to [−1, 1]. Figure 2 Chromosomal status and aberrations in Medulloblastoma A) Chromosomal status of dataset GSE8634{{tag}}. Each row corresponds to a chromosomal arm. Due to space limitation, only every second arm is labelled. Since some chromosomes are telocentric (with short p arm), there is a change from p to q. Val|tion only, values are truncated to the range [−1, 1], rising from blue to red. Figure 3 Chromosomal status and aberrations common to both Neuroblastoma datasets Chromosomal status of datasets GSE5784 ( A ) and GSE7230 ( B ), and the aberrations common to both of Neuroblastoma datasets, shown for the patients of GSE5784 ( C ) and GSE7230 ( D ). Each column corresponds to a sample. Samples are ma|the range [−1, 1], rising from blue to red. Table 1 Array CGH datasets analyzed. Dataset Condition Samples # Markers # Markers amplified Amplifications Markers deleted Deletions CNV Removed # GSE8634{{tag}}--REUSE-- Medulloblastoma 80 6295 13 10 137 99 4 126 GSE5784 Neuroblastoma 236 2457 28 15 245 115 4 17 GSE7230 Neuroblastoma 82 4073 30 18 87 49 0 144 The datasets are recognized by their Gene Expression Omn | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1386 | GSE8648 | 8/2/2007 | ['8648'] | [] | [u'18221507'] | 2267714 | [u'18221507'] | ['Noskov\xc3\xa1', 'Seneca', u'\u010c\xed\u017ekov\xe1', 'Kmoch', 'C\xc3\xadzkov\xc3\xa1', 'Str\xc3\xa1neck\xc3\xbd', 'Hartmannov\xc3\xa1', 'Mayr', 'Potock\xc3\xa1', u'Str\xe1neck\xfd', 'Houst\xc4\x95k', 'Sperl', 'Iv\xc3\xa1nek', 'Hans\xc3\xadkov\xc3\xa1', 'Piherov\xc3\xa1', 'Zeman', 'Paul', 'Divina', 'Honz\xc3\xadk', 'Tesarov\xc3\xa1'] | ['Noskov\xc3\xa1', 'Seneca', 'Kmoch', 'C\xc3\xadzkov\xc3\xa1', 'Str\xc3\xa1neck\xc3\xbd', 'Hartmannov\xc3\xa1', 'Mayr', 'Potock\xc3\xa1', 'Houst\xc4\x95k', 'Sperl', 'Iv\xc3\xa1nek', 'Hans\xc3\xadkov\xc3\xa1', 'Piherov\xc3\xa1', 'Zeman', 'Paul', 'Divina', 'Honz\xc3\xadk', 'Tesarov\xc3\xa1'] | ['Noskov\xc3\xa1', 'Seneca', 'C\xc3\xadzkov\xc3\xa1', 'Hans\xc3\xadkov\xc3\xa1', 'Str\xc3\xa1neck\xc3\xbd', 'Hartmannov\xc3\xa1', 'Kmoch', 'Potock\xc3\xa1', 'Houst\xc4\x95k', 'Sperl', 'Iv\xc3\xa1nek', 'Mayr', 'Piherov\xc3\xa1', 'Zeman', 'Paul', 'Divina', 'Honz\xc3\xadk', 'Tesarov\xc3\xa1'] | BMC Genomics | 2008 | 1/25/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1387 | GSE8650 | 9/1/2007 | ['8650'] | [] | [u'17724127'] | 2957424 | [u'20976054'] | ['Ardura', 'Wise', 'Chung', 'Mejias', 'Palucka', 'Allman', 'Bennett', 'Smith', 'Stichweh', 'Banchereau', 'Ramilo', 'Punaro', 'Chaussabel', 'Pascual', 'Allantaz'] | ['Shamir', 'Karp', 'Ulitsky', 'Krishnamurthy'] | [] | PLoS One | 2010 | 10/19/2010 | 0 | pone.0013367.t001 Table 1 Gene expression datasets used in this study. Dataset KEGG pathway Reference GEO accession Number of cases Number of controls AD Alzheimer's disease (AD) [41] GSE5281 10 13 ASTHMA Asthma [46] GSE4302 42 28 PYLORI Epithelial cell signaling in Helicobacter pylori infection - GSE5081 8 8 HD Huntington's disease (HD) [48] GSE3790 38 3|-GLIOBLASTOMA Pathways in cancer [47] GSE4290 77 23 SUN-ASTROCYTOMA Pathways in cancer [47] GSE4290 26 23 SUN-OLIGODENDROGLIOMA Pathways in cancer [47] GSE4290 50 23 ESTILO-OTSCC Pathways in cancer [44] GSE13601 31 26 YE-OTSCC Pathways in cancer [45] GSE9844 26 12 MORAN-PD Parkinson's disease (PD) [42] GSE83|SLE Systemic lupus erythematosus (SLE) [49] GSE8650{{tag}}--REUSE-- 38 21 Each dataset contained a comparison of sick individuals and healthy controls. All the data were obtained from Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/ ). We first evaluated the performance of different variants of our algorithm and found that DEGAS usually identified the smallest pathways ( Text S1 and Figu | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1388 | GSE8650 | 9/1/2007 | ['8650'] | [] | [u'17724127'] | 2911917 | [u'20576155'] | ['Ardura', 'Wise', 'Chung', 'Mejias', 'Palucka', 'Allman', 'Bennett', 'Smith', 'Stichweh', 'Banchereau', 'Ramilo', 'Punaro', 'Chaussabel', 'Pascual', 'Allantaz'] | ['Griffin', 'Layh-Schmitt', 'Hinze', 'Barnes', 'Thornton', 'Grom', 'Colbert', 'Glass', 'Aronow', 'Fall', 'Thompson', 'Mo'] | [] | Arthritis Res Ther | 2010 | 2010 | 0 | AND pmc_gds | 0 | 1 | ||||
1389 | GSE8658 | 8/20/2007 | ['8658'] | [] | [u'17664351', u'20410489'] | 2610483 | [u'19125200'] | ['Homolya', 'Benko', 'L\xc3\xa1nyi', 'Barta', 'Gurnell', 'Sz\xc3\xa9les', 'Nagy', 'Agostini', 'Chatterjee', 'Szatmari', 'Hegyi', 'Bar\xc3\xa1th', 'Dezso', 'Szatm\xc3\xa1ri', 'T\xc3\xb6r\xc3\xb6csik', 'P\xc3\xb3liska'] | ['Battaglia', 'Rizzetto', 'Paola', 'Rocca-Serra', 'Beltrame', 'Gambineri', 'Cavalieri'] | [] | PLoS One | 2009 | 2009 | 1 | AND pmc_gds | 0 | 1 | ||||
1390 | GSE8665 | 8/3/2007 | ['8665'] | [] | [] | 2268432 | [u'18212050'] | [u'Kiyokawa', u'Okita', u'Fujimoto', u'Horiuchi', u'Taguchi', u'Umezawa', u'Nakaijima', u'Miyagawa', u'Hata', u'Katagiri', u'Sato'] | ['Kiyokawa', 'Umezawa', 'Fujimoto', 'Horiuchi', 'Taguchi', 'Okita', 'Nakaijima', 'Miyagawa', 'Hata', 'Toyoda', 'Katagiri', 'Sato'] | [u'Kiyokawa', u'Okita', u'Fujimoto', u'Horiuchi', u'Taguchi', u'Umezawa', u'Nakaijima', u'Miyagawa', u'Hata', u'Katagiri', u'Sato'] | Mol Cell Biol | 2008 | 2008 Apr | 0 | nted in five fields per membrane as described in Materials and Methods. *, P < 0.05. Microarray data accession numbers. Microarray data have been deposited in the Gene Expression Omnibus database GEO ( www.ncbi.nlm.nih.gov/geo ) (accession numbers GSE8665{{tag}}--DEPOSIT-- and GSE8596 ). RESULTS EWS|search grants (the 3rd-Term Comprehensive 10-Year Strategy for Cancer Control [H19-010], Research on Children and Families [H18-005 and H19-003], Research on Human Genome Tailor Made, and Research on Publicly Essential Drugs and Medical Devices [H18-005[) and a grant for child health and development from the Ministry of Health, Labor and Welfare of Japan, JSPS (Kakenhi 18790263). This work was also suppor | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1391 | GSE8667 | 9/29/2007 | ['8667'] | [] | [u'17542650'] | 2802188 | [u'20008927'] | ['Farnham', '', 'Blahnik', "O'Geen", 'Squazzo', 'Iyengar', 'Rinn', 'Green', 'Chang'] | ['Kraus', 'Krishnakumar', 'Yang', 'Frizzell', 'Gamble'] | [] | Genes Dev | 2010 | 1/1/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
1392 | GSE8668 | 11/19/2007 | ['8668'] | ['3073'] | [u'18006867'] | 2620272 | [u'19014681'] | ['Leu', 'Galassetti', 'Cooper', 'Radom-Aizik', 'Zaldivar'] | ['Tozeren', 'Gormley'] | [] | BMC Bioinformatics | 2008 | 11/17/2008 | 1 | or therapeutic targets in tissue-specific diseases. Table 1 Microarray datasets used in this study Tissue Phenotype Data Tissue No. of Samples Gene Expression Omnibus/Array Express Accn. # Adipose 10 GSE3526 Adrenal 20 GSE3526, GSE8514, GSE2316 Brain 89 GSE3526, GSE7621, GSE7307, GSE2361, E_AFMX-11, E-TABM-20, Colon 10 E-TABM-176, GSE8671, GSE9254, GSE9452 Epidermal 25 GSE1133, GSE2361, GSE3419, GSE352|38 E_AFMX-11, E-MIMR-27, GSE1133, GSE2240, GSE2361, GSE3526, GSE3585, GSE7307 Kidney 10 E_AFMX-11, GSE2004, GSE2361, GSE3526, GSE7392 Liver 10 E_AFMX-11, GSE2004, GSE3526, GSE6764 Lung 26 E-MEXP-231, GSE10072, GSE1133, GSE2361, GSE3526 Mammary 15 E-TABM-66, GSE2361, GSE3526, GSE7307, GSE7904 Muscle 64 GSE10760, GSE2328, GSE3526, GSE5110, GSE6798, GSE7307, GSE9103, Ovary 10 GSE2361, GSE3526, GSE6008, GS|ancreas 6 GSE1133, GSE2361, GSE7307 Peripheral blood 12 GSE7462, GSE8608, GSE8668{{tag}}--REUSE--, GSE8762, GSE9692 Small intestine 7 GSE2361, GSE7307 Spleen 12 GSE2004, GSE2361, GSE3526, GSE7307 Stomach 10 GSE2361, GSE3526, GSE7307 Testis 38 E_AFMX-11, GSE1133, GSE2361, GSE3218, GSE3526, GSE7307, GSE7808 Thymus 5 GSE1133, GSE2361, GSE7307 Infectious Disease Disease No. of Samples Gene Expression Omnibus/Array Express | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1393 | GSE8668 | 11/19/2007 | ['8668'] | ['3073'] | [u'18006867'] | 2967749 | [u'21044366'] | ['Leu', 'Galassetti', 'Cooper', 'Radom-Aizik', 'Zaldivar'] | ['Yousif', 'Mbagwu', 'Ohno-Machado', 'Lacson'] | [] | BMC Bioinformatics | 2010 | 10/28/2010 | 0 | es/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Background The amount of data deposited in the Gene Expression Omnibus (GEO) has expanded significantly. It is important to ensure that these data are properly annotated with clinical data and descriptions of experimental conditions so that they can be useful for future|istency. Association between relevant variables, however, was adequate. 10–12 March 2010 2010 AMIA Summit on Translational Bioinformatics San Francisco, CA, USA Background The Gene Expression Omnibus (GEO) project was initiated by the National Center for Biotechnology Information (NCBI) to serve as a repository for gene expression data [ 1 , 2 ]. In addition to GEO, there are several other large-| 400,000 samples. There has been an ever growing interest in large microarray repositories for several reasons: (a) Microarray data are required by funding agencies and scientific journals to be made publicly accessible; (b) such repositories enable researchers to view data from other research groups; and (c) with proper pre-processing, such repositories may allow researchers to formulate and test hypothe|viously described [ 14 , 15 ]. The annotation tool used for this research was developed to facilitate human annotation by allowing easy access between the data descriptions and measurements that were downloaded from GEO and appropriate scientific publications from Pubmed [ 13 ]. The annotators are able to read the study descriptions that researchers deposited in GEO, as well as individual sample descripti|, and the results are displayed in Table 3 . 
Table 4 shows all the studies’ goals and the number of samples in each of the 17 annotated studies. Table 3 Coverage of Asthma variables in GDS GSE 470 GSE 473 GSE 3183 GSE 3004 Total Agent 100% 0% 100% 100% 17.4% Disease State 100% 100% 0% 0% 88.2% Time 100% 0% 100% 0% 12.7% Other 0% 100% 0% 0% 82.5% No. of Samples 12 175 15 10 212 Table 4 Annota|dy No. of Samples Topic/Title GSE8052 404 Determinants of susceptibility to childhood asthma GSE473 175 Defining diagnostic genes from purified CD4+ blood cells that have specific diagnostic profiles GSE4302 118 Profiling of airway epithelial cells GSE3184 40 Murine airway hyperresponsiveness GSE483 39 Allergic response to ragweed GSE1301 24 Mechanisms by which IL-13 elicits the symptoms of asthma GSE8|fects of exercise on gene expression GSE6858 16 Expression data from experimental murine asthma GSE3183 15 Early cytokine-mediated mechanisms that lead to asthma GSE470 12 Asthma exacerbatory factors GSE9465 12 Pulmonary responses to ambient particulate matter GSE3004 10 Effects of allergen challenge on airway cell gene expression GSE2276 9 Effect of PGE receptor subtype agonist on an asthma model GSE4|d inhaler 697 24.1 Disease frequency 627 31.7 Gender 489 46.7 Atopic 425 53.7 Tissue 403 56.1 Challenge 0 1.0 The consistency of the studies in the asthma domain was also measured. In one such study (GSE4302), the data for 32 asthmatics randomized to a placebo-controlled trial of fluticasone propionate were examined. The authors use the generic name “fluticasone propionate” within both | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1394 | GSE8671 | 8/3/2007 | ['8671'] | ['2947'] | [u'18171984'] | 2935415 | [u'20823331'] | ['Clevers', 'Van', 'Ranalli', u'Sabates', 'Pastorelli', 'Luz', 'de', 'Menigatti', 'Rehrauer', 'Kurowski', 'Marra', 'Cattaneo', 'Gomes', 'Anti', 'Jiricny', 'Maake', 'Sabates-Bellver', 'Laczko', 'Bujnicki', 'Faggiani'] | ['Davicioni', 'Dao', 'Salari', 'Ester', 'Sch\xc3\xb6nhuth', 'Moser', 'Colak'] | [] | Bioinformatics | 2010 | 9/15/2010 | 0 | ation-theoretic criteria as described for the approaches which were employed for benchmarking. See Supplementary Materials for details. 3.3 Datasets and classification schemes 3.3.1 Network data We downloaded the licensed PPI network from the STRING database, version 8.1 (Jensen et al. , 2009 ). STRING provides several variants of association network where edges come with a confidence score. Networks |oyutürk ( 2010 ) was used. 3.3.2 Colon cancer gene expression data In analogy to Chowdhury and Koyutürk's ( 2010 ) study, we treated the microarray datasets with the accession numbers GSE8671{{tag}}--REUSE--, GSE10950 and GSE6988 from the Gene Expression Omnibus (Barrett et al. , 2009 ). GSE8671{{tag}}--REUSE-- contains 8987 gene expression profiles across 32 prospectively collected adenomas with those of normal muc|3 normal liver, 27 liver metastasis and 20 primary colorectal tumors without liver metastasis (Ki et al. , 2007 ). 3.3.3 Breast cancer gene expression data We considered the gene expression dataset GSE3494 treated in Miller et al. ( 2005 ) along with all available additional information. Experiments performed in Miller et al. ( 2005 ) aim at predicting TP53 mutation status, tumor grade and surviv|is means that accuracy values cannot be taken as unbiased results since feature selection is based on the outcome of the cross-validation. 3.3.4 Differential expression For the colon cancer datasets, GSE8671{{tag}}--REUSE-- and GSE10950, we determine differential expression as described in Chowdhury and Koyutürk ( 2010 ). 
We first normalize expression values for each gene v individually. Let E ( v , j ) be|ealthy sample for one patient l . We then put D ( v ) l =+, resp. D ( v ) l = − if v is overexpressed in j 1 , but not in j 2 , resp. the other way round. In the breast cancer dataset GSE3494 (see below), we determine a normal distribution for all values and normalize the entire data accordingly. For an arbitrary sample l , let E ( v , l ) be the corresponding normalized value. Subse| approach implementing a linear kernel using Matlab's svmclassify . For colon cancer versus healthy classification, the training data are identical with that used for marker computation (i.e. either GSE8671{{tag}} or GSE10950). For colon cancer with versus without liver metastasis, markers are computed using GSE8671{{tag}} or GSE10950 and classification is performed by leave-one-out cross-validation in GSE6988. Thi|010 ) as a guideline and followed their workflow for cross-platform predictions. 4.1.1 Marker computation We computed and subsequently ranked subnetwork markers as described in Section 3 both using GSE8671{{tag}} (parameter choices: α = 0.5, L = 3) and GSE10950 (α = 0.5, L = 2). Parameters were chosen as non-restrictive as possible such that the total number of subnetwork markers did not| described in Chuang et al. ( 2007 ), SGM as described in Chowdhury and Koyutürk ( 2010 ) and were provided with subnetwork markers by Chowdhury and Koyutürk ( 2010 ) extracted from GSE8671{{tag}}--REUSE--, accordingly ranked (NETCOVER = NC). However, neither subnetwork markers from GSE10950 nor the implementation of the NetCover (NC) algorithm were publicly available at the time when experiments wer| Koyutürk, 2010 ). Predictions refer to predicting cancer versus healthy tissue, resp. liver metastasis versus non-liver metastasis (henceforth referred to as ‘Prognosis’) in GSE6988 using the markers from GSE8671{{tag}} and GSE10950. 
Note that we cannot display certain values referring to markers from GSE10950 for NC since we were not provided with the corresponding subnetworks nor t|l (=TP/(TP + FN)) for different choices of subnetwork markers and the two prediction tasks where markers are chosen according to the corresponding rankings. Note that values for NC using markers from GSE10950 are missing due to the above-mentioned reasons. In Chowdhury and Koyutürk ( 2010 ), an average AUC of 0.86 is reported for prediction of GSE6988 (wDCB: 0.91, see Table 1 in the Suppleme|ch is based on AUC. Overall, our method outperforms all competitors both when predicting cancer versus non-cancer and metastasis versus non-metatastis. In the latter case, when using subnetworks from GSE8671{{tag}}, the increase in accuracy from 83%, the best value obtained by the competitors, to 92%, obtained by our method wDCB is quite remarkable. Note that this is a relative increase of more than 50% (9% o|cancer Here, we use Miller et al. ( 2005 ) as a guideline. We focus on TP53 mutation status and predict wildtype (wt) versus mutant (mt), a binary classification task. We first compute markers from GSE3494 and subsequently employ the suggested leave-one-out cross-validation scheme in the same dataset. As has been recently pointed out (Chuang et al. , 2007 ; Ein-Dor et al. , 2005 , 2006 ) non-cro| major source of motivation for subnetwork marker approaches. In the following we will distinguish between single probe markers (SPMs) that is a SGM approach making use of all probe data available in GSE3494 even if probes cannot be mapped (possibly reflecting non-coding RNA, etc.). 1 SGMs, which is the equivalent of SPM using only mappable gene probes, GMI (Chuang et al. , 2007 ) and our approach w|rk markers using markers extracted from GSE3494 for predicting TP53 mutation status (wildtype versus mutant) in GSE3494 (leave-one-out cross-validation). 
4.3 Analysis of our top markers 4.3.1 Markers GSE8671{{tag}}, colon cancer GO enrichment analysis of the 186 genes identified in the top subnetworks from GSE8671{{tag}} revealed a significant role for genes involved in the biological processes of DNA replication, D|been shown to be early markers for CRC (Burger, 2008 ) and overall almost all CRC display dysregulation of the TP53 pathway through mutations or other means of functional inactivation. 4.3.2 Markers GSE3494, breast cancer Here we focused on the role of TP53, whose expression signature was used previously to classify prognostic classes in two breast cancer and one liver cancer cohorts with known TP53 s | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1395 | GSE8671 | 8/3/2007 | ['8671'] | ['2947'] | [u'18171984'] | 2620272 | [u'19014681'] | ['Clevers', 'Van', 'Ranalli', u'Sabates', 'Pastorelli', 'Luz', 'de', 'Menigatti', 'Rehrauer', 'Kurowski', 'Marra', 'Cattaneo', 'Gomes', 'Anti', 'Jiricny', 'Maake', 'Sabates-Bellver', 'Laczko', 'Bujnicki', 'Faggiani'] | ['Tozeren', 'Gormley'] | [] | BMC Bioinformatics | 2008 | 11/17/2008 | 1 | or therapeutic targets in tissue-specific diseases. Table 1 Microarray datasets used in this study Tissue Phenotype Data Tissue No. of Samples Gene Expression Omnibus/Array Express Accn. # Adipose 10 GSE3526 Adrenal 20 GSE3526, GSE8514, GSE2316 Brain 89 GSE3526, GSE7621, GSE7307, GSE2361, E_AFMX-11, E-TABM-20, Colon 10 E-TABM-176, GSE8671{{tag}}--REUSE--, GSE9254, GSE9452 Epidermal 25 GSE1133, GSE2361, GSE3419, GSE352|38 E_AFMX-11, E-MIMR-27, GSE1133, GSE2240, GSE2361, GSE3526, GSE3585, GSE7307 Kidney 10 E_AFMX-11, GSE2004, GSE2361, GSE3526, GSE7392 Liver 10 E_AFMX-11, GSE2004, GSE3526, GSE6764 Lung 26 E-MEXP-231, GSE10072, GSE1133, GSE2361, GSE3526 Mammary 15 E-TABM-66, GSE2361, GSE3526, GSE7307, GSE7904 Muscle 64 GSE10760, GSE2328, GSE3526, GSE5110, GSE6798, GSE7307, GSE9103, Ovary 10 GSE2361, GSE3526, GSE6008, GS|ancreas 6 GSE1133, GSE2361, GSE7307 Peripheral blood 12 GSE7462, GSE8608, GSE8668, GSE8762, GSE9692 Small intestine 7 GSE2361, GSE7307 Spleen 12 GSE2004, GSE2361, GSE3526, GSE7307 Stomach 10 GSE2361, GSE3526, GSE7307 Testis 38 E_AFMX-11, GSE1133, GSE2361, GSE3218, GSE3526, GSE7307, GSE7808 Thymus 5 GSE1133, GSE2361, GSE7307 Infectious Disease Disease No. of Samples Gene Expression Omnibus/Array Express | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1396 | GSE8671 | 8/3/2007 | ['8671'] | ['2947'] | [u'18171984'] | 2797084 | [u'20090827'] | ['Clevers', 'Van', 'Ranalli', u'Sabates', 'Pastorelli', 'Luz', 'de', 'Menigatti', 'Rehrauer', 'Kurowski', 'Marra', 'Cattaneo', 'Gomes', 'Anti', 'Jiricny', 'Maake', 'Sabates-Bellver', 'Laczko', 'Bujnicki', 'Faggiani'] | ['Nibbe', 'Chance', 'Koyut\xc3\xbcrk'] | [] | PLoS Comput Biol | 2010 | 1/15/2010 | 0 | the Level of mRNA We evaluated the individual differential gene expression of each crosstalker identified using the Nibbe and Friedman proteomic seeds using two microarray datasets obtained from GEO (GSE10950 & GSE8671{{tag}}--REUSE--). GSE8671{{tag}}--REUSE-- represents 64 experiments using mRNA isolated from tissue biopsies obtained from 32 patients (matched tumor and adjacent normal mucosa) performed on an Affymetrix GeneC|c seeds (Nibbe et al. , Friedman et al. ), a seed of CRC driver genes (Sjöblom et al. ), and all proteins in the HPRD PPI network, as quantified by mutual information with phenotype, using GSE8671{{tag}}--REUSE-- and GSE10950. Synergistic Regulation of Sub-Networks Induced by Proteomic Seeds For the purpose of discussion we will refer to a sub-network by the proteomic seed that induced the sub-network (e.g.|tion of sub-network mRNA expression profile with phenotype class) versus network size for candidate sub-networks. All interactors (green squares) and crosstalkers (red diamonds) were scored using (a) GSE10950 and (b) GSE8671{{tag}}. The blue lines represent the linear interpolation of the means of the estimated null distributions computed for random candidate sub-networks of size 2,4,8,16,32, and 64, using th|I on one or the other microarray datasets. Two crosstalker sub-networks (red diamonds), CCT2 and TCP1 , show improvement over their corresponding interactor sub-network on both arrays. 
Notably, on GSE10950, the mutual information score of the TPI1 crosstalker sub-network is significant, while the corresponding interactor sub-network failed to show significance. Figure 4 shows the corresponding pl|rks of crosstalkers. 10.1371/journal.pcbi.1000639.g005 Figure 5 Significant sub-networks induced by proteomic seeds. Network graph visualization of sub-networks induced by Friedman seed, scored using GSE10950 (a) and Nibbe seed, scored using GSE8671{{tag}} (b). Proteomic seeds that induced a significant crosstalker sub-network are shown in red, other proteomic seeds are shown in orange, crosstalkers are bla|age matched (N = normal/T = tumor) patient tissue biopsies not used in the original proteomic screen by Nibbe et. al. Values are in kilodalton (kDa). GSE8671{{tag}} and GSE10950 represent the ratio of the mean mRNA value (tumor/normal) from the respective microarray array. Fold change was determined by densitometry. Synergistic Dysregulation of Sub-Networks In| supposed that driver gene seeds (n = 42) might be superior both in terms of the number and significance of the sub-networks identified. As shown in Figure 7 , when scored by GSE8671{{tag}}, only four significant sub-networks were found. Strikingly, for every one of them, only the crosstalker sub-networks were significant. Using GSE10950, seven sub-networks of crosstalkers were signif| Methods ). The significant sub-networks in each group were first ranked by MI, and the features were valued by superposing the mRNA expression values of each gene in the sub-network. When trained on GSE10950 and validated on GSE8671{{tag}}, proteomic crosstalkers outperformed the interactor sub-networks (both proteomic and genomic) when the number of features used to train the classifier was three or less. B|on values for the genes were aggregated to compute a feature for each sub-network with significant MI. 
These features were used to train an SVM-based classifier to distinguish normal from tumor using GSE10950, and then cross-validated on GSE8671{{tag}} (a), and vice-versa (b). Discussion We have shown that proteomic targets showing significant expression changes for a complex phenotype, such as CRC, provide v|periments. As mentioned in the previous section, with respect to the proteomic seeds, a number of the same sub-networks showed significance (>1σ from background) when scored by either GSE10950 or GSE8671{{tag}}. With respect to the driver gene seed, every sub-network that showed significance when scored by the GSE8671{{tag}} array was also found to be significant when scored by the GSE10950 array. On|f proteomic seeds did not show complete redundancy between arrays is that the microarrays represent experiments performed on different pathologic stages of CRC tumors, very early stage in the case of GSE8671{{tag}} (adenoma) versus a more established tumor in GSE10950 (primary). The pathologic stage of the proteomic samples in the Nibbe seed was homogenous late stage CRC (Duke's D) while the Friedman seed was|the crosstalker (or interactor) subnetworks with synergistic differential expression (ϕ( Q )) one standard deviation above random mean, according to a specific mRNA expression data set (e.g., GSE8671{{tag}}). Assume that there are K such subnetworks. Then, for each k ≤K, we use the k subnetworks with maximum ϕ( Q ) to train an SVM classifier on the same data set (GSE8671{{tag}}--REUSE--), using M | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1397 | GSE8671 | 8/3/2007 | ['8671'] | ['2947'] | [u'18171984'] | 2845644 | [u'20361040'] | ['Clevers', 'Van', 'Ranalli', u'Sabates', 'Pastorelli', 'Luz', 'de', 'Menigatti', 'Rehrauer', 'Kurowski', 'Marra', 'Cattaneo', 'Gomes', 'Anti', 'Jiricny', 'Maake', 'Sabates-Bellver', 'Laczko', 'Bujnicki', 'Faggiani'] | ['Daigle', 'Cushman', 'McLaughlin', 'Tsao', 'Altman', 'Reaven', 'Cam', 'Deng'] | [] | PLoS Comput Biol | 2010 | 3/26/2010 | 0 | nificant DE genes from each dataset in its entirety; this resulted in 1122 (12.3%), 588 (4.4%), and 6002 (29.9%) DE genes for the prostate cancer, letrozole treatment (GEO ID: GSE5462), and colorectal cancer (GSE8671{{tag}}--REUSE--) datasets, respectively. After downloading the three corresponding knowledge compendia (minus the highly replicated datasets) and running SVD on each, we determined|ouse transcriptomes. Proc Natl Acad Sci U S A 99 4465 70 11904358 3 IGC 2008 expo (expression project for oncology). URL http://www.intgen.org/expo 4 Edgar R Domrachev M Lash AE 2002 Gene expression omnibus: Ncbi gene expression and hybridization array data repository. Nucleic Acids Res 30 207 10 11752295 5 Ein-Dor L Zuk O Domany E 2006 Thousands of samples are needed to generate a robust gene list for | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1398 | GSE8671 | 8/3/2007 | ['8671'] | ['2947'] | [u'18171984'] | 2948500 | [u'20957034'] | ['Clevers', 'Van', 'Ranalli', u'Sabates', 'Pastorelli', 'Luz', 'de', 'Menigatti', 'Rehrauer', 'Kurowski', 'Marra', 'Cattaneo', 'Gomes', 'Anti', 'Jiricny', 'Maake', 'Sabates-Bellver', 'Laczko', 'Bujnicki', 'Faggiani'] | ['Jarosz', 'Rubel', 'Oledzki', 'Paziewska', 'Pachlewski', 'Goryca', 'Skrzypczak', 'Mikula', 'Ostrowsk'] | [] | PLoS One | 2010 | 10/1/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
1399 | GSE8676 | 12/3/2007 | ['8676'] | [] | [] | 2335119 | [u'18331636'] | [u'Chardon', u'Flori', u'Cochet', u'Rogel-Gaillard', u'Lef\xe8vre', u'Lemonnier', u'Hugot', u'Robin'] | ['Lef\xc3\xa8vre', 'Hugot', 'Flori', 'Cochet', 'Rogel-Gaillard', 'Lemonnier', 'Chardon', 'Robin'] | ['Chardon', 'Flori', 'Cochet', 'Rogel-Gaillard', 'Lemonnier', 'Hugot', 'Robin'] | BMC Genomics | 2008 | 3/10/2008 | 0 | ème d'Information du projet d'Analyse des Genomes des Animaux d'Elevage) [ 20 ]. The SLA/PrV and the Qiagen-NRSP8 microarray data have been submitted to the GEO and received accession numbers GSE8676{{tag}}--DEPOSIT-- and GSE9259, respectively. Statistical data analysis The normalization and statistical analysis steps were performed with scripts written with R software. Functions contained in stats, anapuce and | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1400 | GSE8680 | 8/9/2007 | ['8680'] | [] | [] | 2443166 | [u'18577207'] | [u'Tschuch'] | ['Tschuch', 'Werft', 'Benner', 'Pscherer', 'Barrionuevo', 'Hotz-Wagenblatt', 'Lichter', 'Schulz', 'Mertens'] | [u'Tschuch'] | BMC Mol Biol | 2008 | 6/24/2008 | 0 | ue depicts the degree of deregulation, while the B value is a measure of significance of the deregulation. Raw and normalized data are deposited in the Gene Expression Omnibus database (accession No. GSE8680{{tag}}--DEPOSIT--) [ 31 ]. Real-time PCR quantification cDNA templates were generated from the corresponding mRNA using SuperScript II and anchored oligo-d(T) 20 primer (Invitrogen, Karlsruhe, Germany). For amplifi | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1401 | GSE8686 | 8/23/2007 | ['8686'] | ['2987'] | [u'18832915'] | 2713681 | [u'18832915'] | ['Ellis', 'Fu', 'Vozenin-Brotons', 'Loose', 'Hauer-Jensen', 'McGonigle', 'Sweetnam', 'Fink', 'Bartolozzi', 'Paradise', 'Boerma', 'Wang'] | ['Ellis', 'Fu', 'Vozenin-Brotons', 'Loose', 'Hauer-Jensen', 'McGonigle', 'Sweetnam', 'Fink', 'Bartolozzi', 'Paradise', 'Boerma', 'Wang'] | ['Ellis', 'Fu', 'Vozenin-Brotons', 'Loose', 'Hauer-Jensen', 'McGonigle', 'Sweetnam', 'Fink', 'Bartolozzi', 'Paradise', 'Boerma', 'Wang'] | Blood Coagul Fibrinolysis | 2008 | 2008 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
1402 | GSE8692 | 8/7/2007 | ['8692'] | ['3069'] | [u'17726534'] | 1950084 | [u'17726534'] | ['Puskar', 'Petzold', 'Santiago', 'Lao', 'Papagiannakopoulos', 'Lee', 'Kornblum', 'Kosik', 'Liu', 'Doyle', 'Nelson', 'Clay', 'Qi', 'Shraiman'] | ['Puskar', 'Petzold', 'Santiago', 'Lao', 'Papagiannakopoulos', 'Lee', 'Kornblum', 'Kosik', 'Liu', 'Doyle', 'Nelson', 'Clay', 'Qi', 'Shraiman'] | ['Puskar', 'Petzold', 'Santiago', 'Lao', 'Papagiannakopoulos', 'Lee', 'Kornblum', 'Kosik', 'Liu', 'Doyle', 'Nelson', 'Clay', 'Qi', 'Shraiman'] | PLoS One | 2007 | 8/29/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1403 | GSE8693 | 8/7/2007 | ['8693'] | [] | [u'17883843'] | 2099419 | [u'17883843'] | ['Hultin-Rosenberg', 'Kultima', 'Ellegren', 'Brunstr\xc3\xb6m', 'Scholz', u'Brunstr\xf6m', 'Dencker'] | ['Hultin-Rosenberg', 'Kultima', 'Ellegren', 'Brunstr\xc3\xb6m', 'Scholz', 'Dencker'] | ['Hultin-Rosenberg', 'Kultima', 'Brunstr\xc3\xb6m', 'Ellegren', 'Scholz', 'Dencker'] | BMC Biol | 2007 | 9/20/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1404 | GSE8700 | 8/7/2007 | ['8700'] | ['2946'] | [u'18239588'] | 2801700 | [u'20003344'] | ['Calley', 'Varga', 'Lawrence', 'Zhang', 'Hu', 'Surapaneni', 'Estrem', 'Gallagher', 'Chen', 'Li', 'Dow'] | ['Van', 'Marchi', 'Romualdi', 'M\xc3\xbcller', 'Radonjic', 'Calura', 'Cavalieri'] | [] | BMC Genomics | 2009 | 12/11/2009 | 0 | tails). Table 1 Meta-analysis data collection. PPARα signaling n° Reference Dataset Accession Number GEO/AE Org Tissue Technology PPARα signaling activated by WY14643 1 [ 55 ] GSE8302/E-GEOD-8302 Hs Liver Affymetrix 2 [ 55 ] GSE8302/E-GEOD-8302 Mm Liver Affymetrix 3 [ 55 ] GSE8302/E-GEOD-8302 Rn Liver Affymetrix PPARα signaling repressed using PPARα knokout mice |] GSE8291/E-GEOD-8291 Mm Liver Affymetrix 6 [ 55 ] GSE8292/E-GEOD-8292 Mm Liver Affymetrix 7 [ 55 ] GSE8295/E-GEOD-8295 Mm Liver Affymetrix PPARα signaling activated by High fat diet 8 [ 56 ] GSE8753/E-GEOD-8753 Mm Liver Affymetrix 9 [ 57 ] GSE6903/E-GEOD-6853 Mm Liver Affymetrix 10 [ 58 ] GSE8524/~ Mm Liver Affymetrix 11 [ 59 ] GSE1560/E-GEOD-1560 Mm Aorta Oligo Array 12 [ 60 ] GSE8700{{tag}}--REUSE--/E-GEOD-|Table 2 Validation data sets. 
n° PPARα signalling Reference Dataset Accession Number GEO/AE Org Tissue Technology 1 PPARα signaling activated by WY14643 (PPARα WY14643-GSE8396) [ 64 ] GSE8396/E-GEOD-8396 Mm Liver Affymetrix 2 PPARα signaling repressed using PPARα knokout mice (PPARα KO1-GSE8396) [ 64 ] GSE8396/E-GEOD-8396 Mm Liver Affymetrix 3 PPA|α signaling activated by High fat diet (HFD-E-MEXP-1755) [ 65 ] ~/E-MEXP-1755 Mm Liver Affymetrix 5 Oleate response repressed using knokout yeast of a transcription promoter (del_ADR1) [ 50 ] GSE5862/~ Sc ~ Oligo Array 6 Oleate response repressed using knokout yeast of a transcription promoter (del_PIP2) [ 50 ] GSE5862/~ Sc ~ Oligo Array 7 Oleate response repressed using knokout yeast of a tran|y 8 Oleate response activated using knokout yeast of a transcription repressor (del_OAF3) [ 50 ] GSE5862/~ Sc ~ Oligo Array 9 Oleate response activated by High fat diet (oleate_vs_low_glucose) [ 50 ] GSE5862/~ Sc ~ Oligo Array Statistical analysis of microarray data Gene expression of Affymetrix datasets were quantified and separately normalized using rma technique [ 17 ] and. EntrezGene Custom CDF f|ers of biological replicates as required for powerful inference. Fold change cut-off, filtered by variance coefficient, was used to select DEGs in those datasets with less than 3 replicates per gene (GSE8302, GSE9291 and GSE9290). Pathways analysis on DEGs Enrichment analysis on metabolic pathways was calculated for each dataset using Fisher exact test based on hypergeometric distribution with a p-val | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1405 | GSE8700 | 8/7/2007 | ['8700'] | ['2946'] | [u'18239588'] | 2972245 | [u'20961460'] | ['Calley', 'Varga', 'Lawrence', 'Zhang', 'Hu', 'Surapaneni', 'Estrem', 'Gallagher', 'Chen', 'Li', 'Dow'] | ['Wang', 'Zang', 'Kuroyanagi', 'Umemoto', 'Oka', 'Hirano', 'Shimada', 'Tanaka', 'Nishimura'] | [] | BMC Physiol | 2010 | 10/21/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
1406 | GSE8710 | 11/1/2007 | ['8710'] | ['3183'] | [u'17804806'] | 1976222 | [u'17804806'] | ['Biben', 'Woehl', 'Groom', 'Mackay', 'Leung', 'Batten', 'Harvey', 'Mellado', 'Mart\xc3\xadnez-Mu\xc3\xb1oz', u'Biben*', 'Li', u'Sierro*', 'Ransohoff', 'Sierro', u'MacKay', 'Mart\xc3\xadnez-A'] | ['Biben', 'Woehl', 'Groom', 'Mackay', 'Leung', 'Batten', 'Harvey', 'Mellado', 'Mart\xc3\xadnez-Mu\xc3\xb1oz', 'Li', 'Ransohoff', 'Sierro', 'Mart\xc3\xadnez-A'] | ['Woehl', 'Groom', 'Mackay', 'Leung', 'Batten', 'Harvey', 'Mellado', 'Mart\xc3\xadnez-Mu\xc3\xb1oz', 'Li', 'Biben', 'Sierro', 'Ransohoff', 'Mart\xc3\xadnez-A'] | Proc Natl Acad Sci U S A | 2007 | 9/11/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1407 | GSE8716 | 8/22/2007 | ['8716'] | [] | [u'17908821'] | 2802188 | [u'20008927'] | ['Farnham', 'Rabinovich', 'Oberley', 'Jin', 'Bieda', 'Green', 'Xu'] | ['Kraus', 'Krishnakumar', 'Yang', 'Frizzell', 'Gamble'] | [] | Genes Dev | 2010 | 1/1/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
1408 | GSE8716 | 8/22/2007 | ['8716'] | [] | [u'17908821'] | 2045138 | [u'17908821'] | ['Farnham', 'Rabinovich', 'Oberley', 'Jin', 'Bieda', 'Green', 'Xu'] | ['Farnham', 'Rabinovich', 'Oberley', 'Jin', 'Bieda', 'Green', 'Xu'] | ['Farnham', 'Rabinovich', 'Oberley', 'Jin', 'Bieda', 'Green', 'Xu'] | Genome Res | 2007 | 2007 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
1409 | GSE8725 | 8/15/2007 | ['8725'] | [] | [] | 2147049 | [u'18159241'] | [u'Mandrell', u'Wang', u'Ussery', u'Mendz', u'Rubenfield', u'Stolz', u'Malek', u'Rogosin', u'Miller', u'W\xf6sten', u'Parker', u'Stanker'] | ['Parker', 'Binnewies', 'Wang', 'Miller', 'Mendz', 'Rubenfield', 'Hallin', 'Stolz', 'W\xc3\xb6sten', 'Rogosin', 'Ussery', 'Malek', 'Mandrell', 'Stanker'] | [u'Mandrell', u'Wang', u'Ussery', u'Mendz', u'Rubenfield', u'Stolz', u'Malek', u'Rogosin', u'Miller', u'Parker', u'Stanker'] | PLoS One | 2007 | 12/26/2007 | 0 | hierarchical clustering with the standard correlation and bootstrapping. Microarray data have been deposited in the NCBI GEO repository ( http://www.ncbi.nlm.nih.gov/geo/ ) with the accession number GSE8725{{tag}}--DEPOSIT--. Supporting Information Figure S1 Diagram of the A. butzleri strain RM4018 genome. Genes and features are drawn to scale. Genes are colored according to role category. tRNA and rRNA loci are incl | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1410 | GSE8728 | 8/30/2007 | ['8728'] | [] | [] | 2258866 | [u'18203827'] | [u'Whitmann', u'Hendrickson', u'Haydock', u'Moore', u'Leigh'] | ['S\xc3\xb6ll', 'Leigh', 'Liu', 'Hendrickson', 'Porat', 'Rosas-Sandoval', 'Whitman'] | [u'Hendrickson', u'Leigh'] | J Bacteriol | 2008 | 2008 Mar | 0 | arison involved four technical replicates as described previously ( 33 ). Gene expression ratios were viewed using a TIGR MultiExperiment viewer ( 24 ). Data are available at the NCBI Gene Expression Omnibus (GEO) through accession numbers GSE6747 and GSE8728{{tag}}--DEPOSIT-- . 16S rRNA abundance was calculated by multiplying the yield of total RNA (μg per OD 660 per ml of culture) by 0.24. Real-time RT-PCR r | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1411 | GSE8731 | 8/11/2007 | ['8731'] | [] | [u'19450442'] | 2706329 | [u'19450442'] | ['Wride', 'Mansergh', 'Kisiswa', 'Evans', 'Geatrell', 'Gan', 'Jarrin', 'Boulton', 'Williams'] | ['Wride', 'Mansergh', 'Kisiswa', 'Evans', 'Geatrell', 'Gan', 'Jarrin', 'Boulton', 'Williams'] | ['Wride', 'Mansergh', 'Kisiswa', 'Evans', 'Geatrell', 'Gan', 'Jarrin', 'Boulton', 'Williams'] | Exp Eye Res | 2009 | 2009 Jun | 0 | AND pmc_gds | 1 | 0 | ||||
1412 | GSE8737 | 8/10/2007 | ['8737'] | [] | [u'18398503'] | 2289793 | [u'18398503'] | ['Werft', 'Pietsch', 'Thieme', 'Kratz', 'Janzarik', 'Scheurlen', 'Reifenberger', 'Remke', 'Olbrich', 'Herold-Mende', 'Joos', 'Korshunov', 'Ahmadi', 'Gnekow', 'Omran', 'Ernst', 'Becker', 'Toedt', 'Radlwimmer', 'Lichter', 'Kulozik', 'Pfister', 'Wittmann'] | ['Werft', 'Pietsch', 'Thieme', 'Kratz', 'Janzarik', 'Scheurlen', 'Reifenberger', 'Remke', 'Olbrich', 'Herold-Mende', 'Joos', 'Korshunov', 'Ahmadi', 'Gnekow', 'Omran', 'Ernst', 'Becker', 'Toedt', 'Radlwimmer', 'Lichter', 'Kulozik', 'Pfister', 'Wittmann'] | ['Werft', 'Pietsch', 'Thieme', 'Kratz', 'Janzarik', 'Scheurlen', 'Remke', 'Olbrich', 'Herold-Mende', 'Joos', 'Korshunov', 'Toedt', 'Ahmadi', 'Gnekow', 'Omran', 'Ernst', 'Becker', 'Reifenberger', 'Radlwimmer', 'Lichter', 'Kulozik', 'Pfister', 'Wittmann'] | J Clin Invest | 2008 | 2008 May | 0 | AND pmc_gds | 1 | 0 | ||||
1413 | GSE8739 | 11/21/2007 | ['8739'] | [] | [u'17933900'] | 2174696 | [u'17933900'] | ['Thomas', 'Zhang', 'Sun', 'Nambara', 'Park', 'Zentella', 'Jikumaru', 'Fleet', 'Kamiya', 'Endo', 'Murase'] | ['Thomas', 'Zhang', 'Sun', 'Nambara', 'Park', 'Zentella', 'Jikumaru', 'Fleet', 'Kamiya', 'Endo', 'Murase'] | ['Thomas', 'Zhang', 'Sun', 'Nambara', 'Park', 'Jikumaru', 'Zentella', 'Kamiya', 'Fleet', 'Endo', 'Murase'] | Plant Cell | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
1414 | GSE8740 | 10/1/2007 | ['8740'] | [] | [u'17953483'] | 2039767 | [u'17953483'] | ['Banu', 'Hudson', 'Coughlan', 'Harmer', 'Covington', 'Kaspi', 'Dehesh', 'Walley'] | ['Banu', 'Hudson', 'Coughlan', 'Harmer', 'Covington', 'Kaspi', 'Dehesh', 'Walley'] | ['Banu', 'Hudson', 'Coughlan', 'Harmer', 'Covington', 'Kaspi', 'Dehesh', 'Walley'] | PLoS Genet | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
1415 | GSE8741 | 11/21/2007 | ['8741'] | [] | [u'17933900'] | 2174696 | [u'17933900'] | ['Thomas', 'Zhang', 'Sun', 'Nambara', 'Park', 'Zentella', 'Jikumaru', 'Fleet', 'Kamiya', 'Endo', 'Murase'] | ['Thomas', 'Zhang', 'Sun', 'Nambara', 'Park', 'Zentella', 'Jikumaru', 'Fleet', 'Kamiya', 'Endo', 'Murase'] | ['Thomas', 'Zhang', 'Sun', 'Nambara', 'Park', 'Jikumaru', 'Zentella', 'Kamiya', 'Fleet', 'Endo', 'Murase'] | Plant Cell | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
1416 | GSE8746 | 8/14/2007 | ['8746'] | [] | [u'17868443'] | 2375028 | [u'17868443'] | ['King', 'Dukes', 'Watson', 'Britton', 'Abu-Median'] | ['King', 'Dukes', 'Watson', 'Britton', 'Abu-Median'] | ['King', 'Watson', 'Abu-Median', 'Dukes', 'Britton'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1417 | GSE8751 | 11/14/2007 | ['8751'] | [] | [u'17978103'] | 2049191 | [u'17978103'] | ['Bell', 'MacAlpine', 'Ahn', u'manak', 'Botchan', 'Cheung', 'Manak', 'Beall', 'Lewis', 'Speed', 'Georlette'] | ['Bell', 'MacAlpine', 'Ahn', 'Botchan', 'Cheung', 'Manak', 'Beall', 'Lewis', 'Speed', 'Georlette'] | ['Bell', 'MacAlpine', 'Ahn', 'Botchan', 'Cheung', 'Manak', 'Beall', 'Lewis', 'Speed', 'Georlette'] | Genes Dev | 2007 | 11/15/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1418 | GSE8753 | 12/1/2007 | ['8753'] | [] | [u'18292748'] | 2801700 | [u'20003344'] | ['Heng', 'Zhao', 'Chan'] | ['Van', 'Marchi', 'Romualdi', 'M\xc3\xbcller', 'Radonjic', 'Calura', 'Cavalieri'] | [] | BMC Genomics | 2009 | 12/11/2009 | 0 | tails). Table 1 Meta-analysis data collection. PPARα signaling n° Reference Dataset Accession Number GEO/AE Org Tissue Technology PPARα signaling activated by WY14643 1 [ 55 ] GSE8302/E-GEOD-8302 Hs Liver Affymetrix 2 [ 55 ] GSE8302/E-GEOD-8302 Mm Liver Affymetrix 3 [ 55 ] GSE8302/E-GEOD-8302 Rn Liver Affymetrix PPARα signaling repressed using PPARα knokout mice |] GSE8291/E-GEOD-8291 Mm Liver Affymetrix 6 [ 55 ] GSE8292/E-GEOD-8292 Mm Liver Affymetrix 7 [ 55 ] GSE8295/E-GEOD-8295 Mm Liver Affymetrix PPARα signaling activated by High fat diet 8 [ 56 ] GSE8753{{tag}}--REUSE--/E-GEOD-8753{{tag}}--REUSE-- Mm Liver Affymetrix 9 [ 57 ] GSE6903/E-GEOD-6853 Mm Liver Affymetrix 10 [ 58 ] GSE8524/~ Mm Liver Affymetrix 11 [ 59 ] GSE1560/E-GEOD-1560 Mm Aorta Oligo Array 12 [ 60 ] GSE8700/E-GEOD-|Table 2 Validation data sets. 
n° PPARα signalling Reference Dataset Accession Number GEO/AE Org Tissue Technology 1 PPARα signaling activated by WY14643 (PPARα WY14643-GSE8396) [ 64 ] GSE8396/E-GEOD-8396 Mm Liver Affymetrix 2 PPARα signaling repressed using PPARα knokout mice (PPARα KO1-GSE8396) [ 64 ] GSE8396/E-GEOD-8396 Mm Liver Affymetrix 3 PPA|α signaling activated by High fat diet (HFD-E-MEXP-1755) [ 65 ] ~/E-MEXP-1755 Mm Liver Affymetrix 5 Oleate response repressed using knokout yeast of a transcription promoter (del_ADR1) [ 50 ] GSE5862/~ Sc ~ Oligo Array 6 Oleate response repressed using knokout yeast of a transcription promoter (del_PIP2) [ 50 ] GSE5862/~ Sc ~ Oligo Array 7 Oleate response repressed using knokout yeast of a tran|y 8 Oleate response activated using knokout yeast of a transcription repressor (del_OAF3) [ 50 ] GSE5862/~ Sc ~ Oligo Array 9 Oleate response activated by High fat diet (oleate_vs_low_glucose) [ 50 ] GSE5862/~ Sc ~ Oligo Array Statistical analysis of microarray data Gene expression of Affymetrix datasets were quantified and separately normalized using rma technique [ 17 ] and. EntrezGene Custom CDF f|ers of biological replicates as required for powerful inference. Fold change cut-off, filtered by variance coefficient, was used to select DEGs in those datasets with less than 3 replicates per gene (GSE8302, GSE9291 and GSE9290). Pathways analysis on DEGs Enrichment analysis on metabolic pathways was calculated for each dataset using Fisher exact test based on hypergeometric distribution with a p-val | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1419 | GSE8757 | 8/16/2007 | ['8757'] | [] | [u'17925008', u'19336569'] | 2881367 | [u'20529912'] | ['Ellis', 'Brenton', 'Foekens', 'Porter', 'Smid', 'van', 'Zhang', 'Tavar\xc3\xa9', 'Marioni', 'Yu', 'Costa', 'Jiang', 'Thorne', 'Green', 'Caldas', 'Martens', 'Sieuwerts', 'Wang', 'Ylstra', 'Klijn', 'Teschendorff', 'Chin', 'Pinder', 'Barbosa-Morais'] | ['Earl', 'Zhu', 'Benz', 'Vaske', 'Haussler', 'Stuart', 'Sanborn', 'Szeto'] | [] | Bioinformatics | 2010 | 6/15/2010 | 0 | .1 Data sources Breast cancer copy number data from Chin et al. ( 2007 ) was obtained from NCBI Gene Expression Omnibus (GEO) under accessions GPL5737 with associated array platform annotation from GSE8757{{tag}}--REUSE--. Probe annotations were converted to BED15 format for display in the UCSC Cancer Genomics Browser (Zhu et al. , 2009 ) and subsequent analysis. Array data were mapped to probe annotations via pro | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1420 | GSE8757 | 8/16/2007 | ['8757'] | [] | [u'17925008', u'19336569'] | 2778664 | [u'19863778'] | ['Ellis', 'Brenton', 'Foekens', 'Porter', 'Smid', 'van', 'Zhang', 'Tavar\xc3\xa9', 'Marioni', 'Yu', 'Costa', 'Jiang', 'Thorne', 'Green', 'Caldas', 'Martens', 'Sieuwerts', 'Wang', 'Ylstra', 'Klijn', 'Teschendorff', 'Chin', 'Pinder', 'Barbosa-Morais'] | ['Costa', 'van', 'ten', 'Eijk', 'Welsh', 'Schmitt', 'Ylstra', 'Narvaez'] | ['Costa', 'van', 'Ylstra'] | BMC Genomics | 2009 | 10/28/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
1421 | GSE8757 | 8/16/2007 | ['8757'] | [] | [u'17925008', u'19336569'] | 2246289 | [u'17925008'] | ['Ellis', 'Brenton', 'Foekens', 'Porter', 'Smid', 'van', 'Zhang', 'Tavar\xc3\xa9', 'Marioni', 'Yu', 'Costa', 'Jiang', 'Thorne', 'Green', 'Caldas', 'Martens', 'Sieuwerts', 'Wang', 'Ylstra', 'Klijn', 'Teschendorff', 'Chin', 'Pinder', 'Barbosa-Morais'] | ['Tavar\xc3\xa9', 'Costa', 'van', 'Wang', 'Ylstra', 'Ellis', 'Thorne', 'Pinder', 'Teschendorff', 'Chin', 'Caldas', 'Brenton', 'Marioni', 'Green', 'Barbosa-Morais', 'Porter'] | ['Costa', 'Ellis', 'Wang', 'Ylstra', 'van', 'Thorne', 'Pinder', 'Brenton', 'Green', 'Caldas', 'Teschendorff', 'Marioni', 'Chin', 'Tavar\xc3\xa9', 'Barbosa-Morais', 'Porter'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1422 | GSE8758 | 10/30/2007 | ['8758'] | [] | [u'17720781'] | 2168715 | [u'17720781'] | ['Gill', 'Vickerman', 'Iobst', 'Jesionowski'] | ['Gill', 'Vickerman', 'Iobst', 'Jesionowski'] | ['Gill', 'Vickerman', 'Iobst', 'Jesionowski'] | J Bacteriol | 2007 | 2007 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
1423 | GSE8759 | 9/14/2007 | ['8759'] | ['2960'] | [u'17850668'] | 2174953 | [u'17850668'] | ['Ruzzo', 'Schwartz', 'Milewicz', 'Jaeger', 'Morale', 'Yao', u'Morales', 'Francke', 'Mulvihill', 'Emond'] | ['Ruzzo', 'Schwartz', 'Milewicz', 'Jaeger', 'Morale', 'Yao', 'Francke', 'Mulvihill', 'Emond'] | ['Ruzzo', 'Schwartz', 'Milewicz', 'Jaeger', 'Morale', 'Yao', 'Francke', 'Mulvihill', 'Emond'] | BMC Genomics | 2007 | 9/12/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1424 | GSE8761 | 11/6/2007 | ['8761'] | ['3061'] | [u'17981122'] | 2443060 | [u'17981122'] | ['Farny', 'Roth', 'Silver', 'Komili'] | ['Farny', 'Roth', 'Silver', 'Komili'] | ['Farny', 'Roth', 'Silver', 'Komili'] | Cell | 2007 | 11/2/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1425 | GSE8762 | 8/15/2007 | ['8762'] | ['2887'] | [u'17724341'] | 2620272 | [u'19014681'] | ['Isaacs', 'R\xc3\xa9gulier', u'R\xe9gulier', 'Kristiansen', 'Pratyaksha', 'Luthi-Carter', 'Runne', 'Kuhn', 'Delorenzi', 'Wild', 'Tabrizi'] | ['Tozeren', 'Gormley'] | [] | BMC Bioinformatics | 2008 | 11/17/2008 | 1 | or therapeutic targets in tissue-specific diseases. Table 1 Microarray datasets used in this study Tissue Phenotype Data Tissue No. of Samples Gene Expression Omnibus/Array Express Accn. # Adipose 10 GSE3526 Adrenal 20 GSE3526, GSE8514, GSE2316 Brain 89 GSE3526, GSE7621, GSE7307, GSE2361, E_AFMX-11, E-TABM-20, Colon 10 E-TABM-176, GSE8671, GSE9254, GSE9452 Epidermal 25 GSE1133, GSE2361, GSE3419, GSE352|38 E_AFMX-11, E-MIMR-27, GSE1133, GSE2240, GSE2361, GSE3526, GSE3585, GSE7307 Kidney 10 E_AFMX-11, GSE2004, GSE2361, GSE3526, GSE7392 Liver 10 E_AFMX-11, GSE2004, GSE3526, GSE6764 Lung 26 E-MEXP-231, GSE10072, GSE1133, GSE2361, GSE3526 Mammary 15 E-TABM-66, GSE2361, GSE3526, GSE7307, GSE7904 Muscle 64 GSE10760, GSE2328, GSE3526, GSE5110, GSE6798, GSE7307, GSE9103, Ovary 10 GSE2361, GSE3526, GSE6008, GS|ancreas 6 GSE1133, GSE2361, GSE7307 Peripheral blood 12 GSE7462, GSE8608, GSE8668, GSE8762{{tag}}--REUSE--, GSE9692 Small intestine 7 GSE2361, GSE7307 Spleen 12 GSE2004, GSE2361, GSE3526, GSE7307 Stomach 10 GSE2361, GSE3526, GSE7307 Testis 38 E_AFMX-11, GSE1133, GSE2361, GSE3218, GSE3526, GSE7307, GSE7808 Thymus 5 GSE1133, GSE2361, GSE7307 Infectious Disease Disease No. of Samples Gene Expression Omnibus/Array Express | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1426 | GSE8762 | 8/15/2007 | ['8762'] | ['2887'] | [u'17724341'] | 1964868 | [u'17724341'] | ['Isaacs', 'R\xc3\xa9gulier', u'R\xe9gulier', 'Kristiansen', 'Pratyaksha', 'Luthi-Carter', 'Runne', 'Kuhn', 'Delorenzi', 'Wild', 'Tabrizi'] | ['Isaacs', 'R\xc3\xa9gulier', 'Kristiansen', 'Pratyaksha', 'Luthi-Carter', 'Runne', 'Kuhn', 'Delorenzi', 'Wild', 'Tabrizi'] | ['Isaacs', 'R\xc3\xa9gulier', 'Kristiansen', 'Pratyaksha', 'Luthi-Carter', 'Runne', 'Kuhn', 'Delorenzi', 'Wild', 'Tabrizi'] | Proc Natl Acad Sci U S A | 2007 | 9/4/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1427 | GSE8765 | 11/2/2007 | ['8765'] | [] | [u'17981122'] | 2443060 | [u'17981122'] | ['Farny', 'Roth', 'Silver', 'Komili'] | ['Farny', 'Roth', 'Silver', 'Komili'] | ['Farny', 'Roth', 'Silver', 'Komili'] | Cell | 2007 | 11/2/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1428 | GSE8766 | 8/15/2007 | ['8766'] | [] | [u'19134196'] | 2656490 | [u'19134196'] | ['Wride', 'Mansergh', 'Evans', 'Hurley', 'Hunter', 'Daly'] | ['Wride', 'Mansergh', 'Evans', 'Hurley', 'Hunter', 'Daly'] | ['Wride', 'Mansergh', 'Evans', 'Hurley', 'Hunter', 'Daly'] | BMC Dev Biol | 2009 | 1/9/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
1429 | GSE8781 | 9/4/2007 | ['8781'] | [] | [u'17905996'] | 2168962 | [u'17905996'] | ['', 'Kuramitsu', 'Terada', 'Yokoyama', 'Shinkai', 'Ohbayashi', 'Shirouzu'] | ['Kuramitsu', 'Terada', 'Yokoyama', 'Shinkai', 'Ohbayashi', 'Shirouzu'] | ['Kuramitsu', 'Terada', 'Yokoyama', 'Shinkai', 'Ohbayashi', 'Shirouzu'] | J Bacteriol | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1430 | GSE8786 | 8/15/2007 | ['8786'] | [] | [u'15207491'] | 2600667 | [u'18985025'] | ['Voskuil', 'Visconti', 'Schoolnik'] | ['Bal\xc3\xa1zsi', 'Gennaro', 'Shi', 'Heath'] | [] | Mol Syst Biol | 2008 | 2008 | 0 | ). We identified significantly affected M. tuberculosis origons during hypoxia-induced growth arrest by feeding the newly assembled TR network and the recently published time course microarray data GSE8786{{tag}}--REUSE-- ( Voskuil et al , 2004 ) into NetReSFun . Briefly, the program calculates scaled cross-covariances cov i (τ) between the expression profile x i ( t ) of each gene i and a set of step f| chose the TF with more target genes to name the corresponding origons furB , Rv1931c , and sigE , respectively. Origons significantly affected during transition to non-replicative persistence The GSE8786{{tag}}--REUSE-- microarray dataset that we used ( Voskuil et al , 2004 ) consisted of two time series: aerated growth and growth arrest in hypoxia. For each gene, we used its expression at day 0 in aerated growth | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1431 | GSE8788 | 8/17/2007 | ['8788'] | ['2944'] | [u'17724128'] | 2949890 | [u'20831831'] | ['Kawai', 'Kumar', 'Takeda', 'Ishii', 'Takeuchi', 'Akira', 'Matsuura', 'Okamoto', 'Saitoh', 'Uematsu', 'Yamamoto', 'Satoh', 'Sato'] | ['Monsonego', 'Rubin', 'Geifman'] | [] | BMC Bioinformatics | 2010 | 9/12/2010 | 0 | ound 0.456971* 0.505641 External side of plasma membrane 0.744269 0.508488* Not Found (A) Enrichment in genes with up-regulated expression in hippocampus treated with Fluoxetine, based on GEO profile GSE6476. All terms passed FDR > 0.25 and state the FDR values. (B) GEO profile GSE6675. All terms passed FDR < 0.25 and state the FDR values (Up-regulated in Control in comparison to FGF2 t|ture. † indicates terms that were not included in the NIGO subset. For three GEO expression profiles, the results of the analysis are described in detail (Table 1 ). In the analysis of the GSE6476, in which the effect of chronic Fluoxetine treatment on hippocampal gene expression was examined [ 10 ], eleven terms passed statistic filtering with NIGO and not with the full GO, out of which 7 w|ded in the full GO but not in NIGO passed the statistic cutoffs but this term ('Protein self-association') was functionally irrelevant and contributed very little to the analysis. For the GEO profile GSE6675, in which astroglial gene expression program elicited by fibroblast growth factor-2 was examined, four terms relevant to neural/immune systems passed statistic filtering with NIGO and not with the |ith the full GO. This is an example of how without the use of NIGO one would have to raise the cutoff to at least 0.55 in order for this term to appear in the analysis results. Functional analysis of GSE6509 with NIGO also revealed statistically significant terms that were missed when using the full GO. 
This experiment involved microarray expression profiling designed to explore the effect of RU486, a |ed both in NIGO and the full GO, received higher FDR values when NIGO was used (Table 2 ), contrary to our expectations from the impact of reducing the ontology size. For example, in the analysis of GSE8788{{tag}}--REUSE-- in which gene expression analysis was conducted using Trib1-deficient macrophages treated with LPS as compared to LPS-treated wild-type macrophages [ 12 ], 11 out of 14 terms (that passed statistic|ment in FDR values when using NIGO is partially due to the stochastic nature of the GSEA algorithm. To test this hypothesis, the same analysis was repeated three times with each of the ontologies for GSE8788{{tag}}--REUSE--. FDR values were averaged and a comparison of analysis results was performed based on these averaged FDR values. In accordance with our hypothesis, the averaging of FDR values improved the apparent|l GO and GO-slim in enrichment analysis using GSEA Unique to Subset Lowest FDR Value Experiment Cutoff used Full GO NIGO Generic GO slim Full GO NIGO Generic GO slim Neural- or immune-related studies GSE6509 P < 0.05 2 (2) 5 0 0 2 0 GSE6675 FDR < 0.25 0 6 0 0 1 0 GSE6476 FDR < 0.25 1 (1) 11 0 0 4 0 GSE3779 P < 0.05 7 (3) 13 1 (1) 0 1 5 GSE8425 P < 0.05 6 (1) 0 0 | 0 0 0 1 GSE8788{{tag}}--REUSE-- P < 0.01 6 (6) 0 0 11 3 1 GSE8788{{tag}}--REUSE--* P < 0.01 6 (6) 0 0 3 11 1 GSE9659 P < 0.01 6 (6) 0 NS A 4 28 NS A Non-neural- or immune-related studies (negative controls) GSE7407 P < 0.01 5 (5) 0 0 0 1 2 GSE2259 FDR < 0.25 3 (1) 0 1 (1) 1 2 1 GSE8191 P < 0.05 5 (2) 0 0 0 0 0 The enrichment analysis results of GSEA, providing the full GO, NIGO or GO-s| for each subset given in separate columns. *To test the effect of stochasticity, the analysis of this profile is based on 3-fold averaged FDR values from three independent GSEA analyses. A Profile GSE9659 involves rat tissue, and was not analyzed with the mouse GO-slim. 
Three non-neural- or immune-related datasets were used to test the performance of NIGO in functional analysis of microarray data in|. These datasets included the GEO expression profile, GSE7407, in which gene expression in heart tissue with cardiac specific over-expression of Sirt1 was examined [ 13 ], the GEO expression profile, GSE8191, in which the gene expression profile of mammary glands from pregnant mice was compared to that of mammary glands from lactating mice [ 14 ], and the GEO expression profile, GSE2259, in which gene |in enrichment analysis using the Fisher Exact Test Unique to Subset Lowest FDR Value Experiment Cutoff used Full GO NIGO Generic GO slim Full GO NIGO Generic GO slim Neural- or immune-related studies GSE6509 P < 0.0001 1 (1) 1 0 0 22 2 GSE6675 P < 0.05 1 (1) 0 11 (1) 0 2 0 GSE6476 P < 0.0001 0 2 2 0 15 3 GSE3779 P < 0.1 0 0 0 0 2 0 GSE8425 P < 0.1 0 0 0 0 0 0 GSE|0.001 0 2 0 0 9 1 GSE6136 P < 0.000001 0 1 0 0 37 7 GSE8788{{tag}}--REUSE-- P < 0.01 0 2 0 0 4 1 GSE9659 P < 0.001 0 3 NS A 0 33 NS A Non-neural- or immune-related studies (negative controls) GSE7407 P < 0.0000000001 33 (30) 0 0 9 0 0 GSE2259 P < 0.1 63 (35) 0 2 8 8 7 GSE8191 P < 0.001 6 (6) 7 6 (2) 0 16 3 Enrichment analysis results from Ontologizer, providing the full |ional GO subset (the full GO or generic slim) are also shown for each term ('Lowest P-Value'), with the number of terms with the lowest P-values for each subset given in separate columns. A Profile GSE9659 involves rat tissue, and was not analyzed with the mouse GO-slim. These results, together with the analysis of five additional neural/immune-related experiments not described in Table 1 (namely p| cell's activity. These results further demonstrate that analysis with NIGO can enhance interpretation of functional analysis results produced for relevant microarray datasets. In the analysis of the GSE6509 expression dataset, three relevant terms passed the statistical cutoff with NIGO but not with the full GO. 
These terms, 'viral envelope', 'viral infectious cycle' and 'viral capsid' are all terms r|et. Such terms received FDR or p-values that were very close (but larger) than the cutoff values used. This is partially explained by the stochastic nature of the GSEA algorithm. Indeed, for one set (GSE8788{{tag}}--REUSE--), we compared the raw results with averaged FDR values. Averaging dramatically decreased the number of such terms. Furthermore, we conducted a similar functional analysis using the Fisher Exact Tes|nous peptide antigen via MHC class I' and 'Positive regulation of T cell mediated cytotoxicity' without defeating the purpose of the slimming process. Yet these two terms were found to be enriched in GSE6476, and are crucial for generating a hypothesis based on the expression profile. This shows that GO slims may be complemented by small, yet fully detailed domain-specific subsets of GO. Figure 4 The U|plementary/NIGO/Supplementary.html . The ontologies were clipped using the Protégé 4.0 beta OWL editor [ 26 ]. For by-species filtering, annotation files for human, rat and mouse were downloaded (October 2008) from GOA-EBI [ 27 ]. Association files used for GSEA analysis were generated based on the GOA-EBI annotation files and in the format required by GSEA. In this format, each row repres|a analysis and functional analysis were conducted using the GenePattern [ 29 ], GSEA [ 9 ] (release 2.5) web servers, and Ontologizer [ 16 ] as follows: (1) for each study, raw data (.CEL files) were downloaded from GEO [ 30 ]; (2) expression files (.gct files) were created using the Gene Pattern Expression File Creator module; (3) where necessary (i.e. for expression files GSE6509, GSE6675, GSE6476, GSE7|was conducted using the GSEA module. GSEA was run three times for each dataset, using a different GO version for each run. For the full GO, we used the organism-specific GO subset. 
In the analysis of GSE8788{{tag}}--REUSE--, GSEA was run three times for each of the three ontologies and FDR values were averaged over the three runs. (5) Differentially-expressed genes were found using the Gene Pattern ComparativeMarkerSe|, to an input term (Figure 4 ). The connections of UMLS concepts to the tested GO term are provided by UMLS and defined within the UMLS data files. Microarray data sets All microarray data sets were downloaded from GEO at NCBI [ 30 ]. The GEO sets used in this study are described in Additional file 2 . Availability NIGO is freely available as Additional file 6 and for download from: http://bioinfo.bg|re included or excluded from NIGO. Click here for file Additional file 2 Microarray Datasets for Comparative analysis of NIGO, GO and GO-slim . This file contains a summary of the microarray datasets downloaded from GEO at NCBI and used to test the performance of NIGO. Click here for file Additional file 3 GSEA analysis results . This file contains a summary of the GSEA analysis results for each of the mi|etroviral infectivity through different mechanisms Am J Physiol Lung Cell Mol Physiol 2009 297 L538 545 10.1152/ajplung.00162.2009 19561138 The Gene Ontology Consortium http://www.geneontology.org/GO.downloads.ontology.shtml Web Ontology Language Guide http://www.w3.org/TR/owl-guide Protégé 4.0 beta http://protege.stanford.edu Barrell D Dimmer E Huntley RP Binns D O'Donovan C Apweiler R Th | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1432 | GSE8788 | 8/17/2007 | ['8788'] | ['2944'] | [u'17724128'] | 2118688 | [u'17724128'] | ['Kawai', 'Kumar', 'Takeda', 'Ishii', 'Takeuchi', 'Akira', 'Matsuura', 'Okamoto', 'Saitoh', 'Uematsu', 'Yamamoto', 'Satoh', 'Sato'] | ['Kawai', 'Kumar', 'Takeda', 'Ishii', 'Takeuchi', 'Akira', 'Matsuura', 'Okamoto', 'Saitoh', 'Uematsu', 'Yamamoto', 'Satoh', 'Sato'] | ['Kawai', 'Kumar', 'Takeda', 'Ishii', 'Akira', 'Matsuura', 'Okamoto', 'Takeuchi', 'Uematsu', 'Yamamoto', 'Satoh', 'Saitoh', 'Sato'] | J Exp Med | 2007 | 9/3/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1433 | GSE8801 | 9/12/2007 | ['8801'] | [] | [u'17850661'] | 2216076 | [u'17850661'] | ['Roydasgupta', 'Gray', 'Hubbard', 'Moore', 'Dairkee', 'Chew', 'Yau', 'Schittulli', 'Tommasi', 'Benz', 'Fridlyand', 'Fedele', 'Paradiso', 'Albertson'] | ['Roydasgupta', 'Gray', 'Hubbard', 'Moore', 'Dairkee', 'Chew', 'Yau', 'Schittulli', 'Tommasi', 'Benz', 'Fridlyand', 'Fedele', 'Paradiso', 'Albertson'] | ['Roydasgupta', 'Gray', 'Hubbard', 'Moore', 'Dairkee', 'Chew', 'Yau', 'Schittulli', 'Tommasi', 'Benz', 'Fridlyand', 'Fedele', 'Paradiso', 'Albertson'] | Breast Cancer Res | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1434 | GSE8810 | 8/21/2007 | ['8810'] | [] | [u'17967063'] | 2041996 | [u'17967063'] | ['Ahmed', 'Issa', 'Shu', 'Zhang', 'Shen', 'Guo', 'Kondo', 'Chen', 'Waterland'] | ['Ahmed', 'Issa', 'Shu', 'Zhang', 'Shen', 'Guo', 'Kondo', 'Chen', 'Waterland'] | ['Ahmed', 'Issa', 'Shu', 'Zhang', 'Shen', 'Guo', 'Kondo', 'Chen', 'Waterland'] | PLoS Genet | 2007 | 2007 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
1435 | GSE8817 | 9/17/2007 | ['8817'] | [] | [u'17889666'] | 2081968 | [u'17889666'] | ['Bergkessel', 'Pleiss', 'Guthrie', 'Whitworth'] | ['Bergkessel', 'Pleiss', 'Guthrie', 'Whitworth'] | ['Bergkessel', 'Pleiss', 'Guthrie', 'Whitworth'] | Mol Cell | 2007 | 9/21/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1436 | GSE8818 | 9/24/2007 | ['8818'] | ['2984'] | [u'17785439'] | 2169070 | [u'17785439'] | ['Huelsken', u'Parisi', 'Robine', u'Naef', 'Fevr', 'Louvard'] | ['Robine', 'Huelsken', 'Louvard', 'Fevr'] | ['Robine', 'Huelsken', 'Louvard', 'Fevr'] | Mol Cell Biol | 2007 | 2007 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
1437 | GSE8826 | 8/21/2007 | ['8826'] | [] | [] | 2996971 | [u'20507624'] | [u'Evensen', u'Koop', u'Cooper', u'Mutoloki', u'Marjara'] | ['Koop', 'Marjara', 'Cooper', 'Mutoloki', 'Evensen'] | ['Evensen', 'Koop', 'Cooper', 'Mutoloki', 'Marjara'] | BMC Genomics | 2010 | 5/27/2010 | 0 | tions: a ratio of at least 1.5 fold change in at least 3 of the six replicates and also a statistical p value of < 0.05. The data is deposited at NCBI's GEO repository under accession number GSE8826{{tag}}--DEPOSIT-- and Platform GPL2716 . Validation of microarray results by quantitative RT-PCR The results obtained by microarray experiments were verified by real-time quantitative PCR (qPCR) by using the Light | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1438 | GSE8832 | 8/22/2007 | ['8832'] | ['2970'] | [u'18413791'] | 2376757 | [u'18413791'] | ['Farkas', 'Tsai', 'Alvarez', 'Chou', 'Gottesfeld', 'Dervan'] | ['Farkas', 'Tsai', 'Alvarez', 'Chou', 'Gottesfeld', 'Dervan'] | ['Farkas', 'Tsai', 'Alvarez', 'Chou', 'Gottesfeld', 'Dervan'] | Mol Cancer Ther | 2008 | 2008 Apr | 0 | AND pmc_gds | 1 | 0 | ||||
1439 | GSE8835 | 8/22/2007 | ['8835'] | [] | [u'15965501'] | 2831002 | [u'20064233'] | ['Gribben', 'Neuberg', u'G\xf6rg\xfcn', 'Zahrieh', 'G\xc3\xb6rg\xc3\xbcn', 'Holderried'] | ['Berger', 'De', 'Pierre', 'Depiereux', 'Bareke', 'Gaigneaux'] | [] | BMC Bioinformatics | 2010 | 1/11/2010 | 0 | Data collection (cel files) was performed using Gene Expression Omnibus [19] on the Affymetrix platform HG-U133a (Human Genome model U133a). This collection consists of 34 datasets (table 2) for which there are at least 15 replicates for each of 2 different experimental conditions. {{key}}--REUSE-- | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1440 | GSE8835 | 8/22/2007 | ['8835'] | [] | [u'15965501'] | 3002992 | [u'21187905'] | ['Gribben', 'Neuberg', u'G\xf6rg\xfcn', 'Zahrieh', 'G\xc3\xb6rg\xc3\xbcn', 'Holderried'] | ['Kanduri', 'Santoni', 'Castiglione', 'Hovig', 'Clancy', 'Pedicini', 'Barren\xc3\xa4s', 'Benson'] | [] | PLoS Comput Biol | 2010 | 12/16/2010 | 0 | cells in health and disease We proceeded to examine how the in silico findings related to in vitro studies of T-cells from healthy controls and patients with different T-cell related diseases. We downloaded several sets of gene expression microarray data from the public domain to test whether Th1 and Th2 genes were inversely correlated in T-cell related diseases. If Th1 and Th2 cells are antagonists w| was considered irrelevant. Compilation and analysis of gene expression microarray data To examine whether Th1 and Th2 gene activation patterns denoted two opposed pathways, gene expression data were downloaded from Gene Expression Omnibus ( http://www.ncbi.nlm.nih.gov/geo/ ). Datasets were selected based on the criteria that they i) measured mRNA expression from CD4+ cells from healthy controls o|-cell related diseases ( e.g. , SLE) and ii) that there were at least 5 samples per disease or per controls, ( Table 5 ). 10.1371/journal.pcbi.1001032.t005 Table 5 Gene expression microarray datasets downloaded from the Gene Expression Omnibus repository. GEO Accession Number Disorder GSE4588 Systemic Lupus Erythematosus (SLE), Rheumatoid Arthritis (RA) GSE6740 HIV GSE8835{{tag}}--REUSE-- B cell chronic lymphocytic leuke| GSE9927 Type I HIV (HIV-I) GSE10586 Type 1 Diabetes (T1D) GSE12079 Hypereosinophilic syndrome GSE13732 Clinically Isolated Syndrome - Multiple Sclerosis GSE14317 Adult T-cell leukemia/lymphoma (ATL) GSE14924 Acute Myeloid Leukaemia (AML) GSE17354 Adenosine deaminase (ADA) - Severe combined immunodeficiency (SCID) (Therapy treated) Differentially expressed genes between patients and controls in each di | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1441 | GSE8835 | 8/22/2007 | ['8835'] | [] | [u'15965501'] | 2669383 | [u'19332800'] | ['Gribben', 'Neuberg', u'G\xf6rg\xfcn', 'Zahrieh', 'G\xc3\xb6rg\xc3\xbcn', 'Holderried'] | ['Le', 'Gribben', 'Croce', 'Gorgun', 'Quackenbush', 'Liu', 'Zahrieh', 'Holderried', 'Ramsay'] | ['Holderried', 'Zahrieh', 'Gribben'] | Proc Natl Acad Sci U S A | 2009 | 4/14/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
1442 | GSE8836 | 12/1/2007 | ['8836'] | [] | [u'19332800'] | 2669383 | [u'19332800'] | ['Le', 'Gribben', 'Croce', 'Gorgun', 'Quackenbush', u'G\xf6rg\xfcn', 'Liu', 'Zahrieh', 'Holderried', 'Ramsay'] | ['Le', 'Gribben', 'Croce', 'Gorgun', 'Quackenbush', 'Liu', 'Zahrieh', 'Holderried', 'Ramsay'] | ['Le', 'Gribben', 'Croce', 'Gorgun', 'Quackenbush', 'Liu', 'Zahrieh', 'Holderried', 'Ramsay'] | Proc Natl Acad Sci U S A | 2009 | 4/14/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
1443 | GSE8841 | 8/25/2007 | ['8841'] | [] | [u'19047114'] | 2689870 | [u'19393097'] | ['', 'Apolone', 'Cinquini', 'Clivio', "D'Incalci", 'Mariani', 'Marchini', 'Marrazzo', 'Garbi', 'Torri', 'Chiorino', 'Bonomi', 'Fruscio', 'Broggini', "Dell'Anna"] | ['Nakai', 'Mizuno', 'Kitada', 'Sarai'] | [] | BMC Med Genomics | 2009 | 4/24/2009 | 1 | ccess and analyze without an effective analysis platform. Description To take advantage of public resources in full, a database named "PrognoScan" has been developed. This is 1) a large collection of publicly available cancer microarray datasets with clinical annotation, as well as 2) a tool for assessing the biological relationship between gene expression and prognosis. PrognoScan employs the minimum P |-analysis of multiple datasets. Conclusion PrognoScan provides a powerful platform for evaluating potential tumor markers and therapeutic targets and would accelerate cancer research. The database is publicly accessible at . Background A number of genes are recognized as being potentially relevant to cancers. One way to evaluate such genes is to assess their relationship to prognosis. At present, many ca|transformation [ 6 ], and MYC for tumor maintenance [ 7 ], and provided a rationale for the application to gene expression. Thus, we developed "PrognoScan", a database featuring a large collection of publicly available cancer microarray datasets with clinical annotation and a tool for assessing the relationship between gene expression and prognosis using the minimum P -value approach. This database enabl|ll accelerate cancer research. Construction and content Data collection Cancer microarray datasets with clinical annotation were intensively collected from the public domain including Gene Expression Omnibus (GEO) [ 8 ], ArrayExpress [ 9 ] and individual laboratory web sites, under the following criteria: 1) includes patient information on survival event and time, 2) contains large enough sample sizes to|as possible. 
All tables were relationally linked and stored in the MySQL server. Table 1 Dataset content from PrognoScan Dataset Cancer type Subtype Cohort Author/Contributor Array type n Data source GSE13507 Bladder cancer Transitional cell carcinoma Cheongju Kim Human-6 v2 n = 165 GEO GSE5287 Bladder cancer Aarhus (1995–2004) Als et al . [ 10 ] HG-U133A n = 30 GEO GSE12417-GPL570 Blood cance|-GPL96 Blood cancer AML AMLCG (1999–2003) Metzeler et al . [ 11 ] HG-U133A n = 163 GEO GSE12417-GPL97 Blood cancer AML AMLCG (1999–2003) Metzeler et al . [ 11 ] HG-U133B n = 163 GEO GSE8970 Blood cancer AML San Diego Raponi et al . [ 12 ] HG-U133A n = 34 GEO GSE4475 Blood cancer B-cell lymphoma Berlin (2003–2005) Hummel et al . [ 13 ] HG-U133A n = 158 GEO E-TABM-346 Blood ca|E2658 Blood cancer Multiple myeloma Arkansas Zhan et al . [ 15 ] HG-U133_Plus_2 n = 559 GEO E-TABM-158 Breast cancer UCSF, CPMC (1989–1997) Chin et al . [ 16 ] HG-U133A n = 129 ArrayExpress GSE11121 Breast cancer Mainz (1988–1998) Schmidt et al . [ 17 ] HG-U133A n = 200 GEO GSE1378 Breast cancer MGH (1987–2000) Ma et al . [ 18 ] Arcturus 22 k n = 60 GEO GSE1379 Breast cancer|6-GPL96 Breast cancer Stockholm (1994–1996) Pawitan et al . [ 19 ] HG-U133A n = 159 GEO GSE1456-GPL97 Breast cancer Stockholm (1994–1996) Pawitan et al . [ 19 ] HG-U133B n = 159 GEO GSE2034 Breast cancer Rotterdam (1980–1995) Wang et al . [ 20 ] HG-U133A n = 286 GEO GSE2990 Breast cancer Uppsala, Oxford Sotiriou et al . [ 21 ] HG-U133A n = 187 GEO GSE3143 Breast cancer Duke |GSE3494-GPL96 Breast cancer Uppsala (1987–1989) Miller et al . [ 23 ] HG-U133A n = 236 GEO GSE3494-GPL97 Breast cancer Uppsala (1987–1989) Miller et al . [ 23 ] HG-U133B n = 236 GEO GSE4922-GPL96 Breast cancer Uppsala (1987–1989) Ivshina et al . [ 24 ] HG-U133A n = 249 GEO GSE4922-GPL97 Breast cancer Uppsala (1987–1989) Ivshina et al . [ 24 ] HG-U133B n = 249 GEO GSE|SE7378 Breast cancer UCSF Zhou et al . 
[ 26 ] U133AAofAv2 n = 54 GEO GSE7390 Breast cancer Uppsala, Oxford, Stockholm, IGR, GUYT, CRH (1980–1998) Desmedt et al . [ 27 ] HG-U133A n = 198 GEO GSE7849 Breast cancer Duke (1990–2001) Anders et al . [ 28 ] HG-U95A n = 76 GEO GSE9195 Breast cancer GUYT2 Loi et al . [ 25 ] HG-U133_Plus_2 n = 77 GEO GSE9893 Breast cancer Montpellier, Bordeau|man 21 K V12.0 n = 155 GEO GSE11595 Esophagus cancer Adenocarcinoma Sutton Giddings CRUKDMF_22 K_v1.0.0 n = 34 GEO GSE7696 Glioma Glioblastoma Lausanne Murat et al . [ 30 ] HG-U133_Plus_2 n = 70 GEO GSE4271-GPL96 Glioma MDA Phillips et al . [ 31 ] HG-U133A n = 77 GEO GSE4271-GPL97 Glioma MDA Phillips et al . [ 31 ] HG-U133B n = 77 GEO GSE2837 Head and neck cancer Squamous cell carcinoma VUMC, VAMC, |Adenocarcinoma Harvard Beer et al . [ 33 ] HG-U95A n = 84 Author's web site MICHIGAN-LC Lung cancer Adenocarcinoma Michigan (1994–2000) Beer et al . [ 33 ] HuGeneFL n = 86 Author's web site GSE11117 Lung cancer NSCLC Basel Baty Novachip human 34.5 k n = 41 GEO GSE3141 Lung cancer NSCLC Duke Bild et al . [ 22 ] HG-U133_Plus_2 n = 111 GEO GSE4716-GPL3694 Lung cancer NSCLC Nagoya (1995–|da et al . [ 34 ] GF200 n = 50 GEO GSE4716-GPL3696 Lung cancer NSCLC Nagoya (1995–1996) Tomida et al . [ 34 ] GF201 n = 50 GEO GSE8894 Lung cancer NSCLC Seoul Son HG-U133_Plus_2 n = 138 GEO GSE4573 Lung cancer Squamous cell carcinoma Michigan (1991–2002) Raponi et al . [ 35 ] HG-U133A n = 129 GEO DUKE-OC Ovarian cancer Duke Bild et al . [ 22 ] HG-U133A n = 134 Author's web site GSE8|en in red font. Each dataset has a link to the public domain where the raw data is archived. By clicking a probe ID in the summary table, a detailed report for the test is displayed. The table can be downloaded in a tab delimited file from the button at bottom. Figure 2 PrognoScan screenshot and sample search results (part 2) . (A) Annotation table. Row headings are color-coded. 
For example, headings of d| cancer prognosis and MCTS1 to brain, blood, breast and lung cancer prognosis for the first time. PrognoScan aims to fulfill such substantial practical requirements. Regarding survival analysis using publicly available microarray datasets, several considerations exist: 1) Cohorts. Datasets come from a number of different institutions around the world, and patient backgrounds differ. In addition, several d|subsequent care may affect the clinical course of a patient. 3) Experimental factors. Expression measurement of microarray is subject to various factors at the experiment level. Microdissection (e.g. GSE1378) would reduce contamination of mRNAs from non-cancer cells [ 57 ]. Formalin fixation of a sample (e.g. GSE2873) influences the quality of mRNAs [ 58 ]. Array type (e.g. Affymetrix, cDNA microarrays | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1444 | GSE8842 | 8/25/2007 | ['8842'] | [] | [u'19047114'] | 2707631 | [u'19609451'] | ['', 'Apolone', 'Cinquini', 'Clivio', "D'Incalci", 'Mariani', 'Marchini', 'Marrazzo', 'Garbi', 'Torri', 'Chiorino', 'Bonomi', 'Fruscio', 'Broggini', "Dell'Anna"] | ['Jiang', 'Lee', 'Zhang', 'Song', 'Liu', 'Zhao', 'Fan'] | [] | PLoS One | 2009 | 7/17/2009 | 1 | (MD Anderson cancer center database) [14] and UCSF-2 (Stanford microarray database) [15] and three HGG (grade III and GBM combined) sets from the cohorts UCLA (GEO GDS1975) [3] , MDA (GEO GDS1815) [4] , and CMBC (BROAD institute database) [2] ( Table 1 ). Among the five cohorts, UCLA, UCSF-1 and MDA have 35, 34, and|a I (63); II (20) d UM-HLM [23] Oligos Affymetrix 56 66 (10) 118 a I (160); II (48) d Bladder AUH [24] Oligos Affymetrix 13 NA 30 a III+IV (30) c Ovarium MNI(GSE8842{{tag}}--REUSE--) Spotted cDNA 80 52 (12) 13 a I (68) d a Death. b Metastasis. c Tumor grade. d Tumor stage. NA, not available. m, month. Yr, year. Ref, reference. Using the median OS as a cutoff for each cohort, w|iction of the three gene classifiers for patients with other tumor types, we obtained 12 cohorts including five breast cancer cohorts: GIS (ArrayExpress E-GEOD-3494) [16] , CRCM (GEO GSE9893) [17] , SUSM (Stanford microarray database) [18] , NCI (Rosetta inpharmatics inc database) [19] , EMC (GEO GSE2034) [20] , five l| , PCH (GEO GSE5843) [22] , CAN/DF (caArray) [23] , MSK (caArray) [23] , UM-HLM (caArray) [23] , one bladder cancer cohort AUH (GEO GSE5287) [24] , and one ovarian cancer cohort MNI (GEO GSE8842{{tag}}--REUSE--) with microarray expression data and clinicopathogical information publicly available (detailed in Materials and methods ) (|y stage I) from EMC [20] ; one bladder cancer set of 30 advanced bladder cancers from AUH [24] ; one ovarian tumor set of 68 stage I ovarian carcinomas from MNI (GEO GSE8842{{tag}}--REUSE--). 
For the two breast cancer cohorts NCI and EMC, where the overall survival times were unavailable, time to distant metastasis was used instead. For all the cohorts, we used normalized microarray d | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1445 | GSE8842 | 8/25/2007 | ['8842'] | [] | [u'19047114'] | 2829240 | [u'19724865'] | ['', 'Apolone', 'Cinquini', 'Clivio', "D'Incalci", 'Mariani', 'Marchini', 'Marrazzo', 'Garbi', 'Torri', 'Chiorino', 'Bonomi', 'Fruscio', 'Broggini', "Dell'Anna"] | ['Brown', 'van', 'Potter', 'Lin', 'Cohn', 'Chan', 'Crijns', 'Fabbri', 'Liu', 'Yan', 'Liyanarachchi', 'Huang', 'Nephew', 'Jansen'] | [] | Oncol Rep | 2009 | 2009 Oct | 0 | ovarian tumors. It is noted that 80 out of 83 ovarian tumors at FIGO stage I showed down-expression, compared with the controls derived from the pooled normal tissues (GEO database, accession number GSE8842{{tag}}--MENTION-- ). Because promoter hypermethylation is a progressive as well as cumulative event ( 36 ) and commonly leads to gene silencing, down-regulation of HAAO in the early stage of ovarian cancer support | 0 | 0 | 1 | NOT pmc_gds | 0 | 0 |
1446 | GSE8847 | 11/15/2007 | ['8847'] | [] | [] | 2290968 | [u'18404217'] | [u'Idaghdour', u'Gibson', u'Story'] | ['Idaghdour', 'Gibson', 'Jadallah', 'Storey'] | [u'Idaghdour', u'Gibson'] | PLoS Genet | 2008 | 4/11/2008 | 0 | transcripts with 500 ng of labeled cRNA for each sample and following manufacturer's recommended protocols. All expression data are available at NCBI Gene Expression Ominbus (GEO) under series number GSE8847{{tag}}--DEPOSIT--. The individual expression arrays are listed as GSM219988 through GSM220033. An Excel spreadsheet list of all differentially expressed gene is also available online at the PLoS Genetics website as | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1447 | GSE8849 | 8/24/2007 | ['8849'] | [] | [u'19014493'] | 2605480 | [u'19014493'] | ['Suttle', 'Anderson', 'Horvath', 'Thimmapuram', 'Chao'] | ['Suttle', 'Anderson', 'Horvath', 'Thimmapuram', 'Chao'] | ['Anderson', 'Suttle', 'Thimmapuram', 'Horvath', 'Chao'] | BMC Genomics | 2008 | 11/12/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1448 | GSE8853 | 12/21/2007 | ['8853'] | ['3223'] | [u'18073124', u'20208004'] | 2556388 | [u'18836535'] | ['Burwinkel', 'Abonia', 'Caldwell', 'Lu', 'Collins', 'Franciosi', 'Putnam', 'Kushner', 'Wells', 'Jameson', 'Wu', 'Kaul', 'Ahrens', 'Rothenberg', 'Greenberg', 'Mingler', 'Martin', 'Stucke', 'Vicario', 'Blanchard', 'Buckmeier', u'Ying'] | ['Abbott', 'Rubin', 'Halpern', 'Ramsey', 'Stephan', 'Hen', 'Alter'] | [] | PLoS One | 2008 | 10/6/2008 | 1 | AND pmc_gds | 0 | 1 | ||||
1449 | GSE8855 | 9/7/2007 | ['8855'] | [] | [u'17851529'] | 2802188 | [u'20008927'] | ['Lan', 'Wang', u'Demeter', 'Canaani', 'Issaeva', 'Roberts', 'Rinn', 'Bayliss', 'Alpatov', 'Shi', 'Chen', 'Iwase', 'Chang', 'Whetstine'] | ['Kraus', 'Krishnakumar', 'Yang', 'Frizzell', 'Gamble'] | [] | Genes Dev | 2010 | 1/1/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
1450 | GSE8857 | 10/16/2007 | ['8857'] | [] | [u'17974019'] | 2213678 | [u'17974019'] | [u'O\u2019Farrelly', 'MacHugh', 'Fitzsimons', 'Zhao', 'Meade', 'Doyle', 'Gormley', "O'Farrelly", 'Keane', 'Costello'] | ['MacHugh', 'Fitzsimons', 'Zhao', 'Meade', 'Doyle', 'Gormley', "O'Farrelly", 'Keane', 'Costello'] | ['MacHugh', 'Fitzsimons', 'Zhao', 'Meade', 'Doyle', 'Gormley', "O'Farrelly", 'Keane', 'Costello'] | BMC Genomics | 2007 | 10/31/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1451 | GSE8862 | 12/1/2007 | ['8862'] | [] | [u'18059476'] | 2206128 | [u'18059476'] | ['Huff', 'Cairns', 'Parnell'] | ['Huff', 'Cairns', 'Parnell'] | ['Huff', 'Cairns', 'Parnell'] | EMBO J | 2008 | 1/9/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1452 | GSE8864 | 8/31/2007 | ['8864'] | [] | [u'17905860'] | 2048804 | [u'17905860'] | ['Kumar', 'Banhara', 'Meyerowitz', 'Riechmann', 'Alves-Ferreira', 'Wellmer'] | ['Kumar', 'Banhara', 'Meyerowitz', 'Riechmann', 'Alves-Ferreira', 'Wellmer'] | ['Kumar', 'Banhara', 'Meyerowitz', 'Riechmann', 'Alves-Ferreira', 'Wellmer'] | Plant Physiol | 2007 | 2007 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
1453 | GSE8872 | 8/28/2007 | ['8872'] | [] | [u'17804603'] | 2722667 | [u'19602279'] | [u'Yiwen', 'Vandenborne', 'Chen', 'Shi', 'Gregory', 'Scarborough', 'Walter'] | ['Subramaniam', 'Pont\xc3\xa9n', 'Smith', 'Hedstr\xc3\xb6m', 'Lieber', 'Chambers', 'Ward'] | [] | BMC Med Genomics | 2009 | 7/14/2009 | 0 | AND pmc_gds | 0 | 1 | ||||
1454 | GSE8884 | 11/1/2007 | ['8884'] | [] | [u'17999768'] | 2631022 | [u'19105848'] | ['Feng', 'Lu', 'Atala', 'Hipp', 'Lanza'] | ['Tozeren', 'Ertel'] | [] | BMC Genomics | 2008 | 12/23/2008 | 1 | To further investigate the correlation between histone methylation and bimodal gene expression, we gathered additional microarray samples corresponding to H9 stem cells (GEO dataset accession numbers GSE9865, GSE8884{{tag}}--REUSE--, and GSE2248) and evaluated the mode of expression for bimodal genes within those H9 stem cell samples as well as liver samples within our dataset. We identified a group of bimodal genes a| interplay between histone methylation and bimodal gene expression, we gathered additional microarray samples corresponding to H9 stem cells (samples GSM249282, GSM225045, and GSM38629, from datasets GSE9865, GSE8884{{tag}}--REUSE--, and GSE2248, respectively) and evaluated the mode of expression for bimodal genes within that those H9 stem cells as well as liver samples within our dataset. Using the binomial distribut|61 10.1093/nar/gkf492 Lareau LF Brooks AN Soergel DA Meng Q Brenner SE The coupling of alternative splicing and nonsense-mediated mRNA decay Adv Exp Med Biol 2007 623 190 211 18380348 Gene expression omnibus Barrett T Edgar R Gene expression omnibus: microarray data storage, submission, retrieval, and analysis Methods Enzymol 2006 411 352 369 16939800 10.1016/S0076-6879(06)11019-8 ArrayExpress Brazma A P | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1455 | GSE8884 | 11/1/2007 | ['8884'] | [] | [u'17999768'] | 2268390 | [u'18212061'] | ['Feng', 'Lu', 'Atala', 'Hipp', 'Lanza'] | ['Gatta', 'Dolfini', 'Vigan\xc3\xb2', 'Pavesi', 'Mantovani', 'Ceribelli', 'Merico'] | [] | Mol Cell Biol | 2008 | 2008 Mar | 0 | of four experiments for NF-Y or two out of three experiments for active histone marks, with corresponding probability values computed with a binomial distribution. Profiling experiments. The GEO GSE6022 data set, containing a total of three untreated HeLa cell samples, each being a biological replicate, was used for HeLa profiling experiments. The GEO GSE6207 data set, containing seven untreate| Sato et al. ( 45 ), containing three untreated samples, each being a biological replicate (Affimetrix HG-U133-A platform), was used for human embryonic stem (ES) cell profiling experiments. The GEO GSE8884{{tag}}--REUSE-- (Affimetrix HG-U133 plus 2.0 platform) was also used for the latter analysis, yielding essentially the same results despite the larger coverage of the platform (not shown). Expression signals for |is coupled to inactive genes. FIG. 5. Correlation between NF-Y binding, active histone marks, and gene expression. (A) NF-Y + promoters were scored for the presence (P) or absence (A) in the GSE6022 data set after the intersection with histone mark experiments. (B) Same as panel A, (more ...) FIG. 5. Correlation between NF-Y binding, active histone marks, and gene expression. (A) NF-Y | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1456 | GSE8884 | 11/1/2007 | ['8884'] | [] | [u'17999768'] | 2258184 | [u'17999768'] | ['Feng', 'Lu', 'Atala', 'Hipp', 'Lanza'] | ['Feng', 'Lu', 'Atala', 'Hipp', 'Lanza'] | ['Feng', 'Lu', 'Atala', 'Hipp', 'Lanza'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1457 | GSE8886 | 9/5/2007 | ['8886'] | [] | [u'18413601'] | 2299225 | [u'18413601'] | ['Taoufik', 'Cerami', 'Probert', 'Petit', 'Tseveleki', 'Quackenbush', 'Roberts', 'Divoux', 'Brines', 'Mengozzi', 'Valable', 'Ghezzi'] | ['Taoufik', 'Cerami', 'Probert', 'Petit', 'Tseveleki', 'Quackenbush', 'Roberts', 'Divoux', 'Brines', 'Mengozzi', 'Valable', 'Ghezzi'] | ['Taoufik', 'Cerami', 'Mengozzi', 'Petit', 'Tseveleki', 'Quackenbush', 'Roberts', 'Divoux', 'Brines', 'Probert', 'Valable', 'Ghezzi'] | Proc Natl Acad Sci U S A | 2008 | 4/22/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1458 | GSE8894 | 11/20/2007 | ['8894'] | [] | [u'19010856'] | 2689870 | [u'19393097'] | ['Lee', 'Han', 'Shim', 'Ko', 'Choi', 'Park', 'Jung', 'Son', 'Huh', 'Jo', 'Kim', 'Lim'] | ['Nakai', 'Mizuno', 'Kitada', 'Sarai'] | [] | BMC Med Genomics | 2009 | 4/24/2009 | 1 | as possible. All tables were relationally linked and stored in the MySQL server. Table 1 Dataset content from PrognoScan Dataset Cancer type Subtype Cohort Author/Contributor Array type n Data source GSE13507 Bladder cancer Transitional cell carcinoma Cheongju Kim Human-6 v2 n = 165 GEO GSE5287 Bladder cancer Aarhus (1995–2004) Als et al . [ 10 ] HG-U133A n = 30 GEO GSE12417-GPL570 Blood cance|-GPL96 Blood cancer AML AMLCG (1999–2003) Metzeler et al . [ 11 ] HG-U133A n = 163 GEO GSE12417-GPL97 Blood cancer AML AMLCG (1999–2003) Metzeler et al . [ 11 ] HG-U133B n = 163 GEO GSE8970 Blood cancer AML San Diego Raponi et al . [ 12 ] HG-U133A n = 34 GEO GSE4475 Blood cancer B-cell lymphoma Berlin (2003–2005) Hummel et al . [ 13 ] HG-U133A n = 158 GEO E-TABM-346 Blood ca|E2658 Blood cancer Multiple myeloma Arkansas Zhan et al . [ 15 ] HG-U133_Plus_2 n = 559 GEO E-TABM-158 Breast cancer UCSF, CPMC (1989–1997) Chin et al . [ 16 ] HG-U133A n = 129 ArrayExpress GSE11121 Breast cancer Mainz (1988–1998) Schmidt et al . [ 17 ] HG-U133A n = 200 GEO GSE1378 Breast cancer MGH (1987–2000) Ma et al . [ 18 ] Arcturus 22 k n = 60 GEO GSE1379 Breast cancer|6-GPL96 Breast cancer Stockholm (1994–1996) Pawitan et al . [ 19 ] HG-U133A n = 159 GEO GSE1456-GPL97 Breast cancer Stockholm (1994–1996) Pawitan et al . [ 19 ] HG-U133B n = 159 GEO GSE2034 Breast cancer Rotterdam (1980–1995) Wang et al . [ 20 ] HG-U133A n = 286 GEO GSE2990 Breast cancer Uppsala, Oxford Sotiriou et al . [ 21 ] HG-U133A n = 187 GEO GSE3143 Breast cancer Duke |GSE3494-GPL96 Breast cancer Uppsala (1987–1989) Miller et al . [ 23 ] HG-U133A n = 236 GEO GSE3494-GPL97 Breast cancer Uppsala (1987–1989) Miller et al . 
[ 23 ] HG-U133B n = 236 GEO GSE4922-GPL96 Breast cancer Uppsala (1987–1989) Ivshina et al . [ 24 ] HG-U133A n = 249 GEO GSE4922-GPL97 Breast cancer Uppsala (1987–1989) Ivshina et al . [ 24 ] HG-U133B n = 249 GEO GSE|SE7378 Breast cancer UCSF Zhou et al . [ 26 ] U133AAofAv2 n = 54 GEO GSE7390 Breast cancer Uppsala, Oxford, Stockholm, IGR, GUYT, CRH (1980–1998) Desmedt et al . [ 27 ] HG-U133A n = 198 GEO GSE7849 Breast cancer Duke (1990–2001) Anders et al . [ 28 ] HG-U95A n = 76 GEO GSE9195 Breast cancer GUYT2 Loi et al . [ 25 ] HG-U133_Plus_2 n = 77 GEO GSE9893 Breast cancer Montpellier, Bordeau|man 21 K V12.0 n = 155 GEO GSE11595 Esophagus cancer Adenocarcinoma Sutton Giddings CRUKDMF_22 K_v1.0.0 n = 34 GEO GSE7696 Glioma Glioblastoma Lausanne Murat et al . [ 30 ] HG-U133_Plus_2 n = 70 GEO GSE4271-GPL96 Glioma MDA Phillips et al . [ 31 ] HG-U133A n = 77 GEO GSE4271-GPL97 Glioma MDA Phillips et al . [ 31 ] HG-U133B n = 77 GEO GSE2837 Head and neck cancer Squamous cell carcinoma VUMC, VAMC, |Adenocarcinoma Harvard Beer et al . [ 33 ] HG-U95A n = 84 Author's web site MICHIGAN-LC Lung cancer Adenocarcinoma Michigan (1994–2000) Beer et al . [ 33 ] HuGeneFL n = 86 Author's web site GSE11117 Lung cancer NSCLC Basel Baty Novachip human 34.5 k n = 41 GEO GSE3141 Lung cancer NSCLC Duke Bild et al . [ 22 ] HG-U133_Plus_2 n = 111 GEO GSE4716-GPL3694 Lung cancer NSCLC Nagoya (1995–|da et al . [ 34 ] GF200 n = 50 GEO GSE4716-GPL3696 Lung cancer NSCLC Nagoya (1995–1996) Tomida et al . [ 34 ] GF201 n = 50 GEO GSE8894{{tag}}--REUSE-- Lung cancer NSCLC Seoul Son HG-U133_Plus_2 n = 138 GEO GSE4573 Lung cancer Squamous cell carcinoma Michigan (1991–2002) Raponi et al . [ 35 ] HG-U133A n = 129 GEO DUKE-OC Ovarian cancer Duke Bild et al . [ 22 ] HG-U133A n = 134 Author's web site GSE8|horts. Datasets come from a number of different institutions around the world, and patient backgrounds differ. 
In addition, several datasets are based on specific subpopulations, for example, dataset GSE2034 is from lymph node-negative breast cancers, and GSE5287 is from cisplatin-containing chemotherapy-treated bladder cancers. Hence, it is possible that the specific association between gene expressio|subsequent care may affect the clinical course of a patient. 3) Experimental factors. Expression measurement of microarray is subject to various factors at the experiment level. Microdissection (e.g. GSE1378) would reduce contamination of mRNAs from non-cancer cells [ 57 ]. Formalin fixation of a sample (e.g. GSE2873) influences the quality of mRNAs [ 58 ]. Array type (e.g. Affymetrix, cDNA microarrays | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1459 | GSE8894 | 11/20/2007 | ['8894'] | [] | [u'19010856'] | 2988703 | [u'21047417'] | ['Lee', 'Han', 'Shim', 'Ko', 'Choi', 'Park', 'Jung', 'Son', 'Huh', 'Jo', 'Kim', 'Lim'] | ['Watkinson', 'Anastassiou', 'Varadan', 'Kim'] | ['Kim'] | BMC Med Genomics | 2010 | 11/3/2010 | 0 | ata sets in the paper is given in Table 1 . They were identified by searching for rich data sets focused on a specific cancer in two public databases, The Cancer Genome Atlas and the Gene Expression Omnibus data depository. Furthermore, for the data sets initially used to infer the gene signature we required that they have well annotated staging information associated with the samples and that they cont|ists of Data sets in the paper Data set name Source Site GEO Accession Affymetrix platform Sample size TCGA Ovarian Cancer The Cancer Genome Atlas HT_HG-U133A a 377 CCR Ovarian Cancer Gene Expression Omnibus GSE9891 HG-U133_Plus_2 b 285 CCR Colon Cancer Gene Expression Omnibus GSE14333 HG-U133_Plus_2 290 Moffitt Colon Cancer Gene Expression Omnibus GSE17536 HG-U133_Plus_2 177 Singapore Gastric Cancer Gen|mnibus GSE15459 HG-U133_Plus_2 200 CCR Breast Cancer Gene Expression Omnibus GSE7390 HG-U133A c 198 Wang Breast Cancer Gene Expression Omnibus GSE2034 HG-U133A 286 Samsung Lung Cancer Gene Expression Omnibus GSE8894{{tag}}--REUSE-- HG-U133_Plus_2 138 Bild Lung Cancer Gene Expression Omnibus GSE3141 HG-U133_Plus_2 111 Neuroblastoma tumor Gene Expression Omnibus GSE3960 HG_U95Av2 102 Neoadjuvant Breast Cancer Gene Express|aging phenotype (specific to each cancer type) suggests that it could be used as a "proxy" of the MAF signature. 
This would allow us to improve on the gene list of Table 2 by making use of numerous publicly available gene expression data sets of cancers of many types, even without any staging information, as long as the MAF signature is present in a sizeable subset of them, aiming at finding the "inters|s-inhibiting therapeutic intervention targeting the MAF mechanism would be widely applicable to low-stage tumors. Conclusions In conclusion, we have shown that, using purely computational analysis of publicly available biological information, systems biology has revealed the core of a multi-cancer metastasis-associated gene expression signature. In the near future, a vast amount of additional information | mutual information with COL11A1 . Click here for file Additional file 5 Heat map of neuroblastoma data set . This file contains the result of hierarchical clustering for the neuroblastoma data set (GSE3960) using the MAF signature genes. Click here for file Additional file 6 Heat map of breast cancer data set using MAF signature genes . This file contains the result of hierarchical clustering for the|ure genes. Click here for file Additional file 7 Heat map of breast cancer data set using DCN metagene set . This file contains the result of hierarchical clustering for the breast cancer data set (GSE4779) using the DCN metagene set. Click here for file Acknowledgements Appreciation is expressed to Prof. Jessica Kandel, MD for helpful discussions. This work was supported by university inventor's ( | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1460 | GSE8894 | 11/20/2007 | ['8894'] | [] | [u'19010856'] | 2742946 | [u'19576624'] | ['Lee', 'Han', 'Shim', 'Ko', 'Choi', 'Park', 'Jung', 'Son', 'Huh', 'Jo', 'Kim', 'Lim'] | ['Kris', 'Massagu\xc3\xa9', 'Gerald', 'Zhang', 'Nguyen', 'Kim', 'Ladanyi', 'Chiang'] | ['Kim'] | Cell | 2009 | 7/10/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
1461 | GSE8910 | 9/14/2007 | ['8910'] | [] | [u'17199047'] | 2683727 | [u'19401680'] | ['Lipshitz', 'Babak', 'Vardy', u'Simbert', 'Goldman', 'Menzies', 'Hughes', 'Orr-Weaver', 'Westwood', 'Smibert', 'Tadros'] | ['Stormo', 'Foat'] | [] | Mol Syst Biol | 2009 | 2009 | 1 | vts1 strain, Oberstrass et al (2006) used microarrays to look for mRNAs that were differentially expressed in a wild-type versus a Δ vts1 strain (Gene Expression Omnibus (GEO) accession GSE3859). They confirmed with Northern blots that a few transcripts were present at different levels between the two strains and contained predicted Vts1p-binding sites. Aviv et al (2006b) used a pull-|ree microarray time courses in Figure 3B examined gene expression in activated Drosophila eggs or embryos during early embryogenesis ( Pilot et al , 2006 ; Tadros et al , 2007 ; GEO accessions GSE8910{{tag}}--REUSE--, GSE3955). In both wild-type time courses, having high-affinity Smaug-binding sites correlates with reduced mRNA concentration starting at the 2–4 h time point and T1 (slow phase). This obs| the heavier fractions. Qin et al (2007) performed such microarray profiling of mRNA–ribosome association during a time course of the first 10 h of Drosophila development (GEO accession GSE5430). By examining the TFAPs of the Smaug specificities over two replicates each of a 0–2 h sample and a 4–6 h sample, we see that mRNAs that are bound by Smaug are being specifically e| activity that we see with the Smaug SCREs. The Dm3 and Dm4 specificities were detected using microarray data that compared expression in wild-type flies and flies lacking the Kep1 RBP (GEO accession GSE6086), suggesting that they may represent the specificity of Kep1. There were no significant themes among Gene Ontology, phenotype, or in situ annotations for Dm3 and Dm4, besides that their targets s|icing ( Fruscio et al , 2003 ; Robard et al , 2006 ). 
Finally, Dm5 and Dm6 both were detected using polysome association data from the early Drosophila embryo ( Qin et al , 2007 ; GEO accession GSE5430). Both have strong correlations during the 0–2 h time point and have almost no effect by 4–6 h. Dm5 has a strong positive correlation with the lightest fraction, suggesting that tra|c colorectal cancer cell line, SW620, versus a non-metastatic cell line from the same patient, SW480, as measured in a polysome association microarray study ( Provenzani et al , 2006 ; GEO accession GSE2509). Transcripts containing Hs2 SCREs are expressed at a lower level in U937 cells that have been exposed to 12-myristate 13-acetate (PMA) and caused to differentiate into a macrophage-like state ( Ki| correlate with increased association with ribosomes in human mammary epithelial cells, regardless of whether translation initiation factor 4F is overexpressed ( Larsson et al , 2007 ; GEO accession GSE6043). Although a Smaug homolog exists in the human genome, we did not detect a Smaug/Vts1p-like specificity in the data that we analyzed. Nevertheless, we could calculate TFAPs for the Drosophila Sma|ok for Smaug activities that were too weak to detect in the original search. Indeed, there were two RBP pull-down microarray studies, one for poly-pyrimidine tract binding protein (PTB; GEO accession GSE6021; Gama-Carvalho et al , 2006 ) and one for Staufen1 and Staufen2 ( Furic et al , 2008 ; GEO accessions GSE8437, GSE8438), where pulled-down mRNAs were enriched for Smaug-binding sites ( Figure 7C|m2 did not correlate with mRNA levels in similarly treated Δ smg eggs (not shown). Dm3 and Dm4 correlated with mRNA levels changing between wild-type and Δ kep1 flies (GEO accession GSE6086), suggesting that Dm3 and Dm4 may reflect the specificity of Kep1, an RNA-binding protein. ( C ) Dm5 and Dm6 were detected from microarray data measuring mRNA association with ribosomes in early dr | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1462 | GSE8910 | 9/14/2007 | ['8910'] | [] | [u'17199047'] | 2965385 | [u'20858238'] | ['Lipshitz', 'Babak', 'Vardy', u'Simbert', 'Goldman', 'Menzies', 'Hughes', 'Orr-Weaver', 'Westwood', 'Smibert', 'Tadros'] | ['Alonso', 'Thomsen', 'Huber', 'Janga', 'Anders'] | [] | Genome Biol | 2010 | 2010 | 0 | on [ 68 ], lists of up- and down-regulated proteins [ 69 ] and a list of genes with AREs [ 71 ] were obtained from the literature; for Smaug, we reanalyzed available raw data from the Gene Expression Omnibus [GEO:GSE8910{{tag}}--REUSE--] as described [ 43 ] and considered 260 genes with the highest differential expression as targets. For all lists, we retained only genes represented on Drosophila Genome 2.0 Gene Chips | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1463 | GSE8919 | 11/4/2007 | ['8919'] | [] | [u'17982457'] | 2940763 | [u'20862329'] | ['Holmans', 'Leung', 'Zismann', 'Heward', 'Nath', 'Stephan', 'Gibbs', 'Joshipura', 'Hu-Lince', 'Rohrer', 'Coon', 'Pearson', 'Huentelman', 'Reiman', 'Zhao', 'Webster', 'Myers', 'Marlowe', 'Bryden', 'Craig', 'Hardy', 'Kaleem'] | ['Harold', 'Goate', 'Peskind', 'Galasko', 'Mayo', 'Owen', 'Holtzman', 'Cruchaga', 'Li', 'Morris', 'Hollingworth', 'Lovestone', 'Spiegel', 'Nowotny', 'Kauwe', 'Bertelsen', 'Shah', 'Abraham', 'Fagan', 'Williams', 'Leverenz'] | [] | PLoS Genet | 2010 | 9/16/2010 | 0 | 1D ). Rs12713636, in LD with rs1868402 (D′ = 1, R 2  = 0.75) also shows association with PPP3R1 mRNA levels and in the same direction in the publicly available GEO GSE8919{{tag}}--MENTION-- [34] dataset ( P  = 0.015). Discussion In the present study we have used a novel and powerful endophenotype-based approach to identify |em interval, APOE genotype, and CDR). One-tailed P-values were calculated, because a priori predictions were made based on the associations with CSF ptau 181 levels. We also used the GEO dataset GSE8919{{tag}}--REUSE-- [51] to analyze the association between rs12713636, in LD with rs1868402 (D′ = 1, R 2  = 0.75) and PPP3R1 gene expression | 1 | 0 | 1 | NOT pmc_gds | 0 | 1 |
1464 | GSE8919 | 11/4/2007 | ['8919'] | [] | [u'17982457'] | 2761903 | [u'19772654'] | ['Holmans', 'Leung', 'Zismann', 'Heward', 'Nath', 'Stephan', 'Gibbs', 'Joshipura', 'Hu-Lince', 'Rohrer', 'Coon', 'Pearson', 'Huentelman', 'Reiman', 'Zhao', 'Webster', 'Myers', 'Marlowe', 'Bryden', 'Craig', 'Hardy', 'Kaleem'] | ['Gillis', 'Pavlidis'] | [] | BMC Bioinformatics | 2009 | 9/22/2009 | 0 | experiment name, organism part, array design and age category for the experiments are listed in each column. Experiments used for analysis . Gemma ID Name Organism part Array Design Age category 622 GSE8586 Umbilical cord GPL570 Prenatal 726 GSE9164 Foreskin cells GPL5876 Prenatal 233 GSE1397 Brain, heart GPL96 Prenatal 215 khatua-astrocytoma Brain GPL91 Child/young adult 218 pomeroy-embryonal Brain, |Child/young adult 555 GSE5808 Blood cell GPL96 Child/young adult 585 GSE7586 Placenta GPL570 Adult 178 GSE80 Muscle GPL91 Adult 633 GSE8607 Testis GPL91 Adult 275 GSE4757 Brain GPL570 Older adult 721 GSE8919{{tag}}--REUSE-- Brain GPL2700 Older adult 263 GSE5281 Brain GPL570 Older adult To allow the investigation of differential expression over age, we computed a relative rank-based measure of expression level for each | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1465 | GSE8919 | 11/4/2007 | ['8919'] | [] | [u'17982457'] | 2831070 | [u'20209083'] | ['Holmans', 'Leung', 'Zismann', 'Heward', 'Nath', 'Stephan', 'Gibbs', 'Joshipura', 'Hu-Lince', 'Rohrer', 'Coon', 'Pearson', 'Huentelman', 'Reiman', 'Zhao', 'Webster', 'Myers', 'Marlowe', 'Bryden', 'Craig', 'Hardy', 'Kaleem'] | ['Santiago', 'Guerreiro', 'Oliveira', 'Ribeiro', 'Hardy', 'Fox', 'Singleton', 'Rossor', 'Beck', 'Collinge', 'Santana', 'Mead', 'Gibbs', 'Nalls', 'Schott'] | ['Hardy', 'Gibbs'] | PLoS One | 2010 | 3/3/2010 | 0 | anscripts we conducted an expression quantitative loci (eQTL) analysis for CLU genomic region, for gene expression in cortical tissues from a group of 174 neurologically normal controls (GEO Series GSE8919{{tag}}--REUSE--, Myers et al 2007). Results Sequencing Analysis A total of twenty-four variants were found in both cohorts. From these, eleven were observed in only one subject ( Table 1 ). These eleven variants i|tatistically significant results within the immediate region of AD associated loci for CLU ( Figure 2 ). We were, however, able to confirm that CLU levels are higher in AD cases than in controls (GSE15222) [11] , [20] , though this is difficult to interpret because of the changing cellular composition of diseased tissue. 10.1371/journal.pone.0009510.g002 Figure 2 M|ntrols from the United States, obtained from our previously published eQTL analysis [31] . Gene expression data are available at NCBI's Gene Expression Omnibus (GEO Series Accession, GSE8919{{tag}}--REUSE--) and genotype data are available from TGEN ( http://www.tgen.org/research/neuro_gab2.cfm ). Based on genotype data samples were checked for quality of genotyping based upon genotype call rate per s | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1466 | GSE8920 | 11/1/2007 | ['8920'] | [] | [] | 2821899 | [u'19351829'] | [u'Gong', u'Drobizhev', u'McInnerney', u'Spangler', u'Starkey', u'Rebane', u'Elliott', u'Meng'] | ['Rudin', 'Parmigiani', 'Marchionni', 'Rhodes', 'Devereux', 'Hierman', 'Daniel', 'Peacock', 'Dorsch', 'Watkins', 'Yung'] | [] | Cancer Res | 2009 | 4/15/2009 | 0 | AND pmc_gds | 0 | 1 | ||||
1467 | GSE8934 | 11/5/2007 | ['8934'] | [] | [u'17975066'] | 3017766 | [u'20565716'] | ['Wang', 'Dinneny', 'Benfey', 'Orlando', 'Lee', 'Mace', 'Ohler', 'Koch', 'Brady'] | ['Ahnert', 'Fink', 'Benfey', 'Orlando', 'Brady'] | ['Benfey', 'Orlando', 'Brady'] | BMC Genomics | 2010 | 6/16/2010 | 0 | nterest. The dataset we chose was a recent synchrony/release time-series microarray dataset from the yeast Saccharomyces cerevisiae measuring gene expression through the cell cycle (GEO Accession: GSE8799 )[ 2 ]. In the synchrony/release protocol used by the study, a population of cells is synchronized to early G1 phase. The cells are subsequently released to progress through the cell cycle, during |itudinal axis. In the work of Brady et al. [ 3 ], two developmental microarray time courses were generated by taking 12 or 13 successive transverse sections along an Arabidopsis root (GEO Accession: GSE8934{{tag}}--REUSE-- ). We use these two time-courses as the replicates (removing the 1 st section of the 13 section time-course) and use developmental time as the known time scale. We analyzed each of the 22746 genes | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1468 | GSE8936 | 9/5/2007 | ['8936'] | [] | [u'17881564'] | 1986572 | [u'17881564'] | ['Udayakumar', 'Aharoni', 'Marsch-Martinez', 'Nataraja', 'Trijatmiko', 'Karaba', 'Greco', 'Krishnan', 'Dixit', 'Pereira'] | ['Udayakumar', 'Aharoni', 'Marsch-Martinez', 'Nataraja', 'Trijatmiko', 'Karaba', 'Greco', 'Krishnan', 'Dixit', 'Pereira'] | ['Udayakumar', 'Aharoni', 'Marsch-Martinez', 'Nataraja', 'Trijatmiko', 'Karaba', 'Greco', 'Krishnan', 'Dixit', 'Pereira'] | Proc Natl Acad Sci U S A | 2007 | 9/25/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1469 | GSE8938 | 9/11/2007 | ['8938'] | [] | [u'17967061'] | 2042021 | [u'17967061'] | ['Clark', 'Schlenke', 'Govind', 'Morales'] | ['Clark', 'Schlenke', 'Govind', 'Morales'] | ['Clark', 'Schlenke', 'Govind', 'Morales'] | PLoS Pathog | 2007 | 10/26/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1470 | GSE8945 | 12/1/2007 | ['8945'] | [] | [u'18073345'] | 3012675 | [u'21110835'] | ['Benes', 'Schreiner', 'Bindereif', 'Hui', 'Hung', 'Heiner'] | ['Anton', 'Rubio', 'Aramburu'] | [] | BMC Bioinformatics | 2010 | 11/26/2010 | 0 | tation improves the estimation of the concentrations. In most of the cases the diminution in the estimated error is statistically significant. Real datasets SPACE was also applied to several datasets downloaded from GEO [ 19 ]. All of them have available CEL files to perform all the steps of the analysis and the corresponding papers include validation of the results using RT-PCR. These datasets are: Affym| each tissue along with several mixtures of three tissues (heart, testes and cerebellum). A recent study made by de la Grange et al . [ 8 ] includes RT-PCR validation for some genes in that dataset; GSE9385 [ 20 ] (a study of glial brain tumors in humans); GSE9372 [ 21 ] (a comparative study of the relationships between genotype and alternative splicing in humans); GSE11344 [ 22 ] (change in splicingand GSE8945{{key}}--REUSE-- [23] (a study on the effect of the hnRNP L protein on alternative splicing). |ly in the three groups. This error in the predicted concentration may be owed to the lack of identifiability of this particular gene structure [ 15 , 16 ]. Figure 5 Results for ATP2B4 gene (data from GSE9385) . The content of each panel is explained in figure 4. (f) In Ensembl release 51 there are six different annotated transcripts. (g) The predicted number of transcripts using SPACE is two and their |are shown. There are three replicates of each sample and the estimated concentrations match perfectly PCR results for both splicing variants in each sample. Figure 6 Results for PARP2 gene (data from GSE9372) . According to Kwan et al . [ 21 ], this gene shows AS related to a particular SNP in probeset 3527423 and it has a new variant with its second exon shortened. 
The content of each panel is explai|3 (normal tissue) and variant 2 is predominant in the samples 4 to 6 (tissue of mice with PTB depletion). These concentrations are in concordance with PCRs. Figure 7 Results for Ncam1 gene (data from GSE11344) . The content of each panel is explained in figure 4. (b) Residuals of the RMA normalization model. It can be observed that for many exons there are residuals that are much larger than expected. |/gb-2008-9-2-r46 18312629 Owen A Perry P Bi-cross-validation of the SVD and the non-negative matrix factorization Annals of Applied Statistics 2009 3 2 564 594 10.1214/08-AOAS227 GEO (Gene Expression Omnibus) http://www.ncbi.nlm.nih.gov/geo/ French P Peeters J Horsman S Duijm E Siccama I van den Bent M Luider T Kros J van der Spek P Sillevis Smitt P Identification of differentially regulated splice varia | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1471 | GSE8945 | 12/1/2007 | ['8945'] | [] | [u'18073345'] | 2717951 | [u'19545436'] | ['Benes', 'Schreiner', 'Bindereif', 'Hui', 'Hung', 'Heiner'] | ['Heber', 'Sick', 'Howard'] | [] | BMC Bioinformatics | 2009 | 6/22/2009 | 1 | ded from the NCBI GEO database [ 28 ]. A variety of Affymetrix GeneChip 3' Expression array types are represented in the dataset, including: ath1121501 (Arabidopsis, 248 chips; GEO accession numbers: GSE5770, GSE5759, GSE911 [ 29 ], GSE2538 [ 30 ], GSE3350 [ 31 ], GSE3416 [ 32 ], GSE5534, GSE5535, GSE5530, GSE5529, GSE5522, GSE5520, GSE1491 [ 33 ], GSE2169, GSE2473), hgu133a (human, 72 chips; GSE1420 [|), hgu95av2 (human, 51 chips; GSE1563 [ 35 ]), hgu95d (human, 22 chips; GSE1007 [ 36 ]), hgu95e (human, 21 chips; GSE1007), mgu74a (mouse, 60 chips; GSE76, GSE1912 [ 37 ]), mgu74av2 (mouse, 29 chips; GSE1947 [ 38 ], GSE1419 [ 39 , 40 ]), moe430a (mouse, 10 chips; GSE1873 [ 41 ]), mouse4302 (mouse, 20 chips; GSE5338 [ 42 ], GSE1871 [ 43 ]), rae230a (rat, 26 chips; GSE1918, GSE2470), and rgu34a (rat, 44 |xperiment. The second dataset consists of all of the exon array .CEL files available in the GEO database at the time of this analysis (540 .CEL files). Fourteen different experiments are represented (GSE10599 [ 47 ], GSE10666 [ 48 ], GSE11150 [ 49 ], GSE11344 [ 50 ], GSE11967 [ 51 ], GSE12064 [ 52 ], GSE6976 [ 53 ], GSE7760 [ 54 ], GSE7761 [ 55 ], GSE8945{{tag}}--REUSE-- [ 56 ], GSE9342, GSE9372 [ 57 ], GSE9385 [ 58 ] | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1472 | GSE8945 | 12/1/2007 | ['8945'] | [] | [u'18073345'] | 2212255 | [u'18073345'] | ['Benes', 'Schreiner', 'Bindereif', 'Hui', 'Hung', 'Heiner'] | ['Benes', 'Schreiner', 'Bindereif', 'Hui', 'Hung', 'Heiner'] | ['Benes', 'Schreiner', 'Bindereif', 'Hui', 'Hung', 'Heiner'] | RNA | 2008 | 2008 Feb | 0 | AND pmc_gds | 1 | 0 | ||||
1473 | GSE8949 | 12/12/2007 | ['8949'] | [] | [u'20018933'] | 2408877 | [u'18285614'] | ['Faraci', 'Maeda', 'Beyer', 'Casavant', 'de', 'Liu', 'Keen', 'Halabi', 'Sigmund'] | ['Faraci', 'Maeda', 'Beyer', 'Tsai', 'Gerhold', 'de', 'Ghoneim', 'Keen', 'Halabi', 'Sigmund', 'Modrick', 'Baumbach', 'Lynch'] | ['Faraci', 'Maeda', 'Beyer', 'de', 'Keen', 'Halabi', 'Sigmund'] | Hypertension | 2008 | 2008 Apr | 0 | γ P465L mutation as detailed in the Supplemental Methods (see http://hyper.ahajournals.org ). Data from the microarray studies including CEL files have been submitted to the Gene Expression Omnibus at NCBI (array platform: GPL1261 , series accession: GSE8949{{tag}}--DEPOSIT-- ). Aortic Ring Preparation Male and female mice were given a lethal dose of pentobarbital (50 to 100 mg/mouse IP), and the thoracic aort | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1474 | GSE8949 | 12/12/2007 | ['8949'] | [] | [u'20018933'] | 2275166 | [u'18316027'] | ['Faraci', 'Maeda', 'Beyer', 'Casavant', 'de', 'Liu', 'Keen', 'Halabi', 'Sigmund'] | ['Faraci', 'Beyer', 'de', 'Keen', 'Halabi', 'Sigmund', 'Baumbach'] | ['Faraci', 'Beyer', 'de', 'Keen', 'Halabi', 'Sigmund'] | Cell Metab | 2008 | 2008 Mar | 0 | ere altered in aorta from normal C57BL/6 mice in response to the PPARγ agonist Rosiglitazone suggesting they are not direct targets of PPARγ (unpublished data, GEO Database Accession GSE8949{{tag}} ). This suggests the signaling defect may lie downstream of PKG, a finding consistent with the loss of Ach and cGMP-mediated relaxation in PKG1-deficient mice ( Pfeifer et al., 1998 ). The upregula| t-test was used where appropriate. P<0.05 was considered significant. Gene Expression Profiling Microarray data (array platform- GPL1261 ) is available in MIAME format at the Gene Expression Omnibus Database (GEO) using accession number GSE8949{{tag}}--DEPOSIT-- . Supplementary Material body 01 Click here to view. (326K, pdf) Acknowledgments This work was supported by grants from the National Institutes of Healt | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1475 | GSE8961 | 11/10/2007 | ['8961'] | [] | [u'18234263'] | 2958743 | [u'20798171'] | ['Garofalo', 'Hong', 'Luxon', 'Liu', 'Sinha', 'Casola', 'Bao'] | ['Sama', 'Huynen'] | [] | Bioinformatics | 2010 | 11/1/2010 | 0 | DS 2.1 General PPI networks The PPI network used were built from an accumulation of human-curated PPIs obtained from the Biomolecular Interaction Network Database (BIND; Bader et al. , 2003 ) (data downloaded in October 2006), the HPRD (Peri et al. , 2003 ) (data of release 6 of January 2007), the IntAct database (Kerrien et al. , 2007 ) (downloaded in May 2007), the Molecular Interactions Database |actions between human proteins as HsapiensPPI. Furthermore, interologous PPIs were built using the orthologues datasets from the Ensembl genome browser (Hubbard et al. , 2007 ) (Ensembl release 44, downloaded on May 2007). These were combined with the HsapiensPPI dataset. We refer to this comprehensive dataset as AllspeciesPPI. The HsapiensPPI contains 53 807 interactions between 10 826 proteins. The Al|. Unless otherwise stated, the HsapiensPPI is used in this article as the general PPI network. 2.2 Disease and immune-related data All human disease genes were obtained from the Morbid Omim database (downloaded February 10, 2009 from ftp://ftp.ncbi.nih.gov/repository/OMIM/morbidmap ) (Sayers et al. , 2009 ). The HMPV infection data was obtained from (Bao et al. , 2008 ), as deposited in the NCBI Gene| epithelial cell treatment with the cytokine INFG (GSE1815) (Pawliczak et al. , 2005 ). Expression data of bronchial epithelial cells infected with respiratory pathogens like Chlamydia pneumonia (GSE7246) (Alvesalo et al. , 2008 ) and UV-irradiated airway-pathogens (for P.aeruginosa and RSV; GSE6802; Mayer et al. , 2007 ). A geometric average of all probes for a gene was used to represent the | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1476 | GSE8961 | 11/10/2007 | ['8961'] | [] | [u'18234263'] | 2777699 | [u'18234263'] | ['Garofalo', 'Hong', 'Luxon', 'Liu', 'Sinha', 'Casola', 'Bao'] | ['Garofalo', 'Hong', 'Luxon', 'Liu', 'Sinha', 'Casola', 'Bao'] | ['Garofalo', 'Hong', 'Luxon', 'Liu', 'Sinha', 'Casola', 'Bao'] | Virology | 2008 | 4/25/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1477 | GSE8965 | 12/15/2007 | ['8965'] | [] | [u'18245349'] | 2248342 | [u'18245349'] | ['Le', 'Tyerman', 'Bertrand', 'Hancock', 'Brazas', 'Doebeli', 'Spencer'] | ['Le', 'Tyerman', 'Bertrand', 'Hancock', 'Brazas', 'Doebeli', 'Spencer'] | ['Le', 'Tyerman', 'Bertrand', 'Hancock', 'Brazas', 'Doebeli', 'Spencer'] | Genetics | 2008 | 2008 Feb | 0 | AND pmc_gds | 1 | 0 | ||||
1478 | GSE8967 | 9/12/2007 | ['8967'] | [] | [u'18691415'] | 2528013 | [u'18691415'] | ['Taffurelli', '', 'Rosati', 'Lauriola', 'Coppola', 'Zanotti', 'Mattei', 'Guidotti', 'Pezzetti', 'Strippoli', 'Ruggeri', 'Santini', 'Ugolini', 'Ceccarelli', 'Montroni', 'Solmi', 'Castellani', 'Voltattorni', 'Martini', 'Francesconi'] | ['Taffurelli', 'Rosati', 'Lauriola', 'Coppola', 'Zanotti', 'Mattei', 'Guidotti', 'Pezzetti', 'Strippoli', 'Ruggeri', 'Santini', 'Ugolini', 'Ceccarelli', 'Montroni', 'Solmi', 'Castellani', 'Voltattorni', 'Martini', 'Francesconi'] | ['Taffurelli', 'Rosati', 'Lauriola', 'Coppola', 'Mattei', 'Guidotti', 'Solmi', 'Pezzetti', 'Strippoli', 'Ruggeri', 'Santini', 'Ugolini', 'Ceccarelli', 'Montroni', 'Voltattorni', 'Castellani', 'Zanotti', 'Martini', 'Francesconi'] | BMC Cancer | 2008 | 8/8/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1479 | GSE8970 | 9/12/2007 | ['8970'] | [] | [u'18160667'] | 2689870 | [u'19393097'] | ['Stone', 'De', 'Lee', 'Gojo', 'Harousseau', 'Dossey', 'Löwenberg', 'Karp', 'Wright', 'Morris', 'Fan', 'Raponi', 'Gotlib', 'Feldman', 'Lancet', 'Greenberg', 'Wang'] | ['Nakai', 'Mizuno', 'Kitada', 'Sarai'] | [] | BMC Med Genomics | 2009 | 4/24/2009 | 1 | as possible. All tables were relationally linked and stored in the MySQL server. Table 1 Dataset content from PrognoScan Dataset Cancer type Subtype Cohort Author/Contributor Array type n Data source GSE13507 Bladder cancer Transitional cell carcinoma Cheongju Kim Human-6 v2 n = 165 GEO GSE5287 Bladder cancer Aarhus (1995–2004) Als et al . [ 10 ] HG-U133A n = 30 GEO GSE12417-GPL570 Blood cance|-GPL96 Blood cancer AML AMLCG (1999–2003) Metzeler et al . [ 11 ] HG-U133A n = 163 GEO GSE12417-GPL97 Blood cancer AML AMLCG (1999–2003) Metzeler et al . [ 11 ] HG-U133B n = 163 GEO GSE8970{{tag}}--REUSE-- Blood cancer AML San Diego Raponi et al . [ 12 ] HG-U133A n = 34 GEO GSE4475 Blood cancer B-cell lymphoma Berlin (2003–2005) Hummel et al . [ 13 ] HG-U133A n = 158 GEO E-TABM-346 Blood ca|E2658 Blood cancer Multiple myeloma Arkansas Zhan et al . [ 15 ] HG-U133_Plus_2 n = 559 GEO E-TABM-158 Breast cancer UCSF, CPMC (1989–1997) Chin et al . [ 16 ] HG-U133A n = 129 ArrayExpress GSE11121 Breast cancer Mainz (1988–1998) Schmidt et al . [ 17 ] HG-U133A n = 200 GEO GSE1378 Breast cancer MGH (1987–2000) Ma et al . [ 18 ] Arcturus 22 k n = 60 GEO GSE1379 Breast cancer|6-GPL96 Breast cancer Stockholm (1994–1996) Pawitan et al . [ 19 ] HG-U133A n = 159 GEO GSE1456-GPL97 Breast cancer Stockholm (1994–1996) Pawitan et al . [ 19 ] HG-U133B n = 159 GEO GSE2034 Breast cancer Rotterdam (1980–1995) Wang et al . [ 20 ] HG-U133A n = 286 GEO GSE2990 Breast cancer Uppsala, Oxford Sotiriou et al . [ 21 ] HG-U133A n = 187 GEO GSE3143 Breast cancer Duke |GSE3494-GPL96 Breast cancer Uppsala (1987–1989) Miller et al . 
[ 23 ] HG-U133A n = 236 GEO GSE3494-GPL97 Breast cancer Uppsala (1987–1989) Miller et al . [ 23 ] HG-U133B n = 236 GEO GSE4922-GPL96 Breast cancer Uppsala (1987–1989) Ivshina et al . [ 24 ] HG-U133A n = 249 GEO GSE4922-GPL97 Breast cancer Uppsala (1987–1989) Ivshina et al . [ 24 ] HG-U133B n = 249 GEO GSE|SE7378 Breast cancer UCSF Zhou et al . [ 26 ] U133AAofAv2 n = 54 GEO GSE7390 Breast cancer Uppsala, Oxford, Stockholm, IGR, GUYT, CRH (1980–1998) Desmedt et al . [ 27 ] HG-U133A n = 198 GEO GSE7849 Breast cancer Duke (1990–2001) Anders et al . [ 28 ] HG-U95A n = 76 GEO GSE9195 Breast cancer GUYT2 Loi et al . [ 25 ] HG-U133_Plus_2 n = 77 GEO GSE9893 Breast cancer Montpellier, Bordeau|man 21 K V12.0 n = 155 GEO GSE11595 Esophagus cancer Adenocarcinoma Sutton Giddings CRUKDMF_22 K_v1.0.0 n = 34 GEO GSE7696 Glioma Glioblastoma Lausanne Murat et al . [ 30 ] HG-U133_Plus_2 n = 70 GEO GSE4271-GPL96 Glioma MDA Phillips et al . [ 31 ] HG-U133A n = 77 GEO GSE4271-GPL97 Glioma MDA Phillips et al . [ 31 ] HG-U133B n = 77 GEO GSE2837 Head and neck cancer Squamous cell carcinoma VUMC, VAMC, |Adenocarcinoma Harvard Beer et al . [ 33 ] HG-U95A n = 84 Author's web site MICHIGAN-LC Lung cancer Adenocarcinoma Michigan (1994–2000) Beer et al . [ 33 ] HuGeneFL n = 86 Author's web site GSE11117 Lung cancer NSCLC Basel Baty Novachip human 34.5 k n = 41 GEO GSE3141 Lung cancer NSCLC Duke Bild et al . [ 22 ] HG-U133_Plus_2 n = 111 GEO GSE4716-GPL3694 Lung cancer NSCLC Nagoya (1995–|da et al . [ 34 ] GF200 n = 50 GEO GSE4716-GPL3696 Lung cancer NSCLC Nagoya (1995–1996) Tomida et al . [ 34 ] GF201 n = 50 GEO GSE8894 Lung cancer NSCLC Seoul Son HG-U133_Plus_2 n = 138 GEO GSE4573 Lung cancer Squamous cell carcinoma Michigan (1991–2002) Raponi et al . [ 35 ] HG-U133A n = 129 GEO DUKE-OC Ovarian cancer Duke Bild et al . [ 22 ] HG-U133A n = 134 Author's web site GSE8|horts. Datasets come from a number of different institutions around the world, and patient backgrounds differ. 
In addition, several datasets are based on specific subpopulations, for example, dataset GSE2034 is from lymph node-negative breast cancers, and GSE5287 is from cisplatin-containing chemotherapy-treated bladder cancers. Hence, it is possible that the specific association between gene expressio|subsequent care may affect the clinical course of a patient. 3) Experimental factors. Expression measurement of microarray is subject to various factors at the experiment level. Microdissection (e.g. GSE1378) would reduce contamination of mRNAs from non-cancer cells [ 57 ]. Formalin fixation of a sample (e.g. GSE2873) influences the quality of mRNAs [ 58 ]. Array type (e.g. Affymetrix, cDNA microarrays | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1480 | GSE8977 | 10/5/2007 | ['8977'] | [] | [u'17914389'] | 2631593 | [u'19116033'] | ['Tubo', 'Polyak', 'Weinberg', 'Karnoub', 'Bell', 'Sullivan', 'Brooks', 'Vo', 'Dash', 'Richardson'] | ['Ergul', 'Yulug', 'Gur-Dedeoglu', 'Kir', 'Bozkurt', 'Ozturk', 'Konu'] | [] | BMC Cancer | 2008 | 12/30/2008 | 1 | used as an estimate of the measure of expression. Data retrieval and analysis for validation studies The ".cel files" of the three publicly available independent microarray gene expression data sets, GDS2635 [ 5 ], GDS2250 [ 7 ] and GDS1329 [ 4 ], were downloaded from GEO [ 28 ] and processed by the BRB-ARRAYTOOLS [ 26 ]. All three datasets were obtained using the Affymetrix HGU133A or HGU133 Plus 2.0 |lation to their normal ductal and lobular cells (n = 10). The authors identified multiple genes differentially expressed in comparisons between ductal and lobular tumor and normal cells [ 5 ]. In the GDS2250 study, a gene expression array-based analysis of three breast tumor subtypes, i.e., sporadic basal-like cancer (BLC), BRCA-associated breast cancer, and non-BLC, was performed. They used 47 human b|s for the meta-gene lists, DN (Ductal/Normal) and LN (Lobular/Normal). Study GEO ID Class Meta gene-list DN LN N T Accuracy (%) Number of genes r DN Accuracy (%) Number of genes r LN Turashvili [ 5 ] GDS2635 10 10 93 57 0.85 80 49 0.87 Richardson [ 7 ] GDS2250 7 40 100 145 0.86 100 96 0.78 Karnoub [ 8 ] GSE8977{{tag}}--REUSE-- 15 7 95.5 109 0.72 95.5 89 0.81 Normal (N) and tumor (T) sample sizes, accuracy of predictio|tive gene set differentially expressed between tumor and normal cells (Additional file 9 ). Twenty-eight genes from the DN or LN meta-gene lists intersected with the three other microarray datasets (GDS2635, GDS2250, and GD1329); 17 of which were differentially expressed between basal vs. non-basal and/or ER status (Additional file 9 ). For example, ADAMTS1 , ATF3 , IGFBP6 , PRNP , EGFR , FN1 ,|< 0.05; Additional file 9 ). Validation of ductal vs. 
lobular meta-gene list Comparison of fold-change values of the DL meta-gene list consisting of 65 genes with that of the Turashvili's DL list (GDS2635) resulted in a high degree of correlation (r = 0.53; p < 0.001), suggesting that the direction and magnitude of expression change between the IDC and ILC samples were largely consistent bet | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1481 | GSE8977 | 10/5/2007 | ['8977'] | [] | [u'17914389'] | 2896865 | [u'20625410'] | ['Tubo', 'Polyak', 'Weinberg', 'Karnoub', 'Bell', 'Sullivan', 'Brooks', 'Vo', 'Dash', 'Richardson'] | ['Lei', 'Hou', 'Wang', 'Li'] | [] | J Biomed Biotechnol | 2010 | 2010 | 0 | network analysis. 2.6. Dataset To evaluate the performance of the proposed method, seven gene expression datasets were used in this study: Acute Lymphoblastic Leukemia (ALL) [ 17 ], Breast cancer 30 (GSE5764) [ 18 ], Breast cancer 22(GSE8977{{tag}}--REUSE--) [ 18 ], Colon cancer [ 19 ], Prostate cancer 102 [ 20 ], and Prostate cancer 34 [ 21 ]. The two pairs of cross-platform datasets were used to evaluate the general | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1482 | GSE8981 | 12/31/2007 | ['8981'] | [] | [u'18076285'] | 2704158 | [u'19439448'] | ['Buhler', 'Borde', 'Lichten'] | ['Ito', 'Yamada', 'Shirahige', 'Sasanuma', 'Kugou', 'Shibata', 'Matsumoto', 'Mori', 'Ohta', 'Katou', 'Fukuda', 'Itoh'] | [] | Mol Biol Cell | 2009 | 2009 Jul | 0 | AND pmc_gds | 0 | 1 | ||||
1483 | GSE8981 | 12/31/2007 | ['8981'] | [] | [u'18076285'] | 2121111 | [u'18076285'] | ['Buhler', 'Borde', 'Lichten'] | ['Buhler', 'Borde', 'Lichten'] | ['Buhler', 'Borde', 'Lichten'] | PLoS Biol | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1484 | GSE8986 | 9/8/2007 | ['8986'] | [] | [u'17908812'] | 2168327 | [u'17908812'] | ['Reenstra', 'Wilson', 'Weiner', 'Louboutin', 'Zhang'] | ['Reenstra', 'Wilson', 'Weiner', 'Louboutin', 'Zhang'] | ['Reenstra', 'Wilson', 'Weiner', 'Louboutin', 'Zhang'] | Infect Immun | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1485 | GSE8988 | 9/11/2007 | ['8988'] | ['3084'] | [u'18562560', u'20500897'] | 2990769 | [u'21040584'] | ['Lai', 'Almon', 'DuBois', 'Nguyen', 'Androulakis', 'Yang', 'Jusko'] | ['Almon', 'DuBois', 'Jusko', 'Androulakis', 'Sukumaran', 'Ovacik'] | ['DuBois', 'Androulakis', 'Almon', 'Jusko'] | BMC Bioinformatics | 2010 | 11/1/2010 | 0 | 1985) and was approved by the University at Buffalo Institutional Animal Care and Use Committee. The details of the experiment can be found in [ 9 ]. The data is available under the accession number GSE8988{{tag}}--DEPOSIT-- http://www.ncbi.nlm.nih.gov/geo/ . Circadian signature of gene expression levels The circadian pattern of a gene expression is approximated using the sinusoidal model A · sin( B |lysis to evaluate pathway levels [ 13 ]. The pathway activity analysis begins with mapping gene expressions of microarray onto pathways. Pathway annotations of gene expressions are retrieved from the publicly available database The Molecular Signatures Database (MSigDB) [ 18 ]. Subsequently, gene expression levels within a given pathway are reduced to the pathway activity levels using singular value decom | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1486 | GSE8988 | 9/11/2007 | ['8988'] | ['3084'] | [u'18562560', u'20500897'] | 2889936 | [u'20500897'] | ['Lai', 'Almon', 'DuBois', 'Nguyen', 'Androulakis', 'Yang', 'Jusko'] | ['Nguyen', 'Androulakis', 'DuBois', 'Jusko', 'Almon'] | ['Nguyen', 'Androulakis', 'DuBois', 'Almon', 'Jusko'] | BMC Bioinformatics | 2010 | 5/26/2010 | 0 | AND pmc_gds | 1 | 0 | ||||
1487 | GSE8988 | 9/11/2007 | ['8988'] | ['3084'] | [u'18562560', u'20500897'] | 2561907 | [u'18562560'] | ['Lai', 'Almon', 'DuBois', 'Nguyen', 'Androulakis', 'Yang', 'Jusko'] | ['Lai', 'Almon', 'DuBois', 'Androulakis', 'Yang', 'Jusko'] | ['Lai', 'Almon', 'DuBois', 'Androulakis', 'Yang', 'Jusko'] | J Pharmacol Exp Ther | 2008 | 2008 Sep | 0 | AND pmc_gds | 1 | 0 | ||||
1488 | GSE8989 | 9/11/2007 | ['8989'] | ['3083'] | [u'18667713'] | 2576101 | [u'18667713'] | ['Lai', 'Almon', 'Hoffman', 'Androulakis', 'Yang', 'Ghimbovschi', 'Jusko', 'Dubois'] | ['Lai', 'Almon', 'Hoffman', 'Androulakis', 'Yang', 'Ghimbovschi', 'Jusko', 'Dubois'] | ['Lai', 'Almon', 'Hoffman', 'Androulakis', 'Yang', 'Ghimbovschi', 'Jusko', 'Dubois'] | Am J Physiol Regul Integr Comp Physiol | 2008 | 2008 Oct | 0 | AND pmc_gds | 1 | 0 | ||||
1489 | GSE8990 | 11/1/2007 | ['8990'] | [] | [u'19793890'] | 2826312 | [u'20089153'] | ['Rebbert', 'Zhao', 'Dawid', 'Tanegashima'] | ['Raya', 'Christen', 'Belmonte', 'Robles', 'Paramonov'] | [] | BMC Biol | 2010 | 1/20/2010 | 0 | espectively. All analyses were performed applying Summit software DakoCytomation (Fort Collins, CO, USA). Re-analysis of microarray data sets Microarray data are available at the NCBI Gene Expression Omnibus database under the following accession numbers: Xenopus animal cap - GSE3334 (Dickinson et al., 2006), GSE8990{{tag}}--REUSE--, GSE8496, Xenopus regenerating hindlimb - GSE9813 (Pearl et al., 2008), GSE4738 (Gro | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1490 | GSE8990 | 11/1/2007 | ['8990'] | [] | [u'19793890'] | 2746918 | [u'19758566'] | ['Rebbert', 'Zhao', 'Dawid', 'Tanegashima'] | ['Janssen-Megens', 'Akkers', 'Stunnenberg', 'Françoijs', 'Veenstra', 'Jacobi', 'van'] | [] | Dev Cell | 2009 | 2009 Sep | 0 | AND pmc_gds | 0 | 1 | ||||
1491 | GSE8992 | 9/12/2007 | ['8992'] | [] | [u'18036212'] | 2242807 | [u'18036212'] | ['Costa', 'Irsigler', 'Reis', 'Boston', 'Zhang', 'Fontes', 'Dewey'] | ['Costa', 'Irsigler', 'Reis', 'Boston', 'Zhang', 'Fontes', 'Dewey'] | ['Costa', 'Irsigler', 'Reis', 'Boston', 'Zhang', 'Fontes', 'Dewey'] | BMC Genomics | 2007 | 11/23/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1492 | GSE8994 | 9/14/2007 | ['8994'] | [] | [u'17997849'] | 2190774 | [u'17997849'] | ['Rattray', 'Meyers', 'West', 'St', u'St.Clair', 'Agrawal', 'Chen', 'Michelmore', 'Coughlan'] | ['Rattray', 'Meyers', 'West', 'St', 'Agrawal', 'Chen', 'Michelmore', 'Coughlan'] | ['Rattray', 'Meyers', 'West', 'St', 'Agrawal', 'Chen', 'Michelmore', 'Coughlan'] | BMC Genomics | 2007 | 11/12/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1493 | GSE9001 | 12/31/2007 | ['9001'] | ['3352'] | [u'18689425'] | 2522372 | [u'18689425'] | ['Porpiglia', 'Heyland', 'Tatar', 'Sherlock', 'Garbuzov', 'Silverman', 'Rus', 'Yamamoto', 'Flatt', 'Palli'] | ['Porpiglia', 'Heyland', 'Tatar', 'Sherlock', 'Garbuzov', 'Silverman', 'Rus', 'Yamamoto', 'Flatt', 'Palli'] | ['Porpiglia', 'Heyland', 'Tatar', 'Sherlock', 'Garbuzov', 'Silverman', 'Rus', 'Yamamoto', 'Flatt', 'Palli'] | J Exp Biol | 2008 | 2008 Aug | 0 | AND pmc_gds | 1 | 0 | ||||
1494 | GSE9006 | 9/12/2007 | ['9006'] | [] | [u'17595242'] | 2872005 | [u'20400455'] | ['Kaizer', 'Banchereau', 'Glaser', 'Pascual', 'Chaussabel', 'White'] | ['McIndoe', 'Zhao', 'Sharma', 'Podolsky'] | [] | Bioinformatics | 2010 | 6/1/2010 | 0 | 6 GHz/512K Cache Xeon Processors and 8.0 GB DDR 266 Mhz RAM. 3.2 Increased capacity of handling larger datasets We analyzed an experimentally derived microarray dataset (NCBI Gene Expression Omnibus; GSE9006{{tag}}--REUSE--: Gene expression in PBMCs from children with diabetes) (Kaizer et al. , 2007 ). Combined data from both chips (HG-U133A and HG-U133B) were used for a 44 760 gene dataset. The data from only the H | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1495 | GSE9006 | 9/12/2007 | ['9006'] | [] | [u'17595242'] | 2797819 | [u'19961616'] | ['Kaizer', 'Banchereau', 'Glaser', 'Pascual', 'Chaussabel', 'White'] | ['Geman', 'Price', 'Edelman', 'Toia', 'Zhang'] | [] | BMC Genomics | 2009 | 12/5/2009 | 0 | s Classification Task Tissue Source Samples (Positive/Negative) GEO ID # Probes GI Stromal Tumor vs Leiomyosarcoma GI Biopsy 68 (37/31) N/A 43,931 Crohn's Disease vs Healthy Controls PBMC 101 (59/42) GDS1615 22,283 Ischemic vs Idiopathic Cardiomyopathy Cardiac Biopsy 194 (86/108) GSE5406 22,283 Type I Diabetes vs Healthy Controls PBMC 105 (81/24) GSE9006{{tag}}--REUSE-- 22,283 Type II Diabetes vs Healthy Controls PBMC| Ulcerative Colitis W/WO Transformation Colon Biopsy 54 (11/43) GSE3629 54,681 Gram-Negative vs Gram-Positive Infection PBMC 73 (29/44) GSE6269 22,283 Gram-Negative vs Viral Infection PBMC 62 (18/44) GSE6269 22,283 HIV Infection vs Healthy Controls PBMC 86 (74/12) GDS1449 8793 Microarray gene expression datasets obtained from the Gene Expression Omnibus. Transcriptional analysis was performed on either | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1496 | GSE9006 | 9/12/2007 | ['9006'] | [] | [u'17595242'] | 2672630 | [u'19261720'] | ['Kaizer', 'Banchereau', 'Glaser', 'Pascual', 'Chaussabel', 'White'] | ['McIndoe', 'Zhao', 'Sharma', 'Podolsky'] | [] | Bioinformatics | 2009 | 5/1/2009 | 0 | Supplementary Fig. S1 ). 2.5.1 Experimental microarray data We also analyzed a real experimentally derived microarray dataset. The dataset was downloaded from the NCBI Gene Expression Omnibus (GEO) (GSE9006{{tag}}--REUSE--: Gene expression in PBMCs from children with diabetes) (Kaizer et al. , 2007 ). The array platform is the Affymetrix GeneChip Human Genome HG-U133A and HG-U133B chips. Combined data from both chi | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1497 | GSE9010 | 9/11/2007 | ['9010'] | [] | [u'17940039'] | 2040480 | [u'17940039'] | ['Ostertag', 'Schiedlmeier', 'Auer', 'Klump', 'Kornacker', 'Mallo', 'Baum', 'Moncaut', 'Santos', 'Ribeiro', 'Lesinski'] | ['Ostertag', 'Schiedlmeier', 'Auer', 'Klump', 'Kornacker', 'Mallo', 'Baum', 'Moncaut', 'Santos', 'Ribeiro', 'Lesinski'] | ['Ostertag', 'Schiedlmeier', 'Auer', 'Klump', 'Kornacker', 'Mallo', 'Baum', 'Moncaut', 'Santos', 'Ribeiro', 'Lesinski'] | Proc Natl Acad Sci U S A | 2007 | 10/23/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1498 | GSE9015 | 9/29/2007 | ['9015'] | [] | [u'18256232'] | 2802188 | [u'20008927'] | ['', 'Shann', 'Chiao', 'Li', 'Cheng', 'Hsu', 'Chen'] | ['Kraus', 'Krishnakumar', 'Yang', 'Frizzell', 'Gamble'] | [] | Genes Dev | 2010 | 1/1/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
1499 | GSE9015 | 9/29/2007 | ['9015'] | [] | [u'18256232'] | 2336806 | [u'18256232'] | ['', 'Shann', 'Chiao', 'Li', 'Cheng', 'Hsu', 'Chen'] | ['Shann', 'Chiao', 'Li', 'Cheng', 'Hsu', 'Chen'] | ['Shann', 'Chiao', 'Li', 'Cheng', 'Hsu', 'Chen'] | Genome Res | 2008 | 2008 May | 0 | AND pmc_gds | 1 | 0 | ||||
1500 | GSE9026 | 11/15/2007 | ['9026'] | [] | [u'17951393'] | 2168637 | [u'17951393'] | ['Fischer', 'Friberg', 'Pessi', 'Lindemann', 'Hauser', 'Hennecke', 'Moser'] | ['Fischer', 'Friberg', 'Pessi', 'Lindemann', 'Hauser', 'Hennecke', 'Moser'] | ['Fischer', 'Friberg', 'Pessi', 'Lindemann', 'Hauser', 'Hennecke', 'Moser'] | J Bacteriol | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1501 | GSE9036 | 9/19/2007 | ['9036'] | [] | [u'19826084'] | 2761237 | [u'19826084'] | ['Kodama', 'Kanki', 'Aburatani', 'Izumi', 'Tsutsumi', 'Minami', 'Papantonis', 'Ihara', 'Cook', 'Meguro', 'Kimura', 'Shirahige', 'Hamakubo', 'Inoue', 'Komura', 'Ohta', 'Mataki', 'Kitakami', 'Mimura', 'Oshida', 'Yamamoto', 'Wada', 'Kobayashi', 'Xu'] | ['Kodama', 'Kanki', 'Aburatani', 'Izumi', 'Tsutsumi', 'Minami', 'Papantonis', 'Ihara', 'Cook', 'Meguro', 'Kimura', 'Shirahige', 'Hamakubo', 'Inoue', 'Komura', 'Ohta', 'Mataki', 'Kitakami', 'Mimura', 'Oshida', 'Yamamoto', 'Wada', 'Kobayashi', 'Xu'] | ['Kodama', 'Kanki', 'Aburatani', 'Izumi', 'Tsutsumi', 'Ihara', 'Papantonis', 'Cook', 'Meguro', 'Kimura', 'Shirahige', 'Hamakubo', 'Inoue', 'Minami', 'Ohta', 'Mataki', 'Kitakami', 'Komura', 'Mimura', 'Oshida', 'Yamamoto', 'Wada', 'Kobayashi', 'Xu'] | Proc Natl Acad Sci U S A | 2009 | 10/27/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
1502 | GSE9044 | 9/19/2007 | ['9044'] | ['3036'] | [u'17940039'] | 2040480 | [u'17940039'] | ['Ostertag', 'Schiedlmeier', 'Auer', 'Klump', 'Kornacker', 'Mallo', 'Baum', 'Moncaut', 'Santos', 'Ribeiro', 'Lesinski'] | ['Ostertag', 'Schiedlmeier', 'Auer', 'Klump', 'Kornacker', 'Mallo', 'Baum', 'Moncaut', 'Santos', 'Ribeiro', 'Lesinski'] | ['Ostertag', 'Schiedlmeier', 'Auer', 'Klump', 'Kornacker', 'Mallo', 'Baum', 'Moncaut', 'Santos', 'Ribeiro', 'Lesinski'] | Proc Natl Acad Sci U S A | 2007 | 10/23/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1503 | GSE9051 | 11/30/2007 | ['9051'] | [] | [u'18218979'] | 2259111 | [u'18218979'] | ['Theiler', 'Descombes', 'Paszkowski', 'Reinders', u'Vivier', 'Chollet', 'Delucinge'] | ['Theiler', 'Descombes', 'Paszkowski', 'Reinders', 'Chollet', 'Delucinge'] | ['Theiler', 'Descombes', 'Paszkowski', 'Reinders', 'Chollet', 'Delucinge'] | Genome Res | 2008 | 2008 Mar | 0 | AND pmc_gds | 1 | 0 | ||||
1504 | GSE9054 | 10/25/2007 | ['9054'] | [] | [u'17959825'] | 2174171 | [u'17959825'] | ['Paramio', 'Santos', u'Martínez-Cruz', 'Lorz', 'Bravo', u'García-Escudero', 'Martínez-Cruz', 'Beltran', 'Lara', 'Lu', 'Moral', 'DiGiovanni', 'Segovia', 'Segrelles', 'García-Escudero', 'Cascallana', 'Carbajal'] | ['Paramio', 'Santos', 'Lorz', 'Bravo', 'Martínez-Cruz', 'Beltran', 'Lara', 'Lu', 'Moral', 'DiGiovanni', 'Segovia', 'Segrelles', 'García-Escudero', 'Cascallana', 'Carbajal'] | ['Paramio', 'Santos', 'Lorz', 'Bravo', 'Martínez-Cruz', 'Beltran', 'Lara', 'Lu', 'Moral', 'DiGiovanni', 'Segovia', 'Segrelles', 'García-Escudero', 'Cascallana', 'Carbajal'] | Mol Biol Cell | 2008 | 2008 Jan | 0 | AND pmc_gds | 1 | 0 | ||||
1505 | GSE9072 | 9/30/2007 | ['9072'] | [] | [] | 2365940 | [u'18423039'] | [u'Shohet', u'Maresh'] | ['Shohet', 'Maresh'] | ['Shohet', 'Maresh'] | Cardiovasc Diabetol | 2008 | 4/19/2008 | 0 | was performed using the t-test within Microsoft Excel ® to compare diabetic vs. control expression values. Results Microarray results have been placed in the GEO database under series record GSE9072{{tag}}--DEPOSIT--. Transcripts within the aortic endothelium dysregulated by at least 70% in response to the type I diabetic model are shown in fig. 1 . The average microarray-based fold change, reference sequence | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1506 | GSE9086 | 11/1/2007 | ['9086'] | [] | [u'17999768'] | 2258184 | [u'17999768'] | ['Feng', 'Lu', 'Atala', 'Hipp', 'Lanza'] | ['Feng', 'Lu', 'Atala', 'Hipp', 'Lanza'] | ['Feng', 'Lu', 'Atala', 'Hipp', 'Lanza'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1507 | GSE9087 | 11/14/2007 | ['9087'] | [] | [u'17978103'] | 2049191 | [u'17978103'] | ['Bell', 'MacAlpine', 'Ahn', 'Botchan', 'Cheung', 'Manak', 'Beall', 'Lewis', 'Speed', 'Georlette'] | ['Bell', 'MacAlpine', 'Ahn', 'Botchan', 'Cheung', 'Manak', 'Beall', 'Lewis', 'Speed', 'Georlette'] | ['Bell', 'MacAlpine', 'Ahn', 'Botchan', 'Cheung', 'Manak', 'Beall', 'Lewis', 'Speed', 'Georlette'] | Genes Dev | 2007 | 11/15/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1508 | GSE9089 | 11/1/2007 | ['9089'] | [] | [u'17999768'] | 2258184 | [u'17999768'] | ['Feng', 'Lu', 'Atala', 'Hipp', 'Lanza'] | ['Feng', 'Lu', 'Atala', 'Hipp', 'Lanza'] | ['Feng', 'Lu', 'Atala', 'Hipp', 'Lanza'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1509 | GSE9090 | 11/1/2007 | ['9090'] | [] | [u'17999768'] | 2258184 | [u'17999768'] | ['Feng', 'Lu', 'Atala', 'Hipp', 'Lanza'] | ['Feng', 'Lu', 'Atala', 'Hipp', 'Lanza'] | ['Feng', 'Lu', 'Atala', 'Hipp', 'Lanza'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1510 | GSE9091 | 11/1/2007 | ['9091'] | [] | [u'17999768'] | 2258184 | [u'17999768'] | ['Feng', 'Lu', 'Atala', 'Hipp', 'Lanza'] | ['Feng', 'Lu', 'Atala', 'Hipp', 'Lanza'] | ['Feng', 'Lu', 'Atala', 'Hipp', 'Lanza'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1511 | GSE9096 | 9/28/2007 | ['9096'] | [] | [u'17928342'] | 2168861 | [u'17928342'] | ['Webb', 'Wilson', 'Whitehouse', 'Kellam', 'Tsantoulas', 'Dalton-Griffin', 'Ye', 'Tsao', 'Gale', 'Du'] | ['Webb', 'Wilson', 'Whitehouse', 'Kellam', 'Tsantoulas', 'Dalton-Griffin', 'Ye', 'Tsao', 'Gale', 'Du'] | ['Webb', 'Wilson', 'Whitehouse', 'Kellam', 'Tsantoulas', 'Dalton-Griffin', 'Ye', 'Tsao', 'Gale', 'Du'] | J Virol | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1512 | GSE9097 | 11/1/2007 | ['9097'] | [] | [u'18285497'] | 2346688 | [u'18285497'] | ['Feng', 'Rogers', u'Garcia', 'Garc\xc3\xada', 'Fox', 'Boutin', 'Samson', 'Ihrig', 'Muthupalani', 'Fry', 'Xu'] | ['Feng', 'Rogers', 'Garc\xc3\xada', 'Fox', 'Boutin', 'Samson', 'Ihrig', 'Muthupalani', 'Fry', 'Xu'] | ['Feng', 'Rogers', 'Garc\xc3\xada', 'Fox', 'Boutin', 'Samson', 'Ihrig', 'Muthupalani', 'Fry', 'Xu'] | Infect Immun | 2008 | 2008 May | 0 | AND pmc_gds | 1 | 0 | ||||
1513 | GSE9100 | 11/15/2007 | ['9100'] | [] | [u'17951393'] | 2168637 | [u'17951393'] | ['', 'Fischer', 'Friberg', 'Pessi', 'Lindemann', 'Hauser', 'Hennecke', 'Moser'] | ['Fischer', 'Friberg', 'Pessi', 'Lindemann', 'Hauser', 'Hennecke', 'Moser'] | ['Fischer', 'Friberg', 'Pessi', 'Lindemann', 'Hauser', 'Hennecke', 'Moser'] | J Bacteriol | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1514 | GSE9103 | 11/1/2007 | ['9103'] | ['3182'] | [] | 2620272 | [u'19014681'] | [u'Short', u'Joyner', u'Asmann', u'Nair', u'Lanza', u'Bigelow'] | ['Tozeren', 'Gormley'] | [] | BMC Bioinformatics | 2008 | 11/17/2008 | 1 | or therapeutic targets in tissue-specific diseases. Table 1 Microarray datasets used in this study Tissue Phenotype Data Tissue No. of Samples Gene Expression Omnibus/Array Express Accn. # Adipose 10 GSE3526 Adrenal 20 GSE3526, GSE8514, GSE2316 Brain 89 GSE3526, GSE7621, GSE7307, GSE2361, E_AFMX-11, E-TABM-20, Colon 10 E-TABM-176, GSE8671, GSE9254, GSE9452 Epidermal 25 GSE1133, GSE2361, GSE3419, GSE352|38 E_AFMX-11, E-MIMR-27, GSE1133, GSE2240, GSE2361, GSE3526, GSE3585, GSE7307 Kidney 10 E_AFMX-11, GSE2004, GSE2361, GSE3526, GSE7392 Liver 10 E_AFMX-11, GSE2004, GSE3526, GSE6764 Lung 26 E-MEXP-231, GSE10072, GSE1133, GSE2361, GSE3526 Mammary 15 E-TABM-66, GSE2361, GSE3526, GSE7307, GSE7904 Muscle 64 GSE10760, GSE2328, GSE3526, GSE5110, GSE6798, GSE7307, GSE9103{{tag}}--REUSE--, Ovary 10 GSE2361, GSE3526, GSE6008, GS|ancreas 6 GSE1133, GSE2361, GSE7307 Peripheral blood 12 GSE7462, GSE8608, GSE8668, GSE8762, GSE9692 Small intestine 7 GSE2361, GSE7307 Spleen 12 GSE2004, GSE2361, GSE3526, GSE7307 Stomach 10 GSE2361, GSE3526, GSE7307 Testis 38 E_AFMX-11, GSE1133, GSE2361, GSE3218, GSE3526, GSE7307, GSE7808 Thymus 5 GSE1133, GSE2361, GSE7307 Infectious Disease Disease No. of Samples Gene Expression Omnibus/Array Express | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1515 | GSE9117 | 9/21/2007 | ['9117'] | ['3180'] | [u'18627025'] | 2577172 | [u'18627025'] | ['Krizsan-Agbas', 'Pedchenko', 'Smith'] | ['Krizsan-Agbas', 'Pedchenko', 'Smith'] | ['Krizsan-Agbas', 'Pedchenko', 'Smith'] | J Neurosci Res | 2008 | 11/1/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1516 | GSE9124 | 11/1/2007 | ['9124'] | ['3058'] | [u'17923686'] | 2169408 | [u'17923686'] | ['Wisse', 'van', 'Mahtab', 'Suske', 'Hou', 'Grosveld', 'Philipsen', 'Gittenberger-de'] | ['Wisse', 'van', 'Mahtab', 'Suske', 'Hou', 'Grosveld', 'Philipsen', 'Gittenberger-de'] | ['Wisse', 'van', 'Mahtab', 'Suske', 'Hou', 'Grosveld', 'Philipsen', 'Gittenberger-de'] | Mol Cell Biol | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1517 | GSE9129 | 12/5/2007 | ['9129'] | [] | [u'18066065'] | 2628762 | [u'18066065'] | ['', 'Wentzel', 'Lee', 'Arking', 'West', 'Thomas-Tikhonenko', 'Mendell', 'Dang', 'Chang', 'Yu'] | ['Wentzel', 'Lee', 'Arking', 'West', 'Thomas-Tikhonenko', 'Mendell', 'Dang', 'Chang', 'Yu'] | ['Wentzel', 'Lee', 'Arking', 'West', 'Thomas-Tikhonenko', 'Mendell', 'Dang', 'Chang', 'Yu'] | Nat Genet | 2008 | 2008 Jan | 0 | AND pmc_gds | 1 | 0 | ||||
1518 | GSE9132 | 9/22/2007 | ['9132'] | [] | [u'17921248'] | 2034232 | [u'17921248'] | ['Kim', 'Affourtit', 'Ablamunits', 'Shalom-Barak', 'Young', 'Snow', 'Hasham', 'Barak', 'Paulk', 'Huang', 'Richardson', 'Bult'] | ['Kim', 'Affourtit', 'Ablamunits', 'Shalom-Barak', 'Young', 'Snow', 'Hasham', 'Barak', 'Paulk', 'Huang', 'Richardson', 'Bult'] | ['Shalom-Barak', 'Ablamunits', 'Snow', 'Young', 'Affourtit', 'Barak', 'Paulk', 'Kim', 'Hasham', 'Richardson', 'Bult', 'Huang'] | Proc Natl Acad Sci U S A | 2007 | 10/16/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1519 | GSE9136 | 10/5/2007 | ['9136'] | [] | [u'16489413', u'17910955'] | 2248370 | [u'18245339'] | ['Miura', 'Usami', 'Minegishi', 'Abe'] | ['Minegishi', 'Abe'] | ['Minegishi', 'Abe'] | Genetics | 2008 | 2008 Feb | 0 | nged upon the shifts of growth condition to high pressure or low temperature except for downregulation of GDH1 by 2.5-fold under low temperature (the microarray data is available at Gene Expression Omnibus, accession no. GSE9136{{tag}}--DEPOSIT-- ). These results suggest that cells respond to high pressure and low temperature in a distinct program from that in response to rapamycin treatment. F igure 4.— Cell |ncoding 24-amino-acid permeases and their homologs were downregulated upon shifts to growth under high pressure and low temperature ( Table 2 ; the DNA microarray data is available at Gene Expression Omnibus, accession no. GSE9136{{tag}}--DEPOSIT-- ). In particular, genes classified into the amino acid permease cluster I and cluster II were markedly decreased. With reduced levels of transcripts, and thereby synthesis of | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1520 | GSE9137 | 9/26/2007 | ['9137'] | [] | [] | 2613928 | [u'18990226'] | [''] | ['Juskeviciute', 'Vadigepalli', 'Hoek'] | [] | BMC Genomics | 2008 | 11/6/2008 | 0 | Elmer, Waltham, MA). Raw quantitated array data was normalized using the print-tip lowess and scale normalization algorithms [ 52 ]. MIAME compliant microarray data are deposited at , accession # GSE7415 (PHx) and GSE9137{{tag}}--DEPOSIT-- (sham). ANOVA model The normalized gene expression data was analyzed using a mixed-effects ANOVA response model for each gene using the statistical software package in R follow | 0 | 1 | 0 | NOT pmc_gds | 0 | 1 |
1521 | GSE9138 | 10/25/2007 | ['9138'] | [] | [u'17952056'] | 2825236 | [u'20113528'] | ['Lin', 'Yin'] | ['Korbie', 'Mattick', 'Hansen', 'Makunin', 'Jung'] | [] | BMC Genomics | 2010 | 2/1/2010 | 0 | As frequently cover the entire length of annotated snoRNAs or tRNAs, which suggests that other loci specifying similar ncRNAs can be identified by clusters of short RNA sequences. Results We combined publicly available datasets of tens of millions of short RNA sequence tags from Drosophila melanogaster , and mapped them to the Drosophila genome. Approximately 6 million perfectly mapping sequence tags w|snoRNAs, as well as a number of novel ncRNAs. Results Compilation of short RNA sequence reads into tag-contigs We obtained 10,846,433 sequence tags comprising 55,894,809 reads from 12 Gene Expression Omnibus (GEO) datasets (Table 1 ) derived from 90 experiments performed on Drosophila cell lines and tissues. Approximately 6 million tags were perfectly mapped to the D. melanogaster genome, excluding |ee Methods). As a measure of expression level, each TC was assigned a tag-depth score based on the maximum number of overlapping reads covering any part of the locus (Fig. 1 ) (see Methods). Table 1 Publicly available short RNA sequencing datasets on D. malanogaster GEO accession No. of tags Mappable References GSE10277 23252 12057 [ 14 ] GSE10515 49878 12096 [ 15 ] GSE10790 347861 30780 [ 19 ] GSE10794|19 255670 381508 [ 9 ] GSE11086 1277025 1509771 [ 13 ] GSE11624 6643474 3125323 [ 12 ] GSE6734 32160 34362 [ 8 ] GSE7448 753797 452471 [ 17 ] GSE9138{{tag}}--REUSE-- 13299 13294 [ 20 ] GSE9389 59906 32472 [ 18 , 9 ] GSE12527 2967 817 [ 11 ] total 10846433 6297373 Figure 1 Compilation of a tag-contig . Contiguously overlapping tags (grey arrows) were assembled into a tag-contig (TC) (block arrow). The tag-depth is the |rts of existing transposons generating siRNAs or piRNAs. Conclusions Several studies investigating the population of small RNAs have yielded millions of sequence reads. In this study, we combined all publicly available sequence data from Drosophila melanogaster short RNA into hundreds of thousands tag-contigs and associated subsets of them with known ncRNAs such as snoRNAs and tRNAs. The characteristic | miRbase release 12.0 [ 25 ]. Repeats were annotated using RepeatMasker [ 43 ] in FlyBase 5.12. Mapping of sequence tags We obtained all public available deep-sequencing datasets from Gene Expression Omnibus database at National Center for Biotechnology Information http://www.ncbi.nlm.nih.gov/geo in SOFT format (Table 1 ). These sequences were subsequently mapped to the genome of D. melanogaster usi | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1522 | GSE9151 | 10/1/2007 | ['9151'] | ['3003'] | [u'17919147'] | 2972666 | [u'20975711'] | ['Vroling', 'Duinsbergen', 'van', 'Fokkens'] | ['Wang', 'Li', "O'Connor-McCourt", 'Purisima', 'Collins', 'Deng', 'Lenferink', 'Cui'] | [] | Nat Commun | 2010 | 7/13/2010 | 0 | The breast cancer microarray data sets used were from Wang et al.15 (Wang cohort or data set, Affymetrix arrays), Chang et al.6 (Chang cohort or data set, cDNA arrays), van 't Veer et al.5 (van 't Veer cohort or data set, cDNA arrays), Miller et al.21 (Miller cohort or data set, Affymetrix arrays) and from several other Affymetrix array data sets with the following NCBI GEO (http://www.ncbi.nlm.nih.gov/geo/) IDs:GSE11121, GSE1456, GSE6532, GSE9151{{key}}--REUSE--, GSE7378 and GSE12093. | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1523 | GSE9157 | 9/26/2007 | ['9157'] | [] | [u'18208606'] | 2395253 | [u'18208606'] | ['Viguerie', 'Zucker', 'Stich', 'Poitou', 'Henegar', 'Achard', 'Langin', 'Bedossa', 'Lacasa', 'Cremer', 'Tordjman', 'Guerre-Millo', 'Clement', 'Basdevant'] | ['Viguerie', 'Zucker', 'Stich', 'Poitou', 'Henegar', 'Achard', 'Langin', 'Bedossa', 'Lacasa', 'Cremer', 'Tordjman', 'Guerre-Millo', 'Clement', 'Basdevant'] | ['Viguerie', 'Zucker', 'Poitou', 'Langin', 'Achard', 'Henegar', 'Bedossa', 'Lacasa', 'Stich', 'Tordjman', 'Cremer', 'Guerre-Millo', 'Clement', 'Basdevant'] | Genome Biol | 2008 | 1/21/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1524 | GSE9164 | 11/22/2007 | ['9164'] | [] | [u'18029452'] | 2761903 | [u'19772654'] | [u'Heidarsdottir', 'Thomson', 'Frane', 'Jonsdottir', 'Nie', 'Tian', 'Antosiewicz-Bourget', 'Vodyanik', 'Slukvin', 'Yu', 'Ruotti', 'Smuga-Otto', 'Stewart'] | ['Gillis', 'Pavlidis'] | [] | BMC Bioinformatics | 2009 | 9/22/2009 | 0 | experiment name, organism part, array design and age category for the experiments are listed in each column. Experiments used for analysis . Gemma ID Name Organism part Array Design Age category 622 GSE8586 Umbilical cord GPL570 Prenatal 726 GSE9164{{tag}}--REUSE-- Foreskin cells GPL5876 Prenatal 233 GSE1397 Brain, heart GPL96 Prenatal 215 khatua-astrocytoma Brain GPL91 Child/young adult 218 pomeroy-embryonal Brain, |Child/young adult 555 GSE5808 Blood cell GPL96 Child/young adult 585 GSE7586 Placenta GPL570 Adult 178 GSE80 Muscle GPL91 Adult 633 GSE8607 Testis GPL91 Adult 275 GSE4757 Brain GPL570 Older adult 721 GSE8919 Brain GPL2700 Older adult 263 GSE5281 Brain GPL570 Older adult To allow the investigation of differential expression over age, we computed a relative rank-based measure of expression level for each | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1525 | GSE9166 | 10/30/2007 | ['9166'] | ['3068'] | [u'17967932'] | 2881867 | [u'20539758'] | ['Liguori', 'Waring', 'Blomme'] | ['Duvall', 'Irvin', 'Zhang', 'Wheeler', 'Zhai', 'Black', 'Panwar', 'Seksenyan', 'Jouanneau', 'Sarayba'] | [] | PLoS One | 2010 | 6/7/2010 | 0 | ) Principal Component Analyses focused on discrete gene lists were plotted in GeneSpring GX 7.3, and group clusters circled, on: 59 GBMs from UCLA database (“UCLA GBM”; GEO accession #GSE4412), 12 GBMs from 6 patients collected before and after DC vaccination (“vaccinated GBM”; GEO accession #GSE9166{{tag}}--REUSE--); 10 GBMs from 5 patients collected before and after standard radiation|c;control GBM”; GEO accession #GSE9166{{tag}}--REUSE--) (red); CD133 - and CD133 + CSCs from 6 University of Regensberg GBM patients [29] (“UR GSC”; GEO accession #GDS2728) (green); stem cell media-cultured GBM lines from 2 Henry Ford Hospital patients (“HFH CS lines”; GEO accession #GSE4536); murine GL26 glioma samples recovered and cultured ≤|pression ( Fig. S1A ). GL26B6V exhibited parallel trends in all analogous gene lists (right column). (B) Primary GBM microarray expression values from 200 Henry Ford Hospital patients (GEO accession #GSE4536) were assessed for similarity to averaged expression values of 6 UCLA glioma CSCs by determining Pearson's coefficients across 54,674 transcripts, and arranged in order of ascending coefficient val|ne.0010974.g002 Figure 2 Regulation of stem-like gene expression in proportion to anti-tumor T cell activity. (A) CSC similarity (Pearson's coefficient for similarity to GSCs – GEO accession #GDS2728 - across all transcripts) from 200 Henry Ford Hospital GBM patients (GEO accession #GSE4536) and 6 CSMC GBM patients was assessed and found to be statistically identical, demonstrating absence of r|M”, total n = 200), and cultured GBM lines grown in stem cell media (“HFH CS lines”, total n = 23) as indicated (GEO accession #GSE4536 for both), were arranged in groups with increasing global CSC similarity as in Fig. 1B (n = 20/group for GBM, groups A-J; n = 5 for 4 groups, and n|GSCs across all transcripts), and CD133 expression, were determined for GBM from HFH (n = 200) and high-grade gliomas from UCLA (n = 45; GEO accession #GSE4412), plotted against each other for each individual sample, trendlines generated, and r and P values determined as depicted. CSC similarity correlated significantly with CD133 expression within two se|arity, to distinguish non-fractionated GBM from CD133 − or CD133 + GSCs (29) (GEO accession #GDS2728) or from stem cell media-cultured GBM lines from 2 patients (37) (GEO accession #GSE4536); was determined (P<0.01 denoted by red asterisk). Unlike CD133 expression, only GSC similarity distinguished CD133 + or CD133 GSCs (from multiple sources) from surgical GBM sample|fficient for similarity to GSCs - GEO accession #GDS2728 - across all transcripts), and CD133 expression, were determined for de novo GBM, secondary GBM, and grade 3 gliomas from UCLA (GEO accession #GSE4412) and each parameter assessed for inter-group differences by one-tailed T-test (P<0.01 denoted by red asterisk). CD133 expression has been shown to be highest in de novo GBM (29), and this w | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1526 | GSE9167 | 12/1/2007 | ['9167'] | [] | [u'17965720'] | 2302790 | [u'17965720'] | ['Menezes', 'Huso', 'Piontek', 'Germino', 'Garcia-Gonzalez'] | ['Menezes', 'Huso', 'Piontek', 'Germino', 'Garcia-Gonzalez'] | ['Menezes', 'Huso', 'Piontek', 'Germino', 'Garcia-Gonzalez'] | Nat Med | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1527 | GSE9168 | 10/4/2007 | ['9168'] | [] | [u'17993533'] | 2223683 | [u'17993533'] | ['Lubelski', 'Kuipers', 'Driessen', 'van', 'Agustiandari'] | ['Lubelski', 'Kuipers', 'Driessen', 'van', 'Agustiandari'] | ['Lubelski', 'Kuipers', 'Driessen', 'van', 'Agustiandari'] | J Bacteriol | 2008 | 2008 Jan | 0 | AND pmc_gds | 1 | 0 | ||||
1528 | GSE9184 | 9/28/2007 | ['9184'] | [] | [u'18427110'] | 2329691 | [u'18427110'] | ['Tang', 'Wilcox-Adelman', 'Park', 'Kim', 'Collier', 'Sano'] | ['Tang', 'Wilcox-Adelman', 'Park', 'Kim', 'Collier', 'Sano'] | ['Tang', 'Wilcox-Adelman', 'Park', 'Kim', 'Collier', 'Sano'] | Proc Natl Acad Sci U S A | 2008 | 4/22/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1529 | GSE9186 | 11/6/2007 | ['9186'] | [] | [u'17989247'] | 2099583 | [u'17989247'] | ['Marshall', 'Kassner', 'Chin', 'Baribault', 'Cutler'] | ['Marshall', 'Kassner', 'Chin', 'Baribault', 'Cutler'] | ['Baribault', 'Marshall', 'Chin', 'Kassner', 'Cutler'] | Genome Res | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1530 | GSE9196 | 11/1/2007 | ['9196'] | [] | [u'17999768'] | 2258184 | [u'17999768'] | ['', 'Feng', 'Atala', 'Lanza', 'Lu', 'Hipp'] | ['Feng', 'Lu', 'Atala', 'Hipp', 'Lanza'] | ['Feng', 'Lu', 'Atala', 'Hipp', 'Lanza'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1531 | GSE9210 | 12/31/2007 | ['9210'] | [] | [u'18266473'] | 2988811 | [u'21124965'] | ['Tanaka', 'Tajima', 'Okada', 'Inoue', 'Shichiri'] | ['Goto', 'Rennert', 'Nagashima', 'Kumamoto', 'Hussain', 'Saito', 'Horikawa', 'Harris', 'Furusato', 'Robles', 'Yokota', 'Baxendale', 'Trivers', 'Sesterhenn', 'Takenoshita', 'Okamura', 'Yamashita', 'Lee'] | [] | PLoS One | 2010 | 11/19/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
1532 | GSE9210 | 12/31/2007 | ['9210'] | [] | [u'18266473'] | 2233677 | [u'18266473'] | ['Tanaka', 'Tajima', 'Okada', 'Inoue', 'Shichiri'] | ['Tanaka', 'Tajima', 'Okada', 'Inoue', 'Shichiri'] | ['Tanaka', 'Tajima', 'Okada', 'Inoue', 'Shichiri'] | PLoS Genet | 2008 | 2008 Feb | 0 | AND pmc_gds | 1 | 0 | ||||
1533 | GSE9211 | 12/31/2007 | ['9211'] | [] | [u'17984176'] | 2793095 | [u'19675559'] | ['Rouzaud', 'Watabe', 'Yasumoto', 'Hara', 'Katayama', 'Hoashi', 'Passeron', 'Miki', 'Tohyama', 'Yamaguchi', 'Hearing'] | ['Yamaguchi', 'Morita', 'Hearing', 'Maeda'] | ['Yamaguchi', 'Hearing'] | J Investig Dermatol Symp Proc | 2009 | 2009 Aug | 0 | formed microarray analyses and compared gene expression patterns in normal human melanocytes treated with or without DKK1 ( Yamaguchi et al., 2007b ). The results ( http://www.ncbi.nlm.nih.gov/geo/ GSE5515 ) indicated involvement of apoptosis pathways, including TNF-α and Gadd45, and of melanocyte receptors for DKK1 other than LRP5/6 and Kremen1/2, which explains the suppression of growth and of mela|cted human skin with DKK1 induced a thickened epidermis. We performed microarray analyses using normal human keratinocytes either treated or not treated with DKK1 ( http://www.ncbi.nlm.nih.gov/geo/ GSE9211{{tag}}--DEPOSIT-- ), and we identified three genes responsible for inducing the thickened phenotype: keratin 9, αKLEIP and β-catenin. Keratin 9 seems to be essential to support the skin against severe mechanical sti | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1534 | GSE9214 | 10/11/2007 | ['9214'] | [] | [u'17986355'] | 2220004 | [u'17986355'] | ['Jones', 'Maydan', 'Moerman', 'Baillie', 'Flibotte'] | ['Jones', 'Maydan', 'Moerman', 'Baillie', 'Flibotte'] | ['Jones', 'Maydan', 'Moerman', 'Baillie', 'Flibotte'] | BMC Genomics | 2007 | 11/7/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1535 | GSE9222 | 12/20/2007 | ['9222'] | [] | [u'18252227'] | 2426913 | [u'18252227'] | ['Summers', 'Ketelaars', 'Sloman', 'Ren', 'Noor', 'Thompson', 'Shago', 'Chitayat', 'Kirkpatrick', 'Baatjes', 'Luscombe', 'Skaug', 'Ficicioglu', 'Thiruvahindrapduram', 'Gibbons', 'Lionel', 'Weksberg', 'Nicolson', 'Zwaigenbaum', 'Roberts', 'Scherer', 'Crosbie', 'Vincent', 'Fernandez', 'Feuk', 'Vardy', 'Schreiber', 'Marshall', 'Moessner', 'Szatmari', 'Vos', 'Fiebig', 'Teebi', 'Pinto', 'Friedman'] | ['Summers', 'Ketelaars', 'Sloman', 'Ren', 'Noor', 'Thompson', 'Shago', 'Chitayat', 'Kirkpatrick', 'Baatjes', 'Luscombe', 'Skaug', 'Ficicioglu', 'Thiruvahindrapduram', 'Gibbons', 'Lionel', 'Weksberg', 'Nicolson', 'Zwaigenbaum', 'Roberts', 'Scherer', 'Crosbie', 'Vincent', 'Fernandez', 'Feuk', 'Vardy', 'Schreiber', 'Marshall', 'Moessner', 'Szatmari', 'Vos', 'Fiebig', 'Teebi', 'Pinto', 'Friedman'] | ['Summers', 'Nicolson', 'Sloman', 'Ren', 'Noor', 'Thompson', 'Shago', 'Chitayat', 'Kirkpatrick', 'Baatjes', 'Luscombe', 'Skaug', 'Ficicioglu', 'Ketelaars', 'Gibbons', 'Lionel', 'Crosbie', 'Thiruvahindrapduram', 'Zwaigenbaum', 'Roberts', 'Scherer', 'Vincent', 'Fernandez', 'Feuk', 'Vardy', 'Schreiber', 'Marshall', 'Moessner', 'Weksberg', 'Szatmari', 'Vos', 'Fiebig', 'Teebi', 'Pinto', 'Friedman'] | Am J Hum Genet | 2008 | 2008 Feb | 0 | AND pmc_gds | 1 | 0 | ||||
1536 | GSE9239 | 10/6/2007 | ['9239'] | [] | [u'18039949'] | 2150965 | [u'18039949'] | ['Fuentes-Duculan', 'Sullivan-Whalen', 'Lowes', 'Novitskaya', 'Zaba', 'Khatcherian', 'Krueger', 'Su\xc3\xa1rez-Fari\xc3\xb1as', 'Bluth', 'Gilleaudeau', 'Cardinale'] | ['Fuentes-Duculan', 'Sullivan-Whalen', 'Lowes', 'Novitskaya', 'Zaba', 'Khatcherian', 'Krueger', 'Su\xc3\xa1rez-Fari\xc3\xb1as', 'Bluth', 'Gilleaudeau', 'Cardinale'] | ['Fuentes-Duculan', 'Sullivan-Whalen', 'Lowes', 'Zaba', 'Khatcherian', 'Su\xc3\xa1rez-Fari\xc3\xb1as', 'Krueger', 'Novitskaya', 'Bluth', 'Gilleaudeau', 'Cardinale'] | J Exp Med | 2007 | 12/24/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1537 | GSE9247 | 10/6/2007 | ['9247'] | ['3002'] | [u'17925016'] | 2147034 | [u'17925016'] | ['Schroeder', 'Lamblin', 'Staggs', 'Nair', 'Westendorf'] | ['Schroeder', 'Lamblin', 'Staggs', 'Nair', 'Westendorf'] | ['Schroeder', 'Lamblin', 'Staggs', 'Nair', 'Westendorf'] | BMC Genomics | 2007 | 10/9/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1538 | GSE9248 | 10/18/2007 | ['9248'] | [] | [u'17965872', u'19587787'] | 2258211 | [u'17965872'] | ['Fay', 'MacArthur', 'Schwartz', 'Schaaf', 'Misulovin', 'Sahota', 'Biggin', 'Li', 'Siddiqui', 'Kahn', 'Gause', 'Eisen', 'Dorsett', 'Pirrotta'] | ['Fay', 'MacArthur', 'Schwartz', 'Misulovin', 'Biggin', 'Li', 'Kahn', 'Gause', 'Eisen', 'Dorsett', 'Pirrotta'] | ['Fay', 'MacArthur', 'Schwartz', 'Misulovin', 'Biggin', 'Li', 'Kahn', 'Gause', 'Eisen', 'Dorsett', 'Pirrotta'] | Chromosoma | 2008 | 2008 Feb | 0 | AND pmc_gds | 1 | 0 | ||||
1539 | GSE9248 | 10/18/2007 | ['9248'] | [] | [u'17965872', u'19587787'] | 2703808 | [u'19587787'] | ['Fay', 'MacArthur', 'Schwartz', 'Schaaf', 'Misulovin', 'Sahota', 'Biggin', 'Li', 'Siddiqui', 'Kahn', 'Gause', 'Eisen', 'Dorsett', 'Pirrotta'] | ['Dorsett', 'Schwartz', 'Schaaf', 'Misulovin', 'Sahota', 'Siddiqui', 'Kahn', 'Gause', 'Pirrotta'] | ['Dorsett', 'Schwartz', 'Schaaf', 'Misulovin', 'Sahota', 'Siddiqui', 'Kahn', 'Gause', 'Pirrotta'] | PLoS One | 2009 | 7/9/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
1540 | GSE9253 | 10/19/2007 | ['9253'] | [] | [u'17515612'] | 2782370 | [u'19117983'] | ['Kraus', 'Isaacs', 'Clark', 'Zhang', 'Diehl', 'Chen', 'Siepel', 'Kininis'] | ['Steffen', 'Hilsenbeck', 'Ochsner', 'Chen', 'McKenna', 'Watkins'] | ['Chen'] | Cancer Res | 2009 | 1/1/2009 | 0 | t datasets at either time point. Moreover, relaxing the q -value cut -off to 0.2 resulted in only a modest increase in the number of genes in this intersection (data not shown here but available for download from the GEMS website). This initial result indicated that given the extent in variation across the datasets, traditional Venn analysis would be of limited use in arriving at a consensus gene express| Table 1 Studies selected for meta-analysis. Supplementary Material body Supplementary Click here to view. (4.1M, zip) Acknowledgments We thank the principal investigators who made their datasets publicly available. This work was supported by NIDDK NURSA U19 DK62434. Other Sections Abstract Introduction Materials and Methods Results Gene Expression MetaSignatures (GEMS) web resource Discussion | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1541 | GSE9253 | 10/19/2007 | ['9253'] | [] | [u'17515612'] | 2802188 | [u'20008927'] | ['Kraus', 'Isaacs', 'Clark', 'Zhang', 'Diehl', 'Chen', 'Siepel', 'Kininis'] | ['Kraus', 'Krishnakumar', 'Yang', 'Frizzell', 'Gamble'] | ['Kraus'] | Genes Dev | 2010 | 1/1/2010 | 0 | AND pmc_gds | 1 | 0 | ||||
1542 | GSE9254 | 12/1/2007 | ['9254'] | ['3141'] | [u'18056783'] | 2620272 | [u'19014681'] | ['Brown', 'Molloy', 'Dunne', 'Wattchow', 'LaPointe', 'Young', 'Worthley'] | ['Tozeren', 'Gormley'] | [] | BMC Bioinformatics | 2008 | 11/17/2008 | 1 | or therapeutic targets in tissue-specific diseases. Table 1 Microarray datasets used in this study Tissue Phenotype Data Tissue No. of Samples Gene Expression Omnibus/Array Express Accn. # Adipose 10 GSE3526 Adrenal 20 GSE3526, GSE8514, GSE2316 Brain 89 GSE3526, GSE7621, GSE7307, GSE2361, E_AFMX-11, E-TABM-20, Colon 10 E-TABM-176, GSE8671, GSE9254{{tag}}--REUSE--, GSE9452 Epidermal 25 GSE1133, GSE2361, GSE3419, GSE352|38 E_AFMX-11, E-MIMR-27, GSE1133, GSE2240, GSE2361, GSE3526, GSE3585, GSE7307 Kidney 10 E_AFMX-11, GSE2004, GSE2361, GSE3526, GSE7392 Liver 10 E_AFMX-11, GSE2004, GSE3526, GSE6764 Lung 26 E-MEXP-231, GSE10072, GSE1133, GSE2361, GSE3526 Mammary 15 E-TABM-66, GSE2361, GSE3526, GSE7307, GSE7904 Muscle 64 GSE10760, GSE2328, GSE3526, GSE5110, GSE6798, GSE7307, GSE9103, Ovary 10 GSE2361, GSE3526, GSE6008, GS|ancreas 6 GSE1133, GSE2361, GSE7307 Peripheral blood 12 GSE7462, GSE8608, GSE8668, GSE8762, GSE9692 Small intestine 7 GSE2361, GSE7307 Spleen 12 GSE2004, GSE2361, GSE3526, GSE7307 Stomach 10 GSE2361, GSE3526, GSE7307 Testis 38 E_AFMX-11, GSE1133, GSE2361, GSE3218, GSE3526, GSE7307, GSE7808 Thymus 5 GSE1133, GSE2361, GSE7307 Infectious Disease Disease No. of Samples Gene Expression Omnibus/Array Express | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1543 | GSE9254 | 12/1/2007 | ['9254'] | ['3141'] | [u'18056783'] | 2996925 | [u'20565988'] | ['Brown', 'Molloy', 'Dunne', 'Wattchow', 'LaPointe', 'Young', 'Worthley'] | ['Lam', 'Gong', 'Wang', 'Matsudaira', 'Mathavan', 'Du'] | [] | BMC Genomics | 2010 | 6/22/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
1544 | GSE9258 | 10/16/2007 | ['9258'] | [] | [u'18508958'] | 2442537 | [u'18508958'] | ['Daniel-Vedele', u'Vedele', 'M\xc3\xa9rigout', 'Renou', u'Martin-Magniette', 'Bitton', 'Lelandais', 'Briand', 'Meyer'] | ['Daniel-Vedele', 'M\xc3\xa9rigout', 'Renou', 'Bitton', 'Lelandais', 'Briand', 'Meyer'] | ['Daniel-Vedele', 'M\xc3\xa9rigout', 'Renou', 'Briand', 'Lelandais', 'Bitton', 'Meyer'] | Plant Physiol | 2008 | 2008 Jul | 0 | AND pmc_gds | 1 | 0 | ||||
1545 | GSE9259 | 12/1/2007 | ['9259'] | [] | [] | 2335119 | [u'18331636'] | [u'Chardon', u'Flori', u'Cochet', u'Rogel-Gaillard', u'Lef\xe8vre', u'Lemonnier', u'Hugot', u'Robin'] | ['Lef\xc3\xa8vre', 'Hugot', 'Flori', 'Cochet', 'Rogel-Gaillard', 'Lemonnier', 'Chardon', 'Robin'] | ['Chardon', 'Flori', 'Cochet', 'Rogel-Gaillard', 'Lemonnier', 'Hugot', 'Robin'] | BMC Genomics | 2008 | 3/10/2008 | 0 | ème d'Information du projet d'Analyse des Genomes des Animaux d'Elevage) [ 20 ]. The SLA/PrV and the Qiagen-NRSP8 microarray data have been submitted to the GEO and received accession numbers GSE8676 and GSE9259{{tag}}--DEPOSIT--, respectively. Statistical data analysis The normalization and statistical analysis steps were performed with scripts written with R software. Functions contained in stats, anapuce and | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1546 | GSE9264 | 10/9/2007 | ['9264'] | ['3007'] | [u'17881434'] | 2169101 | [u'17881434'] | ['Renne', 'Riva', 'Skalsky', 'Baker', 'Plaisance', 'Boss', 'Samols', 'Lopez'] | ['Renne', 'Riva', 'Skalsky', 'Baker', 'Plaisance', 'Boss', 'Samols', 'Lopez'] | ['Renne', 'Riva', 'Skalsky', 'Baker', 'Plaisance', 'Boss', 'Samols', 'Lopez'] | J Virol | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1547 | GSE9266 | 10/23/2007 | ['9266'] | [] | [u'18414485'] | 2387230 | [u'18414485'] | ['Igoshin', 'Veening', 'Kuipers', 'Nijland', 'Eijlander', 'Hamoen'] | ['Igoshin', 'Veening', 'Kuipers', 'Nijland', 'Eijlander', 'Hamoen'] | ['Igoshin', 'Veening', 'Kuipers', 'Nijland', 'Eijlander', 'Hamoen'] | Mol Syst Biol | 2008 | 2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1548 | GSE9286 | 10/12/2007 | ['9286'] | [] | [u'18378698'] | 2423295 | [u'18378698'] | ['Subtil-Rodr\xc3\xadguez', 'Quiles', u'Cecilia', 'Beato', u'Miguel', 'Jordan', 'Mill\xc3\xa1n-Ari\xc3\xb1o', u'Subtil', u'Minana', 'Ballar\xc3\xa9', u'Lauro'] | ['Subtil-Rodr\xc3\xadguez', 'Quiles', 'Beato', 'Jordan', 'Mill\xc3\xa1n-Ari\xc3\xb1o', 'Ballar\xc3\xa9'] | ['Subtil-Rodr\xc3\xadguez', 'Quiles', 'Beato', 'Jordan', 'Mill\xc3\xa1n-Ari\xc3\xb1o', 'Ballar\xc3\xa9'] | Mol Cell Biol | 2008 | 2008 Jun | 0 | AND pmc_gds | 1 | 0 | ||||
1549 | GSE9290 | 10/20/2007 | ['9290'] | [] | [u'18049034'] | 2801700 | [u'20003344'] | ['Baba', 'Kuwahara', 'Wang', 'Nakagawa', 'Fukumoto', 'Li', 'Ohkubo', 'Roudkenar', 'Kasaoka', 'Ono'] | ['Van', 'Marchi', 'Romualdi', 'M\xc3\xbcller', 'Radonjic', 'Calura', 'Cavalieri'] | [] | BMC Genomics | 2009 | 12/11/2009 | 0 | dicted PPAR targets [ 13 ]. Our study complemented the genome-wide analysis conducted to date by adding a meta-analysis performed across species of expression data related to PPARα signaling. Publicly available gene expression studies selected for our meta-analysis included experiments addressing molecular response to high fat diet, PPARα activation by various stimuli and gene expression i|tails). Table 1 Meta-analysis data collection. PPARα signaling n° Reference Dataset Accession Number GEO/AE Org Tissue Technology PPARα signaling activated by WY14643 1 [ 55 ] GSE8302/E-GEOD-8302 Hs Liver Affymetrix 2 [ 55 ] GSE8302/E-GEOD-8302 Mm Liver Affymetrix 3 [ 55 ] GSE8302/E-GEOD-8302 Rn Liver Affymetrix PPARα signaling repressed using PPARα knokout mice |] GSE8291/E-GEOD-8291 Mm Liver Affymetrix 6 [ 55 ] GSE8292/E-GEOD-8292 Mm Liver Affymetrix 7 [ 55 ] GSE8295/E-GEOD-8295 Mm Liver Affymetrix PPARα signaling activated by High fat diet 8 [ 56 ] GSE8753/E-GEOD-8753 Mm Liver Affymetrix 9 [ 57 ] GSE6903/E-GEOD-6853 Mm Liver Affymetrix 10 [ 58 ] GSE8524/~ Mm Liver Affymetrix 11 [ 59 ] GSE1560/E-GEOD-1560 Mm Aorta Oligo Array 12 [ 60 ] GSE8700/E-GEOD-|Table 2 Validation data sets. n° PPARα signalling Reference Dataset Accession Number GEO/AE Org Tissue Technology 1 PPARα signaling activated by WY14643 (PPARα WY14643-GSE8396) [ 64 ] GSE8396/E-GEOD-8396 Mm Liver Affymetrix 2 PPARα signaling repressed using PPARα knokout mice (PPARα KO1-GSE8396) [ 64 ] GSE8396/E-GEOD-8396 Mm Liver Affymetrix 3 PPA|α signaling activated by High fat diet (HFD-E-MEXP-1755) [ 65 ] ~/E-MEXP-1755 Mm Liver Affymetrix 5 Oleate response repressed using knokout yeast of a transcription promoter (del_ADR1) [ 50 ] GSE5862/~ Sc ~ Oligo Array 6 Oleate response repressed using knokout yeast of a transcription promoter (del_PIP2) [ 50 ] GSE5862/~ Sc ~ Oligo Array 7 Oleate response repressed using knokout yeast of a tran|y 8 Oleate response activated using knokout yeast of a transcription repressor (del_OAF3) [ 50 ] GSE5862/~ Sc ~ Oligo Array 9 Oleate response activated by High fat diet (oleate_vs_low_glucose) [ 50 ] GSE5862/~ Sc ~ Oligo Array Statistical analysis of microarray data Gene expression of Affymetrix datasets were quantified and separately normalized using rma technique [ 17 ] and. EntrezGene Custom CDF f|ers of biological replicates as required for powerful inference. Fold change cut-off, filtered by variance coefficient, was used to select DEGs in those datasets with less than 3 replicates per gene (GSE8302, GSE9291 and GSE9290{{tag}}--REUSE--). Pathways analysis on DEGs Enrichment analysis on metabolic pathways was calculated for each dataset using Fisher exact test based on hypergeometric distribution with a p-val | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1550 | GSE9295 | 11/30/2007 | ['9295'] | [] | [] | 2080756 | [u'18043759'] | [u'Ambrose', u'Svensson', u'Rusyn', u'Samson', u'Begley', u'Fry', u'Klapacz'] | ['Ambrose', 'Svensson', 'Rusyn', 'Samson', 'Fry', 'Begley', 'Klapacz'] | ['Ambrose', 'Svensson', 'Rusyn', 'Samson', 'Begley', 'Fry', 'Klapacz'] | PLoS One | 2007 | 11/28/2007 | 0 | BER relative to the wild type strain (ANOVA, p<0.05; Fold change ≥1.5, ≤−1.5). All microarray data have been submitted to the Gene Expression Omnibus, accession number GSE9295{{tag}}--DEPOSIT-- ( www.ncbi.nlm.nih.gov/geo/ ). Network Analysis, Gene Ontology and ESR Enrichment Network analysis was carried out using the Cytoscape software [24] . Transcriptionally modulated O | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1551 | GSE9301 | 11/1/2007 | ['9301'] | [] | [] | 2762118 | [u'19627265'] | [u'Park', u'Tedesco', u'Johnson'] | ['Park', 'Johnson', 'Tedesco'] | ['Park', 'Johnson', 'Tedesco'] | Aging Cell | 2009 | 2009 Jun | 0 | actor. This scaling factor is multiplied by each probe set signal to give the raw expression. The microarray gene expression data are available at http://www.ncbi.nlm.nih.gov/geo/ (accession number GSE9301{{tag}}--DEPOSIT--). The fold change (Fold Δ) was calculated using mean signal intensity of each individual gene in each study group. After array normalization, Significance Analysis of Microarray (SAM) was p | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1552 | GSE9306 | 10/31/2007 | ['9306'] | [] | [u'17989215'] | 2084302 | [u'17989215'] | ['Sharp', 'Calabrese', 'Seila', 'Yeo'] | ['Sharp', 'Calabrese', 'Seila', 'Yeo'] | ['Sharp', 'Calabrese', 'Seila', 'Yeo'] | Proc Natl Acad Sci U S A | 2007 | 11/13/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1553 | GSE9307 | 11/6/2007 | ['9307'] | [] | [u'18276640'] | 2346606 | [u'18276640'] | ['Macgregor', 'Henders', 'Montgomery', 'Zhao', u'Martin', 'Visscher', 'Nicholas'] | ['Macgregor', 'Henders', 'Montgomery', 'Zhao', 'Visscher', 'Nicholas'] | ['Macgregor', 'Henders', 'Montgomery', 'Zhao', 'Visscher', 'Nicholas'] | Nucleic Acids Res | 2008 | 2008 Apr | 0 | AND pmc_gds | 1 | 0 | ||||
1554 | GSE9311 | 10/24/2007 | ['9311'] | ['3243', '3225'] | [u'18251864'] | 2785812 | [u'19917117'] | ['Van', 'Tamaoki', 'Inoue', 'Pilon-Smits', 'Hess', 'Takahashi'] | ['Kroll', 'Barkema', 'Carlon'] | [] | Algorithms Mol Biol | 2009 | 11/16/2009 | 1 | d to the local physical neighbors and to the nearest-neighbor free energy. Table 1 Overview over organisms and number of CEL-files analyzed Organism GEO # Chiptype (dimension) # files min A. Thaliana GSE4847 ATH1-121501 (712 × 712) 18 0.0259 GSE7642 ATH1-121501 (712 × 712) 12 0.0544 GSE9311{{tag}}--REUSE-- ATH1-121501 (712 × 712) 8 0.0546 C. Elegans GSE6547 Celegans (712 × 712) 25 0.036|00d7; 712) 7 0.0396 D. Melanogaster GSE3990 Drosophila_2 (732 × 732) 6 0.0620 GSE6558 DrosGenome1 (640 × 640) 24 0.0605 D. Rerio GSE4859 Zebrafish (712 × 712) 8 0.0357 E. Coli GSE11779 E_coli_2 (478 × 478) 3 0.0869 GSE2928 Ecoli (544 × 544) 12 0.0172 GSE6195 E_coli_2 (478 × 478) 4 0.0664 H. Sapiens GSE10433 HG-U133A_2 (732 × 732) 12 0.0757 GSE5054|-U133A (712 × 712) 14 0.0296 GSE8514 HG-U133_Plus_2 (1164 × 1164) 15 0.0738 GSE11897 MOE430A (712 × 712) 11 0.0640 MOE430B (712 × 712) Mouse430_2 (1002 × 1002) GSE6210 Mouse430_2 12 0.0594 GSE6297 Mouse430_2 24 0.0325 O. Sativa GSE15071 Rice (1164 × 1164) 20 0.1157 R. Norvegicus GSE4494 RG_U34A (534 × 534) 59 0.0488 GSE7493 Rat230_2 (834 ×|2 (834 × 834) 4 0.0640 S. Aureus GSE7944 S_aureus (602 × 602) 6 0.0746 S. Cerevisiae GSE6073 YG_S98 (534 × 534) 12 0.0283 GSE8379 YG_S98 (534 × 534) 8 0.0180 X. Laevis GSE3368 Xenopus_laevis (712 × 712) 20 0.0514 Table 2 Optimized parameter values as obtained from the minimization of Eq. (3). A. Thaliana C. Elegans D. Melanogaster D. Rerio GEO no GSE4847 GSE7642|d in this work. Figure 5 Computed background vs. low expressed genes data . Experimental results (solid black line) and theoretical prediction (dashed red line) of probeset a) 256610_at (A. Thaliana, GSE4847, GSM109107.CEL) and b) 175270_at (C. Elegans, GSE8159, GSM201995.CEL). 
Discussion We have presented a background subtraction scheme for Affymetrix GeneExpression arrays which is both, accurate and | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1555 | GSE9314 | 11/1/2007 | ['9314'] | [] | [u'18658108'] | 2542434 | [u'18658108'] | ['Lang', 'Hong', 'Ma', 'Barnard', 'Garcia', 'Dudek', 'Moreno-Vinasco', 'Ye', 'Husain', 'Evenoski', 'Jacobson', 'Mirzapoiazova', 'Moitra', 'Huang', 'Chiang', 'Reeves', 'Lussier', 'Sammani'] | ['Lang', 'Hong', 'Ma', 'Barnard', 'Garcia', 'Dudek', 'Moreno-Vinasco', 'Ye', 'Husain', 'Evenoski', 'Jacobson', 'Mirzapoiazova', 'Moitra', 'Huang', 'Chiang', 'Reeves', 'Lussier', 'Sammani'] | ['Lang', 'Hong', 'Ma', 'Barnard', 'Garcia', 'Moreno-Vinasco', 'Ye', 'Husain', 'Evenoski', 'Jacobson', 'Mirzapoiazova', 'Moitra', 'Huang', 'Dudek', 'Chiang', 'Reeves', 'Lussier', 'Sammani'] | Am J Respir Crit Care Med | 2008 | 9/15/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1556 | GSE9316 | 10/17/2007 | ['9316'] | [] | [u'18025126'] | 2118525 | [u'18025126'] | ['Ito', 'Sugimoto', 'Nomura', 'Maeda', 'Sakaguchi', 'Teradaira', 'Nakamura', 'Hirota', 'Yamaguchi', 'Yoshitomi', 'Hashimoto'] | ['Ito', 'Sugimoto', 'Nomura', 'Maeda', 'Sakaguchi', 'Teradaira', 'Nakamura', 'Hirota', 'Yamaguchi', 'Yoshitomi', 'Hashimoto'] | ['Ito', 'Sugimoto', 'Nomura', 'Maeda', 'Sakaguchi', 'Teradaira', 'Nakamura', 'Hirota', 'Yamaguchi', 'Yoshitomi', 'Hashimoto'] | J Exp Med | 2007 | 11/26/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1557 | GSE9318 | 11/14/2007 | ['9318'] | [] | [u'18006685'] | 2049192 | [u'18006685'] | ['Chen', 'de', 'Bell'] | ['Chen', 'de', 'Bell'] | ['Chen', 'de', 'Bell'] | Genes Dev | 2007 | 11/15/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1558 | GSE9335 | 10/31/2007 | ['9335'] | [] | [u'17978184'] | 2077018 | [u'17978184'] | ['Tentler', 'Coppola', 'Oldham', u'Tentler*', 'Abrahams', u'Perederiy*', 'Geschwind', 'Perederiy'] | ['Tentler', 'Coppola', 'Oldham', 'Abrahams', 'Geschwind', 'Perederiy'] | ['Tentler', 'Coppola', 'Oldham', 'Abrahams', 'Geschwind', 'Perederiy'] | Proc Natl Acad Sci U S A | 2007 | 11/6/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1559 | GSE9342 | 10/20/2007 | ['9342'] | [] | [] | 2717951 | [u'19545436'] | [u'Thomas', u'Moreau', u'Gutierrez', u'Maser', u'Tchinda', u'Rothstein', u'Neuberg', u'Kutok', u'Feng', u'Li', u'DeAngelo', u'Chin', u'DePinho', u'Silverman', u'Shirley', u'McKenna', u'O\u2019Neil', u'Wong', u'Lee'] | ['Heber', 'Sick', 'Howard'] | [] | BMC Bioinformatics | 2009 | 6/22/2009 | 1 | ded from the NCBI GEO database [ 28 ]. A variety of Affymetrix GeneChip 3' Expression array types are represented in the dataset, including: ath1121501 (Arabidopsis, 248 chips; GEO accession numbers: GSE5770, GSE5759, GSE911 [ 29 ], GSE2538 [ 30 ], GSE3350 [ 31 ], GSE3416 [ 32 ], GSE5534, GSE5535, GSE5530, GSE5529, GSE5522, GSE5520, GSE1491 [ 33 ], GSE2169, GSE2473), hgu133a (human, 72 chips; GSE1420 [|), hgu95av2 (human, 51 chips; GSE1563 [ 35 ]), hgu95d (human, 22 chips; GSE1007 [ 36 ]), hgu95e (human, 21 chips; GSE1007), mgu74a (mouse, 60 chips; GSE76, GSE1912 [ 37 ]), mgu74av2 (mouse, 29 chips; GSE1947 [ 38 ], GSE1419 [ 39 , 40 ]), moe430a (mouse, 10 chips; GSE1873 [ 41 ]), mouse4302 (mouse, 20 chips; GSE5338 [ 42 ], GSE1871 [ 43 ]), rae230a (rat, 26 chips; GSE1918, GSE2470), and rgu34a (rat, 44 |xperiment. The second dataset consists of all of the exon array .CEL files available in the GEO database at the time of this analysis (540 .CEL files). Fourteen different experiments are represented (GSE10599 [ 47 ], GSE10666 [ 48 ], GSE11150 [ 49 ], GSE11344 [ 50 ], GSE11967 [ 51 ], GSE12064 [ 52 ], GSE6976 [ 53 ], GSE7760 [ 54 ], GSE7761 [ 55 ], GSE8945 [ 56 ], GSE9342{{tag}}--REUSE--, GSE9372 [ 57 ], GSE9385 [ 58 ] | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1560 | GSE9344 | 12/23/2007 | ['9344'] | [] | [u'18671851'] | 2518935 | [u'18671851'] | ['Elsasser', 'Fernandes', 'Van', 'Capuco', 'Sonstegard', 'Connor', 'Siferd', 'Evock-Clover'] | ['Elsasser', 'Fernandes', 'Van', 'Capuco', 'Sonstegard', 'Connor', 'Siferd', 'Evock-Clover'] | ['Elsasser', 'Van', 'Capuco', 'Sonstegard', 'Fernandes', 'Connor', 'Evock-Clover', 'Siferd'] | BMC Genomics | 2008 | 7/31/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1561 | GSE9353 | 12/14/2007 | ['9353'] | [] | [u'18068631'] | 2175032 | [u'18068631'] | ['Sorensen', 'Perou', 'Herschkowitz', 'Orkin', 'Bronson', 'Li', 'Lannon', 'Tognon', 'Godinho', 'Kim', 'Yasaitis', 'Cho', 'Hock'] | ['Sorensen', 'Perou', 'Herschkowitz', 'Orkin', 'Bronson', 'Li', 'Lannon', 'Tognon', 'Godinho', 'Kim', 'Yasaitis', 'Cho', 'Hock'] | ['Herschkowitz', 'Perou', 'Yasaitis', 'Orkin', 'Bronson', 'Li', 'Lannon', 'Tognon', 'Godinho', 'Kim', 'Sorensen', 'Cho', 'Hock'] | Cancer Cell | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1562 | GSE9354 | 12/14/2007 | ['9354'] | [] | [u'18068631'] | 2175032 | [u'18068631'] | ['Sorensen', 'Perou', 'Herschkowitz', 'Orkin', 'Bronson', 'Li', 'Lannon', 'Tognon', 'Godinho', 'Kim', 'Yasaitis', 'Cho', 'Hock'] | ['Sorensen', 'Perou', 'Herschkowitz', 'Orkin', 'Bronson', 'Li', 'Lannon', 'Tognon', 'Godinho', 'Kim', 'Yasaitis', 'Cho', 'Hock'] | ['Herschkowitz', 'Perou', 'Yasaitis', 'Orkin', 'Bronson', 'Li', 'Lannon', 'Tognon', 'Godinho', 'Kim', 'Sorensen', 'Cho', 'Hock'] | Cancer Cell | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1563 | GSE9355 | 12/14/2007 | ['9355'] | [] | [u'18068631'] | 2175032 | [u'18068631'] | ['Sorensen', 'Perou', 'Herschkowitz', 'Orkin', 'Bronson', 'Li', 'Lannon', 'Tognon', 'Godinho', 'Kim', 'Yasaitis', 'Cho', 'Hock'] | ['Sorensen', 'Perou', 'Herschkowitz', 'Orkin', 'Bronson', 'Li', 'Lannon', 'Tognon', 'Godinho', 'Kim', 'Yasaitis', 'Cho', 'Hock'] | ['Herschkowitz', 'Perou', 'Yasaitis', 'Orkin', 'Bronson', 'Li', 'Lannon', 'Tognon', 'Godinho', 'Kim', 'Sorensen', 'Cho', 'Hock'] | Cancer Cell | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1564 | GSE9365 | 12/20/2007 | ['9365'] | ['3419'] | [u'18281415'] | 2287347 | [u'18281415'] | ['Stitt', 'Winter', 'Stein', 'Weschke', 'Radchuk', 'Sreenivasulu', 'Scholz', 'Graner', 'Strickert', 'Wobus', 'Close', 'Usadel'] | ['Stitt', 'Winter', 'Stein', 'Weschke', 'Radchuk', 'Sreenivasulu', 'Scholz', 'Graner', 'Strickert', 'Wobus', 'Close', 'Usadel'] | ['Stitt', 'Winter', 'Stein', 'Weschke', 'Radchuk', 'Sreenivasulu', 'Scholz', 'Graner', 'Strickert', 'Wobus', 'Close', 'Usadel'] | Plant Physiol | 2008 | 2008 Apr | 0 | AND pmc_gds | 1 | 0 | ||||
1565 | GSE9368 | 11/1/2007 | ['9368'] | [] | [u'18658108'] | 2850090 | [u'19779014'] | ['Lang', 'Hong', 'Ma', 'Barnard', 'Garcia', 'Dudek', 'Moreno-Vinasco', 'Ye', 'Husain', 'Evenoski', 'Jacobson', 'Mirzapoiazova', 'Moitra', 'Huang', 'Chiang', 'Reeves', 'Lussier', 'Sammani'] | ['Lang', 'Svensson', 'Ma', 'Kogut', 'Morrisey', 'Yates', 'Solway', 'Garcia', 'Camoretti-Mercado', 'Huang', 'Chen', 'Turner', 'Tao', 'Gruber'] | ['Lang', 'Garcia', 'Ma', 'Huang'] | Am J Physiol Gastrointest Liver Physiol | 2009 | 2009 Dec | 0 | 0 | 0 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1566 | GSE9368 | 11/1/2007 | ['9368'] | [] | [u'18658108'] | 2542434 | [u'18658108'] | ['Lang', 'Hong', 'Ma', 'Barnard', 'Garcia', 'Dudek', 'Moreno-Vinasco', 'Ye', 'Husain', 'Evenoski', 'Jacobson', 'Mirzapoiazova', 'Moitra', 'Huang', 'Chiang', 'Reeves', 'Lussier', 'Sammani'] | ['Lang', 'Hong', 'Ma', 'Barnard', 'Garcia', 'Dudek', 'Moreno-Vinasco', 'Ye', 'Husain', 'Evenoski', 'Jacobson', 'Mirzapoiazova', 'Moitra', 'Huang', 'Chiang', 'Reeves', 'Lussier', 'Sammani'] | ['Lang', 'Hong', 'Ma', 'Barnard', 'Garcia', 'Moreno-Vinasco', 'Ye', 'Husain', 'Evenoski', 'Jacobson', 'Mirzapoiazova', 'Moitra', 'Huang', 'Dudek', 'Chiang', 'Reeves', 'Lussier', 'Sammani'] | Am J Respir Crit Care Med | 2008 | 9/15/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1567 | GSE9381 | 10/20/2007 | ['9381'] | ['3059'] | [u'17890343'] | 2168209 | [u'17890343'] | ['Scow', 'Legler', 'Chain', 'Kane', 'Chakicherla', 'Wu', 'Hristova', 'Schmidt'] | ['Scow', 'Legler', 'Chain', 'Kane', 'Chakicherla', 'Wu', 'Hristova', 'Schmidt'] | ['Scow', 'Legler', 'Chain', 'Kane', 'Chakicherla', 'Wu', 'Hristova', 'Schmidt'] | Appl Environ Microbiol | 2007 | 2007 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
1568 | GSE9391 | 10/29/2007 | ['9391'] | [] | [] | 2194693 | [u'17997827'] | [u'Krupinski', u'Mitsios', u'Slevin', u'Kumar', u'Gaffney'] | ['Krupinski', 'Slevin', 'Wang', 'Kumar', 'Sullivan', 'Sanfeliu', 'Gaffney', 'Saka', 'Rubio', 'Mitsios', 'Pennucci'] | [u'Krupinski', u'Mitsios', u'Slevin', u'Kumar', u'Gaffney'] | BMC Neurosci | 2007 | 11/12/2007 | 0 | > 2-fold or downregulated < 0.5-fold were counted as deregulated and taken into consideration. The microarray data are available in Gene Expression Omnibus under the accession number GSE9391{{tag}}--DEPOSIT--. Reverse Transcription-Polymerase Chain Reaction (RT-PCR) Gene expression was examined by semi-quantitative RT-PCR with standard reaction conditions of a 10 min denaturation at 94°C, follow | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1569 | GSE9397 | 10/23/2007 | ['9397'] | [] | [u'17984056'] | 2084313 | [u'17984056'] | ['Sauvage', 'Winokur', 'van', 'Qian', 'Laoudj-Chenivesse', 'Leo', 'Matt\xc3\xa9otti', 'Figlewicz', 'Tassin', 'Dixit', 'Shi', 'Chen', 'Ansseau', 'Belayew', 'Copp\xc3\xa9e', 'Barro'] | ['Sauvage', 'Winokur', 'van', 'Qian', 'Laoudj-Chenivesse', 'Leo', 'Matt\xc3\xa9otti', 'Figlewicz', 'Tassin', 'Dixit', 'Shi', 'Chen', 'Ansseau', 'Belayew', 'Copp\xc3\xa9e', 'Barro'] | ['Sauvage', 'Winokur', 'van', 'Qian', 'Laoudj-Chenivesse', 'Leo', 'Matt\xc3\xa9otti', 'Figlewicz', 'Tassin', 'Dixit', 'Shi', 'Chen', 'Ansseau', 'Belayew', 'Copp\xc3\xa9e', 'Barro'] | Proc Natl Acad Sci U S A | 2007 | 11/13/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1570 | GSE9411 | 10/24/2007 | ['9411'] | [] | [] | 2347392 | [u'18310348'] | [u'Crhanova', u'Rychlik'] | ['Sebkova', 'Crhanova', 'Karasova', 'Budinska', 'Rychlik'] | [u'Crhanova', u'Rychlik'] | J Bacteriol | 2008 | 2008 May | 0 | by significance analysis of microarrays ( 34 ) using the Excel version with the FDR value set to 0.05. Raw data from the microarray analysis were deposited in the GEO database under accession number GSE9411{{tag}}--DEPOSIT--. RESULTS Sensitivity of aro mutants to porcine serum. Although Salmonella is predominantly an intracellular par | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1571 | GSE9414 | 10/27/2007 | ['9414'] | [] | [u'18007591'] | 2140113 | [u'18007591'] | ['Scharf', 'van', 'Bell', 'MacAlpine', 'Peters', 'Imhof', u'Schubeler', 'Sch\xc3\xbcbeler', 'Garza', 'Schwaiger', 'Zilbermann', 'Hild', 'Wirbelauer'] | ['Scharf', 'van', 'Bell', 'MacAlpine', 'Peters', 'Imhof', 'Sch\xc3\xbcbeler', 'Garza', 'Schwaiger', 'Zilbermann', 'Hild', 'Wirbelauer'] | ['Scharf', 'van', 'Bell', 'MacAlpine', 'Peters', 'Imhof', 'Sch\xc3\xbcbeler', 'Garza', 'Schwaiger', 'Zilbermann', 'Hild', 'Wirbelauer'] | EMBO J | 2007 | 12/12/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1572 | GSE9434 | 10/26/2007 | ['9434'] | [] | [u'18059444'] | 2174629 | [u'18059444'] | ['', 'Park', 'Lee', 'Kim'] | ['Park', 'Lee', 'Kim'] | ['Park', 'Lee', 'Kim'] | Mol Syst Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1573 | GSE9444 | 12/7/2007 | ['9444'] | [] | [u'18077435'] | 3002903 | [u'21092329'] | ['', 'Gurcel', 'Dorsaz', 'Franken', 'Pradervand', 'Hagenbuchle', "O'Hara", 'Petit', 'Tafti', 'Pfister', 'Maret'] | ['Mehrotra', 'Thomas', 'de', 'Chang'] | [] | BMC Bioinformatics | 2010 | 11/24/2010 | 0 | s, moment-ratio diagrams and relationships between the different moments of the distribution of the gene expression was used to characterize the observed distributions. The data are obtained from the publicly available gene expression database, Gene Expression Omnibus (GEO) to characterize the empirical distributions of gene expressions obtained under varying experimental situations each of which providin|tion, changes at higher levels of gene expression are more difficult to detect [ 29 ]. Most of the current journals require the microarray data to be deposited on a database (like the Gene Expression Omnibus, GEO [ 30 ]) if these data were used for analysis in a paper. The data deposited on the database are in the form of multiple samples corresponding to each of multiple conditions (typically one of the|xpression as measured by microarrays across multiple univariate theoretical distributions with sufficient amount of data. Since 2002 a significant amount of data from sources like the Gene Expression Omnibus GEO [ 30 ] and ArrayExpress ([ 46 ] has become available. This data can now potentially be used to validate the distribution and noise assumptions for statistical analysis and to develop improved inf|n-commercial platform using a two-color, spotted oligonucleotide array technology. Table 1 Description of the data sets used. Data set GEO accession Technology No. 
of Samples Tissue Data Craniofacial GSE7759 in situ oligonucleotide 105 craniofacial Logarithm of transcript measure Liver GSE8396 in situ oligonucleotide 93 liver Logarithm of transcript measure Brain GSE9444{{tag}}--REUSE-- in situ oligonucleotide 69 brai|nal file 4 , Table S2 in situ oligonucleotide 6219 mixed Logarithm of transcript measure Male GSE2814 spotted oligonucleotide 155 liver Logarithm of transcript measure relative to common pool Female GSE2814 spotted oligonucleotide 156 liver Logarithm of transcript measure relative to common pool Further detailed description of these data sets are provided in Additional files 1 , 2 , 3 , 4 : Table | periods. There were three biological replicate samples per combination of strain and time period of sleep deprivation. In addition, 6219 microarray samples on the Affymetrix 430.2 platform were also downloaded from GEO [ 30 ]. The GSM sample ids along with a description of the experimental conditions under which the samples were collected are given in Additional file 4 , Table S2. The samples were chose|from the Affymetrix Mouse 430.2 was visually examined. Samples that did not seem like outliers were chosen to form the set of samples that formed the basis of our analysis. Note that the raw data was downloaded in all cases and then preprocessed in a well-defined manner (see below). These samples were collected from mice of different strains, development stages, tissues, sex, and different laboratories ov|w years. The distribution of expression of the 21 house keeping genes identified in [ 47 ] was analyzed using this data. The data from the non-commercial spotted oligonucleotide array (GEO accession: GSE2814) were analyzed in a number of papers [ 49 - 54 ] where one of the goals was to sex-specific and tissue-specific differences in gene expression. 
The data were from the liver of ApoE null (C57BL/6J &|f an eQTL with LOD score cutoff or significant correlations with the characterized traits of adiposity, plasma lipids, or atherosclerosis were ignored. The log ratio data were used then used as-is as downloaded from GEO. Testing Distributional Assumptions For each of the data sets "Craniofacial", "Liver" and "Brain", "Housekeeping", "Male" and "Female", Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) hy|er would be useful in guiding analyses of these new data and also in making distributional hypotheses. Conclusions The analyses of the empirical probability distribution of gene expressions from five publicly available data sources with relatively large number of samples have been described in this manuscript. The failure of the distributions to follow any of the known theoretical univariate probability d | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1574 | GSE9444 | 12/7/2007 | ['9444'] | [] | [u'18077435'] | 2148427 | [u'18077435'] | ['', 'Gurcel', 'Dorsaz', 'Franken', 'Pradervand', 'Hagenbuchle', "O'Hara", 'Petit', 'Tafti', 'Pfister', 'Maret'] | ['Gurcel', 'Dorsaz', 'Franken', 'Pradervand', 'Hagenbuchle', "O'Hara", 'Petit', 'Tafti', 'Pfister', 'Maret'] | ['Gurcel', 'Dorsaz', 'Franken', 'Pradervand', 'Hagenbuchle', "O'Hara", 'Petit', 'Tafti', 'Pfister', 'Maret'] | Proc Natl Acad Sci U S A | 2007 | 12/11/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1575 | GSE9452 | 10/31/2007 | ['9452'] | ['3119'] | [] | 2620272 | [u'19014681'] | [u'Olsen', u'Csillag', u'Nielsen', u'Seidelin'] | ['Tozeren', 'Gormley'] | [] | BMC Bioinformatics | 2008 | 11/17/2008 | 1 | or therapeutic targets in tissue-specific diseases. Table 1 Microarray datasets used in this study Tissue Phenotype Data Tissue No. of Samples Gene Expression Omnibus/Array Express Accn. # Adipose 10 GSE3526 Adrenal 20 GSE3526, GSE8514, GSE2316 Brain 89 GSE3526, GSE7621, GSE7307, GSE2361, E_AFMX-11, E-TABM-20, Colon 10 E-TABM-176, GSE8671, GSE9254, GSE9452{{tag}}--REUSE-- Epidermal 25 GSE1133, GSE2361, GSE3419, GSE352|38 E_AFMX-11, E-MIMR-27, GSE1133, GSE2240, GSE2361, GSE3526, GSE3585, GSE7307 Kidney 10 E_AFMX-11, GSE2004, GSE2361, GSE3526, GSE7392 Liver 10 E_AFMX-11, GSE2004, GSE3526, GSE6764 Lung 26 E-MEXP-231, GSE10072, GSE1133, GSE2361, GSE3526 Mammary 15 E-TABM-66, GSE2361, GSE3526, GSE7307, GSE7904 Muscle 64 GSE10760, GSE2328, GSE3526, GSE5110, GSE6798, GSE7307, GSE9103, Ovary 10 GSE2361, GSE3526, GSE6008, GS|ancreas 6 GSE1133, GSE2361, GSE7307 Peripheral blood 12 GSE7462, GSE8608, GSE8668, GSE8762, GSE9692 Small intestine 7 GSE2361, GSE7307 Spleen 12 GSE2004, GSE2361, GSE3526, GSE7307 Stomach 10 GSE2361, GSE3526, GSE7307 Testis 38 E_AFMX-11, GSE1133, GSE2361, GSE3218, GSE3526, GSE7307, GSE7808 Thymus 5 GSE1133, GSE2361, GSE7307 Infectious Disease Disease No. of Samples Gene Expression Omnibus/Array Express | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1576 | GSE9455 | 10/30/2007 | ['9455'] | [] | [u'18068629'] | 2148463 | [u'18068629'] | ['Earl', 'Ahmed', 'Blenkiron', 'Bell', 'Massie', 'Swanton', 'Ibrahim', 'Mills', 'Caldas', 'McGeoch', 'Brenton', 'Vias', 'Downward', 'Nicke', 'Crawford', 'Temple', 'Laskey', 'Iyer'] | ['Earl', 'Ahmed', 'Blenkiron', 'Bell', 'Massie', 'Swanton', 'Ibrahim', 'Mills', 'Caldas', 'McGeoch', 'Brenton', 'Vias', 'Downward', 'Nicke', 'Crawford', 'Temple', 'Laskey', 'Iyer'] | ['Earl', 'Ahmed', 'Swanton', 'Bell', 'Massie', 'Ibrahim', 'Mills', 'Caldas', 'Blenkiron', 'McGeoch', 'Brenton', 'Vias', 'Downward', 'Nicke', 'Crawford', 'Temple', 'Laskey', 'Iyer'] | Cancer Cell | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1577 | GSE9463 | 11/1/2007 | ['9463'] | [] | [] | 3025572 | [u'20860999'] | [u'Murata', u'Kitagawa', u'Iwahashi'] | ['Deng', 'Shao', 'Dong', 'Zhang'] | [] | Nucleic Acids Res | 2011 | 1/1/2011 | 0 | et al. ( 38 ) and Hu et al. ( 39 ), respectively. A much larger expression data set was used in predicting expression noise using the SVR. We compiled 633 microarray data sets from Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/ ). We used this expression compendium to calculate five different types of expression variation: (i) expression variation under different environmental conditi|measured. Types (iv) and (v) were calculated as Euclidian distance (ED) among different stains and species. All these data are available upon request. A list of essential genes in S. cerevisiae was downloaded from Mewes et al. ( 40 ), and the haploinsufficient genes were taken from Deutschbauer et al. ( 41 ). We compiled protein–protein interactions from the BioGrid database in April 2010 ( |ase pair. Determination of statistical significance of Gene Ontology terms We used hypergeometric distribution in calculating statistical significance of Gene Ontology (GO) terms. GO annotations were downloaded from Ensembl database. We performed genome-wide analysis to ensure that it had sufficient power to detect significant GO terms. We use N to denote the total number of genes in yeast that have any | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1578 | GSE9465 | 12/1/2007 | ['9465'] | [] | [u'19057703'] | 2967749 | [u'21044366'] | ['Lang', 'Natarajan', 'Wang', 'Goonewardena', 'Linares', 'Breysse', 'Moreno-Vinasco', 'Samet', 'Garcia', 'Huang', 'Geyh', 'Lussier', 'Grabavoy'] | ['Yousif', 'Mbagwu', 'Ohno-Machado', 'Lacson'] | [] | BMC Bioinformatics | 2010 | 10/28/2010 | 0 | es/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Background The amount of data deposited in the Gene Expression Omnibus (GEO) has expanded significantly. It is important to ensure that these data are properly annotated with clinical data and descriptions of experimental conditions so that they can be useful for future|istency. Association between relevant variables, however, was adequate. 10–12 March 2010 2010 AMIA Summit on Translational Bioinformatics San Francisco, CA, USA Background The Gene Expression Omnibus (GEO) project was initiated by the National Center for Biotechnology Information (NCBI) to serve as a repository for gene expression data [ 1 , 2 ]. In addition to GEO, there are several other large-| 400,000 samples. There has been an ever growing interest in large microarray repositories for several reasons: (a) Microarray data are required by funding agencies and scientific journals to be made publicly accessible; (b) such repositories enable researchers to view data from other research groups; and (c) with proper pre-processing, such repositories may allow researchers to formulate and test hypothe|viously described [ 14 , 15 ]. The annotation tool used for this research was developed to facilitate human annotation by allowing easy access between the data descriptions and measurements that were downloaded from GEO and appropriate scientific publications from Pubmed [ 13 ]. 
The annotators are able to read the study descriptions that researchers deposited in GEO, as well as individual sample descripti|, and the results are displayed in Table 3 . Table 4 shows all the studies’ goals and the number of samples in each of the 17 annotated studies. Table 3 Coverage of Asthma variables in GDS GSE 470 GSE 473 GSE 3183 GSE 3004 Total Agent 100% 0% 100% 100% 17.4% Disease State 100% 100% 0% 0% 88.2% Time 100% 0% 100% 0% 12.7% Other 0% 100% 0% 0% 82.5% No. of Samples 12 175 15 10 212 Table 4 Annota|dy No. of Samples Topic/Title GSE8052 404 Determinants of susceptibility to childhood asthma GSE473 175 Defining diagnostic genes from purified CD4+ blood cells that have specific diagnostic profiles GSE4302 118 Profiling of airway epithelial cells GSE3184 40 Murine airway hyperresponsiveness GSE483 39 Allergic response to ragweed GSE1301 24 Mechanisms by which IL-13 elicits the symptoms of asthma GSE8|fects of exercise on gene expression GSE6858 16 Expression data from experimental murine asthma GSE3183 15 Early cytokine-mediated mechanisms that lead to asthma GSE470 12 Asthma exacerbatory factors GSE9465{{tag}}--REUSE-- 12 Pulmonary responses to ambient particulate matter GSE3004 10 Effects of allergen challenge on airway cell gene expression GSE2276 9 Effect of PGE receptor subtype agonist on an asthma model GSE4|d inhaler 697 24.1 Disease frequency 627 31.7 Gender 489 46.7 Atopic 425 53.7 Tissue 403 56.1 Challenge 0 1.0 The consistency of the studies in the asthma domain was also measured. In one such study (GSE4302), the data for 32 asthmatics randomized to a placebo-controlled trial of fluticasone propionate were examined. The authors use the generic name “fluticasone propionate” within both | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1579 | GSE9465 | 12/1/2007 | ['9465'] | [] | [u'19057703'] | 2850090 | [u'19779014'] | ['Lang', 'Natarajan', 'Wang', 'Goonewardena', 'Linares', 'Breysse', 'Moreno-Vinasco', 'Samet', 'Garcia', 'Huang', 'Geyh', 'Lussier', 'Grabavoy'] | ['Lang', 'Svensson', 'Ma', 'Kogut', 'Morrisey', 'Yates', 'Solway', 'Garcia', 'Camoretti-Mercado', 'Huang', 'Chen', 'Turner', 'Tao', 'Gruber'] | ['Lang', 'Garcia', 'Huang'] | Am J Physiol Gastrointest Liver Physiol | 2009 | 2009 Dec | 0 | 0 | 0 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1580 | GSE9465 | 12/1/2007 | ['9465'] | [] | [u'19057703'] | 2592270 | [u'19057703'] | ['Lang', 'Natarajan', 'Wang', 'Goonewardena', 'Linares', 'Breysse', 'Moreno-Vinasco', 'Samet', 'Garcia', 'Huang', 'Geyh', 'Lussier', 'Grabavoy'] | ['Lang', 'Natarajan', 'Wang', 'Goonewardena', 'Linares', 'Breysse', 'Moreno-Vinasco', 'Samet', 'Garcia', 'Huang', 'Geyh', 'Lussier', 'Grabavoy'] | ['Lang', 'Natarajan', 'Wang', 'Goonewardena', 'Linares', 'Moreno-Vinasco', 'Samet', 'Garcia', 'Huang', 'Breysse', 'Geyh', 'Lussier', 'Grabavoy'] | Environ Health Perspect | 2008 | 2008 Nov | 0 | AND pmc_gds | 1 | 0 | ||||
1581 | GSE9471 | 10/30/2007 | ['9471'] | ['3080'] | [u'18028544'] | 2258187 | [u'18028544'] | ['Bucan', 'Yang', 'Wang', 'Hannenhalli', 'Valladares'] | ['Bucan', 'Yang', 'Wang', 'Hannenhalli', 'Valladares'] | ['Hannenhalli', 'Yang', 'Wang', 'Bucan', 'Valladares'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1582 | GSE9476 | 11/1/2007 | ['9476'] | ['3057'] | [u'17910043'] | 2865519 | [u'20463876'] | ['Stirewalt', 'Dorcy', 'McQuary', 'Cronk', 'Hockenbery', 'Engel', 'Wood', 'Fan', 'Radich', 'Heimfeld', 'Kopecky', 'Pogosova-Agadjanyan', 'Meshinchi', u'Shannon'] | ['Stephan-Otto', 'Riester', 'Downey', 'Singer', 'Michor'] | [] | PLoS Comput Biol | 2010 | 5/6/2010 | 0 | the same study as well as expression data of 3 human embryonic stem cell lines (hESCs) and 3 hESC derived mesenchymal precursor lines (downloaded from NCBI Geo [47] accession number GSE7332 [48] ). We use gene expression data of AML [47] patient samples available within GEO (accession numbers GSE1159, GSE9476{{tag}}--REUSE-- [49] , GSE1729 [|GSE12417 [51] ). The breast cancer dataset is also compiled from Microarray data published in GEO with dataset numbers GSE7390 [16] , GSE2990 [15] , GSE3494 [17] , and GSE9574 [23] . A problem of micrarray meta-analyses is that the different dataset sources may introduce a bias. We therefore applied hierachical cluster | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1583 | GSE9482 | 11/1/2007 | ['9482'] | [] | [u'18087042'] | 2409234 | [u'18087042'] | ['Jacobson', 'Johansson', 'Spatrick', 'Li', 'He'] | ['Jacobson', 'Johansson', 'Spatrick', 'Li', 'He'] | ['Jacobson', 'Johansson', 'Li', 'He', 'Spatrick'] | Proc Natl Acad Sci U S A | 2007 | 12/26/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1584 | GSE9484 | 11/2/2007 | ['9484'] | [] | [u'18249175'] | 2828690 | [u'18249175'] | ['Hopkins', 'Walsh', 'LeBrasseur', 'Zeng', 'Morris', 'Hamilton', 'Viereck', 'Ouchi', 'Izumiya', 'Sato'] | ['Hopkins', 'Walsh', 'LeBrasseur', 'Zeng', 'Morris', 'Hamilton', 'Viereck', 'Ouchi', 'Izumiya', 'Sato'] | ['Hopkins', 'Walsh', 'LeBrasseur', 'Zeng', 'Morris', 'Hamilton', 'Viereck', 'Ouchi', 'Izumiya', 'Sato'] | Cell Metab | 2008 | 2008 Feb | 0 | AND pmc_gds | 1 | 0 | ||||
1585 | GSE9485 | 11/1/2007 | ['9485'] | [] | [u'18284693'] | 2263045 | [u'18284693'] | ['Kurn', 'Levy', 'Wang', 'Dexheimer', 'Miller', 'Von', 'Heath', 'Watson', 'Spencer'] | ['Kurn', 'Levy', 'Wang', 'Dexheimer', 'Miller', 'Von', 'Heath', 'Watson', 'Spencer'] | ['Kurn', 'Levy', 'Wang', 'Dexheimer', 'Miller', 'Von', 'Heath', 'Watson', 'Spencer'] | BMC Genomics | 2008 | 2/19/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1586 | GSE9486 | 11/1/2007 | ['9486'] | [] | [u'18087042'] | 2409234 | [u'18087042'] | ['', 'Johansson', 'Jacobson', 'Li', 'He', 'Spatrick'] | ['Jacobson', 'Johansson', 'Spatrick', 'Li', 'He'] | ['Jacobson', 'Johansson', 'Li', 'He', 'Spatrick'] | Proc Natl Acad Sci U S A | 2007 | 12/26/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1587 | GSE9494 | 12/25/2007 | ['9494'] | [] | [u'18086846'] | 2258546 | [u'18086846'] | ['Silverman', 'Muthaiyan', 'Wilkinson', 'Jayaswal'] | ['Silverman', 'Muthaiyan', 'Wilkinson', 'Jayaswal'] | ['Silverman', 'Muthaiyan', 'Wilkinson', 'Jayaswal'] | Antimicrob Agents Chemother | 2008 | 2008 Mar | 0 | AND pmc_gds | 1 | 0 | ||||
1588 | GSE9496 | 11/2/2007 | ['9496'] | [] | [u'18167336'] | 2243238 | [u'18167336'] | ['Hallstrom', 'Mori', 'Nevins'] | ['Hallstrom', 'Mori', 'Nevins'] | ['Hallstrom', 'Mori', 'Nevins'] | Cancer Cell | 2008 | 2008 Jan | 0 | AND pmc_gds | 1 | 0 | ||||
1589 | GSE9499 | 11/1/2007 | ['9499'] | [] | [u'18029387'] | 2975422 | [u'21047384'] | ['Ying', 'Jin', 'Fields', 'Soo', 'Peng', 'Liu', 'Wu', 'Qiu', 'Robertson', 'Tao', 'Delmas'] | ['Xu', 'Hu'] | [] | BMC Genomics | 2010 | 11/2/2010 | 0 | essed genes usually located in the sparse boundary region. Biases of popular algorithms for identifying differentially expressed genes are visually characterized. Experiments on 17 datasets from Gene Omnibus Database (GEO) with experimentally verified differentially expressed genes showed that DB pruning can significantly improve the prediction accuracy of popular identification algorithms such as t-test|dary region in the 2-D feature space of average gene expression (AG) versus average difference of gene expression (AD). Fig. 1 . shows the distribution of true DEGs in the 2D space for four datasets: GSE9499{{tag}}--REUSE--, GSE6342, GSE6740_1, and GSE6740_2 from GEO database [ 22 ]. Based on the fact that boundary region is characterized with scarcity of genes, a density based pruning algorithm is proposed here for p|s as collected by Kadota et. al. [ 20 ]. They collected 38 microarray datasets with experimentally determined true DEGS by real-time polymerase chain reaction (RT-PCR). Thirty six of the datasets are downloaded from GEO database [ 22 ]. Without losing generality, we experimented with 17 disease or dose response datasets of Homo sapiens out of the 36 GEO datasets (Table 1 ). The 17 datasets are reported j|how that real-world GEO datasets, especially historical ones, tend to have small sample size. Table 1 17 Datasets with 284 DEGs in total. Each dataset has 22833 genes. 
Dataset Conditions True DEG A B GSE1462 4 4 4 GSE1615_1 4 5 8 GSE1650 18 12 8 GSE2666_2 5 5 6 GSE3524 16 4 4 GSE3860 9 9 8 GSE4917 3 3 5 GSE5667_1 5 6 3 GSE6236 14 14 7 GSE6344 10 10 19 GSE6740_1 10 10 40 GSE6740_2 10 10 62 GSE7146 6 6 6|GSE8441 11 11 9 GSE9499{{tag}}--REUSE-- 15 7 77 GSE9574 15 14 5 The 17 Datasets used here cover a variety of biological or medical studies: GSE1462 (mitochondrial DNA mutations), GSE1615_1 (Valproic acid treatment), GSE1650 (chronic obstructive pulmonary disease), GSE2666_2(bone marrow Rho level effect), GSE3524 (tumor of epithelial tissue), GSE3860 (Hutchinson-Gilford progeria syndrome), GSE4917 (breast cancer), GSE5|67_1 (atopic dermatitis), GSE6236 (Adult vs. fetal reticulocyte transcriptome comparison), GSE6344 (renal cell carcinoma disease), GSE6740_1 (HIV-infection), GSE6740_2 (HIV-infection, disease state), GSE7146 (hyperinsulinaemic, does response), GSE7765 (dose response, DMSO or 100 nM Dioxin), GSE8441 (dietary intake response), GSE9574 (breast cancer), and GSE9499{{tag}}--REUSE-- (hypomorphic germline mutations). The div|ch as WAD and FC. To illustrate the bias of popular DEG identification algorithms, Fig. 2 shows true positive (TP), true negative (TN), false positive (FP), and false negative (FN) DEGs for dataset GSE9499{{tag}}--REUSE-- which has 77 true DEGs. Fig. 2 . (a) shows that fold change (FC) misses most true DEGs (FN genes), which are located in the region below the threshold average difference and with high expression le|obtaining a user-specified number of candidate DEGs. Table 2 Comparison of No. of missing true DEGs after DB pruning. ( N 0 = 4, R 0 = 0.0017) Total Gene: 22283 After DP-pruning True DEG DP missed GSE1462 2054 4 0 GSE1615_1 2449 8 3 GSE1650 1317 8 2 GSE2666_2 1618 6 2 GSE3524 814 4 0 GSE3860 2073 8 0 GSE4917 785 5 1 GSE5667_1 1316 3 0 GSE6236 2231 7 0 GSE6344 3127 19 0 GSE6740_1 1183 40 1 GSE6740_2 |DEG to have high expression levels and high expression difference. 
Table 3 Ranks of true DEGs in original gene list and pruned gene list. Genes are sorted by four DEG identification algorithms on the GSE1577 dataset. Increase of ranks of true DEGs means that DB pruning have correctly filtered out many non-DEGs. t-test/tTest’ 1404/808 7/6 1321/768 3800/1713 4741/1975 3633/1659 4145/1828 606/388 |on level remains the same. And the average difference of expression can be defined as sum of average difference among pairwise comparisons. Our DB pruning is implemented using C++ and Perl and can be downloaded from http://mleg.cse.sc.edu/degprune . Authors’ contributions JH initiated the project, proposed the DB pruning idea, helped in experimental designs, and wrote the manuscript. J. X. develo | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1590 | GSE9514 | 11/6/2007 | ['9514'] | ['3438', '3437'] | [u'18326586'] | 2394968 | [u'18326586'] | ['Philpott', 'Storey', 'Androphy', 'Shakoury-Elizeh', 'Keane', 'Protchenko'] | ['Philpott', 'Storey', 'Androphy', 'Shakoury-Elizeh', 'Keane', 'Protchenko'] | ['Philpott', 'Storey', 'Androphy', 'Shakoury-Elizeh', 'Keane', 'Protchenko'] | Eukaryot Cell | 2008 | 2008 May | 0 | AND pmc_gds | 1 | 0 | ||||
1591 | GSE9520 | 11/7/2007 | ['9520'] | ['3062'] | [u'17916801'] | 2923118 | [u'20670406'] | ['Yl\xc3\xb6stalo', 'Larson', u'Ylostalo', 'Prockop'] | ['Wang', 'Tsai', 'Liu', 'Hou', 'Hung', 'Chen', 'Lee'] | [] | J Biomed Sci | 2010 | 7/29/2010 | 0 | Partek ® Genomics Suite™ software (Partek Incorporated, St. Louis, MO, http://www.partek.com ). All microarray datasets in this paper are available at GEO under the accession no. of GSE7234 and GSE9520{{tag}}--DEPOSIT--. Results Downregulation of Oct4 and Nanog and upregulation of developmental markers and lineage-specific genes during expansion of primary hMSCs Embryonic transcription factors, such as | 0 | 1 | 0 | NOT pmc_gds | 0 | 1 |
1592 | GSE9549 | 12/12/2007 | ['9549'] | ['3149'] | [u'18158318'] | 2234364 | [u'18158318'] | ['Johnson', 'Bammler', 'McMahan', 'Riehle', 'Beyer', 'Campbell', 'Fausto'] | ['Johnson', 'Bammler', 'McMahan', 'Riehle', 'Beyer', 'Campbell', 'Fausto'] | ['Johnson', 'Bammler', 'Beyer', 'Riehle', 'McMahan', 'Campbell', 'Fausto'] | J Exp Med | 2008 | 1/21/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1593 | GSE9561 | 11/22/2007 | ['9561'] | [] | [u'18035408'] | 2941458 | [u'20862250'] | ['Ichisaka', 'Narita', 'Tomoda', 'Ohnuki', 'Tanabe', 'Yamanaka', 'Takahashi'] | ['Izpis\xc3\xbaa', 'Barrero', 'Paramonov', 'Bou\xc3\xa9'] | [] | PLoS One | 2010 | 9/17/2010 | 0 | sor genes differing in iPSCs from ESCs, and which could be the source of higher risks. Materials and Methods Gene expression analysis The datasets used for the human analyses are: Takahashi et al. (GSE9561{{tag}}--REUSE--) [5] ; Yu et al. (GSE9071) [7] ; Park et al. (GSE9832) [6] ; Zhao et al. (GSE12922) [52] ; Masaki et al. (GSE9709) |2390) [30] ; Aasen et al. (GSE12583) [37] ; Huangfu et al. (pers. comm.) [53] ; Lowry et al. (GSE9865) [54] ; Ebert et al. (GSE13828) [15] ; Yu et al. (GSE15148) [55] ; Soldner et al. (GSE14711) [11] . The datasets used for the mouse analyses are: Takahashi et al. (GSE525|7815) [30] ; Feng et al. (GSE13211) [56] ; Sridharan et al. (GSE14012) [35] ; Wernig et al. (E-MEXP1037) [4] ; Chen et al . (GSE15267); Zhou et al . (GSE16062) [57] ; Zhao et al . (GSE16925) [20] ; Kang et al . (GSE17004) [19] ; Heng et al . (GSE19023) [58] ; Ic|(wpr =  pr *weight). Next, the average percentrank and the average weighted percentrank were identified for the replicates of each sample. In addition, for the dataset GSE7841 we have averaged the available iPSCs samples (day2, day16, day17 and day18). For the dataset E-MEXP-1037 we have averaged iPSCs samples (clones 8 and 18). For the dataset GSE13211 we have averaged |OSCE (clones 8 and 13) and iPSCs OSE (clones T8 and T9) samples. For the dataset GSE14012 we have averaged ESCs (v6.5 and E14), MEFs (male and female) and iPSCs (1D4 and 2D4) samples. For the dataset GSE15267 we have averaged ESCs (CGR8 and R1), iPSCs reprogrammed with four factors (S2C12 and S2C16) and iPSCs reprogrammed with 3 factors (S53C1 and S53C5). 
For the dataset GSE19023 we have averaged MEFs |teworthy that the bivalent genes profiles of the iPSC lines described to contribute to viable mice through tetraploid complementation assay (the most stringent proof of pluripotency available so far, GSE16925 and GSE17004) have the highest correlation coefficients when compared with the ESC lines. As expected, the correlation between bivalent genes profiles of fibroblasts and ESCs is very low and espec|es whose expression in iPSCs could restrict or at least bias the differentiation potential. Encouragingly, the iPSC lines that were shown to generate viable mice by tetraploid complementation assays (GSE16925 and GSE17004) express none to very few of such genes, whereas the first iPSCs generated that did not contribute to the germline (GSE5259), as well as the partially reprogrammed iPSC lines (GSE1401|umber of these potentially troublesome genes ( Figure 4 ). For example, the partially reprogrammed iPSC lines 1A2 and 1B3 (GSE14012), as well as the Fbx15KO iPSC line, which showed a limited potency (GSE5259), express Hoxc8, which is a homeodomain gene important for early embryogenesis, especially for neural development, and whose expression level is normally tightly regulated [78] an|133 plus 2.0 (GPL570) Affy HG-U133 plus 2.0 (GPL570) Phalanx Human one aray (GPL6254) Affy HG-U133 plus 2.0 (GPL570) Affy HG-U133 plus 2.0 (GPL570) Affy HG-U133 plus 2.0 (GPL570) GEO accession number GSE12390 Pers. Comm GSE9865 GSE12583 GSE12922 GSE13828 GSE15148 GSE14711 Corr coeff whole array iPS/ES: average (min-max) Primary iPS: 0.988 (0.984–0.989) Secondary iPS: 0.991 (0.990–0.991) | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1594 | GSE9566 | 12/31/2007 | ['9566'] | [] | [u'18171944'] | 2838193 | [u'20219994'] | ['Cahoy', 'Kaushal', 'Barres', 'Zamanian', 'Krupenko', 'Emery', 'Christopherson', 'Lubischer', 'Thompson', 'Xing', 'Foo', 'Krieg'] | ['Bergles', 'Nishiyama', 'De'] | [] | J Neurosci | 2010 | 3/10/2010 | 0 | se for each specific cell type using Affymetrix GeneChip Mouse Genome 430 2.0 and Exon 1.0 ST arrays. In order to examine changes in gene expression that accompany oligodendrocyte differentiation, we downloaded their raw data on oligodendrocyte progenitor cells (OPCs), premyelinating oligodendrocytes (OLs), and myelinating oligodendrocytes (Myelin OLs) from the NCBI Gene Expression Omnibus (GEO, http://w | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1595 | GSE9566 | 12/31/2007 | ['9566'] | [] | [u'18171944'] | 2717951 | [u'19545436'] | ['Cahoy', 'Kaushal', 'Barres', 'Zamanian', 'Krupenko', 'Emery', 'Christopherson', 'Lubischer', 'Thompson', 'Xing', 'Foo', 'Krieg'] | ['Heber', 'Sick', 'Howard'] | [] | BMC Bioinformatics | 2009 | 6/22/2009 | 1 | indicator distributions with those obtained from 3' expression arrays. Methods Datasets Our first dataset is a set of 603 Affymetrix raw intensity microarray data files, from 32 distinct experiments downloaded from the NCBI GEO database [ 28 ]. A variety of Affymetrix GeneChip 3' Expression array types are represented in the dataset, including: ath1121501 (Arabidopsis, 248 chips; GEO accession numbers: G|5759, GSE911 [ 29 ], GSE2538 [ 30 ], GSE3350 [ 31 ], GSE3416 [ 32 ], GSE5534, GSE5535, GSE5530, GSE5529, GSE5522, GSE5520, GSE1491 [ 33 ], GSE2169, GSE2473), hgu133a (human, 72 chips; GSE1420 [ 34 ], GSE1922), hgu95av2 (human, 51 chips; GSE1563 [ 35 ]), hgu95d (human, 22 chips; GSE1007 [ 36 ]), hgu95e (human, 21 chips; GSE1007), mgu74a (mouse, 60 chips; GSE76, GSE1912 [ 37 ]), mgu74av2 (mouse, 29 chips| ], GSE1419 [ 39 , 40 ]), moe430a (mouse, 10 chips; GSE1873 [ 41 ]), mouse4302 (mouse, 20 chips; GSE5338 [ 42 ], GSE1871 [ 43 ]), rae230a (rat, 26 chips; GSE1918, GSE2470), and rgu34a (rat, 44 chips; GSE5789 [ 44 ], GSE1567 [ 45 ], GSE471 [ 46 ]). These experiments cover many of the species commonly analyzed using the GeneChip platform, and were selected to represent a variety of tissue types and exper|xperiment. The second dataset consists of all of the exon array .CEL files available in the GEO database at the time of this analysis (540 .CEL files). 
Fourteen different experiments are represented (GSE10599 [ 47 ], GSE10666 [ 48 ], GSE11150 [ 49 ], GSE11344 [ 50 ], GSE11967 [ 51 ], GSE12064 [ 52 ], GSE6976 [ 53 ], GSE7760 [ 54 ], GSE7761 [ 55 ], GSE8945 [ 56 ], GSE9342, GSE9372 [ 57 ], GSE9385 [ 58 ]| Reliability analysis of microarray data using fuzzy c-means and normal mixture modeling based classification methods Bioinformatics 2005 21 644 9 15374860 Edgar R Domrachev M Lash AE Gene Expression Omnibus: NCBI gene expression and hybridization array data repository Nucleic Acids Research 2002 30 207 10 11752295 William DA Su Y Smith MR Lu M Baldwin DA Wagner D Genomic identification of direct target | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1596 | GSE9566 | 12/31/2007 | ['9566'] | [] | [u'18171944'] | 2818769 | [u'20129251'] | ['Cahoy', 'Kaushal', 'Barres', 'Zamanian', 'Krupenko', 'Emery', 'Christopherson', 'Lubischer', 'Thompson', 'Xing', 'Foo', 'Krieg'] | ['Wilson', 'Feiler', 'Gray', 'Kahn', 'Alexe', 'Wilkerson', 'Lawrence', 'Purdom', 'Sarkaria', 'Weir', 'Gabriel', 'Hodgson', "O'Kelly", 'Verhaak', 'James', 'Meyerson', 'Hayes', 'Tamayo', 'Qi', 'Spellman', 'Ding', 'Wang', 'Gupta', 'Miller', 'Mesirov', 'Getz', 'Brennan', 'Perou', 'Jakkula', 'Hoadley', 'Speed', 'Winckler', 'Golub'] | [] | Cancer Cell | 2010 | 1/19/2010 | 0 | ed and mutations are in Table S1 and Table S2 . Gene Sets and Single Sample GSEA Gene sets were generated using the transcriptome database presented in Cahoy at al. ( Cahoy et al., 2008 ) (GEO ID GSE9566{{tag}}--REUSE-- ). Expression values for 17,021 murine genes were generated using gene centric probe set definitions ( Liu et al., 2007 ). Hierarchical clustering of 38 normal murine brain samples in this dataset | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1597 | GSE9566 | 12/31/2007 | ['9566'] | [] | [u'18171944'] | 2856555 | [u'20302627'] | ['Cahoy', 'Kaushal', 'Barres', 'Zamanian', 'Krupenko', 'Emery', 'Christopherson', 'Lubischer', 'Thompson', 'Xing', 'Foo', 'Krieg'] | ['Cyr', 'Wiltshire', 'Lee', 'Zhang', 'Kang', 'Gusella', 'Wheeler', 'Kohane', 'Su', 'MacDonald', 'Dragileva', 'Boily', 'Walker', 'Gillis', 'Lopez'] | [] | BMC Syst Biol | 2010 | 3/19/2010 | 0 | th Hdh Q 111/+ mice as our training set (Figure 2B , excluding tail), with instability index as a quantitative phenotype, we analyzed mouse tissue gene expression data (Mouse Gene Expression Atlas GSE11339, C57BL/6J, 10 weeks) to identify a gene expression signature that correlated with tissue repeat instability. Hdh Q 111 somatic instability (and therefore instability index) increases over time [| with instability indices predicted by the regression model (blue, 2 gene expression replicates). Secondly, we predicted instability indices using independent striatum and cerebellum microarray data (GSE9025, Hdh Q 111/+ , 5 months, n = 1), and compared them to measured instability indices (red). RMSEP, root mean squared error of prediction. We then confirmed the predictive power of this instability-c|at mediate expansion. Figure 5 HD pathogenesis and somatic instability . (A) We profiled gene expression in striatum and cerebellum in Hdh Q 111/111 and control ( Hdh +/+ ) mice at 10 weeks of age (GSE19780). Expression values of the 150 instability-correlated probes were used to predict instability indices based on our regression model. Data bars represent mean ± SD (n = 3-5 mice per genotyp|atum; CB, cerebellum; LV, liver; SP, spleen. (C) Microarray expression levels of Msh3 (Affymetrix probe ID, 1430643_at and 1446511_at) in FACS-purified astrocytes and neurons were obtained from the GSE9566{{tag}}--REUSE-- data set. Msh3 expression level in astrocytes was not significantly different from that in neurons. 
Data bar represents mean of log2 expression levels ± SD (n = 8-10). Selective neuronal |is of GNF mouse Gene Expression Atlas and regression modeling We used the mouse tissue gene expression database of Genomics Institute of the Novartis Research Foundation (mouse Gene Expression Atlas, GSE11339). All microarrays were background corrected and normalized by gcRMA. To identify an instability-correlated gene expression signature, Pearson correlation coefficients and corresponding p values be | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1598 | GSE9568 | 12/5/2007 | ['9568'] | [] | [u'18042715'] | 2148304 | [u'18042715'] | ['Farnham', 'Hogart', 'Vallero', 'Thatcher', 'Bieda', u'LaSalle', 'Lasalle', 'Yasui', 'Peddada', 'Nagarajan'] | ['Farnham', 'Hogart', 'Vallero', 'Thatcher', 'Bieda', 'Lasalle', 'Yasui', 'Peddada', 'Nagarajan'] | ['Farnham', 'Hogart', 'Vallero', 'Thatcher', 'Bieda', 'Lasalle', 'Nagarajan', 'Peddada', 'Yasui'] | Proc Natl Acad Sci U S A | 2007 | 12/4/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1599 | GSE9570 | 11/30/2007 | ['9570'] | [] | [u'18087037'] | 2781331 | [u'19726549'] | ['Nigam', 'Choi', 'Johkura', 'Rosines', 'Sakurai', 'Shah', 'Vaughn', 'Sampogna'] | ['Gallegos', 'Nigam', 'Choi', 'Sweeney', 'Sakurai', 'Tee', 'Bush', 'Johkura', 'Rosines', 'Kouznetsova', 'Rose', 'Shah', 'Meyer-Schwesinger', 'Meyer'] | ['Nigam', 'Johkura', 'Rosines', 'Sakurai', 'Shah', 'Choi'] | Am J Physiol Renal Physiol | 2009 | 2009 Nov | 0 | testing correction via Benjamini and Hochberg's false discovery rate procedure ( 3 ). Raw data for the whole embryonic kidney time points are available at the NCBI GEO website with accession number GSE9570{{tag}}--DEPOSIT-- . RESULTS For recombination experiments, isolated rat UBs cultured for 6–7 days were dissected from the | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1600 | GSE9570 | 11/30/2007 | ['9570'] | [] | [u'18087037'] | 2409245 | [u'18087037'] | ['Nigam', 'Choi', 'Johkura', 'Rosines', 'Sakurai', 'Shah', 'Vaughn', 'Sampogna'] | ['Nigam', 'Choi', 'Johkura', 'Rosines', 'Sakurai', 'Shah', 'Vaughn', 'Sampogna'] | ['Nigam', 'Johkura', 'Rosines', 'Sakurai', 'Shah', 'Vaughn', 'Choi', 'Sampogna'] | Proc Natl Acad Sci U S A | 2007 | 12/26/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1601 | GSE9574 | 12/10/2007 | ['9574'] | ['3139'] | [u'18058819'] | 2975422 | [u'21047384'] | ['Gerry', 'King', 'Stone', 'Tripathi', 'Rosenberg', 'de', 'Kavanah', 'Antoine', 'Perry', 'Lenburg', 'Hirsch', 'Mendez', 'Burke'] | ['Xu', 'Hu'] | [] | BMC Genomics | 2010 | 11/2/2010 | 0 | essed genes usually located in the sparse boundary region. Biases of popular algorithms for identifying differentially expressed genes are visually characterized. Experiments on 17 datasets from Gene Omnibus Database (GEO) with experimentally verified differentially expressed genes showed that DB pruning can significantly improve the prediction accuracy of popular identification algorithms such as t-test|dary region in the 2-D feature space of average gene expression (AG) versus average difference of gene expression (AD). Fig. 1 . shows the distribution of true DEGs in the 2D space for four datasets: GSE9499, GSE6342, GSE6740_1, and GSE6740_2 from GEO database [ 22 ]. Based on the fact that boundary region is characterized with scarcity of genes, a density based pruning algorithm is proposed here for p|s as collected by Kadota et. al. [ 20 ]. They collected 38 microarray datasets with experimentally determined true DEGS by real-time polymerase chain reaction (RT-PCR). Thirty six of the datasets are downloaded from GEO database [ 22 ]. Without losing generality, we experimented with 17 disease or dose response datasets of Homo sapiens out of the 36 GEO datasets (Table 1 ). The 17 datasets are reported j|how that real-world GEO datasets, especially historical ones, tend to have small sample size. Table 1 17 Datasets with 284 DEGs in total. Each dataset has 22833 genes. 
Dataset Conditions True DEG A B GSE1462 4 4 4 GSE1615_1 4 5 8 GSE1650 18 12 8 GSE2666_2 5 5 6 GSE3524 16 4 4 GSE3860 9 9 8 GSE4917 3 3 5 GSE5667_1 5 6 3 GSE6236 14 14 7 GSE6344 10 10 19 GSE6740_1 10 10 40 GSE6740_2 10 10 62 GSE7146 6 6 6|GSE8441 11 11 9 GSE9499 15 7 77 GSE9574{{tag}}--REUSE-- 15 14 5 The 17 Datasets used here cover a variety of biological or medical studies: GSE1462 (mitochondrial DNA mutations), GSE1615_1 (Valproic acid treatment), GSE1650 (chronic obstructive pulmonary disease), GSE2666_2(bone marrow Rho level effect), GSE3524 (tumor of epithelial tissue), GSE3860 (Hutchinson-Gilford progeria syndrome), GSE4917 (breast cancer), GSE5|67_1 (atopic dermatitis), GSE6236 (Adult vs. fetal reticulocyte transcriptome comparison), GSE6344 (renal cell carcinoma disease), GSE6740_1 (HIV-infection), GSE6740_2 (HIV-infection, disease state), GSE7146 (hyperinsulinaemic, does response), GSE7765 (dose response, DMSO or 100 nM Dioxin), GSE8441 (dietary intake response), GSE9574{{tag}}--REUSE-- (breast cancer), and GSE9499 (hypomorphic germline mutations). The div|ch as WAD and FC. To illustrate the bias of popular DEG identification algorithms, Fig. 2 shows true positive (TP), true negative (TN), false positive (FP), and false negative (FN) DEGs for dataset GSE9499 which has 77 true DEGs. Fig. 2 . (a) shows that fold change (FC) misses most true DEGs (FN genes), which are located in the region below the threshold average difference and with high expression le|obtaining a user-specified number of candidate DEGs. Table 2 Comparison of No. of missing true DEGs after DB pruning. ( N 0 = 4, R 0 = 0.0017) Total Gene: 22283 After DP-pruning True DEG DP missed GSE1462 2054 4 0 GSE1615_1 2449 8 3 GSE1650 1317 8 2 GSE2666_2 1618 6 2 GSE3524 814 4 0 GSE3860 2073 8 0 GSE4917 785 5 1 GSE5667_1 1316 3 0 GSE6236 2231 7 0 GSE6344 3127 19 0 GSE6740_1 1183 40 1 GSE6740_2 |DEG to have high expression levels and high expression difference. 
Table 3 Ranks of true DEGs in original gene list and pruned gene list. Genes are sorted by four DEG identification algorithms on the GSE1577 dataset. Increase of ranks of true DEGs means that DB pruning have correctly filtered out many non-DEGs. t-test/tTest’ 1404/808 7/6 1321/768 3800/1713 4741/1975 3633/1659 4145/1828 606/388 |on level remains the same. And the average difference of expression can be defined as sum of average difference among pairwise comparisons. Our DB pruning is implemented using C++ and Perl and can be downloaded from http://mleg.cse.sc.edu/degprune . Authors’ contributions JH initiated the project, proposed the DB pruning idea, helped in experimental designs, and wrote the manuscript. J. X. develo | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1602 | GSE9574 | 12/10/2007 | ['9574'] | ['3139'] | [u'18058819'] | 2865519 | [u'20463876'] | ['Gerry', 'King', 'Stone', 'Tripathi', 'Rosenberg', 'de', 'Kavanah', 'Antoine', 'Perry', 'Lenburg', 'Hirsch', 'Mendez', 'Burke'] | ['Stephan-Otto', 'Riester', 'Downey', 'Singer', 'Michor'] | [] | PLoS Comput Biol | 2010 | 5/6/2010 | 0 | the same study as well as expression data of 3 human embryonic stem cell lines (hESCs) and 3 hESC derived mesenchymal precursor lines (downloaded from NCBI Geo [47] accession number GSE7332 [48] ). We use gene expression data of AML [47] patient samples available within GEO (accession numbers GSE1159, GSE9476 [49] , GSE1729 [|GSE12417 [51] ). The breast cancer dataset is also compiled from Microarray data published in GEO with dataset numbers GSE7390 [16] , GSE2990 [15] , GSE3494 [17] , and GSE9574{{tag}}--REUSE-- [23] . A problem of micrarray meta-analyses is that the different dataset sources may introduce a bias. We therefore applied hierachical cluster | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1603 | GSE9578 | 11/10/2007 | ['9578'] | [] | [u'18077432'] | 2148418 | [u'18077432'] | ['Wang', 'Shenk', 'Murphy', u'Schroer', 'Schr\xc3\xb6er', 'Yu'] | ['Schr\xc3\xb6er', 'Murphy', 'Yu', 'Wang', 'Shenk'] | ['Schr\xc3\xb6er', 'Murphy', 'Yu', 'Wang', 'Shenk'] | Proc Natl Acad Sci U S A | 2007 | 12/11/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1604 | GSE9590 | 11/15/2007 | ['9590'] | [] | [u'18281432'] | 2293160 | [u'18281432'] | ['Hazelwood', 'Daran', 'Dickinson', 'van', 'Pronk'] | ['Hazelwood', 'Daran', 'Dickinson', 'van', 'Pronk'] | ['Hazelwood', 'Daran', 'Dickinson', 'van', 'Pronk'] | Appl Environ Microbiol | 2008 | 2008 Apr | 0 | AND pmc_gds | 1 | 0 | ||||
1605 | GSE9601 | 11/15/2007 | ['9601'] | ['3175'] | [u'18003728'] | 2736317 | [u'19427341'] | ['Yurochko', 'Smith', 'Bivins-Smith', 'Chan'] | ['Yurochko', 'Smith', 'Bivins-Smith', 'Chan'] | ['Yurochko', 'Smith', 'Bivins-Smith', 'Chan'] | Virus Res | 2009 | 2009 Sep | 0 | ss if NF-κB and PI(3)K biological activities are responsible for the HCMV-induced distinct M1/M2 reprogramming of monocytes, we combined our previous bioinformatic data (GEO accession number GSE11408 and GSE 9601{{tag}}--REUSE--) and re-examined the transcriptome of HCMV-infected monocytes for changes in the M1/M2 polarization that were dependent on NF-κB and/or PI(3)K activities. Microarray analysis | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1606 | GSE9601 | 11/15/2007 | ['9601'] | ['3175'] | [u'18003728'] | 2224586 | [u'18003728'] | ['Yurochko', 'Smith', 'Bivins-Smith', 'Chan'] | ['Yurochko', 'Smith', 'Bivins-Smith', 'Chan'] | ['Yurochko', 'Smith', 'Bivins-Smith', 'Chan'] | J Virol | 2008 | 2008 Jan | 0 | AND pmc_gds | 1 | 0 | ||||
1607 | GSE9609 | 12/11/2007 | ['9609'] | [] | [u'18304490'] | 2427204 | [u'18304490'] | ['', 'Menzel', 'von', 'Pedersen', 'Gijsbers', 'Diaz', 'Dumanski', 'Bruder', 'Crasto', 'van', 'Wirdefeldt', 'Sandgren', 'Partridge', 'Komorowski', 'Erickson', 'Andersson', 'Allison', 'Crowley', 'Tiwari', 'Boomsma', 'Piotrowski', 'den', 'Poplawski'] | ['Menzel', 'von', 'Pedersen', 'Gijsbers', 'Diaz', 'Dumanski', 'Bruder', 'Crasto', 'van', 'Wirdefeldt', 'Sandgren', 'Partridge', 'Komorowski', 'Erickson', 'Andersson', 'Allison', 'Crowley', 'Tiwari', 'Boomsma', 'Piotrowski', 'den', 'Poplawski'] | ['Menzel', 'von', 'Pedersen', 'Gijsbers', 'Diaz', 'Dumanski', 'Bruder', 'Crasto', 'van', 'Wirdefeldt', 'Sandgren', 'Partridge', 'Komorowski', 'Erickson', 'Andersson', 'Allison', 'Crowley', 'Tiwari', 'Boomsma', 'Piotrowski', 'den', 'Poplawski'] | Am J Hum Genet | 2008 | 2008 Mar | 0 | AND pmc_gds | 1 | 0 | ||||
1608 | GSE9622 | 11/17/2007 | ['9622'] | [] | [u'18162535'] | 2224196 | [u'18162535'] | ['Wang', 'Zhong', 'Pfeifer', 'Riggs', 'Wu', 'Rauch', 'Kernstine'] | ['Wang', 'Zhong', 'Pfeifer', 'Riggs', 'Wu', 'Rauch', 'Kernstine'] | ['Wang', 'Zhong', 'Pfeifer', 'Riggs', 'Wu', 'Rauch', 'Kernstine'] | Proc Natl Acad Sci U S A | 2008 | 1/8/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1609 | GSE9633 | 12/1/2007 | ['9633'] | ['3155'] | [u'18047674'] | 2943622 | [u'20660011'] | ['Lee', 'Clark', 'Luo', 'Huang', 'Wang', 'Reeves', 'Xu'] | ['Taschereau', 'Plaisier', 'Wong', 'Graeber'] | [] | Nucleic Acids Res | 2010 | 9/1/2010 | 0 | res are then characterized using algorithms that measure statistical enrichment for genes in particular pathways, with particular functions or with particular structural characteristics attained from publicly available databases. The statistical significance of enrichment is typically determined using the hypergeometric distribution or equivalently the one-tailed version of Fisher’s exact test. Al| and homologs were identified using HomoloGene; only the probe with the highest absolute signed t -test P -value within those with matching UniGene identifiers was kept in the collapsing step. Data downloaded from Gene-expression Omnibus (GEO) ( 19 ): MPAKT prostate cancer mouse model, GSE1413; breast tumors with ER, PR and HER2 status, GSE2603; MMTV-HER2/neu mouse model, GSE2528; BCR-ABL transfected ce|0912; mammary stem cell, GSE3711; KRAS2 overexpression cell line, GSE3151; lung tumors with KRAS2 status, GSE3141; imatinib treatment in leukemia patients, GSE2535; dasatinib treatment in cell lines, GSE9633{{tag}}--REUSE-- and GSE6569; castration and testosterone treatment in mice, GSE5901; gedunin treatment in prostate cancer cell line, GSE5506. Data downloaded from Array Express ( 20 ): imatinib treatment in leukem|orodeoxyglucose-positron emission tomography (FDG-PET) imaging (N. Palaskas et al. , submitted for publication). Gene-expression information from microarray data repositories such as Gene-expression Omnibus ( 19 ) and ArrayExpress ( 53 ) as well as from more specialized resources such as the drug response profile database Cmap ( 10 ) can be readily converted to ranked list inputs for our program. 
Thus, |dependency Ann Stat 2001 29 1165 1188 18 Dudoit S Shaffer JP Boldrick JC Multiple hypothesis testing in microarray experiments Stat. Sci. 2003 18 71 103 19 Edgar R Domrachev M Lash AE Gene Expression Omnibus: NCBI gene expression and hybridization array data repository Nucleic Acids Res. 2002 30 207 210 11752295 20 Brazma A Parkinson H Sarkans U Shojatalab M Vilo J Abeygunawardena N Holloway E Kapushesky | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1610 | GSE9633 | 12/1/2007 | ['9633'] | ['3155'] | [u'18047674'] | 2258199 | [u'18047674'] | ['Lee', 'Clark', 'Luo', 'Huang', 'Wang', 'Reeves', 'Xu'] | ['Lee', 'Clark', 'Luo', 'Huang', 'Wang', 'Reeves', 'Xu'] | ['Wang', 'Clark', 'Lee', 'Huang', 'Xu', 'Reeves', 'Luo'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1611 | GSE9635 | 12/11/2007 | ['9635'] | [] | [u'18077431'] | 2148413 | [u'18077431'] | ['Nghiemphu', 'Demichelis', 'Soto', 'Getz', 'Rubin', 'Mischel', 'Shah', 'Debiasi', 'Garraway', 'Cloughesy', 'Lee', 'Vivanco', 'Prensner', 'Sellers', 'Kau', 'Du', 'Alexander', 'Meyerson', 'Huang', 'Hsueh', 'Perner', 'Hatton', 'Nelson', 'Lander', 'Thomas', 'Liau', 'Mellinghoff', 'Barretina', 'Beroukhim', 'Linhart', 'Golub'] | ['Nghiemphu', 'Demichelis', 'Soto', 'Getz', 'Rubin', 'Mischel', 'Shah', 'Debiasi', 'Garraway', 'Cloughesy', 'Lee', 'Vivanco', 'Prensner', 'Sellers', 'Kau', 'Du', 'Alexander', 'Meyerson', 'Huang', 'Hsueh', 'Perner', 'Hatton', 'Nelson', 'Lander', 'Thomas', 'Liau', 'Mellinghoff', 'Barretina', 'Beroukhim', 'Linhart', 'Golub'] | ['Nghiemphu', 'Demichelis', 'Soto', 'Getz', 'Rubin', 'Mischel', 'Shah', 'Debiasi', 'Garraway', 'Cloughesy', 'Lee', 'Vivanco', 'Prensner', 'Sellers', 'Kau', 'Du', 'Alexander', 'Huang', 'Hsueh', 'Meyerson', 'Hatton', 'Perner', 'Lander', 'Thomas', 'Liau', 'Mellinghoff', 'Barretina', 'Beroukhim', 'Linhart', 'Golub', 'Nelson'] | Proc Natl Acad Sci U S A | 2007 | 12/11/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1612 | GSE9650 | 11/27/2007 | ['9650'] | [] | [] | 2776092 | [u'18832700'] | [''] | ['Bevan', 'Kamimura'] | [] | J Immunol | 2008 | 10/15/2008 | 1 | AND pmc_gds | 0 | 1 | ||||
1613 | GSE9650 | 11/27/2007 | ['9650'] | [] | [] | 2964975 | [u'20921622'] | [''] | ['van', 'Yong', 'Hertoghs', 'Baas', 'Moerland', 'Remmerswaal', 'ten'] | [] | J Clin Invest | 2010 | 11/1/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
1614 | GSE9654 | 11/21/2007 | ['9654'] | [] | [] | 2647728 | [u'19242607'] | [u'Yoshimoto', u'Selvarajah'] | ['Maire', 'Squire', 'Thorner', 'Yoshimoto', 'Chilton-MacNeill', 'Zielenska'] | [u'Yoshimoto'] | Neoplasia | 2009 | 2009 Mar | 0 | a files have also been used to map the distribution of recurrently deleted and amplified regions in OS [ 15 ] and they are deposited in National Center for Biotechnology Information's Gene Expression Omnibus (GEO) Web site ( http://www.ncbi.nlm.nih.gov/geo/ ) and are accessible through the GEO Series accession number GSE9654{{tag}}--DEPOSIT-- . For each tumor, the normalized data were run through the Circular Binary Segm | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1615 | GSE9676 | 11/24/2007 | ['9676'] | [] | [u'18167544'] | 2148100 | [u'18167544'] | ['Welle', 'Tawil', 'Thornton'] | ['Welle', 'Tawil', 'Thornton'] | ['Welle', 'Tawil', 'Thornton'] | PLoS One | 2008 | 1/2/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1616 | GSE9684 | 11/27/2007 | ['9684'] | [] | [] | 2667328 | [u'18838117'] | [u'Cristobal', u'Wackym', u'Popper', u'Cioffi', u'Erbe'] | ['Winkler', 'Wackym', 'Lerch-Gaggl', 'Popper', 'Siebeneich', 'Erbe'] | [u'Wackym', u'Popper', u'Erbe'] | Hear Res | 2008 | 2008 Dec | 0 | 6, 9, 10, 12, 13 and 15 (Genus BioSystems, Inc., Northbrook, IL), indicated that the two-pore-domain K + channels expressed in Scarpa’s ganglia are Kcnk1, 2, 3, 6, 12 and 15 (GEO accession GSE4458 and GSE9684{{tag}}--MENTION-- ; [ Wackym, 2008 ]). In this study, we determined the expression of these two-pore-domain K + channel genes, through real time PCR and the distribution of four of their proteins (K 2 | 0 | 0 | 1 | NOT pmc_gds | 1 | 0 |
1617 | GSE9692 | 12/1/2007 | ['9692'] | [] | [u'18460642'] | 2620272 | [u'19014681'] | ['Thomas', 'Cvijanovich', 'Shanley', u'Stanley', 'Freishtat', 'Lin', u'Checcia', 'Odoms', 'Wong', 'Sakthivel', 'Monaco', 'Allen', u'Thoams', 'Checchia', 'Anas'] | ['Tozeren', 'Gormley'] | [] | BMC Bioinformatics | 2008 | 11/17/2008 | 1 | or therapeutic targets in tissue-specific diseases. Table 1 Microarray datasets used in this study Tissue Phenotype Data Tissue No. of Samples Gene Expression Omnibus/Array Express Accn. # Adipose 10 GSE3526 Adrenal 20 GSE3526, GSE8514, GSE2316 Brain 89 GSE3526, GSE7621, GSE7307, GSE2361, E_AFMX-11, E-TABM-20, Colon 10 E-TABM-176, GSE8671, GSE9254, GSE9452 Epidermal 25 GSE1133, GSE2361, GSE3419, GSE352|38 E_AFMX-11, E-MIMR-27, GSE1133, GSE2240, GSE2361, GSE3526, GSE3585, GSE7307 Kidney 10 E_AFMX-11, GSE2004, GSE2361, GSE3526, GSE7392 Liver 10 E_AFMX-11, GSE2004, GSE3526, GSE6764 Lung 26 E-MEXP-231, GSE10072, GSE1133, GSE2361, GSE3526 Mammary 15 E-TABM-66, GSE2361, GSE3526, GSE7307, GSE7904 Muscle 64 GSE10760, GSE2328, GSE3526, GSE5110, GSE6798, GSE7307, GSE9103, Ovary 10 GSE2361, GSE3526, GSE6008, GS|ancreas 6 GSE1133, GSE2361, GSE7307 Peripheral blood 12 GSE7462, GSE8608, GSE8668, GSE8762, GSE9692{{tag}}--REUSE-- Small intestine 7 GSE2361, GSE7307 Spleen 12 GSE2004, GSE2361, GSE3526, GSE7307 Stomach 10 GSE2361, GSE3526, GSE7307 Testis 38 E_AFMX-11, GSE1133, GSE2361, GSE3218, GSE3526, GSE7307, GSE7808 Thymus 5 GSE1133, GSE2361, GSE7307 Infectious Disease Disease No. of Samples Gene Expression Omnibus/Array Express | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1618 | GSE9692 | 12/1/2007 | ['9692'] | [] | [u'18460642'] | 2440641 | [u'18460642'] | ['Thomas', 'Cvijanovich', 'Shanley', u'Stanley', 'Freishtat', 'Lin', u'Checcia', 'Odoms', 'Wong', 'Sakthivel', 'Monaco', 'Allen', u'Thoams', 'Checchia', 'Anas'] | ['Thomas', 'Cvijanovich', 'Shanley', 'Freishtat', 'Lin', 'Odoms', 'Wong', 'Sakthivel', 'Monaco', 'Allen', 'Checchia', 'Anas'] | ['Thomas', 'Cvijanovich', 'Anas', 'Freishtat', 'Lin', 'Shanley', 'Wong', 'Sakthivel', 'Monaco', 'Allen', 'Checchia', 'Odoms'] | Physiol Genomics | 2008 | 6/12/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1619 | GSE9695 | 11/29/2007 | ['9695'] | [] | [u'19304931'] | 2675726 | [u'19304931'] | ['Flemr', 'Burketov\xc3\xa1', 'Collin', 'Zachowski', 'Ruelland', 'Renou', 'Krinke', 'Valentov\xc3\xa1', 'Vergnolle', 'Yu', 'Taconnat', u'Martin-Magniette'] | ['Flemr', 'Burketov\xc3\xa1', 'Collin', 'Zachowski', 'Ruelland', 'Renou', 'Krinke', 'Valentov\xc3\xa1', 'Vergnolle', 'Yu', 'Taconnat'] | ['Flemr', 'Burketov\xc3\xa1', 'Collin', 'Zachowski', 'Ruelland', 'Renou', 'Krinke', 'Valentov\xc3\xa1', 'Vergnolle', 'Yu', 'Taconnat'] | Plant Physiol | 2009 | 2009 May | 0 | AND pmc_gds | 1 | 0 | ||||
1620 | GSE9702 | 11/28/2007 | ['9702'] | [] | [u'18417639'] | 2409029 | [u'18417639'] | ['Irish', 'Mara'] | ['Irish', 'Mara'] | ['Irish', 'Mara'] | Plant Physiol | 2008 | 2008 Jun | 0 | AND pmc_gds | 1 | 0 | ||||
1621 | GSE9716 | 11/28/2007 | ['9716'] | [] | [u'14755057', u'17909027'] | 2831002 | [u'20064233'] | ['', 'Efimova', 'Khodarev', 'Minn', 'Darga', 'Labay', 'Roizman', 'Weichselbaum', 'Mauceri', 'Beckett'] | ['Berger', 'De', 'Pierre', 'Depiereux', 'Bareke', 'Gaigneaux'] | [] | BMC Bioinformatics | 2010 | 1/11/2010 | 0 | Data collection (cel files) was performed using Gene Expression Omnibus [19] on the Affymetrix platform HG-U133a (Human Genome model U133a). This collection consists of 34 datasets (table 2) for which there are at least 15 replicates for each of 2 different experimental conditions. {{key}}--REUSE-- | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1622 | GSE9725 | 12/13/2007 | ['9725'] | [] | [u'18208333'] | 2211538 | [u'18208333'] | ['Ma', 'Choi', 'Jun', 'Cheung', 'Southworth', 'Kim', 'Artandi', 'Venteicher', 'Chang', 'Shah', 'Sarin'] | ['Ma', 'Choi', 'Jun', 'Cheung', 'Southworth', 'Kim', 'Artandi', 'Venteicher', 'Chang', 'Shah', 'Sarin'] | ['Ma', 'Choi', 'Jun', 'Cheung', 'Southworth', 'Kim', 'Artandi', 'Venteicher', 'Chang', 'Shah', 'Sarin'] | PLoS Genet | 2008 | 2008 Jan | 0 | AND pmc_gds | 1 | 0 | ||||
1623 | GSE9738 | 12/4/2007 | ['9738'] | [] | [u'18036227'] | 2865499 | [u'20370903'] | ['Szpara', 'Vranizan', 'Goodman', 'Speed', 'Tai', 'Ngai'] | ['Lang', 'Krumsiek', 'Theis', 'Marr', 'Lutter'] | [] | BMC Genomics | 2010 | 4/6/2010 | 0 | All analyzed datasets were taken from the GEO [43] database: (i) The stem cell development (SCD) dataset consists of three cell lines (R1, J1, V6.5) differentiated into embryoid bodies (EB) at 11 time points from t = 0 h until t = d 14. From each time point and each cell line 3 technical replicates were measured (combination of three cell line differentiations GSE2972, GSE3749, GSE3231). (ii) Within the somitogenesis dataset (SG) gene expression was measured from synchronized C2C12 myoblasts at 13 timepoints from t = 0 h until t = 6 h (GSE7012). (iii) The neurite outgrowth (NO) and regeneration dataset consists of transcriptional activity, measured from dorsal root ganglia during a time course of neurite outgrowth in vitro under two conditions: untreated and under potent inhibitory cue Semaphorin3A. Measurements were taken at 5 time points from t = 2 h until t = 40 h including two technical replicates (GSE9738{{key}}--REUSE--). | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1624 | GSE9744 | 12/4/2007 | ['9744'] | [] | [u'18036227'] | 2245955 | [u'18036227'] | ['', 'Szpara', 'Vranizan', 'Goodman', 'Speed', 'Tai', 'Ngai'] | ['Szpara', 'Vranizan', 'Goodman', 'Speed', 'Tai', 'Ngai'] | ['Szpara', 'Vranizan', 'Goodman', 'Speed', 'Tai', 'Ngai'] | BMC Neurosci | 2007 | 11/23/2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1625 | GSE9746 | 12/7/2007 | ['9746'] | [] | [] | 2234263 | [u'18154678'] | [u'Folt', u'Hampton', u'Colbourne', u'Shaw', u'Glaholt', u'Hamilton', u'Chen', u'Davey'] | ['Folt', 'Hampton', 'Colbourne', 'Shaw', 'Glaholt', 'Hamilton', 'Chen', 'Davey'] | ['Folt', 'Hampton', 'Colbourne', 'Shaw', 'Glaholt', 'Hamilton', 'Chen', 'Davey'] | BMC Genomics | 2007 | 12/21/2007 | 0 | rays were used to compare the gene-expression profiles of D. pulex maintained under control conditions with those exposed to 20 μg Cd/L for 48-h (i.e., sub-lethal concentration, ~LC01; [GEO:GSE9746{{tag}}--REUSE--]). These utilized RNA isolated from three independent and concurrently replicated exposures of Daphnia to cadmium and control conditions, applied to three replicate microarrays using a standard c|s on the MA plot. Figure 2 Cadmium effects on gene expression: Buffers, blanks, and controls . Gene expression data from control and cadmium (20 μg Cd/L for 48-h) treated Daphnia pulex [GEO:GSE9746{{tag}}--REUSE--]. Data were LOESS normalized; duplicate probes were averaged within LIMMA using gene-wise linear models fit to expression data, and gene expression log ratios (M = log 2 treated/control) were plot|e M average for an EST cluster. Figure 3 Cadmium effects on gene expression: Regulated genes . Gene expression data from control and cadmium (20 μg Cd/L for 48-h) treated Daphnia pulex [GEO:GSE9746{{tag}}--REUSE--]. Micorarray elements determined to significantly different (p ≤ 0.05) using the empirical Bayes (ebayes) method to shrink gene-wise error estimate in cadmium treated vs. control animals ar|e of the current manuscript. Figure 8 Cadmium effects on gene expression: literature reports . Gene expression data from control and cadmium (20 μg Cd/L for 48-h) treated Daphnia pulex [GEO:GSE9746{{tag}}--REUSE--]. 
Micorarray elements, which are known from the literature to be regulated by cadmium, are highlighted: Cuticle protein, pink; Hemoglobin, green; Metallothionein, blue; Ferritin, orange; Chitinase,| detailed description of this microarray platform is archived at the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus under the accession number [GEO:GPL6195], series [GEO:GSE9746{{tag}}--REUSE--]. Briefly, cDNA for seeding the amplifications were obtained by directly transferring into the reactions 5 μl of cDNA bacterial transformants, which were grown in 1.2 ml of 2X YT and 0.005% | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1626 | GSE9747 | 12/14/2007 | ['9747'] | [] | [u'17998334'] | 2223422 | [u'17998334'] | ['Mongroo', 'Delemos', 'Tobias', 'Liu', 'Bowser', 'Rich', 'Hannigan', 'Schupp', 'Rustgi', 'Johnstone'] | ['Mongroo', 'Delemos', 'Tobias', 'Liu', 'Bowser', 'Rich', 'Hannigan', 'Schupp', 'Rustgi', 'Johnstone'] | ['Mongroo', 'Delemos', 'Tobias', 'Liu', 'Bowser', 'Rich', 'Hannigan', 'Schupp', 'Rustgi', 'Johnstone'] | Mol Cell Biol | 2008 | 2008 Jan | 0 | AND pmc_gds | 1 | 0 | ||||
1627 | GSE9749 | 12/4/2007 | ['9749'] | [] | [u'18093538'] | 2190753 | [u'18093538'] | ['Jones', 'Zhuang'] | ['Jones', 'Zhuang'] | ['Jones', 'Zhuang'] | Immunity | 2007 | 2007 Dec | 0 | AND pmc_gds | 1 | 0 | ||||
1628 | GSE9751 | 12/4/2007 | ['9751'] | ['3143'] | [u'18668222'] | 2680807 | [u'19416532'] | ['Chan', 'Stapleton'] | ['Thomas', 'Zhang', 'Murphy', 'Gohlke', 'Mattingly', 'Davis', 'Rosenstein', 'Portier', 'Becker'] | [] | BMC Syst Biol | 2009 | 5/5/2009 | 1 | tered using Spearman rank correlation in Cluster and the graphic was prepared using TreeView [ 73 ], where color ranges linearly from blue (p = 1) to orange (p = 0). Phenotype-gene relationships were downloaded from the Genetic Association Database [ 1 ] in June 2007 and phenotypes were further grouped according to Additional file 1 . Request the TreeView file of this cluster from Julia Gohlke gohlkej@n|bal gene expression datasets utilized for validation of metabolic syndrome and neuropsychiatric subnetworks METABOLIC SYNDROME Condition Species Tissue GEO Acc . Reference obese/lean Human adipocytes GSE2508 [ 30 ] obese/lean Mouse adipocytes GSE4692 [ 31 ] Familial combined hyperlipedemia Human monocytes GSE11393 [ 32 ] Treatment Species Tissue GEO Acc . Reference Fenofibrate Rat liver GSE8251 [ 3|amide Rat liver GSE3952 [ 34 ] 9-cis retinoic acid Rat liver GSE3952 [ 34 ] Targretin Rat liver GSE3952 [ 34 ] Vitamin A deficient diet Rat liver GSE1600 [ 35 ] Omega 3 fatty acids Rat cardiomyocytes GSE4327 [ 36 ] Thiazolidinediones Human 3T3-L1 adipocytes GSE1458 [ 37 ] Atorvastatin Human monocytes GSE11393 [ 32 ] Cyfluthrin Human astrocytes GSE5023 [ 38 ] NEUROPSYCHIATRIC DISORDERS Condition Specie| 39 ] Depression Human prefrontal cortex GSE12654 [ 39 ] Schizophrenia Human prefrontal cortex GSE12654 [ 39 ] Schizophrenia Human frontal cortex E-MEXP-857 [ 40 ] Anxiety Mouse various brain regions GSE3327 [ 41 ] Autism Human lymphoblastoid cell lines GSE7329 [ 42 ] Autism Human whole blood GSE6575 [ 43 ] Treatment Species Tissue GEO Acc . Reference Chlorpyrifos Human astrocytes GSE5023 [ 38 ] Ch|mon diseases, and is one of the largest databases of human disease associated polymorphisms currently available. 
All gene-phenotype relationships (N = 28,341) in the Genetic Association Database were downloaded (June 8, 2007). Phenotypes were further annotated to collapse synonyms, as well as group similar phenotypes into categories (see mapping used in Additional file 1 ). Only those phenotypes with at |d with those phenotypes, phenotype p-values for KEGG pathway enrichment were clustered using Spearman Rank correlation with average linkage using Cluster version 2.11 and viewed using TreeView [ 73 ] downloaded July 2007 from Phenotype-Environmental factor Network Each phenotype-phenotype, phenotype-environmental factor, or environmental factor-environmental factor pair with at least two common significa|lue cut-offs and the sensitivity and specificity of the results as compared to manually curated, direct chemical-disease relationships found in the Comparative Toxicogenomics Database (CTD) database (downloaded in September 2008)[ 16 ]. This dataset contains direct chemical-disease relationships reported in the literature in human and animal model studies. We reduced this CTD database to those diseases fo|ned using the data for 4559 chemicals from CTD and for 204 diseases in GAD. Graphical representation of the network was determined using the edge weighted spring embedded algorithm in Cytoscape 2.5.0 downloaded Aug. 2007 [ 76 ] with the following parameters: spring strength = 5.0, spring rest length = 10.0, rest length of a disconnected spring = 1500, and strength of a disconnected spring = 0.06. Only tho|rom microarray datasets Up and downregulated gene lists from the microarray data as described in Table 1 was accessed from the publication associated with the datasets [ 30 - 36 , 77 - 84 ], or via downloading from GEO in Feb. or Nov. 2008. 
In the latter case, differentially expressed genes were identified using mattest and mafdr (Matlab 7.4.0.287 (R2007a)) with a fold change cutoff of 1.5 and a q value| metabolic syndrome datasets versus the neuropsychiatric datasets and vice versa using a hypergeometric distribution. Generation of Pathway Interconnectivity Network Connectivity between pathways was downloaded from KEGG (Aug. 16, 2007). Pathway network layout was generated using the force directed algorithm in Cytoscape 2.5.0 [ 76 ], where node size reflects the number of phenotypes associated with each | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1629 | GSE9755 | 12/6/2007 | ['9755'] | [] | [u'18039828'] | 2223256 | [u'18039828'] | ['Mackie', 'Smith', 'Sundset', 'Zoetendal'] | ['Mackie', 'Smith', 'Sundset', 'Zoetendal'] | ['Mackie', 'Smith', 'Sundset', 'Zoetendal'] | Appl Environ Microbiol | 2008 | 2008 Jan | 0 | AND pmc_gds | 1 | 0 | ||||
1630 | GSE9761 | 12/4/2007 | ['9761'] | [] | [u'19321454'] | 2685708 | [u'19321454'] | ['', 'Nott', 'Fluharty', 'Yeh', 'Li', 'Huang', 'Qiu', 'Welshons', 'Muyan'] | ['Nott', 'Fluharty', 'Yeh', 'Li', 'Huang', 'Qiu', 'Welshons', 'Muyan'] | ['Nott', 'Fluharty', 'Yeh', 'Li', 'Huang', 'Qiu', 'Welshons', 'Muyan'] | J Biol Chem | 2009 | 5/29/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
1631 | GSE9776 | 12/14/2007 | ['9776'] | [] | [u'18156607'] | 2726785 | [u'19714220'] | ['Bishai', 'Karakousis', 'Williams'] | ['Zucker', 'Brandes', 'Murray', 'Colijn', 'Galagan', 'Cheng', 'Weiner', 'Farhat', 'Lun', 'Moody'] | [] | PLoS Comput Biol | 2009 | 2009 Aug | 0 | sults of E-Flux in the two models. The model is available as Dataset S1 . Expression Data The expression data published by Boshoff et al. [17] are listed under GEO accession number GSE1642. Boshoff et al. used clustering of the expression profiles to predict the mechanisms of action of previously unknown agents. Data are available for two channels: Cy3 (control) and Cy5 (condition) o| data are in log format; these were exponentiated to obtain raw values. The data of Karakousis et al. [18] are published at http://www.ncbi.nlm.nih.gov/geo/ under accession number GSE9776{{tag}}--REUSE-- and are also two-channel data of H37Rv M. tb ; the dataset contains 17 arrays for 6 unique conditions, comparing M. tb 's response to isoniazid in dormancy models. Expression Data Processing We p | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1632 | GSE9782 | 12/6/2007 | ['9782'] | [] | [u'17185464'] | 2841345 | [u'20215539'] | ['Koenig', 'Chng', 'Bergsagel', 'Mitsiades', 'Broyl', 'Boral', 'Fergus', 'Trepicchio', 'Shaughnessy', 'Esseltine', 'Anderson', 'Zhan', 'Sonneveld', 'Huang', 'Richardson', 'Schenkein', 'Roels', 'Mulligan', 'Bryant'] | ['Jenner', 'Gonzalez', 'Boyd', 'Johnson', 'Morgan', 'Brito', 'Davies', 'Dickens', 'Walker', 'Zeisig', 'Gregory', 'Leone', 'Ross'] | [] | Clin Cancer Res | 2010 | 3/15/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
1633 | GSE9803 | 12/17/2007 | ['9803'] | [] | [u'17519223'] | 2816210 | [u'20140217'] | ['Cha', 'Dunnett', 'Leavitt', 'Augood', 'Hayden', 'Hodges', 'Olson', 'Hannan', 'Sathasivam', 'Strand', 'Becanovic', 'Jones', 'Bates', 'Albin', 'Goldstein', 'Shelbourne', 'Faull', 'Kuhn', 'Delorenzi', 'Ferrante', 'Sengstag', 'Luthi-Carter', 'Kooperberg', 'Pouladi'] | ['Slonim', 'Turcan', 'Vetter'] | [] | PLoS One | 2010 | 2/4/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
1634 | GSE9806 | 12/7/2007 | ['9806'] | [] | [u'18483172'] | 2494826 | [u'18483172'] | ['Shapiro', 'Periyasamy', 'Xie', 'El-Okdi', 'Smaili', 'Shidyak', 'Gupta', 'Chin', 'Malhotra', 'Fedorova', 'Kahaleh', 'Raju', 'Elkareh'] | ['Shapiro', 'Periyasamy', 'Xie', 'El-Okdi', 'Smaili', 'Shidyak', 'Gupta', 'Chin', 'Malhotra', 'Fedorova', 'Kahaleh', 'Raju', 'Elkareh'] | ['Shapiro', 'Periyasamy', 'Xie', 'El-Okdi', 'Smaili', 'Shidyak', 'Gupta', 'Chin', 'Malhotra', 'Fedorova', 'Kahaleh', 'Raju', 'Elkareh'] | J Appl Physiol | 2008 | 2008 Jul | 0 | AND pmc_gds | 1 | 0 | ||||
1635 | GSE9811 | 12/31/2007 | ['9811'] | [] | [u'18270576'] | 2220035 | [u'18270576'] | ['Stadler', 'Trimarchi', 'Cepko'] | ['Stadler', 'Trimarchi', 'Cepko'] | ['Stadler', 'Trimarchi', 'Cepko'] | PLoS One | 2008 | 2/13/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1636 | GSE9812 | 12/31/2007 | ['9812'] | [] | [u'17444492', u'19470466'] | 2220035 | [u'18270576'] | ['Sun', 'Roska', 'Bartch', 'Cepko', 'Billings', 'Trimarchi', 'Cherry', 'Stadler'] | ['Stadler', 'Trimarchi', 'Cepko'] | ['Stadler', 'Trimarchi', 'Cepko'] | PLoS One | 2008 | 2/13/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1637 | GSE9812 | 12/31/2007 | ['9812'] | [] | [u'17444492', u'19470466'] | 2686638 | [u'19470466'] | ['Sun', 'Roska', 'Bartch', 'Cepko', 'Billings', 'Trimarchi', 'Cherry', 'Stadler'] | ['Stadler', 'Cherry', 'Trimarchi', 'Cepko'] | ['Stadler', 'Cherry', 'Trimarchi', 'Cepko'] | Proc Natl Acad Sci U S A | 2009 | 6/9/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
1638 | GSE9818 | 12/31/2007 | ['9818'] | [] | [u'18287487'] | 2287368 | [u'18287487'] | ['Salon', 'Fernie', 'Jeudy', 'Freixes', 'Ruffel', 'Lepetit', 'Gojon', 'Kakar', 'Udvardi', 'van', 'Tillard', 'Balzergue', 'Gouzy', 'Martin-Magniette'] | ['Salon', 'Fernie', 'Jeudy', 'Freixes', 'Ruffel', 'Lepetit', 'Gojon', 'Kakar', 'Udvardi', 'van', 'Tillard', 'Balzergue', 'Gouzy', 'Martin-Magniette'] | ['Salon', 'Fernie', 'Jeudy', 'Freixes', 'Ruffel', 'Lepetit', 'Gojon', 'Kakar', 'Udvardi', 'van', 'Tillard', 'Balzergue', 'Gouzy', 'Martin-Magniette'] | Plant Physiol | 2008 | 2008 Apr | 0 | AND pmc_gds | 1 | 0 | ||||
1639 | GSE9832 | 12/10/2007 | ['9832'] | [] | [u'18157115'] | 2777408 | [u'19936228'] | ['Yabuuchi', 'Lerou', 'West', 'Huo', 'Park', 'Daley', 'Zhao', 'Lensch', 'Ince'] | ['Tomoda', 'Conklin', 'Salomonis', 'Nakamura', 'Yamanaka'] | [] | PLoS One | 2009 | 11/10/2009 | 0 | http://www.ncbi.nlm.nih.gov/geo ) for the U133 plus 2.0 array platform from published datasets. This dataset consisted of normal adult human tissue samples from the human body index compendium study (GSE7307), hES cells, hiPS cells, and fibroblast lines (GSE9709 and GSE9832{{tag}}--REUSE--) [3] , [29] . For all datasets, one to three arbitrary biological replicates per tissue were use | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1640 | GSE9832 | 12/10/2007 | ['9832'] | [] | [u'18157115'] | 2941458 | [u'20862250'] | ['Yabuuchi', 'Lerou', 'West', 'Huo', 'Park', 'Daley', 'Zhao', 'Lensch', 'Ince'] | ['Izpis\xc3\xbaa', 'Barrero', 'Paramonov', 'Bou\xc3\xa9'] | [] | PLoS One | 2010 | 9/17/2010 | 0 | sor genes differing in iPSCs from ESCs, and which could be the source of higher risks. Materials and Methods Gene expression analysis The datasets used for the human analyses are: Takahashi et al. (GSE9561) [5] ; Yu et al. (GSE9071) [7] ; Park et al. (GSE9832{{tag}}--REUSE--) [6] ; Zhao et al. (GSE12922) [52] ; Masaki et al. (GSE9709) �|2390) [30] ; Aasen et al. (GSE12583) [37] ; Huangfu et al. (pers. comm.) [53] ; Lowry et al. (GSE9865) [54] ; Ebert et al. (GSE13828) [15] ; Yu et al. (GSE15148) [55] ; Soldner et al. (GSE14711) [11] . The datasets used for the mouse analyses are: Takahashi et al. (GSE525|7815) [30] ; Feng et al. (GSE13211) [56] ; Sridharan et al. (GSE14012) [35] ; Wernig et al. (E-MEXP1037) [4] ; Chen et al . (GSE15267); Zhou et al . (GSE16062) [57] ; Zhao et al . (GSE16925) [20] ; Kang et al . (GSE17004) [19] ; Heng et al . (GSE19023) [58] ; Ic|(wpr =  pr *weight). Next, the average percentrank and the average weighted percentrank were identified for the replicates of each sample. In addition, for the dataset GSE7841 we have averaged the available iPSCs samples (day2, day16, day17 and day18). For the dataset E-MEXP-1037 we have averaged iPSCs samples (clones 8 and 18). For the dataset GSE13211 we have averaged |OSCE (clones 8 and 13) and iPSCs OSE (clones T8 and T9) samples. For the dataset GSE14012 we have averaged ESCs (v6.5 and E14), MEFs (male and female) and iPSCs (1D4 and 2D4) samples. For the dataset GSE15267 we have averaged ESCs (CGR8 and R1), iPSCs reprogrammed with four factors (S2C12 and S2C16) and iPSCs reprogrammed with 3 factors (S53C1 and S53C5). 
For the dataset GSE19023 we have averaged MEFs |teworthy that the bivalent genes profiles of the iPSC lines described to contribute to viable mice through tetraploid complementation assay (the most stringent proof of pluripotency available so far, GSE16925 and GSE17004) have the highest correlation coefficients when compared with the ESC lines. As expected, the correlation between bivalent genes profiles of fibroblasts and ESCs is very low and espec|es whose expression in iPSCs could restrict or at least bias the differentiation potential. Encouragingly, the iPSC lines that were shown to generate viable mice by tetraploid complementation assays (GSE16925 and GSE17004) express none to very few of such genes, whereas the first iPSCs generated that did not contribute to the germline (GSE5259), as well as the partially reprogrammed iPSC lines (GSE1401|umber of these potentially troublesome genes ( Figure 4 ). For example, the partially reprogrammed iPSC lines 1A2 and 1B3 (GSE14012), as well as the Fbx15KO iPSC line, which showed a limited potency (GSE5259), express Hoxc8, which is a homeodomain gene important for early embryogenesis, especially for neural development, and whose expression level is normally tightly regulated [78] an|133 plus 2.0 (GPL570) Affy HG-U133 plus 2.0 (GPL570) Phalanx Human one aray (GPL6254) Affy HG-U133 plus 2.0 (GPL570) Affy HG-U133 plus 2.0 (GPL570) Affy HG-U133 plus 2.0 (GPL570) GEO accession number GSE12390 Pers. Comm GSE9865 GSE12583 GSE12922 GSE13828 GSE15148 GSE14711 Corr coeff whole array iPS/ES: average (min-max) Primary iPS: 0.988 (0.984–0.989) Secondary iPS: 0.991 (0.990–0.991) | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1641 | GSE9832 | 12/10/2007 | ['9832'] | [] | [u'18157115'] | 2946409 | [u'20885964'] | ['Yabuuchi', 'Lerou', 'West', 'Huo', 'Park', 'Daley', 'Zhao', 'Lensch', 'Ince'] | ['Kiyokawa', 'Umezawa', 'Okita', 'Fukawatase', 'Akutsu', 'Miyagawa', 'Makino', 'Chikazawa', 'Nishino', 'Yamazaki-Inoue', 'Toyoda', 'Takahashi'] | [] | PLoS One | 2010 | 9/27/2010 | 0 | [41] ( http://www.genome.jp/kegg/ ) for gene ontology analysis, the GEO database ( http://www.ncbi.nlm.nih.gov/geo/ ) for surveying gene expression in human iPS/ES cells (accession no. GSE9832{{tag}}--REUSE-- [16] and GSE12583 [17] ), and the UCSC Genome Browser website [42] ( http://genome.ucsc.edu/ ). RT-PCR RNA was extracted from cells using the RN | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1642 | GSE9832 | 12/10/2007 | ['9832'] | [] | [u'18157115'] | 2837036 | [u'20178649'] | ['Yabuuchi', 'Lerou', 'West', 'Huo', 'Park', 'Daley', 'Zhao', 'Lensch', 'Ince'] | ['Wang', 'Chao', 'Liu', 'Wu', 'Wong', 'Liang', 'Hsu', 'Hsieh'] | [] | BMC Genomics | 2010 | 2/24/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
1643 | GSE9854 | 12/13/2007 | ['9854'] | [] | [] | 2742858 | [u'19525223'] | [u'Rood'] | ['Tschan', 'Van', 'Touka', 'Mont\xc3\xa9', 'Rood', 'Leprince', 'B\xc3\xa9gue', 'Pinte', 'Ramsey', 'Gu\xc3\xa9rardel', 'Stephan', 'Jenal'] | [u'Rood'] | J Biol Chem | 2009 | 7/31/2009 | 0 | both cell types were removed from the analysis. Self-organizing maps were used to identify major trends in expression. Raw data can be obtained at NCBI GEO ( www.ncbi.nlm.nih.gov ), accession number GSE9854{{tag}}--DEPOSIT-- , or the Children's National Medical Center Public Expression Profiling Resource site. Western Blot and Antibodies Proteins were fractionated by SDS-PAGE and transferred onto nitrocellulose memb | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1644 | GSE9862 | 12/19/2007 | ['9862'] | [] | [u'18431496'] | 2292242 | [u'18431496'] | ['Parker', 'Lastovica', 'Mandrell', 'Qui\xc3\xb1ones', 'Miller', 'Guilhabert', u'Qui\xf1ones'] | ['Parker', 'Lastovica', 'Mandrell', 'Qui\xc3\xb1ones', 'Miller', 'Guilhabert'] | ['Parker', 'Lastovica', 'Mandrell', 'Qui\xc3\xb1ones', 'Miller', 'Guilhabert'] | PLoS One | 2008 | 4/23/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1645 | GSE9873 | 12/13/2007 | ['9873'] | [] | [u'18067656'] | 2246263 | [u'18067656'] | ['Newburger', 'Kanegaye', '', 'Brown', 'Sundel', 'Relman', 'Shike', 'Popper', 'Burns', 'Shimizu'] | ['Newburger', 'Kanegaye', 'Brown', 'Sundel', 'Relman', 'Shike', 'Popper', 'Burns', 'Shimizu'] | ['Newburger', 'Kanegaye', 'Brown', 'Sundel', 'Relman', 'Shike', 'Popper', 'Burns', 'Shimizu'] | Genome Biol | 2007 | 2007 | 0 | AND pmc_gds | 1 | 0 | ||||
1646 | GSE9874 | 12/14/2007 | ['9874'] | [] | [u'18506362'] | 2831002 | [u'20064233'] | ['Jern\xc3\xa5s', 'Eriksson', u'Jern\xe5s', 'Wiklund', 'Thelle', 'Hamsten', 'H\xc3\xa4gg', 'Svensson', 'Fagerberg', 'Carlsson', 'Olsson', u'H\xe4gg'] | ['Berger', 'De', 'Pierre', 'Depiereux', 'Bareke', 'Gaigneaux'] | [] | BMC Bioinformatics | 2010 | 1/11/2010 | 0 | using the R package GCRMA [ 20 ]. As the benchmark is tested gene by gene, a pre-treatment including all Affybatch objects globally was not needed. Table 2 Datasets list Dataset Number of replicates GSE10072 107 GSE10760 98 GSE1561 49 GSE1922 49 GSE3790FC 65 GSE3790CN 70 GSE3790CB 54 GSE3846 108 GSE3910 70 GSE3912 113 GSE5388 61 GSE5392 82 GSE5462 116 GSE5580 42 GSE5847 95 GSE646-7 93 GSE643-5 126 GSE|6 38 GSE9874{{tag}}--REUSE--b-f 60 GSE9874{{tag}}--REUSE-- 60 GSE9877 47 GSE994 75 Datasets used for construction of the initial matrix. The number of replicates is the number of microarrays in the experiment. Giant datasets (e.g., GSE3790 with 202 replicates in three different brain regions) were first split into subsets according to their biological content. The datasets were then sampled as follows: when the number of replicates w | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1647 | GSE9877 | 12/18/2007 | ['9877'] | [] | [u'18156497'] | 2831002 | [u'20064233'] | ['Jiang', 'Wei', 'Hillery', 'Yang', 'Scott', 'Nelson', 'Enenstein', 'Hirsch', 'Topper', 'Bodempudi', 'Chang', 'Hebbel', 'Pan', u'Milbauer'] | ['Berger', 'De', 'Pierre', 'Depiereux', 'Bareke', 'Gaigneaux'] | [] | BMC Bioinformatics | 2010 | 1/11/2010 | 0 | using the R package GCRMA [ 20 ]. As the benchmark is tested gene by gene, a pre-treatment including all Affybatch objects globally was not needed. Table 2 Datasets list Dataset Number of replicates GSE10072 107 GSE10760 98 GSE1561 49 GSE1922 49 GSE3790FC 65 GSE3790CN 70 GSE3790CB 54 GSE3846 108 GSE3910 70 GSE3912 113 GSE5388 61 GSE5392 82 GSE5462 116 GSE5580 42 GSE5847 95 GSE646-7 93 GSE643-5 126 GSE|6 38 GSE9874b-f 60 GSE9874 60 GSE9877{{tag}}--REUSE-- 47 GSE994 75 Datasets used for construction of the initial matrix. The number of replicates is the number of microarrays in the experiment. Giant datasets (e.g., GSE3790 with 202 replicates in three different brain regions) were first split into subsets according to their biological content. The datasets were then sampled as follows: when the number of replicates w | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1648 | GSE9877 | 12/18/2007 | ['9877'] | [] | [u'18156497'] | 2944782 | [u'20885780'] | ['Jiang', 'Wei', 'Hillery', 'Yang', 'Scott', 'Nelson', 'Enenstein', 'Hirsch', 'Topper', 'Bodempudi', 'Chang', 'Hebbel', 'Pan', u'Milbauer'] | ['Klassen', 'Khush', 'Morgan', 'Valantine', 'Dudley', 'Li', 'Sigdel', 'Sarwal', 'Kambham', 'Chen', 'Butte', 'Hsieh', 'Caohuu'] | [] | PLoS Comput Biol | 2010 | 9/23/2010 | 0 | AND pmc_gds | 0 | 1 | ||||
1649 | GSE9877 | 12/18/2007 | ['9877'] | [] | [u'18156497'] | 2275038 | [u'18156497'] | ['Jiang', 'Wei', 'Hillery', 'Yang', 'Scott', 'Nelson', 'Enenstein', 'Hirsch', 'Topper', 'Bodempudi', 'Chang', 'Hebbel', 'Pan', u'Milbauer'] | ['Jiang', 'Wei', 'Hillery', 'Yang', 'Scott', 'Nelson', 'Enenstein', 'Hirsch', 'Topper', 'Bodempudi', 'Chang', 'Hebbel', 'Pan'] | ['Jiang', 'Wei', 'Hillery', 'Yang', 'Scott', 'Nelson', 'Enenstein', 'Hirsch', 'Topper', 'Bodempudi', 'Chang', 'Hebbel', 'Pan'] | Blood | 2008 | 4/1/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1650 | GSE9877 | 12/18/2007 | ['9877'] | [] | [u'18156497'] | 2756210 | [u'19643986'] | ['Jiang', 'Wei', 'Hillery', 'Yang', 'Scott', 'Nelson', 'Enenstein', 'Hirsch', 'Topper', 'Bodempudi', 'Chang', 'Hebbel', 'Pan', u'Milbauer'] | ['Demers', 'Callas', 'Vossen', 'Hasstedt', 'Trotman', 'Bovill', 'Rosendaal', 'Hebbel', 'Bezemer'] | ['Hebbel'] | Blood | 2009 | 10/1/2009 | 0 | AND pmc_gds | 1 | 0 | ||||
1651 | GSE9884 | 12/17/2007 | ['9884'] | [] | [u'18302797'] | 2277405 | [u'18302797'] | ['Jakt', 'Alev', 'Tarui', 'Sheng', 'McIntyre'] | ['Jakt', 'Alev', 'Tarui', 'Sheng', 'McIntyre'] | ['Tarui', 'Jakt', 'Alev', 'Sheng', 'McIntyre'] | BMC Dev Biol | 2008 | 2/27/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1652 | GSE9889 | 12/31/2007 | ['9889'] | [] | [u'18198273'] | 2242723 | [u'18198273'] | ['Taylor', 'Han', 'Elgar'] | ['Taylor', 'Han', 'Elgar'] | ['Taylor', 'Han', 'Elgar'] | Proc Natl Acad Sci U S A | 2008 | 1/22/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1653 | GSE9909 | 12/28/2007 | ['9909'] | [] | [u'18081424'] | 2788246 | [u'20019809'] | ['Gorospe', 'Lakatta', 'Zahn', 'Firman', 'Boheler', 'Wood', 'Ingram', 'Schlessinger', u'Suresh', 'Lustig', 'Taub', 'Zonderman', 'Kim', 'Ko', 'Mattson', 'Kummerfeld', 'Becker', 'Falco', 'Carter', 'Mazan-Mamczarz', 'Owen', 'Poosala', 'Weeraratna', 'Xu'] | ['Southworth', 'Kim', 'Owen'] | ['Owen', 'Kim'] | PLoS Genet | 2009 | 2009 Dec | 0 | iological networks, including changes in social or economic networks over time. Materials and Methods Data normalization We downloaded the AGEMAP data from the NCBI Gene Expression Omnibus (accession GSE9909{{tag}}--REUSE--) [10] . The AGEMAP microarray collection contains microarrays for 16 different tissues for five male and five female mice aged both 16 and 24 months. We removed the liver, striatum|part [42] . Identification of co-expressed clusters from six-month AGEMAP data We downloaded the AGEMAP data for the 6-month-old mice from the NCBI Gene Expression Omnibus (accession GSE9909{{tag}}--REUSE--) [10] . We normalized the data using the method described for the other AGEMAP data. We found gene clusters by performing a hierarchical clustering of the 6-month data, using a Spe | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1654 | GSE9914 | 12/18/2007 | ['9914'] | ['3545', '3544'] | [u'18216249'] | 2234131 | [u'18216249'] | ['Gatchel', 'Thaller', 'Shaw', 'Jafar-Nejad', 'Orr', 'Carson', 'Zu', 'Watase', 'Zoghbi'] | ['Gatchel', 'Thaller', 'Shaw', 'Jafar-Nejad', 'Orr', 'Carson', 'Zu', 'Watase', 'Zoghbi'] | ['Gatchel', 'Thaller', 'Shaw', 'Jafar-Nejad', 'Orr', 'Carson', 'Zu', 'Watase', 'Zoghbi'] | Proc Natl Acad Sci U S A | 2008 | 1/29/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1655 | GSE9916 | 12/18/2007 | ['9916'] | [] | [] | 2430197 | [u'18510776'] | [u'Wong', u'Sakthivel', u'Odoms'] | ['Wong', 'Sakthivel', 'Odoms'] | ['Wong', 'Sakthivel', 'Odoms'] | BMC Immunol | 2008 | 5/30/2008 | 0 | croarray hybridization The data and protocols described in this manuscript have been deposited in the NCBI Gene Expression Omnibus (GEO, [ 20 ]) and are accessible through GEO Series accession number GSE9916{{tag}}--DEPOSIT--. Total RNA was isolated from THP-1 cells exposed to the above experimental conditions using the Trizol reagent (Invitrogen) according to the manufacturer's specifications. Microarray hybridization | 0 | 1 | 0 | NOT pmc_gds | 1 | 0 |
1656 | GSE9922 | 12/28/2007 | ['9922'] | [] | [u'18245445'] | 2216690 | [u'18245445'] | ['', 'Helms', 'Wang', 'Nusse', 'Mikels', 'Liu', 'Rinn', 'Brugmann', 'Stadler', 'Allen', 'Chang', 'Ridky'] | ['Helms', 'Wang', 'Nusse', 'Mikels', 'Liu', 'Rinn', 'Brugmann', 'Stadler', 'Allen', 'Chang', 'Ridky'] | ['Helms', 'Wang', 'Nusse', 'Mikels', 'Liu', 'Rinn', 'Brugmann', 'Stadler', 'Allen', 'Chang', 'Ridky'] | Genes Dev | 2008 | 2/1/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1657 | GSE9929 | 12/20/2007 | ['9929'] | [] | [u'18226214'] | 2257942 | [u'18226214'] | ['Hue', 'Degrelle', 'Hennequet-Antier', 'Chiapello', 'Piot-Kaminski', 'Piumi', 'Renard', 'Robin'] | ['Hue', 'Degrelle', 'Hennequet-Antier', 'Chiapello', 'Piot-Kaminski', 'Piumi', 'Renard', 'Robin'] | ['Hue', 'Degrelle', 'Hennequet-Antier', 'Chiapello', 'Piot-Kaminski', 'Piumi', 'Renard', 'Robin'] | BMC Genomics | 2008 | 1/28/2008 | 0 | AND pmc_gds | 1 | 0 | ||||
1658 | GSE9954 | 12/20/2007 | ['9954'] | ['3142'] | [u'18365009'] | 2673434 | [u'19237395'] | ['Moreau', 'Van', 'Schuit', 'Thorrez', 'Marchal', 'Tranchevent', u'Lommel', 'Engelen', u'Lemaire'] | ['Jackson', 'Flicek', 'Swan', 'Preston-Fayers', 'Ballester', 'Carlile', 'Werner'] | [] | Nucleic Acids Res | 2009 | 2009 Apr | 0 | Pfn3 0.33 0.36 1.54 105.1 Slc34a1 0.97 0.79 1.04 1.06 Slc34a-3′ 0.98 0.95 1.03 1.04 Datasets GSE9954{{tag}} and GSE4193 were evaluated using the probe sets specific for NATs ( Supplementary data 1 ). The normalized values for the probesets Pfn3 (1453962_at), Slc34a1 (1423279_at) and Slc34a -3′|assess datasets in the public repositories with emphasis on testis and embryonic stem cells. Other mouse tissues were included as controls and cross reference. The relevant GEO accession numbers are: GSE9954{{tag}}--REUSE--, different mouse tissues; GSE4193 spermatogenesis. Raw data was downloaded pre-processed with RMA and a per-gene normalisation applied in GeneSpring 7.3.1 (Agilent). Net expression levels were plot| transcript may only span a fraction of the individual probes that make up the entire probe set). We tested the expression of the corresponding transcripts in various mouse tissues using GEO datasets GSE9954{{tag}}--REUSE-- and GSE4193. In particular, antisense transcripts were assessed in a multi tissue experiment including testis, seminal vesicle, brain, kidney and embryonic stem cells. Datasets from spermatocytes a | 1 | 0 | 0 | NOT pmc_gds | 0 | 1 |
1659 | GSE9954 | 12/20/2007 | ['9954'] | ['3142'] | [u'18365009'] | 2858751 | [u'20346121'] | ['Moreau', 'Van', 'Schuit', 'Thorrez', 'Marchal', 'Tranchevent', u'Lommel', 'Engelen', u'Lemaire'] | ['Thorrez', 'Chang', 'Moreau', 'Tranchevent', 'Schuit'] | ['Thorrez', 'Moreau', 'Tranchevent', 'Schuit'] | BMC Genomics | 2010 | 3/26/2010 | 0 | The expression dataset consisting of 70 microarrays covering 22 different murine tissues with 3-5 replicates per tissue was used as starting data. Data are accessible through the GEO database, with accession number GSE9954{{key}}--REUSE-- [32] | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1660 | GSE9954 | 12/20/2007 | ['9954'] | ['3142'] | [u'18365009'] | 2752611 | [u'19671693'] | ['Moreau', 'Van', 'Schuit', 'Thorrez', 'Marchal', 'Tranchevent', u'Lommel', 'Engelen', u'Lemaire'] | ['Thorrez', 'Van', 'Schuit', 'Hoijtink'] | ['Thorrez', 'Van', 'Schuit'] | Bioinformatics | 2009 | 10/1/2009 | 0 | id for potential users, we provided the script file of our simulation. 3.2 Tissue-selective genes We used a publicly available microarray dataset ( http://www.ncbi.nlm.nih.gov/geo/ , accession number GSE9954{{tag}}--REUSE--; see also Thorrez et al. , 2008 ) that we generated via Affymetrix mRNA expression analysis using 430 2.0 arrays. This database consists of 22 different murine tissues, with 3–5 replicate | 1 | 0 | 0 | NOT pmc_gds | 1 | 0 |
1661 | GSE9954 | 12/20/2007 | ['9954'] | ['3142'] | [u'18365009'] | 2267211 | [u'18365009'] | ['Moreau', 'Van', 'Schuit', 'Thorrez', 'Marchal', 'Tranchevent', u'Lommel', 'Engelen', u'Lemaire'] | ['Moreau', 'Van', 'Schuit', 'Thorrez', 'Marchal', 'Tranchevent', 'Engelen'] | ['Moreau', 'Van', 'Schuit', 'Thorrez', 'Marchal', 'Tranchevent', 'Engelen'] | PLoS One | 2008 | 3/26/2008 | 1 | AND pmc_gds | 1 | 0 |