-
Notifications
You must be signed in to change notification settings - Fork 4
Expand file tree
/
Copy pathGEOquery.Rmd
More file actions
45 lines (34 loc) · 1.91 KB
/
GEOquery.Rmd
File metadata and controls
45 lines (34 loc) · 1.91 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
# Example of how to download CEL files from GEO
If the `GEOquery` R/Biocondcutor package is not installed, use `biocLite()` to install the package:
```{r, eval=FALSE}
source("http://bioconductor.org/biocLite.R")
biocLite("GEOquery")
```
Load the `GEOquery` R/Bioconductor package:
```{r, message=FALSE}
library(GEOquery)
```
### Access the GEO Series Data
To access the GEO Sample (GSM), GEO Series (GSE) (lists of GSM files that together form a single experiment) or GEO Dataset (GDS), use the function `getGEO()` which returns a list of ExpressionSets:
```{r, message=FALSE}
gse <- getGEO("GSE21653", GSEMatrix=TRUE)
show(gse)
```
### Accessing raw data from GEO
If raw data such as .CEL files exist on GEO, you can easily access this dea using the `getGEOSuppFiles()` function. The function takes in a GEO accession as the argument and will download all the raw data associated with that accession. By default the `getGEOSuppFiles()` function will create a directory within the current working directory to store the raw data. Here, the file paths of the downloaded files (often with as a .tar extension) are stored in a data frame called `filePaths`.
```{r}
filePaths = getGEOSuppFiles('GSE21653')
filePaths
```
From here you can use, for example, `ReadAffy()` to read in the CEL files.
### Access GSE Data Tables from GEO
To access the phenotypic information about the samples, the best way is to use `getGEO()` function to obtain the GSE object and then extract the phenoData object from that. Unfortunately this means downloading the entire GSE Matrix file.
```{r}
dim(pData(gse[[1]]))
head(pData(gse[[1]])[,1:3])
```
Sometimes GSEs are include separate data tables with the sample information. If these exist, you can uuse the `getGSEDataTables()` function. For example here is the phenoData object from a different GSE accession GSE3494 with a Data Table.
```{r}
df1 <- getGSEDataTables("GSE3494")
lapply(df1,head)
```