|
Tutorial
This document is a tutorial for the CORNA package. The tutorial
contains code that can be directly executed in R, the software on which CORNA is
based.
1. Requirements
You will need:
source("http://www.bioconductor.org/biocLite.R")
biocLite(c("Biobase", "biomaRt", "GEOquery", "RCurl"))
install.packages("XML")
Connecting R to the internet
If you're having problems with R accessing the internet, then there are
a couple of things to try.
On windows, right click the shortcut to R, and click properties. In the
target box, enter the flag '--internet2':
On linux or unix systems, if you are using a proxy server, make sure
your http_proxy environment variable is set and you may also need to
execute unset no_proxy.
2. Choice of population
When performing tests that draw from the Hypergeometric distribution, defining the universe (or population) from
which the sample has been drawn is key. In the case of CORNA, we are dealing with gene lists, and so the question is:
from which population of genes have we drawn our sample gene list?
The obvious answer is "all the genes in the genome in question". However, this may not be the case - if you have not
assayed all genes in the genome, then those genes you didn't assay could never have made it into your sample, and therefore
they should not be present in the population either. For example, if you perform a microarray experiment and your microarray
only represents a subset of the genes in the genome, then only that subset should be used as the population.
The population may be further refined. For example, with CORNA, we are looking at regulation of gene lists by microRNAs, therefore
the population of genes could be further reduced to only those genes that you have assayed that also have a predicted
miRNA relationship. If you have assayed a gene that has no miRNA relationship, then that gene can never
be counted against a particular miRNA, and serves only to increase the size of the universe.
In general, the effect of a larger universe is that the resultant p-values appear more significant.
The population gene list to be used in CORNA is handled by the 'y' parameter of the corna.test.fun function. This is highly
configurable and you may input any list you wish, and there is therefore a responsibility on the user to select the correct population.
In the examples below, when testing for association of microRNAs with gene lists, the population used is all genes in
the target genome that have at least one microRNA association. When using predicted microRNA targets to test for enrichment of pathways or GO terms, only those genes with at least one pathway, or at least one GO term, are used as the population. Where a microRNA’s predicted targets are not associated with a pathway or GO term, they are not included in the analysis.
3. Reading and querying microRNA:gene pairs
Here we read microRNA:transcript target information from mirBASE and query that data set by transcript, by gene and by microRNA
library(CORNA)
targets <- miRBase2df.fun(url="ftp://ftp.sanger.ac.uk/pub/mirbase/targets/v5/arch.v5.txt.mus_musculus.zip")
corna.map.fun(targets, "mmu-mir-20b", "mir", "tran", all=TRUE)
corna.map.fun(targets, "ENSMUST00000056452", "tran", "mir", all=TRUE)
vec <- sample(targets$mir, 10)
corna.map.fun(targets, vec, "mir", "tran", all=TRUE)
vec <- sample(targets$tran, 10)
corna.map.fun(targets, vec, "tran", "mir", all=TRUE)
tran2gene <- BioMart2df.fun(
biomart="ensembl",
dataset="mmusculus_gene_ensembl",
col.old=c("ensembl_transcript_id",
"ensembl_gene_id"),
col.new=c("tran", "gene"))
mir2gene <- corna.map.fun(targets,
tran2gene,
"gene",
"mir")
corna.map.fun(mir2gene, "mmu-mir-20b", "mir", "gene", all=TRUE)
corna.map.fun(mir2gene, "ENSMUSG00000046341", "gene", "mir", all=TRUE)
4. Testing a transcript list for significant miRNA associations
Here we test a list of ensembl transcripts for enrichment of microRNA-target association from mirbase.
The vector tsam is loaded from the CORNA package. This vector consists of 960 randomly selected transcripts,
plus 30 each for mmu-mir-155 and mmu-mir-708 - thus we expect these two to show up as significant.
library(CORNA)
data(CORNA.DATA)
targets <- miRBase2df.fun(file="arch.v5.txt.mus_musculus.zip")
res <- corna.test.fun(x=tsam,
y=unique(targets$tran),
z=targets,
p.adjust="BH")
res[res$hypergeometric<=0.05,]
5. Testing a gene list for significant miRNA associations
Here we test a list of ensembl genes for enrichment of microRNA-target association from mirbase.
The vector gsam is loaded from the CORNA package. This vector consists of 900 randomly selected genes
plus 50 each for mmu-mir-155 and mmu-mir-708 - thus we expect these two to show up as significant.
library(CORNA)
data(CORNA.DATA)
targets <- miRBase2df.fun(file="arch.v5.txt.mus_musculus.zip")
tran2gene <- BioMart2df.fun(biomart="ensembl",
dataset="mmusculus_gene_ensembl",
col.old=c("ensembl_transcript_id",
"ensembl_gene_id"),
col.new=c("tran", "gene"))
mir2gene <- corna.map.fun(x=targets,
y=tran2gene,
m="gene",
n="mir")
res <- corna.test.fun(x=gsam,
y=unique(mir2gene$gene),
z=mir2gene,
p.adjust="BH")
res[res$hypergeometric<=0.05,]
6. Testing a microRNA's targets for enrichment of KEGG pathways
Here we start with the microRNA mmu-mir-155 and use the mirbase targets information to find
transcripts that it targets. The microRNA-transcript relationship is converted to a microRNA-gene
relationship using data from BioMart. This data is then linked to KEGG, and each pathway is tested
to see if it is enriched for the genes targeted by mmu-mir-155
library(CORNA)
targets <- miRBase2df.fun(file="arch.v5.txt.mus_musculus.zip")
tran2gene <- BioMart2df.fun(
biomart="ensembl",
dataset="mmusculus_gene_ensembl",
col.old=c("ensembl_transcript_id",
"ensembl_gene_id"),
col.new=c("tran", "gene"))
mir2gene <- corna.map.fun(targets,
tran2gene,
"gene",
"mir")
gvec <- corna.map.fun(mir2gene,
"mmu-mir-155",
"mir",
"gene")
gene2path <- KEGG2df.fun(org="mmu")
gvec <- intersect(gvec, unique(gene2path$gene))
path2name <- unique(gene2path[c("path", "name")])
test <- corna.test.fun(
gvec,
unique(gene2path$gene),
gene2path,
hypergeometric=T,
fisher=T,
chi.square=T,
#fisher.alternative="greater",
min.pop=10,
sort="fisher",
desc=path2name)
test[test$hypergeometric<=0.05,]
7. Testing a microRNA's targets for enrichment of GO terms
Here we start with the microRNA mmu-mir-708 and use the mirbase targets information to find
transcripts that it targets. This data is then linked to GO using BioMart and each GO Term is tested
to see if it is enriched for the genes targeted by mmu-mir-708
library(CORNA)
targets <- miRBase2df.fun(file="arch.v5.txt.mus_musculus.zip")
tvec <- corna.map.fun(x=targets,
y="mmu-mir-708",
m="mir",
n="tran")
tran2gomf <- BioMart2df.fun(biomart="ensembl",
dataset="mmusculus_gene_ensembl",
col.old=c("ensembl_transcript_id", "go_molecular_function_id"),
col.new=c("tran", "gomf"))
go2term <- GO2df.fun(url="ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz")
pvec <- as.vector(unique(tran2gomf[, "tran"]))
samvec <- intersect(tvec, pvec)
test <- corna.test.fun(samvec,
pvec,
tran2gomf,
fisher=T,
sort="fisher",
min.pop=5,
desc=go2term)
test[test$fisher <= 0.05, ]
8. Displaying quantitative data associated with a microRNA from a microarray experiment
library(CORNA)
data(CORNA.DATA)
tran2probe <- BioMart2df.fun(biomart="ensembl",
dataset="mmusculus_gene_ensembl",
col.old=c("ensembl_transcript_id", "affy_mouse430_2"),
col.new=c("tran", "probe"))
targets <- miRBase2df.fun(file="arch.v5.txt.mus_musculus.zip")
tvec <- corna.map.fun(x=targets,
y="mmu-mir-1",
m="mir",
n="tran")
sigprobe <- corna.map.fun(tran2probe,
tvec,
"tran",
"probe", all=TRUE)
tsig <- intersect(sigprobe$probe, sam.probe.vec)
my.microarray.df <- corna.sub.fun(x=microarray.df,
y=tsig)
corna.line.fun(x=my.microarray.df,
array.order= c("GSM176756", "GSM176758", "GSM176759", "GSM176757", "GSM176760", "GSM176761"),
las=0, ps=11, mar=c(7,3,3,1),
main="Significant Probes")
corna.barplot.fun(x=my.microarray.df,
array.order= c("GSM176756", "GSM176758", "GSM176759", "GSM176757", "GSM176760", "GSM176761"),
array.colour=c(rep("black", 3), rep("white", 3)),
row=3,
column=3,
main="Significant probes targeted by mmu-mir-1",
line=3)
corna.barplot.fun(x=my.microarray.df,
array.order= c("GSM176756", "GSM176758", "GSM176759", "GSM176757", "GSM176760", "GSM176761"),
array.colour=c(rep("black", 3), rep("white", 3)),
row=7,
column=7,
main="Significant probes targeted by mmu-mir-1",
line=3)
This tutorial represents only part of the functionality in the CORNA
package. For the rest, I recommend you read the full documentation after
installing.
Comments and queries should be made to the author, Michael Watson
9. References
- Zhao Y, Ransom JF, Li A, Vedantham V, von Drehle M, Muth AN, Tsuchihashi T, McManus MT, Schwartz RJ, Srivastava D.
Dysregulation of cardiogenesis, cardiac conduction, and cell cycle in mice lacking miRNA-1-2.
Cell. 2007 Apr 20;129(2):303-17.
PMID: 17397913
|