Friday, August 31, 2012

biomaRt: Find gene name using chromosome number and position

In a previous post, I gave a few examples of using biomaRt in R. This post continues with another useful example: how to obtain gene names (e.g. HGNC symbols), or really any attribute returned by listAttributes(), using only a chromosome number and a chromosome position. I wanted the gene names for a set of mutations and decided to use biomaRt. I created a tab-delimited file called 'positions.txt' with three columns: the chromosome number, the start position and the end position (in this case start and end were the same). The following code identifies which gene each chromosome position falls in and reports the HGNC gene symbol.
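To make the expected input concrete, here is a minimal sketch that writes a small file in the same three-column, tab-delimited layout and reads it back. The coordinates and the name `example_file` are made up for illustration, and the file goes to a temporary directory so that a real 'positions.txt' is not overwritten.

```r
# Made-up coordinates for illustration only; replace with your own mutation positions
example_positions <- data.frame(
  chromosome = c("1", "17", "X"),
  start      = c(1000000L, 7577000L, 153000000L),
  end        = c(1000000L, 7577000L, 153000000L)
)

# Write without header, row names or quotes: chromosome <TAB> start <TAB> end
example_file <- file.path(tempdir(), "positions_example.txt")
write.table(example_positions, file = example_file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read it back; forcing the first column to character keeps "X" and "1" in one column type
check <- read.table(example_file, sep = "\t",
                    colClasses = c("character", "integer", "integer"))
print(check)
```

Reading the chromosome column as character avoids surprises when the file mixes numeric chromosomes with X, Y or MT.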

# Load the library
library(biomaRt)

# Define biomart object
mart <- useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")

# Gives a list of all possible annotations; Currently there are 1668 listed
listAttributes(mart)

# Gives a list of all filters or criteria to search by; Currently there are 333 listed
# I chose to filter by: chromosome_name, start, end
listFilters(mart)

# Read in tab-delimited file (no header) with three columns: chromosome number, start position and end position
positions <- read.table("positions.txt", sep = "\t", stringsAsFactors = FALSE)

# Extract the HGNC gene symbol, plus the chromosome and gene start/end for reference
results <- getBM(
  attributes = c("hgnc_symbol", "chromosome_name", "start_position", "end_position"),
  filters    = c("chromosome_name", "start", "end"),
  values     = list(positions[,1], positions[,2], positions[,3]),
  mart       = mart
)
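One caveat: as far as I understand it, when several filters each receive a vector of values, BioMart does not pair them up row by row, so with many positions the query above may not match each start with its own chromosome. A possible alternative, sketched below, is to build one "chromosome:start:end" string per row and pass those to the Ensembl `chromosomal_region` filter. This assumes that filter is present in your mart (check with listFilters(mart)); the helper names `make_regions` and `genes_for_regions` and the example coordinates are hypothetical.

```r
# Build "chromosome:start:end" strings, one per row.
# Assumes the column order chromosome, start, end, as in positions.txt.
make_regions <- function(positions) {
  # format() with scientific = FALSE keeps 1000000 from being printed as "1e+06"
  start <- format(positions[, 2], scientific = FALSE, trim = TRUE)
  end   <- format(positions[, 3], scientific = FALSE, trim = TRUE)
  paste(positions[, 1], start, end, sep = ":")
}

# Hypothetical helper: one query for all regions (needs biomaRt and network access).
# Assumes the mart offers a "chromosomal_region" filter.
genes_for_regions <- function(regions, mart) {
  biomaRt::getBM(
    attributes = c("hgnc_symbol", "chromosome_name", "start_position", "end_position"),
    filters    = "chromosomal_region",
    values     = regions,
    mart       = mart
  )
}

# Made-up coordinates for illustration only
example <- data.frame(chr   = c("1", "17"),
                      start = c(1000000, 7577000),
                      end   = c(1000000, 7577000))
regions <- make_regions(example)
print(regions)
```

With the mart defined earlier, the call would be genes_for_regions(make_regions(positions), mart). The result lists the overlapping genes but does not say which input region matched which gene, so compare start_position and end_position against your positions if you need that mapping.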