Example analysis using the GSE35626 public dataset
The GSE35626 dataset is a TCRα repertoire analysis of CD8+ cells, performed in duplicate.
Preparation
This example was run on an Ubuntu Precise LTS virtual machine in the Amazon Elastic Compute Cloud.
Software installation
Since this is run in a virtual machine that can be erased after use, some steps of the installation are not recommended on local systems. See the README for more precise installation instructions.
Update the package repository to Quantal.
sudo sed -i 's/precise/quantal/g' /etc/apt/sources.list
sudo apt-get update
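The substitution above rewrites every occurrence of the release name, including suffixed suites such as precise-updates. As a minimal sketch, it can be tried on a scratch file before touching /etc/apt/sources.list. The helper name switch_release and the two sample lines are illustrative assumptions, not part of clonotypeR or of the original setup.

```shell
# Hypothetical helper (not part of clonotypeR): print FILE with every
# occurrence of the OLD release name replaced by NEW.
# Usage: switch_release OLD NEW FILE
switch_release() {
    sed "s/$1/$2/g" "$3"
}

# Try it on a scratch file rather than the real /etc/apt/sources.list.
# The two lines below are example content only.
scratch=$(mktemp)
printf '%s\n' \
    'deb http://archive.ubuntu.com/ubuntu precise main universe' \
    'deb http://archive.ubuntu.com/ubuntu precise-updates main universe' > "$scratch"

# Prints both lines with "precise" replaced by "quantal".
switch_release precise quantal "$scratch"
rm -f "$scratch"
```

Unlike the sed -i call above, this helper only prints the result and leaves the input file unchanged.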
Install the programs needed by clonotypeR and for other analysis steps.
sudo apt-get install bioperl bwa emboss r-base
Install git, download clonotypeR, build the R package and install it. Note that on this cloud instance, the temporary storage area for large files is in /mnt/.
sudo apt-get install git
sudo install -d /mnt/clonotyper -o ubuntu -g ubuntu
ln -s /mnt/clonotyper .
git clone git://clonotyper.branchable.com/ clonotyper/
R CMD build clonotyper/
sudo R CMD INSTALL clonotypeR_0.1.tar.gz
Install the package for the NCBI SRA toolkit, to convert the sequences downloaded from the Gene Expression Omnibus.
sudo apt-get install sra-toolkit
Data download
The files are on the order of gigabytes in size, so the download and conversion to FASTQ format will take roughly one hour.
mkdir clonotyper/GSE35626
cd clonotyper/GSE35626
wget ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP%2FSRP010%2FSRP010815/SRR407172/SRR407172.sra
wget ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP%2FSRP010%2FSRP010815/SRR407173/SRR407173.sra
fastq-dump SRR407172.sra
fastq-dump SRR407173.sra
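The two downloads differ only in the run accession, so the commands can be generated in a loop. The following is a sketch under that assumption: the base directory is copied from the URLs above, the helper name sra_url is hypothetical, and the same layout may not hold for other studies. It is a dry run that only prints the commands.

```shell
# Base directory copied from the two URLs used in this example.
base='ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP%2FSRP010%2FSRP010815'

# Hypothetical helper: print the download URL for one run accession,
# assuming the RUN/RUN.sra layout seen in the URLs above.
sra_url() {
    printf '%s/%s/%s.sra\n' "$base" "$1" "$1"
}

# Dry run: print the download and conversion commands instead of running them.
for run in SRR407172 SRR407173; do
    echo "wget $(sra_url "$run")"
    echo "fastq-dump $run.sra"
done
```

Removing the echo wrappers would run the commands for real. Check the printed output first, since the downloads are large.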
Data analysis
In shell
../scripts/clonotypeR detect SRR407172.fastq
../scripts/clonotypeR detect SRR407173.fastq
../scripts/clonotypeR extract SRR407172
../scripts/clonotypeR extract SRR407173
cat clonotypes/*tsv > clonotypes.tsv
R
This prepares a large (3.2 GB) table of clonotypes, and starts R.
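In the commands above, the detect step is given the FASTQ file name, while the extract step is given the same name without the .fastq extension. A minimal sketch of deriving one from the other is shown here. The helper name library_name is hypothetical, and the loops only print the commands in the same order as above.

```shell
# Hypothetical helper: strip the directory and the .fastq suffix to get
# the library name that the extract step is given above.
library_name() {
    basename "$1" .fastq
}

# Dry run over the two files of this example: print, do not execute.
for fq in SRR407172.fastq SRR407173.fastq; do
    echo "../scripts/clonotypeR detect $fq"
done
for fq in SRR407172.fastq SRR407173.fastq; do
    echo "../scripts/clonotypeR extract $(library_name "$fq")"
done
```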
In R
library(clonotypeR)
clonotypes <- read_clonotypes('clonotypes.tsv')
a <- clonotype_table(from=clonotypes, feat=c("V","J"))
colSums(a > 0)
# SRR407172 SRR407173
#      3653      3677
3,653 V–J pairs were detected in SRR407172.
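The expression colSums(a > 0) counts, for each column of the table, the rows whose count is greater than zero. To illustrate the same computation outside R, here is a sketch in shell and awk on a toy tab-separated table. The helper name count_detected, the row and column labels, and the counts are all made up for illustration and are not taken from GSE35626.

```shell
# Hypothetical helper: for a tab-separated table with a header row and a
# row-name column, print for each data column the number of rows whose
# value is greater than zero.
count_detected() {
    awk -F'\t' '
        NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; n = NF; next }
        { for (i = 2; i <= n; i++) if ($i > 0) hits[i]++ }
        END { for (i = 2; i <= n; i++) printf "%s\t%d\n", name[i], hits[i] }
    ' "$@"
}

# Toy table with made-up counts: libA has two non-zero rows, libB has one.
printf 'pair\tlibA\tlibB\npairX\t3\t0\npairY\t0\t0\npairZ\t1\t5\n' | count_detected
```

On the toy table this prints 2 for libA and 1 for libB, in the same way that the R output above reports one number per library.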