This example analysis assumes a Unix system (Linux, Mac OS, …)
The data provided on-line in example_data is a sub-sample of three sequence librairies (2,000 reads each) made on the 454 Titanium or the 454 junior platforms. The original libraries will be deposited in public databanks after publication in a peer-reviewed journal.
These example libraries are called
C, and are in FASTQ format,
with entries like the following (the sequence was truncated for the convenience
of the display).
@HKTLYLP01B0MTM gactGTCCATCTTCCTTTTATCGGACACTGAAGTATGGATATCAGAAGTGCAgggccttcccacgggaacg + IIIIIIIIIIIHHFF::::G>IIIGGGIIIIIIIIIGGIIIIIIFEBDCDC<//-5522------
Detection of V segments
Run the command
clonotyper detect A.fastq in the same directory as a copy of
The result is stored in a temporary directory called
will be created if it does not already exist.
clonotyper detect compares the sequences to the reference V segments using
BWA, and produces output like the following.
[bsw2_aln] read 2000 sequences/pairs (843395 bp)... [samopen] SAM header is present: 167 sequences. [main] Version: 0.6.2-r126 [main] CMD: bwa bwasw -t8 /usr/share/clonotypeR/references/V-C/index A.fastq [main] Real time: 1.099 sec; CPU: 8.225 sec
This indicates that 2,000 reads have been processed, representing 843,395 base
pairs in total. There were 167 reference V segments, and the version number of
0.6.2-r126. The whole process took less than 10 seconds.
Process the example libraries
C similarly with the commands
clonotyper detect B.fastq and
clonotyper detect C.fastq.
Extraction of CDR3 regions
Run the command
clonotyper extract A in the same directory as where you ran
clonotyper detect A.fastq. The result is a table stored in a directory
clonotypes, that will be created if it does not already exist.
The output is quite voluminous, and indicates which V / J combinations are being found, like on the following.
TRAV14-3 233 TRAJ61 0 TRAJ60 0 TRAJ59 0 TRAJ58 1 TRAJ57 39 TRAJ56 2 TRAJ55 0
The format of the table is explained in the manual
page of the function
read_clonotypes() of the
For each library (
C), one file is available in the
directory. With BWA
0.6.2-r126, the following numbers of clonotypes are found.
1072 clonotypes/A.tsv 924 clonotypes/B.tsv 689 clonotypes/C.tsv
The files need to be concatenated before analysis in
R, with the following command.
find clonotypes/ -name '*tsv' | xargs cat > clonotypes.tsv
Data analysis in R
R statistical analysis software with the command “
R”. The commands entered after this are executed by
Load the clonotypeR library:
Load the data in a R object called clonotypes:
clonotypes <- read_clonotypes('clonotypes.tsv')
summary(clonotypes) already provides useful information.
> summary(clonotypes) lib V J read A:1072 TRAV14-1 :944 TRAJ31 : 380 Length:2684 B: 924 TRAV14-2 :237 TRAJ23 : 270 Class :character C: 688 TRAV14-3 :251 TRAJ22 : 257 Mode :character TRAV14D-3/DV8 :242 TRAJ37 : 156 TRAV14N-1_14D-1:604 TRAJ34 : 141 TRAV14N-2_14D-2:235 TRAJ40 : 104 TRAV14N-3 :171 (Other):1376 dna qual pep unproductive Length:2684 Length:2684 Length:2684 Mode :logical Class :character Class :character Class :character FALSE:2130 Mode :character Mode :character Mode :character TRUE :554 NA's :0
Identify unique clonotypes, count their sequences in the libraries
C, and store the result as a table arbitrarly named
> abc <- clonotype_table(c('A','B','C')) > head(abc) A B C TRAV14-1 AAASSGSWQLI TRAJ22 1 0 0 TRAV14-1 AACNNRIF TRAJ31 1 0 0 TRAV14-1 AAGAKLT TRAJ39 3 0 0 TRAV14-1 AAGGSWQLI TRAJ22 1 0 0 TRAV14-1 AAGTNTGKLT TRAJ27 1 0 0 TRAV14-1 AAHDTNAYKVI TRAJ30 1 0 0 > summary(abc) A B C Min. : 0.0000 Min. : 0.0000 Min. : 0.0000 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.0000 Median : 0.0000 Median : 0.0000 Median : 0.0000 Mean : 0.7599 Mean : 0.6606 Mean : 0.5018 3rd Qu.: 1.0000 3rd Qu.: 1.0000 3rd Qu.: 0.0000 Max. :18.0000 Max. :22.0000 Max. :121.0000
The summary shows that the most frequent clonotype is in
R index vectors, we can see that its CDR3 sequence is AASDSNNRIF and that it was not found in the other libraries.
> abc[C == 121,] A B C TRAV14N-1_14D-1 AASDSNNRIF TRAJ31 0 0 121
clonotype_table function can also produce a count table for and combination of V, CDR3 or J segments.
> clonotype_table(c('A','B','C'), "V") A B C TRAV14-1 239 493 0 TRAV14-2 131 61 0 TRAV14-3 79 9 113 TRAV14D-3/DV8 140 50 4 TRAV14N-1_14D-1 78 24 388 TRAV14N-2_14D-2 81 61 49 TRAV14N-3 94 34 2 > head(clonotype_table(c('A','B','C'), c("V","J"))) A B C TRAV14-1 TRAJ11 1 1 0 TRAV14-1 TRAJ12 2 2 0 TRAV14-1 TRAJ13 2 1 0 TRAV14-1 TRAJ15 10 11 0 TRAV14-1 TRAJ16 4 3 0 TRAV14-1 TRAJ18 1 21 0