recently fixed bugs
In EMBOSS 6.6.0, sequence addresses with a dash and a colon cause are parsed as 'dbname:entry' instead of 'file:entry' when an appropriate directory name exists.
echo -e '>A\nAAAAAA' > test-1.fa cat test-1.fa # >A # AAAAAA seqret test-1.fa:A stdout # Read and write (return) sequences # >A # AAAAAA mkdir test seqret test-1.fa:A stdout # Read and write (return) sequences # Error: Query 'test-1.fa:A' query field '1.fa' not defined for datatype 'sequence' # Error: Unable to read sequence 'test-1.fa:A' # Died: seqret terminated: Bad value for '-sequence' and no prompt
See also http://lists.open-bio.org/pipermail/emboss/2014-June/008983.html
Done with the following workaround: the directory references/V
was renamed references/V-C
in 125524131bd57c084ee39374054b73784e6ae929.
> yassai_identifier(c(V='TRAV2', J='TRAJ61', dna='ATTGTGACTGACACAGGTACAGAATTAATAGGAAATTGGCA', pep='IVTDTGTELIGNW')) [1] "tg.integer(0)A2A61L13"
Done in 2b8514d18f312f627255b0f2eb0ec1bf4ffa48dd, by return an empty chain instead of integer(0)
when receiving an empty chain.
Result after correction:
> yassai_identifier(c(V='TRAV2', J='TRAJ61', dna='ATTGTGACTGACACAGGTACAGAATTAATAGGAAATTGGCA', pep='IVTDTGTELIGNW')) [1] "tg.A2A61L13"
Problem
yassai_identifier(c(V="TRAV14N-3", J="TRAJ16", dna="GCAACTTCAAGTGGCCAGAAGCTGGTT", pep="ATSSGQKLV")) # takes ages yassai_identifier(c(V="TRAV14N-3", J="TRAJ16", dna="GCAGCAACCGCAACTTCAAGTGGCCAGAAGCTGGTT", pep="AATATSSGQKLV")) ## [1] "aTa.2A14N3A16L12" # returns immediately.
Troubleshooting
The problematic clonotype comes from read GWYEMNS10GHZUF
.
@GWYEMNS10GHZUF gactAACTGGTACACAGCAGGTTCTGGGTTCTGGATGTGCAGGTACACCTTTAATATGGTCCCCTGGCCAAAAACCAGCTTCTGGCCACTTGAAGTTGCACAGAAGTAGGTGGCTGAGTCTCCAGGCTGAGAGTCTTTGATGTGCAAGGAGAGATTTTTCTCCCTTTTATTGAAGAAGATTGTGAATCGTCCATCTTCCTTTTTATCGGACACTGAACGTATGGCTATCAGGAGAGCAGGGCCTTCCCCAGGGAACTGCTGGTACCATGGGAAGTAGTCAAAAGCACTGTTCTCAtaactagcagttcagaattgcggtctctccttccccagactgtcagagattggagtcgtgggagacaaggcacacaggggataggngngnnnnnnnnnnnnn + FFF::;;FFFFFFFFIIIIHFFFHF666=<FFGGDDDFFHFDIIIIIIIHHHIIIHFFFD54449662200001335>>AAABDFDDDFDDDFFFFFFFFFFFFFFFFFCCCCCFFFFFFFFFFFFDDBBBBB==110?@@@@@>4455BBA??3357;F4:::;;==D88?AA<;:==BAAAA==??AAB@AAA=>>>222//02A8898BDDFFFDDFFFFFF666:DBAAAABB:::???DD;;;;D?????DDBA?<<<><<<<000444<8993222233393...:663322<9233776899:;;963326755551111,,,,3313335333799.....23322666726;;;66772222,,,,47400!,!,!!!!!!!!!!!!!
Here is the alignment made by clonotypeR
GWYEMNS10GHZUF 16 TRAV14N-3 105 17 49S17M1I29M1I201M99S * 0 0 NNNNNNNNNNNNNCNCNCCTATCCCCTGTGTGCCTTGTCTCCCACGACTCCAATCTCTGACAGTCTGGGGAAGGAGAGACCGCAATTCTGAACTGCTAGTTATGAGAACAGTGCTTTTGACTACTTCCCATGGTACCAGCAGTTCCCTGGGGAAGGCCCTGCTCTCCTGATAGCCATACGTTCAGTGTCCGATAAAAAGGAAGATGGACGATTCACAATCTTCTTCAATAAAAGGGAGAAAAATCTCTCCTTGCACATCAAAGACTCTCAGCCTGGAGACTCAGCCACCTACTTCTGTGCAACTTCAAGTGGCCAGAAGCTGGTTTTTGGCCAGGGGACCATATTAAAGGTGTACCTGCACATCCAGAACCCAGAACCTGCTGTGTACCAGTTAGTC !!!!!!!!!!!!!,!,!00474,,,,22227766;;;62766622332.....9973335333133,,,,111155557623369;;:9986773329<223366:...3933322223998<444000<<<<><<<?ABDD?????D;;;;DD???:::BBAAAABD:666FFFFFFDDFFFDDB8988A20//222>>>=AAA@BAA??==AAAAB==:;<AA?88D==;;:::4F;7533??ABB5544>@@@@@?011==BBBBBDDFFFFFFFFFFFFCCCCCFFFFFFFFFFFFFFFFFDDDFDDDFDBAAA>>53310000226694445DFFFHIIIHHHIIIIIIIDFHFFDDDGGFF<=666FHFFFHIIIIFFFFFFFF;;::FFF AS:i:233 XS:i:218 XF:i:3 XE:i:4 XN:i:0
The problem was that the V segment was completely digested in favor of the J segment.
351 400 GWYEMNS10GHZUF CTGTGCAACTTCAAGTGGCCAGAAGCTGGTTTTTGGCCAGGGGACCATAT TRAJ16 ~~~~GCAACTTCAAGTGGCCAGAAGCTGGTTTTTGGCCAGGGGACCATAT TRAV14N-3 CTGTGCAGC...AAGTG~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 401 450 GWYEMNS10GHZUF TAAAGGTGTACCTGCACATCCAGAACCCAGAACCTGCTGTGTACCAGTTA TRAJ16 TAAAGGTGTACCTGC~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ TRAV14N-3 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This triggered an infinite loop when calling is.germline()
.
Done in 42ab98729c5c77462b5929b00ee973074ae4c1c7.
Now the function should return as in the following example.
yassai_identifier(c(V="TRAV14N-3", J="TRAJ16", dna="GCAACTTCAAGTGGCCAGAAGCTGGTT", pep="ATSSGQKLV")) ## [1] "a.A14N3A16L9"
Is it a bug in the software or a bug in the concept ?
> rownames(dd)[grep('aAn.1A14-1A43L9', dd.yassai)] [1] "TRAV14-1 TRAJ43 GCAGCAGCTAACAACAATGCCCCACGA AAANNNAPR" [2] "TRAV14-1 TRAJ43 GCAGCTAATAACAACAATGCCCCACGA AANNNNAPR" > V_after_C['TRAV14-1',] [1] "GCAGCAAGTG" > J_before_FGxG['TRAJ43',] [1] "GCAATAACAACAATGCCCCACGA"
aAn.1A14-1A43L9 A A A N N N A P R GCA GCA GCT AAC AAC AAT GCC CCA CGA GCA GCA agt g gc aaT AAC AAC AAT GCC CCA CGA
aAn.1A14-1A43L9 A A N N N N A P R GCA GCT AAT AAC AAC AAT GCC CCA CGA GCA GCa agt g gc AAT AAC AAC AAT GCC CCA CGA
This is a collision (different colonotypes giving the same Yassai ID) rather than a bug in the implementation → done.
> ?yassai.nomenclature > clonotype <- c("C", "TRBV2", "TRBJ1-2", "GINGMWQ04IYGTQ", + "GCCAGCAGCCAAGAGACCAGGGGAGGCTCCGACTACACC", + "IIIIIIIIIIIIIIIIIIIBBBBIFFI>>>IIIIIIIHH", "ASSQETRGGSDYT") > yassai.nomenclature(clonotype) [1] "qETRGg.2263TRBV2TRBJ1-2L13"
This should have been qETRGg.2263B2B1-21L13
.