recently fixed bugs

Does not work with EMBOSS 6.6.0.

In EMBOSS 6.6.0, sequence addresses with a dash and a colon cause are parsed as 'dbname:entry' instead of 'file:entry' when an appropriate directory name exists.

echo -e '>A\nAAAAAA' > test-1.fa
cat test-1.fa 
# >A
# AAAAAA
seqret test-1.fa:A stdout
# Read and write (return) sequences
# >A
# AAAAAA
mkdir test
seqret test-1.fa:A stdout
# Read and write (return) sequences
# Error: Query 'test-1.fa:A' query field '1.fa' not defined for datatype 'sequence'
# Error: Unable to read sequence 'test-1.fa:A'
# Died: seqret terminated: Bad value for '-sequence' and no prompt

See also http://lists.open-bio.org/pipermail/emboss/2014-June/008983.html

Done with the following workaround: the directory references/V was renamed references/V-C in 125524131bd57c084ee39374054b73784e6ae929.

Posted
Incorrect identifiers when V-J boundary is a codon boundary
> yassai_identifier(c(V='TRAV2', J='TRAJ61', dna='ATTGTGACTGACACAGGTACAGAATTAATAGGAAATTGGCA', pep='IVTDTGTELIGNW'))
[1] "tg.integer(0)A2A61L13"

Done in 2b8514d18f312f627255b0f2eb0ec1bf4ffa48dd, by return an empty chain instead of integer(0) when receiving an empty chain.

Result after correction:

> yassai_identifier(c(V='TRAV2', J='TRAJ61', dna='ATTGTGACTGACACAGGTACAGAATTAATAGGAAATTGGCA', pep='IVTDTGTELIGNW'))
[1] "tg.A2A61L13"
Posted
Clonotype that takes ages for calculating the Yassai identifier.

Problem

yassai_identifier(c(V="TRAV14N-3", J="TRAJ16", dna="GCAACTTCAAGTGGCCAGAAGCTGGTT", pep="ATSSGQKLV"))
# takes ages

yassai_identifier(c(V="TRAV14N-3", J="TRAJ16", dna="GCAGCAACCGCAACTTCAAGTGGCCAGAAGCTGGTT", pep="AATATSSGQKLV"))
## [1] "aTa.2A14N3A16L12"
# returns immediately.

Troubleshooting

The problematic clonotype comes from read GWYEMNS10GHZUF.

@GWYEMNS10GHZUF
gactAACTGGTACACAGCAGGTTCTGGGTTCTGGATGTGCAGGTACACCTTTAATATGGTCCCCTGGCCAAAAACCAGCTTCTGGCCACTTGAAGTTGCACAGAAGTAGGTGGCTGAGTCTCCAGGCTGAGAGTCTTTGATGTGCAAGGAGAGATTTTTCTCCCTTTTATTGAAGAAGATTGTGAATCGTCCATCTTCCTTTTTATCGGACACTGAACGTATGGCTATCAGGAGAGCAGGGCCTTCCCCAGGGAACTGCTGGTACCATGGGAAGTAGTCAAAAGCACTGTTCTCAtaactagcagttcagaattgcggtctctccttccccagactgtcagagattggagtcgtgggagacaaggcacacaggggataggngngnnnnnnnnnnnnn
+
FFF::;;FFFFFFFFIIIIHFFFHF666=<FFGGDDDFFHFDIIIIIIIHHHIIIHFFFD54449662200001335>>AAABDFDDDFDDDFFFFFFFFFFFFFFFFFCCCCCFFFFFFFFFFFFDDBBBBB==110?@@@@@>4455BBA??3357;F4:::;;==D88?AA<;:==BAAAA==??AAB@AAA=>>>222//02A8898BDDFFFDDFFFFFF666:DBAAAABB:::???DD;;;;D?????DDBA?<<<><<<<000444<8993222233393...:663322<9233776899:;;963326755551111,,,,3313335333799.....23322666726;;;66772222,,,,47400!,!,!!!!!!!!!!!!!

Here is the alignment made by clonotypeR

GWYEMNS10GHZUF  16  TRAV14N-3   105 17  49S17M1I29M1I201M99S    *   0   0   NNNNNNNNNNNNNCNCNCCTATCCCCTGTGTGCCTTGTCTCCCACGACTCCAATCTCTGACAGTCTGGGGAAGGAGAGACCGCAATTCTGAACTGCTAGTTATGAGAACAGTGCTTTTGACTACTTCCCATGGTACCAGCAGTTCCCTGGGGAAGGCCCTGCTCTCCTGATAGCCATACGTTCAGTGTCCGATAAAAAGGAAGATGGACGATTCACAATCTTCTTCAATAAAAGGGAGAAAAATCTCTCCTTGCACATCAAAGACTCTCAGCCTGGAGACTCAGCCACCTACTTCTGTGCAACTTCAAGTGGCCAGAAGCTGGTTTTTGGCCAGGGGACCATATTAAAGGTGTACCTGCACATCCAGAACCCAGAACCTGCTGTGTACCAGTTAGTC   !!!!!!!!!!!!!,!,!00474,,,,22227766;;;62766622332.....9973335333133,,,,111155557623369;;:9986773329<223366:...3933322223998<444000<<<<><<<?ABDD?????D;;;;DD???:::BBAAAABD:666FFFFFFDDFFFDDB8988A20//222>>>=AAA@BAA??==AAAAB==:;<AA?88D==;;:::4F;7533??ABB5544>@@@@@?011==BBBBBDDFFFFFFFFFFFFCCCCCFFFFFFFFFFFFFFFFFDDDFDDDFDBAAA>>53310000226694445DFFFHIIIHHHIIIIIIIDFHFFDDDGGFF<=666FHFFFHIIIIFFFFFFFF;;::FFF AS:i:233    XS:i:218    XF:i:3  XE:i:4  XN:i:0

The problem was that the V segment was completely digested in favor of the J segment.

           351                                            400
GWYEMNS10GHZUF CTGTGCAACTTCAAGTGGCCAGAAGCTGGTTTTTGGCCAGGGGACCATAT
TRAJ16         ~~~~GCAACTTCAAGTGGCCAGAAGCTGGTTTTTGGCCAGGGGACCATAT
TRAV14N-3      CTGTGCAGC...AAGTG~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

           401                                            450
GWYEMNS10GHZUF TAAAGGTGTACCTGCACATCCAGAACCCAGAACCTGCTGTGTACCAGTTA
TRAJ16         TAAAGGTGTACCTGC~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TRAV14N-3      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This triggered an infinite loop when calling is.germline().

Done in 42ab98729c5c77462b5929b00ee973074ae4c1c7.

Now the function should return as in the following example.

yassai_identifier(c(V="TRAV14N-3", J="TRAJ16", dna="GCAACTTCAAGTGGCCAGAAGCTGGTT", pep="ATSSGQKLV"))
## [1] "a.A14N3A16L9"
Posted
Different clonotypes give the same Yassai identifier.

Is it a bug in the software or a bug in the concept ?

> rownames(dd)[grep('aAn.1A14-1A43L9', dd.yassai)]
[1] "TRAV14-1 TRAJ43 GCAGCAGCTAACAACAATGCCCCACGA AAANNNAPR"
[2] "TRAV14-1 TRAJ43 GCAGCTAATAACAACAATGCCCCACGA AANNNNAPR"
> V_after_C['TRAV14-1',]
[1] "GCAGCAAGTG"
> J_before_FGxG['TRAJ43',]
[1] "GCAATAACAACAATGCCCCACGA"
aAn.1A14-1A43L9
 A   A   A   N   N   N   A   P   R
GCA GCA GCT AAC AAC AAT GCC CCA CGA
GCA GCA agt g
     gc aaT AAC AAC AAT GCC CCA CGA
aAn.1A14-1A43L9
 A   A   N   N   N   N   A   P   R
GCA GCT AAT AAC AAC AAT GCC CCA CGA
GCA GCa agt g
     gc AAT AAC AAC AAT GCC CCA CGA

This is a collision (different colonotypes giving the same Yassai ID) rather than a bug in the implementation → done.

Posted
yassai.nomenclature returns incorrect values.
> ?yassai.nomenclature
> clonotype <- c("C", "TRBV2", "TRBJ1-2", "GINGMWQ04IYGTQ",
+        "GCCAGCAGCCAAGAGACCAGGGGAGGCTCCGACTACACC",
+        "IIIIIIIIIIIIIIIIIIIBBBBIFFI>>>IIIIIIIHH", "ASSQETRGGSDYT")
>      yassai.nomenclature(clonotype)
[1] "qETRGg.2263TRBV2TRBJ1-2L13"

This should have been qETRGg.2263B2B1-21L13.

Done in 7cf11805d84ff8dc6a537c17e73efa275077915d.

Posted