Below is a small bit of Python code showing one naive but manageable way of doing this (N.B., I expect `grep` to be faster, though getting it to output the header will be annoying):

```python
import pysam

bam = pysam.AlignmentFile("some input file.bam")
# "wb" writes BAM; a plain "w" would write SAM text into the .bam file
obam = pysam.AlignmentFile("some output file.bam", "wb", template=bam)
# ... iterate over the reads in bam and write the ones whose name matches to obam
```

Now that'll work, but if you have a lot of names to look through then it'll end up being quite slow (as will `grep -f`). If you need to select a large list of reads by name, then a more efficient strategy is:

1. Name sort the BAM file (`samtools sort -n` or picard), which can be done with multiple threads.
2. Sort the list of read names to match (this will be more complicated than one would naively think, since samtools and picard will name sort differently, so be sure to put some thought into this).
3. Perform the same sort of iteration as above, but only compare to the top read name in your list from step 2 (removing that element from the stack after a match).

Most of the solutions posted thus far are either slow, or can yield spurious results in certain cases. (I haven't tested Picard, which I assume will work as intended. But personally I tend to recoil when I have to set up a JVM!)

Let's approach the problem from a different angle: good hash functions have ~O(1) complexity for a lookup. If we can create a hash table of our QNAMEs, then searching a BAM file should be O(n) in the number of records.
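The hash-table idea can be sketched roughly as follows. This is a minimal illustration, not a tested tool: it assumes the wanted QNAMEs sit one per line in a text file, and the helper names `load_qnames`, `select_reads` and `filter_bam` are made up for this example. A Python `set` plays the role of the hash table, so each record costs one average-case O(1) membership test.

```python
from typing import Iterable, Iterator, Set


def load_qnames(path: str) -> Set[str]:
    """Read one QNAME per line into a set (hypothetical input format)."""
    with open(path) as fh:
        return {line.strip() for line in fh if line.strip()}


def select_reads(records: Iterable, wanted: Set[str]) -> Iterator:
    """Yield every record whose query_name is in the wanted set.

    Works on any iterable of objects with a .query_name attribute,
    such as the reads produced by pysam's AlignmentFile.fetch().
    """
    for rec in records:
        if rec.query_name in wanted:  # hash lookup, O(1) on average
            yield rec


def filter_bam(in_path: str, out_path: str, names_path: str) -> None:
    """Copy the reads named in names_path from in_path to out_path."""
    import pysam  # imported here so the helpers above work without pysam

    wanted = load_qnames(names_path)
    with pysam.AlignmentFile(in_path, "rb") as bam, \
            pysam.AlignmentFile(out_path, "wb", template=bam) as obam:
        # until_eof=True walks the whole file and does not need an index
        for rec in select_reads(bam.fetch(until_eof=True), wanted):
            obam.write(rec)
```

Something like `filter_bam("some input file.bam", "some output file.bam", "names.txt")` would then do the whole job in a single pass, with no sorting needed. The set has to fit in memory, which should be fine for most name lists.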
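For comparison, the name-sorted strategy from the first answer might look something like the sketch below. It assumes that both the records and the name list are ordered by plain string comparison, which is an assumption you would have to check against whatever tool did the sorting (as noted, samtools and picard order names differently). The function name `select_sorted` is hypothetical. One deliberate deviation: a name is only dropped from the front of the list once a larger record name is seen, rather than right after the first match, so that mates sharing a QNAME are all kept.

```python
from collections import deque
from typing import Iterable, Iterator, Sequence


def select_sorted(records: Iterable, wanted_sorted: Sequence[str]) -> Iterator:
    """Yield records whose query_name appears in wanted_sorted.

    Assumes records and wanted_sorted are sorted with the SAME ordering
    (plain string comparison here; adjust for your sorter).
    """
    pending = deque(wanted_sorted)
    for rec in records:
        name = rec.query_name
        # Drop wanted names that sort before the current record:
        # they can no longer appear later in the file.
        while pending and pending[0] < name:
            pending.popleft()
        if not pending:
            break  # nothing left to look for, stop reading early
        if pending[0] == name:
            yield rec
```

Each record is compared only against the front of the list, so the pass is linear, and it can stop as soon as the list is exhausted. The cost is the up-front name sort of the BAM file.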