Answers and mysteries from DNA sequences
Before postdoctoral fellow Michel Valim returned to Brazil several weeks ago, there was one of those days that illustrated to me, once again, why molecular data are so special. They provide a great many answers and yet can still harbor unsolved mysteries. The answers came from Josh Engel, who has been working hard gathering data on genetic structure in birds of the Albertine Rift for our MacArthur Foundation grant. From our own work and that of colleagues, we now have some really exciting data sets on genetic structure in birds and small mammals of the Albertine Rift based on tissue samples collected over the last 20 years. A part of the project is to get data from toepads of historical samples taken from study skins. That is always harder, and the samples Josh has been working with from Mountain Yellow Warblers (Chloropeta similis) and African Hill Babblers (Pseudoalcippe abyssinica) have been giving him some trouble. He had done everything right in terms of trying to avoid any possibility of contamination when he extracted DNA from these samples. Contamination means you have sequenced DNA, but it is not the DNA you targeted; most likely it is DNA from something you sequenced previously that has somehow gotten into the chemicals you are using.
Even when sequencing succeeds and you have a DNA sequence, you still worry that it might not be what you think it is, meaning contamination. You also worry about whether the sequences are good enough to detect differences, which are going to be few in population studies (then again, if your sequences are identical to sequences from tissue, you worry again about contamination). That day, he wanted to show me something. He had gotten toepad samples from some of the Hill Babblers to work. So, were they good sequences? They were, but he was not sure until he ran an analysis to see how they related to the fresh tissue samples for which he already had sequences. That is what Josh wanted to show me. Some of the sequences from toepads were identical to those from modern samples, but in a way that agreed with geography. By that I mean toepad sequences from the Rwenzori Mountains were identical to sequences from modern individuals from the Rwenzoris, and these are different from sequences from other sites. This makes sense and is independent evidence that these sequences are from the toepads and not something else.
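The geography check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis Josh actually ran: the sample names, the short sequences, and the function names are all invented, and real sequences would be far longer and aligned beforehand. The idea is simply to ask, for each toepad sequence, whether an identical modern sequence exists and whether it comes from the same locality.

```python
from collections import defaultdict

# Hypothetical toy data: (sample_id, source, locality, aligned sequence).
# These records are invented for illustration only.
samples = [
    ("modern_1", "tissue", "Rwenzori", "ACGTTGCAAC"),
    ("modern_2", "tissue", "Rwenzori", "ACGTTGCAAC"),
    ("modern_3", "tissue", "Kahuzi-Biega", "ACGTCGCAAT"),
    ("toepad_1", "toepad", "Rwenzori", "ACGTTGCAAC"),
    ("toepad_2", "toepad", "Kahuzi-Biega", "ACGTTGCAAC"),
    ("toepad_3", "toepad", "Rwenzori", "ACGTAGCAAG"),
]


def localities_by_sequence(records, source):
    """Map each distinct sequence from `source` to the localities where it occurs."""
    found = defaultdict(set)
    for _sample_id, src, locality, seq in records:
        if src == source:
            found[seq].add(locality)
    return found


def check_toepads(records):
    """Classify each toepad sample by how it relates to the modern sequences."""
    modern = localities_by_sequence(records, "tissue")
    report = {}
    for sample_id, src, locality, seq in records:
        if src != "toepad":
            continue
        if seq not in modern:
            report[sample_id] = "no identical modern sequence"
        elif locality in modern[seq]:
            report[sample_id] = "identical to modern sequence, same locality"
        else:
            report[sample_id] = "identical to modern sequence, different locality"
    return report


for sample_id, verdict in check_toepads(samples).items():
    print(sample_id, "->", verdict)
```

In this toy data set, a toepad sequence that is identical to a modern one from the same mountains is the reassuring case, while one that is identical to a modern sequence from somewhere else is the kind of result that would make you think about contamination again.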
The analyses were even more exciting because several of the toepad sequences grouped with a single sequence from a modern bird that we had been wondering about. This sequence comes from a bird from Kahuzi-Biega National Park in the Democratic Republic of Congo, but it was notably different from other sequences from the same area. When you find a single sequence like that, your first thought is that maybe something is wrong with the sequence, but we could not find anything. It was a Hill Babbler, just one that was genetically a little different from others in the same region (and in all other regions of the Albertine Rift). Now we had some other sequences, 100 years older, that clustered with it. These samples were from birds collected at a site in the next set of mountains to the south of Kahuzi-Biega. So we have some more work to do and more samples to study, but Josh is feeling pretty good about his ability to get DNA from toepads, and we may have found something interesting about the evolutionary history of Hill Babblers in this part of the Albertine Rift.
The mystery of molecules came later in the afternoon when I stuck my head into Jason and Michel’s office to hear whether Michel’s new sequences of feather lice were good. They were looking at the computer screen and editing sequences. The results looked good for all but one individual. It was not that the sequence they obtained for this individual was unreadable; it was just readily apparent, when they compared it to other louse sequences, that it was far too different from all of them to be from a feather louse. So what was it? The first thing you can try when you get a sequence like this is to go to the GenBank web site and use a program called BLAST to see if your sequence matches something in GenBank, the massive and incredible database of everything that has been sequenced and submitted.
Jason and Michel had BLASTed their sequence and it had not matched anything in GenBank. So there is the mystery: a good sequence that they know is not from the thing whose DNA they intended to isolate and study, and that does not match anything in GenBank. It is a mystery not likely to be solved anytime soon, but we have the sequence. There is still much more undocumented DNA sequence across the world’s biodiversity than there is DNA sequence we know something about.
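The comparison that first flagged the odd louse sequence, before any BLAST search, can be sketched as follows. This is not BLAST and not what Jason and Michel's software does; it is a minimal illustration that assumes the sequences are already aligned to the same length, uses the simple proportion of differing sites (often called p-distance), and picks an arbitrary threshold. The sequences and names are invented.

```python
def p_distance(a, b):
    """Proportion of sites that differ between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


def flag_outliers(seqs, threshold):
    """Return names whose mean distance to all other sequences exceeds `threshold`."""
    flagged = []
    for name, seq in seqs.items():
        dists = [p_distance(seq, other) for n, other in seqs.items() if n != name]
        if dists and sum(dists) / len(dists) > threshold:
            flagged.append(name)
    return flagged


# Hypothetical toy alignment: three similar sequences and one that is very different.
louse_seqs = {
    "louse_A": "ACGTACGTACGTACGTACGT",
    "louse_B": "ACGTACGTACGTACGAACGT",
    "louse_C": "ACGTACGAACGTACGTACGT",
    "unknown": "TTGCAAGCTTGACCGTAGCA",
}

# The 0.4 threshold is an arbitrary choice for this toy example.
print(flag_outliers(louse_seqs, threshold=0.4))
```

With these invented sequences, only the one labeled "unknown" sits far from all the others, which is the situation described above: a perfectly readable sequence that simply does not belong with the rest.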