This morning, I am pushing the Encyclopedia of mtDNA Origins out into the world. Each of the requirements for it is complete. What remains is quality assurance work and fleshing out the basic background text.
From my first post the user stories were: As a user with mtDNA results I would like…
- Breadcrumb links showing the path from the RSRS founder lineage to my end branch on the maternal tree.
- A sidebar with links to other areas of interest.
- The name of my end branch on the tree.
- Background information on my branch.
- A summary of important facts about my branch: Age, Origin, Defining mutations, Parent, and Children.
- To know how the named branch is defined in the current Phylotree build and has been in past ones.
- Maternal Origin information from Geno 2.0 tested samples.
- Results and origin information from GenBank samples.
- Journal Articles that mention the branch.
- Additional resources for understanding my results and taking part in new discovery.
Everything is complete. Now it is time to go over each page, validate it, and address problems.
Checking the Phylotree Log
Each set of Phylotree branches linked to an mtDNA Story needs to be checked for conflicts. This means catching where major renaming took place. I have already done a first-pass quality assurance check and identified potential problems. They are shown in the table below. Fixing them is a matter of looking at the branch mutations (variants) for each build and comparing them to the Phylotree Build 17 mutations.
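As a rough illustration of that comparison, here is a minimal Python sketch. It assumes each branch's defining variants are already available as one list per build; the function name, the build labels, and the variant names in the example are all made up for illustration and are not real Phylotree data.

```python
def compare_branch_definitions(build_variants, reference_build="17"):
    """Compare each build's variant list against the reference build.

    build_variants maps a build label to the list of variants that
    define the branch in that build. Returns only the builds that
    differ from the reference, with the variants they lack or add.
    """
    reference = set(build_variants[reference_build])
    report = {}
    for build, variants in build_variants.items():
        if build == reference_build:
            continue
        current = set(variants)
        missing = reference - current   # in the reference build, not in this one
        extra = current - reference     # in this build, not in the reference
        if missing or extra:
            report[build] = {"missing": sorted(missing), "extra": sorted(extra)}
    return report


# Hypothetical variant lists, for illustration only.
example = {
    "15": ["A100G", "C200T"],
    "16": ["A100G", "C200T", "G300A"],
    "17": ["A100G", "C200T", "G300A"],
}

conflicts = compare_branch_definitions(example)
print(conflicts)
```

Any build that shows up in the report is a candidate for a rename or redefinition and still needs a manual look.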
Verifying Branch Ages
Once I have confirmed the correct ordering of Phylotree builds and branches, I need to verify that I have pulled the right date information from Behar et al., 2012b. The paper was written using a draft version of Phylotree. The draft seems to have been mostly equivalent to the standard Build 12. However, it included branches that were not officially named until Phylotree Build 16. Thus, any branch where Phylotree Build 12 and Build 17 are not in agreement needs to be checked.
Converting GenBank Samples to Phylotree Build 17
To do this, I will need to format files to run them through the online version of MitoTool.
Converting Geno 2 Samples to Phylotree Build 17
The process for updating Geno 2 samples is essentially the same as the one for GenBank samples.
This, I am sad to say, is going to be rather tedious. I need to check each mtDNA Story page and confirm that the parent branch shown is the right one. Potential problems arise where branches are linked by a ' branch, that is, a joining branch with an apostrophe in its name. An example is that H5'36 is the parent of H5 rather than H.
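Part of this check could be automated. The sketch below assumes two lookups that I would have to build first: one mapping each branch to its parent in Phylotree Build 17, and one mapping each page to the parent it currently displays. Both dictionary names and the function name are hypothetical.

```python
def find_parent_mismatches(tree_parents, page_parents):
    """List pages whose displayed parent differs from the tree's parent.

    tree_parents: branch -> parent according to the reference tree.
    page_parents: branch -> parent currently shown on the page.
    Returns (branch, shown_parent, expected_parent) tuples.
    """
    mismatches = []
    for branch, shown in page_parents.items():
        expected = tree_parents.get(branch)
        # Branches missing from the tree lookup are skipped here;
        # they need a separate manual check.
        if expected is not None and shown != expected:
            mismatches.append((branch, shown, expected))
    return mismatches


# The H5 example from above: the page wrongly shows H as the parent.
tree_parents = {"H5": "H5'36"}
page_parents = {"H5": "H"}

problems = find_parent_mismatches(tree_parents, page_parents)
print(problems)
```

This only narrows down which pages to open; the fix on each page is still done by hand.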
Expanding mtDNA Stories
I have already expanded several dozen stories as I have tested. Once the other parts of validation are complete, I need to work on the remaining 5,100 or so. How much I can write depends on how many samples are available and the demographic information linked to them. I would like to be done with the first pass by January 1, 2017. That means that I need to complete 50 to 60 a day.
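The daily target is plain division, and a tiny Python sketch makes the range explicit. It takes the approximate figure of 5,100 remaining stories from above; the function name is just for illustration.

```python
import math


def days_needed(total_stories, stories_per_day):
    """Whole days required to finish at a steady daily pace."""
    return math.ceil(total_stories / stories_per_day)


remaining = 5100  # approximate count of stories left to expand

slow_pace_days = days_needed(remaining, 50)
fast_pace_days = days_needed(remaining, 60)
print(slow_pace_days, fast_pace_days)
```

At 50 a day the first pass takes 102 days, and at 60 a day it takes 85.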
Adding GenBank Samples
Every week new samples are added to GenBank. I need to catch up with the ones that have been added in the past few months and develop a routine for weekly updates.
Now for the most important part of all… I need your opinion. What do you think? What questions do you have? What else would you like to see on pages? Please leave your thoughts in the comment section below.
- Behar, D.M., van Oven, M., Rosset, S., Metspalu, M., Loogväli, E.L., Silva, N.M., Kivisild, T., Torroni, A. and Villems, R. (2012). A “Copernican” reassessment of the human mitochondrial DNA tree from its root. American Journal of Human Genetics, 90(4), 675-684.
- Fan, L., & Yao, Y. G. (2011). MitoTool: a web server for the analysis and retrieval of human mitochondrial DNA sequence variations. Mitochondrion, 11(2), 351-356.
- Fan, L., & Yao, Y. G. (2013). An update to MitoTool: using a new scoring system for faster mtDNA haplogroup determination. Mitochondrion, 13(4), 360-363.
Note: To the person who said I needed to finish something. You were right. I hate it when you are right and I am wrong. I am working on the rest.