

Rebooting the human mitochondrial phylogeny: an automated and scalable methodology with expert knowledge.

Affiliation

Departamento de Informática e Ingeniería de Sistemas, Universidad de Zaragoza, Zaragoza, Spain.

Publication information

BMC Bioinformatics. 2011 May 19;12:174. doi: 10.1186/1471-2105-12-174.

Abstract

BACKGROUND

Mitochondrial DNA is an ideal source of information for evolutionary and phylogenetic studies because of its extraordinary properties and abundance. Many insights can be gained from such studies, including but not limited to screening genetic variation to identify potentially deleterious mutations. However, these advances require efficient solutions to very difficult computational problems, a need hampered by the very abundance of data that gives the analysis its strength.

RESULTS

We develop a systematic, automated methodology to overcome these difficulties, building from readily available, public sequence databases to high-quality alignments and phylogenetic trees. Within each stage in an autonomous workflow, outputs are carefully evaluated and outlier detection rules defined to integrate expert knowledge and automated curation, hence avoiding the manual bottleneck found in past approaches to the problem. Using these techniques, we have performed exhaustive updates to the human mitochondrial phylogeny, illustrating the power and computational scalability of our approach, and we have conducted some initial analyses on the resulting phylogenies.

CONCLUSIONS

The problem at hand demands careful definition of inputs and adequate algorithmic treatment for its solutions to be realistic and useful. It is possible to define formal rules that address the former requirement by refining inputs directly and through their combination as outputs, and these outputs also help to ascertain the performance of the chosen algorithms. Rules can exploit known or inferred properties of datasets to simplify inputs through partitioning, thereby cutting computational costs and making it feasible to work on rapidly growing, otherwise intractable datasets. Although expert guidance may be necessary to assist the learning process, low-risk results can be fully automated and have proved convenient and valuable.
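The abstract does not specify the rules themselves, so the following is only an illustrative sketch of what a rule-based curation step followed by partitioning might look like. Everything in it is an assumption made for the example: the function names, the use of sequence length as the screened property, the modified z-score threshold of 3.5, and the group labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import median


def flag_length_outliers(lengths, threshold=3.5):
    """Return the ids whose sequence length is an outlier.

    Hypothetical rule: uses the modified z-score based on the median
    absolute deviation (MAD). The paper's actual rules are not given
    in the abstract.
    """
    values = list(lengths.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # All typical lengths are identical: flag anything that differs.
        return {sid for sid, n in lengths.items() if n != med}
    return {
        sid
        for sid, n in lengths.items()
        if 0.6745 * abs(n - med) / mad > threshold
    }


def partition_by_label(labels, excluded):
    """Group the retained sequence ids by a known or inferred label."""
    groups = defaultdict(list)
    for sid, label in labels.items():
        if sid not in excluded:
            groups[label].append(sid)
    return dict(groups)


# Toy data, invented for this example only.
lengths = {"a": 16569, "b": 16568, "c": 16570, "d": 16569, "e": 12000}
labels = {"a": "H", "b": "H", "c": "L3", "d": "U", "e": "H"}

outliers = flag_length_outliers(lengths)
partitions = partition_by_label(labels, outliers)
```

Each partition could then be aligned and analysed independently, which is the sense in which partitioning reduces the size of any single computation. Sequences flagged by a rule would be set aside for expert review rather than silently discarded.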


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e999/3123235/f09783e9990f/1471-2105-12-174-1.jpg
