Horvath Denis, Žoldák Gabriel
Center for Interdisciplinary Biosciences, Technology and Innovation Park, University of Pavol Jozef Šafárik, Jesenná 5, 041 01 Košice, Slovakia.
Entropy (Basel). 2020 Jun 23;22(6):701. doi: 10.3390/e22060701.
Recent advances in single-molecule science have revealed an astonishing number of details about the microscopic states of molecules, which in turn has created a need for simple, automated processing of large numbers of time series. In particular, large datasets of time series of single protein molecules have been obtained using laser optical tweezers. In this system, each molecular state has a separate time series whose composition is relatively uneven from the viewpoint of local descriptive statistics. In the past, the assessment of uncertain data quality and heterogeneous molecular states was biased by human experience. Because this data-processing knowledge is not directly transferable to a black-box framework for efficient classification, the rapid evaluation of a large number of simultaneously measured time-series samples may pose a serious obstacle. To solve this problem, we have implemented a supervised learning method that combines local entropic models with the global Lehmer average. We find that this combination is suitable for fast and simple categorization, which enables rapid pre-processing of the data with minimal optimization and user intervention.
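The abstract names two ingredients, local entropic models and the global Lehmer average, without giving their exact form. The sketch below is only an illustration of how such quantities could be computed, not the authors' implementation. It assumes the Lehmer mean in its standard form, L_p(x) = sum(x_i^p) / sum(x_i^(p-1)), and uses the Shannon entropy of a histogram over non-overlapping windows as a stand-in for the local entropic model. The function names, window size, bin count, and exponent p are all hypothetical choices.

```python
import math
from collections import Counter


def lehmer_mean(values, p):
    """Standard Lehmer mean: sum(x^p) / sum(x^(p-1)).

    Assumes non-negative values and p >= 1; returns 0.0 if the
    denominator vanishes (e.g. all values are zero).
    """
    num = sum(x ** p for x in values)
    den = sum(x ** (p - 1) for x in values)
    return num / den if den != 0 else 0.0


def window_entropy(window, n_bins, lo, hi):
    """Shannon entropy (in nats) of a fixed-range histogram of one window."""
    width = (hi - lo) / n_bins
    # Clamp the top edge so that x == hi falls into the last bin.
    counts = Counter(min(int((x - lo) / width), n_bins - 1) for x in window)
    n = len(window)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def series_feature(series, window_size=4, n_bins=4, p=2.0):
    """Hypothetical scalar feature: Lehmer mean of local window entropies."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return 0.0  # a constant series carries no local entropy
    entropies = [
        window_entropy(series[i:i + window_size], n_bins, lo, hi)
        for i in range(0, len(series) - window_size + 1, window_size)
    ]
    return lehmer_mean(entropies, p)
```

With p = 1 the Lehmer mean reduces to the arithmetic mean, and larger p weights high-entropy windows more strongly. A scalar feature of this kind could then be passed to any supervised classifier trained on labelled traces.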