Division of Surgical Oncology, Massachusetts General Hospital and Harvard Medical School, Boston, MA.
Department of General Surgery, Beijing Anzhen Hospital, Capital Medical University, Beijing, China.
Breast J. 2020 Jan;26(1):92-99. doi: 10.1111/tbj.13718. Epub 2019 Dec 18.
The medical literature has been growing exponentially, and its sheer size has become a barrier for physicians trying to locate and extract clinically useful information. Natural language processing (NLP), especially machine learning (ML)-based NLP, is a technology that offers a promising solution. ML-based NLP trains a computational algorithm on a large number of annotated examples so that the computer can "learn" and "predict" the meaning of human language. Although NLP has been widely applied in industry and business, most physicians are still unaware of its huge potential in medicine, and its implementation in breast cancer research and management remains fairly limited. Using a successful real-world project that identified penetrance papers for breast and other cancer susceptibility genes, this review illustrates how to train and evaluate an NLP-based medical abstract classifier, incorporate it into a semiautomatic meta-analysis procedure, and validate the effectiveness of that procedure. Other implementations of NLP in breast cancer research, such as parsing pathology reports and mining electronic health records, are also discussed. We hope this review will help breast cancer physicians and researchers recognize, understand, and apply this technology to meet their own clinical or research needs.
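To make the train-then-evaluate workflow concrete, here is a minimal sketch of an abstract classifier. It assumes a multinomial naive Bayes model over bag-of-words features with Laplace smoothing, a stand-in chosen for brevity and not necessarily the model used in the penetrance project. The names `tokenize`, `NaiveBayesAbstractClassifier`, and `evaluate`, as well as the toy "abstracts" and their labels (1 = penetrance paper, 0 = other), are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesAbstractClassifier:
    """Hypothetical multinomial naive Bayes classifier (illustrative only)."""

    def fit(self, abstracts, labels):
        # "Learn" from annotated examples: count words per class label.
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(abstracts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_posterior(self, tokens, c):
        # Log prior plus Laplace-smoothed log likelihood of each known token.
        score = math.log(self.doc_counts[c] / sum(self.doc_counts.values()))
        denom = self.total_words[c] + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[c][tok] + 1) / denom)
        return score

    def predict(self, text):
        # "Predict": return the class with the highest posterior score.
        tokens = tokenize(text)
        return max(self.classes, key=lambda c: self._log_posterior(tokens, c))


def evaluate(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical annotated training abstracts (1 = penetrance paper, 0 = other).
train = [
    ("Cumulative risk of breast cancer among BRCA1 mutation carriers was estimated", 1),
    ("Penetrance of PALB2 variants for breast cancer risk by age 70", 1),
    ("Age-specific cancer risk in carriers of CHEK2 mutations", 1),
    ("Surgical outcomes after mastectomy in a single institution", 0),
    ("Quality of life following radiotherapy for breast cancer", 0),
    ("Cost analysis of imaging in breast cancer follow-up", 0),
]
# Hypothetical held-out abstracts used for evaluation.
test = [
    ("Estimated cumulative risk of ovarian cancer in BRCA2 carriers", 1),
    ("Patient quality of life after surgical treatment", 0),
]

clf = NaiveBayesAbstractClassifier().fit([t for t, _ in train], [y for _, y in train])
predictions = [clf.predict(text) for text, _ in test]
print(predictions)  # [1, 0]
print(evaluate([y for _, y in test], predictions))
```

In a semiautomatic procedure, abstracts predicted as 1 would be forwarded to human reviewers; a realistic evaluation would use a held-out set far larger than this toy example.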