Ive Julia
School of Electronic Engineering and Computer Science, Queen Mary University of London, London, UK.
Front Digit Health. 2022 Oct 24;4:1010202. doi: 10.3389/fdgth.2022.1010202. eCollection 2022.
In today's world it seems fair to say that extensive digital data sharing is the price we pay for the technological advances achieved by AI systems analysing large quantities of data in a relatively short time. Where such AI is used in the realm of mental health, this data sharing poses additional challenges, not just because of the sensitive nature of the data itself but also because of the potential vulnerability of the data donors themselves should a cybersecurity data breach occur. To address the problem, the AI community proposes to use synthetic text that preserves only the salient properties of the original. Such text has the potential to fill gaps in textual data availability (e.g., rare conditions or under-represented groups) while reducing exposure. Our perspective piece aims to demystify the process of generating synthetic text, explain its algorithmic and ethical challenges, especially for the mental health domain, and outline the most promising ways of overcoming them. We aim to promote better understanding and, as a result, acceptability of synthetic text outside the research community.