Leonardo Christian J, Melcer Kevin, Liu Steven H, Komatsu David E, Barsi James M
Department of Orthopedic Surgery, Stony Brook University, Stony Brook, USA.
Cureus. 2024 Nov 27;16(11):e74574. doi: 10.7759/cureus.74574. eCollection 2024 Nov.
Background The generation of innovative research ideas is crucial to advancing the field of medicine. As physicians face increasingly demanding clinical schedules, it is important to identify tools that may expedite the research process. Artificial intelligence (AI) may offer a promising solution by enabling the efficient generation of novel research ideas. This study aimed to assess the feasibility of using AI to build upon existing knowledge by generating innovative research questions.

Methods A comparative evaluation study was conducted to assess the ability of AI models to generate novel research questions. The prompt "research ideas for adolescent idiopathic scoliosis" was input into ChatGPT 3.5, Gemini 1.5, Copilot, and Llama 3, each of which returned between 10 and 14 research questions. A keyword-friendly modified version of each AI-generated response was searched in the PubMed database, with results limited to English-language manuscripts published from 2000 to the present. Each response was then cross-referenced against the PubMed search results and assigned an originality score of 0-5 (0 being the most original and 5 the least original), adding one point for each paper already published on the topic. The mean originality score for each model was calculated manually by summing the originality scores of all its responses and dividing by the number of prompts it generated. The standard deviation of each model's originality scores was calculated using the STDEV function in Google Sheets (Google, Mountain View, California). Each AI was also evaluated on its percent novelty: the percentage of its generated responses that yielded an originality score of 0 when searched in PubMed.

Results Each AI produced a varying number of research prompts that were input into PubMed. The mean originality scores for ChatGPT, Gemini, Copilot, and Llama were 4.2 ± 1.9, 4.1 ± 1.3, 4.0 ± 1.6, and 3.8 ± 1.7, respectively. Of ChatGPT's 12 prompts, 16.67% were completely novel (no prior research had been published on the topic proposed by the model). Of Copilot's 10 prompts, 10.00% were completely novel, and of Llama's 12 prompts, 8.33% were completely novel. None of Gemini's 14 responses yielded an originality score of 0.

Conclusions Our findings demonstrate that ChatGPT, Llama, and Copilot are capable of generating novel ideas in orthopaedic research. As these models continue to evolve and become more refined over time, physicians and scientists should consider incorporating them when brainstorming and planning their research studies.
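The scoring pipeline described in the Methods (one point per previously published paper, capped at 5; mean ± STDEV per model; percent novelty as the share of score-0 responses) can be sketched in a few lines of Python. This is a minimal illustrative sketch, not the authors' code, and the score list shown is a made-up placeholder rather than the study's raw data.

```python
# Sketch of the abstract's summary statistics. Scores run 0-5:
# 0 = fully novel (no prior PubMed hits), 5 = least original.
from statistics import mean, stdev

def originality_score(prior_paper_count: int) -> int:
    """One point per paper already published on the topic, capped at 5."""
    return min(prior_paper_count, 5)

def summarize(scores: list[int]) -> tuple[float, float, float]:
    """Return (mean score, sample SD as in Sheets' STDEV, percent novelty)."""
    pct_novel = 100 * sum(s == 0 for s in scores) / len(scores)
    return mean(scores), stdev(scores), pct_novel

# Hypothetical per-prompt scores for a 12-prompt model (placeholder data):
demo_scores = [5, 5, 5, 5, 4, 5, 4, 3, 5, 0, 0, 5]
m, sd, novel = summarize(demo_scores)
```

Note that Google Sheets' STDEV is the sample (n-1) standard deviation, which matches Python's `statistics.stdev` rather than `statistics.pstdev`.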