Evaluation of the Application Potential and Challenges of Large Language Models in the Field of Laboratory Medicine

Fund Project: 2023 Sichuan Province Science and Technology Activities Program for Returned Overseas Scholars (川人社-202303-5)

Abstract:

Objective: To evaluate the performance of ChatGPT-4.0 and ERNIE Bot-4.0 in laboratory medicine, and to explore their application potential and the challenges they face in this professional domain. Methods: Past questions from the national clinical medical laboratory technology (intermediate level) examination served as the benchmark for comparing the two models' command of laboratory medicine knowledge and the consistency of their answers. Thirty laboratory medicine cases were used to assess the models' ability to interpret test results and assist diagnosis. Results: In the clinical medical laboratory technology test, both models passed the 60% qualification threshold. ChatGPT-4.0 was superior to ERNIE Bot-4.0 in answering speed and consistency, but its accuracy was markedly lower (73.25% vs 80.75%). ERNIE Bot-4.0's accuracy also exceeded the average accuracy of clinical laboratory personnel on this examination (78.03%). By question type, both models performed worst on experimental technique questions (ERNIE Bot-4.0: 66.32%; ChatGPT-4.0: 60.53%) and best on basic medical knowledge questions (both scoring 86.00%). In the case analysis test, ERNIE Bot-4.0 scored higher than ChatGPT-4.0 on every item. Both models performed well on routine cases but made errors on complex cases. Conclusion: Both large language models show some application potential in laboratory medicine. In a Chinese-language setting in particular, ERNIE Bot-4.0 significantly outperformed ChatGPT-4.0 in answering accuracy and case analysis ability, which indicates its relative advantage in domestic applications. However, both models still need improvement in experimental technique knowledge, in the analysis of complex cases, and in the accuracy and consistency of their output. At the current stage, directly applying such general-purpose large language models to the interpretation of clinical test results and to assisted diagnosis still carries risk; even so, these models open a new research direction for the interpretation of test reports.
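
The two exam metrics named in the abstract, accuracy against the 60% qualification threshold and answer consistency, can be illustrated with a minimal Python sketch. The abstract does not state how consistency was scored, so the definition used here (the share of questions answered identically across repeated runs) is an assumption, and the answer key and model answers are hypothetical toy data rather than values from the study.

```python
PASS_LINE = 0.60  # 60% qualification threshold stated in the abstract


def accuracy(answers, key):
    """Fraction of questions whose answer matches the answer key."""
    if len(answers) != len(key):
        raise ValueError("answers and key must have the same length")
    return sum(a == k for a, k in zip(answers, key)) / len(key)


def consistency(runs):
    """Fraction of questions answered identically in every repeated run.

    Assumption: one plausible definition; the abstract does not specify
    how answer consistency was measured.
    """
    n_questions = len(runs[0])
    same = sum(len({run[i] for run in runs}) == 1 for i in range(n_questions))
    return same / n_questions


# Hypothetical toy data: 5 questions, 3 repeated runs of one model.
key = ["A", "C", "B", "D", "A"]
runs = [
    ["A", "C", "B", "A", "A"],
    ["A", "C", "D", "A", "A"],
    ["A", "C", "B", "A", "A"],
]

per_run_accuracy = [accuracy(run, key) for run in runs]
mean_accuracy = sum(per_run_accuracy) / len(per_run_accuracy)
passed = mean_accuracy >= PASS_LINE   # compare against the pass line
agreement = consistency(runs)         # share of questions with stable answers

print(f"mean accuracy: {mean_accuracy:.2%}, passed: {passed}")
print(f"consistency: {agreement:.2%}")
```

Separating the two measures matters for reading the results: a model can answer the same way every time and still be wrong, so a high consistency score does not by itself imply a high accuracy score.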

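Cite this article

陆小琴, 佳薇, 武宇翔, 武永康. Evaluation of the Application Potential and Challenges of Large Language Models in the Field of Laboratory Medicine [J]. 临床检验杂志, 2024, 42(08): 619-623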

History
  • Received: 2024-03-15
  • Published online: 2024-09-27
Supervising body: Jiangsu Medical Association  Publisher: 临床检验杂志
Address: 42 Zhongyang Road, Nanjing, Jiangsu Province  Postal code: 210008
Tel: 025-83620683  E-mail: lcjyzz@163.com