Diagnostic performance of artificial intelligence ChatGPT-4V in differentiating benign and malignant breast lesions on ultrasound

贺玉卿; 吴梓政; 郭帅; 高庆壮; 李慧; 杨百胜

doi:10.19732/j.cnki.2096-6210.2026.01.008

Original article | Updated: 2026-03-25

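- Diagnostic performance of artificial intelligence ChatGPT-4V in differentiating benign and malignant breast lesions on ultrasound
- Oncoradiology (肿瘤影像学), 2026, Vol. 35, No. 1, pp. 57-63
- Author affiliations:
  
  1. Department of Ultrasound Medicine, The First Hospital of Qinhuangdao, Qinhuangdao 066001, Hebei Province, China
  2. Qinhuangdao Health School (秦皇岛市卫生学校), Qinhuangdao 066001, Hebei Province, China
- About the authors:
  
  贺玉卿 (ORCID: 0009-0002-2840-4798), master's degree, attending physician.
  吴梓政 (ORCID: 0000-0001-8253-0352), doctoral degree, associate chief physician, E-mail: wwzzh890415@163.com.
- Funding:
  
  Hebei Province Medical Science Research Project Plan (20231893); Qinhuangdao Science and Technology Research and Development Plan (202301A199)
- DOI: 10.19732/j.cnki.2096-6210.2026.01.008
  Chinese Library Classification number: R737.9; R445.1
- Received: 2025-08-15,
  
  Revised: 2026-01-05,
  
  Published in print: 2026-02-28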
Citation: HE Y Q, WU Z Z, GUO S, et al. Diagnostic performance of artificial intelligence ChatGPT-4V in differentiating benign and malignant breast lesions on ultrasound[J]. Oncoradiology, 2026, 35(1): 57-63. DOI: 10.19732/j.cnki.2096-6210.2026.01.008.

Abstract

Objective

To evaluate the diagnostic performance and feature recognition accuracy of ChatGPT-4V in classifying benign and malignant breast lesions on ultrasound, compared with junior and senior physicians, and to explore its feasibility as an auxiliary diagnostic tool.

Methods

This retrospective study included patients with breast lesions at The First Hospital of Qinhuangdao between January 2024 and June 2025. With pathological examination results as the gold standard, ChatGPT-4V, two junior physicians (3-5 years of experience), and two senior physicians (>10 years of experience) independently interpreted the ultrasound images in a blinded manner. Sensitivity, specificity, accuracy, and the area under the receiver operating characteristic (ROC) curve (AUC) were recorded. The accuracy of identifying ultrasound features, including shape, margin, echo pattern, posterior acoustic features, and calcifications, was evaluated against the criteria of the Breast Imaging Reporting and Data System (BI-RADS). The McNemar test was used to compare diagnostic accuracy among interpreters, the DeLong method was used to compare AUC values, and decision curve analysis (DCA) was performed to assess net benefit across threshold probabilities.

Results

ChatGPT-4V's diagnostic performance was comparable to that of junior physicians (accuracy: P > 0.05) but inferior to that of senior physicians (P < 0.05). DCA showed a net benefit similar to that of junior physicians at low thresholds. Compared with junior physicians, ChatGPT-4V had significantly lower accuracy in identifying echo pattern (P = 0.012) and posterior acoustic features (P = 0.018), with no statistically significant difference in calcification recognition (P = 1.000). Compared with senior physicians, ChatGPT-4V was significantly less accurate in recognizing all ultrasound features, namely shape, margin, echo pattern, posterior acoustic features, and calcification (P < 0.05). ChatGPT-4V misdiagnosed 24 cases (16.0%); malignant-to-benign errors were mostly associated with smooth margins, and benign-to-malignant errors with irregular shapes.

Conclusion

ChatGPT-4V demonstrates diagnostic performance close to that of junior physicians, making it a potential auxiliary tool for breast ultrasound screening in primary care. However, its limitations in complex feature recognition require targeted optimization to enhance its clinical utility.
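The statistics named in the Methods can be illustrated with a short Python sketch. It is not the authors' analysis code. The function names and all counts are hypothetical: the 2x2 table assumes 150 lesions (the total that 24 errors at 16.0% would imply) with an invented split between false negatives and false positives. The McNemar p-value uses the exact binomial variant, which is an assumption because the abstract does not say which variant was applied. The DeLong comparison of AUCs is not covered here.

```python
from math import comb


def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity and accuracy from a 2x2 table vs. pathology."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value for paired readers.

    b: lesions reader A classified correctly and reader B did not
    c: lesions reader B classified correctly and reader A did not
    Under the null hypothesis the discordant pairs follow Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def net_benefit(tp, fp, n, threshold):
    """Decision-curve net benefit at a given threshold probability."""
    return tp / n - fp / n * threshold / (1 - threshold)


# Hypothetical counts: 70 malignant and 80 benign lesions, 24 errors in total.
tp, fn, fp, tn = 60, 10, 14, 66
print(diagnostic_metrics(tp, fn, fp, tn))
print(mcnemar_exact_p(10, 2))            # hypothetical discordant pairs
print(net_benefit(tp, fp, 150, 0.10))    # low-threshold net benefit
```

Evaluating `net_benefit` over a grid of thresholds for each reader, and plotting the results alongside the "treat all" and "treat none" strategies, gives the decision curve used to compare ChatGPT-4V with the physicians.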
