Citation: HE Y Q, WU Z Z, GUO S, et al. Diagnostic performance of artificial intelligence ChatGPT-4V in differentiating benign and malignant breast lesions on ultrasound[J]. Oncoradiology, 2026, 35(1): 57-63. DOI: 10.19732/j.cnki.2096-6210.2026.01.008.
Diagnostic performance of artificial intelligence ChatGPT-4V in differentiating benign and malignant breast lesions on ultrasound
Objective
To evaluate the diagnostic performance and feature recognition accuracy of ChatGPT-4V in classifying benign and malignant breast lesions on ultrasound, compared with junior and senior physicians, and to explore its feasibility as an auxiliary diagnostic tool.
Methods
This retrospective study included patients with breast lesions from The First Hospital of Qinhuangdao between January 2024 and June 2025. With pathological examination results as the gold standard, ChatGPT-4V, two junior physicians (3-5 years of experience), and two senior physicians (>10 years of experience) independently interpreted ultrasound images in a blinded manner. Sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve (AUC) were recorded. The accuracy of identifying ultrasound features, including shape, margin, echo pattern, posterior acoustic features, and calcifications, was evaluated against the criteria of the Breast Imaging Reporting and Data System (BI-RADS). The McNemar test was used to compare diagnostic accuracy among interpreters, the DeLong method was used to compare AUC values, and decision curve analysis (DCA) was performed to assess net benefit across threshold probabilities.
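As an illustration of the reader-level metrics and the paired accuracy comparison named above, the following is a minimal sketch, not the authors' analysis code. It assumes labels are coded 1 = malignant and 0 = benign, and it uses the exact (binomial) form of the McNemar test; the abstract does not state which variant was used. The function names and the toy data are hypothetical.

```python
from math import comb


def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = malignant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value for the paired accuracy of two readers."""
    # Only discordant cases (one reader right, the other wrong) carry information.
    a_only = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a == t and b != t)
    b_only = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b == t)
    n = a_only + b_only
    if n == 0:
        return 1.0
    k = min(a_only, b_only)
    # Binomial(n, 0.5) lower tail, doubled for a two-sided test and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy data (hypothetical, not from the study): pathology and two readers.
pathology = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
reader_a = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
reader_b = [1, 1, 0, 0, 0, 0, 0, 1, 1, 0]

metrics_a = diagnostic_metrics(pathology, reader_a)
p_value = mcnemar_exact(pathology, reader_a, reader_b)
```

The test compares readers on the same lesions, so it depends only on the cases where exactly one of the two readers is correct.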
Results
ChatGPT-4V’s diagnostic performance was comparable to that of junior physicians (accuracy, P>0.05) but inferior to that of senior physicians (P<0.05). DCA showed a net benefit similar to that of junior physicians at low threshold probabilities. Compared with junior physicians, ChatGPT-4V had significantly lower accuracy in identifying echo pattern (P=0.012) and posterior features (P=0.018), with no significant difference in calcification recognition (P=1.000). Compared with senior physicians, ChatGPT-4V showed significantly lower accuracy in recognizing all ultrasound features, including shape, margin, echo pattern, posterior features, and calcification (P<0.05). ChatGPT-4V misdiagnosed 24 cases (16.0%); malignant-to-benign errors were often linked to smooth margins, and benign-to-malignant errors to irregular shapes.
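The net benefit comparison reported above rests on the standard decision curve quantity, net benefit = TP/n - (FP/n) x pt/(1 - pt), where pt is the threshold probability. The sketch below shows that calculation for a reader who makes a fixed benign/malignant call, alongside the usual "treat all" reference strategy. It is an illustration under those assumptions rather than the study's implementation; the function names and the counts are hypothetical.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit of a binary call at a given threshold probability."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def treat_all_net_benefit(n_pos, n, threshold):
    """Reference strategy: every lesion is treated as malignant."""
    return net_benefit(n_pos, n - n_pos, n, threshold)


# Hypothetical counts, not from the study: 100 lesions, 40 malignant,
# a reader with 30 true positives and 10 false positives.
reader_nb = net_benefit(tp=30, fp=10, n=100, threshold=0.2)
treat_all_nb = treat_all_net_benefit(n_pos=40, n=100, threshold=0.2)
```

A reader adds value at a given threshold when its net benefit exceeds both the "treat all" value and zero (the "treat none" strategy).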
Conclusion
ChatGPT-4V demonstrates diagnostic performance close to that of junior physicians, making it a potential auxiliary tool for breast ultrasound screening in primary care. However, its limitations in complex feature recognition call for targeted optimization to enhance clinical utility.