J Cancer Prev 2021; 26(4): 258-265
Published online December 30, 2021
https://doi.org/10.15430/JCP.2021.26.4.258
© Korean Society of Cancer Prevention
Ji Young Jang1,* , Eun Young Ko1,* , Ji Soo Jung1 , Kyung Nam Kang1 , Yeon Soo Kim2 , Chul Woo Kim1
1BIOINFRA Life Science Inc., Seoul, 2DIOGENE Inc., Seongnam, Korea
Correspondence to :
Chul Woo Kim, E-mail: chulwoo.kim@bioinfra.co.kr, https://orcid.org/0000-0002-1229-198X
*These authors contributed equally to this work as co-first authors.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
This study was conducted to confirm the performance of a microRNA (miRNA) biomarker combination as a new breast cancer screening method in Korean women under the age of 50, who have a high percentage of dense breasts. To determine the classification performance of a set of miRNA biomarkers (miR-1246, 202, 21, and 219B) useful for breast cancer screening, we determined whether there was a significant difference between the breast cancer and healthy control groups through box plots and the Mann–Whitney U-test, which was further examined in detail by age group. To verify the classification performance of the 4 miRNA biomarker set, 4 classification methods (logistic regression, random forest, XGBoost, and generalized linear model plus random forest) were applied, and 10-fold cross-validation was used as a validation method to improve performance stability. We confirmed that the best breast cancer detection performance was achievable in patients under 50 years of age when the set of 4 miRNAs was used. Under the age of 50, the 4 miRNA biomarkers showed the highest performance with a sensitivity of 85.29%, specificity of 93.33%, and area under the curve (AUC) of 0.961. Examining the results of the 4 miRNA biomarkers was found to be an effective strategy for diagnosing breast cancer in Korean women under 50 years of age with dense breasts, and hence has potential as a new breast cancer screening tool. Further validation in an appropriate screening population with large-scale clinical trials is required.
Keywords: MicroRNA, Breast cancer, Screening, Dense breast, Korean women under the age of 50
In Korea, breast cancer is the most common malignancy in women. In 2017, there were 22,300 new cases and the crude incidence rate was 86.9 per 100,000 according to data from the Korea National Cancer Incidence Database (KNCID) [1]. The number of incident breast cancers in 2019 was estimated to be at 24,010, with a crude incidence rate of 92.9 per 100,000 [2], suggesting an increasing trend. An epidemiologic study has suggested that the incidence rate is expected to rapidly increase within the next 10 years in Korea due to the increasing proportion of elderly individuals in the population and the continuous adoption of the Westernized lifestyle [3]. This trend highlights the importance of effective breast cancer screening in Korea.
As an early screening test for asymptomatic people, if there are no specific risk factors, mammography is the best method available. Mammography is the only screening method that has been shown to reduce breast cancer mortality, and the most important benefit of screening mammography is the early detection of breast cancer and reduced mortality. The National Cancer Screening Program (NCSP) in Korea introduced breast cancer screening in 2002 [4]. As of 2015, the NCSP guidelines recommend routine biennial breast cancer screening by mammography for women aged 40 to 69 years, and for women above 70 years old, according to individual preference and risk [5].
Mammography is the primary imaging modality for breast cancer screening worldwide. Despite being acknowledged as a first-line tool, multiple studies have reported that its sensitivity may be as low as 30% to 48% for dense breasts [6-8]. The reason is that dense breast tissue can obscure breast cancers on mammography and yield false-negative results. Dense breasts are a well-known risk factor for breast cancer, and can also negatively influence the accuracy of breast cancer screening by mammography [6,9]. Breast density is defined as the proportion of fibroglandular tissue in the total breast volume. It can be influenced by age, menopausal status, and ethnicity. The proportion of dense breasts tends to decrease as women age and go through menopause [9-11].
In Korean women, the frequency of dense breasts according to age was 88.1% at 30 to 34 years old, 91.1% at 35 to 39 years old, 78.3% at 40 to 44 years old, 61.1% at 45 to 49 years old, 30.1% at 50 to 54 years old, 21.1% at 55 to 59 years old, and 7.0% at 60 to 64 years old. Korean women aged 30 to 49 years showed a very high frequency of high-density breasts, whereas the frequency dropped sharply to 30.1% in those aged 50 to 54 years. Compared with Western women, among whom 47.2% of those aged 40 to 44 and 44.8% of those aged 45 to 49 have high-density breasts, Korean women in their 40s have a very high frequency of high-density breasts [12,13]. Mammographic breast density may be the most undervalued and underused risk factor in studies investigating breast cancer occurrence. The risk for breast cancer is four to six times higher in women with dense breasts. Breast density may also decrease the sensitivity and, thus, the accuracy of mammography [14-16].
Accordingly, the limitations of screening mammography for women with dense breasts under the age of 50 are clear. Developing a new high-accuracy breast cancer screening tool in Korean women under the age of 50 with a high proportion of high-density breasts could increase the rate of early breast cancer detection, thereby increasing the therapeutic effect and survival rate. The purpose of this study was to confirm the value of a novel blood-based multiplex miRNA assay as a new adjunct tool for breast cancer screening in Korean women under 50 years of age with high breast density.
In this study, we generated a model to identify breast cancer risk scores using 165 breast cancer patients and 165 healthy control plasma and used 10-fold cross-validation to validate the stability of the model performance by age and stage (Table 1).
Table 1. Specimen characteristics and breast cancer histology of the subjects used in the generation and validation of models to identify breast cancer risk scores
Specimens | Category | Breast cancers (n = 165) | Healthy controls (n = 165)
---|---|---|---
Age | < 50 | 68 | 60
Age | 50 to 59 | 50 | 56
Age | ≥ 60 | 47 | 49
Stage (average age) | 0 | 5 (50.4) | -
Stage (average age) | 1 | 81 (52.1) | -
Stage (average age) | 2 | 58 (52.6) | -
Stage (average age) | 3 | 21 (57.5) | -
-, not available.
Plasma samples from breast cancer patients were obtained from the Korea Regional Biobank of the Korea Institute of Radiological and Medical Sciences, Gangwon National University Hospital, and Inje University Busan Paik Hospital. The breast cancer samples were obtained before any therapeutic approaches were performed. This study was ethically approved by the BIOINFRA Life Science Institutional Review Board (No. 1-700097-B-N-01). The samples were stored at –80°C until analyzed. Plasma samples from asymptomatic healthy donors were obtained from the Korea Regional Biobank of Ajou University Hospital and Kyungpook National University Hospital. Healthy controls with a known history of cancer, high-grade dysplasia, autoimmune disease, chronic kidney disease, pregnancy, or inflammatory conditions that needed medical management were excluded. Table 1 presents the number of samples for each age and stage of all breast cancer patients. The cancer clinical stage was determined by the final pathological diagnosis after resection, according to the 7th edition of the Union for International Cancer Control tumor-node-metastasis classification.
Circulating RNA was extracted from 300 µL of plasma using an automated nucleic acid extraction instrument (Smart Lab Assist-24; Korea KETT, Seoul, Korea), and finally eluted in 150 µL of RNase-free water. The concentration and purity of the extracted circulating RNA were confirmed using a Thermo Scientific™ NanoDrop™ 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA).
To remove gDNA from the extracted circulating RNA, a gDNA removal step (42°C, 2 minutes) was performed using gDNA Eraser (product code RR047A, Takara, Shiga, Japan). Quantitative real-time PCR (qRT-PCR) was then performed on the total RNA using primers for the 4 miRNAs and the internal control (IC) for standardization. The reaction solution (total RNA, primers, 2X qRT-PCR master mixture, and ddH2O) was put into each of 4 prepared tubes, and qRT-PCR was performed on the Bio-Rad CFX96 Dx system (Bio-Rad, Hercules, CA, USA) under the following conditions: 50°C for 15 minutes (1 cycle) → 95°C for 10 minutes (1 cycle) → 95°C for 10 seconds and 65°C for 20 seconds (40 cycles). The primer sequences used for PCR follow; X in a primer sequence represents inosine.
miR-1246 (NR_031648.1) primer sequences: forward, 5’-TCT CTXXXT GAA GTA GGA CTG GGC AGA GA-3’; reverse, 5’-CTC AAXXXT GTT TGC AAT AGC CCT TTG AG-3’
miR-202 (NR_030170.1) primer sequences: forward, 5’-GGC CAXXXG CAT ATA CTT CTT TGA GGA TCT GGC C-3’; reverse, 5’-CAT GGXXXG ACC GCC CCG TTT TCC CAT G-3’
miR-21 (NR_029493.1) primer sequences: forward, 5’-CAG TCXXXG TCG GGT AGC TTA TCA GAC TG-3’; reverse, 5’-CAG TCXXXC AGA CAG CCC ATC GAC TG-3’
miR-219B (NR_039815.1) primer sequences: forward, 5’-ACA TCXXXG GAG CTC AGC CAC AGA TGT-3’; reverse, 5’-GTT TGXXXG CGC CAC TGA TTG TCC AAA C-3’
Human glyceraldehyde-3-phosphate dehydrogenase gene primer sequence: forward, 5’-CAG GTXXXT GCC AAC GTG TCA GTG GTG GAC CTG-3’; reverse, 5’-CAT CCXXXA CCT GGT GCT CAG TGT AGC CCA GGA TG-3’
Glyceraldehyde-3-phosphate dehydrogenase was used as the reference gene for qRT-PCR, and the RNA concentration used in the experiment was standardized to this reference gene. The Ct (cycle threshold) is defined as the number of cycles required for the fluorescent signal to cross the threshold (i.e., to exceed the background level), and the dCt is defined as Ct (reference gene) – Ct (target gene).
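The dCt definition above can be illustrated with a minimal Python sketch. The function name `delta_ct` and the Ct values are hypothetical and chosen only for illustration; they are not data from this study.

```python
def delta_ct(ct_reference: float, ct_target: float) -> float:
    """dCt as defined in the text: Ct(reference gene) - Ct(target gene)."""
    return ct_reference - ct_target


# Hypothetical Ct values for one plasma sample (illustrative, not study data).
sample_ct = {
    "GAPDH": 24.0,      # reference gene
    "miR-1246": 27.5,
    "miR-202": 30.0,
    "miR-21": 26.0,
    "miR-219B": 31.5,
}

# One dCt per target miRNA, each normalized to the same reference Ct.
dct = {
    name: delta_ct(sample_ct["GAPDH"], ct)
    for name, ct in sample_ct.items()
    if name != "GAPDH"
}
print(dct)
```

With this sign convention, a larger dCt corresponds to a lower target Ct, that is, the target signal crosses the threshold earlier relative to the reference gene.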
In this study, the classification performance of 4 miRNA biomarkers was investigated and was particularly high in the patient age group under 50 years old. Before modeling, the differences between breast cancer and healthy controls were examined by age group using box plots and the Mann–Whitney U-test. In addition, we tried 4 classification methods (logistic regression, random forest, XGBoost, and generalized linear model plus random forest [GLMRF]) to determine the classification performance of the 4 miRNA biomarkers. These methods are predictive models for screening breast cancer patients and their performance was compared using 10-fold cross-validation.
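As a rough illustration of the group comparison described above, the following Python sketch computes the Mann–Whitney U statistic with midranks for ties. It is a simplified stand-in for the test run in R, not the authors' code: the p-value calculation is omitted, and the function name and dCt values are hypothetical.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two samples, using midranks for tied values."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(combined)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    n_x, n_y = len(x), len(y)
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    u_y = n_x * n_y - u_x
    return u_x, u_y


# Hypothetical dCt values for one miRNA (illustrative, not study data).
cancer = [-2.1, -1.8, -2.5, -1.2, -1.9]
healthy = [-4.0, -3.6, -2.5, -4.4, -3.9]
print(mann_whitney_u(cancer, healthy))
```

Here `U_x` counts the (cancer, healthy) pairs in which the cancer value is larger, with ties counted as one half, so a `U_x` close to `n_x * n_y` indicates strong separation between the groups.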
Logistic regression is a special case of the generalized linear model (GLM) and is a representative method used in various classification and prediction models. This method has the advantage that its prediction results can be interpreted through formulas. Random forest and XGBoost are widely used ensemble machine learning algorithms. In general, these methods are known to have high performance in classification problems, but they have the disadvantages of a high possibility of overfitting and difficulty in interpreting the results. We also used the GLMRF model, which has the advantages of both a linear classification model (explanatory) and a non-linear classification model.
To evaluate and compare the performance of the 4 methods, we used the area under the curve (AUC) and receiver operating characteristic (ROC) curves, which are performance metrics used in classification problems. The ROC curve is a graph that displays the performance for all thresholds at once, and AUC refers to the area under the ROC curve. A value closer to 1 indicates an excellent model, and a value near 0.5 indicates a model that performs no better than chance. Detailed performance by age group and stage was compared using sensitivity, the probability of classifying actual breast cancers as breast cancer, and specificity, the probability of classifying actual healthy controls as healthy. All analyses were performed using the R statistical analysis tool (version 4.0.3; R Core Team, Vienna, Austria).
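The metrics described above can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the R code used in the study: the scores, labels, and the 0.5 cut-off are hypothetical, and AUC is computed through its pairwise-ranking interpretation rather than by integrating a plotted ROC curve.

```python
def auc_from_scores(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as breast cancer (label 1) and return (sensitivity, specificity)."""
    tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= threshold)
    fn = sum(1 for s, l in zip(scores, labels) if l == 1 and s < threshold)
    tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < threshold)
    fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical risk scores (illustrative): 1 = breast cancer, 0 = healthy control.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]

auc = auc_from_scores(scores, labels)
sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
print(auc, sens, spec)
```

Sensitivity and specificity depend on the chosen cut-off, whereas AUC summarizes ranking quality over all cut-offs, which is why both kinds of metric are reported.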
We previously reported an optimal panel of multiple biomarkers (miR-1246, miR-206, miR-24, and miR-373) and diagnostic models for screening breast cancer at all ages [17]. In that previous study, among the combinations of 2 or more biomarkers drawn from 9 candidate miRNAs, the set of 4 miRNA biomarkers (miR-1246, 202, 21, 219B) was observed to perform well in samples from patients younger than 50 years old (data not shown). In this study, we intended to reaffirm this. In addition, the correlation between the 9 candidate miRNA biomarkers (miR-223, 1246, 206, 24, 373, 21, 6875, 202, 219B) was analyzed in the previous study using Spearman’s correlation analysis [17]. It was confirmed that the four miRNAs selected for this study had a very low correlation with one another.
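For readers unfamiliar with Spearman's correlation, the sketch below shows one way to compute it in Python: rank each variable (with midranks for ties) and take the Pearson correlation of the ranks. The helper names and the dCt values are hypothetical, and this is not the analysis code from the previous study.

```python
def midranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the midranks (undefined for constant input)."""
    rx, ry = midranks(x), midranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical dCt values for two miRNAs across six samples (illustrative only).
mir_a = [-3.5, -2.0, -4.1, -1.8, -2.9, -3.0]
mir_b = [-6.0, -7.2, -5.5, -6.8, -7.5, -5.9]
print(spearman_rho(mir_a, mir_b))
```

A rho near 0 between two markers suggests that they carry largely non-redundant rank information, which is the usual motivation for combining weakly correlated markers in one panel.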
We collected breast cancer patients and healthy controls over 20 years of age and performed 4-way modeling to classify breast cancer patients and healthy controls. First, to confirm the performance of each of the 4 miRNA biomarkers, we examined the performance values of each of 4 miRNA biomarkers in 165 breast cancer samples and 165 healthy control samples. The expression level of each miRNA was the intersection of the miRNA amplification curve and the cycle threshold (Ct), which is a relative measure of the target miRNA concentration in the RT-PCR reaction and was internally normalized to control genes. We confirmed that all 4 miRNA biomarkers were reproduced from different samples and were meaningful in differentiating normal and breast cancer patients in box plots (Fig. 1). That is, this result was statistically significant based on the
The Mann–Whitney U-test for the 4 miRNA biomarkers by age group, together with box plots, confirmed that the difference between the breast cancer and healthy control groups was greater under the age of 50 (Fig. 2). These results suggest that the 4 miRNA biomarker set may have higher performance in women under 50 years of age than in the overall age group. Unlike the other miRNA biomarkers, miR-21 was not significant in those over 60 years of age (Fig. 2). However, in all age groups, performance with miR-21 included was slightly better than with miR-21 excluded (Table S1).
To confirm the classification performance of the 4 miRNA biomarker set in breast cancer patients and healthy controls under 50 years of age, we performed modeling by applying 4 methods (logistic regression, random forest, XGBoost, and GLMRF). As a model validation method, 10-fold cross-validation was used to improve the stability of the model performance, and AUC and ROC curves were used as performance metrics. The results confirmed that performance was higher in the under-50 age group than across all ages in all 4 methods (Fig. 3). Specifically, in logistic regression, the AUC for all ages was 0.913 and the AUC under 50 was 0.961; in random forest analysis, the AUC for all ages was 0.903 and the AUC under 50 was 0.967; in XGBoost, the AUC for all ages was 0.894 and the AUC under 50 was 0.955; and in GLMRF, the AUC for all ages was 0.912 and the AUC under 50 was 0.963.
To confirm the significance of the 4 miRNA biomarker set in the early diagnosis of breast cancer under 50 years of age, the sensitivity and specificity by stage and age group were analyzed. The logistic regression results, which had the highest all-ages AUC among the 4 methods tried in this study, were examined in detail. The analysis by age group confirmed that the 4 miRNA biomarker set had a sensitivity of 85.29% and a specificity of 93.33% in the age group under 50 years of age, higher than the sensitivity of 82.42% and specificity of 85.45% in all ages (Table 2). These results are very meaningful, showing the possibility that a set of 4 miRNA biomarkers could provide supplemental data to offset the low sensitivity of mammography in Korean women under 50 years of age. The analysis by age group included stage 0, 1, 2, and 3 samples. The early stages of breast cancer correspond to stages 0, 1, and 2, but the number of stage 0 samples used in this study is too small to determine the accuracy for stage 0. In the case of stage 3, the number of samples was smaller than that of stage 1 and stage 2, and the average patient age was higher than that in stage 1 and stage 2, which may have affected the sensitivity. With the 4 miRNA biomarkers, the sensitivity was 86.42% in stage 1 and 79.31% in stage 2, higher than or similar to the sensitivity of 82.42% for all stages (Table 2). That is, the set of 4 miRNA biomarkers showed the potential to compensate for the low sensitivity of mammography with an average sensitivity of 83.45% for stage 1 and stage 2 breast cancer.
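To make the validation scheme concrete, here is a minimal Python sketch of how 330 samples could be partitioned for 10-fold cross-validation. Stratifying the folds by class is an assumption made for this illustration (the text does not state whether the folds were stratified), and the function name, seed, and label layout are hypothetical.

```python
import random


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs with each class spread evenly over k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, l in enumerate(labels) if l == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, test_idx


# 165 breast cancer (1) and 165 healthy control (0) labels, matching the study's sample sizes.
labels = [1] * 165 + [0] * 165
splits = list(stratified_k_fold(labels, k=10))
print([len(test) for _, test in splits])
```

Each sample is used for testing exactly once and for training in the other nine rounds, so the reported AUC reflects predictions on data the model did not see while being fitted.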
Table 2. Sensitivity and specificity by age and stage in the logistic regression method (n = 165)
Specimens | Category | Specificity (%) | Sensitivity (%)
---|---|---|---
Total | | 85.45 | 82.42
Age | < 50 | 93.33 | 85.29
Age | 50 to 59 | 80.36 | 78.00
Age | ≥ 60 | 81.63 | 82.98
Stage | 0 | - | 60.00
Stage | 1 | - | 86.42
Stage | 2 | - | 79.31
Stage | 3 | - | 80.95
-, not available.
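The relationship between Tables 1 and 2 can be checked with a short Python sketch. The detected-case counts below are reconstructed by rounding (cases × sensitivity) and are an inference made for this illustration, not figures reported in the article; under that assumption, the 83.45% average for stages 1 and 2 is reproduced as a pooled sensitivity.

```python
# Breast cancer cases per stage (Table 1) and stage-wise sensitivity in % (Table 2).
cases = {"0": 5, "1": 81, "2": 58, "3": 21}
sens_pct = {"0": 60.00, "1": 86.42, "2": 79.31, "3": 80.95}

# Assumption: reconstruct detected counts by rounding cases * sensitivity.
detected = {stage: round(cases[stage] * sens_pct[stage] / 100) for stage in cases}

# Pooled sensitivity for stages 1 and 2 (total detected / total cases).
pooled_1_2 = 100 * (detected["1"] + detected["2"]) / (cases["1"] + cases["2"])

# Overall sensitivity across all stages.
overall = 100 * sum(detected.values()) / sum(cases.values())

print(detected)
print(round(pooled_1_2, 2), round(overall, 2))
```

Under this reconstruction, the pooled stage 1 and 2 value rounds to 83.45% and the overall value to 82.42%, consistent with the figures quoted in the text.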
The early diagnosis of breast cancer is difficult because there are no symptoms in the early stages. Patients whose breast cancer is detected early have better treatment outcomes and higher survival rates than patients with other cancers, but the survival rate is low in patients with terminal cancer, so accurate early diagnosis is very important.
Mammography is the only screening test scientifically proven to reduce breast cancer mortality. However, mammography has disadvantages such as hospitalization, psychological burden, and radiation exposure, and it can produce false-negative and false-positive diagnoses, which can lead to excessive additional tests and unnecessary biopsies [18]. Above all, the sensitivity of mammography decreases as breast density increases. Korean women aged 40 to 49 years, who account for more than 70% of high-density breasts, have lower sensitivity in mammography than women aged 50 years or older. Accordingly, breast cancer screening by mammography was associated with a significant decrease in mortality in women aged 50 years and older, but this difference was not statistically significant when only women aged 40 to 49 years were included. In women under 40 years of age as well, breast cancer screening by mammography did not reduce breast cancer mortality [19,20]. A comparative evaluation of the accuracy of screen-film mammography (SFM) and full-field digital mammography (FFDM) for breast cancer screening in more than 8 million Korean women found that sensitivity and positive predictive value (PPV) were higher in FFDM than in SFM, but specificity was lower in FFDM, and the overall AUC of FFDM was 0.80, higher than that of SFM at 0.75. In particular, in women under 50 years of age, the sensitivity of SFM was 49.0% to 53.7% and the specificity was 86.0% to 86.2%. For FFDM, the sensitivity in women under the age of 50 was 54.6% to 54.5% and the specificity was 85.3% to 86.0%, which was better than SFM, but the sensitivity was still low [21]. These results show the importance of developing a new auxiliary diagnostic method that can improve on the sensitivity of mammography and are considered important reference material for evaluating the significance of the results of this study.
Breast ultrasonography is mainly used as an auxiliary test to compensate for the problems of mammography. When both mammography and ultrasonography are performed as screening tests, the sensitivity is improved compared to when only mammography is performed. However, despite the development of breast ultrasound equipment, this auxiliary test also has limitations due to the high dependence of the diagnosis on the examiner and difficulty in the early diagnosis of breast cancer due to calcified lesions [22].
As a new screening method for high-risk breast cancer that can assist with mammography, liquid biopsy is non-invasive and minimizes the inconvenience of an examination [23]. In this study, we tried to verify the combination of multiple miRNA biomarkers with high sensitivity in women under the age of 50 with a high percentage of dense breasts, which are difficult to accurately diagnose by mammography.
Several previous studies have demonstrated the value of circulating miRNAs as biomarkers for breast cancer diagnosis and screening [24,25]. However, there have been no reports on miRNAs that significantly identified Korean breast cancer patients under 50 years of age. In this study, the expression levels of 4 miRNAs in plasma were analyzed to verify whether they significantly identified Korean breast cancer patients under 50 years of age. In particular, we used a dumbbell-shaped rescue primer for pre-miRNA amplification by RT-PCR, our team’s proprietary technology that can minimize non-specific PCR in real time. Previous studies reported that the 4 miRNAs used in this study were involved in breast cancer development and progression. However, in contrast to our study, the published papers analyzed mature miRNA expression [26-30].
The results of this study showed that the AUC value of the 4 miRNA biomarkers (miR-1246, miR-202, miR-21, and miR-219B) measured in plasma for the early diagnosis of breast cancer was 0.913 with logistic regression. In particular, under the age of 50, the AUC was 0.961, the sensitivity was 85.29%, and the specificity was 93.33%, showing the highest performance.
It is not yet known whether the developed multi-miRNA set can discriminate between breast cancer and benign breast disease. In particular, its ability to distinguish breast cancer from glandular tissue and benign diseases that are difficult to differentiate on mammography should be studied in the future using samples from patients with benign breast calcifications.
In conclusion, the 4 miRNA biomarker set was found to be meaningful in the early diagnosis of breast cancer in Korean women under 50 years of age. The set of 4 miRNA biomarkers provided higher sensitivity and specificity in the age group of patients under 50 years compared to all age groups, suggesting that it may be helpful in supplementing mammography sensitivity in Korean women under the age of 50 with a high percentage of dense breasts. For reference, miR-21 among the 4 miRNA biomarkers showed significant performance in other carcinomas, whereas the other three miRNAs did not (data not shown).
The clinical significance of the results of this study is that they can provide a basis for the development of a new adjunct tool to improve the accuracy of mammography, which could be utilized as a new high-accuracy breast cancer screening tool. It is therefore expected that the treatment effect and survival rate of breast cancer can be improved. However, many factors must be considered for clinical use, and additional validation in an appropriate screening population through large-scale clinical trials is required.
Supplementary materials can be found via https://doi.org/10.15430/JCP.2021.26.4.258.
No potential conflicts of interest were disclosed.