Applying Under-Sampling Techniques and Cost-Sensitive Learning Methods on Risk Assessment of Breast Cancer
  • Authors: Jia-Lien Hsu (1)
    Ping-Cheng Hung (1)
    Hung-Yen Lin (1)
    Chung-Ho Hsieh (2)

    1. Department of Computer Science and Information Engineering; Fu Jen Catholic University; New Taipei City; Taiwan; Republic of China
    2. Department of General Surgery; Shin Kong Wu Ho-Su Memorial Hospital; Taipei; Taiwan; Republic of China
  • Keywords: Breast cancer; Cost-sensitive learning; Sampling
  • Journal: Journal of Medical Systems
  • Publication year: 2015
  • Publication date: April 2015
  • Volume: 39
  • Issue: 4
  • Full-text size: 2,142 KB
  • Journal category: Mathematics and Statistics
  • Journal subjects: Statistics
    Statistics for Life Sciences, Medicine and Health Sciences
    Health Informatics and Administration
  • Publisher: Springer Netherlands
  • ISSN: 1573-689X
Abstract
Breast cancer is one of the most common causes of cancer mortality. Early detection through mammography screening can significantly reduce mortality from breast cancer. However, most screening methods consume a large amount of resources. We propose a computational model, based solely on personal health information, for breast cancer risk assessment. Our model can serve as a pre-screening program in low-cost settings. In our study, the data set consists of 3976 records collected from Taipei City Hospital between January 1 and December 31, 2008. Based on this data set, we first apply sampling techniques and a dimension reduction method to preprocess the testing data. Then, we construct various kinds of classifiers (including basic classifiers, ensemble methods, and cost-sensitive methods) to predict the risk. The cost-sensitive method with a random forest classifier achieves a recall (sensitivity) of 100%. At a recall of 100%, the precision (positive predictive value, PPV) and specificity of the cost-sensitive method with a random forest classifier are 2.9% and 14.87%, respectively. In our study, we build a breast cancer risk assessment model using data mining techniques. Our model has the potential to serve as an assisting tool in breast cancer screening.
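
As a rough illustration of two ideas named in the abstract, the sketch below shows random under-sampling of the majority class and the computation of recall, precision (PPV), and specificity from predicted labels. It is not the authors' implementation: the function names `random_under_sample` and `screening_metrics`, the balanced 1:1 sampling ratio, and the toy labels are all assumptions made for this example, and the numbers it produces are unrelated to the results reported in the paper.

```python
import random


def random_under_sample(records, labels, seed=0):
    """Randomly drop records until every class is as small as the rarest one.

    Hypothetical helper: the 1:1 class ratio is an assumption of this sketch.
    """
    rng = random.Random(seed)
    by_class = {}
    for record, label in zip(records, labels):
        by_class.setdefault(label, []).append(record)

    # Size of the smallest (minority) class.
    n_min = min(len(group) for group in by_class.values())

    sampled = []
    for label, group in by_class.items():
        for record in rng.sample(group, n_min):
            sampled.append((record, label))
    rng.shuffle(sampled)
    return [r for r, _ in sampled], [l for _, l in sampled]


def screening_metrics(y_true, y_pred, positive=1):
    """Return recall (sensitivity), precision (PPV), and specificity."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred == positive:
            fp += 1
        elif truth != positive and pred != positive:
            tn += 1
        else:
            fn += 1
    return {
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Toy, made-up labels: 2 positive cases among 10 records.
toy_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# A deliberately cautious classifier that flags 7 of the 10 records.
toy_pred = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

balanced_records, balanced_labels = random_under_sample(list(range(10)), toy_true)
toy_metrics = screening_metrics(toy_true, toy_pred)
```

In the toy case every positive record is flagged, so recall is 1.0, while the five false alarms pull precision and specificity down. This is the same kind of trade-off the abstract reports for a model tuned to miss no positive case.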
