Tackling prediction uncertainty in machine learning for healthcare

  • Challen, R. et al. Artificial intelligence, bias and clinical safety. BMJ Qual. Saf. 28, 231–237 (2019).

  • Hendrycks, D. & Gimpel, K. A baseline for detecting misclassified and out-of-distribution examples in neural networks. Preprint at arXiv https://arxiv.org/abs/1610.02136 (2018).

  • Goodfellow, I. J., Shlens, J. & Szegedy, C. Explaining and harnessing adversarial examples. Preprint at arXiv https://arxiv.org/abs/1412.6572 (2015).

  • Amodei, D. et al. Concrete problems in AI safety. Preprint at arXiv https://arxiv.org/abs/1606.06565 (2016).

  • Nguyen, A., Yosinski, J. & Clune, J. Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 427–436 (2015).

  • He, J. et al. The practical implementation of artificial intelligence technologies in medicine. Nat. Med. 25, 30–36 (2019).

  • Kompa, B., Snoek, J. & Beam, A. L. Second opinion needed: communicating uncertainty in medical machine learning. NPJ Digit. Med. 4, 4 (2021).

  • Guo, C., Pleiss, G., Sun, Y. & Weinberger, K. Q. On calibration of modern neural networks. In Proc. 34th Int. Conference on Machine Learning (PMLR) 70, 1321–1330 (2017).

  • Dyer, T. et al. Diagnosis of normal chest radiographs using an autonomous deep-learning algorithm. Clin. Radiol. 76, 473.e9–473.e15 (2021).

  • Dyer, T. et al. Validation of an artificial intelligence solution for acute triage and rule-out normal of non-contrast CT head scans. Neuroradiology 64, 735–743 (2022).

  • Liang, X., Nguyen, D. & Jiang, S. B. Generalizability issues with deep learning models in medicine and their potential solutions: illustrated with Cone-Beam Computed Tomography (CBCT) to Computed Tomography (CT) image conversion. Mach. Learn. Sci. Technol. 2, 015007 (2020).

  • Navarrete-Dechent, C. et al. Automated dermatological diagnosis: hype or reality? J. Invest. Dermatol. 138, 2277–2279 (2018).

  • Krois, J. et al. Generalizability of deep learning models for dental image analysis. Sci. Rep. 11, 6102 (2021).

  • Sathitratanacheewin, S., Sunanta, P. & Pongpirul, K. Deep learning for automated classification of tuberculosis-related chest X-ray: dataset distribution shift limits diagnostic performance generalizability. Heliyon 6, e04614 (2020).

  • Xin, K. Z., Li, D. & Yi, P. H. Limited generalizability of deep learning algorithm for pediatric pneumonia classification on external data. Emerg. Radiol. 29, 107–113 (2022).

  • Zech, J. R. et al. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. PLoS Med. 15, e1002683 (2018).

  • Chen, J. S. et al. Deep learning for the diagnosis of stage in retinopathy of prematurity: accuracy and generalizability across populations and cameras. Ophthalmol. Retina 5, 1027–1035 (2021).

  • Jiang, H., Kim, B., Guan, M. & Gupta, M. To trust or not to trust a classifier. In Advances in Neural Information Processing Systems 31 (2018).

  • Geifman, Y. & El-Yaniv, R. SelectiveNet: a deep neural network with an integrated reject option. In Proc. 36th Int. Conference on Machine Learning (PMLR) 97, 2151–2159 (2019).

  • Madras, D., Pitassi, T. & Zemel, R. Predict responsibly: improving fairness and accuracy by learning to defer. In Advances in Neural Information Processing Systems 31 (2018).

  • Kim, D. et al. Accurate auto-labeling of chest X-ray images based on quantitative similarity to an explainable AI model. Nat. Commun. 13, 1867 (2022).

  • Bernhardt, M. et al. Active label cleaning for improved dataset quality under resource constraints. Nat. Commun. 13, 1161 (2022).

  • Krause, J. et al. Grader variability and the importance of reference standards for evaluating machine learning models for diabetic retinopathy. Ophthalmology 125, 1264–1272 (2018).

  • Basha, S. H. S., Dubey, S. R., Pulabaigari, V. & Mukherjee, S. Impact of fully connected layers on performance of convolutional neural networks for image classification. Neurocomputing 378, 112–119 (2020).

  • Trabelsi, A., Chaabane, M. & Ben-Hur, A. Comprehensive evaluation of deep learning architectures for prediction of DNA/RNA sequence binding specificities. Bioinformatics 35, i269–i277 (2019).

  • Boland, G. W. L. Voice recognition technology for radiology reporting: transforming the radiologist’s value proposition. J. Am. Coll. Radiol. 4, 865–867 (2007).

  • Heleno, B., Thomsen, M. F., Rodrigues, D. S., Jorgensen, K. J. & Brodersen, J. Quantification of harms in cancer screening trials: literature review. BMJ 347, f5334 (2013).

  • Dans, L. F., Silvestre, M. A. A. & Dans, A. L. Trade-off between benefit and harm is crucial in health screening recommendations. Part I: general principles. J. Clin. Epidemiol. 64, 231–239 (2011).

  • Peryer, G., Golder, S., Junqueira, D. R., Vohra, S. & Loke, Y. K. in Cochrane Handbook for Systematic Reviews of Interventions (eds Higgins, J. P. et al.) Ch. 19, 493–505 (John Wiley & Sons, 2011).

  • Mukhoti, J., Kirsch, A., van Amersfoort, J., Torr, P. H. S. & Gal, Y. Deep deterministic uncertainty: a simple baseline. Preprint at arXiv https://arxiv.org/abs/2102.11582 (2022).

  • Kruschke, J. K. in The Cambridge Handbook of Computational Psychology (ed. Sun, R.) 267–301 (Cambridge Univ. Press, 2008).

  • Bowman, C. R., Iwashita, T. & Zeithamova, D. Tracking prototype and exemplar representations in the brain across learning. eLife 9, e59360 (2020).

  • Platt, J. C. in Advances in Large Margin Classifiers (eds Smola, A. J. et al.) (MIT Press, 1999).

  • Ding, Z., Han, X., Liu, P. & Niethammer, M. Local temperature scaling for probability calibration. In Proc. IEEE/CVF International Conference on Computer Vision 6889–6899 (2021).

  • Clinciu, M.-A. & Hastie, H. A survey of explainable AI terminology. In Proc. 1st Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence (NL4XAI) 8–13 (2019).

  • Biran, O. & Cotton, C. Explanation and justification in machine learning: a survey. In IJCAI-17 Workshop on Explainable Artificial Intelligence (XAI) 8, 8–13 (2017).
