Comparative study of mortality prediction models for Severe Acute Respiratory Syndrome (SARS) associated with COVID-19 in Brazil
DOI:
https://doi.org/10.35699/2965-6931.2023.47705
Keywords:
predictive models, logistic regression, COVID-19
Abstract
Severe Acute Respiratory Syndrome (SARS) is one of the complications triggered by the new coronavirus, SARS-CoV-2. This study compares two machine learning-based models, built for different contexts, that predict mortality in cases of SARS caused by coronavirus disease 2019 (COVID-19). The data are available on the DataSUS platform and cover the period from January 2021 to December 2022. The work comprised a descriptive statistical analysis, variable selection, and the development of two models: one for the period before the milestone of the second dose of COVID-19 vaccination and one for the period after it. Model 1 reached an accuracy of 71.8% and model 2 an accuracy of 80%; these results can support decision-making in tackling the disease.
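The two-period design described in the abstract can be illustrated with a minimal sketch: split the records at a cutoff date, fit one logistic regression per period, and report each model's accuracy. This is not the authors' pipeline. The cutoff date, the field names (`notified`, `features`, `died`), the toy rows, and the plain gradient-descent fit are all assumptions made for illustration, and accuracy is measured on the training rows only to keep the example short.

```python
import math
from datetime import date

# Hypothetical cutoff: the abstract does not give the milestone date.
CUTOFF = date(2021, 9, 1)

# Synthetic rows for illustration only; these are not DataSUS records.
# "features" = [age scaled to 0..1, comorbidity flag]; "died" is the outcome.
RECORDS = [
    {"notified": date(2021, 3, 1), "features": [0.20, 0], "died": 0},
    {"notified": date(2021, 4, 1), "features": [0.80, 1], "died": 1},
    {"notified": date(2021, 5, 1), "features": [0.30, 0], "died": 0},
    {"notified": date(2021, 6, 1), "features": [0.90, 1], "died": 1},
    {"notified": date(2022, 1, 1), "features": [0.25, 0], "died": 0},
    {"notified": date(2022, 2, 1), "features": [0.85, 1], "died": 1},
    {"notified": date(2022, 3, 1), "features": [0.40, 0], "died": 0},
    {"notified": date(2022, 4, 1), "features": [0.70, 1], "died": 1},
]


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, x):
    """Linear predictor: bias w[0] plus the weighted features."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the mean log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, xi)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def accuracy(w, X, y, threshold=0.5):
    """Share of rows whose thresholded prediction matches the label."""
    hits = sum((sigmoid(score(w, xi)) >= threshold) == bool(yi)
               for xi, yi in zip(X, y))
    return hits / len(y)


def split_by_cutoff(records, cutoff=CUTOFF):
    """Return (rows before the cutoff, rows on or after the cutoff)."""
    before = [r for r in records if r["notified"] < cutoff]
    after = [r for r in records if r["notified"] >= cutoff]
    return before, after


def evaluate_cohorts(records, cutoff=CUTOFF):
    """Fit one model per period and return its training accuracy."""
    results = {}
    before, after = split_by_cutoff(records, cutoff)
    for name, cohort in (("before", before), ("after", after)):
        X = [r["features"] for r in cohort]
        y = [r["died"] for r in cohort]
        results[name] = accuracy(fit_logistic(X, y), X, y)
    return results


if __name__ == "__main__":
    print(evaluate_cohorts(RECORDS))
```

On real data, each period would normally be divided into training and test sets, and accuracy would be read alongside metrics that account for class imbalance, since deaths are usually the minority outcome.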