Providing an imputation algorithm for missing values of longitudinal data using Cuckoo search algorithm

A case study on cervical dystonia

Authors

  • Kobra Etminani Ph.D., Assistant Professor, Department of Biomedical Informatics, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran

Keywords:

Missing data, Imputation of missing data, Cuckoo algorithm, Longitudinal data

Abstract

Background: Missing values occur in many medical studies, particularly longitudinal ones, in which repeated measurements are taken from each participant over the course of the study. Over the past few decades, a substantial body of statistical work has addressed the concepts, issues, and theoretical methods involved in handling such data.

Methods: We focused on missing data arising from patients excluded from longitudinal studies. Two statistical measures, similarity and the correlation coefficient, were employed, and a metaheuristic algorithm was applied to search for an optimal solution. The Cuckoo search algorithm was selected for its strong search capability.
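As a rough illustration of how a Cuckoo search could drive the imputation of a single missing visit, the sketch below implements a basic one-dimensional Cuckoo search with Lévy-flight steps (drawn with Mantegna's algorithm). It is not the procedure used in the study: the objective built by make_cost, the correlation-based donor weights, the parameter values, and the toy profiles are all hypothetical and serve only to show the search loop at work.

```python
import math
import random


def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step length (Mantegna's algorithm)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma = (num / den) ** (1 / beta)
    return rng.gauss(0, sigma) / abs(rng.gauss(0, 1)) ** (1 / beta)


def cuckoo_search(cost, lower, upper, n_nests=15, pa=0.25, n_iter=300, seed=0):
    """Minimise a one-dimensional cost on [lower, upper]."""
    rng = random.Random(seed)
    scale = 0.01 * (upper - lower)  # assumed step-size factor
    nests = [rng.uniform(lower, upper) for _ in range(n_nests)]
    for _ in range(n_iter):
        # Each cuckoo lays an egg via a Levy flight; the egg replaces a
        # randomly chosen nest only if it has a lower cost.
        for i in range(n_nests):
            egg = min(max(nests[i] + scale * levy_step(rng), lower), upper)
            j = rng.randrange(n_nests)
            if cost(egg) < cost(nests[j]):
                nests[j] = egg
        # A fraction pa of the worst nests is abandoned and rebuilt at random.
        nests.sort(key=cost)
        for k in range(n_nests - int(pa * n_nests), n_nests):
            nests[k] = rng.uniform(lower, upper)
    return min(nests, key=cost)


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def make_cost(target, donors, missing_idx):
    """Hypothetical objective: squared distance between a candidate value and
    a correlation-weighted average of complete (donor) profiles."""
    seen = [i for i in range(len(target)) if i != missing_idx]
    observed = [target[i] for i in seen]
    # Donors with negative correlation to the target get zero weight.
    weights = [max(pearson(observed, [d[i] for i in seen]), 0.0) for d in donors]
    estimate = sum(w * d[missing_idx] for w, d in zip(weights, donors)) / sum(weights)
    return (lambda v: (v - estimate) ** 2), estimate


# Toy profiles (made-up numbers): one subject with the third visit missing.
target = [40, 36, None, 30]
donors = [[42, 38, 34, 31], [20, 25, 30, 35], [39, 35, 32, 29]]
cost, estimate = make_cost(target, donors, missing_idx=2)
imputed = cuckoo_search(cost, lower=0.0, upper=100.0)
print(round(estimate, 2), round(imputed, 2))
```

This toy objective has a closed-form minimum, so the search is not strictly needed here; the example only makes it easy to check that the search converges. A metaheuristic becomes more useful when the objective combines several criteria and has no simple analytic solution.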

Results: Profiles of subjects with cervical dystonia (CD) were used to evaluate the proposed model after missingness was artificially introduced into the data. The proposed algorithm achieved a higher accuracy (98.48%) than similar approaches.

Conclusion: Combining similarity parameters with correlation coefficients led to a significant increase in the accuracy of missing data imputation.

Published

2022-01-18

Section

Articles