Providing an imputation algorithm for missing values of longitudinal data using Cuckoo search algorithm
A case study on cervical dystonia
Keywords:
Missing data, Imputation of missing data, Cuckoo algorithm, Longitudinal data
Abstract
Background: Missing values occur in a large number of studies in the medical sciences, especially longitudinal studies, in which repeated measurements are taken from each person over the course of the study. A substantial body of statistical work over the past few decades has addressed the concepts, issues, and theoretical methods involved.
Methods: We focused on the missing data related to patients excluded from longitudinal studies. To this end, two statistical measures, similarity and the correlation coefficient, were employed. In addition, a metaheuristic algorithm was applied to search for an optimal solution. The algorithm selected was the Cuckoo search algorithm, which has strong search capability.
Results: Profiles of subjects with cervical dystonia (CD) were used to evaluate the proposed model after missingness was applied to the data. The algorithm used in this study achieved higher accuracy (98.48%) than similar approaches.
Conclusion: Concomitant use of the similarity and correlation coefficient measures led to a significant increase in the accuracy of missing data imputation.
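The idea summarized in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measure, how similarity and correlation are combined, or the search settings, so everything below is an assumption made for illustration. Similarity is taken as 1 / (1 + Euclidean distance) over the observed visits, each donor's weight is similarity times the non-negative part of the Pearson correlation, and the cuckoo search is a simplified one-dimensional variant (Lévy steps scaled by the search range, greedy replacement, and random rebuilding of the worst nests). The function names, parameter values, and profile values are hypothetical.

```python
import math
import random


def levy_step(rng, beta=1.5):
    """One heavy-tailed step drawn with Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma = (num / den) ** (1 / beta)
    u, v = rng.gauss(0, sigma), rng.gauss(0, 1)
    return u / max(abs(v), 1e-12) ** (1 / beta)


def cuckoo_search(cost, lower, upper, n_nests=20, n_iter=300,
                  pa=0.25, alpha=0.05, seed=0):
    """Minimise a 1-D cost on [lower, upper] (simplified cuckoo search)."""
    rng = random.Random(seed)
    span = upper - lower
    nests = [rng.uniform(lower, upper) for _ in range(n_nests)]
    costs = [cost(x) for x in nests]
    n_drop = int(pa * n_nests)
    for _ in range(n_iter):
        # Levy flight: propose a new egg near each nest, keep it if better.
        for i in range(n_nests):
            x = min(max(nests[i] + alpha * span * levy_step(rng), lower), upper)
            c = cost(x)
            if c < costs[i]:
                nests[i], costs[i] = x, c
        # Abandon the worst fraction pa of nests and rebuild them at random.
        order = sorted(range(n_nests), key=costs.__getitem__, reverse=True)
        for i in order[:n_drop]:
            nests[i] = rng.uniform(lower, upper)
            costs[i] = cost(nests[i])
    return nests[costs.index(min(costs))]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def similarity(xs, ys):
    """Assumed similarity in (0, 1]: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys))))


def impute_missing(target, donors, **search_kwargs):
    """Fill the single None in `target` from complete donor profiles."""
    k = target.index(None)
    obs = [i for i, v in enumerate(target) if v is not None]
    t_obs = [target[i] for i in obs]
    weights = []
    for d in donors:
        d_obs = [d[i] for i in obs]
        weights.append(similarity(t_obs, d_obs) * max(pearson(t_obs, d_obs), 0.0))
    if sum(weights) == 0:
        weights = [1.0] * len(donors)  # fall back to equal weights

    def cost(x):
        # Weighted squared distance to the donors' values at the missing visit.
        return sum(w * (x - d[k]) ** 2 for w, d in zip(weights, donors))

    lower = min(d[k] for d in donors)
    upper = max(d[k] for d in donors)
    return cuckoo_search(cost, lower, upper, **search_kwargs), weights


# Invented severity scores over four visits; the third visit is missing.
target = [45.0, 40.0, None, 36.0]
donors = [
    [44.0, 41.0, 38.0, 35.0],
    [50.0, 46.0, 43.0, 40.0],
    [30.0, 33.0, 35.0, 38.0],  # opposite trend, so its weight becomes zero
    [46.0, 39.0, 37.0, 36.0],
]
imputed, weights = impute_missing(target, donors)
print(round(imputed, 2), [round(w, 3) for w in weights])
```

For this toy cost the optimum is simply the weighted mean of the donors' values, which makes the search result easy to check. A metaheuristic becomes useful when the fitness function has no closed-form optimum, for example when several missing values are imputed jointly.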
License
Copyright (c) 2020 KNOWLEDGE KINGDOM PUBLISHING
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.