Journal of Applied Science & Engineering

Dhaka University Journal of Applied Science & Engineering

Issue: Vol. 7, No. 1, January 2022
Title: A Comparative Study of Missing Data Imputation Methods for Activity Recognition Task
Authors:
  • Afia Sajeeda
    Institute of Information Technology, University of Dhaka
  • B M Mainul Hossain
    Institute of Information Technology, University of Dhaka
  • Sumon Ahmed
    Institute of Information Technology, University of Dhaka
DOI:
Keywords: Generative Adversarial Networks, GANs, Missing Data, Imputation, Deep Learning.
Abstract:

While data is integral to addressing real-world problems with machine learning techniques, complete data is rarely available, as values go missing during collection or even afterward. Missing values make data unsuitable for direct use and call for imputation techniques to resolve the problem. Here, we compare the performance of four existing missing-data imputation techniques (KNN, MICE, GAIN, and HexaGAN) by applying them to the KU-HAR dataset and to two datasets from the Nurse Care Activity Recognition Dataset collection. Our investigation suggests that HexaGAN learns the original data distribution better than the other methods, as the RMSE between its imputed data and the real data is lower, and that it therefore merits further consideration. To investigate the role of the activation function, we replace the underlying ReLU activation functions of the neural networks in the HexaGAN architecture with the Swish activation function. Experimental results show that the modified version of HexaGAN has the potential to outperform the original one when applied to the same activity recognition datasets.
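The two quantities the abstract relies on, the RMSE between imputed and real values and the ReLU and Swish activations, can be illustrated with a minimal sketch. It is not the authors' code: the function names, the mask convention (1 = observed, 0 = missing), the choice to score only the missing entries, and the default `beta=1.0` are all assumptions made for illustration.

```python
import math


def relu(x):
    """ReLU activation: passes positive inputs, zeroes out the rest."""
    return max(0.0, x)


def swish(x, beta=1.0):
    """Swish activation: x * sigmoid(beta * x).

    beta=1.0 is an assumed default; the branch on the sign of x avoids
    overflow in exp() for large negative inputs.
    """
    if x >= 0:
        return x / (1.0 + math.exp(-beta * x))
    e = math.exp(beta * x)
    return x * e / (1.0 + e)


def imputation_rmse(real, imputed, observed_mask):
    """RMSE between real and imputed values over the missing entries only.

    Assumed convention: observed_mask[i][j] == 1 means the value was
    observed, 0 means it was missing and had to be imputed.
    """
    squared_errors = [
        (r - p) ** 2
        for real_row, imp_row, mask_row in zip(real, imputed, observed_mask)
        for r, p, m in zip(real_row, imp_row, mask_row)
        if m == 0
    ]
    if not squared_errors:
        raise ValueError("mask marks no entries as missing")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    real = [[1.0, 2.0], [3.0, 4.0]]
    imputed = [[1.0, 2.5], [2.0, 4.0]]
    mask = [[1, 0], [0, 1]]  # two entries were missing
    print(imputation_rmse(real, imputed, mask))
    print(relu(-1.0), swish(-1.0))
```

Unlike ReLU, Swish is smooth and returns small negative values for negative inputs, which is the behavioural difference the activation swap exposes to the networks.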
