Archive Issue – Vol.4, Issue.4 (Oct-Dec 2024)

A Comprehensive Review on Object Removal from Images using Deep Learning

Arun Pratap Singh, Amit Saxena

Research Paper | Journal Paper

Vol.4, Issue.4, pp.01-06, Dec-2024

Abstract

Object removal from images, often referred to as image inpainting or content-aware editing, is a fundamental and challenging task in computer vision: it aims to seamlessly reconstruct missing or undesired regions in images while preserving visual realism, semantic coherence, and structural integrity. The problem has garnered significant attention owing to its wide range of practical applications, including professional photography, augmented and virtual reality, video post-processing, medical image artifact removal, and surveillance, where accurate restoration of occluded or corrupted areas is critical. Early approaches relied primarily on traditional signal-processing and patch-based methods, such as diffusion-based propagation or exemplar-based patch matching; these achieved notable success on small missing regions and repetitive textures but struggled with large holes, complex textures, and global semantic consistency. The advent of deep learning transformed the field by introducing data-driven models capable of learning complex patterns and contextual relationships from large datasets. Convolutional neural networks (CNNs) provided the first major leap, enabling end-to-end learning of hierarchical image representations that could generate plausible fills conditioned on the visible context. Building on this, generative adversarial networks (GANs) further improved perceptual realism through adversarial training, in which a generator synthesizes missing regions and a discriminator evaluates their authenticity, leading to sharper and more coherent inpainting results.
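
The diffusion-based propagation mentioned in the abstract can be made concrete with a short NumPy sketch (illustrative only; the function name, array sizes, and iteration count are choices for this example, not taken from any reviewed paper). Missing pixels are filled by repeatedly averaging their neighbours, which shows both why such classical methods handle small, smooth holes well and why they blur large or textured regions.

```python
import numpy as np

def diffusion_inpaint(image, mask, iterations=200):
    """Fill masked pixels by iteratively averaging their 4-neighbours.

    image: 2D float array; mask: boolean array, True where pixels are missing.
    Surrounding intensities are propagated into the hole; known pixels are
    held fixed, so the fill converges to a smooth (Laplace-like) interpolant.
    """
    filled = image.copy()
    filled[mask] = 0.0
    for _ in range(iterations):
        # Average of the four axis-aligned neighbours (edge-padded).
        padded = np.pad(filled, 1, mode="edge")
        neighbours = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                      padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        # Only update the missing region; known pixels stay fixed.
        filled[mask] = neighbours[mask]
    return filled

# A flat grey image with a small square hole is recovered almost exactly;
# texture, however, would be smoothed away -- the classical methods' limit.
img = np.full((16, 16), 0.5)
hole = np.zeros_like(img, dtype=bool)
hole[6:10, 6:10] = True
result = diffusion_inpaint(img, hole)
```

On textured or semantically structured content this averaging fails, which is exactly the gap the CNN- and GAN-based methods surveyed here were designed to close.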

Key-Words / Index Term: Image Inpainting, Object Removal, Deep Learning, GANs, Transformers, Diffusion Models, Computer Vision.


Citation

Arun Pratap Singh, Amit Saxena, "A Comprehensive Review on Object Removal from Images using Deep Learning," International Journal of Scientific Research in Technology & Management, Vol.4, Issue.4, pp.01-06, 2024.

Advances in Alzheimer's Disease Detection: A Review of Current Methods

Pragya Tripathi, Ritu Prasad

Research Paper | Journal Paper

Vol.4, Issue.4, pp.07-12, Dec-2024

Abstract

Memory loss and cognitive deterioration are hallmarks of Alzheimer’s disease (AD), a progressive neurodegenerative illness that affects millions of people globally. Effective management and treatment depend on early detection, but conventional diagnostic techniques such as clinical evaluations and cognitive testing frequently miss the disease in its early stages. This review investigates and assesses current approaches for AD detection, including machine learning techniques, genetic evaluations, biomarker identification, and neuroimaging. It highlights recent developments, discusses the advantages and disadvantages of each approach, and recommends future directions for enhancing early diagnosis. Machine learning (ML) has transformed medical diagnostics by analyzing complex datasets such as genetic data, neuroimaging, and other biomarkers, enabling earlier and more accurate detection of AD. A thorough review of the ML methods used for AD identification is given, covering supervised learning, unsupervised learning, and deep learning approaches, with attention to noteworthy studies that have applied these techniques to various kinds of data and demonstrated their effectiveness in increasing diagnostic precision.
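
As an illustration of the supervised-learning baseline that such reviews survey before the deep models, here is a minimal sketch on synthetic, made-up "biomarker" vectors (the feature values, class means, and the nearest-centroid rule are assumptions for this example, not data or methods from any cited study).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, illustrative 3-feature "biomarker" vectors (e.g. stand-ins for
# hippocampal volume, cortical thickness, a CSF measure). Values are invented.
n_per_class = 50
healthy = rng.normal(loc=[3.5, 2.6, 0.9], scale=0.2, size=(n_per_class, 3))
ad      = rng.normal(loc=[2.8, 2.1, 1.4], scale=0.2, size=(n_per_class, 3))

X = np.vstack([healthy, ad])
y = np.array([0] * n_per_class + [1] * n_per_class)

# Nearest-centroid classifier: predict the class whose mean feature vector
# is closest to the sample -- the simplest supervised-learning baseline.
centroids = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(samples):
    dists = np.linalg.norm(samples[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

accuracy = (predict(X) == y).mean()
```

Real AD pipelines replace both the hand-picked features and the centroid rule: deep models learn representations directly from MRI, PET, or multimodal data, which is the advance this review tracks.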

Key-Words / Index Term: Alzheimer’s Disease, Machine Learning, Deep Learning, Neuroimaging, Memory Loss, Early Diagnosis, Disease Detection.

References

      1. Klöppel, S., Stonnington, C.M., Chu, C., Draganski, B., Scahill, R.I., Rohrer, J.D., ... & Frackowiak, R.S.J. (2008). Automatic classification of MR scans in Alzheimer's disease. Brain, 131(3), 681-689. doi:10.1093/brain/awm319
      2. DSiDC, MRI Scan of Brain – Alzheimer’s disease, https://dementia.ie/lessons/mri-scan-of-brain-alzheimers-disease/. Accessed 20 August 2024.
      3. Davatzikos, C., Bhatt, P., Shaw, L. M., Batmanghelich, K. N., & Trojanowski, J. Q. (2011). Prediction of MCI to AD conversion via MRI, CSF biomarkers, and pattern classification. Neurobiology of Aging, 32(12), 2322.e19-2322.e27. doi:10.1016/j.neurobiolaging.2010.05.023
      4. Suk, H. I., Lee, S. W., & Shen, D. (2014). Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis. NeuroImage, 101, 569-582. doi:10.1016/j.neuroimage.2014.06.077
      5. Payan, A., & Montana, G. (2015). Predicting Alzheimer's disease: a neuroimaging study with 3D convolutional neural networks. arXiv preprint arXiv:1502.02506.
      6. Duchesne, S., Caroli, A., Geroldi, C., Collins, D.L., & Frisoni, G.B. (2010). Relating one-year cognitive change in mild cognitive impairment to baseline MRI features. NeuroImage, 47(4), 1363-1370. doi:10.1016/j.neuroimage.2009.12.077
      7. Lebedev, A. V., Westman, E., Van Westen, G. J., Kramberger, M. G., Lundervold, A., Aarsland, D., & Soininen, H. (2014). Random forest ensembles for detection and prediction of Alzheimer's disease with good between-cohort robustness. NeuroImage: Clinical, 6, 115-125. doi:10.1016/j.nicl.2014.08.023
      8. Vemuri, P., Gunter, J. L., Senjem, M. L., Whitwell, J. L., Kantarci, K., Knopman, D. S., ... & Jack, C. R. (2011). Alzheimer's disease diagnosis in individual subjects using structural MR images: validation studies. NeuroImage, 56(2), 829-837. doi:10.1016/j.neuroimage.2010.06.065
      9. Zhang, D., Wang, Y., Zhou, L., Yuan, H., & Shen, D. (2011). Multimodal classification of Alzheimer's disease and mild cognitive impairment. NeuroImage, 55(3), 856-867. doi:10.1016/j.neuroimage.2011.01.008
      10. Jie, B., Zhang, D., Cheng, B., Shen, D., & Alzheimer's Disease Neuroimaging Initiative. (2015). Manifold regularized multitask feature learning for multimodality disease classification. Human Brain Mapping, 36(2), 489-507. doi:10.1002/hbm.22641
      11. Liu, M., Zhang, D., Adeli, E., & Shen, D. (2018). Deep multivariate networks for multi-class classification with application to Alzheimer's disease. NeuroImage, 145, 253-268. doi:10.1016/j.neuroimage.2016.01.042
      12. Hosseini-Asl, E., Keynton, R., & El-Baz, A. (2016). Alzheimer's disease diagnostics by adaptation of 3D convolutional network. arXiv preprint arXiv:1607.06583.
      13. Lipton, Z.C., Kale, D.C., Elkan, C., & Wetzel, R. (2016). Learning to diagnose with LSTM recurrent neural networks. arXiv preprint arXiv:1511.03677.
      14. Eshaghi, A., Young, A.L., Marinescu, R.V., Firth, N.C., Prados, F., Cardoso, M.J., ... & Alexander, D.C. (2018). Progression of regional grey matter atrophy in multiple sclerosis. Brain, 141(6), 1665-1677. doi:10.1093/brain/awy088
      15. Suk, H.I., & Shen, D. (2013). Deep learning-based feature representation for AD/MCI classification. MICCAI 2013, 583-590. doi:10.1007/978-3-642-40763-5_72
      16. Gupta, Y., Lama, R. K., Kwon, G. R., & Alzheimer's Disease Neuroimaging Initiative. (2019). Ensemble sparse feature learning for Alzheimer’s disease diagnosis. Frontiers in Neuroscience, 13, 1070. doi:10.3389/fnins.2019.01070
      17. Misra, C., Fan, Y., & Davatzikos, C. (2009). Baseline and longitudinal patterns of brain atrophy in MCI patients, and their use in prediction of short-term conversion to AD: results from ADNI. NeuroImage, 44(4), 1415-1422. doi:10.1016/j.neuroimage.2008.10.031
      18. Eskildsen, S. F., Coupe, P., Fonov, V. S., Pruessner, J. C., Collins, D. L., & Alzheimer's Disease Neuroimaging Initiative. (2013). Structural imaging biomarkers of Alzheimer's disease: predicting disease progression. Neurobiology of Aging, 34(10), 2464-2477. doi:10.1016/j.neurobiolaging.2013.04.001
      19. Gray, K. R., Aljabar, P., Heckemann, R. A., Hammers, A., & Rueckert, D. (2013). Random forest-based similarity measures for multi-modal classification of Alzheimer’s disease. NeuroImage, 65, 167-175. doi:10.1016/j.neuroimage.2012.09.065
      20. Benzinger, T. L., Blazey, T., Jack, C. R., Koeppe, R. A., Su, Y., Xiong, C., ... & Bateman, R. J. (2013). Regional variability of imaging biomarkers in autosomal dominant Alzheimer's disease. PNAS, 110(47), E4502-E4509. doi:10.1073/pnas.1317918110
      21. Lunnon, K., Keohane, A., Pidsley, R., Newhouse, S., Riddoch-Contreras, J., Thubron, E.B., ... & Lovestone, S. (2017). Methylomic profiling implicates cortical deregulation of ANK1 in Alzheimer’s disease. Nature Neuroscience, 20(9), 1164-1172. doi:10.1038/nn.4597
      22. Escott-Price, V., Myers, A. J., Huentelman, M., & Hardy, J. (2017). Polygenic risk score analysis of pathologically confirmed Alzheimer disease. Annals of Neurology, 82(2), 311-314. doi:10.1002/ana.24999
      23. Kunkle, B. W., Grenier-Boley, B., Sims, R., Bis, J. C., Damotte, V., Naj, A. C., ... & Lambert, J. C. (2019). Genetic meta-analysis of diagnosed Alzheimer’s disease identifies new risk loci and implicates Aβ, tau, immunity and lipid processing. Nature Genetics, 51(3), 414-430. doi:10.1038/s41588-019-0358-2
      24. Moradi, E., Pepe, A., Gaser, C., Huttunen, H., & Tohka, J. (2015). Machine learning framework for early MRI-based Alzheimer's conversion prediction in MCI subjects. NeuroImage, 104, 398-412. doi:10.1016/j.neuroimage.2014.10.002
      25. Sarraf, S., & Tofighi, G. (2016). Deep AD: Alzheimer's disease classification via deep convolutional neural networks using MRI and fMRI. bioRxiv, 070441. doi:10.1101/070441
      26. Liu, S., Liu, S., Cai, W., Che, H., Pujol, S., Kikinis, R., ... & Feng, D. (2015). Multimodal neuroimaging feature learning for multiclass diagnosis of Alzheimer's disease. IEEE Transactions on Biomedical Engineering, 62(4), 1132-1140. doi:10.1109/TBME.2014.2372011
      27. Gray, K.R., Wolz, R., Heckemann, R.A., Aljabar, P., & Rueckert, D. (2012). Regional analysis of FDG-PET for use in the classification of Alzheimer's disease. Journal of Alzheimer's Disease, 34(3), 409-421. doi:10.3233/JAD-111446
      28. Vieira, S., Pinaya, W.H.L., & Mechelli, A. (2017). Using deep learning to investigate the neuroimaging correlates of psychiatric and neurological disorders: Methods and applications. Neuroscience & Biobehavioral Reviews, 74, 58-75. doi:10.1016/j.neubiorev.2017.01.002

Citation

Pragya Tripathi, Ritu Prasad, "Advances in Alzheimer's Disease Detection: A Review of Current Methods," International Journal of Scientific Research in Technology & Management, Vol.4, Issue.4, pp.07-12, 2024.

Visual Effects (VFX) Using Deep Learning: A Comprehensive Review

Arun Pratap Singh, Sanjay Kumar Sharma

Research Paper | Journal Paper

Vol.4, Issue.4, pp.13-18, Dec-2024

Abstract

Visual effects (VFX) have evolved into a crucial component of modern entertainment, enabling filmmakers, game developers, and content creators to achieve visuals that transcend physical constraints. Traditional pipelines, grounded in computer graphics and manual artistry, often demand extensive effort and resources. The emergence of deep learning has introduced a paradigm shift, allowing for data-driven automation, photorealistic rendering, and intelligent scene manipulation. Deep learning models such as convolutional neural networks (CNNs), generative adversarial networks (GANs), transformers, diffusion models, and neural radiance fields (NeRFs) have reshaped workflows in areas such as object removal, background replacement, motion capture, super-resolution, style transfer, and text-to-video synthesis. This paper provides a comprehensive review of deep learning in VFX, consolidating advances in architectures, datasets, evaluation methods, and real-world applications. Key challenges—such as temporal consistency, computational overhead, dataset scarcity, and ethical concerns—are analyzed, while emerging research directions including multimodal control, efficient generative modeling, and real-time deployment are highlighted.
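
Of the generative families named in the abstract, diffusion models admit a particularly compact illustration: in DDPM (Ho et al., 2020) the forward noising process has a closed form, so a sample at any timestep t can be drawn directly from the clean data. The NumPy sketch below uses a standard linear beta schedule; array sizes and variable names are illustrative choices, not from any cited implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Linear beta schedule over T steps, as in DDPM; sizes here are toy.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)   # cumulative signal-retention factors

def noise_to_step(x0, t, eps):
    """Closed-form forward process: x_t = sqrt(a_bar_t)*x0 + sqrt(1-a_bar_t)*eps."""
    a = alpha_bars[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps

x0 = rng.normal(size=(4, 8, 8))        # stand-in for a small image batch
eps = rng.normal(size=x0.shape)        # the Gaussian noise to be mixed in

early = noise_to_step(x0, 10, eps)     # still dominated by the data
late = noise_to_step(x0, T - 1, eps)   # almost pure Gaussian noise
```

A trained denoiser learns to reverse this corruption step by step; the cost of those many reverse steps is exactly the sampling-speed problem that the distillation work cited below addresses.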

Key-Words / Index Term: Deep learning, visual effects, VFX, generative adversarial networks, transformers, diffusion models, neural rendering, NeRF, computer vision.

References

    1. Lin, T. Y., et al. (2014). Microsoft COCO: Common Objects in Context. ECCV.
    2. Deng, J., et al. (2009). ImageNet: A Large-Scale Hierarchical Image Database. CVPR.
    3. Perazzi, F., et al. (2016). A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation. CVPR.
    4. Karras, T., et al. (2018). Progressive Growing of GANs for Improved Quality, Stability, and Variation. ICLR.
    5. Chang, A. X., et al. (2015). ShapeNet: An Information-Rich 3D Model Repository. arXiv:1512.03012.
    6. Deitke, M., et al. (2023). Objaverse: A Universe of Annotated 3D Objects. CVPR.
    7. Berthelot, D., et al. (2017). BEGAN: Boundary Equilibrium Generative Adversarial Networks. arXiv:1703.10717.
    8. Autodesk Maya, Adobe After Effects (Software). Industry-standard VFX and compositing tools.
    9. Horn, B. K., & Schunck, B. G. (1981). Determining Optical Flow. Artificial Intelligence.
    10. Krizhevsky, A., Sutskever, I., & Hinton, G. (2012). ImageNet Classification with Deep Convolutional Neural Networks (AlexNet). NeurIPS.
    11. Goodfellow, I., et al. (2014). Generative Adversarial Nets. NeurIPS.
    12. Wu, Z., et al. (2015). 3D ShapeNets: A Deep Representation for Volumetric Shapes. CVPR.
    13. Berthelot, D., et al. (2017). Unsupervised Learning for Image Synthesis Using GANs. arXiv.
    14. Chan, C., et al. (2019). Everybody Dance Now. ICCV.
    15. Long, J., Shelhamer, E., & Darrell, T. (2015). Fully Convolutional Networks for Semantic Segmentation. CVPR.
    16. Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI.
    17. He, K., et al. (2017). Mask R-CNN. ICCV.
    18. Karras, T., et al. (2019). StyleGAN: A Style-Based Generator Architecture for Generative Adversarial Networks. CVPR.
    19. Wang, X., et al. (2018). ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. ECCV Workshops.
    20. Dosovitskiy, A., et al. (2020). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT). ICLR.
    21. Arnab, A., et al. (2021). ViViT: A Video Vision Transformer. ICCV.
    22. Ho, J., Jain, A., & Abbeel, P. (2020). Denoising Diffusion Probabilistic Models. NeurIPS.
    23. Song, J., et al. (2021). Denoising Diffusion Implicit Models. ICLR.
    24. Mildenhall, B., et al. (2020). NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. ECCV.
    25. Wang, Z., et al. (2004). Image Quality Assessment: From Error Visibility to Structural Similarity (SSIM). IEEE Transactions on Image Processing.
    26. Zhang, R., et al. (2018). The Unreasonable Effectiveness of Deep Features as a Perceptual Metric (LPIPS). CVPR.
    27. Heusel, M., et al. (2017). GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium (FID). NeurIPS.
    28. Zhou, T., et al. (2018). Temporal Consistency Metrics for Video Prediction. ECCV.
    29. Howard, A., et al. (2017). MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv:1704.04861.
    30. Yu, J., et al. (2019). Free-Form Image Inpainting with Gated Convolution. ICCV.
    31. Chen, L. C., et al. (2018). DeepLab: Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. TPAMI.
    32. Cao, Z., et al. (2017). Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields. CVPR.
    33. Thies, J., et al. (2016). Face2Face: Real-Time Face Capture and Reenactment of RGB Videos. CVPR.
    34. Ledig, C., et al. (2017). Photo-Realistic Single Image Super-Resolution Using a GAN (SRGAN). CVPR.
    35. Singer, A., et al. (2022). Make-A-Video: Text-to-Video Generation without Text-Video Data. arXiv:2209.14792.
    36. Torralba, A., & Efros, A. A. (2011). Unbiased Look at Dataset Bias. CVPR.
    37. Strubell, E., Ganesh, A., & McCallum, A. (2019). Energy and Policy Considerations for Deep Learning in NLP. ACL.
    38. Geirhos, R., et al. (2020). Shortcut Learning in Deep Neural Networks. Nature Machine Intelligence.
    39. Chesney, R., & Citron, D. (2019). Deep Fakes: A Looming Challenge for Privacy, Democracy, and National Security. California Law Review.
    40. Ramesh, A., et al. (2022). Hierarchical Text-Conditional Image Generation with CLIP Latents (DALL·E 2). arXiv:2204.06125.
    41. Meng, C., et al. (2023). Distillation of Diffusion Models for Fast Sampling. ICLR.
    42. Chen, T., et al. (2020). A Simple Framework for Contrastive Learning of Visual Representations (SimCLR). ICML.
    43. Pumarola, A., et al. (2021). D-NeRF: Neural Radiance Fields for Dynamic Scenes. CVPR.
    44. Floridi, L., et al. (2018). AI4People—An Ethical Framework for a Good AI Society. Minds and Machines.

Citation

Arun Pratap Singh, Sanjay Kumar Sharma, "Visual Effects (VFX) Using Deep Learning: A Comprehensive Review," International Journal of Scientific Research in Technology & Management, Vol.4, Issue.4, pp.13-18, 2024.

Handwritten Text Recognition using Deep Learning Algorithms

Arun Pratap Singh, Amit Saxena

Research Paper | Journal Paper

Vol.4, Issue.4, pp.19-23, Dec-2024

Abstract

Handwritten Text Recognition (HTR) is a long-standing problem in computer vision and pattern recognition, aiming to automatically transcribe handwritten documents into machine-readable text. Traditional approaches relied on handcrafted features and rule-based techniques, but these methods struggled with diverse writing styles, noise, and contextual ambiguity. With the advent of deep learning, architectures such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, and Transformers have significantly advanced recognition accuracy. This paper reviews deep learning-based HTR approaches, datasets, evaluation metrics, and applications while highlighting challenges and future research opportunities.
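
The decoding step that pairs with the CNN/RNN encoders surveyed here can be sketched in a few lines. Under Connectionist Temporal Classification (CTC), greedy decoding takes the argmax label at each timestep, collapses consecutive repeats, and drops the blank symbol. The toy alphabet and probability matrix below are invented for the example; a real HTR model would produce the per-timestep scores.

```python
import numpy as np

BLANK = 0
ALPHABET = {1: "c", 2: "a", 3: "t"}   # toy label set for the sketch

def ctc_greedy_decode(log_probs):
    """log_probs: (timesteps, labels) array. Returns the collapsed string.

    Collapse rule: emit a label only when it differs from the previous
    timestep's label, and never emit the blank.
    """
    best = log_probs.argmax(axis=1)
    decoded = []
    prev = None
    for label in best:
        if label != prev and label != BLANK:
            decoded.append(ALPHABET[label])
        prev = label
    return "".join(decoded)

# Ten timesteps whose argmax path is: c c - a a a - t t -
path = [1, 1, 0, 2, 2, 2, 0, 3, 3, 0]
probs = np.full((10, 4), -5.0)          # low log-probability everywhere
probs[np.arange(10), path] = 0.0        # peak on the intended path

print(ctc_greedy_decode(probs))         # prints "cat"
```

The blank symbol is what lets CTC represent genuinely doubled letters (e.g. "ll") by separating two runs of the same label, which is why it cannot simply be omitted from the label set.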

Key-Words / Index Term: Handwritten Text Recognition (HTR), Deep Learning, CNN, RNN, LSTM, Transformer, OCR.

References

    1. Graves, A., Liwicki, M., Fernández, S., Bertolami, R., Bunke, H., & Schmidhuber, J. (2009). A novel connectionist system for unconstrained handwriting recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(5), 855–868.
    2. LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
    3. Shi, B., Bai, X., & Yao, C. (2017). An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(11), 2298–2304.
    4. Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257–286.
    5. Plamondon, R., & Srihari, S. N. (2000). Online and off-line handwriting recognition: a comprehensive survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1), 63–84.
    6. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097–1105.
    7. Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
    8. Graves, A., Fernández, S., Gomez, F., & Schmidhuber, J. (2006). Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. Proceedings of the 23rd International Conference on Machine Learning, 369–376.
    9. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 5998–6008.
    10. Marti, U.-V., & Bunke, H. (2002). The IAM-database: An English sentence database for offline handwriting recognition. International Journal on Document Analysis and Recognition, 5(1), 39–46.
    11. Grosicki, E., & El Abed, H. (2009). ICDAR 2009 handwriting recognition competition. International Conference on Document Analysis and Recognition, 1398–1402.
    12. Stutz, H., & Bunke, H. (2018). The Bentham dataset: Historical handwritten manuscripts. International Journal on Document Analysis and Recognition, 21(1), 77–89.
    13. Vasudevan, V., et al. (2019). Transformer-based handwriting recognition on complex scripts. Pattern Recognition Letters, 123, 45–52.
    14. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
    15. Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. International Conference on Learning Representations.
    16. Kleber, F., Fiel, S., & Sablatnig, R. (2013). CVL handwriting dataset. International Conference on Document Analysis and Recognition, 560–564.
    17. Al-Maadeed, S., et al. (2010). KHATT: Arabic handwritten text database. International Journal on Document Analysis and Recognition, 13(2), 59–68.
    18. Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8), 707–710.
    19. Povey, D., et al. (2008). Word error rate and performance evaluation in speech and text recognition. ICASSP, 4449–4452.
    20. Navarro, G. (2001). A guided tour to approximate string matching. ACM Computing Surveys, 33(1), 31–88.
    21. Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). BLEU: a method for automatic evaluation of machine translation. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, 311–318.
    22. Fogel, I., & Gelbukh, A. (2016). Digitization of historical archives using deep learning for handwriting recognition. International Journal on Document Analysis and Recognition, 19(3), 145–157.
    23. Jain, A. K., & Singh, R. (2008). Automated bank cheque processing. Pattern Recognition, 41(12), 3527–3537.
    24. Ma, X., et al. (2019). Handwriting recognition in healthcare records. Journal of Biomedical Informatics, 94, 103188.
    25. Le, H., et al. (2018). Smart classrooms: Automated exam evaluation using HTR. Educational Technology Research and Development, 66(6), 1425–1442.
    26. Nayef, A., et al. (2021). Multilingual OCR systems: Advances and challenges. Pattern Recognition Letters, 145, 132–145.
    27. Sivasankaran, A., & Kumar, S. (2017). Variability in handwriting styles and its impact on recognition. Pattern Recognition Letters, 94, 10–18.
    28. Bluche, T., & Messina, R. (2017). Deep neural networks for handwritten text recognition in low-resource datasets. International Conference on Document Analysis and Recognition, 35–40.
    29. Singh, A., et al. (2020). Handwriting recognition for low-resource Indic scripts. Pattern Recognition, 107, 107463.
    30. Fischer, A., et al. (2012). Recognition of historical documents: Challenges and methods. Pattern Recognition, 45(9), 3151–3163.
    31. Ahmed, F., et al. (2021). Privacy and ethical considerations in digitizing handwritten documents. ACM Computing Surveys, 54(3), 1–30.
    32. Moysset, B., et al. (2019). Multilingual and cross-lingual handwriting recognition. International Journal on Document Analysis and Recognition, 22(1), 35–49.
    33. Shi, X., et al. (2020). Self-supervised learning for handwriting recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 34(07), 12112–12119.
    34. Kang, L., et al. (2018). Integration of multimodal cues for improved HTR performance. Pattern Recognition Letters, 111, 30–37.
    35. Liu, Y., et al. (2022). Efficient models for real-time handwritten text recognition on edge devices. IEEE Transactions on Neural Networks and Learning Systems, 33(5), 2013–2026.
    36. Guidotti, R., et al. (2018). A survey of methods for explaining black box models in AI, with applications to HTR. ACM Computing Surveys, 51(5), 1–42.

Citation

Arun Pratap Singh, Amit Saxena, "Handwritten Text Recognition using Deep Learning Algorithms," International Journal of Scientific Research in Technology & Management, Vol.4, Issue.4, pp.19-23, 2024.