HIGH-PERFORMANCE COMPUTING ARCHITECTURES FOR TRAINING LARGE-SCALE TRANSFORMER MODELS IN CYBER-RESILIENT APPLICATIONS
DOI: https://doi.org/10.63125/6zt59y89

Keywords: High-Performance Computing, Transformer Models, Cyber Resilience, Distributed Training, Federated Learning

Abstract
The rapid growth of transformer-based architectures has fundamentally reshaped the landscape of artificial intelligence, enabling breakthroughs across domains such as natural language processing, cybersecurity analytics, and autonomous decision systems. Training and deploying these models at scale, however, demands robust high-performance computing (HPC) infrastructure capable of managing massive computational loads, distributed data pipelines, and cyber-resilient workflows. This systematic review examines the convergence of HPC technologies and large-scale transformer model training, with a particular focus on security and fault-tolerant applications. A total of 126 peer-reviewed studies published between 2018 and 2022 were analyzed using structured screening and thematic synthesis techniques. The review identifies emerging trends in GPU- and TPU-based parallelism, memory optimization strategies, model sharding techniques, and secure distributed training frameworks that enhance system resilience against data breaches and adversarial interference. It also discusses advances in federated learning architectures, hybrid cloud–edge HPC environments, and AI-driven workload orchestration that collectively contribute to cyber-resilient computational ecosystems. The findings highlight a growing emphasis on integrating fault tolerance, encryption-aware resource allocation, and autonomous security auditing into large-scale model training workflows. This work provides a consolidated framework for understanding how high-performance computing infrastructures are evolving to support the scalability, reliability, and security of transformer models in mission-critical and cyber-sensitive environments.