HIGH-PERFORMANCE COMPUTING ARCHITECTURES FOR TRAINING LARGE-SCALE TRANSFORMER MODELS IN CYBER-RESILIENT APPLICATIONS
DOI: https://doi.org/10.63125/6zt59y89

Keywords: High-Performance Computing, Transformer Models, Cyber Resilience, Distributed Training, Federated Learning

Abstract
The rapid growth of transformer-based architectures has fundamentally reshaped the landscape of artificial intelligence, enabling breakthroughs across domains such as natural language processing, cybersecurity analytics, and autonomous decision systems. Training and deploying these models at scale, however, demands robust high-performance computing (HPC) infrastructure capable of managing massive computational loads, distributed data pipelines, and cyber-resilient workflows. This systematic review examines the convergence of HPC technologies and large-scale transformer model training, with a particular focus on security and fault-tolerant applications. A total of 126 peer-reviewed studies published between 2018 and 2022 were analyzed using structured screening and thematic synthesis techniques. The review identifies emerging trends in GPU- and TPU-based parallelism, memory optimization strategies, model sharding techniques, and secure distributed training frameworks that enhance system resilience against data breaches and adversarial interference. It also discusses advances in federated learning architectures, hybrid cloud–edge HPC environments, and AI-driven workload orchestration that collectively contribute to cyber-resilient computational ecosystems. The findings highlight a growing emphasis on integrating fault tolerance, encryption-aware resource allocation, and autonomous security auditing into large-scale model training workflows. This work provides a consolidated framework for understanding how high-performance computing infrastructures are evolving to support the scalability, reliability, and security of transformer models in mission-critical and cyber-sensitive environments.