INTEGRATION OF ARTIFICIAL INTELLIGENCE AND DEVOPS IN SCALABLE AND AGILE PRODUCT DEVELOPMENT: A SYSTEMATIC LITERATURE REVIEW ON FRAMEWORKS
DOI: https://doi.org/10.63125/exyqj773

Keywords: DevOps, Artificial Intelligence, MLOps, AIOps, CI/CD, Microservices, Progressive Delivery, Site Reliability Engineering, DevSecOps, Model Registry

Abstract
This systematic literature review examines how artificial intelligence is integrated with DevOps to enable scalable and agile product development across organizational and technical contexts. Following a registered, PRISMA-guided protocol, we searched peer-reviewed and selected industry sources through 2021, applied transparent eligibility criteria, and extracted evidence on architectures, lifecycle coverage, platform capabilities, governance and security controls, and outcomes. We developed a taxonomy that distinguishes reference architectures, lifecycle and process models, and pipeline or platform frameworks, and mapped each to DevOps stages and AI/ML capabilities. The final corpus comprised 115 studies, which we synthesized using descriptive evidence mapping, thematic integration, and quality-weighted aggregation of reported effects. Findings show that most frameworks concentrate integration in the build, test, and release stages, where AI augments CI/CD with data validation, predictive test selection, and change-risk analysis; fewer extend into deploy and operate, where progressive delivery keyed to service-level objectives and AIOps support anomaly detection and triage; upstream learning loops and requirements intelligence appear less frequently still. Where quantified, reported outcomes indicate improvements in throughput, reliability, and recovery time when AI is embedded within disciplined engineering practices supported by microservices, cloud elasticity, model registries, feature stores, observability, and policy-as-code. Governance and security are most effective when treated as first-class pipeline concerns rather than afterthoughts. Limitations include heterogeneous study designs, uneven measurement depth, and sparse evidence on closed-loop retraining and supply-chain integrity for data and models. The review contributes a reusable taxonomy, coverage heatmaps, and integration patterns to inform both research and practice.