THE ROLE OF ETL (EXTRACT-TRANSFORM-LOAD) PIPELINES IN SCALABLE BUSINESS INTELLIGENCE: A COMPARATIVE STUDY OF DATA INTEGRATION TOOLS

Authors

  • Danish Mahmud, Master of Science in Information Technology (MSIT), Washington University of Science and Technology, Alexandria, VA 22314, USA
  • Md. Zafor Ikbal, Master of Science in Information Technology, Washington University of Science and Technology, VA, USA

DOI:

https://doi.org/10.63125/1spa6877

Keywords:

ETL, Business Intelligence, ELT, Data Governance, Cloud Integration

Abstract

This study systematically reviews the role of Extract–Transform–Load (ETL) pipelines in scalable business intelligence (BI), with particular emphasis on their evolution, tool ecosystems, performance optimization, and global governance implications. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework, a total of 63 studies were identified, screened, and synthesized from academic databases and grey literature. The findings reveal that ETL pipelines, once predominantly batch-oriented, have expanded into ELT and streaming paradigms, enabled by cloud-native warehouses and distributed architectures. Across the reviewed literature, data quality, metadata management, and lineage emerge as central requirements for BI reliability, extending beyond technical efficiency to encompass governance, compliance, and accountability. Comparative analyses contrast the distinct strengths of commercial platforms such as Informatica, IBM DataStage, and Microsoft SSIS with the flexibility and cost-effectiveness of open-source frameworks including Talend, Pentaho, and Apache NiFi. Cloud-native services such as AWS Glue, Azure Data Factory, and Google Cloud Dataflow are shown to embed scalability and governance into serverless pipelines, while technologies such as Apache Spark and Delta Lake provide ACID-compliant lakehouse capabilities for enterprise analytics. The review also demonstrates how global governance frameworks, including GDPR, CCPA, and guidance from the OECD and UNCTAD, necessitate embedding compliance into ETL processes through metadata, lineage, and documentation. Overall, the study concludes that ETL pipelines are not merely technical workflows but socio-technical infrastructures that sustain BI scalability, institutional trust, and regulatory legitimacy in global data environments.

Published

2022-04-29

How to Cite

Danish Mahmud, & Md. Zafor Ikbal. (2022). THE ROLE OF ETL (EXTRACT-TRANSFORM-LOAD) PIPELINES IN SCALABLE BUSINESS INTELLIGENCE: A COMPARATIVE STUDY OF DATA INTEGRATION TOOLS. ASRC Procedia: Global Perspectives in Science and Scholarship, 2(1), 89–121. https://doi.org/10.63125/1spa6877