HyScaleFlow: An ML-Driven DAG-Based Orchestration Framework for Real-Time Stream Processing in Hybrid Cloud Environments
Abstract
The increasing complexity of real-time data processing across hybrid cloud and edge environments has revealed significant limitations in existing distributed stream processing systems. While frameworks like Apache Spark and Flink offer strong scalability and performance, they lack the orchestration intelligence required to adapt to dynamic workloads, anticipate failures, and optimize resource usage in heterogeneous environments. Traditional rule-based or reactive orchestration approaches fail to deliver the responsiveness and fault resilience needed for mission-critical applications in domains such as IoT analytics, smart infrastructure, and cyber-physical systems. To address these challenges, this paper presents HyScaleFlow, a scalable and modular framework that integrates real-time stream processing with machine learning–driven orchestration. The architecture combines Apache Spark (at the edge) and Apache Flink (in the cloud) with a hybrid DAG-based orchestration strategy using Apache Airflow and Dagster. A key innovation is the FlowGuard module, which uses two XGBoost models, a classifier to predict node failures and a regressor to forecast resource load, both trained on telemetry metrics exported via Prometheus. These predictions dynamically inform DAG execution, enabling preemptive scaling, container migration, and workload-aware task routing. The framework was evaluated on the NYC Taxi Trip dataset (over 1.1 billion records) using a hybrid cloud testbed deployed with Docker and Kubernetes. Results show that HyScaleFlow improves DAG completion rates by 16.8%, reduces task retry rates by over 60%, and shortens fault recovery times by up to 40%. Additionally, the framework achieves a 19.5% reduction in cloud execution cost and a 35.9% gain in resource efficiency. By unifying predictive intelligence with stream processing, HyScaleFlow demonstrates strong utility for real-time, data-intensive applications. It provides a replicable, cost-effective, and resilient solution for hybrid cloud data engineering, advancing the state of intelligent orchestration.
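As an illustration of how failure and load predictions might inform DAG execution, the following minimal sketch maps a per-node prediction to one of the orchestration actions named in the abstract. It is not the authors' FlowGuard implementation: the names `NodePrediction`, `Action`, `choose_action`, and `plan`, as well as all threshold values, are hypothetical, and the two numeric fields merely stand in for the outputs of an XGBoost classifier and regressor.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Orchestration actions (hypothetical labels for illustration)."""
    MIGRATE = "migrate_containers"
    SCALE_OUT = "scale_out"
    ROUTE_TO_CLOUD = "route_to_cloud"
    RUN_LOCAL = "run_local"


@dataclass(frozen=True)
class NodePrediction:
    node: str
    failure_prob: float   # stand-in for a classifier output in [0, 1]
    forecast_load: float  # stand-in for a regressor output (fraction of capacity)


def choose_action(p, fail_threshold=0.7, load_high=0.85, load_warn=0.6):
    """Pick an action for one node; thresholds are assumed, not from the paper."""
    if p.failure_prob >= fail_threshold:
        return Action.MIGRATE          # likely failure: move work off the node
    if p.forecast_load >= load_high:
        return Action.SCALE_OUT        # forecast saturation: add capacity early
    if p.forecast_load >= load_warn:
        return Action.ROUTE_TO_CLOUD   # moderate pressure: send new tasks elsewhere
    return Action.RUN_LOCAL


def plan(predictions):
    """Build a node -> action mapping that a DAG branch step could consume."""
    return {p.node: choose_action(p) for p in predictions}


if __name__ == "__main__":
    sample = [
        NodePrediction("edge-1", failure_prob=0.9, forecast_load=0.3),
        NodePrediction("edge-2", failure_prob=0.1, forecast_load=0.7),
    ]
    for node, action in plan(sample).items():
        print(node, action.value)
```

Checking failure risk before load reflects one plausible priority ordering, in which an imminent node failure outweighs any load consideration; a real deployment would tune both the ordering and the thresholds against observed telemetry.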
DOI: https://doi.org/10.31449/inf.v49i9.9498
This work is licensed under a Creative Commons Attribution 3.0 License.








