
Intelligent Resource Optimization for Big Data Analytics Systems

Wednesday, 03/27/2024 10:00am to 12:00pm
Zoom
PhD Dissertation Proposal Defense
Speaker: Chenghao Lyu

Big data analytics in the cloud has achieved widespread adoption, revolutionizing data-driven insight discovery and decision-making for businesses and applications. However, unlocking economies of scale depends on configuring analytical jobs efficiently within big data analytics systems, so that the trade-off between performance and cost can be optimized quickly for each user. Existing methods, often heuristic-based or limited to coarse-grained control, face challenges due to three key factors: (i) the complexity of predicting performance amid varying job characteristics and dynamic runtime environments, (ii) the difficulty of developing Pareto-optimal resource optimization solutions with good coverage, efficiency, and consistency, and (iii) the evolution of advanced systems like MaxCompute and Spark, which offer flexible configuration controls and thereby lead to more complex hierarchical or adaptive optimization problems.

This thesis aims to answer two questions: (i) how to build performance models for analytical jobs in big data systems, and (ii) how to efficiently automate resource optimization across various granularities and system settings. My first contribution is a model server designed for diverse data analytics scenarios. It uses a black-box modeling approach for jobs with unknown properties, learning job characteristics and performance in the execution environment. This process is enhanced by an autoencoder and a customized triplet loss, which together disentangle job embeddings from runtime metrics for accurate performance predictions. For SQL jobs with available query plans, the server employs a white-box modeling approach, using a multi-channel input framework to process heterogeneous data structures and learn job performance.

The second contribution is an intelligent resource optimizer with three components, each targeting a different optimization setting. The first component is my enhancement of an existing unified data analytics optimizer through a custom gradient-based solver, boosting its efficiency within a multi-objective optimization framework. The second component is a stage-level resource optimization method designed for a production-scale scheduler, mapping millions of computation tasks to machines and resource profiles in under a second. The last component is an adaptive, multi-granularity framework for Spark SQL that tunes parameters at both compile time and runtime and integrates with adaptive query execution in Spark.
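
To make the notion of Pareto optimality in a multi-objective setting concrete, the following is a minimal illustrative sketch and not the optimizer described in the abstract. It assumes two objectives to minimize, latency and cost, and a small hypothetical set of candidate configurations; the names `dominates`, `pareto_front`, and `candidates` are invented for this example.

```python
from typing import List, Tuple

# Hypothetical candidates: (latency_seconds, cost_dollars) for one job
# under different resource configurations. Both objectives are minimized.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b on both objectives
    and strictly better on at least one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep every point that no other point dominates (brute force, O(n^2))."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative values only, not measurements from any system.
candidates = [(10.0, 5.0), (8.0, 7.0), (12.0, 4.0), (9.0, 9.0), (8.0, 6.0)]
front = pareto_front(candidates)
print(front)
```

Each configuration left in `front` represents a different latency-cost trade-off, and none can be improved on one objective without getting worse on the other. A brute-force filter like this only works for a handful of candidates, which is one reason a dedicated solver is needed at production scale.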

Advisor: Yanlei Diao
