Authors: Annmary Justine (Hewlett Packard Enterprise), Aalap Tripathy (Hewlett Packard Enterprise), Revathy Venkataramanan (Hewlett Packard Enterprise), Sergey Serebryakov (Hewlett Packard Enterprise), Martin Foltin (Hewlett Packard Enterprise), Cong Xu (Hewlett Packard Enterprise), Suparna Bhattacharya (Hewlett Packard Enterprise), Paolo Faraboschi (Hewlett Packard Enterprise)
Abstract: Developing trustworthy AI models often requires significant effort, resources, and energy. Available tools focus on optimizing individual AI pipeline stages but lack end-to-end optimization and do not reuse historical experience from similar pipelines. This leads to excessive resource consumption from running unnecessary AI experiments with poor data or parameters. Hewlett Packard Labs is developing a novel Self-Learning Data Foundation for AI infrastructure that captures and learns from AI pipeline metadata to optimize the pipelines themselves. We show examples of how the Data Foundation helps AI practitioners: i) enable reproducibility, audit trails, and incremental model development across distributed sites spanning edge, High Performance Computing, and cloud (e.g., in particle trajectory reconstruction and autonomous microscopy computational steering), ii) reduce the number of AI model training experiments by initializing AutoML training runs from historical experience, and iii) track resource consumption and carbon footprint through the different stages of the AI lifecycle, enabling energy-aware pipeline optimizations. Extending the visibility of pipeline metadata beyond training to inference and retraining provides insight into end-to-end tradeoffs between runtime, accuracy, and energy efficiency. The Data Foundation is built on the open-source Common Metadata Framework, which can be integrated with third-party workflow management, experiment tracking, data versioning, and storage backends.
Paper: PDF