CSE Colloquium: Runtime Verification of Cloud Systems

Abstract:

Distributed systems for cloud infrastructure offer numerous features that users and applications rely on. Because of their massive complexity, all cloud systems experience failures despite extensive efforts to find and eliminate bugs. Many production failures are not simple crashes but partial breakdowns or silent violations that produce no error signals. Runtime verification is a promising approach to addressing such failures and providing strong assurance. However, it requires extensive checkers, which are time-consuming and error-prone to write manually.

In this talk, I will present two solutions that synthesize comprehensive checkers to enable runtime verification for cloud systems. First, I will describe a program reduction technique that automatically constructs custom watchdogs based on system code. Second, I will show an approach that leverages past failures to automatically infer semantic rules from execution traces. These methodologies have been successfully applied to large-scale, widely used distributed systems, where they detect real-world failures. The talk will conclude with a discussion of the open challenges for runtime verification at scale.

Bio:

Ryan Huang is an Associate Professor in the EECS Department at the University of Michigan, Ann Arbor. Prior to that, he was an Assistant Professor at Johns Hopkins University. He leads the OrderLab, which conducts research broadly in computer systems and specializes in designing principled methods to improve the reliability and performance of large system software. His work has received multiple best paper awards at top conferences. He is a recipient of the NSF CAREER Award and a Meta research award.

Additional Information:

Zoom: psu.zoom.us/j/92709766247

 

Media Contact: Timothy Zhu