Empirical System Reliability

April 16, 2007
Time: 11:00am-12:00pm
InterSchool Lab, 7th floor CEPSR
Hosted by: Department of Computer Science
Speaker: Bianca Schroeder, Carnegie Mellon University

Abstract

My research focuses on the design and implementation of computer systems. The methods I use draw on a broad array of disciplines, including performance modeling and analysis, workload and fault characterization, machine learning, and scheduling and queueing theory. My work spans several areas of computer systems, including high-performance computing systems, web servers, computer networks, database systems and storage systems.

My PhD thesis work focused on scheduling to improve the performance of web servers and databases and to provide differentiated Quality of Service.

Currently, I am very interested in "empirical system reliability". This new line of research is motivated by the fact that, with the ever-growing component count in large-scale IT systems, component failures are quickly becoming the norm rather than the exception. Yet virtually no data on failures in real systems is publicly available, forcing researchers to base their work on anecdotes and back-of-the-envelope calculations rather than empirical data. The goal of my work is to collect and analyze failure data from real, large-scale production systems and to exploit the results for better system design and management.

For a brief overview of some of the projects I have worked on, see the following project web pages:

Speaker Biography

I'm a post-doc at CMU working with Garth Gibson. I finished my PhD in August 2005 at CMU under the guidance of Mor Harchol-Balter. I am currently on the job market looking for an academic position. Please take a look at my application materials.

My research focuses on the design, implementation and performance evaluation of computer systems. The methods I use draw on a broad array of disciplines, including performance modeling and analysis, workload and fault characterization, machine learning, and scheduling and queueing theory. My work spans several areas of computer systems, including high-performance computing systems, web servers, computer networks, database systems and storage systems.

