Welcome to the CIFTS Wiki.
The Coordinated Infrastructure for Fault Tolerant Systems (CIFTS) initiative aims to develop an open-source API specification for a "Fault Tolerance Backplane" and to build the infrastructure needed to let systems adapt to faults in a holistic manner. The focus is on improving fault tolerance in large systems through a coordinated approach: integrating the various fault tolerance features of the software components present in a high-end computing system, from top-level applications through middleware to the file system and operating system. Such integration will make possible a level of fault prediction, notification, management, and recovery that is impossible today but critical to the productive use of the high-end petascale systems of tomorrow.
The objectives of this effort are as follows:
- Design an open-source reference implementation of a fault awareness and notification backplane that provides common, uniform event handling and notification mechanisms for fault-aware libraries and middleware
- Create a public interface specification that allows libraries, run-time systems, and applications to connect to and use the fault tolerance backplane
- Extend key libraries and applications to validate the interface choices and to form the critical mass necessary for adoption in the community