Frequently Asked Questions
What is CIFTS?
CIFTS (Coordinated Infrastructure for Fault Tolerant Systems) is an initiative that aims to define the Fault Tolerance Backplane (FTB) API specification, which enables software to exchange fault-related information on large systems and to act in a coordinated manner to achieve comprehensive, holistic fault tolerance. It aims to provide a standardized interface through which software at all levels of the HPC stack (applications, libraries, operating systems, networking protocols, file systems, etc.) can share information about faults occurring in its domain, giving other software in the system an opportunity to act proactively instead of reactively to system-wide faults.
In addition to the FTB API specification, the CIFTS team is working on an implementation of the specification, called the FTB software. Various members of the team also work on other popular software such as MPICH2, MVAPICH2, Open MPI, Cobalt, and BLCR, and are integrating these packages with the FTB software.
What is the FTB API specification?
FTB stands for the Fault Tolerance Backplane. The FTB API specification is an interface specification that allows different software on a system to communicate and exchange fault-related information.
FTB API version 0.5 is the latest FTB API specification.
The CIFTS team is currently working on the FTB API 1.0 specification.
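To illustrate the general idea of software components exchanging fault-related information through a shared backplane, here is a minimal in-process sketch. It is not the FTB API: the names ToyBackplane, FaultEvent, subscribe, and publish, as well as the event names and severity strings, are hypothetical and chosen only for this example. Please refer to the FTB API specification for the actual interface.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class FaultEvent:
    """Hypothetical fault event; the fields are illustrative, not from the FTB spec."""
    source: str        # component that observed the fault, e.g. "file_system"
    name: str          # event name chosen for this example
    severity: str      # e.g. "warning" or "fatal"
    payload: str = ""  # free-form details


class ToyBackplane:
    """Toy publish/subscribe hub keyed by severity (an assumption of this sketch)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[
            str, List[Callable[[FaultEvent], None]]
        ] = defaultdict(list)

    def subscribe(self, severity: str, callback: Callable[[FaultEvent], None]) -> None:
        # Register a callback for every event published with this severity.
        self._subscribers[severity].append(callback)

    def publish(self, event: FaultEvent) -> int:
        # Deliver the event to matching subscribers and return how many received it.
        callbacks = self._subscribers.get(event.severity, [])
        for callback in callbacks:
            callback(event)
        return len(callbacks)


backplane = ToyBackplane()
received: List[FaultEvent] = []

# A job scheduler (for example) asks to be told about fatal faults.
backplane.subscribe("fatal", received.append)

# A file system reports a fatal fault; an MPI library reports a warning.
delivered = backplane.publish(
    FaultEvent("file_system", "io_node_down", "fatal", "node=17")
)
ignored = backplane.publish(FaultEvent("mpi", "link_retry", "warning"))
```

In this sketch the subscriber learns about the file-system fault without either component knowing about the other directly, which is the kind of decoupled exchange a backplane is meant to provide. A real implementation would also have to handle distribution across nodes, which this example leaves out.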
Which organizations and individuals are involved with the CIFTS effort?
The CIFTS effort is led by the following organizations:
- Argonne National Laboratory
- Indiana University
- Lawrence Berkeley National Laboratory
- Oak Ridge National Laboratory
- Ohio State University
- University of Tennessee, Knoxville
Click for Participants List
What is CIFTS FTB software?
The CIFTS FTB software is an implementation of the FTB API. The current FTB software is fully compatible with the FTB API 0.5 specification and provides a scalable messaging layer for different software to exchange information.
Details of the design of the FTB software can be found in the publication titled CIFTS: A Coordinated Infrastructure for Fault-Tolerant Systems, which was published at the International Conference on Parallel Processing (ICPP) in 2009.
What platforms are supported by the FTB software?
The current targeted platforms are:
- Linux systems (Ubuntu Hardy, Feisty)
- IBM Blue Gene series (BG/L, BG/P)
- Cray XTs (Cray XT4)
How can I get started with using the FTB software?
To get started with FTB, read the instructions at: http://wiki.mcs.anl.gov/cifts/index.php/Getting_Started_with_FTB
How is the FTB software licensed?
FTB is licensed under the BSD license. Click for Detailed license
What software is currently being made FTB-compatible?
The initial scope of the FTB project spans the following software components:
- Middleware (MPI) -- MPICH2, MVAPICH2, Open MPI, LAM/MPI
- InfiniBand-enabled Networking Software
- Parallel File Systems -- PVFS2
- Job Scheduler and Resource Managers -- Cobalt
- Operating Systems -- ZeptoOS
- Checkpoint/Restart -- BLCR
- Math Libraries -- ScaLAPACK
- Applications -- SWIM, LAMMPS
Are there other implementations of the FTB API available?
In addition to the FTB software, some of our collaborators are working on porting the FTB API to the OpenSAF framework and to the AMQP protocol framework.
Who can I contact to get more information on CIFTS?
You can send an email to the cifts_discuss@googlegroups.com (public) or cifts@googlegroups.com (private) mailing list.
The following links provide more information on CIFTS:
- Website : http://www.mcs.anl.gov/research/cifts/
- SVN : https://svn.mcs.anl.gov/repos/cifts/
- TRAC : http://trac.mcs.anl.gov/projects/cifts/wiki
Can I collaborate with the CIFTS team?
Absolutely! We look forward to working with collaborators who wish to enhance the Fault Tolerance Backplane (FTB), port FTB to their systems, make their software FTB-enabled, or simply run it on their systems. We would like to hear about your experiences with FTB and your suggestions on how we can improve it.