Frequently Asked Questions

What is CIFTS?


CIFTS (Coordinated Infrastructure for Fault Tolerant Systems) is an initiative to develop an open-source "Fault Tolerance Backplane" that improves fault tolerance in large systems through a coordinated approach, integrating the fault-tolerance features of the individual software components.

It aims to provide a standard framework through which software at all levels of the HPC stack (applications, libraries, operating systems, networking protocols, file systems, and so on) can share information about faults occurring in its domain, giving the other software in the system an opportunity to act proactively rather than reactively to system-wide faults.


What is FTB?


FTB stands for the Fault Tolerance Backplane. The FTB forms the core of CIFTS and provides a scalable messaging layer and a portable API through which different software components exchange information. Using the FTB API, a software component can connect to the FTB, subscribe to the fault events it wants to receive from other components, and publish its own fault information.
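
As a rough illustration of the subscribe-and-publish pattern described above, the following Python sketch models a backplane inside a single process. It is not the FTB API: the names ToyBackplane, FaultEvent, subscribe, and publish, and the event name IO_NODE_DOWN, are hypothetical and chosen only for this example. The real FTB is a distributed messaging layer with its own API.

```python
from collections import defaultdict
from typing import Callable, NamedTuple


class FaultEvent(NamedTuple):
    """A fault notification; all field names are assumptions for this sketch."""
    source: str    # component that published the event, e.g. "pvfs2"
    name: str      # event name that subscribers match on
    severity: str  # e.g. "info", "warning", "fatal"
    payload: str   # free-form details about the fault


class ToyBackplane:
    """In-process stand-in for a fault-event backplane (hypothetical, not FTB)."""

    def __init__(self) -> None:
        # Maps an event name to the callbacks registered for it.
        self._subscribers = defaultdict(list)

    def subscribe(self, event_name: str,
                  callback: Callable[[FaultEvent], None]) -> None:
        """Register a callback for every event published under event_name."""
        self._subscribers[event_name].append(callback)

    def publish(self, event: FaultEvent) -> int:
        """Deliver the event to matching subscribers; return how many got it."""
        callbacks = self._subscribers.get(event.name, [])
        for callback in callbacks:
            callback(event)
        return len(callbacks)


# A job scheduler subscribes to file-system faults, then the file system
# publishes one. The scheduler could react by avoiding the failed node.
backplane = ToyBackplane()
received = []
backplane.subscribe("IO_NODE_DOWN", received.append)
delivered = backplane.publish(
    FaultEvent("pvfs2", "IO_NODE_DOWN", "fatal", "I/O node 17 unreachable")
)
```

The point of the pattern is that the publisher does not need to know which components are listening: the file system reports the fault once, and any subscriber, such as a scheduler or an MPI library, can act on it independently.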


Which organizations and individuals are involved with the CIFTS effort?


The CIFTS effort is led by the following organizations:

  • Argonne National Laboratory
  • Indiana University
  • Lawrence Berkeley National Laboratory
  • Oak Ridge National Laboratory
  • Ohio State University
  • University of Tennessee, Knoxville

The individual participants are given in the Participants List.

What software is currently being made FTB-compatible?


The initial scope of the FTB project spans the following software components:

  • Middleware (MPI) -- MPICH2, MVAPICH2, Open MPI, LAM/MPI
  • InfiniBand-enabled Networking Software
  • Parallel File Systems -- PVFS2
  • Job Schedulers and Resource Managers -- Cobalt
  • Operating Systems -- ZeptoOS
  • Checkpoint/Restart -- BLCR
  • Math Libraries -- ScaLAPACK
  • Applications -- SWIM, LAMMPS


What platforms are supported by FTB?


The currently targeted platforms are:

  • Linux systems (Ubuntu Hardy, Feisty)
  • IBM Blue Gene series (BG/L, BG/P)
  • Cray XTs (Cray XT4)


How can I get started with using FTB?


To get started with FTB, read the instructions at: http://wiki.mcs.anl.gov/cifts/index.php/Getting_Started_with_FTB


How is FTB licensed?


FTB is licensed under the BSD license.

Whom can I contact to get more information on CIFTS?


You can send an email to the cifts_discuss@googlegroups.com mailing list.


Can I collaborate with the CIFTS team?


Absolutely! We look forward to working with collaborators who wish to enhance the Fault Tolerance Backplane (FTB), port FTB to their systems, make their software FTB-enabled, or simply run FTB on their systems. We would like to hear about your experiences with FTB and your suggestions for improving it.