Minutes for FTB conference call - 2007 May 29th
Revision as of 16:09, 31 May 2007 by Rgupta@mcs.anl.gov
- Argonne National Lab: Rinku Gupta, Pete Beckman, Susan Coghlan, Narayan Desai, Rob Ross, Rajeev Thakur
- Lawrence Berkeley National Lab: Paul Hargrove
- Oak Ridge National Lab: Al Geist, Aniruddha Shet
- Indiana University: Tim Prins
- Ohio State University: Qi Gao, Abhinav Vishnu
- University of Tennessee: George Bosilca
- Progress Reports
- A single consolidated progress report will be created. Everyone to send Rinku a 1-page report on their fault-tolerance work by Monday, June 4th.
- Presentation Walk-through: Important items discussed
- Mapping between event categories and software component categories - It was suggested that we use the same event categories across all components, since some errors/warnings belonging to different components will be similar in nature.
- Mechanism needed to assign affinity between faults that are thrown by different components but occur due to a common failure.
- Mechanism needed to provide a single aggregate response to the different errors reported for a common failure.
- 'Event scoping and grouping' was discussed. Event grouping will be important for implementors who wish to use FTB in a self-contained way within their product. In addition, some faults/warnings may not need to be propagated beyond the local system, which establishes a need for events that are local in scope. Given the complexity of this topic, it was jointly decided to discuss it at a later stage in the design cycle.
- OpenMPI folks to provide input/ideas on any fault-tolerance-specific features, based on experience gained with OpenMPI.
- Face-to-face All-Hands meeting
- This will take place in Salt Lake City on a weekend in July.
- Cray representatives to be invited.
- Component owners to work on the schemas for their components. Rinku to send out a sample.
- Owners to send a 1-pager update on their FTB work
June 12th 2007
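
Illustrative sketch (not part of the discussion): the affinity and aggregate-response items raised in the presentation walk-through could look roughly like the Python below. The minutes do not specify any mechanism, so everything here is an assumption made for illustration: the `Event` fields, the function names, and the heuristic that events sharing a node and a category within a short time window stem from a common failure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # All field names are hypothetical; they are not taken from any FTB schema.
    component: str    # reporting component, e.g. "mpi" or "fs"
    category: str     # event category shared across all components
    node: str         # where the fault was observed
    timestamp: float  # seconds


def group_by_affinity(events, window=5.0):
    """Group events assumed to share a common failure.

    Assumed heuristic: same node and category, and no more than
    `window` seconds after the previous event in the same group.
    """
    groups = []
    open_group = {}  # (node, category) -> index of the most recent group
    for ev in sorted(events, key=lambda e: e.timestamp):
        key = (ev.node, ev.category)
        idx = open_group.get(key)
        if idx is not None and ev.timestamp - groups[idx][-1].timestamp <= window:
            groups[idx].append(ev)
        else:
            groups.append([ev])
            open_group[key] = len(groups) - 1
    return groups


def aggregate_response(group):
    """Build one summary line for a group of related events."""
    components = sorted({ev.component for ev in group})
    first = group[0]
    return f"{first.category} on {first.node} reported by {', '.join(components)}"


events = [
    Event("mpi", "network", "n1", 0.0),
    Event("fs", "network", "n1", 1.0),
    Event("mpi", "network", "n2", 1.5),
    Event("scheduler", "network", "n1", 20.0),
]
groups = group_by_affinity(events)
for g in groups:
    print(aggregate_response(g))
```

Using one category set for all components, as suggested in the walk-through, is what lets a simple key such as (node, category) match events from different components. A real design would likely need a richer notion of affinity than a fixed time window.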