Minutes for FTB conference call - 2010 Jan 20th
- Argonne National Lab: Rinku
- Oak Ridge National Lab: Hoony, Thomas
- Ohio State University: Sonia
- Indiana University: Abhishek
- University of Tennessee: Absent
- Lawrence Berkeley National Lab: Paul
- Hoony attended the CSM Summit and presented a demo of the system monitoring tool. He spoke about FTB and CIFTS and discussed how FTB could be used with CSM to broadcast system information. He plans to follow up through regular conference calls.
- The system monitoring tool is being used in-house on the Kraken system.
- Hoony to share the RAS event reporting tool with ANL (Pending action item)
- Aniruddha to provide information to the ANL Cobalt folks on scenarios in which applications and the job scheduler should work together. This information will be emailed to Narayan once Aniruddha and David finalize it. (Pending action item)
- Nothing new to report from the FTB point of view. OSU is working on process migration in MVAPICH.
- Provided an update on SC'09 activities. Spoke to Cray folks about BLCR and FTB. From the BLCR perspective, Cray plans to use BLCR for its future systems as well. No clear commitment toward CIFTS.
- Public beta of the latest BLCR in February 2010. This is a feature-driven release of BLCR (features include compression, I/O mitigation, and incremental checkpointing, among others). No changes in BLCR from the FTB point of view. BLCR will not be subscribing to events.
- Paul would like feedback on which events BLCR could throw that would be useful to other components. (Pending action item for all component developers)
- IU is interested in starting a conversation about how to standardize MPI FTB events, in particular their payloads. Abhishek to start an offline conversation with the MPICH and MVAPICH folks about this.
- Hoony to share the RAS event reporting tool
- Aniruddha to provide Cobalt fault scenarios (Refer to items discussed for more information)
- IU to start email conversation with MPICH and MVAPICH folks about standardizing MPI FTB events and payloads
- Everyone to get back to Paul if they want their components to interact with BLCR through specific BLCR-published FTB events.