XTreme SIG

The XTreme meeting was held on 4-May-08 at the Scandic Grand Marina Hotel in Helsinki, from 09:00 to 17:00.  Nineteen people attended, representing AWE, CSC, CSCS, Hector/HPCX, NERSC, ORNL, PSC, and Sandia.  Four Cray managers also attended.

The day began with site updates from CSC (Joni Virtanen) and AWE (Dave Goddard).

Then the sites met without Cray in attendance to discuss issues of concern and to prepare for presenting these to Cray management.  Concerns discussed included software release management, the restricted sharing of SPRs, inadequate Cray hardware resources, and abnormal job terminations.  These issues were prioritized, summarized, and presented to Cray management for discussion and consideration.

The day continued with several Cray presentations.
Peter Young from Cray presented:  
An update on the XTreme requirements.  Lustre failover is available in 2.1, but scaling issues remain.  Source RPMs should be improving for 2.0 releases.  Sites still have issues getting the source associated with a specific release.  Node attributes for batch scheduling are in 2.0.
Cray’s long-term strategy.
The basic principles are performance and timely software releases with no regressions.  Cray has had a 40% reduction in SPRs in the past year and has improved software MTTI by 25%.  They plan two significant releases per year: general availability releases in Q2 and Q4, each preceded by a limited availability release one quarter earlier.  Mid-release hardware may be supported with a product-specific release.  For example, XT5 will require the 2.1 HD release.  There will be consolidated and scheduled maintenance releases beginning with XT 2.1.  Service node and compute node software will be released independently.  The goal for the programming environment is to close the gap between observed performance and peak performance.  There will be one major release per year from each group and one programming environment rollup release per year.

Charlie Carroll from Cray presented:
Lustre support and ZFS.  The release of Sun’s Lustre 1.6.5 was imminent, with 1.8 scheduled for September 2008, 2.0 scheduled for December 2008, and 3.0 scheduled for June 2009.  Lustre 2.0 includes ZFS and user-space servers (moving out of kernel space).  Lustre 3.0 provides tools for migrating from ext3 to ZFS (and end-of-life for ext3).  Clustered metadata is turned on.  There was no detail on when 1.8, 2.0, or 3.0 might appear in Cray software.  Sun will end-of-life Lustre 1.4 effective June 2009.  1.6 EOL is December 2009.  This creates a potential problem: XT 2.2 will continue to use the Lustre 1.6 code, but there will be no software updates for it.  Lustre 1.8 should show up in the 2.2 release plan, but currently does not.

ALPS and the Common Administration Infrastructure.  Cray plans to bring back the Catamount features that were scattered with CNL.  ALPS will have an interface to Mazama.  The programming environment tools may have an interface to Mazama as well.  Cray will be expanding node health checks and exchanging that information with Mazama.  Cray will directly manage Cray hardware and will monitor white boxes serving XT functions.  Cray will make information available via SNMP so that Cray systems can be managed by existing data center tools.  The initial implementation may appear in XT 2.2, and six to eight phases are anticipated in total.

Dave Wallace and Charlie Carroll from Cray presented:
Future Requirements for System Administration.

Ann Baker (ORNL)
SIG Chair

Jim Rogers (ORNL)
Deputy

Copyright 2008 All Rights Reserved.