Applications and Programming Environment SIG Report
During the Cray User Group (CUG) 2008 meeting, the Applications and Programming Environment Special Interest Group met on Monday, May 5 at 5:00pm. The 33 attendees included, among others, members from Boeing, Cray, ORNL, DMI, Sandia, HPCX Consortium, Arctic Region Supercomputing Center, National Energy Research Scientific Computing Center, Allinea and CSC. Rolf Rabenseifner chaired the Applications and Programming Environment SIG. Eight members of Cray attended to participate in the discussion and collect user feedback.
After a brief introduction by the Chair, Rolf Rabenseifner of the High Performance Computing Center Stuttgart (HLRS), Cray was given the opportunity to make announcements and provide comments. The goal of the session was to hear from users what was and was not going well with respect to applications and the programming environment. Jef Dawson of Cray stated that Cray was interested in hearing about application issues. He explained that the applications group within Cray works mostly internally, but that it tries to monitor applications used by a large number of customers.
Luiz DeRose of Cray briefly described the Cray Programming Environment and Cray’s focus, which included hiding the complexity of the system from users and making the environment easier to use. He asked for feedback on how Cray could improve in these areas. He noted that he sometimes saw older versions of the software running at customer sites and was concerned that customers were not always aware of the latest releases. To address this, he had created an email list for distributing the latest release information, in addition to the existing channels. He asked users who wished to be added to the list to email him.
Nola Van Vugt of Cray then discussed the current state of Cray documentation and asked for feedback on which types of documentation were most important and useful to users. The goal was to understand what was truly needed so that Cray could streamline its documentation content and processes. She made a six-page questionnaire available for more formal feedback.
The session was then opened to general comments and issues.
The group discussed how many compilers were supported on the XT system. One comment was that there were too many compilers and that a single compiler would be optimal. However, another user noted that several compilers were needed to provide a choice, either to work around stability issues or to find the best performance. The group debated this topic at length. One user asked whether the XT could simply be like the X2 and have one Cray compiler. It was then noted that such a compiler would, of course, need to deliver the best performance and the greatest stability of all the compilers currently available on the system.
The topic then moved to the MPI software. It was stated that three climate codes were running out of MPI resources. These applications were structured similarly: one process sent data to all the other processes. The customer had adjusted environment variables, only to find that a different limit was then exceeded. These applications never ran consistently and suffered from intermittent failures. The user intended to try OpenMPI to see whether resources were exceeded with that implementation as well. The customer understood that the applications were written inefficiently, but stated that it was difficult to get the agencies responsible for the applications to redesign them. A question was asked whether MPT 3.0 provided an MPI environment variable that throttled message flow to prevent resource exhaustion. Members from the site said they would be happy to run a slower application if it consistently ran to completion.
A comment was made that there were too many MPI environment variables. The question was raised whether any older environment variables were no longer needed because of software changes. Questions about how these environment variables were documented prompted the comment that Cray provided an MPI man page (also known as intro_mpi) that covered all of the Cray-specific MPI information. The other MPI man pages were unmodified by Cray and came from MPICH 1.2.
Math libraries were the next topic of discussion. A site representative wanted to use both the Cray-provided PETSc and DOE's PETSc. Both packages were installed at her site, but by default the Cray version was loaded instead of the DOE package, because both modulefiles had the same name. The request was for Cray to rename its PETSc modulefile. The group discussed various options, including ways to use the module infrastructure to explicitly request the DOE PETSc modulefile. Cray noted the comments and said it would look into this further.
The discussion then turned to documentation. Customers commented that they typically went first to the man pages for information, and then to the optimization and tuning guide for the compiler they were using. The consensus was that the more detailed manuals were a primary source of information and were definitely needed.
During the discussions, a request was made for Cray to address the growing number of modulefiles for the XT system. It was asked whether Cray could categorize the modulefiles, reduce their number, or suppress some of them. The feeling was that the number of modulefiles on the system was becoming difficult to manage.
A request was also made for a module "undo last switch" feature. When switching compilers, for example, a user runs 'module swap a b'. The undo would effectively run 'module swap b a' without the user having to type the entire swap command again. Such a shorthand would be desirable because users switch modules frequently among the different supported compilers.
A comment was raised that memory allocation did not check whether sufficient memory was available. The user noted that an application only discovers that there is not enough memory when it tries to write to the allocated memory. It was noted that David Tanqueray of Cray was aware of this issue and could explain it further if necessary. This was believed to be an OS issue rather than a programming environment issue.
A request was made to provide stack traces on Compute Node Linux (CNL). The user suggested that when an application failed, it would be helpful if Cray could, by default, write a stack trace to stderr along with the error message. The desire was for behavior similar to that of the X1 system.
The session concluded on a positive note with the comments that Porsche did all of its simulations on Cray systems, and that customers really liked the close relationship they have with Cray. The direct exchange of information between users and members of Cray at events such as CUG was highly valued by both parties.
Rolf Rabenseifner (HLRS)
Acting SIG Chair
SIG representatives:
-Rolf Rabenseifner (HLRS) Deputy Chair
-Mark Fahey (ORNL) Previous SIG Chair
-Jef Dawson (Cray) Applications Manager
-Luiz DeRose (Cray) Programming Environments Manager
-Nola Van Vugt (Cray) Documentation Manager
-Heidi Poxon (Cray)