CUG2014 Final Proceedings | Created 2014-05-26
Birds of a Feather Interactive 3A Chair: Colin McMurtrie (Swiss National Supercomputing Centre)
Birds of a Feather Interactive 3B Chair: Ashley Barker (Oak Ridge National Laboratory)
Birds of a Feather Interactive 3C Chair: Nicholas Cardo (National Energy Research Scientific Computing Center)

Birds of a Feather Interactive 8A Chair: John Hesterberg (Cray Inc.)

System Management
John Hesterberg (Cray Inc.)
Abstract: Open and interactive discussion about system administration and management of Cray systems. Possible topics include experiences with the new Resource Utilization Reporting (RUR) capabilities, the Image Management and Provisioning System (IMPS), Advanced Cluster Engine (ACE), Cray XE upgrades to SLES11 SP3, serial workloads on repurposed compute nodes, problems and best practices for administering large-scale systems, and experiences with OpenStack projects.

Birds of a Feather Interactive 8B Chair: Ashley Barker (Oak Ridge National Laboratory)

Developing Dashboards for HPC Centers to Enable Instantaneous and Informed Decisions to be made at a Glance
Ashley Barker (Oak Ridge National Laboratory)
Abstract: High Performance Computing centers are collecting more data today than ever about every aspect of their operations, including system performance, allocations, completed jobs, users, projects, trouble tickets, etc. It takes significant forethought and resources to turn the system data collected into knowledge. This knowledge in turn is used to make more impactful decisions that can influence everything from policies to purchasing decisions to user satisfaction.

Birds of a Feather Interactive 8C Chair: Vincent Betro (National Institute for Computational Sciences/University of Tennessee)

Creating a United Front in Taking HPC to the Forefront of Research: PRACE and XSEDE Training Partnerships and Roadmaps
Vincent C. Betro (National Institute for Computational Sciences/University of Tennessee)
Abstract: In order for all scientific research to glean the benefits of high performance computing, researchers around the world must be given not only resources but also the tools and training to use those resources effectively. Both the NSF XSEDE project, the Extreme Science and Engineering Discovery Environment, in the United States and PRACE, the Partnership for Advanced Computing in Europe, in the European Union aim to supply both of these necessary elements to researchers. One area in which the two virtual organizations can cooperate very readily is training. Both face the same issues: researchers needing a just-in-time approach to applying the most modern computing resources to their research, the need for both synchronous and asynchronous training across several time zones, the need for language to be removed as a barrier, and the desire to keep the training relevant and up to date. Both the panelists (including Vince Betro from NICS, David Henty from EPCC and Maria Grazia-Giuffreda from CSCS) and participants will discuss the myriad difficulties as well as the opportunities for growth in training programs through partnerships with industry and academia. Participants will be placed into small groups and given an opportunity to create a list of the most necessary training elements to share between organizations, as well as to contribute their knowledge of where many of these resources already exist so they may be cataloged and collected for members of both projects to use in growing their training programs. The resulting information will be disseminated to both projects.

Birds of a Feather Interactive 14A Chair: Jean-Guillaume Piccinali (CSCS)

Parallel Debugging OpenACC/2.0 and OpenMP/4.0 Applications on Hybrid Multicore Systems
Jean-Guillaume Piccinali (Swiss National Supercomputing Centre)
Abstract: Significant increases in the performance of parallel applications have been achieved with hybrid multicore systems (such as GPGPU- and MIC-based systems). In order to improve programmer productivity, directive-based accelerator programming interfaces (such as OpenACC/2.0 and OpenMP/4.0) have been released for incremental development and porting of existing MPI and OpenMP applications. As scientists migrate their applications with these new parallel programming models in mind, they expect a new generation of parallel debugging tools that can seamlessly troubleshoot their algorithms on the current and new architectures. Developers of compilers and debuggers also rely on input from application developers to determine the optimal design for their tools to support the widest range of parallel programming paradigms and accelerated systems.

Birds of a Feather Interactive 14B Chair: Sharif Islam (National Center for Supercomputing Applications)

Zen and the Art of Cray System Administration
Sharif Islam (National Center for Supercomputing Applications)
Abstract: System administrators have a unique and challenging role that requires a comprehensive knowledge of all the different components of the system, beyond just installing and maintaining various pieces of software. System administrators are also the link between users, application developers, storage and network administrators, the help desk, vendors, and documentation writers, among other constituents using and supporting large, complex systems. This BoF will focus on tips, tools, and tricks that help achieve the zen-like comprehensiveness required of system admins in order to discover and solve problems and maintain a functioning, productive system for the users. In previous CUG sessions we have seen presentations focusing on how to interpret and correlate a large volume of log messages (such as Lustre and HSN logs), along with different tools to process the data. The goal of this BoF is to share that knowledge, go beyond it, and exchange novel strategies, ideas, challenges, and discoveries that are part and parcel of day-to-day system administration. Each Cray site may have its own specific setup and issues, but there are underlying methods and techniques that are generic and worth sharing with CUG. Suggested topics: scheduler policies, Lustre tuning and job performance, diagnosing HSN issues and job failures, hardware and console log analysis, and decoding aprun failure codes.

Birds of a Feather Interactive 14C Chair: Mark Fahey (National Institute for Computational Sciences)

Future needs for Understanding User-Level Activity with ALTD
Mark Fahey (National Institute for Computational Sciences)
Abstract: Let's talk real supercomputer analytics: drilling down to the level of individual batch submissions, users, and binaries. And we're not just targeting performance: we're after everything from which libraries and/or individual functions are in demand to preventing the problems that get in the way of successful science. This BoF will bring together those with experience and interest in present and future system tools and technologies that can provide this type of job-level insight, and will be the kickoff meeting for a new Special Interest Group (SIG) for those who want to explore this topic more deeply. Dr. Fahey is the author of ALTD, a tool that reports software and library usage at the individual job level, and principal investigator of a newly funded NSF grant to re-envision the infrastructure as XALT. ALTD is currently deployed at numerous major centers across the United States and Europe.

Birds of a Feather Interactive 15A Chair: Jeff Keopp (Cray Inc.)
Birds of a Feather Interactive 15B Chair: Ian Bird (YarcData, Inc.)

Birds of a Feather Interactive 15C Chair: Timothy W. Robinson (Swiss National Supercomputing Centre)

OpenACC: CUG members' experiences and evolution of the standard
Timothy Robinson (Swiss National Supercomputing Centre)
Abstract: OpenACC is an emerging parallel programming standard designed to simplify the programming of heterogeneous systems, where CPUs are combined with GPUs and/or other accelerator architectures. The API, developed principally by PGI, Cray, NVIDIA and CAPS, follows a directives-based approach to specify loops and/or regions of code for offloading from host to accelerator, providing portability across operating systems, host CPUs and accelerators. The model is particularly attractive to the HPC community because it allows application developers to port existing codes in Fortran, C or C++ without the need for additional programming languages, and without the need to explicitly initiate accelerator startup/shutdown or explicitly manage accelerator memory. OpenACC is currently supported by 18 member organizations – 11 of which are CUG sites and/or Cray partners. The purpose of this BoF is two-fold: first, it is designed to update the user community on recent developments in the OpenACC specification and its future roadmap, including the relationship between OpenACC and other closely related APIs (particularly OpenMP). Second, it will give OpenACC users an opportunity to describe their current experiences and needs for future releases. Discussions on language and construct-related issues will be led by Cray (in conjunction with their partners), while contributions from applications developers will describe the porting of two scientific codes: RAMSES, an AMR code developed in Switzerland for the simulation of galaxy formation, and ICON, a general circulation model developed by the Max Planck Institute for Meteorology and the German Weather Service. Additional contributions will be solicited from CUG member sites.

General Session 4 Chair: Nicholas Cardo (National Energy Research Scientific Computing Center)

CUG Welcome
Nick Cardo (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center)

Supercomputers: instruments for science or dinosaurs that haven't gone extinct yet?
Thomas Schulthess (Swiss National Supercomputing Centre)
Biography: Thomas Schulthess received his PhD in physics from ETH Zurich in 1994. He is a professor for computational physics at ETH Zurich and Director of the Swiss National Supercomputing Centre in Lugano, Switzerland. Thomas holds a visiting distinguished professor appointment at ORNL, where he was a group leader and researcher in computational materials science for over a decade before moving to ETH Zurich in 2008. His current research interests are in the development of efficient and scalable algorithms for the study of strongly correlated quantum systems, as well as electronic structure methods in general. He is also engaged in the development of efficient tools and simulation systems for other domain areas, such as meteorology/climate and geophysics.
Abstract: High-performance computing has dramatically improved scientific productivity over the past 50 years. It turned simulations into a commodity that all scientists can now use to produce knowledge and understanding about the world and the universe, using data from experiments and theoretical models that can be solved numerically. Since the beginnings of electronic computing, supercomputing – loosely defined as the most powerful scientific computing at any given time – has led the way in technology development. Yet the way we interact with supercomputers today has not changed much since the days we stopped using punch cards. I do not claim to understand why, but nevertheless I would like to propose a change in how we develop models and applications that run on supercomputers.

General Session 5 Chair: Nicholas Cardo (National Energy Research Scientific Computing Center)

General Session 9 Chair: Nicholas Cardo (National Energy Research Scientific Computing Center)

Adapting the COSMO weather and climate model for hybrid architectures
Oliver Fuhrer (Federal Office of Meteorology and Climatology MeteoSwiss)
Biography: Dr. Oliver Fuhrer is a senior scientist in the modeling group of the Federal Office of Meteorology and Climatology MeteoSwiss, Zurich. He has over 12 years of experience in the fields of high performance computing, regional climate simulation and numerical weather prediction. He has applied and developed parallel software and conducted research starting on vector machines and later on massively parallel architectures at the Swiss National Supercomputing Centre. Recently, Fuhrer acted as a PI or co-PI on three projects within the Swiss High Performance and High Productivity Computing (HP2C) initiative and the Platform for Advanced Scientific Computing (PASC), and has had the scientific lead in developing a hardware-oblivious and performance-portable implementation of the dynamical core of the COSMO model. These efforts have resulted in an implementation of COSMO capable of running production simulations on hybrid architectures.
Abstract: Higher grid resolution, larger ensembles and the growing complexity of weather and climate models demand ever-increasing compute power. Since 2013, several large hybrid high performance computers that contain traditional CPUs as well as some type of accelerator (e.g. GPUs) have come online and are available to the user community. Early adopters of this technology trend may have considerable advantages in terms of available resources and energy-to-solution. On the downside, a substantial investment is required in order to adapt applications to such accelerator-based supercomputers.

General Session 10 Chair: David Hancock (Indiana University)

Intel: Accelerating Insights … Together With You
Rajeeb Hazra (Intel Corporation)
Biography: Rajeeb Hazra is vice president of the Data Center Group and general manager for the Technical Computing Group at Intel Corporation. He is responsible for driving the business of high-performance computing and workstations, successfully ramping Intel's new Many Integrated Core (MIC) product line in technical computing and other segments, and leading fabric product development. Before assuming his current position, Hazra was the director of Supercomputing Architecture, where he was responsible for strategic research and development engagements with Intel's supercomputing customers. Earlier in his Intel career, he served as the director of the Systems Technology Lab in Intel Labs (then called the Corporate Technology Lab), where he oversaw R&D in hardware and software-related systems technologies. Hazra also previously served as technical assistant to Intel's chief technology officer. He joined Intel in 1994 as a software engineer in Intel Architecture Labs, working on video compression/decompression technologies. Before coming to Intel, Hazra was with the Lockheed Engineering and Sciences Company based at NASA's Langley Research Center. Hazra holds 16 patents in signal processing and has contributed numerous technical publications to refereed technical journals and conferences. He was honored with an Intel Achievement Award in 2000 for breakthroughs in multiple generations of industry-leading video-compression technologies. Hazra earned his bachelor's degree in computer science and engineering from Jadavpur University, India. He also holds a master's degree and a Ph.D. in computer science, both from The College of William & Mary.
Abstract: A major shift is underway in the technical computing industry across technology, business models, and market structure. First, the data explosion – both from the proliferation of devices and the growth of current iterative HPC models – along with the desire to quickly turn it into valuable insight, is driving the demand for both predictive and real-time analytics using HPC. Second, the transition of technical and high performance computing workloads to the cloud has begun, and will accelerate over the next few years as a function of powerful economic, accessibility, usability, and scalability requirements and forces. Third, while the demand for more powerful compute will continue, the challenges associated with storing large volumes of data and then feeding it to the compute engines will spur innovation in storage and interconnect technology. Finally, technology challenges such as energy efficiency will require more attention (and innovation) across all components of the software stack. In this talk, Raj will discuss the current dynamics of the HPC market, how Intel is innovating to address these changing trends – both in the short and long term – while keeping an ecosystem view in mind, and how our collaborations with key partners fit together to enable a complete and affordable solution for the entire HPC ecosystem.

Tech Paper - Filesystems & I/O Technical Session 6B Chair: Scott Michael (Indiana University)

Lustre and PLFS Parallel I/O Performance on a Cray XE6
Brett M. Kettering, Alfred Torrez, David J. Bonnie and David L. Shrader (Los Alamos National Laboratory)
Abstract: Today's computational science demands have resulted in larger, more complex parallel computers. Their PFSes (Parallel File Systems) generally perform well for N-N I/O (Input/Output), but often perform poorly for N-1 I/O. PLFS (Parallel Log-Structured File System) is a PFS layer under development that addresses the N-1 I/O shortcoming without requiring the application to rewrite its I/O. The PLFS concept has been covered in prior papers. In this paper, we focus on an evaluation of PLFS with Lustre underlying it versus Lustre alone on a Cray XE6 system. We observed significant performance increases when using PLFS over these applications' normal N-1 I/O implementations. While some work remains to make PLFS production-ready, it shows great promise to provide an application- and file-system-agnostic means of allowing programmers to use the N-1 I/O model and obtain near N-N I/O model performance without maintaining custom I/O implementations.

Addressing Emerging Issues of Data at Scale
Keith Miller (DataDirect Networks)
Abstract: As the storage provider powering over two-thirds of the world's fastest supercomputers, DataDirect Networks (DDN) is uniquely positioned to deliver solutions for the emerging data-centric computing era.

I/O Router Placement and Fine-Grained Routing on Titan to Support Spider II
Matthew A. Ezell (Oak Ridge National Laboratory), David A. Dillow (N/A) and Sarp Oral, Feiyi Wang, Devesh Tiwari, Don E. Maxwell, Dustin Leverman and Jason Hill (Oak Ridge National Laboratory)
Abstract: The Oak Ridge Leadership Computing Facility (OLCF) introduced the concept of fine-grained routing in 2008 to improve I/O performance between the Jaguar supercomputer and Spider, the center-wide Lustre file system. Fine-grained routing organizes I/O paths to minimize congestion. Jaguar has since been upgraded to Titan, providing more than a ten-fold improvement in peak performance. To support the center's increased computational capacity and I/O demand, the Spider file system has been replaced with Spider II. Building on the lessons learned from Spider, an improved method for placing LNET routers was developed and implemented for Spider II. The fine-grained routing scripts and configuration have been updated to provide additional optimizations and better match the system setup. This paper presents a brief history of fine-grained routing at OLCF, an introduction to the architectures of Titan and Spider II, methods for placing routers in Titan, and details about the fine-grained routing configuration.

Tech Paper - Filesystems & I/O Technical Session 7B Chair: Sharif Islam (National Center for Supercomputing Applications)

Cray Data Management Platform: Cray Lustre File System Monitoring
Jeff Keopp and Harold Longley (Cray Inc.)
Abstract: The Cray Data Management Platform (formerly Cray External Services Systems) provides two external Lustre File System products – CLFS and Sonexion. The CLFS Lustre Monitor (esfsmon) keeps the CLFS file systems available by providing automated failover of Lustre assets in the event of MDS and/or OSS node failures. The Lustre Monitoring Toolkit (LMT) is now part of the standard ESF software release used by CLFS systems.

Tuning and Analyzing Sonexion Performance
Mark S. Swan (Cray Inc.)
Abstract: This paper will present performance analysis techniques that Cray uses with Sonexion-based file systems. Topics will include Lustre client-side tuning parameters, Lustre server-side tuning parameters, the Lustre Monitoring Toolkit (LMT), Cray modifications to IOR, file fragmentation analysis, OST fragmentation analysis, and Sonexion-specific information.
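
The N-1 access pattern that the Lustre/PLFS paper above evaluates can be pictured with a short MPI-IO sketch, in which every rank writes a disjoint block of one shared file. This is purely illustrative (the file name and block size below are hypothetical) and is not the authors' benchmark code:

/* Illustrative N-1 pattern: N ranks write disjoint blocks of one shared
 * file through MPI-IO. PLFS transparently remaps this kind of workload
 * onto N-N log-structured I/O underneath. */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_BYTES (4 * 1024 * 1024)   /* 4 MiB per rank (hypothetical) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *buf = malloc(BLOCK_BYTES);
    memset(buf, rank & 0xff, BLOCK_BYTES);      /* dummy payload */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "shared.out",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank targets its own offset within the single shared file. */
    MPI_Offset offset = (MPI_Offset)rank * BLOCK_BYTES;
    MPI_File_write_at_all(fh, offset, buf, BLOCK_BYTES, MPI_BYTE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}

Under a conventional parallel file system all N ranks contend for the same file; PLFS is designed to absorb exactly this pattern without changes to code like the above.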

Building an Enterprise class HPC storage system - Performance, Reliability and Management
Torben Kling Petersen (Xyratex)
Abstract: As deployed HPC storage systems continue to grow in size, performance needs and complexity, a fully integrated and tested HPC storage solution becomes essential. Balanced system and environment components are crucial, including tuned software stacks, integrated management systems, faster rebuilds of RAID subsystems, and the detection and maintenance of data integrity. This presentation covers developments in delivering a scalable HPC storage solution that addresses the most challenging problems for HPC users today.

Tech Paper - Filesystems & I/O Technical Session 12A Chair: Tina Declerck (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory)

Cray Lustre Roadmap
Cory Spitz and Tom Sherman (Cray Inc.)
Abstract: Cray deploys 'value-added' Lustre offerings with modifications and enhancements over base Lustre releases. In order to supply the best technology, Cray has chosen to integrate upstream feature releases that may not have a community-supported maintenance branch. With that, we carefully plan the integration of upstream community features. This paper will discuss how the community features will map to Cray releases and how we facilitate them via our participation with the community and OpenSFS. It will also highlight Cray-specific features and enhancements that will allow our Lustre offerings to fully exploit the scale and performance inherent to Cray HPC compute products. In addition, this paper will discuss the upgrade paths from prior Cray Lustre software releases. That discussion will explain the migration process from 1.8.x-based software to Cray's current 2.x-based offerings, including details of support for 'legacy' so-called direct-attached Lustre for XE/XK that had previously been announced as EOL.

Cray's Tiered Adaptive Storage: An engineered system for large scale archives, tiered storage and beyond
Craig Flaskerud and Scott Donoho (Cray Inc.), Harriet Coverston (Versity Inc.) and Nathan Schumann (Cray Inc.)
Abstract: A technical overview of Cray® Tiered Adaptive Storage (TAS) and its capabilities, with emphasis on the Cray TAS system architecture, scalability options and manageability. We will also cover specific details of the Cray TAS architecture and configuration options. In addition, the Cray TAS software stack will be covered in detail, including specific innovations that differentiate Cray TAS from other archive systems. We will also characterize the primary scalability factors of both the hardware and software layers of Cray TAS, as well as note the ease of Cray TAS integration with Lustre via the HSM capabilities available in Lustre 2.5.

Clearing the Obstacles to Backup PiB-Size Filesystems
Andy Loftus and Alex Parga (National Center for Supercomputing Applications/University of Illinois)
Abstract: How does computer game design relate to backups? What makes backups of a 2-PiB filesystem so hard? This paper answers these questions by taking a look at the major roadblocks to backing up PiB-size filesystems and offers a software solution to accomplish the task. It turns out that the design of an event management system for computer games is well suited to the task of backups. As for the challenges of scaling a backup system to a 2-PiB filesystem, the solution is to take advantage of all the parallelism that necessarily exists in a system of this size. Learn the details of how this software is built and help decide if it should become an open source project.

Tech Paper - Filesystems & I/O Technical Session 13A Chair: Tina Declerck (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory)

Using Resource Utilization Reporting to Collect DVS Usage Statistics
Tina Butler (National Energy Research Scientific Computing Center)
Abstract: In recent releases of the Cray Linux Environment, a feature called Resource Utilization Reporting (RUR) has been introduced. RUR is designed as an extensible framework for collecting usage and monitoring statistics from compute nodes on a per-application basis. This paper will discuss the installation and configuration of RUR, and the design and implementation of a custom RUR plugin for collecting DVS client-side statistics on compute nodes.

Toward Understanding Congestion Protection Events on Blue Waters Via Visual Analytics
Robert Sisneros and Kalyana Chadalavada (National Center for Supercomputing Applications/University of Illinois)
Abstract: For a system of the scale of Blue Waters, it is of primary importance to minimize high-speed network (HSN) congestion. We hypothesize that the ability to analyze the HSN in a system-wide manner will aid in the detection of network traffic patterns, thereby providing a clearer picture of HSN congestion. The benefit of this is obvious: we want to eliminate, or at least minimize, HSN congestion, and we have a better chance of doing so with a more complete understanding. To this end we have developed a visual analytics tool for viewing system-wide traffic patterns. Specifically, we employ a simple representation of Blue Waters' torus network to visually show congested areas of the network. In this work we will describe the development of this tool and demonstrate its potential uses.

I/O performance on Cray XC30
Zhengji Zhao (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory), Doug Petesch and David Knaak (Cray Inc.) and Tina Declerck (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory)
Abstract: Edison is NERSC's newest petascale Cray XC30 system. Edison has three Lustre file systems deployed on Cray Sonexion storage systems. During the Edison acceptance test period, we measured the system I/O performance on a dedicated system with the IOR benchmark code from the NERSC-6 benchmark suite. After the system entered production, we observed a significant I/O performance degradation for some tests even on a dedicated system. While some performance change is expected due to file system fragmentation and system software and hardware changes, some of the performance degradation was more than expected. In this paper, we analyze the I/O performance we observed on Edison, focusing on understanding the performance change over time. We will also present what we have done to resolve the major performance issue. Ultimately, we want to detect and monitor I/O performance issues proactively, to effectively mitigate I/O performance variance on a production system.

Tech Paper - Filesystems & I/O Technical Session 16B Chair: Scott Michael (Indiana University)

Performance Analysis of Filesystem I/O using HDF5 and ADIOS on a Cray XC30
Ruonan Wang (ICRAR, UWA), Christopher J. Harris (iVEC, UWA) and Andreas Wicenec (ICRAR, UWA)
Abstract: The Square Kilometre Array telescope will be one of the world's largest scientific instruments, and will provide an unprecedented view of the radio universe. However, to achieve its goals the Square Kilometre Array telescope will need to process massive amounts of data through a number of signal and image processing stages. For example, at the correlation stage SKA-Low Phase 1 will produce terabytes of data per second, and significantly more in the second phase. The use of shared filesystems, such as Lustre, between these stages provides the potential to simplify these workflows. This paper investigates writing correlator output to the Lustre filesystem of a Cray XC30 using the HDF5 and ADIOS high performance I/O APIs. The results compare the performance of the two APIs, and identify key parameter optimisations for the application, the APIs and the Lustre configuration.

Fan-In Communication On A Cray Gemini Interconnect
Terry Jones and Bradley Settlemyer (Oak Ridge National Laboratory)
Abstract: Using the Cray Gemini interconnect as our platform, we present a study of an important class of communication operations: the fan-in communication pattern. By its nature, fan-in communications form 'hot spots' that present significant challenges for any interconnect fabric and communication software stack. Yet despite the inherent challenges, these communication patterns are common in both applications (which often perform reductions and other collective operations that include fan-in communication, such as barriers) and system software (where they assume an important role within parallel file systems and other components requiring high-bandwidth or low-latency I/O). Our study determines the effectiveness of differing client-server fan-in strategies. We describe fan-in performance in terms of aggregate bandwidth in the presence of varying degrees of congestion, as well as several other key attributes. Comparison numbers are presented for the Cray Aries interconnect. Finally, we provide recommended communication strategies based on our findings.

HPC's Pivot to Data
Suzanne Parete-Koon (Oak Ridge National Laboratory), Jason Hick (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center), Jason Hill (Oak Ridge National Laboratory), Shane Canon (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center), Blake Caldwell (Oak Ridge National Laboratory), David Skinner (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center), Christopher Layton (Oak Ridge National Laboratory), Eli Dart (Lawrence Berkeley National Laboratory) and Jack Wells, Hai Ah Nam, Daniel Pelfrey and Galen Shipman (Oak Ridge National Laboratory)
Abstract: Computer centers such as NERSC and OLCF have traditionally focused on delivering computational capability that enables breakthrough innovation in a wide range of science domains. Accessing that computational power has required services and tools to move the data from input and output to computation and storage. A "pivot to data" is occurring in HPC. Data transfer tools and services that were previously peripheral are becoming integral to scientific workflows. Emerging requirements from high-bandwidth detectors, high-throughput screening techniques, highly concurrent simulations, increased focus on uncertainty quantification, and an emerging open-data policy posture toward published research are among the data drivers shaping the networks, file systems, databases, and overall HPC environment. In this paper we explain the pivot to data in HPC through user requirements and the changing resources provided by HPC, with particular focus on data movement. For WAN data transfers we present the results of a study of network performance between centers.

Tech Paper - Filesystems & I/O Technical Session 17B Chair: Nicholas Cardo (National Energy Research Scientific Computing Center)

The Value of Tape and Tiered Adaptive Storage
Steve Mackey (Spectra Logic)
Abstract: High performance environments require peak performance from computing equipment, including storage. Spectra's T-Series libraries help push the boundaries of operational objectives, giving cost-effective storage that meets all of your performance, growth, and environmental needs.

Using Robinhood to Purge Lustre Filesystems
Tina Declerck (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center)
Abstract: NERSC purges local scratch filesystems to ensure end-user usability and availability along with filesystem reliability. This is accomplished through quotas, and by destructively purging files that are older than a specified period. This paper will describe in detail how our new purge mechanism was developed and deployed based upon Robinhood's capabilities. The actual purge operation is a separate step to ensure data and metadata are consistent before a destructive purge operation takes place, since the state of the filesystem may have changed during Robinhood's sampling period. Other details of the purge will also be included, such as how long the purge takes, an analysis of the data being purged and its effect on the overall process, and work done to improve the time required to purge. Finally, we discuss the issues that we encountered and what was done to resolve them.

Site Overview: ECMWF
Oliver Treiber (European Centre for Medium-Range Weather Forecasts)
Abstract: The European Centre for Medium-Range Weather Forecasts (www.ecmwf.int) is an intergovernmental organisation supported by 34 states, providing operational medium- and extended-range weather forecasts alongside access to its supercomputing facilities for scientific research. ECMWF is located in Reading, UK.

Tech Paper - PE & Applications Technical Session 6C Chair: Abhinav S. Thota (Indiana University)

The Cray Programming Environment: Current Status and Future Directions
Luiz DeRose (Cray Inc.)
Abstract: The scale of current and future high-end systems, as well as increasing system software and architecture complexity, brings a new set of challenges for application developers. In order to close the gap between observed and achievable performance on current and future supercomputer systems, application developers need a programming environment that can hide the issues of scale and complexity of high-end HPC systems. In this talk I will present the recent activities and future directions of the Cray Programming Environment, which are being developed and deployed according to Cray's adaptive supercomputing strategy, focusing on maximizing performance and programmability, as well as easing porting and tuning efforts, to improve users' productivity on Cray supercomputers.

New Functionality in the Cray Performance Analysis and Porting Tools
Heidi Poxon (Cray Inc.)
Abstract: The Cray performance analysis and porting tools are set on an evolutionary path to help address application performance challenges associated with the next generation of HPC systems. This toolset provides key porting, performance measurement and analysis functionality needed when parallelizing codes for better use of new processors, and when tuning codes that run on Cray multi-core and hybrid computing systems. The recent focus of the Cray tools has been on ease of use and more intuitive user interfaces, as well as on access to more information available from processors. This paper describes new functionality including AMD L3 cache counter support on Cray XE systems, a new GPU timeline for Cray systems with GPUs, additional OpenMP parallelization assistance through Reveal, and power metrics on Cray XC systems.

It's all about the applications: how system owners and application developers can get more out of their Cray
David Lecomber (Allinea Software)
Abstract: Supercomputers are designed for the purpose of generating the results that enable scientific understanding and progress.

Tech Paper - PE & Applications Technical Session 7C Chair: Rolf Rabenseifner (High Performance Computing Center Stuttgart)

Scalability Analysis of Gleipnir: A Memory Tracing and Profiling Tool, on Titan
Tomislav Janjusic, Christos Kartsaklis and Dali Wang (Oak Ridge National Laboratory)
Abstract: Understanding application performance properties is facilitated by various performance profiling tools. The scope of profiling tools varies in complexity, ease of deployment, profiling performance, and the detail of profiled information. Specifically, using profiling tools for performance analysis is a common task when optimizing and understanding scientific applications on complex and large-scale systems such as Cray's XK7. Gleipnir is a memory tracing tool built as a plug-in for the Valgrind instrumentation framework. The goal of Gleipnir is to provide fine-grained trace information. The generated traces are a stream of executed memory transactions mapped to internal structures per process, thread, function, and finally the data structure or variable. This paper describes the performance characteristics of Gleipnir on the Titan Cray XK7 system when instrumenting large applications such as the Community Earth System Model.

Debugging scalable hybrid and accelerated applications on the Cray XC30, CS300 with TotalView
Chris Gottbrath (Rogue Wave Software)
Abstract: TotalView provides users with a powerful way to analyze and understand their codes and is a key tool in developing, tuning, scaling, and troubleshooting HPC applications on the Cray XC30 Supercomputer and CS300 Cluster Supercomputer Series. As a source code debugger, TotalView provides users with complete control over program execution and a view into their program at the source code and variable level. TotalView uses a scalable tree-based architecture and can scale up to hundreds of thousands of processes. This talk will introduce new users to TotalView's capabilities and give experienced users an update on recent developments, including the new MRNet communication tree. The talk will also highlight memory debugging with MemoryScape (which is now available for the Xeon Phi), deterministic reverse debugging with ReplayEngine, and scripting with TVScript.

Integration of Intel Xeon Phi Servers into the HLRN-III Complex: Experiences, Performance and Lessons Learned
Florian Wende, Guido Laubender and Thomas Steinke (Zuse Institute Berlin)
Abstract: The third generation of the North German Supercomputing Alliance (HLRN) compute and storage facilities comprises a Cray XC30 architecture with exclusively Intel Ivy Bridge compute nodes. In the second phase, scheduled for November 2014, the HLRN-III configuration will undergo a substantial upgrade, together with the option of integrating accelerator nodes into the system. To support the decision-making process, a four-node Intel Xeon Phi cluster has been integrated into the present HLRN-III infrastructure at ZIB. This integration includes user/project management, file system access and job management via the HLRN-III batch system. For selected workloads, in-depth analysis, migration and optimization work on Xeon Phi is in progress. We will report our experiences and lessons learned from the Xeon Phi installation and integration process. For selected examples, initial results of the application evaluation on the Xeon Phi cluster platform will be discussed.

Developing High Performance Intel® Xeon Phi™ Applications
Jim Jeffers (Intel Corporation)
Abstract: After introducing the Intel® Xeon Phi™ product family and roadmap, we will discuss how you can exploit the extensive parallel computing resources provided by Intel® Xeon Phi™ products while enhancing your development investment in industry-standards-based applications for next-generation systems. Performance optimization methods and tools will be surveyed with application examples.

Tech Paper - PE & Applications Technical Session 12C Chair: Matt Allen (Indiana University)

Using HPC in Planning for Urban/Coastal Sustainability and Resiliency
Paul C. Muzio, Yauheni Dzedzits and Nikolaos Trikoupis (College of Staten Island/City University of New York)
Abstract: Population growth and the migration and concentration of people into urban areas are having a profound impact on the environment. In turn, climate change and rising sea levels are threatening the viability and sustainability of these large metropolitan areas, which are mainly located in coastal regions. Planning for urban sustainability and urban/coastal resiliency is increasingly dependent on extensive modeling activities using high-performance computing.

The Cray Framework for Hadoop for the Cray XC30
Howard Pritchard, Jonathan Sparks and Martha Dumler (Cray Inc.)
Abstract: This paper describes the Cray Framework for Hadoop on the Cray XC30, a framework for supporting components of the Hadoop ecosystem on XC30 systems managed by widely used batch schedulers. The paper further describes experiences encountered in running Hadoop workloads over typical Lustre deployments on the Cray XC30. Related work to enable Hadoop to better utilize the XC high-speed interconnect is discussed.
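
Several of the entries above (the HLRN-III Xeon Phi integration paper and the Intel Xeon Phi development talk) emphasize standards-based parallel code that runs unchanged on host CPUs and on Xeon Phi. A minimal, hypothetical C kernel in that spirit, threaded and vectorized with OpenMP 4.0 directives, is sketched below; it is not drawn from any of the listed papers.

/* Illustrative standards-based kernel: threaded across cores and
 * vectorized within each thread via the OpenMP 4.0 simd construct.
 * Build with any OpenMP 4.0 compiler, e.g. gcc -fopenmp saxpy.c */
#include <stdio.h>
#include <stdlib.h>

/* y = a*x + y over n elements */
static void saxpy(int n, float a, const float *x, float *y)
{
#pragma omp parallel for simd
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    int n = 1 << 20;
    float *x = malloc(n * sizeof *x);
    float *y = malloc(n * sizeof *y);
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy(n, 3.0f, x, y);

    printf("y[0] = %f\n", y[0]);   /* expect 5.0 */
    free(x);
    free(y);
    return 0;
}

The same source can be built for the host or natively for Xeon Phi, which is the portability argument those sessions make.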

HPCHadoop: A framework to run Hadoop on Cray X-series supercomputers
Scott Michael, Abhinav Thota and Robert Henschel (Indiana University)
Abstract: The rise of Big Data in research and industry has seen the development of software frameworks to address many of its challenges, most notably the MapReduce framework. There are many implementations of the MapReduce framework; however, one of the most widely used open source implementations is the Apache Hadoop framework.

Tech Paper - PE & Applications Technical Session 13C Chair: Matt Allen (Indiana University)

Applications of the YarcData Urika in Drug Discovery and Healthcare
Robert Henschel, David Wild, Ying Ding, Abhik Seal and Jeremy Yang (Indiana University), Bin Chen (Stanford University) and Abhinav Thota and Scott Michael (Indiana University)
Abstract: The Cheminformatics & Chemogenomics Research Group (CCRG) at Indiana University has been working on algorithms and tools for large-scale data mining of drug discovery, chemical and biological data using semantic technologies. The work includes finding new gene targets for drugs, identifying drug/drug interactions and pinpointing the causes of drug side effects. CCRG uses semantic web technologies like RDF triple stores and the SPARQL query language. The YarcData Urika appliance promises to radically speed up this specific type of research by implementing a SPARQL endpoint on specialized hardware. In this paper, we describe how a Urika system could be integrated into the workflow and report on its performance for specific problems.

Discovery in Big Data: Success Stories, Best Practices and Analytics Techniques
Ramesh Menon and Amar Shan (YarcData Inc.)
Abstract: Discovery, the uncovering of hidden relationships and unknown patterns, has always been a uniquely human endeavor with relatively little automated assistance. The advent of big data has changed that by enabling discovery to be performed "in silico."

High-Level Analytics with R and pbdR on Cray Systems
Pragneshkumar Patel and Drew Schmidt (National Institute for Computational Sciences/University of Tennessee), Wei-Chen Chen (University of Tennessee), George Ostrouchov (Oak Ridge National Laboratory) and Mark Fahey (National Institute for Computational Sciences)
Abstract: In this paper, we present the high-level analytics engine R, as well as the high-performance extension to R called pbdR (Programming with Big Data in R). We intend to justify the need for high-level analytics in supercomputing; in particular, we stress the importance of R and the need for pbdR. We also discuss build issues that arise with R and pbdR on supercomputers, notably the loading of dynamic libraries, accessing the file system, and executing R programs on Cray machines. We conclude with extensive performance benchmarking using the Titan (ORNL), Darter (NICS), Mars (NICS) and Blue Waters (NCSA) HPC resources.

Tech Paper - PE & Applications Technical Session 16C Chair: Rebecca Hartman-Baker (iVEC)

CP2K Performance from Cray XT3 to XC30
Iain Bethune and Fiona Reid (EPCC, The University of Edinburgh) and Alfio Lazzaro (Cray Inc.)
Abstract: CP2K is a powerful open-source program for atomistic simulation using a range of methods including classical potentials, Density Functional Theory based on the Gaussian and Plane Waves approach, and post-DFT methods. CP2K has been designed and optimised for large parallel HPC systems, including a mixed-mode MPI/OpenMP parallelisation, as well as CUDA kernels for particular types of calculations. Developed by an open-source collaboration including the University of Zurich, ETH Zurich, EPCC and others, CP2K has been well tested on several generations of Cray supercomputers, beginning with the XT3 in 2006 at CSCS, through XT4, XT5, XT/XE6, and XK7, to Ivy Bridge- and Sandy Bridge-based XC30 systems in 2014. We present a systematic view of benchmark data spanning 9 years and 7 generations of the Cray architecture, and report on recent efforts to carry out comprehensive comparative benchmarking and performance analysis of CP2K on the XE6 and XC30 systems at EPCC. We also describe work to enable CP2K for accelerators, and show performance data from the XK7 and XC30 at CSCS.

Optimising Hydrodynamics applications for the Cray XC30 with the application tool suite
Wayne P. Gaudin (Atomic Weapons Establishment), Andrew C. Mallinson (University of Warwick), Oliver FJ Perks and John A. Herdman (Atomic Weapons Establishment), John M. Levesque (Cray Inc.), Stephen A. Jarvis (University of Warwick) and Simon McIntosh-Smith (University of Bristol)
Abstract: Due to power constraints, HPC systems continue to increase hardware concurrency. Efficiently scaling applications on future machines will be essential for improved science, and it is recognised that the "flat" MPI model will start to reach its scalability limits. The optimal approach is unknown, necessitating the use of mini-applications to rapidly evaluate new approaches. Reducing MPI task count through the use of shared memory programming models will likely be essential. We examine different strategies for improving the strong-scaling performance of explicit hydrodynamics applications, using the CloverLeaf mini-application at extreme scale across three generations of Cray platforms (XC30, XE6 and XK7). We show the utility of the hybrid approach and document our experiences with OpenMP, OpenACC, CUDA and OpenCL under both the PGI and CCE compilers. We also evaluate Cray Reveal as a tool for automatically hybridising HPC applications, and Cray's MPI rank-to-network-topology mapping tools for improving application performance.

Using a Developing MiniApp to Compare Platform Characteristics on Cray Systems
Bronson Messer (Oak Ridge National Laboratory)
Abstract: The use of reduced applications that share many of the performance and implementation features of large, fully-featured code bases ("MiniApps") has gained considerable traction in recent years, especially in the context of exascale planning exercises. We have recently developed a MiniApp, dubbed Ziz, designed to serve as a proxy for the CHIMERA code. As an initial foray, we have used the directionally-split hydro version of Ziz to quantify a handful of architectural impacts on Cray XK7 and XC30 platforms and have compared these impacts to results from a new InfiniBand-based cluster at the Oak Ridge Leadership Computing Facility (OLCF). We will describe these initial results, along with some observations about generating useful MiniApps from extant applications and what these artifacts might hope to capture.

Tech Paper - PE & Applications Technical Session 17A Chair: Thomas Leung (General Electric)

Expanding Blue Waters with Improved Acceleration Capability
Celso L. Mendes, Gregory H. Bauer and William T. Kramer (National Center for Supercomputing Applications) and Robert A. Fiedler (Cray Inc.)
Abstract: Blue Waters, the first open-science supercomputer to achieve a sustained rate of one petaflop/s on a broad mix of scientific applications, is the largest system ever built by Cray. It was originally deployed at NCSA with a configuration of 276 cabinets, containing a mix of XE (CPU) nodes and XK (CPU+GPU) nodes that share the same Gemini interconnection network. As a hybrid system, Blue Waters constitutes an excellent platform for developers of parallel applications who want to explore GPU acceleration. In 2013, Blue Waters was expanded with 12 additional cabinets of XK nodes, increasing the total system peak floating-point performance by 12%. This paper describes the expansion process, our analysis of multiple practical and performance-related issues leading to the final configurations, how the expanded system is being used by science teams, node failure rates, and our latest efforts toward monitoring system components associated with the GPUs.

Accelerating Understanding: Data Analytics, Machine Learning, and GPUs
Steven M. Oberlin (NVIDIA)
Abstract: Amazing new applications and services employing machine learning algorithms to perform advanced analysis of massive streams and collections of structured and unstructured data are becoming quietly indispensable in our daily lives. Machine learning algorithms like deep learning neural networks are not new, but the rise of large-scale applications hosted in massive cloud computing data centers, collecting enormous volumes of data from and about their users, has provided unprecedented training sets and opportunities for machine learning algorithms. Recognizers, classifiers, and recommenders are only a few of the component capabilities providing valuable new services to users, but the training of extreme-scale learning systems is computationally intense. Fortunately, as in so many areas of high-performance computing, great economies and speed-ups can be realized through the use of general-purpose GPU accelerators. This talk will explore a few advanced data analytics and machine learning applications, and the benefits and value of GPU acceleration.

Unlocking the Full Potential of the Cray XK7 Accelerator
Mark Klein (National Center for Supercomputing Applications/University of Illinois) and John Stone (University of Illinois)
Abstract: The Cray XK7 includes NVIDIA GPUs for acceleration of computing workloads, but the standard XK7 system software inhibits the GPUs from accelerating OpenGL and related graphics-specific functions. We have changed the operating mode of the XK7 GPU firmware, developed a custom X11 stack, and worked with Cray to acquire an alternate driver package from NVIDIA in order to allow users to render and post-process their data directly on Blue Waters. Users are able to use NVIDIA's hardware OpenGL implementation, which has many features not available in software rasterizers. By eliminating the transfer of data to external visualization clusters, time-to-solution for users has been improved tremendously. In one case, XK7 OpenGL rendering has cut turnaround time from a month down to just one day. We describe our approach for enabling graphics on the XK7, discuss how the new capabilities are exposed to users, and highlight their use by science teams.

Tech Paper - PE & Applications Technical Session 18B Chair: Robert M. Whitten (University of Tennessee)

“Piz Daint:” Application driven co-design of a supercomputer based on Cray’s adaptive system design
Sadaf R. Alam (Swiss National Supercomputing Centre) and Thomas C. Schulthess (ETH Zurich)
Abstract: “Piz Daint” is a 28-cabinet Cray XC30 supercomputer that has been co-designed along with applications, and is presently the most energy-efficient petascale supercomputer. Starting from selected applications in climate science, geophysics, materials science, astrophysics, and biology that have been designed for distributed memory systems with massively multi-threaded nodes, a rigorous evaluation of node architecture was performed, yielding hybrid CPU-GPU nodes as the optimum. Two applications, the limited-area climate model COSMO and the electronic structure code CP2K, were selected for further co-development with the system. “Piz Daint” was deployed in two phases: first, in a 12-cabinet standard multi-core configuration in fall 2012 that allowed testing of the network and development of the applications at scale. While hybrid nodes were being constructed for the second phase, applications as well as necessary extensions to the programming environment were co-developed. We discuss the co-design methodology and present performance results for the selected applications.

Performance Portability and OpenACC
Doug Miles, Dave Norton and Michael Wolfe (PGI / NVIDIA)
Abstract: Performance portability means a single program gives good performance across a variety of systems, without modifying the program. OpenACC is designed to offer performance portability across CPUs with SIMD extensions and accelerators based on GPU or many-core architectures. Using a sequence of examples, we explore the aspects of performance portability that are well addressed by OpenACC itself and those that require underlying compiler optimization techniques. We introduce the concepts of forward and backward performance portability, where the former means legacy codes optimized for SIMD-capable CPUs can be compiled for optimal execution on accelerators, and the latter means the opposite. The goal of an OpenACC compiler should be to provide both, and we uncover some interesting opportunities as we explore the concept of backward performance portability.

Transferring User Defined Types in OpenACC
James C. Beyer, David Oehmke and Jeff Sandoval (Cray Inc.)
Abstract: A preeminent problem blocking the adoption of OpenACC by many programmers is support for user-defined types: classes and structures in C/C++ and derived types in Fortran. This problem is particularly challenging for data structures that involve pointer indirection, since transferring these data structures between the disjoint host and accelerator memories found on most modern accelerators requires deep-copy semantics. This paper will look at the mechanisms available in OpenACC 2.0 that allow the programmer to design transfer routines for OpenACC programs. Once these mechanisms have been explored, a new directive-based solution will be presented. Code examples will be used to compare the current state of the art and the new proposed solution.

Tech Paper - PE & Applications Technical Session 18C Chair: Frank M. Indiviglio (National Oceanic and Atmospheric Administration)

On the Current State of Open MPI on Cray Systems
Nathan Hjelm and Samuel Gutierrez (Los Alamos National Laboratory) and Manjunath Venkata (Oak Ridge National Laboratory)
Abstract: Open MPI provides an implementation of the MPI standard supporting native communication over a range of high-performance network interfaces. Los Alamos National Laboratory (LANL) and Oak Ridge National Laboratory (ORNL) collaborated on creating a port for Cray XE and XK systems. That work has continued, and with the release of version 1.8, Open MPI now conforms to MPI-2.2 and MPI-3.0 on Cray XE, XK, and XC systems. The features introduced with this work include dynamic process support (MPI_Comm_spawn()), important for implementing fault-tolerant MPI systems; improved collective operations required for scalability and performance of applications; and Aries support to enable running Open MPI on Cray XC systems. In this paper, we present an update on the design and implementation of Open MPI for Cray systems and evaluate the performance and scaling characteristics on both Gemini and Aries networks.

User-level Power Monitoring and Application Performance on Cray XC30 supercomputers
Alistair Hart and Harvey Richardson (Cray Inc.), Jens Doleschal, Thomas Ilsche and Mario Bielert (Technische Universität Dresden) and Matthew Kappel (Cray Inc.)
Abstract: In this paper, we show how users can access and display new power measurement hardware counters on Cray XC30 systems (with and without accelerators), either directly or through extended prototypes of the Score-P performance measurement infrastructure and the Vampir application performance monitoring visualiser.

A Hybrid MPI/OpenMP 3D FFT for Plane Wave Ab-initio Materials Science Codes
Andrew Canning (Lawrence Berkeley National Laboratory)
Abstract: Ab-initio materials science and chemistry codes based on density functional theory and a plane-wave (Fourier) expansion of the electron wavefunctions are the most commonly used approach for electronic structure calculations in materials and nanoscience. This approach has become the largest user of cycles at scientific computer centers around the world through codes such as VASP, Quantum Espresso, Abinit, PEtot, etc. Therefore, as in many other application areas (fluid mechanics, climate research, accelerator design, etc.), efficient, scalable parallel 3D FFTs are required. In this paper we show how our specialized hybrid MPI/OpenMP implementation of the 3D FFT on the Cray XE6 (Hopper) and XC30 (Edison) can significantly outperform and scale better than the pure MPI version, particularly at large core counts, by sending fewer, larger messages. Our 3D FFT has been implemented in the full electronic structure code PEtot, and results scaling to tens of thousands of cores on Cray platforms for PEtot will also be presented.

Tech Paper - PE & Applications Technical Session 19B Chair: Iain A. Bethune (EPCC, The University of Edinburgh)

Performance of the fusion code GYRO on four generations of Cray computers
Mark Fahey (National Institute for Computational Sciences)
Abstract: GYRO is a code used for the direct numerical simulation of plasma microturbulence. Here we show the comparative performance and scaling simultaneously on four generations of Cray supercomputers, including the newest addition, the Cray XC30. We also show that the recently added hybrid OpenMP/MPI implementation shows a great deal of promise on traditional HPC systems that utilize fast CPUs and proprietary interconnects. Four machines of varying sizes were used in the experiment, all of which are located at the National Institute for Computational Sciences at the University of Tennessee at Knoxville and Oak Ridge National Laboratory. The advantages, limitations, and performance of using each system are discussed, as well as the direction of future optimizations.
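
The dynamic process support called out in the Open MPI paper above centres on MPI_Comm_spawn(). The following minimal sketch shows the call in isolation; "./worker" is a hypothetical child executable, and the snippet is not code from the paper.

/* Parent program: spawn four additional ranks running ./worker and
 * obtain an intercommunicator connecting the parent and child groups. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Comm intercomm;
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL, 0,
                   MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);

    int nworkers;
    MPI_Comm_remote_size(intercomm, &nworkers);
    printf("spawned %d worker ranks\n", nworkers);

    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}

Whether this works on a given Cray system depends on the MPI implementation; the paper above reports dynamic process support as one of the features the Open MPI 1.8 port brings to XE, XK, and XC machines.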
Time-dependent density-functional theory on massively parallel computers Jussi Enkovaara (CSC - IT Center for Science Ltd.) Abstract Abstract GPAW is versatile open source software for various quantum mechanical simulations utilizing density-functional theory (DFT) and time-dependent density functional theory (TD-DFT). GPAW is implemented in a combination of the Python and C programming languages. High-level algorithms are implemented in Python, while numerically intensive kernels are implemented in C or utilize libraries. The parallelization is done with MPI, and MPI calls can be made from both the Python and C parts of the code. The approach enables fast software development due to the high-level nature of Python, while ensuring good performance from the compiled language. Toward Improved Support for Loosely Coupled Large Scale Simulation Workflows Swen Boehm, Wael R. Elwasif, Thomas Naughton and Geoffroy Vallee (Oak Ridge National Laboratory) Abstract Abstract High-performance computing (HPC) workloads are increasingly leveraging loosely coupled large scale simulations [1]. Unfortunately, most large-scale HPC platforms, including Cray/ALPS environments, are designed for the execution of long-running jobs based on coarse-grained launch capabilities (e.g., one MPI rank per core on all allocated compute nodes). This assumption limits capability-class workload campaigns [2] that require large numbers of discrete or loosely coupled simulations, and where time-to-solution is an untenable pacing issue. This paper describes the challenges related to the support of fine-grained launch capabilities that are necessary for the execution of loosely coupled large scale simulations on Cray/ALPS platforms. More precisely, we present the details of an enhanced runtime system to support this use case, and report on initial results from early testing on systems at Oak Ridge National Laboratory. Tech Paper - PE & Applications Technical Session 19C Chair: Zhengji Zhao (Lawrence Berkeley National Laboratory) Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver on Cray's Manycore Architectures Xiaoye Sherry Li (Lawrence Berkeley National Laboratory) Abstract Abstract This paper presents the first hybrid MPI+OpenMP+CUDA algorithm and implementation of a right-looking unsymmetric sparse LU factorization with static pivoting for scalable heterogeneous architectures. While BLAS calls can account for more than 40% of the overall factorization time, the difficulty is that small problem sizes dominate the workload, making efficient GPU utilization challenging. This motivates our new algorithmic developments, which are to find ways to aggregate collections of small BLAS operations into larger ones; to schedule operations to achieve load balance and hide long-latency operations, such as PCIe transfer; and to exploit simultaneously all of a node's available CPU cores and GPUs. We extensively evaluate this implementation to understand the strengths and limits of our method. Extending the Capabilities of the Cray Programming Environment with CLang-LLVM Framework Integration Ugo Varetto, Benjamin Cumming and Sadaf Alam (Swiss National Supercomputing Centre) Abstract Abstract Recent developments in programming for multi-core processors and accelerators using C++11, OpenCL and Domain Specific Languages (DSL) have prompted us to look into tools that offer compilers and both static and runtime analysis toolchains to complement the Cray Programming Environment capabilities.
In this paper we report our preliminary experiences from using the CLang-LLVM framework on a hybrid Cray XC30 to perform tasks such as generating NVIDIA PTX code from C++ and OpenCL in a portable and flexible manner. Specifically, we investigate how to overcome some of the limitations currently imposed by the standard tools, such as the complete lack of C++11 support in CUDA C and outdated 32-bit versions of OpenCL. We also demonstrate how Clang-LLVM tools, for example the static analyzer, can bring additional capabilities to the Cray environment. Finally, we describe how CLang-LLVM integrates with the standard Cray Programming Environment (PE), for instance, Cray MPI, perftools and libraries, and the steps required to properly install such tools on various Cray platforms. Tri-Hybrid Computational Fluid Dynamics on DOE’s Cray XK7, Titan Aaron Vose (Cray Inc.), Brian Mitchell (General Electric) and John Levesque (Cray Inc.) Abstract Abstract A tri-hybrid port of General Electric's in-house, 3D, Computational Fluid Dynamics (CFD) code TACOMA is created utilizing MPI, OpenMP, and OpenACC technologies. This new port targets improved performance on NVIDIA Kepler accelerator GPUs, such as those installed in the world's second largest supercomputer, Titan, the Department of Energy's 27 petaFLOP Cray XK7 located at Oak Ridge National Laboratory. We demonstrate a 1.4x speed improvement on Titan when the GPU accelerators are enabled. We highlight key optimizations and techniques used to achieve these results. These optimizations enable larger and more accurate simulations than were previously possible with TACOMA, which not only improves GE's ability to create higher performing turbomachinery blade rows, but also provides "lessons learned" which can be applied to the process of optimizing other codes to take advantage of tri-hybrid technology with MPI, OpenMP, and OpenACC. Tech Paper - Systems Technical Session 6A Chair: Hans-Hermann Frese (Zuse Institute Berlin) Cray Management System Updates and Directions John Hesterberg (Cray Inc.) Abstract Abstract Cray has made a number of updates to its system management in the last year, and we continue to evolve this area of our software. We will review the new capabilities that are available now, such as serial workloads on repurposed compute nodes, Resource Utilization Reporting (RUR) and the initial limited release of the Image Management and Provisioning System (IMPS). We will look at what is coming in the upcoming releases, including the next steps of IMPS capabilities, and the initial integration of new technologies from the OpenStack projects. Producing the Software that Runs the Most Powerful Machines in the World: the Inside Story on Cray Software Test and Release. Kelly J. Marquardt (Cray Inc.) Cray XC System Level Diagnosability: Commands, Utilities and Diagnostic Tools for the Next Generation of HPC Systems Jeffrey J. Schutkoske (Cray Inc.) Abstract Abstract The Cray XC system is significantly different from the previous-generation Cray XE system. The Cray XC system is built using new technologies including transverse cooling, Intel processor-based nodes, a PCIe interface from the node to the network ASIC, the Aries network ASIC, and the Dragonfly topology. The diagnosability of a Cray XC system has also been improved by a new set of commands, utilities and diagnostics. This paper describes how these tools are used to aid in system level diagnosability of the Cray XC system.
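[Editor's illustration] Several of the papers above (the hybrid MPI/OpenMP 3D FFT, the GYRO hybrid implementation, and the tri-hybrid TACOMA port) rely on the same basic pattern: a small number of MPI ranks per node with OpenMP threads filling the remaining cores, so that fewer, larger messages cross the network. The following is a minimal, generic sketch of that pattern in C; it is illustrative only, does not reproduce any of the codes discussed, and the problem size and arithmetic are arbitrary.

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char **argv)
    {
        /* Request a threading level that permits OpenMP inside each MPI
         * rank; FUNNELED means only the master thread makes MPI calls. */
        int provided, rank, nranks;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        const int n = 1 << 20;                       /* per-rank problem size */
        double *local = malloc(n * sizeof(double));
        double sum = 0.0;

        /* Threads share the node-local work ... */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; ++i) {
            local[i] = (rank + i) * 1.0e-6;
            sum += local[i];
        }

        /* ... while only one (larger) message per rank crosses the network. */
        double total = 0.0;
        MPI_Reduce(&sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("ranks=%d threads/rank=%d total=%e\n",
                   nranks, omp_get_max_threads(), total);

        free(local);
        MPI_Finalize();
        return 0;
    }

The pay-off reported by the FFT paper comes precisely from this trade: fewer ranks participating in the all-to-all communication means fewer, larger messages and less latency-bound traffic.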
Tech Paper - Systems Technical Session 7A Chair: Jim Rogers (Oak Ridge National Laboratory) Cray Hybrid XC30 Installation – Facilities Level Overview Ladina Gilly, Colin McMurtrie and Tiziano Belotti (Swiss National Supercomputing Centre) Abstract Abstract In this paper we describe, from a facilities point of view, the installation of the 28-cabinet Cray hybrid XC30 system, Piz Daint, at the Swiss National Supercomputing Centre (CSCS). This system was the outcome of a 12-month collaboration between CSCS and Cray and, as a consequence, is the first such system of its type worldwide. The focus of the paper is on the site preparation and integration of the system into CSCS' state-of-the-art HPC data centre. As with any new system architecture, the installation phase brings challenges at all levels. In order to achieve a quick turnaround of the initial bring-up it is essential to ensure that the site design is flexible enough to accommodate unforeseen variances in system environmental requirements. In the paper we detail some of the challenges encountered and the steps taken to ensure a quick and successful installation of the new system. Cray XC30 Power Monitoring and Management Steven J. Martin and Matthew Kappel (Cray Inc.) Abstract Abstract Cray customers are increasingly demanding better performance per watt and finer-grained control of total power consumption in their data centers. Customers are requesting features that allow them to optimize application performance per watt, and to conduct research in support of future system and application power efficiency. New system procurements are increasingly constrained by site power and cooling limitations, the cost of power and cooling, or both. This paper describes features developed in support of system power monitoring and management for the Cray XC30 product line, expected use cases, and potential features and functions for the future. First Experiences With Validating and Using the Cray Power Management Database Tool Gilles Fourestey, Benjamin Cumming, Ladina Gilly and Thomas C. Schulthess (Swiss National Supercomputing Centre) Abstract Abstract In October 2013 CSCS installed the first hybrid Cray XC30 system, dubbed Piz Daint. This system features the power management database (PMDB), which was recently introduced by Cray to collect detailed power consumption information in a non-intrusive manner. Power measurements are taken on each node, with additional measurements for the Aries network and blowers, and recorded in a database. This enables fine-grained reporting of power consumption that is not possible with external power meters, and is useful to both application developers and facility operators. This paper will show how benchmarks of representative applications at CSCS were used to validate the PMDB on Piz Daint. Furthermore we will elaborate, with the well-known HPL benchmark serving as a prototypical application, on how the PMDB streamlines the tuning for optimal power efficiency in production. Piz Daint is presently the most energy efficient petascale supercomputer in operation. Monitoring Cray Cooling Systems Don Maxwell (Oak Ridge National Laboratory), Jeffrey Becklehimer (Cray Inc.) and Matthew Ezell, Matthew Donovan and Christopher Layton (Oak Ridge National Laboratory) Abstract Abstract While sites generally have systems in place to monitor the health of Cray computers themselves, often the cooling systems are ignored until a computer failure requires investigation into the source of the failure.
The Liebert XDP units used to cool the Cray XE/XK models as well as the Cray proprietary cooling system used for the Cray XC30 models provide data useful for health monitoring. Unfortunately, this valuable information is often available only to custom solutions not accessible by a center-wide monitoring system or is simply ignored entirely. In this paper, methods and tools used to harvest the monitoring data available are discussed, and the implementation needed to integrate the data into a center-wide monitoring system at the Oak Ridge National Laboratory is provided. Tech Paper - Systems Technical Session 12B Chair: Hans-Hermann Frese (Zuse Institute Berlin) Enhanced Job Accounting with PBS Works and Cray RUR: Better Access to Better Data Scott Suchyta (Altair Engineering, Inc.) Abstract Abstract Assessing how your system is being used can be a daunting task, especially when the data is spread across multiple sources or, even worse, the available data is sparse and unrelated to the true metrics you need to gather. Factor in the business requirements of reporting on user, group, and/or project usage of the system, and you find yourself creating a homegrown solution that you will need to maintain. Measuring GPU Usage on Cray XK7 using NVIDIA's NVML and Cray's RUR Jim Rogers and Mitchell Griffith (Oak Ridge National Laboratory) Abstract Abstract ORNL introduced a 27PF Cray XK7 into production in May 2013. This system provides users with 18,688 hybrid compute nodes, where each node couples an AMD 6274 Opteron with an NVIDIA GK110 (Kepler) GPU. Beginning with Cray’s OS version CLE 4.2UP02, new features available in the GK110 device driver, the NVIDIA Management Library, and Cray’s Resource Utilization software provide a mechanism for measuring GPU usage by applications on a per-job basis. By coupling this data with job data from the workload manager, fine-grained analysis of the use of GPUs, by application, is possible. This method will supplement, and eventually supplant, an existing method for identifying GPU-enabled applications that detects, at link time, the libraries required by the resulting binary (ALTD, the Automatic Library Tracking Database). Analysis of the new mechanism for calculating per-application GPU usage is provided as well as results for a range of GPU-enabled application codes. Resource Management Analysis and Accounting Challenges Michael T. Showerman, Jeremy Enos, Mark Klein and Joshi Fullop (National Center for Supercomputing Applications/University of Illinois) Abstract Abstract Maximizing the return on investment in a large-scale computing resource requires policy that best enables the highest-value workloads. Measuring the impact of a given scheduling policy presents great challenges with a highly variable workload. Defining and measuring the separate components of scheduling and resource management overhead is critical in reaching a valuable conclusion about the effectiveness of the system’s availability for your workload. NCSA has developed tools for collecting and analyzing both user workload and system availability to measure the delivered impact of the Blue Waters resource. This publication presents solutions for displaying the scheduler’s past and present workloads, as well as an accounting of the availability and usage at the system and compute node level for application availability.
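[Editor's illustration] As a concrete picture of the kind of per-node GPU data that the NVIDIA Management Library exposes and that the RUR-based accounting work above builds upon, the sketch below queries utilization and power draw for GPU 0 via NVML. It is a generic NVML example, not the RUR plugin described in the paper; it assumes an NVML-capable driver and linking against libnvidia-ml, and periodic sampling or per-job aggregation would still have to be layered on top.

    #include <stdio.h>
    #include <nvml.h>   /* link with -lnvidia-ml */

    int main(void)
    {
        nvmlReturn_t rc = nvmlInit();
        if (rc != NVML_SUCCESS) {
            fprintf(stderr, "nvmlInit failed: %s\n", nvmlErrorString(rc));
            return 1;
        }

        nvmlDevice_t dev;
        if (nvmlDeviceGetHandleByIndex(0, &dev) == NVML_SUCCESS) {
            nvmlUtilization_t util;
            unsigned int power_mw;

            /* Instantaneous GPU and memory utilization, in percent. */
            if (nvmlDeviceGetUtilizationRates(dev, &util) == NVML_SUCCESS)
                printf("gpu util: %u%%  mem util: %u%%\n", util.gpu, util.memory);

            /* Current board power draw, reported in milliwatts. */
            if (nvmlDeviceGetPowerUsage(dev, &power_mw) == NVML_SUCCESS)
                printf("power: %.1f W\n", power_mw / 1000.0);
        }

        nvmlShutdown();
        return 0;
    }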
Tech Paper - Systems Technical Session 13B Chair: Liz Sim (EPCC, The University of Edinburgh) Accelerate Insights with Topology, High Throughput and Power Advancements Wil Wellington and Michael Jackson (Adaptive Computing) Abstract Abstract Cray and Adaptive Computing power the world’s largest and most robust supercomputers with leading systems at NCSA, ORNL, HLRN, NERSC, NOAA and many more. Adaptive Computing and its Moab scheduling and optimization software help accelerate insights with Big Workflow. Big Workflow is an industry term coined by Adaptive Computing to describe the acceleration of insights through more efficient processing of intense simulations and big data analysis. Adaptive’s Big Workflow solution unifies all available resources, optimizes the analysis process and guarantees services to the business, allowing Cray systems to do what they do best: deliver massive compute resources to tackle today’s challenges. Topology-Aware Job Scheduling Strategies for Torus Networks Jeremy Enos (National Center for Supercomputing Applications), Greg Bauer, Robert Brunner and Sharif Islam (National Center for Supercomputing Applications/University of Illinois), Robert A. Fiedler (Cray Inc.) and Michael Steed and David Jackson (Adaptive Computing) Abstract Abstract Multiple sites with Cray systems with a Gemini network in a 3D torus configuration have reported inconsistent application run times as a consequence of task placement and application interference on the torus. In 2013, a collaboration between Adaptive Computing, NCSA (Blue Waters project), and Cray was begun, which includes Adaptive’s plan to incorporate topology awareness into the Moab scheduler product to mitigate this problem. In this paper, we describe the new scheduler features, tests and results that helped shape its design, and enhancements of the Topaware node selection and task placement tool that enable users to best exploit these new capabilities. We also discuss multiple alternative mitigation strategies implemented on Blue Waters that have shown success in improving application performance and consistency. These include predefined optimally-shaped groups of nodes that can be targeted by jobs, and custom modifications of the ALPS node order scheme. TorusVis: A Topology Data Visualization Tool Omar Padron and David Semeraro (National Center for Supercomputing Applications/University of Illinois) Abstract Abstract The ever-growing scope of extreme-scale supercomputers requires an increasing volume of component-local metrics to better understand their systemic behavior. The collection and analysis of these metrics have become data-intensive tasks in their own right, the products of which inform system support activities critical to ongoing operations. With recent emphasis being placed on topology-awareness as a step towards better coping with extreme scale, the ability to visualize complex topology data has become increasingly valuable, particularly for the visualization of multidimensional tori. Several independent efforts to produce similar visualizations exist, but they have typically been in-house developments tailor-made for very specific purposes, and are not trivially applicable to visualization needs not featured among those purposes. In contrast, a more general-purpose tool offers benefits that ease understanding of many interrelated aspects of a system's behavior, such as application performance, job node placement, and network traffic patterns.
Perhaps more significantly, such a tool can offer analysts insight into the complex topological relationships shared among these considerations; relationships that are often difficult to quantify by any other means. Tech Paper - Systems Technical Session 16A Chair: Hans-Hermann Frese (Zuse Institute Berlin) Systems-level Configuration and Customisation of Hybrid Cray XC30 Nicola Bianchi, Sadaf Alam, Roberto Aielli, Vincenzo Annaloro, Colin McMurtrie, Massimo Benini, Timothy Robinson and Fabio Verzolli (Swiss National Supercomputing Centre) Abstract Abstract In November 2013 the Swiss National Supercomputing Centre (CSCS) upgraded the 12-cabinet Cray XC30 system, Piz Daint, to 28 cabinets. Dual-socket Intel Xeon nodes were replaced with hybrid nodes containing one Intel Xeon E5-2670 CPU and one NVIDIA K20X GPU. The new design resulted in several extensions to the system operating and management environment, in addition to user-driven customisation. These include integration of elements from the Tesla Deployment Kit (TDK) for Node Health Check (NHC) tests and the NVIDIA Management Library (NVML). Cray extended the Resource Usage Reporting (RUR) tool to incorporate GPU usage statistics. Likewise, the Power Monitoring Database (PMDB) incorporated GPU power and energy usage data. Furthermore, custom configurations are introduced to the SLURM job scheduling system to support different GPU operating modes. In collaboration with Cray, we assessed the Cluster Compatibility Mode (CCM) with SLURM, which in turn allows for additional GPU usage scenarios that are currently under investigation. Piz Daint is currently the only hybrid XC30 system in production. To support robust operations, we invested in the development of: 1) a holistic regression suite that tests the sanity of various aspects of the system, ranging from the development environment to the system hardware; 2) a methodology for screening the live system for complex transient issues, which are likely to develop at scale. Slurm Native Workload Management on Cray Systems Danny Auble and David Bigagli (SchedMD LLC) Abstract Abstract Cray's Application Level Placement Scheduler (ALPS) software has recently been refactored to expose low-level network management interfaces in a new library. Slurm is the first workload manager to utilize this new Cray infrastructure to directly manage network resources and launch applications without ALPS. New capabilities provided by Slurm include the ability to execute multiple jobs per node, the ability to execute many applications within a single job allocation (ALPS reservation), greater flexibility in scheduling, and higher throughput without sacrificing the scalability and performance that Cray is famous for. This presentation includes a description of ALPS refactoring, new Slurm plugins for Cray systems, and the changes in functionality provided by this new architecture. Cori: A Cray XC Pre-Exascale System for NERSC Katie Antypas, Nicholas Wright, Nicholas Cardo and Matthew Cordery (National Energy Research Scientific Computing Center) and Allison Andrews (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) Abstract Abstract The next-generation supercomputer for the National Energy Research Scientific Computing Center (NERSC) will be a next-generation Cray XC system.
The system, named “Cori” after Nobel Laureate Gerty Cori, will bring together technological advances in processors, memory, and storage to enable the solution of the world’s most challenging scientific problems. This next-generation Cray XC supercomputer will use Intel’s next-generation Intel® Xeon Phi™ processor -- code-named “Knights Landing” -- a self-hosted, manycore processor with on-package high-bandwidth memory that delivers more than 3 teraFLOPS of double-precision peak performance per single-socket node. Scheduled for delivery in mid-2016, the system will deliver 10x the sustained computing capability of NERSC’s Hopper system, a Cray XE6 supercomputer. With the excitement of bringing new technology to bear on world-class scientific problems also come the many challenges in application development and system management. Strategies to overcome these challenges are key to successful deployment of this system. Tech Paper - Systems Technical Session 17C Chair: Luc Corbeil (Swiss National Supercomputing Centre) Cray CS300-LC Cluster Direct Liquid Cooling Architecture Roger Smith (Mississippi State University) and Giridhar Chukkapalli and Maria McLaughlin (Cray Inc.) Abstract Abstract In this white paper, you will learn about a hybrid, direct liquid cooling architecture developed for the Cray CS300-LC cluster supercomputer. First, we will review the pros and cons of a variety of direct liquid cooling solutions implemented by various competing vendors. Next, we will review the business and technical challenges of the CS300-LC cluster supercomputer with best practices and implementation details. The paper will also describe the challenges of this architecture with respect to open standards and TCO. In collaboration with MSU, we will provide a detailed energy efficiency analysis of the CS300-LC system. Additional details of resiliency, remote monitoring, and management of the hybrid cooling system are described, as well as potential future work toward making the CS300-LC close to 100% warm-water cooled. Mississippi State University High Performance Computing Collaboratory - A Brief Overview Trey Breckenridge (Mississippi State University) Abstract Abstract The High Performance Computing Collaboratory (HPC²), an evolution of the MSU NSF Engineering Research Center (ERC) for Computational Field Simulation, at Mississippi State University is a coalition of member centers and institutes that share a common core objective of advancing the state-of-the-art in computational science and engineering using high performance computing; a common approach to research that embraces a multi-disciplinary, team-oriented concept; and a commitment to a full partnership between education, research, and service. The MSU HPC² has a long and rich history in high performance computing dating back to the mid-1980s, with pioneering efforts in commodity clusters, low latency interconnects, grid generation, and the original implementation of MPICH. A Single Pane of Glass: Bright Cluster Manager for Cray Matthijs van Leeuwen (Bright Computing) Abstract Abstract Bright Cluster Manager provides comprehensive cluster management for Cray systems in one integrated solution: deployment, provisioning, scheduling, monitoring, and management. Its intuitive GUI provides complete system visibility and ease of use for multiple systems and clusters simultaneously, including automated tasks and intervention. Bright also provides a powerful management shell for those who prefer to manage via a command-line interface.
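[Editor's illustration] The topology-aware scheduling work described earlier for Gemini 3D-torus systems ultimately reduces to reasoning about hop counts between candidate node coordinates. As a toy illustration of that underlying metric only (not the Moab or Topaware implementation), the following C function computes the minimum-hop distance between two nodes on a 3D torus with wraparound links; the dimension sizes and coordinates are hypothetical.

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical torus dimensions; real Gemini systems vary by site. */
    enum { DIM_X = 24, DIM_Y = 24, DIM_Z = 24 };

    typedef struct { int x, y, z; } coord_t;

    /* Minimum hops along one torus dimension, taking the wraparound
     * link whenever it is shorter than the direct path. */
    static int ring_dist(int a, int b, int size)
    {
        int d = abs(a - b);
        return d < size - d ? d : size - d;
    }

    /* Total minimum-hop (Manhattan-on-a-torus) distance between nodes. */
    int torus_hops(coord_t a, coord_t b)
    {
        return ring_dist(a.x, b.x, DIM_X)
             + ring_dist(a.y, b.y, DIM_Y)
             + ring_dist(a.z, b.z, DIM_Z);
    }

    int main(void)
    {
        coord_t a = { 1, 2, 3 }, b = { 22, 2, 20 };
        /* x: 1->22 wraps in 3 hops; y matches; z: 3->20 wraps in 7 hops. */
        printf("hops = %d\n", torus_hops(a, b));   /* prints 10 */
        return 0;
    }

Schedulers that minimise sums of such distances over a job's node set (and avoid interleaving with other jobs' traffic) are pursuing exactly the consistency gains reported for Blue Waters.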
Tech Paper - Systems Technical Session 18A Chair: Douglas W. Doerfler (Sandia National Laboratories) Large Scale System Monitoring and Analysis on Blue Waters using OVIS Michael T. Showerman (National Center for Supercomputing Applications/University of Illinois), Jeremy Enos and Joseph Fullop (National Center for Supercomputing Applications), Paul Cassella (Cray Inc.), Nichamon Naksinehaboon, Narate Taerat and Tom Tucker (Open Grid Computing) and Jim M. Brandt, Ann C. Gentile and Benjamin Allan (Sandia National Laboratories) Abstract Abstract Understanding the complex interplay between applications competing for shared platform resources can be key to maximizing both platform and application performance. At the same time, use of monitoring tools on platforms designed to support extreme scale applications presents a number of challenges with respect to scaling and impact on applications due to increased noise and jitter. In this paper, we present our approach to high-fidelity, whole-system monitoring of resource utilization, including High Speed Network link data, on NCSA’s Cray XE/XK platform Blue Waters utilizing the OVIS monitoring framework. We then describe architectural implementation details that make this monitoring system suitable for scalable monitoring within the Cray hardware and software environment. Finally, we present our methodologies for measuring impact and the results. A Diagnostic Utility For Analyzing Periods Of Degraded Job Performance Joshi Fullop and Robert Sisneros (National Center for Supercomputing Applications) Abstract Abstract In this work we present a framework for identifying possible causes for observed differences in job performance from one run to another. Our approach is to contrast periods of time through the profiling of system log messages. On a large-scale system there are generally multiple, independently reporting subsystems, each capable of producing mountainous streams of events. The ability to sift through these logs and pinpoint events provides a direct benefit in managing HPC resources. This is particularly obvious when applied to diagnosing, understanding, and preventing system conditions that lead to overall performance degradation. To this end, we have developed a utility with real-time access to the full history of Blue Waters’ data where event sets from two jobs can be compared side by side. Furthermore, results are normalized and arranged to focus on those events with the greatest statistical divergence, thus separating the wheat from the chaff. A first glance at DWD's new Cray XC30 Florian Prill (Deutscher Wetterdienst) Abstract Abstract In December 2013, the German Meteorological Service (DWD) installed two identical Cray XC30 clusters in Offenbach. The new systems serve as a supercomputing resource for the operational weather service and also provide sufficient capacity for DWD's research purposes. After overall completion of the second phase, the compute clusters will reach a peak performance of roughly 2x549 TF. Tech Paper - Systems Technical Session 19A Chair: Liz Sim (EPCC, The University of Edinburgh) Workload Managers - A Flexible Approach Blaine Ebeling (Cray Inc.) Abstract Abstract Workload Managers (WLM) are the main user interfaces for running HPC jobs on Cray systems. Application Level Placement Services (ALPS) is a resource placement infrastructure provided on Cray systems to support WLMs.
Until now, WLMs have interfaced with ALPS through the BASIL protocol for node reservations, and the aprun command (apinit daemon) for launching applications. Over the last several years, the requirement to support more platforms, processor capabilities, dynamic resource management, and new features led Cray to investigate alternative ways to provide more flexible methods for supporting and expanding WLM capabilities and new WLMs. This paper will highlight Cray's plans to expose low-level hardware interfaces by refactoring ALPS to allow 'native' WLM implementations that do not rely on the current ALPS interface mechanism. Analysis and reporting of Cray service data using the SAFE. Stephen P. Booth (EPCC, The University of Edinburgh) Abstract Abstract The SAFE (Service Administration from EPCC) is a user services system developed by EPCC that handles user management and report generation for all our HPC services including the Cray services. SAFE is used to administer both the HECToR (Cray XE6) service, and its successor ARCHER (Cray XC30). An important function of this system is the ingestion of accounting data into the database and the generation of usage reports. In this paper we will present an overview of the design and implementation of this reporting system. Designing Service-Oriented Tools for HPC Account Management and Reporting Adam G. Carlyle, Robert D. French and William A. Renaud (Oak Ridge National Laboratory) Abstract Abstract The User Assistance Group at the National Center for Computational Sciences (NCCS) at Oak Ridge National Laboratory (ORNL) maintains detailed records and auxiliary data for thousands of HPC user accounts every year. These data are used across every aspect of center operations, e.g., within system administration scripts, written reports, and end-user communications. Tutorial Tutorial 1A (Half Day) Chair: John Levesque (Cray Inc.) Reveal - A remarkable scoping facility for multi/many core systems John Levesque and Heidi Poxon (Cray Inc.) Abstract Abstract We have entered into an era when restructuring an application for the emerging HPC architectures is an absolute necessity. Most scientific applications today are all MPI (NERSC's latest measurements indicate that 80% of their applications only utilize one thread/MPI task). The most difficult task in converting an application to use OpenMP is to parallelize the most important loops within the application, which involves scoping many variables through complex call chains. Over the past three years, Cray has developed a tool to assist the user in this most difficult task. Reveal offers an incredible breakthrough in parallelism assistance by employing “whole program” analysis, which gives it the ability to analyze looping structures that contain subroutine and function calls. Even with Reveal, there has to be an understanding of the issues related to inserting OpenMP directives. Reveal is like a multi-functional power tool that must be employed by a knowledgeable programmer. In this tutorial, Reveal will be demonstrated by the principal designer of the tool, to give the attendee a good understanding of how best to navigate it, and by an experienced user of Reveal/OpenMP, who will explain the idiosyncrasies involved in adding OpenMP to an application based on Reveal’s feedback. Going forward, the importance of generating an efficient hybrid application cannot be over-emphasized, whether the target is OpenACC for the XC30 with GPUs or Intel's next-generation Phi systems.
This tutorial will demonstrate the performance that can be obtained with Reveal on applications that are utilized in the community today. Tutorial Tutorial 1C (Half Day) Chair: David Bigagli (SchedMD LLC) Slurm Workload Manager Use David Bigagli and Morris Jette (SchedMD LLC) Abstract Abstract Slurm is an open source workload manager used on half of the world's most powerful computers and provides a rich set of features including topology-aware optimized resource allocation, the ability to expand and shrink jobs on demand, failure management support for applications, hierarchical bank accounts with fair-share job prioritization, job profiling, and a multitude of plugins for easy customization. Recent changes to Cray software now permit Slurm to directly manage Cray XC30 network resources and directly launch MPI jobs, providing a richer job scheduling environment than previously possible. Tutorial Tutorial 1B/2B (Full Day) Chair: Jim Jeffers (Intel Corporation) Optimizing for MPI/OpenMP on Intel® Xeon Phi™ Coprocessors John Pennycook, Hans Pabst and Jim Jeffers (Intel Corporation) Abstract Abstract Although Intel® Xeon Phi™ coprocessors are based on the x86 architecture, making porting straightforward, existing codes will likely require additional tuning efforts to maximise performance. This all-day, hands-on tutorial will teach high-level techniques and design patterns for improving the performance and scalability of MPI and OpenMP applications while keeping source code maintainable and familiar. Attendees will run and optimize sample codes on a Cray CS300-AC supercomputer, and tuning examples for real-world codes using coprocessors will be discussed. Cluster administration considerations for coprocessors, based on experiences with Beacon at NICS, will be included. Tutorial Tutorial 2A (Half Day) Chair: Alistair Hart (Cray) OpenACC: Productive, Portable Performance on Hybrid Systems Using High-Level Compilers and Tools Alistair Hart, Luiz DeRose and James Beyer (Cray Inc.) Abstract Abstract Portability and programming difficulty are critical obstacles to widespread adoption of accelerators (GPUs and coprocessors) in High Performance Computing. The dominant programming models for accelerator-based systems (CUDA and OpenCL) can extract high performance from accelerators, but with extreme costs in usability, maintenance, development and portability. To be an effective HPC platform, hybrid systems need a high-level programming environment to enable widespread porting and development of applications that run efficiently on either accelerators or CPUs. Tutorial Tutorial 2C (Half Day) Chair: Shawn Hoopes (Adaptive Computing) Tackle Massive Big Data Challenges with Big Workflow - Advanced Training for Multi-Dimensional Policies that Accelerate Insights Shawn Hoopes (Adaptive Computing) Abstract Abstract Today, Cray and Adaptive Computing power the world’s largest and most robust supercomputers such as Blue Waters, HLRN, NERSC, NOAA and many more. Adaptive Computing and its Moab scheduling and optimization software play a huge role in accelerating insights for Cray users with Big Workflow. Big Workflow is an industry term coined by Adaptive Computing to describe the acceleration of insights for IT professionals through more efficient processing of intense simulations and big data analysis.
Adaptive’s Big Workflow solution unifies all available resources, optimizes the analysis process and monitors workflow status, allowing Cray systems to do what they do best: deliver massive compute resources to tackle Big Data challenges. |
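[Editor's illustration] Finally, to illustrate the variable-scoping work that the Reveal tutorial above is designed to automate, here is a small hand-scoped C example: the loop can only be parallelized correctly once each variable is classified as shared, private, or a reduction. The arrays and names are hypothetical; Reveal's contribution is performing this classification through whole-program analysis across complex call chains, which this flat example deliberately avoids.

    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    int main(void)
    {
        static double a[N], b[N];
        double scale = 1.5;      /* read-only   -> shared           */
        double tmp;              /* scratch     -> must be private  */
        double norm = 0.0;       /* accumulator -> reduction(+)     */

        for (int i = 0; i < N; ++i)
            b[i] = (double)i;

        /* Incorrect scoping of 'tmp' or 'norm' (e.g., leaving them shared)
         * would introduce a data race; deriving this classification is the
         * step Reveal helps automate. */
        #pragma omp parallel for private(tmp) shared(a, b, scale) reduction(+:norm)
        for (int i = 0; i < N; ++i) {
            tmp = scale * b[i];
            a[i] = tmp * tmp;
            norm += a[i];
        }

        printf("threads=%d norm=%e\n", omp_get_max_threads(), norm);
        return 0;
    }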