This is an archived page of the 2009 conference


Final update: February 25, 2009.


title Challenges on the Path to Efficient, Affordable and Effective Exascale Computing
author(s) Henry Gabb, Intel, USA
presenter Henry Gabb
title Driving Forces Shaping Future Systems Design
author(s) Pratap Pattnaik, IBM TJ Watson Research Center, USA
presenter Pratap Pattnaik
abstract TBD
title Microprocessor Technologies for HPC
author(s) Marc Tremblay, Sun Microsystems, Inc., USA
presenter Marc Tremblay
title Adaptive Mantle Convection Simulation on Petascale Supercomputers
author(s) Omar Ghattas, University of Texas at Austin, USA
presenter Omar Ghattas
abstract Mantle convection is the principal control on the thermal and geological evolution of the Earth. Our goal is to conduct global mantle convection simulations that can resolve faulted plate boundaries, down to 1 km scales. However, uniform resolution at these scales would result in meshes with a trillion elements, which would elude even sustained petaflops supercomputers. Thus parallel adaptive mesh refinement and coarsening (AMR) is essential. We present Rhea, a new generation mantle convection code designed to scale to hundreds of thousands of cores. Rhea is built on Alps, a parallel octree-based adaptive mesh finite element library that provides new distributed data structures and parallel algorithms for dynamic coarsening, refinement, rebalancing, and repartitioning of the mesh. Using TACC's 579 teraflops Ranger cluster, we demonstrate excellent weak and strong scalability of parallel AMR on up to 62,464 cores for problems with up to 12.4 billion elements. With Rhea's adaptive capabilities, we have been able to reduce the number of elements by over three orders of magnitude, thus enabling us to simulate large-scale mantle convection with a finest local resolution of 1.5 km. This work is joint with C. Burstedde, M. Gurnis, G. Stadler, E. Tan, T. Tu, L. Wilcox, and S. Zhong.


title Weather Research and Forecast (WRF) Model: Performance Analysis on Advanced Multi-core HPC Clusters
author(s) Gilad Shainer, Mellanox Technologies, USA
presenter Gilad Shainer
abstract The Weather Research and Forecast (WRF) Model is a fully functioning modeling system for atmospheric research and operational weather prediction communities. With an emphasis on efficiency, portability, maintainability, scalability and productivity, WRF has been successfully deployed over the years on a wide variety of HPC clustered compute nodes connected with high-speed interconnects, currently the most used system architecture for high-performance computing. As such, understanding WRF’s dependency on the various clustering elements, such as the CPU, the interconnect and the software libraries, is crucial for enabling efficient predictions and high productivity. Our results identify WRF’s communication-sensitive points and demonstrate WRF’s dependency on high-speed networks and fast CPU-to-CPU communication. Both factors are critical to maintaining scalability and increasing productivity when adding cluster nodes. We conclude with specific recommendations for improving WRF performance, scalability, and productivity as measured in jobs per day. Because proprietary hardware and software can quickly erode the favorable economics of cluster architectures, we restrict our investigation to standards-based hardware and open source software readily available to typical research institutions.
title A Parallel Algorithm for Large, Multi-Scale Simulations of Liquid/Gas Phase Interfaces
author(s) Marcus Herrmann, Arizona State University, USA
presenter Marcus Herrmann
abstract This paper will present the development and performance of a parallel algorithm for large, multi-scale simulations of liquid/gas phase interfaces.
Power and Cooling
title Towards Real-World HPC Energy Efficiency and Productivity Metrics in a Fully Instrumented Datacenter
author(s) Andres Marquez, Pacific Northwest National Laboratory, USA
presenter Andres Marquez
abstract Towards real-world HPC energy efficiency and productivity metrics in a fully instrumented Datacenter.
title Cyber-Physical Autonomic Resource Management for High-Performance Datacenters
author(s) Georgios Varsamopoulos, Sandeep Gupta, Arizona State University, USA
presenter TBD
abstract Previous research has demonstrated the potential benefits of thermal-aware load placement and thermal mapping in cooling-intensive environments such as data centers. However, applying existing techniques to live data centers has proved difficult because the models are unrealistic, require extensive sensing instrumentation, or are disruptive to data center services when derived. The work presented in this paper discusses cyber-physical techniques and their associated challenges in creating an adaptive and non-invasive software system that derives realistic and low-complexity thermal models in an autonomic manner, using built-in and ambient sensors. Uses of these techniques range from assessing the thermal efficiency of the data center to designing a thermal-aware job scheduler, thus greening data centers. Specifically, this paper proposes and evaluates: i) new thermal models and a sensing software architecture, and ii) a method to identify relocation of equipment within a data center room to achieve permanent cost savings under all scheduling conditions.
Performance Evaluation
title Performance Analysis of the SiCortex SC072
author(s) Brian Martin, Andrew Leiker, Douglas Doerfler, James Laros III, Sandia National Laboratories, USA
presenter Brian Martin and Andrew Leiker
abstract The world of High Performance Computing (HPC) has seen a major shift towards commodity clusters in the last 10 years. A new company, SiCortex, has set out to break this trend. They have created what they claim to be a balanced cluster which makes use of low-power MIPS processors and a custom interconnect in an effort to avoid many of the bottlenecks plaguing most modern clusters. In this paper, we report the results of preliminary benchmarking of one of their systems, the SC072. First, we ran a collection of microbenchmarks to characterize the performance of interprocessor communication. Next, we ran some real applications relevant to high performance computing and compared performance and scalability to a typical commodity cluster. Lastly, we examined and compared the performance per watt of the SiCortex system to that of a commodity cluster.
A two-year intern at Sandia National Labs, Brian is currently attending Carnegie Mellon University studying computer science as a sophomore.
title QP: A Heterogeneous Multi-Accelerator Cluster
author(s) Michael Showerman, Wen-Mei Hwu, University of Illinois, USA; Jeremy Enos, Avneesh Pant, Volodymyr Kindratenko, Craig Steffen, Robert Pennington, NCSA, USA
presenter TBD
abstract We present a heterogeneous multi-accelerator cluster developed and deployed at NCSA. The cluster consists of 16 AMD dual-core CPU compute nodes each with four NVIDIA GPUs and one Xilinx FPGA. Cluster nodes are interconnected with both InfiniBand and Ethernet networks. The software stack consists of standard cluster tools with the addition of accelerator-specific software packages and enhancements to the resource allocation and batch sub-systems. We highlight several HPC applications that have been developed and deployed on the cluster. We also present our Phoenix application development framework that is meant to help with developing new applications and migrating existing legacy codes to heterogeneous systems.
The SC08 Cluster Challenge
title Windows HPC Server 2008 at the SC08 Cluster Challenge
author(s) Benjamin Jimenez, Arizona State University, USA
presenter Benjamin Jimenez
abstract Arizona State University undergraduates, supported by the ASU High Performance Computing Initiative and Microsoft Corporation, accepted the Supercomputing 2008 Cluster Challenge. Natalie Freed, Zachary Giles, Patrick Lu, Benjamin Jimenez, and Richard Wellington are part of the team developing a small cluster utilizing Windows HPC Server 2008 on a Cray CX1 blade server. The focus of the research follows the cluster challenge goal of showing the power of clusters to harness open source software to solve interesting problems. Development of this cluster focused on implementation of open source scientific applications. Since these applications normally perform under Unix environments, porting to Windows was a necessary component of the research. The significance of the challenge presented by porting to Windows during the development of the cluster is included in the discussion. At the final stage, visualizations created present a clearer understanding of the open source code output data.
title Bringing Disruptive Technology to Competition: Purdue and SiCortex
author(s) Alexander Younts, Andrew Howard, Preston Smith, Jeffrey Evans, Purdue University, USA
presenter TBD
abstract In November 2008, the second annual Cluster Challenge competition at the 2008 Supercomputing conference was held in Austin, Texas. Students from Purdue University made up one of seven teams of undergraduates in the competition. Wanting a change of pace from the commodity hardware of the first challenge, Purdue teamed up with their vendor SiCortex, Inc. Students were then given the task of learning the scientific applications POY, OpenFOAM, RAxML, WPP, and GAMESS, along with the HPC Challenge benchmark suite. In this paper, students from the Purdue Cluster Challenge team discuss the work done in preparation for the competition, along with their strategies and the effect of those strategies on the outcome of the Cluster Challenge.
title Optimizing Cluster Configuration and Applications to Maximize Power Efficiency
author(s) Jupp Müller, Timo Schneider, Jens Domke, Robin Geyer, Matthias Häsing, Torsten Hoefler, Stefan Höhlig, Guido Juckeland, Andrew Lumsdaine, Matthias S. Müller and Wolfgang E. Nagel
presenter Jupp Müller and Timo Schneider
abstract The goal of the Cluster Challenge is to design, build and operate a compute cluster. Although it is an artificial environment for cluster computing, many of its key constraints on operation of cluster systems are important to real world scenarios: high energy efficiency, reliability and scalability. In this paper, we describe our approach to accomplish these goals. We present our original system design and illustrate changes to that system as well as to applications and system settings in order to achieve maximum performance within the given power and time limits. Finally we suggest how our conclusions can be used to improve current and future clusters.
Performance Analysis
title Experiences with Managed Hosting of Virtual Machines
author(s) Dustin Leverman, University of Colorado, USA; Henry Tufo, Michael Oberg, Matthew Woitaszek, NCAR, USA
presenter TBD
abstract TBD
title Active Harmony: Getting the Human Out of the Performance Tuning Loop
author(s) Jeffrey Hollingsworth, University of Maryland, USA
presenter Jeffrey Hollingsworth
abstract Getting parallel programs to run well is a difficult, tedious, and time-consuming task. Programmers don't like tuning programs, and eventually they may not have to. In this talk I will present a system called Active Harmony that supports automated tuning of parallel programs. Active Harmony can be used to automatically tune runtime parameters and to drive compiler optimization. I will also present some performance results that show Harmony's auto-tuning providing better results than manual efforts, and similar performance to an exhaustive search of the parameter space.
title Experiences in Tuning Performance of Hybrid MPI/OpenMP Applications on Quad-core Systems
author(s) Ashay Rane, Dan Stanzione, Arizona State University, USA
presenter Ashay Rane
abstract The hybrid method of parallelization (using MPI for inter-node communication and OpenMP for intra-node communication) seems a natural fit for the way most clusters are built today. It is generally expected to help programs run faster due to factors like the availability of greater bandwidth for intra-node communication. Accordingly, this hybrid paradigm of parallel programming has gained widespread attention. However, optimizing hybrid applications for maximum speedup is difficult, primarily because OpenMP constructs provide limited transparency and because the resulting speedup depends on the combination in which MPI and OpenMP are used. In this paper we describe some of our experiences in trying to optimize applications built using MPI and OpenMP. More specifically, we discuss the different techniques that could be helpful to other researchers working on hybrid applications. To demonstrate the usefulness of these optimizations, we provide results from optimizing a few typical scientific applications. Using these optimizations, one hybrid code ran up to 34% faster than pure-MPI code.
System Software
title Evaluating the Shared Root File System Approach for Diskless High-Performance Computing Systems
author(s) Christian Engelmann, Hong Ong, Stephen Scott, Oak Ridge National Laboratory, USA
presenter Hong Ong
abstract Diskless high-performance computing (HPC) systems utilizing networked storage have become popular in the last several years. Removing disk drives significantly increases compute node reliability, as they are known to be a major source of failures. Furthermore, networked storage solutions utilizing parallel I/O and replication are able to provide increased scalability and availability. Reducing a compute node to processor(s), memory and network interface(s) greatly reduces its physical size, which in turn allows for large-scale dense HPC solutions. However, one major obstacle is the requirement by certain operating systems (OSs), such as Linux, for a root file system. While one solution is to remove this requirement from the OS, another is to share the root file system over the networked storage. This paper evaluates three networked file system solutions, NFSv4, Lustre and PVFS2, with respect to their performance, scalability, and availability features for servicing a common root file system in a diskless HPC configuration. Our findings indicate that Lustre is a viable solution as it meets both scaling and performance requirements. However, certain availability issues regarding single points of failure and control need to be considered.
title A Profile Guided Approach to Scheduling in Cluster and Multi-cluster Systems
author(s) Arvind Sridhar, Dan Stanzione, Arizona State University, USA
presenter TBD
abstract Effective resource management remains a challenge for large scale cluster computing systems, as well as for clusters of clusters. Resource management involves the ability to monitor resource usage and enforce policies to manage available resources and provide the desired level of service. One of the difficulties in resource management is that users are notoriously inaccurate in predicting the resource requirements of their jobs. In this research, a novel concept called 'profile guided scheduling' is proposed. This work examines whether the resource requirements of a given job in a cluster system can be predicted based on the past behavior of the user's submitted jobs. The scheduler can use this predicted value to estimate the job's performance metrics prior to scheduling and thus make better scheduling decisions. In particular, this approach is applied in a multi-cluster setting, where the scheduler must account for limited network bandwidth available between clusters. By having prior knowledge of a job's bandwidth requirements, the scheduler can make intelligent co-allocation decisions and avoid co-allocating jobs that consume high network bandwidth. This will mitigate the impact of limited network performance on co-allocated jobs, decreasing turnaround time and increasing system throughput.
title Parallel File Systems on High-End Computers
author(s) Walter Ligon, Clemson University, USA
presenter Walter Ligon
abstract This presentation will cover the state of the art in parallel file systems today, and discuss new developments in the PVFS project.
Data / Grids
title Addressing HPC Infrastructure Problems by Pooling HPC Resources: The Thebes Middleware Consortium
author(s) Arnie Miles, Georgetown University, USA
presenter Arnie Miles
abstract Outside the world of nationally funded supercomputers lie institutions with far greater computing demands than their infrastructure can support. HPC devices are purchased and shoved into closets, server rooms run out of power and air conditioning, and floor space is at a premium. For the researchers outside this world, be they K-12, higher ed, corporate, or even government, there are limits to what they can do. Even if their budgets support huge expenditures, which is frequently not the case, there is simply no way to commission huge computational resources in many cases. Of course, many installed resources stand idle a large percentage of the time. These are installed to handle peaks in demand, and the rest of the time they have little or no load. Meanwhile, network speeds continue to improve and network latency continues to fall. The Thebes Consortium is poised to discover and build the middleware stack that will allow the sharing of resources across administrative domains in a simple, secure, and scalable manner. There is already demand for these tools, and this demand will increase exponentially. Thebes uses the Shibboleth implementation of the Security Assertion Markup Language to pass user attributes to service providers. A sample service provider for Condor was created as part of a demonstration project, and a preproduction example of a Sun Grid Engine service provider using Shibboleth and DRMAA is already available for alpha testing. To enable resource discovery, Thebes is working to harmonize efforts made by Open Grid Forum, Ganglia, and the various job schedulers into a single, robust resource and job description language, which will be used by a Resource Discovery Network to provide ordered lists of qualified resources to users wishing to discover service providers capable of executing computational programs.
title Experiences with Managed Hosting of Virtual Machines
author(s) Dustin Leverman, University of Colorado, USA; Henry Tufo, Michael Oberg, Matthew Woitaszek, NCAR, USA
presenter TBD
abstract Virtual machines (VMs) are emerging as a powerful paradigm to deploy complex software stacks on distributed cyberinfrastructure. As a TeraGrid resource provider and developers of Grid-enabled software, we experience both the client and system administrator perspectives of service hosting. We are using this point of view to integrate and deploy a managed hosting environment based on VM technology that fulfills the requirements of both developers and system administrators. The managed hosting environment provides the typical base benefits of any VM hosting technique, but provides additional software support for developers and system monitoring opportunities for administrators. This paper presents our experiences creating and using a prototype managed hosting environment. The design and configuration of our current implementation is described, and its feature set is evaluated from the context of several current hosted projects.
title A Framework for Semantic-Based Dynamic Access Control in Data Grids
author(s) Anil Pereira, Charles Moseley, Benjamin VanTreese, David Goree, Karl Kirch (Southwestern Oklahoma State University, USA); Dennis Ferron (Delta Dental, Oklahoma City, Oklahoma, USA)
presenter TBD
abstract The abundance of large commercial and scientific data stores has driven the need for petascale computation and data integration. Technologies such as Grid Computing are being developed to address this need. Grid Computing supports the coordinated sharing of data and resources among different organizations. Data Grids focus on the management of data and resources for analyzing the data. Though Grid Computing technologies have been adopted in many scientific and commercial sectors, many security issues have to be resolved for them to gain wider acceptance. Their true potential will only be realized by developing secure systems that can encompass multiple organizations. In this paper, we consider the security implications of the dynamic interactions that would occur in Data Grids and examine the requirements for a comprehensive security model to support those interactions. We explain that such a model could be constructed by enhancing existing dynamic role-based access control models and semantic-based access control models. Additionally, we present an enumeration of the security requirements for such dynamic interactions. Our work takes into consideration that the co-allocation of resources and job scheduling in Data Grids should be based not only on the user’s request and the available pool and state of resources, but also on the user’s access rights, the computing environment and the security policies of resources. Furthermore, we discuss the problem of making access control decisions dynamically during an application’s runtime.
title XGet: A Highly Scalable and Efficient File Transfer Tool for Clusters
author(s) Hugh Greenberg, Latchesar Ionkov, Los Alamos National Laboratory, USA; Ronald Minnich, Sandia National Laboratories, USA
presenter TBD
abstract Transferring files between nodes in a cluster is a common occurrence for administrators and users. As clusters rapidly grow in size, transferring files between nodes can no longer be handled by the traditional transfer utilities due to their inherent lack of scalability. In this paper, we describe a new file transfer utility called XGet, which was designed to address the scalability problem of standard tools. We compared XGet against four transfer tools: Bittorrent, Rsync, TFTP, and Udpcast, and our results show that XGet's performance is superior to these utilities in many cases.
title Characterizing Parallel Scaling of Scientific Applications Using IPM
author(s) Nicholas Wright, San Diego Supercomputer Center, USA
presenter Nicholas Wright
abstract Scientific applications will have to scale to many thousands of processor cores to reach petascale. Therefore it is crucial to understand the factors that affect their scalability. Here we examine the strong scaling of four representative codes that exhibit different behaviors on four machines. We demonstrate the efficiency and analytic power of our performance monitoring tool, IPM, to understand the scaling properties of these codes in terms of their communication and computation components. Our analysis reveals complexities that would not become apparent from simple scaling plots; we disambiguate application from machine bottlenecks and attribute bottlenecks to communication or computation routines; we show by these examples how one could generate such a study of any application and benefit from comparing the result to the work already done. We evaluate the prospects for using extrapolation of results at low concurrencies as a method of performance prediction at higher concurrencies.
title Performance Measurement and Analysis Tools for Very Large Clusters
author(s) Bernd Mohr (Forschungszentrum Juelich GmbH, Germany)
presenter Bernd Mohr
abstract The number of processor cores available in high-performance computing systems is steadily increasing. In the latest (Nov 2008) list of the TOP500 supercomputers, 99.2% of the systems listed have more than 1024 processor cores and the average is about 6200. While these machines promise ever more compute power and memory capacity to tackle today's complex simulation problems, they force application developers to greatly enhance the scalability of their codes to be able to exploit it. To better support them in their porting and tuning process, many parallel tools research groups have already started to work on scaling their methods, techniques, and tools to extreme processor counts. In this talk, we survey existing profiling and tracing tools, report on our experience in using them in extreme scaling environments, review existing working and promising new methods and techniques, and discuss strategies for solving unsolved issues and problems.