Abstracts

2004 Abstracts: Plenary Presentations, Cluster Health, Data Rates, Applications Track, Systems Track, and Vendor Presentations.
Last updated: 23 April 2004. See also: Tutorials.

Plenary Session I

Title

Author(s)
Brian Ropers-Huilman

Author Inst
Louisiana State University, USA

Presenter

Abstract
Available soon.

Plenary Session II

Title

Author(s)
David Jursik

Author Inst
IBM Worldwide Deep Computing Sales

Presenter

Abstract
Available soon.

Plenary Session III

Title

Author(s)
Dr. Reza Rooholamini

Author Inst
Dell Product Group

Presenter

Abstract
Available soon.

Title
A Failure Predictive and Policy-Based High Availability Strategy for Linux High Performance Computing Cluster

Author(s)
Chokchai Leangsuksun1, Tong Liu1, Tirumala Rao1, Stephen L. Scott2, and Richard Libby3

Author Inst.
1Louisiana Tech University, 2Oak Ridge National Laboratory, 3Intel Corporation

Presenter

Abstract
Open Source Cluster Application Resources (OSCAR) is a fully integrated cluster software stack designed for building and maintaining a Linux Beowulf cluster. As OSCAR has become a popular tool for building cost-effective HPC clusters, High Availability (HA) is an equally important requirement for HPC systems, since an unavailable cluster delivers no performance. To combine HA and HPC features, we created the HA-OSCAR solution, which eliminates the numerous single points of failure in HPC systems and alleviates unplanned downtime through sophisticated self-healing mechanisms and hardware-level failure detection and prediction based on the Service Availability Forum's Hardware Platform Interface (OpenHPI). Service monitoring and policy-based head-node recovery are also discussed in detail. Furthermore, we investigate a network file-system issue during server failure and its resolution via the High Reliable Network File System (HR-NFS), without the need for an expensive hardware-based shared-storage solution. Our solution also enables graceful recovery with deliberate job checkpointing and migration upon head-node failure prediction. Finally, we introduce our Web-based management module, which provides customizable service monitoring and a recovery/failover management mechanism with effective cluster-monitoring capabilities.

Title

Author(s)
James E. Prewett

Author Inst.
HPC@UNM

Presenter

Abstract
One way to make the administration of large clusters bearable is to carefully analyze system logs with a real-time analysis tool. Unfortunately, the available free-software tools are lacking in many ways when events must be correlated across a potentially very large number of machines. Another issue with log analysis for larger clusters is that the volume of log data can be quite large. Free tools such as Logsurfer and SWATCH can be very inefficient at finding interesting messages because of the organization of their rulesets. LoGS is a log analysis engine that attempts to address many of the issues of maintaining cluster machines. LoGS has a dynamic ruleset, is able to look for one or more messages before triggering an action, and has a powerful programming language for configuration and extension. With proper ruleset construction, LoGS is a very efficient analysis engine.

Title

Author(s)
Kevin Pedretti, Ron Brightwell

Author Inst
Sandia National Laboratories, USA

Presenter

Abstract
The Portals data movement layer was specifically designed to support intelligent and/or programmable network interface cards, such as Quadrics QsNet. Portals provides elementary building blocks that can be combined to implement a variety of upper-layer protocols. As such, it is general enough to support many different types of services that require data movement, such as MPI and parallel file systems. While the QsNet interface and its associated software stack were also designed to support a variety of upper-layer protocols, there are significant differences in the approach taken to achieve generality. In this paper, we analyze the different capabilities offered by Portals and the QsNet network stack. We discuss the design and implementation of Portals for QsNet and present a performance comparison using micro-benchmarks. We analyze how the different approaches have impacted performance and discuss how future intelligent network interfaces may be able to overcome some of the current limitations.

Title
Benchmarking Parallel I/O Performance for Computational Fluid Dynamics Applications

Author
Thomas Hauser

Author Inst
Utah State University, USA

Presenter

Abstract
Available soon.

Applications Papers I

Title

Author(s)
Francisco Muniz

Author Inst
CDTN/CNEN, Brazil

Presenter

Abstract
This paper describes the porting strategies and the implementation of a dynamic load-balancing mechanism over the PVM library. Such a load-balancing mechanism, the Extended Gradient approach, is found in the open literature. The implementation was done using the C programming language, running on Linux/x86 compute nodes. Some results that validate the usefulness of the load-balancing system are presented. The conclusions are general and not restricted to any particular architecture of distributed-memory MIMD (Multiple Instruction, Multiple Data) machines.

Title
Optimizing Linux Cluster Performance by Exploring the Correlation between Application Characteristics and Gigabit Ethernet Device Parameters

Author(s)
Onur Celebioglu, Tau Leng, Victor Mashayekhi

Author Inst.
Dell Inc., USA

Presenter

Abstract
Cluster interconnect performance is typically characterized by latency and throughput. However, latency and throughput are not the only interconnect attributes that affect overall system performance; CPU utilization is also important. In our studies, we ran cluster benchmarks with two device drivers with different throughput and latency characteristics. We observed that point-to-point performance tests such as throughput and latency cannot be translated directly into application performance. We also tried to further tune the performance of the system by changing the interrupt-coalescing parameters of one of the drivers. Finally, we used this data to understand the correlation between an application's characteristics and interconnect performance attributes.

Title
Performance Analysis of a Hybrid Parallel Linear Algebra Kernel

Author(s)
Sue Goudy, Lorie Liebrock, and Steve Schaffer

Author Inst.
New Mexico Institute of Mining and Technology, USA

Presenter

Abstract
The focus of this paper is the performance of a kernel from a two-dimensional iterative solver. Complexity models for hybrid parallelization of block Gauss-Seidel relaxation are derived. We examine system parameters that can affect the performance of hybrid code. Complexity estimates are tested for a variety of decomposition strategies and problem sizes. Results from the Intel Teraflops supercomputer and from the Vplant visualization cluster at Sandia National Laboratories are presented. We show that the benefits of hybrid programming for this iterative solver are limited, on both the massively parallel system and the Linux cluster.

Applications Papers II: Education and Training

Title
In Search of Clusters for High-Performance Computing Education

Author(s)
Paul Gray

Author Inst
University of Northern Iowa, USA

Presenter

Abstract
Available soon.

Title

Author(s)
Amy Apon1, Jens Mache2, Yuriko Yara1, and Kurt Landrus1

Author Inst
1University of Arkansas and 2Lewis & Clark College, USA

Presenter

Abstract
Grid protocols and technologies are being adopted in a wide variety of academic, government, and industry research laboratories, and there is a growing body of research-oriented literature in Grid computing. However, there is a need for educational material that is suitable for classroom use. The goal of this paper is to develop and evaluate a suite of classroom exercises for use in a graduate or advanced undergraduate course. The exercises build on basic knowledge of operating systems concepts at the undergraduate level. This paper presents our design of the exercises. We evaluate the effectiveness of one exercise extensively and provide suggestions to educators about how to effectively use the Globus Toolkit 3 in a classroom setting.

Title
Automating the Large-Scale Collection and Analysis of Performance Data on Linux Clusters

Author(s)
Philip Mucci1, Jack Dongarra1, Shirley Moore1, Fengguang Song1, Felix Wolf1, and Rick Kufrin2

Author Inst
1University of Tennessee and 2NCSA/University of Illinois, USA

Presenter

Abstract

Systems Papers I

Title

Author
Yung-Chin Fang, Jeffrey Mayerson, Rizwan Ali, Monica Kashyap, Jenwei Hsieh, Tau Leng, and Victor Mashayekhi

Author Inst
Dell Inc., USA

Presenter

Abstract
The remote, hardware-level management of heterogeneous clusters (such as the remote power cycling of a hung node) is a necessary task for a computer center. This task requires knowledge across multiple specifications, fabrics (hardware, firmware, software, management), and implementations. For a heterogeneous cluster environment, there is little in common across hardware-level management interface implementations. In a heterogeneous HPCC, grid, or cyber-infrastructure environment, there is a need for a common hardware-management interface across unique architecture, platform, firmware, software, and management fabric implementations. This paper presents the framework of a unified interface across heterogeneous clusters to overcome these differences. This paper also addresses certain findings from the prototyping process.

Title
Cluster Security as a Unique Problem with Emergent Properties: Issues and Techniques

Author
William Yurcik, Gregory A. Koenig, Xin Meng, and Joseph Greenseid

Author Inst
NCSA/University of Illinois, USA

Presenter

Abstract
Large-scale commodity cluster systems are finding increasing deployment in academic, research, and commercial settings. Coupled with this increasing popularity are concerns regarding the security of these clusters. While an individual commodity machine may have prescribed best practices for security, a cluster of commodity machines has emergent security properties that are distinct from the sum of its parts. This concept has not yet been addressed in either cluster administration techniques or the research literature. We highlight the emergent properties of cluster security that distinguish it as a unique problem space and then outline a unified framework for protection techniques. We conclude with a description of preliminary progress on a monitoring project focused specifically on cluster security that we have started at the National Center for Supercomputing Applications.

Title

Author
Karl W. Schulz, Kent Milfeld, Chona S. Guiang, Avijit Purkayastha, Tommy Minyard, John R. Boisseau, and John Casu

Author Inst
TACC/University of Texas-Austin, USA

Presenter

Abstract
On multi-user HPC clusters, the batch system is a key component for aggregating compute nodes into a single, sharable computing resource. The batch system becomes the "nerve center" for coordinating the use of resources and controlling the state of the system in a way that must be "fair" to its users. Large, multi-user clusters need batch utilities that are robust, reliable, flexible, and easy to use and administer. In this paper we present our experiences with the configuration and deployment of a terascale cluster of 600 processors, with particular attention given to the integration of the LSF HPC batch system software by Platform Computing. To begin, we review the cluster design and present our requirements for a production batch environment supporting a community of hundreds of users. Next, we outline the configuration and extensions made to the LSF batch system and operating environment to meet our design criteria, including the development of job-monitoring and job-filtering applications, authentication modifications to manage compute-node access, and integration of the system with internal accounting applications. Initial scalability results using LSF for MPI applications are presented and compared against modified versions of the LSF application suite. The modified version incurred substantially lower overhead and provided good scalability for MPI applications on up to 600 processors. Implementation of software updates as RPM packages, the use of modules for environment management, and the development of tools for monitoring compute-node software states have helped to ensure a consistent, system-wide environment for user jobs across node failures and system reboots.

Systems Papers II: Processor and File System Performance

Title
Performance Characteristics of Dual-Processor HPC Cluster Nodes Based on 64-bit Commodity Processors

Author
A. Purkayastha, C.S. Guiang, K. Schulz, T. Minyard, K. Milfeld, W. Barth, P. Hurley, and J.R. Boisseau

Author Inst
TACC/University of Texas, USA

Presenter

Abstract
Dual-processor nodes are the preferred building blocks in HPC clusters because of the greater performance-to-price ratio of such configurations relative to clusters comprising single-processor nodes. The arrival of 64-bit commodity processors for HPC is advantageous for applications that require large amounts of memory and I/O because of the larger memory addressability of these processors. Some of these 64-bit processors also use more advanced memory subsystems, which provide increased performance for some applications. This paper examines the overall performance characteristics of three dual-processor systems based on commodity 64-bit processors: the Intel Itanium2, the AMD Opteron, and the IBM PowerPC 970, also known as the Apple PowerPC G5. First, a low-level characterization of each system is obtained using a variety of computational kernels and micro-benchmarks to measure the speeds of the functional units and memory subsystems. Performance measurements and analysis of several scientific applications that span a wide range of computational requirements are presented next. Finally, we offer some general observations and insights on performance for application developers and discuss 32- to 64-bit migration and interoperability issues.

Title
An Analysis of State-of-the-Art Parallel File Systems for Linux

Author
Martin W. Margo, Patricia A. Kovatch, Phil Andrews, Bryan Banister

Author Inst
SDSC/University of California - San Diego, USA

Presenter

Abstract
Parallel file systems are a critical piece of any Input/Output (I/O)-intensive high-performance computing system. A parallel file system enables each process on every node to perform I/O to and from a common storage target. With more and more sites adopting Linux clusters for high-performance computing, the need for high-performing I/O on Linux is increasing. New options are available for Linux: IBM's GPFS (General Parallel File System) and Cluster File Systems, Inc.'s Lustre. The Parallel Virtual File System (PVFS) from Clemson University and Argonne National Laboratory continues to be available. Using our IA-64 Linux cluster testbed, we evaluated each parallel file system on its ease of installation and administration, redundancy, performance, scalability, and special features. We analyzed the results of our experiences and concluded with comparison information.

Title
Comparing Linux Clusters for the Community Climate System Model

Author
Matthew Woitaszek, Michael Oberg, and Henry M. Tufo

Author Inst
University of Colorado - Boulder, USA

Presenter

Abstract
In this paper, we examine the performance of two components of the NCAR Community Climate System Model (CCSM), executing on clusters with a variety of microprocessor architectures and interconnects. Specifically, we examine the execution time and scalability of the Community Atmosphere Model (CAM) and the Parallel Ocean Program (POP) on Linux clusters with Intel Xeon and AMD Opteron processors, using Dolphin, Myrinet, and InfiniBand interconnects, and compare the performance of the cluster systems to an SGI Altix and an IBM p690 supercomputer. Of the architectures examined, clusters constructed using AMD Opteron processors generally demonstrate the best performance, outperforming Xeon clusters and occasionally the IBM p690 supercomputer in simulated years per day.

Vendor Session I

Title

Author(s)
Ben Rosen

Author Inst
Dell Inc., USA

Presenter
Ben Rosen

Abstract
Available soon.

Title
Smart Interconnect: Recent Developments in Myricom Hardware and Software

Author(s)
Patrick Geoffray

Author Inst
Myricom, USA

Presenter

Abstract
Available soon.

Vendor Session II

Title

Author(s)
Rebecca Austen and Jay Urbanski

Author Inst
IBM, USA

Presenter

Abstract
Available soon.

Title

Author(s)
Robert Ballance

Author Inst
Sandia National Laboratories, USA

Presenter

Abstract
Available soon.