| Plenary Presentations
|
| Plenary Session I
|
Title |
|
Title |
Author(s) |
|
Brian Ropers-Huilman |
Author Inst |
|
Louisiana
State University, USA |
Presenter |
|
Brian Ropers-Huilman |
| Abstract |
|
Available soon. |
|
|
| Plenary Session II
|
Title |
|
Title |
Author(s) |
|
David Jursik |
Author Inst |
|
IBM Worldwide Deep Computing Sales |
Presenter |
|
David Jursik |
| Abstract |
|
Available soon.
|
|
|
| Plenary Session III
|
Title |
|
Title |
Author(s) |
|
Dr. Reza Rooholamini |
Author Inst |
|
Dell Product Group |
Presenter |
|
Dr. Reza
Rooholamini |
| Abstract |
|
Available soon. |
|
|
| Cluster
Health
|
| Title |
|
A Failure
Predictive and Policy-Based High Availability Strategy for
Linux High Performance Computing Cluster |
| Author(s) |
Chokchai Leangsuksun1, Tong Liu1, Tirumala
Rao1, Stephen L. Scott2, and Richard Libby3 |
| Author Inst. |
1Louisiana Tech University, 2Oak
Ridge National Laboratory, 3Intel Corporation |
| Presenter |
Chokchai
Leangsuksun |
| Abstract |
The Open Source Cluster Application Resources
(OSCAR) is a fully integrated cluster software stack designed
for building and maintaining a Linux Beowulf cluster. As OSCAR
has become a popular tool for building cost-effective HPC
clusters, High Availability (HA) becomes an equally important
requirement, since an unavailable cluster delivers no performance.
To embrace both HA and HPC features,
we created the HA-OSCAR solution, which eliminates the numerous
single points of failure in HPC systems and alleviates unplanned
downtime through sophisticated self-healing mechanisms and
hardware-level failure detection and prediction based on the
Service Availability Forum's Hardware Platform Interface (OpenHPI).
Service monitoring and policy-based head node recovery
are also discussed in detail. Furthermore, we investigate a network
file-system issue during server failure and its resolution via the
High Reliable Network File System (HR-NFS), without the need
for an expensive hardware-based, shared-storage solution. In addition,
our solution enables graceful recovery through deliberate
job checkpointing and migration upon head node failure prediction.
Finally, we introduce our Web-based management module, which
provides customizable service monitoring and a recovery/failover
management mechanism with an effective cluster monitoring ability.
|
|
|
Title |
Listening to Your Cluster
with LoGS |
Author(s) |
James E. Prewett |
Author Inst. |
HPC@UNM |
Presenter |
Jim Prewett |
Abstract |
Introduction
Large systems are now being built
from smaller systems. GNU/Linux clusters have gone from a fad
at a couple of academic institutions to being some of the largest
and fastest computers in the world. Similarly designed systems
have become the bread and butter of more-traditional supercomputing
vendors. Cluster computing is now big business. All of the
computers listed in the TOP500 List for November 2003 are clusters!
Further, smaller clusters are now being purchased (and often
administered) by individual research groups; clusters are accessible
computing power.
One way to make the administration of these machines bearable
is to carefully analyze system logs with a real-time analysis
tool. Unfortunately, the available free-software tools are
lacking in many ways when events must be correlated across a
potentially very large number of machines. Another issue with
log analysis for larger clusters is that the volume of log data
can be quite large. Free tools such as Logsurfer and SWATCH can
be very inefficient when finding interesting messages due to
the organization of their rulesets.
LoGS is a log analysis engine that attempts to address many
of the issues with maintaining cluster machines. LoGS has a dynamic
ruleset, is able to look for one or more messages before triggering
an action, and uses a powerful programming language for configuration
and extension. With proper ruleset construction, LoGS is a very
efficient analysis engine. |
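The idea of waiting for more than one message before triggering an action can be illustrated with a minimal Python sketch. This is not LoGS code and does not use its configuration language; `SequenceRule`, `feed`, `run`, and the sample log lines are all hypothetical names and data invented for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Pattern


@dataclass
class SequenceRule:
    """Hypothetical rule: call `action` once every pattern has matched, in order."""
    patterns: List[Pattern[str]]
    action: Callable[[List[str]], None]
    _matched: List[str] = field(default_factory=list)

    def feed(self, line: str) -> bool:
        """Offer one log line; return True if the rule fired on this line."""
        expected = self.patterns[len(self._matched)]
        if not expected.search(line):
            return False
        self._matched.append(line)
        if len(self._matched) < len(self.patterns):
            return False
        self.action(list(self._matched))
        self._matched.clear()  # reset so the rule can fire again later
        return True


def run(rules: List[SequenceRule], lines: List[str]) -> int:
    """Feed every line to every rule; return how many times a rule fired."""
    return sum(rule.feed(line) for line in lines for rule in rules)


# Invented sample data: alert only when a link failure is followed by a job failure.
alerts: List[List[str]] = []
rule = SequenceRule(
    patterns=[re.compile(r"link down"), re.compile(r"job \d+ failed")],
    action=alerts.append,
)
log = [
    "node07 kernel: eth0 link down",
    "node03 sshd: session opened",
    "node07 pbs_mom: job 4211 failed",
]
fired = run([rule], log)
```

Here the unrelated `sshd` line is ignored, and the action receives both correlated messages together. A real engine would also need timeouts and per-host context, which this sketch omits.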
|
| Data Rates
|
| Title |
A NIC-Offload Implementation
of Portals for Quadrics QsNet |
| Author(s) |
Kevin Pedretti, Ron Brightwell |
| Author Inst |
Sandia National Laboratories, USA |
| Presenter |
Kevin Pedretti |
| Abstract |
The Portals data movement layer was specifically
designed to support intelligent and/or programmable network interface
cards, such as Quadrics QsNet. Portals provides elementary building
blocks that can be combined to implement a variety of upper
layer protocols. As such, it is general enough to support many
different types of services that require data movement, such
as MPI and parallel file systems. While the QsNet interface and
its associated software stack were also designed to support a
variety of upper-layer protocols, there are significant differences
in the approach taken to achieve generality. In this paper, we
analyze the different capabilities offered by Portals and the
QsNet network stack. We discuss the design and implementation
of Portals for QsNet and present a performance comparison using
micro-benchmarks. We analyze how the different approaches have
impacted performance and discuss how future intelligent network
interfaces may be able to overcome some of the current limitations.
|
|
|
| Title |
Benchmarking Parallel
I/O Performance for Computational Fluid Dynamics Applications |
| Author |
Thomas Hauser |
| Author Inst |
Utah State University, USA |
| Presenter |
Thomas Hauser |
| Abstract |
Available soon.
|
|
|
| Applications
Track Abstracts:
|
| Applications Papers I
|
| Title |
Dynamic Load-Balancing
Algorithm Porting on MIMD Machines |
| Author(s) |
Francisco Muniz |
| Author Inst |
CDTN/CNEN, Brazil |
| Presenter |
Francisco Muniz |
| Abstract |
This paper describes the porting strategies and
the implementation of a dynamic load-balancing mechanism over
the PVM library. Such a load-balancing mechanism, the Extended
Gradient approach, is found in the open literature. The implementation
was done using the 'C' programming language, running over Linux/X86
compute nodes. Some results that validate the usefulness of the
load-balancing system are presented. The conclusions are general
and not restricted to any particular architecture of distributed-memory
MIMD (Multiple Instruction, Multiple Data) machines.
|
|
|
Title |
Optimizing Linux Cluster
Performance by Exploring the Correlation between Application
Characteristics and Gigabit Ethernet Device Parameters |
Author(s) |
Onur Celebioglu, Tau Leng, Victor Mashayekhi |
Author Inst. |
Dell Inc., USA |
Presenter |
Onur Celebioglu |
Abstract |
Cluster interconnect performance is typically
characterized by latency and throughput. However, not only latency
and throughput but also the CPU utilization of an interconnect
are important attributes that affect overall system performance.
In our studies, we have run cluster benchmarks with two device
drivers with different throughput and latency characteristics.
We have observed that point-to-point performance tests such as
throughput and latency cannot be translated directly into application
performance. We also tried to further tune the performance of
the system by changing the interrupt coalescing parameters of one
of the drivers. Finally, we used this data to understand the
correlation between an application's characteristics and interconnect
performance attributes.
|
|
|
Title |
Performance Analysis
of a Hybrid Parallel Linear Algebra Kernel |
Author(s) |
Sue Goudy, Lorie Liebrock, and Steve Schaffer |
Author Inst. |
New Mexico Institute of Mining and Technology,
USA |
Presenter |
Sue Goudy |
Abstract |
The focus of this paper is the performance of
a kernel from a two-dimensional iterative solver. Complexity
models for hybrid parallelization of block Gauss-Seidel relaxation
are derived. We examine system parameters that can affect the
performance of hybrid code. Complexity estimates are tested for
a variety of decomposition strategies and problem sizes. Results
from the Intel Teraflops supercomputer and from the Vplant visualization
cluster at Sandia National Laboratories are presented. We show
that the benefits of hybrid programming for this iterative solver
are limited, on both the massively parallel system and the Linux
cluster.
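For readers unfamiliar with the relaxation being analyzed, the following is a minimal serial sketch of Gauss-Seidel relaxation on a two-dimensional grid. It is a plain point-wise variant for the Laplace equation, not the block or hybrid-parallel kernel studied in the paper; the function names, grid size, boundary values, and tolerance are all assumptions chosen for illustration.

```python
from typing import List

Grid = List[List[float]]


def gauss_seidel_sweep(u: Grid) -> float:
    """One in-place Gauss-Seidel sweep for the 2D Laplace equation.

    Boundary values are held fixed. Returns the largest change made to any
    interior point, which serves as a simple convergence measure.
    """
    max_change = 0.0
    for i in range(1, len(u) - 1):
        for j in range(1, len(u[i]) - 1):
            # Neighbors (i-1, j) and (i, j-1) were already updated in this sweep.
            new = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
            max_change = max(max_change, abs(new - u[i][j]))
            u[i][j] = new
    return max_change


def solve(u: Grid, tol: float = 1e-8, max_sweeps: int = 10_000) -> int:
    """Sweep until the largest update falls below `tol`; return sweeps used."""
    for sweep in range(1, max_sweeps + 1):
        if gauss_seidel_sweep(u) < tol:
            return sweep
    return max_sweeps


# 5x5 grid: top edge held at 1.0, other edges at 0.0, interior starts at 0.0.
n = 5
grid = [[1.0] * n if i == 0 else [0.0] * n for i in range(n)]
sweeps = solve(grid)
```

Because each update depends on values computed earlier in the same sweep, the loop cannot be split across processes or threads without changing the data dependencies, which is why decomposition strategy matters for a parallel version.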
|
|
|
| Applications Papers II:
Education and Training
|
Title |
In Search of Clusters
for High-Performance Computing Education |
Author(s) |
Paul Gray |
Author Inst |
University of Northern Iowa, USA |
Presenter |
Paul Gray |
| Abstract |
Available soon. |
|
|
| Title |
Classroom Exercises
for Grid Services |
| Author(s) |
Amy Apon1, Jens Mache2,
Yuriko Yara1, and Kurt
Landrus1 |
| Author Inst |
1University of Arkansas and 2Lewis & Clark College,
USA |
| Presenter |
Amy Apon |
| Abstract |
Grid protocols and technologies are being adopted
in a wide variety of academic, government, and industry research
laboratories, and there is a growing body of research-oriented
literature in Grid computing. However, there is a need for educational
material that is suitable for classroom use. The goal of this
paper is to develop and evaluate a suite of classroom exercises
for use in a graduate or advanced undergraduate course. The exercises
build on basic knowledge of operating systems concepts at the
undergraduate level. This paper presents our design of the exercises.
We evaluate the effectiveness of one exercise extensively and
provide suggestions to educators about how to effectively use
the Globus Toolkit 3 in a classroom setting.
|
|
|
| Title |
Automating the
Large-Scale Collection and Analysis of Performance Data on Linux
Clusters |
| Author(s) |
Philip Mucci1, Jack Dongarra1,
Shirley Moore1, Fengguang Song1, Felix Wolf1, and Rick
Kufrin2 |
| Author Inst |
1University of Tennessee and 2NCSA/University
of Illinois, USA |
| Presenter |
Rick Kufrin |
| Abstract |
Introduction
Many factors contribute to overall
application performance in today's high-performance cluster computing
environments. These factors include the memory subsystem, network
hardware and software stack, compilers and libraries, and I/O
subsystem. The large variability in hardware and software configuration
present in clusters can cause application performance to also
exhibit large variability on different platforms or on the same
platform over time. Compute-intensive applications may perform
well on an architecture with efficient utilization of CPU and
single-processor memory, such as the Intel Xeon, while memory-intensive
applications may perform well on an architecture with good scalability
of the memory subsystem, such as the AMD Opteron node. Even with
a fixed hardware configuration, software factors can cause large
variations in performance. Compilers that produce acceptable
code on some platform configurations may produce sub-optimal
code on other platform variants. Some math libraries require
hand tuning of various compiled-in parameters for each variant of the
same platform. Some libraries (e.g., BLAS, LAPACK) have standardized
APIs that are shared across different implementations that can
have considerable variations in performance. It can be difficult
to predict which library variant will perform best on a particular
platform without testing each variant on the platform. If an
application is updated and/or ported to a platform not originally
supported, the optimization flags in the application Makefile
may be anachronistic or otherwise inappropriate and may need
to be altered to achieve acceptable performance on new target
platforms and platform variants.
|
|
|
| Systems
Track Abstracts:
|
| Systems Papers I
|
| Title |
Unified Heterogeneous
HPCC Hardware Management Framework |
| Author |
Yung-Chin Fang, Jeffrey Mayerson, Rizwan Ali,
Monica Kashyap, Jenwei Hsieh, Tau Leng, Victor Mashayekhi |
| Author Inst |
Dell Inc., USA |
| Presenter |
Yung-Chin Fang |
| Abstract |
The remote, hardware-level management of heterogeneous
clusters (such as the remote power cycling of a hung node) is
a necessary task for a computer center. This task requires knowledge
across multiple specifications, fabrics (hardware, firmware,
software, management) and implementations. For a heterogeneous
cluster environment, there is little in common across hardware-level
management interface implementations. In a heterogeneous HPCC,
grid or cyber-infrastructure environment, there is a need to have
a common hardware-management interface across unique architecture,
platform, firmware, software and management fabric implementations.
This paper presents the framework of a unified interface across
heterogeneous clusters to overcome these differences. This paper
also addresses certain findings in the prototyping process.
|
|
|
| Title |
Cluster Security
as a Unique Problem with Emergent Properties: Issues and Techniques |
| Author |
William Yurcik, Gregory A. Koenig, Xin Meng,
and Joseph Greenseid |
| Author Inst |
NCSA/University of Illinois, USA |
| Presenter |
Joseph Greenseid |
| Abstract |
Large-scale commodity cluster systems are finding
increasing deployment in academic, research, and commercial settings.
Coupled with this increasing popularity are concerns regarding
the security of these clusters. While an individual commodity
machine may have prescribed best practices for security, a cluster
of commodity machines has emergent security properties that
are distinct from the sum of its parts. This concept has not yet
been addressed in either cluster administration techniques or
the research literature. We highlight the emergent properties
of cluster security that distinguish it as a unique problem
space and then outline a unified framework for protection techniques.
We conclude with a description of preliminary progress on a monitoring
project focused specifically on cluster security that we have
started at the National Center for Supercomputing Applications. |
|
|
| Title |
Batch System Deployment
on a Production Terascale Cluster |
| Author |
Karl W. Schulz, Kent Milfeld, Chona S. Guiang,
Avijit Purkayastha, Tommy Minyard, John R. Boisseau, and John
Casu |
| Author Inst |
TACC/University of Texas-Austin, USA |
| Presenter |
Karl Schulz |
| Abstract |
On multi-user HPC clusters, the batch system
is a key component for aggregating compute nodes into a single,
sharable computing resource. The batch system becomes the "nerve
center" for coordinating the use of resources and controlling
the state of the system in a way that must be "fair" to its users.
Large, multi-user clusters need batch utilities that are robust,
reliable, flexible, and easy to use and administer. In this paper
we present our experiences with the configuration and deployment
of a terascale cluster of 600 processors, with particular attention
given to the integration of the LSF HPC batch system software
by Platform Computing. To begin, we review the cluster design
and present our requirements for a production batch environment
supporting a community of hundreds of users. Next, we outline
the configuration and extensions made to the LSF batch system
and operating environment to meet our design criteria, including
the development of job-monitoring and job-filtering applications,
authentication modifications to manage compute node access,
and integration of the system with internal accounting applications.
Initial scalability results using LSF for MPI applications are
presented and compared against modified versions of the LSF application
suite. The modified version incurred substantially lower overhead
and provided good scalability on MPI applications up to 600 processors.
Implementation of software updates as RPM packages, the use of
modules for environment management, and the development of tools
for monitoring compute-node software states have helped to ensure
a consistent, system-wide environment for user jobs across node
failures and system reboots. |
|
|
| Systems Papers II: Processor
and File System Performance
|
| Title |
Performance Characteristics
of Dual-Processor HPC Cluster Nodes Based on 64-bit Commodity
Processors |
| Author |
A. Purkayastha, C.S. Guiang, K. Schulz, T. Minyard,
K. Milfeld, W. Barth, P. Hurley, and J.R. Boisseau |
| Author Inst |
TACC/University of Texas, USA |
| Presenter |
Chona S. Guiang |
| Abstract |
Dual-processor nodes are the preferred building
blocks in HPC clusters because of the greater performance-to-price
ratio of such configurations relative to clusters comprising
single-processor nodes. The arrival of 64-bit commodity clusters
for HPC is advantageous for applications that require large amounts
of memory and I/O because of the larger memory addressability
of these processors. Some of these 64-bit processors also use
more advanced memory subsystems, which provide increased performance
for some applications. This paper examines the overall performance
characteristics of three dual-processor systems based on commodity
64-bit processors: Intel Itanium2, AMD Opteron and IBM PowerPC
970, also known as the Apple PowerPC G5. First, a low-level characterization
of each system is obtained using a variety of computational kernels
and micro-benchmarks to measure the speeds of the functional
units and memory subsystems. Performance measurements and analysis
of several scientific applications that span a wide range of
computational requirements are presented next. Finally, we offer
some general observations and insights on performance for applications
developers and discuss 32- to 64-bit migration and interoperability
issues. |
|
|
| Title |
An Analysis of
State-of-the-Art Parallel File Systems for Linux |
| Author |
Martin W. Margo, Patricia A. Kovatch, Phil Andrews,
Bryan Banister |
| Author Inst |
SDSC/University of California - San Diego, USA |
| Presenter |
Martin W. Margo |
| Abstract |
Parallel file systems are a critical piece of
any Input/Output (I/O)-intensive high-performance computing system.
A parallel file system enables each process on every node to
perform I/O to and from a common storage target. With more and
more sites adopting Linux clusters for high-performance computing,
the need for high-performing I/O on Linux is increasing. New
options are available for Linux: IBM's GPFS (General Parallel
File System) and Cluster File Systems, Inc.'s Lustre. Parallel
Virtual File System (PVFS) from Clemson University and Argonne
National Laboratory continues to be available. Using our IA-64
Linux cluster testbed, we evaluated each parallel file system
on its ease of installation and administration, redundancy, performance,
scalability and special features. We analyzed the results of
our experiences and concluded with comparison information.
|
|
|
| Title |
Comparing Linux
Clusters for the Community Climate System Model |
| Author |
Matthew Woitaszek, Michael Oberg, and Henry
M. Tufo |
| Author Inst |
University of Colorado - Boulder, USA |
| Presenter |
Matthew Woitaszek |
| Abstract |
In this paper, we examine the performance
of two components of the NCAR Community Climate System Model
(CCSM), executing on clusters with a variety of microprocessor
architectures and interconnects. Specifically, we examine the
execution time and scalability of the Community Atmospheric
Model (CAM) and the Parallel Ocean Program (POP) on Linux clusters
with Intel Xeon and AMD Opteron processors, using Dolphin,
Myrinet, and Infiniband interconnects, and compare the performance
of the cluster systems to an SGI Altix and IBM p690 supercomputer.
Of the architectures examined, clusters constructed using AMD
Opteron processors generally demonstrate the best performance,
outperforming Xeon clusters and occasionally an IBM p690 supercomputer
in simulated years per day.
|
|
|
Vendor
Presentations
|
| Vendor Session I
|
Title |
The File System
Challenge in HPC |
Author(s) |
Ben Rosen |
Author Inst |
Dell Inc., USA |
Presenter |
Ben Rosen |
| Abstract |
Available soon. |
|
|
Title |
Smart Interconnect:
Recent Developments in Myricom Hardware and Software |
Author(s) |
Patrick Geoffray |
Author Inst |
Myricom, USA |
Presenter |
Patrick Geoffray |
| Abstract |
Available soon. |
|
|
| Vendor Session II
|
Title |
Clustering Solutions
from IBM |
Author(s) |
Rebecca Austen and Jay Urbanski |
Author Inst |
IBM, USA |
Presenter |
Rebecca Austen and Jay Urbanski |
| Abstract |
Available soon. |
|
|
Title |
Experiences with
Large Production Clusters at Sandia |
Author(s) |
Robert Ballance |
Author Inst |
Sandia National Laboratories, USA |
Presenter |
Robert Ballance |
| Abstract |
Available soon. |
|
|