Plenary
Presentations
|
Plenary I
|
Title |
|
Opportunities
and Challenges in High-End Computing for Science
and Engineering |
Author(s) |
|
Thom H. Dunning, Jr. |
Author Inst |
|
NCSA/University of Illinois Urbana-Champaign |
Presenter |
|
Thom
H. Dunning, Jr. |
Abstract |
|
Computational modeling and simulation
were among the most significant developments in
the practice of scientific inquiry in the 20th
century. They have made significant contributions to
scientific and engineering research programs and are finding
increasing use in a broad range of industrial applications.
We are presently in the midst of a revolution in
computing technologies with an order-of-magnitude
increase in computing capability every three to
five years. The most pressing question now is:
What must scientists and engineers do to harness
the power of high-end computing and information
technologies to solve the most critical problems
in science and engineering? Realizing the scientific
and engineering advances promised by the revolution
in computing technologies will require a holistic
approach to computational science and engineering.
It will require advances in the theoretical and
mathematical sciences leading to computational
models of ever increasing predictive power and
fidelity. It will require close collaboration
among computational scientists, computer scientists,
and applied mathematicians to translate these advances
into scientific and engineering applications that
can realize the full potential of high-end computers. It
will require educating a new generation of scientists
and engineers to use computational modeling and
simulation to address the challenging problems
that they will face in the 21st century. In this
presentation, we will briefly discuss all of these
issues. |
|
|
Plenary II
|
Title |
|
Transforming
the Sensing and Prediction of Intense Local
Weather Through Dynamic Adaptation: People
and Technologies Interacting with the Atmosphere |
Author(s) |
|
Kelvin Droegemeier |
Author Inst |
|
University of Oklahoma |
Presenter |
|
Kelvin
Droegemeier |
Abstract |
|
Each year across the United States,
floods, tornadoes, hail, strong winds, lightning,
and winter storms—so-called mesoscale
weather events—cause hundreds of deaths,
routinely disrupt transportation and commerce,
and result in annual economic losses greater than
$13B. Although mitigating the impacts of such events
would yield enormous economic and societal benefits,
the ability to do so is stifled by rigid IT frameworks
that cannot accommodate the real time, on-demand,
and dynamically adaptive needs of mesoscale
weather research; its disparate, high-volume data
sets and streams; and the tremendous computational
demands of its numerical models and data assimilation
systems.
This presentation describes a major paradigm
shift now under way in the field of meteorology—away
from today's environment in which remote sensing
systems, atmospheric prediction models, and hazardous
weather detection systems operate in fixed configurations,
and on fixed schedules largely independent of
weather—to one in which they can change
their configuration dynamically in response to
the evolving weather. This transformation involves
the creation of Grid-enabled systems that can
operate on demand and obtain the needed computing,
networking, and storage resources with little
prior planning, as weather and end-user needs
dictate. In addition to describing the research
and technology development being performed to
establish this capability, I discuss the associated
economic and societal implications of dynamically
adaptive weather systems and the manner in which
this new paradigm can serve as an underpinning
for future cyberinfrastructure development. |
|
|
Plenary III
|
Title |
|
Life, Liberty
and the Pursuit of Larger Clusters |
Author(s) |
|
Mark Seager |
Author Inst |
|
Lawrence Livermore National Laboratory |
Presenter |
|
Mark
Seager |
Abstract |
|
The state of the art in Linux clusters
is O(1K) nodes. Now many institutions are wondering
if even larger clusters can be built. Also, many
cluster interconnects can practically scale to
O(4K) to O(6K) nodes. In this talk, we offer some thoughts
on the current limitations in Linux clusters and
their impact on scaling up. We also offer some
thoughts on what topics Linux cluster developers
should focus on in order to enable much larger
clusters. |
|
|
Applications
Track Abstracts:
|
Applications
Papers I:
|
Title |
|
Performance
Metrics for Ocean and Air Quality Models on Commodity
Linux Platforms |
Author(s) |
George Delic |
Author Inst. |
HiPERiSM Consulting |
Presenter |
George
Delic |
Abstract |
This is a report on a project to evaluate industry-standard Fortran
90/95 compilers for IA-32 Linux commodity platforms when
applied to Air Quality Models (AQMs). The goal is to determine
the optimal performance and workload throughput achievable
with commodity hardware. Only a few AQMs have been successfully
converted to OpenMP (CAMs) or MPI (CMAQ), and considerable
work remains to be done on others. In exploring the potential
for parallelism it has been interesting to discover the problems
with serial performance on several AQM codes. For this reason
we have searched for more precise metrics of performance
as an aid to measuring progress in performance enhancement.
The historical analogy is the programming environment on
Cray architectures, which enabled the development of performance
attributes for either individual codes or workloads using
hardware performance counters. Since commodity processors
also have performance counters, software interfaces such
as PAPI may be used to read them.
This study applied the PAPI library to understand the
delivered performance of two AQMs, ISCST3 and AERMOD,
and the optimal achievable performance. For the latter,
as a baseline, two ocean models with good vector character
have been included. These are used to measure
the optimal performance to be expected on commodity
hardware with available compiler technology.
In addition to performance metrics (as derived
from hardware performance counter values), some
comments on I/O storage performance are also
included because of the special character of
I/O requirements in AQMs. |
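As a hedged illustration of the counter-based measurement the abstract describes, the following minimal C sketch reads hardware counters through the classic PAPI high-level API. The choice of events (total cycles and floating-point operations) and the toy kernel are assumptions for illustration only, not taken from the paper.

    #include <stdio.h>
    #include <stdlib.h>
    #include <papi.h>

    /* Hypothetical compute kernel standing in for an AQM inner loop. */
    static double work(int n)
    {
        double s = 0.0;
        for (int i = 1; i <= n; i++)
            s += 1.0 / (double)i;
        return s;
    }

    int main(void)
    {
        /* Count total cycles and floating-point operations around the kernel. */
        int events[2] = { PAPI_TOT_CYC, PAPI_FP_OPS };
        long long counts[2];

        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
            fprintf(stderr, "PAPI init failed\n");
            return EXIT_FAILURE;
        }
        if (PAPI_start_counters(events, 2) != PAPI_OK) {
            fprintf(stderr, "PAPI_start_counters failed\n");
            return EXIT_FAILURE;
        }

        double s = work(10 * 1000 * 1000);

        if (PAPI_stop_counters(counts, 2) != PAPI_OK) {
            fprintf(stderr, "PAPI_stop_counters failed\n");
            return EXIT_FAILURE;
        }

        printf("sum = %f\n", s);
        printf("cycles = %lld, fp ops = %lld, fp ops/cycle = %f\n",
               counts[0], counts[1], (double)counts[1] / (double)counts[0]);
        return EXIT_SUCCESS;
    }

Ratios such as floating-point operations per cycle, derived this way, are the kind of delivered-performance metric the abstract refers to.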
|
|
Title |
Cluster
Computing Through an Application-Oriented Computational
Chemistry Grid |
Author(s) |
Kent Milfeld |
Author Inst. |
TACC/University of Texas at Austin |
Presenter |
Kent
Milfeld |
Abstract |
Over the last 20 years, personal
computers and networking infrastructures have greatly
enhanced the working environment and communication
of researchers. Also, Linux clusters now flourish,
providing significant resources for executing parallel
applications. But there is still a gap between
desktop environments of individuals and the wide
assortment of Unix-flavored HPC system environments.
Grid technologies are delivering tools to bridge
this gap, especially between HPC systems; but the
difficulty of implementing the infrastructure software
(installation, configuration, etc.) has discouraged
adoption of
grid software at the desktop level. Hence, users who employ long-running
parallel applications in their research still log into a grid-enabled
machine to submit batch jobs and manipulate data within a grid.
An infrastructure model adopted by the Computational Chemistry
Grid (CCG)1 eliminates dependence on grid software at the desktop,
is based on the need to run chemistry applications on HPC systems,
and uses a “client” interface for job submission.
A middleware server with grid software components is employed
to handle the deployment and scheduling of jobs and resource
management transparently. This same infrastructure can be used
to implement other client/server paradigms requiring pre- and
post-processing of application data on the desktop, and application
execution on large high-performance computing (HPC) systems,
as well as small departmental (Linux) clusters. This paper describes
the structure and implementation of the CCG infrastructure and
discusses its adaptation to other client/server application needs. |
|
|
Title |
A
Resource Management System for Adaptive Parallel
Applications in Cluster Environments |
Author(s) |
Sheikh
Ghafoor |
Author Inst. |
Mississippi State University |
Presenter |
Sheikh Ghafoor |
Abstract |
Adaptive parallel applications
that can change resources during execution promise
better system utilization and increased application
performance; furthermore, they open the opportunity
for developing a new class of parallel applications
driven by unpredictable data and events, capable
of amassing huge resources on demand. This paper
discusses requirements for a resource management
system to support such applications including communication
and negotiation of resources. To schedule adaptive
applications, interaction between the applications
and the resource management system is necessary.
While managing adaptive applications is a multidimensional
complex research problem, this paper focuses only
on the support that an RMS requires to accommodate adaptive
applications. An early prototype implementation
shows that scheduling of adaptive applications
is possible in a cluster environment and the overhead
of management of applications is low compared to
the long running time of typical parallel applications.
The prototype implementation supports a variety
of adaptive parallel applications in addition to
rigid parallel applications. |
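To make the idea of application/RMS resource negotiation concrete, here is a purely hypothetical C sketch of the kind of message exchange such a system needs. The message types, fields, and grant policy are invented for illustration and are not the paper's protocol.

    #include <stdio.h>

    /* Hypothetical message types for application <-> RMS negotiation. */
    typedef enum {
        REQ_GROW,      /* application asks for additional nodes      */
        REQ_SHRINK,    /* application offers to release nodes        */
        RMS_OFFER,     /* RMS offers a (possibly smaller) allocation */
        APP_ACCEPT,    /* application accepts the offer              */
        APP_REJECT     /* application declines, keeps current set    */
    } msg_type_t;

    typedef struct {
        msg_type_t type;
        int        job_id;
        int        nodes;      /* nodes requested, offered, or released  */
        int        deadline_s; /* how long the request/offer stays valid */
    } nego_msg_t;

    /* A toy RMS-side handler: grant at most what is currently idle. */
    static nego_msg_t handle_request(nego_msg_t req, int idle_nodes)
    {
        nego_msg_t offer = { RMS_OFFER, req.job_id, 0, 30 };
        if (req.type == REQ_GROW)
            offer.nodes = req.nodes < idle_nodes ? req.nodes : idle_nodes;
        return offer;
    }

    int main(void)
    {
        nego_msg_t req = { REQ_GROW, 42, 16, 60 };
        nego_msg_t offer = handle_request(req, 10);
        printf("job %d asked for %d nodes, RMS offered %d\n",
               req.job_id, req.nodes, offer.nodes);
        return 0;
    }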
| |
Applications
Papers II:
|
Title |
Large
Scale Simulations in Nanostructures with NEMO3-D
on Linux Clusters |
Author(s) |
Marek Korkusinski, Faisal Saied,
Haiying Xu, Seungwon Lee, Mohamed Sayeed, Sebastien
Goasguen, and Gerhard Klimeck |
Author Inst |
Purdue University, USA |
Presenter |
Faisal Saied |
Abstract |
NEMO3D is a quantum-mechanics-based
simulation tool created to provide quantitative
predictions for nanometer-scale semiconductor
devices. NEMO3D computes the strain field using an
atomistic valence force field method and electronic
quantum states using an atomistic tight-binding
Hamiltonian. Target applications for NEMO3D include
semiconductor quantum dots and semiconductor quantum
wires. The atomistic nature of the model, and the
need to go to 100 million atoms and beyond, make
this code computationally very demanding. High
performance computing platforms, including large
Linux clusters, are indispensable for this research.
The key features of NEMO3D, including the underlying
physics model, the application domains, the algorithms
and parallelization have been described in detail
previously [1-3]. NEMO3D has been developed
with Linux clusters in mind, and has been ported
to a number of other HPC platforms. Also, a sophisticated
graphical user interface is under development
for NEMO3D. This work is a part of a wider project,
the NSF Network for Computational Nanotechnology
(NCN) and the full paper will include more details
on that project (http://nanohub.org/ ).
The main goal of this paper is to present new
capabilities that have been added to NEMO3D to
make it one of the premier simulation tools for
design and analysis of realistically sized nanoelectronic
devices. These recent advances include algorithmic
refinements, performance analysis to identify
the best computational strategies, and memory
saving measures. The combined effect of these
enhancements is the ability to increase the strain
problem size from about 20 to 64 million atoms
and the electronic state calculation from 0.5
to 21 million atoms. These two computational
domains correspond to physical device domains
of around 15x298x298 nm³ and 15x178x178 nm³,
large enough to consider realistic components
of a nano-structured array with imperfections
and irregularities. The key challenges are the
reduction of the memory footprint to allow the
initialization of large systems and numerically
the extraction of interior, degenerate eigenvectors. |
|
|
Title |
High
Performance Algorithms for Scalable Spin-Qubit
Circuits with Quantum Dots |
Author |
John Fettig |
Author Inst |
NCSA/University of Illinois at
Urbana-Champaign, USA |
Presenter |
John
Fettig |
Abstract |
This report details improvements
made on a code used for computer assisted design
(CAD) of scalable spin-qubit circuits based on
multiple quantum dots. It provides a brief scientific
framework as well as an overview of the physical
and numerical model. Modifications
and improvements to the code based on utilization
of PETSc are then listed, and the new and old codes are
benchmarked on three NCSA computer clusters. The
speed-up of the code is considerable: about 10
times for the eigenvalue solver and 2 times for
the Poisson equation solver. An example of code
application towards quantum dot modeling is also
given. Finally, conclusions and recommendations
for future work are provided. |
|
|
Title |
Parallel
Multi-Zone Methods for Large-Scale Multidisciplinary
Computational Physics Simulations |
Author |
Ding Li, Guoping Xia, and Charles
L. Merkle |
Author Inst |
Purdue University, USA |
Presenter |
Ding
Li |
Abstract |
A parallel multi-zone method for
the simulation of large-scale multidisciplinary
applications involving field equations from multiple
branches of physics is outlined. The equations
of mathematical physics are expressed in a unified
form that enables a single algorithm and computational
code to describe problems involving diverse, but
closely coupled, physics. Specific sub-disciplines
include fluid and plasma dynamics, electromagnetic
fields, radiative energy transfer, thermal/mechanical
stress and strain distributions and conjugate heat
transfer in solids. Efficient parallel implementation
of these coupled physics must take into account
the different number of governing field equations
in the various physical zones and the close coupling
inside and between regions. This is accomplished
by implementing the unified computational algorithm
in terms of an arbitrary grid and a flexible data
structure that allows load balancing by sub-clusters.
Capabilities are demonstrated by a trapped vortex
liquid spray combustor, an MHD power generator,
combustor cooling in a rocket engine and a pulsed
detonation engine-based combustion system for a
gas turbine. The results show a variety of interesting
physical phenomena and the efficacy of the computational
implementation. |
|
|
Applications
Papers III: Performance Measurement
|
Title |
PerfSuite:
An Accessible, Open Source Performance Analysis
Environment for Linux Development and Performance |
Author(s) |
Rick Kufrin |
Author Inst |
NCSA/University of Illinois at
Urbana-Champaign, USA |
Presenter |
Rick
Kufrin |
Abstract |
The motivation, design, implementation,
and current status of a new set of software tools
called PerfSuite, targeted at performance
analysis of user applications on Linux-based systems,
are described. The primary emphasis of these tools
is ease of use/deployment, and portability/reuse,
both in implementation details as well as in data
representation and format. After a year of public
beta availability and production deployment on
Linux clusters that rank among the largest-scale
in the country, PerfSuite is gaining acceptance
as a user-oriented and flexible software tool set
that is as valuable on the desktop as it is on
leading-edge terascale clusters. |
|
|
Title |
Development
and Performance Analysis of a Simulation-Optimization
Framework on TeraGrid Linux Clusters |
Author(s) |
Baha Y. Mirghani, Derek A. Baessler,
Ranji S. Ranjthan, Michael E. Tryby, Nicholas Karonis,
Kumar G. Mahinthakumar |
Author Inst. |
North Carolina State, USA |
Presenter |
Baha
Y. Mirghani |
Abstract |
A Large Scale Simulation Optimization
(LASSO) framework is being developed by the authors.
Linux clusters are the target platform for the
framework, specifically cluster resources on the
NSF TeraGrid. The framework is designed in a modular
fashion that simplifies coupling with simulation
model executables, allowing application of simulation
optimization approaches across problem domains.
In this paper the LASSO framework
is coupled with a parallel groundwater transport
simulation model. Performance is measured using
a source history reconstruction problem and benchmarked
against an existing MPI based implementation developed
previously. Performance results indicate that communication
overhead in the LASSO framework is contributing
significantly to wall times. The authors propose,
and will conduct, several performance optimizations
designed to ameliorate the problem. |
|
|
Title |
Optimizing
Performance on Linux Clusters Using Advanced
Communication Protocols: Achieving Over 10 Teraflops
on an 8.6 Teraflops Linpack-Rated Linux Cluster |
Author(s) |
Manojkumar Krishnan |
Author Inst. |
Pacific Northwest National Laboratory,
USA |
Presenter |
Manojkumar
Krishnan |
Abstract |
Advancements in high-performance
networks (Quadrics, Infiniband or Myrinet) continue
to improve the efficiency of modern clusters. However,
the average application efficiency remains a small
fraction of the system’s peak efficiency.
This paper describes techniques for optimizing
application performance on Linux clusters using
Remote Memory Access communication protocols. The
effectiveness of these optimizations is presented
in the context of an application kernel, dense
matrix multiplication. The result was over 10 teraflops
achieved on an HP Linux cluster whose
Linpack performance is measured at 8.6 teraflops. |
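The paper's optimizations are built on Remote Memory Access (one-sided) communication. As a generic illustration of that style, and not the authors' actual implementation, the sketch below uses the standard MPI-2 one-sided interface to put data directly into a neighbor's exposed memory window without a matching receive.

    #include <mpi.h>
    #include <stdio.h>

    #define N 4  /* elements exposed per rank */

    int main(int argc, char **argv)
    {
        int rank, size;
        double *local;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each rank exposes a small window of doubles for one-sided access. */
        MPI_Alloc_mem(N * sizeof(double), MPI_INFO_NULL, &local);
        for (int i = 0; i < N; i++)
            local[i] = rank;                     /* initialize with own rank */
        MPI_Win_create(local, N * sizeof(double), sizeof(double),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        double buf[N];
        for (int i = 0; i < N; i++)
            buf[i] = 100.0 * rank + i;

        int right = (rank + 1) % size;

        /* One-sided: write into the right neighbor's window; the target
           does not post a receive. Fences delimit the access epoch. */
        MPI_Win_fence(0, win);
        MPI_Put(buf, N, MPI_DOUBLE, right, 0, N, MPI_DOUBLE, win);
        MPI_Win_fence(0, win);

        printf("rank %d now holds %.0f ... %.0f (written by rank %d)\n",
               rank, local[0], local[N - 1], (rank - 1 + size) % size);

        MPI_Win_free(&win);
        MPI_Free_mem(local);
        MPI_Finalize();
        return 0;
    }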
| |
|
Applications
Papers IV: Applications, Visualization
|
Title |
Scalable
Visualization Clusters Using Sepia Technology |
Author(s) |
Jim Kapadia, Glenn Lupton, Steve
Briggs |
Author Inst |
Hewlett Packard Corporation, USA |
Presenter |
Jim Kapadia |
| Abstract |
The advent of low-cost commodity
components and open source software has made it
possible to build Linux computing clusters with
scalability and performance that put them in the
class of supercomputers. As Linux computing clusters
become more prevalent, they are increasingly being
used to solve challenging scientific and technical
problems. Users are now inundated with vast amounts
of data needing visualization for better understanding
and thus insight. Visualization systems need to
keep up with advances in Linux cluster-based computing
systems.
In this presentation, we will describe the SEPIA
visualization system architecture, which leverages
Linux clusters and industry standard commodity
components such as PC class systems, graphic
cards, and Infiniband interconnect. The original
technology was developed in conjunction with
the US DOE ASCI program. Early versions of SEPIA
systems have been deployed at several sites worldwide. |
|
|
Title |
The
Case for an MPI ABI |
Author(s) |
Greg Lindahl |
Author Inst |
Pathscale |
Presenter |
Greg Lindahl |
Abstract |
MPI is a successful API (applications
programming interface) for parallel programming.
As an API, there is maximum freedom for library
implementors, but recompilation is needed to move
from one implementation to another. In a world
where most users compile their own codes, the fact
that you usually need to recompile to run on a
different machine is not a problem.
Now that MPI has become very popular, two situations
don't fit this model. The first is open-source
codes for which most users don't typically compile
the application. The second is commercial codes.
The first situation makes codes less usable if
a domain expert (a non-computer scientist) has
to figure out how to build the code. The second
situation means that portability is limited.
ISVs (independent software vendors) in particular
typically choose to test and support
only one MPI implementation, which means that only a
limited number of today's high-speed cluster
interconnects and cluster environments are supported.
Large, free applications such as MM5 (a mesoscale
weather model) and NWChem (a computational chemistry package), which
are often not modified by their users, cannot
be distributed in binary form because a large
number of different executables would be needed.
This is annoying to most MM5 and NWChem users.
Several vendors have offered to solve this problem
by selling widely-portable MPI implementations
which support a wide variety of systems, without
requiring recompilation or relinking. Such vendors
include Scyld, Scali, Verari, HP, and Intel.
However, no one of these implementations seems
likely to become universal, and each only supports
a limited number of cluster interconnects. Not
only does this make the recompile issue worse,
but it inhibits the success of new interconnect
hardware.
An alternative approach is to create an ABI
-- an application BINARY interface -- for MPI.
An ABI would allow applications to run on the
widest variety of interconnects and MPI implementations
without relinking or recompiling.
An ABI would need to standardize items which
are not standardized in the MPI API. This would
actually increase application and test portability,
and would also improve the quality of MPI implementations.
I suspect (hope?) that the main barrier to writing
an ABI is social. The investment of an MPI implementation
to implement the ABI is modest compared to the
cost of implementing all of MPI. However, projects
which don't feel an ABI is important are unlikely
to spend the effort.
An ABI is not a complete solution to the ISV/precompiled
software issue. Testing issues would likely limit
ISV enthusiasm for supporting their applications
on untested interconnects and MPI implementations.
However, such testing would be much more convenient
than it is today, and could be reasonably automated.
Testing could also be reasonably done by customers. |
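A small probe program helps show why the MPI API alone does not give binary compatibility: handle types such as MPI_Comm, and the values of named constants, are left to each implementation, so the same source yields binaries that cannot be moved between MPI libraries. The sketch below is an illustration added here, not part of the abstract.

    /* abi_probe.c: the sizes and values printed below are chosen by each MPI
     * implementation, which is why an executable built against one
     * implementation's mpi.h and libraries generally cannot run against
     * another's. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* Handle types: an int in some implementations, a pointer in others. */
        printf("sizeof(MPI_Comm)     = %zu\n", sizeof(MPI_Comm));
        printf("sizeof(MPI_Request)  = %zu\n", sizeof(MPI_Request));
        printf("sizeof(MPI_Status)   = %zu\n", sizeof(MPI_Status));

        /* Named constants: only their existence, not their values, is standard. */
        printf("MPI_ANY_SOURCE       = %d\n", MPI_ANY_SOURCE);
        printf("MPI_MAX_ERROR_STRING = %d\n", MPI_MAX_ERROR_STRING);

        MPI_Finalize();
        return 0;
    }

An ABI would have to pin down exactly these kinds of choices, along with the link-time library names, so that one binary runs everywhere.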
| |
|
Systems
Track Abstracts:
|
Systems Papers
I: Reliability
|
Title |
Towards
More Reliable Commodity Clusters: A Software-Based
Approach at Run Time |
Author |
Chung-Hsing Hsu |
Author Inst |
Los Alamos National Laboratory
, USA |
Presenter |
Chung-Hsing
Hsu |
Abstract |
Though the high-performance computing
community continues to provide better and better
support for Linux-based commodity clusters, cluster
end-users and administrators have become more cognizant
of the fact that large-scale commodity clusters
fail quite frequently. The main source of these
failures is hardware (e.g., disk storage, processors,
and memory) with the primary cause being heat.
This situation is expected to worsen as we venture
forth into a new millennium with even larger-scale
clusters powered by faster (and/or multi-core)
processors.
In general, a faster processor consumes more
energy and dissipates more heat. Having thousands
of such processors complicates the air flow pattern
of the heat dissipated by these processors. Consequently,
cluster builders must resort to exotic cooling
and fault tolerant technologies and facilities
to ensure that the cluster stays cool enough
so that it is not perpetually failing. We consider
this approach to cluster reliability as being
a reactive one. In contrast, we propose a complementary
approach that more proactively addresses reliability
by more intelligently dealing with power and
cooling issues before they become a problem. Our
preliminary experimental work demonstrates that
our approach can easily be applied to commodity
processors and can reduce heat generation by
30% on average with minimal effect on performance
when running the SPEC benchmarks. |
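The abstract does not name the specific mechanism; one common software-only knob on Linux commodity processors of that era is the cpufreq sysfs interface. The sketch below, offered only as an assumption-laden illustration and not the paper's implementation, lowers a core's clock by selecting the userspace governor and writing a target frequency (paths and governors vary by kernel and driver).

    #include <stdio.h>
    #include <stdlib.h>

    static int write_sysfs(const char *path, const char *value)
    {
        FILE *f = fopen(path, "w");
        if (!f) {
            perror(path);
            return -1;
        }
        fprintf(f, "%s\n", value);
        return fclose(f);           /* 0 on success */
    }

    int main(int argc, char **argv)
    {
        const char *cpu = (argc > 1) ? argv[1] : "0";
        const char *khz = (argc > 2) ? argv[2] : "1000000";  /* 1.0 GHz */
        char gov_path[256], speed_path[256];

        snprintf(gov_path, sizeof gov_path,
                 "/sys/devices/system/cpu/cpu%s/cpufreq/scaling_governor", cpu);
        snprintf(speed_path, sizeof speed_path,
                 "/sys/devices/system/cpu/cpu%s/cpufreq/scaling_setspeed", cpu);

        /* Hand frequency control to userspace, then request a lower clock. */
        if (write_sysfs(gov_path, "userspace") != 0)
            return EXIT_FAILURE;
        if (write_sysfs(speed_path, khz) != 0)
            return EXIT_FAILURE;

        printf("cpu%s requested frequency %s kHz\n", cpu, khz);
        return EXIT_SUCCESS;
    }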
|
|
Title |
Towards
Cluster Serviceability |
Author |
Box Leangsuksun1, Anand
Tikotekar1, Makan Pourzandi2,
and Ibrahim Haddad2 |
Author Inst |
1Louisiana Tech University
and 2Ericsson Research Canada |
Presenter |
Box
Leangsuksun |
Abstract |
This paper presents an investigation,
a feasibility study, and performance benchmarking
of vital management elements for critical enterprise
and HPC infrastructure. We conduct a proof-of-concept
of integrating high availability cluster mechanism
with a secure cluster infrastructure. Our proposed
architecture incorporates the Distributed Security
Infrastructures (DSI) framework, an open source
project providing secure infrastructure for carrier
grade clusters and HA-OSCAR, an open source Linux
cluster framework that meets the Reliability, Availability,
Serviceability (RAS) needs. The result is a cluster
infrastructure that is compliant with the Reliability,
Availability, Serviceability and Security (RASS)
principles. We conducted an initial feasibility
study and experiment to gauge issues and the degree
of success in the implementation of our proposed
RASS framework. We verified the integration of
HA-OSCAR release 1.0 and DSI release 0.3. Although
there was a minimal performance overhead, the benefit of having "RASS" in
mission-critical settings far outweighs
the performance impact. We plan to further our
proof-of-concept architecture to suit the
needs of production environments. |
|
|
Title |
Defining
and Measuring Supercomputer Reliability, Availability,
and Serviceability (RAS) |
Author |
Jon Stearley |
Author Inst |
Sandia National Laboratories, USA |
Presenter |
Jon
Stearley |
Abstract |
The absence of agreed definitions
and metrics for supercomputer RAS obscures meaningful
discussion of the issues involved and hinders their
solution. This paper provides a survey of existing
practices, and proposes standardized definitions
and measurements. These are modeled after the SEMI-E10
specification which is widely used in the semiconductor
manufacturing industry. |
| |
|
Systems Papers
II: File Systems
|
Title |
Active
Storage Processing in a Parallel File System |
Author |
Evan J. Felix, Kevin Fox, Kevin
Regimbal, Jarek Nieplocha |
Author Inst |
Pacific Northwest National Laboratory,
USA |
Presenter |
Jarek
Nieplocha |
Abstract |
This paper proposes an extension
of the traditional active disk concept by applying
it to parallel file systems deployed in modern
clusters. Utilizing processing power of the disk
controller CPU for processing of data stored on
the disk has been proposed in the previous decade.
We have extended and deployed this idea in the context
of storage servers of a parallel file system, where
substantial performance benefits can be realized
by eliminating the overhead of data movement across
the network. In particular, the proposed approach
has been implemented and tested in the context of the Lustre
parallel file system used in production Linux clusters
at PNNL. Furthermore, our approach allows active
storage application code to take advantage of a modern
multipurpose Linux operating system rather than the restricted
custom OS used in previous work. Initial experience
with processing very large volumes of bioinformatics
data validates our approach and demonstrates the
potential value of the proposed concept. |
|
|
Title |
Shared
Parallel Filesystems in Heterogeneous Linux Multi-Cluster
Environments |
Author |
Jason Cope1, Michael
Oberg1, Henry M. Tufo2, and
Matthew Woitaszek1 |
Author Inst |
1University of Colorado-Boulder
and 2National Center for Atmospheric
Research, USA |
Presenter |
Matthew
Woitaszek |
Abstract |
In this paper, we examine parallel
filesystems for shared deployment across multiple
Linux clusters running with different hardware
architectures and operating systems. Specifically,
we deploy GPFS, Lustre, PVFS2, and TerraFS in our
test environment containing Intel Xeon, Intel x86-64,
and IBM PPC970 systems. We comment on the recent
feature additions of each filesystem, describe
our implementation and configuration experiences,
and present initial performance benchmark results.
Our analysis shows that all of the parallel filesystems
outperform a legacy NFS system, but with different
levels of complexity. Lustre provides the best
performance but requires the most administrative
overhead. Three of the systems – GPFS, Lustre,
and TerraFS – depend on specific kernel versions
that increase administrative complexity and reduce
interoperability. |
|
|
Title |
Lustre:
Is It Ready for Prime Time? |
Author |
Steve Woods |
Author Inst |
MCNC-GCNS, USA |
Presenter |
Steve Woods |
Abstract |
When dealing with large numbers
of Linux nodes in the HPC cluster market, one area
that sometimes gets overlooked is shared
space among the nodes. This shared disk space can
be divided into several areas which might include:
- User $HOME space
- Shared application space
- Data Grid space
- Backup/data migration space
- Shared high speed scratch/tmp space
The decision of what technique to use for the
various forms of shared space can be determined
by any of a number of requirements. For example,
if a site has only 32 Linux nodes and disk activity
is at a minimum, then an NFS-mounted shared area
coming from the head node or a separate node
that has a disk RAID attached might be sufficient.
But what about those situations where the HPC
cluster might be hundreds of nodes which require
hundreds of megabytes or gigabytes per second of
performance? In these situations an NFS-mounted
file system from a head node would not be sufficient.
Other techniques would need to be looked at to
provide a global file system. Currently there
are several products out there: for example,
IBRIX, PVFS, GPFS, Sistina GFS, SpinServer, and
Lustre, just to name a few. Each has its strengths
and weaknesses. The one we plan to concentrate
on is Lustre.
In our presentation on Lustre we plan to cover
the following topics:
- Intro – brief description of Lustre
and its concepts
- Current hardware configuration and philosophy
- Current software configuration and its philosophy
- Site experience – ease or difficulty
of use, and reliability
- Performance – OST to OSC performance,
NFS exported performance
- Conclusion – is it ready and which
form of shared space does it best fit in
In this presentation it is hoped that sites
that are not intimately experienced with Lustre
can get a sense as to whether or not Lustre is
worth investigating for their site. As for experienced
sites, it could be a place to compare notes and
experiences. |
| |
|
Systems Papers
III: Cluster Management Systems
|
Title |
Concept
and Implementation of CLUSTERIX: National Cluster
of Linux Systems |
Author |
Roman Wyrzykowski1,
Norbert Meyer2, and Maciej Stroinski2 |
Author Inst. |
1Czestochowa University
of Technology, 2Poznan Supercomputing
and Networking Center |
Presenter |
Roman
Wyrzykowski |
Abstract |
This paper presents the concept
and implementation of the National Cluster of Linux
Systems (CLUSTERIX) - a distributed PC-cluster
(or metacluster) of a new generation, based on
the Polish Optical Network PIONIER. Its implementation
makes it possible to deploy a production Grid environment,
which consists of local PC-clusters with 64- and
32-bit Linux machines, located in geographically
distant independent centers across Poland. The
management software (middleware) developed as Open
Source allows for dynamic changes in the metacluster
configuration. The resulting system will be tested
on a set of pilot distributed applications developed
as a part of the project. The project is being
implemented by 12 Polish supercomputing centers
and metropolitan area networks. |
|
|
Title |
HA-Rocks:
A Cost-Effective High-Availability System for Rocks-Based
Linux HPC Cluster |
Author |
Tong Liu, Saeed Iqbal, Yung-Chin
Fang, Onur Celebioglu, Victor Masheyakhi, and Reza
Rooholamini |
Author Inst. |
Dell, USA |
Presenter |
Tong
Liu |
Abstract |
Commodity Beowulf clusters are
now an established parallel and distributed computing
paradigm due to their attractive price/performance.
Beowulf clusters are increasingly being used in environments
requiring improved fault tolerance and high availability.
From the fault tolerance perspective, the traditional
Beowulf cluster architecture has a single master
node which creates a single point of failure (SPOF)
in the system. Hence, to meet high availability
requirements, enhancements to the management system
are critical. In this paper we propose such enhancements,
based on the commonly used Rocks management system,
which we call high-availability Rocks (HA-Rocks).
HA-Rocks is sensitive to the level of failure and
provides mechanisms for graceful recovery to a
standby master node. We also discuss the architecture
and failover algorithm of HA-Rocks. Finally, we
evaluate failover time under HA-Rocks. |
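As a hypothetical illustration of the kind of failover trigger a standby master node needs (HA-Rocks's actual architecture and failover algorithm are described in the paper, not reproduced here), the following C sketch listens for UDP heartbeats from the master and declares a takeover after several missed intervals; the port number and thresholds are made up.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/select.h>
    #include <sys/socket.h>

    int main(void)
    {
        const int port = 7788, max_missed = 3;     /* hypothetical values */
        int missed = 0;

        int s = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(port);
        if (s < 0 || bind(s, (struct sockaddr *)&addr, sizeof addr) != 0) {
            perror("socket/bind");
            return 1;
        }

        for (;;) {
            fd_set rd;
            struct timeval tv = { 5, 0 };          /* heartbeat period: 5 s */
            FD_ZERO(&rd);
            FD_SET(s, &rd);

            if (select(s + 1, &rd, NULL, NULL, &tv) > 0) {
                char buf[64];
                recv(s, buf, sizeof buf, 0);       /* master is alive */
                missed = 0;
            } else if (++missed >= max_missed) {
                printf("master silent for %d intervals: promoting standby\n",
                       missed);
                /* A real failover would take over the master's IP and services. */
                break;
            }
        }
        close(s);
        return 0;
    }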
|
|
| Title |
A
Specialized Approach for HPC System Software |
Author |
Ron Brightwell1, Suzanne
Kelly1, and Arthur B. Maccabe2 |
Author Inst. |
1Sandia National Laboratories, 2University
of New Mexico |
Presenter |
Ron Brightwell |
Abstract |
This technical presentation will describe our architecture for
scalable, high performance system software. The system software
architecture that we have developed is a vital component of a
complete system. System software is an important area of optimization
that directly impacts application performance and scalability,
and one that also has implications beyond performance. System
software not only impacts the ability of the machine to deliver
performance to applications and allow scaling to the full system
size, but also has secondary effects that can impact system reliability
and robustness. The following presents an overview of our system
software architecture and provides important details necessary
to understand how this architecture impacts performance, scalability,
reliability, and usability. We discuss examples of how our architecture
addresses each of these areas and present reasons that we have
chosen this specialized approach. We conclude with a discussion
of the specifics of the implementation of this software architecture
for the Sandia/Cray Red Storm system. |
| |
|
Systems Papers
IV: Monitoring and Detection
|
Title |
Deploying
LoGS to Analyze Console Logs on an IBM JS20 |
Author |
James E. Prewett |
Author Inst. |
HPC@UNM, USA |
Presenter |
James
E. Prewett |
Abstract |
In early 2005, The Center for High
Performance Computing at The University of New
Mexico acquired an IBM JS20[2] that has been given
the name “Ristra”. The hardware consists
of 96 blades, each with two 1.6 GHz PowerPC 970
processors and 4 GBs of RAM. The blades are housed
in 7 IBM BladeCenter chassis. Myrinet is used as
the high-speed, low-latency interconnect.
Included in the machine’s components is a
management node which is an IBM x335 system.
This system uses the Xcat cluster management
software[3]. The system was configured so that
a management node would collect all of the system
logs as well as all of the output to the console
of the individual blades. Unfortunately, this
logging of the blades’ console output put
quite a heavy load on the system, especially
the disk. It was our goal to also monitor scheduler
logs on this machine by mounting the PBS MOM
log directory via NFS to each of the blades.
It seemed this would be quite problematic if
we could not reduce the load on the administrative
node.
Under certain conditions, the console logs were
growing especially quickly. Sometimes the flood
of messages indicated an error with the hardware
or software on the blades themselves, or with
the software used to gather and store the console output
from the blades. In other cases, the output seemed
to indicate normal operation of the monitoring
infrastructure of the machine. In either case,
the output was rather verbose and was being written
to disk. This was putting a heavy load on the
disk subsystem.
In order to solve the problem presented by these
log files, we decided to replace the log files
(in /var/log/consoles/) on disk with FIFO files
that could be monitored by a log analysis tool.
We decided to use LoGS as the tool to monitor
these log FIFO files as it is capable of finding
important messages and reacting to them. LoGS
was then used to filter out innocuous messages,
store important messages to files on disk, and
react to certain conditions that it could repair
without human intervention. |
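A minimal sketch of the FIFO approach described above, not LoGS itself: replace a console log file with a named pipe so a monitor can consume and filter messages as they arrive instead of writing everything to disk. The path and the "important message" test are hypothetical.

    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    int main(void)
    {
        const char *fifo = "/var/log/consoles/blade01";  /* hypothetical path */
        char line[1024];

        /* Create the FIFO if it does not already exist. */
        if (mkfifo(fifo, 0640) != 0 && errno != EEXIST) {
            perror("mkfifo");
            return 1;
        }

        /* Opening for read blocks until the console writer connects. */
        FILE *in = fopen(fifo, "r");
        if (!in) {
            perror("fopen");
            return 1;
        }

        while (fgets(line, sizeof line, in)) {
            /* Keep only lines that look important; drop the verbose chatter
               instead of writing it all to disk. */
            if (strstr(line, "error") || strstr(line, "fail"))
                fputs(line, stdout);   /* stand-in for "store/react" actions */
        }

        fclose(in);
        return 0;
    }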
|
|
Title |
Detection
of Privilege Escalation for Linux Cluster Security |
Author |
Michael Treaster, Xin Meng, William
Yurcik, Gregory A. Koenig |
Author Inst. |
NCSA/University of Illinois at
Urbana-Champaign, USA |
Presenter |
Michael
Treaster |
Abstract |
Cluster computing systems can be
among the most valuable resources owned by an organization.
As a result, they are high profile targets for
attackers, and it is essential that they be well-protected.
Although there are a variety of security solutions
for enterprise networks and individual machines,
there has been little emphasis on securing cluster
systems despite their great importance.
NVisionCC is a multifaceted security solution
for cluster systems built on the Clumon cluster
monitoring infrastructure. This paper describes
the component responsible for detecting unauthorized
privilege escalation. This component enables
security monitoring software to detect an entire
class of attacks in which an authorized local
user of a cluster is able to improperly elevate
process privileges by exploiting some kind of
software vulnerability. Detecting this type of
attack is one of many facets in an all-encompassing
cluster security solution. |
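As an illustrative sketch only (the abstract does not describe NVisionCC's actual checks), one simple way to notice improperly elevated privileges on a node is to walk /proc and flag root-owned processes whose names are not on a small allowlist, as below; the allowlist is hypothetical.

    #include <stdio.h>
    #include <string.h>
    #include <dirent.h>
    #include <ctype.h>

    static const char *allow[] = { "init", "sshd", "syslogd", NULL };  /* hypothetical */

    static int allowed(const char *comm)
    {
        for (int i = 0; allow[i]; i++)
            if (strcmp(comm, allow[i]) == 0)
                return 1;
        return 0;
    }

    int main(void)
    {
        DIR *proc = opendir("/proc");
        struct dirent *d;
        if (!proc) { perror("/proc"); return 1; }

        while ((d = readdir(proc)) != NULL) {
            if (!isdigit((unsigned char)d->d_name[0]))
                continue;                         /* only numeric PID entries */

            char path[288], comm[256] = "", line[256];
            long uid = -1;

            snprintf(path, sizeof path, "/proc/%s/status", d->d_name);
            FILE *f = fopen(path, "r");
            if (!f)
                continue;                         /* process may have exited */
            while (fgets(line, sizeof line, f)) {
                if (sscanf(line, "Name: %255s", comm) == 1) continue;
                if (sscanf(line, "Uid: %ld", &uid) == 1) break;
            }
            fclose(f);

            if (uid == 0 && !allowed(comm))
                printf("suspicious: pid %s running '%s' as root\n",
                       d->d_name, comm);
        }
        closedir(proc);
        return 0;
    }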
| |
|
Systems Papers
V: Hardware and Systems
|
Title |
Performance
of Two-Way Opteron and Xeon Processor-Based Servers
for Scientific and Technical Applications |
Author |
Douglas Pase and James Stephens |
Author Inst |
IBM, USA |
Presenter |
Douglas
Pase |
Abstract |
There are three important characteristics
that affect the performance of Linux clusters used
for High-Performance Computation (HPC) applications.
Those characteristics are the performance of the
Arithmetic Logic Unit (ALU) or processor core,
memory performance and the performance of the high-speed
network used to interconnect the cluster servers
or nodes. These characteristics are themselves
affected by the choice of processor used in the
server. In this paper we compare the performance
of two servers that are typical of those used to
build Linux clusters. Both are two-way servers
based on 64-bit versions of x86 processors. The
servers are each packaged in a 1U (1.75 inch high)
rack-mounted chassis. The first server we describe
is the IBM® eServer™ 326, based on the
AMD Opteron™ processor. The second is the
IBM xSeries™ 336, based on the Intel® EM64T
processor. Both are powerful servers designed and
optimized to be used as the building blocks of
a Linux cluster that may be as small as a few nodes
or as large as several thousand nodes.
In this paper we describe the architecture and
performance of each server. We use results from
the popular SPEC CPU2000 and Linpack benchmarks
to present different aspects of the performance
of the processor core. We use results from the
STREAM benchmark to present memory performance.
Finally, we discuss how characteristics of the
I/O slots affect the interconnect performance,
whether the choice is Gigabit Ethernet, Myrinet,
InfiniBand, or some other interconnect. |
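For readers unfamiliar with the STREAM benchmark mentioned above, its core is a set of simple vector kernels; a minimal version of the "triad" kernel is sketched below. The array size and timing choices are arbitrary, and the official STREAM code should be used for reportable numbers.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>

    #define N 5000000            /* three ~40 MB arrays, larger than cache */
    #define NTIMES 10

    static double now(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + 1e-6 * tv.tv_usec;
    }

    int main(void)
    {
        double *a = malloc(N * sizeof(double));
        double *b = malloc(N * sizeof(double));
        double *c = malloc(N * sizeof(double));
        if (!a || !b || !c) { fprintf(stderr, "out of memory\n"); return 1; }

        for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

        double best = 1e30;
        const double scalar = 3.0;
        for (int k = 0; k < NTIMES; k++) {
            double t = now();
            for (long i = 0; i < N; i++)
                a[i] = b[i] + scalar * c[i];   /* the triad kernel */
            t = now() - t;
            if (t < best) best = t;
        }

        /* Three 8-byte arrays move through memory per triad sweep. */
        printf("triad best time %.4f s, approx bandwidth %.1f MB/s\n",
               best, 3.0 * N * sizeof(double) / best / 1e6);
        printf("a[0] = %f\n", a[0]);          /* keep the compiler honest */

        free(a); free(b); free(c);
        return 0;
    }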
|
|
Title |
A
First Look at BlueGene/L |
Author |
Martin Margo, Christopher Jordan,
Patricia Kovatch, and Phil Andrews |
Author Inst. |
San Diego Supercomputer Center,
USA |
Presenter |
Martin Margo |
Abstract |
An IBM BlueGene/L machine achieved
70.72 TeraFlops on the November 2004
Top 500 list to become the most powerful computer in the world
as measured by the list.
This new machine has “slow” but numerous processors
along with two high bandwidth
interconnection networks configurable in different topologies.
This machine is optimized
for specific data-intensive simulations, modeling and mining
applications. The San
Diego Supercomputer Center (SDSC) recently installed and configured
a single rack
BlueGene/L system. This system has a peak speed of 5.7 TeraFlops
with its 2048 700
MHz processors and 512 GB of memory and a Linpack measurement
of 4.6 TeraFlops.
SDSC specifically configured this machine to have the maximum
number of I/O nodes to
provide the best performance for data-intensive applications.
In this paper, we discuss
how BlueGene/L is configured at SDSC, the system management and
user tools and our
early experiences with the machine. |
|
|
Title |
Deploying
an IBM e1350 Cluster |
Author |
Aron Warren |
Author Inst. |
HPC@UNM, USA |
Presenter |
Aron Warren |
Abstract |
UNM HPC in its technical presentation
will describe its experience in bringing online
a high-density 126-node dual-processor IBM e1350
cluster (BladeCenter JS20 + Myrinet 2000) as the
first large cluster of its type deployed to academia
within the US (Barcelona Supercomputing Center's
MareNostrum cluster achieved #4 on the 2004 TOP500
list). An in-depth description of the challenges in
landing a high-density cluster will be given,
along with the problems encountered in deploying
and managing a cluster of this type, including new
compilers and software stacks. Highlights of the
successes made with this clustering environment
will also be shown. |
| |
|
Systems Papers
VI: Systems and Experiences
|
Title |
To
InfiniBand or Not InfiniBand, One Site's Perspective |
Author |
Steve Woods |
Author Inst. |
MCNC-GCNS, USA |
Presenter |
Steve Woods |
Abstract |
When dealing in the world of high-performance
computing, where user computational
needs are being met by clusters of Linux-based
systems, the question often arises of what
interconnect to use in connecting dozens if not
hundreds of systems together. Generally the answer
comes from asking a few basic questions.
- Do the applications being run require multiple
nodes, i.e., are they parallel?
- How much communication is being done by the
applications?
- What are the sizes of the messages being
passed between processes?
- And the tough one: what can you afford?
Almost all Linux-based nodes come with gigabit
Ethernet. For many applications the performance
of gigabit Ethernet would be sufficient, especially
when accompanied by a good quality switch.
But what about those situations where communication
latency becomes important, as well as scalability
and overall wall clock performance? In these
situations it may become necessary to investigate
other forms of interconnect for the cluster.
Unfortunately, most of the choices for low latency
and high bandwidth interconnects, such as Myrinet
and Quadrics, are proprietary. Both are
very good products with varying degrees of performance
and cost associated with them. In more recent
years Infiniband has come and gone and come back
into the spotlight again. But the question is,
is it here to stay?
What we plan to present is our experience with
Infiniband, starting with just a few nodes on
a small switch and later expanding to encompass
all nodes of our 64-node cluster plus support
nodes. We plan to cover the following:
- Brief overview of Infiniband and its
protocols
- Our current configuration which includes
not only native Infiniband, but also
gateways to gigabit Ethernet
- Base performance looking at both bandwidth
and latency for various MPI
message sizes
- Application performance: what type of improvement
has been seen and how
Infiniband has affected scaling of certain applications
- Future – where we see Infiniband and
its usage going
- Conclusion – Finally our opinion of
Infiniband and how valuable it is in the HPC
marketplace
From this presentation, it is hoped that sites
that are considering alternate forms of
connecting cluster nodes other than Ethernet might gain some
insight into the viability of Infiniband as an interconnect
medium. |
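The base latency and bandwidth numbers referred to above are usually produced with a simple MPI ping-pong between two ranks; a minimal sketch follows, with message sizes and repetition counts chosen arbitrarily.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size < 2) {
            if (rank == 0) fprintf(stderr, "run with at least 2 ranks\n");
            MPI_Finalize();
            return 1;
        }

        const int reps = 1000;
        char *buf = malloc(1 << 22);            /* up to 4 MB messages */

        for (int bytes = 1; bytes <= (1 << 22); bytes *= 2) {
            MPI_Barrier(MPI_COMM_WORLD);
            double t0 = MPI_Wtime();
            for (int i = 0; i < reps; i++) {
                if (rank == 0) {
                    MPI_Send(buf, bytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                    MPI_Recv(buf, bytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                } else if (rank == 1) {
                    MPI_Recv(buf, bytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                    MPI_Send(buf, bytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
                }
            }
            double t = (MPI_Wtime() - t0) / (2.0 * reps);   /* one-way time */
            if (rank == 0)
                printf("%8d bytes  %10.2f us  %8.1f MB/s\n",
                       bytes, t * 1e6, bytes / t / 1e6);
        }

        free(buf);
        MPI_Finalize();
        return 0;
    }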
|
|
Title |
The
Road to a Linux-Based Personal Desktop Cluster |
Author |
Wu-Chun Feng |
Author Inst. |
Los Alamos National Laboratory,
USA |
Presenter |
Wu-Chun Feng |
Abstract |
The proposed talk starts with background
information on how unconstrained power consumption
has led to the construction of highly inefficient
clusters with increasingly high failure rates.
By appropriately and transparently constraining
power consumption via a cluster-based, power-management
tool, we created the super-efficient and highly
reliable Green Destiny cluster 1 that debuted three
years ago. Since then, Green Destiny has evolved
in two different directions: (1) architecturally
into a low-power Orion Multisystems DT-12 personal
desktop cluster (October 2004) and (2) via adaptive
system software into a power-aware CAFfeine desk-side
cluster based on AMD quad-Opteron compute nodes
(November 2004 at SC2004). These dual paths of
evolution and their implications will be elaborated
upon in this talk. |
|
|
Title |
What
Do Mambo, VNC, UML and Grid Computing Have in Common? |
Author |
Sebastien Goasguen, Michael McLennan,
Gerhard Klimeck, and Mark S. Lundstrom |
Author Inst. |
Purdue University, USA |
Presenter |
Sebastien Goasguen |
Abstract |
The Network for Computational Nanotechnology
(NCN) operates a web computing application portal
called the nanoHUB. This portal serves thousands
of users from the nanotechnology community, ranging
from students to experienced researchers, using
both academic and industrial applications. The
content of the nanoHUB is highly dynamic and diverse,
consisting of video streams, presentation slides,
educational modules, research papers and on-line
simulations. This paper presents the core components
of the infrastructure supporting the nanoHUB. Committed
to the open source philosophy, the NCN has selected
the Mambo content management system, and uses it
in conjunction with Virtual Network Computing (VNC)
to deliver graphical applications to its users.
On the backend, these applications run on virtual
machines, which provide both a sandbox for the
applications and a consistent login mechanism for
the NCN user base. User Mode Linux (UML) is used
to boot the virtual machines on geographically
dispersed resources, and Lightweight Directory Access
Protocol (LDAP) is used to validate users against
the NCN registry. |
| |
|
Vendor
Presentations
|
|
Title |
Cooling
for Ultra-High Density Racks and Blade Servers |
Author(s) |
Richard Sawyer |
Author Inst |
American Power Conversion (APC),
USA |
Presenter |
Richard Sawyer |
Abstract |
The requirement to deploy high-density
servers within single racks is
presenting data center managers with a challenge; vendors are
now designing
servers which will demand up to 20kW of cooling if installed
in a single
rack. With most data centers designed to cool an average of no
more than
2kW per rack, some innovative cooling strategies are required. Planning
strategies to cope with ultra-high power racks are described
along with
practical solutions for both new and existing data centers. |
|
|
Title |
Bridging
Discovery and Learning at Purdue University through
Distributed Computing |
Author(s) |
Krishna Madhavan |
Author Inst |
Purdue University, USA |
Presenter |
Krishna Madhavan |
Abstract |
While there have been significant
advances in the field of distributed and high performance
computing, the diffusion of these innovations into
day-to-day science, technology, engineering, and
mathematics curricula remains a major
challenge. This presentation focuses on some on-going
initiatives led by Information Technology at Purdue,
the central IT organization at Purdue University,
to narrow this gap between discovery and learning.
The innovative design and deployment of distributed
computing tools have significant impact on various
pedagogical theories such as problem-based learning
and scaffolding approaches. The presentation will
not only describe these tools and their pedagogical
relevance, but also highlight how they fit in with
the larger vision of promoting the diffusion of
distributed and high performance computing tools
in educational praxis. |
| |
|
|
Title |
AMD's
Road to Dual Core |
Author(s) |
Doug O'Flaherty |
Author Inst |
AMD, USA |
Presenter |
Doug O'Flaherty |
Abstract |
Good architecture is no accident.
With silicon planning horizons out years in advance
of product delivery, early choices can have an
impact on final products. In his presentation,
Douglas O'Flaherty, AMD HPC marketing manager,
covers the architecture choices that enabled AMD's
dual-core processors and how those choices affect
performance. |
|
|
Title |
|
Author(s) |
Patrick Geoffray |
Author Inst |
Myricom, USA |
Presenter |
Patrick
Geoffray |
Abstract |
Available soon. |
|
|