PAPERS |
| Applications |
| title |
Weather
Research and Forecast (WRF) Model: Performance
Analysis on Advanced Multi-core HPC Clusters |
| author(s) |
Gilad Shainer, Mellanox Technologies,
USA |
| presenter |
Gilad Shainer |
| abstract |
The Weather Research and Forecast
(WRF) Model is a fully functioning modeling system
for atmospheric research and operational weather
prediction communities. With an emphasis on efficiency,
portability, maintainability, scalability and productivity,
WRF has been successfully deployed over the years
on a wide variety of HPC clustered compute nodes
connected with high-speed interconnects – currently
the most widely used system architecture for high-performance
computing. As such, understanding WRF's dependency
on the various clustering elements, such as the CPU,
the interconnect, and the software libraries, is crucial
for enabling efficient predictions and high productivity.
Our results identify WRF’s communication-sensitive
points and demonstrate WRF’s dependency on
high-speed networks and fast CPU-to-CPU communication.
Both factors are critical to maintaining scalability
and increasing productivity when adding cluster nodes.
We conclude with specific recommendations for improving
WRF performance, scalability, and productivity as
measured in jobs per day. Because proprietary hardware
and software can quickly erode cluster architecture’s
favorable economics, we will restrict our investigation
to standards-based hardware and open source software
readily available to typical research institutions. |
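As a point of reference for the jobs-per-day metric (the figures below are hypothetical and not taken from the paper):

    jobs per day = 86,400 s / wall-clock seconds per job
    e.g., a forecast run finishing in 1,200 s gives 86,400 / 1,200 = 72 jobs per day,
    so any scalability gain that shortens the run raises this figure directly.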
| |
|
| title |
A
Parallel Algorithm for Large, Multi-Scale Simulations
of Liquid/Gas Phase Interfaces |
| author(s) |
Marcus Herrmann, Arizona State University,
USA |
| presenter |
Marcus Herrmann |
| abstract |
This paper will present the development
and performance of a parallel algorithm for large,
multi-scale simulations of liquid/gas phase interfaces. |
| |
|
| Power and Cooling |
| title |
Towards
Real-World HPC Energy Efficiency and Productivity
Metrics in a Fully Instrumented Datacenter |
| author(s) |
Andres Marquez, Pacific Northwest
National Laboratory, USA |
| presenter |
Andres Marquez |
| abstract |
Towards real-world HPC energy efficiency
and productivity metrics in a fully instrumented
Datacenter. Towards real-world HPC energy efficiency
and productivity metrics in a fully instrumented
Datacenter. |
| |
|
| title |
Cyber-Physical
Autonomic Resource management for High-Performance
Datacenters |
| author(s) |
Georgios Varsamopoulos, Sandeep Gupta,
Arizona State University, USA |
| presenter |
TBD |
| abstract |
Previous research has demonstrated
the potential benefits of thermal-aware load placement
and thermal mapping in cooling-intensive environments
such as data centers. However, applying existing
techniques to live data centers has proved difficult
because the models involved are either unrealistic,
require extensive sensing instrumentation, or are
disruptive to data center services to derive.
The work presented in this paper discusses cyber-physical
techniques, and their associated challenges, for creating
an adaptive and non-invasive software system that derives
realistic and low-complexity thermal models in an
autonomic manner, using built-in and ambient sensors.
Uses of these techniques can vary from assessing
the thermal efficiency of the data center to designing
a thermal-aware job scheduler, thus greening data
centers. Specifically, this paper proposes and evaluates:
i) new thermal models and a sensing software architecture,
and ii) a method to identify relocation of equipment
within a data center room to achieve permanent cost
savings under all scheduling conditions. |
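As a rough illustration of what a thermal-aware placement decision can look like (a minimal sketch under assumed data, not the authors' system), the snippet below dispatches the next job to the coolest idle node based on inlet-temperature sensor readings:

    /* Minimal illustration (hypothetical, not the authors' system) of thermal-aware
     * placement: dispatch the next job to the idle node with the lowest inlet temperature. */
    #include <stdio.h>

    struct node {
        const char *name;
        double inlet_temp_c;   /* reading from a built-in or ambient sensor */
        int busy;
    };

    /* Return the index of the coolest idle node, or -1 if none is free. */
    static int pick_coolest_idle(const struct node nodes[], int n)
    {
        int best = -1;
        for (int i = 0; i < n; i++) {
            if (nodes[i].busy)
                continue;
            if (best < 0 || nodes[i].inlet_temp_c < nodes[best].inlet_temp_c)
                best = i;
        }
        return best;
    }

    int main(void)
    {
        struct node nodes[] = {
            {"node01", 24.5, 0},
            {"node02", 22.1, 0},
            {"node03", 21.8, 1},   /* coolest, but already busy */
        };
        int idx = pick_coolest_idle(nodes, 3);
        if (idx >= 0)
            printf("place next job on %s (%.1f C inlet)\n",
                   nodes[idx].name, nodes[idx].inlet_temp_c);
        else
            printf("no idle node available\n");
        return 0;
    }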
| |
|
| Performance Evaluation |
| title |
Performance
Analysis of the SiCortex SC072 |
| author(s) |
Brian Martin, Andrew Leiker, Douglas
Doerfler, James Laros III, Sandia National Laboratories,
USA |
| presenter |
Brian Martin and Andrew Leiker |
| abstract |
The world of
High Performance Computing (HPC) has seen a major
shift towards commodity clusters in the last
10 years. A new company, SiCortex, has set out
to break this trend. They have created what they
claim to be a balanced cluster which makes use
of low-power MIPS processors and a custom interconnect
in an effort to avoid many of the bottlenecks plaguing
most modern clusters. In this paper, we reveal
the results of preliminary benchmarking of one
of their systems, the SC072. First, we ran a collection
of microbenchmarks to characterize the performance
of interprocessor communication. Next, we ran some
real applications relevant to high performance
computing and compared performance and scalability
to a typical commodity cluster. Lastly, we examined
and compared the performance per watt of the SiCortex
system to that of a commodity cluster.
A two-year intern at Sandia National
Labs, Brian is currently attending Carnegie Mellon
University studying computer science as a sophomore. |
| |
|
| title |
QP:
A Heterogeneous Multi-Accelerator Cluster |
| author(s) |
Michael Showerman, Wen-Mei Hwu, University
of Illinois, USA; Jeremy Enos, Avneesh Pant, Volodymyr
Kindratenko, Craig Steffen, Robert Pennington, NCSA,
USA |
| presenter |
TBD |
| abstract |
We present a heterogeneous multi-accelerator
cluster developed and deployed at NCSA. The cluster
consists of 16 AMD dual-core CPU compute nodes each
with four NVIDIA GPUs and one Xilinx FPGA. Cluster
nodes are interconnected with both InfiniBand and
Ethernet networks. The software stack consists of
standard cluster tools with the addition of accelerator-specific
software packages and enhancements to the resource
allocation and batch sub-systems. We highlight several
HPC applications that have been developed and deployed
on the cluster. We also present our Phoenix application
development framework that is meant to help with
developing new applications and migrating existing
legacy codes to heterogeneous systems. |
| |
|
| The SC08 Cluster Challenge |
| title |
Windows
HPC Server 2008 at the SC08 Cluster Challenge |
| author(s) |
Benjamin Jimenez, Arizona State University,
USA |
| presenter |
Benjamin Jimenez |
| abstract |
Arizona State University undergraduates,
supported by the ASU High Performance Computing Initiative
and Microsoft Corporation, accepted the Supercomputing
2008 Cluster Challenge. Natalie Freed, Zachary Giles,
Patrick Lu, Benjamin Jimenez, and Richard Wellington
are part of the team developing a small cluster utilizing
Windows HPC Server 2008 on a Cray CX1 blade server.
The focus of the research follows the cluster challenge
goal of showing the power of clusters to harness
open source software to solve interesting problems.
Development of this cluster focused on implementation
of open source scientific applications. Since these
applications normally run in Unix environments,
porting them to Windows was a necessary component of the
research. The significance of the challenge posed
by porting to Windows during the development of the
cluster is included in the discussion. In the final
stage, visualizations were created to give a clearer
understanding of the output data of the open source codes. |
| |
|
| title |
Bringing
Disruptive Technology to Competition: Purdue and
SiCortex |
| author(s) |
Alexander Younts, Andrew Howard, Preston
Smith, Jeffrey Evans, Purdue University, USA |
| presenter |
TBD |
| abstract |
In November 2008, the second annual
Cluster Challenge competition at the 2008 Supercomputing
conference was held in Austin, Texas. Students from
Purdue University made up one of seven teams of undergraduates
to compete in the competition. Wanting a change of
pace from the commodity hardware of the first challenge,
Purdue teamed up with their vendor SiCortex, Inc.
Students were then given the task of learning scientific
applications POY, OpenFOAM, RAxML, WPP, and GAMESS,
along with the HPC Challenge benchmark suite. In
this paper, students from the Purdue Cluster Challenge
team discuss the work done in preparation for the
competition, along with their strategies and their effect
on the outcome of the Cluster Challenge. |
| |
|
| title |
Optimizing
Cluster Configuration and Applications to Maximize
Power Efficiency |
| author(s) |
Jupp Müller, Timo Schneider,
Jens Domke, Robin Geyer, Matthias Häsing, Torsten
Hoefler, Stefan Höhlig, Guido Juckeland,
Andrew Lumsdaine, Matthias S. Müller and Wolfgang
E. Nagel |
| presenter |
Jupp Müller and Timo Schneider |
| abstract |
The goal of the Cluster Challenge
is to design, build and operate a compute cluster.
Although it is an artificial environment for cluster
computing, many of its key constraints on operation
of cluster systems are important to real world scenarios:
high energy efficiency, reliability and scalability.
In this paper, we describe our approach to accomplish
these goals. We present our original system design
and illustrate changes to that system as well as
to applications and system settings in order to achieve
maximum performance within the given power and time
limits. Finally, we suggest how our conclusions can
be used to improve current and future clusters. |
| |
|
| Performance Analysis |
| title |
Experiences
with Managed Hosting of Virtual Machines |
| author(s) |
Dustin Leverman, University of Colorado,
USA; Henry Tufo, Michael Oberg, Matthew Woitaszek,
NCAR, USA |
| presenter |
TBD |
| abstract |
TBD |
| |
|
| title |
Active
Harmony: Getting the Human Out of the Performance
Tuning Loop |
| author(s) |
Jeffrey Hollingsworth, University
of Maryland, USA |
| presenter |
Jeffrey Hollingsworth |
| abstract |
Getting parallel programs to run well
is a difficult, tedious, and time consuming task.
Programmers don't like tuning programs, and eventually
they may not have to. In this talk I will present
a system called Active Harmony that supports automated
tuning of parallel programs. Active Harmony can be
used to automatically tune runtime parameters and
drive compiler optimization. I will also present
some performance results showing that Harmony's auto-tuning
provides better results than manual efforts,
and performance similar to an exhaustive search of the
parameter space. |
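For context, the exhaustive search that the talk uses as a point of comparison can be as simple as timing a kernel for every candidate value of a tunable parameter. The sketch below only illustrates that baseline; it does not use the Active Harmony API, and the kernel and candidate block sizes are made up:

    /* Toy exhaustive-search tuner (illustrative baseline, not the Active Harmony API).
     * It times a blocked kernel for several block sizes and keeps the fastest. */
    #include <stdio.h>
    #include <time.h>

    #define N 2048
    static double a[N][N];

    static double run_kernel(int block)
    {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int ii = 0; ii < N; ii += block)
            for (int jj = 0; jj < N; jj += block)
                for (int i = ii; i < ii + block && i < N; i++)
                    for (int j = jj; j < jj + block && j < N; j++)
                        a[i][j] = a[i][j] * 0.5 + 1.0;
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    }

    int main(void)
    {
        int candidates[] = {16, 32, 64, 128, 256};
        int best = candidates[0];
        double best_time = 1e30;

        /* Exhaustive search: evaluate every candidate value of the tunable parameter. */
        for (unsigned k = 0; k < sizeof(candidates) / sizeof(candidates[0]); k++) {
            double t = run_kernel(candidates[k]);
            printf("block=%d time=%.4fs\n", candidates[k], t);
            if (t < best_time) { best_time = t; best = candidates[k]; }
        }
        printf("best block size: %d (%.4fs)\n", best, best_time);
        return 0;
    }

An auto-tuner such as Harmony aims to reach a comparable result while evaluating far fewer points of the parameter space.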
| |
|
| title |
Experiences
in Tuning Performance of Hybrid MPI/OpenMP Applications
on Quad-core Systems |
| author(s) |
Ashay Rane, Dan Stanzione, Arizona
State University, USA |
| presenter |
Ashay Rane |
| abstract |
The hybrid method of parallelization
(using MPI for inter-node communication and OpenMP
for intra-node communication) seems a natural fit
for the way most clusters are built today. It is
generally expected to help programs run faster due
to factors like availability of greater bandwidth
for intra-node communication. Accordingly, this hybrid
paradigm of parallel programming has gained widespread
attention. However, optimizing hybrid applications
for maximum speedup is difficult primarily due to
inadequate transparency provided by the OpenMP constructs
and also due to the dependence of the resulting speedup
on the combination in which MPI and OpenMP are used.
In this paper we describe some of our experiences
in optimizing applications built using MPI
and OpenMP. More specifically, we discuss the
different techniques that could be helpful to other
researchers working on hybrid applications. To demonstrate
the usefulness of these optimizations, we provide
results from optimizing a few typical scientific
applications. Using these optimizations, one hybrid
code ran up to 34% faster than pure-MPI code. |
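A minimal sketch of the hybrid pattern the abstract describes, assuming one MPI rank per node with OpenMP threads inside the node (illustrative only, not code from the paper):

    /* Minimal hybrid MPI/OpenMP sketch (illustrative only).
     * Compile e.g. with: mpicc -fopenmp hybrid.c -o hybrid */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank, nranks;

        /* Request thread support so OpenMP threads can coexist with MPI. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        double local_sum = 0.0;

        /* Intra-node work is divided among OpenMP threads in shared memory. */
        #pragma omp parallel for reduction(+:local_sum)
        for (int i = 0; i < 1000000; i++)
            local_sum += 1.0 / (1.0 + i);

        /* Inter-node communication is handled by MPI (one rank per node). */
        double global_sum = 0.0;
        MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("ranks=%d threads=%d sum=%f\n",
                   nranks, omp_get_max_threads(), global_sum);

        MPI_Finalize();
        return 0;
    }

Launched with one rank per node and OMP_NUM_THREADS set to the per-node core count, intra-node work stays in shared memory while only the reduction crosses the network.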
| |
|
| System Software |
| title |
Evaluating
the Shared Root File System Approach for Diskless
High-Performance Computing Systems |
| author(s) |
Christian Engelmann, Hong Ong, Stephen
Scott, Oak Ridge National Laboratory, USA |
| presenter |
Hong Ong |
| abstract |
Diskless high-performance computing
(HPC) systems utilizing networked storage have become
popular in the last several years. Removing disk
drives significantly increases compute node reliability
as they are known to be a major source of failures.
Furthermore, networked storage solutions utilizing
parallel I/O and replication are able to provide
increased scalability and availability. Reducing
a compute node to processor(s), memory and network
interface(s) greatly reduces its physical size, which
in turn allows for large-scale dense HPC solutions.
However, one major obstacle is the requirement by
certain operating systems (OSs), such as Linux, for
a root file system. While one solution is to remove
this requirement from the OS, another is to share
the root file system over the networked storage.
This paper evaluates three networked file system
solutions, NFSv4, Lustre and PVFS2, with respect
to their performance, scalability, and availability
features for servicing a common root file system
in a diskless HPC configuration. Our findings indicate
that Lustre is a viable solution as it meets both
scaling and performance requirements. However, certain
availability issues regarding single points of failure
and control need to be considered. |
| |
|
| title |
A
Profile Guided Approach to Scheduling in Cluster
and Multi-cluster Systems |
| author(s) |
Arvind Sridhar, Dan Stanzione, Arizona
State University, USA |
| presenter |
TBD |
| abstract |
Effective resource management remains
a challenge for large scale cluster computing systems,
as well as for clusters of clusters. Resource management
involves the ability to monitor resource usage and
enforce policies to manage available resources and
provide the desired level of service. One of the
difficulties in resource management is that users
are notoriously inaccurate in predicting the resource
requirements of their jobs. In this research, a novel
concept called 'profile-guided scheduling' is proposed.
This work examines whether the resource requirements of
a given job in a cluster system can be predicted
based on the past behavior of the user's submitted
jobs. The scheduler can use this predicted value
to get an estimate of the job's performance metrics
prior to scheduling and thus make better scheduling
decisions based on the predicted value. In particular,
this approach is applied in a multi-cluster setting,
where the scheduler must account for limited network
bandwidth available between clusters. By having prior
knowledge of a job's bandwidth requirements, the
scheduler can make intelligent co-allocation decisions
and avoid co-allocating jobs that consume high network
bandwidth. This will mitigate the impact of limited
network performance on co-allocated jobs, decreasing
turnaround time and increasing system throughput. |
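To make the idea concrete, a predictor of this kind can be as simple as averaging a user's recent job history; the sketch below (hypothetical values, not the authors' implementation) predicts a job's inter-cluster bandwidth need and uses it in a co-allocation decision:

    /* Illustrative sketch (not the authors' implementation): predict a job's
     * resource need as the mean of the user's recent job history. */
    #include <stdio.h>

    #define HISTORY 5

    /* Predicted requirement = average of the user's last HISTORY observed values. */
    static double predict(const double history[], int n)
    {
        if (n == 0)
            return 0.0;   /* no history: fall back to a default or the user's estimate */
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += history[i];
        return sum / n;
    }

    int main(void)
    {
        /* Hypothetical observed inter-cluster bandwidth use (MB/s) of one user's past jobs. */
        double past_bw[HISTORY] = {110.0, 95.0, 120.0, 101.0, 99.0};

        double predicted = predict(past_bw, HISTORY);
        printf("predicted bandwidth need: %.1f MB/s\n", predicted);

        /* A co-allocating scheduler could compare this prediction with the
         * available inter-cluster bandwidth before spanning the job across clusters. */
        double available_bw = 80.0;   /* hypothetical */
        if (predicted > available_bw)
            printf("keep job within a single cluster\n");
        else
            printf("co-allocation across clusters acceptable\n");
        return 0;
    }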
| |
|
| title |
Parallel
File Systems on High-End Computers |
| author(s) |
Walter Ligon, Clemson University,
USA |
| presenter |
Walter Ligon |
| abstract |
This presentation will cover the state
of the art in parallel file systems today, and discuss
new developments in the PVFS project. |
| |
|
| Data / Grids |
| title |
Addressing
HPC Infrastructure Problems by Pooling HPC Resources:
The Thebes Middleware Consortium |
| author(s) |
Arnie Miles, Georgetown University,
USA |
| presenter |
Arnie Miles |
| abstract |
Outside the world of nationally funded
supercomputers lie institutions with far greater computing
demands than their infrastructure can support. HPC
devices are purchased and shoved into closets, server
rooms run out of power and air conditioning, and
floor space is at a premium. For the researchers
outside this world, be they K-12, higher ed, corporate,
or even government, there are limits to what they
can do. Even if their budgets support huge expenditures,
which is frequently not the case, there is simply
no way to commission huge computational resources
in many cases. Of course, many installed resources
stand idle a large percentage of the time. These
are installed to handle peaks in demand, and the
rest of the time they have little or no load. Meanwhile,
network speeds continue to increase and network
latencies continue to decrease.
The Thebes Consortium is poised to discover and build
the middleware stack that will allow the sharing
of resources across administrative domains in a simple,
secure, and scalable manner. There is already demand
for these tools, and this demand will increase exponentially.
Thebes uses the Shibboleth implementation of the
Security Assertion Markup Language to pass user attributes
to service providers. A sample service provider for
Condor was created as part of a demonstration project,
and a preproduction example of a Sun Grid Engine service
provider using Shibboleth and DRMAA is already available
for alpha testing. To enable resource discovery,
Thebes is working to harmonize efforts made by Open
Grid Forum, Ganglia, and the various job schedulers
into a single, robust resource and job description
language, which will be used by a Resource Discovery
Network to provide ordered lists of qualified resources
to users wishing to discover service providers capable
of executing computational programs. |
| |
|
| title |
Experiences
with Managed Hosting of Virtual Machines |
| author(s) |
Dustin Leverman, University of Colorado,
USA; Henry Tufo, Michael Oberg, Matthew Woitaszek,
NCAR, USA |
| presenter |
TBD |
| abstract |
Virtual machines (VMs) are emerging
as a powerful paradigm to deploy complex software
stacks on distributed cyberinfrastructure. As a TeraGrid
resource provider and developers of Grid-enabled
software, we experience both the client and system
administrator perspectives of service hosting. We
are using this point of view to integrate and deploy
a managed hosting environment based on VM technology
that fulfills the requirements of both developers
and system administrators. The managed hosting environment
provides the typical base benefits of any VM
hosting technique, but also adds software
support for developers and system monitoring opportunities
for administrators. This paper presents our experiences
creating and using a prototype managed hosting environment.
The design and configuration of our current
implementation is described, and its feature set
is evaluated from the context of several current
hosted projects. |
| |
|
| title |
A
Framework for Semantic-Based Dynamic Access Control
in Data Grids |
| author(s) |
Anil Pereira, Charles Moseley, Benjamin
VanTreese, David Goree, Karl Kirch (Southwestern
Oklahoma State University, USA); Dennis Ferron (Delta
Dental, Oklahoma City, Oklahoma, USA) |
| presenter |
TBD |
| abstract |
The abundance of large commercial
and scientific data stores has driven the need for
Petascale computation and data integration. Technologies
such as Grid Computing are being developed to address
this need. Grid Computing supports the coordinated
sharing of data and resources among different organizations.
Data Grids focus on the management of data and resources
for analyzing the data. Though Grid computing technologies
have been adopted in many scientific and commercial
sectors, many security issues have to be resolved
for them to gain wider acceptance. Their true potential
will only be realized by developing secure systems
that can encompass multiple organizations. In this
paper, we consider the security implications of the
dynamic interactions that would occur in Data Grids
and examine the requirements for a comprehensive
security model to support those interactions. We
explain that such a model could be constructed by
enhancing existing dynamic role-based access control
models and semantic-based access control models.
Additionally, we present an enumeration of the security
requirements for such dynamic interactions. Our work
takes into consideration that the co-allocation of
resources and job scheduling in Data Grids should
be based not only on the user’s request and the
available pool and state of resources, but also
on the user’s access rights, computing environment,
and the security policies of resources. Furthermore,
we discuss the problem of making access control decisions
dynamically during an application’s runtime. |
| |
|
| Tools |
| title |
XGet:
A Highly Scalable and Efficient File Transfer Tool
for Clusters |
| author(s) |
Hugh Greenberg, Latchesar Ionkov,
Los Alamos National Laboratory, USA; Ronald Minnich,
Sandia National Laboratories, USA |
| presenter |
TBD |
| abstract |
Transferring files between nodes in
a cluster is a common occurrence for administrators
and users. As clusters rapidly grow in size, transferring
files between nodes can no longer be solved by the
traditional transfer utilities due to their inherent
lack of scalability. In this paper, we describe a
new file transfer utility called XGet, which was
designed to address the scalability problem of standard
tools. We compared XGet against four transfer tools:
Bittorrent, Rsync, TFTP, and Udpcast, and our results
show that XGet's performance is superior to these
utilities in many cases. |
| |
|
| title |
Characterizing
Parallel Scaling of Scientific Applications Using
IPM |
| author(s) |
Nicholas Wright, San Diego Supercomputer
Center, USA |
| presenter |
Nicholas Wright |
| abstract |
Scientific applications will have
to scale to many thousands of processor cores to
reach petascale. Therefore it is crucial to understand
the factors that affect their scalability. Here we
examine the strong scaling of four representative
codes that exhibit different behaviors on four machines.
We demonstrate the efficiency and analytic power
of our performance monitoring tool, IPM, to understand
the scaling properties of these codes in terms of
their communication and computation components. Our
analysis reveals complexities that would not become
apparent from simple scaling plots; we disambiguate
application from machine bottlenecks and attribute
bottlenecks to communication or computation routines;
we show by these examples how one could generate
such a study of any application and benefit from
comparing the result to the work already done. We
evaluate the prospects for using extrapolation of
results at low concurrencies as a method of performance
prediction at higher concurrencies. |
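The underlying idea, which IPM gathers automatically by interposing on MPI calls, is to attribute wall-clock time to communication versus computation. The hand-rolled sketch below illustrates only that split; it is not IPM code:

    /* Hand-rolled sketch of the communication/computation split that a profiler
     * such as IPM collects automatically (illustration only, not IPM code). */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double t_comp = 0.0, t_comm = 0.0;
        double buf = (double)rank, sum = 0.0;

        for (int step = 0; step < 100; step++) {
            /* Computation phase. */
            double t0 = MPI_Wtime();
            double local = 0.0;
            for (int i = 0; i < 100000; i++)
                local += buf * 1e-6 + i * 1e-9;
            t_comp += MPI_Wtime() - t0;

            /* Communication phase. */
            t0 = MPI_Wtime();
            MPI_Allreduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
            t_comm += MPI_Wtime() - t0;
        }

        if (rank == 0)
            printf("compute: %.3fs  communication: %.3fs  (comm fraction %.1f%%)\n",
                   t_comp, t_comm, 100.0 * t_comm / (t_comp + t_comm));

        MPI_Finalize();
        return 0;
    }

Tracking how the communication fraction grows with concurrency is what lets one separate application bottlenecks from machine bottlenecks, as the abstract describes.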
| |
|
| title |
Performance
Measurement and Analysis Tools for Very Large Clusters |
| author(s) |
Bernd Mohr (Forschungszentrum Juelich
GmbH, Germany) |
| presenter |
Bernd Mohr |
| abstract |
The number of processor cores available
in high-performance computing systems is steadily
increasing. In the latest (Nov 2008) list of the
TOP500 supercomputers, 99.2% of the systems listed
have more than 1024 processor cores and the average
is about 6200. While these machines promise ever
more compute power and memory capacity to tackle
today's complex simulation problems, they force application
developers to greatly enhance the scalability of
their codes to be able to exploit it. To better support
them in their porting and tuning process, many parallel
tools research groups have already started to work
on scaling their methods, techniques, and tools to
extreme processor counts. In this talk, we survey
existing profiling and tracing tools, report on our
experience in using them in extreme scaling environments,
review existing working and promising new methods
and techniques, and discuss strategies for solving
unsolved issues and problems. |
| |
|