PAPERS |
| Applications |
| title |
Weather
Research and Forecast (WRF) Model: Performance Analysis
on Advanced Multi-core HPC Clusters |
| author(s) |
Gilad Shainer, Mellanox Technologies, USA |
| presenter |
Gilad Shainer |
| abstract |
The Weather Research and Forecast (WRF) Model
is a fully functioning modeling system for atmospheric research
and operational weather prediction communities. With an emphasis
on efficiency, portability, maintainability, scalability and
productivity, WRF has been successfully deployed over the years
on a wide variety of HPC clustered compute nodes connected
with high speed interconnects – currently the most used
system architecture for high-performance computing. As such,
understanding WRF’s dependency on the various clustering elements,
such as the CPU, interconnects and the software libraries, is
crucial for enabling efficient predictions and high productivity.
Our results identify WRF’s communication-sensitive points
and demonstrate WRF’s dependency on high-speed networks
and fast CPU to CPU communication. Both factors are critical
to maintaining scalability and increasing productivity when
adding cluster nodes. We conclude with specific recommendations
for improving WRF performance, scalability, and productivity
as measured in jobs per day. Because proprietary hardware and
software can quickly erode cluster architecture’s favorable
economics, we will restrict our investigation to standards-based
hardware and open source software readily available to
typical research institutions. |
| |
|
| title |
A
Parallel Algorithm for Large, Multi-Scale Simulations of
Liquid/Gas Phase Interfaces |
| author(s) |
Marcus Herrmann, Arizona State University, USA |
| presenter |
Marcus Herrmann |
| abstract |
This paper will present the development and
performance of a parallel algorithm for large, multi-scale
simulations of liquid/gas phase interfaces. |
| |
|
| Power and Cooling |
| title |
Towards
Real-World HPC Energy Efficiency and Productivity Metrics
in a Fully Instrumented Datacenter |
| author(s) |
Andres Marquez, Pacific Northwest National Laboratory,
USA |
| presenter |
Andres Marquez |
| abstract |
Towards real-world HPC energy efficiency and
productivity metrics in a fully instrumented Datacenter. |
| |
|
| title |
Cyber-Physical
Autonomic Resource Management for High-Performance Datacenters |
| author(s) |
Georgios Varsamopoulos, Sandeep Gupta, Arizona
State University, USA |
| presenter |
TBD |
| abstract |
Previous research has demonstrated the potential
benefits of thermal-aware load placement and thermal mapping
in cooling-intensive environments such as data centers. However,
applying existing techniques to live data centers has proved
difficult because their models are unrealistic, require extensive
sensing instrumentation, or are disruptive to data center services
during derivation. The work presented in this paper
discusses cyberphysical-oriented techniques and the associated
challenges in creating an adaptive and non-invasive software
system to derive realistic and low-complexity thermal models
in an autonomic manner, using built-in and ambient sensors.
Uses of these techniques can vary from assessing the thermal
efficiency of the data center to designing a thermal-aware job
scheduler, thus greening data centers. Specifically, this paper
proposes and evaluates: i) new thermal models and a sensing
software architecture, and ii) a method to identify relocation
of equipment within a data center room to achieve permanent
cost savings under all scheduling conditions. |
| |
|
| Performance Evaluation |
| title |
Performance
Analysis of the SiCortex SC072 |
| author(s) |
Brian Martin, Andrew Leiker, Douglas Doerfler,
James Laros III, Sandia National Laboratories, USA |
| presenter |
Brian Martin and Andrew Leiker |
| abstract |
The world of High Performance
Computing (HPC) has seen a major shift towards commodity
clusters in the last 10 years. A new company, SiCortex, has
set out to break this trend. They have created what they
claim to be a balanced cluster which makes use of low-power
MIPS processors and a custom interconnect in an effort to
avoid many of the bottlenecks plaguing most modern clusters.
In this paper, we reveal the results of preliminary benchmarking
of one of their systems, the SC072. First, we ran a collection
of microbenchmarks to characterize the performance of interprocessor
communication. Next, we ran some real applications relevant
to high performance computing and compared performance and
scalability to a typical commodity cluster. Lastly, we examined
the performance per watt of the SiCortex system and compared it
with that of a commodity cluster.
A two-year intern at Sandia National Labs,
Brian is currently attending Carnegie Mellon University studying
computer science as a sophomore. |
| |
|
| title |
QP:
A Heterogeneous Multi-Accelerator Cluster |
| author(s) |
Michael Showerman, Wen-Mei Hwu, University of
Illinois, USA; Jeremy Enos, Avneesh Pant, Volodymyr Kindratenko,
Craig Steffen, Robert Pennington, NCSA, USA |
| presenter |
TBD |
| abstract |
We present a heterogeneous multi-accelerator
cluster developed and deployed at NCSA. The cluster consists
of 16 AMD dual-core CPU compute nodes each with four NVIDIA
GPUs and one Xilinx FPGA. Cluster nodes are interconnected
with both InfiniBand and Ethernet networks. The software stack
consists of standard cluster tools with the addition of accelerator-specific
software packages and enhancements to the resource allocation
and batch sub-systems. We highlight several HPC applications
that have been developed and deployed on the cluster. We also
present our Phoenix application development framework that
is meant to help with developing new applications and migrating
existing legacy codes to heterogeneous systems. |
| |
|
| The SC08 Cluster Challenge |
| title |
Windows
HPC Server 2008 at the SC08 Cluster Challenge |
| author(s) |
Benjamin Jimenez, Arizona State University,
USA |
| presenter |
Benjamin Jimenez |
| abstract |
Arizona State University undergraduates, supported
by the ASU High Performance Computing Initiative and Microsoft
Corporation, accepted the Supercomputing 2008 Cluster Challenge.
Natalie Freed, Zachary Giles, Patrick Lu, Benjamin Jimenez,
and Richard Wellington are part of the team developing a small
cluster utilizing Windows HPC Server 2008 on a Cray CX1 blade
server. The focus of the research follows the cluster challenge
goal of showing the power of clusters to harness open source
software to solve interesting problems. Development of this
cluster focused on implementation of open source scientific
applications. Since these applications normally run in
Unix environments, porting to Windows was a necessary component
of the research. The significance of the challenge presented
by porting to Windows during the development of the cluster
is included in the discussion. In the final stage, visualizations
were created to give a clearer understanding of the output data
from the open source codes. |
| |
|
| title |
Bringing
Disruptive Technology to Competition: Purdue and SiCortex |
| author(s) |
Alexander Younts, Andrew Howard, Preston Smith,
Jeffrey Evans, Purdue University, USA |
| presenter |
TBD |
| abstract |
In November 2008, the second annual Cluster
Challenge competition at the 2008 Supercomputing conference
was held in Austin, Texas. Students from Purdue University
made up one of seven teams of undergraduates competing in
the event. Wanting a change of pace from the commodity
hardware of the first challenge, Purdue teamed up with their
vendor SiCortex, Inc. Students were then given the task of
learning scientific applications POY, OpenFOAM, RAxML, WPP,
and GAMESS, along with the HPC challenge benchmark suite. In
this paper, students from the Purdue Cluster Challenge team
discuss the work done in preparation for the competition, along
with strategies and their effect on the outcome of the Cluster
Challenge. |
| |
|
| title |
Optimizing
Cluster Configuration and Applications to Maximize Power Efficiency |
| author(s) |
Jupp Müller, Timo Schneider, Jens Domke,
Robin Geyer, Matthias Häsing, Torsten Hoefler, Stefan
Höhlig, Guido Juckeland, Andrew Lumsdaine, Matthias S.
Müller and Wolfgang E. Nagel |
| presenter |
Jupp Müller and Timo Schneider |
| abstract |
The goal of the Cluster Challenge is to design,
build and operate a compute cluster. Although it is an artificial
environment for cluster computing, many of its key constraints
on operation of cluster systems are important to real world
scenarios: high energy efficiency, reliability and scalability.
In this paper, we describe our approach to accomplish these
goals. We present our original system design and illustrate
changes to that system as well as to applications and system
settings in order to achieve maximum performance within the
given power and time limits. Finally, we suggest how our conclusions
can be used to improve current and future clusters. |
| |
|
| Performance Analysis |
| title |
TBD |
| author(s) |
TBD |
| presenter |
TBD |
| abstract |
TBD |
| |
|
| title |
Active Harmony:
Getting the Human Out of the Performance Tuning Loop |
| author(s) |
Jeffrey Hollingsworth, University of Maryland,
USA |
| presenter |
Jeffrey Hollingsworth |
| abstract |
Getting parallel programs to run well is a difficult,
tedious, and time consuming task. Programmers don't like tuning
programs, and eventually they may not have to. In this talk
I will present a system called Active Harmony that supports
automated tuning of parallel programs. Active Harmony can be
used to automatically tune runtime parameters, and drive compiler
optimization. I will also present some performance results
that show Harmony's auto tuning providing better results than
manual efforts, and similar performance to exhaustive search
of the parameter space. |
| |
|
| title |
Experiences
in Tuning Performance of Hybrid MPI/OpenMP Applications
on Quad-core Systems |
| author(s) |
Ashay Rane, Dan Stanzione, Arizona State University,
USA |
| presenter |
Ashay Rane |
| abstract |
The Hybrid method of parallelization (using
MPI for inter-node communication and OpenMP for intra-node
communication) seems a natural fit for the way most clusters
are built today. It is generally expected to help programs
run faster due to factors like availability of greater bandwidth
for intra-node communication. Accordingly, this hybrid paradigm
of parallel programming has gained widespread attention. However,
optimizing hybrid applications for maximum speedup is difficult
primarily due to inadequate transparency provided by the OpenMP
constructs and also due to the dependence of the resulting
speedup on the combination in which MPI and OpenMP are used.
In this paper we describe some of our experiences in trying
to optimize applications built using MPI and OpenMP. More specifically,
we talk about the different techniques that could be helpful
to other researchers working on hybrid applications. To demonstrate
the usefulness of these optimizations, we provide results from
optimizing a few typical scientific applications. Using these
optimizations, one hybrid code ran up to 34% faster than pure-MPI
code. |
| |
|
| System Software |
| title |
Evaluating
the Shared Root File System Approach for Diskless High-Performance
Computing Systems |
| author(s) |
Christian Engelmann, Hong Ong, Stephen Scott,
Oak Ridge National Laboratory, USA |
| presenter |
Hong Ong |
| abstract |
Diskless high-performance computing (HPC) systems
utilizing networked storage have become popular in the last
several years. Removing disk drives significantly increases
compute node reliability as they are known to be a major source
of failures. Furthermore, networked storage solutions utilizing
parallel I/O and replication are able to provide increased
scalability and availability. Reducing a compute node to processor(s),
memory and network interface(s) greatly reduces its physical
size, which in turn allows for large-scale dense HPC solutions.
However, one major obstacle is the requirement by certain operating
systems (OSs), such as Linux, for a root file system. While
one solution is to remove this requirement from the OS, another
is to share the root file system over the networked storage.
This paper evaluates three networked file system solutions,
NFSv4, Lustre and PVFS2, with respect to their performance,
scalability, and availability features for servicing a common
root file system in a diskless HPC configuration. Our findings
indicate that Lustre is a viable solution as it meets both
scaling and performance requirements. However, certain availability
issues regarding single points of failure and control need
to be considered. |
| |
|
| title |
A
Profile Guided Approach to Scheduling in Cluster and Multi-cluster
Systems |
| author(s) |
Arvind Sridhar, Dan Stanzione, Arizona State
University, USA |
| presenter |
TBD |
| abstract |
Effective resource management remains a challenge
for large scale cluster computing systems, as well as for clusters
of clusters. Resource management involves the ability to monitor
resource usage and enforce policies to manage available resources
and provide the desired level of service. One of the difficulties
in resource management is that users are notoriously inaccurate
in predicting the resource requirements of their jobs. In this
research, a novel concept called 'profile guided scheduling'
is proposed. This work examines whether the resource requirements
of a given job in a cluster system can be predicted based on
the past behavior of the user's submitted jobs. The scheduler
can use this predicted value to get an estimate of the job's
performance metrics prior to scheduling and thus make better
scheduling decisions based on the predicted value. In particular,
this approach is applied in a multi-cluster setting, where
the scheduler must account for limited network bandwidth available
between clusters. By having prior knowledge of a job's bandwidth
requirements, the scheduler can make intelligent co-allocation
decisions and avoid co-allocating jobs that consume high network
bandwidth. This will mitigate the impact of limited network
performance on co-allocated jobs, decreasing turnaround time
and increasing system throughput. |
| |
|
| title |
Parallel
File Systems on High-End Computers |
| author(s) |
Walter Ligon, Clemson University, USA |
| presenter |
Walter Ligon |
| abstract |
This presentation will cover the state of the
art in parallel file systems today, and discuss new developments
in the PVFS project. |
| |
|
| Data / Grids |
| title |
Addressing
HPC Infrastructure Problems by Pooling HPC Resources: The
Thebes Middleware Consortium |
| author(s) |
Arnie Miles, Georgetown University, USA |
| presenter |
Arnie Miles |
| abstract |
Outside the world of nationally funded supercomputers
lie institutions with far greater computing demands than their
infrastructure can support. HPC devices are purchased and shoved
into closets, server rooms run out of power and air conditioning,
and floor space is at a premium. For the researchers outside
this world, be they K-12, higher ed, corporate, or even government,
there are limits to what they can do. Even if their budgets
support huge expenditures, which is frequently not the case,
there is simply no way to commission huge computational resources
in many cases. Of course, many installed resources stand idle
a large percentage of the time. These are installed to handle
peaks in demand, and the rest of the time they have little
or no load. Meanwhile, network speeds continue to advance
and network latency continues to decrease. The Thebes
Consortium is poised to discover and build the middleware stack
that will allow the sharing of resources across administrative
domains in a simple, secure, and scalable manner. There is
already demand for these tools, and this demand will increase
exponentially. Thebes uses the Shibboleth implementation of
the Security Assertion Markup Language to pass user attributes
to service providers. A sample service provider for Condor
was created as part of a demonstration project, and a preproduction
example of a Sun Grid Engine service provider using Shibboleth
and DRMAA is already available for alpha testing. To enable
resource discovery, Thebes is working to harmonize efforts
made by Open Grid Forum, Ganglia, and the various job schedulers
into a single, robust resource and job description language,
which will be used by a Resource Discovery Network to provide
ordered lists of qualified resources to users wishing to discover
service providers capable of executing computational programs. |
| |
|
| title |
Experiences
with Managed Hosting of Virtual Machines |
| author(s) |
Dustin Leverman, University of Colorado, USA;
Henry Tufo, Michael Oberg, Matthew Woitaszek, NCAR, USA |
| presenter |
TBD |
| abstract |
Virtual machines (VMs) are emerging as a powerful
paradigm to deploy complex software stacks on distributed cyberinfrastructure.
As a TeraGrid resource provider and developers of Grid-enabled
software, we experience both the client and system administrator
perspectives of service hosting. We are using this point of
view to integrate and deploy a managed hosting environment
based on VM technology that fulfills the requirements
of both developers and system administrators. The managed hosting
environment provides the typical base benefits of any
VM hosting technique, but provides additional software support
for developers and system monitoring opportunities for administrators.
This paper presents our experiences creating and using a prototype
managed hosting environment. The design and configuration
of our current implementation are described, and its feature
set is evaluated in the context of several current hosted
projects. |
| |
|
| title |
A Framework
for Semantic-Based Dynamic Access Control in Data Grids |
| author(s) |
Anil Pereira, Charles Moseley, Benjamin VanTreese,
David Goree, Karl Kirch (Southwestern Oklahoma State University,
USA); Dennis Ferron (Delta Dental, Oklahoma City, Oklahoma,
USA) |
| presenter |
TBD |
| abstract |
The abundance of large commercial and scientific
data stores has driven the need for petascale computation
and data integration. Technologies such as Grid Computing are
being developed to address this need. Grid Computing supports
the coordinated sharing of data and resources among different
organizations. Data Grids focus on the management of data and
resources for analyzing the data. Though Grid computing technologies
have been adopted in many scientific and commercial sectors,
many security issues have to be resolved for them to gain wider
acceptance. Their true potential will only be realized by developing
secure systems that can encompass multiple organizations. In
this paper, we consider the security implications of the dynamic
interactions that would occur in Data Grids and examine the
requirements for a comprehensive security model to support
those interactions. We explain that such a model could be constructed
by enhancing existing dynamic role-based access control models
and semantic-based access control models. Additionally, we
present an enumeration of the security requirements for such
dynamic interactions. Our work takes into consideration that
the co-allocation of resources and job scheduling in Data Grids
should not only be based on the user’s request, and the
available pool and state of resources, but also on the user’s
access rights, computing environment and the security policies
of resources. Furthermore, we discuss the problem of making
access control decisions dynamically during an application’s
runtime. |
| |
|
| Tools |
| title |
XGet:
A Highly Scalable and Efficient File Transfer Tool for Clusters |
| author(s) |
Hugh Greenberg, Latchesar Ionkov, Los Alamos
National Laboratory, USA; Ronald Minnich, Sandia National Laboratories,
USA |
| presenter |
TBD |
| abstract |
Transferring files between nodes in a cluster
is a common occurrence for administrators and users. As clusters
rapidly grow in size, transferring files between nodes can
no longer be solved by the traditional transfer utilities due
to their inherent lack of scalability. In this paper, we describe
a new file transfer utility called XGet, which was designed
to address the scalability problem of standard tools. We compared
XGet against four transfer tools: BitTorrent, rsync, TFTP,
and Udpcast. Our results show that XGet's performance is
superior to these utilities in many cases. |
| |
|
| title |
Characterizing
Parallel Scaling of Scientific Applications Using IPM |
| author(s) |
Nicholas Wright, San Diego Supercomputer Center,
USA |
| presenter |
Nicholas Wright |
| abstract |
Scientific applications will have to scale to
many thousands of processor cores to reach petascale. Therefore
it is crucial to understand the factors that affect their scalability.
Here we examine the strong scaling of four representative codes
that exhibit different behaviors on four machines. We demonstrate
the efficiency and analytic power of our performance monitoring
tool, IPM, to understand the scaling properties of these codes
in terms of their communication and computation components.
Our analysis reveals complexities that would not become apparent
from simple scaling plots; we disambiguate application from
machine bottlenecks and attribute bottlenecks to communication
or computation routines; we show by these examples how one
could generate such a study of any application and benefit
from comparing the result to the work already done. We evaluate
the prospects for using extrapolation of results at low concurrencies
as a method of performance prediction at higher concurrencies. |
| |
|
| title |
Performance
Measurement and Analysis Tools for Very Large Clusters |
| author(s) |
Bernd Mohr (Forschungszentrum Juelich GmbH,
Germany) |
| presenter |
Bernd Mohr |
| abstract |
The number of processor cores available in high-performance
computing systems is steadily increasing. In the latest (Nov
2008) list of the TOP500 supercomputers, 99.2% of the systems
listed have more than 1024 processor cores and the average
is about 6200. While these machines promise ever more compute
power and memory capacity to tackle today's complex simulation
problems, they force application developers to greatly enhance
the scalability of their codes to be able to exploit it. To
better support them in their porting and tuning process, many
parallel tools research groups have already started to work
on scaling their methods, techniques, and tools to extreme
processor counts. In this talk, we survey existing profiling
and tracing tools, report on our experience in using them in
extreme scaling environments, review existing working and promising
new methods and techniques, and discuss strategies for solving
unsolved issues and problems. |
| |
|
| Technical Briefs: |
| title |
TBD |
| author(s) |
TBD |
| presenter |
TBD |
| abstract |
TBD |
| |
|
| title |
TBD |
| author(s) |
TBD |
| presenter |
TBD |
| abstract |
TBD |
| |
|
| title |
TBD |
| author(s) |
TBD |
| presenter |
TBD |
| abstract |
TBD |
| |
|
| title |
TBD |
| author(s) |
TBD |
| presenter |
TBD |
| abstract |
TBD |
| |
|