|
Title |
|
Large
Scale Parallel Reservoir Simulations on a Linux
PC Cluster |
Author(s) |
|
Walid A. Habiballah and M. Ehtesham
Hayder |
Author Inst |
|
Petroleum Engineering Application
Services Department, Saudi Aramco |
Presenter |
|
M.
Ehtesham Hayder |
Abstract |
|
Numerical simulation is an important
tool used by engineers to develop production strategies
and enhance hydrocarbon recovery from reservoirs.
Demand for large scale reservoir simulations is
increasing as engineers want to study larger and
more complex models. In this study, we evaluate
a state-of-the-art PC cluster and available software
tools for production simulations of large reservoir
models. We discuss some of our findings and issues
related to large scale parallel reservoir simulations
and present performance comparisons between a Pentium
IV Linux PC cluster and an IBM SP Nighthawk supercomputer.
|
|
|
Title |
|
Scalable
Performance of FLUENT on NCSA IA-32 Linux Cluster |
Author(s) |
|
Wai Yip Kwok |
Author Inst |
|
National Center for Supercomputing
Applications (NCSA) |
Presenter |
|
Wai Yip Kwok |
Abstract |
|
FLUENT, a leading industrial computational
fluid dynamics (CFD) software package, has been ported
to the NCSA IA-32 Linux cluster. For this study,
the scalable performance of FLUENT is benchmarked
with two engineering problems from Caterpillar,
Inc. and Fluent, Inc., with a maximum of 64 processors
to accommodate up to 10 million cells. This session
will outline the impacts of different interconnects
on simulation performance. Using Myrinet interconnects,
the Linux cluster computes more than 2.5 times
faster than an SGI Origin2000 supercomputer at
NCSA. A performance increase of seven times is
observed when 32 processors are used instead of
two.
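(Reading the quoted numbers directly, the speed-up of seven going from 2 to 32 processors corresponds to a relative parallel efficiency of
\[ S_{2\rightarrow 32} = \frac{T_{2}}{T_{32}} = 7, \qquad E = \frac{S_{2\rightarrow 32}}{32/2} = \frac{7}{16} \approx 0.44, \]
measured against the two-processor baseline rather than a serial run.)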
|
|
|
Title |
|
Moore's
Law and Cluster Computing: When Moore Is Not
Enough |
Author(s) |
|
Greg Lindahl |
Author Inst |
|
Key Research, Inc. |
Presenter |
|
Greg Lindahl |
Abstract |
|
Linux cluster builders have become
accustomed to continuous improvement of cluster
building blocks: each year, CPUs get faster, disks
get bigger, memory bandwidth rises and networks
get cheaper and faster. These improvements are
often seen as the inevitable march of progress,
driven by the commodity market and Moore’s
Law. This session will revisit Moore’s famous
law in detail to determine if it adequately predicts
an environment ripe for commodity cluster computing.
|
|
|
Title |
|
Cooperative
Caching in Linux Clusters |
Author(s) |
|
Ying Xu and Brett Fleisch |
Author Inst |
|
University of California Riverside |
Presenter |
|
Ying
Xu |
Abstract |
|
Operating systems used in most
Linux clusters only manage memory locally without
cooperating with other nodes in the system. This
can create states where a node within the cluster
may be short of memory while idle memory in other
nodes is wasted. This session addresses the problem
of improving the cluster operating
system to support the use of cluster-wide memory
as a global distributed resource. Presented will
be a description of a cooperative caching scheme
for caching files in the cluster-wide memory and
corresponding changes in Linux kernel memory management
to support it.
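(The abstract does not give implementation details; the following is a minimal, self-contained sketch of the cooperative-caching idea, with illustrative names and data, not the authors' kernel code.)

```c
/* A minimal, self-contained sketch of the cooperative-caching idea:
 * on eviction, a page is forwarded to the peer with the most idle
 * memory instead of being dropped, so a later miss is served from a
 * remote cache rather than from disk. All names are illustrative. */
#include <stdio.h>

#define NODES 4

static int idle_mb[NODES] = {0, 512, 64, 0};   /* idle memory per node */

/* Pick the peer (not ourselves) with the most idle memory, or -1. */
static int peer_with_idle_memory(int self)
{
    int best = -1;
    for (int n = 0; n < NODES; n++)
        if (n != self && idle_mb[n] > 0 &&
            (best < 0 || idle_mb[n] > idle_mb[best]))
            best = n;
    return best;
}

/* Evict one page (1 MB unit here, for simplicity) from `self`. */
static void evict_page(int self, int page_id)
{
    int peer = peer_with_idle_memory(self);
    if (peer >= 0) {
        idle_mb[peer]--;   /* page now occupies remote idle memory */
        printf("page %d: forwarded from node %d to node %d\n",
               page_id, self, peer);
    } else {
        printf("page %d: dropped on node %d (refetch will hit disk)\n",
               page_id, self);
    }
}

int main(void)
{
    for (int p = 0; p < 3; p++)
        evict_page(0, p);
    return 0;
}
```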
|
|
|
Title |
|
Object
Storage: Scalable Bandwidth for HPC Clusters |
Author(s) |
|
G. Gibson, B. Welch, and D. Nagle |
Author Inst |
|
Panasas Inc. |
Presenter |
|
Garth Gibson |
Abstract |
|
This session describes the Object
Storage Architecture solution for cost-effective,
high bandwidth storage in HPC environments. It
addresses the unique problems of storage intensive
computations in very large clusters, suggesting
that a shared file system with out-of-band metadata
management is needed to achieve the required bandwidth.
The session further argues that for excellent data
reliability, storage protection needs to be supported
on the data path and it recommends the higher-level
semantics of object-based, rather than block-based,
storage for scalable performance, data reliability
and efficient sharing.
|
|
|
Title |
|
Analyzing
Cluster Log Files Using Logsurfer |
Author(s) |
|
James Prewett |
Author Inst |
|
University of New Mexico |
Presenter |
|
James
Prewett |
Abstract |
|
Logsurfer is a log analysis tool
that simplifies maintaining a cluster by aiding
identification and resolution of system issues.
This session will outline several examples of using
Logsurfer in a cluster environment. Examples range
from finding the traces of a complex exploitation
of a service to determining which of a set of nodes
have problems rebooting. Attendees will learn to
configure Logsurfer to meet the particular needs
of their environment.
|
|
|
Title |
|
Performance
Evaluation of Load Sharing Policies with PANTS
on Beowulf Cluster |
Author(s) |
|
James Nichols and Mark Claypool |
Author Inst |
|
Worcester Polytechnic Institute |
Presenter |
|
James Nichols |
Abstract |
|
Powerful, low-cost clusters of
personal computers, such as Beowulf clusters, have
fueled the potential for widespread distributed
computation. While these Beowulf clusters typically
have software that facilitates development of distributed
applications, there is still a need for effective
distributed computation that is transparent to
the application programmer.
|
|
|
Title |
|
On
the Numeric Efficiency of C++ Packages in Scientific
Computing |
Author(s) |
|
Ulisses Mello and Ildar Khabibrakhmanov |
Author Inst |
|
T.J. Watson Research Center |
Presenter |
|
Ulisses
Mello |
Abstract |
|
Object-Oriented Programming (OOP)
has proven to be a useful paradigm for programming
complex models. In spite of recent interest in
expressing OOP paradigms in languages such as FORTRAN90,
C++ is the dominant OO language in scientific computing,
despite its complexity. Barton & Nackman advocated
C++ as a replacement for FORTRAN in engineering
and scientific computing due to its availability,
portability, efficiency, correctness and generality.
These authors used OOP for code reorganization
of LAPACK (Linear Algebra PACKage), and they were
able to group and wrap over 250 FORTRAN routines
into a much smaller set of classes, which expressed
the common structure of LAPACK.
|
|
|
Title |
|
Benchmarking
I/O Solutions for Clusters |
Author(s) |
|
Stefano Cozzini and Moshe Bar |
Author Inst |
|
Democritos INFM National Simulation
Center |
Presenter |
|
Stefano
Cozzini |
Abstract |
|
Clustered systems offer many advantages
for demanding scientific applications: they can
deal with massive CPU-bound requirements and allow
the distribution of RAM among many nodes. However,
many scientific applications process massive amounts
of data and therefore require high performance,
distributed storage alongside parallel I/O. This
session will discuss present-day I/O cluster solutions
based on Bonnie performance benchmarking for a
variety of popular systems.
|
|
|
Title |
|
The
Design, Implementation, and Evaluation of mpiBLAST |
Author(s) |
|
Aaron E. Darling, Lucas Carey,
and Wu-chun Feng |
Author Inst |
|
University of Wisconsin -- Madison |
Presenter |
|
Aaron
E. Darling |
Abstract |
|
mpiBLAST is an Open Source parallelization
of BLAST that achieves superlinear speed-up by
segmenting a BLAST database and then having each
node in a computational cluster search a unique
portion of the database. Database segmentation
permits each node to search a smaller portion of
the database, eliminating disk I/O and vastly improving
BLAST performance. Because database segmentation
does not create heavy communication demands, BLAST
users can take advantage of low-cost and efficient
Linux cluster architectures such as the bladed
Beowulf. In addition to this presentation of the
software architecture of mpiBLAST, there will be
a detailed performance analysis of mpiBLAST to
demonstrate its scalability.
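(As an illustration of the segmentation idea described above, and not mpiBLAST's actual source, the following MPI sketch has each rank search its own database fragment and gathers per-fragment hit counts at rank 0; search_fragment() is a hypothetical stand-in for the real BLAST search.)

```c
/* Sketch of mpiBLAST-style database segmentation (illustrative only).
 * Each MPI rank searches one pre-made fragment of the database;
 * per-fragment hit counts are gathered at rank 0. */
#include <mpi.h>
#include <stdio.h>

/* Stand-in for searching one database fragment; returns a hit count. */
static int search_fragment(int fragment_id)
{
    return fragment_id + 1;          /* dummy result for illustration */
}

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Fragment i is small enough to fit in node i's memory, so the
     * search avoids the disk I/O a monolithic database would incur. */
    int hits = search_fragment(rank);

    int *all_hits = NULL;
    if (rank == 0) {
        static int buf[1024];        /* assumes at most 1024 ranks */
        all_hits = buf;
    }
    MPI_Gather(&hits, 1, MPI_INT, all_hits, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0)
        for (int i = 0; i < size; i++)
            printf("fragment %d: %d hits\n", i, all_hits[i]);

    MPI_Finalize();
    return 0;
}
```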
|
|
|
|
Title |
|
SLURM:
Simple Linux Utility for Resource Management |
Author(s) |
|
Morris Jette and Mark Grondona |
Author Inst |
|
Lawrence Livermore National Laboratory |
Presenter |
|
Morris
Jette |
Abstract |
|
SLURM is an open source, fault-tolerant
and highly scalable cluster management and job
scheduling system for Linux clusters of thousands
of nodes. Components include machine status, partition
management, scheduling and stream copy modules.
This session presents an overview of the SLURM
architecture and functionality. |
|
|
Title |
|
A
Simple Installation and Administration Tool
for the Large-Scaled PC Cluster System: DCAST |
Author(s) |
|
Tomoyuki Hiroyasu, Mitsunori Miki,
Kenzo Kodama, Junichi Uekawa, and Jack Dongarra |
Author Inst |
|
Doshisha University |
Presenter |
|
Tomoyuki
Hiroyasu |
Abstract |
|
The installation and configuration
of clusters with many nodes is difficult due to
the large amount of time and knowledge required
to fully complete the task. To solve this problem
a simple installation and administration tool, “Doshisha
Cluster Auto Setup Tool: DCAST,” has been
developed. Targeted at Linux, it supports both
diskless and diskful clusters, requires no interaction
during installation, boots slave nodes over the network,
and propagates configuration changes to
all nodes. |
|
|
Title |
|
The
Space Simulator |
Author(s) |
|
Michael S. Warren, Chris Fryer,
and M. Patrick Goda |
Author Inst |
|
Los Alamos National Laboratory |
Presenter |
|
Michael
S. Warren |
Abstract |
|
The Space Simulator is a 294 processor
Beowulf cluster with a peak performance near 1.5
Teraflops. It achieved Linpack performance of 665.1
Gflops on 288 processors, making it the 85th fastest
computer in the world. The Space Simulator Cluster
is dedicated to performing computational astrophysics
simulations in the Theoretical Astrophysics group
(T6) at Los Alamos National Laboratory. This case
study will outline the design drivers, software
and applications applied to the Space Simulator. |
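(For context, the quoted figures imply a Linpack efficiency of roughly
\[ E = \frac{665.1\ \text{Gflops}}{\approx 1500\ \text{Gflops peak}} \approx 0.44, \]
a slight underestimate since the peak figure covers all 294 processors while the Linpack run used 288.)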
|
|
Title |
|
A
Middleware-Level Parallel Transfer Technique
Over Multiple Network Interfaces |
Author(s) |
|
Nader Mohamed, Jameela Al-Jaroodi,
Hong Jiang, and David Swanson |
Author Inst |
|
University of Nebraska--Lincoln |
Presenter |
|
Nader
Mohamed |
Abstract |
|
Network middleware is a software
layer that provides abstract network APIs to hide
the low-level technical details from users. Existing
network middleware supports only a single network interface
and link for message transfers. In this session, we
describe a middleware level parallel transfer technique
that utilizes multiple network interface units
that may be connected through multiple networks.
It operates on any reliable transport protocol
such as TCP and transparently provides an expandable
high bandwidth solution that reduces message transfer
time, provides fault tolerance and facilitates
dynamic load balancing between the underlying multiple
networks. The experimental evaluation displayed
a peak performance of 187 Mbps on two Fast Ethernet
networks. |
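(A minimal sketch of the striping idea, assuming pre-established reliable byte streams; pipes stand in for TCP connections over separate networks, and the round-robin chunking is illustrative rather than the authors' protocol.)

```c
/* Sketch of middleware-level parallel transfer (illustrative). A
 * message is striped round-robin in fixed chunks across several
 * reliable byte streams; the demo uses pipes in place of sockets. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define CHUNK 8   /* stripe unit in bytes (real systems use far larger) */

/* Stripe `len` bytes of `buf` across `nfd` descriptors, round-robin. */
static int striped_send(const int *fd, int nfd, const char *buf, size_t len)
{
    for (size_t off = 0; off < len; off += CHUNK) {
        size_t n = len - off < CHUNK ? len - off : CHUNK;
        int which = (int)((off / CHUNK) % nfd);   /* pick next stream */
        if (write(fd[which], buf + off, n) != (ssize_t)n)
            return -1;
    }
    return 0;
}

int main(void)
{
    int p0[2], p1[2];
    if (pipe(p0) || pipe(p1)) return 1;
    int tx[2] = {p0[1], p1[1]};

    const char msg[] = "striped across two independent byte streams";
    striped_send(tx, 2, msg, sizeof msg);

    /* Receiver side: each stream carries every other chunk. */
    char chunk[CHUNK + 1];
    ssize_t n = read(p0[0], chunk, CHUNK);
    chunk[n] = '\0';
    printf("first chunk on stream 0: \"%s\"\n", chunk);
    return 0;
}
```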
|
|
Title |
|
The Cluster
Integration Toolkit (CIT) |
Author(s) |
|
James H. Laros III, Lee Ward, Nathan
W. Dauchy, Ruth Klundt, Glen Laguna, James Vasak,
Marcus Epperson, and Jon R. Stearley |
Author Inst |
|
Sandia National Labs |
Presenter |
|
James
H. Laros III |
Abstract |
|
The Cluster Integration Toolkit
is an extensible, portable, scalable cluster management
software architecture for a variety of systems.
It has been successfully used to integrate and
support a number of clusters at Sandia National
Labs and several other sites, the largest of which
is 1861 nodes. This session will discuss the goals
of the project and how they were achieved. The
installation process will be described and common
tasks for cluster implementation and support will
be demonstrated.
|
|
|
Title |
|
Scalable
C3 Power Tools |
Author(s) |
|
Stephen Scott and Brian Luethke |
Author Inst |
|
Oak Ridge National Laboratory |
Presenter |
|
John Mugler |
Abstract |
|
With the growth of the typical
cluster reaching 512 or more compute nodes, it
is apparent that cluster tools must begin to reach
toward thousands of nodes in scalability.
Version 3.2 of the C3 tools has started stretching
the Single System Illusion concept into the realm
of thousands of compute nodes by actually improving
performance on larger clusters. This session is
a discussion of how this was implemented and how
to use this new version of C3 and also presents
some results comparing the latest release with
prior versions of C3.
|
|
|
Title |
|
Full
Circle: Simulating Linux Clusters on Linux
Clusters |
Author(s) |
|
Jose Moreira, Luis Ceze, Karin
Strauss, George Almasi, Patrick J. Bohrer, Jose
R. Brunheroto, Calin Cascaval, Jose G. Castanos,
and Derek Lieber |
Author Inst |
|
IBM T.J. Watson Research Center |
Presenter |
|
Jose
Moreira |
Abstract |
|
BGLsim is a complete system simulator
for parallel machines allowing users to develop,
test and run the same code that will be used in
a real system. It is currently being used in hardware
validation and software development for the BlueGene/L
cellular architecture machine. BGLsim is capable
of functionally simulating multiple nodes of this
machine operating in parallel. It simulates instruction
execution in each node and the communication that
happens between nodes. To illustrate the capabilities
of BGLsim, experiments running the NAS Parallel
Benchmark IS on a simulated BlueGene/L machine
are described.
|
|
|
Title |
|
Memory
Performance of Dual-Processor Nodes: Comparison
of Intel Xeon and AMD Opteron Memory Subsystem
Architectures |
Author(s) |
|
Avijit Purkayastha, Chona S. Guiang,
Kent F. Milfeld, and John R. Boisseau |
Author Inst |
|
University of Texas--Austin |
Presenter |
|
Avijit Purkayastha |
Abstract |
|
There are several important features
in the AMD x86-64 microarchitecture (Opteron)
and the HyperTransport technology that are beneficial
to the HPC community. The Opteron processor has
an integrated memory controller, and hence a direct
connection to memory through two 64-bit wide
interfaces. More importantly, this means that each
processor in an SMP system has a "separate" interface
and memory modules. In addition, HyperTransport
technology has been built directly into the processors
and also into the chipsets, creating processor-to-processor
and processor-to-chipset interconnects
that are high-speed and have low latencies.
Systems that implement processors with on-chip
memory controllers and HyperTransport point-to-point
links for inter-processor communication can
support parallel applications that have large communication
and data sharing needs. Such systems provide an
ideal environment for both shared-memory (OpenMP)
and distributed-memory (MPI) paradigms.
The Opteron can also achieve excellent single-processor
performance. It is unencumbered by the latencies
and bottleneck of a north bridge, so memory-intensive
applications have the opportunity to deliver
full-bandwidth streams from memory to each
processor. The large L2 caches provide more room
for improving the performance of compute-intensive
applications. Also, the native 64-bit Opteron
microarchitecture supports large-memory applications,
as well as legacy 32-bit applications, concurrently.
In this paper we will explore the benefits of
the new x86-64 architecture through the performance
of some standard HPC code kernels and applications
that use multithreading (OpenMP) and multiprocessing
(MPI). Our analysis will focus on characteristics
of the memory subsystem and will examine two
key issues. We will conduct scaling studies of
compute- and memory-intensive applications on
dual-processor AMD Opteron and Intel Xeon
nodes to assess how well the memory subsystem
copes with the increased memory demands of the
second processor. We will also investigate how
OS memory affinity and process binding affect
memory bandwidths.
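(As a hedged illustration of the process binding under study, Linux-specific and not necessarily the authors' methodology, a benchmark process can be pinned to one CPU of a node before touching its data, so that first-touch page placement lands in a predictable processor's memory.)

```c
/* Sketch of OS process binding of the kind the scaling study examines.
 * Pinning a process to one CPU of a dual-processor node, combined with
 * first-touch page placement, affects measured memory bandwidth. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);               /* bind this process to CPU 0 */

    if (sched_setaffinity(0, sizeof mask, &mask) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("pid %d bound to CPU 0\n", (int)getpid());

    /* A bandwidth kernel (e.g. a STREAM-like triad) would run here;
     * on an Opteron node, pages touched after binding are placed in
     * the bound processor's local memory. */
    return 0;
}
```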
|
|
|
Title |
|
Scheduling
for Improved Write Performance in a Cost-Effective,
Fault-Tolerant Parallel Virtual File System
(CEFT-PVFS) |
Author(s) |
|
Yifeng Zhu, Hong Jiang, Xiao Qin,
Dan Feng, and David R. Swanson |
Author Inst |
|
University of Nebraska -- Lincoln |
Presenter |
|
Yifeng Zhu |
Abstract |
|
This session will demonstrate that
all the disks on the nodes of a cluster can be
connected together through CEFT-PVFS, a RAID-10-style
parallel file system for Linux, to
provide GBytes/sec parallel I/O performance
without any additional cost. To improve the overall
I/O performance, I/O requests can be scheduled
on the less loaded node in each mirroring pair, thus
making more informed scheduling decisions. Based
on the heuristic rules we found from the experimental
results, a scheduling algorithm for dynamic load-balancing
has been developed that significantly improves
the overall performance.
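(A minimal sketch of the dispatch decision, with illustrative node names and load values; the paper's actual heuristic rules are derived from experiments and are richer than this.)

```c
/* Sketch of the mirror-pair write scheduling idea (illustrative).
 * Each write is dispatched to whichever node of the mirroring pair
 * is currently less loaded; the mirror copy is synchronized after. */
#include <stdio.h>

struct node { const char *name; double load; };

/* Return the index (0 or 1) of the less loaded node in a pair. */
static int pick_primary(const struct node pair[2])
{
    return pair[0].load <= pair[1].load ? 0 : 1;
}

int main(void)
{
    struct node pair[2] = { {"ioserver-a", 0.82}, {"ioserver-b", 0.31} };
    int p = pick_primary(pair);
    printf("write goes to %s first; %s receives the mirror copy\n",
           pair[p].name, pair[1 - p].name);
    return 0;
}
```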
|
|
|
Title |
|
Achieving
Order through CHAOS: The LLNL HPC Cluster Experience |
Author(s) |
|
Robin Goldstone, Ryan Braby, and
Jim Garlick |
Author Inst |
|
Lawrence Livermore National Laboratory |
Presenter |
|
Robin Goldstone |
Abstract |
|
For the past several years, Lawrence
Livermore National Laboratory (LLNL) has invested
significant effort in the deployment of large High
Performance Computing (HPC) Linux clusters. After
deploying two modest sized clusters (88 nodes and
128 nodes) in early 2002, efforts progressed to
the deployment of the Multiprogrammatic Capability
Resource (MCR, 1154 nodes) in fall 2002 and ASCI
Linux Cluster (ALC, 962 nodes) in early 2003. Through
these efforts, LLNL has developed expertise in
a number of areas related to the design, deployment
and management of large Linux clusters. In this
session LLNL will present their experiences, including
challenges encountered and lessons learned.
|
|
|
Title |
|
Supercomputing
Center Management Using AIRS |
Author(s) |
|
Robert Ballance, Jared Galbraith,
and Roy Heimbach |
Author Inst |
|
University of New Mexico |
Presenter |
|
Robert
A. Ballance |
Abstract |
|
Running a large university supercomputing
center teaches many lessons, including the need
to centralize data collection and analysis, automate
system administration functions, and enable users
to manage their own projects. Albuquerque Integrated
Reporting System (AIRS), a centralized, web-enabled
application capable of user and project administration
across multiple clusters and reporting against
both active and historical data, evolved in response
to these pressures.
|
|
|
|
Title |
|
Running
BLAST on a Linux Cluster |
Presenter(s) |
|
Ray Hookway |
Presenter Inst |
|
Hewlett-Packard |
Abstract |
|
Everyone knows that BLAST is an
example of an embarrassingly parallel application,
i.e., an application that will run well on a cluster.
Conceptually, one breaks up a query against a database
into several queries against subsets of the database
and distributes the resulting jobs across the nodes
of the cluster. However, it is not obvious how
to go about doing this. The talk will begin with
a brief review of how BLAST works and then will
explore factors that affect the performance of
BLAST running on a single system. Final focus will
be on the answer to the question “How do you
run BLAST on a cluster?” |
|
|
Title |
|
Biobrew
Linux: A Linux Cluster Distribution for Bioinformatics |
Presenter(s) |
|
Glen Otero |
Presenter Inst |
|
Callident |
Abstract |
|
BioBrew Linux is the first known
attempt at creating and freely distributing an
easy-to-use clustering software package designed
for bioinformaticists. With support for both IA32
and IA64 platforms, BioBrew is a Linux distribution
that combines the NPACI Rocks cluster software
with several popular Open Source bioinformatics
software tools like BLAST, HMMER, ClustalW and
BioPerl. The result is a Linux distribution that
can be used to install a workstation or a Beowulf
cluster for bioinformatics analyses. |
|
|
Title |
|
Terascale
Linux Clusters: Supercomputing Solutions for
the Life Sciences |
Presenter(s) |
|
Bruce Ling and Padmanabhan Iyer |
Presenter Inst |
|
Tularik, Inc. and Linux NetworX
(respectively) |
Abstract |
|
At Tularik, a biotechnology company
specializing in drug discovery and development
using gene regulation, informatics has become essential
for the process of genomics-based drug discovery.
With the explosion of genomic data and lead
discovery screening data points, a powerful computing
environment becomes a must in order to boost R&D
productivity. By deploying a 150-processor cluster,
Tularik has successfully managed millions of data
points, coming from Assay-Development, High-Throughput-Screening
(HTS), Structure-Activity-Relationship (SAR), Lead-Optimization
and Micro-Array to speed its R&D productivity
and decision-making processes. |
|
|
Title |
|
Blade
Servers for Genomic Research |
Presenter(s) |
|
Ron Neyland |
Presenter Inst |
|
RLX Technologies |
Abstract |
|
Clusters based on industry standard
hardware and software have become the most widely
used tools for performing genomic processing and
analysis. While providing many benefits such as
outstanding price/performance, they also introduce
a new set of problems. This session will address
how blade servers provide a compute cluster platform
that delivers the compute power required for genomic
research, while minimizing many of the problems.
Real world examples of clusters running many of
the widely used genomic applications will be presented,
along with tips and tools for managing the cluster
environment. |
|
|
Title |
|
High
Performance Mathematical Libraries for Itanium
2 Clusters |
Presenter(s) |
|
Hsin-Ying Lin |
Presenter Inst |
|
Hewlett-Packard |
Abstract |
|
HP’s Mathematical LIBrary
(HP MLIB) provides a user-friendly interface using
standard definitions of public domain software
and enables users to access the power of high performance
computing. HP MLIB fully exploits the architecture
of the processor and achieves optimal performance
on Itanium 2. HP MLIB has been used by high performance
computing customers for over 15 years. This session
will provide a brief overview of relevant architectural
features and depict how these features have been
used to design high-level algorithms. The performance
of some of the key components in HP MLIB on Itanium
2 clusters will be discussed: i.e. matrix multiplication,
ScaLAPACK and SuperLU_DIST. |
|
|
Title |
|
Parallel
Computational Biology Tools and Applications
for Windows Clusters |
Presenter(s) |
|
Jaroslaw Pillardy |
Presenter Inst |
|
Cornell Theory Center |
Abstract |
|
Using massively parallel programs
for data analysis is the most popular way of dealing
with the enormous amounts of data produced in molecular
biology research. Several computational biology
tools for Microsoft Windows clusters of different
levels of complexity, available at the Computational
Biology Service Unit at the Cornell Theory Center,
will be discussed. All of the tools follow a master-worker
approach using MPI communications. The simplest
tools - tools that are very important to biologists
- are standard sequence-based data mining tools
such as BLAST and HMMER. More sophisticated is
the structure-based (threading) protein annotation
algorithm LOOPP. |
|
|
Title |
|
Building
Software for High Performance Informatics and
Chemistry |
Presenter(s) |
|
Joseph Landman |
Presenter Inst |
|
Scalable Informatics LLC |
Abstract |
|
Given the growth rate of life science
data sets, analysis applications designed for single
machines with shared memory and one or more CPUs
quickly lead to a performance bottleneck. Clusters
and Grids represent a potential solution to this
bottleneck but only when applications are properly
designed to make full use of the resources available.
In this session we will look at the hard realities
of building software for the informatics industry,
including: problems with running legacy software
on clusters, how to make efficient use of clusters,
for both the cluster and the user, and how to make life
science informatics and chemistry applications
scale well on clustered systems. |
|
|
Title |
|
To
Cluster or Not to Cluster |
Presenter |
|
Tom Scanlan |
Presenter Inst |
|
NEC Solutions America |
Abstract |
|
(Unavailable) |
|
|
| Automotive & Aerospace
Engineering |
|
Title |
|
Cluster
Computing in Space Applications |
Presenter(s) |
|
Eric George |
Presenter Inst |
|
The Aerospace Corporation |
Abstract |
|
This case study will examine how
The Aerospace Corporation utilizes cluster computing
for a variety of applications in support of high
priority national defense programs including the
Global Positioning System (GPS) and future missile
warning programs. Applications to date have focused
on astrodynamics, satellite constellation design,
communications network modeling, thermal analysis,
and complex scheduling/tasking algorithms. Processing
techniques range from Monte Carlo analysis & brute
force search operations to genetic algorithms.
Research is progressing on implementation of a
diverse grid-computing environment at Aerospace. |
|
|
Title |
|
Full
Vehicle Dynamic Analysis Using Automated Component
Modal Synthesis |
Presenter(s) |
|
Peter Schartz |
Presenter Inst |
|
MSC Software |
Abstract |
|
Today it is commonplace to attempt
to analyze the fully trimmed body of an automobile
for its vibration characteristics, over increasing
frequency ranges, and on inexpensive computer hardware.
The cost effectiveness of RISC-based cache processors,
combined with upward pressure in the form of large,
detailed models, has allowed new software methods
to utilize domain decomposition to enable high-level
parallelism. A domain decomposition, followed by
a component modal synthesis solution, is the basis
for Automated Component Modal Synthesis (ACMS)
in MSC.Nastran. The solution is described in theory,
and its effectiveness is demonstrated by an example
taken from today’s automotive industry. |
|
|
Title |
|
Using
Clusters to Deliver Turn-Key CFD Solutions |
Presenter(s) |
|
Greg Stuckert |
Presenter Inst |
|
Fluent |
Abstract |
|
While low cost, high performance
clusters have been in use since the early 1990’s,
the application of commercial off-the-shelf CFD
software, such as Fluent, to harness these shared
nothing architectures has only been viable near
the end of that decade. Early implementations required
persistent IT department willing to commit the
time and resources necessary to overcome these
challenges. Now, however, organizations are able
to access a full-featured implementation of Fluent
via the Internet in a pay as you go scenario. This
session will discuss the problems solved and gains
realized by a distributed implementation of Fluent
6.1. |
|
|
Title |
|
LS-DYNA:
CAE Simulation Software on Linux Clusters |
Presenter(s) |
|
Guangye Li |
Presenter Inst |
|
IBM |
Abstract |
|
LS-DYNA is used in a wide variety
of simulation applications: automotive crashworthiness and occupant
safety, sheet metal forming, military and defense
applications, aerospace industry applications,
and electronic component design. Several years ago,
one simulation of a very simplified finite element
model needed days to complete on a Symmetric Multiprocessing
(SMP) vector computer. With the introduction of
Distributed Multiprocessing technology, the MPP
(Massively Parallel Processors) version of LS-DYNA
can dramatically reduce the turnaround time for
the simulation and therefore reduce the time for
the automotive design process. We will present
the comparison of the scalability of the SMP and
MPP versions of LS-DYNA, as well as the comparison
of communication networks (Myrinet, Fast Ethernet,
Gigabit Ethernet) on Linux clusters. |
|
|
Title |
|
Linux
Clusters in the German Automotive Industry |
Presenter(s) |
|
Karsten Galer |
Presenter Inst |
|
science + computing AG |
Abstract |
|
After the first German CAE-Linux
computer cluster (LCC) was installed in 1999 at
DaimlerChrysler for electromagnetic compatibility
calculations (EMC), there has been great success
in the adoption of LCC. This includes clusters
based on 512 CPUs used for crash-calculations running
at a major automotive manufacturer. This talk will
provide an overview of ways in which Linux clusters
are changing the course of CAE in Germany. It will
also look at a number of different configurations
currently being implemented in some of the world’s
largest automotive manufacturers. |
|
|
Title |
|
Improving
Multi-site/Multi-departmental Cluster Systems
through Data Grids in the Automotive and Aerospace
Industries |
Presenter(s) |
|
Andrew Grimshaw |
Presenter Inst |
|
Avaki |
Abstract |
|
As the pressure increases to optimize
the product design and manufacturing processes,
it is critical for the automotive and aerospace
industries to give professionals secure access
to product and manufacturing information. Data
is often located at multiple R&D sites and
suppliers, and must be accessible regardless of location. Additionally,
product developers require more and more processing
power, delivered via clusters that are not effective
unless they can provide access to the data they
need. This session will examine the most significant
data challenges facing today’s automotive
and aerospace companies and how Grid technology
impacts the engineering and manufacturing process. |
|
|
Title |
|
Scrutinizing
CFD Performance on Multiple Linux Cluster Architectures |
Presenter(s) |
|
Thomas Hauser |
Presenter Inst |
|
Utah State University |
Abstract |
|
Linux cluster supercomputers are
a cost-effective platform for simulating fluid
flow in engineering applications. However, obtaining
high performance on these clusters is a non-trivial
problem, requiring tuning and design modifications
to the Computational Fluid Dynamics (CFD) codes.
Investigations in optimizing CFD codes on Linux
cluster platforms will be presented. Detailed performance
results of two CFD codes on a wide range of cluster
architectures, including the Intel Pentium, AMD Athlon, Intel
Itanium, and AMD Opteron, will be analyzed.
The single and multi-processor performance of these
codes on different cluster architectures will be
compared and means of improving performance discussed. |
|
|
Title |
|
Managing
CAE Simulation Workload in Cluster Environments |
Presenter(s) |
|
Michael M. Humphrey |
Presenter Inst |
|
Altair |
Abstract |
|
Automotive manufacturers are beginning
to capitalize on workload management software to
get the most out of their numerically intensive computing
environments. Workload management software is middleware
technology that sits between your compute-intensive
applications - such as ABAQUS, ANSYS, FLUENT, LS-DYNA,
NASTRAN and OPTISTRUCT - and your network hardware and
operating systems. The software schedules and distributes
all types of application runs (serial, parallel,
distributed memory, parameter studies, big memory,
long running, etc.), on all types of hardware (desktops,
clusters, supercomputers and even across sites).
This presentation will describe the current capabilities
of PBS Pro workload management software as a middleware
enabler for robust system design. |
|
|
| Digital
Content Creation / Scientific Visualization
/ Simulation |
|
Title |
|
The
Current State of Numerical Weather Prediction
on Cluster Technology -- What is Needed to
Break the 25% Efficiency Barrier? |
Presenter(s) |
|
Dan Weber |
Presenter Inst |
|
Center for the Analysis and Prediction
of Storms |
Abstract |
|
This session will look in depth
at the current state of weather prediction and
the many challenges it faces. The talk will examine
the computational needs (teraflops) of a robust
numerical weather prediction (NWP) system at thunderstorm
scale and review NWP performance on current computer
technology. Current models will be
reviewed, as well as the roadblocks associated
with clusters. Finally, a proposal for a complete
shift in the way systems of equations are solved
on scalar technology in order to break the 25%
efficiency ceiling will be examined. |
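(Here "efficiency" is the usual sustained-to-peak ratio,
\[ E = \frac{F_{\text{sustained}}}{F_{\text{peak}}}, \]
with current NWP codes on commodity clusters sitting at or below \(E \approx 0.25\).)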
|
|
Title |
|
Building
and Using Tiled Display Walls |
Presenter(s) |
|
Paul Rajlich |
Presenter Inst |
|
National Center for Supercomputing
Applications (NCSA) |
Abstract |
|
Tiled display walls provide a large-format
environment for presenting very high-resolution
visualizations by tiling together the output from
a collection of projectors. Projectors are driven
by a Linux cluster augmented with high-performance
graphics accelerator cards and costs are controlled
by using commodity projectors and low-cost PCs.
Tiled walls must face a number of challenges, such
as aligning the projectors so that the output
of adjacent tiles aligns to create a seamless image.
This session will discuss the Alliance Display
Wall-in-a-Box effort: a distribution of related
Open Source software packages that reduce the setup
and maintenance of complex high-end display systems. |
|
|
Title |
|
Discovery
and Analysis of Communication Patterns in Complex
Network-based Systems Using Virtual Environments |
Presenter(s) |
|
Tom Caudell |
Presenter Inst |
|
University of New Mexico |
Abstract |
|
The real-time visualization of
cluster networks provides a number of benefits
to administrators and developers in search of performance
bottlenecks. Real-time visuals provide early warning
of real problems in network traffic as well as
provide clear indication of potential problems
before they occur. However, real-time network visualization
is a remarkably difficult project. This session
will discuss a number of the technical hurdles
involved in building a visualization system that
will scale with increased performance. Using network
visualization, organizations can design applications
that take better advantage of network traffic,
avoiding bottlenecks, and administrators can make
informed decisions on scheduling that lead a cluster
toward optimal performance. |
|
|
Title |
|
HPC
and HA Clustering for Online Gaming |
Presenter(s) |
|
Jesper Jensen |
Presenter Inst |
|
SCI |
Abstract |
|
SCI, the company that developed
and supports the backend for the Department of
Defense's America's Army game, will deliver a case
study on deploying gaming clusters for the DoD — and
other game titles — and give an overview
of where large-scale game technology is and where
it is going. With technology capable of pushing
an average of 1.35 teraflops per cabinet space
and leveraging multiple transit carriers, SCI clusters
deliver both the HPC and HA required to support
a massive gaming audience. This discussion will
touch on solutions for 32 bit and next-generation
64 bit architectures both in place and under development. |
|
|
Title |
|
Large
Scale Scientific Visualization on PC Clusters |
Presenter(s) |
|
Brian Wylie |
Presenter Inst |
|
Sandia National Labs |
Abstract |
|
This session covers the use of
PC clusters with commodity graphics cards as high-performance
scientific visualization platforms. A cluster of
PC nodes, in which many or all of the nodes have
3D hardware accelerators, is an attractive approach
to building a scalable graphics system. The main
challenge in using cluster-based graphics systems
is the difficulty of realizing the full aggregate
performance of all the individual graphics accelerators.
Topics covered will include parallel geometric
rendering, parallel volume rendering, data distribution
approaches and novel techniques for utilizing graphics
processors. |
|
|
Title |
|
The
Use of Clusters for Engineering Simulation |
Presenter(s) |
|
Lynn Lewis |
Presenter Inst |
|
Hewlett-Packard |
Abstract |
|
Clusters allow the use of advanced
mathematical techniques for optimization, changing
the way engineers arrive at cost effective, safe
designs. Without inexpensive clusters, engineers
at automotive manufacturers could not do 1000's
of crash test simulations integrated with the initial
design stage nor test for structural integrity
much less manufacturability within weeks. This
session will examine in detail how, over the previous
decade, Unix and lately Linux clusters have found
use in commercial crash and fluid dynamics simulations,
changing the way cars and aircraft are designed
and built. |
|
|
Title |
|
NEESgrid:
Virtual Collaboratory for Earthquake Engineering
and Simulation |
Presenter(s) |
|
Tom Prudhomme |
Presenter Inst |
|
National Center for Supercomputing
Applications (NCSA) |
Abstract |
|
NEESgrid will link earthquake engineering
researchers across the U.S. with leading-edge computing
resources and research and testing facilities,
allowing teams to plan, perform, and publish their
research. Via both Telepresence and other collaboration
technologies, research teams are able to work remotely
on experimental trials and simulations. This session
will examine how NEESgrid, through the shared resources
of Grid technology, will bring together information
technologists and engineers in a way that will
revolutionize earthquake engineering, research
and simulation. |
|
|
Title |
|
The
Architecture of an Audio Identification
Cluster |
Presenter(s) |
|
Daniel Culbert |
Presenter Inst |
|
Shazam Entertainment, Inc. |
Abstract |
|
(Unavailable) |
|
|
|
Title |
|
Building
the TeraGrid: The World's Largest Grid, Fastest
Linux Cluster, and Fastest Optical Network
Dedicated to Open Science |
Presenter(s) |
|
Pete Beckman |
Presenter Inst |
|
Argonne National Laboratory |
Abstract |
|
The TeraGrid is one of the most
ambitious collaborative grid projects ever undertaken.
The building blocks for the $88 million National
Science Foundation funded project include mammoth
computational resources, ultra-fast fiberoptic
networks linking NCSA, SDSC, Caltech, Argonne, and
PSC, and a software “grid hosting environment.” Together,
they will form an environment that makes developing
cluster-based, grid-enabled scientific applications
easy. This presentation will provide an overview
of the project, the bleeding edge technologies
used to bring clusters and grids to the scientific
community and an update on current status and results. |
|
|
Title |
|
Building
Blocks for 64-bit AMD Opteron Clusters |
Presenter(s) |
|
Richard Brunner |
Presenter Inst |
|
AMD |
Abstract |
|
This presentation describes the
hardware and software building blocks that are
in place to construct 64-bit AMD Opteron(TM) based
Clusters. We begin with an overview of the newly
released AMD Opteron(TM) processor and its system
architecture that allows affordable 64-bit clustered
computing while maintaining 32-bit performance
and compatibility. Special attention will be given
to the "glueless" multiprocessing capability
provided by fast HyperTransport(tm) Technology
interconnects and per-processor integrated memory
controllers. We will next describe how 64-bit SuSE
Linux Enterprise Server for AMD x86-64 exploits
this hardware topology and discuss the accompanying
thread and explicit parallelism tools and compilers
that are available. We will end the presentation
with a survey of the available third-party cluster
adapters and interconnects that are supported on
AMD Opteron(TM) platforms. |
|
|
Title |
|
Tools
for Optimizing HPC Applications on Intel Clusters |
Presenter(s) |
|
Don Gunning |
Presenter Inst |
|
Intel |
Abstract |
|
The Intel software research lab
is involved in several projects related to the
development and deployment of HPC software on Intel
based clusters. This discussion will focus on the
work Intel is doing in parallel/concurrent computing
within a single job or task; the development, debugging
and tuning of multithreaded applications; deploying
MPI (and mixed MPI/threaded) applications; and
extending OpenMP to execute across clusters.
This discussion will also touch on ideas for maximum
messaging performance on the interconnect while
maximizing application performance on the node. |
|
|
Title |
|
The
Ultra Scalable HPTC Lustre Filesystem |
Presenter(s) |
|
Kent Koeninger |
Presenter Inst |
|
Hewlett-Packard |
Abstract |
|
The Lustre filesystem is designed
to provide a coherent-scalable shared filesystem
that can serve thousands of Linux client nodes,
delivering extremely high-bandwidth parallel-filesystem
access to many terabytes of storage. This talk
will describe how the Lustre filesystem will be
used in scalable HPTC Linux systems to combine
the flexibility, scalability and manageability
of NAS systems with the performance of SAN systems.
The Lustre development effort is an open source
project with initial release target in 2003. |
|
|
Title |
|
Building
the World's Most Powerful Cluster: 11.2 Tflops
at Lawrence Livermore National Laboratory |
Presenter(s) |
|
Kim Clark |
Presenter Inst |
|
Linux NetworX |
Abstract |
|
In 2002, Linux NetworX built the
MCR cluster housed at Lawrence Livermore National
Laboratory. It is currently the largest cluster
in the world with a theoretical peak of 11.2 Tflops
and, with more than 1,000 nodes to manage and monitor,
ranks as the fifth largest supercomputer in the
world. The unique challenges involved in building
and configuring such a massive system and what
was learned from this experience will be discussed.
Attendees will learn how to apply aspects of the
LLNL system to their own smaller system to enhance
cluster performance and reliability. |
|
|
Title |
|
Driving
Cluster/Grid Technologies in HPC |
Presenter(s) |
|
David Barkai |
Presenter Inst |
|
Intel |
Abstract |
|
High performance computing has
undergone a metamorphosis in the last 15-20 years.
The changes, and what they mean to the industry
and the user community, will be reviewed. The cluster
approach to HPC drives the evolution of a new ecosystem.
In this talk we will describe the building blocks
as a set of components built upon enabling technologies.
The application characteristics determine the choices
made for system software, middleware, interconnect,
cluster topology, the nodes, and the processor.
The resulting architecture and the nature of the
workload and computing environment dictate the
management tools that are needed. We will summarize
the considerations for the choices that need to
be made while highlighting the gaps and the challenges,
as cluster computing ramps up and grid computing
continues to develop. |
|
|
Title |
|
Emerging
Trends in Data Center Powering and Cooling |
Presenter(s) |
|
Wahid Nawabi |
Presenter Inst |
|
APC |
Abstract |
|
Traditional data center architecture
approaches force enterprises to build out to full
capacity from day one, yet one hundred percent
utilization of the designed capacity is seldom
reached. This results in long deployment schedules,
millions of dollars of unrecoverable up-front capital
investments and the maintenance of expensive service
contracts on under-utilized infrastructure. APC’s
PowerStruXure offers an on-demand solution that
accelerates speed of deployment and allows you
to invest in a data center solution that is sufficient
to meet today’s demands, rather than an uncertain
estimate of future capacity. |
|
|
Title |
|
The
Virtual Environment and Its Impact on IT Infrastructure |
Presenter(s) |
|
Daniel Kusnetzky |
Presenter Inst |
|
IDC |
Abstract |
|
IDC has been examining the evolution
of the virtual environment for quite a number of
years. This session will examine IDC’s definition
of the virtual environment, its roots in techniques
developed in the late 1970s, and how Windows, Unix
and Linux can be deployed as platforms in the virtual
environment. Dan Kusnetzky, IDC’s Vice President
of System Software, will present the drivers for
virtual environment software adoption and project
how the virtual environment will impact the overall
IT infrastructure in the coming years. |
|
|
| Petroleum
/ Geophysical Exploration |
|
Title |
|
Exploring
the Earth's Subsurface with Itanium 2 Linux
Clusters |
Presenter(s) |
|
Keith Gray |
Presenter Inst |
|
British Petroleum |
Abstract |
|
This case study describes how Itanium 2
processor architecture and Linux cluster technology,
applied to a seismic imaging and migration project,
allowed British Petroleum to reduce its cost
for this high-end infrastructure by one-half while
increasing performance by 3X, and in some cases
exceeding this expectation at 5X. The environment
includes 1024 processors (256 4-way HP rx5670
servers) with 8.2 Terabytes of memory (32 GB per
server) and operates at over 4 Teraflops
peak performance. |
|
|
Title |
|
Scalability
Considerations for Compute-Intensive Applications
on Clusters |
Presenter(s) |
|
Christian Tanasescu |
Presenter Inst |
|
SGI |
Abstract |
|
This session investigates the scalability,
architectural requirements and performance characteristics
of some of the most widely used compute intensive
applications in the scientific and engineering
communities. Seismic Processing and Reservoir Simulation
(SPR) applications generally consume data read
from memory and must continuously load new data.
As a result, to keep the floating point (FP) units
busy, these applications require computer architectures
with high memory bandwidth, mainly due to the data
addressing patterns and heavy I/O activities. We
will also introduce BandeLa, to study the influence
of the communication bandwidth and latency for
MPI applications. |
|
|
Title |
|
Parallel
Reservoir Simulation on Intel Xeon HPC Clusters |
Author(s) |
|
Baris Guler, Tau Leng, Victor Mashayekhi,
and Kamy Sepehrnoori |
Author Inst |
|
Dell and University of Texas --
Austin |
Presenter |
|
Kamy Sepehrnoori and Reza Rooholamini |
Abstract |
|
Numerical simulation of reservoirs
is an integral part of geo-scientific studies,
with the goal of optimizing petroleum recovery.
In this session, we conduct a series of benchmarks
by running a parallel reservoir simulation code
on an Intel Xeon Linux cluster and study the scalability
while using different interconnects for the cluster.
Our results show that the simulator’s performance
scales linearly from one to 64 single-processor
nodes, when using a low-latency, high-bandwidth
interconnect. In addition to benchmarking, we describe
a process-to-processor mapping approach for dual-processor
clusters to improve communication performance as
well as overall performance of the simulator. |
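(The abstract does not spell out the mapping; as a generic illustration of process-to-processor mapping on dual-processor nodes, and not necessarily the authors' scheme, a "block" mapping places consecutive ranks on the same node so that heavily communicating neighbor ranks can use fast shared memory.)

```c
/* Generic illustration of process-to-processor mapping on a cluster
 * of dual-processor nodes. "Block" mapping puts consecutive ranks on
 * one node, so the ranks that exchange the most data can communicate
 * through shared memory rather than the interconnect. */
#include <stdio.h>

#define CPUS_PER_NODE 2

static void block_map(int rank, int *node, int *cpu)
{
    *node = rank / CPUS_PER_NODE;   /* ranks 0,1 -> node 0; 2,3 -> node 1 */
    *cpu  = rank % CPUS_PER_NODE;
}

int main(void)
{
    for (int rank = 0; rank < 6; rank++) {
        int node, cpu;
        block_map(rank, &node, &cpu);
        printf("rank %d -> node %d, cpu %d\n", rank, node, cpu);
    }
    return 0;
}
```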
|
|
Title |
|
Geoscience
Visualization and Seismic Processing Clusters:
Collaboration and Integration Issues |
Presenter(s) |
|
Phil Neri |
Presenter Inst |
|
Paradigm |
Abstract |
|
The active development of Linux
visualization clusters has led to the notion of
closely associating compute-intensive seismic processing
and geosciences visualization, notably for the
purpose of building and verifying velocity and
solid models. The options are to implement cross-system
data integration, or to share a common hardware
resource. Practical implementations of the integration
model will be presented, based on Paradigm’s
experience with existing production systems. The
use of a CORBA-based distributed data architecture
will also be discussed. The common hardware concept,
still in the design phase, will be analyzed for
its expected benefits, economics and potential
problems. |
|
|
Title |
|
Cluster
Computing at CGG |
Presenter(s) |
|
L. Clerc |
Presenter Inst |
|
CGG |
Abstract |
|
(Unavailable) |
|
|
Title |
|
Grid
Computing in the Energy Industry |
Presenter(s) |
|
Jamie Bernardin |
Presenter Inst |
|
DataSynapse |
Abstract |
|
Grid computing has attracted significant
attention in the current IT environment. What are
the business and technical factors driving companies
to adopt Grid? In this presentation on Grid computing
in Oil & Gas, we will examine frequently encountered
obstacles to deploying a grid computing solution,
compare the vision of Grid to the realities of
today, identify target deployments for distributed
computing solutions in the Oil & Gas sector,
and describe the value impact of grid computing.
DataSynapse will share case studies from its existing
engagements as well as identify specific technical
requirements unique to the energy market. |
|
|
Title |
|
Drilling
in the Digital Oil Field: High Pay-offs from
Linux Clusters |
Presenter(s) |
|
Shawn Fuller |
Presenter Inst |
|
Hewlett-Packard |
Abstract |
|
The Oil & Gas industry is required
to manage mammoth volumes of complex data for both
engineering and scientific requirements in their
search for discovering new reservoirs and more
cost efficient production methods. Globally deployable
high-performance computing systems coupled with
best-in-class applications are the keys to success
for Oil & Gas companies to excel in their business.
This session will cover the areas of technology
receiving the most focus: mobility, desktop visualization,
scalable and immersive visualization, global collaboration,
scalable clustered systems, network storage systems,
imaging and printing - covering the full gamut
of Oil & Gas IT requirements. |
|
|