Plenary
Presentations
|
Plenary I
|
Title |
An Overview of High-Performance Computing and Challenges for the Future |
Author(s) |
Jack Dongarra |
Author Inst |
University of Tennessee, USA |
Presenter |
Jack Dongarra |
Abstract |
In this talk we will examine how high-performance
computing has changed over the past ten years and look toward
the future in terms of trends. A new generation of software
libraries and algorithms is needed for the effective and reliable
use of (wide-area) dynamic, distributed, and parallel environments.
Some of the software and algorithm challenges have already
been encountered, such as the management of communication and memory
hierarchies through a combination of compile-time and run-time
techniques. However, the increased scale of computation, multicore
processors, deeper memory hierarchies, a wider range of latencies, and increased
run-time environment variability will make these problems much
harder. |
|
|
Plenary II
|
Title |
High-End Operating Systems: Status and Future |
Author(s) |
Frederick C. Johnson |
Author Inst |
DOE, Office of Science, USA |
Presenter |
Frederick C. Johnson |
Abstract |
With the exception of microkernels used on
the compute nodes of some systems, all operating system (OS)
software for high-end systems is based on Unix or Linux. These
OSes share some common attributes: the basic design is old
(~35 years for Unix), they are littered with system processes
unneeded for high-end computing, and neither scalability nor
performance was designed in from the start. Microkernels, in
conjunction with user-space messaging, offer performance and
scalability but do not support a full suite of user services.
The success of Unix and Linux has had a detrimental effect
on OS research for high-end systems, and very little research
has been undertaken for several years. This talk will provide
an overview of current high-end OS approaches, discuss some
of the barriers that must be overcome for terascale and petascale
OSes, and describe some new research activities that have been
initiated as part of the Office of Science Next Generation
Architecture activity. |
|
|
Plenary III
|
Title |
Cluster Computing in Everyday Biomedical Research: Past, Present, and Future |
Author(s) |
Klaus Schulten |
Author Inst |
University of Illinois at Urbana-Champaign, USA |
Presenter |
Klaus Schulten |
Abstract |
This lecture will review 12 years of successful
cluster computing in biomedical research, which has permitted
large-scale simulations of biomolecular machinery in living
cells. We started in 1993 with a few high-performance workstations
connected through an optical switch, which permitted 30,000-atom
simulations; in 1998 we integrated commodity desktops and switches
at a 32-processor scale, which permitted 100,000-atom simulations.
In 2003, using rack-server clusters integrating from 50 processors
(in the work group) to thousands (at the national centers, permitting
multi-million-atom simulations), biomedical and computer scientists
continuously adapted their software (NAMD, honored with a Gordon
Bell award) and its underlying algorithms to the available technical
solutions. In 2005, the team of researchers
and developers prepared NAMD for automated grid computing,
making the grid available from the desktop as a unified resource.
Today, the team prepares NAMD for use on the next generation
of clusters with tens of thousands of processors and specialized
computational hardware such as FPGAs and GPUs.
This lecture will illustrate, in particular, how technical
development advanced biomedical research in quantum leaps
and what the advances actually mean from a biomedical perspective. |
|
|
Applications
Track Abstracts:
|
Applications
Papers I: Performance
|
Title |
A Comparison of Single-Core and Dual-Core Opteron Processor Performance for HPC |
Author(s) |
Douglas M. Pase and Matthew A. Eckl |
Author Inst. |
IBM |
Presenter |
Douglas M. Pase |
Abstract |
Dual-core AMD Opteron™ processors represent
the latest significant development in microprocessor technology.
In this paper, using the IBM® eServer™ 326,
we examine the performance of dual-core Opteron processors.
We measure the core performance under ideal workloads using
the Linpack HPL benchmark and show it to be 60% faster than
the fastest single-core Opteron. We measure unloaded memory
latency and show that the dual-core processor’s slower
clock frequency makes the latency longer. However, we also
show that its memory throughput is 10% greater than that of
single-core processors. Finally, we show that dual-core Opteron processors
offer a significant performance advantage even on realistic
applications, such as those represented by the SPEC® CPU2000
benchmark suite. |
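For readers who want to see what an "unloaded memory latency" measurement typically looks like, the sketch below uses the common pointer-chasing approach in C. It is an illustrative stand-in rather than the benchmark used in the paper, and the array size and trip count are arbitrary choices.

/*
 * Minimal pointer-chasing sketch for estimating unloaded memory latency.
 * Illustrative only -- not the authors' benchmark. Array size and trip
 * count are arbitrary; the single-cycle permutation defeats prefetching.
 */
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)          /* 16M pointers, far larger than any cache */
#define TRIPS (1 << 26)      /* number of dependent loads to time       */

int main(void)
{
    size_t *chain = malloc(N * sizeof *chain);
    size_t i, j, tmp, idx;
    if (!chain) return 1;

    /* Sattolo's algorithm: a random permutation with one cycle covering
       all elements, so every load depends on the previous one. */
    for (i = 0; i < N; i++) chain[i] = i;
    for (i = N - 1; i > 0; i--) {
        j = (size_t)rand() % i;      /* a stronger RNG may be preferable */
        tmp = chain[i]; chain[i] = chain[j]; chain[j] = tmp;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    idx = 0;
    for (i = 0; i < TRIPS; i++)
        idx = chain[idx];            /* serialized, dependent loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
    /* Print idx so the compiler cannot optimize the loop away. */
    printf("avg latency: %.1f ns (checksum %zu)\n", 1e9 * sec / TRIPS, idx);
    free(chain);
    return 0;
}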
|
|
Title |
Performance
Analysis of AERMOD on Commodity Platforms |
Author(s) |
George Delic |
Author Inst. |
HiPERiSM Consulting |
Presenter |
George Delic |
Abstract |
This report examines the performance of two versions of the AERMOD
Air Quality Model with two data sets and three compilers on Intel
and AMD processors, using the PAPI performance event library to
collect hardware performance counter values where possible. The
intent is to identify performance metrics that indicate where
performance-inhibiting factors occur when the codes execute on
commodity hardware. Results for operations, instructions, cycles,
cache misses, translation look-aside buffer (TLB) misses, and
branching instructions are discussed in detail. An execution
profile and source code analysis uncover the causes of
performance-inhibiting behavior and show how they lead to bottlenecks
on commodity hardware. Based on this analysis, source code modifications
are proposed with a view to potential performance enhancement. |
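For readers who have not used PAPI, the fragment below sketches the usual low-level pattern for reading hardware counters around a region of code. It is a generic illustration, not the instrumentation used in this study, and the chosen events (total instructions, cycles, L2 data-cache misses) may be unavailable on a given processor.

/*
 * Generic PAPI low-level API sketch: count total instructions, cycles,
 * and L2 data-cache misses around a code region. Event availability
 * depends on the processor; this is not the paper's instrumentation.
 */
#include <stdio.h>
#include <stdlib.h>
#include <papi.h>

static void die(const char *msg) { fprintf(stderr, "%s\n", msg); exit(1); }

int main(void)
{
    int eventset = PAPI_NULL;
    long long counts[3];

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
        die("PAPI init failed");
    if (PAPI_create_eventset(&eventset) != PAPI_OK)
        die("eventset creation failed");
    /* Add events one by one; any of these may be unsupported on a CPU. */
    if (PAPI_add_event(eventset, PAPI_TOT_INS) != PAPI_OK ||
        PAPI_add_event(eventset, PAPI_TOT_CYC) != PAPI_OK ||
        PAPI_add_event(eventset, PAPI_L2_DCM) != PAPI_OK)
        die("could not add events");

    PAPI_start(eventset);

    /* ... region of interest, here a stand-in compute loop ... */
    volatile double s = 0.0;
    for (long i = 0; i < 10000000; i++) s += (double)i * 1e-9;

    PAPI_stop(eventset, counts);

    printf("instructions: %lld\ncycles: %lld\nL2 data misses: %lld\n",
           counts[0], counts[1], counts[2]);
    printf("IPC: %.2f\n", (double)counts[0] / (double)counts[1]);
    return 0;
}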
|
|
Title |
Performance
Lessons from the Cray XT3 |
Author(s) |
Jeff Larkin |
Author Inst. |
Cray Inc. |
Presenter |
Jeff Larkin |
Abstract |
Whether designing a cluster or a massively parallel processor
machine, many of the most important design decisions must be
made early in the process and are difficult to change.
In this talk I will discuss some of the design choices made
by Cray when designing the XT3, Cray's third generation of
massively parallel processor machines. I will specifically
discuss what choices we made for processor, network, and operating
system and what we have learned from these choices. Finally,
I will show real-world performance results from the Cray XT3. |
|
Applications
Papers II:
|
Title |
Experiences in Optimizing
a Numerical Weather Prediction Model: An Exercise in Futility? |
Author(s) |
Dan Weber and Henry Neeman |
Author Inst |
University of Oklahoma |
Presenter |
Dan Weber |
Abstract |
This paper describes the basic optimization methods, including
cache-memory optimization and message hiding, applied to a
grid-point CFD numerical weather prediction model to reduce
the wall time for time-critical weather forecasting
and compute-intensive research. The problem is put into perspective
via a brief history of the type of computing hardware used
for this application during the past 15 years and the trend
in computing hardware that has brought us to the current state
of code efficiency. The detailed performance analysis of the
code identifies the most likely parts of the numerical solver
for optimization and the most promising methods for improving
single- and parallel-processor performance. Results from each
performance-enhancing technique are presented and help define the
envelope of potential performance improvement using standard
commodity-based cluster technology. |
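For readers unfamiliar with the "message hiding" technique mentioned above, the sketch below shows the generic non-blocking MPI pattern of overlapping a halo exchange with interior computation. It is a one-dimensional toy example with made-up sizes, not the authors' code.

/*
 * Generic "message hiding" sketch: post non-blocking halo exchanges,
 * update interior points while the messages are in flight, then wait
 * and finish the boundary. A 1-D periodic decomposition with made-up
 * names and sizes; this is not the authors' code.
 */
#include <mpi.h>
#include <stdio.h>

#define NX 512   /* local points per rank, arbitrary */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int lo = (rank - 1 + size) % size;   /* periodic neighbors */
    int hi = (rank + 1) % size;

    double u[NX], unew[NX], halo_lo, halo_hi;
    for (int i = 0; i < NX; i++) u[i] = rank + 0.001 * i;

    MPI_Request req[4];

    /* 1. Post receives and sends for the two boundary values. */
    MPI_Irecv(&halo_lo, 1, MPI_DOUBLE, lo, 0, MPI_COMM_WORLD, &req[0]);
    MPI_Irecv(&halo_hi, 1, MPI_DOUBLE, hi, 1, MPI_COMM_WORLD, &req[1]);
    MPI_Isend(&u[0],      1, MPI_DOUBLE, lo, 1, MPI_COMM_WORLD, &req[2]);
    MPI_Isend(&u[NX - 1], 1, MPI_DOUBLE, hi, 0, MPI_COMM_WORLD, &req[3]);

    /* 2. Overlap: smooth interior points that need no remote data. */
    for (int i = 1; i < NX - 1; i++)
        unew[i] = 0.5 * (u[i - 1] + u[i + 1]);

    /* 3. Complete the exchange, then finish the two boundary points. */
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
    unew[0]      = 0.5 * (halo_lo + u[1]);
    unew[NX - 1] = 0.5 * (u[NX - 2] + halo_hi);

    if (rank == 0) printf("step done, unew[0] = %f\n", unew[0]);
    MPI_Finalize();
    return 0;
}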
|
|
Title |
Benchmark Analysis
of 64-bit Servers for Linux Clusters for Application in Molecular
Modeling and Atomistic Simulations |
Author |
Stefano Cozzini (1), Roger Rousseau (2), Axel Kohlmeyer (3) |
Author Inst |
(1) CNR-INFM Democritos National Simulation Center, (2) SISSA, (3) University of Pennsylvania |
Presenter |
TBD |
Abstract |
We present a detailed comparison of the performance of synthetic
test programs such as DGEMM and STREAM, as well as of the typical
atomistic simulation codes DIPROTEIN and CPMD, which are extensively
used in computational physics, chemistry, and biology, on a large
number of high-performance computing platforms. With an eye toward
maximizing the price/performance ratio for applications in atomistic
simulations, we examine a wide class of commonly used 64-bit machines
and discuss our results in terms of various aspects of the machine
architecture, such as CPU speed, SMP memory performance, and network.
We find that although the Intel EM64T machines show superior performance
for applications that extensively exploit the MKL library, the
Opteron-based machines show superior performance with less
optimized codes. Moreover, for large memory applications such
as electronic structure codes, the SMP performance of the Opteron
is superior. An overview of which architecture is suitable
for which applications and a comparison of AMD dual-core CPU
technology to Intel hyper-threading are also discussed. |
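As context for the synthetic tests, a DGEMM throughput measurement is typically no more than the following sketch; the CBLAS interface, matrix size, and timing approach shown here are arbitrary choices and not necessarily what the authors used.

/*
 * Minimal DGEMM throughput sketch via the CBLAS interface. The matrix
 * size is arbitrary; link against any BLAS (e.g. MKL, ACML, OpenBLAS).
 * Illustrative only -- not the authors' test program.
 */
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <cblas.h>

#define N 2000

int main(void)
{
    double *A = malloc((size_t)N * N * sizeof *A);
    double *B = malloc((size_t)N * N * sizeof *B);
    double *C = malloc((size_t)N * N * sizeof *C);
    if (!A || !B || !C) return 1;
    for (size_t i = 0; i < (size_t)N * N; i++) {
        A[i] = 1.0 / (double)(i + 1);
        B[i] = 2.0 / (double)(i + 1);
        C[i] = 0.0;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    /* C = 1.0 * A * B + 0.0 * C, row-major, no transposes */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                N, N, N, 1.0, A, N, B, N, 0.0, C, N);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
    double flops = 2.0 * (double)N * N * N;   /* multiply-adds in DGEMM */
    printf("DGEMM %dx%d: %.2f s, %.2f GFLOP/s (C[0]=%g)\n",
           N, N, sec, flops / sec / 1e9, C[0]);
    free(A); free(B); free(C);
    return 0;
}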
|
|
Applications Papers III:
Tools
|
Title |
Intel Cluster
Tools |
Author(s) |
Ullrich Becker-Lemgau |
Author Inst |
Intel |
Presenter |
Ullrich Becker-Lemgau |
Abstract |
Parallel systems such as clusters and shared-memory machines
require optimized, scalable parallel applications to benefit
from the computing power these systems provide. But the standard
for distributed parallel programming, the Message Passing Interface
(MPI), does not by itself provide features or paradigms for achieving
high parallel efficiency and scalability at large process counts.
Tools are required to help software
developers understand the communication pattern of their applications
and to indicate where optimization is needed.
The Intel® Trace Analyzer & Collector 6.0 is a tool
to get detailed information on the parallelism of distributed
applications. The application is linked with the Intel® Trace
Collector Library, which writes a trace file during execution.
All MPI calls and events are recorded in this trace file.
Information about user code can also be recorded if the code
is instrumented with the Intel® Trace Collector API.
After the application execution stops, Intel® Trace Analyzer
is used to visualize all the information included in the
trace file. The Intel® Trace Analyzer offers several chart
types, such as event timelines that show in detail what each
process is executing at a given time and the messages exchanged
between processes, and it gives direct access to
the source code of the originating MPI calls. Several other
charts are available showing statistical information in a
timeline or matrix fashion. All information can be aggregated,
tagged or filtered in time and processes dimensions to enable
effective analysis of parallel applications with thousands
of processes. The new release 6.0 of the Intel® Trace
Analyzer & Collector features a redesigned and faster
GUI with better scaling over time and processes, bigger trace
file capabilities, and choice of execution on Linux or Windows® XP.
The Intel Trace Analyzer & Collector tool set is included
in the Intel Cluster Toolkit which combines a full set of
cluster tools provided by Intel. Besides performance tools,
programming libraries such as the Intel MPI Library are part
of the Intel Cluster Toolkit. The Intel MPI Library features
many MPI-2 functions and multi-fabric support. For an application
built with the Intel MPI Library, the user can choose the
network fabric when launching the application. The Intel
Cluster Toolkit is a one-stop solution for developing and
optimizing Linux Cluster applications and systems. |
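As a concrete picture of what lands on those event timelines, consider the tiny MPI program below: once the binary is relinked against a trace collector library, every MPI call in it is recorded as an event. This is a generic illustration rather than Intel sample code, and the MPI_Pcontrol calls are the standard MPI profiling hook that many tracers use to pause and resume recording; whether a particular collector honors them should be checked in its documentation.

/*
 * Tiny MPI program of the kind a PMPI-based trace collector records:
 * every MPI call below becomes an event on the timeline once the binary
 * is relinked against the collector library. Generic illustration only.
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* MPI_Pcontrol is the standard hook many tracers use to pause/resume
       recording around uninteresting phases (support is tool-specific). */
    MPI_Pcontrol(0);
    double local = 0.0;
    for (int i = 0; i < 1000000; i++) local += 1e-6;   /* "compute" phase */
    MPI_Pcontrol(1);

    /* Communication phase: these calls show up as messages and a
       collective on a Trace Analyzer-style timeline. */
    double sum = 0.0;
    MPI_Reduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Barrier(MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks: %d, reduced value: %f\n", size, sum);

    MPI_Finalize();
    return 0;
}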
|
|
Title |
A Test Harness (TH) for Evaluating Code Changes in Scientific Software |
Author(s) |
Brian T. Smith |
Author Inst. |
Numerica 21 Inc. |
Presenter |
Brian T. Smith |
Abstract |
Not available. |
|
|
Title |
An Integrated
Performance Tools Environment |
Author(s) |
Luiz DeRose |
Author Inst. |
Cray Inc. |
Presenter |
Luiz DeRose |
Abstract |
Not available. |
|
|
Applications Papers IV:
Applications, Visualization
|
Title |
Evaluation of
RDMA over Ethernet Technology for Building Cost-Effective
Linux Clusters |
Author(s) |
Michael Oberg (1), Henry M. Tufo (2), Theron Voran (1), Matthew Woitaszek (1) |
Author Inst |
(1) University of Colorado, Boulder; (2) University of Colorado, Boulder/National Center for Atmospheric Research |
Presenter |
Matthew Woitaszek |
Abstract |
Remote Direct Memory Access (RDMA) is an effective
technology for reducing system load and improving performance.
Recently, Ethernet offerings that exploit RDMA technology have
become available and can potentially provide a high-performance
fabric for MPI communications at lower cost than competing
technologies. The goal of this paper is to evaluate RDMA over
gigabit Ethernet (ROE) as a potential Linux cluster interconnect.
We present an overview of current RDMA technology from Ammasso,
describe our performance measurements and experiences, and
discuss the viability of using ROE in HPC applications. In
a series of point-to-point tests, we find that the RDMA interface
provides higher throughput and lower latency than legacy gigabit
Ethernet. In addition, even when functioning in non-RDMA mode,
the ROE cards demonstrate better performance than the motherboard
network interfaces. For application benchmarks, including LINPACK
and a climate model, the Ammasso cards provide a speedup over
standard gigabit Ethernet even in small node configurations. |
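For reference, point-to-point tests of the kind described above are usually ping-pong loops along the lines of the sketch below; latency runs normally use tiny messages and throughput runs large ones, and the sizes and repetition counts here are arbitrary rather than the authors' settings.

/*
 * Two-rank ping-pong sketch of the kind used for point-to-point latency
 * and throughput measurements. Message size and repetitions are
 * arbitrary; this is not the authors' benchmark code.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define REPS 1000
#define BYTES (1 << 20)   /* 1 MiB payload */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) MPI_Abort(MPI_COMM_WORLD, 1);

    char *buf = malloc(BYTES);
    for (int i = 0; i < BYTES; i++) buf[i] = (char)i;

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < REPS; i++) {
        if (rank == 0) {
            MPI_Send(buf, BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0) {
        double rtt = (t1 - t0) / REPS;            /* round-trip time  */
        double mbps = 2.0 * BYTES / rtt / 1e6;    /* both directions  */
        printf("avg one-way latency: %.2f us, throughput: %.1f MB/s\n",
               0.5 * rtt * 1e6, mbps);
    }
    free(buf);
    MPI_Finalize();
    return 0;
}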
|
|
Title |
Performance
of Voltaire InfiniBand in IBM 64-Bit Commodity HPC Clusters |
Author(s) |
Douglas M. Pase |
Author Inst |
IBM |
Presenter |
Douglas M. Pase |
Abstract |
Not available. |
|
|
Systems
Track Abstracts:
|
Systems Papers I: Filesystems
and Cluster Management
|
Title |
Building of
a GNU/Linux-Based Bootable Cluster CD |
Author |
Paul Gray (1), Jeff Chapin (1), Tobias McNulty (2) |
Author Inst |
(1) University of Northern Iowa and (2) Earlham College |
Presenter |
TBD |
Abstract |
The Bootable Cluster CD (BCCD) is an established, well-maintained
cluster toolkit used nationally and internationally at several
levels of the academic system. During the Education Programs of
the Supercomputing conferences in 2002, 2003, and 2004, the BCCD
image was used to support instruction on issues related to parallel
computing education. It has been used in the undergraduate
curriculum to illustrate principles of parallelism and distributed
computing and has been widely used to facilitate graduate research
in parallel environments. The standard BCCD image is packaged
in the 3" mini-CD format, easily fitting inside most wallets and
purses. Variations include PXE-bootable (network-bootable)
and USB-stick bootable images. All software components are
pre-configured to work together, making the time required to
go from boot-up to a functional cluster less than five minutes.
A typical Windows or Macintosh lab can be temporarily converted
into a working GNU/Linux-based computational cluster without
modification to the original disk or operating system. Students
can immediately use this computational cluster framework to
run a variety of real scientific models conveniently located
on the BCCD and downloadable into any running BCCD environment.
This paper discusses building, configuring, modifying, and
deploying aspects of the Bootable Cluster CD. |
|
|
Title |
Improving Cluster Management with Scalable Filesystems |
Author |
Adam Boggs (1), Jason Cope (1), Sean McCreary (2), Michael Oberg (2), Henry M. Tufo (2), Theron Voran (2), Matthew Woitaszek (2) |
Author Inst |
(1) University of Colorado, Boulder; (2) University of Colorado, Boulder/National Center for Atmospheric Research |
Presenter |
TBD |
Abstract |
Reducing the complexity of the hardware and
software components of Linux cluster systems can significantly
improve management infrastructure scalability. Moving parts,
in particular hard drives, generate excess heat and have the
highest failure rates among cluster node components. The use
of diskless nodes simplifies deployment and management, improves
overall system reliability, and reduces operational costs.
Previous diskless node implementations have relied on a central
server exporting node images using a high-level protocol such
as NFS or have employed virtual disks and a block protocol
such as iSCSI to remotely store the root filesystem. We present
a mechanism to provide the root filesystems of diskless computation
nodes using the Lustre high-performance cluster file system.
In addition to eliminating the downtime caused by disk failures,
this architecture allows for highly scalable I/O performance
that can be free from the single point of failure of a central
fileserver. We evaluate our management architecture using a
small cluster of diskless computation nodes and extrapolate
from our results the ability to provide the manageability,
scalability, performance and reliability required by current
and future cluster designs. |
|
|
Title |
The Hydra Filesystem:
A Distributed Storage Framework |
Author |
Benjamin Gonzalez and George K. Thiruvathukal |
Author Inst |
Loyola University, Chicago |
Presenter |
TBD |
Abstract |
Hydra File System (HFS) is an experimental
framework for constructing parallel and distributed filesystems.
While parallel and distributed applications requiring scalable
and flexible access to storage and retrieval are becoming more
commonplace, parallel and distributed filesystems remain difficult
to deploy easily and to configure for different needs. HFS
aims to be different by being true to the tradition of high-performance
computing while employing modern design patterns to allow various
policies to be configured on a per-instance basis (e.g. storage,
communication, security, and indexing schemes). We describe
a working prototype (available for public download) that has
been implemented in the Python programming language. |
|
|
Systems Papers II: Cluster
Efficiencies
|
Title |
ClearSpeed Accelerators in Linux Clusters |
Author |
John L. Gustafson |
Author Inst |
ClearSpeed Technology |
Presenter |
John L. Gustafson |
Abstract |
While the use of commodity processors and interconnects and
the Linux operating system has permitted great advances in the
cost-effectiveness of HPC systems, it has exposed a new
limitation: power and space requirements. It is common
for a high-density Linux cluster to require more watts per
cabinet than earlier proprietary HPC designs. The cost of the
electrical power over the life of the system can exceed the
cost of the system. The ClearSpeed coprocessor accelerates
kernels specific to 64-bit scientific computing such as matrix
multiplication and Fourier transforms. For technical applications,
it raises both the performance and the performance-per-watt
on those operations relative to more general-purpose processor
designs. Commodity processors like those used in Linux clusters
cannot incorporate HPC-specific features without reducing competitiveness
in the broader application space, so the ClearSpeed accelerator
option restores an HPC feature emphasis to clusters that require
it.
We present the architecture of the ClearSpeed CSX600 chip
and its use in the Advance™ plug-in board, issues of
algorithm-architecture fit, approaches for ease-of-use, and
applications either available now or under development. We
also show the impact on performance and facilities requirements
of equipping typical Linux clusters with ClearSpeed accelerators. |
|
|
Title |
Maestro-VC:
On-Demand Secure Cluster Computing Using Virtualization |
Author |
Nadir Kiyanclar, Gregory A. Koenig, William
Yurcik |
Author Inst |
National Center for Supercomputing Applications/UIUC |
Presenter |
TBD |
Abstract |
On-demand computing refers to technology that enables an
infrastructure in which computing cycles are treated as a
commodity that can be accessed upon request. In this way the
goals of on-demand computing overlap with, and are similar to,
those of Grid computing: both enable the pooling of global
computing resources to solve complex computational problems.
Recently, virtualization has emerged as a viable mechanism
for improving the utilization of commodity computing hardware.
It has seen much research into potential applications in
distributed and Grid computing. In this paper,
we present an architecture and prototype implementation for
Maestro-VC, a system which takes advantage of virtualization
to provide a sandboxed environment in which administrators
of cluster hardware can execute untrusted user code. User
code can run unmodified, or can optionally take advantage
of the special features of our system to improve performance
and adapt to changes in the environment. |
|
|
Title |
Architectural
Tradeoffs for Unifying Campus Grid Resources |
Author |
Bart Taylor (1) and Amy Apon (2) |
Author Inst |
(1) Acxiom Corporation and (2) University of Arkansas |
Presenter |
TBD |
Abstract |
Most universities have a powerful collection
of computing resources on campus for use in areas from high-performance
computing to general-access student labs. However, these resources
are rarely used to their full potential. Grid computing offers
a way to unify these resources and to better utilize the capability
they provide. The complexity of some grid tools makes learning
to use them a daunting task for users not familiar with using
the command line. Combining these tools into a single web-portal
interface provides campus faculty and students with an easy
way to access the campus resources. This paper presents some
of the grid and portal tools that are currently available and
the tradeoffs in their selection and use. The successful implementation
of a subset of these tools at the University of Arkansas and
the functionality they provide are discussed in detail. |
|
|
Systems Papers III:
|
Title |
LEA: A Cluster
Intensive Simulation Software for Unit Commitment |
Author |
Riadh Zorgati (1), Wim Van Ackooij (1), Jean-Marc Luel (1), Pierre Thomas (1), Michael Uchanski (2), Kevin Shea (2) |
Author Inst |
(1) EDF, (2) The MathWorks, Inc. |
Presenter |
TBD |
Abstract |
The Unit Commitment Problem (UCP) consists of defining the
minimal-cost generation schedule for a given set of power units.
Due to complex technical constraints, the UCP is a challenging
large-scale, non-convex, non-linear optimization problem. The
UCP has been solved satisfactorily in an industrial setting;
solving the basic deterministic UCP thus takes about fifteen
minutes. At a bi-weekly horizon, uncertainty and hazards cannot
be neglected, but solving the UCP as a stochastic problem at
such a short-term horizon is a challenging task.
In this paper, we report on LEA, a software package for
cluster-based intensive simulation in unit commitment that
implements a multi-scenario technique, allowing uncertainty
and hazards to be taken into account in a simplified way. This
technique naturally requires substantial computing resources but
becomes industrially tractable when using a cluster. Judging from
the first results obtained, the underlying concept of combining
intensive simulation with carefully chosen uncertainty
scenarios appears to be effective. |
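For readers unfamiliar with unit commitment, a highly simplified deterministic formulation is sketched below in standard textbook form; EDF's actual model involves a far richer constraint set (minimum up/down times, ramping limits, network and hydraulic constraints) that is omitted here.

\begin{align*}
\min_{u,\,p}\ & \sum_{t=1}^{T}\sum_{i=1}^{N}\Big( c_i(p_{i,t}) + S_i\,u_{i,t}\,(1-u_{i,t-1}) \Big) \\
\text{s.t.}\ & \sum_{i=1}^{N} p_{i,t} = D_t \quad \text{for all } t \quad \text{(demand balance)} \\
& u_{i,t}\,P_i^{\min} \le p_{i,t} \le u_{i,t}\,P_i^{\max} \quad \text{for all } i,t \quad \text{(generation limits)} \\
& u_{i,t} \in \{0,1\}, \quad u_{i,0} \text{ given} \quad \text{(on/off commitment)}
\end{align*}

Here p_{i,t} is the output of unit i in period t, u_{i,t} its on/off status, c_i its (possibly non-convex) production cost, S_i a start-up cost, and D_t the demand; the multi-scenario technique described above replaces the single demand trajectory D_t with a set of uncertainty scenarios simulated on the cluster.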
|
|
Title |
Lessons for
the Cluster Community from an Experiment in Model Coupling
with Python |
Author |
Michael Tobis (1), Mike Steder (1), Ray Pierrehumbert (1), Robert Jacob (2) |
Author Inst. |
(1) University of Chicago, (2) Argonne National Laboratory |
Presenter |
TBD |
Abstract |
Available soon. |
|
|
Title |
An Equation-by-Equation Method for Large Problems in a Distributed Computing Environment |
Author |
Ganesh Thiagarajan and Anoop G. Varghese |
Author Inst. |
University of Missouri, Kansas City |
Presenter |
TBD |
Abstract |
For finite element problems involving millions of unknowns,
iterative methods set up on a parallel computing platform are
more efficient than direct solver techniques. Considerations
that are important in the solution of such problems include the
time of computation, the memory required, and the type of platform
used to solve the problem. Traditionally, shared-memory machines
were popular. However, distributed-memory machines are now gaining
wider acceptance due to their relatively low cost and ease of setup.
The conventional approach to setting up an iterative finite element
solver is the Element-by-Element (EBE) method with the preconditioned
conjugate gradient (PCG) solver. The EBE method is reported by Horie
and Kuramae (1997, Microcomputers in Civil Engineering, 2, 12) to be
suitable for shared-memory parallel architectures, but it presents
certain conflicts on distributed-memory machines. This paper proposes
a new algorithm developed to parallelize the finite element solution
process for a distributed-memory environment. The new method, called
the Equation-by-Equation (EQBYEQ) method, is based on generating and
storing the stiffness matrix on an equation-by-equation basis, in
contrast to the element-by-element basis. This paper discusses the
algorithm and implementation details. The advantages of the EQBYEQ
scheme compared to the EBE scheme in a distributed environment are
discussed. |
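Since the proposed method builds on the preconditioned conjugate gradient solver, the serial Jacobi-preconditioned CG sketch below may help orient readers; in the EBE and EQBYEQ variants the product A*p would be assembled element by element or equation by equation across processes rather than from a stored global matrix. This is an illustration, not the authors' implementation.

/*
 * Serial Jacobi-preconditioned conjugate gradient (PCG) sketch on a small
 * dense SPD system, for orientation only. In EBE/EQBYEQ variants the
 * products A*p are assembled element-by-element or equation-by-equation
 * across processes instead of from a stored global matrix.
 */
#include <stdio.h>
#include <math.h>

#define N 4

static void matvec(const double A[N][N], const double x[N], double y[N])
{
    for (int i = 0; i < N; i++) {
        y[i] = 0.0;
        for (int j = 0; j < N; j++) y[i] += A[i][j] * x[j];
    }
}

static double dot(const double a[N], const double b[N])
{
    double s = 0.0;
    for (int i = 0; i < N; i++) s += a[i] * b[i];
    return s;
}

int main(void)
{
    /* Small symmetric positive-definite test matrix and right-hand side. */
    double A[N][N] = {{4,1,0,0},{1,4,1,0},{0,1,4,1},{0,0,1,4}};
    double b[N] = {1, 2, 3, 4};
    double x[N] = {0, 0, 0, 0};
    double r[N], z[N], p[N], Ap[N];

    matvec(A, x, r);
    for (int i = 0; i < N; i++) r[i] = b[i] - r[i];     /* r = b - A x   */
    for (int i = 0; i < N; i++) z[i] = r[i] / A[i][i];  /* Jacobi M^-1 r */
    for (int i = 0; i < N; i++) p[i] = z[i];
    double rz = dot(r, z);

    for (int it = 0; it < 100 && sqrt(dot(r, r)) > 1e-10; it++) {
        matvec(A, p, Ap);
        double alpha = rz / dot(p, Ap);
        for (int i = 0; i < N; i++) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        for (int i = 0; i < N; i++) z[i] = r[i] / A[i][i];
        double rz_new = dot(r, z);
        double beta = rz_new / rz;
        for (int i = 0; i < N; i++) p[i] = z[i] + beta * p[i];
        rz = rz_new;
    }

    printf("x = %f %f %f %f\n", x[0], x[1], x[2], x[3]);
    return 0;
}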
|
|
Systems Papers IV: High
Availability
|
Title |
On
the Survivability of Standard MPI Applications |
Author |
Anand Tikotekar (1), Chokchai Leangsuksun (1), Stephen L. Scott (2) |
Author Inst. |
(1) Louisiana Tech University, (2) Oak Ridge National Laboratory |
Presenter |
TBD |
Abstract |
Job loss due to failure represents a common vulnerability in
high-performance computing (HPC), especially in the Message
Passing Interface (MPI) environment. Rollback-recovery has been
used to mitigate failures in long-running applications. However,
to date, rollback-recovery mechanisms such as checkpointing alone
may not be sufficient to ensure fault tolerance for MPI applications,
due to MPI's static view of the cooperating machines and its lack
of resilience to outages. In fact, MPI applications are prone to
cascading failures, where the failure of one participating node
causes total job failure. In this paper we address fault issues
in the MPI environment by improving runtime availability with
self-healing and self-cloning mechanisms that tolerate outages
of cluster computing systems. We develop a framework that augments
a standard HPC cluster with a job-level fault-tolerance capability
that preserves the job queue and enables a parallel MPI job submitted
through a resource manager to continue executing even after
encountering a failure. |
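As background on the checkpoint mechanism discussed above, the fragment below sketches the simplest application-level form of checkpointing, in which each MPI rank periodically writes its own state and resumes from the most recent file on restart. The authors' framework operates at the job and resource-manager level and is not shown here; the file naming and state layout are made up for illustration.

/*
 * Minimal application-level checkpoint/restart sketch: each MPI rank
 * periodically dumps its state and, on restart, resumes from the last
 * dump. This only illustrates the checkpoint idea discussed above; the
 * paper's framework works at the job/resource-manager level instead.
 */
#include <mpi.h>
#include <stdio.h>

#define STEPS 1000
#define CKPT_EVERY 100

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char fname[64];
    snprintf(fname, sizeof fname, "ckpt_rank%d.dat", rank);

    int start = 0;
    double state = 0.0;

    /* Restart path: if a checkpoint exists, resume from it. */
    FILE *f = fopen(fname, "rb");
    if (f) {
        if (fread(&start, sizeof start, 1, f) == 1)
            fread(&state, sizeof state, 1, f);
        fclose(f);
        if (rank == 0) printf("resuming at step %d\n", start);
    }

    for (int step = start; step < STEPS; step++) {
        state += 0.001 * (rank + 1);          /* stand-in for real work */

        if ((step + 1) % CKPT_EVERY == 0) {
            /* Coordinated checkpoint: agree on a step boundary first. */
            MPI_Barrier(MPI_COMM_WORLD);
            FILE *out = fopen(fname, "wb");
            if (out) {
                int next = step + 1;
                fwrite(&next, sizeof next, 1, out);
                fwrite(&state, sizeof state, 1, out);
                fclose(out);
            }
        }
    }

    if (rank == 0) printf("done, state = %f\n", state);
    MPI_Finalize();
    return 0;
}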
|
|
Title |
Cluster
Survivability with ByzwATCh: A Byzantine Hardware Fault Detector
for Parallel Machines with Charm++ |
Author |
D. Mogilevsky, G. Koenig, W. Yurcik |
Author Inst. |
National Center for Supercomputing Applications/UIUC |
Presenter |
TBD |
Abstract |
As clusters grow larger, the sheer number of participating
components has created an environment where failures affecting
long-running jobs can be expected to increase in frequency for
the foreseeable future. The cluster
research community is aware of this problem and has proposed
many error-recovery protocols, which offer support for fault-tolerant
computing; however, each of these recovery protocols depends
on the underlying detection of faults. In this paper we present
a prototype system to detect the most difficult faults, specifically
Byzantine faults. We describe the operation of ByzwATCH, a
system for the run-time detection of Byzantine faults as part
of the Charm++ parallel programming framework. Results show
that ByzwATCH is both accurate in detection and lightweight
for high-performance computing environments. While we demonstrate
this work for a Linux cluster environment, it is extensible
to other environments requiring high reliability (e.g. carrier-class
server environments), with error-recovery protocols using
this fault detection system as their foundation. |
|
|
Title |
RASS Framework
for a Cluster-Aware SELinux |
Author |
Arpan Darivemula (1), Anand Tikotekar (1), Chokchai Leangsuksun (1), Makan Pourzandi (2) |
Author Inst. |
(1) Louisiana Tech University, (2) Ericsson Research Canada |
Presenter |
TBD |
Abstract |
The growing deployments of clusters to solve
critical and computationally intensive problems imply that
survivability is a key requirement through which the systems
must possess Reliability, Availability, Serviceability and
Security (RASS) together. In this paper, we conduct a feasibility
study on SELinux and the existing cluster-aware RASS framework.
We start by understanding a semantic mapping from cluster-wide
security policy to individual nodes’ Mandatory Access
Control (MAC). Through our existing RASS framework, we then
construct an experimental cluster-aware SELinux system. Finally,
we demonstrate feasibility of mapping distributed security
policy (DSP) to SELinux equivalences and the cohesiveness of
cluster enforcements, which, we believe, leads to a layered
technique and thus becomes highly survivable. |
|
|
Vendor
Presentations
|
|
Title |
TBD |
Author(s) |
TBD |
Author Inst |
TBD |
Presenter |
TBD |
Abstract |
Not available. |
|
|
Title |
TBD |
Author(s) |
TBD |
Author Inst |
IBM |
Presenter |
TBD |
Abstract |
Not available. |
|
|
Title |
HPC
Into The Mainstream |
Author(s) |
Stephen Wheat |
Author Inst |
Intel |
Presenter |
Stephen Wheat |
Abstract |
Processor performance continues its historical climb, bringing
an ever-increasing performance-per-cost ratio to computing
platforms. Technology and capabilities that were once exclusively
in the hands of the high-performance computing community will
now become available to a broader set of the world's population.
The computing possibilities this will unlock are broad, from
bringing HPC to previously computing-challenged activities to
enabling each college student to have an HPC system of their
own. Technologies supporting
this HPC proliferation will be discussed, as will a view into
some of the future usage models. |
|
|
Title |
Real Application
Scaling |
Author(s) |
Greg Lindahl |
Author Inst |
PathScale |
Presenter |
Greg Lindahl |
Abstract |
Available soon. |
|
|
Title |
Myri-10G: Overview
and a Report on Early Deployments |
Author(s) |
Tom Leinberger |
Author Inst |
Myricom |
Presenter |
Tom Leinberger |
Abstract |
Not available. |
|
|
Title |
PGI Compilers
and Tools for Scientists and Engineers |
Author(s) |
Douglas Miles |
Author Inst |
PGI |
Presenter |
Douglas Miles |
Abstract |
This talk covers the differences between AMD64 and EM64T and
what they mean for compiler users. |
|
|
Title |
TBD |
Author(s) |
TBD |
Author Inst |
Cray |
Presenter |
TBD |
Abstract |
Not available. |
|
|
Title |
TBD |
Author(s) |
TBD |
Author Inst |
AMD |
Presenter |
TBD |
Abstract |
Not available. |
|
|
Title |
TBD |
Author(s) |
TBD |
Author Inst |
TBD |
Presenter |
TBD |
Abstract |
Not available. |
|
|
Title |
TBD |
Author(s) |
TBD |
Author Inst |
TBD |
Presenter |
TBD |
Abstract |
Not available. |
|
|