program

October 6-10, 2008:
IBM Executive Briefing Center
Montpellier, France

The LCI reserves the right to cancel workshops for any reason.

Monday, October 6, 2008

Intro to HPC, Clusters and Compute Node Architecture
  Intro to current state of clusters and HPC
  • Types of parallelism/distributed computing vs. parallel computing
  • Evolution to multi-core computing, application impact/implications
  • Data scale and flow
  • HPC Trends: green computing, power efficiencies, FPGAs, etc.
  • Top500, Petascale examples
Compute Node Architecture/Intel/Power/BG Roadmap
  • Processors/Cores, Multi-core
  • Caches/Memory Architecture
  • Bus architectures/system configuration
Hands-on: Intro to parallel computing

Tuesday, October 7, 2008

HPC Operating Systems and Network Architecture
  HPC Operating Systems: Windows and Linux/lightweight kernels
  • Building a cluster and troubleshooting
  • Launching jobs; compilers
  • Application performance
  • Lightweight kernels, OS Jitter
  • Essential commands
  • Troubleshooting
Network architectures for HPC
  • Management networks (serial, Ethernet, control nets)
  • Interconnect networks
  • Network topology and full-bisection bandwidth
  • Analyzing performance: how latency and bandwidth affect applications
  • Network management, LAN/WAN, tuning for the WAN
  • Building security into the network architecture
Hands-on: Network troubleshooting and tuning

Hands-on: Windows as an HPC platform

Hands-on: Patching kernels

Wednesday, October 8, 2008

Data Architecture and Scheduling/Grid
  Data architectures for clusters
  • Local distributed storage vs. centralized storage networks
  • Common disk technologies
  • Archival storage system hardware and architectures
  • Storage hardware: controllers, disks, cache, stripe size, block size, etc.
Data file system configurations
  • Overview of HPC file system concerns and file system architectures
  • Data flow, speeds and feeds, performance; Authentication and authorization
  • Local FS and NFS for system installs and software support
  • Parallel and distributed file systems; WAN parallel file systems; pNFS
  • Parallel tools for applications
  • Backup strategies, file system policies, data reliability issues
  • Scalability, expectations for Petascale; MTBF for disks; Lessons learned
Scheduling and Grid Computing
  • Common Resource Managers and Schedulers
  • Results/applications-oriented policies
  • Accounting and allocations
  • Intro to Grid Computing
  • Meta-scheduling/Co-scheduling; Workflows; Application examples
Hands-on: Building Clusters/Packages

Hands-on: Building RAID/LUNs; Parallel file systems

Thursday, October 9, 2008

Cluster Management and the Parallel Application Environment
  Cluster Management
  • Common management tools
  • Software management/change control
  • Backup management
  • Logging/monitoring for automated problem determination and security
  • Security plans/procedures, Risk Analysis
  • Regression tests
  • Monitoring tools
Application Benchmarking
  • Running HPL; getting on the Top500 list
  • Selecting hardware based on application performance
  • Tuning and knowing your application
Creating the Parallel Application Environment
  • Numerical libraries and compilers
  • User documentation, policies, training
  • Science gateways, workflows
  • Petascale application examples
Hands-on: Cluster management

Hands-on: Using numerical libraries

Friday, October 10, 2008

Parallel Application Environment, cont’d
  Tracing and Debugging Parallel Applications
  • TotalView debugging and Open MPI-specific tools
  • Performance counters
Hands-on: Debugging Parallel Applications