program
October 6-10, 2008:
IBM Executive Briefing Center
Montpellier, France
The LCI reserves the right to cancel workshops
for any reason.
Monday, October 6, 2008
Intro to HPC, Clusters and Compute Node Architecture
Intro to current state of clusters and HPC
- Types of parallelism/distributed computing vs. parallel computing
- Evolution to multi-core computing, application impact/implications
- Data scale and flow
- HPC Trends: green computing, power efficiencies, FPGAs, etc.
- Top500, Petascale examples
Compute Node Architecture: Intel/Power/Blue Gene Roadmap
- Processors/Cores, Multi-core
- Caches/Memory Architecture
- Bus architectures/system configuration
Hands-on: Intro to parallel computing
Tuesday, October 7, 2008
HPC Operating Systems and Network Architecture
HPC Operating Systems: Windows and Linux/lightweight kernels
- Building a cluster and troubleshooting
- Launching jobs; compilers
- Application performance
- Lightweight kernels, OS Jitter
- Essential commands
- Troubleshooting
Network architectures for HPC
- Management networks (serial, Ethernet, control nets)
- Interconnect networks
- Network topology and full-bisection bandwidth
- Analyzing performance: latency and bandwidth, and how they affect applications
- Network management, LAN/WAN, tuning for the WAN
- Building security into the network architecture
Hands-on: Network troubleshooting and tuning
Hands-on: Windows as an HPC platform
Hands-on: Patching kernels
Wednesday, October 8, 2008
Data Architecture and Scheduling/Grid
Data architectures for clusters
- Local distributed storage vs centralized storage networks
- Common disk technologies
- Archival storage system hardware and architectures
- Storage hardware: controllers, disks, cache, stripe size, block size, etc.
Data file system configurations
- Overview of HPC file system concerns and file system architectures
- Data flow, speeds and feeds, performance; authentication and authorization
- Local FS and NFS for system installs and software support
- Parallel and distributed file systems; WAN parallel file systems; pNFS
- Parallel tools for applications
- Backup strategies, file system policies, data reliability issues
- Scalability, expectations for Petascale; MTBF for disks; lessons learned
Scheduling and Grid Computing
- Common Resource Managers and Schedulers
- Results/applications-oriented policies
- Accounting and allocations
- Intro to Grid Computing
- Meta-scheduling/Co-scheduling; Workflows; Application examples
Hands-on: Building Clusters/Packages
Hands-on: Building RAID/LUNs; Parallel file systems
Thursday, October 9, 2008
Cluster Management and the Parallel Application Environment
Cluster Management
- Common management tools
- Software management/change control
- Backup management
- Logging/monitoring for automated problem determination and security
- Security plans/procedures, Risk Analysis
- Regression tests
- Monitoring tools
Application Benchmarking
- Running HPL; getting on the Top500
- Selecting hardware based on application performance
- Tuning and knowing your application
Creating the Parallel Application Environment
- Numerical libraries and compilers
- User documentation, policies, training
- Science gateways, workflows
- Petascale application examples
Hands-on: Cluster management
Hands-on: Using numerical libraries
Friday, October 10, 2008
Parallel Application Environment, cont’d
Tracing and Debugging Parallel Applications
- TotalView debugging and OpenMPI-specific tools
- Performance counters
Hands-on: Debugging Parallel Applications