| ECE 259 / CPS 221 |
| Advanced Computer Architecture II |
| Spring 2006 |
| Professor Daniel J. Sorin |
| Objectives |
| The objective of this course is to provide students with an understanding of parallel computer architectures. Students will read research papers, lead in-class discussions of papers, perform a research project, and present their research projects in both written and oral formats. |
| The course focuses on both the design and evaluation of multiprocessor systems. The main design themes of this course are: parallel programming, system organizations, shared memory multiprocessors, memory consistency models, interconnection networks, high availability systems, interactions with current microprocessor and I/O technology, novel architectures, and emerging technologies. The evaluation portion of this course will focus on metrics, modeling, simulation, and workloads for benchmarking. |
| Prerequisites: ECE 252, CPS 220, or consent of instructor. |
| Class Location and Hours |
Class meets Monday/Wednesday/Friday from 10:20am - 11:10am.
Location: LSRC D106.
| Instructor |
Office: 209C Hudson Hall
Office Hours: Monday 11:15-noon, Wednesday 2:45-3:30
Email: 
| Materials |
| This course has a textbook for background material and for reference, but the emphasis of the class will be discussions of research papers. |
| Textbook: Parallel Computer Architecture. David Culler and J.P. Singh |
| Assignments and Grading |
| This is a graduate-level class that will not require "busy work." This class will, however, require that students learn the reading material and learn how to present research in both written and oral formats (see Hill and Patterson for useful advice for presentations). Communication is very important in this class. Students who struggle with reading and writing are encouraged to take this course but should expect to work hard and to improve their communication skills in the process. |
Students are responsible for:
| The project is a semester-long assignment that should reflect the goal of being no more than "a stone's throw" away from a research paper. As such, the project will require: |
| Deadlines will be enforced except under extreme circumstances. I would prefer that you turn in something not quite done on the due date rather than waiting until after the deadline to try to finish it. Any assignment/project that is late by less than 24 hours will lose 50%. Any assignment/project that is more than 24 hours late will receive a zero. |
| Academic Misconduct: I will not tolerate academically dishonest work. This includes cheating on the final exam and plagiarism on the project. Be careful on the project to cite prior work and to give proper credit to others' research. |
I will post lecture notes (in PowerPoint format) shortly before I cover them in class.
Segment 1: Introduction
Segment 2: Parallel Programming
Segment 3: System Organizations and Scalable Machines without Cache Coherence
Segment 4: Shared Memory and Cache Coherence
Segment 5: Memory Consistency and Synchronization Optimizations
Segment 6: Interconnection Networks
Segment 7: Evaluation
Segment 8: Availability
No slides for material past this point.
CM-5 (Pete Golden)
Starfire (Mahmut Yilmaz)
Multicast Snooping (Derek Hower)
AlphaServer GS320 (Curt Harting)
Token Coherence (Jerry Wu)
Piranha (Garver Moore)
Niagara (Bogdan Romanescu)
R-NUMA (Anita Lungu)
Wildfire (Terry Arnold)
SC+ILP=RC (Costi Pistol)
Speculative Lock Elision (Nathan Sadler)
Virtual Channel Flow Control (John Calandrino)
Alpha 21364 Interconnection Network (Luis Campos)
AMVA Model (Bogdan Romanescu)
Simics (Clif Kerr)
WRL Commercial Workloads (Jerry Wu)
Simulating $2M Server (Derek Hower)
IBM Mainframes (Mahmut Yilmaz)
SafetyNet (Anita Lungu)
ROC (John Calandrino)
Tarantula (Curt Harting)
Raw (Garver Moore)
| Topics and Readings |
Readings in italics are optional material.
| Theme | # | Topic | Readings |
|---|---|---|---|
| Introduction | 1 | Why Study Multiprocessors (parallelism, limits, Amdahl’s Law) | Culler/Singh 1.0, 1.1 |
| Parallel Programming | 2 | Programming Models (message passing, shared memory, performance and scaling) | Culler/Singh 2, 3 (can skim/skip 3.5) |
| | 3 | Synchronization Basics (atomic operations, locks, barriers) | Culler/Singh 2.3.4-2.3.6, 5.5 |
| Machine Organizations and Scalable Systems without Cache Coherence | 4 | System Organizations (SIMD: MMX, vectors, DSP; MIMD) | Culler/Singh 1.2 |
| | 5 | Scalable, Non-Coherent Multiprocessors (message passing: Paragon, CM5, active messages; shared physical memory: Cray T3E) | Culler/Singh 7.2, 7.5, 7.6; "The Network Architecture of the Connection Machine CM-5"; "Synchronization and Communication in the Cray T3E Multiprocessor" |
| Cache-Coherent Shared Memory Multiprocessors | 6 | Shared Memory & Cache Coherence | Culler/Singh 5.0, 5.1 |
| | 7 | Snooping Cache Coherence | Culler/Singh 5.3-5.7 (skim 5.4), 6; "Starfire: Extending the SMP Envelope"; "Multicast Snooping: A New Coherence Method Using a Multicast Address Network" |
| | 8 | Directory Cache Coherence | Culler/Singh 8 (skim 8.3); "The Stanford DASH Multiprocessor" |
| | 9 | Advanced Coherence Topics: Token Coherence, Chip Multiprocessors | "Token Coherence: Decoupling Performance and Correctness"; "Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing" |
| | 10 | COMA - Cache Only Memory Arch (classic: DDM, KSR-1; new: S-COMA, R-NUMA, Wildfire) | Culler/Singh 9.2.2; "DDM--A Cache-Only Memory Architecture" |
| Memory Consistency Models | 11 | Memory Consistency Basics | Culler/Singh 5.2, 9.1 |
| | 12 | Consistency Optimizations (speculation, Scheurich's optimization) | "Two Techniques to Enhance the Performance of Memory Consistency Models" |
| | 13 | Synchronization Optimizations | "Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution" |
| Interconnection Networks | 14 | Interconnection Network Basics (topology, routing, flow control) | Culler/Singh 10 |
| | 15 | Deadlock Avoidance (virtual channels, turn model, hot-potato routing) | "Virtual Channel Flow Control"; "A Survey of Wormhole Routing Techniques in Direct Networks" [includes "Turn Model" concept] |
| Evaluation Tools and Methodology | 16 | Evaluation: Metrics & Modeling (scalability, throughput, why not IPC?; mathematical modeling of performance) | Culler/Singh 4 (skim 4.4); "Cost-Effective Parallel Computing"; "Analytic Evaluation of Shared-Memory Parallel Systems with ILP Processors" |
| | 17 | Evaluation: Simulation (precision vs. performance; full-system, parallel host) | Culler/Singh 4 (skim 4.4); "Simics: A Full System Simulation Platform"; "The Wisconsin Wind Tunnel: Virtual Prototyping of Parallel Computers" |
| | 18 | Evaluation: Workloads (scientific vs. commercial, TLP, importance of benchmark selection) | |
| Reliability and Availability | 19 | Available Computers | "IBM S/390 Parallel Enterprise Server G5 Fault Tolerance: A Historical Perspective" |
| | 20 | Current Topics in Availability | "Embracing Failure: A Case for Recovery-Oriented Computing (ROC)" |
| Novel | 21 | Vector Machines | "Tarantula: A Vector Extension to the Alpha Architecture"; "Optimizing Compiler for a CELL Processor" (do not worry about compiler details!) |
| | 22 | Dataflow | Culler/Singh 1.2.6; "Executing a Program on the MIT Tagged-Token Dataflow Architecture" |
| | 23 | Grid Architectures | |
| | 24 | Supercomputing | "Blue Gene: A Vision for Protein Science Using a Petaflop Supercomputer" |
| Interactions with Processors and I/O | 25 | Microarchitectural Effects (parallelism: ILP, MLP, TLP) | "An Evaluation of Memory Consistency Models for Shared-Memory Systems with ILP Processors" |
| | 26 | I/O | |
| New | 27 | Quantum Computing | |
| | 28 | Bio/Molecular Computing | "Circuit and System Architecture for DNA-Guided Self-Assembly of Nanoelectronics" |