ECE 254 / CPS 225
Fault-Tolerant and Testable Computing Systems
Fall 2006
Professor Daniel J. Sorin
Course Objective and Content
Objective: To provide students with an understanding of fault-tolerant computers, including both the theory of how to design and evaluate them and practical knowledge of real fault-tolerant systems.
Content: The main themes of this course are technological reasons for faults, fault models, information redundancy, spatial redundancy, backward and forward error recovery, fault-tolerant hardware and software, modeling and analysis, testing, and design for test.
The course includes a project that allows students to apply what they have learned in class.
Prerequisites: ECE 152 or CPS 104, or consent of the instructor.
Class Location and Hours
Class meets M/W/F from 10:20am - 11:10am.
Location: 201 Hudson Hall
Instructor
Office: 209C Hudson Hall
Office Hours: Monday 2:00-3:00, Wednesday 11:10-12:00 (after class)
Email: 
Textbook
There is no textbook for this course. If you would still like one, let me know and I can recommend one.
Assignments and Grading
Students are responsible for:
The project is a significant assignment that requires:
Deadlines will be enforced except under extreme circumstances. I would prefer that you turn in something not quite finished on the due date rather than wait until after the deadline to try to finish it. Each day late will result in a 10% reduction of the grade given.
Academic Misconduct: I will not tolerate academically dishonest work. This includes cheating on the homework and exams and plagiarism on the project. On the project, be careful to cite prior work and to give proper credit to others' work.
Course Topics, Lecture Notes, and Readings
Homework Assignments
Homework #1, Due Friday, Sept 15 in class
Homework #2, Due Monday, Sept 25 in class (Exercise 3.2 must be emailed to me by 10:00am on Sept 25)
Homework #3, Due Monday, Oct 30 in class (Exercise 6.4 must be emailed to me by 10:00am on Oct 30)
Homework #4, Due Monday, Nov 20 in class
Tentative Schedule (subject to change)

| Week | Monday | Wednesday | Friday |
|---|---|---|---|
| Aug 28 | introduction | faults and their causes | faults and their causes |
| Sep 4 | "IBM Experiment in Soft Fails" | "A Large-Scale Study of Failures" | "Why Do Internet Services Fail?" |
| Sep 11 | basic FT concepts | physical redundancy | "The Teramac" |
| Sep 18 | information redundancy | re-execution techniques | "AR-SMT" |
| Sep 25 | backward error recovery | "A Survey of Rollback-Recovery Protocols" | review for midterm |
| Oct 2 | MIDTERM EXAM | FT microprocessors, "RAS Strategy for IBM S/390" | FT memory, "Chipkill Memory" |
| Oct 9 | FALL BREAK | FT disks | FT networks, "End-to-End Arguments" |
| Oct 16 | FT multiprocessors; project proposals due | FT software, "Proactive Management of Software Aging" | FT software, "Software Implemented Fault Tolerance" |
| Oct 23 | FT software, "The Google Cluster Architecture" | modeling/evaluation, "Modeling the Effect of Technology Trends" | modeling/evaluation |
| Oct 30 | modeling/evaluation, "The Impact of Technology Scaling on Lifetime Reliability" | modeling/evaluation | modeling/evaluation |
| Nov 6 | testing; project progress reports due | testing | testing |
| Nov 13 | design for test | "Validating the Pentium 4 Microprocessor" | "IDDQ Test" |
| Nov 20 | review for final | THANKSGIVING | |
| Nov 27 | | | |