|
Distributed Systems (Special Topics 7680)
Class Information |
Instructor
Course Description
This course is an introduction to distributed systems.
The lectures will cover fundamental concepts in distributed
systems showing how they are applied when building reliable
distributed systems and services. Topics include:
-
Internet communication protocols.
Client Server paradigm, RPC, Corba.
-
How and why computers systems fail. How to overcome failures in a distributed system.
Failures models. The distributed commit problem.
-
Clock synchronization and synchronous systems.
-
Dynamic membership. Replicating data with malicious failures. Impossibility of asynchronous consensus.
-
Group communication systems, properties and dynamic group membership.
Causal and total order.
-
Virtually synchronous algorithms and tools: replicated data, state transfer, load-balancing, primary-backup and coordinator-cohort fault tolerance.
-
Transactional model and implementation of a transactional storage systems.
Distributed transactions and multiphase commit.
-
Distributed hash tables.
-
Application of distributed systems concepts to real systems.
GFS, HDFS, BigTable, HBase, Spanner. Chubby. Zookeeper. Zab. MapReduce.
Grade
The grade will be based on several written homework assignments (HW),
programming projects (PP), a final project (FP), and class
participation (in class, piazza, office hours, etc) (CP)
as follows:
Grade = 25%*HW + 45%*PP + 20%*FP + 10%CP.
Programming projects
Programming language required is C, platform is Linux.
Final Project, any language.
Textbooks and reading list
- Reading will be assigned during each lecture, see also the list at the end of the page.
-
Reliable Distributed Systems: Technologies, Web Services and Applications. Ken Birman.
2005, XXXVI, 668 p. 145 illus., Hardcover
ISBN: 0-387-21509-3
-
UNIX Network Programming, Volume 1, Second Edition: Networking APIs: Sockets and XTI, Prentice Hall, 1998, ISBN 0-13-490012-X.
W. Richard Stevens
Academic Integrity
Academic Honesty and Ethical behavior are required in this course,
as it is in all courses at Northeastern University. There is zero
tolerance to cheating.
You are encouraged to talk with the professor about any questions
you have about what is permitted on any particular assignment.
|
Lectures |
Lecture slides will be posted below. Homework and projects will be handed
in class and/or posted on piazza. All class communication will take place on piazza.
Preliminary plan and topics below.
Week |
Topics |
Homework |
Project |
Week 1 |
Topic 1 - Introduction. Class
policy. Examples.
|
|
|
Week 2 |
Topic 2 -
Time in distributed systems (Lamport clocks, vector clocks, NTP).
Global states and distributed snapshots. Failure detectors.
No class on Monday - University Holiday.
|
Hw1 assigned |
|
Week 3 |
Topic 2 - cont.
Topic 3 - Consensus: synchronous
systems, asynchronous systems, byzantine failures (including
randomized solutions). |
|
|
Week 4 |
Topic 3 -
Consensus: cont.
Topic 4 Distributed commit (2PC and 3PC)
|
Hw1 due. |
Project 1 assigned. |
Week 5 |
Topic 4 Distributed commit
(2PC and 3PC): cont. |
|
|
Week 6 |
Topic 5 - Process Groups: Leader election, membership, reliable multicast, virtual synchrony.
| Hw2 assigned |
Project 1 due Project 2 assigned. |
Week 7 |
No class on
Monday because of Presidents Day. Also class on
Feb 24.
Topic 6 - Quorums. Paxos. Viewstamped replication. BFT. |
|
|
Week 8 |
Topic 7 - Gossip protocols
Topic 8 - Distributed Hash Tables
|
Hw2 due |
|
|
Week 9 |
Spring break. |
|
|
Week 10 |
Topic 9 - GFS, HDFS.
Topic 10 - BigTable, HBase, Spanner.
| |
Project 2 due. Project 3 assigned. |
Week 11 |
Topic 10 - BigTable, HBase,
Spanner.
Topic 11 - Chubby. Zookeeper. Zab
|
|
|
Week 12 |
Topic 11 - Chubby. Zookeeper. Zab |
Project 3 due. |
|
Week 13 |
Topic 12 - MapReduce. Mesos. Yarn. |
|
|
Week 14 |
Topic 13 - Bitcoin. Security issues.
|
|
|
| |
Reading List |
- J. Gray. Why Do Computers Stop and What can be done about it? 1985.
- Saltzer, Reed, Clark. End to end arguments in System Design. TOCS 1990.
- D. Oppenheimer, A.Ganapathi and D. A. Patterson. Why do Internet services fail, and what can be done about it? 2003.
- L. Lamport for "Time, Clocks, and the Ordering of Events in a Distributed System,"
Communications of the ACM, July 1978,
21(7):558-565. E.W. Dijkstra Prize 2000, SIGOPS Hall of Fame.
- Mattern, F. "Virtual Time and Global States of Distributed Systems", in Cosnard, M., Proc. Workshop on Parallel and Distributed Algorithms, Chateau de Bonas, France: Elsevier, pp. 215–226, 1988.
- K. M. Chandy and L. Lamport, Distributed Snapshots: Determining Global States of Distributed Systems. ACM Transactions on Computer Systems, Vol. 3, No. 1, February, 1985, pp. 63-75. SIGOPS Hall of Fame.
- T. Chandra and S. Toueg. Unreliable Failure Detectors for Reliable Distributed Systems, 1996.
- J. Halpern and Y. Moses Knowledge and Common Knowledge in a Distributed Environment E.W. Dijkstra Prize 2009.
- M.J.Fischer, N.A.Lynch and M.S. Paterson. Impossibility of Distributed Consensus with One Faulty Process. ACM SPDS 1983.
E.W. Dijkstra Prize, 2001.
- L. Lamport, R. Shostak, and M. Pease. The Byzantine Generals Problem
ACM Transactions on Programming Languages and Systems 4(3):382-401, July 1982.
- M. Ben-Or. Another advantage of free choice (Extended Abstract): Completely asynchronous agreement protocol. PODC '83.
- K. P. Birman and T. A. Joseph. Exploiting virtual synchrony in distributed systems.
In Proceedings of the ACM Symposium on OS Principles, pages 123--138, Austin, TX, 1987.
- L. E. Moser, Y. Amir, P. M. Melliar-Smith, D. A. Agarwal, Extended Virtual Synchrony,
The 14th IEEE International Conference on Distributed Computing Systems (ICDCS) 1994.
- Bernstein, Goodman and Hadzilakos. Distributed Recovery
- D. Skeen. Non-blocking Commit Protocols
- D. Skeen. Determining the Last Process to Fail
- F.B. Schneider. The State Machine Approach. SIGOPS Hall of Fame.
- T. Bressoud and F.B. Schneider Hypervisor-based Fault-Tolerance.
- E. Elnozahy, L. Alvisi, Y.M.Wang, and D.B. Johnson. A Survey of Rollback Recovery Protocols in Message Passing Systems
- L. Lamport: Paxos Made Simple
- L. Lamport: The Part-Time Parliament SIGOS Hall of Fame
- K.P. Birman, M. Hayden, O. Ozkasap, Z. Xiao, M. Budiu, and Y. Minsky. Bimodal Multicast.
- D. Malkhi and M. Reiter Byzantine Quorum Systems
- L. Lamport, R. Shostak, and M. Pease The Byzantine Generals Problem E.W. Dijkstra Prize, 2005.
- M. Castro and B. Liskov Practical Byzantine Fault-Tolerance
- Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, H. Balakrishnan,
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications, SIGCOMM 2001.
- Google File System. S, Ghemawat, H. Gobioff and S.-T. Leung. SOSP 2003.
- The Chubby Lock Service for Loosely-Coupled Distributed Systems. Mike Burrows, OSDI 2006
| |
Copyright© 2014 Cristina Nita-Rotaru. Send your comments and questions to Cristina Nita-Rotaru
|
|