Abstract
An effective failure-detection scheme is essential for reliable communication services. Most computer networks rely on behavior-based detection schemes: each node uses heartbeats to detect the failure of its neighbor nodes, and the transport protocol (such as TCP) achieves reliable communication through acknowledgment and retransmission. In this paper, we experimentally evaluate the effectiveness of such behavior-based detection schemes in real-time communication. Specifically, we measure and analyze the coverage and latency of two failure-detection schemes, neighbor detection and end-to-end detection, through fault-injection experiments. The experimental results show that a significant portion of failures can be detected very quickly by the neighbor detection scheme, while the end-to-end detection scheme uncovers the remaining failures with larger detection latencies.
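To make the two detection mechanisms concrete, the following Python sketch models a heartbeat-based neighbor detector and the worst-case give-up time of an acknowledgment/retransmission sender. It is an illustration under stated assumptions, not the authors' implementation or experimental setup: the names `HeartbeatDetector` and `end_to_end_detection_latency`, the heartbeat period, the `max_missed` threshold, the retransmission timeout `rto`, the retry limit, and the exponential backoff are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatDetector:
    """Neighbor detection (sketch): suspect a neighbor once more than
    `max_missed` heartbeat periods have elapsed since its last heartbeat."""
    period: float       # heartbeat interval in seconds (hypothetical value)
    max_missed: int     # missed heartbeats tolerated before suspicion (hypothetical)
    last_heard: float = 0.0

    def on_heartbeat(self, now: float) -> None:
        # Record the arrival time of the neighbor's latest heartbeat.
        self.last_heard = now

    def suspects(self, now: float) -> bool:
        # Silence longer than period * max_missed is treated as a failure.
        return now - self.last_heard > self.period * self.max_missed


def end_to_end_detection_latency(rto: float, max_retries: int,
                                 backoff: float = 2.0) -> float:
    """End-to-end detection (sketch): time until a sender gives up when the
    original transmission and every retransmission go unacknowledged.
    Each timeout is multiplied by `backoff` (exponential backoff, an assumption)."""
    total, timeout = 0.0, rto
    for _ in range(max_retries + 1):  # original send plus max_retries resends
        total += timeout              # wait a full timeout for an acknowledgment
        timeout *= backoff
    return total


if __name__ == "__main__":
    hb = HeartbeatDetector(period=0.1, max_missed=3)
    hb.on_heartbeat(1.0)
    print(hb.suspects(1.25), hb.suspects(1.35))  # False True
    print(end_to_end_detection_latency(0.5, 3))  # 7.5
```

With these illustrative parameters, the neighbor detector raises a suspicion after about 0.3 s of silence, while the sender waits 0.5 + 1 + 2 + 4 = 7.5 s before giving up. This qualitatively mirrors why end-to-end detection tends to have larger latencies; the numbers come only from the sketch's parameters, not from the paper's measurements.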
Original language | English |
---|---|
Title of host publication | Digest of Papers - 27th Annual International Symposium on Fault-Tolerant Computing, FTCS 1997 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 122-131 |
Number of pages | 10 |
ISBN (Electronic) | 0818678313, 9780818678318 |
Publication status | Published - 1997 |
Event | 27th Annual International Symposium on Fault-Tolerant Computing, FTCS 1997 - Seattle, United States Duration: 1997 Jun 24 → 1997 Jun 27 |
Publication series
Name | Digest of Papers - 27th Annual International Symposium on Fault-Tolerant Computing, FTCS 1997 |
---|---|
Other
Other | 27th Annual International Symposium on Fault-Tolerant Computing, FTCS 1997 |
---|---|
Country/Territory | United States |
City | Seattle |
Period | 1997 Jun 24 → 1997 Jun 27 |
Bibliographical note
Funding Information: The work reported in this paper was supported in part by the Advanced Research Projects Agency, monitored by the US Air Force Rome Laboratory under Grant F30602-95-1-0044, the National Science Foundation under Grant MIP-9203895 and the Office of Naval Research under Grant N00014-94-1-0229. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.
Publisher Copyright:
© 1997 IEEE.
All Science Journal Classification (ASJC) codes
- Computer Science Applications
- Hardware and Architecture
- Software
- Safety, Risk, Reliability and Quality