Skip to main content

Why do computers stop and what can be done about it?

An analysis of the failure statistics of a commercially available fault-tolerant system shows that administration and software are the major contributors to failure. Various approaches to software fault-tolerance are then discussed -- notably process-pairs, transactions and reliable storage. It is pointed out that faults in production software are often soft (transient) and that a transaction mechanism combined with persistent process- pairs provides fault-tolerant execution -- the key to software fault-tolerance. 

Read the article over here or alternatively here.

Ron - the problem management guy

Comments

Popular posts from this blog

LDWin: Link Discovery for Windows

LDWin supports the following methods of link discovery: CDP - Cisco Discovery Protocol LLDP - Link Layer Discovery Protocol Download LDWin from here.

Battery Room Explosion

A hydrogen explosion occurred in an Uninterruptible Power Source (UPS) battery room. The explosion blew a 400 ft2 hole in the roof, collapsed numerous walls and ceilings throughout the building, and significantly damaged a large portion of the 50,000 ft2 building. Fortunately, the computer/data center was vacant at the time and there were no injuries. Read more about the explosion over at hydrogen tools here .

STG (SNMP Traffic Grapher)

This freeware utility allows monitoring of supporting SNMPv1 and SNMPv2c devices including Cisco. Intended as fast aid for network administrators who need prompt access to current information about state of network equipment. Access STG here (original site) or alternatively here .