Minimizing Hard Disk Drive Failure and Data Loss/Environmental Control

Temperature control

Overheating is purported to be a common cause of drive failure. Overheating can cause the platters to expand. If the disk's read-and-write head comes in contact with the disk's surface, a catastrophic head crash can result.

Each drive has a specified operating temperature range with a lower and an upper bound. In addition, drives that constantly run relatively hot, i.e. near the upper bound of the operating range, are thought to have a reduced lifetime.

Inadequate ventilation, especially during the summer months, can cause a drive's temperature to exceed safe levels. In desktops, this can be handled by ensuring that a computer fan is installed near each drive to move hot air outside. Other types of computer cooling can also be used as an alternative or in addition to basic air cooling. Air conditioning can be used if the room or the area in which the computer is present becomes too hot.

Laptops can be given additional cooling with a laptop cooler, with an active cooler preferred over a passive one. This can be especially important if the drive's temperature is high.

External hard disk drives should preferably be housed in a disk enclosure that has a fan, rather than one without a fan. The absence of a fan in the enclosure can be partly compensated for by using an ordinary table fan to improve airflow around the enclosure. Stacking multiple external drives together, especially if they do not have fans, is strongly discouraged as it impedes heat transfer.

Temperature monitoring

Many drives include a temperature sensor and a thermal monitoring feature. The sensor can be queried using software, so the drive's current temperature can be monitored continuously. Two free Windows applications that do this are HD Tune and SpeedFan; on Linux the hddtemp command can be used. Several other programs are available as well. If the temperature exceeds a preset threshold, perhaps 50 °C, the monitoring application can be configured to log the event, warn the user, and shut down the drive or computer. If the drive includes a thermal monitoring feature, the drive shuts itself down if its temperature reaches a critical level, perhaps 65 °C.
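The threshold logic described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the level labels, and the action names are hypothetical, the 50 °C and 65 °C values are the example thresholds quoted above rather than values for any particular drive, and the temperature reading is assumed to have already been obtained from whatever monitoring tool is in use.

```python
# Example thresholds taken from the text above ("perhaps 50 °C" and "perhaps 65 °C").
# Real drives specify their own limits; check the drive's data sheet.
WARN_C = 50.0
CRITICAL_C = 65.0


def classify_temperature(temp_c, warn_c=WARN_C, critical_c=CRITICAL_C):
    """Classify a drive temperature (°C) as 'ok', 'warning' or 'critical'.

    Hypothetical helper: 'warning' means the reading exceeds the warning
    threshold, 'critical' means it has reached the critical level.
    """
    if warn_c >= critical_c:
        raise ValueError("warning threshold must be below the critical level")
    if temp_c >= critical_c:
        return "critical"
    if temp_c > warn_c:
        return "warning"
    return "ok"


def actions_for(level):
    """Map a level to the actions mentioned in the text (names are illustrative)."""
    if level == "critical":
        return ["log", "warn_user", "shut_down"]
    if level == "warning":
        return ["log", "warn_user"]
    return []


if __name__ == "__main__":
    # Made-up readings, standing in for values reported by a monitoring tool.
    for reading in (38.0, 52.5, 66.0):
        level = classify_temperature(reading)
        print(reading, level, actions_for(level))
```

Whether a warning should only log the event or also shut the machine down is a policy choice; the mapping above is just one possibility.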

A common misconception is that a colder hard drive will last longer than a hotter one. A 2007 study by Google found otherwise.[1] Hard drives with average temperatures below 27 °C had a higher failure rate than drives with the highest reported average temperature of 50 °C, and a failure rate at least twice that of drives in the optimum temperature range of 37 °C to 46 °C.[1]


 

Average temperatures versus annual failure rates for HDDs

It is recommended that the operating temperature of a drive not steadily exceed 47 °C, as this may disproportionately reduce its life. This, however, may not be feasible in laptops.

The 2013 University of Virginia study of 10,000 hard drives in a Microsoft datacenter found that the annual failure rate steadily increases with temperature, from about 4% per year at 27 °C to about 10% per year at 44 °C (Figure 5). Assuming an Arrhenius equation, that gives twice the number of failures for every 12 °C increase in temperature (section 6.1). The study concludes that the annual failure rate steadily increases with temperature from about 2.75% per year at 40 °C to about 6% per year at 55 °C (Table 2).[2]
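As a rough check on the doubling figure, the two Figure 5 data points quoted above can be pushed through the Arrhenius form AFR = A·exp(−Ea/(k·T)), with T in kelvin. The sketch below is a two-point back-of-the-envelope estimate, not the fitting procedure used in the paper, and the function names are hypothetical.

```python
import math

KELVIN_OFFSET = 273.15  # 0 °C expressed in kelvin


def arrhenius_slope(t1_c, afr1, t2_c, afr2):
    """Return Ea/k (in kelvin) implied by two (temperature °C, failure rate) points.

    From AFR = A * exp(-Ea / (k * T)):
        ln(afr2 / afr1) = (Ea / k) * (1/T1 - 1/T2)
    """
    inv_t1 = 1.0 / (t1_c + KELVIN_OFFSET)
    inv_t2 = 1.0 / (t2_c + KELVIN_OFFSET)
    return math.log(afr2 / afr1) / (inv_t1 - inv_t2)


def doubling_interval_c(start_c, ea_over_k):
    """Temperature rise (°C) above start_c at which the failure rate doubles."""
    inv_t = 1.0 / (start_c + KELVIN_OFFSET) - math.log(2.0) / ea_over_k
    return 1.0 / inv_t - KELVIN_OFFSET - start_c


if __name__ == "__main__":
    # Figure 5 values quoted above: about 4% at 27 °C and about 10% at 44 °C.
    slope = arrhenius_slope(27.0, 4.0, 44.0, 10.0)
    print("Ea/k in kelvin:", round(slope))
    print("doubling interval from 27 °C:", round(doubling_interval_c(27.0, slope), 1))
```

With these two points the doubling interval starting from 27 °C comes out at roughly 12 to 13 °C, in line with the figure of about 12 °C given above. Because the Arrhenius form depends on 1/T rather than T, the interval grows slightly at higher starting temperatures.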


 

To do:
It seems inconsistent to say the failure rate increases with temperature from 10% at 44 °C to 6% at 55 °C. Fix that problem in the above paragraph. Is it a simple typo?


The 2014 Backblaze survey of 34,000 hard drives found no correlation between temperature and failure rate.[3]

As in most engineering fields, it is highly recommended that the Celsius scale be used for managing computer temperatures, even in the U.S.

Unreadable sensor data

At times, a drive may include a temperature sensor, but the temperature data may not be readable. This is possible under at least three conditions:

  1. The drive is part of a RAID. Especially in the case of hardware-based RAID, the drive itself will not be seen by the operating system; only the logical RAID drive will be seen.
  2. The drive is connected to a controller card, irrespective of whether or not the card implements RAID.
  3. The drive is external to the computer.

In such situations, it might be possible to glue or tape an external temperature sensor to the drive's surface. Alternatively, if the drive is in a storage backplane, the backplane may have a built-in temperature sensor with a configurable threshold. With such external sensors, the temperature threshold can be set a few degrees lower, perhaps by 5 °C, than the threshold used for an internal sensor.

Condensation control

If a computer is moved from a cold place, such as outdoors, to a relatively warm place, such as indoors, it can result in condensation inside the drive and on other system components. Damage can ensue if the condensation is not given sufficient time to evaporate before the device is powered on. Depending upon the change in the device's temperature, the time needed to acclimatize the device can be up to several hours. Rapid and extreme temperature changes should be avoided for this reason.
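Condensation forms on a surface that is colder than the dew point of the surrounding air. As a rough illustration, the sketch below estimates the dew point with the Magnus approximation and checks whether a cold device is at or below it. The coefficients are a commonly quoted set for the Magnus formula, and the function names and example values are assumptions made for this sketch, not part of the guidance above. The check does not say how long acclimatization takes.

```python
import math

# Commonly quoted Magnus coefficients for water vapour over liquid water.
MAGNUS_A = 17.62
MAGNUS_B = 243.12  # °C


def dew_point_c(air_temp_c, relative_humidity_pct):
    """Approximate dew point (°C) of air at the given temperature and humidity."""
    if not 0.0 < relative_humidity_pct <= 100.0:
        raise ValueError("relative humidity must be in the range (0, 100]")
    gamma = (math.log(relative_humidity_pct / 100.0)
             + MAGNUS_A * air_temp_c / (MAGNUS_B + air_temp_c))
    return MAGNUS_B * gamma / (MAGNUS_A - gamma)


def condensation_risk(device_temp_c, room_temp_c, room_humidity_pct):
    """True if the device is at or below the room air's dew point."""
    return device_temp_c <= dew_point_c(room_temp_c, room_humidity_pct)


if __name__ == "__main__":
    # Illustrative values: a device at 0 °C brought into a 20 °C room at 50% humidity.
    print(round(dew_point_c(20.0, 50.0), 1))
    print(condensation_risk(0.0, 20.0, 50.0))
```

For a room at 20 °C and 50% relative humidity the estimated dew point is a little over 9 °C, so a device brought in at 0 °C should be left powered off until it has warmed up and any moisture has evaporated.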

Air quality control

Tobacco smoke and other particulates in the air near a computer may adversely affect the drive. Smoking in the presence of a computer is therefore discouraged. Particulate reduction, if necessary, may be achieved by the use of an effective air purifier.

Vibration control

Powerful vibrations near the computer, such as those produced by a subwoofer, may increase the risk of a head crash and of data corruption. Accordingly, such vibrations should be limited. One way to isolate low-frequency vibrations is to support the speakers or the drive enclosure on spikes.

Motion control

Sudden movement or acceleration of the computer, especially when it is powered on, may damage the drive. Laptops are especially prone to such damage. Such movements should therefore be avoided.

An external drive, if placed upright without a stand, is at risk of tipping over and falling while in use, possibly causing damage. Laying it flat on a level surface avoids this risk.

Shipment damage control

During shipment, a drive is at risk of being damaged due to shock and vibrations. Adequate cushioning can be used to reduce the risk of damage.

Magnetic field control

 
Strong magnets are very harmful to hard disks and should be kept away from computers at all costs.

Data is stored on a drive using magnetism. An external device with a strong magnetic field has a risk of causing data loss if the device is brought close to the computer. Such devices often come with a warning notice which states the minimum distance they are to be kept away from other electronic devices such as computers.

One lesser-known source of magnetism is speakers. All speakers contain a magnet that allows them to produce sound. Large speakers, such as subwoofers, should be kept away from computers for this reason. In computer speakers, the magnets are small enough that they will not harm a hard disk when the speakers are placed near the computer.

Further reading

  1. Eduardo Pinheiro, Wolf-Dietrich Weber and Luiz André Barroso (February 2007). "Failure Trends in a Large Disk Drive Population". 5th USENIX Conference on File and Storage Technologies (FAST 2007). http://research.google.com/archive/disk_failures.pdf. Retrieved 2008-09-15.
  2. Sriram Sankar, Mark Shaw, Kushagra Vaid and Sudhanva Gurumurthi (2013). "Datacenter Scale Evaluation of the Impact of Temperature on Hard Disk Drive Failures". doi:10.1145/2491472.2491475.
  3. Brian Beach (May 12, 2014). "Hard Drive Temperature – Does It Matter?". http://blog.backblaze.com/2014/05/12/hard-drive-temperature-does-it-matter/.