NAS disk failure

It is starting to look like a tradition: beginning the year with a failing disk in my TrueNAS-based DIY NAS. It started last Friday afternoon when a mail arrived:

New alerts:
* Device: /dev/ada6, 1 Currently unreadable (pending) sectors.

Current alerts:
* Device: /dev/ada6, 1 Currently unreadable (pending) sectors.

Well, while an unreadable pending sector is not a good sign, it’s also not necessarily a big issue. However, during the next scheduled pool scrub, another mail arrived:

New alerts:
* Pool volume0 state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.

Current alerts:
* Device: /dev/ada6, 1 Currently unreadable (pending) sectors.

* Pool volume0 state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.

And five minutes later:

New alert:
* Pool volume0 state is DEGRADED: One or more devices are faulted in response to persistent errors. Sufficient replicas exist for the pool to continue functioning in a degraded state.
The following devices are not healthy:

* Disk WDC WD60EFRX-68L0BN1 WD-WX11DC7FE3KN is FAULTED

Seriously? That disk only had 65 959 power-on hours, equaling 7 years, 193 days and 7 hours. [Spoiler Alert: It may still get another chance]
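For anyone wanting to check the arithmetic, here is a small Python sketch of the conversion. It assumes a year is counted as 365 days, ignoring leap days; the function name is made up for illustration.

```python
# Convert a SMART power-on hours value into years, days and hours.
# Assumption: one year = 365 days (leap days are ignored).
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365


def power_on_breakdown(hours: int) -> tuple[int, int, int]:
    """Split a number of hours into (years, days, hours)."""
    total_days, rem_hours = divmod(hours, HOURS_PER_DAY)
    years, days = divmod(total_days, DAYS_PER_YEAR)
    return years, days, rem_hours


years, days, hours = power_on_breakdown(65_959)
print(f"{years} years, {days} days and {hours} hours")
```

With 65 959 hours this prints 7 years, 193 days and 7 hours.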

Taking no chances, even though the pool runs RAIDZ2, I replaced the disk at the first available chance. In my drawer I found the rehabilitated Toshiba NAS N300 disk, which gave me a scare a year ago, and swapped it with the now failing WD Red WD60EFRX. After resilvering, the pool was back to ONLINE status.

Now the WD Red disk is being tested with “badblocks” and so far it seems to be okay. Writing the pattern “0xaa” has completed, as has reading and comparing it, and so far no errors have been encountered.

The picture is a screenshot of an Ubuntu Linux shell running a badblocks test of an HDD. The test has completed the first pattern and 21.82% of the second pattern without errors.
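The exact badblocks invocation is not shown in this post, so the following is only a sketch of what a typical one looks like, written as a small Python helper that builds the command line without running it. The device name /dev/sdX is a placeholder and the helper name is hypothetical. Note that the write-mode test (-w) erases everything on the disk.

```python
import shlex


def badblocks_command(device: str, destructive: bool = True) -> list[str]:
    """Build a badblocks command line; nothing is executed here."""
    # -w: destructive write-mode test (writes patterns, reads them back)
    # -n: non-destructive read-write test
    mode = "-w" if destructive else "-n"
    # -s: show progress, -v: verbose output
    return ["badblocks", mode, "-s", "-v", device]


# /dev/sdX is a placeholder, not the device used in this post.
cmd = badblocks_command("/dev/sdX")
print(shlex.join(cmd))
```

Building the command as a list keeps it easy to inspect before handing it to something like subprocess.run, which is worth doing for a command that wipes a disk.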

Most likely the offending sector has been remapped by the disk. It will be another 72 hours or so before the testing is complete and then I will run another Extended Offline SMART test before deciding the future of the disk, but I am hopeful so far.

My offsite backup NAS has also started complaining about a failing disk, so if the WD Red disk clears the tests it will probably be relocated/relegated to this secondary yet critically important position.

About Uffe R. B. Andersen

Uffes weblog