On Christmas Eve (2024-12-24), the night before I left the country for a month-long vacation in India, I accidentally nuked my home server. I had bought a used 1.92 TB SSD to be one of two drives for storing my media, because my single mirror vdev was reaching 70% capacity. The plan was to buy the second drive when I returned so that I could create a separate ZFS zpool with a single mirror vdev.
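The capacity math behind that plan is simple enough to sketch. In a mirror vdev every drive holds a full copy of the data, so usable space is roughly the size of the smallest member. The Python below is a minimal illustration with hypothetical helper names. It assumes decimal-terabyte drive labels and ignores ZFS metadata and slop-space overhead, so real figures will come out somewhat lower.

```python
def mirror_usable_tb(drive_sizes_tb: list[float]) -> float:
    """Approximate usable capacity of one mirror vdev.

    Every member stores a full copy of the data, so the smallest drive
    limits the vdev. ZFS metadata and slop space are ignored here.
    """
    if len(drive_sizes_tb) < 2:
        raise ValueError("a mirror vdev needs at least two drives")
    return min(drive_sizes_tb)


def tb_to_tib(tb: float) -> float:
    """Convert decimal terabytes (10**12 bytes, as printed on the drive)
    to tebibytes (2**40 bytes, the binary units ZFS tools report)."""
    return tb * 10**12 / 2**40


def over_threshold(used_tb: float, usable_tb: float, threshold: float = 0.70) -> bool:
    """True once the pool is at or past the chosen fill level."""
    return used_tb / usable_tb >= threshold


# Two hypothetical 1.92 TB SSDs in one mirror.
usable = mirror_usable_tb([1.92, 1.92])
print(f"usable: {usable} TB, about {tb_to_tib(usable):.2f} TiB")
```

The 0.70 default just echoes the 70% fill level mentioned in the story; pick whatever threshold suits your own pool.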
I wanted to verify that my “new” SSD had relatively few power-on hours and
that it had consumed less than 10% of its rated terabytes written, or TBW (it
had, which made this used SSD an economical purchase). While that SSD was
plugged into my server, I planned to use it as a hot spare until I returned
from my vacation and bought the second SSD to complete my media zpool. To verify
power-on hours and terabytes written, I wanted to run the smartctl command-line
tool against the SSD. The SATA power cable running from my PSU to my four
existing drives had no free connectors left for my “new” fifth SSD, and I also
couldn’t find the PSU box where I keep my spare cables. I did,
however, have the box for my workstation’s PSU, and it had some extra cables.
This was my first mistake. I decided to use one of those cables, plugging one
end into my server’s PSU and the other into the new SSD. I figured that these
cables, like most computer cables, were standardized. However, when I plugged that
cable in, my server wouldn’t POST or boot. With that cable removed, it booted
normally. Thinking I might be using the wrong port on my modular PSU, I tried
every port with the same shape and pin layout. This was my second mistake. Now
my server wouldn’t POST or boot when any SATA power cable, even its original
ones, was plugged into the PSU. I had effectively made my
server unusable, the night before I was leaving the country for a month. I
used a SATA-to-USB enclosure I had lying around to check that at least one of
the drives was still OK, and it was. My zeroth mistake was that I could have
simply used this enclosure to check the power-on hours on a separate computer;
I never needed to plug the new SSD into my server in the first place.
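Checking wear on a used SSD boils down to two numbers from smartctl. The sketch below is a hedged illustration, not a polished tool. It parses a hypothetical excerpt of "smartctl --json -a /dev/sdX" output, and the JSON field names are my assumption about smartctl's format, so check them against your smartmontools version. It also assumes attribute 241 (Total_LBAs_Written) counts 512-byte logical blocks, which varies by vendor, and it uses a made-up endurance rating of 1 drive write per day over 5 years. Because smartctl can often read SMART data through a USB-SATA bridge (sometimes only with "-d sat"), the same check could have been run with the drive in the enclosure on another computer.

```python
import json

# Hypothetical excerpt of `smartctl --json -a /dev/sdX` output. Field names
# are assumed; verify them against your smartmontools version.
SAMPLE = """
{
  "logical_block_size": 512,
  "power_on_time": {"hours": 9000},
  "ata_smart_attributes": {"table": [
    {"id": 9, "name": "Power_On_Hours", "raw": {"value": 9000}},
    {"id": 241, "name": "Total_LBAs_Written", "raw": {"value": 500000000000}}
  ]}
}
"""


def rated_tbw(capacity_tb: float, dwpd: float, warranty_years: float) -> float:
    """Endurance implied by a drive-writes-per-day rating, in TB."""
    return capacity_tb * dwpd * 365 * warranty_years


def wear_summary(smart: dict, rated_tb: float) -> dict:
    """Return power-on hours, TB written, and percent of rated TBW used.

    Assumes attribute 241 counts logical blocks of `logical_block_size`
    bytes; some vendors use other units, so check the drive's datasheet.
    """
    hours = smart["power_on_time"]["hours"]
    block = smart.get("logical_block_size", 512)
    attrs = {a["id"]: a for a in smart["ata_smart_attributes"]["table"]}
    lbas = attrs[241]["raw"]["value"]
    tb_written = lbas * block / 10**12
    return {
        "power_on_hours": hours,
        "tb_written": tb_written,
        "pct_of_rated": 100 * tb_written / rated_tb,
    }


# Hypothetical rating: 1 DWPD over 5 years for a 1.92 TB drive.
limit = rated_tbw(1.92, dwpd=1, warranty_years=5)
summary = wear_summary(json.loads(SAMPLE), limit)
print(summary)
print("under 10% of rated TBW:", summary["pct_of_rated"] < 10)
```

With these sample numbers the drive has written 256 TB, about 7% of the hypothetical rating.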
How I Nuked My Home Server
It turns out that modular PSU peripheral cables are NOT interchangeable or standardized: the pin-out on the PSU side differs between PSUs. By using a cable from another PSU with a different pin-out, I had effectively sent the wrong voltages to the wrong pins on my SSD. Huge yikes. I ended up shorting that port on my PSU, and the PSU shut off to protect my SSDs; my Seasonic PSU did its job as well as it could have. By then trying every peripheral port on my modular PSU, I shorted all of them, making it impossible to power any of my SSDs. My server became unusable the night before I left for a month-long vacation.
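A toy model makes the failure easier to picture. The drive end of a SATA power cable is standardized (3.3 V, 5 V, 12 V and ground), but the PSU end is not, so a cable wired for one PSU can route the wrong rails to the drive when plugged into another. In the sketch below, every pin number and rail assignment is invented for illustration and does not describe any real PSU, including my Seasonic.

```python
# Illustration only: pin numbers and rail assignments are invented and do
# not describe any real PSU. The drive-side SATA power connector is
# standardized (3.3 V, 5 V, 12 V and ground); the PSU-side socket is not.

# Rail that each pin of a hypothetical server PSU's modular socket carries.
SERVER_PSU_SOCKET = {1: "12V", 2: "GND", 3: "GND", 4: "5V", 5: "3.3V", 6: "GND"}

# Drive-side rail that each PSU-side pin of a cable built for a *different*
# (hypothetical) PSU is wired to.
FOREIGN_CABLE = {1: "5V", 2: "GND", 3: "GND", 4: "12V", 5: "GND", 6: "3.3V"}


def delivered_rails(socket: dict[int, str], cable: dict[int, str]) -> dict[str, set[str]]:
    """For each drive-side rail, collect what the socket actually puts on it."""
    delivered: dict[str, set[str]] = {}
    for pin, drive_rail in cable.items():
        delivered.setdefault(drive_rail, set()).add(socket[pin])
    return delivered


def mismatches(socket: dict[int, str], cable: dict[int, str]) -> dict[str, set[str]]:
    """Drive-side rails that receive anything other than their own voltage."""
    return {
        rail: got
        for rail, got in delivered_rails(socket, cable).items()
        if got != {rail}
    }


# A cable made for this PSU matches the socket pin for pin: no mismatches.
print(mismatches(SERVER_PSU_SOCKET, SERVER_PSU_SOCKET))
# The foreign cable puts 12 V on the drive's 5 V line, 5 V on its 12 V line,
# and joins the PSU's 3.3 V rail to ground through the drive's ground pins,
# which is a short circuit.
print(mismatches(SERVER_PSU_SOCKET, FOREIGN_CABLE))
```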
Lessons Learned
- NEVER perform server maintenance right before going on vacation. I can roll back software changes; I can’t roll back hardware changes.
- Cables for modular PSUs are NOT standardized or interchangeable like most other cables. Only use cables from the same manufacturer that are specifically labeled as compatible with your PSU model.
- Never perform server maintenance past 7 PM.
- ALWAYS HAVE BACKUPS.
- TEST YOUR BACKUPS.
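One hedged way to act on that last lesson is to restore a backup to a scratch location and compare it, file by file, against the live data. The minimal Python sketch below uses only the standard library and hypothetical function names. It assumes both trees are readable from one machine and treats the source as the reference, so files that exist only in the backup are ignored.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large media files never sit fully in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_backup(source: Path, backup: Path) -> list[str]:
    """Return source-relative paths that are missing from, or differ in, the backup."""
    problems = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = backup / rel
        if not dst.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(src) != sha256_of(dst):
            problems.append(f"differs: {rel}")
    return problems
```

For example, after restoring a backup to a scratch directory, verify_backup(Path("/tank/media"), Path("/tmp/restore-test/media")) should return an empty list; both paths are hypothetical. Hashing reads every byte of both trees, so expect it to take a while on a large media library.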
Aftermath
I had to buy a new $100 PSU, but all of my drives were safe. Even if they hadn’t been, my backups would have saved me. This was the first time my backups might have really come in clutch. It was a somewhat expensive mistake, but a good-quality PSU and regular, well-tested backups prevented any real damage.
The worst that happened was that I didn’t have access to my JupyterLab environment while in India and had to pay $100 for a new PSU, which is not so bad, all things considered.