Learnt a tough lesson today, one I hadn't been able to figure out for months. Goes to show that for all the lessons you learn from books, mistakes are the best teachers.
Problem: An important network server had a peculiar reboot problem. While it was running, it functioned absolutely fine. The moment I had to restart it for maintenance, it became a pain: it just would not power back up. The fans worked and the hard drives spun, but there was no POST beep and no display at all, not even the BIOS message.
Diagnostic Analysis: The problem started several months ago (over 6 months, maybe even a year) when the server refused to start after a reboot. It was an old Gateway server with a PIII, bought in 2001. I thought it was dying and that it was time to replace it. But of course, the budget didn't accommodate that. OK fine, let's stick to the technical stuff. The power connection seemed fine, the CPU fan was working, and the hard drives were spinning, but there was no POST beep, no BIOS display on screen, nothing. Stumped. I called Gateway, and they were no help. Since the server was out of warranty, all I got was some phone support, and they said it was a problem with the motherboard. Which I had figured as well. In a desperate attempt, I unplugged everything (memory, cables, power) and reseated it all. I did this a couple of times, thinking it was just a matter of a bad connection somewhere. After countless attempts it magically booted. I thanked my stars and thought I had fixed the problem. This was several months ago.
Around October 2006 or so, I had to restart the server and did it apprehensively. Same problem. It just would not boot back up. Again I took its guts apart and put them back together several times. The VGA connection seemed a bit less than snug. I had no idea why that could cause a problem, but I held it hard against its socket and powered up. That seemed to do the trick after several tries. Again I thought I had figured out the problem: the VGA adapter connection on the motherboard had gone bad. So I ordered a new motherboard (~$200) to replace it, and waited until the next time I had to restart the server.
Fast forward to today. Several updates were due on this server and I really had to restart it. Bad idea. It just would not boot back up. For the first time (I think), I noticed a blinking amber light on the front panel. I Googled it and learned that a blinking amber light possibly meant a power problem.
Although power light codes (or system beeps) may differ from vendor to vendor, there is some uniformity. A solid amber light meant that the computer was receiving power, but that there might be an internal power problem. I considered that this might be the real problem and swapped in the replacement motherboard I had ordered. I think this is the first time I have replaced a motherboard by myself. Removing the heat sink and then the processor was the most worrisome part. Anyway, I digress. I put the connections back exactly as they were on the previous motherboard. Prayed and powered up. No POST beep. Darn! I called Gateway again, hoping they would give me some checklist for replacing the motherboard. No, I had done everything right. The support rep thought that the new motherboard was DOA and that I should call the vendor for tech support. He had no idea what the blinking amber light could mean. Useless guy. Typical response of blaming the other guy. What did I expect.
I was about to give up when I reconsidered the power light code. A blinking amber light meant that a device might be malfunctioning or not correctly installed. Not necessarily an internal motherboard problem (which clearly had to be ruled out after replacing the board). I looked at all the devices I had. I had already removed every card from the motherboard that didn't need to be there. And then I remembered an old lesson I had learnt and, for whatever stupid reason, had been overlooking all this time.
I had installed an extra hard drive a long time ago (probably about the time the problem started) when the server was running low on disk space. I had no tools to verify that the power supply could support this extra device. This is a very basic thing to check when you are building a computer from scratch: after you choose a motherboard, you find a power supply that is more than sufficient to support all the devices. I learnt this specifically when studying for the CompTIA A+ exam. This system came pre-built from Gateway, and most desktops in my experience came with enough supply to support hard drive upgrades. For whatever cheap reason, Gateway probably decided to use a power supply that was just barely sufficient. I removed the extra hard drive and presto, problem solved. It booted up immediately. Phew!
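The power-supply check described above can be sketched as a simple budget calculation. This is only a rough illustration: every wattage figure below is a made-up placeholder (not a measurement from this Gateway server), the `Device` class and `power_budget` function are hypothetical names, and the 20% safety margin is an assumption. The sketch separates running draw from power-on draw because hard drives draw more while spinning up than when idle; whether that is exactly what happened here is a guess, but it would fit a server that ran fine and only failed at power-up.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    running_watts: float   # steady-state draw (placeholder value)
    startup_watts: float   # peak draw at power-on, e.g. drive spin-up (placeholder)


def power_budget(psu_watts, devices, margin=0.2):
    """Compare total device draw against the PSU rating minus a safety margin."""
    usable = psu_watts * (1 - margin)
    running = sum(d.running_watts for d in devices)
    startup = sum(d.startup_watts for d in devices)
    return {
        "usable": usable,
        "running": running,
        "startup": startup,
        "ok_running": running <= usable,
        "ok_startup": startup <= usable,
    }


# Hypothetical numbers for illustration only.
base = [
    Device("motherboard + CPU", 110, 130),
    Device("hard drive 1", 10, 30),
    Device("hard drive 2", 10, 30),
]
extra_drive = Device("extra hard drive", 10, 30)

before = power_budget(250, base)
after = power_budget(250, base + [extra_drive])

print("before:", before["ok_running"], before["ok_startup"])
print("after: ", after["ok_running"], after["ok_startup"])
```

With these placeholder figures the extra drive still fits the running budget but pushes the power-on total past the usable limit. Real numbers would have to come from the device labels or datasheets and the rating printed on the power supply.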
Side note: A side effect of replacing the motherboard is that the CMOS settings, including the clock, get reset. The server, which was a Domain Controller, was unable to log in to its own network and gave an error message saying that its time was different from the network time. Kerberos authentication rejects logons from a computer whose clock differs from the network's by more than 5 minutes (the default tolerance). Since it was a domain controller, I could not even log on locally to reset the time. I had to restart to set the date and time in the BIOS. I wasn't sure if I should. After all, I had been wrong twice. Oh well, I took the risk, because this time I knew the solution without guessing and was pretty sure I had fixed the problem. And yes, it worked. It restarted just fine. I set the correct date and time in the BIOS. And now I have a working network server again. :)
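The clock check behind that logon failure boils down to comparing two timestamps against a tolerance. Here is a minimal sketch of the idea, assuming the 5-minute Kerberos default; the function name `within_kerberos_skew` and the example timestamps are made up for illustration, and real Kerberos implementations do this check inside the protocol rather than in user code.

```python
from datetime import datetime, timedelta, timezone

# Default Kerberos clock-skew tolerance; administrators can change it.
MAX_SKEW = timedelta(minutes=5)


def within_kerberos_skew(local_time, reference_time, max_skew=MAX_SKEW):
    """Return True if the two clocks differ by no more than max_skew."""
    return abs(local_time - reference_time) <= max_skew


# Hypothetical timestamps: a clock reset to an old default date
# versus the network's notion of the current time.
network_time = datetime(2007, 2, 5, 12, 0, tzinfo=timezone.utc)
reset_clock = datetime(2001, 1, 1, 0, 0, tzinfo=timezone.utc)
slightly_off = network_time + timedelta(minutes=3)

print(within_kerberos_skew(reset_clock, network_time))
print(within_kerberos_skew(slightly_off, network_time))
```

A clock that has fallen back years fails the check by a wide margin, which is why setting the date and time in the BIOS was enough to get logons working again.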
Moral Of The Story: Sometimes the simplest answers to the toughest puzzles lie in the assumptions you make by default.
Monday, February 05, 2007