Failed B3 (another)
-
- Posts: 904
- Joined: 09 Oct 2009, 18:49
Failed B3 (another)
I *think* my B3 died this morning but I'm not sure. Prepare for a long post.
I went to bed last night with everything working OK.
This morning, I woke up to Pingdom alerts saying my website (hosted on B3) had been down since about 06:30.
The power LED was on and I could see my SSID on my phone. My phone would briefly connect to the wifi but would disconnect again within about 5 seconds.
So I rebooted B3 but it stayed on the pink LED as if the boot up process had hung. I left it and went to work.
When I got home the LED was blue but I could not connect to B3 over SSH on any wired interface or wifi. I booted up from a rescue stick which worked fine. The logs didn't show anything unusual and in fact, it would appear the B3 had been running happily most of the day, albeit with no network connectivity. fsck reported no problems with the disk.
Several hours of playing around with different files (e.g. disabling fail2ban, modifying iptables rule files) and most of the time B3 stays on a pink LED when booting. Occasionally, and not consistently, it would get to a blue LED after a very long wait but still with no connectivity. Twice in a row, I successfully connected to it after modifying /etc/network/interfaces by commenting out the br0 interface and assigning static IP addresses to both eth0 and eth1. After rebooting and uncommenting br0, I was back to the pink LED again. So I thought this was the fix and commented out br0 again, but the B3 has failed to boot successfully since, so I think this was just coincidence.
So, although the disk appeared OK, I thought it must be the problem. I took out the disk and put in a known working disk that came with the B3 but currently lives in a B2. The first attempt to install from USB seemed to be working initially but it ended up not rebooting and staying on the pink LED. The second attempt to install is still running, and also currently on pink LED. I will leave it until the morning and then try to boot the thing (also, the USB stick will hopefully contain an install log, not sure how verbose they are though).
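For anyone wanting to repeat the br0 experiment without hand-editing, here is a minimal Python sketch that comments out one interface in text laid out like /etc/network/interfaces. Treat it as an illustration only: `comment_out_iface` is a made-up helper, and the sample stanzas, addresses and bridge ports are invented, not copied from this B3.

```python
# Keywords that start a new stanza in an interfaces-style file.
STANZA_KEYWORDS = ("iface", "auto", "allow-", "mapping", "source")


def comment_out_iface(text, name):
    """Comment out the 'auto NAME' line and the whole 'iface NAME' stanza."""
    out = []
    in_target = False  # True while inside the 'iface NAME ...' stanza
    for line in text.splitlines():
        stripped = line.strip()
        words = stripped.split()
        if stripped.startswith(STANZA_KEYWORDS):
            # A new stanza starts here; check whether it belongs to NAME.
            in_target = (
                words[0] == "iface" and len(words) > 1 and words[1] == name
            )
            # Only handles 'auto'/'allow-*' lines naming NAME alone.
            is_auto_line = (
                words[0].startswith(("auto", "allow-")) and words[1:] == [name]
            )
            if in_target or is_auto_line:
                line = "#" + line
        elif in_target and stripped and not stripped.startswith("#"):
            # Option line (address, netmask, ...) of the target stanza.
            line = "#" + line
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical sample, NOT the real file from this machine.
SAMPLE = """\
auto eth0
iface eth0 inet static
    address 192.168.1.10
    netmask 255.255.255.0

auto br0
iface br0 inet static
    address 192.168.10.1
    netmask 255.255.255.0
    bridge_ports eth1 wlan0
"""

if __name__ == "__main__":
    print(comment_out_iface(SAMPLE, "br0"))
```

Working on a copy of the file and diffing it against the original before rebooting makes it easier to tell whether a boot failure really follows the br0 change or is just coincidence.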
Anything obvious I'm missing? I'm thinking maybe the disk controller has failed.
Re: Failed B3 (another)
Sounds to me like you've tried everything possible, so I don't think I can supply anything useful. In my experience strange intermittent behaviour like this often has to do with bad capacitors, so you might look at the mainboard and see whether any capacitors are swollen (just use Google image search to find out what to look for).
/Daniel
-
- Posts: 904
- Joined: 09 Oct 2009, 18:49
Re: Failed B3 (another)
Thought of that, but there aren't any leaking or swollen capacitors.
-
- Posts: 904
- Joined: 09 Oct 2009, 18:49
Re: Failed B3 (another)
OK, spare disk with a fresh install is working consistently so it's the disk/filesystem.
Despite fsck on the rescue stick finding no errors, fscking the disk in Ubuntu did find and fix a few errors, and after that the B3 booted OK a few times in a row. However, I'd edited the /etc/network/interfaces file again as before, and as soon as I uncommented the br0 lines it refused to boot again. So is it the disk, is it a NIC, is it the wifi card failing? I don't know. Arghh!
-
- Posts: 904
- Joined: 09 Oct 2009, 18:49
Re: Failed B3 (another)
So my previous statement that the bootup process was hanging was incorrect. What it does appear to be doing is fscking the disk on every boot. WTH? If I manually fsck every lv/partition it shows up no errors on the disk, /var/log/fsck/checkfs and checkroot both state the filesystem is clean. Searched for bad blocks with fsck and nothing. I even overwrote every sector on the root partition with dd and it didn't find any write errors.
The partition has since been restored from an image and it's back to the same thing. I've also done a load of SMART tests and physically, the disk looks fine. Anyone know what would be making the filesystem dirty all the time?
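One rough way to see why a check keeps being scheduled is to look at the superblock fields printed by `tune2fs -l` for each partition or LV. The Python sketch below is only an illustration under stated assumptions: it assumes an ext2/3/4 filesystem, `parse_tune2fs` and `boot_fsck_reasons` are made-up helpers, and the sample text is invented rather than taken from this disk. It only covers the filesystem state and mount-count triggers, so an empty result does not rule out other causes.

```python
def parse_tune2fs(text):
    """Turn the 'Key:   value' lines of `tune2fs -l` output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so date values keep their colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def boot_fsck_reasons(fields):
    """List superblock conditions that would make e2fsck run a full check."""
    reasons = []
    state = fields.get("Filesystem state", "")
    if state and state != "clean":
        reasons.append("filesystem state is '%s'" % state)
    try:
        mounts = int(fields["Mount count"])
        max_mounts = int(fields["Maximum mount count"])
    except (KeyError, ValueError):
        return reasons  # fields missing or unparsable: nothing more to say
    # A maximum of 0 or -1 disables the mount-count check.
    if max_mounts > 0 and mounts >= max_mounts:
        reasons.append(
            "mount count %d has reached the maximum of %d" % (mounts, max_mounts)
        )
    return reasons


# Invented sample output, NOT from the disk discussed in this thread.
SAMPLE = """\
Filesystem state:         clean
Mount count:              3
Maximum mount count:      1
Last checked:             Sat Oct  1 10:00:00 2011
"""

if __name__ == "__main__":
    for reason in boot_fsck_reasons(parse_tune2fs(SAMPLE)):
        print(reason)
```

If the state reads clean and the mount count is well under the maximum, the forced check is probably coming from somewhere else, such as the check interval, the system clock, or an unclean unmount at shutdown.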
Re: Failed B3 (another)
A crappy connector might, but for SATA that is rather unlikely. In PATA-land it was rather common to see this, and I always had a spare cable available. Other than that there's not much really on the hardware front, but a crap driver with bad timings can cause this kind of behaviour too.
-
- Posts: 904
- Joined: 09 Oct 2009, 18:49
Re: Failed B3 (another)
Thanks nobody. Funny you should mention crappy drivers because my previous hint about editing the interfaces file has led me down a path. The fsck at reboot only seems to happen if I play with the network interfaces. It appears that br0 isn't coming up at boot time. Still looking at this though but I'm thinking about the very first time I noticed a problem and I couldn't get connected to the wifi. So...maybe a problem with the wifi drivers or even a hardware failure on the wifi card.
Re: Failed B3 (another)
Interesting. But there is nothing really that connects wifi drivers with the low-level HDD block addressing that buggers the partitions.
Except bad memory.
Re: Failed B3 (another)
Reading back my own posts I seem to be jumping from one theory to another without substantial evidence for any of them. It's like a geek version of a really bad episode of House.
Re: Failed B3 (another)
Could it be a problem with a kernel panic at shutdown? I had that on my Ubuntu box, where it segfaulted when unmounting an autofs mount point. See this bug report: umount segfault on shutdown when unmounting autofs mountpoint.
-
- Posts: 904
- Joined: 09 Oct 2009, 18:49
Re: Failed B3 (another)
Well, I've run every test on the drive I can think of, including SMART and badblocks, and I'm sure it's OK, so I'm just going to reinstall. Fortunately, this won't be too onerous as I have a decent set of backups.
-
- Posts: 904
- Joined: 09 Oct 2009, 18:49
Re: Failed B3 (another)
To bring this sorry saga to a conclusion...
I'm still not 100% convinced the disk was bad but everything is working fine with a new one so I'm writing the old one off.