Diagnose Hardware Failures

Won’t Power On

If a computer won’t turn on, any of several components may have failed. The only way to know for sure which one is to test the system with nothing attached, so disconnect everything that can be removed: the hard drives, Wi-Fi card, RAM (all but one stick), and video cards (on desktops with on-board graphics). The only thing the system needs to boot is one stick of RAM in slot 0. If it doesn’t boot, try different RAM sticks in slot 0 to test for failed RAM. Also remove the CMOS battery and, on laptops, disconnect the main battery for one minute. We don’t recommend removing the CPU as a test.

If the system boots with everything removed, add the components back one by one to see which one is causing the problem. If everything works fine after removing and reinstalling all of the hardware, a loose connection was most likely the culprit. If the system won’t boot with everything disconnected, then the motherboard has likely failed and needs to be replaced.

System Boots

If the system boots but takes a long time to boot, crashes, or reports other random, hard-to-track-down errors, then the individual hardware components can be checked for failure.

Memory

To run a memory test on your computer, we need to use a live disk of Ubuntu. We also need to change the BIOS settings from UEFI mode to BIOS mode. If you press the key indicated on boot to get into BIOS (F2 for laptops, and DEL for most desktops), there will be a toggle between the two modes. To create a bootable copy of Ubuntu, please follow these instructions:

Create an Installation USB: Using Ubuntu, Using Windows, Using Mac OS X
Create an Installation DVD: Using Ubuntu, Using Windows, Using Mac OS X

Once you have switched to BIOS mode, restart and use the boot-menu key (F7 for laptops, and F12 for most desktops) to select the USB drive. Right after you select it, start tapping the ESC key to get into the GRUB boot menu. If you accidentally land at a GRUB command prompt, type in the word normal, press Enter, then immediately press ESC. The GRUB menu is available for only a second, so if you miss the opportunity, turn your computer off and try again.

In the GRUB boot menu, choose Memory test (memtest86+) and let it run overnight to check for any memory errors. If memory errors show up, the memory stick should be replaced.
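If it is unclear whether the switch to BIOS mode took effect, the firmware mode can be checked from any running Linux session, such as the live USB. The sketch below relies on the usual Linux convention that the /sys/firmware/efi directory exists only after a UEFI boot. The function name boot_mode and its optional path argument are illustrative and not part of any tool.

```shell
# Report the firmware mode the running Linux session booted in.
# Assumption: /sys/firmware/efi exists only when the kernel was started via UEFI.
# The optional argument (a hypothetical override of that path) exists for testing.
boot_mode() {
    efi_dir="${1:-/sys/firmware/efi}"
    if [ -d "$efi_dir" ]; then
        echo "UEFI"
    else
        echo "BIOS"
    fi
}
```

Calling boot_mode with no argument checks the real path on the running system.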

Hard Drive

To check the hard drive for failures, start the Disks program and select the hard drive on the left. Click the icon in the top right, choose SMART Data and Self-Tests, then click Start Self-test and choose the Extended test. This test takes a few hours to run and will give you a large amount of information about the health of the drive.

Most of the values start at 100 and work their way down toward 0 as the drive ages. The terms “old-age” and “pre-fail” describe the type of attribute, not its current state, and are normal. Pay attention to the overall assessment, and to how close each value is to its failure point, which is typically 0.
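The same attributes can be read from a terminal with smartctl from the smartmontools package, which is not mentioned above and would need to be installed separately. As a rough sketch for SATA drives, the helper below reads the attribute table printed by smartctl -A and prints how far each normalized value sits above its threshold. The function name smart_margin is made up for illustration, and the column layout it expects is an assumption based on typical smartctl output.

```shell
# Print how far each SMART attribute is above its failure threshold.
# Reads the attribute table printed by `smartctl -A` on standard input.
# Assumption: columns are ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, ...
smart_margin() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 {
        value = $4; thresh = $6
        # Strip leading zeros (e.g. "051") so every awk reads the fields as decimal.
        sub(/^0+/, "", value); sub(/^0+/, "", thresh)
        printf "%s %d\n", $2, value - thresh
    }'
}

# Example (/dev/sda is an assumed device name; adjust for your drive):
#   sudo smartctl -A /dev/sda | smart_margin
```

Attributes with a small or zero margin are the ones closest to their failure point and deserve the most attention.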

Machine Check Exceptions

Machine Check Exceptions (MCEs) are hardware failure events and can be logged with the mcelog program. Run this command to install the program:

sudo apt install mcelog

Then, after the system has crashed or been used for a period of time, take a look at this log:

/var/log/mcelog

If there is no log, no MCE has been recorded, so the crash most likely isn’t related to this kind of hardware failure. The log will stay empty until an MCE happens. Look for “uncorrected” errors, as most “corrected” errors can be ignored. If “uncorrected” errors keep showing up, the hardware should probably be replaced.
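To make that check quicker, a small helper can count the log lines that mention uncorrected errors. This is only a sketch: the function name count_uncorrected is hypothetical, and it assumes such entries contain the word "uncorrected" in some capitalization, which may not hold for every mcelog version.

```shell
# Count log lines that mention uncorrected machine check errors.
# Assumptions: the log lives at /var/log/mcelog (the path given above) and
# such entries contain the word "uncorrected" in some capitalization.
count_uncorrected() {
    log="${1:-/var/log/mcelog}"
    if [ ! -s "$log" ]; then
        # Missing or empty log: nothing has been recorded yet.
        echo "no MCEs logged in $log"
        return 0
    fi
    # grep -c prints the number of matching lines; -i ignores case.
    # It exits non-zero when the count is 0, so mask that with || true.
    grep -ci 'uncorrected' "$log" || true
}
```

Calling count_uncorrected with no argument reads /var/log/mcelog, which may require root privileges. A count that keeps growing between checks matches the "replace the hardware" case described above.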

Support

Please contact support by opening a ticket to get the system repaired or to have failed components replaced.