This seems to happen with many different driver versions, but more often with newer ones such as 530 vs 525.
Then nvidia-modeset goes to 100% CPU.
There are many reports of this appearing since driver 470 and I can confirm I've seen it on various machines.
https://forums.de........
Let's say you have a directory /mnt/raid which has files and directories inside it, but nothing is mounted to it.
Then you mount a block device such as /dev/sdh to /mnt/raid
Even though /mnt/raid still has its original files and directories, you can now only see the contents of the newly mounted filesystem.
How do we access the original contents?
Just do a bind mount of the root filesystem to another location.
mkdir /bindmount
mount --bind / /b........
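Spelled out, a minimal sketch of the idea (assuming the hidden files live under /mnt/raid on the root filesystem):
mount --bind / /bindmount
# the original, hidden contents are now reachable via the bind mount:
ls /bindmount/mnt/raid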
This seems to have changed for RHEL 8, where a normal dracut run to update your initramfs creates a system that only boots on the currently running kernel. For example, if you are running kernel 5 and then chroot into a RHEL 8 variant which uses kernel 4.18 and run dracut, it seems that by default the resulting system will be unbootable.
It is also the case that if you move your RAID array or drives to another server, it will be unbootable, because dracut seems to only include modules needed for the curre........
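As a sketch of the workaround (assuming you are chrooted into the RHEL 8 install and its kernel is the 4.18.0 one; the exact version string below is a placeholder):
# build a generic initramfs for the chrooted kernel rather than the running host kernel
dracut --force --no-hostonly --kver 4.18.0-xxx.el8.x86_64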
It sounds intuitive that you could just move the /var/lib/docker dir to another location and symlink it back, but that won't work and you'll get an error.
How to move Docker Storage the Correct Way
This assumes that you want to use /mnt/raid as the new location.
1.) Stop Docker
systemctl stop docker
2.) Move /var/lib/docker
mv /var/lib/docker /mnt/raid/
3.) Edit the Docker daemon file
Specify the path you wan........
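As a sketch of that step (assuming a Docker version recent enough to support the data-root option in /etc/docker/daemon.json; older releases called it "graph"):
# /etc/docker/daemon.json
{
  "data-root": "/mnt/raid/docker"
}
# then start Docker again
systemctl start docker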
This article about migrating to a CentOS 7/8 RAID mdadm array has a lot of info, but I wanted to focus specifically on what newer versions of CentOS 7 require to boot mdadm and what changes are necessary on CentOS 7.8+.
CentOS 7 / 8 mdadm RAID booting requirements
This assumes you are chrooting into an existing install or using it to get a new deployment ready. However, these steps can........
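A rough sketch of the usual pieces, assuming you are already chrooted in (the details are what the rest of the article covers):
# record the array in mdadm.conf so the initramfs can assemble it
mdadm --detail --scan >> /etc/mdadm.conf
# rebuild the initramfs with mdadm support and the local mdadm.conf included
dracut --force --mdadmconf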
cat /proc/mdstat
Personalities : [raid1] [raid10] [linear] [multipath] [raid0] [raid6] [raid5] [raid4]
md124 : inactive sdj1[0](S)
1048512 blocks
Solution: we "run" the array
sudo mdadm --manage /dev/md124 --run
mdadm: started array /dev/md/0_0........
It may appear to be an Xorg or lightdm/gdm/mdm error, but in reality, for many users with this issue, it's a driver conflict. I had a system that had two GPUs, an Intel and an Nvidia GPU.
The only thing that got it working was to remove the nouveau driver and blacklist it so it never came back, then the Intel GPU works fine without these issues.
Solution
sudo rmmod nouveau
add nouveau/other driver to blacklist
edit th........
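A minimal sketch of that blacklist step (assuming a Debian/Ubuntu-style system; the file name is just an example):
sudo rmmod nouveau
echo "blacklist nouveau" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
# rebuild the initramfs so nouveau stays out on the next boot
sudo update-initramfs -u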
Bonding is an excellent way to get both increased redundancy and throughput. It is similar to the "Network Teaming" feature in Windows.
There are a few different modes but we will use mode 6 (balance-alb). I think it's the best of both worlds, as it is not just a failover: it also provides load balancing, so you get redundancy and better throughput. So if you bond four single 1G ports, you will have a combined throughput of 4G at this point. Just bear in mind that the true thr........
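As a sketch of the idea (assuming a Debian/Ubuntu-style /etc/network/interfaces with the ifenslave package installed; interface names are examples):
auto bond0
iface bond0 inet dhcp
    bond-mode balance-alb
    bond-miimon 100
    bond-slaves eth0 eth1 eth2 eth3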
If you've come here, don't be embarrassed; working in IT, this is the MOST common computer problem that almost everyone will encounter. The reason I'm doing this post is that I've seen an increase in colleagues and admins having this problem, and many times it's not even your fault. A common scenario is that someone acquires a new or used computer which they weren't given the password for. Fortunately I have a detailed list of all the options, whether free or pa........
Is a mdadm check on your trusty software RAID array happening at the worst time and slowing down your server or NAS?
cat /proc/mdstat
Personalities : [raid1] [raid10]
md127 : active raid10 sdb4[0] sda4[1]
897500672 blocks super 1.2 2 near-copies [2/2] [UU]
[==========>..........] check = 50.4% (452485504/897500672) finish=15500.3min speed=478K/sec
........
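If the check has to stop right now, this is the usual way to cancel it (run as root, and adjust md127 to your array); it can be re-run later at a better time:
echo idle > /sys/block/md127/md/sync_action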
mdadm --create /dev/md0 --level 1 --raid-devices 2 /dev/sdb1 missing --metadata=0.90
mdadm: super0.90 cannot open /dev/sdb1: Device or resource busy
mdadm: /dev/sdb1 is not suitable for this array.
mdadm: create aborted
Sometimes running "partprobe" can fix this. Other times it requires a reboot.
One other manual thing that can be done to fix it is the following (if device-mapper is holding and blocking the device):........
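The exact commands are cut off above, but as a hedged guess at that kind of device-mapper cleanup (only if dmsetup shows a mapping sitting on the disk; the mapping name is an example):
# list device-mapper mappings that may be holding the partition
dmsetup table
# remove the stale mapping by name
dmsetup remove <mapping_name>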
It is unfortunate that LXC's dir mode is completely insecure and exposes way too much information from the host. I wonder if there will eventually be a way to break into the host filesystem or other containers' storage?
OpenVZ better security:
[root@ev ~]# cat /proc/mdstat
cat: /proc/mdstat: No such file or directory
/dev/simfs 843G 740G 61G........
The cool thing here is that we only need 1 drive to make a RAID 10 or RAID 1 array: we just tell the Linux mdadm utility that the other drive is "missing", and we can then add our original drive to the array after booting into the new RAID array.
Step#1 Install tools we need
yum -y install mdadm rsync
Step #2 Create your partitions on the drive that will be our RAID array
Here I assume it is /dev........
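As a sketch of the creation step (assuming the new drive's partition is /dev/sdb1; adjust to your layout):
# create a degraded RAID 1 with the second member marked as missing
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 missing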
cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md127 : active (auto-read-only) raid10 sdc1[0] sdb1[2]
1953382400 blocks super 1.2 512K chunks 2 far-copies [2/1] [U_]
resync=PENDING
bitmap: 15/15 pages [60KB], 65536KB chunk
Solution force repai........
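The rest of that line is cut off, but the usual way to kick a PENDING resync back into action is to flip the array out of auto-read-only (adjust md127 to your array):
mdadm --readwrite /dev/md127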
In a RAID array I have, I periodically lost a drive here and there over the past several months. I was always able to re-add and resync without losing data. However, at some point it looks like some minor corruption happened, and this makes DRBD unhappy.
Using fsck did not help either.
Dec 19 06:01:45 storageboxtest4 kernel: [19005.945890] EXT3-fs error (device drbd0): ext3_get_inode_loc: unable to read inode block - inode=22184379........
Before getting into the output, here is my typical experience with SMART: there is what I call a "bad disk", with pending and uncorrectable sectors that cannot be reallocated.
It has caused a kernel panic and system crash repeatedly, as we can see from the logs.
But SMART says it has "PASSED" its self-assessment. SMART is still useful to me, but it is more about looking at Current_Pending_Sector.
Any time I have had anything but 0 for that attribute it........
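A quick sketch of checking just that attribute (assuming smartmontools is installed and the disk is /dev/sda):
# show SMART attributes and pull out the ones that matter here
smartctl -A /dev/sda | grep -E 'Current_Pending_Sector|Offline_Uncorrectable'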
Done on CentOS 7.3, which is very important, as based on older guides it was clearly a lot easier and simpler! Hint: do not use grub2-install!
If you have trouble booting after this check this CentOS mdadm RAID booting/fixing guide.
One huge caveat if you are an oldschool user or sysadmin who has avoided UEFI booting
The nor........
user@box:~$ sudo tune2fs -l /dev/md99
[sudo] password for user:
tune2fs 1.42.9 (4-Feb-2014)
Filesystem volume name:
Last mounted on: /mnt/md50
Filesystem UUID: 976a8655-2619-4587-878c-dab07f7b7652
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Fi........
This is an 8TB Seagate external USB 3.0 device. Apparently newer kernels use a module called "UAS" instead of "USB Storage", which causes issues, as a lot of devices are not properly supported in UAS mode by the kernel driver. The solution some suggest is to disable UAS specifically for your USB device, but I'd rather just disable UAS altogether.
Solution, blacklist UAS: *do not do this, it does not work and just causes your USB 3.0........
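For reference, the per-device approach usually looks like this sketch (the VID:PID pair is a placeholder; get yours from lsusb):
# tell usb-storage to ignore UAS for this one device (the :u flag)
echo "options usb-storage quirks=VVVV:PPPP:u" > /etc/modprobe.d/disable-uas.conf
# rebuild the initramfs and reboot for it to take effect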
When running cudaminer once it tries to initialize the card the entire screen freezes. The computer itself is still running but the Xorg is done for, you cannot even switch to another console window and must reboot (even an mdm or Xorg restart does not help).
At first cudaminer will give you these errors:
stratum_recv_line failed
...retry after 15 seconds
GPU #0: Geforce 210 with compute ca........
In short, the two drives in the array were /dev/sdd and /dev/sde. The kernel sees they were unplugged and have gone down, as you can see below.
mdadm caught the first one being unplugged (/dev/sde) and disabled the missing drive. However, when the final drive that was part of the array was unplugged, it didn't notice at all. Instead it complains about an IO error later, for drives that the kernel knows no longer exist.
[45817.162728] ata4: exception........
1.) Replicate the number of partitions in your new drives.
gdisk /dev/sda
gdisk /dev/sdb
I created 3 partitions of the same size.
partition #1: +1G (/boot)
partition #2: +60G (swap)
partition #3: rest of it (/)
#note if you are using GPT/gdisk you need to create a separate partition at least 1MB in size (in my case I added a 4th partition and marked it type ef02).........
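If both drives are identical, a hedged shortcut is to partition the first drive and then clone its GPT layout to the second with gdisk's companion tool sgdisk:
# copy the partition table from /dev/sda to /dev/sdb, then give sdb fresh GUIDs
sgdisk -R=/dev/sdb /dev/sda
sgdisk -G /dev/sdb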
mdadm won't boot in Ubuntu/Mint/Debian anymore.
You just get the following in a loop:
mdadm: CREATE group disk not found
Incrementally started RAID arrays.
Incrementally starting RAID arrays...
mdadm: CREATE group disk not found
Incrementally started RAID arrays.
Incrementally starting RAID arrays...
mdadm: CREATE group disk not found
Incrementally started RAID arrays.
Incrementally starting RAID arrays...
mdadm: CREATE group dis........
This was a surprising bug, but I unplugged all drives for an array, md127. At first it was just 1 drive, and mdadm seemed to notice this. I unplugged the second drive, taking the array offline, but mdadm did not realize it was offline and still showed a non-existent disk as being part of it. This created problems trying to unmount it or even to stop the array, with mdadm freezing.
As for how to fix it, I can only think of making sure you are not in a mounted path of........
I keep reading that these drives are slower, but they are cheap, still SSDs, and work very fast for my needs.
As you can see the sequential read is 481-491MB/s; if I put them in mdadm RAID 10 mode (which with 2 drives is effectively RAID 1) they should give me well over 900MB/s, with redundancy, while being very cheap for what they offer.
[1232206.315622] scsi 8:0:1:0: Direct-Access ATA ADATA SU800&........
It is already known this is not possible
mdadm --create /dev/md3 --level 10 --layout=f2 --raid-devices=2 /dev/sdc1 /dev/sdd1
mdadm: /dev/sdc1 appears to be part of a raid array:
level=raid10 devices=2 ctime=Sat Dec 24 18:44:29 2016
mdadm: /dev/sdd1 appears to be part of a raid array:
level=raid10 devices=2 ctime=Sat Dec 24 18:44:29 2016
Continue creating ar........
The only way I've found in mdadm to make 2 drives perform like a proper RAID 1 (eg. the read speed should be 2x that of a single drive) is to use --layout=f2 (far 2).
mdadm RAID 10 performance issues
Be very aware that mdadm seems to default to layout=n2 (which means "near 2"). In that scenario you get something like mdadm RAID 1 performance (maximum read speed of a single drive).
dd if=/dev/md126 of=/dev/null bs=1M cou........
Device Boot Start End Blocks Id System
/dev/sdc1 1 132 1060256+ fd Linux raid autodetect
Partition 1 does not end on cylinder boundary.
Partition 1 does not start........
grub> root (hd0,0)
root (hd0,0)
Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd0)
setup (hd0)
But if you do:
root (hd1,0)
setup (hd1)
it does work. I think hd0/sda had a GPT partition table that was not removed properly (what I did was just dd the partition table over from another drive with bs=512 count=1, since the partition tables should be identical).
Checking if "/boot/grub/........
In this example we have 2 drives in a RAID array and /dev/sdb is the one that failed. /dev/sda1 is also the /boot partition, so we tell grub to use it when installing to the new drive, eg. root (hd0,0) refers to /dev/sda1 and we run setup on the new drive /dev/sdb (hd1).
First copy the partition table from /dev/sda to /dev/sdb
dd if=/dev/sda of=/dev/sdb bs=512 count=1
Run partprobe to detect the new partition table
partprobe........
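From there the remaining steps usually look something like this sketch (the md device and partition numbers are assumptions based on the layout above):
# add the new partition into the array and let it resync
mdadm --manage /dev/mdX --add /dev/sdb1
# then install grub legacy onto the new drive so it can boot on its own
grub
grub> root (hd0,0)
grub> setup (hd1)
grub> quit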
Here is the scenario: you or a client have a remote machine that was installed as a standard/default minimal CentOS 6.x machine on a single disk with LVM, for whatever reason. Often people do not know how to install to a RAID array, so it is common to have this problem, and why reinstall if you don't need to? In some cases on a remote system you can't easily reinstall without physical or KVM access.
So in this case you add a second physical disk, or already ha........
In my case I could log in with the initial install, but after I rsync'd everything over (preserving ownership and permissions) to another RAID partition and booted from that, the trouble started; it was fine before. The problem is that you are kicked out the second you log in, and the cause was SELinux for some reason (perhaps it noticed something strange when the install was moved to the new partition).
login: pam_unix(login:session): session opened for user root by LOGIN(uid=0)
login: ROOT LOG........
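The usual remedies for SELinux misbehaving after a filesystem has been moved like this (an assumption on my part, not necessarily what this post goes on to do) are a full relabel or temporarily going permissive:
# relabel the whole filesystem on the next boot
touch /.autorelabel
reboot
# or, to test whether SELinux really is the cause:
setenforce 0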
I was surprised to see that Linux Mint, at the latest 17.2 version, still has NO mdadm installer option, and worse, the installer will not be able to create a proper booting environment even when you do install it.
How to setup mdadm in Linux mint LiveCD
sudo su
apt-get install mdadm
# partition as you need and then create your mdadm devices
# create your SWAP md0
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /d........
root (hd2,1)
Filesystem type unknown, partition type 0x83
grub> root (hd2,2)
root (hd2,2)
Filesystem type is ext2fs, partition type 0x83
grub> setup (hd2)
setup (hd2)
Checking if "/boot/grub/stage1" exists... no
Checking if "/grub/stage1" exists... no
#weird thing about grub is that the drive you enter is considered hd0
For example when booted fu........
mdadm --create /dev/md1 --level 10 --raid-devices=2 /dev/sdb2 /dev/sdc2 --layout=f2 --metadata=0.90
Note that layout=f2 or layout=n2 is very important as without it you'll get a complaint like this:
mdadm --create /dev/md0 --level 10 --raid-devices /dev/sdb1 /dev/sdc1 missing missing
mdadm: invalid number of raid devices: /dev/sdb1
It is basically more like a prop........
I messed up the bootloader by accident on a standard CentOS 6.3 install because I turned the /dev/vda1 boot partition into an mdadm RAID 1. This was all done correctly aside from one point I didn't realize was an issue: metadata=0.90 is the only thing that will allow you to boot (otherwise grub won't work and you won't boot).
So the next step is rescue mode from a CD, right? The problem you will find is that grub does not detect your hard drives; this is, I believe, be........
[3805108.257042] sd 0:0:0:0: [sda] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
[3805108.257052] sd 0:0:0:0: [sda] Write Protect is off
[3805108.257054] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[3805108.257066] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[3805108.257083] sd 0:0:0:0: [sda] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
[3805108.257090] sd 0:0:0:0: [sda] Write Protect is off........
This was caused by some weird dmraid setup which kind of takes control of drives even if they're blank/unused.
1. Check the table.
dmsetup table
ddf1_44656c6c202020201000006010281f0b3f5195b77cf86172: 0 3905945600 linear 8:0 0
ddf1_44656c6c202020201000006010281f0b3f5195b77cf86172p3: 0 37124096 linear 253:0 284547072
ddf1_44656c6c202020201000006010281f0b3f5195b77cf86172p2: 0 283496448 linear 253:0 1050624
ddf1_44656c6c2020202010........
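The fix is cut off above, but a hedged sketch of the usual cleanup for stale dmraid mappings is simply to remove them (using the mapping names dmsetup printed, partitions first):
dmsetup remove ddf1_44656c6c202020201000006010281f0b3f5195b77cf86172p3
dmsetup remove ddf1_44656c6c202020201000006010281f0b3f5195b77cf86172p2
dmsetup remove ddf1_44656c6c202020201000006010281f0b3f5195b77cf86172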
This is a great way to upgrade your RAID array or move it/copy it to a new set of hard drives.
Eg. you have a current RAID 1 array on older/slower drives.
Just add at least 1 of the new drives to the array, update grub/install it and then boot into it. Then you have a transparent data migration that is fully synchronized.
mdadm --grow /dev/md126 --raid-devices 3
md127 : active raid1 sdc1........
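A rough sketch of the whole sequence (device and array names are examples based on the output above):
# add the new drive and temporarily grow the mirror to 3 members
mdadm --manage /dev/md126 --add /dev/sdc1
mdadm --grow /dev/md126 --raid-devices 3
# once synced and booting from the new drive, retire an old member
mdadm --manage /dev/md126 --fail /dev/sda1 --remove /dev/sda1
mdadm --grow /dev/md126 --raid-devices 2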
The units in the echo commands below are KB/sec (kilobytes per second).
Setting a high sync speed
echo 120000 >/proc/sys/dev/raid/speed_limit_min
This will increase the speed, note that sometimes a rebuild is slow due to current disk activity/iowait.
If that is not the cause then you may have a hardware issue (controller, cable or a bad drive).
Setting a lower sync speed
echo 1200 >/proc/sys/dev/raid/speed_limit_max........
I've got one of these for testing projects from work at home, and I got more than I bargained for with the time I've spent on it due to the storage handling/Perc 6/i cards.
My particular model came with the following:
2U Rack Mount Server with Rails
2xOpteron 2373 EE (Quad Core, there is a 6-core version that can be found at times)
16GB RAM
2 x 250GB Seagate SATA
2 x Dell Perc 6/i (horrible and a nightmare to work........
I bricked one of my cards by following a guide from UNRAID.
Step #1 from them wipes out the BIOS, but guess what? The step where you restore the BIOS should have been done first, which is sas2flash but no version supports or is able to find my Perc 6/i. So now I'm a bit stuck.
I tried using megarec but it's funny that it can wipe the BIOS but can't forcefully reload it:
megarec -writesbr 0 mpt2sas.rom
Supports 1078 control........
One thing to remember is that you need MegaCli to do the flashing.
You also need the correct file; I tried at least 2 different Perc 6 firmwares from Dell that kept getting rejected as corrupt by MegaCli (they were really the wrong version). I have an external PCI-E Dell Perc 6 but I chose images from the "Integrated" on-motherboard version as it was all I could find. They are different, and below is my first time finding success.........
I flashed an LSI Logic firmware to it and it broke the BIOS (cannot do Ctrl+R) for booting purposes but allows other functionality to work normally.
I tried downgrading to a Dell firmware for Perc 6i but it won't work, not even with MegaCli
wget http://downloads.dell.com/FOLDER00416606M/1/SAS-RAID_Firmware_W83M2_LN32_6.3.1-0003_A14.BIN
--2013-08-26 12:53:39-- http://downloads.dell.com/FOLDER00416606M/1/SAS-RAID_Firmware_W83M2_LN32_6.3.1-0003_A14.BIN
Resolvi........
LSI MegaRAID
At first it was configured as a RAID 0, then I deleted the Virtual Disk Group.
I thought both drives would be shown and detected in Linux as sda and sdb but it actually shows nothing.
To make them work you have to hit Ctrl+R before the system boots (when prompted) and create a Virtual Disk Group. In my case I created each one as RAID 0 (with a single drive only) as I just wanted JBOD but there is no such option or default in these Dell Pe........
Crashing with a RAID 1 array and when burning a CD.
Screen goes blank (no video signal) and system stops responding during heavier loads.
Is this a defective power supply or is it possible I have too many devices connected to the same rail?
How can I verify/troubleshoot this?........
Have you ever unplugged the wrong drive and then had to rebuild the entire array? It may not be a big deal in some ways, but it does leave your system vulnerable until the rebuild is done.
Many distros enable the "bitmap" feature, which basically keeps track of which parts of the array need to be resynced after a temporary removal of a drive, so that only what has changed gets resynced.
To enable bitmap to speed up rebuilds and sync........
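The cut-off command above is presumably along these lines (adjust md0 to your array):
# add an internal write-intent bitmap to an existing array
mdadm --grow --bitmap=internal /dev/md0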
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sda3 /dev/sdb3
cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb3[1] sda3[0]
1363020736 blocks super 1.2 [2/2] [UU]
[=>...................] resync = 8.3% (113597440/1363020736) finish=276.2min speed=75366K/sec
........
mdadm --manage /dev/md3 --add /dev/sda1
cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : inactive sdd2[1] sdd1[2](S)
31270272 blocks
md3 : active raid1 sda1[2] sdb1[1] sdc1[3](F)
943730240 blocks [2/1] [_U]
[>....................]........
Here's a proven example of what a bad hard drive can do. It was technically functioning OK in a RAID array, but the system became extremely slow, the load became high, and IOWAIT was even higher, and I always thought it was a bad application. The truth is that this failing 1TB Hitachi slowly got worse and caused huge slowdowns (eg. 100% load in Thunderbird waiting for e-mails to load, etc.). After swapping it out, tabs change instantly, emails are not lagged, and........
This booting error is because the Xen PV guest image uses the Xen kernel, which is not compatible with anything but a host running a Xen kernel.
I did a kpartx -av virtual.img and then it created some partitions that showed up in fdisk.
I mounted it, chrooted into it, removed the Xen kernel and installed a normal kernel, but Xen still shows the same kernel in GRUB (only the Xen one).
This is strange but it seems like this Xen PV guest has some sort of hidden or........
This array is a RAID 1, and in this case 1 of the 2 drives failed (a WD drive; I've found them to be the weakest and most unreliable of any brand, and easily damaged/DOA when shipping them).
mdadm --manage /dev/md0 --add /dev/sdb1
The above assumes the array you want to add to is /dev/md0 and the device we are adding is /dev/sdb1
*One thing to remember is to make sure the partition you are adding is the correct size for the array. You can also g........
Neither the blkid UUID nor the UUID internal to mdadm works to automount for some reason in Debian.
partprobe doesn't work but was a good suggestion from: http://pato.dudits.net/2008/11/03/special-device-uuidxxxxxxxxxxxxxxxx-does-not-exist-especially-with-lvm
mount: special device /dev/disk/by-uuid/431b9b96-29e8f298-e89bd504-7065bddd does not exist
mdadm -D /dev/md_d12
mdadm: metadata format 00.90 unknown, ignored.
/dev/md_d12:
........
For years I've always built cheap systems, believing that there is little difference in more expensive components when it comes to reliability and quality. I generally still believe this, except for power supplies.
I've always bought cheap cases with nice-sounding 350-550W stock/cheap/crap power supplies and haven't had any issues for the most part, until recently.
One such case is an NGEAR case with a 550W Optimax power supply. I always read that these supplies don't produce the........
This is one in a series of weird things which I thought was motherboard related (I RMA'd the motherboard); the RAM tests fine with memtest86, I used badblocks on both RAID 1 members with no errors, and smartctl is happy with them.
Basically the array crashes the kernel a lot and has issues when writing.
[112322.723465] md0: rw=0, want=14958668696, limit=1887460480
[112322.731077] attempt to access beyond end of device
[112322.731087] md........
I like dd; although it only reads the disk, a read test of the entire disk will usually uncover whether your hard drive is bad in some parts. This is a good thing to do at least once a month; a lot of times bizarre program behavior, lagginess and crashing/unmounting problems etc. are due to a failing disk, and SMART won't know it or indicate a problem.
We must also remember there's never a guarantee; I've found that ever since we moved to larger capacities and more platters per drive with 1TB drives........
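The kind of full-disk read test I mean is simply this (assuming the disk is /dev/sda; it is read-only, but double-check the if= device):
# read the whole disk and throw the data away; watch dmesg and smartctl for errors
dd if=/dev/sda of=/dev/null bs=1M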
GNU GRUB version 0.97 (640K lower / 3072K upper memory)
[ Minimal BASH-like line editing is supported. For the first word, TAB
lists possible command completions. Anywhere else TAB lists the possible
completions of a device/filename.]
grub> root (hd1,0)
Filesystem type is ext2fs, partition type 0xfd
grub> setup........
I thought only a faster CPU and SSD would help, but I already have a quad-core CPU and it wasn't being maxed out. The actual tests were performed on an AMD-V enabled, 128MB, dual-core VMware container though.
There is a flag that can be passed to make in order to start multiple threads; by specifying 4 threads I was able to reduce the whole kernel compilation time from scratch by about 50%! (65 minutes vs 31 minutes!) *Yes, I did do a make clean before each co........
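The flag in question is -j; a minimal sketch of such a run (the targets are the usual kernel ones, shown as an example):
make clean
# build with 4 parallel jobs
make -j4 bzImage modules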
This happened during a RAID array check:
SMART says both drives pass the test, but I'm doing a long test on them and hopefully this is not a hardware error.
Apr 3 04:22:01 remote kernel: md: syncing RAID array md2
Apr 3 04:22:01 remote kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Apr 3 04:22:01 remote kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Apr........
mysql errors even though these files do exist:
110405 13:21:37 InnoDB: Operating system error number 13 in a file operation.
InnoDB: The error means mysqld does not have the access rights to
InnoDB: the directory.
InnoDB: File name ./ibdata1
InnoDB: File operation call: 'open'.
InnoDB: Cannot continue operation.
110405 13:26:15 InnoDB: Operating system error number 13 in a file operation.
InnoDB: The error means my........
high IO wait
424 root 39 19 1900 848 552 D 0.0 0.0 0:00.91 updatedb
root 424 0.0 0.0 1900 848 ? DN Mar11 0:00 /usr/bin/updatedb -f sysfs?rootfs?bdev?proc?cpuset?binfmt_misc?debugfs?sockfs?usbfs?pipefs?anon_inodefs?futexfs?tmpfs?inotifyfs?eventp........
I bought the 1TB Deskstar C revision recently at just $49 each and put them in RAID 1 for my desktop.
Look at how close the old Deskstar 1TB comes to matching the performance of the more expensive Samsung and WD drives.
This is phenomenal; I can't believe the performance I've gotten out of these cheap drives.
http://www.tomshardware.com/reviews/hitachi-western-digital-terabyte,2017-6.html........
I think this will be useful to others, because I have a server that kept crashing mysteriously during intense disk usage/RAID checks. It would only crash during the weekly RAID integrity check.
Then I noticed during a reboot that not all CPUs were being brought up; as a result this actually creates much higher temperatures. Based on the output I got from sensors, just booting the system produced higher than normal temperatures.
You can imagine that a full-blown RAID check........
Jan 16 04:02:03 centosbox syslogd 1.4.1: restart.
Jan 16 04:07:34 centosbox kernel: INFO: task updatedb:20771 blocked for more than 300 seconds.
Jan 16 04:07:34 centosbox kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 16 04:07:34 centosbox kernel: updatedb D F78BE050 6476 20771 20766........
CPU/Kernel/MB/RAID problem?
Jan 5 12:45:05 testbox kernel: [653298.890004] BUG: soft lockup - CPU#0 stuck for 61s! [hal-acl-tool:4168]
Jan 5 12:45:05 testbox kernel: [653298.890005] Modules linked in: vmnet vmci vmmon binfmt_misc drbd video output input_polldev ocfs2_stackglue ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs k8temp hwmon_vid lp snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi........
This made me nervous, but it's clearly a cronjob based on the messages log; it happens every Sunday at about 4:22.
I actually can't find any evidence of it in cron.d or cron.daily, but it is there somewhere, obviously.
What I don't get is why this cronjob doesn't do a datacheck like Ubuntu's cron script does. When you unnecessarily rebuild the array you lose your redundancy during that time, which makes your data extremely vulnerable.
*Update: I did a grep of "........
This doesn't seem to be widely known (maybe it's in some documentation that none of us read, though) but there's an easy way to check the integrity of any mdadm array:
sudo echo check > /sys/block/md0/md/sync_action
-bash: /sys/block/md0/md/sync_action: Permission denied
sudo will never work here; the redirection into sync_action is performed by your own non-root shell rather than by the command sudo runs, so this only works from a root shell.
/sys/devices/virtu........
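If you want to stick with sudo, the usual workaround is to let a root process perform the write:
echo check | sudo tee /sys/block/md0/md/sync_action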
This really made me nervous, but notice the mdstat says "check". This is because in Ubuntu there is a scheduled mdadm cron script that runs every Sunday at 00:57 and checks your entire array. This is a good thing because it prevents gradual but unnoticed data corruption, which I had never thought of.
As long as the check completes properly you have peace of mind, knowing that your data integrity is assured and that your hard drives are functioning properly (I'........
There's no partial WD EARS alignment fix:
I had data on /dev/sda3 and /dev/sdb3 (RAID1) so I couldn't edit that one.
I thought I'd be smart and try fixing the first two partitions so I set the first one starting at sector 2048 and then +8 for the second partition.
This has really slowed the performance down worse than it ever was!
Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, tot........
mdadm: metadata format 00.90 unknown, ignored.
This happens with various versions of older mdadm such as mdadm - v2.6.7.1 - 15th October 2008
It is all because of an extra 0 (00.90 instead of 0.90) in /etc/mdadm/mdadm.conf that it doesn't like (it doesn't seem to cause any problem other than the message, though):
Solution - Edit your /etc/mdadm/mdadm.conf and change 00.90 to 0.90 in your arrays:
ARRAY /dev/md3 level=raid1 num-devices=2 metadata=0.90 UUID=f41a4644:6b2a05f........
yum exits in the middle
The problem is that this VPS seems to be an OpenVZ template from HyperVM. The only way to make it work was to disable i386 packages, since this was an x64 kernel. That shouldn't be necessary, but it was the only way to make yum stop quitting after the first package or two. I couldn't find any issue in the logs either.
echo y|yum install vim-minimal telnet expect jwhois net-tools slocate iptables elinks gawk
L........
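What "disable i386 packages" looks like in practice is cut off here, but a hedged sketch of the usual approach:
# keep yum from ever considering 32-bit packages on this x86_64 VPS
echo "exclude = *.i386 *.i686" >> /etc/yum.conf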
I separated the 2 drives in the RAID 1 array.
One is the old one, /dev/sda, which is out of date, while the other, /dev/sdc, was attached elsewhere, mounted and used, so it has more (updated) data.
I wonder how mdadm will handle this:
usb-storage: device scan complete
md: md127 stopped.
md: bind
md: md127: raid array is not clean -- starting background reconstruction
raid1: raid set md127 active with 1 out of 2 m........
Moving to RAID was a pain.
What you have to do is the following from an existing install:
Install mdadm
Create your mdadm RAID 1 array on your spare hard drive.
Start it with the missing disk.
rsync the entire contents of your current / to the md partition.
Here's a good way of doing it:
rsync -Pha --exclude=/proc/* --exclude=/sys/* --exclude=/mnt/* /. /mnt/md2........
From a LiveCD, or if you're doing something like converting your non-RAID install to mdadm, here's how to chroot properly: you have to mount proc, sys and dev from the running system/LiveCD into your chroot environment if you want things to work right, especially if you need to run update-initramfs due to a driver change etc.
*replace "path" with your mount/chroot path
mount -o bind /proc /mnt/path/proc
mount -o bind /dev /mnt/pa........
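Filled out, the sequence is roughly this sketch (assuming the target is mounted at /mnt/path as above):
mount -o bind /proc /mnt/path/proc
mount -o bind /dev /mnt/path/dev
mount -o bind /sys /mnt/path/sys
chroot /mnt/path /bin/bash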
I had trouble trying to revert Ubuntu 10.04 LTS from grub2; it won't boot mdraid and did not even install mdadm during the installation!
I have tried moving back to GRUB 0.97.
I backed up the original /boot and then copied /boot from an old Debian install, modified device.map and menu.lst, and put the appropriate kernels and initrd for Ubuntu back in /boot.
I ran grub:
root (hd0,1)
grub> setup (hd0)
Checking if "/boot/grub/stage1........
Create New RAID 1 Array:
First setup your partitions (make sure they are exactly the same size)
In my example I have sda3 and sdb3 which are 500GB in size.
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sda3 /dev/sdb3
mdadm: array /dev/md2 started.
Check Status Of The Array
*Note I already have other arrays md0 and md1.
You can see below that md2 is syn........
md: Autodetecting RAID arrays.
md: autorun ...
md: considering sdb1 ...
md: adding sdb1 ...
md: adding sda1 ...
md: created md0
md: bind
md: bind
md: running:
md: kicking non-fresh sda1 from array!
md: unbind
md: export_rdev(sda1)
raid1: raid set md0 active with 1 out of 2 mirrors
The md0 raid kicked sda1 ou........
[27969.398749] sd 5:0:0:0: [sdb] 3907029168 512-byte hardware sectors (2000399 MB)
[27969.398749] sd 5:0:0:0: [sdb] Write Protect is off
[27969.398749] sd 5:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[27969.398749] sd 5:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[27972.117543] ata6.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
[27972.117543] ata6.00: irq_stat 0x48000000
[27972.117543] ata6.00: cmd 60/08:00:ff:7........
If you have "(auto-read-only)" beside an array, I have no idea why that happens, but it is easy to fix.
Just run "mdadm --readwrite /dev/md1" (replace md1 with the affected device) and it will begin to resync.
md1 : active (auto-read-only) raid1 sdb2[0] sda2[1]
19534976 blocks [2/2] [UU]
resync=PENDING
........
This is obviously a bug in the r8169 kernel module and it seems to affect a lot of people. I upgraded to the latest kernel and hope this won't happen anymore, as it is a very serious error. This is especially serious for those who are running servers with this chipset, who can afford for the NIC to randomly go off-line for no apparent reason?
[655548.189113] type=1505 audit(1277067560.902:5): operation="profile_load" name="/usr/bin/freshclam&q........
http://www.tomshardware.com/news/RAID-5-Doomed-2009,6525.html
I found this article interesting. It basically says that with 2TB or larger hard drives you are more likely to encounter an unrecoverable read error. But is this just another Y2K doomsday? Don't HDDs have enough advanced hardware ECC error correction and read recovery to prevent this from happening?
I'm almost tempted to build a 3 x........
Here is a RAID 1 partition (500GB Seagate & 2TB WD):
Sequential Reads
File Blk Num Avg Maximum Lat% ........
I was creating a RAID array and got this error: mdadm: /dev/sda1 is too small: 0K
mdadm: create aborted
Of course sda1 is not too small, both partitions sda1 and sdb1 are identical in size:
Disk /dev/sda: 2000.3 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000
Device Boot Sta........
Why would you want to downgrade the superblock? Old mdadm versions like mdadm 2.5.6 only use the 0.90 superblock/metadata, while new versions use the 1, 1.0, 1.1 and 1.2 superblocks by default.
There are some annoying caveats with this: first of all, the new superblocks (later than 0.90) CANNOT be read by GRUB, so you won't even be able to install GRUB. Even worse, old versions of mdadm CANNOT automatically detect arrays, even if they were created with a new version of mdadm with th........
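For reference, forcing the old superblock at creation time looks like this sketch (devices are examples):
# create the array with the legacy 0.90 metadata so old GRUB/mdadm can read it
mdadm --create /dev/md0 --level=1 --raid-devices=2 --metadata=0.90 /dev/sda1 /dev/sdb1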
Which one does the OS care about? blkid says the UUID is "787f1fa4-b010-4d77-a010-795b42884f56" while md insists its UUID is "4d96dd3b:deb5d555:7adb93cb:ce9182d9"
When in doubt, do we assume the OS takes the one from blkid?
/dev/md0: UUID="787f1fa4-b010-4d77-a010-795b42884f56" TYPE="ext3"
[root@localhost ~]# mdadm -D /dev/md0
/dev/md0:
Version : 0.90
........
I have an md0 array that my CentOS install refers to. I feel this is half the reason why it won't boot anymore.
I saw the initrd for CentOS was assembling it as md127 even though it was known as md0.
The reason for this is that I used mdadm --assemble --scan to detect the array on a LiveCD. I had no idea this name would stick (but now I realize the name is permanently stored in the metadata once you mount md127 or whatever random name assemble gives it). W........
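A hedged sketch of how I would put the name back (from a rescue environment; the UUID and rebuild command depend on your setup):
mdadm --detail --scan
# note the UUID, then put a matching line in /etc/mdadm.conf with the name you want, e.g.
# ARRAY /dev/md0 UUID=<uuid from above>
# and rebuild the initrd (mkinitrd or dracut depending on the CentOS version) so it assembles as md0 at boot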
I successfully created a single RAID 1 partition which includes /boot inside it and my root directory through the Debian installer. It said GRUB installed successfully but when I try booting the OS it seems GRUB can't read anything.
When trying to boot from GRUB
GRUB Loading stage 1.5.
GRUB loading, please wait...
Error 2
I get "Error 2" when trying to boot Debian. I also notice from a LiveCD that........
I installed 5.5 with a 300GB RAID 1 partition (/boot is also on this partition). It booted up fine the first few times, until I used a LiveCD and accessed the array, and it became named /dev/md127 for some reason.
Now when I boot into CentOS I get a kernel panic and different errors; once I got "invalid superblock", even though the array is fine (it didn't happen again, probably because I was careful to unmount and stop the mdadm array properly).
Here's what........
It is unbelievable how much the Xen kernel slows things down. Keep in mind both tests were done on the host node: one with the OpenVZ-Xen hybrid kernel and the other with just OpenVZ. You can see the performance is nearly 300% better when not using the Xen kernel.
OpenVZ-Xen kernel test results (I was wondering what was wrong/so slow with my Core i5!)
# # # # # # ........
mdadm --assemble --scan
mdadm: /dev/md/diaghost05102010:2 has been started with 2 drives.
mdadm: /dev/md/diaghost05102010:1 has been started with 2 drives.
mdadm: /dev/md/diaghost05102010:0 has been started with 2 drives.
-bash-3.1# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid6] [raid5] [raid4] [multipath]
md125 : active raid1 sda1[0] sdb1[1]
14658185 blocks super 1.2........
From the package "parted" you can use the command "partprobe" to re-read the partition table. I really hate rebooting, and that's what I loved to hear about AHCI motherboards: they allow hotswap so you don't have to reboot. But that's only as good as the OS; if the OS does not reload the partition table, you won't be able to do anything with that new drive you attached without rebooting. Yes, even without re-reading the partition table Linux will........
Before we start, I take no responsibility for this: you should have a backup, and if you make a mistake during this process you could wipe out all of your data. So back up somewhere else before starting, as a precaution, or make sure it's data you can afford to lose.
The RAID 1 Setup (Hardware Wise)
I've already set up my 2 x 1TB (Seagate) drives with identical partitions; make sure your new hard drive (the empty one) is set up like your curr........
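One hedged shortcut for making the empty drive match (assuming MBR partition tables, with /dev/sda as the existing drive and /dev/sdb as the new one):
# copy the partition layout from sda to sdb
sfdisk -d /dev/sda | sfdisk /dev/sdb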
CentOS (most Linux) no-RAID to software RAID-1 guide
http://lists.centos.org/pipermail/centos/2006-January/018624.html........
Nice General Linux RAID 1 Guide
Full examples/tutorials that should work for any Linux system using GRUB or LILO as the boot loader.
This is the only tutorial I've seen that clearly shows how you can convert an existing non-RAID system to software RAID1 remotely, without ever having to be at the computer. This is important for people who co-locate or rent dedicated servers that they may not have physical access to in a timely manner.
https://alioth.debia........
RAID1 using Gmirror Tutorial
http://www.onlamp.com/pub/a/bsd/2005/11/10/FreeBSD_Basics.html........
Clustering Links
I thought this might be interesting for people with spare time.
Great clustering article from Linux Mag
http://www.linux-mag.com/2003-11/clusters_01.html
General Linux cluster information
http://www.gdargaud.net/Hack/ClusterNotes.html#HighA
http://www.faqs.org/docs/Linux-HOWTO/Cluster-HOWTO.html#s3
http://www.yolinux.com/TUTORIALS/LinuxClustersAndFileSys........
I've gotten this error enough times to bother posting about it, because I've come across so many servers where this happens. So what could "Error 28" possibly mean? Is your database corrupt, or is this a sign of RAID failure/corruption, or even worse, bad blocks on a client's system with no RAID and no backups?
No, check your free blocks: it simply means you have no space left on the device. This was the result of a script that was overzealous and backed up the entire database........
I've tried to find a good, sensible solution to cluster with, and each technology has its pros and cons; there is no perfect solution, and I've found a lot of "exaggerations" about the applications, benefits and performance of these different filesystems.
DRBD
I first started off with DRBD and I have to say it does live up to the hype and is quite reliable (although it can be annoying to match up the kernel module and user applications, since they must match, and whe........