====== Disk management in Linux ======

===== Handling partitions =====

==== Resizing an ext3 partition ====

See the **[[resize2fs]]** page.

===== Boot parameters (for GRUB or LILO) related to RAID =====

To force the kernel to assemble the RAID partitions in a specific way, parameters can be passed to the kernel itself, for example:

<code>
raid=noautodetect md=1,/dev/hda5,/dev/hdc5 md=4,/dev/hda2,/dev/hdc2 root=/dev/md4
</code>
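
Each ''md='' parameter above follows the pattern ''md=<number>,<device>,<device>,...''. As a minimal sketch, a small shell helper can build that string from an array number and its components; the function name ''md_boot_param'' is made up for this example:

<code bash>
# Hypothetical helper: build an "md=N,dev1,dev2,..." kernel parameter.
md_boot_param() {
    local num=$1
    shift
    local IFS=,          # join the remaining arguments with commas
    printf 'md=%s,%s\n' "$num" "$*"
}

md_boot_param 1 /dev/hda5 /dev/hdc5
</code>

The output can be pasted into the kernel line of the boot loader configuration.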

===== RAID management with mdadm =====

=== Creating a RAID1 volume ===

The following command creates and immediately starts a **RAID1** volume in **degraded** mode, that is with only one active disk:

<code>
mdadm --create /dev/md10 --run --level=1 --raid-devices=2 /dev/sdb3 missing
</code>

=== Adding a partition to a RAID volume ===

To add a component to a RAID volume, for example to restore a degraded volume:

<code>
mdadm /dev/md0 --add /dev/hdc1
</code>
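
Before adding a component it can help to see which arrays are actually degraded. The following is only a sketch: it assumes the usual ''/proc/mdstat'' layout, where each array starts with a line beginning with ''md'' and its status map (for example ''[U_]'') shows an underscore for every missing member. The function name ''degraded_arrays'' is invented for this example:

<code bash>
# List md arrays whose status map shows a missing member, e.g. [U_].
# Reads /proc/mdstat-style text from the given file, or stdin if none is given.
degraded_arrays() {
    awk '
        /^md/ { dev = $1 }                    # remember the array name
        /\[[U_]*_[U_]*\]/ && dev != "" {      # status map with at least one "_"
            print dev
            dev = ""
        }
    ' "$@"
}

# Example on a live system:
# degraded_arrays /proc/mdstat
</code>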

=== Starting a volume in degraded mode ===

If a RAID volume cannot be assembled completely (for example because a component of a RAID1 is faulty), the volume is not started automatically. To force the operation, pass the **%%--run%%** parameter in **assemble** mode or in **misc** mode:

<code>
mdadm --assemble /dev/md0 --run
mdadm --misc /dev/md0 --run
</code>

**%%--misc%%** mode must be used if the device has been assembled only partially (that is, the device exists but is not //running//, so it appears empty). The device name, the corresponding UUID and therefore the component partitions are taken from the configuration file **/etc/mdadm/mdadm.conf**.

It is also possible to force the name of the volume to create and to indicate the components to assemble, specifying everything on the command line:

<code>
mdadm /dev/md17 --assemble /dev/sdc7 --run
</code>

=== Removing a component from a RAID volume ===

To remove a component, for example from a RAID1 volume, it must first be marked as faulty (fail); after that it can be removed from the RAID, which will keep working in degraded mode:

<code>
mdadm --manage /dev/md0 --fail /dev/sdb2
mdadm --manage /dev/md0 --remove /dev/sdb2
mdadm --zero-superblock /dev/sdb2
</code>

=== Destroying a RAID volume ===

To delete a RAID volume permanently it must be stopped; it is then advisable to zero the superblock of each component, so that the volume auto-run does not find the signature in the partitions and assemble the array again at the next reboot:

<code>
mdadm --stop /dev/md4
mdadm --zero-superblock /dev/sda7
mdadm --zero-superblock /dev/sdb7
</code>


=== Viewing the UUID of a RAID volume ===

<code>
mdadm --detail /dev/md0
</code>
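
The ''%%--detail%%'' output contains a line of the form ''UUID : <value>''. To use the UUID in a script, the value can be cut out of that output; a minimal sketch, where the function name ''md_uuid'' is made up for this example:

<code bash>
# Print the UUID found in "mdadm --detail" output read from stdin.
md_uuid() {
    awk -F' : ' '$1 ~ /^ *UUID$/ { print $2; exit }'
}

# Example on a live system:
# mdadm --detail /dev/md0 | md_uuid
</code>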


==== Checking health of a RAID volume ====

A disk can degrade (some bits get altered) without any hardware error being detected; in this case the Linux md subsystem (**''cat /proc/mdstat''**) does not display any error. Fortunately, with recent Linux kernels (>= 2.6.17?), we can force a consistency check of the whole volume. Say we want to check ''/dev/md0'':

<code>
echo check >> /sys/block/md0/md/sync_action
</code>

Mismatches found are not repaired, but a count is displayed by:

<code>
cat /sys/block/md0/md/mismatch_cnt
</code>

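The counter is reset at the start of the check and increased for each mismatch found; the value stays readable after the check ends (whether it survives a reboot is unclear). The count is expressed in sectors: according to **md(4)**, when md finds a mismatch it adds the size of the whole I/O unit that was compared, so a value of 128 can simply mean that a single 64 KiB unit (128 x 512 bytes) differed.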
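
The two sysfs files can be read together to get a one-line status. This is only a sketch: the function name ''md_mismatch_report'' and its optional second argument (an alternative sysfs root, useful for trying the function against a fake directory tree) are assumptions of this example. The conversion to KiB relies on ''mismatch_cnt'' being a count of 512-byte sectors, as described in **md(4)**:

<code bash>
# Report sync_action and mismatch_cnt for one array, e.g. "md0".
# The optional second argument is a hypothetical alternative sysfs root.
md_mismatch_report() {
    local md=$1 sysfs=${2:-/sys}
    local dir=$sysfs/block/$md/md
    local action count
    action=$(cat "$dir/sync_action") || return 1
    count=$(cat "$dir/mismatch_cnt") || return 1
    # mismatch_cnt is expressed in 512-byte sectors
    printf '%s: sync_action=%s mismatch_cnt=%s (%s KiB)\n' \
        "$md" "$action" "$count" $((count * 512 / 1024))
}

# Example on a live system:
# md_mismatch_report md0
</code>

A ''sync_action'' of ''idle'' means that no check or repair is currently running.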

Unfortunately I have not yet found a way to know the exact filesystem block with the mismatch. If we could know the block number we could use **''debugfs(8)''** to discover the actual file using the mismatching block. Attempting to repair the mismatch is done by:

<code>
echo repair >> /sys/block/md0/md/sync_action
</code>

At the end of the **repair** we can read the number of mismatches found (and hopefully repaired) from **mismatch_cnt**. A new run of the **check** action should end with a zero mismatch_cnt.

I encountered a case where, for unknown reasons, the repair did not work: the **mismatch_cnt** did not reset to zero. Fortunately it was a swap partition, so I solved the mismatch by rewriting the partition entirely:

<code>
swapoff /dev/md1
dd if=/dev/zero of=/dev/md1
mkswap /dev/md1
swapon /dev/md1
</code>

See also the **md(4)** man page and **''/usr/src/linux/Documentation/md.txt''**.

==== Managing bad blocks ====

  * http://osdir.com/ml/linux.utilities.smartmontools/2006-07/msg00043.html
  * http://smartmontools.sourceforge.net/man/smartd.conf.5.html
  * http://smartmontools.sourceforge.net/BadBlockHowTo.txt

==== resync=PENDING ====

If a volume remains in a **''resync=PENDING''** state (e.g. a swap volume that is in auto-read-only mode), you can force the resync with:

<code>
mdadm --readwrite /dev/md1
</code>

==== GRUB savedefault and RAID mismatches ====

The GRUB boot loader has the **savedefault** option which is used to remember the chosen boot entry. The information is saved into the ''**/boot/grub/stage2**'' file (at least with GRUB 0.97). This is a problem if the file resides on a RAID volume, because GRUB is not aware of RAID and it writes on the underlying component device. For a RAID1 volume this means that only one component will be written, creating a RAID mismatch.

This problem can cause false alarms on systems that perform periodic RAID checks (a default setup on a Debian box). This is the warning found in ''/var/log/syslog'':

<file>
mdadm: RebuildStarted event detected on md device /dev/md0
mdadm: Rebuild80 event detected on md device /dev/md0
mdadm: RebuildFinished event detected on md device /dev/md0, component device  mismatches found: 128
</file>
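
To find such events without reading the whole log, the device and the mismatch count can be extracted from the ''RebuildFinished'' lines. A minimal sketch, assuming the message format shown above; the function name ''md_mismatch_events'' is made up for this example:

<code bash>
# Print "<device> <count>" for every RebuildFinished line reporting mismatches.
# Reads syslog-style text from the given file, or stdin if none is given.
md_mismatch_events() {
    sed -n 's/.*RebuildFinished event detected on md device \([^ ,]*\),.*mismatches found: \([0-9][0-9]*\).*/\1 \2/p' "$@"
}

# Example on a live system:
# md_mismatch_events /var/log/syslog
</code>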

I think that the //savedefault// option should be avoided when using Linux software RAID on the boot partition.


==== RAID in auto-read-only mode ====

You may find that a RAID array was assembled in **auto-read-only** mode, like this one:

<code>
md1 : active(auto-read-only) raid1 sda5[0] sdb5[1]
      1951744 blocks [2/2] [UU]
</code>

This seems to be normal behaviour in newer Linux kernels (2.6.22). The read-only status is removed automatically as soon as something is written to the device. I did not find much info about this.

If you want to control this feature, check the state of the **start_ro** module parameter and change it if needed:
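
To see at a glance which arrays are currently in this state, the ''(auto-read-only)'' marker can be searched for in ''/proc/mdstat''. A small sketch, assuming the marker appears on the array line as in the sample above; the function name ''auto_ro_arrays'' is invented for this example:

<code bash>
# List md arrays flagged as (auto-read-only).
# Reads /proc/mdstat-style text from the given file, or stdin if none is given.
auto_ro_arrays() {
    awk '/^md/ && /\(auto-read-only\)/ { print $1 }' "$@"
}

# Example on a live system:
# auto_ro_arrays /proc/mdstat
</code>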

<code>
cat /sys/module/md_mod/parameters/start_ro
1
echo 0 > /sys/module/md_mod/parameters/start_ro
</code>

Obviously you need to change this parameter **before** the md kernel subsystem assembles the RAID volumes.

==== Mdadm problem with Debian Jessie and degraded raid1 ====

**Regression - mdadm in jessie will not boot degraded raid1 array**. See Debian bug [[https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=784823|#784823]]. This is a workaround, as suggested by [[http://serverfault.com/questions/688207/how-to-auto-start-degraded-software-raid1-under-debian-8-0-0-on-boot|this post]]:

<file>
cd /etc/initramfs-tools/scripts/local-top
cp /usr/share/initramfs-tools/scripts/local-top/mdadm .
patch --verbose --ignore-whitespace <<'EndOfPatch'
--- mdadm
+++ mdadm
@@ -76,7 +76,15 @@
   if $MDADM --assemble --scan --run --auto=yes${extra_args:+ $extra_args}; then
     verbose && log_success_msg "assembled all arrays."
   else
-    log_failure_msg "failed to assemble all arrays."
+    log_warning_msg "failed to assemble all arrays, attempting individual starts..."
+    for dev in $(cat /proc/mdstat | grep '^md' | cut -d ' ' -f 1); do
+      log_begin_msg "attempting mdadm --run $dev"
+      if $MDADM --run $dev; then
+        verbose && log_success_msg "started $dev"
+      else
+        log_failure_msg "failed to start $dev"
+      fi
+    done
   fi
   verbose && log_end_msg

EndOfPatch
update-initramfs -u
</file>

Somewhere on the net there is a suggestion to use the ''md-mod.start_dirty_degraded=1'' kernel command line parameter, but it does not apply here, because it is meant to force the start of degraded and dirty RAID5 and RAID6 volumes. Another hint about a ''BOOT_DEGRADED=true'' option in the initramfs config seems to be Ubuntu specific.

The problem is that the initramfs hook script provided by the mdadm package does:

<code>
mdadm --assemble --scan --run --auto=yes
</code>

which does not actually start an array that is degraded now but was not degraded at its previous start.

==== Rename an MD device permanently ====

To generate the configuration file **/etc/mdadm/mdadm.conf** you can use the tool **/usr/share/mdadm/mkconf**, which gathers information from the existing RAID volumes. The information is read from the **superblock** of the MD arrays and includes the **name** and **homehost**, so the generated file contains lines like this:

<file>
ARRAY /dev/md/125  metadata=1.2 UUID=d92a8066:fc45b7d3:f916e674:3a695968 name=hostname:125
</file>

Editing ''mdadm.conf'' to rename a device (e.g. changing from **/dev/md/125** to **/dev/md/5**) will work on the next reboot (remember to update initramfs), but eventually ''mkconf'' will restore the name ''/dev/md/125'' on the next run. To change the name permanently you have to stop the RAID device and update the superblock:

<code>
mdadm --stop /dev/md/125
mdadm --assemble /dev/md/5 --name=5 --homehost=newhostname --update=name /dev/sda5 /dev/sdb5
</code>

===== Filesystem has unsupported feature(s) =====

A filesystem may happen to have errors.

It may also happen that some madman tries to repair these errors by booting the computer with a different version of the operating system in use. The result can be a filesystem that the original operating system can no longer use:

<code>
fsck.ext3 -f -y /dev/hda1
e2fsck 1.27 (8-Mar-2002)
fsck.ext3: Filesystem has unsupported feature(s) (/dev/hda1)
e2fsck: Get a newer version of e2fsck!
</code>

Ruling out the idea that some other madman grafts in a version of e2fsck different from the one supplied with the operating system, we can use **debugfs** to put things right. It is enough to find out which //features// are supported and remove the extra ones:

<code>
debugfs -w /dev/hda1
debugfs 1.27 (8-Mar-2002)
debugfs:  features
Filesystem features: resize_inode dir_index filetype sparse_super
debugfs:  feature -resize_inode
Filesystem features: dir_index filetype sparse_super
debugfs:  feature -dir_index
Filesystem features: filetype sparse_super
debugfs:  quit
</code>
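
Working out which features to remove is a set difference between the features on the filesystem and those the old tools understand. A minimal sketch; the function name ''extra_features'' is made up, and the second list (what the old e2fsprogs supports) is an assumption you have to supply yourself:

<code bash>
# Print the features in the first list that are missing from the second list.
# Both arguments are space-separated feature names.
extra_features() {
    local have=$1 supported=" $2 " f
    for f in $have; do
        case $supported in
            *" $f "*) ;;                 # known to the old tools: keep it
            *) printf '%s\n' "$f" ;;     # candidate for "feature -NAME" in debugfs
        esac
    done
}

extra_features "resize_inode dir_index filetype sparse_super" "filetype sparse_super"
</code>

With the lists from the session above this prints ''resize_inode'' and ''dir_index'', the two features removed with debugfs.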

===== Switching from ext3 to ext2 =====

To convert an **ext3** filesystem to **ext2** the journal must be removed; to go back to ext3 it is enough to recreate it (is that the whole difference?). The filesystem must be unmounted. More information here:
[[http://www.troubleshooters.com/linux/ext2toext3.htm|Converting Ext2 Filesystems to Ext3]].

<code>
tune2fs -O ^has_journal /dev/md3
</code>

To create the journal again:

<code>
tune2fs -j /dev/md3
</code>

===== Forcing fsck at reboot =====

On a remote server, or one without monitor and keyboard, it is advisable to **enable the automatic fix of any filesystem errors**, with no request for manual intervention. This is especially true before forcing an ''fsck'' at reboot.

Up to **Debian 7 Wheezy** it was enough to set in **''/etc/default/rcS''**:

<file>
FSCKFIX=yes
</file>

Since the **systemd** //init// system came into use (that is, since **Debian 8 Jessie**) that file is no longer taken into account; instead there are the //units// called **systemd-fsck@.service** and **systemd-fsck-root.service** (see **man 8 systemd-fsck**). This service recognizes some parameters passed to the kernel, which can be added (separated by spaces if there is more than one) in **''/etc/default/grub''** (remember to run **update-grub** afterwards):

<file>
GRUB_CMDLINE_LINUX="fsck.repair=yes"
</file>

Also check in **''/etc/fstab''** that the **sixth field** is non-zero: 1 for the rootfs, 2 for the others.
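
The fstab check can be automated. The following sketch lists the ext2/ext3/ext4 mount points whose sixth field is 0 or missing, which are the filesystems that would be skipped by the boot-time fsck; the function name ''fstab_unchecked'' and the restriction to ext filesystems are choices made for this example:

<code bash>
# List ext2/3/4 mount points whose sixth fstab field is 0 or missing.
# Comment and blank lines are ignored.
# Reads fstab-style text from the given file, or stdin if none is given.
fstab_unchecked() {
    awk '!/^[ \t]*(#|$)/ && $3 ~ /^ext[234]$/ && ($6 == "" || $6 == 0) { print $2 }' "$@"
}

# Example on a live system:
# fstab_unchecked /etc/fstab
</code>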

With tune2fs you can see after how many mounts the check is run and when the last check was done:

<code>
tune2fs -l /dev/sda3
...
Mount count:              18
Maximum mount count:      38
Last checked:             Wed May  9 18:57:29 2012
...
</code>
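
From these two counters it is easy to compute how many mounts remain before the count-based check is triggered. A minimal sketch that reads ''tune2fs -l'' output; the function name ''mounts_until_fsck'' is made up, and a non-positive maximum is treated as "check disabled":

<code bash>
# Read "tune2fs -l" output on stdin and print how many mounts remain
# before the mount-count based check is triggered.
mounts_until_fsck() {
    awk -F': *' '
        $1 == "Mount count"         { count = $2 }
        $1 == "Maximum mount count" { max = $2 }
        END { if (max > 0) print max - count; else print "disabled" }
    '
}

# Example on a live system:
# tune2fs -l /dev/sda3 | mounts_until_fsck
</code>

A result of zero or less means the check is due at the next mount.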

One way to force the check is to artificially raise the Mount count:

<code>
tune2fs -C 100 /dev/sda3
</code>

Or create the file:

<code>
touch /forcefsck
</code>

Another way is to force the check when requesting the reboot:

<code>
shutdown -rF now
</code>

===== Filesystem Features =====

If there are differences between the system where **mkfs.ext4** is run and the system where **grub-install** is run, you may hit the following error:

<code>
grub-install /dev/sda
Auto-detection of a filesystem of /dev/sda2 failed
</code>

The case above happened because the filesystem was created on the host booted with a **64bit** OS, while grub-install was run in a chroot of a **32bit** system.

With **%%tune2fs -l%%** you can see the //Filesystem features//, so the filesystem can be created with the appropriate options. For example, our problem was solved by running:

<code>
mkfs.ext4 -O 'uninit_bg,^64bit,^metadata_csum' /dev/sda2
</code>
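
To verify the result, or to inspect an existing filesystem before running grub-install, the features of interest can be looked up in the ''Filesystem features:'' line of the ''tune2fs -l'' output. A small sketch; the function name ''enabled_features'' is invented for this example:

<code bash>
# Read "tune2fs -l" output on stdin and print those of the named features
# that are enabled on the filesystem.
enabled_features() {
    awk -v want="$*" '
        BEGIN { n = split(want, w, " ") }
        /^Filesystem features:/ {
            for (i = 3; i <= NF; i++) have[$i] = 1
            for (j = 1; j <= n; j++) if (w[j] in have) print w[j]
        }
    '
}

# Example on a live system:
# tune2fs -l /dev/sda2 | enabled_features 64bit metadata_csum
</code>

No output means that none of the named features is enabled.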