
2. Unmount the file system that uses the device on all client nodes.
The /proc/fs/lustre/mds/*/num_exports counter on the MDS server must be 0 (zero). This
counter is set internally when client nodes mount and unmount file systems.
3. Stop the file system by entering the command shown in the following example, where the file system is
called test:
sfs> stop filesystem test
4. Verify that the file system is stopped by entering the show filesystem command. The services must
be in the stopped or down state. If a service is in any other state, enter the stop filesystem
command again or shut down the server where the service is running.
5. Check the /proc/mounts file and the /proc/fs/lustre/obdfilter/mntdev (or
/proc/fs/lustre/mds/mntdev) file on each server to ensure that the file system devices are not
mounted at any mount point.
CAUTION: Before you run the e2fsck-lfsck command on a file system device, you must
ensure that the file system that uses the device is fully stopped and the device is not mounted on
any node. Running the e2fsck-lfsck command on a file system device that is mounted or
actively being used may corrupt the file system.
6. Log in to the preferred server for the service. (Use the show ost ost_name command to identify
the preferred server for an OST service.)
7. If the service is not mirrored, skip this step.
If the service is mirrored, identify the underlying RAID device and start it, as shown in the following
example:
a. Examine the raidtab file used to create the mirrored service, as shown in the following
example:
# cat /var/raid/raidtab.mds3
ARRAY /dev/md1
.
.
.
In this example, the underlying RAID device is /dev/md1.
b. Start the underlying RAID device by entering the following command:
# mdadm --assemble --config /var/raid/raidtab.mds3 /dev/md10
You can now run the e2fsck-lfsck command manually using the same arguments as would be used by
the standard e2fsck command to repair a standard ext3/ldiskfs file system.
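The pre-checks in steps 2 and 5 above can be scripted before e2fsck-lfsck is run. The following is a minimal sketch only, not part of the HP SFS software: the function names exports_all_zero and device_is_mounted are hypothetical, and the sketch assumes that each num_exports file holds a single integer and that the mounts file uses the standard /proc/mounts layout (device in the first column). File paths are passed as arguments so that the functions can be tried against copies of the files.

```shell
# Hypothetical helpers for the pre-checks; names are illustrative only.

# Succeeds only if every readable num_exports file passed in contains 0.
# Unreadable paths (for example, an unmatched glob on a server with no
# MDS service) are skipped, so run this on the MDS server itself.
exports_all_zero() {
    local f n
    for f in "$@"; do
        [ -r "$f" ] || continue
        n=$(tr -d '[:space:]' < "$f")
        if [ "$n" != "0" ]; then
            echo "$f reports $n exports" >&2
            return 1
        fi
    done
    return 0
}

# Succeeds if the device appears in the first column of the mounts file
# (second argument, defaulting to /proc/mounts).
device_is_mounted() {
    local dev mounts
    dev=$1
    mounts=${2:-/proc/mounts}
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"
}

# Example use on a server (device name taken from the log example in
# Section 9.26.7; substitute your own):
#   exports_all_zero /proc/fs/lustre/mds/*/num_exports && echo "no clients connected"
#   device_is_mounted /dev/hpls/dev17a || echo "device is not mounted"
```

These checks supplement, and do not replace, the show filesystem verification in step 4.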
9.26.6 MDS or OST services stay in the recovering state
If an MDS or OST service remains in the recovering state for a long time, check whether
the service has actually started the recovery process, as described in Section 9.26.7.
9.26.7 MDS and OST service recovery process
This section describes the process that takes place when a service fails over from a server to the peer server.
1. When an MDS or OST service starts up and determines that it has client recovery information that
was recorded during an earlier operation of the service, it reports messages similar to the following:
Lustre: OST south-ost12 now serving /dev/hpls/dev17a
(e6cf0bf5-a180-46d5-b6ea-cea8947013c1), but will be in recovery until 9 clients
reconnect, or if no clients reconnect for 5:00; during that time new clients will not be
allowed to connect. Recovery progress can be monitored by watching
/proc/fs/lustre/obdfilter/south-ost12/recovery_status.
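The recovery_status file named in the message can be read from a script. The following is a minimal sketch under a stated assumption: it assumes that the file contains a line of the form "status: STATE" (for example, RECOVERING or COMPLETE). This manual does not show the file layout, so examine the file on your own server before relying on the sketch. The function name recovery_state is hypothetical.

```shell
# Hypothetical helper: print the value of the "status:" line from a
# recovery_status file. The "status: STATE" layout is an assumption.
recovery_state() {
    awk '$1 == "status:" { print $2; exit }' "$1"
}

# Example use on the server that runs the service:
#   recovery_state /proc/fs/lustre/obdfilter/south-ost12/recovery_status
```

If the function prints nothing, the file does not use the assumed layout and must be inspected manually.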