To better understand Chkdsk and its command-line switches, it is important to understand the basics of some of the internal NTFS data structures. NTFS is a recoverable file system that maintains volume consistency by using logging techniques. If the operating system stops responding (crashes or hangs), NTFS restores consistency by running a recovery procedure that accesses information that is stored in a log file. NTFS does not guarantee protection of user file data. If the system crashes while a program is writing a user file, the file can be lost or corrupted, and you may need a file system checker.
A file system's correctness and validity may have to be verified if there is serious corruption of a metadata file or corruption of user data. Its correctness and validity can be checked by using a file system's check command. In Windows NT and Windows 2000, this command is Chkdsk. Chkdsk can repair any problems it finds in the file system and alert you if there are any unrepairable issues.
When you format an NTFS volume, the format program creates a set of files that contain the data that is used to implement the file system structure. NTFS reserves the first 16 records in the Master File Table (MFT) for information about these files, which are called metadata files. Metadata is data that is stored on a volume in support of the file system format management. Typically, it is not made accessible to applications. Metadata includes the data that defines the placement of files and directories on a volume. In NTFS, all data that is stored on a volume is contained in files, including the data structures that are used to locate and retrieve files, bootstrap data, and the bitmap that records the allocation state of the whole volume. Metadata file names start with a dollar sign (for example, $Bitmap) and are hidden.
NTFS recovers metadata after a crash by using standard transaction logging and recovery techniques. If an I/O failure occurs when the operating system is writing data to the disk, NTFS restores consistency by running a recovery procedure that accesses information that is stored in a log file. The NTFS recovery procedure is exact, guaranteeing that the volume is restored to a consistent state.
NTFS does not use a transaction log to protect user data (that is, the contents of files) the way it protects metadata. NTFS does not guarantee the integrity of user data after an instance of disk corruption, even if you immediately run a full Chkdsk operation. Chkdsk may not be able to recover some files, and some files that Chkdsk does recover may still be internally corrupted. It remains vitally important that you protect mission-critical data by making periodic backups or by using some other robust method of data recovery.
NTFS maintains the integrity of all NTFS volumes by automatically running Chkdsk and performing disk recovery operations the first time that Windows 2000 mounts an NTFS volume after the computer is restarted following a failure.
NTFS views each I/O operation that modifies a metadata file on the NTFS volume as a transaction and manages each one as an integral unit. After the transaction is started, it is either completed or, if an I/O operation failure occurs, rolled back (that is, the NTFS volume is returned to the state it was in before the transaction was started).
To make sure that a transaction can be completed or rolled back, NTFS first records the suboperations of the transaction in the log file and then performs the suboperations on the volume. After NTFS updates the volume, it commits the transaction by recording in the log file that the whole transaction is complete. Both the log file entries and the volume updates are buffered by the system’s file cache.
After a transaction is committed, NTFS makes sure that the whole transaction appears on the volume, even if the I/O operation fails because of a system shutdown or crash. During recovery operations that occur the next time the volume is mounted, NTFS redoes each committed transaction that it finds in the log file. Then NTFS locates the transactions in the log file that were not committed at the time of the system failure and undoes each transaction suboperation that is recorded in the log file. In this way, incomplete modifications to the volume metadata are prohibited.
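The all-or-nothing behavior described above can be illustrated with a minimal Python sketch. It is not NTFS code: `ToyVolume`, `run_transaction`, and the `fail_at` parameter are hypothetical names invented for this illustration, and the "volume" is a dictionary in memory. The sketch records undo and redo information for each suboperation before applying it, and restores the earlier state if a simulated I/O failure occurs.

```python
_MISSING = object()  # marks "key did not exist before the transaction"


class ToyVolume:
    """Hypothetical in-memory stand-in for volume metadata and its log file."""

    def __init__(self):
        self.metadata = {}
        self.log = []


def run_transaction(volume, txn_id, updates, fail_at=None):
    """Apply every update in `updates`, or none of them.

    `fail_at` simulates an I/O failure before the suboperation at that index.
    Returns True if the transaction committed, False if it was rolled back.
    """
    applied = []
    try:
        for index, (key, new_value) in enumerate(updates):
            if index == fail_at:
                raise OSError("simulated I/O failure")
            old_value = volume.metadata.get(key, _MISSING)
            # Log undo (old) and redo (new) information before changing anything.
            volume.log.append((txn_id, "update", key, old_value, new_value))
            volume.metadata[key] = new_value
            applied.append((key, old_value))
    except OSError:
        # Roll back: undo the applied suboperations in reverse order.
        for key, old_value in reversed(applied):
            if old_value is _MISSING:
                del volume.metadata[key]
            else:
                volume.metadata[key] = old_value
        volume.log.append((txn_id, "abort"))
        return False
    volume.log.append((txn_id, "commit"))
    return True


volume = ToyVolume()
first = run_transaction(volume, 1, [("dir_entry", "report.txt"), ("bitmap", "0b0011")])
second = run_transaction(volume, 2, [("bitmap", "0b0111"), ("dir_entry", "tmp.txt")], fail_at=1)
print(first, second, volume.metadata)
```

The second transaction fails partway through, so its change to the hypothetical `bitmap` entry is undone and the metadata is left exactly as the first transaction committed it.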
Important: NTFS uses transaction logging and recovery to guarantee that the volume structure is not corrupted. For this reason, all metadata files remain accessible after a system failure. However, user data can be lost because of a system failure, a bad sector, or a simple delete operation that is initiated by a user. NTFS does not implement transactional logging to protect user data. Regular backups of data are highly recommended.
NTFS Transaction Log Recoverability
Each file on an NTFS volume is listed as a record in the MFT. The first record in the table describes the MFT itself, and the second record describes a special file that mirrors the first few entries of the primary MFT. If the first MFT record is corrupted, NTFS uses the second record to find the MFT mirror file that is stored at the end of the logical disk, in which the first record is the same as the first record of the MFT. The boot sector records the locations of both the MFT and MFT mirror files. The MFT mirror file does not contain 100 percent of the whole MFT. Instead, it contains only the first few critical entries that Windows must have to mount the volume.
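The fallback from the MFT to the MFT mirror can be sketched as a simple lookup. This is only an illustration under stated assumptions: the dictionary-based `disk`, the `boot_sector` keys, the cluster numbers, and the `valid` flag are all hypothetical and do not reflect the real on-disk layout.

```python
def read_first_mft_record(disk, boot_sector):
    """Return (record, source) for the first MFT record.

    Tries the primary MFT first, then the mirror, using the two
    locations that the boot sector records.
    """
    for source in ("mft", "mft_mirror"):
        location = boot_sector[source]
        record = disk.get((location, 0))  # record 0 of the file at that location
        if record is not None and record.get("valid", False):
            return record, source
    raise ValueError("both copies of the first MFT record are unreadable")


# Hypothetical volume: the primary copy of record 0 is damaged.
boot_sector = {"mft": 4, "mft_mirror": 900}
disk = {
    (4, 0): {"name": "$MFT", "valid": False},
    (900, 0): {"name": "$MFT", "valid": True},
}
record, source = read_first_mft_record(disk, boot_sector)
print(source, record["name"])
```

Because the mirror holds only the first few records, a fallback of this kind can help mount the volume but cannot replace the whole MFT.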
The third record in the MFT is the log file that records all file transaction information. NTFS and the Log File Service use the DATA attribute of the log file to implement file system recoverability. The Log File Service is a component of the Windows NT Executive. Because the log file is a system file, it can be found early in the startup process and used to recover the disk volume if the volume is found to be corrupted. When a user updates a file, the Log File Service records all redo and undo information for the transaction. For recoverability, redo information allows NTFS to roll the transaction forward (repeating the transaction), and undo information allows NTFS to roll the transaction back if an error occurs.
Committing data to the disk involves the following steps:
1. NTFS writes a log file record that notes the volume update that it intends to make.
2. NTFS calls the cache manager to flush the log file record to disk.
3. The cache manager flushes the modified metadata to disk, updating the volume structure.
4. NTFS writes a log file record that flags the transaction as having been completed.
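The ordering of these steps is what makes recovery possible: the log record that describes an update reaches the disk before the update itself. The following Python sketch shows that ordering only. `ToyLoggedVolume` and all of its attributes are hypothetical, and the "disk" and "cache" are in-memory containers.

```python
class ToyLoggedVolume:
    """Hypothetical model of a cache and a disk, each with a log and metadata."""

    def __init__(self):
        self.log_cache = []         # log records not yet flushed
        self.log_on_disk = []
        self.metadata_cache = {}    # modified metadata not yet flushed
        self.metadata_on_disk = {}
        self.disk_writes = []       # order in which flushes reached the disk

    def _flush_log(self):
        self.log_on_disk.extend(self.log_cache)
        self.log_cache.clear()
        self.disk_writes.append("log")

    def _flush_metadata(self):
        self.metadata_on_disk.update(self.metadata_cache)
        self.metadata_cache.clear()
        self.disk_writes.append("metadata")

    def commit_update(self, txn_id, key, value):
        # Write a log record that notes the intended update.
        self.log_cache.append(("update", txn_id, key, value))
        # Flush the log record to disk before touching the volume structure.
        self._flush_log()
        # Flush the modified metadata to disk.
        self.metadata_cache[key] = value
        self._flush_metadata()
        # Write a log record that flags the transaction as completed.
        self.log_cache.append(("commit", txn_id))


toy = ToyLoggedVolume()
toy.commit_update(1, "$Bitmap", "cluster 42 allocated")
print(toy.disk_writes)
```

In this sketch the commit record stays in the cache until a later flush, which is an assumption of the model rather than something the steps above specify.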
If a transaction is completed successfully, NTFS commits the file update to disk. If NTFS detects an error in the transaction, or cannot guarantee that the transaction completed successfully, it rolls the transaction back according to the undo information. Incomplete modifications to the volume are not permitted.
If the system crashes (because of a power failure or other cause), NTFS performs three passes through the log on the disk: an analysis pass, a redo pass, and an undo pass. During the analysis pass, NTFS appraises the damage, if any, and determines which clusters it must update by using the information in the log file. The redo pass performs any steps that were logged from the last checkpoint. The undo pass then rolls back any incomplete (uncommitted) transactions.
The NTFS recovery pass involves the following steps:
1. NTFS calls the Log File Service to open the log file. This causes the Log File Service recovery to occur.
2. NTFS calls the Log File Service to read its restart area and reads all the data from the last checkpoint operation. This data initializes the transaction table, dirty pages table, and open file table so that they can be used in the recovery process.
3. NTFS performs an analysis pass on its last checkpoint record. At the end of this pass, the transaction table contains only transactions that were active when the crash occurred.
4. NTFS performs a redo pass. At the end of this pass, the cache reflects the state of the volume when the crash occurred.
5. NTFS performs an undo pass. At the end of this pass, the volume is recovered to a stable state.
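The analysis, redo, and undo passes can be sketched in Python as follows. This is a simplified model, not the NTFS implementation: the record layout (`op`, `txn`, `key`, `undo`, `redo`) and the `recover` function are hypothetical, the log is assumed to start at the last checkpoint, and an `undo` value of `None` is assumed to mean that the key did not exist before the update.

```python
def recover(volume, log):
    """Bring `volume` (a dict) to a consistent state from `log`.

    Returns the set of transactions that were active at the crash.
    """
    # Analysis pass: find the transactions that never committed.
    active = set()
    for rec in log:
        if rec["op"] == "update":
            active.add(rec["txn"])
        elif rec["op"] == "commit":
            active.discard(rec["txn"])

    # Redo pass: reapply every logged update, in order, so that the
    # volume reflects its state at the time of the crash.
    for rec in log:
        if rec["op"] == "update":
            volume[rec["key"]] = rec["redo"]

    # Undo pass: roll back uncommitted updates, newest first.
    for rec in reversed(log):
        if rec["op"] == "update" and rec["txn"] in active:
            if rec["undo"] is None:
                volume.pop(rec["key"], None)
            else:
                volume[rec["key"]] = rec["undo"]
    return active


# Transaction 1 committed before the crash. Transaction 2 did not.
crash_log = [
    {"op": "update", "txn": 1, "key": "a", "undo": 1, "redo": 2},
    {"op": "update", "txn": 2, "key": "b", "undo": None, "redo": 5},
    {"op": "commit", "txn": 1},
]
recovered = {"a": 1}
still_active = recover(recovered, crash_log)
print(recovered, still_active)
```

After recovery, the committed change to `a` is present and the uncommitted change to `b` has been removed.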
The Log File Service maintains two objects to support its functions:
The restart area. This is a status area used to transfer information about a client's last checkpoint operation before a crash to the client's recovery procedure. The Log File Service maintains two restart areas to guarantee that at least one valid area is always available.
The infinite log file. The log file is a circularly reused file. When a new record is added, it is appended to the end of the file. When the log file reaches its capacity, the Log File Service waits for writes to occur and frees space for new entries. This can be a very time-consuming operation. If the log file becomes full, there is a perceptible pause in file system activity. This can significantly degrade performance if it occurs repeatedly. If your system is prone to this behavior, make its log file bigger by using the chkdsk /l command (see “Chkdsk Syntax and Optional Command-Line Switches” in this white paper), or consider how I/O might be rebalanced.
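A rough Python sketch shows why a larger log file leads to fewer pauses. `CircularLog` and `count_stalls` are hypothetical names, capacity is counted in records rather than bytes, and a stall is assumed to free every record currently in the log. The real Log File Service is more selective about what it frees.

```python
from collections import deque


class CircularLog:
    """Hypothetical bounded log that stalls when it is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = deque()
        self.stalls = 0

    def append(self, record):
        if len(self.records) >= self.capacity:
            # Log is full: wait for pending writes, then reuse the space.
            self.stalls += 1
            self.records.clear()
        self.records.append(record)


def count_stalls(capacity, n_records):
    """Append n_records to a log of the given capacity and count the stalls."""
    log = CircularLog(capacity)
    for i in range(n_records):
        log.append(i)
    return log.stalls


print(count_stalls(4, 32), count_stalls(16, 32))
```

Under these assumptions, the same workload stalls far less often with the larger log, which is the effect that increasing the log file size with chkdsk /l aims for.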