As the options for data storage consolidation evolve, Microsoft strives to make sure that the Microsoft® Windows® 2000 operating system continues to be highly reliable. For example, with the shift to Storage Area Networks (SANs), Windows 2000 addresses new concerns with faster recovery and offers new features, such as Active Directory® directory service, that increase scalability and improve manageability.
This white paper describes some of the improvements that Microsoft has made in the Windows 2000 Chkdsk utility and describes ways to manage corrupted volumes. It also describes considerations for running Chkdsk on a server cluster in Microsoft Windows 2000 Advanced Server or Microsoft Datacenter Server.
Microsoft has significantly improved Chkdsk performance in Windows 2000 and continues to improve its performance to address the challenge of new I/O hardware technology that puts more and more data on “single,” very large, growing volumes. Microsoft also has enhanced the NTFS file system to minimize failures.
To complement these improvements, organizations that use Windows 2000 must apply “best practice” operational management and must develop recovery procedures and disaster recovery processes to minimize system outages of all types. By applying these best practices, you can drive the recovery process, instead of being a victim of system failures.
Chkdsk is a command-line utility that verifies the logical integrity of a file system on Windows 2000. NTFS, which maintains the integrity of all NTFS volumes, automatically runs Chkdsk the first time that Windows 2000 mounts an NTFS volume after the computer is restarted following a failure. You can also manually run Chkdsk or schedule Chkdsk to be run if you suspect there may be file system corruption.
Chkdsk examines all the metadata on a volume, compares it to the transaction logs that are maintained by NTFS, and, if it finds logical inconsistencies, takes actions to repair file system data. Metadata is “data about data.” It is the file system overhead, so to speak, that NTFS uses to keep track of everything about all the files on the volume. For example, metadata tells NTFS which allocation units make up the data for a particular file, which allocation units are free, and which allocation units contain bad sectors.
If Chkdsk runs at a time other than during the startup process, the code that actually performs the verification resides in utility dynamic-link libraries (DLLs), such as Untfs.dll and Ufat.dll. The verification routines that Chkdsk runs are the same ones that are run when Windows Explorer or Disk Administrator verifies a volume through its graphical user interface (GUI). If Chkdsk runs during the startup process, the binary module that contains the verification code is Autochk.exe.
Autochk is an integrated Windows 2000 command-line utility that runs early enough in the system startup process that it does not have the benefit of virtual memory or other Win32® application programming interface (API) services. Autochk generates the same kind of textual output that Chkdsk does, except that in addition to displaying this output on the screen during the startup process, Autochk also logs an event to the Application event log for the system. This event contains as much textual output as can fit into the event log's data buffer.
Because Autochk and the verification code in the utility DLLs that are used by Chkdsk are based on the same source code, this white paper will sometimes refer to Autochk and Chkdsk collectively as Chkdsk.
After the release of Microsoft Windows NT® 4.0 Service Pack 4 (SP4) and Windows 2000, Microsoft added two new command-line switches, /i and /c, to Chkdsk. These options are only valid when the destination drive has the NTFS file format. Each option directs Chkdsk to bypass certain actions, which reduces the time it takes Chkdsk to run. The /c option directs Chkdsk to skip the checking of cycles in the folder structure, and the /i option directs Chkdsk to perform a less vigorous check of index entries.
These command-line switches are intended for users with exceptionally large volumes who require flexibility in managing system downtime. Because the use of the /c and /i options can result in a volume remaining corrupted after Chkdsk has completed, it is a good idea to use these options only in situations in which system downtime must be kept to an absolute minimum.
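As a minimal sketch of how these switches combine on a command line, the following Python helper assembles a Chkdsk argument list without running anything. The function name build_chkdsk_command and its parameters are hypothetical names introduced for illustration. The /f switch (fix errors) is a standard Chkdsk switch that this section does not describe; it is included here on the assumption that a repair run is intended.

```python
from typing import List


def build_chkdsk_command(volume: str,
                         fix: bool = True,
                         skip_cycle_check: bool = False,
                         light_index_check: bool = False) -> List[str]:
    """Return the argument list for a Chkdsk run on `volume` (for example "D:").

    Hypothetical helper for illustration only; it builds the command line
    and does not execute it.
    """
    args = ["chkdsk", volume]
    if fix:
        args.append("/f")  # assumption: repair mode is wanted
    if skip_cycle_check:
        args.append("/c")  # NTFS only: skip checking of cycles in the folder structure
    if light_index_check:
        args.append("/i")  # NTFS only: less vigorous check of index entries
    return args


# A full run and an abbreviated run on the same volume.
full = build_chkdsk_command("D:")
abbreviated = build_chkdsk_command("D:", skip_cycle_check=True,
                                   light_index_check=True)
print(" ".join(full))
print(" ".join(abbreviated))
```

Keeping /c and /i off by default mirrors the guidance above: the abbreviated form is something an operator opts into deliberately, because it can leave a volume corrupted after Chkdsk completes.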
To understand when it is appropriate to use these command-line switches, it is important to understand some of the internal NTFS data structures, the kinds of corruption that can happen, what actions Chkdsk takes when it verifies a volume, and what the potential consequences are if you circumvent the typical Chkdsk verification steps.
Run a Full Chkdsk
This option repairs all file system data and restores all user data that can be recovered by means of an automated process. The drawback to this option is that a full Chkdsk can require several hours of downtime for a mission-critical server at an inopportune time. However, in terms of data recovery, this is the recommended course of action.
Run an Abbreviated Chkdsk
By using some combination of the /c and /i command-line switches, you can repair the severe kinds of corruption that can grow into bigger problems, in much less time than a full Chkdsk requires. However, this option does not repair all the corruption that might exist. A full Chkdsk is still required at some future time to guarantee that all the data that can be recovered will be recovered.
Do Nothing
For a mission-critical server that is expected to be online 24 hours a day, this is frequently the necessary choice. The drawback to this option is that relatively minor corruption can grow into major corruption if it is not repaired as soon as possible after it is detected. Therefore, consider this option only when keeping a system up is more important than the integrity of the data that is stored on the corrupted volume. Keep in mind that all data on the corrupted volume is “at risk” until Chkdsk is run.
Format the Partition and Restore from Tape
When Chkdsk is run against a volume, Chkdsk may not correctly recover 100 percent of the data if there is extreme corruption. If you have a high-speed tape backup solution and a known last-good backup, it may be just as fast or faster to reformat the partition and then restore the data from tape. This is a rare scenario. Use this option only in extreme situations with careful consideration.
How long will Chkdsk take to run? This frequently asked question has no quick answer. For information about the factors that affect the length of time that Chkdsk takes to run, see “How Long Will Chkdsk Take to Run?” in this white paper.