Windows NT 4.0 Summary
Based on articles written by Mark Russinovich
Windows NT Architecture Overview
The Windows NT architecture has evolved steadily since NT development began in 1988 (the first release, Windows NT 3.1, shipped in 1993). The following diagram illustrates the principal components of the operating system and their relative places in the system architecture. A description of the key Kernel Mode components follows.
As in other operating systems, applications executing in NT’s User Mode have no direct access to hardware and only restricted access to memory. All components that reside in the Kernel Mode address space have direct access to all hardware and memory.
The services invoked in kernel mode are known as NT’s native API. This API is made up of about 250 functions that NT’s operating system environments access through software-exception system calls, which cause a switch from user mode to kernel mode. To carry out their work, these system services call on functions in one or more components of NT’s Executive.
Hardware Abstraction Layer
This kernel mode layer contains the majority of the processor-specific code. The Windows NT developers originally intended all hardware-specific code to reside in the HAL, but performance tradeoffs caused some of that code to move into the microkernel.
Microkernel
The Windows NT kernel is a modified microkernel: it falls between a pure microkernel and a monolithic kernel structure. In NT’s modified microkernel design, the operating system environments (DOS, Win16, Win32, OS/2 and POSIX) execute in User Mode as discrete processes, and each implements its own operating system API.
The Windows NT microkernel contains the scheduler and most of the synchronization primitives, of which Windows NT offers a broad range. A microkernel architecture gives a system configurability and fault tolerance: because a subsystem like the VMM runs as a distinct program in a pure microkernel design, a different implementation that exports the same interface can replace it. NT’s modified design trades away some of this flexibility, since its Executive subsystems, the VMM included, run in kernel mode.
Executive Subsystems
Above the Hardware Abstraction Layer and the Microkernel are Windows NT’s self-contained Executive subsystems. Together with the microkernel and the HAL, they make up the kernel mode components of Windows NT.
Object Manager
The object manager is responsible for resource identification and reference counting, and it implements the global namespace. Other executive subsystems, and through them all system processes, submit a request to the object manager whenever a particular system resource is required.
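As a rough illustration of the two duties named above (naming and reference counting), the following Python sketch models a namespace that maps names to reference-counted objects. The class and method names (ObjectManager, create, open, close) and the example object name are hypothetical, chosen for this example only; they are not NT’s actual interfaces.

```python
class ObjectManager:
    """Toy model: a global namespace of reference-counted objects."""

    def __init__(self):
        self._namespace = {}  # name -> [object, reference count]

    def create(self, name, obj):
        """Insert a new named object; the creator holds the first reference."""
        if name in self._namespace:
            raise KeyError(f"{name} already exists")
        self._namespace[name] = [obj, 1]
        return obj

    def open(self, name):
        """Look up an object by name and take one more reference to it."""
        entry = self._namespace[name]
        entry[1] += 1
        return entry[0]

    def close(self, name):
        """Drop one reference; the object is removed when none remain."""
        entry = self._namespace[name]
        entry[1] -= 1
        if entry[1] == 0:
            del self._namespace[name]

    def ref_count(self, name):
        """Return the current reference count, or 0 if the name is gone."""
        return self._namespace[name][1] if name in self._namespace else 0
```

The point of the sketch is only that a resource stays alive while any subsystem still holds a reference, and that every lookup goes through one shared namespace.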
Every time a system service requests a resource from the object manager, the object manager consults the Security Reference Monitor to ensure that the application (or operating system environment) has sufficient access rights to use the resource. The Security Reference Monitor bases its decision on Security Identifiers (SIDs) and Discretionary Access Control Lists (DACLs); every process in Windows NT has an access token containing its security information.
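The access decision can be sketched in a few lines of Python. This is a simplified model, not NT’s implementation: it assumes a token holds one user SID plus group SIDs, that a DACL is an ordered list of allow and deny entries, and that entries are evaluated in order. The names (AccessToken, Ace, access_check), the right bits and the SID strings are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical access-right bits, for illustration only.
READ = 0x1
WRITE = 0x2


@dataclass(frozen=True)
class Ace:
    sid: str     # security identifier this entry applies to
    allow: bool  # True for an allow entry, False for a deny entry
    mask: int    # rights granted or denied by this entry


@dataclass
class AccessToken:
    user_sid: str
    group_sids: list = field(default_factory=list)

    def sids(self):
        """All SIDs the holder of this token is identified by."""
        return {self.user_sid, *self.group_sids}


def access_check(token, dacl, desired):
    """Walk the DACL in order; succeed once every desired right is granted."""
    granted = 0
    sids = token.sids()
    for ace in dacl:
        if ace.sid not in sids:
            continue
        if not ace.allow:
            # A deny entry fails the check if it covers a right not yet granted.
            if ace.mask & desired & ~granted:
                return False
        else:
            granted |= ace.mask & desired
        if granted == desired:
            return True
    return False  # rights that no entry grants are implicitly denied
```

In this model the order of entries matters: a deny entry placed before a matching allow entry refuses the request.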
I/O Manager
All add-on device drivers are connected to Windows NT through the I/O Manager. Interrupt Service Routines are registered through the I/O Manager and asynchronous, packet-based I/O is supported.
Configuration Manager
The configuration manager controls the Windows NT system registry and implements the Windows Registry API functions.
Process Manager
The Process Manager executes the process-related native system services, including NtCreateProcess and similar functions. The Process Manager also implements process accounting.
Virtual Memory Manager
The Virtual Memory Manager creates and manages address maps for processes and performs all physical memory allocation. To improve performance, the VMM implements file memory mapping, memory sharing, and copy-on-write page protection. File memory mapping automatically loads a file’s contents when a process uses the portion of its memory map that the VMM has associated with that file. Memory sharing is used primarily for communication between processes. Copy-on-write lets many processes share one instance of data until one of them writes to it.
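Copy-on-write is easiest to see in a small model. The Python sketch below assumes a page is a byte buffer with a count of the address spaces that map it; a write to a page mapped by more than one address space first gives the writer a private copy. The Page and AddressSpace classes are hypothetical and stand in for structures the text does not describe.

```python
class Page:
    """A physical page plus the number of address spaces mapping it."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.refs = 0


class AddressSpace:
    """Toy per-process address map: virtual page number -> Page."""

    def __init__(self):
        self.pages = {}

    def map_shared(self, vpn, page):
        """Map an existing page without copying it."""
        page.refs += 1
        self.pages[vpn] = page

    def read(self, vpn, offset):
        return self.pages[vpn].data[offset]

    def write(self, vpn, offset, value):
        page = self.pages[vpn]
        if page.refs > 1:
            # First write to a shared page: detach and work on a private copy.
            page.refs -= 1
            page = Page(page.data)
            page.refs = 1
            self.pages[vpn] = page
        page.data[offset] = value
```

Readers keep sharing the single original page; only a process that writes pays for a copy.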
Cache Manager
Windows NT has a single global file system cache. The cache is file oriented and is maintained entirely by the Cache Manager.
Local Procedure Call Facility
This subsystem was created to improve communications performance. Windows NT 4.0 uses data-copying (port to port) for messages smaller than 256 bytes and shared memory for messages larger than 256 bytes.
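The size rule above amounts to a one-line decision. In this Python sketch the function name and the returned labels are hypothetical. The text does not say what happens to a message of exactly 256 bytes, so routing it through shared memory is an assumption.

```python
LPC_COPY_LIMIT = 256  # bytes; the threshold stated in the text


def choose_lpc_transport(message: bytes) -> str:
    """Pick a transfer method for an LPC message based on its size."""
    # Assumption: a message of exactly 256 bytes goes through shared memory.
    if len(message) < LPC_COPY_LIMIT:
        return "port-copy"
    return "shared-memory"
```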
Win32
Windows NT 4.0 implements a portion of the Win32 operating system environment in kernel mode as an Executive subsystem.
Power Manager
This subsystem is integrated into NT 5.0, which Microsoft has announced will be released as Windows 2000. As NT and Windows 95/98 merge, a sophisticated power manager is required for laptop and palmtop computers. Enterprise servers are unlikely to take advantage of this subsystem.
Plug and Play Manager
This subsystem is also an important feature in Windows 2000 to ensure home computer users are able to easily add and remove hardware as well as upgrade device drivers for any existing hardware.
The Windows NT Scheduler is invoked whenever a thread’s time quantum expires, whenever a thread becomes ready to execute (because an event it was waiting for has occurred, or because it has just been created), and whenever a thread begins waiting for an event to occur.
FindReadyThread
FindReadyThread executes whenever a thread completes its time quantum and the scheduler must decide whether another thread will take over the CPU. FindReadyThread also executes whenever a thread gives up the CPU before its time quantum expires because it begins waiting for an event.
ReadyThread
ReadyThread is normally called whenever a thread becomes ready to execute. The ReadyThread algorithm decides whether the thread should be scheduled for immediate execution or placed in the ready queue (the Dispatcher Ready List).
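The two routines can be sketched for a single processor as follows. This Python model assumes 32 priority levels (0 to 31, higher runs first), one FIFO queue per level, and that a preempted thread goes back to the head of its queue; the Dispatcher class and its method names are hypothetical, and none of these details are taken from the text.

```python
from collections import deque

NUM_PRIORITIES = 32  # assumption: priorities 0-31, with 31 the highest


class Dispatcher:
    """Toy single-processor dispatcher with one ready queue per priority."""

    def __init__(self):
        self.ready = [deque() for _ in range(NUM_PRIORITIES)]
        self.running = None  # (name, priority) of the running thread, or None

    def find_ready_thread(self):
        """Remove and return the first thread of the highest non-empty queue."""
        for prio in range(NUM_PRIORITIES - 1, -1, -1):
            if self.ready[prio]:
                return self.ready[prio].popleft(), prio
        return None

    def ready_thread(self, name, prio):
        """Run the thread at once if it outranks the running one, else queue it."""
        if self.running is None:
            self.running = (name, prio)
        elif prio > self.running[1]:
            old_name, old_prio = self.running
            self.running = (name, prio)
            # Assumption: the preempted thread rejoins the head of its queue.
            self.ready[old_prio].appendleft(old_name)
        else:
            self.ready[prio].append(name)
```

A thread of equal priority does not preempt in this sketch; it waits its turn in the ready queue.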
Starvation Prevention
The Balance Set Manager thread wakes up regularly and runs ScanReadyQueues to look for any threads that have not executed in more than 3 seconds. All such threads are boosted to priority 15 (the highest non-realtime priority).
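A minimal sketch of such a scan, in Python. The 3-second limit and the target priority of 15 come from the text; the representation of a thread as a dictionary with name, priority and last_ran fields is an assumption, as is leaving threads already above priority 15 untouched.

```python
STARVATION_LIMIT = 3.0  # seconds without running, from the text
BOOST_PRIORITY = 15     # highest non-realtime priority, from the text


def scan_ready_queues(ready_threads, now):
    """Boost starved threads to priority 15 and return their names."""
    boosted = []
    for thread in ready_threads:
        starved = now - thread["last_ran"] > STARVATION_LIMIT
        # Assumption: threads already at or above 15 are left alone.
        if starved and thread["priority"] < BOOST_PRIORITY:
            thread["priority"] = BOOST_PRIORITY
            boosted.append(thread["name"])
    return boosted
```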
Boosting and Decay
Threads waiting for an event receive a priority boost of between 1 and 6 levels when the event occurs. A thread’s priority decays each time it completes its time quantum. Boosts are cumulative, so a thread can work its way up to priority 15 (but no higher, for non-realtime threads) and stay near that priority.
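The interplay of boost and decay can be modelled as below. The boost range of 1 to 6 and the ceiling of 15 come from the text. The decay step of one level per quantum and the floor at the thread’s base priority are assumptions, and the Thread class is hypothetical.

```python
MAX_DYNAMIC_PRIORITY = 15  # ceiling for non-realtime threads, from the text


class Thread:
    """Toy non-realtime thread with a base and a current priority."""

    def __init__(self, base_priority):
        self.base = base_priority
        self.priority = base_priority

    def on_wait_satisfied(self, boost):
        """Apply a boost when the awaited event occurs; boosts accumulate."""
        if not 1 <= boost <= 6:
            raise ValueError("boost must be between 1 and 6")
        self.priority = min(self.priority + boost, MAX_DYNAMIC_PRIORITY)

    def on_quantum_end(self):
        """Decay after a full quantum (assumed: one level, never below base)."""
        self.priority = max(self.priority - 1, self.base)
```

A thread that waits often keeps collecting boosts and hovers near 15, while a thread that uses up its quanta drifts back toward its base priority.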
Processor Affinity in Multi-processor systems
A thread has soft affinity for a particular processor when the processor happens to be the last processor the thread executed on. Soft Affinity is the primary factor used by the scheduler to schedule threads for execution. Hard affinity for a processor is a design parameter, and can be one or more processors that the thread must execute on. An ideal processor can also be specified for a thread.
When a thread becomes ready to execute, the scheduler first determines whether any processor in the thread’s hard affinity list is idle. If one is, the thread begins executing on that processor. If not, the scheduler looks at only one other processor: the thread’s soft affinity processor. If this processor is currently executing a higher priority thread, the ready-to-execute thread is placed in the Dispatcher Ready List (Ready Queue).
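The placement rule just described can be written as a short function. This Python sketch assumes each processor is represented by the priority of the thread it is running (or None when idle), that the soft affinity processor is among the hard affinity processors, and that a thread of equal priority does not preempt. The function name and data layout are hypothetical.

```python
def pick_processor(thread, cpus):
    """Return the processor to run on, or None to join the ready list.

    cpus maps a processor id to the priority of its running thread,
    or to None when that processor is idle.
    """
    # Step 1: any idle processor in the hard affinity list wins.
    for cpu in thread["hard_affinity"]:
        if cpus[cpu] is None:
            return cpu
    # Step 2: only the soft affinity (last-used) processor is considered.
    last = thread["last_cpu"]
    if cpus[last] is not None and cpus[last] < thread["priority"]:
        return last  # preempt the lower-priority thread running there
    return None      # queued, even if another processor runs lower-priority work


# Usage: processor 0 runs priority 12, processor 1 runs priority 4.
busy = {0: 12, 1: 4}
waiting = {"priority": 8, "hard_affinity": [0, 1], "last_cpu": 0}
placement = pick_processor(waiting, busy)  # None: the thread is queued
```

In the usage example the thread is queued although processor 1 is running lower-priority work, because only the soft affinity processor is examined.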
Thread Migration
Unnecessary thread migration can result from the scheduler’s soft affinity strategy. In the example outlined above, if another processor then becomes idle, the ready-to-execute thread may be scheduled for execution on that processor. The thread has then potentially waited unnecessarily (it was placed in the Dispatcher Ready List when it became ready, even if lower priority threads were executing on processors other than its soft affinity processor) and has also moved away from its soft affinity processor.
Operating System Environments
NT’s operating system environments are implemented as client/server systems. Each environment is implemented as a server, and applications make requests to it using the API exported by the client-side libraries that are linked to the application at build time. Applications written for the Win32 environment are able to make NT native API calls directly as well as calls to the Win32 server. Win32 provides a higher-level interface to the Windows NT native API, but often with significant performance implications. The performance hit has supposedly been alleviated largely through the use of messages called Local Procedure Calls.
Applications that made extensive use of Win32’s drawing functions (the windowing functions) had very high overhead in earlier implementations of Windows NT. Windows NT 4.0 removed this problem by moving a portion of Win32 into the Windows NT Executive. Microsoft has publicly announced that it intends to expand only the Win32 API in the future, and it is conceivable that the Win32 operating system server will be moved entirely into the Windows NT Executive.
Hard Affinity Optimizations
The Canadian Broadcasting Corporation’s web site is currently hosted on a quad-processor Pentium II 350 MHz system with 1 GB of common physical memory. The system is running Windows NT 4.0 SP4, IIS 4.0, and MS-SQL 6.5 SP4. With Microsoft’s assistance, a hard affinity optimization was applied so that SQL threads execute on three of the four processors and the Active Server Pages interpreter and Web Server execute on the remaining processor. The web server is continuously waiting for disk events (although this was alleviated by increasing memory to 1 GB). The optimization yielded an 8% performance improvement: the total time to serve one million page views fell by 8%.
Windows NT handles interrupts differently than most other operating systems. In NT, an interrupt is associated with an Interrupt Service Routine, but that routine does not actually do any of the data exchange with the device, nor does it contain any system instructions. Rather, the Interrupt Service Routine contains code necessary to instruct the device to lower its interrupt signal, and then requests a Deferred Procedure Call (DPC).
DPCs contain the code normally found in an Interrupt Service Routine. The difference is that a DPC executes once the processor’s Interrupt Request Level has been lowered to Dispatch Level, which permits general device interrupts to occur. The advantage of this strategy is that the processor spends much more time at lower Interrupt Request Levels, so that device interrupts can be serviced more frequently.
The Windows NT scheduler has advanced functionality that looks at a given processor’s Deferred Procedure Call queue and decides which DPC to execute based on the status of the processor’s Dispatcher Ready List. The ISR can assign a priority to a DPC, and the request will be placed in the DPC queue accordingly.
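The split between a short ISR and a deferred routine can be sketched as follows. In this Python model the ISR only records that it acknowledged the device and queues a DPC; the real work happens later, when the queue is drained. The Processor class, its method names and the rule that a high-priority DPC goes to the head of the queue are assumptions made for the example.

```python
from collections import deque


class Processor:
    """Toy processor with a per-processor DPC queue."""

    def __init__(self):
        self.dpc_queue = deque()
        self.log = []

    def isr(self, device, high_priority=False):
        """Acknowledge the device and defer the real work to a DPC."""
        self.log.append(f"isr:{device}")

        def dpc():
            # The device data exchange would happen here, at Dispatch Level.
            self.log.append(f"dpc:{device}")

        # Assumption: a high-priority DPC is placed at the head of the queue.
        if high_priority:
            self.dpc_queue.appendleft(dpc)
        else:
            self.dpc_queue.append(dpc)

    def drain_dpcs(self):
        """Run queued DPCs in queue order until none remain."""
        while self.dpc_queue:
            self.dpc_queue.popleft()()
```

Both ISRs finish before either DPC runs, which is what keeps the time spent at Device Level short.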
Symbolic Name | Purpose | Intel Level | Alpha Level
High Level | Highest interrupt level | 31 | 7
Power Level | Power event | 30 | 7
IPI Level | Interprocessor signal | 29 | 6
Clock Level | Clock tick | 28 | 5
Profile Level | Performance monitoring | 27 | 3
Device Level | General device interrupts | 3-26 | 3-4
Dispatch Level | Scheduler operations and deferred procedure calls (DPCs) | 2 | 2
APC Level | Asynchronous procedure calls (APCs) | 1 | 1
Passive Level | No interrupts | 0 | 0
One of the biggest problems with NT’s interrupt handling mechanism is that the Windows NT Scheduler is inoperative whenever interrupts are being serviced (ISRs and all associated DPCs). Scheduler operations run at Dispatch Level, while DPCs and ISRs run at Dispatch Level and Device Level respectively, so while either is running no interrupt at Dispatch Level or lower is accepted.
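The masking rule behind this problem is a single comparison. The Python sketch below uses the Intel levels from the table above; the constant and function names are hypothetical.

```python
# Intel interrupt request levels, taken from the table above.
PASSIVE_LEVEL = 0
APC_LEVEL = 1
DISPATCH_LEVEL = 2
DEVICE_LEVELS = range(3, 27)  # general device interrupts, levels 3-26


def is_accepted(current_irql, interrupt_irql):
    """An interrupt is accepted only if its level is above the current one."""
    return interrupt_irql > current_irql


# While a DPC runs at Dispatch Level, scheduler work at the same level waits,
scheduler_runs_during_dpc = is_accepted(DISPATCH_LEVEL, DISPATCH_LEVEL)
# but every general device interrupt can still get through.
devices_run_during_dpc = all(
    is_accepted(DISPATCH_LEVEL, level) for level in DEVICE_LEVELS
)
```

While an ISR runs at a Device Level, the same comparison shows that Dispatch Level work, and therefore the scheduler, is held off as well.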