8/21/2009

NT-Architecture: What is.. Kernel- and User-Mode

Most operating systems have a program for displaying CPU utilization. In Windows, this program is “Task Manager”.

CPU utilization is generally displayed as a simple percentage of CPU time spent on non-idle tasks. But this is a simplification: in every modern operating system, the CPU spends its time in two very distinct modes:

  1. Kernel-Mode

Code executing in Kernel-Mode has unrestricted access to the hardware. It can execute every CPU instruction and has full access to every memory address. Kernel mode is reserved for the lowest-level, most trusted functions of the operating system. A crash in kernel mode is catastrophic, because it halts the whole PC.

  2. User-Mode

Code executing in User-Mode cannot directly access hardware or arbitrary memory. It must go through system APIs to reach the system's hardware and memory. Crashes in user mode are recoverable, because this isolation protects the rest of the system. Most code is executed in user mode.

It is possible to display the kernel time in Windows Task Manager (as I did in the picture above). The green line represents the total CPU time and the red line represents the kernel time. The difference between these lines shows the user time.
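To make the "green line minus red line" arithmetic concrete, here is a minimal sketch in C. The `cpu_ticks` and `cpu_usage` types and the `usage_between` function are hypothetical names invented for this illustration, not a Windows API; the sketch assumes you already have two snapshots of cumulative idle, kernel and user tick counters, with idle time counted separately from kernel time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of cumulative CPU tick counters. */
typedef struct {
    uint64_t idle;    /* ticks spent idle */
    uint64_t kernel;  /* ticks spent in kernel mode (idle not included) */
    uint64_t user;    /* ticks spent in user mode */
} cpu_ticks;

typedef struct {
    double total_pct;   /* green line: all non-idle time */
    double kernel_pct;  /* red line: kernel time only */
    double user_pct;    /* the gap between the two lines */
} cpu_usage;

/* Computes the three percentages for the interval between two snapshots. */
cpu_usage usage_between(cpu_ticks before, cpu_ticks after)
{
    cpu_usage u = { 0.0, 0.0, 0.0 };

    /* Counters are cumulative, so they can only grow. */
    assert(after.idle >= before.idle);
    assert(after.kernel >= before.kernel);
    assert(after.user >= before.user);

    uint64_t idle = after.idle - before.idle;
    uint64_t kernel = after.kernel - before.kernel;
    uint64_t user = after.user - before.user;
    uint64_t all = idle + kernel + user;

    if (all == 0)
        return u;  /* no time elapsed between the snapshots */

    u.kernel_pct = 100.0 * (double)kernel / (double)all;
    u.user_pct = 100.0 * (double)user / (double)all;
    u.total_pct = u.kernel_pct + u.user_pct;
    return u;
}
```

Whatever the source of the counters, the user share is always the total minus the kernel share, which is exactly the gap between the two lines in the graph.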

These two modes aren't mere labels; they're enforced by the CPU hardware. If code executing in User mode attempts to do something outside its purview-- like, say, accessing a privileged CPU instruction or modifying memory that it has no access to -- a trappable exception is thrown. Instead of your entire system crashing, only that particular application crashes.

x86 CPU hardware has 4 protection rings: 0, 1, 2, and 3. Typically just 0 and 3 are used: ring 0 for Kernel mode and ring 3 for User mode.
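On x86, the ring that code is currently running in is stored in the low two bits of the code segment selector (the CS register). A minimal sketch of decoding it follows; `selector_ring` is a name made up for this example, and the selector values mentioned afterwards are only illustrative.

```c
#include <stdint.h>

/* The privilege level lives in bits 0-1 of an x86 segment selector. */
unsigned selector_ring(uint16_t selector)
{
    return selector & 0x3u;
}
```

For example, a selector of 0x1B decodes to ring 3 and a selector of 0x08 decodes to ring 0. These are the user-mode and kernel-mode code selectors commonly seen on 32-bit Windows, but treat the exact values as an assumption rather than a guarantee.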

If we're only using two isolation rings, it's a bit unclear where device drivers should go-- the code that allows us to use our video cards, keyboards, mice, printers, and so forth. Do these drivers run in Kernel mode, for maximum performance, or do they run in User mode, for maximum stability? In Windows, at least, the answer is it depends. Device drivers can run in either user or kernel mode. Most drivers are shunted to the User side of the fence these days, with the notable exception of video card drivers, which need bare-knuckle Kernel mode performance. But even that is changing; in Windows Vista, video drivers are segmented into User and Kernel sections. Perhaps that's why gamers complain that Vista performs about 10 percent slower in games.

The exact border between these modes is still somewhat unclear. What code should run in User mode? What code should run in Kernel mode? Or maybe we'll just redefine the floor as the basement-- the rise of virtualization drove the creation of a new ring below all the others, Ring -1, which we now know as x86 hardware virtualization.

User mode is clearly helpful, but not without a disadvantage: transitioning between the two modes is slow, far more expensive than an ordinary function call.

The CPU's strict segregation of code between User and Kernel mode is completely transparent to most of us, but most of the time it is the difference between a single program crashing and the whole computer crashing.

8/11/2009

News: Nexuiz runs on ReactOS!

Nexuiz (an open-source first-person shooter based on the DarkPlaces engine) has been running on ReactOS since cgutman's commit of 2009/08/07:


http://svn.reactos.org/svn/reactos?view=rev&revision=42467
Author: cgutman
Log Message:
- Call IoCompleteRequest to free IRPs created by IoBuildDeviceIoControlRequest
- Fixes bug 4770

However, you currently cannot play online, because Nexuiz doesn't download the server list in ReactOS.
Additionally, the game runs extremely slowly (even with low graphics settings) if only software rendering is used. But if you install a graphics driver with OpenGL support, you can also use the hardware acceleration of the graphics card.

screenshots of Nexuiz running on ReactOS:


Enjoy the game ;-)

8/10/2009

Inside the mind of a ReactOS developer: implementing _SEH_prolog and _SEH_epilog (without making baby Jesus cry)

Disclaimer

This post is not for the faint of heart. This post assumes you know the C language (especially the Microsoft implementation), C compilers (especially Visual C++), what intrinsic functions are, what SEH is and how it works, and how to read x86 assembly. I could spend entire blog posts, no, entire blogs, just explaining to you what the words alone mean, so I'll just go ahead and assume you know who you are.

… baby Jesus?

Implementing open source intrinsics for commercial compilers is a horrible liability, and a thankless job: almost nobody will even notice you have done anything, and half of those who do notice will accuse you of copyright violation. Since my work will not only be used by ReactOS, but also by projects like mingw-w64 whose reputation has not been compromised yet, I have to be extra careful with it.

For this reason, and because I believe I'm not alone in enjoying this kind of crazy shit, I have decided to blog to document my thought processes while writing obscure, low-level code. I hope at least some of you will enjoy it.

Why do you do it, anyway?

Strictly speaking, this work is not required. For the purposes of compiling ReactOS with Visual C++, we could simply link the original Microsoft implementations.

However, I collaborate with the mingw-w64 project as well, which aims to create a Windows version of gcc that's as close as possible to both the GNU and the Win32/Win64 platforms. Being able to link code (e.g. static libraries) compiled with Microsoft tools would be a nice plus, and to achieve that, the runtime library that comes with mingw-w64 has to provide all the Microsoft compiler intrinsics.

Additionally, I'm very nervous about linking code in ReactOS executables that comes from outside our source tree, for several reasons (for one, we have no guarantee about the compiler options that were used to compile the third-party code, and this has come to bite us in the ass before). This seems to be a common sentiment among operating system developers, as I've seen more than one tutorial on how to provide missing compiler intrinsics.

Finally, writing compiler intrinsics is work I enjoy a lot (I am the author of PSEH). I have no idea why.

And now for the main course

We are all familiar with the code that Visual C++ generates for a function containing SEH, as it has not changed in more than ten years. Given C code like this:

void a(void)
{
    __try { }
    __except(1) { }
}

the compiler will emit assembler like this, with the recognizable and well-known prologs and epilogs (which I marked in bold):

_TEXT	SEGMENT
__$SEHRec$ = -24
_a	PROC
	push	ebp
	mov	ebp, esp
	push	-1
	push	OFFSET __sehtable$_a
	push	OFFSET __except_handler3
	mov	eax, DWORD PTR fs:0
	push	eax
	mov	DWORD PTR fs:0, esp
	sub	esp, 8
	push	ebx
	push	esi
	push	edi
	mov	DWORD PTR __$SEHRec$[ebp], esp
	mov	DWORD PTR __$SEHRec$[ebp+20], 0
	jmp	SHORT $LN9@a
$LN5@a:
	mov	eax, 1
$LN7@a:
	ret	0
$LN6@a:
	mov	esp, DWORD PTR __$SEHRec$[ebp]
$LN9@a:
	mov	DWORD PTR __$SEHRec$[ebp+20], -1
	mov	ecx, DWORD PTR __$SEHRec$[ebp+8]
	mov	DWORD PTR fs:0, ecx
	pop	edi
	pop	esi
	pop	ebx
	mov	esp, ebp
	pop	ebp
	ret	0
_a	ENDP
_TEXT	ENDS

But, like many things, this changed with the release of Visual Studio .NET, which was a huge leap forward from its predecessor Visual Studio 6.0, and in many ways a departure from the old ways.

Specifically, since Visual Studio .NET, when the Microsoft compiler optimizes for code size (option /O1 on the command line), rather than speed (/O2), it will emit assembler like this, instead:

_TEXT	SEGMENT
__$SEHRec$ = -24
_a	PROC
	push	8
	push	OFFSET __sehtable$_a
	call	__SEH_prolog
	and	DWORD PTR __$SEHRec$[ebp+20], 0
	jmp	SHORT $LN9@a
$LN5@a:
	xor	eax, eax
	inc	eax
$LN7@a:
	ret	0
$LN6@a:
	mov	esp, DWORD PTR __$SEHRec$[ebp]
$LN9@a:
	or	DWORD PTR __$SEHRec$[ebp+20], -1
	call	__SEH_epilog
	ret	0
_a	ENDP
_TEXT	ENDS

The prolog code was collapsed into a call to _SEH_prolog, and the epilog code into a call to _SEH_epilog.

It's easy to see why: a fixed per-function overhead of 54 bytes is turned into a fixed per-function overhead of 11 bytes (to the benefit of code size) by moving the invariant code to a library function (to the detriment of code speed, because function calls are expensive).

Apparently, nobody documents what _SEH_prolog and _SEH_epilog do, so I had to find out on my own. As soon as I discovered how to reliably make the compiler emit the inline code or the calls for the same code (/O2 vs /O1 on the command line, it turned out), though, what each function did was self-evident. If we look at the assembly listings above, we'll see that they are virtually identical, save for the bolded sections, therefore the functions used in the second listing are completely equivalent to the prolog/epilog code of the first listing. Since the functions have only one implementation each, the inline code we are looking at is the only possible prolog/epilog. In other words, the only requirement for _SEH_prolog is that, if it's invoked like this:

	push	8
	push	OFFSET __sehtable$_func
	call	__SEH_prolog

it will have the same effect as the following code:

	push	ebp
	mov	ebp, esp
	push	-1
	push	OFFSET __sehtable$_func
	push	OFFSET __except_handler3
	mov	eax, DWORD PTR fs:0
	push	eax
	mov	DWORD PTR fs:0, esp
	sub	esp, 8
	push	ebx
	push	esi
	push	edi
	mov	DWORD PTR __$SEHRec$[ebp], esp

and the only requirement for _SEH_epilog is that, when invoked, it will have the same effect as the following code:

	mov	ecx, DWORD PTR __$SEHRec$[ebp+8]
	mov	DWORD PTR fs:0, ecx
	pop	edi
	pop	esi
	pop	ebx
	mov	esp, ebp
	pop	ebp

We can see that _SEH_prolog and _SEH_epilog cannot be regular functions, because they modify the stack of the caller: _SEH_prolog allocates stack space, saves registers and sets up a SEH frame, and _SEH_epilog undoes it. Since the stack pointer when the functions exit is at the "wrong" height (_SEH_prolog allocates stack space, _SEH_epilog deallocates), we can also reasonably assume that the return address is extracted from the stack into a register. We can also reasonably assume that the functions end with a ret instruction, so they don't screw too much with the CPU's branch predictor (which expects each call to be followed by a ret, not by a jmp that sometimes happens to go to the right place), so their last instructions have to look a lot like this:

	; put the return address on the stack
	push	register

	; return to the return address
	ret

The stack tricks alone are a guarantee that the functions cannot be written in C, and must be written in raw x86 assembler.

After this discovery, I decided to take a closer look at how, exactly, the prolog code sets up the stack (and what, by extension, would the epilog code have to do to undo it).

Stack layout of a SEH-using function

By "manually" executing the instructions of the prolog, one by one, it's easy to see how the stack is being set up, and which special locations on the stack will be pointed to by which special registers:

-00000004:	original ebp	; ebp points here
-00000008:	-1
-0000000c:	OFFSET __sehtable$_func
-00000010:	OFFSET __except_handler3
-00000014:	original fs:0	; fs:0 points here
-00000018:	undefined
-0000001C:	new esp		; (ebp+__$SEHRec$)
-00000020:	ebx
-00000024:	esi
-00000028:	edi		; esp points here

(instead of actual, absolute stack addresses, I'll use offsets from the initial value)

The 8 in the sub esp, 8 instruction is the size of the space between stack locations -00000014 and -0000001C. We can easily verify that this number is actually 8+0, where 0 is the combined size of the stack-allocated variables.

Summing up, for a function with N bytes of local variables, the final layout will be:

-00000004:	original ebp
-00000008:	-1
-0000000c:	OFFSET __sehtable$_func
-00000010:	OFFSET __except_handler3
-00000014:	original fs:0
-00000018:	undefined
-0000001c:	new esp		; (ebp+__$SEHRec$)
-00000020:	variable -4	; (ebp+__$SEHRec$) -4
-00000024:	variable -8	; (ebp+__$SEHRec$) -8
   ...		variable ...	; (ebp+__$SEHRec$) ...
-00000020 - N:	ebx
-00000024 - N:	esi
-00000028 - N:	edi		; esp points here

The _SEH_prolog is therefore defined as the function that can turn the stack from its initial state to the final layout, and set special registers ebp and fs:0 (actually a special memory location, but a software register in practice) to the expected values, and _SEH_epilog as the function that can turn the final layout back into the initial state, and reset the special registers ebx, esi, edi, ebp and fs:0 to their initial values.
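The layout rule can be written down as a tiny C function, which is handy for double-checking offsets by hand. This is only a sketch: `seh_frame_layout` and `seh_layout` are names invented here, offsets are relative to the initial stack pointer exactly as in the tables above, and N is assumed to be a multiple of 4.

```c
#include <stdint.h>

/* Offsets are relative to the initial stack pointer, as in the tables. */
typedef struct {
    uint32_t prolog_arg;  /* value the caller pushes before call __SEH_prolog */
    int32_t saved_esp;    /* slot holding "new esp", i.e. ebp+__$SEHRec$ */
    int32_t ebx;          /* slot holding the saved ebx */
    int32_t esi;          /* slot holding the saved esi */
    int32_t edi;          /* slot holding the saved edi; esp ends up here */
} seh_frame_layout;

/* Computes the layout for a function with n_locals bytes of local variables. */
seh_frame_layout seh_layout(uint32_t n_locals)
{
    seh_frame_layout l;
    int32_t n = (int32_t)n_locals;

    l.prolog_arg = 8u + n_locals;  /* the "8 + N" of the sub esp instruction */
    l.saved_esp = -0x1c;           /* fixed part: does not depend on N */
    l.ebx = -0x20 - n;
    l.esi = -0x24 - n;
    l.edi = -0x28 - n;
    return l;
}
```

With N = 0 this reproduces the first table (edi at -00000028), and every extra 4 bytes of locals pushes the three saved registers one slot further down.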

Implementing _SEH_prolog

We have to write the _SEH_prolog function so that it can create the final layout depicted above from an initial layout of:

-00000004:	8 + N
-00000008:	OFFSET __sehtable$_func
-0000000c:	return address

Additionally, _SEH_prolog has to do so without disturbing the calling function's execution:

  1. it must return to the instruction after the call __SEH_prolog;
  2. it must not overwrite the ecx register (which contains the this pointer in __thiscall functions, and the first argument in __fastcall functions) nor the edx register (which contains the second argument in __fastcall functions);
  3. although I cannot prove this, it's most probably expected to not overwrite the ebx, esi or edi registers.

These limitations only leave the eax register free. We will try our best to only ever use eax.

The current contents of the stack are all wrong, but they all are information that we will need later, so we leave them alone. The next word on the stack is the address of __except_handler3, and the word after that is the current contents of fs:0. We can simply push them on the stack:

__SEH_prolog PROC
	push	OFFSET __except_handler3
	mov	eax, DWORD PTR fs:0
	push	eax
	...
__SEH_prolog ENDP

At this point, the stack would be at the correct height to store its pointer in fs:0, but we can't do it yet: we cannot push a SEH frame before we have finished initializing it. We hold our horses, and move on to the next stack location.

The next stack location is the variable-sized part. We know the size of this part, because it's on the stack, passed as an argument to _SEH_prolog. All we need to do is to get it, and lower the stack by that amount. The current layout of the stack is:

-00000004:	8 + N
-00000008:	OFFSET __sehtable$_func
-0000000c:	return address
-00000010:	OFFSET __except_handler3
-00000014:	original fs:0 ; esp points here

so the argument is at [esp+16]. We cannot lower the stack yet, though, because then we wouldn't have a nice fixed offset to access the second argument (the pointer to __sehtable$_func) and the return address. Sounds like a good time as any to initialize the ebp register. We'll copy the argument from offset -00000004 to a register, save the current value of ebp to offset -00000004, and then set the new, final value of ebp:

__SEH_prolog PROC
	push	OFFSET __except_handler3
	mov	eax, DWORD PTR fs:0
	push	eax
	mov	eax, [esp+16]
	mov	[esp+16], ebp
	lea	ebp, [esp+16]
	...
__SEH_prolog ENDP

With a pointer to a well-known location of the stack, we don't need to worry anymore about the variable-sized part, which we can now simply allocate:

__SEH_prolog PROC
	push	OFFSET __except_handler3
	mov	eax, DWORD PTR fs:0
	push	eax
	mov	eax, [esp+16]
	mov	[esp+16], ebp
	lea	ebp, [esp+16]
	sub	esp, eax
	...
__SEH_prolog ENDP

The stack is looking a little better now:

-00000004:	original ebp
-00000008:	OFFSET __sehtable$_func
-0000000c:	return address
-00000010:	OFFSET __except_handler3
-00000014:	original fs:0
-00000018:	undefined
-0000001c:	new esp
-00000020:	...

We have to move the return address to eax, and the pointer to __sehtable$_func to stack offset -0000000c. Unfortunately, we cannot do two overlapping moves with a single register, so we'll store the return address on top of the stack instead, where it belongs anyway. To do so, we must first push the ebx, esi and edi registers, because _func expects them to be on the top of the stack when _SEH_prolog returns; we will then store the current value of esp in [ebp-24], since the stack is now at the expected height:

__SEH_prolog PROC
	push	OFFSET __except_handler3
	mov	eax, DWORD PTR fs:0
	push	eax
	mov	eax, [esp+16]
	mov	[esp+16], ebp
	lea	ebp, [esp+16]
	sub	esp, eax
	push	ebx
	push	esi
	push	edi
	mov	[ebp-24], esp
	...
__SEH_prolog ENDP

We can then move the return address from -0000000c ([ebp-8]) to the top of the stack, then move __sehtable$_func from -00000008 ([ebp-4]) to -0000000c ([ebp-8]), and finally set -00000008 ([ebp-4]) to constant value -1:

__SEH_prolog PROC
	push	OFFSET __except_handler3
	mov	eax, DWORD PTR fs:0
	push	eax
	mov	eax, [esp+16]
	mov	[esp+16], ebp
	lea	ebp, [esp+16]
	sub	esp, eax
	push	ebx
	push	esi
	push	edi
	mov	[ebp-24], esp
	mov	eax, [ebp-8]
	push	eax
	mov	eax, [ebp-4]
	mov	[ebp-8], eax
	mov	DWORD PTR [ebp-4], -1
	...
__SEH_prolog ENDP

The stack is now ready, with an extra word on top containing the return address. Initialization is done, and we can, at last, set fs:0 to the new SEH frame, starting at -00000014 ([ebp-16]), and return to the caller:

__SEH_prolog PROC
	push	OFFSET __except_handler3
	mov	eax, DWORD PTR fs:0
	push	eax
	mov	eax, [esp+16]
	mov	[esp+16], ebp
	lea	ebp, [esp+16]
	sub	esp, eax
	push	ebx
	push	esi
	push	edi
	mov	[ebp-24], esp
	mov	eax, [ebp-8]
	push	eax
	mov	eax, [ebp-4]
	mov	[ebp-8], eax
	mov	DWORD PTR [ebp-4], -1
	lea	eax, [ebp-16]
	mov	DWORD PTR fs:0, eax
	ret
__SEH_prolog ENDP

And with this, _SEH_prolog is done.
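If you don't trust stack juggling done in your head (I don't), the sequence can be replayed on a toy model of the stack. The following C sketch is not the real thing: the `cpu` structure, the helper names and every address constant are made up for the simulation. It executes the steps in the order described in the text, with the store to [ebp-24] placed after the three register pushes, as in the inline prolog.

```c
#include <assert.h>
#include <stdint.h>

/* All addresses and constants below are made up for the simulation. */
#define STACK_TOP   0x1000u      /* initial esp: offset 0 in the tables */
#define STACK_WORDS 64u
#define SEHTABLE    0x00401000u  /* stands in for OFFSET __sehtable$_func */
#define HANDLER     0x00402000u  /* stands in for OFFSET __except_handler3 */
#define RET_ADDR    0x00403000u  /* instruction after call __SEH_prolog */

typedef struct {
    uint32_t mem[STACK_WORDS];   /* the 256 bytes just below STACK_TOP */
    uint32_t eax, ebx, esi, edi, ebp, esp, fs0, eip;
} cpu;

/* Returns the stack word stored at a simulated address. */
static uint32_t *at(cpu *c, uint32_t addr)
{
    uint32_t base = STACK_TOP - 4u * STACK_WORDS;
    assert(addr >= base && addr < STACK_TOP && addr % 4u == 0);
    return &c->mem[(addr - base) / 4u];
}

static void push(cpu *c, uint32_t value)
{
    c->esp -= 4u;
    *at(c, c->esp) = value;
}

/* The caller's three pushes, followed by the body of __SEH_prolog. */
void call_seh_prolog(cpu *c, uint32_t n_locals)
{
    push(c, 8u + n_locals);             /* push 8 + N */
    push(c, SEHTABLE);                  /* push OFFSET __sehtable$_func */
    push(c, RET_ADDR);                  /* pushed by call __SEH_prolog */

    push(c, HANDLER);                   /* push OFFSET __except_handler3 */
    c->eax = c->fs0;                    /* mov  eax, fs:0 */
    push(c, c->eax);                    /* push eax */
    c->eax = *at(c, c->esp + 16u);      /* mov  eax, [esp+16] */
    *at(c, c->esp + 16u) = c->ebp;      /* mov  [esp+16], ebp */
    c->ebp = c->esp + 16u;              /* lea  ebp, [esp+16] */
    c->esp -= c->eax;                   /* sub  esp, eax */
    push(c, c->ebx);                    /* push ebx */
    push(c, c->esi);                    /* push esi */
    push(c, c->edi);                    /* push edi */
    *at(c, c->ebp - 24u) = c->esp;      /* mov  [ebp-24], esp */
    c->eax = *at(c, c->ebp - 8u);       /* mov  eax, [ebp-8] */
    push(c, c->eax);                    /* push eax (return address) */
    c->eax = *at(c, c->ebp - 4u);       /* mov  eax, [ebp-4] */
    *at(c, c->ebp - 8u) = c->eax;       /* mov  [ebp-8], eax */
    *at(c, c->ebp - 4u) = 0xFFFFFFFFu;  /* mov  DWORD PTR [ebp-4], -1 */
    c->eax = c->ebp - 16u;              /* lea  eax, [ebp-16] */
    c->fs0 = c->eax;                    /* mov  fs:0, eax */
    c->eip = *at(c, c->esp);            /* ret */
    c->esp += 4u;
}
```

To use it, zero a `cpu`, set `esp` to `STACK_TOP`, give `ebp`, `fs0`, `ebx`, `esi` and `edi` recognizable values, call `call_seh_prolog`, and compare the simulated memory with the final layout table, slot by slot.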

Implementing _SEH_epilog

The _SEH_epilog function has restrictions similar to those of _SEH_prolog, and then more:

  1. it must return to the instruction after the call __SEH_epilog;
  2. it cannot use the eax or edx registers, because they contain the return value of the function;
  3. it must restore the initial values of ebx, esi, edi, ebp and fs:0;

As we limited ourselves to using eax in _SEH_prolog, we'll limit ourselves to ecx in _SEH_epilog. We know ecx is safe to overwrite in the epilog (even in __fastcall and __thiscall functions) because the inline epilog code generated by the compiler does so.

The stack layout on entering _SEH_epilog is:

-00000004:	original ebp	; ebp points here
-00000008:	-1
-0000000c:	OFFSET __sehtable$_func
-00000010:	OFFSET __except_handler3
-00000014:	original fs:0	; fs:0 points here
-00000018:	undefined
-0000001C:	new esp
-00000020:	ebx
-00000024:	esi
-00000028:	edi
-0000002c:	return address 	; esp points here

Following the example of the inline code, we restore the original value of fs:0 first:

__SEH_epilog PROC
	mov	ecx, DWORD PTR [ebp-16]
	mov	DWORD PTR fs:0, ecx
	...
__SEH_epilog ENDP

Then we pop the return address off the stack, so we can restore ebx, esi and edi:

__SEH_epilog PROC
	mov	ecx, DWORD PTR [ebp-16]
	mov	DWORD PTR fs:0, ecx
	pop	ecx
	pop	edi
	pop	esi
	pop	ebx
	...
__SEH_epilog ENDP

The next operation the inline code does is restoring esp and ebp, so let's do it too:

__SEH_epilog PROC
	mov	ecx, DWORD PTR [ebp-16]
	mov	DWORD PTR fs:0, ecx
	pop	ecx
	pop	edi
	pop	esi
	pop	ebx
	mov	esp, ebp
	pop	ebp
	...
__SEH_epilog ENDP

Finally, let's put the return address back on the stack, and then return to it:

__SEH_epilog PROC
	mov	ecx, DWORD PTR [ebp-16]
	mov	DWORD PTR fs:0, ecx
	pop	ecx
	pop	edi
	pop	esi
	pop	ebx
	mov	esp, ebp
	pop	ebp
	push	ecx
	ret	0
__SEH_epilog ENDP
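The same kind of toy model can be used to check the epilog. Again, this is a sketch under assumptions: the `cpu` structure, the helper names and all constants are invented for the simulation. `build_frame` simply mirrors the inline prolog so that there is a frame to tear down, and `call_seh_epilog` replays the call followed by the instructions of the listing above.

```c
#include <assert.h>
#include <stdint.h>

/* All addresses and constants below are made up for the simulation. */
#define STACK_TOP   0x1000u      /* initial esp: offset 0 in the tables */
#define STACK_WORDS 64u
#define RET_ADDR    0x00403000u  /* instruction after call __SEH_epilog */

typedef struct {
    uint32_t mem[STACK_WORDS];   /* the 256 bytes just below STACK_TOP */
    uint32_t ecx, ebx, esi, edi, ebp, esp, fs0, eip;
} cpu;

/* Returns the stack word stored at a simulated address. */
static uint32_t *at(cpu *c, uint32_t addr)
{
    uint32_t base = STACK_TOP - 4u * STACK_WORDS;
    assert(addr >= base && addr < STACK_TOP && addr % 4u == 0);
    return &c->mem[(addr - base) / 4u];
}

static void push(cpu *c, uint32_t value)
{
    c->esp -= 4u;
    *at(c, c->esp) = value;
}

static uint32_t pop(cpu *c)
{
    uint32_t value = *at(c, c->esp);
    c->esp += 4u;
    return value;
}

/* Mirrors the inline prolog for a function with n_locals bytes of locals. */
void build_frame(cpu *c, uint32_t n_locals)
{
    push(c, c->ebp);                /* push ebp */
    c->ebp = c->esp;                /* mov  ebp, esp */
    push(c, 0xFFFFFFFFu);           /* push -1 */
    push(c, 0x00401000u);           /* stand-in for the SEH table address */
    push(c, 0x00402000u);           /* stand-in for __except_handler3 */
    push(c, c->fs0);                /* push the old fs:0 */
    c->fs0 = c->esp;                /* mov  fs:0, esp */
    c->esp -= 8u + n_locals;        /* sub  esp, 8 + N */
    push(c, c->ebx);
    push(c, c->esi);
    push(c, c->edi);
    *at(c, c->ebp - 24u) = c->esp;  /* mov  [ebp-24], esp */
}

/* The call instruction followed by the body of __SEH_epilog. */
void call_seh_epilog(cpu *c)
{
    push(c, RET_ADDR);               /* pushed by call __SEH_epilog */
    c->ecx = *at(c, c->ebp - 16u);   /* mov  ecx, [ebp-16] */
    c->fs0 = c->ecx;                 /* mov  fs:0, ecx */
    c->ecx = pop(c);                 /* pop  ecx (return address) */
    c->edi = pop(c);                 /* pop  edi */
    c->esi = pop(c);                 /* pop  esi */
    c->ebx = pop(c);                 /* pop  ebx */
    c->esp = c->ebp;                 /* mov  esp, ebp */
    c->ebp = pop(c);                 /* pop  ebp */
    push(c, c->ecx);                 /* push ecx */
    c->eip = pop(c);                 /* ret  0 */
}
```

After `build_frame`, overwrite `ebx`, `esi` and `edi` with garbage (that's the function body doing its job), call `call_seh_epilog`, and every register, `fs0` and `esp` should be back at its starting value.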

And we're done. Any questions?

Q&A

What about functions that have more than 4096 bytes of local variables?

If a function has more than 4096 bytes of local variables (4068, actually — the hidden 28 bytes are the SEH frame minus the saved ebx, esi and edi), then the compiler will emit inline prologs/epilogs; the prolog will use _chkstk instead of sub esp.

What about code compiled with stack checks (/GS)?

Functions compiled with stack checks have different prologs/epilogs; the external implementations are called _SEH_prolog4 and _SEH_epilog4, and maybe we'll see them in a future episode.

Conclusions

In this first installment of "Inside the mind of a ReactOS developer" we have seen what the _SEH_prolog and _SEH_epilog functions are, what they do and how to implement them without looking at so much as one byte of copyrighted Microsoft code.

This blog post is the first public source on the internet that describes the aforementioned functions in detail (that I know of), and if there is interest, I will write more like this.

Comments are welcome.

That's all.

8/08/2009

NT architecture: What is...an Architecture?

Windows is a well-known OS from the user's point of view (tell me the first game you played wasn't Winmine or Solitaire under some version of Windows). Thanks to some interesting books such as "Windows Internals" (4th or 5th edition), you can now also begin to understand the internal behavior of Windows.

ReactOS is an operating system that aims to have the same internal behavior as Windows. By following this target, ReactOS will be able to load and run applications and drivers designed for Windows.

Windows, GNU/Linux, ReactOS or any other OS has just one objective from the user's point of view: run apps. From the developer's point of view: create the fastest and most reliable way of connecting applications with hardware, processing requests and returning results.

But "Windows" is just an easy name to remember. The way an OS behaves, the way it processes information, how an app is "processed" and how it is connected with the hardware (to show results on the LCD screen) is called its architecture. Incidentally, "Windows" has followed different architectures over time. The latest one is called the "NT architecture".

What's an Architecture?

I like the idea of comparing an OS architecture with the architecture of buildings. Imagine you are an architect: you can give the building any shape you want, but it has to house 100 people. You can decide to create a building of 10 floors with 10 flats on each floor, or you can decide to create a skytower of 100 floors with just one flat on each floor. They are different architectures, but both are valid solutions. In the same way you can have different OS architectures. In this case the requirement is "processing and connecting the software with the hardware", and so you find the NT architecture, the Mac OS X architecture, the Linux kernel...

And the Best Architecture is...

Let's return to the idea of the buildings. Deciding between a skytower and a 10-floor building means facing different problems.

Let's study the skytower: you will have one neighbor on each floor, which means that each neighbor has a direct relation only with the neighbor over his head and the one under his feet. So the flats are quite silent. That is a nice thing if you are not only the architect but also the owner who wants to rent out the flats. "Silent full-floor flats": nice advertisement.

But of course it has some drawbacks. The first one: if the guy on the first floor wants to go to the roof, he will need to pack some sandwiches for the trip if there isn't a direct elevator.

The 10-floor architecture is a nice approach. It is much less expensive to construct, but there is much more noise on each floor, and a lot of relations (which in a neighborhood can lead to problems) are created. Let's count the relations: each guy on the 5th floor will have 9 neighbors on his own floor, 10 neighbors over his head and another 10 under his feet. Total: 29 direct relations. And this is for EACH guy. Try to calculate the total number of relations and compare it with the total for the skytower approach.

Now, we can indeed invent a new architecture called 1-Floor: an odd "building" which has just one enormous floor and 100 neighbors. Could you calculate the number of possible direct relations? Yes, a lot. But hey! They can get to the roof quite easily, without any elevator!! Wowww.. (*ahem*)

OK, OK. Why am I talking about roofs, neighbors and floors?
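For those who would rather let the computer do the counting, here is a small sketch in C. The function name `direct_relations` is made up for this example, and it assumes the counting rule used for the 5th-floor guy: every neighbor has a direct relation with everybody on his own floor and with everybody on the floors directly above and below. Each relation is counted once, not once per neighbor.

```c
#include <stdint.h>

/* Total direct relations in a building with the given shape. */
uint32_t direct_relations(uint32_t floors, uint32_t flats_per_floor)
{
    uint32_t f = flats_per_floor;

    /* Pairs of neighbors living on the same floor. */
    uint32_t same_floor = floors * (f * (f - 1u) / 2u);

    /* Pairs of neighbors living on two adjacent floors. */
    uint32_t adjacent_pairs = (floors > 0u) ? floors - 1u : 0u;
    uint32_t between_floors = adjacent_pairs * f * f;

    return same_floor + between_floors;
}
```

Under this rule the Skytower (100 floors, 1 flat each) has 99 relations, the 10-floor building (10 floors, 10 flats each) has 1350, and the 1-Floor "building" (1 floor, 100 flats) has 4950.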

The Translation...

An OS architecture can be studied as a number of superposed layers, just as a building can be studied as a number of superposed floors. So we can have an architecture with 100 layers or an architecture with just 1 layer. This doesn't say much yet, but let's carry on. Each layer is usually divided into different structures, as a floor can be divided into 10 flats. And each flat (sorry, structure) has its own task inside the architecture. One flat (sorry, structure) processes the graphics, another flat talks with the hardware, another flat talks with the application, another flat talks with flat 2 because flat 3 is angry with the latter...

So let's go to the skytower example again:

On the top (the roof) of the skytower there is an application, in the basement of the building there is the hardware, and the flats have to create "a way" for the app to reach the hardware. (Everything ends up in the microprocessor, right? Or accessing the HDD, or printing info on the screen.)

The skytower is the typical example of an over-layered OS. Having too many layers has some advantages: you as a programmer don't have to create a lot of direct relations between the structures (neighbors); in this skytower each one has just 2 direct relations. So it seems easier to control, but... you face one main issue: slowness. The "call" has to go through 100 floors. No, there isn't a fast elevator. Maybe the "call" doesn't need to be processed on floor number 63, but it has to go through it to reach floor number 62.

What happens with a 1-Floor architecture? This is an architecture that is not layered at all. Layers are nice since they reduce the number of direct relations (if you didn't calculate it, please do it now :3. It's huge), but they introduce extra time if you layer the architecture too much. A non-layered architecture has a huge number of direct relations, and it is a complete mess, since you will see the call going to flat #1, then to #35, then back to #12 and then to #98, and all this happening on ONE floor.

Of course an app makes different "calls" ("ways"). A "call" which calculates 2+2 isn't the same as a "call" which plays music. This means that the call which calculates 2+2 goes through some flats, but maybe not the same ones (some could be common) as the flats the "sound call" goes through.

In a 1-Floor architecture (thanks to all the flats being on the same floor), the calls just have to go in and out of the desired flats and end in the basement. Now, imagine the skytower again: both calls will go through all 100 flats (because in the skytower a flat and a floor are the same thing)!!!

Look carefully at the two pictures: there are 2 calls represented, but in different architectures (Skytower vs 10-Floors).

Call1 (orange): this one has to visit Flat98 and Flat4 before reaching the hardware.

Call2 (pink): this one has to visit flats 20, 19 and 3.

As you can see, in a skytower architecture the calls have to go through all the floors to reach the desired flats. Something like knocking on the door and saying: "Hi, I want to come into your flat, but just because I need to reach your neighbor who is under your feet. My apologies."

In the 10-Floors, this happens much less, but you can also find calls (look at Call1) which go through layers without visiting any of their flats, just to reach a flat which is placed in a layer below.

Note also that the call from #100 to #98 doesn't need to go through #99. This is possible because of the direct relations inside the layer. Same with #19 and #3. Of course it's just a design decision whether ALL the flats inside a layer have direct relations, or whether that isn't needed (easier to code).

Look at the call #20, #19, #3: it's possible that 20 and 3 don't have a direct relation, so the call needs to go through 19 (like in the skytower), or maybe it goes through 19 because it needs to "do something" inside.

"So a 1-Floor architecture is better...":

Well, not really. Imagine that 100% of the time you need to go inside at least 2 flats (call1: flat #1, flat #13; call2: flat #3, flat #11, flat #7; call3: flat #1, flat #7, flat #14... and other combinations of 2 or more flats). Then it's obvious that if you create a 2-Floor architecture (remember you can place the flats inside these 2 floors as you wish: you are the architect) you aren't going to lose any extra time!

So yes, you have to find the balance between the number of direct relations (code that is as easy, clear and clean as possible) and the performance of the OS (not too many layers). It also depends on the objective of the OS: a real-time OS can't be heavily layered, or it will "lose a lot of time" from a real-time point of view (every nanosecond counts).

So each OS tries to give its own solution to this balance problem, and that is why there are different architectures. ReactOS follows the NT architecture, and its layers are arranged in the same way as NT's.

And yes: we are going to try to explain the ReactOS (NT) architecture... Are you ready? ;)

See you next week :)