11/09/2009

When I throw an exception, my program terminates with a “terminate called after throwing an instance of `…'” message

Answer

The function that throws the exception was called from C code, or code that was otherwise compiled without unwinding information. This affects gcc but not Visual C++: Visual C++ always generates full unwinding information.

If you can recompile the C code in question, recompile it with the -fexceptions flag: this will enable the generation of unwinding information (disabled by default for C).

If you cannot recompile the C code in question (for example, if it’s a system library), then you cannot throw exceptions when it calls your C++ code.
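One common way to live with that restriction is to catch everything at the boundary and hand the C caller an error code instead. This is a minimal sketch, assuming a hypothetical C-style callback signature `int (*)(void*)`; the names `risky_work`, `safe_callback` and `c_library_invoke` are made up for illustration and do not belong to any real library:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical work function that may throw.
static void risky_work(int value)
{
    if (value < 0)
        throw std::runtime_error("negative value");
}

// Callback handed to the C code: no exception may escape it, so every
// exception is converted to an error code at the language boundary.
extern "C" int safe_callback(void* context)
{
    try
    {
        risky_work(*static_cast<int*>(context));
        return 0;
    }
    catch (...)
    {
        return -1;
    }
}

// Stand-in for the C library function that calls back into our code.
extern "C" int c_library_invoke(int (*callback)(void*), void* context)
{
    return callback(context);
}

// Example use: 0 on success, -1 when risky_work threw.
void boundary_demo()
{
    int good = 1, bad = -1;
    assert(c_library_invoke(safe_callback, &good) == 0);
    assert(c_library_invoke(safe_callback, &bad) == -1);
}
```

The error code is the only information that crosses the boundary here; if the message matters, it would have to be stored somewhere the C++ caller can read it back.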

When I catch an exception, the exception message is always “Unknown exception” or “std::exception” instead of the correct message

Answer

You are catching std::exception (or another exception base class) by value. Catch by reference instead, to correctly catch instances of derived classes.

This code will print “Unknown exception” (Visual C++) or “std::exception” (GNU g++):

#include <stdexcept>
#include <iostream>

int main()
{
    try
    {
        throw std::runtime_error("error message");
    }
    catch(std::exception e)
    {
        std::cout << e.what() << std::endl;
    }
}

This code will correctly print “error message”:

#include <stdexcept>
#include <iostream>

int main()
{
    try
    {
        throw std::runtime_error("error message");
    }
    catch(std::exception& e)
    {
        std::cout << e.what() << std::endl;
    }
}

11/07/2009

GNU make fails on Windows with “multiple target patterns” or “target pattern contains no `%'”

Answer

You have a quoted Win32 absolute path in a makefile rule. Don’t use quotes: GNU make doesn’t support quoting at all, and you will never be able to use paths with spaces in them in GNU make.

This is correct:

target: X:\path\to\file
	@command $< $@

This will raise an error:

target: "X:\path\to\file"
	@command $< $@

Why only absolute paths?

It’s the colon after the drive letter. The colon is a special character that tells GNU make that the rule is a pattern rule.

Why only quoted Win32 paths? Why not unquoted paths?

GNU make has special code for recognizing and parsing Win32 paths. Since GNU make doesn’t support quoted paths, the opening quote is interpreted as just another character of the file name, the colon is no longer the second character of the path and the path is not recognized as a Win32 path.
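The effect of the stray quote can be pictured with a toy check. This is a simplified illustration only, not GNU make's actual parser; the function name `looks_like_drive_path` is hypothetical, and it assumes the recognition boils down to "letter followed by a colon at the start of the word":

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Treat a word as a Win32 absolute path when it starts with "<letter>:".
bool looks_like_drive_path(const std::string& word)
{
    return word.size() >= 2
        && std::isalpha(static_cast<unsigned char>(word[0])) != 0
        && word[1] == ':';
}

void drive_path_demo()
{
    assert(looks_like_drive_path("X:\\path\\to\\file"));
    // The opening quote becomes the first character, so the colon is
    // now the third character and the check fails.
    assert(!looks_like_drive_path("\"X:\\path\\to\\file\""));
}
```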

10/26/2009

[ros-dev] Reactos and MinWin

Gerard Murphy to the ros-dev-mailing-list:
I’ve recently been taking an interest in MinWin, especially as it’s now visible in Windows 7, and I want to bring up the possibility of introducing something similar into ReactOS.
I know some of you will be familiar with MinWin and I know at least one of you will know quite a bit more about it than I do, but for those who don’t here’s an overview.

What is MinWin?
MinWin is an internal project at Microsoft that manifested around the time of the Vista rewrite. Very basically explained, the aim is to detangle the very core components from the full operating system, resulting in something which can be built separately and leaving just enough components for a working system. This minimal operating system can then boot and be tested separately or be built as part of the full OS and tested internally as before.

The main problem in doing this, and one of the main benefits in solving it, is module dependencies. By simply cutting out the bottom part of the OS you’re left with modules which potentially call out to other modules which may now be missing. If these dependencies aren’t understood you have the possibility of an unstable OS as you never know where a code path may lead. It also means you need to build more components to get the thing to link, which also undermines one of the reasons for doing it.

What Microsoft did is to draw up a dependency tree and structure it in a layered hierarchy. Modules were then rearranged to make architectural sense according to the NT design and a top layer was decided upon. API calls were then rewritten within this hierarchy such that anything in one layer would only call APIs in the layers below it. Therefore if something in layer 8 called something in layer 7 which then called something in layer 12, the APIs would be fixed to only use APIs at or below their level according to the hierarchy. You’re then left with a clean line where nothing calls out and you have a tightly bound core OS which is completely independent from anything.
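The layering rule can be checked mechanically once each module has a layer number. The following is only a sketch under stated assumptions: the module names, the layer numbers and the function `find_upward_calls` are hypothetical, and lower numbers are taken to be closer to the core.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using LayerMap = std::map<std::string, int>;      // module -> layer
using Call = std::pair<std::string, std::string>; // caller -> callee

// Returns every call that reaches from a module into a higher layer.
std::vector<Call> find_upward_calls(const LayerMap& layers,
                                    const std::vector<Call>& calls)
{
    std::vector<Call> violations;
    for (const Call& call : calls)
    {
        if (layers.at(call.second) > layers.at(call.first))
            violations.push_back(call);
    }
    return violations;
}

void layering_demo()
{
    // The example from the text: layer 8 calls layer 7, which calls layer 12.
    const LayerMap layers = {{"a", 8}, {"b", 7}, {"c", 12}};
    const std::vector<Call> calls = {{"a", "b"}, {"b", "c"}};
    const std::vector<Call> bad = find_upward_calls(layers, calls);
    assert(bad.size() == 1 && bad[0] == Call("b", "c"));
}
```

Only the call from layer 7 into layer 12 is reported; that is the kind of call that would have to be rewritten.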

What is essentially left in Microsoft’s MinWin is what people class as Cutler’s NT, along with supporting components to get a bootable system. The main components are:
The kernel (main kernel, not the full ntoskrnl module), parts of the executive, memory manager, networking, a file system driver and core drivers (basic IO and bus, etc.)
This is then exposed in usermode by 2 dlls, ntdll.dll and the new kernelbase.dll. On top of this you have a console for issuing commands.
This comprises the full OS, which amounts to about 25MB on disk and has a working set of around 40MB.

Let’s quickly explain the new kernelbase dll.
As many of the APIs in the lower level dlls such as kernel32 and advapi32 were targeting functionality which wasn’t in MinWin, it was decided to pull out any of the APIs which were required to exercise MinWin and put them into kernelbase. This dll could then be used when building a running MinWin to provide all the functionality this kernel offered. The dlls which had their APIs removed now forward their calls on to kernelbase, so from a user perspective nothing has changed.

Microsoft went a step further and designed a new technology around virtual dlls. They split the functionality of dlls into ‘api sets’. These can now be seen in Win 7 in the system32 dir with the prefix ‘api-ms-win-core--<num of>-1-0.dll’. MinWin components now link against these virtual dlls, which load up the real libs listed in apisetschema.dll.
However this involves modification to the loader and I don’t think it’s something worth us considering right now.


What are the reasons for doing this?
For a start, it’s a much better design from an engineering perspective.
The more software you have running on a system the more noise is created and the more things can interfere with each other.
It gives us a base to innovate with without worrying about affecting anything outside.
It gives us a more reliable base for the OS and a better platform to run comprehensive and efficient tests.
It makes us more attractive to external companies as we would have a slick NT compatible kernel ready for companies to build their products on, especially embedded applications.
With a few more additional items such as the base win32 subsystem, we’d have something along the lines of a stripped out ServerCore or Windows PE.



This is only a quick overview for discussion purposes, but it outlines the main design and goals of MinWin.
Comments, questions, concerns and gripes welcome :)

Ged.
_____________________________

10/03/2009

Regarding UserAccounts in ReactOS

Hi,

many of you know we don't even support multiple users at all, but I have been thinking these days about which way would be best. I looked at the Linux and Windows ways and see pros and cons in both. Let's compare them.

Linux uses one root account with all the rights you can get, and as many restricted ones as you want. The problem is that some apps need root rights to install or run, and as soon as these apps have these rights, they could easily kick your ass and kill the whole PC. So here I miss the finer distinctions between accounts. A root/user separation is not enough, we need steps in between.

Windows has a System Account with full rights, Admin Accounts with rights almost anywhere, Normal User Accounts which only have access to their own and shared files, and the Guest Account with NO rights at all. This is already much better, but I see problems here too. IMO the System Account has to be usable too, to get done the things that fail on normal Admin Accounts. (Know these funny non-deletable folders and files?!) The Admin Accounts are alright in theory. All apps run there, so let's keep them for now. The next problem is the User Account. You can't install any apps there which need a bit too much access to files. This can be bypassed of course, but only through an Admin Account which gives you the rights. IMO a good idea would be a sandbox for the files a setup wants to replace but has no right to. The setup gets its files in place and they even work, but no files from the main system are replaced at all. The funny aspect is that all files are usable like on a real Admin Account, so a virus could even work there and force you to reboot the PC because it fails BADLY. But there's the good aspect of the sandbox: you go into a recovery mode and say "kill the sandboxed files", and the virus is GONE.

This sandbox aspect could be enhanced in a billion ways. A backup system which makes a System files Sandbox every week and you can switch between them.

Make apps only use a sandboxed version you made of important files. This prevents viruses from killing the originals.

etcetc. Of course its just a thought of me. We could just use a powerful access System, too. Some ppl still might know WinPooch Watchdog. This app makes it possible to badly restrict any accesses to files and folders and that in a really hardcore way. It hooks in some Kernel APIs and thus restricts file/registry and whatever access for specific apps and users. This would be a nice thing for our accounts too.

Source: http://dreimer.dr.funpic.org/sblog/ - thx dreimer!

9/27/2009

RosBE 1.5

Dreimer published this post at his blog:
Hi, many ppl already asked me when RosBE 1.5 will be released.

Well, there are some problems we first have to solve. As many ppl were able to watch in the #reactos, #reactos-dev and #reactos-rosbe chats, we had a hard time trying to build GCC 4.4.X on Windows. With "we" I mean, encoded, Colin and me. The results were billion different tries and almost the same number of different errors. Well, now HTO seems to have made a nice build for us, which he uploaded for testing these days. Lets hope he also documents what he did to get it built. We finally want to have a really working manual in our wiki.

Right now we have documented our tries here: LINK

Don't even try to use it. Cloog-PPL fails to build at all and GCC too. Hopefully it will be updated in the future.

OK, now to the changes I made up to now on the scripts:

- Installer Fixes:
* Installer starts the Uninstaller of the old Version again.
* Installer does not accidentally kill the whole Start menu again.
* Installer uninstalls charch too.
* Fixed some line skippers.
(care2debug, Daniel Reimer)
- AMD64 Addon Fixes:
* Fix the one dash too much problem.
* Fix the args not checked bug.
(Samuel Serapion)
- Prepare scripts for RosBE64 1.1 which is compatible with the new RosBE Versions (Daniel Reimer)
- Bugfixes in ssvn (Shashkov Maxim, Daniel Reimer)
- Fix confusion between %_ROSBE_ROSPREFIX% and %_ROSBE_PREFIX%. (Colin Finck)
We only use %_ROSBE_PREFIX% now, which should fix building with amd64. (Colin Finck)
- Readd chdefdir's feature to switch to the new default directory after changing it. (Colin Finck)
- Fix inability to switch back to i386. (Colin Finck)
- Fix Variables to be able to use more recent GCCs. (Daniel Reimer)
- Fix up the call of the i386 config file in charch. (Daniel Reimer)
- Fix up default color of the 64 bit RosBE. (Daniel Reimer)
- "Only" call the 64 bit config file when you are in 64 bit mode. (Daniel Reimer)
- Fix the 64 bit options tool to be useable. (Daniel Reimer)
- Tidy up work on all files. (Daniel Reimer)
- Added RBuild Flags Setting possibility into the "config" command. They will be loaded at RosBE
startup and thus behave the same way like the other settings you can set in there. (Daniel Reimer)
- Made the "update" command stop when it finds the last existent update online. There's no use in trying
the full update 1-9 if there's none/one. This speeds up the status generation process. (Daniel Reimer)
- Get rid of the Doskey macro file, just add %_ROSBE_BASEDIR% (and %_ROSBE_BASEDIR%Tools for svn.exe)
to the PATH and call all .cmd files directly. Renamed some batch files to match the macro names and
added some new ones, for which no batch file existed. (except "env", this command has been removed)
Now these commands can be called from other batch files and stuff like "clean & make" will work as well.
(Colin Finck, Gunnar)
- Fix raddr2line.cmd to properly handle spaces in the path. (Colin Finck)
- Added rosapps and rostests support to ssvn. (Daniel Reimer)
- clean supports now multiple clean commands like "clean aaa bbb" (Daniel Reimer, mota)
- Got rid of the TranslateOptions Hack (Daniel Reimer, Art Yerkes)

--------------------------------------------------------------------------------

Last thing I would like to see in 1.5 is a merged config script and options tool, but I have no clue how to get this done in the GUI in a nice way... so this is no show stopper at all.
RosBE 1.5 beta 1 can be downloaded here.

8/21/2009

NT-Architecture: What is.. Kernel- and User-Mode

Most operating systems have programs for displaying CPU utilization. In Windows this program is “Task Manager”.

CPU utilization is generally displayed as a simple percentage of CPU time spent on non-idle tasks. But this is a simplification: in every modern operating system, the CPU spends time in two very distinct modes:

  1. Kernel-Mode

The code executing in Kernel-Mode has unrestricted access to the hardware. This code can execute every CPU instruction and has full access to every memory address. Kernel mode is reserved for the lowest-level, most trusted functions of the operating system. A crash in kernel mode is catastrophic, because it halts the whole PC.

  2. User-Mode

The code executing in User-Mode cannot directly access memory or other hardware. Code in user mode must use system APIs to access the system's hardware and memory. Crashes in User-Mode are recoverable thanks to this isolation. Most code is executed in user mode.

It is possible to display the kernel time in Windows Task Manager (as I did in the picture above). The green line represents the total CPU time and the red line represents the kernel time. The difference between these lines shows the user time.

These two modes aren't mere labels; they're enforced by the CPU hardware. If code executing in User mode attempts to do something outside its purview-- like, say, accessing a privileged CPU instruction or modifying memory that it has no access to -- a trappable exception is thrown. Instead of your entire system crashing, only that particular application crashes.

x86 CPU hardware has 4 protection rings: 0, 1, 2, and 3. Typically just 0 (kernel mode) and 3 (user mode) are used.

If we're only using two isolation rings, it's a bit unclear where device drivers should go-- the code that allows us to use our video cards, keyboards, mice, printers, and so forth. Do these drivers run in Kernel mode, for maximum performance, or do they run in User mode, for maximum stability? In Windows, at least, the answer is it depends. Device drivers can run in either user or kernel mode. Most drivers are shunted to the User side of the fence these days, with the notable exception of video card drivers, which need bare-knuckle Kernel mode performance. But even that is changing; in Windows Vista, video drivers are segmented into User and Kernel sections. Perhaps that's why gamers complain that Vista performs about 10 percent slower in games.

The exact border between these modes is still somewhat unclear. What code should run in User mode? What code should run in Kernel mode? Or maybe we'll just redefine the floor as the basement-- the rise of virtualization drove the creation of a new ring below all the others, Ring -1, which we now know as x86 hardware virtualization.

User mode is clearly helpful, but not without a disadvantage: transitioning between the two modes is really slow.

The CPU's strict segregation of code between User and Kernel mode is completely transparent to most of us, but it is the difference whether the computer crashes or programs crash most of the time.

8/11/2009

News: Nexuiz runs on ReactOS!

Nexuiz (an open-source first-person shooter based on the DarkPlaces engine) has been running on ReactOS since cgutman's commit from 2009/08/07:


http://svn.reactos.org/svn/reactos?view=rev&revision=42467
Author: cgutman
Log Message:
- Call IoCompleteRequest to free IRPs created by IoBuildDeviceIoControlRequest
- Fixes bug 4770

However, you currently cannot play online, because Nexuiz doesn't download the server list in ReactOS.
Additionally, the game runs extremely slowly (even with low graphics settings) if only software rendering is used. If you install a graphics driver with OpenGL support, though, you can also use the graphics card's hardware acceleration.

screenshots of Nexuiz running on ReactOS:


Enjoy the game ;-)

8/10/2009

Inside the mind of a ReactOS developer: implementing _SEH_prolog and _SEH_epilog (without making baby Jesus cry)

Disclaimer

This post is not for the faint of heart. This post assumes you know the C language (especially the Microsoft implementation), C compilers (especially Visual C++), what intrinsic functions are, what SEH is and how it works, and how to read x86 assembly. I could spend entire blog posts, no, entire blogs, just explaining to you what the words alone mean, so I'll just go ahead and assume you know who you are.

… baby Jesus?

Implementing open source intrinsics for commercial compilers is a horrible liability, and a thankless job: almost nobody will even notice you have done anything, and half of those who do notice will accuse you of copyright violation. Since my work will not only be used by ReactOS, but also by projects like mingw-w64 whose reputation has not been compromised yet, I have to be extra careful with it.

For this reason, and because I believe I'm not alone in enjoying this kind of crazy shit, I have decided to blog to document my thought processes while writing obscure, low-level code. I hope at least some of you will enjoy.

Why do you do it, anyway?

Strictly speaking, this work is not required. For the purposes of compiling ReactOS with Visual C++, we could simply link the original Microsoft implementations.

However, I collaborate with the mingw-w64 project as well, which aims to create a Windows version of gcc that's as close as possible to both the GNU and the Win32/Win64 platforms. Being able to link code (e.g. static libraries) compiled with Microsoft tools would be a nice plus, and to achieve that, the runtime library that comes with mingw-w64 has to provide all the Microsoft compiler intrinsics.

Additionally, I'm very nervous about linking code in ReactOS executables that comes from outside our source tree, for several reasons (for one, we have no guarantee about the compiler options that were used to compile the third-party code, and this has come to bite us in the ass before). This seems to be a common sentiment among operating system developers, as I've seen more than one tutorial on how to provide missing compiler intrinsics.

Finally, writing compiler intrinsics is work I enjoy a lot (I am the author of PSEH). I have no idea why.

And now for the main course

We are all familiar with the code that Visual C++ generates for a function containing SEH, as it has not changed in more than ten years. Given C code like this:

void a(void)
{
    __try { }
    __except(1) { }
}

the compiler will emit assembler like this, with the recognizable and well-known prologs and epilogs (which I marked in bold):

_TEXT	SEGMENT
__$SEHRec$ = -24
_a	PROC
	push	ebp
	mov	ebp, esp
	push	-1
	push	OFFSET __sehtable$_a
	push	OFFSET __except_handler3
	mov	eax, DWORD PTR fs:0
	push	eax
	mov	DWORD PTR fs:0, esp
	sub	esp, 8
	push	ebx
	push	esi
	push	edi
	mov	DWORD PTR __$SEHRec$[ebp], esp
	mov	DWORD PTR __$SEHRec$[ebp+20], 0
	jmp	SHORT $LN9@a
$LN5@a:
	mov	eax, 1
$LN7@a:
	ret	0
$LN6@a:
	mov	esp, DWORD PTR __$SEHRec$[ebp]
$LN9@a:
	mov	DWORD PTR __$SEHRec$[ebp+20], -1
	mov	ecx, DWORD PTR __$SEHRec$[ebp+8]
	mov	DWORD PTR fs:0, ecx
	pop	edi
	pop	esi
	pop	ebx
	mov	esp, ebp
	pop	ebp
	ret	0
_a	ENDP
_TEXT	ENDS

But, like many things, this changed with the release of Visual Studio .NET, which was a huge leap forward from its predecessor Visual Studio 6.0, and in many ways a departure from the old ways.

Specifically, since Visual Studio .NET, when the Microsoft compiler optimizes for code size (option /O1 on the command line), rather than speed (/O2), it will emit assembler like this, instead:

_TEXT	SEGMENT
__$SEHRec$ = -24
_a	PROC
	push	8
	push	OFFSET __sehtable$_a
	call	__SEH_prolog
	and	DWORD PTR __$SEHRec$[ebp+20], 0
	jmp	SHORT $LN9@a
$LN5@a:
	xor	eax, eax
	inc	eax
$LN7@a:
	ret	0
$LN6@a:
	mov	esp, DWORD PTR __$SEHRec$[ebp]
$LN9@a:
	or	DWORD PTR __$SEHRec$[ebp+20], -1
	call	__SEH_epilog
	ret	0
_a	ENDP
_TEXT	ENDS

The prolog code was collapsed into a call to _SEH_prolog, and the epilog code into a call to _SEH_epilog.

It's easy to see why: a fixed per-function overhead of 54 bytes is turned into a fixed per-function overhead of 11 bytes (to the benefit of code size) by moving the invariant code to a library function (to the detriment of code speed, because function calls are expensive).
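The trade-off is simple arithmetic. As a rough sketch: the two per-function figures come from the text, while the combined size of the two library functions is not stated anywhere, so it is a parameter here and the value used in the example is purely hypothetical.

```cpp
#include <cassert>

// Figures from the text: inline overhead vs. call overhead, per function.
constexpr int kInlineBytes = 54;
constexpr int kCallBytes = 11;

// Net bytes saved for `functions` SEH-using functions. `helper_bytes` is
// the combined size of the two library functions (an assumed input).
constexpr int net_savings(int functions, int helper_bytes)
{
    return functions * (kInlineBytes - kCallBytes) - helper_bytes;
}

void savings_demo()
{
    assert(net_savings(1, 0) == 43);      // 43 bytes saved per function
    assert(net_savings(10, 100) == 330);  // 100 is a made-up helper size
}
```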

Apparently, nobody documents what _SEH_prolog and _SEH_epilog do, so I had to find out on my own. As soon as I discovered how to reliably make the compiler emit the inline code or the calls for the same code (/O2 vs /O1 on the command line, it turned out), though, what each function did was self-evident. If we look at the assembly listings above, we'll see that they are virtually identical, save for the bolded sections, therefore the functions used in the second listing are completely equivalent to the prolog/epilog code of the first listing. Since the functions have only one implementation each, the inline code we are looking at is the only possible prolog/epilog. In other words, the only requirement for _SEH_prolog is that, if it's invoked like this:

	push	8
	push	OFFSET __sehtable$_func
	call	__SEH_prolog

it will have the same effect as the following code:

	push	ebp
	mov	ebp, esp
	push	-1
	push	OFFSET __sehtable$_func
	push	OFFSET __except_handler3
	mov	eax, DWORD PTR fs:0
	push	eax
	mov	DWORD PTR fs:0, esp
	sub	esp, 8
	push	ebx
	push	esi
	push	edi
	mov	DWORD PTR __$SEHRec$[ebp], esp

and the only requirement for _SEH_epilog is that, when invoked, it will have the same effect as the following code:

	mov	ecx, DWORD PTR __$SEHRec$[ebp+8]
	mov	DWORD PTR fs:0, ecx
	pop	edi
	pop	esi
	pop	ebx
	mov	esp, ebp
	pop	ebp

We can see that _SEH_prolog and _SEH_epilog cannot be regular functions, because they modify the stack of the caller: _SEH_prolog allocates stack space, saves registers and sets up a SEH frame, and _SEH_epilog undoes it. Since the stack pointer when the functions exit is at the "wrong" height (_SEH_prolog allocates stack space, _SEH_epilog deallocates), we can also reasonably assume that the return address is extracted from the stack into a register. We can also reasonably assume that the functions end with a ret instruction, so they don't screw too much with the CPU's branch predictor (which expects each call to be followed by a ret, not by a jmp that sometimes happens to go to the right place), so their last instructions have to look a lot like this:

	; put the return address on the stack
	push	register

	; return to the return address
	ret

The stack tricks alone are a guarantee that the functions cannot be written in C, and must be written in raw x86 assembler.

After this discovery, I decided to take a closer look at how, exactly, the prolog code sets up the stack (and what, by extension, would the epilog code have to do to undo it).

Stack layout of a SEH-using function

By "manually" executing the instructions of the prolog, one by one, it's easy to see how the stack is being set up, and which special locations on the stack will be pointed to by which special registers:

-00000004:	original ebp	; ebp points here
-00000008:	-1
-0000000c:	OFFSET __sehtable$_func
-00000010:	OFFSET __except_handler3
-00000014:	original fs:0	; fs:0 points here
-00000018:	undefined
-0000001C:	new esp		; (ebp+__$SEHRec$)
-00000020:	ebx
-00000024:	esi
-00000028:	edi		; esp points here

(instead of actual, absolute stack addresses, I'll use offsets from the initial value)

The 8 in the sub esp, 8 instruction is the size of the space between stack locations -00000014 and -0000001C. We can easily verify that this number is actually 8+0, where 0 is the combined size of the stack-allocated variables.

Summing up, for a function with N bytes of local variables, the final layout will be:

-00000004:	original ebp
-00000008:	-1
-0000000c:	OFFSET __sehtable$_func
-00000010:	OFFSET __except_handler3
-00000014:	original fs:0
-00000018:	undefined
-0000001c:	new esp		; (ebp+__$SEHRec$)
-00000020:	variable -4	; (ebp+__$SEHRec$) -4
-00000024:	variable -8	; (ebp+__$SEHRec$) -8
   ...		variable ...	; (ebp+__$SEHRec$) ...
-00000020 - N:	ebx
-00000024 - N:	esi
-00000028 - N:	edi		; esp points here

The _SEH_prolog is therefore defined as the function that can turn the stack from its initial state to the final layout, and set special registers ebp and fs:0 (actually a special memory location, but a software register in practice) to the expected values, and _SEH_epilog as the function that can turn the final layout back into the initial state, and reset the special registers ebx, esi, edi, ebp and fs:0 to their initial values.
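For readers who prefer structs to stack diagrams, the fixed part of the layout can be written down as a C++ type. This is just a sketch: the type and field names are made up for illustration, and the 32-bit slots are modelled as fixed-width integers so that the offsets hold on any host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// The fixed part of the frame, lowest address first.
struct SehFrame
{
    std::uint32_t saved_esp;     // [ebp-24], __$SEHRec$
    std::uint32_t undefined;     // [ebp-20]
    std::uint32_t previous_fs0;  // [ebp-16], fs:0 points here
    std::uint32_t handler;       // [ebp-12], __except_handler3
    std::uint32_t scope_table;   // [ebp-8],  __sehtable$_func
    std::int32_t  try_level;     // [ebp-4],  initialized to -1
    std::uint32_t saved_ebp;     // [ebp+0],  ebp points here
};

// Offset of a field relative to ebp, which points at saved_ebp.
constexpr int ebp_offset(std::size_t field_offset)
{
    return static_cast<int>(field_offset)
         - static_cast<int>(offsetof(SehFrame, saved_ebp));
}

void frame_layout_demo()
{
    assert(sizeof(SehFrame) == 28);
    assert(ebp_offset(offsetof(SehFrame, saved_esp)) == -24);
    assert(ebp_offset(offsetof(SehFrame, previous_fs0)) == -16);
    assert(ebp_offset(offsetof(SehFrame, try_level)) == -4);
}
```

The offsets match the listings: `__$SEHRec$` is -24, `__$SEHRec$[ebp+8]` is the saved fs:0 at [ebp-16], and `__$SEHRec$[ebp+20]` is the slot at [ebp-4].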

Implementing _SEH_prolog

We have to write the _SEH_prolog function so that it can create the final layout depicted above from an initial layout of:

-00000004:	8 + N
-00000008:	OFFSET __sehtable$_func
-0000000c:	return address

Additionally, _SEH_prolog has to do so without disturbing the calling function's execution:

  1. it must return to the instruction after the call __SEH_prolog;
  2. it must not overwrite the ecx register (which contains the this pointer in __thiscall functions, and the first argument in __fastcall functions) nor the edx register (which contains the second argument in __fastcall functions);
  3. although I cannot prove this, it's most probably expected to not overwrite the ebx, esi or edi registers.

These limitations only leave the eax register free. We will try our best to only ever use eax.

The current contents of the stack are all wrong, but they all are information that we will need later, so we leave them alone. The next word on the stack is the address of __except_handler3, and the word after that is the current contents of fs:0. We can simply push them on the stack:

__SEH_prolog PROC
	push	OFFSET __except_handler3
	mov	eax, DWORD PTR fs:0
	push	eax
	...
__SEH_prolog ENDP

At this point, the stack would be at the correct height to store its pointer in fs:0, but we can't do it yet: we cannot push a SEH frame before we have finished initializing it. We hold our horses, and move on to the next stack location.

The next stack location is the variable-sized part. We know the size of this part, because it's on the stack, passed as an argument to _SEH_prolog. All we need to do is to get it, and lower the stack by that amount. The current layout of the stack is:

-00000004:	8 + N
-00000008:	OFFSET __sehtable$_func
-0000000c:	return address
-00000010:	OFFSET __except_handler3
-00000014:	original fs:0 ; esp points here

so the argument is at [esp+16]. We cannot lower the stack yet, though, because then we wouldn't have a nice fixed offset to access the second argument (the pointer to __sehtable$_func) and the return address. Sounds like a good time as any to initialize the ebp register. We'll copy the argument from offset -00000004 to a register, save the current value of ebp to offset -00000004, and then set the new, final value of ebp:

__SEH_prolog PROC
	push	OFFSET __except_handler3
	mov	eax, DWORD PTR fs:0
	push	eax
	mov	eax, [esp+16]
	mov	[esp+16], ebp
	lea	ebp, [esp+16]
	...
__SEH_prolog ENDP

With a pointer to a well-known location of the stack, we don't need to worry anymore about the variable-sized part, which we can now simply allocate:

__SEH_prolog PROC
	push	OFFSET __except_handler3
	mov	eax, DWORD PTR fs:0
	push	eax
	mov	eax, [esp+16]
	mov	[esp+16], ebp
	lea	ebp, [esp+16]
	sub	esp, eax
	...
__SEH_prolog ENDP

The stack is looking a little better now:

-00000004:	original ebp
-00000008:	OFFSET __sehtable$_func
-0000000c:	return address
-00000010:	OFFSET __except_handler3
-00000014:	original fs:0
-00000018:	undefined
-0000001c:	new esp
-00000020:	...

We have to move the return address to eax, and the pointer to __sehtable$_func to stack offset -0000000c. Unfortunately, we cannot do two overlapping moves with a single register, so we'll store the return address on top of the stack instead, where it belongs anyway. To do so, we must first push the ebx, esi and edi registers, because _func expects them to be on the top of the stack when _SEH_prolog returns; we will then store the current value of esp in [ebp-24], since the stack is now at the expected height:

__SEH_prolog PROC
	push	OFFSET __except_handler3
	mov	eax, DWORD PTR fs:0
	push	eax
	mov	eax, [esp+16]
	mov	[esp+16], ebp
	lea	ebp, [esp+16]
	sub	esp, eax
	push	ebx
	push	esi
	push	edi
	mov	[ebp-24], esp
	...
__SEH_prolog ENDP

We can then move the return address from -0000000c ([ebp-8]) to the top of the stack, then move __sehtable$_func from -00000008 ([ebp-4]) to -0000000c ([ebp-8]), and finally set -00000008 ([ebp-4]) to constant value -1:

__SEH_prolog PROC
	push	OFFSET __except_handler3
	mov	eax, DWORD PTR fs:0
	push	eax
	mov	eax, [esp+16]
	mov	[esp+16], ebp
	lea	ebp, [esp+16]
	sub	esp, eax
	push	ebx
	push	esi
	push	edi
	mov	[ebp-24], esp
	mov	eax, [ebp-8]
	push	eax
	mov	eax, [ebp-4]
	mov	[ebp-8], eax
	mov	DWORD PTR [ebp-4], -1
	...
__SEH_prolog ENDP

The stack is now ready, with an extra word on top containing the return address. Initialization is done, and we can, at last, set fs:0 to the new SEH frame, starting at -00000014 ([ebp-16]), and return to the caller:

__SEH_prolog PROC
	push	OFFSET __except_handler3
	mov	eax, DWORD PTR fs:0
	push	eax
	mov	eax, [esp+16]
	mov	[esp+16], ebp
	lea	ebp, [esp+16]
	sub	esp, eax
	push	ebx
	push	esi
	push	edi
	mov	[ebp-24], esp
	mov	eax, [ebp-8]
	push	eax
	mov	eax, [ebp-4]
	mov	[ebp-8], eax
	mov	DWORD PTR [ebp-4], -1
	lea	eax, [ebp-16]
	mov	DWORD PTR fs:0, eax
	ret
__SEH_prolog ENDP

And with this, _SEH_prolog is done.
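A cheap way to gain confidence in the sequence is to replay it on a toy machine and compare the result with the inline prolog. The sketch below is an assumption-laden model, not real code for any target: registers and the stack are plain integers, all addresses and initial register values are arbitrary stand-ins, and esp is stored in [ebp-24] after the three register pushes, as the inline code does.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kTop = 256;          // initial esp, in bytes
constexpr std::uint32_t kSehTable = 0x1000;  // __sehtable$_func stand-in
constexpr std::uint32_t kHandler = 0x2000;   // __except_handler3 stand-in
constexpr std::uint32_t kReturn = 0x3000;    // return address stand-in
constexpr std::uint32_t kMinusOne = 0xFFFFFFFFu;

struct Machine
{
    std::array<std::uint32_t, kTop / 4> mem{};
    std::uint32_t esp = kTop, ebp = 0xEB90, eax = 0;
    std::uint32_t ebx = 0xB, esi = 0x5, edi = 0xD, fs0 = 0xF50;

    std::uint32_t load(std::uint32_t addr) const { return mem[addr / 4]; }
    void store(std::uint32_t addr, std::uint32_t v) { mem[addr / 4] = v; }
    void push(std::uint32_t v) { esp -= 4; store(esp, v); }
    std::uint32_t pop() { std::uint32_t v = load(esp); esp += 4; return v; }
};

// The prolog the compiler emits inline.
void inline_prolog(Machine& m, std::uint32_t locals)
{
    m.push(m.ebp);  m.ebp = m.esp;
    m.push(kMinusOne);  m.push(kSehTable);  m.push(kHandler);
    m.eax = m.fs0;  m.push(m.eax);  m.fs0 = m.esp;
    m.esp -= 8 + locals;
    m.push(m.ebx);  m.push(m.esi);  m.push(m.edi);
    m.store(m.ebp - 24, m.esp);
}

// The caller's two pushes and call, then the helper, step by step.
std::uint32_t helper_prolog(Machine& m, std::uint32_t locals)
{
    m.push(8 + locals);  m.push(kSehTable);  m.push(kReturn);
    m.push(kHandler);
    m.eax = m.fs0;  m.push(m.eax);
    m.eax = m.load(m.esp + 16);
    m.store(m.esp + 16, m.ebp);
    m.ebp = m.esp + 16;
    m.esp -= m.eax;
    m.push(m.ebx);  m.push(m.esi);  m.push(m.edi);
    m.store(m.ebp - 24, m.esp);
    m.eax = m.load(m.ebp - 8);  m.push(m.eax);
    m.eax = m.load(m.ebp - 4);  m.store(m.ebp - 8, m.eax);
    m.store(m.ebp - 4, kMinusOne);
    m.eax = m.ebp - 16;  m.fs0 = m.eax;
    return m.pop();  // ret
}

// Same special registers and same stack contents from esp upwards.
bool same_frame(const Machine& a, const Machine& b)
{
    if (a.esp != b.esp || a.ebp != b.ebp || a.fs0 != b.fs0) return false;
    if (a.ebx != b.ebx || a.esi != b.esi || a.edi != b.edi) return false;
    for (std::uint32_t addr = a.esp; addr < kTop; addr += 4)
        if (a.load(addr) != b.load(addr)) return false;
    return true;
}

void prolog_demo()
{
    for (std::uint32_t locals : {0u, 4u, 16u})
    {
        Machine inl, hlp;
        inline_prolog(inl, locals);
        assert(helper_prolog(hlp, locals) == kReturn);
        assert(same_frame(inl, hlp));
    }
}
```

The comparison ignores eax, which the helper is allowed to clobber, and anything below esp.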

Implementing _SEH_epilog

The _SEH_epilog function has restrictions similar to those of _SEH_prolog, and then more:

  1. it must return to the instruction after the call __SEH_epilog;
  2. it cannot use the eax or edx registers, because they contain the return value of the function;
  3. it must restore the initial values of ebx, esi, edi, ebp and fs:0;

As we limited ourselves to using eax in _SEH_prolog, we'll limit ourselves to ecx in _SEH_epilog. We know ecx is safe to overwrite in the epilog (even in __fastcall and __thiscall functions) because the inline epilog code generated by the compiler does so.

The stack layout on entering _SEH_epilog is:

-00000004:	original ebp	; ebp points here
-00000008:	-1
-0000000c:	OFFSET __sehtable$_func
-00000010:	OFFSET __except_handler3
-00000014:	original fs:0	; fs:0 points here
-00000018:	undefined
-0000001C:	new esp
-00000020:	ebx
-00000024:	esi
-00000028:	edi
-0000002c:	return address 	; esp points here

Following the example of the inline code, we restore the original value of fs:0 first:

__SEH_epilog PROC
	mov	ecx, DWORD PTR [ebp-16]
	mov	DWORD PTR fs:0, ecx
	...
__SEH_epilog ENDP

Then we pop the return address off the stack, so we can restore ebx, esi and edi:

__SEH_epilog PROC
	mov	ecx, DWORD PTR [ebp-16]
	mov	DWORD PTR fs:0, ecx
	pop	ecx
	pop	edi
	pop	esi
	pop	ebx
	...
__SEH_epilog ENDP

The next operation the inline code does is restoring esp and ebp, so let's do it too:

__SEH_epilog PROC
	mov	ecx, DWORD PTR [ebp-16]
	mov	DWORD PTR fs:0, ecx
	pop	ecx
	pop	edi
	pop	esi
	pop	ebx
	mov	esp, ebp
	pop	ebp
	...
__SEH_epilog ENDP

Finally, let's put the return address back on the stack, and then return to it:

__SEH_epilog PROC
	mov	ecx, DWORD PTR [ebp-16]
	mov	DWORD PTR fs:0, ecx
	pop	ecx
	pop	edi
	pop	esi
	pop	ebx
	mov	esp, ebp
	pop	ebp
	push	ecx
	ret	0
__SEH_epilog ENDP

And we're done. Any questions?

Q&A

What about functions that have more than 4096 bytes of local variables?

If a function has more than 4096 bytes of local variables (4068, actually: the hidden 28 bytes are the SEH frame minus the saved ebx, esi and edi), then the compiler will emit inline prologs/epilogs; the prolog will use _chkstk instead of sub esp.

What about code compiled with stack checks (/GS)?

Functions compiled with stack checks have different prologs/epilogs; the external implementations are called _SEH_prolog4 and _SEH_epilog4, and maybe we'll see them in a future episode.

Conclusions

In this first installment of "Inside the mind of a ReactOS developer" we have seen what the _SEH_prolog and _SEH_epilog functions are, what they do and how to implement them without looking at so much as one byte of copyrighted Microsoft code.

As far as I know, this blog post is the first public source on the internet to describe these functions in detail. If there is interest, I will write more posts like this.

Comments are welcome.

That's all.

8/08/2009

NT architecture: What is...an Architecture?

Windows is a well-known OS from the user's point of view (tell me the first game you played wasn't Winmine or Solitaire under some version of Windows). Thanks to some interesting books such as "Windows Internals" (4th or 5th Edition), you can now begin to understand the internal behavior of Windows too.

ReactOS is an operating system which aims to have the same internal behavior as Windows. By following this target, ReactOS will be able to load and run applications and drivers designed for Windows.

Windows, GNU/Linux, ReactOS or any other OS has just one objective from the user's point of view: run apps. From the developer's point of view, the objective is to create the fastest and most reliable way of connecting applications with hardware, processing, and returning results.

But "Windows" is just an easy name to remember. The way an OS behaves, the way it processes information, how an app is "processed" and how it is connected with the hardware (to show results on the LCD screen) is called its Architecture. By the way, "Windows" has followed different architectures. The latest one is called the "NT architecture".

What's an Architecture?

I like the idea of comparing an OS architecture with the architecture of buildings. Imagine you are an architect: you can give the building any shape you want, but you have to make room for 100 people. You can decide to create a building of 10 floors with 10 flats on each floor, or you can decide to create a skytower of 100 floors with just one flat on each floor. They are different Architectures, but both are valid solutions. In the same way you may have different OS Architectures. In this case the requirement is "processing and connecting the Software with the Hardware", so you can find: NT architecture, Mac OS X architecture, Linux kernel...

And the Best Architecture is...

Let's return to the idea of the buildings. Choosing between a skytower and a 10-floor building will make you face different problems.

Let's study the skytower: you will have one neighbor on each floor, which means each neighbor only has a direct relation with the neighbor over his head and the one under his feet. So the flats are quite silent. That is a nice thing if you are the architect but also the owner who wants to rent the flats. "Silent full-floor flats": nice advertisement.

But of course it has some drawbacks. The first one: if the guy on the first floor wants to go to the roof, he will need some sandwiches for the trip if there isn't a direct elevator.

The 10-Floors architecture is a nice approach: it is much less expensive to construct, but there is much more noise on each floor and a lot of relations (which in a neighborhood can lead to problems) are created. Let's count the relations: each guy on the 5th floor will have 9 neighbors on his own floor, 10 neighbors over his head and another 10 under his feet. Total: 29 direct relations. And this is for EACH guy. Try to calculate the total number of relations and compare it with the total for the Skytower approach.

Now, we can indeed invent a new architecture called 1-Floor: an odd "building" which has just one enormous floor and 100 neighbors. Could you calculate the number of possible direct relations? Yes, a lot. But hey! They can get to the roof quite easily, without any elevator!! Wowww.. (*ahem*)

Ok, ok. Why am I talking about roofs, neighbors and floors?

The Translation...

An OS architecture can be studied as a number of superposed layers, just as a building can be studied as a number of superposed floors. So we can have an architecture with 100 layers or an architecture with just 1 layer. This doesn't say much yet, but let's go on. Each layer is usually divided into different structures, as a floor can be divided into 10 flats. And each flat (sorry, structure) has its own task inside the Architecture. One flat (sorry, structure) processes the graphics, another flat talks with the hardware, another flat talks with the application, another flat talks with flat2 because flat3 is angry with the latter one...

So let's go to the skytower example again:

On the top (the roof) of the Skytower there is an Application, in the basement of the building there is the Hardware, and the flats have to create "a way" for the app to reach the hardware. (Everything ends in the microprocessor, right? Or accessing the HDD, or printing info on the screen.)

The skytower is the typical example of an OS with too many layers. Having too many layers has some advantages: you as a programmer don't have to create a lot of direct relations between the structures (neighbors); in this skytower each one has just 2 direct relations. So it seems easier to control, but.. you face one main issue: slowness. The "call" has to go through 100 floors. No, there isn't a fast elevator. Maybe the "call" doesn't need to be processed on floor number 63, but it has to go through it to reach floor number 62.

What happens with a 1-Floor architecture? This is an architecture with no layers at all. Layers are nice since they reduce the number of direct relations (if you didn't calculate it, please do it now :3. It's huge), but they add extra time if you layer the architecture too much. A non-layered architecture has a huge number of direct relations, and it is a complete mess, since you will see the call going to flat #1, then to #35, then back to #12 and then to #98, and all this happening on ONE floor.

Of course an app makes different "calls" ("ways"). A "call" which calculates 2+2 isn't the same as a "call" which plays music. This means that the call which calculates 2+2 goes through some flats, and the "sound call" goes through others (though some flats could be common to both).

In a 1-Floor architecture (thanks to all the flats being on the same floor), the calls just have to go in and out of the desired flats and end in the basement. Now, imagine the skytower again: both calls will go through all 100 flats (because in the skytower a flat and a floor are the same thing)!!!

Look carefully at the two pictures: the same 2 calls are represented, but in different architectures (Skytower vs 10-Floors).

Call1 (Orange): this one has to visit Flat98 and Flat4 before reaching the hardware.

Call2 (Pink): this one has to visit Flat20, Flat19 and Flat3.

As you can see, in a Skytower architecture the calls have to go through all the floors to reach the desired flats. Something like knocking on the door and saying: "Hi, I want to go into your flat, but just because I need to reach your neighbor who is under your feet. My apologies."

In the 10-Floors architecture this happens much less, but you can also find calls (look at Call1) which go through layers without visiting any of their flats, just to arrive at a flat which is placed in a layer below.

Notice also that the call from #100 to #98 doesn't need to go through #99. This is possible because of the Direct Relations inside the layer. Same with #19 and #3. Of course, it's just a development decision whether ALL the flats inside a layer have direct relations, or only the ones that need them (easier to code).

Look at the call #20, #19, #3: it's possible that 20 and 3 don't have a Direct Relation, so the call needs to go through 19 (like in the Skytower), or maybe it goes through 19 because it needs to "do something" inside.

"So a 1-Floor architecture is better...":

Well, not really. Imagine that 100% of the time you need to go inside at least 2 flats (call1: flat #1, flat #13; call2: flat #3, flat #11, flat #7; call3: flat #1, flat #7, flat #14... and other combinations of 2 or more flats). Then it's obvious that if you create a 2-Floor architecture (remember you can place the flats inside these 2 floors as you wish: you are the architect) you aren't going to lose any extra time!

So yes, you have to find the balance between the number of direct relations (code as easy, clear and clean as possible) and the performance of the OS (not too many layers). It also depends on the objective of the OS: a real-time OS can't be heavily layered or it will "lose a lot of time" from a real-time OS point of view (every nanosecond counts).

So each OS tries to give its own solution to this balance problem, and that is why there are different architectures. ReactOS follows the NT architecture, and its layers are placed in the same way as NT's.

And yes: we are going to try to explain the ReactOS (NT) architecture.. Are you ready? ;)

See you next week :)