Thursday, November 27, 2008

Object Manager and I/o Manager

Part 1: How a file (object) is created

The Object Manager is the glue that creates and manages objects in the NT Executive.
And as such, it is essential to know how the Object Manager works with the I/O Manager to create a file object.

Before getting into the specifics of the creation of a file object (FILE_OBJECT), it's useful to know that the I/O Manager must register and create object types, to let the Object Manager know an existence of such types. The two main types that we are concerned with are (let's call them) IoFileObjectType and IoDeviceObjectType.

When the I/O Manager creates these two (2) types, it actually needs to specify a parse routine which are (let's call them) IoParseFileObject and IoParseDeviceObject. So basically from there, we know that the Object Manager when calling ObLookupObjectByName gets called, it will try to find this specific file. The I/O Manager will call ObLookupObjectByName but not in the usual way. It will pass the type "IoFileObjectType" to ObLookupObjectByName and then all control is given to the Object Manager via "ObLookupObjectByName" and eventually in ObLookupObjectByName (if it's successful), will end up at a device's type parse procedure, and call IoParseDevice, but will pass the type as IoFileObjectType.

This is where all the grunt work gets done. Not only does it create an IRP, but sends the remaining portion of the filename down to the "IoCallDriver" method. So basically, any registered DEVICE_OBJECT can get the related DRIVER_OBJECT, and call the necessary supporting file routines. The IRP will signify what type of major function, minor function, and create and store the FILE_OBJECT necessary to completing the call. IoParseDeviceObject needs to be lengthy as it does all the work. Once it returns from IoCallDriver, IoParseDeviceObject returns from ObLookupObjectByName, and then ObLookupObjectByName returns a handle to IopCreateFile/IopOpenFile, ibid et al.

This is just a brief overview in what must take place for the creation of a file to happen. This is not a trivial task that can be explained in full detail in one post. But over the next week, I'll cover more details in what must take place.

Friday, November 14, 2008

Important Links

I am doing an overall comparative study with the Linux Kernel and the ntoskrnl.
And as such, I have provided a link for a basic understanding of the ntoskrnl.

For ntoskrnl overview, there are many slides for each component of the Ntoskrnl:

http://www.i.u-tokyo.ac.jp/edu/training/ss/lecture/new-documents/Lectures

Tuesday, November 11, 2008

Semaphore Part 1 (__ticket_spin_lock explained)

We know from the previous article that semaphore uses the spin_lock_irq, which in turn, calls _raw_spin_lock, then calls, __ticket_spin_lock:

static __always_inline void __ticket_spin_lock(raw_spinlock_t *lock)
{
short inc = 0x0100;

asm volatile (
LOCK_PREFIX "xaddw %w0, %1\n"
"1:\t"
"cmpb %h0, %b0\n\t"
"je 2f\n\t"
"rep ; nop\n\t"
"movb %1, %b0\n\t"
"jmp 1b\n"
"2:"
: "+Q" (inc), "+m" (lock->slock)
:
: "memory", "cc");
}

This can be defined in C code as:

void __ticket_spin_lock_C(raw_lock_t * lock)
{

short inc = *(short *)&lock->slock;

//
// xaddw
//
lock->slock += 0x0100;

while ( *(char *)&inc != *( (char *)&inc + 1) ) )
*(char *)&inc = *(char *)&lock->slock;

}

Before, knowing what we're doing here, we must also look at __ticket_spin_unlock:

static __always_inline void __ticket_spin_unlock(raw_spinlock_t *lock)
{
asm volatile(UNLOCK_LOCK_PREFIX "incw %0"
: "+m" (lock->slock)
:
: "memory", "cc");
}

This can be implemented in C as:

void __ticket_spin_unlock_C(raw_spinlock_t * lock)
{
++lock->slock; /* must-be-atomic action */
}

Therefore, while the low byte is not equal to the high byte in the summation of increment and slock, keep looping and update the low byte of slock and comparing it with the high byte of inc. Once they are equal, then we have attained the lock.

The only way this will happen is when __ticket_spin_unlock gets called. It increments the slock, and returns. Basically, the ordering is preserved through this process, inc will denote the current thread. So, when slock is incremented, it will be the first thread that has access to the lock. Therefore, no starvation will occur, and this is truly first-come-first-serve.

Semaphores Part 1

Sempahores is one of the dark arts of the xkrn1; and as such, I will go over the semaphore and spinlocks, in detail over a few tutorials.

Spinlocks are the at the core of locking primitives. Sempahores are implemented in terms of spinlocks, and even further, spinlocks are implemented in terms of a "raw" lock. Let's look at the definition of a raw lock:

typedef struct raw_spinlock {
unsigned int slock;
} raw_spinlock_t

Currently, the xkrn1 uses the ticket algorithm. It uses 16-bits to represent the datastructure. Bits 0-7 are called the "owner"; bits 8-15 are called the "waiting" bits. When owner == waiting, it will mean that the current thread holds the lock.

A spinlock structure contains a spinlock and miscellaneous data for housekeeping and status information:

typedef struct {
raw_spinlock_t raw_lock;
#ifdef CONFIG_GENERIC_LOCKBREAK
unsigned int break_lock;
#endif
#ifdef CONFIG_DEBUG_SPINLOCK
unsigned int magic, owner_cpu;
void *owner;
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map dep_map;
#endif
} spinlock_t

We will look at struct lockdep_map in the next article; however, it will be very important and is very complex.

But the structure for the semaphore is as follows:

struct semaphore {
spinlock_t lock;
unsigned int count;
struct list_head wait_list;
};

Therefore, we see that the semaphore is composed of the spinock, the count, as well as the wait_list. The wait_list uses the technique described in previous articles, the list_head structure. Therefore, we can get to the containing struct seamphore if we have a reference to the wait_list.

Basically, the spinlock is dependent upon the raw_spinlock_t.
The _raw_spin_lock, calls __ticket_spin_lock, looks likes this:

static __always_inline void __ticket_spin_lock(raw_spinlock_t *lock)
{
short inc = 0x0100;

asm volatile (
LOCK_PREFIX "xaddw %w0, %1\n"
"1:\t"
"cmpb %h0, %b0\n\t"
"je 2f\n\t"
"rep ; nop\n\t"
"movb %1, %b0\n\t"
"jmp 1b\n"
"2:"
: "+Q" (inc), "+m" (lock->slock)
:
: "memory", "cc");
}

In summary, a semaphore is composed of a spinlock, which will get checked from assembler code via _raw_spin_lock, i.e. __ticket_spin_lock. The ticketing algorithm can be seen in many articles online; however, we will go over the algorithm in the reference articles, detailed later this month.

In the next article, we will go over the semaphore's interface of _raw_spin_lock, namely, spin_lock_irq.

Monday, November 10, 2008

Useful C macro's

Some useful macro's and functions I will go over are align macros.

#define ALIGN(x, a) __ALIGN((x), (a) - 1)
#define __ALIGN(x, a) (((x) + (a)) & ~(a))
#define IS_ALIGNED(x, a) ( ((x) & ((a) - 1)) == 0 )

These macro's are used to identify the proper alignment. They use the bitwise operations extensively. Also, another set of macro's that are useful to know are rounding macros.

#define ROUND_UP(x, y) ( (((x) + (y) - 1) / (y)) * (y) )

Another macro that is seen throughout the internet is the sizeof operator. Although, it is not officially a macro, it is a compiler-specific operator. However, it should be equivalent to the following macro's:

#define sizeof_value(val) ( (size_t)((&val + 1) - &val) )
#define sizeof_type(type) ( (size_t)((type *)0 + 1) )

These are small tricks that can retrieve the types by casting to pointers and checking the pointer arithmetic.

Latest News - Software Engineer (MSFT)

I will write about my new adventures and my new studies since joining MSFT.
First and foremost, I got into MSFT recently, and am studying device driver code.
There is an increasingly interest in kernel programming, as of late.
And as such, I will go over two main macros that are used for the Linux Kernel, as well as generic kernel programming knowledge.

As a side note, I am studying the Memory Manager in the Linux Kernel, and finishing the Linux Device Driver (3rd ed.) and becoming more knowledgeable into the xkrnl. I will post about basics of Device Driver (xkrnl) programming, as well as new things I learn as I study the xkrnl's VMM.

The two macros I will note is the offset_of and the container_of.
#define offset_of(struct_type, member) \
( (unsigned long) ( \
(&(struct_type *)0)->member \
) )
#define container_of(ptr, struct_type, member) \
( (struct_type *) ( \
(char *)ptr - offset_of(struct_type, member)\
) )

The offset_of macro, is used to find the member field offset in the struct 'struct type', and container_of is used to find the containing structure, given an address 'ptr', such that ptr is the member 'member' of struct 'struct_type'.

The interesting part is the offset_of() macro, which gives the exact offset position, by using the trick where if you deference and then take the address of a NULL pointer, it will just jump to the address, ignoring the dereference.