Thursday, November 27, 2008

Object Manager and I/o Manager

Part 1: How a file (object) is created

The Object Manager is the glue that creates and manages objects in the NT Executive.
And as such, it is essential to know how the Object Manager works with the I/O Manager to create a file object.

Before getting into the specifics of the creation of a file object (FILE_OBJECT), it's useful to know that the I/O Manager must register and create object types, to let the Object Manager know an existence of such types. The two main types that we are concerned with are (let's call them) IoFileObjectType and IoDeviceObjectType.

When the I/O Manager creates these two (2) types, it actually needs to specify a parse routine which are (let's call them) IoParseFileObject and IoParseDeviceObject. So basically from there, we know that the Object Manager when calling ObLookupObjectByName gets called, it will try to find this specific file. The I/O Manager will call ObLookupObjectByName but not in the usual way. It will pass the type "IoFileObjectType" to ObLookupObjectByName and then all control is given to the Object Manager via "ObLookupObjectByName" and eventually in ObLookupObjectByName (if it's successful), will end up at a device's type parse procedure, and call IoParseDevice, but will pass the type as IoFileObjectType.

This is where all the grunt work gets done. Not only does it create an IRP, but sends the remaining portion of the filename down to the "IoCallDriver" method. So basically, any registered DEVICE_OBJECT can get the related DRIVER_OBJECT, and call the necessary supporting file routines. The IRP will signify what type of major function, minor function, and create and store the FILE_OBJECT necessary to completing the call. IoParseDeviceObject needs to be lengthy as it does all the work. Once it returns from IoCallDriver, IoParseDeviceObject returns from ObLookupObjectByName, and then ObLookupObjectByName returns a handle to IopCreateFile/IopOpenFile, ibid et al.

This is just a brief overview in what must take place for the creation of a file to happen. This is not a trivial task that can be explained in full detail in one post. But over the next week, I'll cover more details in what must take place.

Friday, November 14, 2008

Important Links

I am doing an overall comparative study with the Linux Kernel and the ntoskrnl.
And as such, I have provided a link for a basic understanding of the ntoskrnl.

For ntoskrnl overview, there are many slides for each component of the Ntoskrnl:

http://www.i.u-tokyo.ac.jp/edu/training/ss/lecture/new-documents/Lectures

Tuesday, November 11, 2008

Semaphore Part 1 (__ticket_spin_lock explained)

We know from the previous article that semaphore uses the spin_lock_irq, which in turn, calls _raw_spin_lock, then calls, __ticket_spin_lock:

static __always_inline void __ticket_spin_lock(raw_spinlock_t *lock)
{
short inc = 0x0100;

asm volatile (
LOCK_PREFIX "xaddw %w0, %1\n"
"1:\t"
"cmpb %h0, %b0\n\t"
"je 2f\n\t"
"rep ; nop\n\t"
"movb %1, %b0\n\t"
"jmp 1b\n"
"2:"
: "+Q" (inc), "+m" (lock->slock)
:
: "memory", "cc");
}

This can be defined in C code as:

void __ticket_spin_lock_C(raw_lock_t * lock)
{

short inc = *(short *)&lock->slock;

//
// xaddw
//
lock->slock += 0x0100;

while ( *(char *)&inc != *( (char *)&inc + 1) ) )
*(char *)&inc = *(char *)&lock->slock;

}

Before, knowing what we're doing here, we must also look at __ticket_spin_unlock:

static __always_inline void __ticket_spin_unlock(raw_spinlock_t *lock)
{
asm volatile(UNLOCK_LOCK_PREFIX "incw %0"
: "+m" (lock->slock)
:
: "memory", "cc");
}

This can be implemented in C as:

void __ticket_spin_unlock_C(raw_spinlock_t * lock)
{
++lock->slock; /* must-be-atomic action */
}

Therefore, while the low byte is not equal to the high byte in the summation of increment and slock, keep looping and update the low byte of slock and comparing it with the high byte of inc. Once they are equal, then we have attained the lock.

The only way this will happen is when __ticket_spin_unlock gets called. It increments the slock, and returns. Basically, the ordering is preserved through this process, inc will denote the current thread. So, when slock is incremented, it will be the first thread that has access to the lock. Therefore, no starvation will occur, and this is truly first-come-first-serve.

Semaphores Part 1

Sempahores is one of the dark arts of the xkrn1; and as such, I will go over the semaphore and spinlocks, in detail over a few tutorials.

Spinlocks are the at the core of locking primitives. Sempahores are implemented in terms of spinlocks, and even further, spinlocks are implemented in terms of a "raw" lock. Let's look at the definition of a raw lock:

typedef struct raw_spinlock {
unsigned int slock;
} raw_spinlock_t

Currently, the xkrn1 uses the ticket algorithm. It uses 16-bits to represent the datastructure. Bits 0-7 are called the "owner"; bits 8-15 are called the "waiting" bits. When owner == waiting, it will mean that the current thread holds the lock.

A spinlock structure contains a spinlock and miscellaneous data for housekeeping and status information:

typedef struct {
raw_spinlock_t raw_lock;
#ifdef CONFIG_GENERIC_LOCKBREAK
unsigned int break_lock;
#endif
#ifdef CONFIG_DEBUG_SPINLOCK
unsigned int magic, owner_cpu;
void *owner;
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map dep_map;
#endif
} spinlock_t

We will look at struct lockdep_map in the next article; however, it will be very important and is very complex.

But the structure for the semaphore is as follows:

struct semaphore {
spinlock_t lock;
unsigned int count;
struct list_head wait_list;
};

Therefore, we see that the semaphore is composed of the spinock, the count, as well as the wait_list. The wait_list uses the technique described in previous articles, the list_head structure. Therefore, we can get to the containing struct seamphore if we have a reference to the wait_list.

Basically, the spinlock is dependent upon the raw_spinlock_t.
The _raw_spin_lock, calls __ticket_spin_lock, looks likes this:

static __always_inline void __ticket_spin_lock(raw_spinlock_t *lock)
{
short inc = 0x0100;

asm volatile (
LOCK_PREFIX "xaddw %w0, %1\n"
"1:\t"
"cmpb %h0, %b0\n\t"
"je 2f\n\t"
"rep ; nop\n\t"
"movb %1, %b0\n\t"
"jmp 1b\n"
"2:"
: "+Q" (inc), "+m" (lock->slock)
:
: "memory", "cc");
}

In summary, a semaphore is composed of a spinlock, which will get checked from assembler code via _raw_spin_lock, i.e. __ticket_spin_lock. The ticketing algorithm can be seen in many articles online; however, we will go over the algorithm in the reference articles, detailed later this month.

In the next article, we will go over the semaphore's interface of _raw_spin_lock, namely, spin_lock_irq.

Monday, November 10, 2008

Useful C macro's

Some useful macro's and functions I will go over are align macros.

#define ALIGN(x, a) __ALIGN((x), (a) - 1)
#define __ALIGN(x, a) (((x) + (a)) & ~(a))
#define IS_ALIGNED(x, a) ( ((x) & ((a) - 1)) == 0 )

These macro's are used to identify the proper alignment. They use the bitwise operations extensively. Also, another set of macro's that are useful to know are rounding macros.

#define ROUND_UP(x, y) ( (((x) + (y) - 1) / (y)) * (y) )

Another macro that is seen throughout the internet is the sizeof operator. Although, it is not officially a macro, it is a compiler-specific operator. However, it should be equivalent to the following macro's:

#define sizeof_value(val) ( (size_t)((&val + 1) - &val) )
#define sizeof_type(type) ( (size_t)((type *)0 + 1) )

These are small tricks that can retrieve the types by casting to pointers and checking the pointer arithmetic.

Latest News - Software Engineer (MSFT)

I will write about my new adventures and my new studies since joining MSFT.
First and foremost, I got into MSFT recently, and am studying device driver code.
There is an increasingly interest in kernel programming, as of late.
And as such, I will go over two main macros that are used for the Linux Kernel, as well as generic kernel programming knowledge.

As a side note, I am studying the Memory Manager in the Linux Kernel, and finishing the Linux Device Driver (3rd ed.) and becoming more knowledgeable into the xkrnl. I will post about basics of Device Driver (xkrnl) programming, as well as new things I learn as I study the xkrnl's VMM.

The two macros I will note is the offset_of and the container_of.
#define offset_of(struct_type, member) \
( (unsigned long) ( \
(&(struct_type *)0)->member \
) )
#define container_of(ptr, struct_type, member) \
( (struct_type *) ( \
(char *)ptr - offset_of(struct_type, member)\
) )

The offset_of macro, is used to find the member field offset in the struct 'struct type', and container_of is used to find the containing structure, given an address 'ptr', such that ptr is the member 'member' of struct 'struct_type'.

The interesting part is the offset_of() macro, which gives the exact offset position, by using the trick where if you deference and then take the address of a NULL pointer, it will just jump to the address, ignoring the dereference.

Saturday, March 1, 2008

CoreOS Interviews

I will describe some questions that I was asked.

1.) Given a list of negative and positive integers, group all the negative integers together and the positive on the opposite side, in place.

int pos = 0;
for (int i = 0; i < n; ++i) {
if (array[i] < 0) {
swap(&array[pos++], &array[i]);
}
}


2.) Implement strstr() with no libraries.

char *strstr(char *hay, char *need) {
if (hay == (char *)0 || char == (char *)0)
return (char *)0;
int haysz = my_strlen(hay);
int needsz = my_strlen(need);
int j = 0;
for (int i = 0; i < haysz - needsz; ++i) {
for (j = 0; j < needsz; ++j)
if (hay[i + j] != need[j])
break;
if (j == needsz)
return &hay[i];
}
return (char *)0;
}


3.) Given 3 buckets labeled B, W, and BW. The labels are wrong, choose a ball out of a bucket to correctly fix the labels. Which bucket must you choose from?
BW, because this will open up the possibilities by process of elimination.
For example, if the ball you choose from BW is white, then that bucket must be W, since it cannot be BW. Moving along, you look at the bucket labeled B, the only alternatives are W or BW. W was already chosen, so the bucket must be labeled BW. Looking at the last bucket labeled W, the only option is B, which satisfies the rules. (This logic works for the ball chosen from the BW bucket is black)

4.) Test the cp command.
Input, type of files, restrictions on non-valid files (i.e., they don't exist).

5.) Find the top six (6) frequencies from 1 - 100, that have the highest strength. You are given a function called int getStrength(int freq); and it will return a value from 1 - 10. 10 is the highest strength.

typedef pair FreqStr;
vector list;
int cmp(FreqStr first, FreqStr next) {
return first.second >= next.second;
}
for (int i = 1; i <= 100; ++i) {
list.push_back(make_pair(i, getStrength(i)));
}
sort(list.begin(), list.end(), cmp);
for (int i = 0; i < 6 && i < list.size(); ++i)
top6[i] = list[i].second;


6.) Given a nxn grid, and '*' represents mine's as in minesweeper. Return the grid with '*' characters remaining, but the blank spaces with numbers, indicating how many mine's are adjacent to that (x,y) position. Adjacent, in this context, means horizontal, vertical, and diagonal from the (x,y) position.

bool inBounds(int x, int y) {
return (0 <= x && x < n && 0 <= y && y < n);
}
int count(int x, int y) {
int cnt = 0;
for (int xoff = -1; xoff <= 1; ++xoff)
for (int yoff = -1; yoff <= 1; ++yoff)
if (inBounds(x + xoff, y + yoff)) {
if (grid[x + xoff, y + yoff] == '*')
++cnt;
}
return cnt;
}
for (int x = 0; x < n; ++x)
for (int y = 0; y < n; ++y) {
if (grid[x, y] != '*') {
grid[x, y] = '0' + count(x,y);
}
}

Saturday, February 9, 2008

Linear Tree Traversal

Linear Tree Traversals
This is useful if you want to simulate going through a tree in sorted order, as if the tree was a linear list, and the list was sorted.
There are two mechanisms how this is done, one called "linking," which we will not discuss, and the other is going and finding the lowest element in the tree, and traversing it in such a way, where we go from least to highest. The latter is the one which I will discuss.

So, there are two pieces to this linear traversal.
First, to find the lowest element, this is just a matter of going left, until you can't.
Second, to find the next element, you can do one of two things. You want to go right, and then all the way down to the left most node. If you have no right child node, then you are left to go up. There are two cases, one where you are the left or the right child of your parent.

If you are a left child of your parent, it hasn't been "traversed," therefore pick that element. Else if you are the right child of your parent, the parent node has been "traversed" so keep going up until you are not the right child of your parent.

Before, displaying the code, let's prove that going up (once, if you are the left, and continuously, until you hit NULL, if you are continuously the right) will indeed be the right choise.

Ok, starting off you are the left most node in the tree. This guarantees that you are the "lowest" node in the tree. However, if you look at its parent node, it is still bigger then the "lowest" node's right node. So, instead of going up, you'd want to go right, and then follow its left chain all the way down.
However, if you are a right child, then to even get to that point your parent node has been traversed. Why? This is because to get to the right node, you have had to traverse the parent, to even get to the right node. This is in part due to the fact that, we go right, then all the way down left. And once we bubble up to that "subroot" which is the right child, we "traverse" it, and then go to its right node, and then to the lowest node in that subtree. So, in doing so, when we bubble back up, there is no need to look at the parent node of the right child.

The code follows:

typedef struct node {
struct node * left;
struct node * right;
struct node * parent;
int val;
} uwf_node_t;

// Find the first node in the tree.
uwf_node_t * findFirst(uwf_node_t * root) {
if (!root) {
return (uwf_node_t *)NULL;
}
while (root->left)
root = root->left;
// Return the left most node.
return (root);
}
uwf_node_t * findNext(uwf_node_t * subRoot) {
if (!subRoot)
return (uwf_node_t *)NULL;
// Let's suppose, we are the left most node
// instead of bubbling up, we need to go right,
// and then left as far as possible.
if (subRoot->right) {
subRoot = subRoot->right;
while (subRoot->left)
subRoot = subRoot->left;
} else {
uwf_node_t * oldRoot = subRoot;
// Find the parent node that is "valid"
while (subRoot = subRoot->parent) {
if (subRoot->left == oldRoot)
break;
oldRoot = subRoot;
}
}
return (subRoot);
}

Friday, January 25, 2008

Find the Missing Number

Suppose you have a list of N-1 integers, and they are from 1-N.
Each number is used at most once.
Therefore, there is a missing number.

To add more complexity, assume you can only look at one bit at any given time.
Let's call this function:

boolean isSet(int index, int bitPosition) {
return array[index] & pow(2, bitPosition);
}


The answer is to view the 0th bit, counting up how many are supposed to be odd
and how many are supposed to be even, we can analyze that the missing number is
in one of these sets, and therefore, one of the sets can be removed completely.
Therefore, the next bit check is reduced from N to N/2, and doing so, will be
something similar to N + N/2 + N/4 + ... smaller than N*lg(N).
The trick is when the number is in the set that is "odd" or that has the bit "set",
then that bit will be set, and you increment and look at the next bit, else, look
at the next bit + 1.


for (i = 0; i < 8; ++i) { // log(N)
memset(bitSetList, 0, N);
memset(bitNotSetList, 0, N);
bitSet = bitNotSet = 0;
if (N <= 1) {
printf("Printing the value ...\n");
printf("%d\n", set);
break;
}
for (j = 0; j < N; ++j) // N + N/2 + N/4 + ...
// Roughly N * lg(N) since 1 + 1/2 + 1/3 + ... + 1/N -> lg(N+1)
// and N * (1 + 1/2 + ... + 1/N) > N + N/2 + N/4 + ...
if (f(array[j], i))
bitSetList[bitSet++] = array[j];
else
bitNotSetList[bitNotSet++] = array[j];

// Update the index counter.
if (N%2) {
if (N/2 == bitSet)
set |= pow(2, i),
updateBitSet(&i, &N, bitSet, bitSetList, BIT_SET);
else
updateBitSet(&i, &N, bitNotSet, bitNotSetList, BIT_NOT_SET);
} else { // even
if (N/2 == bitSet)
set |= pow(2, i),
updateBitSet(&i, &N, bitSet, bitSetList, BIT_SET);
else
updateBitSet(&i, &N, bitNotSet, bitNotSetList, BIT_NOT_SET);
}
}

Saturday, January 19, 2008

Factorial n!

Computing n!

There are many ways to compute n!, but I will introduce a divide and conquer method.
Using these rules below:
f(n,m) = 1 if n == m,
f(n,m) = 0 if n > m,
f(n,m) = f(n,x) * f(x+1,m) where x = (n + m)/2, otherwise

int f(int n, int m)
{
if ( !(n < m) ) return n == m;
int x = (n + m) / 2
return f(n,x) * f(x+1,m);
}

Thursday, January 17, 2008

Project Euler - Prime Generation

Project Euler - Mathematics/CS competition site (http://www.projecteuler.net)
I want to show an alternate way than S of E, currently, that I know of.
The space required is just the array of primes.
I used to perform the Seives (of Erasothenes) which is N * lg lg N, which is quite fast.
However, a better technique (in terms of space) is used in the following example.
This code will be python, and I will show the implementation below.
I can find the 10,001st prime number in under one second.

#! /usr/bin/python
# This program computes
# 10,001st prime.
# count of how many primes
nprimes = 1
# current value
n = 1
# current primes list with
# its value squared
primes = [ (2,4) ]
while nprimes < 10001:
# Gen all primes
n += 2
for prime, primeSquared in primes:
if n < sq:
primes.append((n, n**2))
nprimes += 1
if nprimes == 10001:
print n
break;
if n % prime == 0:
break;

The Sieve Of Erasothenes (S of E) follows (which is much faster for large inputs):

#!/usr/bin/python
# O(N lg lg N)
import Numeric
primes = [2]
N = 2000000
p = Numeric.ones(N,Numeric.Int)
n = 3
c = 1
while n < N:
if p[n] == 1:
c += 1
primes.append(n);
j = n + n;
while j < N:
p[j] = 0
j += n
n += 2

Wednesday, January 16, 2008

Generating all subsets

Generating all subsets

I am not really going to dive into much detail here.
But assume you are representing a set with a bitmask within a uint32_t (or the like).
To generate all the subsets is quite easy if you desire to do it within a couple of loops, however, doing it in one loop is conceivably hard, and even harder to prove that it works.

The following code is using recursion to identify and prints out all subsets of the set (bitmask).

void f(int bitmask) {
cout << bitmask << endl;
for (int j = 0; j < n; ++j)
if (bitmask & (1 << j))
f(bitmask ^ (1 << j));
}

However, a better solution and one that avoids recursion is the following:

for (int subset = bitmask;; subset = (subset - 1) & bitmask)
cout << subset << endl;

To prove this, we first need to know what exactly happens when you perform
(subset - 1) & bitmask
Basically, the & bitmask identifies that it is a subset of bitmask.
But does this guarantee all of them?
(subset - 1) will remove the lowest one-bit that is set, and replace it with a zero, and set all the lower bits to one.

The reasoning that this proves that all subsets are conceivably generating with this single for loop is certain. Every time you update subset, you are un-setting the lowest bit within bitmask. Once you have exhausted the lower bit. It goes to the next lowest bit that is set, since the lowest one has been exhausted already.

For example,
X = 11010111
Y = X (this is valid)
Y = (Y - 1) & X = 11010110 (removes lowest bit)
Y = (Y - 1) & X = 11010101 (removes 2nd lowest bit)
Y = (Y - 1) & X = 11010100 (removes 2 lowest bits)
Y = (Y - 1) & X = 11010011 (removes 3rd lowest bit)
Y = (Y - 1) & X = 11010010 (removes 3rd and 1st lowest bit)
Y = (Y - 1) & X = 11010001 (removes 3rd and 2nd lowest bit)
Y = (Y - 1) & X = 11010000 (removes 3 lowest bits)
This follows for all Y, such that Y > 0.
The bitwise-AND with X, guarantees it's always a subset of X.
q.e.d.

Therefore, the invariant of this process, is that the lowest bit is unset (within X), then proceeding with higher bits, it sets all remaining bits to being set, and then it proceeds to unset that bits along with the current bit that is being proceeded to be unset. This will guarantee all subsets.