Friday, December 21, 2007

Word Numbers

Question:
"If the integers from 1 to 999,999,999 are written as words, sorted alphabetically, and concatenated, what is the 51 billionth letter?"

To be precise: if the integers from 1 to 999,999,999 are expressed in words (omitting spaces, 'and', and punctuation[1]), and sorted alphabetically so that the first six integers are

* eight
* eighteen
* eighteenmillion
* eighteenmillioneight
* eighteenmillioneighteen
* eighteenmillioneighteenthousand

and the last is

* twothousandtwohundredtwo

then reading top to bottom, left to right, the 28th letter completes the spelling of the integer "eighteenmillion".

The 51 billionth letter also completes the spelling of an integer. Which one, and what is the sum of all the integers to that point?

Solution:
The key observation is that you can skip over huge blocks of words when their internal order doesn't matter. Consider the 999,999 numbers that follow "eightmillion": they are exactly 8,000,001 - 8,999,999 in some sorted order, so their total letter count and their arithmetic sum can both be computed without visiting them one by one.

The hard part is when you have to drill down inside such a block. Fortunately it's the same problem all over again, just with fewer numbers involved. Look at "eightthousand": the next 999 numbers are exactly 8,001 - 8,999 in some sorted order.

So if we keep three lists of name components, one per power of 10^3 (units, thousands, millions), we only ever have roughly 3,000 entries to sort and walk through before drilling down. The trick is to accumulate the running sum of integers and the running letter count quickly while skipping over whole blocks of numbers.

It boils down to precomputing a few quantities for the lower-order blocks.

We need the sum of the integers 1 - 999 and 1 - 999,999 (both are just n*(n+1)/2), and we need the total letter count of 1 - 999 and of 1 - 999,999.
The letter counts are easy but not entirely straightforward. I approached it by keeping a list of the numbers 1 - 999 spelled out, so the count for 1 - 999 is just the sum of the lengths of all entries in that list.

To get the letter count for 1 - 999,999, I looped over the 999 "thousand" prefixes (onethousand through ninehundredninetyninethousand). Each prefix appears 1,000 times (once on its own and once before each of 1 - 999), and the full 1 - 999 letter count also appears 1,000 times in total, which gives the count for 1 - 999,999.

All that's left is to sort each category and walk through the list, updating the running sum of integers and the running letter count, drilling down only into the block that contains the 51 billionth letter.
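To make the bookkeeping concrete, here is a minimal sketch of the letter-count arithmetic described above. It assumes a hypothetical vector words999 holding the 999 spelled-out numbers "one" through "ninehundredninetynine"; the integer sums are just n*(n+1)/2 and aren't shown.

#include <string>
#include <vector>
#include <stdint.h>

// letters used when 1 - 999 are each spelled out once
uint64_t lettersUpTo999(const std::vector<std::string> &words999)
{
    uint64_t letters = 0;
    for (size_t i = 0; i < words999.size(); ++i)
        letters += words999[i].size();
    return letters;
}

// letters used when 1 - 999,999 are each spelled out once
uint64_t lettersUpTo999999(const std::vector<std::string> &words999)
{
    uint64_t low = lettersUpTo999(words999);     // "one" .. "ninehundredninetynine"
    uint64_t prefixes = 0;                       // "onethousand" .. "ninehundredninetyninethousand"
    for (size_t i = 0; i < words999.size(); ++i)
        prefixes += words999[i].size() + 8;      // + strlen("thousand")
    // each thousand-prefix occurs 1,000 times (alone and before 1 - 999),
    // and the 1 - 999 letter total also occurs 1,000 times overall
    return 1000 * prefixes + 1000 * low;
}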

Wednesday, November 21, 2007

Printing Neatly

Word wrap in most text editors takes a greedy approach to breaking a long, newline-free text into a multi-line paragraph. Here we will look at an approach that minimizes the "badness", i.e., the amount of unused whitespace. Whitespace between words does not count as unused; only the space left over at the end of each line is "unused."

First, we need to consult the badness function for putting words i through j on one line:

c[i,j] = (SPACES_PER_LINE - ((j - i) + Length(i,j)))^2
where Length(i,j) = SUM over k = i..j of the length of word k

The cost can be expressed as a one-state function of the number of words, which memoizes nicely. If c[1,n] is not INFINITY (the quantity being squared is >= 0), all n words fit on one line and the total cost is just c[1,n]. Otherwise, for every k from 1 to n-1, take the optimal cost of fitting the first k words on previous lines plus the cost c[k+1,n] of putting the rest on the current line, and keep the minimum. The recurrence relation is:

f(n) = MIN { c[1,n], MIN over k = 1..n-1 of { f(k) + c[k+1,n] } }

Therefore, in code the memoized version would look like so:

#include <string>
#include <vector>
#include <cstdint>
using namespace std;

#define SPACES_PER_LINE 80
#define INFINITY 1000000000u     /* integer sentinel, large enough here    */
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define N 10000                  /* assumed upper bound on the word count  */

vector<string> str(N + 1);       /* str[1..n] holds the words, 1-indexed   */
uint32_t memo[N + 1];            /* fill with -1 (unseen) before calling f */

/* badness of putting words i..j on a single line */
uint32_t c(uint32_t i, uint32_t j)
{
    uint32_t spaces = j - i;     /* blanks between the words */
    uint32_t wordLength = 0;
    for (uint32_t k = i; k <= j; ++k) {
        wordLength += str[k].size();
    }
    int32_t res = SPACES_PER_LINE - (int32_t)(spaces + wordLength);
    if (res >= 0) {
        return ((uint32_t)res * res);
    }
    return (INFINITY);           /* the words don't fit */
}

/* minimal badness of printing words 1..n */
uint32_t f(uint32_t n)
{
    if (memo[n] != (uint32_t)-1) {
        return (memo[n]);
    }
    if (c(1, n) != INFINITY) {   /* everything fits on one line */
        return (memo[n] = c(1, n));
    }
    uint32_t best = INFINITY;
    for (uint32_t k = 1; k < n; ++k) {
        uint32_t cand = f(k) + c(k + 1, n);
        best = MIN(best, cand);
    }
    return (memo[n] = best);
}

This solution could also have been built bottom-up with dp; a sketch follows.
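A minimal bottom-up sketch of the same recurrence, assuming the words live in words[1..n] (1-indexed, words[0] unused); dp[j] is the minimal badness of printing the first j words, with dp[0] = 0 as the empty base case.

#include <string>
#include <vector>
#include <stdint.h>
using namespace std;

static const uint64_t INF = 1ULL << 62;        // sentinel: "does not fit"

// badness of putting words i..j on one line, or INF if they don't fit
static uint64_t cost(const vector<string> &words, size_t i, size_t j, uint64_t width)
{
    uint64_t len = j - i;                      // blanks between the words
    for (size_t k = i; k <= j; ++k)
        len += words[k].size();
    if (len > width) return INF;
    return (width - len) * (width - len);
}

// minimal badness of printing words[1..n]
uint64_t printNeatly(const vector<string> &words, uint64_t width)
{
    size_t n = words.size() - 1;
    vector<uint64_t> dp(n + 1, INF);
    dp[0] = 0;                                 // no words, no badness
    for (size_t j = 1; j <= n; ++j)
        for (size_t k = 0; k < j; ++k) {       // words k+1..j go on the final line
            uint64_t c = cost(words, k + 1, j, width);
            if (c != INF && dp[k] != INF && dp[k] + c < dp[j])
                dp[j] = dp[k] + c;
        }
    return dp[n];
}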

Monday, November 19, 2007

KMP - String pattern algo

Naive string pattern matching runs in O(N*M) time, where N is the length of the text being searched and M is the length of the pattern.

The KMP approach is based on a failure function that records, for each prefix of the pattern, the length of its longest proper border (a suffix that is also a prefix).
It can be built in O(M) time:

int F[M + 1];                // F[i] = longest proper border of P[0..i-1]; all zeros initially
int j = 0;
for (int i = 2; i <= M;) {
    if (P[i - 1] == P[j]) {
        F[i++] = ++j;
    } else if (j) {
        j = F[j];            // fall back to the next shorter border
    } else {
        ++i;                 // no border; F[i] stays 0
    }
}

The failure function is probably the hardest part: for each prefix of the pattern it records the longest proper suffix that is also a prefix. The technique is dp, reusing previously calculated failure values to determine the answer for longer prefixes. As a recursive function it would look like this:

/* str is the pattern; returns the border length for the prefix of length n */
int failFunction(int n)
{
    if (n <= 1) return 0;
    int res = n - 1;
    do {
        res = failFunction(res);          /* border of the shorter prefix */
        if (str[res] == str[n - 1])
            return (res + 1);             /* extend that border by one    */
    } while (res);
    return (0);
}

A memoized version of it goes like so:

/* memo[] has one entry per prefix length and is filled with -1 before use */
int memoFF(int n)
{
    if (n <= 1) return 0;
    if (memo[n] != -1) return memo[n];
    int res = n - 1;
    do {
        res = memoFF(res);
        if (str[res] == str[n - 1])
            return (memo[n] = res + 1);
    } while (res);
    return (memo[n] = 0);
}

The actual algorithm for computing the failure function is the hard part. The rest is easy:

int KMP(string S, string P)
{
    int N = S.size(), M = P.size();
    for (int i = 0, j = 0; i < N;) {
        if (S[i] == P[j]) {
            ++i, ++j;
            if (j == M) return 1;    /* pattern found */
        } else if (j) {
            j = F[j];                /* reuse the partial match */
        } else {
            ++i;
        }
    }
    return 0;
}
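To tie the pieces together, here is a self-contained sketch that builds the failure table and then runs the matcher. It follows the 1-indexed F[] layout used above; unlike the snippet above it returns the match position instead of just 1/0, and the fixed-size F array and the test strings are only for illustration.

#include <cstdio>
#include <cstring>

int F[256];                       // F[i] = length of longest proper border of P[0..i-1]

void buildF(const char *P, int M)
{
    memset(F, 0, sizeof F);
    int j = 0;
    for (int i = 2; i <= M;) {
        if (P[i - 1] == P[j]) {
            F[i++] = ++j;
        } else if (j) {
            j = F[j];             // fall back to the next shorter border
        } else {
            ++i;                  // no border, F[i] stays 0
        }
    }
}

int KMPsearch(const char *S, const char *P)
{
    int N = strlen(S), M = strlen(P);
    buildF(P, M);
    for (int i = 0, j = 0; i < N;) {
        if (S[i] == P[j]) {
            ++i, ++j;
            if (j == M) return i - M;   // index where the match starts
        } else if (j) {
            j = F[j];
        } else {
            ++i;
        }
    }
    return -1;                          // no match
}

int main()
{
    printf("%d\n", KMPsearch("abxabcabcaby", "abcaby"));   // prints 6
    return 0;
}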

Wednesday, October 31, 2007

DLIST hack (cont.)

The last post was about a doubly circular linked list, along with some common definitions. This article talks about why a generic doubly circular linked list is useful. The main reason to use the DLIST structure from the previous article is that it is generic: instead of giving every struct its own next/prev self-referential pointers, we define one small struct whose only members are those pointers. Any struct that embeds this struct as a member can then be linked to any other struct that also embeds it.

This is a little odd, but useful; the oddity is the reason it's a _hack_. It lets you chain multiple different types of structures on the same list, which is quite powerful, and there is a way to recover the original structure that holds the DLIST member.

Before giving an example, I'll describe how to get back to the enclosing structure whose DLIST member is chained into the doubly circular linked list. There is another post where I described the list-head problem: if you have a pointer to an element inside a structure, you can get to the start of the structure as long as you know the element's byte offset within the structure definition. All you need to do is subtract that offset from the element's address.

Let's see some code:

#include <stdio.h>
#include <stddef.h>     /* offsetof */

typedef struct dlist {
    struct dlist *next;
    struct dlist *prev;
} dlist_t;

typedef struct A {
    int c;
    dlist_t list;
    int a;
} A_t;

#define DLIST_INIT(dlistp) \
{ \
    (dlistp)->prev = (dlistp)->next = (dlistp); \
}
#define DLIST_ADD2HEAD(dlistp, newp) \
{ \
    dlist_t *oldfirstp; \
    oldfirstp = (dlistp)->next; \
    oldfirstp->prev = (newp); \
    (newp)->next = oldfirstp; \
    (newp)->prev = (dlistp); \
    (dlistp)->next = (newp); \
}
#define DLIST_ADD2TAIL(dlistp, newp) \
{ \
    dlist_t *oldlastp; \
    oldlastp = (dlistp)->prev; \
    (dlistp)->prev = (newp); \
    (newp)->next = (dlistp); \
    (newp)->prev = oldlastp; \
    oldlastp->next = (newp); \
}
#define DLIST_GET_ELEMENT(ptr, type, element_of_type) \
    ((type *)((char *)(ptr) - (size_t)offsetof(type, element_of_type)))

int main() {

    A_t a1, b1, c1;
    a1.a = 3, b1.a = 4, c1.a = 5;

    DLIST_INIT(&a1.list)
    DLIST_INIT(&b1.list)
    DLIST_INIT(&c1.list)

    DLIST_ADD2TAIL(&a1.list, &b1.list)
    DLIST_ADD2TAIL(&a1.list, &c1.list)

    printf("%d\n", DLIST_GET_ELEMENT(&a1.list, struct A, list)->a);   /* 3 */
    printf("%d\n", DLIST_GET_ELEMENT(a1.list.next, A_t, list)->a);    /* 4 */
    printf("%d\n", DLIST_GET_ELEMENT(a1.list.prev, A_t, list)->a);    /* 5 */
    printf("%zu\n", (size_t)&((A_t *)0)->list);   /* offset of the list member */

    return 0;
}

This allows us to chain any structure that has a DLIST member somewhere located within the structure.

Saturday, October 20, 2007

DLIST hack

DLIST stands for doubly-circular linked list.

These are defined as follows:

          --------
prev <-- |        | --> next
          --------

They actually point back to themselves, initially.

typedef struct dlist {
    struct dlist *next;
    struct dlist *prev;
} dlist_t;

Some key defines (here dlist is a dlist_t structure and newNode is a dlist_t *):

#define DLIST_INIT(dlist) \
    (dlist).next = &(dlist), \
    (dlist).prev = &(dlist)

#define DLIST_ADD2TAIL(dlist, newNode) \
    (newNode)->next = &(dlist), \
    (newNode)->prev = (dlist).prev, \
    (dlist).prev->next = (newNode), \
    (dlist).prev = (newNode);

// Assumes the first element is actually
// dlist.next
#define DLIST_REMOVE_FROM_HEAD(dlist) \
    (dlist).next = (dlist).next->next, \
    (dlist).next->prev = &(dlist);

#define DLIST_REMOVE_FROM_TAIL(dlist) \
    (dlist).prev->prev->next = &(dlist), \
    (dlist).prev = (dlist).prev->prev;

#define DLIST_ADD2HEAD(dlist, newNode) \
    (newNode)->next = (dlist).next, \
    (newNode)->prev = &(dlist), \
    (dlist).next->prev = (newNode), \
    (dlist).next = (newNode);

#define DLIST_ELEMENT(dlist, type, element) \
    ((type *)((char *)&(dlist) - (size_t)offsetof(type, element)))
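A minimal usage sketch with these macros, assuming a hypothetical struct node_t that embeds a dlist_t member named link (it relies on the dlist_t type and the macros defined above):

#include <stdio.h>
#include <stddef.h>

typedef struct node {
    int value;
    dlist_t link;
} node_t;

int main(void)
{
    dlist_t head;
    node_t a = { 1 }, b = { 2 };

    DLIST_INIT(head);
    DLIST_ADD2TAIL(head, &a.link);
    DLIST_ADD2TAIL(head, &b.link);

    /* walk forward until we come back around to the head */
    for (dlist_t *p = head.next; p != &head; p = p->next)
        printf("%d\n", DLIST_ELEMENT(*p, node_t, link)->value);   /* prints 1 then 2 */

    return 0;
}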

Buddy Memory Allocator

A buddy allocator is a simple memory allocator to design. It suffers from internal fragmentation, but the design itself is straightforward. First, some basic vocabulary.

Arenas are big memory "chunks" handed to a user-level process for dynamic memory allocation. Within each arena there are fragments, and each arena manages the fragments it hands out to the user. On top of that, the allocator satisfies a request from any free segment whose size is >= the size asked for, and for alignment reasons the requested size is rounded up to the next power-of-two boundary. There are minimum and maximum allocation sizes.

The buddy scheme comes into play when allocations are freed: freed fragments are coalesced with their buddies to rebuild the original, larger allocation units. Two "buddies" can only be coalesced if they are the same size (which keeps the bookkeeping easy). The buddy allocator I have been studying carves up the arena by allocating the largest chunk first and then halving by powers of 2 until the arena runs out, never going below the minimum allocation unit.

With that in mind, the arena needs to know which fragments are allocated and which are free, and each fragment needs to know which arena it belongs to. Optionally there can be an array of free lists, one per allocation size. We also need a way to find a fragment's buddy, which is essential since each fragment has exactly one buddy. So the main goal is the fragment->arena and arena->fragments relationships: each fragment carries an identifier pointing at its arena's bitmap, and each arena keeps a bitmap recording which allocations are free.

Therefore, here are a couple of attributes:


#define GET_BUDDY(arena, fragment, lg2size) \
    ((typeof(fragment))((char *)(arena) + \
        (((size_t)((char *)(fragment) - (char *)(arena))) ^ ((size_t)1 << (lg2size)))))

typedef struct bitmap {
    uint32_t arenaKey1;                             /* guard key */
    uchar_t  bitset[TOTAL_FRAGMENTS / CHAR_BIT];    /* one bit per fragment */
    uint32_t arenaKey2;                             /* guard key */
} bitmap_t;

typedef struct fragment {
    bitmap_t *arenaId;       /* which arena's bitmap this fragment belongs to */
    /* some fragment header info */
    uint32_t  log2size;      /* lg2 of the fragment's size */
} fragment_t;

/* Holds all fragment headers of a certain size,
   one list per allocation size. */
fragment_header_info *fragHeaders[N_ALLOCATION_SIZES];
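As a quick illustration of the buddy arithmetic, here is a sketch using the GET_BUDDY macro above (which relies on gcc's typeof). The arena size, fragment size, and offsets are made up purely for the example; the point is that buddies of size 2^lg2size differ only in bit lg2size of their offset from the arena base.

#include <stdio.h>
#include <stddef.h>

int main(void)
{
    static char arena[1 << 12];               /* a toy 4 KiB arena            */
    unsigned lg2size = 8;                     /* fragments of 2^8 = 256 bytes */

    char *frag  = arena + 512;                /* fragment at offset 512       */
    char *buddy = GET_BUDDY(arena, frag, lg2size);

    /* prints "fragment at 512, buddy at 768" */
    printf("fragment at %zu, buddy at %zu\n",
           (size_t)(frag - arena), (size_t)(buddy - arena));
    return 0;
}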

Wednesday, October 10, 2007

Linux Kernel Code: #define for loop

When competing in algorithm-type competitions, there tends to be a need for looping through structures, and the infamous #define for loop comes to mind. Nearly all TC competitors use it, but where it originated was unknown to me -- well, until a couple of days ago. Here I am studying Linux kernel internals, and I come across what seems to be either a macro or an inline function:

for_each_task(p)

This segment of code is used a lot in the Linux kernel. First and foremost, inside the kernel there is no separate notion of thread versus process; the basic unit of scheduling is a task, and the kernel keeps a collection of tasks. Before Linux 2.4 this was a table, an array of task structures. Not anymore: it is now a doubly circular linked list of tasks. The first task inserted into this list is the one created during kernel initialization, hence its name "init_task", so when you get back to that address you know you have cycled all the way around. (There is also the notion of current, the currently running task.) Back to the point: since it is now a doubly circular linked list, for_each_task(p) is defined as follows:

#define for_each_task(p) \
for (p = &init_task; (p = p->next_task) != &init_task; )

This is the earliest place I know of where a #define is used to construct a for-loop. Many programmers now use the same device in low-level systems programming; it's nothing new, it's been in the kernel since forever...
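For comparison, the competition-style #define loops the post alludes to usually look something like this (the macro names are just a common convention, not anything standard):

#include <stdio.h>

#define FOR(i, n)    for (int i = 0; i < (n); ++i)
#define REP(i, a, b) for (int i = (a); i < (b); ++i)

int main(void)
{
    FOR(i, 3)
        printf("i = %d\n", i);      /* prints 0, 1, 2 */

    REP(j, 5, 8)
        printf("j = %d\n", j);      /* prints 5, 6, 7 */

    return 0;
}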

Tuesday, October 9, 2007

FFS: Revisited (How to find the first bit set)

This is a recurring problem on this blog, and it has many features worth learning from. This article focuses on a binary search over the bits.

It is a classic binary search with the bits as the search space; the branch condition is whether mask & bits_val (bitwise-AND) is 0 or not.

Analysis:
If we split a uint32_t into two 16-bit halves, one branch decides whether the lowest set bit lives in the lower or the upper half; then we branch again over two 8-bit halves, and so on. This is quite easy to see; the only stipulation is that we keep a running counter of the bits we skip. Here we always fold toward the low-order end, which is the more intuitive direction.

So whenever bits_val & mask == 0 (the lower half is empty), we add that half's width to the counter and shift bits_val right by the same amount, so the interesting half becomes the new low-order half.

The algorithm therefore looks like so:

#define BITS 32
uint32_t ffs(uint32_t bits_val)
{
    uint32_t offset = 0;
    uint32_t mask = (1 << (BITS / 2)) - 1, nbits = BITS / 2;
    do {
        if ((bits_val & mask) == 0) {
            // lower half is empty: skip it and fold the upper half down
            offset += nbits;
            bits_val >>= nbits;
        }
        // halve the window and build the new, smaller mask
        nbits /= 2;
        mask >>= nbits;
    } while (nbits);
    return (offset);
}

uint32_t fls(uint32_t bits_val)
{
    uint32_t offset = BITS - 1;
    uint32_t nbits = BITS / 2,
             mask = ((1 << (BITS / 2)) - 1) << (BITS / 2);

    do {
        if ((bits_val & mask) == 0) {
            offset -= nbits;
            bits_val <<= nbits;
        }
        nbits /= 2;
        mask <<= nbits;
    } while (nbits);
    return (offset);
}



As a side note, many problems deal with byte alignment. Another solution uses a small integer-division trick to round up to the right multiple and then multiply back by the alignment:

#define ALIGN(x, align) ((((x) + (align) - 1) / (align)) * (align))

Because it uses integer division, (x + align - 1)/align is the ceiling of x/align (adding align - 1 rather than align keeps x unchanged when it is already a multiple). Multiplying back by align then yields the smallest multiple of align that is >= x. For example, ALIGN(13, 8) = ((13 + 7)/8) * 8 = 16, while ALIGN(16, 8) stays 16.

Friday, October 5, 2007

Automated daily jobs

On *nix machines there's something called cron. It's very helpful for doing daily builds for testing purposes; actually, it's good for anything that needs to run automatically: test automation, build automation, scheduled tasks, and so on.

The whole deal with cron jobs is that you have crontabs: a crontab is a file that describes when a job should run and where mail should be sent in case of output or errors. Each entry pins down the minute, hour, day of month, month, and day of week, followed by the script or command to automate. For a more definitive description, check the man pages.

This makes your life a lot easier.

Wednesday, October 3, 2007

Life: Up to Date

Well, it seems I haven't written in a few weeks. I have been real busy and needing to make decisions. However, none of them are good.

#1) I dropped out of the master's program at UWF.
#2) I am more than behind in deadlines at work.
#3) I am more behind in my studies.
#4) I am behind in studying for the GRE/LSAT.
#5) I have made a decision to study with a CMU alum, to get back into the game.

Therefore, I need to do things that counter-balance my bad decisions.
My study plan goes as follows:
1) LSAT
2) GRE (Maths/Verbal)
3) Technologies (awk, sed, rss feeds, bash/ksh, php, C/C++)
4) Back-end Development (C/C++)
5) Datastructures/Algorithms
6) Projects - Game Programming
7) General Reading.

A situation that is indeed funny to behold. The reason for me dropping out hinges upon two facts that are pretty bold for me to say, but I believe they are true.
1. Dr. Coffey scored me an F and didn't respond with any emails or notification of my grade as an F.
2. Dr. Edwards is the most slack professor/advisor at the university.
Wisdom is not wisdom when it's demeaned by slack individuals; it just becomes laziness with experience. Therefore, it's time to pick up the slack. Dropping out of school was the wisest choice I have ever made, and it just pushes me to strive even harder. (Plus, the topics I was studying were not what I need to be concentrating on.)

-- some pathetic coder

Thursday, September 20, 2007

Address alignment

This problem is very common in OS type related interviews.

Basically, you are given an address and you need it to be n-aligned, meaning the address must start on a multiple of n. The problem itself is not a hard concept. A first attempt is to add the alignment to the address and then subtract until the result is a multiple of n; adding n up front guarantees you land at or past the next multiple.

So a solution to this might be:

uint64_t aligned = address + alignment;
while (aligned%alignment) --aligned;

However, the address itself could already be aligned, so adding alignment - 1 instead suffices; it guarantees we pick the smallest aligned address that is >= the original address.

uint64_t aligned = address + alignment - 1;
while (aligned%alignment) --aligned;

Now add the stipulation that the alignment must be a power of 2. This simplifies things: if the alignment is a power of 2, it is at least even, and there is a handy rule for checking whether a value is a power of two:

int isPowerOf2(uint64_t x) {
return (x & (x-1)) == 0;
}

In other words, x and x - 1 share no set bits exactly when x is a power of 2.
Now take address + align - 1 as the starting point. That value is greater than or equal to the next multiple of align, and since align is a power of 2, a multiple of align can never have any of the low (align - 1) bits set. So we can simply zero out the bits covered by align - 1: adding align - 1 leaves a remainder somewhere between 0 and align - 1 above the multiple we want, and clearing those low bits snaps the value back down to that multiple. That is in fact the whole solution.

uint64_t address_alignment(uint64_t address, uint64_t align) {
return ((address + align - 1) & ~(align - 1));
}
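A quick sanity check of the function above, with made-up addresses and a power-of-two alignment (the function is repeated here only so the snippet compiles on its own):

#include <stdio.h>
#include <stdint.h>

uint64_t address_alignment(uint64_t address, uint64_t align)
{
    return ((address + align - 1) & ~(align - 1));
}

int main(void)
{
    /* 0x1003 rounds up to 0x1010; 0x1000 is already 16-byte aligned */
    printf("0x%llx\n", (unsigned long long)address_alignment(0x1003, 0x10));
    printf("0x%llx\n", (unsigned long long)address_alignment(0x1000, 0x10));
    return 0;
}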

Thursday, August 23, 2007

Random Code Post


Facebook Puzzle for Korn Shell.

The code follows:

#include <string>
#include <vector>
#include <cstring>   // strlen
#include <cmath>
#include <iostream>
using namespace std;

#define isA(x) ( ('a'<= (x) && (x) <= 'z') || \
('A' <= (x) && (x) <= 'Z') )
#define isL(x) ( ('a' <= (x) && (x) <= 'z') )
#define toL(x) ( (x) + 'a' - 'A' )

// Is the sum of the bits that are set
// less than or equal to 26?
int LESS_THANEQ_26(int x) {
int count = 0;
for( int i = 0; i < 26; ++i )
if( x & (1<<i) )
count += (i+1);
return !!(count <= 26);
}


// Returns how many bits are set in *nearly* constant time.
inline int COUNT(int x) {
x = x - ((x>>1)&033333333333) - ((x>>2)&011111111111);
return (((x+(x>>3)) & 030707070707) % 63);
}

// gcd(a,b,c) = gcd(a,gcd(b,c)) = gcd(gcd(a,b),c))
// lcm(a,b,c) = lcm(a,lcm(b,c)) = lcm(lcm(a,b),c))
// lcm(a,b) = a*b/gcd(a,b)
unsigned long long gcd( unsigned long long a, unsigned long long b ) {
if( a == 0 ) {
return b;
}
return gcd( b % a, a );
}
unsigned long long lcm(unsigned long long a, unsigned long long b)
{
return a * b / gcd(a,b);
}
// Main LCM method to call.
// It will actually call lcm(). This method
// chains all the lcm() calls together.
int LCM(int x) {
// nothing within the set
if( x == 0 ) {
return 0;
}

unsigned long long a = 1;

for( int i = 0; i < 26; ++i ) {
if( x & (1<<i) ) {
a = lcm(a,i+1);
}
}

return (int)(a);
}

// Memoiz keeps the max values of rotations.
int memoiz[27];

// Keeps masks of the bits that are set.
// If a bit is set, then that describes how many
// numbers are in that swap cycle.
int poss[500000];


int go() {

int totalPossValidEntries = 0;

for( int j = 0; j < (1<<26); ++j ) {
// At most 6 bits can be set.
int c = COUNT(j);

if( c > 6) {
continue;
}

poss[totalPossValidEntries++] = j;
}

// Now the total count is small enough to calculate
// the lcm for each candidate set.
// ( >?= is gcc's max-assignment operator: a >?= b means a = max(a, b) )
for (int i = 0; i < totalPossValidEntries; ++i )
if( LESS_THANEQ_26(poss[i]) )
memoiz[COUNT(poss[i])] >?= LCM(poss[i]);

// now dp the results.
for( int i = 0; i <= 26; ++i )
for( int j = 0; j < i; ++j )
memoiz[i] >?= memoiz[j];

return 0;
}

int main(int argc, char **argv) {
go();
unsigned uniqueElem = 0;
for( int i = 1; i < argc; ++i ) {
for( int j = 0; j < strlen(argv[i]); ++j) {
if(isA(argv[i][j])) {
if(isL(argv[i][j])) {
uniqueElem |=
1<<(argv[i][j] - 'a');
} else {
uniqueElem |=
1<<(toL(argv[i][j]) - 'a');
}
}
}
}
cout << memoiz[COUNT(uniqueElem)] << endl;

return 0;
}

Bitwise Ops and Its Applications


Bitwise Operations

Let's go over some basics. The most basic bitwise operations are |, &, ~, ^, and their respective assignment versions: |=, &=, ^=.

| is the OR operation. It will perform the OR of two (2) operands.
& is the AND operation. It will perform the AND of two (2) operands.
^ is the XOR operation. It will perform the XOR of two (2) operands.
~ is the NOT (bitwise complement) operation. It will flip all the bits of a given integral type.

(Also, << and >> are left and right shifts)

Therefore, we can use integral types to perform these bitwise operations under.
I will assume everyone knows what the basic bitwise operations actually do.

So why is something like this useful? It is handy for keeping track of small sets without any bigger data structure, i.e., when you just want to record which indexes or numbers are in or out of a list. For that, bitwise operations on an integral type are ideal.

The basic idea is to treat the bits of an integral type as the indexes of an array. To include number i in the set, you set bit i: bitwise-OR the integral type with the i-th bit, built with the left shift (<<) operator. To exclude a bit there is a small trick, since many approaches need an extra check to avoid modifying the value when the bit isn't set; an easy unconditional way is to bitwise-AND the integral type with a mask that has every bit set except the i-th.

With that we can implement a bitset using a plain integral type, with none of the overhead of a bigger data structure when all we store is small numbers. The limitation is width: the largest integral type is currently a uint64_t, which holds 64 bits, so it can only represent the numbers 0 - 63. The implementation follows:

uint64_t bitset = 0; // empty set: no bits set

// include the i-th bit
void include(int i)
{
    bitset |= (1ULL << i);
}

// exclude the i-th bit
void exclude(int i)
{
    bitset &= ~(1ULL << i);
}
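A membership test rounds out the pair above (my own addition, using the same bitset global as the snippet it extends):

// is the i-th bit in the set?
int contains(int i)
{
    return (bitset & (1ULL << i)) != 0;
}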

Bitwise operations are used heavily when organizing massive amounts of data (bitset tree structures), as well as in file systems.

Tuesday, August 21, 2007

Doubly-Linked Lists

Doubly Linked Lists

I was reading an article the other day and thought I should share its contents. It's quite interesting: it builds a doubly linked list of nodes using only one pointer per node, borrowing the idea behind the XOR swap described in a previous article.

x ^= y ^= x ^= y;

However, we don't need to swap, but only hold the XOR value of x and y, namely the value of

x ^ y

So we can have one pointer hold the XOR of the prev and next node addresses. To extract the previous node, you XOR the combination with the next node, and vice versa to extract the next one. This is simple but efficient. When assigning a new value to this one pointer, you XOR out the old neighbor and XOR in the new one. The code follows:

#ifdef __BIT_32_REGISTER__
#define XOR(a,b) ( (typeof((a))) ((unsigned)(a) ^ (unsigned)(b)) )
#else /* __BIT_64_REGISTER__ */
#define XOR(a,b) ( (typeof((a))) \
    ((unsigned long long)(a) ^ (unsigned long long)(b)) )
#endif

// initial XOR setup:
node->onePointer = XOR(prev, next);

// since we know that onePointer always holds
// prev ^ next of the current node, we can just XOR
// out the prev or next value:
// 1.) replace the previous node with a new prev
node->onePointer = XOR(node->onePointer, XOR(oldPrev, newPrev));

// 2.) replace the next node with a new next node
node->onePointer = XOR(node->onePointer, XOR(oldNext, newNext));

So now we know what the single pointer should hold. Since it holds the XOR combination of prev and next, XORing it with prev removes prev and leaves just next, and vice versa: XORing it with next leaves prev.
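A traversal sketch under the same scheme (the node type here is hypothetical, and it reuses the XOR macro defined above): walking forward only requires remembering the previous node, because XORing it out of onePointer leaves the next one.

#include <stddef.h>

typedef struct xnode {
    int value;
    struct xnode *onePointer;   /* prev XOR next */
} xnode_t;

void walk(xnode_t *head)
{
    xnode_t *prev = NULL;
    xnode_t *cur  = head;
    while (cur != NULL) {
        /* visit cur->value here */
        xnode_t *next = XOR(cur->onePointer, prev);
        prev = cur;
        cur  = next;
    }
}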

Friday, August 17, 2007

F-Question


Given a range [a,b], find out how many numbers within the range have a prime number of bits set.
The brute force solution is too slow, especially if [a,b] are 64-bit unsigned integers, which in this case we'll assume.

Therefore, we need a way to count quickly. If we had a perfect range, from 0 up to (but not including) a power of 2, the answer is just a sum of binomial coefficients: with n free bits and a prime p <= n, there are C(n,p) ways to pick which p bits are set, and every choice gives a distinct number.

0000 - 0111 (where the power of 2 would be 1000 in this case)
So all we have to do is look at the bits that we can choose from, 3.
And we have two different prime number values to be chosen - 2 and 3.

Therefore, C(3,2) + C(3,3) is the total number of ways to do this.
This is the easy case. However, numbers aren't exactly given to us in such a perfect world. Multiple bits could be set for a given number x:
101011101

In this case we split the range into sub-ranges, one per set bit of the bound, working down from the MSB. Why? Because each sub-range reduces to the power-of-2 case we already solved, except that now some bits are fixed and the rest are free. The fixed bits always contribute to the popcount, while the free bits are chosen, so the choose function has to account for free + fixed = prime: for every prime k with k - fixed >= 0, we add C(n, k - fixed), choosing the remaining k - fixed bits from the n free ones.

Each sub-range behaves like the power-of-2 case because its end differs from its start only in a block of low-order bits (a power-of-2 minus 1 pattern): only those rightmost bits are free, and the bits to the left stay fixed while computing the choose function. That leaves two helpers to write: one that generates the sub-range starting points by switching on the bound's set bits from the MSB down, and one that counts how many bits are set.

The following code shows the entire procedure: (C++)


#include <string>
#include <cstdlib>
#include <iostream>
#include <vector>
#include <map>
using namespace std;

typedef unsigned long long uint64_t;

#define MAX_BITS 64

// decently quick way to count how many bits are set:
uint64_t MIT( uint64_t x ) {
    x = x - ((x>>1ULL)&0x7777777777777777ull)
          - ((x>>2ULL)&0x3333333333333333ull)
          - ((x>>3ULL)&0x1111111111111111ull);
    return ((x+(x>>4ULL))&0x0f0f0f0f0f0f0f0full) % 255;
}

// memoiz the choose function
map<pair <uint64_t,uint64_t>, uint64_t> nck;

// keeps the list of ranges:
vector<uint64_t> start, end;

uint64_t next( uint64_t z, uint64_t x, int &y) {
if (y == -1) return 0;

for (int i = y; i >= 0; --i) {
if (x & (1ULL<<i)) {
// next time y will start
// at next (lower) index
y = i-1;
return z | (1ULL<<i);
}
}
y = -1;
return 0;
}
void init(uint64_t x) {
// always append zero (0) first
start.push_back(0);

uint64_t z = 0;

// which bit do we set?
int y = MAX_BITS-1;
while ((z=next(z,x,y)) != 0) {
start.push_back(z);
}
}
uint64_t nchoosek( uint64_t n, uint64_t k)
{
if (k > n) return 0;
if (n == k) return 1;
if (n == 0) return 0;
if (k == 0) return 1;

// memoized solution?
if (nck.count(make_pair(n,k))) {
return nck[make_pair(n,k)];
}
// memoiz the solution here:
return nck[make_pair(n,k)] =
nchoosek(n-1,k-1) + nchoosek(n-1,k);
}

int isPrime( int x ){
if (x < 2) {
return 0;
}

// could optimize it for just odd's and 2
for (int i = 2; i*i <= x; ++i) {
if ((x % i) == 0) {
return 0;
}
}
return 1;
}

uint64_t R( uint64_t x1, uint64_t x2) {
// get the digits that are free
uint64_t n = MIT(x2 - x1);

// find the number of fixed digits
int fixed = MIT(x2 - (x2-x1)); //MIT(x1 & ~(x2-x1));

uint64_t sum = 0; // sum up the diff prime(s)

// go through all primes, starting with fixed+const
for (int x = fixed + n; x - fixed >= 0; --x) {
if (isPrime(x)) {
sum += nchoosek(n,x-fixed);
}
}
return sum;
}
uint64_t f( uint64_t x ) {
// find all the ranges
init(x);

for (int i = 1; i < start.size(); ++i) {
end.push_back(start[i]-1);
}
end.push_back(start[end.size()]);

// go through each range, and calculate the sums
uint64_t sum = 0;

for (int i = 0; i < start.size(); ++i) {
sum += R(start[i],end[i]);
}

return sum;
}

uint64_t bits_prime( uint64_t a, uint64_t b )
{
uint64_t fb = f(b);

// clear the range lists:
start.clear(); end.clear();

uint64_t fa = f(a);

// inclusion exclusion rule:
return (fb - fa + isPrime(MIT(a)));
}
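A small driver (my own), cross-checking the routine above against brute force on a tiny range; the bit bookkeeping is easy to get wrong, so if the two printed counts disagree, something above needs another look.

int main()
{
    uint64_t a = 1, b = 1000, naive = 0;

    // brute force: count numbers in [a, b] whose popcount is prime
    for (uint64_t x = a; x <= b; ++x)
        if (isPrime((int)MIT(x)))
            ++naive;

    cout << naive << " " << bits_prime(a, b) << endl;
    return 0;
}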



Thursday, August 16, 2007

Pointer Fun


Pointers

The term pointer is basically just another word for number. The only difference is that there is something unique about the pointer type. You are allowed to dereference it, meaning you are allowed to go to the actual (virtual) memory address and get its value located at that memory address.

Other than that, it can be addressed using an unsigned integer (if it's 32-bits) or an unsigned long long (if it's 64-bits).

Therefore, in C++, you can do something like the following.
char *c = (char *)malloc(32);

The same thing can be addressed using an unsigned int.
unsigned int i = (unsigned int)malloc(32);

The values of c and i are identical and equivalent. However, you cannot go to the memory address of c and dereference (without casts) using the variable i, type being unsigned int. However, with casts you can:

*((char *)i) = 'a';
*c = 'a';

The two expressions above yield the same exact semantics. The syntax is a bit different, but they *mean* the same thing. This is probably the hardest concept for beginner programmers of the C/C++ language to understand and fully grasp. If you understand this, then malloc() and free()ing are completely understandable. They are after all using number values, and an unsigned int is a 32-bit number value, which is the same type for a pointer on a 32-bit machine.

Casting between ints and pointers is done a lot, especially within OS/network specific applications. Manipulating pointers is essential to get into deep OS-specific applications, and therefore, this is a good way to start that type of learning. Also, anytime you get confused, just think of a pointer as an integer. Anywhere you can modify an integer with it being legal, the same holds true of a pointer type. After all, they are the same thing.

The most humorous example is when you pass a struct in by pointer and change an attribute of that struct: anything you change through the pointer inside the function stays changed, even if the attribute is itself a pointer. Some people get confused because when you pass a pointer and assign the pointer itself a different value, that change does not survive the function call. This case is different: the struct's attributes live at the address being passed in, so you are literally going to their (virtual) memory addresses and changing them.

Manipulating memory is very intoxicating and fun as well.
Let's say you have this:

typedef struct A {
    int a;
    char b;
} A_t;
typedef struct B {
    int c;
    A_t *a;
    short d;
} B_t;

And you want to allocate space for the entire struct of B as well as the pointer to type A_t. But you can only use one malloc() call. How would you do this? Well the answer is to allocate space for both of them; however, you are going to have to have both structures side by side in (virtual) memory. This is quite simple.

char *c = (char *)malloc(sizeof(B_t) + sizeof(A_t));
B_t *b = (B_t *)c;                   // first part is the B_t
b->a = (A_t *)(c + sizeof(B_t));     // second part is the A_t

Therefore, you allocated space for both pointers, initializing the entire struct B_t.

A-Question


Suppose you are trying to print out the mirror image of a tree, a binary tree to boot.
How would you go about doing this? Well, if you know anything about tree structures, you know they are heavily recursive. There are two ways to do it: either modify the tree into its mirror image, or come up with a way to traverse it in mirror-image order. The funny thing is, it's virtually the same concept. I'm going to show the former; I'll leave the latter as an exercise.

This is going to be very short and sweet. The only thing you have to do is, change the pointers from right to left, and left to right, assuming you are granted the root node of the tree. Just do it recursively, like so:

typedef struct Node {
    struct Node *left;
    struct Node *right;
    /* payload omitted */
} Node;

void mirrorImage( Node *root )
{
    Node *left = root->left;
    Node *right = root->right;
    root->right = left;
    root->left = right;

    // now recurse like a heathen
    if (left) {
        mirrorImage(left);
    }
    if (right) {
        mirrorImage(right);
    }
}

So basically you update the pointers, and then do it recursively. After all of this is finished, the result will be the mirror image of the tree.



C++ Tricks: Part II

Another set of tricks.

I am going to outline a couple of more tricks that are used for interviews and overall cool things to know:

How to identify the endianness of a machine:
The easiest way is to take a pointer to a short whose value is 1 and look at its first byte: on a big-endian machine that byte is 0 (the most significant byte comes first), while on a little-endian machine it is 1.

bool isBigEndian()
{
    short s = 1;

    // big Endian if MSB == 0
    return (*((char *)&s) == 0);
}

Another trick is to use an unsigned integer as a set data structure. However, it's limited to holding only 32 values. Usually this is enough, but sometimes it's not. In any event, here's the process that goes into it.

You can look at an unsigned integer as a 32-bit integer, where the ith bit (0-indexed) is set to one (1) if and only if it is within the set, and zero (0) otherwise. Therefore, to add something to a set, there is a trick to use:

x |= (1<<(i));

This builds the i-th bit and bitwise-ORs it into the variable x. It's handy for set-type problems: TSP-style dp, permutations, backtracking, and the like. To remove an element from the set, there is a similar trick:

x &= ~(1<<(i));

This mask has every bit set to one (1) except the i-th, so bitwise-ANDing it with x clears only the i-th bit and leaves everything else unchanged. This technique is quite useful.

Tricky way to check if a string is a palindrome. It takes n/2 checks such that n is the length of the string in consideration. The algorithm to do so is to check the mirror image of the indices since they should be identical in a palindrome. (Front<->End) The code follows:

int len = str.size();
for (int i = 0; i < len / 2; ++i) {
    if (str[i] != str[len - 1 - i]) {
        return false;
    }
}
return true;


Obtaining the offset of a data member within a structure is quite useful. The key idea is that applying the address-of (&) operator to a dereferenced expression does not actually evaluate the dereference. The code follows:

typedef struct B {
    int x;
    char y;
} B_t;
typedef struct A {
    char a;
    int b;
    short c;
    B_t d;
    long long e;
} A_t;

To find the offset of member 'e' within type A_t you would otherwise need to know the padding rules, which can get complicated, so this is a better way:

unsigned int offset = (unsigned int)(&((A_t*)NULL)->e);

Because the dereference of NULL is never evaluated, this just calculates where the member e sits within the A_t structure. This is quite useful in things such as the list-head problem, which will be shown in an upcoming article. And since NULL is actually ((void*)0), the starting address of the structure in this case is zero (0), which is why the expression yields the actual offset at which 'e' is stored.

Wednesday, August 15, 2007

GCC Operators (C++)

Min/Max gcc operators:

Note: "< ?", "> ?", "< ?=", and "> ?=" should have not any spaces.


"< ? and > ? and the min and max operators"

The concept with the min and max operators is that they evaluate to the minimum or the maximum of their two operands. So instead of doing something like:

x > y ? y : x; OR x > y ? x : y;

You can now do:

"x < ? y; OR x > ? y;"

Also, you can chain these operands out:

"x < ? y < ? z < ? a < ? b < ? c;"

This is helpful when you need to find the minimum value of multiple values.

Also, there are min-assignment and max-assignment gcc operators.
Something in code that corresponds to:
"Set x to the minimum of y and z" OR "Set x to the minimum of x and y"

x = ( z < y ? z : y );
x = ( x < y ? x : y );
Alternatively:
x = z <? y;
x = x <? y;

Now, the compound forms <?= and >?= are valid as well. Therefore, the above would look like:

x <?= y; // for x = x <? y;
x = (z <? y);

Note that you can mix all of these together. The reason these operators matter is that each operand is computed only once (unlike the ternary form, which mentions an operand twice).
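A short illustration; note that these operators were a gcc extension that has since been removed, so this only compiles on old versions of g++:

#include <iostream>
using namespace std;

int main()
{
    int x = 7, y = 3, z = 9;

    int smallest = x <? y <? z;    // minimum of the three
    int best = 0;
    best >?= x;                    // max-assignment: best = max(best, x)
    best >?= z;

    cout << smallest << " " << best << endl;   // prints "3 9"
    return 0;
}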

Tuesday, August 14, 2007

Nvidia Interview Question

"Popcount"

This is a popular question, and hopefully you know that the MIT HAKMEM trick is one solution. But assuming you don't already know it, the solution posed here is easy enough to grasp.

The "POPCOUNT" question is to count how many bits are set in an unsigned 32-bit number. So basically, the brute force solution is this:

int count = 0;
for (int i = 0; i < 32; ++i)
    if (bits & (1 << i))
        ++count;
// count now holds how many bits are set

This is actually not so bad, right? Well, if this count is needed in OS-specific code, you want a solution that is close to constant time, so it had better be faster than looping over all the bits and counting them (especially if written in a high-level language).

Therefore, they'll give you a hint (assuming you can understand the interviewer). They will propose, say you can use anything, how would you make this run faster? The hint is so vague and if you haven't dealt much with this type of stuff, you are virtually screwed :)

But here's what they should tell you: assuming you have a lookup table that counts how many bits are set in a byte, how would you calculate how many bits are set in a 32-bit unsigned integer? That makes it much easier to answer :) All that's needed is to split the integer into bytes and feed each one to the lookup table.

Say you have a lookup table with 256 entries, indexed 0 - 255. The index is the byte whose bits you want to count, and the table entry gives you that count.

For example,

bitCount[x] gives you the number of bits that are set within this 8-bit byte, namely x.


Therefore, you have to split the 32-bit unsigned integer into 8-bit bytes when passing them into this lookup table, and you just sum up the result. Easy right? Indeed.
Here's the code:

// assume y holds the 32-bit unsigned integer value
int sum = 0;
for (int i = 0; i < 4; ++i)
    sum += bitCount[(y >> (8 * i)) & 0xff];

The "& 0xff" masks off everything except the low-order 8-bit byte.


This was straight forward after figuring out that you could have a lookup table that yields all the results.
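Here is a self-contained sketch of that approach; the table is filled once at startup, and the function and variable names are my own.

#include <cstdio>

static unsigned char bitCount[256];     // bitCount[b] = number of set bits in byte b

static void initBitCount()
{
    for (int i = 1; i < 256; ++i)
        bitCount[i] = (i & 1) + bitCount[i >> 1];   // reuse the smaller entry
}

static int popcount32(unsigned int y)
{
    int sum = 0;
    for (int i = 0; i < 4; ++i)
        sum += bitCount[(y >> (8 * i)) & 0xff];     // one byte at a time
    return sum;
}

int main()
{
    initBitCount();
    printf("%d\n", popcount32(0xF0F01234u));        // prints 13
    return 0;
}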

C++ Tricks: Part I

Impress your interviewer

"Comma Expressions Hack"
Clearly, anyone who has done C++ knows the trick where you can initialize multiple variables of the same type at one time.

For example,
int a = 0, b= 1, c = 2;

This declares three variables of the same type, namely int. The comma here is strictly a separator. It is quite useful when you hate the whole ordeal of using braces ("{}") for if-statements, loops, and the like, because you can use comma-separated expressions in place of several "regular" statements. What do I mean by regular? Regular, in this context, means the statement is an ordinary expression rather than a "break" or a "return"; those two special statements are not expressions (and none of this is to be confused with the regular expressions studied in theoretical computer science).

Therefore, you can do things like this:

if( a == 0 )
b = 3, s = "hello", d = 4;


That's multiple statements under an if without using braces. Things like this should probably be avoided if you're adhering to the coding standards of your academic or professional setting, but it's useful to know.

The "swap" without an extra variable.
Swapping two variables usually requires the use of another variable.

For example,

int a = 3, b = 4;
int tmp = a;
// comma trick:
a = b, b = tmp;


Therefore, we can do this without an extra variable, namely tmp.
It uses the XOR property of variables in C++.
A little clarification might help in this situation.
Say we have a and b as ints. Let's store the XOR of a and b into a variable named c:

c = a ^ b;


Therefore, we know that:

c ^ a == b
c ^ b == a


We can use this feature, without having another variable, c.
If we do it correctly, and chain the XOR operation, there's no need for the extra variable.

a ^= b; // a now holds the value that c would hold (a ^ b)
b ^= a; // since a holds c, c ^ b == (original a), so b now holds the original a
a ^= b; // and c ^ (original a) == original b, so a now holds the original b
More compactly, you can write it as:
a ^= b ^= a ^= b;
(Strictly speaking, this one-liner modifies a twice in a single expression, which is undefined behavior in C/C++, so the three-statement version is the safe one.)


Therefore, without using another variable, we have just swapped two integral types.

Note: These examples are for use with the C++ programming language. Though some of the tricks might also work in C, it is not wise to assume they all do.

Monday, August 13, 2007

Jobs and Interviews

Computer Science Graduates: Do you actually want to code?

All the cool jobs are geared towards hard interviews. This is a waste of time. You might as well just get a web development job, since those pay more anyway. I mean, the reason for going to school is to get a job, right? Then do college graduates actually want to work hard? They already worked while in school, eh? I guess that's why Stanford CS graduates go into finance and economics; that's where the money is. They spend a lot of money studying computer science, minor in economics, and head into the financial market, not wanting to code but to be critical thinkers and financial honchos making money. But enough of the spiel on CS graduates going after other careers; let's talk about the actual interviews and jobs that CS graduates who want to code are in the market for.

There are two types of coding categories in the market now: web development and systems programming. Web development spans a vast range of areas and technologies, whereas systems programming is centered on OS, networks, and C/ASM programming. Web development is easier and more cost-effective for employers, so they are able to pay you more than a systems programmer makes, possibly even double the salary. Then there's a hybrid of the two: back-end C/C++/Java/C# programmers who work for financial companies. What can I say about these jobs? They basically take the advantages of both areas and intermingle the parts of each that suck -- well, besides the money part :)

Lab49 is one company doing a lot of that kind of work, as are financial companies like banks and lenders... think JP Morgan Chase, Lehman Brothers, Visa, Amex, Mastercard, and others. They have the ability to pay their coders a hefty amount, and the work isn't too shitty. It calls for multi-threaded applications in a real programming language, not just scripting languages -- not that scripting languages are bad, in any sense. So back-end programmers get to build core infrastructure, with some web development intermingled.

Therefore, if you want to work on systems-programming-type jobs but want the kind of pay web development can offer, then the financial industry is your target. If you don't want too much problematic programming, or don't even want to think *too* much, then go for web development. I mean seriously, CS graduates don't want to think past a certain amount -- we are lazy, prideful, and want to just chill. With that in mind, go find a shitty job for your shitty degree that requires the least amount of work, with a shitty boss.


Note: Some web development companies are really well-developed, so they try to bring the back-end style into their mix, which is good, especially when they have enough funds to do their own research. Keep that in mind. (Google, Amazon, Ebay, et al.)

Friday, August 10, 2007

Coding 101: Intro to Programming

Basics to Coding.

First, I will describe everything in terms of mathematics and proceed into coding.
Some things, you should know are the basic function definitions in mathematics.

This is called a function definition.
f(x,y) : { x + 2 * y }

This itself doesn't do anything, but if you "plug" in values to f(x,y) : { x + 2 * y }, you can think of it as just replacing and substitution.
For example,

f(1,2) using f(x,y) : { x + 2 * y } -> { 1 + 2 * 2 } = 5

This is quite simple, x is the first parameter and y is the second parameter.
So everywhere you see an x you replace with the first value (in this case it's a 1), and everywhere you see a y you replace with the second value (in this case it's a 2).

Then you just output the value 5.
Therefore, f(1,2) = 5

This is in terms of mathematics. Computer Science handles this a little bit differently, but the main concepts are the exact same.
In coding, you have types for everything. So in the above example, you need types for x, y, and the return of the function. In that case f(1,2) = 5,
and 5 is the returned value; its type is the function's return type.

Mathematics makes this one-dimensional, whereas Computer Science tends to make it more generic; it has types for everything. For example, f(1.02, 2.03) is valid input in mathematics because there is no strict "type" being enforced. However, in Computer Science there are "types," so if x and y were of type int, this would not necessarily be a valid use of the function f.

Therefore, let's look at the description of a function.

RETURN_TYPE FUNCTION_NAME( TYPE x, TYPE y )
{
return x + 2 * y;
}


The RETURN_TYPE is the type that is evaluated by the keyword "return" in the code above.
Let's assume we only know the primitive types:
char
int
double

Therefore, RETURN_TYPE and TYPE have to be either one of the primitive types. By the way, primitive type just means basic types.

Therefore, the function could look something like this:

int f(int x, int y)
{
return x + 2 * y;
}


Here, f is the function name. The function name could be anything it wanted to be.
Also, functions can return no type, meaning the function does not want to return anything and cannot return anything. If a function has no return type, the keyword is "void":
void f(int x, int y) { x + 2 * y; return ; }

If you notice clearly, the expression after the keyword "return" is just a semicolon ";" meaning just leave the function.

Any variable that is used within a program needs to have a type.
This means you cannot just have something like this:

x = 5;

if x is not defined somewhere like:
int x; // declaration
OR
int x = 5; // declaration and assignment = initialization

Also, let me point out there is something called a function definition and a function call.
A function definition is telling you what the function actually does. A function call is when you call a function. That's easy right?

Before you call a function, it has to be defined somewhere. This makes sense too. You can think of it with a car analogy: to start your car, it first has to have an engine. If you try starting your car and it has no engine, it won't work. So first get your engine set up properly, and then you can start using it.

// FUNCTION DEFINITION
int functionName( int x, int y )
{
return x + 2 * y;
}

// FUNCTION CALL
functionName(1,2);

There are some differences here. The main difference is that you don't specify the types in the function call. But in the function definition you have to specify *everything* so that's how it makes it clearer. Also, you can think of it as assignments going on.

So a function call works like this:
functionName(1,2); --> go look up the functionName with parameters having values 1 and 2.
So then it finds a match above:
int functionName( int x, int y ) { return x + 2 * y; }

Then it looks and matches the types for the parameters, so it does something like this:
int functionName( int x = 1, int y = 2 ) { return x + 2 * y; }

Meaning, now x has the value 1 and y has the value 2, and so it just goes and does the computation there. So (x=1) + 2 * (y=2) -> 1 + 2 * 2 = 1 + 4 = 5, and then returns the result 5.

so functionName(1,2) holds the value 5.

Here's the tricky part. The entity value of "functionName(1,2)" can be replaced with the value 5, because functionName has the return type of int, so "functionName(1,2)" or the called function of functionName(1,2) has an int type.

What does this mean? It means you can use it anywhere you can use an int.
Let's say we do this.

int z = 3; // we declare z to be an int and assign in the value of 3.
This can happen since z's type is an "int" and the type of 3 is an "int" also.
(3 is an int literal, meaning it's like a constant int. It's universally known.)

Therefore, we know that functionName(1,2) has return type int.
Therefore, "functionName(1,2)" can be used anywhere an "int" type can.

So, we know that
int z = 3; // z is an int type

z = 4; // we can do this because 4 is an int type as well.

Can we do this?
z = functionName(1,2)

Indeed, because z and functionName(1,2) both have the same type we can assign functionName(1,2) to z.

Since functionName(1,2) = 5, it's almost like a replace-and-substitution takes place.
z = functionName(1,2) is the same as z = 5 (since functionName(1,2) equals 5)

This is just a short example of how functions work; a lot of the material was spelled out tediously to get the point across. Also, a function can have no return type or one return type, but it can have any number of parameters (none or many). Our function had only two (2) parameters, x and y, but you can have as many as you want. A complete example follows.
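Putting the pieces above together into one complete program you can compile and run (the comments show what it prints):

#include <iostream>
using namespace std;

// FUNCTION DEFINITION
int functionName(int x, int y)
{
    return x + 2 * y;
}

int main()
{
    int z = 3;               // declaration and initialization
    z = functionName(1, 2);  // FUNCTION CALL: z now holds 5
    cout << z << endl;       // prints 5
    return 0;
}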


*This is dedicated to a friend -- Q*

Thursday, August 9, 2007

Meebo Interview

Meebo Me!

So there's all this talk about some student at the University of West Florida getting an interview with Meebo. Meebo is a web-based wannabe Trillian that lets you connect to multiple IM clients at once. The idea isn't astounding; it's just an implementation done over the web. Interestingly, a computer science student at UWF actually implemented a C program that connected and logged in a Yahoo user. So in that sense it's neither unique nor difficult; it's just a matter of following the protocols the other IMs use and implementing them.

So in any event, on to the interview dilemma with Meebo. First, they tell you to answer the puzzles on their site and then submit them with your resume in an email. If you get through the puzzles, a technical recruiter grills you with questions you should already know if you were able to answer the puzzles on your own. In fact, Meebo gets many applications from students at Stanford and Pacific University, so someone from UWF needs to stand out and show passion and enthusiasm, as well as past projects and goals. The expectation is higher because UWF is a smaller school, so the applicant needs to show he's got what it takes during the interview process. The technical recruiter, as far as I can tell, is very judgmental and bases his judgment on mere appearances without due regard to how the questions were actually answered -- more on past projects and goals (the 3-5 year range) than on the answers themselves. This matters because Meebo is trying to expand, and since the technical recruiter's own technical knowledge is minimal, it's easy to say things that are false but believable.

The one thing I hold against Meebo is their bias toward students from schools that aren't as prestigious as Stanford University. There is almost a willingness to not even look at those applicants, which is truly unfair. Not every student coming out of Stanford has passion, though they do have projects they're working on. Projects don't make passion; from a simplistic, dumbed-down, amateurish point of view they might imply passion, because the person is actually working on something, but these are mere appearances. You'd expect someone whose job is to detect what motivates, inspires, and creates passion in college graduates to ask something more than "Are you working on current projects?" That is based solely on what's happening right now. Students are busy, especially in their last semester, and what kind of question is that anyway? There's no assessment of the past, or even a consideration of their academic studies. Possibly the technical recruiter doesn't really care whether he finds someone suitable. And given that Meebo is trying to "double" its engineers in 2007, it doesn't seem they'll manage it with these kinds of bogus technical recruiters :)

The expectations from someone representing Meebo were weak, and so the Meebo experience wasn't worth as much as the UWF student thought it would be, not least because the technical recruiter wasn't keen on how to judge applicants fairly.

Saturday, July 28, 2007

Re: MIT HACKMEM BITCOUNT

This is a reply to the MIT HACKMEM article that I posted earlier.
I mentioned in the previous article that I did not realize what they were doing, but I recently stumbled on what they are actually doing. They are doing the same thing I was aiming for, just in a sneakier way.

They are actually performing the Hamming-weight computation using subtraction. On most architectures, subtraction and addition cost about the same number of clock cycles, though I probably should benchmark the two solutions.

To get an overall sense of what they are doing, let's first examine 3-bit numbers. This will make it easier to extend to a 32-bit number.

Say the 3 lower bits are all 1's, so the number is 7. We want the count of set bits in this 3-bit number; since all three are set, the answer should be 3.

There's a technique you can use to count the number of bits:
x - (x>>1) - (x>>2), as mentioned in the previous article.
So that's all we have to do:

x = 111 (value of 7 in decimal)
x>>1 = 011 (value of 3 in decimal)
x>>2 = 001 (value of 1 in decimal)
so therefore x - (x>>1) - (x>>2) = 7 - 3 - 1 = 3, which is the correct number of bits set.
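
As a quick sanity check (my own snippet, not part of the original post), you can loop over all eight 3-bit values and compare the subtraction identity against a naive per-bit count:

#include <stdio.h>

int main(void) {
    for (unsigned x = 0; x < 8; ++x) {
        unsigned naive = (x & 1) + ((x >> 1) & 1) + ((x >> 2) & 1);  /* count bits one at a time */
        unsigned trick = x - (x >> 1) - (x >> 2);                    /* the subtraction identity */
        printf("x=%u naive=%u trick=%u\n", x, naive, trick);
    }
    return 0;
}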

To extend this to a 32-bit number, you have to notice one small thing.
When I did a shift over x>>1 and x>>2, there was a zero that was added.
You need to get the same effect without letting bits shift across from the neighboring 3-bit chunk of the 32-bit number, meaning something like this:

say x = 101 111 (in binary, split into 3-bit segments)
x>>1 = 010 111
x>>2 = 001 011
x - (x>>1) - (x>>2) is not going to give us the effect we wanted, so we have to mask out the bits that were shifted across the 3-bit boundary, since they would keep us from getting the correct result.

Therefore, recall which bits were zero when we used plain 3-bit numbers:
x (no zeroed bits)
x>>1 (the top bit (3rd bit) was zeroed out)
x>>2 (the top 2 bits (3rd and 2nd bits) were zeroed out)

Therefore, to get the 32-bit number that we want we need a correct mask, which would be to mask out the top bit for every 3-bit chunk for x>>1 and the top 2 bits for every 3-bit chunk for x>>2.
Surprisingly, if we use the octal numbering system, this is quite easy.
(The octal digit that zeroes out the top bit of a 3-bit chunk is 3, and the digit that zeroes out the top 2 bits is 1. This works because each octal digit is exactly 3 bits:
111 = 7
011 = 3
001 = 1)

Therefore you can think of the algorithm as actually doing something like:
x = (x&037777777777) - ((x>>1)&033333333333) - ((x>>2)&011111111111)

which is equivalent to x - ((x>>1)&033333333333) - ((x>>2)&011111111111)

And that hammers down how many bits are set in each 3-bit chunk; all that's left is to fold adjacent chunks together with a 3-bit right shift and an add, and then mod by 63 to get the final answer.

Here is some code in C,

#define y 0x1111111111111111ull  /* one bit set per 4-bit nibble */
#define n 0x0f0f0f0f0f0f0f0full   /* keeps the low 4 bits of every byte */

/* folds adjacent nibble counts into bytes, then sums the bytes mod 255 */
#define MIT(x) \
( \
((HAMMING(x) + (HAMMING(x)>>4)) & (n) ) % 255 \
)
/* sums the four bit-planes of each nibble, leaving the per-nibble bit count */
#define HAMMING(x) \
( \
( (x)&(y)) + \
(((x)>>1)&(y)) + \
(((x)>>2)&(y)) + \
(((x)>>3)&(y)) \
)

The macro MIT(x) will yield the solution where x is a 64-bit number. I had to change the implementation a little between 32-bit and 64-bit numbers. For 64-bit numbers, you need to hold the sums in 4-bit chunks instead, so mod'ing by 63 is no longer correct, but 255 is, and the mask has changed accordingly. This was my way of implementing it from the last post; the code below looks nearly equivalent:

x = x - ((x>>1)&0x7777777777777777) -
((x>>2)&0x3333333333333333) -
((x>>3)&0x1111111111111111);
x = ((x+(x>>4))&0x0f0f0f0f0f0f0f0f) % 255;
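
As a quick sanity check (my own test harness, not part of the original post; the helper names are just for illustration), the 64-bit subtraction form can be compared against a naive loop over the bits:

#include <stdio.h>

/* naive per-bit loop, used only to verify the closed-form version */
static unsigned naive_count(unsigned long long x) {
    unsigned c = 0;
    for (; x; x >>= 1) c += (unsigned)(x & 1);
    return c;
}

/* the subtraction form from above, with 4-bit chunks and mod 255 */
static unsigned hakmem64(unsigned long long x) {
    x = x - ((x>>1)&0x7777777777777777ull)
          - ((x>>2)&0x3333333333333333ull)
          - ((x>>3)&0x1111111111111111ull);
    return (unsigned)(((x + (x>>4)) & 0x0f0f0f0f0f0f0f0full) % 255);
}

int main(void) {
    unsigned long long tests[] = { 0ull, 1ull, 0xffull, 0xdeadbeefcafebabeull, ~0ull };
    for (int i = 0; i < 5; ++i)
        printf("%016llx -> naive=%u hakmem=%u\n",
               tests[i], naive_count(tests[i]), hakmem64(tests[i]));
    return 0;
}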

q.e.d.

Monday, July 2, 2007

fls implementation

fls stands for find last bit set in an unsigned int.
For example, the function signature looks like unsigned fls( unsigned );, meaning it returns the index of the set bit in question within the unsigned integer you pass to it. (In this post we isolate and index the lowest set bit.)

There are a lot of approaches to this. (I am not going to point out the binary search/scan across the unsigned integer, though it is the classical solution)

First and foremost, I will explain what x & ~(x-1) actually means.
When x is a power of 2, the check (x & (x-1)) == 0 is a trick that is employed to verify that x is indeed a power of 2. But why? What is actually taking place...

The simple fact is that x-1 takes the right-most (least significant) bit that is a '1', converts it to a '0', and sets all the bits after it (to the right) to '1'.

This explains why, when x is a power of 2, x-1 clears the single bit that was set and turns every bit below it into a '1'. So, knowing that, what does x & ~(x-1) mean?

Well, that's simple. The bit that went from '1' in x to '0' in x-1 is the lowest position where x and x-1 differ. Every bit above it (toward the MSB) is the same in both, and every bit below it is the opposite. Therefore, if you flip the bits of x-1, which is ~(x-1), and AND (&) it with x, the bits above that position become the complements of x and AND to '0', and the bits below it, which were '1' in x-1, become '0' after the flip and also AND to '0'. Only the bit that was set in x, and which flips back to '1' in ~(x-1), survives; it is the only bit that matches.

So there we have it: x & ~(x-1) yields exactly one set bit, namely the lowest bit that was a '1' in x.
Therefore, if we know this, there is a simple algorithm to perform:


#include <math.h>

unsigned fls( unsigned x ) {
    /* isolate the lowest set bit, then take log2 to get its index */
    return (unsigned)log2( (double)(x & ~(x-1)) );
}


This will give us the index of the last bit that is set.

However, this version is slow because it goes through the floating-point/SIMD unit on Intel's architecture. Context switches and first-use faults make this operation very slow compared to the other solutions that exist, which try to speed the fls() operation up.

However, there is another solution. A solution that is quite unique. Something that is quite a hack, but the speed is incredibly astounding. Let's consider a char, call it c.
If c is split into 3-bit chunks, and only one bit is set in c, then it's rather easy to find the index by dividing by 2. For example,

100 = 4
4/2 = 2 which happens to be the index of the bit set

010 = 2
2/2 = 1 which happens to be the index of the bit set

001 = 1
1/2 = 0 which happens to be the index of the bit set

Therefore, having shown it for small numbers, shifting the chunk down by the right amount and adding the shift offset gives the correct index, and this indexing is quite fast. Moreover, dividing by 2 is the same as right shifting by 1, so instead of right shifting by a multiple of 3 and then dividing by 2, we can right shift by the multiple of 3 plus 1. The implementation is below:

#define BITS 8
unsigned fls( unsigned x ) {
    x &= ~(x-1);  /* isolate the lowest set bit */

    /* walk the bytes of x from low to high */
    for( unsigned i = 0; i < sizeof(unsigned); ++i ) {
        unsigned y = (x>>(BITS*i)) & 0xff;

        /* make sure it's not zero (0) */
        if( !y ) continue;

        /* each octal digit is a 3-bit chunk; shift by (multiple of 3) + 1 */
        unsigned ret = (y&07)  ?  (y&07)>>1        :
                       (y&070) ? ((y&070)>>4)  + 3 :
                                 ((y&0700)>>7) + 6;

        return ret + (BITS*i);
    }
    return 0;
}


Therefore, (BITS*i) shifts the computed index so that it starts at the correct byte.
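
Here is a quick usage sketch (my own, assuming the fls definition above is compiled in the same file; lowest_bit_index is just a reference loop for comparison):

#include <stdio.h>

unsigned fls( unsigned x );  /* the chunked implementation above */

/* simple loop-based reference: index of the lowest set bit */
static unsigned lowest_bit_index(unsigned x) {
    unsigned i = 0;
    if (!x) return 0;
    while (!(x & 1u)) { x >>= 1; ++i; }
    return i;
}

int main(void) {
    unsigned tests[] = { 1u, 2u, 0x80u, 0x100u, 0x80000000u, 0xdeadbeefu };
    for (int i = 0; i < 6; ++i)
        printf("%08x -> fls=%u reference=%u\n",
               tests[i], fls(tests[i]), lowest_bit_index(tests[i]));
    return 0;
}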

MIT HAKMEM BITCOUNT re-do:

Ok, so there's this famous question of how to count the bits in a number.
This is easily done by looping through the bits, checking (number & 0x1), shifting, and incrementing a counter when the bit is nonzero, but let's do it with constant memory and in constant time.

I've read through a lot of tutorials that tried to explain this, and it's overly complicated, imho.
I can give a simpler solution that makes sense to the dumbest of people. It uses almost the same concepts as the MIT HAKMEM solution.

So I guess I should give the HAKMEM MIT AI Labs Bit Count solution first:
(Let's assume the number you want to count is always the variable x)
x = x - ((x>>1)&033333333333) - ((x>>2)&011111111111);
x = ((x+(x>>3)) & 030707070707) % 63;
x now holds the bit count.

This technique uses the fact that if you have a 3-bit number then all you have to do is:
x - (x>>1) - (x>>2)
so if you have a 32 bit number it's just:
x - (x>>1) - ... - (x>>31), or more generally if you have k-bit number:
x - (x>>1) - ... - (x>>(k-1))

Somehow, x>>1 & 033333333333 is supposed to give you the values x>>1, x>>3, ...
and x>>2 & 011111111111 is supposed to give you the values x>>2, x>>4, ...
therefore, it will yield x - (x>>1) - ... - (x>>31) and yield the solution, almost.

There is one last trick that they do. The solution is actually stored in 3-bit chunks.
However, we want to extract them. One way is to store them in 6-bit chunks where the upper 3 bits of each chunk are zero; then we can just take the number mod 63 (2^6 - 1) and it will give us the result, because 64 mod 63 is 1, so the 6-bit chunks simply add up. That is exactly what this algorithm does. To get 6-bit chunks with the sums sitting in the low 3 bits, you would mask the sums with something to the effect of 00707070707...; however, the high-order end only has 2 bits left, so the mask is actually 030707070707.

(But I never really got how they get (x>>1) + (x>>3) + (x>>5) by doing x>>1 & 03333....)
But that's the moral of the story.

That's why I came up with another implementation:

My implementation is a little easier if you are familiar with the hamming weight algorithm.
What you do is add the three bit-planes of each 3-bit chunk (x, x>>1, and x>>2, each masked with 011111111111), then fold the chunk sums together, mask with 030707070707, and mod by 63 again.

Therefore, here is my solution with the utilization of the hamming weight algorithm:
(again assuming the value is in x):

x = (x&011111111111) + ((x>>1)&011111111111) + ((x>>2)&011111111111);
x = ((x+(x>>3)) & 030707070707) % 63;


unsigned count( unsigned x ) {
x = (x&011111111111) +
((x>>1) & 011111111111) +
((x>>2)&011111111111);
return ((x+(x>>3))&030707070707) % 63;
}
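
As a quick check (my own usage sketch, assuming count() above is compiled alongside; naive is just a reference loop):

#include <stdio.h>

unsigned count( unsigned x );  /* the hamming-weight version above */

/* loop-based reference count, used only for checking */
static unsigned naive(unsigned x) {
    unsigned c = 0;
    for (; x; x >>= 1) c += x & 1u;
    return c;
}

int main(void) {
    unsigned tests[] = { 0u, 1u, 7u, 0xffu, 0xdeadbeefu, 0xffffffffu };
    for (int i = 0; i < 6; ++i)
        printf("%08x -> count=%u naive=%u\n", tests[i], count(tests[i]), naive(tests[i]));
    return 0;
}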



The hamming weight step can be pictured like this:
0101 0111 1110 0001 = x
The mask 011111111111 (octal), truncated to the 16 bits of this example, looks like:
1001 0010 0100 1001
and x & mask keeps every 3rd bit (starting from the last bit); everything else becomes zero:
0001 0010 0100 0001
If you do the same with x>>1, it keeps every 3rd bit starting with the 2nd-to-last bit, and doing the same with x>>2 starts with the 3rd-to-last bit.

Therefore, if you add them, each 3-bit chunk ends up holding its own bit count. However, you then have to shift right by 3 and add, so that a 3-bit chunk and its neighbor merge into a 6-bit chunk, and then use the mask to zero out the upper 3 bits of each new 6-bit chunk. Each of the previous 3-bit chunks held the correct value:

001 011 (looking at it like 3-bit chunks)
010 000
011 011 (2+1=3 and 3+0=3, but we want to merge them, so shift the result by 3 and add it to itself)
000 011
011 110 holds the merged result in the right-hand 3-bit set, so let's mask out the top 3-bit set since it no longer tells us anything we want (we are pairing chunk 0 with chunk 1, chunk 2 with chunk 3, and so on);
therefore we mask it with 030707070707..., and when we mod by 63 it takes the 6-bit chunks and adds their values together, essentially (since the top 3 bits of each chunk are always 0, and 64 mod 63 is 1).

Sunday, June 3, 2007

The C Tutorial

Ok, this blog is actually a segment devoted to a friend or two...
The tutorial will cover basics of C concepts, and implementations of a couple functions.

Part I - Basics of C
Arrays and pointers are closely related: the name of an array decays to a pointer to its first element in most expressions. In simple terms, a pointer is just an unsigned integer with "special powers." A pointer can go to the address it holds and access a value of type T, if the pointer is of type "pointer to T" (T *). An array can be thought of as behaving like a T * const, that is, the address it refers to cannot be changed.

Also, you can think of there being the most basic unit of pointers, the char*, since a char is one byte. In any event, I will give some examples to demonstrate this:

struct A a;
struct A* pa = &a;

pa points to a struct, or put another way, pa holds the address of a struct A type.
*pa will go to the memory location that pa currently holds and extract the type to which it points, namely the type is struct A.

so

*pa refers to the exact same object as "a" in the example above.

Since the universal pointer is the char*, we can access each byte of the "a" object like so:

char *c = (char*)pa; // point to the starting address of "a"

We can scan through every byte, but there is something essential we must know about incrementing a pointer:
If we have a pointer of type T, then incrementing the pointer by one is equivalent to doing:
T *p;
p + 1;  // increment by one: start at p's address and add sizeof(T) bytes to get "p+1"

That's why a char* is universal: every time you increment a char*, the address increases by only one byte,
thereby letting you move freely within any aggregate data type using a char*.
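
Here's a small sketch of the idea (my own example; struct A's members are made up just for illustration):

#include <stdio.h>

struct A { int x; double y; };

int main(void) {
    struct A a = { 42, 3.14 };
    struct A *pa = &a;
    int *pi = &a.x;
    char *c = (char*)pa;  /* points at the first byte of a */

    /* incrementing a T* moves by sizeof(T) bytes; a char* moves by exactly 1 byte */
    printf("pa+1 moves %zu bytes\n", (size_t)((char*)(pa+1) - (char*)pa)); /* sizeof(struct A) */
    printf("pi+1 moves %zu bytes\n", (size_t)((char*)(pi+1) - (char*)pi)); /* sizeof(int) */
    printf("c+1  moves %zu bytes\n", (size_t)((c+1) - c));                 /* always 1 */
    return 0;
}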

Malloc() is a function call that returns an address; ideally, you just cast it to the proper T* for assignment. However, if you understand what pointers are, the address can also be stored in an integer type that is wide enough to hold a pointer (if you cast correctly), though there is rarely a reason to.


Malloc basically keeps a header in front of each block that holds the total size of the block, as well as a flag indicating whether it's currently in use. Free() knows the size of this magical header and sets the flag back to unused, so the memory can be handed out again some time later. Think of the header as something like this:

struct malloc_header {
size_t totalSizeToAllocateIncludingHeader;
char isBeingUsed;
};


So when you call malloc(), it's actually adding the size of the header to the total number of bytes to allocate:

size_t MALLOC(size_t size) {
    return size + sizeof( struct malloc_header );
}

where MALLOC here just computes the total number of bytes the allocator needs,
whereas the real malloc() reserves that total and hands you back a pointer just past the header.

And when you call free(), it moves back sizeof(struct malloc_header) bytes to get to the header, unsets the isBeingUsed flag, and thereby makes that block of memory available again.
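
As an illustration only (a toy sketch of the bookkeeping described above, not how a real allocator works), free() might look something like this:

#include <stddef.h>

struct malloc_header {
    size_t totalSizeToAllocateIncludingHeader;
    char isBeingUsed;
};

/* toy free(): step back to the header sitting in front of the user pointer
   and mark the block as no longer in use */
void toy_free(void *ptr) {
    if (!ptr) return;
    struct malloc_header *hdr =
        (struct malloc_header*)((char*)ptr - sizeof(struct malloc_header));
    hdr->isBeingUsed = 0;
}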

Another concept that should be known is pass by value versus pass by reference. In C, there is no such thing as pass by reference. However, there is a way to get past this dilemma: just pass a pointer (the address of an object), and if you change the value through this pointer, the object will be changed after the function call. That's the entire idea behind "making C parameters act like pass by reference." An example follows:

void func( char *c ) {
*c = 3;
}

char c = 5;
func(&c);
// c is now 3 after the function call

The address-of operator is a nice operator to have. Basically, it looks at the type of the operand it is applied to and returns a pointer to that type. For example,

char c = 3;
char *d = &c; // c's type is "char", so &c returns a pointer to type char, char*
char **s = &d; // d's type is "char*", so &d returns a pointer to type char*, char**

char a[3];
&a is of type char (*)[3], because a is of type char[3]; therefore &a is a pointer to an array of 3 chars.

Common things to know about strings: in C, there is a little quirk with C-style strings, as they're called. They are null-terminated strings, meaning they need to end in the integer value 0, or the ASCII char value '\0'. It's basically a convention that was agreed upon. If this value is not present, none of the string library functions will work for you. Also, string literals (sequences of characters in double quotes) always end with a '\0' (null-terminator character). Therefore,
"hello" is actually seen as hello\0 by the compiler. There are some differences when assigning string literals though.

In pointer notation:
char *s = "hello"; // s points to the address where 'h' is located
char s[] = "hello"; // s contains the string 'h', 'e', 'l', 'l', 'o', '\0' as its elements in the array
In both cases the value of s is the address of a char whose value is 'h', but they are NOT the same address; the two versions simply point at bytes holding the same contents.
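
A small illustration of the difference (my own example):

#include <stdio.h>
#include <string.h>

int main(void) {
    char *p = "hello";   /* p points at a string literal (effectively read-only) */
    char a[] = "hello";  /* a is a 6-byte array holding its own copy, including '\0' */

    printf("sizeof(p) = %zu, sizeof(a) = %zu\n", sizeof(p), sizeof(a)); /* pointer size vs 6 */
    printf("strlen(p) = %zu, strlen(a) = %zu\n", strlen(p), strlen(a)); /* both 5 */

    a[0] = 'H';              /* fine: the array copy is writable */
    /* p[0] = 'H';              undefined behavior: the literal may be in read-only memory */
    printf("%s %s\n", a, p);
    return 0;
}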

Function calls in C -
Basically when a function call is executed, an activation record is set up and added to the stack.
First, the parameters to the function get pushed, then the return address, and then the local variables (sometimes called automatic variables) are added to the stack. They are popped back off the stack in reverse order when the function returns. That's why they tell you to NEVER return the address of a local variable (it is automatically gone once the function returns).
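
For example (a sketch of the classic mistake; the function names are my own):

#include <stdlib.h>
#include <string.h>

/* BAD: buf lives in this function's activation record and disappears on return */
char *bad(void) {
    char buf[16] = "hello";
    return buf;            /* dangling pointer */
}

/* OK: heap memory outlives the call (the caller must free it) */
char *good(void) {
    char *buf = malloc(16);
    if (buf) strcpy(buf, "hello");
    return buf;
}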

*You should know fork(), waitpid(), wait() and semaphores/locks *

Part II - Examples (Code)
#include <stdlib.h>
#include <string.h>

int atoi(char *str) {
    // str holds something like "1235" or "-1235"
    char isNeg = str[0] == '-'; // is it neg?

    int i, currValue = 0;

    /* if we multiply currValue by 10, we are shifting the currValue to the left */
    /* for example, say we have "23" and currValue = 2, currValue * 10 = 20, so
       currValue * 10 + 3 = 23
    */
    for( i = isNeg; i < strlen(str); ++i )
        currValue = currValue * 10 + (str[i]-'0'); // shift it to the left and add the next digit

    if( isNeg ) currValue = -currValue;
    return currValue;
}

char *itoa(int number) {
    char isNeg = number < 0; // is it neg?
    if( isNeg ) number = -number;

    char *s = (char*)malloc(32);
    int count = 0;

    // write out the digits, least significant first
    do {
        s[count++] = (number % 10) + '0';
        number /= 10;
    } while( number );
    if( isNeg ) s[count++] = '-';
    s[count] = '\0';

    // reverse the string in place so the digits read correctly
    char *t = s;
    int half = count/2, i;
    for( i = 0; i < half; ++i ) {
        char tmp = t[i]; t[i] = t[count-1-i]; t[count-1-i] = tmp;
    }
    return s;
}

// recursive solution
struct Node * reverse( struct Node *prev, struct Node *curr ) {
    if( !curr ) return prev;

    // keep track of the next Node* in the list, then relink
    struct Node *next = curr->next;
    curr->next = prev;

    return reverse(curr, next);
}

// iterative solution
struct Node * reverse( struct Node *head) {
struct Node *prev = NULL;
while( head ) {
// keep track of the next Node* in the list
struct Node *next = head->next;

// let head's next pointer be its previous
head->next = prev;

// update the pointers for prev and head
prev = head;
head = next;
}
return prev;
}

// inorder traversal for a binary tree
void inOrder( struct Node *root ) {
if( !root ) return;

inOrder(root->left);

/* print out the value of this node */
printf("%d\n", root->id);

inOrder(root->right);
}

Thursday, May 31, 2007

Interviewing tips Part I(for UWF students)

Ok, this is part one of a sequence of blog posts directly aimed at helping UWF students get jobs at the top tech companies in the nation, and even the world. There has been a lot of hype that UWF students just end up at Harris Corporation or at Space and Naval Warfare (SPAWAR) and can't do any better. This post (out of many) will help dispel that false prophecy about where a UWF CS graduate *can* end up.

This post concentrates on the courses that actually help the UWF graduate prepare for interviews. Of course, one also needs to worry about the items on their resume. True or not, you need to know what's on your resume and how to elaborate on the selected items during the interview, since that is all the interviewer has to go off of when bringing you in for an interview in the first place. In any event, the first step is to know the ins and outs of your resume.

Foretelling exactly what the interview questions will be is highly unlikely, but there are a couple of things to consider, and certain courses will help the student earn passing marks in the interview. The questions at a top company will generally cover OOP, syntax/coding, OOP implementation in a particular language, data structures, algorithms, and systems-programming concepts. The language of choice is almost always C or C++. If questions about OOP and OOP implementation tend to crop up, then it's a viable choice between C++ and Java; otherwise, you'd best stick to C and C++ to do well in the interview.

Given that information, the courses that are most pertinent for the UWF student are as follows:
DataStructures and Algorithms
Analysis of Algorithms
Intermediate Programming
Operating Systems


That's about it -- just four courses that need to be studied. However, that ends up being a lot of material to be familiar with. I will break down the essentials that need to be taken from the courses:
Intermediate Programming - OOP concepts, how to implement OOP concepts, differences in other languages (i.e., Java vs C++ with regards to inheritance)

Operating Systems - system programming concepts - threads, process, PCB, synchronization, IPC, thread scheduling, library vs kernel threads, as well as the dining philosophers problem.

Datastructures - various data structures, problem-solving ability, and time complexity.

Analysis of Algorithms - time complexity, memoization, greedy algorithms, and the infamous TSP.

Here's a short list of the actual concepts that *someone* studied that got that *someone* through many interviews:
OOP - inheritance, virtual functions, pure virtual functions, abstract class vs interface, multiple inheritance, virtual function implementation (C++), and polymorphism (what it means to be polymorphic, with examples that show polymorphic behavior)

Threads - kernel vs library threads, communication between threads (mutexes/semaphores), thread synchronization (critical sections), and the dining philosophers problem.

Algorithms/Datastructures - everything about linked lists and tree structures. (The complexity of algorithms on balanced trees is almost always O(n*log(n)), with the log(n) factor being the cost of going from the top to the bottom of the tree.) Know the time complexity of your algorithms.

any comments on what else should be studied throughout the UWF course curriculum? :)
Next blog will cover some specific concepts in C that should be known.
(Hint: pointers are evil)

Sunday, May 27, 2007

I have education, but I'm not educated...

Courses taught in school these days are a bit lacking.

For example, a lot of courses are taught entirely in Java, including Operating Systems. Refer to Joel on Software if you are inclined to think that Java is useful for teaching, edifying, and getting students to think on their own.

In any event, this post is to ponder the actual education gained while in school. A student was once described to me as being "in school, but not being educated," which raises the question, "what is the point of school if you don't learn?"...

Therein lies a great paradox that probably won't be answered by the end of this writing, but I will definitely take a poke at it. Students know that many teachers don't really care about them, that there is a lot of material to learn, and that the teachers usually give a half-ass lecture on the subject and then provide a way for you to fail the course by testing material that was NOT covered in lecture. They seem to like failing students with a smile on their face, and then wonder why students don't really like them?! The point is that teachers aren't really teaching their students anything anymore. I could go into specific detail, but I doubt it's really needed. This post is not for the professors as much as it is for the students. Students from all over can testify that this is the case. Though it probably isn't 100% true, it is true for the general public.

Therefore, how do students actually learn? Well, it's the environment, some would say, and I don't necessarily disagree. If you are in a good learning environment, then it's easy to be educated by other students; the environment is supportive and manageable enough to learn something. But what happens when you have a poor learning environment and the teachers, well... suck? I guess you're virtually screwed, and without a good job to boot. With that, you have education, but aren't educated. What we need is the best of both worlds: a good environment and good teachers. But that's a perfect world, and well, teachers and the environment are somewhat NOT perfect. Anyways, just thoughts to ponder...

Friday, May 18, 2007

Stress...

Stress, that's the theme of this post...

What makes a person have spasms and fits of rage? Sometimes it's seen when there is an overload of pressure and stress. The pressure, stress, and hard work of students studying Computer Science boil down to basically a 30-45 minute interview with a company.
This can actually be quite funny to watch. Possibly the most entertaining event out there is when a CS graduate flares up, in but a moment, after being unable to answer particular interview questions. I mean, this is their livelihood on the line. For the sake of the money, for the sake of the career, and for the sake of reputation, all your years of studying boil down to a couple of questions, or even a single "question," of the entire existence of your career path.
I don't know about you, but that's hella scary. There is a story that someone witnessed and that inevitably had to be told, so let's get on with it. A CS graduate had an interview with Google for a possible Software Engineer position. What a great chance to work with one of the top companies in the nation, and even the world! However, there's a reason why Google is the best: they hire the best, and that means hard interviews, and a good story behind this one. As the interview went along, the complexion and reactions of the CS graduate kept changing, and not for the better. Sadly, he seemed to be in a confounded state, ready to pounce on anything that went by him. If anything hinted at what was going on in his mind, the repeated shaking of his head throughout half of the interview should have indicated how stressful continuing the interview was! The interview reaches its closure, with more shaking of the head. So the interview is over and the stress is done with, right? Think again: the CS graduate turns, puts his hands on his hips, and shakes his head repeatedly. Then he grasps at the air as if it were a substance to be reached, and his hands turn to fists powering down onto the desk with a heavy, strong, boisterous *thump*, literally heard by everyone in the small building. Possibly that quenched the stress and pressure of doing badly on the Google interview? Not quite: he goes and punches the wall, repeatedly saying, "I knew that SHIT! I knew that SHIT! How come I get interviewed by someone who can't speak proper English, and on a phone, no less!" Interestingly enough, after the yelling, screaming, and utter disappointed shock of doing badly, he leaves the room and the building.
This has a little to do with the stress and pressure on CS graduates trying to get into the top companies they studied so hard in college to reach. More often than not, it's too much stress that students go through while competing with one another. Ideally, some have the "no stress, no pressure" mentality... well, that's before the interview. Not many actually do well under stress and pressure; at the very least the stress and pressure come out and show, just like with our young CS graduate who bombed the Google interview.
The story is funny, true, and impossible not to laugh at. I mean, this has to be one of the funniest things to capture, possibly a YouTube.com moment. Anyways, this is just one of the many stories out there of failed interviews and stressful moments. I have a good friend going for a Microsoft interview next week, and I was joking with him about how the stress factor is going to come, maybe not now, but he'll see. Good luck to him on that, and good luck to all who have to face the stressful threat of interviewing and work through that eventful day.

Thursday, May 17, 2007

UWF Graduates doing OK

Throughout the massive posts that seem to discredit UWF, there are some success stories. I guess I will dive into the current curriculum at UWF before going into some of these "success" stories.
The curriculum is similar to any other CS school out there, and some professors are actually elite. Let me highlight some of the schools the professors graduated from and earned their Ph.D.s at:

Massachusetts Institute of Technology (MIT) - Math/Operations Research
University of California San Diego (UCSD) - Computer Science/Engineering
Ohio State University (OSU) - Computer Science/Engineering
University of Michigan - Mathematics
University of Texas - Mathematics/Philosophy
State University of New York - Stony Brook (SUNY) - Mathematics
University of Cairo - Egypt - Computer Science/Mathematics/Engineering
Michigan State University (MSU) - Computer Science
College of William and Mary - Computer Science/Engineering/Philosophy
University of Southern Mississippi - Computer Science/Engineering/Mathematics
University of Alberta - Mathematics
University of Waterloo - Computer Science/Mathematics
University of Toronto - Computer Science
Purdue University - Mechanical Engineering
University of Ulm - Computer Science/Engineering

The wealth of diversity in the faculty is great. Some of the UWF faculty have even worked for AT&T Bell Labs, and many contracts with NASA, Motorola, and LANL are tied to UWF. However, for some reason, the professors are NOT pursuing ways for the students at UWF to jump on board, which is quite unusual. If anything, they are missing class for their conferences and the intense schedule of research on these oh-so-many contracts. Including student participation on some of these projects might even make attending UWF's CS department worthwhile. But I guess that's not a major concern now, since they have recently hired a Chemistry professor to take over the Engineering, Physics, Computer Science, and Mathematics departments.

In any event, there is some point to this babbling going on today.
Ah, yes, the success stories.
I've talked about a student gaining entrance into Microsoft from UWF, though it was through an "in," so I guess that's a good thing for UWF's reputation.
There are a couple of UWF alums who are doing really well.
Currently, they are no longer residing in the state of Florida; however, they are earning incomes greater than $65,000 USD (which is higher than the average starting salary for MS graduates). Touché for them!

I guess I'm stuck here writing this blog, and totally wish I could make that type of money, eh?!
Anyways, to each his own. But the point is that UWF indeed has good faculty, good resources, good research contracts, and good students. The only thing I see lacking is professors latching onto students to help them grow, and an environment where faculty and students can cooperate on a learning and educating plateau. That would totally increase the chances of students actually having a job by the time they graduate.

And I'm not talking about some of these crappy web development jobs out there that pay $50,000 and don't require much thinking. It's time UWF contributed to the actual research in the area, and that is NOT counting IHMC, the non-profit organization that pays like the non-profit it is. Yeah, that's right, "NON-PROFIT," and that ain't no joke. I mean, come on: $7.50/hr for undergraduates working as testers, and $10.00/hr for graduate students interning. I think I'd make more at McDonald's on any given day, and that's without a 4-year degree from a university that IHMC says it sponsors. That's totally bogus, especially since the main head guru used to be a faculty member at UWF. Talk about low-balling the purchase price of a UWF alum who concentrated in Computer Science. Some graduates need to actually pay off their bills, not stack them up, even after graduating college!?!

Anyways, that's the need UWF currently has, and I think, if UWF actually stepped up to the plate, they could definitely spearhead future technologists who compete with the current markets in California, Seattle, DC, and New York. That's my opinion. I mean, the professors at UWF know how to teach, and some of them do it fairly well. Now all that's left is to actually get some of the professors to care about their students. Why the hell would you even get into teaching if you did NOT care about the students? Are you just saying random things that the students don't understand, and then saying, "I DID MY JOB"? What kind of load of crap is that? And what kind of load of crap is it when professors teach a subject they haven't really dived into, without writing code? I mean, if you're going to teach Networks, you should at least know how to do some coding. Your lectures should at least be well prepared. There is no excuse for not being prepared as a professor; that's your livelihood... no one really cares if you have a pretty face and hooters, teach the damn subject! (not really pointing at anyone, but yeah)

Um yeah, so the point is that UWF's faculty is really good, but they should have a more caring attitude toward their students. I recently read an article about Condoleezza Rice from when she was an Associate Professor at Stanford. Students would actually come to her with their problems, not even related to school, just their problems. And she said she actually spent time talking to them about those problems and figuring out the best way to approach them. I mean, she's awesome... a teacher that cares; who would've thought that would actually be important!
But anyways, the point is that UWF is a good school, and its material is the same as any other school's, but if they cared more about their students, the students could make a more meaningful impact on the world. I guess since all the UWF graduates who are doing OK don't care much about that, they're more inclined to go for the jobs that pay more money. Is that really what the CS degree is for? Do we study 4-5 years for a degree just to join a financial company? Such companies do great things, but why is UWF producing financial programmers/developers? Anyways, I guess students (graduates) will work in whatever field they choose, but at least guide them toward the right area... (By the way, there is nothing wrong with being a financial programmer if that's what you want to do, but why do it if it's not?)

On a last note, a professor (who actually cared) told his student, "You should try to get into a good school for your Ph.D.; you're smart, and I think you have potential. Or you could just get a job and make money, but I thought you might want to do something better than that." This, in essence, captures the spirit that teachers who care are more awesome than teachers who are there to make money and work on their own research rather than, say, inspire their students to revolutionize the world.

Any comments?