Wednesday, November 21, 2007

Printing Neatly

Word wrap in most text editors take the greedy approach of formatting a long non-newline text that spans into a multi-line paragraph. However, we will look at something that tries to minimize the "badness" or shall we say the number of unused whitespace. Whitespace between words does not count as unused whitespace, but rather space that is used to fill the rest of the line is "unused."

Firstly, we need to consult the badness function:

c[i,j] = (SPACES_PER_LINE - (j - i + Length(word k)(i,j)))^2
Length(word k)(i,j) = SUM from k = i to j the length of the word k.

Also, the memoized version of this can be used to identify a one-state function, that depends on the number of words. Note that also, if c[i,j] is not INFINITY (the operand inside the squared is >= 0), then the total cost is just c[i,j]. Else, choose for every k, such that k = i to j-1, find the optimal solution to fit up to k words on previous lines plus the cost c[k+1,j] on this line. The solution is the minimum. So the recurrence relation is:

f(n) = MIN { c[1,n], MIN { for(k,1,n) f(k) + c[k+1,n] } }

Therefore, in code the memoized version would look like so:

#define SPACES_PER_LINE 80
#define INFINITY 1e9
uint32_t memo[N + 1];

uint32_t c(uint32_t i, uint32_t j)
{
uint32_t spaces = j - i;
uint32_t wordLength = 0;
for (int k = i; k <= j; ++k) {
wordLength += str[k].size();
}
int32_t res = SPACES_PER_LINE - (spaces + wordLength);
if (res >= 0) {
return ((uint32_t)res * res);
}
return (INFINITY);
}
uint32_t f(uint32_t n)
{
if (memo[n] != -1) {
return (memo[n]);
}
if (c(1, n) != INFINITY) {
return (memo[n] = c(1, n));
}
uint32_t min = INFINITY;
for (int k = 1; k < n; ++k) {
min = MIN(min, f(k) + c(k+1, n));
}
return (memo[n] = min);
}

This solution could also have been built up using dp.

No comments: