Morning, Sun Jun 27 11:50:19 2004
String Lift
Topics: Programming

My current bathroom book, Programming Perl, has an interesting note in it: the Perl compiler doesn't hoist initializations out of loops, and the book suggests that the programmer should exercise some common sense instead. This is a bad inconsistency, I think. For you non-programmers, let me demonstrate what hoisting does.

    int num_processed=0;
    char* thisline;
    while(thisline = readline("Give me a name:")) {
        int length=0;
        length = strlen(thisline);
        if(length == 0) {break;}
        printf("The name was %d characters long\n", length);
    }

A good C compiler will effectively turn it into this (let's ignore non-hoisting optimizations):

    int num_processed=0;
    char* thisline;
    int length;
    while(thisline = readline("Give me a name:")) {
        length = strlen(thisline);
        if(length == 0) {break;}
        printf("The name was %d characters long\n", length);
    }
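As a rough sketch of the same idea in a form that compiles on its own, the two functions below swap readline() for a fixed, NULL-terminated array of names (an assumption made only so the example needs no input). The names count_chars_scoped and count_chars_hoisted are hypothetical. Both should compute the same total, which is the point: where length is declared changes how the code reads, not what it computes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Declares length inside the loop body, as in the first listing.
 * Stops at the first empty string or at the NULL terminator. */
size_t count_chars_scoped(const char *const *names)
{
    size_t total = 0;
    assert(names != NULL);
    for (size_t i = 0; names[i] != NULL; i++) {
        size_t length = strlen(names[i]);   /* visible only in this iteration */
        if (length == 0) {
            break;
        }
        total += length;
    }
    return total;
}

/* Same loop with the declaration moved out of the loop by hand,
 * as in the second listing. */
size_t count_chars_hoisted(const char *const *names)
{
    size_t total = 0;
    size_t length;                          /* visible for the whole function */
    assert(names != NULL);
    for (size_t i = 0; names[i] != NULL; i++) {
        length = strlen(names[i]);
        if (length == 0) {
            break;
        }
        total += length;
    }
    return total;
}
```

In the scoped version, nothing after the loop can accidentally read a stale length, which is the readability argument made below.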

There are two things to note here. First, length always gets a new value before it is read, so initializing it to zero is unneeded. Second, and this is the main point, there's no need to keep creating and destroying the length variable every time the while block runs, so the compiler can do it once outside the block instead.

At least as of the writing of that Perl book, Perl will not do this optimization for you. There are no doubt all sorts of other optimizations that Perl does for you, but not that one, because they want people to make obvious optimizations themselves. This matters: it is a good thing to suggest to programmers that they make high-level optimizations, but generally those are the kind that a compiler cannot make for you (or at least would find difficult to make). It would be fascinating to design a programming language where very rich metadata were provided routinely, so that really intelligent compilers could more easily make optimizations that no compiler of current languages could do safely. Perl6 will be making some steps towards this with a rich attribute system.

Anyhow, the reason this particular optimization should be made automatically for the user, contrary to Perl's design at that time (hey, maybe it changed in Perl 5.8 or later), is another philosophy in programming that I take to a sort of extreme: avoid globals. Why avoid globals? It's hard to tell when a variable changes if it could be changed anywhere in the program. Frequently the globals are not actually used throughout the entire program anyhow, but some lazy and bad programmers make everything global.

I extend the avoiding of globals to .. well, let's call it superscoping. Under this philosophy, even within functions, we try moderately hard to keep the scope of each variable, measured in lines of code, small. If, for example, we're in the middle of a large, complicated function and enter an area of the code where we're trying to do something complex that needs a lot of variables that don't belong in the rest of the code, we'll create an unconditioned block to scope those variables. Example:

Instead of

    void doOpenPrefsFile(...) {
        // First acquire the lock, then open the file, then read it
        // Stage 1: Acquire
        int lockid;
        int lockmgrsocket;
        char* fname;
        FILE* myfile;
        char errstring[...] = "";
    #define THISLINE_SIZE 80
        char thisline[THISLINE_SIZE];
        int parseline = 0;
        int in_block_parse;
        int in_multiline;

        lockid = socket(...);
        ....

        // Stage 2: Open
        myfile = fopen(fname, ...);
        if(myfile == NULL) {
            // strncpy and strncat take a length limit as their third argument
            strncpy(errstring, "Failed to open file: ", sizeof(errstring) - 1);
            strncat(errstring, my_geterror(), sizeof(errstring) - strlen(errstring) - 1);
            ...
        }

        // Stage 3: Read
        // fgets fills the array in place; an array cannot be assigned to
        while(fgets(thisline, THISLINE_SIZE, myfile)) {
            if(regex_match(thisline, "^\\S*#")) { ... }
            ...
            parseline++;
        }
    }

A superscoping way to do that would be:

    void doOpenPrefsFile(...) {
        // First acquire the lock, then open the file, then read it
        char* fname;
        FILE* myfile;

        // Stage 1: Acquire
        {
            int lockid;
            int lockmgrsocket;
            lockid = socket(...);
            ....
        }

        // Stage 2: Open
        {
            char errstring[...] = "";
            myfile = fopen(fname, ...);
            if(myfile == NULL) {
                // strncpy and strncat take a length limit as their third argument
                strncpy(errstring, "Failed to open file: ", sizeof(errstring) - 1);
                strncat(errstring, my_geterror(), sizeof(errstring) - strlen(errstring) - 1);
                ...
            }
        }

        // Stage 3: Read
        {
    #define THISLINE_SIZE 80
            char thisline[THISLINE_SIZE];
            int parseline = 0;
            int in_block_parse;
            int in_multiline;
            // fgets fills the array in place; an array cannot be assigned to
            while(fgets(thisline, THISLINE_SIZE, myfile)) {
                if(regex_match(thisline, "^\\S*#")) { ... }
                ...
                parseline++;
            }
        }
    }

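The listing above leaves several parts elided, so here is a smaller sketch of the same pattern that should compile as written. It works on an in-memory string rather than a file, and summarize_prefs along with its notion of "comment line" and "setting line" are assumptions invented for illustration. Each stage sits in its own unconditioned block, and both stages use a variable called count without interfering with each other.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts comment lines (first non-blank character is
 * '#') and setting lines (non-comment lines containing '=') in text. */
void summarize_prefs(const char *text, int *comments, int *settings)
{
    assert(text != NULL && comments != NULL && settings != NULL);

    /* Stage 1: count comment lines */
    {
        int count = 0;
        int blank_so_far = 1;   /* only whitespace seen on this line so far */
        for (const char *p = text; *p != '\0'; p++) {
            if (*p == '\n') {
                blank_so_far = 1;
            } else if (blank_so_far && *p == '#') {
                count++;
                blank_so_far = 0;
            } else if (!isspace((unsigned char)*p)) {
                blank_so_far = 0;
            }
        }
        *comments = count;
    }

    /* Stage 2: count setting lines; this count is a separate variable */
    {
        int count = 0;
        int blank_so_far = 1;
        int in_comment = 0;
        int counted = 0;        /* current line already counted */
        for (const char *p = text; *p != '\0'; p++) {
            if (*p == '\n') {
                blank_so_far = 1;
                in_comment = 0;
                counted = 0;
            } else if (blank_so_far && *p == '#') {
                in_comment = 1;
                blank_so_far = 0;
            } else {
                if (!isspace((unsigned char)*p)) {
                    blank_so_far = 0;
                }
                if (*p == '=' && !in_comment && !counted) {
                    count++;
                    counted = 1;
                }
            }
        }
        *settings = count;
    }
}
```

Only text, comments, and settings are visible to both stages; everything else dies at the closing brace of its block.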
From that, it's obvious that there's no hidden meaning to any of those variables outside where they're actually used, because their scope and their use are much more tightly bound. The compiler doesn't need to work as hard (and can be smarter, especially in dynamic languages like Perl where it's hard to be smart and consistent at the same time).

It's tempting, of course, to simply declare that those blocks are sufficiently separate that they should be their own functions. This is sometimes the case, but depending on how much connectedness there is between the parts of the function, it can be a pain to pass everything needed to break things into functions. This approach offers a middle ground, for when it makes sense, between splitting things off and keeping them in a single function, and in fact makes it easier to break things off later should that prove desirable.

Note, however, that when it makes sense for the compiler to hoist, it should do it (most of superscoping applies to conditional blocks as well). Separating what actually happens when the code runs from the conceptually clean version in the source is the point of optimization, and with superscoping it should be a win-win situation, instead of, as in Perl, a lose-lose one (either your code is slower, or it's harder to read and possibly slower anyhow because it's hard to optimize, or, optionally, it's much uglier if you use double blocks).
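To illustrate the claim that a scoped block is easy to break off later, here is a small before-and-after sketch. The functions spread_inline, spread_extracted, and range_of are hypothetical names, and the computation (largest value minus smallest) is chosen only because it is short. The thing to notice is that the variables crossing the block boundary in the first version become exactly the parameters and return value in the second.

```c
#include <assert.h>
#include <stddef.h>

/* Before: the range computation lives in an unconditioned block.
 * Only values, n, and range cross the block boundary. */
int spread_inline(const int *values, size_t n)
{
    int range = 0;
    assert(values != NULL && n > 0);
    {
        int lo = values[0];
        int hi = values[0];
        for (size_t i = 1; i < n; i++) {
            if (values[i] < lo) lo = values[i];
            if (values[i] > hi) hi = values[i];
        }
        range = hi - lo;
    }
    return range;
}

/* After: the block body moved into its own function. The inputs that
 * crossed the boundary are now parameters; range is the return value. */
static int range_of(const int *values, size_t n)
{
    int lo = values[0];
    int hi = values[0];
    for (size_t i = 1; i < n; i++) {
        if (values[i] < lo) lo = values[i];
        if (values[i] > hi) hi = values[i];
    }
    return hi - lo;
}

int spread_extracted(const int *values, size_t n)
{
    assert(values != NULL && n > 0);
    return range_of(values, n);
}
```

Had lo and hi been declared at the top of the function, one would first have to check the rest of the function for other uses before moving them.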

