Jump to content
Ian Harris

Code Vanishing Once Compiled

Recommended Posts

Hi all,

 

I'm having a look at the code generated by Sourceboost, and perhaps I've got confused, but this doesn't seem right. This is the routine.

 

void draw_clear_screen() {
uns8 count,count1;

for(count = 0; count < sizeof(draw_buffer0); count++) {
	draw_buffer0[count] = 0;
}

#if DRAW_TOTAL_BUFFER_SIZE > 256
#warning "Got greater than 256"
for(count = 0; count < sizeof(draw_buffer1); count++) {
	draw_buffer1[count] = 0;

}
#endif	

#if DRAW_TOTAL_BUFFER_SIZE > 512
for(count = 0; count < sizeof(draw_buffer2); count++) {
	draw_buffer2[count] = 0;
}	
#endif
#if DRAW_TOTAL_BUFFER_SIZE > 768
for(count = 0; count < sizeof(draw_buffer3); count++) {
	draw_buffer3[count] = 0;
}	
#endif

}

 

Note the use of multiple buffers since Sourceboost doesn't handle > 256 byte arrays yet. In this case, there are two buffers, one 256 bytes, the second 248.

DRAW_TOTAL_BUFFER_SIZE is defined as 504

 

I put the #warning in to see if the define is correct - it is, a #warning shows up during compilation.

 

But this is what I get out the other end:

 

 

 

 

void draw_clear_screen() {

uns8 count,count1;

for(count = 0; count < sizeof(draw_buffer0); count++) {
0326  6A44		  CLRF draw_clear_0001E_1_count
0328			label31
0338  2A44		  INCF draw_clear_0001E_1_count, F
033A  D7F6		  BRA	label31

	draw_buffer0[count] = 0;
0328  0101		  MOVLB 0x01
032A  EE01F000	  LFSR 0x00, gbl_draw_buffer0
032E  50E9		  MOVF FSR0L, W
0330  5044		  MOVF draw_clear_0001E_1_count, W
0332  26E9		  ADDWF FSR0L, F
0334  0E00		  MOVLW 0x00
0336  6EEF		  MOVWF INDF0

}

#if DRAW_TOTAL_BUFFER_SIZE > 256
#warning "Got greater than 256"
for(count = 0; count < sizeof(draw_buffer1); count++) {
	draw_buffer1[count] = 0;


}
#endif	

#if DRAW_TOTAL_BUFFER_SIZE > 512
for(count = 0; count < sizeof(draw_buffer2); count++) {
	draw_buffer2[count] = 0;
}	
#endif
#if DRAW_TOTAL_BUFFER_SIZE > 768
for(count = 0; count < sizeof(draw_buffer3); count++) {
	draw_buffer3[count] = 0;
}	
#endif

}


void draw_setup_io() {

 

Note that the second loop has been (presumably) optimised away! Even if I remove the #if, it gives the same result. But if I do this:

 

void draw_clear_screen() {
uns8 count,count1;

for(count = 0; count < sizeof(draw_buffer1); count++) {
	draw_buffer1[count] = 0;

}
for(count = 0; count < sizeof(draw_buffer0); count++) {
	draw_buffer0[count] = 0;
}


#if DRAW_TOTAL_BUFFER_SIZE > 512
for(count = 0; count < sizeof(draw_buffer2); count++) {
	draw_buffer2[count] = 0;
}	
#endif
#if DRAW_TOTAL_BUFFER_SIZE > 768
for(count = 0; count < sizeof(draw_buffer3); count++) {
	draw_buffer3[count] = 0;
}	
#endif

}

 

void draw_clear_screen() {

uns8 count,count1;

for(count = 0; count < sizeof(draw_buffer1); count++) {
0326  6A44		  CLRF draw_clear_0001E_1_count
0328			label31
0328  0EF8		  MOVLW 0xF8
032A  6044		  CPFSLT draw_clear_0001E_1_count
032C  D00A		  BRA	label32
033E  2A44		  INCF draw_clear_0001E_1_count, F
0340  D7F3		  BRA	label31
0342			label32

	draw_buffer1[count] = 0;
032E  0103		  MOVLB 0x03
0330  EE03F000	  LFSR 0x00, gbl_draw_buffer1
0334  50E9		  MOVF FSR0L, W
0336  5044		  MOVF draw_clear_0001E_1_count, W
0338  26E9		  ADDWF FSR0L, F
033A  0E00		  MOVLW 0x00
033C  6EEF		  MOVWF INDF0


}
for(count = 0; count < sizeof(draw_buffer0); count++) {
0342  6A44		  CLRF draw_clear_0001E_1_count
0344			label33
0354  2A44		  INCF draw_clear_0001E_1_count, F
0356  D7F6		  BRA	label33

	draw_buffer0[count] = 0;
0344  0101		  MOVLB 0x01
0346  EE01F000	  LFSR 0x00, gbl_draw_buffer0
034A  50E9		  MOVF FSR0L, W
034C  5044		  MOVF draw_clear_0001E_1_count, W
034E  26E9		  ADDWF FSR0L, F
0350  0E00		  MOVLW 0x00
0352  6EEF		  MOVWF INDF0

}


#if DRAW_TOTAL_BUFFER_SIZE > 512
for(count = 0; count < sizeof(draw_buffer2); count++) {
	draw_buffer2[count] = 0;
}	
#endif
#if DRAW_TOTAL_BUFFER_SIZE > 768
for(count = 0; count < sizeof(draw_buffer3); count++) {
	draw_buffer3[count] = 0;
}	
#endif

}


void draw_setup_io() {

 

It generates the other loop. Swapping them back:

 

void draw_clear_screen() {
uns8 count,count1;

for(count = 0; count < sizeof(draw_buffer0); count++) {
	draw_buffer0[count] = 0;
}

for(count = 0; count < sizeof(draw_buffer1); count++) {
	draw_buffer1[count] = 0;

}


#if DRAW_TOTAL_BUFFER_SIZE > 512
for(count = 0; count < sizeof(draw_buffer2); count++) {
	draw_buffer2[count] = 0;
}	
#endif
#if DRAW_TOTAL_BUFFER_SIZE > 768
for(count = 0; count < sizeof(draw_buffer3); count++) {
	draw_buffer3[count] = 0;
}	
#endif

}

 

generates:

 

void draw_clear_screen() {

uns8 count,count1;

for(count = 0; count < sizeof(draw_buffer0); count++) {
0326  6A44		  CLRF draw_clear_0001E_1_count
0328			label31
0338  2A44		  INCF draw_clear_0001E_1_count, F
033A  D7F6		  BRA	label31

	draw_buffer0[count] = 0;
0328  0101		  MOVLB 0x01
032A  EE01F000	  LFSR 0x00, gbl_draw_buffer0
032E  50E9		  MOVF FSR0L, W
0330  5044		  MOVF draw_clear_0001E_1_count, W
0332  26E9		  ADDWF FSR0L, F
0334  0E00		  MOVLW 0x00
0336  6EEF		  MOVWF INDF0

}

for(count = 0; count < sizeof(draw_buffer1); count++) {
	draw_buffer1[count] = 0;


}


#if DRAW_TOTAL_BUFFER_SIZE > 512
for(count = 0; count < sizeof(draw_buffer2); count++) {
	draw_buffer2[count] = 0;
}	
#endif
#if DRAW_TOTAL_BUFFER_SIZE > 768
for(count = 0; count < sizeof(draw_buffer3); count++) {
	draw_buffer3[count] = 0;
}	
#endif

}


void draw_setup_io() {

 

It vanishes again. I'm also slightly puzzled by the lack of a return at the end of the routine.

 

Any thoughts anyone?

 

kind regards

Ian.

Share this post


Link to post
Share on other sites

Ian,

 

Not enough time to look at it a properly but maybe a clue for you - If you restrict your buffer size to < 256 i.e. 255 then it seems to compile as you wanted. If the second buffer size is 256 then the second part seems to be optimised out as you say. Interestingly if you make the first buffer size 256 it then compiles both!

 

Regards

 

davidb

Share this post


Link to post
Share on other sites

David,

 

Well done. The unsigned short works, including getting the return at the end of the subroutine - unsigned char generates crazy behaviour. I've been using unsigned char for some time, perhaps this is a fault with the new compiler.

 

cheers

Ian.

 

Ian,

 

OK, another clue - try making your count variable an unsigned short.

 

Regards

 

davidb

Share this post


Link to post
Share on other sites

Hi Ian,

 

Turn ON all warnings and the compiler will issue a warning.

 

You have an infinite FOR loop as count will always be < 256 for a 256 byte array as it is a unsigned char (as David indicates). Infinite loops don't need a return instruction. Smart compiler :)

 

Cheers

 

Reynard

Share this post


Link to post
Share on other sites

Ahh yes. The compiler is not seeing 256 as a 16 bit number and hence isn't promoting the comparison to 16 bits.

 

Yet I want an 8 bit variable to loop through all the values...something like this should work:

 

count = 0;

do {

draw_buffer0[count] = 0;

count++;

} while (count != 0);

 

(I have "All warnings" turned on and it didn't complain.)

 

cheers

Ian.

 

Hi Ian,

 

Turn ON all warnings and the compiler will issue a warning.

 

You have an infinite FOR loop as count will always be < 256 for a 256 byte array as it is a unsigned char (as David indicates). Infinite loops don't need a return instruction. Smart compiler :)

 

Cheers

 

Reynard

Share this post


Link to post
Share on other sites

No warning eh! It gave me one when DRAW_TOTAL_BUFFER_SIZE set to 504.

 

You could just sneak in a little cast for the loop compare.

	for(count = 0; (int)count < sizeof(draw_buffer0); count++) {
	draw_buffer0[count] = 0;
}

 

Cheers

 

Reynard

 

On second thought no you can't it is still a char. Stupid me.

:)

Edited by Reynard

Share this post


Link to post
Share on other sites

Thanks Reynard, yes, almost!

 

I still think this is broken:

 

unsigned char x,y;

for (x =0; x < 256; x++) {
y = x;
}

 

Resulting in:

 

for (x =0; x < 256; x++) {
009E  6A3D		  CLRF main_1_x
00A0			label8
00A4  2A3D		  INCF main_1_x, F
00A6  D7FC		  BRA	label8

y = x;
00A0  503D		  MOVF main_1_x, W
00A2  6E3E		  MOVWF main_1_y

}
<further code not compiled>

 

And yes, it does give a warning - expression always true. But looking at the code - it's just broken, it never actually gets to the y=x bit, if that was actually your intention. The result in real life would be a warning and a random hang (interrupts aside).

 

Looks like this code will do the trick (until sourceboost has > 256 byte arrays):

 

void draw_clear_screen() {

uns8 count;

#if DRAW_TOTAL_BUFFER_SIZE < 256
	count = 0;
	do {
		draw_buffer0[count] = 0;
		count++;
	} while (count < DRAW_TOTAL_BUFFER_SIZE);
#else
	count = 0;
	do {
		draw_buffer0[count] = 0;
		count++;
	} while (count != 0);
	#if DRAW_TOTAL_BUFFER_SIZE < 512
		do {
			draw_buffer1[count] = 0;
			count++;
		} while (count < DRAW_TOTAL_BUFFER_SIZE - 256);
	#else
		do {
			draw_buffer1[count] = 0;
			count++;
		} while (count != 0);

		#if DRAW_TOTAL_BUFFER_SIZE < 768
			do {
				draw_buffer2[count] = 0;
				count++;
			} while (count < DRAW_TOTAL_BUFFER_SIZE - 512);
		#else
			do {
				draw_buffer2[count] = 0;
				count++;
			} while (count != 0);
			#if DRAW_TOTAL_BUFFER_SIZE < 1024
				do {
					draw_buffer3[count] = 0;
					count++;
				} while (count < DRAW_TOTAL_BUFFER_SIZE - 768);
			#else
				do {
					draw_buffer3[count] = 0;
					count++;
				} while (count != 0);
			#endif
		#endif

	#endif		

#endif


}

 

 

 

No warning eh! It gave me one when DRAW_TOTAL_BUFFER_SIZE set to 504.

 

You could just sneak in a little cast for the loop compare.

	for(count = 0; (int)count < sizeof(draw_buffer0); count++) {
	draw_buffer0[count] = 0;
}

 

Cheers

 

Reynard

 

On second thought no you can't it is still a char. Stupid me.

:)

Share this post


Link to post
Share on other sites

Ian,

 

You have to follow the instruction address sequence and not the sequence the instructions are listed.

 

Always look at the .lst file and not the .asm file when reading assembler output.

 

Cheers

 

Reynard

Share this post


Link to post
Share on other sites
Ahh yes. The compiler is not seeing 256 as a 16 bit number and hence isn't promoting the comparison to 16 bits.

 

Yet I want an 8 bit variable to loop through all the values...something like this should work:

 

count = 0;

do {

draw_buffer0[count] = 0;

count++;

} while (count != 0);

 

(I have "All warnings" turned on and it didn't complain.)

 

cheers

Ian.

 

Hi Ian,

 

Turn ON all warnings and the compiler will issue a warning.

 

You have an infinite FOR loop as count will always be < 256 for a 256 byte array as it is a unsigned char (as David indicates). Infinite loops don't need a return instruction. Smart compiler :)

 

Cheers

 

Reynard

You have an infinite loop when testing for unsigned char < 256 but the above solution is also dangerous. Just because we all see and use a lot of code that relys on numbers rolling over to 0, especially with timers etc, doesn't mean we should do it if we don't have to. It's poor coding and makes for big headaches when porting to new platforms or even re-using code on the same platform. A good example will be when SourceBoost releases the next version of the compiler that allows > 256 byte arrays for 18f devices. You will still be on the same platform and your code is now difficult to update.

 

I haven't tested this but I think it produces the same or fewer instructions than your rollover solution above:

draw_buffer0[0] = 0;
for( count = 255; count; --count ) 
draw_buffer0[count] = 0;

// And my experience with SourceBoost has been that it can often optimize
// a do-while loop more than a for loop. Not sure why. So use your code above
// with the following mod. Note that testing for non-zero is usually faster than 
// testing for a specific number such as 255.

count =255;
draw_buffer0[0] = 0;
do {
draw_buffer0[count] = 0;
count--;
} while( count );

// And using a pointer is probably faster than either of the above. 
// My understanding is that's how most memset/memcpy algorithms work except that they
// can sometimes further optimize by using larger variable sizes such as combining 4 chars
// into 1 long and then writing to memory 1/4 as often. May not work on 8-bit architecture.

unsigned char *p = draw_buffer0;

while( p < draw_buffer0 + 256 ) 
*p++ = 0;

 

I'm curious what the SourceBoost optimizations are on each of the above...

 

-Henry

Share this post


Link to post
Share on other sites

Hi Henry,

 

My (updated) original do/while loop:

 

- 13 instructions

 

Your three suggestions:

 

- 14 instructions

- 11 instructions

- 27 instructions

 

Nice work! It would be interesting to see how many *executed* instructions it takes. Probably the quickest would be to unroll the loop:

 

draw_buffer[0] = 0;

draw_buffer[1] = 1;

etc

at the cost of 256 instructions (but at a cost of ROM).

 

 

uns8 count;

draw_buffer0[0] = 0;
068A  0100		  MOVLB 0x00
068C  6B80		  CLRF gbl_draw_buffer0, 1

for( count = 255; count; --count )
068E  69B1		  SETF main_1_count, 1
0690			label61
0690  53B1		  MOVF main_1_count, F, 1
0692  E009		  BZ	label62
06A2  07B1		  DECF main_1_count, F, 1
06A4  D7F5		  BRA	label61
06A6			label62

draw_buffer0[count] = 0;
0694  EE00F080	  LFSR 0x00, gbl_draw_buffer0
0698  50E9		  MOVF FSR0L, W
069A  51B1		  MOVF main_1_count, W, 1
069C  26E9		  ADDWF FSR0L, F
069E  0E00		  MOVLW 0x00
06A0  6EEF		  MOVWF INDF0


// And my experience with SourceBoost has been that it can often optimize
// a do-while loop more than a for loop. Not sure why. So use your code above
// with the following mod. Note that testing for non-zero is usually faster than
// testing for a specific number such as 255.

count =255;
06A6  69B1		  SETF main_1_count, 1

draw_buffer0[0] = 0;
06A8  6B80		  CLRF gbl_draw_buffer0, 1

do {
06AA			label63

draw_buffer0[count] = 0;
06AA  EE00F080	  LFSR 0x00, gbl_draw_buffer0
06AE  50E9		  MOVF FSR0L, W
06B0  51B1		  MOVF main_1_count, W, 1
06B2  26E9		  ADDWF FSR0L, F
06B4  0E00		  MOVLW 0x00
06B6  6EEF		  MOVWF INDF0

count--;
06B8  07B1		  DECF main_1_count, F, 1

} while( count );
06BA  E1F7		  BNZ	label63


// And using a pointer is probably faster than either of the above.
// My understanding is that's how most memset/memcpy algorithms work except that they
// can sometimes further optimize by using larger variable sizes such as combining 4 chars
// into 1 long and then writing to memory 1/4 as often. May not work on 8-bit architecture.

unsigned char *p = draw_buffer0;
06BC  0E00		  MOVLW HIGH(gbl_draw_buffer0+D'0')
06BE  6FB3		  MOVWF main_1_p+D'1', 1
06C0  0E80		  MOVLW LOW(gbl_draw_buffer0+D'0')
06C2  6FB2		  MOVWF main_1_p, 1


while( p < draw_buffer0 + 256 )
06C4			label64
06C4  0E00		  MOVLW HIGH(gbl_draw_buffer0+D'0')
06C6  6FB5		  MOVWF CompTempVar1071, 1
06C8  0E80		  MOVLW LOW(gbl_draw_buffer0+D'0')
06CA  6FB4		  MOVWF CompTempVar1070, 1
06CC  0E00		  MOVLW 0x00
06CE  27B4		  ADDWF CompTempVar1070, F, 1
06D0  0E01		  MOVLW 0x01
06D2  23B5		  ADDWFC CompTempVar1071, F, 1
06D4  51B5		  MOVF CompTempVar1071, W, 1
06D6  5DB3		  SUBWF main_1_p+D'1', W, 1
06D8  E102		  BNZ	label65
06DA  51B4		  MOVF CompTempVar1070, W, 1
06DC  5DB2		  SUBWF main_1_p, W, 1
06DE			label65
06DE  E209		  BC	label66
06F0  D7E9		  BRA	label64
06F2			label66

*p++ = 0;
06E0  51B3		  MOVF main_1_p+D'1', W, 1
06E2  6EEA		  MOVWF	FSR0H
06E4  51B2		  MOVF main_1_p, W, 1
06E6  6EE9		  MOVWF FSR0L
06E8  4BB2		  INFSNZ main_1_p, F, 1
06EA  2BB3		  INCF main_1_p+D'1', F, 1
06EC  0E00		  MOVLW 0x00
06EE  6EEF		  MOVWF INDF0

 

Ahh yes. The compiler is not seeing 256 as a 16 bit number and hence isn't promoting the comparison to 16 bits.

 

Yet I want an 8 bit variable to loop through all the values...something like this should work:

 

count = 0;

do {

draw_buffer0[count] = 0;

count++;

} while (count != 0);

 

(I have "All warnings" turned on and it didn't complain.)

 

cheers

Ian.

 

Hi Ian,

 

Turn ON all warnings and the compiler will issue a warning.

 

You have an infinite FOR loop as count will always be < 256 for a 256 byte array as it is a unsigned char (as David indicates). Infinite loops don't need a return instruction. Smart compiler :)

 

Cheers

 

Reynard

You have an infinite loop when testing for unsigned char < 256 but the above solution is also dangerous. Just because we all see and use a lot of code that relys on numbers rolling over to 0, especially with timers etc, doesn't mean we should do it if we don't have to. It's poor coding and makes for big headaches when porting to new platforms or even re-using code on the same platform. A good example will be when SourceBoost releases the next version of the compiler that allows > 256 byte arrays for 18f devices. You will still be on the same platform and your code is now difficult to update.

 

I haven't tested this but I think it produces the same or fewer instructions than your rollover solution above:

draw_buffer0[0] = 0;
for( count = 255; count; --count ) 
draw_buffer0[count] = 0;

// And my experience with SourceBoost has been that it can often optimize
// a do-while loop more than a for loop. Not sure why. So use your code above
// with the following mod. Note that testing for non-zero is usually faster than 
// testing for a specific number such as 255.

count =255;
draw_buffer0[0] = 0;
do {
draw_buffer0[count] = 0;
count--;
} while( count );

// And using a pointer is probably faster than either of the above. 
// My understanding is that's how most memset/memcpy algorithms work except that they
// can sometimes further optimize by using larger variable sizes such as combining 4 chars
// into 1 long and then writing to memory 1/4 as often. May not work on 8-bit architecture.

unsigned char *p = draw_buffer0;

while( p < draw_buffer0 + 256 ) 
*p++ = 0;

 

I'm curious what the SourceBoost optimizations are on each of the above...

 

-Henry

Share this post


Link to post
Share on other sites

Ian,

 

If you are experimenting with loop unrolling try using Tom Duffs 'Duffs Device' or its variants. Speed v ROM tradeoff.

 

Cheers

 

davidb

Share this post


Link to post
Share on other sites

David,

 

After some searching:

Consider the following routine, abstracted from code which copies an array of shorts into the Programmed IO data register of an Evans & Sutherland Picture System II:

 

	send(to, from, count)
register short *to, *from;
register count;
{
	do
		*to = *from++;
	while(--count>0);
}

(Obviously, this fails if the count is zero.)

The VAX C compiler compiles the loop into 2 instructions (a movw and a sobleq,

I think.) As it turns out, this loop was the bottleneck in a real-time animation playback program which ran too slowly by about 50%. The standard way to get more speed out of something like this is to unwind the loop a few times, decreasing the number of sobleqs. When you do that, you wind up with a leftover partial loop. I usually handle this in C with a switch that indexes a list of copies of the original loop body. Of course, if I were writing assembly language code, I'd just jump into the middle of the unwound loop to deal with the leftovers. Thinking about this yesterday, the following implementation occurred to me:

 

	send(to, from, count)
register short *to, *from;
register count;
{
	register n=(count+7)/8;
	switch(count%8){
	case 0:	do{	*to = *from++;
	case 7:		*to = *from++;
	case 6:		*to = *from++;
	case 5:		*to = *from++;
	case 4:		*to = *from++;
	case 3:		*to = *from++;
	case 2:		*to = *from++;
	case 1:		*to = *from++;
		}while(--n>0);
	}
}

Disgusting, no? But it compiles and runs just fine. I feel a combination of pride and revulsion at this discovery. If no one's thought of it before, I think I'll name it after myself.

 

It amazes me that after 10 years of writing C there are still little corners that I haven't explored fully. (Actually, I have another revolting way to use switches to implement interrupt driven state machines but it's too horrid to go into.)

 

Many people (even bwk?) have said that the worst feature of C is that switches don't break automatically before each case label. This code forms some sort of argument in that debate, but I'm not sure whether it's for or against.

 

 

I'm not sure it would compile under sourceboost, and pointer arithmetic on PICs is never going to be an efficient way to do anything, and PICs don't have a general purpose set of registers per se... Still the concept is interesting. The smaller the thing you need to do multiple times, the more the "loop management" overhead costs you overall.

 

In the draw library I'm creating for PicPack, the idea is to buffer the display before sending it out - meaning that things like clearing the display (or rather, the buffer) and the act of sending out the buffer to the device become costly. There's some sort of trade off between matching the buffer to the hardware (ie, y=0 top left or bottom left, horizontal alignment of buffer or vertical alignment) which makes the "paint" quicker and matching the buffer to how easy it is to calculate which pixel to turn on / off (which makes drawing onto the buffer quicker). I'm still exploring both.

 

Ian.

 

Ian,

 

If you are experimenting with loop unrolling try using Tom Duffs 'Duffs Device' or its variants. Speed v ROM tradeoff.

 

Cheers

 

davidb

Share this post


Link to post
Share on other sites

Ian,

 

I was only suggesting the possibility of trying the Duff principle of embedding a do-while within a switch for simply clearing your arrays, not necessarily the form shown in the example for copying using pointers.

 

However, as you point out, it is academic anyway since SourceBoost won't compile it.

 

I just had to try a simple example but the compiler expects a right parenthesis at the end of a case statement so a do-while cannot be spread across the switch. Nothing ventured nothing gained!

 

Cheers

 

davidb

Share this post


Link to post
Share on other sites

Join the conversation

You are posting as a guest. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...

×
×
  • Create New...