Pointer arithmetic and comparisons.

glen sunada f84 gs940971 at longs.LANCE.ColoState.Edu
Fri Dec 8 17:42:04 AEST 1989


In article <257ECDFD.CDD at marob.masa.com>, daveh at marob.masa.com (Dave
Hammond) writes:
> Machine/OS: 286-compatible/MSDOS
> Compiler: Turbo-C 1.5
>    [ stuff deleted in concern of bandwidth]
> some_function(char *buffer, int len)
> {
> 	char *p = buffer;
> 	char *e = &buffer[len];
> 
> 	while ((*p++ = getchar()) != EOF && p < e) {
> 		...
> 		}
> 
> 	...
> }
>     [ more stuff deleted ]
> 
> The problem occurs when the address resulting from &buffer[len] exceeds
> 65535.  For example, if &buffer[0] is 65535 and len is 100, &buffer[len]
> becomes 99, making `while (p < e)' immediately false.
> 
> I was under the impression that (for n > 0) buffer[n] should not yield
> an address lower than buffer[0].  Is the pointer comparison I am doing
> non-portable, or otherwise ill-advised ?
> 
> Thanks in advance.
> 
> --
> Dave Hammond
> daveh at marob.masa.com

This question is probably one that a lot of people on the net have
gotten bitten by or will get bitten by, so I will answer over the net.
Yes, the pointer to the end of the array is supposed to compare larger
than the pointer to its beginning; ANSI C guarantees this for pointers
into the same array, including the address one past the last element,
so the comparison you wrote is legal C.  The trouble is in the
implementation.

What is causing the problems with the comparison in the MS-DOS
environment is that the 8086 family of microprocessors has a segmented
architecture.  The net result of this is that when comparing pointers
you need to compare both the segment and the offset to determine
placement in memory.  Because of the segmented architecture, compilers
for MS-DOS machines support several memory models.  These models are
listed below:
     TINY - one segment - all pointers are a 16-bit offset
     SMALL - one segment for code and one for data - all pointers are a
             16-bit offset, but code is in a different segment than
             data (i.e. pointers to functions have a different segment
             than pointers to data) - the usual default
             NOTE: The example you give should result in a stack
             corruption problem
     MEDIUM - any number of code segments, one data segment - pointers
              are non-normalized, i.e. many segment:offset pairs
              resolve to the same linear address, since segments start
              on 16-byte boundaries and are 64K in size (see the sketch
              after this list)
     COMPACT - one code segment, any number of data segments -
               non-normalized pointers
     LARGE - any number of code segments, any number of data segments -
             non-normalized pointers
     HUGE - any number of code segments, any number of data segments -
            normalized pointers
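
To make the aliasing concrete: a real-mode linear address is just
segment * 16 + offset, so different segment:offset pairs can name the
same byte.  A little demonstration (plain portable C, the seg:off
values are made up for illustration):

#include <stdio.h>

/* Real-mode linear address = segment * 16 + offset. */
unsigned long linear(unsigned int seg, unsigned int off)
{
	return (unsigned long)seg * 16UL + off;
}

int main(void)
{
	/* Two different non-normalized pointers, one address:   */
	/* 1234:0010 and 1235:0000 both resolve to linear 12350. */
	printf("%lX\n", linear(0x1234, 0x0010));	/* 12350 */
	printf("%lX\n", linear(0x1235, 0x0000));	/* 12350 */
	return 0;
}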

There is also another difference between non-normalized and normalized
pointers.  Non-normalized pointers wrap around at 64K (65536 -> 0) and
do not adjust the segment.  Therefore, in the code fragment you
posted, when p is incremented past offset 65535 it wraps to offset
zero in the same segment as the start of the array, not in the segment
starting 64K above the one that holds the start of the array.  A
normalized pointer, on the other hand, has its offset automatically
reset to keep it less than 16 bytes; to do this the segment is
adjusted.  This means that using huge pointers in the example above
should (CMA - Cover My A**) fix the problem with wraparound.
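
Something like this should do it (an untested sketch - I am assuming
Turbo C's "huge" keyword here, so check your manual; note also that
getchar() returns an int, so EOF has to be tested in an int before the
value is stored into the char array):

some_function(char huge *buffer, int len)
{
	char huge *p = buffer;
	char huge *e = &buffer[len];	/* normalized, so p < e holds */
	int c;				/* int, so EOF is detectable  */

	while (p < e && (c = getchar()) != EOF) {
		*p++ = (char)c;
		/* ... */
	}
	/* ... */
}

If I remember the switches right, you can also compile the whole
program in the huge model (tcc -mh) instead of declaring individual
pointers huge.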

There is a disadvantage to working with huge pointers - they take more
time, because every pointer operation pays for the normalization.
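
For the curious, the normalization conceptually looks like this (just
a sketch of what the compiler's huge-pointer arithmetic has to do on
every operation, which is where the time goes - the function is my own
illustration, not anything Turbo C exposes):

/* Fold all but the low 4 bits of the offset into the segment,
   so the offset always stays in the range 0..15. */
void normalize(unsigned int *seg, unsigned int *off)
{
	*seg += *off >> 4;	/* each segment step is 16 bytes */
	*off &= 0x0F;
}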

I hope this quick overview of the architecture of the IBM-PC helps and
is not just a waste of network bandwidth.

BTW - this segmentation is not a problem with *NIX on a PC, because
*NIX automatically uses the HUGE memory model on the 8088, 8086,
80188, 80186, and 80286, and either the HUGE model or the
protected-mode flat memory model on the 80386 (unfortunately the flat
model is not available to DOS on an 80386, because DOS runs in real
mode, not protected mode).


Glen U. Sunada
gs940971 at longs.LANCE.ColoState.EDU


