Reference manual for 386 assembly language on AT&T Sys V

Marco S Hyman marc at dumbcat.sf.ca.us
Mon Oct 15 17:16:52 AEST 1990


In article <15563 at reed.UUCP> celarier at reed.bitnet () writes:
    I am looking for a reference manual for the 386 assembly language
    as recognized by as(1) on AT&T's System V, release 4.  AT&T does not
    use the same notation as ASM386, presented in the Intel literature,
    but I cannot find a document which specifies the AT&T notation.

Assuming that SysV Rel 4 uses the same syntax as SysV release 3.x, here it
is.  I get enough requests for this thing that I think the post is
worthwhile.  (But I'll listen to flames that disagree -- to the mailbox
please, no need to clutter the rest of the net.)

BTW: This doc is very similar to the doc in the SunOS Doc box that comes
with a 386i.

// marc

dumbcat% zcat ~src/as386.doc.Z

8<------------------------------ Cut Here ------------------------------>8

		   Preliminary 386 Assembler Definition
              Prepared by INTERACTIVE Systems Corp. - 2/5/86

       1.  Purpose_of_this_Document

       This document provides the  third  draft	 of  the  assembler
       language	 definition  for the 5.3/386 CCS.  The goal of this
       effort is to take the current 286 assembler and	upgrade	 it
       to a 386	assembler in the minimum possible time.	 This docu-
       ment describes the resulting product.

       1.1  INVOKING_THE_ASSEMBLER

       The assembler is	invoked	by the command:
       as [-o outfile] [-n] [-R] [-V] [-u] [-x] infile


       The flags have the following meaning:

       -o filename
             Use filename as the output file.  If this flag is omitted,
             the output file name is generated by the algorithm at the
             end of this section.

       -n
             No address optimization.

       -R
             Remove (unlink) the input file after assembly is completed.

       -V
             Write the version number of the assembler on the standard
             error output.  This option does allow for normal assembly.

       -u
             Remove unreferenced debugging symbols from the symbol table.

       -x
             Extended addressing (48-bit pointers) will be used.

       The input assembly language program is read from	infile	and
       the  output object module is written to outfile.	 The assem-
       bler only accepts one infile on a command line.	If  outfile
       is  not	specified,  the	 name is created from infile by	the
       following algorithm:

	  + If the name	infile ends with the two characters .s,	the
	    name  outfile  is  created	by replacing these last	two
            characters with .o.

	  + If the name	infile does not	end with the two characters
	    .s	and  is	 no  more than 12 characters long, the name
	    outfile is created by appending .o to the name infile.

	  + If the name	infile does not	end with the two characters
	    .s	and  is	 greater  than 12 characters long, the name
	    outfile is created by appending  .o	 to  the  first	 12
	    characters	of  infile.  This satisfies the	UNIX system
	    requirement	that a file name be no more than 14 charac-
	    ters long.
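
       The three naming rules above can be sketched as a short Python
       function.  This is only an illustration of the rules as stated,
       not the assembler's own code: the name default_outfile is
       hypothetical, and the sketch assumes infile is a bare file name
       with no directory part.  The text does not say what happens when
       a name ending in .s is already longer than 14 characters, so that
       case is simply passed through rule 1.

```python
def default_outfile(infile: str) -> str:
    """Derive the object file name from the source file name.

    Hypothetical helper; assumes `infile` has no directory component.
    """
    if infile.endswith(".s"):
        # Rule 1: replace the trailing ".s" with ".o".
        return infile[:-2] + ".o"
    # Rules 2 and 3: append ".o" to at most the first 12 characters,
    # so the result never exceeds the 14-character file name limit.
    return infile[:12] + ".o"


if __name__ == "__main__":
    for name in ("prog.s", "prog", "averylongfilename"):
        print(name, "->", default_outfile(name))
```

       Slicing to 12 characters covers both non-.s rules at once, since
       a slice of a shorter name returns the whole name.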

       1.2  INPUT_FORMAT

       The input to the	assembler is a text file.  This	 file  must
       consist of a sequence of	lines ending with a newline charac-
       ter (ASCII LF).	Each line can contain one  or  more  state-
       ments.  If several statements appear on a line, they must be
       separated by semicolons(;).  Each statement must	be  one	 of
       the following:

	  + An empty statement is one that contains  nothing  other
	    than  spaces,  tabs,  and  form-feed characters.  Empty
	    statements have no meaning to the assembler.  They	can
	    be inserted	freely to improve the appearance of a list-
	    ing.

	  + An assignment statement is one that	gives a	value to  a
	    symbol.   It consists of a symbol, followed	by an equal
	    sign(=), followed by an expression.	 The expression	 is
	    evaluated  and  the	 result	 is assigned to	the symbol.
	    Assignment statements do not generate any  code.   They
	    are	 used  only  to	assign assembly	time values to sym-
	    bols.

	  + A pseudo operation statement  is  a	 directive  to	the
	    assembler  that does not necessarily generate any code.
	    It consists	of a pseudo  operation	code,  followed	 by
	    zero  or  more  operands.	Every pseudo operation code
	    begins with	a period(.).

	  + A machine operation	statement is a mnemonic	representa-
	    tion  of  an  executable  machine  instruction  that is
	    translated by the assembler.  It consists of an  opera-
	    tion code, followed	by zero	or more	operands.

       In addition, each statement can be modified by one  or  more
       of the following:

          + A label can be placed at the beginning of any statement.
            This consists of a symbol followed by a colon(:).  When
            a label is encountered by the assembler, the value of
	    the	location counter is assigned to	the label.

	  + A comment can be inserted at the end of  any  statement
	    by	preceding  the	comment	with a slash(/).  The slash
	    causes the assembler to ignore any	characters  in	the
	    line  after	 the  slash.   This facility is	provided to
	    allow insertion of internal	program	documentation  into
	    the	source file for	a program.

       1.3  OUTPUT_FORMAT

       The output of the assembler is an object	file.	The  object
       file produced by	the assembler contains at least	the follow-
       ing three sections:

          .text  This is an initialized section.  Normally it is
                 read-only and contains the code from a program.
                 It may also contain read-only tables.

          .data  This is an initialized section.  Normally it is
                 readable and writable.  It contains initialized
                 data.  These can be scalars or tables.

          .bss   This is an uninitialized section.  Space is not
                 allocated for this section in the COFF file.

       An optional section, .comment, may also be produced (see the
       section "Pseudo Operations").

       Every statement in the input assembly language program  that
       generates  code or data generates it into one of	these three
       sections.  The section into which the generated bytes are to
       be  written  starts  out	as .text, and can be switched using
       section control pseudo operations.

       The assembler can produce object	modules	 with  any  one	 of
       four  (4)  different magic numbers.  Each magic number indi-
       cates that a different (incompatible) function  linkage	has
       been used.

       The -x option must be specified to get an object	 file  with
       48-bit  pointers.  The default object file type (with no	 -x
       option) is a 32-bit pointer object file.

       The -x option does not change the output code - the handling
       of 48-bit addresses must be done in the assembly code, by
       the programmer.  The -x option tells the assembler what type
       of magic number to put into the COFF file.

			 SYMBOLS AND EXPRESSIONS


       2.  SYMBOLS_and_EXPRESSIONS

       2.1  Values

       Values are represented in the assembler by 32-bit 2's
       complement values.  All arithmetic is performed using 32 bits
       of precision.  Note that the values used in a 386 instruction
       may use 8, 16, or 32 bits.

       2.1.1  Types  Every value is an instance of one of the
       following types:

       Undefined
	     An	undefined symbol is one	whose  value  has  not	yet
	     been  defined.  Examples of undefined symbols are for-
	     ward references and externals.

       Absolute
	     An	absolute type is one whose value  does	not  change
	     with  relocation.	 Examples  of  absolute	symbols	are
	     numeric constants and expressions whose  operands	are
	     only numeric constants.

       Text
	     A text type symbol	is one whose value is  relative	 to
	     the text segment.

       Data
	     A data type symbol	is one whose value is  relative	 to
	     the data segment.

       Bss
             A bss type symbol is one whose value is relative to
             the bss segment.

       Any of the above	symbol types can  be  given  the  attribute
       EXTERNAL.

       2.2  Symbols

       A symbol has a value and a type, each of which is either
       specified explicitly by an assignment statement or implied by
       its context.  Refer to section 2.3 (Expressions) for the regular
       expression definition of a symbol.

       2.2.1  Reserved_Symbols	The following symbols are  reserved
       by the assembler.

          .  Commonly referred to as dot.  This is the location
             counter while assembling a program.  It takes on the
             current location in the text, data, or bss section.

       .text
	     This symbol is of type text.  It is used to label	the
	     beginning	of  a  text  section  in  the program being
	     assembled.

       .data
	     This symbol is of type data.  It is used to label	the
	     beginning	of  a  data  section  in  the program being
	     assembled.

       .bss
	     This symbol is of type bss.  It is	used to	 label	the
	     beginning of a bss	section	in the program being assem-
	     bled.

       2.3  Expressions

       2.3.1  General  The expressions accepted	 by  the  UNIX	386
       assembler  can  be described by their semantic and syntactic
       rules.

       The following are the operators supported by the	assembler:

            OPERATOR            ACTION
            ---------------------------
               +               addition
               -               subtraction
               *               multiplication
               /               division
               &               bit wise logical and
               |               bit wise logical or
               >               right shift
               <               left shift
               %               remainder operator
               !               bit wise logical and not

       In the  following  syntactic  rules  the	 non-terminals	are
       represented by lower case letters.  The terminal	symbols	are
       represented by upper case letters and the  symbols  enclosed
       in double quotes	("") are terminal symbols.

       There is	no precedence to the  operators.   Square  brackets
       must be used to establish precedence.

	       SYNTACTIC RULES FOR THE ASSEMBLER

               expr    : term
                       | expr "+" term
                       | expr "*" term
                       | expr "/" term
                       | expr "&" term
                       | expr "|" term
                       | expr ">" term
                       | expr "<" term
                       | expr "%" term
                       | expr "!" term
                       | expr "-" term
                       ;


	       term    : id
		       | number
		       | "-" term
		       | "[" expr "]"
		       | "<o>" term
		       | "<s>" term
		       ;

	       id      : LABEL
		       ;

	       number  : DEC_VAL
		       | HEX_VAL
		       | OCT_VAL
		       | BIN_VAL
		       ;
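
       Because the expr rule is left recursive and the operators have
       no precedence, an expression such as 2 + 3 * 4 groups as
       [2 + 3] * 4.  The following Python sketch evaluates absolute
       (numeric only) expressions in that way.  It is an illustration
       under stated assumptions, not the assembler's implementation:
       the names tokenize, to_int and evaluate are hypothetical, the
       strict left-to-right order is inferred from the grammar, symbols
       and the "<o>" and "<s>" terms are left out, and 32-bit
       wraparound is not modelled.

```python
import re

# One token: a hex, binary, or decimal/octal number, or a single operator.
TOKEN = re.compile(r"\s*(0[Xx][0-9a-fA-F]+|0[Bb][01]+|[0-9]+|[-+*/&|<>%!\[\]])")

BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a // b,   # floor division; an assumption for negatives
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
    ">": lambda a, b: a >> b,   # right shift
    "<": lambda a, b: a << b,   # left shift
    "%": lambda a, b: a % b,
    "!": lambda a, b: a & ~b,   # bit wise logical "and not"
}


def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def to_int(token):
    low = token.lower()
    if low.startswith("0x"):
        return int(low[2:], 16)
    if low.startswith("0b"):
        return int(low[2:], 2)
    if low.startswith("0"):
        return int(low, 8)      # a leading zero means octal
    return int(low, 10)


def evaluate(text):
    tokens = tokenize(text)
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "-":                 # term : "-" term
            return -term()
        if token == "[":                 # term : "[" expr "]"
            value = expr()
            if pos >= len(tokens) or tokens[pos] != "]":
                raise ValueError("missing ]")
            pos += 1
            return value
        if token[0].isdigit():           # term : number
            return to_int(token)
        raise ValueError(f"unexpected token {token!r}")

    def expr():
        nonlocal pos
        value = term()
        # No precedence: fold each operator into the running value.
        while pos < len(tokens) and tokens[pos] in BINARY:
            operator = tokens[pos]
            pos += 1
            value = BINARY[operator](value, term())
        return value

    result = expr()
    if pos != len(tokens):
        raise ValueError("trailing input")
    return result


if __name__ == "__main__":
    print(evaluate("2 + 3 * 4"))     # grouped as [2 + 3] * 4
    print(evaluate("2 + [3 * 4]"))   # brackets force the other grouping
```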

       The terminal nodes can be described by the following regular
       expressions.


	       LABEL   = [a-zA-Z_][a-zA-Z0-9_]*:
	       DEC_VAL = [1-9][0-9]*
	       HEX_VAL = 0[Xx][0-9a-fA-F][0-9a-fA-F]*
	       OCT_VAL = 0[0-7]*
	       BIN_VAL = 0[Bb][0-1][0-1]*

       In the above regular expressions	 choices  are  enclosed	 in
       square brackets,	a range	of letters or numbers are separated
       by a dash (-), and the star (*) indicates zero (0)  or  more
       instances of the	previous character.
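
       As an illustration, the four numeric patterns can be tried
       directly with Python's re module.  This is a sketch under the
       assumption that a whole token must match one pattern; the name
       classify_number is hypothetical, and the 32-bit range limit is
       not checked.  Note that a lone 0 matches OCT_VAL, and that a
       token such as 08 matches none of the patterns.

```python
import re

# (terminal name, pattern copied from the manual, numeric base)
NUMBER_PATTERNS = [
    ("DEC_VAL", re.compile(r"[1-9][0-9]*"), 10),
    ("HEX_VAL", re.compile(r"0[Xx][0-9a-fA-F][0-9a-fA-F]*"), 16),
    ("OCT_VAL", re.compile(r"0[0-7]*"), 8),
    ("BIN_VAL", re.compile(r"0[Bb][0-1][0-1]*"), 2),
]


def classify_number(text):
    """Return (terminal name, value) for a numeric token, or None."""
    for name, pattern, base in NUMBER_PATTERNS:
        if pattern.fullmatch(text):
            # Strip the two-character 0x / 0b prefix before converting.
            digits = text[2:] if name in ("HEX_VAL", "BIN_VAL") else text
            return name, int(digits, base)
    return None


if __name__ == "__main__":
    for token in ("42", "0x100", "017", "0", "0b101", "08"):
        print(token, "->", classify_number(token))
```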

       Semantically the expressions fall into two groups: absolute
       and relocatable.  The following table shows the legal
       combinations of absolute and relocatable operands for
       the  addition  and  subtraction operators.  All other opera-
       tions are only legal on absolute	valued expressions.

       All numbers have the absolute attribute.  Symbols used to
       reference storage, text or data, are relocatable.  In an
       assignment statement, symbols on the left hand side inherit
       their relocation attributes from the right hand side.

       In the table "a"	is an absolute valued expression,  and	"r"
       is  a  relocatable valued expression.  The resulting type of
       the operation is	given to the right of the equal	sign.


	       a + a = a
	       r + a = r

	       a - a = a
	       r - a = r
	       r - r = a


       In the last example, the	 relocatable  expressions  must	 be
       declared	before their difference	can be taken.
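
       The table can be read as a small lookup.  The Python sketch
       below is a hedged model of it, with the hypothetical name
       result_type.  It accepts exactly the five combinations listed,
       allows every other operator only on two absolute operands, and
       rejects anything else.  The table does not list a + r, so this
       sketch rejects it; whether the real assembler does is not
       stated.

```python
# Legal combinations copied from the table: (operator, left, right) -> result
LEGAL = {
    ("+", "a", "a"): "a",
    ("+", "r", "a"): "r",
    ("-", "a", "a"): "a",
    ("-", "r", "a"): "r",
    ("-", "r", "r"): "a",
}


def result_type(operator, left, right):
    """Return "a" (absolute) or "r" (relocatable), or raise ValueError."""
    if (operator, left, right) in LEGAL:
        return LEGAL[(operator, left, right)]
    # All operators other than + and - need two absolute operands.
    if operator not in ("+", "-") and left == "a" and right == "a":
        return "a"
    raise ValueError(f"illegal combination: {left} {operator} {right}")


if __name__ == "__main__":
    print(result_type("-", "r", "r"))   # difference of two labels
    print(result_type("+", "r", "a"))   # label plus constant
```

       Under this model label1 * 5 is rejected, because multiplication
       is only legal on absolute values.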

       Following are some examples of valid expressions:

         1.  label

         2.  $label

         3.  [label + 0x100]

         4.  [label1 - label2]

         5.  $[label1 - label2]

       Following are some examples of invalid expressions:

         1.  [$label - $label]

         2.  [label1 * 5]

         3.  (label + 0x20)

			    PSEUDO OPERATIONS

       .align val          The align pseudo op causes the next data
                           generated to be aligned modulo val.  Val
                           must be a positive integer value.

       .bcd val		   The bcd pseudo  op  generates  a  packed
			   decimal  (80-bit) value into	the current
			   section.  This is not valid for the .bss
			   section.   Val  is  a non-floating point
			   constant.

       .bss		   The bss pseudo op  changes  the  current
			   section to .bss

       .bss tag, bytes     Define symbol tag in the .bss section
                           and add bytes to the value of dot for
                           .bss.  This does not change the current
                           section to .bss.  Tag is a symbol name.
                           Bytes must be a positive integer value.

       .byte val [,val]	   The byte pseudo op generates	initialized
			   bytes into the current section.  This is
			   not valid for .bss.	Each  val  must	 be
			   an 8-bit value.

       .comm name, expr    The comm pseudo op allocates storage in
                           the .data section.  The storage is
                           referenced by name, and has a size in
                           bytes of expr.  Name is a symbol.  Expr
                           must be a positive integer.  The name
                           cannot be pre-defined.

       .data		   The data pseudo op changes  the  current
			   section to .data.

       .double val         The double pseudo op generates an 80287
                           long real (64-bit) into the current
                           section.  This is not valid for the .bss
                           section.  Val is a floating point
                           constant.

       .even               The even pseudo op aligns the current
                           program counter (.) to an even boundary.

       .float val          The float pseudo op generates an 80287
                           short real (32-bit) into the current
                           section.  This is not valid for the .bss
                           section.  Val is a floating point
                           constant.

       .globl name	   This	pseudo op makes	the variable, name,
			   accessible to other programs.

       .ident string	   The ident pseudo op creates an entry	 in
			   the	comment	 section containing string.
			   String is any  sequence  of	characters,
			   not including the double quote, '"'.

       .lcomm name, expr   The lcomm pseudo op allocates storage in
                           the .bss section.  The storage is
                           referenced by name, and has a size of
                           expr.  Name is a symbol.  Expr must be a
                           positive integer.  Name cannot be
                           pre-defined.

       .long val           The long pseudo op generates a long
                           integer (32-bit two's complement value)
                           into the current section.  This pseudo
                           op is not valid for the .bss section.
                           Val is a non-floating point constant.

       .noopt		   The noopt pseudo op

       .optim		   The optim pseudo op

       .set name, expr	   The set pseudo op sets the value of sym-
			   bol name to expr.  This is equivalent to
			   an assignment.

       .string str         This pseudo op places the characters in
                           str into the object module at the
                           current location and terminates the
                           string with a null.  The string must be
                           enclosed in double quotes ("").  This
                           pseudo op is not valid for the .bss
                           section.

       .text		   The text pseudo op defines  the  current
			   section as .text.

       .value expr [,expr] The value pseudo op is used to  generate
			   an  initialized  word (16-bit two's com-
			   plement value) into the current section.
			   This	 pseudo	op is not valid	in the .bss
			   section.  Each expr	must  be  a  16-bit
			   value.

       .version string     The version pseudo op puts the C
                           compiler version into the comment section.

			      SDB PSEUDO OPS

       .type expr          The type pseudo op is used within a
                           .def-.endef pair.  It gives name the
                           C compiler type representation expr.

       .val expr           The val pseudo op is used within a
                           .def-.endef pair.  It gives name the
                           value of expr.  The type of expr
                           determines the section for name.

       .tag str		   The tag pseudo op is	 used  in  relation
			   with	 a  previously	defined	.def pseudo
			   op.	If the name of a .def is  a  struc-
			   ture	 or a union, str should	be the name
			   of that structure or	union  tag  defined
			   in a	previous .def-.endef pair.

       .size expr	   The size pseudo op is used with the .def
			   pseudo op.  If name of .def is an object
			   such	as a structure or  an  array,  this
			   gives  it  a	 total	size of	expr.  Expr
			   must	be a positive integer.

       .scl expr           The scl pseudo op is used with the .def
                           pseudo op.  Within the .def it gives
                           name the storage class of expr.  The
                           type of expr should be positive.

       .line expr          The line pseudo op is used with the .def
                           pseudo op.  It defines the source line
                           number of the definition of symbol name
                           in the .def.  Expr should yield a
                           positive value.

       .ln line	[,addr]	   This	pseudo	op  provides  the  relative
			   source line number to the beginning of a
			   function.   It  is  used  to	 pass  info
			   through to sdb.

       .file name          The file pseudo op gives the source file
                           name.  Only one is allowed per source
                           file.  Name must be between 1 and 14
                           characters.  This must be the first line
                           of an assembly file.

       .endef		   The	endef  pseudo  op  is  the   ending
			   bracket for a .def.

       .def name           The def pseudo op starts a symbolic
                           description for symbol name.  See
			   .endef.  Name is a symbol name.

       .dim expr [,expr]   The dim pseudo op is	used with the  .def
			   pseudo  op.	If the name of a .def is an
			   array, the expressions give	the  dimen-
			   sions.  Up to 4 dimensions are accepted.
			   The type of each  expression	 should	 be
                           positive.

			   MACHINE INSTRUCTIONS


       3.  Machine_Instructions

       3.1  Differences	between	the UNIX 386 and the Intel 386
	    assemblers

       This section describes the instructions that the assembler
       accepts.  The detailed specification of how particular
       instructions operate is not included; it can be found in the
       Intel documentation.

       The following list describes the differences between the Unix
       386 and Intel 386 assembly languages.  It covers all aspects
       of translation from Intel assembler to Unix 386 assembler.

	 1.  All register names	use percent sign (%) as	a prefix to
	     distinguish them from symbol names.

	 2.  Instructions with two (2) operands	use the	left as	the
	     source and	the right as the destination.  This follows
	     the UNIX system's	assembler  convention,	and  it	 is
	     reversed from Intel's notation.

         3.  Most instructions that can operate on a byte, word, or
             long may have "b", "w", or "l" appended to them.  In
             general, when an opcode is specified with no type
             suffix, it defaults to long.  The UNIX 386 assembler
             derives its type information from the opcode, whereas
             the Intel 386 assembler can derive its type information
             from the operand types.  This difference is the reason
             for the b, w, and l suffixes used in the Unix 386
             assembler.

       3.2  Operands

       Three kinds of  operands	 are  generally	 available  to	the
       instructions:   register,  memory,  and	immediate operands.
       Full descriptions  of  each  type  appear  below.   Indirect
       operands	are available to jump and call instructions; but NO
       other instructions can use memory indirect operands.

       The assembler always assumes it is generating code for a
       32-bit segment.  So when 16-bit data is called for (e.g. movw
       %ax, %bx) it will automatically generate the 16-bit data
       prefix byte.

       Byte, Word, and Long registers are available  on	 the  80386
       processor.   The	 code  segment	(%cs),	instruction pointer
       (%eip), and the flag register are not available as  explicit
       operands	to the instructions.

       The names of the	byte, word, and	long registers available as
       operands	and a brief description	appear below:

	 1.  8-bit (byte) general registers

	     %al low byte of %ax register

	     %ah high byte of %ax register

	     %cl low byte of %cx register

	     %ch high byte of %cx register

	     %dl low byte of %dx register

	     %dh high byte of %dx register

	     %bl low byte of %bx register

	     %bh high byte of %bx register

         2.  16-bit general registers

             %ax low 16 bits of %eax register

             %cx low 16 bits of %ecx register

             %dx low 16 bits of %edx register

             %bx low 16 bits of %ebx register

             %sp low 16 bits of the stack pointer (%esp)

             %bp low 16 bits of the frame pointer (%ebp)

             %si low 16 bits of the source index register (%esi)

             %di low 16 bits of the destination index register
                 (%edi)

         3.  32-bit general registers

	     %eax 32-bit accumulator

	     %ecx 32-bit general register

	     %edx 32-bit general register

	     %ebx 32-bit general register

	     %esp 32-bit stack pointer

	     %ebp 32-bit frame pointer

	     %esi 32-bit source	index register

	     %edi 32-bit destination index register

	 4.  Segment registers

	     %cs Code  segment	register,  all	references  to	the
		 instruction space use this register.

	     %ds Data segment register,	the default segment  regis-
		 ter for most references to memory operands.

	     %ss Stack segment register, the default segment regis-
		 ter  for  memory  operands  in	 the  stack.  (i.e.
		 default segment register  for	%bp  %sp  %esp	and
		 %ebp).

	     %es General purpose segment register
		 Some string instructions use this extra segment as
		 their default segment.

	     %fs General purpose segment register

             %gs General purpose segment register

       3.3  Instruction_Descriptions

       This section describes the Unix 5.3/386 instruction syntax.
       Refer to section 3.1 for the differences between the
       UNIX 386 and the Intel 386 assemblers.

       Since the assembler assumes it is always generating code for
       a 32-bit segment, it always assumes a 32-bit address, and it
       automatically precedes word operations with a 16-bit data
       prefix byte.

       In this section the following notation is used:

         1.  The mnemonics are expressed in a regular expression
             type syntax.  Alternatives separated by a vertical bar
             (|) and enclosed within square brackets, "[]", denote
             that one of them must be chosen.  Alternatives enclosed
             within curly braces, "{}", denote that one or none of
             them may be used.  The vertical bar (|) separates
             different suffixes for operators or operands.  As an
             example, when an 8, 16, or 32 bit immediate value is
             permitted in an instruction we would write:
             imm[8|16|32].

	 2.  imm[8|16|32|48] - any immediate  value,  as  they	are
	     defined above.  Immediate values are defined using	the
	     regular expression	syntax	previously  defined.   When
	     there  is a choice	between	operand	sizes the assembler
	     will choose the smallest representation.

	 3.  reg[8|16|32] - any	general	 purpose  register.   Where
	     each number indicates one of the following:
	     32: %eax, %ecx, %edx, %ebx, %esi, %edi,%ebp, %esp.
	     16: %ax, %cx, %dx,	%bx, %si, %di, %bp, %sp.
	     8:	%al, %ah, %cl, %ch, %dl, %dh, %bl, %bh.

	 4.  mem[8|16|32|48] - any memory operand.  The	8, 16,	32,
	     and  48  suffixes	represent  byte,  word,	 dword,	and
	     inter-segment memory address quantities, respectively.

	 5.  r/m[8|16|32] - any	general	purpose	register or  memory
	     operand.  The operand type	is determined from the suf-
	     fix.  They	are 8 =	byte, 16 = word, and  32  =  dword.
	     The  registers  for  each operand size are	the same as
	     reg[8|16|32] above.

         6.  creg - any control register.  The control registers
             are: %cr0, %cr2, or %cr3.

         7.  dreg - any debug register.  The debug registers are:
             %db0, %db1, %db2, %db3, %db6, %db7.

         8.  sreg - any segment register.  The segment registers
             are: %cs, %ds, %ss, %es, %fs, %gs.

         9.  treg - any test register.  The test registers are:
             %tr6 and %tr7.

	10.  cc	- condition codes.  The	condition codes	are:

               1.  a       - above

	       2.  ae	   - above or equal

	       3.  b	   - below

	       4.  be	   - below or equal

	       5.  c	   - carry

	       6.  e	   - equal

	       7.  g	   - greater

	       8.  ge	   - greater than or equal to

	       9.  l	   - less than

	      10.  le	   - less than or equal	to

	      11.  na	   - not above

	      12.  nae	   - not above or equal	to

	      13.  nb	   - not below

              14.  nbe     - not below or equal to

	      15.  nc	   - no	carry

	      16.  ne	   - not equal

	      17.  ng	   - not greater than

	      18.  nge	   - not greater than or equal to

	      19.  nl	   - not less than

	      20.  nle	   - not less than or equal to

              21.  no      - not overflow

	      22.  np	   - not parity

	      23.  ns	   - not sign

	      24.  nz	   - not zero

	      25.  o	   - overflow

	      26.  p	   - parity

	      27.  pe	   - parity even

	      28.  po	   - parity odd

	      29.  s	   - sign

	      30.  z	   - zero

        11.  disp[8|32] - the number of bits used to define the
             distance of a relative jump.  Since the assembler only
             supports a 32-bit address space, only 8-bit
             sign-extended and 32-bit displacements are supported.

	12.  immPtr - When the immediate form of a long	call  or  a
	     long  jump	is used	the selector and offset	are encoded
	     as	an immediate pointer (immPtr).

       Addressing modes
       Represented by: [sreg:][offset][([base][,index][,scale])].
       All the items in the square brackets are optional, but at
       least one is necessary.  If any of the items inside the
       parentheses are used, the parentheses are mandatory.

       Sreg is a segment register override prefix.  It may be any
       segment register.  If a segment override prefix is present
       it must be followed by a colon (:), before the offset
       component of the address.  Sreg does not represent an address
       by itself.  An address must contain an offset component.

       Offset is a displacement	from a segment	base.	It  may	 be
       absolute	 or  relocatable.  A label is an example of a relo-
       catable offset.	A number  is  an  example  of  an  absolute
       offset.

       Base and	index can be any 32 bit	register.  Scale is a  mul-
       tiplication  factor  for	 the  index register field.  Please
       refer to	the Intel documentation	for  more  details  on	the
       80386 addressing	modes.

       Following are some examples of addresses:

       movl var, %eax
	     Move the contents of memory location var into %eax.

       movl %cs:var, %eax
	     Move the contents of the memory location, var  in	the
	     code segment into %eax.

       movl $var, %eax
	     Move the address of var into %eax.

       movl array_base(%esi), %eax
	     Add the address of	memory location	array_base  to	the
	     content of	%esi to	get an address in memory.  Move	the
	     content of	this address into %eax.

       movl (%ebx, %esi, 4), %eax
	     Multiply the  content of %esi by 4, add  this  to	the
	     content  of %ebx, to produce a memory reference.  Move
	     the content of this memory	location into %eax.

       movl struct_base(%ebx, %esi, 4),	%eax
	     Multiply the content of %esi by 4,	 add  this  to	the
	     content   of   %ebx,   add	 this  to  the	address	 of
	     struct_base, to produce an	address.  Move the  content
	     of	this address into %eax.
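
       The arithmetic in these examples follows one pattern: offset
       plus base plus index times scale.  The Python sketch below
       models that calculation only; it ignores segment bases, and the
       name effective_address and the sample values are hypothetical.
       The 80386 encodes scale factors of 1, 2, 4, and 8, and the
       result is taken modulo 2^32.

```python
def effective_address(offset=0, base=0, index=0, scale=1):
    """Model offset(base, index, scale) as a 32-bit address.

    offset is the displacement, base and index are register contents.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (offset + base + index * scale) & 0xFFFFFFFF


if __name__ == "__main__":
    # movl array_base(%esi), %eax with array_base = 0x2000, %esi = 8
    print(hex(effective_address(offset=0x2000, base=8)))
    # movl (%ebx, %esi, 4), %eax with %ebx = 0x1000, %esi = 3
    print(hex(effective_address(base=0x1000, index=3, scale=4)))
    # movl struct_base(%ebx, %esi, 4), %eax
    print(hex(effective_address(0x2000, 0x100, 3, 4)))
```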

       A note about expressions	and immediate values.  An immediate
       value is	an expression preceded by a dollar sign.

	       immediate: "$" expr

       Immediate values carry the absolute or relocatable
       attributes of their expression component.  Immediate values
       cannot be used in an expression.

       Immediate values should be considered another form of
       address: the immediate form.

       3.3.1  Processor_Extension_Instructions	Please refer to	the
       chapter on floating point support.

       3.3.1.1	Control_and_Test_Register_Instructions

	 1.  mov{l}  creg, reg32

	 2.  mov{l}  dreg, reg32

	 3.  mov{l}  reg32, creg

	 4.  mov{l}  reg32, dreg

	 5.  mov{l}  treg, reg32

	 6.  mov{l}  reg32, treg

       NOTE: The Unix assembler	accepts	"mov" or "movl"	as  exactly
       the  same  instruction  for  the	 control  and test register
       group.

       3.3.1.2	New_Condition_Code_Instructions

	 1.  jcc     disp32

         2.  setcc   r/m8

       3.3.1.3	Bit_Instructions  All the new bit instructions	are
       only defined for	word and long register or memory operands.

	 1.  bt{wl}	 reg[16|32], r/m[16|32]

	 2.  bt{wl}	 imm8, r/m[16|32]

	 3.  bts{wl}	 imm8, r/m[16|32]

	 4.  bts{wl}	 reg[16|32], r/m[16|32]

	 5.  btr{wl}	 imm8, r/m[16|32]

	 6.  btr{wl}	 reg[16|32], r/m[16|32]

	 7.  btc{wl}	 imm8, r/m[16|32]

	 8.  btc{wl}	 reg[16|32], r/m[16|32]

	 9.  bsf{wl}	 reg[16|32], r/m[16|32]

	10.  bsr{wl}	 reg[16|32], r/m[16|32]

	11.  shld{wl}	 imm8, reg[16|32], r/m[16|32]

	12.  shld{wl}	 reg[16|32], r/m[16|32]

	13.  shrd{wl}	 imm8, reg[16|32], r/m[16|32]

	14.  shrd{wl}	 reg[16|32], r/m[16|32]

       NOTE: All the bit operation mnemonics without a type suffix
       default to long.

       3.3.1.4	New_Arithmetic_Instruction

	 1.  imul    r/m[16|32], reg[16|32]

       NOTE: This is the uncharacterized multiply.  It has a 16 or
       32 bit product, as opposed to a 32 or 64 bit product.

       3.3.1.5	New_Move_with_Zero_or_Sign_Extension_Instructions

	 1.  movzbw   r/m8, reg16

	 2.  movzbl   r/m8, reg32

	 3.  movzwl   r/m16, reg32

	 4.  movsbw   r/m8, reg16

	 5.  movsbl   r/m8, reg32

	 6.  movswl   r/m16, reg32

       3.3.2  Data_Movement_Instructions

	 1.  clr{bwl}	     r/m[8|16|32]

	 2.  lea{wl}	     mem32, reg[16|32]

	 3.  mov{bwl}	     r/m[8|16|32], reg[8|16|32]

	 4.  mov{bwl}	     reg[8|16|32], r/m[8|16|32]

	 5.  mov{bwl}	     imm[8|16|32], r/m[8|16|32]

	 6.  pop{wl}	     r/m[16|32]

	 7.  popa{wl}

	 8.  push{bwl}	     imm[8|16|32]

	 9.  push{wl}	     r/m[16|32]

	10.  pusha{wl}

	11.  xchg{bwl}	     reg[8|16|32], r/m[8|16|32]

       NOTE1: pushb sign extends the immediate byte to a long,	and
       pushes a	long (4	bytes) onto the	stack.

       NOTE2: When a type suffix is not used with a data movement
       mnemonic the type defaults to long.  The Unix assembler does
       not derive the operand size from the operands themselves.
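
       The sign extension described in NOTE1 can be modelled in a few
       lines of Python.  This is a sketch only; pushb_value is a
       name made up for the example, and the stack image is shown in
       the 386's little-endian byte order.

```python
def pushb_value(imm8):
    """Return the 4-byte little-endian image that `pushb imm8` places on the stack."""
    imm8 &= 0xFF
    # Treat bit 7 as the sign bit and widen the byte to a signed integer.
    signed = imm8 - 0x100 if imm8 & 0x80 else imm8
    return (signed & 0xFFFFFFFF).to_bytes(4, "little")


print(pushb_value(0x7F).hex())  # positive byte: upper bytes are zero
print(pushb_value(0x80).hex())  # negative byte: upper bytes are all ones
```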

       3.3.3  Segment_Register_Instructions

	 1.  lds{wl}	     mem[32|48], reg[16|32]

	 2.  les{wl}	     mem[32|48], reg[16|32]

	 3.  lfs{wl}	     mem[32|48], reg[16|32]

	 4.  lgs{wl}	     mem[32|48], reg[16|32]

	 5.  lss{wl}	     mem[32|48], reg[16|32]

	 6.  movw	     sreg[cs|ds|ss|es], r/m16

	 7.  movw	     r/m16, sreg[cs|ds|ss|es]

	 8.  popw	     sreg[ds|ss|es|fs|gs]

	 9.  pushw	     sreg[cs|ds|ss|es|fs|gs]

       NOTE1: The pushw and popw push and pop 16 bit quantities.
       This is done by using a data size override (OSP) prefix
       byte.

       NOTE2: When the type suffix is not used with the	 lds,  les,
       lfs, lgs, and lss instructions a	48 bit pointer is assumed.

       NOTE3: Since the	assembler assumes no type  suffix  means  a
       type  of	 long, the type	suffix of "w" when working with	the
       segment registers is mandatory.
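
       To make NOTE1 concrete, the following Python sketch builds the
       byte sequences for pushw and popw of a segment register.  The
       opcode values and the 0x66 prefix come from the Intel 80386
       instruction set, not from this document, and the assumption
       that this assembler emits exactly these bytes is mine.  The
       encode function and the table names are hypothetical.

```python
OSP = 0x66  # operand-size override prefix (Intel 80386 encoding)

# One-byte and two-byte opcodes for segment register push and pop.
PUSH_SREG = {"es": b"\x06", "cs": b"\x0e", "ss": b"\x16",
             "ds": b"\x1e", "fs": b"\x0f\xa0", "gs": b"\x0f\xa8"}
POP_SREG = {"es": b"\x07", "ss": b"\x17", "ds": b"\x1f",
            "fs": b"\x0f\xa1", "gs": b"\x0f\xa9"}


def encode(mnemonic, sreg):
    """Return the prefix byte followed by the opcode for `pushw`/`popw` of sreg."""
    table = {"pushw": PUSH_SREG, "popw": POP_SREG}[mnemonic]
    return bytes([OSP]) + table[sreg]


print(encode("pushw", "ds").hex())
print(encode("popw", "fs").hex())
```

       Note that %cs appears only in the push table, matching the
       operand lists above: there is no pop into %cs.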

       3.3.4  I/O_Instructions

	 1.  in{bwl}	     imm8

	 2.  in{bwl}	     %dx

	 3.  ins{bwl}	     %dx

	 4.  out{bwl}	     imm8

	 5.  out{bwl}	     %dx

	 6.  outs{bwl}	     %dx

       NOTE1: When the type suffix is left off the I/O instructions
       they  default to	long.  So in = inl, out	= outl,	ins = insl,
       and outs	= outsl.

       3.3.5  Flag_Instructions

	 1.  lahf

	 2.  sahf

	 3.  popf{wl}

	 4.  pushf{wl}

	 5.  cmc

	 6.  clc

	 7.  stc

	 8.  cli

	 9.  sti

	10.  cld

	11.  std

       NOTE: When the type suffix is not used the pushf and popf
       instructions default to long.  Pushf = pushfl and popf =
       popfl.  A pushfw or popfw will push or pop a 16 bit quantity.
       This is done by using the OSP prefix byte.

       3.3.6  Arithmetic/Logical_Instructions

	 1.  add{bwl}	     reg[8|16|32], r/m[8|16|32]

	 2.  add{bwl}	     r/m[8|16|32], reg[8|16|32]

	 3.  add{bwl}	     imm[8|16|32], r/m[8|16|32]

	 4.  adc{bwl}	     reg[8|16|32], r/m[8|16|32]

	 5.  adc{bwl}	     r/m[8|16|32], reg[8|16|32]

	 6.  adc{bwl}	     imm[8|16|32], r/m[8|16|32]

	 7.  sub{bwl}	     reg[8|16|32], r/m[8|16|32]

	 8.  sub{bwl}	     r/m[8|16|32], reg[8|16|32]

	 9.  sub{bwl}	     imm[8|16|32], r/m[8|16|32]

	10.  sbb{bwl}	     reg[8|16|32], r/m[8|16|32]

	11.  sbb{bwl}	     r/m[8|16|32], reg[8|16|32]

	12.  sbb{bwl}	     imm[8|16|32], r/m[8|16|32]

	13.  cmp{bwl}	     reg[8|16|32], r/m[8|16|32]

	14.  cmp{bwl}	     r/m[8|16|32], reg[8|16|32]

	15.  cmp{bwl}	     imm[8|16|32], r/m[8|16|32]

	16.  inc{bwl}	     r/m[8|16|32]

	17.  dec{bwl}	     r/m[8|16|32]

	18.  test{bwl}	     reg[8|16|32], r/m[8|16|32]

	19.  test{bwl}	     r/m[8|16|32], reg[8|16|32]

	20.  test{bwl}	     imm[8|16|32], r/m[8|16|32]

	21.  sal{bwl}	     imm8, r/m[8|16|32]

	22.  sal{bwl}	     %cl, r/m[8|16|32]

	23.  shl{bwl}	     imm8, r/m[8|16|32]

	24.  shl{bwl}	     %cl, r/m[8|16|32]

	25.  sar{bwl}	     imm8, r/m[8|16|32]

	26.  sar{bwl}	     %cl, r/m[8|16|32]

	27.  shr{bwl}	     imm8, r/m[8|16|32]

	28.  shr{bwl}	     %cl, r/m[8|16|32]

	29.  not{bwl}	     r/m[8|16|32]

	30.  neg{bwl}	     r/m[8|16|32]

	31.  bound{wl}	     reg[16|32], r/m[16|32]

	32.  and{bwl}	     reg[8|16|32], r/m[8|16|32]

	33.  and{bwl}	     r/m[8|16|32], reg[8|16|32]

	34.  and{bwl}	     imm[8|16|32], r/m[8|16|32]

	35.  or{bwl}	     reg[8|16|32], r/m[8|16|32]

	36.  or{bwl}	     r/m[8|16|32], reg[8|16|32]

	37.  or{bwl}	     imm[8|16|32], r/m[8|16|32]

	38.  xor{bwl}	     reg[8|16|32], r/m[8|16|32]

	39.  xor{bwl}	     r/m[8|16|32], reg[8|16|32]

	40.  xor{bwl}	     imm[8|16|32], r/m[8|16|32]

       NOTE: When the type suffix is not included in an	 arithmetic
       or logical instruction it defaults to a long.

       3.3.7  Multiply_and_Divide

	 1.  imul{wl} imm[16|32], r/m[16|32], reg[16|32]

	 2.  mul{bwl}	     r/m[8|16|32]

	 3.  div{bwl}	     r/m[8|16|32]

	 4.  idiv{bwl}	     r/m[8|16|32]

       NOTE: When the type suffix is not included in a multiply	 or
       divide instruction it defaults to a long.

       3.3.8  Conversion_Instructions

	 1.  cbtw

	 2.  cwtd

	 3.  cwtl

	 4.  cltd

       NOTE: convert byte to word: %al -> %ax
       convert word to double: %ax -> %dx:%ax
       convert word to long: %ax -> %eax
       convert long to double: %eax -> %edx:%eax
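
       The four conversions are sign extensions (they correspond to
       Intel's CBW, CWD, CWDE and CDQ).  The Python sketch below
       models them on plain integers; the helper sign_extend is
       invented for the example, and register pairs are returned as
       (high, low) tuples.

```python
def sign_extend(value, from_bits, to_bits):
    """Widen a from_bits two's-complement value to to_bits, as an unsigned pattern."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


def cbtw(al):
    """%al -> %ax"""
    return sign_extend(al, 8, 16)


def cwtl(ax):
    """%ax -> %eax"""
    return sign_extend(ax, 16, 32)


def cwtd(ax):
    """%ax -> (%dx, %ax)"""
    wide = sign_extend(ax, 16, 32)
    return wide >> 16, wide & 0xFFFF


def cltd(eax):
    """%eax -> (%edx, %eax)"""
    wide = sign_extend(eax, 32, 64)
    return wide >> 32, wide & 0xFFFFFFFF


print(hex(cbtw(0x80)))
print([hex(part) for part in cwtd(0x8000)])
```

       The difference between cwtd and cwtl is only where the upper
       16 bits land: in %dx for cwtd, in the top half of %eax for
       cwtl.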

       3.3.9  Decimal_Arithmetic_Instructions

	 1.  daa

	 2.  das

	 3.  aaa

	 4.  aas

	 5.  aam

	 6.  aad

       3.3.10  Coprocessor_Instructions

	 1.  wait

	 2.  esc

       3.3.11  String_Instructions

	 1.  movs[bwl]

	 2.  movs    -	     same as movsl

	 3.  smov[bwl]	     same as movs[bwl]

	 4.  smov    -	     same as smovl

	 5.  cmps[bwl]

	 6.  cmps    -	     same as cmpsl

	 7.  scmp[bwl]	     same as cmps[bwl]

	 8.  scmp    -	     same as scmpl

	 9.  stos[bwl]

	10.  stos    -	     same as stosl

	11.  ssto[bwl]	     same as stos[bwl]

	12.  ssto    -	     same as sstol

	13.  lods[bwl]

	14.  lods    -	     same as lodsl

	15.  slod[bwl]	     same as lods[bwl]

	16.  slod    -	     same as slodl

	17.  scas[bwl]

	18.  scas    -	     same as scasl

	19.  ssca[bwl]	     same as scas[bwl]

	20.  ssca    -	     same as sscal

	21.  xlat

	22.  rep

	23.  repnz

	24.  repz

       NOTE: All Intel string op mnemonics default to longs.

       3.3.12  Procedure_Call_and_Return

	 1.  lcall   immPtr

	 2.  lcall   r/m48 (indirect)

	 3.  lret

	 4.  lret    imm16

	 5.  call    disp32

	 6.  call    r/m32 (indirect)

	 7.  ret

	 8.  ret     imm16

	 9.  enter   imm16, imm8

	10.  leave

       3.3.13  Jump_Instructions

	 1.  jcc     disp[8|32]

	 2.  jcxz    disp[8|32]

	 3.  loop    disp[8|32]

	 4.  loopnz  disp[8|32]

	 5.  loopz   disp[8|32]

	 6.  jmp     disp[8|32]

	 7.  ljmp    immPtr

	 8.  jmp     r/m32	     (indirect)

	 9.  ljmp    r/m48	     (indirect)

       NOTE: The UNIX  386  assembler  optimizes  for  SDI's  (Span
       Dependent  Instructions).  So intra-segment jumps are optim-
       ized to their short forms when possible.
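
       The choice between the short and long forms comes down to
       whether the displacement fits in a signed byte.  The Python
       sketch below shows that test only.  The byte lengths are the
       standard 80386 encodings for a 32-bit code segment and are
       not taken from this document; a real span-dependent optimizer
       must also iterate, because shortening one jump can bring
       other targets into range.  All names here are hypothetical.

```python
# Encoded lengths in bytes for a 32-bit code segment: (short form, long form).
LENGTHS = {"jmp": (2, 5), "jcc": (2, 6)}


def fits_disp8(displacement):
    """True when the displacement can be held in a signed 8-bit field."""
    return -128 <= displacement <= 127


def encoded_length(kind, displacement):
    """Pick the short form when the displacement allows it, else the long form."""
    short_form, long_form = LENGTHS[kind]
    return short_form if fits_disp8(displacement) else long_form


print(encoded_length("jmp", 127))
print(encoded_length("jcc", 128))
```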

       3.3.14  Interrupt_Instructions

	 1.  int     3

	 2.  int     imm8

	 3.  into

	 4.  iret

       3.3.15  Protection_Model_Instructions

	 1.  sldt    r/m16

	 2.  str     r/m16

	 3.  lldt    r/m16

	 4.  ltr     r/m16

	 5.  verr    r/m16

	 6.  verw    r/m16

	 7.  sgdt    r/m32

	 8.  sidt    r/m32

	 9.  lgdt    r/m32

	10.  lidt    r/m32

	11.  smsw    r/m32

	12.  lmsw    r/m32

	13.  lar     r/m32, reg32

	14.  lsl     r/m32, reg32

	15.  clts

       3.3.16  Miscellaneous_Instructions

	 1.  lock

	 2.  nop

	 3.  hlt

	 4.  addr16

	 5.  data16

	   TRANSLATION TABLES FOR UNIX TO INTEL	FLOAT MNEMONICS

       The following tables show the relationship between the  Unix
       and  Intel  mnemonics.  The mnemonics are organized into	the
       same functional categories  as  the  Intel  mnemonics.	The
       Intel  mnemonics	 appear	in section two of the 80287 numeric
       supplement.


       The notational conventions used in the table are: When
       letters appear within square brackets, "[]", exactly one
       of the letters is required.  If letters appear within
       curly braces, "{}", then either one or none of the letters
       is required.  When a group of letters is separated from
       other letters by a bar, "|", within square brackets or
       curly braces, then the group of letters between the bars,
       or between a bar and a bracket or brace, is considered an
       atomic unit.  As an example, "fld[lst]" means: fldl, flds,
       or fldt, whereas "fst{ls}" means: fst, fstl, or fsts, and
       "fild{l|ll}" means: fild, fildl, or fildll.
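
       The conventions above are mechanical enough to express as a
       short Python function.  This is a sketch for checking one's
       reading of the tables, not a tool shipped with the assembler;
       the name expand is made up here, and it handles only a single
       trailing group, which is all the tables use.

```python
def expand(pattern):
    """Expand one table entry such as 'fld[lst]' into the mnemonics it denotes."""
    for open_ch, close_ch, optional in (("[", "]", False), ("{", "}", True)):
        if open_ch in pattern:
            if not pattern.endswith(close_ch):
                raise ValueError("unterminated group in " + pattern)
            stem, body = pattern[:-1].split(open_ch, 1)
            # A bar makes each group atomic; otherwise every letter is its own choice.
            choices = body.split("|") if "|" in body else list(body)
            names = [stem + choice for choice in choices]
            # Curly braces also allow "none of the letters".
            return ([stem] if optional else []) + names
    return [pattern]


for entry in ("fld[lst]", "fst{ls}", "fild{l|ll}", "fxch"):
    print(entry, "->", expand(entry))
```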


       The Unix operators are built from the Intel operators by
       adding suffixes to them.  The 80287 deals with three data
       types: integers, packed decimals, and reals.  The Unix
       assembler is not typed, so the operator has to carry with it
       the type of data item it is operating on.  If the operation
       is on an integer the following suffixes apply: l for Intel's
       short (32 bits), and ll for Intel's long (64 bits).  If the
       operator applies to reals then: s is short (32 bits), l is
       long (64 bits), and t is temporary real (80 bits).
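
       The suffix rules can be summarized as a lookup table.  In the
       Python sketch below, the table and function names are
       invented, and the 16 bit entry for an integer operator with
       no suffix is an assumption based on Intel's word integer
       format; the text above does not state it.

```python
# Operand width in bits, keyed by data kind and then by Unix suffix.
OPERAND_BITS = {
    "integer": {"": 16, "l": 32, "ll": 64},   # "" -> 16 is an assumption
    "real": {"s": 32, "l": 64, "t": 80},
}


def operand_bits(kind, suffix):
    """Look up the memory operand width implied by a suffix for a given data kind."""
    return OPERAND_BITS[kind][suffix]


# The same letter means different widths depending on the operator family.
print(operand_bits("integer", "l"))   # as in fildl
print(operand_bits("real", "l"))      # as in fldl
```

       The thing to watch is the suffix l: it selects 32 bits on an
       integer operator but 64 bits on a real operator.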

		       Real Transfers
	       UNIX	      |	       INTEL	       Operation
	       =================================================
	       fld[lst]	      |	       fld	       load real
	       fst{ls}	      |	       fst	       store real
	       fstp{lst}      |	       fstp	       store real and pop
	       fxch	      |	       fxch	       exchange	registers


		     Integer Transfers
	       UNIX	      |	       INTEL	       Operation
	       =================================================
	       fild{l|ll}     |	       fild	       integer load
	       fist{l}	      |	       fist	       integer store
	       fistp{l|ll}    |	       fistp	       integer store and pop


		 Packed	Decimal	Transfers
	       UNIX	      |	       INTEL	       Operation
	       =================================================
	       fbld	      |	       fbld	       Packed decimal (BCD) load
	       fbstp	      |	       fbstp	       Packed decimal (BCD) store and pop

			  Addition
	       UNIX	      |	       INTEL	       Operation
	       =================================================
	       fadd{ls}	      |	       fadd	       real add
	       faddp	      |	       faddp	       real add	and pop
	       fiadd{l}	      |	       fiadd	       integer add



			 Subtraction
	       UNIX	      |	       INTEL	       Operation
	       =================================================
	       fsub{ls}	      |	       fsub	       subtract	real
	       fsubp	      |	       fsubp	       subtract	real and pop
	       fsubr{ls}      |	       fsubr	       subtract	real reversed
	       fsubrp	      |	       fsubrp	       subtract	real reversed and pop
	       fisub{l}	      |	       fisub	       integer subtract
	       fisubr{l}      |	       fisubr	       integer subtract	reverse


		       Multiplication
	       UNIX	      |	       INTEL	       Operation
	       =================================================
	       fmul{ls}	      |	       fmul	       multiply	real
	       fmulp	      |	       fmulp	       multiply	real and pop
	       fimul{l}	      |	       fimul	       integer multiply


			  Division
	       UNIX	      |	       INTEL	       Operation
	       =================================================
	       fdiv{ls}	      |	       fdiv	       divide real
	       fdivp	      |	       fdivp	       divide real and pop
	       fdivr{ls}      |	       fdivr	       divide real reversed
	       fdivrp	      |	       fdivrp	       divide real reversed and	pop
	       fidiv{l}	      |	       fidiv	       integer divide
	       fidivr{l}      |	       fidivr	       integer divide reversed


		Other Arithmetic Operations
	       UNIX	      |	       INTEL	       Operation
	       =================================================
	       fsqrt	      |	       fsqrt	       square root
	       fscale	      |	       fscale	       scale
	       fprem	      |	       fprem	       partial remainder
	       frndint	      |	       frndint	       round to	integer
	       fxtract	      |	       fxtract	       extract exponent	and significand
	       fabs	      |	       fabs	       absolute	value
	       fchs	      |	       fchs	       change sign


		  Comparison Instructions
	       UNIX	      |	       INTEL	       Operation
	       =================================================
	       fcom{ls}	      |	       fcom	       compare real
	       fcomp{ls}      |	       fcomp	       compare real and	pop
	       fcompp	      |	       fcompp	       compare real and	pop twice
	       ficom{l}	      |	       ficom	       integer compare
	       ficomp{l}      |	       ficomp	       integer compare and pop
	       ftst	      |	       ftst	       test
	       fxam	      |	       fxam	       examine


		Transcendental Instructions
	       UNIX	      |	       INTEL	       Operation
	       =================================================
	       fptan	      |	       fptan	       partial tangent
	       fpatan	      |	       fpatan	       partial arctangent
	       f2xm1	      |	       f2xm1	       2^x - 1
	       fyl2x	      |	       fyl2x	       Y * log2X
	       fyl2xp1	      |	       fyl2xp1	       Y * log2(X+1)


		   Constant Instructions
	       UNIX	      |	       INTEL	       Operation
	       =================================================
	       fldl2e	      |	       fldl2e	       load log2 e
	       fldl2t	      |	       fldl2t	       load log2 10
	       fldlg2	      |	       fldlg2	       load log10 2
	       fldln2	      |	       fldln2	       load loge 2
	       fldpi	      |	       fldpi	       load pi
	       fldz	      |	       fldz	       load +0.0


	       Processor Control Instructions
	       UNIX	      |	       INTEL	       Operation
	       =================================================
	       finit/fninit   |	       finit/fninit    initialize processor
	       fnop	      |	       fnop	       no operation
	       fsave/fnsave   |	       fsave/fnsave    save state
	       fstcw/fnstcw   |	       fstcw/fnstcw    store control word
	       fstenv/fnstenv |	       fstenv/fnstenv  store environment
	       fstsw/fnstsw   |	       fstsw/fnstsw    store status word
	       frstor	      |	       frstor	       restore state
	       fsetpm	      |	       fsetpm	       set protected mode
	       fwait	      |	       fwait	       CPU wait
	       fclex/fnclex   |	       fclex/fnclex    clear exceptions
	       fdecstp	      |	       fdecstp	       decrement stack pointer
	       ffree	      |	       ffree	       free registers
	       fincstp	      |	       fincstp	       increment stack pointer

-- 
// marc at dumbcat.sf.ca.us
// {ames,decwrl,sun}!pacbell!dumbcat!marc


