Another Assembly Tutorial

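Assembly language is unlike most other programming languages: you work directly with the CPU’s instructions and registers. This tutorial covers 16-bit x86 assembly language, which means we are going to write the kind of programs that were originally intended to run on DOS.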
Here are the prerequisites for this tutorial:

  • System: MS-DOS or 32-bit Windows or DOSBox
  • FASM (Flat Assembler)

Now let’s get into what you need to know. In assembly language there are a few CPU registers that you can manipulate for fast processing of small amounts of data. Here are some of them:

  • AX = Accumulator Register. Specifies the sub-function to execute during system calls (usually in its upper half, AH, described below) and contains the return value for many system calls and instructions.
  • BX = Base Register. Can be used as a pointer. Usually used to hold the address of data or a file handle.
  • CX = Count Register. Used to specify an integer value and is used as a loop counter for the LOOP instruction.
  • DX = Data Register. Used to specify additional data or as a pointer to data.
  • SI = Source Index Register. Can be used as a pointer. Points to the input source during memory or string manipulations.
  • DI = Destination Index Register. Can be used as a pointer. Points to the output destination during memory or string manipulations.
  • BP = Base Pointer. Don’t modify unless you know what you’re doing. By convention, points to the base of the current stack frame.
  • SP = Stack Pointer. Don’t directly modify unless you know what you’re doing. Points to top of stack.

Each of these registers is 2 bytes (16 bits) long, hence the name 16-bit assembly. But the registers AX, BX, CX, and DX can also be referenced by their 1-byte halves. You do this by naming the high half (the left side) or the low half (the right side) of the register. Here’s a visual of these registers and the names of their halves:

     +----+----+       +----+----+
AX = | AH | AL |  BX = | BH | BL |
     +----+----+       +----+----+
     +----+----+       +----+----+
CX = | CH | CL |  DX = | DH | DL |
     +----+----+       +----+----+
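
To see how the halves relate to the full register, here is a small sketch using the MOV instruction, which copies a value into a register. The values are arbitrary and chosen only for illustration. Writing to AL or AH changes just that byte of AX:

mov ax,1234h    ; AX = 1234h, so AH = 12h and AL = 34h
mov al,0FFh     ; only the low byte changes: AX = 12FFh
mov ah,0        ; only the high byte changes: AX = 00FFh

Note that FASM expects a hexadecimal number to begin with a digit, which is why FFh is written as 0FFh.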

In order to make system calls in 16-bit asm, you just call the appropriate interrupt. The instruction for calling an interrupt is INT. Its single 1-byte operand specifies which interrupt to call. When you call an interrupt, the system performs the appropriate operation. Since 1 byte is limited to 256 values (0x00 – 0xFF), each interrupt is extended with numbered sub-functions to make room for more functionality. Usually, before you call an interrupt, you place a sub-function number in AH (the upper half of AX) and any additional required parameters in the other registers, then execute the call.

Here is an example of doing that: our first hello world program, which can be assembled with FASM:

org 100h
mov ah,9
mov dx,msg
int 21h
int 20h
msg db "Hello world!$"

We start off with the assembler directive “org 100h”. It tells the assembler that our program will be loaded at offset address 100h (256) in memory, so every address it calculates is counted from there. With simple executable files in COM format, this is always the address of the program start. The 256 bytes before this address (the Program Segment Prefix) are filled in by DOS with program information, including the command-line arguments.

With x86 assembly in Intel syntax, the format of an instruction is as follows:

INSTRUCTION DEST,SRC

Most instructions take no more than 2 operands. Our first instruction is MOV. It copies the value specified by SRC into the place specified by DEST. So “mov ah,9” sets the value of the AH register to 9. Since AH is the upper 1-byte half of AX, the whole AX register now equals 09xxh, where xx is whatever AL already contained (0900h if AL is 0).
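
As a rough sketch of the DEST,SRC order, the following small program moves a value between registers and memory. The name value is made up for this example, and the program prints nothing; it only shuffles data around and exits:

org 100h
mov ax,5        ; immediate to register: AX = 5
mov bx,ax       ; register to register: BX = 5
mov [value],bx  ; register to memory: the word at value = 5
mov cx,[value]  ; memory to register: CX = 5
int 20h         ; return to DOS
value dw 0      ; one 16-bit word of storage, initially 0

In FASM, square brackets mean “the contents of memory at this address”, while a bare name like value means the address itself.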

Next is “mov dx,msg”. As you can see at the bottom of our code, we have “msg db "Hello world!$"”. There we have declared a variable, so to speak. It is a run of bytes in memory holding the ASCII text “Hello world!$” (db means define byte). Thanks to our assembler, we can refer to that area of memory by its address by using the name we assigned to it, which in this case is msg. So “mov dx,msg” means to place the address of our string into DX.
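
To make the idea of an address concrete, here is the same program again with the offset and size of each line noted in comments. The byte counts assume the usual encodings that an assembler produces for these instructions, and the addresses start at 100h because of the org directive:

org 100h
mov ah,9                ; 0100h, 2 bytes
mov dx,msg              ; 0102h, 3 bytes
int 21h                 ; 0105h, 2 bytes
int 20h                 ; 0107h, 2 bytes
msg db "Hello world!$"  ; 0109h, 13 bytes

Under these assumptions msg stands for 0109h, and “mov dx,msg” is assembled as if you had written “mov dx,0109h”.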

But guess what? This address is 2 bytes long, which means our pointers can only address from 0000h to FFFFh. That’s 65536 bytes of memory at most for our COM program. Since a kilobyte is 1024 bytes, that means (65536 / 1024 = 64) our program is limited to 64 KB, and the COM file itself must be slightly smaller because the first 256 bytes are reserved by DOS. But don’t worry, for a beginner that should be plenty of space for now.

So now we have come to the part where we actually do something. The last 2 instructions set up this interrupt call. The sub-function number was placed in AH: when calling INT 21h, 9 means that we want to print a ‘$’-terminated string (the ‘$’ itself is not printed). The parameter for this function is placed in DX, which holds the address of the string to print. Now we execute the call with INT 21h to print it. Finally comes INT 20h. This interrupt terminates the program and returns control to DOS.
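
The same pattern works for other functions of INT 21h. As a small sketch, function 2 prints the single character held in DL, so this variation prints the letter A and exits:

org 100h
mov ah,2        ; DOS function 2: write one character to the screen
mov dl,'A'      ; the character to print goes in DL
int 21h         ; call DOS
int 20h         ; terminate the program

Only the function number in AH and the register carrying the parameter have changed; the INT 21h call itself is the same.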

And that’s our program! To run it, save it, assemble it with FASM, then run the output COM file from DOS, a 32-bit Windows command prompt, or DOSBox. Next time I’ll explain more. I hope you enjoyed this hello world introduction to assembly language.
