Apple I Replica Creation -- Chapter 6: Programming in Assembly
Introduction
Programming in Assembly, a low-level programming language, can be difficult. However, it's also a very rewarding process that allows us to get to the core of how the processor interacts with other components. The programmer issues arcane instructions to the processor and those instructions are exactly what the processor does. BASIC, by contrast, is a high-level language. Instructions in BASIC are translated into a series of Assembly instructions. Thus with BASIC, it's difficult to know precisely what the processor is doing, whereas with Assembly, we control the processor directly.
Using the Monitor
Power on your Apple I replica and hit Reset. The \ prompt means you've entered the Monitor. In the Monitor, it's possible to read and write data to and from memory. Everything is presented in hexadecimal format. To view the contents of location $FFEF, for example, type:
FFEF
The output will be:
FFEF: 2C
It is also possible to view a range of addresses by specifying the beginning and end address with a period in between, such as:
FFF0.FFFF
FFF0: 12 D0 30 FB 8D 12 D0 60
FFF8: 00 00 00 0F 00 FF 00 01
Note that only the address for the first value of each line is given. $FFF0 contains $12, $FFF1 contains $D0, etc. Each line displays the contents of eight bytes.
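The dump layout can be modelled away from the replica in a few lines of Python, which is handy for checking what a listing should look like before typing it in. This is only a sketch: the function name monitor_dump and the dictionary standing in for memory are assumptions made for illustration, and the line-break rule (a new line at every address that is a multiple of eight) is inferred from the listings in this chapter.

```python
def monitor_dump(memory, start, end):
    """Format memory[start..end] the way the listings in this chapter show it."""
    lines = []
    for addr in range(start, end + 1):
        # Begin a new line at the first address and at every multiple of 8.
        if addr == start or addr % 8 == 0:
            lines.append(f"{addr:04X}:")
        lines[-1] += f" {memory[addr]:02X}"
    return lines


# Contents of $FFF0-$FFFF as shown in the listing above.
rom_tail = [0x12, 0xD0, 0x30, 0xFB, 0x8D, 0x12, 0xD0, 0x60,
            0x00, 0x00, 0x00, 0x0F, 0x00, 0xFF, 0x00, 0x01]
memory = {0xFFF0 + i: value for i, value in enumerate(rom_tail)}

for line in monitor_dump(memory, 0xFFF0, 0xFFFF):
    print(line)
```

Only the first address on each line is printed, so the address of any other byte is the line's address plus its position on the line.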
Non-consecutive memory locations can be viewed by separating each address by a space:
FFA1 FFEF FFFF
FFA1: CA
FFEF: 2C
FFFF: 01
All the addresses we've been looking at are in the $FFxx range. This range was chosen because $FFxx contains the Monitor, which guarantees we'll be looking at known values. From here onward, we'll be working in main memory. If you're using your own replica with the traditional 8 KB RAM, you have 4 KB of main memory, located at $0000.0FFF. If you're using a Replica I, you have $0000.7FFF (32 KB) available.
We can edit data in a similar manner to how we view it. To insert $AA in $0000, type:
0000:AA
The output will be the previous content of $0000:
0000: 02
Now, examine $0000 to view the new contents:
0000
0000: AA
It is also possible to enter multiple bytes of data at once. For example:
0000: 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 11 12 13 14
0000: AA
0000.0017
0000: 01 02 03 04 05 06 07 08
0008: 09 0A 0B 0C 0D 0E 0F 10
0010: 11 12 13 14 68 FF 02 FF
Finally, to begin running a program, specify its starting address and append an R for RUN. To run the program we just entered, for example, you would type:
0000R
If you try this right now, the system will hang because the program is meaningless. Hit Reset to recover. You should now have a basic working understanding of the Monitor, which will allow you to enter programs. If you'd like to learn more, see the Apple I Operations Manual found at the Apple I Owners Club.
Set Up the Assembler
Assembly code is written using mnemonics such as LDA and JMP. An assembler translates these mnemonics into hexadecimal codes that can be understood by the processor. The conversion can be done by hand. Appendix B shows the conversions to opcodes. Assemble a program by hand, and you can type it directly into the Apple I Monitor and run it. The conversion is simple, but very tedious. A much easier method is to use an assembler program such as xa65 by Andre Fachat and Cameron Kaiser.
xa65 is released under the GNU General Public License. Source code is included in the Supplemental Software package as well as a compiled version for Mac OS X. If you're running OS X, copy the application's directory to your hard drive. Open Terminal (in Applications/Utilities) and navigate to the xa65 folder. Your command will look something like this, depending on where you copied the folder:
cd /Users/owad/xa-2.1.4h
"cd" stands for "change directory." To run the program, type:
./xa
This will present the help screen:
Cross-Assembler 65xx V2.1.4h 12dec1998 (c) 1989-98 by A.Fachat
usage : xa { option | sourcefile }
options:
-v = verbose output
-x = old filename behaviour (overrides -o, -e, -l)
-C = no CMOS-opcodes
-B = show lines with block open/close
-c = produce o65 object instead of executable files (i.e. do not link)
-o filename = sets output filename, default is 'a.o65'
A filename of '-' sets stdout as output file
-e filename = sets errorlog filename, default is none
-l filename = sets labellist filename, default is none
-r = adds crossreference list to labellist (if -l given)
-M = allow ":" to appear in comments, for MASM compatibility
-R = start assembler in relocating mode
-Llabel = defines 'label' as absolute, undefined label even when linking
-b? adr = set segment base address to integer value adr.
'?' stands for t(ext), d(ata), b(ss) and z(ero) segment
(address can be given more than once, latest is taken)
-A adr = make text segment start at an address that when the _file_
starts at adr, relocation is not necessary. Overrides -bt
Other segments have to be take care of with -b?
-G = suppress list of exported globals
-DDEF=TEXT = defines a preprocessor replacement
-Idir = add directory 'dir' to include path (before XAINPUT)
Environment:
XAINPUT = include file path; components divided by ','
XAOUTPUT= output file path
There are only a couple of options we're worried about at the moment: -v, -o, -bt, and -C. -v turns on verbose output. This provides more error messages, more detailed descriptions, and a better idea of what's going on. If something's not working right, turn on this option to get more details on the problem. -o lets us specify what the output file will be called. -bt allows us to specify the starting address of the code. Note that to avoid modifying blocks used by the Monitor, our programs will start at location $0280 (decimal 640). -C turns off support for opcodes that were added in the 65c02. Since we won't be using these opcodes, this isn't necessary at the present time, but if you're assembling someone else's code, turning on this option will warn you if they've used any opcodes that your processor doesn't support.
Our typical assemble command will look like this:
./xa -o samplefile.hex -bt640 samplefile
samplefile is the input; samplefile.hex is the output. Hex files, such as samplefile.hex, can be viewed using HexEdit, which is included in the Supplemental Software package.
The Apple I Monitor uses memory locations $0024 through $002B as index pointers and locations $0200 through $027F as input buffer storage. The Monitor will not let you edit these locations. It will let you try, and it won't give you an error message, but when you examine the contents you'll find them unchanged.
Registers
You're going to find that moving data around consumes a large share of the Apple I processor's time. There are three general-purpose registers in the 6502, each of which holds one byte of data. These are the Accumulator, X, and Y. The Accumulator is the register that handles all arithmetic operations and is the one through which most data passes. X and Y are generic registers for temporarily storing data that you want to keep close to the processor. Additional registers will be discussed in Chapter 7.
Hello World
Our first assembly program, like our first BASIC program, is HELLO WORLD. We'll need three instructions for this: LDA, JSR, and JMP.
JMP is the mnemonic for "jump." It's the equivalent of the GOTO instruction in BASIC—though instead of specifying line numbers, we specify memory addresses. When our program is finished running we want to return control to the Monitor. We do this by jumping to location $FF1F. To jump to the Monitor you would type:
JMP $FF1F
The xa65 assembler allows us to use labels to represent physical addresses, so that at the start of the file you could state:
Monitor = $FF1F
And then later on in the code, you should include the instruction:
JMP Monitor
JSR stands for "jump to subroutine." It is the equivalent of BASIC's GOSUB instruction. The Apple I Monitor has a subroutine called Echo which takes whatever value is in the Accumulator and sends it to the display. First, we load an ASCII value (see chart in Appendix A) into the Accumulator. Then we call the Echo subroutine. The character is printed and control returns to the main program. To jump to the Echo subroutine you would type:
JSR $FFEF
LDA is the mnemonic for "load Accumulator". The # symbol is used to indicate "Immediate Addressing" mode. This means that the actual value specified—as opposed to the value stored at that address in memory—is loaded into the Accumulator. For example, to load the value $3B into the Accumulator, you would type:
LDA #$3B
Our "Hello World" program should load 'H' into the accumulator, jump to the Echo subroutine, and then move on to the next letter. When it's done, control should be returned to the Monitor. Since all these letters could get a bit tedious, let's make it "HI W!" instead of "HELLO WORLD":
Echo = $FFEF
Monitor = $FF1F
LDA #$48 ; H in ASCII is $48
JSR Echo
LDA #$49 ; I
JSR Echo
LDA #$20 ; SPC
JSR Echo
LDA #$57 ; W
JSR Echo
LDA #$21 ; !
JSR Echo
JMP Monitor ; Return to Monitor
Save this file in the same directory as xa65 with the name "hiw". In Terminal, navigate to that directory and run the line:
./xa -o hiw.hex -bt640 hiw
Now open hiw.hex in HexEdit (Figure 6.1). If you're using a Serial I/O card on your Apple I, select all the text and copy it into your text editor. In your text editor (such as SubEthaEdit or BBEdit), select the linebreak and copy it into the clipboard. Use the editor's Find/Replace function to replace all linebreaks with a single space. This will place the entire program on a single line, which is what we'll need when we attempt to send it to the Apple I. If you don't have a Serial I/O card, you'll have to type the program in by hand.
The assembled machine code for hiw is:
A9 48 20 EF FF A9 49 20 EF FF A9 20 20 EF FF A9 57 20 EF FF A9 21 20 EF FF 4C 1F FF
We want to load this into the memory, starting at location $0280; so therefore, we need to enter the line:
0280: A9 48 20 EF FF A9 49 20 EF FF A9 20 20 EF FF A9 57 20 EF FF A9 21 20 EF FF 4C 1F FF
Type R to begin running at the most recently examined location ($0280), or specify the address explicitly:
0280R
The program will output:
HI W!
And return control to the Monitor. If the program doesn't run correctly, examine it in memory. You should get:
0280.029D
0280: A9 48 20 EF FF A9 49 20
0288: EF FF A9 20 20 EF FF A9
0290: 57 20 EF FF A9 21 20 EF
0298: FF 4C 1F FF E4 FF
How do those Assembly instructions translate into machine code? Here's a line-by-line comparison (Table 6.1).
Assembly Code | Machine Code |
LDA #$48 | A9 48 |
JSR $FFEF | 20 EF FF |
LDA #$49 | A9 49 |
JSR $FFEF | 20 EF FF |
LDA #$20 | A9 20 |
JSR $FFEF | 20 EF FF |
LDA #$57 | A9 57 |
JSR $FFEF | 20 EF FF |
LDA #$21 | A9 21 |
JSR $FFEF | 20 EF FF |
JMP $FF1F | 4C 1F FF |
From this table, we can ascertain that $A9 is the opcode for LDA (when using Immediate Addressing). $A9 is followed by the value being loaded into the Accumulator. The opcode for JSR is $20. It's followed by the address of the subroutine in reverse order—low byte first, then high byte. JMP's opcode is $4C and uses the same addressing method as JSR.
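The encoding rules read off Table 6.1 are simple enough to check by machine. The sketch below hand-assembles the HI W! program in Python using only the three opcodes from the table; the helper names lda_imm and absolute are made up for this illustration and have nothing to do with how xa65 works internally.

```python
# Opcodes taken from Table 6.1: LDA immediate, JSR absolute, JMP absolute.
OPCODES = {"LDA#": 0xA9, "JSR": 0x20, "JMP": 0x4C}


def lda_imm(value):
    """LDA #value: opcode followed by the one-byte operand."""
    return [OPCODES["LDA#"], value & 0xFF]


def absolute(mnemonic, address):
    """JSR/JMP address: opcode, then low byte, then high byte."""
    return [OPCODES[mnemonic], address & 0xFF, (address >> 8) & 0xFF]


ECHO, MONITOR = 0xFFEF, 0xFF1F

program = []
for char in "HI W!":
    program += lda_imm(ord(char)) + absolute("JSR", ECHO)
program += absolute("JMP", MONITOR)

print(" ".join(f"{byte:02X}" for byte in program))
```

The printed bytes should match the machine code listed for hiw, including the reversed EF FF and 1F FF address pairs.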
TV Typewriter
Hello World demonstrated how to output text to the screen. This program will illustrate how to read it from the keyboard. First, we need to introduce two new Assembly instructions: RTS and BPL.
RTS is short for "Return from Subroutine." It is the equivalent of RETURN in BASIC. JSR (jump to subroutine) is used to enter a subroutine; RTS is used to return from it.
BPL is the mnemonic for "Branch on Plus." It's similar to an IF and GOTO statement in BASIC. BPL appears in the form:
BPL addr
BPL tests the Negative (N) flag, which here reflects the value just loaded into the Accumulator. If that value is positive or zero (bit 7 is 0), BPL branches to the location specified by addr. If it is negative (bit 7 is 1), BPL does not branch, and execution continues with the next instruction.
There are also two new memory addresses we need to acquaint ourselves with. They are KbdRdy ($D011) and Kbd ($D010). These addresses are both on the 6821 PIA. The value at KbdRdy is negative when there is data waiting to be read from the keyboard. When there is no data waiting, it is positive. The location Kbd contains the value of the character read in from the keyboard.
A subroutine to read data from the keyboard should loop (wait) until a character is available and then load that character into the Accumulator and return:
KbdRdy = $D011
Kbd = $D010
GetChar:
LDA KbdRdy ; Key ready?
BPL GetChar ; Loop until ready
LDA Kbd ; Load character
RTS ; Return
Note the use of the GetChar label in the above example. GetChar stands for the address at which the LDA KbdRdy instruction begins, and BPL branches to that address. The assembler works out the actual addresses and inserts the numbers.
Let's configure our TV Typewriter program so that it reads a character from the keyboard, echoes that character to the screen, and repeats this loop indefinitely:
Begin:
JSR GetChar ; Read character from keyboard and store in accumulator
JSR Echo ; Echo value in accumulator to screen
JMP Begin ; Loop forever
The completed program might look like this:
Echo = $FFEF
KbdRdy = $D011
Kbd = $D010
Begin:
JSR GetChar ; Read character from keyboard and store in accumulator
JSR Echo ; Echo value in accumulator to screen
JMP Begin ; Loop forever
GetChar:
LDA KbdRdy ; Key ready?
BPL GetChar ; Loop until ready
LDA Kbd ; Load character
RTS ; Return
Assemble the program with the instruction:
./xa -o tvtypewriter.hex -bt640 tvtypewriter
This will produce the machine code:
20 89 02 20 EF FF 4C 80 02 AD 11 D0 10 FB AD 10 D0 60
Precede that with the starting memory location (0280:) and copy the whole string into the Apple I Monitor, just as we did for Hello World. Hit R and Return to run the program. Your Apple I will now allow you to display whatever you can type. To clear the screen without halting the program, press the Clear button on the motherboard. To halt execution, press Reset.
X and Y
The X and Y registers are very similar to the Accumulator, but without the additional mathematical abilities discussed later. One mathematical function X and Y can perform, though, is increment. The INX and INY instructions increment X and Y by one, respectively. The converse is also available: DEX and DEY are used to decrement.
X and Y support the basic memory addressing operations supported by the Accumulator (see Appendix C for specific addressing modes). Several instructions allow us to transfer values between the Accumulator and the X and Y registers (Table 6.2).
Instruction | Description |
TAX | Transfer Accumulator to X |
TAY | Transfer Accumulator to Y |
TXA | Transfer X to Accumulator |
TYA | Transfer Y to Accumulator |
Note that it is not possible to transfer directly between X and Y without first going through the Accumulator.
In the sample program below, $00 is loaded into X. X is incremented and then transferred to the Accumulator. The Accumulator is echoed to screen, then the loop is repeated. Because X is incremented before it is first echoed, this prints every value from 1 through 255, wraps around to 0, and keeps going indefinitely. Since our display interprets all values as ASCII codes, you won't see numbers. Instead, you'll see the symbols those values represent.
Echo = $FFEF
Begin:
LDX #$00 ; Load 0 into X
Loop:
INX ; Increment X
TXA ; Transfer X to A
JSR Echo ; Echo A to screen
JMP Loop ; Repeat loop
To run the program, assemble it with xa65 and copy the hexadecimal code to the Apple I Monitor, as we did in the Hello World and TV Typewriter examples.
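The order of values the loop sends to Echo can be traced without the hardware. The following Python sketch models only the X register; echoed_values is a made-up helper name, and the 8-bit wrap-around is expressed with a mask rather than real 6502 behavior being emulated.

```python
def echoed_values(count):
    """Return the first `count` values the loop passes to Echo."""
    x = 0x00                 # LDX #$00
    values = []
    for _ in range(count):
        x = (x + 1) & 0xFF   # INX wraps from $FF back to $00
        values.append(x)     # TXA / JSR Echo
    return values


print(echoed_values(3))
```

Because INX runs before TXA, the first value echoed is 1, and 0 only appears after the register wraps around from 255.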
Memory Addressing
The 6502 has a myriad of addressing techniques used for moving data between memory and the processor's registers. The most basic of these are used constantly. Some of the more complex modes will seem almost esoteric, but they're invaluable under the right circumstances. The examples in this section refer to the data in Table 6.3.
Location | Contents |
X (register) | $02 |
Y (register) | $03 |
PC (register) | $C100 |
$0000 | |
$0001 | |
$0002 | $B3 |
$0003 | $5A |
$0004 | $EF |
$0017 | $10 |
$0018 | $D0 |
$002A | $35 |
$002B | $C2 |
$D010 | $33 |
$C238 | $2F |
Accumulator: A
The Accumulator is implied as the operand in Accumulator addressing and thus no address needs to be specified. An example is the ASL (Arithmetic Shift Left) instruction, covered later in this chapter. It is understood that the value in the Accumulator is always the value being shifted left.
Implied: i
With Implied addressing, the operand is implied and therefore does not need to be specified. An example is the TXA instruction.
Immediate: #
Immediate addressing loads the operand directly into the Accumulator. In this example:
LDA #$22
The Accumulator now contains the value $22.
Absolute: a
Absolute addressing specifies a full 16-bit address. For example:
LDA $D010
The Accumulator now contains $33, the value stored at location $D010.
Zero Page: zp
Zero Page addressing uses a single byte to specify an address on the first page of memory ($00xx). Thus, in the example:
LDA $02
The value $B3 at location $0002 is loaded into the Accumulator.
Relative: r
Relative addressing specifies memory locations relative to the present address. The current address is stored in the Program Counter (PC). Relative addressing adds the address in PC to the specified offset. In this example:
BPL $2D
PC is $C100 and the offset is $2D, so the new address is $C12D.
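The offset is a signed byte, so a branch can also go backwards. The Python sketch below works through the arithmetic in both directions; the function names are hypothetical, and pc is taken to mean the address of the instruction that follows the branch, which is where the Program Counter points when the offset is added.

```python
def branch_target(pc, offset_byte):
    """Add a signed 8-bit offset to the program counter."""
    # Offsets $80-$FF are negative in two's complement.
    offset = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    return (pc + offset) & 0xFFFF


def branch_offset(pc, target):
    """Return the offset byte that takes `pc` to `target`."""
    offset = target - pc
    if not -128 <= offset <= 127:
        raise ValueError("branch target out of range")
    return offset & 0xFF


print(f"${branch_target(0xC100, 0x2D):04X}")
print(f"${branch_offset(0x028E, 0x0289):02X}")
```

The second call reproduces the 10 FB in the TV Typewriter machine code: the BPL sits at $028C, the next instruction is at $028E, and GetChar is at $0289, five bytes back.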
Absolute Indexed with X: a,x
With Absolute Indexed addressing with X, the value stored in X is added to the specified address. In this example:
LDA $0001,X
The value in X is $02. The sum of $0001 and $02 is $0003; therefore, the value at location $0003, which is $5A, is loaded into the Accumulator.
Absolute Indexed with Y: a,y
Absolute Indexed addressing with Y works the same as it does with X. In this example:
LDA $0001,Y
The value in Y is $03. The sum of $0001 and $03 is $0004, so the value at location $0004, which is $EF, is loaded into the Accumulator.
Zero Page Indexed with X: zp,x
Zero Page Indexed addressing is like Absolute Indexed addressing, but limited to the zero page. In the example:
LDA $01,X
X contains $02, which is added to $01 to present the zero page address $03, and is equivalent to the absolute address $0003. $0003 contains $5A, which is loaded into the Accumulator.
Zero Page Indexed with Y: zp,y
Zero Page Indexed addressing with Y works the same as it does with X. In the example:
LDA $01,Y
Y contains $03, which is added to $01 to present the zero page address $04, and is equivalent to the absolute address $0004. $0004 contains $EF, which is loaded into the Accumulator.
Absolute Indexed Indirect: (a,x)
With Absolute Indexed Indirect addressing, the value stored in X is added to the specified address. In this example:
JMP ($0001,X)
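The value of X is $02, so the processor reads the address stored in locations $0003 and $0004 (low byte first, which gives $EF5A with the data in Table 6.3) and jumps to that address. This mode of addressing is only supported by the JMP instruction, and only on the 65C02; the original 6502 does not have it.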
Zero Page Indexed Indirect: (zp,x)
Zero Page Indexed Indirect addressing first adds X to the zero page address, such as in this example:
LDA ($15,X)
X contains $02; so, the address $0017 is generated. Zero Page Indexed Indirect addressing next goes to that address and the following address ($0017 and $0018), and loads the values contained there as an address. $0017 contains $10 and $0018 contains $D0; consequently, the address $D010 is loaded. Finally, the processor goes to this address and loads the value it contains, $33, into the Accumulator.
Zero Page Indirect Indexed with Y: (zp),y
Zero Page Indirect Indexed addressing with Y begins by fetching the value at the zero page address and the zero page address plus one. In our example:
LDA ($2A),Y
The values at locations $2A and $2B would be fetched. $2A contains $35 and $2B contains $C2. This gives us the address $C235. Next, the value of Y, which is $03, is added to this address to create the address $C238. Location $C238 contains the value $2F, which is loaded into the Accumulator.
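The indexed and indirect modes are easier to follow when the address arithmetic is written out. This Python sketch recomputes the examples above from the data in Table 6.3; the function names are invented for the illustration, and it models only the effective address calculation, not the processor.

```python
# Register and memory contents from Table 6.3.
X, Y = 0x02, 0x03
MEMORY = {
    0x0002: 0xB3, 0x0003: 0x5A, 0x0004: 0xEF,
    0x0017: 0x10, 0x0018: 0xD0,
    0x002A: 0x35, 0x002B: 0xC2,
    0xD010: 0x33, 0xC238: 0x2F,
}


def read_word(address):
    """Read a 16-bit address stored low byte first."""
    return MEMORY[address] | (MEMORY[address + 1] << 8)


def absolute_x(base):            # a,x
    return (base + X) & 0xFFFF


def zero_page_y(base):           # zp,y (the result stays on page zero)
    return (base + Y) & 0xFF


def indexed_indirect(zp):        # (zp,x): add X first, then follow the pointer
    return read_word((zp + X) & 0xFF)


def indirect_indexed(zp):        # (zp),y: follow the pointer first, then add Y
    return (read_word(zp) + Y) & 0xFFFF


for name, address in [
    ("$0001,X", absolute_x(0x0001)),
    ("$01,Y", zero_page_y(0x01)),
    ("($15,X)", indexed_indirect(0x15)),
    ("($2A),Y", indirect_indexed(0x2A)),
]:
    print(f"{name:8} -> ${address:04X} contains ${MEMORY[address]:02X}")
```

The difference between the last two modes is purely the order of operations: (zp,x) indexes before the pointer is read, while (zp),y indexes after.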
Interacting with Memory
We used LDA as our sample instruction for memory addressing to load the Accumulator, but the same can be done with LDX and LDY to load the X and Y registers. Data can be transferred from the registers to memory using the store instructions STA, STX, and STY. A few of the more esoteric addressing modes are only supported by particular instructions. See Appendix C for details.
This program demonstrates some simple memory interactions, in which we store three variables in memory and then later retrieve them. Tables 6.4 and 6.5 show the contents of the registers before and after running the program, respectively.
Register | Value |
A | $73 |
X | $E2 |
Y | $90 |
STA $05 ; Store $73 at $0005
STX $1023 ; Store $E2 at $1023
STY $00FF ; Store $90 at $00FF
LDA $1023 ; Load $E2 from $1023
LDX $FF ; Load $90 from $00FF
LDY $0005 ; Load $73 from $0005
Register | Value |
A | $E2 |
X | $90 |
Y | $73 |
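The before and after tables can be confirmed by stepping through the six instructions by hand, or with a small model such as the Python sketch below. The tuple format used to describe each instruction is an assumption made for this example; it is not assembler syntax.

```python
registers = {"A": 0x73, "X": 0xE2, "Y": 0x90}   # Table 6.4
memory = {}

# Each step is (operation, register, address), mirroring the listing above.
program = [
    ("store", "A", 0x0005),
    ("store", "X", 0x1023),
    ("store", "Y", 0x00FF),
    ("load", "A", 0x1023),
    ("load", "X", 0x00FF),
    ("load", "Y", 0x0005),
]

for operation, register, address in program:
    if operation == "store":
        memory[address] = registers[register]
    else:
        registers[register] = memory[address]

print({name: f"${value:02X}" for name, value in registers.items()})
```

Note that $05 and $0005 name the same location, as do $FF and $00FF; only the addressing mode used to reach them differs.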
Printing Strings
The program in this section will introduce two new functions: the ability to assemble strings of characters and the ability to branch on a condition. ASCII characters can be stored in memory just like instructions, and by using the ASCII table in Appendix A we could manually enter the hexadecimal values into the Apple I's memory, just as we entered the instructions. The xa65 assembler provides a pseudo-opcode, .asc, which automatically converts a string of characters to hexadecimal. The statement:
.asc "HELLO WORLD"
Will enter "HELLO WORLD" as ASCII, wherever the statement is placed.
The Branch on Result Zero instruction (BEQ) branches (jumps) to a new memory location whenever the Z (Zero) flag is high. BEQ uses relative addressing, so the value following the BEQ instruction is added to the current address (in the program counter) to get the new address.
The Z flag can be set using the Compare (CMP) instruction. CMP compares the value in the Accumulator to the value specified after CMP by subtracting the latter from the former and discarding the result. If the value in the Accumulator is less than the value in memory, then C equals 0. If they are equal, then Z and C equal 1. If the value in the Accumulator is greater than the value in memory, then C equals 1 and Z equals 0. N is a copy of bit 7 of the difference. In the following example, $AA is compared to the value in the Accumulator:
0000: CMP #$AA
0002: BEQ $05
0004: INX
If the Accumulator does not contain $AA, then the Z flag is reset to 0 and BEQ does not cause a branch. If the Accumulator does contain $AA, then Z is set to 1 and BEQ causes a branch to $05 + program counter. The program counter is at $0004 (the next instruction); therefore, the program branches to $0009 and continues execution at that location.
CPX and CPY work the same as CMP, but use the X and Y registers instead of the Accumulator.
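The flag results of a compare can be tabulated with a short model. The Python sketch below follows the standard 6502 definition of CMP as a subtraction whose result is thrown away; the function name compare and the dictionary it returns are illustrative choices only. The same arithmetic applies to CPX and CPY with the other registers.

```python
def compare(register, value):
    """Return the N, Z and C flags after comparing register with value."""
    result = (register - value) & 0xFF
    return {
        "N": (result >> 7) & 1,              # bit 7 of the difference
        "Z": 1 if result == 0 else 0,        # set when the values are equal
        "C": 1 if register >= value else 0,  # set when no borrow is needed
    }


for accumulator in (0x10, 0xAA, 0xAB):
    print(f"A=${accumulator:02X}", compare(accumulator, 0xAA))
```

For branching on equality only Z matters, which is why BEQ is the natural partner for CMP in the loop that follows. N mirrors bit 7 of the difference, so it is not a reliable "less than" indicator for unsigned values; C is.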
The program displayed later uses branching to print an array of characters. The characters begin at the label string, such that the instruction
LDA string
will load the first value at string, in this case 'H', represented by ASCII $48. The instruction:
LDA string + 1
will load the second value in the array, which is 'E', or $45 in ASCII.
With this strategy, our program can use Absolute Indexed addressing to specify the offset (i.e. which character to load).
LDA string,X
The above instruction loads the value at string + X into the Accumulator. By starting X at zero and repeatedly incrementing it, every character in the array can be reached.
How do we know when to stop incrementing? The NULL character, $00, can be appended to the end of the string:
.asc "HELLO WORLD",$00
After loading the character into the Accumulator, we can now compare it to the NULL character. If it is the NULL character, we know we have reached the end of the string and can branch to done:
CMP #$00 ; Compare char to NULL ($00)
BEQ done ; If NULL break out of loop
The following program uses the above methods to print to screen a character array of undetermined length:
Echo = $FFEF
Monitor = $FF1F
Begin:
LDX #$00 ; Our loop expects X to start at 0
print:
LDA string,X ; Load the next character
CMP #$00 ; Compare char to NULL ($00)
BEQ done ; If NULL break out of loop
JSR Echo ; If not NULL, print the character
INX ; Increment X, so we can get at the next character
JMP print ; Loop again
done:
JMP Monitor ; Return to Monitor
string:
.asc "HELLO WORLD",$00 ; ASCII string, $00 is NULL
Assemble this program and examine it in HexEdit. You will see your string displayed in ASCII on the right side of the window. Highlight the string and the corresponding ASCII values will be boxed (Figure 6.2). You can see NULL ($00) following immediately after the string. Preceding the character string is the rest of the program.
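If HexEdit is not to hand, the bytes produced by the .asc line and the behavior of the print loop can be previewed with a rough Python model. The helpers asc and print_string are hypothetical names for this sketch, and a plain list stands in for the Apple I's memory.

```python
def asc(text, *extra):
    """Mimic `.asc "text",byte,...`: ASCII codes followed by literal bytes."""
    return list(text.encode("ascii")) + list(extra)


def print_string(memory, start):
    """Follow the print loop: read bytes from `start` until NULL is found."""
    output = []
    x = 0                             # LDX #$00
    while memory[start + x] != 0x00:  # LDA string,X / CMP #$00 / BEQ done
        output.append(chr(memory[start + x]))   # JSR Echo
        x += 1                        # INX
    return "".join(output)


string = asc("HELLO WORLD", 0x00)
print(" ".join(f"{byte:02X}" for byte in string))
print(print_string(string, 0))
```

On the real machine X is a single byte, so this approach only reaches strings shorter than 256 characters; the model does not enforce that limit.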
String Subroutine
The objective in this section is to develop a subroutine that can be used by any program to print any string. This subroutine draws heavily from the program developed in the previous section. The primary difference is that instead of hard-coding the location of the string, we want it to be able to load characters from any address. Since the Accumulator, X, and Y registers are only 8-bit, passing a 16-bit address is rather tricky.
One possible workaround is to store the string's address in a pre-defined memory location, and then have the subroutine load it from that location. This can be done with Zero Page Indirect Indexed addressing. This addressing mode is only supported by the Y register; as a result, all our previous uses of the X register must be changed to Y.
If we load the string's address into $00 and $01, then we can access the characters in the string with the command:
LDA ($00),Y
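Storing a 16-bit address in two one-byte locations means splitting it into a low byte and a high byte, which the listing further on does with the assembler's < and > operators. The Python sketch below shows the split and the reassembly; the address $0299 is just an example value, and the function names are assumptions for this illustration.

```python
def low_byte(address):
    """Equivalent of the assembler's < operator."""
    return address & 0xFF


def high_byte(address):
    """Equivalent of the assembler's > operator."""
    return (address >> 8) & 0xFF


def pointer_at(zero_page, location):
    """Rebuild the 16-bit address stored at location and location + 1."""
    return zero_page[location] | (zero_page[location + 1] << 8)


string_address = 0x0299          # hypothetical address of a string
zero_page = {0x00: low_byte(string_address),    # LDA #<addr / STA $00
             0x01: high_byte(string_address)}   # LDA #>addr / STA $01
print(f"${pointer_at(zero_page, 0x00):04X}")
```

The low byte goes in the lower location, matching the low-byte-first order the processor uses for addresses everywhere else.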
The following program and subroutine demonstrate this method. This subroutine could be assembled separately and stored in ROM for use with any program. If you want to use this subroutine frequently but don't have the tools to program a ROM, you can load it into a high location of memory. Since memory is preserved through resets, you'll only lose the code when power is turned off.
Echo = $FFEF
Monitor = $FF1F
Begin:
LDA #<welcome ; Load low byte of string's addr (indicated by <)
STA $00 ; Save addr to $00
LDA #>welcome ; Load high byte of string's addr (indicated by >)
STA $01 ; Save addr to $01
JSR printsub ; Print the string
LDA #<goodbye ; Load low byte of string's addr
STA $00 ; Save addr to $00
LDA #>goodbye ; Load high byte of string's addr
STA $01 ; Save addr to $01
JSR printsub ; Print the string
JMP Monitor ; Return to Monitor
/* Strings */
welcome:
.asc "Welcome to the Apple I",$0D,$00 ; $0D is carriage return
; $00 is NULL
goodbye:
.asc "That was a short demo.",$00
/*
Name: PrintSub
Preconditions: Address of string to print is stored at $00,$01.
Postconditions: The string is printed to screen.
Destroys: A, Y, Flags.
Description: Prints the string at the address at $00,$01.
*/
printsub:
LDY #$00 ; Our loop expects Y to start at 0
print:
LDA ($00),Y ; Load the next character
CMP #$00 ; Compare char to NULL ($00)
BEQ done ; If NULL break out of loop
JSR Echo ; If not NULL, print the character
INY ; Increment Y, so we can get at the next character
JMP print ; Loop again
done:
RTS
This program exceeds the Monitor's maximum line length, so the simplest way to transfer it is to enter one line at a time. Each line in HexEdit contains 16 ($10) bytes, so you'll need to increment the memory address by $10 with each line you copy. Entry will look like this:
280: A9 99 85 00 A9 02 85 01 20 C8 02 A9 B1 85 00 A9
290: 02 85 01 20 C8 02 4C 1F FF 57 65 6C 63 6F 6D 65
2A0: 20 74 6F 20 74 68 65 20 41 70 70 6C 65 20 49 0D
2B0: 00 54 68 61 74 20 77 61 73 20 61 20 73 68 6F 72
2C0: 74 20 64 65 6D 6F 2E 00 A0 00 B1 00 C9 00 F0 07
2D0: 20 EF FF C8 4C CA 02 60
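Working out the address for each line by hand is error-prone, so it may be worth generating the entry lines instead. The Python sketch below splits a block of assembled bytes into Monitor entry lines; monitor_entry_lines is a made-up helper, and the sample input is simply the first 20 bytes of the listing above.

```python
def monitor_entry_lines(start, data, per_line=16):
    """Split `data` into Monitor entry lines of at most `per_line` bytes."""
    lines = []
    for offset in range(0, len(data), per_line):
        chunk = data[offset:offset + per_line]
        hex_bytes = " ".join(f"{byte:02X}" for byte in chunk)
        lines.append(f"{start + offset:04X}: {hex_bytes}")
    return lines


# First 20 bytes of the listing above, used here as sample input.
sample = bytes.fromhex("A9998500A902850120C802A9B18500A902850120")
for line in monitor_entry_lines(0x0280, sample):
    print(line)
```

To use it on a whole program, read the assembled file's bytes and pass them in place of sample along with the load address.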
Bit Representation
Our video circuitry can only print ASCII characters, but the processor instructions are geared primarily towards integer operations. The subroutine presented in this section will print the Accumulator's contents in its binary representation. For example, if $05 is in the Accumulator, this subroutine will print "00000101".
There are many possible ways to approach writing this subroutine. One such way is to use the negative (N) flag. In Two's Complement notation this flag indicates that the highest bit (bit 7) is a 1. If we rotate the byte so that each bit has its turn in the bit 7 position, N is updated after each rotate. By branching whenever N is 1 to print an ASCII '1', and not branching whenever it is 0 to print an ASCII '0', we can print a 1 or 0 for each bit.
Five new instructions are used in this subroutine: CPY, ROL, ROR, BMI, and BNE.
Compare Y (CPY) is the equivalent of CMP for the Y register. The instruction:
CPY #$08
compares the value in the Y register to the value $08. If they are equal, the Zero (Z) flag is set. Branch on Result Not Zero (BNE) branches whenever Z equals 0. We can use this combination of instructions to create a loop:
LDY #$00 ; Initialize Y to 0
loop:
INY ; Increment Y
CPY #$08 ; Compare Y to $08
BNE loop ; If Y != $08 then loop
The above program initializes Y to $00 and then loops, incrementing Y until it equals $08.
Branch on Minus (BMI) branches when N equals 1. The N flag is set whenever bit 7 is high, by numerous instructions (see Appendix D), such as LDA and CMP. For example:
loop:
LDA #$80
BMI loop
will load the value $80 (1000 0000b) into the Accumulator and set N to 1 since bit 7 is high. Because N is high, BMI will branch to loop, and the program will run indefinitely.
Rotate Right (ROR) and Rotate Left (ROL) rotate the contents of the Accumulator or a memory address. The 'Carry' (C) is included in the rotation, such that if you did a ROL on the value:
0000 1111 C=1
the result would be:
0001 1111 C=0
It takes nine rotates to return the memory or Accumulator contents to their original values. Each time a rotate is done, the N flag reflects the new bit 7.
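A rotate through the Carry behaves like a rotation of a nine-bit value, which is why nine steps are needed to get back to the start. The Python sketch below models both directions; the function names and the (value, carry, negative) return tuple are conventions chosen for this example.

```python
def rol(value, carry):
    """Rotate an 8-bit value left through the carry; return (value, carry, n)."""
    new_carry = (value >> 7) & 1                 # old bit 7 moves into C
    new_value = ((value << 1) | carry) & 0xFF    # old C moves into bit 0
    negative = (new_value >> 7) & 1              # N mirrors the new bit 7
    return new_value, new_carry, negative


def ror(value, carry):
    """Rotate an 8-bit value right through the carry; return (value, carry, n)."""
    new_carry = value & 1                        # old bit 0 moves into C
    new_value = (value >> 1) | (carry << 7)      # old C moves into bit 7
    negative = (new_value >> 7) & 1
    return new_value, new_carry, negative


value, carry, negative = rol(0b00001111, 1)
print(f"{value:08b} C={carry} N={negative}")
```

A ROR followed by a ROL (or the reverse) restores both the byte and the Carry, provided nothing in between changes the Carry.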
The PrintBin subroutine below uses these new instructions to print the binary representation of whatever value is passed to it in the Accumulator:
Echo = $FFEF
Monitor = $FF1F
Begin:
LDA #$00
JSR printbin ; Print $00 in binary
LDA #$FF
JSR printbin ; Print $FF in binary
LDA #$AA
JSR printbin ; Print $AA in binary
JMP Monitor
/*
Name: PrintBin
Preconditions: Byte to print is in Accumulator.
Postconditions: The string is printed to screen.
Destroys: A, X, Y, Flags.
Description: Prints the binary representation of the value in A.
*/
printbin:
LDY #$00 ; Initialize counter to 0
ROR ; ROR once so initial setup will mesh with the loop
TAX ; Loop expects value to be in X register
begin:
TXA ; Restore value to Accumulator
ROL ; Rotate next bit into bit 7's location
BMI neg ; If that bit is a 1, then branch to neg
TAX ; The bit is 0; save byte to X
LDA #$30 ; Load ASCII '0' for printing
JSR Echo ; Print 0
JMP continue ; Skip over code for printing a 1
neg:
TAX ; Save byte to X
LDA #$31 ; Load ASCII '1' for printing
JSR Echo ; Print 1
continue:
INY ; Increment counter
CPY #$08 ; Compare counter to 8 (number of bits to print)
BNE begin ; If counter != 8, then not all bits have been printed,
; so loop again, otherwise continue
LDA #$0D ; Load ASCII Carriage Return (CR)
JSR Echo ; Print CR to screen
RTS ; Exit subroutine
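To know what output to expect from PrintBin before running it, the bit-testing idea can be modelled on a desktop machine. This Python sketch is not a translation of the subroutine instruction by instruction; it only reproduces the order in which the bits are examined, and print_bin is a name invented for the illustration.

```python
def print_bin(value):
    """Return the eight characters PrintBin would echo for `value`."""
    digits = []
    for _ in range(8):
        # Test bit 7 (the BMI check), then move the next bit into its place.
        digits.append("1" if value & 0x80 else "0")
        value = (value << 1) & 0xFF
    return "".join(digits)


for sample in (0x00, 0xFF, 0xAA):
    print(print_bin(sample))
```

The three sample values are the ones the demonstration program passes to the subroutine, each of which is followed by a carriage return on the Apple I.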
Using the Stack
A great feature of the Echo subroutine is that it doesn't destroy the contents of any registers—you can call Echo and pick up right where you left off without worrying about the contents of the Accumulator or the X and Y registers being any different. The PrintBin subroutine, by contrast, destroys the contents of all three. We can preserve the contents of these registers by saving them to the stack.
The stack is a data structure that only allows data to be added or removed at the topmost location. Data is either pushed onto the top of the stack or popped (sometimes called pulled) off the top of the stack. Figure 6.3 illustrates data being pushed and popped.
The 6502 uses the stack to store the return address when jumping to a subroutine (JSR). When returning from a subroutine (RTS), the address is popped from the stack and loaded into the Program Counter so the processor can continue operation where it left off. The stack is allocated the addresses between $0100 and $01FF. Overwrite data within this range and you risk corrupting the contents of the stack.
We can also use the stack for our own purposes, using the commands PHA, PLA, PHP, PLP, TSX, and TXS.
Push Accumulator on Stack (PHA) pushes the value in the Accumulator onto the top of the stack. Pull Accumulator from Stack (PLA) pops the top value off the stack and places it in the Accumulator. If we were to put the actions in Figure 6.3 into code, it would look like this:
LDA #$4B
PHA ; Push $4B
LDA #$79
PHA ; Push $79
PLA ; Pull $79
LDA #$AF
PHA ; Push $AF
PLA ; Pull $AF
PLA ; Pull $4B
In this example, every time a PLA instruction is issued, the contents of the Accumulator are overwritten. Normally, something would be done with the data after popping it.
The stack pointer is the address of the top of the stack. The value is stored in its own register, S. The address changes every time you push or pop a byte. If you want to manipulate this address, you can copy it into the X register using the Transfer Stack Pointer to Index X (TSX) instruction. You can copy your own stack pointer into S using the Transfer Index X to Stack Register (TXS) instruction.
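As a small illustration of TSX, the sketch below pushes a byte and then reads it back without popping it. It relies on two facts about the 6502: the stack lives in page $01, and S points at the next free slot, so the most recently pushed byte sits at $0101 + S. The value $4B is simply borrowed from the Figure 6.3 example.
LDA #$4B
PHA ; Push $4B; S now points one byte below it
TSX ; Copy the stack pointer into X
LDA $0101,X ; Read the top byte without popping it (A = $4B)
PLA ; Pop it for real so the stack is balanced again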
Push Processor Status on Stack (PHP) and Pull Processor Status from Stack (PLP) are used to preserve the contents of the processor flags (e.g. N, V, etc.). This is particularly useful when you don't want a subroutine to destroy your flags. Using PHP before the subroutine is called and PLP after it returns will ensure that they are not lost. This is a concept we can use to preserve all our register contents when calling a subroutine.
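A minimal sketch of that idea, assuming the printbin subroutine listed above is appended to the program. The carry flag is set, saved with PHP, and is set again after the call even though printbin changes the flags internally. The value $A5 is an arbitrary example.
LDA #$A5 ; Arbitrary example value to print
SEC ; Set Carry so there is a flag worth preserving
PHP ; Push the processor status (N, V, Z, C, etc.)
JSR printbin ; printbin alters the flags (and A, X, Y)
PLP ; Pull the status back; Carry is set again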
Suppose we want to print the binary representation of every integer from 0 to 255. This is best done with a loop:
LDX #$00 ; Initialize counter to 0
JMP start ; Skip over increment first time through the loop
next:
INX ; Increment the counter
start:
TXA ; printbin requires the value be in the accumulator
JSR printbin ; print the value as binary
CPX #$FF ; If we've reached 255 ($FF), we're done
BNE next ; If not, loop again
JMP Monitor
There's just one problem – the X register is overwritten each time the PrintBin subroutine is called, destroying our counter. A possible fix for this is to push X onto the stack before calling the subroutine, then pop it after the subroutine returns, which could be done like this:
LDX #$00 ; Initialize counter to 0
JMP start ; Skip over increment first time through the loop
next:
INX ; Increment the counter
start:
TXA ; printbin requires the value be in the accumulator
PHA ; Save X (already in the accumulator), before subroutine
JSR printbin ; print the value as binary
PLA ; Retrieve X from Stack, place it in the accumulator
TAX ; Copy X from accumulator to X register
CPX #$FF ; If we've reached 255 ($FF), we're done
BNE next ; If not, loop again
JMP Monitor
Another possibility is to modify the PrintBin subroutine so that it does not destroy the registers. This can be done by pushing all registers onto the stack at the start of the subroutine and popping them at the end:
/*
Name: PrintBin2
Preconditions: Byte to print is in Accumulator.
Postconditions: The string is printed to screen.
Destroys: Flags
Description: Prints the binary representation of the value in A.
*/
printbin2:
STA temp ; Save Accumulator for later
PHA ; Push Accumulator
TXA
PHA ; Push X (via A)
TYA
PHA ; Push Y (via A)
LDA temp ; Restore the Accumulator
JSR printbin ; Now that registers are saved, call the ordinary
; PrintBin subroutine, which does its own setup
PLA
TAY ; Recover Y
PLA
TAX ; Recover X
PLA ; Recover Accumulator
RTS ; Exit subroutine
temp:
.byt $00 ; Temporary storage location for Accumulator value
Which method is better? It depends on the application. If it doesn't matter that your subroutine destroys the registers, then there's no point in wasting time and space in the subroutine to preserve them. On the rare occasions you do want to preserve the registers, it would be more efficient to do so in the calling program. On the other hand, there are some subroutines such as Echo that get called very frequently. Imagine having to first push and then pop the contents of all the registers each time Echo is called. In a case such as this, it's much less frustrating and more space-efficient to put the stack operations inside the subroutine.
Bit Manipulation
The instructions AND Memory with Accumulator (AND), OR Memory with Accumulator (ORA), and Exclusive-OR Memory with Accumulator (EOR) are used to perform bit-by-bit operations on data. These are the same operations discussed in Chapter 2, but while in that chapter they are applied to single bits, here they are applied to bytes on a bit-by-bit basis. To AND $AC with $E6, for example, produces $A4:
1010 1100 = $AC
AND 1110 0110 = $E6
1010 0100 = $A4
In an AND operation, each resulting bit is high only if both bits in that column are high. This is very useful for masking. Suppose you want to isolate the lower four bits from the upper—that is, you want to change $B6 into just $06. Use $0F as a mask:
1011 0110 = $B6
AND 0000 1111 = $0F
0000 0110 = $06
To turn $B6 into just $0B, use $F0 as a mask and then shift right four times with LSR. (ROR would not do here, because it rotates the carry flag into bit 7.)
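A short sketch of the high-nibble case, using the $B6 value from above. It shifts with LSR, which always feeds a 0 into bit 7, so the result does not depend on whatever the carry flag held beforehand.
LDA #$B6 ; 1011 0110
AND #$F0 ; 1011 0000: keep only the upper four bits
LSR ; 0101 1000
LSR ; 0010 1100
LSR ; 0001 0110
LSR ; 0000 1011 = $0B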
Suppose that, by some terrible calamity, you come upon a string of text that contains lower-case characters. Masking can be used to make these upper-case. Upper-case ASCII characters have hex values in the range of $41 (0100 0001b) for 'A' to $5A (0101 1010b) for 'Z'. Lower-case characters are in the range $61 (0110 0001b) for 'a' to $7A (0111 1010b) for 'z'. Note that the values for 'A' and 'a' are identical, except for bit 5. The same applies to 'Z' and 'z', as well as the other letters. To change a character from lower- to upper-case, you only need to change bit 5 from 1 to 0.
This can be done programmatically by masking the ASCII value with 1101 1111b. This passes through every bit except bit 5, which is forced to a 0. Here are some examples:
0110 0001 = $61 = 'a'
AND 1101 1111 = $DF
0100 0001 = $41 = 'A'
0100 0001 = $41 = 'A'
AND 1101 1111 = $DF
0100 0001 = $41 = 'A'
0111 0011 = $73 = 's'
AND 1101 1111 = $DF
0101 0011 = $53 = 'S'
The following program uses a mask with $DF to convert all letters in a string from lower- to upper-case; it will leave upper-case letters untouched, but will affect a few non-letter characters above $5A:
Echo = $FFEF
Monitor = $FF1F
LDY #$00 ; Initialize counter to 0
caps:
LDA string,Y ; Load in first character
CMP #$00 ; Is this character null?
BEQ done ; If so, we're done converting
CMP #$41 ; Is this character below 'A' ($41)?
BCC skip ; If so, it is not a letter; skip it
AND #$DF ; Mask with $DF to convert to uppercase
STA string,Y ; Overwrite original character with the new
skip:
INY ; Increment the counter
JMP caps ; Loop to the next character
done:
LDA #<string ; Save memory address of string for access by PrintSub
STA $00
LDA #>string
STA $01
JSR printsub ; Print the new uppercase string
JMP Monitor
string:
.asc "lowercase string, MOSTLY",$00
/* Append PrintSub subroutine here */
While AND is useful for clearing bits to 0, ORA can be used to set them to 1. Suppose that contrary to the last example, you now want to convert all letters from upper- to lower-case. Bit 5 would then need to be set high. This can be done with an ORA operation:
0100 0001 = $41 = 'A'
ORA 0010 0000 = $20
0110 0001 = $61 = 'a'
0110 0001 = $61 = 'a'
ORA 0010 0000 = $20
0110 0001 = $61 = 'a'
0101 0011 = $53 = 'S'
ORA 0010 0000 = $20
0111 0011 = $73 = 's'
To modify the above program so that it converts upper- to lower-case instead of vice-versa, it is only necessary to change one line:
AND #$DF ; Mask with $DF to convert to uppercase
so that it reads
ORA #$20 ; Set bit 5 to convert to lowercase
Finally, the Exclusive-OR (EOR) command is useful for flipping bits:
0010 1010 = $2A
EOR 1111 1111 = $FF
1101 0101 = $D5
1111 0000 = $F0
EOR 1111 1111 = $FF
0000 1111 = $0F
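Because EOR flips exactly the bits that are set in the mask, a mask of $20 toggles bit 5 and therefore swaps the case of a letter in either direction. A brief sketch, assuming Echo lives at $FFEF as in the earlier programs and leaves the Accumulator intact:
Echo = $FFEF
LDA #$61 ; 'a' = 0110 0001
EOR #$20 ; Flip bit 5: 0100 0001 = $41 = 'A'
JSR Echo ; Print A
EOR #$20 ; Flip bit 5 again: back to $61 = 'a'
JSR Echo ; Print a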
One particularly interesting feature of EOR is that it can be used to swap two values without using a temporary variable. Under normal circumstances, if we were trying to swap two values stored in memory, the code would look something like this:
LDA $00 ; Load both values into registers
LDX $01
STA $01 ; Store both values to memory
STX $00
But what if only one register is available? The following code will swap the contents of $00 and $01 using only the Accumulator:
LDA $00 ; Load v1
EOR $01 ; A = v1 xor v2
STA $00 ; $00 now holds v1 xor v2
EOR $01 ; A = (v1 xor v2) xor v2 = v1
STA $01 ; $01 now holds v1
EOR $00 ; A = v1 xor (v1 xor v2) = v2
STA $00 ; $00 now holds v2
AND, ORA and EOR are powerful tools for bit manipulation. Most of the examples in this chapter deal with ASCII manipulation where they are of limited usefulness, but for other applications, such as interacting with peripherals, they are invaluable.
Math
The 6502 contains instructions for addition and subtraction, and can work in both binary and decimal modes. The examples in this section will use binary mode, but you can find many examples of decimal mode and Binary Coded Decimal (BCD) on the Internet.
Add Memory to Accumulator with Carry (ADC) is the instruction for addition. The carry bit is a flag in the Processor Status Register that indicates there is a bit to carry beyond the 8th place. Here are a few examples of simple arithmetic, with carry first reset to 0:
0000 0001 = $01
+ 0000 0001 = $01
0000 0010 = $02
0000 0001 = $01
+ 0000 0011 = $03
0000 0100 = $04
0000 0011 = $03
+ 0000 0011 = $03
0000 0110 = $06
0000 1111 = $0F
+ 0000 1111 = $0F
0001 1110 = $1E
0110 1110 = $6E
+ 0001 1111 = $1F
1000 1101 = $8D
1001 1101 = $9D
+ 0110 0001 = $61
1111 1110 = $FE
This last example came dangerously close to overflowing the available 8 bits. The largest value a single byte can hold is 255. To calculate values larger than this, the carry flag is needed. The carry flag allows us to continue our operation into the next byte, with a carry, so that no part of the value is lost. For example:
C 1
0000 0000 = $00 1000 0100 = $84
+ 0000 0000 = $00 + 1001 0010 = $92
0000 0001 = $01 C=1 0001 0110 = $16
The sum of $84 and $92 exceeds $FF and our byte overflows. This overflow sets the carry flag and we can continue our addition in the next byte with a sum of $01. Putting the bytes together, we get the grand sum of $0116.
By using two bytes we can address values up to 65,535. Suppose we were to add $42A5 to $15D2. The same layout can be used:
C 1
0100 0010 = $42 1010 0101 = $A5
+ 0001 0101 = $15 + 1101 0010 = $D2
0101 1000 = $58 C=1 0111 0111 = $77
The method is equally effective when no carry flag is necessary, such as in the case of adding $3E1F to $1A73:
C 0
0011 1110 = $3E 0001 1111 = $1F
+ 0001 1010 = $1A + 0111 0011 = $73
0101 1000 = $58 C=0 1001 0010 = $92
The carry flag can be manually set or cleared using the Set Carry Flag (SEC) and Clear Carry Flag (CLC) instructions, respectively. Many instructions set the carry flag; consequently, it is important to clear the flag before performing addition. Below is the code for adding $61 to $9D:
LDA #$9D ; $9D + ...
CLC ; Make sure Carry is not already set
ADC #$61 ; $9D + $61
JSR printbin ; Print the result in binary
Sixteen-bit addition is only slightly more complex. Suppose we want to add the value in the locations $00,$01 to $02,$03 and store the result in $04,$05. First the low bytes are added and then the high bytes, with carry:
CLC ; Make sure Carry is not already set
LDA $00
ADC $02 ; Add the low bytes
STA $04 ; And save sum to memory
LDA $01
ADC $03 ; Add the high bytes (note that Carry was NOT cleared)
STA $05 ; And save sum to memory
LDA $04
JSR printbin ; Print low byte
LDA $05
JSR printbin ; Print high byte
JMP Monitor
Subtraction, performed with Subtract Memory from Accumulator with Borrow (SBC), is handled just like addition, except it expects the carry flag to be high when it starts. Before issuing the first SBC instruction, it's always necessary to set the carry flag using SEC.
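A sketch of sixteen-bit subtraction, reusing the memory layout of the addition program above purely for illustration: the value in $02,$03 is subtracted from the value in $00,$01 and the difference is stored in $04,$05.
SEC ; Carry must be set before the first SBC
LDA $00
SBC $02 ; Subtract the low bytes
STA $04 ; And save the difference to memory
LDA $01
SBC $03 ; Subtract the high bytes (Carry is NOT set again; it holds the borrow)
STA $05 ; And save the difference to memory
If the carry flag is clear after the final SBC, a borrow was needed, meaning the second value was larger than the first.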
You may have noticed by now that there is no instruction for multiplication. There are many techniques for multiplying two numbers. The simplest is repeated addition: start with a running total of zero in the Accumulator, place the second value in Y, then add the first value to the total and decrement Y until Y reaches zero. In this manner, 5 x 4 becomes 5+5+5+5.
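One way to sketch repeated addition for 5 x 4, keeping a running total in the Accumulator and the count in Y. The label name addloop is arbitrary, and the sketch assumes the count is not zero and the product fits in a single byte.
LDA #$00 ; Start the running total at 0
LDY #$04 ; Number of additions to perform (must not be 0)
addloop:
CLC ; Make sure Carry is not already set
ADC #$05 ; Add 5 to the running total
DEY ; One fewer addition left
BNE addloop ; Loop until Y reaches 0
; A now holds $14 (20)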
Since we're working in base-2, multiplying or dividing by powers of two stands out as particularly easy. To multiply by two, simply shift once to the left; to multiply by four, shift twice to the left, and so on. The same technique applies to division. Shift once to the right to divide by two, twice to the right to divide by four, and so on. Here are some examples:
0110 1100 = $6C
x 2 (shift left 1 bit)
1101 1000 = $D8
0000 0110 = $06
x 16 (shift left 4 bits)
0110 0000 = $60
1110 1000 = $E8
/ 8 (shift right 3 bits)
0001 1101 = $1D
These multiplication and division operations can be performed with the Arithmetic Shift Left (ASL) and Logical Shift Right (LSR) instructions. ASL shifts the value in the Accumulator (or memory) left by one bit. Bit 7 is shifted into the carry flag. A zero is shifted into bit 0. LSR shifts the value in the Accumulator (or memory) right by one bit. A zero is shifted into bit 7 and bit 0 is shifted into the carry flag. The following program multiplies the value in the Accumulator by 8:
ASL
ASL
ASL
This program divides the value in the Accumulator by 4:
LSR
LSR
Obviously, this method has its limitations. It is easy when multiplying to exceed 255 and when dividing to generate a fraction. Checking the carry flag after each shift will alert you if this occurs.
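The carry check can be sketched as follows, assuming the Echo and Monitor addresses used by the earlier programs. The value $90 and the '!' marker are arbitrary choices for illustration; doubling $90 (144) gives 288, which does not fit in a byte.
Echo = $FFEF
Monitor = $FF1F
LDA #$90 ; 1001 0000
ASL ; A = 0010 0000 = $20, Carry = 1 (the old bit 7)
BCC fits ; Carry clear would mean the product fit in 8 bits
LDA #$21 ; Load ASCII '!' to flag the overflow
JSR Echo ; Print !
fits:
JMP Monitor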
It is possible to use ASL to multiply by values that are not powers of two. To multiply by 10, for example, multiply the original value by two, multiply it by eight, and add the two results together, since ten times a value is twice the value plus eight times the value.
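A sketch of multiplying by 10 this way. It assumes location $00 is free to use as scratch space and that the starting value is 25 or less, so the product still fits in one byte; the value $07 is only an example.
LDA #$07 ; Example value (7)
ASL ; A = value x 2
STA $00 ; Keep value x 2 in a scratch location
ASL ; A = value x 4
ASL ; A = value x 8
CLC ; Make sure Carry is not already set
ADC $00 ; value x 8 + value x 2 = value x 10 ($46 = 70)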
Summary
In this chapter I've attempted to give a basic overview of the instructions for programming in assembly and their practical applications. Some instructions, which are similar to those already covered here, have been passed over. For a complete list of instructions and their uses, see appendices B, C, and D. For a more thorough look at programming in assembly, visit the Apple I Owners Club's web site on www.applefritter.com, or Mike Naberezny's web site at www.6502.org. There, you'll find a wealth of data sheets, tutorials, and source code.