Interesting thoughts on a 32-bit 6502... one could argue that a binary-incompatible version is basically the ARM1, but it's interesting to explore how to extend the 6502 in various ways whilst preserving compatibility with the original ISA.
My own avenue of interest is seeing just how far the 6502's original 8/16-bit ISA could be optimised in a way that sticks to those widths. It occurred to me since I drew that previous opcode grid that the addition of a (writeable) Z register could really help with that, more so than I've seen in the 65CE02:
[Image: opcode grid]
Those Group 3 and 7 instructions are all 1 byte in length because they get their base address from a register rather than an operand in main memory. That saves 1 byte and 1 cycle in the case of Group 3, which helps compensate a tad for indirect addressing's increased cycle count. The saving is 2 bytes and 2 cycles for Group 7: whereas an LDA abs,X takes 4+ cycles, an LDA W,Z would likely need only 2+ cycles. Or just 2 if we can ditch the page boundary penalty, which iirc the 65CE02 did manage. Combine that with register increment/decrement (INx/DEx) operations requiring just 1 cycle and we're on course for having code that runs 2-3 times faster at the same clock speed.
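To put numbers on those savings, here's a rough tally in Python. The stock figures are the usual NMOS 6502 ones; the proposed figures are simply the stock ones minus the savings quoted above. Treating Group 3 as the counterpart of LDA (zp),Y is my assumption, and the names `Cost`, `STOCK`, `PROPOSED` and `saving` are made up for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    size: int    # instruction length in bytes
    cycles: int  # base cycle count, ignoring any page-crossing penalty


# Stock NMOS 6502 figures.
STOCK = {
    "LDA (zp),Y": Cost(size=2, cycles=5),
    "LDA abs,X": Cost(size=3, cycles=4),
}

# Proposed figures: stock minus the savings quoted in the post.
# Pairing Group 3 with LDA (zp),Y is an assumption for illustration.
PROPOSED = {
    "LDA (zp),Y": Cost(size=1, cycles=4),  # Group 3 equivalent
    "LDA abs,X": Cost(size=1, cycles=2),   # Group 7, i.e. LDA W,Z
}


def saving(name):
    """Return (bytes saved, cycles saved) for one addressing mode."""
    stock, new = STOCK[name], PROPOSED[name]
    return (stock.size - new.size, stock.cycles - new.cycles)


for name in STOCK:
    saved_bytes, saved_cycles = saving(name)
    print(f"{name}: saves {saved_bytes} byte(s), {saved_cycles} cycle(s)")
```

These are per-instruction figures only; how much a whole routine gains depends on how much of it is indexed loads/stores and register bumps.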
Poking around in a stack remains simple, since S is now S* (the 'Stack Pointer Pointer'): a 1-byte address in Zero Page where the true (16-bit) stack pointer lives, which makes it very similar in function to the Group 3 loads and stores. There IS a cycle penalty here, though, since obviously it'd need to load that pointer, adjust it, read/write the value pointed to, and also write the adjusted pointer value back. Having Zero Page as zero-turnaround SRAM inside the CPU itself would probably claw those cycles back, partly or wholly, but it does create a big abstraction inversion problem for MMUs. Still chewing on this!
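Here's a minimal Python sketch of the S* idea, just to show where the extra memory traffic comes from. Everything in it is an assumption on top of the description above: the `Machine` class and its method names are invented, the pointer is stored little-endian, and the stack grows downward with the pointer left at the next free slot, as on the stock 6502. It counts memory accesses, not cycles.

```python
class Machine:
    """Toy model of a stack addressed through a zero-page pointer (S*)."""

    def __init__(self):
        self.mem = bytearray(0x10000)
        self.s_star = 0x00   # zero-page address holding the 16-bit stack pointer
        self.accesses = 0    # memory accesses, not cycles

    def read(self, addr):
        self.accesses += 1
        return self.mem[addr & 0xFFFF]

    def write(self, addr, value):
        self.accesses += 1
        self.mem[addr & 0xFFFF] = value & 0xFF

    def _load_sp(self):
        lo = self.read(self.s_star)
        hi = self.read((self.s_star + 1) & 0xFF)  # stays inside Zero Page
        return lo | (hi << 8)

    def _store_sp(self, sp):
        self.write(self.s_star, sp & 0xFF)
        self.write((self.s_star + 1) & 0xFF, sp >> 8)

    def push(self, value):
        sp = self._load_sp()               # load the pointer
        self.write(sp, value)              # write the value it points to
        self._store_sp((sp - 1) & 0xFFFF)  # adjust and write the pointer back

    def pull(self):
        sp = (self._load_sp() + 1) & 0xFFFF  # load and adjust the pointer
        value = self.read(sp)                # read the value it points to
        self._store_sp(sp)                   # write the pointer back
        return value


m = Machine()
m.s_star = 0x80
m.mem[0x80], m.mem[0x81] = 0xFF, 0x7F  # true stack pointer = $7FFF
m.push(0x42)
print(m.accesses)                      # accesses used by one push
```

In this model a push or pull touches memory five times where the stock 6502 touches it once, which is the penalty that on-chip Zero Page SRAM would be trying to hide.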
STZ is out, since we can make plenty of savings elsewhere with those far more generalised 1-byte instructions.
TSB and TRB are out... frankly because they seem a tad idiosyncratic. Some of the SWA variants take their place. Complementing BIT/AND we have TEQ/EOR: TEQ is to EOR as BIT is to AND.
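The BIT/TEQ analogy can be sketched in a few lines of Python. `bit` follows the stock 6502 behaviour (Z from A AND M, N and V copied from bits 7 and 6 of the operand, A untouched). `teq` is hypothetical: I'm assuming it sets Z and N from A EOR M and leaves A alone; the exact flag behaviour isn't spelled out above.

```python
def bit(a, m):
    """Stock 6502 BIT: Z from A AND M; N and V from bits 7 and 6 of M."""
    return {"Z": (a & m) == 0, "N": bool(m & 0x80), "V": bool(m & 0x40)}


def teq(a, m):
    """Hypothetical TEQ: flags from A EOR M, result discarded (assumed)."""
    result = (a ^ m) & 0xFF
    return {"Z": result == 0, "N": bool(result & 0x80)}


print(bit(0x0F, 0xF0))  # no bits in common, so Z is set
print(teq(0x5A, 0x5A))  # equal operands, so Z is set
```

Under those assumptions TEQ works as an equality test: Z is set exactly when A equals the operand, since the EOR of two equal bytes is zero.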
Nicked from the 65C816 (where it's called PER) is PHR, which pushes a 16-bit value to the Call Stack, computed as an address relative to PC. It's the next best thing to having a full set of conditional 8- and 16-bit 'Branch, Saving Return Address' instructions, and it's why I ditched JSR (&&,X).
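A small Python sketch of the address arithmetic, modelled on the 65C816's PER as I understand it: a 3-byte instruction whose signed 16-bit displacement is added to the address of the following instruction. Whether PHR in this design keeps that encoding is an assumption, and `phr_value` is a made-up helper name.

```python
def phr_value(pc, disp16, length=3):
    """Value pushed by PHR at address pc: next-instruction address plus a
    signed 16-bit displacement, wrapped to 16 bits (encoding assumed)."""
    if disp16 & 0x8000:
        disp16 -= 0x10000  # sign-extend the displacement
    return (pc + length + disp16) & 0xFFFF


# Faking a 'branch, saving return address' with PHR followed by a branch:
#   $1000  PHR ret-1     ; 3 bytes
#   $1003  BRA target    ; 2 bytes (assumed, as on the 65C02)
#   $1005  ret: ...
# RTS adds 1 to the pulled address, so the value to push is ret-1 = $1004.
pushed = phr_value(0x1000, 0x0001)
print(hex(pushed))
```

Swap the BRA for any conditional branch and you get a conditional call for the price of one extra instruction.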
Statistics: Posted by Arx — Thu Mar 28, 2024 5:22 pm