One thing directly connected to this history is why the x86 is little-endian. As the article explains, the 8008 was designed for the Datapoint 2200 terminal. The 8008 was intended as a compatible replacement for the existing Datapoint processor, which was built from simple TTL chips.
To reduce the chip count, the Datapoint 2200 used a serial processor, which processed one bit at a time, so you had a 1-bit ALU among other things. One consequence is that you have to start with the low-order bit when doing addition, so you can handle carries. And for 16-bit values, you also have to start with the low-order byte. This forces you into a little-endian architecture.
Thus, to be compatible with the Datapoint 2200, the 8008 was also made little-endian. Unfortunately, Intel was very slow creating the 8008, so Datapoint had moved on to a parallel 74181-based architecture and didn't want the 8008. Intel decided to sell the 8008 as a stand-alone product, essentially creating the microprocessor industry. As the article explains, the x86 grew out of the 8008, so x86 also inherited the little-endian architecture.
I don't think that really forced any particular endianness.
Endianness is only really a thing if there's addressable memory, which is not the case for a bit stream. Whether a system is big/little/mixed-endian was only a question of how the address lines are set for multi-byte accesses.
I thought that too at first, but then I found this Stack Overflow answer fairly convincingly confirming the OP's claim with quotes from its designers: https://stackoverflow.com/a/36263273 although it's still not explicitly clear why.
> Shustek: So, for example, storing numbers least significant byte first, came from the fact that this was _serial_ and you needed to process the low bits first.
> Poor: You had to do it that way. You had no choice.
I'm wondering if it was the bit order that mattered for compatibility with the 1-bit serial processor of the Datapoint 2200, and if that determined the byte ordering of its 16-bit successor?... I don't know much about 1-bit computing, but if you think about it - to be useful, it must be capable of doing operations on arbitrary-length words (within some reasonable limitation), i.e. not limited to 8 bits like the 8008. So maybe this put it in a funny position: while byte endianness makes no sense and bit endianness seems to make no difference for the 8008 since it was 8-bit, it did make a difference to the Datapoint, which was serial 1-bit and could operate across 8-bit word boundaries.
To be clear what I mean: imagine how a 1-bit ALU would process 16 bits; it would want them in order like this (the bits are in logical order, not numerical order):
(8bit) Little Endian:
byte 0000000011111111
bit 0123456789abcdef
But not like this:
(8bit) Big Endian:
byte 0000000011111111
bit 76543210fedcba98
I don't know if this is correct, and I'm assuming here that bit endianness must determine byte endianness in the 16-bit 8086. Clarification/correction is most welcome.
When you add two binary numbers the simplest method for handling the carry is a ripple carry. The carry bit ripples up from lsb to msb.
If you're trying to build a simple 1-bit ALU you only need to save the carry bit, as long as you add the bits from least significant to most. If you do it in reverse, you have intermediate results that need to be saved.
As you can see you only need to keep the carry bit. The result bit gets written back to memory immediately.
What's going on is a 1 bit processor represents the most extreme trade off between gate count and speed for a given functionality. AKA low gate count but requires many clock cycles per operation.
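To make that concrete, here's a minimal C sketch of an LSB-first bit-serial add (the 16-bit width and function name are just illustrative, not the Datapoint's actual design): only the single carry bit is kept between steps, and each result bit can be written out as soon as it is computed.

    #include <stdint.h>
    #include <stdio.h>

    /* LSB-first bit-serial addition: only the carry bit survives between steps,
       and each sum bit is emitted immediately, just like a 1-bit ALU would. */
    static uint16_t serial_add(uint16_t a, uint16_t b)
    {
        uint16_t sum = 0;
        unsigned carry = 0;
        for (int i = 0; i < 16; i++) {              /* low-order bit first */
            unsigned abit = (a >> i) & 1;
            unsigned bbit = (b >> i) & 1;
            unsigned s = abit ^ bbit ^ carry;       /* sum bit */
            carry = (abit & bbit) | (carry & (abit ^ bbit));
            sum |= (uint16_t)(s << i);              /* result bit written back immediately */
        }
        return sum;                                 /* final carry-out discarded */
    }

    int main(void)
    {
        printf("%u\n", (unsigned)serial_add(12345, 6789));   /* prints 19134 */
        return 0;
    }

Starting from the MSB instead would force you to buffer partial sums until the carries from the lower bits are known.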
> as long as you add the bits from least significant to most
Thanks. Although I already understood what you explained, this bit made me realize more clearly the detail I was tripping up on. I was never imagining the 1-bit adder to start anywhere _but_ the LSB; what I couldn't figure out was why it couldn't simply iterate the bits in reverse logical order for big endian...
I'm guessing there is a good reason why serial processors can't read a sequence of bits in reverse logical order, and it probably applies to modern CPUs too? Either that, or it's a design choice that cannot be dynamic.
I should probably pick a book up at this point, thanks for answering my questions.
The carry between bytes can be applied as-you-go with LE but has to be done after with BE. Which would double the execution time (adding zero/1 to each byte of the result).
Pretty much all simple multi-word ALU algorithms need to start with the low word first. That particular terminal wasn't remotely unique (though for all I know, it was indeed the specific product that drove the Intel design decision), nor was x86 the first LE architecture. The PDP-11 made the same decision for the same reason several years earlier (and the VAX followed for pseudocompatibility), etc...
Really, it wasn't until the mid-80s and the RISC revolution, when all of a sudden people found themselves designing systems that would be 32 bits wide from the very first silicon, that the community "decided" that the only true byte order should be BE.
And of course, the reason they made that decision was simply that the order of bytes in memory happened to match the way readers of the Latin alphabet write Arabic numerals on paper. Had the Arabs invented computing, there would never even have been a debate.
On the other hand, unsigned big-endian numbers sort lexicographically and numerically the same way. So, for instance, if you're using a b+ tree to store time series of key-value pairs, you can use tuples of <key, big_endian_timestamp> as your b+ tree keys and use memcmp() on the raw bytes of your keys.
For variable-length integers, using a UTF-8-like big-endian encoding allows you to keep the relationship between lexicographic and numeric sorting.
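As a rough illustration of the memcmp trick on b+ tree keys (put_be64 is just a hypothetical helper, not any particular library's API): store the integer part of the key most-significant-byte first, and byte-wise comparison of the raw key bytes gives numeric ordering.

    #include <stdint.h>
    #include <string.h>
    #include <assert.h>

    /* Hypothetical helper: serialize a 64-bit value big-endian so that
       memcmp() on the raw bytes orders keys numerically. */
    static void put_be64(uint8_t out[8], uint64_t v)
    {
        for (int i = 0; i < 8; i++)
            out[i] = (uint8_t)(v >> (56 - 8 * i));   /* most significant byte first */
    }

    int main(void)
    {
        uint8_t a[8], b[8];
        put_be64(a, 1585000000ULL);    /* earlier timestamp */
        put_be64(b, 1585000001ULL);    /* later timestamp   */
        assert(memcmp(a, b, 8) < 0);   /* byte order matches numeric order */
        return 0;
    }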
This is also a huge advantage of ISO 8601 over other date formats.
The mainframes that were byte-addressable (e.g. System/360) were generally big endian. I think this property led to the Motorola decision to use big endian on the 68000 and, because of its popularity in workstations, the RISC vendors followed suit. MIPS chose to make the endianness selectable by the hardware vendor, and that has become common, with little endian winning these days in ARM implementations.
> Really, it wasn't until the mid-80s and the RISC revolution, when all of a sudden people found themselves designing systems that would be 32 bits wide from the very first silicon, that the community "decided" that the only true byte order should be BE.
IBM 360-series mainframes were 32-bit and big-endian in the 1960s. (Their earlier computers may have been big-endian too, I'm not sure.)
> (Their earlier computers may have been big-endian too, I'm not sure.)
Yes, their choice of big endian probably came from their earlier unit record equipment, which used punched cards as input, storage, and output. Since punched cards were (sort of) directly human readable, it was natural to use big endian.
Also, making punch cards big-endian meant that BCD integers sorted the same way lexicographically and numerically, so there wasn't any special sorting mode necessary for numeric fields in punch card sorting machines.
I've always had it memorised that LE is the "logical" order and BE the "backwards" order, specifically for this reason: with LE, the bit at offset n has value 2^n, and the byte at offset n has value 256^n. With BE it depends on how wide the total value is, which introduces an awkward length-dependent term.
Practically all multi-precision arithmetic libraries store the segments of a bignum in LE order too, despite the fact that the individual segments may be BE on a BE machine. That is even more confusing.
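For example, a toy multi-precision add (a sketch with an arbitrary fixed limb count, not any real bignum library) naturally walks a little-endian limb array in increasing index order, carrying as it goes:

    #include <stdint.h>
    #include <stdio.h>

    #define LIMBS 4   /* arbitrary fixed size for the sketch */

    /* Limbs stored little-endian: limb[0] is least significant, so the
       carry simply ripples upward through increasing indices. */
    static void bignum_add(uint32_t r[LIMBS], const uint32_t a[LIMBS],
                           const uint32_t b[LIMBS])
    {
        uint64_t carry = 0;
        for (int i = 0; i < LIMBS; i++) {        /* least significant limb first */
            uint64_t t = (uint64_t)a[i] + b[i] + carry;
            r[i] = (uint32_t)t;
            carry = t >> 32;
        }
    }

    int main(void)
    {
        uint32_t a[LIMBS] = { 0xFFFFFFFF, 0, 0, 0 };   /* 2^32 - 1 */
        uint32_t b[LIMBS] = { 1, 0, 0, 0 };
        uint32_t r[LIMBS];
        bignum_add(r, a, b);
        printf("%08x %08x\n", (unsigned)r[1], (unsigned)r[0]);   /* 00000001 00000000 */
        return 0;
    }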
That misunderstands history. The earlier DEC machines (e.g. PDP-1/6/10/20) were all BE (as were Multics and mainframes of the time); the PDP-10 is why network ordering is BE.
And while I consider the PDP-6 to be the first RISC architecture, really that title goes to 1970s machines like the 801.
Little endian is natural memory ordering and was specified in von Neumann's EDVAC design doc during World War II, so the question isn't so much, "why little endian?" but rather, "why not?" Only reason computers were ever designed to use "traditional" ordering is because business mainframes loved BCD and they loved to code GUIs that just blit'd raw fixed memory onto the display driver.
As I mentioned elsewhere, the fact that big-endian positive integers (and IEEE-754 normalized floats/doubles) sort the same lexicographically and numerically is useful. The same goes for UTF-8-like big-endian encodings for variable-length integers and ISO 8601-formatted dates / times. It's handy to be able to use memcmp() on compound b+ tree keys.
The advantage works both ways: it's also nice if your normal integer vector instructions can be used to accelerate memcmp, without needing an extra opcode or instruction flag to specifically support parallel lexicographical comparison. Note that if the vector units only support signed integers, then you only need an exclusive-or to flip all of the sign bits in order to get lexicographical sorting, and that vector exclusive-or has important real-world use cases for hash functions and so-called add-rotate-xor encryption algorithms like ChaCha20.
Alternatively, if you're going to include a dedicated lexicographical comparison instruction for strcmp/memcmp (like x86 REP CMPS), a fast implementation will need less dedicated circuitry if your processor natively supports big-endian operations for other instructions.
Big-endian use in mainframes likely evolved from big-endian BCD numeric fields on punch card sorting/tabulating machines, where big-endianness was an advantage in that the normal lexicographic sorting would put constant-width numeric fields in numerical order.
SWAR techniques are orthogonal to endianness. Read Hacker's Delight; it mentions big endian only once, to explain how to convert away from it. Describing variable-length encoding as big-endian-like is quite a creative stretch. Xor does not flip two's complement sign correctly, except in a few special cases like YCbCr. Please read the EDVAC design doc, since von Neumann invented two's complement too. His ideas are the reason why programmers have never needed to care much about whether it's signed or unsigned, since the machine arithmetic will behave exactly the same way (with few exceptions).
> Describing variable-length encoding as big-endian-like is quite a creative stretch.
LEB128 is a little-endian variable-length encoding. That's what the LE stands for.
UTF-8 is a big-endian variable-length encoding. The most significant bits are packed in the earlier bytes, so that lexographic ordering comes out correctly.
I'm really alluding to something like LEB128-encoding uint64_ts, except big-endian and putting all of the continuation flags at the beginning of the encoding, so a single switch on the first byte gives you the encoded length and it also sorts correctly lexicographically.
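Roughly the idea, as a sketch rather than any standard format (the 2-byte limit is just to keep the example short): the leading bits of the first byte give the length, and the payload follows most-significant-bits first, so minimal encodings compare correctly with memcmp.

    #include <stdint.h>
    #include <string.h>
    #include <assert.h>

    /* Sketch of a UTF-8-like, big-endian varint (not a standard format):
       a leading 0 bit means one byte, a leading 10 means two bytes, with
       the payload stored most-significant-bits first. Handles values < 2^14. */
    static size_t encode_varint_be(uint8_t *out, uint16_t v)
    {
        if (v < 0x80) {                       /* 0xxxxxxx */
            out[0] = (uint8_t)v;
            return 1;
        }
        assert(v < 0x4000);                   /* 10xxxxxx xxxxxxxx */
        out[0] = (uint8_t)(0x80 | (v >> 8));
        out[1] = (uint8_t)(v & 0xFF);
        return 2;
    }

    int main(void)
    {
        uint8_t a[2], b[2];
        size_t la = encode_varint_be(a, 100);   /* one byte  */
        size_t lb = encode_varint_be(b, 300);   /* two bytes */
        size_t n = la < lb ? la : lb;
        assert(memcmp(a, b, n) < 0);            /* lexicographic order matches numeric order */
        return 0;
    }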
> Xor does not flip two's complement sign correctly
Re-read what I wrote. I wasn't talking about flipping the sign bit in order to negate the numeric value. I was talking about flipping the sign bit to get correct lexicographic text ordering on a machine that can only do signed comparisons.
> since the machine arithmetic will behave exactly the same way (with few exceptions).
I was talking about _exactly_ one of these exceptions... comparing two values. For simplicity, let's pretend we have a 16-bit BE CPU that can only perform signed comparisons, and we want to lexicographically compare two 4-byte arrays:
[ 41 4F 4F 4F ] vs. [ C1 81 4F 4F ]
we perform the first load to compare
0x414F (16719) vs. 0xC181 (-15999)
in order to get the correct lexicographical ordering in all cases, we need to invert the sense of the sign bit. Flipping the sign bit in these two cases gives
0xC14F (-16049) vs 0x4181 (16769).
In all cases, flipping the sign bit allows one to perform unsigned comparison on a CPU that can only perform signed comparisons. For a W-bit word, you end up subtracting 2^(W-1) from all values with the most significant bit unset and adding 2^(W-1) to all values with the most significant bit set. I understand this doesn't negate the values. That's not the point. The point is to perform unsigned comparison on a CPU that doesn't natively support it.
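A tiny C sketch of the same trick (16-bit values; the function name is just illustrative): XOR both operands with 0x8000, then a signed comparison yields the unsigned/lexicographic result.

    #include <stdint.h>
    #include <assert.h>

    /* Emulate an unsigned 16-bit comparison using only a signed compare,
       by flipping the sign bit of both operands. */
    static int unsigned_less(uint16_t a, uint16_t b)
    {
        int16_t sa = (int16_t)(a ^ 0x8000);
        int16_t sb = (int16_t)(b ^ 0x8000);
        return sa < sb;            /* signed compare, unsigned-order result */
    }

    int main(void)
    {
        assert(unsigned_less(0x414F, 0xC181));    /* 0x41... sorts before 0xC1... */
        assert(!unsigned_less(0xC181, 0x414F));
        return 0;
    }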
> Only reason computers were ever designed to use "traditional" ordering is because business mainframes loved BCD and they loved to code GUIs that just blit'd raw fixed memory onto the display driver.
Hah, so it's literally because we write numbers with most significant digit on the left, which is the opposite to logical ordering?
I presume the only advantage to BE today is compatibility then...
“We” being people with left-to-right languages. All rtl languages I know of write numbers the same as ltr languages, i.e. in the rtl case little endian.
As you have to read an entire positional number to be able to get even its magnitude; the only “advantage” I can see of LE over BE in natural languages is that at least rtl can get odd/even immediately. Then again in ltr you can get sign immediately. In real life human use neither “advantage” is compelling.
If you want your (vector or scalar) integer instructions to be useful for fast strcmp/memcmp implementations, then you'll want to support unsigned big-endian integers. Also, if you want to mix integers and strings in compound b+ tree keys and use memcmp to compare keys, you'll want to store your integers big-endian.
I don't believe you. That sounds more like bitreverse and Morton interleaving. If you're serious that it's actually big endian and that it has a real use case (outside human social traditions and legacy infrastructure maintenance), then I'd love to see convincing details.
As for strings, GCC/Clang are great at vectorizing normal byte-oriented code. It doesn't matter if it generates or you hand code VPCMPEQB vs. VPCMPEQQ either, since they appear to have the exact same latency. Modern compilers have also been moving in the direction of punishing folks who do things like `* (uint32_t * )p`, due to aliasing rules rather than endianness. Rob Pike had an amusing blog post on this subject: https://commandcenter.blogspot.com/2012/04/byte-order-fallac...
> I don't believe you. That sounds more like bitreverse and Morton interleaving. If you're serious that it's actually big endian and that it has a real use case (outside human social traditions and legacy infrastructure maintenance), then I'd love to see convincing details.
Back when I worked on Google's indexing system (2006-2010), one of the stages keyed documents by URL (with the host name in DNS order: com.google.www) followed by a big-endian timestamp. This put records for the same domain together, and put successive crawls for the same URL in chronological order. This improved compression ratios in our BigTable tablets, and finding the latest crawled version of <URL> was just a search for the largest key <= <URL><MAX_TIMESTAMP>.
> It doesn't matter if it generates or you hand code VPCMPEQB vs. VPCMPEQQ either, since they appear to have the exact same latency.
But with VPCMPEQB, you need to have 8 times as many conditional branches as VPCMPEQQ (in the naive case) in order to turn your vectorized comparison into a memcmp result. Note that VPCMPEQ* don't modify any flags, so you can't just JE/JNE after your VPCMPEQ*. Now, via some vector permutes, shifts, and bitwise-ors, you can cut down on the number of conditional branches. However, it's still more processing than if you can load / compare your vector registers in lexicographical order.
But Google doesn't use big endian CPUs in prod, so you've proven the point yourself. I agree that lexicographically arranging things can have an awesome impact on performance though. Still think that has zero to do with endianness.
> But with VPCMPEQB, you need to have 8 times as many conditional branches as VPCMPEQQ in order to turn your vectorized comparison into a memcmp result
Not at all. It doesn't require any branches, other than the loop itself. Here's the trick everyone uses, for the simple case of finding one character, e.g. NUL:
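Something like this C sketch with SSE2 intrinsics (a reconstruction of the idea, not the exact original snippet): pcmpeqb against zero, pmovmskb to collapse the result to a bitmask, then a bit scan.

    #include <emmintrin.h>   /* SSE2: pcmpeqb / pmovmskb as intrinsics */

    /* Find the offset of the first NUL byte: compare 16 bytes at a time
       against zero, collapse the comparison to a bitmask, and bit-scan it.
       No per-byte branches; only the per-chunk loop branch remains.
       (Real implementations align the pointer first so the 16-byte loads
       can't fault; that detail is omitted here.) */
    static long find_nul(const char *p)
    {
        for (long i = 0; ; i += 16) {
            __m128i chunk = _mm_loadu_si128((const __m128i *)(p + i));
            __m128i eq    = _mm_cmpeq_epi8(chunk, _mm_setzero_si128());
            int mask      = _mm_movemask_epi8(eq);   /* one bit per byte */
            if (mask)
                return i + __builtin_ctz(mask);      /* index of first NUL */
        }
    }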
> But Google doesn't use big endian CPUs in prod, so you've proven the point yourself.
The whole point is you just memcmp the whole key byte array instead of having to parse out the fields. If the timestamp is in little-endian order, then memcmp won't give you chronological ordering on the timestamp suffix.
In the case of pcmpeqb / pmovmskb / bsf, I was explicitly talking about memcmp, not finding the first same or differing byte. I don't want to get too bogged down in specific architectures vs. the more general pros/cons of byte orders themselves... but... there's pcmpeq and pcmpgt, but no pcmpne. That's fine, so to implement memcmp, you'd just xor with -1 before bsf to find the index of the first different byte and then explicitly compare the differing byte. So far so good, except that Intel explicitly designed AVX to scale up to 1024-bit vectors, at which point pmovmskb can't pack 128 bits into a GP register. Fine, so use pcmpeqw/pmovmskw instead of pcmpeqb/pmovmskb, but then once you have the offset of the first differing uint16_t, you can't just do a native-endian comparison on the two uint16_ts. If the x86 were big-endian, then bsf with 64-bit GP registers would scale all the way up to 4096-bit vector registers without requiring extra bit extractions or bswap instructions. It's fundamentally an advantage that a native-endian 64-bit subtraction performs an 8-byte lexicographical comparison on big-endian machines.
Granted, the advantage is tiny, but this whole rabbit hole discussion was in reply to "I presume the __only__ advantage to BE today is compatibility then..." (emphasis mine) in [https://news.ycombinator.com/item?id=22660000]
It's not some quirk. There are two common endiannesses, and the byte sort precedence of one of them matches the byte sort precedence we use for text. Since we represent characters as numbers, and for efficiency we have CPU instructions that treat aggregations of these numbers as larger numbers, we can get double duty out of these instructions/circuits if the order they use is the same as the lexicographic ordering of our text.
I'm not saying big-endian is universally better. I'm just saying there are real-world advantages.
Note there are more than two endiannesses, where the most common middle-endian variant is big-endian order of 16-bit words, but little-endian order within those 16-bit words.
As for linguistic quirks, there's no obvious universal connection between writing order for text, the order digits are written, and the way numbers are pronounced, so I'm not sure what you're getting at. Arabic is written right-to-left, but they still write the most significant digit on the left. However, I've read that Arabic writers line-break numbers (when forced) by placing the most significant digits on the upper line and the least significant digits on the lower line, so it's not a clear-cut case of Arabic writing numbers in little-endian order right-to-left. I'm not sure about Arabic number pronunciation. I do know that German pronunciation is middle-endian: 256 is "two hundred six-and-fifty", despite the digit writing order being big-endian in German. I met a German guy who incorrectly swaps digits if you talk to him in English while he's doing mental math... forcing him to process English somehow also swaps digit orders in his head.
People's "Downloads" folder being full of New folder", "New folder (1)", all of them being full of similarly named documents.
Also anyone not using proper version control (business folks, designers, scientists, students) usually has a bunch of files all over the place with every kind of inconsistent names for a bunch of versions.
I can use proper version control for code, but when it comes to office products, I use an ISO date to prefix my "versions" of files.
This is really about integrating proper version control within the workflow. I don't find it appropriate to blame users when they are left so poorly guided.
Don't get me wrong, an ISO date prefix is nothing close to perfect for version control, but it:
- works anywhere you are asked to name a sequence of things
- tacitly communicates the naming scheme well, so your colleagues can get it and possibly even start to imitate it
- gives a clear time reference
- isn't sensitive to the system timestamp, so it keeps the chronology even when files are touched or copied elsewhere with a different timestamp
- spontaneously generates a chronologically ordered list of versions when they are displayed
Once again, this is not my favorite way to organize things. But from a practical point of view, it often comes out as the least bad solution you can run seamlessly.
That analogy only works if “report”, “report_final”, and “report_final_FINAL” were all released, were insanely successful and made you a billionaire and then you didn’t want to lose the customers that liked those versions when you made some small edits.
Excellent explanation in the article. Also note that in x86-64 mode, the low 8 bits of all registers can be accessed, namely: AL, BL, CL, DL, SIL, DIL, SPL, BPL, R8B, R9B, R10B, R11B, R12B, R13B, R14B, R15B. Previously, there was no equivalent of SIL, DIL, SPL, and BPL. https://docs.microsoft.com/en-us/windows-hardware/drivers/de...
For some of the L ones it's possible to access the H part. So it's not just a byte (B); it's the low (L) byte, and there is a high (H, just the second byte of the word) you can also get to.
For the high 8 registers it isn’t like that. There is no way to get to the “high” byte.
I believe D for doubleword is mostly a Microsoft C/C++ thing, though I'm sure it appears in other places too.
When it comes to assembler syntax for 32-bit words, my guess would be that they are mostly indicated as l (for long), e.g. in AT&T x86 syntax and m68k assemblers.
movl (at&t's x86) or mov.l (m68k).
Edit: Ah.. yeah, Intel x86 asm syntax also uses it, so I guess that's where the idea of using D for 32-bit values originates.
Most people list the registers in alphabetic order, but numerically they are encoded with the B registers in the 4th place: EAX=0, ECX=1, EDX=2, EBX=3, ...
If you ever find yourself writing an AMD-64 assembler, it really feels like you're digging through archaeology with all of the weird quirks you need to implement. The SSE, AVX, and AVX-512 encodings add even more levels of, "why did they do that?!?" which don't make much sense except in the context of history.
but numerically they are encoded with the B registers in the 4th place
I've never seen an authoritative answer for that, but I believe it comes from the fact that in the 16-bit addressing modes, BX is the only one of the 4 ABCD registers that can be used to address memory, and the decoding circuitry is very slightly simpler when it has to detect two 1 bits rather than one 1 and one 0 bit. The 4 other registers are, in order, SP BP SI DI.
which don't make much sense except in the context of history
> B, C, D, E were completely generic and interchangeable.
I was under the impression that it went
Accumulator
Base
Counter
Data
I'm not sure about the D or E registers, but I am sure I remember using B as the base address register for arrays, and using C as the counter register for loops and such because the others couldn't be used that way.
Yes, in the Z80 BC was a sort of counter register (e.g. for LDIR or DJNZ instructions), which is perhaps why BC became CX. In the 8080 there wasn't much difference between BC and DE.
The interesting part is that BX maps to HL, which explains the weird order AX/CX/DX/BX in the encoding of 8086 instructions.
Alternatively the X is a very old assembly notation for "pair". 8-bit registers on 8080 processors could be paired together to work as a single register. Operations performed on these register pairs used the letter X.
You can look here at the original 8080 reference manual:
For example page 4 lists the name of instructions, INR is "increment register" while INX is "increment register pair", similar notation is used for several other instructions where the X is the register paired version of an instruction.
In the case of the AX register, it likely just refers to the pair of AH and AL registers.
At any rate, it's really interesting glancing over the various 8086 reference manuals. Gives me a deep sense of appreciation for how far things have come and how things have managed to build upon what are otherwise some very simple and fundamental building blocks.
I was wondering if it was going to cover RAX/amd64, and it does. Nothing terribly new here but it’s a nice dive into an interesting microcosm of intel architecture.
I do somewhat wish AMD managed to get R0-R7 as the standard, though :p oh well.
I love the x86 registers and their names and special roles.
On the one hand, it’s gross that x86 still has this legacy.
On the other hand, it’s a good thing that it’s possible to maintain compatibility so far back while still having such good perf. I find that aspect of modern x86 to be super impressive.
IMHO the story of the iAPX432 really is the whole industry in microcosm.
Intel hires the best of the best, the true cream of the crop to design them an ISA that is meant to crystallize everything that is known about ISA design into a single completely new design that discards all the broken crap of yesterday and will be futureproof for decades if not forever, and a chip to implement it.
Everything is going to be great forever, but it turns out it's a bit hard to get done, so they just have the B-team whip up something quick to sell in the meantime. This something was the 8086, which gets adopted into increasingly successful products, but no matter, the new shiny thing is going to displace all of that when it finally ships.
Then it does, and it actually has a hard time competing in performance with the much older stopgap product. It turns out that the team of superstars they hired was very theoretical, and built an ISA that was a dream to program against, but did not really understand what it took to build something that was going to be fast when implemented in hardware. (Also, the system was meant to be used mainly with high-level languages, and the compilers really were not there yet.) Being more expensive, much slower than the competition and with 0 market penetration, the iAPX 432 was dead on arrival in the market.
Luckily, the B-team had been busy working on another extension of the stop-gap product, the 80286, which was again a runaway success, and only partially because of backwards compatibility with the existing x86 ecosystem. It was also quite fast for its time.
I recall the iAPX432 came out years later than the 8086, was a 'capabilities machine', and took 500 memory cycles to execute "JMP .", i.e. jump to self, potentially the simplest possible operation.
So cool. Kinda disappointed at my hacker side of things because I never questioned why EAX register is actually named "EAX". Wondering what else I take for granted now :p
They wanted some sort of backwards compatibility at the assembly language level, the idea being that 8008 assembly code could be fed to an 8080 assembler. Not sure how well that worked out in practice, but that would be a motivation for A remaining a language-level feature that refers to the 8-bit part.
Aren't those AL and AH? When we rename A to AL, why do we need to permanently retire the term "A"? The L in AL stands for "low"; what does the A stand for?
The A stands for AL, in a snippet of 8008 assembly code that you're supposed to be able to use in the middle of 8080 assembly code, and which is written in a language that knows nothing about AL or AX. Or that was the idea.
Thanks. So it's not so much that "you still have the opcodes that operate on just 64/32/16/8 bits" as "ASCII assembly code for any CPU is expected to be source-compatible with ASCII assembly code for any later CPU"?
Is there any indication in the source demarcating the 8008 assembly from the 8080 assembly?
But was that ever supported in practice? I don't remember that being supported, and 8008 assembly code needed to be translated anyways by a tool, so that could have taken care of A -> AL.
You're missing my point. AL is the "L"ow bits of the "A" register. AH is the "H"igh bits of the "A" register. The whole thing, low plus high, is the "A" register. We can tell that by the names AL and AH.
If you’re visiting the Bay Area, be sure to visit the Computer History Museum in Mountain View. It’s the Mecca of computing history. Also, for early internet history, I’d recommend Katie Hafner’s book: Where Wizards Stay Up Late.
I did and I loved every moment of it :) Also in meatspace, enjoyed Bletchley Park in the UK.
Re: books, I liked Robert X. Cringely's Accidental Empires, but it only covers the PC industry up to '91 (or '93 IIRC for the 2nd edition) and isn't really technical.
"You might think—gee, seven is a very odd number of registers—and would be right! The registers were encoded as three bits of the instruction, so it allowed for eight combinations. The last one was for a pseudo-register called M. It stood for memory. M referred to the memory location pointed by the combination of registers H and L. H stood for high byte, while L stood for low byte of the memory address. That was the only available way to reference memory in [an] 8008."
Defining an architecture is easy! All you have to do is write up a document with instructions and their encodings, and boom! You’ve created a new architecture. Here’s one I made earlier this year for a class I was teaching, for example: https://github.com/regular-vm/specification
Assembly mnemonics and their cousin pin names are so excessively terse. I wish it was mandatory to expand acronyms in a dedicated field in datasheets. So often you'll find pin names like "NCE" where you're just expected to know a priori that it means "active-low Chip Enable" and it's so counterproductive.
Assembly mnemonics, I think, are terse to make it dirt simple for assemblers to read them; the assemblers themselves had to be hand-entered via hex until they became "self-hosting." This was definitely very needed when these CPUs were new in the 70's and such.
I've always liked how 6502 neatly has everything in 3 letter opcodes. But that's not scalable given modern CPU capabilities.
CS, DS and ES also have letters in alphabetical order, yet they are treated as acronyms. Wondering why is there no meaning (even if fabricated) attached to FS and GS?
Not really sure what your question means, but originally CS for "code segment", DS for "data segment" and SS for "stack segment" had distinct use cases. They weren't general purpose segment registers at all. ES was an "extra" data segment register. When FS and GS were added later the alphabetical ordering of CS, DS, ES, FS, GS was natural (with SS still being the odd one out).
Even in 32-bit mode, x86 uses segmented memory access and CS, DS and SS retain their roles as default segments for code, data and stack memory accesses. IIRC, all the segment registers are special because the CPU caches the corresponding segment table entry transparently in the background. The table entries are fairly complex.
That's how we got virtualization on x86. When AMD killed segmentation in AMD64, VMware et al cried out loud. AMD reintroduced segmentation after that as a stop-gap solution, but then we got VTx and SVM which are a much better solution anyways.
I wish they'd done a better job of re-implementing segmentation. I can't tell you how many times my programs have stopped thanks to a segmentation fault. /s
Precisely what you wrote: CS is "Code Segment", DS is "Data Segment", ES is "Extra Segment" (even here it feels a bit manufactured), but FS and GS lack any semi-reasonable expansion.
And the 80286 predated the 80386 as a 32-bit processor, so the 80286 was Intel's first 32-bit offering. It was still segmented, missing the paging hardware. I helped write an operating system for it. Short-lived.
The iAPX 432 project started in 1975, but took many years to deliver anything. The 8086 was indeed intended as a stop gap while the ‘432 was under development.
The 80286 by most definitions was a 16-bit CPU, having a 16-bit data bus, 16-bit general purpose registers and 16-bit segments. It did have a 24-bit address space though, up from the 20 bits of the 8086.
Oh of course! Thanks. The 24-bit address space changed operating system designs to use a LONG to store physical memory addresses, which I guess confused my memory of the whole thing.
It had 'real mode' which was the ordinary 8086 addressing mode (capable of 20-bit addressing to 1MB) and 'protected mode' which we called 'imaginary mode' since nobody used it.