Modern X86 Assembly Language Programming: 32-bit, 64-bit, SSE, and AVX by Daniel Kusswurm

November 25, 2014
Publisher: Apress
ISBN: 978-1484200650

From the Amazon website:

Modern X86 Assembly Language Programming shows the fundamentals of x86 assembly language programming. It focuses on the aspects of the x86 instruction set that are most relevant to application software development. The book's structure and sample code are designed to help the reader quickly understand x86 assembly language programming and the computational capabilities of the x86 platform.

Major topics of the book include the following:

  • 32-bit core architecture, data types, internal registers, memory addressing modes, and the basic instruction set
  • X87 core architecture, register stack, special purpose registers, floating-point encodings, and instruction set
  • MMX technology and instruction set
  • Streaming SIMD extensions (SSE) and Advanced Vector Extensions (AVX) including internal registers, packed integer arithmetic, packed and scalar floating-point arithmetic, and associated instruction sets
  • 64-bit core architecture, data types, internal registers, memory addressing modes, and the basic instruction set
  • 64-bit extensions to SSE and AVX technologies
  • X86 assembly language optimization strategies and techniques



  • The book's purpose is to teach you how to code functions using x86 assembly language that can be invoked from a high-level language such as C++.
  • All of the code emphasizes x86 protected-mode programming using a flat linear memory model.
  • All of the x86-32, x87 FPU, MMX, and x86-SSE code can be executed using a processor based on the Nehalem (or later) microarchitecture. Nehalem was introduced in late 2008. It reintroduced hyper-threading and added SSE4.2. Processors based on Nehalem include first generation Core i3, i5, and i7 CPUs.
  • The AVX code requires a processor based on the Sandy Bridge (or later) microarchitecture; the AVX2 code requires a processor based on the Haswell (or later) microarchitecture.
  • An extensive set of x86-related documentation is available from both Intel and AMD. The most important tome is Volume 2 of the reference manual entitled "Intel 64 and IA-32 Architectures Software Developer's Manual - Combined Volumes: 1, 2A, 2B, 2C, 3A, 3B and 3C". This volume contains comprehensive information for each processor instruction, including detailed operational descriptions, lists of valid operands, affected status flags, and potential exceptions.
  • Numerical data types can be divided into two subtypes: scalar and packed.
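
To make the scalar/packed distinction concrete, the C++ sketch below treats the same 32-bit word first as one scalar integer and then as four packed 8-bit lanes. The function names are hypothetical, and the masking trick only imitates, in portable code, the wraparound behavior of a packed byte add such as MMX's paddb; it is not taken from the book and is not how the hardware implements it.

```cpp
#include <cstdint>

// Scalar view: the 32-bit word is one integer, so carries propagate
// across all 32 bits.
constexpr std::uint32_t scalar_add(std::uint32_t a, std::uint32_t b) {
    return a + b;
}

// Packed view: the same word holds four independent 8-bit lanes.
// The low 7 bits of each lane are added first, then the top bit of each
// lane is patched in with XOR so that no carry crosses a lane boundary.
constexpr std::uint32_t packed_add_u8(std::uint32_t a, std::uint32_t b) {
    return ((a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu)) ^ ((a ^ b) & 0x80808080u);
}

// Same inputs, different results: the byte 0xFF + 0x01 carries into its
// neighbor in the scalar case but wraps to 0x00 in the packed case.
static_assert(scalar_add(0x01FF02FEu, 0x01010101u) == 0x030003FFu,
              "scalar add lets the carry cross byte boundaries");
static_assert(packed_add_u8(0x01FF02FEu, 0x01010101u) == 0x020003FFu,
              "packed add wraps each byte on its own");
```

A real packed instruction performs all lane operations in a single step on a wider register; the point here is only that a packed type is several values sharing one register, each processed independently.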

The internal architecture of an x86-32 processor can be logically partitioned into several execution units:

  • core execution unit
  • x87 FPU
  • SIMD execution units

The most important bits in the EFLAGS register:

  • auxiliary carry flag (AF)
  • carry flag (CF)
  • overflow flag (OF)
  • parity flag (PF)
  • sign flag (SF)
  • zero flag (ZF)
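
As a rough illustration of what these six status flags record, the C++ sketch below models how an 8-bit add would set them. The struct and function names are hypothetical, and the code is a simplified model of the flag definitions rather than anything taken from the book.

```cpp
#include <cstdint>

// Hypothetical container for the six status flags listed above.
struct StatusFlags {
    bool cf, pf, af, zf, sf, of;
};

// PF is set when the low byte of the result has an even number of 1 bits.
// The caller passes a value that already fits in 8 bits.
constexpr bool even_parity(std::uint32_t v) {
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return (v & 1u) == 0;
}

// Flags produced by an 8-bit add (a + b), modeled on the x86 add instruction.
constexpr StatusFlags flags_after_add8(std::uint8_t a, std::uint8_t b) {
    const std::uint32_t wide = std::uint32_t{a} + std::uint32_t{b};
    const std::uint32_t r = wide & 0xFFu;          // the 8-bit result
    return StatusFlags{
        wide > 0xFFu,                              // CF: unsigned carry out of bit 7
        even_parity(r),                            // PF: even number of 1 bits
        ((a ^ b ^ r) & 0x10u) != 0,                // AF: carry out of bit 3
        r == 0,                                    // ZF: result is zero
        (r & 0x80u) != 0,                          // SF: copy of the result's top bit
        ((a ^ r) & (b ^ r) & 0x80u) != 0           // OF: signed overflow
    };
}

// 0x7F + 0x01 = 0x80: signed overflow, no unsigned carry.
static_assert(flags_after_add8(0x7F, 0x01).of, "OF is set");
static_assert(!flags_after_add8(0x7F, 0x01).cf, "CF is clear");

// 0xFF + 0x01 = 0x00: unsigned carry and a zero result, no signed overflow.
static_assert(flags_after_add8(0xFF, 0x01).cf, "CF is set");
static_assert(flags_after_add8(0xFF, 0x01).zf, "ZF is set");
```

The two checks show why CF and OF are separate flags: the same bit pattern can overflow when read as signed yet not carry when read as unsigned, and the reverse.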

There are three basic types of operands: immediate, register, and memory.

The x86-32 instruction set supports using up to four separate components to specify a memory operand:

EffectiveAddress  = BaseRegister + IndexRegister * ScaleFactor + FixedDisplacementValue
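
The formula above can be sketched as a small C++ function. The function name and the sample values are hypothetical; the sketch assumes 32-bit wraparound arithmetic and relies on the fact that x86 accepts only 1, 2, 4, or 8 as the scale factor.

```cpp
#include <cstdint>

// EffectiveAddress = BaseRegister + IndexRegister * ScaleFactor + FixedDisplacementValue
// Unsigned 32-bit arithmetic wraps modulo 2^32, which is assumed here to
// match how a 32-bit effective address is formed.
constexpr std::uint32_t effective_address(std::uint32_t base,
                                          std::uint32_t index,
                                          std::uint32_t scale,   // 1, 2, 4, or 8
                                          std::int32_t disp) {   // signed displacement
    return base + index * scale + static_cast<std::uint32_t>(disp);
}

// Element 3 of an array of 4-byte integers that starts 8 bytes into a
// structure located at 0x1000, i.e. a memory operand such as [ebx+esi*4+8].
static_assert(effective_address(0x1000u, 3u, 4u, 8) == 0x1014u,
              "0x1000 + 3*4 + 8");

// A negative displacement reaches below the base address.
static_assert(effective_address(0x2000u, 0u, 1u, -4) == 0x1FFCu,
              "0x2000 - 4");
```

Any of the four components can be omitted in a real memory operand; in this model that corresponds to passing 0 for the missing register or displacement and 1 for the scale.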

The instructions are grouped into the following categories:

  • Data transfer: mov, cmovcc, push, pop, pushad, popad, xchg, xadd, movsx, movzx
  • Data comparison: cmp, cmpxchg, cmpxchg8b
  • Data conversion: cbw, cwde, cwd, cdq, bswap, movbe, xlatb
  • Binary arithmetic: add, adc, sub, sbb, imul, mul, idiv, div, inc, dec, neg, daa, das, aaa, aas, aam, aad
  • Logical: and, or, xor, not, test
  • Rotate and shift: rcl, rcr, rol, ror, sal/shl, sar, shr, shld, shrd
  • Byte set and bit strings: setcc, bt, bts, btr, btc, bsf, bsr
  • String: cmpsb, cmpsw, cmpsd, lodsb, lodsw, lodsd, movsb, movsw, movsd, scasb, scasw, scasd, stosb, stosw, stosd, rep, repe, repz, repne, repnz
  • Flag manipulation: clc, stc, cmc, std, cld, lahf, sahf, pushfd, popfd
  • Control transfer: jmp, jcc, call, ret, enter, leave, jecxz, loop, loope, loopz, loopne, loopnz
  • Miscellaneous: bound, lea, nop, cpuid
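
For readers who know C++ better than the mnemonics, the sketch below models the results of a few instructions from the list: movzx, movsx, bswap, and rol. The function names are hypothetical stand-ins, only the 8-bit-to-32-bit and 32-bit forms are shown, and the effect on EFLAGS is ignored.

```cpp
#include <cstdint>

// movzx: zero-extend an 8-bit value to 32 bits.
constexpr std::uint32_t movzx8(std::uint8_t v) {
    return v;
}

// movsx: sign-extend an 8-bit value to 32 bits (bit 7 is copied upward).
// The XOR/subtract form avoids relying on signed conversions.
constexpr std::uint32_t movsx8(std::uint8_t v) {
    return (std::uint32_t{v} ^ 0x80u) - 0x80u;
}

// bswap: reverse the byte order of a 32-bit value.
constexpr std::uint32_t bswap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// rol: rotate left; the count is masked to 5 bits, as for 32-bit operands.
constexpr std::uint32_t rol32(std::uint32_t v, unsigned count) {
    return (v << (count & 31u)) | (v >> ((32u - (count & 31u)) & 31u));
}

static_assert(movzx8(0x80) == 0x00000080u, "upper bits are zero");
static_assert(movsx8(0x80) == 0xFFFFFF80u, "upper bits copy the sign bit");
static_assert(bswap32(0x12345678u) == 0x78563412u, "byte order reversed");
static_assert(rol32(0x80000001u, 1) == 0x00000003u, "top bit wraps to bit 0");
```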