Disassembly Basics (IDA / Ghidra)
Turning the original game’s machine code back into readable assembly (and, with a decompiler, pseudo-C) to recover the algorithm behind a format when static byte-inspection alone leaves gaps.
Why it matters
The strings utility and a hex editor reveal a format's layout, but not its logic: how PID frames are RLE-decoded, how WWD tile flags are unpacked. When the structure is ambiguous, reading CLAW.EXE's loader code resolves it. Used carefully, this stays clean-room: you read the algorithm to document it, then someone re-implements from the doc, and the decompiler output is never pasted. See reverse-engineering-mindset-ethics.
How it works
Two tools dominate: Ghidra (free, NSA, excellent decompiler) and IDA (commercial, the disassembly standard). Workflow on a 32-bit Win32 PE like CLAW.EXE:
- Find the format reader: search strings for "CLAW.WWD", follow the cross-reference (xref) to the function that opens it.
- Read the decompiler view: Ghidra's pseudo-C shows fread sizes and struct field offsets directly; a read(buf, 0x20) confirms a 32-byte header.
- Trace the loop: RLE/decode loops show up as a while reading a count byte then copying/skipping N pixels.
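To make the last step concrete, here is a minimal sketch of the kind of loop to look for. The encoding is an assumption chosen for illustration (a control byte above 0x80 skips pixels, anything else introduces a literal run), and decode_rle_sketch is a hypothetical name. The real scheme has to be confirmed against the disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical RLE scheme, assumed for illustration only:
 *   control byte c >  0x80 -> skip (c - 0x80) pixels, leaving them 0 (transparent)
 *   control byte c <= 0x80 -> copy the next c bytes as literal pixels
 * Returns the number of source bytes consumed, or 0 on malformed input. */
size_t decode_rle_sketch(const uint8_t *src, size_t src_len,
                         uint8_t *dst, size_t pixel_count)
{
    assert(src != NULL && dst != NULL);
    size_t in = 0, out = 0;
    memset(dst, 0, pixel_count);              /* start fully transparent */

    while (out < pixel_count) {
        if (in >= src_len) return 0;          /* ran out of input */
        uint8_t c = src[in++];                /* the "count byte" */

        if (c > 0x80) {                       /* skip run */
            size_t n = (size_t)(c - 0x80);
            if (n > pixel_count - out) return 0;
            out += n;
        } else {                              /* literal run */
            size_t n = c;
            if (n == 0) return 0;             /* zero-length run: treat as malformed */
            if (n > pixel_count - out || n > src_len - in) return 0;
            memcpy(dst + out, src + in, n);
            in += n;
            out += n;
        }
    }
    return in;
}
```

In a decompiler listing the same shape appears as a while over an output index, one byte load, a comparison against a threshold, and two branches that each advance the output index by N.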
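Recognise the x86 calling conventions: under cdecl and stdcall, arguments are pushed right-to-left and the return value comes back in EAX; C++ member functions compiled by MSVC use thiscall, which also passes the object pointer in ECX. A cmp/jz chain right after a 4-byte load is very often a magic-number check. See computer-architecture for register and stack basics.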
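As a rough source-level picture of that cmp/jz pattern, the sketch below loads four bytes as a little-endian integer and compares them with a constant. DEMO_MAGIC is a placeholder, not a value from any real format, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder magic, NOT taken from any real format: the bytes 'D','E','M','O'
 * as they appear after a little-endian 32-bit load. */
#define DEMO_MAGIC 0x4F4D4544u

/* Assemble a little-endian u32 from bytes, as a 32-bit x86 load from memory would. */
static uint32_t load_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Source-level shape of: mov eax,[buf] / cmp eax,imm32 / conditional jump. */
int has_demo_magic(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 4) return 0;                 /* too short to hold the magic */
    return load_u32_le(buf) == DEMO_MAGIC;
}
```

Because x86 is little-endian, the immediate in the cmp instruction shows the magic's bytes in the reverse order from how they read in a hex editor, which is worth remembering when searching for it.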
Example
Decompiling the PID loader, Ghidra emits roughly:
hdr.flags = read_u32(f); // +0x00
hdr.width = read_u32(f); // +0x08
hdr.height = read_u32(f); // +0x0C
if (hdr.flags & 1) decode_rle(f, out); // compressed
else fread(out, 1, hdr.width * hdr.height, f); // raw
That single if (hdr.flags & 1) answers a question hex-staring could not: bit 0 of the header flags toggles RLE compression. You write that fact in the spec and implement it in new code.
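The "new code" half of that split might look like the sketch below, written only from the documented facts. The field offsets are copied from the comments in the example above and are not independently verified; the struct, macro, and function names are hypothetical, and little-endian fields are assumed because the target is x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PID_FLAG_RLE 0x1u   /* bit 0 of flags: payload is RLE-compressed (per the example) */

typedef struct {
    uint32_t flags;
    uint32_t width;
    uint32_t height;
} pid_header_sketch;        /* hypothetical name; holds only the fields shown above */

static uint32_t rd_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Reads the three documented fields. Returns 1 on success, 0 if the
 * buffer is too short to contain the field at +0x0C. */
int parse_pid_header_sketch(const uint8_t *buf, size_t len, pid_header_sketch *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 0x10) return 0;
    out->flags  = rd_u32_le(buf + 0x00);
    out->width  = rd_u32_le(buf + 0x08);
    out->height = rd_u32_le(buf + 0x0C);
    return 1;
}

int pid_is_rle(const pid_header_sketch *h)
{
    return (h->flags & PID_FLAG_RLE) != 0;
}
```

Reading fields by explicit offset rather than casting the buffer to a struct keeps the parser independent of compiler padding and host endianness.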
Pitfalls
- Pasting decompiler output as your engine: that is a derivative work; document the behaviour, write fresh code.
- Trusting auto-named locals: iVar3, puVar7 are guesses; rename only once you understand them.
- Ignoring struct padding: the compiler may align fields; a "missing" 2 bytes is often padding, not a field.
- Disassembling when you do not need to: if re-rendering already matches, the loop logic is settled; do not burn hours in IDA.
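The padding pitfall is easy to reproduce. The sketch below uses two made-up structs (not taken from any real format) to show where a 2-byte gap can come from; `#pragma pack` is a compiler extension accepted by GCC, Clang, and MSVC, and the exact amount of padding depends on the target ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Natural alignment: the compiler may insert padding after `tag`
 * so that `size` starts on a 4-byte boundary. */
struct hdr_natural {
    uint16_t tag;
    uint32_t size;
};

/* Packed layout: fields are placed back to back with no padding. */
#pragma pack(push, 1)
struct hdr_packed {
    uint16_t tag;
    uint32_t size;
};
#pragma pack(pop)

/* Compile-time check that packing removed the gap. */
static_assert(offsetof(struct hdr_packed, size) == 2,
              "packed layout should place size directly after tag");

/* Bytes of padding the compiler inserted between `tag` and `size`. */
size_t padding_after_tag(void)
{
    return offsetof(struct hdr_natural, size) - sizeof(uint16_t);
}
```

On mainstream x86 compilers padding_after_tag() typically reports 2, which is exactly the kind of "missing" pair of bytes that can be mistaken for an undocumented field when comparing a decompiled struct against the file on disk.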