 The Making of My x86 Disassembler - Part 1

TSFlierMate4
post Mar 13 2023, 02:31 PM, updated 3y ago

Getting Started
**
Validating
90 posts

Joined: Jan 2023
In 2020, I expressed my intention to write a compiler, a disassembler and a QR code generator in the thread "Compiler and Disassembler, does anyone want to join?", which I started under my previous user account.

Despite the criticism, e.g.
QUOTE
Indeed, without analyzing the code flow, a disassembler, even one that decodes every byte sequence correctly, will produce a result that leaves something to be desired.


...I still plan to go ahead with an x86 disassembler.

While I am still studying how to decode CPU opcodes, I have finished the program that reads an EXE file and dumps its code section.

In part 1 (I hope there will be a part 2 in the coming weeks or months), I will share how to find the code section in an EXE file.

In the meantime, you can find my code repo "exedump" on GitHub:


You may also refer to PE Format

1. Open the file and read its first two bytes.
2. If they are 'MZ', go to the next step; if not, quit.
3. Set the file position to 0x3C and read the DWORD offset value (start of PE).
4. Set the file position to the value read in the previous step.
5. If the four bytes there are 'PE\0\0', go to the next step; if not, quit.
6. Read the adjacent WORD for the machine type (optional).
7. Read the next adjacent WORD for the number of sections.
8. Set the file position to 0x18 relative to 'start of PE'.
9. Read the magic number WORD value (0x10B for 32-bit PE, 0x20B for 64-bit PE).
10. Set the file position to 0x2C relative to 'start of PE'.
11. Read the BaseOfCode DWORD value.
12. If the magic number (read in step 9) is 0x10B,
set the file position to 0xF8 relative to 'start of PE', or else
set the file position to 0x108 relative to 'start of PE'.
13. Read the section table (each entry is 40 bytes long):
its VirtualAddress, SizeOfRawData and PointerToRawData DWORD values.
14. If VirtualAddress is equal to BaseOfCode (read in step 11), go to print the hexdump.
If it does not match, move on to the next entry until 'number of sections' (read in step 7) is reached.
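
The steps above can be sketched in Python as follows. This is only a minimal illustration, not the exedump source (which is not reproduced in this thread); the function name find_code_section and the tuple it returns are hypothetical. It uses the fixed offsets from step 12, which assume the standard optional header sizes (0xE0 for PE32, 0xF0 for PE32+); reading SizeOfOptionalHeader from the file header would be more general. It also does no bounds checking on truncated files.

```python
import struct

PE32_MAGIC = 0x10B  # PE32; PE32+ (64-bit) uses 0x20B


def find_code_section(data: bytes):
    """Return (name, virtual_address, raw_size, raw_offset) of the section
    whose VirtualAddress equals BaseOfCode, or None if nothing matches."""
    # Steps 1-2: the DOS header must start with 'MZ'.
    if data[:2] != b"MZ":
        return None
    # Steps 3-5: the DWORD at 0x3C points to the 'PE\0\0' signature.
    (pe,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe:pe + 4] != b"PE\0\0":
        return None
    # Steps 6-7: machine type and number of sections follow the signature.
    machine, num_sections = struct.unpack_from("<HH", data, pe + 4)
    # Steps 8-11: optional header magic and BaseOfCode.
    (magic,) = struct.unpack_from("<H", data, pe + 0x18)
    (base_of_code,) = struct.unpack_from("<I", data, pe + 0x2C)
    # Step 12: fixed section table offsets (assumes standard header sizes).
    table = pe + (0xF8 if magic == PE32_MAGIC else 0x108)
    # Steps 13-14: walk the 40-byte entries and compare VirtualAddress.
    for i in range(num_sections):
        entry = table + i * 40
        name = data[entry:entry + 8].rstrip(b"\0").decode("ascii", "replace")
        # VirtualAddress, SizeOfRawData, PointerToRawData sit at +12, +16, +20.
        vaddr, raw_size, raw_ptr = struct.unpack_from("<III", data, entry + 12)
        if vaddr == base_of_code:
            return name, vaddr, raw_size, raw_ptr
    return None
```

Called with the full contents of an EXE (for example the bytes returned by reading the file in binary mode), it gives back the matching section's name, virtual address, raw size and file offset, which is enough to seek to the code and print a hexdump.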

You may ask: why not just use AddressOfEntryPoint? Because it is a relative virtual address (RVA), not a file offset on disk.
To set the file position on disk, I have to read the section table (or section headers), specifically the entry for the code section, to get its PointerToRawData.
But how do we know which one is the code section? I use a trick here: I compare the VirtualAddress found in each section table entry with the BaseOfCode found in the header.
For example: (Virtual address)

CODE

'.data' section: 0x1000
'.text' section: 0x2000
'.idata' section : 0x3000


The virtual address is unique for each section, so if my BaseOfCode is 0x2000, then I know the '.text' section is the code section.
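
The point about AddressOfEntryPoint being an RVA rather than a file offset can be illustrated with a small sketch. The helper name rva_to_file_offset and the (VirtualAddress, SizeOfRawData, PointerToRawData) tuple layout are assumptions made for this example; they are not taken from exedump.

```python
def rva_to_file_offset(rva, sections):
    """Translate a relative virtual address into a file offset.

    `sections` is a list of (virtual_address, raw_size, raw_pointer)
    tuples, one per section table entry. Returns None when the RVA
    does not fall inside the raw data of any section.
    """
    for vaddr, raw_size, raw_ptr in sections:
        if vaddr <= rva < vaddr + raw_size:
            # Same distance from the start of the section, but measured
            # from where the section's bytes actually sit on disk.
            return raw_ptr + (rva - vaddr)
    return None
```

For instance, with a '.text' section at virtual address 0x2000 whose raw data starts at file offset 0x600, an entry point RVA of 0x2010 maps to file offset 0x610.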

Until part 2!

This post has been edited by FlierMate4: Mar 14 2023, 12:20 PM
KLKS
post Mar 13 2023, 05:41 PM

Getting Started
**
Junior Member
292 posts

Joined: Jan 2003


Why not contribute to and explore existing frameworks?

https://github.com/qilingframework/qiling
or
https://github.com/capstone-engine/capstone
TSFlierMate4
post Mar 13 2023, 06:14 PM

Getting Started
**
Validating
90 posts

Joined: Jan 2023
QUOTE(KLKS @ Mar 13 2023, 05:41 PM)
Why not contribute to and explore existing frameworks?

https://github.com/qilingframework/qiling
or
https://github.com/capstone-engine/capstone
*
Thank you for the suggestion. I know qiling uses capstone, and I know they are open-source, but I learn better if I start one from scratch. Furthermore, mine will only support up to the i386 instruction set, with no extension sets and no x64. I believe I will be more ready to join other open-source projects after gaining experience from doing my own.
So, maybe later. BTW, I still remember you are a rockstar malware analyst, good!
KLKS
post Mar 13 2023, 07:15 PM

Getting Started
**
Junior Member
292 posts

Joined: Jan 2003


If your objective is to write a disassembler, you should consider separating the executable parser and the disassembling engine into separate modules as your project matures.
Also, using a higher-level language for the disassembler would make it easier to maintain and for others to contribute to.

This post has been edited by KLKS: Mar 13 2023, 07:18 PM
duplicatecard P
post Mar 15 2023, 02:56 PM

New Member
*
Probation
11 posts

Joined: Oct 2022
Following! Interesting project. Great job!
junyian
post Mar 16 2023, 02:01 PM

Casual
***
Junior Member
401 posts

Joined: Jan 2003


There's also Zydis, which is a dedicated x86/64 disassembler engine.
https://github.com/zyantific/zydis

Capstone is more generic across architectures and platforms, so it is a perfect fit for what Qiling is for.
Tullamarine
post Mar 20 2023, 10:08 PM

Getting Started
**
Validating
163 posts

Joined: Apr 2020
.........

This post has been edited by Tullamarine: May 16 2023, 05:05 AM
Tullamarine
post Mar 27 2023, 04:39 PM

Getting Started
**
Validating
163 posts

Joined: Apr 2020
QUOTE(FlierMate4 @ Mar 13 2023, 02:31 PM)
But how do we know which one is the code section? I use a trick here: I compare the VirtualAddress found in each section table entry with the BaseOfCode found in the header.
For example: (Virtual address)

CODE

'.data' section: 0x1000
'.text' section: 0x2000
'.idata' section : 0x3000


The virtual address is unique for each section, so if my BaseOfCode is 0x2000, then I know the '.text' section is the code section.

*
From the source code of another disassembler I found, there is actually a better way to tell which section is the code section:

QUOTE
The section flags in the Characteristics field of the section header indicate characteristics of the section

IMAGE_SCN_CNT_CODE
0x00000020
The section contains executable code.


Next time I should use this approach; it is more reliable.
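
A rough sketch of that check, assuming the caller already has one raw 40-byte section header in hand (the function name is_code_section is made up for this example). The Characteristics DWORD is the last field of the header, at offset 36.

```python
import struct

IMAGE_SCN_CNT_CODE = 0x00000020  # "The section contains executable code."


def is_code_section(section_header: bytes) -> bool:
    """Test the IMAGE_SCN_CNT_CODE flag of one 40-byte section header."""
    # Characteristics is the final DWORD of the header (offset 36).
    (characteristics,) = struct.unpack_from("<I", section_header, 36)
    return bool(characteristics & IMAGE_SCN_CNT_CODE)
```

Unlike the BaseOfCode comparison, this can flag more than one section, so a caller would loop over every entry and collect all that match.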
Tullamarine
post May 1 2023, 04:11 PM

Getting Started
**
Validating
163 posts

Joined: Apr 2020
Hi guys, I have given up writing my own disassembler; it is too much work.
Instead, I rely on Zydis (the one recommended by @junyian). It requires Zydis.dll (600KB) to run.

Hosted on this GitHub repo:

https://github.com/exedumper/disasm/

Supports x86 and x64 EXE / DLL.

This post has been edited by Tullamarine: May 16 2023, 05:06 AM
Tullamarine
post May 6 2023, 10:19 PM

Getting Started
**
Validating
163 posts

Joined: Apr 2020
............

This post has been edited by Tullamarine: May 25 2023, 02:26 AM
Tullamarine
post May 23 2023, 06:36 PM

Getting Started
**
Validating
163 posts

Joined: Apr 2020
QUOTE(junyian @ Mar 16 2023, 02:01 PM)
There's also Zydis, which is a dedicated x86/64 disassembler engine.
https://github.com/zyantific/zydis

Capstone is more generic across architectures and platforms, so it is a perfect fit for what Qiling is for.
*
I tried to use "Zydis" as a hashtag on Twitter, but surprisingly Zydis is also a brand name of psychotropic drugs, and many of the posts with "Zydis" were referring to the medication.
Tullamarine
post May 23 2023, 06:43 PM

Getting Started
**
Validating
163 posts

Joined: Apr 2020
QUOTE(Tullamarine @ May 1 2023, 04:11 PM)
Hi guys, I have given up writing my own disassembler; it is too much work.
Instead, I rely on Zydis (the one recommended by @junyian). It requires Zydis.dll (600KB) to run.

Hosted on this GitHub repo:

https://github.com/exedumper/disasm/

Supports x86 and x64 EXE / DLL.
*
I was in too much of a rush when finishing the disasm.asm project, so I got confused about how to put a 32-bit unsigned integer into one half of a parameter that is passed as a pair of two 32-bit uints. (Little-endianness has to be considered on x86.)

Below are the necessary changes (so that the runtime address shown in a disassembled conditional jump instruction is correct):
user posted image
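
The actual change is only visible in the screenshot, so the following is just a sketch of the general idea, under the assumption that the parameter in question is a 64-bit runtime address handed over as two 32-bit values. The helper names are hypothetical.

```python
import struct


def split_u64(value: int):
    """Split a 64-bit value into its (low, high) 32-bit halves."""
    low = value & 0xFFFFFFFF
    high = (value >> 32) & 0xFFFFFFFF
    return low, high


def as_two_dwords(value: int) -> bytes:
    """Lay the halves out the way a little-endian machine stores a
    64-bit integer: low dword at the lower address, high dword after it."""
    low, high = split_u64(value)
    return struct.pack("<II", low, high)
```

For a 32-bit address such as 0x00401000, the value belongs in the low half and the high half is zero; putting it in the other half would shift the address up by 32 bits.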

This post has been edited by Tullamarine: May 23 2023, 06:44 PM
FlierMate
post May 27 2023, 08:09 PM

On my way
****
Validating
543 posts

Joined: Nov 2020
Thanks @junyian for your likes.

Below is a screenshot of my disasm.exe, disassembling write.exe (WordPad) on the fly in a command prompt window.

user posted image

Notably, "int3" is used as a padding byte for alignment in the MSVC programs that ship with Windows. You will notice sequences of "int3" throughout these programs.
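
As a small illustration, int3 encodes as the single byte 0xCC, so padding shows up as runs of 0xCC in the raw code section. The function below is a hypothetical helper, not part of disasm.asm, and it is only a heuristic: a 0xCC byte can also occur inside a longer instruction or in data.

```python
def find_int3_runs(code: bytes, min_len: int = 2):
    """Return (offset, length) for each run of 0xCC bytes that is at
    least `min_len` bytes long."""
    runs = []
    i = 0
    while i < len(code):
        if code[i] == 0xCC:
            start = i
            # Advance to the first byte that is not 0xCC.
            while i < len(code) and code[i] == 0xCC:
                i += 1
            if i - start >= min_len:
                runs.append((start, i - start))
        else:
            i += 1
    return runs
```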

But my disasm.asm doesn't have code flow analysis; anything in the code section is disassembled, regardless of whether it is data or code.
Yes, some programs may have data in the code section, but this data (or string data) is not executed. So if data is interpreted as code and disassembled by my disasm.asm, the output is wrong. :(

In that case, always use a high-end disassembler like IDA or Ghidra. :D
junyian
post May 27 2023, 11:05 PM

Casual
***
Junior Member
401 posts

Joined: Jan 2003


You should start disassembling from the entry point, not the start of the code section :)
FlierMate
post May 27 2023, 11:54 PM

On my way
****
Validating
543 posts

Joined: Nov 2020
.......

This post has been edited by FlierMate: Jun 5 2023, 08:10 PM
MatQuasar
post Aug 11 2023, 09:32 PM

Casual
***
Validating
329 posts

Joined: Jun 2023
PE-bear is so powerful that my EXEDUMP.EXE and DISASM.EXE amount to just two tiny functions of it.

https://github.com/exedumper/exed (A variant of EXEDUMP)
https://github.com/exedumper/disasm

(Both repos above have been abandoned and are no longer maintained; please don't submit PRs.)

For more up-to-date versions of exed and disasm, please download the attachments of this post:
Attached File  disasm.zip ( 223.21k ) Number of downloads: 4

Attached File  exed.zip ( 2.8k ) Number of downloads: 5


Bug fix: disasm - Runtime address endianness, command-line parsing for PowerShell
exed - Command-line parsing for PowerShell

user posted image

This post has been edited by MatQuasar: Oct 15 2023, 11:24 PM
flashang
post Aug 12 2023, 09:33 AM

Casual
***
Junior Member
355 posts

Joined: Aug 2021


QUOTE(MatQuasar @ Aug 11 2023, 09:32 PM)
PE-bear is so powerful that my EXEDUMP.EXE and DISASM.EXE amount to just two tiny functions of it.

https://github.com/exedumper/exed  (A variant of EXEDUMP)
https://github.com/exedumper/disasm

(Both repos above have been abandoned and are no longer maintained; please don't submit PRs.)

user posted image
*
Those who write compilers for Windows systems also need to study the PE Format.

Ref :
PE Format - Win32 apps | Microsoft Learn
https://learn.microsoft.com/en-us/windows/w...debug/pe-format

Portable Executable - Wikipedia
https://en.wikipedia.org/wiki/Portable_Executable


MatQuasar
post Aug 12 2023, 01:13 PM

Casual
***
Validating
329 posts

Joined: Jun 2023
QUOTE(flashang @ Aug 12 2023, 09:33 AM)
Those who write compilers for Windows systems also need to study the PE Format.

Ref :
PE Format - Win32 apps | Microsoft Learn
https://learn.microsoft.com/en-us/windows/w...debug/pe-format

Portable Executable - Wikipedia
https://en.wikipedia.org/wiki/Portable_Executable
*
I also recommend PE Tutorial by Tomasz Grysztar:

https://board.flatassembler.net/topic.php?t=20690

Also, an old document on PE format spec:

https://bytepointer.com/resources/oleary_pe_format.htm



This post has been edited by MatQuasar: Oct 15 2023, 07:34 PM

 
