Difference between revisions of "Development:Debugging Crash"
Latest revision as of 14:13, 22 September 2025
Getting Started
- Debug an application with `gdb --args FEX <application full path>`
 - Under GDB make sure to do `handle SIGBUS SIGILL SIG63 noprint`
- We use signals for various things; see Development:Debugging_FEX_with_Signals for more information
 
 
Crash in emulated/JIT code
Walking through debugging a simple test application that is crashing.
 $ gdb --args FEX ./sigsegv_test
 Reading symbols from FEX...
 (gdb) r
 Starting program: /usr/bin/FEX ./sigsegv_test
 [Thread debugging using libthread_db enabled]
 Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
 [New Thread 0x7fccb75f30 (LWP 90107)]
 Thread 2 "FEX" received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 0x7fccb75f30 (LWP 90107)]
 0x0000007fccfb9ec8 in ?? ()
- Okay, we have a sigsegv. Let's double check that it is JIT code (aka, guest emulated code)
 
 (gdb) disas $pc,+32
 Dump of assembler code from 0x7fe26ebc20 to 0x7fe26ebc40:
 => 0x0000007fe26ebc20:  stlrb   w22, [x21]
    0x0000007fe26ebc24:  mov     x4, #0x0                        // #0
    0x0000007fe26ebc28:  ldr     x10, [x20]
    0x0000007fe26ebc2c:  add     x21, x20, #0x8
    0x0000007fe26ebc30:  mov     x22, #0x0                       // #0
    0x0000007fe26ebc34:  strb    w22, [x28, #428]
    0x0000007fe26ebc38:  mov     x22, #0x0                       // #0
    0x0000007fe26ebc3c:  strb    w22, [x28, #431]
 End of assembler dump.
 (gdb) info reg x9
 x9             0x0                 0
- Looks like JIT code, even doing accesses to x28 which is the FEX CPU state
 - Code has no backtrace which reinforces this
 - Code is doing an atomic store (`stlrb` is a store-release), which reinforces that this is FEX emulating the x86 TSO memory model
 
- Now that we have confirmed we are in JIT code, where are we on the guest side?
 - Let's dump the FEX CPU state information that is directly pointed to in x28 at all times in JIT code.
 
 (gdb) p/x ((FEXCore::Core::CpuStateFrame*)$x28)->State
 $3 = {rip = 0x401110, gregs = {0x416eb0, 0x7fe1e3b640, 0xffffffffffffff70, 0x0, 0x7fe1e3bf30, 0x0, 0x416eb0, 0x7fe1e3ae28, 0x0, 0x7fe1e3b640, 0x8, 0x7fe2054cc0, 0x7ff75ff48e, 0x7ff75ff48f, 0x0, 0x7fe163b000}, xmm = {{0x0, 0x0}, {0x0, 0x0}, {0xdeadbeef, 0xbad0dad1} <repeats 14 times>}, es = 0x0, cs = 0x0, ss = 0x0, ds = 0x0, gs = 0x0, fs = 0x7fe1e3b640, flags = {0x0, 0x1, 0x0,
   0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1, 0x0 <repeats 38 times>}, mm = {{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}}, gdt = {{base = 0x0} <repeats 32 times>}, FCW = 0x37f, FTW = 0xffff}
- Looks like our guest RIP is currently `0x401110`
- Consult the `info proc mappings` again
 
 
 0x401000           0x402000     0x1000     0x1000 {...}/sigsegv_test
- Yep, we are inside our test application
 - For a simple test, let's load the application in gdb-multiarch and disassemble where we are
 
 $ gdb-multiarch ./sigsegv_test
 Reading symbols from ./sigsegv_test...
 (gdb) set disassembly-flavor intel
 (gdb) disas 0x401110
 Dump of assembler code for function main(int, char**):
    0x0000000000401110 <+0>:     push   rbp
    0x0000000000401111 <+1>:     mov    rbp,rsp
    0x0000000000401114 <+4>:     mov    DWORD PTR [rbp-0x4],0x0
    0x000000000040111b <+11>:    mov    DWORD PTR [rbp-0x8],edi
    0x000000000040111e <+14>:    mov    QWORD PTR [rbp-0x10],rsi
    0x0000000000401122 <+18>:    mov    rax,QWORD PTR [rbp-0x10]
    0x0000000000401126 <+22>:    movsxd rcx,DWORD PTR [rbp-0x8]
    0x000000000040112a <+26>:    mov    rax,QWORD PTR [rax+rcx*8]
    0x000000000040112e <+30>:    mov    QWORD PTR [rbp-0x18],rax
    0x0000000000401132 <+34>:    mov    rax,QWORD PTR [rbp-0x18]
    0x0000000000401136 <+38>:    mov    BYTE PTR [rax],0x63
    0x0000000000401139 <+41>:    xor    eax,eax
    0x000000000040113b <+43>:    pop    rbp
    0x000000000040113c <+44>:    ret
 End of assembler dump.
 (gdb)
- Okay, not super helpful: FEX translates instructions into blocks, so `0x401110` is just the starting address of the block
- It's in this code somewhere, let's change some FEX settings to get a clearer picture
 
 - Set block size to one instruction and disable multiblock
 - Now rerun our test application and find the new RIP
 
 (gdb) p/x ((FEXCore::Core::CpuStateFrame*)$x28)->State.rip
 $2 = 0x401136
- Alright, now we know the RIP is exactly at `0x401136`
 - Back in gdb-multiarch
 
 (gdb) disas 0x401136,+1
 Dump of assembler code from 0x401136 to 0x401137:
    0x0000000000401136 <main(int, char**)+38>:   mov    BYTE PTR [rax],0x63
- Looks like something in main is storing 0x63 to a nullptr
 - In this simple case we can now take a look at the test application's source and find the problem.
- We know the problem is in the first block of main()
 - We know the exact instruction that it is at
 - We know it's something storing a byte to memory
 
 - For more complex cases it is likely necessary to use reverse engineering tools
- BinaryNinja, Ghidra, IDA, and Hopper are all examples of tools like this
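The disassembly of `main` above can also be read by hand. Assuming the usual System V calling convention (`edi` holds `argc`, `rsi` holds `argv`), the load `mov rax,QWORD PTR [rax+rcx*8]` fetches `argv[argc]`, which C guarantees to be a null pointer, and the following `mov BYTE PTR [rax],0x63` writes through it. The Python sketch below only models that reasoning; the source of `sigsegv_test` is not shown on this page, and the helper names are hypothetical.

```python
def build_argv_table(args):
    """Model the argv array handed to main: argc entries plus a terminating NULL (None here)."""
    return list(args) + [None]


def faulting_pointer(args):
    """Return the value that ends up in rax right before the faulting store."""
    argc = len(args)               # movsxd rcx, DWORD PTR [rbp-0x8]
    argv = build_argv_table(args)  # mov    rax, QWORD PTR [rbp-0x10]
    return argv[argc]              # mov    rax, QWORD PTR [rax+rcx*8]


# Whatever the argument count, argv[argc] is the terminator.
pointer = faulting_pointer(["./sigsegv_test"])
print("pointer written to:", pointer)
```

Under this reading the crash does not depend on the arguments passed, which matches the store to a nullptr seen at `0x401136`.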
 
 
What to do from here
Now it becomes a lot harder. You don't get a typical debugging environment or even clean backtraces.
FEX's gdbserver integration is sorely lacking, so connecting a remote gdb to FEX is of limited use right now.
If you enable thunks you can get better backtraces here; see Debugging_Crash_In_Thunks.
Attempting to use FEX-Emu's gdbserver implementation
Here be dragons
FEX supports gdbserver as an integration. Its implementation is significantly limited but can still be used for debugging and getting some backtraces.
- Currently hardcodes the port as `8086`, so running multiple gdbserver processes at once will cause problems.
 - Currently does not follow processes through fork/execve at all. No multiprocess support
- This means you must only start the process you care about debugging
 
 - Currently starts the process paused and waits until gdb attaches before continuing
- No way to start a FEX instance then attach at some later point
 
 - Ctrl-C to stop the FEX process needs to be done twice
- Maybe with a small delay in between because gdb needs to fetch a bunch of data on pause
 - Known bug; the cause is unknown at the moment
 
 
FEXLoader -G -- <Application> <Args...>
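Because the port is fixed, it can help to confirm that nothing else is holding `8086` before launching. The following is a minimal Python sketch of such a pre-flight check. It assumes the gdbserver listens on a TCP port on the local machine; the function name `port_is_free` and the default host are illustrative choices, not part of FEX.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": something else holds the port.
            return False
    return True


# 8086 is the port named in the list above.
print("port 8086 free:", port_is_free(8086))
```

A `False` result suggests another gdbserver instance (or an unrelated service) should be stopped first.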
Double checking if we are in JIT code
 (gdb) info reg pc
 pc             0x7fccfb9ec8        0x7fccfb9ec8
 (gdb) info proc mappings
 ...
 0x7fccfb9000       0x7fcdfb9000  0x1000000        0x0
- Looks like a FEX JIT mapping; it starts out at 16MB but can scale up to 128MB
 - Depending on version of FEX we can check the base mapping for a unique string
 
 (gdb) p (char*)0x7fccfb9000
 $4 = 0x7fccfb9000 "FEXJIT::Arm64JITCore::"
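Checking an address against `info proc mappings` by eye gets tedious on larger processes. As a rough sketch, the Python helper below parses pasted mapping rows and reports which one contains a given address. It assumes each row begins with the start, end, size and offset columns in hexadecimal, as in the output shown on this page; the function names are hypothetical.

```python
import re

# First four columns of an `info proc mappings` row: start, end, size, offset.
ROW = re.compile(
    r"^\s*0x([0-9a-fA-F]+)\s+0x([0-9a-fA-F]+)\s+0x([0-9a-fA-F]+)\s+0x([0-9a-fA-F]+)"
)


def parse_mappings(text):
    """Return (start, end, size, offset) tuples for every row that parses."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append(tuple(int(group, 16) for group in match.groups()))
    return rows


def find_mapping(rows, address):
    """Return the row whose [start, end) range contains address, or None."""
    for row in rows:
        start, end = row[0], row[1]
        if start <= address < end:
            return row
    return None


# Rows copied from the gdb output on this page.
SAMPLE = """\
0x401000           0x402000     0x1000     0x1000 {...}/sigsegv_test
0x7fccfb9000 0x7fcdfb9000 0x1000000 0x0
"""

rows = parse_mappings(SAMPLE)
jit = find_mapping(rows, 0x7fccfb9ec8)
print(hex(jit[0]), jit[2] // (1024 * 1024), "MiB")
```

The same lookup works for a guest RIP such as `0x401110`, which falls in the `sigsegv_test` row.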
Getting RIP of current code block
FEX sets up an address in our CPU context to get some debug data out.
Currently this isn't exposed in a way that a debugger can see other than manually typing out gdb commands
 # Inline block header pointer
 p/x *(uint64_t*)($x28)
 # OffsetToBlockTail
 p/x *(uint32_t*)(*(uint64_t*)($x28))
 # Address of the block tail (header pointer + OffsetToBlockTail)
 p/x *(uint64_t*)($x28) + *(uint32_t*)(*(uint64_t*)($x28))
 # RIP of block
 p/x *(unsigned long long*)((*(unsigned long long*)($x28) + *(unsigned int*)(*(unsigned long long*)($x28)))+8)
 # Disassemble this block of code
 disas *(unsigned long long*)($x28),+*(unsigned long long*)(*(unsigned long long*)($x28) + *(unsigned int*)(*(unsigned long long*)($x28)))
While hard to decipher, here is basically what is happening:
- x28 is the CPU register that FEX keeps in the JIT for context accesses
- offset 184 is the offset of the `InlineJITBlockHeader` member inside of that context.
- As long as FEX is in a JIT block that offset will be valid to point to the current RIP that the block is operating on.
Getting the stack can also be very useful
x/64wx ((FEXCore::Core::CPUState*)$x28)->gregs[FEXCore::X86State::REG_RSP]
Doing raw pointer math in the block header commands above means that they work even when gdb fails to find symbols for the CPUState object, which happens very frequently for some reason.
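To make the chain of dereferences in the block header commands easier to follow, here is a small Python sketch that performs the same pointer math on a fake memory image. The layout is inferred only from those gdb expressions (a 64-bit header pointer at `$x28`, a 32-bit `OffsetToBlockTail` at the header, then a 64-bit length and a 64-bit RIP at the tail); the field names other than `OffsetToBlockTail`, and all addresses in the fake image, are made up for illustration.

```python
import struct


def read_u32(mem, base, addr):
    """Read a little-endian 32-bit value at absolute address addr."""
    return struct.unpack_from("<I", mem, addr - base)[0]


def read_u64(mem, base, addr):
    """Read a little-endian 64-bit value at absolute address addr."""
    return struct.unpack_from("<Q", mem, addr - base)[0]


def decode_block(mem, base, x28):
    header = read_u64(mem, base, x28)          # *(uint64_t*)($x28)
    tail_offset = read_u32(mem, base, header)  # OffsetToBlockTail
    tail = header + tail_offset                # address of the block tail
    length = read_u64(mem, base, tail)         # length operand of the disas command
    rip = read_u64(mem, base, tail + 8)        # RIP of block
    return {"header": header, "tail": tail, "length": length, "rip": rip}


# Fake 256-byte memory image starting at BASE; every value here is invented.
BASE = 0x1000
mem = bytearray(0x100)
struct.pack_into("<Q", mem, 0x00, 0x1020)    # context slot at x28 -> header pointer
struct.pack_into("<I", mem, 0x20, 0x40)      # header: OffsetToBlockTail
struct.pack_into("<Q", mem, 0x60, 0x44)      # tail: length of the host code
struct.pack_into("<Q", mem, 0x68, 0x401110)  # tail + 8: guest RIP

block = decode_block(mem, BASE, x28=0x1000)
print({name: hex(value) for name, value in block.items()})
```

In a live session the reads would come from the debugged process rather than a `bytearray`, but the arithmetic is the same as in the gdb expressions.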