Development:Debugging Crash
Getting Started
- Debug an application with `gdb --args FEXInterpreter <application full path>`
- Under GDB make sure to do `handle SIGBUS SIGILL SIG63 noprint`
- FEX uses these signals internally for various things, so gdb should pass them through silently (`noprint` also implies `nostop`); check out here for more information
Crash in emulated/JIT code
Here is a walkthrough of debugging a simple test application that crashes.
$ gdb --args FEXInterpreter ./sigsegv_test
Reading symbols from FEXInterpreter...
(gdb) r
Starting program: /usr/bin/FEXInterpreter ./sigsegv_test
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fccb75f30 (LWP 90107)]
Thread 2 "FEXInterpreter" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fccb75f30 (LWP 90107)]
0x0000007fccfb9ec8 in ?? ()
- Okay, we have a SIGSEGV. Let's double check that it is in JIT code (i.e., emulated guest code)
(gdb) disas $pc,+32
Dump of assembler code from 0x7fe26ebc20 to 0x7fe26ebc40:
=> 0x0000007fe26ebc20: stlrb w22, [x21]
   0x0000007fe26ebc24: mov x4, #0x0 // #0
   0x0000007fe26ebc28: ldr x10, [x20]
   0x0000007fe26ebc2c: add x21, x20, #0x8
   0x0000007fe26ebc30: mov x22, #0x0 // #0
   0x0000007fe26ebc34: strb w22, [x28, #428]
   0x0000007fe26ebc38: mov x22, #0x0 // #0
   0x0000007fe26ebc3c: strb w22, [x28, #431]
End of assembler dump.
(gdb) info reg x9
x9 0x0 0
- This looks like JIT code: it accesses memory relative to x28, which holds the FEX CPU state
- gdb has no symbol or backtrace for this code (`?? ()`), which reinforces this
- The code does an atomic store-release (`stlrb`), which is how FEX emulates the x86 TSO memory model, another sign that this is JIT code
- Now that we have confirmed we are in JIT code, where are we on the guest side?
- Dump the FEX CPU state, which x28 points to at all times while in JIT code.
(gdb) p/x ((FEXCore::Core::CpuStateFrame*)$x28)->State
$3 = {rip = 0x401110, gregs = {0x416eb0, 0x7fe1e3b640, 0xffffffffffffff70, 0x0, 0x7fe1e3bf30, 0x0, 0x416eb0, 0x7fe1e3ae28, 0x0, 0x7fe1e3b640, 0x8, 0x7fe2054cc0, 0x7ff75ff48e, 0x7ff75ff48f, 0x0, 0x7fe163b000}, xmm = {{0x0, 0x0}, {0x0, 0x0}, {0xdeadbeef, 0xbad0dad1} <repeats 14 times>}, es = 0x0, cs = 0x0, ss = 0x0, ds = 0x0, gs = 0x0, fs = 0x7fe1e3b640, flags = {0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1, 0x0 <repeats 38 times>}, mm = {{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}}, gdt = {{base = 0x0} <repeats 32 times>}, FCW = 0x37f, FTW = 0xffff}
- Looks like our guest RIP is currently `0x401110`
- Check `info proc mappings` in the FEXInterpreter gdb session to see which mapping contains that address
0x401000 0x402000 0x1000 0x1000 {...}/sigsegv_test
- Yep, we are inside our test application
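If you do this lookup often, it can be scripted. The following is a minimal sketch that assumes the column layout shown above: start, end, size, offset and an optional objfile. Some gdb versions print an extra permissions column, which this sketch would simply fold into `objfile`. `Mapping`, `parse_mappings` and `find_mapping` are hypothetical helpers, not part of FEX or gdb.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Mapping:
    start: int
    end: int      # exclusive
    size: int
    offset: int
    objfile: str  # empty for anonymous mappings such as the JIT code buffer


def parse_mappings(lines: Iterable[str]) -> list[Mapping]:
    """Parse the data rows of gdb's `info proc mappings` output."""
    result = []
    for line in lines:
        fields = line.split(None, 4)
        # Skip headers, prompts, and anything that doesn't start with a hex address.
        if len(fields) < 4 or not fields[0].startswith("0x"):
            continue
        try:
            start, end, size, offset = (int(f, 16) for f in fields[:4])
        except ValueError:
            continue
        objfile = fields[4].strip() if len(fields) == 5 else ""
        result.append(Mapping(start, end, size, offset, objfile))
    return result


def find_mapping(mappings: Iterable[Mapping], addr: int) -> Optional[Mapping]:
    """Return the mapping whose [start, end) range contains addr, or None."""
    return next((m for m in mappings if m.start <= addr < m.end), None)
```

Paste the mappings output into `parse_mappings(text.splitlines())`, then look up the guest RIP with `find_mapping`.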
- For a simple test, let's load the application in gdb-multiarch and disassemble where we are
$ gdb-multiarch ./sigsegv_test
Reading symbols from ./sigsegv_test...
(gdb) set disassembly-flavor intel
(gdb) disas 0x401110
Dump of assembler code for function main(int, char**):
   0x0000000000401110 <+0>: push rbp
   0x0000000000401111 <+1>: mov rbp,rsp
   0x0000000000401114 <+4>: mov DWORD PTR [rbp-0x4],0x0
   0x000000000040111b <+11>: mov DWORD PTR [rbp-0x8],edi
   0x000000000040111e <+14>: mov QWORD PTR [rbp-0x10],rsi
   0x0000000000401122 <+18>: mov rax,QWORD PTR [rbp-0x10]
   0x0000000000401126 <+22>: movsxd rcx,DWORD PTR [rbp-0x8]
   0x000000000040112a <+26>: mov rax,QWORD PTR [rax+rcx*8]
   0x000000000040112e <+30>: mov QWORD PTR [rbp-0x18],rax
   0x0000000000401132 <+34>: mov rax,QWORD PTR [rbp-0x18]
   0x0000000000401136 <+38>: mov BYTE PTR [rax],0x63
   0x0000000000401139 <+41>: xor eax,eax
   0x000000000040113b <+43>: pop rbp
   0x000000000040113c <+44>: ret
End of assembler dump.
(gdb)
- Okay, not super helpful: FEX translates instructions into blocks, so `0x401110` is only the starting address of the block
- The fault is somewhere in this code, so let's change some FEX settings to get a clearer picture
- In FEXConfig, set the block size to one instruction and disable multiblock
- Now rerun our test application and find the new RIP
(gdb) p/x ((FEXCore::Core::CpuStateFrame*)$x28)->State.rip
$2 = 0x401136
- Alright, now we know the RIP is exactly at `0x401136`
- Back in gdb-multiarch
(gdb) disas 0x401136,+1
Dump of assembler code from 0x401136 to 0x401137:
   0x0000000000401136 <main(int, char**)+38>: mov BYTE PTR [rax],0x63
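- Looks like something in main is storing 0x63 through a null pointer: `rax` was loaded from `[rax+rcx*8]` with `rax = argv` and `rcx = argc`, i.e. from `argv[argc]`, which is always NULL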
- In this simple case we can now take a look at the test application's source and find the problem.
- We know the problem is in the first block of main()
- We know the exact instruction that it is at
- We know it's something storing a byte to memory
- For more complex cases it is likely necessary to use reverse engineering tools
- BinaryNinja, Ghidra, IDA, and Hopper are all examples of tools like this
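A toy model helps explain why the first RIP pointed only at the start of `main`. This is a hypothetical sketch, not FEX's actual block formation. It assumes straight-line code is simply cut into blocks of at most `max_insts` instructions, and that the context records only the block start as RIP. FEX also ends blocks at branches, and multiblock changes things further. The addresses are the instruction addresses of `main` from the disassembly above.

```python
# Instruction addresses of main() from the gdb-multiarch disassembly above.
MAIN_INSNS = [
    0x401110, 0x401111, 0x401114, 0x40111B, 0x40111E, 0x401122, 0x401126,
    0x40112A, 0x40112E, 0x401132, 0x401136, 0x401139, 0x40113B, 0x40113C,
]


def block_start_of(insns: list[int], max_insts: int) -> dict[int, int]:
    """Toy model: map each instruction to the start of the block containing it.

    Hypothetical simplification: blocks are consecutive runs of at most
    max_insts instructions, and the reported RIP is the block's start.
    """
    if max_insts < 1:
        raise ValueError("max_insts must be at least 1")
    return {addr: insns[i - i % max_insts] for i, addr in enumerate(insns)}


FAULT = 0x401136  # the mov BYTE PTR [rax],0x63 store
print(hex(block_start_of(MAIN_INSNS, 64)[FAULT]))  # large blocks: 0x401110
print(hex(block_start_of(MAIN_INSNS, 1)[FAULT]))   # one instruction per block: 0x401136
```

With one instruction per block, the block start and the faulting instruction coincide. That is why lowering the block size pins down the exact RIP.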
What to do from here
Now it becomes a lot harder. You don't get a typical debugging environment or even clean backtraces.
FEX's gdbserver integration is sorely lacking, so connecting a remote gdb to FEX is of very limited use right now.
If you enable thunks you can get better backtraces; see Debugging_Crash_In_Thunks.
Attempting to use FEX-Emu's gdbserver implementation
Here be dragons
FEX has a built-in gdbserver integration. Its implementation is significantly limited, but it can still be used for debugging and getting some backtraces.
- The port is currently hardcoded to `8086`, so running multiple gdbserver processes at once causes problems.
- It currently does not follow processes through fork/execve at all. There is no multiprocess support.
- This means you must start only the process you care about debugging.
- It currently starts the process paused and waits until gdb attaches before continuing.
- There is no way to start a FEX instance and attach at some later point.
- Stopping the FEX process with Ctrl-C needs to be done twice.
- Possibly with a small delay in between, because gdb needs to fetch a bunch of data on pause.
- This is a known bug; the cause is unknown at the moment.
FEXLoader -G -- <Application> <Args...>
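Because the port is hardcoded, a second gdbserver-enabled FEX instance will collide with the first. A quick pre-flight check can catch that before launching. This is a hypothetical helper, not part of FEX. It only tests whether a TCP socket can currently be bound on the loopback address, and another process could still take the port between the check and FEX starting.

```python
import errno
import socket

# The gdbserver port FEX currently hardcodes, per the notes above.
FEX_GDBSERVER_PORT = 8086


def port_is_free(port: int = FEX_GDBSERVER_PORT, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now.

    Only a heuristic: the result can be stale by the time FEX starts.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as e:
            if e.errno == errno.EADDRINUSE:
                return False
            raise  # some other failure (e.g. permissions); don't hide it
    return True
```

If this returns `False`, something (likely another FEX instance) already holds the port. Once FEX is waiting, gdb's standard `target remote :8086` should connect, assuming the gdbserver listens on TCP as the port number suggests.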
Double checking if we are in JIT code
(gdb) info reg pc
pc 0x7fccfb9ec8 0x7fccfb9ec8
(gdb) info proc mappings
...
0x7fccfb9000 0x7fcdfb9000 0x1000000 0x0
- This looks like the FEX JIT code mapping: it starts out at 16MB (`0x1000000`) but can scale up to 128MB
- Depending on the FEX version, the base of the mapping contains a unique string we can check for
(gdb) p (char*)0x7fccfb9000
$4 = 0x7fccfb9000 "FEXJIT::Arm64JITCore::"
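Both checks can be combined into one heuristic. This is a minimal sketch that assumes the size range given above (16MB up to 128MB) and a marker beginning with `FEXJIT::` as in the output above. Older FEX versions may not write the marker, so that check is optional. The function name is hypothetical. Inside gdb, the base bytes could be read with something like `bytes(gdb.selected_inferior().read_memory(start, 8))`.

```python
MIB = 1024 * 1024
# From the notes above: the JIT code mapping starts at 16MB and can grow to 128MB.
JIT_MIN_SIZE = 16 * MIB
JIT_MAX_SIZE = 128 * MIB
# Prefix of the marker string seen at the mapping base in some FEX versions.
JIT_MARKER = b"FEXJIT::"


def looks_like_fex_jit(pc: int, start: int, end: int, base_bytes: bytes = b"") -> bool:
    """Heuristic: is pc inside a mapping shaped like the FEX JIT code buffer?

    start/end come from `info proc mappings` (end is exclusive). base_bytes,
    if provided, are the first bytes read from start and must begin with
    the marker. Pass nothing for FEX versions that do not write the marker.
    """
    if not start <= pc < end:
        return False
    if not JIT_MIN_SIZE <= end - start <= JIT_MAX_SIZE:
        return False
    return not base_bytes or base_bytes.startswith(JIT_MARKER)


# Values from the session above.
print(looks_like_fex_jit(0x7FCCFB9EC8, 0x7FCCFB9000, 0x7FCDFB9000,
                         b"FEXJIT::Arm64JITCore::"))  # True
```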
Getting RIP of current code block
FEX stores a pointer in the CPU context that lets us get some debug data out.
Currently this isn't exposed in a way a debugger can see, so the gdb commands have to be typed out manually:
p/x *(uint64_t*)($x28+184) = inline block header ptr
p/x *(uint32_t*)(*(uint64_t*)($x28+184)) = OffsetToBlockTail
p/x *(uint64_t*)($x28+184) + *(uint32_t*)(*(uint64_t*)($x28+184)) = block tail ptr
p/x *(uint64_t*)((*(uint64_t*)($x28+184) + *(uint32_t*)(*(uint64_t*)($x28+184)))+8) = RIP of block
disas *(uint64_t*)($x28+184),+*(uint64_t*)(*(uint64_t*)($x28+184) + *(uint32_t*)(*(uint64_t*)($x28+184))) = Disassemble this block of code.
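While hard to decipher, here is basically what is happening:
- x28 is the host register that FEX keeps pointed at the CPU context for accesses from the JIT
- Offset 184 is the offset of the `InlineJITBlockHeader` member inside that context; it holds a pointer to the current block's inline header
- The first 32 bits of that header are `OffsetToBlockTail`; adding it to the header pointer gives the block tail, whose first 64 bits are used as the `disas` length and whose next 64 bits are the block's guest RIP
- As long as FEX is executing a JIT block, this chain is valid and leads to the guest RIP the block is operating on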
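The same pointer chain can be written as a small Python sketch. This makes the layout easier to follow and could be wrapped in a gdb Python command. It makes several assumptions:
- the host is little-endian, which holds for both Arm64 Linux and x86;
- the offset 184 from above is correct for your FEX version;
- the tail layout is as inferred from the commands above: a u64 length at the tail and the guest RIP at tail+8.

`read_mem` stands in for any memory reader. `BlockInfo` and `current_block` are hypothetical names.

```python
import struct
from typing import Callable, NamedTuple

# read_mem(address, length) -> bytes; a stand-in for whatever reads target memory.
ReadMem = Callable[[int, int], bytes]

# Offset of the InlineJITBlockHeader pointer in the FEX context (from the notes above).
# It may change between FEX versions.
INLINE_HEADER_OFFSET = 184


class BlockInfo(NamedTuple):
    header: int     # inline block header pointer, also the `disas` start address
    tail: int       # header + OffsetToBlockTail
    size: int       # u64 at tail + 0, used as the `disas` length
    guest_rip: int  # u64 at tail + 8, the guest RIP of the block


def _u32(read_mem: ReadMem, addr: int) -> int:
    return struct.unpack("<I", read_mem(addr, 4))[0]


def _u64(read_mem: ReadMem, addr: int) -> int:
    return struct.unpack("<Q", read_mem(addr, 8))[0]


def current_block(read_mem: ReadMem, ctx: int) -> BlockInfo:
    """Follow the same chain as the gdb commands above.

    ctx is the value of the context register ($x28 on Arm64 hosts).
    """
    header = _u64(read_mem, ctx + INLINE_HEADER_OFFSET)
    tail = header + _u32(read_mem, header)  # first 32 bits of the header: OffsetToBlockTail
    return BlockInfo(header, tail, _u64(read_mem, tail), _u64(read_mem, tail + 8))
```

Inside gdb, `read_mem` could be `lambda a, n: bytes(gdb.selected_inferior().read_memory(a, n))`, and `ctx` could be `int(gdb.parse_and_eval("$x28"))`.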
Getting the stack can also be very useful
x/64wx ((FEXCore::Core::CPUState*)$x28)->gregs[FEXCore::X86State::REG_RSP]
Doing raw pointer math in the block header commands means they work even when gdb fails to find symbols for the CPUState object, which happens very frequently for some reason.
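The guest stack holds 64-bit values, so gdb's `x/32gx` (giant, 8-byte units) over the same address may read more naturally. If you already have `x/64wx` output, each stack slot is two consecutive words, low word first on these little-endian hosts. Below is a minimal, hypothetical helper to recombine them. It assumes the usual `address: word word ...` line shape of gdb's `x` output.

```python
def parse_x_words(text: str) -> list[int]:
    """Collect the hex values from gdb `x/Nwx` output, dropping address labels."""
    words = []
    for line in text.splitlines():
        if line.startswith("(gdb)"):
            continue  # skip the command line itself
        # Lines look like "<address> [<symbol>]:<tab>word<tab>word...". Splitting at
        # the last colon also skips C++ symbol labels containing "::".
        _, sep, rest = line.rpartition(":")
        if sep:
            words.extend(int(tok, 16) for tok in rest.split())
    return words


def words_to_u64(words: list[int]) -> list[int]:
    """Pair little-endian 32-bit words (low word first) into 64-bit stack slots."""
    if len(words) % 2:
        raise ValueError("need an even number of 32-bit words")
    return [lo | (hi << 32) for lo, hi in zip(words[0::2], words[1::2])]
```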
Getting RIP of current code block - x86 host
FEX stores a pointer in the CPU context that lets us get some debug data out.
Currently this isn't exposed in a way a debugger can see, so the gdb commands have to be typed out manually:
p/x *(unsigned long long*)($r14+184) = inline block header ptr
p/x *(unsigned int*)(*(unsigned long long*)($r14+184)) = OffsetToBlockTail
p/x *(unsigned long long*)($r14+184) + *(unsigned int*)(*(unsigned long long*)($r14+184)) = block tail ptr
p/x *(unsigned long long*)((*(unsigned long long*)($r14+184) + *(unsigned int*)(*(unsigned long long*)($r14+184)))+8) = RIP of block
disas *(unsigned long long*)($r14+184),+*(unsigned long long*)(*(unsigned long long*)($r14+184) + *(unsigned int*)(*(unsigned long long*)($r14+184))) = Disassemble this block of code.
While hard to decipher, here is basically what is happening:
- r14 is the host register that FEX keeps pointed at the CPU context for accesses from the JIT on x86 hosts
- Offset 184 is the offset of the `InlineJITBlockHeader` member inside that context
- As long as FEX is executing a JIT block, that pointer is valid and leads (through the block tail) to the guest RIP the block is operating on
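The Arm64 and x86 variants differ only in the context register. Generating the commands from one template therefore avoids slips such as an x86 expression that still reads `$x28`. This is a hypothetical sketch. It assumes the offset 184 and the C types used in the x86 commands above. Those are gdb builtin types, so they parse even without `uint64_t` typedefs in the debug info.

```python
# Host register FEX keeps pointed at the CPU context inside JIT code.
CONTEXT_REG = {"arm64": "$x28", "x86": "$r14"}
INLINE_HEADER_OFFSET = 184  # from the notes above; may change between FEX versions

U64 = "unsigned long long"
U32 = "unsigned int"


def block_commands(host: str) -> dict[str, str]:
    """Build the block-inspection gdb commands for a host ("arm64" or "x86")."""
    reg = CONTEXT_REG[host]
    header = f"*({U64}*)({reg}+{INLINE_HEADER_OFFSET})"  # inline block header ptr
    offset = f"*({U32}*)({header})"                      # OffsetToBlockTail
    tail = f"({header} + {offset})"                       # block tail ptr
    return {
        "header": f"p/x {header}",
        "offset_to_tail": f"p/x {offset}",
        "tail": f"p/x {tail}",
        "rip": f"p/x *({U64}*)({tail}+8)",
        "disas": f"disas {header},+*({U64}*){tail}",
    }


for name, cmd in block_commands("x86").items():
    print(f"{name:>14}: {cmd}")
```

For `x86`, the generated `rip` and `disas` lines are identical to the hand-written commands above.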
Getting the stack can also be very useful
x/64wx ((FEXCore::Core::CPUState*)$r14)->gregs[FEXCore::X86State::REG_RSP]
Doing raw pointer math in the block header commands means they work even when gdb fails to find symbols for the CPUState object, which happens very frequently for some reason.