Development:Debugging Crash
Getting Started
- Debug an application with `gdb --args FEXInterpreter <application full path>`
- Under GDB make sure to do `handle SIGBUS SIGILL SIG63 noprint`
- FEX uses signals internally for various things, so gdb needs to pass these through without stopping on them
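The two steps above can be folded into a single invocation by passing the `handle` command with gdb's `-ex` option. The following Python sketch only builds that command line; the helper name `gdb_command` is made up for illustration, and it assumes `FEXInterpreter` and `gdb` are on the `PATH`.

```python
import shlex


def gdb_command(application, app_args=(), interpreter="FEXInterpreter"):
    """Build a gdb argv that silences the signals FEX uses internally.

    `gdb_command` is a hypothetical helper, not part of FEX.
    """
    return [
        "gdb",
        # Run the `handle` command from the text before the program starts.
        "-ex", "handle SIGBUS SIGILL SIG63 noprint",
        "--args", interpreter, application, *app_args,
    ]


# Print a copy-pasteable shell command for the test application.
print(shlex.join(gdb_command("./sigsegv_test")))
```

The list returned by `gdb_command` can also be handed directly to `subprocess.run` if the session should be started from a script.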
Crash in emulated/JIT code
Walking through debugging a simple test application that is crashing.
$ gdb --args FEXInterpreter ./sigsegv_test
Reading symbols from FEXInterpreter...
(gdb) r
Starting program: /usr/bin/FEXInterpreter ./sigsegv_test
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fccb75f30 (LWP 90107)]
Thread 2 "FEXInterpreter" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fccb75f30 (LWP 90107)]
0x0000007fccfb9ec8 in ?? ()
- Okay, we have a SIGSEGV. Let's double check that it is in JIT code (i.e. emulated guest code)
(gdb) disas $pc,+32
Dump of assembler code from 0x7fe26ebc20 to 0x7fe26ebc40:
=> 0x0000007fe26ebc20: stlrb w22, [x21]
   0x0000007fe26ebc24: mov x4, #0x0 // #0
   0x0000007fe26ebc28: ldr x10, [x20]
   0x0000007fe26ebc2c: add x21, x20, #0x8
   0x0000007fe26ebc30: mov x22, #0x0 // #0
   0x0000007fe26ebc34: strb w22, [x28, #428]
   0x0000007fe26ebc38: mov x22, #0x0 // #0
   0x0000007fe26ebc3c: strb w22, [x28, #431]
End of assembler dump.
(gdb) info reg x9
x9 0x0 0
- Looks like JIT code; it even accesses memory through x28, which points to the FEX CPU state
- The code has no backtrace, which reinforces this
- The code is doing an atomic store (`stlrb`), which reinforces that this is FEX emulating the x86 TSO memory model
- Now that we have confirmed we are in JIT code, where are we on the guest side?
- Let's dump the FEX CPU state information that is directly pointed to in x28 at all times in JIT code.
(gdb) p/x ((FEXCore::Core::CpuStateFrame*)$x28)->State
$3 = {rip = 0x401110,
  gregs = {0x416eb0, 0x7fe1e3b640, 0xffffffffffffff70, 0x0, 0x7fe1e3bf30, 0x0, 0x416eb0, 0x7fe1e3ae28, 0x0, 0x7fe1e3b640, 0x8, 0x7fe2054cc0, 0x7ff75ff48e, 0x7ff75ff48f, 0x0, 0x7fe163b000},
  xmm = {{0x0, 0x0}, {0x0, 0x0}, {0xdeadbeef, 0xbad0dad1} <repeats 14 times>},
  es = 0x0, cs = 0x0, ss = 0x0, ds = 0x0, gs = 0x0, fs = 0x7fe1e3b640,
  flags = {0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1, 0x0 <repeats 38 times>},
  mm = {{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}},
  gdt = {{base = 0x0} <repeats 32 times>},
  FCW = 0x37f, FTW = 0xffff}
- Looks like our guest RIP is currently `0x401110`
- Consult `info proc mappings` to see which mapping contains that address
0x401000 0x402000 0x1000 0x1000 {...}/sigsegv_test
- Yep, we are inside our test application
- For a simple test, let's load the application in gdb-multiarch and disassemble where we are
$ gdb-multiarch ./sigsegv_test
Reading symbols from ./sigsegv_test...
(gdb) set disassembly-flavor intel
(gdb) disas 0x401110
Dump of assembler code for function main(int, char**):
   0x0000000000401110 <+0>: push rbp
   0x0000000000401111 <+1>: mov rbp,rsp
   0x0000000000401114 <+4>: mov DWORD PTR [rbp-0x4],0x0
   0x000000000040111b <+11>: mov DWORD PTR [rbp-0x8],edi
   0x000000000040111e <+14>: mov QWORD PTR [rbp-0x10],rsi
   0x0000000000401122 <+18>: mov rax,QWORD PTR [rbp-0x10]
   0x0000000000401126 <+22>: movsxd rcx,DWORD PTR [rbp-0x8]
   0x000000000040112a <+26>: mov rax,QWORD PTR [rax+rcx*8]
   0x000000000040112e <+30>: mov QWORD PTR [rbp-0x18],rax
   0x0000000000401132 <+34>: mov rax,QWORD PTR [rbp-0x18]
   0x0000000000401136 <+38>: mov BYTE PTR [rax],0x63
   0x0000000000401139 <+41>: xor eax,eax
   0x000000000040113b <+43>: pop rbp
   0x000000000040113c <+44>: ret
End of assembler dump.
(gdb)
- Okay, not super helpful: FEX translates instructions into blocks, so `0x401110` is just the starting address of the block
- It's in this code somewhere, let's change some FEX settings to get a clearer picture
- Set block size to one instruction and disable multiblock
- Both options can be changed in FEXConfig
- Now rerun our test application and find the new RIP
(gdb) p/x ((FEXCore::Core::CpuStateFrame*)$x28)->State.rip
$2 = 0x401136
- Alright, now we know the RIP is exactly at `0x401136`
- Back in gdb-multiarch
(gdb) disas 0x401136,+1
Dump of assembler code from 0x401136 to 0x401137:
   0x0000000000401136 <main(int, char**)+38>: mov BYTE PTR [rax],0x63
- Looks like something in main is storing 0x63 to a nullptr
- In this simple case we can now take a look at the test application's source and find the problem.
- We know the problem is in the first block of main()
- We know the exact instruction that it is at
- We know it's something storing a byte to memory
- For more complex cases it is likely necessary to use reverse engineering tools
- BinaryNinja, Ghidra, IDA, and Hopper are all examples of tools like this
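When the guest function is larger than this example, matching the RIP from the CPU state against a disassembly listing by eye gets tedious. As a rough sketch, the following Python snippet parses a saved gdb-multiarch listing and looks up the instruction at a given RIP. The function name and regular expression are illustrative assumptions, and the sample lines are copied from the listing above.

```python
import re

# Matches lines such as "0x0000000000401136 <+38>: mov BYTE PTR [rax],0x63",
# optionally prefixed by gdb's "=>" current-instruction marker.
DISAS_LINE = re.compile(r"^\s*(?:=>\s*)?(0x[0-9a-fA-F]+)\s+<\+(\d+)>:\s+(.*\S)\s*$")


def parse_disassembly(text):
    """Map each instruction address to (offset in function, instruction text)."""
    instructions = {}
    for line in text.splitlines():
        match = DISAS_LINE.match(line)
        if match:
            address = int(match.group(1), 16)
            # Collapse runs of whitespace so the text is easy to compare.
            instruction = " ".join(match.group(3).split())
            instructions[address] = (int(match.group(2)), instruction)
    return instructions


LISTING = """\
   0x0000000000401132 <+34>: mov rax,QWORD PTR [rbp-0x18]
   0x0000000000401136 <+38>: mov BYTE PTR [rax],0x63
   0x0000000000401139 <+41>: xor eax,eax
"""

guest_rip = 0x401136  # value read from the FEX CPU state in the walkthrough
offset, instruction = parse_disassembly(LISTING)[guest_rip]
print(f"main+{offset}: {instruction}")
```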
What to do from here
Now it becomes a lot harder. You don't get a typical debugging environment or even clean backtraces.
FEX's gdbserver integration is sorely lacking so you can't even use a remote gdb server connecting to FEX right now.
If you enable thunks you can get better backtraces here; see Debugging_Crash_In_Thunks.
Attempting to use FEX-Emu's gdbserver implementation
Here be dragons
FEX supports gdbserver as an integration. Its implementation is significantly limited but can still be used for debugging and getting some backtraces.
- The port is currently hardcoded to `8086`, so running multiple gdbserver processes at the same time will cause problems.
- Currently does not follow processes through fork/execve at all. No multiprocess support
- This means you must start only the process you care about debugging
- Currently starts the process paused and waits until gdb attaches before continuing
- No way to start a FEX instance then attach at some later point
- Ctrl-C to stop the FEX process needs to be done twice
- Maybe with a small delay in between, because gdb needs to fetch a bunch of data on pause
- Known bug; the cause is currently unknown
FEXLoader -G -- <Application> <Args...>
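Because the port is fixed, a leftover gdbserver instance can get in the way of a new session. The following Python sketch checks whether the port can still be bound before FEX is launched. `port_is_free` is a hypothetical helper, and it assumes the check runs on the same machine and that probing the IPv4 loopback address is sufficient.

```python
import socket

FEX_GDBSERVER_PORT = 8086  # hardcoded by FEX according to the list above


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Usually "address already in use": something else owns the port.
            return False
    return True


if port_is_free(FEX_GDBSERVER_PORT):
    print("port 8086 is free, FEX's gdbserver should be able to use it")
else:
    print("port 8086 is busy, stop the other gdbserver process first")
```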
Double checking if we are in JIT code
(gdb) info reg pc
pc 0x7fccfb9ec8 0x7fccfb9ec8
(gdb) info proc mappings
...
0x7fccfb9000 0x7fcdfb9000 0x1000000 0x0
- Looks like a FEX JIT mapping: it starts out at 16MB but can scale up to 128MB
- Depending on the version of FEX, we can check the base of the mapping for a unique string
(gdb) p (char*)0x7fccfb9000
$4 = 0x7fccfb9000 "FEXJIT::Arm64JITCore::"
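The same check can be scripted when the mapping list is long. The following Python sketch parses saved `info proc mappings` output and reports which mapping contains an address. It assumes the column layout shown on this page (start, end, size, offset, optional file name); newer gdb versions may print extra columns, and the helper names are made up for illustration.

```python
def parse_mappings(text):
    """Parse `info proc mappings` rows into (start, end, objfile) tuples."""
    mappings = []
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, "..." lines, and anything that is not a mapping row.
        if len(fields) < 4 or not fields[0].startswith("0x"):
            continue
        start, end = int(fields[0], 16), int(fields[1], 16)
        objfile = " ".join(fields[4:])  # empty for anonymous mappings
        mappings.append((start, end, objfile))
    return mappings


def find_mapping(mappings, address):
    """Return the mapping that contains `address`, or None."""
    for start, end, objfile in mappings:
        if start <= address < end:
            return start, end, objfile
    return None


# Rows copied from the gdb output on this page.
SAMPLE = """\
0x401000 0x402000 0x1000 0x1000 {...}/sigsegv_test
0x7fccfb9000 0x7fcdfb9000 0x1000000 0x0
"""

maps = parse_mappings(SAMPLE)
start, end, objfile = find_mapping(maps, 0x7fccfb9ec8)
# An anonymous 16MB mapping is a hint, not proof, that the pc is in JIT code.
print(hex(start), hex(end - start), objfile or "<anonymous>")
```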
Getting RIP of current code block
FEX sets up an address in our CPU context to get some debug data out.
Currently this isn't exposed in a way that a debugger can see other than manually typing out gdb commands
# Inline block header pointer
p/x *(uint64_t*)($x28+184)
# OffsetToBlockTail
p/x *(uint32_t*)(*(uint64_t*)($x28+184))
# Address of the block tail (header pointer + OffsetToBlockTail)
p/x *(uint64_t*)($x28+184) + *(uint32_t*)(*(uint64_t*)($x28+184))
# RIP of block
p/x *(unsigned long long*)((*(unsigned long long*)($x28+184) + *(unsigned int*)(*(unsigned long long*)($x28+184)))+8)
# Disassemble this block of code
disas *(unsigned long long*)($x28+184),+*(unsigned long long*)(*(unsigned long long*)($x28+184) + *(unsigned int*)(*(unsigned long long*)($x28+184)))
While hard to decipher, here is basically what is happening:
- x28 is the CPU register that FEX keeps in the JIT for context accesses
- offset 184 is the offset of the `InlineJITBlockHeader` member inside of that context.
- As long as FEX is executing a JIT block, the pointer at that offset is valid and leads to the guest RIP of the block currently being executed.
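The pointer chase in those gdb commands may be easier to follow as plain code. The following Python sketch replays the same arithmetic against a synthetic little-endian memory buffer. The buffer contents, addresses, and helper names are made up for illustration; the offset 184 and the tail layout (a length at +0, the RIP at +8) are taken from the commands above and may differ between FEX versions.

```python
import struct

INLINE_JIT_BLOCK_HEADER_OFFSET = 184  # from the text; version dependent


def read_u32(mem, addr):
    return struct.unpack_from("<I", mem, addr)[0]


def read_u64(mem, addr):
    return struct.unpack_from("<Q", mem, addr)[0]


def decode_block(mem, x28):
    """Follow x28 -> inline block header -> block tail, as the gdb commands do."""
    header = read_u64(mem, x28 + INLINE_JIT_BLOCK_HEADER_OFFSET)
    offset_to_tail = read_u32(mem, header)  # first field of the header
    tail = header + offset_to_tail
    return {
        "header": header,
        "tail": tail,
        "length": read_u64(mem, tail),   # used as the `disas` length above
        "rip": read_u64(mem, tail + 8),  # guest RIP of the block
    }


# Synthetic memory image: "addresses" are simply offsets into this buffer.
mem = bytearray(0x400)
x28, header = 0x000, 0x200
struct.pack_into("<Q", mem, x28 + INLINE_JIT_BLOCK_HEADER_OFFSET, header)
struct.pack_into("<I", mem, header, 0x40)             # OffsetToBlockTail
struct.pack_into("<Q", mem, header + 0x40, 0x44)      # made-up block length
struct.pack_into("<Q", mem, header + 0x48, 0x401136)  # guest RIP

block = decode_block(mem, x28)
print({name: hex(value) for name, value in block.items()})
```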
Getting the stack can also be very useful
x/64wx ((FEXCore::Core::CPUState*)$x28)->gregs[FEXCore::X86State::REG_RSP]
The raw pointer math in the block header commands has the advantage that it works even when gdb fails to find symbols for the CPUState object, which happens very frequently for some reason.
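The `x/64wx` stack dump prints 32-bit words, while guest pointers and return addresses are 64 bits wide. As a small sketch, the following Python snippet recombines pairs of words from a saved dump into 64-bit little-endian values. The sample line is invented for illustration and is not real output from the test application, and the helper names are hypothetical.

```python
def parse_words(dump):
    """Collect the 32-bit words from `x/Nwx` output lines."""
    words = []
    for line in dump.splitlines():
        # Everything after the last ':' is data; the part before it is the
        # address, possibly followed by a <symbol+offset> annotation.
        _, colon, values = line.rpartition(":")
        if not colon:
            continue
        words.extend(int(token, 16) for token in values.split())
    return words


def to_qwords(words):
    """Combine (low, high) word pairs into 64-bit little-endian values."""
    return [low | (high << 32) for low, high in zip(words[0::2], words[1::2])]


# Made-up sample line in the shape gdb prints for x/4wx.
SAMPLE = "0x7fe1e3ae28:\t0x00401139\t0x00000000\t0xe1e3b640\t0x0000007f\n"

for value in to_qwords(parse_words(SAMPLE)):
    print(hex(value))
```

Values that fall inside the guest binary's mapping are candidates for return addresses and can be disassembled in gdb-multiarch as shown earlier.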