Development:Debugging Crash
Revision as of 18:03, 9 March 2022
Getting Started
- Debug an application with `gdb --args FEXInterpreter <application full path>`
- Under GDB make sure to do `handle SIGBUS SIGILL SIG63 noprint`
- We use signals for various things, check out Here for more information
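The two steps above can be folded into a single launch command. The sketch below only assembles the argument list, using GDB's standard `-ex` and `--args` options; the helper name `build_gdb_argv` and the example path are hypothetical and not part of FEX.

```python
import shlex

# Signals named on this page that FEX uses internally.
FEX_SIGNALS = ("SIGBUS", "SIGILL", "SIG63")


def build_gdb_argv(app_path, app_args=()):
    """Return an argv list that starts GDB on FEXInterpreter with the
    `handle ... noprint` command from this page queued via -ex."""
    handle_cmd = "handle " + " ".join(FEX_SIGNALS) + " noprint"
    return ["gdb", "-ex", handle_cmd, "--args", "FEXInterpreter", app_path, *app_args]


if __name__ == "__main__":
    # Hypothetical application path; prints a copy-pasteable command line.
    # Nothing is executed here.
    print(shlex.join(build_gdb_argv("/full/path/to/sigsegv_test")))
```

Printing the command rather than running it keeps the helper harmless on machines where GDB or FEX is not installed.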
Crash in emulated/JIT code
Walking through debugging a simple test application that is crashing.
    $ gdb --args FEXInterpreter ./sigsegv_test
    Reading symbols from FEXInterpreter...
    (gdb) r
    Starting program: /usr/bin/FEXInterpreter ./sigsegv_test
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
    [New Thread 0x7fccb75f30 (LWP 90107)]
    Thread 2 "FEXInterpreter" received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0x7fccb75f30 (LWP 90107)]
    0x0000007fccfb9ec8 in ?? ()
- Okay, we have a sigsegv. Let's double check that it is JIT code (aka, guest emulated code)
    (gdb) disas $pc,+32
    Dump of assembler code from 0x7fe26ebc20 to 0x7fe26ebc40:
    => 0x0000007fe26ebc20: stlrb w22, [x21]
       0x0000007fe26ebc24: mov x4, #0x0 // #0
       0x0000007fe26ebc28: ldr x10, [x20]
       0x0000007fe26ebc2c: add x21, x20, #0x8
       0x0000007fe26ebc30: mov x22, #0x0 // #0
       0x0000007fe26ebc34: strb w22, [x28, #428]
       0x0000007fe26ebc38: mov x22, #0x0 // #0
       0x0000007fe26ebc3c: strb w22, [x28, #431]
    End of assembler dump.
    (gdb) info reg x9
    x9 0x0 0
- Looks like JIT code, even doing accesses to x28 which is the FEX CPU state
- Code has no backtrace which reinforces this
- The code is doing an atomic store (`stlrb`), which is consistent with FEX emulating the x86 TSO memory model
- Now that we have checked that we are in the JIT code. Where are we in the guest side?
- Let's dump the FEX CPU state information that is directly pointed to in x28 at all times in JIT code.
    (gdb) p/x ((FEXCore::Core::CpuStateFrame*)$x28)->State
    $3 = {rip = 0x401110,
      gregs = {0x416eb0, 0x7fe1e3b640, 0xffffffffffffff70, 0x0, 0x7fe1e3bf30, 0x0, 0x416eb0, 0x7fe1e3ae28,
        0x0, 0x7fe1e3b640, 0x8, 0x7fe2054cc0, 0x7ff75ff48e, 0x7ff75ff48f, 0x0, 0x7fe163b000},
      xmm = {{0x0, 0x0}, {0x0, 0x0}, {0xdeadbeef, 0xbad0dad1} <repeats 14 times>},
      es = 0x0, cs = 0x0, ss = 0x0, ds = 0x0, gs = 0x0, fs = 0x7fe1e3b640,
      flags = {0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1, 0x0 <repeats 38 times>},
      mm = {{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}},
      gdt = {{base = 0x0} <repeats 32 times>},
      FCW = 0x37f, FTW = 0xffff}
- Looks like our guest RIP is currently `0x401110`
- Consult `info proc mappings` to see which mapping contains that address
    0x401000 0x402000 0x1000 0x1000 {...}/sigsegv_test
- Yep, we are inside our test application
- For a simple test, let's load the application in gdb-multiarch and disassemble where we are
    $ gdb-multiarch ./sigsegv_test
    Reading symbols from ./sigsegv_test...
    (gdb) set disassembly-flavor intel
    (gdb) disas 0x401110
    Dump of assembler code for function main(int, char**):
       0x0000000000401110 <+0>: push rbp
       0x0000000000401111 <+1>: mov rbp,rsp
       0x0000000000401114 <+4>: mov DWORD PTR [rbp-0x4],0x0
       0x000000000040111b <+11>: mov DWORD PTR [rbp-0x8],edi
       0x000000000040111e <+14>: mov QWORD PTR [rbp-0x10],rsi
       0x0000000000401122 <+18>: mov rax,QWORD PTR [rbp-0x10]
       0x0000000000401126 <+22>: movsxd rcx,DWORD PTR [rbp-0x8]
       0x000000000040112a <+26>: mov rax,QWORD PTR [rax+rcx*8]
       0x000000000040112e <+30>: mov QWORD PTR [rbp-0x18],rax
       0x0000000000401132 <+34>: mov rax,QWORD PTR [rbp-0x18]
       0x0000000000401136 <+38>: mov BYTE PTR [rax],0x63
       0x0000000000401139 <+41>: xor eax,eax
       0x000000000040113b <+43>: pop rbp
       0x000000000040113c <+44>: ret
    End of assembler dump.
    (gdb)
- Okay, not super helpful: FEX translates instructions into blocks, so `0x401110` is just the starting address of the current block
- It's in this code somewhere, let's change some FEX settings to get a clearer picture
- Set block size to one instruction and disable multiblock
- Now rerun our test application and find the new RIP
(gdb) p/x ((FEXCore::Core::CpuStateFrame*)$x28)->State.rip $2 = 0x401136
- Alright, now we know the RIP is exactly at `0x401136`
- Back in gdb-multiarch
    (gdb) disas 0x401136,+1
    Dump of assembler code from 0x401136 to 0x401137:
       0x0000000000401136 <main(int, char**)+38>: mov BYTE PTR [rax],0x63
- Looks like something in main is storing 0x63 to a nullptr
- In this simple case we can now take a look at the test application's source and find the problem.
- We know the problem is in the first block of main()
- We know the exact instruction that it is at
- We know it's something storing a byte to memory
- For more complex cases it is likely necessary to use reverse engineering tools
- BinaryNinja, Ghidra, IDA, and Hopper are all examples of tools like this
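The guest RIP lookup used in the walkthrough above can be wrapped in a small GDB Python script so it does not have to be retyped. This is a minimal sketch, not something FEX ships: the command name `fex-guest-rip` and the helper `guest_state_expr` are hypothetical, and it assumes FEX debug symbols are loaded and that execution is stopped inside JIT code, where x28 points at the CPU state.

```python
# Expression from this page: x28 points at the FEX CPU state frame in JIT code.
STATE_EXPR = "((FEXCore::Core::CpuStateFrame*)$x28)->State"


def guest_state_expr(field=""):
    """Build the GDB expression for the whole guest state or a single field."""
    return STATE_EXPR + ("." + field if field else "")


try:
    import gdb  # only importable inside GDB's embedded Python
except ImportError:
    gdb = None

if gdb is not None:

    class FexGuestRip(gdb.Command):
        """fex-guest-rip: print the guest RIP stored in the FEX CPU state."""

        def __init__(self):
            # Hypothetical command name, registered as a data command.
            super().__init__("fex-guest-rip", gdb.COMMAND_DATA)

        def invoke(self, arg, from_tty):
            rip = int(gdb.parse_and_eval(guest_state_expr("rip")))
            gdb.write("guest rip = 0x%x\n" % rip)

    FexGuestRip()
```

Inside GDB the script would be loaded with `source` on whatever file it is saved as, after which `fex-guest-rip` evaluates the same expression shown in the walkthrough. Outside GDB the module only defines the expression helper.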
What to do from here
Now it becomes a lot harder. You don't get a typical debugging environment or even clean backtraces.
FEX's gdbserver integration is sorely lacking so you can't even use a remote gdb server connecting to FEX right now.
If you enable thunks you can get better backtraces here. Debugging_Crash_In_Thunks
Double checking if we are in JIT code
    (gdb) info reg pc
    pc 0x7fccfb9ec8 0x7fccfb9ec8
    (gdb) info proc mappings
    ...
    0x7fccfb9000 0x7fcdfb9000 0x1000000 0x0
- This looks like a FEX JIT mapping: it is anonymous and 0x1000000 bytes (16MB) in size. The mapping starts out at 16MB but can scale up to 128MB
- Depending on version of FEX we can check the base mapping for a unique string
    (gdb) p (char*)0x7fccfb9000
    $4 = 0x7fccfb9000 "FEXJIT::Arm64JITCore::"
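Checking by hand which mapping contains an address gets tedious when the mapping list is long. The following sketch does the lookup on pasted `info proc mappings` output. It assumes each data row starts with start and end addresses in hex, as in the excerpts on this page; the function names are hypothetical, and the sample combines the two mapping rows quoted on this page for illustration.

```python
def parse_mappings(text):
    """Parse `info proc mappings` rows into (start, end, name) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with two hex addresses; skip headers and "..." lines.
        if len(fields) < 2 or not all(f.startswith("0x") for f in fields[:2]):
            continue
        last = fields[-1]
        # Treat the last column as a name only if it looks like a path or [tag].
        name = last if ("/" in last or last.startswith("[")) else ""
        rows.append((int(fields[0], 16), int(fields[1], 16), name))
    return rows


def find_mapping(rows, addr):
    """Return the row whose [start, end) range contains addr, or None."""
    for start, end, name in rows:
        if start <= addr < end:
            return (start, end, name)
    return None


if __name__ == "__main__":
    sample = """\
0x401000 0x402000 0x1000 0x1000 {...}/sigsegv_test
0x7fccfb9000 0x7fcdfb9000 0x1000000 0x0
"""
    rows = parse_mappings(sample)
    # Guest RIP and host PC values taken from this page.
    for addr in (0x401110, 0x7fccfb9ec8):
        start, end, name = find_mapping(rows, addr)
        size_mib = (end - start) / (1024 * 1024)
        print(hex(addr), "->", name or "<anonymous>", size_mib, "MiB")
```

An address that lands in an anonymous 16MB mapping is a hint that it is JIT code, which the string check above can then confirm.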