Development:Debugging Crash
Revision as of 17:58, 9 March 2022 by Sonicadvance1
Getting Started
- Debug an application with `gdb --args FEXInterpreter <application full path>`
- Under GDB make sure to do `handle SIGBUS SIGILL SIG63 noprint`
- FEX uses these signals internally for various things, so GDB should not stop on or report them
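The two steps above can be combined into a single launch so the `handle` line is not retyped every session. This is a minimal sketch, assuming GDB's standard `-ex` option for running a command at startup; the helper name `gdb_command` is hypothetical and not part of FEX.

```python
import shlex


def gdb_command(app_path, interpreter="FEXInterpreter"):
    """Build a gdb argv that applies the signal handling before the session starts."""
    return [
        "gdb",
        # Same handle line as in the list above, run by gdb at startup.
        "-ex", "handle SIGBUS SIGILL SIG63 noprint",
        # Everything after --args is the program to debug and its arguments.
        "--args", interpreter, app_path,
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line that can be pasted into a terminal.
    print(shlex.join(gdb_command("./sigsegv_test")))
```

The same `handle` line could instead live in a `.gdbinit` file; the launcher form just keeps the setting next to the command that needs it.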
Crash in emulated/JIT code
Walking through debugging a simple test application that is crashing.
    $ gdb --args FEXInterpreter ./sigsegv_test
    Reading symbols from FEXInterpreter...
    (gdb) r
    Starting program: /usr/bin/FEXInterpreter ./sigsegv_test
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
    [New Thread 0x7fccb75f30 (LWP 90107)]
    Thread 2 "FEXInterpreter" received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0x7fccb75f30 (LWP 90107)]
    0x0000007fccfb9ec8 in ?? ()
- Okay, we have a SIGSEGV. Let's double check that it is in JIT code (i.e. emulated guest code)
    (gdb) disas $pc,+32
    Dump of assembler code from 0x7fe26ebc20 to 0x7fe26ebc40:
    => 0x0000007fe26ebc20: stlrb w22, [x21]
       0x0000007fe26ebc24: mov x4, #0x0 // #0
       0x0000007fe26ebc28: ldr x10, [x20]
       0x0000007fe26ebc2c: add x21, x20, #0x8
       0x0000007fe26ebc30: mov x22, #0x0 // #0
       0x0000007fe26ebc34: strb w22, [x28, #428]
       0x0000007fe26ebc38: mov x22, #0x0 // #0
       0x0000007fe26ebc3c: strb w22, [x28, #431]
    End of assembler dump.
    (gdb) info reg x9
    x9 0x0 0
- Looks like JIT code, even doing accesses to x28 which is the FEX CPU state
- Code has no backtrace which reinforces this
- The code is doing an atomic store-release (`stlrb`), which is how FEX emulates the x86 TSO memory model; another sign this is JIT code
- Now that we have confirmed we are in JIT code, where are we on the guest side?
- Let's dump the FEX CPU state, which x28 points to at all times in JIT code.
    (gdb) p/x ((FEXCore::Core::CpuStateFrame*)$x28)->State
    $3 = {rip = 0x401110,
      gregs = {0x416eb0, 0x7fe1e3b640, 0xffffffffffffff70, 0x0, 0x7fe1e3bf30, 0x0, 0x416eb0, 0x7fe1e3ae28,
        0x0, 0x7fe1e3b640, 0x8, 0x7fe2054cc0, 0x7ff75ff48e, 0x7ff75ff48f, 0x0, 0x7fe163b000},
      xmm = {{0x0, 0x0}, {0x0, 0x0}, {0xdeadbeef, 0xbad0dad1} <repeats 14 times>},
      es = 0x0, cs = 0x0, ss = 0x0, ds = 0x0, gs = 0x0, fs = 0x7fe1e3b640,
      flags = {0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1, 0x0 <repeats 38 times>},
      mm = {{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}},
      gdt = {{base = 0x0} <repeats 32 times>}, FCW = 0x37f, FTW = 0xffff}
- Looks like our guest RIP is currently `0x401110`
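- Consult `info proc mappings` again (see "Double checking if we are in JIT code" at the end of this page) and find the mapping that contains the guest RIP
    0x401000 0x402000 0x1000 0x1000 {...}/sigsegv_test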
- Yep, we are inside our test application
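Matching an address against the mappings by eye gets tedious when the list is long. The lookup can be sketched as below, assuming the column layout shown on this page (start, end, size, offset, optional object file); newer GDB versions may print extra columns. The function name `find_mapping` is hypothetical.

```python
def find_mapping(mappings_text, address):
    """Return (start, end, objfile) for the mapping containing address, or None."""
    for line in mappings_text.splitlines():
        fields = line.split()
        # Data rows start with two hex addresses; skip headers, "..." and blanks.
        if len(fields) < 2 or not fields[0].startswith("0x"):
            continue
        try:
            start, end = int(fields[0], 16), int(fields[1], 16)
        except ValueError:
            continue
        # The end address is exclusive, as in /proc/<pid>/maps.
        if start <= address < end:
            # Assumes four numeric columns; anything after them is the object file.
            return start, end, " ".join(fields[4:])
    return None


# Rows copied from the gdb sessions on this page.
SAMPLE_MAPPINGS = """\
0x401000 0x402000 0x1000 0x1000 {...}/sigsegv_test
0x7fccfb9000 0x7fcdfb9000 0x1000000 0x0
"""

if __name__ == "__main__":
    print(find_mapping(SAMPLE_MAPPINGS, 0x401110))
```

An empty object file name, as for the second sample row, means an anonymous mapping, which is what a JIT code buffer looks like.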
- For a simple test, let's load the application in gdb-multiarch and disassemble where we are
    $ gdb-multiarch ./sigsegv_test
    Reading symbols from ./sigsegv_test...
    (gdb) set disassembly-flavor intel
    (gdb) disas 0x401110
    Dump of assembler code for function main(int, char**):
       0x0000000000401110 <+0>: push rbp
       0x0000000000401111 <+1>: mov rbp,rsp
       0x0000000000401114 <+4>: mov DWORD PTR [rbp-0x4],0x0
       0x000000000040111b <+11>: mov DWORD PTR [rbp-0x8],edi
       0x000000000040111e <+14>: mov QWORD PTR [rbp-0x10],rsi
       0x0000000000401122 <+18>: mov rax,QWORD PTR [rbp-0x10]
       0x0000000000401126 <+22>: movsxd rcx,DWORD PTR [rbp-0x8]
       0x000000000040112a <+26>: mov rax,QWORD PTR [rax+rcx*8]
       0x000000000040112e <+30>: mov QWORD PTR [rbp-0x18],rax
       0x0000000000401132 <+34>: mov rax,QWORD PTR [rbp-0x18]
       0x0000000000401136 <+38>: mov BYTE PTR [rax],0x63
       0x0000000000401139 <+41>: xor eax,eax
       0x000000000040113b <+43>: pop rbp
       0x000000000040113c <+44>: ret
    End of assembler dump.
    (gdb)
- Okay, not super helpful: FEX translates instructions into blocks, so `0x401110` is just the starting address of the current block
- It's in this code somewhere, let's change some FEX settings to get a clearer picture
- Set block size to one instruction and disable multiblock
- Now rerun our test application and find the new RIP
    (gdb) p/x ((FEXCore::Core::CpuStateFrame*)$x28)->State.rip
    $2 = 0x401136
- Alright, now we know the RIP is exactly at `0x401136`
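The rerun above depends on two FEX settings: a block size of one instruction and multiblock disabled. One way to apply them for a single debugging session is through the environment. This is a sketch under a loud assumption: the variable names `FEX_MAXINST` and `FEX_MULTIBLOCK` follow FEX's `FEX_<OPTION>` environment naming as best I know it, and should be checked against the option list of your FEX version. The helper `single_instruction_env` is hypothetical.

```python
import os

# ASSUMED option names; verify against your FEX version before relying on them.
SINGLE_INSTRUCTION_SETTINGS = {
    "FEX_MAXINST": "1",     # at most one guest instruction per translated block
    "FEX_MULTIBLOCK": "0",  # do not merge several guest blocks into one
}


def single_instruction_env(base_env=None):
    """Return a copy of the environment with the single-instruction settings applied."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(SINGLE_INSTRUCTION_SETTINGS)
    return env


# Example use (not executed here):
#   import subprocess
#   subprocess.run(["gdb", "--args", "FEXInterpreter", "./sigsegv_test"],
#                  env=single_instruction_env())
```

Setting these only in the debugging session's environment avoids leaving a slow single-instruction configuration enabled for normal runs.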
- Back in gdb-multiarch
    (gdb) disas 0x401136,+1
    Dump of assembler code from 0x401136 to 0x401137:
       0x0000000000401136 <main(int, char**)+38>: mov BYTE PTR [rax],0x63
- Looks like something in main is storing the byte 0x63 through a null pointer
- In this simple case we can now take a look at the test application's source and find the problem.
- We know the problem is in the first block of main()
- We know the exact instruction that it is at
- We know it's something storing a byte to memory
- For more complex cases it is likely necessary to use reverse engineering tools
- Binary Ninja, Ghidra, IDA, and Hopper are all examples of such tools
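For the simple case above, once the exact guest RIP is known, matching it against a saved disassembly listing is mechanical. A minimal sketch, assuming the listing is gdb's `disas` output in the form shown on this page; the helper name `instruction_at` is hypothetical.

```python
import re

# Matches lines such as
#   "0x0000000000401136 <+38>: mov BYTE PTR [rax],0x63"
#   "0x0000000000401136 <main(int, char**)+38>: mov BYTE PTR [rax],0x63"
# optionally prefixed by gdb's "=>" current-instruction marker.
_DISAS_LINE = re.compile(r"^\s*(?:=>\s*)?(0x[0-9a-fA-F]+)\s+<[^>]*\+(\d+)>:\s*(.*)$")


def instruction_at(listing, rip):
    """Return (function_offset, instruction_text) for the line at rip, or None."""
    for line in listing.splitlines():
        match = _DISAS_LINE.match(line)
        if match and int(match.group(1), 16) == rip:
            # Collapse tabs and repeated spaces so the text compares cleanly.
            return int(match.group(2)), " ".join(match.group(3).split())
    return None


# Excerpt of the gdb-multiarch listing from this page.
LISTING = """\
Dump of assembler code for function main(int, char**):
   0x0000000000401132 <+34>: mov rax,QWORD PTR [rbp-0x18]
   0x0000000000401136 <+38>: mov BYTE PTR [rax],0x63
   0x0000000000401139 <+41>: xor eax,eax
End of assembler dump.
"""

if __name__ == "__main__":
    print(instruction_at(LISTING, 0x401136))
```

A `None` result usually means the RIP is not on an instruction boundary in that listing, which is a hint that the RIP was read while FEX was still translating multi-instruction blocks.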
What to do from here
Now it becomes a lot harder. You don't get a typical debugging environment or even clean backtraces.
FEX's gdbserver integration is sorely lacking, so you can't even connect a remote GDB to FEX right now.
Double checking if we are in JIT code
    (gdb) info reg pc
    pc 0x7fccfb9ec8 0x7fccfb9ec8
    (gdb) info proc mappings
    ...
    0x7fccfb9000 0x7fcdfb9000 0x1000000 0x0
- This looks like a FEX JIT mapping: the code buffer starts out at 16MB (size 0x1000000, as seen here) and can scale up to 128MB
- Depending on the version of FEX, we can check the base of the mapping for a unique string
    (gdb) p (char*)0x7fccfb9000
    $4 = 0x7fccfb9000 "FEXJIT::Arm64JITCore::"
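The size check in this section is simple arithmetic: 0x7fcdfb9000 - 0x7fccfb9000 = 0x1000000, which is 16MB. As a rough sketch, the heuristic can be written down as follows. It only encodes the 16MB to 128MB range stated above and is not FEX's own logic; the name `could_be_jit_mapping` is hypothetical, and a match is a hint rather than proof.

```python
MIB = 1024 * 1024
JIT_MIN_SIZE = 16 * MIB    # initial JIT buffer size mentioned on this page
JIT_MAX_SIZE = 128 * MIB   # upper bound mentioned on this page


def could_be_jit_mapping(start, end, pc):
    """True if pc lies inside [start, end) and the mapping size fits the JIT range."""
    size = end - start
    return start <= pc < end and JIT_MIN_SIZE <= size <= JIT_MAX_SIZE


if __name__ == "__main__":
    # Values from the gdb session above.
    print(could_be_jit_mapping(0x7FCCFB9000, 0x7FCDFB9000, 0x7FCCFB9EC8))
```

Combining this with the marker string check, where the FEX version provides one, gives more confidence than either check alone.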