Development:Debugging Crash

Getting Started

Debug an application with `gdb --args FEXInterpreter <application full path>`
Under GDB, make sure to run `handle SIGBUS SIGILL SIG63 noprint` before starting the program
- FEX uses these signals internally for various things; check out Here for more information

Crash in emulated/JIT code

This section walks through debugging a simple test application that crashes.

 $ gdb --args FEXInterpreter ./sigsegv_test
 Reading symbols from FEXInterpreter...
 (gdb) r
 Starting program: /usr/bin/FEXInterpreter ./sigsegv_test
 [Thread debugging using libthread_db enabled]
 Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
 [New Thread 0x7fccb75f30 (LWP 90107)]
 
 Thread 2 "FEXInterpreter" received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 0x7fccb75f30 (LWP 90107)]
 0x0000007fccfb9ec8 in ?? ()

Okay, we have a SIGSEGV. Let's double check that it happened in JIT code (i.e. emulated guest code)

 (gdb) disas $pc,+32
 Dump of assembler code from 0x7fe26ebc20 to 0x7fe26ebc40:
 => 0x0000007fe26ebc20:  stlrb   w22, [x21]
    0x0000007fe26ebc24:  mov     x4, #0x0                        // #0
    0x0000007fe26ebc28:  ldr     x10, [x20]
    0x0000007fe26ebc2c:  add     x21, x20, #0x8
    0x0000007fe26ebc30:  mov     x22, #0x0                       // #0
    0x0000007fe26ebc34:  strb    w22, [x28, #428]
    0x0000007fe26ebc38:  mov     x22, #0x0                       // #0
    0x0000007fe26ebc3c:  strb    w22, [x28, #431]
 End of assembler dump.
 (gdb) info reg x9
 x9             0x0                 0

This looks like JIT code:
- It accesses memory through x28, which holds the FEX CPU state
- There is no backtrace, which reinforces this
- It does an atomic store (`stlrb`), which reinforces that this is FEX emulating the x86 TSO memory model

Now that we have checked that we are in JIT code, where are we on the guest side?
Let's dump the FEX CPU state, which x28 points to at all times in JIT code.

 (gdb) p/x ((FEXCore::Core::CpuStateFrame*)$x28)->State
 $3 = {rip = 0x401110, gregs = {0x416eb0, 0x7fe1e3b640, 0xffffffffffffff70, 0x0, 0x7fe1e3bf30, 0x0, 0x416eb0, 0x7fe1e3ae28, 0x0, 0x7fe1e3b640, 0x8, 0x7fe2054cc0, 0x7ff75ff48e, 0x7ff75ff48f, 0x0, 0x7fe163b000}, xmm = {{0x0, 0x0}, {0x0, 0x0}, {0xdeadbeef, 0xbad0dad1} <repeats 14 times>}, es = 0x0, cs = 0x0, ss = 0x0, ds = 0x0, gs = 0x0, fs = 0x7fe1e3b640, flags = {0x0, 0x1, 0x0,
   0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1, 0x0 <repeats 38 times>}, mm = {{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}}, gdt = {{base = 0x0} <repeats 32 times>}, FCW = 0x37f, FTW = 0xffff}

Looks like our guest RIP is currently `0x401110`
- Consult `info proc mappings` again to find the mapping that contains this address

 0x401000           0x402000     0x1000     0x1000 {...}/sigsegv_test

Yep, we are inside our test application
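When the mapping list is long, this lookup can be scripted. The following is a minimal sketch, not part of FEX: the helper name `find_mapping` is made up for illustration, and it assumes the `info proc mappings` output has been pasted into a string using the column layout shown on this page (start, end, size, offset, optional file; no permissions column).

```python
def find_mapping(mappings_text, addr):
    """Return (start, end, objfile) of the mapping containing addr, or None.

    Assumes each mapping line begins with two hex columns (start, end),
    as in the `info proc mappings` excerpts on this page.
    """
    for line in mappings_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            start, end = int(fields[0], 16), int(fields[1], 16)
        except ValueError:
            continue  # header line or "..." filler, not a mapping
        if start <= addr < end:
            # Assumption: a fifth column, if present, is the backing file.
            objfile = fields[-1] if len(fields) > 4 else ""
            return start, end, objfile
    return None


# The two mapping lines quoted on this page.
mappings = """
0x401000           0x402000     0x1000     0x1000 {...}/sigsegv_test
0x7fccfb9000       0x7fcdfb9000  0x1000000        0x0
"""

# Look up the guest RIP read from the CPU state.
hit = find_mapping(mappings, 0x401110)
print(hit[2] if hit else "no mapping found")
```

An empty file name in the result means the address lies in an anonymous mapping, which is what the JIT code buffer looks like.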
For a simple test, let's load the application in gdb-multiarch and disassemble where we are

 $ gdb-multiarch ./sigsegv_test
 Reading symbols from ./sigsegv_test...
 (gdb) set disassembly-flavor intel
 (gdb) disas 0x401110
 Dump of assembler code for function main(int, char**):
    0x0000000000401110 <+0>:     push   rbp
    0x0000000000401111 <+1>:     mov    rbp,rsp
    0x0000000000401114 <+4>:     mov    DWORD PTR [rbp-0x4],0x0
    0x000000000040111b <+11>:    mov    DWORD PTR [rbp-0x8],edi
    0x000000000040111e <+14>:    mov    QWORD PTR [rbp-0x10],rsi
    0x0000000000401122 <+18>:    mov    rax,QWORD PTR [rbp-0x10]
    0x0000000000401126 <+22>:    movsxd rcx,DWORD PTR [rbp-0x8]
    0x000000000040112a <+26>:    mov    rax,QWORD PTR [rax+rcx*8]
    0x000000000040112e <+30>:    mov    QWORD PTR [rbp-0x18],rax
    0x0000000000401132 <+34>:    mov    rax,QWORD PTR [rbp-0x18]
    0x0000000000401136 <+38>:    mov    BYTE PTR [rax],0x63
    0x0000000000401139 <+41>:    xor    eax,eax
    0x000000000040113b <+43>:    pop    rbp
    0x000000000040113c <+44>:    ret
 End of assembler dump.
 (gdb)

Okay, not super helpful. FEX translates instructions into blocks, so `0x401110` is just the starting address of the block we are in
- The faulting instruction is somewhere in this code, so let's change some FEX settings to get a clearer picture
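Set the block size to one instruction and disable multiblock (the `MaxInst` and `Multiblock` config options in recent FEX versions), so that every guest instruction becomes its own block and the reported RIP is exact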

Now rerun our test application and find the new RIP

 (gdb) p/x ((FEXCore::Core::CpuStateFrame*)$x28)->State.rip
 $2 = 0x401136

Alright, now we know the RIP is exactly at `0x401136`
Back in gdb-multiarch

 (gdb) disas 0x401136,+1
 Dump of assembler code from 0x401136 to 0x401137:
    0x0000000000401136 <main(int, char**)+38>:   mov    BYTE PTR [rax],0x63

Looks like something in main is storing 0x63 to a nullptr
In this simple case we can now take a look at the test application's source and find the problem.
- We know the problem is in the first block of main()
- We know the exact instruction that it is at
- We know it's something storing a byte to memory
For more complex cases it is likely necessary to use reverse engineering tools
- BinaryNinja, Ghidra, IDA, and Hopper are all examples of tools like this

What to do from here

Now it becomes a lot harder. You don't get a typical debugging environment or even clean backtraces.

FEX's gdbserver integration is sorely lacking, so right now you can't even connect a remote gdb to FEX.

Double checking if we are in JIT code

 (gdb) info reg pc
 pc             0x7fccfb9ec8        0x7fccfb9ec8
 
 (gdb) info proc mappings
 ...
 0x7fccfb9000       0x7fcdfb9000  0x1000000        0x0

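This looks like a FEX JIT mapping: it has no backing file and is 16MB (`0x1000000`) in size. FEX starts the JIT mapping at 16MB and scales it up to 128MB
Depending on the version of FEX, we can also check the base of the mapping for a unique string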

 (gdb) p (char*)0x7fccfb9000
 $4 = 0x7fccfb9000 "FEXJIT::Arm64JITCore::"
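The size check above is simple arithmetic on the mapping bounds. As a rough sketch (the function name `looks_like_jit_mapping` is hypothetical, and the 16MB to 128MB range is taken from the description on this page rather than from the FEX source), the heuristic can be written as:

```python
MIB = 1024 * 1024


def looks_like_jit_mapping(start, end, objfile, pc):
    """Heuristic only: pc lies in an anonymous mapping of 16MB to 128MB."""
    size = end - start
    in_range = start <= pc < end
    anonymous = objfile == ""
    return in_range and anonymous and 16 * MIB <= size <= 128 * MIB


# Bounds and pc taken from the gdb output above.
start, end = 0x7FCCFB9000, 0x7FCDFB9000
pc = 0x7FCCFB9EC8

print((end - start) // MIB)                        # mapping size in MB
print(looks_like_jit_mapping(start, end, "", pc))  # does pc match the heuristic?
```

Other anonymous mappings can fall in the same size range, so treat a match as a hint and confirm it with the disassembly or the base-string check.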
