Overwriting the Exception Handling Cache Pointer - Dwarf Oriented Programming James Oakley (Electron) ElectronlOO *noSPAM* gmail.com Rodrigo Rubira Branco (@BSDaemon) rodrigo *noSPAM* kernelhacking.com Sergey Bratus (Sbratus) Sergey *noSPAM* cs.dartmouth.edu Credits • This presentation combines ideas, research, discussions from the following personnel: - Sergey Bratus (Insecurity Theory, Exploiting the Hard- working Dwarf) - Meredith Patterson (Langsec) - R.I. P. Len Sassaman (Langsec) - James Oakley (Exploiting the Hard-working Dwarf -> everything related to that, including Katana) - Rodrigo Rubira Branco (Exploiting the Hard-working Dwarf -> exploitation, implementation details, research organization) Motivation • Software exploitation is not generic anymore • There are different exploitation primitives in different contexts • A modern exploitation technique shows how to take advantage of those primitives • There are much more been 'computed' then only the code that was written (the target) Insecurity State • "Treat all systems as compromised" - "There's no such thing as 'secure' any more." -- Deborah Plunkett, NSA Information Assurance Directorate • "Long weeks to short months before a security meltdown" - Brian Snow, in December 2010 Does prevention works? • Many initiatives in 'trustworthy computing' • Many books on 'secure programming' • Many academic publications • Lots of test methods: fuzzing, binary analysis, you name it - And STILL the software SUCKS and is EXPLOITED! • And about the hardware? We have no idea how bad it is... Insecurity is all about computation • Trustworthiness of a computer system is what the system can and cannot compute - Can the system decide if an input is invalid/unexpected/malicious & reject it? - Will program perform only expected computations, or malicious ones too? • Exploitation is setting up, instantiating, and programming a weird machine - A part of the target is overwhelmed by crafted input and enters an unexpected but manipulable state Decidability • Computation has some unsolvable (un-decidable) problems - about recognition of inputs! • An un-decidable problem does not have an algorithm that solves it for the general case Software Exploitation • A part of the target is overwhelmed by crafted input and enters an unexpected but manipulate state • Primitives are exposed - Memory corruption, implicit control flows - Unexpected control flows, ... • A "weird machine" is unleashed - A more powerful, programmable execution environment than intended or expected Software is Complex • Checks for input validity are scattered throughout the program, mixed with processing logic • Lots of additional computing options existent and available to the 'weird machine progran aka, exploit writer Weird Machine is Born! Exploiting Additional Computations • Finally we are in our talk line... • There are many computations inside a program that can be used to subvert the code execution (and some of then has nothing to do with the original code itself) • ROP is not new, exploits are using it since 2000 (maybe even before) *nix Exception Handling • Binaries compiled with GCC and that support exception handling have Dwarf bytecode: - Describe the stack frame layout - Interpreted to unwind the stack after an exception occurs • The process image includes the Dwarf interpreter (part of the GNU C++ runtime) • Bytecode can be written to force the interpreter to perform any computation (Turing-Complete), including, but not limited to, setup a library/system call modifying registers such as stack and base pointers -> See James and Sergey previous work on Dwarf Trojans James Oakley and Sergey Bratus • Proved that Dwarf can replace code creating a Trojan completely using Dwarf bytecode • Proved that Dwarf is a complete development environment: - Can read memory - Can compute with values from memory/registers - Can influence the flow of execution of a process ELF Header Program Headers ■ init ■text ■fini ■eh frame hdr ■eh frame ■gcc except table ■dynamic ■got ■data ■symtab ■strtab Section Headers v J ELF The executable has this format either on disk or in memory. Dwarf • Developed as a debugging format to replace STABS • Standard: h tt p : //d wa rf st d . o rg • Provide information such as code line, variable types, backtraces, others • ELF Sections: .debug_info, .debug_line, .debug_frame are defined in the standard • .debug_frame defines how to unwind the stack (how to restore each entry in the previous call frame) Linux Exception Handling • GCC, the Linux Standards Base and the ABI x86_64 adopted a very similar format used in the .debug_frame to describe the stack unwind during an exception: .eh_frame • It is not exactly the same as dwarf • It adds pointer encoding and language-specific data • As usual, the documentation is sparse and very limited: - Partially discussed in the Linux Standards Base - Partitially defined in the ABI - Partially implemented in GCC .eh_frame • Theoretically it is a table, where for each address in the .text it is describe how to restore the registers to the previous call frame EIP CFA EBP ESI W3SM RET OxfOOOfOOO rsp+16 *(cfa-16) *(cfa-8) OxfOOOfOOl rsp+16 *(cfa-16) *(cfa-8) 0xf000f002 rbp+16 *(cfa-16) eax=edi *(cfa-8) OxfOOOfOOa rbp+16 *(cfa-16) *(cfa-24) eax=edi *(cfa-8) • CFA (Canonical Frame Address) - Address relative to the call frame • Each line defines how each part of the code can return to the previous frame Size Limitations • Obviously, keep such a table would use more space then the code itself • That's why the adoption of bytecode: The table is 'compressed', providing everything required to create it when needed • Portions of the table are created as needed (on-demand) .eh frame .eh_frame section | CFI = Call Frame Information | | CIE = Common Information Entry DE | | FDE = Frame Description Entry .---.I | | *LSDA 1 1 | LSDA = Language Specific Data Area ......... | | "The .eh_frame section shall contain 1 or more Call Frame Information (CFI) records. The number of records present shall be determined by size of the section as contained in the section header. Each CFI record contains a Common Information Entry (CIE) record followed by 1 or more Frame Description Entry (FDE) records. Both CIEs and FDEs shall be aligned to an addressing unit sized boundary" FDExCIE FDE (Frame Description Entry) exists for each logii Instruction block CIE (Common Information Entry) holds common Information between FDEs INSTRUCTIONS in FDE hold the DWARF bytecode FDExCIE initial_location/address_range: Defines for which instructions this FDE applies augmentation: Language-specific information return_address_register: Entry in a virtual table that defines the .text location to return to (eip) instructions: Table rules. Dwarf has a language to describe the table. Dwarf Instructions • Work as an assembly language (unexpected computations) • Turing-Complete Stack-Based Machine • Can access memory and register values • Have some limitations: - Cannot write to register/memory (but we can force out-of-order code execution and obtain writes) - Cannot call native code - Cannot write to registers that are not callee-saved in the ABI (we can write to callee-saved register thought) - GCC limits the stack in 64 words Dwarf Programming • DW_CFA_set_loc N Next instructions apply to the first N bytes of the function • DW_CFA_def_cfa R OFF CFA is calculed starting from register R and offset OFF • DW_CFA_offset R OFF Register R is restaured from the value in CFA OFF • DW_CFA_register Rl R2 Register Rl is restaured with the contents of R2 And the table is back... • Each architecture register receives a DWARF equivalent (the mapping is architecture specific) • Dwarf Instructions define rules for a column or advances to the next line (program location) • In a FDE, lines heritage from instruction lines above them EIP CFA EBP EBX EAX RET OxfOOOfOOO rsp+16 *(cfa-16) *(cfa-8) OxfOOOfOOl rsp+16 *(cfa-16) *(cfa-8) 0xf000f002 rbp+16 *(cfa-16) eax=edi *(cfa-8) OxfOOOfOOa rbp+16 *(cfa-16) *(cfa-24) eax=edi *(cfa-8) Dwarf Expressions • To not anticipate all unwinding mechanisms of a system, the standard defines flexibility: - DW_CFA_expression R EXPRESSION R receives the value from the EXPRESSION result - DW_CFA_val_expression R EXPRESSION R restored to result of EXPRESSION • Expressions have their own instructions: - Constant Values: DW_OP_constu, DW_OP_const8s, etc - Arithmetic: DW OP plus, DW OP mul, DW_OP_and, DW_OP_xor, etc - Memory read: DW_OP_deref - Register read: DW_OP_bregx - Flow Control: DW_OP_le, DW_OP_skip, DW_OP_bra, etc Katana Emit a dwarfscript > $e=load "demo" Loaded ELF "demo" > dwarfscript emit ,, .eh_frame" $e "demo.dws" Wrote dwarfscript to demo.dws Dwarfscript assembler > $ehframe=dwarfscript compile "demo.dws" > replace section $e ".eh_frame" $ehframe[0] Replaced section ".eh_frame" > save $e "demo_rebuilt" Saved ELF object to "demo_rebuilt" > !chmod +x demo_rebuilt So what? • With Katana you can see and modify unwind tables in an easy way - Control the unwinding flow (how the call stack is handled) - Avoid an exception handler to execute another one - Redirect exceptions - Find/solve symbols - Calculate relocations Example • If function foo is responsible for an exception - Change flow to function bar - Thru static analysis, we see that bar is at 0x600DF00D - In the FDE, we change: DW_CFA_offset rl6 1 -To: DW_CFA_val_expression rl6 begin EXPRESSION DW_OP_constu 0x600DF00D dnd EXPRESSION .gcc_except_table • So far, redirected only to 'catch' blocks • The .gcc_except_table hold language-specific data (where the exception handlers are) - Interpreted by the personality routines - We can stop an exception at any time - Unlike the .eh_frame, do not have standards - There is no documentation, so let's see the code ;) Assembly • While compiling a program using GCC, do: -save-temps -fverbose-asm -dA .section .gcc_except_table,"a",@progbits .align 4 .LLSDA963: .byte Oxff # @LPStart format (omit) .byte 0x3 # @TType format (udata4) .uleb128.LLSDATT963-.LLSDATTD963 # @TType base offset .LLSDATTD963: .byte 0x1 # call-site format (uleb128) .uleb128 .LLSDACSE963-.LLSDACSB963 # Call-site table length .LLSDACSB963: .uleb128 .LEHB0-.LFB963 # region start .uleb128 .LEHEO-.LEHBO # length .uleb128 .L6-.LFB963 # landing pad .uleb128 0x1 # action .uleb128 .LEHB1-.LFB963 # region 1 start .uleb128 .LEHE1-.LEHB1 # length .uleb128 0x0 # landing pad .uleb128 0x0 # action .uleb128 .LEHB2-.LFB963 # region 2 start .uleb128 .LEHE2-.LEHB2 # length .uleb128 .L7-.LFB963 # landing pad .uleb128 0x0 # action .LLSDACSE963: .byte 0x1 # Action record table .byte 0x0 .align 4 .long ZTIi Layout gcc_except_table language-specific data areas (LSDAs) J call site position call site length landing pad position 1 .offset to next action I Exception Handling Flow cxa_allocate_exception | < in libstdc-H- J _Unwind_RaiseException in libgcc unwind one frame call personality routine if no handler, loop return into handler _gxx_personality_vO in libstdc-H- read language specific data . User Code catch block bookkeeping bookkeeping execution continues __cxa_begin_catch | in libstdc-H- J __cxa_end_catch j in libstdc-H- J Exceptions are not asynchronous • Functions that call throwQ just call: - cxa_allocate_exception() -> To allocate space using malloc (or buffers in the .bss if malloc fails - gcc-xxx/l i bstd++v3/l i bsu pc++/e h_a I loc. :84) - And then cxa_throw() -> That will go thru the frames until a handler for the exception is found Proving (assembly) Dump of assembler code for function main: <+9>: mov $0x4,%edi # std::size_t thrown_size # Allocates a new " cxa_refcounted_exception M followed by 4 bytes; we # do a M throw(l) M , 1 being an "int" occupies 4 bytes. <+14>: callq 0x400930 < cxa_allocate_exception@plt> <+25>: mov $0x0,%edx # void (*dest) (void *) <+30>: mov $0x6013c0,%esi # std::type_info *tinfo <+35>: mov %rax,%rdi #void*obj <+38>: callq 0x400940 <__cxa_throw@plt> cxa_allocate_exception() • Returns a pointer to a - struct cxa_refcounted_exception, which helds a reference to an object cxa_exception • cxa_throw() is then executed to: - Initialize the current context (register values) - Iterate in the stack until it finds the exception handler What We've Shown Before • Ret-into-libc • Used the dynamic-linker already in Dwarf to find execvpe • Used Dwarf to prepare the stack • In less than 200 bytes and less than 20 words in the stack (showing that a 64-stack word limitation is not an obstacle) • Started in an offset of execvpe where they can control the Dwarf registers (and not in the function beginning) What else can be done? • Old GCC had both, the .eh_frame and the .gcc_except_table as +W • Well... - Li bgcc/l i bstd C++ need to find those areas in memory, right? - The program header, GNU_EH_FRAME contains the .eh_frame location (dl_iterate_phdr is the function that finds it) - Libgcc caches the value! Fake EH • If we can overwrite the cached value, we are able to control the exceptions and leverage everything already explained • Libgcc does not export symbols, so we need to find an heuristic/reverse to find what to overwrite Caching • The pointer caching is done in: unwind-dw2-fde-glibc.c: #define FRAME_HDR_CACHE_SIZE 8 static struct frame_hdr_cache_element { _Unwind_Ptr pcjow; _Unwind_Ptr pc_high; _Unwind_Ptr load_base; const ElfW(Phdr) *p_eh_frame_hdr; const ElfW(Phdr) *p_dynamic; struct frame_hdr_cache_element *link; } frame_hdr_cache[FRAME_HDR_CACHE_SIZE]; Caching 8 cache entries for the frame header - Uses a Least Used Replacement Algorithm (_Unwind_lteratePhdr_Callback()) - Most recently used is the head of the list In the test environment, the frame_hdr_cache was at 0x6e0 bytes from the offset of the writable data segment of libgcc • This is the aforementioned array, with 48 bytes in size The executable itself is the 3 rd element of the array (the first two are the libgcc and libstdc++) The offset for the writable data segment of libgcc can be found in this way (based in what we know): - 0x6e0+48*2=0x740 The entry p_eh_frame_hdr that we want to overwrite is at 24 bytes of this structure. Example • Together with the paper/presentation, we release a demo program for the exploitation. Those are the test characteristics we see: - 0x7ffff760e000 -> libgcc loading address - 0x220000 -> offset for the writable data segment (starting from the lib base address) - 0x6e0 -> offset for the cache elements (starting from the writable data segment) - 48 bytes is the size of the structure - 2 entries before ours (readelf-d) - 24 bytes inside the structure is what we want to overwrite Exploiting • To simplify the exploitation, it is interesting to align the structures in known offsets/ controlled offsets: - .eh_frame in the example aligned to start exactly at 0x50 bytes from the start of the .eh_frame_hdr - .gcc_except_table aligned to start exactly at 0x200 bytes from the start of the .eh_frame Memleak We need the value of EBP, so we going to use a memleak. It can be achieved in different ways, depending on the target program (e.g.: overwriting parameters to printf-like functions, or if the vulnerability is a format string, which is our sample case) To calculate the EBP_PREVIOUS we use %llx (format string), so we use 4 bytes of space in the buffer and advance the stack pointer in 8 bytes (so the premise for exploiting the sample program is to manage to leak the EBP): #to get the value of ebp_previous instr=r n %llx%llx%llx%llx%llx%llx%^^ %llx%llx%x%x M proc.sendline(instr) proc.expectfunknown command: [0-9a-f]* ([0-9a-f]*).*") ebp_previous=int(proc. match. group(l),16) info("\nfound ebp_previous = Ox%x" % ebp_previous) Heuristics • We know the size of the previous frame (disassembling), so we are capable of calculate the EBP of our frame: - ebp=ebp_previous-PREV_FRAME_SIZE • With our frame address, we can calculate the address of libgcc, since we know the offsets: - libgcc_reveal_location=ebp- LIBGCC_REVEAL_EBP_OFFSET; More Heuristics • The value that reveals the .text location of the libgcc is at 0xffffc798 (discovered in the previous slide), and it is 0x679 above ESP and 0x750 above EBP • The libgcc base is calculated using the previously revealed address and masking the 3 low nibbles. We also use a fixed value to adjust the result (found thru disassembly): - Iibgcc_base=(libgcc_revealed & OxFFFFFOOO) - LI BGCC_REVEAL_ADJ UST • The separation between .text and .data segments in libgcc is 0x19000 (x86): - libgcc_data_base=libgcc_base+LIBGCC_DATA_OFFSET Finalizing • Finally, we find the frame_hdr_cache and the respective p_eh_frame_hdr from the libgcc_data_base, as previously described: - frame_hdr_cache=libgcc_data_base +CACHE_LIBGCC_OFFSET - p_eh_frame_hdr=frame_hdr_cache +CACHE_ENTRY_SIZE*PREVIOUS_CACHE_ENTRIES +OFFSET IN CACHE ENTRY In the demo case • With all the values, we redirect the execution: - Function doWork starts at 0x0804936a - Throw is at 0x08049634 - Distance: 0x2ca (call site 14 in the dict_mod.dws for the demo) - We force the execution of the l_am_never_called that is at 0x08049842 (offset of 0x4d8 from doWork) • The Dwarf payload is injected in the dictionary been readed by the target program (instead of using a shellcode). We find the pointers, overwrite the caching target address and the desired catch block is executed. Other possibilities • If you have a Write N you can overwrite the .eh_frame entirely (if it is +W, what is not normal in new systems) • You can overwrite the .eh_frame using a shellcode • You can use a stagered ret-into-lib to remap the .eh frame as +W and then overwrite it THE END! Really is !? James Oakley (Electron) ElectronlOO *noSPAM* gmail.com Rodrigo Rubira Branco (@BSDaemon) rodrigo *noSPAM* kernelhacking.com Sergey Bratus (Sbratus) Sergey *noSPAM* cs.dartmouth.edu