Posted on 2007-11-21 00:00

Basic Inline Assembly Code
The asm format
The GNU C compiler uses the asm keyword to denote a section of source code that is written in assembly language. The basic format of the asm section is as follows:
asm ("assembly code");
The assembly code contained within the parentheses must be in a specific format:

The instructions must be enclosed in quotation marks.
If more than one instruction is included, the newline character must be used to separate each line of assembly language code. Often, a tab character is also included to help indent the assembly language code to make lines more readable.

A sample basic inline assembly section could look like this:
asm ("movl $1, %eax\n\tmovl $0, %ebx\n\tint $0x80");
or
asm ("movl $1, %eax\n\t"
     "movl $0, %ebx\n\t"
     "int $0x80");
Using global C variables
The basic inline assembly code can utilize global C variables defined in the application. The word to remember here is "global." Only globally defined variables can be used within the basic inline assembly code. The variables are referenced by the same names used within the C program.
Using the volatile modifier
Sometimes optimization is not a good thing with inline assembly functions. It is possible that the compiler may look at the inline code and attempt to optimize it as well, possibly producing undesirable effects.
If you want the compiler to leave your hand-coded inline assembly function alone, you can just say so! The volatile modifier can be placed in the asm statement to indicate that no optimization is desired on that section of code. The format of the asm statement using the volatile modifier is as follows:
asm volatile ("assembly code");
The assembly code within the statement follows the same rules it would without the volatile modifier. Adding the volatile modifier also does not change the requirement to store and retrieve the register values within the inline assembly code.
Using an alternate keyword
The asm keyword used to identify the inline assembly code section may be replaced if necessary. The asm keyword is a GNU extension, so when you compile in strict ISO C mode (for example, with the -ansi or -std=c99 options) GCC does not recognize it as a keyword. If you are writing code using the ANSI C conventions, you must use the __asm__ keyword instead of the normal asm keyword.
The __asm__ keyword can also be modified using the __volatile__ modifier.
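As a minimal sketch of the alternate keywords, the function below wraps a basic asm statement in __asm__ __volatile__. The name count_nops is hypothetical, and the sketch assumes the target assembler accepts a nop mnemonic (true for x86 and most other common targets). GCC already treats an asm statement with no output operands as volatile, so the modifier here mostly documents intent.

```c
/* count_nops is a hypothetical helper used only for illustration. */
int count_nops(int n)
{
    int executed = 0;
    int i;

    for (i = 0; i < n; i++) {
        /* __asm__ and __volatile__ are accepted even in strict ISO C mode. */
        __asm__ __volatile__ ("nop\n\t"
                              "nop");
        executed++;
    }
    return executed;
}
```

Calling count_nops(3) runs the two-instruction asm section three times and returns 3.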
Extended ASM
The basic asm format provides an easy way to create assembly code, but it has its limitations. For one, all input and output values have to use global variables from the C program. In addition, you have to be extremely careful not to change the values of any registers within the inline assembly code.
The GNU compiler provides an extended format for the asm section that helps solve these problems. The extended format provides additional options that enable you to more precisely control how the inline assembly language code is generated within the C or C++ language program. This section describes the extended asm format.
Extended ASM format
Because the extended asm format provides additional features, it needs additional fields to describe them. The format of the extended version of asm looks like this:
asm ("assembly code" : output locations : input operands : changed registers);
This format consists of four parts, each separated by a colon:

Assembly code: The inline assembly code using the same syntax used for the basic asm format
Output locations: A list of registers and memory locations that will contain the output values from the inline assembly code
Input operands: A list of registers and memory locations that contain input values for the inline assembly code
Changed registers: A list of any additional registers that are changed by the inline code

Not all of the sections are required to be present in the extended asm format. If no output values are associated with the assembly code, the section must be blank, but two colons must still separate the assembly code from the input operands. If no registers are changed by the inline assembly code, the last colon may be omitted.
Specifying input and output values
In the basic asm format, input and output values are incorporated using the C global variable name within the assembly code. Things are a little different when using the extended format.
In the extended format, you can assign input and output values from both registers and memory locations. The format of the input and output values list is
"constraint"(variable)
where variable is a C variable declared within the program. In the extended asm format, both local and global variables can be used. The constraint defines where the variable is placed (for input values) or moved from (for output values). This is what defines whether the value is placed in a register or a memory location.
The constraint is a single-character code. The constraint codes are shown in the following table.
Constraint   Description
a            Use the %eax, %ax, or %al registers.
b            Use the %ebx, %bx, or %bl registers.
c            Use the %ecx, %cx, or %cl registers.
d            Use the %edx, %dx, or %dl registers.
S            Use the %esi or %si registers.
D            Use the %edi or %di registers.
r            Use any available general-purpose register.
q            Use either the %eax, %ebx, %ecx, or %edx register.
A            Use the %eax and the %edx registers for a 64-bit value.
f            Use a floating-point register.
t            Use the first (top) floating-point register.
u            Use the second floating-point register.
m            Use the variable's memory location.
o            Use an offset memory location.
V            Use only a direct memory location.
i            Use an immediate integer value.
n            Use an immediate integer value with a known value.
g            Use any register or memory location available.
In addition to these constraints, output values include a constraint modifier, which indicates how the output value is handled by the compiler. The output modifiers that can be used are shown in the following table.
Output Modifier   Description
+                 The operand can be both read from and written to.
=                 The operand can only be written to.
%                 The operand can be switched with the next operand if necessary.
&                 The operand is written before the inline code has finished reading its inputs, so it must not share a register with any input.
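To illustrate the + modifier, here is a small sketch in which a single operand is both read and written. The function name negate is hypothetical, and the plain C branch exists only so the sketch still builds on non-x86 targets.

```c
/* negate is a hypothetical helper; the asm path assumes an x86 target. */
int negate(int x)
{
#if defined(__i386__) || defined(__x86_64__)
    /* "+r": negl reads x from a register and writes the result back to it. */
    __asm__ ("negl %0" : "+r"(x));
#else
    x = -x; /* fallback for other architectures */
#endif
    return x;
}
```

With = instead of +, the compiler would be free to assume the register holds no meaningful value on entry, so the input would be lost.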
This example:
asm ("assembly code" : "=a"(result) : "d"(data1), "c"(data2));
places the C variable data1 into the EDX register, and the variable data2 into the ECX register. The result of the inline assembly code will be placed into the EAX register, and then moved to the result variable.
Using registers
In extended asm format, to reference a register by name in the assembly code you must use two percent signs instead of just one (for example, %%eax), because a single percent sign introduces an operand placeholder.
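The fixed-register constraints and the doubled percent signs can be combined as in the following sketch. The function name add_fixed_regs is hypothetical; the operand layout mirrors the "=a", "d", "c" example above, and the C fallback is only there for non-x86 builds.

```c
/* add_fixed_regs is a hypothetical helper; the asm path assumes x86. */
int add_fixed_regs(int data1, int data2)
{
    int result;
#if defined(__i386__) || defined(__x86_64__)
    /* data1 arrives in EDX and data2 in ECX; the sum is left in EAX. */
    __asm__ ("movl %%edx, %%eax\n\t"
             "addl %%ecx, %%eax"
             : "=a"(result)
             : "d"(data1), "c"(data2));
#else
    result = data1 + data2; /* fallback for other architectures */
#endif
    return result;
}
```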
You don't always need to specify the output value in the inline assembly section. Some assembly instructions already assume that the input values contain the output values. For example, the MOVS instructions include the output location within the input values.
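A hedged sketch of the MOVS case follows. Because MOVS changes the source, destination, and count registers as it runs, this version declares them as read-write operands with the + modifier and adds a "memory" entry to the changed list so the compiler knows memory was written; both choices are assumptions that go beyond the description above. The name copy_bytes is hypothetical, and the sketch assumes the direction flag is clear, as the usual calling conventions require.

```c
#include <stddef.h>

/* copy_bytes is a hypothetical helper; the asm path assumes x86. */
void copy_bytes(void *dst, const void *src, size_t n)
{
#if defined(__i386__) || defined(__x86_64__)
    /* REP MOVSB copies n bytes from the address in the S register to the
       address in the D register, counting down the C register. */
    __asm__ __volatile__ ("rep movsb"
                          : "+S"(src), "+D"(dst), "+c"(n)
                          :
                          : "memory");
#else
    /* Fallback for other architectures: a plain byte loop. */
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
#endif
}
```

No separate result variable appears: the destination pointer passed in already names where the output goes.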
Using placeholders
The extended asm format provides placeholders that can be used to reference input and output values within the inline assembly code. This enables you to declare input and output values in any register or memory location that is convenient for the compiler.
The placeholders are numbers, preceded by a percent sign. Each input and output value listed in the inline assembly code is assigned a number based on its location in the listing, starting with zero. The placeholders can then be used in the assembly code to represent the values.
For example, the following inline code:
asm ("assembly code": "=r"(result): "r"(data1), "r"(data2));
will produce the following placeholders:

%0 will represent the register containing the result variable value.
%1 will represent the register containing the data1 variable value.
%2 will represent the register containing the data2 variable value.

Notice that the placeholders provide a method for utilizing both registers and memory locations within the inline assembly code.
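As a concrete sketch of the three placeholders, the function below fills in the assembly code for the operand list shown above. The name add_placeholders is hypothetical. The & modifier is added as an assumption of this sketch: %0 is written before %2 is read, so the output must not share a register with an input.

```c
/* add_placeholders is a hypothetical helper; the asm path assumes x86. */
int add_placeholders(int data1, int data2)
{
    int result;
#if defined(__i386__) || defined(__x86_64__)
    /* %0 is result, %1 is data1, %2 is data2. */
    __asm__ ("movl %1, %0\n\t"
             "addl %2, %0"
             : "=&r"(result)
             : "r"(data1), "r"(data2));
#else
    result = data1 + data2; /* fallback for other architectures */
#endif
    return result;
}
```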
Referencing placeholders
If an input and output value in the inline assembly code share the same C variable from the program, you can specify that using the placeholders as the constraint value. This can create some odd-looking code, but it comes in handy to reduce the number of registers required in the code. For example:
asm ("imull %1, %0": "=r"(data2): "r"(data1), "0"(data2));
The 0 constraint on the second input tells the compiler to place the data2 input value in the same register it assigned to operand 0, the data2 output. This ensures that the same register holds both the input and the output value. Of course, the result will be placed in the data2 variable when the inline code is complete.
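Wrapped in a function, the matching-constraint idiom might look like the sketch below. The function name multiply is hypothetical, and the C fallback is only for non-x86 builds.

```c
/* multiply is a hypothetical helper; the asm path assumes x86. */
int multiply(int data1, int data2)
{
#if defined(__i386__) || defined(__x86_64__)
    /* "0" ties the data2 input to the register chosen for output %0. */
    __asm__ ("imull %1, %0"
             : "=r"(data2)
             : "r"(data1), "0"(data2));
#else
    data2 = data1 * data2; /* fallback for other architectures */
#endif
    return data2;
}
```

Only two registers are needed here, one for data1 and one shared by the data2 input and output.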
Alternative placeholders
If you are working with a lot of input and output values, the numeric placeholders can quickly become confusing. To help keep things sane, the GNU compiler (starting with version 3.1) enables you to declare alternative names as placeholders.
The alternative name is defined within the sections in which the input and output values are declared. The format is as follows:
[name] "constraint"(variable)
The name defined becomes the new placeholder identifier for the variable, written as %[name] in the inline assembly code, as shown in the following example:
asm ("imull %[value1], %[value2]": [value2] "=r"(data2): [value1] "r"(data1), "0"(data2));
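Named placeholders help most when operand order matters, as with subtraction. The following is a small sketch under the same assumptions as before (x86 target, GCC 3.1 or later); the names subtract, acc, and sub are hypothetical.

```c
/* subtract is a hypothetical helper; the asm path assumes x86. */
int subtract(int minuend, int subtrahend)
{
#if defined(__i386__) || defined(__x86_64__)
    /* AT&T order: subl source, destination computes acc = acc - sub. */
    __asm__ ("subl %[sub], %[acc]"
             : [acc] "=r"(minuend)
             : [sub] "r"(subtrahend), "0"(minuend));
#else
    minuend = minuend - subtrahend; /* fallback for other architectures */
#endif
    return minuend;
}
```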
Changed registers list
The compiler assumes that registers used in the input and output values will change, and handles that accordingly. You do not need to include these values in the changed registers list. In fact, if you do, it will produce an error message, as demonstrated in the following badregtest.c program:
/* badregtest.c - An example of incorrectly using the changed registers list */
#include <stdio.h>

int main()
{
    int data1 = 10;
    int result = 20;
    asm ("addl %1, %0"
         : "=d"(result)
         : "c"(data1), "0"(result)
         : "%ecx", "%edx");
    printf("The result is %d\n", result);
    return 0;
}
The badregtest.c program specifies that the result variable should be loaded into the EDX register and the data1 variable into the ECX register. The changed registers list incorrectly specifies that the ECX and EDX registers change within the inline code. Note that the registers are listed in the changed registers list using the full register names, not just a single letter as with the input and output register definitions. Using the percent sign with the register name is optional.
When you try to compile this program, an error will be produced:
$ gcc -o badregtest badregtest.c
badregtest.c: In function 'main':
badregtest.c:8: error: can't find a register in class 'DREG' while reloading 'asm'
The compiler had already assigned the EDX register to the result operand, so it could not also honor the request to treat EDX as a changed register.
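For comparison, a version that compiles simply drops the changed registers list, since EDX and ECX are already declared as operands. This is a sketch with a hypothetical function name, add_edx_ecx, and a C fallback for non-x86 builds.

```c
/* add_edx_ecx is a hypothetical helper; the asm path assumes x86. */
int add_edx_ecx(int data1, int result)
{
#if defined(__i386__) || defined(__x86_64__)
    /* No changed registers list: both registers are declared operands. */
    __asm__ ("addl %1, %0"
             : "=d"(result)
             : "c"(data1), "0"(result));
#else
    result = result + data1; /* fallback for other architectures */
#endif
    return result;
}
```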
The proper use of the changed register list is to notify the compiler if your inline assembly code uses any additional registers that were not initially declared as input or output values. The compiler must know about these registers so it knows to avoid using them, as demonstrated in the changedtest.c program:

/* changedtest.c - An example of setting registers in the changed registers list */
#include <stdio.h>

int main()
{
    int data1 = 10;
    int result = 20;
    asm ("movl %1, %%eax\n\t"
         "addl %%eax, %0"
         : "=r"(result)
         : "r"(data1), "0"(result)
         : "%eax");
    printf("The result is %d\n", result);
    return 0;
}
In the changedtest.c program, the inline assembly code uses the EAX register as an intermediate location to store a data value. Because the register was not declared as an input or output value, it must be included in the changed registers list.
Now that the compiler knows that the EAX register is not available, it will work around that. The input and output values were declared using the r constraint, which enables the compiler to select the registers to use. Looking at the generated assembly language code, you can see which registers were selected:

movl $10, -4(%ebp)
movl $20, -8(%ebp)
movl -4(%ebp), %ecx
movl -8(%ebp), %edx
#APP
movl %ecx, %eax
addl %eax, %edx
#NO_APP
movl %edx, %eax
The code for moving the C variables into registers uses the ECX and EDX registers. The compiler purposely avoided using the EAX register, as it was declared as being used in the inline assembly code.
Using memory locations
Although using registers in the inline assembly language code is faster, you can also directly use the memory locations of the C variables. The m constraint is used to reference memory locations in the input and output values. Remember that you still have to use registers for the assembly instructions that require them, so you may have to define intermediate registers to hold the data. The memtest.c program demonstrates this:
/* memtest.c - An example of using memory locations as values */
#include <stdio.h>

int main()
{
    int dividend = 20;
    int divisor = 5;
    int result;
    asm ("divb %2\n\t"
         "movl %%eax, %0"
         : "=m"(result)
         : "a"(dividend), "m"(divisor));
    printf("The result is %d\n", result);
    return 0;
}
The asm section loads the dividend value into the EAX register as required by the DIV instruction. The divisor is kept in a memory location, as is the output value. The generated assembly code looks like the following:
movl $20, -4(%ebp)
movl $5, -8(%ebp)
movl -4(%ebp), %eax
#APP
divb -8(%ebp)
movl %eax, -12(%ebp)
#NO_APP
The values are loaded into memory locations (in the stack), with the dividend value also moved to the EAX register. When the result is determined, it is moved into its memory location on the stack, instead of to a register.
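When an instruction can operate on memory directly, no intermediate register is needed at all. The sketch below combines the m constraint with the + modifier to increment a variable in place; the name increment_in_memory is hypothetical, and the C fallback is only for non-x86 builds.

```c
/* increment_in_memory is a hypothetical helper; the asm path assumes x86. */
void increment_in_memory(int *counter)
{
#if defined(__i386__) || defined(__x86_64__)
    /* "+m": incl reads and writes the int stored at *counter. */
    __asm__ ("incl %0" : "+m"(*counter));
#else
    *counter += 1; /* fallback for other architectures */
#endif
}
```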
Handling jumps
The inline assembly language code can also contain labels to define locations in the inline assembly code. Normal assembly conditional and unconditional branches can be implemented to jump to the defined labels.
There are two restrictions when using labels in inline assembly code. The first one is that you can only jump to a label within the same asm section. You cannot jump from one asm section to a label in another asm section.
The second restriction is that a label name cannot be reused in another asm section in your C code; the duplicate label will produce an error message. In addition, labels whose names clash with symbols already defined by the C code, such as function names or global variables, will also generate errors.
There are two ways to solve this. The easiest is to use different labels within different asm sections. If you are hand-coding each of the asm sections, this is a viable alternative.
If the same asm section is emitted more than once (for example, from a macro), you cannot vary the labels within the inline assembly code. The solution is to use local labels.
Both conditional and unconditional branches allow you to specify a number as a label, along with a directional flag to indicate which way the processor should look for the numerical label. The first occurrence of the label found will be taken. To demonstrate this, the jmptest2.c program can be used:

/* jmptest2.c - An example of using generic jumps in inline assembly */
#include <stdio.h>

int main()
{
    int a = 10;
    int b = 20;
    int result;
    asm ("cmp %1, %2\n\t"
         "jge 0f\n\t"
         "movl %1, %0\n\t"
         "jmp 1f\n"
         "0:\n\t"
         "movl %2, %0\n"
         "1:"
         : "=r"(result)
         : "r"(a), "r"(b));
    printf("The larger value is %d\n", result);
    return 0;
}
The labels have been replaced with 0: and 1:. The JGE and JMP instructions use the f modifier to indicate the label is forward from the jump instruction. To move backward, you must use the b modifier.
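The b modifier is easiest to see in a loop. The following sketch sums the integers from n down to 1 by jumping backward to a numeric label; the function name sum_to is hypothetical, and the C fallback is only for non-x86 builds.

```c
/* sum_to is a hypothetical helper; the asm path assumes x86. */
int sum_to(int n)
{
    int sum = 0;

    if (n <= 0)
        return 0;
#if defined(__i386__) || defined(__x86_64__)
    /* Both operands are read and written, so both use the + modifier. */
    __asm__ ("1:\n\t"
             "addl %1, %0\n\t"   /* sum += n */
             "decl %1\n\t"       /* n-- sets the zero flag at 0 */
             "jnz 1b"            /* jump backward to label 1 while n != 0 */
             : "+r"(sum), "+r"(n));
#else
    while (n != 0)              /* fallback for other architectures */
        sum += n--;
#endif
    return sum;
}
```

Here 1b means "the nearest label 1 before this instruction", so the same label number could appear again in a later asm section without conflict.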
Using Inline Assembly Code
Creating inline assembly macro functions
Just as you can with the C macro functions, you can declare macro functions that include inline assembly code. The inline assembly code must use the extended asm format, so the proper input and output values can be defined. Because the macro function can be used multiple times in a program, you should also use numeric labels for any branches required in the assembly code.
An example of defining an inline assembly macro function is as follows:

#define GREATER(a, b, result) ({ \
    asm ("cmp %1, %2\n\t" \
         "jge 0f\n\t" \
         "movl %1, %0\n\t" \
         "jmp 1f\n" \
         "0:\n\t" \
         "movl %2, %0\n" \
         "1:" \
         : "=r"(result) \
         : "r"(a), "r"(b)); })
The a and b input variables are assigned to registers so they can be used in the CMP instruction. The JGE and JMP instructions use numeric labels so the macro function can be used multiple times in the program without duplicating assembly labels. The result variable is copied from the register that contains the greater of the two input values. Note that the asm statement must be in a set of curly braces to indicate the start and end of the statement. Without them, the compiler will generate an error each time the macro is used in the C code.
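To show the macro being expanded more than once in the same function, here is a sketch that finds the largest of three values. The function name max_of_three is hypothetical; the macro body follows the definition above but uses the __asm__ spelling so it also builds in strict ISO C mode, and a plain C definition is substituted on non-x86 targets.

```c
#if defined(__i386__) || defined(__x86_64__)
#define GREATER(a, b, result) ({ \
    __asm__ ("cmp %1, %2\n\t" \
             "jge 0f\n\t" \
             "movl %1, %0\n\t" \
             "jmp 1f\n" \
             "0:\n\t" \
             "movl %2, %0\n" \
             "1:" \
             : "=r"(result) \
             : "r"(a), "r"(b)); })
#else
/* Fallback for other architectures, same result as the asm version. */
#define GREATER(a, b, result) ({ (result) = ((b) >= (a)) ? (b) : (a); })
#endif

/* max_of_three is a hypothetical helper that expands GREATER twice. */
int max_of_three(int x, int y, int z)
{
    int first;
    int overall;

    GREATER(x, y, first);        /* first = larger of x and y */
    GREATER(first, z, overall);  /* overall = larger of first and z */
    return overall;
}
```

Both expansions contain the labels 0 and 1, which is only legal because numeric labels may be defined repeatedly.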

This article comes from a ChinaUnix blog. To view the original, see: http://blog.chinaunix.net/u1/48729/showart_427077.html


Assembly Language Study Notes (26) -- Using Inline Assembly