User:Syaghmour/Text Processing Instructions
Text Processing Instructions
SSE 4.2 adds four string text processing instructions: PCMPISTRI, PCMPISTRM, PCMPESTRI and PCMPESTRM. Each instruction takes three parameters: arg1, an xmm register; arg2, an xmm register or a 128-bit memory location; and IMM8, an 8-bit immediate control byte. The instructions perform an arithmetic comparison between the packed contents of arg1 and arg2. IMM8 specifies the format of the input/output as well as the operation of two intermediate stages of processing. The results of stage 1 and stage 2 of intermediate processing will be referred to as IntRes1 and IntRes2 respectively. These instructions also provide additional information about the result through overloaded use of the arithmetic flags (AF, CF, OF, PF, SF and ZF).
The instructions proceed in multiple steps:
- arg1 and arg2 are compared
- An aggregation operation is applied to the result of the comparison with the result flowing into IntRes1
- An optional negation is performed with the result flowing into IntRes2
- An output in the form of an index (in ECX) or a mask (in XMM0) is produced
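The four steps above can be modeled in a few lines of Python. This is only a rough sketch, not the hardware algorithm: it assumes unsigned byte data, an element-by-element equality comparison (the Equal Each aggregation), a least significant index output, and inputs whose 16 bytes are all valid (no null terminator), so the special handling of bytes past the end of a string is left out. The function name `pcmpistri_model` is made up for this illustration.

```python
def pcmpistri_model(arg1: bytes, arg2: bytes, negate: bool = False) -> int:
    """Model compare -> aggregate -> negate -> index for 16 valid bytes."""
    assert len(arg1) == 16 and len(arg2) == 16
    assert 0 not in arg1 and 0 not in arg2  # assumption: no null bytes

    # Steps 1 and 2: compare element by element and collect IntRes1.
    int_res1 = 0
    for i in range(16):
        if arg1[i] == arg2[i]:
            int_res1 |= 1 << i

    # Step 3: optional negation into IntRes2 (kept to 16 bits).
    int_res2 = (~int_res1 & 0xFFFF) if negate else int_res1

    # Step 4: index of the least significant set bit, or 16 if none is set.
    for i in range(16):
        if int_res2 & (1 << i):
            return i
    return 16


if __name__ == "__main__":
    a = b"The quick brown "
    b = b"The quack green "
    print(pcmpistri_model(a, b))               # prints 0 (first match)
    print(pcmpistri_model(a, b, negate=True))  # prints 6 (first mismatch)
```

With negation enabled the index points at the first position where the two inputs differ, which is how a string comparison routine would typically use it.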
IMM8 control byte description
The IMM8 control byte is split into four groups of bit fields that control the following settings:
- IMM8[1:0] specifies the format of the 128-bit source data (arg1 and arg2):
IMM8[1:0] | Description |
00b | unsigned bytes (16 packed unsigned bytes) |
01b | unsigned words (8 packed unsigned words) |
10b | signed bytes (16 packed signed bytes) |
11b | signed words (8 packed signed words) |
- IMM8[3:2] specifies the aggregation operation whose result will be placed in intermediate result 1, which we will refer to as IntRes1. The size of IntRes1 depends on the format of the source data: 16 bits for packed bytes and 8 bits for packed words. In the examples, IntRes1 is written with IntRes1[0] as the leftmost digit:
IMM8[3:2] | Description |
00b | Equal Any: arg1 is a character set, arg2 is the string to search in. IntRes1[i] is set to 1 if arg2[i] is in the set represented by arg1: |
    arg1 = "aeiou"
    arg2 = "Example string 1"
    IntRes1 = 0010001000010000
01b | Ranges: arg1 is a set of character ranges, i.e. "09az" means all characters from 0 to 9 and from a to z; arg2 is the string to search over. IntRes1[i] is set to 1 if arg2[i] is in any of the ranges represented by arg1: |
    arg1 = "09az"
    arg2 = "Testing 1 2 3, T"
    IntRes1 = 0111111010101000
10b | Equal Each: arg1 is string one and arg2 is string two. IntRes1[i] is set to 1 if arg1[i] == arg2[i]: |
    arg1 = "The quick brown "
    arg2 = "The quack green "
    IntRes1 = 1111110111010011
11b | Equal Ordered: arg1 is a substring to search for, arg2 is the string to search within. IntRes1[i] is set to 1 if the substring arg1 can be found at position arg2[i]: |
    arg1 = "he"
    arg2 = ", he helped her "
    IntRes1 = 0010010000001000
- IMM8[5:4] specifies the polarity, or the processing of IntRes1 into intermediate result 2, which will be referred to as IntRes2:
IMM8[5:4] | Description |
00b | Positive Polarity: IntRes2 = IntRes1 |
01b | Negative Polarity: IntRes2 = -1 XOR IntRes1 |
10b | Masked Positive: IntRes2 = IntRes1 |
11b | Masked Negative: IntRes2[i] = IntRes1[i] if reg/mem[i] is invalid, else ~IntRes1[i] |
- IMM8[6] specifies the output selection, or how IntRes2 will be processed into the output.
  - For PCMPESTRI and PCMPISTRI, the output is an index into the data currently referenced by arg2:
IMM8[6] | Description |
0b | Least Significant Index: ECX contains the index of the least significant set bit in IntRes2 |
1b | Most Significant Index: ECX contains the index of the most significant set bit in IntRes2 |
  - For PCMPESTRM and PCMPISTRM, the output is a mask reflecting all the set bits in IntRes2:
IMM8[6] | Description |
0b | Bit Mask: the least significant bits of XMM0 contain the 16 (8) bit IntRes2 mask, zero extended to 128 bits |
1b | Byte/Word Mask: XMM0 contains IntRes2 expanded into a byte/word mask |
- IMM8[7] should be set to zero since it has no defined meaning.
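As a reading aid, the bit fields described above can be pulled apart with a few shifts and masks. The following is a small sketch; the helper name `decode_imm8` and the dictionary keys are made up for this illustration. The two sample values are the control bytes used by the example program later on this page.

```python
SOURCE_FORMAT = ["unsigned bytes", "unsigned words", "signed bytes", "signed words"]
AGGREGATION = ["Equal Any", "Ranges", "Equal Each", "Equal Ordered"]
POLARITY = ["Positive", "Negative", "Masked Positive", "Masked Negative"]


def decode_imm8(imm8: int) -> dict:
    """Split a control byte into the four fields described above."""
    return {
        "format": SOURCE_FORMAT[imm8 & 0b11],            # IMM8[1:0]
        "aggregation": AGGREGATION[(imm8 >> 2) & 0b11],  # IMM8[3:2]
        "polarity": POLARITY[(imm8 >> 4) & 0b11],        # IMM8[5:4]
        "output_select": (imm8 >> 6) & 1,                # IMM8[6]
    }


if __name__ == "__main__":
    for value in (0b0001000, 0b0011000):
        print(f"{value:#04x}", decode_imm8(value))
```

Both sample values select unsigned bytes and Equal Each; they differ only in the polarity field.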
The Four Instructions
pcmpistri IMM8, arg2, arg1 | GAS Syntax |
pcmpistri arg1, arg2, IMM8 | Intel Syntax |
PCMPISTRI, Packed Compare Implicit Length Strings, Return Index. Compares strings of implicit length and generates an index in ECX.
Operands
arg1
- XMM Register
arg2
- XMM Register
- Memory
IMM8
- 8-bit Immediate value
Modified flags
- CF is reset if IntRes2 is zero, set otherwise
- ZF is set if a null terminating character is found in arg2, reset otherwise
- SF is set if a null terminating character is found in arg1, reset otherwise
- OF is set to IntRes2[0]
- AF is reset
- PF is reset
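The flag rules above can be written down as a short Python sketch. It assumes byte data and that IntRes2 has already been computed by some other means; the function name `istr_flags` is invented for this illustration.

```python
def istr_flags(arg1: bytes, arg2: bytes, int_res2: int) -> dict:
    """Model the flag results of an implicit-length compare on byte data."""
    return {
        "CF": int(int_res2 != 0),   # reset if IntRes2 is zero, set otherwise
        "ZF": int(0 in arg2[:16]),  # null terminator found in arg2
        "SF": int(0 in arg1[:16]),  # null terminator found in arg1
        "OF": int_res2 & 1,         # IntRes2[0]
        "AF": 0,                    # always reset
        "PF": 0,                    # always reset
    }


if __name__ == "__main__":
    needle = b"he".ljust(16, b"\x00")   # short string, null padded
    haystack = b", he helped her "      # exactly 16 bytes, no null
    # Bits 2, 5 and 12 set: the positions where "he" starts in haystack.
    print(istr_flags(needle, haystack, 0b0001000000100100))
```

A loop that tests CF and ZF together after the instruction can therefore tell "nothing found yet and the string continues" apart from "found something" or "reached the end of arg2".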
Example
;
; nasm -felf32 -g sse4_2StrPcmpistri.asm -l sse4_2StrPcmpistri.lst
; gcc -o sse4_2StrPcmpistri sse4_2StrPcmpistri.o
;
global main
extern printf
extern strlen
extern strcmp
section .data
align 4
;
; Fill buf1 with a repeating pattern of ABCD
;
buf1: times 10 dd 0x44434241
s1: db "This is a string", 0
s2: db "This is a string slightly different string", 0
s3: db "This is a str", 0
fmtStr1: db "String: %s len: %d", 0x0A, 0
fmtStr1b: db "strlen(3): String: %s len: %d", 0x0A, 0
fmtStr2: db "s1: =%s= and s2: =%s= compare: %d", 0x0A, 0
fmtStr2b: db "strcmp(3): s1: =%s= and s2: =%s= compare: %d", 0x0A, 0
;
; Functions will follow the cdecl call convention
;
section .text
main: ; Using main since we are using gcc to link
and esp, -16 ; 16 byte align the stack
sub esp, 16 ; space for four 4 byte parameters
;
; Null terminate buf1, make it proper C string, length is now 39
;
mov [buf1+39], byte 0x00
lea eax, [buf1]
mov [esp], eax ; Arg1: pointer of string to calculate the length of
mov ebx, eax ; Save pointer in ebx since we will use it again
call strlenSSE42
mov edx, eax ; Copy length of arg1 into edx
mov [esp+8], edx ; Arg3: length of string
mov [esp+4], ebx ; Arg2: pointer to string
lea eax, [fmtStr1]
mov [esp], eax ; Arg1: pointer to format string
call printf ; Call printf(3):
; int printf(const char *format, ...);
lea eax, [buf1]
mov [esp], eax ; Arg1: pointer of string to calculate the length of
mov ebx, eax ; Save pointer in ebx since we will use it again
call strlen ; Call strlen(3):
; size_t strlen(const char *s);
mov edx, eax ; Copy length of arg1 into edx
mov [esp+8], edx ; Arg3: length of string
mov [esp+4], ebx ; Arg2: pointer to string
lea eax, [fmtStr1b]
mov [esp], eax ; Arg1: pointer to format string
call printf ; Call printf(3):
; int printf(const char *format, ...);
lea eax, [s2]
mov [esp+4], eax ; Arg2: pointer to second string to compare
lea eax, [s1]
mov [esp], eax ; Arg1: pointer to first string to compare
call strcmpSSE42
mov [esp+12], eax ; Arg4: result from strcmpSSE42
lea eax, [s2]
mov [esp+8], eax ; Arg3: pointer to second string
lea eax, [s1]
mov [esp+4], eax ; Arg2: pointer to first string
lea eax, [fmtStr2]
mov [esp], eax ; Arg1: pointer to format string
call printf
lea eax, [s2]
mov [esp+4], eax ; Arg2: pointer to second string to compare
lea eax, [s1]
mov [esp], eax ; Arg1: pointer to first string to compare
call strcmp ; Call strcmp(3):
; int strcmp(const char *s1, const char *s2);
mov [esp+12], eax ; Arg4: result from strcmp
lea eax, [s2]
mov [esp+8], eax ; Arg3: pointer to second string
lea eax, [s1]
mov [esp+4], eax ; Arg2: pointer to first string
lea eax, [fmtStr2b]
mov [esp], eax ; Arg1: pointer to format string
call printf
lea eax, [s3]
mov [esp+4], eax ; Arg2: pointer to second string to compare
lea eax, [s1]
mov [esp], eax ; Arg1: pointer to first string to compare
call strcmpSSE42
mov [esp+12], eax ; Arg4: result from strcmpSSE42
lea eax, [s3]
mov [esp+8], eax ; Arg3: pointer to second string
lea eax, [s1]
mov [esp+4], eax ; Arg2: pointer to first string
lea eax, [fmtStr2]
mov [esp], eax ; Arg1: pointer to format string
call printf
lea eax, [s3]
mov [esp+4], eax ; Arg2: pointer to second string to compare
lea eax, [s1]
mov [esp], eax ; Arg1: pointer to first string to compare
call strcmp ; Call strcmp(3):
; int strcmp(const char *s1, const char *s2);
mov [esp+12], eax ; Arg4: result from strcmp
lea eax, [s3]
mov [esp+8], eax ; Arg3: pointer to second string
lea eax, [s1]
mov [esp+4], eax ; Arg2: pointer to first string
lea eax, [fmtStr2b]
mov [esp], eax ; Arg1: pointer to format string
call printf
call exit
;
; size_t strlen(const char *s);
;
strlenSSE42:
push ebp
mov ebp, esp
mov edx, [ebp+8] ; Arg1: copy s(pointer to string) to edx
;
; We are looking for null terminating char, so set xmm0 to zero
;
pxor xmm0, xmm0
mov eax, -16 ; Avoid extra jump in main loop
strlenLoop:
add eax, 16
;
; IMM8[1:0] = 00b
; Src data is unsigned bytes(16 packed unsigned bytes)
; IMM8[3:2] = 10b
; We are using Equal Each aggregation
; IMM8[5:4] = 00b
; Positive Polarity, IntRes2 = IntRes1
; IMM8[6] = 0b
; ECX contains the least significant set bit in IntRes2
;
pcmpistri xmm0,[edx+eax], 0001000b
;
; Loop while ZF = 0, which means none of the bytes pointed to by edx+eax
; are zero.
;
jnz strlenLoop
;
; ecx will contain the offset from edx+eax where the first null
; terminating character was found.
;
add eax, ecx
pop ebp
ret
;
; int strcmp(const char *s1, const char *s2);
;
strcmpSSE42:
push ebp
mov ebp, esp
mov eax, [ebp+8] ; Arg1: copy s1(pointer to string) to eax
mov edx, [ebp+12] ; Arg2: copy s2(pointer to string) to edx
;
; Subtract s2(edx) from s1(eax). This admittedly looks odd, but we
; can now use edx to index into s1 and s2. As we adjust edx to move
; forward into s2, we can then add edx to eax and this will give us
; the comparable offset into s1 i.e. if we take edx + 16 then:
;
; edx = edx + 16 = edx + 16
; eax+edx = eax -edx + edx + 16 = eax + 16
;
; therefore edx points to s2 + 16 and eax + edx points to s1 + 16.
; We thus only need one index, convoluted but effective.
;
sub eax, edx
sub edx, 16 ; Avoid extra jump in main loop
strcmpLoop:
add edx, 16
movdqu xmm0, [edx]
;
; IMM8[1:0] = 00b
; Src data is unsigned bytes(16 packed unsigned bytes)
; IMM8[3:2] = 10b
; We are using Equal Each aggregation
; IMM8[5:4] = 01b
; Negative Polarity, IntRes2 = -1 XOR IntRes1
; IMM8[6] = 0b
; ECX contains the least significant set bit in IntRes2
;
pcmpistri xmm0, [edx+eax], 0011000b
;
; Loop while ZF=0 and CF=0. The loop ends when either:
;
; 1) We find a null in s1(edx+eax) ZF=1
; 2) We find a char that does not match CF=1
;
ja strcmpLoop
;
; Jump if CF=1, we found a mismatched char
;
jc strcmpDiff
;
; We terminated loop due to a null character i.e. CF=0 and ZF=1
;
xor eax, eax ; They are equal so return zero
jmp exitStrcmp
strcmpDiff:
add eax, edx ; Set offset into s1 to match s2
;
; ecx is offset from current position where two strings do not match,
; so copy the respective non-matching byte into eax and edx and fill
; in remaining bits w/ zero.
;
movzx eax, byte[eax+ecx]
movzx edx, byte[edx+ecx]
;
; If s1 is less than s2 return integer less than zero, otherwise return
; integer greater than zero.
;
sub eax, edx
exitStrcmp:
pop ebp
ret
exit:
;
; Invoke the exit system call directly (Linux int 0x80 interface)
; void exit(int status)
;
mov ebx, 0 ; Arg one: the status
mov eax, 1 ; Syscall number:
int 0x80
Expected output:
String: ABCDABCDABCDABCDABCDABCDABCDABCDABCDABC len: 39
strlen(3): String: ABCDABCDABCDABCDABCDABCDABCDABCDABCDABC len: 39
s1: =This is a string= and s2: =This is a string slightly different string= compare: -32
strcmp(3): s1: =This is a string= and s2: =This is a string slightly different string= compare: -32
s1: =This is a string= and s2: =This is a str= compare: 105
strcmp(3): s1: =This is a string= and s2: =This is a str= compare: 105
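To make the strlenSSE42 loop above easier to follow, here is a Python sketch of what each iteration computes. It assumes the behavior the routine relies on: with an all-zero arg1 and the Equal Each aggregation, ECX ends up holding the index of the first null byte in the 16-byte chunk (or 16 if there is none), and ZF reports whether a null was seen. The names `first_null_index` and `strlen_model` are made up, and the guard against running off the end of the buffer exists only in this model, not in the assembly.

```python
def first_null_index(chunk: bytes):
    """Model one loop iteration: return (ECX, ZF) for a chunk of up to 16 bytes."""
    for i, byte in enumerate(chunk[:16]):
        if byte == 0:
            return i, 1   # null found: ECX = its index, ZF = 1
    return 16, 0          # no null in this chunk: ECX = 16, ZF = 0


def strlen_model(memory: bytes) -> int:
    """Walk memory in 16-byte steps until a chunk contains a null byte."""
    offset = 0
    while True:
        if offset >= len(memory):
            raise ValueError("no null terminator in memory")
        ecx, zf = first_null_index(memory[offset:offset + 16])
        if zf:
            return offset + ecx   # same as 'add eax, ecx' in the listing
        offset += 16


if __name__ == "__main__":
    buf = bytearray(b"ABCD" * 10)   # same pattern as buf1
    buf[39] = 0                     # null terminate, as main does
    print(strlen_model(bytes(buf)))  # prints 39
```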
pcmpistrm IMM8, arg2, arg1 | GAS Syntax |
pcmpistrm arg1, arg2, IMM8 | Intel Syntax |
PCMPISTRM, Packed Compare Implicit Length Strings, Return Mask. Compares strings of implicit length and generates a mask stored in XMM0.
Operands
arg1
- XMM Register
arg2
- XMM Register
- Memory
IMM8
- 8-bit Immediate value
Modified flags
- CF is reset if IntRes2 is zero, set otherwise
- ZF is set if a null terminating character is found in arg2, reset otherwise
- SF is set if a null terminating character is found in arg1, reset otherwise
- OF is set to IntRes2[0]
- AF is reset
- PF is reset
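The two mask formats selected by IMM8[6] can be illustrated with a small Python sketch. It represents XMM0 as 16 bytes in little-endian order (byte 0 first) and takes an already computed IntRes2; the function name `mask_output` and its parameters are assumptions made for this example.

```python
def mask_output(int_res2: int, expand: bool, words: bool = False) -> bytes:
    """Model the XMM0 result of a mask-returning compare as 16 bytes."""
    elements = 8 if words else 16
    int_res2 &= (1 << elements) - 1
    if not expand:
        # IMM8[6] = 0: IntRes2 sits in the low bits, the rest is zero.
        return int_res2.to_bytes(16, "little")
    # IMM8[6] = 1: every element becomes all ones or all zeros.
    width = 2 if words else 1
    out = bytearray()
    for i in range(elements):
        fill = 0xFF if (int_res2 >> i) & 1 else 0x00
        out += bytes([fill]) * width
    return bytes(out)


if __name__ == "__main__":
    print(mask_output(0b101, expand=False).hex())
    print(mask_output(0b101, expand=True).hex())
```

The expanded form is convenient as input to later packed operations, since each element of the original data lines up with a full-width mask element.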
pcmpestri IMM8, arg2, arg1 | GAS Syntax |
pcmpestri arg1, arg2, IMM8 | Intel Syntax |
PCMPESTRI, Packed Compare Explicit Length Strings, Return Index. Compares strings of explicit length and generates an index in ECX.
Operands
arg1
- XMM Register
arg2
- XMM Register
- Memory
IMM8
- 8-bit Immediate value
Implicit Operands
- EAX holds the length of arg1
- EDX holds the length of arg2
Modified flags
- CF is reset if IntRes2 is zero, set otherwise
- ZF is set if EDX is < 16 (for bytes) or < 8 (for words), reset otherwise
- SF is set if EAX is < 16 (for bytes) or < 8 (for words), reset otherwise
- OF is set to IntRes2[0]
- AF is reset
- PF is reset
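The role of the implicit length operands can be sketched in Python as follows. Beyond what is stated above, this model assumes that the processor uses the absolute value of each length and clamps it to the number of elements in the register (16 bytes or 8 words), which is how Intel's manual describes the explicit-length forms. The function name `estr_lengths` and the returned keys are made up for this illustration.

```python
def estr_lengths(eax: int, edx: int, words: bool = False) -> dict:
    """Model the effective lengths and the ZF/SF results for explicit lengths."""
    elements = 8 if words else 16
    len1 = min(abs(eax), elements)  # valid elements in arg1
    len2 = min(abs(edx), elements)  # valid elements in arg2
    return {
        "len1": len1,
        "len2": len2,
        "SF": int(len1 < elements),  # arg1 is shorter than a full register
        "ZF": int(len2 < elements),  # arg2 is shorter than a full register
    }


if __name__ == "__main__":
    print(estr_lengths(5, 16))             # short pattern, full data chunk
    print(estr_lengths(-3, 40))            # negative and oversized lengths
    print(estr_lengths(8, 7, words=True))  # word data
```

Because the lengths are explicit, the data may contain null characters, which is the main practical difference from the implicit-length forms.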
pcmpestrm IMM8, arg2, arg1 | GAS Syntax |
pcmpestrm arg1, arg2, IMM8 | Intel Syntax |
PCMPESTRM, Packed Compare Explicit Length Strings, Return Mask. Compares strings of explicit length and generates a mask stored in XMM0.
Operands
arg1
- XMM Register
arg2
- XMM Register
- Memory
IMM8
- 8-bit Immediate value
Implicit Operands
- EAX holds the length of arg1
- EDX holds the length of arg2
Modified flags
- CF is reset if IntRes2 is zero, set otherwise
- ZF is set if EDX is < 16 (for bytes) or < 8 (for words), reset otherwise
- SF is set if EAX is < 16 (for bytes) or < 8 (for words), reset otherwise
- OF is set to IntRes2[0]
- AF is reset
- PF is reset