#1 |
Senior Member
Joined: Feb 2008
Posts: 910
|
Intel SSE
Hi everyone,
I need a list of the SSE instructions (and the later extensions, up to 4.1 if possible) supported by Intel processors. Does anyone know where to find one?
#2 |
Senior Member
Joined: Aug 2000
Posts: 17963
|
sse
Arithmetic:
addps - Adds 4 single-precision (32bit) floating-point values to 4 other single-precision floating-point values.
addss - Adds the lowest single-precision values, top 3 remain unchanged.
subps - Subtracts 4 single-precision floating-point values from 4 other single-precision floating-point values.
subss - Subtracts the lowest single-precision values, top 3 remain unchanged.
mulps - Multiplies 4 single-precision floating-point values with 4 other single-precision values.
mulss - Multiplies the lowest single-precision values, top 3 remain unchanged.
divps - Divides 4 single-precision floating-point values by 4 other single-precision floating-point values.
divss - Divides the lowest single-precision values, top 3 remain unchanged.
rcpps - Approximate reciprocal (1/x) of 4 single-precision floating-point values.
rcpss - Approximate reciprocal of the lowest single-precision value, top 3 remain unchanged.
sqrtps - Square root of 4 single-precision values.
sqrtss - Square root of lowest value, top 3 remain unchanged.
rsqrtps - Approximate reciprocal square root of 4 single-precision floating-point values.
rsqrtss - Approximate reciprocal square root of lowest single-precision value, top 3 remain unchanged.
maxps - Returns maximum of 2 values in each of 4 single-precision values.
maxss - Returns maximum of 2 values in the lowest single-precision value, top 3 remain unchanged.
minps - Returns minimum of 2 values in each of 4 single-precision values.
minss - Returns minimum of 2 values in the lowest single-precision value, top 3 remain unchanged.
pavgb - Returns rounded average of 2 values in each of 8 unsigned bytes.
pavgw - Returns rounded average of 2 values in each of 4 unsigned words.
psadbw - Returns sum of absolute differences of 8 8bit values. Result in bottom 16 bits.
pextrw - Extracts 1 of 4 words.
pinsrw - Inserts 1 of 4 words.
pmaxsw - Returns maximum of 2 values in each of 4 signed word values.
pmaxub - Returns maximum of 2 values in each of 8 unsigned byte values.
pminsw - Returns minimum of 2 values in each of 4 signed word values.
pminub - Returns minimum of 2 values in each of 8 unsigned byte values.
pmovmskb - Builds mask byte from top bit of 8 byte values.
pmulhuw - Multiplies 4 unsigned word values and stores the high 16bit result.
pshufw - Shuffles 4 word values. Takes 2 64bit MMX values (source and dest) and an 8-bit immediate value, and then fills in each Dest 16-bit word from a Source 16-bit word specified by the immediate. The immediate byte is broken into 4 2-bit values.

Logic:
andnps - Logically ANDs 4 single-precision values with the logical inverse (NOT) of 4 other single-precision values.
andps - Logically ANDs 4 single-precision values with 4 other single-precision values.
orps - Logically ORs 4 single-precision values with 4 other single-precision values.
xorps - Logically XORs 4 single-precision values with 4 other single-precision values.

Compare:
cmpxxps - Compares 4 single-precision values.
cmpxxss - Compares lowest 2 single-precision values.
comiss - Compares lowest 2 single-precision values and stores result in EFLAGS.
ucomiss - Compares lowest 2 single-precision values and stores result in EFLAGS. (QNaNs don't throw exceptions with ucomiss, unlike comiss.)

Compare Codes (the xx parts above):
eq - Equal to.
lt - Less than.
le - Less than or equal to.
neq - Not equal.
nlt - Not less than.
nle - Not less than or equal to.
ord - Ordered.
unord - Unordered.

Conversion:
cvtpi2ps - Converts 2 32bit integers to 32bit floating-point values. Top 2 values remain unchanged.
cvtps2pi - Converts 2 32bit floating-point values to 32bit integers.
cvtsi2ss - Converts 1 32bit integer to 32bit floating-point value. Top 3 values remain unchanged.
cvtss2si - Converts 1 32bit floating-point value to 32bit integer.
cvttps2pi - Converts 2 32bit floating-point values to 32bit integers using truncation.
cvttss2si - Converts 1 32bit floating-point value to 32bit integer using truncation.

State:
fxrstor - Restores FP and SSE State.
fxsave - Stores FP and SSE State.
ldmxcsr - Loads the mxcsr register.
stmxcsr - Stores the mxcsr register.

Load/Store:
movaps - Moves a 128bit value. A memory address must be 16-byte aligned.
movhlps - Moves high half to a low half.
movlhps - Moves low half to a high half.
movhps - Moves 64bit value into top half of an xmm register.
movlps - Moves 64bit value into bottom half of an xmm register.
movmskps - Moves top bits of single-precision values into bottom 4 bits of a 32bit register.
movss - Moves the bottom single-precision value; top 3 remain unchanged if the source is another xmm register, otherwise they're set to zero.
movups - Moves a 128bit value. Address can be unaligned.
maskmovq - Moves a 64bit value according to a mask.
movntps - Moves a 128bit value directly to memory, skipping the cache. (NT stands for "Non Temporal".)
movntq - Moves a 64bit value directly to memory, skipping the cache.

Shuffling:
shufps - Shuffles 4 single-precision values. Complex.
unpckhps - Unpacks and interleaves single-precision values from high halves.
unpcklps - Unpacks and interleaves single-precision values from low halves.

Cache Control:
prefetchT0 - Fetches a cache-line of data into all levels of cache.
prefetchT1 - Fetches a cache-line of data into all but the highest levels of cache.
prefetchT2 - Fetches a cache-line of data into all but the two highest levels of cache.
prefetchNTA - Fetches data into only the highest level of cache, not the lower levels.
sfence - Guarantees that all memory writes issued before the sfence instruction are completed before any writes after the sfence instruction.

sse2

Arithmetic:
addpd - Adds 2 64bit doubles.
addsd - Adds bottom 64bit doubles.
subpd - Subtracts 2 64bit doubles.
subsd - Subtracts bottom 64bit doubles.
mulpd - Multiplies 2 64bit doubles.
mulsd - Multiplies bottom 64bit doubles.
divpd - Divides 2 64bit doubles.
divsd - Divides bottom 64bit doubles.
maxpd - Gets largest of 2 64bit doubles for 2 sets.
maxsd - Gets largest of 2 64bit doubles for bottom set.
minpd - Gets smallest of 2 64bit doubles for 2 sets.
minsd - Gets smallest of 2 64bit doubles for bottom set.
paddb - Adds 16 8bit integers.
paddw - Adds 8 16bit integers.
paddd - Adds 4 32bit integers.
paddq - Adds 2 64bit integers.
paddsb - Adds 16 8bit signed integers using saturation.
paddsw - Adds 8 16bit signed integers using saturation.
paddusb - Adds 16 8bit unsigned integers using saturation.
paddusw - Adds 8 16bit unsigned integers using saturation.
psubb - Subtracts 16 8bit integers.
psubw - Subtracts 8 16bit integers.
psubd - Subtracts 4 32bit integers.
psubq - Subtracts 2 64bit integers.
psubsb - Subtracts 16 8bit signed integers using saturation.
psubsw - Subtracts 8 16bit signed integers using saturation.
psubusb - Subtracts 16 8bit unsigned integers using saturation.
psubusw - Subtracts 8 16bit unsigned integers using saturation.
pmaddwd - Multiplies 16bit integers into 32bit results and adds adjacent results.
pmulhw - Multiplies 16bit integers and returns the high 16bits of the result.
pmullw - Multiplies 16bit integers and returns the low 16bits of the result.
pmuludq - Multiplies 2 unsigned 32bit pairs and stores 2 64bit results.
rcpps - Approximates the reciprocal of 4 32bit singles. (Already part of SSE.)
rcpss - Approximates the reciprocal of bottom 32bit single. (Already part of SSE.)
sqrtpd - Returns square root of 2 64bit doubles.
sqrtsd - Returns square root of bottom 64bit double.

Logic:
andnpd - Logically NOT ANDs 2 64bit doubles.
andnps - Logically NOT ANDs 4 32bit singles. (Already part of SSE.)
andpd - Logically ANDs 2 64bit doubles.
pand - Logically ANDs 2 128bit registers.
pandn - Logically inverts the first 128bit operand and ANDs with the second.
por - Logically ORs 2 128bit registers.
pslldq - Logically left shifts 1 128bit value (by bytes).
psllq - Logically left shifts 2 64bit values.
pslld - Logically left shifts 4 32bit values.
psllw - Logically left shifts 8 16bit values.
psrad - Arithmetically right shifts 4 32bit values.
psraw - Arithmetically right shifts 8 16bit values.
psrldq - Logically right shifts 1 128bit value (by bytes).
psrlq - Logically right shifts 2 64bit values.
psrld - Logically right shifts 4 32bit values.
psrlw - Logically right shifts 8 16bit values.
pxor - Logically XORs 2 128bit registers.
orpd - Logically ORs 2 64bit doubles.
xorpd - Logically XORs 2 64bit doubles.

Compare:
cmppd - Compares 2 pairs of 64bit doubles.
cmpsd - Compares bottom 64bit doubles.
comisd - Compares bottom 64bit doubles and stores result in EFLAGS.
ucomisd - Compares bottom 64bit doubles and stores result in EFLAGS. (QNaNs don't throw exceptions with ucomisd, unlike comisd.)
pcmpxxb - Compares 16 8bit integers (xx is eq or gt only).
pcmpxxw - Compares 8 16bit integers (xx is eq or gt only).
pcmpxxd - Compares 4 32bit integers (xx is eq or gt only).

Compare Codes (for cmppd/cmpsd, e.g. cmpeqpd):
eq - Equal to.
lt - Less than.
le - Less than or equal to.
neq - Not equal.
nlt - Not less than.
nle - Not less than or equal to.
ord - Ordered.
unord - Unordered.

Conversion:
cvtdq2pd - Converts 2 32bit integers into 2 64bit doubles.
cvtdq2ps - Converts 4 32bit integers into 4 32bit singles.
cvtpd2pi - Converts 2 64bit doubles into 2 32bit integers in an MMX register.
cvtpd2dq - Converts 2 64bit doubles into 2 32bit integers in the bottom of an XMM register.
cvtpd2ps - Converts 2 64bit doubles into 2 32bit singles in the bottom of an XMM register.
cvtpi2pd - Converts 2 32bit integers from an MMX register into 2 64bit doubles in an XMM register.
cvtps2dq - Converts 4 32bit singles into 4 32bit integers.
cvtps2pd - Converts 2 32bit singles into 2 64bit doubles.
cvtsd2si - Converts 1 64bit double to a 32bit integer in a GPR.
cvtsd2ss - Converts bottom 64bit double to a bottom 32bit single. Tops are unchanged.
cvtsi2sd - Converts a 32bit integer to the bottom 64bit double.
cvtsi2ss - Converts a 32bit integer to the bottom 32bit single.
cvtss2sd - Converts bottom 32bit single to bottom 64bit double.
cvtss2si - Converts bottom 32bit single to a 32bit integer in a GPR.
cvttpd2pi - Converts 2 64bit doubles to 2 32bit integers using truncation into an MMX register.
cvttpd2dq - Converts 2 64bit doubles to 2 32bit integers using truncation.
cvttps2dq - Converts 4 32bit singles to 4 32bit integers using truncation.
cvttps2pi - Converts 2 32bit singles to 2 32bit integers using truncation into an MMX register.
cvttsd2si - Converts a 64bit double to a 32bit integer using truncation into a GPR.
cvttss2si - Converts a 32bit single to a 32bit integer using truncation into a GPR.

Load/Store: (is "minimize cache pollution" the same as "without using cache"??)
movq - Moves a 64bit value, clearing the top 64bits of an XMM register.
movsd - Moves a 64bit double, leaving tops unchanged if move is between two XMM registers.
movapd - Moves 2 aligned 64bit doubles.
movupd - Moves 2 unaligned 64bit doubles.
movhpd - Moves top 64bit value to or from an XMM register.
movlpd - Moves bottom 64bit value to or from an XMM register.
movdq2q - Moves bottom 64bit value into an MMX register.
movq2dq - Moves an MMX register value to the bottom of an XMM register. Top is cleared to zero.
movntpd - Moves a 128bit value to memory without using the cache. NT is "Non Temporal."
movntdq - Moves a 128bit value to memory without using the cache.
movnti - Moves a 32bit value without using the cache.
maskmovdqu - Moves 16 bytes based on sign bits of another XMM register.
pmovmskb - Generates a 16bit mask from the sign bits of each byte in an XMM register.

Shuffling:
pshufd - Shuffles 32bit values in a complex way.
pshufhw - Shuffles high 16bit values in a complex way.
pshuflw - Shuffles low 16bit values in a complex way.
unpckhpd - Unpacks and interleaves top 64bit doubles from 2 128bit sources into 1.
unpcklpd - Unpacks and interleaves bottom 64bit doubles from 2 128bit sources into 1.
punpckhbw - Unpacks and interleaves top 8 8bit integers from 2 128bit sources into 1.
punpckhwd - Unpacks and interleaves top 4 16bit integers from 2 128bit sources into 1.
punpckhdq - Unpacks and interleaves top 2 32bit integers from 2 128bit sources into 1.
punpckhqdq - Unpacks and interleaves top 64bit integers from 2 128bit sources into 1.
punpcklbw - Unpacks and interleaves bottom 8 8bit integers from 2 128bit sources into 1.
punpcklwd - Unpacks and interleaves bottom 4 16bit integers from 2 128bit sources into 1.
punpckldq - Unpacks and interleaves bottom 2 32bit integers from 2 128bit sources into 1.
punpcklqdq - Unpacks and interleaves bottom 64bit integers from 2 128bit sources into 1.
packssdw - Packs 32bit integers to 16bit integers using saturation.
packsswb - Packs 16bit integers to 8bit integers using saturation.
packuswb - Packs 16bit integers to 8bit unsigned integers using saturation.

Cache Control:
clflush - Flushes a Cache Line from all levels of cache.
lfence - Guarantees that all memory loads issued before the lfence instruction are completed before any loads after the lfence instruction.
mfence - Guarantees that all memory reads and writes issued before the mfence instruction are completed before any reads or writes after the mfence instruction.
pause - Spin-wait loop hint; delays execution briefly, by an implementation-dependent amount.

sse3

addsubpd - Adds the top doubles and subtracts the bottom doubles.
addsubps - Adds the odd-indexed singles (1 and 3) and subtracts the even-indexed singles (0 and 2).
haddpd - Bottom double is sum of first operand's top and bottom, top double is sum of second operand's top and bottom.
haddps - Horizontal addition of single-precision values.
hsubpd - Horizontal subtraction of double-precision values.
hsubps - Horizontal subtraction of single-precision values.

Load/Store:
lddqu - Loads an unaligned 128bit value.
movddup - Loads 64bits and duplicates it in the top and bottom halves of a 128bit register.
movshdup - Duplicates the odd-indexed singles (1 and 3) into the adjacent even positions.
movsldup - Duplicates the even-indexed singles (0 and 2) into the adjacent odd positions.
fisttp - Converts a floating-point value to an integer using truncation.

Process Control:
monitor - Sets up a region to monitor for activity.
mwait - Waits until activity happens in a region specified by monitor.

ssse3

psignd - Gives 32bit integer magnitudes the sign of the 2nd operand.
psignw - Gives 16bit integer magnitudes the sign of the 2nd operand.
psignb - Gives 8bit integer magnitudes the sign of the 2nd operand.
phaddd - Horizontal addition of 32bit integers.
phaddw - Horizontal addition of 16bit integers.
phaddsw - Horizontal saturated addition of 16bit integers.
phsubd - Horizontal subtraction of 32bit integers.
phsubw - Horizontal subtraction of 16bit integers.
phsubsw - Horizontal saturated subtraction of 16bit words.
pmaddubsw - Multiplies unsigned bytes by signed bytes and adds adjacent pairs into saturated 16bit results (a multiply-accumulate instruction, finally).
pabsd - abs() for 32bit integers.
pabsw - abs() for 16bit integers.
pabsb - abs() for 8bit integers.
pmulhrsw - 16bit integer multiplication with rounding and scaling, stores a 16bit result taken from the top of the product.
pshufb - Another complex shuffle instruction.
palignr - Combines two register values, and extracts a register-width value from it, based on an offset.

sse4.1/2/a

SSE4.1
mpsadbw - Sum of absolute differences.
phminposuw - minimum+index extraction (16bit word).
pmuldq - packed multiply (signed 32bit values, 64bit results).
pmulld - packed multiply (32bit values, low 32bits of each result).
dpps - dot product, single precision.
dppd - dot product, double precision.
blendps - conditional copy.
blendpd - conditional copy.
blendvps - conditional copy.
blendvpd - conditional copy.
pblendvb - conditional copy.
pblendw - conditional copy.
pminsb - packed minimum signed byte.
pmaxsb - packed maximum signed byte.
pminuw - packed minimum unsigned word.
pmaxuw - packed maximum unsigned word.
pminud - packed minimum unsigned dword.
pmaxud - packed maximum unsigned dword.
pminsd - packed minimum signed dword.
pmaxsd - packed maximum signed dword.
roundps - packed round single precision float to an integral value (result stays floating-point).
roundss - scalar round single precision float to an integral value.
roundpd - packed round double precision float to an integral value.
roundsd - scalar round double precision float to an integral value.
insertps - complex data shuffling.
pinsrb - complex data shuffling.
pinsrd - complex data shuffling.
pinsrq - complex data shuffling.
extractps - complex data shuffling.
pextrb - complex data shuffling.
pextrw - complex data shuffling.
pextrd - complex data shuffling.
pextrq - complex data shuffling.
pmovsxbw - packed sign extension.
pmovzxbw - packed zero extension.
pmovsxbd - packed sign extension.
pmovzxbd - packed zero extension.
pmovsxbq - packed sign extension.
pmovzxbq - packed zero extension.
pmovsxwd - packed sign extension.
pmovzxwd - packed zero extension.
pmovsxwq - packed sign extension.
pmovzxwq - packed zero extension.
pmovsxdq - packed sign extension.
pmovzxdq - packed zero extension.
ptest - same as test, but for sse registers.
pcmpeqq - quadword compare for equality.
packusdw - saturating signed dwords to unsigned words.
movntdqa - Non-temporal aligned load (a streaming-load hint, intended for write-combining memory).

SSE4.2
crc32 - CRC32C function (using 0x11edc6f41 as the polynomial).
pcmpestri - Packed compare explicit length string, Index.
pcmpestrm - Packed compare explicit length string, Mask.
pcmpistri - Packed compare implicit length string, Index.
pcmpistrm - Packed compare implicit length string, Mask.
pcmpgtq - Packed compare, greater than.
popcnt - Population count.

SSE4a
lzcnt - Leading Zero count.
popcnt - Population count.
extrq - Mask-shift operation.
insertq - Mask-shift operation.
movntsd - Non-temporal double precision move.
movntss - Non-temporal single precision move.

Google can often help, you know? http://softpixel.com/~cwright/progra...simd/index.php
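To give an idea of how a few of the entries above look from C, here is a minimal sketch using the compiler intrinsics from `<xmmintrin.h>` instead of hand-written assembly. It only touches SSE1 instructions (movups, mulps, addps, movhlps, shufps, addss). The function names `madd4` and `hsum4` are made up for this example, and the exact instructions the compiler emits may differ from the ones named in the comments.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128 and the _mm_*_ps / _mm_*_ss family */

/* Hypothetical helper: out[i] = a[i] * b[i] + c[i] for four floats.
   Uses unaligned loads/stores (movups), so the pointers need no special alignment. */
void madd4(const float *a, const float *b, const float *c, float *out)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_loadu_ps(c);
    /* mulps followed by addps */
    _mm_storeu_ps(out, _mm_add_ps(_mm_mul_ps(va, vb), vc));
}

/* Hypothetical helper: horizontal sum of four floats with SSE1 shuffles only
   (SSE3 haddps would also work, but is not assumed here). */
float hsum4(const float *p)
{
    __m128 v  = _mm_loadu_ps(p);                 /* [p0 p1 p2 p3]            */
    __m128 hi = _mm_movehl_ps(v, v);             /* movhlps: [p2 p3 p2 p3]   */
    __m128 s  = _mm_add_ps(v, hi);               /* [p0+p2 p1+p3 ...]        */
    __m128 t  = _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 1, 1, 1)); /* shufps: lane 1 everywhere */
    return _mm_cvtss_f32(_mm_add_ss(s, t));      /* addss: (p0+p2) + (p1+p3) */
}
```

Calling `madd4` on `{1,2,3,4}`, `{5,6,7,8}` and `{1,1,1,1}` should store `{6,13,22,33}`, and `hsum4` on `{1,2,3,4}` should return 10. On 32-bit x86 builds a flag such as `-msse` may be needed; x86-64 compilers normally enable SSE and SSE2 by default.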
#3 | |
Senior Member
Joined: Feb 2008
Posts: 910
|
Quote:
#4 | |
Senior Member
Joined: May 2008
Posts: 8003
|
Quote:
edit: but isn't SSE4a an AMD extension?
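Rather than going by vendor, a program can ask the CPU it is running on which of these extensions it reports. The following is only a sketch, assuming GCC or Clang on an x86 target: it relies on the compiler builtins `__builtin_cpu_init` and `__builtin_cpu_supports`, while the `HAS_*` flags and the function names are invented for this example.

```c
#include <stdio.h>

/* Hypothetical flag bits, one per extension discussed in this thread. */
enum {
    HAS_SSE   = 1 << 0,
    HAS_SSE2  = 1 << 1,
    HAS_SSE3  = 1 << 2,
    HAS_SSSE3 = 1 << 3,
    HAS_SSE41 = 1 << 4,
    HAS_SSE42 = 1 << 5,
    HAS_SSE4A = 1 << 6
};

/* Returns a bitmask of the SSE-family extensions reported by the running CPU.
   Assumes GCC or Clang on x86; the builtins do not exist on other targets. */
unsigned sse_feature_mask(void)
{
    unsigned mask = 0;
    __builtin_cpu_init();  /* harmless if the runtime has already initialised it */
    if (__builtin_cpu_supports("sse"))    mask |= HAS_SSE;
    if (__builtin_cpu_supports("sse2"))   mask |= HAS_SSE2;
    if (__builtin_cpu_supports("sse3"))   mask |= HAS_SSE3;
    if (__builtin_cpu_supports("ssse3"))  mask |= HAS_SSSE3;
    if (__builtin_cpu_supports("sse4.1")) mask |= HAS_SSE41;
    if (__builtin_cpu_supports("sse4.2")) mask |= HAS_SSE42;
    if (__builtin_cpu_supports("sse4a"))  mask |= HAS_SSE4A;
    return mask;
}

/* Prints one line per extension, "yes" or "no". */
void print_sse_features(void)
{
    static const char *names[] = {
        "SSE", "SSE2", "SSE3", "SSSE3", "SSE4.1", "SSE4.2", "SSE4a"
    };
    unsigned mask = sse_feature_mask();
    for (unsigned i = 0; i < sizeof names / sizeof names[0]; ++i)
        printf("%-7s %s\n", names[i], (mask & (1u << i)) ? "yes" : "no");
}
```

Calling `print_sse_features()` from `main` prints the list for the current machine; the SSE4a line is the one expected to read "no" on Intel processors.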
#6 | |
Senior Member
Joined: May 2008
Posts: 8003
|
Quote: