Performance of unsigned vs signed integers

radiobox 2020. 11. 16. 08:05

Is there any performance gain or loss from using unsigned integers instead of signed integers?

If so, does this also apply to short and long?

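Division by powers of 2 is faster with unsigned int, because it can be optimized into a single shift instruction. With signed int it usually requires more machine instructions, because division rounds toward zero, but shifting to the right rounds down. Example: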

int foo(int x, unsigned y)
{
    x /= 8;
    y /= 8;
    return x + y;
}

Here is the relevant x part (signed division):

movl 8(%ebp), %eax
leal 7(%eax), %edx
testl %eax, %eax
cmovs %edx, %eax
sarl $3, %eax

Here is the relevant y part (unsigned division):

movl 12(%ebp), %edx
shrl $3, %edx
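The signed sequence above can be mirrored in C to make the rounding fix-up visible. This is a sketch, not compiler source: it assumes a 32-bit int and that `>>` on a negative int is an arithmetic shift (true for gcc and clang, but implementation-defined in C), and the function names are hypothetical.

```c
/* Hypothetical helper: what `x / 8` compiles to for a signed int.
   Assumes >> on a negative int is an arithmetic shift (implementation-defined in C). */
int div8_signed(int x)
{
    if (x < 0)        /* testl %eax, %eax + cmovs: only negative values get the bias */
        x += 7;       /* leal 7(%eax), %edx: add divisor - 1 so the shift rounds toward zero */
    return x >> 3;    /* sarl $3, %eax: the arithmetic shift alone would round down */
}

/* Hypothetical helper: what `y / 8` compiles to for unsigned: one logical shift. */
unsigned div8_unsigned(unsigned y)
{
    return y >> 3;    /* shrl $3, %edx */
}
```

For example, `div8_signed(-9)` returns -1, matching `-9 / 8`, whereas a bare `-9 >> 3` gives -2 on such compilers. The unsigned version needs no such correction.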

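In C++ (and C), signed integer overflow is undefined, whereas unsigned integer overflow is defined to wrap around. In gcc, for example, you can use the -fwrapv flag to make signed overflow defined (to wrap around).

Undefined signed integer overflow allows the compiler to assume that overflow never happens, which may open up optimization opportunities. See this blog post for a discussion.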
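A small illustration of that freedom, assuming an optimizing compiler such as gcc or clang at -O2 (whether the fold actually happens is up to the compiler): because signed overflow is undefined, `x + 1 > x` may be folded to a constant for `int`, while for `unsigned` the comparison has to survive because `x + 1` wraps to 0 at `UINT_MAX`. With -fwrapv the signed version must also keep its comparison, since `x + 1` then wraps at `INT_MAX`. The function names are hypothetical.

```c
#include <limits.h>

/* The compiler may assume x + 1 never overflows and fold this to `return 1;`.
   Calling it with INT_MAX is undefined behavior (unless built with -fwrapv). */
int next_is_greater_signed(int x)
{
    return x + 1 > x;
}

/* Unsigned arithmetic wraps: for x == UINT_MAX, x + 1u == 0, so this returns 0
   and the comparison cannot be folded away. */
int next_is_greater_unsigned(unsigned x)
{
    return x + 1u > x;
}
```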

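unsigned leads to the same or better performance than signed. Some examples:

Division by a constant that is a power of 2 (see the answer from FredOverflow)
Division by a constant number (for example, my compiler implements division by 13 using 2 asm instructions for unsigned, and 6 instructions for signed)
Checking whether a number is even (I have no idea why my MS Visual Studio compiler implements it with 4 instructions for signed numbers; gcc does it with 1 instruction, just like in the unsigned case)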
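The last two examples can be sketched in C. The division spells out the kind of multiply-and-shift that compilers typically emit for an unsigned 32-bit division by 13: the constant 0x4EC4EC4F is ceil(2^34 / 13), and its rounding error is small enough that the result equals `x / 13` for every 32-bit x. The exact instruction sequence your compiler chooses may differ, and the function names are hypothetical.

```c
#include <stdint.h>

/* Unsigned: evenness is just the lowest bit. */
int is_even_unsigned(uint32_t x)
{
    return (x & 1u) == 0;
}

/* Signed: in C, x % 2 is -1 for negative odd x, so the remainder itself depends
   on the sign; comparing it with 0 is still correct for every input. */
int is_even_signed(int32_t x)
{
    return x % 2 == 0;
}

/* Unsigned division by the constant 13 without a divide instruction:
   multiply by ceil(2^34 / 13) = 0x4EC4EC4F and keep bits 34 and up. */
uint32_t div13_unsigned(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0x4EC4EC4Fu) >> 34);
}
```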

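short usually leads to the same or worse performance than int (assuming sizeof(short) < sizeof(int)). The degradation happens when you assign the result of an arithmetic operation (which is usually int, never short) to a variable of type short, which is held in a processor register (which is also of type int). All the conversions from short to int take time and are annoying.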
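The promotion described above can be seen directly: both operands of `a + b` are converted to int before the addition, so storing the sum back into a short is a narrowing conversion the compiler may have to materialize (for example as a sign extension or truncation). A minimal sketch, assuming sizeof(short) < sizeof(int); `add_shorts` is a hypothetical name.

```c
/* a and b are promoted to int, the addition happens in int, and the
   cast narrows the result back to short. */
short add_shorts(short a, short b)
{
    return (short)(a + b);
}
```

Note that `sizeof(a + b)` equals `sizeof(int)` even when `a` and `b` are both short.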

참고 : 일부 DSP에는 signed short유형에 대한 빠른 곱셈 명령이 있습니다 . 이 특정 경우 short에는보다 빠릅니다 int.

의 차이에 관해서 int와 long, 난 단지 추측 할 수있다 (나는 64 비트 아키텍처에 익숙하지 않다). 경우 물론, int및 long(32 비트 플랫폼에서) 동일한 크기를 가지고, 그 성능은 동일합니다.

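A very important addition, pointed out by several people:

What really matters for most applications is the memory footprint and the bandwidth used. You should use the smallest necessary integers (short, maybe even signed/unsigned char) for big arrays.

This gives better performance, but the gain is nonlinear (i.e. not by a factor of 2 or 4) and somewhat unpredictable: it depends on the cache size and on the relationship between computation and memory transfers in your application.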
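A sketch of the footprint argument: the two hypothetical functions below do the same work, but for a million elements the `uint8_t` array occupies about 1 MB while the `uint32_t` array occupies about 4 MB, so the first streams a quarter of the bytes through the cache. How much faster that actually runs depends on cache size and memory bandwidth, as noted above; the sketch assumes the values fit in 8 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of n small values stored as bytes: 1 byte of memory traffic per element. */
uint64_t sum_u8(const uint8_t *a, size_t n)
{
    uint64_t s = 0;             /* wide accumulator so the sum itself cannot overflow */
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* Same values stored as 32-bit integers: 4 bytes of memory traffic per element. */
uint64_t sum_u32(const uint32_t *a, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}
```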

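This depends on the exact implementation. In most cases, however, there will be no difference. If you really care, you have to try all the variants you are considering and measure their performance.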
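One way to follow that advice is a rough micro-benchmark like the sketch below. It assumes the standard `clock()` timer is precise enough for your loop length and that you build with the optimization flags you actually ship with; the function names are hypothetical, and timings from such a loop can vary considerably between runs, compilers, and CPUs.

```c
#include <time.h>

/* volatile sinks keep the optimizer from deleting the loops as dead code */
static volatile int sink_signed;
static volatile unsigned sink_unsigned;

/* Seconds spent dividing every i in [0, n) by d using signed int. */
double time_signed_div(int n, int d)
{
    clock_t start = clock();
    int acc = 0;
    for (int i = 0; i < n; ++i)
        acc ^= i / d;               /* XOR instead of + so acc cannot overflow */
    sink_signed = acc;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* The same loop using unsigned int. */
double time_unsigned_div(unsigned n, unsigned d)
{
    clock_t start = clock();
    unsigned acc = 0;
    for (unsigned i = 0; i < n; ++i)
        acc ^= i / d;
    sink_unsigned = acc;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call both with the same large n and a divisor that is only known at run time (for example, parsed from the command line), so the compiler cannot turn the division into a multiplication, and repeat each measurement several times.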

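This is highly dependent on the specific processor.

Most processors have instructions for both signed and unsigned arithmetic, so the difference between using signed and unsigned integers comes down to which instructions the compiler uses.

If either of the two is faster, that is completely processor-specific, and the difference is most likely minuscule, if it exists at all.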

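The performance difference between signed and unsigned integers is actually more general than the accepted answer suggests. Division of an unsigned integer by any constant can be made faster than division of a signed integer by a constant, regardless of whether the constant is a power of two. See http://ridiculousfish.com/blog/posts/labor-of-division-episode-iii.html

At the end of his post, he includes the following section:

A natural question is whether the same optimization could improve signed division; unfortunately it appears that it does not, for two reasons:

The increment of the dividend must become an increase in the magnitude, i.e. increment if n > 0, decrement if n < 0. This introduces an additional expense.

The penalty for an uncooperative divisor is only about half as much in signed division, leaving a smaller window for improvements.

Thus it appears that the round-down algorithm could be made to work in signed division, but will underperform the standard round-up algorithm.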
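To make the quoted discussion concrete, here is a sketch of the unsigned round-down variant for the divisor 7, the classic 32-bit case where the usual round-up magic number would need 33 bits. It assumes 32-bit operands and follows the idea as we understand it from the linked post: use m = floor(2^34 / 7) = 0x92492492 and multiply n + 1 instead of n. The post handles n = UINT32_MAX with a saturating increment; this sketch sidesteps that by doing the increment in 64 bits, and the function name is hypothetical.

```c
#include <stdint.h>

/* Round-down method for n / 7 on 32-bit unsigned values:
   m = floor(2^34 / 7) = 0x92492492 (2454267026); q = ((n + 1) * m) >> 34.
   The increment is done in 64 bits, so n == UINT32_MAX needs no special case. */
uint32_t div7_round_down(uint32_t n)
{
    uint64_t incremented = (uint64_t)n + 1u;   /* at most 2^32 */
    return (uint32_t)((incremented * 0x92492492u) >> 34);
}
```

For a signed dividend, the increment would have to move away from zero (n + 1 for positive n, n - 1 for negative n), which is the extra cost the quoted section describes.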

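Not only is division by powers of 2 faster with unsigned types; division by any other value is also faster with unsigned types. If you look at Agner Fog's instruction tables, you'll see that unsigned divisions have similar or better performance than the signed versions.

For example, with the AMD K7: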

╔═════════════╤══════════╤═════╤═════════╤═══════════════════════╗
║ Instruction │ Operands │ Ops │ Latency │ Reciprocal throughput ║
╠═════════════╪══════════╪═════╪═════════╪═══════════════════════╣
║ DIV         │ r8/m8    │ 32  │ 24      │ 23                    ║
║ DIV         │ r16/m16  │ 47  │ 24      │ 23                    ║
║ DIV         │ r32/m32  │ 79  │ 40      │ 40                    ║
║ IDIV        │ r8       │ 41  │ 17      │ 17                    ║
║ IDIV        │ r16      │ 56  │ 25      │ 25                    ║
║ IDIV        │ r32      │ 88  │ 41      │ 41                    ║
║ IDIV        │ m8       │ 42  │ 17      │ 17                    ║
║ IDIV        │ m16      │ 57  │ 25      │ 25                    ║
║ IDIV        │ m32      │ 89  │ 41      │ 41                    ║
╚═════════════╧══════════╧═════╧═════════╧═══════════════════════╝

The same applies to the Intel Pentium:

╔═════════════╤══════════╤══════════════╗
║ Instruction │ Operands │ Clock cycles ║
╠═════════════╪══════════╪══════════════╣
║ DIV         │ r8/m8    │ 17           ║
║ DIV         │ r16/m16  │ 25           ║
║ DIV         │ r32/m32  │ 41           ║
║ IDIV        │ r8/m8    │ 22           ║
║ IDIV        │ r16/m16  │ 30           ║
║ IDIV        │ r32/m32  │ 46           ║
╚═════════════╧══════════╧══════════════╝

Of course, those are quite ancient. Newer architectures with more transistors might close the gap, but the basics still apply: generally, a signed division needs more macro-ops, more logic, and more latency.

In short, don't bother before the fact. But do bother after.

If you want performance, you have to use a compiler's performance optimizations, which may work against common sense. One thing to remember is that different compilers can compile the same code differently and have different sorts of optimizations. If we're talking about g++ and maxing out its optimization level with -Ofast, or at least the -O3 flag, in my experience it can compile the long type into code with even better performance than any unsigned type, or even plain int.

This is from my own experience, and I recommend that you first write your full program and care about such things only afterwards, when you have your actual code in hand and can compile it with optimizations to try out and pick the types that actually perform best. This is also good general advice for optimizing code for performance: write it quickly first, try compiling with optimizations, and tweak things to see what works best. You should also try compiling your program with different compilers and choose the one that outputs the most performant machine code.

An optimized multi-threaded linear algebra program can easily show a >10x performance difference between a finely optimized and an unoptimized build. So this does matter.

Optimizer output contradicts intuition in plenty of cases. For example, I once had a case where the difference between a[x]+=b and a[x]=b changed the program's execution time by almost 2x. And no, a[x]=b wasn't the faster one.

Here, for example, is what NVIDIA states about programming their GPUs:

Note: As was already the recommended best practice, signed arithmetic should be preferred over unsigned arithmetic wherever possible for best throughput on SMM. The C language standard places more restrictions on overflow behavior for unsigned math, limiting compiler optimization opportunities.

IIRC, on x86 signed/unsigned shouldn't make any difference. Short/long, on the other hand, is a different story, since the amount of data that has to be moved to/from RAM is bigger for longs (other reasons may include cast operations like extending a short to long).

Signed and unsigned integers will always both operate as single-clock instructions and have the same read-write performance, but according to Dr. Andrei Alexandrescu, unsigned is preferred over signed. The reason is that you can represent twice as many non-negative values in the same number of bits, because you're not wasting the sign bit, and you will use fewer instructions checking for negative numbers, yielding performance increases from the decreased ROM. In my experience with the Kabuki VM, which features an ultra-high-performance script implementation, it is rare that you actually require a signed number when working with memory. I've spent many years doing pointer arithmetic with signed and unsigned numbers, and I've found no benefit to signed when no sign bit is needed.

Where signed may be preferred is when using bit shifting to perform multiplication and division by powers of 2, because with signed two's-complement integers you can also shift negative values. Please see more of Andrei's YouTube videos for further optimization techniques. You can also find some good info in my article about the world's fastest integer-to-string conversion algorithm.
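The point about using fewer instructions to check for negative numbers is often illustrated with a bounds check: converting a signed index to unsigned lets a single comparison reject both negative values and values that are too large, because a negative int converts to a value greater than INT_MAX (on common platforms). A sketch with hypothetical names, assuming n is non-negative:

```c
/* Two comparisons: i must be non-negative and below n. */
int in_range_signed(int i, int n)
{
    return i >= 0 && i < n;
}

/* One comparison: a negative i converts to a value greater than INT_MAX,
   which is never below a non-negative n. */
int in_range_unsigned(int i, int n)
{
    return (unsigned)i < (unsigned)n;
}
```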

Traditionally int is the native integer format of the target hardware platform. Any other integer type may incur performance penalties.

EDIT:

Things are slightly different on modern systems:

int may in fact be 32-bit on 64-bit systems for compatibility reasons. I believe this happens on Windows systems.
Modern compilers may implicitly use int when performing computations for shorter types in some cases.

Unsigned integers are advantageous in that you store and treat them as a bitstream, I mean just data without a sign, so multiplication and division become easier (faster) with bit-shift operations.
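For unsigned operands, multiplication, division, and remainder by a power of two map exactly onto a shift or a mask; for signed operands the division and remainder identities break for negative values, which is why the compiler has to add fix-up instructions. A small sketch with hypothetical function names:

```c
/* Unsigned x * 8, x / 8 and x % 8 as a shift, a shift and a mask.
   Unsigned multiplication wraps modulo 2^N either way, so x << 3 == x * 8 for all x. */
unsigned mul8(unsigned x) { return x << 3; }
unsigned div8(unsigned x) { return x >> 3; }
unsigned mod8(unsigned x) { return x & 7u; }
```

For a signed -9, by contrast, `-9 % 8` is -1 but `-9 & 7` is 7, so the mask cannot replace the remainder directly.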

Source URL: https://stackoverflow.com/questions/4712315/performance-of-unsigned-vs-signed-integers
