
Rules for using the restrict keyword in C?

radiobox 2020. 11. 9. 07:59


I'm trying to understand when and when not to use the restrict keyword in C, and in what situations it provides a tangible benefit.

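After reading "Demystifying The Restrict Keyword" (which provides some rules of thumb on usage), I get the impression that when a function is passed pointers, it has to account for the possibility that the data pointed to might overlap (be aliased) with any other arguments passed into the function. Given a function: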

void foo(int *a, int *b, int *c, int n) {
    for (int i = 0; i<n; ++i) {
        b[i] = b[i] + c[i];
        a[i] = a[i] + b[i] * c[i];
    } 
}

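The compiler has to reload c in the second expression, because b and c may point to the same location. For the same reason, it has to wait for b to be stored before it can load a. It then has to wait for a to be stored and must reload b and c at the beginning of the next loop iteration. If you call the function like this: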

int a[N];
foo(a, a, a, N);

then you can see why the compiler has to do this. Using restrict effectively tells the compiler that you will never do this, so it can drop the redundant load of c and load a before b is stored.
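As a minimal sketch of what that promise looks like in code, here is a restrict-qualified variant of the function above. The name foo_restrict is hypothetical and not part of the original question; the body is unchanged.

```c
/* Hypothetical restrict-qualified variant of foo from the question.
 * By qualifying all three parameters, the caller promises that the
 * arrays reached through a, b and c do not overlap. */
void foo_restrict(int *restrict a, int *restrict b, int *restrict c, int n)
{
    for (int i = 0; i < n; ++i) {
        b[i] = b[i] + c[i];        /* the store to b[i] cannot change c[i] ... */
        a[i] = a[i] + b[i] * c[i]; /* ... so c[i] need not be reloaded here   */
    }
}
```

With this signature, a call such as foo_restrict(a, a, a, N) breaks the promise and has undefined behavior; the compiler does not check it.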

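In a different SO post, Nils Pipenbrinck provides a working example of this scenario that demonstrates the performance benefit.

So far I've gathered that it's a good idea to use restrict on pointers you pass into functions that won't be inlined. Apparently, if the code is inlined, the compiler can figure out that the pointers don't overlap.

Now here's where things start getting fuzzy for me.

In Ulrich Drepper's paper, "What every programmer should know about memory", he states that "unless restrict is used, all pointer accesses are potential sources of aliasing," and he gives a specific code example of a submatrix matrix multiplication in which he uses restrict.

However, when I compile his example code with or without restrict, I get identical binaries in both cases. I'm using gcc version 4.2.4 (Ubuntu 4.2.4-1ubuntu4).

The thing I can't figure out in the following code is whether it needs to be rewritten to make more extensive use of restrict, or whether GCC's alias analysis is so good that it can work out that none of the arguments alias each other. For purely educational purposes, how can I make using or not using restrict matter in this code, and why?

Compiled with restrict: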

gcc -DCLS=$(getconf LEVEL1_DCACHE_LINESIZE) -DUSE_RESTRICT -Wextra -std=c99 -O3 matrixMul.c -o matrixMul

Just remove -DUSE_RESTRICT to compile without restrict.

#include <stdlib.h>
#include <stdio.h>
#include <emmintrin.h>

#ifdef USE_RESTRICT
#else
#define restrict
#endif

#define N 1000
double _res[N][N] __attribute__ ((aligned (64)));
double _mul1[N][N] __attribute__ ((aligned (64)))
    = { [0 ... (N-1)] 
    = { [0 ... (N-1)] = 1.1f }};
double _mul2[N][N] __attribute__ ((aligned (64)))
    = { [0 ... (N-1)] 
    = { [0 ... (N-1)] = 2.2f }};

#define SM (CLS / sizeof (double))

void mm(double (* restrict res)[N], double (* restrict mul1)[N], 
        double (* restrict mul2)[N]) __attribute__ ((noinline));

void mm(double (* restrict res)[N], double (* restrict mul1)[N], 
        double (* restrict mul2)[N])
{
    int i, i2, j, j2, k, k2;
    double *restrict rres; 
    double *restrict rmul1; 
    double *restrict rmul2; 

    for (i = 0; i < N; i += SM)
        for (j = 0; j < N; j += SM)
            for (k = 0; k < N; k += SM)
                for (i2 = 0, rres = &res[i][j],
                    rmul1 = &mul1[i][k]; i2 < SM;
                    ++i2, rres += N, rmul1 += N)
                    for (k2 = 0, rmul2 = &mul2[k][j];
                        k2 < SM; ++k2, rmul2 += N)
                        for (j2 = 0; j2 < SM; ++j2)
                          rres[j2] += rmul1[k2] * rmul2[j2];
}

int main (void)
{

    mm(_res, _mul1, _mul2);

    return 0;
}

Also, GCC 4.0.0-4.4 has a regression bug that causes the restrict keyword to be ignored. This bug was reported as fixed in 4.5 (I lost the bug number, though).


It is a hint to the code optimizer. Using restrict assures it that it can keep a value accessed through the pointer in a CPU register and does not have to flush an update of that value to memory so that an alias sees the update as well.

Whether or not it takes advantage of it depends heavily on implementation details of the optimizer and the CPU. Code optimizers already are heavily invested in detecting non-aliasing since it is such an important optimization. It should have no trouble detecting that in your code.
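To make the register point concrete, here is a small sketch with two hypothetical functions (neither appears in the original thread). In the unqualified version the compiler must assume that the store through a may change *val, so it has to load *val again for the second statement. In the qualified version it may keep *val in a register.

```c
/* Without restrict: a, b and val may alias, so *val has to be
 * re-read after the store through a. */
void update(int *a, int *b, int *val)
{
    *a += *val;
    *b += *val; /* sees the new value if val aliases a */
}

/* With restrict: the caller promises no overlap, so the compiler
 * is free to load *val once and reuse it. */
void update_restrict(int *restrict a, int *restrict b, int *restrict val)
{
    *a += *val;
    *b += *val;
}
```

Calling update(&x, &y, &x) is well defined and adds the already updated x to y. The same call to update_restrict would have undefined behavior, because x is modified through one restrict pointer and read through another.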


(I don't know whether using this keyword actually gives you a significant advantage. It's very easy for a programmer to err with this qualifier because there is no enforcement, so an optimizer cannot be certain that the programmer doesn't "lie".)

When you know that a pointer A is the only pointer to some region of memory, that is, it doesn't have aliases (that is, any other pointer B will necessarily be unequal to A, B != A), you can tell this fact to the optimizer by qualifying the type of A with the "restrict" keyword.

I have written about this here: http://mathdev.org/node/23 and tried to show that some restricted pointers are in fact "linear" (as mentioned in that post).
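The standard library shows the same contract: C99 declares memcpy with restrict-qualified pointer parameters, while memmove has none and therefore tolerates overlap. The sketch below uses two hypothetical helper names to show which call fits which situation; it is an illustration, not something taken from the answer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: shift the first n bytes of buf one position
 * to the right. Source and destination overlap, so memmove is the
 * right call here. */
void shift_right(char *buf, size_t n)
{
    memmove(buf + 1, buf, n);
}

/* Hypothetical helper: copy between buffers the caller promises are
 * distinct. The restrict qualifiers state that promise, and memcpy
 * relies on it. */
void copy_out(char *restrict dst, const char *restrict src, size_t n)
{
    memcpy(dst, src, n);
}
```

Nothing stops a caller from passing overlapping buffers to copy_out; as the answer says, there is no enforcement.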


It's worth noting that recent versions of clang are capable of generating code with a run-time check for aliasing and two code paths: one for cases where there is potential aliasing, and the other for cases where it is obvious that there is no chance of it.

This clearly depends on the extents of data pointed to being conspicuous to the compiler - as they would be in the example above.

I believe the prime justification is for programs making heavy use of the STL, and particularly <algorithm>, where it is either difficult or impossible to introduce the __restrict qualifier.

Of course, this all comes at the expense of code size, but it removes a great deal of potential for obscure bugs that could result from pointers declared as __restrict not being quite as non-overlapping as the developer thought.

I would be surprised if GCC hadn't also got this optimisation.
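The run-time check described in this answer can be imitated by hand. The following is only a sketch of the idea and not what clang actually emits: all function names are hypothetical, and the overlap test assumes a flat address space in which converting pointers to uintptr_t gives comparable addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* Fast path: the caller guarantees that dst and src do not overlap. */
void add_noalias(double *restrict dst, const double *restrict src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] += src[i];
}

/* Safe path: no assumption about overlap. */
void add_may_alias(double *dst, const double *src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] += src[i];
}

/* Assumption: a flat address space, so integer comparison of the
 * converted pointers reflects the layout in memory. */
int ranges_overlap(const double *p, const double *q, size_t n)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t b = (uintptr_t)q;
    uintptr_t bytes = (uintptr_t)(n * sizeof(double));
    return a < b + bytes && b < a + bytes;
}

/* Dispatch between the two code paths at run time. */
void add_arrays(double *dst, const double *src, size_t n)
{
    if (ranges_overlap(dst, src, n))
        add_may_alias(dst, src, n);
    else
        add_noalias(dst, src, n);
}
```

The cost is the duplicated loop body, which matches the code-size trade-off mentioned above.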


Maybe the optimisation done here doesn't rely on the pointers not being aliased? Unless you preload multiple mul2 elements before writing the result to res2, I don't see any aliasing problem.

In the first piece of code you show, it is quite clear what kind of aliasing problems can occur. Here it is not so clear.
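One place where aliasing could in principle matter in the innermost loop is the operand rmul1[k2], which does not depend on j2. A hedged sketch of that loop as a stand-alone function follows. The name row_update and its parameter list are hypothetical; only the loop body comes from the question. Copying the operand into a local variable expresses by hand what restrict would allow the compiler to assume.

```c
/* Hypothetical extraction of the innermost loop of mm().
 * The local m is read once; a store to rres[j2] cannot change it,
 * whether or not rres and rmul1 overlap. */
void row_update(double *rres, const double *rmul1, const double *rmul2,
                int k2, int sm)
{
    const double m = rmul1[k2]; /* hoisted, loop-invariant load */
    for (int j2 = 0; j2 < sm; ++j2)
        rres[j2] += m * rmul2[j2];
}
```

If rres really did overlap rmul1, this version would give different results from the original loop, which is exactly why a compiler cannot hoist the load on its own without alias information.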

Rereading Drepper's article, he does not specifically say that restrict solves anything. There is even this passage:

"In theory the restrict keyword introduced into the C language in the 1999 revision should solve the problem. Compilers have not caught up yet, though. The reason is mainly that too much incorrect code exists which would mislead the compiler and cause it to generate incorrect object code."

In this code, the memory-access optimisations have already been done within the algorithm. The remaining optimisation seems to be done in the vectorized code presented in the appendix. So for the code presented here, I guess there is no difference, because no optimisation relying on restrict is done. Every pointer access is a potential source of aliasing, but not every optimisation relies on aliasing.

Premature optimization being the root of all evil, use of the restrict keyword should be limited to the cases you are actively studying and optimizing, not applied wherever it could be used.


If there is a difference at all, moving mm to a separate DSO (so that gcc can no longer know everything about the calling code) will be the way to demonstrate it.


Are you running on 32 or 64-bit Ubuntu? If 32-bit, then you need to add -march=core2 -mfpmath=sse (or whatever your processor architecture is), otherwise it doesn't use SSE. Secondly, in order to enable vectorization with GCC 4.2, you need to add the -ftree-vectorize option (as of 4.3 or 4.4 this is included as default in -O3). It might also be necessary to add -ffast-math (or another option providing relaxed floating point semantics) in order to allow the compiler to reorder floating point operations.

Also, add the -ftree-vectorizer-verbose=1 option to see whether it manages to vectorize the loop or not; that's an easy way to check the effect of adding the restrict keyword.


The problem with your example code is that the compiler will just inline the call and see that there is no aliasing ever possible in your example. I suggest you remove the main() function and compile it using -c.
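A minimal sketch of such a main()-free translation unit is shown below, assuming a hypothetical file kernel.c with a hypothetical axpy kernel in place of mm. It keeps the question's -DUSE_RESTRICT switch but routes it through a separate RESTRICT macro (my substitution) rather than redefining the keyword itself.

```c
/* kernel.c: hypothetical stand-alone translation unit, no main().
 * The compiler sees no caller here, so it cannot prove from the
 * call site that the arguments never overlap. */
#include <stddef.h>

#ifdef USE_RESTRICT
#define RESTRICT restrict
#else
#define RESTRICT /* expands to nothing: plain pointers */
#endif

/* y[i] += alpha * x[i]; a simple candidate loop for vectorization. */
void axpy(double *RESTRICT y, const double *RESTRICT x, double alpha, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}
```

Compiling this file with -c, once with and once without -DUSE_RESTRICT, and comparing the generated code or the vectorizer report from the options mentioned above is one way to see whether the qualifier changes anything on a given compiler.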

Reference URL: https://stackoverflow.com/questions/2005473/rules-for-using-the-restrict-keyword-in-c
