What is the recommended way of allocating memory for a typed memory view?
The Cython documentation on typed memory views lists three ways of assigning to a typed memory view:
- from a raw C pointer,
- from a np.ndarray, and
- from a cython.view.array.
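Assume that I don't have data passed in to my Cython function from outside, but instead want to allocate memory and return it as a np.ndarray. Also assume that the size of that buffer is not a compile-time constant, i.e. I can't allocate it on the stack, but would need malloc for option 1.
The three options would therefore look something like this: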
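from libc.stdlib cimport malloc, free
import numpy as np   # runtime module, needed for np.empty below
cimport numpy as np  # compile-time declarations
from cython cimport view

np.import_array()

# Option 1: wrap a malloc'ed C pointer
def memview_malloc(int N):
    cdef int * m = <int *>malloc(N * sizeof(int))
    cdef int[::1] b = <int[:N]>m
    free(<void *>m)

# Option 2: let NumPy allocate the buffer
def memview_ndarray(int N):
    cdef int[::1] b = np.empty(N, dtype=np.int32)

# Option 3: use a Cython array
def memview_cyarray(int N):
    cdef int[::1] b = view.array(shape=(N,), itemsize=sizeof(int), format="i")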
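What surprises me is that in all three cases Cython generates quite a lot of code for the memory allocation, in particular a call to __Pyx_PyObject_to_MemoryviewSlice_dc_int. This suggests (and I might be wrong here, my insight into the inner workings of Cython is very limited) that it first creates a Python object and then "casts" it into a memory view, which seems like unnecessary overhead.
A simple benchmark doesn't reveal much difference between the three methods, with option 2 being the fastest by a thin margin.
Which of the three methods is recommended? Or is there a different, better option?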
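Follow-up question: I want to finally return the result as a np.ndarray, after having worked with that memory view in the function. Is a typed memory view the best choice, or would I rather just use the old buffer interface as below to create an ndarray in the first place?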
cdef np.ndarray[DTYPE_t, ndim=1] b = np.empty(N, dtype=np.int32)
The basic idea is that you want cpython.array.array and cpython.array.clone (not cython.array.*):
from cpython.array cimport array, clone
# This type is what you want and can be cast to things of
# the "double[:]" syntax, so no problems there
cdef array[double] armv, templatemv
templatemv = array('d')
# This is fast
armv = clone(templatemv, L, False)
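The reason such an array can be assigned to a "double[:]" memoryview is that it exposes its storage through the buffer protocol. As a rough Python-level illustration only (this is not the Cython fast path, and the names buf and view are mine), the same sharing can be observed with the built-in memoryview:

```python
from array import array

# Allocate room for 5 doubles. This version zero-fills the buffer;
# cpython.array.clone(template, L, False) in Cython skips that step.
buf = array('d', bytes(8 * 5))

# memoryview() goes through the buffer protocol, so no data is copied.
view = memoryview(buf)

# A write through the view is visible through the array itself.
view[2] = 3.5

print(buf[2], view.format, view.itemsize, view.contiguous)
```

The view reports the 'd' format and a contiguous layout, which is what a double[::1] declaration asks for.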
EDIT
It turns out that the benchmarks in that thread were rubbish. Here's my set, with my timings:
# cython: language_level=3
# cython: boundscheck=False
# cython: wraparound=False

import time
import sys

from cpython.array cimport array, clone
from cython.view cimport array as cvarray
from libc.stdlib cimport malloc, free
import numpy as numpy
cimport numpy as numpy

cdef int loops

def timefunc(name):
    def timedecorator(f):
        cdef int L, i

        print("Running", name)
        for L in [1, 10, 100, 1000, 10000, 100000, 1000000]:
            start = time.clock()
            f(L)
            end = time.clock()
            # f runs its body `loops` times, so report the time per iteration
            print(format((end-start) / loops * 1e6, "2f"), end=" ")
            sys.stdout.flush()

        print("μs")
    return timedecorator

print()
print("INITIALISATIONS")
loops = 100000

@timefunc("cpython.array buffer")
def _(int L):
    cdef int i
    cdef array[double] arr, template = array('d')

    for i in range(loops):
        arr = clone(template, L, False)

    # Prevents dead code elimination
    str(arr[0])

@timefunc("cpython.array memoryview")
def _(int L):
    cdef int i
    cdef double[::1] arr
    cdef array template = array('d')

    for i in range(loops):
        arr = clone(template, L, False)

    # Prevents dead code elimination
    str(arr[0])

@timefunc("cpython.array raw C type")
def _(int L):
    cdef int i
    cdef array arr, template = array('d')

    for i in range(loops):
        arr = clone(template, L, False)

    # Prevents dead code elimination
    str(arr[0])

@timefunc("numpy.empty_like memoryview")
def _(int L):
    cdef int i
    cdef double[::1] arr
    template = numpy.empty((L,), dtype='double')

    for i in range(loops):
        arr = numpy.empty_like(template)

    # Prevents dead code elimination
    str(arr[0])

@timefunc("malloc")
def _(int L):
    cdef int i
    cdef double* arrptr

    for i in range(loops):
        arrptr = <double*> malloc(sizeof(double) * L)
        free(arrptr)

    # Prevents dead code elimination
    # (note: this reads freed memory, which is undefined behaviour in C)
    str(arrptr[0])

@timefunc("malloc memoryview")
def _(int L):
    cdef int i
    cdef double* arrptr
    cdef double[::1] arr

    for i in range(loops):
        arrptr = <double*> malloc(sizeof(double) * L)
        arr = <double[:L]>arrptr
        free(arrptr)

    # Prevents dead code elimination
    # (note: arr still points at freed memory here, which is undefined behaviour in C)
    str(arr[0])

@timefunc("cvarray memoryview")
def _(int L):
    cdef int i
    cdef double[::1] arr

    for i in range(loops):
        arr = cvarray((L,),sizeof(double),'d')

    # Prevents dead code elimination
    str(arr[0])

print()
print("ITERATING")
loops = 1000

@timefunc("cpython.array buffer")
def _(int L):
    cdef int i
    cdef array[double] arr = clone(array('d'), L, False)
    cdef double d

    for i in range(loops):
        for i in range(L):
            d = arr[i]

    # Prevents dead-code elimination
    str(d)

@timefunc("cpython.array memoryview")
def _(int L):
    cdef int i
    cdef double[::1] arr = clone(array('d'), L, False)
    cdef double d

    for i in range(loops):
        for i in range(L):
            d = arr[i]

    # Prevents dead-code elimination
    str(d)

@timefunc("cpython.array raw C type")
def _(int L):
    cdef int i
    cdef array arr = clone(array('d'), L, False)
    cdef double d

    for i in range(loops):
        for i in range(L):
            d = arr[i]

    # Prevents dead-code elimination
    str(d)

@timefunc("numpy.empty_like memoryview")
def _(int L):
    cdef int i
    cdef double[::1] arr = numpy.empty((L,), dtype='double')
    cdef double d

    for i in range(loops):
        for i in range(L):
            d = arr[i]

    # Prevents dead-code elimination
    str(d)

@timefunc("malloc")
def _(int L):
    cdef int i
    cdef double* arrptr = <double*> malloc(sizeof(double) * L)
    cdef double d

    for i in range(loops):
        for i in range(L):
            d = arrptr[i]

    free(arrptr)

    # Prevents dead-code elimination
    str(d)

@timefunc("malloc memoryview")
def _(int L):
    cdef int i
    cdef double* arrptr = <double*> malloc(sizeof(double) * L)
    cdef double[::1] arr = <double[:L]>arrptr
    cdef double d

    for i in range(loops):
        for i in range(L):
            d = arr[i]

    free(arrptr)

    # Prevents dead-code elimination
    str(d)

@timefunc("cvarray memoryview")
def _(int L):
    cdef int i
    cdef double[::1] arr = cvarray((L,),sizeof(double),'d')
    cdef double d

    for i in range(loops):
        for i in range(L):
            d = arr[i]

    # Prevents dead-code elimination
    str(d)
Output:
INITIALISATIONS
Running cpython.array buffer
0.100040 0.097140 0.133110 0.121820 0.131630 0.108420 0.112160 μs
Running cpython.array memoryview
0.339480 0.333240 0.378790 0.445720 0.449800 0.414280 0.414060 μs
Running cpython.array raw C type
0.048270 0.049250 0.069770 0.074140 0.076300 0.060980 0.060270 μs
Running numpy.empty_like memoryview
1.006200 1.012160 1.128540 1.212350 1.250270 1.235710 1.241050 μs
Running malloc
0.021850 0.022430 0.037240 0.046260 0.039570 0.043690 0.030720 μs
Running malloc memoryview
1.640200 1.648000 1.681310 1.769610 1.755540 1.804950 1.758150 μs
Running cvarray memoryview
1.332330 1.353910 1.358160 1.481150 1.517690 1.485600 1.490790 μs
ITERATING
Running cpython.array buffer
0.010000 0.027000 0.091000 0.669000 6.314000 64.389000 635.171000 μs
Running cpython.array memoryview
0.013000 0.015000 0.058000 0.354000 3.186000 33.062000 338.300000 μs
Running cpython.array raw C type
0.014000 0.146000 0.979000 9.501000 94.160000 916.073000 9287.079000 μs
Running numpy.empty_like memoryview
0.042000 0.020000 0.057000 0.352000 3.193000 34.474000 333.089000 μs
Running malloc
0.002000 0.004000 0.064000 0.367000 3.599000 32.712000 323.858000 μs
Running malloc memoryview
0.019000 0.032000 0.070000 0.356000 3.194000 32.100000 327.929000 μs
Running cvarray memoryview
0.014000 0.026000 0.063000 0.351000 3.209000 32.013000 327.890000 μs
(The reason for the "iterating" benchmark is that some methods have surprisingly different characteristics in this respect.)
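Note that time.clock(), which the script uses, was deprecated in Python 3.3 and removed in Python 3.8, so the script needs a different timer on current interpreters. A minimal sketch of a comparable harness using time.perf_counter() follows. The name time_per_call and its signature are my own for illustration, and unlike timefunc it runs the repeat loop itself rather than leaving it to the benchmarked function.

```python
import time
from array import array


def time_per_call(func, sizes, loops):
    """Return the mean time of func(size) in microseconds, one entry per size."""
    results = []
    for size in sizes:
        start = time.perf_counter()
        for _ in range(loops):
            func(size)
        elapsed = time.perf_counter() - start
        results.append(elapsed / loops * 1e6)
    return results


def make_array(n):
    # Hypothetical workload: allocate a zero-filled array of n doubles.
    return array('d', bytes(8 * n))


if __name__ == "__main__":
    timings = time_per_call(make_array, [1, 10, 100, 1000], loops=1000)
    print(" ".join(format(t, ".6f") for t in timings), "μs")
```

Because the loop here runs in Python, its overhead is included in each figure; the numbers are therefore not comparable with the Cython timings listed in this answer.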
In order of initialisation speed:
- malloc: This is a harsh world, but it's fast. If you need to allocate a lot of things and have unhindered iteration and indexing performance, this has to be it. But normally a better bet is...
- cpython.array raw C type: Well damn, it's fast. And it's safe. Unfortunately it goes through Python to access its data fields. You can avoid that by using a wonderful trick, arr.data.as_doubles[i], which brings it up to the standard speed while removing safety! This makes it a wonderful replacement for malloc, being basically a pretty reference-counted version!
- cpython.array buffer: Coming in at only three to four times the setup time of malloc, this looks like a wonderful bet. Unfortunately it has significant overhead (albeit small compared to the boundscheck and wraparound directives). That means it only really competes against full-safety variants, but it is the fastest of those to initialise. Your choice.
- cpython.array memoryview: This is now an order of magnitude slower than malloc to initialise. That's a shame, but it iterates just as fast. This is the standard solution that I would suggest unless boundscheck or wraparound are on (in which case cpython.array buffer might be a more compelling tradeoff).
- The rest. The only one worth anything is numpy's, due to the many fun methods attached to the objects. That's it, though.
As a follow-up to Veedrac's answer: be aware that using the memoryview support of cpython.array with Python 2.7 currently appears to lead to memory leaks. This seems to be a long-standing issue, as it is mentioned on the cython-users mailing list in a post from November 2012. Running Veedrac's benchmark script with Cython version 0.22 under both Python 2.7.6 and Python 2.7.9 leads to a large memory leak when initialising a cpython.array using either a buffer or memoryview interface. No memory leaks occur when running the script with Python 3.4. I've filed a bug report on this to the Cython developers mailing list.