How to convert a byte string to an integer in Python

itbloger 2020. 6. 10. 22:37

How can I convert a byte string to an int in Python?

Say, like this: 'y\xcc\xa6\xbb'

I came up with a clever but stupid way of doing it:

sum(ord(c) << (i * 8) for i, c in enumerate('y\xcc\xa6\xbb'[::-1]))

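I know there has to be something built in or in the standard library that does this more simply...

This is different from converting a string of hex digits, for which you can use int(xxx, 16). Instead, I want to convert a string of actual byte values.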
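To make that distinction concrete, here is a small sketch. It assumes Python 3, where the question's Python 2 `str` literal would be written as a `b'...'` bytes literal; the variable names are only illustrative.

```python
# Text made of hexadecimal digits: eight characters, parsed as base 16.
hex_text = "79cca6bb"
from_hex_text = int(hex_text, 16)

# Raw byte values: four bytes, interpreted most significant byte first.
raw = b"y\xcc\xa6\xbb"
from_raw = int.from_bytes(raw, byteorder="big")

print(len(hex_text), len(raw))  # 8 4
print(from_hex_text, from_raw)  # 2043455163 2043455163
```

Both inputs describe the same number, but only the first can be handed to `int(..., 16)` directly.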

UPDATE:

I like James' answer a little better because it doesn't require importing another module, but Greg's method is faster:

>>> from timeit import Timer
>>> Timer('struct.unpack("<L", "y\xcc\xa6\xbb")[0]', 'import struct').timeit()
0.36242198944091797
>>> Timer("int('y\xcc\xa6\xbb'.encode('hex'), 16)").timeit()
1.1432669162750244

My hacky method:

>>> Timer("sum(ord(c) << (i * 8) for i, c in enumerate('y\xcc\xa6\xbb'[::-1]))").timeit()
2.8819329738616943

FURTHER UPDATE:

Someone asked in the comments what the problem is with importing another module. Well, importing a module isn't necessarily cheap. Take a look:

>>> Timer("""import struct\nstruct.unpack(">L", "y\xcc\xa6\xbb")[0]""").timeit()
0.98822188377380371

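Including the cost of importing the module negates almost all of the advantage this method has. I believe this only includes the expense of importing it once for the entire benchmark run. Look what happens when I force it to reload every time: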

>>> Timer("""reload(struct)\nstruct.unpack(">L", "y\xcc\xa6\xbb")[0]""", 'import struct').timeit()
68.474128007888794

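Needless to say, if you run this method many times per import, then this becomes proportionally less of an issue. It is also probably an I/O cost rather than a CPU cost, so it may depend on the capacity and load characteristics of the particular machine.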
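The timings above come from Python 2. A rough Python 3 sketch of the same comparison might look like the following. The `data` variable, the iteration count, and the use of `int.from_bytes` as a third candidate are assumptions of this example, and the absolute numbers will differ from machine to machine.

```python
import timeit

data = b"y\xcc\xa6\xbb"
N = 100_000  # illustrative iteration count

# The setup string runs once per timing run, so the import cost is
# excluded from the measured statement.
t_struct = timeit.timeit(
    'struct.unpack(">L", data)[0]',
    setup="import struct",
    globals={"data": data},
    number=N,
)

# Built-in method, no import needed.
t_from_bytes = timeit.timeit(
    'int.from_bytes(data, "big")',
    globals={"data": data},
    number=N,
)

# Importing inside the timed statement: after the first pass the module
# is cached in sys.modules, so later imports are only a cache lookup.
t_import_each = timeit.timeit(
    'import struct\nstruct.unpack(">L", data)[0]',
    globals={"data": data},
    number=N,
)

print(t_struct, t_from_bytes, t_import_each)
```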

You can also use the struct module to do this:

>>> struct.unpack("<L", "y\xcc\xa6\xbb")[0]
3148270713L
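The format string fixes both the byte order and the width of the value. As a sketch of what that implies, assuming Python 3 (so the input is a `bytes` literal and the result has no `L` suffix):

```python
import struct

data = b"y\xcc\xa6\xbb"

little = struct.unpack("<L", data)[0]  # least significant byte first
big = struct.unpack(">L", data)[0]     # most significant byte first
print(little, big)  # 3148270713 2043455163

# With an explicit "<" or ">" prefix, "L" is always exactly 4 bytes.
print(struct.calcsize("<L"))  # 4

# A buffer of any other length is rejected rather than padded.
try:
    struct.unpack("<L", data + b"\x00")
except struct.error as exc:
    print("unpack failed:", exc)
```

This is why struct suits fixed-width fields but not byte strings of arbitrary length.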

In Python 3.2 and later, use

>>> int.from_bytes(b'y\xcc\xa6\xbb', byteorder='big')
2043455163

or

>>> int.from_bytes(b'y\xcc\xa6\xbb', byteorder='little')
3148270713

depending on the endianness of your byte string.

This also works for byte strings of arbitrary length, and for two's-complement signed integers if you specify signed=True. See the documentation for from_bytes.
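A short sketch of both points, signed values and arbitrary length. The sample inputs are chosen only for illustration.

```python
# signed=True interprets the bytes as a two's-complement value.
neg = int.from_bytes(b"\xff\xfe", byteorder="big", signed=True)
pos = int.from_bytes(b"\xff\xfe", byteorder="big", signed=False)
print(neg, pos)  # -2 65534

# The length is not limited to 1, 2, 4 or 8 bytes.
big_value = int.from_bytes(b"A" * 15, byteorder="big")
print(big_value)  # 338822822454978555838225329091068225

# int.to_bytes is the inverse; the length must be given explicitly.
round_trip = big_value.to_bytes(15, byteorder="big")
print(round_trip)  # b'AAAAAAAAAAAAAAA'
```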

As Greg said, you can use struct if you are dealing with binary values, but if you just have a "hex number" in byte format, you might want to convert it like this:

s = 'y\xcc\xa6\xbb'
num = int(s.encode('hex'), 16)

...this is the same as:

num = struct.unpack(">L", s)[0]

...except that it works for any number of bytes.
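`str.encode('hex')` exists only in Python 2. A possible Python 3 spelling of the same idea, assuming `bytes.hex()` (Python 3.5+) or `binascii.hexlify` as the replacement, is sketched here:

```python
import binascii

s = b"y\xcc\xa6\xbb"

# bytes.hex() stands in for the Python 2 str.encode('hex') idiom.
num = int(s.hex(), 16)

# binascii.hexlify is an older spelling that yields the same digits.
num_hexlify = int(binascii.hexlify(s), 16)

print(num, num_hexlify)  # 2043455163 2043455163

# Unlike a fixed-width struct format, this accepts any non-empty length.
five_bytes = int(b"\x01\x00\x00\x00\x00".hex(), 16)
print(five_bytes)  # 4294967296
```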

I use the following functions to convert data between int, hex and bytes.

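# Python 2 only: str.encode('hex') and str.decode('hex') do not exist in Python 3.

def bytes2int(str):
    return int(str.encode('hex'), 16)

def bytes2hex(str):
    return '0x' + str.encode('hex')

def int2bytes(i):
    h = int2hex(i)
    return hex2bytes(h)

def int2hex(i):
    # hex() appends 'L' to Python 2 longs; strip it so hex2bytes can parse the result
    return hex(i).rstrip('L')

def hex2int(h):
    if len(h) > 1 and h[0:2] == '0x':
        h = h[2:]

    if len(h) % 2:
        h = "0" + h

    return int(h, 16)

def hex2bytes(h):
    if len(h) > 1 and h[0:2] == '0x':
        h = h[2:]

    # pad to an even number of digits so every byte has two hex characters
    if len(h) % 2:
        h = "0" + h

    return h.decode('hex')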

Source: http://opentechnotes.blogspot.com.au/2014/04/convert-values-to-from-integer-hex.html
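For comparison, a Python 3 sketch of the same helpers could lean on `int.from_bytes`, `int.to_bytes`, `bytes.hex` and `bytes.fromhex`. This is not the linked author's code. It assumes non-negative integers and big-endian order, and the parameter name `data` is a substitution for illustration.

```python
def bytes2int(data):
    return int.from_bytes(data, byteorder="big")

def bytes2hex(data):
    return "0x" + data.hex()

def int2bytes(i):
    # Use just enough bytes to hold the value; 0 still needs one byte.
    length = max(1, (i.bit_length() + 7) // 8)
    return i.to_bytes(length, byteorder="big")

def hex2bytes(h):
    if h.startswith("0x"):
        h = h[2:]
    if len(h) % 2:
        h = "0" + h  # bytes.fromhex needs an even number of digits
    return bytes.fromhex(h)

print(bytes2int(b"y\xcc\xa6\xbb"))  # 2043455163
print(bytes2hex(b"y\xcc\xa6\xbb"))  # 0x79cca6bb
print(int2bytes(2043455163))        # b'y\xcc\xa6\xbb'
print(hex2bytes("0xfff"))           # b'\x0f\xff'
```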

import array
integerValue = array.array("I", 'y\xcc\xa6\xbb')[0]

Warning: the above is strongly platform-specific. Both the "I" specifier and the endianness of the string->int conversion are dependent on your particular Python implementation. But if you want to convert many integers/strings at once, then the array module does it quickly.
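The platform dependence can be checked at run time. The following Python 3 sketch (which uses `frombytes` rather than the Python 2 string initializer) inspects `itemsize` and `sys.byteorder` before trusting the result; the eight-byte sample input is an assumption chosen so that it divides evenly for item sizes of 2, 4 or 8.

```python
import array
import sys

data = b"y\xcc\xa6\xbb" * 2

arr = array.array("I")
# "I" is a C unsigned int: at least 2 bytes, commonly 4, so check itemsize.
print(arr.itemsize, sys.byteorder)

# frombytes requires the input length to be a multiple of itemsize.
arr.frombytes(data[: len(data) - len(data) % arr.itemsize])

# The values are decoded in the machine's native byte order.
expected = [
    int.from_bytes(data[i:i + arr.itemsize], sys.byteorder)
    for i in range(0, len(arr) * arr.itemsize, arr.itemsize)
]
native = list(arr)
print(native == expected)  # True

# byteswap() flips every item in place to the opposite byte order.
arr.byteswap()
```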

In Python 2.x, you could use the format specifiers <B for unsigned bytes, and <b for signed bytes with struct.unpack/struct.pack.

E.g.:

x = '\xff\x10\x11'

data_ints = struct.unpack('<' + 'B'*len(x), x) # (255, 16, 17)

And:

data_bytes = struct.pack('<' + 'B'*len(data_ints), *data_ints) # '\xff\x10\x11'

That * is required!

See https://docs.python.org/2/library/struct.html#format-characters for a list of the format specifiers.
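The same format characters work in Python 3 with a `bytes` input. This sketch also uses a repeat count in the format string instead of repeating the letter, which is an alternative spelling rather than what the answer wrote.

```python
import struct

x = b"\xff\x10\x11"

# A repeat count ("3B") is equivalent to writing "BBB".
data_ints = struct.unpack("<%dB" % len(x), x)
print(data_ints)  # (255, 16, 17)

# "b" reads the same bytes as signed values.
signed_ints = struct.unpack("<%db" % len(x), x)
print(signed_ints)  # (-1, 16, 17)

# pack takes one argument per value, hence the * to unpack the tuple.
data_bytes = struct.pack("<%dB" % len(data_ints), *data_ints)
print(data_bytes == x)  # True
```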

>>> reduce(lambda s, x: s*256 + x, bytearray("y\xcc\xa6\xbb"))
2043455163

Test 1: inverse:

>>> hex(2043455163)
'0x79cca6bb'

Test 2: Number of bytes > 8:

>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAAA"))
338822822454978555838225329091068225L

Test 3: Increment by one:

>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAAB"))
338822822454978555838225329091068226L

Test 4: Append one byte, say 'A':

>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAABA"))
86738642548474510294585684247313465921L

Test 5: Divide by 256:

>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAABA"))/256
338822822454978555838225329091068226L

Result equals the result of Test 3, as expected.
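In Python 3, `reduce` lives in `functools`, iterating over `bytes` already yields integers, and `/` produces a float, so the tests need `//`. A sketch of the same fold under those assumptions (the helper name `fold_bytes` is hypothetical):

```python
from functools import reduce

def fold_bytes(data):
    # Iterating over bytes yields ints in Python 3, so no ord() is needed.
    # The initial value 0 makes an empty input return 0.
    return reduce(lambda acc, b: acc * 256 + b, data, 0)

value = fold_bytes(b"y\xcc\xa6\xbb")
print(value)  # 2043455163
print(value == int.from_bytes(b"y\xcc\xa6\xbb", "big"))  # True

# Appending a byte multiplies by 256 and adds the new byte value,
# so floor division by 256 undoes it.
print(fold_bytes(b"AB") == fold_bytes(b"A") * 256 + ord("B"))  # True
print(fold_bytes(b"AB") // 256 == fold_bytes(b"A"))  # True
```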

I was struggling to find a solution for arbitrary-length byte sequences that would work under Python 2.x. Finally I wrote this one. It's a bit hacky because it performs a string conversion, but it works.

Function for Python 2.x, arbitrary length

def signedbytes(data):
    """Convert a bytearray into an integer, considering the first bit as
    sign. The data must be big-endian."""
    negative = data[0] & 0x80 > 0

    if negative:
        inverted = bytearray(~d % 256 for d in data)
        return -signedbytes(inverted) - 1

    encoded = str(data).encode('hex')
    return int(encoded, 16)

This function has two requirements:

The input data needs to be a bytearray. You may call the function like this:
```
s = 'y\xcc\xa6\xbb'
n = signedbytes(bytearray(s))
```
The data needs to be big-endian. In case you have a little-endian value, you should reverse it first:
```
n = signedbytes(bytearray(s[::-1]))
```

Of course, this should be used only if arbitrary length is needed. Otherwise, stick with more standard ways (e.g. struct).
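The sign handling can also be done arithmetically, without the hex string round trip. This is an alternative sketch, not the answer's function; the name `signed_from_bytes` is hypothetical and the input is assumed to be non-empty and big-endian.

```python
def signed_from_bytes(data):
    """Big-endian two's-complement bytes to int, any non-zero length."""
    value = 0
    for b in bytearray(data):
        value = (value << 8) | b
    bits = 8 * len(data)
    # If the top bit is set, the bytes represent value - 2**bits.
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value

print(signed_from_bytes(b"y\xcc\xa6\xbb"))  # 2043455163
print(signed_from_bytes(b"\xff\xfe"))       # -2
print(signed_from_bytes(b"\x80"))           # -128
```

On Python 3.2 and later, `int.from_bytes(data, 'big', signed=True)` gives the same results.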

int.from_bytes is the best solution if you are at version >=3.2. The "struct.unpack" solution requires a string so it will not apply to arrays of bytes. Here is another solution:

def bytes2int( tb, order='big'):
    if order == 'big': seq=[0,1,2,3]
    elif order == 'little': seq=[3,2,1,0]
    i = 0
    for j in seq: i = (i<<8)+tb[j]
    return i

hex( bytes2int( [0x87, 0x65, 0x43, 0x21])) returns '0x87654321'.

It handles big and little endianness and is easily modifiable for 8 bytes.
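One way to drop the hard-coded four-byte index list is to iterate over the sequence itself. A minimal sketch, assuming the input is any sequence of integers in the range 0..255 (the name `bytes2int_any` is hypothetical):

```python
def bytes2int_any(tb, order="big"):
    if order not in ("big", "little"):
        raise ValueError("order must be 'big' or 'little'")
    # Always accumulate most significant byte first.
    seq = tb if order == "big" else reversed(tb)
    i = 0
    for b in seq:
        i = (i << 8) + b
    return i

print(hex(bytes2int_any([0x87, 0x65, 0x43, 0x21])))            # 0x87654321
print(hex(bytes2int_any([0x87, 0x65, 0x43, 0x21], "little")))  # 0x21436587
print(hex(bytes2int_any(bytes(range(1, 9)))))                  # 0x102030405060708
```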

As mentioned above, using the unpack function of struct is a good way. If you want to implement your own function, there is another solution:

def bytes_to_int(data):
    # big-endian: each step shifts the running total up by one byte
    result = 0
    for b in data:
        result = result * 256 + int(b)
    return result

A decently speedy method utilizing array.array I've been using for some time:

predefined variables:

from array import array

offset = 0
size = 4
big = True # endian
arr = array('B')
arr.fromstring("\x00\x00\xff\x00") # 5 bytes (encoding issues) [0, 0, 195, 191, 0]

to int: (read)

val = 0
for v in arr[offset:offset+size][::pow(-1,not big)]: val = (val<<8)|v

from int: (write)

val = 16384
# the generator yields the least significant byte first, so reverse it for big-endian
arr[offset:offset+size] = \
    array('B',((val>>(i<<3))&255 for i in range(size)))[::pow(-1,big)]

It's possible these could be faster though.
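On Python 3, the same offset-based read and write can be sketched with a `bytearray` plus `int.from_bytes` and `int.to_bytes`. This swaps in different calls from the array-based loops above, and the `byteorder` string replaces the `big` flag for this example.

```python
buf = bytearray(b"\x00\x00\xff\x00")
offset, size = 0, 4
byteorder = "big"

# read: slice the window and decode it
val = int.from_bytes(buf[offset:offset + size], byteorder)
print(val)  # 65280

# write: encode into exactly `size` bytes and assign back to the slice
buf[offset:offset + size] = (16384).to_bytes(size, byteorder)
print(buf)  # bytearray(b'\x00\x00@\x00')
```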

EDIT:
For some numbers, here's a performance test (Anaconda 2.3.0) showing stable averages on read in comparison to reduce():

========================= byte array to int.py =========================
5000 iterations; threshold of min + 5000ns:

code                                     | min        | max           | avg          | efficiency
val = 0 \nfor v in arr: val = (val<<8)|v | 5373.848ns | 850009.965ns  | ~8649.64ns   | 62.128%
val = reduce( shift, arr )               | 6489.921ns | 5094212.014ns | ~12040.269ns | 53.902%

This is a raw performance test, so the endian pow-flip is left out.
The shift function shown applies the same shift-oring operation as the for loop, and arr is just array.array('B',[0,0,255,0]) as it has the fastest iterative performance next to dict.

I should probably also note efficiency is measured by accuracy to the average time.

In Python 3 you can easily convert a byte string into a list of integers (0..255) by

>>> list(b'y\xcc\xa6\xbb')
[121, 204, 166, 187]
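From that list, the integer is one step away. A small sketch of how the per-byte values relate to the original bytes and to the final number, with illustrative variable names:

```python
data = b"y\xcc\xa6\xbb"

first = data[0]     # indexing yields an int
head = data[0:1]    # slicing yields bytes
print(first, head)  # 121 b'y'

values = list(data)
print(bytes(values) == data)  # True; bytes() accepts ints in range(256)

# Combine the ints positionally to get the big-endian integer.
total = sum(v << (8 * i) for i, v in enumerate(reversed(values)))
print(total)  # 2043455163
```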

Reference URL: https://stackoverflow.com/questions/444591/how-to-convert-a-string-of-bytes-into-an-int-in-python
