Remove characters except digits from string using Python?

itbloger 2020. 7. 13. 21:39

How can I remove all characters except digits from a string?

In Python 2.*, by far the fastest approach is the .translate method:

>>> x='aaa12333bb445bb54b5b52'
>>> import string
>>> all=string.maketrans('','')
>>> nodigs=all.translate(all, string.digits)
>>> x.translate(all, nodigs)
'1233344554552'
>>>

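string.maketrans makes a translation table (a string of length 256), which in this case is the same as ''.join(chr(x) for x in range(256)) (just faster to make ;-). .translate applies the translation table (which here is irrelevant, since all essentially means identity) AND deletes the characters present in the second argument, which is the key part.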

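.translate works very differently on Unicode strings (and on strings in Python 3; I do wish questions specified which major release of Python is of interest!): not quite this simple, not quite this fast, though still quite usable.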
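For readers on Python 3, the two-argument form described above still exists on bytes objects. The following is a minimal sketch, not part of the original answer; the helper name keep_ascii_digits is hypothetical, and it assumes the text can be encoded to UTF-8 and that only ASCII digits are wanted.

```python
import string


def keep_ascii_digits(data: bytes) -> bytes:
    """Drop every byte that is not an ASCII digit (hypothetical helper)."""
    digits = string.digits.encode("ascii")
    # All 256 possible byte values except the ten ASCII digits.
    non_digits = bytes(b for b in range(256) if b not in digits)
    # A table of None means "no remapping"; the second argument lists
    # the bytes to delete.
    return data.translate(None, non_digits)


x = "aaa12333bb445bb54b5b52"
result = keep_ascii_digits(x.encode("utf-8")).decode("ascii")
print(result)  # 1233344554552
```

Every byte of a multi-byte UTF-8 sequence is 0x80 or above, so non-ASCII characters are removed as a whole and the result always decodes cleanly as ASCII.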

Back to 2.*, the performance difference is impressive...:

$ python -mtimeit -s'import string; all=string.maketrans("", ""); nodig=all.translate(all, string.digits); x="aaa12333bb445bb54b5b52"' 'x.translate(all, nodig)'
1000000 loops, best of 3: 1.04 usec per loop
$ python -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 7.9 usec per loop

Speeding things up by 7-8 times is hardly peanuts, so the translate method is well worth knowing and using. The other popular non-RE approach...:

$ python -mtimeit -s'x="aaa12333bb445bb54b5b52"' '"".join(i for i in x if i.isdigit())'
100000 loops, best of 3: 11.5 usec per loop

is 50% slower than the RE, so the .translate approach beats it by over an order of magnitude.
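The command-line timings above can also be reproduced from a script. This is a rough sketch using the standard timeit module on Python 3; the candidate names and the iteration count are arbitrary choices, and the numbers it prints will vary by machine and Python version.

```python
import re
import timeit

x = "aaa12333bb445bb54b5b52"
non_digit = re.compile(r"\D")

# Each candidate is a zero-argument callable that returns the digits of x.
candidates = {
    "re.sub": lambda: non_digit.sub("", x),
    "join + isdigit": lambda: "".join(c for c in x if c.isdigit()),
    "filter + isdigit": lambda: "".join(filter(str.isdigit, x)),
}

# Check that every approach agrees before timing anything.
results = {name: fn() for name, fn in candidates.items()}
assert len(set(results.values())) == 1

runs = 20_000
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=runs)
    print(f"{name:18s} {seconds / runs * 1e6:.2f} usec per call")
```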

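In Python 3, or for Unicode, you need to pass .translate a mapping (with ordinals, not characters directly, as keys) that returns None for what you want to delete. Here's a convenient way to express this for deletion of "everything but" a few characters: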

import string

class Del:
  def __init__(self, keep=string.digits):
    self.comp = dict((ord(c),c) for c in keep)
  def __getitem__(self, k):
    return self.comp.get(k)

DD = Del()

x='aaa12333bb445bb54b5b52'
x.translate(DD)

also emits '1233344554552'. However, putting this in xx.py we have...:

$ python3.1 -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 8.43 usec per loop
$ python3.1 -mtimeit -s'import xx; x="aaa12333bb445bb54b5b52"' 'x.translate(xx.DD)'
10000 loops, best of 3: 24.3 usec per loop

...which shows that the performance advantage disappears for this kind of "deletion" task and becomes a performance decrease.

Use re.sub, like so:

>>> import re
>>> re.sub(r"\D", "", "aas30dsa20")
'3020'

\D matches any non-digit character, so the code above essentially replaces every non-digit character with the empty string.
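One detail worth knowing on Python 3: for str patterns, \d and \D are Unicode-aware, so digits from other scripts count as digits and are kept. The sketch below is an illustration, not part of the original answer; it shows the re.ASCII flag restricting the match to 0-9.

```python
import re

# The string ends with three Arabic-Indic digits (U+0661..U+0663).
s = "aas30dsa20 \u0661\u0662\u0663"

# Default behaviour for str patterns: any Unicode decimal digit survives.
unicode_aware = re.sub(r"\D", "", s)

# With re.ASCII, only the characters 0-9 count as digits.
ascii_only = re.sub(r"\D", "", s, flags=re.ASCII)

print(ascii_only)  # 3020
```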

Or you can use filter, like so (in Python 2):

>>> filter(lambda x: x.isdigit(), "aas30dsa20")
'3020'

Since in Python 3 filter returns an iterator instead of a string, you can use the following instead:

>>> ''.join(filter(lambda x: x.isdigit(), "aas30dsa20"))
'3020'

s=''.join(i for i in s if i.isdigit())

Another generator variant.

You can use filter:

filter(lambda x: x.isdigit(), "dasdasd2313dsa")

On Python 3 you have to join the result (kinda ugly :( )

''.join(filter(lambda x: x.isdigit(), "dasdasd2313dsa"))

along the lines of bayer's answer:

''.join(i for i in s if i.isdigit())

You can easily do it using Regex

>>> import re
>>> re.sub(r"\D", "", "£70,000")
'70000'

x.translate(None, string.digits)

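will delete all digits from the string. This form, with None as the table, works on Python 2 only. To delete the letters and keep the digits (along with any other non-letter characters), do this: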

x.translate(None, string.letters)
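On Python 3, a comparable effect can be had with the three-argument form of str.maketrans, whose third argument lists the characters to delete. This is a sketch of that substitute technique rather than the answer's own code; string.ascii_letters stands in for Python 2's string.letters, and the table names are made up for the example.

```python
import string

x = "aaa12333bb445bb54b5b52"

# The first two arguments (character remapping) are left empty;
# the third lists the characters to delete.
drop_digits = str.maketrans("", "", string.digits)
drop_letters = str.maketrans("", "", string.ascii_letters)

without_digits = x.translate(drop_digits)
without_letters = x.translate(drop_letters)

print(without_digits)   # aaabbbbbb
print(without_letters)  # 1233344554552
```

As with the Python 2 version, deleting the letters leaves punctuation and whitespace in place, so this only yields "digits only" when the input contains nothing but letters and digits.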

The op mentions in the comments that he wants to keep the decimal place. This can be done with the re.sub method (as per the second and IMHO best answer) by explicitly listing the characters to keep e.g.

>>> re.sub(r"[^0123456789.]", "", "poo123.4and5fish")
'123.45'
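Note that stripping joins separate fragments together: '123.4' and '5' become '123.45'. If the goal is the first number as it appears in the text, searching may fit better than deleting. The sketch below contrasts the two; the pattern is an assumption about what counts as a number (no signs, exponents, or thousands separators).

```python
import re

s = "poo123.4and5fish"

# Deleting everything except digits and dots merges the fragments.
stripped = re.sub(r"[^0-9.]", "", s)

# Searching picks up the first run of digits with an optional fraction.
match = re.search(r"\d+(?:\.\d+)?", s)
first_number = float(match.group()) if match else None

print(stripped)      # 123.45
print(first_number)  # 123.4
```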

A fast version for Python 3:

# xx3.py
from collections import defaultdict
import string
_NoneType = type(None)

def keeper(keep):
    table = defaultdict(_NoneType)
    table.update({ord(c): c for c in keep})
    return table

digit_keeper = keeper(string.digits)
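A short usage sketch for the table above, repeated here so that it runs on its own. One side effect to be aware of: a defaultdict stores every missing key it is asked for, so the table grows by one entry per distinct non-kept character it encounters.

```python
from collections import defaultdict
import string


def keeper(keep):
    # Missing ordinals default to None, which tells str.translate to delete.
    table = defaultdict(type(None))
    table.update({ord(c): c for c in keep})
    return table


digit_keeper = keeper(string.digits)
size_before = len(digit_keeper)  # the ten digits

result = "aaa12333bb445bb54b5b52".translate(digit_keeper)
print(result)  # 1233344554552

# 'a' and 'b' were looked up during translate and are now cached as None.
size_after = len(digit_keeper)
```

For long-running programs that process arbitrary Unicode text, that growth may be worth keeping in mind.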

Here's a performance comparison vs. regex:

$ python3.3 -mtimeit -s'import xx3; x="aaa12333bb445bb54b5b52"' 'x.translate(xx3.digit_keeper)'
1000000 loops, best of 3: 1.02 usec per loop
$ python3.3 -mtimeit -s'import re; r = re.compile(r"\D"); x="aaa12333bb445bb54b5b52"' 'r.sub("", x)'
100000 loops, best of 3: 3.43 usec per loop

So it's a little bit more than 3 times faster than regex, for me. It's also faster than class Del above, because defaultdict does all its lookups in C, rather than (slow) Python. Here's that version on my same system, for comparison.

$ python3.3 -mtimeit -s'import xx; x="aaa12333bb445bb54b5b52"' 'x.translate(xx.DD)'
100000 loops, best of 3: 13.6 usec per loop

Ugly but works:

>>> s
'aaa12333bb445bb54b5b52'
>>> a = ''.join(filter(lambda x : x.isdigit(), s))
>>> a
'1233344554552'
>>>

Use a generator expression:

>>> s = "foo200bar"
>>> new_s = "".join(i for i in s if i in "0123456789")

$ python -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 2.48 usec per loop
$ python -mtimeit -s'import re; x="aaa12333bab445bb54b5b52"' '"".join(re.findall("[a-z]+",x))'
100000 loops, best of 3: 2.02 usec per loop

$ python -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 2.37 usec per loop
$ python -mtimeit -s'import re; x="aaa12333bab445bb54b5b52"' '"".join(re.findall("[a-z]+",x))'
100000 loops, best of 3: 1.97 usec per loop

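I had observed that join with findall is faster than sub. Note that the "[a-z]+" pattern in the timings above collects the letters; to keep the digits with this approach, the pattern would be r"\d+".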
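For reference, here is a small sketch of the findall approach applied to digits, next to the sub version and to the letters-only pattern used in the timings. It only checks that the results match and makes no claim about relative speed.

```python
import re

x = "aaa12333bab445bb54b5b52"

# Delete every non-digit character.
via_sub = re.sub(r"\D", "", x)

# Collect each run of digits and join the runs.
via_findall = "".join(re.findall(r"\d+", x))

# The pattern from the timings collects lowercase letters instead.
letters = "".join(re.findall("[a-z]+", x))

print(via_sub)      # 1233344554552
print(via_findall)  # 1233344554552
print(letters)      # aaababbbbb
```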

You can read each character. If it is a digit, then include it in the answer. The str.isdigit() method tells you whether a character is a digit.

your_input = '12kjkh2nnk34l34'
your_output = ''.join(c for c in your_input if c.isdigit())
print(your_output) # '1223434'
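A caveat on str.isdigit() in Python 3: it is also true for some characters that are not 0-9, such as superscript digits. The sketch below is an illustration under the assumption that only ASCII digits are wanted, and shows a stricter membership test against string.digits.

```python
import string

# Contains SUPERSCRIPT TWO (U+00B2), for which str.isdigit() is True.
s = "x\u00b2 = 12kjkh2"

# Keeps the superscript as well as the ASCII digits.
loose = "".join(c for c in s if c.isdigit())

# Keeps only the characters 0-9.
strict = "".join(c for c in s if c in string.digits)

print(strict)  # 122
```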

Not a one liner but very simple:

buffer = ""
some_str = "aas30dsa20"

for char in some_str:
    if char.isdigit():  # keep the digits, skip everything else
        buffer += char

print( buffer )

I used this. 'letters' should contain all the letters that you want to get rid of:

Output = Input.translate({ord(i): None for i in 'letters'})

Example:

Input = "I would like 20 dollars for that suit"
Output = Input.translate({ord(i): None for i in 'abcdefghijklmnopqrstuvwxzy'})
print(Output)

Output: "I   20    " (the uppercase I and the spaces remain, because they are not in the list of characters to remove)

Reference URL: https://stackoverflow.com/questions/1450897/remove-characters-except-digits-from-string-using-python
