Python regex - r prefix
Could anyone explain why example 1 below works when the r prefix is not used? I thought the r prefix had to be used whenever escape sequences appear. Example 2 and example 3 demonstrate this.
# example 1
import re
print (re.sub('\s+', ' ', 'hello there there'))
# prints 'hello there there' - not expected as r prefix is not used
# example 2
import re
print (re.sub(r'(\b\w+)(\s+\1\b)+', r'\1', 'hello there there'))
# prints 'hello there' - as expected as r prefix is used
# example 3
import re
print (re.sub('(\b\w+)(\s+\1\b)+', '\1', 'hello there there'))
# prints 'hello there there' - as expected as r prefix is not used
Because \ begins an escape sequence only when the character after it forms a valid escape sequence.
>>> '\n'
'\n'
>>> r'\n'
'\\n'
>>> print('\n')


>>> print(r'\n')
\n
>>> '\s'
'\\s'
>>> r'\s'
'\\s'
>>> print('\s')
\s
>>> print(r'\s')
\s
Unless an 'r' or 'R' prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C. The recognized escape sequences are:
Escape Sequence   Meaning
\newline          Ignored
\\                Backslash (\)
\'                Single quote (')
\"                Double quote (")
\a                ASCII Bell (BEL)
\b                ASCII Backspace (BS)
\f                ASCII Formfeed (FF)
\n                ASCII Linefeed (LF)
\N{name}          Character named name in the Unicode database (Unicode only)
\r                ASCII Carriage Return (CR)
\t                ASCII Horizontal Tab (TAB)
\uxxxx            Character with 16-bit hex value xxxx (Unicode only)
\Uxxxxxxxx        Character with 32-bit hex value xxxxxxxx (Unicode only)
\v                ASCII Vertical Tab (VT)
\ooo              Character with octal value ooo
\xhh              Character with hex value hh
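A quick way to check a few rows of this table in Python 3 is sketched below. The dictionary and its keys are made up for illustration; each value is an ordinary, non-raw string literal.

```python
# Each recognized escape collapses to a single character.
escapes = {
    'backslash-b': '\b',  # ASCII Backspace
    'backslash-t': '\t',  # ASCII Horizontal Tab
    'backslash-n': '\n',  # ASCII Linefeed
    'hex': '\x41',        # character with hex value 41, i.e. 'A'
    'octal': '\101',      # character with octal value 101, also 'A'
}

for name, value in escapes.items():
    print(name, len(value), hex(ord(value)))
```

Every entry has length 1, which is the key difference from an unrecognized sequence such as \s, where the backslash is kept.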
Never rely on raw strings for path literals: raw strings have some rather peculiar inner workings that are known to have bitten people:
When an "r" or "R" prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string. For example, the string literal r"\n" consists of two characters: a backslash and a lowercase "n". String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the string, not as a line continuation.
To better illustrate this last point:
>>> r'\'
SyntaxError: EOL while scanning string literal
>>> r'\''
"\\'"
>>> '\'
SyntaxError: EOL while scanning string literal
>>> '\''
"'"
>>>
>>> r'\\'
'\\\\'
>>> '\\'
'\\'
>>> print(r'\\')
\\
>>> print(r'\')
SyntaxError: EOL while scanning string literal
>>> print('\\')
\
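For path literals, two common workarounds are sketched below. The directory and file names are hypothetical, and PureWindowsPath is used only so the result is the same on any operating system.

```python
from pathlib import PureWindowsPath

# A raw string cannot end in a single backslash,
# so the trailing separator is appended as a normal literal.
folder = r'C:\temp\new' + '\\'
print(folder)  # C:\temp\new\

# Alternatively, let pathlib insert the separators.
path = PureWindowsPath('C:/temp') / 'new' / 'file.txt'
print(path)  # C:\temp\new\file.txt
```

Without the r prefix, 'C:\temp\new' would contain a tab and a newline, since \t and \n are both recognized escapes.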
The 'r' means that what follows is a "raw string", i.e. backslash characters are treated literally instead of signifying special treatment of the following character.
http://docs.python.org/reference/lexical_analysis.html#literals
So '\n' is a single newline, and r'\n' is two characters: a backslash and the letter 'n'. Another way to write it would be '\\n', because the first backslash escapes the second.
An equivalent way of writing this
print (re.sub(r'(\b\w+)(\s+\1\b)+', r'\1', 'hello there there'))
is
print (re.sub('(\\b\\w+)(\\s+\\1\\b)+', '\\1', 'hello there there'))
Because of the way Python treats characters that are not valid escape characters, not all of those double backslashes are necessary, e.g. '\s' == '\\s'. However, the same is not true for '\b' and '\\b'. My preference is to be explicit and double all the backslashes.
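The equivalence of the raw and the doubled-backslash spellings can be checked directly. This is a minimal sketch using the same pattern and input as the question; the variable names are illustrative.

```python
import re

text = 'hello there there'

# Raw-string spelling of the pattern and the replacement.
raw = re.sub(r'(\b\w+)(\s+\1\b)+', r'\1', text)

# The same pattern and replacement with every backslash doubled.
doubled = re.sub('(\\b\\w+)(\\s+\\1\\b)+', '\\1', text)

print(raw)             # hello there
print(raw == doubled)  # True
print('\b' == '\\b')   # False
```

The last line shows why the doubling cannot be skipped for \b: a single backslash there produces a backspace character rather than a backslash followed by 'b'.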
Not all sequences involving backslashes are escape sequences. \t and \f are, for example, but \s is not. In a non-raw string literal, any \ that is not part of an escape sequence is seen as just another \:
>>> "\s"
'\\s'
>>> "\t"
'\t'
\b is an escape sequence, however, so example 3 fails. (And yes, some people consider this behaviour rather unfortunate.)
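To see why example 3 leaves the text untouched, it helps to spell out what its un-prefixed literals contain. The sketch below writes those characters explicitly with \x escapes; this spelling is an illustration of what Python stores, not how the question wrote it.

```python
import re

text = 'hello there there'

print('\b' == '\x08')  # True: \b is a backspace character
print('\1' == '\x01')  # True: \1 is an octal escape, not a group reference

# What the un-prefixed pattern from example 3 actually contains:
# backspace characters where \b was meant, and chr(1) where \1 was meant.
pattern = '(\x08\\w+)(\\s+\x01\x08)+'
result = re.sub(pattern, '\x01', text)

print(result == text)  # True: nothing matched, so nothing was replaced
```

The regex engine never sees a word boundary or a backreference, only literal control characters that do not occur in the input.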
Try this:
a = '\''
print(a)  # '
a = r'\''
print(a)  # \'
a = "\'"
print(a)  # '
a = r"\'"
print(a)  # \'
Check the example below:
print(r"123\n123")
# outputs:
123\n123
print("123\n123")
# outputs:
123
123
Reference URL: https://stackoverflow.com/questions/2241600/python-regex-r-prefix