Case-insensitive string comparison in C

itbloger 2020. 11. 28. 08:48

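I have two postcodes stored as char* that I want to compare while ignoring case. Is there a function that does this?

Or do I have to loop through each string, apply the tolower function to every character, and then do the comparison?

Any idea how such a function reacts to digits in the string?

Thanks

The C standard has no function that does this. Unix systems that comply with POSIX are required to have strcasecmp in the header strings.h. Microsoft systems have stricmp. To stay portable, write your own: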

int strcicmp(char const *a, char const *b)
{
    for (;; a++, b++) {
        int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (d != 0 || !*a)
            return d;
    }
}

Note, however, that none of these solutions work with UTF-8 strings; they only work with ASCII strings.
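To see how the portable version treats the digits and spaces in a postcode, here is a minimal sketch. It repeats strcicmp from above so that the block is self-contained; the wrapper name same_postcode and the sample postcodes are made up for illustration. tolower() returns every character that is not an uppercase letter unchanged, so digits compare exactly as they would under strcmp().

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Portable case-insensitive compare, as in the answer above. */
int strcicmp(char const *a, char const *b)
{
    assert(a != NULL && b != NULL);
    for (;; a++, b++) {
        int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (d != 0 || !*a)
            return d;
    }
}

/* Hypothetical helper: true when two postcodes match, ignoring case. */
bool same_postcode(char const *a, char const *b)
{
    return strcicmp(a, b) == 0;
}
```

With these definitions, same_postcode("sw1a 1aa", "SW1A 1AA") is true, while same_postcode("SW1A 1AA", "SW1A 2AA") is false because the digits differ.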

Take a look at strcasecmp() in strings.h.

I would use stricmp(). It compares two strings without regard to case.

Note that in some cases converting the strings to lowercase first can be faster.
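Converting pays off mainly when the same string is compared many times, because the case folding is then done once instead of on every comparison. A minimal sketch, assuming the buffer is writable and the text is ASCII; the name str_to_lower is hypothetical:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Lower-cases s in place and returns it. s must point to writable
   memory (an array, not a string literal). */
char *str_to_lower(char *s)
{
    assert(s != NULL);
    for (char *p = s; *p != '\0'; p++)
        *p = (char)tolower((unsigned char)*p);
    return s;
}
```

After `char key[] = "AbC123"; str_to_lower(key);` the buffer holds "abc123" and can be passed to plain strcmp() as often as needed.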

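I found a built-in function for this in strings.h, a header that contains string functions in addition to those in the standard header.

Here are the relevant signatures: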

int  strcasecmp(const char *, const char *);
int  strncasecmp(const char *, const char *, size_t);

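I also found its equivalent in the xnu kernel (osfmk/device/subrs.c). It is implemented with the following code, so you should not expect digits to behave any differently than they do with the original strcmp function.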

int tolower(unsigned char ch) {
    if (ch >= 'A' && ch <= 'Z')
        ch = 'a' + (ch - 'A');
    return ch;
 }

int strcasecmp(const char *s1, const char *s2) {
    const unsigned char *us1 = (const u_char *)s1,
                        *us2 = (const u_char *)s2;

    while (tolower(*us1) == tolower(*us2++))
        if (*us1++ == '\0')
            return (0);
    return (tolower(*us1) - tolower(*--us2));
}

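Additional pitfalls to watch out for when doing case-insensitive compares:

Comparing as lowercase or as uppercase? (common enough issue)

Both functions below return 0 for strcicmpL("A", "a") and strcicmpU("A", "a").
Yet strcicmpL("A", "_") and strcicmpU("A", "_") can return results with different signs, because '_' often lies between the uppercase and lowercase letters.

This affects the sort order when the function is used with qsort(..., ..., ..., strcicmp). Commonly available non-standard library functions such as stricmp() or strcasecmp() tend to be well defined and favor comparing via lowercase. Yet variations exist.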

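int strcicmpL(char const *a, char const *b) {
  while (*a) {
    int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
    if (d) {
        return d;
    }
    a++;
    b++;
  }
  // a is exhausted: 0 if b ended too, negative if b is longer
  return -tolower((unsigned char)*b);
}

int strcicmpU(char const *a, char const *b) {
  while (*a) {
    int d = toupper((unsigned char)*a) - toupper((unsigned char)*b);
    if (d) {
        return d;
    }
    a++;
    b++;
  }
  // a is exhausted: 0 if b ended too, negative if b is longer
  return -toupper((unsigned char)*b);
}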
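The sign difference can be observed directly, and it matters as soon as the comparison is used for sorting. The sketch below is an illustration under stated assumptions: fold_lower_cmp, fold_upper_cmp and by_lower are hypothetical names, and the ordering described applies to ASCII, where '_' (95) lies between 'Z' (90) and 'a' (97). Note that qsort() passes pointers to the array elements, so an array of strings needs a small wrapper rather than the string comparison function itself.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Compare after folding both sides to lowercase. */
int fold_lower_cmp(char const *a, char const *b)
{
    assert(a != NULL && b != NULL);
    for (;; a++, b++) {
        int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (d != 0 || !*a)
            return d;
    }
}

/* Compare after folding both sides to uppercase. */
int fold_upper_cmp(char const *a, char const *b)
{
    assert(a != NULL && b != NULL);
    for (;; a++, b++) {
        int d = toupper((unsigned char)*a) - toupper((unsigned char)*b);
        if (d != 0 || !*a)
            return d;
    }
}

/* qsort() comparator for an array of char const *: each argument
   points to an array element, i.e. to a pointer to the string. */
int by_lower(void const *pa, void const *pb)
{
    char const *a = *(char const *const *)pa;
    char const *b = *(char const *const *)pb;
    return fold_lower_cmp(a, b);
}
```

With ASCII, fold_lower_cmp("A", "_") is positive ('a' is 97, '_' is 95), while fold_upper_cmp("A", "_") is negative ('A' is 65). Sorting the array {"b", "_", "A"} with qsort(v, 3, sizeof v[0], by_lower) yields "_", "A", "b".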

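char can have a negative value. (not rare)

toupper(int) and tolower(int) are specified for unsigned char values and the negative EOF. Further, strcmp() returns results as if each char were converted to unsigned char, regardless of whether char is signed or unsigned.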

tolower(*a); // Potential UB
tolower((unsigned char) *a); // Correct
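One way to keep the cast from being forgotten is to hide it in a tiny wrapper. A minimal sketch; lower_char is a hypothetical name, not a library function:

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>

/* Lower-cases one char safely: the cast guarantees that tolower()
   receives a value in the range 0..UCHAR_MAX even when plain char
   is a signed type. */
int lower_char(char c)
{
    int r = tolower((unsigned char)c);
    assert(r >= 0 && r <= UCHAR_MAX);
    return r;
}
```

lower_char('Q') returns 'q' and lower_char('7') returns '7'. For a byte such as '\xE4' the result depends on the locale, but the call itself is well defined.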

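Locale (less common)

Although character sets using ASCII codes (0-127) are ubiquitous, the remaining codes tend to have locale-specific issues. So strcasecmp("\xE4", "a") might return 0 on one system and non-zero on another.

Unicode (the way of the future)

If a solution needs to handle more than ASCII, consider a unicode_strcicmp(). As the C library does not provide such a function, a pre-coded function from some alternate library is recommended. Writing your own unicode_strcicmp() is a daunting task.

Do all letters map one lowercase to one uppercase? (pedantic)

[A-Z] maps one-to-one with [a-z], yet various locales map several lowercase characters to one uppercase character and vice versa. Further, some uppercase characters may lack a lowercase equivalent and, again, vice versa.

This obliges code to convert through both toupper() and tolower().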

int d = tolower(toupper(*a)) - tolower(toupper(*b));

Again, sorting can give different results depending on whether the code uses tolower(toupper(*a)) or toupper(tolower(*a)).
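Put into a full function, the round trip might look like the following sketch. The name strcicmp_fold is hypothetical. For plain ASCII it behaves like a lowercase comparison; the extra toupper() step only makes a difference in locales whose case mappings are not one-to-one.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive compare that folds through upper and then lower
   case, so characters sharing an uppercase form compare equal. */
int strcicmp_fold(char const *a, char const *b)
{
    assert(a != NULL && b != NULL);
    for (;; a++, b++) {
        int ca = tolower(toupper((unsigned char)*a));
        int cb = tolower(toupper((unsigned char)*b));
        if (ca != cb || *a == '\0')
            return ca - cb;
    }
}
```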

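Portability

@B. Nadolson recommends against rolling your own strcicmp(), which is reasonable except when code needs functionality that behaves the same on every platform.

Below is an approach that even performed faster than some system-provided functions. It does a single compare per loop iteration rather than two by using two tables that differ only in their entry for '\0'. Your results may vary.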

static unsigned char low1[UCHAR_MAX + 1] = {
  0, 1, 2, 3, ...
  '@', 'a', 'b', 'c', ... 'z', '[', ...  // @ABC... Z[...
  '`', 'a', 'b', 'c', ... 'z', '{', ...  // `abc... z{...
};
static unsigned char low2[UCHAR_MAX + 1] = {
// v--- Not zero, but A which matches none in `low1[]`
  'A', 1, 2, 3, ...
  '@', 'a', 'b', 'c', ... 'z', '[', ...
  '`', 'a', 'b', 'c', ... 'z', '{', ...
};

int strcicmp_ch(char const *a, char const *b) {
  // compare using tables that differ slightly.
  while (low1[(unsigned char)*a] == low2[(unsigned char)*b]) {
    a++;
    b++;
  }
  // Either strings differ or null character detected.
  // Perform subtraction using same table.
  return (low1[(unsigned char)*a] - low1[(unsigned char)*b]);
}
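The table contents above are elided, so, purely as an illustration, here is a sketch that fills both tables at startup with tolower() instead of spelling out every entry. The names init_tables and strcicmp_tab are hypothetical, init_tables() must be called once before any comparison, and the sketch assumes the "C" locale, in which no character lower-cases to 'A' or to 0.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>

static unsigned char low1[UCHAR_MAX + 1];
static unsigned char low2[UCHAR_MAX + 1];

/* Fill both tables with the lowercase mapping; they differ only at
   index 0, where low2 holds a value that never occurs in low1. */
void init_tables(void)
{
    for (int c = 0; c <= UCHAR_MAX; c++) {
        low1[c] = (unsigned char)tolower(c);
        low2[c] = low1[c];
    }
    low2[0] = 'A';
    assert(low1['A'] == 'a');
}

int strcicmp_tab(char const *a, char const *b)
{
    /* One comparison per iteration: the loop stops on a mismatch or
       when either string reaches its null character. */
    while (low1[(unsigned char)*a] == low2[(unsigned char)*b]) {
        a++;
        b++;
    }
    /* Compute the result from a single table. */
    return low1[(unsigned char)*a] - low1[(unsigned char)*b];
}
```

The loop cannot run past the end of either string: when b ends, low2 yields 'A', which low1 never produces, and when a ends, low1 yields 0, which low2 never produces.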

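I'm not really a fan of the most-upvoted answer here (in part because it seems incorrect: it should continue when it reads a null terminator in only one of the two strings, and it doesn't), so I wrote my own.

This is a direct drop-in replacement for strncmp(), and it has been tested with numerous test cases, as shown below.

Code only: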

#include <ctype.h> // for `tolower()`
#include <limits.h> // for `INT_MIN`
#include <stddef.h> // for `size_t`

// Case-insensitive `strncmp()`
static inline int strncmpci(const char * str1, const char * str2, size_t num)
{
    int ret_code = 0;
    size_t chars_compared = 0;

    if (!str1 || !str2)
    {
        ret_code = INT_MIN;
        goto done;
    }

    while ((*str1 || *str2) && (chars_compared < num))
    {
        ret_code = tolower((unsigned char)(*str1)) - tolower((unsigned char)(*str2));
        if (ret_code != 0)
        {
            break;
        }
        chars_compared++;
        str1++;
        str2++;
    }

done:
    return ret_code;
}

Fully commented version:

#include <ctype.h> // for `tolower()`
#include <limits.h> // for `INT_MIN`
#include <stddef.h> // for `size_t`

/*

Case-insensitive string compare (strncmp case-insensitive)
- Identical to strncmp except case-insensitive. See: http://www.cplusplus.com/reference/cstring/strncmp/
- Aided/inspired, in part, by: https://stackoverflow.com/a/5820991/4561887

str1    C string 1 to be compared
str2    C string 2 to be compared
num     max number of chars to compare

return:
(essentially identical to strncmp)
INT_MIN  invalid arguments (one or both of the input strings is a NULL pointer)
<0       the first character that does not match has a lower value in str1 than in str2
 0       the contents of both strings are equal
>0       the first character that does not match has a greater value in str1 than in str2

*/
static inline int strncmpci(const char * str1, const char * str2, size_t num)
{
    int ret_code = 0;

    size_t chars_compared = 0;

    // Check for NULL pointers
    if (!str1 || !str2)
    {
        ret_code = INT_MIN;
        goto done;
    }

    // Continue doing case-insensitive comparisons, one-character-at-a-time, of str1 to str2,
    // as long as at least one of the strings still has more characters in it, and we have
    // not yet compared num chars.
    while ((*str1 || *str2) && (chars_compared < num))
    {
        // Cast to unsigned char first: passing a negative char to tolower() is undefined
        ret_code = tolower((unsigned char)(*str1)) - tolower((unsigned char)(*str2));
        if (ret_code != 0)
        {
            // The 2 chars just compared don't match
            break;
        }
        chars_compared++;
        str1++;
        str2++;
    }

done:
    return ret_code;
}

Test code (run it online here): https://onlinegdb.com/B1Qoj0W_N

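#include <stdio.h>  // for `printf()`
#include <string.h> // for `strncmp()`

int main()
{
    printf("Hello World\n\n");

    const char * str1;
    const char * str2;
    unsigned int n; // printed with `%u` below; converts implicitly to `size_t`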

    str1 = "hey";
    str2 = "HEY";
    n = 3;
    printf("strncmpci(%s, %s, %u) = %i\n", str1, str2, n, strncmpci(str1, str2, n));
    printf("strncmp(%s, %s, %u) = %i\n", str1, str2, n, strncmp(str1, str2, n));
    printf("\n");

    str1 = "heY";
    str2 = "HeY";
    n = 3;
    printf("strncmpci(%s, %s, %u) = %i\n", str1, str2, n, strncmpci(str1, str2, n));
    printf("strncmp(%s, %s, %u) = %i\n", str1, str2, n, strncmp(str1, str2, n));
    printf("\n");

    str1 = "hey";
    str2 = "HEdY";
    n = 3;
    printf("strncmpci(%s, %s, %u) = %i\n", str1, str2, n, strncmpci(str1, str2, n));
    printf("strncmp(%s, %s, %u) = %i\n", str1, str2, n, strncmp(str1, str2, n));
    printf("\n");

    str1 = "heY";
    str2 = "HeYd";
    n = 3;
    printf("strncmpci(%s, %s, %u) = %i\n", str1, str2, n, strncmpci(str1, str2, n));
    printf("strncmp(%s, %s, %u) = %i\n", str1, str2, n, strncmp(str1, str2, n));   
    printf("\n");

    str1 = "heY";
    str2 = "HeYd";
    n = 6;
    printf("strncmpci(%s, %s, %u) = %i\n", str1, str2, n, strncmpci(str1, str2, n));
    printf("strncmp(%s, %s, %u) = %i\n", str1, str2, n, strncmp(str1, str2, n));
    printf("\n");

    str1 = "hey";
    str2 = "hey";
    n = 6;
    printf("strncmpci(%s, %s, %u) = %i\n", str1, str2, n, strncmpci(str1, str2, n));
    printf("strncmp(%s, %s, %u) = %i\n", str1, str2, n, strncmp(str1, str2, n));
    printf("\n");

    str1 = "hey";
    str2 = "heyd";
    n = 6;
    printf("strncmpci(%s, %s, %u) = %i\n", str1, str2, n, strncmpci(str1, str2, n));
    printf("strncmp(%s, %s, %u) = %i\n", str1, str2, n, strncmp(str1, str2, n));
    printf("\n");

    str1 = "hey";
    str2 = "heyd";
    n = 3;
    printf("strncmpci(%s, %s, %u) = %i\n", str1, str2, n, strncmpci(str1, str2, n));
    printf("strncmp(%s, %s, %u) = %i\n", str1, str2, n, strncmp(str1, str2, n));
    printf("\n");

    return 0;
}

Sample output:

Hello World

strncmpci(hey, HEY, 3) = 0
strncmp(hey, HEY, 3) = 32

strncmpci(heY, HeY, 3) = 0
strncmp(heY, HeY, 3) = 32

strncmpci(hey, HEdY, 3) = 21
strncmp(hey, HEdY, 3) = 32

strncmpci(heY, HeYd, 3) = 0
strncmp(heY, HeYd, 3) = 32

strncmpci(heY, HeYd, 6) = -100
strncmp(heY, HeYd, 6) = 32

strncmpci(hey, hey, 6) = 0
strncmp(hey, hey, 6) = 0

strncmpci(hey, heyd, 6) = -100
strncmp(hey, heyd, 6) = -100

strncmpci(hey, heyd, 3) = 0
strncmp(hey, heyd, 3) = 0

References:

This question & other answers here served as inspiration and gave some insight (Case Insensitive String comp in C)
http://www.cplusplus.com/reference/cstring/strncmp/
https://en.wikipedia.org/wiki/ASCII
https://en.cppreference.com/w/c/language/operator_precedence

If your library does not provide one, you can get an idea of how to implement an efficient version from here

It uses a table for all 256 chars.

In that table, every char except the letters maps to its own ASCII code.
For uppercase letter codes, the table lists the codes of the corresponding lowercase symbols.

Then we just need to traverse the strings and compare the table cells for the given chars:

const unsigned char *cm = charmap,
        *us1 = (const unsigned char *)s1,
        *us2 = (const unsigned char *)s2;
while (cm[*us1] == cm[*us2++])
    if (*us1++ == '\0')
        return (0);
return (cm[*us1] - cm[*--us2]);

static int ignoreCaseComp (const char *str1, const char *str2, int length)
{
    int k;
    for (k = 0; k < length; k++)
    {

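        // Setting bit 5 (value 32) folds ASCII 'A'-'Z' onto 'a'-'z', but it also
        // makes some non-letter pairs compare equal, e.g. '@' and '`'.
        if ((str1[k] | 32) != (str2[k] | 32))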
            break;
    }

    if (k != length)
        return 1;
    return 0;
}
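The `| 32` trick relies on ASCII, where each uppercase letter differs from its lowercase form only in bit 5. Here is a sketch of a guarded variant that sets the bit only for 'A' through 'Z', so non-letters are left alone. The names ascii_lower and ascii_equal_n are hypothetical, the code assumes an ASCII execution character set, and unlike the function above it returns true when the buffers match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Fold only ASCII uppercase letters; leave every other byte as is. */
static unsigned char ascii_lower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c | 32) : c;
}

/* True when the first length bytes match, ignoring ASCII case.
   Both buffers must hold at least length bytes. */
bool ascii_equal_n(char const *s1, char const *s2, size_t length)
{
    assert(s1 != NULL && s2 != NULL);
    for (size_t k = 0; k < length; k++) {
        if (ascii_lower((unsigned char)s1[k]) != ascii_lower((unsigned char)s2[k]))
            return false;
    }
    return true;
}
```

ascii_equal_n("AbC", "aBc", 3) is true, and ascii_equal_n("@", "`", 1) is false, whereas the unguarded expression ('@' | 32) == ('`' | 32) holds.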

As others have stated, there is no portable function that works on all systems. You can partially circumvent this with simple ifdef:

#include <stdio.h>

#ifdef _WIN32
#include <string.h>
#define strcasecmp _stricmp
#else // assuming POSIX or BSD compliant system
#include <strings.h>
#endif

int main() {
    printf("%d", strcasecmp("teSt", "TEst"));
}
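The same idea extends to the length-limited variant. A sketch, assuming the Microsoft C runtime provides _strnicmp (as it does _stricmp) and that every other target is POSIX-like; the wrapper name ci_prefix_equal is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifdef _WIN32
#include <string.h>
#define strncasecmp _strnicmp
#else /* assuming a POSIX or BSD compliant system */
#include <strings.h>
#endif

/* True when the first n characters of a and b match, ignoring case. */
bool ci_prefix_equal(const char *a, const char *b, size_t n)
{
    assert(a != NULL && b != NULL);
    return strncasecmp(a, b, n) == 0;
}
```

For instance, ci_prefix_equal("teSt123", "TEst456", 4) is true because only "teSt" and "TEst" are compared, while a length of 5 also compares '1' with '4' and gives false.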

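#include <ctype.h>
#include <stdlib.h>
#include <string.h>

// Returns a newly allocated lower-case copy of a (caller must free it),
// or NULL if the allocation fails.
char* lowerCaseWord(const char* a)
{
    size_t len = strlen(a);
    char *b = malloc(len + 1); // +1 for the terminating '\0'
    if (b == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++)
        b[i] = (char)tolower((unsigned char)a[i]);
    b[len] = '\0';
    return b;
}

int strcmpInsensitive(const char* a, const char* b)
{
    char *la = lowerCaseWord(a);
    char *lb = lowerCaseWord(b);
    // Fall back to a case-sensitive compare if an allocation failed
    int result = (la && lb) ? strcmp(la, lb) : strcmp(a, b);
    free(la);
    free(lb);
    return result;
}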

good luck

Edit: the lowerCaseWord function takes a char* and returns its lower-case value. For example, "AbCdE" returns "abcde".

Basically, it converts the two char* values to lower case and then uses the strcmp function on them.

For example, if we call the strcmpInsensitive function with "AbCdE" and "ABCDE", it first converts both values to lower case ("abcde") and then runs strcmp on them.

Source URL: https://stackoverflow.com/questions/5820810/case-insensitive-string-comp-in-c
