Author yan12125
Recipients Alex.Willmer, vstinner, xdegaye, yan12125
Date 2016-11-18.12:52:19
Message-id <1479473539.59.0.449329504571.issue26928@psf.upfronthosting.co.za>
In-reply-to
Content
bionic's setlocale() accepts a few locale strings: https://android.googlesource.com/platform/bionic/+/master/libc/bionic/locale.cpp#104. However, it seems that mbstowcs() simply ignores the locale setting on Android. Here's an example:

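#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

#define BUFFER_SIZE 10

/* Convert a UTF-8 encoded string under the current locale and print the result. */
void test_mbstowcs(void)
{
    wchar_t dest[BUFFER_SIZE];
    memset(dest, 0, sizeof(dest));
    /* mbstowcs() returns size_t; cast so that the (size_t)-1 error value prints as -1. */
    printf("mbstowcs: %ld\n", (long)mbstowcs(dest, "中文", BUFFER_SIZE));
    printf("dest: %x %x\n", (unsigned int)dest[0], (unsigned int)dest[1]);
}

int main(void)
{
    /* Each setlocale line prints 1 if the locale name was accepted, 0 otherwise. */
    printf("setlocale: %d\n", setlocale(LC_ALL, "en_US.UTF-8") != NULL);
    test_mbstowcs();
    printf("setlocale: %d\n", setlocale(LC_ALL, "C") != NULL);
    test_mbstowcs();
    return 0;
}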

On Linux (glibc 2.24) the result is:

$ ./a.out 
setlocale: 1
mbstowcs: 2
dest: 4e2d 6587
setlocale: 1
mbstowcs: -1
dest: 0 0

On Android (6.0 Marshmallow) the result is:

shell@ASUS_Z00E_2:/ $ /data/local/tmp/a.out
setlocale: 1
mbstowcs: 2
dest: 4e2d 6587
setlocale: 1
mbstowcs: 2
dest: 4e2d 6587

A quick search indicates that on Android setlocale() affects only the *scanf functions, so I guess it's safe to force UTF-8 in CPython.
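
To illustrate what "forcing UTF-8" could mean in practice, here is a minimal sketch of a decoder that never consults the locale. The name utf8_to_wcs is hypothetical and this is not CPython's actual code; it also assumes wchar_t is wide enough to hold any code point (a 32-bit wchar_t).

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not CPython's implementation): decode a NUL-terminated
 * UTF-8 string into dest without looking at the current locale.
 * Writes at most dest_len wide characters and, like mbstowcs(), adds a
 * terminator only if there is room left.
 * Returns the number of wide characters written, or (size_t)-1 on malformed
 * input. Assumes wchar_t can represent every code point up to U+10FFFF. */
size_t utf8_to_wcs(wchar_t *dest, const char *src, size_t dest_len)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t count = 0;

    assert(dest != NULL && src != NULL);

    while (*s != 0 && count < dest_len) {
        unsigned long cp;   /* code point being assembled */
        unsigned long min;  /* smallest value allowed for this sequence length */
        int extra;          /* number of continuation bytes expected */

        if (*s < 0x80) {
            cp = *s; extra = 0; min = 0;
        } else if ((*s & 0xe0) == 0xc0) {
            cp = *s & 0x1f; extra = 1; min = 0x80;
        } else if ((*s & 0xf0) == 0xe0) {
            cp = *s & 0x0f; extra = 2; min = 0x800;
        } else if ((*s & 0xf8) == 0xf0) {
            cp = *s & 0x07; extra = 3; min = 0x10000;
        } else {
            return (size_t)-1;  /* stray continuation byte or invalid lead byte */
        }
        s++;

        /* A truncated sequence hits the NUL terminator here and is rejected. */
        for (; extra > 0; extra--, s++) {
            if ((*s & 0xc0) != 0x80)
                return (size_t)-1;
            cp = (cp << 6) | (unsigned long)(*s & 0x3f);
        }

        /* Reject overlong encodings, surrogates and out-of-range values. */
        if (cp < min || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff))
            return (size_t)-1;

        dest[count++] = (wchar_t)cp;
    }

    if (count < dest_len)
        dest[count] = L'\0';
    return count;
}
```

Called with the same input as the example above (the bytes e4 b8 ad e6 96 87) and a buffer of 10, this returns 2 and stores 4e2d 6587, regardless of what setlocale() was last called with.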