classification
Title: Plistlib: Half of the double width characters are missing when writing binary plist
Type: behavior Stage: commit review
Components: Library (Lib) Versions: Python 3.7, Python 3.6, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: kuglee, python-dev, ronaldoussoren, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2016-09-30 22:57 by kuglee, last changed 2017-03-31 16:36 by dstufft. This issue is now closed.

Files
File name Uploaded Description
input.plist kuglee, 2016-09-30 22:57 This is the input file
output.plist kuglee, 2016-09-30 22:58 This is the output file.
plistlib-astral-characters.patch serhiy.storchaka, 2016-10-01 06:33
Pull Requests
URL Status Linked
PR 552 closed dstufft, 2017-03-31 16:36
Messages (3)
msg277781 - Author: (kuglee) Date: 2016-09-30 23:01
I read an emoji character from a plist file, and it printed correctly to stdout. However, when I dumped the data to a binary plist, only half of the emoji was present in the output.
msg277800 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2016-10-01 06:33
The simplest reproducer:

>>> import plistlib
>>> plistlib.loads(plistlib.dumps('\U0001f40d', fmt=plistlib.FMT_BINARY))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/plistlib.py", line 1006, in loads
    fp, fmt=fmt, use_builtin_types=use_builtin_types, dict_type=dict_type)
  File "/usr/lib/python3.5/plistlib.py", line 997, in load
    return p.parse(fp)
  File "/usr/lib/python3.5/plistlib.py", line 623, in parse
    return self._read_object(self._object_offsets[top_object])
  File "/usr/lib/python3.5/plistlib.py", line 704, in _read_object
    return self._fp.read(s * 2).decode('utf-16be')
  File "/usr/lib/python3.5/encodings/utf_16_be.py", line 16, in decode
    return codecs.utf_16_be_decode(input, errors, True)
UnicodeDecodeError: 'utf-16-be' codec can't decode bytes in position 0-1: unexpected end of data

The proposed patch fixes this issue.
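
For readers following along: the traceback is consistent with the string length being recorded in characters rather than in UTF-16 code units. The sketch below only illustrates that mismatch; it is not the patch itself. The variable names are made up for the example, and it assumes the binary format stores a count of 16-bit units followed by UTF-16BE data, as the `self._fp.read(s * 2)` frame in the traceback suggests.

```python
# Illustration only: why a non-BMP character gets cut in half when its
# length is counted in characters instead of UTF-16 code units.
snake = '\U0001f40d'                 # one character outside the BMP
encoded = snake.encode('utf-16be')   # surrogate pair: 4 bytes

char_count = len(snake)              # number of code points
unit_count = len(encoded) // 2       # number of 16-bit code units

# A reader that trusts char_count fetches only the high surrogate.
truncated = encoded[:char_count * 2]
try:
    truncated.decode('utf-16be')
    truncated_ok = True
except UnicodeDecodeError:
    truncated_ok = False

# A reader that is given unit_count fetches the whole surrogate pair.
full = encoded[:unit_count * 2].decode('utf-16be')

print(char_count, unit_count, truncated_ok, full == snake)
```

Counting `len(encoded) // 2` instead of `len(snake)` when writing the size is one way to keep the writer and reader in agreement.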
msg278062 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-10-04 17:09
New changeset 381ef0f08f89 by Serhiy Storchaka in branch '3.5':
Issue #28321: Fixed writing non-BMP characters with binary format in plistlib.
https://hg.python.org/cpython/rev/381ef0f08f89

New changeset 3a7234d04fe9 by Serhiy Storchaka in branch '3.6':
Issue #28321: Fixed writing non-BMP characters with binary format in plistlib.
https://hg.python.org/cpython/rev/3a7234d04fe9

New changeset b6c85e7e558a by Serhiy Storchaka in branch 'default':
Issue #28321: Fixed writing non-BMP characters with binary format in plistlib.
https://hg.python.org/cpython/rev/b6c85e7e558a
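
A quick way to check whether a given interpreter includes the fix is a binary round trip like the one below. This is a minimal sketch, not taken from the changesets above; the dictionary keys are arbitrary, and it assumes a Python version where `plistlib.dumps` and `plistlib.loads` are available (3.4 and later).

```python
import plistlib

# Arbitrary sample data mixing a non-BMP character with plain ASCII.
original = {'animal': '\U0001f40d', 'plain': 'snake'}

# Serialize to the binary plist format and read it back.
blob = plistlib.dumps(original, fmt=plistlib.FMT_BINARY)
restored = plistlib.loads(blob)

# On an interpreter with the fix, the round trip should preserve the data.
print(restored == original)
```

On an unpatched interpreter the `loads` call is expected to fail in the same way as the reproducer above.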
History
Date User Action Args
2017-03-31 16:36:39 dstufft set pull_requests: + pull_request1109
2016-10-04 17:10:19 serhiy.storchaka set status: open -> closed
    resolution: fixed
    stage: patch review -> commit review
2016-10-04 17:09:41 python-dev set nosy: + python-dev
    messages: + msg278062
2016-10-04 17:00:53 serhiy.storchaka set assignee: serhiy.storchaka
    versions: - Python 2.7
2016-10-01 06:33:15 serhiy.storchaka set files: + plistlib-astral-characters.patch
    versions: + Python 2.7, Python 3.6, Python 3.7
    keywords: + patch
    nosy: + serhiy.storchaka, ronaldoussoren
    messages: + msg277800
    stage: patch review
2016-09-30 23:01:24 kuglee set messages: + msg277781
2016-09-30 22:58:04 kuglee set files: + output.plist
2016-09-30 22:57:38 kuglee create