classification
Title: UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 8191: \ at end of string
Type: behavior Stage: resolved
Components: Unicode Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: anmyachev, ezio.melotti, miss-islington, mrabarnett, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2021-10-13 14:31 by anmyachev, last changed 2021-10-14 17:51 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description
test.py anmyachev, 2021-10-13 14:31 test.py - reproducer
Pull Requests
URL Status Linked Edit
PR 28939 merged serhiy.storchaka, 2021-10-13 21:31
PR 28943 merged miss-islington, 2021-10-14 10:17
PR 28945 merged serhiy.storchaka, 2021-10-14 10:56
Messages (7)
msg403837 - (view) Author: Anatoly Myachev (anmyachev) Date: 2021-10-13 14:31
Expected behavior: if `read()` works correctly on a file, then `readline()` should also work.

The reproducer is in the attached file; just run `python test.py`.

Traceback (most recent call last):
  File "test.py", line 11, in <module>
    f.readline()
  File "C:\Users\amyachev\Miniconda3\envs\modin\lib\encodings\unicode_escape.py", line 26, in decode
    return codecs.unicode_escape_decode(input, self.errors)[0]
UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 8191: \ at end of string
msg403838 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-10-13 14:41
Can you please try to write a simpler (shorter) reproducer?
msg403840 - (view) Author: Anatoly Myachev (anmyachev) Date: 2021-10-13 14:55
Hello!

I can reduce it a little.
The buffer shouldn't be made any smaller, as there seems to be some kind of relation with the buffer size used for IO operations.

buffer = b'col1,col2,col3,col4,col5,col6\\r\\n0,2000-01-01,0,00:00:00,DuBFsyerJU,1809.3924826424557\\r\\n10,2000-01-01,10,01:00:00,AlwGHbVPpB,2853.2392617952996\\r\\n20,2000-01-01,20,02:00:00,TEkGgsYXYz,9933.278931158615\\r\\n30,2000-01-01,30,03:00:00,tfvnynVSfp,8574.917426248916\\r\\n40,2000-01-01,40,04:00:00,YOGjhztMWe,3768.71871233428\\r\\n50,2000-01-01,50,05:00:00,vkTOJSeQmU,6330.252072351792\\r\\n60,2000-01-01,60,06:00:00,LeolDfaGyv,5052.618993456892\\r\\n70,2000-01-01,70,07:00:00,OcyrbYVtyr,4287.371622852719\\r\\n80,2000-01-01,80,08:00:00,VUwDPNhcFV,3589.697826814614\\r\\n90,2000-01-01,90,09:00:00,KOadtzcNyK,4794.158259020925\\r\\n100,2000-01-01,100,10:00:00,rdSOjXJBWC,8826.736894397129\\r\\n110,2000-01-01,110,11:00:00,qzwVBOklhk,8086.105782454443\\r\\n120,2000-01-01,120,12:00:00,UTRlqVfKoD,1012.5061461339624\\r\\n130,2000-01-01,130,13:00:00,wKqEkRhkfw,2511.3137510933934\\r\\n140,2000-01-01,140,14:00:00,LxklWJbgxo,406.7116346419042\\r\\n150,2000-01-01,150,15:00:00,SxmZkdUgHv,8424.978062284761\\r\\n160,2000-01-01,160,16:00:00,nEvzypASGb,9890.252156059063\\r\\n170,2000-01-01,170,17:00:00,xiFkkjoDPB,2728.8359201479675\\r\\n180,2000-01-01,180,18:00:00,boMmgpBXgL,4231.680208002166\\r\\n190,2000-01-01,190,19:00:00,dXLJXWiXZI,7757.44902751916\\r\\n200,2000-01-01,200,20:00:00,PBdjwKoCMD,4915.090357003991\\r\\n210,2000-01-01,210,21:00:00,zGWLALpmoA,359.5243650158153\\r\\n220,2000-01-01,220,22:00:00,CfpZJoOqGZ,704.7990862762942\\r\\n230,2000-01-01,230,23:00:00,DrkxpLhpEN,520.3290677592321\\r\\n240,2000-01-02,240,00:00:00,TDKEBbZAzQ,5218.671660857721\\r\\n250,2000-01-02,250,01:00:00,gULwzvNeWO,4218.66872701774\\r\\n260,2000-01-02,260,02:00:00,ogSyzHWmNY,9026.657391329585\\r\\n270,2000-01-02,270,03:00:00,NetmmthtzN,2027.8312539582244\\r\\n280,2000-01-02,280,04:00:00,PoYiHipTzR,7667.627476518046\\r\\n290,2000-01-02,290,05:00:00,MjHIRGmsoq,4144.001792539834\\r\\n300,2000-01-02,300,06:00:00,qESRSNnNnO,5348.024681284471\\r\\n310,2000-01-02,310,07:00:00,sSIjcXWhLC,3622.4673907
599413\\r\\n320,2000-01-02,320,08:00:00,IvjrlljbeB,7500.419388155823\\r\\n330,2000-01-02,330,09:00:00,aVWVRXZjZy,3686.5972529264213\\r\\n340,2000-01-02,340,10:00:00,QKeTjcNlCG,1228.9751449454411\\r\\n350,2000-01-02,350,11:00:00,phEdHCVsbe,4254.15983968718\\r\\n360,2000-01-02,360,12:00:00,ursHJjQxRK,6099.131673115221\\r\\n370,2000-01-02,370,13:00:00,JvjcRlYcYG,1503.3586866746164\\r\\n380,2000-01-02,380,14:00:00,gzCyqHPRRb,7816.898213939008\\r\\n390,2000-01-02,390,15:00:00,lQZmobRwzt,8295.113759829599\\r\\n400,2000-01-02,400,16:00:00,qspiYGfTou,1987.8215069414816\\r\\n410,2000-01-02,410,17:00:00,mcqWMMzomf,15.878728570531964\\r\\n420,2000-01-02,420,18:00:00,fiPsxulpGU,5380.485947841902\\r\\n430,2000-01-02,430,19:00:00,gTAyTkpeez,4720.7159908343565\\r\\n440,2000-01-02,440,20:00:00,hzFbhAPvFX,946.5797295044975\\r\\n450,2000-01-02,450,21:00:00,NYNcYxsyVl,7333.850198973723\\r\\n460,2000-01-02,460,22:00:00,wvgMmIxLzo,7399.341315026157\\r\\n470,2000-01-02,470,23:00:00,bZoyzAGgEC,5464.053510955946\\r\\n480,2000-01-03,480,00:00:00,jZNaceUYyr,1390.8829937709977\\r\\n490,2000-01-03,490,01:00:00,sbfLgcCpru,9626.900131786555\\r\\n500,2000-01-03,500,02:00:00,MHpAkHfnmV,9406.471079089133\\r\\n510,2000-01-03,510,03:00:00,ENdFBGtRCq,3740.8773019724517\\r\\n520,2000-01-03,520,04:00:00,FzqXhMLHLY,4270.3585910905\\r\\n530,2000-01-03,530,05:00:00,wWinjEGhAj,8548.152649813675\\r\\n540,2000-01-03,540,06:00:00,LcxAImCvxt,4097.693176523874\\r\\n550,2000-01-03,550,07:00:00,sDhzGBYKpt,1673.7466277500146\\r\\n560,2000-01-03,560,08:00:00,jhagjcZhGU,4103.702089490347\\r\\n570,2000-01-03,570,09:00:00,ZIkRwPWyWP,9368.662605679918\\r\\n580,2000-01-03,580,10:00:00,uphgoCQwZY,3321.0096306747137\\r\\n590,2000-01-03,590,11:00:00,jEKaqqScLF,8442.084614664149\\r\\n600,2000-01-03,600,12:00:00,kSIJFBHVnL,4065.19226287942\\r\\n610,2000-01-03,610,13:00:00,YRhoANskYn,5089.668482943252\\r\\n620,2000-01-03,620,14:00:00,SnlwCSdkWf,5738.46737129545\\r\\n630,2000-01-03,630,15:00:00,ANfpLOiJTV,393.77545256928823\\r\
\n640,2000-01-03,640,16:00:00,DUxigzNtLz,6798.725575133883\\r\\n650,2000-01-03,650,17:00:00,jaJECwmWTY,5178.597327486391\\r\\n660,2000-01-03,660,18:00:00,tzrWZLSELo,7467.995039288831\\r\\n670,2000-01-03,670,19:00:00,rbUWLCKjeV,4013.698847016407\\r\\n680,2000-01-03,680,20:00:00,JKFAZgEkja,1538.6412971598695\\r\\n690,2000-01-03,690,21:00:00,uEomQhtneK,2849.6558284053976\\r\\n700,2000-01-03,700,22:00:00,VNqwqzfgXT,6756.852702484582\\r\\n710,2000-01-03,710,23:00:00,YzYqAlWMKn,9250.2543956494\\r\\n720,2000-01-04,720,00:00:00,VBrvxVqNpT,7430.930594705144\\r\\n730,2000-01-04,730,01:00:00,KxgdYwiVtl,1190.2548337790097\\r\\n740,2000-01-04,740,02:00:00,oPUENybUiS,247.4663426770396\\r\\n750,2000-01-04,750,03:00:00,bgpLfCsNrU,6472.8593061097\\r\\n760,2000-01-04,760,04:00:00,xmRUnIzNOL,5791.031151521782\\r\\n770,2000-01-04,770,05:00:00,SsYMDEINvO,347.35344936110636\\r\\n780,2000-01-04,780,06:00:00,XuorBLXsEt,9003.971751685769\\r\\n790,2000-01-04,790,07:00:00,jRYnFPYRKE,858.8836157464275\\r\\n800,2000-01-04,800,08:00:00,uRRXIdQDYH,4914.608250347407\\r\\n810,2000-01-04,810,09:00:00,nxkVSEnKXv,3586.0998633311424\\r\\n820,2000-01-04,820,10:00:00,BddLdFLDkg,9392.836980063128\\r\\n830,2000-01-04,830,11:00:00,MNuZvbMDqM,4075.512732895953\\r\\n840,2000-01-04,840,12:00:00,KfiIyqdZJq,4450.624248264806\\r\\n850,2000-01-04,850,13:00:00,ZNzdZZhipO,5155.329570863023\\r\\n860,2000-01-04,860,14:00:00,MmVEuWyJJt,7125.153628136557\\r\\n870,2000-01-04,870,15:00:00,QTVeqONJWF,7459.723393845693\\r\\n880,2000-01-04,880,16:00:00,sVHRlErfHm,5349.520468668593\\r\\n890,2000-01-04,890,17:00:00,OfcunHkqxU,2538.9594014567383\\r\\n900,2000-01-04,900,18:00:00,rXTISMpGvf,6136.26826553925\\r\\n910,2000-01-04,910,19:00:00,YYgIQPrYmN,2828.778965008356\\r\\n920,2000-01-04,920,20:00:00,acLWVYscRm,2135.4492617161204\\r\\n930,2000-01-04,930,21:00:00,ejuIuzrhoE,7853.20277523869\\r\\n940,2000-01-04,940,22:00:00,nEIyUKZvtl,9026.298438227512\\r\\n950,2000-01-04,950,23:00:00,fVrPrRMjgE,1108.9112508806\\r\\n960,2000-01-05,
960,00:00:00,aQbeIHZfrq,6779.761579736982\\r\\n970,2000-01-05,970,01:00:00,NSYmULwYsy,4710.484556444787\\r\\n980,2000-01-05,980,02:00:00,OstJdNkpJM,6696.018116272272\\r\\n990,2000-01-05,990,03:00:00,zPdwVSfwsw,1019.0631993852805\\r\\n1000,2000-01-05,1000,04:00:00,PrPiNtxItj,4786.919229745998\\r\\n1010,2000-01-05,1010,05:00:00,iTrMpbwDkd,1082.2792701135043\\r\\n1020,2000-01-05,1020,06:00:00,VIOGBhjuvc,6712.260837571906\\r\\n1030,2000-01-05,1030,07:00:00,vKfivaIyHN,8660.527086155422\\r\\n1040,2000-01-05,1040,08:00:00,bAlxEIEfpN,1415.7747325826188\\r\\n1050,2000-01-05,1050,09:00:00,cJPGJmIKdc,9816.3246377919\\r\\n1060,2000-01-05,1060,10:00:00,AdSXaKQpQX,3536.32709953549\\r\\n1070,2000-01-05,1070,11:00:00,PHntAagAlw,7431.850668273714\\r\\n1080,2000-01-05,1080,12:00:00,ZtQrFBobvY,4224.027690860892\\r\\n1090,2000-01-05,1090,13:00:00,ZuPnbhaSOU,3484.8530656320654\\r\\n1100,2000-01-05,1100,14:00:00,qOSVmejqdo,6847.384220484392\\r\\n1110,2000-01-05,1110,15:00:00,kwckywqRbb,5867.829131220223\\r\\n1120,2000-01-05,1120,16:00:00,JLrzzbUfDi,6991.180870142121\\r\\n1130,2000-01-05,1130,17:00:00,qPuDjhipNE,2544.115558392327\\r\\n1140,2000-01-05,1140,18:00:00,nTuOipVPUZ,3521.350549002792\\r\\n1150,2000-01-05,1150,19:00:00,FxTDpmsUYC,5796.837844528479\\r\\n1160,2000-01-05,1160,20:00:00,IilnnODeoz,9981.446352555968\\r\\n1170,2000-01-05,1170,21:00:00,lJpBtcVSww,8659.609927822496\\r\\n1180,2000-01-05,1180,22:00:00,uefmaifDgk,164.5549179029382\\r\\n1190,2000-01-05,1190,23:00:00,AQsKnkJxOV,455.31829622753816\\r\\n1200,2000-01-06,1200,00:00:00,IUcDyPSHIE,5727.976331105652\\r\\n1210,2000-01-06,1210,01:00:00,nrEdNiWGdi,2015.5167059418156\\r\\n1220,2000-01-06,1220,02:00:00,EflmCojQzg,9514.004760633412\\r\\n1230,2000-01-06,1230,03:00:00,LsAIvtooWr,7898.8225145572\\r\\n1240,2000-01-06,1240,04:00:00,yiDOUysGHw,4219.262059231663\\r\\n1250,2000-01-06,1250,05:00:00,idWAZATxwy,3043.2304072778616\\r\\n1260,2000-01-06,1260,06:00:00,sBedlknKzY,3840.820372936372\\r\\n1270,2000-01-06,1270,07:00:00,ReEmhVR
Ajb,6966.434389542963\\r\\n1280,2000-01-06,1280,08:00:00,XnFrfzMBKt,6041.8596064524045\\r\\n1290,2000-01-06,1290,09:00:00,MaMMHEWEIf,2569.2675325271707\\r\\n1300,2000-01-06,1300,10:00:00,OUpokSyVfO,7387.813510302333\\r\\n1310,2000-01-06,1310,11:00:00,VgCigxOcbF,7695.008235452545\\r\\n1320,2000-01-06,1320,12:00:00,ouRNYgSzXq,3293.250454887212\\r\\n1330,2000-01-06,1330,13:00:00,iQczJExipS,1892.9945453269115\\r\\n1340,2000-01-06,1340,14:00:00,vVbLlDWFCr,7105.276586964716\\r\\n1350,'

with open("bug_csv.csv", "wb") as f:
    f.write(buffer)

with open("bug_csv.csv", encoding="unicode_escape", newline="") as f:
    f.readline()
msg403848 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2021-10-13 16:25
It can be shortened to this:

buffer = b"a" * 8191 + b"\\r\\n"

with open("bug_csv.csv", "wb") as f:
    f.write(buffer)

with open("bug_csv.csv", encoding="unicode_escape", newline="") as f:
    f.readline()

To me it looks like it's reading in blocks of 8K and then decoding each block, but it isn't correctly handling an escape sequence that happens to cross a block boundary.
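
A minimal sketch of that block-boundary effect, without going through a file. It assumes a block size of 8192 bytes (inferred from the "8K" guess above, not confirmed from the I/O code), and the helper names `decode_block_alone` and `decode_incrementally` are made up for illustration. Decoding the first block on its own fails because it ends in a lone backslash. Whether the incremental decoder copes with the split depends on the Python version in use, so the sketch reports the outcome instead of assuming it.

```python
import codecs

BLOCK_SIZE = 8192  # assumption: the "8K" block size guessed at above

# Same data as the short reproducer: the backslash sits at index 8191,
# so it is the last byte of the first 8192-byte block.
data = b"a" * 8191 + b"\\r\\n"
first, rest = data[:BLOCK_SIZE], data[BLOCK_SIZE:]


def decode_block_alone(block):
    """Decode one block in isolation; return the exception if it fails."""
    try:
        return codecs.decode(block, "unicode_escape")
    except UnicodeDecodeError as exc:
        return exc


def decode_incrementally(blocks):
    """Feed blocks to the incremental decoder; return None if it fails."""
    decoder = codecs.getincrementaldecoder("unicode_escape")()
    try:
        parts = [decoder.decode(block) for block in blocks]
        parts.append(decoder.decode(b"", final=True))
    except UnicodeDecodeError:
        return None
    return "".join(parts)


# The first block ends with an unfinished escape sequence.
print("first block ends with backslash:", first.endswith(b"\\"))
print("first block decoded alone:", type(decode_block_alone(first)).__name__)

# Decoding everything at once sees the complete "\r\n" escapes.
whole = codecs.decode(data, "unicode_escape")
print("whole buffer decodes:", whole == "a" * 8191 + "\r\n")

# Outcome of block-wise decoding depends on the interpreter version.
result = decode_incrementally([first, rest])
print("incremental decoding:", "ok" if result == whole else "failed")
```

If the last line prints "failed", the interpreter shows the behavior described in this report; if it prints "ok", the incremental decoder keeps the trailing backslash until the rest of the escape arrives.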
msg403892 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-10-14 10:17
New changeset c96d1546b11b4c282a7e21737cb1f5d16349656d by Serhiy Storchaka in branch 'main':
bpo-45461: Fix IncrementalDecoder and StreamReader in the "unicode-escape" codec (GH-28939)
https://github.com/python/cpython/commit/c96d1546b11b4c282a7e21737cb1f5d16349656d
msg403919 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-10-14 17:02
New changeset 0bff4ccbfd3297b0adf690655d3e9ddb0033bc69 by Miss Islington (bot) in branch '3.10':
[3.10] bpo-45461: Fix IncrementalDecoder and StreamReader in the "unicode-escape" codec (GH-28939) (GH-28943)
https://github.com/python/cpython/commit/0bff4ccbfd3297b0adf690655d3e9ddb0033bc69
msg403920 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-10-14 17:03
New changeset 7c722e32bf582108680f49983cf01eaed710ddb9 by Serhiy Storchaka in branch '3.9':
[3.9] bpo-45461: Fix IncrementalDecoder and StreamReader in the "unicode-escape" codec (GH-28939) (GH-28945)
https://github.com/python/cpython/commit/7c722e32bf582108680f49983cf01eaed710ddb9
History
Date User Action Args
2021-10-14 17:51:53 serhiy.storchaka set status: open -> closed
resolution: fixed
stage: patch review -> resolved
2021-10-14 17:03:33 serhiy.storchaka set messages: + msg403920
2021-10-14 17:02:29 serhiy.storchaka set messages: + msg403919
2021-10-14 10:56:35 serhiy.storchaka set pull_requests: + pull_request27233
2021-10-14 10:30:09 serhiy.storchaka link issue45467 dependencies
2021-10-14 10:17:32 serhiy.storchaka set messages: + msg403892
2021-10-14 10:17:21 miss-islington set nosy: + miss-islington
pull_requests: + pull_request27231
2021-10-13 21:31:05 serhiy.storchaka set keywords: + patch
stage: patch review
pull_requests: + pull_request27228
2021-10-13 18:45:55 serhiy.storchaka set assignee: serhiy.storchaka
versions: + Python 3.9, Python 3.10, Python 3.11, - Python 3.8
2021-10-13 17:08:24 vstinner set nosy: + serhiy.storchaka
2021-10-13 16:25:23 mrabarnett set nosy: + mrabarnett
messages: + msg403848
2021-10-13 14:55:06 anmyachev set messages: + msg403840
2021-10-13 14:41:20 vstinner set messages: + msg403838
2021-10-13 14:31:37 anmyachev create