classification
Title: Documentation Language mixed up
Type: Stage: needs patch
Components: Documentation Versions: Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Mariatta, asl, docs@python, inada.naoki, mdk, zach.ware
Priority: high Keywords:

Created on 2017-09-26 05:14 by asl, last changed 2017-10-10 06:05 by inada.naoki.

Files
File name Uploaded Description Edit
shot-20170926-28526-p58mc1.jpeg asl, 2017-09-27 17:23
デザインと歴史 FAQ — Python 2.7.14 ドキュメント.html asl, 2017-09-27 17:23
Messages (15)
msg303004 - (view) Author: asl (asl) Date: 2017-09-26 05:14
Some of the documentation languages are mixed up.

e.g.:
Japanese on the English page https://docs.python.org/2.7/faq/design.html
French on the Japanese page https://docs.python.org/ja/2.7/faq/design.html

It seems to affect multiple pages:
https://docs.python.org/2.7/bugs.html
https://docs.python.org/2.7/distributing/index.html
https://docs.python.org/2.7/extending/index.html
https://docs.python.org/2.7/extending/extending.html
The c-api pages https://docs.python.org/2.7/c-api/index.html
The faq pages https://docs.python.org/2.7/faq/index.html

And possibly others.
msg303008 - (view) Author: Mariatta Wijaya (Mariatta) * (Python committer) Date: 2017-09-26 05:48
Indeed a problem! Seems to affect Python 2.7 docs. If I'm not mistaken, Julien worked on the language switcher. Maybe he has some clue ...
msg303021 - (view) Author: Julien Palard (mdk) * Date: 2017-09-26 07:49
Hi asl, thanks for reporting. I'm looking into it.
Thanks Mariatta for the notification.
msg303023 - (view) Author: Julien Palard (mdk) * Date: 2017-09-26 07:56
I'm currently unable to find any misplaced string...

I suspect the "fast builds" may cause the bug and the "full build" may fix it, and I probably checked right after a full build.

I'll test this locally soon (full build in one language followed by a fast build in another language) to see if I can reproduce it.

The build cycle is "build English, then build French, then build Japanese", so English on French pages would be unnoticeable. You noticed French on Japanese and Japanese on English, which matches the build pattern.

We may temporarily deactivate fast builds if needed (if it takes long to fix). I'll know more after my local tests.
msg303154 - (view) Author: Julien Palard (mdk) * Date: 2017-09-27 15:09
Local builds have not reproduced the bug so far, and I did not spot it in production either. If anyone sees it, please write down the URL, the misplaced translation, and the date, time, and timezone at which the string was found, so I can inspect the server logs.

Were the strings all over the page, or only a few strings on a normally rendered page?
msg303164 - (view) Author: asl (asl) Date: 2017-09-27 17:23
Attached is a screenshot of https://docs.python.org/2.7/faq/design.html generated by web-capture.net shortly before this report was created.

It shows the whole content was in Japanese.

On the screenshot, it says it was last updated on Sep 26, 2017.

It was one of the numerous sources I used to verify that it wasn't just my end which was seeing the language mix up.

Unfortunately, that is the only actual screenshot that I have available that was created when it happened.

The browser cache for the English page is also gone but I do have the cache of the Japanese page which was in French.

https://docs.python.org/ja/2.7/faq/design.html
```
HTTP/1.1 200
status: 304
date: Wed, 27 Sep 2017 16:54:12 GMT
via: 1.1 varnish
age: 130544
x-served-by: cache-sin18020-SIN
x-cache: HIT
x-cache-hits: 2
x-timer: S1506531252.204533,VS0,VE0
server: nginx
content-type: text/html
last-modified: Tue, 26 Sep 2017 03:00:31 GMT
etag: "59c9c2cf-146c2"
x-clacks-overhead: GNU Terry Pratchett
accept-ranges: bytes
content-length: 83650
```
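For reference, the `date` and `age` headers above can be combined to estimate when the CDN stored this copy. This is only a sketch, assuming `age` counts the seconds since the cache fetched the page from the origin; the variable names are mine and not part of any tooling.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Values copied from the response headers above.
date_header = "Wed, 27 Sep 2017 16:54:12 GMT"
last_modified_header = "Tue, 26 Sep 2017 03:00:31 GMT"
age_seconds = 130544

served_at = parsedate_to_datetime(date_header)
last_modified = parsedate_to_datetime(last_modified_header)

# Assumption: "age" is the time the object has spent in the cache.
cached_at = served_at - timedelta(seconds=age_seconds)

print("cached at:      ", cached_at.isoformat())
print("last modified:  ", last_modified.isoformat())
print("delay:          ", cached_at - last_modified)
```

Under that assumption, the cached copy dates from 2017-09-26 04:38:28 GMT, about 1 h 38 min after the `last-modified` time.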
msg303165 - (view) Author: asl (asl) Date: 2017-09-27 17:23
html dump
msg303171 - (view) Author: Julien Palard (mdk) * Date: 2017-09-27 19:26
Thanks for those details.

Until we find Japanese on a French page, or French on an English page, which would clearly disprove the hypothesis of a build picking up translated files from a previous build, I still consider this hypothesis the most likely one.

I'm also seeing pages like genindex-位.html in the English directory tree, which is less surprising: the English build may just not remove them.
msg303174 - (view) Author: Julien Palard (mdk) * Date: 2017-09-27 19:38
Found another hypothesis: as the builds take a long time, they may overlap (a build starting while another is still unfinished, typically rsync-ing its files), but this would result in:

- Japanese being built while French is being rsynced
- French being built while English is being rsynced
- English being built while Japanese is being rsynced

This means Japanese on French pages, French on English pages, and English on Japanese pages, which is the other way around. Nobody has spotted any of these overlaps, so I discard this hypothesis.
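To make the comparison between the two hypotheses explicit, here is a small sketch of what each one predicts for the build cycle discussed in this thread (English, then French, then Japanese). The function names and the `observed` mapping are illustrative only; the build order and the observations are the ones reported above.

```python
BUILD_ORDER = ["en", "fr", "ja"]  # build cycle described earlier in this thread


def stale_files_prediction(order):
    """Each build picks up translated files left by the previous build."""
    return {lang: order[i - 1] for i, lang in enumerate(order)}


def overlap_prediction(order):
    """Each build is still being rsynced while the next build runs."""
    return {lang: order[(i + 1) % len(order)] for i, lang in enumerate(order)}


def consistent(prediction, observations):
    """True if every observed (page language -> displayed language) pair matches."""
    return all(prediction[page] == shown for page, shown in observations.items())


# Reported so far: Japanese on the English page, French on the Japanese page.
observed = {"en": "ja", "ja": "fr"}

stale = stale_files_prediction(BUILD_ORDER)
overlap = overlap_prediction(BUILD_ORDER)

print("stale files:", stale, consistent(stale, observed))
print("overlap:    ", overlap, consistent(overlap, observed))
```

Only the stale-files prediction agrees with both reports; the overlap prediction would put French on English pages and English on Japanese pages.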
msg303175 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2017-09-27 19:48
Sounds like we really ought to be building each translation in its own directory so they can't possibly stomp on each other.
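A rough sketch of that idea, assuming a hypothetical `build(language, directory)` callable in place of the real build step (this is not how build_docs.py is written, and the directory layout is invented for illustration): each translation is built in a brand-new directory and only then swapped into place, so no file from another language can be picked up.

```python
import os
import shutil
import tempfile


def build_isolated(language, build, output_root):
    """Build one translation in a fresh directory, then swap it into place.

    `build` is a hypothetical callable(language, directory) standing in for
    the real build step; it is not part of build_docs.py.
    """
    os.makedirs(output_root, exist_ok=True)
    # A brand-new staging directory cannot contain files from another language.
    staging = tempfile.mkdtemp(prefix=f".staging-{language}-", dir=output_root)
    try:
        build(language, staging)
    except Exception:
        shutil.rmtree(staging, ignore_errors=True)
        raise
    target = os.path.join(output_root, language)
    # Drop the previous output entirely so stale pages do not survive.
    shutil.rmtree(target, ignore_errors=True)
    os.replace(staging, target)
    return target
```

Because the old output is removed before the swap, leftover files from earlier builds would also disappear with this approach.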
msg303176 - (view) Author: Julien Palard (mdk) * Date: 2017-09-27 19:54
I'd prefer to understand exactly what is going wrong and fix it, but I agree it would fix the issue and even cover the other hypothesis, so I don't exclude doing it, even if I find the root cause.

I'm currently doing a full build locally (I previously tried building using the Doc/Makefile and was unable to reproduce the bug; I'm now building using the build_docs.py script). It will take about an hour on my laptop; I'll see.
msg303186 - (view) Author: Julien Palard (mdk) * Date: 2017-09-27 22:33
I'm still unable to reproduce the bug.

I'm now monitoring the docs.python.org hierarchy with a:

grep -rl 'définition' /srv/docs.python.org/ja/; grep -rl か /srv/docs.python.org/{2.7,3.6,3.7}

So if it happens again we'll maybe learn more. According to this simple grep, the bug is not in production on any page at this time.
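For anyone who wants to run the same check from a script, here is a rough Python equivalent of the two `grep -rl` calls. The function names are made up for this sketch; the paths and search strings are the ones from the command above.

```python
import os


def files_containing(root, needle, encoding="utf-8"):
    """Return paths under root whose contents include needle (like grep -rl)."""
    needle_bytes = needle.encode(encoding)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    if needle_bytes in handle.read():
                        matches.append(path)
            except OSError:
                continue  # unreadable file: skip it
    return sorted(matches)


def find_mixed_up_pages(docs_root="/srv/docs.python.org"):
    """French text in the Japanese tree, or Japanese text in the English trees."""
    suspects = files_containing(os.path.join(docs_root, "ja"), "définition")
    for version in ("2.7", "3.6", "3.7"):
        suspects += files_containing(os.path.join(docs_root, version), "か")
    return suspects


if __name__ == "__main__":
    for path in find_mixed_up_pages():
        print(path)
```

An empty result means the same thing as an empty grep output: no mixed-up page was found by this simple test.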
msg303991 - (view) Author: Julien Palard (mdk) * Date: 2017-10-09 19:04
FTR I'm still monitoring daily for mixed-up pages server side and have not spotted a single one.

I'm still using my very basic:

  $ grep -rl 'définition' /srv/docs.python.org/ja/; grep -rl か /srv/docs.python.org/{2.7,3.6,3.7}
msg304014 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-10-10 05:35
Japanese HTML has this line:

    <link rel="canonical" href="https://docs.python.org/2/faq/design.html" />

I suspect this line affects the CDN's cache.
But I can't find any documentation about canonical links in Fastly's docs.
msg304015 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-10-10 06:05
https://twitter.com/miyagawa/status/917629042278359040

Miyagawa-san, a member of Fastly, told me they don't use HTML
content for the cache key, unless we customize the VCL.

But I can't find the VCL for docs.python.org in github.com/python.
History
Date User Action Args
2017-10-10 06:05:50 inada.naoki set messages: + msg304015
2017-10-10 05:35:13 inada.naoki set nosy: + inada.naoki
messages: + msg304014
2017-10-09 19:04:38 mdk set messages: + msg303991
2017-09-27 22:33:08 mdk set messages: + msg303186
2017-09-27 19:54:50 mdk set messages: + msg303176
2017-09-27 19:48:18 zach.ware set nosy: + zach.ware
messages: + msg303175
2017-09-27 19:38:24 mdk set messages: + msg303174
2017-09-27 19:26:05 mdk set messages: + msg303171
2017-09-27 17:23:46 asl set files: + デザインと歴史 FAQ — Python 2.7.14 ドキュメント.html
messages: + msg303165
2017-09-27 17:23:32 asl set files: + shot-20170926-28526-p58mc1.jpeg
messages: + msg303164
2017-09-27 15:09:07 mdk set messages: + msg303154
2017-09-26 07:56:07 mdk set messages: + msg303023
2017-09-26 07:49:21 mdk set messages: + msg303021
2017-09-26 05:48:01 Mariatta set priority: normal -> high
nosy: + Mariatta, mdk
messages: + msg303008
stage: needs patch
2017-09-26 05:14:07 asl create