Problem description
For HTML5 and Python CGI:
If I include the UTF-8 meta tag, my code doesn't work. If I leave it out, it works.
Page encoding is UTF-8.
print("Content-type:text/html")
print()
print("""
<!doctype html>
<html>
<head>
<meta charset="UTF-8">
</head>
<body>
şöğıçü
</body>
</html>
""")
This code doesn't work.
print("Content-type:text/html")
print()
print("""
<!doctype html>
<html>
<head></head>
<body>
şöğıçü
</body>
</html>
""")
But this code works.
For CGI, using print() requires that the correct codec has been set up for output. print() writes to sys.stdout, and sys.stdout has been opened with a specific encoding. How that encoding is determined is platform dependent and can differ based on how the script is run. Running your script as a CGI script means you pretty much do not know what encoding will be used.
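To see which encoding is actually in play, you can inspect the stream before writing anything. The following is only a minimal sketch: describe_stdout_encoding is a hypothetical helper name, and the ISO-8859-9 stream is a stand-in for whatever your server really configures.

```python
import io
import locale
import sys


def describe_stdout_encoding(stream=None):
    """Report the encoding print() would use when writing to `stream`."""
    stream = sys.stdout if stream is None else stream
    return {
        "stream_encoding": getattr(stream, "encoding", None),
        "stream_errors": getattr(stream, "errors", None),
        # The locale's preferred encoding is what Python usually falls back on.
        "locale_preferred": locale.getpreferredencoding(False),
    }


# Assumption for illustration: a text stream opened with a non-UTF-8 codec,
# similar to what a web server's locale might give a CGI script.
latin5_stdout = io.TextIOWrapper(io.BytesIO(), encoding="iso8859-9")
info = describe_stdout_encoding(latin5_stdout)
```

Calling describe_stdout_encoding() with no argument inside the CGI script, and logging the result to a file, shows what the server has actually set up.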
In your case, the web server has set the locale for text output to a fixed encoding other than UTF-8. Python uses that locale setting to produce output in that encoding. Without the <meta> tag, your browser correctly guesses that encoding (or the server has communicated it in the Content-Type header). With the <meta> tag, you are telling the browser to use a different encoding, one that is incorrect for the data produced.
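The mismatch can be reproduced without a web server. The sketch below assumes the server locale is ISO-8859-9 (Turkish); that is a guess made for illustration, since the question does not say which encoding the server really uses.

```python
text = "şöğıçü"

# Assumption: the server's locale makes Python emit ISO-8859-9 bytes.
server_bytes = text.encode("iso8859-9")

# A browser that trusts <meta charset="UTF-8"> decodes those bytes as UTF-8.
# The bytes are not valid UTF-8, so replacement characters appear.
as_utf8 = server_bytes.decode("utf-8", errors="replace")

# A browser that guesses the real encoding recovers the original text.
as_latin5 = server_bytes.decode("iso8859-9")
```

Here as_latin5 equals the original string, while as_utf8 contains U+FFFD replacement characters, which is the kind of breakage seen when the declared and actual encodings disagree.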
You can write directly to sys.stdout.buffer, after explicitly encoding to UTF-8. Make a helper function to make this easier:
import sys

def enc_print(string='', encoding='utf8'):
    # Encode explicitly and write bytes, bypassing the codec sys.stdout was opened with.
    sys.stdout.buffer.write(string.encode(encoding) + b'\n')
enc_print("Content-type:text/html")
enc_print()
enc_print("""
<!doctype html>
<html>
<head>
<meta charset="UTF-8">
</head>
<body>
şöğıçü
</body>
</html>
""")
Another approach is to replace sys.stdout with a new io.TextIOWrapper() object that uses the codec you need:
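import sys
import io

def set_output_encoding(codec, errors='strict'):
    # Read the setting before detach(), then rewrap the binary buffer with the requested codec.
    line_buffering = sys.stdout.line_buffering
    sys.stdout = io.TextIOWrapper(
        sys.stdout.detach(), encoding=codec, errors=errors,
        line_buffering=line_buffering)

set_output_encoding('utf8')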
print("Content-type:text/html")
print()
print("""
<!doctype html>
<html>
<head></head>
<body>
şöğıçü
</body>
</html>
""")
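On Python 3.7 and later, text streams also have a reconfigure() method, which may be simpler than replacing sys.stdout. This is not part of the original answer. The sketch below uses a hypothetical helper, force_utf8, and a stand-in stream so that it can run outside a web server.

```python
import io


def force_utf8(stream):
    """Switch an existing text stream to UTF-8 in place (Python 3.7+)."""
    stream.reconfigure(encoding="utf-8", errors="strict")
    return stream


# Stand-in for a stdout that the server opened with a non-UTF-8 locale
# (ISO-8859-9 is an assumption made for illustration).
raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="iso8859-9")

force_utf8(fake_stdout)
print("şöğıçü", file=fake_stdout)
fake_stdout.flush()
output_bytes = raw.getvalue()
```

In a CGI script the equivalent call would be sys.stdout.reconfigure(encoding="utf-8") before the first print().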