在 Python 中将 Unicode 字符串转换为常规字符串

在Python中，许多消息（包括错误或输入和输出）都是国际化的，因此不懂英语的人也可以用它进行编程。Unicode 是一种格式，其中字符串中的各个字符被赋予唯一的标识。它是一种规范，为人类语言中使用的每个字符分配一个特定的值或代码。

Unicode 是一种为人类语言中使用的每个字符分配唯一代码的格式。它对于保持编程语言的一致性和容纳特殊字符至关重要。Unicode 库不断更新最新的字符和符号

在 Unicode 标准中，字符（字符串的最小组成部分）表示为代码点。代码点可以具有 0 到 0x10FFFF 之间的任何值（大约 110 万个值）。因此，Unicode 字符串是代码点的集合。

Unicode 字符可以转换为 8 位字节，这个过程可以在计算机系统的内存中完成，也称为编码。

UTF-8 编码的优点

“UTF”代表 Unicode 转换格式，“8”代表在此格式中用于编码的 8 位值。与在系统 CPU 中使用 32 位编码相比，它具有许多优点，例如它可以处理代码点值，而且 ASCII 字符串也是有效的 utf-8 文本。

使用 utf-8 编码时可以解决字节顺序问题。丢失的数据可以重新同步，并且可以确定utf-8文本的起点和终点。与不可移植的 32 位编码不同，它也是可移植的。

该方法现在是 python 的标准 Unicode 编码。

另请查看：Converting Base-2 Binary Number Strings to Integers in Python。

将普通字符串转换为 Unicode 字符串

我们可以使用 python 中的内置库将 Unicode 字符串解码为普通字符串。我们将在本文中介绍它，您可以使用您认为适合您的程序的任何一个。

示例：将字符串转换为 Unicode 字符

我们首先将字符串转换为 Unicode 字符。

import re 
# initializing string
org = 'Askpythonisthebest'
sol = (re.sub('.', lambda x: r'\u % 04X' % ord(x.group()), org)) 
# printing result
print("The unicode converted String : " + str(sol))

我们的输出是：

The unicode converted String : \u  041\u  073\u  06B\u  070\u  079\u  074\u  068\u  06F\u  06E\u  069\u  073\u  074\u  068\u  065\u  062\u  065\u  073\u  074

将 Unicode 字符串转换为普通字符串

我们可以使用 unicodedata 模块中的 unicode.normalize() 函数将 Unicode 字符串转换为普通字符串。该模块使用 Unicode 字符数据库中的相同约定。

该函数的语法是：

unicodedata.normalize('type', string_name)

“type”参数可以采用 4 个不同的值：“NFC”、“NFKC”、“NFD”和“NFKD”。对于每个字符有两种范式：D代表正常规范分解（NFD）和C，它首先执行正常规范分解，然后再次组合预组合字符（NFC）。

规范形式“NFKD”应用规范兼容性分解，而规范形式“NFKC”首先应用规范兼容性分解，然后规范组合。

两个 Unicode 字符串在人眼看来可能相同，但如果一个字符串具有组合字符，而另一个字符串没有，则它们比较起来可能不相等。

示例：将 Unicode 字符串转换为常规字符串

让我们看一个小例子：

#importing the unicodedata module
import unicodedata 
#initializing unicode string
org=u"Askpython.com!"
#converting the unicode string using normalize()
ans= unicodedata.normalize('NFC', org)
#printing the type after coversion
print("The string and it's type is= ")
print(ans, type(ans))

输出将是：

The string and it's type is= 
Askpython.com! <class 'str'>

将 Unicode 字符串转换为 Python 中的字符串。

让我们再举一个例子，这次我们将使用encode()函数和normalize()函数来处理多个特殊字符。

#importing the unicodedata module
import unicodedata 
#initializing unicode string
org=u"üft träms inför på fédéral große-aàççññ"
#converting the unicode string using normalize()
ans= unicodedata.normalize('NFKD', org).encode('ascii', 'ignore')
#printing the type after coversion
print("The string and it's type is= ")
print(ans, type(ans))

输出将是：

The string and it's type is= 
b'uft trams infor pa federal groe-aaccnn' <class 'bytes'>

从 Unicode 到普通字符串的特殊字符转换

建议：将字节转换为 Ascii 或 Unicode。

结论

Unicode字符数据库提供了一种为各种字符分配唯一值的通用方法，从而可以轻松地在计算机系统中识别它们。尽管大多数 Unicode 和 ASCII 编码和解码都在幕后进行，但了解将字符转换为其 Unicode 对应项的机制和规则至关重要。在本教程中，我们演示了如何在 Python 中轻松将 Unicode 字符串转换为常规字符串。