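# Remove non-alphanumeric characters from a Python string

Use the re.sub() method to remove all non-alphanumeric characters from a string.

The re.sub() method removes all non-alphanumeric characters from the string by replacing them with empty strings.

main.py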


import re

my_str = 'bobby !hadz@ com 123'

# ✅ Remove all non-alphanumeric characters from string

new_str = re.sub(r'[\W_]', '', my_str)
print(new_str)  # 👉️ 'bobbyhadzcom123'

# -----------------------------------------------

# ✅ Remove all non-alphanumeric characters from string,
#  preserving whitespace

new_str = re.sub(r'[^\w\s]', '', my_str)
print(new_str)  # 👉️ 'bobby hadz com 123'

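If you need to remove the non-alphabetic characters from a string instead, see the section on removing all non-alphabetic characters further down.

The example uses the re.sub() method to remove all non-alphanumeric characters from a string.

The re.sub method returns a new string that is obtained by replacing the occurrences of the pattern with the provided replacement.

If the pattern isn't found, the string is returned as is.

The first argument we passed to the re.sub() method is a regular expression.

The square brackets [] are used to indicate a set of characters.

The \W (capital W) special character matches any character that is not a word character. Word characters include the underscore, so the set [\W_] lists _ explicitly to remove underscores as well.

We remove all non-alphanumeric characters by replacing each one with an empty string.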
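To illustrate why the underscore is listed in the set, here is a minimal sketch that compares \W on its own with [\W_]. The sample string and the variable names are made up for this illustration.

```python
import re

my_str = 'bobby_hadz !com_123'

# \W on its own treats the underscore as a word character, so it is kept
kept_underscores = re.sub(r'\W', '', my_str)
print(kept_underscores)  # 👉️ 'bobby_hadzcom_123'

# Adding _ to the set removes the underscores as well
no_underscores = re.sub(r'[\W_]', '', my_str)
print(no_underscores)  # 👉️ 'bobbyhadzcom123'
```

If underscores should count as valid characters in your data, the plain \W pattern is probably the one you want.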

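# Remove non-alphanumeric characters but preserve the whitespace

If you want to preserve the whitespace and remove all non-alphanumeric characters, use the following regular expression.

main.py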


import re

my_str = 'bobby !hadz@ com 123'

new_str = re.sub(r'[^\w\s]', '', my_str)
print(new_str)  # 👉️ 'bobby hadz com 123'

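The caret ^ at the beginning of the set means "NOT". In other words, match all characters that are NOT Unicode word characters (letters, digits, underscores) or whitespace.

The \w character is the opposite of \W and matches:

characters that can be part of a word in any language
digits
the underscore character

The \s character matches Unicode whitespace characters such as [ \t\n\r\f\v].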
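Because \w is Unicode-aware for str patterns, letters outside the ASCII range are kept by default. The sketch below shows this, assuming you may sometimes want ASCII-only matching through the re.ASCII flag. The sample string is invented for the illustration.

```python
import re

my_str = 'straße !hadz@ 123'

# By default, \w matches Unicode letters, digits and the underscore
unicode_result = re.sub(r'[^\w\s]', '', my_str)
print(unicode_result)  # 👉️ 'straße hadz 123'

# With re.ASCII, \w only matches [a-zA-Z0-9_], so ß is removed too
ascii_result = re.sub(r'[^\w\s]', '', my_str, flags=re.ASCII)
print(ascii_result)  # 👉️ 'strae hadz 123'
```

Note that [^\w\s] keeps underscores in both cases, because the underscore is a word character.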

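If you ever need help reading or writing a regular expression, consult the regular expression syntax subheading in the official docs.

The page contains a list of all of the special characters with many useful examples.

If your string has multiple spaces next to one another, you might have to replace multiple consecutive spaces with a single space.

main.py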


import re

my_str = 'bobby    !hadz@    com   123'


new_str = re.sub(r'[^\w\s]', '', my_str)
print(new_str)  # 👉️ 'bobby    hadz    com   123'

result = " ".join(new_str.split())
print(result)  # 👉️ 'bobby hadz com 123'

The str.split() method splits the string on one or more whitespace characters and we join the list of strings with a single space separator.
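As a possible variation, the consecutive whitespace can also be collapsed with a second re.sub() call. This is only a sketch: the \s+ pattern and the trailing strip() call are choices made here so that the result matches the split-and-join approach.

```python
import re

new_str = '  bobby    hadz    com   123 '

# Replace each run of whitespace with one space, then trim both ends
collapsed = re.sub(r'\s+', ' ', new_str).strip()
print(collapsed)  # 👉️ 'bobby hadz com 123'

# The split-and-join approach gives the same result
print(collapsed == ' '.join(new_str.split()))  # 👉️ True
```

Without the strip() call, a single leading or trailing space would remain, whereas str.split() with no arguments discards leading and trailing whitespace on its own.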

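Alternatively, you can use a generator expression instead of a regular expression.

# Remove non-alphanumeric characters from a string using a generator expression

This is a three-step process:

Use a generator expression to iterate over the string.
Use the str.isalnum() method to check if each character is alphanumeric.
Use the str.join() method to join the alphanumeric characters.

main.py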


my_str = 'bobby !hadz@ com 123'


new_str = ''.join(char for char in my_str if char.isalnum())
print(new_str)  # 👉️ 'bobbyhadzcom123'

new_str = ''.join(char for char in my_str if char.isalnum() or char == ' ')
print(new_str)  # 👉️ 'bobby hadz com 123'

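We used a generator expression to iterate over the string.

Generator expressions are used to perform some operation for every element or select a subset of elements that meet a condition.

On each iteration, we use the str.isalnum() method to check if the current character is alphanumeric and we return the result.

The str.isalnum method returns True if all characters in the string are alphanumeric and the string contains at least one character, otherwise the method returns False.

main.py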


print('A'.isalnum())  # 👉️ True

print('!'.isalnum())  # 👉️ False

print('5'.isalnum())  # 👉️ True
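A few more cases may help clarify the "all characters" and "at least one character" parts of that rule. The sketch below stores the results in variables whose names are made up for this illustration.

```python
# Every character is a letter or a digit
all_alnum = 'abc123'.isalnum()
print(all_alnum)  # 👉️ True

# A single space is enough to make the check fail
with_space = 'abc 123'.isalnum()
print(with_space)  # 👉️ False

# An empty string contains no characters at all
empty = ''.isalnum()
print(empty)  # 👉️ False

# Unlike the \w regex character, isalnum() rejects the underscore
underscore = '_'.isalnum()
print(underscore)  # 👉️ False
```

The last case means that the str.isalnum() approach removes underscores, whereas a pattern such as [^\w\s] keeps them.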

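The generator object contains only alphanumeric characters.

The last step is to use the str.join() method to join the alphanumeric characters into a string.

main.py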


my_str = 'bobby !hadz@ com 123'


new_str = ''.join(char for char in my_str if char.isalnum())
print(new_str)  # 👉️ 'bobbyhadzcom123'

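The str.join method takes an iterable as an argument and returns a string which is the concatenation of the strings in the iterable.

The string the method is called on is used as the separator between the elements.

For our purposes, we call the join() method on an empty string to join the alphanumeric characters without a separator.

If you want to remove the non-alphanumeric characters and preserve the whitespace, use the boolean or operator.

main.py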


my_str = 'bobby !hadz@ com 123'


new_str = ''.join(
    char for char in my_str
    if char.isalnum() or char == ' '
)
print(new_str)  # 👉️ 'bobby hadz com 123'

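We used the boolean or operator, so for the character to be added to the generator object, one of the conditions has to be met.

The character has to be alphanumeric or it has to be a space.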
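The check char == ' ' only keeps the space character. If tabs and newlines should survive as well, one option is to test with str.isspace() instead. This is a sketch under that assumption, with a sample string invented for the illustration.

```python
my_str = 'bobby\t!hadz@\ncom 123'

# isspace() is True for spaces, tabs, newlines and other whitespace
new_str = ''.join(
    char for char in my_str
    if char.isalnum() or char.isspace()
)
print(repr(new_str))  # 👉️ 'bobby\thadz\ncom 123'
```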

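# Remove non-alphanumeric characters from a string using filter()

You can also use the filter() function to remove all non-alphanumeric characters from a string.

main.py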


my_str = 'bobby !hadz@ com 123'

new_str = ''.join(filter(str.isalnum, my_str))
print(new_str)  # 👉️ bobbyhadzcom123

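The filter function takes a function and an iterable as arguments and constructs an iterator from the elements of the iterable for which the function returns a truthy value.

We passed the str.isalnum method to filter() so that the method gets called with each character in the string.

The filter() function returns a filter object that only contains the characters for which the str.isalnum() method returns True.

The last step is to use the str.join() method to join the filter object into a string.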
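If the spaces should be preserved with filter() as well, the predicate has to accept them. One way to sketch this is with a lambda function. The lambda is an assumption made for this illustration and could just as well be a named function.

```python
my_str = 'bobby !hadz@ com 123'

# Keep a character if it is alphanumeric or a space
new_str = ''.join(
    filter(lambda char: char.isalnum() or char == ' ', my_str)
)
print(new_str)  # 👉️ 'bobby hadz com 123'
```

At this point the generator expression version is arguably easier to read, because it expresses the same condition without the extra lambda.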

# Remove all non-alphabetic characters from String in Python

The re.sub() method can also be used to remove all non-alphabetic characters from a string.

main.py


import re

my_str = 'bobby! hadz@ com'


# ✅ Remove all non-alphabetic characters from string (re.sub())

new_str = re.sub(r'[^a-zA-Z]', '', my_str)
print(new_str)  # 👉️ 'bobbyhadzcom'

# -----------------------------------------------------

# ✅ Remove all non-alphabetic characters from string, preserving whitespace
new_str = re.sub(r'[^a-zA-Z\s]', '', my_str)
print(new_str)  # 👉️ 'bobby hadz com'

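The example uses the re.sub() method to remove all non-alphabetic characters from a string.

The re.sub method returns a new string that is obtained by replacing the occurrences of the pattern with the provided replacement.

main.py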


import re

my_str = 'bobby! hadz@ com'


new_str = re.sub(r'[^a-zA-Z]', '', my_str)
print(new_str)  # 👉️ 'bobbyhadzcom'

new_str = re.sub(r'[^a-zA-Z\s]', '', my_str)
print(new_str)  # 👉️ 'bobby hadz com'

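If the pattern isn't found, the string is returned as is.

The first argument we passed to the re.sub() method is a regular expression.

The square brackets [] are used to indicate a set of characters.

The caret ^ at the beginning of the set means "NOT". In other words, match all characters that are NOT letters.

The a-z and A-Z characters represent lowercase and uppercase ranges of letters.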
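One detail worth keeping in mind is that the a-z and A-Z ranges only cover ASCII letters. The sketch below contrasts the pattern with str.isalpha(), which also accepts letters from other alphabets. The sample string is invented for the illustration.

```python
import re

my_str = 'straße! hadz@'

# The ranges a-z and A-Z do not include ß, so it is removed
ascii_only = re.sub(r'[^a-zA-Z]', '', my_str)
print(ascii_only)  # 👉️ 'straehadz'

# str.isalpha() returns True for ß, so it is kept
unicode_letters = ''.join(char for char in my_str if char.isalpha())
print(unicode_letters)  # 👉️ 'straßehadz'
```

Which behavior is right depends on whether your input is expected to contain non-English letters.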

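# Remove all non-alphabetic characters but preserve the whitespace

If you need to remove all non-alphabetic characters and preserve the whitespace, use the following regular expression.

main.py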


import re

my_str = 'bobby! hadz@ com'


new_str = re.sub(r'[^a-zA-Z\s]', '', my_str)
print(new_str)  # 👉️ 'bobby hadz com'

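The \s character matches Unicode whitespace characters like [ \t\n\r\f\v].

In its entirety, the regular expression matches all characters that are neither letters nor whitespace.

If you ever need help reading or writing a regular expression, consult the regular expression syntax subheading in the official docs.

The page contains a list of all of the special characters with many useful examples.

If your string has multiple spaces next to one another, you might have to replace multiple consecutive spaces with a single space.

main.py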


import re

my_str = 'bobby!   hadz@   com'


new_str = re.sub(r'[^a-zA-Z\s]', '', my_str)
print(new_str)  # 👉️ 'bobby   hadz   com'

result = ' '.join(new_str.split())
print(result)  # 👉️ 'bobby hadz com'

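The str.split() method splits the string on one or more whitespace characters and we join the list of strings with a single space separator.

Alternatively, you can use a generator expression.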

# Remove all non-alphabetic characters from String using generator expression

This is a three-step process:

Use a generator expression to iterate over the string.
Use the str.isalpha() method to check if each character is alphabetic.
Use the str.join() method to join the alphabetic characters.

main.py


my_str = 'bobby! hadz@ com'


new_str = ''.join(
    char for char in my_str
    if char.isalpha()
)
print(new_str)  # 👉️ 'bobbyhadzcom'

new_str = ''.join(
    char for char in my_str
    if char.isalpha() or char == ' '
)
print(new_str)  # 👉️ 'bobby hadz com'

We used a generator expression to iterate over the string.

Generator expressions are used to perform some operation for every element or select a subset of elements that meet a condition.

On each iteration, we use the str.isalpha() method to check if the current
character is alphabetic and we return the result.

The str.isalpha method
returns True if all characters in the string are alphabetic and there is at
least one character, otherwise, the method returns False.

main.py


print('H'.isalpha())  # 👉️ True

print('@'.isalpha())  # 👉️ False

The generator object contains only alphabetic characters.

main.py


my_str = 'bobby! hadz@ com'


new_str = ''.join(
    char for char in my_str
    if char.isalpha()
)
print(new_str)  # 👉️ 'bobbyhadzcom'

The last step is to use the str.join() method to join the alphabetic
characters into a string.

The str.join method takes an
iterable as an argument and returns a string which is the concatenation of the
strings in the iterable.

The string the method is called on is used as the separator between the
elements.

For our purposes, we call the join() method on an empty string to join the alphabetic characters without a separator.

If you want to remove the non-alphabetic characters and preserve the whitespace,
use the boolean or operator.

main.py


my_str = 'bobby! hadz@ com'


new_str = ''.join(
    char for char in my_str
    if char.isalpha() or char == ' '
)
print(new_str)  # 👉️ 'bobby hadz com'

We used the boolean or operator, so for the character to be added to the
generator object, one of the conditions has to be met.

The character has to be alphabetic or it has to be a space.

# Remove all non-alphabetic characters from String using filter()

This is a three-step process:

Pass the str.isalpha() method and the string to the filter() function.
The str.isalpha() method will filter out all non-letter characters.
Use the str.join() method to join the result into a string.

main.py


a_string = 'bobby123hadz456.com'

only_letters = ''.join(
    filter(
        str.isalpha,
        a_string
    )
)

print(only_letters)  # 👉️ bobbyhadzcom

The filter function
takes a function and an iterable as arguments and constructs an iterator from
the elements of the iterable for which the function returns a truthy value.

We passed the str.isalpha() method to the filter() function.

The str.isalpha() method gets called with each character in the string and returns True if the character is a letter.

The last step is to use the str.join() method to join all matching characters
into a string.

Which approach you pick is a matter of personal preference. I’d use the
str.isalpha() method with a generator expression because the approach is quite
direct and intuitive.
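To compare the approaches side by side, here is a small sketch that wraps each one in a function and checks that they agree on an ASCII-only sample. The function names are hypothetical and chosen only for this illustration.

```python
import re


def letters_regex(text):
    # Regular expression approach (ASCII letters only)
    return re.sub(r'[^a-zA-Z]', '', text)


def letters_genexp(text):
    # Generator expression with str.isalpha()
    return ''.join(char for char in text if char.isalpha())


def letters_filter(text):
    # filter() with str.isalpha()
    return ''.join(filter(str.isalpha, text))


sample = 'bobby123hadz456.com'

results = {
    func(sample)
    for func in (letters_regex, letters_genexp, letters_filter)
}
print(results)  # 👉️ {'bobbyhadzcom'}
```

The set contains a single value because all three functions return the same string for this input. For text with non-ASCII letters, the regex version would differ from the other two, since a-z and A-Z only cover ASCII.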
