Count the number of unique words in a String or File in Python

Count the number of unique words in a String in Python

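To count the number of unique words in a string:

Use the str.split() method to split the string into a list of words.
Use the set() class to convert the list to a set.
Use the len() function to get the count of unique words in the string.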

main.py


my_str = 'one one two two'

unique_words = set(my_str.split())
print(unique_words)  # 👉️ {'one', 'two'}

length = len(unique_words)
print(length)  # 👉️ 2

If you need to count the unique words in a file, see the following subheading:

Count the unique words in a text file in Python

We first use the str.split() method to split the string into a list of words.

main.py


my_str = 'one one two two'

print(my_str.split()) # 👉️ ['one', 'one', 'two', 'two']

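The str.split() method splits a string into a list of substrings using a delimiter.

When no separator is passed to the str.split() method, it splits the input string on one or more whitespace characters.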
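To make the difference concrete, here is a minimal sketch comparing a call with no separator to a call with an explicit single-space separator. The sample string is made up for illustration.

```python
text = 'one  two\tthree\nfour'

# No separator: any run of whitespace (spaces, tabs, newlines)
# is treated as a single delimiter.
default_split = text.split()
print(default_split)  # ['one', 'two', 'three', 'four']

# Explicit single space: consecutive spaces produce empty strings,
# and tabs or newlines are not treated as delimiters.
space_split = text.split(' ')
print(space_split)  # ['one', '', 'two\tthree\nfour']
```

This is why calling split() without arguments is usually the safer choice when counting words.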

The next step is to use the set() class to convert the list of words to a set object.

main.py


my_str = 'one one two two'

unique_words = set(my_str.split())
print(unique_words)  # 👉️ {'one', 'two'}

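The set() class takes an optional iterable argument and returns a new set object with elements taken from the iterable.

Set objects store an unordered collection of unique elements, so converting a list to a set removes all duplicate elements.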
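As a small illustration, the sketch below converts a list with duplicates to a set. Because sets are unordered, the order in which the elements print is not guaranteed, so the example sorts them for a stable display. The word list is a made-up sample.

```python
words = ['one', 'two', 'one', 'three', 'two']

# Duplicates are dropped when the list is converted to a set.
unique_words = set(words)

# Sets have no defined order, so sort before printing.
print(sorted(unique_words))  # ['one', 'three', 'two']

# Calling set() with no argument returns an empty set.
empty = set()
print(len(empty))  # 0
```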

The last step is to use the len() function to get the number of unique words.

main.py


my_str = 'one one two two'

unique_words = set(my_str.split())
print(unique_words)  # 👉️ {'one', 'two'}

length = len(unique_words)
print(length)  # 👉️ 2

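The len() function returns the length (the number of items) of an object.

The argument the function takes may be a sequence (a string, tuple, list, range or bytes) or a collection (a dictionary, set, or frozen set).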
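For reference, a short sketch of len() applied to a few of these types. The values are arbitrary samples.

```python
# len() works on sequences and collections alike.
lengths = {
    'str': len('hello'),
    'list': len(['a', 'b']),
    'range': len(range(10)),
    'dict': len({'a': 1}),
    'set': len({'one', 'one', 'two'}),  # duplicates collapse in a set
}

print(lengths)
# {'str': 5, 'list': 2, 'range': 10, 'dict': 1, 'set': 2}
```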

Count the unique words in a text file in Python

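To count the unique words in a text file:

Read the contents of the file into a string and split it into words.
Use the set() class to convert the list to a set object.
Use the len() function to count the unique words in the text file.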

main.py


with open('example.txt', 'r', encoding='utf-8') as f:
    words = f.read().split()
    print(words)  # 👉️ ['one', 'one', 'two', 'two', 'three', 'three']

    unique_words = set(words)
    print(len(unique_words))  # 👉️ 3

    print(unique_words)  # {'three', 'one', 'two'}

The example above assumes that you have a file named example.txt with the following contents.

example.txt


one one
two two
three three

We open the file in read mode and use the read() method to read its contents into a string.
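If you don't have example.txt at hand, the following self-contained sketch writes the same three lines to a temporary file first and then reads them back. The temporary directory is only an assumption made so the example can run anywhere.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.txt')

    # Create the sample file used throughout this section.
    with open(path, 'w', encoding='utf-8') as f:
        f.write('one one\ntwo two\nthree three\n')

    # read() returns the whole file as one string, newlines included.
    with open(path, 'r', encoding='utf-8') as f:
        content = f.read()

print(repr(content))  # 'one one\ntwo two\nthree three\n'

# Newlines count as whitespace, so split() handles them as well.
unique_count = len(set(content.split()))
print(unique_count)  # 3
```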

The next step is to use the str.split() method to split the string into a list of words.

main.py


with open('example.txt', 'r', encoding='utf-8') as f:
    words = f.read().split()
    print(words)  # 👉️ ['one', 'one', 'two', 'two', 'three', 'three']

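The str.split() method splits a string into a list of substrings using a delimiter.

When no separator is passed to the str.split() method, it splits the input string on one or more whitespace characters.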

We use the set() class to convert the list to a set object.

main.py


with open('example.txt', 'r', encoding='utf-8') as f:
    words = f.read().split()
    print(words)  # 👉️ ['one', 'one', 'two', 'two', 'three', 'three']

    unique_words = set(words)
    print(len(unique_words))  # 👉️ 3

    print(unique_words)  # {'three', 'one', 'two'}

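The set() class takes an optional iterable argument and returns a new set object with elements taken from the iterable.

Set objects are an unordered collection of unique elements, so converting a list to a set removes all duplicate elements.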

The last step is to use the len() function to get the count of unique words in the file.

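The len() function returns the length (the number of items) of an object.

The argument the function takes may be a sequence (a string, tuple, list, range or bytes) or a collection (a dictionary, set, or frozen set).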
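Note that set membership is based on exact string equality, so 'One', 'one' and 'one,' would count as three different words. If that is not what you want, one possible approach is to normalize each word first. The sketch below is only an illustration: the helper name count_unique_words and the choice to lowercase and strip surrounding punctuation are assumptions, not part of the examples above.

```python
import string


def count_unique_words(text):
    # Hypothetical helper: lowercase each token and strip leading and
    # trailing punctuation before comparing.
    normalized = (
        word.strip(string.punctuation).lower()
        for word in text.split()
    )
    # Skip tokens that consisted of punctuation only.
    return len({word for word in normalized if word})


print(count_unique_words('One one, two TWO. - three'))  # 3
```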

Count the number of unique words in a String using a for loop

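This is a five-step process:

Declare a new variable that stores an empty list.
Use the str.split() method to split the string into a list of words.
Use a for loop to iterate over the list.
Use the list.append() method to append all unique words to the list.
Use the len() function to get the length of the list.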

main.py


my_str = 'one one two two'

unique_words = []

for word in my_str.split():
    if word not in unique_words:
        unique_words.append(word)


print(len(unique_words)) # 👉️ 2

print(unique_words) # 👉️ ['one', 'two']

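We used the str.split() method to split the string into a list of words and used a for loop to iterate over the list.

On each iteration, we use the not in operator to check whether the element is not already present in the list.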

The in operator tests for membership. For example, x in l evaluates to True if x is a member of l, otherwise it evaluates to False.

x not in l returns the negation of x in l.
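A quick sketch of both operators on a small sample list:

```python
words = ['one', 'two']

is_member = 'one' in words
print(is_member)  # True

is_missing_member = 'three' in words
print(is_missing_member)  # False

# not in is the negation of in.
is_absent = 'three' not in words
print(is_absent)  # True
```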

The list.append() method adds an item to the end of the list.

main.py


my_list = ['bobby', 'hadz']

my_list.append('com')

print(my_list)  # 👉️ ['bobby', 'hadz', 'com']

The last step is to use the len() function to get the number of unique words in the string.

Count the unique words in a text file using a for loop

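This is a five-step process:

Declare a new variable that stores an empty list.
Read the contents of the file into a string and split it into words.
Use a for loop to iterate over the list.
Use the list.append() method to append all unique words to the list.
Use the len() function to get the length of the list.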

main.py


unique_words = []

with open('example.txt', 'r', encoding='utf-8') as f:
    words = f.read().split()
    print(words)  # 👉️ ['one', 'one', 'two', 'two', 'three', 'three']

    for word in words:
        if word not in unique_words:
            unique_words.append(word)


print(len(unique_words))  # 👉️ 3

print(unique_words)  # 👉️ ['one', 'two', 'three']

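We read the contents of the file into a string and used the str.split() method to split the string into a list of words.

On each iteration, we use the not in operator to check whether the word is not already present in the list of unique words.

If the condition is met, we use the list.append() method to append the value to the list.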
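Unlike the set-based approach, the for loop keeps the words in the order they first appear. One thing to keep in mind is that a not in check against a list scans the list, which can become slow for large inputs. A possible variation, sketched below, keeps a separate set for the membership check while still building an ordered list. The function name unique_in_order is made up for this example.

```python
def unique_in_order(words):
    # Hypothetical helper: the set gives fast membership checks,
    # the list preserves first-seen order.
    seen = set()
    ordered = []

    for word in words:
        if word not in seen:
            seen.add(word)
            ordered.append(word)

    return ordered


result = unique_in_order('one one two two three three'.split())

print(result)  # ['one', 'two', 'three']
print(len(result))  # 3
```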