Find common substring between two strings in Python

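To find a common substring between two strings:

1. Use the SequenceMatcher class to create a matcher object for the two strings.
2. Call the find_longest_match() method on it to find the longest matching substring.
3. The method returns a Match object that describes the longest matching block in the provided strings.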

main.py


from difflib import SequenceMatcher

string1 = 'one two three four'
string2 = 'one two nine ten'

match = SequenceMatcher(None, string1, string2).find_longest_match(
    0, len(string1), 0, len(string2))

print(match)  # 👉️ Match(a=0, b=0, size=8)

# 👇️ one two
print(string1[match.a:match.a + match.size])

# 👇️ one two
print(string2[match.b:match.b + match.size])

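We passed the following 3 arguments to the SequenceMatcher class:

Name	Description
`isjunk`	A function that returns true if an element is junk and should be ignored. We passed `None` for `isjunk`, so no elements are ignored.
`a`	The first sequence to compare. Defaults to an empty string.
`b`	The second sequence to compare. Defaults to an empty string.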
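To illustrate what the isjunk argument changes, here is a minimal sketch adapted from the example in the difflib documentation. The lambda that flags a space as junk and the variable names are only illustrative assumptions; the rest of this article keeps passing None.

```python
from difflib import SequenceMatcher

# isjunk=None: a space is an ordinary character and can be part of a match
plain = SequenceMatcher(None, ' abcd', 'abcd abcd')
plain_match = plain.find_longest_match(0, 5, 0, 9)
print(plain_match)  # Match(a=0, b=4, size=5)

# Hypothetical junk rule: treat a single space as junk
ignore_spaces = SequenceMatcher(lambda ch: ch == ' ', ' abcd', 'abcd abcd')
junk_match = ignore_spaces.find_longest_match(0, 5, 0, 9)
print(junk_match)  # Match(a=1, b=0, size=4)
```

With the space treated as junk, ' abcd' can no longer match the ' abcd' at the end of the second string directly, so only 'abcd' matches, and the leftmost occurrence in the second string is reported.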

The SequenceMatcher class is used to compare pairs of sequences of any type,
so long as the sequence elements are hashable.

Instantiating the SequenceMatcher class returns a matcher object that implements a find_longest_match() method.

The find_longest_match method finds the longest matching block in the provided sequences and returns it as a Match named tuple with the attributes a, b and size.

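The 4 arguments we passed to the method are the start and end index to search in a, followed by the start and end index to search in b. Passing 0 and the length of each string indicates that we want to find the longest
match in the entirety of a and b.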

main.py


from difflib import SequenceMatcher

string1 = 'one two three four'
string2 = 'one two nine ten'

match = SequenceMatcher(None, string1, string2).find_longest_match(
    0, len(string1), 0, len(string2))

print(match)  # 👉️ Match(a=0, b=0, size=8)

# 👇️ one two
print(string1[match.a:match.a + match.size])

# 👇️ one two
print(string2[match.b:match.b + match.size])
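The four index arguments can also restrict the search to a part of each string. The following is a minimal sketch that reuses the strings from the example and skips the first 4 characters of both; the variable names partial_match and partial_substring are made up for illustration.

```python
from difflib import SequenceMatcher

string1 = 'one two three four'
string2 = 'one two nine ten'

# Only search string1[4:] and string2[4:], which skips the leading 'one '
partial_match = SequenceMatcher(None, string1, string2).find_longest_match(
    4, len(string1), 4, len(string2))

print(partial_match)  # Match(a=4, b=4, size=4)

# The indices in the Match object still refer to the full strings
partial_substring = string1[partial_match.a:partial_match.a + partial_match.size]
print(repr(partial_substring))  # 'two '
```

Note that a and b in the result are positions in the original strings, not offsets into the searched slices.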

The common substring doesn’t have to be at the beginning of the string.

main.py


from difflib import SequenceMatcher

string1 = 'four five one two three four'
string2 = 'zero eight one two nine ten'

match = SequenceMatcher(None, string1, string2).find_longest_match(
    0, len(string1), 0, len(string2))

print(match)  # 👉️ Match(a=9, b=10, size=9)

# 👇️ ' one two '
print(string1[match.a:match.a + match.size])

# 👇️ ' one two '
print(string2[match.b:match.b + match.size])

Notice that the common substring contains leading and trailing whitespace.

You can use the str.strip() method if you need to remove the leading and
trailing whitespace characters.

main.py


from difflib import SequenceMatcher

string1 = 'four five one two three four'
string2 = 'zero eight one two nine ten'

match = SequenceMatcher(None, string1, string2).find_longest_match(
    0, len(string1), 0, len(string2))

print(match)  # 👉️ Match(a=9, b=10, size=9)

# 👇️ 'one two'
print(string1[match.a:match.a + match.size].strip())

# 👇️ 'one two'
print(string2[match.b:match.b + match.size].strip())

The str.strip method returns a copy of the string with the leading and trailing whitespace removed.

If you only need to find a leading common substring between two strings, you can also use the os.path.commonprefix method.

main.py


import os

string1 = 'one two three four'
string2 = 'one two nine ten'

common_substring = os.path.commonprefix([string1, string2])
print(common_substring)  # 👉️ one two

The os.path.commonprefix method takes a list of strings and returns the longest prefix that is shared by all strings in the list. The comparison is made character by character, so the strings don't have to be file system paths.

If the list is empty, an empty string is returned.
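A short sketch of these two edge cases, with variable names chosen only for illustration. Because the comparison works one character at a time, the result can end in the middle of a word or a path component.

```python
import os

# An empty list gives an empty string
empty_result = os.path.commonprefix([])
print(repr(empty_result))  # ''

# The prefix can stop in the middle of a path component
path_result = os.path.commonprefix(['/usr/lib', '/usr/local/lib'])
print(path_result)  # /usr/l

# The same applies to ordinary words
word_result = os.path.commonprefix(['interview', 'internet'])
print(word_result)  # inter
```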

The commonprefix() method can find the leading common substring between as
many strings as necessary.

main.py


import os

string1 = 'one two three four'
string2 = 'one two nine ten'
string3 = 'one two eight'

common_substring = os.path.commonprefix([string1, string2, string3])
print(common_substring)  # 👉️ one two

However, the method wouldn’t work if the common substring is not at the
beginning of each string.

main.py


import os

string1 = 'one two three four'
string2 = 'eight one two nine ten'

common_substring = os.path.commonprefix([string1, string2])
print(common_substring)  # 👉️ ""

In this case, you have to use the find_longest_match() method from the first
example.
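As a rough sketch, the find_longest_match() approach can be wrapped in a small helper and applied to the two strings for which commonprefix() returned an empty string. The function name longest_common_substring is hypothetical, and passing autojunk=False is an assumption made here so that the heuristic SequenceMatcher applies to very frequent elements in long sequences (200 items or more) does not affect the result.

```python
from difflib import SequenceMatcher


def longest_common_substring(first, second):
    """Return the longest substring shared by both strings ('' if none)."""
    # autojunk=False disables the "popular element" heuristic for long inputs
    matcher = SequenceMatcher(None, first, second, autojunk=False)
    match = matcher.find_longest_match(0, len(first), 0, len(second))
    return first[match.a:match.a + match.size]


string1 = 'one two three four'
string2 = 'eight one two nine ten'

result = longest_common_substring(string1, string2)
print(repr(result))  # 'one two '

# No shared characters at all gives an empty string
print(repr(longest_common_substring('abc', 'xyz')))  # ''
```

The returned substring keeps its trailing space, so call str.strip() on it if the surrounding whitespace is not wanted.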
