LeetCode 187 Repeated DNA Sequences (Python)

Posted by 小明MaxMing on October 17, 2020

题目

All DNA is composed of a series of nucleotides abbreviated as ‘A’, ‘C’, ‘G’, and ‘T’, for example: “ACGAATTCCG”. When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

解题思路

对于每个子串,存到set中,如果出现过,则加到结果里。可以使用RK算法进行哈希,降低复杂度,可以参考LeetCode 1044 Longest Duplicate SubstringLeetcode1044

代码

class Solution:
    def findRepeatedDnaSequences(self, s: str) -> List[str]:
        seen = set()
        res = set()
        for i in range(len(s) - 9):
            tmp=s[i: i + 10]
            if tmp in seen:
                res.add(tmp)
            else:
                seen.add(tmp)
        return res
class Solution:
    def findRepeatedDnaSequences(self, s: str) -> List[str]:
        L, n = 10, len(s)
        if n <= L:
            return []
        a = 4
        MOD = 2 ** 63 - 1
        aL = pow(a, L, MOD)
        to_int = {'A': 0, 'C': 1, 'G': 2, 'T': 3}
        nums = [to_int[s[i]] for i in range(n)]
        h = 0
        for i in range(L):
            h = h * a + nums[i]
        seen, res = {h, }, set()
        for i in range(1, n - L + 1):
            h = (h * a - nums[i - 1] * aL + nums[i + L - 1]) % MOD
            if h in seen:
                res.add(s[i: i + L])
            else:
                seen.add(h)
        return res

视频讲解 YouTube<--欢迎点击订阅

视频讲解 bilibili<--欢迎点击订阅