Rabin Karp Algorithm – Character Match

When we have a string pattern and want to match it with the substring of a given string, what can we do?

The tuition way is that we could use substring() to match in a for loop. But the Time Complexity would be O(N* L), N refer to the string length, L refer to the pattern length.

But in Rabin Karp way, the Time Complexity would be O(N). We convert String into Number in this case, and use a sliding window to care only the right digit and left digit in this window. In details, we calculate the hash value for pattern, and check if the hash value same for both pattern and each substring we get.

187. Repeated DNA Sequences

The DNA sequence is composed of a series of nucleotides abbreviated as 'A''C''G', and 'T'.

  • For example, "ACGAATTCCG" is a DNA sequence.

When studying DNA, it is useful to identify repeated sequences within the DNA.

Given a string s that represents a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.

Example 1:

Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC","CCCCCAAAAA"]

Example 2:

Input: s = "AAAAAAAAAAAAA"
Output: ["AAAAAAAAAA"]
enumeration
List<String> findRepeatedDnaSequences(String s) {
    int n = s.length();

    HashSet<String> seen = new HashSet(); // to store every DNA sequence we have seen
    HashSet<String> res = new HashSet<>(); // to store every DNA sqeuenct existed in seen

    for (int i = 0; i + 10 <= n; i++) {
        String subStr = s.substring(i, i + 10); // get the substring as 10 character long
        if (seen.contains(subStr)){
            res.add(subStr);
        } else {
            seen.add(subStr);
        }
    }
    return new LinkedList<>(res);
}
Rabin karp
List<String> findRepeatedDnaSequences(String s) {
    // We only care 4 Characters, so we assign integer onto them
    int[] nums = new int[s.length()];
    for (int i = 0; i < nums.length; i++) {
        switch (s.charAt(i)) {
            case 'A':
                nums[i] = 0;
                break;
            case 'G':
                nums[i] = 1;
                break;
            case 'C':
                nums[i] = 2;
                break;
            case 'T':
                nums[i] = 3;
                break;
        }
    }

    HashSet<Integer> seen = new HashSet<>(); // the interger for character sequence we have seen
    HashSet<String> res = new HashSet<>(); // the duplicate character sequence
    
    int L = 10;
 // length of sequence
    int R = 4; // quaternary
 
    // R^(L - 1)
    int RL = (int) Math.pow(R, L - 1);
    
    int windowHash = 0;
    
    
    int left = 0, right = 0;
    while (right < nums.length) {
        // add new number as the lowest digit
        windowHash = R * windowHash + nums[right];
        right++;

        // when sequence length is 10
        if (right - left == L) {
            // check if seen
            if (seen.contains(windowHash)) {

                res.add(s.substring(left, right));
            } else {

                seen.add(windowHash);
            }
            // remove the digit in the highest digit
            windowHash = windowHash - nums[left] * RL;
            left++;
        }
    }
    return new LinkedList<>(res);
}

28. Find the Index of the First Occurrence in a String

Given two strings needle and haystack, return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

Example 1:

Input: haystack = "sadbutsad", needle = "sad"
Output: 0
Explanation: "sad" occurs at index 0 and 6.
The first occurrence is at index 0, so we return 0.

Example 2:

Input: haystack = "leetcode", needle = "leeto"
Output: -1
Explanation: "leeto" did not occur in "leetcode", so we return -1.
BRUTE FORCE

int rabinKarp(String txt, String pat) {
   
    int L = pat.length();
    
    int R = 256;
    

    // for module calculate
    long Q = 1658598167;
    
    // RL = Math.pow(R, L - 1);
    long RL = 1;
    for (int i = 1; i <= L - 1; i++) {

        RL = (RL * R) % Q;
    }
    
    long patHash = 0;
    for (int i = 0; i < pat.length(); i++) {
        patHash = (R * patHash + pat.charAt(i)) % Q;
    }
    
    
    long windowHash = 0;
    
    
    int left = 0, right = 0;
    while (right < txt.length()) {
        windowHash = ((R * windowHash) % Q + txt.charAt(right)) % Q;
        right++;


        if (right - left == L) {
            if (windowHash == patHash) {

                // check hash collision
                if (pat.equals(txt.substring(left, right))) {
                    return left;
                }
            }
            

            // + Q to avoid negative integer
            windowHash = (windowHash - (txt.charAt(left) * RL) % Q + Q) % Q;

            left++;
        }
    }

    return -1;
}