When we have a string pattern and want to match it with the substring of a given string, what can we do?
The tuition way is that we could use substring() to match in a for loop. But the Time Complexity would be O(N* L), N refer to the string length, L refer to the pattern length.
But in Rabin Karp way, the Time Complexity would be O(N). We convert String into Number in this case, and use a sliding window to care only the right digit and left digit in this window. In details, we calculate the hash value for pattern, and check if the hash value same for both pattern and each substring we get.
187. Repeated DNA Sequences
The DNA sequence is composed of a series of nucleotides abbreviated as 'A'
, 'C'
, 'G'
, and 'T'
.
- For example,
"ACGAATTCCG"
is a DNA sequence.
When studying DNA, it is useful to identify repeated sequences within the DNA.
Given a string s
that represents a DNA sequence, return all the 10
-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.
Example 1:
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT" Output: ["AAAAACCCCC","CCCCCAAAAA"]
Example 2:
Input: s = "AAAAAAAAAAAAA" Output: ["AAAAAAAAAA"]
enumeration
List<String> findRepeatedDnaSequences(String s) {
int n = s.length();
HashSet<String> seen = new HashSet(); // to store every DNA sequence we have seen
HashSet<String> res = new HashSet<>(); // to store every DNA sqeuenct existed in seen
for (int i = 0; i + 10 <= n; i++) {
String subStr = s.substring(i, i + 10); // get the substring as 10 character long
if (seen.contains(subStr)){
res.add(subStr);
} else {
seen.add(subStr);
}
}
return new LinkedList<>(res);
}
Rabin karp
List<String> findRepeatedDnaSequences(String s) {
// We only care 4 Characters, so we assign integer onto them
int[] nums = new int[s.length()];
for (int i = 0; i < nums.length; i++) {
switch (s.charAt(i)) {
case 'A':
nums[i] = 0;
break;
case 'G':
nums[i] = 1;
break;
case 'C':
nums[i] = 2;
break;
case 'T':
nums[i] = 3;
break;
}
}
HashSet<Integer> seen = new HashSet<>(); // the interger for character sequence we have seen
HashSet<String> res = new HashSet<>(); // the duplicate character sequence
int L = 10;
// length of sequence
int R = 4; // quaternary
// R^(L - 1)
int RL = (int) Math.pow(R, L - 1);
int windowHash = 0;
int left = 0, right = 0;
while (right < nums.length) {
// add new number as the lowest digit
windowHash = R * windowHash + nums[right];
right++;
// when sequence length is 10
if (right - left == L) {
// check if seen
if (seen.contains(windowHash)) {
res.add(s.substring(left, right));
} else {
seen.add(windowHash);
}
// remove the digit in the highest digit
windowHash = windowHash - nums[left] * RL;
left++;
}
}
return new LinkedList<>(res);
}
28. Find the Index of the First Occurrence in a String
Given two strings needle
and haystack
, return the index of the first occurrence of needle
in haystack
, or -1
if needle
is not part of haystack
.
Example 1:
Input: haystack = "sadbutsad", needle = "sad" Output: 0 Explanation: "sad" occurs at index 0 and 6. The first occurrence is at index 0, so we return 0.
Example 2:
Input: haystack = "leetcode", needle = "leeto" Output: -1 Explanation: "leeto" did not occur in "leetcode", so we return -1.
BRUTE FORCE
int rabinKarp(String txt, String pat) {
int L = pat.length();
int R = 256;
// for module calculate
long Q = 1658598167;
// RL = Math.pow(R, L - 1);
long RL = 1;
for (int i = 1; i <= L - 1; i++) {
RL = (RL * R) % Q;
}
long patHash = 0;
for (int i = 0; i < pat.length(); i++) {
patHash = (R * patHash + pat.charAt(i)) % Q;
}
long windowHash = 0;
int left = 0, right = 0;
while (right < txt.length()) {
windowHash = ((R * windowHash) % Q + txt.charAt(right)) % Q;
right++;
if (right - left == L) {
if (windowHash == patHash) {
// check hash collision
if (pat.equals(txt.substring(left, right))) {
return left;
}
}
// + Q to avoid negative integer
windowHash = (windowHash - (txt.charAt(left) * RL) % Q + Q) % Q;
left++;
}
}
return -1;
}