Edit Distance

The edit distance algorithm is very popular among the data scientists. It's one of the basic algorithms used for evaluation of machine translation and speech recognition.

The naive approach would be to check for all possible edit sequences and choose the shortest one in-between. That would result in an exponential complexity and it's an overkill since we actually don't need to have all possible edit sequences but just the shortest one.


Here is an example for this approach in LeetCode.

Use D[i][j] refer to the edit distance for word 1 and word 2. D[i][j] = edit distance between word 1[1...i] word 2[1...j].

It turns out that one could compute D[i][j], knowing D[i - 1][j], D[i][j - 1] and D[i - 1][j - 1].


If the last character is the same, i.e. word1[i] = word2[j] then

D[i][j] = 1 + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1] - 1)

and if not, i.e. word1[i] != word2[j] we have to take into account the replacement of the last character during the conversion.

D[i][j] = 1 + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])

The above assumption means:

1.When the last character is the same:
  • [i-1][j] -> [i][j]: edit distance of change i-1 of word 1 to j of word 2, plus one more distance of deleteing the i of word 1.
  • [i][j-1] -> [i][j]: edit distance of change i of word 1 to j-1 of word 2, plus one more distace of adding the j of word 2.
  • [i-1][j-1] - 1 -> [i][j]: edit distance of change i-1 of word 1 to j-1 of word 2, no need more distance of minus one first and plus one then
2.When the last character is not the same:
  • [i-1][j] -> [i][j]: edit distance of change i-1 of word 1 to j of word 2, plus one more distance of deleteing the i of word 1.
  • [i][j-1] -> [i][j]: edit distance of change i of word 1 to j-1 of word 2, plus one more distace of adding the j of word 2.
  • [i-1][j-1] -> [i][j]: edit distance of change i-1 of word 1 to j-1 of word 2, plus one more distance of replacing the i of word 1 with j of the word 2

The code is below:

class Solution {
    public int minDistance(String word1, String word2) {

        // got length of word 1 and word 2
        int len1 = word1.length();
        int len2 = word2.length();

        // if one of word is empty, need to change all of another
        if( len1 * len2 == 0 )
            return len1 + len2;

        // build the dp array 
        int [][]edit = new int[len1+1][len2+1];

        // build the basic case for change word 1 to empty
        for( int i = 0; i <= len1; i++ )
            edit[i][0] = i;

        // build the basic case for change empty to word 2
        for( int j = 0; j <= len2; j++ )
            edit[0][j] = j;

        // calculate 
        for( int i = 1; i <= len1; i++ ){ // from 1 bc it means the actually length
            for( int j = 1; j <= len2; j++ ){

                if( word1.charAt(i-1) == word2.charAt(j-1) ){
                    edit[i][j] = edit[i-1][j-1]; // no need to change 
                }
                else{
                    edit[i][j] = 1 + Math.min( edit[i-1][j-1], Math.min( edit[i-1][j], edit[i][j-1] ));
                }

            }
        }

        return edit[len1][len2];
    }
}

Leave a Reply