Wed Dec 08, 2010 5:57 am by admin 


Say you have one string of alphabetic characters, and say you have another, guaranteed smaller string of alphabetic characters. Algorithmically speaking, what's the fastest way to find out if all the characters in the smaller string are in the larger string?
For example, if the two strings were:
String 1: ABCDEFGHLMNOPQRS
String 2: DCGSRQPOM
You'd get true as every character in string2 is in string1. If the two strings were:
String 1: ABCDEFGHLMNOPQRS
String 2: DCGSRQPOZ
you'd get false as Z isn't in the first string.
The naive way to do this operation would be to iterate over the 2nd string once for each character in the 1st string. That'd be O(n*m) in algorithm parlance where n is the length of string1 and m is the length of string2. Given the strings in our above example, thats 16*8 = 128 operations in the worst case.
A slightly better way would be to sort each string and then do a stepwise iteration of both sorted strings simultaneously. Sorting both strings would be (in the general case) O(m log m) + O(n log n) and the linear scan after that is O(m+n). Again for our strings above, that would be 16*4 + 8*3 = 88 plus a linear scan of both strings at a cost of 16 + 8 = 24. Thats 88 + 24 = 112 total operations. Slightly better. (As the size of the strings grow, this method would start to look better and better)
Finally, the best method would simply be O(n+m). That is, iterate through the first string and put each character in a hashtable (cost of O(n) or 16). Then iterate the 2nd string and query the hashtable for each character you find. If its not found, you don't have a match. That would cost 8 operations  so both operations together is a total of 24 operations. Not bad and way better than the other solutions.
Is there another solution?
What if  given that we have a limited range of possible characters  I assigned each character of the alphabet to a prime number starting with 2 and going up from there. So A would be 2, and B would be 3, and C would be 5, etc. And then I went through the first string and 'multiplied' each character's prime number together. You'd end up with some big number right? And then  what if I iterated through the 2nd string and 'divided' by every character in there. If any division gave a remainder  you knew you didn't have a match. If there was no remainders through the whole process, you knew you had a subset. Would that work? 




Mon Jan 10, 2011 1:30 am by harryaskham 


The primes solution fails if the second string contains, say, 2 Ds when the original string has only 1  you would try and divide by the corresponding prime twice when it exists only once as a prime factor, giving a remainder.
For what it's worth, I think a hashing solution is overkill, given the space of only 26 characters  collisions would be very unlikely. Instead, a similar solution would be to simply keep a size26 array of booleans  iterate over the first string, and set booleans[letterCode] to true for each. Then, you can simply iterate over the second string, returning false if booleans[letterCode] is false for any character, and true after the iteration has completed without returning.
So basically a hash table, but since the key is the value, it itself can be used as the index (I tried both in Java and although both were fast, the arraybacked solution used about 1/3 the time). 




Thu Sep 08, 2016 1:55 am by Zyxion 


The method of assigning primes works even if the second string has, say, 2 d's while the first string only has one if you simply maintain the one large m,multiplied together value. So, if the multiplied value from the first string is some big number, you just go to each letter in the second string and do the big number divided by the letter's corresponding prime. So for each d you do the same mathematical operation, N/d. Two d's just means you do it twice, you don't divide by d then use the result of that division for the next letter. 






All times are GMT

Page 1 of 1 
