The string 's' is hashed once for each of the 'k' hash functions, as though we were planning to add it to the filter. Instead of adding it however, we examine the bit that we would have set, and consider its value.
If the bit is 1 (set), the string we are hashing may be in the filter, since it would have set this bit when it was originally hashed. However, it may also be that another string just happened to produce a hash value that would also set this bit. That would be a false positive. This is why we have k > 1, so we can minimize the likelihood of false positives occuring.
If every bit corresponding to every one of the k hashes of our query string is set, we can say with some probability of being correct that the string we are holding is indeed "in" the filter. However, we can never be sure.
If, however, as we hash our string and peek at the resulting bit in the filter, we find the bit is 0 (not set)... well now, that's different. In this case, we can say with absolute certainty that the string we are holding is not in the filter, because if it were, this bit would have to be set.
In this way, the Bloom filter can answer NO with absolute surety, but can only speak a qualified YES.
|bloom||Bloom filter |
|buf||string to check |
|len||the length of the string |
- false if string does not exist in the filter
true if string is may be in the filter