Home Bitmap and Bloom Filter
Post
Cancel

Bitmap and Bloom Filter


Contents


What is a Bitmap?

A bitmap is an array of bits that efficiently represents data in binary form. Each bit corresponds to a particular value or condition, and the state of the bit (0 or 1) indicates the absence or presence of that value.

Bitmaps can use a very small space to quickly determine if an element is in a certain set.

Advantages of Bitmaps

  • Memory Efficiency: Bitmaps are highly space-efficient, especially when representing large sets of binary data.
  • Fast Operations: Bit-level operations are extremely fast, making bitmaps ideal for high-performance applications.
  • Simple Implementation: The structure and manipulation of bitmaps are straightforward, allowing for easy implementation and use.

Applications of Bitmaps

Bitmaps are used in various applications, including:

  1. Image Representation: In graphics, bitmaps represent images, where each pixel’s color is stored as a series of bits.
  2. Data Compression: Bitmaps are used in data compression techniques to represent repetitive data efficiently.
  3. Membership Testing: In databases and information retrieval, bitmaps quickly check the existence of elements within a set.
  4. Network Routing: Bitmaps help manage routing tables and efficiently handle network paths.
  5. Bloom Filters: Bitmaps are a core component of bloom filters, used for probabilistic membership testing.

How Bitmaps Work

Basic Operations

  1. Setting a Bit: To set a bit at a specific position, use the bitwise OR operation.
  2. Clearing a Bit: To clear a bit, use the bitwise AND operation with a mask.
  3. Checking a Bit: To check the state of a bit, use the bitwise AND operation.

Example in Java

Here’s a simple example of how to use a bitmap in Java:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
public class Bitmap {
    private byte[] bitmap; // we can also use int[], long[], char[], etc.

    public Bitmap(int size) {
        bitmap = new byte[(size + 7) / 8]; // the byte holds 8 bits
    }

    public void setBit(int position) {
        int index = position / 8;
        int offset = position % 8;
        bitmap[index] |= (1 << offset);
    }

    public void clearBit(int position) {
        int index = position / 8;
        int offset = position % 8;
        bitmap[index] &= ~(1 << offset);
    }

    public boolean checkBit(int position) {
        int index = position / 8;
        int offset = position % 8;
        return (bitmap[index] & (1 << offset)) != 0;
    }
}

Bloom Filter: Applications of Bitmap

A Bloom filter is a binary array of m bits, all initially set to 0, accompanied by k hash functions. Each hash function maps elements to a bit in the array.

Bloom filters allow for false positives (indicating an element is in the set when it is not) but no false negatives (an element not being in the set when it is). In other words, Bloom filters can be used to query whether an element is “probably in the set” or “definitely not in the set”.

Why do We Need Bloom Filters?

  • Because if the range of the element to be checked is too large, it will lead to low space utilisation of the bitmap.
    • For example, if we want to check if a number in the range of 1 billion is in the bitmap, then we need 1 billion bits of memory. If the number to be checked is only 100, then there is a lot of wasted space

A simple solution to the above problem: hash the number to force it to a smaller fixed range. But in this way, the probability of hash conflict is too high (i.e. the probability of false positive is too high).

  • That’s why we need Bloom filters.
    • Since there is a high probability of hash conflict with a single hash function, the probability of hash conflict (false positive) can be greatly reduced by using multiple hash functions.

Note: Using the Bloom filter, the probability of false positive can only be reduced, not eliminated.

false negatives: we only added 146 and 196, but 177 will check as positive

How Bloom Filters Work

  1. Insertion: To add an element, compute k hash values for the element, each determining a position in the bit array. Set the bits at these positions to 1.
  2. Query: To check if an element is in the set, compute its k hash values. If all corresponding bits are 1, the element is probably in the set. If any bit is 0, the element is definitely not in the set.

Example in Java

Here’s a simple example of implementing a Bloom filter in Java:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
public class BloomFilter {
    private BitSet bitSet;
    private int bitArraySize;
    private int numberOfHashFunctions;

    public BloomFilter(int size, int hashFunctions) {
        bitSet = new BitSet(size);
        bitArraySize = size;
        numberOfHashFunctions = hashFunctions;
    }

    private int[] hash(Object item) {
        int[] hashes = new int[numberOfHashFunctions];
        Random random = new Random(item.hashCode());
        for (int i = 0; i < numberOfHashFunctions; i++) {
            hashes[i] = random.nextInt(bitArraySize);
        }
        return hashes;
    }

    public void add(Object item) {
        int[] hashes = hash(item);
        for (int hash : hashes) {
            bitSet.set(hash);
        }
    }

    public boolean contains(Object item) {
        int[] hashes = hash(item);
        for (int hash : hashes) {
            if (!bitSet.get(hash)) {
                return false;
            }
        }
        return true;
    }
}

Limitations of Bloom Filters

  • False Positives: May incorrectly indicate that an element is in the set.
  • Non-Removable Elements: Once an element is added, it cannot be removed without potential errors.



Reference:

  • Wang, Zheng (2019) The Beauty of Data Structures and Algorithms. Geek Time.
This post is licensed under CC BY 4.0 by the author.
ip