In-depth Understanding of Redis Data Structures - Dictionaries

A dictionary, also known as a symbolic table, associative array, or mapping, is an abstract data structure used to hold key-value pairs. In a dictionary, a key can be associated with a value, and these associated keys and values are called key-value pairs. The keys in a key-value pair are unique, and we can look up or update the corresponding value by mapping the key to the value.

Many advanced development languages have collections that support dictionaries, such as the Map collection in Java, C does not have a built-in data structure like a dictionary, and Redis has built its own dictionary implementation.

Applications

Dictionaries are widely used in Redis, and the Redis database uses dictionaries as the underlying implementation of the data. Adding, deleting, changing, and checking data is also built on top of dictionaries.

When executing the command.

1	set msg "hello"

Create a key-value pair in the database with the key msg and the value hello, this key-value pair is implemented using a dictionary. Creating key-value pairs of other data types, such as list, hash, set, sort set, is also implemented using dictionaries.

A dictionary is also one of the underlying implementations of the hash data type, for example, a hash data type website contains 100 key-value pairs where the key is the URL name and the value is the web address:

redis> HGETALL website
"Redis"
"Redis.io"
"nacos"
"nacos.io"
.....

The underlying website key is a dictionary that wraps up to 100 key-value pairs, e.g.

The key in the key-value pair is “Redis” and the value is “Redis.io”.
The key in the key-value pair is “nacos”, and the value is “nacos.io”.

Dictionary implementation

Redis dictionaries use a hash table as the underlying implementation. A hash table has multiple hash table nodes inside it, and each hash table node holds the key-value pairs in the dictionary.

Hash Table

The hash table used by the Redis dictionary is represented by the dict.h/dictht structure.

/* This is our hash table structure. Every dictionary has two of this as we
 * implement incremental rehashing, for the old to the new table. */
typedef struct dictht {
     // table array
    dictEntry **table;
    // Size of the hash table
    unsigned long size;
    // equal to size-1, used to calculate the index value
    unsigned long sizemask;
    // Number of existing key-value pairs
    unsigned long used;
} dictht;

Note: This is the hash table structure, where each dictionary has two implementations of incremental re-hashing from the old hash table to the new one.

The table attribute is an array, each element of the array points to the hash node dictEntry, and each dictEntry structure holds a key-value pair.
The size records the size of the hash table, which is the size of the table array.
The used attribute records the number of key-value pairs currently available in the hash table. sizemask has a value equal to size-1, and this attribute, together with the hash table, determines where the keys should be placed in the table array.

The following figure shows an empty hash table of size 4 (no key-value pairs containing tasks).

Hash Table Nodes

Hash table nodes are represented using dictEntry structures, each of which holds a key-value pair.

typedef struct dictEntry {
    // key
    void *key;
    // value   
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
    } v;
    // Points to the next hash table node, forming a chain table
    struct dictEntry *next;
} dictEntry;

Among them.

key holds the key value
v holds the value. v can be a pointer to either a uint64_t integer or an int64_t integer.
next A pointer to another hash table node, this pointer joins multiple key-value pairs with the same hash value together as a way to solve the hash conflict problem.

The following figure shows two hash table nodes k0 and k1 with the same hash value of the key, which are connected together by the next pointer.

Dictionaries

The dictionary in Redis is represented by the dict.h/dict structure.

typedef struct dict {
    // Type-specific functions
    dictType *type;
   // Private functions
    void *privdata;
    // Hash Table
    dictht ht[2];
    // rehash index
    int rehashidx; /* rehashing not in progress if rehashidx == -1 */
    int iterators; /* number of iterators currently running */
} dict;

The type attribute and privadata attribute are set for different types of key-value pairs, for creating polymorphic dictionaries.
The type is a pointer to a dictType structure, and each dictType contains several sets of functions that operate on key-value pairs of a specific type, and Redis sets different functions for dictionaries with different purposes. The following diagram shows the dictType dictionary types.

typedef struct dictType {
  // Calculating hash values
  unsigned int (*hashFunction)(const void *key);
  // Copy key
  void *(*keyDup)(void *privdata, const void *key);
  // Copy Value
  void *(*valDup)(void *privdata, const void *obj);
  // Contrast Key
  int (*keyCompare)(void *privdata, const void *key1, const void *key2);
  // Destruction key
  void (*keyDestructor)(void *privdata, void *key);
 // Destruction value
 void (*valDestructor)(void *privdata, void *obj);
} dictType;

The privdata attribute holds optional arguments passed by the function for different type operations.
ht[2] is an array containing two number sizes, of type dictht hash table. The dictionary only uses the ht[0] hash table, and ht[1] will only be used when rehashing the ht[0] hash table.
The rehashidx records the progress of the rehash and has a value of -1 if there is no rehash currently in progress.

The following figure shows a dictionary in a normal state (without rehash).

Hashing algorithm

When a new key-value pair is to be added to the dictionary, the program needs to first calculate the hash value and index value based on the keys in the key-value pair, and then place the hash table containing the new key value at the specified index of the hash table array based on the index value.

The steps for Redis to calculate the hash and index values are as follows.

Calculates the hash of the key using the hash function set by the dictionary.

hash = dict->type->hashFunction(key)
Use the sizemask attribute of the hash table and the hash value to compute the hash value by taking the remainder.

index = hash & dict ->ht[x].sizemask

Students who have understood the underlying principle of HashMap should know that the above calculation of index values and HashMap to find the index subscript is similar to the principle.

What is remainder & operation?
The remainder is to calculate the remainder of the division of two numbers, for example, the length of an array is 4, the index range is 0 ~ 3, you need to place 0, 1, 7, placed as follows.

As an example, to add a key-value pair k0 and v0 to the empty dictionary table below.

First calculate the hash value of the key.

1	hash = dict->type->hashFunction(key)

Calculate the hash value of key k0.
Suppose the calculated hash value is 8, then calculate the index value.

1	index = dict -> hash & ht[0].sizemask = 8 & 3 = 0;

The index value 0 of key k0 is calculated, which indicates that the nodes of the key-value pair k0 and v0 are placed to the position of subscript 0 of the hash table array, as shown in the following figure.

Key Conflict

A key conflict occurs when two or more computed array index values agree.

Redis hash table uses the chain table method to solve the key conflict, each hash table node has a next pointer, multiple hash table nodes form a single linked table with the next pointer, and multiple nodes that are assigned to the same array index are connected using a one-way chain table, which solves the key conflict problem well.

For example, the program wants to add a key-value pair k2 and v2 to the following hash table, and calculate the index value of k2 as 2, then the keys k1 and k2 will conflict.

The solution to the conflict is to use the next pointer to connect the node where k2 and k1 are located, as shown in the following figure.

Summary

A dictionary is a mapped key-value pair data structure in which the keys are unique and the corresponding values can be found quickly by the unique keys.
Dictionaries are widely used in Redis databases.
- All key-value pairs of these data types use dictionaries as the underlying implementation.
- Hash type key-value pairs are also based on dictionary implementation.
Structure of the dictionary
- A dictionary contains a dictionary with a function dictType, two hash tables ht[2], ht[0] is used only for the dictionary, ht[1] is used only when it encounters expansion.
- A dictionary contains two hash tables, each hash table dictht contains a table array, size records the size of the array, sizemask is equal to size - 1, sizemask and hash value determine the location of the data in the table. used records the existing key-value pairs.
- The hash table node dictEntry structure holds a key-value pair, where key holds the key, V holds the value, V can be a pointer, can be uint64_t integer, can also be int64_t integer. next is to solve the key hash conflict, two keys calculated index in the same place in the array, use next ` pointer connected together.
To add a key-value pair, first calculate the hash value of the key by calling dict->type->hashFunction(key), then do the remainder operation with dictht’s sizemask, and calculate the index position of the table array to be stored. If a key conflict occurs, use the chain method to form a single linked table with multiple hash nodes via the next pointer.
The Redis dictionary implementation and the HashMap data structure in Java have the following similarities.
- Determine the index location: The key first uses the hash algorithm to calculate the hash value, and then does a remainder operation with the length of the array-1 to determine the subscript of the stored array.
- Resolve hash conflicts: two key values are calculated with the same index, using the chain method, multiple nodes are linked together by the next pointer.