From SQL to Redis Searches: an introduction

Indiana Redis

In my previous post I shared a few methods to export a table from MySQL (or any other relational database) and import the rows into Redis Server in the form of Redis Hashes, which is a convenient Redis data structure for this purpose. Now the next step is understanding how to operate with this Redis data type to perform the most common searches. So let’s see a few examples of common SQL queries and how to achieve the same results in Redis.

In the following examples I will use the world database to perform the example and whose table city was imported in the previous post into Redis. The definition is:

CREATE TABLE `city` (
  `ID` int NOT NULL AUTO_INCREMENT,
  `Name` char(35) NOT NULL DEFAULT '',
  `CountryCode` char(3) NOT NULL DEFAULT '',
  `District` char(20) NOT NULL DEFAULT '',
  `Population` int NOT NULL DEFAULT '0',
  PRIMARY KEY (`ID`),
  KEY `CountryCode` (`CountryCode`),
  CONSTRAINT `city_ibfk_1` FOREIGN KEY (`CountryCode`) REFERENCES `country` (`Code`)
) ENGINE=InnoDB AUTO_INCREMENT=4080 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

Primary key lookup

In MySQL I can search by primary key (which is defined on the auto increment ID field) like:

mysql> SELECT * FROM world.city WHERE ID=5;
+----+-----------+-------------+---------------+------------+
| ID | Name      | CountryCode | District      | Population |
+----+-----------+-------------+---------------+------------+
|  5 | Amsterdam | NLD         | Noord-Holland |     731200 |
+----+-----------+-------------+---------------+------------+
1 row in set (0.01 sec)

In Redis, I can use HGETALL to get all the fields for this hash:

127.0.0.1:4321> HGETALL city:5
1) "Name"
2) "Amsterdam"
3) "CountryCode"
4) "NLD"
5) "District"
6) "Noord-Holland"
7) "Population"
8) "731200"

I can also refine the SELECT and get only a couple of columns:

mysql> SELECT Name, District FROM world.city WHERE ID=5;
+-----------+---------------+
| Name      | District      |
+-----------+---------------+
| Amsterdam | Noord-Holland |
+-----------+---------------+
1 row in set (0.00 sec)

And in Redis I can use HMGET (or HGET for a single field):

127.0.0.1:4321> HMGET city:5 Name District
1) "Amsterdam"
2) "Noord-Holland"

Secondary key lookup

Let’s try an unindexed scan in MySQL over the Name column, which has not a secondary index:

mysql> SELECT Name, District FROM world.city WHERE Name LIKE "New %";
+-------------+---------------+
| Name        | District      |
+-------------+---------------+
| New Bombay  | Maharashtra   |
| New Delhi   | Delhi         |
| New York    | New York      |
| New Orleans | Louisiana     |
| New Haven   | Connecticut   |
| New Bedford | Massachusetts |
+-------------+---------------+
6 rows in set (0.00 sec)

Let’s also search for cities with the desired population range:

mysql> SELECT Name, Population FROM world.city WHERE Population BETWEEN 1000 AND 1500;
+--------------+------------+
| Name         | Population |
+--------------+------------+
| Hamilton     |       1200 |
| Agaña        |       1139 |
| Longyearbyen |       1438 |
| Monaco-Ville |       1234 |
| Jamestown    |       1500 |
| Mata-Utu     |       1137 |
+--------------+------------+
6 rows in set (0.00 sec)

While in a relational database it is possible to search and filter on a column value regardless of the presence of a secondary index on a column (leading to a performance penalty), searches over Redis Hashes need an index. The HSCAN command can scan internal fields of a single Hash, which has no use in the data representation we have here. In other words, there is no cross-Hash scan in Redis. Redis Server core has no out-of-the-box indexing capability, but this can be sorted out either creating and managing an index, or using the RediSearch module. Let’s go in order and let’s see how a self maintained key works. In the first place, let’s summarize the main idea with this quote from the related document on secondary indexing with Redis, consult the document to learn how indexing works in Redis.

It is fair to say that, natively, Redis only offers primary key access. However since Redis is a data structures server, its capabilities can be used for indexing, in order to create secondary indexes of different kinds, including composite (multi-column) indexes.

Homegrown secondary indexes in Redis

So let’s say that I want to be able to retrieve cities by Name and by Population. In this case I will have to create and maintain two indexes. I will use a Sorted Set for both indexes, because it has unique features to simplify life with this task.

  • city_name_idx is a Sorted Set used as a lexicographical index. Setting the score to zero, the elements are ordered alphabetically. It is used for prefix searches.
  • city_population_idx is another Sorted Set used as a leaderboard, being the score the population of the city. It is used for range searches.

Check the script for index creation.

#!/usr/bin/python3
import redis

r = redis.Redis(host='127.0.0.1', port=4321, charset="utf-8", decode_responses=True)
r.delete('city_name_idx')
r.delete('city_population_idx')

for value in r.scan_iter(match="city:*"):
    tmp = r.hmget(value, ['Name', 'District', 'Population'])
    r.zadd("city_name_idx", {tmp[0]+":"+tmp[1]+":"+value.split(":")[1] : 0})
    r.zadd("city_population_idx", {tmp[0]+":"+value.split(":")[1] : tmp[2]})

Now I can test my indexes and perform the equivalent searches. See how the city is returned together with the district because I constructed the index as a composite index. The Hash ID is returned as well, so I can retrieve the containing Hash in case I need it (this is the same as in a typical relational database, where a secondary index embeds also the primary key). I can add the desired fields into the index, and I will retrieve them without the need to read the whole Hash (a sort of covering index).

127.0.0.1:4321> ZRANGE city_name_idx "[New " "[New z" BYLEX
New Bedford:Massachusetts:4044
New Bombay:Maharashtra:1106
New Delhi:Delhi:1109
New Haven:Connecticut:3971
New Orleans:Louisiana:3823
New York:New York:3793

Using the index as a leaderboard, I can simply get cities within a range of inhabitants:

127.0.0.1:4321> ZRANGE city_population_idx 1000 1500 BYSCORE WITHSCORES
 1) "Mata-Utu:3536"
 2) "1137"
 3) "Aga\xc3\xb1a:921"
 4) "1139"
 5) "Hamilton:191"
 6) "1200"
 7) "Monaco-Ville:2695"
 8) "1234"
 9) "Longyearbyen:938"
10) "1438"
11) "Jamestown:3063"
12) "1500"

Note: redis-cli formats the outputs as bytes composing the name of the city (UTF8 hexadecimal, where C3B1 corresponds to ‘ñ’ in the previous example, check it in a converter), you can use --raw option to see the city name printed correctly.

Or cities with at most 1500 inhabitants:

127.0.0.1:4321> ZRANGE city_population_idx 0 1500 BYSCORE WITHSCORES

Or cities with more than 1500 inhabitants:

127.0.0.1:4321> ZRANGE city_population_idx 1500 +inf BYSCORE WITHSCORES

Wrapping up, using Sorted sets I create secondary indexes to sort fields alphabetically or numerically. This is the equivalent of a composite index.

Maintainance of a secondary index in Redis

Whatever change I make to data in the Hashes (modify population in an existing city, add new cities etc..) I will have to update the index as well. If I am not using the RediSearch module, index maintenance must be performed, and performing atomic data change and index update is desired. You can achieve it using a MULTI/EXEC operation so both the Hash and the corresponding index are updated atomically. Check this resource for an example.

The RediSearch module

As the main page of RediSearch states, RediSearch is a source available Secondary Index, Query Engine and Full-Text Search over Redis, developed by Redis. Meaning that this is a fully baked module you can load in the Redis Server core and extend Redis to have powerful search capabilities. Let’s make a few examples but first of all let’s see how to install it.

Docker image with RediSearch module

You can choose either RediSearch Docker image to deploy a container with a Redis Server and RediSearch module, or even deploy redismod, a Docker image with select Redis modules. Let’s go with the former image, and let’s reuse the city.csv that can be imported as usual.

docker run -p 6379:6379 redislabs/redisearch:latest

Now you can check if the module is available.

127.0.0.1:6379> MODULE LIST
1) 1) "name"
   2) "search"
   3) "ver"
   4) (integer) 20015

Free Redis Cloud server with RediSearch

You can also get a forever free 30MB Redis database with the chosen module, so you can experiment freely. Browse to the instructions to create your free subscription.

Test searches on secondary index

See how easy it is to create a composite index on Name, District and Population for all the city: Hashes that is also sortable. The following will index existing Hashes.

FT.CREATE city_name_population_idx PREFIX 1 "city:" SCHEMA Name TEXT District TEXT Population NUMERIC SORTABLE

Refer to the FT.CREATE command description. Searching for cities and district by name can be accomplished simply with:

FT.SEARCH city_name_population_idx "@Name:New*" RETURN 2 Name District SORTBY Name

Whereas a search by population is:

FT.SEARCH city_name_population_idx "@Population:[1000 1500]" RETURN 2 Name Population SORTBY Population

Refer to the FT.SEARCH command description. Remove the index using:

FT.DROPINDEX city_name_population_idx

Test aggregation

Before leaving, let’s see how to use RediSearch to perform aggregations in the classical GROUP BY and aggregate functions fashion. Imagine that I want to retrieve the five most populated areas by CountryCode, sorted in descending order:

mysql> SELECT CountryCode,SUM(Population) AS sum FROM city GROUP BY CountryCode ORDER BY sum DESC LIMIT 5;
+-------------+-----------+
| CountryCode | sum       |
+-------------+-----------+
| CHN         | 175953614 |
| IND         | 123298526 |
| BRA         |  85876862 |
| USA         |  78625774 |
| JPN         |  77965107 |
+-------------+-----------+
5 rows in set (0.01 sec)

RediSearch supports complex aggregations using the FT.AGGREGATE, so it is easy to create a new index:

FT.CREATE city_countrycode_idx PREFIX 1 "city:" SCHEMA CountryCode TEXT Population NUMERIC SORTABLE

And rewrite the query as follows:

127.0.0.1:6379> FT.AGGREGATE city_countrycode_idx * GROUPBY 1 @CountryCode REDUCE SUM 1 @Population AS sum SORTBY 2 @sum DESC LIMIT 0 5
1) (integer) 232
2) 1) "CountryCode"
   2) "CHN"
   3) "sum"
   4) "175953614"
3) 1) "CountryCode"
   2) "IND"
   3) "sum"
   4) "123298526"
4) 1) "CountryCode"
   2) "BRA"
   3) "sum"
   4) "85876862"
5) 1) "CountryCode"
   2) "USA"
   3) "sum"
   4) "78625774"
6) 1) "CountryCode"
   2) "JPN"
   3) "sum"
   4) "77965107"

Wrapping up

Redis Server offers plenty of capabilities to manage indexes manually (or even inverted indexes for full text searches). In this post I have gone through basic examples, but the options to retrieve data using the core data types are unlimited.

If you want to take advantage of advanced features for searches, you can use RediSearch with Redis OSS, Redis Enterprise and Redis Enterprise Cloud, taking searches to a new level. In these examples I have used redis-cli, check client libraries such as redis-py to test the features of RediSearch using Python.

Finally, RediSearch is not just about range or prefix searches, or aggregation queries. There are multiple types of searches, such as phonetic matching, autocompletion suggestions, geo searches or the spellchecker to help design great applications. I will cover them in a future post.

Redis is a registered trademark of Redis Ltd. Any rights therein are reserved to Redis Ltd. Any use by mortensi is for referential purposes only and does not indicate any sponsorship, endorsement or affiliation between Redis and mortensi.

Leave A Comment