Skip to main content

Map

The MAP data structure is utilized for holding a set of Key:Value pairs, and stores data using a nested data structure of Array(Tuple(key, value)). It is appropriate in situations where the data type is constant, but the Key's value cannot be entirely ascertained.

Understanding Key:Value

The Key is of a specified basic data type, including Boolean, Number, Decimal, String, Date, or Timestamp. A Key's value cannot be Null, and duplicates are not allowed. The Value can be any data type, including nested arrays, tuples, and so on.

Map data can be generated through Key:Value pairs enclosed in curly braces or by using the Map function to convert two arrays into a Map. The Map function takes two arrays as input, where the elements in the first array serve as the keys and the elements in the second array serve as the values. See an example below:

-- Input arrays: [1, 2] and ['v1', 'v2']
-- Resulting Map: {1: 'v1', 2: 'v2'}

SELECT {'k1': 1, 'k2': 2}, map([1, 2], ['v1', 'v2']);
+-----------------+---------------------------+
| {'k1':1,'k2':2} | map([1, 2], ['v1', 'v2']) |
+-----------------+---------------------------+
| {'k1':1,'k2':2} | {1:'v1',2:'v2'} |
+-----------------+---------------------------+

Map and Bloom Filter Index

In Databend Map, a bloom filter index is created for the value with certain data types: Numeric, String, Timestamp, and Date.

This makes it easier and faster to search for values in the MAP data structure.

The implementation of the bloom filter index in Databend Map is in PR#10457.

The bloom filter is particularly effective in reducing query time when the queried value does not exist.

For example:

select * from nginx_log where log['ip'] = '205.91.162.148';
+----+----------------------------------------+
| id | log |
+----+----------------------------------------+
| 1 | {'ip':'205.91.162.148','url':'test-1'} |
+----+----------------------------------------+
1 row in set
Time: 1.733s

select * from nginx_log where log['ip'] = '205.91.162.141';
+----+-----+
| id | log |
+----+-----+
+----+-----+
0 rows in set
Time: 0.129s

Examples

Create a table with a Map column for storing web traffic data

CREATE TABLE web_traffic_data(id INT64, traffic_info MAP(STRING, STRING));

DESC web_traffic_data;
+-------------+--------------------+------+---------+-------+
| Field | Type | Null | Default | Extra |
+-------------+--------------------+------+---------+-------+
| id | INT64 | NO | | |
| traffic_info| MAP(STRING, STRING)| NO | {} | |
+-------------+--------------------+------+---------+-------+

Insert Map data containing IP addresses and URLs visited

INSERT INTO web_traffic_data VALUES(1, {'ip': '192.168.1.1', 'url': 'example.com/home'}),
(2, {'ip': '192.168.1.2', 'url': 'example.com/about'}),
(3, {'ip': '192.168.1.1', 'url': 'example.com/contact'});

Query

SELECT * FROM web_traffic_data;

+----+-----------------------------------+
| id | traffic_info |
+----+-----------------------------------+
| 1 | {'ip':'192.168.1.1','url':'example.com/home'} |
| 2 | {'ip':'192.168.1.2','url':'example.com/about'} |
| 3 | {'ip':'192.168.1.1','url':'example.com/contact'} |
+----+-----------------------------------+

Query the number of visits per IP address

SELECT traffic_info['ip'] as ip_address, COUNT(*) as visits
FROM web_traffic_data
GROUP BY traffic_info['ip'];

+-------------+--------+
| ip_address | visits |
+-------------+--------+
| 192.168.1.1 | 2 |
| 192.168.1.2 | 1 |
+-------------+--------+

Query the most visited URLs

SELECT traffic_info['url'] as url, COUNT(*) as visits
FROM web_traffic_data
GROUP BY traffic_info['url']
ORDER BY visits DESC
LIMIT 3;

+---------------------+--------+
| url | visits |
+---------------------+--------+
| example.com/home | 1 |
| example.com/about | 1 |
| example.com/contact | 1 |
+---------------------+--------+