Percona Database Performance Blog

Storing and Using JSON Within PostgreSQL Part One

https://www.percona.com/blog/storing-and-using-json-within-postgresql-part-one/



Continuing our series on storing JSON directly within your database, we are now looking at the JSON functionality built into PostgreSQL.  You can see the first two posts in this series exploring MySQL’s JSON functionality with Storing JSON in Your Databases: Tips and Tricks For MySQL Part One and Storing JSON in Your Databases: Tips and Tricks For MySQL Part Two.  I used the exact same table structures and datasets as I did within my MySQL tests.  You can get the instructions to follow along at home here on GitHub.

PostgreSQL has two JSON data types available to you: JSON and JSONB.  The docs highlight the differences pretty well:

[Screenshot: JSON vs. JSONB comparison table from the PostgreSQL docs]

Basically, JSONB is stored decomposed, making it more efficient for the database to interact with, while the JSON type is stored as an exact text copy of what was input.  This means formatting, whitespace, key order, etc., will be lost with JSONB.  That said, the general suggestion is to use JSONB.  The trade-off is slightly slower inserts in exchange for faster overall read performance (on top of the storage differences mentioned above).
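You can see this normalization directly by casting the same literal to each type (a quick psql sketch; note JSONB also keeps only the last of any duplicate keys):

```sql
-- json preserves the input text exactly; jsonb decomposes and normalizes it
select '{"b": 2,   "a": 1}'::json  as json_copy,
       '{"b": 2,   "a": 1}'::jsonb as jsonb_copy;
--      json_copy       |    jsonb_copy
-- ---------------------+------------------
--  {"b": 2,   "a": 1}  | {"a": 1, "b": 2}
```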

In terms of performance, how much of a difference is there?  That is tricky, as it will depend heavily on the size and complexity of the JSON document being stored.  The larger, more complex the document, the larger the difference you could see.  But to illustrate an example, we will use our movie JSON to select the same data from both JSON and JSONB.

I created the following tables and loaded them with movie-related JSON:

create table movies_json (
	ai_myid serial primary key, 
	imdb_id varchar(255),
	json_column json not null
);

create unique index movies_json_imdb_idx on movies_json(imdb_id);


create table movies_jsonb (
	ai_myid serial primary key, 
	imdb_id varchar(255),
	jsonb_column jsonb not null
);

create unique index movies_jsonb_imdb_idx on movies_jsonb(imdb_id);

Querying the JSON column results in slower retrieval:

movie_json_test=# explain (verbose true, analyze true) select * from movies_json where json_column->>'title' = 'Avengers: Endgame (2019)';
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..55775.85 rows=1880 width=1059) (actual time=694.047..2516.879 rows=1 loops=1)
   Output: ai_myid, imdb_id, json_column
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on public.movies_json  (cost=0.00..54587.85 rows=783 width=1059) (actual time=1905.511..2512.010 rows=0 loops=3)
         Output: ai_myid, imdb_id, json_column
         Filter: ((movies_json.json_column ->> 'title'::text) = 'Avengers: Endgame (2019)'::text)
         Rows Removed by Filter: 125119
         Worker 0:  actual time=2511.276..2511.277 rows=0 loops=1
         Worker 1:  actual time=2511.322..2511.322 rows=0 loops=1
 Planning Time: 0.166 ms
 Execution Time: 2516.897 ms
(12 rows)

While querying the JSONB column is roughly 3x faster:

movie_json_test=# explain (verbose true, analyze true) select * from movies_jsonb where jsonb_column->>'title' = 'Avengers: Endgame (2019)';
                                                               QUERY PLAN                                                                
-----------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..54116.60 rows=1873 width=1025) (actual time=723.324..726.914 rows=1 loops=1)
   Output: ai_myid, imdb_id, jsonb_column
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on public.movies_jsonb  (cost=0.00..52929.30 rows=780 width=1025) (actual time=548.982..721.730 rows=0 loops=3)
         Output: ai_myid, imdb_id, jsonb_column
         Filter: ((movies_jsonb.jsonb_column ->> 'title'::text) = 'Avengers: Endgame (2019)'::text)
         Rows Removed by Filter: 125119
         Worker 0:  actual time=720.995..720.995 rows=0 loops=1
         Worker 1:  actual time=202.751..720.994 rows=1 loops=1
 Planning Time: 0.038 ms
 Execution Time: 726.933 ms
(12 rows)

To ensure this was not a single data point, I wrote a small script to run this test (with this title and other random titles).  Over 100 runs, the results were pretty consistent:

JSON (seconds):

Average Time: 2.5492

Min Time: 2.5297

Max Time: 2.5654

JSONB (seconds):

Average Time: 0.7470

Min Time: 0.7298

Max Time: 0.7828

As stated, inserts and updates to JSONB will lose some of that performance, so keep that in mind.  It is, however, easier to interact with and update data in a JSONB column.  For instance, some functions are only available for the JSONB data type, like the JSONB_SET function, which allows you to update your JSON in place.

movie_json_test=# explain (verbose true, analyze true) update movies_jsonb set jsonb_column= jsonb_set(jsonb_column, '{imdb_rating}', '9') where jsonb_column->>'title' = 'Avengers: Endgame (2019)';
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Update on public.movies_jsonb  (cost=0.00..56211.80 rows=1873 width=52) (actual time=1928.691..1928.692 rows=0 loops=1)
   ->  Seq Scan on public.movies_jsonb  (cost=0.00..56211.80 rows=1873 width=52) (actual time=577.386..1928.246 rows=1 loops=1)
         Output: ai_myid, imdb_id, jsonb_set(jsonb_column, '{imdb_rating}'::text[], '9'::jsonb, true), ctid
         Filter: ((movies_jsonb.jsonb_column ->> 'title'::text) = 'Avengers: Endgame (2019)'::text)
         Rows Removed by Filter: 375358
 Planning Time: 0.040 ms
 Execution Time: 1928.718 ms
(7 rows)

I have seen references in several places to the insert performance difference between JSON and JSONB, but I have not seen any concrete numbers.  I decided to run a few tests:

  • Inserting 10K records into a JSON column (via insert into select from):  72.851 ms
  • Inserting 10K records into a JSONB column (via insert into select from):  754.045 ms

If you are doing a heavy insert workload, the difference is significant enough to take note of and plan for.
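To see the gap yourself, psql's \timing is enough.  A sketch, assuming a staging table movies_raw(imdb_id, doc text) holding the 10K source documents (the staging table name is illustrative, not from the original test):

```sql
\timing on
-- identical source rows; the jsonb insert pays extra for decomposing each document
insert into movies_json  (imdb_id, json_column)  select imdb_id, doc::json  from movies_raw;
insert into movies_jsonb (imdb_id, jsonb_column) select imdb_id, doc::jsonb from movies_raw;
\timing off
```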

Another distinct advantage of JSONB is the ability to use GIN indexes over the JSON document.  GIN indexes are designed and optimized for text searches, which lends itself well to JSON.  That said, the syntax to make use of GIN indexes may be a bit more complicated.

Here is how you create a GIN index. 

movie_json_test=# CREATE INDEX movies_jsonb_gin_index ON movies_jsonb USING gin (jsonb_column);
CREATE INDEX

You can see the original query is still not using this index:

movie_json_test=# explain (verbose true, analyze true) select jsonb_column->>'title', jsonb_column->>'imdb_rating' from movies_jsonb where jsonb_column->>'title' = 'Avengers: Endgame (2019)';
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..54125.60 rows=1877 width=64) (actual time=716.059..719.346 rows=1 loops=1)
   Output: ((jsonb_column ->> 'title'::text)), ((jsonb_column ->> 'imdb_rating'::text))
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on public.movies_jsonb  (cost=0.00..52937.90 rows=782 width=64) (actual time=544.197..714.577 rows=0 loops=3)
         Output: (jsonb_column ->> 'title'::text), (jsonb_column ->> 'imdb_rating'::text)
         Filter: ((movies_jsonb.jsonb_column ->> 'title'::text) = 'Avengers: Endgame (2019)'::text)
         Rows Removed by Filter: 125119
         Worker 0:  actual time=202.768..713.907 rows=1 loops=1
         Worker 1:  actual time=713.890..713.891 rows=0 loops=1
 Planning Time: 0.041 ms
 Execution Time: 719.365 ms
(12 rows)

Why not?  The ->> operator returns the value as text, so the comparison happens on the extracted text rather than on the indexed JSONB value.  There are a couple of ways to work around this.  The first is to use @@, which evaluates a JSON path predicate:

movie_json_test=# explain (verbose true, analyze true) select jsonb_column->>'title', jsonb_column->>'imdb_rating' from movies_jsonb where jsonb_column @@ '$.title == "Avengers: Endgame (2019)"';
                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.movies_jsonb  (cost=72.29..221.83 rows=38 width=64) (actual time=0.179..0.180 rows=1 loops=1)
   Output: (jsonb_column ->> 'title'::text), (jsonb_column ->> 'imdb_rating'::text)
   Recheck Cond: (movies_jsonb.jsonb_column @@ '($."title" == "Avengers: Endgame (2019)")'::jsonpath)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on movies_jsonb_gin_index  (cost=0.00..72.28 rows=38 width=0) (actual time=0.068..0.068 rows=2 loops=1)
         Index Cond: (movies_jsonb.jsonb_column @@ '($."title" == "Avengers: Endgame (2019)")'::jsonpath)
 Planning Time: 0.145 ms
 Execution Time: 0.199 ms
(8 rows)

This works.  You can also use @>, which checks whether the left JSON value contains the right one.

movie_json_test=> explain (verbose true, analyze true) select jsonb_column->>'title', jsonb_column->>'imdb_rating' from movies_jsonb where jsonb_column @> '{ "title" : "Avengers: Endgame (2019)"}';
                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.movies_jsonb  (cost=44.29..193.94 rows=38 width=64) (actual time=0.199..0.200 rows=1 loops=1)
   Output: (jsonb_column ->> 'title'::text), (jsonb_column ->> 'imdb_rating'::text)
   Recheck Cond: (movies_jsonb.jsonb_column @> '{"title": "Avengers: Endgame (2019)"}'::jsonb)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on movies_jsonb_gin_index  (cost=0.00..44.28 rows=38 width=0) (actual time=0.091..0.091 rows=1 loops=1)
         Index Cond: (movies_jsonb.jsonb_column @> '{"title": "Avengers: Endgame (2019)"}'::jsonb)
 Planning Time: 0.147 ms
 Execution Time: 0.214 ms
(8 rows)

Not only did this use the index, it is now dramatically faster overall (sub-millisecond vs. hundreds of milliseconds).  The PostgreSQL documentation offers a fairly detailed overview of the different options for interacting with JSON, especially when it comes to the functions and operators available to you (note the table in the PG 12 docs giving an overview of the operators is a little more readable).  It’s important to note that not all operators and functions will work with GIN indexes.  For example, using ‘like_regex’:

movie_json_test=# explain (verbose true, analyze true) select jsonb_column->>'title', jsonb_column->>'imdb_rating' from movies_jsonb where jsonb_column @@ '$.title like_regex "^Avengers*"';
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..53546.87 rows=38 width=64) (actual time=218.550..795.063 rows=9 loops=1)
   Output: ((jsonb_column ->> 'title'::text)), ((jsonb_column ->> 'imdb_rating'::text))
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on public.movies_jsonb  (cost=0.00..52543.07 rows=16 width=64) (actual time=251.866..790.098 rows=3 loops=3)
         Output: (jsonb_column ->> 'title'::text), (jsonb_column ->> 'imdb_rating'::text)
         Filter: (movies_jsonb.jsonb_column @@ '($."title" like_regex "^Avengers*")'::jsonpath)
         Rows Removed by Filter: 125117
         Worker 0:  actual time=311.403..789.402 rows=3 loops=1
         Worker 1:  actual time=225.825..789.408 rows=1 loops=1
 Planning Time: 0.204 ms
 Execution Time: 795.087 ms
(12 rows)

In addition to GIN indexes, you can create hash or btree indexes on your JSON columns; however, they are only useful if you are comparing or searching the entire JSON document.  For anything more granular, you want to use an expression index (often referred to as a functional index in other databases).

Generated Columns and Expression Indexes

While GIN indexes work for some use cases, you will often find it easier and more convenient to use actual columns or some of the normal functions you are familiar with.  The simplest use of an expression index is to pull out and index a field you will use often.  Let’s say we will often want to search by title.  We can create an index on the title extracted from the JSON in our previous example.

movie_json_test=# create index movie_jsonb_title_index on movies_jsonb (((jsonb_column ->> 'title')::text));
CREATE INDEX

This creates a btree index on the jsonb_column ->> 'title' expression, allowing us to search by title.

movie_json_test=# explain (verbose true, analyze true) select jsonb_column->>'title', jsonb_column->>'imdb_rating' from movies_jsonb where jsonb_column->>'title' = 'Avengers: Endgame (2019)';
                                                             QUERY PLAN                                                             
------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.movies_jsonb  (cost=66.97..6421.19 rows=1877 width=64) (actual time=0.138..0.140 rows=1 loops=1)
   Output: (jsonb_column ->> 'title'::text), (jsonb_column ->> 'imdb_rating'::text)
   Recheck Cond: ((movies_jsonb.jsonb_column ->> 'title'::text) = 'Avengers: Endgame (2019)'::text)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on movie_jsonb_title_index  (cost=0.00..66.50 rows=1877 width=0) (actual time=0.014..0.014 rows=1 loops=1)
         Index Cond: ((movies_jsonb.jsonb_column ->> 'title'::text) = 'Avengers: Endgame (2019)'::text)
 Planning Time: 0.160 ms
 Execution Time: 0.159 ms
(8 rows)

Originally, when we searched for movies with the ‘Avengers: Endgame (2019)’ title there was no index (unless you modified the SQL to make use of the GIN index).  Now, with this new index, you can see our original query is no longer doing a sequential scan and instead uses the btree index.  If we also want the index to be used for ‘like string%’ statements to help with partial matching, we need something extra added to the index.  Note: this is not something you can do by default with only a GIN index (though you may want to look into the pg_trgm extension for that).  Here we will create the same index with the operator class “text_pattern_ops”, which is designed to work with LIKE and regex.

Before:

movie_json_test=# explain (verbose true, analyze true) select jsonb_column->>'title', jsonb_column->>'imdb_rating' from movies_jsonb where jsonb_column->>'title' like 'Avengers%';
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..54125.60 rows=1877 width=64) (actual time=371.463..723.743 rows=8 loops=1)
   Output: ((jsonb_column ->> 'title'::text)), ((jsonb_column ->> 'imdb_rating'::text))
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on public.movies_jsonb  (cost=0.00..52937.90 rows=782 width=64) (actual time=288.053..718.957 rows=3 loops=3)
         Output: (jsonb_column ->> 'title'::text), (jsonb_column ->> 'imdb_rating'::text)
         Filter: ((movies_jsonb.jsonb_column ->> 'title'::text) ~~ 'Avengers%'::text)
         Rows Removed by Filter: 125117
         Worker 0:  actual time=204.176..718.288 rows=2 loops=1
         Worker 1:  actual time=288.637..718.299 rows=3 loops=1
 Planning Time: 0.130 ms
 Execution Time: 723.762 ms
(12 rows)

After:

movie_json_test=# drop index movie_jsonb_title_index;
DROP INDEX
movie_json_test=# create index movie_jsonb_title_index on movies_jsonb (((jsonb_column ->> 'title')::text) text_pattern_ops);
CREATE INDEX
movie_json_test=# explain (verbose true, analyze true) select jsonb_column->>'title', jsonb_column->>'imdb_rating' from movies_jsonb where jsonb_column->>'title' like 'Avengers%';
                                                                              QUERY PLAN                                                                              
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.movies_jsonb  (cost=71.66..6425.88 rows=1877 width=64) (actual time=0.195..0.498 rows=8 loops=1)
   Output: (jsonb_column ->> 'title'::text), (jsonb_column ->> 'imdb_rating'::text)
   Filter: ((movies_jsonb.jsonb_column ->> 'title'::text) ~~ 'Avengers%'::text)
   Heap Blocks: exact=8
   ->  Bitmap Index Scan on movie_jsonb_title_index  (cost=0.00..71.19 rows=1877 width=0) (actual time=0.015..0.015 rows=8 loops=1)
         Index Cond: (((movies_jsonb.jsonb_column ->> 'title'::text) ~>=~ 'Avengers'::text) AND ((movies_jsonb.jsonb_column ->> 'title'::text) ~<~ 'Avengert'::text))
 Planning Time: 0.168 ms
 Execution Time: 0.519 ms
(8 rows)

Now you can also perform a similar operation, but store the results as a queryable generated column instead:

movie_json_test=> \d movies_jsonb
                                         Table "public.movies_jsonb"
    Column    |          Type          | Collation | Nullable |                    Default                    
--------------+------------------------+-----------+----------+-----------------------------------------------
 ai_myid      | integer                |           | not null | nextval('movies_jsonb_ai_myid_seq'::regclass)
 imdb_id      | character varying(255) |           |          | 
 jsonb_column | jsonb                  |           | not null | 
Indexes:
    "movies_jsonb_pkey" PRIMARY KEY, btree (ai_myid)
    "movies_jsonb_imdb_idx" UNIQUE, btree (imdb_id)

movie_json_test=> alter table movies_jsonb add column title varchar(255) generated always as (((jsonb_column ->> 'title')::text)) stored;
ALTER TABLE


movie_json_test=> \d movies_jsonb
                                                  Table "public.movies_jsonb"
    Column    |          Type          | Collation | Nullable |                             Default                             
--------------+------------------------+-----------+----------+-----------------------------------------------------------------
 ai_myid      | integer                |           | not null | nextval('movies_jsonb_ai_myid_seq'::regclass)
 imdb_id      | character varying(255) |           |          | 
 jsonb_column | jsonb                  |           | not null | 
 title        | character varying(255) |           |          | generated always as (((jsonb_column ->> 'title'::text))) stored
Indexes:
    "movies_jsonb_pkey" PRIMARY KEY, btree (ai_myid)
    "movies_jsonb_imdb_idx" UNIQUE, btree (imdb_id)

Now we have the option to add an index on the title column, select it without the JSON operators, etc.  Note: generated columns cannot be updated directly.
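Trying to write to the generated column directly makes that clear (the error text below is what recent PostgreSQL versions emit; treat it as illustrative):

```sql
movie_json_test=> update movies_jsonb set title = 'Avengers' where ai_myid = 1;
ERROR:  column "title" can only be updated to DEFAULT
DETAIL:  Column "title" is a generated column.
```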

We can create our entire movie_json table with extra columns and expression indexes and make it look a bit more like a normal table:

create table movies_json_generated (
	ai_myid serial primary key, 
	imdb_id varchar(255) generated always as (json_column ->> 'imdb_id') stored,
	title varchar(255) generated always as (json_column ->> 'title') stored,
	imdb_rating decimal(5,2) generated always as ((json_column ->> 'imdb_rating')::numeric) stored,
	overview text generated always as (json_column ->> 'overview') stored,
	director jsonb generated always as ((json_column ->> 'director')::json) stored,
	country varchar(100) generated always as (json_column ->> 'country') stored,
	jsonb_column jsonb,
	json_column json
);

create unique index gen_imdb_idx on movies_json_generated(imdb_id);
create index gen_title_idx on movies_json_generated(title);
create index gen_func_title_index on movies_json_generated (((json_column ->> 'title')::varchar));	
CREATE INDEX Gen_gin_index ON movies_json_generated USING gin (jsonb_column);

You may have noticed we had to explicitly cast the columns to specific data types.  This is one of the challenges, or at least one of the more difficult things to get used to, with JSON in both MySQL and PostgreSQL.

It’s worth remembering that a JSON document does not have explicit data types, so you will often find that some functions or indexes may not work as expected because you are comparing data of two different types (and some operations will return different results, e.g., sorting ASCII vs. numeric).  For example, let’s add 1 to our IMDB rating:

movie_json_test=> update  movies_jsonb_generated_1 set jsonb_column= jsonb_set(jsonb_column::jsonb, '{imdb_rating}',(((((jsonb_column ->>'imdb_rating')))+1))) ;
ERROR:  operator does not exist: text + numeric
LINE 1: ...imdb_rating}',(((((jsonb_column ->>'imdb_rating')))+1))) ;
                                                              ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

Here the imdb_rating returned by ->> is text, not numeric, so you cannot add 1 to it.  Logically, you want to cast the value to numeric to allow the addition.

movie_json_test=> update  movies_jsonb_generated_1 set jsonb_column= jsonb_set(jsonb_column::jsonb, '{imdb_rating}',(((((jsonb_column ->>'imdb_rating')::numeric))+1))) ;
ERROR:  function jsonb_set(jsonb, unknown, numeric) does not exist
LINE 1: ...pdate  movies_jsonb_generated_1 set jsonb_column= jsonb_set(...
                                                             ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

Again we run into a type issue.  This time, the jsonb_set function cannot accept a numeric value (nor does it implicitly convert one) where it expects a JSONB value.  So let’s cast the result to JSONB:

movie_json_test=> update  movies_jsonb_generated_1 set jsonb_column= jsonb_set(jsonb_column::jsonb, '{imdb_rating}',(((((jsonb_column ->>'imdb_rating')::numeric))+1)::jsonb)) ;
ERROR:  cannot cast type numeric to jsonb
LINE 1: ...((((jsonb_column ->>'imdb_rating')::numeric))+1)::jsonb)) ;

Here we find we cannot cast a numeric directly to JSONB, so we have to convert the numeric to text, then to JSONB.

movie_json_test=> update  movies_jsonb_generated_1 set jsonb_column= jsonb_set(jsonb_column::jsonb, '{imdb_rating}',(((((jsonb_column ->>'imdb_rating')::numeric))+1)::text::jsonb)) ;
UPDATE 100000

For some developers, this may prove a bit tricky to get the hang of.  It also extends to things that work but are slow or produce the wrong data.  For instance, when comparing, sorting, or filtering, you may or may not pick up the index you are looking for, and you may end up comparing ASCII values where you meant numerics.  Below you can see us explicitly convert certain columns to compare them.  You can also see that when you generate a column or create an expression index, you need to do the same:

movie_json_test=> explain analyze select jsonb_column->>'title' as title,   jsonb_column->>'imdb_rating' as rating,   jsonb_column->>'imdb_id' as imdb_id  from movies_json_generated where (jsonb_column->>'imdb_rating')::numeric > 8 order by (jsonb_column->>'imdb_rating')::numeric desc limit 5;
                                                                        QUERY PLAN                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=65291.78..65292.36 rows=5 width=128) (actual time=1055.064..1058.740 rows=5 loops=1)
   ->  Gather Merge  (cost=65291.78..77449.06 rows=104198 width=128) (actual time=1055.062..1058.738 rows=5 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=64291.76..64422.00 rows=52099 width=128) (actual time=1053.392..1053.393 rows=4 loops=3)
               Sort Key: (((jsonb_column ->> 'imdb_rating'::text))::numeric) DESC
               Sort Method: top-N heapsort  Memory: 25kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 25kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 25kB
               ->  Parallel Seq Scan on movies_json_generated  (cost=0.00..63426.41 rows=52099 width=128) (actual time=0.993..1052.275 rows=2688 loops=3)
                     Filter: (((jsonb_column ->> 'imdb_rating'::text))::numeric > '8'::numeric)
                     Rows Removed by Filter: 122432
 Planning Time: 0.061 ms
 Execution Time: 1058.769 ms
(14 rows)
 
movie_json_test=> \d movies_json_generated
                                                     Table "public.movies_json_generated"
    Column    |          Type          | Collation | Nullable |                                    Default                                    
--------------+------------------------+-----------+----------+-------------------------------------------------------------------------------
 ai_myid      | integer                |           | not null | nextval('movies_json_generated_ai_myid_seq'::regclass)
 imdb_id      | character varying(255) |           |          | generated always as (((json_column ->> 'imdb_id'::text))) stored
 title        | character varying(255) |           |          | generated always as (((json_column ->> 'title'::text))) stored
 imdb_rating  | numeric(5,2)           |           |          | generated always as (((json_column ->> 'imdb_rating'::text)::numeric)) stored
 overview     | text                   |           |          | generated always as (json_column ->> 'overview'::text) stored
 director     | jsonb                  |           |          | generated always as (((json_column ->> 'director'::text)::json)) stored
 country      | character varying(100) |           |          | generated always as (((json_column ->> 'country'::text))) stored
 jsonb_column | jsonb                  |           |          | 
 json_column  | json                   |           |          | 
Indexes:
    "movies_json_generated_pkey" PRIMARY KEY, btree (ai_myid)
    "gen_func_title_index" btree (((json_column ->> 'title'::text)::character varying))
    "gen_gin_index" gin (jsonb_column)
    "gen_imdb_idx" UNIQUE, btree (imdb_id)
    "gen_title_idx" btree (title)
 
movie_json_test=> explain analyze select jsonb_column->>'title' as title,   jsonb_column->>'imdb_rating' as rating,   jsonb_column->>'imdb_id' as imdb_id  from movies_json_generated where imdb_rating > 8 order by imdb_rating desc limit 5;
                                                                       QUERY PLAN                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=62548.12..62548.70 rows=5 width=102) (actual time=112.458..114.704 rows=5 loops=1)
   ->  Gather Merge  (cost=62548.12..63277.80 rows=6254 width=102) (actual time=112.457..114.702 rows=5 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=61548.09..61555.91 rows=3127 width=102) (actual time=110.807..110.808 rows=4 loops=3)
               Sort Key: imdb_rating DESC
               Sort Method: top-N heapsort  Memory: 25kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 25kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 26kB
               ->  Parallel Seq Scan on movies_json_generated  (cost=0.00..61496.16 rows=3127 width=102) (actual time=0.128..109.939 rows=2688 loops=3)
                     Filter: (imdb_rating > '8'::numeric)
                     Rows Removed by Filter: 122432
 Planning Time: 0.146 ms
 Execution Time: 114.729 ms
(14 rows)

movie_json_test=> create index test_index_imdb_rating on movies_json_generated (imdb_rating);
CREATE INDEX
movie_json_test=> explain analyze select jsonb_column->>'title' as title,   jsonb_column->>'imdb_rating' as rating,   jsonb_column->>'imdb_id' as imdb_id  from movies_json_generated where imdb_rating > 8 order by imdb_rating desc limit 5;
                                                                              QUERY PLAN                                                                              
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..19.41 rows=5 width=102) (actual time=0.094..0.134 rows=5 loops=1)
   ->  Index Scan Backward using test_index_imdb_rating on movies_json_generated  (cost=0.42..28521.09 rows=7510 width=102) (actual time=0.093..0.132 rows=5 loops=1)
         Index Cond: (imdb_rating > '8'::numeric)
 Planning Time: 0.136 ms
 Execution Time: 0.152 ms
(5 rows)

A quick side note:  there will be a performance difference for inserts/updates on JSON/JSONB columns with one or more generated columns or expression indexes.  Because of how JSONB is stored, you can see a boost in performance when you update a column that has a lot of generated columns/expression indexes.  In my tests, I saw 3-4x or more performance improvement in doing updates on a JSONB column with several generated/expression indexes vs. a JSON column with the same setup.  That said, the more generated columns and expression indexes you have on a table, the more it can impact performance… this is a trade-off.

  • A table with a single generated column took 7,650 ms to load 100K rows.
  • A table with six generated columns took 8,576 ms to load 100K rows.

While this is a very simple setup, with just a single JSONB column plus the generated columns, you can see a slight overhead.  This held true for updates as well: with the same 100K records, updating one column took 4.7 seconds on the table with six generated columns vs. 3.6 seconds on the table with a single generated column (a ~25% difference).  Again, not an in-depth test, but it illustrates that adding a ton of expression indexes or generated columns will not be without some cost.
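The shape of that update comparison is simple to reproduce with \timing.  A sketch, assuming movies_jsonb_generated_1 and movies_jsonb_generated_6 are copies of the table with one and six generated columns respectively (the _6 table name is illustrative):

```sql
\timing on
-- same 100K rows in each table; the timing difference comes from recomputing the generated columns
update movies_jsonb_generated_1 set jsonb_column = jsonb_set(jsonb_column, '{imdb_rating}', '9');
update movies_jsonb_generated_6 set jsonb_column = jsonb_set(jsonb_column, '{imdb_rating}', '9');
\timing off
```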

Recap & What’s Next

A few quick takeaways:

  • Using JSONB is probably going to be your best option in most use cases.
  • Be very careful with type conversions and assumptions about the data within your JSON/JSONB columns.  You may get errors or odd results.
  • Use the available indexes, generated columns, and expression indexes to gain substantial performance benefits.

Now you have the basics of JSON in PostgreSQL.  In part two of this series, we will review some of the more advanced options, take a deeper dive into performance, and explore a normalized schema vs. one that is heavily JSON.
