PostgreSQL no SQL Series Note 4


/the_one_on_generating_jsons_from_columns

the first 3 notes introduced our subject and set up the db and the pieces to support a working project.
this note dives into the json functionality postgres gives us to build json data out of columns, queries, and more. this will be really useful later on, when we switch over to full nosql with the json and jsonb data types.

/differences_between_json_and_jsonb_data_types

postgresql.org -> datatype-json

postgres supports json and jsonb data types to store and query json data.
json stands for javascript object notation, and jsonb is a binary representation of a json.

the main difference between the two, other than the way they are stored internally, is the set of operations we can do on each and the performance we can get out of them.

both data types support:

  • json grammar check
    • it will reject an incorrectly constructed json at insert time
  • querying fields, whether top level, nested, inside arrays, etc

json characteristics:

  • stores an exact copy of the input
    • makes it faster to store
    • ordering and even white spaces are respected
    • repeated keys are respected (but only the last one will be the “working one” for querying purposes)
  • needs reparsing for functions to process the input
    • makes it a little slower to work with
  • no indexing support

jsonb characteristics:

  • stores a decomposed binary format of the original input
    • makes it a bit slower to store due to the preprocessing before insert
    • makes it significantly faster to work with as it does not need to be reparsed
  • can be indexed (see the sketch right after this list)
  • ordering of fields is not respected, white space is trimmed
  • repeated keys are not kept; only the last one survives
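
a minimal sketch of what that indexing looks like; the toy_documents table and its doc column are made up here just for illustration:

-- hypothetical table, only to show indexing a jsonb column
create table if not exists toy_documents
(
  id  serial primary key,
  doc jsonb
);

-- a GIN index over the whole document; it speeds up containment (@>) and key-existence (?) queries
create index if not exists toy_documents_doc_gin on toy_documents using gin (doc);

-- this kind of query can use the index
select * from toy_documents where doc @> '{"name": "john"}';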

these are C-R-U-C-I-A-L differences to take into consideration when choosing one data type over the other, and they can have some say in the way we design our data structures.

check this out

selecting same data as json vs jsonb
postgres=# SELECT '{"name":"john",   "last_name":"doe","last_name":"pierce"}'::json, '{"name":"john",   "last_name":"doe","last_name":"pierce"}'::jsonb;

                           json                            |                  jsonb                  
-----------------------------------------------------------+-----------------------------------------
 {"name":"john",   "last_name":"doe","last_name":"pierce"} | {"name": "john", "last_name": "pierce"}
(1 row)

json selects back exactly what we inserted, spaces and all; jsonb normalizes things and eliminates repeated keys, keeping only the last one
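
both types also let us reach into individual fields; a quick taste with made up data (-> returns json, ->> returns text, #>> follows a path and returns text):

postgres=# SELECT doc -> 'name' AS name_json,
                  doc ->> 'name' AS name_text,
                  doc #>> '{cards,0,processor}' AS first_processor
           FROM (SELECT '{"name":"john", "cards":[{"processor":"visa"}]}'::jsonb AS doc) d;

 name_json | name_text | first_processor
-----------+-----------+-----------------
 "john"    | john      | visa
(1 row)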

I am not mentioning UTF-8 encoding of the json format and how that relates to the DB as a whole but it is also something to be aware of.

/from_columns_to_json

in this note we’re gonna play with the operations in postgres that allow us to create jsons and json arrays from column data.
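
as a preview of the building blocks we’ll lean on, here is a minimal sketch with made up values; row_to_json turns a row into a json object, json_build_object builds one key/value pair at a time, and json_agg folds many values into a json array:

postgres=# SELECT row_to_json(t) FROM (SELECT 'john' AS first_name, 'doe' AS last_name) t;
               row_to_json
-----------------------------------------
 {"first_name":"john","last_name":"doe"}
(1 row)

postgres=# SELECT json_build_object('first_name', 'john', 'age', 30);
          json_build_object
--------------------------------------
 {"first_name" : "john", "age" : 30}
(1 row)

postgres=# SELECT json_agg(x) FROM (VALUES (1), (2), (3)) AS t(x);
 json_agg
-----------
 [1, 2, 3]
(1 row)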

/setting_up_a_supporting_project

as advertised, we’re using java's ecosystem to support an imaginary project to store, query, and present data.

when working with extensive subjects I find it more engaging to use lifelike demos rather than toy examples, so we’re gonna try and build a tiny piece of a real project, with millions of rows, hitting the db as hard as we want to push it to its limits.

we’re seeing lately the rise of phone-controlled top up debit cards that show a lot more info on screen (and have way more features) than our own banks do. we can get info broken down by purchase type, seller, dates, country, etc.
we are going to build a web based, super charged version of one of those apps, showing the kind of info you get from vendors like Revolut or Monzo, or maybe a bank like N26, or any service of the like, using the versatility of postgres to act as a pure sql, nosql, and hybrid database.

/the_architecture_we_will_be_exploring

we’re going to explore a small part of one of those top up card services, mainly the part storing and querying the users and transactions data. we’re going to build some rest endpoints to retrieve and update that data, and a web interface that, I must confess, might err on the simple side as I am a backend developer.

here we go!


/starting_relational

this type of app provides a service to a user.
a user, then, is an important entity for these apps, though the user’s data itself is not the core of the service… (that we know of…)

let’s try a first attempt at modeling our user, with the basic data we need to capture

  • USER
    • user_uuid
    • user_id_type (passport, national id, etc)
    • user_id
    • first_name
    • last_name
    • age
    • credit_card
    • debit_card
    • status
    • address
    • joining_date
    • last_login

now… this is some pseudocode for a DB table

a few things to notice: this user model will sink very quickly. the main data we need is there, but it’s difficult to access. storing all the data from a credit card in a single field makes no sense… limiting a user to just 2 cards makes no sense either.

1st refinement

  • USER
    • user_uuid
    • user_id_type (passport, national id, etc)
    • user_id
    • first_name
    • last_name
    • age
    • status
    • address_personal_id
    • address_ship_to_id
    • last_login
    • created_at
    • modified_at
  • CREDIT / DEBIT CARD
    • card_uuid
    • type (credit, debit)
    • internal (is this one of our cards?)
    • issuer (a bank? a card itself)
    • processor (visa, amex, master)
    • number
    • country
    • holder_name
    • valid_from_date
    • expiration_date
    • security_code
    • created_at
    • modified_at
  • Address
    • address_uuid
    • country
    • city
    • street_name
    • street_number
    • building_type (house, apartment)
    • floor
    • apartment
    • created_at
    • modified_at

now, the model for our user is better than the original but still needs some massaging.
the thing to notice here is that we are limiting the number of addresses a user can have. well, addresses are not likely to change very often and we won’t keep history in the addresses table itself, so that limit is not terrible. the cards a user can have are another story: our users can choose to add many cards, and at some point cards will expire, so limiting the number of cards won’t be the best approach.

enter an intermediate table that relates a user to their cards… a single user can have many cards, and maybe a family will share a single card among its members, so we need a table that allows more than one user to relate to more than one card: a many-to-many table.

  • USERS AND CARDS
    • user_uuid
    • card_uuid
    • status (in_use, discarded)
    • created_at
    • modified_at

can this be refined further?
YES! extensively…

users can also use bank accounts to top up their balance, for example, but the core of this example is to show postgres no sql rather than to achieve a perfect user model, and we have a fine set of tables to start playing with.
so we will leave our user, and its immediate satellite data, like this for the time being.

let’s give this a SQL spin

user.sql
--create the DB
--creating user and satellite data

set search_path to postgres_no_sql;

create table if not exists users
(
  id                  serial8 primary key,
  user_id_type        varchar(25) check ( user_id_type in ('driver_licence', 'passport', 'national id')),
  user_id             varchar(25),
  first_name          varchar(25),
  last_name           varchar(25),
  age                 smallint,
  status              varchar(25) check ( status in ('active', 'inactive', 'suspended') ),
  address_personal_id integer,
  address_ship_to_id  integer,
  last_login          timestamp with time zone,
  created_at          timestamp with time zone,
  modified_at         timestamp with time zone
);
banking_card.sql
create table if not exists banking_card
(
  id              serial8 primary key,
  type            varchar(20) check (type in ('credit', 'debit')),
  internal        boolean,
  issuer          varchar(20) check (issuer in ('bank', 'another_card')),
  processor       varchar(20) check (processor in ('visa', 'amex', 'master')),
  number          varchar(19), -- card numbers can be up to 19 digits and may carry leading zeros, too big for an integer
  country         varchar(20),
  holder_name     varchar(40),
  valid_from_date timestamp,
  expiration_date timestamp,
  security_code   smallint,
  created_at      timestamp with time zone,
  modified_at     timestamp with time zone
);
address.sql
create table if not exists address
(
  id            serial primary key,
  country       varchar(20),
  city          varchar(20),
  street_name   varchar(50),
  street_number varchar(7),
  post_code     varchar(10),
  building_type varchar(20) check ( building_type in ('house', 'apartment') ),
  floor         smallint,
  apartment     varchar(10),
  created_at    timestamp with time zone,
  modified_at   timestamp with time zone
);
users_cards.sql
create table if not exists users_and_cards
(
  user_id         integer,
  banking_card_id integer,
  status          varchar(20) check (status in ('in_use', 'discarded')),
  created_at      timestamp with time zone,
  modified_at     timestamp with time zone
);
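
to tie these tables back to the topic of this note, here is a sketch of the kind of query we’ll be building later: a user and all of their cards folded into a single json document (it assumes some rows already exist, and the exact shape will evolve as we go):

select json_build_object(
         'first_name', u.first_name,
         'last_name',  u.last_name,
         'cards',      json_agg(json_build_object('type', c.type, 'processor', c.processor))
       ) as user_document
from users u
join users_and_cards uc on uc.user_id = u.id
join banking_card c on c.id = uc.banking_card_id
group by u.id, u.first_name, u.last_name;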

all right!

we do have our users and cards and their satellite data sql files.

NOTE: something we need is for our top up / 2.0 banking card system to be able to onboard and retrieve users.

our example might make it look trivial, but banking regulations and fraud protection can make it difficult for those guys to onboard a new client. luckily we’re not a bank and we can skip all those regulatory validations.

/first_steps_into_documents

with our project and tables set up, we can dive into some postgres functionality as a no sql document db.

to explore the features we will use a toy amount of data, then move to a more realistic number of records.

/loading_some_data

let’s create toy db seed files containing small amounts of data to test out a few features and check how we can make a json out of the columns. I will prepend the word toy to these file names to indicate the data is likely to disappear or be discarded in the final app

toy_user_seed.sql
toy_address_seed.sql
toy_banking_card_seed.sql
toy_users_and_cards_seed.sql
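
I’m not listing the seed contents here; just as a flavour, toy_user_seed.sql could look something like this (all values are made up):

-- toy data, made up values
insert into users (user_id_type, user_id, first_name, last_name, age, status,
                   address_personal_id, address_ship_to_id, last_login, created_at, modified_at)
values ('passport', 'AB123456', 'john', 'doe', 33, 'active', 1, 1, now(), now(), now());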

this is all for now on the fourth note on postgres noSql.

——————– to explore in a later note: dependencies on relations inside docs. for example, can a doc guarantee that the user id is among known users?? mongo / postgresql foreign keys?