Postgres with JSON, works for me :)

Mar 20 2016

When designing a web application, one of the first decisions is where
and how to store the data. There are many alternatives, and the most
popular ones these days are PostgreSQL,
MySQL and MongoDB.
PostgreSQL and MySQL belong to the SQL camp or storing data in a
structured manner that is easily queried while MongoDB belongs to the
NoSQL camp that promotes storing data in a semi-structured manner that
scales very well but is more difficult to query.
In this blog post I will focus on PostgreSQL and on its feature of JSON
support that aims to provide some of the benefits of both worlds.
This post is not meant to be a SQL vs. NoSQL debate,
we
have
plenty
of
those :)

A quick overview first

Relational database - software application that holds its data in a
rational model. This means that the data is stored in tables, each table
has columns and rows. Each row represents an object, generally it
represents a class in object oriented programing. The structure of the
data is fixed and new records or object should be inserted according to
the predefined structure. The main language that is being used is SQL
for querying the database and make CRUD (create, update, delete)
operations. The popular relational databases are Oracle, SQL Server,
MySQL and PostgreSQL.

JSON - Javascript object notation, a standard for data
representation. JSON was originally used for data transmission, but today
it has many usages such as data store, configuration files, templating
etc. The JSON format is lightweight and allows unstructured data in
contrast to other format such as XML. Generally JSON does not have a
scheme so there are no validations on its content.

Document oriented database - one of the NoSQL families of databases.
This database in contrary to rational databases holds the data in
documents that are unstructured, it means that every document can be
different. One of the popular implementation for it is MongoDB, that
stores it data as JSON documents.

The hybrid solution

So what should you choose? This is a rather complex decision that
revolves around anticipated queries, scale, data structures etc. I don't
intend to outline all the considerations in this post but rather to show
an interesting midway... A "SQL" database with “NoSQL” capabilities such
as storing unstructured data.  PostgreSQL for instance added the JSON
column type in its 9.4 version.  This column can hold a JSON document
such as document oriented databases and enables indexing and complex
queries.  Now I can store the data structured and unstructured in the
same database and even in the same table.  For instance, let's assume an
application that receives a data from an API and it only uses the data
for presentation (think for instance canvas serialization). Instead of
migrating the data into a structured table, I can store it as JSON, and
not care about changes to the API. I can do things like search and
filter on the data with minimum effort. In addition to that I can store
some additional fields the will accompany the unstructured data such as
creation date, owner etc. and I can run joins against other tables
enjoying the best of both worlds. Of course "NoSQL" pundits will hail
the horizontal scalability aspects of NoSQL databases but the fact of
the matter is that for many if not most  apps this is not a real
requirement.

JSON in practice (with Rails)

First you need to update your PostgreSQL installation to 9.4 or later
you can find
here a
good guide for it.

Create a table with JSON column:

rails generate model Weather city:string forcast:jsonb

The result will be:

class CreateWeathers < ActiveRecord::Migration
  def change
    create_table :weathers do |t|
      t.string :city
      t.jsonb :forecast
      t.timestamps null: false
    end
    add_index  :weathers, :forecast, using: :gin
  end
end

The forecast column type is JSONB which is almost identical to the JSON
type but is more efficient for processing but a bit slower in storing.
To make thing go fast we will add an index to the column. JSONB indexes
are called GIN (Generalized Inverted Index) that makes queries much
faster.

In Order to query the JSON data, postgres defined special operators. A full table can be found here. The main ones are:
 - ->    get the data as JSON object
 - ->>   get the data as text

For this JSON for instance: (from http://openweathermap.org/api

  {"city": {
    "id": 2643743,
    "name": "London",
    "coord": {
      "lon": -0.12574,
      "lat": 51.50853
    },
    "country": "GB",
  },
  "list": [{
    "dt": 1458302400,
    "temp": {
      "day": 8.48,
      "min": 4.81,
      "max": 9.12,
    },
...

Query the weather from all the cities of Great Britain will be:

Weather.where("forecast -> 'city' ->> 'country' = 'GB'")

The forecast -> 'city' will retrieve the json of ‘city’, the ->> 'country' will retrieve the text of ‘country’ and will compare it to ‘GB’.

Conclusion

Using PostgreSQL with JSON type allows you to handle unstructured data
while still maintaining a lot of the benefits of a SQL database:

  • Transactions
  • Consistency (not just eventual consistency)
  • Query on JSON fields
  • Additional relational fields to accompany JSON data and join against
  • Great performance
Amitai B.
Software Developer
Back to Blog