ActiveRecord Overview and Tips

Mar 15 2016

What is ORM?

Object-Relational Mapping or ORM is a technique that connects the
rich objects of an application to tables in a relational database
management system. Using ORM, the properties and relationships of the
objects in an application can be easily stored and retrieved from a
database without writing SQL statements directly and with less
overall database access code. This post goes over the excellent
Ruby ORM called ActiveRecord


Migrations & Schema

Migrations are a convenient way to alter your database schama over time
in a consistent and easy way. They use Ruby code so you don't have to
edit tables by SQL.

Migrations capture the change in schema, the outcome is captured in
db/schema.rb 

For example, the migration

class CreateStudents < ActiveRecord::Migration
  def change
    create_table :students do |t|
      t.string :first_name
      t.string :last_name
      t.integer :age
      t.belongs_to :house
      t.timestamps
    end
  end
end

will be translated to schema.rb as

  create_table "students", force: :cascade do |t|
    t.string   "first_name"
    t.string   "last_name"
    t.integer  "age"
    t.integer  "house_id"
    t.datetime "created_at"
    t.datetime "updated_at"
  end

Queries

Lets review some basic AR code, and its SQL interpretation.

Insert

The INSERT INTO Statement is used to add new rows of data to a table in
the database.

  Student.create(first_name: "Johny", last_name: "Bravo", age: 25)
  INSERT INTO students (first_name, last_name, age) VALUES ('Johny', 'Bravo', 25)
Select

SELECT statement is used to fetch the data from a database table which
returns data in the form of result table. These result tables are called
result-sets.

  Student.first
  SELECT * FROM students ORDER BY id LIMIT 1
Where

The WHERE clause is used to specify a condition while fetching the data
from single table or joining with multiple tables.

  Student.where('age < 30')
  SELECT * FROM students WHERE age < 30
  Student.where("first_name LIKE (?)", "%Har%")
  SELECT students.* FROM students WHERE (first_name LIKE ('%Har%'))
Composed Queries

You can chain AR queries. AR will analyze the code, and convert it to
one or more SQL queries.

  Student.where('age < 30').select(:first_name, :last_name)
  SELECT first_name, last_name FROM students WHERE age < 30
Join

An SQL JOIN clause is used to combine rows from two or more tables,
based on a common field between them.

The most common type of join is INNER JOIN (simple join). An SQL INNER
JOIN returns all rows from multiple tables where the join condition is
met.

Say we have 2 tables
Students:

id first_name last_name catch_phrase       house_id
1   johny       bravo     man I'm pretty     1        
2   elmaer     fudge     be vewi vewi quiet 2        
3   ...         ...       ...                          
                                                       

Houses:

id city street   number
1   NY   Broadway 1      
2   NY   5th       4      
3   ...   ...       ...    
                           

We want to get a list of [student_id, house_id]

  SELECT students.*, houses.*
  FROM students
  INNER JOIN houses
  ON students.house_id = houses.id

Note that we don't really need to pull each student & houses data into memory. We can specify which fields are required for our purpose.
More about that at the best practices section.

  SELECT students.id as student_id , houses.number as houses_number
  FROM students
  INNER JOIN houses
  ON students.house_id = houses.id

Result:

student_id house_number
1           1            
2           4            
3           ...          
                         

Relations

Relations between tables can be defined using secondary indexes.
For example, at the AR migration below, belongs_to :house simply adds a house_id column at that student table.

class CreateStudents < ActiveRecord::Migration
  def change
    create_table :students do |t|
      t.string :first_name
      t.string :last_name
      t.integer :age
      t.belongs_to :house
      t.timestamps
    end
  end
end

We also want to be able to add this student-house functionality to the Student Model:

  class Student < ActiveRecord::Base
    belongs_to :house

    def my_house_name
      house.name
    end

    def self.first_student_house_name
      Student.first.house.name
    end
  end

Many-to-many relations require an intermediate table. So if a Student
may take many courses & each course may be taken by many students - we
need to create such table. Note that for these intermediate tables, id
column is not mandatory.

  class CreateCoursesStudents < ActiveRecord::Migration
    def change
      create_table :courses_students do |t|
        t.belongs_to :student, index: true
        t.belongs_to :course, index: true
      end
    end
  end

Now we can add functionality to our models:

  class Student < ActiveRecord::Base
    belongs_to :house
    has_and_belongs_to_many :courses

    def first_course
      courses.first
    end
  end
  
  class Course < ActiveRecord::Base
    has_and_belongs_to_many :students
    
    def sort_students_by_id
      students.sort
    end
  end

Sequences

By default tables are created with an primary id integer column. The
value of this aaafiled is determined by the table sequence.  This is
what makes sure ids are unique and not null, as primary keys requires.

Some DB's allows us to create a costume sequence. Assuming we use
Postgres, we want models Potion & Charms that share an unique
magical_id, and use that as our primary key.

First we'll define the migrations

  class CreateMagicalSequence < ActiveRecord::Migration
    def change
      execute "CREATE SEQUENCE magical_seq INCREMENT BY 1 START WITH 1000"
    end
  end

  class CreatePotions < ActiveRecord::Migration
    def change
      execute "CREATE TABLE potions(magical_id INTEGER DEFAULT NEXTVAL('magical_seq'), name VARCHAR(32));"
    end
  end

  class CreateCharms < ActiveRecord::Migration
    def change
      execute "CREATE TABLE charms(magical_id INTEGER DEFAULT NEXTVAL('magical_seq'), name VARCHAR(32));"
    end
  end

Then set AR models to use 'magical_id' as primary key.

  class Potion < ActiveRecord::Base
    self.primary_key = 'magical_id'
  end
  class Charm < ActiveRecord::Base
    self.primary_key = 'magical_id'
  end

Now we can run something like

  Potion.create(name: 'p1').magical_id
  # => 1000
  Charm.create(name: 'c1').magical_id
  # => 1001
  Charm.create(name: 'c1').magical_id
  # => 1002
  Potion.create(name: 'p2').magical_id
  # => 1003

Best Practices

Use AR built in queries

ActiveRecord has a lot of built-in functionality we can use.

  Course.ids
  Student.count
  House.exists?
Keep code conversions

AR will convert different queries to the same SQL. It's easier to follow when we use the same patterns everywhere.

  House.where(name: 'house_name').first # bad
  House.find_by_name('house_name') # bad
  House.find_by(name: 'house_name') # good
  SELECT  houses.* FROM houses WHERE houses.name = 'house_name' LIMIT 1
Minimize the number of DB connections

Use arrays at queries.

  # bad
  s1 = Student.find(1)
  s2 = Student.find(5)
  s3 = Student.find(23)
  # good
  Student.where(id: [1, 5, 23])
 SELECT students.* FROM students WHERE students.id IN (1, 5, 23)

Use find_or_create

  emails = %w([email protected] papa.email.com)
  # not so good
  emails.each do |e| 
    u = User.find_by_email(e)
    unless u User.create(email: e)
  end
  # better
  emails.each {|e| User.find_or_create_by(email: e)}
Bulk actions

Each time you insert data to a table there are a few things going on.
Creating a connection to the DB is an expensive action. Not only that,
chances are the DB is on a remote system somewhere, so we need to think
about network overhead too.

AR has some built-in bulk actions such as update_all, destroy_all...

  def add
    i = Student.count
    Student.create(first_name: "fname_#{i}", last_name: "lname_#{i}", age: AGE_RANGE.sample)
  end

  # VERY BAD!
  def bulk_insert(n)
    (1..n).each { add }
  end

  # better...
  def bulk_insert(n)
    c = Student.count
    ActiveRecord::Base.transaction do
      (c + 1.. c + n).each { |i| Student.create(first_name: "fname_#{i}", last_name: "lname_#{i}", age: AGE_RANGE.sample) }
    end
  end

  # Even Better!
  def bulk_insert(n)
    c = Student.count
    values = (c + 1..c + n).map { |i| "('fname_#{i}', 'lname_#{i}', #{AGE_RANGE.sample})" }.join(', ')
    sql = "INSERT INTO students (first_name, last_name, age) VALUES #{values}"
    ActiveRecord::Base.connection.execute(sql)
  end
Avoid N+1 queries

Lets consider this code, what is the time complexity?

  students = Student.where("first_name LIKE (?)", "%H%")
  students.each {|s| s.house}

Now look at the console logs:

  2.3.0 :004 > students = Student.where("first_name LIKE (?)", "%H%")
    Student Load (0.5ms)  SELECT "students".* FROM "students" WHERE (first_name LIKE ('%H%'))

  2.3.0 :005 > students.each {|s| s.house}
    House Load (0.1ms)  SELECT  "houses".* FROM "houses" WHERE "houses"."id" = ? LIMIT 1  [["id", 1]]
    House Load (0.0ms)  SELECT  "houses".* FROM "houses" WHERE "houses"."id" = ? LIMIT 1  [["id", 1]]
    House Load (0.0ms)  SELECT  "houses".* FROM "houses" WHERE "houses"."id" = ? LIMIT 1  [["id", 3]]
    House Load (0.0ms)  SELECT  "houses".* FROM "houses" WHERE "houses"."id" = ? LIMIT 1  [["id", 3]]
    House Load (0.0ms)  SELECT  "houses".* FROM "houses" WHERE "houses"."id" = ? LIMIT 1  [["id", 3]]
    House Load (0.0ms)  SELECT  "houses".* FROM "houses" WHERE "houses"."id" = ? LIMIT 1  [["id", 1]]
    House Load (0.0ms)  SELECT  "houses".* FROM "houses" WHERE "houses"."id" = ? LIMIT 1  [["id", 3]]
    House Load (0.0ms)  SELECT  "houses".* FROM "houses" WHERE "houses"."id" = ? LIMIT 1  [["id", 4]]
    House Load (0.0ms)  SELECT  "houses".* FROM "houses" WHERE "houses"."id" = ? LIMIT 1  [["id", 2]]
    House Load (0.0ms)  SELECT  "houses".* FROM "houses" WHERE "houses"."id" = ? LIMIT 1  [["id", 1]]
    House Load (0.0ms)  SELECT  "houses".* FROM "houses" WHERE "houses"."id" = ? LIMIT 1  [["id", 2]]
    House Load (0.0ms)  SELECT  "houses".* FROM "houses" WHERE "houses"."id" = ? LIMIT 1  [["id", 3]]

Hmmm....... that doesn't look too good.

Eager loading is AR mechanism for loading the associated records of the objects returned by Model.find using as few queries as possible.

Lets see what happens when we use the includes method

  2.3.0 :006 > students = Student.includes(:house).where("first_name LIKE (?)", "%H%")
    Student Load (0.5ms)  SELECT "students".* FROM "students" WHERE (first_name LIKE ('%H%'))
    House Load (0.2ms)  SELECT "houses".* FROM "houses" WHERE "houses"."id" IN (1, 3, 4, 2)

This code will send only 2 queries to the db. AR loaded the needed Houses to memory too. Running the students.each loop will not access the db.

You can also reduce time complexity by covering pure ruby logic down to db queries.
For example say we want to get tuples of {student_first_name, student_last_name, course_name} for each student & his courses.

  def students_names_and_courses
    Student.all.map do |s|
      res = []
      s.courses.each do |c| res.push({
        first_name: s.first_name,
        last_name: s.last_name,
        course_name: c.name
        })
      end
      res
    end
  end

We can do that by a single SQL query

  SELECT students.first_name as first_name, students.last_name as last_name, courses.name as course_name
  FROM courses_students 
  JOIN courses ON courses_students.course_id = courses.id
  JOIN students  ON courses_students.student_id = students .id
Minimize memory usage

Every time we make a query, we get back an ActiveRecord::Relation
object. Sometimes those objects can be pretty big. Use select to make
then thin, or even better, use pluck!

Say we wanna do some calculations on our students ages

 # bad - keeping unnecessary data in memory
 ages = []
 students = Student.all
 students.each {|s| arr.push(s.age)}
 do_some_calcs_on(ages)

 # better, but still bad - minimize the object size.
 ages = []
 Student.all.select(:age)
 students.each {|s| arr.push(s.age)}
 do_some_calcs_on(s.pluck(ages)

 # better - don't keep the object at memory at all!
 ages = Student.pluck(:age)
 do_some_calcs_on(ages)

You can also use limit to get the same effect. Say we want to get our 3 oldest students..

  # bad - doing the calculations in memory
  students = Student.all.sort_by{|s| s.age}.take 3
  # => SELECT "students".* FROM "students" ...
  # better - let the DB do the work
  Student.where('age > 17').order(:age).limit 3
  # => SELECT  "students".* FROM "students" WHERE (age > 17)  ORDER BY "students"."age" ASC LIMIT 3 ...

Resources

Guy Y.
Software Developer
Back to Blog