I’m reading SQLアンチパターン again.

I’m gonna roughly summarize it.

Premise

  • Query: means SELECT statement in the book
  • Statement: means non- SELECT statements like INSERT, UPDATE, DELETE and other Schema statements.

Examples in the book are mainly database for Bug management application.

Its tables relations are following:

(For me, describe the relations like ActiveRecord in Ruby on Rails)

class Account < ApplicationRecord
  has_many :repoted_bugs,  class_name: 'Bug', foreign_key: :reported_by
  has_many :assigned_bugs, class_name: 'Bug', foreign_key: :assigned_to
  has_many :verified_bugs, class_name: 'Bug', foreign_key: :verified_by
end

class Bug
  belongs_to :reporter,      class_name: 'Account', foreign_key: :reported_by
  belongs_to :assigned_user, class_name: 'Account', foreign_key: :assigned_to
  belongs_to :verifier,      class_name: 'Account', foreign_key: :verified_by

  has_many :comments

  has_many :bugs_products
  has_many :products, through: :bugs_products

  has_many :tags
end

class BugStatus
end

class Tag
  belongs_to :bug
end

class Snapshot
  belongs_to :bug
end

class Product
  has_one :installer
end

class Installer
  belongs_to :installer
end

The classes above is just working one, and it will be revised through chapters.

and we also think about Bug table like following,

class Issue
end

class Bug < Issue
end

class FeatureRequest < Issue
end

But table structures are unknown.

1. Jaywalking

column which has multiple values.

# e.g.)
bugs.repoter_ids = '10,13,14'

Problems

  • You need to use REGEXP to search records.
  • INDEX doesn’t work well.
  • JOIN takes cost.
  • difficult to use aggregate functions (SUM, AVG, etc)
  • Updating the columns takes cost.
    • Need 2 statements:
      1. Get current value with SELECT.
      1. UPDATE the value with using current value.
  • You can insert ANY type of value in the column.
  • How to check the separation character?

Solution

Use intersection table

CREATE TABLE Contacts (
  product_id BIGINT UNSIGNED NOT NULL,
  bug_id     BIGINT UNSIGNED NOT NULL,

  PRIMARY KEY (product_id, bug_id),
  -- ...
);

2. Naive Trees

(I don’t know the title of English ver.)

Add parent_id to express threads-type comments.

CREATE TABLE Comments (
  comment_id SERIAL PRIMARY KEY,
  parent_id  BIGINT UNSIGNED,
  comment    TEXT NOT NULL,
  FOREIGN KEY (parent_id) REFERENCES Comments(comment_id)
);

This is called Adjacency List.

Problems

  • You need to the # of layers JOIN when fetching all the comments.
  • Deleting node takes cost
    • Need to check and update the node’s children’s parent_id

You can use this pattern if your database supported recursion query

Or just suppose 1 or 2 comments layers.

Solutions

1. Path Enumeration

like UNIX path system,

# Suppose there are comments which ids are 1, 2, and 3.
# And new comment (reply) is added toward comment2.
comment.path = '1/2/4'

This pattern may be Jaywalking pattern…

2. Nested Set

using nsleft, nsright

nsleft: the smallest value of its children nsright: the largest value of its children

But this pattern might cause more complicated insert and moving operations.

If you want to support fast and easy query execution for the sub tree, take this pattern.

3. Closure table

Create TreePaths table out of the Comments table.

CREATE TABLE TreePaths (
  ancestor   BIGINT UNSIGNED NOT NULL,
  descendant BIGINT UNSIGNED NOT NULL,

  PRIMARY KEY (ancestor, descendant),
  -- ...
);

Store all paths for comments outside Comments.

e.g.) There are 4 comments.

ancestor descendant
1 1
1 2
1 3
1 4
2 2
2 4
3 3
4 4
  • Comment2 is under 1.
  • Comment3 is under 1.
  • Comment4 is under 2.

Most operations gets easier, but occupy more memories.

3. ID Required

What the Primary Key for?

Gurantee the uniqueness for all row of the table.

To avoid generating duplicated row.

Problems

  • If you define id in intersection table, you need to UNIQUE restriction for the pair ids.
    • e.g.) Contacts table (bug_id, account_id)
  • The id name is too general.

Solutions

You know the solution.

4. Key less entry

skip

5. EAV (Entity Attribute Value)

skip

This is like NoSQL.

EAV is also called as Open Schema, Schemaless, Name/Value pair.

6. Polymorphic Association

class Bug
  has_many :comments, as: :commentable
end

class FeatureRequest
  has_many :comments, as: :commentable
end


# Comments
# ========
# commentable_id   integer
# commentable_type varchar

class Comment
  belongs_to :commentable, polymorphic: true
end

Problems

  • You cannot define foreign key
  • etc.

Solutions

1. Create each table for the type

class Bug
  has_many :comments, class_name: 'BugComment'
end

class BugComment
  belongs_to :bug
end

# FeatureRequest / FeatureRequestComment

2. Use STI for the base table.

class Issue
  has_many :comments
end

class Bug < Issue
end

class FeatureRequest < Issue
end

class Comment
  belongs_to :issue
end

Simple!

7. Multi Column Attribute

skip

8. Metadata Tribble

Creating versioned tables to decrease rows per table.

  • Bugs_2008
  • Bugs_2009
  • Bugs_2010

Problems

  • There causes many troublesome for operation
  • Difficult to update row between tables
  • How to guarantee the uniqueness between tables
  • Cannot use foreign keys

But if you want to use this pattern for archive, it’s ok!

Solutions

1. Horizontal Partitioning (Sharding)

skip

2. Vertical Partitioning

  • Create separated table for variable length columns.
    • BLOB, TEXT and other variable length column type might cause low performance.
    • It depends on DB system.

Conclusion

Finish the logical design part of the book.