Adding New Columns To Existing Tables
Typically, when you want to add a new column to an existing table, it is relatively quick, however, there are a few considerations to take into account:
Considerations:
1. You have defaults on your new column
Adding a column with a DEFAULT clause or changing the type of an existing column will require the entire table and its indexes to be rewritten.
https://www.postgresql.org/docs/9.3/static/sql-altertable.html
By adding a default, the database needs to go and add new data (i.e. the default value) to each and every existing row. That will completely lock down the table until the migration is finished. You do NOT want to to do this.
2. You are going to add an index to the new column
Adding an index will also create a lock on the table, and is time consuming for large tables. Luckily, there is a relatively workaround for this. See below
3. You are adding a new column to a table that typically has long running queries
When you attempt add a new column to a table, the new column query will hold the table hostage and wait for any existing queries to finish running before it attempts to add a new column. For something like our users table that constantly has email queries being run on it, adding a new column can be tough.
Workarounds:
1. Defaults
If you must have default values, the best way to handle this is to have 3 migrations. First run a migration to create the column.
add_column :temp_tables, :column_x, :int
Then run a migration to add the default.
change_column :temp_tables, :column_x, :int, :default => 42
Finally, go back and backfill all of the data will defaults.
num = 1
loop do
to_update = TempTable.where(column_x: nil).limit(1000)
num = to_update.size
break if num = 0
to_update.update_all(column_x: 42)
end
##Note: code not tested
This will update 1000 columns at a time to have the most minimal impact.
2. Indexes
Concurrent to the rescue!
add_index :temp_table, :column_x, algorithm: :concurrently
Adding concurrent to the index creation will not lock the table.
With the concurrent command, the index gets generated separately and gets added to the table after it gets created so it won't freeze the table.
3. Tables with long running queries
Aside from fixing the queries so that they aren't taking such a long time to run, you can do the following:
- Wait or pause/cancel the long running queries, run your migration, and unpause the queries
- Create a new table and tie it back with the original table (after all, Postgres is a relational DB system)
There might be better solutions to get around this, but none come to mind at this time. If you have better solutions, please share!