SQLAlchemy Migrate Process Hiccups

So, I’ve done migration processes for two medium-large database schemas (50-100 tables) and I have found what I believe to be a disconnect in the process of migrating and a database and developing a database application.


The Problem
——————-
Here is the problem in general.  I am sorry it is so long winded, but it’s hard to see what is going on without this full expression.

You create a schema, and use that schema to create a database.  Maybe
you are using Pylons, and using setup-app to create the schema. Everyone is happy in schema land, and then someone realizes you need to make a change, and maybe even more changes in the future.  You decide to use migrate, because hell, migrate will make things easier. So, you write the migration, modify your model code and everything is honky dory.  Clients are happy, your developer feels like he has a maintainable codebase.  Your project begins to grow.  Awesome.

Not Awesome.  Your second developer needs a development environment. You give them your model, they setup-app it and they are ready to go. While they are plugging along, you realize you need a new migrate, so you create it, and upgrade your development version of the database, which is now at migration 2.  The production also goes off without a hitch, it’s now on 2.

What about your new development buddy?  When he ran migrate_version, he set his database up to version 0.  So, migrate will try to do 0->1 and then 1->2, but 0->1 will fail because his database is already at version 1 even though his version table says its 0.  Now he is in pain, and re-creates his entire database to get moving, and the whole cycle starts over again, although he can work… until you do the next migration.

This is not ideal, and I have been struggling with it at one of my clients.  In that situation, I am the developer buddy, bopping in from time to time to lend a hand.  I spend probably 5% of my time getting up and running, where if their migration system was somehow linked to their database model, I could be synchronized, and actually use it for a development tool, rather than a dusty old production tool.

Solution 1
—————
So, I came up with my own solution, and have talked to a close friend about an alternative solution.  I’d like to discuss both of them as they pertain to migrate, in the hopes that we can improve migrate, and  re-connect the development environment with the production one.

My solution is to connect the model base code to the migration version.  The way I accomplish this is to add a version.py file at the root of my model code, which has, incredibly, the version of the migration changeset that matches this model definition.  I also have a custom database creation script which creates a migrate_version table and fills it with a record of the correct version number.  Now I can svn up, and migrate up and everyone is happy. The problem with this method is that I have to maintain a damn file with the version number, and not go insane doing it.  Since my model contains 3 different database schemas in it, that means I have to maintain 3 variables in my version.py .  Personally, I don’t mind this, but I had to write more documentation than the amount in this post to make certain that my predecessors (or me in 6 months) can figure out what the heck is going on.

Solution 2
————–
The second solution, which Mark came up with, is to preserve the initial model.  Then, you do all of the migrations to bring you up to the current codebase.  This does not require any version file tracking, and if your migrations add boilerplate entries, you don’t have to record them in a second place. I think solution 2 is reasonable, but could be more time consuming to execute (which is probably not really a problem with everyone’s 2ghz machines these days).  The problems I see are that the boilerplate entries you make for production may a not be the same as you make for development.  This makes testing a bit harder if they differ. I also feel that solution 2 is more prone to problems, simply because there are a lot more cranks to turn to get a development database.

How can migrate help
—————————–
The way to provide solution 1 to the problem in migrate would be to allow the developer to connect his migrations to his model code, and allow migrate some level of control over the “versions” module in the model code.  So, when you do a migrate commit, you are also saying that the model code supports this version of migration.  It will probably only take me  a few hours to make this happen.  We then create an easy way to create the migrate_version table by importing something.  This way you can have a custom createdb script that also creates the migrate_version table.  Perhaps we could even provide a reasonable createdb template.

Solution 2 only requires us to provide a custom creation script.  For this we will have to preserve the initial schema (schema 0) and then upgrade.  We could probably also include a template instead of a script so that you can manage boilerplate on your own.

My own conclusions
—————————-
I lean more towards solution 1, mostly because I feel that it will provide for easier boilerplating, because you will have access to your whole model at once, instead of bits and pieces of your model as it moves from one version the next.  I also think the boilerplate for dev. is likely to be a lot different than that of prod for most folks, and I feel like solution 1 offers an easier way of handling it.  It also seems like solution 1 is much more efficient, because you don’t have to go and re-do all of that migration work, you have already arrived at the correct-for-now solution.

If you made it this far congratulations!  Now, go grab a cup of coffee, stretch your arms, come back and tell me what you think.

Tags: , , , ,

6 Responses to “SQLAlchemy Migrate Process Hiccups”

  1. Hans says:

    Well, I haven’t used Migrate yet; I look forward to being able to do that on an upcoming project. I had to solve a similar problem with our PHP ORM solution not that long ago. What we ended up doing was actually storing versions of our model in the database — in the comments on table names, to be specific. We also had a lightweight set of DB utilities and baseclasses that we could extend with our actual migration scripts. Yes, this solution does require that you use an RDBMS that supports comments (we use postgresql), but in practice it actually has worked very well and it provides some additional assurance that our migration scripts won’t try to migrate an already-upgraded table. This is certainly not the best solution, but I wanted to offer another lightweight option to this migration issue.

    -H

  2. percious says:

    Thanks for the reply hans. This was exactly the kind of discussion I was hoping to get. You have a unique solution to the problem, and I must admit it has the benefit of being flexible in that you could use other migrate solutions since the migration data is stored within the database, but it scares me a little in the way that PHPNuke does… But seriously, I am interested in all solutions people have for this problem. The more I think about it, the more I like solution 2.

  3. I haven’t used sqlalchemy-migrate, but the way we deal with this in the alter scripts we wrote at work is to have each migration test the database to see if it needs to be applied. So if a script wants to add a column, first the test() method is invoked and it looks for the column. If it isn’t found, it returns True, and the alter() method is run. If the column is found by test(), it returns False and the alter() method is not run.

    We found this to be much more workable than depending on recording which alterations have been made to a given database — that’s just redundant information if you have the ability to look at the schema.

  4. Hans says:

    I should also clarify that the end solution for us was a little more involved than simply storing version number in the table comment. Because we are using an ORM that generates the SQL based on a schema.xml file (Propel, to be specific), we modified our build system to split out all of the DDL related to indexes, foreign keys, views, etc. Before we actually run the migration code for each table, we do things like drop all FKEY constraints on the table. In some cases we have to drop other dependent objects too (e.g. views). At the end of our migration code we actually re-apply all of that dependent object creation DDL; much of it does not get applied (because it was never dropped), but that has proven a clean way of upgrading non-table objects.

    So, the point I wanted to clarify is that while the table-version-in-model method only addresses versioning for tables, we found that this was all that we really *needed* to version since we could drop and recreate all the other (dependent) objects in the system.

    We also moved all of the comment-setting code into database functions — we have a set_table_version(tablename, version) method, for example — which does some fancy regex to replace or prepend the version information to a comment. And a get_table_version(tablename) to get the current version. End result just looks like “v1.0.2 – Table to store user data.”

    In any event, the approach shouldn’t scare you like PHPNuke :) There was actually some thought put into how this works — and keeping the abstraction points in place to avoid coupling it too closely to any one particular database or even any one particular versioning mechanism. (We could always re-implement the Upgrade->isEligible(Table) method to use something besides table comments to determine eligibility.)

    -H

  5. Jorge Vargas says:

    I got very little experience with migrate, but I think there is a 3th, from migrate docs:
    “A freshly versioned database begins at version 0 by default. This assumes the database is empty. (If this is a bad assumption, you can specify the version at the time the database is declared under version control, with the version_control command.) We’ll see that creating and applying change scripts changes the database’s version number. ”

    This implies you can alter the initial version number. Although I’m not sure if you can do that with a migrate command, so back to your example: Say it is your project and I’m the one hoping in from time to time, you will give the the current model and the current db number, which I’ll insert as my first migration, and with that we should be able to fool the db.

    Now that’s all theory as I haven’t been able to get migrations going for this project :)

  6. Lucas says:

    Seems like Hans and Doug solution combined should be new migrate plan.

    Add a
    set_table_version(tablename, version)

    Version could be a combination of numbers (production.testing.development)(1.3.503)

    then when doing upgrades use the solution that goes through each upgrade version from 0 but first check if this version of the upgrade was applied? (isEligible(CurrentTableVersion,upgradeTableVersion)

    If upgrade was not applied, apply it,
    if upgrade was applied move on to the next item.

    This would make sure you only upgrade from what you are on forward.

    There should also be a command to reset it all. For example after few years you are at production 40.1.0, and you don’t want to check all the upgrades from day1. There should be a reset that takes the current 40.1.0 and makes it a first upgrade.

    Now we just needs some code samples to fill these parts in and connect it to migrate.

    Thanks,
    Lucas

Leave a Reply