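Trying Out Alembic, a Database Migration Tool for Python

Hello, this is nobushi, an engineer. In earlier posts I introduced the Python framework FastAPI and the ORM library SQLAlchemy.

This time I would like to try database migration, which becomes necessary once a service is in operation. The migration tool is Alembic. As its URL (alembic.sqlalchemy.org) suggests, it is the migration tool for SQLAlchemy.

Setup
Running
Impressions

Setup

In an environment where docker-compose is available, place the following files in a directory of your choice. There are quite a few files this time.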

/
  alembic/
    versions/
      dc3f6aa44f46_create_some_infos_table.py
    env.py
    script.py.mako
  alembic.ini
  docker-compose.yml
  Dockerfile
  pyproject.toml
  main.py

docker-compose.yml

version: '3.5'
services:
  db:
    image: mysql:8.0
    container_name: db
    command: >
      --default-authentication-plugin=mysql_native_password
      --character-set-server=utf8mb4
      --collation-server=utf8mb4_bin
    environment:
      MYSQL_ROOT_PASSWORD: secret
      MYSQL_DATABASE: app
      MYSQL_USER: johndoe
      MYSQL_PASSWORD: secret
    ports:
      - 3306:3306
    restart: always
  app:
    build:
      context: .
    container_name: app
    depends_on:
      - db
    ports:
      - 80:80
    restart: always

This defines two containers: MySQL and the application.

Dockerfile

FROM python:3.11-alpine3.17

WORKDIR /app

CMD ["hypercorn", "main:app", "--bind", "0.0.0.0:80", "--access-logfile", "-", "--error-logfile", "-"]

ENV PYTHONPYCACHEPREFIX=/var/cache/python
ENV PYTHONPATH=/app

EXPOSE 80

ARG POETRY_VERSION=1.3.2

# Install Poetry
RUN apk update --no-cache && \
    apk add --no-cache curl autoconf g++ libtool make libffi-dev && \
    curl -sSL https://install.python-poetry.org | \
    POETRY_HOME=/opt/poetry POETRY_VERSION=${POETRY_VERSION} python - && \
    cd /usr/local/bin && \
    ln -s /opt/poetry/bin/poetry && \
    poetry config virtualenvs.create false && \
    apk del --no-cache curl autoconf g++ libtool make && \
    rm -rf /tmp/*

# Install Libraries
COPY ./pyproject.toml ./poetry.lock* /app/
RUN apk update --no-cache && \
    apk add --no-cache autoconf g++ libtool make && \
    poetry install --no-root && \
    apk del --no-cache autoconf g++ libtool make && \
    rm -rf /tmp/*

# Uninstall Poetry
RUN apk update --no-cache && \
    apk add --no-cache curl autoconf g++ libtool make libffi-dev && \
    unlink /usr/local/bin/poetry && \
    curl -sSL https://install.python-poetry.org | \
    POETRY_HOME=/opt/poetry POETRY_VERSION=${POETRY_VERSION} python - --uninstall && \
    apk del --no-cache curl autoconf g++ libtool make && \
    rm -rf /tmp/*

COPY . /app

The image is based on the official python image and installs the libraries with Poetry.

Poetry is not needed at runtime, so it is uninstalled afterwards to reduce the container size.

pyproject.toml

[tool.poetry]
name = "fastapi-alembic-sample"
version = "0.1.0"
description = ""
authors = []

[tool.poetry.dependencies]
python = "3.11.*"
alembic = "1.10.*"
fastapi = "0.94.*"
hypercorn = "0.14.*"
pymysql = "1.0.*"
python-multipart = "0.0.*"
sqlalchemy = "2.0.*"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry-core==1.1.*"]
build-backend = "poetry.core.masonry.api"

This is the FastAPI and SQLAlchemy setup from before, with alembic, the subject of this post, added.

Alembic configuration files

The Alembic configuration files are the following.

/
  alembic/
    env.py
    script.py.mako
  alembic.ini

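These are generated automatically by Alembic's init command.

You do not need to worry much about their contents, but at a minimum you must set the database connection URL in alembic.ini.

Below is the part of alembic.ini where the connection URL was rewritten.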

sqlalchemy.url = mysql+pymysql://johndoe:secret@db/app

The value is the same one that was set as the SQLAlchemy connection URL in the previous post.
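To make the pieces of that URL explicit, here is a small sketch that uses only the Python standard library. It reads sqlalchemy.url from an ini fragment and splits it into dialect, driver, user, host, and database. The inline INI_TEXT and the helper name describe_url are assumptions for illustration; Alembic itself hands the URL to SQLAlchemy rather than parsing it this way.

```python
import configparser
from urllib.parse import urlsplit

# Minimal stand-in for the relevant part of alembic.ini (assumed for illustration).
INI_TEXT = """
[alembic]
script_location = alembic
sqlalchemy.url = mysql+pymysql://johndoe:secret@db/app
"""


def describe_url(ini_text: str) -> dict:
    """Return the components of sqlalchemy.url found in the [alembic] section."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    url = urlsplit(parser["alembic"]["sqlalchemy.url"])
    # The scheme has the form "<dialect>+<driver>".
    dialect, _, driver = url.scheme.partition("+")
    return {
        "dialect": dialect,
        "driver": driver,
        "username": url.username,
        "password": url.password,
        "host": url.hostname,
        "database": url.path.lstrip("/"),
    }


if __name__ == "__main__":
    for key, value in describe_url(INI_TEXT).items():
        print(f"{key}: {value}")
```

The host part is db because that is the service name of the MySQL container in docker-compose.yml.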

The contents of each file are also listed below.

alembic.ini

# A generic, single database configuration.

[alembic]
# path to migration scripts
script_location = alembic

# template used to generate migration file names; The default value is %%(rev)s_%%(slug)s
# Uncomment the line below if you want the files to be prepended with date and time
# see https://alembic.sqlalchemy.org/en/latest/tutorial.html#editing-the-ini-file
# for all available tokens
# file_template = %%(year)d_%%(month).2d_%%(day).2d_%%(hour).2d%%(minute).2d-%%(rev)s_%%(slug)s

# sys.path path, will be prepended to sys.path if present.
# defaults to the current working directory.
prepend_sys_path = .

# timezone to use when rendering the date within the migration file
# as well as the filename.
# If specified, requires the python-dateutil library that can be
# installed by adding `alembic[tz]` to the pip requirements
# string value is passed to dateutil.tz.gettz()
# leave blank for localtime
# timezone =

# max length of characters to apply to the
# "slug" field
# truncate_slug_length = 40

# set to 'true' to run the environment during
# the 'revision' command, regardless of autogenerate
# revision_environment = false

# set to 'true' to allow .pyc and .pyo files without
# a source .py file to be detected as revisions in the
# versions/ directory
# sourceless = false

# version location specification; This defaults
# to alembic/versions.  When using multiple version
# directories, initial revisions must be specified with --version-path.
# The path separator used here should be the separator specified by "version_path_separator" below.
# version_locations = %(here)s/bar:%(here)s/bat:alembic/versions

# version path separator; As mentioned above, this is the character used to split
# version_locations. The default within new alembic.ini files is "os", which uses os.pathsep.
# If this key is omitted entirely, it falls back to the legacy behavior of splitting on spaces and/or commas.
# Valid values for version_path_separator are:
#
# version_path_separator = :
# version_path_separator = ;
# version_path_separator = space
version_path_separator = os  # Use os.pathsep. Default configuration used for new projects.

# set to 'true' to search source files recursively
# in each "version_locations" directory
# new in Alembic version 1.10
# recursive_version_locations = false

# the output encoding used when revision files
# are written from script.py.mako
# output_encoding = utf-8

sqlalchemy.url = mysql+pymysql://johndoe:secret@db/app


[post_write_hooks]
# post_write_hooks defines scripts or Python functions that are run
# on newly generated revision scripts.  See the documentation for further
# detail and examples

# format using "black" - use the console_scripts runner, against the "black" entrypoint
# hooks = black
# black.type = console_scripts
# black.entrypoint = black
# black.options = -l 79 REVISION_SCRIPT_FILENAME

# Logging configuration
[loggers]
keys = root,sqlalchemy,alembic

[handlers]
keys = console

[formatters]
keys = generic

[logger_root]
level = WARN
handlers = console
qualname =

[logger_sqlalchemy]
level = WARN
handlers =
qualname = sqlalchemy.engine

[logger_alembic]
level = INFO
handlers =
qualname = alembic

[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic

[formatter_generic]
format = %(levelname)-5.5s [%(name)s] %(message)s
datefmt = %H:%M:%S

alembic/env.py

from logging.config import fileConfig

from sqlalchemy import engine_from_config
from sqlalchemy import pool

from alembic import context

# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
config = context.config

# Interpret the config file for Python logging.
# This line sets up loggers basically.
if config.config_file_name is not None:
    fileConfig(config.config_file_name)

# add your model's MetaData object here
# for 'autogenerate' support
# from myapp import mymodel
# target_metadata = mymodel.Base.metadata
target_metadata = None

# other values from the config, defined by the needs of env.py,
# can be acquired:
# my_important_option = config.get_main_option("my_important_option")
# ... etc.


def run_migrations_offline() -> None:
    """Run migrations in 'offline' mode.

    This configures the context with just a URL
    and not an Engine, though an Engine is acceptable
    here as well.  By skipping the Engine creation
    we don't even need a DBAPI to be available.

    Calls to context.execute() here emit the given string to the
    script output.

    """
    url = config.get_main_option("sqlalchemy.url")
    context.configure(
        url=url,
        target_metadata=target_metadata,
        literal_binds=True,
        dialect_opts={"paramstyle": "named"},
    )

    with context.begin_transaction():
        context.run_migrations()


def run_migrations_online() -> None:
    """Run migrations in 'online' mode.

    In this scenario we need to create an Engine
    and associate a connection with the context.

    """
    connectable = engine_from_config(
        config.get_section(config.config_ini_section, {}),
        prefix="sqlalchemy.",
        poolclass=pool.NullPool,
    )

    with connectable.connect() as connection:
        context.configure(
            connection=connection, target_metadata=target_metadata
        )

        with context.begin_transaction():
            context.run_migrations()


if context.is_offline_mode():
    run_migrations_offline()
else:
    run_migrations_online()

alembic/script.py.mako

"""${message}

Revision ID: ${up_revision}
Revises: ${down_revision | comma,n}
Create Date: ${create_date}

"""
from alembic import op
import sqlalchemy as sa
${imports if imports else ""}

# revision identifiers, used by Alembic.
revision = ${repr(up_revision)}
down_revision = ${repr(down_revision)}
branch_labels = ${repr(branch_labels)}
depends_on = ${repr(depends_on)}


def upgrade() -> None:
    ${upgrades if upgrades else "pass"}


def downgrade() -> None:
    ${downgrades if downgrades else "pass"}

main.py

from functools import wraps
from typing import Any
from typing import Callable

from fastapi import FastAPI
from fastapi import Form
from sqlalchemy import Column
from sqlalchemy import Integer
from sqlalchemy import String
from sqlalchemy.future import create_engine
from sqlalchemy.orm import Session
from sqlalchemy.orm import declarative_base
from sqlalchemy.orm import scoped_session
from sqlalchemy.orm import sessionmaker

engine = create_engine("mysql+pymysql://johndoe:secret@db/app")

Base = declarative_base()

class SomeInfo(Base):
    __tablename__ = "some_infos"
    id = Column(Integer, primary_key=True)
    desc = Column(String(255), nullable=True)

ScopedSession = scoped_session(
    sessionmaker(bind=engine, future=True),
)

app = FastAPI()

def entrypoint(func: Callable[..., Any]) -> Callable[..., Any]:
    @wraps(func)
    def _entry_point(*args: Any, **keywords: Any) -> Any:
        session: Session = ScopedSession()
        try:
            result = func(*args, **keywords)
        except Exception:
            session.rollback()
            raise
        else:
            session.commit()
        finally:
            ScopedSession.remove()
        return result
    return _entry_point

@app.post(
    "/info",
)
@entrypoint
def post_info(
    desc: str = Form(...),
) -> dict:
    session: Session = ScopedSession()
    with session.begin_nested():
        some_info = SomeInfo(
            desc=desc,
        )
        session.add(some_info)
        session.flush()
        return {
            "id": some_info.id,
            "desc": some_info.desc,
        }

This is also almost the same as in the previous post, but the following line has been removed.

Base.metadata.create_all(engine)

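That line created the tables according to the model definitions. Alembic takes over that role this time, so the line is no longer needed.

alembic/versions/dc3f6aa44f46_create_some_infos_table.py

This file is the migration script, the heart of Alembic.

The skeleton of the file is generated with Alembic's revision command.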

"""create some_infos table

Revision ID: dc3f6aa44f46
Revises:
Create Date: 2023-03-11 13:26:20.603428

"""
from alembic import op
import sqlalchemy as sa

# revision identifiers, used by Alembic.
revision = 'dc3f6aa44f46'
down_revision = None
branch_labels = None
depends_on = None

def upgrade() -> None:
    op.create_table(
        "some_infos",
        sa.Column("id", sa.BigInteger, primary_key=True),
        sa.Column("desc", sa.Text),
    )

def downgrade() -> None:
    op.drop_table("some_infos")
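The file name of this script combines the revision ID with a "slug" derived from the revision message, following the default %%(rev)s_%%(slug)s template mentioned in the alembic.ini comments. The sketch below approximates that naming rule with the standard library. The function name revision_filename is hypothetical, and the slug logic is a simplified assumption about Alembic's behavior, not its actual code.

```python
import re


def revision_filename(rev: str, message: str, truncate_slug_length: int = 40) -> str:
    """Approximate the default "<rev>_<slug>.py" naming of revision scripts."""
    # Join the word characters of the message with underscores, lowercased.
    slug = "_".join(re.findall(r"\w+", message)).lower()
    # Keep the slug within the configured length (simplified truncation).
    slug = slug[:truncate_slug_length]
    return f"{rev}_{slug}.py"


if __name__ == "__main__":
    print(revision_filename("dc3f6aa44f46", "create some_infos table"))
```

With the revision ID and message from the docstring above, this yields the file name used in this post.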

A freshly generated script has empty upgrade() and downgrade() functions, so you fill in the details there. upgrade() moves the migration forward and downgrade() moves it back.

def upgrade() -> None:
    op.create_table(
        "some_infos",
        sa.Column("id", sa.BigInteger, primary_key=True),
        sa.Column("desc", sa.Text),
    )

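upgrade() creates the some_infos table.

The columns correspond to the model definition in main.py (id and desc), although the migration declares them as BigInteger and Text where the model uses Integer and String(255).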

def downgrade() -> None:
    op.drop_table("some_infos")

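downgrade() does the reverse in order to undo the change made by upgrade(). When upgrade() creates a table, simply dropping that table is enough.

In production, downgrade() will probably never be used, so implementing it is not strictly required, but it is handy during development. That said, a complete rollback is often difficult, so writing it to the extent possible seems a reasonable approach.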
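The symmetry between the two functions can be checked without Alembic at all. The following is a minimal sketch that uses the standard library's sqlite3 module as a stand-in for MySQL. The upgrade and downgrade functions here take a connection argument and issue raw SQL, which is an assumption made for illustration and differs from Alembic's op API.

```python
import sqlite3


def upgrade(conn: sqlite3.Connection) -> None:
    # "desc" is an SQL keyword, so it is quoted as an identifier.
    conn.execute('CREATE TABLE some_infos (id INTEGER PRIMARY KEY, "desc" TEXT)')


def downgrade(conn: sqlite3.Connection) -> None:
    # Exactly undoes upgrade(): the table it created is dropped.
    conn.execute("DROP TABLE some_infos")


def table_names(conn: sqlite3.Connection) -> list:
    """List the tables that currently exist in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    before = table_names(conn)
    upgrade(conn)
    print(table_names(conn))
    downgrade(conn)
    # After downgrade() the schema should match the starting point.
    print(table_names(conn) == before)
```

A check like this is easy for a plain table creation; migrations that transform or delete data are where a faithful downgrade() becomes hard to write.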

Running

Let's run it.

First, start the containers.

> docker compose -p sample build
> docker compose -p sample up -d

* -p sample is specified to fix the project name.

At this point the some_infos table has not been created yet. Let's try calling the API.

> curl -X POST -d 'desc=hoge' http://localhost/info
Internal Server Error

It failed. The cause is that the table has not been created.

Now let's run Alembic to create the table.

Run the following command to execute Alembic.

> docker exec app alembic upgrade head

This runs Alembic's upgrade command inside the already running app container.

head means "up to the latest revision".

The command runs the upgrade() function of every migration script described above that has not been applied yet.

> docker exec app alembic upgrade head
INFO  [alembic.runtime.migration] Context impl MySQLImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> dc3f6aa44f46, create some_infos table
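The "Running upgrade" line in the log reflects how revisions are linked: each script names its predecessor in down_revision, and head is the revision that no other script points back to. As a rough sketch of that idea (not Alembic's implementation), the snippet below orders a set of revisions by following down_revision. The second revision ID a1b2c3d4e5f6 is made up for illustration, and the sketch assumes a single linear history without branches.

```python
from typing import Optional


def upgrade_path(revisions: dict[str, Optional[str]]) -> list[str]:
    """Return revision IDs from the base up to head for a linear history.

    `revisions` maps each revision ID to its down_revision (None for the base).
    """
    # head is the only revision that nobody lists as its down_revision.
    referenced = set(revisions.values())
    heads = [rev for rev in revisions if rev not in referenced]
    if len(heads) != 1:
        raise ValueError("expected exactly one head in a linear history")

    # Walk backwards from head to the base, then reverse the result.
    path = []
    current: Optional[str] = heads[0]
    while current is not None:
        path.append(current)
        current = revisions[current]
    path.reverse()
    return path


if __name__ == "__main__":
    history = {
        "dc3f6aa44f46": None,            # revision from this article
        "a1b2c3d4e5f6": "dc3f6aa44f46",  # hypothetical follow-up revision
    }
    print(upgrade_path(history))
```

With only the one script from this post, the path contains a single revision, which matches the single "Running upgrade" line in the log.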

The some_infos table has now been created.

Let's call the API again.

> curl -X POST -d 'desc=hoge' http://localhost/info
{"id":1,"desc":"hoge"}

This time it succeeded.
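The same request can also be issued from Python. This is a small sketch using only the standard library; build_info_request and post_info are hypothetical helper names, and the default base URL assumes the port mapping from docker-compose.yml.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_info_request(desc: str, base_url: str = "http://localhost") -> Request:
    """Build the form-encoded POST that the curl command sends."""
    body = urlencode({"desc": desc}).encode("ascii")
    return Request(
        f"{base_url}/info",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def post_info(desc: str) -> dict:
    """Send the request and decode the JSON response.

    Requires the containers to be running.
    """
    with urlopen(build_info_request(desc)) as response:
        return json.loads(response.read())
```

Note that urlopen raises urllib.error.HTTPError for a 500 response, so calling post_info before the migration has been applied would raise an exception rather than return a value.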

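Impressions

Database schemas are often updated even after a service has launched. A migration tool is practically essential for that.

In actual operation, however, when and how to run the migration tool is also an important point. I plan to write about that separately.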