先安科技 - Odoo Commit Daily Read/8

Today's reading is a commit from December 1, 2017 that concerns scheduled actions (ir.cron).

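Reason for the change:

It all started with commit 763d71 (which we covered in Odoo Commit Daily Read/6 - 763D71): if the system contains any module in a 'to x' state (to install, to remove, to upgrade), scheduled actions are not executed. At first glance there is nothing wrong with that. However, if the installation, uninstallation, or upgrade of such a module fails for some reason, the module stays in its 'to x' "zombie" state forever, and the scheduled actions stay halted forever as well.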

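How it was fixed:

The commit first adds a function, reset_modules_state, that resets "zombie" modules to their original state. The function is hooked into the module loading path, so it runs whenever load_modules fails. That is enough for newly created databases and for databases on which a module update happens to run. A database that is already in use, already contains "zombie" modules, and never runs another module update will never trigger the reset, so its scheduled actions stay "scheduled" and never execute. To cover that case, the method that runs scheduled actions gains a check: has the next execution date, nextcall, fallen more than 5 hours behind the current time? In other words, if a scheduled action has gone 5 hours without being executed, reset_modules_state is called and the "zombie" modules are laid to rest ("R.I.P."), meaning their states are reset to what they were before.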
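The effect of the reset can be tried out in isolation. The following is a minimal sketch, not Odoo code: an in-memory SQLite database stands in for PostgreSQL, the sample rows are made up, and reset_transient_states is a hypothetical helper that only mirrors the two UPDATE statements from the commit.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL database Odoo actually uses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ir_module_module (name TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO ir_module_module VALUES (?, ?)",
    [
        ("sale", "to upgrade"),   # made-up sample rows
        ("stock", "to remove"),
        ("crm", "to install"),
        ("base", "installed"),
    ],
)


def reset_transient_states(connection):
    """Hypothetical helper mirroring the two UPDATE statements of the commit."""
    # Modules that were installed before the failed operation go back to 'installed'.
    connection.execute(
        "UPDATE ir_module_module SET state='installed' "
        "WHERE state IN ('to remove', 'to upgrade')"
    )
    # Modules that were never installed go back to 'uninstalled'.
    connection.execute(
        "UPDATE ir_module_module SET state='uninstalled' WHERE state='to install'"
    )
    connection.commit()


reset_transient_states(conn)
states = dict(conn.execute("SELECT name, state FROM ir_module_module"))
print(states)
```

After the call no row is left in a 'to x' state, which is exactly the condition the cron runner checks before it agrees to execute jobs.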

Code walkthrough:

if changes:
    if not jobs:
        raise BadModuleState()
    # nextcall is never updated if the cron is not executed,
    # it is used as a sentinel value to check whether cron jobs
    # have been locked for a long time (stuck)
    parse = fields.Datetime.from_string
    oldest = min([parse(job['nextcall']) for job in jobs])
    if datetime.now() - oldest > MAX_FAIL_TIME:
        odoo.modules.reset_modules_state(db_name)
    else:
        raise BadModuleState()

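The snippet is only a few lines long, but the history of the original submission shows that the committer's first approach did the comparison in memory: put simply, a variable kept count of how many times module installation had failed. That looks reasonable, but odony pointed out that in multi-process mode (--workers X) the workers do not share memory, so that approach would not work.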
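To make the objection concrete, here is a toy simulation rather than real Odoo behavior. The Worker class, the MAX_FAILURES threshold, the number of workers, and the round-robin scheduling are all assumptions made for illustration; plain objects stand in for separate processes that cannot see each other's memory.

```python
MAX_FAILURES = 300  # hypothetical threshold for the rejected in-memory design


class Worker:
    """Stand-in for one server process: the counter lives only in its own memory."""

    def __init__(self):
        self.failures = 0

    def record_failure(self):
        # Each worker can only count the failures it has seen itself.
        self.failures += 1
        return self.failures >= MAX_FAILURES


# Assumption: four workers take turns picking up the failing cron tick.
workers = [Worker() for _ in range(4)]
tripped = False
for tick in range(MAX_FAILURES):
    tripped = workers[tick % len(workers)].record_failure() or tripped

print([w.failures for w in workers])  # each worker saw only its own share
print(tripped)                        # no single counter reached the threshold
```

Although the cron failed MAX_FAILURES times in total, every private counter is still well below the threshold, and a restarted worker would start again from zero. A value stored in the database does not have that problem, because every process reads the same row.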

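After a "heated" discussion (26 comments, 12 commits), the code above became the way to decide whether scheduled actions are "stuck" because of module states. The code is simple because the "next execution date" field (nextcall) only changes once the scheduled action has actually been executed, so it can be compared against the current time, datetime.now(); if the gap exceeds MAX_FAIL_TIME, the module states are reset.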
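The comparison itself can be sketched with the standard library alone. This is an illustration under assumptions: crons_look_stuck and DATETIME_FORMAT are names invented here, the timestamps are made up, and datetime.strptime replaces the fields.Datetime.from_string call used in the commit.

```python
from datetime import datetime, timedelta

MAX_FAIL_TIME = timedelta(hours=5)     # same value as in the commit
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed string format of nextcall values


def crons_look_stuck(nextcalls, now, max_fail_time=MAX_FAIL_TIME):
    """Return True when the oldest pending nextcall lags more than max_fail_time behind now."""
    oldest = min(datetime.strptime(value, DATETIME_FORMAT) for value in nextcalls)
    return now - oldest > max_fail_time


now = datetime(2017, 12, 1, 14, 0, 0)
# Oldest job is 5h30 overdue: treated as stuck, so the states would be reset.
print(crons_look_stuck(["2017-12-01 13:30:00", "2017-12-01 08:30:00"], now))
# Oldest job is only 4h30 overdue: keep waiting (the commit raises BadModuleState here).
print(crons_look_stuck(["2017-12-01 13:30:00", "2017-12-01 09:30:00"], now))
```

Passing now as an argument instead of calling datetime.now() inside the function is a choice made for this sketch so that the result is reproducible. An empty list would make min() raise ValueError, which is why the commit handles the "no jobs" case before computing the oldest date.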

Original commit message

From 3d1e23aaba2e220c7a92fb4323746a08a5a1b7ed Mon Sep 17 00:00:00 2001
From: Adrian Torres <[email protected]>
Date: Fri, 1 Dec 2017 14:15:42 +0000
Subject: [PATCH] [FIX] *: Reset module states on registry init error

Commit 763d714 introduced cron job locking for databases which had
modules with states set to 'to x', however if an
installation/uninstallation/upgrade fails, the state will stay at 'to
x', and it may stay in that state for an indefinite amount of time,
meaning that cron jobs could stay locked forever.

This commit fixes this in part by adding a cleanup function to loading.py that
will be executed whenever load_modules fails, the function will change
every 'to x' module to their original state, effectively unlocking the
execution of cron jobs.

This however only works to prevent "zombie" transient states for
brand new databases, however for existing databases which already
contain some modules in a zombie state it won't do anything unless
a module is installed/uninstalled/upgraded, which may never happen.

This is where the second part comes in (ir_cron.py), when failing to
execute crons, we check if the failure was due to bad module state
and if an arbitrary amount of time (5 hours as of this commit) has passed
since the last time it was supposed to be executed, if it is the case, it means
that the cron execution failed around 5 * 60 times (1 failure per minute for 5h)
in which case we assume that the crons are stuck because the db
has zombie states and we force a call to reset_module_states.
---
 odoo/addons/base/ir/ir_cron.py | 18 ++++++++++++++++--
 odoo/modules/__init__.py       |  2 +-
 odoo/modules/loading.py        | 21 +++++++++++++++++++++
 odoo/modules/registry.py       |  6 +++++-
 4 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/odoo/addons/base/ir/ir_cron.py b/odoo/addons/base/ir/ir_cron.py
index c577ea221acd4..59653ae5183a0 100644
--- a/odoo/addons/base/ir/ir_cron.py
+++ b/odoo/addons/base/ir/ir_cron.py
@@ -5,7 +5,7 @@
 import time
 import psycopg2
 import pytz
-from datetime import datetime
+from datetime import datetime, timedelta
 from dateutil.relativedelta import relativedelta

 import odoo
@@ -16,6 +16,7 @@
 _logger = logging.getLogger(__name__)

 BASE_VERSION = odoo.modules.load_information_from_description_file('base')['version']
+MAX_FAIL_TIME = timedelta(hours=5)  # chosen with a fair roll of the dice


 class BadVersion(Exception):
@@ -198,7 +199,7 @@ def _process_jobs(cls, db_name):
                 (version,) = cr.fetchone()
                 cr.execute("SELECT COUNT(*) FROM ir_module_module WHERE state LIKE %s", ['to %'])
                 (changes,) = cr.fetchone()
-                if not version or changes:
+                if version is None:
                     raise BadModuleState()
                 elif version != BASE_VERSION:
                     raise BadVersion()
@@ -209,6 +210,19 @@ def _process_jobs(cls, db_name):
                               ORDER BY priority""")
                 jobs = cr.dictfetchall()

+            if changes:
+                if not jobs:
+                    raise BadModuleState()
+                # nextcall is never updated if the cron is not executed,
+                # it is used as a sentinel value to check whether cron jobs
+                # have been locked for a long time (stuck)
+                parse = fields.Datetime.from_string
+                oldest = min([parse(job['nextcall']) for job in jobs])
+                if datetime.now() - oldest > MAX_FAIL_TIME:
+                    odoo.modules.reset_modules_state(db_name)
+                else:
+                    raise BadModuleState()
+
             for job in jobs:
                 lock_cr = db.cursor()
                 try:
diff --git a/odoo/modules/__init__.py b/odoo/modules/__init__.py
index 530f81ff5e394..b223df992802d 100644
--- a/odoo/modules/__init__.py
+++ b/odoo/modules/__init__.py
@@ -7,7 +7,7 @@

 from . import db, graph, loading, migration, module, registry

-from odoo.modules.loading import load_modules
+from odoo.modules.loading import load_modules, reset_modules_state

 from odoo.modules.module import (
     adapt_version,
diff --git a/odoo/modules/loading.py b/odoo/modules/loading.py
index e974484484125..950704e45e9e4 100644
--- a/odoo/modules/loading.py
+++ b/odoo/modules/loading.py
@@ -431,3 +431,24 @@ def load_modules(db, force_demo=False, status=None, update_module=False):
             _logger.log(25, "All post-tested in %.2fs, %s queries", time.time() - t0, odoo.sql_db.sql_counter - t0_sql)
     finally:
         cr.close()
+
+
+def reset_modules_state(db_name):
+    """
+    Resets modules flagged as "to x" to their original state
+    """
+    # Warning, this function was introduced in response to commit 763d714
+    # which locks cron jobs for dbs which have modules marked as 'to %'.
+    # The goal of this function is to be called ONLY when module
+    # installation/upgrade/uninstallation fails, which is the only known case
+    # for which modules can stay marked as 'to %' for an indefinite amount
+    # of time
+    db = odoo.sql_db.db_connect(db_name)
+    with db.cursor() as cr:
+        cr.execute(
+            "UPDATE ir_module_module SET state='installed' WHERE state IN ('to remove', 'to upgrade')"
+        )
+        cr.execute(
+            "UPDATE ir_module_module SET state='uninstalled' WHERE state='to install'"
+        )
+        _logger.warning("Transient module states were reset")
diff --git a/odoo/modules/registry.py b/odoo/modules/registry.py
index c92da2df13efd..50a0be95986f3 100644
--- a/odoo/modules/registry.py
+++ b/odoo/modules/registry.py
@@ -79,7 +79,11 @@ def new(cls, db_name, force_demo=False, status=None, update_module=False):
                 try:
                     registry.setup_signaling()
                     # This should be a method on Registry
-                    odoo.modules.load_modules(registry._db, force_demo, status, update_module)
+                    try:
+                        odoo.modules.load_modules(registry._db, force_demo, status, update_module)
+                    except Exception:
+                        odoo.modules.reset_modules_state(db_name)
+                        raise
                 except Exception:
                     _logger.exception('Failed to load registry')
                     del cls.registries[db_name]