A production bug caused by paginated queries

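I built a statistics system whose data synchronization runs as a scheduled job. One step of the sync has to update a single column.
A simplified example:
The existing product table (in production, good_type is null for some rows):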
create table `good` (
`good_id` varchar(128) not null comment 'primary key id',
`good_name` varchar(128) not null comment 'product name',
`good_type` char(2) character set utf8mb4 collate utf8mb4_0900_ai_ci default null comment 'product type (01 | 02 | 03)',
primary key (`good_id`)
) engine=innodb default charset=utf8mb4 collate=utf8mb4_0900_ai_ci comment='product information table';
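The statistics table (it needs a good_id column, since every query below joins on goodstat.good_id, and good_type must be char(2) to hold values such as '01'):
create table `goodstat` (
`id` bigint not null auto_increment comment 'primary key id',
`good_id` varchar(128) not null comment 'product id (good.good_id)',
`good_name` varchar(128) character set utf8mb4 collate utf8mb4_0900_ai_ci not null comment 'product name',
`good_type` char(2) character set utf8mb4 collate utf8mb4_0900_ai_ci default null comment 'product type (01 | 02 | 03 | ...)',
primary key (`id`)
) engine=innodb default charset=utf8mb4 collate=utf8mb4_0900_ai_ci comment='product statistics table';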
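The original requirement was to sync the data in good into goodstat. Because some business logic sits in between, the data is moved from good to goodstat in two steps:
1. Sync the basic product information (everything except good_type) into goodstat;
2. Update the good_type column of goodstat in one pass.
Because the data volume is large, step 1 originally used the following SQL statement, which locked the table and nearly made me flee the company...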
insert into table2(field1,field2,...) select value1,value2,... from table1
For speed, step 2 also used a single bulk update statement... which locks the table just the same:
-- Join-update the rows of goodstat whose good_type is null
update goodstat
left join good on goodstat.good_id = good.good_id
set goodstat.good_type = good.good_type
where goodstat.good_type is null;
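Since the data could not be synced in one shot, it had to be processed in batches. Throughput dropped considerably: syncing the existing (historical) data took roughly twofold longer, but incremental data was barely affected.
1. Batch query, batch insert (not covered here; the interesting part comes next)
2. Batch query good_type, batch update
-- Batch query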
select 
goodstat.id, good.good_type
from good
inner join goodstat on good.good_id = goodstat.good_id
where goodstat.good_type is null
limit offset, rows;
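-- Batch update: one "when <id> then <good_type>" branch per row in the batch
update goodstat
set good_type = case id
when ? then ?
when ? then ?
end
where id in (?, ?);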
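The CASE-based batch update can be generated from a list of `(id, good_type)` pairs. Below is a minimal sketch in Python, using the standard-library sqlite3 module as a stand-in for MySQL (the statement shape is the same in both); `batch_update_good_type` is a hypothetical helper, not the author's code. Each pair contributes three bound parameters, so very large batches can run into a limit on placeholders (SQLite's SQLITE_MAX_VARIABLE_NUMBER, for example).

```python
import sqlite3
from typing import Optional, Sequence, Tuple


def batch_update_good_type(
    conn: sqlite3.Connection, pairs: Sequence[Tuple[int, Optional[str]]]
) -> int:
    """Set goodstat.good_type for every (id, good_type) pair with one statement:
    UPDATE ... SET good_type = CASE id WHEN ? THEN ? ... END WHERE id IN (...).
    Returns the number of rows the statement changed."""
    if not pairs:
        return 0
    when_clauses = " ".join("WHEN ? THEN ?" for _ in pairs)
    id_placeholders = ", ".join("?" for _ in pairs)
    sql = (
        f"UPDATE goodstat SET good_type = CASE id {when_clauses} END "
        f"WHERE id IN ({id_placeholders})"
    )
    # Parameters: id1, type1, id2, type2, ... for the CASE, then id1, id2, ... for IN.
    params = [value for pair in pairs for value in pair]
    params += [stat_id for stat_id, _ in pairs]
    return conn.execute(sql, params).rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE goodstat (id INTEGER PRIMARY KEY, good_id TEXT, good_type TEXT)")
    conn.executemany(
        "INSERT INTO goodstat (id, good_id) VALUES (?, ?)",
        [(1, "g1"), (2, "g2"), (3, "g3")],
    )
    print(batch_update_good_type(conn, [(1, "01"), (3, "02")]))  # 2
    print(conn.execute("SELECT id, good_type FROM goodstat ORDER BY id").fetchall())
    # [(1, '01'), (2, None), (3, '02')]
```

The `where id in (...)` clause is not optional: a CASE without ELSE yields NULL, so without it every row outside the batch would have its good_type overwritten with NULL.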
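That is essentially all of the SQL rework. In production, however, the statistics came out wrong (and testing somehow missed it...). Inspecting the production database showed that some products in goodstat had a null good_type, so I reproduced the problem in the dev environment.
The results:
1. good has 20k+ rows and goodstat has 20k+ rows, roughly the same (some dirty data is cleaned out along the way)
2. 18k+ rows in good have a good_type value, but only 2k+ rows in goodstat have good_type filled in
Debugging and reading the SQL logs confirmed that the update statements really changed only 2000+ rows. Tracing further back, the problem turned out to be the filter condition of the select statement: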
...
where goodstat.good_type is null
...
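With the earlier single-statement update, this condition caused no problem. It was added for efficiency, so that rows whose good_type already had a value would not be updated again (in this business scenario, good_type never changes once set).
With batched query-then-update, however, this condition must not be used: it causes some rows to be skipped by the pagination offset.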
[Figure: schematic of the paginated query/update batches]
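The intended flow was: fetch rows 1-1000 and update their good_type, then fetch rows 1001-2000, and so on until done.
But after each batch, not every row has a good_type value; some stay null, because the corresponding row in good has a null good_type.
So, as in the figure above, suppose the first batch gives 700 rows a good_type while 300 stay null. The second query then skips (offset) 1000 rows whose good_type is null: the 300 leftovers plus the first 700 rows that should have been in the second batch's result. The skipped rows also accumulate with every batch.
Put simply: the total number of rows matched by the query keeps changing, because after every update some rows' good_type is no longer null.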
-- The total number of rows matched by the batch query keeps changing
select count(*) from (
select
goodstat.id, good.good_type
from good
inner join goodstat on good.good_id = goodstat.good_id
where goodstat.good_type is null
/*limit offset, rows;*/
) t;
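To see the skipping concretely, here is a small reproduction in Python with the standard-library sqlite3 module standing in for MySQL. The tables are trimmed to the columns that matter, and the sizes (20 rows, pages of 5, every third source row without a good_type) are arbitrary assumptions; ORDER BY is added only so that the pages are deterministic.

```python
import sqlite3


def make_db(n_rows: int, null_every: int) -> sqlite3.Connection:
    """Toy good/goodstat tables. Every `null_every`-th good row has a NULL
    good_type (as in production); every goodstat.good_type starts out NULL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE good (good_id TEXT PRIMARY KEY, good_type TEXT)")
    conn.execute("CREATE TABLE goodstat (id INTEGER PRIMARY KEY, good_id TEXT, good_type TEXT)")
    for i in range(1, n_rows + 1):
        good_type = None if i % null_every == 0 else "01"
        conn.execute("INSERT INTO good VALUES (?, ?)", (f"g{i}", good_type))
        conn.execute("INSERT INTO goodstat (id, good_id) VALUES (?, ?)", (i, f"g{i}"))
    return conn


def sync_with_offset(conn: sqlite3.Connection, page_size: int) -> None:
    """The buggy loop: OFFSET paging over `goodstat.good_type IS NULL`,
    a predicate that the loop's own UPDATEs keep changing."""
    offset = 0
    while True:
        rows = conn.execute(
            "SELECT goodstat.id, good.good_type FROM good "
            "JOIN goodstat ON good.good_id = goodstat.good_id "
            "WHERE goodstat.good_type IS NULL "
            "ORDER BY goodstat.id LIMIT ? OFFSET ?",
            (page_size, offset),
        ).fetchall()
        if not rows:
            break
        conn.executemany(
            "UPDATE goodstat SET good_type = ? WHERE id = ?",
            [(good_type, stat_id) for stat_id, good_type in rows],
        )
        offset += page_size


def count_missed(conn: sqlite3.Connection) -> int:
    """Rows whose source good_type is set but whose stat row is still NULL."""
    return conn.execute(
        "SELECT count(*) FROM good JOIN goodstat ON good.good_id = goodstat.good_id "
        "WHERE good.good_type IS NOT NULL AND goodstat.good_type IS NULL"
    ).fetchone()[0]


if __name__ == "__main__":
    conn = make_db(n_rows=20, null_every=3)
    sync_with_offset(conn, page_size=5)
    print(count_missed(conn))  # 4: ids 7, 8, 16 and 17 were skipped
```

With a single page large enough to hold every row (the equivalent of the original one-shot update), `count_missed` returns 0; the loss only appears once OFFSET starts advancing over a shrinking result set.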
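The fix was to change the query condition: select the rows to update based on the start time of the previous scheduled run, and process all incremental rows uniformly regardless of whether good_type is null (on the first run after release, the previous start time is set to 1970-01-01 00:00:00).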
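-- Batch query (laststattime = start time of the previous scheduled run)
-- update_time is a hypothetical column name, and "or" joining the two time
-- conditions is an assumption; order by keeps the pages stable between queries
select goodstat.id, good.good_type
from good inner join goodstat on good.good_id = goodstat.good_id
where good.update_time >= laststattime
or goodstat.update_time >= laststattime
order by goodstat.id
limit offset, rows;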
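A matching sketch of the fix, again with sqlite3 standing in for MySQL: the paging predicate now depends only on a timestamp that the loop never touches, so OFFSET walks a fixed set of rows. The `good.update_time` column (integer timestamps here) and the `sync_since` function are hypothetical stand-ins for the time filter described above, not the author's code.

```python
import sqlite3


def make_db(n_rows: int, null_every: int) -> sqlite3.Connection:
    """Toy data: every `null_every`-th good row has a NULL good_type, and a
    hypothetical good.update_time column holds integer timestamps 1..n_rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE good (good_id TEXT PRIMARY KEY, good_type TEXT, update_time INTEGER)")
    conn.execute("CREATE TABLE goodstat (id INTEGER PRIMARY KEY, good_id TEXT, good_type TEXT)")
    for i in range(1, n_rows + 1):
        good_type = None if i % null_every == 0 else "01"
        conn.execute("INSERT INTO good VALUES (?, ?, ?)", (f"g{i}", good_type, i))
        conn.execute("INSERT INTO goodstat (id, good_id) VALUES (?, ?)", (i, f"g{i}"))
    return conn


def sync_since(conn: sqlite3.Connection, last_stat_time: int, page_size: int) -> int:
    """OFFSET paging over `good.update_time >= last_stat_time`, which the
    loop's UPDATEs never change, so no row slides past the offset.
    Returns the number of rows processed."""
    offset = processed = 0
    while True:
        rows = conn.execute(
            "SELECT goodstat.id, good.good_type FROM good "
            "JOIN goodstat ON good.good_id = goodstat.good_id "
            "WHERE good.update_time >= ? "
            "ORDER BY goodstat.id LIMIT ? OFFSET ?",
            (last_stat_time, page_size, offset),
        ).fetchall()
        if not rows:
            return processed
        conn.executemany(
            "UPDATE goodstat SET good_type = ? WHERE id = ?",
            [(good_type, stat_id) for stat_id, good_type in rows],
        )
        processed += len(rows)
        offset += page_size


if __name__ == "__main__":
    conn = make_db(n_rows=20, null_every=3)
    # 0 plays the role of the 1970-01-01 00:00:00 first-run start time.
    print(sync_since(conn, last_stat_time=0, page_size=5))  # 20
    # IS NOT is SQLite's NULL-safe inequality, so NULL == NULL counts as a match.
    mismatches = conn.execute(
        "SELECT count(*) FROM good JOIN goodstat ON good.good_id = goodstat.good_id "
        "WHERE good.good_type IS NOT goodstat.good_type"
    ).fetchone()[0]
    print(mismatches)  # 0
```

This stays correct only while the set of rows matching the time filter does not change during the run. A common alternative, not the one used here, is keyset pagination (`where goodstat.id > :last_id order by goodstat.id limit :rows`), which tolerates a shrinking result set because it seeks past the last processed id instead of counting rows to skip.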