2008-07-04 Fri
The July 2, 2008 edition of the Haskell Weekly News is now available. It summarises recent developments and discussion within the Haskell community.
The July 2, 2008 edition of the Tcl-URL! is now available. It summarises recent developments within the Tcl community.
Kite 1.0b3 has been released. Kite is a small, object-oriented language.
This release includes: support for regular expressions, support for anonymous recursive functions, improved debugging support, more reliable exception handling, and other changes.
From: Kite 1.0b3 Released
eTcl 1.0-rc29 has been released. eTcl is a standalone, single-executable distribution of Tcl, supporting a number of platforms and including several popular extensions.
This release includes: the use of Tcl/Tk 8.5.3, new commands in the wce package, support for anti-aliased lines in Pixane, applets to wrap sources into a standalone executable, and other changes.
From: eTcl 1.0-rc29 Released
Assume you're running MySQL with Innodb tables and you've got crappy hardware, driver bug, kernel bug, unlucky power failure or some rare MySQL bug and some pages in Innodb tablespace got corrupted. In such cases Innodb will typically print something like this:
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 7.
InnoDB: You may have to recover from a backup.
080703 23:46:16 InnoDB: Page dump in ascii and hex (16384 bytes):
... A LOT OF HEX AND BINARY DATA...
080703 23:46:16 InnoDB: Page checksum 587461377, prior-to-4.0.14-form checksum 772331632
InnoDB: stored checksum 2287785129, prior-to-4.0.14-form stored checksum 772331632
InnoDB: Page lsn 24 1487506025, low 4 bytes of lsn at page end 1487506025
InnoDB: Page number (if stored to page already) 7,
InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) 6353
InnoDB: Page may be an index page where index id is 0 25556
InnoDB: (index "PRIMARY" of table "test"."test")
InnoDB: Database page corruption on disk or a failed
and crash with assertion failure.
So what can you do to recover such a table ?
There are multiple things which can get corrupted and I will be looking in details on the simple one in this article - when page in clustered key index is corrupted. It is worse compared to having data corrupted in secondary indexes, in which case simple OPTIMIZE TABLE could be enough to rebuild it, but it is much better compared to table dictionary corruption when it may be much harder to recover the table.
In this example I actually went ahead and manually edited test.ibd file replacing few bytes so corruption is mild.
First I should note CHECK TABLE in INNODB is pretty useless. For my manually corrupted table I am getting:
-
mysql> CHECK TABLE test;
-
ERROR 2013 (HY000): Lost connection TO MySQL server during query
-
-
mysql> CHECK TABLE test;
-
+-----------+-------+----------+----------+
-
| TABLE | Op | Msg_type | Msg_text |
-
+-----------+-------+----------+----------+
-
| test.test | CHECK | STATUS | OK |
-
+-----------+-------+----------+----------+
-
1 row IN SET (0.69 sec)
First run is check table in normal operation mode - in which case Innodb simply crashes if there is checksum error (even if we're running CHECK operation). In second case I'm running with innodb_force_recovery=1 and as you can see even though I get the message in the log file about checksum failing CHECK TABLE says table is OK. This means You Can't Trust CHECK TABLE in Innodb to be sure your tables are good.
In this simple corruption was only in the data portion of pages so once you started Innodb with innodb_force_recovery=1 you can do the following:
-
mysql> CREATE TABLE `test2` (
-
-> `c` char(255) DEFAULT NULL,
-
-> `id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
-
-> PRIMARY KEY (`id`)
-
-> ) ENGINE=MYISAM;
-
Query OK, 0 rows affected (0.03 sec)
-
-
mysql> INSERT INTO test2 SELECT * FROM test;
-
Query OK, 229376 rows affected (0.91 sec)
-
Records: 229376 Duplicates: 0 Warnings: 0
Now you got all your data in MyISAM table so all you have to do is to drop old table and convert new table back to Innodb after restarting without innodb_force_recovery option. You can also rename the old table in case you will need to look into it more later. Another alternative is to dump table with MySQLDump and load it back. It is all pretty much the same stuff. I'm using MyISAM table for the reason you'll see later.
You may think why do not you simply rebuild table by using OPTIMIZE TABLE ? This is because Running in innodb_force_recovery mode Innodb becomes read only for data operations and so you can't insert or delete any data (though you can create or drop Innodb tables):
-
mysql> OPTIMIZE TABLE test;
-
+-----------+----------+----------+----------------------------------+
-
| TABLE | Op | Msg_type | Msg_text |
-
+-----------+----------+----------+----------------------------------+
-
| test.test | OPTIMIZE | error | Got error -1 FROM storage engine |
-
| test.test | OPTIMIZE | STATUS | Operation failed |
-
+-----------+----------+----------+----------------------------------+
-
2 rows IN SET, 2 warnings (0.09 sec)
That was easy, right ?
I also thought so, so I went ahead and edited test.ibd a little more wiping one of the page headers completely. Now CHECK TABLE would crash even with innodb_force_recovery=1
080704 0:22:53 InnoDB: Assertion failure in thread 1158060352 in file btr/btr0btr.c line 3235
InnoDB: Failing assertion: page_get_n_recs(page) > 0 || (level == 0 && page_get_page_no(page) == dict_index_get_page(index))
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
If you get such assertion failures most likely higher innodb_force_recovery values would not help you - they are helpful in case there is corruption in various system areas but they can't really change anything in a way Innodb processes page data.
The next comes trial and error approach:
-
mysql> INSERT INTO test2 SELECT * FROM test;
-
ERROR 2013 (HY000): Lost connection TO MySQL server during query
You may think will will scan the table until first corrupted row and get result in MyISAM table ? Unfortunately test2 ended up to be empty after the run. At the same time I saw some data could be selected. The problem is there is some buffering taking place and as MySQL crashes it does not store all data it could recover to MyISAM table.
Using series of queries with LIMIT can be handly if you recover manually:
-
mysql> INSERT IGNORE INTO test2 SELECT * FROM test LIMIT 10;
-
Query OK, 10 rows affected (0.00 sec)
-
Records: 10 Duplicates: 0 Warnings: 0
-
-
mysql> INSERT IGNORE INTO test2 SELECT * FROM test LIMIT 20;
-
Query OK, 10 rows affected (0.00 sec)
-
Records: 20 Duplicates: 10 Warnings: 0
-
-
mysql> INSERT IGNORE INTO test2 SELECT * FROM test LIMIT 100;
-
Query OK, 80 rows affected (0.00 sec)
-
Records: 100 Duplicates: 20 Warnings: 0
-
-
mysql> INSERT IGNORE INTO test2 SELECT * FROM test LIMIT 200;
-
Query OK, 100 rows affected (1.47 sec)
-
Records: 200 Duplicates: 100 Warnings: 0
-
-
mysql> INSERT IGNORE INTO test2 SELECT * FROM test LIMIT 300;
-
ERROR 2013 (HY000): Lost connection TO MySQL server during query
As you can see I can get rows from the table in the new one until we finally touch the row which crashes MySQL. In this case we can expect this is the row between 200 and 300 and we can do bunch of similar statements to find exact number doing "binary search"
Note even if you do not use MyISAM table but fetch data to the script instead make sure to use LIMIT or PK Rangers when MySQL crashes you will not get all data in the network packet you potentially could get due to buffering.
So now we found there is corrupted data in the table and we need to somehow skip over it. To do it we would need to find max PK which could be recovered and try some higher values
-
mysql> SELECT max(id) FROM test2;
-
+---------+
-
| max(id) |
-
+---------+
-
| 220 |
-
+---------+
-
1 row IN SET (0.00 sec)
-
-
mysql> INSERT IGNORE INTO test2 SELECT * FROM test WHERE id>250;
-
ERROR 2013 (HY000): Lost connection TO MySQL server during query
-
-
mysql> INSERT IGNORE INTO test2 SELECT * FROM test WHERE id>300;
-
Query OK, 573140 rows affected (7.79 sec)
-
Records: 573140 Duplicates: 0 Warnings: 0
So we tried to skip 30 rows and it was too little while skipping 80 rows was OK. Again using binary search you can find out how many rows do you need to skip exactly to recover as much data as possible. Row size can be good help to you. In this case we have about 280 bytes per row so we get about 50 rows per page so not a big surprise 30 rows was not enough - typically if page directory is corrupted you would need to skip at least whole page. If page is corrupted at higher level in BTREE you may need to skip a lot of pages (whole subtree) to use this recovery method.
It is also well possible you will need to skip over few bad pages rather than one as in this example.
Another hint - you may want to CHECK your MyISAM table you use for recovery after MySQL crashes to make sure indexes are not corrupted.
So we looked at how to get your data back from simple Innodb Table Corruption. In more complex cases you may need to use higher innodb_force_recovery modes to block purging activity, insert buffer merge or recovery from transactional logs all together. Though the lower recovery mode you can run your recovery process with better data you're likely to get.
In some cases such as if data dictionary or "root page" for clustered index is corrupted this method will not work well - in this case you may wish to use Innodb Recovery Toolkit which is also helpful in cases you've want to recover deleted rows or dropped table.
I should also mention at Percona we offer assistance in MySQL Recovery, including recovery from Innodb corruptions and deleted data.
Entry posted by peter | No comment
因为大多数网络问题,往往都是无疾而终。
这当然不是说网络是和人一样有再生功能的,肚皮上切两道口子,不理它过几天就好了,网络可没有这个功能。虽然也有太阳黑子活动剧烈导致网络中断,不管它也会恢复的例子,但大多数的网络问题,还是“没病死不了人”,没有人去解决,不会自己消失。

查!看是不是铁道游击队改行了?
但是,IT行里心知肚明的一个潜规则就是,ISP很少承认自己网络存在故障。这在我国十分典型。您公司要是有专线可能会体会到,要是线路断掉,给电信的八路打电话去问,八路的回答往往是 – “我们查一下阿。”
然后,没动静了。
再打,人家会说 – “没问题阿,你再试一下。”
聪明人一试,通了,什么话也别说,打枪的不要,乖乖接着干活儿就完了。
笨的呢,通了,还给人家八路打电话 – “长官,你们干了什么了?怎么就通了?”

人家回答 – “我们什么也没干阿。”
浪费电话钱。。。
你又不是电信的领导,人家就发现混线了或者哪个Shaping做得不对有必要跟你汇报么?
国外的ISP要客气得多,五分钟一次地跟你汇报进展,就是每次都没找着毛病之前自然病愈。
你只能相信他也不明白是怎么回事。。。
这次,我们组的工程师上来一测,线路两端通畅,六脉吉祥,心里大体就有数了 – 多半是哪家服务商的线路出了故障,可没抓住人家的手。公司内部有监控系统,但监控的对象是点对点,中间这段儿的炮楼是好几家负责的,究竟哪里出了问题,刁 世贵说是汤司令通八路,汤司令说是易先生脚踩两条船,易先生说刁世贵是地下党,那可是个打不清的官司。
现在,虽然慢,接到报告时数据传送的事情已经干完了,而且网上测了测,所有指标都正常,看不出不正常的地方来。工程师只能发个公函给各个服务商,要每家给份报告说明自己那段有没有看见过八路,结果自然可想而知。

太君,俺们这疙瘩肯定没来过八路
这种事,通常拖过一段大家就会忘记。
然而,这一次不行了,第二天业务部门又半夜鸡叫起来 – 还是慢。
不敢怠慢,工程师登录上去看的时候,数据还在传,果然是慢悠悠的样子。一面冒火一面首先联系各炮楼。
结果还是 – 太君,八路的没有。

不信我们的话,有种你自己上炮楼瞧去
还真说不出什么来。
为什么呢?
线路测试确实表明,依然是两端通畅,六脉吉祥,传输时限要求是200MS,结果只有127MS,这供应商是个有心的,专门提供给我们一个网页可以随时看他们的网络状况,真的假的不说,显得满有诚意。
上面这张图是个例子,并不是当天的实际情况,可以看到作为ISP,提供的资料还是很详细的。
特别是我们一直担心的数据传输出错率,这里可以看到是0.00%,说明出错很低,按说不应该有问题啊。
还有一种可能,双方负责传输的网络设备负荷过高,无法承担工作。
检查了双方路由器的工作状况,CPU和内存的使用率都不高,显得富富有余。
可数据传输还是慢。
一线不行,就得叫二线了,半夜把老萨叫醒。

老大,快出来看看,我们顶不住啦!
半夜里爬起来,就算不是冬天也不舒服,不过,对于解决问题,老萨比这哥们儿还自信。
因为网络上的故障,说穿了复杂的并不多,多半是土八路用汽油桶放炮仗,红灯照作法,只要找到症结,不难解决。
于是,听了听这位的汇报 –
两端路由器之间没有丢包,两端对着测一下,速度值令人满意。看来,至少广域网上不该有问题。
为什么不从两端的服务器做一个简单的点对点Ping对测呢?
这比较困难,主要是防火墙上对Ping这种简单的测试手段进行了屏蔽。这还要归功于当年中国红客攻打白宫造成的震慑。为了报复美机轰炸中国驻南斯拉夫大使 馆,2001年5月4日,八万名大多不懂网络技术为何物的中国网民,在某些“别有用心的人”指挥下,于固定时间一二三同时对白宫网站的地址发出了Ping 测试的数据包。百川归海,这些每一个都不起眼的数据流汇集到白宫就成了超强的“数据炸弹”,白宫网站因而一时瘫痪。
从那儿以后,各国乃至各大公司,普遍在防火墙上设置拒绝Ping所使用的ICMP数据包。由于这是一条国际线路,双方合作精神不够,都没有为对方的Ping测试解禁,这下子给寻找问题带来了意外的困难。
一方面联系双方的防火墙负责人紧急解禁,一方面我们也不能干等着吧。萨调出Sky-X上的数据传输记录来,想看出问题何在。
几分钟以后,还真让我看出点儿东西来。
初看,传输的速度很高,最高的时候几乎把10M的带宽占满,似乎网络传输的效率颇高。但是,老萨把传输记录每秒取一值,问题就露出来了 。。。
在我的眼前出现了一排锯齿。
而正常的数据传输,应该是一条平滑曲线阿。
曙光,好像就在前面。
好像。。。
[待续]

线路的,不通了,谁的干活?
从公司结构来说,业务部门是一线,IT是二线。人家要面对客户的,地位自然高。虽然是放高利贷的吧,这年头一个杨白劳后头都跟着七八个周扒皮,你满脸堆笑 还未必能借出钱去。在大阪梅田车站一走就有穿超短裙的小姑娘往您怀里塞广告附送小礼物 – 什么内容?请您向周扒皮借钱啊。
所以,业务部门的不容易,人家回了公司就是劳苦功高的大爷,有需求做IT的绝不能等闲视之。而且,业务部门的人对IT感情复杂。这帮家伙无所事事的时候自 然让人看了有气,而如果他们无故地在那儿忙活就更加危险 – 肯定是琢磨着用什么机器代替一线的人去干活,如果成功后面接着就是人事部琢磨怎么砍人头。这简直是背后打冷枪的干活阿!
逮到这种好机会可以对IT牛气一下,自然要把架子端足,“此问题已经威胁到业务部门能否按时营业”,“迅速查清,否则本月业务额会产生相当下降”,“亚洲 区总裁昨日开会询问此事是否已经解决”。。。 一系列类似哀的美敦书的玩意儿充斥老萨的邮箱,夹杂着北京某报催稿的通知,让人感觉红烧肉都不那么对胃口了。
那么,为什么不解决呢?
老萨当然不是不想解决,侦骑四出,捉不良人全撒出去,已经折腾四五回了,一直没拿下来,这问题的症结始终没有找到。

向组织上汇报一下,再宽限两天成么?
问题的情况是这样的。
我们公司每天夜间都要进行大量的越洋数据传送,六月十一日,传输速度忽然降低到之前的1/10左右,业务部门的主管前来告状,说IT,你们的线路地嗝屁嗝屁死啦死啦地有。
不能说IT部门不重视。当天我们负责维护的小组就开始了调查,但是徒劳无功。
这并不是说负责的工程师无能。这条线路的情况的确比较复杂。
这是一条InternetVPN虚拟线路,也就是说,从两端不同国家ISP供应商那里购买数据端口,中间国际部分则通过互联网,使用IPSEC技术实现虚拟通道和安全要求。试想,建立一条跨国线路什么价格?两个Internet端口什么价格?如此设计可以大大降低费用。
然而,这也使故障发生时的调查比较复杂。
首先,这条线路涉及多个供应商
第一Internet的端口的供应商。我们在两国选用了相同的ISP供应商,并要求两端口之间的通讯在若干指标上有保障即SLA,费用比正常Internet端口高,但服务质量要好。
其中最重要的三项保障是传送时限(Latency),即一端数据到另一端的时间,如果这个时间太长,会降低数据传输的效率;出错率(Packet loss),这个可以理解,如果出错太多,数据就要不断重复传递,那是不能接受的;还有可用带宽(Bandwidth Availibility),设想两端我们各购买10M的端口,因为端口之间是Internet,如果经过的某个节点恰好是www.daier.com的 下载端口,每天好几百万鹅迷从上面荡眼罩,那两个端口之间我们能利用的带宽可能就被挤压成几百K了,形成瓶颈,那也是不行的。
顺便说一下,InternetVPN技术在中国与国外的数据链接中还缺乏唱主角的可能,最大的问题是国内标准Internet网络入口有集中管理和过滤处理的程序,因此,会造成传送时限过大的问题,从技术上限制了这一解决方案的使用。
网络供应商提供的保证,能够切实地实现,是这里发生故障最大的隐患。
第二是两端的本地线路供应商,以我所在一端为例,公司数据中心在大阪,而Internet供应商的端口在东京,大阪和东京之间我们使用了MPLS连结。以 日本而言,虽然网络故障也有发生,其网络服务还算相对稳定可靠。问题的隐患主要存在于不同网络服务商之间,可能存在一些设备和参数不匹配的现象,特别是如 果某个服务商更换了某个节点上自己的设备,由于各国标准不同,数据传送越洋后,数据包到了对端可能不能被识别而退回。
这就像我们进行国际投递一样,袁世凯从河南给爱迪生寄一坛子臭豆腐,彰德邮局可能说,嗯,味道顶好。等到了旧金山,美国佬可能拒送 – OMG,化学武器啊!
这就是两边规矩不同造成的问题,国际线路上的数据传递,每每有类似的现象。
第三是线路终端设备,一般在任一端的连接方式是
数据传递用的服务器 –〉 线路交换器(Switch)-〉防火墙 –〉线路交换器 –〉路由器 –〉远程线路接入设备 – 〉线路 –〉对端
线路交换器,路由器都使用的是Cisco一家的产品,相对来说不匹配的问题比较少些。不过,我们这套设备,还增加一个用于提高传输效率的加速器 Sky-X,增加了调查的复杂程度。
一般来说,这种数据传输故障的调查,我们会集中于三个方面。
第一个,有没有线路故障,这要找各家供应商来询问了。如果NTT说 – 太君,两个小时之前,东京到福冈之间的线路上,铁道游击队把光缆给扒了,那问题就算找到。
第二个,最近有没有做过系统的调整,比如昨天还挺好,今天换了个路有器就死啦死啦地了,就得查这个路由器是不是李向阳卖来的。
第三个,设备会不会有故障。通常,即便象Cisco这样的王牌公司,设备也不免在使用中发现某种潜在问题。如果没有按时安装他们提供的补丁,也可能造成某种问题。比如,发现特高课的里面藏着土八路,不及时让松田太君去检查一下早晚会出乱子。

怎么看怎么正常,要多正常有多正常,这个事儿,是谁干的呢?
当天晚上,上述的三个方面,工程师都没有检查出问题来,到处是老乡在喊平安无事。
不过,当时也没太着急。
[待续]
Recently I had a customer ask me about loading two huge files into InnoDB with LOAD DATA INFILE. The goal was to load this data on many servers without putting it into the binary log. While this is generally a fast way to load data (especially if you disable unique key checks and foreign key checks), I recommended against this. There are several problems with the very large transaction caused by the single statement. We didn't want to split the file into pieces for the load for various reasons. However, I found a way to load the single file in chunks as though it were many small files, which avoided splitting the file and let us load with many transactions instead of one huge transaction.
The smaller file is 4.1GB and has 260M lines in it; each row is just two bigints. The bigger file was about 20GB and had wider rows with textual data and about 60M lines (as I recall).
When InnoDB loads the file, it creates one big transaction with a lot of undo log entries. This has a lot of costs. To name a few:
- the big LOAD DATA INFILE clogs the binary log and slows replication down. If the load takes 4 hours on the master, it will cause the slave to fall 4 hours behind.
- lots of undo log entries collect in the tablespace. Not only from the load -- but from other transactions' changes too; the purge thread cannot purge them, so everything gets bloated and slow. Even simple SELECT queries might have to scan through lots of obsolete, but not-yet-purged, row versions. Later, the purge thread will have to clean these up. This is how you make InnoDB behave like PostgreSQL
- If the undo log space grows really big, it won't fit in the buffer pool and InnoDB essentially starts swapping between its buffer pool and the tablespace on disk.
Most seriously, if something should happen and the load needs to roll back, it will take a Very Long Time to do -- I hate to think how long. I'm sure it would be faster to just shut everything down and re-clone the machine from another, which takes about 10 or 12 hours. InnoDB is not optimized for rollbacks, it's optimized for transactions that succeed and commit. Rollback can take an order of magnitude longer to do.
For that reason, we decided to load the file in chunks of a million rows each. (InnoDB internally does operations such as ALTER TABLE in 10k row chunks, by the way; I chose 1M because the rows were small). But how to do this without splitting the file? The answer lies in the Unix fifo. I created a script that reads lines out of the huge file and prints them to a fifo. Then we could use LOAD DATA INFILE on the fifo. Every million lines, the script prints an EOF character to the fifo, closes it and removes it, then re-creates it and keeps printing more lines. If you 'cat' the fifo file, you get a million lines at a time from it. The code is pretty simple and I've included it in Maatkit just for fun. (It's unreleased as of yet, but you can get it with the following command: "wget http://www.maatkit.org/trunk/fifo").
So how did it work? Did it speed up the load?
Not appreciably. There actually was a tiny speedup, but it's statistically insignificant IMO. I tested this first on an otherwise idle machine with the same hardware as the production machines. First, I did it in one big 4.1GB transaction, then I did it 1 million rows at a time. Here's the CREATE TABLE:
-
CREATE TABLE load_test (
-
col1 bigint(20) NOT NULL,
-
col2 bigint(20) DEFAULT NULL,
-
KEY(col1),
-
KEY(col2)
-
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Here's the result of loading the entire 4GB file in one chunk:
-
time mysql -e "set foreign_key_checks=0; set sql_log_bin=0; set unique_checks=0; load data local infile 'infile.txt' into table load_test fields terminated by '\t' lines terminated by '\n' (col1, col2);"
-
-
real 234m53.228s
-
user 0m1.098s
-
sys 0m5.959s
While this ran, I captured vmstat output every 5 seconds and logged it to a file; I also captured the output of "mysqladmin ext -ri5 | grep Handler_write" and logged that to a file.
To load the file in chunks, I split my screen session in two and then ran (approximately -- edited for clarity) the following in one terminal:
-
perl mk-fifo-split infile.txt --fifo /tmp/my-fifo --lines 1000000
And this in the other terminal:
-
while [ -e /tmp/my-fifo ]; do
-
mysql -e "..... same as above.... ";
-
sleep 1;
-
done
After I was done, I ran a quick Perl script on the vmstat and mysqladmin log files to grab out the disk activity and rows-per-second to see what the progress was. Here are some graphs. This one is the rows per second from mysqladmin, and the blocks written out per second from vmstat.

And this one is the bytes/sec from Cacti running against this machine. This is only the bytes out per second; for some reason Cacti didn't seem to be capturing the bytes in per second.

You can see how the curves are roughly logarithmic, which is what you should expect for B-Tree indexes. The two curves on the Cacti graph actually show both files being loaded. It might seem counter-intuitive, but the second (smaller) curve is actually the larger file. It has fewer rows and that's why it causes less I/O overall.
I also used 'time' to run the Perl fifo script, and it used a few minutes of CPU time during the loads. So not very much at all.
Some interesting things to note: the load was probably mostly CPU-bound. vmstat showed from 1% to 3% I/O wait during this time. (I didn't think to use iostat to see how much the device was actually used, so this isn't a scientific measurement of how much the load was really waiting for I/O). The single-file load showed about 1 or 2 percent higher I/O wait, and you can see the single-file load uses more blocks per row; I can only speculate that this is the undo log entries being written to disk. (Peter arrived at the same guess independently.)
Unfortunately I didn't think to log the "cool-down period" after the load ended. It would be fun to see that. Cacti seemed to show no cool-down period -- as soon as the load was done it looked like things went back to normal. I suspect that's not completely true, since the buffer pool must have been overly full with this table's data.
Next time I do something like this I'll try smaller chunks, such as 10k rows; and I'll try to collect more stats. It would also be interesting to try this on an I/O-bound server and see what the performance impact is, especially on other transactions running at the same time.
Entry posted by Baron Schwartz | 2 comments





