看了mysqlperformance的一篇关于flush tables with read lock的文章,里面提到了它可能引发一些问题。好了,现学现卖,分享给大家。
现在很多的mysql备份工具在实现原理上都利用到了flush tables with read lock。这是为备份myisam表而设计的。像xtrabackup备份innodb表时并不会锁表,因为它也会备份在备份过程中新发生的事务日志,而对于myisam表的备份则是通过发出命令flush tables with read lock,然后拷贝myisam的相关表文件。那么在实际备份过程中可能会出现什么问题呢?答案是:有可能导致备份的时间严重的增长。下面来说说为什么会这样。
flush tables with read lock,也就是将所有的脏页都要刷新到磁盘,然后对所有的表加上了读锁,于是这时候直接拷贝数据文件也就是安全的。但是如果你发出命令flush tables with read lock时,还有其他的操作,而起是很耗时的操作呢?先说写操作,这个FTWRL肯定是得等的,等写操作完成才能执行FTWRL,这个很好理解。那么对于其他的读操作呢?比如说在FLWRL发出之前有一个query:select count(*) from tb,那么FTWRL也得等待(show processlist可以看到 waiting for table flush)。你可能会说在mysql中读与读不是不会排斥的吗,为什么需要等待呢?因为FTWRL是要flush脏页的,只有这样才真的能保证数据一致性(比如说在xtrabackup备份myisam表的时候),而在select count(*) from tb执行的时候,因为所有的操作都是在内存中操作,所以此时还不能完全flush,因此FTWRL就得等待。或许你还会有疑问,select的页不是脏页,为什么FTWRL还要等待呢?难道mysql不能做得更完善点吗?我觉得mysql还不是不会做的这么简单吧,等待的原因是因为这个表很大,无法一次性将所有的页都读到内存中来,而query具有原子性,总不可能执行一般被堵塞吧,所以说还是得乖乖的让它执行然,所以FTWRL就得等待了。
所以在利用xtrabackup、ibbackup这种备份工具的时候,也要考虑到这点。
2012年3月31日星期六
2012年3月25日星期日
mysql报错:two many connections/two many open files的解决方法
最近在论坛里面看到有几个朋友都发帖求助,原因是mysql报错:
[ERROR] Error in accept: Two many open files。
而且这位朋友还说他并没有同时打开这么多文件,那么这是什么原因呢?
首先,我们需要了解的是,在linux里面打开文件是以文件描述符(FD,file descripter)的形式打开的,每打开一次文件,那么os就分配给你一个文件描述符,对于同一个文件如果有多个进程打开,那么就可以分配多个文件描述符。
好了,现在我们回到这个问题上,这位朋友说他并没有一次性打开这么多的文件,怎么会出现这个问题呢?刚才说了,每个进程打开同一个文件都可能分配一个独立的FD,而mysql正是这么做的,每个session开打的数据文件的描述符都是独立的,而对于索引文件则是所有的session贡献,我先将手册里面的原话贴出来:
MySQL is multi-threaded, so there may be many clients issuing queries for a given table simultaneously. To minimize the problem with multiple client sessions having different states on the same table, the table is opened independently by each concurrent session. This uses additional memory but normally increases performance. WithMyISAMtables, one extra file descriptor is required for the data file for each client that has the table open. (By contrast, the index file descriptor is shared between all sessions.)
好了,那你现在应该明白了,为什么并没有同时打开那么文件,但是却有那么多的文件描述符。弄清楚问题,那么现在就来解决这个问题。
首先是增大table_open_cache,具体用法参考手册上说明
然后根本解决方法是在os上增大mysqld这个线程的最大开打文件数,在linux里面可以通过编辑文件 /etc/security/limits.conf
然后里面的信息如下:
# /etc/security/limits.conf
#
#Each line describes a limit for a user in the form:
#
#<domain> <type> <item> <value>
#
#Where:
#<domain> can be:
# - an user name
# - a group name, with @group syntax
# - the wildcard *, for default entry
# - the wildcard %, can be also used with %group syntax,
# for maxlogin limit
#
#<type> can have the two values:
# - "soft" for enforcing the soft limits
# - "hard" for enforcing hard limits
#
#<item> can be one of the following:
# - core - limits the core file size (KB)
# - data - max data size (KB)
# - fsize - maximum filesize (KB)
# - memlock - max locked-in-memory address space (KB)
# - nofile - max number of open files
# - rss - max resident set size (KB)
# - stack - max stack size (KB)
# - cpu - max CPU time (MIN)
# - nproc - max number of processes
# - as - address space limit
# - maxlogins - max number of logins for this user
# - maxsyslogins - max number of logins on the system
# - priority - the priority to run user process with
# - locks - max number of file locks the user can hold
# - sigpending - max number of pending signals
# - msgqueue - max memory used by POSIX message queues (bytes)
# - nice - max nice priority allowed to raise to
# - rtprio - max realtime priority
#
#<domain> <type> <item> <value>
#
这个里面可以对进程需要的很多资源进行配置,比如说我们现在需要就是onfile这个选项
open file
而且我们有两种配置方式,第一是针对用户组的形式,第二具体用户的形式
比如有mysql用户
mysql soft nofile 131072
上面就限制了mysql用户最大的打开文件数。同时可以通过ulimit命令来查看这些信息。
但有时候,我们遇到的不是two many open files 而是 two many connections,这个就是并发的连接数,通过修改参数mysql的max_connections来解决,但是需要注意的是当这个参数增大的时候,同时可能会引发two many open files的问题,原因上面已经解释过了。
自己并没有遇到过这个问题,只是简单模拟了一下,如果有错误,欢迎指正。
[ERROR] Error in accept: Two many open files。
而且这位朋友还说他并没有同时打开这么多文件,那么这是什么原因呢?
首先,我们需要了解的是,在linux里面打开文件是以文件描述符(FD,file descripter)的形式打开的,每打开一次文件,那么os就分配给你一个文件描述符,对于同一个文件如果有多个进程打开,那么就可以分配多个文件描述符。
好了,现在我们回到这个问题上,这位朋友说他并没有一次性打开这么多的文件,怎么会出现这个问题呢?刚才说了,每个进程打开同一个文件都可能分配一个独立的FD,而mysql正是这么做的,每个session开打的数据文件的描述符都是独立的,而对于索引文件则是所有的session贡献,我先将手册里面的原话贴出来:
MySQL is multi-threaded, so there may be many clients issuing queries for a given table simultaneously. To minimize the problem with multiple client sessions having different states on the same table, the table is opened independently by each concurrent session. This uses additional memory but normally increases performance. WithMyISAMtables, one extra file descriptor is required for the data file for each client that has the table open. (By contrast, the index file descriptor is shared between all sessions.)
好了,那你现在应该明白了,为什么并没有同时打开那么文件,但是却有那么多的文件描述符。弄清楚问题,那么现在就来解决这个问题。
首先是增大table_open_cache,具体用法参考手册上说明
然后根本解决方法是在os上增大mysqld这个线程的最大开打文件数,在linux里面可以通过编辑文件 /etc/security/limits.conf
然后里面的信息如下:
# /etc/security/limits.conf
#
#Each line describes a limit for a user in the form:
#
#<domain> <type> <item> <value>
#
#Where:
#<domain> can be:
# - an user name
# - a group name, with @group syntax
# - the wildcard *, for default entry
# - the wildcard %, can be also used with %group syntax,
# for maxlogin limit
#
#<type> can have the two values:
# - "soft" for enforcing the soft limits
# - "hard" for enforcing hard limits
#
#<item> can be one of the following:
# - core - limits the core file size (KB)
# - data - max data size (KB)
# - fsize - maximum filesize (KB)
# - memlock - max locked-in-memory address space (KB)
# - nofile - max number of open files
# - rss - max resident set size (KB)
# - stack - max stack size (KB)
# - cpu - max CPU time (MIN)
# - nproc - max number of processes
# - as - address space limit
# - maxlogins - max number of logins for this user
# - maxsyslogins - max number of logins on the system
# - priority - the priority to run user process with
# - locks - max number of file locks the user can hold
# - sigpending - max number of pending signals
# - msgqueue - max memory used by POSIX message queues (bytes)
# - nice - max nice priority allowed to raise to
# - rtprio - max realtime priority
#
#<domain> <type> <item> <value>
#
这个里面可以对进程需要的很多资源进行配置,比如说我们现在需要就是onfile这个选项
open file
而且我们有两种配置方式,第一是针对用户组的形式,第二具体用户的形式
比如有mysql用户
mysql soft nofile 131072
上面就限制了mysql用户最大的打开文件数。同时可以通过ulimit命令来查看这些信息。
但有时候,我们遇到的不是two many open files 而是 two many connections,这个就是并发的连接数,通过修改参数mysql的max_connections来解决,但是需要注意的是当这个参数增大的时候,同时可能会引发two many open files的问题,原因上面已经解释过了。
自己并没有遇到过这个问题,只是简单模拟了一下,如果有错误,欢迎指正。
2012年3月24日星期六
shell命令 2>&1
在unix高级环境编程第三章的习题有一个问题:
./a.out > outfile 2>&1 # 1
./a.out 2&1 > outfile # 2
问这两者有什么区别。
这个首先的说明几点:
1.在unix和类unix(linux之类)中以文件描述符的形式打开一个文件,这个文件描述符是一个非负数。
2.0代表标准输入,1代表标准输出,2代表标准错误输出
3.默认的情况下是1 所以> equal 1>,比如我们用文件描述符5打开了一个文件a,可以写成5>
4.shell命令的执行是从左往右
那么现在来看这两者有什么区别
对于第一条shell,首先是将a.out的内容标准输出,然后通过>(等价于1>,所以只对标准输出起作用)重定向到outfile,然后2>&1说明标准错误输出也往标准输出去,而由于之前标准输出以及重定向到了outfile,所以这里的错误输出也会重定向到outfile。
再看第二天shell,首先是a.out到标准输出,然后2>&1,所以标准错误输出也重定向到保准输出(注意此时的标准输出没有被重定向,所以默认是终端上),然后 > outfile ,这个说明此时再将标准输出到outfile,但是请注意标准错误输出2>不会也跟着被重定向到outfile,因为更改标准输出重定向在2>&1的后面。
至此你明白了吧,不信你可以自己试试这个结果,比如按照我下面的:
angbosen@db-9:~/shell-comand$ echo "aaa" > out.file 2>&1
angbosen@db-9:~/shell-comand$ cat out.file
aaa
angbosen@db-9:~/shell-comand$ echo "aaa" 2>&1 > out2.file
angbosen@db-9:~/shell-comand$ cat out2.file
aaa
angbosen@db-9:~/shell-comand$ cmd "aaa" 2>&1 > out2.file
-bash: cmd: command not found
hangbosen@db-9:~/shell-comand$ cmd "aaa" > out.file 2>&1
angbosen@db-9:~/shell-comand$ cat out.file
-bash: cmd: command not found
理解了吧,真的挺有意思的。
./a.out > outfile 2>&1 # 1
./a.out 2&1 > outfile # 2
问这两者有什么区别。
这个首先的说明几点:
1.在unix和类unix(linux之类)中以文件描述符的形式打开一个文件,这个文件描述符是一个非负数。
2.0代表标准输入,1代表标准输出,2代表标准错误输出
3.默认的情况下是1 所以> equal 1>,比如我们用文件描述符5打开了一个文件a,可以写成5>
4.shell命令的执行是从左往右
那么现在来看这两者有什么区别
对于第一条shell,首先是将a.out的内容标准输出,然后通过>(等价于1>,所以只对标准输出起作用)重定向到outfile,然后2>&1说明标准错误输出也往标准输出去,而由于之前标准输出以及重定向到了outfile,所以这里的错误输出也会重定向到outfile。
再看第二天shell,首先是a.out到标准输出,然后2>&1,所以标准错误输出也重定向到保准输出(注意此时的标准输出没有被重定向,所以默认是终端上),然后 > outfile ,这个说明此时再将标准输出到outfile,但是请注意标准错误输出2>不会也跟着被重定向到outfile,因为更改标准输出重定向在2>&1的后面。
至此你明白了吧,不信你可以自己试试这个结果,比如按照我下面的:
angbosen@db-9:~/shell-comand$ echo "aaa" > out.file 2>&1
angbosen@db-9:~/shell-comand$ cat out.file
aaa
angbosen@db-9:~/shell-comand$ echo "aaa" 2>&1 > out2.file
angbosen@db-9:~/shell-comand$ cat out2.file
aaa
angbosen@db-9:~/shell-comand$ cmd "aaa" 2>&1 > out2.file
-bash: cmd: command not found
hangbosen@db-9:~/shell-comand$ cmd "aaa" > out.file 2>&1
angbosen@db-9:~/shell-comand$ cat out.file
-bash: cmd: command not found
理解了吧,真的挺有意思的。
2012年3月23日星期五
mysql空用户(user列为空)带来的影响
今天搭建一个测试环境时,遇到了一个很隐蔽的问题。有必要分享一下
| user | password | host |
+------+-------------------------------------------+--------------------+
| root | *81F5E21E35407D884A6CD4A731AEBFB6AF209E1B | localhost |1
| root | | db-2.photo.xxx.org |2
| root | | 127.0.0.1 |3
| root | | ::1 |4
| | | localhost |5
| | | db-2.photo.xxx.org |6
| sys | *xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | db-2.photo.xxx.org |7
...
...
如上图所示,这个里面存在着两个用户名为空的情况。然后问题就是由此而起。
在本机(服务器,db-2.photo.xxx.org) mysql -usys -ppass -h`hostname` 或者 mysql -usys -ppass -h`hostname -i` 都提示无法连接。此时配置文件里面没有 --skip-name-resolve这个参数(DNS解析用的)而用户表里面明明有这个sys@db-2.photo.xxx.org 而且密码也对了啊?起初没有注意到这个空用户,没想到它的陷阱,搞了很久也没连上,后来只好请老大了,老大过来一看,直接drop user ''@db-2.photo.xxx.org(删掉这个空用户,老大V5),然后再连,OK 搞定。
这是什么原因呢?
因为mysql在验证权限的时候,首先是验证host列,如果host列在验证user列,再password列,而现在按照我之前的连接语句:按照host列首先找到第6行,然后发现这行的user列为空(空匹配所有用户名),所以匹配到了这条记录,然后发现这条记录的密码为空,而我的连接语句里面有密码,那么就会报错。
好了,你可能会说要是我在连接的时候 -h指定的是ip地址,而不是域名(db-2.photo.xxx.org)那么不就不会匹配到了这条记录么?但是我在前面说了--skip-name-resolve这个参数我没有指定,那么就会DNS解析,将域名解析成IP地址。所以还是会被匹配到的。于是我的连接语句改为 mysql -usys -h`hostname -i` 或者 mysql -usys -h`hostname` 此时就可以连接成功,但是请注意,虽然可以连接成功,但是实际上这个连上去的用户多大权限,也就是空用户所具有的权限。而且,我后来模拟的时候,将第6行的存放顺序(物理磁盘上)和第7行反过来(这个可以通过optimize 一下table,然后顺序插入两条用户记录解决,虽然不能说绝对的成功,因为插在磁盘上的哪个位置不是我们能控制的,但是除非你RP败坏,可能会出现后插入的记录在物理磁盘上反而在前面)然后再进行前面的连接测试 mysql -usys -ppass -h`hostname -i` 或者 mysql -usys -ppass -h`hostname`, ok没问题了。同时如果将上面的连接语句去掉 -p选项,那么就是密码错误,也不会再去判断第7行了(已经调换顺序的第7行)
所以解决办法就是:
1.刚装好mysql后就直接删除那个空用户
2.连接的时候,-h+ip连接,且在配置文件里面指定 --skip-name-resolve,这样同时还能消除DNS解析带来的性能影响
另外,如果直接指定了socket的话,也不会存在这个问题,此时连接就是通过socket,而不是tcp/ip。 同时,如果你在别的机器上连接肯定也没问题,因为空用户默认的host就是主机地址。
不明白mysql5.5为什么这样设计,在mysql5.1里面就没有这个空用户。
2012年3月20日星期二
正确理解innodb参数innodb_thread_concurrency
首先查看了系统的信息,
innodb_thread_concurrency | 8
--------
FILE I/O
--------
I/O thread 0 (thread id: 18578) state:
waiting for completed aio requests (insert buffer thread)
I/O thread 1 (thread id: 18579) state:
waiting for completed aio requests (log thread)
I/O thread 2 (thread id: 18580) state:
waiting for completed aio requests (read thread)
I/O thread 3 (thread id: 18581) state:
waiting for completed aio requests (read thread)
I/O thread 4 (thread id: 18582) state:
waiting for completed aio requests (read thread)
I/O thread 5 (thread id: 18583) state:
waiting for completed aio requests (read thread)
I/O thread 6 (thread id: 18584) state:
waiting for completed aio requests (write thread)
I/O thread 7 (thread id: 18585) state:
waiting for completed aio requests (write thread)
I/O thread 8 (thread id: 18586) state:
waiting for completed aio requests (write thread)
I/O thread 9 (thread id: 18587) state:
waiting for completed aio requests (write thread)
用show engine innodb status出来的结果就有10 IO个线程,而innodb_thread_concurrency却为8。
这是为什么呢?原来这个innodb_thread_concurrency指的是mysql调用innodb的线程最大并发数,大家知道由于mysql特有的插件式存储引擎架构,先上mysql架构图:
所以上层要使用innodb存储引擎的时候,肯定是mysql开启线程去调用innodb存储引擎进行相关工作,所以这里的innodb_thread_concurrency指的就是innodb接受mysql线程调用的最大线程数,而不是在innodb内部工作的线程数。之前用show engine innodb status显示的线程就是innodb内部的线程。
OK,问题解决了,顺便还提一句,还有一个参数thread_concurrency
这个代表的什么呢?
手册上有解释:This variable is specific to Solaris systems,而且在mysql5.6.1中就被废弃了
2012年3月18日星期日
[转载]运维必须要知道的几个Linux服务器监控命令
总结得相当全面,留下来以便查找
有些 Linux 发行版会提供 GUI 程序来进行系统的监控,例如 SUSE Linux 就有一个非常棒而且专业的工具 YaST,KDE 的 KDE System Guard 同样很出色。当然,要使用这些工具,你必须在服务器跟前进行操作,而且这些 GUI 的程序占用了很多系统资源,所以说,尽管 GUI 用来做基本的服务器健康状态监测挺好,但如果你想知道真正发生什么,请关掉 GUI 开始命令行之旅吧。
你应该只在需要的时候去启动 GUI ,不用的时候关掉它。如果要让服务器保持最佳性能,你应该将 Linux 服务器的运行级别 runlevel 设置为 3 ,就是控制台模式,当你需要图形化桌面的时候使用 startx 命令来启动它。
如果你的服务器启动后就直接进入图形界面,你需要修改配置 /etc/inittab 找到 initdefault 一样,将 id:5:initdefault 修改为 id:3:initdefault。
如果你没找到 /etc/inittab 文件,那就创建一个新的,文件内容增加 id:3 这么一行。这样下次服务器启动的时候就不会进入图形界面。如果你不想等到服务器重启的时候才生效,你可以执行 init 3 这个命令。
一旦你的服务器是在控制台模式下运行,你就可以开始我们接下来的内容。
iostat
iostat 命令用来显示存储子系统的详细信息,通常用它来监控磁盘 I/O 的情况。要特别注意 iostat 统计结果中的 %iowait 值,太大了表明你的系统存储子系统性能低下。
meminfo 和 free
Meminfo 可让你获取内存的详细信息,你可以使用 cat 和 grep 命令来显示 meminfo 信息:
# cat /proc/meminfo
另外你可以使用 free 命令来显示动态的内存使用信息,free 只是给你大概的内存信息,而 meminfo 提供的信息更加详细。
mpstat
mpstat mpstat是MultiProcessor Statistics的缩写,是实时系统监控工具。其报告与CPU的一些统计信息,这些信息存放在/proc/stat文件中。在多CPUs系统里,其不但能查看所有CPU的平均状况信息,而且能够查看特定CPU的信息。
netstat
Netstat 和 ps 命令类似,是 Linux 管理员基本上每天都会用的工具,它显示了大量跟网络相关的信息,例如 socket 的使用、路由、接口、协议、网络等等,下面是一些常用的参数:
1-a Show all socket information
2-r Show routing information
3-i Show network interface statistics
4-s Show network protocol statistics
你应该只在需要的时候去启动 GUI ,不用的时候关掉它。如果要让服务器保持最佳性能,你应该将 Linux 服务器的运行级别 runlevel 设置为 3 ,就是控制台模式,当你需要图形化桌面的时候使用 startx 命令来启动它。
如果你的服务器启动后就直接进入图形界面,你需要修改配置 /etc/inittab 找到 initdefault 一样,将 id:5:initdefault 修改为 id:3:initdefault。
如果你没找到 /etc/inittab 文件,那就创建一个新的,文件内容增加 id:3 这么一行。这样下次服务器启动的时候就不会进入图形界面。如果你不想等到服务器重启的时候才生效,你可以执行 init 3 这个命令。
一旦你的服务器是在控制台模式下运行,你就可以开始我们接下来的内容。
iostat
iostat 命令用来显示存储子系统的详细信息,通常用它来监控磁盘 I/O 的情况。要特别注意 iostat 统计结果中的 %iowait 值,太大了表明你的系统存储子系统性能低下。
meminfo 和 free
Meminfo 可让你获取内存的详细信息,你可以使用 cat 和 grep 命令来显示 meminfo 信息:
# cat /proc/meminfo
另外你可以使用 free 命令来显示动态的内存使用信息,free 只是给你大概的内存信息,而 meminfo 提供的信息更加详细。
mpstat
mpstat mpstat是MultiProcessor Statistics的缩写,是实时系统监控工具。其报告与CPU的一些统计信息,这些信息存放在/proc/stat文件中。在多CPUs系统里,其不但能查看所有CPU的平均状况信息,而且能够查看特定CPU的信息。
netstat
Netstat 和 ps 命令类似,是 Linux 管理员基本上每天都会用的工具,它显示了大量跟网络相关的信息,例如 socket 的使用、路由、接口、协议、网络等等,下面是一些常用的参数:
1-a Show all socket information
2-r Show routing information
3-i Show network interface statistics
4-s Show network protocol statistics
nmon
Nmon, 是 Nigel's Monitor 的缩写,是一个使用很普遍的开源工具,用以监控 Linux 系统的性能。Nmon 监控多个子系统的性能数据,例如处理器的使用率、内存使用率、队列、磁盘I/O统计、网络I/O统计、内存页处理和进程信息。Nmon 也提供了一个图形化的工具。
要运行 nmon,你可以在命令行中启动它,然后选择要监控的子系统,这些子系统都对应有一个快捷键,例如输入 c 可查看 CPU 信息,m用于查看内存,d用来查看磁盘信息等,你也可以使用 -f 命令将 nmon 的执行结果保存到一个 CSV 文件中,便于日后分析。
在每日的监控工作中,我发现 nmon 是我最常用的工具。
pmap
pmap 命令用来报告每个进程占用内存的详细情况,可用来看是否有进程超支了,该命令需要进程 id 作为参数。
ps 和 pstree
ps 和 pstree 命令是 Linux 系统管理员最好的朋友,都可以用来列表正在运行的所有进程。ps 告诉你每个进程占用的内存和 CPU 处理时间,而 pstree 显示的信息没那么详细,但它以树形结构显示进程之间的依赖关系,包括子进程信息。一旦发现某个进程有问题,你可以使用 kill 来杀掉它。
sar
sar 程序是系统监控工具里的瑞士军刀。该程序包含三个工具:sar 用来显示数据,sa1 和 sa2 用来收集数据并保存。sar 可用来显示 CPU 使用率、内存页数据、网络 I/O 和传输统计、进程创建活动和磁盘设备的活动详情。sar 和 nmon 最大的不同就是 sar 跟适合用作长期的监控,而 nmon 可以让你快速的了解系统当前状态。
strace
strace 经常被认为是程序员调试的工具,但不止如此。它可以记录进程进行系统调用的详情,因此它也是一个非常好的诊断工具,例如你可以使用它来找出某个程序正在打开某个配置文件。
Strace 也有一个缺陷,但它在跟踪某个进程时会让该进程的性能变得非常差,因此请谨慎使用。
tcpdump
Tcpdump 是一个简单、可靠的网络监控工具,用来做基本的协议分析,看看那些进程在使用网络以及如何使用网络。当然,如果你要获取跟详细的信息,你应该使用 Wireshark (下面我们会介绍).
top
top 命令显示当前的活动进程,默认它是按消耗 CPU 的厉害程度进行排序,每5秒钟刷新一次列表,你也可以选择不同的排序方式,例如 m 是按内存占用方式进行排序的快捷键。
uptime
uptime 命令告诉你这台服务器从开机启动到现在已经运行了多长时间了。同时也包含了从启动到现在服务器的平均负载情况。
vmstat
你可以使用 vmstat 来监控虚拟内存,一般 Linux 上的开发者喜欢使用虚拟内存来获得最佳的存储性能。该命令报告关于内核线程、虚拟内存、磁盘、陷阱和 CPU 活动的统计信息。由 vmstat 命令生成的报告可以用于平衡系统负载活动。系统范围内的这些统计信息(所有的处理器中)都计算出以百分比表示的平均值,或者计算其总和。
Wireshark
Wireshark, 前身是 Ethereal ,是一个网络协议检测程序,让您经由程序抓取运行的网站的相关资讯,包括每一封包流向及其内容、资讯可依操作系统语系看出,方便查看、监控TCP session动态等等.
这里罗列的是大多数最有价值的 Linux 监控程序,如果你全都能够熟练运用,那么你肯定在linux管理方面非常得心应手了。
Nmon, 是 Nigel's Monitor 的缩写,是一个使用很普遍的开源工具,用以监控 Linux 系统的性能。Nmon 监控多个子系统的性能数据,例如处理器的使用率、内存使用率、队列、磁盘I/O统计、网络I/O统计、内存页处理和进程信息。Nmon 也提供了一个图形化的工具。
要运行 nmon,你可以在命令行中启动它,然后选择要监控的子系统,这些子系统都对应有一个快捷键,例如输入 c 可查看 CPU 信息,m用于查看内存,d用来查看磁盘信息等,你也可以使用 -f 命令将 nmon 的执行结果保存到一个 CSV 文件中,便于日后分析。
在每日的监控工作中,我发现 nmon 是我最常用的工具。
pmap
pmap 命令用来报告每个进程占用内存的详细情况,可用来看是否有进程超支了,该命令需要进程 id 作为参数。
ps 和 pstree
ps 和 pstree 命令是 Linux 系统管理员最好的朋友,都可以用来列表正在运行的所有进程。ps 告诉你每个进程占用的内存和 CPU 处理时间,而 pstree 显示的信息没那么详细,但它以树形结构显示进程之间的依赖关系,包括子进程信息。一旦发现某个进程有问题,你可以使用 kill 来杀掉它。
sar
sar 程序是系统监控工具里的瑞士军刀。该程序包含三个工具:sar 用来显示数据,sa1 和 sa2 用来收集数据并保存。sar 可用来显示 CPU 使用率、内存页数据、网络 I/O 和传输统计、进程创建活动和磁盘设备的活动详情。sar 和 nmon 最大的不同就是 sar 跟适合用作长期的监控,而 nmon 可以让你快速的了解系统当前状态。
strace
strace 经常被认为是程序员调试的工具,但不止如此。它可以记录进程进行系统调用的详情,因此它也是一个非常好的诊断工具,例如你可以使用它来找出某个程序正在打开某个配置文件。
Strace 也有一个缺陷,但它在跟踪某个进程时会让该进程的性能变得非常差,因此请谨慎使用。
tcpdump
Tcpdump 是一个简单、可靠的网络监控工具,用来做基本的协议分析,看看那些进程在使用网络以及如何使用网络。当然,如果你要获取跟详细的信息,你应该使用 Wireshark (下面我们会介绍).
top
top 命令显示当前的活动进程,默认它是按消耗 CPU 的厉害程度进行排序,每5秒钟刷新一次列表,你也可以选择不同的排序方式,例如 m 是按内存占用方式进行排序的快捷键。
uptime
uptime 命令告诉你这台服务器从开机启动到现在已经运行了多长时间了。同时也包含了从启动到现在服务器的平均负载情况。
vmstat
你可以使用 vmstat 来监控虚拟内存,一般 Linux 上的开发者喜欢使用虚拟内存来获得最佳的存储性能。该命令报告关于内核线程、虚拟内存、磁盘、陷阱和 CPU 活动的统计信息。由 vmstat 命令生成的报告可以用于平衡系统负载活动。系统范围内的这些统计信息(所有的处理器中)都计算出以百分比表示的平均值,或者计算其总和。
Wireshark
Wireshark, 前身是 Ethereal ,是一个网络协议检测程序,让您经由程序抓取运行的网站的相关资讯,包括每一封包流向及其内容、资讯可依操作系统语系看出,方便查看、监控TCP session动态等等.
这里罗列的是大多数最有价值的 Linux 监控程序,如果你全都能够熟练运用,那么你肯定在linux管理方面非常得心应手了。
2012年3月14日星期三
innodb_thread_concurrency
首先查看了系统的信息,
innodb_thread_concurrency | 8
--------
FILE I/O
--------
I/O thread 0 (thread id: 18578) state:
waiting for completed aio requests (insert buffer thread)
I/O thread 1 (thread id: 18579) state:
waiting for completed aio requests (log thread)
I/O thread 2 (thread id: 18580) state:
waiting for completed aio requests (read thread)
I/O thread 3 (thread id: 18581) state:
waiting for completed aio requests (read thread)
I/O thread 4 (thread id: 18582) state:
waiting for completed aio requests (read thread)
I/O thread 5 (thread id: 18583) state:
waiting for completed aio requests (read thread)
I/O thread 6 (thread id: 18584) state:
waiting for completed aio requests (write thread)
I/O thread 7 (thread id: 18585) state:
waiting for completed aio requests (write thread)
I/O thread 8 (thread id: 18586) state:
waiting for completed aio requests (write thread)
I/O thread 9 (thread id: 18587) state:
waiting for completed aio requests (write thread)
用show engine innodb status出来的结果就有10 IO个线程,而innodb_thread_concurrency却为8。
所以上层要使用innodb存储引擎的时候,肯定是mysql开启线程去调用innodb存储引擎进行相关工作,所以这里的innodb_thread_concurrency指的就是innodb接受mysql线程调用的最大线程数,而不是在innodb内部工作的线程数。之前用show engine innodb status显示的线程就是innodb内部的线程。
OK,问题解决了,顺便还提一句,还有一个参数thread_concurrency
这个代表的什么呢?
手册上有解释:This variable is specific to Solaris systems,而且在mysql5.6.1中就被废弃了
2012年3月10日星期六
[转载]How Logs Work On MySQL With InnoDB Tables
这篇帖子个人感觉写得非常棒,不转不行
In
this article I will describe how logs work with MySQL and InnoDB. Very little
of this material is in the manual. I gleaned it from the source code while
researching for our latest book (SQL Performance Tuning by Peter Gulutzan and
Trudy Pelzer).
If you've ever worked with MySQL and InnoDB tables then this article should give you some "insider" knowledge that you wont find anywhere else, so sit back and get ready to read!
If you've ever worked with MySQL and InnoDB tables then this article should give you some "insider" knowledge that you wont find anywhere else, so sit back and get ready to read!
Page 1
When
you change data with UPDATE, INSERT or DELETE queries you're changing the data
in two places: the log buffer and the data buffers. Buffers are fixed-length,
typically a multiple of 512 bytes. And they are in memory – InnoDB won't write
them to disk... yet.
LOG BUFFER DATA BUFFER
================= ===============
= Log Record #1 = = Page Header =
= Log Record #2 = = Data Row =
= Log Record #3 = = Data Row =
= Log Record #4 = = Data Row =
================= ===============
For example, after "INSERT INTO Jobs VALUES (1,2,3)" the log buffer will have one new log record -- call it Log Record #5 -- containing a row identifier and the new contents. Meanwhile, the data buffer will have one new row, but it will also have a mark in the page header saying "the latest log record for this page is Log Record #5". In this example "#5" is the Log Sequence Number (LSN), and it's vital for scheduling later operations.
Some details about the data-change:
(a) An INSERT log record contains only new data, which is enough so that the procedure can be repeated on the same page if necessary. This is called a "redo" entry.
(b) The LSN isn't a field in the log record, instead it's an absolute address for a file and byte offset.[1]
After InnoDB has changed the log buffer and the data buffers, it's all over but the disk writing. But that's where things get complex. There are several threads monitoring buffer activity, and three situations -- overflow, checkpoint, and commit -- that result in disk writes.
What Happens With Overflows?
Overflow is rare because InnoDB takes pro-active measures to prevent buffers from filling up (see "what happens with checkpoints" below). Still, let's discuss the two possible cases.
One: if the log buffer gets full, InnoDB writes the buffer at the "end" of the log. I've put the word "end" inside quotes because a log file, or more precisely a group of log files, looks like a snake swallowing its tail. If there's only room for four log records and we're writing #5, then it will have to go at the start of the file.
LOG FILE(S) BEFORE WRITING LOG RECORD #5
=================
= Log Record #1 =
= Log Record #2 =
= Log Record #3 =
= Log Record #4 =
=================
LOG FILE(S) AFTER WRITING LOG RECORD #5
=================
= Log Record #5 =
= Log Record #2 =
= Log Record #3 =
= Log Record #4 =
=================
There's no such thing as a log that grows forever. Even though InnoDB uses some compression tricks, the log files would get too big to fit on any disk drive. So InnoDB writes "in a circle" and that means it must overwrite old log records. This circular logging policy has implications which we'll come back to later.
Two: if the data buffers get full, InnoDB writes the least recently used buffer to the database -- but not too quickly! This is where the page header's LSN becomes useful. First, InnoDB checks whether it's greater than the LSN of the last log record in the log file. If it's greater, then InnoDB must write out log records first, until the log catches up with the data, and only then can it write the data. In other words data page writing cannot occur until writing of the corresponding log record has occurred. That is the "Write-Ahead Logging" principle which is common to all important DBMSs except InterBase.
What Happens With Checkpoints?
I said that InnoDB takes some pro-active measures against overflows, and the most important of these measures is checkpointing. There is a separate thread, or a combination of threads that are separate from the thread that changes the buffers. At fixed intervals the checkpointer will wake, look for buffer changes, and ensure that writes happen.
By my count, most DBMS's would write everything at this time so there are no buffers left which are changed but unwritten. To use the usual jargon, the DBMS would flush all "dirty" buffers with a "Sharp Checkpoint". But InnoDB only ensures that (a) log and data buffers aren't getting fuller than a fixed threshold point, (b) log writing stays ahead of data page writing, (c) there is no data buffer whose page-header LSN corresponds to a log record that's about to be overwritten. In the jargon, this means InnoDB is a "Fuzzy Checkpoint" aficionado.
At checkpoint time it's possible to write another log record which says, in effect: at this moment a checkpoint occurred, so it's certain that the database is up to date except for a few dirty pages, and here is a list of the dirty pages. This information can be useful during a recovery, so I'll mention it again later.
LOG BUFFER DATA BUFFER
================= ===============
= Log Record #1 = = Page Header =
= Log Record #2 = = Data Row =
= Log Record #3 = = Data Row =
= Log Record #4 = = Data Row =
================= ===============
For example, after "INSERT INTO Jobs VALUES (1,2,3)" the log buffer will have one new log record -- call it Log Record #5 -- containing a row identifier and the new contents. Meanwhile, the data buffer will have one new row, but it will also have a mark in the page header saying "the latest log record for this page is Log Record #5". In this example "#5" is the Log Sequence Number (LSN), and it's vital for scheduling later operations.
Some details about the data-change:
(a) An INSERT log record contains only new data, which is enough so that the procedure can be repeated on the same page if necessary. This is called a "redo" entry.
(b) The LSN isn't a field in the log record, instead it's an absolute address for a file and byte offset.[1]
After InnoDB has changed the log buffer and the data buffers, it's all over but the disk writing. But that's where things get complex. There are several threads monitoring buffer activity, and three situations -- overflow, checkpoint, and commit -- that result in disk writes.
What Happens With Overflows?
Overflow is rare because InnoDB takes pro-active measures to prevent buffers from filling up (see "what happens with checkpoints" below). Still, let's discuss the two possible cases.
One: if the log buffer gets full, InnoDB writes the buffer at the "end" of the log. I've put the word "end" inside quotes because a log file, or more precisely a group of log files, looks like a snake swallowing its tail. If there's only room for four log records and we're writing #5, then it will have to go at the start of the file.
LOG FILE(S) BEFORE WRITING LOG RECORD #5
=================
= Log Record #1 =
= Log Record #2 =
= Log Record #3 =
= Log Record #4 =
=================
LOG FILE(S) AFTER WRITING LOG RECORD #5
=================
= Log Record #5 =
= Log Record #2 =
= Log Record #3 =
= Log Record #4 =
=================
There's no such thing as a log that grows forever. Even though InnoDB uses some compression tricks, the log files would get too big to fit on any disk drive. So InnoDB writes "in a circle" and that means it must overwrite old log records. This circular logging policy has implications which we'll come back to later.
Two: if the data buffers get full, InnoDB writes the least recently used buffer to the database -- but not too quickly! This is where the page header's LSN becomes useful. First, InnoDB checks whether it's greater than the LSN of the last log record in the log file. If it's greater, then InnoDB must write out log records first, until the log catches up with the data, and only then can it write the data. In other words data page writing cannot occur until writing of the corresponding log record has occurred. That is the "Write-Ahead Logging" principle which is common to all important DBMSs except InterBase.
What Happens With Checkpoints?
I said that InnoDB takes some pro-active measures against overflows, and the most important of these measures is checkpointing. There is a separate thread, or a combination of threads that are separate from the thread that changes the buffers. At fixed intervals the checkpointer will wake, look for buffer changes, and ensure that writes happen.
By my count, most DBMS's would write everything at this time so there are no buffers left which are changed but unwritten. To use the usual jargon, the DBMS would flush all "dirty" buffers with a "Sharp Checkpoint". But InnoDB only ensures that (a) log and data buffers aren't getting fuller than a fixed threshold point, (b) log writing stays ahead of data page writing, (c) there is no data buffer whose page-header LSN corresponds to a log record that's about to be overwritten. In the jargon, this means InnoDB is a "Fuzzy Checkpoint" aficionado.
At checkpoint time it's possible to write another log record which says, in effect: at this moment a checkpoint occurred, so it's certain that the database is up to date except for a few dirty pages, and here is a list of the dirty pages. This information can be useful during a recovery, so I'll mention it again later.
Page2
At
COMMIT time, InnoDB will NOT write all dirty data pages to disk. I emphasize
that because it's easy to think that committing changes means writing
everything on a persistent medium. Well, the InnoDB folks are smarter than
that. They realize that only the log records need writing. The writing of dirty
data pages can happen at overflow or checkpoint time, because their contents
are redundant. If the log survives a crash, remaking of data pages is possible
using the information in the log records.[2]
So InnoDB should only write the log. Or to be exact, InnoDB should write log records, until it has written all log records which apply to the transaction that's committing. Since all log writing is serial, that means InnoDB must write log records for other transactions too, but that's okay.[3]
Here I must get critical, because that's not what InnoDB does, necessarily. If MySQL's my.cnf's innodb_flush_log_at_trx_commit switch is zero, then InnoDB will avoid log writing at commit time. This means that a successful COMMIT won't "guarantee" that all data changes have become persistent, which is what the ANSI/ISO standard demands. The persistence guarantee applies only for checkpoints.
Anyway, you can set innodb_flush_log_at_trx_commit to one. In that case all's well, InnoDB will write the log. Also InnoDB will flush.
I'd better explain what flushing is, eh? Usually it's enough to merely write, but all modern operating systems will cache writes for efficiency reasons. To get the "guarantee" InnoDB must insist to the operating system "I really mean write, I want that disk write head on a platter, don't return till the physical operation is complete." This means that on a Windows system InnoDB calls the Windows-API function FlushFilBuffers, a call meaning "flush the cache." Here InnoDB parts company with Microsoft: SQL Server 2000 would use a "write through" option during the write, rather than flushing after the write.
Recovery
We come now to the point that makes all the logging trouble worthwhile: if a crash happens, you can recover your data.
For a crash that didn't melt the disk drive, recovery is automatic. InnoDB reads the last checkpoint log record, sees if the "dirty pages" were written before the crash, and (if they weren't) reads the log records which affected the page and applies them. This is called "rolling forward" and it's easy for two reasons: (1) because the LSN is there so InnoDB just has to compare numbers to get into synch, (2) because I left out a few details.
Fine. Now, what about a crash that did melt the disk drive? Then the recovery scenario depends on your preparation.
Scenario one: the log is gone. Well, you should have prepared by having a log copy on a separate drive. InnoDB has no explicit option but there are operating-system-specific ways.
Scenario two: the database is gone and the log is overwritten. Well, you should have anticipated that with circular logging, log record #5 will overwrite log record #1. Remember? Therefore if you didn't take a backup after the writing of log record #1, you've lost data.
Scenario three: the database is gone and the log is okay. In that case, congratulations. You just have to restore your last backup copy, and roll the whole log forward. There would be complications if you had to back up the log itself several times since the last full backup ("archive logging"), but I've assumed that option is currently inoperative. By the way, I am not discussing the role of MySQL's binlog. It's essential to the process but isn't part of InnoDB, so out of scope.
So InnoDB should only write the log. Or to be exact, InnoDB should write log records, until it has written all log records which apply to the transaction that's committing. Since all log writing is serial, that means InnoDB must write log records for other transactions too, but that's okay.[3]
Here I must get critical, because that's not what InnoDB does, necessarily. If MySQL's my.cnf's innodb_flush_log_at_trx_commit switch is zero, then InnoDB will avoid log writing at commit time. This means that a successful COMMIT won't "guarantee" that all data changes have become persistent, which is what the ANSI/ISO standard demands. The persistence guarantee applies only for checkpoints.
Anyway, you can set innodb_flush_log_at_trx_commit to one. In that case all's well, InnoDB will write the log. Also InnoDB will flush.
I'd better explain what flushing is, eh? Usually it's enough to merely write, but all modern operating systems will cache writes for efficiency reasons. To get the "guarantee" InnoDB must insist to the operating system "I really mean write, I want that disk write head on a platter, don't return till the physical operation is complete." This means that on a Windows system InnoDB calls the Windows-API function FlushFilBuffers, a call meaning "flush the cache." Here InnoDB parts company with Microsoft: SQL Server 2000 would use a "write through" option during the write, rather than flushing after the write.
Recovery
We come now to the point that makes all the logging trouble worthwhile: if a crash happens, you can recover your data.
For a crash that didn't melt the disk drive, recovery is automatic. InnoDB reads the last checkpoint log record, sees if the "dirty pages" were written before the crash, and (if they weren't) reads the log records which affected the page and applies them. This is called "rolling forward" and it's easy for two reasons: (1) because the LSN is there so InnoDB just has to compare numbers to get into synch, (2) because I left out a few details.
Fine. Now, what about a crash that did melt the disk drive? Then the recovery scenario depends on your preparation.
Scenario one: the log is gone. Well, you should have prepared by having a log copy on a separate drive. InnoDB has no explicit option but there are operating-system-specific ways.
Scenario two: the database is gone and the log is overwritten. Well, you should have anticipated that with circular logging, log record #5 will overwrite log record #1. Remember? Therefore if you didn't take a backup after the writing of log record #1, you've lost data.
Scenario three: the database is gone and the log is okay. In that case, congratulations. You just have to restore your last backup copy, and roll the whole log forward. There would be complications if you had to back up the log itself several times since the last full backup ("archive logging"), but I've assumed that option is currently inoperative. By the way, I am not discussing the role of MySQL's binlog. It's essential to the process but isn't part of InnoDB, so out of scope.
Page4
With
an understanding of InnoDB logging, you know some things need watching. In no
particular order:
use
large log files so overwriting won't happen since backups
- keep
log files and data files on separate disk drives
- make
sure innodb_flush_log_at_trx_commit is set right
2012年3月9日星期五
innodb_fast_shutdown的内幕
Innodb_fast_shutdown告诉innodb在它关闭的时候该做什么工作。有三个值可以选择:
1. 0表示在innodb关闭的时候,需要purge
all, merge insert buffer,flush dirty pages。这是最慢的一种关闭方式,但是restart的时候也是最快的。后面将介绍purge all,merge insert buffer,flush dirty pages这三者的含义。
2. 1表示在innodb关闭的时候,它不需要purge
all,merge insert buffer,只需要flush dirty page。
3. 2表示在innodb关闭的时候,它不需要purge
all,merge insert buffer,也不进行flush dirty page,只将log buffer里面的日志flush到log files。因此等下进行恢复的时候它是最耗时的。
那么在mysql restart的时候它的恢复流程(也称作crash recovery)是怎么样的呢?
1.
如果在上次关闭innodb的时候是在innodb_fast_shutdown=2或是mysql crash这种情况,那么它会利用redo
log重做那些已经提交了的事务。
2.
接下来的操作就是这么几个:
a>
Rollback uncompleted transitions
取消那些没有提交的事务
b>
Purge all 清除无用的undo页
c>
Merge insert buffer 合并插入缓冲
下面详解purge all、merge insert buffer、flush dirty page
1.
Purge all 这个操作主要是删除那些无用的undo页。对于delete操作,innodb是通过先将要删除的那一行标记为删除,而不是马上清除这一行,因为innodb实现了MVCC,这些undo段用来实现MVCC机制。MVCC也就是常说的多版本控制,即在读的时候不会加排他锁,这样大大提高了并发性。那么在一致性读的时候,怎么才能找到和事务开始的那个版本呢?对于主键索引,每个行都有一个事务ID和一个undo ID,这个undo ID指向了这行的先前版本的位置。对于非主键索引,也就是常说的secondary index,是通过先找主键索引再找到undo段。而对于update操作,则是先标记删除,然后insert一个新的行,接下来如果有一致性读,那么查找old version的行的原理和delete操作是一样的,详情见[1]。现在接着说purge
all操作,随着DML的操作越来越多,那么回滚段必然也会越来越多导致占用了许多磁盘空间,那么innodb就会定期删除一些无用的undo页,首先,innodb重启的时候必然undo页都会无效所以会进行purge all操作,另外,随着时间的推移必然一些事务已经完成,它们已不再需要某些undo页,那么这些undo在mysqld running的时候也会定期的进行清除,主要是在master thread中进行,虽然mysql5.5里面增加了一个参数innodb_purge_threads来进行purge工作,但是这个参数的默认值是0,手册上解释说这个功能在mysql5.5中还不完善,增加它的目的只是表明这是innodb的发展方向。
2.
Merge insert buffer
Insert buffer是innodb的一个特性之一,在非聚簇、且不是唯一索引(即非主键索引、非唯一索引)的情况下,如果插入的索引行所属的页在buffer pool中就直接更新这个页,否则它会将这个索引行插入到insert buffer中,然后定期对这个insert buffer进行合并(合并的本质工作就是将insert buffer中的信息更新到真正的索引文件中去)。因为innodb的secondary index是非聚簇的,那么插入很有可能带来大量的随机I/O,而如果利用insert buffer对一些属于相同页的行进行合并,那么就会减少随机IO从而提高性能。但是这里需要注意的是,insert buffer和doublewrite buffer是类似的概念,他实际上属于system
tablespace中的一部分[2],正由于它也是持久化存储,那么在服务器宕机或是重启之后这些信息不会丢失,所以也就有了在前面介绍innodb_fast_shutdown时所说:在innodb重启时,可能需要进行merge
insert buffer。那么在什么情况下需要对insert buffer进行merge操作呢?
a>
在innodb restart的时候
b>
master thread会定期的进行merge操作
c>
每次读取secondary index page时,如果所需页不在buffer
pool,而这些页在insert buffer中的时候,这时需要先对insert buffer进行合并,然后才能被读取。
为什么这样呢?因为所有插入的索引行所属的页如果不在buffer pool中,而又在insert buffer中,那么它一定代表了页的最新状态(不理解?因为每次插入索引行的时候,如果所需页不在buffer pool中就直接插入到insert buffer中,而一旦insert buffer merge后相关的行也就不在insert
buffer更新secondary index page了)。这时或许你会问那么为什么不直接读取insert
buffer中的页然后继续操作而一定要合并(更新到索引文件)呢?因为在innodb中是数据文件(也就是主键索引)和索引文件缓存的,在insert buffer中读取了需要的页后,那么必然就会在buffer pool中缓存了这个页,而如果这个页还留在insert buffer中却不更新到secondary index page去,那么,第一,这将不能保证索引文件得到更新;第二,insert buffer的空间会被占用。而如果这一步将insert buffer 合并后,不但减小了insert buffer的使用空间,而且将这merge操作完成了一部分,减小了以后merge的负担(不是有句话叫做今日事今日毕么),不过这也减慢了读的操作,因为读操作必须等待这个页的合并。
3.
Flush dirty page
这是最好理解的一个概念了,刷新脏页到磁盘。Innodb是数据文件和索引文件缓存的(innodb中的数据文件本质上也是索引文件,只是习惯这么称呼而已),从磁盘读到buffer中的文件被修改后,那么就成了dirty page脏页。而如果这些修改页的操作被提交了之后这些页就必须被flush到磁盘上。
啰嗦了这么久基本上将mysql的insert buffer工作原理大致说清楚了,不过需要注意的是在mysql5.5中这个insert buffer已经改名了,叫做change buffer,不见包含了insert buffer,而且包括了update buffer,delete buffer。最后提一句,随着SSD、Fusion IO这类型存储出现,很多时候我们考虑随机IO带来的影响或许对它们就不适用了。
因为没有读源码,这些理解是通过读其他的资料而来的,所以还留下了几个问题:
1.
实现insert buffer的数据结构是什么?我想应该是树状结构,因为这会为合并那一步提升效率。理由:第一,如果是无序链表的最开始的插入效率可能会比较高,但是最终判断哪些行在相同页或是相邻页的时候需要排序,这里的代价会比较高。而有序的链表在性能上没有二叉树这种结构效率高。
2.
Insert buffer占多大空间?如果很小那岂不是只能容纳几行?那么在系统压力的时候,有空间来应付插入压力么?而如果比较大的,那么怎么保证在合并时候的效率?
参考文档:
2012年3月8日星期四
Linux I/O 调度策略
I/O scheduler的作用就是为请求队列里面的IO请求做一个优化,以此达到提高系统吞吐量、缩短响应时间的目的。更改I/O scheduler有两种方式:
1./sys/block/device_name/queue/scheduler(IOscheduler);/sys/block/device_name/queue/nr_request(queue
size)
2.永久修改?启动时,grup->edit->elevator
= scheduler_name
但是提高系统吞吐量和缩短响应时间是一件矛盾的事情。因为如果为了提高系统吞吐量,则必然要对IO请求的某些顺序做改变,这就导致了某些IO请求的响应时间增大。而如果为了减小响应时间(那么就是来一个请求响应一个)那么系统吞吐量或许就会降低了。因此linux有了四种I/O scheduler来满足不同的情况。
1.
CFQ(completely fair queuing)
完全公平队列,实现了请求合并、电梯算法。每个进程的IO请求产生一个队列,这个队列是对寻道地址进行排序,尽量减少寻道时间。每个进程的队列有自己的优先级,然后以时间片轮转的方式去服务每个进程的请求队列。[1]
2.
Deadline
这种调度方式实现了请求合并、电梯算法,然后为每个请求分配一个超期时间,这样就避免了某个请求被饿死的情况。在它的实现中分配了好几个队列,deadline queue(这个是根据超期时间来排序的),这个是按照磁道访问顺序来的;write queue;read queue。其中read queue的优先级比write queue的优先级高,这主要是因为用户发出了读请求后会立马等待结果,而用户发出了写请求后则不会等待他是否完成了(因为用户看不到)。这三者的优先级关系是Q(read) > Q(read) > Q(deadline)。大概的响应流程:决定下一次是从队列中响应哪个请求时,首先是从Q(deadline)去查看第一个请求是否超期了,如果没有那么就会从read或write队列(这个应该是根据他们之间的优先级来定的)里面读取一批的请求去响应(默认好像是5个)。
3.
NOOP
NOOP的全称是no operations,它就是一个简单的FIFO,不过它实现了合并请求的功能,也就是说如果两个请求访问的额磁道地址在一起,那么就会合并它们。这种适用于没有寻到时间的情况,比如说SSD、fusion IO,而不适应HDD。
4.
AS
AS其实和CFQ差不蛮多,称作预期的调度,别的IO scheduler都只是对离散IO做了优化但是并没有对连续IO作优化,比如说一个IO请求刚执行完的这瞬间又在此位置来了一个IO请求,那么在AS这种情况下就可以马上响应刚才新来的那个请求,它是通过每次完成之后等那么极短的时间来响应的(6ms?)不过现在已经被CFQ取代。
简单的说:CFQ是一种通用方案,deadline适用database情景,而NOOP适用SSD、fusion IO。但是具体的也要看实际情况,比如说deadline只有在极大地IO情况下才有可能提升一定的性能(自己做过测试,规模比较小,没有多大性能提升)
下面在谈谈queue size
Queue size就是用来存在IO请求的队列的大小,所以理论上来说queue size越大,提升的性能更大,因为有更多的请求被优化了。但是这里需要注意几种情况。
1.
在SSD、Fusion IO这种不需要寻道时间(seek time)的情景下,增大了size并没有多大益处,或许会因为对队列的维护而带来负面影响。所以使用之前最好还是测试一下比较合适。
2.
在使用innodb的时候,无论是CFQ还是deadline,都不要设置为太大的queue size。因为有人说(不是官方说法[2])innodb内部也对io请求做了根os一样的优化,那么如果在os层把queue size设得太大的话会与innodb内部重复,导致额外的代价,性能或许会更低。
3.
Linux的文档中说deadline比CFQ更适合数据库这种情况,但是貌似只有在极大地系统压力情况下deadline才会比CFQ有比较大的性能优势。
介绍了cfq/deadline的一些细节。
订阅:
博文 (Atom)