2011-12-03 20:53:11
MySQL Slow Query Log Filter
Filtering all queries of one user out of a MySQL Slow Query Log is a common task. Maatkit is a toolkit for analyzing log files and more. Here is an example run to filter all queries of user123 of a 707 MB big slow-query-log:
$ time mk-query-digest --filter \
'($event->{user} || "") =~ m/user123/' \
--print --no-report log-slow-queries.log >1
log-slow-queries.log: 15% 02:49 remain
log-slow-queries.log: 29% 02:26 remain
log-slow-queries.log: 44% 01:51 remain
log-slow-queries.log: 59% 01:20 remain
log-slow-queries.log: 76% 00:45 remain
log-slow-queries.log: 90% 00:18 remain
real 3m20.097s
user 0m16.616s
sys 3m2.592s
Well, 707 MB in more than 3 minutes? Simple filtering can also be done with awk, try it:
$ time awk \
'/^# Time: /{t=1;s=$0;p=0}'\
'!/^# /{t=0}/^# User@Host: /{p=0}'\
'/^# User@Host: user123/{p=1};'\
'{if(p){if(t){print s;t=0;}print$0}}' \
<log-slow-queries.log >2
real 0m58.844s
user 0m2.501s
sys 0m3.268s
Nice speed, but awk is a symlink to mawk, try gawk:
$ time gawk \
'/^# Time: /{t=1;s=$0;p=0}'\
'!/^# /{t=0}/^# User@Host: /{p=0}'\
'/^# User@Host: user123/{p=1};'\
'{if(p){if(t){print s;t=0;}print$0}}' \
<log-slow-queries.log >2
real 3m30.287s
user 0m29.934s
sys 2m58.800s
Ooops, but didn't we already know, how slow GNU-software can be? Try a different server:
$ time gawk \
'/^# Time: /{t=1;s=$0;p=0}'\
'!/^# /{t=0}/^# User@Host: /{p=0}'\
'/^# User@Host: user123/{p=1};'\
'{if(p){if(t){print s;t=0;}print$0}}' \
<log-slow-queries.log >2
real 2m4.899s
user 2m2.240s
sys 0m1.890s
This server runs more fast, but it lacks mawk, is virtual and it runs like the first server only 32-bit-linux. Try gawk and mawk on 64-bit-linux (same physical host):
$ time gawk \
'/^# Time: /{t=1;s=$0;p=0}'\
'!/^# /{t=0}/^# User@Host: /{p=0}'\
'/^# User@Host: user123/{p=1};'\
'{if(p){if(t){print s;t=0;}print$0}}' \
<log-slow-queries.log >2
real 1m49.348s
user 1m40.242s
sys 0m1.528s
$ time mawk \
'/^# Time: /{t=1;s=$0;p=0}'\
'!/^# /{t=0}/^# User@Host: /{p=0}'\
'/^# User@Host: user123/{p=1};'\
'{if(p){if(t){print s;t=0;}print$0}}' \
<log-slow-queries.log >3
real 0m13.115s
user 0m1.000s
sys 0m2.392s
Conclusions:
- simple filter can be done faster with awk than with maatkit
- mawk does this filtering faster than gawk
- mawk and gawk run faster on 64-bit-linux