Syndicate
Site (RSS, Atom)
Contact
Weblog status
Total entries: 78
Last entry: 2022-10-16 13:52:24
Last updated: 2022-10-16 14:12:58
powered by vim, bash, cat, grep, sed, and nb 3.4.2

2011-12-03 20:53:11

MySQL Slow Query Log Filter

Filtering all queries of one user out of a MySQL Slow Query Log is a common task. Maatkit is a toolkit for analyzing log files and more. Here is an example run to filter all queries of user123 of a 707 MB big slow-query-log:

$ time mk-query-digest --filter \
 '($event->{user} || "") =~ m/user123/' \
 --print --no-report log-slow-queries.log >1
log-slow-queries.log:  15% 02:49 remain
log-slow-queries.log:  29% 02:26 remain
log-slow-queries.log:  44% 01:51 remain
log-slow-queries.log:  59% 01:20 remain
log-slow-queries.log:  76% 00:45 remain
log-slow-queries.log:  90% 00:18 remain

real    3m20.097s
user    0m16.616s
sys     3m2.592s

Well, 707 MB in more than 3 minutes? Simple filtering can also be done with awk, try it:

$ time awk \
'/^# Time: /{t=1;s=$0;p=0}'\
'!/^# /{t=0}/^# User@Host: /{p=0}'\
'/^# User@Host: user123/{p=1};'\
'{if(p){if(t){print s;t=0;}print$0}}' \
 <log-slow-queries.log >2

real    0m58.844s
user    0m2.501s
sys     0m3.268s

Nice speed, but awk is a symlink to mawk, try gawk:

$ time gawk \
'/^# Time: /{t=1;s=$0;p=0}'\
'!/^# /{t=0}/^# User@Host: /{p=0}'\
'/^# User@Host: user123/{p=1};'\
'{if(p){if(t){print s;t=0;}print$0}}' \
 <log-slow-queries.log >2

real    3m30.287s
user    0m29.934s
sys     2m58.800s

Ooops, but didn't we already know, how slow GNU-software can be? Try a different server:

$ time gawk \
'/^# Time: /{t=1;s=$0;p=0}'\
'!/^# /{t=0}/^# User@Host: /{p=0}'\
'/^# User@Host: user123/{p=1};'\
'{if(p){if(t){print s;t=0;}print$0}}' \
 <log-slow-queries.log >2

real    2m4.899s
user    2m2.240s
sys     0m1.890s

This server runs more fast, but it lacks mawk, is virtual and it runs like the first server only 32-bit-linux. Try gawk and mawk on 64-bit-linux (same physical host):

$ time gawk \
'/^# Time: /{t=1;s=$0;p=0}'\
'!/^# /{t=0}/^# User@Host: /{p=0}'\
'/^# User@Host: user123/{p=1};'\
'{if(p){if(t){print s;t=0;}print$0}}' \
 <log-slow-queries.log >2

real    1m49.348s
user    1m40.242s
sys     0m1.528s

$ time mawk \
'/^# Time: /{t=1;s=$0;p=0}'\
'!/^# /{t=0}/^# User@Host: /{p=0}'\
'/^# User@Host: user123/{p=1};'\
'{if(p){if(t){print s;t=0;}print$0}}' \
 <log-slow-queries.log >3

real    0m13.115s
user    0m1.000s
sys     0m2.392s

Conclusions:

  • simple filter can be done faster with awk than with maatkit
  • mawk does this filtering faster than gawk
  • mawk and gawk run faster on 64-bit-linux

Posted by Frank W. Bergmann | Permanent link | File under: mysql, logging, shell