Preface
We have many tools for diagnosing problems in a system, such as strace, pstack, gstack, gdb, and pt-pmp. Some performance problems, however, are intermittent and hard to catch while they are happening. pt-stalk can help diagnose this kind of problem.
Introduction
pt-stalk collects detailed diagnostic data when a trigger condition you specify is met. It can also gather extra data with gdb, oprofile, strace, and tcpdump. A trigger here is a condition on a watched value, not a database trigger; the two are unrelated. pt-stalk provides many options for collecting the data you need, which makes it very useful for performance diagnosis. Let's look at the details.
Procedure
Usage
pt-stalk [OPTIONS]
Common Parameters
--collect //Collect diagnostic data when the trigger occurs.
--collect-gdb //Collect gdb data. It prints stack traces from all threads.
--collect-oprofile //Collect oprofile data.
--collect-strace //Collect strace data. Do not specify it together with "--collect-gdb".
--collect-tcpdump //Collect tcpdump data.
--cycles //How many times the trigger condition must be met before data is collected (default "5").
--dest //Directory in which to store diagnostic data (default "/var/lib/pt-stalk").
--disk-bytes-free //Do not collect data unless at least this much disk space is free (default "100M"; valid unit suffixes are k, M, G and T).
--disk-pct-free //Similar to "--disk-bytes-free", but specified as a percentage of disk space.
--function //What to watch for the trigger (default "status"; other values are "processlist" or <yourfilename>).
--iterations //How many times to collect. It runs forever if no value is given.
--log //File in which to record logs (default "/var/log/pt-stalk.log"). It is only written when daemonized.
--match //The pattern to use with "--function processlist".
--mysql-only //Collect only MySQL-related diagnostic data; disk space is the one exception and is still captured.
--retention-time //Number of days to keep diagnostic data before purging it (default "30").
--run-time //How many seconds to collect diagnostic data (default "30"). It should not be longer than the value of "--sleep".
--sleep //How many seconds to sleep after a collection. It prevents collecting too often (default "300").
--stalk //Watch and wait for the trigger to occur (default "yes").
--no-stalk //Specify this to collect diagnostic data immediately, without waiting for the trigger to occur.
--threshold //The maximum acceptable value of "--variable"; values above it count toward the trigger (default "25").
--variable //The variable to compare against "--threshold" (default "Threads_running").
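To make the relationship between "--variable", "--threshold" and "--cycles" concrete, here is a minimal sketch that models the trigger check. It is an illustration only, not pt-stalk's source: the function name simulate_trigger is hypothetical, the sample values are made up, and it assumes that a sample at or below the threshold resets the count. It also ignores the "--sleep" pause after a collection.

```shell
#!/usr/bin/env bash
# Simplified model of the trigger check (an illustration, not pt-stalk's source).
# Usage: simulate_trigger THRESHOLD CYCLES VALUE...
# Prints the 1-based index of each sample at which a collection would start.
simulate_trigger() {
    local threshold=$1 cycles=$2
    shift 2
    local cycles_true=0 sample=0 value
    for value in "$@"; do
        sample=$((sample + 1))
        if [ "$value" -gt "$threshold" ]; then
            cycles_true=$((cycles_true + 1))
        else
            cycles_true=0   # assumption: a non-matching sample resets the count
        fi
        if [ "$cycles_true" -ge "$cycles" ]; then
            echo "collect at sample $sample"
            cycles_true=0   # start counting again after a collection
        fi
    done
}

# Threshold 25, 3 cycles: the lone spike (30) is ignored,
# the sustained run 26, 27, 28 starts a collection.
simulate_trigger 25 3 10 30 12 26 27 28 9
```

With these made-up samples the function prints "collect at sample 6". Requiring several matches in a row is what keeps a single short spike from starting a collection.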
Examples
Generate a benchmark workload with sysbench.
[root@zlm2 :: ~/sysbench-1.0/src/lua]
#sysbench oltp_read_write.lua --mysql-host=192.168.1.101 --mysql-port= --mysql-user=zlm --mysql-password=aaron8219 --mysql-db=sysbench --tables= --table-size= --mysql-storage-engine=innodb cleanup
sysbench 1.0. (using bundled LuaJIT 2.1.-beta2)
Dropping table 'sbtest1'...
Dropping table 'sbtest2'...
Dropping table 'sbtest3'...
Dropping table 'sbtest4'...
Dropping table 'sbtest5'...
Dropping table 'sbtest6'...
Dropping table 'sbtest7'...
Dropping table 'sbtest8'...
Dropping table 'sbtest9'...
Dropping table 'sbtest10'...
[root@zlm2 :: ~/sysbench-1.0/src/lua]
#sysbench oltp_read_write.lua --mysql-host=192.168.1.101 --mysql-port= --mysql-user=zlm --mysql-password=aaron8219 --mysql-db=sysbench --tables= --table-size= --mysql-storage-engine=innodb prepare
sysbench 1.0. (using bundled LuaJIT 2.1.-beta2)
Creating table 'sbtest1'...
Inserting records into 'sbtest1'
Creating a secondary index on 'sbtest1'...
Creating table 'sbtest2'...
Inserting records into 'sbtest2'
Creating a secondary index on 'sbtest2'...
Creating table 'sbtest3'...
Inserting records into 'sbtest3'
Creating a secondary index on 'sbtest3'...
Creating table 'sbtest4'...
Inserting records into 'sbtest4'
Creating a secondary index on 'sbtest4'...
Creating table 'sbtest5'...
Inserting records into 'sbtest5'
Creating a secondary index on 'sbtest5'...
Creating table 'sbtest6'...
Inserting records into 'sbtest6'
Creating a secondary index on 'sbtest6'...
Creating table 'sbtest7'...
Inserting records into 'sbtest7'
Creating a secondary index on 'sbtest7'...
Creating table 'sbtest8'...
Inserting records into 'sbtest8'
Creating a secondary index on 'sbtest8'...
Creating table 'sbtest9'...
Inserting records into 'sbtest9'
Creating a secondary index on 'sbtest9'...
Creating table 'sbtest10'...
Inserting records into 'sbtest10'
Creating a secondary index on 'sbtest10'...
[root@zlm2 :: ~/sysbench-1.0/src/lua]
#sysbench oltp_read_write.lua --mysql-host=192.168.1.101 --mysql-port= --mysql-user=zlm --mysql-password=aaron8219 --mysql-db=sysbench --threads= --time= --report-interval= --rand-type=uniform run
sysbench 1.0. (using bundled LuaJIT 2.1.-beta2)
Running the test with following options:
Number of threads:
Report intermediate results every second(s)
Initializing random number generator from current time
Initializing worker threads...
Threads started!
[ 10s ] thds: tps: 258.68 qps: 5176.49 (r/w/o: 3624.11/1034.82/517.56) lat (ms,%): 15.83 err/s: 0.00 reconn/s: 0.00
[ 20s ] thds: tps: 286.25 qps: 5726.15 (r/w/o: 4008.67/1144.89/572.60) lat (ms,%): 15.00 err/s: 0.00 reconn/s: 0.00
[ 30s ] thds: tps: 270.82 qps: 5416.13 (r/w/o: 3790.80/1083.69/541.64) lat (ms,%): 16.12 err/s: 0.00 reconn/s: 0.00
[ 40s ] thds: tps: 280.98 qps: 5619.75 (r/w/o: 3934.26/1123.53/561.97) lat (ms,%): 16.12 err/s: 0.00 reconn/s: 0.00
[ 50s ] thds: tps: 298.42 qps: 5968.38 (r/w/o: 4177.83/1193.70/596.85) lat (ms,%): 14.46 err/s: 0.00 reconn/s: 0.00
[ 60s ] thds: tps: 278.88 qps: 5578.20 (r/w/o: 3904.92/1115.52/557.76) lat (ms,%): 15.83 err/s: 0.00 reconn/s: 0.00
[ 70s ] thds: tps: 280.91 qps: 5617.66 (r/w/o: 3932.21/1123.63/561.82) lat (ms,%): 15.83 err/s: 0.00 reconn/s: 0.00
[ 80s ] thds: tps: 281.68 qps: 5632.83 (r/w/o: 3942.77/1126.71/563.35) lat (ms,%): 16.12 err/s: 0.00 reconn/s: 0.00
[ 90s ] thds: tps: 281.60 qps: 5631.55 (r/w/o: 3942.07/1126.39/563.10) lat (ms,%): 16.12 err/s: 0.00 reconn/s: 0.00
[ 100s ] thds: tps: 287.62 qps: 5753.86 (r/w/o: 4028.02/1150.49/575.35) lat (ms,%): 15.55 err/s: 0.00 reconn/s: 0.00
[ 110s ] thds: tps: 308.99 qps: 6180.45 (r/w/o: 4326.12/1236.35/617.97) lat (ms,%): 13.95 err/s: 0.00 reconn/s: 0.00
... //Omitted.
Collect diagnostic data using pt-stalk.
[root@zlm2 :: /data/mysql/mysql3306]
#pt-stalk --host localhost --port --user root --password Passw0rd --collect-gdb --cycles --variable Threads_connect --threshold
mysql: [Warning] Using a password on the command line interface can be insecure.
2018_07_09_08_15_46 Starting /usr/bin/pt-stalk --function=status --variable=Threads_connect --threshold= --match= --cycles= --interval= --iterations= --run-time= --sleep= --dest=/var/lib/pt-stalk --prefix= --notify-by-email= --log=/var/log/pt-stalk.log --pid=/var/run/pt-stalk.pid --plugin=
mysqladmin: [Warning] Using a password on the command line interface can be insecure.
2018_07_09_08_15_46 Detected value is empty; something failed? Trigger exit status:
mysqladmin: [Warning] Using a password on the command line interface can be insecure.
2018_07_09_08_15_47 Detected value is empty; something failed? Trigger exit status:
mysqladmin: [Warning] Using a password on the command line interface can be insecure.
2018_07_09_08_15_48 Detected value is empty; something failed? Trigger exit status:
mysqladmin: [Warning] Using a password on the command line interface can be insecure.
2018_07_09_08_15_50 Detected value is empty; something failed? Trigger exit status:
mysqladmin: [Warning] Using a password on the command line interface can be insecure.
2018_07_09_08_15_51 Detected value is empty; something failed? Trigger exit status:
mysqladmin: [Warning] Using a password on the command line interface can be insecure.
2018_07_09_08_15_52 Detected value is empty; something failed? Trigger exit status:
mysqladmin: [Warning] Using a password on the command line interface can be insecure.
2018_07_09_08_15_53 Detected value is empty; something failed? Trigger exit status:
mysqladmin: [Warning] Using a password on the command line interface can be insecure.
2018_07_09_08_15_54 Detected value is empty; something failed? Trigger exit status:
mysqladmin: [Warning] Using a password on the command line interface can be insecure.
2018_07_09_08_15_55 Detected value is empty; something failed? Trigger exit status:
mysqladmin: [Warning] Using a password on the command line interface can be insecure.
2018_07_09_08_15_56 Detected value is empty; something failed? Trigger exit status:
^C2018_07_09_08_15_57 Caught signal, exiting
2018_07_09_08_15_57 All subprocesses have finished
2018_07_09_08_15_57 Exiting because OKTORUN is false
2018_07_09_08_15_57 /usr/bin/pt-stalk exit status
//The value of "--variable" should be "Threads_connected".
//Correct the value and run it again.
[root@zlm2 :: /data/mysql/mysql3306]
#pt-stalk --host localhost --port --user root --password Passw0rd --collect-gdb --cycles --variable Threads_connected --threshold
mysql: [Warning] Using a password on the command line interface can be insecure.
2018_07_09_08_19_39 Starting /usr/bin/pt-stalk --function=status --variable=Threads_connected --threshold= --match= --cycles= --interval= --iterations= --run-time= --sleep= --dest=/var/lib/pt-stalk --prefix= --notify-by-email= --log=/var/log/pt-stalk.log --pid=/var/run/pt-stalk.pid --plugin=
mysqladmin: [Warning] Using a password on the command line interface can be insecure.
2018_07_09_08_19_39 Check results: status(Threads_connected)=, matched=yes, cycles_true=
2018_07_09_08_19_39 Collect triggered
2018_07_09_08_19_39 Collect PID
2018_07_09_08_19_39 Collect done
2018_07_09_08_19_39 Sleeping seconds after collect
mysqladmin: [Warning] Using a password on the command line interface can be insecure.
2018_07_09_08_24_39 Check results: status(Threads_connected)=, matched=yes, cycles_true=
2018_07_09_08_24_39 Collect triggered
2018_07_09_08_24_39 Collect PID
2018_07_09_08_24_39 Collect done
2018_07_09_08_24_40 Sleeping seconds after collect
^C2018_07_09_08_25_56 Caught signal, exiting //Stop collecting diagnostic data with "Ctrl+C".
2018_07_09_08_25_56 Waiting up to seconds for subprocesses to finish...
2018_07_09_08_25_56 Exiting because OKTORUN is false
2018_07_09_08_25_56 /usr/bin/pt-stalk exit status
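The "Detected value is empty" messages in the first run are what you would expect when the watched name does not exist among the status counters. A minimal sketch of such a lookup follows. It assumes the value is read from table-formatted status output like that of mysqladmin extended-status (the mysqladmin warnings in the log suggest this), and the names status_value and sample_status, as well as the Threads_running value, are made up for illustration.

```shell
#!/usr/bin/env bash
# Extract one counter from table-formatted status output read on stdin.
# Prints nothing when the name is not present.
status_value() {
    local name=$1
    awk -v name="$name" -F'|' '
        { gsub(/ /, "", $2); gsub(/ /, "", $3) }   # trim the padding spaces
        $2 == name { print $3 }                    # exact match on the name
    '
}

# Illustrative sample data, not real server output.
sample_status() {
    cat <<'EOF'
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Threads_connected | 5     |
| Threads_running   | 2     |
+-------------------+-------+
EOF
}

echo "Threads_connected=[$(sample_status | status_value Threads_connected)]"
echo "Threads_connect=[$(sample_status | status_value Threads_connect)]"
```

The misspelled name yields an empty string, so there is nothing to compare against the threshold and the trigger can never match.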
Check the value of "show global status ... ".
(root@localhost mysql3306.sock)[sysbench]>show global status like '%Threads%';
+------------------------+-------+
| Variable_name | Value |
+------------------------+-------+
| Delayed_insert_threads | |
| Slow_launch_threads | |
| Threads_cached | |
| Threads_connected | | //Threads_connected has risen to 5.
| Threads_created | |
| Threads_running | |
+------------------------+-------+
rows in set (0.04 sec)
(root@localhost mysql3306.sock)[sysbench]>show global status like '%Threads%';
+------------------------+-------+
| Variable_name | Value |
+------------------------+-------+
| Delayed_insert_threads | |
| Slow_launch_threads | |
| Threads_cached | |
| Threads_connected | | //The second time, Threads_connected has risen to 6.
| Threads_created | |
| Threads_running | |
+------------------------+-------+
rows in set (0.01 sec)
Check the output files in the default directory ("/var/lib/pt-stalk").
[root@zlm2 :: /var/lib/pt-stalk]
#ls -lrt
total
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-trigger
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-pmap
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-variables
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-innodbstatus1
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-mutex-status1
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-ps
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-lsof
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-opentables1
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-top
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-sysctl
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-disk-space
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-mysqladmin
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-vmstat
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-procstat
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-diskstats
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-procvmstat
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-netstat_s
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-slabinfo
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-interrupts
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-netstat
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-meminfo
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-df
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-processlist
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-innodbstatus2
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-transactions
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-hostname
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-mutex-status2
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-opentables2
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-vmstat-overall
-rw-r--r-- root root Jul : 2018_07_09_08_19_39-output
//Each time diagnostic data is collected, a series of files is written (starting with "trigger" and ending with "output").
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-trigger
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-pmap
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-variables
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-innodbstatus1
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-ps
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-mutex-status1
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-sysctl
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-lsof
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-opentables1
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-top
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-disk-space
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-vmstat
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-mysqladmin
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-procstat
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-netstat
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-slabinfo
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-netstat_s
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-interrupts
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-meminfo
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-diskstats
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-df
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-procvmstat
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-processlist
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-innodbstatus2
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-transactions
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-hostname
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-mutex-status2
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-opentables2
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-vmstat-overall
-rw-r--r-- root root Jul : 2018_07_09_08_24_39-output
[root@zlm2 :: /var/lib/pt-stalk]
#cat 2018_07_09_08_24_39-trigger
2018_07_09_08_24_39 Check results: status(Threads_connected)=, matched=yes, cycles_true=
2018_07_09_08_24_39 pt-stalk ran with --function=status --variable=Threads_connected --threshold= --match= --cycles= --interval= --iterations= --run-time= --sleep= --dest=/var/lib/pt-stalk --prefix= --notify-by-email= --log=/var/log/pt-stalk.log --pid=/var/run/pt-stalk.pid --plugin=
//The trigger file records the options we have used.
[root@zlm2 :: /var/lib/pt-stalk]
#cat 2018_07_09_08_24_39-vmstat-overall
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
//The vmstat-overall file records the vmstat output.
//Each file contains the diagnostic data its name suggests. I'm not going to show all of them.
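Since every file of one collection shares the same timestamp prefix, it is easy to see how many collections a directory holds. The following is a minimal sketch that relies only on the "<timestamp>-<name>" naming visible in the listing above; the function name list_collections is hypothetical.

```shell
#!/usr/bin/env bash
# List the distinct collection timestamps found in a pt-stalk destination
# directory, based on the "<timestamp>-<name>" file naming.
list_collections() {
    local dest=${1:-/var/lib/pt-stalk}
    local f
    for f in "$dest"/*-*; do
        [ -e "$f" ] || continue   # the glob matched nothing
        f=${f##*/}                # strip the directory part
        echo "${f%%-*}"           # keep everything before the first "-"
    done | sort -u
}

# Prints one line per collection; prints nothing if the directory is empty or absent.
list_collections /var/lib/pt-stalk
```

For the directory shown above this would list two prefixes, one for each collection. All files of one run can then be matched with a glob such as 2018_07_09_08_24_39-*.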
Keep an eye on your remaining disk space: pt-stalk won't run anymore once you're out of space.
[root@zlm2 :: /data/mysql/mysql3306]
#pt-stalk --host localhost --port --user root --password Passw0rd --collect --cycles --variable Threads_connected --threshold --sleep --daemonize
Cannot open /tmp/pt-stalk..FSboRq/po/daemonize: No space left on device at -e line , <$fh> chunk .
No long attribute in option spec /tmp/pt-stalk..FSboRq/po/ask-pass
[root@zlm2 :: /data/mysql/mysql3306]
#df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root .4G .4G 20K % / //The root filesystem is out of space.
devtmpfs 488M 488M % /dev
tmpfs 497M 497M % /dev/shm
tmpfs 497M 6.6M 491M % /run
tmpfs 497M 497M % /sys/fs/cgroup
/dev/sda1 497M 118M 379M % /boot
none 87G 81G .9G % /vagrant
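A quick check before starting pt-stalk can catch this situation early. The sketch below tests whether the filesystem holding a directory still has a given percentage free, using the portable "df -P" output format. The function name enough_free_pct is hypothetical, and the 5 percent used in the example is just an illustrative value.

```shell
#!/usr/bin/env bash
# Succeed if the filesystem holding DIR has at least MIN_PCT percent free.
# Uses POSIX "df -P" output, where the capacity (used %) is column 5.
enough_free_pct() {
    local dir=$1 min_pct=$2
    local used_pct
    used_pct=$(df -P "$dir" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
    [ $((100 - used_pct)) -ge "$min_pct" ]
}

# pt-stalk failed above while writing under /tmp, so that is worth checking too.
if enough_free_pct /tmp 5; then
    echo "enough free space in /tmp"
else
    echo "low disk space in /tmp"
fi
```

The same function can be pointed at the "--dest" directory, as long as that directory already exists.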
Summary
- pt-stalk is another tool for diagnosing system performance. It is similar to pt-pmp, but not the same.
- pt-stalk generates many data files, which helps you gather information on almost every aspect of the system.
- pt-stalk has four optional collectors: gdb, oprofile, strace, and tcpdump. They make it flexible to collect data along different dimensions.