MySQL Performance Schema (2)
This is #2 in a series of blog postings about MySQL Performance Schema.
My colleague Mark Leith has shown some examples of output from two PERFORMANCE_SCHEMA summary tables on his blog http://www.markleith.co.uk/?p=112. I know we’ll be hearing more from him and I notice he’ll be giving a talk at our User Conference in April, so watch that space.
The specification of WL#2360 Performance Schema is now visible on our forge site http://forge.mysql.com/worklog/task.php?id=2360. I recommend you go there, click on “High Level Architecture”, read it, then come back to here. It should take you a while because this is the second-largest MySQL worklog-task specification since time began (the largest was WL#148 for foreign keys, http://forge.mysql.com/worklog/task.php?id=148). I’m an architect so I did the original specification and most of the wrangling/wheedling/editing over the last four years to keep this target in our sights. My colleague Marc Alff did almost all the serious coding, starting about nine months ago, after he did all the serious coding for the new SIGNAL feature. People deserving honourable mention include the architecture reviewer Mikael Ronstrom, principal engineer Sergei Golubchik, and early adopters like Mark Leith. I’ll talk in a later post about another “early adopter” group which will soon announce its plans based on Performance Schema.
Absorbed the specification? Then let’s move to looking at an unretouched photo of the result on my x86 Linux laptop. My illustration is the first SELECT that I did on the fundamental PERFORMANCE_SCHEMA table after starting a new server and client.
<pre>
mysql> SELECT * FROM performance_schema.events_waits_current LIMIT 1\G
*************************** 1. row ***************************
THREAD_ID: 2
EVENT_ID: 144
EVENT_NAME: wait/synch/mutex/mysys/THR_LOCK_malloc
SOURCE: safemalloc.c:183
TIMER_START: 40046326080026
TIMER_END: 40046326588338
TIMER_WAIT: 508312
SPINS: NULL
OBJECT_SCHEMA: NULL
OBJECT_NAME: NULL
OBJECT_TYPE: NULL
OBJECT_INSTANCE_BEGIN: 0×96aa620
NESTING_EVENT_ID: NULL
OPERATION: LOCK
NUMBER_OF_BYTES: NULL
1 row in set (0.00 sec)
</pre>
We make a new row every time we see an event. This is a monitoring “detail”. That’s the vital building block that distinguishes MySQL’s method. As any good
book about performance monitoring will tell you: you can get aggregates out of details, but you can’t get details out of aggregates. So voila, details.
The THREAD_ID value in this row doesn’t (yet) correspond to a value from SHOW PROCESSLIST, although that’s also available (it just happens to beĀ after the first row in
EVENTS_WAITS_CURRENT). We look at all threads, showing background jobs or delayed statements, so expect more than one-to-one correspondence here.
The EVENT_ID value is guaranteed to be unique, at least when taken in combination with THREAD_ID. Hidden meanings can be gleaned, but the essential purpose is to identify the row in sequence.
The EVENT_NAME value tells us that the most recent event for thread 2 was a “wait” i.e. something that might cause the application to wait, for a “synchronization object”, in fact a “mutex”, which is related to MySQL’s “mysys” code, and to be exact is “THR_LOCK_malloc”. (This happens to be an event that only occurs in debug mode.) Different mutexes have different names. Names show a hierarchy, which makes it easy to search for the most specific or the least specific, using WHERE with LIKE. We’re instrumenting both Mutexes and File IO. Mutex acquisitions tend to be more frequent than File IO.
The SOURCE value is where the code for the instrument is in MySQL’s source code. If we looked at line 183 in mysys/safemalloc.c (which we could, because this source is freely available), we’d see:
”
MYSQL_MUTEX_LOCK(&THR_LOCK_malloc);
”
Yes, that’s all there is to see. The instrumentation is technically not part of the module’s algorithms, it has no effect on what the routine is doing, and in fact there’s no change to the code here in safemalloc.c. We’ve just replaced the original code “pthread_mutex_lock(…)” with “MYSQL_MUTEX_LOCK”, which is a macro,
and the macro definition in include/mysql/mysql_mutex.h says MYSQL_MUTEX_LOCK is pthread_mutex_lock if you choose to build without performance schema. In that case, we’ve merely renamed. But if you choose to build with performance schema, the macro code in MySQL ’s include file effectively becomes
[ macro which causes "timer start" ]
actual mutex-locking instrumented code
[ macro which causes "timer end" calculation" ]
Those lines are the “instrumentation”.
TIMER_START, TIMER_END, and TIMER_WAIT show when the timer started, when it ended, and the difference (end minus start) which is how long a time elapsed between the instrumentation start and end. The TIMER_WAIT column tells us that the event took 508312 “picoseconds”. I put “picoseconds” inside quote marks because I’m avoiding any pretense that measurement can be so precise, or even that there’s always a relation to seconds on a real-world clock. The timer’s virtues are (a) very low overhead (b) good timings most of the time. But if anyone wants to make a view that divides TIMER_WAIT by a trillion to get what could be approximately the time in seconds (0.00005), that’s well and good.
The OPERATION is “LOCK”. Not a database-row lock in this case, but a “LOCK” because we’re acquiring a synchronization object.
Most of the other columns will be NULL for mutexes most of the time.
This is one of millions of rows. In fact it could be one of billions — one of the bugs that Marc fixed recently, which was due to my error, was too-small counter which overflowed after a session was up for a few days. These rows have a short life, we have to archive and destroy to save space. So most events pop into and out of existence without anybody noticing or caring.
But what if …
? The DBMS is “stuck”? Well, in that case the last event that has a timer start but not a timer end, and an ever-growing timer_wait value, is what’s it’s stuck on.
? You want to see whenever some job causes a particular event? Well, in that case, you can filter so that only that event appears, and there’s an in-memory history.
? You want to see aggregates (sums or counts or minima or maxima) of, for example, the number of times that thread 1 has accessed file tablespace_1, or the number of “picoseconds” that have been spent on reads of tablespace_1 by all threads?
Done. And I don’t mean “it can be done” or “all you need to do is write something”. I mean the information is (to coin a phrase) at your fingertips, always there. More about that on my next workday.

February 6th, 2009 at 11:41 pm
Where is the code for this? This schema offers features for monitoring as much it does for performance.
February 9th, 2009 at 2:44 pm
Everything for the performance schema feature is inside a source-code Bazaar tree. We’ll merge and make public in stages, probably after going through code review and QA.
Yes, we could have given the schema a different name.