New Features In MySQL 6.x

MySQL Performance Schema (6)

This is #6 in a series of blog postings about MySQL Performance Schema.

A timer is a mechanism that gives some idea of how long an event takes. A platform is a CPU chip (or multiple chips) and an operating system. Different platforms have different timers available. MySQL searches for the best timers, picks one as the default, and allows users to change the default.

The platform I’m using while writing this is a single-CPU x86 1.6GHz laptop with Linux 2.6.18. I can find out what timers there are by looking at PERFORMANCE_TIMERS.

mysql> SELECT * FROM performance_schema.performance_timers;
+-------------+-----------------+------------------+----------------+
| TIMER_NAME  | TIMER_FREQUENCY | TIMER_RESOLUTION | TIMER_OVERHEAD |
+-------------+-----------------+------------------+----------------+
| CYCLE       |      1596965174 |                1 |              8 |
| NANOSECOND  |      1000000000 |                1 |           1345 |
| MICROSECOND |         1000000 |                1 |           1201 |
| MILLISECOND |             991 |                1 |           1284 |
| TICK        |             108 |                1 |           1091 |
+-------------+-----------------+------------------+----------------+
5 rows in set (0.12 sec)

PERFORMANCE_TIMERS is telling me that a timer named CYCLE has a frequency of 1,596,965,174, in other words 1.6GHz. The calculation will come up with slightly different numbers each time I do a SELECT, and wildly different numbers if I step down the speed setting to save laptop power. The “resolution” is 1, which means essentially that the frequency is real (that’s not always the case; for example, it’s common to see a nanosecond timer that in fact goes up by 1000 nanoseconds at a time). The overhead is 8 cycles, in other words (since there are 1.6 billion cycles per second) not much. Meanwhile the NANOSECOND timer has a frequency of 1 billion, a resolution of 1, and an overhead that’s (1345/8) 168 times greater than the CYCLE counter’s.
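
The arithmetic in that paragraph can be checked with a short Python sketch. It hard-codes the rows of the PERFORMANCE_TIMERS output shown above and assumes, as the 1345/8 comparison implies, that TIMER_OVERHEAD is counted in cycles for every timer. The helper names are made up for illustration.

```python
# Rows copied from the PERFORMANCE_TIMERS output above:
# name -> (frequency, resolution, overhead)
TIMERS = {
    "CYCLE": (1596965174, 1, 8),
    "NANOSECOND": (1000000000, 1, 1345),
    "MICROSECOND": (1000000, 1, 1201),
    "MILLISECOND": (991, 1, 1284),
    "TICK": (108, 1, 1091),
}


def overhead_ratio(name, baseline="CYCLE"):
    """How many times costlier one reading of `name` is than one of `baseline`."""
    return TIMERS[name][2] / TIMERS[baseline][2]


def overhead_nanoseconds(name):
    """Approximate cost of one reading, ASSUMING overhead is in CPU cycles."""
    cycle_hz = TIMERS["CYCLE"][0]
    return TIMERS[name][2] / cycle_hz * 1e9


if __name__ == "__main__":
    for name in TIMERS:
        print(f"{name:12s} {overhead_ratio(name):7.1f}x CYCLE, "
              f"about {overhead_nanoseconds(name):6.1f} ns per reading")
```

Under that assumption, one CYCLE reading on this laptop costs about 5 nanoseconds and one NANOSECOND reading costs roughly 840.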

I can find out what timer I’m using by looking at SETUP_TIMERS.

mysql> SELECT * FROM performance_schema.setup_timers;
+------+------------+
| NAME | TIMER_NAME |
+------+------------+
| Wait | CYCLE      |
+------+------------+
1 row in set (0.00 sec)

What this is telling me is that MySQL chose a CYCLE timer. Given the precision and the overhead that’s obvious, eh? Well, I think so for this case, and I spent many late hours writing the tiny assembler snippets that have to work on all the major platforms that MySQL supports. (That was my modest contribution to the code and I depended a lot on examples in vendor manuals; everything else in Performance Schema is the work of Marc Alff with advice from many others.) But every user of Performance Schema must be aware of CYCLE’s quirks. In the next section I’m copying from another worklog task that I wrote some time ago, WL#2373 “Use cycle counter for timing”. The section specifically mentions the assembler instruction RDTSC, which is specific to x86, but one encounters similar considerations on other platforms too.

Bad things about RDTSC

- RDTSC doesn’t “serialize”. That is, if there is out-of-order execution, RDTSC might be processed after an instruction that logically follows it. (We can force serialization, but we won’t bother.) See:
“Q&A: RDTSC to measure performance of small # of FP calculations”
http://softwarecommunity.intel.com/isn/Community/en-US/forums/thread/30226599.aspx
This flaw is unimportant since we are trying to measure events that take much longer times.

- It is possible to set a flag which renders RDTSC inoperative. Somebody responsible for the kernel of the operating system would have to make this decision. For Windows and Linux, there’s no such problem (although CONFIG_X86_TSC_DISABLE exists).

- With a multi-processor arrangement, it’s possible to get the cycle count from one processor in thread X, and the cycle count from another processor in thread Y. They may not always be in synch. Each processor might have a different TSC value. This is especially noted for AMD multi-socket (as opposed to multi-core) systems. See:
“AMD TSC Drift Solutions in Red Hat Enterprise Linux”
http://developer.amd.com/article_print.jsp?id=92
“Future TSC Directions and Solutions”
http://ltt.polymtl.ca/svn/ltt/branches/poly/doc/developer/tsc.txt
“RDTSCP”
http://developer.amd.com/articles.jsp?id=92&num=5
“tsc timer related problems/questions”
http://kerneltrap.org/mailarchive/linux-kernel/2007/9/9/191506
But “Intel systems are normally all synchronized”. See:
“linux/arch/i386/kernel/tsc.c”
http://lxr.linux.no/linux/arch/i386/kernel/tsc.c#L329
Or the operating system may synchronize TSCs. For example “normally, Windows synchronizes the time stamp counters on all processors” (in special circumstances, and not on Windows Server). On Linux, though, the apparent tendency is to check for synchronization but not force it. See:
“Measure Code Sections Using The Enhanced Timer”
http://softwarecommunity.intel.com/articles/eng/2589.htm
“x86: unify/rewrite SMP TSC sync code”
http://lwn.net/Articles/211051/
“Hardware Support and Directions for Windows Server”
http://download.microsoft.com/download/0/0/b/00bba048-35e6-4e5b-a3dc-36da83cbb0d1/ServerDirections.docx

Synchronizing may cause a counter to go backwards.

- Converting cycles to elapsed time is only reliable if a CPU always has the same cycle rate. That’s not true on a laptop, which might change speed to save power. And it’s not true with high-performance chips which might gear down if heating becomes dangerous. (Notice that elsewhere I count this as an argument in favour of RDTSC because such computers generally have slow gettimeofday().)
Variability does not exist on some recent processors. See:
“Intel secretly changes the rules”
http://www.x86-secret.com/?option=newsd&nid=846
“TSC and Power Management Events on AMD Processors”
http://lkml.org/lkml/2005/11/4/173
Microsoft describes the flaw as “not common”. See:
“SQL Server timing values may be incorrect when you use utilities or technologies that change CPU frequencies”
http://support.microsoft.com/kb/931279

- The implementor will have to write code for all the processors that MySQL fully supports. I have already done this; read the comments in the attached file rdtsc3.c. But in my attempt to be cautious I left a few gaps in the coverage.

So a cycle counter won’t be a wonderful solution for all timing situations. However, the defects are acceptable for WL#2360.
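
Two of the quirks listed above, a counter that appears to go backwards and a cycle rate that changes, can be illustrated with a small Python sketch. The numbers and function names are hypothetical, and whether a given CPU’s counter really slows down with the clock depends on the processor, as the links above discuss.

```python
NOMINAL_HZ = 1_600_000_000      # rate assumed when converting cycles to time
STEPPED_DOWN_HZ = 800_000_000   # hypothetical power-saving rate


def elapsed_cycles(start, end):
    """Cycle delta, clamped to 0 if the counter appears to have gone backwards
    (for example because the two readings came from unsynchronized processors)."""
    return max(0, end - start)


def cycles_to_seconds(cycles, assumed_hz):
    """Convert cycles to seconds using one fixed, assumed cycle rate."""
    return cycles / assumed_hz


if __name__ == "__main__":
    # An event that really lasts 1 millisecond while the CPU is stepped down.
    counted = STEPPED_DOWN_HZ // 1000
    reported = cycles_to_seconds(counted, NOMINAL_HZ)
    print(f"true 0.001 s, reported {reported} s")

    # Start read on one processor, end read on another whose counter lags.
    print(elapsed_cycles(5_000_000, 4_999_000))
```

In this made-up case the event is reported as taking half as long as it really did, which is the sort of “way off sometimes” result a monitor has to tolerate.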

Change the timer

As I said in an earlier blog posting, the defects of CYCLE are just a cost of doing business for monitoring. Monitors might be on all the time. You just want them to be unobtrusive and in the background. In return, you accept that results can be a bit off always, and way off sometimes. Or, if you don’t accept it, say

UPDATE performance_schema.setup_timers SET timer_name = 'MICROSECOND'; /* or NANOSECOND or MILLISECOND or TICK */

As long as you have the privileges, and as long as the timer you choose is really operative on the platform you’re on, and as long as you realize that PERFORMANCE_TIMERS tells you some timers are high-overhead timers, you can switch to using a timer that really is based on wall-clock time. Some people might like to do this when they switch from “monitoring” to “diagnosis”.

“Picoseconds”

We wanted to show results in the same time unit, regardless of the timer. In an ideal world this time unit would look like a wall-clock unit and be reasonably precise, in other words microseconds. But to convert cycles or nanoseconds to microseconds we would have to do a DIVIDE twice for every instrumentation. DIVIDE is expensive on many platforms. MULTIPLY is not expensive. Therefore we use MULTIPLY. Therefore the time unit is an integer multiple of the highest possible timer_frequency, using a multiplier that’s big enough to ensure there’s no major precision loss. Therefore the time unit is “picoseconds”. Trillionths of a second. I expect we’ll have a hard time explaining over and over that we’re aware the precision is spurious but, once again, we did it due to overhead. If we find that our decision was impractical in some way, we’ll change it.
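
A minimal Python sketch of the idea: do the one DIVIDE up front, when the timer is chosen, and use only a MULTIPLY for each reading. This illustrates the reasoning above and is not the server’s actual code. The function names are invented, and the frequencies are the ones from the PERFORMANCE_TIMERS output earlier in this posting.

```python
PICOSECONDS_PER_SECOND = 10**12


def picosecond_factor(timer_frequency):
    """Integer number of picoseconds per timer unit, rounded to nearest.

    This is the only division, done once per timer, not once per event.
    """
    return (PICOSECONDS_PER_SECOND + timer_frequency // 2) // timer_frequency


def to_picoseconds(timer_units, factor):
    """Per-event conversion: a single integer multiply."""
    return timer_units * factor


if __name__ == "__main__":
    cycle_factor = picosecond_factor(1596965174)   # CYCLE row
    nano_factor = picosecond_factor(1000000000)    # NANOSECOND row
    print(cycle_factor, nano_factor)
    # 8 cycles (the CYCLE overhead) expressed in picoseconds
    print(to_picoseconds(8, cycle_factor))
```

With the CYCLE frequency above, the exact ratio is about 626.19 picoseconds per cycle, so rounding the factor to the integer 626 loses roughly 0.03% in accuracy.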

This entry was posted on Thursday, February 12th, 2009 at 8:13 pm and is filed under Uncategorized.

One Response to “MySQL Performance Schema (6)”

  1. Antony Curtis Says:
    February 12th, 2009 at 9:09 pm

    Hi Peter,

    Yes, divide operations are expensive and multiply is much less so.

    However, since the various rates are pretty constant, you can do rate conversions (when polled regularly enough) by using Bresenham’s line-drawing algorithm, which only uses addition and subtraction.

    I remember using such techniques in the old 386/486 era to mix and resample many analog data channels at varying bit rates in real time without having to resort to using expensive DSPs. I even used it to write a fast 64 channel synthesizer which ran happily on a 486 while working on CD quality 44kHz audio. Response time from key depress to audio output was as short as 25ms.

    Regards,
    Antony.
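
A minimal Python sketch of the kind of add-and-subtract rate conversion this comment describes, in the style of Bresenham’s line algorithm. The class name and the rates are hypothetical, and it steps one source tick at a time, which is only illustrative; a real monitor would have to apply the idea per poll rather than per tick.

```python
class RateConverter:
    """Convert a count of source ticks to destination ticks incrementally,
    using only addition, subtraction and comparison (no multiply or divide)."""

    def __init__(self, src_hz, dst_hz):
        self.src_hz = src_hz
        self.dst_hz = dst_hz
        self.error = 0   # accumulated remainder, always kept below src_hz
        self.out = 0     # destination ticks emitted so far

    def tick(self):
        """Advance by one source tick; return destination ticks so far."""
        self.error += self.dst_hz
        while self.error >= self.src_hz:
            self.error -= self.src_hz
            self.out += 1
        return self.out


if __name__ == "__main__":
    conv = RateConverter(src_hz=3, dst_hz=2)
    print([conv.tick() for _ in range(6)])
```

After n source ticks the running total equals floor(n * dst_hz / src_hz), the same answer a multiply and divide would give.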
