Thursday, August 09, 2012

Rotate log tables with a MySQL EVENT



Unix crontab can be used for database tasks that need to be run on a schedule, but since MySQL version 5.1.6 there is a better way to handle periodic database tasks - Events. 

Events in MySQL are like a stored procedure that can run automatically at a set time. The two main advantages over using a crontab for scheduled jobs are:
  1. No need to put the database password in the crontab file for the connection.
  2. When you move your database, the scheduled EVENTS go with it.

Rotate MySQL log tables


Be sure to add global event_scheduler=1 to your my.ini config file so events will be enabled at database startup.

set global event_scheduler=1;

DROP EVENT IF EXISTS log_clean_daily;

CREATE EVENT log_clean_daily
ON SCHEDULE EVERY 1 day
DO CALL rotate_logs;


The rotate_logs stored procedure:


The two versions I'll show here, are for a regular MySQL server and for an RDS MySQL server.


The regular MySQL procedure for rotate_logs


DROP PROCEDURE IF EXISTS rotate_logs;
DELIMITER |
CREATE PROCEDURE rotate_logs()
BEGIN
drop table if exists general_log_tmp;
drop table if exists general_log_old;
CREATE TABLE general_log_tmp LIKE general_log;
RENAME TABLE general_log TO general_log_old, general_log_tmp TO general_log;
drop table if exists slow_log_tmp;
drop table if exists slow_log_old;
CREATE TABLE slow_log_tmp LIKE slow_log;
RENAME TABLE slow_log TO slow_log_old, slow_log_tmp TO slow_log;
END
|
DELIMITER ;

The Amazon RDS MySQL procedure for rotate_logs 


DROP PROCEDURE IF EXISTS rotate_logs;
DELIMITER |
CREATE PROCEDURE rotate_logs()
BEGIN
  call mysql.rds_rotate_general_log;
  call mysql.rds_rotate_slow_log;
END
|
DELIMITER ;


Make sure to enable the event_scheduler for your parameter group since RDS does not allow setting the event_scheduler global variable directly.
rds-modify-db-parameter-group mygroup --parameters "name=event_scheduler, value=ON, method=immediate"

Tuesday, August 07, 2012

BigInt 64bit UID generation for MYSQL


Unique IDs and GUIDs can easily be represented in 128 bits. However, because there are no commonly available 128-bit processors- it is not practical to work with these values for large scale data processing.

We are left with 64-bit numbers. These are the largest values we can efficiently work with because that's the width on AMD and Intel registers. Also, the largest integer datatype in MySQL is the 64-bit BigInt. This is the most efficient size for a UID when 32-bit auto increment is not an option.


Goals:


  • The IDs need to remain unique (per table) for the expected life of the application.
  • Needs to be efficient for record storage and retrieval in MySQL.
  • Any server should be able to generate the UIDs in any common language.


MySQL Size Constraint:

The largest unsigned integer datatype in MySQL is the 64-bit BigInt which is a 20 digit number.

> 2^64
[1] 1.844674e+19

To reduce random IO for inserts we want our UIDs to be sequential and ordered by time. ID's will ideally be k-sorted to within 1 millisecond.

We can break the 20 digit number into a time part and random part to ensure no duplicate keys.

If we select the first 12 digits as the timestamp in milliseconds we can only store 5 years before key space runs out. However with 13 digits, we can keep the key going for 58 years (give or take a few leap days and leap seconds). Since this fits within our expected application lifetime, we choose 13 digits.

31536000000 milliseconds per year * 58 years

> 31536000000*58
[1] 1.829088e+12

_____________ ______ _
13 Digit Time     6 Rand S

Time: Take current time in milliseconds - Jan 1 2011 (or whenever your application started).
Rand: 6 digit random number
S: 1 digit Server ID

Example BigInt UIDs:

00000000000001234561
Generated exactly at epoch.
Random number=123456
Server ID=1

00315360000001234561
Generated about 1 year after epoch.
Random number=123456
Server ID=1

18290880000001234561

Generated about 58 years after epoch.
Random number=123456
Server ID=1


Example Java code to generate UID:



public static String getUID() {
long time = System.currentTimeMillis() - EPOCH;
StringBuilder sb = new StringBuilder();
sb.append(String.format("%d", time));
sb.append(String.format("%06d", generator.nextInt(1000000)));//0-999999
sb.append(hostIndex);//should be a 1 digit integer
return sb.toString();
}

  Reference: http://mindprod.com/jgloss/pseudorandom.html#LOWHIGH

Example SQL code to generate UID:

For batch inserting lots of records fast, we need to generate the key in SQL.
We use the uuid function in MySQL instead of the 3 digit millisecond value and 6 digit random number. From the first 8 characters of uuid, convert from hex to decimal and take the right 9 digits.

___________ _________
11 digit sec.    9 digit uuid

set @epoch=1293858000; #epoch in seconds (Jan 1, 2011)
set autocommit=0;
insert test (id, a, b)
values
(concat(unix_timestamp()-@epoch, right(conv(substring(uuid(),1,8),16,10)),9) , 'a1', 'b1'),
(concat(unix_timestamp()-@epoch, right(conv(substring(uuid(),1,8),16,10)),9) , 'a2', 'b2'),
(concat(unix_timestamp()-@epoch, right(conv(substring(uuid(),1,8),16,10)),9) , 'a3', 'b3')
;
set autocommit=1;


Of course, with timestamp being a major component of the UID, both server and database should be on UTC time and NTP. This will keep clocks synchronized across servers and prevent time repeating as happens during daylight savings.

For an alternate 64-bit UID see the Twitter Snowflake algorithm:
https://github.com/twitter/snowflake
It provides greater collision safety for a high volumen of inserts/second and a higher number of server ID values but the tradeoff is that it requires a ticket server.