Tuesday, August 07, 2012

BigInt 64bit UID generation for MYSQL


Unique IDs and GUIDs can easily be represented in 128 bits. However, because there are no commonly available 128-bit processors- it is not practical to work with these values for large scale data processing.

We are left with 64-bit numbers. These are the largest values we can efficiently work with because that's the width on AMD and Intel registers. Also, the largest integer datatype in MySQL is the 64-bit BigInt. This is the most efficient size for a UID when 32-bit auto increment is not an option.


Goals:


  • The IDs need to remain unique (per table) for the expected life of the application.
  • Needs to be efficient for record storage and retrieval in MySQL.
  • Any server should be able to generate the UIDs in any common language.


MySQL Size Constraint:

The largest unsigned integer datatype in MySQL is the 64-bit BigInt which is a 20 digit number.

> 2^64
[1] 1.844674e+19

To reduce random IO for inserts we want our UIDs to be sequential and ordered by time. ID's will ideally be k-sorted to within 1 millisecond.

We can break the 20 digit number into a time part and random part to ensure no duplicate keys.

If we select the first 12 digits as the timestamp in milliseconds we can only store 5 years before key space runs out. However with 13 digits, we can keep the key going for 58 years (give or take a few leap days and leap seconds). Since this fits within our expected application lifetime, we choose 13 digits.

31536000000 milliseconds per year * 58 years

> 31536000000*58
[1] 1.829088e+12

_____________ ______ _
13 Digit Time     6 Rand S

Time: Take current time in milliseconds - Jan 1 2011 (or whenever your application started).
Rand: 6 digit random number
S: 1 digit Server ID

Example BigInt UIDs:

00000000000001234561
Generated exactly at epoch.
Random number=123456
Server ID=1

00315360000001234561
Generated about 1 year after epoch.
Random number=123456
Server ID=1

18290880000001234561

Generated about 58 years after epoch.
Random number=123456
Server ID=1


Example Java code to generate UID:



public static String getUID() {
long time = System.currentTimeMillis() - EPOCH;
StringBuilder sb = new StringBuilder();
sb.append(String.format("%d", time));
sb.append(String.format("%06d", generator.nextInt(1000000)));//0-999999
sb.append(hostIndex);//should be a 1 digit integer
return sb.toString();
}

  Reference: http://mindprod.com/jgloss/pseudorandom.html#LOWHIGH

Example SQL code to generate UID:

For batch inserting lots of records fast, we need to generate the key in SQL.
We use the uuid function in MySQL instead of the 3 digit millisecond value and 6 digit random number. From the first 8 characters of uuid, convert from hex to decimal and take the right 9 digits.

___________ _________
11 digit sec.    9 digit uuid

set @epoch=1293858000; #epoch in seconds (Jan 1, 2011)
set autocommit=0;
insert test (id, a, b)
values
(concat(unix_timestamp()-@epoch, right(conv(substring(uuid(),1,8),16,10)),9) , 'a1', 'b1'),
(concat(unix_timestamp()-@epoch, right(conv(substring(uuid(),1,8),16,10)),9) , 'a2', 'b2'),
(concat(unix_timestamp()-@epoch, right(conv(substring(uuid(),1,8),16,10)),9) , 'a3', 'b3')
;
set autocommit=1;


Of course, with timestamp being a major component of the UID, both server and database should be on UTC time and NTP. This will keep clocks synchronized across servers and prevent time repeating as happens during daylight savings.

For an alternate 64-bit UID see the Twitter Snowflake algorithm:
https://github.com/twitter/snowflake
It provides greater collision safety for a high volumen of inserts/second and a higher number of server ID values but the tradeoff is that it requires a ticket server.

No comments: