UUID v7 in Swift
22 Aug 2024
Universally Unique Identifiers (UUIDs) have been around since the 1980s[1] and are baked into almost every programming language[2] and database management system[3]. They are a safe and effective mechanism for assigning identity to objects across large systems. Stated simply in the proposed standard RFC 9562:
A UUID… is intended to guarantee uniqueness across space and time
This is incredibly useful as a programmer! Imagine a social media mobile app generating millions of posts every day. Each post can receive a UUID without needing to coordinate with a central server. Developers can allow users to create posts offline and be reasonably assured that no other post will have the same ID[4].
Most developers I talk with understand the concept of a UUID and use them frequently, but few understand exactly what a UUID is or how it’s created (including me a few weeks ago).
Anatomy of a UUID
A UUID is a 128-bit data object composed primarily of randomly generated numbers with a little bit of identifying information sprinkled in. Below is the same UUID represented in three variations: binary, number, and hexadecimal.
Binary
01010101000011101000010000000000111000101001101101000001110101001010011100010110010001000110011001010101010001000000000000000000
Number
113059749145936325402354257176981405696[5]
Hexadecimal
550e8400-e29b-41d4-a716-446655440000
The hex representation above is what we most commonly encounter in programming. Whilst I’m guilty of mentally cataloging a UUID as a String type (which is fine in most contexts), understanding the components of a UUID requires breaking down each bit.
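To make the three representations concrete, here is a minimal sketch that reads the raw bytes of the example UUID and renders them as binary and hex. The `digits(of:radix:width:)` helper is a hypothetical name introduced for this illustration, not part of Foundation.

```swift
import Foundation

// The example UUID shown above.
let uuid = UUID(uuidString: "550e8400-e29b-41d4-a716-446655440000")!

// `uuid.uuid` is a tuple of 16 UInt8 values; copy it into an array so we can iterate.
let bytes = withUnsafeBytes(of: uuid.uuid) { Array($0) }

/// Hypothetical helper: renders one byte in the given radix, left-padded with zeros.
func digits(of byte: UInt8, radix: Int, width: Int) -> String {
    let raw = String(byte, radix: radix)
    return String(repeating: "0", count: width - raw.count) + raw
}

// Eight binary digits per byte, two hex characters per byte.
let binary = bytes.map { digits(of: $0, radix: 2, width: 8) }.joined()
let hex = bytes.map { digits(of: $0, radix: 16, width: 2) }.joined()

print(bytes.count)   // 16 bytes
print(binary.count)  // 128 bits
print(hex)           // 550e8400e29b41d4a716446655440000
```

The string form is only a rendering; the value itself is those 16 bytes.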
Every UUID in hex is made of 32 characters (ignoring dashes), each representing 4 bits[6]. Some bits are generated from random values but others have important roles to play. There are 8 different versions of UUID, but for now we’ll just focus on the original Version 1, the most common Version 4, and the new Version 7.
Version 1
Version 1 is the most complex of the three. Here is the makeup:
Bits | Values |
---|---|
0 - 47 | Low 48 bits of the 60-bit timestamp |
48 - 51 | 4-bit version number |
52 - 63 | High 12 bits of the timestamp |
64 - 65 | 2-bit variant number |
66 - 79 | Clock sequence |
80 - 127 | Node (MAC Address) |
Imperfectly[7] illustrated below:
The inclusion of a computer’s identifying information at the end of the UUID is useful for ensuring uniqueness but problematic for privacy. To address this issue, RFC 4122 proposed a new standard…
Version 4
Version 4 is almost completely random and is the standard adopted by most software systems at the time of writing[2][3].
Bits | Values |
---|---|
0 - 47 | 48 bits of random data |
48 - 51 | 4-bit version number |
52 - 63 | 12 bits of random data |
64 - 65 | 2-bit variant number |
66 - 127 | 62 bits of random data |
Version 4 solves the privacy hole of v1, but lacks an important feature that makes it less than ideal as a primary key in a database: sortability.
Version 7
Version 7 was proposed by RFC 9562 in May 2024 to better meet the needs of modern day distributed systems.
Bits | Values |
---|---|
0 - 47 | 48-bit Unix timestamp in milliseconds |
48 - 51 | 4-bit version number |
52 - 63 | 12 bits of random data |
64 - 65 | 2-bit variant number |
66 - 127 | 62 bits of random data |
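The layout in the table can be read back out of an existing value. The sketch below decodes the example version 7 UUID from RFC 9562's test vectors; `UUIDv7Fields` is a hypothetical type introduced here for illustration, and it assumes the input really is a version 7 UUID.

```swift
import Foundation

/// Hypothetical helper type: reads the fields of a UUID laid out as in the table above.
struct UUIDv7Fields {
    let timestampMilliseconds: UInt64
    let version: UInt8
    let variant: UInt8

    init(_ uuid: UUID) {
        let b = withUnsafeBytes(of: uuid.uuid) { Array($0) }
        // Bits 0-47: big-endian milliseconds since the Unix epoch.
        timestampMilliseconds = b[0..<6].reduce(UInt64(0)) { ($0 << 8) | UInt64($1) }
        // Bits 48-51: the high nibble of byte 6.
        version = b[6] >> 4
        // Bits 64-65: the top two bits of byte 8.
        variant = b[8] >> 6
    }
}

// Example version 7 value from the RFC 9562 test vectors.
let sample = UUID(uuidString: "017F22E2-79B0-7CC3-98C4-DC0C0C07398F")!
let fields = UUIDv7Fields(sample)

print(fields.timestampMilliseconds) // 1645557742000
print(fields.version)               // 7
print(fields.variant)               // 2 (binary 10)
```

Because the timestamp occupies the most significant bytes, comparing two such values byte by byte compares their creation times first.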
The re-inclusion of the timestamp introduces sortability to version 7 while maintaining the anonymity of version 4. I recommend reading the proposal’s motivation for creating a new version. Here’s an excerpt:
One area in which UUIDs have gained popularity is database keys. This stems from the increasingly distributed nature of modern applications. In such cases, “auto-increment” schemes that are often used by databases do not work well: the effort required to coordinate sequential numeric identifiers across a network can easily become a burden. The fact that UUIDs can be used to create unique, reasonably short values in distributed systems without requiring coordination makes them a good alternative
UUID v7 in Swift
Even though Swift has a first-party UUID data type, its initializers only produce a version 4 UUID.
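One quick way to see this for yourself is to inspect the version and variant bits of a freshly generated value. This is a small sketch using only Foundation's `UUID()` initializer:

```swift
import Foundation

let id = UUID()
let idBytes = withUnsafeBytes(of: id.uuid) { Array($0) }

// Bits 48-51 are the high nibble of byte 6; bits 64-65 are the top two bits of byte 8.
let version = idBytes[6] >> 4
let variant = idBytes[8] >> 6

print(id.uuidString)
print(version) // 4
print(variant) // 2 (binary 10)
```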
If you’re hoping to use version 7 in your project, you’re in luck! Anton Zhiyanov, an absolutely cracked engineer, has already written version 7 in Swift and many other languages with the help of the open source community. Let’s take a look at the implementation[8]:
The implementation steps are:
- Create a 16-member tuple of eight-bit integers in which the first six members are empty and the next ten have a random value.
- Get the time elapsed since the Unix epoch in milliseconds.
- Using bitwise shift operators, encode the first six bytes with the timestamp.
- Encode the version and variant.
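As a rough illustration, the four steps above can be sketched as follows. This is a reconstruction from the step list, not a verbatim copy of Anton’s code, and it assumes a 64-bit `Int`.

```swift
import Foundation

extension UUID {
    /// A sketch of a version 7 generator that follows the four steps listed above.
    static func v7() -> UUID {
        // 1. Sixteen bytes: six zeroed for the timestamp, ten filled with random values.
        var value = (
            UInt8(0), UInt8(0), UInt8(0), UInt8(0), UInt8(0), UInt8(0),
            UInt8.random(in: 0...255), UInt8.random(in: 0...255),
            UInt8.random(in: 0...255), UInt8.random(in: 0...255),
            UInt8.random(in: 0...255), UInt8.random(in: 0...255),
            UInt8.random(in: 0...255), UInt8.random(in: 0...255),
            UInt8.random(in: 0...255), UInt8.random(in: 0...255)
        )

        // 2. Milliseconds elapsed since the Unix epoch.
        let timestamp: Int = .init(Date().timeIntervalSince1970 * 1000)

        // 3. Write the low 48 bits of the timestamp, most significant byte first.
        value.0 = UInt8((timestamp >> 40) & 0xFF)
        value.1 = UInt8((timestamp >> 32) & 0xFF)
        value.2 = UInt8((timestamp >> 24) & 0xFF)
        value.3 = UInt8((timestamp >> 16) & 0xFF)
        value.4 = UInt8((timestamp >> 8) & 0xFF)
        value.5 = UInt8(timestamp & 0xFF)

        // 4. Version 7 goes in the high nibble of byte 6; variant 10 in the top bits of byte 8.
        value.6 = (value.6 & 0x0F) | 0x70
        value.8 = (value.8 & 0x3F) | 0x80

        return UUID(uuid: value)
    }
}

let example = UUID.v7()
print(example.uuidString)
```

The thirteenth hex character of the printed string should always be `7`.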
Reworking the solution
On Anton’s website, he is very clear about one thing:
These implementations may not be the fastest or most idiomatic, but they are concise and easy to understand.
As an iOS developer, I would argue that writing code in “idiomatic” Swift is both concise and easy to understand. For the purpose of practice let’s make a few modifications to simplify this function:
First we’ll swap out the shorthand initializer call with a standard Int initializer and use Date.now, which is a bit more expressive[9].
I’m all for using inferred types when the type is clear from the initializer. In the case of a 16-member tuple, I would rather see an explicit type declaration. Thankfully iOS 17 has a type alias for it: uuid_t. I will also rename value to uuidBytes to remind me that each element is a byte.
We can avoid unnecessary mutation of the tuple’s members by initializing the uuid_t with the data directly.
All together now:
Not only have we reduced the line count[10] of the v7() function from 31 to 23 lines, but we have also made the structure of the UUID more obvious at first glance: we can see all 16 bytes and their respective values without jumping around the code. We could spend more time optimizing the readability of this function, but I think this is a reasonable compromise with the existing solution.
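For reference, here is one way the three modifications could come together: a standard Int initializer with Date.now, an explicit uuid_t named uuidBytes, and no mutation after initialization. Treat it as a sketch rather than the exact function counted above. The nested `timestampByte` and `randomByte` helpers are my own hypothetical additions to keep the tuple short, and the code assumes a 64-bit `Int` and a platform where `Date.now` is available.

```swift
import Foundation

extension UUID {
    /// A sketch of the reworked generator: every byte is set once, in order.
    static func v7() -> UUID {
        let timestamp = Int(Date.now.timeIntervalSince1970 * 1000)

        // Hypothetical helpers, introduced only to keep the tuple below readable.
        func timestampByte(_ shift: Int) -> UInt8 {
            UInt8(truncatingIfNeeded: timestamp >> shift)
        }
        func randomByte() -> UInt8 {
            UInt8.random(in: 0...255)
        }

        let uuidBytes: uuid_t = (
            // Bytes 0-5: 48-bit timestamp, most significant byte first.
            timestampByte(40), timestampByte(32), timestampByte(24),
            timestampByte(16), timestampByte(8), timestampByte(0),
            // Byte 6: version 7 in the high nibble, random low nibble.
            (randomByte() & 0x0F) | 0x70,
            randomByte(),
            // Byte 8: variant 10 in the top two bits, random remainder.
            (randomByte() & 0x3F) | 0x80,
            // Bytes 9-15: random data.
            randomByte(), randomByte(), randomByte(), randomByte(),
            randomByte(), randomByte(), randomByte()
        )

        return UUID(uuid: uuidBytes)
    }
}

let reworked = UUID.v7()
print(reworked.uuidString)
```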
Should I implement it?
If you’re designing database schemas: YES - at least for new tables. The impact that time-sorted primary keys have on database-index locality (and thereby query performance) is too tempting to be ignored[11].
In all other cases, I would say probably not. While it’s fun to have the latest standard, I recommend waiting until your favorite language releases a first-party implementation. The benefits don’t outweigh the risk of rolling your own solution - you can be assured that Apple/Google/Oracle’s implementation will be secure, performant, and sufficiently random. As of now I’m happy to use the built-in UUID and sort my objects with an old-fashioned Date property.
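For completeness, that old-fashioned approach looks something like the sketch below. The `Post` type and its fields are hypothetical, echoing the social media example from the introduction.

```swift
import Foundation

// Hypothetical model: identity comes from a version 4 UUID, ordering from a Date.
struct Post {
    let id = UUID()
    let createdAt: Date
    let text: String
}

let posts = [
    Post(createdAt: Date(timeIntervalSince1970: 300), text: "third"),
    Post(createdAt: Date(timeIntervalSince1970: 100), text: "first"),
    Post(createdAt: Date(timeIntervalSince1970: 200), text: "second"),
]

// The random IDs carry no ordering, so sort on the timestamp property instead.
let chronological = posts.sorted { $0.createdAt < $1.createdAt }
print(chronological.map(\.text)) // ["first", "second", "third"]
```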
What’s next?
Just because I don’t plan to use Version 7 in my personal projects doesn’t mean I’m done experimenting with it.
The improved readability is great, sure, but is it fast? Next week let’s run our v7() through some speed tests and see how it performs compared to the v4 implementation in iOS.
Sources and Notes
- [3] See docs for Oracle, MySQL, SQL Server, and PostgreSQL.
- [4] While the probability of a collision is negligible, it is not zero. See this explanation on Wikipedia.
- [5] One hundred thirteen undecillion fifty-nine decillion seven hundred forty-nine nonillion one hundred forty-five octillion nine hundred thirty-six septillion three hundred twenty-five sextillion four hundred two quintillion three hundred fifty-four quadrillion two hundred fifty-seven trillion one hundred seventy-six billion nine hundred eighty-one million four hundred five thousand six hundred ninety-six per Edward Furey’s “Numbers to Words Converter” at Calculator Soup.
- [6] Hexadecimal values can represent any number between 0-15 with a single character (0-9, A-F). Learn more with this digital guide by Ionos.
- [7] The variant is only two bits, but for simplicity is expressed as four bits when highlighting its place in the hexadecimal UUID.
- [8] The function has been modified to reduce whitespace and add explanatory comments. The relevant code is the same.
- [9] The .now static variable was introduced in iOS 15. If you need to support an older version, use Date().
- [10] Excluding comments and whitespace.
- [11] See the first bullet point in the Motivation section of RFC 9562.