"JUG" - Java Uuid Generator FAQ

1. Why JUG?

1.1. Don't we already have "uuidgen"?

Some do, some don't. Most platforms have the uuidgen command-line tool (or something similar), but not all do. Additionally, accessing uuidgen from Java may be tricky, since its location in the native filesystem depends on the OS and possibly other factors. So portability is one benefit: Jug works anywhere Java 1.2 is available.

Performance may be another benefit when using Jug from Java. Interfacing to native functionality (either by running uuidgen or by calling libuuid directly) is likely to be slower than calling Jug methods, even if the generation itself is faster.

1.2. But how about my (favourite project's) own Unique Identifier generator?

Many projects (and even individual developers...) build their own home-grown unique identifier generation schemes, usually based on Java's system timer and the IP address, and sometimes adding a memory location (via the identity hash of a system class or of the generator singleton class).

2. Why NOT use JUG?

If you are paranoid about duplicate UUIDs (especially with the time-based algorithm), note that there is no way to guarantee that multiple UUID generators don't produce the same UUID. It is still unlikely to happen (thanks to the clock sequence field etc.), but it is potentially a problem. Note, though, that with the random- and name-based methods multiple instances of Jug are not a problem: name-based methods base uniqueness on the name, not on timing, and the random-based method relies on the quality of the random number generator. In the latter case it all depends on how random one considers SecureRandom to be.

3. What is the fastest method to use for generating UUIDs?

It depends on your system, the random number generators used, and so on, but here are some quick test results from my workstation (Ultra-60, dual 450 MHz SparcII; JDK 1.3.1, default JIT == client). Measurements were done using the Jug command-line tool, generating 1000 UUIDs for each type:
Creating datestamps for tag URIs (a new Calendar instance for each URI) seems to slow the last entry down significantly. Note also that the names and namespaces for the last three methods were relatively short, so the 'real' numbers might be a bit worse for them too (especially since generating separate names adds cost; for this test, methods 3 and 4 used the same namespace and name for each UUID, which is not too realistic).

So it seems that with default settings the time-based algorithm is the fastest, followed by the random-number based one. Name-based algorithms are slow, probably because of the cost of MD5 hashing. (As a side note, on my 800 MHz AMD system at home, times were about half of those presented above.)

Finally, if performance really is very important to you, there is a further complication with the time-based algorithm: Java's system clock has a maximum resolution of 1 millisecond (that is, prior to Java 1.5, which also offers a higher-resolution timer on some platforms), instead of the 100 ns required by the UUID specification. Jug solves this with an additional counter, but the downside is that each separate Java 'time slice' (the period during which the system clock returns the same timestamp) can produce at most 10000 UUIDs, which is the number of 100 ns steps in one millisecond. If the JDK on the platform does advance in 1 msec ticks, this is good enough for up to 10 million UUIDs per second (10000 per tick, 1000 ticks per second), but on many platforms the resolution is coarser (on Windows it used to be 55 msec, meaning a maximum rate of about 180 kUUIDs per second).

All of which means that for generating more than, say, ten thousand UUIDs per second, you may need to look at native implementations.

4. Which one should I use, assuming performance is not important?

If you can access the Ethernet card address, it might be a good idea to use the time-based algorithm, provided you will only be generating UUIDs from a single JVM (and won't be using other UUID tools at the same time). If so, uniqueness is pretty much guaranteed and the algorithm is fast as well.
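To make it more concrete why a time-based UUID depends on both the clock and the Ethernet address, here is a small sketch that decodes the fields of one. It uses the JDK's own java.util.UUID class (Java 5 and later), not Jug's API, and the sample value is the DNS namespace UUID from RFC 4122, which happens to be a time-based UUID. The class and method names are made up for illustration.

```java
import java.util.UUID;

class TimeBasedFields {
    // Formats the 48-bit node field as a colon-separated, MAC-style string.
    static String nodeAsMac(UUID uuid) {
        long node = uuid.node(); // only defined for version 1 (time-based) UUIDs
        StringBuilder sb = new StringBuilder();
        for (int shift = 40; shift >= 0; shift -= 8) {
            if (sb.length() > 0) {
                sb.append(':');
            }
            sb.append(String.format("%02x", (node >> shift) & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample value only: the DNS namespace UUID from RFC 4122.
        UUID sample = UUID.fromString("6ba7b810-9dad-11d1-80b4-00c04fd430c8");
        System.out.println("version        = " + sample.version());
        // Timestamp counts 100 ns units since 1582-10-15.
        System.out.println("timestamp      = " + sample.timestamp());
        System.out.println("clock sequence = " + sample.clockSequence());
        System.out.println("node           = " + nodeAsMac(sample));
    }
}
```

Two generators only collide if timestamp, clock sequence and node all match, which is why a single generator per machine with a real Ethernet address is the comfortable case.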
One potential drawback: if you consider giving out the Ethernet address a security problem (which in theory it could be, although there probably aren't any major immediate problems), this method is not for you, since the Ethernet address is stored as-is in the last 6 bytes of the UUID. (This could be partially solved by hashing the Ethernet address, but the standard doesn't mention this solution, so it's not implemented yet.)

If there will be multiple UUID generators (different JVMs, or the native uuidgen), the random-based method may be the best option.

Finally, if it's easy to generate unique names from the system (say, a URL combined with a sequence number guaranteed to be unique), and especially if such 'human readable' identifiers (such as tag URIs) are used anyway, it may be a good idea to use one of the name-based algorithms.

5. How can I obtain the Ethernet MAC address of the machine JUG runs on?

Before version 1.0, your options were limited to using native tools and passing the address to JUG, or using dummy, randomly generated broadcast addresses. Beginning with version 1.0, however, there is limited support for C/JNI-based native access for obtaining interface addresses. To obtain the MAC address of the primary interface, just call:

    EthernetAddress primary = NativeInterfaces.getPrimaryInterface();

(Note that if there's a problem loading the JNI library, an Error is thrown.) To test that you can use the JNI code, you can also directly invoke the class org.safehaus.uuid.NativeInterfaces: its main() method will try to access the Ethernet address of the primary interface.

Binary library files currently exist for Linux/x86.

As of 1.0.2, it is possible to load native code both by using the 'standard' library loading method (which relies on the Java system property 'java.library.path' for locating libraries) and by application-specific loading from any given directory (the default being 'jug-native' in the current directory).
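For readers unfamiliar with the two loading styles just mentioned, here is a rough sketch of how they look at the JVM level: System.loadLibrary() searches 'java.library.path', while System.load() takes an absolute file name, which is what loading from an application-chosen directory needs. This is not Jug's actual loading code; the class name, the useStdLibDir flag and the library name "example_native" are hypothetical.

```java
import java.io.File;

class NativeLoadingSketch {
    // Hypothetical library base name, for illustration only.
    static final String LIB_NAME = "example_native";

    // Builds the platform-specific file name inside an application-chosen directory.
    static File resolveInDir(File dir, String libName) {
        // mapLibraryName turns "foo" into e.g. "libfoo.so" or "foo.dll".
        return new File(dir, System.mapLibraryName(libName)).getAbsoluteFile();
    }

    // Returns true if the library was loaded, false if it could not be found or linked.
    static boolean load(boolean useStdLibDir, File dir, String libName) {
        try {
            if (useStdLibDir) {
                System.loadLibrary(libName); // searches java.library.path
            } else {
                System.load(resolveInDir(dir, libName).getPath()); // absolute path required
            }
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        File dir = new File("jug-native");
        System.out.println("app-specific lookup: " + resolveInDir(dir, LIB_NAME));
        System.out.println("loaded from directory: " + load(false, dir, LIB_NAME));
        System.out.println("loaded from java.library.path: " + load(true, dir, LIB_NAME));
    }
}
```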
The application-specific method is still the default; to enable standard loading, call NativeInterfaces.setUseStdLibDir().

6. Is there a way to synchronize UUIDs produced by JUG instances running on separate JVMs?

By default (and always with pre-2.0 Jug), Jug does not try to prevent multiple instances from running in separate JVMs. The reason is that JVMs do not offer a generic mechanism that lets instances running in separate JVMs (or even under multiple class loaders!) communicate easily.

Starting with 2.0, there is a file-locking based synchronization mechanism that can be used to synchronize access, so that basically only one instance can ever run.

7. What about cases where the system reboots, and system time is set to an earlier timestamp?

By default (and always with pre-2.0 Jug), Jug has no way of knowing that system time has gone backwards between the last run and a new startup. Although Jug does keep track of used timestamps while it is running (to prevent problems when the system time is moved backwards by a system administrator), there was no mechanism to prevent problems during the time Jug was not running.

Starting with 2.0, there is a file-locking based synchronization mechanism that can be used to synchronize access, so that basically only one instance can ever run. To enable this feature, you need to:

TO BE WRITTEN
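Until the steps above are written up, the general idea of file-lock based mutual exclusion can at least be sketched with the JDK's java.nio classes (Java 1.4 and later). This is a generic illustration and not Jug's own mechanism or API; the class name and the lock file name are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

class SingleInstanceLock {
    private final RandomAccessFile file;
    private final FileLock lock;

    private SingleInstanceLock(RandomAccessFile file, FileLock lock) {
        this.file = file;
        this.lock = lock;
    }

    // Returns a held lock, or null if some other holder already has it.
    static SingleInstanceLock tryAcquire(File lockFile) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
        FileLock lock;
        try {
            // tryLock() returns null when another process holds the lock.
            lock = raf.getChannel().tryLock();
        } catch (OverlappingFileLockException e) {
            // The lock is already held somewhere inside this same JVM.
            lock = null;
        }
        if (lock == null) {
            raf.close();
            return null;
        }
        return new SingleInstanceLock(raf, lock);
    }

    // Gives the lock up so that another generator instance may start.
    void release() throws IOException {
        lock.release();
        file.close();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical lock file location, for illustration only.
        File lockFile = new File(System.getProperty("java.io.tmpdir"), "uuid-generator.lock");
        SingleInstanceLock held = tryAcquire(lockFile);
        if (held == null) {
            System.out.println("Another generator instance holds the lock; not starting.");
            return;
        }
        try {
            System.out.println("Lock acquired; safe to generate time-based UUIDs.");
        } finally {
            held.release();
        }
    }
}
```

File locks are held on behalf of the whole JVM, so this guards against other processes; generators living under different class loaders in the same JVM still need ordinary in-process coordination.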