OpenJDK has long had provision for cryptography support via NSS using the provider sun.security.pkcs11.SunPKCS11, but I get the impression it isn’t widely used or even known about. It requires some configuration to set up: the OpenJDK build doesn’t look for NSS, so the provider has to be manually configured with the location of the library. In September 2009, I added support to IcedTea to automate a lot of this, so you can now pass --enable-nss to the build and it will detect NSS and set it up in ${java.home}/jre/lib/security/java.security. I know Matthias Klose (doko) has used this, as he did further work on it within IcedTea, and I believe it may be enabled in Debian and/or Ubuntu builds. It’s also available (via USE="nss") in Gentoo, and in all three distros it can be used to make elliptic curve cryptography available; this was the original motivation for adding support, as noted in PR356.
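
To check whether a given build actually has the provider set up, something like the following can help. It is a minimal sketch using only the standard java.security.Security API; the class and method names (ProviderCheck, providerNames, hasEllipticCurve) are made up for illustration. It prints the installed providers in priority order and reports whether any of them offers an elliptic curve key pair generator.

```java
import java.security.Provider;
import java.security.Security;

public class ProviderCheck {
    /** Returns the installed provider names in priority order (index 0 is highest). */
    static String[] providerNames() {
        Provider[] providers = Security.getProviders();
        String[] names = new String[providers.length];
        for (int i = 0; i < providers.length; i++) {
            names[i] = providers[i].getName();
        }
        return names;
    }

    /** True if any installed provider offers an elliptic curve key pair generator. */
    static boolean hasEllipticCurve() {
        // Security.getProviders(filter) returns null when no provider matches.
        return Security.getProviders("KeyPairGenerator.EC") != null;
    }

    public static void main(String[] args) {
        String[] names = providerNames();
        for (int i = 0; i < names.length; i++) {
            System.out.println((i + 1) + ": " + names[i]);
        }
        System.out.println("EC available: " + hasEllipticCurve());
    }
}
```

The SunPKCS11 provider reports its name as SunPKCS11- followed by the name given in its configuration file, so with a configuration that names the token NSS (an assumption about the setup) it would typically show up as SunPKCS11-NSS.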

Over the last week or so, we at Red Hat have been looking at whether using this NSS provider as the primary provider (i.e. listing it as number one in the java.security file) provides a performance advantage, specifically as concerns AES and the fact that newer chips have AES instructions which are supported by assembly code in NSS. We wrote a little test case which runs a number of encryption and decryption cycles (we ran with 20,000) and measures the average number of nanoseconds taken to encrypt or decrypt each byte, and the overall throughput in MB/s. We ran this on both IcedTea6 and IcedTea7 builds, with and without the NSS provider enabled at the top priority. The results are below. The ns/byte figures are rounded to two decimal places. Tests were run on a 2.7GHz Intel Xeon E5-2680 machine with hardware AES instructions and 4 processors of 8 cores each (32 cores in total), each processor having a 20MB L3 cache.
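
Two things in this setup can be sketched in a few lines. First, when no provider is named explicitly, the JCA picks the highest-priority provider that supports the requested algorithm, which is why moving NSS to position one changes which AES implementation is used. Second, the two reported metrics are simple functions of elapsed time and bytes processed. The following is only an illustration, not the test case itself: the class and method names are hypothetical, the AES/CBC/PKCS5Padding transformation is an assumption (the mode used in the test isn’t stated here), and treating 1 MB as 2^20 bytes is an inference from the tables (9.54 ns/byte works out to about 99.97 under that definition).

```java
import javax.crypto.Cipher;

public class AesProviderAndMetrics {
    /** Name of the provider the JCA selects for AES when none is requested explicitly. */
    static String aesProviderName() throws Exception {
        // Assumed transformation; every Java platform is required to support it.
        return Cipher.getInstance("AES/CBC/PKCS5Padding").getProvider().getName();
    }

    /** Average nanoseconds spent per byte processed. */
    static double nsPerByte(long elapsedNanos, long totalBytes) {
        return (double) elapsedNanos / totalBytes;
    }

    /** Throughput in MB/s, assuming 1 MB = 2^20 bytes. */
    static double mbPerSecond(long elapsedNanos, long totalBytes) {
        double seconds = elapsedNanos / 1e9;
        return (totalBytes / (1024.0 * 1024.0)) / seconds;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("AES served by: " + aesProviderName());
        // Example: 4 KiB blocks, 20,000 cycles, taking 0.8 s in total (made-up timing).
        long bytes = 4096L * 20000L;
        long nanos = 800000000L;
        System.out.println(nsPerByte(nanos, bytes) + " ns/byte, "
                + mbPerSecond(nanos, bytes) + " MB/s");
    }
}
```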

IcedTea6 without the NSS provider

| Key size | 4k block, enc | 4k block, dec | 32k block, enc | 32k block, dec | 256k block, enc | 256k block, dec | 1024k block, enc | 1024k block, dec |
|---|---|---|---|---|---|---|---|---|
| 128 bit | 9.54 ns/byte, 99MB/s | 10.57 ns/byte, 90MB/s | 9.10 ns/byte, 105MB/s | 9.80 ns/byte, 97MB/s | 8.77 ns/byte, 108MB/s | 9.35 ns/byte, 102MB/s | 8.70 ns/byte, 109MB/s | 9.43 ns/byte, 101MB/s |
| 192 bit | 10.42 ns/byte, 91MB/s | 11.29 ns/byte, 84MB/s | 10.66 ns/byte, 89MB/s | 11.48 ns/byte, 83MB/s | 9.95 ns/byte, 96MB/s | 10.67 ns/byte, 89MB/s | 10.23 ns/byte, 93MB/s | 10.75 ns/byte, 88MB/s |
| 256 bit | 11.80 ns/byte, 80MB/s | 12.79 ns/byte, 74MB/s | 12.03 ns/byte, 79MB/s | 12.98 ns/byte, 73MB/s | 11.17 ns/byte, 85MB/s | 11.96 ns/byte, 80MB/s | 11.08 ns/byte, 86MB/s | 12.00 ns/byte, 79MB/s |

IcedTea6 with the NSS provider

| Key size | 4k block, enc | 4k block, dec | 32k block, enc | 32k block, dec | 256k block, enc | 256k block, dec | 1024k block, enc | 1024k block, dec |
|---|---|---|---|---|---|---|---|---|
| 128 bit | 2.77 ns/byte, 344MB/s | 1.16 ns/byte, 832MB/s | 1.70 ns/byte, 571MB/s | 0.44 ns/byte, 2222MB/s | 1.99 ns/byte, 487MB/s | 0.77 ns/byte, 1250MB/s | 2.17 ns/byte, 444MB/s | 1.01 ns/byte, 952MB/s |
| 192 bit | 2.66 ns/byte, 363MB/s | 0.92 ns/byte, 1050MB/s | 2.07 ns/byte, 476MB/s | 0.51 ns/byte, 2000MB/s | 2.28 ns/byte, 425MB/s | 0.83 ns/byte, 1176MB/s | 2.45 ns/byte, 392MB/s | 1.05 ns/byte, 909MB/s |
| 256 bit | 2.97 ns/byte, 322MB/s | 0.99 ns/byte, 998MB/s | 2.30 ns/byte, 416MB/s | 0.57 ns/byte, 1818MB/s | 2.57 ns/byte, 377MB/s | 0.88 ns/byte, 1111MB/s | 2.75 ns/byte, 350MB/s | 1.11 ns/byte, 869MB/s |

There’s clearly a significant improvement when using the NSS provider on a machine with AES instructions: comparing the ns/byte figures in the two sets of results, encryption is roughly 3x to 5x faster and decryption roughly 9x to 23x faster. We still need to run more tests to see what the improvement (if any) is when using NSS without AES instructions. As an aside, an early version of the test didn’t show as significant an increase, but something more like a 1.6x speedup. We believe this was down to several issues:

  1. The entire process was being measured, including setup of the cipher, not just the encryption.
  2. The use of System.currentTimeMillis(), which only has millisecond resolution, rather than a nanosecond-resolution timer such as System.nanoTime().
  3. The use of the doFinal variant that returns a new array on every call, rather than one that writes into a reused output array, thus creating significant garbage.
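
A measurement loop that avoids these three issues might look roughly like the following. This is a minimal sketch, not the test case used for the numbers above: the class and method names are hypothetical, AES/CBC/NoPadding with an all-zero key and IV is an assumption chosen to keep the example short, and JIT warm-up is omitted. The cipher is set up before the clock starts, System.nanoTime() is used for timing, and the five-argument doFinal writes into a buffer that is allocated once and reused.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class AesTimingSketch {
    /** Returns the average nanoseconds per byte over the given number of encryption cycles. */
    static double encryptNsPerByte(int keyBits, int blockBytes, int cycles) throws Exception {
        if (blockBytes % 16 != 0) {
            throw new IllegalArgumentException("NoPadding needs a multiple of 16 bytes");
        }
        // All-zero key and IV: acceptable for timing only, never for real data.
        // Note: 192/256-bit keys may need unlimited-strength policy files on older JDKs.
        byte[] key = new byte[keyBits / 8];
        byte[] iv = new byte[16];
        byte[] input = new byte[blockBytes];

        // Cipher setup happens outside the timed region (issue 1).
        Cipher cipher = Cipher.getInstance("AES/CBC/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));

        // Output buffer is allocated once and reused on every cycle (issue 3).
        byte[] output = new byte[cipher.getOutputSize(blockBytes)];

        // Nanosecond-resolution timer (issue 2).
        long start = System.nanoTime();
        for (int i = 0; i < cycles; i++) {
            // doFinal resets the cipher to its initialised state, so it can be called again.
            cipher.doFinal(input, 0, input.length, output, 0);
        }
        long elapsed = System.nanoTime() - start;

        return (double) elapsed / ((long) blockBytes * cycles);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(encryptNsPerByte(128, 4096, 20000) + " ns/byte");
    }
}
```

A decryption variant would differ only in passing Cipher.DECRYPT_MODE to init.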

Update

Further results are collected in subsequent blog posts: