Performance of Nitrokey HSM2 - can it be improved? What am I doing wrong?

When the HSM2 got announced, there was the following statement made (source; emphasis mine):

In my experience performance for RSA-4096 has never been an issue. Initial key generation takes longer but daily signing and decryption takes about one second. A smaller model will come but no release date yet. Support for Curve 25519 will come in the next few years too (As of now, it’s supported by Nitrokey Start already).

That’s not the experience we’ve had with the HSM2.

Our setup is on Debian (bookworm) with OpenSC (0.23.0-0.3). Our application is to use osslsigncode to code-sign some software. Anything below 250 ms would be great and anything “about a second” per signature would still be acceptable.

I am using osslsigncode with env PKCS11SPY=/usr/lib/x86_64-linux-gnu/opensc-pkcs11.so and passing -pkcs11module /usr/lib/x86_64-linux-gnu/pkcs11/pkcs11-spy.so to it.

A test of twenty runs (plus initial warmups) with hyperfine gives me a minimum of 6.5 s per signature and the size of the file to be signed doesn’t really affect the measured times much (not a surprise, because it’s only signing the – off-card generated – “PE hash”).
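For anyone who wants to reproduce the measurement without hyperfine, here is a minimal Python sketch of the same idea: a few warmup runs, then timed runs, then mean, standard deviation, and range. It is not how hyperfine works internally. The function names `time_command` and `summarize` are made up for illustration, the warmup count is an assumption (the post does not say how many were used), and the command in the `__main__` block is only a placeholder that should be replaced by the real osslsigncode invocation.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=20, warmups=3):
    """Run argv repeatedly and return the wall-clock duration of each timed run."""
    # Warmup runs are executed but not recorded (the count is an assumption).
    for _ in range(warmups):
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return mean, standard deviation, minimum and maximum in seconds."""
    return {
        "mean": statistics.mean(durations),
        "stdev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
        "min": min(durations),
        "max": max(durations),
    }


if __name__ == "__main__":
    # Placeholder command: substitute the full osslsigncode command line here.
    placeholder = [sys.executable, "-c", "pass"]
    stats = summarize(time_command(placeholder, runs=5, warmups=1))
    print({key: round(value, 3) for key, value in stats.items()})
```

Because each run starts a fresh process, every run pays the full PKCS#11 initialization cost again, which matches how a one-shot signing command behaves.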

But the claim is that one signature should take 4100 ms (source).

Performance (without hashing): RSA-1024: 90 ms, RSA-1536: 150 ms, RSA-2048: 250 ms, RSA-3074: 1900 ms, RSA-4096: 4100 ms, ECDSA-256: 80 ms, ECDH-256: 90ms, ECDSA-512: 190 ms, ECDH-512: 290 ms

And the claim from the announcement even says “about one second”.

So I tried it out, and indeed the C_Initialize step alone takes roughly 3 s (going by the timestamps that pkcs11-spy.so outputs). The actual signing then takes up the remainder of the measured time. Going by the pkcs11-spy.so output, pretty much all of the other steps finish too quickly to register at millisecond resolution.
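To get from the spy log to per-call numbers, something like the following Python sketch can help. It assumes each call is logged as a line of the form `N: C_FunctionName` followed by a line with a timestamp such as `2024-01-01 12:00:00.002`; if your pkcs11-spy build formats things differently, the two regular expressions need adjusting. The spy logs one timestamp per call, so the sketch approximates a call's duration as the gap until the next logged call, which also includes whatever the application does in between. The sample log is synthetic and only there to show the shape of the input; it is not a real measurement.

```python
import re
from datetime import datetime

# Assumed log layout: "N: C_Name" on one line, a timestamp on the next.
CALL_RE = re.compile(r"^\d+:\s+(C_\w+)\s*$")
STAMP_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}$")


def call_gaps(log_text):
    """Return (function name, seconds until the next logged call) pairs."""
    calls = []
    pending = None
    for raw in log_text.splitlines():
        line = raw.strip()
        match = CALL_RE.match(line)
        if match:
            pending = match.group(1)
        elif pending and STAMP_RE.match(line):
            stamp = datetime.strptime(line, "%Y-%m-%d %H:%M:%S.%f")
            calls.append((pending, stamp))
            pending = None
    # The last call has no successor, so it gets no entry.
    return [
        (name, (next_stamp - stamp).total_seconds())
        for (name, stamp), (_, next_stamp) in zip(calls, calls[1:])
    ]


# Synthetic example input, NOT taken from a real run.
SAMPLE = """\
0: C_GetFunctionList
2024-01-01 12:00:00.000
Returned:  0 CKR_OK

1: C_Initialize
2024-01-01 12:00:00.002
Returned:  0 CKR_OK

2: C_GetSlotList
2024-01-01 12:00:03.010
Returned:  0 CKR_OK
"""

if __name__ == "__main__":
    for name, seconds in sorted(call_gaps(SAMPLE), key=lambda item: -item[1]):
        print(f"{name:24s} {seconds:8.3f} s")
```

Sorting by gap puts the expensive calls at the top, which makes it easy to see whether initialization or C_Sign dominates.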

Is there anything one could do to cut down on this? Perhaps sidestep OpenSC altogether with an alternative PKCS#11 module?

Indeed, these numbers are unusually slow. I don’t know if it is caused by OpenSC, but you could give this alternative PKCS#11 module a try.


Alright, an improvement but not considerably faster. I built the suggested PKCS#11 module (rev: 3dfdde5e9839caea46574657911d9fc32f1fd1fd) and can see it being used.

This time C_GetSlotList takes a little under 3 seconds. I reckon this corresponds to the C_Initialize step when done via OpenSC. The final C_Sign step again takes the bulk of the overall time, though.

For comparison

First item for each run is the size of the signed PE file.

Via libsc-hsm-pkcs11.so

1024 Bytes
Benchmark 1: ./osslsigncode sign -verbose -pkcs11engine /usr/lib/x86_64-linux-gnu/engines-3/pkcs11.so -pkcs11module /usr/local/lib/libsc-hsm-pkcs11.so -certs testcert.pem -key 'pkcs11:serial=DENK0000000;type=private' -readpass pin.txt -h sha256 -in 'smallest.exe' -out 'smallest.exe.signed'
  Time (mean ± σ):      6.019 s ±  0.019 s    [User: 0.005 s, System: 0.005 s]
  Range (min … max):    5.982 s …  6.049 s    20 runs

11776 Bytes
Benchmark 1: ./osslsigncode sign -verbose -pkcs11engine /usr/lib/x86_64-linux-gnu/engines-3/pkcs11.so -pkcs11module /usr/local/lib/libsc-hsm-pkcs11.so -certs testcert.pem -key 'pkcs11:serial=DENK0000000;type=private' -readpass pin.txt -h sha256 -in 'small.exe' -out 'small.exe.signed'
  Time (mean ± σ):      6.010 s ±  0.020 s    [User: 0.005 s, System: 0.005 s]
  Range (min … max):    5.972 s …  6.049 s    20 runs

1097216 Bytes
Benchmark 1: ./osslsigncode sign -verbose -pkcs11engine /usr/lib/x86_64-linux-gnu/engines-3/pkcs11.so -pkcs11module /usr/local/lib/libsc-hsm-pkcs11.so -certs testcert.pem -key 'pkcs11:serial=DENK0000000;type=private' -readpass pin.txt -h sha256 -in 'medium.dll' -out 'medium.dll.signed'
  Time (mean ± σ):      6.024 s ±  0.018 s    [User: 0.011 s, System: 0.005 s]
  Range (min … max):    5.995 s …  6.066 s    20 runs

23346176 Bytes
Benchmark 1: ./osslsigncode sign -verbose -pkcs11engine /usr/lib/x86_64-linux-gnu/engines-3/pkcs11.so -pkcs11module /usr/local/lib/libsc-hsm-pkcs11.so -certs testcert.pem -key 'pkcs11:serial=DENK0000000;type=private' -readpass pin.txt -h sha256 -in 'big.exe' -out 'big.exe.signed'
  Time (mean ± σ):      6.147 s ±  0.023 s    [User: 0.114 s, System: 0.026 s]
  Range (min … max):    6.113 s …  6.202 s    20 runs

Via opensc-pkcs11.so

1024 Bytes
Benchmark 1: ./osslsigncode sign -verbose -pkcs11engine /usr/lib/x86_64-linux-gnu/engines-3/pkcs11.so -pkcs11module /usr/lib/x86_64-linux-gnu/opensc-pkcs11.so -certs testcert.pem -key 'pkcs11:serial=DENK0000000;type=private' -readpass pin.txt -h sha256 -in 'smallest.exe' -out 'smallest.exe.signed'
  Time (mean ± σ):      6.601 s ±  0.014 s    [User: 0.008 s, System: 0.013 s]
  Range (min … max):    6.584 s …  6.640 s    20 runs

11776 Bytes
Benchmark 1: ./osslsigncode sign -verbose -pkcs11engine /usr/lib/x86_64-linux-gnu/engines-3/pkcs11.so -pkcs11module /usr/lib/x86_64-linux-gnu/opensc-pkcs11.so -certs testcert.pem -key 'pkcs11:serial=DENK0000000;type=private' -readpass pin.txt -h sha256 -in 'small.exe' -out 'small.exe.signed'
  Time (mean ± σ):      6.602 s ±  0.018 s    [User: 0.009 s, System: 0.013 s]
  Range (min … max):    6.558 s …  6.635 s    20 runs

1097216 Bytes
Benchmark 1: ./osslsigncode sign -verbose -pkcs11engine /usr/lib/x86_64-linux-gnu/engines-3/pkcs11.so -pkcs11module /usr/lib/x86_64-linux-gnu/opensc-pkcs11.so -certs testcert.pem -key 'pkcs11:serial=DENK0000000;type=private' -readpass pin.txt -h sha256 -in 'medium.dll' -out 'medium.dll.signed'
  Time (mean ± σ):      6.604 s ±  0.022 s    [User: 0.015 s, System: 0.013 s]
  Range (min … max):    6.567 s …  6.635 s    20 runs

23346176 Bytes
Benchmark 1: ./osslsigncode sign -verbose -pkcs11engine /usr/lib/x86_64-linux-gnu/engines-3/pkcs11.so -pkcs11module /usr/lib/x86_64-linux-gnu/opensc-pkcs11.so -certs testcert.pem -key 'pkcs11:serial=DENK0000000;type=private' -readpass pin.txt -h sha256 -in 'big.exe' -out 'big.exe.signed'
  Time (mean ± σ):      6.732 s ±  0.027 s    [User: 0.113 s, System: 0.034 s]
  Range (min … max):    6.693 s …  6.781 s    20 runs

As you can see it chips a little over half a second off of each signature creation.
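The "little over half a second" can be checked directly from the reported means. The following Python snippet is just that arithmetic; the numbers are copied from the hyperfine output above, and the names `SC_HSM`, `OPENSC`, and `savings` are made up for this illustration.

```python
from statistics import mean

# Mean wall-clock seconds per signature from the hyperfine runs above,
# keyed by the size of the signed file in bytes.
SC_HSM = {1024: 6.019, 11776: 6.010, 1097216: 6.024, 23346176: 6.147}
OPENSC = {1024: 6.601, 11776: 6.602, 1097216: 6.604, 23346176: 6.732}


def savings(baseline, candidate):
    """Seconds saved per file size when switching from baseline to candidate."""
    return {size: round(baseline[size] - candidate[size], 3) for size in baseline}


saved = savings(OPENSC, SC_HSM)
average_saved = mean(saved.values())
relative = average_saved / mean(OPENSC.values())

if __name__ == "__main__":
    for size, seconds in saved.items():
        print(f"{size:>9d} bytes: {seconds:.3f} s faster")
    print(f"average: {average_saved:.3f} s ({relative:.1%} of the OpenSC time)")
```

The saving is nearly identical across file sizes, about 0.58 s each, which fits the observation that the time is spent in the PKCS#11 calls rather than in hashing the file.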

PS: of course I redacted the serial number and some file names.

How many objects do you have on your HSM2? In my experience, listing objects is much faster if only a few objects are there.

Exactly a single key pair. So it’s one public key object, one private key object. Nothing more, nothing less.

The benchmark was quite a downer, to be honest.