Speed improvements for crypto library: Before and After

Speed improvements for crypto library (04 Aug 2000)

Before and after comparisons

What follows are collected numbers showing that the performance differences, after Jason Thorpe's recent commits, are quite noticeable:


Blowfish

Before:
Doing blowfish cbc for 3s on 8 size blocks: 2891026 blowfish cbc's in 2.97s
Doing blowfish cbc for 3s on 64 size blocks: 411766 blowfish cbc's in 3.10s
Doing blowfish cbc for 3s on 256 size blocks: 104721 blowfish cbc's in 3.00s
Doing blowfish cbc for 3s on 1024 size blocks: 26291 blowfish cbc's in 2.98s
Doing blowfish cbc for 3s on 8192 size blocks: 3290 blowfish cbc's in 3.10s
type              8 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
blowfish cbc      7787.28k     8755.16k     8936.19k     9034.22k     8954.05k

After:
Doing blowfish cbc for 3s on 8 size blocks: 4573792 blowfish cbc's in 3.10s
Doing blowfish cbc for 3s on 64 size blocks: 713440 blowfish cbc's in 2.99s
Doing blowfish cbc for 3s on 256 size blocks: 183125 blowfish cbc's in 3.00s
Doing blowfish cbc for 3s on 1024 size blocks: 46221 blowfish cbc's in 3.00s
Doing blowfish cbc for 3s on 8192 size blocks: 5787 blowfish cbc's in 3.00s
type              8 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
blowfish cbc     12156.26k    15270.96k    15626.67k    15776.77k    15802.37k

MD5

Before:
Doing md5 for 3s on 8 size blocks: 1490796 md5's in 3.00s
Doing md5 for 3s on 64 size blocks: 895849 md5's in 3.00s
Doing md5 for 3s on 256 size blocks: 410807 md5's in 3.00s
Doing md5 for 3s on 1024 size blocks: 129416 md5's in 3.00s
Doing md5 for 3s on 8192 size blocks: 17527 md5's in 3.00s
type              8 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5               3975.46k    19111.45k    35055.53k    44173.99k    47860.39k

After:
Doing md5 for 3s on 8 size blocks: 2041410 md5's in 3.00s
Doing md5 for 3s on 64 size blocks: 1345402 md5's in 3.00s
Doing md5 for 3s on 256 size blocks: 669827 md5's in 3.10s
Doing md5 for 3s on 1024 size blocks: 221744 md5's in 2.96s
Doing md5 for 3s on 8192 size blocks: 30685 md5's in 3.00s
type              8 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5               5443.76k    28701.91k    56968.68k    76711.44k    83790.51k

RMD160

Before:
Doing rmd160 for 3s on 8 size blocks: 778828 rmd160's in 3.00s
Doing rmd160 for 3s on 64 size blocks: 430214 rmd160's in 3.00s
Doing rmd160 for 3s on 256 size blocks: 182108 rmd160's in 3.00s
Doing rmd160 for 3s on 1024 size blocks: 55050 rmd160's in 3.00s
Doing rmd160 for 3s on 8192 size blocks: 7339 rmd160's in 3.00s
type              8 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rmd160            2076.87k     9177.90k    15539.88k    18790.40k    20040.36k

After:
Doing rmd160 for 3s on 8 size blocks: 1084941 rmd160's in 3.00s
Doing rmd160 for 3s on 64 size blocks: 617966 rmd160's in 3.00s
Doing rmd160 for 3s on 256 size blocks: 267381 rmd160's in 2.99s
Doing rmd160 for 3s on 1024 size blocks: 82001 rmd160's in 3.00s
Doing rmd160 for 3s on 8192 size blocks: 10974 rmd160's in 3.00s
type              8 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rmd160            2893.18k    13183.27k    22892.82k    27989.67k    29966.34k

BIGNUM routines

Before:
Doing 512 bit sign dsa's for 10s: 965 512 bit DSA signs in 9.97s
Doing 512 bit verify dsa's for 10s: 766 512 bit DSA verify in 9.93s
Doing 1024 bit sign dsa's for 10s: 276 1024 bit DSA signs in 9.99s
Doing 1024 bit verify dsa's for 10s: 217 1024 bit DSA verify in 9.93s
                  sign    verify    sign/s verify/s
dsa  512 bits   0.0103s   0.0130s     96.8     77.1
dsa 1024 bits   0.0362s   0.0458s     27.6     21.9

After:
Doing 512 bit sign dsa's for 10s: 3742 512 bit DSA signs in 9.88s
Doing 512 bit verify dsa's for 10s: 3065 512 bit DSA verify in 9.92s
Doing 1024 bit sign dsa's for 10s: 1357 1024 bit DSA signs in 9.99s
Doing 1024 bit verify dsa's for 10s: 1094 1024 bit DSA verify in 9.83s
                  sign    verify    sign/s verify/s
dsa  512 bits   0.0026s   0.0032s    378.7    309.0
dsa 1024 bits   0.0074s   0.0090s    135.8    111.3