Function for measuring algorithmic performance of high-level and low-level integer64 functions

Usage

benchmark64(nsmall = 2L^16L, nbig = 2L^25L, timefun = repeat.time)

optimizer64(
  nsmall = 2L^16L,
  nbig = 2L^25L,
  timefun = repeat.time,
  what = c("match", "%in%", "duplicated", "unique", "unipos", "table", "rank",
    "quantile"),
  uniorder = c("original", "values", "any"),
  taborder = c("values", "counts"),
  plot = TRUE
)

Arguments

nsmall

size of smaller vector

nbig

size of larger vector

timefun

a function for timing such as bit::repeat.time() or system.time()

what

a vector of names of high-level functions

uniorder

one of the order parameters that are allowed in unique.integer64() and unipos.integer64()

taborder

one of the order parameters that are allowed in table.integer64()

plot

set to FALSE to suppress plotting
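
As an illustration of the timefun argument, the following sketch wraps system.time() so that an expression is timed several times and the median is reported. The helper name median_time and its reps parameter are hypothetical. The sketch also assumes that the benchmark functions only need a system.time()-style result whose third element is the elapsed time; this page does not confirm that.

```r
# Hypothetical helper (not part of bit64): time an expression `reps` times
# and return the median of each component in system.time() order.
median_time <- function(expr, reps = 5L) {
  e <- substitute(expr)   # capture the unevaluated expression
  env <- parent.frame()   # evaluate it where the caller's variables live
  runs <- vapply(
    seq_len(reps),
    function(i) unclass(system.time(eval(e, env), gcFirst = FALSE))[1:3],
    numeric(3)
  )
  # One row per component: user.self, sys.self, elapsed
  apply(runs, 1, median)
}

# Stand-alone use with base R only
median_time(sort(runif(1e5)), reps = 3L)
```

Under the stated assumption, such a function could be passed as timefun = median_time, in the same way as the system.time() wrapper used in the examples. Because the expression is evaluated reps times, any side effects of the timed code are repeated as well.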

Value

benchmark64 returns a matrix with elapsed seconds, different high-level tasks in rows and different scenarios to solve the task in columns. The last row named 'SESSION' contains the elapsed seconds of the exemplary session.

optimizer64 returns a dimensioned list with one row for each high-level function timed and two columns named after the values of the nsmall and nbig sample sizes. Each list cell contains a matrix with timings, low-level-methods in rows and three measurements c("prep","both","use") in columns. If it can be measured separately, prep contains the timing of preparatory work such as sorting and hashing, and use contains the timing of using the prepared work. If the function timed does both, preparation and use, the timing is in both.
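
To make the shape of the optimizer64 result more concrete, the sketch below builds a small mock object with the same layout using base R only: a list with a dim attribute whose cells hold timing matrices. The method names (methodA, methodB) and the zero timings are placeholders invented for illustration; they are not real bit64 methods or measurements.

```r
# Placeholder timing matrix with the documented columns "prep", "both", "use".
# Method names and zero timings are invented for illustration only.
cell <- function(methods) {
  matrix(0, nrow = length(methods), ncol = 3L,
         dimnames = list(methods, c("prep", "both", "use")))
}

# A dimensioned list: one row per timed function, one column per sample size.
# Cells are filled column by column.
res <- list(cell(c("methodA", "methodB")), cell("methodA"),
            cell(c("methodA", "methodB")), cell("methodA"))
dim(res) <- c(2L, 2L)
dimnames(res) <- list(c("match", "%in%"), c("128", "8192"))

print(res)                     # each cell prints as a type and length summary
tim <- res[["match", "8192"]]  # [[i, j]] extracts the matrix stored in one cell
tim[, "use"]                   # one timing column across all methods
rowSums(tim)                   # total time per method
```

This is why printing the result in the examples shows entries such as numeric,27 rather than the timings themselves: each cell is a whole matrix, and a cell of 27 values would correspond to 9 methods in 3 columns.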

Details

benchmark64 compares the following scenarios for the following use cases:

scenario name    explanation
32-bit           applying Base R function to 32-bit integer data
64-bit           applying bit64 function to 64-bit integer data (with no cache)
hashcache        ditto when cache contains hashmap(), see hashcache()
sortordercache   ditto when cache contains sorting and ordering, see sortordercache()
ordercache       ditto when cache contains ordering only, see ordercache()
allcache         ditto when cache contains sorting, ordering and hashing

use case name    explanation
cache            filling the cache according to scenario
match(s,b)       match small in big vector
s %in% b         small %in% big vector
match(b,s)       match big in small vector
b %in% s         big %in% small vector
match(b,b)       match big in (different) big vector
b %in% b         big %in% (different) big vector
duplicated(b)    duplicated of big vector
unique(b)        unique of big vector
table(b)         table of big vector
sort(b)          sorting of big vector
order(b)         ordering of big vector
rank(b)          ranking of big vector
quantile(b)      quantiles of big vector
summary(b)       summary of big vector
SESSION          exemplary session involving multiple calls (including cache filling costs)

Note that the timings for the cached variants do not contain the time costs of building the cache, except for the timing of the exemplary user session, where the cache costs are included in order to evaluate amortization.
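
The relation between timings, the printed speed factors, and cache amortization can be sketched with a little arithmetic in base R. The numbers are copied from the output shown in the Examples section (the SESSION row of the small run, and the cache and match(s,b) entries of benchmark64.data). The break-even formula is an illustrative assumption that treats all entries as elapsed seconds; it is not something benchmark64 computes.

```r
# SESSION timings (elapsed seconds) from the small example run
session <- c("32-bit" = 0.011, "64-bit" = 0.007, hashcache = 0.006,
             sortordercache = 0.008, ordercache = 0.016, allcache = 0.001)

# Speed factor relative to the 32-bit baseline: values above 1 are faster
factor <- session[["32-bit"]] / session
round(factor, 3)

# Amortization sketch with values from benchmark64.data:
# cost of building the hash cache versus time saved per match(s,b) call
cache_cost <- 3.34         # "cache" row, hashcache column
uncached   <- 0.63         # match(s,b), 64-bit column (no cache)
cached     <- 0.004950495  # match(s,b), hashcache column

# Number of calls after which building the cache has paid off
break_even <- cache_cost / (uncached - cached)
ceiling(break_even)
```

With these figures the hash cache would pay for itself after about six match(s,b) calls on the same big vector. In the small run the factors are dominated by timer resolution, which is why the examples describe those timings as not serious.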

Functions

  • benchmark64(): compares high-level integer64 functions against the integer functions from Base R

  • optimizer64(): compares for each high-level integer64 function the Base R integer function with several low-level integer64 functions with and without caching

Examples

message("this small example using system.time does not give serious timings\n
we do this only to run regression tests")
#> this small example using system.time does not give serious timings
#> 
#> we do this only to run regression tests
benchmark64(nsmall=2^7, nbig=2^13, timefun=function(expr)system.time(expr, gcFirst=FALSE))
#> 
#> compare performance for a complete sessions of calls
#> 
#> === 32-bit ===
#> check data range, mean etc.
#> get all percentiles for plotting distribution shape
#> list the upper and lower permille of values
#> OK, for some of these values I want to see the complete ROW, so I need their positions in the data.frame
#> check if any values are duplicated
#> since not unique, then check distribution of frequencies
#> OK, let's plot the percentiles of unique values versus the percentiles allowing for duplicates
#> check whether we find a match for each fact in the dimension table
#> check whether there are any dimension table entries not in the fact table
#> check whether we find a match for each fact in a parallel fact table
#> find positions of facts in dimension table for joining
#> find positions of facts in parallel fact table for joining
#> out of curiosity: how well rank-correlated are fact and parallel fact table?
#>         32-bit 64-bit hashcache sortordercache ordercache allcache
#> seconds  0.011      0         0              0          0        0
#> factor   1.000    Inf       Inf            Inf        Inf      Inf
#> 
#> === 64-bit ===
#> check data range, mean etc.
#> get all percentiles for plotting distribution shape
#> list the upper and lower permille of values
#> OK, for some of these values I want to see the complete ROW, so I need their positions in the data.frame
#> check if any values are duplicated
#> since not unique, then check distribution of frequencies
#> OK, let's plot the percentiles of unique values versus the percentiles allowing for duplicates
#> check whether we find a match for each fact in the dimension table
#> check whether there are any dimension table entries not in the fact table
#> check whether we find a match for each fact in a parallel fact table
#> find positions of facts in dimension table for joining
#> find positions of facts in parallel fact table for joining
#> out of curiosity: how well rank-correlated are fact and parallel fact table?
#>         32-bit 64-bit hashcache sortordercache ordercache allcache
#> seconds  0.011  0.007         0              0          0        0
#> factor   1.000  1.571       Inf            Inf        Inf      Inf
#> 
#> === hashcache ===
#> check data range, mean etc.
#> get all percentiles for plotting distribution shape
#> list the upper and lower permille of values
#> OK, for some of these values I want to see the complete ROW, so I need their positions in the data.frame
#> check if any values are duplicated
#> since not unique, then check distribution of frequencies
#> OK, let's plot the percentiles of unique values versus the percentiles allowing for duplicates
#> check whether we find a match for each fact in the dimension table
#> check whether there are any dimension table entries not in the fact table
#> check whether we find a match for each fact in a parallel fact table
#> find positions of facts in dimension table for joining
#> find positions of facts in parallel fact table for joining
#> out of curiosity: how well rank-correlated are fact and parallel fact table?
#>         32-bit 64-bit hashcache sortordercache ordercache allcache
#> seconds  0.011  0.007     0.006              0          0        0
#> factor   1.000  1.571     1.833            Inf        Inf      Inf
#> 
#> === sortordercache ===
#> check data range, mean etc.
#> get all percentiles for plotting distribution shape
#> list the upper and lower permille of values
#> OK, for some of these values I want to see the complete ROW, so I need their positions in the data.frame
#> check if any values are duplicated
#> since not unique, then check distribution of frequencies
#> OK, let's plot the percentiles of unique values versus the percentiles allowing for duplicates
#> check whether we find a match for each fact in the dimension table
#> check whether there are any dimension table entries not in the fact table
#> check whether we find a match for each fact in a parallel fact table
#> find positions of facts in dimension table for joining
#> find positions of facts in parallel fact table for joining
#> out of curiosity: how well rank-correlated are fact and parallel fact table?
#>         32-bit 64-bit hashcache sortordercache ordercache allcache
#> seconds  0.011  0.007     0.006          0.008          0        0
#> factor   1.000  1.571     1.833          1.375        Inf      Inf
#> 
#> === ordercache ===
#> check data range, mean etc.
#> get all percentiles for plotting distribution shape
#> list the upper and lower permille of values
#> OK, for some of these values I want to see the complete ROW, so I need their positions in the data.frame
#> check if any values are duplicated
#> since not unique, then check distribution of frequencies
#> OK, let's plot the percentiles of unique values versus the percentiles allowing for duplicates
#> check whether we find a match for each fact in the dimension table
#> check whether there are any dimension table entries not in the fact table
#> check whether we find a match for each fact in a parallel fact table
#> find positions of facts in dimension table for joining
#> find positions of facts in parallel fact table for joining
#> out of curiosity: how well rank-correlated are fact and parallel fact table?
#>         32-bit 64-bit hashcache sortordercache ordercache allcache
#> seconds  0.011  0.007     0.006          0.008      0.016        0
#> factor   1.000  1.571     1.833          1.375      0.687      Inf
#> 
#> === allcache ===
#> check data range, mean etc.
#> get all percentiles for plotting distribution shape
#> list the upper and lower permille of values
#> OK, for some of these values I want to see the complete ROW, so I need their positions in the data.frame
#> check if any values are duplicated
#> since not unique, then check distribution of frequencies
#> OK, let's plot the percentiles of unique values versus the percentiles allowing for duplicates
#> check whether we find a match for each fact in the dimension table
#> check whether there are any dimension table entries not in the fact table
#> check whether we find a match for each fact in a parallel fact table
#> find positions of facts in dimension table for joining
#> find positions of facts in parallel fact table for joining
#> out of curiosity: how well rank-correlated are fact and parallel fact table?
#>         32-bit 64-bit hashcache sortordercache ordercache allcache
#> seconds  0.011  0.007     0.006          0.008      0.016    0.001
#> factor   1.000  1.571     1.833          1.375      0.687   11.000
#> 
#> now let's look more systematically at the components involved
#> 32-bit match(s,b)
#> 32-bit s %in% b
#> 32-bit match(b,s)
#> 32-bit b %in% s
#> 32-bit match(b,b)
#> 32-bit b %in% b
#> 32-bit duplicated(b)
#> 32-bit unique(b)
#> 32-bit table(b)
#> 32-bit sort(b)
#> 32-bit order(b)
#> 32-bit rank(b)
#> 32-bit quantile(b)
#> 32-bit summary(b)
#> seconds              32-bit 64-bit hashcache sortordercache ordercache allcache
#> cache          0.000  0.000     0.000          0.000      0.000    0.000
#> match(s,b)     0.000  0.000     0.000          0.000      0.000    0.000
#> s %in% b       0.000  0.000     0.000          0.000      0.000    0.000
#> match(b,s)     0.000  0.000     0.000          0.000      0.000    0.000
#> b %in% s       0.001  0.000     0.000          0.000      0.000    0.000
#> match(b,b)     0.000  0.000     0.000          0.000      0.000    0.000
#> b %in% b       0.001  0.000     0.000          0.000      0.000    0.000
#> duplicated(b)  0.000  0.000     0.000          0.000      0.000    0.000
#> unique(b)      0.001  0.000     0.000          0.000      0.000    0.000
#> table(b)       0.003  0.000     0.000          0.000      0.000    0.000
#> sort(b)        0.000  0.000     0.000          0.000      0.000    0.000
#> order(b)       0.001  0.000     0.000          0.000      0.000    0.000
#> rank(b)        0.001  0.000     0.000          0.000      0.000    0.000
#> quantile(b)    0.000  0.000     0.000          0.000      0.000    0.000
#> summary(b)     0.000  0.000     0.000          0.000      0.000    0.000
#> SESSION        0.011  0.007     0.006          0.008      0.016    0.001
#> factor              32-bit 64-bit hashcache sortordercache ordercache allcache
#> cache            NaN    NaN       NaN            NaN        NaN      NaN
#> match(s,b)       NaN    NaN       NaN            NaN        NaN      NaN
#> s %in% b         NaN    NaN       NaN            NaN        NaN      NaN
#> match(b,s)       NaN    NaN       NaN            NaN        NaN      NaN
#> b %in% s           1    Inf       Inf            Inf        Inf      Inf
#> match(b,b)       NaN    NaN       NaN            NaN        NaN      NaN
#> b %in% b           1    Inf       Inf            Inf        Inf      Inf
#> duplicated(b)    NaN    NaN       NaN            NaN        NaN      NaN
#> unique(b)          1    Inf       Inf            Inf        Inf      Inf
#> table(b)           1    Inf       Inf            Inf        Inf      Inf
#> sort(b)          NaN    NaN       NaN            NaN        NaN      NaN
#> order(b)           1    Inf       Inf            Inf        Inf      Inf
#> rank(b)            1    Inf       Inf            Inf        Inf      Inf
#> quantile(b)      NaN    NaN       NaN            NaN        NaN      NaN
#> summary(b)       NaN    NaN       NaN            NaN        NaN      NaN
#> SESSION            1  1.571     1.833          1.375      0.687       11
#> 64-bit match(s,b)
#> 64-bit s %in% b
#> 64-bit match(b,s)
#> 64-bit b %in% s
#> 64-bit match(b,b)
#> 64-bit b %in% b
#> 64-bit duplicated(b)
#> 64-bit unique(b)
#> 64-bit table(b)
#> 64-bit sort(b)
#> 64-bit order(b)
#> 64-bit rank(b)
#> 64-bit quantile(b)
#> 64-bit summary(b)
#> seconds              32-bit 64-bit hashcache sortordercache ordercache allcache
#> cache          0.000  0.000     0.000          0.000      0.000    0.000
#> match(s,b)     0.000  0.000     0.000          0.000      0.000    0.000
#> s %in% b       0.000  0.000     0.000          0.000      0.000    0.000
#> match(b,s)     0.000  0.000     0.000          0.000      0.000    0.000
#> b %in% s       0.001  0.000     0.000          0.000      0.000    0.000
#> match(b,b)     0.000  0.000     0.000          0.000      0.000    0.000
#> b %in% b       0.001  0.000     0.000          0.000      0.000    0.000
#> duplicated(b)  0.000  0.000     0.000          0.000      0.000    0.000
#> unique(b)      0.001  0.000     0.000          0.000      0.000    0.000
#> table(b)       0.003  0.000     0.000          0.000      0.000    0.000
#> sort(b)        0.000  0.000     0.000          0.000      0.000    0.000
#> order(b)       0.001  0.000     0.000          0.000      0.000    0.000
#> rank(b)        0.001  0.000     0.000          0.000      0.000    0.000
#> quantile(b)    0.000  0.000     0.000          0.000      0.000    0.000
#> summary(b)     0.000  0.000     0.000          0.000      0.000    0.000
#> SESSION        0.011  0.007     0.006          0.008      0.016    0.001
#> factor              32-bit 64-bit hashcache sortordercache ordercache allcache
#> cache            NaN    NaN       NaN            NaN        NaN      NaN
#> match(s,b)       NaN    NaN       NaN            NaN        NaN      NaN
#> s %in% b         NaN    NaN       NaN            NaN        NaN      NaN
#> match(b,s)       NaN    NaN       NaN            NaN        NaN      NaN
#> b %in% s           1    Inf       Inf            Inf        Inf      Inf
#> match(b,b)       NaN    NaN       NaN            NaN        NaN      NaN
#> b %in% b           1    Inf       Inf            Inf        Inf      Inf
#> duplicated(b)    NaN    NaN       NaN            NaN        NaN      NaN
#> unique(b)          1    Inf       Inf            Inf        Inf      Inf
#> table(b)           1    Inf       Inf            Inf        Inf      Inf
#> sort(b)          NaN    NaN       NaN            NaN        NaN      NaN
#> order(b)           1    Inf       Inf            Inf        Inf      Inf
#> rank(b)            1    Inf       Inf            Inf        Inf      Inf
#> quantile(b)      NaN    NaN       NaN            NaN        NaN      NaN
#> summary(b)       NaN    NaN       NaN            NaN        NaN      NaN
#> SESSION            1  1.571     1.833          1.375      0.687       11
#> hashcache cache
#> hashcache match(s,b)
#> hashcache s %in% b
#> hashcache match(b,s)
#> hashcache b %in% s
#> hashcache match(b,b)
#> hashcache b %in% b
#> hashcache duplicated(b)
#> hashcache unique(b)
#> hashcache table(b)
#> hashcache sort(b)
#> hashcache order(b)
#> hashcache rank(b)
#> hashcache quantile(b)
#> hashcache summary(b)
#> seconds              32-bit 64-bit hashcache sortordercache ordercache allcache
#> cache          0.000  0.000     0.000          0.000      0.000    0.000
#> match(s,b)     0.000  0.000     0.000          0.000      0.000    0.000
#> s %in% b       0.000  0.000     0.000          0.000      0.000    0.000
#> match(b,s)     0.000  0.000     0.000          0.000      0.000    0.000
#> b %in% s       0.001  0.000     0.000          0.000      0.000    0.000
#> match(b,b)     0.000  0.000     0.000          0.000      0.000    0.000
#> b %in% b       0.001  0.000     0.000          0.000      0.000    0.000
#> duplicated(b)  0.000  0.000     0.000          0.000      0.000    0.000
#> unique(b)      0.001  0.000     0.000          0.000      0.000    0.000
#> table(b)       0.003  0.000     0.000          0.000      0.000    0.000
#> sort(b)        0.000  0.000     0.001          0.000      0.000    0.000
#> order(b)       0.001  0.000     0.000          0.000      0.000    0.000
#> rank(b)        0.001  0.000     0.000          0.000      0.000    0.000
#> quantile(b)    0.000  0.000     0.000          0.000      0.000    0.000
#> summary(b)     0.000  0.000     0.001          0.000      0.000    0.000
#> SESSION        0.011  0.007     0.006          0.008      0.016    0.001
#> factor              32-bit 64-bit hashcache sortordercache ordercache allcache
#> cache            NaN    NaN       NaN            NaN        NaN      NaN
#> match(s,b)       NaN    NaN       NaN            NaN        NaN      NaN
#> s %in% b         NaN    NaN       NaN            NaN        NaN      NaN
#> match(b,s)       NaN    NaN       NaN            NaN        NaN      NaN
#> b %in% s           1    Inf       Inf            Inf        Inf      Inf
#> match(b,b)       NaN    NaN       NaN            NaN        NaN      NaN
#> b %in% b           1    Inf       Inf            Inf        Inf      Inf
#> duplicated(b)    NaN    NaN       NaN            NaN        NaN      NaN
#> unique(b)          1    Inf       Inf            Inf        Inf      Inf
#> table(b)           1    Inf       Inf            Inf        Inf      Inf
#> sort(b)          NaN    NaN     0.000            NaN        NaN      NaN
#> order(b)           1    Inf       Inf            Inf        Inf      Inf
#> rank(b)            1    Inf       Inf            Inf        Inf      Inf
#> quantile(b)      NaN    NaN       NaN            NaN        NaN      NaN
#> summary(b)       NaN    NaN     0.000            NaN        NaN      NaN
#> SESSION            1  1.571     1.833          1.375      0.687       11
#> sortordercache cache
#> sortordercache match(s,b)
#> sortordercache s %in% b
#> sortordercache match(b,s)
#> sortordercache b %in% s
#> sortordercache match(b,b)
#> sortordercache b %in% b
#> sortordercache duplicated(b)
#> sortordercache unique(b)
#> sortordercache table(b)
#> sortordercache sort(b)
#> sortordercache order(b)
#> sortordercache rank(b)
#> sortordercache quantile(b)
#> sortordercache summary(b)
#> seconds              32-bit 64-bit hashcache sortordercache ordercache allcache
#> cache          0.000  0.000     0.000          0.001      0.000    0.000
#> match(s,b)     0.000  0.000     0.000          0.000      0.000    0.000
#> s %in% b       0.000  0.000     0.000          0.000      0.000    0.000
#> match(b,s)     0.000  0.000     0.000          0.000      0.000    0.000
#> b %in% s       0.001  0.000     0.000          0.000      0.000    0.000
#> match(b,b)     0.000  0.000     0.000          0.000      0.000    0.000
#> b %in% b       0.001  0.000     0.000          0.000      0.000    0.000
#> duplicated(b)  0.000  0.000     0.000          0.000      0.000    0.000
#> unique(b)      0.001  0.000     0.000          0.000      0.000    0.000
#> table(b)       0.003  0.000     0.000          0.001      0.000    0.000
#> sort(b)        0.000  0.000     0.001          0.000      0.000    0.000
#> order(b)       0.001  0.000     0.000          0.000      0.000    0.000
#> rank(b)        0.001  0.000     0.000          0.000      0.000    0.000
#> quantile(b)    0.000  0.000     0.000          0.000      0.000    0.000
#> summary(b)     0.000  0.000     0.001          0.000      0.000    0.000
#> SESSION        0.011  0.007     0.006          0.008      0.016    0.001
#> factor              32-bit 64-bit hashcache sortordercache ordercache allcache
#> cache            NaN    NaN       NaN          0.000        NaN      NaN
#> match(s,b)       NaN    NaN       NaN            NaN        NaN      NaN
#> s %in% b         NaN    NaN       NaN            NaN        NaN      NaN
#> match(b,s)       NaN    NaN       NaN            NaN        NaN      NaN
#> b %in% s           1    Inf       Inf            Inf        Inf      Inf
#> match(b,b)       NaN    NaN       NaN            NaN        NaN      NaN
#> b %in% b           1    Inf       Inf            Inf        Inf      Inf
#> duplicated(b)    NaN    NaN       NaN            NaN        NaN      NaN
#> unique(b)          1    Inf       Inf            Inf        Inf      Inf
#> table(b)           1    Inf       Inf          3.000        Inf      Inf
#> sort(b)          NaN    NaN     0.000            NaN        NaN      NaN
#> order(b)           1    Inf       Inf            Inf        Inf      Inf
#> rank(b)            1    Inf       Inf            Inf        Inf      Inf
#> quantile(b)      NaN    NaN       NaN            NaN        NaN      NaN
#> summary(b)       NaN    NaN     0.000            NaN        NaN      NaN
#> SESSION            1  1.571     1.833          1.375      0.687       11
#> ordercache cache
#> ordercache match(s,b)
#> ordercache s %in% b
#> ordercache match(b,s)
#> ordercache b %in% s
#> ordercache match(b,b)
#> ordercache b %in% b
#> ordercache duplicated(b)
#> ordercache unique(b)
#> ordercache table(b)
#> ordercache sort(b)
#> ordercache order(b)
#> ordercache rank(b)
#> ordercache quantile(b)
#> ordercache summary(b)
#> seconds              32-bit 64-bit hashcache sortordercache ordercache allcache
#> cache          0.000  0.000     0.000          0.001      0.001    0.000
#> match(s,b)     0.000  0.000     0.000          0.000      0.000    0.000
#> s %in% b       0.000  0.000     0.000          0.000      0.000    0.000
#> match(b,s)     0.000  0.000     0.000          0.000      0.001    0.000
#> b %in% s       0.001  0.000     0.000          0.000      0.000    0.000
#> match(b,b)     0.000  0.000     0.000          0.000      0.000    0.000
#> b %in% b       0.001  0.000     0.000          0.000      0.000    0.000
#> duplicated(b)  0.000  0.000     0.000          0.000      0.000    0.000
#> unique(b)      0.001  0.000     0.000          0.000      0.000    0.000
#> table(b)       0.003  0.000     0.000          0.001      0.000    0.000
#> sort(b)        0.000  0.000     0.001          0.000      0.000    0.000
#> order(b)       0.001  0.000     0.000          0.000      0.000    0.000
#> rank(b)        0.001  0.000     0.000          0.000      0.000    0.000
#> quantile(b)    0.000  0.000     0.000          0.000      0.000    0.000
#> summary(b)     0.000  0.000     0.001          0.000      0.000    0.000
#> SESSION        0.011  0.007     0.006          0.008      0.016    0.001
#> factor              32-bit 64-bit hashcache sortordercache ordercache allcache
#> cache            NaN    NaN       NaN          0.000      0.000      NaN
#> match(s,b)       NaN    NaN       NaN            NaN        NaN      NaN
#> s %in% b         NaN    NaN       NaN            NaN        NaN      NaN
#> match(b,s)       NaN    NaN       NaN            NaN      0.000      NaN
#> b %in% s           1    Inf       Inf            Inf        Inf      Inf
#> match(b,b)       NaN    NaN       NaN            NaN        NaN      NaN
#> b %in% b           1    Inf       Inf            Inf        Inf      Inf
#> duplicated(b)    NaN    NaN       NaN            NaN        NaN      NaN
#> unique(b)          1    Inf       Inf            Inf        Inf      Inf
#> table(b)           1    Inf       Inf          3.000        Inf      Inf
#> sort(b)          NaN    NaN     0.000            NaN        NaN      NaN
#> order(b)           1    Inf       Inf            Inf        Inf      Inf
#> rank(b)            1    Inf       Inf            Inf        Inf      Inf
#> quantile(b)      NaN    NaN       NaN            NaN        NaN      NaN
#> summary(b)       NaN    NaN     0.000            NaN        NaN      NaN
#> SESSION            1  1.571     1.833          1.375      0.687       11
#> allcache cache
#> allcache match(s,b)
#> allcache s %in% b
#> allcache match(b,s)
#> allcache b %in% s
#> allcache match(b,b)
#> allcache b %in% b
#> allcache duplicated(b)
#> allcache unique(b)
#> allcache table(b)
#> allcache sort(b)
#> allcache order(b)
#> allcache rank(b)
#> allcache quantile(b)
#> allcache summary(b)
#> seconds              32-bit 64-bit hashcache sortordercache ordercache allcache
#> cache          0.000  0.000     0.000          0.001      0.001    0.001
#> match(s,b)     0.000  0.000     0.000          0.000      0.000    0.000
#> s %in% b       0.000  0.000     0.000          0.000      0.000    0.000
#> match(b,s)     0.000  0.000     0.000          0.000      0.001    0.000
#> b %in% s       0.001  0.000     0.000          0.000      0.000    0.000
#> match(b,b)     0.000  0.000     0.000          0.000      0.000    0.000
#> b %in% b       0.001  0.000     0.000          0.000      0.000    0.000
#> duplicated(b)  0.000  0.000     0.000          0.000      0.000    0.000
#> unique(b)      0.001  0.000     0.000          0.000      0.000    0.000
#> table(b)       0.003  0.000     0.000          0.001      0.000    0.001
#> sort(b)        0.000  0.000     0.001          0.000      0.000    0.000
#> order(b)       0.001  0.000     0.000          0.000      0.000    0.000
#> rank(b)        0.001  0.000     0.000          0.000      0.000    0.000
#> quantile(b)    0.000  0.000     0.000          0.000      0.000    0.000
#> summary(b)     0.000  0.000     0.001          0.000      0.000    0.000
#> SESSION        0.011  0.007     0.006          0.008      0.016    0.001
#> factor              32-bit 64-bit hashcache sortordercache ordercache allcache
#> cache            NaN    NaN       NaN          0.000      0.000        0
#> match(s,b)       NaN    NaN       NaN            NaN        NaN      NaN
#> s %in% b         NaN    NaN       NaN            NaN        NaN      NaN
#> match(b,s)       NaN    NaN       NaN            NaN      0.000      NaN
#> b %in% s           1    Inf       Inf            Inf        Inf      Inf
#> match(b,b)       NaN    NaN       NaN            NaN        NaN      NaN
#> b %in% b           1    Inf       Inf            Inf        Inf      Inf
#> duplicated(b)    NaN    NaN       NaN            NaN        NaN      NaN
#> unique(b)          1    Inf       Inf            Inf        Inf      Inf
#> table(b)           1    Inf       Inf          3.000        Inf        3
#> sort(b)          NaN    NaN     0.000            NaN        NaN      NaN
#> order(b)           1    Inf       Inf            Inf        Inf      Inf
#> rank(b)            1    Inf       Inf            Inf        Inf      Inf
#> quantile(b)      NaN    NaN       NaN            NaN        NaN      NaN
#> summary(b)       NaN    NaN     0.000            NaN        NaN      NaN
#> SESSION            1  1.571     1.833          1.375      0.687       11
#>               32-bit 64-bit hashcache sortordercache ordercache allcache
#> cache          0.000  0.000     0.000          0.001      0.001    0.001
#> match(s,b)     0.000  0.000     0.000          0.000      0.000    0.000
#> s %in% b       0.000  0.000     0.000          0.000      0.000    0.000
#> match(b,s)     0.000  0.000     0.000          0.000      0.001    0.000
#> b %in% s       0.001  0.000     0.000          0.000      0.000    0.000
#> match(b,b)     0.000  0.000     0.000          0.000      0.000    0.000
#> b %in% b       0.001  0.000     0.000          0.000      0.000    0.000
#> duplicated(b)  0.000  0.000     0.000          0.000      0.000    0.000
#> unique(b)      0.001  0.000     0.000          0.000      0.000    0.000
#> table(b)       0.003  0.000     0.000          0.001      0.000    0.001
#> sort(b)        0.000  0.000     0.001          0.000      0.000    0.000
#> order(b)       0.001  0.000     0.000          0.000      0.000    0.000
#> rank(b)        0.001  0.000     0.000          0.000      0.000    0.000
#> quantile(b)    0.000  0.000     0.000          0.000      0.000    0.000
#> summary(b)     0.000  0.000     0.001          0.000      0.000    0.000
#> SESSION        0.011  0.007     0.006          0.008      0.016    0.001
optimizer64(nsmall=2^7, nbig=2^13, timefun=function(expr)system.time(expr, gcFirst=FALSE)
, plot=FALSE
)
#> match: timings of different methods
#> %in%: timings of different methods
#> duplicated: timings of different methods
#> unique: timings of different methods
#> unipos: timings of different methods
#> table: timings of different methods
#> rank: timings of different methods
#> quantile: timings of different methods
#>            128        8192      
#> match      numeric,27 numeric,27
#> %in%       numeric,30 numeric,30
#> duplicated numeric,30 numeric,30
#> unique     numeric,45 numeric,45
#> unipos     numeric,42 numeric,42
#> table      numeric,39 numeric,39
#> rank       numeric,21 numeric,21
#> quantile   numeric,18 numeric,18
if (FALSE) { # \dontrun{
message("for real measurement of sufficiently large datasets run this on your machine")
benchmark64()
optimizer64()
} # }
message("let's look at the performance results on Core i7 Lenovo T410 with 8 GB RAM")
#> let's look at the performance results on Core i7 Lenovo T410 with 8 GB RAM
data(benchmark64.data)
print(benchmark64.data)
#>                     32-bit       64-bit    hashcache sortordercache
#> cache         2.546532e-05 1.522116e-05  3.340000000   4.900000e+00
#> match(s,b)    2.370000e+00 6.300000e-01  0.004950495   3.785714e-02
#> s %in% b      2.390000e+00 6.700000e-01  0.004761905   3.333333e-02
#> match(b,s)    1.280000e+00 6.200000e-01  0.630000000   6.400000e-01
#> b %in% s      1.390000e+00 6.300000e-01  0.630000000   6.400000e-01
#> match(b,b)    7.020000e+00 3.860000e+00  2.260000000   4.230000e+00
#> b %in% b      7.420000e+00 3.790000e+00  2.200000000   4.130000e+00
#> duplicated(b) 2.620000e+00 2.090000e+00  0.500000000   6.100000e-01
#> unique(b)     2.860000e+00 2.090000e+00  1.830000000   7.500000e-01
#> table(b)      5.105700e+02 2.260000e+00  2.430000000   5.000000e-01
#> sort(b)       8.420000e+00 1.610000e+00  1.590000000   2.233333e-01
#> order(b)      5.349000e+01 2.590000e+00  2.540000000   1.120000e-01
#> rank(b)       5.614000e+01 3.200000e+00  3.280000000   9.000000e-01
#> quantile(b)   9.600000e-01 1.590000e+00  1.600000000   6.318348e-04
#> summary(b)    1.640000e+00 1.640000e+00  1.670000000   4.727273e-02
#> SESSION       8.499700e+02 3.680002e+01 33.954807692   2.131084e+01
#>                 ordercache     allcache
#> cache         23.620000000 8.590000e+00
#> match(s,b)     0.166666667 4.950495e-03
#> s %in% b       0.170000000 4.761905e-03
#> match(b,s)     0.620000000 6.200000e-01
#> b %in% s       0.630000000 6.300000e-01
#> match(b,b)     4.260000000 2.280000e+00
#> b %in% b       4.180000000 2.210000e+00
#> duplicated(b)  1.760000000 6.000000e-01
#> unique(b)      1.890000000 1.820000e+00
#> table(b)       2.360000000 5.300000e-01
#> sort(b)        1.510000000 1.833333e-01
#> order(b)       0.083333333 9.333333e-02
#> rank(b)        2.910000000 9.200000e-01
#> quantile(b)    0.000618047 6.082725e-04
#> summary(b)     0.047272727 4.818182e-02
#> SESSION       49.277285479 2.187034e+01

matplot(log2(benchmark64.data[-1,1]/benchmark64.data[-1,])
, pch=c("3", "6", "h", "s", "o", "a")
, xlab="tasks [last=session]"
, ylab="log2(relative speed) [bigger is better]"
)

matplot(t(log2(benchmark64.data[-1,1]/benchmark64.data[-1,]))
, type="b", axes=FALSE
, lwd=c(rep(1, 14), 3)
, xlab="context"
, ylab="log2(relative speed) [bigger is better]"
)
axis(1
, labels=c("32-bit", "64-bit", "hash", "sortorder", "order", "hash+sortorder")
, at=1:6
)
axis(2)

data(optimizer64.data)
print(optimizer64.data)
#>            65536      33554432  
#> match      numeric,27 numeric,27
#> %in%       numeric,30 numeric,30
#> duplicated numeric,30 numeric,30
#> unique     numeric,45 numeric,45
#> unipos     numeric,42 numeric,42
#> table      numeric,39 numeric,39
#> rank       numeric,21 numeric,21
#> quantile   numeric,18 numeric,18
oldpar <- par(no.readonly = TRUE)
par(mfrow=c(2,1))
par(cex=0.7)
for (i in seq_len(nrow(optimizer64.data))) {
  for (j in 1:2) {
    tim <- optimizer64.data[[i, j]]
    barplot(t(tim))
    if (rownames(optimizer64.data)[i] == "match")
      title(paste("match", colnames(optimizer64.data)[j], "in", colnames(optimizer64.data)[3 - j]))
    else if (rownames(optimizer64.data)[i] == "%in%")
      title(paste(colnames(optimizer64.data)[j], "%in%", colnames(optimizer64.data)[3 - j]))
    else
      title(paste(rownames(optimizer64.data)[i], colnames(optimizer64.data)[j]))
  }
}
par(oldpar)  # restore all graphical parameters saved above, including cex