It occurs to me that we should consider improving
the performance of NCDEFAULT_get/put_vars since they
are still being used for some dispatch tables.
The current implementations operate by reading/writing
one element at a time, which is seriously inefficient.
We should explore some alternative implementations, such as the following:
- Instead of reading one element at a time, we allocate a larger
block of memory that covers multiple strided elements at a time.
- The stride code then reads one of these blocks and extracts the
relevant strided elements.
- The cost is the memory for the block (allocated per call to get_vars).
Should we keep a block cache instead?
- For writing, we would read the block, insert the
strided elements, and then write out the whole block.
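
To make the proposal concrete, here is a rough sketch in C of the block approach for a single dimension. It is not a patch against the library: FakeVar, read_contig, write_contig, block_len, get_vars_blocked and put_vars_blocked are all hypothetical names made up for illustration, and the flat int array stands in for whatever contiguous read/write the dispatch table actually provides. It assumes stride > 0 and ignores multiple dimensions, type conversion and size_t overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one 1-D variable's storage. */
typedef struct {
    int   *data;    /* element storage */
    size_t len;     /* number of elements */
    size_t nreads;  /* contiguous reads issued */
    size_t nwrites; /* contiguous writes issued */
} FakeVar;

/* Hypothetical contiguous read of count elements starting at start. */
static int read_contig(FakeVar *v, size_t start, size_t count, int *out)
{
    if (start > v->len || count > v->len - start) return -1;
    memcpy(out, v->data + start, count * sizeof(int));
    v->nreads++;
    return 0;
}

/* Hypothetical contiguous write of count elements starting at start. */
static int write_contig(FakeVar *v, size_t start, size_t count, const int *in)
{
    if (start > v->len || count > v->len - start) return -1;
    memcpy(v->data + start, in, count * sizeof(int));
    v->nwrites++;
    return 0;
}

/* Length of the next block: capped by the block size, the end of the
 * variable, and the span still needed. Returns 0 if first is out of range.
 * Callers guarantee remaining >= 1. */
static size_t block_len(const FakeVar *v, size_t first, size_t remaining,
                        size_t stride, size_t block_elems)
{
    if (first >= v->len) return 0;
    size_t n = v->len - first;
    size_t span = (remaining - 1) * stride + 1;
    if (n > block_elems) n = block_elems;
    if (n > span) n = span;
    return n;
}

/* Read count elements at start, start+stride, ... one block at a time. */
static int get_vars_blocked(FakeVar *v, size_t start, size_t count,
                            size_t stride, size_t block_elems, int *out)
{
    assert(stride > 0 && block_elems > 0);
    int *block = malloc(block_elems * sizeof *block);
    if (block == NULL) return -1;
    int status = 0;
    size_t done = 0;
    while (done < count) {
        /* Each block starts exactly on the next wanted element. */
        size_t first = start + done * stride;
        size_t n = block_len(v, first, count - done, stride, block_elems);
        if (n == 0 || read_contig(v, first, n, block) != 0) { status = -1; break; }
        for (size_t off = 0; off < n; off += stride)
            out[done++] = block[off];      /* extract strided elements */
    }
    free(block);
    return status;
}

/* Write count elements at start, start+stride, ... via read-modify-write. */
static int put_vars_blocked(FakeVar *v, size_t start, size_t count,
                            size_t stride, size_t block_elems, const int *in)
{
    assert(stride > 0 && block_elems > 0);
    int *block = malloc(block_elems * sizeof *block);
    if (block == NULL) return -1;
    int status = 0;
    size_t done = 0;
    while (done < count) {
        size_t first = start + done * stride;
        size_t n = block_len(v, first, count - done, stride, block_elems);
        if (n == 0 || read_contig(v, first, n, block) != 0) { status = -1; break; }
        for (size_t off = 0; off < n; off += stride)
            block[off] = in[done++];       /* insert strided elements */
        if (write_contig(v, first, n, block) != 0) { status = -1; break; }
    }
    free(block);
    return status;
}
```

With a 100-element variable, start 3, stride 7, count 10 and a 32-element block, this issues two contiguous reads instead of ten single-element ones. Two caveats with the sketch: when the stride exceeds the block size it degenerates to one read per element, and the write path rewrites the untouched elements between strides, which may matter if something else can modify the variable in between.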
Does anyone have any other suggestions for speeding this up?