I learned that memset(ptr, 0, nbytes) is really fast, but is there a faster way (at least on x86)?

I assume that memset uses mov, however when zeroing memory most compilers use xor as it's faster, correct? Edit1: Wrong, as GregS pointed out, that only works with registers. What was I thinking?

Also, I asked a person who knows more about assembler than I do to look at the stdlib, and he told me that on x86 memset is not taking full advantage of the 32-bit-wide registers. However, I was very tired at the time, so I'm not quite sure I understood it correctly.

I revisited this issue and did a little testing. The word-at-a-time test function, zero_sizet, runs one loop of size / sizeof(size_t) iterations, then sets bar = (char*)buff + size - size % sizeof(size_t) and runs a second loop of size % sizeof(size_t) iterations over the leftover bytes. Its comment reads: /* I foolishly assume size_t has register width */

zero_sizet is the fastest, with roughly equal performance across -O1, -O2 and -O3. memset was always slower than zero_sizet. One thing of interest is that at -O3 zero_1 was as fast as zero_sizet; however, the disassembled function had roughly four times as many instructions (I think caused by loop unrolling). Also, I tried optimizing zero_sizet further, but the compiler always outdid me, which is no surprise.

For now memset wins; previous results were distorted by CPU cache. (All tests were run on Linux.) Further testing needed. I'll try assembler next :)

Edit3: Fixed a bug in the test code; test results are not affected.

Edit4: While poking around the disassembled VS2010 C runtime, I noticed that memset has an SSE-optimized routine for zero.

The memset function is designed to be flexible and simple, even at the expense of speed. In many implementations, it is a simple while loop that copies the specified value one byte at a time over the given number of bytes.
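Only the loop headers and the bar computation of zero_sizet survive in the post, so here is a minimal sketch of what the complete function might look like. The loop bodies, the pointer name words, and the helper matches_memset are assumptions added for illustration; they are not taken from the original test code. The sketch also assumes the buffer is aligned for size_t, which holds for memory returned by malloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Word-at-a-time zeroing, following the zero_sizet fragment in the post.
   Assumption: buff is suitably aligned for size_t (e.g. it came from malloc). */
static void zero_sizet(void *buff, size_t size)
{
    size_t *words = buff;   /* "words" is an assumed name */
    size_t i;
    char *bar;

    /* Zero all whole size_t-sized words. */
    for (i = 0; i < size / sizeof(size_t); i++)
        words[i] = 0;

    /* Zero the size % sizeof(size_t) bytes after the last whole word. */
    bar = (char *)buff + size - size % sizeof(size_t);
    for (i = 0; i < size % sizeof(size_t); i++)
        bar[i] = 0;
}

/* Hypothetical helper: returns 1 if zero_sizet and memset leave identical
   buffers. One guard byte past the end would expose an overrun, and the
   extra byte also avoids calling malloc(0). */
static int matches_memset(size_t size)
{
    unsigned char *a = malloc(size + 1);
    unsigned char *b = malloc(size + 1);
    int same;

    assert(a != NULL && b != NULL);
    memset(a, 0xAA, size + 1);
    memset(b, 0xAA, size + 1);

    zero_sizet(a, size);
    memset(b, 0, size);

    same = memcmp(a, b, size + 1) == 0;
    free(a);
    free(b);
    return same;
}
```

Checking the result against memset for a few sizes, including ones that are not a multiple of sizeof(size_t), is a cheap way to catch tail-handling bugs before timing anything. It says nothing about speed; as the post notes, timings depend heavily on the optimization level and on the CPU cache.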