Re: [EdgeX-TSC-Core] CBOR - Re-encode Performance Metrics
I had inadvertently recorded the timing for the last cell in the tables as 7m36s. That figure came from a run of 5000 iterations on the final check, in an attempt to smooth out noise in the average timings between runs. Updated for consistency.
Note that the variance in timings between runs can be reduced by running longer tests. We should not expect the choice of checksum method to impact encode/decode timings, etc.
From: Mosby, Tobias
Sent: Thursday, April 25, 2019 8:19 AM
To: 'Anthony Bonafide' <anthonymbonafide@...>; 'EdgeX-GoLang@...' <EdgeX-GoLang@...>; 'edgex-tsc-core@...' <edgex-tsc-core@...>; 'EdgeX-TSC-Device-Services@...' <EdgeX-TSC-Device-Services@...>
Subject: RE: [EdgeX-TSC-Core] CBOR - Re-encode Performance Metrics
Yes, this is very helpful to see the metrics we can anticipate for re-encode, garbage collection, and memory utilization.
Thank you, Anthony!
Are the timings you measured for re-encode in units of microseconds? I re-ran the tests and was trying to work out the actual value; I will follow up with you.
This morning we discussed sharing anticipated response times for SHA256 vs MD5 checksum hashing algorithms.
Eric also suggested taking a look at xxHash, so it is included below for comparison as well.
· CBOR encode/decode was performed using go-codec v1.1.4, which has not yet been added to the EdgeX codebase. I also found that setting up a buffered reader adversely impacts CBOR encode performance.
· Each run was given the same three payload sizes (small/medium/large), executed across 1k iterations each, with a rolling average used to produce the result set.
· The only intentional difference between runs was the checksum method applied to compute the event/payload hash.
· Takeaway: As seen in Run #3, xxHash.Checksum64 is demonstrably the most performant (a sketch of the three hash calls appears below). If a “weaker” (but speedy) hash such as this were used in place of a UUID and a re-encode of the event model, collision mitigation may need to be addressed at the persistence layer; for example, FIFO ordering to resolve which event record to mark “pushed” if/when distinct events are assigned the same hash value. Ref: https://github.com/Cyan4973/xxHash/issues/165
*The resulting metrics in the tables above must be taken as anecdotal (mileage may vary), since the runs were executed on my laptop in a VM with other services, such as the EdgeX stack, running.
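For reference, here is a minimal sketch of the three hash calls, assuming the OneOfOne/xxhash Go package (which exposes the Checksum64 function referenced above); the payload size is a placeholder:

package main

import (
	"crypto/md5"
	"crypto/sha256"
	"fmt"

	"github.com/OneOfOne/xxhash"
)

func main() {
	payload := make([]byte, 900*1024) // placeholder for the "medium" payload

	sha := sha256.Sum256(payload)    // 32-byte cryptographic digest
	md := md5.Sum(payload)           // 16-byte digest; faster, but cryptographically broken
	xx := xxhash.Checksum64(payload) // 8-byte non-cryptographic hash; fastest of the three

	fmt.Printf("sha256: %x\nmd5: %x\nxxhash64: %x\n", sha, md, xx)
}

The 64-bit output of xxHash is also what makes collisions more plausible than with a 256-bit SHA digest, hence the persistence-layer mitigation noted above.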
On Behalf Of Anthony Bonafide
Last week, during the Core Working Group call, we discussed a couple of ways to address issues with capturing ID and time information when dealing with CBOR content in core-data. One option was to decode the original content, update the necessary data, and re-encode to CBOR before publishing. Before going down that path, we wanted to gather some performance data on the re-encoding process. I have created a GitHub repo which simulates the re-encoding process along the lines of the previously mentioned steps. There are instructions in the README on how to run the benchmark tests, along with a description of the test structure.
In a nutshell, the benchmark accepts two arguments: the first specifies how many iterations to execute, and the second specifies the Event payload size. The options for the payload size are small, medium, and large, which correspond to 100K, 900K, and 12M respectively (to match the results Tobias shared regarding the checksum implementations). The logic of the benchmark follows the decode/update/re-encode steps described above; a rough sketch is below.
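To make that flow concrete, here is a minimal sketch of the decode/update/re-encode loop, assuming go-codec for CBOR and google/uuid for ID generation, with a hypothetical Event struct (field names are illustrative; the real EdgeX event model differs):

package main

import (
	"fmt"
	"time"

	"github.com/google/uuid"
	"github.com/ugorji/go/codec"
)

// Event is a hypothetical stand-in for the EdgeX event model.
type Event struct {
	ID      string `codec:"id"`
	Created int64  `codec:"created"`
	Device  string `codec:"device"`
	Data    []byte `codec:"data"`
}

func main() {
	ch := new(codec.CborHandle)

	// Simulate an inbound CBOR payload (roughly the "small" size).
	original := Event{Device: "sensor-01", Data: make([]byte, 100*1024)}
	var payload []byte
	if err := codec.NewEncoderBytes(&payload, ch).Encode(original); err != nil {
		panic(err)
	}

	// 1. Decode the original content.
	var evt Event
	if err := codec.NewDecoderBytes(payload, ch).Decode(&evt); err != nil {
		panic(err)
	}

	// 2. Update the necessary data (ID and time).
	evt.ID = uuid.New().String()
	evt.Created = time.Now().UnixNano()

	// 3. Re-encode to CBOR before publishing.
	// The byte-slice encoder is used directly; per Tobias's note above,
	// a buffered reader hurt CBOR encode performance in his runs.
	var out []byte
	if err := codec.NewEncoderBytes(&out, ch).Encode(evt); err != nil {
		panic(err)
	}
	fmt.Printf("re-encoded %d bytes\n", len(out))
}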
A run of the tests on my laptop resulted in the following:
The CLI tool provides more information, but those are the numbers I felt might be the most important. Also, there are a few Go benchmark tests in the repo which benchmark encoding, decoding, and the re-encoding process as listed above. Each of the tests is isolated and provides another way to get performance metrics. The instructions for executing the Go benchmark tests can also be found in the Go Benchmark section of the README. Hopefully this helps; if anyone has ideas or suggestions for more and/or better metrics, feel free to reach out.
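For anyone who wants to adapt them, a Go benchmark for the re-encode path might look roughly like the following (the names, fixture, and setup here are my own illustration, not the repo's actual code):

package reencode_test

import (
	"testing"
	"time"

	"github.com/ugorji/go/codec"
)

func BenchmarkReencode(b *testing.B) {
	ch := new(codec.CborHandle)

	// Build a CBOR fixture roughly matching the "small" (100K) payload.
	fixture := map[string]interface{}{"device": "sensor-01", "data": make([]byte, 100*1024)}
	var payload []byte
	if err := codec.NewEncoderBytes(&payload, ch).Encode(fixture); err != nil {
		b.Fatal(err)
	}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// Decode, update ID/time, and re-encode on every iteration.
		var m map[string]interface{}
		if err := codec.NewDecoderBytes(payload, ch).Decode(&m); err != nil {
			b.Fatal(err)
		}
		m["id"] = "updated-id" // stand-in for real UUID generation
		m["created"] = time.Now().UnixNano()
		var out []byte
		if err := codec.NewEncoderBytes(&out, ch).Encode(m); err != nil {
			b.Fatal(err)
		}
	}
}

Running it with go test -bench=Reencode -benchmem also reports allocations per operation, which is useful alongside the garbage-collection and memory numbers discussed above.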