Kiran Kankipati
Today I wrote some code to benchmark memory operations,
mainly memcpy(), to compare my FreeNAS hardware (CPU, memory, and overall
software optimization) against my Linux PC hardware.
Of course, I haven't done Linux kernel <> FreeNAS (i.e. FreeBSD) kernel tests.
For the sake of simplicity, for now I only compared user-space <> user-space memory performance.
I am sharing my source code, which is quite simple and does the job.
The code is generic and platform-independent, which means you can compile it on any Linux/FreeBSD OS platform,
as well as for any hardware platform, as you can see below.
Code:
/* The Linux Channel
 * Code to benchmark multi-platform memcpy(): For Episode: 0x173 NAS OS | FreeNAS Server Hardware | Memory Performance | memcpy()
 * Author: Kiran Kankipati
 * Updated: 17-Oct-2018
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/time.h>

#define MBx10 10485760 //10 MB in bytes

unsigned char *buf1;
unsigned char *buf2;

void __memcpy_test(int mb)
{
    struct timeval start, end;
    unsigned long long secs_used, millis_used, micros_used;

    memset(buf1, (0xaa+mb), MBx10*mb); //to avoid gcc or any compiler optimization
    gettimeofday(&start, NULL);
    memcpy(buf2, buf1, MBx10*mb);
    gettimeofday(&end, NULL);

    secs_used = (end.tv_sec - start.tv_sec);
    micros_used = ((secs_used*1000000) + end.tv_usec) - (start.tv_usec);
    millis_used = micros_used/1000;
    printf("%d0MB - %llu (ms) - %llu (μs)\n\n", mb, millis_used, micros_used);
}

int main(void)
{
    int i;

    buf1 = (unsigned char *)malloc(sizeof(char)*MBx10*100);
    buf2 = (unsigned char *)malloc(sizeof(char)*MBx10*100);
    if (buf1 == NULL || buf2 == NULL) { //stop if either 1000 MB buffer could not be allocated
        fprintf(stderr, "malloc failed\n");
        free(buf1);
        free(buf2);
        return 1;
    }

    __memcpy_test(1); //10 MB
    __memcpy_test(2); //20 MB
    __memcpy_test(3); //30 MB
    __memcpy_test(4); //40 MB
    __memcpy_test(5); //50 MB
    __memcpy_test(6); //60 MB
    __memcpy_test(7); //70 MB
    __memcpy_test(8); //80 MB
    __memcpy_test(9); //90 MB
    __memcpy_test(10); //100 MB
    printf("------\n");
    __memcpy_test(30); //300 MB
    __memcpy_test(50); //500 MB
    printf("------\n");
    for(i=0;i<10;i++) { __memcpy_test(50); } //500 MB, repeated 10 times

    free(buf1);
    free(buf2);
    return 0;
}
To compile it:
$ gcc -o memcpy_benchmark memcpy_benchmark.c
Running it should give you benchmark results like the ones shown below (I recommend watching my video to interpret them and to customize your tests further). These are from my Ubuntu desktop PC: an Intel Core i7 5820K with a 4x 4GB Corsair DDR4 RAM kit.
Code:
kiran@3TBWDBLUE:/code/thelinuxchannel/memcpy_benchmark$ ./memcpy_benchmark
10MB - 5 (ms) - 5310 (μs)
20MB - 7 (ms) - 7393 (μs)
30MB - 8 (ms) - 8448 (μs)
40MB - 9 (ms) - 9359 (μs)
50MB - 10 (ms) - 10114 (μs)
60MB - 10 (ms) - 10873 (μs)
70MB - 11 (ms) - 11834 (μs)
80MB - 13 (ms) - 13032 (μs)
90MB - 13 (ms) - 13771 (μs)
100MB - 14 (ms) - 14173 (μs)
------
300MB - 93 (ms) - 93776 (μs)
500MB - 112 (ms) - 112451 (μs)
------
500MB - 49 (ms) - 49060 (μs)
500MB - 48 (ms) - 48724 (μs)
500MB - 49 (ms) - 49048 (μs)
500MB - 48 (ms) - 48871 (μs)
500MB - 49 (ms) - 49158 (μs)
500MB - 50 (ms) - 50161 (μs)
500MB - 48 (ms) - 48836 (μs)
500MB - 49 (ms) - 49197 (μs)
500MB - 48 (ms) - 48714 (μs)
500MB - 49 (ms) - 49001 (μs)
... and here is the same code built as a FreeBSD binary, running on my FreeNAS physical hardware (production server).
This is an AMD Ryzen 3 1200 FreeNAS server with 1x 8GB Corsair DDR4 RAM.
Code:
[kiran@TITAN-NAS /mnt/TITAN-NAS-BIG-POOL/Utilities]$ ./memcpy_benchmark_freebsd
10MB - 5 (ms) - 5739 (μs)
20MB - 8 (ms) - 8361 (μs)
30MB - 10 (ms) - 10919 (μs)
40MB - 13 (ms) - 13389 (μs)
50MB - 15 (ms) - 15839 (μs)
60MB - 18 (ms) - 18404 (μs)
70MB - 20 (ms) - 20842 (μs)
80MB - 23 (ms) - 23405 (μs)
90MB - 25 (ms) - 25866 (μs)
100MB - 28 (ms) - 28322 (μs)
------
300MB - 91 (ms) - 91293 (μs)
500MB - 135 (ms) - 135126 (μs)
------
500MB - 109 (ms) - 109486 (μs)
500MB - 109 (ms) - 109606 (μs)
500MB - 109 (ms) - 109657 (μs)
500MB - 109 (ms) - 109674 (μs)
500MB - 110 (ms) - 110070 (μs)
500MB - 109 (ms) - 109809 (μs)
500MB - 109 (ms) - 109918 (μs)
500MB - 109 (ms) - 109989 (μs)
500MB - 109 (ms) - 109801 (μs)
500MB - 109 (ms) - 109327 (μs)
[kiran@TITAN-NAS /mnt/TITAN-NAS-BIG-POOL/Utilities]$
So if you are a systems programmer, you can re-purpose this code and use it in other applications.
If you are a systems admin, you can try this on multiple server platforms to assess and address
bottlenecks in areas such as filesystem performance (like ZFS in FreeNAS), file transfers, network performance,
VPN performance, encrypted filesystems, encrypted ZFS pools, compressed ZFS pools, dedup, and so on.
The main intent of this code is to test overall CPU + memory performance, and CPU<>RAM performance in particular.
Most filesystem/storage and networking stacks do memory operations almost all the time. And since ZFS loads data into memory and keeps it there at runtime, CPU<>memory performance is crucial for high-performance applications and filesystem tuning.
You should be able to improve on my benchmark numbers with all kinds of tweaks: CPU cooling, CPU cache size and cache optimization (versus a poorly designed CPU architecture), the CPU<>RAM interconnect (i.e. DDR4 vs DDR3, Intel QPI vs AMD Infinity Fabric), RAM frequency, memory DIMM hardware specs, RAM latency, and dual vs quad channel. All of these will result in varying performance.
As usual, I have documented all of this in depth in my YouTube video episode (The Linux Channel).
IMPORTANT TIP:
Although you can write and run simple bash/shell scripts on a FreeNAS server, you cannot compile C/C++ code on it. That is when you can try this method: install a full-fledged FreeBSD build system, install the build tools (a.k.a. toolchain) such as the gcc/g++ compilers, and compile your C/C++ code there. Then take the binaries and run them directly on any FreeNAS system. This is how systems software product development is done for network routers, smartphones, embedded devices, and so on.

In my case, to get a native x86 FreeBSD binary to run on my FreeNAS physical production server, I installed an x86 FreeBSD build server VM on my x86 Ubuntu PC hardware (refer to my video for more details).
