Upgrade to 25GbE (Mellanox ConnectX-4)

pixelwave

Contributor
Joined
Jan 26, 2022
Messages
174
I am in the market to upgrade my network infrastructure to 25GbE.

Therefore I would like to purchase three PCIe cards (Workstation, Server, Backup Node) and after reading the STH article (https://www.servethehome.com/mellanox-connectx-4-lx-mini-review-ubiquitous-25gbe/) I guess price-wise the Mellanox ConnectX-4 (MCX4121A-ACAT) seems like a good solution.

I have used and flashed ConnectX-3 cards in the past with good results, but it is of course a bit hit and miss with the sellers on eBay.

Are there currently counterfeit ConnectX-4 cards in circulation, and is there an (easy) way to identify them? I also see cards marked "Made in Israel" and "Made in China", the China ones having the newer dates / revisions (2018 and later).

Any recommendations of sellers shipping to Europe?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
The 10 Gig network primer says to avoid Mellanox...

 

pixelwave

Contributor
Joined
Jan 26, 2022
Messages
174
I read that but I guess it is more geared towards FreeBSD and the much older Mellanox ConnectX-2 generation.

I am using Windows and Debian (TrueNAS Scale) and already had good results with the X-3 series myself (https://www.truenas.com/community/t...connectx-3-setup-benchmark-and-tuning.105462/).

For logistics and compatibility reasons I want to move away from the older QSFP standard and towards SFP28.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I guess it is more geared towards FreeBSD

You guess wrong. If I want you to think something is geared towards FreeBSD, I will say that it is geared towards FreeBSD. Even if it were geared towards FreeBSD, you should want to maintain compatibility anyways because Linux kinda sucks and in a crisis you might want or need to run CORE; deliberately selecting hardware that doesn't work well with both operating systems is kinda shortsighted. The recommended hardware works swimmingly well with both FreeBSD and Linux, and both Intel and Chelsio have adapters for 25G that use the same drivers as the known-to-work recommended cards, so those are a strong recommendation. Means you can pull your old X710 and drop in an XXV710 and it just works.
 

pixelwave

Contributor
Joined
Jan 26, 2022
Messages
174
In a commercial, pure TrueNAS production environment, sure ... but considering budget, setup and use case I am leaning towards the Mellanox solution. It has worked well so far in my own experience, and other users' feedback leads me to believe the successors (X-4/5/6) will be no different.
 

pixelwave

Contributor
Joined
Jan 26, 2022
Messages
174
I installed one Mellanox ConnectX-4 (MCX4121A-ACAT) in my Workstation (W11Pro) and one into my Server (Proxmox with TrueNAS Scale as VM).

Both are connected to a 10G Managed Switch (USW-Aggregation) with SFP+ DAC cables.

If I upload to my server from the workstation I get the expected 10G speeds:
Upload to TrueNAS (10GbE, NVMe).png



If I download from the server it seems oddly capped at 300 MB/s:
Download from TrueNAS (10GbE, NVMe).png



Both the download and the upload use the same NVMe SMB share on the TrueNAS ... ?!?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
What storage system do you have on the workstation receiving the download? This looks like you're capped by the SATA drive on the desktop. You ideally want to run iperf between RAM drives on both ends to isolate just the network transfer bandwidth.
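A minimal sketch of how the network-only test could be scripted with iperf3, which sends from memory by default and so takes the disks out of the picture. It assumes iperf3 is installed on both ends and that a server (`iperf3 -s`) is already running on the other machine. The helper names (`iperf3_cmd`, `gbits`, `run`) and the address in the note below are made up for illustration.

```python
import json
import subprocess


def iperf3_cmd(server, reverse=False, streams=1, seconds=10):
    """Build an iperf3 client command line that reports as JSON (-J).

    -R reverses the direction, so the server sends and the client receives
    (the "download" case from the client's point of view).
    """
    cmd = ["iperf3", "-c", server, "-t", str(seconds), "-P", str(streams), "-J"]
    if reverse:
        cmd.append("-R")
    return cmd


def gbits(report):
    """Return (sent, received) throughput in Gbit/s from a parsed iperf3 TCP report."""
    end = report["end"]
    return (
        end["sum_sent"]["bits_per_second"] / 1e9,
        end["sum_received"]["bits_per_second"] / 1e9,
    )


def run(server, **kwargs):
    """Run one iperf3 test against `server` and return (sent, received) in Gbit/s."""
    out = subprocess.run(
        iperf3_cmd(server, **kwargs), capture_output=True, text=True, check=True
    )
    return gbits(json.loads(out.stdout))
```

With a hypothetical server at 192.168.1.10, `run("192.168.1.10")` would measure client to server and `run("192.168.1.10", reverse=True)` server to client. Repeating both with `streams=4` is one way to see whether a single TCP stream is the limit.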
 

pixelwave

Contributor
Joined
Jan 26, 2022
Messages
174
That is the weird thing: it is all NVMe on both systems, no spinning rust involved at all.

The Workstation's NVMe speed should not be the bottleneck:
230418_NVMe WD850.png
 

pixelwave

Contributor
Joined
Jan 26, 2022
Messages
174
The switch also shows 10GbE correctly:
USW-AGGREGATION.png


The Workstation also shows 10/10 (Gbps):
Windows Speed.png
 

pixelwave

Contributor
Joined
Jan 26, 2022
Messages
174
Iperf3 results:

Workstation in CLIENT-MODE:
Workstation in Client Mode.png


Server in CLIENT-MODE:
Server in Client Mode.png
 

pixelwave

Contributor
Joined
Jan 26, 2022
Messages
174
I first did some more extensive testing and went one layer up, as suggested by @Samuel Tai.

So setup-wise 3 physical machines:
  • Workstation (Win11-Pro)
  • Proxmox (PVE)
  • Proxmox (PVE-Backup)
All three machines running NVMe only (no HDD bottleneck), equipped with "Mellanox ConnectX-4 (MCX4121A-ACAT) 25 GbE cards" and all three attached to a "USW-Aggregation" (10G Managed Switch, 8x SFP+) via DAC cables (10Gtek CAB-ZSP, 25G SFP28 passive). No other LAN ports attached.

A speed test with iperf3 run directly from the CLI, Proxmox to Proxmox (pve to pve-backup), shows full speed. I tested both devices in server and client mode:
Screenshot 2023-06-06 162852_pve-CM.png


So speeds between the Proxmox machines seem as expected -> full 10 GbE in both directions.

Running iperf3 on the Windows machine, I get the following results (the same against both Proxmox machines). I even swapped the Mellanox card in the Windows machine to exclude a hardware fault, but the behaviour stayed the same.

Windows in client mode:
Screenshot 2023-06-06 163657_windows-CM_to pve and  backup.png


... and Windows machine in server mode:
Screenshot 2023-06-06 164902_windows in server mode.png


... one more thing: very rarely I encountered a few "Retr" (TCP retransmits, pve to pve-backup) ... is that concerning?
Screenshot 2023-06-06 163657_backup-CM_ to pve retr.png

Screenshot 2023-06-06 163657_pve-CM_to backup retr.png



TO SUMMARIZE:

Proxmox to Proxmox -> full 10 Gbit/s up/down
Windows to Proxmox -> 8 Gbit/s upload / 4 Gbit/s download

So it seems that the issue is somewhere on the Windows Machine side ...?
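To put the file-copy figure (MB/s) next to the iperf3 figures (Gbit/s), a quick conversion helps. This is only a sketch: it assumes decimal units (1 MB = 1,000,000 bytes) and ignores protocol overhead, and the function names are made up.

```python
def mbytes_to_gbits(mb_per_s):
    """Convert MB/s (decimal megabytes) to Gbit/s."""
    return mb_per_s * 8 / 1000


def gbits_to_mbytes(gbit_per_s):
    """Convert Gbit/s to MB/s (decimal megabytes)."""
    return gbit_per_s * 1000 / 8


# The capped SMB download reported earlier in the thread.
print(f"300 MB/s file copy = {mbytes_to_gbits(300):.1f} Gbit/s")

# The link rate and the two iperf3 results from the summary.
for rate in (10, 8, 4):
    print(f"{rate} Gbit/s = {gbits_to_mbytes(rate):.0f} MB/s")
```

Under those assumptions the 300 MB/s copy is about 2.4 Gbit/s, which is below even the 4 Gbit/s (500 MB/s) iperf3 download, while a full 10 Gbit/s link would be 1250 MB/s.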
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
the issue is somewhere on the Windows Machine side ...?
Maybe look into card driver/firmware, perhaps a mismatch or older set is present.
 

pixelwave

Contributor
Joined
Jan 26, 2022
Messages
174
What CPU do you have on your windows computer? This testing you are doing is single threaded so IPC matters.
This should be no issue at all:
  • CPU: AMD Ryzen 9 7950X3D, 16C/32T, 4.20-5.70GHz
  • RAM: G.Skill Trident Z5 NEO black DIMM Kit 64GB, DDR5-6000, CL32-38-38-96, on-die ECC (F5-6000J3238G32GX2-TZ5N)
  • MAINBOARD: ASUS ROG Strix X670E-E Gaming WIFI
I was rather expecting the older Proxmox backup machine to perform worse (see specs in signature, backup node) but as said proxmox <-> proxmox = full 10 GbE speed.
 

pixelwave

Contributor
Joined
Jan 26, 2022
Messages
174
Output of the query "mlxconfig -d mt4117_pciconf0 query":

Code:
Device #1:
----------

Device type:    ConnectX4LX
Name:           MCX4121A-ACA_Ax
Description:    ConnectX-4 Lx EN network interface card; 25GbE dual-port SFP28; PCIe3.0 x8; ROHS R6
Device:         mt4117_pciconf0

Configurations:                                      Next Boot
         MEMIC_BAR_SIZE                              0
         MEMIC_SIZE_LIMIT                            _256KB(1)
         FLEX_PARSER_PROFILE_ENABLE                  0
         FLEX_IPV4_OVER_VXLAN_PORT                   0
         ROCE_NEXT_PROTOCOL                          254
         PF_NUM_OF_VF_VALID                          False(0)
         NON_PREFETCHABLE_PF_BAR                     False(0)
         VF_VPD_ENABLE                               False(0)
         STRICT_VF_MSIX_NUM                          False(0)
         VF_NODNIC_ENABLE                            False(0)
         NUM_PF_MSIX_VALID                           True(1)
         NUM_OF_VFS                                  8
         NUM_OF_PF                                   2
         SRIOV_EN                                    True(1)
         PF_LOG_BAR_SIZE                             5
         VF_LOG_BAR_SIZE                             0
         NUM_PF_MSIX                                 63
         NUM_VF_MSIX                                 11
         INT_LOG_MAX_PAYLOAD_SIZE                    AUTOMATIC(0)
         PCIE_CREDIT_TOKEN_TIMEOUT                   0
         MAX_ACC_OUT_READ                            0
         ACCURATE_TX_SCHEDULER                       False(0)
         PARTIAL_RESET_EN                            False(0)
         SW_RECOVERY_ON_ERRORS                       False(0)
         RESET_WITH_HOST_ON_ERRORS                   False(0)
         PCI_BUS0_RESTRICT_SPEED                     PCI_GEN_1(0)
         PCI_BUS0_RESTRICT_ASPM                      False(0)
         PCI_BUS0_RESTRICT_WIDTH                     PCI_X1(0)
         PCI_BUS0_RESTRICT                           False(0)
         PCI_DOWNSTREAM_PORT_OWNER                   Array[0..15]
         CQE_COMPRESSION                             BALANCED(0)
         IP_OVER_VXLAN_EN                            False(0)
         MKEY_BY_NAME                                False(0)
         UCTX_EN                                     True(1)
         PCI_ATOMIC_MODE                             PCI_ATOMIC_DISABLED_EXT_ATOMIC_ENABLED(0)
         TUNNEL_ECN_COPY_DISABLE                     False(0)
         LRO_LOG_TIMEOUT0                            6
         LRO_LOG_TIMEOUT1                            7
         LRO_LOG_TIMEOUT2                            8
         LRO_LOG_TIMEOUT3                            13
         ICM_CACHE_MODE                              DEVICE_DEFAULT(0)
         TX_SCHEDULER_BURST                          0
         LOG_MAX_QUEUE                               17
         LOG_DCR_HASH_TABLE_SIZE                     14
         MAX_PACKET_LIFETIME                         0
         DCR_LIFO_SIZE                               16384
         ROCE_CC_PRIO_MASK_P1                        255
         ROCE_CC_PRIO_MASK_P2                        255
         CLAMP_TGT_RATE_AFTER_TIME_INC_P1            True(1)
         CLAMP_TGT_RATE_P1                           False(0)
         RPG_TIME_RESET_P1                           300
         RPG_BYTE_RESET_P1                           32767
         RPG_THRESHOLD_P1                            1
         RPG_MAX_RATE_P1                             0
         RPG_AI_RATE_P1                              5
         RPG_HAI_RATE_P1                             50
         RPG_GD_P1                                   11
         RPG_MIN_DEC_FAC_P1                          50
         RPG_MIN_RATE_P1                             1
         RATE_TO_SET_ON_FIRST_CNP_P1                 0
         DCE_TCP_G_P1                                1019
         DCE_TCP_RTT_P1                              1
         RATE_REDUCE_MONITOR_PERIOD_P1               4
         INITIAL_ALPHA_VALUE_P1                      1023
         MIN_TIME_BETWEEN_CNPS_P1                    4
         CNP_802P_PRIO_P1                            6
         CNP_DSCP_P1                                 48
         CLAMP_TGT_RATE_AFTER_TIME_INC_P2            True(1)
         CLAMP_TGT_RATE_P2                           False(0)
         RPG_TIME_RESET_P2                           300
         RPG_BYTE_RESET_P2                           32767
         RPG_THRESHOLD_P2                            1
         RPG_MAX_RATE_P2                             0
         RPG_AI_RATE_P2                              5
         RPG_HAI_RATE_P2                             50
         RPG_GD_P2                                   11
         RPG_MIN_DEC_FAC_P2                          50
         RPG_MIN_RATE_P2                             1
         RATE_TO_SET_ON_FIRST_CNP_P2                 0
         DCE_TCP_G_P2                                1019
         DCE_TCP_RTT_P2                              1
         RATE_REDUCE_MONITOR_PERIOD_P2               4
         INITIAL_ALPHA_VALUE_P2                      1023
         MIN_TIME_BETWEEN_CNPS_P2                    4
         CNP_802P_PRIO_P2                            6
         CNP_DSCP_P2                                 48
         LLDP_NB_DCBX_P1                             False(0)
         LLDP_NB_RX_MODE_P1                          OFF(0)
         LLDP_NB_TX_MODE_P1                          OFF(0)
         LLDP_NB_DCBX_P2                             False(0)
         LLDP_NB_RX_MODE_P2                          OFF(0)
         LLDP_NB_TX_MODE_P2                          OFF(0)
         ROCE_RTT_RESP_DSCP_P1                       0
         ROCE_RTT_RESP_DSCP_MODE_P1                  DEVICE_DEFAULT(0)
         ROCE_RTT_RESP_DSCP_P2                       0
         ROCE_RTT_RESP_DSCP_MODE_P2                  DEVICE_DEFAULT(0)
         DCBX_IEEE_P1                                True(1)
         DCBX_CEE_P1                                 True(1)
         DCBX_WILLING_P1                             True(1)
         DCBX_IEEE_P2                                True(1)
         DCBX_CEE_P2                                 True(1)
         DCBX_WILLING_P2                             True(1)
         KEEP_ETH_LINK_UP_P1                         True(1)
         KEEP_IB_LINK_UP_P1                          False(0)
         KEEP_LINK_UP_ON_BOOT_P1                     False(0)
         KEEP_LINK_UP_ON_STANDBY_P1                  False(0)
         DO_NOT_CLEAR_PORT_STATS_P1                  False(0)
         AUTO_POWER_SAVE_LINK_DOWN_P1                False(0)
         KEEP_ETH_LINK_UP_P2                         True(1)
         KEEP_IB_LINK_UP_P2                          False(0)
         KEEP_LINK_UP_ON_BOOT_P2                     False(0)
         KEEP_LINK_UP_ON_STANDBY_P2                  False(0)
         DO_NOT_CLEAR_PORT_STATS_P2                  False(0)
         AUTO_POWER_SAVE_LINK_DOWN_P2                False(0)
         NUM_OF_VL_P1                                _4_VLs(3)
         NUM_OF_TC_P1                                _8_TCs(0)
         NUM_OF_PFC_P1                               8
         VL15_BUFFER_SIZE_P1                         0
         NUM_OF_VL_P2                                _4_VLs(3)
         NUM_OF_TC_P2                                _8_TCs(0)
         NUM_OF_PFC_P2                               8
         VL15_BUFFER_SIZE_P2                         0
         DUP_MAC_ACTION_P1                           LAST_CFG(0)
         SRIOV_IB_ROUTING_MODE_P1                    LID(1)
         IB_ROUTING_MODE_P1                          LID(1)
         DUP_MAC_ACTION_P2                           LAST_CFG(0)
         SRIOV_IB_ROUTING_MODE_P2                    LID(1)
         IB_ROUTING_MODE_P2                          LID(1)
         PHY_FEC_OVERRIDE_P1                         DEVICE_DEFAULT(0)
         PHY_FEC_OVERRIDE_P2                         DEVICE_DEFAULT(0)
         ROCE_CONTROL                                ROCE_ENABLE(2)
         PCI_WR_ORDERING                             per_mkey(0)
         MULTI_PORT_VHCA_EN                          False(0)
         PORT_OWNER                                  True(1)
         ALLOW_RD_COUNTERS                           True(1)
         RENEG_ON_CHANGE                             True(1)
         TRACER_ENABLE                               True(1)
         IP_VER                                      IPv4(0)
         BOOT_UNDI_NETWORK_WAIT                      0
         UEFI_HII_EN                                 True(1)
         BOOT_DBG_LOG                                False(0)
         UEFI_LOGS                                   DISABLED(0)
         BOOT_VLAN                                   1
         LEGACY_BOOT_PROTOCOL                        PXE(1)
         BOOT_INTERRUPT_DIS                          False(0)
         BOOT_LACP_DIS                               True(1)
         BOOT_VLAN_EN                                False(0)
         BOOT_PKEY                                   0
         DYNAMIC_VF_MSIX_TABLE                       False(0)
         EXP_ROM_UEFI_ARM_ENABLE                     False(0)
         EXP_ROM_UEFI_x86_ENABLE                     True(1)
         EXP_ROM_PXE_ENABLE                          True(1)
         FORCE_ETH_PCI_SUBCLASS                      False(0)
         ADVANCED_PCI_SETTINGS                       True(1)
         SAFE_MODE_THRESHOLD                         10
         SAFE_MODE_ENABLE                            True(1)
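Since a driver/firmware or settings mismatch between the cards was suggested, one way to compare them is to save this query output from each card and diff the settings. The following is a rough sketch that assumes the single-column "Next Boot" layout shown above; `parse_mlxconfig` and `diff_settings` are hypothetical helper names, not part of the Mellanox tools.

```python
def parse_mlxconfig(text):
    """Parse the 'Configurations:' section of saved mlxconfig query output.

    Returns a dict mapping setting name to its value string, e.g.
    {"SRIOV_EN": "True(1)", "NUM_OF_VFS": "8"}.
    """
    settings = {}
    in_config = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Configurations:"):
            in_config = True  # settings start on the next line
            continue
        if not in_config or not stripped:
            continue
        parts = stripped.split(None, 1)  # name, then the rest of the line
        if len(parts) == 2:
            settings[parts[0]] = parts[1].strip()
    return settings


def diff_settings(a, b):
    """Return {name: (value_in_a, value_in_b)} for every setting that differs."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

Calling `diff_settings(parse_mlxconfig(workstation_text), parse_mlxconfig(server_text))` on two saved outputs (the variable names are placeholders) would list only the settings that differ between the Windows card and a Proxmox card.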
 

pixelwave

Contributor
Joined
Jan 26, 2022
Messages
174
Doing further testing, it gets even weirder ...

CrystalDiskMark/Blackmagic speed tests (the target for both is a shared SMB drive on the Proxmox server):

Windows to Proxmox
Screenshot 2023-07-01 132014_Crystalmark Win to Server.png


Mac to Proxmox
Bildschirmfoto 2023-07-04 um 09.56.02.png


... for the Mac, the write speed to the server is limited; for Windows, the read speed from the server ?!?
 