MTU 9000 (Jumbo) - Data Corruption possible?

Status
Not open for further replies.

Grinchy

Explorer
Joined
Aug 5, 2017
Messages
78
Hello,

I'm configuring my new Mellanox ConnectX-2, and with MTU 9000 it's much faster than with MTU 1514 (the standard setting).

But isn't the chance of data loss or data corruption much higher with MTU 9000 than with 1514? Ethernet uses CRC-32, which has a low probability of failing to detect (bit) errors, even with MTU 1514.
But with MTU 9000 the chance of undetected errors should be much higher, because each packet is about six times the size of one at the standard setting.
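
A rough back-of-the-envelope sketch of this question in Python. Everything numeric here is an assumption for illustration: bit errors are treated as independent, the bit error rate (`ASSUMED_BER`) is made up, each frame carries 18 bytes of header plus FCS on top of the MTU, and a corrupted frame is assumed to slip past CRC-32 with probability of about 2^-32 (the usual approximation for random corruption, not a measured figure). The function names are hypothetical.

```python
import math

def p_frame_corrupt(mtu, ber, overhead_bytes=18):
    """Probability that one frame contains at least one bit error."""
    bits_per_frame = (mtu + overhead_bytes) * 8
    # 1 - (1 - ber)^bits, computed in a numerically stable way for tiny ber
    return -math.expm1(bits_per_frame * math.log1p(-ber))

def expected_undetected(total_bytes, mtu, ber, miss_prob=2.0 ** -32):
    """Expected number of corrupted frames that the CRC fails to catch."""
    frames = math.ceil(total_bytes / mtu)
    return frames * p_frame_corrupt(mtu, ber) * miss_prob

ASSUMED_BER = 1e-12   # assumption, not a measured value
ONE_TB = 10 ** 12

for mtu in (1500, 9000):
    per_frame = p_frame_corrupt(mtu, ASSUMED_BER)
    per_tb = expected_undetected(ONE_TB, mtu, ASSUMED_BER)
    print(f"MTU {mtu}: P(frame corrupt) = {per_frame:.3e}, "
          f"expected undetected per TB = {per_tb:.3e}")
```

Under these assumptions a jumbo frame is about six times more likely to be corrupted than a standard frame, but the same amount of data needs about six times fewer frames, so the expected number of undetected errors per terabyte comes out nearly the same for both settings.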
 
Joined
Feb 2, 2016
Messages
574
There are many layers to the network stack.

First of all, Ethernet itself (layer two, where the MTU lives) has something like six 9s (99.9999%) of corruption detection, which is insanely good.
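
To make the detection part concrete, here is a small sketch using Python's `zlib.crc32`, which computes the same CRC-32 that Ethernet uses for its frame check sequence. It only illustrates the checksum in software, not what a NIC does in hardware; the helper names, the trial count, and the choice of one- and two-bit flips are assumptions for the example.

```python
import random
import zlib

def flip_bits(frame, positions):
    """Return a copy of frame with the given bit positions inverted."""
    buf = bytearray(frame)
    for pos in positions:
        buf[pos // 8] ^= 1 << (pos % 8)
    return bytes(buf)

def count_missed(payload_len, trials=2000, seed=0):
    """Count how many 1- or 2-bit corruptions leave the CRC-32 unchanged."""
    rng = random.Random(seed)
    # One random payload of the requested size
    frame = rng.getrandbits(8 * payload_len).to_bytes(payload_len, "little")
    good_crc = zlib.crc32(frame)
    missed = 0
    for _ in range(trials):
        n_flips = rng.choice((1, 2))
        positions = rng.sample(range(payload_len * 8), n_flips)
        if zlib.crc32(flip_bits(frame, positions)) == good_crc:
            missed += 1
    return missed

for size in (1500, 9000):
    print(f"{size}-byte payload: {count_missed(size)} missed corruptions")
```

Errors this small are always caught by CRC-32 at either frame size; the residual risk comes from heavier corruption patterns that happen to produce the same checksum.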

For an error to propagate to the filesystem, it would also have to be missed by the transport layer and the application layer, both of which often have their own error detection mechanisms.

The problem with jumbo frames isn't their inherent reliability versus standard-sized frames, but that so very few people can actually configure them correctly. The cost/benefit ratio for jumbo frames rarely favors their implementation.

Cheers,
Matt
 
Joined
Feb 2, 2016
Messages
574
Oddly enough, it seems ECC RAM - which we know, love and trust - has a higher uncorrected error rate than Ethernet. Am I reading this Google-sponsored (old) paper correctly?

DRAM Errors in the Wild: A Large-Scale Field Study

"About a third of machines and over 8% of DIMMs in our fleet saw at least one correctable error per year. The annual incidence of uncorrectable errors was 1.3% per machine and 0.22% per DIMM."

{shrug}

I'm not losing sleep over either of these concerns.

Cheers,
Matt
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I vaguely remember that, but it's clearly an abnormal scenario. Maybe they were running it too hot, say FB-DIMMs with little to no airflow. They only measure "motherboard" temperature and only consider "high" and "low" temperatures based on the median temperature.
 