problem with disappearing data :-(

Jeff63

Dabbler
Joined
Mar 31, 2023
Messages
22
Hi :)
I've set up a TrueNAS SCALE box and created one big 36 TB ZFS RaidZ2 dataset consisting of 6x 10 TB disks;
in that dataset I have 5 'slices' for storing different stuff from my network;
in one of those 'slices' I've put 5 TB of data, and some on the other 'slices'; I have a few GB free (but I will increase the pool size as soon as I get 4 more 10 TB disks, as I can't just add one disk at a time... ;));
all the 'slices' are SMB or SMB/NFS shares.

after some months of 24/7 running I've seen something very confusing: on that 5 TB slice some data has been erased randomly; the files are still listed but are zero bytes in size...
that should not happen on a ZFS RaidZ2 array that is said to be very secure...

all the disks are good, with no errors, so what happened??? and how can I prevent this from happening again? as that NAS is there to keep my data safe, I'd like it not to eat it ;)

any suggestions?

thanks in advance for the help

Jeff
 

c77dk

Patron
Joined
Nov 27, 2019
Messages
468
You have to describe your setup - which hardware is used? Especially, which controller(s) and disks?

You say you have "a few" GB left on your RaidZ2 pool (not dataset). Is that correct? What is "a few"?
How are the sync settings configured?

ZFS is a copy-on-write filesystem - which means you really need free space in order not to destroy your data - the "a few" part kinda scares me.
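
For reference, both can be checked from a shell on the NAS; the pool name "tank" below is just a placeholder for yours:

[code]
# Remaining free space as ZFS itself accounts for it
zfs list -o name,used,avail tank

# Per-dataset sync behaviour (standard / always / disabled)
zfs get -r sync tank
[/code]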
 

Jeff63

Dabbler
Joined
Mar 31, 2023
Messages
22
You have to describe your setup - which hardware is used? Especially, which controller(s) and disks?

You say you have "a few" GB left on your RaidZ2 pool (not dataset). Is that correct? What is "a few"?
How are the sync settings configured?

ZFS is a copy-on-write filesystem - which means you really need free space in order not to destroy your data - the "a few" part kinda scares me.
er! a few is 30 GB left...

for the controller it's a SAS2108 in IT mode; sync is 'standard'
the disks are all Seagate ST10000 10 TB
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
A lot of hardware details are still missing ...
 

Jeff63

Dabbler
Joined
Mar 31, 2023
Messages
22
ok, here it is:

as said, TrueNAS SCALE v23.10.2 (up to date)
Mobo: Asus Maximus IV Gene-Z
Intel Core i7-2600 @ 3.40 GHz
16 GB DDR3 RAM, non-ECC
6x Seagate ST10000 10 TB SATA disks
SAS 2108 controller (IT mode)
1x Kingston SV300S37A120G 120 GB SATA SSD (cache)
Mellanox CX311A-XCAT 10 Gb fiber network card
10 Gb Internet :)
 

c77dk

Patron
Joined
Nov 27, 2019
Messages
468
are you using quotas? The part about a 5 TB slice (a dataset, I would think) suggests so, but I just want to confirm
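
For reference, this can be checked from a shell; "tank" is again a placeholder for your pool name:

[code]
# Show quotas for the pool and every dataset in it (default is "none")
zfs get -r quota tank
[/code]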
 

PhilD13

Patron
Joined
Sep 18, 2020
Messages
203
If you have a 36 TB pool and only 30 GB left, then your pool is full. You should not fill a pool over 80% (see the documentation), which here would be about 28-29 TB. If a quota limit is hit (on a 'slice'?), writes to that share or slice stop, and the operation doing the writing should error out with a device-full error. If you go over the recommended pool capacity by over-allocating the pool's space, it may break the system and prevent access to data, but it will not erase any existing data.
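
A quick way to see how close the pool is to that limit, from a shell ("tank" being a placeholder pool name):

[code]
# CAP is the percentage of pool space allocated; the usual advice is to stay under ~80%
zpool list -o name,size,allocated,free,capacity tank
[/code]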
 

Jeff63

Dabbler
Joined
Mar 31, 2023
Messages
22
well, I had some disk problems on some machines and had to urgently copy tons of data somewhere before they died completely... so all this filled the NAS more than I wanted :( and as it's my first NAS, I forgot to set quotas on the datasets.... btw, if the 'disk' is full it should not be able to keep writing to any slice and overwrite other datasets' data, or am I wrong?
 

garm

Wizard
Joined
Aug 19, 2017
Messages
1,556
'slice' is a term used by UFS, not ZFS. In ZFS you have pools of virtual devices (vdevs) built from regular devices, and in pools you can create datasets. There is no predefined size for a dataset as there is with a more traditional partition or slice. You can, however, set quotas to restrict how much space a dataset is allowed to consume. What you have done has nothing to do with slices or datasets: you have filled your pool, and now it cannot operate properly any more.
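
As a sketch of the terminology, with "tank" and "media" as hypothetical names:

[code]
# A dataset shares the whole pool's space and has no fixed size of its own
zfs create tank/media

# A quota caps how much of the pool that dataset may consume
zfs set quota=5T tank/media
[/code]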

I guess restore from backup isn't an option?

Please show us the output of zpool status and zfs list -d 0 -o used,available,referenced in [code] blocks
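
That is, run from a shell on the NAS:

[code]
zpool status
zfs list -d 0 -o used,available,referenced
[/code]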
 

Jeff63

Dabbler
Joined
Mar 31, 2023
Messages
22
zpool status:

[screenshot: zpool status_TrueNAS.jpg]

zfs list:

[screenshot: zfs list_TrueNAS.jpg]
 

garm

Wizard
Joined
Aug 19, 2017
Messages
1,556
Yeah, you filled your pool. Deletion might not even be possible. You might be able to mount the pool read-only and copy the data off it.
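
A minimal sketch of what that could look like, assuming the pool is called "tank" (substitute your own name, and stop any shares or services using it first):

[code]
# Export the pool, then re-import it read-only
zpool export tank
zpool import -o readonly=on tank

# Data can now be read and copied off, but nothing can be written
[/code]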
 

Jeff63

Dabbler
Joined
Mar 31, 2023
Messages
22
hello :)
yes, it's filled ;) and I can't do anything to free up space, as I can't yet get new drives to replace the dead ones in the other machines, or expand the pool with 4 more 10 TB drives :( damned expensive beasts!

but that's not the problem that annoys me now, as I can access the pool just fine: read (as user or root), delete, and even write to it (though only as root when accessing via NFS; no such problem via Samba access...)...

in fact, the problem I have here is that data has been erased randomly and magically from a dataset without my consent or knowledge until I wanted to read it... about 550 GB of data has vanished; as said, I still see the files listed but zeroed in size... I'm just lucky it wasn't sources or more important files this time.

I wanted a NAS to keep files and backups safe, and what I see is that ZFS RaidZ2 pools aren't as safe as they should be (as advertised ;)) in TrueNAS SCALE... sic.
I chose RaidZ2 so I could have two disks die without losing a single byte of data (expectations...), and instead I see data disappearing on perfectly good drives for no reason. :(

there must be a bug somewhere, don't you think? so where? and how can I help to debug this?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
If you write to a full pool, what do you expect to happen? Full is full.
 

PhilD13

Patron
Joined
Sep 18, 2020
Messages
203
The first thing you should do, before anything else, is move about 8 TB of data off the pool, since you say you can actually see it. You may have to use the command line to do this.
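
A sketch of what that could look like, where /mnt/tank/slice1 and backuphost are placeholders for one of your datasets and a machine with free space:

[code]
# Copy a dataset's contents to another machine, preserving attributes
rsync -avh --progress /mnt/tank/slice1/ backuphost:/mnt/backup/slice1/

# Verify the copy, then delete the originals to free pool space
[/code]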

Are you absolutely positive the SAS 2108 RAID controller is in IT mode? How do you know that for sure? That model, I think, is a RAID controller with cache memory. It works with a UFS file system, but it will cause data loss when used with ZFS on TrueNAS. Everything I have found here in the forums on this controller model number says it will not work with TrueNAS.

Where are the drives connected, and how are they connected?

The Asus motherboard is a gaming board, not a server board; it does not support ECC memory, and it has an onboard RAID controller (probably a version of the 2108 or similar) that will include at least some onboard cache. That onboard RAID controller is not compatible with TrueNAS, because TrueNAS is not a UFS-type system; it is compatible with UFS-type systems and works fine with them. Is this the actual controller you are using? If so, it is usually not possible to change onboard RAID controllers into non-RAID controllers. Even if one can crossflash the chip, it is unlikely the RAID cache can be turned off, and that in itself will cause data loss when used with ZFS.

Are you using deduplication?
Are you using any cache setups in TrueNAS? If so, which types of cache?
You are using non-ECC memory. Is the memory overclocked?
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
Are you absolutely positive the SAS 2108 RAID controller is in IT mode?
I know it is a fine line, but we are not even talking about controllers in IT mode. Instead we are talking about HBAs with IT firmware, because the RAID variants of those cards also have an IT mode, and it is utterly unsuitable.

Side note: there is also a difference between a controller and an adapter, going back at least to the SCSI days. There never was a SCSI controller card; they were all adapters. The controller was always on the disk itself. In contrast, MFM, IDE (the predecessor of SATA), etc. had a controller card and less "intelligence" on the drives.
 

garm

Wizard
Joined
Aug 19, 2017
Messages
1,556
Data is better off on ZFS than on any of the available alternatives, as long as you don't fill up the pool. Sizes reported by NFS or SMB in your present situation shouldn't be relied on either. Log in with SSH for any maintenance of the pool.
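
A minimal sketch of what to check once logged in, with "tank" again standing in for your pool name:

[code]
# Space accounting straight from ZFS, rather than from an SMB/NFS client
zfs list -o space -r tank
[/code]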
 