Seagate 8TB SMR Archive disk used as backup, working great


Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Thought I'd post an update. My full backups are working great with the Seagate 8TB SMR Archive disk.
It's backing up a RAID-Z2 pool with 4 x 4TB disks, so a single 8TB disk can hold a full backup (after I set
the backup pool to the same compression :) ). I've probably run 12 backups of various types, full or incremental,
over the last 18 months. All ran fine, no real problems. (Yes, I only run one backup every 2 months on THIS
disk.)
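
For anyone curious what this kind of setup looks like in practice, here is a rough sketch of a single-disk
backup pool fed by zfs send/receive. The pool, dataset, and snapshot names are only illustrative, not
necessarily what's used here.
Code:
# Illustrative sketch only -- pool, device, and snapshot names are made up.
# Create the single-disk backup pool and give it the same compression as the source:
zpool create backup da5
zfs set compression=lz4 backup

# Full backup: snapshot the source pool recursively and replicate the whole tree:
zfs snapshot -r tank@backup-2016-01
zfs send -R tank@backup-2016-01 | zfs receive -dF backup

# Later, an incremental backup against the previous snapshot:
zfs snapshot -r tank@backup-2016-03
zfs send -R -i tank@backup-2016-01 tank@backup-2016-03 | zfs receive -dF backup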

Backups run a bit slow, up to 18 hours (6 hours for the pre-backup scrub and 12 hours for a heavily updated
incremental backup). But for me it was never about the time; it was the convenience of a single disk that
could store my entire NAS pool. At the time I bought the Seagate 8TB SMR Archive disk, there were no
regular 8TB disks, either NAS type or enterprise.

Today, I might consider another solution, like one of the 8TB (or larger) NAS or enterprise disks. If, however,
it were still cost effective to use Seagate SMR Archive disks, I would. Especially if they came out with larger
ones, like 10TB or 12TB.

For those interested, my backups use an external enclosure that lets me hot-swap the disk. It sits on top of my
FreeNAS Mini and attaches via an eSATA cable. Today I'd probably have bought a FreeNAS Mini XL
just for a built-in disk slot that I could use for backups.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
It sounds like you've gotten into re-writes on the SMR; is that what takes so long? Do you think it would go faster if you started over with a complete new pool/full-backup once that happened?
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
It sounds like you've gotten into re-writes on the SMR;
...
Yes, it's definitely into re-writing tracks due to free space fragmentation inside the shingles. I've written more
than 8TB to the disk since I created the pool. My original backup scheme made a new dataset for each full
backup (I had only about 1.2TB used at the time).
...
Is that what takes so long?
...
And yes again, it's slow because of the shingling: about 30 MB/s average write speed.
In the beginning I got 60 MB/s and maybe even up to 90 MB/s, probably because the next track was still clear.
...
Do you think it would go faster if you started over with a complete new pool/full-backup once that happened?
Perhaps it would be faster if I started over with a clear disk. Except my method of backups at present is
a bit complicated. Since I have the space, I take a snapshot after every backup, so I get a bit
of history. In essence, I have multiple backups on the same disk, allowing me to use ZFS to its fullest.
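
In practice that just means snapshotting the backup pool after each run, something like the sketch below
(pool and snapshot names are only examples):
Code:
# After a backup run completes, snapshot the backup pool recursively so that
# run is kept as history (the snapshot name is just an example):
zfs snapshot -r backup@run-2017-07

# Older runs can be listed (and eventually pruned) later:
zfs list -t snapshot -r backup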


This last weekend's backup was a bit larger than normal. I only bought 2 DVD movies (not too many
extras), but the 3 Blu-rays I bought had lots of extras, which I also extract. Plus, I did a massive update to
my video library. I transcoded a lot of the AC-3 audio down to AAC to make it more portable (but left any
5.1-channel AC-3 audio as a second audio track). I also updated a lot of Roku-specific BIF files (which allow
easier fast forward and rewind), because I have a better tool to create them now. Though they weren't too
big.

All in all, probably at least 350GB out of my 1.4TB of videos was changed or added for this last
backup. I won't need to re-do those again. But I still have lots more AC-3 to AAC transcodes, so in 2 months
there will be yet more to back up.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
That sounds like it is already ideal (big and blocky) on the rewrites, so a new pool may not be much improvement.

If they continue with SMR, I hope they add TRIM or a Secure Erase equivalent.

Since the SMR drives have a special area at the front and rear for ZFS uberblocks, has anyone checked/tested whether the FreeNAS practice of reserving some space for a swap partition works against that?
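
(For reference, the partition layout FreeNAS creates, including the swap partition it places at the start of
each data disk, can be inspected with something like the following; the device name is only an example.)
Code:
# Show the GPT layout of a disk (device name is an example); FreeNAS
# normally creates a small swap partition ahead of the ZFS data partition.
gpart show da5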
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
I too was not aware of any special place for ZFS uberblocks on the Seagate SMR Archive disks.

However, from what I have read, each SMR Archive disk has higher-speed, non-shingled disk
tracks used as a write cache. On the 8TB model, I think I read it was 20GB in size. Once it's full,
or after a timeout, the disk has to flush it to shingled space.

This write cache would not be directly useful to ZFS, beyond the normal benefits of any write cache.

I too wish they would add TRIM. If the disk knew that the next track to be over-written was not
in use, then it would not need to save it.
 

shorthand

Cadet
Joined
Sep 28, 2018
Messages
4
I know this thread is a little stale, but I found a solution to speeding up writes on SMR-based raid-z arrays ... From 'thewacokid' @ https://github.com/zfsonlinux/zfs/issues/4877

Basically:
Code:
zfs set recordsize=1M rpool


And then put the following in your zfs.conf (typically /etc/modprobe.d/zfs.conf on ZFS on Linux):
Code:
#recommended settings for use with SMR drives
#https://github.com/zfsonlinux/zfs/issues/4877
#4G for dirty_data, vdev_aggregation_limit=16M
options zfs zfs_dirty_data_max=4294967296
options zfs zfs_dirty_data_max_max=4294967296
options zfs zfs_txg_timeout=300
options zfs zfs_vdev_aggregation_limit=16777216
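
After reloading the zfs module (or rebooting), the values can be sanity-checked; on ZFS on Linux the module
parameters are exposed under /sys/module/zfs/parameters, roughly like this:
Code:
# Confirm the tunables actually took effect after the zfs module reloads:
cat /sys/module/zfs/parameters/zfs_txg_timeout
cat /sys/module/zfs/parameters/zfs_dirty_data_max
cat /sys/module/zfs/parameters/zfs_vdev_aggregation_limit

# And confirm the dataset recordsize:
zfs get recordsize rpool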

 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I too wish they would add TRIM. If the disk knew that the next track to be over-written was not
in use, then it would not need to save it.
With ashift=12 you have something like 64 uberblocks. Say 5 seconds per TXG, so 320 seconds' worth of uberblocks. They might not make it off the cache, depending on how it's managed.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I know this thread is a little stale, but I found a solution to speeding up writes on SMR-based raid-z arrays ... From 'thewacokid' @ https://github.com/zfsonlinux/zfs/issues/4877

Following that link (which also pertains to ZFSonLinux) reveals that the user who posted these tunables has a very specific workload of writes that benefit from them:

We have a very peculiar workload here, but we're trying to minimize the pain of working with SMR drives. What we're shooting for is essentially a combination of settings that will allow writes in a file to queue up until we explicitly sync the file

So as with all tunables, be very careful about just applying them blindly without consideration for the other impact they could have on the system.

zfs set recordsize=1M rpool
Fine for datasets that contain files significantly larger than 1MB - writing 100MB backup chunks or storing 1GB+ video files, sure - but for files that are barely larger than 1MB, you could waste a lot of disk space if it ends up writing two 1M records for a 1.1M file.
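
One way to keep the benefit without the waste (a sketch only; the dataset names are examples) is to set the
large recordsize on just the datasets that actually hold big files, leaving the rest at the 128K default:
Code:
# Apply the large recordsize only to datasets that hold big files
# (dataset names are examples), leaving everything else at 128K:
zfs set recordsize=1M backuppool/videos
zfs get recordsize backuppool backuppool/videos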

zfs_dirty_data_max=4294967296
options zfs zfs_dirty_data_max_max=4294967296

Given sufficient RAM (40GB+), this will be set to 4GB automatically; allowing more dirty data is a band-aid for the real problem of "your vdevs are too slow", and it also has the potential to steal RAM from the ARC.

zfs_txg_timeout=300
This value is a maximum; transactions can be forced by hitting the dirty_data_sync threshold (default 64M) whenever there's activity. Consider increasing dirty_data_sync as well so that the threshold to trigger a txg commit is higher, but bear in mind that unless you have an SLOG device this data is at risk.
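
A minimal sketch of that, assuming the byte-valued zfs_dirty_data_sync parameter used by ZFS on Linux
releases of this era (newer releases replaced it with zfs_dirty_data_sync_percent):
Code:
# Hypothetical addition to /etc/modprobe.d/zfs.conf: raise the dirty-data sync
# threshold well above the 64M default so a txg commit is not forced so early.
# Parameter name assumed for ZoL 0.7.x-era releases.
options zfs zfs_dirty_data_sync=1073741824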

zfs_vdev_aggregation_limit=16777216
This lets ZFS commit up to 16MB to a vdev as a "sequential write" - which is more closely aligned with current SMR zone sizes. Could be very helpful.

However, all of this essentially boils down to trying to polish a turd. Current SMR drives are like the very early SSDs with no TRIM or garbage collection - if you hit a NAND page that needed to be zapped, it would do it on-demand, and you paid the penalty right then. But you also had no way of knowing or assuring what kind of latency you would get. Perhaps new drives will attempt to "defrag" themselves proactively, but this would obviously cause head contention on the drive itself.

Once T10 ratifies Zoned Block Commands/Zoned ATA Commands (I believe that's set for 2022?) and we get Host-Aware SMR drives available, ZFS will be able to "discuss" with the drive to find out where there's a zone available, how long the expected band cleaning time will be, etc - and balance the writes accordingly. A much larger amount of dirty/outstanding data will likely be necessary, given that the prototype HA-SMR firmware uses 256M "zones" - but ultimately, the physical nature of SMR drives means that they'll really be best used for a WORM (write once, read many) workload.

Or we can hope that Seagate/WD make good on their respective promises of HAMR/MAMR, and we'll all be enjoying 40TB HDDs by that time. ;)
 

shorthand

Cadet
Joined
Sep 28, 2018
Messages
4
HoneyBadger ... I agree ... and I should have noted that using SMR drives for normal read-write workloads would be a mistake. These settings, especially recordsize=1M, will cause huge read amplification in any normal workflow. My goal was to create a backup box on the cheap, simply because I and a colleague handle too much data for our IT department to back up on its own. (They know what we're doing.)

However, given how hard it was to find the tuning, I thought some would appreciate knowing that reasonably fast writes were indeed possible.

SMR drives are still meant for essentially archival use; if you try to use them any other way, it will be bad.

The OP on github noted that the zfs_dirty_data and recordsize=1M options are what helped the most.

My use case for SMR drives is a backup server that just holds data (pg_dump and rsync), so because the data is by definition redundant (and coming off of other linux boxes running parity drives), I'm totally OK with some data loss due to power failure and having the write process steal RAM from the ARC. The idea was to do this on the cheap with an extra machine IT had while buying enough drives, so 8GB of RAM is adequate.
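
For context, the flow is roughly the following; the hostnames, paths, and database name in this sketch are
made up for illustration:
Code:
# Nightly dump from the database server, landing on the backup pool
# (hostnames, paths, and database name are hypothetical):
pg_dump -Fc mydb | ssh backupbox 'cat > /backuppool/pgdump/mydb-$(date +%F).dump'

# Mirror the analysis workstation's data directory onto the backup pool:
rsync -a --delete workstation:/data/ /backuppool/workstation-data/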

The database server and analysis workstation that we're backing up from have normal drives and plenty of SSD (ZIL is mirrored on the database server, etc.) and near-default zfs setups.

The backup server, on the other hand, is really a write-only box ... the data then gets SLOWLY (I mean a couple of months) streamed out to CrashPlan just in case of a fire or a hack or some other disaster that destroys all of the data in my cube. If nothing goes wrong, we never read the data it holds.
 

pro lamer

Guru
Joined
Feb 16, 2018
Messages
626

shorthand

Cadet
Joined
Sep 28, 2018
Messages
4
I had been having problems with dropped drives that weren't actually bad ... I have actually been fighting those for months - but they were hardware issues that I finally got sorted on Friday. They included bad eSATA cables (I was gifted a thin tower, so had to locate 4 of the 6 drives externally) and a bad enclosure. One issue is that I think a lot of the eSATA JBOD enclosures around are rather old ... they have been sitting on a shelf for a year or two.

Also, before I increased the write sizes, when the drives were overloaded it appears that ZFS would interpret some of the timeouts as drive failures and mark the drives as bad in the array, even though the SMART metrics said the drives were OK.
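
For anyone hitting the same thing: once the cabling is sorted, getting a falsely-faulted drive back into the
pool is roughly the following (pool and device names are examples):
Code:
# Bring a drive that got kicked out due to timeouts back into the pool and
# clear its error counters (pool and device names are examples):
zpool online backuppool sdd
zpool clear backuppool sdd

# Then verify it resilvers cleanly:
zpool status backuppool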

Lessons on that front are:
  • Don't use USB ... you can't access the drives' low-level settings, and you will fight their tendency to go to sleep when idle.
  • While external SATA enclosures are logically identical to internal SATA drives, the extra complexity of eSATA cabling and extra power supplies can definitely cause headaches - you can live with it, but it's not nearly as good as a dedicated tower with all-internal drives.
  • SATA port multipliers (4-drive SATA JBOD enclosures) will both slow down your transfers (a lot) and, if any drive is having pre-failure issues, leave you unable to tell which one it is, since the whole bus will run slow. However, in a pinch, they definitely work better and more reliably than USB.
Unlike your link with virtualization, I'm running on bare metal (Ubuntu 18.04 LTS), but I've written about 10TB to the machine over the weekend and haven't had any issues. I have never tried to send low-level disk commands through a hypervisor ... nor would I likely ever try to, as I would either run bare metal or containerize my analysis.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
However, in a pinch, they definitely work better and more reliably than USB.
I tend to doubt that. With decent bridges, USB works fairly well, whereas port multipliers are a disaster in a number of ways and that's if they work in the first place. Not even Intel wants to support them.
 

pro lamer

Guru
Joined
Feb 16, 2018
Messages
626
I've written about 10TB to the machine over the weekend and haven't had any issues.
What was the transfer rate? What is your drive setup, and which of the drives are SMR?

Sent from my mobile phone
 

shorthand

Cadet
Joined
Sep 28, 2018
Messages
4
Eric - The more USB I used, the more data errors I had, but in retrospect, that could have been small writes to SMR drives.
As for port multipliers ... I have had OK luck with one, but single enclosures are definitely better after you sort out the good units from the bad ... I ordered 4 of these and sent 2 back.

The machine is just one SSD and then six ST8000DM004s, with 8GB of RAM and an i5-2500, so four 3GHz cores without hyperthreading.

For speed with this setup, I'm getting about 33% utilization with 1 Gbps coming in the Ethernet port, so that works out to sustained writes of approximately 75 MB/s per disk.

As for setup:
Code:
  pool: backuppool
 state: ONLINE
  scan: resilvered 12K in 0h0m with 0 errors on Thu Sep 27 16:04:44 2018
config:

        NAME                                                  STATE     READ WRITE CKSUM
        backuppool                                            ONLINE       0     0     0
          raidz1-0                                            ONLINE       0     0     0
            ata-ST8000DM004-2CX188_WCT0DAJ4                   ONLINE       0     0     0
            sdh                                               ONLINE       0     0     0
            sdd                                               ONLINE       0     0     0
            sde                                               ONLINE       0     0     0
            sda                                               ONLINE       0     0     0
            sdc                                               ONLINE       0     0     0
        logs
          ata-INTEL_SSDSC2KW128G8_BTLA81930AXE128BGN-part6    ONLINE       0     0     0
        cache
          sdb7                                                ONLINE       0     0     0

errors: No known data errors



Code:
NAME        PROPERTY              VALUE                  SOURCE
backuppool  type                  filesystem             -
backuppool  creation              Tue May 15 13:33 2018  -
backuppool  used                  10.6T                  -
backuppool  available             23.1T                  -
backuppool  referenced            153K                   -
backuppool  compressratio         1.00x                  -
backuppool  mounted               no                     -
backuppool  quota                 none                   default
backuppool  reservation           none                   default
backuppool  recordsize            1M                     local
backuppool  mountpoint            /backuppool            default
backuppool  sharenfs              off                    default
backuppool  checksum              on                     default
backuppool  compression           lz4                    local
backuppool  atime                 off                    local
backuppool  devices               on                     default
backuppool  exec                  on                     default
backuppool  setuid                on                     default
backuppool  readonly              off                    default
backuppool  zoned                 off                    default
backuppool  snapdir               hidden                 default
backuppool  aclinherit            restricted             default
backuppool  createtxg             1                      -
backuppool  canmount              off                    local
backuppool  xattr                 on                     default
backuppool  copies                1                      default
backuppool  version               5                      -
backuppool  utf8only              on                     -
backuppool  normalization         formD                  -
backuppool  casesensitivity       sensitive              -
backuppool  vscan                 off                    default
backuppool  nbmand                off                    default
backuppool  sharesmb              off                    default
backuppool  refquota              none                   default
backuppool  refreservation        none                   default
backuppool  guid                  14711942019921125983   -
backuppool  primarycache          all                    local
backuppool  secondarycache        metadata               local
backuppool  usedbysnapshots       0B                     -
backuppool  usedbydataset         153K                   -
backuppool  usedbychildren        10.6T                  -
backuppool  usedbyrefreservation  0B                     -
backuppool  logbias               throughput             local
backuppool  dedup                 off                    default
backuppool  mlslabel              none                   default
backuppool  sync                  standard               local
backuppool  dnodesize             legacy                 default
backuppool  refcompressratio      1.00x                  -
backuppool  written               153K                   -
backuppool  logicalused           11.0T                  -
backuppool  logicalreferenced     40K                    -
backuppool  volmode               default                default
backuppool  filesystem_limit      none                   default
backuppool  snapshot_limit        none                   default
backuppool  filesystem_count      none                   default
backuppool  snapshot_count        none                   default
backuppool  snapdev               hidden                 default
backuppool  acltype               off                    default
backuppool  context               none                   default
backuppool  fscontext             none                   default
backuppool  defcontext            none                   default
backuppool  rootcontext           none                   default
backuppool  relatime              off                    default
backuppool  redundant_metadata    all                    default
backuppool  overlay               off                    default

 