Bigger ImagePull Failing through charts

mnbk91

Cadet
Joined
May 23, 2022
Messages
9
I'm a noob here and I just need clarification on something.

Recently I noticed on my TrueNAS SCALE install that some of the larger apps (e.g. Nextcloud, Collabora), whether installed through TrueCharts or the official catalog, keep failing at deployment, stuck at the image pull step:

collabora
Code:
Failed to pull image "tccr.io/truecharts/collabora:v22.05.4.1.1@sha256:4b7c9b3583309d4717fb64fa1e5115695abd7e98db4bc02dff0a24cf3ac74b6c": rpc error: code = Unknown desc = context canceled


nextcloud
Code:
Failed to pull image "tccr.io/truecharts/nextcloud-fpm:v24.0.3@sha256:a307c7d49d5c7c691916f2e6bdbb10bc698bbd58498de12c4db992c4755a7018": rpc error: code = Unknown desc = context canceled


Initially I thought this was a DNS issue and tried changing my DNS to various providers. I was pulling my hair out trying a lot of things from various forums and Discord chats, but none of it worked. My chart configuration, DNS and permissions were all correct, yet bigger images kept failing at the image pull stage.

After two weeks I was able to solve this by pulling the image manually over an SSH connection to my TrueNAS SCALE box, using the command suggested by another forum user, @aekt:

Code:
docker image pull


After pulling the image manually, installing it from the TrueCharts catalog works with no issues: a clean install.
Note: pulling the image manually took nearly 10-15 minutes each time. The download completed immediately, but extracting the various parts took some time.
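
For anyone repeating this workaround with several apps, here is a minimal sketch of pre-pulling a list of images before installing the charts. `prepull` is a hypothetical helper name, not a TrueNAS or TrueCharts tool, and it assumes the install reuses a locally present image, as reported in this thread.

```shell
# Pre-pull every image passed as an argument and report per-image results.
# prepull is a hypothetical helper, not part of TrueNAS or TrueCharts.
prepull() {
    failed=0
    for image in "$@"; do
        echo "pulling $image"
        if docker image pull "$image"; then
            echo "ok $image"
        else
            echo "failed $image" >&2
            failed=$((failed + 1))
        fi
    done
    # Exit status is 0 only when every pull succeeded.
    [ "$failed" -eq 0 ]
}

# Example, using the reference from the Collabora error above:
# prepull "tccr.io/truecharts/collabora:v22.05.4.1.1@sha256:4b7c9b3583309d4717fb64fa1e5115695abd7e98db4bc02dff0a24cf3ac74b6c"
```

The function only wraps `docker image pull`, so it waits as long as the extraction needs instead of giving up early.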

So my question is: why does the automatic pull through the TrueCharts catalog generate this error, and is there a way to solve it, for example by letting the process know that the image is still being extracted?

My TrueNAS SCALE configuration is as follows:
Code:
OS Version:TrueNAS-SCALE-22.02.2.1
Product:ProLiant DL380p Gen8
Model:Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
Memory:20 GiB

The OS is running off a 32 GB pendrive.
I have two pools:
Image pool - 2 x 128 GB NVMe SSD connected through PCIe, mirrored (used for app storage)
Tank - 4 x 4 TB Seagate HDD, RAIDZ1 (other storage)


I would love to understand why this happens so I can fix it in the future. If any other details are required to fill in the gaps, please let me know.

Thanks in advance
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Thanks for documenting this well... could you "report a bug" so the engineering team can look at it?
Perhaps there is a timeout on the image pull... what is your network speed? Document what you can and share the bug report with the group.
 

waqarahmed

iXsystems
iXsystems
Joined
Aug 28, 2019
Messages
136
@mnbk91 please create a ticket at https://ixsystems.atlassian.net/ and kindly make sure to attach a debug of your system and also the files generated by the following commands:
1. journalctl --no-pager -u k3s > k3s-scale.log
2. journalctl --no-pager -u docker > docker-scale.log

Btw if you are able to reproduce this, please reproduce and then attach a debug and run the above commands so we are sure this is captured in the logs and not cycled out. Thanks!
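
A minimal sketch of running the two log commands above in one step, right after reproducing the failure. `collect_logs` is a hypothetical helper name and the output directory is an assumption; the `journalctl` invocations are the ones listed above.

```shell
# Write the k3s and docker journals to the two files requested above.
# collect_logs is a hypothetical helper; the output directory is an assumption.
collect_logs() {
    outdir=${1:-.}
    mkdir -p "$outdir" || return 1
    journalctl --no-pager -u k3s > "$outdir/k3s-scale.log" || return 1
    journalctl --no-pager -u docker > "$outdir/docker-scale.log" || return 1
    # Show sizes so an empty (already rotated) log is easy to spot.
    wc -c "$outdir/k3s-scale.log" "$outdir/docker-scale.log"
}

# Example: collect_logs /tmp/imagepull-debug
```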
 

mnbk91

Cadet
Joined
May 23, 2022
Messages
9
@waqarahmed & @morganL I created a ticket with the above-mentioned data. I was able to re-create the issue with both the official catalog (Collabora) and TrueCharts (Outline). If any other info is required, please let me know.

My internet speed is 300 Mbps.

 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Thanks.. that's NAS-117452

@wsoteros Notice that the Jira links now show the generic "Log In" screen rather than the specific bug ID. Can that be fixed?
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Also, that ticket is not public (even for registered users).
Apologies.. there is an issue with protecting users' debug and log files. If we have them on the ticket, the ticket is private. We are developing workarounds.
 

stavros-k

Patron
Joined
Dec 26, 2020
Messages
231
Np, thanks for letting us know that you are working on it!
 

Flynn84

Cadet
Joined
Apr 22, 2023
Messages
6
Having the same problem on a slow internet connection when pulling an image. Manually pulling the failed image via docker pull solves the problem immediately and causes the app to deploy.

Code:
Failed to pull image "tccr.io/truecharts/jellyfin:10.8.9@sha256:32efce54017dcefcd437dc179baabb0ed79ad44298b9d0850388bd69f8e9eac5": rpc error: code = Unknown desc = context deadline exceeded


@morganL Can you guys confirm that you are putting some prio on fixing this? It's been over half a year now and this is still a problem.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
@Flynn84, TrueNAS is not responsible for TrueCharts. Please consult TrueCharts for issues with TrueCharts apps.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
The problem may be caused by several issues:
1) Your network bandwidth
2) Bandwidth from TrueCharts repository
3) Time-out that is too aggressive

Given the docker pull is fast.. it's probably not 1)

It would be good to get some technical data.

Can you measure the rate of download when you are pulling?
Can you compare this with a docker pull?
What is the size of the jellyfin image being pulled?
When is it timing out?
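
One way to gather these numbers for the manual pull is sketched below. `time_pull` is a hypothetical helper name. The `Size` field from `docker image inspect` is the unpacked size on disk, which is usually larger than the compressed download, so treat any rate derived from it as a rough guide only.

```shell
# Time a manual pull and report the image's size as stored locally.
# time_pull is a hypothetical helper. Size is the unpacked on-disk size,
# not the number of bytes actually downloaded.
time_pull() {
    image=$1
    start=$(date +%s)
    docker image pull "$image" >/dev/null || return 1
    end=$(date +%s)
    size=$(docker image inspect --format '{{.Size}}' "$image") || return 1
    echo "elapsed_s=$((end - start)) size_bytes=$size"
}

# Example, using the reference from the Jellyfin error above:
# time_pull "tccr.io/truecharts/jellyfin:10.8.9@sha256:32efce54017dcefcd437dc179baabb0ed79ad44298b9d0850388bd69f8e9eac5"
```

If the image is already cached locally, the pull returns almost at once and the timing says nothing about the network.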
 

Flynn84

Cadet
Joined
Apr 22, 2023
Messages
6
Pretty sure this is not a TrueCharts issue but I may be wrong.

I read here that this is a known k3s issue, though, one that primarily occurs on slow connections. I assumed it was related to the error the OP was getting, but maybe it is a different issue entirely.

I am getting the full connection speed of my line on both. The pull through app deployment reliably errors out after about two minutes. I've done some more reading about the k3s issue and a timeout after two minutes is described there also. Docker pull takes about seven minutes give or take with no errors. I can post more details but am about to head to bed so that would be tomorrow.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Thanks... so if you can get the size of the pull and the expected speed, we can see if this is a tunable that might be needed.

It's not something I've seen commonly recently... are you on 22.12.2?
 

Flynn84

Cadet
Joined
Apr 22, 2023
Messages
6
Yes, 22.12.2. Also encountered the Reporting tab showing null on all values. Changing BIOS time unfortunately didn't fix that. Kinda frustrating.
 

Flynn84

Cadet
Joined
Apr 22, 2023
Messages
6
Size of the pull was about 300 MB, I think, and the speed shown on the dashboard matched the speed of my internet connection.
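
As a rough check on the figures reported in this thread (about 300 MB, about seven minutes for the manual pull, an error after about two minutes), the sketch below works out the average rates involved. All inputs are approximate, decimal units are assumed (1 MB = 8 Mbit), and `rate_mbit` is a hypothetical helper. Whether the limit applies to the whole pull or only to a lack of progress is not established here.

```shell
# Average rate in Mbit/s for SIZE_MB megabytes transferred in SECONDS.
# Decimal units are assumed: 1 MB = 8 Mbit.
rate_mbit() {
    LC_ALL=C awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.1f\n", mb * 8 / s }'
}

# About 300 MB in about seven minutes (the manual docker pull): prints 5.7
rate_mbit 300 420
# The same 300 MB inside about two minutes: prints 20.0
rate_mbit 300 120
```

On these numbers the manual pull averaged roughly 5.7 Mbit/s, while finishing inside two minutes would need roughly 20 Mbit/s sustained.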
 