Guardian pulls large amounts of foreign IPFS data onto local disk #3206

@AlexIvanHoward

Problem description

Our Guardian instance, which is configured to use web3.storage as its IPFS pinning service provider, pulls large amounts of IPFS data onto local disk. Most of this data is not our own. For example, we currently have a mere 1.5 MB of data in our web3.storage account, but our Guardian instance has pulled 22 GB of IPFS data from the web into its ./runtime-data/ipfs/data/ directory since we completely cleared that directory a week ago. This is not the result of a one-off pull; it appears to be a continuous process, because the amount of IPFS data in the directory keeps growing even when we are doing nothing whatsoever on our Guardian instance. I have been monitoring this for a while on an instance which currently has no users other than myself. I check the size of the ./runtime-data/ipfs/data/ directory as the last thing at the end of my workday, and check it again first thing the next morning. The directory is consistently at least ~1 GB larger in the morning than it was when I stopped working the evening before.
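For anyone who wants to reproduce the evening/morning measurement, the check could be scripted roughly as follows. This is a minimal sketch and not part of Guardian: the helper names `dir_size_bytes` and `growth_gb` are hypothetical, and only the directory path is taken from the setup described above.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under `path` (hypothetical helper)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            try:
                # lstat does not follow symlinks, so a link is counted by its
                # own size and its target is not counted a second time.
                total += os.lstat(file_path).st_size
            except OSError:
                # The IPFS node may delete or rewrite files while we walk.
                pass
    return total


def growth_gb(before_bytes, after_bytes):
    """Growth between two samples in decimal gigabytes (1 GB = 1e9 bytes)."""
    return (after_bytes - before_bytes) / 1e9


if __name__ == "__main__":
    # Assumption: the script is run from the Guardian root directory.
    size = dir_size_bytes("./runtime-data/ipfs/data/")
    print(f"{size} bytes ({size / 1e9:.2f} GB)")
```

Running it once in the evening and once in the morning, and passing the two byte counts to `growth_gb`, gives the overnight growth. The result can differ slightly from `du`, which reports allocated disk blocks rather than file lengths.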

It has also happened numerous times that a huge amount of IPFS data was pulled onto disk in a very short period of time, ultimately crashing our cloud instance because the Guardian tried to pull more IPFS data onto disk than the machine had space for. A week ago, for example, our machine, which currently has 200 GB of disk space allocated to it, crashed overnight. When I inspected the situation in the morning, I found that the crash was caused by the Guardian pulling 147 GB of data from IPFS, leaving no free space on the local disk. Our own web3.storage IPFS account did not even have 1.4 MB of data in it at the time.

We first noticed this behaviour during the first half of 2023. Unfortunately, I cannot say after which release it started to happen.

We don't know if this also happens on mainnet; our Guardian instances are all still running on testnet.

I am not sure if this is a bug or if it is related to tickets such as #2629 and #3046. My expectation, however, is that a Guardian instance, when the IPFS storage provider is a cloud-based provider (such as web3.storage) and therefore NOT a local IPFS node, should not be pulling any IPFS data onto local disk which is not in the IPFS account of the Guardian instance itself.

Given the problem description above, I have two questions:

  1. Why does the Guardian do this?
  2. Is there any way to prevent the Guardian from doing this?
