As you all may know, the snap ecosystem uses SPDX license identifiers to represent a snap’s licensing information in a concise and machine-readable way, which also allows for validation of SPDX expressions.
The snap store and snapd each maintain a list of SPDX license identifiers (see https://github.com/snapcore/snapd/commits/master/spdx/licenses.go for snapd’s; the store has a similar one). I believe the list was snapshotted when SPDX 2.x was current, and one reason for keeping an internal list is that SPDX did not provide a machine-readable list of identifiers at the time.
This changed with SPDX 3.0, which came out in 2018: the good news is that they now provide said list in JSON, with the identifiers and some other properties of each license.
The bad news is that SPDX 3.0 also introduced new license identifiers, which is how we learned about it, when someone complained the store wasn’t accepting what they thought was a perfectly valid identifier (FWIW snapd would also have rejected those identifiers).
Since SPDX now provides its identifier list in JSON format, and (AFAIU) also guarantees that existing identifiers won’t change or be removed (though they can be marked as deprecated), the obvious fix for the above store bug seemed to be pulling the JSON list of identifiers periodically, so the store would stay up to date and accept the latest ones.
However, a change like this must be considered in the context of the entire snap ecosystem and toolset. The store can’t simply start accepting and publishing snaps with license IDs that snapd does not recognize, because then people will be unable to install them. Just for absolute clarity, I have NOT made any such changes in the store, and won’t until we’re all clear and agreed on what to do.
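To make the "pull the JSON list" idea concrete, here is a minimal sketch of reading identifiers out of such a file. The field names (`licenseListVersion`, `licenses`, `licenseId`, `isDeprecatedLicenseId`) are as I understand the published SPDX list; the sample data is abbreviated and purely illustrative, and `load_identifiers` is a hypothetical helper, not anything that exists in the store or snapd.

```python
import json

# Abbreviated, illustrative sample in the shape of the SPDX licenses.json file.
SAMPLE = """
{
  "licenseListVersion": "3.0",
  "licenses": [
    {"licenseId": "MIT", "isDeprecatedLicenseId": false},
    {"licenseId": "GPL-3.0-only", "isDeprecatedLicenseId": false},
    {"licenseId": "GPL-3.0", "isDeprecatedLicenseId": true}
  ]
}
"""


def load_identifiers(raw):
    """Return (list version, current identifiers, deprecated identifiers)."""
    data = json.loads(raw)
    current, deprecated = set(), set()
    for entry in data["licenses"]:
        # Deprecated identifiers stay in the list; they are only flagged.
        if entry.get("isDeprecatedLicenseId", False):
            deprecated.add(entry["licenseId"])
        else:
            current.add(entry["licenseId"])
    return data.get("licenseListVersion"), current, deprecated


version, current, deprecated = load_identifiers(SAMPLE)
```

Whether deprecated identifiers should still be accepted is a policy question for this thread; the sketch only keeps them separate so either choice is possible.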
In the past we discussed using snapd as the central SPDX validation engine, so both snapcraft and the store would call a hypothetical “snap validate-license-expression” command to ensure uniformity. In and of itself, this solution would not help with constantly-updating SPDX identifiers, because the store’s snapd copy would still need to be updated periodically (which also necessitates constant snapd updates), and also because users in the wild are still exposed to their snapd not being current with what the store has, and receiving an expression their snapd can’t parse.
So the point of this thread is to discuss with the interested parties how best to proceed.
In addition to the store bug I mentioned above, I filed this snapd bug describing the issue, and in there I mention a solution which Bret came up with, which keeps the validation engines separate as they are now, but uses the store as a central repository of license data which snapd can sync to, when needed. To repeat the proposal, which is absolutely a strawman and can be refined, modified or entirely discarded:
- The store will use the latest version of the SPDX license list from the location noted above. We will update our version on every store rollout (happens several times a week).
- Since snapds in the wild are not necessarily always in sync and up to date, there is always the possibility snapd will receive from the store a license expression with unknown (read: new) identifiers. Even having the store use snapd as the validation engine would not remove this possibility.
- So snapd could get/refresh the list of known identifiers from the store. The store will have a verbatim copy of the .json files from spdx.
- To avoid excess traffic, Bret suggested:
1- When trying to validate a snap’s license, use the local data.
2- If an unknown identifier is found in an expression, try fetching the latest data from the store, and retry the validation (which should now pass). Cache that latest data to keep the local license list updated.
3- If the validation still fails, then it is a bogus expression; show the appropriate error.
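The three steps above could be sketched roughly as follows. This is only an illustration of the flow, not snapd code: `LicenseValidator`, `InvalidLicenseError` and the `fetch_latest` callable are hypothetical names, and the expression handling is a deliberate simplification that only pulls identifiers out of the expression (skipping `AND`, `OR`, parentheses and whatever follows `WITH`) without checking the grammar.

```python
import re


class InvalidLicenseError(Exception):
    """Raised when an expression uses identifiers nobody knows about."""


def identifiers(expression):
    """Extract license identifiers from an expression (no grammar check)."""
    found = set()
    skip_next = False
    for token in re.findall(r"[^\s()]+", expression):
        if skip_next:
            # Token after WITH names an exception, not a license.
            skip_next = False
        elif token == "WITH":
            skip_next = True
        elif token not in ("AND", "OR"):
            found.add(token.rstrip("+"))
    return found


class LicenseValidator:
    def __init__(self, seeded_ids, fetch_latest):
        self.known = set(seeded_ids)      # local (seeded or cached) data
        self.fetch_latest = fetch_latest  # hypothetical call to the store

    def validate(self, expression):
        # Step 1: validate against the local data only.
        if not identifiers(expression) - self.known:
            return
        # Step 2: unknown identifier found; refresh from the store, cache, retry.
        self.known |= set(self.fetch_latest())
        missing = identifiers(expression) - self.known
        # Step 3: still unknown after the refresh, so the expression is bogus.
        if missing:
            raise InvalidLicenseError(
                "unknown license identifiers: " + ", ".join(sorted(missing)))
```

With this shape the store is only contacted when local validation fails, so a snapd whose list is already current never generates extra traffic.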
We also need to consider the sideloading case (e.g. snapd could maybe fall back on a cached list when a snap is sideloaded). A problem with sideloads is that if a snap with a newish license expression is installed and snapd has no store access, it will be unable to refresh its license list and so cannot validate the expression. In this case I would suggest just reporting the license as unknown to this snapd.
In any case, snapd should have an initial, seeded list of licenses which should be updated periodically so the disconnected and infrequently updated cases don’t fall too far behind.
What do you think? I’m looking forward to working together to come up with a good ecosystem-wide solution to this issue.
PS: I filed this in the snapd category but tell me if it’s better to put it in the store category. Unfortunately cross-category posts don’t seem to be allowed.
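One way to picture the offline outcome is as a third result next to "valid" and "invalid". The sketch below is only an assumption about how that might look; `Result`, `check_identifier` and `fetch_latest` are hypothetical names, and a failed store request is modelled as an `OSError`.

```python
from enum import Enum


class Result(Enum):
    VALID = "valid"
    INVALID = "invalid"
    UNKNOWN = "unknown to this snapd"


def check_identifier(identifier, known_ids, fetch_latest):
    """Check one identifier, distinguishing 'bogus' from 'cannot tell'."""
    if identifier in known_ids:
        return Result.VALID
    try:
        # Hypothetical store call; updates the cached set in place on success.
        known_ids.update(fetch_latest())
    except OSError:
        # No store access (e.g. sideload on a disconnected device):
        # the identifier may be new, so do not call it invalid.
        return Result.UNKNOWN
    return Result.VALID if identifier in known_ids else Result.INVALID
```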