Extracting existing data from projects to feed into snap.yaml

sergiusens · September 28, 2017, 12:45pm

Preface

There has been discussion for a while, for example at the Development sprint of June 26th, 2017, about providing mechanisms to reuse metadata (I am using the term loosely here) that already exists in a given project tree.

Useful metadata of this kind can be found in project assets such as:

setup.py
package.json
usr/share/metainfo/.*.xml

From these assets we can pre-populate data used in a snap such as:

summary
description
application icons
application screenshots
translations

For desktop applications that use appstream data, it is also interesting to expose the appstream id through the store APIs and snap info, which would give store fronts such as GNOME Software and KDE’s Discover a way to deduplicate entries.

Proposal

As a proposal on the developer end, snapcraft would implement handlers to consume the metadata from specified locations. These artifacts would be specified within a part’s definition. As a strawman, the proposal would be:

parts:
    application_part:
        source: .
        plugin: python
        extract:
            - setup.py

By default, the listed files would be relative to source. To extract something that is created during the build process, it would look like:

parts:
    application_part:
        source: .
        plugin: cmake
        extract:
            - $SNAPCRAFT_PART_INSTALL/usr/share/metainfo/libreoffice-writer.appdata.xml

To enable this, snapcraft would stop enforcing having these consumable metadata items as part of the snapcraft.yaml and instead check for their existence in the resulting meta/snap.yaml.

Additionally, when information is extracted from an appstream source, the appstream id will be added to meta/snap.yaml. As a working strawman, it would look like:

name: cool-application
summary: <summary extracted from the appstream xml>
description: <description extracted from the appstream xml>
...
...
appstream:
    id: org.foo.cool-application

Once the snap is available on the store, the details call for the given snap will add the appstream id to the payload, as will the relevant API calls to snapd.
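
As a rough illustration of what an appstream handler might do, here is a minimal sketch using only the Python standard library. The names `extract_appstream`, `_untranslated` and `SAMPLE_METAINFO` are hypothetical and not part of snapcraft. The sample XML is made up, and the nested `appstream: id:` result simply mirrors the strawman above.

```python
import xml.etree.ElementTree as ET

# Attribute key ElementTree uses for xml:lang on translated elements.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Made-up metainfo content, for illustration only.
SAMPLE_METAINFO = """<component type="desktop-application">
  <id>org.foo.cool-application</id>
  <name>Cool Application</name>
  <summary xml:lang="de">Macht coole Sachen</summary>
  <summary>Does cool things</summary>
  <description>
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </description>
</component>
"""


def _untranslated(root, tag):
    """Return the first direct child named `tag` without an xml:lang attribute."""
    for element in root.findall(tag):
        if XML_LANG not in element.attrib:
            return element
    return None


def extract_appstream(xml_text):
    """Collect summary, description and component id (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    info = {}

    summary = _untranslated(root, "summary")
    if summary is not None and summary.text:
        info["summary"] = summary.text.strip()

    description = _untranslated(root, "description")
    if description is not None:
        # Flatten each <p> to plain text; other markup is ignored in this sketch.
        paragraphs = ["".join(p.itertext()).strip() for p in description.findall("p")]
        if paragraphs:
            info["description"] = "\n\n".join(paragraphs)

    component_id = root.findtext("id")
    if component_id:
        info["appstream"] = {"id": component_id.strip()}
    return info


info = extract_appstream(SAMPLE_METAINFO)
```

Translated elements are skipped here. A real handler would presumably keep them to feed the translations mentioned in the preface.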

Work items

This is a broad outline of the items that would make this feature complete:

Add handler system for snapcraft to consume metadata.
Add an appstream handler (as a first candidate) to snapcraft.
Allow the appstream id data in the review tools.
Expose the appstream id in the relevant store API calls.
Expose the appstream id in the relevant snapd API calls.

sergiusens · September 28, 2017, 12:47pm

I’ve made this post a wiki, @niemeyer, @matiasb, @cprov, @noise or others, feel free to make edits or add information I might have missed.

matk · September 28, 2017, 6:36pm

Looks very good to me, thank you for working on this!

chipaca · September 28, 2017, 8:32pm

The last I heard (but I might’ve missed further conversations) we weren’t going to include appstream data inside the snap.yaml, but in a separate file.

In any case, an appstream id can refer to the snap itself (I think snapcraft calls it a bundle), or the apps themselves (and I think the latter is more common). For the former, id at the toplevel is correct, but for the latter, you really want it to be in a map, i.e.

appstream:
  id: the.id.of.the.snap.itself
  # and more appstream data of the snap itself
  apps:
    foo:
      id: com.example.foo
      # and more appstream data of the foo app

matk · October 2, 2017, 2:09pm

I was thinking the same before, but if a snap contains multiple apps that each have a metainfo file, which one’s description do you take for the snap? @sergiusens’ proposal (which is what we discussed, apart from where exactly to store the data: snap.yaml or an extra file, and I think it was indeed the latter) is nice because it removes every ambiguity about where the data is coming from.

However, having all the AppStream component-ids in the Snap data would also have advantages, for example a software center could display the snap as providing a particular application. E.g. if one builds a LibreOffice snap containing all components (Writer, Calc, etc.) and the software center has them split out already, it can easily determine that the snap is providing these apps and offer it for installation.
In that case, we would have additional metadata from another source though, as the snap would only provide a list of component-ids, and not also a list of descriptions etc. per app it contains.

So, while I do see merit in having a component-id list, it is likely less ambiguous to only allow one ID per snap, unless Snaps allow describing the components they contain individually at some point (aka "Snap libreoffice contains application Writer that has this description and can open the following mimetypes, and has this icon, …").

matk · October 2, 2017, 2:10pm

Another thing to keep in mind is that some metainfo files require data from a .desktop file to be merged in and aren’t standalone. This is rather straightforward to implement though and should not cause problems.
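
A minimal sketch of such a merge, assuming the metainfo data has already been parsed into a dict. The helper `fill_from_desktop`, the sample file, and the mapping of `Name`, `Comment` and `Icon` to `name`, `summary` and `icon` are assumptions made for illustration and are not taken from snapcraft.

```python
import configparser

# Made-up .desktop content, for illustration only.
SAMPLE_DESKTOP = """[Desktop Entry]
Type=Application
Name=Cool Application
Comment=Does cool things
Icon=cool-application
Exec=cool-application %U
"""

# Assumed mapping from .desktop keys to metadata fields.
DESKTOP_TO_INFO = (("Name", "name"), ("Comment", "summary"), ("Icon", "icon"))


def fill_from_desktop(info, desktop_text):
    """Fill fields missing from `info` with values from a .desktop file."""
    # Interpolation is disabled because Exec lines contain '%' field codes.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys case-sensitive, as .desktop files expect
    parser.read_string(desktop_text)
    entry = parser["Desktop Entry"]

    merged = dict(info)
    for desktop_key, info_key in DESKTOP_TO_INFO:
        # Values already present from the metainfo file are never overwritten.
        if info_key not in merged and desktop_key in entry:
            merged[info_key] = entry[desktop_key]
    return merged


merged = fill_from_desktop({"summary": "From metainfo"}, SAMPLE_DESKTOP)
```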

sergiusens · October 2, 2017, 2:17pm

This leaves open the question of which appstream data you want to consume to generate the content. The other thing is: how do we determine foo from your example as a key, or is this something we will just have snapcraft.yaml authors deal with manually?

niemeyer · October 2, 2017, 5:01pm

I suggest covering the appstream ID details in a separate topic, as it’s a red herring here. The language for extracting details out of external files should support any kind of data that snapcraft supports, and the appstream ID will be just one of those details among many others. It will be handled correctly no matter what we end up with as the right thing for that one case.

So, focusing on the data extraction, I think we need to split the logic in two different actions: one of them defines what information is available in a given part, and another one defines what content to adopt. Otherwise, how would we tell which of the N parts a snap makes use of defines the actual content that represents the exposed functionality of the snap?

Defining what a part has to offer takes place inside the part definition itself, perhaps via a new parse-info field. The info term is somewhat general, but we’re already using it on the snap info side which exposes some of those details, and given that we’re qualifying it with the parse prefix it sounds reasonable.

It would look similar to:

parts:
    part-one:
        source: .
        plugin: python
        parse-info:
            - setup.py
            - data/appstream.xml

Instead of asking the user to define whether these paths are inside the build directory or the installation one, we can just look up in both. First the installed one, as that may have been processed and thus should have precedence, and then the build location. If both are missing, we error out clearly complaining about the inconsistency. Otherwise, the path and content are then introspected to find out which info parsing plugin to use, and the parsing then takes place extracting all known details out of the file.
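
The lookup order described above could be sketched roughly as follows. `locate_info_file` and `MissingInfoFileError` are hypothetical names, and picking the parsing plugin from the path and content is left out.

```python
from pathlib import Path


class MissingInfoFileError(Exception):
    """Raised when a parse-info path exists in neither directory (hypothetical)."""


def locate_info_file(relative_path, install_dir, build_dir):
    """Return the first existing candidate, preferring the install directory."""
    for base in (install_dir, build_dir):
        candidate = Path(base) / relative_path
        if candidate.is_file():
            return candidate
    raise MissingInfoFileError(
        f"{relative_path!r} is listed in parse-info but was not found in "
        f"{install_dir} or {build_dir}"
    )
```

Checking the install directory first means a file that the build has processed wins over its unprocessed source counterpart.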

Information from files listed earlier takes precedence over information from files listed later, so in the example above, if both setup.py and appstream.xml define a summary, the one from setup.py is recorded for this part.
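
To make the precedence rule concrete, here is a small sketch that assumes each file has already been parsed into a dict. `merge_part_info` and the sample values are hypothetical.

```python
def merge_part_info(infos_in_listed_order):
    """Merge per-file metadata; files listed earlier win on conflicts."""
    merged = {}
    for info in infos_in_listed_order:
        for key, value in info.items():
            # setdefault only stores the value if no earlier file provided the key.
            merged.setdefault(key, value)
    return merged


# Made-up results of parsing the two files from the example, in listed order.
from_setup_py = {"summary": "Summary from setup.py", "version": "1.0"}
from_appstream = {"summary": "Summary from appstream.xml", "description": "Long text"}

part_info = merge_part_info([from_setup_py, from_appstream])
```

Here `part_info["summary"]` comes from setup.py, while the description, which only appstream.xml defines, is still picked up.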

For the snap to actually use that data, though, the top-level adopt-info field must point to the specific part name containing the data. This way there’s no risk of bogus data being injected into a snap because one of its parts suddenly makes information available, nor any ambiguity if multiple parts define their own information sources.

adopt-info: part-one

With that, all information parsed out of the given part is imported into the snap definition as long as it hasn’t been defined explicitly locally. In other words, any details defined locally in snapcraft.yaml take precedence over anything parsed out of the defined part. This allows importing external details while still overriding and polishing specific fields as required.
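
A minimal sketch of that rule, assuming snapcraft.yaml has been loaded into a dict and each part’s parsed information is available by part name. `apply_adopt_info` is a hypothetical helper, not snapcraft’s implementation.

```python
def apply_adopt_info(snapcraft_yaml, parsed_by_part):
    """Import the adopted part's info; explicit local fields take precedence."""
    part_name = snapcraft_yaml.get("adopt-info")
    if part_name is None:
        # Nothing is adopted, so parsed information from parts is ignored.
        return dict(snapcraft_yaml)
    if part_name not in parsed_by_part:
        raise KeyError(f"adopt-info refers to unknown part {part_name!r}")

    result = dict(parsed_by_part[part_name])
    result.update(snapcraft_yaml)  # values written in snapcraft.yaml override parsed ones
    return result


# Made-up project definition and parsed data, for illustration only.
project = {
    "name": "cool-application",
    "adopt-info": "part-one",
    "summary": "Hand-written summary",
}
parsed = {
    "part-one": {"summary": "Parsed summary", "description": "Parsed description"},
    "part-two": {"description": "Ignored, because it is not adopted"},
}

final = apply_adopt_info(project, parsed)
```

In this example the summary written by hand survives, the description is imported from part-one, and nothing from part-two leaks in.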

How does that sound?

evan · October 2, 2017, 5:36pm

I was inclined to suggest that we just have parse-info and error if it appears multiple times in the yaml. However, that would cause builds to fail if, as you suggest, a part defined outside the yaml adds parse-info. Fixing this would require forking the part or manually setting the metadata. Not great.

I’m not quite ready to abandon the idea, though. Given the library nature of parts with remote definitions, do we have a use case for them setting application metadata?

niemeyer · October 2, 2017, 5:41pm

There’s no delta between what you can put in a remote part vs. what you can put in a local one, and that is per design. If you have a part for postgres, I should be able to include it in my own snap for embedding it.

evan · October 3, 2017, 7:44pm

That convinces me. Thanks @niemeyer.

Lin-Buo-Ren · June 29, 2018, 4:50pm

Is it possible to append additional info after the adopted one? For example, I’d like to add security confinement related info like which slot the user can connect to for extra functionality in the description keyword, is that possible?

sergiusens · June 29, 2018, 6:01pm

The plan is to add snapcraftctl get-description, whose output you could then pass, with your appended data, to snapcraftctl set-description. This however is not on our current roadmap; I wouldn’t expect it to happen before 18.10 is released.