How can I start an application based on several snaps in a controlled way?

I have a background in robotics, in particular with the Robot Operating System 2 (ROS2). ROS2-based applications use “nodes” which communicate with each other. In many cases, a proper application startup requires that nodes are started in a controlled way (correct order, dependent on a node-specific status sent to an “overall application server” node, etc.). These nodes potentially run on different machines. When they run on a single machine, however, there is a launch system that allows controlled startup of nodes. Enough about context, let’s get to the actual questions:

  • How can I start applications which consist of several snaps in Ubuntu Core? (As far as I know there is no “launch-like” mechanism for snaps. People seem to work around this with systemd somehow.)
  • How can I start ROS2 applications in Ubuntu Core whose nodes are wrapped into several snaps? (The ROS2 launch system can only be used if all nodes are wrapped into a single snap.)

Perhaps @kyrofa can comment on some best practices here

Hey @kromerf, good questions.

In many cases, a proper application startup requires that nodes are started in a controlled way (correct order, dependent on a node-specific status sent to an “overall application server” node, etc.).

I’m a ROS guy as well, and you’re right, this isn’t uncommon, but I personally consider it a bug. Your life will always be hard until your ROS nodes gain the ability to wait for proper state, and trying to order your services is a workaround, not a solution. That said, let’s discuss your questions:

You’re correct, snaps don’t have such a mechanism. Snaps don’t really have the concept of dependencies that such a function would require. There was a proposal a while back about being able to order cross-snap services, but I’m not sure that ever went anywhere. @mborzecki, do you have any more information on that?

It can be tempting to come from ROS’ concept of nodes and want to create a snap of each node, but I don’t recommend it. You’ll end up with a lot of dependencies duplicated between each snap, and updating them will be a nightmare as it will be easy to get into the situation where a given message definition changed in the snap publishing, but not the snap subscribing.

I instead suggest approaching each snap as a standalone product, each of which uses ROS 2’s launch system to bring up the collection of nodes contained within it. Depending on your use-case, the simplest option is to put your entire ROS system into a single snap that can be transactionally updated in one go. If that doesn’t work for you, you can still split your ROS 2 system into multiple snaps, but I suggest trying to avoid coupling them too closely.

As a simple example, your UAV could have three snaps that are configured to work together without duplicating a lot of dependencies:

  1. A “foundation” snap that contains base behavior: nodes that read from various sensors and control the servos. This might contain a mux configured to be controlled via two topics, one from a teleop system and one from an autonomous navigation system.
  2. A “teleop” snap responsible for communicating with whatever RC system exists. It publishes one of the topics being muxed into the “foundation” snap.
  3. A “navigation” snap responsible for mapping and autonomous navigation. It publishes the other topic for the “foundation” snap.

You can see how these snaps are split by concern rather than by node, and none strictly depends on anything being launched by another. All three can use ROS 2’s launch system to bring up their own collections of nodes. You still need to be careful to update them in lockstep if you ever change the messages passed between them, of course, but sometimes that’s necessary to gain the functionality you require. It’s all a trade-off.
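To make that split concrete, the “foundation” snap could expose its own launch file as a daemon in its snapcraft.yaml. This is only a sketch: the app name, the foundation_bringup package, the bringup.launch.py file, and the ros2 binary path are all placeholders that depend on your ROS distro and how the snap is built.

```yaml
apps:
  foundation:
    # Bring up every node contained in this snap with one ROS 2 launch file.
    # "foundation_bringup" and "bringup.launch.py" are hypothetical names.
    command: opt/ros/dashing/bin/ros2 launch foundation_bringup bringup.launch.py
    plugs: [network, network-bind]
    daemon: simple
```

The “teleop” and “navigation” snaps would do the same with their own launch files, so each snap starts independently and only the topics tie them together.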

I hope that helps, please let me know if you have any more questions.

We iterated on the design at the most recent engineering sprint and have a design that will be implemented either this cycle or next. When that’s closer to being ready, we will update that post or write a new one, as the design has changed a bit since then.

You are totally right. Using roslaunch during development is usually an intermediate step.

For non-ROS2 people: What’s meant here are managed nodes with a lifecycle.

I expected to be able to use a concept from the container domain, like docker-compose files. As applications in the Docker container domain are collections of containers, some controlled way of starting them up is essential. Simple snaps usually wrap applications without dependencies, or with dependencies which do not require much state management (e.g. snaps which simply connect to AWS S3). ROS2 snaps are a special case because it’s not uncommon that they are deployed on different machines as part of a distributed system. However, non-ROS2 snaps which are part of an application will often depend on MQTT, for example. The MQTT broker (e.g. Mosquitto) is a separate snap as well. As snaps like these do not provide a mechanism for state management, it’s not clear to me right now how to design a super reliable system.

Thanks for this hint. I totally agree.

In this type of case generally the broker would be included in your snap instead of using the separate snap. This allows you to ensure what you’re delivering is exactly what you tested (whereas the mosquitto snap will update on its own).

Note that ROS 2’s launch system doesn’t support remote launches like ROS 1’s did. If you’re really looking for a way to coordinate deployments across multiple machines that communicate with each other, it feels like you’re entering the territory of something like Kubernetes instead. Have you ever experimented with that? MicroK8s is a great place to get started. I’m actually working on an article about this now if that would be helpful to you.

Hm, you are right. That sounds way more reliable. That means I have to include the app dependency, e.g. MQTT here, as part of the snap, I guess. Is there some example code where I can learn how to do that exactly?

Yes, I know. In the case of distributed snaps (one ROS2 app snap per machine), one has to implement node state management in the ROS2 application itself to handle overall application state, including proper startup. MicroK8s is great, but I would prefer using snaps. Using containers in Ubuntu Core seems to eliminate a lot of the advantages which snaps provide (easy-to-handle security mechanisms, over-the-air updates, potentially better performance and more opportunities for optimization, etc.).

Ah, you’ve given this some thought. Fair enough, as long as you realize the limitations, we’re here to help!

We might want to start a new thread so we don’t pull this one too far off topic, but I’ll give you a quick rundown of doing exactly this. First, how do you install Mosquitto on Ubuntu? apt install mosquitto. When you build your snap, you can take advantage of that by utilizing stage-packages. Take the following snapcraft.yaml as an example:

name: mosquitto-test
base: core18 # the base snap is the execution environment for this snap
version: '0.1' # just for humans, typically '1.2+git' or '1.3.2'
summary: Single-line elevator pitch for your amazing snap # 79 char long summary
description: |
  This is my-snap's description. You have a paragraph or two to tell the
  most important story about your snap. Keep it under 100 words though,
  we live in tweetspace and your description wants to look good in the snap
  store.

grade: devel # must be 'stable' to release into candidate/stable channels
confinement: strict

parts:
  my-part:
    # See 'snapcraft plugins'
    plugin: nil
    override-build: |
      cat > "$SNAPCRAFT_PART_INSTALL/mosquitto.conf" <<EOF
      user root
      persistence true
      persistence_location /var/snap/$SNAPCRAFT_PROJECT_NAME/current/mosquitto
      log_dest stdout
      EOF
    stage-packages: [mosquitto]

apps:
  mosquitto:
    command: usr/sbin/mosquitto -c $SNAP/mosquitto.conf
    plugs: [network, network-bind]
    daemon: simple

Let’s break that down by section:

name: mosquitto-test
base: core18 # the base snap is the execution environment for this snap
version: '0.1' # just for humans, typically '1.2+git' or '1.3.2'
summary: Single-line elevator pitch for your amazing snap # 79 char long summary
description: |
  This is my-snap's description. You have a paragraph or two to tell the
  most important story about your snap. Keep it under 100 words though,
  we live in tweetspace and your description wants to look good in the snap
  store.

grade: devel # must be 'stable' to release into candidate/stable channels
confinement: strict

Metadata about the snap. I assume you’re familiar with this stuff, so I’ll skip it, but you can go through the first snap walkthrough for ROS 2 if that’s helpful. Let’s get into the meat of this example:

parts:
  my-part:
    # See 'snapcraft plugins'
    plugin: nil
    override-build: |
      cat > "$SNAPCRAFT_PART_INSTALL/mosquitto.conf" <<EOF
      user root
      persistence true
      persistence_location /var/snap/$SNAPCRAFT_PROJECT_NAME/current/mosquitto
      log_dest stdout
      EOF
    stage-packages: [mosquitto]

Snaps consist of one or more “parts”. In this example, there’s only one. You’ll see that we’re using stage-packages here to pull down Mosquitto and unpack it into the snap. You’ll also see that I’m creating a Mosquitto config specific for this snap, for a few reasons:

  • I want my persistence database to go to a versioned directory (this will be $SNAP_DATA once installed)
  • By default Mosquitto wants to run as the mosquitto user, but snap daemons run as (confined) root
  • I want my systemd journal for the service to include the log, so I request that it log to stdout

You could decide to do things differently, of course. On to the final bit:

apps:
  mosquitto:
    command: usr/sbin/mosquitto -c $SNAP/mosquitto.conf
    plugs: [network, network-bind]
    daemon: simple

This is where I define the service for Mosquitto. I hand it the config I created above, specify that it requires the ability to access the network and bind to a port (otherwise confinement would deny it the ability to do so), and finally specify that it’s a simple daemon. As a result, once this snap is installed, Mosquitto will be up and running immediately.

Build the snap:

$ snapcraft

and install it:

$ sudo snap install mosquitto-test_0.1_amd64.snap --dangerous

(--dangerous because it isn’t coming from the store, so its signature can’t be verified)

At this point, mosquitto is up and running, and you can poke at it from your host with mosquitto_sub and mosquitto_pub, etc.

Not specific to ROS, but one option for coordinating startup of multiple services is socket activation.

If your services use sockets to communicate with each other, then socket activation can solve the ordering problem. What happens is that systemd creates the listening sockets for each application, and client applications will block until the backing service starts when they try to connect. You essentially get automatic dependencies from the behaviour of the various processes rather than having to write them out explicitly.
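The blocking behaviour can be demonstrated without systemd at all, using plain TCP sockets: as long as the listening socket exists, a client can connect and send its request before the service ever accepts, and the kernel buffers everything until it does. A minimal sketch of just the underlying socket semantics (all names here are my own, nothing systemd-specific):

```python
import socket
import time

# Sketch of the idea behind socket activation, using plain TCP sockets and no
# systemd: as long as the listening socket exists, clients can connect and
# send data before the backing service has started handling requests.

# 1. The listening socket is created up front (systemd's job in real life).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen()
port = listener.getsockname()[1]

# 2. A client connects and sends its request before any service is running.
#    The kernel completes the handshake and buffers the data.
client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"ping")

# 3. The "service" starts later, takes over the listener, and handles the
#    request that has been waiting for it all along.
time.sleep(0.5)  # simulate slow service startup
conn, _ = listener.accept()
request = conn.recv(4)  # the buffered b"ping"
conn.sendall(b"pong")

reply = client.recv(4)
print(reply)  # b'pong'

conn.close()
client.close()
listener.close()
```

The client never sees the service's slow startup; its connect and send simply succeed, and the reply arrives once the service catches up. That is the ordering guarantee socket activation gives you for free.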

The main issue is that the daemons need to support systemd socket activation for this to work. If they already support it, that’s great. If not, you might have some work to do.
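For what it’s worth, snapd has support for declaring such sockets: a snap daemon can list them in snapcraft.yaml, and snapd generates the corresponding systemd socket unit, so the socket exists as soon as the snap is installed. A sketch of the relevant stanza (the app name and bin/broker command are hypothetical, and the daemon itself must still know how to accept an inherited socket):

```yaml
apps:
  broker:
    command: bin/broker
    daemon: simple
    sockets:
      broker-socket:
        # systemd creates this socket at install time; the daemon is only
        # started when a client first connects to it.
        listen-stream: $SNAP_COMMON/broker.sock
        socket-mode: 0660
```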


@jamesh Thanks a lot for this potential solution and explanation.

I hope I’m aware of most limitations 🙂

Sure. For now… thanks a lot for the detailed explanation!

I’ll have to take a deeper look into this.

Yeah, that’s something I already understand.

That’s exactly what I don’t know well enough yet.
