Version n.n

IVOA WG Internal Draft YYYY Month DD

This version:
Latest version:
Previous versions:
Bernhard Bauer






Data types


A reference to a VOSpace service and a container node to be used for storing data chunks therein.

The endpoint address for the service as a string.
A Node (see VOSpace specification) describing the container node.


An opaque reference to a pipe.


An opaque identifier capturing the read/unread state of the chunks in a pipe.
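The specification defines these data types abstractly; one possible Python rendering (the names and fields here are illustrative, not prescribed by the spec) could look like this:

```python
from dataclasses import dataclass

@dataclass
class VOSpaceRef:
    """A reference to a VOSpace service plus a container node for chunks."""
    endpoint: str        # the endpoint address for the service, as a string
    container_node: str  # URI of the container Node (see VOSpace specification)

@dataclass(frozen=True)
class Pipe:
    """An opaque reference to a pipe."""
    token: str

@dataclass(frozen=True)
class State:
    """An opaque identifier capturing the read/unread state of chunks."""
    token: str
```

Because Pipe and State are opaque, clients should only pass them back to the service, never interpret their contents.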



Creates a new pipe from the source VOSpace to the destination VOSpace, as well as container nodes in the source and destination VOSpaces. The nodes in the source and destination VOSpace parameters are passed directly to the CreateNode method of the respective VOSpace, and the resulting nodes are returned.

The source VOSpace.
The destination VOSpace.
A newly created Pipe identifier.
The created source container node.
The created destination container node.
If one of the source or destination VOSpaces returns a fault.
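A minimal in-memory sketch of this contract, assuming a Python binding (the function name, the tuple representation of a VOSpace reference, and the registry dict are all illustrative; a real service would call each VOSpace's CreateNode and persist the pipe metadata):

```python
import uuid

def create_pipe(source, destination, registry):
    """Create a pipe between two VOSpaces.

    source/destination: (endpoint, container_node_uri) tuples, passed
    through to the respective VOSpace's CreateNode in a real service.
    registry: dict standing in for the service's pipe metadata store.
    Returns (pipe_id, source_node, destination_node).
    """
    pipe_id = str(uuid.uuid4())
    # In a real service these would be the Nodes returned by CreateNode;
    # here the requested node URIs are simply echoed back.
    src_node = source[1]
    dst_node = destination[1]
    registry[pipe_id] = {"source": source, "destination": destination,
                         "chunks": []}
    return pipe_id, src_node, dst_node
```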


Destroys a pipe and the associated container nodes at the source and destination VOSpaces.

The pipe that should be destroyed.
If the pipe could not be found.
If one of the source or destination VOSpaces returns a fault.
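Continuing the same illustrative in-memory sketch, the destroy operation might look like this (a real service would also issue delete requests for the container nodes at both VOSpaces; the KeyError stands in for the pipe-not-found fault):

```python
def destroy_pipe(pipe_id, registry):
    """Remove a pipe's metadata from the service's store.

    registry: dict standing in for the service's pipe metadata store.
    Raises KeyError if the pipe could not be found.
    """
    if pipe_id not in registry:
        raise KeyError("pipe not found")
    del registry[pipe_id]
```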


Notifies the VOPipe service that a new chunk has arrived in the source VOSpace and instructs it to transfer the chunk to the destination VOSpace.

The pipe the chunk should be written to.

The name of the chunk, relative to the container node.

For example, for a container node with the URI vos://edu.jhu.pha!vospace/transfer, the name of the chunk should be chunk-1.vot for the node with URI vos://edu.jhu.pha!vospace/transfer/chunk-1.vot.

In this case, if the destination container node has the URI vos://nvo.caltech!vospace/transfer, the resulting chunk at the destination VOSpace would be vos://nvo.caltech!vospace/transfer/chunk-1.vot.

If the pipe could not be found.
If one of the source or destination VOSpaces returns a fault.
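The naming rule above can be expressed directly; the helper function is illustrative, but the URIs are the ones from the example in the text:

```python
def chunk_uri(container_uri, chunk_name):
    """Resolve a chunk name, which is relative to a container node,
    into a full node URI within that container."""
    return container_uri.rstrip("/") + "/" + chunk_name

# The source chunk and the resulting chunk at the destination VOSpace:
src = chunk_uri("vos://edu.jhu.pha!vospace/transfer", "chunk-1.vot")
dst = chunk_uri("vos://nvo.caltech!vospace/transfer", "chunk-1.vot")
```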


Reads one or more chunks from the pipe. If no unread chunks are in the pipe, waits at most timeout seconds before returning an empty list of chunks; otherwise, returns immediately with a list of unread chunks.

The pipe the chunks should be read from.
The time to wait if there are no unread chunks, in seconds.

An optional, opaque state identifier signifying what chunks are read and unread. If it is not given, it is assumed that no chunks have been read yet.

Subsequent calls to ReadChunks should always use the state returned from the last call.

A state identifier that should be used in subsequent calls.
A list of unread chunks as strings. The chunks are given with their full URI in the destination VOSpace.
If the pipe could not be found.
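A sketch of how a client threads the state through successive ReadChunks calls, against an in-memory mock of the service (the state is modelled as a plain index for clarity; the real service uses an opaque identifier, and this mock never blocks, so timeout is ignored):

```python
def read_chunks(pipe_id, state, timeout, registry):
    """Return (new_state, unread_chunks).

    state: None on the first call (no chunks read yet), otherwise the
    value returned by the previous call.  registry: dict standing in for
    the service's store; chunks are full URIs in the destination VOSpace.
    Raises KeyError if the pipe could not be found.
    """
    if pipe_id not in registry:
        raise KeyError("pipe not found")
    start = 0 if state is None else state
    chunks = registry[pipe_id]["chunks"]
    return len(chunks), chunks[start:]
```

A client loop would call read_chunks, process the returned chunks, and pass the returned state into the next call, so no chunk is delivered twice.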

Alternative interface

SkyQuery processing with VOPipes

SkyQuery processing happens via a set of SkyNodes, which are connected by pipes. The data for each SkyNode is stored in a VOSpace associated with it. The VOSpaces should be connected to their SkyNodes with high-bandwidth connections (they could even be located on the same machine), and the VOSpaces should also have high-bandwidth connections among themselves.

The VOPipe service that implements the pipes can be located anywhere, and different servers could be used for different pipes; using just a single service for all pipes is also possible. Because the VOPipe service doesn't directly participate in the data transfer, it can use a low-bandwidth connection to the SkyNodes and VOSpaces.

Each SkyNode acts as a client to the VOPipe service: it reads a chunk of data from one pipe, processes it and writes it to another pipe. The VOPipe service orchestrates the data transfer between the VOSpaces.

Data processing at SkyNodes is described in a declarative way in a fashion similar to UNIX makefiles (i.e. how to create a chunk of data from another chunk of data). Processing is stateless and acts on external events, like a chunk of data appearing in a pipe.
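One way to picture such a makefile-like rule: a pure transform wrapped as a stateless event handler that fires when a chunk appears, reads the input chunk, and writes the derived chunk. Everything here is illustrative; the specification does not define a rule language.

```python
def make_rule(transform):
    """Declare how to create a chunk of data from another chunk.

    The returned handler is stateless: its output depends only on the
    chunk event it receives and the pure transform it wraps.
    """
    def on_chunk_event(chunk_name, read_chunk, write_chunk):
        write_chunk(chunk_name, transform(read_chunk(chunk_name)))
    return on_chunk_event

# Example: a rule that upper-cases chunk contents, over dict-backed storage.
source = {"chunk-1.vot": "abc"}
result = {}
rule = make_rule(str.upper)
rule("chunk-1.vot", source.__getitem__, result.__setitem__)
```

Because the handler carries no state of its own, re-firing it after a failure simply recomputes the same output, which is what makes the recovery story in the next sections work.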

SkyNode batch processing interface

An interface to create SkyQuery jobs at SkyNodes should have the following methods:

Failure Recovery

VOPipe failure

If a VOPipe fails, the client (i.e. a SkyNode or the SkyQuery portal) can't connect to the VOPipe service. Because the read/unread state of the data chunks is stored client-side and the pipe state is stored in the database, no data is lost.

VOSpace failure

If a VOSpace fails, the associated SkyNode can't access it directly, and the VOPipe service can't transfer data from or to it, so it faults. In both cases, the SkyNode is notified of the failure.

SkyNode failure

Because a SkyNode acts as a client to the other services, in the case of a SkyNode failure, processing would just stop. In order to escalate the failure to the end-user, some sort of monitoring is needed.

System Diagnosis

Each SkyNode monitors the state of the services it connects to. In addition, it provides an interface to the end-user (or the SkyQuery portal, which acts on behalf of the end-user, so we're using the terms interchangeably from now on) reporting its own state (which is just "alive") as well as the state of the services. If one of the services used by the SkyNode fails, the SkyNode can't continue processing, so it tries again in exponentially growing intervals.
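The exponentially growing retry schedule could be generated as follows; the base, factor, and cap values are illustrative, as the text does not fix a concrete schedule:

```python
import itertools

def backoff_intervals(base=1.0, factor=2.0, cap=300.0):
    """Yield retry delays in seconds, growing exponentially up to a cap,
    as a SkyNode might use when a service it depends on is down."""
    delay = base
    while True:
        yield delay
        delay = min(delay * factor, cap)

first_four = list(itertools.islice(backoff_intervals(), 4))
```

The cap keeps a long outage from pushing the retry interval so far out that recovery is needlessly delayed once the service comes back.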

The end-user monitors the state of all SkyNodes and (indirectly) the other services by regularly polling them. If one of the services goes down, it has to be brought up again (probably manually). Because the SkyNodes regularly try to resume processing, no further user intervention is needed, but a manual notification would be possible.

Mockup of a possible interface:

SkyNode status

SkyNode    Location                        Status
SkyNode A  Johns Hopkins University        OK
SkyNode B  Caltech                         Can't connect to VOSpace - retrying in 5 minutes (retry now)
SkyNode C  Technische Universität München  Not responding

Dealing with data loss

In case of an "infrastructure" failure (i.e. loss of the metadata about the pipes), the pipes would have to be created again. In addition, partial results would have to be discarded: without the associated metadata, the read/unread status of the data is lost.

On the other hand, if intermediate data is lost, query processing can simply resume at the appropriate place because of the stateless nature of the processing.

Implementation Notes