Provenance in Practice - ASOV meeting



Mathieu Servillat (LUTH, Observatoire de Paris - CNRS)

The objective of this meeting is to bring together the ASOV community and beyond to discuss the needs expressed and the solutions proposed for managing provenance information in astronomy.

Provenance in practice has been discussed within the ESCAPE European project (with a dedicated workshop [1]), and at the last ADASS and IVOA conferences (see e.g. [2]). Building on prototype implementations, a full provenance management system has been designed [3]. The concept of last-step provenance was also recently proposed to make provenance easier to adopt.

We plan to have presentations on the morning of Tuesday 14, so that participants can present their needs and solutions (under consideration or in place) with regard to provenance. On Tuesday afternoon, we will discuss the common solutions identified and demonstrate some existing tools (voprov, OPUS, logprov, ...). On the morning of Wednesday 15, we plan to specify the necessary VO standards or notes, and any further developments required.

Here are some topics that may be addressed during the meeting:

  • Existing and considered provenance recommendations at IVOA
    • Data models
    • Simplified views of the data model (last-step provenance)
    • Access protocols (ProvSAP, ProvTAP)
    • Serializations
  • Use cases for the capture and the use of provenance information
    • Inputs from ESCAPE partners
    • Finding the right granularity for the capture
    • Connections with workflows
  • Demo of implemented tools (voprov, ProvSAP, ProvTAP, logprov...)
  • Towards a full provenance management system


Collaborative notes:

Ice-breaker document:


Participants:

  • Benjamin Mampaey
  • Catherine Boisson
  • François Bonnarel
  • Gilles Landais
  • Jean-Marc Petit
  • Jesus Salgado
  • Jose Enrique Ruiz
  • Mathieu Servillat
  • Michèle Sanguillon
  • Mireille Louys
  • Nicolas Bruot
  • Patrick Maeght
  • Pierre Cristofari
  • Stéphane Erard
  • Veronique Delouille

  • Tuesday, December 14
    • 9:45 AM - 10:00 AM
      Welcome 15m
    • 10:00 AM - 12:00 PM
      Status and Use cases

      Presentation of the Provenance vision at IVOA, followed by a round table in which participants describe their needs and the data products for which provenance is relevant.

      Convener: François Bonnarel (CDS ObAS)
      • 10:00 AM
        Introduction and status of Provenance at IVOA 40m
        Speaker: Dr Mathieu Servillat (LUTH, Observatoire de Paris - CNRS)
      • 10:40 AM
        Catalog of Coronal Hole detection in TAP service 20m

        Coronal holes (CH) are seen as dark features on EUV images of the Sun. They are the source of the fast solar wind, and as such are closely monitored. The CH feature recognition module named SPoCA-CH was developed at the Royal Observatory of Belgium. Since 2010, a version of this software has been running at LMSAL, providing near-real-time detection of coronal holes. We would like to provide the outputs from an updated version of SPoCA-CH as a catalog for a VESPA TAP service. In this talk, we will discuss our progress, as well as questions regarding the encoding of provenance information within this TAP service.

        Speaker: Véronique Delouille (Observatoire royal de Belgique)
      • 11:00 AM
        Planetary data - VESPA 20m
        Speaker: Stephane Erard (LESIA/PADC - Obs Paris)
      • 11:20 AM
        The Outer Solar System Origins Survey 20m
        Speaker: Jean-Marc Petit (UTINAM)
      • 11:40 AM
        AMHRA and MP3C 20m

        AMHRA: Analysis and Modeling at High Angular Resolution
        MP3C: Minor Planet Physical Properties Catalogue

        Speaker: Nicolas Bruot (OCA)
    • 2:00 PM - 5:20 PM
      Demo and discussions
      Convener: Mireille Louys
      • 2:00 PM
        Provenance SKA requirements 20m
        Speaker: Jesus Salgado (SKAO)
      • 2:20 PM
        Pipeline execution and tracing with CTADIRAC 20m
        Speakers: Michèle Sanguillon (LUPM - IN2P3 - CNRS), Patrick Maeght
      • 2:40 PM
        In-pipeline provenance capture for LST1 (CTA prototype telescope) 20m
        Speaker: Jose Enrique Ruiz (IAA)
      • 3:00 PM
        How to use the voprov Python package 10m
        Speaker: Dr Mathieu Servillat (LUTH, Observatoire de Paris - CNRS)
      • 3:10 PM
        Provenance capture with OPUS 10m

        ProvSAP access to provenance graphs

        Speaker: Dr Mathieu Servillat (LUTH, Observatoire de Paris - CNRS)
      • 3:20 PM
        Coffee break 20m
      • 3:40 PM
        Provenance integration to Vizier 20m
        Speaker: Gilles Landais (CDS)
      • 4:00 PM
        Query provenance using ProvTAP 20m
        Speaker: François Bonnarel (CDS ObAS)
      • 4:20 PM
        Definition of a last-step flat provenance 20m
        Speaker: Dr Mathieu Servillat (LUTH, Observatoire de Paris - CNRS)
      • 4:40 PM
        Preparation of next day topics - discussion 20m
  • Wednesday, December 15
    • 10:00 AM - 12:00 PM
      Conclusion and next steps

      Discussion and conclusions, possibly focused on specific questions, that should lead to a roadmap of future implementations and recommendations.

      Convener: Dr Mathieu Servillat (LUTH, Observatoire de Paris - CNRS)


      • Last-step flat provenance in more detail
        • requirements
        • mandatory fields?
        • FITS keywords?
        • map with other standards (OGIP?)
      • Discuss logprov Python package
        • demo
        • doc topics
        • associated coding guidelines
      • voprov improvements
        • should simplify the declaration of descriptions and configuration
        • example of internal OPUS code to autogenerate the prov graph
        • doc and debugging
      • Provenance and workflows
        • connector to workflow in last-step provenance
        • mapping CWL to Provenance descriptions
        • workflow attributes: ref to file, and format 
      • Storage
        • what is a "provenance record"
        • specifications for a DB ingestion system: read a provenance file and ingest it into a relational database structured according to the IVOA ProvDM
        • test of NoSQL solution / graph databases
      • Query provenance
        • ProvTAP views: should answer use cases
        • query an element of a graph?
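
As an illustration of the Storage and Query provenance items above, here is a minimal sketch of ingesting a provenance record into a relational database and querying it. It is not the ingestion system discussed at the meeting: the record layout, the table and column names, and the helper functions ingest and inputs_of are all assumptions made for this example. Only the core classes and relations (Entity, Activity, used, wasGeneratedBy) come from the W3C PROV and IVOA ProvDM models. The JSON string stands in for the content of a provenance file.

```python
import json
import sqlite3

# Hypothetical, simplified record (NOT the official PROV-JSON layout).
RECORD = json.loads("""
{
  "entity": {"ex:raw_image": {}, "ex:catalog": {}},
  "activity": {"ex:detection_run": {"startTime": "2021-12-14T10:00:00"}},
  "used": [{"activity": "ex:detection_run", "entity": "ex:raw_image"}],
  "wasGeneratedBy": [{"entity": "ex:catalog", "activity": "ex:detection_run"}]
}
""")

# Assumed schema: one table per core class, one per relation.
SCHEMA = """
CREATE TABLE entity (id TEXT PRIMARY KEY);
CREATE TABLE activity (id TEXT PRIMARY KEY, start_time TEXT);
CREATE TABLE used (
    activity_id TEXT REFERENCES activity(id),
    entity_id TEXT REFERENCES entity(id));
CREATE TABLE was_generated_by (
    entity_id TEXT REFERENCES entity(id),
    activity_id TEXT REFERENCES activity(id));
"""


def ingest(conn, record):
    """Insert the entities, activities and relations of one record."""
    conn.executemany("INSERT INTO entity VALUES (?)",
                     [(eid,) for eid in record["entity"]])
    conn.executemany("INSERT INTO activity VALUES (?, ?)",
                     [(aid, attrs.get("startTime"))
                      for aid, attrs in record["activity"].items()])
    conn.executemany("INSERT INTO used VALUES (?, ?)",
                     [(u["activity"], u["entity"]) for u in record["used"]])
    conn.executemany("INSERT INTO was_generated_by VALUES (?, ?)",
                     [(g["entity"], g["activity"])
                      for g in record["wasGeneratedBy"]])
    conn.commit()


def inputs_of(conn, entity_id):
    """Return the ids of entities used by the activity that generated entity_id."""
    rows = conn.execute(
        "SELECT u.entity_id FROM was_generated_by g "
        "JOIN used u ON u.activity_id = g.activity_id "
        "WHERE g.entity_id = ? ORDER BY u.entity_id",
        (entity_id,))
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
ingest(conn, RECORD)
print(inputs_of(conn, "ex:catalog"))
```

The join in inputs_of answers a one-step "which inputs produced this entity?" question; following the chain further back would require repeating the query or using a recursive one.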



      • logprov
        • create tutorial to build an interface that executes any code (outside Python) with prov tracking
        • documentation
        • dev:
          • use voprov instead of prov
          • to be tested with latest version of gammapy
      • voprov
        • publish notebook on voprov github
          • use generic filenames
          • add location
          • add EntityDescription.content_type
        • documentation
        • link in doc to the OPUS function that auto generates a graph
        • bugs:
          • update dependency to prov==2.0
        • dev:
          • shortcuts to declare parameter and descriptions
          • shortcut to prov:label
          • YAML output
            • to be defined (IVOA Note?)
          • JSON schema?
          • serializations proposed in voprov
      • Last-step flat provenance
        • Prepare an IVOA Note
        • based on ids
        • important to include used and generated ids
        • How to resolve those ids?
          • ProvURL: link to a ProvSAP that can resolve the ids
            • PROVURL keyword in FITS
            • ServiceDescriptor in VOTable?
      • ProvTAP
        • Draft before Christmas
        • Discussion in January on IVOA draft
        • Discussion with Gilles on ProvTAP usage
      • Provenance and workflows
        • test CWL on a use case, check mapping with prov
          • HESS DR1
          • CTA data challenge
        • internship?
      • Domain specific discussions
        • to take place in January-February within ESCAPE
        • SKA (with Jesus at SKAO, and Lourdes, Julian, etc. at IAA)
        • Solar data (with ROB)
        • CTA
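
To make the Last-step flat provenance item above more concrete, here is a minimal sketch of a flat, id-based record and of how a ProvURL could be used to resolve one of its ids. The content of the future IVOA Note is not defined here: apart from PROVURL, which is named in the notes, the keyword names (ACTIVITY, USEDn, GENERn), the helper functions, the example URL and the ID query parameter are all assumptions made for illustration.

```python
from urllib.parse import urlencode


def last_step_record(activity_id, used_ids, generated_ids, prov_url):
    """Flatten one processing step into keyword/value pairs.

    Keyword names other than PROVURL are hypothetical.
    """
    record = {"ACTIVITY": activity_id, "PROVURL": prov_url}
    for n, used_id in enumerate(used_ids, start=1):
        record[f"USED{n}"] = used_id
    for n, generated_id in enumerate(generated_ids, start=1):
        record[f"GENER{n}"] = generated_id
    return record


def resolve_url(record, key):
    """Build a URL asking the service at PROVURL about one id.

    The 'ID' query parameter is an assumption about the service interface.
    """
    return record["PROVURL"] + "?" + urlencode({"ID": record[key]})


rec = last_step_record(
    activity_id="job-42",
    used_ids=["raw-001", "calib-007"],
    generated_ids=["events-001"],
    prov_url="https://example.org/provsap",  # placeholder service URL
)

# Print the record as simple keyword = 'value' lines, FITS-header style.
for keyword, value in rec.items():
    print(f"{keyword:<8}= '{value}'")

print(resolve_url(rec, "USED1"))
```

Keeping both the used and the generated ids in the record lets a reader start from a file and walk the provenance in either direction through the resolving service.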