A Visual Domain-Specific Language Editor

The Square Kilometre Array (SKA) project is the latest large-scale global scientific endeavour and is designed to produce many times more data than any previous scientific project. A single SKA science project alone will produce data at a rate on the order of terabytes per second. The SKA will also operate under stringent power constraints to limit operational costs. It is a considerable challenge to manage, process and store such large datasets within the budget and other constraints, while at the same time pushing radio astronomy to the forefront of the 'Big Data challenge'.

The state-of-the-art data processing systems currently deployed in astronomy are designed to handle data volumes approximately two to three orders of magnitude smaller than those expected from the SKA. To tackle this challenge, we have developed the Data-Activated Flow Graph Engine, DALiuGE for short ('Liu' is the Chinese word for 'flow'). DALiuGE aims to provide a distributed data management platform and a scalable pipeline execution environment that support continuous, time- and power-bounded, data-intensive processing to produce SKA science-ready products. DALiuGE differs significantly from many existing processing frameworks in three respects. First, DALiuGE allows data items to trigger events, which in turn activate the cascaded execution of parallel processing tasks. Second, DALiuGE integrates data-lifecycle management into the data processing framework. Finally, DALiuGE explicitly decouples the logical view of a problem from its realisation, i.e. its run-time deployment. This not only separates the concerns of different stakeholders such as telescope operators, pipeline developers and astronomers but, more importantly, allows them to optimise data processing collectively at multiple levels in a harmonised way, while the framework optimises the generation of physical execution plans using resource and profiling information.
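The data-activated execution described in the first point can be illustrated with a minimal sketch. It assumes a simplified, in-process model: the classes `DataItem` and `Task`, the helper `build_and_run`, and the toy pipeline stages are hypothetical and are not the DALiuGE API. Completing a data item notifies its consumer tasks; a task runs once all of its inputs are complete and writes its outputs, which in turn activates the next tasks. For clarity the sketch runs every callback synchronously in one thread (and is written to work under both Python 2.7 and Python 3), whereas a real engine would dispatch independent tasks in parallel.

```python
class DataItem(object):
    """Hypothetical data item that notifies its consumers once it is complete."""

    def __init__(self, name):
        self.name = name
        self.value = None
        self.completed = False
        self.consumers = []  # tasks activated when this item completes

    def add_consumer(self, task):
        self.consumers.append(task)
        task.inputs.append(self)

    def write(self, value):
        # Completing a data item is the event that drives execution.
        self.value = value
        self.completed = True
        for task in self.consumers:
            task.on_input_completed()


class Task(object):
    """Hypothetical processing task that runs once all its inputs are complete."""

    def __init__(self, name, func, log):
        self.name = name
        self.func = func
        self.log = log  # records the order in which tasks actually run
        self.inputs = []
        self.outputs = []
        self.done = False

    def add_output(self, item):
        self.outputs.append(item)

    def on_input_completed(self):
        # Fire only once, and only when every input has been written.
        if self.done or not all(item.completed for item in self.inputs):
            return
        self.done = True
        self.log.append(self.name)
        result = self.func(*[item.value for item in self.inputs])
        for item in self.outputs:
            item.write(result)  # cascades to downstream tasks


def build_and_run(raw_value):
    """Wire a small hypothetical pipeline, then trigger it by writing the raw data."""
    log = []
    raw = DataItem("raw")
    calibrated = DataItem("calibrated")
    imaged = DataItem("imaged")
    flagged = DataItem("flagged")
    final = DataItem("final")

    calibrate = Task("calibrate", lambda x: x * 2, log)
    image = Task("image", lambda x: x + 1, log)
    flag = Task("flag", lambda x: x - 1, log)
    combine = Task("combine", lambda a, b: a + b, log)

    raw.add_consumer(calibrate)
    calibrate.add_output(calibrated)
    calibrated.add_consumer(image)  # two branches fan out from one item
    calibrated.add_consumer(flag)
    image.add_output(imaged)
    flag.add_output(flagged)
    imaged.add_consumer(combine)    # combine waits for both branches
    flagged.add_consumer(combine)
    combine.add_output(final)

    # Nothing has run yet; completing the first data item starts the cascade.
    raw.write(raw_value)
    return final.value, log
```

Calling `build_and_run(10)` runs `calibrate`, `image`, `flag` and `combine` in that order and returns 40 as the final value; `combine` is notified after `image` finishes but only fires once `flagged` has also been written.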

At the logical level, which is the level scientists work with, we are developing a visual, domain-specific programming language similar to workflow editors such as Simulink, Kepler or Yabi. A prototype already exists, but it lacks more advanced functionality and is also rather 'ugly'. The goal of the CITS3200 project is to implement some of the more advanced features and interfaces, a more professional and modern look-and-feel, and an access control mechanism. The project will involve web technologies, database technologies (potentially graph databases) and JSON-based interfaces to the underlying DALiuGE engine. The web interface will also have to interact with very advanced scheduling and optimisation algorithms, but the project will not need to implement any of them.

Currently, DALiuGE is mainly written in Python 2.7.12, and the prototype editor uses the GoJS JavaScript library. Other solutions are possible, but they would need clear, well-established advantages over the existing technologies. We work in a fully established continuous integration environment with a professional tool-chain (Atlassian JIRA and Confluence, plus Jenkins), and we expect this project to be developed in that environment as well.

Client

Contact Person: Andreas Wicenec
Telephone: 7847
Email: [email protected]
Preferred method of contact: e-mail
Location: International Centre for Radio Astronomy Research (ICRAR), Ken and Julie Michael Building

IP Exploitation Model

The client wishes to use a Creative Commons CC BY-NC licence for the IP embodied in the project.