Practical DevOps for Big Data/Modelling Abstractions

Introduction

Data-intensive applications that use Big Data technologies, such as Hadoop MapReduce or stream processing, are important in many application domains, for instance predictive analytics or smart cities. The DICE development approach follows the principles of model-driven development, with the Unified Modeling Language (UML) as the DICE choice for design. This provides the ability to continuously re-architect the designs of data-intensive applications based on quality improvements such as performance or reliability. The DICE approach allows reasoning about qualities in terms of data properties (e.g., data volumes) and data usage patterns (e.g., read rates or write rates). These data characteristics are introduced into the UML designs by means of a new profile, called the DICE Profile. This outline presents the DICE Profile, its challenges, its architecture and its conceptualization.

The OMG Model-Driven Architecture (MDA) contemplates different abstraction layers for modeling: the Platform Independent Model describes the general architecture and behavior of the software while hiding the underlying platform; the Platform Specific Model refines the previous one by adding information related to a specific platform. Stemming from these principles, the DICE Profile considers three abstraction layers, called DPIM, DTSM and DDSM. The DICE Platform Independent Model (DPIM) supports the specification of the source data format, computation logic, synchronisation mechanisms and quality requirements. The DICE Platform and Technology Specific Model (DTSM) is a refinement of the DPIM and includes some technology-specific concepts, both for computation logic and for data storage. Finally, the DICE Deployment Specific Model (DDSM) is a further specialisation of the DTSM that includes complete information about the technology in use and the application deployment.
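The refinement chain between the three layers can be illustrated with a small Python sketch. It is not part of the DICE Profile: the class names, fields and helper functions (DPIMNode, DTSMNode, DDSMNode, refine_to_dtsm, refine_to_ddsm) and the example values are hypothetical and only mirror the idea that each layer wraps the previous one and adds more specific information.

```python
from dataclasses import dataclass, field


@dataclass
class DPIMNode:
    """Platform-independent view (hypothetical): data format and quality needs."""
    name: str
    source_data_format: str
    quality_requirements: dict = field(default_factory=dict)


@dataclass
class DTSMNode:
    """Technology-specific refinement (hypothetical): DPIM node plus a technology."""
    dpim: DPIMNode
    technology: str
    technology_params: dict = field(default_factory=dict)


@dataclass
class DDSMNode:
    """Deployment-specific refinement (hypothetical): DTSM node plus hosts."""
    dtsm: DTSMNode
    hosts: list = field(default_factory=list)


def refine_to_dtsm(node, technology, **params):
    # Keep the DPIM node untouched and attach technology-level information.
    return DTSMNode(dpim=node, technology=technology, technology_params=params)


def refine_to_ddsm(node, hosts):
    # Keep the DTSM node untouched and attach deployment-level information.
    return DDSMNode(dtsm=node, hosts=list(hosts))


# Illustrative values only; they are assumptions, not taken from a DICE model.
dpim = DPIMNode("ingest", "JSON", {"max_latency_ms": 500})
dtsm = refine_to_dtsm(dpim, "Storm", parallelism=4)
ddsm = refine_to_ddsm(dtsm, ["vm-1", "vm-2"])
```

In this sketch each refinement keeps a reference to the model it refines, so the platform-independent information remains reachable from the deployment-level object (for example, ddsm.dtsm.dpim.quality_requirements).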


Motivations

We present the DICE profile, its foundations and its architecture, and we outline how DICE-profiled models can be exploited in further software development phases. The DICE profile has deep roots in two other profiles, namely MARTE and DAM, and has been structured to fit different abstraction levels (DPIM, DTSM, DDSM), similarly to the MDA standard. For its construction, we have followed a guided process as recommended by state-of-the-art work on building quality profiles. The DICE profile has been implemented and integrated within Papyrus UML, a UML modeling tool based on the well-known Eclipse integrated development environment. The DICE domain models and the DICE profile are publicly available under an open source license in their corresponding repositories, namely the DICE-Models Repository and the DICE-Profiles Repository. In the future, we will focus on the continued validation of the DICE profile.

Constructing a technically correct, high-quality UML profile that covers the concepts needed for data-intensive applications and the corresponding Big Data technologies requires several steps. First, conceptual models for each abstraction level, i.e., DPIM, DTSM and DDSM, are needed. We have carried out this step by carefully reviewing the abstract concepts for modeling data-intensive applications. In this way we have obtained the abstractions for the DPIM level, which make up the DICE domain model at DPIM level; Section 4 presents this model. We have then reviewed the different Big Data technologies addressed by DICE (e.g., Hadoop, Spark or Storm) and defined the abstractions of interest, thereby obtaining the DICE domain model at DTSM level, which is not reported in this paper for space reasons.

As a second step, we realized the need to introduce additional concepts for quality assessment, since the DICE DPIM domain model initially offers only concepts for describing an architectural view. Therefore, we searched the literature for existing UML profiles that address quality concerns, and decided to incorporate MARTE and DAM. Our task was to select, from the domain models of MARTE and DAM, those metaclasses of interest for supporting our specific assessment needs. We then studied how to integrate these metaclasses with the already developed DPIM domain model. As a result, we obtained a final domain model that integrates all the needed features: application abstractions at DPIM level and behavioral abstractions for quality assessment.

As a third step, the DICE profile at DPIM level was defined. Following technical advice on profile construction, we needed to map the concepts of the DPIM domain model into proper UML profile constructs, i.e., stereotypes and tags. In particular, for the DPIM level profile we have designed: (i) the DICE Library, containing types specific to data-intensive applications; and (ii) the DICE UML Extensions (stereotypes and tags). The objective was to introduce a small yet comprehensive set of stereotypes for the software designer.

As a fourth and last step, we conducted a DPIM profile assessment by identifying a set of requirements based on three case studies from different application domains (fraud detection, acquisition of news from social sensors, vessel traffic management) and checking whether they were met by the profile. If a requirement was not met, we went back to the previous step in order to refine the profile. The profile definition was therefore an iterative process.
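The assess-and-refine loop of the third and fourth steps can be sketched in Python as follows. This is only an illustration under stated assumptions: a profile is reduced to a set of concept names, each requirement to the set of concepts it needs, and the concept names themselves (DataSource, ComputationNode, StreamingSource, StorageNode) are placeholders rather than actual DICE stereotypes. The functions unmet_requirements, define_profile and refine are hypothetical helpers.

```python
def unmet_requirements(profile, requirements):
    """Return the names of requirements whose needed concepts are not all present."""
    return [name for name, needed in requirements.items() if not needed <= profile]


def define_profile(initial, requirements, refine, max_rounds=10):
    """Alternate assessment and refinement until every requirement is met."""
    profile = set(initial)
    for round_no in range(1, max_rounds + 1):
        missing = unmet_requirements(profile, requirements)
        if not missing:
            return profile, round_no
        # Go back to the definition step and refine the profile.
        profile = refine(profile, missing)
    raise RuntimeError("profile did not satisfy all requirements")


# Placeholder requirements, loosely named after the three case-study domains.
requirements = {
    "fraud-detection": {"DataSource", "ComputationNode"},
    "news-acquisition": {"DataSource", "StreamingSource"},
    "vessel-traffic": {"StorageNode"},
}


def refine(profile, missing):
    # Assumed refinement policy: add what the first unmet requirement needs.
    return profile | requirements[missing[0]]


profile, rounds = define_profile({"DataSource"}, requirements, refine)
```

The real assessment was carried out by people on UML artefacts; the sketch only shows the control flow, in which the assessment result decides whether another refinement round is needed.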


How the Profile Works

While the annotated UML model is useful for the engineer to specify both the workflow of the data-intensive application (DIA) and its data characteristics, it is not suitable for an assessment of its performance requirements. DICE is a project that, following the model-driven engineering paradigm, aims to define a quality-driven framework for developing DIAs that leverage Big Data technologies. A key asset of DICE is the so-called DICE profile, which offers the ability to design DIAs using UML and a set of additional stereotypes to characterize specific DIA features. DICE-profiled models are the cornerstone of the DICE framework, since they are exploited by the DICE tool-chain to guide developers through the whole DIA lifecycle (e.g., development, quality analysis, deployment, testing and monitoring).