Practical DevOps for Big Data/Iterative Enhancement

Introduction

The goal of DICE is to offer a novel UML profile and tools that help software designers reason about the quality of data-intensive applications, e.g., performance, reliability, safety and efficiency. Furthermore, DICE develops a new methodology that covers quality assessment, architecture enhancement, continuous testing and agile delivery, relying on principles of the emerging DevOps paradigm. In particular, one of the goals of DICE is to build tools and techniques that support the iterative improvement of quality characteristics in data-intensive applications, through feedback to developers that guides architectural design changes.

To achieve that goal, the DICE Enhancement tool was developed to give DICE developers feedback on the application behaviour at runtime, leveraging the monitoring data from the DICE Monitoring Platform (DMon)^[1], in order to help them iteratively enhance the application design. The DICE Enhancement tool introduces a new methodology and a prototype to close the gap between measurements and UML diagrams. It correlates the monitoring data with the DICE UML models, with the aim of bridging the semantic gap between UML abstractions and the concrete system implementation. Based on the acquired data, the DICE Enhancement tool allows the developer to conduct, within the DICE IDE, more precise simulations and optimizations that rely on experimental data rather than on guesses of unknown parameters. The DICE Enhancement tool also supports the developer in carrying out refactoring scenarios, with the aim of iteratively improving application quality in a DevOps fashion. To our knowledge, no mature methodology is available in the research literature on data-intensive applications (DIAs) that addresses the difficult problem of going from measurements back to the software models, annotating UML to help reason about the application design. The DICE Enhancement tool aims to fill this gap.

The core components of the DICE Enhancement tool are two modules:

DICE Filling-the-Gap (FG) module, a tool focusing on statistical estimation of the UML parameters used by the simulation^[2] and optimization^[3] tools. The tool provides the data to parameterize application design-time UML models by relying on the monitoring information collected at runtime. The goal is to enhance and automate the delivery of application performance information to the developer.
DICE Anti-Patterns & Refactoring (APR) module, a tool for anti-pattern detection and refactoring. The tool suggests improvements to the designer of DIAs, based on observed and predicted performance and reliability metrics. The goal is to optimize a reference metric, such as minimizing latency or maximizing mean time to failure (MTTF).

Motivation

We developed the DICE Enhancement tool to address the problem of inferring bad practices in software design (i.e., performance anti-patterns) from data acquired at runtime during testing and operation, especially performance data. Since system runtime and design time operate at different abstraction levels, the developer needs to obtain runtime information, especially performance metrics, and reflect it in the design-time model in order to reason about the quality of an application design and infer refactoring decisions.

To support performance analysis at design time, developers need to rely on software performance models for further analysis and evaluation. However, to provide reliable estimates, the input parameters must be continuously updated and accurately estimated. Accurate estimation is challenging because some parameters are not explicitly tracked in log files and would require deep monitoring instrumentation, whose large overheads are unacceptable in production environments. Furthermore, performance anti-patterns (APs) are recurrent problems that stem from incorrect software decisions. Software APs are widely studied in industry. As software projects grow in size and complexity, new obstacles arise more frequently. For that reason, identifying APs in the early steps of the project life cycle saves money, time and effort. In order to detect the performance anti-patterns of DIAs, the design-time model (i.e., architecture model) and the performance model first need to be specified, as well as the Model-to-Model transformation rules. The architecture model, as the system design-time model, is specified in UML in DICE. In practice, developers typically use UML activity and deployment diagrams to model the system behaviour and infrastructure configuration. UML is a general-purpose modelling language in the field of software engineering, and it provides a standard way to visualize the design of a system. Despite its popularity, UML is not suitable for automated analysis (e.g., performance evaluation). As a result, model analysis phases need to transform the annotated UML diagrams into performance models, e.g., Petri Nets^[4] or Layered Queueing Networks (LQNs)^[5]; our work uses the LQN as the performance model.
As far as performance metrics are concerned, LQN has advantages over UML: it not only describes these metrics in a compact and understandable way but is also supported by various analytical and simulation tools, e.g., LINE^[6] and lqns^[7], and thus allows system performance analysis to be automated for subsequent AP detection.

In order to achieve the above goal, we developed the DICE Enhancement tool to close the gap between runtime performance measurements and the design-time model for anti-pattern detection and refactoring.

Existing Solutions

DICE-FG was initially developed from a baseline, called FG, provided by the MODAClouds project as a way to close the gap between Development and Operations. Within DICE, the tool has undergone a major revision and is being integrated and adapted to operate on DIA datasets. Architectural changes have been introduced in DICE-FG compared to the original FG. Unlike DICE-FG, APR has no baseline software to start from, since the only available tools in this space are not for UML. Hence it is an original contribution of DICE and, to our knowledge, novel in the UML space.

Through our desk research, we have identified the following solutions that can be considered direct competitors to the DICE Enhancement tool.

LibReDE^[8]: a library of ready-to-use implementations of approaches to resource demand estimation that can be used for online and offline analysis. The two tools differ in several respects. LibReDE targets general distributed systems, while DICE-FG is designed for data-intensive applications. LibReDE mainly focuses on resource demand estimation for performance models, while DICE-FG addresses both performance and reliability estimation. LibReDE reads its input measurement data from standard CSV files, while DICE-FG obtains its input data as JSON files via DMon. Finally, LibReDE cannot reflect the estimation results in the design-time model, while DICE-FG continuously parameterizes design-time models (UML models annotated with the DICE Profiles) with estimated performance and reliability metrics, which can inform developers on how to refactor the software architecture.

Kieker^[9]: an extensible framework for monitoring and analyzing the runtime behavior of concurrent or distributed software systems. Kieker enables application-level performance monitoring, including filters that allow the selection of data for further analysis. DICE-FG, however, can reason on design-time models to deliver more accurate inferences of the model parameters from runtime monitoring data. Thus, rather than simply monitoring, the DICE-FG tool is envisioned as a machine-learning component that is aware of the application software architecture and can use this awareness to improve parameter learning.

PANDA^[10]^[11]: a framework for addressing the results interpretation and feedback generation problems by means of performance anti-patterns. DICE-APR follows a similar methodology for automatically detecting and solving performance problems. PANDA and DICE-APR both use UML models as their design-time model (i.e., architecture model), but the input UML models for DICE-APR are annotated with specific profiles (DPIM, DTSM and DDSM) that specify the unique attributes of Big Data applications and platforms. PANDA uses Queueing Networks as its performance model, while DICE-APR may consider Petri Nets or Layered Queueing Networks. DICE-APR’s refactoring process focuses on Big Data applications and aims to improve on earlier work on refactoring cloud-based applications by considering both the hardware and software knowledge of the Big Data application.

PAD^[12]: Performance Antipattern Detection (PAD) is a rule-based performance diagnosis tool. PAD focuses only on component-based enterprise systems, targeting EJB applications, while DICE-APR addresses Big Data applications and platforms. Both are based on monitoring data from running systems, but PAD’s scope is restricted to that specific domain, whereas DICE-APR’s starting point is the more general UML models of data-intensive applications.

How the Tool Works

The DICE Enhancement tool is designed for iteratively enhancing DIA quality. The Enhancement tool aims at providing a performance and reliability analysis of Big Data applications, updating UML models with the analysis results, and proposing a refactoring of the design if performance anti-patterns are detected. The following figure shows the workflow of the Enhancement tool. It covers all of its intended functionalities, which are discussed in detail below.

Figure: DICE Enhancement Tool workflow

DICE-FG

As a core component of the Enhancement tool, the DICE-FG tool plays two roles:

Updating the parameters of the design-time model (UML models annotated with the DICE Profile)
Providing resource usage breakdown information for the data-intensive application in the UML models

Together these features allow the DICE designer to:

Benefit from a semi-automated parameterization of simulation and optimization models. This supports the stated goal of DICE of reducing the learning curve of the DICE platform for users with limited skills in performance and reliability engineering.
Inspect in Eclipse the automated annotations placed by DICE-FG to understand the resource usage imposed by a workload across software and infrastructure resources.

The main logical components of the DICE-FG tool are the Analyzer and the Actuator. Below we describe each component:

DICE-FG Analyzer: The DICE-FG Analyzer executes the statistical methods necessary to obtain estimates of the performance model parameters, relying on the monitoring information available in the input files.
DICE-FG Actuator: The DICE-FG Actuator updates the parameters in the UML models, e.g., resource demands, think times, which are obtained from the DICE-FG Analyzer.
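The division of labour between the two components can be illustrated with a minimal Python sketch. It is not the DICE-FG implementation: the `Sample` record, the `analyze` and `actuate` functions, and the dictionary standing in for an annotated UML model are all hypothetical, and the estimator shown is the plain service demand law (busy time divided by completed requests) rather than any of the statistical methods shipped with DICE-FG.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    """One monitoring window for a single resource (hypothetical record layout)."""
    busy_seconds: float  # time the resource was busy during the window
    completions: int     # requests completed during the window


def analyze(samples):
    """Analyzer step: estimate the mean demand per request.

    Uses the service demand law per window (busy time / completions)
    and averages over all windows that completed at least one request.
    """
    usable = [s for s in samples if s.completions > 0]
    if not usable:
        raise ValueError("no window with completed requests")
    return mean(s.busy_seconds / s.completions for s in usable)


def actuate(model, element, host_demand):
    """Actuator step: return a copy of the model with the estimate written back."""
    updated = {name: dict(attrs) for name, attrs in model.items()}
    updated[element]["hostDemand"] = host_demand
    return updated


# A dictionary stands in for the annotated UML model (assumption for illustration).
model = {"parse_step": {"hostDemand": None}}
samples = [Sample(2.0, 100), Sample(3.0, 100), Sample(0.0, 0)]
model = actuate(model, "parse_step", analyze(samples))
print(model)
```

Keeping the estimation and the model update in separate functions mirrors the Analyzer/Actuator split described above: the estimator can be swapped without touching the code that writes values back into the model.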

DICE-APR

The DICE-APR module is designed to achieve the following objectives:

Transforming UML diagrams annotated with the DICE profile into a performance model for performance analysis.
Specifying the selected popular APs of DIAs in a formal way.
Detecting potential APs in the performance model.
Generating refactoring decisions to update the architecture model (manually or automatically) and fix the design flaws according to the AP solution.

The components of the APR module are Model-to-Model (M2M) Transformation (Tulsa) and Anti-patterns Detection and Refactoring (APDR), as detailed below.

Model-to-Model (M2M) Transformation (Tulsa): The component transforms a UML model annotated with the DICE Profile into a quality analysis model. The target performance model is a Layered Queueing Network.
Anti-patterns Detection and Refactoring (APDR): The component relies on the analysis results of Tulsa. The selected anti-patterns (i.e., Infinite Wait (IW) and Excessive Calculation (EC)) are formally specified in order to identify whether the model exhibits any anti-pattern issues. Based on the solution of each discovered anti-pattern, refactoring decisions, e.g., component replacement or component reassignment, are proposed to solve it. The architecture model is shared back to the DICE IDE and presented to the user, who decides whether the proposed modification should be applied or not.
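A rule-based detection step of this kind can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: the record types, the `detect` function, and the two threshold predicates (a response-time bound for IW and a utilization bound for EC) are hypothetical and are not the formal specifications used by DICE-APR.

```python
from dataclasses import dataclass


@dataclass
class ComponentResult:
    """Performance-model output for one software component (hypothetical layout)."""
    name: str
    response_time: float  # seconds


@dataclass
class ProcessorResult:
    """Performance-model output for one processor (hypothetical layout)."""
    name: str
    utilization: float  # fraction between 0 and 1


def detect(components, processors, max_response_time, max_utilization):
    """Return (anti-pattern, element, suggestion) tuples for every violated rule.

    Assumed predicates: IW is flagged when a component's response time
    exceeds max_response_time; EC is flagged when a processor's
    utilization exceeds max_utilization.
    """
    findings = []
    for c in components:
        if c.response_time > max_response_time:
            findings.append(("IW", c.name, "consider replicating or redesigning this component"))
    for p in processors:
        if p.utilization > max_utilization:
            findings.append(("EC", p.name, "consider adding a processor and migrating tasks to it"))
    return findings


findings = detect(
    [ComponentResult("join_bolt", 2.5), ComponentResult("parse_bolt", 0.2)],
    [ProcessorResult("vm1", 0.95), ProcessorResult("vm2", 0.30)],
    max_response_time=1.0,
    max_utilization=0.8,
)
for kind, element, suggestion in findings:
    print(f"{kind} at {element}: {suggestion}")
```

The function only reports findings and suggestions; applying a change is left to the user, in line with the workflow described above.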

Open Challenges

The DICE Enhancement tool assumes that the designer uses UML to represent the architecture model (i.e., activity diagram and deployment diagram) and uses the LQN model as the performance model. Without a UML model, users have to manually define the LQN model according to their architecture model, which may require extra effort. Besides, DICE-APR can currently detect two performance anti-patterns, and its targets are Storm-based Big Data applications.

Application domain: known uses

DICE-FG

DICE-FG has been applied across a variety of technologies, including Cassandra^[13] and Hadoop/MapReduce^[14]. For example, DICE-FG provides a novel estimator for hostDemands, which is able to efficiently account for all the state data monitored for a Big Data system. The hostDemand of a Big Data application may be seen as the time that a request spends at a resource, for example, the execution time of a Cassandra query of type $c$ at node $r$ of a Cassandra cluster. A new demand estimation method called EST-LE (logistic expansion) has been included in the DICE-FG distribution. This method makes it possible to use a probabilistic maximum-likelihood estimator to obtain the hostDemands. This approach is more expressive than the previous est--qmle method in that it includes information about the response time of the requests, in addition to the state samples obtained through monitoring. An obstacle that had to be overcome to offer this method is that the resulting maximum-likelihood problem is computationally difficult, resulting in very slow execution times for the computation of the likelihood function. An asymptotic approximation was therefore developed that allows the likelihood to be computed efficiently even in complex models with several resources, request types, and a high level of parallelism.
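To make the notion of a per-class demand concrete, the following Python sketch estimates the demands of two request classes at one resource from monitoring windows. It deliberately uses a much simpler technique than EST-LE: an ordinary least-squares fit of the utilization law $U = X_1 D_1 + X_2 D_2$, where $X_c$ is the throughput of class $c$ and $D_c$ its demand. The function name, the window layout and the numbers are assumptions made for illustration only.

```python
def estimate_demands(windows):
    """Least-squares fit of U = X1*D1 + X2*D2 over monitoring windows.

    Each window is a tuple (x1, x2, u): the throughputs of the two
    request classes (requests/second) and the measured utilization of
    the resource in that window. Returns the estimated (D1, D2).
    """
    # Entries of the 2x2 normal equations (A^T A) d = A^T u.
    s11 = sum(x1 * x1 for x1, _, _ in windows)
    s12 = sum(x1 * x2 for x1, x2, _ in windows)
    s22 = sum(x2 * x2 for _, x2, _ in windows)
    b1 = sum(x1 * u for x1, _, u in windows)
    b2 = sum(x2 * u for _, x2, u in windows)

    det = s11 * s22 - s12 * s12
    if abs(det) <= 1e-12 * s11 * s22:
        raise ValueError("throughputs are collinear; demands are not identifiable")

    # Cramer's rule for the 2x2 system.
    d1 = (b1 * s22 - b2 * s12) / det
    d2 = (b2 * s11 - b1 * s12) / det
    return d1, d2


# Synthetic windows generated with D1 = 0.01 s and D2 = 0.03 s.
windows = [(10.0, 5.0, 0.25), (20.0, 5.0, 0.35), (10.0, 10.0, 0.40)]
d1, d2 = estimate_demands(windows)
print(f"D1 = {d1:.4f} s, D2 = {d2:.4f} s")
```

A regression of this kind uses only throughput and utilization samples. The method described above goes further by also exploiting the response times of the requests, which is what makes it more expressive and more expensive to compute.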

DICE-APR

The practical use of DICE-APR targets Storm-based applications. There are two reasons why DICE-APR is suitable for them. First, since a Storm topology may be seen as a network of buffers and processing elements that exchange messages, it is quite natural to map it into a queueing network model. The interactions among the core elements of Storm applications (i.e., Spouts and Bolts) and the deployment information can also be easily specified by UML activity and deployment diagrams, which are semantically similar to LQN models. Thus, DICE-APR takes the UML model (annotated with the DICE and MARTE profiles) of a Storm application as input and generates a performance model (i.e., a Layered Queueing Network (LQN) model) for performance analysis. Second, in software engineering, APs are recurrent problems that stem from incorrect software decisions at different hierarchical levels (architecture, development, or project management). Performance APs are widely studied in industry. However, few studies focus on the APs of data-intensive applications. Thus, we investigated three classic APs, Circuitous Treasure Hunt, Blob and Extensive Processing^[15], and defined two anti-patterns of Storm-based applications for DICE-APR (i.e., Infinite Wait and Excessive Calculation). The following are the problem statements of those APs and the corresponding solutions.

Infinite Wait (IW): Occurs when a component must request services from several servers to complete a task. If each service requires a large amount of time, performance will suffer. To solve this problem, DICE-APR reports the component that causes the IW and provides component replication or redesign suggestions to the developer.
Excessive Calculation (EC): Occurs when a processor performs all of the work of an application or holds all of the application’s data. This manifests as excessive calculation that can degrade performance. To solve this problem, DICE-APR reports the processor that causes the EC and suggests that the developer add a new processor to which tasks can be migrated.

Therefore, DICE-APR analyses the performance model of the Storm-based application using the LINE solver and provides refactoring suggestions if the above performance anti-patterns (APs) are detected.
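The idea of viewing a Storm topology as a network of queues can be illustrated with a deliberately simplified Python sketch. It is neither an LQN model nor a use of the LINE solver: it assumes a linear pipeline in which the spout emits tuples at a fixed rate, every tuple visits each bolt exactly once, and each bolt behaves as an independent M/M/1 queue. The function name, bolt names and service times are hypothetical.

```python
def analyze_topology(arrival_rate, stages):
    """Evaluate each stage as an M/M/1 queue fed at arrival_rate (tuples/second).

    stages maps a stage name to its mean service time per tuple (seconds).
    Returns {name: (utilization, mean_response_time)}; the response time
    is infinite when the stage is saturated (utilization >= 1).
    """
    results = {}
    for name, service_time in stages.items():
        rho = arrival_rate * service_time          # utilization law
        if rho >= 1.0:
            response = float("inf")                # unstable queue
        else:
            response = service_time / (1.0 - rho)  # M/M/1 mean response time
        results[name] = (rho, response)
    return results


# The spout is modelled as the source emitting 100 tuples per second.
stages = {"split_bolt": 0.004, "count_bolt": 0.009}
results = analyze_topology(100.0, stages)
bottleneck = max(results, key=lambda name: results[name][0])
for name, (rho, response) in results.items():
    print(f"{name}: utilization={rho:.2f}, response={response:.4f} s")
print("bottleneck:", bottleneck)
```

Even this coarse model shows how utilization and response time per element fall out of a queueing view of the topology, which are the kinds of metrics that anti-pattern rules can then be checked against.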

Conclusion

The main achievements of the DICE Enhancement tool are as follows:

DICE-FG provides statistical estimation algorithms to infer the resource consumption of an application and fitting algorithms to match monitoring data to parametric statistical distributions, and uses these algorithms to parameterize UML models annotated with the DICE profile.

DICE-APR transforms the UML model annotated with the DICE profile into an LQN model, defines and specifies two APs and the corresponding AP boundaries for DIAs, detects these APs in the models, and provides refactoring suggestions to guide the developer in updating the architecture.

References

[1] D4.2 Monitoring and Data Warehousing Tools - Final version, http://www.dice-h2020.eu/resources/
[2] D3.4 DICE Simulation Tools - Final version, http://www.dice-h2020.eu/resources/
[3] D3.9 DICE Optimization Tools - Final version, http://www.dice-h2020.eu/resources/
[4] Merseguer, J., Campos, J.: Software performance modeling using UML and Petri nets. Performance Tools and Applications to Networked Systems, 2965, 265-289 (2004)
[5] Altamimi, T., Zargari, M.H., Petriu, D.: Performance analysis roundtrip: automatic generation of performance models and results feedback using cross-model trace links. In: CASCON'16, Toronto, Canada, ACM Press (2016)
[6] http://line-solver.sourceforge.net/
[7] https://github.com/layeredqueuing/V5/tree/master/lqns
[8] Spinner, S., Casale, G., Zhu, X., Kounev, S.: Librede: a library for resource demand estimation. In: ICPE, 227-228 (2014)
[9] van Hoorn, A., Waller, J., Hasselbring, W.: Kieker: A framework for application performance monitoring and dynamic software analysis. In: Proc. of the 3rd ICPE (2012)
[10] Cortellessa, V., Di Marco, A., Eramo, R., Pierantonio, A., Trubiani, C.: Approaching the Model-Driven Generation of Feedback to Remove Software Performance Flaws. In: EUROMICRO-SEAA, 162–169. IEEE Computer Society (2009)
[11] Cortellessa, V., Di Marco, A., Trubiani, C.: Performance Antipatterns as Logical Predicates. In: Calinescu, R., Paige, R.F., Kwiatkowska, M.Z. (eds.) ICECCS, 146–156. IEEE Computer Society (2010)
[12] Parsons, T., Murphy, J.: Detecting Performance Antipatterns in Component Based Enterprise Systems. Journal of Object Technology 7, 55–91 (2008)
[13] http://cassandra.apache.org/
[14] http://hadoop.apache.org/
[15] Smith, C.U., Williams, L.G.: More new software performance antipatterns: even more ways to shoot yourself in the foot. In: Computer Measurement Group Conference, 717–725 (2003)
