Practical DevOps for Big Data/Future Challenges

Although this book has proposed a concrete approach to practical DevOps in software engineering, it could not cover in detail a number of emerging themes in DevOps and software systems development. In this section, we discuss two emerging challenges in the area that may spawn new industrial and academic research efforts.

From DevOps to DataOps

Throughout the book, we have focused on developing enterprise IT applications that rely on Big data processing technologies. However, we have not addressed in detail the problem of designing and maintaining the data pipeline that feeds a Big data application. This problem, which typically falls under the remit of data engineering experts, involves the acquisition, filtering, analysis, and storage of the data collected by the application. Specific architectures are progressively emerging, such as data lakes, which consolidate all the heterogeneous datasets of an organization in a single storage location. At the same time, data engineers have recently recognized the need to adopt DevOps-style methods for releasing and updating the code involved in data pipelines.

As a concrete example, teams collaborating on a business analytics pipeline also need to test, update, and release new data queries and statistical analyses, in the same way that a software engineer releases new versions of an application or of its components. This emerging trend, called DataOps, raises new research challenges, such as deciding which DevOps methods could be adopted in this domain, or establishing whether DevOps and DataOps methods can coexist in the same methodology. Quality-driven engineering methods face strong challenges in the DataOps domain, since data privacy and data integrity must be guaranteed with every release, which calls for dedicated testing methods.
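One way a DataOps team might borrow a DevOps-style release gate is to run automated integrity and privacy checks on the output of a new query version before publishing it. The Python sketch below is purely illustrative: the `DataContract` type, the `validate_release` function, and all column names are hypothetical, and a real pipeline would typically run checks like these in its CI system against far larger datasets.

```python
from dataclasses import dataclass


# Hypothetical release contract for the output of one analytics query.
@dataclass(frozen=True)
class DataContract:
    required_columns: frozenset[str]
    key_column: str
    pii_columns: frozenset[str]  # columns that must never be published


def validate_release(rows: list[dict], contract: DataContract) -> list[str]:
    """Return the list of violations; an empty list means the release may proceed."""
    violations: list[str] = []
    seen_keys = set()
    for i, row in enumerate(rows):
        columns = set(row)
        # Integrity: every required column must be present.
        missing = contract.required_columns - columns
        if missing:
            violations.append(f"row {i}: missing columns {sorted(missing)}")
        # Privacy: no column flagged as personal data may leak into the output.
        leaked = contract.pii_columns & columns
        if leaked:
            violations.append(f"row {i}: exposes private columns {sorted(leaked)}")
        # Integrity: the key column must be non-null and unique.
        key = row.get(contract.key_column)
        if key is None:
            violations.append(f"row {i}: null key")
        elif key in seen_keys:
            violations.append(f"row {i}: duplicate key {key!r}")
        else:
            seen_keys.add(key)
    return violations


contract = DataContract(
    required_columns=frozenset({"region", "revenue"}),
    key_column="region",
    pii_columns=frozenset({"customer_email"}),
)
good = [{"region": "EU", "revenue": 10.0}, {"region": "US", "revenue": 12.5}]
bad = [
    {"region": "EU", "revenue": 10.0, "customer_email": "a@example.com"},
    {"region": "EU", "revenue": 3.0},
]
print(validate_release(good, contract))  # []
print(validate_release(bad, contract))   # a leaked private column and a duplicate key
```

Returning a list of violations rather than raising on the first one lets a release pipeline report every problem in a single run before blocking the deployment.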

DevOps and Micro-services

Emerging industry trends are raising the problem of extending DevOps to support the delivery of cloud applications that are decomposed into fine-grained, independently deployable services, called micro-services. Micro-services differ from traditional web services in a number of ways:

  1. Micro-services are lean: whereas a traditional web service can expose many APIs, a micro-service typically implements a single function, or very few. Moreover, a micro-service typically relies on lightweight REST/JSON communication, as opposed to the SOAP/WS-* stack common in web services.
  2. Next, a micro-service is meant to map exclusively to a single development team, so that each team can release new versions of its service independently of the others. This implies, among other things, that each team can manage its own data in a decentralized fashion, freeing the application architecture from shared databases; this in turn calls for new architectural paradigms and new methods to ensure availability, consistency, and checkpointing. Another important property of micro-services i
  3. Lastly, and perhaps most importantly, micro-services are meant to exploit the advantages of containerization, which implies, for example, the ability to autoscale the services. As a consequence, when a micro-service becomes so fine-grained that it is equivalent to a single function call in the application code (a so-called nano-service), it becomes possible to autoscale individual portions of the application code, trading performance for a lower cost of scaling.
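To make the autoscaling property more concrete, the sketch below computes a replica count for a containerized micro-service with the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler, desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric). The function name, its parameters, and the simplified tolerance handling are illustrative assumptions; the real controller also accounts for pod readiness, missing metrics, and stabilization windows.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Proportional scaling rule, simplified from the Kubernetes HPA algorithm."""
    if current_replicas < 1 or target_metric <= 0:
        raise ValueError("need at least one replica and a positive target")
    ratio = current_metric / target_metric
    # Inside the tolerance band, keep the current size to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


# 4 replicas at 90% CPU against a 50% target: ceil(4 * 1.8) = 8
print(desired_replicas(4, 90.0, 50.0))  # 8
# 4 replicas at 52% against 50% is within the 10% tolerance: unchanged
print(desired_replicas(4, 52.0, 50.0))  # 4
# 4 replicas at 10% against 50%: ceil(4 * 0.2) = 1
print(desired_replicas(4, 10.0, 50.0))  # 1
```

Because each micro-service is scaled on its own metric, a rule like this can grow only the overloaded service rather than the whole application.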

Clearly, the above properties make micro-services rather different from basic enterprise applications, and they require DevOps to evolve into a decentralized practice, in which each team has at its disposal tools to reason about application updates before delivering them to production. Two of the most pressing research challenges for the near future are developing support tools for the decentralized software engineering of micro-services, and reasoning about how to decompose an application architecture into right-sized services.