Saturday, June 02, 2018

Immutable infrastructure (tech pill)

Immutable infrastructure is a pattern for changing and evolving processes without modifying the running code, the current configuration, or the base components (libraries, auxiliary processes, OS configuration, OS packages, etc.). In short: avoid any manual or automatic configuration change to running systems.
But if we don't allow changes, how can we evolve our applications and services? Easy: replace the whole piece (function, container, machine) in one shot...



So to make any kind of change (code, configuration, OS) we don't connect to the target box to execute commands; we "cook" a new artifact (container image or machine image) and use it to create a new instance (container or machine) with the changes. In the past, the cost/time of creating a new instance was huge, so to optimize the process we tended to execute the minimal change needed to run the new version. That is, we used a manual or automatic process to ssh to the box, install the needed packages, change the configuration, update the code, etc.

But...
What happens when the ssh connection dies in the middle of an update?
What if we have a problem installing a package?
How can we be sure about the current state of a machine?
How can we calculate the changes to execute if we are not sure about the current state of a machine?

Making changes to a machine over ssh is not a transactional operation, so the process can fail halfway and leave the machine in an intermediate state.

The solution, Immutable Infrastructure...
Create a new artifact and run it. Without intermediate states. Only "Not Ready" or "Ready". Simple.  And if something is wrong, destroy the artifact and try again.
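
A minimal sketch of this idea in Python, under stated assumptions: `Artifact`, `Instance`, and `deploy` are hypothetical names invented for illustration, and the `fail_first` parameter only simulates start-up failures. The point it tries to show is that the artifact cannot be modified after it is built, and that a failed instance is discarded rather than repaired.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """An immutable image: everything a version needs, fixed at build time."""
    code_version: str
    config: tuple       # (key, value) pairs; a tuple keeps the artifact hashable
    os_packages: tuple


class Instance:
    """A running instance created from exactly one artifact (hypothetical)."""

    def __init__(self, artifact: Artifact):
        self.artifact = artifact
        self.state = "Not Ready"

    def start(self, healthy: bool = True) -> None:
        # Simulated start-up: the instance either becomes Ready or fails.
        if not healthy:
            raise RuntimeError("instance failed to start")
        self.state = "Ready"


def deploy(artifact: Artifact, attempts: int = 3, fail_first: int = 0) -> Instance:
    """Create an instance; on failure, discard it and try again from scratch."""
    for attempt in range(attempts):
        instance = Instance(artifact)
        try:
            # fail_first simulates that many failed start-ups before success.
            instance.start(healthy=attempt >= fail_first)
            return instance
        except RuntimeError:
            continue  # nothing to repair: the broken instance is simply dropped
    raise RuntimeError("could not start a healthy instance")
```

Because `Artifact` is frozen, shipping a change means building a new `Artifact` and calling `deploy` again; there is no code path that patches a running `Instance`.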

This pattern is at the core of the main container-orchestration systems and PaaS offerings (Kubernetes, Swarm, Mesos, Heroku, OpenShift, Deis, etc.).

Why it is a good idea

  • Simplicity. It is an order of magnitude easier to destroy a resource and create a new one from scratch than to calculate the deltas to apply and execute them.
  • If we need scalability, we need to support this pattern anyway.
  • Right now it is easy to implement with the help of the different cloud and PaaS providers.
  • It is very easy to return to the previous version.
  • It is very easy to troubleshoot a problem, because there are no intermediate states. You can know the exact content of a running version (the OS state, the concrete configuration, the concrete code, etc.).
  • With this approach, there is no difference between languages and runtime environments (Python, JRuby, Java, all the same...). It is a general solution for all the technologies your systems require.

What are the downsides

  • In some cases, the bootstrapping/setup time of a new machine is longer than the time needed to modify an existing one. But this time is improving continuously, and if we have a scalable architecture we are already dealing with it. Another solution is to use the pattern at a different level, for example, functions or containers instead of machines.
  • This pattern requires more and longer steps than the classic approach, so it is not practical without automation. But not automating this kind of task is shooting yourself in the foot anyway.

Implementation Samples

As a general implementation strategy, we need to be able to perform the following steps:
  1. Start a new running process (without processing workload).
  2. Detect when this running process is ready to accept real workload.
  3. Connect the running process to start processing workload.
  4. Disconnect the old running process to stop receiving new workload.
  5. The old running process completes the workload it already has.
  6. Detect that all the pending workload of the old process is completed.
  7. Destroy the old process.
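
The seven steps above can be sketched as a small Python routine. This is only an illustration under assumptions: `Process` is a hypothetical in-memory stand-in for a function, container, or machine, and `replace_process` is a name invented here; a real platform would do the readiness check, routing, and destruction for us.

```python
import time


class Process:
    """Hypothetical stand-in for a running function, container, or machine."""

    def __init__(self, name: str, pending: int = 0, accepting: bool = False):
        self.name = name
        self.pending = pending      # units of work already accepted
        self.accepting = accepting  # whether new workload is routed here
        self.alive = True

    def is_ready(self) -> bool:
        return self.alive

    def finish_one(self) -> None:
        # Complete one unit of pending work, if there is any.
        if self.pending > 0:
            self.pending -= 1


def replace_process(old: Process, new: Process, poll: float = 0.0) -> list:
    """Run the seven steps in order and return a log of what happened."""
    log = [f"1. {new.name} started without workload"]
    while not new.is_ready():                      # step 2: wait for readiness
        time.sleep(poll)
    log.append(f"2. {new.name} is ready")
    new.accepting = True                           # step 3: connect the new one
    log.append(f"3. {new.name} connected to the workload")
    old.accepting = False                          # step 4: disconnect the old one
    log.append(f"4. {old.name} disconnected from new workload")
    log.append(f"5. {old.name} completing {old.pending} pending unit(s)")
    while old.pending > 0:                         # steps 5 and 6: drain
        old.finish_one()
        time.sleep(poll)
    log.append(f"6. {old.name} has no pending workload")
    old.alive = False                              # step 7: destroy
    log.append(f"7. {old.name} destroyed")
    return log
```

Note the order of steps 3 and 4: the new process is connected before the old one is disconnected, so there is never a moment without a process accepting workload.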


If we talk about web servers the steps can be:
  1. Start a new machine/container running the web server in an internal endpoint.
  2. Detect when the web server is ready using the health check endpoint.
  3. Connect the new web server internal endpoint to the load balancer with the external endpoint.
  4. Inform the old web server to stop processing new traffic and disconnect from the load balancer.
  5. Detect when the old web server finished.
  6. Destroy the old web server.
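
A hedged Python sketch of steps 2 to 4 for the web server case. Assumptions: the health check lives at `/health` and answers HTTP 200 when ready, `LoadBalancer` is a hypothetical in-memory stand-in for a real load balancer's backend pool, and the endpoint URLs are made up. Draining and destroying the old server (steps 5 and 6) is left to the platform.

```python
import time
import urllib.request


def http_probe(url: str, timeout: float = 1.0) -> bool:
    """Return True when a GET on the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return False


def wait_until_healthy(url, probe=http_probe, retries=30, delay=1.0) -> bool:
    """Poll the health check until it passes or the retries run out."""
    for _ in range(retries):
        if probe(url):
            return True
        time.sleep(delay)
    return False


class LoadBalancer:
    """Hypothetical in-memory backend pool behind the external endpoint."""

    def __init__(self):
        self.backends = []

    def attach(self, endpoint: str) -> None:
        if endpoint not in self.backends:
            self.backends.append(endpoint)

    def detach(self, endpoint: str) -> None:
        if endpoint in self.backends:
            self.backends.remove(endpoint)


def swap_web_server(lb, old_endpoint, new_endpoint,
                    probe=http_probe, retries=30, delay=1.0) -> None:
    """Attach the new server only after it is healthy, then detach the old one."""
    health_url = new_endpoint + "/health"  # assumed health check path
    if not wait_until_healthy(health_url, probe, retries, delay):
        # The old server keeps serving; the new one can be destroyed and retried.
        raise RuntimeError(f"{new_endpoint} never became healthy")
    lb.attach(new_endpoint)
    lb.detach(old_endpoint)
```

If the new server never becomes healthy, the load balancer is left untouched, which is the "Not Ready" or "Ready" property applied to traffic routing.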
If we talk about a background queue job processor the workflow can be:
  1. Start a new machine/container running the background queue job processor.
  2. Detect that the new machine is processing jobs from the queue.
  3. Inform the old background queue job processor to not get new jobs.
  4. Detect when the old background queue job processor has completed its pending work.
  5. Destroy the old background queue job processor.
We can think of similar steps for other kinds of processes.
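
For the background queue job processor, a rough sketch using only the Python standard library could look like this. Assumptions: both workers read from one shared in-process `queue.Queue` (a real system would use an external queue), and `Worker` and `replace_worker` are hypothetical names invented for the example.

```python
import queue
import threading


class Worker(threading.Thread):
    """Takes jobs from a shared queue until it is asked to stop."""

    def __init__(self, name, jobs, results):
        super().__init__(name=name, daemon=True)
        self.jobs = jobs
        self.results = results
        self.stop_requested = threading.Event()
        self.processed_one = threading.Event()

    def run(self):
        while not self.stop_requested.is_set():
            try:
                job = self.jobs.get(timeout=0.05)
            except queue.Empty:
                continue
            # "Process" the job; a job already taken is always completed.
            self.results.append((self.name, job))
            self.jobs.task_done()
            self.processed_one.set()

    def drain(self):
        """Stop taking new jobs, finish the one in hand, then exit."""
        self.stop_requested.set()
        self.join()


def replace_worker(old: Worker, new: Worker, timeout: float = 5.0) -> None:
    new.start()                                  # step 1
    if not new.processed_one.wait(timeout):      # step 2
        new.drain()
        raise RuntimeError("new worker never processed a job; old worker kept")
    old.drain()                                  # steps 3 and 4
    # Step 5: the old worker's thread has exited; nothing else to destroy here.
```

The old worker is only asked to stop after the new one has demonstrably processed a job, so the queue is never left without a consumer.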

Design Notes

As we can see, this pattern requires collaboration from our code and support from the platform, but I assume that this is part of designing scalable systems that make good use of the capabilities of the cloud. For a good reference about designing applications for the cloud, please read The Twelve-Factor App.

We can apply the same pattern at different levels. Virtual machines, Containers, Functions... The ideas are the same but the granularity is different.

Using Infrastructure as Code is a recommended prerequisite for implementing this pattern.

Conclusions

  • The platforms are going in this direction, so they offer a lot of help to implement it.
  • This is the fundamental pattern for:
    • Scaling up and down in the cloud.
    • Advanced patterns for deploying with zero downtime (blue-green deployment, rolling deployment, canary releases, etc.).



Saturday, May 26, 2018

Infrastructure as code IaC (tech pill)

Infrastructure as code (IaC) is the practice of defining/declaring the infrastructure we need for a system using some kind of machine-readable source files. A tool uses these source files to provision, create, or maintain our infrastructure in a defined state.

These definitions help to provision/create different kinds of resources: compute, storage, communication services, networks.



For cloud-based infrastructures, we can use these definitions to create "virtual resources" and to configure them to be in a certain state. For example, we can create a virtual machine with an initial OS image and later install some software and configure it.

In a bare-metal environment, we can use the definition to configure a fixed number of machines and devices already defined in an inventory.

The goals of this practice are:

  • Avoid server configuration drift.
  • Avoid the proliferation of Snowflake Servers.
  • Drastically reduce the maintenance cost and the total cost of ownership.
  • Allow easy infrastructure evolution.

As a side effect, this practice also allows:
  • Using development practices for the infrastructure (version control, testing, audit, collaboration, live documentation...).
  • Creating on-demand systems for development, QA, testing, and experimentation.
  • Collaboration among developers.

Cons: really, it's 2018... no, seriously, I can't find a good reason not to define infrastructure as code. And when some resources are difficult or impossible to define using code or some kind of definition, we should avoid them as much as possible.

General Approaches and styles


  • Push. We execute a tool that parses the definitions, calculates the changes to make, and executes them.
  • Pull. Each node/device has its definition and runs continuously in a loop, executing the needed changes.
  • Push + Pull. A combination of the previous ones, so we can push a change (when we need to be sure or to force some changes) or wait until each node/device updates its configuration.

Sometimes we can restrict IaC to provisioning the low-level infrastructure. For example, we can use this practice to create a PaaS (using Kubernetes or similar), so that the rest of the elements are dynamically provisioned inside the PaaS we have created.
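
To make "calculate the changes and execute them" concrete, here is a minimal Python sketch under assumptions: the infrastructure is modeled as a plain dictionary from resource name to specification, and `plan`, `apply_changes`, and `converge` are hypothetical names, not the API of any real IaC tool.

```python
def plan(desired: dict, actual: dict) -> list:
    """Return the ordered changes that turn `actual` into `desired`."""
    changes = []
    for name in sorted(desired.keys() - actual.keys()):
        changes.append(("create", name, desired[name]))
    for name in sorted(desired.keys() & actual.keys()):
        if desired[name] != actual[name]:
            changes.append(("update", name, desired[name]))
    for name in sorted(actual.keys() - desired.keys()):
        changes.append(("delete", name, None))
    return changes


def apply_changes(changes: list, actual: dict) -> dict:
    """Execute the changes on a copy of the state and return the new state."""
    result = dict(actual)
    for action, name, spec in changes:
        if action == "delete":
            result.pop(name)
        else:  # "create" and "update" both set the resource to its spec
            result[name] = spec
    return result


def converge(desired: dict, actual: dict) -> dict:
    """One reconciliation pass: plan the changes, then apply them."""
    return apply_changes(plan(desired, actual), actual)
```

In the push style an operator would run `converge` once from a central tool; in the pull style each node would call it repeatedly in a loop against its own definition, so any drift is corrected on the next pass. Running it against an already converged state yields an empty plan.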



In summary, IaC is a core DevOps practice and the basis for many of the innovations and evolutions that the cloud brings us. It is a must for modern development in the cloud and also highly recommended for on-premises deployments.






Sunday, May 20, 2018

Good talks/podcasts (May 2018 I)

These are some interesting talks/podcasts that I've seen/heard during the past month:


Wednesday, May 09, 2018

DevOps talks


As a complement to the previous post, DevOps concepts and learning path, and in case you prefer watching talks, these are some great talks about DevOps:



Bases for DevOps (Lean / Flow):
DevOps:
Related:

Related posts:

Monday, May 07, 2018

DevOps concepts and learning path

As Jez Humble said in "Leading a DevOps Transformation", DevOps is:
A cross-functional community of practice dedicated to the study of building, evolving and operating rapidly changing, secure, resilient systems at scale.
DevOps includes and enhances the ideas of Agile Software Development, giving a more end-to-end vision of the value stream for a technology-based company. And, you know, Every Company Is A Tech Company.




In summary, I think that DevOps should be the core of any technology-based company and the only option to reach/pursue the necessary business agility.


Main characteristics of DevOps culture:

  • Collaboration between development and operations (avoiding silos and conflicting goals).
  • Organize around the value stream optimizing for the flow efficiency (not resource efficiency). Learning to work in small batches.
  • Remove waste (Non-Value Adding Activities).  Of course, we talk about value from the customer point of view.
  • Build quality in.
  • Create fast feedback loops.
  • Maximize organizational learning (making it safe to fail and learn).



Common practices:

  • Infrastructure as code.
  • Developers are involved in operations of the system (you build it, you run it).
  • Operations are involved in development from the beginning (introducing/facilitating specific, nonfunctional requirements to create a system that is easy to operate and monitor).
  • Automation (to avoid errors and to facilitate short iterations).
  • Continuous Delivery.

Books / Learning path:


If you are interested in learning about this culture and you like reading books, this is the learning path I recommend:


 

The Phoenix Project: A great and easy-to-read introduction to Lean in an IT environment. (My review)


 

The DevOps Handbook: The complement to The Phoenix Project that explains step by step the typical practices and the strategy to introduce DevOps. A sort of practitioner’s guide.


 

Continuous Delivery: To learn the principles and technical practices that enable rapid and incremental delivery of high-quality, valuable software to our customers.




Accelerate: To understand how to build and scale high-performing technology organizations by creating this DevOps culture. This book is also an analysis of the data from the State of DevOps reports, which gives an idea of the importance of these practices.






 

 

 

Team Topologies: To have a common vocabulary and to organize the different types of teams so that they suit the business needs. The recommended team organizations try to optimize for end-to-end flow.




If you are involved in any technology company, do yourself a favor and learn about DevOps... As the State of DevOps report indicates, it is the key to being a high-performance organization.



Saturday, May 05, 2018

Good talks/podcasts (April 2018)

These are some interesting talks/podcasts that I've seen/heard during the past month: