eferro's random stuff

Saturday, June 02, 2018

Immutable infrastructure (tech pill)

Immutable infrastructure is a pattern for changing/evolving processes without modifying the running code, the current configuration, or the base components (libraries, auxiliary processes, OS configuration, OS packages, etc.). In summary, it avoids any manual or automatic configuration change to running systems.
But if we don't allow changes, how can we evolve our applications and services? Easy: replace the whole piece (function, container, machine) in one shot...

So to make any kind of change (code, configuration, OS) we don't connect to the target box to execute commands; we "cook" a new artifact (container image or machine image) and use it to create a new instance (container or machine) with the changes. In the past, the cost/time to create a new instance was huge, so to optimize the process we tended to execute the minimal change needed to run the new version. That is, we used a manual or automatic process to ssh into the box, install the needed packages, change the configuration, update the code, etc.

But...
What happens when the ssh connection dies in the middle of an update?
What if we have a problem installing a package?
How can we be sure about the current state of a machine?
How can we calculate the changes to execute if we are not sure about the current state of a machine?

Making changes to a machine over ssh is not a transactional operation, so the process can fail halfway and leave the machine in an unknown intermediate state.

The solution, Immutable Infrastructure...
Create a new artifact and run it. Without intermediate states. Only "Not Ready" or "Ready". Simple. And if something is wrong, destroy the artifact and try again.

This pattern is at the core of the main container-orchestration systems and PaaS offerings (Kubernetes, Swarm, Mesos, Heroku, OpenShift, Deis, etc.).

Why it is a good idea

Simplicity. It is an order of magnitude easier to destroy a resource and create a new one from scratch than to calculate the deltas to apply and execute them.
If we need scalability, we need to support this pattern anyway.
Nowadays it is easy to implement with the help of the different cloud and PaaS providers.
It is very easy to return to the previous version.
It is very easy to troubleshoot a problem, because there are no intermediate states. You can know the exact content of a running version (the OS state, the concrete configuration, the concrete code, etc.).
With this approach, there is no difference between languages and runtime environments (Python, JRuby, Java, all the same...). It is a general solution for all the technologies your systems require.

What are the downsides

In some cases, the bootstrapping/setup time for a new machine is longer than the time needed to modify an existing one. But this time is improving continuously, and if we have a scalable architecture we are already dealing with it. Another solution is to use the pattern at a different level, for example with functions or containers instead of at the machine level.
This pattern requires more and longer steps than the classic approach, so it is not practical without automation. But not automating this kind of task is shooting yourself in the foot anyway.

Implementation Samples

As a general implementation strategy, we need to be able to perform the following steps:

Start a new running process (without processing workload).
Detect when this running process is ready to accept real workload.
Connect the running process to start processing workload.
Disconnect the old running process to stop receiving new workload.
The old running process completes the workload that it already has.
Detect that all the pending workload of the old process is completed.
Destroy the old process.
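
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Instance`, `Router`, `wait_until`, and `replace` are hypothetical names invented for this example, not the API of any real platform, and the "workload" is simulated with a simple counter.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical stand-in for a container or machine created from an image."""
    image: str
    ready: bool = False
    pending_work: int = 0
    destroyed: bool = False

    def is_ready(self) -> bool:
        return self.ready

    def drain(self) -> None:
        # Simulate finishing one unit of in-flight work per call.
        if self.pending_work > 0:
            self.pending_work -= 1


@dataclass
class Router:
    """Hypothetical stand-in for whatever sends workload to instances."""
    targets: list = field(default_factory=list)

    def connect(self, instance: Instance) -> None:
        self.targets.append(instance)

    def disconnect(self, instance: Instance) -> None:
        self.targets.remove(instance)


def wait_until(condition, on_tick=None, timeout=5.0, interval=0.01) -> bool:
    """Poll `condition` until it is true; return False if `timeout` expires."""
    deadline = time.monotonic() + timeout
    while not condition():
        if time.monotonic() > deadline:
            return False
        if on_tick is not None:
            on_tick()
        time.sleep(interval)
    return True


def replace(router: Router, old: Instance, start_new, ready_timeout=5.0) -> Instance:
    new = start_new()                                  # 1. start, no workload yet
    if not wait_until(new.is_ready, timeout=ready_timeout):  # 2. "Not Ready" or "Ready"
        new.destroyed = True                           # failed: discard it, keep the old one
        return old
    router.connect(new)                                # 3. new instance gets workload
    router.disconnect(old)                             # 4. old one gets no new workload
    # 5 and 6. Let the old instance finish what it has, then detect completion.
    # If this wait times out, this sketch destroys the old instance anyway.
    wait_until(lambda: old.pending_work == 0, on_tick=old.drain)
    old.destroyed = True                               # 7. destroy the old instance
    return new
```

Note that the failure path never touches the old instance: if the new one does not become ready, it is thrown away and the old version keeps serving, which is what makes retrying cheap.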

If we talk about web servers the steps can be:

Start a new machine/container running the web server in an internal endpoint.
Detect when the web server is ready using the health check endpoint.
Connect the new web server internal endpoint to the load balancer with the external endpoint.
Inform the old web server to stop processing new traffic and disconnect from the load balancer.
Detect when the old web server has finished its in-flight requests.
Destroy the old web server.
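
The "detect when the web server is ready" step is usually a polling loop against the health check endpoint. The following is a minimal sketch using only the Python standard library; the function names, the timeouts, and the idea that the endpoint answers HTTP 200 when ready are assumptions made for this example.

```python
import time
import urllib.request


def http_probe(url: str, request_timeout: float = 2.0):
    """Return a zero-argument callable that is True when `url` answers HTTP 200."""
    def probe() -> bool:
        try:
            with urllib.request.urlopen(url, timeout=request_timeout) as response:
                return response.status == 200
        except OSError:
            # Connection refused, timeout, or an HTTP error status:
            # all of them are treated as "Not Ready".
            return False
    return probe


def wait_for_ready(probe, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Call `probe` repeatedly until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A deploy script could call something like `wait_for_ready(http_probe("http://10.0.0.12:8080/health"))` (a made-up internal address) before attaching the new server to the load balancer, and destroy the new server if the call returns False.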

If we talk about a background queue job processor the workflow can be:

Start a new machine/container running the background queue job processor.
Detect that the new machine is processing jobs from the queue.
Tell the old background queue job processor to stop taking new jobs.
Detect when the old background queue job processor has completed its pending work.
Destroy the old background queue job processor.
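
This workflow needs collaboration from the worker code: it has to accept a "stop taking jobs" signal and still finish the job it already has. Here is a minimal sketch of that idea using threads and an in-memory queue as stand-ins for real machines and a real queue service; `Worker` and `replace_worker` are hypothetical names for this example.

```python
import queue
import threading


class Worker(threading.Thread):
    """Takes jobs from a shared queue until it is asked to stop."""

    def __init__(self, jobs: queue.Queue, results: list, name: str):
        super().__init__(name=name, daemon=True)
        self.jobs = jobs
        self.results = results
        self.polling = threading.Event()         # set once the worker polls the queue
        self.stop_requested = threading.Event()  # set to stop taking new jobs

    def run(self) -> None:
        self.polling.set()
        while not self.stop_requested.is_set():
            try:
                job = self.jobs.get(timeout=0.05)
            except queue.Empty:
                continue
            # A job that was already taken is always completed,
            # even if a stop request arrives in the meantime.
            self.results.append((self.name, job))
            self.jobs.task_done()

    def stop_taking_jobs(self) -> None:
        self.stop_requested.set()


def replace_worker(old: Worker, new: Worker, timeout: float = 5.0) -> Worker:
    new.start()                         # start the new processor
    if not new.polling.wait(timeout):   # detect that it is consuming from the queue
        new.stop_taking_jobs()          # it never came up: keep the old one
        return old
    old.stop_taking_jobs()              # the old one takes no new jobs
    old.join(timeout)                   # wait until its pending job is done
    return new                          # the old thread has exited ("destroyed")
```

For a short time both versions consume from the same queue, so jobs need to be safe to process by either version.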

We can think of similar steps for other kinds of processes.

Design Notes

As we can see, this pattern requires collaboration from our code and support from the platform, but I assume that this is part of designing scalable systems that make great use of the capabilities of the cloud. For a good reference about designing applications for the cloud, please read The Twelve-Factor App.

We can apply the same pattern at different levels. Virtual machines, Containers, Functions... The ideas are the same but the granularity is different.

Using Infrastructure as Code is a recommended prerequisite for implementing this pattern.

Conclusions

The platforms are going in this direction, so they offer a lot of help to implement it.
This is the fundamental pattern for:

Scale up and down in the cloud.
Advanced patterns for deploying with zero downtime (blue-green deploy, rolling deploy, canary releasing, etc.).

Other tech pills:

References:

Trash Your Servers and Burn Your Code: Immutable Infrastructure and Disposable Components Chad Fowler.

Notes:

Thanks to @Fortiz2305 and @cesarob for the feedback.

Saturday, May 26, 2018

Infrastructure as code IaC (tech pill)

Infrastructure as code (IaC) is the practice of defining/declaring the infrastructure we need for a system using some kind of machine-readable source files. These source files are used by a tool to provision, create, or maintain our infrastructure in a defined state.

These definitions help to provision/create different kinds of resources: compute, storage, communication services, network, etc.

For cloud-based infrastructures, we can use these definitions to create "virtual resources" and to configure them to be in a certain state. For example, we can create a virtual machine with an initial OS image and later install some software and configure it.

In a bare-metal environment, we can use the definition to configure a fixed number of machines and devices already defined in an inventory.

The goals of this practice are:

Avoid server configuration drift.
Avoid the proliferation of Snowflake Servers.
Drastically reduce the maintenance cost and the total cost of ownership.
Allow easy infrastructure evolution.

As a collateral effect, this practice also allows:

Use development practices for the infrastructure (version control, testing, audit, collaboration, live documentation...).
Create on-demand systems for development, QA, testing, and experimentation.
Collaboration among developers.

Cons: really, it's 2018... no, seriously, I can't find a good reason not to create infrastructure as code. And when the problem is that some resources are difficult or impossible to define using code or some kind of definition, we should avoid them as much as possible.

General Approaches and styles

Push. We execute a tool that parses the definitions, calculates the changes to make, and executes them.
Pull. Each node/device has its definition and runs continuously in a loop, executing the needed changes.
Push + Pull. A combination of the previous ones, so we can push a change (when we need to be sure or to force some changes) or wait until each node/device updates its configuration.
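
Both styles share the same core: compare the desired definition with the current state, calculate the changes, and apply them. The sketch below illustrates that idea with plain dictionaries; `plan`, `apply`, `push`, and `pull_loop` are hypothetical names for this example and do not correspond to any specific tool.

```python
import time


def plan(desired: dict, actual: dict) -> list:
    """Calculate the changes needed to move `actual` to `desired`."""
    changes = []
    for name, spec in desired.items():
        if name not in actual:
            changes.append(("create", name, spec))
        elif actual[name] != spec:
            changes.append(("update", name, spec))
    for name in actual:
        if name not in desired:
            changes.append(("delete", name, None))
    return changes


def apply(changes: list, actual: dict) -> None:
    """Execute the changes (here: just mutate the in-memory state)."""
    for action, name, spec in changes:
        if action == "delete":
            del actual[name]
        else:
            actual[name] = dict(spec)


def push(desired: dict, actual: dict) -> list:
    """Push style: an operator or pipeline runs the tool once, on demand."""
    changes = plan(desired, actual)
    apply(changes, actual)
    return changes


def pull_loop(read_desired, actual: dict, iterations: int, interval: float = 0.0) -> None:
    """Pull style: each node re-reads its definition and converges in a loop."""
    for _ in range(iterations):
        apply(plan(read_desired(), actual), actual)
        time.sleep(interval)
```

Running `plan` again right after a successful `push` should return an empty list, which is a simple way to check that the definitions and the infrastructure have not drifted apart.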

Sometimes we can restrict IaC to provisioning the low-level infrastructure. For example, we can use this practice to create a PaaS (using Kubernetes or similar), so that the rest of the elements are dynamically provisioned inside the PaaS we have created.

Related Tools:

In summary, IaC is a core DevOps practice and is the basis for a lot of the innovation and evolution that the cloud brings us. It is a must for modern development in the cloud and is also highly recommended for on-premises deployments.

Other tech pills:

Sunday, May 20, 2018

Good talks/podcasts (May 2018 I)

These are some interesting talks/podcasts that I've seen/heard during the past month:

Cloud Native related:

Anatomy of a Production Kubernetes Outage Oliver Beattie A great example of a complex outage and the learning derived from it.
Altitude NYC 2018: Observability workshop Peter Bourgon Good workshop to understand how to instrument our applications for good telemetry, logging, and tracing.
Kubernetes: Finally...A True Cloud Platform Sam Ghods A good description of kubernetes as a base abstraction for the cloud platforms.

CraftConf 2018. Always great content at this conference.

Test-Driven: The Four-Step Dance Tim Ottinger Very interesting detail for improving our TDD practice.
What is this cloud native thing anyway? Sam Newman A good summary of the topic.
Five key challenges for software quality tomorrow Gojko Adzic This talk helps us to understand the new challenges of testing our systems in this new and changing world (ML, AI, cloud services, etc.).
Scaling Your Architecture with Events and Services Randy Shoup New version of a classic talk from Randy. You can use it as a map for dealing with data in microservices and for designing with events.
Simplify the stack Matt Aimonetti Good talk about simplicity and how you can use "Requests For Comments" to discuss design ideas.

Other

The death of Agile Allen Holub (Related to the interesting blog post "Developers Should Abandon Agile") Thanks to Fran Reyes for the reference.
Computers for Cynics [full version] Ted Nelson. General historic information and deep ideas about technology.
The Formula for Happiness from Solve for Happy An interesting interview with Mo Gawdat.
The Next Generation of Data Products Hilary Mason. A good introductory talk for newbies like me.
Old Is the New New Kevlin Henney Very interesting reminder about the fundamentals and about how to identify these core ideas and the new improvements based on them. Understanding this process can help us to improve even faster if we learn the fundamentals in depth.

Wednesday, May 09, 2018

DevOps talks

As a complement to the previous post, DevOps concepts and learning path, and in case you prefer watching talks, these are some great talks about DevOps:

Bases for DevOps (Lean / Flow):

The Efficiency Paradox Niklas Modig. Only 18m to understand the efficiency of flow vs efficiency of resources.
Competing On The Basis Of Speed Mary Poppendieck
The Value of Flow 14 09 17 Dan North. 27m. Great explanation of flow efficiency for software delivery.

DevOps:

Devops and Dr Deming’s 14 Points John Willis' Ignite talk (Velocity NY 2014). 5 minutes to understand the relationship between DevOps and the ideas of Dr. Deming.
The (Short) History of DevOps Damon Edwards. 12m
DOES17 London - The Key to High Performance What the Data Says

Continuous Delivery Sounds Great But It Won’t Work Here Jez Humble. A new version of a classic one. A great talk.
How Netflix Thinks of DevOps
Fail Better: Radical Ideas from the Practice of Cloud Computing Tom Limoncelli
Evolutionary Architecture and Fitness Functions
The Virtuous Cycle of Velocity: What I Learned About Going Fast at eBay and Google Randy Shoup

DevOps concepts and learning path

Monday, May 07, 2018

DevOps concepts and learning path

As Jez Humble said in Leading a DevOps Transformation, DevOps is:

A cross-functional community of practice dedicated to the study of building, evolving and operating rapidly changing, secure, resilient systems at scale.

DevOps includes and enhances the ideas of Agile software development, giving a more end-to-end vision of the value stream for a technology-based company. And, you know, Every Company Is A Tech Company.

In summary, I think that DevOps should be the core of any technology-based company and the only option to reach/pursue the necessary business agility.

Main characteristics of DevOps culture:

Collaboration between development and operations (avoiding silos and conflicting goals).
Organize around the value stream optimizing for the flow efficiency (not resource efficiency). Learning to work in small batches.
Remove waste (Non-Value Adding Activities). Of course, we talk about value from the customer point of view.
Build quality in.
Create fast feedback loops.
Maximize organizational learning (making it safe to fail and learn).

Common practices:

Infrastructure as code.
Developers are involved in operations of the system (you build it, you run it).
Operations involved in the development from the beginning (introducing/facilitating specific, nonfunctional requirements to create a system easy to operate and monitor).
Automation (to avoid errors and to facilitate short iterations).
Continuous Delivery.

Books / Learning path:

If you are interested in learning about this culture and you like reading books, this is the learning path I recommend:

The Phoenix Project: A great and easy-to-read introduction to Lean in an IT environment. (My review)

The DevOps Handbook: The complement to The Phoenix Project that explains step by step the typical practices and the strategy to introduce DevOps. A sort of practitioner’s guide.

Continuous Delivery: To learn the principles and technical practices that enable rapid and incremental delivery of high-quality, valuable software to our customers.

Accelerate: To understand how to build and scale high performing technology organizations creating this DevOps culture. This book is also an analysis of the data from the State of DevOps reports that give an idea about the importance of these practices.

Team Topologies: To have a common vocabulary and organize the different types of teams that suit the business needs. The recommended team organizations try to optimize for end to end flow.

If you are involved in any technology company, do yourself a favor and learn about DevOps... As the State of DevOps report indicates, it is the key to being a high-performance organization.

https://www.eferro.net/2018/05/devops-talks.html

Saturday, May 05, 2018

Good talks/podcasts (April 2018)

These are some interesting talks/podcasts that I've seen/heard during the past month:

DevOps related:

How Netflix Thinks of DevOps
DOES17 London - The Key to High Performance What the Data Says
a16z Podcast: Feedback Loops Company Culture, Change, and DevOps with Nicole Forsgren, Jez Humble, and Sonal Chokshi

Cloud related:

Five Cloud Native Ops Superpowers: Yes, You Can Do That! Dave Bartoletti, Forrester
Fail Better: Radical Ideas from the Practice of Cloud Computing Tom Limoncelli

Other topics:

Emotional Intelligence for Engineers April Wensel
a16z Podcast: Improv’ing Leadership with Dick Costolo and Peter Levine
Podcast. Interview with Ron Jeffries (Agile.FM) Very interesting interview (dark scrum, agile adoption failures, the number of agile coaches vs the number of agile developers).
