Pythonic balancer control

In my day job I work with a lot of services that sit behind a load balancer providing high availability across multiple backend hosts.

Now, inevitably there are times when these services or their hosting servers require maintenance, and sometimes we want to investigate and troubleshoot a running service without it taking any live traffic. It simply isn’t practical to involve the network team every time we need an interface disabled gracefully. The usual alternative is a server-side firewall rule to block inbound access to the port, but this is clumsy and can briefly impact live traffic.

One solution I like is to use an intermediate monitor (or watchdog) service that provides a healthcheck URL for the load balancer, say, http://192.168.1.100/my-service/healthcheck, where the returned status is derived from the application ports being monitored.
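
Exactly how the load balancer polls that URL depends on the product sitting in front of the service; purely as a hypothetical illustration (HAProxy is just my example here, and the backend name, server address and ports are made up rather than taken from any real setup), the check might be wired up like,

backend my-service
    option httpchk GET /my-service/healthcheck
    http-check expect status 200
    server node1 192.168.1.100:8080 check port 80

Anything other than a 200 from the healthcheck and the node is marked down, which is exactly the behaviour we will exploit below.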

Now, we want this to be as lightweight as possible, so something like Python’s Flask under uwsgi (or Ruby’s Sinatra) is a good fit for a simple service listener,

#!/usr/bin/env python
#
from flask import Flask, Response
import os
import requests

app = Flask(__name__)

@app.route('/my-service/healthcheck', methods=["GET"])
def heartbeat():
    # Assume everything is healthy until a check says otherwise
    resp = Response(response="OK", status=200, content_type="text/plain")

    # Now poll each monitored node for its status
    nodes = ("inbound", "outbound", "stats")
    for node in nodes:
        req = requests.get("http://localhost:7070/" + node + "/isalive")
        if req.status_code != 200:
            # Pass the failing status back to the load balancer
            resp.set_data("FAILED")
            resp.status_code = req.status_code

    return resp

if __name__ == "__main__":
    app.run()

And obviously I have skipped the setup with pip, virtualenv and the like, but that’s routine enough.
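
If you do want a starting point, a minimal sketch of that setup might look something like the following; the filename healthcheck.py is my own invention, and I have told uwsgi to listen on port 7070 purely to match the cURL test further down, so adjust both to suit your environment,

$ virtualenv venv
$ . venv/bin/activate
$ pip install flask requests uwsgi
$ uwsgi --http :7070 --wsgi-file healthcheck.py --callable app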

The beauty of this approach is that with a few extra lines before the service port polling we can spoof an outage: the node is marked out of action, but the load balancer can still drain any existing client connections gracefully, something the firewall approach would prevent,

    # Check for the maintenance file and signal graceful failure.
    # Remove the maintenance file again once the work is complete.
    if os.path.isfile("maintenance"):
        resp.status_code = 503

Now, by simply touching a file called maintenance in the directory where the application is run from, the next poll from the load balancer will register the failure.
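
For example, from the directory the monitor was started in,

$ touch maintenance

We can then check the result with cURL,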

$ curl -v http://localhost:7070/my-service/healthcheck
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 7070 (#0)
> GET /my-service/healthcheck HTTP/1.1
> Host: localhost:7070
> User-Agent: curl/7.52.1
> Accept: */*
> 
< HTTP/1.1 503 SERVICE UNAVAILABLE
< Content-Type: text/plain
< Content-Length: 2

Remove the file and the traffic will flow again: remote control of the load balancer without stopping any services, persistent across reboots, and giving us the time and space to investigate as we please.
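
(For completeness, putting the node back into service is just the reverse, again from the same working directory,

$ rm maintenance

after which the next poll sees a 200 OK and the load balancer returns the node to the pool.)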
