Wednesday, October 19, 2011

Simple auto-scale with HAProxy

Few years back I started to collect some "old" hardware at work trying to make of it a little "cluster" where I'd do my own experiments and run some web apps (Ruby, Sinatra, Rails, Django, ecc), HDFS, Map-Reduce and stuff like that.

Basically, it goes like this:


Very simple, right.
Well, we started to fill it up with more apps and recently I realized we'll run out of memory sooner or later. So, what I tried is I changed the way passenger was getting started: I reduced "--min-instances" to 0 and left it to minimun (1) on one server only.

It worked... but not the way I wanted. The problem is, while Nginx is load balancing incoming requests it naturally heath-checks the nodes (server 1, 2, ...N) which is a good thing, but, my "--min-instances 0" startup parameter had a very small impact, because:
  1. Nginx doesn't know about my "min-instances" parameter and consider all the nodes always to "be ready to serve";
  2. when a new request gets routed (by Nginx) to a node with "min-instances 0", it might take quite a few moments for the first response to be spit out by the rails app instance (i.e. a sort of "warming up"), so the whole thing started to feel much slower;
  3. since Nginx does a round-robin on load balancing, my rails instances were starting up and then shut down (by Phusion Passenger which is, by the way, doing a great job) because the activity, i.e. incoming requests, was actually not that high.
So, I wanted to make something very simple, with a minimun code/whatever writing/configuring.
Here's what I came up with:
Nothing fancy, you'll say. A "standard" stack... - exactly! There's a little thing though, this "haproxy_autoscale" python script I wrote. What it does is something very simple:
  1. hook up on HAProxy logfile (I actually use syslog and option httplog)
  2. calculate an average response time over 10-15 latest requests (i.e. how fast my rails instances are responding back to the client/browser)
  3. if the average is starting to get above a threshold for a while, start to scale up.
  4. when requests activity goes down (e.g. no high load within 5 min), scale down.
By "scale up" and "scale down" I simply mean assigning to a backend server the MAINT status using HAProxy socket commands:

scale up:

echo "enable server app1-backend/server1" | socat stdio /haproxy/socket

scale down:

echo "disable server app1-backend/server1" | socat stdio /haproxy/socket

That's it. What happens is most of my app servers and rails instances do not get bothered anymore unless there really is a high load. That way I got my RAM back and can stuff up even more apps/hadoop map-red/whatever.

In case you're interested, here's that haproxy_autoscale.py script.
I have to warn you, though: by no way you should use it in a production environment as is. It's an ongoing  experiment I'm running these days. This little script still needs quite a few touches, but it'll give you an idea.
import sys
import sys
import os
import time
import re
from threading import Timer
from datetime import datetime

import urllib2
import random

# response time threshold in milliseconds: when backend starts responding 
# slower than the threshold we scale up, otherwise scale down.
THRESHOLD = 500

# num of requests to calc average
NUM_REQ = 15

# need this backend to set correct initial status of backend servers
BACKEND = "app1backend"

# any url that goes straight to the backend is fine as warmup_url
# active == True will initially set its status as UP, MAINT otherwise
# always_up - never set MAINT on that backend (leave at least one host as always_up)
SERVERS = {
  'server1': { 'active': True,  
              'always_up': True,  
              'warmup_url': 'http://server1:1234/' },
              
  'server2'  : { 'active': False, 
              'always_up': False, 
              'warmup_url': 'http://server2:1234/' },
              
  'server3' : { 'active': False, 
              'always_up': False, 
              'warmup_url': 'http://server3:1234/' },
              
  'server4' : { 'active': False, 
              'always_up': False, 
              'warmup_url': 'http://server4:1234/' },
}

# see http://code.google.com/p/haproxy-docs/wiki/UnixSocketCommands
CMD_DISABLE    = 'echo "disable server b-%s/%s" | socat stdio /haproxy/socket'
CMD_ENABLE     = 'echo "enable server b-%s/%s" | socat stdio /haproxy/socket'
CMD_SET_WEIGHT = 'echo "set weight b-%s/%s %d" | socat stdio /haproxy/socket'

def watch(thefile):
  """
  opens thefile and keeps reading new lines.
  this is supposed to be a syslog log file.
  """
  thefile.seek(0,2)      # Go to the end of the file
  while True:
    line = thefile.readline()
    if not line:
      time.sleep(0.1)    # Sleep briefly
      continue
    yield line

def host_to_scaleup():
  """
  searches through the list of not yet active backends 
  and returns a random choice, otherwise returns None
  """
  hosts = filter(lambda h: not SERVERS[h]['active'], SERVERS)
  if len(hosts):
    return random.choice(hosts)
  # otherwise return None, nothing to scale up
  
def host_to_scaledown():
  """
  filters only active hosts and returns a random choice, 
  None otherwise.
  """
  hosts = filter(lambda h: SERVERS[h]['active'] and not SERVERS[h]['always_up'], SERVERS)
  if len(hosts):
    return random.choice(hosts)
  # otherwise return None, nothing to scale down
  
def scale_up(backend, host):
  """
  send a 'warmup' request to the host in question
  and adds it to the HAProxy's active backend servers list,
  i.e. sets UP status
  """
  warmup_url = SERVERS[host]['warmup_url']
  print "%s: warming up at %s" % (datetime.now(), warmup_url)
  req = urllib2.Request(warmup_url)
  req.add_header('User-Agent', 'haproxy_autoscale')
  try: 
    r = urllib2.urlopen(req)
    #print r.info()
  except urllib2.HTTPError, e:
    print "*** didn't get a 200/OK response, sorry: ", e.code
  except urllib2.URLError, e:
    print "*** couldn't reach the backend server: ", e.reason
  else:
    # send socket commands to (re-)enable the backend
    cmd1 = CMD_ENABLE % (backend, host)
    cmd2 = CMD_SET_WEIGHT % (backend, host, 10)
    os.system(cmd1)
    os.system(cmd2)
    SERVERS[host]['active'] = True
  
def scale_down(backend, host):
  """
  removes host from HAProxy active backend servers list,
  i.e. sets MAINT status
  """
  print "%s: turning DOWN b-%s/%s" % (datetime.now(), backend, host)
  cmd1 = CMD_SET_WEIGHT % (backend, host, 0)
  cmd2 = CMD_DISABLE % (backend, host)
  os.system(cmd1)
  # for some reasong cmd1 does not always work
  # so we set weight to 0, just in case.
  os.system(cmd2)
  SERVERS[host]['active'] = False

# this is where we store response times
resps = []

def avg_resp_time(new_val):
  """
  adds new_val to the resps arrays 
  and returns average over all requests in the list.
  """
  resps.append(new_val)
  if len(resps) > NUM_REQ: 
    # keep list length up to the NUM_REQ maximum items
    del(resps[0])
    return sum(resps) / len(resps)
  # otherwise we return None: not enough data

def random_scale_up(backend):
  """does the opposite of random_scale_down()"""
  h_up = host_to_scaleup()
  if h_up: 
    scale_up(backend, h_up)
  reset_cooldown_timer(backend)
  
def random_scale_down(backend):
  """
  runs after about 5 mins of inactivity 
  (e.g. no incoming requests)
  """
  h_down = host_to_scaledown()
  if h_down:
    print "%s: scaling down %s" % (datetime.now(), h_down)
    scale_down(backend, h_down)
    SERVERS[h_down]['active'] = False
    reset_cooldown_timer(backend)
  
# when no requests are coming in anymore we still 
# want to scale down automatically, after some time.
cooldown_timer = None

def reset_cooldown_timer(backend):
  """
  creates new timer to scale down 
  after 5 min of inactivity
  """
  global cooldown_timer
  if cooldown_timer: cooldown_timer.cancel()
  cooldown_timer = Timer(60*5, random_scale_down, [backend])
  cooldown_timer.start()
  
# set initial status of every backend server
for h in SERVERS:
  if SERVERS[h]['active']:
    scale_up(BACKEND, h)
  else:
    scale_down(BACKEND, h)
  
  
print "watching %s ..." % sys.argv[1]

# regexp to match against haproxy log file
p = re.compile('.*b-([a-zA-Z0-9\-_]+)/([a-zA-Z0-9\-_]+) \d+/\d+/\d+/(\d+)/.*')

# scale up/down count threshold
scale_threshold_count = 0

# endless loop
for line in watch(open(sys.argv[1])):
  r = p.match(line)
  if r:
    backend, host, rt = r.groups()
    if host in SERVERS:
      SERVERS[host]['active'] = True # set as active since it's in the logs
      
    # calculate average response time
    resp_time = avg_resp_time(int(rt))
    
    # check whether we have enough data to reason
    if resp_time is None: 
      continue
      
    elif resp_time > THRESHOLD:
      # scale up, if we can and need to
      scale_threshold_count += 1
      
      if scale_threshold_count < 3:
        # haven't reached count max
        continue
        
      # please, do scale
      print "%s: avg resp time: %d" % (datetime.now(), resp_time)
      
      random_scale_up(backend)
      scale_threshold_count = 0 # reset the counter

If you want to try it, just change /haproxy/socket in CMD_DISABLE, CMD_ENABLE and CMD_SET_WEIGHT (at the beginning of the script) to where your haproxy socket is and run it like that:

python haproxy_autoscale.py /path/to/your/haproxy/httplog

Let me know what you think.