Persistent volumes for DC/OS using Rexray S3 and Ceph or minio

This post is a continuation of, and an addition to, the previous post.

This time I will dump here what I’ve done to use Ceph storage backed volumes with DC/OS and Rexray.

I considered using the rexray/rbd plugin, but I find it more flexible to talk to Ceph via its S3 interface. If you would like to go the RBD way, consider this blog post instead. If you don’t have Ceph, give Minio a go. It’s easy to set up Minio in DC/OS.

I wanted to use the rexray/s3fs Docker managed plugin, the same way I did for EFS, but at the moment it doesn’t support setting a custom endpoint (it only allows AWS S3, not Minio for example). So I am using the rexray binary / service instead.
I followed this gist and tuned the setup to match my needs.

Here are the steps:
1. Upgrade rexray (install to default location and replace the one shipped with DC/OS) to 0.10 or newer:
curl -sSL | sh -s -- stable 0.10.2
service dcos-rexray stop
cp $(which rexray) $(readlink /opt/mesosphere/bin/rexray)
service dcos-rexray start

2. Install s3fs, which is a dependency here:
apt install s3fs

3. Configure rexray

My chef template for that is this:

Notes / gotchas:
* the S3 endpoint needs to be provided both in s3fs.endpoint and in s3fs.options.url
* I set libstorage.integration.volume.operations.mount.rootPath to “/” because the default “/data” doesn’t exist in a freshly created volume and fails to be created (at least for me; perhaps solvable in a different way) – may be related to this issue in rexray
* I set libstorage.integration.volume.operations.remove.force = true, because of this issue in rexray

Note: Marathon doesn’t allow mounting the same volume across different applications, and using the rexray service instead of the Docker plugin restricts the mount to a single instance. See the ticket here

Persistent shared volumes for DC/OS in AWS using Rexray and EFS

This tutorial is meant to be a quick how-to on setting up persistent EFS volumes for DC/OS running in AWS, using rexray/efs as a Docker plugin module.

I’m assuming the reader got here looking for practical information and is familiar with DC/OS external persistent volumes, REX-Ray and Docker volume plugins.

The goal is to enable data sharing between containers with persistent volumes. The volumes will be created by specifying them in the DC/OS (Marathon) app definition. For that to be possible, we have to prepare the DC/OS agents by installing the Docker REX-Ray plugin on them [1].

I’ve chosen to use EFS. Other options available in AWS are EBS and S3. See them compared in a series of posts by Eric Noriega, which I found very helpful.

Step 1. Set up AWS access for REX-Ray

We need to allow rexray to manage volumes in AWS. To do that we will set up an IAM policy and a user with that policy attached. Of course, do it the way that fits your setup best, using roles, groups etc. I’ll leave it up to you. We could also attach the role to the EC2 instance and skip using credentials in the REX-Ray plugin configuration. I’ll skip the AWS IAM setup details for the sake of brevity.

The important part is the permissions to include in the policy. Here is the policy definition I used. Note that it also covers EBS permissions in case I’d like to use EBS:

Step 2. Install docker plugin

The following is how you install the plugin manually, passing configuration in the form of environment variables. Read on to see how I have automated that with Chef.

docker plugin install --alias rexrayefs rexray/efs:latest EFS_ACCESSKEY=<> EFS_SECRETKEY=<> EFS_SECURITYGROUPS="sg-12345678" EFS_TAG=dcos-rexray

The variables are:
EFS_ACCESSKEY and EFS_SECRETKEY – credentials of the user you have created in step 1
EFS_SECURITYGROUPS – security groups (space delimited list) that you use to allow traffic to/from your networks. If you are creating a new, dedicated security group for use with EFS, allowing traffic on port 2049 is enough.
EFS_TAG – custom string
NOTE: I’m using --alias rexrayefs because DC/OS allows only alphanumeric characters in the driver name and will refuse a name with slash in it.

Here’s a Chef recipe and example attributes hash I use to deploy the plugin

Step 3. Optional. Verification.

Let’s see if it worked:
docker plugin ls
Test creating a volume:
docker volume create --driver rexrayefs --name=hello --opt size=10
docker volume ls
Test mounting the volume to docker container:
docker run -ti --name=foo --volume-driver=rexrayefs -v hello:/opt debian:jessie '/bin/bash'
Clean up
docker volume rm hello

Step 4. Use with DC/OS app

Here’s an example DC/OS (marathon) JSON app definition that mounts a volume, and echoes timestamps to a file on it.

Start multiple containers to see how they share the volume.

Note: In DC/OS UI, when I look under “volumes” for my app, it reads “unavailable” for some reason while it works just fine.

Step 5. Unused volume prune

AFAIK DC/OS will not clean up volumes that apps no longer use. Familiarize yourself with the docker volume prune command. I’m planning to put a cron job in place to run it.

[1] Notes:
– I’m on DC/OS 1.9.0. There’s a REX-Ray service (dcos-rexray) delivered with it, and the rexray binary delivered with DC/OS is 0.3.3 (old). I’m not sure what it is useful for, as we’re not going to use this service or binary. We will use the REX-Ray Docker plugin. The DC/OS manual reads that the REX-Ray volume driver is provided with DC/OS. In my installation it wasn’t.
– Docker version needs to be 1.13 or newer (managed plugins were introduced in 1.13)

Cloudflare to AWS (Route53 and CloudFront) migration

This is for those looking to move from Cloudflare to AWS services. I did that migration a few months ago, so I don’t have it fresh in my mind. It’s just a brain dump, but I hope it’s helpful anyway, at least as a source of keywords for further search.

Cloudflare provides a DNS service with a CDN and additional DoS protection. Their services are integrated and you buy the whole suite, so there’s no way to use just the CDN service from another provider – you will have to take DNS with you as well.

So we wanted to move all the things to AWS (not my decision, and mainly driven by cost. If you want my opinion – consider Fastly for CDN and Dyn for DNS).

Breaking it down to components, we have 3 migrations to do – CDN, DoS and DNS, in the following order:

CDN: set up AWS CloudFront distribution(s)

Cloudflare CDN is pretty dumb, so is CloudFront, hence setting up the CloudFront distributions should be rather straightforward. I’m not going to go through this part in detail. You would in most cases need 2 of them – one for static and one for dynamic objects.

DoS protection: set up AWS WAF

Cloudflare DoS protection is just a req/s rate based IP address blacklist. We can totally do the same or better using WAF with an AWS Lambda function processing access logs from CloudFront (I followed the tutorial from AWS).

I manually copied recently blacklisted IPs to WAF for a start.

DNS: migrate records to Route53

We had tens of domains with tens of records, so I wanted to automate the process. Please forgive my coding skills; the following is the quick and dirty script I used to migrate the DNS records.

One manual task after running the script: reconfigure the zones, changing A records to “alias” type records pointed at the respective CloudFront resource. I tried doing that with the script but for some reason it didn’t work.
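
A minimal sketch of the record-mapping step such a migration script needs, assuming Cloudflare-style records arrive as hashes with type, name, content, ttl and priority keys, and that the target is a Route53-style UPSERT change. The field names, the DEFAULT_TTL value and the to_route53_change helper are assumptions for illustration; this is not the script mentioned above and it makes no API calls.

```ruby
# Map one Cloudflare-style DNS record (a plain Hash) to a Route53-style change.
# Field names on both sides are assumptions for illustration only.
DEFAULT_TTL = 300 # hypothetical fallback; a Cloudflare ttl of 1 means "automatic"

def to_route53_change(record)
  value =
    case record["type"]
    when "MX"  then "#{record["priority"]} #{record["content"]}" # priority goes into the value
    when "TXT" then %("#{record["content"]}")                    # TXT values are quoted
    else record["content"]
    end
  ttl = record["ttl"].to_i > 1 ? record["ttl"].to_i : DEFAULT_TTL

  {
    action: "UPSERT",
    resource_record_set: {
      name: record["name"],
      type: record["type"],
      ttl: ttl,
      resource_records: [{ value: value }],
    },
  }
end

# Hypothetical sample records.
records = [
  { "type" => "A",  "name" => "www.example.com", "content" => "192.0.2.10", "ttl" => 1 },
  { "type" => "MX", "name" => "example.com", "content" => "mail.example.com", "priority" => 10, "ttl" => 3600 },
]

changes = records.map { |r| to_route53_change(r) }
changes.each { |c| puts c.inspect }
```

Note that several records sharing the same name and type would have to be grouped into a single record set before being submitted; the sketch leaves that out.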

Getting beyond RUM with varnish logs and

In my previous post I’ve shared some bits on shipping Varnish logs to Logstash (or Heka or else?).

Recently, I have been given an opportunity to share some of my use cases for varnish logs at Dublin DevOps Meetup. Below, you’ll find the slides.

In short: what is missing from NewRelic RUM regarding website performance data, and what you can get from your ELK stack, is enough granularity and flexibility for thorough debugging of edge cases, such as an extremely slow client (or bot, or DoS etc.) screwing up averages, a large image slowing down page loads, a badly cached object and more.

Another contributor to page slowness is 3rd party assets. Here, as well, RUM is not enough to find out. One can manually track down the cause using developer tools, but of course every Ops engineer wants a graph or two! Here, if you don’t know it yet, get yourself familiar with this excellent tool. It’s super easy to get it up and running using Docker (I’m planning to write a how-to post), but if you don’t have time, you can choose to pay to have it run for you.

knife exec – mass operations on chef node run_list and attributes

For most daily operations, such as finding the subset of nodes with a certain run_list element, environment or attribute, knife search node is just enough. It’s not sufficient when it comes to making modifications, though. That’s where knife exec comes in: it lets you execute some Ruby against the Chef server with a knife oneliner.

Let’s briefly recall some basic knife search usage, using this (BTW interesting) case of ambiguity in the run list search syntax:

knife search node "role:foo"

will list nodes which have the role “foo” in their run lists.

knife search node "roles:foo"

will list nodes which have run chef-client and have actually applied role “foo”.

OK, but what if I wanted to remove a particular role from the run_list of every node? I will use knife exec:

knife exec -E 'nodes.find("role:foo") {|n| puts; n.run_list.remove("role[foo]");}'

Note that without a call to n.save inside the block the change is never written back to the Chef server, so the operation above is only a dry run.

Another oneliner I used once to search for bridge interfaces, using a regex match on node attributes:

knife exec -E 'nodes.find("bonding_bridge:true") {|n| puts n["network"]["interfaces"].keys.grep(/br/)}'
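
To see what the keys.grep(/br/) part of that oneliner does without a Chef server, here is a small sketch in plain Ruby. The interfaces hash is a made-up stand-in for the network interfaces attribute of a node, and the stricter pattern is only a suggestion, assuming bridges are named br0, br1 and so on.

```ruby
# Hypothetical sample of a node's network interfaces attribute.
interfaces = {
  "lo"     => { "type" => "lo" },
  "eth0"   => { "type" => "eth" },
  "br0"    => { "type" => "br" },
  "virbr0" => { "type" => "virbr" },
}

# Enumerable#grep keeps the keys that match the pattern.
loose  = interfaces.keys.grep(/br/)         # any name containing "br"
strict = interfaces.keys.grep(/\Abr\d+\z/)  # only names like br0, br1, ...

puts "loose:  #{loose.inspect}"
puts "strict: #{strict.inspect}"
```

The unanchored /br/ also picks up names such as virbr0, which may or may not be what you want.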

Here you will find some examples for removing persistent (“normal”) attributes with knife exec:

You can also run scripts with knife exec that are more sophisticated than a oneliner. Check out this knife exec script that will save you time when you want to rename a Chef role across all nodes.

Easy .deb building with fpm

In automated infrastructure, for me, installing from source is not an option. So I build deb packages and tell Chef to install these. To avoid external dependencies, packages are installed from a local repo.

One day I will learn proper Debian packaging… but until that happens I’ll keep enjoying building packages using FPM.

This post is a quick cheat sheet for using FPM to build .deb packages. First we need to install FPM.

sudo apt-get install ruby-dev
sudo gem install fpm

Let’s build phantomjs deb package.

tar jxf phantomjs-1.9.2-linux-x86_64.tar.bz2
cd phantomjs-1.9.2-linux-x86_64/
mkdir usr
mv bin usr/
fpm -s dir -t deb -n phantomjs -v 1.9.2 -a x86_64 -C ~/phantomjs-1.9.2-linux-x86_64 -p phantomjs-VERSION_ARCH.deb usr/bin

Setting up collecting Apache metrics to ganglia using sFlow is pretty straightforward, except that installing from source, especially when automated, isn’t nice to maintain. FPM to the rescue. Let’s build mod_sflow .deb with apache module config file:

sudo apt-get install apache2-dev
sudo mkdir -p /usr/lib/apache2/modules
git clone
cd mod_sflow/
sudo apxs2 -c -i mod_sflow.c sflow_api.c
mkdir -p /tmp/mod_sflow/usr/lib/apache2/modules/ /tmp/mod_sflow/etc/apache2/mods-available/
echo "LoadModule sflow_module       /usr/lib/apache2/modules/" > /tmp/mod_sflow/etc/apache2/mods-available/sflow.load
sudo mv /usr/lib/apache2/modules/ /tmp/mod_sflow/usr/lib/apache2/modules/
fpm -s dir -t deb -n mod_sflow -v 1.0.4 -a x86_64 -C /tmp/mod_sflow/ -p libapache2-mod-sflow-VERSION_ARCH.deb .

Verify the contents:

$ dpkg --contents libapache2-mod-sflow-1.0.4_amd64.deb
 drwxr-xr-x 0/0               0 2014-02-24 20:54 ./
 drwxr-xr-x 0/0               0 2014-02-24 20:54 ./etc/
 drwxr-xr-x 0/0               0 2014-02-24 20:54 ./etc/apache2/
 drwxr-xr-x 0/0               0 2014-02-24 20:54 ./etc/apache2/mods-available/
 -rw-r--r-- 0/0              68 2014-02-24 20:52 ./etc/apache2/mods-available/sflow.load
 drwxr-xr-x 0/0               0 2014-02-24 20:54 ./usr/
 drwxr-xr-x 0/0               0 2014-02-24 20:54 ./usr/lib/
 drwxr-xr-x 0/0               0 2014-02-24 20:54 ./usr/lib/apache2/
 drwxr-xr-x 0/0               0 2014-02-24 20:54 ./usr/lib/apache2/modules/
 -rw-r--r-- 0/0          194981 2014-02-24 20:49 ./usr/lib/apache2/modules/


Mind the gap – 5 things you should know before you talk to Polish audience

On May 14 2013 I spoke at Atmosphere Conference 2013 in Poznan – the first (to my knowledge) Polish conference dedicated to web performance, scalability and DevOps culture. Yay! This year Atmosphere Conference 2014 is going to take place in Warsaw, on 19-20 May. Check it out if you haven’t yet.

Topic-wise it seemed to me that they wanted it to be something similar to Velocity Conference. That’s surely a direction I like. Topics in Atmosphere’s schedule were mixed from quite different worlds, so only some talks were about performance, scalability, monitoring and such. The conference was worth going to anyway; I enjoyed at least a couple of talks. Of the talks I remember liking the most, I’d like to recommend watching “Monitoring at scale” from Lorenzo Alberton.

Organization was brilliant. The food was the best I’ve ever had at an event like this. There’s also an interesting review of the conference on the Shelly Cloud Blog.

Two things I didn’t like were that Polish speakers (with 2 or 3 exceptions) gave their talks in Polish (there were two tracks and often both were in Polish – imagine you came to attend from the UK or Germany and you naturally don’t speak Polish) and that 2/3 of the speakers were Allegro employees. I felt kinda like I was attending an Allegro meeting.

What this post is about is something I realized while attending foreign speakers’ talks, sitting within a mainly Polish group of listeners. I’ve seen people on stage looking kinda puzzled by a reaction different from what they expected. Brian McCallister, for example, in his second talk, demoing Docker, was surprised by almost no reaction to some simple questions he asked (like “raise your hands who is familiar with chroot”), so he tried something like “raise your hands who of you is human”. Did he see the audience laughing? Nope, this is Poland.

So what can you expect from a Polish audience? What is so different about us?

1. Poles don’t smile. Period. Or at least they don’t smile for the same reasons Americans do. Yes, we know it’s not good, we’re trying to change it. So don’t be surprised if a Polish audience doesn’t laugh when you make a joke. Unless the audience is drunk, but that’s another story. Or you can try looking up some hints on what makes Poles laugh.

2. Poles are taught at school to sit down and listen. We are not encouraged to interact with a teacher. A conference presentation, for some, is reminiscent of a class. Asking questions is asking for trouble. Again, it starts in school. Asking a question means you were not paying attention to what the teacher was saying. Bad. That’s why you may be surprised by how short the Q&A part after your talk was.

3. English is not Poles’ mother tongue, so it’s possible that a Polish audience will not follow all the details of your talk. And we try to avoid asking questions when we’re not certain whether the question was already answered (see point 2).

4. Poles are shy (unless we’re drunk). Shy and uncertain of their language skills. This is why you should expect less interaction. Take Filip Barański’s advice and make some fun of yourself as an icebreaker.

5. The other gap you should mind is money. No, we don’t earn $100k as Paul Hammond may think Polish developers do (see his Infrastructure for Startups talk).


Modularity in Chef cookbooks

Thoughts on modularity while refactoring Chef cookbooks. The dilemma of reusability and complexity and maintainability and… It’s never just black and white.

Finer-grained recipes are easier to maintain and to understand. Roles containing lightweight recipes provide a high level overview that helps to understand the system architecture. Recipes that are too lightweight, however, may provide minimal value and may require combining many recipes into a single useful role.

Coarse-grained recipes make it easier to keep changes in one place – you edit one file. Recipes that are too heavyweight, however, may become difficult to reuse because they provide more than is needed for different use cases.

As a system grows in size and complexity, it’s important that we design our automation for easy reuse and optimal change isolation.

pidstat cheat sheet

I find pidstat a useful tool for troubleshooting system performance. Let me share with you some examples of the ways I use it.

1. Checking CPU consumption per process.

This oneliner will run continuously every 1s showing output lines only for processes consuming more than 20% of CPU:

# pidstat -l 1|perl -lane 'print if @F[5] =~ /([2-9]\d|\d{3,})\./'
12:58:50         1484   44.55    5.94    0.00   50.50     1  /usr/sbin/apache2 -k start
12:58:50         2990   46.53    4.95    0.00   51.49    13  /usr/sbin/apache2 -k start
12:58:50         2999   30.69    2.97    0.00   33.66     4  /usr/sbin/apache2 -k start
12:58:50         8976    0.00   32.67    0.00   32.67     9  flush-0:21
12:58:50        11937   54.46    4.95    0.00   59.41     2  /usr/sbin/apache2 -k start

Same, but including threads:

# pidstat -lt 1|perl -lane 'print if @F[6] =~ /([2-9]\d|\d{3,})\./'

Note that the column number differs when watching threads. It may also differ depending on how your OS displays time (AM/PM adds one column).
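
Since the column index moves around, a numeric comparison with the column passed as a parameter can be easier to adapt than a regex. Below is a minimal sketch in Ruby under that assumption; over_threshold is a hypothetical helper, and the header line and the cron line in the sample are made up, while the apache2 lines are taken from the output above.

```ruby
# Keep pidstat lines whose numeric column exceeds a threshold.
# column is a zero-based index after whitespace splitting (same as Perl's @F).
def over_threshold(lines, column:, threshold:)
  lines.select do |line|
    field = line.split[column].to_s
    # Float(..., exception: false) returns nil for headers such as "%CPU".
    value = Float(field, exception: false)
    !value.nil? && value > threshold
  end
end

SAMPLE = <<~OUT.lines(chomp: true)
  12:58:49          PID    %usr %system  %guest    %CPU   CPU  Command
  12:58:50         1484   44.55    5.94    0.00   50.50     1  /usr/sbin/apache2 -k start
  12:58:50         2999   30.69    2.97    0.00   33.66     4  /usr/sbin/apache2 -k start
  12:58:50         4242    3.00    1.00    0.00    4.00     0  /usr/sbin/cron
OUT

busy = over_threshold(SAMPLE, column: 5, threshold: 20.0)
puts busy
```

With thread output or an AM/PM clock, only the column argument needs to change (6 instead of 5 in either case).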

Watch single PID’s CPU usage:

$ pidstat -l 1 -p 5181
03:11:16 PM       PID    %usr %system  %guest    %CPU   CPU  Command
03:11:17 PM      5181   22.00    2.00    0.00   24.00     1  /usr/lib/firefox/firefox
03:11:18 PM      5181   23.00    2.00    0.00   25.00     1  /usr/lib/firefox/firefox

2. Which process is making the most context switches.

I graph context switches in Ganglia (with sflow) and I once saw this kind of graph:

[Figure: context switches graph in Ganglia]

I found out that pidstat could tell me which process was making these spikes. Let’s look for processes making more than 100 non-voluntary context switches per second:

# pidstat -wl 1|perl -lane 'print if @F[3] =~ /\d{3,}\./'
13:18:40        32579   4408.00   1262.00  /usr/bin/plackup
13:18:40        32588    177.00    134.00  /usr/bin/plackup

3. Checking what process is using disk the most.

If you wanted to know which processes are writing more than 100 kB/s to disk:

# pidstat -dl 1|perl -lane 'print if @F[3] =~ /\d{3,}\./'
13:24:40          382      0.00    172.00      0.00  jbd2/sda3-8
13:24:48         1406      0.00    160.00    160.00  /usr/bin/perl /etc/rc2.d/S20varnishgmetric start
13:24:54         1981      0.00    212.00      0.00  /usr/bin/plackup
13:24:54        24520     56.00   1912.00   1080.00  /usr/sbin/apache2 -k start

Bonus note on KVM and iostat: On a KVM host, if you run iostat -dx 1 it will show you I/O consumption per drive. Now, how to tell which of the dm-N devices belongs to which VM? Let’s say the most I/O heavy drive was dm-15. Here’s how:

# dmsetup ls|grep 15
guests-vm_i1_root    (252, 15)

4. Looking at RAM hungry processes second by second.

This oneliner will show processes with an RSS of more than 200,000 kB (roughly 200 MB); pidstat reports RSS in kB.

# pidstat -rl 1|perl -lane 'print if  @F[5] =~ /([2-9]\d{5,}|\d{7,})/'


How to overwrite attribute array elements instead of merging in Chef

The way attributes get merged during the chef client run may appear troublesome in a situation when you need to overwrite an array of attributes.

Imagine you want to store an array of, say, memcache pool members in attributes. You’ve chosen to do it with an array, so it looks like this (attributes file notation):

default['fooapp']['memcache'] = [ "memcache-a1:11211", "memcache-a2:11211" ]

Now you want to configure different memcache pools for different datacenters. You want to overwrite the default attributes from the attributes file, and you also want to be able to have a different number of pool members (array elements) in each datacenter. Let’s try with roles specific to each datacenter, like this:

"fooapp": {
    "memcache": [ "memcache-b1:11211", "memcache-b2:11211", "memcache-b3:11211" ]
}

What you’ll get on a node in datacenter “b” is:

$ knife node show somenode-b1 -a fooapp

Oh, of course it’s merged! You didn’t want app from colo “b” to use memcache in colo “a”!

What you could do is remove the default elements using the special :knockout_prefix attribute, which allows for a subtractive merge. Elements prefixed with this custom prefix would get removed during the merge. This is, however, no longer available as of Chef 11, so I’ll skip discussing it.

The idea I got is to work around it by keeping the elements in a string instead of an array, like this:

default['fooapp']['memcache'] = "memcache-a1:11211,memcache-a2:11211"

and in the role:

"fooapp": {
    "memcache":  "memcache-b1:11211,memcache-b2:11211,memcache-b3:11211"
}

And then split them in a template:

<% node[:fooapp][:memcache].split(",").each do |m| %>

It’s less elegant than using array but it works.
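
To make the difference concrete, here is a small sketch in plain Ruby. naive_merge is a simplified stand-in that combines arrays and replaces scalars; it is an assumption made for illustration and not Chef’s actual merge code, but it models the behaviour described above.

```ruby
# Simplified stand-in for an attribute merge that combines arrays.
# NOT Chef's real implementation, only a model of the behaviour described above.
def naive_merge(base, override)
  base.merge(override) do |_key, old, new|
    if old.is_a?(Hash) && new.is_a?(Hash)
      naive_merge(old, new)
    elsif old.is_a?(Array) && new.is_a?(Array)
      old | new   # arrays are combined
    else
      new         # scalars (including strings) are replaced
    end
  end
end

cookbook_array = { "fooapp" => { "memcache" => ["memcache-a1:11211", "memcache-a2:11211"] } }
role_array     = { "fooapp" => { "memcache" => ["memcache-b1:11211", "memcache-b2:11211", "memcache-b3:11211"] } }

cookbook_string = { "fooapp" => { "memcache" => "memcache-a1:11211,memcache-a2:11211" } }
role_string     = { "fooapp" => { "memcache" => "memcache-b1:11211,memcache-b2:11211,memcache-b3:11211" } }

merged_array  = naive_merge(cookbook_array, role_array)["fooapp"]["memcache"]
merged_string = naive_merge(cookbook_string, role_string)["fooapp"]["memcache"].split(",")

puts merged_array.inspect   # members from both datacenters
puts merged_string.inspect  # only the members set in the role
```

The array version ends up with the pool members of both datacenters, while the string version contains only what the role sets, which is the effect the workaround relies on.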