In modern web stacks, your website sits behind some front end cache and/or a CDN. You collect logs from your backend servers (apache, nginx, etc) and track visits to the page with Google Analytics or similar tools, but you might be missing visibility from your cache or you may not yet be shipping cache logs to your ELK stack. If either is the case, read on.
There is not many CDNs that enable you to collect logs directly from edge cache (for example Fastly enables you to do so). Check with your CDN provider how to get the access logs shipped to your ELK. If your provider suck with that, maybe you can consider choosing better CDN. Your border cache Varnish servers seem to be good place in typical web stack, for access log collection. Every request (or every cache miss on CDN) goes through it, so the visibility is better than on the backend servers where only requests that were cache miss on CDN and border Varnish will be recorded. So how to do it?
How to ship logs from Varnish to Logstash
I want logs structured in JSON format, to have them easily digested by Logstash. No grok to parse classic NCSA formatted log lines. I couldn’t find a way to achieve that using std.syslog VMOD, but I found it quite easily doable using varnishncsa. Varnishncsa does not directly support writing logs in JSON, but it’s easy to customize the format. Here’s example init config that will make varnishncsa talk JSON to Logstash.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | env LOG_FORMAT='{ "@timestamp": "%{%Y-%m-%dT%H:%M:%S%z}t", "remoteip":"%h", "xforwardedfor":"%{X-Forwarded-For}i", "method":"%m", "url":"%U", "request":"%r", "httpversion":"%H", "status": %s, "bytes": %b, "referrer":"%{Referer}i", "agent":"%{User-agent}i", "berespms" : "%{Varnish:time_firstbyte}x", "duration_usec": %D, "cache" : "%{Varnish:handling}x", "served_by":"%{Served-By}o", "accept_encoding":"%{Accept-Encoding}i", "xf_proto":"%{X-Forwarded-Proto}i" }' start on started varnish stop on stopped varnish respawn exec /usr/bin/varnishncsa -a -F "$LOG_FORMAT" | sed -e 's/"\(bytes\|status\|berespms\|duration_usec\)": -/"\1": 0/g' | /usr/bin/logger -p local0.info -t varnish |
Varnishncsa will only write to file or to std out, so we have to pipe logs from std out, to logger which will deliver them to local syslog (rsyslogd, syslog-ng etc) from where they will be shipped to central log server. Note the sed search and replace trick. This is to prevent messing up field mapping in Elastic Search by sending strings (which happens when there’s “-” printed for the missing value) to fields that are meant to be numerical.
Note, that you can add any header to the log, including headers you add in Varnish VCL, so for example you can attach specific cookie value (session cookie, AB-testing cookie etc) to log entries.
In the example above, I also log TTLB (time to last byte) – duration_usec, and backend response time – berespms. Having such performance stats collected from border cache helps identify web performance bottlenecks better.