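If you’ve administered a web server, you’ve undoubtedly encountered its access log. By default, nginx writes the log to “/var/log/nginx/access.log”, while Apache writes it to “/var/log/httpd/access_log” on Red Hat-based systems or “/var/log/apache2/access.log” on Debian-based systems. (The path “/etc/httpd/conf/httpd.conf” is Apache’s configuration file, where the log location is set, not the log itself.)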
Access logs typically use a log format called “combined,” which consists of nine columns.
Example of an “access.log” File
1.2.3.4 - - [1/Dec/2019:01:01:01 -0400] "GET / HTTP/1.1" 200 12345 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
“Combined”-Format “access.log” Column Headings:
1 | Remote address |
2 | - |
3 | Remote user |
4 | Local time |
5 | Request |
6 | HTTP status |
7 | Bytes sent (body) |
8 | HTTP referrer |
9 | HTTP user agent |
What’s interesting about the format is that the columns are delimited inconsistently. Some are wrapped in double quotes, one is wrapped in square brackets, and the rest are bare, which poses a challenge when we use a tool like “awk.” If you look closely at column 4, you’ll see that the brackets surround a date that itself contains a space.
How, then, can you use awk to get columns? By default, awk splits fields on whitespace, and in the “access.log” file that breaks the bracketed date and the quoted request, referrer, and user agent into several pieces each.
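To see the problem concretely, here is a minimal sketch that runs awk with its default whitespace splitting on a single sample line. The line follows the “combined” format, but its shortened user agent is a stand-in chosen for illustration and is not taken from a real log.

```shell
# Sample line in "combined" format (shortened user agent, for illustration only).
line='1.2.3.4 - - [01/Dec/2019:01:01:01 -0400] "GET / HTTP/1.1" 200 12345 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"'

# Default field splitting: awk cuts the line at every run of spaces or tabs,
# so the bracketed date and the quoted request are split apart.
printf '%s\n' "$line" | awk '{ print "4: " $4; print "5: " $5; print "6: " $6 }'
```

This prints “4: [01/Dec/2019:01:01:01”, “5: -0400]”, and “6: "GET”. The date is spread over two fields and the request starts at field 6 instead of field 5.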
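The way to extract whole columns is to describe what a field looks like, rather than what separates fields, by setting the FPAT variable with the -v option. FPAT is a GNU awk (gawk) extension, so it is not available in other awk implementations such as mawk.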
AWK Example Using FPAT
$ awk -vFPAT='[^ ]*|"[^"]*"|\\[[^]]*\\]' '{ print $5 }' access.log
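FPAT stands for “field pattern,” and as you can see above, we’re defining a column with a regex that matches one of three things: a run of non-space characters, a double-quoted string, or a string enclosed in square brackets. With that definition, $5 is the whole request, quotes included.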
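As a hedged illustration of the same field pattern, the sketch below prints several named columns from one sample line. It assumes GNU awk is installed under the name “gawk”, and the sample line with its shortened user agent is made up for the example.

```shell
# Sample line in "combined" format (shortened user agent, for illustration only).
line='1.2.3.4 - - [01/Dec/2019:01:01:01 -0400] "GET / HTTP/1.1" 200 12345 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"'

# FPAT is a gawk extension, so gawk is called explicitly here.
# A field is a run of non-spaces, a double-quoted string, or a bracketed string.
printf '%s\n' "$line" | gawk -v FPAT='[^ ]*|"[^"]*"|\\[[^]]*\\]' '{
    print "address: " $1
    print "time:    " $4
    print "request: " $5
    print "status:  " $6
    print "agent:   " $9
}'
```

With this pattern the nine columns line up with the headings listed earlier, so column 4 is the complete bracketed date and column 9 is the complete quoted user agent. The quotes and brackets remain part of each field and would need to be stripped separately if they are not wanted.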
Topics of interest: built-in awk variables
"Amateurs hack systems, professionals hack people."
-- Bruce Schneier, a renowned computer security professional