How to Filter “access.log” Columns Using AWK

By Charles Joseph | Cybersecurity Researcher
Published on December 26th, 2023
This post was updated on February 29th, 2024

If you’ve administered a web server, you’ve undoubtedly encountered its access log. By default, nginx web servers maintain the log at “/var/log/nginx/access.log”, while Apache web servers typically maintain it at “/var/log/apache2/access.log” on Debian-based systems or “/var/log/httpd/access_log” on Red Hat-based systems.

Access logs are typically configured to house information in a log format called “combined,” which consists of nine columns.

Example of an “access.log” File

1.2.3.4 - - [1/Dec/2019:01:01:01 -0400] "GET / HTTP/1.1" 200 12345 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

“Combined”-Format “access.log” Column Headings:

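1. Remote address
2. Placeholder, usually “-” (a literal hyphen in nginx; the identd logname in Apache)
3. Remote user
4. Local time
5. Request
6. HTTP status
7. Bytes sent (body)
8. HTTP referrer
9. HTTP user agent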

What’s interesting about the format is how inconsistently the columns are delimited. Some are wrapped in double quotes, one is wrapped in brackets, and the rest are bare, which poses a challenge when we use a tool like “awk.” If you look closer at column 4, you’ll see that the brackets enclose a date with a space inside it.

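How, then, can you use awk to get columns? By default, awk splits each line on whitespace, so the spaces inside the quoted and bracketed columns break them into several fields. And the “access.log” file has no other delimiter to split on instead.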
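To see what happens when no field pattern is set, the short sketch below runs awk on a single sample line. The line is the example from above with its user agent shortened for readability, and the variable names are only illustrative.

```shell
# Sample line in combined format; the user agent is shortened for readability.
line='1.2.3.4 - - [1/Dec/2019:01:01:01 -0400] "GET / HTTP/1.1" 200 12345 "-" "Mozilla/5.0 (Linux; Android 6.0.1)"'

# No -F and no FPAT: awk breaks the line at every run of spaces or tabs.
field_count=$(printf '%s\n' "$line" | awk '{ print NF }')
fourth=$(printf '%s\n' "$line" | awk '{ print $4 }')
fifth=$(printf '%s\n' "$line" | awk '{ print $5 }')

echo "fields: $field_count"   # 15 rather than 9
echo "\$4: $fourth"           # [1/Dec/2019:01:01:01
echo "\$5: $fifth"            # -0400]
```

The timestamp ends up split across fields 4 and 5, and the request and user agent are fragmented in the same way, so the field numbers no longer line up with the nine column headings.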

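The way to do this is to describe the columns themselves by setting the FPAT variable, here through awk’s -v option. FPAT is specific to GNU awk (gawk) version 4.0 and later, so other awk implementations will not honor it.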

AWK Example Using FPAT

$ awk -vFPAT='[^ ]*|"[^"]*"|\\[[^]]*\\]' '{ print $5 }' access.log

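FPAT stands for “field pattern.” Rather than describing what separates the columns, the regex describes what a column looks like: a run of non-space characters, a double-quoted string, or a bracketed string. With that pattern in place, $5 prints the whole request, quotes included.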
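As a rough sketch of how the same idea extends to other columns, the snippet below sets FPAT in a BEGIN block and prints the field count, the timestamp, and the status code. It assumes GNU awk is installed as “gawk”; the two log lines are made-up sample data, and the pattern uses “+” instead of “*” for the bare-field case so that a field can never be empty, which is a small deviation from the one-liner above.

```shell
# Made-up sample data so the sketch runs without a real log file.
log='1.2.3.4 - - [1/Dec/2019:01:01:01 -0400] "GET / HTTP/1.1" 200 12345 "-" "Mozilla/5.0 (X11; Linux x86_64)"
5.6.7.8 - - [1/Dec/2019:01:01:02 -0400] "GET /missing HTTP/1.1" 404 153 "-" "curl/7.68.0"'

# FPAT is gawk-only. A field is a bare token, a quoted string,
# or a bracketed string; the longest alternative wins.
fields=$(printf '%s\n' "$log" | gawk '
  BEGIN { FPAT = "[^ ]+|\"[^\"]*\"|\\[[^]]*\\]" }
  { print NF, $4, $6 }')

printf '%s\n' "$fields"
# 9 [1/Dec/2019:01:01:01 -0400] 200
# 9 [1/Dec/2019:01:01:02 -0400] 404
```

Each line now yields exactly nine fields, so the field numbers match the column headings listed earlier.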
