Infopost | 2015.03.22

A few months ago Steve and I noted that our respective sites were getting tons of hits from Samara Oblast, an obscure(?) territory in Russia. Russian search engine maybe? Cybercriminals? Proxy for the American or Chinese or Syrian electronic armies? Who really cares? Only port 80 should be open and doing nothing fancy

But since this kilroy thing has gotten pretty lengthy I was scoping the possibility of doing some sort of 'top content' thing based on hits. So I pulled my server logs and was looking through them to see how hard it'd be to parse.
Attack surface

Missile command screenshot

Well this is fun:

91.200.13.119 "GET /kilroy/archive/2008/04/index.html HTTP/1.0"...
91.200.13.119 "GET /kilroy/2008/01/leader-board-r.html HTTP/1.0"...
91.200.13.119 "GET /kilroy/2008/01/index.php HTTP/1.0"...
91.200.13.119 "GET /2008/01/index.php HTTP/1.0"...
91.200.13.119 "GET /kilroy/2008/01/index.php HTTP/1.0"...
91.200.13.119 "GET /2008/01/index.php HTTP/1.0"...
91.200.13.119 "GET /kilroy/2008/01/index.php HTTP/1.0"...
91.200.13.119 "GET /2008/01/index.php HTTP/1.0"...

How am I going to count hits for 2008/01/index.php when there is no anything.php?

Eight sequential hits from the same person, within 10 seconds. That's what I call quick on the mouse. Whois says it's from Ukraine. I'm going to stop me right here, this is my first time actually looking at http traffic, this is old hat to 80% of the world. Okay, let's continue.
Maybe they're just guessing about site map, but probably they're looking to have some fun with php.

Another interesting one:

POST /cgi-bin/php?
%2D%64+%61%6C%6C%6F%77%5F%75%72%6C%5F%69%6E%63%6C%75%64%65
%3D%6F%6E+%2D%64+%73%61%66%65%5F%6D%6F%64%65%3D%6F%66%66+%2D%64+%73%75%68%6
F
%73%69%6E%2E%73%69%6D%75%6C%61%74%69%6F%6E%3D%6F%6E+%2D%64+%64%69%73%61%62
%6C%65%5F%66%75%6E%63%74%69%6F%6E%73%3D%22%22+%2D%64+%6F%70%65%6E%5F%62%61
%73%65%64%69%72%3D%6E%6F%6E%65+%2D%64+%61%75%74%6F%5F%70%72%65%70%65%6E%64
%5F%66%69%6C%65%3D%70%68%70%3A%2F%2F%69%6E%70%75%74+%2D%64+%63%67%69%2E%66
%6F%72%63%65%5F%72%65%64%69%72%65%63%74%3D%30+%2D%64+%63%67%69%2E%72%65%64
%69%72%65%63%74%5F%73%74%61%74%75%73%5F%65%6E%76%3D%30+%2D%6E HTTP/1.1

Looking to do injection or overflow or something? Not really my wheelhouse, but it was kind of a fun digression.
Classifier

So I wrote some code to classify site traffic into one of the following categories:
Some of it was pretty easy, bots tend to declare themselves in the user agent string and hit robots.txt first. Malicious stuff sends PUTs and looks for files that aren't .html/.jpg/etc. And, of course, sequential traffic from the same IP can be classified together. This is important because an attack might hit numerous legit links but it's not visit traffic.
Data

Logs go back about a year. Here's some excel because easy.

Classification of web site hits

I get indexed about twice as much as I get visited. There have been more than 20,000 malicious http requests.

Web site bot hits histogram

Google, Baidu, and Majestic 12 (a distributed indexing project) turned up most. But there are quite a few bots out there.

So the top visited content, the main reason for this whole endeavor:

Pages
Images
Labels - which are now just links to search
Data skew: some content has been around longer. On the other hand, the logs are only from about a year back.

When I get some more fun-coding time I'll see about putting this in the sidebar.




2015.03.22

Completion

Final thoughts on Far Cry 4, including the map editor and experiments with C4. Kafka destroys a cognac box.
2015.04.01

Class

A dog photoshoot and some other stuff.


Related / internal

Some posts from this site with similar content.

Post
2010.01.10

Favorite photos, 2010-2019

My favorite photo memories from this decade.
Post
2024.03.10

Moon of Vega

Helldivers 2 crossplay woes and combing the subsurface web.
Post
2014.01.01

Popular

A gallery of my most popular photos, updated regularly.

Related / external

Risky click advisory: these links are produced algorithmically from a crawl of the subsurface web (and some select mainstream web). I haven't personally looked at them or checked them for quality, decency, or sanity. None of these links are promoted, sponsored, or affiliated with this site. For more information, see this post.

sensepost.com

SensePost | Blog | Coding

Leaders in Information Security
krebsonsecurity.com

The Democratization of Censorship - Krebs on Security

blog.andrewcantino.com

An Example of Poor Security Communication in the Google Auth Flow ยท andrew makes things

Responsible Disclosure

Created 2024.10 from an index of 431,220 pages.