geeky

my idea to fight referral spam

today i was trying to remove the referral spam i’ve been getting. there is so much of it that removing it manually would be way too time consuming. while i do not want to completely get rid of my referral script, i can no longer stand all the horrible porno referral spam i’ve gotten. i’ve been thinking about an efficient script to prevent it. many people have already talked about different ways to fight it [referral spam removal], but i just think making a block list is too tedious. so i thought of a different idea. i will share this idea and some php code with you, but this might or might not be what you want to do. the approach will increase your bandwidth usage, because your server has to download every referring page it checks. i installed it today so i don’t know how much bandwidth it will eat up. we will see.

the reason the referral links are spam is that the offender uses a bot or crawler to request your webpage with a fake referrer. they do not have your site linked on their site. so one way to test if a referring page is spam is to view the source of the referring page and try to find your website address in it. if the page does not contain one, it is either spam or a page you cannot access, like someone’s email account. in order to view the page’s source, you must first read in the page’s contents and then perform the test. the following code will perform this task.

WARNING! – the code snippet is very generalized. it is written to demonstrate my idea. if you have basic programming knowledge, you should be able to use it to hack any referral scripts you use. otherwise it’s probably not too helpful to you.

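$URL = getenv('HTTP_REFERER'); // the referring page

// read the referring page's source (needs allow_url_fopen enabled).
// only http(s) urls are opened, so a forged referrer can't point at a local file.
$contents = '';
$handle = preg_match('/^https?:\/\//i', (string) $URL) ? @fopen($URL, 'r') : false;
if ($handle) {
    while (!feof($handle)) {
        $data = @fread($handle, 8192);
        if ($data === false || strlen($data) == 0) {
            break;
        }
        $contents .= $data;
    }
    @fclose($handle);
}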

the variable $contents now contains the page’s source code. here’s a simple if/else statement that logs the referral: if the source contains a link to the requested page, log the referral; otherwise add the referring domain to the ban list.

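$currentURL = $_SERVER['REQUEST_URI']; // requested page
// find base domain of the referring page
$anchor = preg_replace('/^https?:\/\//i', '', (string) $URL);
$anchor = preg_replace('/^www\./i', '', $anchor);
$anchor = preg_replace('/\/.*/', '', $anchor);
if (strpos($contents, $currentURL) !== false) {
    // your code for logging the referral
} elseif ($anchor != '') {
    // escape the value: the referrer is sent by the visitor and can't be trusted.
    // (the mysql_* functions were removed in php 7; use mysqli or PDO there)
    $safe = mysql_real_escape_string($anchor);
    @mysql_query("INSERT INTO referer_banlist SET ban_string = '$safe'");
}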
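
a side note on the base domain step: the three preg_replace calls can also be folded into one small helper built on php’s parse_url(). this is only a sketch, not part of my actual script. the function name base_domain and the example address are made up for illustration, and unlike the regex version it also drops a port number if the url has one.

```php
<?php
// hypothetical helper (not from the original script): reduce a referrer
// url to its bare host, e.g. for storing in referer_banlist.
function base_domain($url)
{
    // parse_url() needs a scheme to find the host, so add one if missing
    if (!preg_match('/^[a-z][a-z0-9+.-]*:\/\//i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return ''; // nothing usable, caller should skip the ban insert
    }
    // drop a leading "www." and normalise case
    return preg_replace('/^www\./', '', strtolower($host));
}

echo base_domain('https://www.Example.com/some/page?x=1'), "\n"; // example.com
```

an empty return value means there was no referrer worth banning, so it is probably best not to insert anything in that case.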

the base domain of the offending site is now logged in your database table referer_banlist. my table structure looks like this:

CREATE TABLE referer_banlist (
ID int(11) NOT NULL auto_increment,
ban_string varchar(250) default NULL,
PRIMARY KEY (ID),
UNIQUE KEY STRING (ban_string)
) ENGINE=MyISAM;

then i hope your referral script has an array variable of all the sites the script should avoid logging. fill that variable with everything logged in your referer_banlist, with something like:

$ban_list = mysql_query("SELECT ban_string FROM referer_banlist");
if ($ban_list) { // skip the loop if the query failed
    while ($dodos_ban_list = mysql_fetch_array($ban_list)) {
        $ignore[] = $dodos_ban_list['ban_string'];
    }
}

this way your site will not have to read in the referring page’s source code again if it’s already labeled as spam. it simply ignores it.
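
every referral script handles its ignore list differently, so here is only a rough sketch of what the check could look like before fetching anything. the function name is_banned and the .test domains are made up, and i am assuming $ignore is a plain array of bare domains like the ones stored in referer_banlist.

```php
<?php
// hypothetical helper: true if the referrer's host matches an entry in
// the ban list ($ignore is assumed to be an array of bare domains).
function is_banned($referer, array $ignore)
{
    $host = parse_url($referer, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // no usable host, nothing to match against
    }
    // compare without "www." since the ban list stores bare domains
    $host = preg_replace('/^www\./', '', strtolower($host));
    return in_array($host, $ignore, true);
}

$ignore = array('spam.test', 'casino.test');
var_dump(is_banned('http://www.spam.test/page.html', $ignore)); // bool(true)
var_dump(is_banned('http://friend.test/links', $ignore));       // bool(false)
```

if is_banned() returns true you can skip both the page fetch and the logging for that request.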

i hope this is helpful to some people. again, my idea is to avoid any manual block list making for referral spam. the trade-off is bandwidth.
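
if the bandwidth turns out to be a problem, one possible way to limit it is to stop reading a referring page after a fixed number of bytes. the sketch below is an assumption on my part, not something i have installed: the name read_capped and the 64 KB default are made up, and a link that sits further down the page than the cap would be missed, so a real site could get banned by mistake.

```php
<?php
// hypothetical helper: read from an open stream, but stop once $limit
// bytes have been collected so a huge page can't eat all the bandwidth.
function read_capped($handle, $limit = 65536)
{
    $contents = '';
    while (!feof($handle) && strlen($contents) < $limit) {
        // never ask for more than what is left under the cap
        $data = fread($handle, min(8192, $limit - strlen($contents)));
        if ($data === false || $data === '') {
            break;
        }
        $contents .= $data;
    }
    return $contents;
}

// demo on an in-memory stream instead of a real web page
$handle = fopen('php://memory', 'r+');
fwrite($handle, str_repeat('x', 20000));
rewind($handle);
echo strlen(read_capped($handle, 10000)), "\n"; // 10000
fclose($handle);
```

the same function works on a handle returned by fopen() for a url, so it could stand in for the read loop in the first snippet.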

8 thoughts on “my idea to fight referral spam”

  1. yeah, manual block lists are a pain =/ luckily i haven’t gotten any referral spams lately… does it help if you block googlebot from archiving your domain?

  2. I haven’t had many problems with referral spam. Then again, I don’t get near as many visitors as you do. I always check my referrals using my extreme tracker 😀 Good luck with stopping the spam! I have to conserve space on my server… yesterday I realized that I had used 95% of my disk space 😦 hehe

  3. Good luck getting rid of the spam o.0 I think I’ve read some way to get rid of it somewhere but I can’t remember where. I’ll have to look.

    (I just noticed one of your themes has Morning Musume girls! Weee)
