Religious Nonprofit Organization
Goal: Meet Your Audience
Data from Incapsula
Bots made up the majority of web traffic until 2015.
Year - Bot Traffic
2012 - 51%
2013 - 61.5%
2014 - 56%
2015 - 48.5%
Current 2015 Breakdown
Humans - 51.5%
Bad Bots - 29.0%
Good Bots - 19.5%
How To Manually Block A Bot (don’t do this)
Each site has an .htaccess file. This file controls which IPs can and cannot view your site. You can ban by IP address or by certain User-Agent strings.
I applied a blanket IP ban to all possible Russian IPs.
The bots just changed their IP addresses.
(These bots could be using proxies, or could have hijacked other people’s computers...honestly, it didn’t really matter how they came back.
I could also block by User-Agent, but again, that’s even more trivial to change.)
Example of .htaccess
order allow,deny
deny from 123.45.6.7
deny from 012.34.5
allow from all
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw
RewriteRule ^.* - [F,L]
Bots Are Becoming More Advanced
They can use JavaScript (Google Analytics uses JavaScript to track visitors).
They can perform the same interactions as Humans can (filling out forms and pressing buttons).
They can break less-advanced CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart) using OCR. Or they can rely on humans to break the CAPTCHAs for them.
They can be identified, but it requires people to pay attention to subtle clues (such as bounce rates).
And even if you do decide to identify and block them, they can just come back.
Bots Are Becoming More Advanced
To truly deal with the bot threat, you need to automate away the process of “discovery”: algorithmically determine whether a visitor is a human or a bot, and algorithmically update your “block” list.
Not only does this take some time to set up, but you’re probably going to get some errors (false positives and false negatives)...
Of course, you can always pay a software firm to detect and block the bots for you, but where’s the fun in that?
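A rough sketch of the idea in Ruby (the heuristics and names here are hypothetical, just to show the shape of it):

require 'set'

BLOCKLIST = Set.new

# Toy heuristics: an empty User-Agent, or a request for a trap URL
# that no human-facing page ever links to.
def bot_like?(request)
  request[:user_agent].to_s.empty? || request[:path] == '/trap'
end

# Consult the blocklist first; if a request trips a heuristic,
# the blocklist updates itself so the next visit is refused outright.
def allow?(request)
  return false if BLOCKLIST.include?(request[:ip])
  if bot_like?(request)
    BLOCKLIST << request[:ip]
    return false
  end
  true
end

allow?(ip: '203.0.113.9', user_agent: '', path: '/')  # => false; the IP is now blocked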
My “Solution”?
I was targeting a specific audience: a community in Texas.
I assumed that the bots weren’t specifically pretending to be Texans (but were instead picking their disguises at random), so I simply filtered out all non-Texan results from my Google Analytics.
If a few ‘bot visits’ get through, I’ll just accept the ‘dirty data’.
This works, but it would not be a scalable solution if I were running a more “international” website. (For example, I have a personal website that is intended to advertise myself, and is not targeted to a geographic region. I can’t just filter out every person who is not from Nebraska.)
Our Schedule
Types of Bots
The Bots
“Good” Bots
“Bad” Bots
Good Bots (3 slides)
The “Good Bots”
Website Scan Tools analyze websites and generate reports for their users (for example: responsiveness, text summaries, keywords, etc.).
Crawlers visit a website and store information about it. Then they follow the links on that site and keep on “crawling” the sites they find.
Examples of Crawlers:
Googlebot (picture from Google’s Search Engine Optimization Starter Guide)
Dealing with Crawlers
Crawlers are interested in gathering information from all sites, not just the most popular ones.
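One light-touch option: well-behaved crawlers like Googlebot check a robots.txt file at your site root and honor it (bad bots simply ignore it). A minimal sketch, with hypothetical paths and bot names:

# robots.txt, served from your site root
User-agent: *
Disallow: /admin/
Disallow: /drafts/

User-agent: BadBot
Disallow: /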
Dealing with Website Scan Tools
You were the one that authorized the scan, right?
Bad Bots (14 slides)
Scrapers
Scrapers load the HTML file directly and then parse it to find valuable information to transmit back to their human master. Scrapers tend to search for things like email addresses, content, and prices.
Many people build APIs to discourage the use of scrapers (since scrapers can hurt a website’s performance).
Dealing with Scrapers (Email)
Make your email easy for a human to read, but difficult for a robot to copy. There are many ways of doing this, ranging from least to most effective: spelling the address out (“name at example dot com”), encoding it as HTML entities, rendering it as an image, or assembling it with JavaScript.
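A minimal Ruby sketch of the entity-encoding approach (the address is a placeholder): every character becomes an HTML entity, which browsers render normally for humans but which naive scrapers grepping for “@” patterns will miss.

email = "contact@example.org"  # placeholder address
encoded = email.chars.map { |c| "&##{c.ord};" }.join
puts encoded  # paste this output into your page's HTML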
Dealing with Scrapers (Content, Prices)
DDoS Bots
If too many people visit your website at once, your server goes down. (A human-driven version of this effect is sometimes called “Slashdotting”.)
Why are people so interested in shutting down your website via DDoS?
50% of all bad bot traffic comes from DDoS bots (“impersonators”, in Incapsula’s terminology). Script kiddies, DDoS-for-hire services, state actors... they can all launch DDoS attacks.
Dealing with DDoS Bots
Truly stopping one would require a dedicated team of people who can intercept traffic and identify whether it is ‘legitimate’ before allowing it to proceed…
Cloudflare and Incapsula may be examples of firms that can help analyze web traffic and limit DDoS attacks. (Note that I never actually tried them, so I don’t know whether they are actually good. Sorry.)
You could also hope that you don’t offend any possible DDoS attacker (though DDoS attacks are fairly cheap to purchase online), and then clean up the mess afterwards.
Hacking Tools
They scan your website, trying to find possible vulnerabilities that can then be exploited by a hacker.
Things They May Attempt To Do:
In March 2015, pro-ISIS hackers scanned WordPress websites to find anyone using the “FancyBox for WordPress” plugin, then injected an iframe into vulnerable sites. (This vulnerability had already been discovered and patched a month before.)
Dealing with Hacking Tools
As long as you keep up to date with current security practices, you should be fine. Keep your CMS/framework updated at all times (although this doesn’t protect you against ‘zero-day’ attacks).
It can take a long time for a bot to brute-force a password that is randomly generated by a password manager, for instance. It’s usually easier for the bot to just give up and move on to a more appealing target.
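As a back-of-the-envelope illustration (my numbers, not a benchmark):

require 'securerandom'

# 62 alphanumeric characters over 20 positions gives log2(62**20),
# about 119 bits of entropy. Even at a million guesses per second,
# a brute-force search would take on the order of 10**22 years.
puts SecureRandom.alphanumeric(20)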
Ad Bots
They visit websites and interact with ads (clicking on display ads or watching video ads). They have to continually improve their tactics to avoid detection by ad agencies and appear to be human.
According to White Ops, Inc., global advertisers lost $6.3 billion to these bots this year (out of roughly $40 billion in total ad spending).
Dealing with Ad Bots
If you are an unethical content publisher that relies on ad revenue, you’ll love Ad Bots. You may even be running the bots (or hiring someone else to run those bots). However, advertisers will stop working with you once they find out.
If you are paying for the ads, be suspicious. Engage in continuous fraud monitoring, determine where the traffic is coming from at all times, use better metrics (NOT clicks and views), prepare to spend more money for ‘real’ traffic, and be careful about using “third-party traffic brokers” and programmatic ad buying.
Some clever traffic brokers are able to sell advertisers a mixture of ‘real’ traffic, ‘bot’ traffic, and ‘incentivized’ traffic (paying people in developing countries to see your site).
Spammers
Their goal is to post more and more content on other people’s websites... content that nobody wants.
These spammers hope to post backlinks to their own websites. A human may click on them, but the backlinks are really meant for search engines. (The hope is that Googlebot will see those backlinks and rank their websites higher in search results.)
Spam is decreasing due to Google’s attempts to penalize spam backlinks.
Comment Spam
Dealing with Spam
Project Honeypot
What is a Honeypot?
An attempt to misdirect a malicious actor into doing something that reveals it as malicious. We offer ‘honey’ to bait a robot into revealing itself.
Robots are good at filling out forms with arbitrary data (that’s how spamming works). So why not take advantage of that by giving them a form field that only they would fill out?
A Basic Honeypot
<input id="real_email" type="text" name="real_email" size="25" value="" />
<input id="test_email" type="text" name="email" size="25" value="" />
<style>
  #test_email {
    display: none;
  }
</style>
If a user inputs a value for “test_email”, then we assume the user must not be a human. We can then discard the results of the form (see the server-side sketch below).
SOURCE: https://solutionfactor.net/blog/2014/02/01/honeypot-technique-fast-easy-spam-prevention/
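The server-side half might look like this; a minimal sketch using Sinatra (the route name is my choice; the field names match the HTML above):

require 'sinatra'

post '/contact' do
  # A human never sees the hidden "email" field, so any value in it
  # means a bot auto-filled the form; quietly discard the submission.
  halt 200 if params['email'].to_s.strip != ''

  # ...otherwise process params['real_email'] as usual...
  'Thanks!'
end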
A More Complex Honeypot
Monitoring the Robot
Knowing the threat level of a robot can be helpful for those who want to avoid ‘false positives’.
Just because a robot is using an infected computer right now doesn’t mean that a human won’t use the infected computer in the future.
At the same time, a robot can pose an immediate threat (sending lots of spam messages) that could merit a temporary ban of, say, one month... or a permanent one.
I personally prefer a low tolerance for “threats” and can accept some ‘false positives’.
Bots That “Project Honeypot” Targets
Does not deal with DDoS bots, Hacking Tools, or Ad Bots
HTTP:BL
The real glory of Project Honeypot is that it is a “distributed system”: it gathers information about robots through a variety of honeypots placed throughout the Internet. The data it acquires is accessible to webmasters, so they can programmatically look up IP addresses and determine whether a user is ‘suspicious’ enough to warrant blocking.
However, to access HTTP:BL, you need an API Key and “active” status. To become “active”, you must make an account and then either install a honeypot, add a QuickLink to a honeypot, donate an “MX entry” (the ‘fake’ email addresses for the honeypots), or refer friends to join.
Example of Data Collected on an IP Address: https://www.projecthoneypot.org/ip_91.200.12.7
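Under the hood, an HTTP:BL query is an ordinary DNS lookup: you prepend your API key to the visitor’s reversed IP octets and resolve against dnsbl.httpbl.org; a 127.x.y.z answer encodes days-since-last-activity, threat score, and visitor type. A minimal Ruby sketch (the key is a placeholder; decoding follows Project Honeypot’s documented format):

require 'resolv'

API_KEY = 'abcdefghijkl'  # placeholder: use your own Project Honeypot key

def httpbl_lookup(ip)
  reversed = ip.split('.').reverse.join('.')
  answer = Resolv.getaddress("#{API_KEY}.#{reversed}.dnsbl.httpbl.org")
  _, days, threat, type = answer.split('.').map(&:to_i)
  # type bitmask: 1 = suspicious, 2 = harvester, 4 = comment spammer
  { days_since_seen: days, threat_score: threat, type: type }
rescue Resolv::ResolvError
  nil  # no answer: the IP is not in Project Honeypot's database
end

p httpbl_lookup('91.200.12.7')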
HTTP:BL
Examples of Open Source Projects That Connect To HTTP:BL:
WordPress Plugin - https://github.com/WP-http-BL/httpbl
(currently removed from the WordPress directory because a security vulnerability was discovered and nobody is maintaining the plugin)
Ruby Gem - https://github.com/cmaxw/project-honeypot
Note that you must have an API Key from Project Honeypot to be able to use HTTP:BL.
Rack::Attack
Ruby gem created by “Kickstarter” (yes, the company).
Intended to counter scrapers and brute-force attackers, since the company has to spend money providing ‘service’ to these robots (and would rather save that money by blocking the bad traffic instead).
“Example Configuration”, from https://github.com/kickstarter/rack-attack/wiki/Example-Configuration
class Rack::Attack
  # Handle "Repeated Requests"
  # Throttle all requests by IP (60rpm)
  # Key: "rack::attack:#{Time.now.to_i/:period}:req/ip:#{req.ip}"
  throttle('req/ip', :limit => 300, :period => 5.minutes) do |req|
    req.ip # unless req.path.start_with?('/assets')
  end

  # Stopping "Brute Force" Login Attempts
  throttle('logins/ip', :limit => 5, :period => 20.seconds) do |req|
    if req.path == '/login' && req.post?
      req.ip
    end
  end

  throttle("logins/email", :limit => 5, :period => 20.seconds) do |req|
    if req.path == '/login' && req.post?
      # return the email if present, nil otherwise
      req.params['email'].presence
    end
  end
end
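To actually turn this on in a Rails app, you register the middleware (per the gem’s README at the time):

# config/application.rb
config.middleware.use Rack::Attack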
Is That Enough?
“When the smartest bad guys figure out how to fool you, they don't tell you you're beaten. What you see instead looks like victory: fraud numbers going down! Then you're only beating the dumb crooks with no sense of what you're missing. This is a game where losing can actually look like winning. So the top action item is to reject complacency.”
-- Michael Tiffany, CEO of White Ops
Source: http://www.adweek.com/news/advertising-branding/whats-being-done-rein-7-billion-ad-fraud-169743
Should you trust anything any more?
The comment is very generic... it could have applied to any blog post.
“Fra Stra” also favorited this post…
“There is no innocence, only degrees of guilt.”
-- Warhammer 40K: Dawn of War