In their infancy, bots were innocent, legitimate programs used to predict stock market fluctuations, giving financial traders a statistical and mathematical edge in trading. Since the late 1990s, however, bots have matured and now dominate the web, serving both legitimate uses and malicious ones for hackers, black hat SEOs, and DDoS attackers.
Bots are tools, and like any tool they can be used for positive or negative ends. While some bots serve as a vital part of search engines and ease our access to information, others do the exact opposite by spamming our websites with misleading links and fraudulent comments.
One way or another, bot traffic now dominates the web. A recent study by the website security company Incapsula showed that bots today account for over 61.5% of all website traffic.
However, as Incapsula’s study showed, the rapid growth in those numbers comes from so-called “Good Bots”, not from their malicious counterparts. Perhaps most interestingly, Incapsula notes a steep 75% decrease in spam bot activity, which it attributes to Google’s ongoing anti-spam campaign. Understanding this claim, and Google’s relation to spam bot activity, requires some background on the nature and classification of bots, as well as on Google’s motives and recent changes to its algorithms.
Internet’s Non-Human Majority
Bots are software agents used to execute repetitive automated tasks, such as web crawling, content scraping, and link building. Generally speaking, all bots can be assigned to one of two categories:
- Legitimate – agents of search engines and other legitimate software (Alexa bot, Googlebot, etc.)
- Malicious – agents of hacker software (spam bots, vulnerability scanners, DDoS bots, etc.)
The latter group consists of agents of malware (malicious software). The malicious activities such bots can be used for include:
- Harvesting email addresses
- Spamming contact forms, guest registrations, and comment sections
- Scraping website content and re-using it without permission
- Injecting websites with viruses and worms
- Executing Application Layer DDoS attacks
- Breaking into password-protected pages
Bots and Search Engines
Both of these groups contain bots that were built to manipulate search engine results, either by providing these search engines with updated information or by trying to fool their relevancy-based ranking algorithms.
Legitimate bots – in the context of search engines, these are the spiders that crawl websites and their servers. Crawling is an automated process guided by a file called robots.txt, which contains guidelines, or suggestions, on which content the crawler is allowed or disallowed to index.
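As a brief illustration, a robots.txt file lists per-crawler allow and disallow rules (the paths and the “BadBot” name below are hypothetical, chosen only for the example):

```
# robots.txt — served from the site root, e.g. example.com/robots.txt
User-agent: *          # rules for all crawlers
Disallow: /admin/      # ask crawlers not to index this directory
Allow: /public/

User-agent: BadBot     # a hypothetical crawler excluded entirely
Disallow: /
```

Note that these directives are advisory: legitimate bots honor them, while malicious bots typically ignore them.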
The main goal of these legitimate bots is to provide search engines, and in turn their visitors, with the highest-quality search results. Beyond ranking websites, they detect spam, evaluate link quality, and cache information for analysis in the SERPs.
Malicious bots – some of these are built to manipulate search engines with various link scams, which usually revolve around link spamming. Others try to take advantage of websites with quality content by scraping their text and creating paraphrased duplicate content, in the hope of gaining search engine positioning. These are just some examples but, in the end, the goal of any SEO-oriented malicious bot is to increase one website’s ranking at the expense of others: by filling them with spam, by duplicating their content, or by other means, which may even include malware injection and redirect manipulation. For the target, the implications of such methods vary, but the most common worst-case scenario is an SEO penalty that drops the website from Google’s results or, at the very least, sets its SEO efforts back by a few months.
Google Sets the Zoo Free on Spambots
2013 was a game changer. Research by SEO experts Moz showed that, in 2013 alone, Google released 15 minor and major alterations to its algorithm in an attempt to improve the quality of search results. Google had indeed unleashed the zoo: its Penguin algorithm specifically targeted sites that were well ranked but had achieved those rankings through link farming, paid spam, and black hat SEO.
Below is a brief snapshot of Google’s latest algorithm changes:
- Panda targeted poor content to improve the quality of search results.
- Penguin focused on spamdexing, poor links, and link farming. Its focal point was sites deemed to be spamming Google’s search results while still ranking well.
- Hummingbird was a complete change to the search algorithm, incorporating its two predecessors. Whereas Penguin and Panda dealt with specific parts, this change was a total overhaul of the algorithm. Its goal is to provide better-quality search results and make search more “conversational”.
With these algorithm changes, Google’s spam team definitively changed the playing field for websites that try to manipulate its search results. This aggressive attack on hackers, spammers, and black hat SEOs has left them all exposed.
Statistics now show that the Penguin updates have significantly improved the quality of search results. Although overall bot traffic has increased, the figure of 0.5% spam noted in the Incapsula report is actually a reduction in spam traffic compared to 2012, when it peaked at 2% of overall traffic. Effectively, this represents a very significant 75% reduction. It might be too early to tell but, based on these facts, it looks like Google is winning its war on spam.
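The 75% figure follows directly from the two traffic shares cited in the report; a one-line calculation confirms it:

```python
# Spam bot share of overall website traffic, per the figures cited above.
peak_2012 = 2.0   # percent of overall traffic at the 2012 peak
current = 0.5     # percent of overall traffic in the recent Incapsula report

# Relative drop from the 2012 peak to the current level.
reduction = (peak_2012 - current) / peak_2012 * 100
print(f"{reduction:.0f}% reduction")  # → 75% reduction
```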