1) Content spam
These techniques involve altering the logical view that a search
engine has over the page's contents. They all aim at variants of the
vector space model for information retrieval on text collections.
Keyword stuffing
Keyword stuffing involves the calculated placement of keywords within
a page to raise the keyword count, variety, and density of the page.
This makes a page appear more relevant to a web crawler, increasing the likelihood that it will be found. Example: A promoter of a
Ponzi scheme
wants to attract web surfers to a site where he advertises his scam. He
places hidden text appropriate for a fan page of a popular music group
on his page, hoping that the page will be listed as a fan site and
receive many visits from music lovers. Older versions of indexing
programs simply counted how often a keyword appeared, and used that to
determine relevance levels. Most modern search engines have the ability
to analyze a page for keyword stuffing and determine whether the
frequency is consistent with other sites created specifically to attract
search engine traffic. Also, large webpages are truncated, so that
massive dictionary lists cannot be indexed on a single webpage.
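As a minimal sketch of the signal those older indexers relied on, the following Python function (hypothetical, with naive tokenization) computes the raw count and density of a keyword in a page's text:

    import re

    def keyword_density(text, keyword):
        # Naive word tokenization, roughly what early indexers did.
        words = re.findall(r"[a-z0-9']+", text.lower())
        if not words:
            return 0, 0.0
        count = words.count(keyword.lower())
        return count, count / len(words)

    # A stuffed page pushes both numbers far above natural prose.
    count, density = keyword_density("cheap tickets buy cheap tickets cheap", "cheap")
    print(count, round(density, 2))  # 3 0.5

A modern engine compares such frequencies against what legitimate pages exhibit, rather than rewarding the raw count.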
Hidden or invisible text
Unrelated hidden text is disguised by making it the same color as the background, using a tiny font size, or hiding it within HTML code such as "noframes" sections, alt attributes, zero-sized DIVs, and "noscript" sections. People screening websites for a search-engine
company might temporarily or permanently block an entire website for
having invisible text on some of its pages. However, hidden text is not
always spamdexing: it can also be used to enhance
accessibility.
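As an illustration of how such screening might work, here is a sketch in Python that flags inline styles commonly associated with hidden text (detecting text that merely matches the background color would also require the rendered page's colors, which this sketch does not model):

    import re

    # Inline-style declarations commonly used to hide text.
    HIDDEN_PATTERNS = [
        r"display\s*:\s*none",
        r"visibility\s*:\s*hidden",
        r"font-size\s*:\s*0",
        r"(?:width|height)\s*:\s*0",
    ]

    def flag_hidden_styles(html):
        # Return inline style attributes that match a hiding pattern.
        hits = []
        for style in re.findall(r'style\s*=\s*"([^"]*)"', html, re.IGNORECASE):
            if any(re.search(p, style, re.IGNORECASE) for p in HIDDEN_PATTERNS):
                hits.append(style)
        return hits

    page = '<div style="font-size:0">free tickets free tickets</div>'
    print(flag_hidden_styles(page))  # ['font-size:0']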
Meta-tag stuffing
This involves repeating keywords in the meta tags and using meta keywords that are unrelated to the site's content. This tactic has been ineffective since 2005.
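For illustration, a short Python check (the page snippet is made up) that counts repeated terms in a page's meta keywords tag, the kind of repetition this tactic relies on:

    import re
    from collections import Counter

    def repeated_meta_keywords(html):
        # Count terms that appear more than once in the meta keywords tag.
        m = re.search(r'<meta\s+name="keywords"\s+content="([^"]*)"',
                      html, re.IGNORECASE)
        if not m:
            return {}
        terms = [t.strip().lower() for t in m.group(1).split(",") if t.strip()]
        return {t: n for t, n in Counter(terms).items() if n > 1}

    page = '<meta name="keywords" content="loans, loans, cheap loans, loans">'
    print(repeated_meta_keywords(page))  # {'loans': 3}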
Doorway pages
"Gateway" or
doorway pages
are low-quality web pages created with very little content and instead stuffed with very similar keywords and phrases. They are designed to rank highly within the search results, but serve no purpose to visitors looking for information. A doorway page will generally have "click here to enter" on the page. In 2006, Google removed BMW's German site, BMW.de, from its index for using doorway pages.
[7]
Scraper sites
Scraper sites
are created using various programs designed to "scrape" search-engine
results pages or other sources of content and create "content" for a
website.
[citation needed]
The presentation of content on these sites may be unique, but it is merely an amalgamation of content taken from other sources, often without permission. Such websites are generally full of advertising
(such as
pay-per-click ads
[8]),
or they redirect the user to other sites. It is even feasible for
scraper sites to outrank original websites for their own information and
organization names.
Article spinning
Article spinning
involves rewriting existing articles, as opposed to merely scraping
content from other sites, to avoid penalties imposed by search engines
for
duplicate content. This process is undertaken by hired writers or automated using a
thesaurus database or a
neural network.
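A minimal sketch of thesaurus-based spinning, assuming a tiny hypothetical synonym table (real spinners use large thesaurus databases or, more recently, neural networks):

    import random

    # Hypothetical synonym table; real spinners use large thesaurus databases.
    SYNONYMS = {
        "buy": ["purchase", "acquire"],
        "cheap": ["inexpensive", "affordable"],
        "quickly": ["rapidly", "fast"],
    }

    def spin(text):
        # Replace each known word with a randomly chosen synonym.
        out = []
        for word in text.split():
            key = word.lower()
            out.append(random.choice(SYNONYMS[key]) if key in SYNONYMS else word)
        return " ".join(out)

    print(spin("buy cheap tickets quickly"))
    # e.g. "purchase affordable tickets rapidly"

Each run yields a superficially different article, which is exactly what defeats naive duplicate-content detection.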
2) Link spam
Link spam is defined as links between pages that are present for reasons other than merit.
[9] Link spam takes advantage of link-based ranking algorithms, which give a website a higher ranking the more other highly ranked websites link to it. These techniques also aim at influencing other link-based ranking techniques such as the HITS algorithm.
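To see why such algorithms reward inbound links, here is a simplified power-iteration sketch of a PageRank-style score (an idealization, not Google's production algorithm); the example graph is a small hypothetical link farm whose pages all point at a target page T:

    def pagerank(links, damping=0.85, iterations=50):
        # links: page -> list of pages it links to; all pages appear as keys.
        n = len(links)
        rank = {page: 1.0 / n for page in links}
        for _ in range(iterations):
            new = {page: (1 - damping) / n for page in links}
            for page, outlinks in links.items():
                for target in outlinks:
                    # Each page passes a share of its own rank to its targets.
                    new[target] += damping * rank[page] / len(outlinks)
            rank = new
        return rank

    # Hypothetical link farm: S1..S3 reference each other and all point at T.
    # T has no outlinks here; a full implementation would also redistribute
    # the rank of such dangling pages.
    farm = {"T": [], "S1": ["S2", "T"], "S2": ["S3", "T"], "S3": ["S1", "T"]}
    print(pagerank(farm))  # T ends with the highest score

Every extra inbound link adds to the target's score, which is the property all of the techniques below exploit.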
Link-building software
A common form of link spam is the use of link-building software to automate the
search engine optimization process.
Link farms
Link farms are tightly knit communities of pages referencing each other, also known facetiously as
mutual admiration societies.
[10]
Hidden links
Putting hyperlinks where visitors will not see them in order to increase link popularity. Because the highlighted (anchor) text of a link can help a webpage rank higher for queries matching that phrase, hidden links carry this benefit without being visible.
Sybil attack
A
Sybil attack is the forging of multiple identities for malicious intent, named after the famous
multiple personality disorder patient "Sybil" (
Shirley Ardell Mason). A spammer may create multiple web sites at different
domain names that all link to each other, such as fake blogs (known as spam blogs).
Spam blogs
Spam blogs are blogs created solely for commercial promotion and to pass link authority to target sites. These "splogs" are often designed in a misleading manner, giving the impression of a legitimate website, but on close inspection the text turns out to be generated by spinning software or to be poorly written and barely readable. They are similar in nature to link farms.
Page hijacking
Page hijacking
is achieved by creating a rogue copy of a popular website which shows
contents similar to the original to a web crawler but redirects web
surfers to unrelated or malicious websites.
Buying expired domains
Some link spammers monitor DNS records for domains that will expire
soon, then buy them when they expire and replace the pages with links to
their pages.
See Domaining. However, Google resets the link data on expired domains. Some of these techniques may be applied for creating a
Google bomb — that is, to cooperate with other users to boost the ranking of a particular page for a particular query.
Cookie stuffing
Cookie stuffing involves placing an
affiliate
tracking cookie on a website visitor's computer without their
knowledge, which will then generate revenue for the person doing the
cookie stuffing. This not only generates fraudulent affiliate sales, but
also has the potential to overwrite other affiliates' cookies,
essentially stealing their legitimately earned commissions.
Using world-writable pages
Web sites that can be edited by users can be used by spamdexers to
insert links to spam sites if the appropriate anti-spam measures are not
taken.
Automated
spambots can rapidly make the user-editable portion of a site unusable. Programmers have developed a variety of automated
spam prevention techniques to block or at least slow down spambots.
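One common lightweight countermeasure, sketched here under the assumption that comments carrying many outbound links are overwhelmingly spam (the threshold is made up; production filters combine many signals):

    import re

    MAX_LINKS = 2  # Hypothetical threshold; real filters combine many signals.

    def looks_like_link_spam(comment):
        # Flag comments carrying an unusual number of outbound links.
        links = re.findall(r"https?://\S+", comment)
        return len(links) > MAX_LINKS

    print(looks_like_link_spam(
        "great post http://a.example http://b.example http://c.example"))  # True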
Spam in blogs
Spam in blogs
is the placing or solicitation of links randomly on other sites,
placing a desired keyword into the hyperlinked text of the inbound link.
Guest books, forums, blogs, and any site that accepts visitors'
comments are particular targets and are often victims of drive-by
spamming where automated software creates nonsense posts with links that
are usually irrelevant and unwanted. Many blog platforms, such as WordPress and Blogger, make their comment sections nofollow by default due to concerns over spam.
[citation needed]
Comment spam is a form of link spam that has arisen in web pages that allow dynamic user editing such as
wikis,
blogs, and
guestbooks. It can be problematic because automated agents can be written that randomly select a user-edited web page, such as a Wikipedia article, and add spam links.
[11]
Wiki spam
Wiki spam is a form of link spam on wiki pages. The spammer uses the open editability of
wiki
systems to place links from the wiki site to the spam site. The subject
of the spam site is often unrelated to the wiki page where the link is
added. In early 2005,
Wikipedia implemented a default "
nofollow" value for the "rel" HTML attribute. Links with this attribute are ignored by Google's
PageRank algorithm. Forum and wiki administrators can use nofollow to discourage wiki spam.
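A sketch of how a wiki or blog engine might apply this: rewrite user-supplied anchor tags to carry rel="nofollow" (a regex-based illustration; real engines use proper HTML sanitizers):

    import re

    def add_nofollow(html):
        # Add rel="nofollow" to anchor tags that lack a rel attribute.
        def patch(match):
            tag = match.group(0)
            if re.search(r'\brel\s*=', tag, re.IGNORECASE):
                return tag  # This sketch leaves existing rel values alone.
            return tag[:-1] + ' rel="nofollow">'
        return re.sub(r"<a\b[^>]*>", patch, html, flags=re.IGNORECASE)

    print(add_nofollow('<a href="http://spam.example">cheap loans</a>'))
    # <a href="http://spam.example" rel="nofollow">cheap loans</a>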
Referrer log spamming
Referrer spam takes place when a spam perpetrator or facilitator accesses a
web page (the
referee), by following a link from another web page (the
referrer), so that the referee is given the address of the referrer by the person's Internet browser. Some
websites have a referrer log which shows which pages link to that site. By having a
robot
randomly access many sites enough times, with a message or specific
address given as the referrer, that message or Internet address then
appears in the referrer log of those sites that have referrer logs.
Since some
Web search engines
base the importance of sites on the number of different sites linking
to them, referrer-log spam may increase the search engine rankings of
the spammer's sites. Also, site administrators who notice the referrer
log entries in their logs may follow the link back to the spammer's
referrer page.
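The mechanism reduces to an ordinary HTTP request whose Referer header, being entirely client-controlled, names the page the spammer wants advertised. A minimal illustration with hypothetical addresses, using Python's standard library:

    from urllib.request import Request, urlopen

    # Both URLs are hypothetical. The Referer header is set by the client,
    # so a robot can claim any page as the "referring" one.
    req = Request(
        "http://victim.example/some-page",
        headers={"Referer": "http://spammer.example/"},
    )
    urlopen(req)  # victim.example's referrer log now records spammer.example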
Other types of spamdexing
Mirror websites
A mirror site hosts multiple websites with conceptually similar content under different URLs. Some search engines give a higher rank to results where the searched-for keyword appears in the URL.
URL redirection
URL redirection takes the user to another page without his or her intervention, e.g., using META refresh tags, Flash, JavaScript, Java, or server-side redirects. However, a 301 redirect, or permanent redirect, is not considered malicious behaviour.
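For illustration, a META refresh redirect is a single tag in the page head; the sketch below (a simplified regex, not a full HTML parser) extracts its destination the way an analysis tool might:

    import re

    def meta_refresh_target(html):
        # Return the destination of a META refresh redirect, if present.
        m = re.search(
            r'<meta\s+http-equiv="refresh"\s+content="\d+;\s*url=([^"]+)"',
            html, re.IGNORECASE)
        return m.group(1) if m else None

    page = '<meta http-equiv="refresh" content="0; url=http://other.example/">'
    print(meta_refresh_target(page))  # http://other.example/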
Cloaking
Cloaking refers to any of several means to serve a page to the search-engine
spider
that is different from that seen by human users. It can be an attempt
to mislead search engines regarding the content on a particular web
site. Cloaking, however, can also be used to ethically increase
accessibility of a site to users with disabilities or provide human
users with content that search engines aren't able to process or parse.
It is also used to deliver content based on a user's location; Google
itself uses
IP delivery, a form of cloaking, to deliver results. Another form of cloaking is
code swapping,
i.e., optimizing a page for top ranking and then swapping another page in its place once a top ranking is achieved.
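A minimal sketch of the cloaking mechanism, assuming the crawler can be recognized by its User-Agent string (real cloaking setups more often key on IP addresses, as with the IP delivery mentioned above); the handler and token list here are hypothetical:

    from http.server import BaseHTTPRequestHandler, HTTPServer

    CRAWLER_TOKENS = ("googlebot", "bingbot")  # hypothetical detection list

    class CloakingHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            agent = self.headers.get("User-Agent", "").lower()
            if any(token in agent for token in CRAWLER_TOKENS):
                body = b"<html>keyword-rich page served to the spider</html>"
            else:
                body = b"<html>unrelated page served to human visitors</html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(body)

    # HTTPServer(("localhost", 8000), CloakingHandler).serve_forever()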