May 6, 2008
On-Site SEO Best Practices
How to Bake Tasty Googlebot Snacks
Search engine optimization (SEO) can be divided into two categories: on-site and off-site. On-site SEO is the kind that is directly under your control through your web site files or CMS, whereas off-site optimization takes place on other web sites. This article deals primarily with the on-site portion of SEO, which is usually handled by site designers and developers. But I think it’s important to define both types in more detail for the context of this article.
Off-site SEO defined
Off-site SEO deals primarily with getting quality links to your web site. If you have a great site, then this happens naturally over time as other sites link to it. At least, that’s the way Google would prefer it. Off-site SEO can also be described as link-building, whereby you seek out and in some cases pay for quality links. The benefit you are trying to derive from such links is not to gain clicks, but to gain PageRank. Google would rather you didn’t do this, as it is intended to artificially boost your ranking beyond what happens naturally. But considering the importance of links, it’s not surprising that this is currently a common—and hard to ignore—practice. There is a large amount of grey area as to what is legitimate with off-site SEO. I usually tell my clients to seek out links that are beneficial for both PageRank and clicks, so as to minimize loss of investment should Google decide the link doesn’t conform to their guidelines.
On-site SEO defined
This article deals with the on-site portion of SEO and describes the process by which we produce a site that is optimal for indexing by search engines. On-site SEO is the type of SEO that rests primarily in the hands of the site designer and developer … it is directly related to how the site is produced. On-site SEO might better be defined as search engine accessibility. Making your site search engine accessible has a lot of crossover with making it accessible to text readers, mobile devices, and so on. On-site SEO is also a prerequisite for off-site SEO. There’s little value in doing off-site SEO if you haven’t handled the on-site portion first. If best practices aren’t followed, your site may be inaccessible to search engines (at worst), or less than optimal (at best). If done correctly, your site will be highly accessible to search engines and take full advantage of the resulting benefits, especially those gained by off-site SEO. That’s our goal with sites we produce, and below are the best practices that we try to emphasize with on-site search optimization.
1. Quality copywriting
Your text is what gets indexed by the search engine, and what ultimately makes the connection between users’ search terms and your web site. When writing copy, it’s helpful to use words and phrases that parallel the terms that users would search for, but only if it also makes sense in the context of your copy. Just remember that search engines can be pretty literal with regard to copy. Write for your audience first, but do so with an understanding of what connects you with that audience. Be specific with your terminology where possible, without being redundant.
Example
I often come across a client’s articles that use the term design school, yet are really referring to architecture school. The phrase design school is entirely too vague. If someone searches on Google for design school, they will no doubt be shown art schools, graphic design schools, and so on. Logic dictates that the person searching is going to be more specific with their terminology … you should be too. That doesn’t mean changing every instance of design school to the more specific architecture school. Rather, you should use the more specific version in headlines, or in the first instance that you need it. Alternatively, you can simply use a combination of both, where it best suits the copy.
2. Proper markup and keyword/phrase placement
Once you have quality copy, you need to make sure that it’s being utilized in a manner that’s useful in the context of markup. To put it another way, you need to make sure that you are using HTML/XHTML in the way that it’s intended. This is done by properly marking page titles, headlines, subheads, body copy, bulleted lists, and so on. Think of Google as a blind person. It has no visual point of reference. The only way for search engines to know the relative importance of the items on your page is for them to be marked as such with HTML/XHTML tags.
As you’d probably assume, text placed in headlines is going to carry more weight than regular body copy. And text placed higher up on the page is likewise going to carry a little more weight. Look at it from Google’s perspective: if you haven’t used a particular phrase in an important spot, then it’s probably not a primary theme of your page. Though, if you are dealing with terminology that’s very specific (and not competitive), then it probably doesn’t matter all that much.
Just because something looks like a headline doesn’t necessarily mean that it is one. Stylesheets and inline styles allow designers/developers to pull HTML tags visually outside of the context they represent. A tag intended for paragraphs can be made to look like a headline. The more this is done, the less useful the page becomes to anyone or anything without a visual point of reference (like search engines). So it’s important that markup is used properly. But this is easier said than done … most WYSIWYG editors I’ve come across do not produce quality markup. Instead they often rely upon inline CSS styles to make semantically generic DIV elements do the heavy lifting. If you want truly good markup, you need to hand code it.
To complete our headline example, a headline should be contained in one of the HTML heading tags: h1, h2, h3, h4, h5, h6, where h1 is most important, and h6 is least important. The h1, h2 and h3 tags serve the needs of most articles on this site.
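To see a page the way something without a visual point of reference might, it can help to list only the text that is actually marked up as a heading. The following is a minimal sketch using Python’s standard html.parser module; the class name, function name, and sample page are hypothetical, and real crawlers certainly do far more than this.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingOutline(HTMLParser):
    """Collect (level, text) pairs for every h1-h6 element in a page."""

    def __init__(self):
        super().__init__()
        self.outline = []    # finished headings, in document order
        self._level = None   # level of the heading currently being read
        self._text = []      # text fragments of that heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside a heading element.
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None


def heading_outline(html):
    """Return the headings of an HTML document as (level, text) pairs."""
    parser = HeadingOutline()
    parser.feed(html)
    parser.close()
    return parser.outline


# Hypothetical page: the second line only LOOKS like a headline.
page = """
<h1>Architecture School Rankings</h1>
<p style="font-size: 2em; font-weight: bold">How We Ranked Them</p>
<h2>Undergraduate Programs</h2>
<p>Body copy goes here.</p>
"""

for level, text in heading_outline(page):
    print("  " * (level - 1) + text)
```

The styled paragraph never shows up in the outline, because nothing in the markup says it is a headline.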
Related Tool
While it won’t tell you whether your markup is semantically meaningful to search engines, the W3C’s XHTML validation tool is very handy for identifying common mistakes and issues that could interfere with a search engine indexing your page. The validation tool will point out more issues than you necessarily need it to. Look for fatal flaws, like unclosed tags. Unencoded ampersands and other minor issues probably aren’t going to matter much to the search engines.
3. Using your <title> tags properly
Related to all the items above, the <title> tag, appearing in the <head> section of an HTML/XHTML document, is of the highest importance. The text placed in this tag is weighted highest by Google, even more so than headlines. This is also the short bit of text that appears as the link for users to click on in search results.
While the technical importance of the title tag would lead you to think this is the place to shove a lot of keywords, that would be counter to getting people to click on it. Who wants to click on a web page that looks like spam? As you can see, there are two conflicting challenges here. That is precisely why this tag is so important to Google, and why it cannot be misused or ignored. Google is understandably strict about the title tag and expects you to use it properly.
You only have a sentence’s worth of characters to work with, at most. The words used here have to accurately represent what’s on the page, but do so in a way that uses the key phrase (or phrases) users would search for. There is little value in attempting to utilize more than one or two key phrases here. Singular versus plural keywords should also be considered: for example, if the page is about a specific widget, then don’t use the plural “widgets”.
It’s common to see a company name here — carefully consider whether that is worthwhile, as it very often isn’t. Placing your company name in the title tag is akin to running a full-page magazine ad with a giant graphic of your logo. Is that good marketing? Is that a call to action? For most companies, probably not. It might make sense on your homepage, but not your inside pages. If you have to use your company name, place it at the end of the title tag, not the beginning.
Coming up with the right title tag in a competitive area is an art form. It can help to see what your competition is doing. You will likely see that the most success is obtained when the title tag describes the contents of the page, and the page’s content overlaps with the terms you’ve searched for. Simple but specific title tags seem to work best.
What I do
For a large site, it might not be feasible to make the perfect title tag for every page. My advice is to focus your efforts on the core pages in your site: the homepage and first level of subpages (assuming they are the most important for search traffic). For pages that you don’t consider as important, make your title tags consistent with your primary (h1) headline. While this may not maximize the title tag’s potential, a page’s headline is likely going to represent the page better than any other pre-existing fields of copy on your page. While individual attention to a title tag is still preferable, this is a good way to go if you need something automated.
This is what we do with our CMS: a page’s title tag mirrors the page’s headline automatically. Then the person making edits has the option of tweaking the title tag to make it better. We recommend following the same strategy with your title tags.
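The fallback described above can be sketched in a few lines. This is only an illustration of the idea, not the code from our CMS: the function name, its parameters, and the “ | ” separator before the company name are all assumptions made for the example.

```python
def page_title(headline, custom_title=None, company=None):
    """Return the text for a page's <title> tag.

    headline     -- the page's primary (h1) headline
    custom_title -- optional hand-written title that overrides the headline
    company      -- optional company name; if given, it goes at the END
    """
    # Use the hand-tuned title when the editor supplied one,
    # otherwise mirror the headline.
    custom = (custom_title or "").strip()
    title = custom or headline.strip()
    if company:
        title = f"{title} | {company}"   # separator is an arbitrary choice
    return title


# Hypothetical pages
print(page_title("How to Repair a Widget"))
print(page_title("How to Repair a Widget", custom_title="Widget Repair Guide"))
print(page_title("Home", company="Acme Widgets"))
```

Every page gets a reasonable title for free, and the pages that matter most can still be tuned by hand.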
4. Using the meta description tag, where appropriate
The meta description tag has value from a traditional marketing perspective. It should be differentiated from the meta keywords tag, which has no value (to the big search engines). The meta description is a short description, or summary, of your page. When used properly, there is a good chance that this will appear as the summary text (right below your title) on search result pages. As a result, the meta description tag can be the deciding factor in whether or not a user decides to click your link in the search results.
It should be noted that this tag has no value in ranking your page, only in communicating to the user. One or two short sentences are plenty. There’s no harm in it being too long, but Google will truncate it to fit the space. You need not place keywords here unless it makes sense to do so within the context of your summary.
There is no guarantee that your meta description tag will be used by the search engine. Google may decide to create its own summary if it doesn’t feel yours is relevant enough. Chances are yours will be used, but don’t be too disappointed if it’s not. I don’t feel that using the meta description tag should be at the top of your list of priorities, but it is certainly worthwhile to take advantage of it where appropriate.
What I do
I usually give individual attention to the meta description tag on the homepage, and occasionally the top level pages. For other pages, I either leave it out, have the CMS mirror it from a separate summary or subhead field, or have it generated automatically from the first two sentences of the body copy.
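Here is a rough sketch of that last option: use a hand-written summary when there is one, and otherwise fall back to the first two sentences of the body copy. The field names (“summary”, “body”) and function names are hypothetical, and the sentence splitting is deliberately naive (it will trip over abbreviations like “Dr.”).

```python
import re
from html import escape


def summary_from_body(body_text, max_sentences=2):
    """Return the first few sentences of plain-text body copy."""
    text = " ".join(body_text.split())             # collapse whitespace
    # Naive split: a sentence ends at . ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(sentences[:max_sentences])


def meta_description_tag(page):
    """Build the tag from a hand-written summary, else from the body copy."""
    description = page.get("summary") or summary_from_body(page["body"])
    # Escape quotes and ampersands so the attribute stays well-formed.
    return f'<meta name="description" content="{escape(description, quote=True)}" />'


# Hypothetical page record with no hand-written summary
page = {
    "summary": "",
    "body": "We repair widgets of every size. Most repairs take one day. "
            "Call us for a quote.",
}
print(meta_description_tag(page))
```

If the generated text reads poorly for an important page, that is the cue to write the summary by hand instead.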
5. Avoid duplication of text
Search engines place a lot of value on content uniqueness. Repeating large amounts of copy in your site will diminish its value, and may invite a penalty from the search engines. While repeating a sentence or paragraph of copy occasionally makes sense (like on the sidebar of this page), you want to avoid major content duplication as much as possible. More specifically, don’t have two or more pages on your site that are completely or mostly identical.
If there are instances where logistics justify repeating large amounts of copy, then you should consider breaking that copy out into a separate page and linking to it from the places where the repetition was needed. If you still can’t do that, then go ahead and repeat the text, but do so with the understanding that Google may devalue the repetitive pages.
Example
One of my clients had an About Us section on their site that contained several pages about their history and services. They wanted to repeat this content in the Recruiting section of their site. The logical thing to do here is to have the Recruiting section of the site link to the relevant pages in the About Us section of the site. But this particular client wanted to maintain a Recruiting-specific branding in that portion of the site, and wanted to keep the Recruiting audience within that. That makes sense, but how do we avoid devaluing the copy?
Since repeating the copy was purely for visual and branding benefit, this part of the site was deemed not particularly important for SEO. In order to avoid penalty to the copy in the About Us section of the site, we used META “noindex” tags to prevent search engines from indexing the portion in Recruiting. This technique is discussed in more detail further in this article.
6. Avoid duplication of URLs
This is actually an extension of the section above, but focuses in on the unintentional effect of duplicate URLs. Clients often don’t realize that all of the following are in fact different URLs to a search engine, even though they lead to the same page:
- www.company.com
- company.com
- company.com/index.html
- company.com/index.html?this=that
Likewise, all of the following are also different URLs to a search engine, even though they all go to the same About Us page:
- company.com/about/
- www.company.com/about/index.php
- www.company.com/about/
- www.company.com/about (note: no trailing slash on this one)
What’s the problem here? In this example, all of these URLs lead to exactly the same content. But because they are different URLs, the search engines may index a separate copy of each, potentially devaluing the primary one. While Google might be smart enough to realize that some of the URLs are synonymous, it very often isn’t, so you shouldn’t assume anything.
The lesson here is to decide what your primary URL is for a given page, and be consistent with it. Let’s say that we’ve determined your homepage’s proper URL is www.company.com. When you link to your homepage, link to www.company.com, not company.com, and not www.company.com/index.html. This is true of not just your homepage, but of all pages in your site. Be consistent!
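One way to stay consistent is to run every internal link through a single function before it is written into a page. The sketch below is an illustration only, built on some assumptions that won’t hold for every site: the “www” hostname is primary, directory URLs end in a slash, index.html and index.php are the default documents, and query strings don’t change the content (so they are dropped). The names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

PRIMARY_HOST = "www.company.com"            # hostname chosen as primary
BARE_HOST = "company.com"                   # variant that should map to it
INDEX_FILES = ("index.html", "index.php")   # default documents to strip


def canonical_url(url):
    """Map the variants of a URL onto one primary form."""
    if "://" not in url:
        url = "http://" + url               # allow bare "company.com/about"
    scheme, host, path, _query, _fragment = urlsplit(url)

    host = host.lower()
    if host == BARE_HOST:
        host = PRIMARY_HOST

    head, _, last = path.rpartition("/")
    if last in INDEX_FILES:
        path = head + "/"                   # /about/index.php -> /about/
    elif "." not in last and not path.endswith("/"):
        path = path + "/"                   # /about -> /about/

    # Query string and fragment are dropped (an assumption, see above).
    return urlunsplit((scheme, host, path, "", ""))


variants = [
    "company.com/about/",
    "www.company.com/about/index.php",
    "www.company.com/about/",
    "www.company.com/about",
]
for variant in variants:
    print(variant, "->", canonical_url(variant))
```

All four About Us variants from the list above come out as http://www.company.com/about/, so that is the only form your own pages ever link to.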
But what about other people linking to you? You don’t have control over what those people link to. But you can choose to set up redirects that correct any improper URLs and retain valuable PageRank. While a full treatment of redirects is beyond the scope of this article, we’ll cover one of the most common and important ones: redirecting any non-“www” requests to the “www” hostname. This can be done by adding the following Apache directives to the .htaccess file in your web root:
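RewriteEngine On
# Match requests for the bare hostname only (dots escaped, case-insensitive)
RewriteCond %{HTTP_HOST} ^company\.com$ [NC]
# Send them to the same path on the www hostname with a permanent redirect
RewriteRule ^(.*)$ http://www.company.com/$1 [R=301,L]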
Of course, replace company.com with your domain name. This forces any requests to company.com to be redirected to www.company.com. You only have to set up this one rule, as it covers all the URLs in your site. If you get an “internal server error”, then your web host may not have Apache’s mod_rewrite installed. Have them install it, or find a decent web host (most already have it).
Note that the type of redirect you use is just as important as using it at all. Make sure that you are using a 301 permanent redirect (as in the example above). There are other types of redirects, such as the 302 temporary redirect, meta redirect, and others which may penalize you rather than help you. Don’t use anything but a permanent redirect unless you have a good reason to do so.
7. Tell search engines what NOT to index
There are some pages you simply don’t want indexed, either because you don’t want search engines sending traffic to them, or because you are trying to tell search engines that a particular page is not useful for indexing. (See the example mentioned in the “avoid duplication of text” section above).
You can instruct the search engine to exclude pages by placing a robots.txt file in the root directory of your web site. This file lists the URL paths that search engine crawlers should stay out of. The file should be consistent with the Robots Exclusion Standard.
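It’s easy to get a robots.txt rule slightly wrong, so it can be worth checking your rules against a few real URLs before relying on them. The sketch below uses Python’s standard urllib.robotparser module; the robots.txt contents and the URLs are hypothetical, and Python’s parser may not interpret every rule exactly the way a given search engine does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all crawlers out of one duplicated section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /recruiting/about/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url, user_agent="Googlebot"):
    """Return True if the rules above allow this crawler to fetch the URL."""
    return parser.can_fetch(user_agent, url)


for url in (
    "http://www.company.com/about/history.html",
    "http://www.company.com/recruiting/about/history.html",
):
    print(url, "->", "allowed" if is_crawlable(url) else "blocked")
```

The first URL is allowed and the second is blocked, because only paths that start with /recruiting/about/ match the Disallow rule.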
Another way to exclude pages from indexing is to use the meta robots tag. This tag appears in the <head> section of any documents that you don’t want to be indexed:
<meta name="robots" content="noindex" />
If you also don’t want search engines to follow any links on this page, then add in the “nofollow” directive:
<meta name="robots" content="noindex,nofollow" />
8. Make a good Internal Link Structure
Internal links are the lifeline to pages in your site. Google will assume that if a page in your site has a lot of links to it, then it must be important. Likewise, if a particular page in your site has only one or two links to it, then it’s less important. Whether on-site or off-site, Google’s PageRank looks at links to pages and views them as votes. Pages with more votes are treated as more important.
But it’s not just about quantity. Google goes further and looks at what those votes are saying. If those votes [links] are saying “Belgian beers”, then that page is viewed as having more relevance for Belgian beers than if the link had said “beers” or “click here.” As with title and headline tags, you’ll get the most mileage by making quality links. That means being accurate and specific with the text placed in your links. Keywords and phrases carry weight here, so think about what your votes are saying.
It goes without saying that your links need to be accessible to the search engine crawler. If your links are contained in Flash or JavaScript, then the crawler probably won’t find them. If you must use Flash or JavaScript-generated links, then have an alternate set of plain text links so that they can still be seen by visually impaired people and search engines.
If your links are composed of clickable images rather than text, make use of the image alt attribute (aka alternate text). Search engines can’t read the contents of your image, and you aren’t saying anything with it unless you provide alternate text. An image link with alternate text is roughly equivalent to a text link.
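Image links with no alternate text are easy to miss by eye, since the page looks fine in a browser. As a rough sketch, the following uses Python’s standard html.parser module to list the links whose image has a missing or empty alt attribute. The class name, function name, and sample markup are hypothetical, and it ignores links that contain both an image and text.

```python
from html.parser import HTMLParser


class ImageLinkAudit(HTMLParser):
    """List the href of every link containing an image without alt text."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []
        self._href = None    # href of the <a> currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._href = attrs.get("href")
        elif tag == "img" and self._href is not None:
            # A missing, valueless, or blank alt all count as "no text".
            if not (attrs.get("alt") or "").strip():
                self.missing_alt.append(self._href)

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None


def links_missing_alt(html):
    """Return hrefs of image links that say nothing to a search engine."""
    audit = ImageLinkAudit()
    audit.feed(html)
    audit.close()
    return audit.missing_alt


# Hypothetical navigation built from images
nav = (
    '<a href="/about/"><img src="about.gif" alt="About our firm" /></a>'
    '<a href="/contact/"><img src="contact.gif" /></a>'
    '<a href="/news/"><img src="news.gif" alt="" /></a>'
)
print(links_missing_alt(nav))
```

Here the Contact and News links are reported, while the About link passes because its image carries alternate text.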
What I do
Most sites that I produce have a primary (top) navigation, secondary (sub) navigation, a breadcrumb trail, and footer links. Primary navigation of course contains the main sections of the site. Secondary navigation contains links to pages within a specific section, and is ideally hierarchical … a tree of links that grows as you drill down further. The breadcrumb trail contains links that produce a path back to the homepage, parent by parent. The footer links are where we highlight other important pages that may not have been appropriate for the top navigation, but are helpful shortcuts. Included in these footer links is a link to the site map. Each of these sets of links is contained in either an unordered <ul> or ordered <ol> XHTML list. I never produce JavaScript-based or Flash-based navigation unless the client has a solid understanding of the compromises.
On most sites, I also create a site map … rather, have the CMS maintain an automated site map. This site map contains a hierarchical list of all pages in the site and ensures that there is a link to every page. For a larger site, this can get unwieldy to look at, in which case it might be worth splitting some of the hierarchy into multiple site maps. A site map serves both the people browsing your site and search engines. It ensures that there is at least one more link to every page on your site, and keeps them in one place, which is a very tasty snack for Googlebot. It’s also worthwhile to set up an XML site map, especially for use with Google’s webmaster tools (more below).
While this is not the whole of the link structure that I use, it does comprise the core of what I try to include in every client site. Such a system is of course common, but it is a very solid basis for a highly accessible and indexable web site.
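For the XML site map mentioned above, the basic file is simple enough to generate from a list of URLs. This is a minimal sketch, assuming the sitemaps.org format with only the required <loc> element for each page; the function name and example URLs are hypothetical, and a real CMS would pull the URLs from its page database.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return an XML site map document listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url   # special characters are escaped
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical list of primary URLs, one per page
pages = [
    "http://www.company.com/",
    "http://www.company.com/about/",
    "http://www.company.com/about/history/",
]
print(build_sitemap(pages))
```

Feed it the same primary URLs you link to internally, so the site map reinforces one URL per page rather than introducing new variants.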
9. Use Google’s Webmaster Tools
Google has created several Webmaster Tools that are especially helpful with on-site optimization. These tools give you a direct connection with Google’s crawler and give you an insider’s point of view.
Especially helpful are the diagnostics tools. These help you to find any errors that Googlebot had with your site, ensure that any URL restrictions are working, find common issues with title and meta description tags, and more. Other tools show you how Google views your site with regards to keywords and phrases, how often Google crawls your site, what search queries are bringing people to your site, and much more.
It’s beyond the scope of this article to go in depth about these tools, but there is a wealth of information available to you with Google’s webmaster tools, so use them.
Conclusion
In this article we’ve covered the primary ways in which you can make a web site optimal for search engine indexing and accessibility (on-site SEO), along with a few resources and tips. Every client’s needs are different, so this list should be seen as a starting point rather than a comprehensive guide. Please contact me if you found this helpful, or if you have anything to add or subtract. I’d enjoy hearing from you.
—Ryan Cramer
© 2008 by Ryan Cramer Design, LLC