What Search Engines Like in a Web Page

This section of our Search Engine Reference considers what information a Search Engine Robot looks for when it analyses a web page.

Page Description

Each page can have a Description field in its hidden 'head' area. On its own it will not help a web page to get listed, and if you use keywords inconsistent with the rest of the page it will have a negative effect on your page's placement. Some engines still use the description field as a contribution to the page relevance score. Many web designers duplicate information from the keywords tag in this field.

META Keywords

The keywords field is intended for robots (programs that scan the Internet automatically) to scan and use. Unfortunately, because of abuse in recent years, it is now widely ignored or given a very low weighting.

Page Title

HTML allows you to control what is displayed in the Title of the web browser when a page is being viewed. Many web sites still make the mistake of using the same title for all pages or putting a promotional motto into it. This field is scanned by many engines and should reflect the content of the web page. The principal keyword for the page should be contained in the title. Another mistake commonly made is to put the company or product name in the title.
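The two title mistakes described here (one title shared by every page, and a title that omits the page's principal keyword) are easy to check for mechanically. The following is a minimal Python sketch, not a description of how any engine works. The function name audit_titles, the page names and the sample titles are all made up for illustration.

```python
from collections import defaultdict


def audit_titles(titles_by_page, keyword_by_page):
    """Return (duplicates, missing_keyword) for a set of pages.

    titles_by_page:  page name -> title text
    keyword_by_page: page name -> principal keyword chosen for that page
    """
    pages_by_title = defaultdict(list)
    for page, title in titles_by_page.items():
        # Compare titles case-insensitively and ignore surrounding spaces.
        pages_by_title[title.strip().lower()].append(page)

    # Titles that are shared by more than one page.
    duplicates = {
        title: sorted(pages)
        for title, pages in pages_by_title.items()
        if len(pages) > 1
    }

    # Pages that have a keyword listed but do not mention it in the title.
    missing_keyword = sorted(
        page
        for page, title in titles_by_page.items()
        if page in keyword_by_page
        and keyword_by_page[page].lower() not in title.lower()
    )
    return duplicates, missing_keyword


# Hypothetical site: two pages share a promotional title.
titles = {
    "index.htm": "Welcome to Acme",
    "contact.htm": "Welcome to Acme",
    "arctic.htm": "Climate change in the Arctic",
}
keywords = {
    "index.htm": "weather data",
    "contact.htm": "contact",
    "arctic.htm": "climate change",
}

duplicates, missing_keyword = audit_titles(titles, keywords)
print(duplicates)
print(missing_keyword)
```

In this made-up site only arctic.htm passes both checks; the other two pages share a title that says nothing about their content.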

Page Text

Ultimately, users go to a web site to read some of the text on its pages. Originally engines ignored this text; they just read the header part and took the main 'body' part on trust. Now the text is quite rightly perceived as the most important part of the page and is scanned for keywords. Once again, it is not a good idea to plaster the same keyword all over the text, as engines will spot this and downgrade the rating of the page. They are looking for a relatively low density of repeated keywords, as this indicates that the page gives lots of additional information relating to a keyword. If effort is to be expended on improving a site's position, adding relevant text is the best choice.
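Keyword density can be made concrete with a little arithmetic: count how often the keyword appears and divide by the total number of words. The Python sketch below assumes a single-word keyword and a very simple definition of a "word". The function name and both sample texts are invented for illustration, and no particular density threshold is implied, since engines do not publish one.

```python
import re


def keyword_density(text, keyword):
    """Fraction of the words in `text` that equal `keyword` (a single word)."""
    # Treat runs of letters, digits and apostrophes as words.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


# Two invented texts: one informative, one stuffed with the keyword.
informative = "Arctic sea ice is shrinking each summer as the region warms."
stuffed = "Arctic arctic arctic ice arctic"

print(round(keyword_density(informative, "arctic"), 2))
print(round(keyword_density(stuffed, "arctic"), 2))
```

The stuffed text scores far higher than the informative one, which is the kind of difference an engine could use to spot keyword plastering.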

Image tags

Engines don't currently use clever algorithms to analyse the graphic images within a page, so even though a human reader may get useful context from the pictures, a robot (such as a search engine's) won't. However, each HTML IMG tag allows you to specify alternative text (ALT), which can be read and used. The ALT text should therefore contain relevant keywords for the image and page, and should reflect the intended impression of the image as if it were a text heading. ALT was originally intended for browsers that didn't display graphics.
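To see which images on a page give a robot nothing to read, the ALT attributes can be collected with Python's standard html.parser module. This is only a sketch; the class name, function name, file names and sample markup are assumptions made for the example.

```python
from html.parser import HTMLParser


class ImageAltParser(HTMLParser):
    """Records the SRC and ALT attributes of every IMG tag."""

    def __init__(self):
        super().__init__()
        self.images = []  # list of (src, alt) pairs

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lower case.
        if tag == "img":
            attrs = dict(attrs)
            self.images.append((attrs.get("src") or "", attrs.get("alt") or ""))


def images_without_alt(html):
    """Return the SRC of each image whose ALT text is missing or blank."""
    parser = ImageAltParser()
    parser.feed(html)
    parser.close()
    return [src for src, alt in parser.images if not alt.strip()]


# Invented markup: the second image has no ALT text.
sample = (
    '<p><IMG SRC="seaice.jpg" ALT="Arctic sea ice in summer">'
    '<IMG SRC="spacer.gif"></p>'
)
print(images_without_alt(sample))
```

Any image this reports is invisible to a robot that relies on ALT text.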

Links to other sites

If you have researched the web you will know that one factor that distinguishes 'useful' from 'amateur' sites is that a good site has links to relevant sites. Sites that are under construction, or that are just a scan of printed material, will not have links to external pages. Engines can use the presence of links to add to the weighting for a page, and can also take account of the ranking and relevance of the sites being linked to. One way this has been abused is for the same person to set up a whole string of web sites that link to each other; the sites have very similar content and just serve to promote each other. As hosting and domain names are now relatively cheap, this is not an expensive 'trick' to set up. However, engines are now wise enough to detect this, partly by noting that all the sites are hosted on the same server. It is now very difficult to get quality links to a web site, so search engines can't rely on links as the only ranking measure.
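As a rough illustration of how a robot might separate links to other sites from links within the same site, the Python sketch below resolves each link against the page's own address and compares host names. It uses only the standard library. The function name, the example.com and example.org addresses and the sample markup are assumptions, and real engines will apply far more elaborate rules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the HREF of every A tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def external_links(html, page_url):
    """Return absolute http(s) links that point at a different host."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    own_host = urlparse(page_url).hostname
    found = []
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.hostname != own_host:
            found.append(absolute)
    return found


# Invented page with one internal, one external and one mail link.
page_url = "http://www.example.com/arctic/index.htm"
sample = (
    '<a href="seaice.htm">Sea ice</a> '
    '<a href="http://www.example.org/data/">Data</a> '
    '<a href="mailto:info@example.com">Mail</a>'
)
print(external_links(sample, page_url))
```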

Link farms

The importance of links became a well-known attribute of engines and, of course, it is now widely abused. People set up sites that are just catalogues of links to achieve a higher ranking. If an engine detects that a 'link farm' is in operation, the ranking may well suffer. Many of these web sites are fully automated and build up a set of links around a keyword. They work by taking search engine results and placing them in pages; they then contact the webmasters of the listed sites to request a reciprocal link. It's not really clear whether a link farm is ever going to make real money from this trick, as these sites are just not useful.

Headings

The look of the default HTML headings H1, H2, H3, ... is decidedly dull, but with Cascading Style Sheets it's perfectly possible to make them fit in with the page style.

Search engines will pick out the headings and treat the text within them as significant, increasing the score for a page depending on the keywords they find. This won't make a great deal of difference in itself, but combined with other techniques it may well boost the ranking of a site.

Bold Text

Traditionally if you want to emphasise something you put it in bold in books or newspapers. On web pages the <B> (Bold) and <STRONG> tags allow a web page author to emphasise particular words and phrases. A search engine can then take more account of these words and phrases when it scans the page.
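The idea that keywords in headings and bold text count for more than keywords in ordinary text can be sketched as a weighted count. The weights in the Python example below are entirely hypothetical and chosen only to make the arithmetic visible; engines do not publish theirs. The class name, function name and sample markup are likewise invented.

```python
from html.parser import HTMLParser

# Hypothetical weights for illustration only; plain text counts as 1.
TAG_WEIGHTS = {"h1": 5, "h2": 4, "h3": 3, "b": 2, "strong": 2}


class EmphasisScorer(HTMLParser):
    """Adds up keyword hits, weighted by the enclosing heading or bold tag."""

    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword.lower()
        self.open_tags = []  # weighted tags that are currently open
        self.score = 0

    def handle_starttag(self, tag, attrs):
        if tag in TAG_WEIGHTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the most recently opened tag of this kind, if any.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i]
                break

    def handle_data(self, data):
        # Simple substring match, so "arctic" would also match "antarctic".
        hits = data.lower().count(self.keyword)
        if hits:
            weight = max((TAG_WEIGHTS[t] for t in self.open_tags), default=1)
            self.score += hits * weight


def emphasis_score(html, keyword):
    scorer = EmphasisScorer(keyword)
    scorer.feed(html)
    scorer.close()
    return scorer.score


sample = (
    "<h1>Arctic sea ice</h1>"
    "<p>The <b>Arctic</b> is warming. Arctic summers are longer.</p>"
)
print(emphasis_score(sample, "arctic"))
```

With these made-up weights the heading hit contributes 5, the bold hit 2 and the plain-text hit 1.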

Page Name

It may seem an insignificant detail, but the actual page name does matter. It allows the search engine to distinguish automatically generated pages such as news90238.htm from manually chosen ones like climatechangearctic.htm. So, if possible, choose a page name that reflects the title or principal keyword of the page. It's also likely that if you put pages in different folders, the name of the folder will be significant too.
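If page names are produced by a program, they can still be built from the title rather than from a serial number. The short Python sketch below shows one way to do this; the function name and its separator and extension parameters are assumptions for the example, not part of any particular system.

```python
import re


def page_name_from_title(title, separator="", extension=".htm"):
    """Build a descriptive page name from the words of a title."""
    # Keep only letters and digits, in lower case, and join the words.
    words = re.findall(r"[a-z0-9]+", title.lower())
    return separator.join(words) + extension


print(page_name_from_title("Climate Change: Arctic"))
print(page_name_from_title("Climate Change: Arctic", separator="-"))
```

The first call yields a name in the same style as climatechangearctic.htm; the second shows a hyphenated variant.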

Static Pages

With the widespread use of Content Management Systems, the generation of web pages is mainly automated. The information is not stored in manually edited HTML pages; it comes from a large database. This is the only way that a news organization, for example, can hope to keep a site consistent and up-to-date. The trouble is that, if you're not careful, these dynamic pages may not be persistent, so the search engines have a certain reluctance to include them in their index. You will tend to get a higher position for a static HTML page than for one autogenerated with PHP, ASP or whatever.

Meta tags

The <HEAD> section of an HTML page can contain a wide variety of information, including Author, Language and Response header settings. The META tag allows custom fields to be added that have a meaning specific to a web site. For the purposes of search engines, 'keywords' and 'description' are the only ones of significance.

Owing to the abuse of these directives by some web page authors, these fields may be ignored. Curiously, some products are still being sold on the basis that working out the appropriate keywords for a particular page will guarantee a good search engine placement. Getting a relevant set of keywords may have some effect on the ranking of a web page, but probably not much, and from experience it is usually better to do this manually, as no program can work out the main message to be gleaned from a web page.
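To make the head fields concrete, the Python sketch below reads the title and the META description and keywords fields from a page using the standard html.parser module. It illustrates what a robot could extract, not how any particular engine does it. The class name, function name and sample page are invented for the example.

```python
from html.parser import HTMLParser


class HeadFieldParser(HTMLParser):
    """Collects the TITLE text and the name/content pairs of META tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lower case.
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def read_head_fields(html):
    """Return the title, description and keywords found in `html`."""
    parser = HeadFieldParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "description": parser.meta.get("description", ""),
        "keywords": parser.meta.get("keywords", ""),
    }


# Invented sample page.
SAMPLE_PAGE = """<html><head>
<title>Climate change in the Arctic</title>
<meta name="description" content="How climate change is affecting Arctic sea ice.">
<meta name="keywords" content="climate change, arctic, sea ice">
</head><body><p>Body text goes here.</p></body></html>"""

fields = read_head_fields(SAMPLE_PAGE)
print(fields)
```

A page with no such tags simply yields empty strings for all three fields.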

For the technical background on META tags see the HTML 4.0 Specification. If you would like an excellent layman's guide to the Internet, please try How Stuff Works. See also Site Design Tips.

Our SiteVigil product lets you check your site's position on all major search engines for lots of keywords. It will also let you analyze the web sites of competitors who are achieving a higher search engine ranking.