Web Research

Much of the following will be familiar to many experienced web-surfers; it is aimed, particularly, at those who are just beginning to explore the richness of the Web.

It should also be said that things have changed since this piece was first written. I intend to update it shortly (to cover topics such as the "Deep" Web). In the meantime, there's still a lot of useful stuff here.

When you need a particular piece of information, it doesn't matter if you can't come up with an answer off the top of your head. What matters is: can you find out?

There are upwards of a million sites on the Web, each holding an average of three hundred pages - and thousands more pages are being added every day (this statistic is already out of date). Much of the content is trivial nonsense, but there are increasing numbers of gems.

The Web contains huge resources of information - much of it dealing in specialised areas which are often quite difficult to research, using conventional means.

The problem is, how do we sort the wheat from the chaff?

The starting point is usually a WWW search.

There are half a dozen organisations that seek out all the pages available on the Web, and compile huge indexes of their content. You can then search these indexes for subjects of interest.

The box below gives direct access to one of these indexes - the one managed by Dmoz.org. This is a non-profit directory/index. Many other search engines use the Dmoz index.

In general terms, it is probably best to try out each of these sites, find a couple you feel comfortable with, and stick to them. You will have a better chance of learning their syntax, and of understanding their results.

I'm not going to go into detail about search methods - they will depend on the search engine you use and on the kind of subject you are researching.

There are gaps in these indexes. Some sites specifically request no indexing; others are so new that the indexers haven't reached them yet. You will also find that some of the sites offered by the search engines no longer exist - it's a lot harder to get unindexed than to get indexed.

Refining your search

You may find that the search engine responds to a simple search with a huge number of "hits" - far too many to be useful. For example, searching on the word "Shell", in the hope of finding some data on the oil company, returns 129,454 pages to choose from.

Each of the search engines offers various ways of narrowing down these results. Unfortunately, there is no standard method which works for them all. If you need to make your searching more precise, each of these Search pages has a "Help" or "Tips" page explaining the rules. Some, like Google, allow you to search the previous set of results only, using some additional search criterion - so you can start with a broad search pattern, and progressively narrow down the results until they satisfy your requirements.
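The idea of narrowing a broad search step by step can be sketched in a few lines of Python. This is only an illustration of the principle, not how any real search engine works: the result titles and the narrow function are made up for the example.

```python
def narrow(results, term):
    """Keep only the results whose text contains the term (ignoring case)."""
    term = term.lower()
    return [r for r in results if term in r.lower()]


# Hypothetical page titles returned by a broad search on "Shell".
results = [
    "Shell oil company annual report",
    "Sea shell collecting for beginners",
    "Unix shell scripting tutorial",
    "Shell petrol station locations",
]

step1 = narrow(results, "shell")   # the broad search: everything matches
step2 = narrow(step1, "oil")       # search within the previous results only
print(step2)
```

Each extra term is applied to the previous set of results, so the list can only get shorter as you go.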

Marking your place

When you come across a site which seems useful - either because of the information it contains, or because it contains lists of other useful sites - you can "mark your place" by selecting Favorites, Add to Favorites (on Internet Explorer) or Bookmarks, Add Bookmark (on Netscape). This will store the full address of the current page, so you can return there at any time.

Evaluating sources

Almost anyone can put up a web site, and it may contain almost anything. Normal laws of misrepresentation, libel and decency still apply, but they are not easily enforced in the electronic domain.

But it has never been a crime to be misguided, misinformed, mischievous - or just plain wrong.

So, how can you assess the value of the information being offered to you - particularly if you intend to rely on it for an important decision?

Provenance

What do you know of the individual or organisation displaying this information? Do they have a reputation to lose? If you know nothing of either, you must treat any information given sceptically.

Don't be fooled into thinking the contents of a university site (which has .edu. or .ac. in its address) must be reliable - some of the trashiest sites on the Web have been thrown together on free university servers, by bored college students.

If, however, the author of the information is named, and references to his/her credentials are available, that can be an indication of reliability.

If the information is offered by an organisation or company, don't be fooled by an impressive-sounding name - these are easy to come by. If you've never heard of them before, look around to see if anyone else quotes them (do a Search on the company/organisation name, and see what other people think of them).

What's in it for them?

You should consider whether an individual or organisation has an axe to grind. Many web sites are published by lobbying organisations and single-issue interest groups. Many of these are very professionally produced, and some go to great lengths to give the impression of academic rectitude - even citing sources for their statistics. It is only when you follow up these citations that the selective nature of the quotations comes to light.

So, question any site which is arguing a cause. There may be useful information there, but you'll need to research alternative views to be sure that facts haven't been "massaged" (or even just made up).

How recent?

Information has a value; those who possess it seldom give it away for free. However, many information brokers do give away old information for free - so potential buyers of new information can see the kind of material on offer.

In many cases, it won't matter if information is a little out of date - as long as you know that is the case. Try to find out when the information you are viewing was last updated. If the site itself does not make this clear, most sites contain an Email address for those responsible for the site. Ask them.

Other tips

It wouldn't be wise to let appearances influence us unduly, but a certain degree of quality control should be expected from a reputable site. If a page is riddled with bad spelling, bad grammar or bad logic, the chances are it will also contain bad information.

If you are doubtful, check some of the links to other sites. If more than a couple of these links turn out to be dead, then the original page probably hasn't been updated for a long time.
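If a page has a lot of links, checking them by hand gets tedious. Here is a minimal sketch of how the check might be automated in Python, using only the standard library. The function names (extract_links, is_alive, dead_links) are my own, and the test for "dead" is an assumption: a link counts as dead if the request fails or the server reports an error. Some servers refuse this kind of request, so treat the result as a hint rather than a verdict.

```python
from html.parser import HTMLParser
import urllib.error
import urllib.request


class LinkCollector(HTMLParser):
    """Collects the absolute http/https links found in <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and value.startswith(("http://", "https://")):
                    self.links.append(value)


def extract_links(html):
    """Return the absolute links in a page, in the order they appear."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def is_alive(url, timeout=10):
    """Assumed test: the link is alive if a HEAD request succeeds."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        return False


def dead_links(html, check=is_alive):
    """Return the links in the page for which the check fails."""
    return [url for url in extract_links(html) if not check(url)]
```

To use it, save the page source to a file, read it into a string, and pass that string to dead_links. If more than a couple of addresses come back, the page probably hasn't been maintained for a while.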

If possible, check what the site says about a subject you know about. If it gets that wrong, then you can assume that the same standards apply to other subjects.

If you see the same "fact" on several different sites, you may be tempted to believe that its credibility is assured. However, you shouldn't forget how easy it is to copy an electronic document, or just a scrap of text - many, many times.

Several sites may have arrived at the same conclusion by various means, but it is just as likely that a "factoid" has simply been reproduced indiscriminately.

Keeping the information

Having found the information you want, you will probably want to keep it in a form you can use in the future - perhaps to share it with others.

Not surprisingly, the best way to do that depends on how it is presented to you.

Copyright

Just because information is freely available on the Web does not mean that it is free of copyright. All intellectual property retains its status, no matter what the format.

Even if the web-page explicitly states that there is no charge for this information, you still can't pass it off as your own.

Treat the re-use of this data as you would treat photocopying; a few copies to show to colleagues isn't going to trouble the lawyers, but splashing it across the company Intranet might.

However, if you assemble your own document, using data from a number of sources, then there can be little cause for worry.

Downloading (and using) files

Some sites will give you the opportunity to download a file containing the material you want. In such cases, it is simply a matter of following the instructions on screen.

Your browser may warn you about the dangers of downloading unknown material. This is good advice, but common sense should prevail - no text file can cause you any damage. You should be wary of any programs or Office documents; scan them for viruses before opening them.

Zip

Many downloadable files are in a "ZIP" format. This is a compression technique which squashes files into smaller packages.

You will need an "unzipper" (like Winzip or Pkzip for Windows) to access the files within.
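If you happen to have Python installed, its standard zipfile module can also do the job of an unzipper. The sketch below is one way of doing it, not the only one; the function name list_and_extract and the file names in the usage note are made up for the example. As with any download, only unpack archives from sources you trust.

```python
import zipfile


def list_and_extract(zip_path, target_dir):
    """Unpack every file in the archive into target_dir and return their names."""
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()       # what is inside the package
        archive.extractall(target_dir)   # unpack it all into the target folder
    return names
```

For example, list_and_extract("report.zip", "unpacked") would unpack a (hypothetical) report.zip into a folder called unpacked and tell you which files it contained.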

Graphics

A web site may display a useful graph or map which you wish to store for future use. In Internet Explorer, right-click on the graphic and select Save Picture As… and choose where you want the file to be stored.

In Netscape, the command is Save Image As...

Txt, Doc, Xls, Pdf

Some sites store documents as text files, Word documents, Excel spreadsheets or Acrobat PDF files.

When you click on a link to these documents, they simply open within your browser (in fact, the Word, Excel or Acrobat programs take over your browser, and open the file there). Of course, this will only work if you have the appropriate software on your computer. If you haven't, the site offering the material will usually include a link to a location which will allow you to download the appropriate "plugin" for your browser.

However, viewing a file in this way will not save that file to your computer; when you leave that site, the file is gone. Using the Save As command will only save the web page, not the document it contains.

So, go back a step. Use the browser's Back button to return to the page you clicked to open the document. Right-click on the link you used; you will see a menu. On this, select Save Target As… (Internet Explorer) or Save Link As... (Netscape) and choose where you want the file to be stored.

On screen - text and tables

In most cases, a web page will present information as simple text, or in rows and columns (tables).

These may be "static" sites, where the page is written once, to be viewed many times. On other ("active") sites, information is presented to you on demand - it is actually assembled, there and then, to fulfil your request.

In either case, there is no file to download; what you see is what you've got. There are two ways of dealing with this:-

1. Use Alt-PrintScreen (one of the grey keys, top-right on your keyboard). This will copy the contents of the browser screen to the Windows clipboard. You can then open (say) a Word document and Paste the screen into that document.

This has the advantage of being fast and accurate, but has three disadvantages:-
(a) only the contents of the window will be copied - if the table scrolls off the bottom, the captured screen will miss some of it.
(b) the Word document will now contain a picture of the page - you won't be able to edit it.
(c) text may become difficult to read - especially if you have to re-size the picture to fit the Word page.

Use this method if you only want a reminder of a piece of information - it won't be much use for showing to other people.

2. Sweep the mouse over the text (or table) you want, holding the left button down as you go. The text will be highlighted in the browser. Press Ctrl-C (to copy the text to the clipboard). Switch to a Word document, and press Ctrl-V, which will Paste the contents of the clipboard into the document.

This method has the disadvantage that all the formatting will be lost - you will have to re-create it in Word.

The advantage is that you will be able to edit the result - perhaps to extract a particular range of data - and insert it into tables of data gathered from other sources.
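Text copied from a table can often be turned back into rows and columns with very little effort. The sketch below assumes that the pasted text has one table row per line, with the columns separated by tab characters. That is common, but it depends on the browser and the page, so check what you actually get. The function name and the sample figures are invented for the example.

```python
import csv
import io


def parse_pasted_table(text):
    """Split tab-separated clipboard text into a list of rows, skipping blank lines."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [
        [cell.strip() for cell in row]
        for row in reader
        if any(cell.strip() for cell in row)
    ]


# Made-up example of text pasted from a two-column table.
pasted = "Year\tOutput\n1998\t120\n\n1999\t135\n"
rows = parse_pasted_table(pasted)
for row in rows:
    print(row)
```

Once the data is in rows like this, it is easy to pick out a particular range, or to combine it with figures gathered from other sources.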

Citation

If you assemble a report, using various Web resources, it is good sense (and good manners) to cite the sources used. If you have found the information on the Web, you should include the address of the page where you found it (copy it from the Address field of your browser), as well as the author or organisation that compiled it.

Doing this will add to the credibility of your work, and will make it easier for you (or others) to revise it later.
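If you collect a lot of sources, it helps to record them in a consistent form. The sketch below shows one possible layout, not an official citation style. The page title and the date you visited the page are my own additions to the address and author mentioned above; they are worth noting because web pages change. All the values in the example are invented.

```python
from datetime import date


def format_citation(author, title, url, accessed):
    """Build a one-line citation: who compiled it, what it is, where and when you found it."""
    return f'{author}. "{title}". {url} (accessed {accessed.isoformat()}).'


# Invented example values.
print(format_citation(
    author="Example Organisation",
    title="Sample Statistics Page",
    url="http://example.com/stats",
    accessed=date(2001, 5, 4),
))
```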
