Signpost is a complex system comprising several integrated software components. Each component has
a single overall function, such as retrieving web page data, querying the search resources, or (in the
case of the main submission engine) submitting to each resource's forms. Each component bases its actions
on data retrieved from Signpost's database and in turn updates the database, thereby modifying the actions
of the other components.
Here are the main stages involved in the submission process:
- Spidering the web page
One of the first things Signpost does is spider the web page at the submitted
URL. There are several reasons for this. First, Signpost needs to determine whether the page
has been updated since the previous spidering. It also needs to confirm that the page is still valid
and returns no HTTP errors. Finally, because Signpost is also a search engine, it needs to index
all the information it retrieves from the page.
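
The two checks described above (is the page still valid, and has it changed since the last run) can be
sketched as follows. This is only an illustration: the text does not say how Signpost detects updates, so
comparing a SHA-256 fingerprint of the page body is an assumption, and the names `SpiderResult` and
`check_page` are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpiderResult:
    valid: bool             # False when the server answered with an HTTP error
    changed: bool           # True when the content differs from the previous spidering
    digest: Optional[str]   # fingerprint to store for the next run


def check_page(status: int, body: bytes, previous_digest: Optional[str]) -> SpiderResult:
    """Classify one fetch result (assumed approach: content fingerprinting)."""
    if status >= 400:
        # HTTP error: keep the old fingerprint and report the page as invalid.
        return SpiderResult(valid=False, changed=False, digest=previous_digest)
    digest = hashlib.sha256(body).hexdigest()
    return SpiderResult(valid=True, changed=digest != previous_digest, digest=digest)
```

Passing the status code and body in as arguments keeps the check separate from the actual HTTP fetch, so
it can be exercised without network access.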
- Searching for the URL
If this is the initial submission run for a site, Signpost's search component will
search for it at the target resource. If the site is found, Signpost will not submit
it to that resource at this point. If, however, Signpost's spider subsequently discovers
that the web page has been updated since its initial spidering, the page will be submitted to the search
engine spiders. The search frequency is once per week for non-indexed submissions and once every two weeks
for indexed submissions.
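
As a rough sketch, the rules in this stage reduce to one decision and one schedule. The function names
below are hypothetical, and `should_submit` reflects one reading of the text: a site already found at the
resource is skipped unless its page has since been updated.

```python
from datetime import datetime, timedelta

# Intervals taken from the text: weekly for non-indexed, fortnightly for indexed.
NON_INDEXED_INTERVAL = timedelta(weeks=1)
INDEXED_INTERVAL = timedelta(weeks=2)


def next_search_due(last_search: datetime, indexed: bool) -> datetime:
    """Return when the resource should next be searched for this submission."""
    return last_search + (INDEXED_INTERVAL if indexed else NON_INDEXED_INTERVAL)


def should_submit(found_at_resource: bool, page_updated: bool) -> bool:
    """Submit if the resource has no listing yet, or the page changed since spidering."""
    return (not found_at_resource) or page_updated
```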
- Making the submissions
When Signpost makes a submission, it is in effect mimicking the resource's own
submission form. That is, it generates exactly the same data that the form generates when the
'Submit' button is pressed, even down to the HTTP headers. If you look at the submission forms at
the resources Signpost submits to, you'll see how widely they differ: some require just
a URL and e-mail address, while others ask for keywords and descriptions and include many hidden fields.
Signpost is able to mimic them all, and can even detect and store any edit passwords that may be returned.
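
To make "mimicking the form" concrete, here is a minimal sketch that builds (but does not send) the
request a browser would produce for a URL-encoded POST form. It is not Signpost's actual code: the
function name, the choice of headers, and the assumption that the form uses POST with
`application/x-www-form-urlencoded` encoding are all illustrative.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_submission(action_url, visible_fields, hidden_fields, form_page_url):
    """Build the POST request a browser would send for a simple form."""
    # Hidden fields are sent alongside the fields a user would fill in.
    fields = {**hidden_fields, **visible_fields}
    body = urlencode(fields).encode("ascii")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        # A browser would normally name the page that hosted the form.
        "Referer": form_page_url,
    }
    return Request(action_url, data=body, headers=headers, method="POST")
```

The returned `Request` could then be passed to `urllib.request.urlopen`; the sketch stops short of that so
the encoding step can be inspected on its own.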
- Handling the responses
To determine whether a submission has succeeded, Signpost has to do what
you would do had you submitted your site manually: it reads the response. It can detect successful submissions,
duplicate submissions, and failed submissions (in which case it stores the response for administration purposes).
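
One simple way to read a response is to look for phrases that each resource is known to return. The
sketch below assumes such per-resource marker phrases are available; the text does not describe how
Signpost actually recognises each outcome, and the names `Outcome` and `classify_response` are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    DUPLICATE = "duplicate"
    FAILED = "failed"


def classify_response(body, success_markers, duplicate_markers):
    """Classify a response page by case-insensitive phrase matching."""
    text = body.lower()
    # Check duplicates first: a "duplicate" page may also contain polite success wording.
    if any(marker.lower() in text for marker in duplicate_markers):
        return Outcome.DUPLICATE
    if any(marker.lower() in text for marker in success_markers):
        return Outcome.SUCCESS
    # Anything unrecognised counts as a failure; the caller would store the body for review.
    return Outcome.FAILED
```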
- Maintenance
Any software system that interacts closely with other independent systems requires a high level
of maintenance, and Signpost is no exception. Search engines can change their submission forms and CGI scripts
at any time, and Signpost's database needs to be updated as soon as possible to reflect those changes. Whenever one of
Signpost's components is active, it creates comprehensive log files recording its actions (or those of the search
resources) in easy-to-read HTML format. This allows us to make any required amendments quickly and
easily. Signpost also does some maintenance itself by checking the submission forms of the various resources before
submitting to them. If it detects changes or problems, it disables the resource for that submission run and sends an
e-mail to the administrators alerting them.
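
The pre-submission form check could look something like the sketch below, which compares the field names
found in a resource's current form against the set recorded earlier. Comparing field names is an assumed
criterion for "changed" (the text does not say what Signpost compares), and `FormFieldCollector`,
`form_fields`, and `form_changed` are hypothetical names.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect the names of all named form controls in an HTML page."""

    def __init__(self):
        super().__init__()
        self.fields = set()

    def handle_starttag(self, tag, attrs):
        if tag in ("input", "textarea", "select"):
            name = dict(attrs).get("name")
            if name:  # unnamed controls (e.g. a bare submit button) send no data
                self.fields.add(name)


def form_fields(html):
    collector = FormFieldCollector()
    collector.feed(html)
    collector.close()
    return collector.fields


def form_changed(html, expected_fields):
    """True when the form's field names differ from those recorded earlier."""
    return form_fields(html) != set(expected_fields)
```

A caller would disable the resource for the current run and notify the administrators whenever
`form_changed` returns `True`.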