The purpose of our site search is to help visitors acquire the knowledge they need as efficiently as possible. Unlike internet search engines, we only have to serve results for our 'family' of sites. However, with more than 150,000 pages, this is still a complicated task, made even more challenging by our distributed authorship model and our disparate content sources (i.e. differing quality and completeness of metadata). To do search better, I believe we should focus our efforts in four main areas:
Context: We need to know more about our users, both as groups and as individuals, and about their needs. Only then can we please most people most of the time. The starting point (and the easiest to do) is to reflect the site location in the search experience. This would mean a different result layout (or even different results) for a user searching from 'Future Students' than for one searching from 'Research'. Taking this further, we could draw on the type of visitor (in broad groups such as international/domestic) or their preferences (perhaps from cookie information).
Content: Search must feature in our content work (from strategy, through training, to writing) if we are to make real improvements to the relevance of search results. We must index and enrich the right content (not all of it), manage our recommended results, add promoted results for common queries (best bets), improve the visual presentation of the most important items (richer snippets), and eliminate junk from the default search experience.
Metadata: We need more and better-structured information about our content in order to substantively improve search result relevance. The minimum probably includes:
- A title.
- A description of the contents.
- A number of descriptive keywords.
- Timestamps marking the content's lifecycle (e.g. created, published, updated, revised and, finally, possibly archived).
- An availability status, such as public, access-controlled, valid, outdated or archived.
- Its canonical address, i.e. the original and primary URL.
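As an illustrative sketch only (the field names and status values are assumptions, not an agreed schema), that minimum set could be checked automatically before a page is indexed:

```python
# Minimal metadata check before a page is indexed.
# Field names and status values are illustrative assumptions, not an agreed schema.
REQUIRED_FIELDS = {"title", "description", "keywords", "created", "status", "canonical_url"}
VALID_STATUSES = {"public", "access-controlled", "valid", "outdated", "archived"}

def metadata_problems(meta: dict) -> list:
    """Return the reasons this page's metadata falls short of the minimum."""
    problems = ["missing field: " + f for f in sorted(REQUIRED_FIELDS - meta.keys())]
    if meta.get("status") not in VALID_STATUSES:
        problems.append("unknown status: %r" % (meta.get("status"),))
    if not meta.get("keywords"):
        problems.append("no descriptive keywords")
    return problems

page = {
    "title": "Bachelor of Science",
    "description": "Overview of the BSc degree.",
    "keywords": ["science", "undergraduate"],
    "created": "2016-03-01",
    "status": "public",
    "canonical_url": "https://example.edu/study/bsc",
}
print(metadata_problems(page))  # an empty list means the page meets the minimum
```

A check like this could sit in the publishing workflow, so metadata quality is enforced at the point of authorship rather than patched up at index time.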
UX: We need well-researched, well-designed and well-built interfaces, with user feedback to enable continuous improvement. Why don't we ask users whether search found what they were looking for? We should be continually gathering feedback, analysing it, and refining our search experience and index.
Collections
Funnelback allows us to define collections: groups of documents, pages or files with a common thread. We can then use these collections in search to better target results and improve relevance, without having to micro-manage each document. For example, subject areas and UG degrees could be two collections, in turn grouped into a meta-collection 'UG study things'. We could search only over this meta-collection on the KYM landing page or a UG study hub. Combined with some site-context information or a user-cookie value, this lets us improve relevance without expensive content work.
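As a rough sketch of the idea (the section names and collection IDs below are made up, and the exact Funnelback endpoint and parameter names should be checked against our installation), the site context a visitor searches from could pick the collection to search over:

```python
from urllib.parse import urlencode

# Map the site section a user is searching from to the (meta-)collection to
# query. Section names and collection IDs here are illustrative only.
SECTION_TO_COLLECTION = {
    "future-students": "ug-study-things",  # meta-collection: subject areas + UG degrees
    "research": "research-outputs",
    "current-students": "student-services",
}
DEFAULT_COLLECTION = "all-sites"

def collection_for(section: str) -> str:
    """Choose the search scope from the page the visitor searched from."""
    return SECTION_TO_COLLECTION.get(section, DEFAULT_COLLECTION)

def search_url(query: str, section: str) -> str:
    # Funnelback-style endpoint; host and parameter names are assumptions to verify.
    params = {"query": query, "collection": collection_for(section)}
    return "https://search.example.edu/s/search.html?" + urlencode(params)

print(search_url("biology", "future-students"))
```

The mapping table is the cheap part; the value comes from agreeing which sections scope to which collections, and falling back to a whole-of-site collection everywhere else.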
Promoted results
Search service team
We need to make search a team priority, both to ensure the work is ongoing rather than intermittent, and because it requires more capability than any one person possesses.
The most important thing an organisation can do to improve its search is to appoint an owner of search. It is an absolute minimum requirement, and that owner must have time set aside to work on search: a few hours a week is much better than nothing. Even more important: working on search is long-term work, certainly not a project.
The roles and competencies in search’s service team should consist of:
- (Business) owner of search
- Search technician
- Search editor and/or Information specialist
- Search analyst
- Search support
Evaluation of search
Search fulfils its purpose when it delivers the right information, does so quickly, and is always available. To satisfy these requirements, search should be tested regularly, and the tests should be documented in test plans. Some appropriate tests:
- Search loads quickly, tested with Google Pagespeed Insights, with a minimum of 80/100.
- The response time of a query should be about 0.1 seconds, and never longer than 1 second, measured at the user interface.
- Search will be available 24/7 (around the clock seven days a week). Monitored by, for instance, Pingdom or Uptimerobot.
- Size of the search indexes. Whether more or fewer documents are indexed over time can provide early warning signs and help us be proactive.
- Search’s user interfaces are accessible, tested with the W3C Validator.
- Search’s user interfaces are usable, tested against webbriktlinjer.se and the W3C’s WCAG 2.0 at level AA.
- Survey the satisfaction of users.
- Reviewing search statistics and/or performing search analytics, to gain insight into how users are searching. Look regularly at our:
- Top Xx queries: to gain insight into how a large share of users experience search, whether the relevance model can be improved, and what content is most in demand.
- Abandoned queries:
- Zero result queries: To identify what content is missing, find synonyms to use, understand which abbreviations are used and discover alternative spellings.
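As a sketch of what that regular review could automate (the log format here is an assumption; Funnelback's actual analytics exports will differ), top queries and zero-result queries can be pulled from a simple query log:

```python
from collections import Counter

# Each entry is (normalised query, number of results returned).
# This log format is an illustrative assumption, not an actual export format.
query_log = [
    ("timetable", 120), ("bsc", 14), ("parkign", 0),
    ("timetable", 98), ("semster dates", 0), ("library hours", 42),
    ("timetable", 77), ("parkign", 0),
]

def top_queries(log, n=3):
    """Most frequent queries: what are most users looking for?"""
    return Counter(q for q, _ in log).most_common(n)

def zero_result_queries(log):
    """Distinct queries that returned nothing: candidates for synonyms,
    spelling suggestions, best bets or missing content."""
    return sorted({q for q, hits in log if hits == 0})

print(top_queries(query_log))
print(zero_result_queries(query_log))
```

Even this crude analysis surfaces the misspellings ('parkign', 'semster dates') that synonym lists and spelling suggestions should catch, and the high-demand content ('timetable') worth a promoted result.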
Training
Probably everyone who uses search needs some form of training in the features on offer. At least the following user training needs to be actively disseminated and available when needed:
- All users need to understand how search works and be able to supplement their knowledge with new handy tricks.
- Web editors need to understand how to mark up their information properly. Should we move to a minimum quality standard for metadata, mandatory at least for content creators?
Questions
- Should we continue to use the 'promoted results' for courses, or transition to subjects? Or degrees?
- Should we be adding keywords to subjects and/or subject areas? Can a subject area 'inherit' the keywords of its component subjects?
- How do we strike a balance between what the user wants to see and what we want the user to see?