Improving our site search: Where is the best bang for our buck?

The purpose of our site search is to help visitors acquire the knowledge they need as efficiently as possible. Unlike internet search engines, we only have to serve results for our 'family' of sites. However, with more than 150,000 pages, this is still a complicated task, made even more challenging by our distributed authorship model and our disparate content sources (i.e. differing quality and completeness of metadata). To do search better, I believe that we should focus our efforts on four main areas:

Context: We need to know more about our users, both as groups and as individuals, and their needs. Only then can we please most people most of the time. The starting point (and the easiest to do) is to reflect the site location in the search experience. This would mean a different result layout (or even different results) for a user searching from 'Future Students' than for one searching from 'Research'. Taking this further, we could use the type of visitor (in broad groups such as international/domestic) or their preferences (maybe by drawing on cookie information).
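
As a rough illustration of that starting point, the sketch below maps the section of the site a search was launched from to a search profile. The section paths, the profile names and the `profile_for` function are all hypothetical; nothing here is a Funnelback feature or a description of our current setup.

```python
from urllib.parse import urlparse

# Hypothetical mapping from the first path segment of the page the user
# searched from to a search profile (layout and/or result set).
SECTION_PROFILES = {
    "future-students": "future-students",
    "research": "research",
}
DEFAULT_PROFILE = "default"


def profile_for(referrer_url: str) -> str:
    """Return the search profile for the page the search was launched from."""
    path = urlparse(referrer_url).path
    segments = [s for s in path.split("/") if s]
    if not segments:
        # Searches from the home page get the default experience.
        return DEFAULT_PROFILE
    return SECTION_PROFILES.get(segments[0].lower(), DEFAULT_PROFILE)
```

A visitor type or a cookie preference could be added later as a second lookup key, without changing how pages outside the listed sections fall back to the default.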

Content: Search needs must feature in our content work (from strategy, through training, to writing) if we are to make real improvements to the relevance of search results. We must index and enrich the right content (not all of it), manage our recommended results, add promoted results for common queries (best bets), improve the visual catalog of the most important items (richer snippets), and eliminate junk from the default search experience.
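
To make the idea of best bets concrete, here is a minimal sketch of a curated table that places a promoted result ahead of the organic results for a common query. The table contents, the URLs and the function names are invented for illustration; in practice promoted results would be managed in the search product itself.

```python
# Hypothetical best-bets table: normalised query -> promoted result.
BEST_BETS = {
    "library opening hours": {
        "title": "Library opening hours",
        "url": "https://www.example.edu/library/hours",
    },
}


def normalise(query):
    """Lower-case the query and collapse repeated whitespace."""
    return " ".join(query.lower().split())


def with_best_bet(query, organic_results):
    """Put the promoted result first, dropping any organic duplicate of it."""
    bet = BEST_BETS.get(normalise(query))
    if bet is None:
        return list(organic_results)
    rest = [r for r in organic_results if r["url"] != bet["url"]]
    return [bet] + rest
```

Matching on a normalised query keeps the table small: differences in case and spacing all resolve to one entry.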

Metadata: We need more and better structured information about our content in order to substantively improve search result relevancy. The minimum probably includes:

A title
A description of the contents
A number of descriptive keywords
Some timestamps highlighting the content’s lifecycle (e.g. created, published, updated, revised and, finally, possibly archived)
Status of availability, such as public, access-controlled, valid, outdated or archived
Its canonical address, that is, the original and primary URL
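
The minimum set above could be expressed as a simple record with a completeness check, sketched below. The class and field names are illustrative only, and the rule that at least one timestamp must be present is an assumption, since the list only asks for "some" timestamps.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class PageMetadata:
    """Hypothetical record holding the minimum metadata for one page."""
    title: str = ""
    description: str = ""
    keywords: List[str] = field(default_factory=list)
    created: Optional[datetime] = None
    published: Optional[datetime] = None
    updated: Optional[datetime] = None
    revised: Optional[datetime] = None
    archived: Optional[datetime] = None
    status: str = ""  # e.g. "public", "access-controlled", "outdated"
    canonical_url: str = ""


def missing_fields(meta: PageMetadata) -> List[str]:
    """List which of the minimum fields are empty for this page."""
    missing = []
    if not meta.title.strip():
        missing.append("title")
    if not meta.description.strip():
        missing.append("description")
    if not meta.keywords:
        missing.append("keywords")
    stamps = [meta.created, meta.published, meta.updated,
              meta.revised, meta.archived]
    # Assumption: one lifecycle timestamp is enough to pass the check.
    if all(s is None for s in stamps):
        missing.append("timestamps")
    if not meta.status.strip():
        missing.append("status")
    if not meta.canonical_url.strip():
        missing.append("canonical_url")
    return missing
```

Run across the index, a check like this would give a per-page list of gaps that could feed both editor training and any mandatory metadata rule.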

UX: We need well-researched, well-designed and well-built interfaces, with user feedback to enable continuous improvement. Why don't we ask whether search found what the user was looking for? We should be continually gathering feedback, analysing it, and refining our search experience and index.

Dissatisfaction with search results

Users have regularly complained about the relevancy of search results, both before and after the move to Funnelback. While one aspect of this is personal preference, the Web Team acknowledges that search has been unloved (i.e. had little attention lavished on it) and would benefit from an investment of time and resources. The team is always keen to hear of specific examples where search doesn't work or gives poor results. While we can't always alter that specific result set, we do analyse each example to try to understand the underlying issues, as these are what we should work on. So, please forward any specific examples of where search doesn't work for you and we will look into them.

Search service team

We need to make search a team priority, both to ensure that work on it is ongoing rather than intermittent and because it requires more capability than one person possesses.

The most important thing an organisation can do to improve its search is to appoint an owner of search! It is an absolute minimum requirement. This means that the owner of search must have time set aside to work with search. A few hours a week is much better than nothing. Even more important: working with search is long-term work, certainly not a one-off project.

The search service team should cover the following roles and competencies:

(Business) owner of search
Search technician
Search editor and/or Information specialist
Search analyst
Search support

Evaluation of search

Search fulfils its purpose when it delivers the right information, is fast about it and is always available. To satisfy these requirements, search should be tested regularly and the tests documented in test plans. Below are some of the tests that are appropriate:

Search loads quickly, tested with Google PageSpeed Insights, with a minimum score of 80/100.
The response time of a query should be about 0.1 seconds, but never longer than 1 second, measured at the user interface.
Search is available 24/7 (around the clock, seven days a week), monitored by, for instance, Pingdom or UptimeRobot.
Size of the search indexes: among other things, to see whether more or fewer documents are being indexed, which can provide early warning signs and help us be proactive.
Search’s user interfaces are accessible, tested with the W3C Validator.
Search’s user interfaces are usable, tested against webbriktlinjer.se and W3C’s WCAG 2.0 at level AA.
Survey the satisfaction of users.
Review search statistics and/or perform search analytics to gain insight into how users are searching. Look regularly at our:
- Top Xx queries: To gain insight into the search experience of a large share of users, and to see whether the relevance model can be improved and what content is most in demand.
- Abandoned queries:
- Zero result queries: To identify what content is missing, find synonyms to use, understand which abbreviations are used and discover alternative spellings.
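
Two of the reports above, top queries and zero-result queries, can be sketched from a simple query log. The log format assumed here (a query string paired with the number of results returned) is hypothetical; a real export from the search product would need to be adapted to it.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed log entry: (query text, number of results returned).
LogEntry = Tuple[str, int]


def normalise(query: str) -> str:
    """Lower-case the query and collapse repeated whitespace."""
    return " ".join(query.lower().split())


def top_queries(log: Iterable[LogEntry], n: int) -> List[Tuple[str, int]]:
    """Return the n most frequent queries with their counts."""
    counts = Counter(normalise(q) for q, _ in log)
    return counts.most_common(n)


def zero_result_queries(log: Iterable[LogEntry]) -> List[Tuple[str, int]]:
    """Return queries that returned no results, most frequent first."""
    counts = Counter(normalise(q) for q, hits in log if hits == 0)
    return counts.most_common()


log = [
    ("Timetable", 120), ("timetable", 118), ("timetable ", 119),
    ("enrollment", 0), ("enrolment", 45), ("Enrollment", 0),
    ("parking", 12),
]
print(top_queries(log, 2))       # [('timetable', 3), ('enrollment', 2)]
print(zero_result_queries(log))  # [('enrollment', 2)]
```

In this made-up log, "enrollment" returns nothing while "enrolment" does, which is the kind of alternative spelling the zero-result report is meant to surface.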

Training

Probably everyone who uses search needs some form of training in the features on offer. At least the following user training needs to be actively disseminated and available when needed:

All users need to understand how search works and be able to supplement their knowledge with new handy tricks.
Web editors need to understand how to mark up information properly. Should we move to a minimum quality standard for metadata, which would be mandatory for content creators?

Collections

Funnelback allows us to define collections: groups of documents, pages or files with a common thread. We can then use these collections in search to better target and improve relevance, without having to micro-manage each document. For example, subject areas and UG degrees could be two collections, in turn grouped into a meta-collection 'UG study things'. We could search only over this meta-collection on the KYM landing page or a UG study hub. So, combined with some site context information or a user-cookie value, we can improve relevancy without expensive content work.
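
The grouping described above can be pictured as a small lookup, sketched below. The collection names, page paths and function are hypothetical and are not Funnelback configuration; they only show how a landing page could be scoped to a meta-collection while every other page searches the whole site.

```python
# Hypothetical meta-collections: name -> component collections.
META_COLLECTIONS = {
    "ug-study": ["subject-areas", "ug-degrees"],
}

# Hypothetical mapping from landing page path to default search scope.
PAGE_SCOPE = {
    "/kym": "ug-study",
    "/study/undergraduate": "ug-study",
}
DEFAULT_SCOPE = "whole-site"


def collections_to_search(page_path: str) -> list:
    """Return the collections a search from this page should cover."""
    # Treat "/kym/" and "/kym" as the same page; keep "/" for the home page.
    key = page_path.rstrip("/") or "/"
    scope = PAGE_SCOPE.get(key, DEFAULT_SCOPE)
    # A scope that is not a meta-collection is a single collection.
    return META_COLLECTIONS.get(scope, [scope])
```

Because the scope is chosen per page rather than per document, adding a new hub only needs one new entry in the page mapping.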

Promoted results

Questions

Should we continue to use the 'promoted results' for courses, or transition to subjects? Or degrees?
Should we be adding keywords to subjects and/or subject areas? Can a subject area 'inherit' the keywords of its component subjects?
How do we strike a balance between what the user wants to see and what we want the user to see?
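
On the keyword question, one possible reading of 'inherit' is that a subject area's effective keywords are the union of its own keywords and those of its component subjects. The sketch below assumes that reading; the function name and the subject codes in the usage note are made up, and whether the search product can do this itself is not something this note establishes.

```python
from typing import Dict, Iterable, List


def effective_keywords(own: Iterable[str],
                       subjects: Dict[str, Iterable[str]]) -> List[str]:
    """Merge a subject area's own keywords with those of its subjects."""
    merged = {k.strip().lower() for k in own}
    for keywords in subjects.values():
        merged.update(k.strip().lower() for k in keywords)
    # Drop empty entries left over from blank keyword values.
    merged.discard("")
    return sorted(merged)
```

For example, an area tagged "Biology" with subjects tagged "cells", "genetics" and "ecology" would be findable under all four terms, with duplicates that differ only in case merged into one.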