Tags and Tagging


Current state of tagging

See Examples of Current Metadatas.

New tags

Asset Type

Type tag: the kind of content represented on this page (e.g. event, news, faculty, school).
There should be a type for every different kind of page. This is partly useful for filtering search results (although restricting results by URL matching may work better for that). However, I see it as more useful for formatting search results.
For instance, if we know that a result is for a faculty, we could apply its styling to the heading. If it is for an event, we could include the start/end date in the search result, etc.

  • Types: VIC.event, VIC.news, VIC.faculty, VIC.school, VIC.course, VIC.general (default)
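A minimal sketch of type-based result formatting. It assumes the type tag is called VIC.type (hypothetical; the tag hasn't been named yet) and that GSA returns it with the dot turned into an underscore, as it does for DC_creator and VIC_Audience in the Issues section. It also assumes colons in tag names survive (as twitter:card does), so the event dates are read as v:startDate and v:endDate. The CSS class is made up for illustration.

```python
from typing import Mapping

# Hypothetical tag name; GSA's JSON output appears to replace "." with "_".
TYPE_KEY = "VIC_type"
DEFAULT_TYPE = "VIC.general"


def result_type(meta: Mapping[str, str]) -> str:
    """Return a result's asset type, falling back to VIC.general."""
    return meta.get(TYPE_KEY) or DEFAULT_TYPE


def format_heading(title: str, meta: Mapping[str, str]) -> str:
    """Render a result heading whose extra detail depends on the asset type."""
    kind = result_type(meta)
    if kind == "VIC.event":
        # Events show their start/end dates when both are present.
        start, end = meta.get("v:startDate"), meta.get("v:endDate")
        if start and end:
            return f"{title} ({start} to {end})"
        return title
    if kind in ("VIC.faculty", "VIC.school"):
        # Hypothetical CSS class so the faculty/school styling can be applied.
        css_class = kind.replace(".", "-").lower()
        return f'<span class="{css_class}">{title}</span>'
    return title
```

Anything without a type tag falls through to plain formatting, which matches VIC.general being the default.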

New tags for specific assets

Note: Topic and Keywords metadata are expected to be carried across to all asset types.

News

Title

Use existing DC.title

Date created

Use existing DCTERMS.issued

Description

Use existing description - TODO: @andrew to look into getting this renamed or remapped to VIC.description

News Category

E.g. whether it is a media release etc. Use existing VIC.NewsCategory

Audience

Whether it is for the public or staff. Use existing VIC.Audience. TODO: @andrew to investigate whether VIC.Restrictions would be better.

Author

Person who wrote the news item. There is already a DC.creator tag; however, its value is nearly always "Victoria University of Wellington".

It would also be worth implementing Google Rich Snippets for articles.
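One way to do this would be to emit schema.org NewsArticle JSON-LD built from the existing news tags. A rough sketch: VIC.author is a hypothetical tag (no author tag exists yet), and the output should be checked against Google's current structured data requirements before relying on it.

```python
import json
from typing import Mapping


def news_article_jsonld(meta: Mapping[str, str]) -> str:
    """Build schema.org NewsArticle JSON-LD from a page's news metadata."""
    data = {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": meta["DC.title"],
        "datePublished": meta["DCTERMS.issued"],
        "description": meta.get("description", ""),
        "publisher": {
            "@type": "Organization",
            "name": meta.get("DC.publisher", "Victoria University of Wellington"),
        },
    }
    # Hypothetical author tag; DC.creator is nearly always the university itself.
    author = meta.get("VIC.author")
    if author:
        data["author"] = {"@type": "Person", "name": author}
    return json.dumps(data, indent=2)


if __name__ == "__main__":
    print(news_article_jsonld({
        "DC.title": "Example headline",
        "DCTERMS.issued": "2015-08-03",
        "description": "Example description",
    }))
```

The resulting JSON would go inside a `<script type="application/ld+json">` element in the page head.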

Events

Title

Currently either twitter:title or v:summary holds the title. It would be better to have this as VIC.title.

StartDate

Use existing v:startDate

EndDate

Use existing v:endDate

Event Type

Use existing v:eventType

Description

Currently the description is held in description, twitter:description or v:description. It would be better to have this as VIC.description.

Location

Use existing v:location

Could VIC.EventOrganiser be replaced with a Faculty/School/Organisation metadata tag?

Also, while not strictly metadata, we should look at implementing Google Rich Snippets for events.
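A sketch covering both points: while the title and description are spread across several tags, a consumer could take the first one present, and the existing v: tags map fairly directly onto schema.org Event JSON-LD for rich snippets. VIC.title and VIC.description are the proposed tags and don't exist yet; the preference order below is an assumption, and Google may require more location detail (e.g. an address) than v:location provides.

```python
import json
from typing import Iterable, Mapping, Optional

# Tags that currently hold the title/description, most preferred first (assumed order).
TITLE_KEYS = ("VIC.title", "twitter:title", "v:summary")
DESCRIPTION_KEYS = ("VIC.description", "description", "twitter:description", "v:description")


def first_present(meta: Mapping[str, str], keys: Iterable[str]) -> Optional[str]:
    """Return the first non-empty value among `keys`, or None."""
    for key in keys:
        if meta.get(key):
            return meta[key]
    return None


def event_jsonld(meta: Mapping[str, str]) -> dict:
    """Build a schema.org Event object from the existing v: event tags."""
    data = {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": first_present(meta, TITLE_KEYS),
        "description": first_present(meta, DESCRIPTION_KEYS),
        "startDate": meta.get("v:startDate"),
        "endDate": meta.get("v:endDate"),
    }
    if meta.get("v:location"):
        data["location"] = {"@type": "Place", "name": meta["v:location"]}
    # Leave out anything the page doesn't provide rather than emitting nulls.
    return {key: value for key, value in data.items() if value is not None}


if __name__ == "__main__":
    print(json.dumps(event_jsonld({"twitter:title": "Open Day", "v:startDate": "2015-08-29"}), indent=2))
```

Once VIC.title and VIC.description exist, the fallback tuples can shrink to a single key each.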

Staff

Most of the metadata we need is already in place. We can't do much more on this until we know how the new Staff Profiles will be hosted.

Course

Beautiful metadata is already in place. The only issues I can think of are:

  1. Can we merge the existing course 'school' and 'faculty' tags with the new school/faculty/organisation metatags?
  2. They currently have no prefix; would it be worthwhile renaming them to VIC.description, VIC.year, etc.?

Faculty/School/Organisation

Title

Self-explanatory

Description

Self-explanatory

Color

The color styling associated with the school?

Image

Self-explanatory

Main Contact Details

Self-explanatory

Address

Self-explanatory

Keywords proposal

There is currently no great solution for keywords. Ideally we want a system where the user can select from existing keywords while still being free to create new ones where necessary. This seems to be beyond what Squiz currently provides. Here are some alternative solutions:

1: Have the keyword field be a free text box

When entering metadata, the user has a free text box where they can enter any text they like as their keywords.
This is not ideal, for a number of reasons. To name a few: it doesn't help with typos; people will be far less likely to reuse existing keywords, scattering content across near-duplicate keywords; and if we want to allow multiple keywords, we will need to train users on how to properly escape their keyword values.
It will, however, be the easiest to implement: simply a case of adding a free text box to the metadata schema and some JavaScript to interpret the values in results.
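To illustrate the escaping problem: if keywords were comma separated, a keyword that itself contains a comma would need escaping. A minimal parser, assuming (hypothetically) a comma separator and a backslash escape. It also trims, lower-cases and de-duplicates keywords, which catches case and spacing variants but not genuine typos.

```python
from typing import List


def parse_keywords(raw: str, sep: str = ",", escape: str = "\\") -> List[str]:
    """Split a free-text keyword field on `sep`; `escape` before `sep` keeps it literal.

    Keywords are whitespace-normalised, lower-cased and de-duplicated in first-seen order.
    """
    keywords, current, escaped = [], [], False
    for ch in raw:
        if escaped:
            current.append(ch)  # the escaped character is kept as-is
            escaped = False
        elif ch == escape:
            escaped = True
        elif ch == sep:
            keywords.append("".join(current))
            current = []
        else:
            current.append(ch)
    keywords.append("".join(current))

    seen, result = set(), []
    for keyword in keywords:
        norm = " ".join(keyword.split()).lower()
        if norm and norm not in seen:
            seen.add(norm)
            result.append(norm)
    return result
```

For example, `Accounting, accounting ,Tax\, GST` yields two keywords, `accounting` and `tax, gst`; without the backslash it would yield three.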

2: Have a set list of keywords for people to select from

This is the opposite of the previous solution: we have a curated set of keywords people can select from. If they need a keyword that doesn't already exist, they will need to get it approved for addition to the list.
This gives greater control over keyword use, and there won't be any issues with typos etc. However, it is very restrictive for end users whose content may not fit any existing keyword, which could frustrate them and lead them to tag their content incorrectly.

3: A combination of 1 and 2

Provide a set of curated keywords, but also provide a free text box for users to add new keywords as they see fit. The page will then have two metadata values: curatedKeywords and freeKeywords. This will be more confusing for the end user (two methods of providing keywords); however, it may provide the best functionality and flexibility within our platform constraints. One issue that could arise: if the curated list is too large, users may simply ignore it and only use the free text box, in which case we are no better off than with solution 1.
On a side note, we may be able to call a trigger within Squiz that combines the two sets of keywords into one metadata field instead. We could also use this to keep track of new free-text keywords, making it easier to analyse their usage and decide whether to add them to the curated keyword list.
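The merging and tracking logic such a trigger would need might look like the sketch below. This is not Squiz trigger code, just the logic; the function name and return shape are hypothetical. It also uses Python's difflib to flag new keywords that closely resemble a curated one, since those are likely typos rather than genuinely new terms.

```python
import difflib
from typing import Dict, Sequence


def merge_keywords(curated: Sequence[str], free: Sequence[str], vocabulary: Sequence[str]) -> Dict[str, object]:
    """Combine curated and free keywords into one list and report new ones for review.

    `vocabulary` is the full curated keyword list; anything outside it counts as new.
    """
    known = {word.strip().lower() for word in vocabulary}
    merged, seen = [], set()
    for word in list(curated) + list(free):
        norm = word.strip().lower()
        if norm and norm not in seen:
            seen.add(norm)
            merged.append(norm)

    new = [word for word in merged if word not in known]
    # A new keyword very close to a curated one is probably a typo of it.
    typos = {}
    for word in new:
        match = difflib.get_close_matches(word, sorted(known), n=1, cutoff=0.8)
        if match:
            typos[word] = match[0]
    return {"keywords": merged, "new": new, "possible_typos": typos}
```

The "keywords" list would become the single combined field, while "new" and "possible_typos" could be logged to help decide what to add to the curated list.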

4: Squiz Javascript Plugin

Using the JavaScript plugin functionality within Squiz, we create a page/area dedicated purely to tags. Because it is JavaScript, we should hopefully be able to create and control everything we need there, and if that works we can do more or less whatever we like with it: possibly even a free text box that suggests keywords dynamically as the user types, while still allowing new keywords. These would then be converted into properly machine-readable metadata tags and saved to the asset.
This would be the most complicated solution, with the longest development time; however, it could potentially give us everything we require from tagging. We also face the upgrade to Squiz 5, so we will need to either develop the code purely for the new Edit+ suite, or develop it for the Easy Edit suite while ensuring it remains compatible after the upgrade. Nathan has said that it may be possible to install Edit+ on a server separate from the live Squiz install, so we could progress this work without waiting for the upgrade to take place.
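The core of type-ahead suggestions is a prefix lookup over a sorted keyword list. The plugin itself would be JavaScript; this Python sketch (class name hypothetical) only illustrates the lookup logic, using binary search via `bisect` so it stays fast as the list grows, plus a way to accept brand-new keywords.

```python
import bisect
from typing import List, Sequence


class KeywordSuggester:
    """Prefix-based keyword suggestions over a sorted, lower-cased vocabulary."""

    def __init__(self, keywords: Sequence[str]) -> None:
        self._words = sorted({k.strip().lower() for k in keywords if k.strip()})

    def suggest(self, typed: str, limit: int = 5) -> List[str]:
        """Return up to `limit` known keywords starting with what the user has typed."""
        prefix = typed.strip().lower()
        if not prefix:
            return []
        start = bisect.bisect_left(self._words, prefix)  # first word >= prefix
        matches: List[str] = []
        for word in self._words[start:]:
            if not word.startswith(prefix) or len(matches) == limit:
                break
            matches.append(word)
        return matches

    def add(self, keyword: str) -> None:
        """Accept a brand-new keyword, keeping the list sorted and free of duplicates."""
        word = keyword.strip().lower()
        i = bisect.bisect_left(self._words, word)
        if word and (i == len(self._words) or self._words[i] != word):
            self._words.insert(i, word)
```

For example, with Accounting, Accountancy and Architecture loaded, typing "acc" suggests `accountancy` and `accounting`.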

Topics: Current State of Development

The initial workflow for selecting Topics was:

  1. The user selects the Super Topic, e.g. accounting
  2. The user selects the Sub Super Topic (if there is one) from a list filtered by the Super Topic, e.g. accounting
  3. Finally, the user selects the Topic from a list filtered to only the values available within the Super Topic and the Sub Super Topic, e.g. accounting

Unfortunately, Squiz doesn't seem to support this sort of functionality: there is no clear way to filter metadata options based on a previous selection. However, there is a workaround that could do until we are able to migrate tagging to a different system:

Instead of 3 lists with selectable Topic categories, there is only one sorted list of all the Topic combinations a person could select. It looks something like this:

    Architecture>Architecture>Architecture
    Architecture>Architecture>Architecture history and theory
    Architecture>Architecture>Interior architecture
    Architecture>Architecture>Landscape architecture
    Architecture>Construction>Building science
    Architecture>Construction>Project management (building)
    Architecture>Construction>Sustainable engineering systems

Unfortunately, this means the list will be quite long; however, it doesn't seem unmanageably so, and it also allows multiple topics to be selected (if needed).
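Because each entry encodes its full path, the flat list loses nothing: if tagging later moves to a system that supports filtered selection, the original three-step hierarchy can be rebuilt from it. A sketch, assuming `>` never appears inside a topic name; the function names are hypothetical.

```python
from typing import Dict, Iterable, List

TopicTree = Dict[str, Dict[str, List[str]]]


def build_topic_tree(entries: Iterable[str], sep: str = ">") -> TopicTree:
    """Rebuild Super Topic -> Sub Super Topic -> [Topic] from 'A>B>C' entries.

    Entries with only one or two levels just create the parent nodes.
    """
    tree: TopicTree = {}
    for entry in entries:
        parts = [part.strip() for part in entry.split(sep) if part.strip()]
        if not parts:
            continue
        subs = tree.setdefault(parts[0], {})
        if len(parts) >= 2:
            topics = subs.setdefault(parts[1], [])
            if len(parts) >= 3:
                topics.append(parts[2])
    return tree


def topics_for(tree: TopicTree, super_topic: str, sub_topic: str) -> List[str]:
    """The filtered list that step 3 of the original workflow would have shown."""
    return tree.get(super_topic, {}).get(sub_topic, [])
```

For the example list, `topics_for(tree, "Architecture", "Construction")` gives the three construction topics.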

todo
As I was implementing this, I forgot that we need to be able to select Super Topics or Sub Super Topics by themselves (that is, without a regular Topic). This is not difficult to implement; I just need to remember it the next time I import the list of Topics.
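When generating the import list, the standalone Super Topic and Super>Sub Super Topic entries can be derived from the full paths rather than added by hand. A sketch (function name hypothetical); note that Python sorts by code point, which may differ slightly from how Squiz orders the list.

```python
from typing import Iterable, List


def with_parent_entries(entries: Iterable[str], sep: str = ">") -> List[str]:
    """Add a standalone entry for every Super Topic and Super>Sub pair, de-duplicated and sorted."""
    out = set()
    for entry in entries:
        parts = [part.strip() for part in entry.split(sep) if part.strip()]
        # 'A>B>C' contributes 'A', 'A>B' and 'A>B>C'.
        for depth in range(1, len(parts) + 1):
            out.add(sep.join(parts[:depth]))
    return sorted(out)
```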

todo
Create documentation listing the terminal commands I used to convert the Excel file into the correct format for uploading to Squiz. This could double as a useful brief on how to use sed, sort, egrep and paste (not to be confused with pbpaste, the macOS command that outputs the clipboard contents).
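Until that documentation exists, here is a rough Python equivalent of the conversion. It is not the actual command sequence used; it assumes the spreadsheet has been saved as CSV with one column per level (Super Topic, Sub Super Topic, Topic) and a header row, which are my assumptions about the file's layout.

```python
import csv
import io
from typing import List, TextIO


def csv_to_topic_lines(fh: TextIO, sep: str = ">", has_header: bool = True) -> List[str]:
    """Join each row's non-empty cells with `sep`, dropping blanks and duplicates, sorted."""
    reader = csv.reader(fh)
    if has_header:
        next(reader, None)  # skip the column titles
    lines = set()
    for row in reader:
        cells = [cell.strip() for cell in row if cell.strip()]
        if cells:
            lines.add(sep.join(cells))
    return sorted(lines)


if __name__ == "__main__":
    sample = io.StringIO(
        "Super Topic,Sub Super Topic,Topic\n"
        "Architecture,Construction,Building science\n"
        "Architecture,Architecture,Interior architecture\n"
    )
    print("\n".join(csv_to_topic_lines(sample)))
```

For a real export, open the file with `open(path, newline="", encoding="utf-8")` and pass the handle in.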

Faculties, Schools and Organisations

Similar to Topics, have a multi-select list of all Faculties, Schools and Organisations that can be attributed to the asset:

    Faculty of Architecture and Design
    Faculty of Architecture and Design>School of Architecture
    Faculty of Architecture and Design>School of Design
    Faculty of Humanities and Social Sciences
    Faculty of Humanities and Social Sciences>School of Art History, Classics and Religious Studies
    Victoria University Library
    International Institute of Modern Letters
    Early Childhood Services
    Weir House
    etc.

Formatting and Gotchas

URL construction for metadata filtering:

You can filter results by metadata using the requiredfields and partialfields parameters.

Examples:
To return only items that have the metadata tag DC.publisher, use:
    &requiredfields=DC%252Epublisher
To return only items where DC.publisher = "Victoria University of Wellington", use:
    &requiredfields=DC%252Epublisher:Victoria%2520University%2520of%2520Wellington

Make sure you double percent-encode both the name and the value of the meta tag. For example, . becomes %252E and : becomes %253A (the colon separating the name from the value stays literal, as in the example above). See here for the rest of the codes in a handy format.
NOTE: If you use either requiredfields or partialfields, the q value is optional.
For example:
    curl -X GET 'http://search.victoria.ac.nz/search?client=new_homesite_frontend&proxystylesheet=json_frontend&output=xml_NO_DTD&filter=p&getfields=%2A&start=0&wc=0&wc_mc=0&num=100&site=global_search_collection&q=accy+111&requiredfields=DC%252Epublisher:Victoria%2520University%2520of%2520Wellington'
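The double encoding is easy to get wrong by hand. A sketch of building the requiredfields value; the helper names are hypothetical. Python's `urllib.parse.quote` never encodes `.`, `-`, `_` or `~`, so the first pass here encodes every non-alphanumeric character explicitly. If I'm reading the GSA Search Protocol Reference correctly, `.` and `|` join requiredfields clauses as AND and OR, which is presumably why a literal dot in a tag name must be escaped.

```python
from typing import Optional
from urllib.parse import urlencode


def gsa_double_encode(text: str) -> str:
    """Percent-encode every non-alphanumeric character, then encode the % signs again.

    '.' -> '%2E' -> '%252E', ' ' -> '%20' -> '%2520', ':' -> '%3A' -> '%253A'.
    """
    once = "".join(
        ch if ch.isascii() and ch.isalnum()
        else "".join(f"%{byte:02X}" for byte in ch.encode("utf-8"))
        for ch in text
    )
    # After the first pass the only character left to encode is '%', which becomes '%25'.
    return once.replace("%", "%25")


def required_field(name: str, value: Optional[str] = None) -> str:
    """One requiredfields clause: 'name' or 'name:value', with the separating colon literal."""
    clause = gsa_double_encode(name)
    if value is not None:
        clause += ":" + gsa_double_encode(value)
    return clause


def search_url(base: str, params: dict, required: str) -> str:
    """Append requiredfields as-is; urlencode() would encode its % signs a third time."""
    return f"{base}?{urlencode(params)}&requiredfields={required}"
```

`required_field("DC.publisher", "Victoria University of Wellington")` reproduces the clause in the example above. The same clause form should also work for partialfields, which takes the same name:value syntax.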

inmeta

If using inmeta inside the q value, you need to be careful with your URL escaping:
    &q=inmeta:[double-escaped meta tag name][single-escaped =][double-escaped meta tag value]
e.g. searching for "Victoria University of Wellington" in the DC.publisher meta tag:
    &q=inmeta:DC%252Epublisher%3DVictoria%2520University%2520of%2520Wellington

For example:
    curl -v -X GET 'http://search.victoria.ac.nz/search?client=new_homesite_frontend&proxystylesheet=json_frontend&output=xml_NO_DTD&filter=p&getfields=%2A&start=0&wc=0&wc_mc=0&num=10&site=global_search_collection&q=inmeta:DC%252Epublisher%3DVictoria%2520University%2520of%2520Wellington'
NOTE: A search like this seems to take a long time (32 seconds on one occasion), so it is probably not suitable for wide use.
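A sketch of building the inmeta q value with its mixed escaping: the tag name and value are double-encoded, while the `=` between them is encoded only once as `%3D`. The helper names are hypothetical; the encoder handles every non-alphanumeric character itself because `urllib.parse.quote` leaves `.` unencoded.

```python
def gsa_double_encode(text: str) -> str:
    """Percent-encode every non-alphanumeric character, then encode the % signs again."""
    once = "".join(
        ch if ch.isascii() and ch.isalnum()
        else "".join(f"%{byte:02X}" for byte in ch.encode("utf-8"))
        for ch in text
    )
    return once.replace("%", "%25")


def inmeta_query(name: str, value: str) -> str:
    """q value for an inmeta search: name and value double-encoded, '=' encoded once."""
    return f"inmeta:{gsa_double_encode(name)}%3D{gsa_double_encode(value)}"
```

The result is appended to the URL as `&q=` followed by the returned string, without any further encoding.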

Issues

GSA doesn't seem to be indexing some meta tags

E.g. victoria.ac.nz/study/course-career/career-options

This is the metadata returned from a GSA search:

    "DC_creator": "Victoria University of Wellington",
    "DC_publisher": "Victoria University of Wellington",
    "VIC_Audience": "public",
    "VIC_keyImage": "",
    "description": "Work backwards to find the right degree to get you the career you want, or find out what careers different degrees could lead to.",
    "twitter:card": "summary",
    "twitter:description": "Work backwards to find the right degree to get you the career you want, or find out what careers different degrees could lead to.",
    "twitter:site:id": "218343330",
    "twitter:title": "Exploring career options",
    "viewport": "width=device-width, initial-scale=1, maximum-scale=1",

Whereas the metadata displayed on the site front end is:

    "DC.creator": "Victoria University of Wellington",
    "DC.publisher": "Victoria University of Wellington",
    "VIC.Audience": "public",
    "VIC.keyImage": "",
    > "article:modified_time": "2015-07-31T09:07:31+12:00",
    > "article:published_time": "2015-08-03T14:36:28+12:00",
    > "article:publisher": "164979016849070",
    > "article:section": "Future Students",
    "description": "Work backwards to find the right degree to get you the career you want, or find out what careers different degrees could lead to.",
    > "fb:profile_id": "164979016849070",
    > "og:description": "Work backwards to find the right degree to get you the career you want, or find out what careers different degrees could lead to.",
    > "og:image": "http://www.victoria.ac.nz/__data/assets/image/0003/198246/social_media_default.png",
    > "og:site_name": "Victoria University of Wellington",
    > "og:title": "Exploring career options",
    > "og:type": "article",
    > "og:url": "http://www.victoria.ac.nz/study/course-career/career-options",
    "twitter:card": "summary",
    "twitter:description": "Work backwards to find the right degree to get you the career you want, or find out what careers different degrees could lead to.",
    "twitter:site:id": "218343330",
    "twitter:title": "Exploring career options",
    "viewport": "width=device-width, initial-scale=1, maximum-scale=1",

Metadata missing from the GSA results is marked with >.

Looking at the page source, this seems to be because these tags are labelled as meta property instead of meta name:
    <meta property="fb:profile_id" content="164979016849070" />
Whereas the other tags are labelled like this:
    <meta name="twitter:title" content="Exploring career options" />

The question is why it has been done this way. Should it be changed in Squiz to name, or should (and can) the GSA be configured to index property as well as name metadata?
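To check whether this pattern holds on other pages, a small script can list the meta tags declared only with property= and no name= equivalent, i.e. the ones we'd expect GSA to miss if the explanation above is right. A sketch using Python's built-in html.parser; the class and function names are hypothetical, and the GSA behaviour is inferred from this one example page.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class MetaTagCollector(HTMLParser):
    """Collect <meta> tags, keeping name= and property= tags apart."""

    def __init__(self) -> None:
        super().__init__()
        self.by_name: Dict[str, str] = {}
        self.by_property: Dict[str, str] = {}

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # Self-closing <meta ... /> tags also arrive here via handle_startendtag.
        if tag != "meta":
            return
        attributes = dict(attrs)
        content = attributes.get("content") or ""
        if attributes.get("name"):
            self.by_name[attributes["name"]] = content
        elif attributes.get("property"):
            self.by_property[attributes["property"]] = content


def property_only_tags(html: str) -> Dict[str, str]:
    """Meta tags declared only with property=, i.e. the ones GSA seems to skip."""
    collector = MetaTagCollector()
    collector.feed(html)
    collector.close()
    return {key: value for key, value in collector.by_property.items() if key not in collector.by_name}
```

The page HTML can be fetched with `urllib.request.urlopen(url).read().decode("utf-8")` and passed straight in.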