Represent Changes of Knowledge Organization Systems on the Semantic Web

Traditional knowledge organization systems (KOS), including thesauri, classification schemes, taxonomies, subject heading systems, name authorities, and other lists of terms and codes, have long played important roles in indexing, information organization, and retrieval. With the advent of the semantic web, a large number of them have been converted into Linked Open Data (LOD) datasets. Since the Simple Knowledge Organization System (SKOS) and SKOS eXtension for Labels (SKOS-XL) became formal World Wide Web Consortium (W3C) Recommendations, they have been applied to knowledge organization systems. This article investigates the issues surrounding changes, version control, and the evolution of KOS. From the perspectives of KOS service providers and consumers, the study focuses on how changes can be represented on the semantic web.

To cite this article: Zou, Q. (2018). Represent Changes of Knowledge Organization Systems on the Semantic Web. International Journal of Librarianship, 3(1), 67-77. doi: https://doi.org/10.23974/ijol.2018.vol3.1.64


INTRODUCTION
The Simple Knowledge Organization System (SKOS) provides a common data model for knowledge organization systems such as thesauri, classification schemes, subject headings, and taxonomies (W3C, 2012). Since SKOS and the SKOS eXtension for Labels (SKOS-XL) became formal World Wide Web Consortium (W3C) Recommendations, they have been applied to knowledge organization systems. SKOS and SKOS-XL capture the common relations of KOS in a formal and explicit way, and they can therefore improve interoperability between KOSs. They can bring not only print-format KOSs but also electronic KOSs into an open and highly connected linked data world.
However, an actively used KOS needs to be updated regularly to reflect the development of human knowledge, so changes are inevitable. Changes need to be properly managed by both KOS service providers and KOS consumers, because an out-of-date or poorly managed KOS may supply obsolete information that hinders information retrieval. KOS service providers often maintain their KOSs actively. For example, the Art & Architecture Thesaurus (AAT) [1] and subject heading systems such as the Library of Congress Subject Headings (LCSH) [2], Canadian Subject Headings (CSH) [3], and RVM (Répertoire de vedettes-matière) [4] publish revisions regularly. KOS consumers, however, have difficulty keeping up with these providers. For instance, researchers have identified obsolete subject headings in library automation systems (Buckland, 2012; Chan & Vizine-Goetz, 1997). To manage KOS changes actively on the consumers' side, the changes need to be explicitly expressed and properly propagated. As more and more KOSs are published on the semantic web, the issue of KOS changes needs to be investigated in this new context.

LITERATURE REVIEW
This section is organized into two subsections. Related studies about KOS changes are reviewed in the first subsection. Studies related to change representation are summarized in the second subsection.

Changes in KOS
The problem of obsolescence caused by KOS changes, especially in subject heading systems, has long been recognized (Ashman, 2006; Buckland, 2012; Chan & Vizine-Goetz, 1997). Changes in KOS are driven by new topics, names, and concepts, and by cultural change (Buckland, 2012). If changes in a KOS are not propagated to applications, the old terms, concepts, and names in those applications become obsolete.
Because SKOS has been widely applied to publishing KOS on the semantic web, its capability to handle KOS changes has been examined (Tennis, 2005). The two mechanisms that SKOS provides for concept scheme revision (i.e., notes and Web Ontology Language (OWL) versioning) do not "account for … refinement, lumping and other transformations of concepts" (Tennis, 2005, p. 276). Tennis (2005) proposes extending SKOS Core to track vocabulary changes over their lifetime by capturing three kinds of change: lumping, refining, and relationship changes. However, there is no detailed explanation of how the approach works. For example, regarding relationship changes, it is not clear whether the proposed "skos:wasRelated" remains attached to a concept throughout its lifetime. If a concept has associative relations with two other concepts, and these associative relations are changed to hierarchical relations more than once, the concept will accumulate several "skos:wasRelated" statements, and there is no way to tell the temporal differences between them. In addition, it is not clear how the proposed "skos:ConceptLump" differs from "skos:Concept", or how the proposed "skos:ConceptRefinement" differs from "skos:Concept". From the perspectives of service providers and consumers, the approach may therefore not be applicable. Tennis (2007) grouped the characteristics of scheme changes into three major types: "structural change, word-use change and textual change" (p. 90).

[1] http://www.getty.edu/research/tools/vocabularies/aat/
[2] http://id.loc.gov/authorities/subjects.html
[3] https://www.bac-lac.gc.ca/eng/services/canadian-subject-headings/Pages/canadian-subject-headings.aspx
[4] https://rvmweb.bibl.ulaval.ca/
Traditionally, thesaurus management manuals and standards have used term records to manage the values of thesaurus entries (Aitchison, Bawden, & Gilchrist, 2000; National Information Standards Organization, 2005; Soergel, 1974). Tennis (2007) proposed values records, an expanded version of term records, for managing changes. However, although the approach was put forward for scheme versioning on the semantic web, it has not been empirically tested. Tennis and Sutton (2008) distinguish abstract concepts from "concept instances" to capture concept evolution in vocabulary development applications. This approach, however, is bound to vocabulary development applications, as the iterative workflow described in the article shows. Linking concept instances to their abstract concepts undoubtedly has some advantages, but it also adds the extra work of maintaining the abstract concepts. Moreover, it is not clear how the approach copes with changes to the abstract concepts themselves, including addition, modification, and deletion.

Change Representation
Change representation is one of the six phases of ontology evolution, which form a cyclic loop: change capturing, change representation, semantics of change, change implementation, change propagation, and change validation (Stojanovic, Maedche, Motik, & Stojanovic, 2002). An ontology can be defined as an "explicit specification of a conceptualization" (Gruber, 1993, p. 199). In a broader sense, traditional knowledge organization systems, including term lists, subject headings, classification schemes, taxonomies, and thesauri, are ontologies. Uschold and Gruninger (2004) categorized taxonomies and thesauri as "lightweight-ontologies". Since ontology change refers to "the generic problem of changing an ontology in response to a certain need" (Flouris, Manakanatas, Kondylakis, Plexousakis, & Antoniou, 2008, p. 117), studies of ontology evolution, versioning, and change management are relevant to KOS changes.
Ontology changes have been categorized into 1) atomic (basic) changes and 2) composite changes (Javed, Abgaz, & Pahl, 2013). Noy, Chugh, Liu, and Musen (2006) introduced a Change and Annotation Ontology (CHAO) to express explicitly the changes between different versions of an ontology. Khattak, Batool, Pervez, Khan, and Lee (2013) use a change history ontology to represent changes, and Javed (2013) introduced a layered change history log for recording ontology changes.
Taking both the syntax and the semantics of ontologies into consideration, Zeginis, Tzitzikas, and Christophides (2011) proposed "delta functions" for comparing RDF datasets. Huang and Stuckenschmidt (2005) use a temporal logic approach to manage, and reason over, different versions of an ontology. Palma, Corcho, and Haase (2009) proposed an OWL 2 change ontology for managing changes according to their temporal order.
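Zeginis et al. (2011) define several delta functions, some of which take RDFS inference into account. The simplest idea, a purely syntactic comparison of two versions, can be sketched in Python under the assumption that each KOS version is held as a set of (subject, predicate, object) tuples. This illustrates a syntactic delta only and does not reimplement their functions; the `ex:` identifiers are hypothetical.

```python
# Sketch only: compares two KOS versions as plain sets of triples.
# Concept identifiers (ex:...) are hypothetical placeholders.
Triple = tuple[str, str, str]


def explicit_delta(old: set[Triple], new: set[Triple]) -> tuple[set[Triple], set[Triple]]:
    """Return (added, deleted): the triples that turn `old` into `new`."""
    return new - old, old - new


v1 = {
    ("ex:Applications", "skos:prefLabel", "Applications"),
    ("ex:WebServices", "skos:prefLabel", "Web services"),
}
v2 = v1 | {("ex:Applications", "skos:narrower", "ex:WebServices")}

added, deleted = explicit_delta(v1, v2)
print(added)    # {('ex:Applications', 'skos:narrower', 'ex:WebServices')}
print(deleted)  # set()
```

Adding `added` to `v1` and removing `deleted` reproduces `v2`. A change log built from add and delete operations depends on exactly this property.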
Types of KOS changes have also been identified. Changes in a thesaurus fall into six categories: "amendment of existing terms, status of existing terms, deletion or demotion of existing terms, addition of new, or deletion of old relationships, addition of new terms, amendment of existing structure" (Aitchison et al., 2000, p. 170). Changes can also be grouped into three general types: "structural change, word-use change and textual change" (Tennis, 2007, p. 90). As more and more KOSs are published on the semantic web, KOS changes need to be discussed in this new context.
In summary, change representation in ontologies needs to take the features of ontologies into consideration. Because KOSs and ontologies differ, some methods used for ontologies (such as reasoning) cannot be applied to KOSs. This study therefore draws on methods such as change logs and temporal ordering to investigate KOS changes.

OBJECTIVE OF THE STUDY
This study aims to explore how to represent KOS changes on the semantic web.

METHODOLOGY
To address this issue, the study takes a three-step approach: 1) formally define changes in KOS; 2) model change sets; and 3) apply the change set model to use cases taken from Tennis's studies.

Change Representation
In the context of the semantic web, the Resource Description Framework (RDF) is the fundamental model for representing resources (Manola & Miller, 2004). Knowledge organization systems can be encoded as RDF statements using SKOS and SKOS-XL. A KOS change can be defined as any modification of a KOS.
Definition 1. A knowledge organization system is a set of concepts interrelated by relationships. A KOS $K$ can be denoted as $K = (C, A, R)$, where $C$ is a set of concepts, $A$ is a set of attributes, and $R$ is a set of relations. For each concept $c_i \in C$, $A(c_i) = \{a_1, a_2, \dots, a_m\}$ and $R(c_i) = \{r_1, r_2, \dots, r_n\}$.
In other words, if a KOS uses only RDF, SKOS, and SKOS-XL, it can be considered a set of RDF triples of the form <subject, predicate, object>, whose semantics are defined by RDF, SKOS, and SKOS-XL.
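Definition 1 and its triple view can be made concrete with a small Python data structure. The encoding is an assumption made for illustration. Identifiers such as `ex:Applications` are hypothetical. Attributes are stored as (property, literal) pairs and relations as (property, target) pairs. Every concept in $C$ is typed as `skos:Concept` when $K$ is flattened into triples.

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]


@dataclass
class KOS:
    """K = (C, A, R) as in Definition 1 (illustrative encoding)."""
    concepts: set[str] = field(default_factory=set)  # C
    # A(c_i): attribute values as (property, literal) pairs (an assumption)
    attributes: dict[str, set[tuple[str, str]]] = field(default_factory=dict)
    # R(c_i): relations as (property, target concept) pairs (an assumption)
    relations: dict[str, set[tuple[str, str]]] = field(default_factory=dict)

    def to_triples(self) -> set[Triple]:
        """Flatten K into RDF-style (subject, predicate, object) triples."""
        triples = {(c, "rdf:type", "skos:Concept") for c in self.concepts}
        for c, pairs in self.attributes.items():
            triples |= {(c, p, v) for p, v in pairs}
        for c, pairs in self.relations.items():
            triples |= {(c, p, t) for p, t in pairs}
        return triples


k = KOS(
    concepts={"ex:Applications", "ex:WebServices"},
    attributes={
        "ex:Applications": {("skos:prefLabel", "Applications")},
        "ex:WebServices": {("skos:prefLabel", "Web services")},
    },
    relations={"ex:Applications": {("skos:narrower", "ex:WebServices")}},
)
print(len(k.to_triples()))  # 5
```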
Two basic operations, add and delete, are defined on a KOS. In terms of changes in KOS, Tennis (2007) identified structural change, word-use change, and textual change. More specifically, KOS changes include concept splitting and lumping (Tennis, 2005). Taking the example used in Tennis (2005), the changes can be represented with the definitions. The change in Table 1 can be considered a single relation addition operation: after the change, the concept "Applications" has a "skos:narrower" relation with the concept "Web services":

$$\Delta = \Delta R, \quad \texttt{skos:narrower} \in R(c_{\text{Applications}}), \quad \texttt{skos:narrower}(c_{\text{Applications}}) \rightarrow c_{\text{Web services}}$$

The change can be represented in Turtle, the Terse RDF Triple Language (Beckett, Berners-Lee, Prud'hommeaux, & Carothers, 2014), as follows:

changelog:operation01 a ch:RelationAdd ;
    ch:hasTimeStamp "2018-03-01 15:34:45+5" ;
    ch:subject [ a skos:Concept ; skos:prefLabel "Applications" ] ;
    ch:relation skos:narrower ;
    ch:target [ a skos:Concept ; skos:prefLabel "Web services" ] .

Changes are sets of operations. Therefore, a general model was built on Changeset (Tunnicliffe & Davis, 2009), change history logs (Khattak, Latif, & Lee, 2013), and a layered change log. To express KOS changes, an ontology [5] was developed on the basis of this model. Changes are formally defined at the core of this ontology. Figure 2 shows the hierarchical structure of the AtomicChange class.
The statements indicate when the change starts and ends, who the operator is, and which operation it links to. In this example, the operation (changelog:operation01) is a RelationAdd operation consisting of a subject, a relation, and a target.
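The RelationAdd operation and the change set that links to it can be mirrored in a short Python sketch. The class and field names (`RelationAdd`, `ChangeSet`, `op_id`, `operator`) and the `ex:` identifiers are hypothetical stand-ins for the ontology's own terms, which are not reproduced here. Time stamps are simplified to naive `datetime` values. The check that both ends of the relation are existing concepts is an added assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime

Triple = tuple[str, str, str]


@dataclass(frozen=True)
class RelationAdd:
    """One atomic operation with a subject, a relation, and a target."""
    op_id: str
    subject: str
    relation: str
    target: str

    def apply(self, kos: frozenset[Triple]) -> frozenset[Triple]:
        # Assumed precondition: both ends must already be concepts in the KOS.
        for concept in (self.subject, self.target):
            if (concept, "rdf:type", "skos:Concept") not in kos:
                raise ValueError(f"unknown concept: {concept}")
        return kos | {(self.subject, self.relation, self.target)}


@dataclass
class ChangeSet:
    """Hypothetical wrapper: start, end, operator, and linked operations."""
    started: datetime
    ended: datetime
    operator: str
    operations: list[RelationAdd] = field(default_factory=list)

    def apply(self, kos: frozenset[Triple]) -> frozenset[Triple]:
        for op in self.operations:  # operations are applied in list order
            kos = op.apply(kos)
        return kos


before = frozenset({
    ("ex:Applications", "rdf:type", "skos:Concept"),
    ("ex:WebServices", "rdf:type", "skos:Concept"),
})
cs = ChangeSet(
    started=datetime.fromisoformat("2018-03-01T15:34:45"),
    ended=datetime.fromisoformat("2018-03-01T15:34:46"),
    operator="ex:editor",  # hypothetical operator
    operations=[RelationAdd("changelog:operation01", "ex:Applications",
                            "skos:narrower", "ex:WebServices")],
)
after = cs.apply(before)
```

The original KOS state is never mutated. Each change set yields a new state, which keeps the changes separable from the KOS they modify.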
The second example, listed in Table 2, is also taken from Tennis (2005). In this example, two concepts are lumped together, which requires three operations that must be executed in order; an ordered operation list is therefore defined. The refinement change in Tennis (2007, p. 15) can be represented as a ConceptAdd operation and a RelationAdd operation. For the sake of simplicity, the detailed statements are not listed here; they are similar to the examples above.
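The refinement case, a ConceptAdd followed by a RelationAdd, can show why an ordered operation list is needed. In this Python sketch, the function names and the `ex:Broad`/`ex:Narrow` concepts are hypothetical. The three lumping operations of Table 2 are not reconstructed here.

```python
from collections.abc import Callable

Triple = tuple[str, str, str]
KOSState = frozenset[Triple]
Operation = Callable[[KOSState], KOSState]


def concept_add(concept: str) -> Operation:
    """ConceptAdd: declare a new skos:Concept."""
    def op(kos: KOSState) -> KOSState:
        return kos | {(concept, "rdf:type", "skos:Concept")}
    return op


def relation_add(subject: str, relation: str, target: str) -> Operation:
    """RelationAdd: link two concepts that must already exist."""
    def op(kos: KOSState) -> KOSState:
        for c in (subject, target):
            if (c, "rdf:type", "skos:Concept") not in kos:
                raise ValueError(f"unknown concept: {c}")
        return kos | {(subject, relation, target)}
    return op


def apply_in_order(kos: KOSState, operations: list[Operation]) -> KOSState:
    """Execute an ordered operation list; each step sees the previous result."""
    for op in operations:
        kos = op(kos)
    return kos


base = frozenset({("ex:Broad", "rdf:type", "skos:Concept")})
refinement = [
    concept_add("ex:Narrow"),
    relation_add("ex:Broad", "skos:narrower", "ex:Narrow"),
]
result = apply_in_order(base, refinement)
```

Reversing the list raises `ValueError`, because the RelationAdd would then refer to a concept that does not exist yet.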
The ontology defines only add and delete operations. A modification is expressed as a delete operation followed by an add operation.
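A label correction, for instance, can be sketched in Python as one delete and one add. The function name `modify`, the `ex:WebServices` identifier, the label values, and the simple `("delete" | "add", triple)` log format are all hypothetical.

```python
Triple = tuple[str, str, str]


def modify(kos: frozenset[Triple], old: Triple,
           new: Triple) -> tuple[frozenset[Triple], list[tuple[str, Triple]]]:
    """Express a modification as a delete followed by an add.

    Returns the new state and the two logged atomic operations.
    """
    if old not in kos:
        raise ValueError(f"cannot delete a triple that is not present: {old}")
    log = [("delete", old), ("add", new)]
    return (kos - {old}) | {new}, log


before = frozenset({("ex:WebServices", "skos:prefLabel", "Web service")})
after, log = modify(
    before,
    ("ex:WebServices", "skos:prefLabel", "Web service"),
    ("ex:WebServices", "skos:prefLabel", "Web services"),
)
print(sorted(after))              # [('ex:WebServices', 'skos:prefLabel', 'Web services')]
print([kind for kind, _ in log])  # ['delete', 'add']
```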

DISCUSSION AND CONCLUSION
The goal of this study is to represent KOS changes. Definitions are given to express KOS changes, and a change set model is proposed along with types of changes. On this basis, an ontology for representing KOS changes is proposed. Examples from other studies served as use cases, and they show that KOS changes can be expressed with the proposed approach.
With this approach, KOS changes are expressed explicitly. The explicit RDF statements can be queried with the SPARQL Protocol and RDF Query Language (SPARQL). Moreover, changes can be kept separate from the KOS itself, which makes both the KOS and its changes easier to manage. Formally expressed changes that are kept separate from the KOS make it possible to handle KOS changes on the semantic web.
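In practice, such a query would be written in SPARQL, for example `SELECT ?op WHERE { ?op a ch:RelationAdd ; ch:subject ex:Applications }`, and run against a triple store. To keep the illustration self-contained, the Python sketch below imitates that triple-pattern matching over an in-memory set. The log contents, the `ex:` identifiers, and the `match` helper are hypothetical.

```python
from typing import Optional

Triple = tuple[str, str, str]

# Hypothetical change log, stored as triples apart from the KOS itself.
change_log: set[Triple] = {
    ("changelog:operation01", "rdf:type", "ch:RelationAdd"),
    ("changelog:operation01", "ch:subject", "ex:Applications"),
    ("changelog:operation01", "ch:relation", "skos:narrower"),
    ("changelog:operation01", "ch:target", "ex:WebServices"),
    ("changelog:operation02", "rdf:type", "ch:RelationAdd"),
    ("changelog:operation02", "ch:subject", "ex:Databases"),
}


def match(triples: set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> set[Triple]:
    """Return triples matching a pattern; None acts as a variable."""
    return {
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    }


# Roughly: SELECT ?op WHERE { ?op a ch:RelationAdd ; ch:subject ex:Applications }
ops = {t[0] for t in match(change_log, p="rdf:type", o="ch:RelationAdd")}
hits = {op for op in ops if match(change_log, s=op, p="ch:subject", o="ex:Applications")}
print(hits)  # {'changelog:operation01'}
```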
Zeng and Mayr (2018, p. 5) identified five groups of KOS stakeholders: the LOD dataset producer group, the vocabulary producer group, the researcher group, the website/tool developer group, and the KOS service provider group. KOS changes need to propagate not only from producer groups to KOS consumers but also within producer groups. For example, an LOD dataset producer may use more than one KOS, as Zeng and Mayr (2018) indicate. This study is a first step toward connecting KOS producer groups to consumer groups. The next steps, including generating change sets and propagating changes from KOS sources to applications, need to be investigated. The ultimate goal is to keep KOSs current at the application level through change sets, without downloading the full KOS every time.
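The propagation goal can be sketched from the consumer side. A consumer keeps a local copy of the KOS and the time of its last synchronization, and it applies only newer change sets instead of downloading the full KOS. Everything in this Python sketch is an assumption made for illustration: the `PublishedChangeSet` structure, the feed, the `ex:` identifiers, and the rule that deletions are applied before additions within a change set.

```python
from dataclasses import dataclass
from datetime import datetime

Triple = tuple[str, str, str]


@dataclass(frozen=True)
class PublishedChangeSet:
    """Hypothetical unit a KOS provider publishes: when, and what changed."""
    published: datetime
    deletions: frozenset[Triple]
    additions: frozenset[Triple]


def sync(local: set[Triple], feed: list[PublishedChangeSet],
         last_sync: datetime) -> tuple[set[Triple], datetime]:
    """Apply, oldest first, only the change sets published after `last_sync`."""
    pending = sorted((cs for cs in feed if cs.published > last_sync),
                     key=lambda cs: cs.published)
    for cs in pending:
        local = (local - cs.deletions) | cs.additions  # delete first, then add
    return local, (pending[-1].published if pending else last_sync)


local_copy = {
    ("ex:Applications", "rdf:type", "skos:Concept"),
    ("ex:WebServices", "rdf:type", "skos:Concept"),
}
feed = [
    PublishedChangeSet(datetime(2018, 3, 1), frozenset(),
                       frozenset({("ex:Applications", "skos:narrower", "ex:WebServices")})),
    # Already reflected in the local copy, so it is skipped by date.
    PublishedChangeSet(datetime(2018, 1, 15), frozenset(),
                       frozenset({("ex:WebServices", "rdf:type", "skos:Concept")})),
]
updated, synced_at = sync(local_copy, feed, last_sync=datetime(2018, 2, 1))
```

Running `sync` again with the returned `synced_at` leaves the copy unchanged. This makes repeated polling of a provider's feed safe under these assumptions.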