Fall final paper


 * Nancy's main page

Suggested terms

 * Pind (2005) describes the steps professionals use for tag gardening. The tag gardener must be aware of the general concepts used in a subject field, then arrange synonyms and create a hierarchy among tags. Pind also suggests that users, when they tag, should see a window with suggested synonym tags. Well-designed tagging software must be able to find synonyms of the tags in use automatically and must allow previously used tags to be changed.


 * Al Khalifa (2007) also proposes that bookmarking software should provide users with suggested tags. In this example, the author creates an applet for del.icio.us called FolksAnnotation. The applet saves the tags proposed by users, converts all upper-case letters to lower-case, blocks non-English words, converts the singular to plural and, at the end, connects the tags to the ontology through an algorithm.


 * De Meo (2009) recognizes that the disadvantages of folksonomies are ambiguity in the meaning of terms, synonyms that different users apply differently, and uneven quality among tags: some users may use more general tags, while others use more specialized ones. As a solution to this problem, De Meo suggests that a bookmarking tool should not be restricted to suggesting alternative terms to the ones users enter. The bookmarking software must suggest tags that stand in a relationship to each tag used. This relationship can be a broader term, a narrower term, or one of several alternative meanings.
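
 * A minimal sketch of the tag clean-up steps attributed to FolksAnnotation above (lower-casing, blocking non-English words, singular to plural). The word list `ENGLISH_WORDS` and the helpers `naive_plural` and `normalize_tags` are hypothetical names invented for this illustration; the applet's real dictionary and pluralization rules are not described in these notes, and the final step of linking to the ontology is left out.

```python
import re

# Hypothetical stand-in for an English dictionary; a real tool would use a full word list.
ENGLISH_WORDS = {"photo", "travel", "library", "ontology"}


def naive_plural(word: str) -> str:
    """Very rough English pluralization, for illustration only."""
    if word.endswith("s"):
        return word
    if re.search(r"[^aeiou]y$", word):
        return word[:-1] + "ies"
    if word.endswith(("x", "ch", "sh")):
        return word + "es"
    return word + "s"


def normalize_tags(raw_tags, vocabulary=ENGLISH_WORDS):
    """Lower-case tags, drop words outside the vocabulary, pluralize the rest."""
    normalized = []
    for tag in raw_tags:
        tag = tag.strip().lower()
        if tag not in vocabulary:
            continue  # stands in for "blocks non-English words"
        plural = naive_plural(tag)
        if plural not in normalized:
            normalized.append(plural)
    return normalized


print(normalize_tags(["Photo", "TRAVEL", "Reise", "library", "photo"]))
```

 * With this toy vocabulary, "Photo" and "photo" collapse into one tag and "Reise" is dropped because it is not in the word list.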

Clustering

 * Brooks (2006) suggests clusters as a method to demonstrate a relationship between tags. The authors found that clustering is a good way to match documents, but in some cases the tags leave a lot of room for improvement. Bookmarking systems should offer an extensive list of suggested tags that users can draw on when they want to tag. Tools that provide automatic tagging apply general tags to documents, while human taggers are in general more specific.


 * Clustering is also discussed in Limpens (2009). Specia & Motta (2007, in Limpens, 2009) create tag clusters, where a “tag is added to this cluster only if it has a similarity value above a given threshold with all the other tags in the cluster”. The steps that lead to this similarity value are: (a) a statistical analysis of the folksonomy, (b) disambiguation of the tags with the help of WordNet, Wikipedia or other thesauri, (c) clustering, which happens by grouping relevant terms.
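
 * The threshold rule quoted above can be sketched as follows. This is only an illustration: the similarity measure (Jaccard overlap of the resources each tag was applied to), the function names and the sample data are my assumptions, not the measure used by Specia & Motta, and the disambiguation step with WordNet or Wikipedia is left out.

```python
def jaccard(a: set, b: set) -> float:
    """Share of resources two tags have in common (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_tags(tag_resources: dict, threshold: float) -> list:
    """Add a tag to a cluster only if it is similar enough to ALL tags already in it."""
    clusters = []
    for tag in sorted(tag_resources):
        for cluster in clusters:
            if all(
                jaccard(tag_resources[tag], tag_resources[other]) > threshold
                for other in cluster
            ):
                cluster.append(tag)
                break
        else:
            # No existing cluster accepted the tag, so it starts a new one.
            clusters.append([tag])
    return clusters


# Invented sample data: each tag maps to the resources it was applied to.
tag_resources = {
    "python": {"r1", "r2", "r3"},
    "programming": {"r1", "r2", "r3", "r4"},
    "code": {"r2", "r3", "r4"},
    "recipe": {"r5", "r6"},
    "cooking": {"r5", "r6", "r7"},
}
print(cluster_tags(tag_resources, threshold=0.4))
```

 * Raising the threshold makes the clusters smaller and stricter; lowering it merges more tags into the same cluster.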

Topic Maps

 * Instead of clustering, Park & Hunting (2002, in Limpens, 2009) suggest Topic Maps, which are also a way of relating tags, but are more advanced than clustering and less formal than an ontology. The characteristic of Topic Maps is that they are structured according to the way humans use tags and not according to the structural rules and strict vocabulary required to build an ontology.


 * The first step in building a topic map is to define the roles that tags can play. According to the authors, these roles are (Golder & Huberman, 2005 in Limpens, 2009):


 * “identifying what or who is about”
 * “identifying what it is”
 * “identifying who owns it”
 * “refining categories”
 * “identifying qualities or characteristics” (adjectives characterizing an author)
 * “self-reference” (tags that begin with the word “my”)
 * “task organizing”


 * The structure of folksonomies is defined as “a triple of sets, called tri-concepts ({R}, {T}, {U}). Where each user of the set {U} has tagged each resource of the set {R} with all the tags of the set {T}.” Using algorithms, the weight of the words is estimated; this is one of the most popular methods of creating ontologies from folksonomies. Based on the triple, different methods are used to find the relatedness between tags. For example, Hotho et al. (2006, in Limpens, 2009), in their FolkRank project, hold that the user is the most important element in this triple relationship, while Markines (2009, in Limpens, 2009) suggests that tags should not be associated with the user: they should be separated from the user and from the other tags the same user defined, and each tag should be treated individually.
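
 * To make the triple structure concrete, here is a small sketch that stores a folksonomy as (user, tag, resource) assignments and counts how often two tags appear on the same resource. Co-occurrence counting is just one simple relatedness measure that I picked for the illustration; it is not FolkRank and not the method of any cited author, and the sample data and the name `tag_cooccurrence` are invented.

```python
from collections import defaultdict
from itertools import combinations

# Invented sample folksonomy: one (user, tag, resource) triple per tagging act.
assignments = {
    ("alice", "python", "doc1"),
    ("alice", "programming", "doc1"),
    ("bob", "python", "doc1"),
    ("bob", "python", "doc2"),
    ("carol", "programming", "doc2"),
    ("carol", "cooking", "doc3"),
}


def tag_cooccurrence(assignments):
    """Count, for each pair of tags, the resources on which both appear."""
    tags_by_resource = defaultdict(set)
    for _user, tag, resource in assignments:
        # The user is deliberately ignored here (a simplification).
        tags_by_resource[resource].add(tag)

    counts = defaultdict(int)
    for tags in tags_by_resource.values():
        for first, second in combinations(sorted(tags), 2):
            counts[(first, second)] += 1
    return dict(counts)


print(tag_cooccurrence(assignments))
```

 * A user-centred approach would keep the first element of each triple in the calculation instead of discarding it.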

Users semantics

 * The Limpens (2009) article also refers to the MOAT ontology. MOAT suggests that users, apart from the actual tagging, should also provide a link from the tag to a resource that describes the tag. For example, a user can tag a resource on Del.icio.us and then use MOAT to connect the tags with a URL, which will be a semantic web resource. The combination of the tag and the URL takes place in MOAT. In my personal opinion, if a URL dies, the tag is “orphaned” and this can create future ambiguities. On the other hand, the Semantically Interlinked Online Communities Project suggests that users should link the actual tag with other existing ontologies to give a description of the tag.


 * Knowledge Organization Systems
 * Peters & Weller (2008) describe the two most used systems to define relationships between terms:
 * (a) Paradigmatic relations: these relations are strict and create a hierarchical relationship between two words, based on their concepts. This type of relation is used for generalized concepts. Standard: DIN 1463/1, 1987
 * (b) Syntagmatic relations: these connect words not on the basis of their meaning, but on the basis of whether or not they appear together.


 * These two types can be used in our organization of tags, and I personally believe that the second option can be more useful for tags, for example in institutional repositories. In addition, it is important for the tag gardener to define the types of relations the words can have with one another. These types of relations are:


 * 1) Relations of equivalence: synonyms and quasi-synonyms
 * 2) Hierarchical relations: “Two concepts are hierarchical related if one concept includes the extension of the other concept”.
 * 3) Associative relations: relations between concepts that cannot be specified more precisely, but that are neither synonymy nor hierarchical relations.


 * According to Chu (2005), the semantics of text should follow these rules:


 * 1) "the attributes being used for classification under the general perception in the application domain"
 * 2) "the entities under the inheritance of the taxonomy and the attributes"


 * Develop an ontology from a folksonomy
 * Step 1: relation set identification: the purpose here is to set the "orthogonal relations for a given taxonomy standard so that assumed domain knowledge and complex concepts can be formally specified". There are two types of relationships: (a) primitive relationships: relationships between terms whose definition is clear to the public and whose relationship does not change over time. The relationships must be clear and must take the form "instance-instance", "instance-class", or "class-class". (b) derived relationships: relations that can be generated from the primitive relationships.
 * Step 2: relation statements construction: this step builds on the first one; it connects the relationships arranged in step 1 with the words of the taxonomy.
 * Step 3: normalization: this step again builds on the results of the previous two steps. Three gardening rules apply here: (a) "remove the same or equivalent statements", (b) "remove asymmetric properties" and (c) "remove transitive properties".
 * Step 4: semi-automatic generalization: this is the last step and its purpose is to "relate the concepts of step 3 to a higher-level concepts connected by the same set of relations."


 * Tanasescu & Streibel (2007) present the Extreme Tagging System to describe the process of collaborative tagging and the expression of relations between tags. The goal is to create a semantic association between tags.


 * The authors base the relationship between users and tags on the widely used triple {U}, {R} and {T}. When a user tags, for example, a picture that depicts a car, the tags vary, e.g. car, vehicle, four wheels, trip. The authors suggest that users, in addition to tagging the document (in this case the picture), should also tag the tags they are using. This is called a semantic association, where "two entities are semantically associated if they are semantically connected, i.e. there exist a path of relations between them". For example, "two entities are similar if a path from the first one to another is similar to the path from the second to another". Together with the semantic association, the triples depict the relation clearly in RDF.


 * The authors created Tagopedia, a Facebook applet that allows users to create semantics between tags. When a user tags a resource, they are asked to tag a relation or to choose a relation and apply it to the tag. An image on page 7 of the PDF article illustrates how it works. The image shows a text field with a URL. Right next to the text field there is a drop-down menu. One option of this menu is "screen play by", and right next to it there is another text field, where users can add the extra information. The user therefore uses this drop-down menu to define the relationship between the tagged document and the tag itself and so create the semantics of the tag.


 * During the tagging process users follow three steps:


 * Annotation: users tag documents, building their own list of tags; they do not do this in isolation, but can relate their tags to other users' tags.
 * Navigation: this is the step where semantic associations are built. When a user tags a document, he may be the first tagger, in which case he also has to build the relation, or another user has already tagged the same document and the relation exists (Personal note: my question here is who decides whether this relation is right or wrong?). In answer to my note, the article assumes that the user knows the relationship because he has used the same tag before, or because he chooses to tag it when asked to do so.
 * Control: these are the aids that help the system to evolve:
 * 1) total control over one's annotations: a person's annotations belong to that person and can be manipulated by that person.
 * 2) appreciation and depreciation of tags: a plus or minus sign can be applied to a tag. This ranking helps with the next step. When a tag gets a lot of negative votes it becomes private and is considered an outlier; it is therefore not included in the overall tagging system for a resource.
 * 3) questions to author: since Facebook makes it easy to gather users into communities, one user can find another user and ask for clarification of the tags used.


 * Huynh et al. (2008) propose an applet called Scuttle. When a user bookmarks a URL in Scuttle, the URL is added to the list saved in Scuttle, in the most recently tagged section. Users can browse the different bookmarks through a list that appears on the right-hand side of the Scuttle webpage. The application SemanticScuttle allows users to create relationships between tags. These are either relationships of inclusion, written with the symbol ">", or relationships of equality, written with the equal sign "=".


 * One user context 


 * The relationships between tags are user-created. For instance, if a user were tagging a document on Airbus, the tags could be defined by the user as: airplane>airbus>a380, test>breaking, type>video. This way the system can later interpret the relationship. The system also remembers whether a user has used a tag in the singular or the plural in the past. If the user later tags with the plural form, the system automatically creates the relation test=tests, because the root of the word is the same.


 * In addition, when users try to create an inclusion relationship, they may create an inclusion cycle, for example airplane>airbus>airplane. The system informs the user about the mistake and makes him correct it.
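
 * A rough sketch of how the ">" and "=" notation and the cycle check described above could be handled. The function names and the dictionary-of-sets storage are my own assumptions for illustration; this is not SemanticScuttle's actual code.

```python
def parse_structural_tag(expression: str):
    """Split 'a>b>c' into inclusion pairs and 'a=b' into equality pairs."""
    inclusions, equalities = [], []
    if ">" in expression:
        parts = [p.strip() for p in expression.split(">")]
        inclusions = list(zip(parts, parts[1:]))
    elif "=" in expression:
        parts = [p.strip() for p in expression.split("=")]
        equalities = list(zip(parts, parts[1:]))
    return inclusions, equalities


def creates_cycle(existing: dict, parent: str, child: str) -> bool:
    """Return True if adding parent>child would make a tag include itself."""
    # Walk downward from the child; reaching the parent means a cycle.
    stack, seen = [child], set()
    while stack:
        node = stack.pop()
        if node == parent:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(existing.get(node, ()))
    return False


def add_inclusions(existing: dict, expression: str) -> list:
    """Store valid inclusion pairs and return the pairs rejected as cycles."""
    rejected = []
    inclusions, _equalities = parse_structural_tag(expression)
    for parent, child in inclusions:
        if creates_cycle(existing, parent, child):
            rejected.append((parent, child))
        else:
            existing.setdefault(parent, set()).add(child)
    return rejected


hierarchy = {}
print(add_inclusions(hierarchy, "airplane>airbus>a380"))
print(add_inclusions(hierarchy, "airplane>airbus>airplane"))
```

 * In the second call only the pair that closes the cycle (airbus>airplane) is rejected; a real interface would show that pair to the user and ask for a correction.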


 * Multi user context 


 * The system does not display the tags of all users in one area of the bookmarking interface (my note: it works the same way Connotea does). But through the tag aggregation function, the tags are grouped easily, like "classic tags if the users have used two tags with the same name the system considers it as one tag, and regroups the bookmarks under this one". The system treats structural tags and their relationships the same way. When users use the same tag as a "definition tag" but create different types of relationship, these relations are filed under the same tag.


 * Structural tags display 


 * Scuttle behaves the same way the most popular bookmarking services do: it creates clouds of the most recent and of the most used tags. Apart from that, the system offers three extra methods of structuring tags. These are:


 * Number of descendants: "tags are sorted according to the number of tags they include".
 * Length of branches: "tags are sorted according to the length of their longest branches". For example, a main tag with three other tags under it is sorted at length 3.
 * Number of updates: "tags are sorted according to the number of updates of its branches. Anytime a tag A is added to or removed from a tag B, then all the tags which include tag B increments one update".
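
 * The first two orderings can be sketched like this. I assume the hierarchy is stored as a dictionary from a tag to the set of tags it directly includes, that it contains no cycles, and that branch length counts inclusion steps; these choices and the function names are mine, not taken from the article. The "number of updates" ordering needs an edit history and is left out.

```python
def descendants(hierarchy: dict, tag: str) -> set:
    """All tags included by `tag`, directly or indirectly."""
    found = set()
    stack = list(hierarchy.get(tag, ()))
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(hierarchy.get(node, ()))
    return found


def longest_branch(hierarchy: dict, tag: str) -> int:
    """Number of inclusion steps on the longest path below `tag` (acyclic input assumed)."""
    children = hierarchy.get(tag, ())
    if not children:
        return 0
    return 1 + max(longest_branch(hierarchy, child) for child in children)


def all_tags(hierarchy: dict) -> set:
    """Every tag that appears as a parent or as a child."""
    tags = set(hierarchy)
    for children in hierarchy.values():
        tags |= set(children)
    return tags


def sort_by_descendants(hierarchy: dict) -> list:
    return sorted(all_tags(hierarchy), key=lambda t: (-len(descendants(hierarchy, t)), t))


def sort_by_branch_length(hierarchy: dict) -> list:
    return sorted(all_tags(hierarchy), key=lambda t: (-longest_branch(hierarchy, t), t))


# Invented sample hierarchy.
hierarchy = {
    "airplane": {"airbus", "boeing"},
    "airbus": {"a380"},
    "type": {"video"},
}
print(sort_by_descendants(hierarchy))
print(sort_by_branch_length(hierarchy))
```

 * Ties are broken alphabetically here only to keep the output stable.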


 * Hsieh et al. (2009) suggest a Desktop Collaborative Tagging (DCT) system, which allows users to cooperate while tagging. This system provides automatic synonym control, relevant search and tag suggestions, creating a multi-faceted categorization in which tags are automatically indexed under facets. The article does not mention the method used to create the facets.


 * The three services of the DCT are:


 * Tag concept spaces: give the relationship between tags based on users’ tagging behaviors. Concept spaces are created when a tag is converted to an eigenvector (not quite sure what this means and how it works. I will ask at school). The suggested tags the user gets are extracted not only from the list of tags the same user has used in the past, but also from a list of tags that other users have used. Concept-space tag suggestion is based on the following logic. Every time a user tags a document, information about the user, the tags used and the document is saved in the tagging history database. In addition, the DCT has the ability to identify documents by URL. Therefore, the items returned are based on the tags that were used before, the URL identification, the keyword extraction and the saved history of the database.
 * Document similarity: is based on the similarity between the documents’ keywords and tags.
 * Suggested tags for untagged documents: suggested tags are provided both for documents tagged by other users and for untagged documents. The eigenvector includes information both on the tag(s) used in a document and on the automatically generated keywords of a document. These keywords are extracted using a keyword extraction module. Since tags are given by users and the frequency of some document words is not representative of the document's subjects, the tags applied to a document are given more weight than the document keywords. For a document that has not been tagged, the keyword extraction module [the structure of this module and how it works is not presented in this article] will scan the words and give suggested tags, depending on the words appearing in the document (omitting the stop words).
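
 * A sketch of the weighting idea above, under loud assumptions: a plain weighted term vector stands in for the article's eigenvector (whose construction these notes do not describe), the weights 2.0 and 1.0 are invented, and cosine similarity is my choice of comparison. None of the names below come from the DCT system.

```python
import math
from collections import Counter

TAG_WEIGHT = 2.0      # assumed: user-given tags count double
KEYWORD_WEIGHT = 1.0  # assumed: extracted keywords count once


def document_vector(tags, keywords) -> Counter:
    """Combine tags and keywords into one weighted term vector."""
    vector = Counter()
    for keyword in keywords:
        vector[keyword.lower()] += KEYWORD_WEIGHT
    for tag in tags:
        vector[tag.lower()] += TAG_WEIGHT
    return vector


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(u[term] * v[term] for term in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def suggest_tags(untagged_keywords, tagged_docs, top_n=3) -> list:
    """Suggest tags from already tagged documents that resemble the untagged one."""
    query = document_vector([], untagged_keywords)
    scores = Counter()
    for tags, keywords in tagged_docs:
        similarity = cosine(query, document_vector(tags, keywords))
        for tag in tags:
            scores[tag.lower()] += similarity
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, score in ranked if score > 0][:top_n]


# Invented tagging history: (tags, extracted keywords) per document.
tagged_docs = [
    (["ontology"], ["semantic", "web", "rdf"]),
    (["cooking"], ["recipe", "pasta"]),
]
print(suggest_tags(["rdf", "semantic", "tagging"], tagged_docs))
```

 * Only tags from documents that share at least one term with the untagged document are suggested; documents with no overlap contribute a score of zero.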


 * Echarte et al. (2008) suggest a structure where a folksonomy is automatically converted to an ontology using algorithms. The parts of the generic ontology are:


 * source: “the sources of applications that use or feed the folksonomy”
 * resource: any resource that is likely to be annotated
 * tag: a tag concept
 * user: the users who tag—they receive a number in the article’s example
 * annotation: a set of tags assigned to a resource
 * annotation tag: relates an annotation to the assigned tags
 * polarity: each tag’s polarity can be negative or positive and it is related to the annotation tag class.
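
 * To picture how these parts fit together, here is a small data-structure sketch. The class and field names mirror the list above, but the concrete types (for example polarity as +1 or -1, users as integers) and the helper `positive_tags` are my assumptions, not the article's definitions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AnnotationTag:
    """Links an annotation to one tag and carries that tag's polarity."""
    tag: str
    polarity: int = 1  # assumed encoding: +1 positive, -1 negative


@dataclass
class Annotation:
    """One user's set of tags on one resource, coming from one source."""
    source: str    # application that feeds the folksonomy
    user: int      # users are numbered in the article's example
    resource: str  # anything that can be annotated
    annotation_tags: list = field(default_factory=list)

    def positive_tags(self) -> list:
        """Tags the user applied with positive polarity."""
        return [at.tag for at in self.annotation_tags if at.polarity > 0]


# Invented example annotation.
annotation = Annotation(
    source="del.icio.us",
    user=1,
    resource="http://example.org/page",
    annotation_tags=[AnnotationTag("ontology"), AnnotationTag("spam", polarity=-1)],
)
print(annotation.positive_tags())
```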


 * In general, the articles I have read indicate that the automatic creation of ontologies is error-prone in the connection of the terms.


 * Dotsika (2009) suggests the ISO Terminology Standards as a standard for organizing the words that will be used to create an ontology. The author gives the steps for the migration of the folksonomy to the ontology. These steps are:


 * Folksontologies:
 * Step 1: the folksonomy tags are associated and relations are created
 * Step 2: the definitions of the tags are given using dictionaries or other online tools, e.g. Wikipedia
 * Step 3: The Swoogle search engine and the WordNet lexical database are helpful for the next step of the tags. Swoogle indexes metadata and computes relationships between them, and WordNet provides information on synonyms and homonyms and allows communication between different ontologies.
 * Step 4: Ontology mapping: conceptual relationship and ontology structure, based on the analysis of tags.


 * From folksonomies to networks of terms to ontologies (3 steps):
 * A tag cloud is produced from the tags
 * A network of tags is created, using merging and dictionary-based filtering of the tags. Dictionary filtering can clean the cloud of tags that do not exist in dictionaries, a thesaurus can help to define relationships between tags (synonyms), and an algorithm can correct typos and alternative spellings.
 * Ontology creation: distinguishing concepts from terms and defining relationships between concepts


 * Tag problems in folksonomies
 * Polysemes and homonyms: words with multiple meanings and no clear definitions
 * Synonyms: many words can mean the same thing
 * Discrepancies in granularity: different tags that share the same root appear often, e.g. bank, banking, banks


 * Ontologies quality criteria (these points are “copy & paste” from the original text)
 * (a) intermediate representations adapted to domain expert authors’ user views,
 * (b) domain expert authors’ guidelines,
 * (c) underlying ontology schemas and transformation rules to the intermediate representation and
 * (d) natural language generation lexicons and grammars for result displaying.


 * Zheng, Borchert & Kim (2009) introduce Clonto, software that builds ontologies automatically. The software can scan documents and recognize the main concepts on which the ontology is built. The tags are linked to an ontology through the key concepts of a document. In addition, Clonto is able to maintain the relationships between tags that appear in different clusters and can recognize documents that cover the same topics.
 * Put simply, what Clonto does with the help of an algorithm is: (a) a document’s terms are captured based on how frequently they appear, (b) these terms are weighted, again based on how frequently they appear, (c) the stop words are removed, (d) a relationship is created between the keywords and their common phrases using Latent Semantic Analysis, (e) with WordNet, the words with the broadest meanings are the ones used for the automatic creation of the ontology.
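
 * Steps (a) to (c) can be sketched with simple term counting. This is only my illustration of frequency-based weighting with stop-word removal: the stop-word list and the function `key_terms` are invented, the weighting (share of all remaining words) is an assumption, and the Latent Semantic Analysis and WordNet steps (d) and (e) are not covered.

```python
import re
from collections import Counter

# Tiny assumed stop-word list; a real system would use a much longer one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "from"}


def key_terms(text: str, top_n: int = 3) -> list:
    """Return the most frequent non-stop words with their relative frequency."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    total = sum(counts.values())
    if total == 0:
        return []
    # Sort by descending frequency, then alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(term, count / total) for term, count in ranked[:top_n]]


text = "The ontology of tags: tags and the ontology are built from tags."
print(key_terms(text, top_n=2))
```

 * In this sentence "tags" accounts for half of the remaining words, so it receives the highest weight.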


=Bibliography=