FOLKSONOMIES

Folksonomies: Nancy's project:summer '09

Pind, L. (2005). Folksonomies: How we can improve the tags. Blog posting.

Article saved as: pind_2005

The techniques professionals use to improve tags are: (a) universe: knowing the complete vocabulary, that is, all the categories that exist, (b) grouping synonyms together, (c) creating a hierarchy among the tags.

The author suggests software additions to make folksonomies better. Such software would: (a) suggest tags to the users, (b) automatically find synonyms of the typed tag, (c) show other users’ tags, (d) infer a hierarchy when more than one tag is used, (e) apply changes to previously tagged documents.

Al-Khalifa, H.S. & Davis, H. (2007). Creating structure from disorder: using folksonomies to create semantic metadata. In: the 3rd International Conference on Web Information Systems and Technologies (WEBIST), 3 - 6 March, 2007, Barcelona, Spain.

Article saved as: al_khalifa_2007

A taxonomy can be characterized as an ontology that follows a top-down approach: professionals build a hierarchy of words and their meanings within a subject matter. A folksonomy is a bottom-up approach: a vocabulary created by ordinary people, the taggers, with no fixed limit.

Using an application built on del.icio.us, the authors provide a way to create an ontology automatically. The software is called FolksAnnotation, and its steps are:

Normalization pipeline

Step 1: the normalization pipeline takes a bookmarked link and extracts its tags.
Step 2: all tags are converted to lower case, non-English characters are removed, plural tags are converted to singular (stemming), and finally overly general tags are eliminated.
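
My note: a minimal sketch of what the two normalization steps could look like in Python. It is not the FolksAnnotation code. The stop list GENERAL_TAGS and the singularize rule are my own assumptions, since the notes do not record the paper's actual list of general tags or its stemmer.

```python
import re

# Hypothetical stop list of "general" tags; the paper's real list is not in these notes.
GENERAL_TAGS = {"web", "cool", "toread", "misc"}


def singularize(tag: str) -> str:
    """Very rough plural-to-singular rule; a real stemmer would do better."""
    if tag.endswith("ies") and len(tag) > 4:
        return tag[:-3] + "y"
    if tag.endswith("s") and not tag.endswith("ss") and len(tag) > 3:
        return tag[:-1]
    return tag


def normalize_tags(raw_tags):
    """Lowercase, drop characters outside a-z/0-9, singularize, drop general tags."""
    result = []
    for raw in raw_tags:
        tag = re.sub(r"[^a-z0-9]", "", raw.lower())
        tag = singularize(tag)
        # Keep the first occurrence only, and skip empty or overly general tags.
        if tag and tag not in GENERAL_TAGS and tag not in result:
            result.append(tag)
    return result


print(normalize_tags(["Ontologies", "Tags", "WEB", "design™", "tags"]))
```

The order of operations follows Step 2 above: case folding first, then character filtering, then singularization, and the general-tag filter last.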

Semantic Annotation Pipeline: takes the tags extracted from del.icio.us and processed by the normalization pipeline, and creates the semantic metadata by connecting these tags with the existing ontology.

De Meo, P., Quattrone, G., Ursino, D. (2009). Exploitation of semantic relationships and hierarchical data structures to support a user in his annotation and browsing activities in folksonomies. Information Systems, 34, 511-535.

Article saved as: meo_2009

Disadvantages of folksonomies:

(a) ambiguity: some terms have more than one meaning, e.g. homonyms.

(b) synonyms: different taggers may use different synonymous tags for the same thing.

(c) mismatched levels of specificity: experts in a field may tag a resource with specific, specialized tags, while non-specialists may use general tags.

The authors suggest resolving these three disadvantages with a tool that parses the set of tags the tagger is using. The less experienced a user is in a subject area, the more useful this tool will be. The tool suggests the multiple meanings of a tag, broadens the user's subject-matter vocabulary, and offers more specialized tags that the user may not be aware of.

Later on, the article gives the set of algorithms used for this tool. It gets too technical.

(2007). Community Systems Research at Yahoo! Community Systems Group Yahoo! Research. SIGMOD Record, 36(3).

Article saved as: ramarkis_2007

I have googled this project and, apart from this article, I haven’t found it mentioned in any other resource.

The GUEST project (Groups of Users going Social in web Two.0 Search) recognizes that users look for the social ties around content and that they must be led through an individual discovery of content. To achieve that, GUEST tries to find and extract “social networks for common interests”. Users can therefore see other users who share their tagging interests; the system shows this connection by presenting the tags that two or more people have used in common. Users can also ask for specific tags and get the people who have used them. Finally, the system suggests tags for a webpage based on the webpage’s metadata.

The tool also suggests ways of combining two or more tags on a topic, where each further tag is derived by taking into consideration the meaning of the first tag used.

Brooks, C.H., Montanez, N. (2006). An analysis of the effectiveness of tagging in blogs. American Association of Artificial Intelligence.

Article saved as: brooks_2006

The authors took 350 tags from Technorati and tried to see whether the documents that share a tag form meaningful clusters. One problem they met was that some of the most popular tags, such as podcasting, were not included in WordNet. They then “grouped documents into clusters and compared the similarity of all documents within a cluster” (page 3). Their hypothesis was that “a set of documents under the same cluster will be more relevant than documents that did not belong in the same cluster” (page 3).
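
My note: a small sketch of the "compare the similarity of all documents within a cluster" idea, written as the average pairwise cosine similarity of the documents in one cluster. It uses plain word counts as an assumption; these notes do not record which weighting the paper used, and the example documents are made up.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def average_pairwise_similarity(docs):
    """Mean cosine similarity over all pairs of documents in a cluster."""
    vectors = [Counter(d.lower().split()) for d in docs]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Made-up clusters: one of documents sharing a tag, one of unrelated documents.
tagged_cluster = ["podcasting tools for blogs", "blogs about podcasting tools"]
random_cluster = ["podcasting tools for blogs", "recipes with turkey"]

print(average_pairwise_similarity(tagged_cluster))
print(average_pairwise_similarity(random_cluster))
```

Under the authors' hypothesis, a cluster of documents that share a tag should score higher than a cluster of randomly chosen documents.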

Heymann, Paul and Garcia-Molina, Hector (2006). Collaborative Creation of Communal Hierarchical Taxonomies in Social Tagging Systems. Technical Report. Stanford.

Article saved as: heyman_2006

In this article the authors propose an algorithm that automatically builds a hierarchy of tags from the data in a tagging system.

Peters, I. & Weller, K. (2008). Paradigmatic and syntagmatic relations in knowledge organization systems. Düsseldorf, Informationswissenschaft.

Article saved as: informationwissenschaft_2008

Knowledge Organization Systems (KOS) mostly arrange the relationships between concepts. In KOS there are two main kinds of relations between terms:

(a) Paradigmatic relations: these relations are strict and create a hierarchical relationship between two words, based on their concepts. This type of relation is used for generalized concepts. Standard: DIN 1463/1, 1987.

(b) Syntagmatic relations: these connect words not by their meaning, but by whether or not they appear together.

The relations applied today are:

Relations of equivalence: synonyms and quasi-synonyms
Hierarchical relations: “Two concepts are hierarchical related if one concept includes the extension of the other concept”.
Associative relations: relations between concepts that cannot be specified more precisely; the concepts are related, but the relation is neither synonymy nor hierarchy.

Most commonly used indexing languages:

(a) nomenclatures (= “controlled keywords extracted from natural language, following specific rules”)

(b) classifications (= “non-verbal notations which represent concepts and relations between them”)

(c) thesauri (= “controlled terms extracted from natural language and provide hierarchical relation into hyponymy”)

(d) ontologies (= used for semantic indexing and information integration; they make use of hyponymy and meronymy, as in a thesaurus)

Limpens, F., Gandon, F. & Buffa, M. (2009). Linking folksonomies and ontologies for supporting knowledge sharing: a state of the art. State of the art on ontology & folksonomy hybrid techniques.

Article saved as: limpens_2008

Folksonomies are considered bottom-up approaches to knowledge organization, while ontologies are top-down. Folksonomies do not include any semantics; they just represent the knowledge on the web with semiotics.

A semi-formal knowledge representation on the web is Topic Maps, which can be used by themselves, just like ontologies. Their characteristic is that they are structured according to the way humans interpret the notions behind the tags used. They differ from ontologies because they don’t try to create a formal operational scheme, but rather “description networks” to be used by humans, not machines, in their effort to navigate through knowledge.

Tags can play seven types of roles (my note: maybe this is a suggestion that we need to make to the users):

“identifying what or who it is about”
“identifying what it is”
“identifying who owns it”
“refining categories”
“identifying qualities or characteristics” (adjectives characterizing an author)
“self-reference” (tags that begin with the word “my”)
“task organizing”
Source: Golder & Huberman (2005).

Concept analysis techniques

These techniques are used to extract groups of users who apply the same tags to the same resources.

Exact phrase from the article (Jäschke et al., 2008): "The technique is composed of a triple of sets, called tri-concepts ({R}, {T}, {U}). Where each user of the set {U} has tagged each resource of the set {R} with all the tags of the set {T}." This is one of the most popular methods for creating ontologies from folksonomies.
Extracting association rules from folksonomies (Schmitz et al., 2006): the previous triple-of-sets model ({R}, {U}, {T}) is projected into a context with two dimensions. There is no single suggested way of pairing the sets. For instance, one possible pairing treats each (user, resource) pair as a record, which is related to the tags T. Data mining is then applied to the result.
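
My note: a small sketch of both ideas, assuming the folksonomy is stored as a set of (user, tag, resource) assignments. The function is_tri_set only checks the containment condition quoted above (every user tagged every resource with every tag). A real tri-concept must also be maximal, which this sketch does not test. The function project_to_posts shows one possible two-dimensional pairing, (user, resource) records against tags. The names and data are made up.

```python
from itertools import product


def is_tri_set(assignments, users, tags, resources):
    """True if every user tagged every resource with every tag (no maximality check)."""
    return all((u, t, r) in assignments for u, t, r in product(users, tags, resources))


def project_to_posts(assignments):
    """Project the triples onto a 2D context: (user, resource) -> set of tags."""
    posts = {}
    for user, tag, resource in assignments:
        posts.setdefault((user, resource), set()).add(tag)
    return posts


# Made-up tag assignments Y as (user, tag, resource) triples.
Y = {
    ("ann", "python", "r1"), ("ann", "code", "r1"),
    ("bob", "python", "r1"), ("bob", "code", "r1"),
    ("bob", "python", "r2"),
}

print(is_tri_set(Y, {"ann", "bob"}, {"python", "code"}, {"r1"}))
print(is_tri_set(Y, {"ann", "bob"}, {"python"}, {"r1", "r2"}))
print(project_to_posts(Y)[("bob", "r2")])
```

The projected records are what a standard association-rule miner would take as its transactions.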

Measuring the relatedness between tags

Simple co-occurrence counting: "Given a folksonomy F (U, T, R, Y) and given a post (u,Tur, r)- where u=user, Tur= user tag, r=resource", a post is the subset of the folksonomy corresponding to the annotation of a resource r by a user u with a set of tags Tur. Two tags co-occur when they appear in the same post.
FolkRank-based measure of similarity: in the FolkRank measure, the user becomes the most important element of the relationship between the user and the tag. The main idea is that tags added by important users are weighted more than tags added by users who are less important.
Mutual information measure for evaluating similarity measures within folksonomies (Markines et al., 2009): here the triples are used again, but this time the authors suggest "the generalization of these methods of reduction of the dimensionality of the tagging data".
Incremental aggregation methods (Markines et al., 2009): in this method, each user's tags are taken separately and then aggregated across users, "to sum the local similarity calculated for each user's data set".
Grounding the relatedness of tags using a generic hierarchy of concepts (Cattuto et al., 2008): the authors propose a way to ground the relationships between tags semantically. Every tag is processed as follows: (a) a relatedness value is computed between tags using the different measures explained above, (b) the tags are mapped to WordNet synsets, (c) the WordNet definition of each tag is measured against its usage by the users in the tagging process.
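
My note: of the measures above, simple co-occurrence counting is the easiest to sketch. The example below assumes posts are stored as (user, set of tags, resource) and counts, for every pair of tags, the number of posts in which both appear. The data is made up.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(posts):
    """Count, for each unordered tag pair, the posts in which both tags appear."""
    counts = Counter()
    for _user, tags, _resource in posts:
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(set(tags)), 2):
            counts[(a, b)] += 1
    return counts


# Made-up posts (u, Tur, r).
posts = [
    ("ann", {"python", "code", "tutorial"}, "r1"),
    ("bob", {"python", "code"}, "r1"),
    ("cat", {"python", "snake"}, "r2"),
]

counts = cooccurrence_counts(posts)
print(counts[("code", "python")])
print(counts[("python", "snake")])
```

The other measures in the list (FolkRank, mutual information, aggregation across users) need more machinery and are not sketched here.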

Clustering tags

Finding equivalent tags (equivalent either in their meaning or in the topic they describe):
1. stemming algorithms: reduce variants of the same word to a common root
2. string distance metrics: measure the difference between the character strings of the tags
3. exploiting online resources: to check the correct spelling
Clustering similar tags:
1. co-occurrence: tags that co-occur on the same resource
2. Specia & Motta (2007): according to this method, each cluster has a tag, and "a tag is added only if it has a similarity value above a given threshold with all the other tags in the cluster." More specifically, (a) there is a statistical analysis of the folksonomy, (b) the tags are disambiguated with the help of WordNet, Wikipedia or other thesauri, (c) clustering takes place by grouping relevant terms.
3. Begelman (2006): after defining a way to identify the 'strong' tags, "they calculate the cut off frequency of co-occurrence between two tags by looking for a disruption point in the distribution, for each tag, the threshold above which is co-occurring with it".
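
My note: a sketch of the threshold rule quoted from Specia & Motta, where a tag joins a cluster only if its similarity with every tag already in the cluster is above the threshold. The similarity used here (Jaccard overlap of the resources each tag annotates) and the greedy one-pass strategy are my assumptions, not the authors' method, and the data is made up.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets of resources."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_tags(tag_resources, threshold):
    """Greedy clustering: add a tag to the first cluster whose members all exceed the threshold."""
    clusters = []
    for tag in sorted(tag_resources):
        for cluster in clusters:
            if all(jaccard(tag_resources[tag], tag_resources[m]) > threshold for m in cluster):
                cluster.append(tag)
                break
        else:
            # No existing cluster accepted the tag, so it starts its own.
            clusters.append([tag])
    return clusters


# Made-up mapping from each tag to the resources it was applied to.
tag_resources = {
    "car": {"r1", "r2", "r3"},
    "vehicle": {"r1", "r2"},
    "auto": {"r2", "r3"},
    "recipe": {"r9"},
}

print(cluster_tags(tag_resources, 0.3))
print(cluster_tags(tag_resources, 0.5))
```

Raising the threshold splits "vehicle" off into its own cluster, because it overlaps with "auto" on only one of three resources.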

Semantically enriching folksonomies:

Weller & Peters, 2008: collaboration between taggers - we have this article, seeding, weeding, fertilizing
Gruber, 2005: tag the tags to state that a tag is a synonym of another tag - we have this article, Ontology of folksonomies: tidying up things
Tanasescu & Streibel (2007) used Gruber's suggestion and, apart from tagging the tags, added the extra component of expressing the relationship between the tags. Read & annotated.
Huynh-Kim-Bang et al. (2008) suggested that users should be the ones adding the semantic information of the tags. They suggested this model for teachers' communities. Read & annotated.

Ontologies for modeling folksonomies and online communities

Infrastructure for linking tags with ontologies: the MOAT ontology "allows users to link the tags they use with a resource which represents the meaning of the tag. The semantic connection is made on the tagging and not merely on the tag itself". Read & annotated.
Semantically Interlinked Online Communities (SIOC): gives developers of social web platforms ways to describe the resources exchanged "within and across communities". It builds on other ontologies, such as the Simple Knowledge Organization System (SKOS).

Ontology process (Braun et al., 2007).

combination of terms used by the taggers in the same subject
point out the concepts and the semantic relationships
combine the semantic relations between the shared concepts (=axiomatize)

Chu, H.J., Randy, Y.C, Chen, S.S. (2005). Semantic association of taxonomy-based standards using ontology.

Article saved as: chu_2005

Formalization of taxonomy: the semantics of the text should follow these rules:

"the attributes being used for classification under the general perception in the application domain"
"the entities under the inheritance of the taxonomy and the attributes"

Ontology development from taxonomy
Step 1: relation set identification: the purpose here is to set the "orthogonal relations for a given taxonomy standard so that assumed domain knowledge and complex concepts can be formally specified". There are two types of relationships: (a) primitive relationships: these hold between terms whose definition is clear to the public, and they do not change through time. The relationships must be clear and must take the form "instance-instance", "instance-class", or "class-class". (b) derived relationships: relations that can be generated from the primitive relationships.
Step 2: relation statements construction: this step builds on the first one. It takes the relationships identified in step 1 and connects them with the words of the taxonomy.
Step 3: normalization: this step again builds on the results of the previous two steps. Three gardening rules need to be applied here: (a) "remove the same or equivalent statements", (b) "remove asymmetric properties" and (c) "remove transitive properties".
Step 4: semi-automatic generalization: this is the last step, and its purpose is to "relate the concepts of step 3 to a higher-level concepts connected by the same set of relations."

MOAT (Meaning Of A Tag) Project: it allows users to define the semantics of their tags using semantic web resources. Users can tag a picture or a document in del.icio.us and then, using MOAT, connect the tags with a URL that is a semantic web resource. This combination of the tag and the URL happens in MOAT. I see a problem with this method, though. MOAT gives examples of tags and the URLs they are connected to, and the URLs were dead. Given the typical lifespan of a webpage, it is not reliable to make the whole semantics of a tag depend on a URL that may or may not exist in a month. This connection can only work if we create a database in which we provide all these semantic web resources and we are sure that we will keep the pages alive.

Tanasescu, V. & Streibel, O. (2007). Extreme tagging: emergent semantics through the tagging of tags. In ESOE at ISWC.

Article saved as: tanasescu_2007

The authors use the term Extreme Tagging Systems to describe the process of collaborative tagging and the expression of relationships between tags. Its main purpose is not to create a hierarchical ontology, but to give the semantic association between tags, which can be "automatically controlled through social network regulation mechanisms".

The authors base the relationship between the users and the tags on the widely used triple {U}, {R} and {T}. When a user tags, for example, a picture that depicts a car, the tags vary, e.g. car, vehicle, four wheels, trip. The authors suggest that users, in addition to tagging the document (in this case the picture), should also tag the tags they are using. This is called a semantic association, and "two entities are semantically associated if they are semantically connected, i.e. there exist a path of relations between them". For example, "two entities are similar if a path from the first one to another is similar to the path from the second to another". Combined with the semantic associations, the triples express the relations clearly in RDF.
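
My note: the "path of relations" definition can be sketched as a reachability check over (subject, relation, object) triples. The relation names below are made up, and following relations in both directions is my own assumption; the notes do not say whether the paper treats paths as directed.

```python
from collections import deque


def semantically_associated(triples, start, goal):
    """Breadth-first search: is there any path of relations between start and goal?"""
    neighbours = {}
    for subject, _relation, obj in triples:
        # Assumption: a relation can be followed in either direction.
        neighbours.setdefault(subject, set()).add(obj)
        neighbours.setdefault(obj, set()).add(subject)
    queue, seen = deque([start]), {start}
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Made-up tagged tags for the car picture example.
triples = [
    ("car", "is_a", "vehicle"),
    ("vehicle", "has", "four wheels"),
    ("trip", "made_with", "car"),
]

print(semantically_associated(triples, "trip", "four wheels"))
print(semantically_associated(triples, "trip", "boat"))
```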

Tagopedia is a Facebook applet. It is similar to other bookmarking services, but it has an extra component not found in them. When users tag a resource, they are asked to tag a relation or to choose a relation to tag. There is an image on page 7 of the PDF article: it shows a text field with a URL and, right next to it, a drop-down menu. One option of this menu is "screen play by", and next to it there is another text field where users can add the extra information. This drop-down menu is therefore used to define the relationship between the tagged document and the tag itself, and so to create the semantics of the tag.

During the tagging process users follow three steps:

Annotation: users tag documents, building their own list of tags; they don't do this in isolation, but can relate their tags to other users' tags.
Navigation: this is the step where semantic associations are built. When a user tags a document, he may be the first tagger, in which case he also has to build the relation; otherwise another user has already tagged the same document and the relation exists (Personal note: who decides if this relation is right or wrong?). On that question, the article assumes that the user knows the relationship, either because he has used the same tag before or because he chooses to tag it when asked to do so.
Control: these are the aids that help the system evolve:

total control over one's annotations: a person's annotations belong to that person and can be manipulated by that person.
appreciation and depreciation of tags: a plus or minus sign can be applied to a tag. This ranking helps with the next step. When a tag gets a lot of negative votes, it becomes private and is considered an outlier; it is therefore not included in the overall tagging system for a resource.
questions to author: since Facebook makes it easy to gather users into communities, one user can find another user and ask for clarification of the tags used.

Huynh-Kim-Bang, B., Dané, É., Grandbastien, M. (2008). Merging semantic and participative approaches for organizing teachers' documents. ED-Media '08 - World Conference on Educational Multimedia, Hypermedia & Telecommunications, Vienna, Austria.

Article saved as: huynh_2008

The authors add an applet to the bookmarking tool Scuttle. When a user bookmarks a URL in Scuttle, the URL is added to the list saved in Scuttle, in the most recently tagged section. Users can browse the bookmarks through a list that appears on the right-hand side of the Scuttle webpage. The resulting application, SemanticScuttle, allows users to create relationships between tags. These are either relationships of inclusion, written with the symbol ">", or relationships of equality, written with the equal sign "=".

One user context

The relationships are user-created. For instance, a user tagging a document on Airbus could define the tags as: airplane>airbus>a380, test>breaking, type>video. This way the system can later interpret these relationships. The system also remembers whether a user has used a tag in the singular or the plural in the past. When the user later uses the plural form of a tag, the system automatically creates the relation test=tests, because the root of the word is the same.

In addition, when trying to create an inclusion relationship, some users may create an inclusion cycle, for example airplane>airbus>airplane. The system informs the user about the mistake and makes him correct it.
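
My note: a sketch of how the ">" and "=" notation and the inclusion-cycle check could be handled. This is not SemanticScuttle's code; the function names are made up, and the parser assumes a structured tag uses only one of the two symbols.

```python
def parse_structured_tag(text):
    """Split 'a>b>c' into inclusion pairs and 'a=b' into equality pairs."""
    inclusions, equalities = [], []
    if ">" in text:
        parts = [p.strip() for p in text.split(">")]
        inclusions = list(zip(parts, parts[1:]))
    elif "=" in text:
        parts = [p.strip() for p in text.split("=")]
        equalities = list(zip(parts, parts[1:]))
    return inclusions, equalities


def creates_cycle(existing, new_pairs):
    """True if adding new_pairs to the existing (parent, child) pairs yields an inclusion loop."""
    children = {}
    for parent, child in list(existing) + list(new_pairs):
        children.setdefault(parent, set()).add(child)

    def reachable(start, target, seen):
        # Depth-first walk down the inclusion relation looking for target.
        for child in children.get(start, ()):
            if child == target:
                return True
            if child not in seen:
                seen.add(child)
                if reachable(child, target, seen):
                    return True
        return False

    return any(reachable(node, node, set()) for node in children)


good, _ = parse_structured_tag("airplane>airbus>a380")
bad, _ = parse_structured_tag("airplane>airbus>airplane")
print(good, creates_cycle([], good))
print(bad, creates_cycle([], bad))
print(parse_structured_tag("test=tests"))
```

A tool built on this would reject the second input and ask the user to correct it, as described above.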

Multi user context

The system has no function to display the tags of all users in one area of the bookmarking interface (my note: Connotea works the same way). But the tag aggregation function groups tags easily: for "classic tags if the users have used two tags with the same name the system considers it as one tag, and regroups the bookmarks under this one". The system treats structural tags and their relationships the same way. When users use the same tag as a "definition tag" but create different relationships for it, these relations are all filed under the same tag.

Structural tags display

Scuttle behaves the same way as the most popular bookmarking services: it creates clouds of the most recent and of the most used tags. In addition, the system offers three further ways of ordering structural tags:

Number of descendants: "tags are sorted according to the number of tags they include".
Length of branches: "tags are sorted according to the length of their longest branches". For example, a main tag with three other tags nested under it is sorted at length 3.
Number of updates: "tags are sorted according to the number of updates of its branches. Anytime a tag A is added to or removed from a tag B, then all the tags which include tag B increments one update".
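
My note: a sketch of the first two orderings, assuming the inclusion relations are stored as a parent -> children mapping without cycles. Counting branch length as the number of inclusion steps below a tag is my own reading of the description; the tag data is made up.

```python
def descendants(children, tag):
    """All tags included, directly or indirectly, under the given tag."""
    found = set()
    stack = list(children.get(tag, ()))
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(children.get(node, ()))
    return found


def longest_branch(children, tag):
    """Number of inclusion steps on the longest path below tag (assumes no cycles)."""
    kids = children.get(tag, ())
    return 0 if not kids else 1 + max(longest_branch(children, k) for k in kids)


# Made-up inclusion relations: airplane>airbus>a380, airplane>boeing, type>video.
children = {"airplane": ["airbus", "boeing"], "airbus": ["a380"], "type": ["video"]}

by_descendants = sorted(children, key=lambda t: len(descendants(children, t)), reverse=True)
by_branch = sorted(children, key=lambda t: longest_branch(children, t), reverse=True)
print(by_descendants)
print(by_branch)
```

The third ordering, number of updates, would need a change log of additions and removals, which is not modelled here.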

McCulloch, E. & Macgregor, G. (2008). Analysis of equivalence mapping for terminology services. Journal of Information Science, 34(70), 70-94

Article saved as: mcculloh_2008

The Simple Knowledge Organization System (SKOS) is combined with RDF to exchange information that will be machine-processed. The SKOS mapping vocabulary proposes the following properties: exactMatch, broadMatch, narrowMatch, majorMatch, and minorMatch. SKOS mapping also supports AND, OR and NOT combinations, for instance to express a mapping that involves more than one concept, e.g. 'health services' AND 'administration'.

Miles, A. (2007). SKOS: Simple Knowledge Organization for the Web. Cataloging & Classification Quarterly, 43(3/4), p69-83.

Article saved as: milles_2007

SKOS is used to represent online controlled structured vocabularies, such as (copy & paste from the text):

Thesauri broadly conforming to the ISO 2788:1986 guidelines, such as the UK Archival Thesaurus (UKAT, 2004), the General Multilingual Environmental Thesaurus (GEMET), and the Art and Architecture Thesaurus (AAT) (ISO 5964:1985).
Classification schemes such as the Dewey Decimal Classification (DDC), the Universal Decimal Classification (UDC), and the Bliss Classification (BC2).
Subject heading systems such as the Library of Congress Subject Headings (LCSH) and the Medical Subject Headings (MeSH).

The SKOS structure is the following:

skos:Concept: this is the equivalent of a class in ontologies. The whole structured vocabulary is built from concepts.
skos:prefLabel: this property gives a concept its preferred label. The label can carry a language tag, such as "en" for English.
skos:ConceptScheme: this is the umbrella element for all concepts.
skos:broader, skos:narrower: these two properties express the semantic relation between concepts. skos:broader is used to express generalization and skos:narrower is used to express specialization.
skos:inScheme: this property relates concepts to the concept scheme they belong to.

In the end, the SKOS vocabulary is expressed in RDF.
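
My note: to see how these elements fit together, here is a small Python sketch that prints a two-concept vocabulary in Turtle (one of the RDF syntaxes). The http://example.org/vocab/ namespace, the concept names and the helper function are made up for illustration; only the skos: terms come from the SKOS vocabulary.

```python
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex: <http://example.org/vocab/> .\n"
)


def concept_to_turtle(concept_id, pref_label, scheme, broader=None, lang="en"):
    """Render one skos:Concept as a Turtle statement (hypothetical helper)."""
    lines = [
        f"ex:{concept_id} a skos:Concept ;",
        f'    skos:prefLabel "{pref_label}"@{lang} ;',
    ]
    if broader:
        lines.append(f"    skos:broader ex:{broader} ;")
    # The property is spelled skos:inScheme, with a lower-case "i".
    lines.append(f"    skos:inScheme ex:{scheme} .")
    return "\n".join(lines)


print(PREFIXES)
print("ex:transport a skos:ConceptScheme .")
print(concept_to_turtle("airplane", "airplane", "transport"))
print(concept_to_turtle("airbus", "Airbus", "transport", broader="airplane"))
```

Building the text by hand keeps the sketch free of dependencies; a real project would more likely use an RDF library to produce and validate the output.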

The classes in SKOS should not be confused with the classes in OWL. In SKOS the word "class" is used to express the classification of a term among the other terms, while in OWL a class describes a concept in a domain.

Laniado, D., Eynard, D., Colombetti, M. (2007). Using WordNet to turn a folksonomy into a hierarchy of concepts.

Article saved as: laniado

The authors take tags from del.icio.us and use WordNet to create a hierarchy of these tags. In their effort to map the tags, they encountered some problems. Some tags were not recognized by WordNet, so they could not be mapped: of the total number of tags gathered (N=480,000), only a small percentage (8%) was recognized by WordNet. Another problem was that, even though some words used as tags were included in WordNet, their noun form was not.

What the authors propose is an application that extracts the tags from the HTML pages of del.icio.us and automatically creates a structured vocabulary using WordNet.

The problems encountered in this process are:

polysemy: many words have more than one meaning, so they belong to more than one synset (synonym set). For instance, the word turkey may mean the country or the poultry. This double meaning makes the word belong to at least two synsets.
To disambiguate such tags, it is important to look at the rest of the tags on the same entry and understand the general concept behind the full set of tags used.

The authors propose an algorithm that extracts the tags from del.icio.us and automatically performs the disambiguation and the mapping to WordNet words.
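
My note: a toy sketch of disambiguating a tag from the other tags on the same entry. It is not the authors' algorithm and it does not call WordNet. The SENSES dictionary is a made-up stand-in for synsets, and the rule (pick the sense whose related words overlap most with the context tags) is a simplified overlap heuristic.

```python
# Hypothetical mini sense inventory standing in for WordNet synsets.
SENSES = {
    "turkey": {
        "turkey.country": {"country", "istanbul", "travel", "europe", "asia"},
        "turkey.bird": {"bird", "poultry", "recipe", "thanksgiving", "food"},
    }
}


def disambiguate(tag, context_tags, senses=SENSES):
    """Return the sense sharing the most words with the context tags, or None."""
    candidates = senses.get(tag)
    if not candidates:
        return None  # the tag is unknown to the inventory
    context = {t.lower() for t in context_tags} - {tag}
    best = max(candidates, key=lambda s: len(candidates[s] & context))
    # Without any overlap there is no evidence for either sense.
    return best if candidates[best] & context else None


print(disambiguate("turkey", ["turkey", "recipe", "food"]))
print(disambiguate("turkey", ["travel", "istanbul"]))
print(disambiguate("turkey", ["misc"]))
```

Tags missing from the inventory return None, which mirrors the unmapped-tag problem noted above.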
