ONTOLOGIES: Difference between revisions

Latest revision as of 10:28, 12 November 2009

Nancy's project main page

Summer '09

Noy, N.F. & McGuinness, D.L. (2001). Ontology Development 101: A guide to creating your first ontology.

Article saved as: noy_1st_guide

Resource Description Framework (RDF): a language for encoding knowledge on Web pages to make it understandable to electronic agents searching for information. The Defense Advanced Research Projects Agency (DARPA), in conjunction with the W3C, is developing the DARPA Agent Markup Language (DAML) by extending RDF with more expressive constructs aimed at facilitating agent interaction on the Web.

Ontology → classes/concepts (descriptions of the concepts in a domain of discourse) → slots/roles/properties (properties of each concept describing its various features and attributes) → role restrictions or facets (restrictions on slots). An ontology together with a set of individual instances of classes constitutes a knowledge base. There is a fine line where the ontology ends and the knowledge base begins.

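Classes are the focus of most ontologies; they describe concepts in the domain. For example, the class of wines represents all wines, and specific wines are instances of this class. A class can have subclasses that represent more specific concepts: wines can be divided into red, white and rosé wines, or into sparkling and non-sparkling wines. Slots describe properties of classes and instances, e.g. the wine Chateau Lafite Rothschild Pauillac has a full body and is produced by the Chateau Lafite Rothschild winery.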

Creating an ontology (there is no single correct way of creating ontologies):

define classes in an ontology
arrange the classes in a taxonomic hierarchy
define slots and describe values for slots
fill in the slot values for instances

Steps:

Determine the domain and scope of the ontology. Competency questions: one of the ways to determine the scope of the ontology is to sketch a list of questions that a knowledge base based on the ontology should be able to answer.
Consider re-using existing ontologies
Enumerate important terms in the ontology. Write down a list of terms that you would like to make statements about or explain to a user: what are the terms we would like to talk about, and what properties do these terms have?
Define the classes and the class hierarchy, using one of three approaches:
1. Top-down: development starts with the definition of the most general concepts in the domain, followed by their specialization.
2. Bottom-up: This process starts with the definition of the most specific classes, the leaves of the hierarchy, with subsequent grouping of these classes into more general concepts.
3. Combination: a combination of the top-down and bottom-up approaches.
Define the properties of classes (slots). Once we have defined some of the classes, we must describe the internal structure of the concepts. Several types of object properties can become slots in an ontology:
1. Intrinsic
2. Extrinsic
3. Parts, if the object is structured
4. Relationships to other individuals

Define the facets of the slots. Slots can have different facets describing the value type, the allowed values and the number of values.
1. Slot cardinality: defines how many values a slot can have.
2. Slot-value type: describes what types of values can fill the slot. The common value types are:
(a) String: the simplest value type
(b) Number: describes slots with numeric values
(c) Boolean: simple “yes” or “no” flags
(d) Enumerated: specifies a list of allowed values for the slot
(e) Instance: slots of this type allow definitions of relationships between individuals. The allowed classes for slots of type Instance are often called the range of the slot.
Create instances of classes in the hierarchy
1. Choose a subclass
2. Create an individual instance of this class
3. Fill in the slot values
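To make the steps above concrete, here is a minimal Python sketch of classes, slots with facets, and instances. It is our own illustration, not how Protégé stores ontologies: `OntologyClass`, `Slot`, `Instance` and `fill` are hypothetical names, only single inheritance is modeled, and the allowed values for color and body are assumed for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(eq=False)
class OntologyClass:
    name: str
    superclass: Optional["OntologyClass"] = None
    slots: list = field(default_factory=list)

    def all_slots(self) -> list:
        # A subclass inherits every slot of its superclasses.
        inherited = self.superclass.all_slots() if self.superclass else []
        return inherited + self.slots

    def is_a(self, other: "OntologyClass") -> bool:
        # Walk up the superclass chain (the subclass relation is transitive).
        cls = self
        while cls is not None:
            if cls is other:
                return True
            cls = cls.superclass
        return False


@dataclass(eq=False)
class Slot:
    name: str
    value_type: Any                       # str, int, float, bool, or an OntologyClass (Instance slot)
    max_cardinality: Optional[int] = 1    # None means any number of values
    allowed_values: Optional[set] = None  # Enumerated slot: the allowed values


@dataclass(eq=False)
class Instance:
    name: str
    cls: OntologyClass
    values: dict = field(default_factory=dict)

    def fill(self, slot_name: str, *vals) -> None:
        """Fill in a slot after checking the values against the slot's facets."""
        slot = next((s for s in self.cls.all_slots() if s.name == slot_name), None)
        if slot is None:
            raise KeyError(f"{self.cls.name} has no slot {slot_name!r}")
        if slot.max_cardinality is not None and len(vals) > slot.max_cardinality:
            raise ValueError(f"{slot_name}: at most {slot.max_cardinality} value(s)")
        for v in vals:
            if isinstance(slot.value_type, OntologyClass):
                # Instance slot: the value must be an instance of the range class.
                ok = isinstance(v, Instance) and v.cls.is_a(slot.value_type)
            else:
                ok = isinstance(v, slot.value_type)
            if not ok:
                raise TypeError(f"{slot_name}: value of the wrong type")
            if slot.allowed_values is not None and v not in slot.allowed_values:
                raise ValueError(f"{slot_name}: {v!r} is not an allowed value")
        self.values[slot_name] = list(vals)


# Classes and slots (value sets are illustrative)
winery = OntologyClass("Winery")
wine = OntologyClass("Wine", slots=[
    Slot("color", str, allowed_values={"red", "white", "rosé"}),
    Slot("body", str, allowed_values={"light", "medium", "full"}),
    Slot("maker", winery),  # Instance slot whose range is Winery
])
red_wine = OntologyClass("Red wine", superclass=wine)

# 1. Choose a subclass, 2. create an individual instance of it
lafite = Instance("Chateau Lafite Rothschild", winery)
pauillac = Instance("Chateau Lafite Rothschild Pauillac", red_wine)

# 3. Fill in the slot values; values that violate a facet raise an error
pauillac.fill("color", "red")
pauillac.fill("body", "full")
pauillac.fill("maker", lafite)
```

Checking the facets at the moment a slot is filled mirrors the idea that facets restrict which values a slot may take; a call such as `pauillac.fill("color", "blue")` would raise a `ValueError`.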

Defining hierarchy

Ensure that a class hierarchy is correct

“is-a” relation: a class A is a subclass of B if every instance of A is also an instance of B.
Singular and plural: it is not correct to include both the singular and the plural form of a concept in the hierarchy, making one a subclass of the other. Use either the singular or the plural.
A subclass relationship is transitive: if B is a subclass of A and C is a subclass of B, then C is a subclass of A.
Classes and their names: distinguish between a class and its name. Classes represent concepts in the domain, not the words that denote these concepts, so synonyms for the same concept do not represent different classes.
Avoid class cycles: there is a cycle in the hierarchy when some class A has a subclass B and, at the same time, B is a superclass of A.
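The transitivity and cycle rules above can be checked mechanically. A minimal Python sketch, assuming the hierarchy is stored as a dictionary from each class name to the set of its direct superclasses (so multiple inheritance fits too); the function names and the toy classes are hypothetical:

```python
def ancestors(cls, parents):
    """Return every direct and indirect superclass of cls.

    parents maps each class name to the set of its direct superclasses.
    Because the subclass relation is transitive, we keep walking upwards
    until no new superclass appears.
    """
    seen, stack = set(), list(parents.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, ()))
    return seen


def classes_in_cycles(parents):
    """A class lies on a cycle if it turns out to be its own (indirect) superclass."""
    return {cls for cls in parents if cls in ancestors(cls, parents)}


# Toy hierarchy (illustrative names)
hierarchy = {
    "Wine": {"Drink"},
    "Red wine": {"Wine"},
    "Bordeaux": {"Red wine"},
}
print(sorted(ancestors("Bordeaux", hierarchy)))             # ['Drink', 'Red wine', 'Wine']
print(classes_in_cycles(hierarchy))                         # set()
print(sorted(classes_in_cycles({"A": {"B"}, "B": {"A"}})))  # ['A', 'B']
```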

Analyze siblings in the class hierarchy

Siblings in a class hierarchy are classes that are direct subclasses of the same class. All the siblings in the hierarchy must be at the same level of generality.
There are no hard rules for how many direct subclasses a class should have, but many well-structured ontologies have between two and a dozen direct subclasses. The guidelines are: (a) if a class has only one direct subclass, there may be a modeling problem or the ontology is not complete; (b) if a class has more than a dozen subclasses, additional intermediate categories may be necessary.
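A rough sketch of the sibling guidelines as an automatic check, assuming the same hypothetical class-to-direct-superclasses dictionary; the threshold of twelve simply encodes the “dozen” rule of thumb and is not a hard rule:

```python
from collections import defaultdict


def sibling_warnings(parents, max_children=12):
    """Flag classes whose number of direct subclasses falls outside the guideline."""
    children = defaultdict(set)
    for child, direct_parents in parents.items():
        for parent in direct_parents:
            children[parent].add(child)

    warnings = {}
    for cls, subclasses in children.items():
        if len(subclasses) == 1:
            warnings[cls] = "only one direct subclass: modeling problem or incomplete ontology?"
        elif len(subclasses) > max_children:
            warnings[cls] = f"{len(subclasses)} direct subclasses: add intermediate categories?"
    return warnings


hierarchy = {
    "Red wine": {"Wine"},
    "White wine": {"Wine"},
    "Rosé wine": {"Wine"},
    "Wine": {"Drink"},
}
print(sibling_warnings(hierarchy))
# {'Drink': 'only one direct subclass: modeling problem or incomplete ontology?'}
```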

Multiple inheritance: a class can be a subclass of several classes.
When to introduce a new class or not: subclasses of a class usually (1) have additional properties that the superclass does not have, (2) have restrictions different from those of the superclass, or (3) participate in different relationships than the superclass.
A new class or a property value: when modeling a domain, we often need to decide whether to model a specific distinction (such as white, red or rosé wine) as a property value or as a set of classes. Do we create a class White wine, or do we simply create a class Wine and fill in different values for the slot color? The answer depends on the scope of the domain and the task at hand, and usually lies in the scope that we defined for the ontology.

If the concepts with different slot values become restrictions for different slots in other classes, then we should create a new class for the distinction. Otherwise, we represent the distinction in a slot value.
If a distinction is important in the domain and we think of the objects with different values for the distinction as different kinds of objects, then we should create a new class for the distinction.
A class to which an individual instance belongs should not change often.

An instance or a class:

Individual instances are the most specific concepts represented in a knowledge base.
If concepts form a natural hierarchy, then we should represent them as classes

Limiting the scope: As a final note on defining a class hierarchy, the following set of rules is always helpful in deciding when an ontology definition is complete:

The ontology should not contain all the possible information about the domain: you do not need to specialize (or generalize) more than you need for your application (at most one extra level each way).

Classes are disjoint if they cannot have any instances in common

Defining properties

Inverse slots: the value of a slot may depend on the value of another slot. For example, if a wine's maker is a winery, then that winery produces the wine; maker and produces are inverse slots.
Default values: if a particular slot value is the same for most instances of a class, we can define this value to be the default value for the slot. Then, when each new instance of a class containing this slot is created, the system fills in the default value automatically. We can then change the value to any other value that the facets allow.
What's in a name: define a naming convention for classes and slots and adhere to it. E.g. in Protégé you cannot have both a class winery and a slot winery, but you can have a class Winery and a slot winery, because Protégé is case-sensitive.
Capitalization and delimiters:

Use space between two words, Protégé allows it, e.g. Red wine
Run the words together and capitalize each word, e.g. RedWine
Use an underscore or a dash, e.g. Red_Wine, Red-Wine

Singular or plural: a class name usually represents a collection of objects, so a plural name may seem more natural. Whichever form is chosen, it should be used consistently throughout the whole ontology.
Prefix and suffix conventions: two common practices for slot names are to add the prefix “has” or the suffix “of”, e.g. has-maker or maker-of.
Other naming considerations: (a) avoid abbreviations, (b) do not add strings such as “class”, “property”, “slot”, to the concept names

Names of direct subclasses of a class should either all include or not include the name of the superclass.
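Inverse slots and default values can be illustrated with a few lines of Python. This is a hypothetical sketch, not Protégé behavior: `Frame`, `new_instance`, `add_value`, the inverse pair maker/produces as a lookup table, and the default body value are our own assumptions for the example.

```python
from dataclasses import dataclass, field

# Hypothetical tables: "maker" (Wine -> Winery) and "produces" (Winery -> Wine)
# are declared as inverses of each other.
INVERSE_SLOTS = {"maker": "produces", "produces": "maker"}
WINE_DEFAULTS = {"body": "full"}   # assumed default: most wines in our KB are full-bodied


@dataclass(eq=False)
class Frame:
    """An individual instance; each slot holds a list of values."""
    name: str
    slots: dict = field(default_factory=dict)


def new_instance(name, defaults=None):
    # Default values are filled in automatically when the instance is created;
    # they can later be overwritten with any value the facets allow.
    return Frame(name, {slot: [value] for slot, value in (defaults or {}).items()})


def add_value(frame, slot, target):
    """Add target as a value of frame.slot and keep the inverse slot in sync."""
    frame.slots.setdefault(slot, []).append(target)
    inverse = INVERSE_SLOTS.get(slot)
    if inverse is not None:
        target.slots.setdefault(inverse, []).append(frame)


winery = new_instance("Chateau Lafite Rothschild")
wine = new_instance("Chateau Lafite Rothschild Pauillac", WINE_DEFAULTS)
add_value(wine, "maker", winery)
print(wine.slots["body"])                          # ['full']
print([w.name for w in winery.slots["produces"]])  # ['Chateau Lafite Rothschild Pauillac']
```

Keeping the inverse slot in sync at write time means the knowledge base never records a maker without the matching produces value.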

Ducheyne, S. (2009). “To treat of the world”: Paul Otlet’s ontology and epistemology and the circle of knowledge. Journal of Documentation, 65(2), 223-244.

Article saved as: ducheyne_2009

This is a good philosophical article about the construction of information. It talks about ontologies, but in the sense the ancient Greeks gave to the word, not the sense in which it is used for folksonomies and computational ontologies.

Good, B.M. & Tennis, J.T. (2009). Term based comparison metrics for controlled and uncontrolled indexing languages. Information Research, 14(1).

Article saved as: good_2009

Not a relevant article for the OATP, but informative enough as an introduction to how ontologies work. This research compares the vocabularies created by folksonomies with those of indexing languages, such as the Medical Subject Headings (MeSH). The study investigated the tags that users in the health sciences apply in the following social bookmarking tools: Connotea, Bibsonomy and CiteULike.

The paper defines two types of measures for comparing terms:

Intra-term set measures

(1) Observed Linguistic Precoordination: whether a term is composed of several words joined by syntactic separators. Terms are characterized as uniterms (one word), duplets (two words), triplets (three words), or quadruplets and higher (four or more words).

(2) Compositionality, counts (a) the number of terms that are a combination of a term and another main term which is tied to the first one, (b) the number of terms that are a combination of a term and another proper term which is tied to the first one, (c) different terms that complete one another and (d) terms composed with each of the terms.

These intra-term set measures can be applied to any set of terms independently.
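The precoordination measure can be approximated by counting the words in each term. A minimal sketch, assuming that whitespace, underscores, hyphens, dots and slashes count as syntactic separators (the paper's own rules may differ); the example tags are made up:

```python
import re
from collections import Counter

# Assumed separators; the paper's exact list of syntactic separators may differ.
SEPARATORS = re.compile(r"[\s_\-./]+")
LABELS = {1: "uniterm", 2: "duplet", 3: "triplet"}


def word_count(term):
    return len([w for w in SEPARATORS.split(term.strip()) if w])


def precoordination_profile(terms):
    """Count how many terms are uniterms, duplets, triplets or quadruplets+."""
    profile = Counter()
    for term in terms:
        n = word_count(term)
        if n == 0:
            continue  # ignore empty tags
        profile[LABELS.get(n, "quadruplet+")] += 1
    return dict(profile)


tags = ["ontology", "social_bookmarking", "gene-expression-profiling",
        "Medical Subject Headings", "protein folding kinetics model"]
print(precoordination_profile(tags))
# {'uniterm': 1, 'duplet': 1, 'triplet': 2, 'quadruplet+': 1}
```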

Inter-term set measures

These measures allow a better comparison between term sets. In this paper the terms were considered to have no hierarchy: their relationships are identical, they all belong to the same level, and there are no synonyms, hyponyms or polysemes.

The first research question is “if the terms from folksonomies are shaped differently than the terms from controlled vocabularies”; the answer is “yes”. The second research question is: “how much direct overlap exists between terms from Connotea, Bibsonomy and CiteULike folksonomies, and terms from MeSH? These folksonomies are used to describe tens of thousands of the same resources as MeSH, hence we expect some overlap in representation, but how much is there in reality?” Connotea cannot accurately depict how users tag their web pages, since it allows users to import data from other indexing systems, such as MEDLINE. In addition, Connotea allows tags of two or more words separated by spaces, while Bibsonomy and CiteULike do not allow spaces within tags, so there is no clear indication of how many compound terms Bibsonomy and CiteULike contain. When the terms from these bookmarking tools were compared with MeSH, Connotea had the most tags overlapping with MeSH, nearly 36% of the terms.
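A direct-overlap figure such as the 36% reported for Connotea is a simple set computation. The sketch below assumes exact matching after lower-casing and collapsing whitespace; the paper's actual matching procedure may be different, and the terms are toy data, not results from the study:

```python
def normalize(term):
    # Assumption: case and extra whitespace are ignored when matching terms.
    return " ".join(term.lower().split())


def direct_overlap(tags, vocabulary):
    """Share of distinct tags that exactly match a term in the controlled vocabulary."""
    tag_set = {normalize(t) for t in tags}
    vocab_set = {normalize(v) for v in vocabulary}
    if not tag_set:
        return 0.0
    return len(tag_set & vocab_set) / len(tag_set)


# Toy data, not taken from the study
folksonomy_tags = ["Neoplasms", "bioinformatics", "to-read", "brain"]
mesh_terms = ["Neoplasms", "Brain", "Computational Biology"]
print(direct_overlap(folksonomy_tags, mesh_terms))   # 0.5
```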

Lund, B., Hammond, T., Flack, M. & Hannay, T. (2005). Social bookmarking tools (II): a case study - Connotea. D-Lib Magazine, 11(4). Retrieved 26 February, 2009

Article saved as: lund_2005

This is a basic article that describes how Connotea works. It is not very useful for our project, but it is useful for someone who wants to know how Connotea works. The article also gives information about the open-source software.

This article describes Connotea’s features: (1) online storage for references and bookmarks, (2) simple, non-hierarchical organization, (3) opening one’s list to others, and (4) auto-discovery of bibliographic information. The key features of Connotea are: (a) the bookmarklets, which are small scripts of JavaScript code; (b) it recognizes whether a link belongs to a known set of URLs, e.g. if an academic article is bookmarked, Connotea recognizes the author, publication name, issue number, etc.; (c) tagging; (d) user comments; and (e) generation of RSS feeds.

Special Interest Group in Classification Research, Austin, Texas. Retrieved 26 February, 2009

Article saved as: feinberg_2006

This article is a theoretical view of what social bookmarking is. Section 4 in particular is a good source if an introductory literature review on social bookmarking and the hive mind is needed. The literature review compares the kinds of tags that many users create with the kinds of terms librarians choose to use as descriptors. The author suggests that the tags created by a group of people form a semantic vocabulary that would need a controlled vocabulary as its base, but questions who would be in a position to create such a vocabulary.

Tennis, J. T. (2006). Social tagging and the next steps for indexing. In J. Furner & J.T. Tennis, (Eds.), Proceedings of the 17th Workshop of the American Society for Information Science and Technology Special Interest Group in Classification Research, Austin, Texas.

Article saved as: tennis_2006

This paper compares social tagging with subject cataloguing and includes a literature review of the theory of both. The literature review notes a relationship between ontologies and classification (Vickery, 1997; Soergel, 1999). The author suggests that, when social tagging is concerned, one has to look at the purpose of the system and the purposes for which people tag. In many cases tagging seems to be personal, influenced by the tagger's personality and approach, while cataloguing is “an act of delegation mediated by institutions, such as LoC Subject Headings”. Findings show that many social taggers follow a habitual behavior when they tag, while Huberman (2006) showed the opposite. In general (Table 6), social tagging has no control over the terms used and no coordination between the taggers. The authority in the tagging system is personal for folksonomies (Table 7), while indexes have an institutional authority.

Ontology making principles of Open Biomedical Ontologies

This page analyzes the Open Biomedical Ontologies standards. According to the standards, an ontology: (a) must be open and available to be used by all, (b) is expressed in a common shared syntax, (c) has a unique identifier space, (d) has a provider who identifies distinct successive versions, (e) has clearly specified and delineated content, (f) includes textual definitions for all terms, (g) uses relations defined following the OBO Relation Ontology, (h) is well documented, (i) has independent users, and (j) is created collaboratively.

Gruber, T. (2007). Ontology of folksonomy: a mash-up of apples and oranges. International Journal of Semantic Web and Information Systems, 3(2).

Article saved as: gruber_2007

Tagging allows: (a) collaborative tagging across multiple applications, e.g. Flickr and Del.icio.us; (b) collaborative filtering based on tagging: clarifying the tags’ meaning, identifying vague concepts, finding commonalities and reasoning about incompatibilities.

TagOntology is about identifying and formalizing a conceptualization of the activity of tagging and building technology that commits to the ontology at the semantic level. Constraints of tagging:

(a) Systems need to make ontological commitments at the semantic level. That is, the meaning of each tag has to be defined, and all users must then apply the tag in the same way and in the same context.

(b) When users tag tags (meta-tagging), there has to be an asymmetric relation between the original tag and the meta-tag.

Spammers need to be blocked by the users.

Tag identity: represent a function f from names to tags, e.g.

f("San francisco") = tag 1

f("San Francisco") = tag 2

f("Sanfracisco") = tag 3

Write clear axioms that define how a particular system handles the name matching.
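One way to read “write clear axioms for name matching” is to make the matching rule an explicit parameter of the function f. A minimal Python sketch; `make_tag_identity` and the second matching rule are hypothetical, not part of Gruber's TagOntology:

```python
def make_tag_identity(rule):
    """Build a function f from names to tags; names with the same key share one tag."""
    registry = {}

    def f(name):
        key = rule(name)
        if key not in registry:
            registry[key] = f"tag {len(registry) + 1}"
        return registry[key]

    return f


# Rule 1: every distinct string is its own tag (as in the example above).
exact = make_tag_identity(lambda name: name)
# Rule 2 (hypothetical): ignore case and spaces, but not spelling mistakes.
loose = make_tag_identity(lambda name: "".join(name.lower().split()))

names = ["San francisco", "San Francisco", "Sanfracisco"]
print([exact(n) for n in names])   # ['tag 1', 'tag 2', 'tag 3']
print([loose(n) for n in names])   # ['tag 1', 'tag 1', 'tag 2']
```

Stating the rule as code makes the system's matching behavior explicit: under the second rule the misspelling "Sanfracisco" still becomes a separate tag.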

Ontology building from existing data sources:

Article saved as:

The main idea of this article is to use specific terminologies and existing databases as a starting point for defining an ontology. The general steps are: (i) use subsets of several terminologies and merge them, and (ii) treat the databases as rich sources of concepts, which are integrated to produce a formal domain ontology.

Steps:

Terminology selection: select the terms and subterms that are very representative in the application domain
Merge local taxonomies: determine conflicts and semantic mismatches
Local database schema integration: address syntactic, structural and semantic heterogeneities
Semantic enrichment: combine the global taxonomies and global schemas to produce a global ontology
Implementation: create the ontology using the four previous steps and describe it in a formal language

This model is suggested because building an ontology from scratch is very costly. An ontology built this way is more trustworthy and less expensive.
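The “merge local taxonomies” step can be sketched as combining term-to-parent maps and reporting the terms whose parents disagree. This is a simplified illustration that only catches structural conflicts, not semantic mismatches; the function name and the toy terms are hypothetical:

```python
def merge_taxonomies(*taxonomies):
    """Merge term -> parent maps; record terms whose parents disagree across sources."""
    merged, conflicts = {}, {}
    for taxonomy in taxonomies:
        for term, parent in taxonomy.items():
            if term not in merged:
                merged[term] = parent
            elif merged[term] != parent:
                # Same term placed under different parents: a conflict
                # that a person (or further rules) must resolve.
                conflicts.setdefault(term, {merged[term]}).add(parent)
    return merged, conflicts


local_a = {"Transformer": "Equipment", "Substation": "Facility"}
local_b = {"Transformer": "Asset", "Cable": "Equipment"}
merged, conflicts = merge_taxonomies(local_a, local_b)
print(merged)                             # {'Transformer': 'Equipment', 'Substation': 'Facility', 'Cable': 'Equipment'}
print(sorted(conflicts["Transformer"]))   # ['Asset', 'Equipment']
```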

Almeida, M.B. & Barbosa, R.R. (2009). Ontologies in Knowledge management support: A case study. JASIS&T, vol. 60(10).

Article saved as: amleida_2009

Background:

This ontology was created for the second largest Brazilian energy utility. The ontology was built in 3 different layers:

abstract layer: contains generic concepts, which could be of use in other contexts too
organizational layer: concepts that will be used in different sectors of the organization
specific layer: terms for the specific requirements of the Quality Management Policy (QMP)

This ontology includes terms and their definitions for these three layers. The authors gathered the terms from the personnel using a survey, and also from other existing ontologies. They then conducted two types of interviews with the personnel to obtain the terms used, and two other types of interviews to create the definitions of these terms.

Ontology structure:
They used both bottom-up and top-down approaches
They identified the core concepts and then both generalized and specialized these concepts
The set of terms was organized into a taxonomy
Other relations between the terms were then defined, depending on how specific the domain was

Implementation:
The concepts and the relations were entered into Protégé in the three layers (abstract, organizational and specific)
The results were exported from Protégé to RDF Schema (RDFS)

Evaluation of the ontology:

The authors created a search engine prototype and a set of questionnaires to measure the effectiveness of the ontology. The results show that the ontology was successful in answering the basic questions the personnel had in the working environment. The staff's assessment of the quality of the information they obtained from the ontology was also positive.

Fall '09

Hsieh, W.T., Stu, J., Chen, Y.L. & Chou, S.C. (2009). A collaborative desktop tagging system for group knowledge management based on concept space. Expert Systems with Applications, 36, 9513-9523.

This article proposes a Desktop Collaborative Tagging (DCT) system, which allows users to cooperate when they tag. When people use DCT to tag, a concept space is activated, which provides synonym control, relevance search and tag suggestions. In this way DCT creates a multi-faceted categorization, so documents are better organized and retrieved.

The three components of DCS:

Tag concept spaces: represent the relationships between tags based on users’ tagging behavior. Concept spaces are created when a tag is converted to an eigenvector (not quite sure what this means and how it works; I will ask at school).

Document similarity: is based on the similarity between the documents’ keywords and tags

Suggested tags for untagged documents: suggested tags are provided both for documents already tagged by other users and for untagged documents. The eigenvector includes information both for the tag(s) used on a document and for the automatically generated keywords of the document; these keywords are extracted using a keyword extraction module. Since tags are given by users, and the frequency of some document words is not representative of the document's subject, the tags applied to a document are given more weight than the document keywords. For a document that has not been tagged, the keyword extraction module [the structure of this module and how it works is not presented in this article] scans the words and suggests tags, depending on the words appearing in the document (omitting stop words).
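The idea that tags weigh more than extracted keywords can be sketched with plain weighted term vectors and cosine similarity. Note that this swaps in a simpler technique than the article's eigenvector-based concept space; the weights 2.0 and 1.0, the function names and the data are assumptions, and keyword extraction itself is not modeled:

```python
import math
from collections import Counter

TAG_WEIGHT, KEYWORD_WEIGHT = 2.0, 1.0   # assumed: user tags count twice as much as keywords


def document_vector(tags, keywords):
    """Weighted bag of terms built from a document's tags and extracted keywords."""
    vec = Counter()
    for t in tags:
        vec[t.lower()] += TAG_WEIGHT
    for k in keywords:
        vec[k.lower()] += KEYWORD_WEIGHT
    return vec


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def suggest_tags(keywords, tagged_docs, top_n=3):
    """Suggest tags for an untagged document from the tags of similar tagged documents."""
    query = document_vector([], keywords)
    scores = Counter()
    for tags, doc_keywords in tagged_docs:
        similarity = cosine(query, document_vector(tags, doc_keywords))
        for t in tags:
            scores[t] += similarity
    return [t for t, s in scores.most_common(top_n) if s > 0]


tagged = [
    (["ontology", "owl"], ["semantic", "web", "ontology"]),
    (["tagging"], ["folksonomy", "social", "tags"]),
]
print(suggest_tags(["ontology", "semantic"], tagged))   # ['ontology', 'owl']
```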

Method: DCS is an individual’s desktop application, but the authors extend its use to a collaborative work environment. When a user starts tagging, the concept spaces are activated and other tags are recommended. The advantage of the concept spaces is that they return not only the items tagged with the same tag by the same user, but also other documents with related tags. How do the concept spaces make these suggestions? Every time a user tags a document, information about the user, the tags used and the document is saved in the tagging history database. In addition, DCS can identify documents by their URL. Therefore, the items returned are based on the tags used before, the URL identification, the keyword extraction and the saved tagging history.

DCS advantages: eigenvectors provide multi-faceted organization, the recommended tags make categorization easier, and relevance searching is more effective than keyword searching.

Echarte, F., Astrain, J.J., Cordoba, A. & Villadangos, J. (2008). Self-adaptation of ontologies to folksonomies in semantic web. Proceedings of World Academy of Science, Engineering and Technology, 33.

Article saved as: echarte_2008

OWL builds on XML, RDF and RDF Schema
The generic ontology is built in Protégé

The authors propose a way in which folksonomies can be modeled with ontologies. The method has two components: (a) a generic ontology structure, which can represent any folksonomy, and (b) an algorithm, which is able to merge the folksonomy information into the generic ontology. As a result, when a user tags a document, the tags are automatically stored in the ontology.

The generic ontology has the following classes

source: “the sources of applications that use or feed the folksonomy”
resource: any resource that is likely to be annotated
tag: a tag concept
user: the users who tag; in the article’s example they are identified by a number
annotation: a set of tags to a resource
annotation tag: relates an annotation to the assigned tags
polarity: each tag’s polarity can be negative or positive and it is related to the annotation tag class.

The ontology can record whether a tag or an annotation has singular or plural forms, synonyms, misspellings or wrong words.

The exact algorithm is not given in the article, but the article includes a table showing the OWL code that was generated automatically when a user tagged a document with three tags.
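Since the article does not give its algorithm, the following is only a hypothetical Python sketch of the generic ontology's classes (source, resource, tag, user, annotation, annotation tag with polarity) and of storing each tagging act as an annotation. The article itself generates OWL, not Python objects, and `FolksonomyStore` and its methods are our own names:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Source:
    name: str        # application that uses or feeds the folksonomy


@dataclass(frozen=True)
class Resource:
    uri: str         # any resource that can be annotated


@dataclass(frozen=True)
class Tag:
    label: str


@dataclass(frozen=True)
class User:
    user_id: int     # users are identified by a number


@dataclass
class AnnotationTag:
    tag: Tag
    polarity: int = 1   # +1 positive, -1 negative


@dataclass
class Annotation:
    user: User
    resource: Resource
    source: Source
    tags: list = field(default_factory=list)   # list of AnnotationTag


class FolksonomyStore:
    """In-memory stand-in for the generic ontology (hypothetical)."""

    def __init__(self):
        self.annotations = []

    def tag(self, user, resource, source, labels, polarity=1):
        # Every tagging act is stored automatically as one Annotation.
        annotation = Annotation(user, resource, source,
                                [AnnotationTag(Tag(label), polarity) for label in labels])
        self.annotations.append(annotation)
        return annotation

    def tag_scores(self, resource):
        """Net polarity of each tag assigned to a resource."""
        scores = {}
        for annotation in self.annotations:
            if annotation.resource == resource:
                for at in annotation.tags:
                    scores[at.tag.label] = scores.get(at.tag.label, 0) + at.polarity
        return scores


store = FolksonomyStore()
doc = Resource("http://example.org/doc1")
app = Source("bookmarking application")
store.tag(User(1), doc, app, ["ontology", "owl", "semantic web"])
store.tag(User(2), doc, app, ["owl"], polarity=-1)
print(store.tag_scores(doc))   # {'ontology': 1, 'owl': 0, 'semantic web': 1}
```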

Dotsika, F. (2009). Uniting formal and informal descriptive power: reconciling ontologies with folksonomies. International Journal of Information Management, 29, 407-415.

Article saved as: dotsika_2009

ISO Terminology Standards: ISO 704 & ISO 1087-1. These standards give the main classification rules for an information system, from a typical classification to data modeling and behavior and in general for all controlled vocabularies.

Ontology parts:

Granularity (level of abstraction):

low
mid-level
upper

…domain ontologies

Language formality

informal (natural language)
semi-formal (restricted and structural language or semi-formally defined language)
formal (defined by formal semantics)

domain they define (the following items are ‘copy & paste’ from the original text)

role-based: the terminology and the concepts are based to a particular user
process ontologies (terms, relationships, constraints, input and output relevant to a particular process or group of processes)
domain ontologies (terminology and concepts relevant to a particular topic)
interface (structure and content restrictions relevant to an interface)

The article uses different terms to describe the steps from a folksonomy to an ontology. This is related to the term “folksonomy gardening” that we have met in the literature before.

Folksontologies:

Step 1: the folksonomy tags are associated and relations are created
Step 2: the definitions of the tags are given using dictionaries or other online tools, e.g. Wikipedia
Step 3: Swoogle (a search engine for Semantic Web documents) and WordNet (a lexical database) are helpful for further processing of the tags. Swoogle indexes metadata and computes relationships between them, and WordNet provides information on synonyms and homonyms and allows communication between different ontologies.
Step 4: Ontology mapping: conceptual relationship and ontology structure, based on the analysis of tags.

From folksonomies to networks of terms to ontologies, in three steps:

A tag cloud is produced from the tags
Create a network of tags, using merging and dictionary based filtering of the tags. The dictionary filtering can clean the cloud of those tags that do not exist in dictionaries, a thesaurus can help to define relationships between tags (synonyms), an algorithm can correct the typos and alternative spellings.
Ontology creation: distinguishing concepts from terms, give relationships between concepts
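The dictionary filtering, typo correction and synonym steps can be sketched with Python's standard `difflib` module. The dictionary, synonym table and similarity cutoff are assumptions for illustration; a real system would use full dictionaries or thesauri:

```python
import difflib


def clean_tags(tags, dictionary, synonyms, cutoff=0.8):
    """Filter, spell-correct and merge tags before building a tag network."""
    words = sorted(dictionary)          # fixed order keeps the matching deterministic
    cleaned = []
    for tag in tags:
        t = tag.lower()
        if t not in dictionary:
            close = difflib.get_close_matches(t, words, n=1, cutoff=cutoff)
            if not close:
                continue                # dictionary filtering: drop unknown tags
            t = close[0]                # typo / alternative-spelling correction
        t = synonyms.get(t, t)          # thesaurus: map synonyms to one preferred term
        if t not in cleaned:
            cleaned.append(t)
    return cleaned


dictionary = {"ontology", "folksonomy", "film", "movie"}
synonyms = {"movie": "film"}
raw_tags = ["Ontolgy", "ontology", "folksonomy", "asdfgh", "movie", "film"]
print(clean_tags(raw_tags, dictionary, synonyms))   # ['ontology', 'folksonomy', 'film']
```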

Tag problems in folksonomies

Polysemes and homonyms: words with multiple meanings and no clear definitions
synonyms: a lot of words can mean the same thing
discrepancies in granularity: different tags that share the same root appear often, e.g. bank, banking, banks

Ontologies quality criteria (these points are “copy & paste” from the original text)

(a) intermediate representations adapted to domain expert authors’ user views,

(b) domain expert authors’ guidelines,

(c) underlying ontology schemas and transformation rules to the intermediate representation and

(d) natural language generation lexicons and grammars for result displaying.

Kashyap talks about two kinds of quality in an ontology:

(a) structural: rich semantic relations, consistency between the terms, and include in the ontology a wide variety of terms

(b) atomic: the concepts, their relationships, the axioms and the constraints must be of high quality.

Zheng, H.T., Borchert, C. & Kim, H.G. (2009). Exploiting corpus-related ontologies for conceptualizing document corpora. Journal of the American Society for Information Science and Technology, 60(11), 2287-2299.

Article saved as: zheng_2009

The authors present software, Clonto, which builds ontologies automatically. Clonto performs the following functions: (a) it scans the main body of the documents, (b) it recognizes the main concepts, and (c) it automatically creates ontologies. Clonto uses latent semantic analysis (LSA) to identify the key concepts. It is able to find documents that include these key concepts and, using WordNet (a lexical database that provides relationships between the meanings of words), automatically creates ontologies. The documents are linked to the ontology through the key concepts.

Clonto advantages (the following text is “copy & paste” from the original text):

It identifies the key concepts of a document corpus, avoiding information overload for users.
It maintains the semantic relationships between clusters by generating a corpus-related ontology.
The corpus-related ontology helps users easily understand the conceptual structure of the document corpus.
The corpus-related ontology is created in OWL format, which, compared to OBO format or database format, enables more flexible functionality, such as DL reasoning (description logic reasoning).
It allocates documents based on the key concepts using LSA, so a document that is implicitly related to a key concept can be allocated to the appropriate related group.
It creates overlapping groups and handles cross-topic documents well.

In very simple English, the procedure Clonto uses from the moment a document is inserted into the program until the final result is:

A document is inserted and the document’s terms are captured based on how frequently they appear, after removing stop words. Common phrases are captured with the help of an algorithm.
Each term is given a weight based on how often it appears in the document and how many documents in the corpus contain it: weight = term frequency × inverse document frequency (tf-idf). The weights are arranged in a term-document matrix.
Key concepts are then built from the keywords and common phrases using LSA (latent semantic analysis is a technique in natural language processing, in particular in vectorial semantics, for analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms; source: Wikipedia). WordNet is then used to find the words with broader meanings, which are used in the automatic creation of the ontology.
Then the initial document and the rest of the documents in Clonto are linked to the ontology based on the key concepts.
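The tf-idf weighting step can be illustrated with a small term-document matrix in pure Python. This sketch uses the common idf = log(N / df) variant and a toy stop list, which may differ from Clonto's exact settings. LSA would then apply a truncated singular value decomposition to this matrix, which is not shown here:

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "is", "in"}  # tiny illustrative stop list


def tokenize(text):
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def tfidf_matrix(docs):
    """Term-document matrix: matrix[i][j] = tf(term i, doc j) * log(N / df(term i))."""
    tokenized = [tokenize(d) for d in docs]
    counts = [Counter(doc) for doc in tokenized]              # term frequency per document
    df = Counter(w for doc in tokenized for w in set(doc))    # document frequency per term
    vocab = sorted(df)
    n_docs = len(docs)
    matrix = [[c[term] * math.log(n_docs / df[term]) for c in counts] for term in vocab]
    return vocab, matrix


docs = ["ontology of wine", "wine and cheese", "ontology and tagging"]
vocab, matrix = tfidf_matrix(docs)
print(vocab)   # ['cheese', 'ontology', 'tagging', 'wine']
```

A term that occurs in every document gets idf = log(1) = 0, so it carries no weight; this is why frequent but undiscriminating words drop out even when they are not on the stop list.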

Gruber, T. (1993). Toward principles for the design of ontologies used for knowledge sharing. International Journal of Human-Computer Studies, 907-928.

Article saved as: gruber_1993

Design criteria for ontologies

clarity: definitions should be objective and complete
coherence: the ontology should sanction only inferences that are consistent with its definitions
extendibility: it should be possible to add new terms to the ontology without revising the existing definitions
minimal encoding bias: the conceptualization should be specified without depending on a particular symbol-level encoding; representation choices should not be made purely for the convenience of notation or implementation
minimal ontological commitment: the ontology should make as few claims as possible about the world being modeled, committing only to what the intended knowledge sharing requires

After that the article becomes technical and goes beyond our interest.
