Hadoop: MapReduce Introduction:
Hadoop MapReduce is a software framework for easily writing applications which process vast amounts of data (multi-terabyte data-sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.
A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks. Typically both the input and the output of the job are stored in a file-system. The framework takes care of scheduling tasks, monitoring them and re-executing the failed tasks.
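To make the split, map, sort and reduce flow concrete, here is a minimal single-machine sketch in plain Python. It does not use Hadoop at all; the function names (`map_words`, `reduce_counts`, `run_job`) and the word-count task are my own choices for illustration, and a real cluster would run the map calls in parallel on different nodes.

```python
from itertools import groupby
from operator import itemgetter


def map_words(chunk):
    """Map task: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def reduce_counts(word, counts):
    """Reduce task: sum all the counts emitted for one word."""
    return word, sum(counts)


def run_job(chunks):
    # Map phase: every chunk is independent of the others.
    mapped = [pair for chunk in chunks for pair in map_words(chunk)]
    # Sort phase: map output is sorted by key before it reaches the reducers.
    mapped.sort(key=itemgetter(0))
    # Reduce phase: one reduce call per distinct key.
    return dict(
        reduce_counts(word, [count for _, count in group])
        for word, group in groupby(mapped, key=itemgetter(0))
    )


word_counts = run_job(["the quick brown fox", "the lazy dog", "the fox"])
```

The sort step is what guarantees that each reduce call sees all the values for its key together.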
Typically the compute nodes and the storage nodes are the same, that is, the MapReduce framework and the Hadoop Distributed File System (see HDFS Architecture Guide) are running on the same set of nodes. This configuration allows the framework to effectively schedule tasks on the nodes where data is already present, resulting in very high aggregate bandwidth across the cluster.
The MapReduce framework consists of a single master JobTracker and one slave TaskTracker per cluster-node. The master is responsible for scheduling the jobs' component tasks on the slaves, monitoring them and re-executing the failed tasks. The slaves execute the tasks as directed by the master.
Minimally, applications specify the input/output locations and supply map and reduce functions via implementations of appropriate interfaces and/or abstract-classes. These, and other job parameters, comprise the job configuration. The Hadoop job client then submits the job (jar/executable etc.) and configuration to the JobTracker which then assumes the responsibility of distributing the software/configuration to the slaves, scheduling tasks and monitoring them, providing status and diagnostic information to the job-client.
Although the Hadoop framework is implemented in Java™, MapReduce applications need not be written in Java.
Thursday, 22 December 2011
More Research Topics:
MapReduce: A software framework introduced by Google to support distributed computing on large datasets.
Answer Set Programming (ASP): Declarative programming oriented towards difficult (primarily NP-hard) search problems. ASP includes all applications of answer sets to knowledge representation. Answer Set Solvers: smodels, assat, clasp and dlv.
With these two concepts in mind, I am going to start reading about MapReduce, Answer Set Programming and one of its solvers, dlv. The challenge is to produce an implementation that uses ASP and MapReduce together.
Monday, 19 December 2011
Research Topics:
I talked to Dr. Kewen Wang, and he suggested some research topics. He described a new model for the World Wide Web in which Artificial Intelligence and Knowledge Representation are applied. He said that today the Internet is a web of linked documents, not of linked data. Efforts like DBpedia are trying to build a new Internet. The solution to the myriad of data formats could be Linking Open Data (LOD). He also mentioned the evolution from WordNet to Wikipedia; the question now is whether LOD comes next. Based on what has been done in WordNet and Wikipedia, what could be an innovative approach?
Research Topics:
1. Resolve conflicts
2. Ranking candidate solutions
3. Linked open data
Thursday, 15 December 2011
Firefox extensions and On-line tools for the Semantic Web
Firefox Extensions
Semantic Radar: Displays a status bar icon to indicate presence of Semantic Web (RDF) data in the web page.
More extensions
On-Line Tools
Semantic Query End-Point
Thursday, 8 December 2011
Installing a Semantic Web Environment
On Ubuntu 10.04
I'll be running some tests with the Jena framework for the Semantic Web. First, we need to install Jena. As long as Java is correctly configured, installing Jena only takes three steps:
1. Download Jena
2. Unzip it in any folder
3. Run the test as described in this tutorial
We will also need a Java IDE, as recommended by this book: Semantic web programming. I have installed Eclipse on my Ubuntu 10.04 machine. It is very easy:
$ sudo apt-get install eclipse
On Windows 7
1. Setting up the environment variable:
http://introcs.cs.princeton.edu/java/15inout/windows-cmd.html
2. Java editor: Eclipse
3. Ontology editor: Protégé
4. Semantic web programming framework: Jena
5. Pellet reasoner
That's it!
This is a basic environment installation. I'll be doing more interesting things by following the above book. The book has a web site that hosts all the source code it uses.
Semantic Web Real World Examples:
Example 1:
Try googling for all cars advertised on the web with engines smaller than 2.0 litres that run on unleaded petrol, have an mp3 connection and can be seen in a showroom conveniently accessible by public transport from your house. Google is unable to help you. You have to make several searches and correlate the results yourself. On the Semantic Web, you can express an interest in products for sale that are cars, and add the constraints. Every result would be useful.
An example is a site that gives the weather for any city in the world, in HTML form. Even though the site offers dynamic, database-driven information, it is presented in a purely syntactic way. One could imagine a computer program that tried to retrieve this weather information through text parsing or "web scraping". Though it would be possible to do, if the creators of the site ever decide to change around the layout or HTML of the site, the computer program would most likely need to be rewritten in some way. In contrast, if the weather site published its data semantically, the program could retrieve that semantic data, and the site's creators could change the look and feel of the site without affecting that retrieval ability.
Example 2:
You want to correlate data that is not obviously related, for example the amount of country walking in a population versus the levels of clinical obesity in the same population. This kind of information can be explored in Gapminder.
Technologies for the Semantic Web:
SPARQL Query Language
RDF Language to organize information and represent resources
Monday, 5 December 2011
Open Rules. Business Rules Management Methodology and Supporting Tools
- Offers a methodology and open source tools for business analysts to create a Business Rules Repository
- Repositories are used across enterprises as a foundation for rules-based applications with complex business, processing, and presentation logic
- It uses familiar graphical interfaces provided by MS Excel, OpenOffice and Google Docs
- OpenRules supports collaborative rules management.
- OpenRules® also helps subject matter experts to work in concert with software developers to integrate their decision models into existing infrastructures for Java and .NET.
- OpenRules makes rules-based systems less expensive, easier to develop and manage, and more sustainable.
Reference:
Visual Rules Execution Platform
Providing Rules as Web Services
Visual Rules Execution Platform provides a centralized rule deployment and execution environment that allows rules to be easily integrated into many applications running on any platform. Hot deployment capabilities ensure that new rule versions are made available with zero downtime.
Any rule models deployed to Visual Rules Execution Platform automatically become available as standard web services. These services can be consumed by a wide variety of clients, not limited to Java architectures.
Reference:
W3C Workshop on Rule Languages for Interoperability
Overview:
- Rule languages and rule systems are widely used in:
- Database integration
- Service provisioning
- Business process management
- General purpose rule languages remain relatively non-standardized
- Rule systems from different suppliers are rarely interoperable
- The Web has achieved remarkable success in allowing documents to be shared and linked
- Semantic Web languages like RDF and OWL are beginning to support data/knowledge sharing
- Having a language for sharing rules is often seen as the next step in promoting data exchange on the Web
Summary:
- In April 2005, the W3C held a workshop to explore options for establishing a standard web-based language for expressing rules.
- A half-dozen candidate technologies were presented and discussed.
- The workshop confirmed the differences among types of rules: "if condition then action" rules and "if condition then condition" rules.
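The difference between the two rule types in the summary above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the discount and ancestor rules, and all the names, are invented for the example and do not come from the workshop.

```python
def apply_production_rule(order_total, actions):
    """'if condition then action': when the condition holds, do something."""
    if order_total > 100:
        # The consequence is a side effect, not a new piece of knowledge.
        actions.append("send_discount_voucher")


def apply_deductive_rule(facts):
    """'if condition then condition': when the premise holds, conclude a new fact."""
    derived = set(facts)
    for subject, predicate, obj in facts:
        if predicate == "parent_of":
            # The consequence is another statement that is now known to be true.
            derived.add((subject, "ancestor_of", obj))
    return derived


actions = []
apply_production_rule(120, actions)
derived = apply_deductive_rule({("ann", "parent_of", "bob")})
```

The first kind of rule changes the outside world when it fires, while the second kind only extends the set of known facts, which is why a single standard language has trouble covering both.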
Introductory Sessions:
- During the first session everyone met with each other. There were three backgrounds: business rules, logic programming and semantic web.
- The second session had two presentations proposing scopes for a standard, and one on the W3C approach to standardization.
- Both scope/requirements presentations suggested that no single rule language would cover all the requirements but that there could be a common core to a family of languages.
Candidate Technologies:
- WSML, RuleML, SWSL, N3, SWRL, Common Logic, TRIPLE. Primarily academic efforts.
- What constitutes a candidate technology for standardization, what kind of specification is needed for a rule language?
- The RuleML presentation claimed a slightly different ground, focusing more on the exchange format and interoperability.
- The other main line of discussion was thus about what is feasible in a short time and what should be the scope of the standard.
- Some of the participants argued for a simple set of features to start with instead of a very rich and complex language.
- The candidate technologies have not been tested on commercial rule bases.
The discussion revolved largely around formal issues and semantic features.
Related Standards:
- The Production Rule Representation (PRR)
- A standard Java API for rule engines (JSR-94)
- The Semantics of Business Vocabulary and Business Rules meta-model (SBVR, aka the "semantic beaver").
- PRR is limited to forward-chaining and sequential rule processing
- The lightweight JSR-94 API does not specify the behavior of the engine
- SBVR is for business modeling by business users, in their own terms. It provides structured English for business rules from which the meaning can be extracted as formal logic.
Issues:
Negation as failure:
- Many features of the Web (including search engines) report failure for inscrutable and unpredictable reasons.
- In a database if a record is not found, then we can assume it is not true. On the Web if a book isn't found by a search engine, it might just mean it failed to crawl the appropriate part of the Web.
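A small Python sketch may help to show why negation as failure is safe in a database but risky on the Web. The two functions and the tiny catalogue are hypothetical; they only contrast the two ways of reading a missing record.

```python
def closed_world_holds(known_facts, fact):
    """Negation as failure: anything that is not found is treated as false."""
    return fact in known_facts


def open_world_holds(known_facts, fact):
    """Open world: a missing fact is unknown (None), not false."""
    return True if fact in known_facts else None


catalogue = {("book", "Moby-Dick")}
missing = ("book", "Walden")

database_answer = closed_world_holds(catalogue, missing)  # False
web_answer = open_world_holds(catalogue, missing)         # None, i.e. "don't know"
```

A database can justify the first reading because it is assumed to be complete. A search engine cannot, since the missing book may simply live in a part of the Web that was never crawled.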
Relationship to Description Logics (OWL)
Users want a language where they can represent both rules and ontologies. This topic came up in nearly every session.
Syntax Options
People want rules in many different styles of syntax, driven by who (or what) they expect will be reading and writing rules.
- XML is convenient for machine interchange, and appears to be widely deployed and understood by rule users and implementers.
- English-like syntaxes are often good for people who are not experts in the language, especially if they need to read and understand a rule set.
- Programmer-oriented syntaxes, on the other hand, are designed for people who know the language well.
- An Abstract syntax is defined to not be directly usable; rather, it is mapped to one or more concrete syntaxes, each of which will be in one of the above styles. It is possible to have an abstract syntax, several normative concrete syntaxes, and several non-normative concrete syntaxes.
- An RDF syntax (where the syntactic structures are described in RDF) has some of the appeal of an abstract syntax while being directly usable by machines. However, there is significant doubt whether a rule language can be defined with an RDF syntax and still have consistent semantics.
Conclusions:
The most obvious conclusion from the workshop is that there was significant interest in establishing a standard language for expressing rules.
- Customers are demanding standards to protect their rule assets. They want portability across vendors, platforms, and applications. They want to be able to repurpose, reuse, and redistribute rule sets.
- The standard should be simple. This field tends towards complexity, and we will seriously endanger deployment if we go that route. It should be simple to use and relatively simple to implement.
- Compatibility with deployed and emerging technologies. In particular, compatibility with RDF, OWL, OMG PRR, and ISO Common Logic, along with common programming and rule methodologies will allow people to understand and adopt the work much more quickly.
- A Working Group in this field should be given a narrow and well-defined scope. People should be able to see, early on, if the work is relevant to their uses for rules, instead of having the Working Group trying to prioritize from an overwhelming sea of features.
Reference:
Cruzar: An application of semantic matchmaking for eTourism in the city of Zaragoza
CRUZAR implements a matchmaking algorithm between objects that are described in RDF, and it pipes the results to a planner algorithm. Moreover, at the same time, it offers an innovative service for visitors to plan their trip in advance, exploiting expert knowledge. These features are often used as important examples to illustrate the promises of the Semantic Web.
General Description
The web is a big showcase for cities that want to build their tourism industry. Nowadays, many tourists plan their trips in advance using the information that is available on web pages. This often leads to information-bloated, multimedia-rich web sites that resemble digital versions of printed brochures. Everyone receives the same information, regardless of their interests. This is unlike a visit to a tourism office, where visitors receive customized information and recommendations based on their profile and desires.
CRUZAR is a web application that uses expert knowledge (in the form of rules and ontologies) and a comprehensive repository of relevant data (instances) to build a custom route for each visitor profile. CRUZAR can potentially generate an infinite number of custom routes and it offers a much closer fit for each visitor's profile.
There are a number of reasons that make Zaragoza an excellent test bed for such a project. First, Zaragoza has a high density of Points of Interest (POIs). Second, it is one of the biggest cities in Spain, and it enjoys a very dynamic cultural agenda, as well as frequent top-level sport events. Finally, the city council has extensive databases with all the aforesaid information, including content in five languages.
Technical details of the solution
The first challenge was to collect the required data from existing relational databases which are used to feed the content of the Official Website of Zaragoza. This data was split across the following four information silos:
- The CMS database which feeds the city council web site with pertinent information for tourists visiting Zaragoza: monuments, historical buildings of the city, restaurants, accommodation, green spaces, shopping areas and other relevant points of interest.
- A database which contains up-to-date information about upcoming cultural events and leisure activities.
- The city council web site which mainly displays photographs of the area.
- The IDEZar, which is a Geographic Information System hosted by the University of Zaragoza. It is designed to use REST web services to fetch maps as raster images and to compute the shortest path between two geo-referenced points of the city.
The information contained in these databases is transformed into RDF data using specific adapters. This process takes place regularly every time the databases are updated.
Representing knowledge of the domain
An ontology is used to organize the RDF data. The CRUZAR ontology captures information about three types of domain entities: 1) Zaragoza’s tourism resources, mainly events and POIs, 2) user profiles to capture the visitors' preferences and their context, and 3) the route configuration. The conceptual structure of CRUZAR is based on the upper-ontology DOLCE.
Events and POIs are defined in terms of their intrinsic features: position, artistic style or date. Conversely, visitors’ profiles contain information on their preferences and their trip: arrival date, composition of the group, preferred activities, etc. In order to match the local information with the preferences, a shared vocabulary is needed. The central concept of this intermediate vocabulary is “interest”. Visitors’ preferences are translated to a set of “interests”, and POIs and events can attract people with certain “interests”. This translation is captured as production rules, which are executed using the Jena rule engine. These rules are simple enough to be easily understood by the domain experts.
POIs ranking
All the POIs in Zaragoza are dynamically ranked to reflect their “subjective interest” according to the profile of each visitor. At the end of the matchmaking process, a numerical score is assigned to all POIs to quantify their anticipated level of interest. Initially every POI has a static score or relevance which was decided by the experts of the domain (“objective interest”). The semantic matchmaking process is executed individually for each POI, and its output is a calculated score for the resource (“subjective interest”). The value of this score depends on how many of the visitor’s interests (derived from their profile) are fulfilled by each POI.
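The matchmaking described above can be sketched roughly as follows. Everything concrete here is an assumption of mine: the preference names, the POIs, the scores, and in particular the formula (objective score plus one point per fulfilled interest). The use case only says that the score depends on how many interests a POI fulfils, and the real system runs its rules on the Jena rule engine rather than in Python.

```python
# Hypothetical rules: each visitor preference is translated into a set of "interests".
PREFERENCE_TO_INTERESTS = {
    "travelling_with_children": {"parks", "interactive_museums"},
    "likes_history": {"monuments", "roman_heritage"},
}

# Hypothetical POIs: an expert-assigned "objective" score and the interests each one attracts.
POIS = {
    "cathedral": {"objective": 5, "attracts": {"monuments", "religious_art"}},
    "roman_theatre": {"objective": 3, "attracts": {"monuments", "roman_heritage"}},
    "city_park": {"objective": 2, "attracts": {"parks"}},
}


def interests_for(preferences):
    """Translate the visitor's preferences into the shared 'interest' vocabulary."""
    interests = set()
    for preference in preferences:
        interests |= PREFERENCE_TO_INTERESTS.get(preference, set())
    return interests


def subjective_score(poi, interests):
    # Assumed formula: objective score plus one point per fulfilled interest.
    return poi["objective"] + len(poi["attracts"] & interests)


def rank_pois(preferences):
    """Score every POI for this visitor and sort by score, then by name."""
    interests = interests_for(preferences)
    scored = {name: subjective_score(poi, interests) for name, poi in POIS.items()}
    return sorted(scored.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_pois(["likes_history"])
```

With these made-up numbers a history lover gets the cathedral first and the park last, while a family with children sees the park move up to tie with the Roman theatre.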
Route Planning
After all the candidate POIs have been sorted by their subjective interest, a planner algorithm is run in order to create the route. The main driving force of the algorithm is to balance the quantity and quality (interestingness) of the selected POIs and the distance.
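The use case does not describe the planner itself, so the following is only a guess at what such a balance could look like: a greedy loop that keeps picking the POI with the best trade-off between score and walking distance until a distance budget runs out. The function `plan_route`, the `distance_weight` parameter, the coordinates and the scores are all hypothetical.

```python
import math


def plan_route(start, pois, max_distance, distance_weight=1.0):
    """Greedy sketch: repeatedly visit the POI with the best score-minus-distance trade-off."""
    route, position, travelled = [], start, 0.0
    remaining = dict(pois)  # name -> (score, (x, y))
    while remaining:
        def gain(name):
            score, location = remaining[name]
            return score - distance_weight * math.dist(position, location)

        # Sorting the names first makes ties deterministic.
        best = max(sorted(remaining), key=gain)
        _, location = remaining.pop(best)
        step = math.dist(position, location)
        if travelled + step > max_distance:
            continue  # too far for the remaining budget; try the other POIs
        route.append(best)
        travelled += step
        position = location
    return route, travelled


pois = {"a": (5, (0, 3)), "b": (4, (0, 1)), "c": (9, (0, 10))}
route, travelled = plan_route((0, 0), pois, max_distance=5)
```

In this toy run the highest-scoring POI "c" is left out because it is too far away, which is the kind of quality-versus-distance balance the text describes.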
Route customization
The route proposed by the system is offered to the user using an accessible, information-rich interface that includes: the sequence of selected POIs, a tentative timetable for each day, a map highlighting the POIs, suggestions of other interesting places near the route, and two sets of recommended restaurants near the last POI of the route. Complementary activities, such as events (music concerts, sport events, etc.) and shopping, are also suggested. Users can interact with the generated route in a number of ways.
Key Benefits of Using Semantic Web Technology
Semantic web technologies are put into practice:
- to integrate and to organize data from different sources
- to represent and to transform user profiles and tourism resources
- to capture all the information about the generated routes and their constraints.
References:
http://www.w3.org/2001/sw/sweo/public/UseCases/Zaragoza-2/
Thursday, 1 December 2011
IBM FileNet P8 Platform
FileNet is a company that developed software to help enterprises manage their content and business processes. The FileNet P8 platform is a framework for developing custom enterprise systems. FileNet combines enterprise content management reference architecture with comprehensive business process management and compliance capabilities. The FileNet P8 platform is a key element in creating an agile, adaptable Enterprise Content Management (ECM) environment necessary to support a dynamic organization that must respond quickly to change.
You can use the workflow software to create, modify, manage, analyze, and simulate workflows (also referred to as business processes) that are performed by applications, enterprise users, and external users such as partners and customers. The functionality to define your workflows extends from the integrated expression builder, which provides a means of associating complex business rules with various routes between workflow steps.
Examples of automated processes:
Use FileNet Process software to automate the flow of work to complete a structured business process. Examples of automated processes include:
- Circulating documents for a systematic review and approval process
- Processing new employee paperwork
- Submitting travel expense reports for approvals and payment
- Handling customer queries
Starting with FileNet Process Applications:
Multi-step business processes centre on the systematic routing of documents and information, with each step completed by the appropriate participant or an automated program. An individual workflow automates the routing and processing of a particular type of document, or set of documents, for a specific business process.
In a process system, different users perform different activities:
Participant: Participate in a workflow and Launch a workflow
Workflow Administrator: Manage work in progress
Workflow author: Design a workflow
System Administrator: Set up and maintain a Process system
Developer: Develop custom applications
Integrating business rules
Workflow authors and business analysts can create and add business rules to individual steps of a workflow definition. You can use third party rules software to separate the business rules from the process, making it easier for a business analyst to independently manage the process and the rules behind the process, rather than modifying a workflow definition.
To implement rules functionality in a workflow, the workflow author and the business analyst work together to determine how rules will be used in the workflow, what decisions will be controlled by rules, what workflow data will be required, appropriate names for the rule sets, and the steps in a workflow where the rules will execute.
Rules integration using web services
A business rules management system leverages industry standard web services as a communication mechanism for invoking business rules. FileNet P8 Business Process Manager provides the ability to configure and invoke web services from within a workflow. Steps:
- Author a rule
- Deploy it to the business rules management system
- Generate a Web Services Description Language (WSDL) file
- Import the WSDL into the Process Engine
- The business rule is then available to use within a workflow
- The final step is to configure calls to the rules engine to execute the business rules as part of a workflow
Rules Integration using the Rules Connectivity Framework
The Process Engine server uses TCP/IP to communicate with the Rules Listener. The Rules Listener is implemented in Java as a multi-threaded process. It hosts the rules vendor JAR file that implements the rules vendor functionality. The rules vendor must provide a JAR file that contains an implementation of the IFNRule listener interface in order for it to be invoked from the Rules Connectivity Framework (RCF). The IFNRule listener Java interface is defined by IBM FileNet. The Rules Listener looks for the rules vendor JAR file and, if it is present, loads it and enables the rules functionality. The following figure provides a graphical high-level view of the rules integration.
Enterprise Content Management: a formalized means of organizing and storing an organization's documents, and other content, that relate to the organization's processes.
An Overview of W3C Semantic Web Activity:
The Semantic Web is an extension of the current Web in which the meaning of information is clearly and explicitly linked from the information itself. The World Wide Web Consortium (W3C) Semantic Web Activity, researchers and industrial partners want to enable standards and technologies to allow data on the Web to be defined and linked in such a way that it can be used for more effective discovery, automation, integration and reuse across various applications. The Internet will reach its full potential when data can be processed and shared by automated tools as well as by people.
The Semantic Web fosters greater data reuse by making it available for purposes not planned or conceived by the data provider. E.g. you want to locate news articles published in the previous month about companies headquartered in cities with populations under 500,000. The information may be there in the Web, but currently only in a form that requires intensive human processing.
The Semantic Web will allow:
- Information to surface in the form of data, so that a program doesn't have to strip the formatting, pictures and ads off a Web page and guess at how the remaining page markup denotes the relevant bits of information.
- People to generate files that explain, to a machine, the relationship between different sets of data. For example, one will be able to make a "semantic link" between a database with a "zipcode" column and a form with a "zip" field to tell the machines that they do actually mean the same thing. This will allow machines to follow links and facilitate the integration of data from many different sources.
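As a rough illustration of the "zipcode" and "zip" example, here is a minimal Python sketch. The shared term URI, the source names (`customer_db`, `signup_form`) and the helper `to_shared_terms` are all made up for the example; real semantic links would be expressed in RDF or OWL rather than in a Python dictionary.

```python
# Hypothetical shared vocabulary term that both local field names are linked to.
POSTAL_CODE = "http://example.org/vocab#postalCode"

# The "semantic links": (source, local field name) -> shared term.
SEMANTIC_LINKS = {
    ("customer_db", "zipcode"): POSTAL_CODE,
    ("signup_form", "zip"): POSTAL_CODE,
}


def to_shared_terms(source, record):
    """Rename each field to its shared term; unlinked fields keep their local name."""
    return {
        SEMANTIC_LINKS.get((source, field), field): value
        for field, value in record.items()
    }


db_row = to_shared_terms("customer_db", {"zipcode": "4000", "name": "Ann"})
form_entry = to_shared_terms("signup_form", {"zip": "4000"})
same_place = db_row[POSTAL_CODE] == form_entry[POSTAL_CODE]
```

Once both sources are expressed with the shared term, a program can compare or merge them without knowing the original column names.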
Being "semantically linked" means that the Semantic Web will allow people to define relationships between pieces of data. Relationships such as hasLocation, worksFor, isAuthorOf, hasSubjectOf, dependsOn, etc., will allow machines on the Web to find related information in a more natural way. At the moment these kinds of relationships exist, but they can only be understood by people.
The development of the Semantic Web is underway in at least two very important areas: (1) from the infrastructural and architectural position defined by W3C and (2) in a more directed application-specific fashion by those leveraging Semantic Web technologies in various demonstrations, applications and products.
Enabling standards:
Uniform Resource Identifiers (URIs) are fundamental for the current Web and are in turn a foundation for the Semantic Web. URIs provide the ability to uniquely identify resources of all types (not just Web documents) as well as relationships among resources. In addition, the Extensible Markup Language (XML) and the Resource Description Framework (RDF) help to represent relationships and meaning.
The W3C Semantic Web Activity plays a leadership role in both the design of specifications and the open, collaborative development of technologies focused on representing relationships and meaning and the automation, integration and reuse of data.
The base level standards for supporting the Semantic Web are currently being refined by the RDF Core Working Group.
The Web Ontology (http://www.w3.org/2001/sw/WebOnt/) Working Group's standards efforts build upon the RDF Core work to define a language, OWL (http://www.w3.org/TR/owl-ref/), for describing structured, Web-based ontologies. Ontologies can be used by automated tools to power advanced services such as more accurate Web search, intelligent software agents and knowledge management. Web portals, corporate website management, intelligent agents and ubiquitous computing are just some of the identified scenarios that helped shape the requirements for this work.
Semantic Web Advanced Development (SWAD):
SWAD-Europe aims to highlight practical examples of where real value can be added to the Web through Semantic Web technologies. Its focus is on providing practical demonstrations of how the Semantic Web (1) can address problems in areas such as sitemaps, news channel syndication, thesauri, classification, topic maps, calendaring, scheduling, collaboration, annotations, quality ratings, shared bookmarks, Dublin Core for simple resource discovery, Web service description and discovery, trust and rights management, and (2) can integrate these solutions effectively and efficiently.
The W3C is running some other projects, such as SWAD-Simile and SWAD-Oxygen.
Conclusion:
- The Semantic Web is an extension of the current Web.
- It is based on the idea of having data on the Web defined and linked such that it can be used for more effective discovery, automation, integration and reuse across various applications.
- Provides an infrastructure that enables not just Web pages, but databases, services, programs, sensors, personal devices and even household appliances to both consume and produce data on the Web.
- Software agents can use this information to search, filter and prepare information in new and exciting ways to assist Web users.
- New languages make significantly more of the information on the Web machine-readable.
Notes:
Both authors are part of the W3C and can be contacted by email: Eric Miller (em@w3.org) and Ralph Swick (swick@w3.org).
References:
An Overview of W3C Semantic Web Activity: http://onlinelibrary.wiley.com/doi/10.1002/bult.280/full
Monday, 28 November 2011
The Semantic Web
Expressing meaning:
Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully. Computers can adeptly parse Web pages for layout and routine processing (here a header, here a link to another page), but in general, computers have no reliable way to process the semantics. The Semantic Web promises significant new functionality as machines become much better able to process and "understand" the data that they merely display at present.
The Web has developed most rapidly as a medium of documents for people rather than for data and information that can be processed automatically. The Semantic Web aims to make up for this.
The Semantic Web will be as decentralized as possible.
Knowledge Representation:
For the semantic web to function, computers must have access to structured collections of information and sets of inference rules that they can use to conduct automated reasoning.
Traditional knowledge-representation systems typically have been centralized, requiring everyone to share exactly the same definition of common concepts such as "parent" or "vehicle."
The challenge of the Semantic Web, therefore, is to provide a language that expresses both data and rules for reasoning about the data and that allows rules from any existing knowledge-representation system to be exported onto the Web.
Two important technologies for developing the Semantic Web are already in place: eXtensible Markup Language (XML) and the Resource Description Framework (RDF). Meaning is expressed by RDF, which encodes it in sets of triples, each triple being rather like the subject, verb and object of an elementary sentence.
The triples of RDF form webs of information about related things.
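As a minimal sketch of the triple idea, the snippet below stores a few statements as plain Python tuples and queries them by pattern. It does not use a real RDF library, and the `ex:` identifiers, the `triples` set and the `match` helper are made-up names for illustration only.

```python
# Each triple is (subject, predicate, object), like a tiny sentence.
# All identifiers below are hypothetical examples, not real URIs.
triples = {
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Bob", "ex:knows", "ex:Carol"),
    ("ex:Alice", "ex:worksFor", "ex:W3C"),
}


def match(subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    )


# Everything stated about Alice.
about_alice = match(subject="ex:Alice")
# Everyone who is the subject of a "knows" statement.
knowers = sorted({s for (s, _, _) in match(predicate="ex:knows")})
```

Because Bob is the object of one triple and the subject of another, the statements link up into a small web, which is the point the paragraph above makes.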
Ontologies:
Two databases may use different identifiers for what is in fact the same concept, such as zip code. A program that wants to compare or combine information across the two databases has to know that these two terms are being used to mean the same thing.
A solution to this problem is provided by the third basic component of the Semantic Web, collections of information called ontologies. In philosophy, an ontology is a theory about the nature of existence, of what types of things exist; ontology as a discipline studies such theories. Artificial-intelligence and Web researchers have co-opted the term for their own jargon, and for them an ontology is a document or file that formally defines the relations among terms. The most typical kind of ontology for the Web has a taxonomy and a set of inference rules.
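The two ingredients mentioned above, a taxonomy and inference rules, plus the zip code problem, can be sketched roughly as follows. This is only an illustration under assumed names: the `db1:`/`db2:` terms, the class names and both helper functions are hypothetical, and real ontologies would be written in a language such as OWL.

```python
# Hypothetical terms from two databases that denote the same concept.
equivalent = {"db1:zip_code": "db2:postal_code"}

# A tiny taxonomy: child class -> parent class (made-up class names).
subclass_of = {
    "ex:Car": "ex:Vehicle",
    "ex:Truck": "ex:Vehicle",
    "ex:Vehicle": "ex:Thing",
}


def same_concept(a, b):
    """True if the two terms are identical or declared equivalent."""
    return a == b or equivalent.get(a) == b or equivalent.get(b) == a


def is_a(cls, ancestor):
    """Inference rule: subclass-of is transitive, so walk up the taxonomy."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = subclass_of.get(cls)
    return False
```

With these definitions a program can conclude that a car is a thing even though nobody stated it directly, and that the two databases are talking about the same field.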
References:
The semantic web. Tim Berners-Lee, et al. http://www.dblab.ntua.gr/~bikakis/SW.pdf
Semantics of Business Vocabulary and Business Rules (SBVR)
Originated on December 11, 2007. This is a new standard for representing business semantics for machines.
The essence of SBVR turns out to be a vocabulary: a painfully deep and complete vocabulary covering all the discrete ideas needed to structure the semantics of rule-ish sentences.
The majority of specialists in the field of semantics focus on what symbols denote, rather than what they connote. There are two fundamental tests for whether a machine is 'doing semantics'.
Test 1: Can the machine determine that some instance (of something) does or does not fall into some class of things the machine knows about? For example, if a machine were handed a fruit electronically, could it determine whether or not the fruit was an apple? What the machine 'knows' about the fruit would have to satisfy all the encoded rules for 'apple-ness'. Representing such rules is a primary focus of semantic languages, including in particular those proposed for the semantic web.
Test 2: Can the machine determine whether or not two expressions mean the same thing? For example, suppose humans take customer and client (and perhaps cliente in Spanish) to designate the same concept (think synonyms). If humans specify as much to machines, then the machines will 'know' that the meaning denoted by the symbols is the same one, not different.
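To make the two tests concrete, here is a rough sketch in Python. It is not SBVR itself, only an illustration: the rules for 'apple-ness', the `synonym_sets` list and both function names are assumptions invented for this example.

```python
# Hypothetical encoded rules for "apple-ness": every rule must hold.
apple_rules = [
    lambda fruit: fruit.get("family") == "Rosaceae",
    lambda fruit: fruit.get("genus") == "Malus",
]

# Terms humans have declared to designate one concept (Test 2).
synonym_sets = [{"customer", "client", "cliente"}]


def is_apple(fruit):
    """Test 1: does this instance fall into the class 'apple'?"""
    return all(rule(fruit) for rule in apple_rules)


def same_meaning(term_a, term_b):
    """Test 2: do the two expressions denote the same concept?"""
    if term_a == term_b:
        return True
    return any(term_a in s and term_b in s for s in synonym_sets)
```

For instance, `is_apple({"family": "Rosaceae", "genus": "Pyrus"})` fails the second rule, so the fruit is rejected, while `same_meaning("customer", "cliente")` succeeds only because a human declared the terms to be synonyms.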
What makes SBVR so unique?
SBVR is a vocabulary (or more accurately, a set of inter-related sub-vocabularies) that permits the capture of semantics for the kinds of sentences commonly used to express business rules.
Why adopt SBVR practices?
1. You need to retain business know-how. You need your operational business knowledge to be explicit, rather than tacit. Your company runs the risk of losing key people. You need a pragmatic approach to knowledge retention.
2. Your business doesn't really know what its business rules are. You need a better way to manage and disseminate business rules consistently across various parts of the organization.
3. You need to develop operational decision logic and other kinds of shared, knowledge-rich specifications directly with business people in business terms.
References:
[1] SBVR part 1
[2] SBVR part 2
What is the Semantic Web?
Definitions:
1. The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners. It is based on the Resource Description Framework (RDF). See also the separate FAQ for further information [1].
2. Right now, HTML+CSS is centered more on structure and presentation, whereas semantics is about the meaning of the information. In the Semantic Web you use shared ontologies to establish the meaning (semantics) of objects and of the relations between them. The best-known ontologies are FOAF and Dublin Core.
Typically, semantics are expressed in a specialized language, such as RDF or OWL. RDF can be embedded within XHTML using eRDF or W3C's RDFa.
Microformats are a less structured alternative to eRDF/RDFa [2].
3. This is a more practical definition, which I understand much better. I found it on Stack Overflow as well.
4. Real world example
5. Introduction to the Semantic web
6. Semantic web frameworks
7. Semantic web and Syntactic web resources
8. Commercial applications using Semantic web
9. Semantic web stack
10. Semantic web site
11. The Semantic Web is about two things. It is about common formats for the integration and combination of data drawn from diverse sources, whereas the original Web mainly concentrated on the interchange of documents. It is also about a language for recording how the data relates to real-world objects [3].
12. Semantic Web Standards Wiki
13. Semantic Overflow
14. Semantic Web Case Studies and Use Cases
References:
[1] Stack Overflow
ONTORULE Project
While reading material on the development of business rules and their integration with ontologies and the Semantic Web, I came across a project that aims to do so, along with the 1st International Workshop on Business Models, Business Rules and Ontologies (BuRO 2010).
Workshop Description
Three views on the business organization:
1. The view of the business analyst using a formal and validated business model
2. The view of the knowledge engineer via ontologies and rules
3. The view of the IT department via an operationalization in applications
These views can be glued with an end-to-end point solution:
1. Conceptualization and where possible acquisition of business models and their transformation into ontologies and rules.
2. Their management and maintenance
3. The transparent operationalization in IT applications
The vision at the heart of the Semantic Web is of high relevance in a business setting as well. The workshop deals with the different issues that arise in a company that wishes to have a transparent and, where possible, useful and semi-automatic transfer of the knowledge present in business documents (expressing, e.g., policies) to an IT operationalization. Moreover, the workshop takes a holistic perspective, raising awareness of the overall picture instead of focusing on stand-alone issues. For example, although OWL is well investigated, it is unclear how business knowledge expressed in SBVR can be mapped to it.
Topics of interest
- The acquisition of ontologies and rules from unstructured text via Natural Language Processing (NLP) techniques.
- The development of a complete, formal and validated business model, taking all possible inputs into account (people and documents, structured and unstructured, some of which as output from an NLP phase), using the Semantics of Business Vocabulary and Business Rules (SBVR).
- Transformation from structured business representations, from SBVR, to RDF/OWL and/or rules.
- The management and maintenance of business models, ontologies and rules, e.g., consistency maintenance and the integration of rules and ontologies (semantics, algorithms).
- Implementations of such management systems.
- Use cases and field reports.
Further readings:
1. The Semantic Web by Tim Berners-Lee
2. The Semantic Web. Recompilation of references and definitions
3. Semantics of Business Vocabulary and Business Rules (SBVR)
References
The ONTORULE Project http://ontorule-project.eu/dissemination/events/buro2010
Starting on the Semantic Web and related topics
I am going to be researching ontologies, the Semantic Web and business rules. At the moment I do not feel quite comfortable with these topics, and there are many things I do not understand. However, I am willing to invest my time in learning these important subjects. I am an IT student and I like programming in PHP and related frameworks such as CakePHP and, more recently, Yii. I want to study ontologies and the Semantic Web because I believe these technologies are going to be the next generation of the Internet, and this knowledge will contribute enormously to my professional development. This blog is meant to help me in this task. However, if I find a better way to organise my ideas, I will move away from this environment.
Note: If I can help somebody else along the way, that will be even better.