Friday 10 February 2012

Lparse 1.0 User's Manual - ASP and Smodels

Answer Set Programming (ASP) is a programming paradigm in which the programmer describes the problem using a formal language and an underlying engine finds a solution to the problem.

Smodels is a system for ASP. Smodels programs are written using standard logic programming notation and are composed of atoms and inference rules. Programs represent problems; an answer to a problem is a set of atoms, called a stable model, that tells which atoms are true. A Smodels program may have no, one, or many stable models, and a stable model can be seen as a set of rational beliefs consistent with the program. An example Smodels program:
ide_drive       :- hard_drive, not scsi_drive.
scsi_drive      :- hard_drive, not ide_drive.
scsi_controller :- scsi_drive.
hard_drive.
This program has two stable models:
M1 = { hard_drive, ide_drive }
M2 = { hard_drive, scsi_drive, scsi_controller }
Smodels was implemented by Patrik Simons. The above information comes from the Computing the Stable Model Semantics website and the Lparse user manual.

Lparse 1.0 User's Manual - Smodels simple example

Today I ran my first ASP program and found its solution using the smodels solver. I coded the Node Coloring problem. In this problem we are given a number of nodes and a set of edges connecting them; the task is to color each node with one of a fixed number of colors so that no two adjacent nodes get the same color.

color1.lp:
% the three available colors
color(red). color(blue). color(yellow).
% each node takes one color: a color is chosen when the other two are not
col(X,red)    :- node(X), not col(X,blue), not col(X,yellow).
col(X,blue)   :- node(X), not col(X,red), not col(X,yellow).
col(X,yellow) :- node(X), not col(X,blue), not col(X,red).
% fail holds whenever two adjacent nodes share a color
fail          :- edge(X,Y), color(AB), col(X,AB), col(Y,AB).
% compute one stable model in which fail is false
compute 1 { not fail }.
graph1:
node(a). node(b).
node(c). node(d).
edge(a,b). edge(b,c).
edge(c,d). edge(d,a).
This represents a graph with four nodes and four edges (the cycle a-b-c-d-a).
And to run it:
$ lparse color1.lp graph1 | smodels
Smodels then prints a stable model for the program:
Answer: 1
Stable Model: edge(d,a) edge(c,d) edge(b,c) edge(a,b) 
node(d) node(c) node(b) node(a) 
col(a,yellow) col(c,blue) col(d,red) col(b,red) 
color(yellow) color(blue) color(red) 
True
Duration: 0.000
Number of choice points: 3
Number of wrong choices: 0
Number of atoms: 25
Number of rules: 35
Number of picked atoms: 41
Number of forced atoms: 0
Number of truth assignments: 139
Size of searchspace (removed): 12 (0)
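If I read the manual right, the maximum number of stable models can also be passed to smodels on the command line, with 0 meaning all of them, so something like this should enumerate every valid coloring:
$ lparse color1.lp graph1 | smodels 0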

Thursday 9 February 2012

Master in IT with Honours plan

On the 8th of February I gave a presentation titled "An introduction to MapReduce". Everything went well, but I have to be better prepared next time. After the presentation I talked to Dr. Kewen Wang about what to do next. He said that I should implement ASP solvers with MapReduce, more precisely with Hadoop (an open-source MapReduce implementation).

At the moment I'm still learning ASP, as I don't understand it very well yet. So for the next few weeks I should spend some time reading about ASP (probably the seminal papers). At the same time I should practice programming in ASP. The latter will definitely help me implement ASP with Hadoop.

Following the plan, I will try to work through some practical ASP examples on my Ubuntu machine. I don't yet know how to write or even compile this kind of program. I followed this tutorial from the Knowledge Base System Group, which helped me install the required software.

Wednesday 4 January 2012

Hadoop: Example Program:
This example uses Hadoop Streaming to interact with Hadoop. Hadoop Streaming is an API that allows you to write the mapper and reducer in any language, using a simple text-based record format for the input and output <key, value> pairs.

Hadoop example program.

The link above is a tutorial that shows how to write a MapReduce program in Python and then run it through Hadoop.
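The tutorial's own scripts (multifetch.py and reducer.py) fetch page titles from URLs. Just to illustrate the streaming record format, here is a hypothetical word-count mapper and reducer of my own (a toy sketch, not the tutorial's code): the mapper reads lines from stdin and writes tab-separated <key, value> pairs to stdout, and the reducer relies on the framework sorting those pairs by key.

#!/usr/bin/env python
# wc_mapper.py -- hypothetical streaming mapper (toy example)
import sys

for line in sys.stdin:
    for word in line.split():
        # one tab-separated <word, 1> pair per word
        print("%s\t%d" % (word, 1))

#!/usr/bin/env python
# wc_reducer.py -- hypothetical streaming reducer (toy example)
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t")
    if word == current_word:
        current_count += int(count)
    else:
        # input arrives sorted by key, so a new word means the previous one is done
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    print("%s\t%d" % (current_word, current_count))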

When you reach the part in which you need to put your files in HDFS don't follow the instructions in the tutorial. Instead do the following:

$ bin/hadoop fs -mkdir urls
$ bin/hadoop fs -put url1 urls/
$ bin/hadoop fs -put url2 urls/
$ bin/hadoop jar contrib/streaming/hadoop-0.20.2-streaming.jar \
      -mapper $HOME/proj/hadoop/multifetch.py \
      -reducer $HOME/proj/hadoop/reducer.py \
      -input urls/* \
      -output titles
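When the job finishes, the results should end up in the titles directory in HDFS. If I'm not mistaken they can be inspected with something like:

$ bin/hadoop fs -cat titles/part-*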

Hadoop: How to set up a single node: 
This covers the simple operations needed to use Hadoop MapReduce and the Hadoop Distributed File System (HDFS) on a single node.

Prerequisites
GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not supported as a production platform.

$ sudo apt-get install ssh
$ sudo apt-get install rsync

Installation
Download a recent stable release from one of the Apache Download Mirrors.

Edit the file conf/hadoop-env.sh and point JAVA_HOME to the root of your Java installation. To find out where your Java is installed, run $ echo $JAVA_HOME.
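For example, on Ubuntu with OpenJDK 6 the line in conf/hadoop-env.sh might look like this (the exact path is only an example and depends on where the JDK is installed):

# conf/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk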

Pseudo-Distributed Operation
Hadoop can also be run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process. Follow this link.
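From what I remember of the 0.20 documentation, the pseudo-distributed configuration boils down to three small XML files under conf/ (the values below are the defaults suggested there):

conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

After the configuration, the filesystem is formatted with bin/hadoop namenode -format and the daemons are started with bin/start-all.sh.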

In one of the steps you will need to ssh to the localhost:
$ ssh localhost

If you get the error
ssh: connect to host localhost port 22: Connection refused


Then try:
$ sudo apt-get install openssh-server
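Once the server is installed, the quickstart also suggests setting up passphraseless ssh so that the Hadoop scripts can log in to localhost without prompting; as far as I remember the commands are roughly:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys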

Thursday 22 December 2011

Hadoop: MapReduce Introduction:
Hadoop MapReduce is a software framework for easily writing applications which process vast amounts of data (multi-terabyte data-sets) in-parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.


A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks. Typically both the input and the output of the job are stored in a file-system. The framework takes care of scheduling tasks, monitoring them and re-executes the failed tasks.
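To make that data flow concrete for myself, here is a toy, purely local Python sketch of the same idea (nothing Hadoop-specific; the word-count task and all names are my own illustration): map over independent chunks, sort the emitted pairs by key, then reduce each group.

# mapreduce_toy.py -- toy illustration of the MapReduce data flow, not Hadoop code
from itertools import groupby
from operator import itemgetter

def map_phase(chunk):
    # emit a <word, 1> pair for every word in this input chunk
    return [(word, 1) for line in chunk for word in line.split()]

def reduce_phase(word, counts):
    # combine all values that share the same key
    return (word, sum(counts))

chunks = [["hadoop map reduce"], ["map reduce map"]]
# each chunk would be handled by an independent map task
pairs = [p for chunk in chunks for p in map_phase(chunk)]
# the framework sorts the map output by key before the reduce phase
pairs.sort(key=itemgetter(0))
# one reduce call per distinct key
results = [reduce_phase(key, [v for _, v in group])
           for key, group in groupby(pairs, key=itemgetter(0))]
print(results)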


Typically the compute nodes and the storage nodes are the same, that is, the MapReduce framework and the Hadoop Distributed File System (see HDFS Architecture Guide) are running on the same set of nodes. This configuration allows the framework to effectively schedule tasks on the nodes where data is already present, resulting in very high aggregate bandwidth across the cluster.

The MapReduce framework consists of a single master JobTracker and one slave TaskTracker per cluster-node. The master is responsible for scheduling the jobs' component tasks on the slaves, monitoring them and re-executing the failed tasks. The slaves execute the tasks as directed by the master.

Minimally, applications specify the input/output locations and supply map and reduce functions via implementations of appropriate interfaces and/or abstract-classes. These, and other job parameters, comprise the job configuration. The Hadoop job client then submits the job (jar/executable etc.) and configuration to the JobTracker which then assumes the responsibility of distributing the software/configuration to the slaves, scheduling tasks and monitoring them, providing status and diagnostic information to the job-client.

Although the Hadoop framework is implemented in Java™, MapReduce applications need not be written in Java.