risenfall

Use Free Software.

How to set up Apache Pig with Apache Hadoop


Download Hadoop 1.0.0 and set it up as a multi-node cluster.

Download Apache Pig 0.9.1 and extract it.

Export HADOOP_HOME, pointing it to the directory where you installed Apache Hadoop:

export HADOOP_HOME=<hadoop-install-dir>

Start Apache Pig in MapReduce mode (the default execution mode):

bin/pig

You will get the Grunt shell prompt:

grunt>

Written by risenfall

January 14, 2012 at 4:27 pm

Posted in Uncategorized

Distributed grep using Hadoop


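The Hadoop word count example is commonly used to introduce MapReduce concepts. I have altered the word count sample to do pattern matching, so that it works like the UNIX grep command.

First, copy the text file to an HDFS location:

bin/hadoop dfs -copyFromLocal <local-dir> <hdfs-dir>

Then run the job, passing the HDFS input directory, an HDFS output directory (which must not already exist) and the regular expression to search for:

bin/hadoop jar <path>/grep.jar org.myorg.Grep <hdfs-input-dir> <hdfs-output-dir> <pattern>

The source for org.myorg.Grep: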

package org.myorg;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Grep {

    // Emits every input line that contains a match for the regex,
    // paired with a count of 1 (like grep over the whole input).
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();
        private Pattern pattern;

        // Called once per task: compile the pattern passed in from main().
        public void configure(JobConf job) {
            pattern = Pattern.compile(job.get("mapred.mapper.regex"));
        }

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
            String line = value.toString();
            Matcher matcher = pattern.matcher(line);
            // find() matches anywhere in the line, as grep does.
            if (matcher.find()) {
                // Reuse one Text object instead of allocating one per record.
                word.set(line);
                output.collect(word, one);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.err.println("Usage: Grep <inDir> <outDir> <regex>");
            System.exit(2);
        }
        JobConf conf = new JobConf(Grep.class);
        conf.setJobName("Grep");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        // No reducer class is set, so the default identity reducer writes each
        // matching line (sorted by key) followed by a tab and the value 1.
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        conf.set("mapred.mapper.regex", args[2]);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
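To see what the job computes without a cluster, here is a minimal, Hadoop-free sketch in plain Java. It is an illustration, not part of the original sample: the class name LocalGrep and its grep method are hypothetical, and it assumes the default single reducer, so the job output is every matching line in sorted key order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, Hadoop-free sketch of what the Grep job computes:
// the map step keeps matching lines, the shuffle/sort step orders them.
public class LocalGrep {

    // Returns every line containing at least one match for regex, sorted the
    // way the default single reducer would receive the keys.
    static List<String> grep(List<String> lines, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> matches = new ArrayList<String>();
        for (String line : lines) {
            // find() matches anywhere in the line, like grep (not matches()).
            if (pattern.matcher(line).find()) {
                matches.add(line);
            }
        }
        // Hadoop compares Text keys byte-wise; for ASCII input this is the
        // same order as String.compareTo.
        Collections.sort(matches);
        return matches;
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<String>();
        input.add("error: timeout");
        input.add("ok");
        input.add("error: disk full");
        for (String line : grep(input, "^error")) {
            // Same "<line>\t1" shape that TextOutputFormat writes.
            System.out.println(line + "\t1");
        }
    }
}
```

Using find() rather than matches() is what makes both the mapper and this sketch behave like grep: the pattern only has to match somewhere in the line, not the whole line.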

Written by risenfall

January 8, 2012 at 5:18 pm

Posted in hadoop, HDFS


Apache Hadoop 1.0.0 released


Apache Hadoop 1.0.0 has been released.

Hadoop 1.0.0 was released from the 0.20.2xx.x development tree. There is also a separate new development line in the Hadoop space, version 0.23.0, which contains HDFS Federation and NextGen MapReduce (YARN).

Written by risenfall

January 8, 2012 at 2:42 pm

Posted in FOSS, hadoop


How to set up Xen 4.1 with Ubuntu 11.10 and run a VM (domU)


Ubuntu 11.10 was released with Linux kernel 3.0.0, and the mainline Linux kernel now includes Xen dom0 support.

Setup steps:

1. Install Xen hypervisor

apt-get install xen-hypervisor-4.1-amd64

2. Create a domU configuration file (node0.cfg):

disk = [ 'file:/mnt/vm/xen/vm_images/node0.iso,hda,w',
         'file:/media/mnt/vm/xen/iso_files/oneiric-desktop-i386.iso,hdc:cdrom,r' ]

memory = 1024
vcpus = 1
name = "node0"
vif = [ 'type=ioemu,bridge=virbr0' ]
builder = "hvm"
device_model = "/usr/lib/xen-4.1/bin/qemu-dm"
vnc = 1
vncunused = 1
apic = 0
acpi = 0
pae = 0
serial = "pty"  # enable serial console
boot = "dc"     # boot from CD-ROM (d) first, then hard disk (c)
on_reboot = 'restart'
on_crash = 'restart'

3. Start the domain

xm create /pathto/node0.cfg

On the first boot, the VM boots from the CD-ROM, so you can install a new OS in the VM.
After installation, change the boot order to boot="c" to boot from the hard disk.

4. List running domains:

xm list

PS:
Ubuntu 11.10 still needs some path fixes to run user domains/VMs (domUs).
You may get an error in /var/log/qemu-dm-node0.log saying:

Could not read keymap file: ‘/usr/share/qemu/keymaps/en-us’

To fix it, create a symbolic link named qemu in /usr/share that points to /usr/share/qemu-linaro:

ln -s /usr/share/qemu-linaro /usr/share/qemu

With bridged networking, the default bridge may not include your default NIC.
If your VMs (domUs) cannot communicate with outside networks, add the
NIC to the bridge:

brctl addif virbr0 eth0

Written by risenfall

October 23, 2011 at 1:03 pm

Application Development with WSO2 Relational Storage Service (WSO2 RSS)


WSO2 Relational Storage Service is a data storage service provided by the WSO2 StratosLive PaaS. WSO2 RSS supports MySQL and Amazon RDS as back-end data stores.

Creating databases with WSO2 RSS is a simple task: the StratosLive Data Services Server provides an RSS user interface for adding and managing databases.

Steps to create a database using WSO2 RSS:

1. Add a database.

2. Create a database user and add the user to a database privilege group.

3. Create tables and manage data using the WSO2 RSS DB console.

RSS-based data stores are accessible within the StratosLive PaaS.

Users can access RSS data stores with standard Java application development methods, such as JDBC.
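A minimal JDBC sketch of reading from an RSS database, assuming the database is reached through a standard MySQL JDBC URL and the MySQL Connector/J driver is on the classpath. The host, port, database name, credentials and the sessions table with its title column are hypothetical placeholders, not values taken from WSO2 RSS or the WSO2ConRSS sample.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class RssJdbcSketch {

    // Builds a MySQL JDBC URL. Host, port and database name are placeholders;
    // use the connection details shown for your database in the RSS UI.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    // Reads one column from a hypothetical "sessions" table. Requires the
    // MySQL JDBC driver on the classpath at run time.
    static List<String> readTitles(String url, String user, String password) throws SQLException {
        List<String> titles = new ArrayList<String>();
        try (Connection conn = DriverManager.getConnection(url, user, password);
             PreparedStatement stmt = conn.prepareStatement("SELECT title FROM sessions");
             ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                titles.add(rs.getString("title"));
            }
        }
        return titles;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 5) {
            System.err.println("Usage: RssJdbcSketch <host> <port> <database> <user> <password>");
            return;
        }
        String url = jdbcUrl(args[0], Integer.parseInt(args[1]), args[2]);
        for (String title : readTitles(url, args[3], args[4])) {
            System.out.println(title);
        }
    }
}
```

try-with-resources closes the result set, statement and connection in reverse order even if the query fails, which matters in a shared PaaS environment where connections are a limited resource.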

The WSO2ConRSS application is a webapp deployed on StratosLive Application Servers that uses an RSS-based data store to retrieve data. The source code for this sample is available in OT SVN.

Written by risenfall

October 9, 2011 at 10:27 am

Posted in Carbon, HOWTO, PaaS, RSS, StratosLive, WSO2


Explore Connect Learn WSO2Con 2011



More photos and videos

Written by risenfall

September 13, 2011 at 8:05 pm

Posted in FOSS, SOA, WSO2


How to use the StratosLive Column (Family Data) Store Service


WSO2 CSS is a Column (Family Data) Store based on Apache Cassandra. WSO2 CSS can be deployed with any WSO2 Carbon-based product, and it is available as a service in StratosLive, the PaaS offering of WSO2.

CSS is easy to use as a data store with widely available connectors, such as the Java-based Hector client and other Thrift-based connectors. StratosLive supports the Hector API for communicating with the Cassandra-based back-end CSS cluster. External applications can use the StratosLive PaaS column data store feature with any Cassandra connector.

StratosLive application developers must use tenant information to authenticate when connecting to the CSS data store. A tenant admin can create a tenant and authorize users for data store access.

Check the full sample in OT SVN.

This sample connects to StratosLive CSS as an external application. It writes random data to a StratosLive CSS keyspace, then reads the data back and prints it to stdout.

Instructions to build and run the sample:

1. Take a copy of the source using svn.

2. Build the project with Maven:

mvn clean install

3. Build the project together with its dependency libraries (-o runs Maven offline, using dependencies already downloaded in step 2):

mvn clean assembly:assembly -o

4. Execute the program:

java -jar target/org.wso2.carbon.cassandra.examples-3.2.1-jar-with-dependencies.jar
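To make the column family data model the sample writes to a little more concrete, here is a toy, in-memory sketch in plain Java. It is not the Hector API and not how CSS stores data; the class ToyColumnFamily and its methods are hypothetical. It only models the idea that a column family maps each row key to a set of columns kept sorted by name, which is what makes column slices cheap.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical toy model of a Cassandra-style column family: each row key
// maps to columns kept sorted by column name. Illustration only.
public class ToyColumnFamily {
    private final Map<String, SortedMap<String, String>> rows =
            new TreeMap<String, SortedMap<String, String>>();

    // Insert (or overwrite) one column in a row, creating the row if needed.
    public void insert(String rowKey, String column, String value) {
        SortedMap<String, String> row = rows.get(rowKey);
        if (row == null) {
            row = new TreeMap<String, String>();
            rows.put(rowKey, row);
        }
        row.put(column, value);
    }

    // Read a single column, or null if the row or column is absent.
    public String get(String rowKey, String column) {
        SortedMap<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(column);
    }

    // Column slice: columns with names in [start, end), in name order.
    // This toy uses a half-open range for simplicity.
    public SortedMap<String, String> slice(String rowKey, String start, String end) {
        SortedMap<String, String> row = rows.get(rowKey);
        return row == null ? new TreeMap<String, String>() : row.subMap(start, end);
    }

    public static void main(String[] args) {
        ToyColumnFamily users = new ToyColumnFamily();
        users.insert("user1", "name", "Alice");
        users.insert("user1", "email", "alice@example.com");
        users.insert("user1", "city", "Colombo");
        System.out.println(users.get("user1", "name"));
        System.out.println(users.slice("user1", "a", "f"));
    }
}
```

Because columns within a row are kept in name order, a slice is a range scan rather than a filter over every column; a real client such as Hector sends the equivalent get and slice requests to the cluster instead of reading a local map.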

Written by risenfall

September 3, 2011 at 12:01 pm

Posted in Carbon, FOSS, NoSQL, WSO2


Column (Family Data) Store Service in WSO2 StratosLive


The StratosLive PaaS supports several internal data stores, such as the column (family data) store service and the relational data store service, as well as external data sources such as Amazon DS and Amazon S3. Users can also use external data sources via Web Services.

WSO2 introduced CSS in the StratosLive PaaS to support web-scale data generated by users' deployed applications and by the PaaS itself.

WSO2 Stratos CSS is based on Apache Cassandra, modified to run on the WSO2 Carbon platform, which is an OSGi environment. Stratos CSS 1.0.0 ships with Stratos 1.5.1, and users can install it in WSO2 private cloud deployments. CSS features can also be deployed with any standalone Carbon product to get the full feature set.

StratosLive has a separate CSS cluster deployed to store tenant keyspaces. The StratosLive Data Services Server (DSS) contains the user interfaces for managing keyspaces.

CSS is multi-tenant, and it also works with users in private Stratos deployments.

WSO2 CSS 1.0.0 features:

1. Manage (create / delete / modify) keyspaces

2. Share keyspaces among users

3. Create indexes

4. Monitor keyspaces

WSO2 CSS has an easy-to-use interface for managing keyspaces, and users can also use it to manage external keyspaces, so WSO2 CSS can serve as a general Cassandra management user interface. The interface covers the following tasks:

  • List keyspaces
  • List keyspace information
  • Create a keyspace for a tenant
  • Create a column family
  • Create columns and set indexes
  • Share a keyspace

WSO2 Stratos PaaS column (family) data support will improve with CSS-based data services and CQL support in upcoming CSS releases.

Written by risenfall

September 2, 2011 at 8:56 am

Posted in Carbon, NoSQL, SOA, WSO2


WSO2Con 2011


WSO2Con 2011 is taking place at Waters Edge in Colombo, Sri Lanka, from September 12 to 16. It is the second WSO2 developer conference, and it focuses on WSO2's new PaaS offering, Stratos, and its hosted service, StratosLive.

The WSO2Con 2011 main conference starts on September 13 and ends on September 15. Pre-conference and post-conference tutorials give hands-on experience with the WSO2 product stack. Each conference day has two tracks, so participants can select talks based on their interests; check the conference agenda early and choose your track.

At this year's conference, WSO2 customers and partners are presenting solutions they have developed with WSO2 products.

Written by risenfall

September 1, 2011 at 1:45 pm

Posted in FOSS, SOA, WSO2

Happy 6th Birthday WSO2

6 Years in SOA.

 

Written by risenfall

August 4, 2011 at 8:00 pm

Posted in FOSS, SOA, WSO2
