Vos User Manual

Introduction

Vos is a program for processing formatted data, e.g. CSV data. Vos is designed to process large input files, including files larger than the available memory, and can be tuned to suit your machine's environment.

Vos currently has four main features:

Building and Compiling Vos

Vos was developed on a GNU/Linux system, so the prerequisites below are only known to be valid on systems running GNU/Linux. Other Unix-like systems can usually compile the source as well, but this has not been fully tested yet.

Software Requirements

The software and tools below were used in developing Vos, so we recommend using the same or a later version when building Vos from source.

Compiling Vos From Source

These steps assume that you have already downloaded the source archive and saved it on your machine.

	$ tar jxvf vos-xxxx.xx.xx.tar.bz2
	$ cd vos/src
	$ make

Where xxxx.xx.xx is the Vos version (depending on which version you downloaded). Running "make" creates a directory "build" inside the "vos" directory and places the vos executable there (vos/build).

For later use, you should copy the Vos executable into a directory in your PATH. For example:

	$ pwd
	/home/johndoe/tmp/vos/src
	$ echo $PATH
	/home/johndoe/bin:/usr/local/bin:/usr/bin:/bin
	$ cp ../build/vos /home/johndoe/bin

Running Vos

The Vos program takes only one parameter: a vos script.

	vos <vos-script>

vos-script is a file containing the Vos statements to be executed.

Vos Environments

Before running Vos, you can set several environment variables to change the behaviour of the program while it runs. Some of these can also be set in the vos script using Vos variables.

Vos Script

To illustrate how a Vos script works, we will use two input files as examples: "artist.data" and "album.data".

artist.data
1,"Broken Social Scene"
2,"U2"
3,"Led Zeppelin"
4,"John Legend"
5,"Deep Purple"
album.data
'You Forgot it in People'   1
'Burn'                      5
'Get Lifted'                4
'The Joshua Tree'           2
'Broken Social Scene'       1

Vos Variables

Vos variables are used with the "SET" statement.

Vos variables are used to adapt Vos to the environment where it runs. For example, say you have a machine with 8 processors and 16 GB of memory, and you want to sort 20,000,000 rows of data with a total size of about 2 GB. Instead of using the default maximum number of rows (100,000) with two threads, you can set the maximum rows to 2,500,000 and the maximum threads to 8, which will decrease the processing time.

There are two ways to set a Vos variable: first, by defining it explicitly in the vos script with the SET statement; second, by defining it as an environment variable using the shell's set or export.

Vos Statements

Vos script is not case sensitive: "Load" is the same as "LOAD".

Set Statement

[Figure: syntax of the Set statement]
For an example of how to use the Set statement, and for the list of variables, see Vos Variables.

Load Statement

[Figure: syntax of the Load statement]

Example of using the Load statement:

LOAD "artist.data" (
	   :idx :   ::',',
	'"':name:'"'::
) as artist;

LOAD "album.data" (
	'\'':title     :'\''::,
	    :artist_idx:    :28:28
) as album;

Sort Statement

[Figure: syntax of the Sort statement]

Example of using the Sort statement:

This script sorts artist.data by name (second field) in descending order:

LOAD "artist.data" (
	   :idx :   ::',',
	'"':name:'"'::
) as artist;

SORT artist BY name DESC;

If you run the script, the output would look like this:

2|U2
3|Led Zeppelin
4|John Legend
5|Deep Purple
1|Broken Social Scene

This script sorts album.data by artist_idx (second field), then by title (first field), and saves the output to the file album_sorted.data.

LOAD "album.data" (
	'\'':title     :'\''::,
	    :artist_idx:    :28:28
) as album;

SORT album BY artist_idx, title INTO "album_sorted.data";

If you run the script, album_sorted.data would contain:

Broken Social Scene|1
You Forgot it in People|1
The Joshua Tree|2
Get Lifted|4
Burn|5

Create Statement

[Figure: syntax of the Create statement]

The Create statement is used to create new data with a new format or with a different field output order.
It can also be used to combine several input files into one file.

Example of using the Create statement:

This script combines artist.data and album.data into one file, with fields separated by '|'.

LOAD "artist.data" (
	   :idx :   ::',',
	'"':name:'"'::
) as artist;

LOAD "album.data" (
	'\'':title:'\''::,
	    :artist_idx::28:28
) as album;

CREATE "artist_album.data" from artist, album (
	   :artist.idx      :   ::'|',
	'"':artist.name     :'"'::'|',
	   :album.artist_idx:   ::'|',
	'[':album.title     :']'::
);

If you run the script, artist_album.data would contain:

1|"Broken Social Scene"|1|[You Forgot it in People]
2|"U2"|5|[Burn]
3|"Led Zeppelin"|4|[Get Lifted]
4|"John Legend"|2|[The Joshua Tree]
5|"Deep Purple"|1|[Broken Social Scene]

Join Statement

[Figure: syntax of the Join statement]
[Figure: Join rules]

The Join statement is used to combine two input files into one file, like the Create statement, but it uses specific fields as the matching rule.

Example of using the Join statement:

LOAD "artist.data" (
	   :idx :   ::',',
	'"':name:'"'::
) as artist;

LOAD "album.data" (
	'\'':title     :'\''::,
	    :artist_idx:    :28:28
) as album;

JOIN artist, album INTO "join_artist_album.data" (
        artist.idx = album.artist_idx
);

If you run the script, join_artist_album.data would contain:

1|Broken Social Scene|You Forgot it in People|1
2|U2|The Joshua Tree|2
4|John Legend|Get Lifted|4
5|Deep Purple|Burn|5

Field Clause

[Figure: syntax of the Field clause]
Priority of Quote, Position, and Separator

First, when reading field data, the start-position has a higher priority than the left-quote. For example, suppose the input data looks like this:

'You Forgot it in People'

and you define the field like this:

	'\'':field00:'\'':4:22:

Vos will always start reading at position 4 instead of at the left-quote, and will stop at end-position 22, which results in " Forgot it in Peopl".

Second, while reading field data, the end-position has a higher priority than the right-quote, and the right-quote has a higher priority than the separator.

Filter Clause

[Figure: syntax of the Filter clause]

Example of using the Filter clause:

This script writes only the artist and album rows whose idx value is 1 (artist.idx for artist, album.artist_idx for album).

LOAD "artist.data" (
	   :idx :::',',
	'"':name:'"'::
) as artist;

LOAD "album.data" (
	'\'':title     :'\''::,
	    :artist_idx:    :28:28
) as album;

CREATE "filter_artist_album.data" from artist, album (
	   :artist.idx	:::'|',
	'"':artist.name	:'"'::'|',
	'[':album.title	:']'::
) FILTER (
        ACCEPT artist.idx = 1,
        REJECT album.artist_idx != 1
);

If you run the script, filter_artist_album.data would contain:

1|"Broken Social Scene"|[You Forgot it in People]
|""|[Broken Social Scene]

Vos License

Copyright (C) 2009 M. Shulhan (ms@kilabit.info) All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

3. All advertising materials mentioning features or use of this software must
   display the following acknowledgment:
   "This product includes software written by M. Shulhan (ms@kilabit.info)"

4. The names "M. Shulhan" or "Vos" must not be used to endorse or promote
   products derived from this software without specific prior written
   permission.

5. Products derived from this software may not be called "Vos" nor may "Vos"
   appear in their names without prior written permission of the author.

THIS SOFTWARE IS PROVIDED BY SHULHAN "AS IS" AND ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.