Pentaho ETL








Pentaho 

Here, I am writing down my findings in Pentaho (Spoon) so that I can quickly look back at them if I need to.
This is more of a personal note and, as of now, I am not sure whether it will be of any help to others.



07 Feb 2014

ERROR HANDLING

How to enable it? Right-click on a step and you will see an option called Error Handling.




Note: Not all steps have this option. Error handling is disabled for the Table input step, the Stream lookup step, etc.


How to define the field names?
You can choose your own names for the predefined error handling fields. These fields are then added to the fields flowing in your current stream.



Currently, in Pentaho 5.0, there is a bug with the ERROR_COUNT field: for some of the rows its type comes through as Long instead of Integer. As a temporary fix, we handled this with the following Java code (the processRow method of a User Defined Java Class step):

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    // Read the next row; null means there is no more input.
    Object[] r = getRow();
    if (r == null) {
        setOutputDone();
        return false;
    }

    try {
        // Check that ERROR_COUNT can be read and parsed as an integer.
        String errorsCount = get(Fields.In, "ERROR_COUNT").getString(r);
        Integer.parseInt(errorsCount);
    } catch (Exception e) {
        // The value could not be read or parsed: set ERROR_COUNT to null.
        Long errorsCount = null;
        get(Fields.Out, "ERROR_COUNT").setValue(r, errorsCount);
    }

    // Send the row on to the next step.
    putRow(data.outputRowMeta, r);

    return true;
}
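
The check itself can be tried outside Spoon. Below is a minimal plain-Java sketch of the same idea, assuming all we want is "keep the value if its text form parses as an integer, otherwise null it out". The class name ErrorCountCheck and the method normalize are made up for this note and are not part of Kettle; the real step above also catches any exception thrown while reading the field, which this sketch does not model.

```java
// Standalone sketch of the ERROR_COUNT check; hypothetical names, not Kettle code.
class ErrorCountCheck {

    // Returns the value unchanged if its text form parses as an int,
    // otherwise returns null (mirrors the catch block in the step code).
    static Object normalize(Object value) {
        try {
            Integer.parseInt(String.valueOf(value));
            return value;
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(normalize(3L));    // parses, value is kept
        System.out.println(normalize("abc")); // does not parse, becomes null
    }
}
```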

How to disable the error handling hop?


Table output step: Commit Size

The commit size for the Table output step in Pentaho seems to be 200.



Table output step: Batch update insert




If you turn on "Use batch updates for inserts", rows are written to the table in batches of the commit size mentioned above.
Let's say you have 100 input rows and a commit size of 10: the transformation writes to the target in 10 batches.
Say records 35 and 74 have errors. The transformation will then error out records 31-40 and 71-80 (2 batches fail).

If we want to error out only the records that actually have errors, we need to turn off the "Use batch updates for inserts" check box.
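
To double-check the arithmetic in the example, here is a small plain-Java sketch. It does not use any Pentaho API; it only models the behaviour described above, assuming that a batch containing at least one bad row is rejected as a whole. The class BatchRejectSketch and the method rejectedRows are names I made up for this note.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Models which rows get rejected when whole batches fail; not Pentaho code.
class BatchRejectSketch {

    // Returns the 1-based row numbers that are rejected when every batch
    // containing at least one bad row fails as a whole.
    static List<Integer> rejectedRows(int totalRows, int batchSize, Set<Integer> badRows) {
        List<Integer> rejected = new ArrayList<>();
        for (int start = 1; start <= totalRows; start += batchSize) {
            int end = Math.min(start + batchSize - 1, totalRows);
            boolean batchFails = false;
            for (int row = start; row <= end; row++) {
                if (badRows.contains(row)) {
                    batchFails = true;
                    break;
                }
            }
            if (batchFails) {
                for (int row = start; row <= end; row++) {
                    rejected.add(row);
                }
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        Set<Integer> bad = new HashSet<>(Arrays.asList(35, 74));
        // Batches of 10: rows 31-40 and 71-80 are rejected.
        System.out.println(rejectedRows(100, 10, bad));
        // Batch size 1 stands in for batching turned off: only rows 35 and 74.
        System.out.println(rejectedRows(100, 1, bad));
    }
}
```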

Concatenate with new line character:

You can do this either with
1) a Java expression, or
2) in the SQL.
Example using Oracle:
(COLUMN1||CHR(10)||COLUMN2) as CONCAT_COL
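
For the Java expression route, a minimal sketch of the concatenation in plain Java is below. CHR(10) in Oracle is the line feed character, which is "\n" in Java. The class NewlineConcat and the method concatWithNewline are made-up names for illustration; inside Spoon you would only need the expression itself, with your own field names in place of column1 and column2.

```java
// Plain-Java sketch of joining two values with a new line character.
class NewlineConcat {

    // Equivalent in spirit to COLUMN1||CHR(10)||COLUMN2 in Oracle.
    static String concatWithNewline(String column1, String column2) {
        return column1 + "\n" + column2;
    }

    public static void main(String[] args) {
        System.out.println(concatWithNewline("first line", "second line"));
    }
}
```

One difference to keep in mind: Oracle treats a NULL operand of || as an empty string, while Java string concatenation turns a null reference into the text "null", so null fields may need handling before this expression.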


Update step:

If you are updating a record and you enable the "Skip lookup" option, the Update step will not perform the lookup and will not update the record. (Needs more research.) For now, make sure you do not enable it.
Maybe it is designed for a scenario where you know for sure that the lookup is not needed, but still, keep away from it.
