Pentaho ETL
Here I am writing down my findings in Pentaho (Spoon) so that I can quickly look them up when I need to.
This is more of a personal note, and as of now I am not sure whether it will be of any help to others.
07 Feb 2014
ERROR HANDLING
How to enable it? Right-click the step and you will see an option called Error Handling.
How to define the field names?
You can give your own names to the predefined error handling fields. These fields are then added to the fields flowing in your current stream.
Currently, in Pentaho 5.0, there is a bug with the ERROR_COUNT field: for some rows it gets the type Long instead of Integer. As a temporary fix, we handle it with Java code (the processRow method of a User Defined Java Class step):
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    Object[] r = getRow();
    // No more input rows: signal that this step is finished.
    if (r == null) {
        setOutputDone();
        return false;
    }
    try {
        // Check that ERROR_COUNT can be read and parsed as an int.
        String errorsCount = get(Fields.In, "ERROR_COUNT").getString(r);
        Integer.parseInt(errorsCount);
    } catch (Exception e) {
        // The value could not be handled: blank it out instead of failing.
        Long errorsCount = null;
        get(Fields.Out, "ERROR_COUNT").setValue(r, errorsCount);
    }
    // Send the row on to the next step.
    putRow(data.outputRowMeta, r);
    return true;
}
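To see what the try/catch above does to different values, here is a minimal standalone sketch that can be run outside Spoon. The class name ErrorCountCheck and the method normalize are made up for this note and are not part of the Kettle API. Unlike the step code, which leaves a good value untouched in the row, this helper returns it as a Long.

```java
public class ErrorCountCheck {

    // Hypothetical helper (not a Kettle API): returns the value as a Long when
    // its text form parses as an int, otherwise null, like the catch block above.
    static Long normalize(Object errorCount) {
        try {
            return (long) Integer.parseInt(String.valueOf(errorCount));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(normalize("3"));    // a value that parses
        System.out.println(normalize(7L));     // a Long that fits in an int
        System.out.println(normalize("abc"));  // a value that does not parse
    }
}
```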
How to disable hop of error handling?
Table output step: Commit Size
The commit size for the Table output step in Pentaho seems to be 200.
Table output step: Batch update insert

If you turn on "Use batch updates for inserts", rows are written to the table in batches of the commit size mentioned above.
Let's say you have 100 input rows and want to write them to the target with a commit size of 10: the transformation writes them in 10 batches.
Say records 35 and 74 have errors. The transformation will then error out records 31-40 and 71-80 (2 whole batches fail).
If we want to error out only the records that actually have errors, we need to turn off the "Use batch updates for inserts" check box.
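The arithmetic of the example can be checked with a small simulation. This is only a sketch of the behaviour described in this note, not Pentaho code: the class BatchRejectDemo and the method rejected are made up, and a batch size of 1 stands in for the check box being turned off.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BatchRejectDemo {

    // Returns the 1-based record numbers that end up being errored out.
    // Assumption taken from the note: one bad record rejects its whole batch.
    static List<Integer> rejected(int totalRows, int batchSize, Set<Integer> badRecords) {
        List<Integer> out = new ArrayList<>();
        for (int start = 1; start <= totalRows; start += batchSize) {
            int end = Math.min(start + batchSize - 1, totalRows);
            boolean batchFails = false;
            for (int i = start; i <= end; i++) {
                if (badRecords.contains(i)) {
                    batchFails = true;
                    break;
                }
            }
            if (batchFails) {
                // Every record of a failed batch is rejected, good or bad.
                for (int i = start; i <= end; i++) {
                    out.add(i);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<Integer> bad = new HashSet<>(Arrays.asList(35, 74));
        System.out.println(rejected(100, 10, bad)); // batches of 10: 31-40 and 71-80
        System.out.println(rejected(100, 1, bad));  // row by row: [35, 74]
    }
}
```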
Concatenate with new line character:
You can do this either
1) with a Java expression as above, or
2) in the SQL.
Example using Oracle:
(COLUMN1||CHR(10)||COLUMN2) as CONCAT_COL
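For the Java expression option, the same concatenation could look roughly like the sketch below. The class NewlineConcat and the method concat are made up for illustration. The null handling is a choice made here to mimic Oracle, where || treats NULL as an empty string; plain Java string concatenation would print the text "null" instead.

```java
public class NewlineConcat {

    // Joins two column values with a line feed, like COLUMN1||CHR(10)||COLUMN2.
    // Nulls are mapped to "" to mimic Oracle's || (an assumption of this sketch).
    static String concat(String column1, String column2) {
        String left = column1 == null ? "" : column1;
        String right = column2 == null ? "" : column2;
        return left + "\n" + right;
    }

    public static void main(String[] args) {
        // Prints the two values on separate lines.
        System.out.println(concat("first value", "second value"));
    }
}
```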
Update step:
If you are trying to update a record and you enable the skip lookup option, the Update step will not do the lookup and will not update. (Needs more research.) For now, make sure you do not enable it.
Maybe it is designed for a scenario where you know for sure that the lookup is not needed, but still, keep away from it.