Thursday, December 19, 2013

Unofficial benchmark results comparing SSIS to Talend

As a follow-up to my previous post, here are the results of my tests comparing extraction-only processing rates between SSIS and Talend.

Flat to Flat
I created a flat file (.csv) with 10M rows and the following fields (the types indicate the approximate size of the expected data):
Id - Bigint
OtherId - Int
SomeOtherNumber - Int
RandomName - VarChar(250)
AnotherNumber - Int
BitField - true or false
EmailAddress - VarChar(250)
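If you want to reproduce something similar, here is a minimal Python sketch that writes a synthetic CSV with the same seven fields. The value ranges, word lengths, the `example.com` email domain and the function names are my assumptions for illustration; this is not the script that produced the original test file.

```python
import csv
import random
import string

# Hypothetical column layout mirroring the field list above; value ranges and
# string lengths are illustrative assumptions, not the original test data.
HEADER = ["Id", "OtherId", "SomeOtherNumber", "RandomName",
          "AnotherNumber", "BitField", "EmailAddress"]

INT_MAX = 2**31 - 1  # upper bound of a SQL Server Int


def random_word(rng: random.Random, min_len: int = 5, max_len: int = 20) -> str:
    """Return a random lowercase word of length min_len..max_len."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choices(string.ascii_lowercase, k=length))


def write_test_csv(path: str, rows: int, seed: int = 42) -> None:
    """Write a header row plus `rows` synthetic records to `path`."""
    rng = random.Random(seed)  # fixed seed so every tool reads identical data
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        for row_id in range(1, rows + 1):
            writer.writerow([
                row_id,                             # Id (Bigint)
                rng.randint(1, INT_MAX),            # OtherId (Int)
                rng.randint(1, INT_MAX),            # SomeOtherNumber (Int)
                random_word(rng),                   # RandomName
                rng.randint(1, INT_MAX),            # AnotherNumber (Int)
                rng.choice(["true", "false"]),      # BitField
                f"{random_word(rng)}@example.com",  # EmailAddress
            ])
```

Calling `write_test_csv("test_10m.csv", rows=10_000_000)` produces a file with the same row count; the fixed seed keeps the data identical between the SSIS and Talend runs.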

I did three runs to verify consistency; the results are below:

SSIS: Average duration 44.5 seconds
Talend: Average duration 1 minute 9 seconds
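The post doesn't say how the runs were timed. One way to get an average over three runs is a small wrapper like the one below; `time_runs` is a hypothetical helper, and it also reports the sample standard deviation as a quick check on consistency.

```python
import statistics
import time
from typing import Callable


def time_runs(job: Callable[[], None], runs: int = 3) -> tuple[float, float]:
    """Run `job` `runs` times; return (mean seconds, sample stdev seconds)."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        job()
        durations.append(time.perf_counter() - start)
    # stdev needs at least two samples; a single run has no spread
    spread = statistics.stdev(durations) if runs > 1 else 0.0
    return statistics.mean(durations), spread
```

The `job` could wrap a command-line launch of the exported package, for example `lambda: subprocess.run(["dtexec", "/F", "FlatToFlat.dtsx"], check=True)`, where the package name is made up for illustration.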

Large Flat File to SQL Server
I used the same flat file as above, but loaded it into SQL Server 2012 located on my laptop. So: Flat File -> SQL Server 2012, with no transformations, just a straight mapping of source columns to table fields. Again, I did three runs to verify consistency; the results are as follows:

SSIS: Average duration 59.6 seconds
Talend: Average duration 6 minutes 56 seconds
(nope, that's not a typo, that's nearly 7 minutes)
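To put the two 10M-row tests on the same scale, the average durations can be turned into rows per second. This sketch assumes exactly 10,000,000 rows (the post says "10M") and copies the averages reported above, converting minutes to seconds.

```python
# Rows per second implied by the reported average durations.
# Assumption: "10M rows" means exactly 10,000,000.
ROWS = 10_000_000

# (test, tool) -> average duration in seconds, copied from the results above
AVERAGES = {
    ("Flat to Flat", "SSIS"): 44.5,
    ("Flat to Flat", "Talend"): 69.0,          # 1 min 9 s
    ("Flat to SQL Server", "SSIS"): 59.6,
    ("Flat to SQL Server", "Talend"): 416.0,   # 6 min 56 s
}


def rows_per_second(rows: int, seconds: float) -> float:
    """Throughput: rows processed divided by elapsed seconds."""
    return rows / seconds


for (test, tool), secs in AVERAGES.items():
    print(f"{test:<20} {tool:<7} {rows_per_second(ROWS, secs):>10,.0f} rows/s")
```

That works out to roughly 224,700 vs 144,900 rows/s for flat to flat, and roughly 167,800 vs 24,000 rows/s for the SQL Server load, so Talend took about 7x as long on that test (416 / 59.6 ≈ 6.98).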

Many Files to SQL Server
Next, I tried ingesting from a folder with 3 subfolders holding 79 files, for a total of 2,187,842 records. Again three runs; the results are below:

SSIS: Average duration 22.8 seconds
Talend: Average duration 42.2 seconds
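Before timing a many-files load, it helps to confirm that both tools see the same file and record counts. The post doesn't name the file format, so this sketch assumes CSV files with a single header row and no line breaks inside field values; `count_records` is a hypothetical helper.

```python
from pathlib import Path


def count_records(root: str, pattern: str = "*.csv",
                  has_header: bool = True) -> tuple[int, int]:
    """Return (file_count, record_count) for every file matching `pattern`
    anywhere under `root`, including nested subfolders."""
    files = sorted(Path(root).rglob(pattern))  # recursive search
    total = 0
    for path in files:
        with path.open("r", encoding="utf-8", newline="") as f:
            lines = sum(1 for _ in f)
        # subtract the header row, but never go below zero for empty files
        total += lines - 1 if has_header and lines else lines
    return len(files), total
```

Run against the test folder, this should report the 79 files and 2,187,842 records described above if those assumptions about the format hold.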

Summary
This is a very unofficial benchmark, run to help decide whether we should start focusing on Talend as an ETL tool over SSIS. Since Talend boasts 450+ connectors out of the box and is platform independent, it was worth a look. I did not expect SSIS to outperform Talend in every scenario. I did expect better integration and speed with MSFT products (Flat File to SQL Server), which is what SSIS was designed for, and its performance there met my expectations.

This is not an indication that Talend is a poor solution, but in our environment, SSIS is the clear winner so far.

Notes:
To be completely fair, if I'd had time I would have installed Talend on Linux and run similar tests, but we are mostly an MSFT shop, so I went with what we have access to locally.

Hardware/Software configurations:
Dell Precision M4700, 32 GB RAM, Windows 7 64-bit
[Flat file storage, Talend and SSIS installed on laptop]
Virtual server with 32 GB RAM, Windows Server 2012 64-bit
[SQL Server 2012]
Talend Open Studio for Data Integration v5.4.1
I'm being pretty vague; as the title states, this is an unofficial benchmark.

I tried to keep only minimal processes running on both machines. I also ran tests without connecting to SQL Server, i.e., local flat-file-to-flat-file ETL with small files, large files, etc., trying to vary the scenarios we may have at work.
