One of appropriate test strategy for testing extract, transform and load (ETL) procedures is to use automation. In this post I will explain what on this field Microsoft has to offer.
When you need to test ETL procedure, you usually have a large set of test data. This is first indicator that automation could help you a lot. Microsoft offers, as part of SQL server, its extension of SQL, tSQL. Based on that, community around Microsoft SQL server developed tSQLt, database unit testing framework. It has features that are expected of every modern testing framework on database access object level. The most important one is transactional support, which means that during your test you can modify your database, perform your checks, and test will automatically revoke all database changes when it is done with execution.
As an IDE I recommend Microsoft SQL management studio express which gives you Microsoft Visual Studio features for database development.
You can start by taking quick start guide, which is actually quick. I do not recommend installing tSQLt to your development database. Write tests in tSQLt_Example database and access your database using database_name.database_table notation available in Microsoft SQL Server management studio.
In order to be able to use tSQLt, you need to now any kind of database stored procedure SQL extension. By following links in Quick Start guide, you will find all needed resources to start testing your ETL procedures.
And one lesson from BBST foundation course. tSQLt is database unit testing framework. What is unit in this context?
Please write your answers as comments of this post.
In my context I used tSQLt in following manner. I tested ETL procedure written in C# and run from command line. I explored input and output databases, and tried to discover business rules of ETL procedure. I coded those rules as tSQLt tests, and discuss them with business analyst using source of tSQLt tests as requirements documentation.
Hi Karlo,
I am happy to see that you are trying out tSQLt for your ETL procedures. However, I advice you to not write your tests in the tSQLt_Example database and to not use "database_name.database_table" in your tests.
Instead, you should write and execute your test cases in your local development database (never execute your tests in your production database). These are my reasons:
1. The tSQLt_Example database might get dropped and recreated next time you update tSQLt. You don't want to loose all of your tests.
2. Most of the tSQLt procedures, especially FakeTable, ApplyConstraint and SpyProcedure, expect that your test case and the objects you are testing are in the same database.
3. You want to keep your unit tests as easy to write as possible. Therefore it helps that they are in the same database.
4. You don't want your test cases to need to know the database name where your code is stored in case that database name changes.
Why do you "not recommend installing tSQLt to your development database"?
Hope this helps,
Dennis
Hi and thanks for the response. It is a nice change from robot automatic post replies 🙂
Here are my responses and reasoning behind them.
You are right, I did not explain my advice why tSQLt should not be installed in development database.
I am considered with SET TRUSTWORTHY On option. That is the only reason for advice that I gave in the post. And this is connect with need to run some tSQLt scripts on production environment. What if there is a need to run smoke test developed in tSQLt on production database? Or if devOps scripts are reworked using tSQLt framework?
I resolved point 1. risk by putting all tSQLt scripts in separate .sql files and storing them in TFS version control.
I did not have need to use those features. And this is credible point for installing tSQLt into development database.
3. This is not an issue when using mentioned option database_name.table_name and Microsoft SQL management studio.
4. I parametrised database names using .csv file.
How did you found out about this blog post?
Regards, KArlo.