How do I pursue career in data warehousing?

Standard

Someone asked this on quora, and here’s my reply:

In the data world there are two broad sets of jobs available:

  1. Engineering-oriented: Date engineers, Data Warehousing specialists, Big Data engineer, Business Intelligence engineer— all of these roles are focused on building that data pipeline using code/tools to get the data in some centralized location
  2. Business-oriented: Data Analyst, Data scientist — all of these roles involve using data (from those centralized sources) and helping business leaders make better decisions. *

*smaller companies (or startups) tend to have roles where small teams(or just one person) do it all so the distinction is not that apparent.

So, it seems like you are interested in engineering-oriented roles — the role that focused on building data pipelines. Since you are starting out, I would suggest that you broaden the scope to learn about other tools as well. While data warehousing is still relevant and will be in some form or another for next few years, Industry (especially tech companies) have been slowly moving towards Big Data technologies and you need to be able to adapt to these changes. So learn about data warehousing, may be get a job/internship as a ETL/BI engineer but keep an eye out on other data engineering related tools like Hadoop ecosystem, spark, python, etc.

VIEW QUESTION ON QUORA

How to fix the Non-unicode to unicode data type conversion problems in SQL Server Integration Services?

Standard

Problem:

Are you trying to import an Excel file into SQL Server using SQL Server Integration services…And ran into error that has words like “Non unicode” and “unicode”? Then this blog is for you.

Why does this error occur?

Well it turns out that things like SQL Server and Excel have encoding standards that they follow which provides them a way to process, exchange & store data. BUT turns out that SQL Server and Excel use different standards.

Solution:

So, the solution is simple right? Import the data from Excel into non-Unicode format because that’s what you need for SQL Server.

So how do you that? Between your Source and Destination tasks, include a task called “Data conversion” and do the following for all columns that have text:

Excel SQL Server Unicode Nonunicode

And in the destination task, you’ll have to make sure that the mapping section using the new output aliases that you defined in the “data conversion” step.

Conclusion:

In this post, we learned about how to solve a common error that pops up when you try to import excel file to sql server using SSIS. Hope that helps.

Author: Paras Doshi

Every Data Analyst Needs to check out this FREE excel add-in: Power Query!

Standard

Power Query is amazing! It takes the data analysis capabilities of Excel to whole new level! In this post, I am going to share three reasons:

1. it enables repeatable mash-up of data!

Have you every had to do your data analysis tasks repeatedly on the data with same structure? Do you get “new” data every other week and need to go through the same data transformation workflow to get to the data that you need?

What’s the solution? Well, you can look at MACRO’s! Or you can request your IT department to create a Business Intelligence platform. However, what if you need to modify your data mashup workflow then these solutions don’t look great, do they now?

Don’t worry! Power Query is here!

It enables repeatable mashup of data like you might have never seen before! You need to try it to believe.

It’s very easy to input new data to Power Query and it enables you to retrieve final output based on new data using a “refresh” feature.

Each data-mashup is recorded as steps which you can go back and edit if you need to.

Power Query Refresh

2. It’s super-flexible!

Any data mashup performed using Power Query is expressed using its formula language called “M”. You can edit the code if you need to and as you can imagine such a platform enables much-needed flexibility for the analyst’s.

3. It has awesome advance features!

Do you want to Merge data? How about Join? Are you tired with VLOOKUP’s! Don’t worry! it’s super easy with Power Query! Here’s a post: Join Excel Tables in Power Query

How about Pivot or Unpivot? Done! Check this out: Unpivot excel data using Power Query

How about searching for online & open data sets? Done!

How about connecting to data sources that “Data” section of Excel doesn’t support yet? (Example: Facebook) – DONE! Power Query makes that happen for you.

And That’s not a complete list!

Plus you can unlock the “Power” (pun intended) of Power Query by using it with other tools in Power BI Stack. (Power Pivot, Power View, etc…) OR you can use the your final output from Power Query with other tools too! After all it’s an excel file.

Action-Item!

If you haven’t already then check out Power Query! it’s free and works with Excel 2010 and above.

Author: Paras Doshi

SQL server Integration services: How to solve “The value violated the integrity constraints for the column” error?

Standard

problem:

you are working on an SSIS package to load a table from a source system and you get an error “The value violated the integrity constraints for the column error” – how do you solve it?

solution:

one the things that the error message should also tell you would be column name. What you want to do is check the table definition of the destination table for any integrity constraints like NOT NULL or PRIMARY KEY. Now once you have that information, go back to your source and figure out if it’s trying to add NULL values in a column that has NOT NULL integrity constraint. Or may be ETL logic is trying to insert duplicate value to the column that has primary key constraint.

Also, the don’t alter the destination table to accept NULL’s or remove integrity constraint. You want to put a logic in your ETL OR fix the data integrity at source. You can use TSQL functions like NULLIF to handle NULL values while querying source systems.

Conclusion:
In this post, we saw how to solve the “The value violated the integrity constraints for the column” error in SSIS.

SQL Server Integration services: How to write a package that does Set based updates?

Standard

Problem:

if you’ve a sizable number of rows that need update in SSIS, then you don’t want to do a row based update commands because it won’t be efficient. if you’ve good number of rows that need to be updated then you can use the SET Based updates. it’s a common design pattern for loading dimensions in a data warehouse.

Find the steps below:

Solution:

There are two main steps to achieve this:

1) Populate the “update” table with rows that have been changed. Note that a new table needs to be created.

2) Run the SQL command to do a SET based update

1. SSIS Set based Updates Integration ServicesLet’s see each step in detail:

1)  Populate the “update” table with rows that have changed.

For this step, first make sure that you have a table that can hold the rows that have been updated.

Then create a Data Flow that take the source data and lookups the data that has changed and puts it in an update/staging table:

2 SSIS Populate the Table with Rows that changed

Note: I’ve used a small table for demo purpose but you won’t use this method if you don’t have a more rows to update because as you can see this method adds an overhead of putting the data in the update table first.

2) Run the SQL command to do a SET based update

Here’s the sample query:

[code language=”sql”]
— run the update command
Update Dim
Set
[Column1]=Upd.[Column1]
[Column2]=Upd.[Column2]
— [Column3]=Upd.[Column3]
— …
From dbo.DimDestination Dim
Inner Join dbo.Destination_update Upd
on Dim.Destination_sk=Upd.Destination_sk

–Truncate update table
Truncate table dbo.Destination_update
[/code]

Conclusion:

In this post, we saw how to write a package in SSIS that does SET based updates.