News: PASS outstanding Volunteer award & stepping down as Business Analytics Virtual Group Co-leader

I am honored to receive the PASS Outstanding Volunteer award again for June 2017! It’s been so much fun helping grow the chapter from 1K to 10K members within the last 4 years. The PASS HQ team and Dan English (Group Lead) were great to work with, and there’s so much more growth left for the next few years! The group was recently classified as a “tier-1” group and got new sponsors, which means that the group has some funding to pursue paid growth opportunities that weren’t accessible before.

URL: http://www.pass.org/Community/GetInvolved/Volunteers/OutstandingVolunteers.aspx

Since the group has the perfect platform to continue growing and we have a really good process in place to keep our growth flywheel running, I figured it’s a great time to step down. Over the past few years, my career moved me from Business Intelligence -> Analytics -> Data Science, and along with that I have slowly moved away from Microsoft-centric architectures too. I started out working for a Microsoft Gold Partner, then worked for an open-source-heavy shop at a startup-mode organization in Silicon Valley, and now I work in an organization that uses a little bit of everything: something like the best of both worlds. So there’s a much bigger gap now between where my career is taking me and the mission of the Business Analytics Virtual Group. They don’t perfectly align anymore, and even though it’s a very rewarding experience, after some reflection I figured the group deserves a leader whose mission aligns better than mine does.

Thank you PASS for the opportunity!

There’s also an open position for new volunteers on the Virtual Group, so if you would like to be involved, reach out to Dan English through the group’s website: http://bavc.pass.org/

SQL: How to add Hierarchical Levels in your Query?

Tree-like structures are common in our world: a company hierarchy, the file system on your computer and a product category map, among others. So you might run into the task of adding a hierarchical level to your SQL query. In this blog post, I will show you how you can do that using a couple of approaches. These approaches can also be used to map “parent – child” relationships.

Two approaches are:

  1. When you know the Tree Depth
  2. When you don’t know the Tree Depth

#1: When you know tree-depth:

When you know the tree-depth (and it’s not too deep), you could use simple CTEs, one per level, to come up with the Hierarchical Levels field in your query.

Let’s take an example:

Input:

EmployeeID FirstName LastName Title                        ManagerID
---------- --------- -------- ---------------------------- ---------
1          Ken       Sánchez  Chief Executive Officer      NULL
16         David     Bradley  Marketing Manager            273
273        Brian     Welcker  Vice President of Sales      1
274        Stephen   Jiang    North American Sales Manager 273
285        Syed      Abbas    Pacific Sales Manager        273

Query: (On SQL Server)


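-- Level 1: the root of the tree (no manager)
WITH lvl1 AS
(
    SELECT [EmployeeID], [FirstName], [LastName], [Title], [ManagerID], 1 AS [Level]
    FROM [dbo].[employees]
    WHERE [ManagerID] IS NULL
),
-- Level 2: employees who report to someone in level 1
lvl2 AS
(
    SELECT [EmployeeID], [FirstName], [LastName], [Title], [ManagerID], 2 AS [Level]
    FROM [dbo].[employees]
    WHERE [ManagerID] IN (SELECT [EmployeeID] FROM lvl1)
),
-- Level 3: employees who report to someone in level 2
lvl3 AS
(
    SELECT [EmployeeID], [FirstName], [LastName], [Title], [ManagerID], 3 AS [Level]
    FROM [dbo].[employees]
    WHERE [ManagerID] IN (SELECT [EmployeeID] FROM lvl2)
)
-- The three levels never overlap, so UNION ALL is enough
SELECT * FROM lvl1
UNION ALL
SELECT * FROM lvl2
UNION ALL
SELECT * FROM lvl3
ORDER BY [Level], [EmployeeID];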

Output:

EmployeeID FirstName LastName Title                        ManagerID Level
---------- --------- -------- ---------------------------- --------- -----
1          Ken       Sánchez  Chief Executive Officer      NULL      1
273        Brian     Welcker  Vice President of Sales      1         2
16         David     Bradley  Marketing Manager            273       3
274        Stephen   Jiang    North American Sales Manager 273       3
285        Syed      Abbas    Pacific Sales Manager        273       3

#2: When you do NOT know tree-depth:

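In other words, if the tree is N levels deep and you don’t know N in advance, you are out of luck with option #1, because you would have to add a new CTE every time the tree gets deeper. In this case, you should consider the RECURSIVE CTE approach. Here’s an example: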

Input: (with the idea that this table will grow over time)

EmployeeID FirstName LastName Title                        ManagerID
---------- --------- -------- ---------------------------- ---------
1          Ken       Sánchez  Chief Executive Officer      NULL
16         David     Bradley  Marketing Manager            273
23         Mary      Gibson   Marketing Specialist         16
273        Brian     Welcker  Vice President of Sales      1
274        Stephen   Jiang    North American Sales Manager 273
275        Michael   Blythe   Sales Representative         274
276        Linda     Mitchell Sales Representative         274
285        Syed      Abbas    Pacific Sales Manager        273
286        Lynn      Tsoflias Sales Representative         285

Query: (On SQL Server that supports Recursive CTE)


WITH HierarchyLvl AS
(
    -- Anchor member: the root of the tree (no manager) is level 1
    SELECT [EmployeeID], [FirstName], [LastName], [Title], [ManagerID], 1 AS [Level]
    FROM [dbo].[employees]
    WHERE [ManagerID] IS NULL

    UNION ALL

    -- Recursive member: each direct report sits one level below its manager
    SELECT e.[EmployeeID], e.[FirstName], e.[LastName], e.[Title], e.[ManagerID], d.[Level] + 1
    FROM [dbo].[employees] e
    INNER JOIN HierarchyLvl d ON e.[ManagerID] = d.[EmployeeID]
)
SELECT * FROM HierarchyLvl;

Output:

EmployeeID FirstName LastName Title                        ManagerID Level
---------- --------- -------- ---------------------------- --------- -----
1          Ken       Sánchez  Chief Executive Officer      NULL      1
273        Brian     Welcker  Vice President of Sales      1         2
16         David     Bradley  Marketing Manager            273       3
274        Stephen   Jiang    North American Sales Manager 273       3
285        Syed      Abbas    Pacific Sales Manager        273       3
286        Lynn      Tsoflias Sales Representative         285       4
275        Michael   Blythe   Sales Representative         274       4
276        Linda     Mitchell Sales Representative         274       4
23         Mary      Gibson   Marketing Specialist         16        4
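
If you want to try the recursive approach without a SQL Server instance, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. SQLite, the hierarchy_levels helper and the extra “Intern” row in the usage note are my assumptions for illustration, not part of the original setup. Also note that SQLite spells the clause WITH RECURSIVE, whereas SQL Server uses plain WITH.

```python
import sqlite3

# Sample rows copied from the input table above:
# (EmployeeID, FirstName, LastName, Title, ManagerID)
EMPLOYEES = [
    (1, "Ken", "Sánchez", "Chief Executive Officer", None),
    (16, "David", "Bradley", "Marketing Manager", 273),
    (23, "Mary", "Gibson", "Marketing Specialist", 16),
    (273, "Brian", "Welcker", "Vice President of Sales", 1),
    (274, "Stephen", "Jiang", "North American Sales Manager", 273),
    (275, "Michael", "Blythe", "Sales Representative", 274),
    (276, "Linda", "Mitchell", "Sales Representative", 274),
    (285, "Syed", "Abbas", "Pacific Sales Manager", 273),
    (286, "Lynn", "Tsoflias", "Sales Representative", 285),
]

# Same shape as the SQL Server query: an anchor member for the root,
# then a recursive member that adds 1 to the manager's level.
RECURSIVE_QUERY = """
WITH RECURSIVE HierarchyLvl AS (
    SELECT EmployeeID, ManagerID, 1 AS Level
    FROM employees
    WHERE ManagerID IS NULL
    UNION ALL
    SELECT e.EmployeeID, e.ManagerID, d.Level + 1
    FROM employees AS e
    INNER JOIN HierarchyLvl AS d ON e.ManagerID = d.EmployeeID
)
SELECT EmployeeID, Level FROM HierarchyLvl ORDER BY Level, EmployeeID
"""


def hierarchy_levels(rows):
    """Load rows into a throwaway SQLite table and return {EmployeeID: Level}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (EmployeeID INTEGER PRIMARY KEY, FirstName TEXT, "
        "LastName TEXT, Title TEXT, ManagerID INTEGER)"
    )
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?)", rows)
    result = dict(conn.execute(RECURSIVE_QUERY).fetchall())
    conn.close()
    return result


levels = hierarchy_levels(EMPLOYEES)
print(levels)
```

Adding a made-up row such as (300, "Test", "Person", "Intern", 23) and calling hierarchy_levels again should put that employee at level 5 without any change to the query, which is the whole point of the recursive approach.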

Conclusion:

Even if I know the tree-depth, I will go with option #2 as it’s much easier to read and can accommodate future updates to the table. If you are interested in learning more about this, search for “Recursive Query using Common Table Expression” and you should find technical articles that talk about why it does what it does.

Hope this helps!

How to remove line feeds (LF) and carriage returns (CR) from a text field in SQL Server?

I was doing some data cleaning the other day and ran into text fields that had line feeds (LF) and carriage returns (CR) in them, which creates a lot of issues when you import/export data. I had run into this problem before and didn’t remember what I did back then, so I am putting the solution here so it can be referenced later if need be.

To solve this, you need to remove LF, CR and/or the combination of both. Here’s the T-SQL syntax for SQL Server to do so:

SELECT REPLACE(REPLACE(@YourFieldName, CHAR(10), ' '), CHAR(13), ' ')

If you’re using some other database system, you need to figure out how it identifies CRs and LFs. In SQL Server, the CHAR() function does that (CHAR(10) is LF and CHAR(13) is CR), and there should be something similar in the database system that you’re using.
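
As a quick way to see what the nested REPLACE does, here is a small sketch that runs the same expression in SQLite from Python. SQLite and the two helper function names are assumptions of mine; SQLite happens to have REPLACE() and CHAR() functions that behave like the SQL Server ones for this purpose. The second helper is an optional variation that replaces the CR+LF pair first, so a Windows-style line break becomes one space instead of two (in T-SQL the pair would be written CHAR(13) + CHAR(10) rather than with ||).

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def strip_line_breaks(text):
    """Same expression as the T-SQL above: LF and CR each become one space."""
    sql = "SELECT REPLACE(REPLACE(?, CHAR(10), ' '), CHAR(13), ' ')"
    return conn.execute(sql, (text,)).fetchone()[0]


def strip_line_breaks_single_space(text):
    """Variation: handle the CR+LF pair first so it collapses to a single space."""
    sql = (
        "SELECT REPLACE(REPLACE(REPLACE(?, CHAR(13) || CHAR(10), ' '), "
        "CHAR(10), ' '), CHAR(13), ' ')"
    )
    return conn.execute(sql, (text,)).fetchone()[0]


sample = "line one\r\nline two\nline three\rend"
cleaned = strip_line_breaks(sample)
cleaned_single = strip_line_breaks_single_space(sample)
print(repr(cleaned))
print(repr(cleaned_single))
```

With the first helper, the CR+LF after “line one” turns into two spaces; whether that matters depends on what you do with the field afterwards.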

What is the difference between Row_Number(), Rank() and Dense_Rank() in SQL?

If the database that you work with supports Window/Analytic functions then the chances are that you have run into SQL use-cases where you have wondered about the difference between Row_Number(), Rank() and Dense_Rank(). In this post, I’ll show you the difference:

So, let’s just run all of them together and see what the output looks like.

Here’s my query: (Thanks StackExchange!)

select DisplayName,Reputation,
Row_Number() OVER (Order by Reputation desc) as RowNumber,
Rank() OVER (Order by Reputation desc) as Rank,
Dense_Rank() OVER (Order by Reputation desc) as DenseRank
from users

Which gives the following output:

DisplayName          Reputation RowNumber Rank DenseRank 
-------------------- ---------- --------- ---- --------- 
Hardik Mishra        9999       1         1    1         
Alex                 9997       2         2    2         
Omnipresent          9997       3         2    2         
Sergei Basharov      9993       4         4    3         
Oleg Pavliv          9991       5         5    4         
Jason Creighton      9991       6         5    4         
Aniko                9991       7         5    4         
Notlikethat          9990       8         8    5         
ZeMoon               9989       9         9    6         
Carl                 9987       10        10   7   
...
...
...     

Note that all three functions essentially “rank” your rows, but there are subtle differences:

  1. Row_Number() doesn’t care whether two values are the same; it numbers every row differently. Note rows #2 and #3: both have the value 9997, but they were assigned 2 and 3 respectively. Which of the tied rows gets the lower number is arbitrary unless you add a tie-breaker to the ORDER BY.
  2. Rank(), unlike Row_Number(), treats equal values as ties and gives them the same rank. Rows #2 and #3 both have the value 9997, so both were assigned rank 2. BUT notice that rank 3 is missing! In other words, Rank() introduces “gaps” after ties.
  3. Dense_Rank() is like Rank() but it doesn’t leave any gaps! Notice that rank 3 is present in the DenseRank field (row #4).
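
To reproduce the three behaviors on a small data set, here is a sketch that runs the same window functions in SQLite from Python. It assumes a SQLite build with window-function support (version 3.25 or later); the in-memory users table holds just the first eight rows of the output above, and the Rnk alias is my own choice. DisplayName is left out of the result because the order of tied rows is not guaranteed.

```python
import sqlite3

# First eight rows of the output above: (DisplayName, Reputation)
USERS = [
    ("Hardik Mishra", 9999),
    ("Alex", 9997),
    ("Omnipresent", 9997),
    ("Sergei Basharov", 9993),
    ("Oleg Pavliv", 9991),
    ("Jason Creighton", 9991),
    ("Aniko", 9991),
    ("Notlikethat", 9990),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (DisplayName TEXT, Reputation INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", USERS)

rows = conn.execute(
    """
    SELECT Reputation,
           ROW_NUMBER() OVER (ORDER BY Reputation DESC) AS RowNumber,
           RANK()       OVER (ORDER BY Reputation DESC) AS Rnk,
           DENSE_RANK() OVER (ORDER BY Reputation DESC) AS DenseRank
    FROM users
    ORDER BY RowNumber
    """
).fetchall()

for reputation, row_number, rnk, dense_rank in rows:
    print(reputation, row_number, rnk, dense_rank)
```

The three rows with 9991 get row numbers 5, 6 and 7, rank 5 (so the next rank is 8) and dense rank 4 (so the next dense rank is 5).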

I hope this clarified the differences between these SQL ranking functions. Let me know your thoughts in the comments section.

Paras Doshi

How do I prepare myself to be a data analyst?

Originally published on Quora: How do I prepare myself to be a Data Analyst?

Based on how you are framing your question, it seems that you currently don’t have a “Data Analysis” background but want to build a career in this field. Here are three things you could do:

  1. Learn Tech Skills: You will need technical knowledge to be successful at analyzing data. SQL and Excel are a good starting point, and you can do a lot with these tools. Then, depending on the bandwidth that you have, you could explore R. How do you learn this? Here’s a learning pathway: Learn #Data Analysis online – free curriculum. Also search for free courses on Coursera or other platforms.
  2. Learn Soft/Business Skills: This is as important as tech skills (if not more!) when it comes to Data Analysis. Finding insights in your data is half the battle; you will also need to put the insights in a context/story, influence business decisions and sometimes influence business change. We know change is always hard! So your soft/business skills will be very important. You will also benefit a lot from learning how to break down problems and how to communicate your solution in “business” language vs tech-speak.
  3. Apply them (and keep improving): Now that you have picked up some tech and soft/biz skills, apply them! Get an internship, or help out a non-profit in your free time (DataKind, Statistics Without Borders and VolunteerMatch are good resources to find a non-profit), and start applying your skills! It will also help you get some “real” world experience, and applying what you have learned while “learning-on-the-job” is arguably the BEST way to pick something up!

Hope that helps!

[VIDEO] Microsoft’s vision for “Advanced analytics” (presented at #sqlpass summit 2015)

Presented at #sqlpass summit 2015.

SQL Server Reporting services: How to display “There are NO rows” message?

Problem:

You have a SQL Server Reporting Services (SSRS) report with a table that displays some records, but sometimes the table has NO rows. In that case, how do you display a “There are no rows” message so that the empty table doesn’t confuse the consumer?

Solution:

  1. Open the report in SQL Server Data Tools and go to the “design” tab of your SSRS report
  2. Select your table (do NOT select a cell inside the table; make sure that the table itself is selected)
  3. While the “table” is selected, go to the Properties section OR use F4
  4. Inside the Properties section, find the “No Rows” section, where you should see a NoRowsMessage property. Set it to the message that you want to display
  5. Go to the preview tab to make sure it’s working and you should be ready to deploy the change!

That’s it! Hope that helps.

Official reference:  https://msdn.microsoft.com/en-us/library/dd220407.aspx

Author: Paras Doshi

How to change the Data Source of a SQL Server Reporting Services Report (Native Mode)?

Problem:

You have your SQL Server Reporting Services environment in native mode — and you want to modify the data source of a report there.

Solution:

  1. Navigate to Report Manager.
  2. Navigate to the report that you want to manage and run it
  3. After the report renders, you will have a breadcrumb navigation on the top right
  4. Click on the last part of the breadcrumb navigation
  5. It should open up the “properties” section of this report
  6. On the properties section, you should be able to manage the data source
  7. Make the changes that you wanted to the data source settings of this SSRS report, and don’t forget to click “apply”
  8. Done!

Author: Paras Doshi

Back to Basics — What is DDL, DML, DCL & TCL?

I was talking with a database administrator about the different categories that SQL commands fall into, and I thought it would be great to document them here. So here you go:

ACRONYM SQL COMMANDS                   DESCRIPTION
------- ------------------------------ -----------
DML     SELECT, INSERT, UPDATE, DELETE Data Manipulation Language: SQL statements that affect records in a table.
DDL     CREATE, ALTER, DROP            Data Definition Language: SQL statements that create/alter a table structure.
DCL     GRANT, REVOKE                  Data Control Language: SQL statements that control the level of access that users have on database objects.
TCL     COMMIT, ROLLBACK               Transaction Control Language: SQL statements that help you maintain the integrity of data by allowing control over transactions.
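
To make the categories a bit more concrete, here is a small sketch that uses one statement from three of them against an in-memory SQLite database from Python. SQLite and the accounts table are assumptions for illustration. SQLite has no GRANT/REVOKE, so DCL is not shown here.

```python
import sqlite3

# isolation_level=None keeps Python from opening transactions on its own,
# so the explicit BEGIN / COMMIT / ROLLBACK below are the only transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)

# DDL: defines the table structure.
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")

# DML inside a transaction that is made permanent with TCL (COMMIT).
conn.execute("BEGIN")
conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
conn.execute("COMMIT")

# DML inside a transaction that is undone with TCL (ROLLBACK).
conn.execute("BEGIN")
conn.execute("UPDATE accounts SET balance = balance - 500 WHERE id = 1")
conn.execute("ROLLBACK")

# DML: read the record back; the rolled-back UPDATE left no trace.
balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(balance)  # 100
```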

BONUS (Advanced) QUESTION:

Is the TRUNCATE SQL command DDL or DML? Please use the comments section!

Author: Paras Doshi

How to fix the Non-unicode to unicode data type conversion problems in SQL Server Integration Services?

Problem:

Are you trying to import an Excel file into SQL Server using SQL Server Integration Services (SSIS), and did you run into an error that has words like “non-Unicode” and “Unicode”? Then this blog post is for you.

Why does this error occur?

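Well, it turns out that things like SQL Server and Excel follow encoding standards that give them a way to process, exchange & store data. BUT the two sides don’t match here: the Excel source hands its text columns to SSIS as Unicode, while the SQL Server columns you are loading into are non-Unicode, and SSIS won’t convert between the two on its own.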
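
To get a feel for why the two formats are not interchangeable, here is a rough sketch in Python. It is only an analogy, not what SSIS does internally: I am assuming that the non-Unicode side uses the Windows-1252 code page, and to_non_unicode is a made-up helper name. Text that fits the code page converts cleanly, while text outside it cannot be represented at all.

```python
def to_non_unicode(text, code_page="cp1252"):
    """Encode text into a single-byte code page (assumed here to be Windows-1252)."""
    return text.encode(code_page)


# Fits the code page: one byte per character.
narrow = to_non_unicode("Café")

# The same text as UTF-16 (a common Unicode storage format): two bytes per character.
wide = "Café".encode("utf-16-le")

# Does not fit the code page: the conversion fails instead of silently guessing.
try:
    to_non_unicode("東京")
    fits = True
except UnicodeEncodeError:
    fits = False

print(narrow, len(narrow), len(wide), fits)
```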

Solution:

So, the solution is simple, right? Convert the data coming from Excel into non-Unicode format, because that’s what the destination columns in SQL Server expect.

So how do you do that? Between your Source and Destination tasks, include a task called “Data conversion” and, for every column that has text, change the data type from Unicode string [DT_WSTR] to string [DT_STR].

And in the destination task, make sure that the mapping section uses the new output aliases that you defined in the “data conversion” step.

Conclusion:

In this post, we learned how to solve a common error that pops up when you try to import an Excel file into SQL Server using SSIS. Hope that helps.

Author: Paras Doshi