How to remove line feeds (LF) and carriage returns (CR) from a text field in SQL Server?

While doing some data cleaning the other day, I ran into text fields containing line feeds (LF) and carriage returns (CR), which create a lot of issues when you do data import/export. I had run into this problem sometime before and didn't remember what I did back then, so I am putting the solution here for future reference.

To solve this, you need to remove the LF and CR characters (or any combination of the two). Here's the T-SQL syntax for SQL Server to do so:

SELECT REPLACE(REPLACE(@YourFieldName, CHAR(10), ' '), CHAR(13), ' ')

If you're using some other database system, you need to figure out how to identify the CR and LF characters. In SQL Server, the CHAR() function does that (CHAR(13) is CR and CHAR(10) is LF), and there should be something similar in the database system that you're using.
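
For example, here's a minimal sketch that cleans the field in place, assuming a hypothetical table dbo.Customers with a Notes column (both names are made up for illustration):

-- Replace LF (CHAR(10)) and CR (CHAR(13)) with a space in a hypothetical Notes column
UPDATE dbo.Customers
SET Notes = REPLACE(REPLACE(Notes, CHAR(10), ' '), CHAR(13), ' ')
WHERE Notes LIKE '%' + CHAR(10) + '%'
   OR Notes LIKE '%' + CHAR(13) + '%';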

What is the difference between Row_Number(), Rank() and Dense_Rank() in SQL?

If the database you work with supports window/analytic functions, chances are you have run into SQL use-cases where you wondered about the difference between Row_Number(), Rank() and Dense_Rank(). In this post, I'll show you the difference:

So, let’s just run all of them together and see what the output looks like.

Here’s my query: (Thanks StackExchange!)

select DisplayName,Reputation,
Row_Number() OVER (Order by Reputation desc) as RowNumber,
Rank() OVER (Order by Reputation desc) as Rank,
Dense_Rank() OVER (Order by Reputation desc) as DenseRank
from users

Which gives the following output:

DisplayName          Reputation RowNumber Rank DenseRank 
-------------------- ---------- --------- ---- --------- 
Hardik Mishra        9999       1         1    1         
Alex                 9997       2         2    2         
Omnipresent          9997       3         2    2         
Sergei Basharov      9993       4         4    3         
Oleg Pavliv          9991       5         5    4         
Jason Creighton      9991       6         5    4         
Aniko                9991       7         5    4         
Notlikethat          9990       8         8    5         
ZeMoon               9989       9         9    6         
Carl                 9987       10        10   7   
...
...
...     

Note that all these functions are essentially "ranking" your rows, but there are subtle differences:

  1. Row_Number() doesn't care if two values are the same; it just numbers them differently. Note rows #2 and #3: they both have the value 9997 but were assigned 2 and 3 respectively.
  2. Rank() - Unlike Row_Number(), Rank() considers that the two values are the same and "ranks" them with the same value. Note rows #2 and #3: they both have the value 9997, so both were assigned rank "2". But notice that rank "3" is missing! In other words, it introduces "gaps".
  3. Dense_Rank() - Dense_Rank() is like Rank(), but it doesn't leave any gaps! Notice that rank "3" does appear in the DenseRank column.
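
If you want to see these behaviors side by side without a Users table, here's a minimal sketch over a small hypothetical VALUES list (the numbers are made up):

select val,
Row_Number() over (order by val desc) as RowNumber,
Rank() over (order by val desc) as [Rank],
Dense_Rank() over (order by val desc) as DenseRank
from (values (9999),(9997),(9997),(9993)) as t(val)
-- RowNumber: 1,2,3,4   Rank: 1,2,2,4 (gap)   DenseRank: 1,2,2,3 (no gap)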

I hope this clarified the differences between these SQL ranking functions. Let me know your thoughts in the comments section!

Paras Doshi

Back to Basics — What is DDL, DML, DCL & TCL?

I was talking with a database administrator about the different categories that SQL commands fall into, and I thought it would be great to document them here. So here you go:

DML - Data Manipulation Language: SQL statements that affect records in a table. Commands: SELECT, INSERT, UPDATE, DELETE
DDL - Data Definition Language: SQL statements that create or alter a table's structure. Commands: CREATE, ALTER, DROP
DCL - Data Control Language: SQL statements that control the level of access that users have on database objects. Commands: GRANT, REVOKE
TCL - Transaction Control Language: SQL statements that help you maintain data integrity by allowing control over transactions. Commands: COMMIT, ROLLBACK
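
To make each category concrete, here's a minimal sketch; the Employee table, its columns, and the user name are all made up for illustration:

-- DDL: define the structure
CREATE TABLE Employee (EmployeeID INT, EmployeeName VARCHAR(100));

-- DML: work with the records
INSERT INTO Employee (EmployeeID, EmployeeName) VALUES (1, 'Jane Doe');
SELECT EmployeeID, EmployeeName FROM Employee;

-- DCL: control access (SomeDatabaseUser is a hypothetical user)
GRANT SELECT ON Employee TO SomeDatabaseUser;

-- TCL: control the transaction
BEGIN TRANSACTION;
UPDATE Employee SET EmployeeName = 'Jane D.' WHERE EmployeeID = 1;
COMMIT;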

BONUS (Advanced) QUESTION:

Is the TRUNCATE command DDL or DML? Please use the comments section!

Author: Paras Doshi

SQL Server Query Fundamentals: A simple example of a query that uses PIVOT

Problem:

Convert the following source data into the schema shown below:

[Image: source and destination data for the SQL Server T-SQL PIVOT example]

Solution:

Here's the code that uses the PIVOT function to get to the solution; please use this as a starting point.

Note the use of the aggregation function AVG; which function you need will depend on the requirement. In this example, the test value needs to be averaged if more than one test was performed.

[code language=”SQL”]

-- source data
SELECT [Product_ID],[Test_Desc],[Test_Val] FROM [dbo].[Address]
go

-- destination data using the PIVOT function
-- (pivot only the three columns you need; any extra columns in the source would change the grouping)
select *
from (SELECT [Product_ID],[Test_Desc],[Test_Val] FROM [dbo].[Address]) as src
pivot( avg([Test_Val]) for [Test_Desc] IN ([Test1],[Test2],[Test3],[Test4],[Test5]) ) as Tests
go

[/code]
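
If you want to try the pattern end to end, here's a self-contained sketch with a hypothetical temp table and made-up test values:

[code language="SQL"]

-- hypothetical sample data
CREATE TABLE #TestResults (Product_ID INT, Test_Desc VARCHAR(10), Test_Val DECIMAL(10,2));
INSERT INTO #TestResults VALUES
(1, 'Test1', 10.0), (1, 'Test2', 20.0), (1, 'Test2', 30.0),  -- Test2 performed twice, so it gets averaged
(2, 'Test1', 15.0), (2, 'Test3', 40.0);

-- one row per product, one column per test
SELECT *
FROM (SELECT Product_ID, Test_Desc, Test_Val FROM #TestResults) AS src
PIVOT ( AVG(Test_Val) FOR Test_Desc IN ([Test1],[Test2],[Test3],[Test4],[Test5]) ) AS Tests;

DROP TABLE #TestResults;

[/code]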

TSQL – Quick note about numeric data type to solve “Arithmetic overflow error”

Problem:

You are working on a query where you are trying to convert source data to the numeric data type, and you get an "Arithmetic overflow" error.

Solution:

Let’s understand this with an example:

Here's the source value: 132.56000000. You want to store just 132.56, so you write a query that looks like this:

cast(source_column_name as numeric(3,2)) as destination_column_name

After you run the query, it throws an "Arithmetic overflow error" - so what's wrong?

The issue is that you specified the precision and scale incorrectly. By writing numeric(3,2) you are saying you want 3 digits in total (the precision) with 2 to the right of the decimal point (the scale), which leaves just 1 digit for the left side. The value 132 needs 3 digits on the left, so it doesn't fit.

What you need to write is numeric(5,2): this keeps 2 digits on the right and leaves 3 digits for the left, which is enough for 132.56.

After you run this, it shouldn't complain about the arithmetic overflow error. You just need to make sure that the precision and scale of the numeric data type are correct.
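
Here's a quick sketch you can run to see both the error and the fix (the column alias is made up):

-- fails: numeric(3,2) leaves only 1 digit to the left of the decimal point
SELECT CAST(132.56000000 AS numeric(3,2)) AS destination_column_name;
-- Msg 8115: Arithmetic overflow error converting numeric to data type numeric.

-- works: numeric(5,2) leaves 3 digits for the left side
SELECT CAST(132.56000000 AS numeric(5,2)) AS destination_column_name;  -- returns 132.56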

Conclusion:

In this post, you saw an example of how to correctly use precision and scale in the numeric data type, which should help you solve arithmetic overflow errors.

SQL Server: How to insert explicit values into an identity column of a table?

In a SQL Server data mart, it's common to have an identity column (a surrogate key, or SK, column) in a fact table. It's also common to add a -1 record to this table for "unknown" values. So if you want to insert an explicit value into an identity column in a SQL Server table, here are the steps:

1) In SSMS, select the table in Object Explorer > Right-click > Script Table as > INSERT To > "New Query Editor Window"

2) This gives you a nice starting point to create your identity insert script:

3) Add the following code before the insert statement:

SET IDENTITY_INSERT <schema_name>.<table_name> ON
go

4) Add the following code after the insert statement:

SET IDENTITY_INSERT <schema_name>.<table_name> OFF
go

5) Now modify the "VALUES" section of the insert statement. Also note that the identity column will not be in the column list; you'll have to add it to the script manually.

After the values are added, here’s what the code should look like:

[code language=”sql” gutter=”false”]

USE <database_name>
GO

SET IDENTITY_INSERT <schema_name>.<table_name> ON
go

INSERT INTO <schema_name>.<table_name>
([Col1_sk]
,[col2])
VALUES
(-1,
'N/A')
GO

SET IDENTITY_INSERT <schema_name>.<table_name> OFF
go

[/code]

Conclusion:
I hope this gives you a good starting point to create an identity insert script for a SQL Server table.

How to use TSQL checksum to compare data between two tables?

In any BI project, data validation plays an important part. You want to make sure that the data is right! Usually the business helps with this validation, but as a developer, you might also want to do some preliminary data validation. One of the techniques that I've learned recently is to use the T-SQL CHECKSUM function to compare data between two tables. In this post, I'll describe the technique and post some pseudo code.

The pseudo code below compares all columns, but you should be able to tweak it if you need to compare only a subset.

1) Run checksum(*) on both tables:

On Table1:

select checksum(*) as chktb1 from table1
go

On Table 2:

select checksum(*) as chktb2 from table2
go

At this point, you should get two result sets, each populated with checksum values computed over all columns, since you passed * to the checksum function.

2) Now let's join these tables and look at the rows with different checksums (in other words, this lists all rows that differ between table1 and table2). Note that the subqueries also need to return the key column(s) used in the join:

select * from
(
select checksum(*) as chktb1, someid from table1
) as tb1
left join
(
select checksum(*) as chktb2, someid from table2
) as tb2
on tb1.someid=tb2.someid /* you can have more ids */
where tb1.chktb1 != tb2.chktb2

3) You can now add individual columns to see what changed:

select * from
(
select checksum(*) as chktb1, someid, columnname1, columnname2 from table1
) as tb1
left join
(
select checksum(*) as chktb2, someid, columnname1, columnname2 from table2
) as tb2
on tb1.someid=tb2.someid
where tb1.chktb1 != tb2.chktb2
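
If you'd like to see the pattern in action, here's a self-contained sketch using two hypothetical temp tables with one deliberately changed row:

-- hypothetical data: #table2 has one changed value for someid = 2
CREATE TABLE #table1 (someid INT, columnname1 VARCHAR(20));
CREATE TABLE #table2 (someid INT, columnname1 VARCHAR(20));
INSERT INTO #table1 VALUES (1, 'red'), (2, 'green');
INSERT INTO #table2 VALUES (1, 'red'), (2, 'blue');

-- returns only the row for someid = 2, whose checksums differ
SELECT *
FROM (SELECT CHECKSUM(*) AS chktb1, someid FROM #table1) AS tb1
LEFT JOIN (SELECT CHECKSUM(*) AS chktb2, someid FROM #table2) AS tb2
  ON tb1.someid = tb2.someid
WHERE tb1.chktb1 != tb2.chktb2;

DROP TABLE #table1;
DROP TABLE #table2;

Keep in mind that CHECKSUM can occasionally produce the same value for different rows, so treat this as a preliminary check rather than a guarantee.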

Conclusion:
I hope this helps, especially if you don't have rights to install third-party tools on your dev machine.

TSQL Script: How to get a list of all tables or views in a database?

I was documenting the list of tables/views in data mart & staging databases, and I found the following scripts useful:

T-SQL to get the list of all tables:

[sourcecode type=”sql” wraplines=”false”]
SELECT * FROM <DatabaseName>.information_schema.tables
where TABLE_TYPE='BASE TABLE'
[/sourcecode]

T-SQL to get the list of all views:
[sourcecode type=”sql” wraplines=”false”]
SELECT * FROM <DatabaseName>.information_schema.tables
where TABLE_TYPE='VIEW'
[/sourcecode]

Alternatives (for SQL 2005 onwards):
[sourcecode type=”sql” wraplines=”false”]
SELECT * FROM SYS.TABLES
[/sourcecode]
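
Similarly, views have their own catalog view, mirroring the SYS.TABLES alternative above:

[sourcecode type="sql" wraplines="false"]
SELECT * FROM SYS.VIEWS
[/sourcecode]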

SQL Spatial Data Types: A Tip to fix STIntersects Method returning NULL values.

I was working on some queries that used SQL spatial data types and ran into a problem where the STIntersects method was returning NULL. I was expecting it to return 1 or 0, but it kept returning NULL. Here's how I solved it:

1. The TechNet article for STIntersects says: "This method always returns null if the spatial reference IDs (SRIDs) of the geometry instances do not match."

Here's my pseudo code for you:

-- Function
-- Input parameters: @area geometry, @Point1 varchar(10), @Point2 varchar(10)
DECLARE @p geometry;
SET @p = geometry::STGeomFromText('POINT(' + @Point1 + ' ' + @Point2 + ')', 0);
-- Note that the 0 in the line above is the SRID
return (SELECT @p.STIntersects(@area));

2. So the next step was to make sure that the two geometry instances I was using had the same SRIDs.

3. Since I was passing @area (a geometry data type) to the function, I had to check the SRID of @area, so I wrote a single line of code that uses the STSrid property:

Select @area.STSrid

That told me that the SRID of the geometry instance was 4xxx.

4. So I modified the original code to change the SRID of the @p variable to match that of the @area variable.

So here's the updated code:

DECLARE @p geometry;
SET @p = geometry::STGeomFromText('POINT(' + @Point1 + ' ' + @Point2 + ')', 4xxx);
return (SELECT @p.STIntersects(@area));

5. Since the SRIDs of @area and @p now matched, it started returning 1 or 0 as expected.
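
For a self-contained illustration, here's a minimal sketch with made-up coordinates; the key point is simply that both instances are created with the same SRID (4326 here, chosen as an example):

DECLARE @area geometry = geometry::STGeomFromText('POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))', 4326);
DECLARE @p geometry = geometry::STGeomFromText('POINT(5 5)', 4326);
-- returns 1 because the point lies inside the polygon and the SRIDs match
SELECT @p.STIntersects(@area) AS Intersects;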

A quick note on how select @@version helps me while I’m T-SQL’ing:

As part of developing ETL packages, I sometimes have to write T-SQL queries to pull data from SQL Server source systems. But before I start doing that, it's always good to know the version/edition of the source system. Why? Because it determines whether certain T-SQL operators are available for me to use. Case in point: I had a requirement where I could have written a query that uses the PIVOT & UNPIVOT operators. So I wrote the query and it didn't work! I spent about 5 minutes trying to debug the code. The code seemed OK to me, so I thought of checking the version. And there you go: the client's source system was running SQL Server 2000, which meant I couldn't use the PIVOT & UNPIVOT operators.

[Image: output of SELECT @@VERSION in SQL Server]

This was my quick note on how select @@version helps me while I'm T-SQL'ing. Next time, I'll probably check this first, before writing the code. That could save me a few minutes 🙂
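
If you want specific pieces of information rather than the full version string, SERVERPROPERTY works too; here's a small sketch:

-- full version/edition string
SELECT @@VERSION;

-- individual properties
SELECT SERVERPROPERTY('ProductVersion') AS ProductVersion,
       SERVERPROPERTY('ProductLevel') AS ProductLevel,
       SERVERPROPERTY('Edition') AS Edition;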