Jeff Taylor

Data Types - Does size really matter? - nvarchar vs. varchar


How many of us have created a new table and used nvarchar, the default and first data type in the list, for all of the text/alphanumeric fields? Or had a developer use a GUI to create a new table, where every text field came out as nvarchar? Maybe we inherited a database full of nvarchar or the deprecated 'text' data type.


Some may say, 'Oh, it's just a text column. How bad can it be?'


My first question is: do you need a data type that supports Unicode characters?


Does each nvarchar column store text in national languages such as Latin, Greek, Cyrillic, Armenian, Hebrew, Arabic, Syriac, Thaana, Devanagari, Bengali, Gurmukhi, Oriya, Tamil, Telugu, Kannada, Malayalam, Sinhala, Thai, Lao, Tibetan, Myanmar, Georgian, Hangul, Ethiopic, Cherokee, Canadian Aboriginal Syllabics, Khmer, Mongolian, Han (Japanese, Chinese, Korean ideographs), Hiragana, Katakana, or Yi?


If you don't use any of those languages, you don't need to use nvarchar for all your text fields.


Let us break it down a little and provide perspective.


Below is a query of an nvarchar column named 'Title'. The next column shows the text length using the LEN() function, and the final column uses DATALENGTH(), which returns the actual size of the column in bytes. You will notice that the data length is twice the text length.
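The query looks something like this (a sketch; [dbo].[PostsN] is a hypothetical table with an nvarchar 'Title' column):

SELECT TOP (5)
       [Title],
       LEN([Title]) AS [TextLength],        -- number of characters
       DATALENGTH([Title]) AS [DataLength]  -- number of bytes: 2 per character for nvarchar
FROM [dbo].[PostsN];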

Now let us look at the same query against a varchar column named 'Title', again using the LEN() and DATALENGTH() functions. Notice how the text length and data length values are exactly the same.
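The same sketch against a hypothetical varchar table [dbo].[Posts]:

SELECT TOP (5)
       [Title],
       LEN([Title]) AS [TextLength],        -- number of characters
       DATALENGTH([Title]) AS [DataLength]  -- number of bytes: 1 per character for varchar
FROM [dbo].[Posts];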

Most tables have multiple text columns, so let us look at a table with several nvarchar text columns.

CREATE TABLE [dbo].[TestPostsN]
(
    [TestId] [INT] NOT NULL IDENTITY(1, 1),
    [Body] [NVARCHAR](MAX) NOT NULL,
    [LastEditorDisplayName] [NVARCHAR](40) NULL,
    [Tags] [NVARCHAR](150) NULL,
    [Title] [NVARCHAR](250) NULL
) ON [PRIMARY];
GO
ALTER TABLE [dbo].[TestPostsN] ADD CONSTRAINT [PK_TestPostsN] PRIMARY KEY CLUSTERED ([TestId]);
GO
CREATE NONCLUSTERED INDEX [IX_TestPostsN_Title] ON [dbo].[TestPostsN]
(
	[Title] ASC
) WITH (SORT_IN_TEMPDB = ON, ONLINE = OFF)
GO

Using sp_spaceused 'dbo.TestPostsN', we can see the size of the table, which includes the primary key's clustered index and the one non-clustered index on the table.
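The call itself is simply:

EXEC sp_spaceused 'dbo.TestPostsN';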

Now let us look at a table with several varchar text columns.

CREATE TABLE [dbo].[TestPosts]
(
    [TestId] [INT] NOT NULL IDENTITY(1, 1),
    [Body] [VARCHAR](MAX) NOT NULL,
    [LastEditorDisplayName] [VARCHAR](40) NULL,
    [Tags] [VARCHAR](150) NULL,
    [Title] [VARCHAR](250) NULL
) ON [PRIMARY];
GO
ALTER TABLE [dbo].[TestPosts] ADD CONSTRAINT [PK_TestPosts] PRIMARY KEY CLUSTERED ([TestId]);
GO
CREATE NONCLUSTERED INDEX [IX_TestPosts_Title] ON [dbo].[TestPosts]
(
	[Title] ASC
) WITH (SORT_IN_TEMPDB = ON, ONLINE = OFF)
GO

Using sp_spaceused 'dbo.TestPosts', we can see the table size, which again includes the primary key's clustered index and the one non-clustered index on the table.
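If you want to compare both tables in a single result set, here is a sketch using sys.dm_db_partition_stats (assuming both test tables live in the current database):

SELECT OBJECT_NAME(ps.[object_id]) AS [TableName],
       i.[name] AS [IndexName],
       SUM(ps.[used_page_count]) * 8 / 1024.0 AS [UsedMB] -- pages are 8 KB each
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
    ON i.[object_id] = ps.[object_id]
   AND i.[index_id] = ps.[index_id]
WHERE ps.[object_id] IN (OBJECT_ID('dbo.TestPostsN'), OBJECT_ID('dbo.TestPosts'))
GROUP BY ps.[object_id], i.[name]
ORDER BY [TableName], [IndexName];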

Notice the size difference? The same data is stored in both tables; one uses nvarchar and the other varchar.

Using nvarchar more than doubles the size of the clustered index and almost doubles the size of the non-clustered index.


This is one small example. Now let us step back and think about the memory the server uses for the database. When pages from these nvarchar indexes are cached in memory instead of their varchar equivalents, we use more server memory.

Notice how the 'nvarchar' database takes up more than double the space cached in memory. If your server is limited in memory, using nvarchar will more than double your memory usage for this data.
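One way to check this yourself is a sketch against sys.dm_os_buffer_descriptors, which exposes one row per 8 KB page currently in the buffer pool:

SELECT DB_NAME([database_id]) AS [DatabaseName],
       COUNT(*) * 8 / 1024.0 AS [CachedMB] -- one row per 8 KB page in the buffer pool
FROM sys.dm_os_buffer_descriptors
GROUP BY [database_id]
ORDER BY [CachedMB] DESC;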


Let us go even further and look at our backups by backing up each database with its one table.
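The backups were taken with compression, along these lines (the database names and paths here are hypothetical):

BACKUP DATABASE [TestNVarchar]
    TO DISK = N'D:\Backups\TestNVarchar.bak' -- hypothetical path
    WITH COMPRESSION, INIT;
GO
BACKUP DATABASE [TestVarchar]
    TO DISK = N'D:\Backups\TestVarchar.bak' -- hypothetical path
    WITH COMPRESSION, INIT;
GO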

Notice that even with compression enabled on both databases, the nvarchar database backup is 68,168 KB larger than the varchar database backup.


Put this in perspective with a large table of 600 million rows. The nvarchar table is now 1.04 TB larger than the varchar table, and its non-clustered index is 27 GB larger than the varchar index.

With multiple large tables using nvarchar, we use more memory for those indexes, and we also increase the backup and restore times for the database.


In summary, I think I'll stick with varchar unless I need to store a national language, and even then, I would place that text in a translation table with a single nvarchar column, or use a UTF-8 collation on a varchar field to use fewer bytes.
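As a sketch of those two alternatives (all names here are hypothetical; the UTF-8 collation requires SQL Server 2019 or later):

-- Option 1: isolate Unicode text in a translation table with a single nvarchar column
CREATE TABLE [dbo].[PostTranslations]
(
    [PostId] [INT] NOT NULL,
    [LanguageCode] [CHAR](5) NOT NULL,
    [Title] [NVARCHAR](250) NULL
);
GO
-- Option 2: a UTF-8 collation on a varchar column (SQL Server 2019+)
CREATE TABLE [dbo].[PostsUtf8]
(
    [PostId] [INT] NOT NULL,
    [Title] [VARCHAR](250) COLLATE Latin1_General_100_CI_AS_SC_UTF8 NULL
);
GO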


One last thing: you can change your tables and stored procedures from nvarchar to varchar, and it should not affect your application; it will still work. If you declare data types inside your stored procedure code, I would suggest changing those from nvarchar to varchar as well, but it is not strictly necessary. With a tool such as Redgate SQL Compare, making these changes across your entire database is very simple.
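For the test table above, the conversion would look something like this; note that any index covering the column must be dropped and recreated around the ALTER:

DROP INDEX [IX_TestPostsN_Title] ON [dbo].[TestPostsN];
GO
ALTER TABLE [dbo].[TestPostsN] ALTER COLUMN [Title] [VARCHAR](250) NULL;
GO
CREATE NONCLUSTERED INDEX [IX_TestPostsN_Title] ON [dbo].[TestPostsN] ([Title] ASC);
GO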


If you have any questions or comments, or if you make these changes in your database to save space and improve performance, please let me know!



1 Comment


Jason Horner
Sep 01, 2023

This is a great post, Jeff. Well done! In the post you briefly mention the ability to use a UTF-8 collation in SQL Server 2019: https://learn.microsoft.com/en-us/sql/relational-databases/collations/collation-and-unicode-support?view=sql-server-ver16#utf8 but it probably deserves a bit more of a callout, as it can allow you to use char and varchar in cases where you need a limited set of non-English characters. Thanks for sharing!
