The best SQL feature you probably don't know about

Intro

Back in 2017, a colleague and I started research on a new project. This would be a ground-up redesign of our existing software and would later become known as Omnidek.com - a digital forms and workflow tool. Our research included testing the maintenance, pricing, and usability of cloud platforms like Azure and AWS, weighing the pros and cons of server-side rendering vs client-side rendering, determining the limits of noSQL vs SQL, and much more.

After months of research, we settled on a .NET Core backend, Angular frontend, and SQL for the database - all hosted on Azure. We started prototyping the main engine that would drive digital Form creation and interaction. The Form object would become very heavy over time as it included the structure data for the form and the user's input data. For example, a Form had Sections, Sections had Questions, and Questions had Properties. These were all nested objects/arrays within their parents. Below is a summarized example of what a Form's JSON object may look like on the front end:

{
    "id": 1,
    "name": "My Form",
    "sections": [
        {
            "id": 1,
            "name": "Section 1",
            "sortOrder": 1,
            "questions": [
                {
                    "id": 1,
                    "name": "Question 1",
                    "sortOrder": 1,
                    "properties": {
                         "id": 1,
                         "caption": "Question 1",
                         "dataType": "string"
                    }
                },
                {
                    "id": 2,
                    "name": "Question 2",
                    "sortOrder": 2,
                    "properties": {
                         "id": 1,
                         "caption": "Question 2",
                         "dataType": "date"
                    }
                }
            ]
        },
        {
            "id": 2,
            "name": "Section 2",
            "sortOrder": 2,
            "questions": [
                {
                    "id": 1,
                    "name": "Question 3",
                    "properties": {
                         "id": 1,
                         "caption": "Question 3",
                         "dataType": "number"
                    }
                }
            ]
        }
    ]
}

We developed the front end prototype with the purpose of testing user experience - which was a success. We then shifted our efforts to determining and defining the tables we need in SQL to store the Form structure data: Forms, Sections, Questions. We also developed a base .NET Core backend API for retrieving the data from our database and passing it to the front end. The API had a generic Controller, Service, Repository structure. An Entity Framework (EF) Core LINQ query to the database would look like the following:

...
var form = await context.Forms.Where(x => x.Id = 1)
                          .Include(x => x.Sections)
                          .Include(x => x.Questions).FirstOrDefaultAsync();
return form;
...

This is where we hit a problem. At the time, with EF Core 2.1, you could not .OrderBy() on a .Include(). This was a massive problem for us as we had to order Sections and Questions by their "sortOrder" property. There were also other SQL system functions not available to us through EF. So we found ourselves back doing research, how can we overcome this problem without querying the nested entities separately, sorting and merging them in memory at runtime? How can we use the power of SQL without being restricted by the adolescent state of EF Core? I insisted that there must be a clean way of doing this.

After many hours of research and testing, we finally found the solution... SQL's FOR JSON clause.

Not only did this solve the few issues we encountered, it would also go on to change the way I design and develop software!

Welcome to SQL's FOR JSON

the following code and examples use Microsoft's AdventureWorks2017 database.

How it works

SQL's FOR JSON clause is used to convert the results of a SQL query to a serialized JSON string. Simply add the clause "FOR JSON PATH" after your SQL query, like this:

select p.BusinessEntityID
      ,p.PersonType
      ,p.NameStyle
      ,p.Title
      ,p.FirstName
      ,p.MiddleName
      ,p.LastName
from Person.Person p
for json path

result:

[
   {
      "BusinessEntityID":1,
      "PersonType":"EM",
      "NameStyle":false,
      "FirstName":"Ken",
      "MiddleName":"J",
      "LastName":"Sánchez"
   },
   {
      "BusinessEntityID":2,
      "PersonType":"EM",
      "NameStyle":false,
      "FirstName":"Terri",
      "MiddleName":"Lee",
      "LastName":"Duffy"
   },
   {
      "BusinessEntityID":3,
      "PersonType":"EM",
      "NameStyle":false,
      "FirstName":"Roberto",
      "LastName":"Tamburello"
   },
   ...
]

There are three additional keywords to complement the FOR JSON clause:

PATH or AUTO - PATH means you control the structure of the JSON object. AUTO means the SELECT statement will determine the JSON structure. I HIGHLY RECOMMEND TO ALWAYS USE PATH INSTEAD OF AUTO

INCLUDE_NULL_VALUES - this keyword will force the result JSON object to include properties whose values are null. If you do not include this keyword, null-valued properties will be omitted from the result object.

WITHOUT_ARRAY_WRAPPER - by default, all FOR JSON queries are returned as an array of objects. This keyword forces the query to exclude the opening and closing brackets "[]" from the result JSON object.

Feel free to add any other WHERE, GROUP BY, ORDER BY, etc. clauses to your query. For nested objects, you must place a nested sub-select in your query by passing it to the JSON_QUERY() function, like this:

select p.BusinessEntityID
      ,p.PersonType
      ,p.NameStyle
      ,p.Title
      ,p.FirstName
      ,p.MiddleName
      ,p.LastName
      ,json_query((
        select ea.*
        from Person.EmailAddress ea
        where ea.BusinessEntityID = p.BusinessEntityID
        for json path, include_null_values, without_array_wrapper
      )) as EmailAddress
from Person.Person p
for json path, include_null_values

result:

[
   {
      "BusinessEntityID":1,
      "PersonType":"EM",
      "NameStyle":false,
      "FirstName":"Ken",
      "MiddleName":"J",
      "LastName":"Sánchez",
      "EmailAddress":{
         "BusinessEntityID":1,
         "EmailAddressID":1,
         "EmailAddress":"ken0@adventure-works.com",
         "rowguid":"8A1901E4-671B-431A-871C-EADB2942E9EE",
         "ModifiedDate":"2009-01-07T00:00:00"
      }
   },
   {
      "BusinessEntityID":2,
      "PersonType":"EM",
      "NameStyle":false,
      "FirstName":"Terri",
      "MiddleName":"Lee",
      "LastName":"Duffy",
      "EmailAddress":{
         "BusinessEntityID":2,
         "EmailAddressID":2,
         "EmailAddress":"terri0@adventure-works.com",
         "rowguid":"B5FF9EFD-72A2-4F87-830B-F338FDD4D162",
         "ModifiedDate":"2008-01-24T00:00:00"
      }
   },
   {
      "BusinessEntityID":3,
      "PersonType":"EM",
      "NameStyle":false,
      "FirstName":"Roberto",
      "LastName":"Tamburello",
      "EmailAddress":{
         "BusinessEntityID":3,
         "EmailAddressID":3,
         "EmailAddress":"roberto0@adventure-works.com",
         "rowguid":"C8A51084-1C03-4C58-A8B3-55854AE7C499",
         "ModifiedDate":"2007-11-04T00:00:00"
      }
   },
   ...
]

Best Practices

  • All FOR JSON queries should be executed in a stored procedure or function (scalar or table-valued).

  • Never use FOR JSON AUTO, it is not reliable and it's unnecessary. Always use FOR JSON PATH.

  • Use this boilerplate for all JSON queries - the reason for this is that FOR JSON returns a string and SQL will truncate the string unless you select the result into a nvarchar(max) variable.

...
declare @result nvarchar(max)
set @result =
(
    -- json query goes here
    -- example:
    -- select p.*
    -- from Person.Person p
    -- for json path, include_null_values
)
select @result;
...
  • For nested JSON queries, wrap the query in the JSON_QUERY() function. JSON_QUERY() deserializes the resulting string from a FOR JSON query.
select p.BusinessEntityID
      ,p.PersonType
      ,p.NameStyle
      ,p.Title
      ,p.FirstName
      ,p.MiddleName
      ,p.LastName
      ,json_query((
        select ea.*
        from Person.EmailAddress ea
        where ea.BusinessEntityID = p.BusinessEntityID
        for json path, include_null_values, without_array_wrapper
      )) as EmailAddress
from Person.Person p
for json path, include_null_values

** NOTE - the JSON_QUERY() function also allows you to query an arbitrary JSON string. E.g. you want to store user preferences for users in your database but the structure could vary or be dynamic. You can create a nvarchar(max) "Preferences" column in your Users table and store the JSON as a serialized string. Then you can query that column like this:

select u.*
      ,json_query(u.Preferences) as Preferences
from Users u
for json path, includes_null_values, without_array_wrapper

"Preferences" will be a nested object in the result JSON

  • Use the communication interface of your choice (Dapper, ADO.NET, EF Core, etc.). All that is required is the ability to call procedures and functions from your application. All examples in this article return a string that can be deserialized into your C# object.

  • You can use the "DRY" principle with SQL's FOR JSON. Most JSON queries can be reduced to a SQL Function. That SQL Function can then be nested in other JSON queries, like this:

select p.*
      ,json_query(dbo.MyFunction(p.BusinessEntityId)) MyFunctionResult
from Person.Person p
for json path, include_null_values

From my research and years of heavy use, the overhead of using FOR JSON is negligible, if at all present. I have tested FOR JSON on multiple layers of nested data containing millions of records. Properly indexed tables and good query writing will perform equally, whether you use FOR JSON or not. Where you can get yourself into trouble is by storing serialized JSON in your database. It is fine to do this for small JSON objects (less than 400 characters). I recommend NOT doing this, but if you must have large, serialized JSON objects (greater than 1000 characters) stored in every record of a table, you must do two things:

  • Move that column of data to its own table

  • Compress the data in the column

Conclusion

I have reached out to various content creators on this topic and they seem to be uninterested in SQL's FOR JSON. In my opinion, it is an underused and very powerful feature of SQL that has changed the way I design and develop software. I hope this article has at least opened the door for anyone who finds themselves in the position I was in years back.

If you would like to see more examples or have any other questions or concerns, please comment below.