Problem
Suppose you have this simple query :
var ids = new long[]{1,3,4,5,8,9,10,78,34,23...};
Context.MyTable.Where(a=>ids.Contains(a)).ToList()
if ids
contains one million number, Linq provider will write a giant query like :
SELECT id, field1, field2 WHERE
id IN (1,3,4,5,8,9,10,78,34,23 ...) // until one million numbers
You will have an exception because it’s too much for the SQL provider to execute this query.
You’ll probably think about these 2 solutions :
- Inserting all theses datas in a temporary table (with a third party library), then joining
- Splitting your queries in multiple chunks
Both are bad solutions because you have to Upload millions of number to the SQL Server.
Solution
To resolve this problem, you have to group the numbers by ranges.
Range 0 : 1,2,3,4,5
Range 1 : 19,20
Range 2 : 45,46,47,48,49,50
…..
Range 10 : 125558,125559,125560,125561,125562
IList<Range> GroupArrayToRanges(IEnumerable<int> values)
{
var res = values.Order().Select((num, index) =>
new
{
Id = num,
Rank = index,
Group = num - index
})
.GroupBy(a => a.Group)
.Select(a => new Range(a.Min(a => a.Id), a.Max(a => a.Id)))
.ToList();
return res;
}
Now you have the groups, you need to rewrite the correct LINQ query.
var ids = new int[]{1,3,4,5,8,9,10,78,34,23...};
var ranges = GroupArrayToRanges(ids);
Context.MyTable.Where(a=>
(a>=ranges[0].Start && a<=ranges[0].End)
|| (a>=ranges[1].Start && a<=ranges[1].End)
|| (a>=ranges[2].Start && a<=ranges[2].End)
...
|| (a>=ranges[10].Start && a<=ranges[10].End)
).ToList()
Building this query dynamically would need the use of Expression Trees because of logical OR.
If we want to build this query without Expression, we need to convert to an AND query, by including the whole range first and excluding ranges like this :
var ids = new int[]{1,3,4,5,8,9,10,78,34,23...};
var ranges = GroupArrayToRanges(ids);
Context.MyTable.Where(a=>
(a>=ranges[0].Start && a<=ranges[10].End) // include whole range
&& !(a>ranges[0].End && a<ranges[1].Start) // exclude
&& !(a>ranges[1].End && a<ranges[2].Start) // exclude
&& !(a>ranges[2].End && a<ranges[3].Start) // exclude
...
&& !(a>ranges[9].End && a<ranges[10].Start)
).ToList()
Now you can enjoy an elegant solution without uploading tons of data.
The only drawback of the solution is it will not work if your data cannot be grouped by interval like 1,3,6,8,12,14,16,29,31 => it will create 9 groups…
But if you have 1,2,3,4…..1001045 data, that will perfectly fit !
Go further with SQL and Linq
This is the equivalent way to do in SQL
SELECT
MIN([t].[Id]) AS [Min],
MAX([t].[Id]) AS [Max]
FROM (
SELECT [s].[Id], [s].[Id] - (
SELECT COUNT(*)
FROM [SuperTable] AS [s0]
WHERE [s0].[Id] < [s].[Id]) AS [Key]
FROM [SuperTable] AS [s]
) AS [t]
GROUP BY [t].[Key]
-- if you really need performance, you would better go with ROW_NUMBER
SELECT
MIN([t].[Id]) AS [Min],
MAX([t].[Id]) AS [Max]
FROM (
SELECT
[s].[Id],
s.Id - (ROW_NUMBER() OVER(ORDER BY id ASC)) AS [Key]
FROM [SuperTable] AS [s]
) AS [t]
GROUP BY [t].[Key]
-- if you have an old MYSQL server, you can emulate ROW_NUMBER with a session variable
SET @row_number := 0;
SELECT
MIN(t.id) AS Min,
MAX(t.id) AS Max
FROM (
SELECT
s.id,
s.id - s.RowNumber as `Key`
FROM (
SELECT id, ((@row_number := @row_number + 1)) as RowNumber
FROM
T_USERS
ORDER BY id
) AS s
) AS t
GROUP BY t.Key;
And the Linq (SQL transcriptable) Syntax (without ROW_NUMBER function as it’s not supported by current LINQ providers):
var q = from s in context.Table
orderby s.Id
select new
{
Id = s.Id,
Rank = (from o in context.SuperTables where o.Id < s.Id select o).Count(),
Group = s.Id - (from o in context.SuperTables where o.Id < s.Id select o).Count(),
} into sub2
group sub2 by sub2.Group into sub3
select new
{
Min = sub3.Min(a => a.Id),
Max = sub3.Max(a => a.Id)
};
In a non SQL context, you would probably rewrite Select
by using Select((s,index)
=> syntax to perform ranking like ROW_NUMBER, it’s more efficient.
int[] data = new int[] { 1, 2, 3, 4, 5, 7, 8, 10, 11, 12 };
var res = data.Order().Select((num, index) =>
new
{
Id = num,
Rank = index,
Group = num - index
})
.GroupBy(a => a.Group)
.Select(a =>
new
{
Min = a.Min(a => a.Id),
Max = a.Max(a => a.Id)
})
.ToList();
Update : Use ROW_NUMBER within Entity Framework Core
I developed my own extension to support Row_Number within EntityFramework Core for SqlServer,Sqlite,Mysql,PostreSql here :
https://www.nuget.org/packages?q=Webrox.EntityFrameworkCore
Github repository : https://github.com/Poppyto/Webrox.EntityFrameworkCore
Then use it like that in order to get the id intervals :
var res = context.Table
.Select((a, index) =>
new
{
Id = a.Id,
Group = a.Id - EF.Functions.RowNumber(EF.Functions.OrderBy(a.Id))
})
.GroupBy(a => a.Group)
.Select(a =>
new
{
Min = a.Min(a => a.Id),
Max = a.Max(a => a.Id)
})
.ToList();