软件开发培训班 >> 编程开发 >> SQL

基于SQL Server中如何比较两个表的各组数据图解说明

　　本文标签：sql,server数据库

开始

前一阵子，在项目中碰到这样一个SQL查询需求，有两个相同结构的表(table_left & table_right)，如下：

Windows 11可以运行Android应用了，但微软的野心远不止于此	有线网比无线网快，是真的咩？
价格相差数倍！智能手表比智能手环强在哪里？	常用的网站漏洞扫描工具小结

图1.

检查表table_left的各组(groupId)，是否在表table_right中存在有一组(groupId)数据(data)与它的数据(data)完全相等.

如图1. 可以看出表table_left和table_right存在两组数据完整相等：

图2.

一个比特币价值3.6万，普通电脑24小时挖矿，多久能挖到一枚	seo成功必先解决的三个大忌
手机曲面屏和普通屏，你更喜欢哪个？	梦回486时代现在组装电脑到底有多贵?

分析

从上面的两个表，可以知道它们存放的是一组一组的数据；那么，接下来我借助数学集合的列举法和运算进行分析。

先通过集合的列举法描述两个表的各组数据：

图3.

如何提高网站价值？提高网站价值七种内容策略	继承Win10缺点！教你关闭Win11烦人的网络搜索
快递代收点推出霸王条款：网友直言不会接受	还在用QQ、微信快捷截图吗？试试win10自带截图，马上爱不释手

这里只有两种情况，相等和不相等。对于不相等，可再分为部分相等、包含、和完全不相等。使用集合描述，可使用交集，子集，并集。如下面图4.，我列举出这几种常见的情况：

图4.

实现

在数据库中，要找出表table_left和表table_right存在相同数据的组，方法很多，这里我列出两种常用的方法。

(下面的SQL脚本，是以图4.的数据为基础参考)

这些高颜值相机适合毕业旅行和追学妹	为什么小姐姐都喜欢大光圈拍照
路由器的USB接口的5个用途，你都知道吗？	黑客大佬Dan Kaminsky去世，年仅42岁！

方法1：

通过"Select … From …Order by … xml for path() "把各组的data列数据连串起来（如，图4.把table_left的组#11的列data连串起来成"data1-data2-data3"），其他分组（包含表table_right）以此方法实现data列数据连串起来；然后通过比较两表的连串后字段是否存在相等，若是相等就说明这比较多两组数据相等，由此可以判断出表table_left的哪组数据在表table_right存在与它数据完全相等的组。

针对方法1，需要对原表增加一个字段dataPath,用于存储data列数据连串的结果，如：

复制代码代码如下:

alter table table_left add dataPath nvarchar(200) 
alter table table_right add dataPath nvarchar(200)

分组连串data列数据并update至刚新增的列dataPath，如：

复制代码代码如下:

update a 
    set dataPath=b.dataPath 
    from table_left a 
        cross apply(select (select -+x.data from table_left x where x.groupId=a.groupId order by x.data for xml path()) as dataPath)b 

update a 
    set dataPath=b.dataPath 
    from table_right a 
        cross apply(select (select -+x.data from table_right x where x.groupId=a.groupId order by x.data for xml path()) as dataPath)b

接下来就是查询了，如：

复制代码代码如下:

select distinct a.groupId 
    from table_left a 
    where exists(select 1 from table_right x where x.dataPath=a.dataPath)

完整代码：

复制代码代码如下:

View Code 
use tempdb
go
if object_id(table_left) is not null drop table table_left
if object_id(table_right) is not null drop table table_right
go
create table table_left(groupId nvarchar(5),data nvarchar(10))
create table table_right(groupId nvarchar(5),data nvarchar(10))
go
alter table table_left add dataPath nvarchar(200)
alter table table_right add dataPath nvarchar(200)
go
create nonclustered index ix_left on table_left(dataPath)
create nonclustered index ix_right on table_right(dataPath)
go
set nocount on
go
insert into table_right(groupId,data)
select #1,data1 union all
select #1,data2 union all
select #1,data3 union all
select #2,data55 union all
select #2,data55 union all
select #3,data91 union all
select #3,data92 union all
select #4,data65 union all
select #4,data66 union all
select #4,data67 union all
select #4,data68 union all
select #4,data69 union all
select #5,data77 union all
select #5,data79
insert into table_left(groupId,data)
select #11,data1 union all
select #11,data2 union all
select #11,data3 union all
select #22,data55 union all
select #22,data57 union all
select #33,data99 union all
select #33,data99 union all
select #44,data66 union all
select #44,data68 union all
select #55,data77 union all
select #55,data78 union all
select #55,data79
go
update a 
    set dataPath=b.dataPath
    from table_left a
        cross apply(select (select -+x.data from table_left x where x.groupId=a.groupId order by x.data for xml path()) as dataPath)b
update a 
    set dataPath=b.dataPath
    from table_right a
        cross apply(select (select -+x.data from table_right x where x.groupId=a.groupId order by x.data for xml path()) as dataPath)b
--
select distinct a.groupId
    from table_left a
    where exists(select 1 from table_right x where x.dataPath=a.dataPath)

方法2：

通过SQL Sever提供的集运算符"Except"，判断两组非重复的数据。如果两组针对对方都不存在非重复的数据，就说明这两组数据完全相等。如，表table_left中的组#11和表 table_right中的组#1，对列data进行"Except"集运算，无任是(#11 à #1)进行Except集运算，还是(#1 à #11 )进行Except集合运算，都返回空结果，这就说明组#1 和#11的data数据完全相等，如：

复制代码代码如下:

select data from table_left where groupId=#11 except select data from table_right where groupId=#1 
select data from table_right where groupId=#1 except select data from table_left where groupId=#11 

同样道理，我们把表table_left中的组#11和表 table_right中的组#2，对列data进行"Except"集运算，如:

复制代码代码如下:

select data from table_left where groupId=#11 except select data from table_right where groupId=#2 
select data from table_right where groupId=#2 except select data from table_left where groupId=#11

只要(#11 à #2 )或 (#2 à #11 )的"Except"集运算结果有记录，就说明两组的数据不相等。

两张表的所有组都进行比较，我们需要通过以下SQL脚本实现，如：

复制代码代码如下:

select distinct a.groupId 
    from table_left a 
        inner join table_right b on b.data=a.data 
    where not exists(select x.data from table_left x where x.groupId=a.groupId except select y.data from table_right y where y.groupId=b.groupId ) 
        and not exists(select x.data from table_right x where x.groupId=b.groupId except select y.data from table_left y where y.groupId=a.groupId )

完整代码：

复制代码代码如下:

View Code 
use tempdb
go
if object_id(table_left) is not null drop table table_left
if object_id(table_right) is not null drop table table_right
go
create table table_left(groupId nvarchar(5),data nvarchar(10))
create table table_right(groupId nvarchar(5),data nvarchar(10))
go
create nonclustered index ix_left on table_left(data)
create nonclustered index ix_right on table_right(data)
go
set nocount on
go
insert into table_right(groupId,data)
select #1,data1 union all
select #1,data2 union all
select #1,data3 union all
select #2,data55 union all
select #2,data55 union all
select #3,data91 union all
select #3,data92 union all
select #4,data65 union all
select #4,data66 union all
select #4,data67 union all
select #4,data68 union all
select #4,data69 union all
select #5,data77 union all
select #5,data79
insert into table_left(groupId,data)
select #11,data1 union all
select #11,data2 union all
select #11,data3 union all
select #22,data55 union all
select #22,data57 union all
select #33,data99 union all
select #33,data99 union all
select #44,data66 union all
select #44,data68 union all
select #55,data77 union all
select #55,data78 union all
select #55,data79
go
--select 
select distinct a.groupId
    from table_left a
        inner join table_right b on b.data=a.data
    where not exists(select x.data from table_left x where x.groupId=a.groupId except select y.data from table_right y where y.groupId=b.groupId )
        and not exists(select x.data from table_right x where x.groupId=b.groupId except select y.data from table_left y where y.groupId=a.groupId )

方法1 Vs. 方法2 ：

方法1和方法2都能找出表table_left在table_right存在数据完全相等的组#11 。但性能角度上，方法2比方法1略胜一筹，可以看它们执行过程的统计信息：

方法1：

图5.

方法2：

图6.

如果，数据量大情况下，那么方法2比方法1更具有明显的优点。因为方法1，多两个更新dataPath的部分，数据量随着增加，这里位置的更新就耗很多的资源；如果dataPath列数据大小超过900字节，会导致无法在dataPath创建索引，影响后面的Select查询性能。

扩展

这里说扩展，主要是针对上面的方法2来说。在当列data的数据大小超过900字节，或者含有多个数据列要进行比较，看是否存在两组(groupId)的各对应列数据一一相等。

图7.

这样的情况，可对字段dataSub1 & dataSub2 创建一个哈希索引，如：

复制代码代码如下:

alter table table_left add dataChecksum as checksum(dataSub1,dataSub2) 
alter table table_right add dataChecksum as checksum(dataSub1,dataSub2) 
go 
create nonclustered index ix_table_left_cs on table_right(dataChecksum) 
create nonclustered index table_right_cs on table_right(dataChecksum)

后面的select查询语句，在Inner Join 部分稍改动下即可，如：

复制代码代码如下:

select distinct a.groupId 
    from table_left a 
        inner join table_right b on b.dataChecksum=a.dataChecksum 
            and b.dataSub1=a.dataSub1 
            and b.dataSub2=a.dataSub2 
    where not exists(select x.dataSub1,x.dataSub2 from table_left x where x.groupId=a.groupId except select y.dataSub1,y.dataSub2 from table_right y where y.groupId=b.groupId ) 
        and not exists(select x.dataSub1,x.dataSub2 from table_right x where x.groupId=b.groupId except select y.dataSub1,y.dataSub2 from table_left y where y.groupId=a.groupId )

完整代码：

复制代码代码如下:

View Code 
use tempdb
go
if object_id(table_left) is not null drop table table_left
if object_id(table_right) is not null drop table table_right
go
create table table_left(groupId nvarchar(5),dataSub1 nvarchar(10),dataSub2 nvarchar(10))
create table table_right(groupId nvarchar(5),dataSub1 nvarchar(10),dataSub2 nvarchar(10))
go
alter table table_left add dataChecksum as checksum(dataSub1,dataSub2)
alter table table_right add dataChecksum as checksum(dataSub1,dataSub2)
go
create nonclustered index ix_table_left_cs on table_left(dataChecksum)
create nonclustered index table_right_cs on table_right(dataChecksum)
go
set nocount on
go
insert into table_right(groupId,dataSub1,dataSub2)
select #1,data1,data7 union all
select #1,data2,data8 union all
select #1,data3,data9 union all
select #2,data55,data4 union all
select #2,data55,data5 
insert into table_left(groupId,dataSub1,dataSub2)
select #11,data1,data7 union all
select #11,data2,data8 union all
select #11,data3,data9 union all
select #22,data55,data0 union all
select #22,data57,data2 union all
select #33,data99,data4 union all
select #33,data99,data6 
go
--select 
select distinct a.groupId
    from table_left a
        inner join table_right b on b.dataChecksum=a.dataChecksum
            and b.dataSub1=a.dataSub1
            and b.dataSub2=a.dataSub2
    where not exists(select x.dataSub1,x.dataSub2 from table_left x where x.groupId=a.groupId except select y.dataSub1,y.dataSub2 from table_right y where y.groupId=b.groupId )
        and not exists(select x.dataSub1,x.dataSub2 from table_right x where x.groupId=b.groupId except select y.dataSub1,y.dataSub2 from table_left y where y.groupId=a.groupId )

小结

对于这个问题，可能还有其他的或更优的解决方法.而且在实际的生产环境中，可能碰到的情况会有所不同，无论如何，需要多分析，多动手多实验，找到最优的解决方法。

技术文章快速查找

Oracle与SQL Server之间的数据迁移

一段压缩MS SQLServer日志的语句

SQL MSSQL 常用代码大全

优化：LECCO SQL Expert

SQLSERVER的排序问题结果不是想要的

SQL Server空值比较的实际搜索条件

SQL Server Agent服务

SQL Server管理工具的升级方法

相关下载

· 金石舆情监测SQL单机版(64bit) 10.7

在线教程导航

软件应用
·Windows8	·Windows7	·Word
·Excel	·PPT	·WPS
Web开发
·ASP	·JavaScript	·DIV+CSS
·JSP	·VbScript	·XML
·PHP
开发语言
·VB	·VC	·ASP.NET
·Java	·C++	·Delphi
数据库开发
·MySQL	·MsSQL	·Access
·Oracle	·DB2
手机系统
·Android	·iOS	·WindowsPhone
网站设计
·Flash	·Dreamweaver	·Fireworks
平面设计
·Photoshop	·CorelDraw	·AutoCAD
·3DsMAX	·Illustrator
网络技术
·网站运营	·网络安全	·网络搭建

软件开发培训班 >> 编程开发 >> SQL

基于SQL Server中如何比较两个表的各组数据 图解说明

基于SQL Server中如何比较两个表的各组数据图解说明