本文介绍了MS SQL-如何从长不一致的字符串中提取电话号码的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个任务要完成,我必须将大约970,000个用户从其他人的数据库导入到我们的数据库中.在源数据库中,有一个电话号码字段,其中包含多个电话号码,这些电话号码被合并为一个丑陋的字符串.

I have a task to complete where I have to import around 970,000 users from someone else's database into ours. In the source DB, there is a phone number field that contains multiple phone numbers concatenated into one hideous string.

以下是一些数据示例:

|Home: 555-555-5555 Office: (555)-555-5555 Work: 5555555555|
|Home: Office: 555\555-5555 Work: 555-555-5555|
|Office: 555-555-5555 Home: (555)555-5555 some Comment here|

我遇到的问题是

  1. 数字顺序不一致
  2. 有一些关于自由文本的评论
  3. 某些电话号码的格式不同.

如果可能的话,我真的更愿意通过SQL来执行此操作,并且以最少的手动调整以有效的方式执行此操作非常困难.

I would really prefer to do this through SQL if possible and am pretty stumped on doing this in an efficient manner with minimal Manual adjustment.

在我的数据库中,每种电话号码类型都有单独的列,因此我基本上需要将这些字符串分成相应的列.

In my DB, we have separate columns for each phone number type, so I basically need to split those strings into their appropriate columns.

请告诉我是否遗漏了任何东西.

Please tell me if I left anything out.

推荐答案

在此代码中,您将需要3个额外的列来存储新电话号码

In this code, you will need 3 extra columns to store the new phone numbers

这是代码中的逻辑

  1. 拆分电话号码

  1. splitting the phone numbers

在分割结果中的前3个最后3个数字之前截断文本

cutting off text before first 3 last 3 numerics in the split result

删除电话号码中使用的外来字符(仅用于示例中的字符)

deleting the alien characters used in phone number(only those used in sample)

在位置7和4插入替换分隔符'-'

inserting replacement separators '-' at position 7 and 4

分组数据

更新表

样本数据:

DECLARE @t table
  (phone varchar(500), home varchar(50), work varchar(50), office varchar(50))
INSERT @t(phone) values
('Home: 555-555-5551 Office: (555)-555-5555 Work: 5555555552|'),
('|Home: Office: 555\555-5555 Work: 555-555-5555|'),
('|Office: 555-555-5555 Home: (555)555-5555 some Comment here|')

更新:

;WITH CTE as
(
     SELECT
       nid,work, home, office, 
       t.c.value('.', 'VARCHAR(2000)') phone
     FROM (
         SELECT
           row_number() over(order by (select 1)) nid, work, home,office, 
           x = CAST('<t>' + 
               REPLACE(REPLACE(REPLACE(phone, 'Work', '</t><t>work')
               ,'Office', '</t><t>Office'), 'Home', '</t><t>Home')
                + '</t>' AS XML)
        FROM @t -- replace @t with your table
     ) a
     CROSS APPLY x.nodes('/t') t(c)
     WHERE t.c.value('.', 'VARCHAR(2000)') like '%[0-9][0-9][0-9]%'
), CTE2 as
(
SELECT 
work,max(case when phone like '%work%' then z end) over(partition by nid)nwork,
home,max(case when phone like '%home%' then z end) over(partition by nid)nhome,
office,max(case when phone like '%office%' then z end) over(partition by nid)noffice
FROM cte t
CROSS APPLY(SELECT REVERSE(SUBSTRING(phone,PATINDEX('%[0-9][0-9][0-9]%', phone), 20))x)y
CROSS APPLY(SELECT STUFF(STUFF(REPLACE(REPLACE(REPLACE(REVERSE(
  SUBSTRING(x, PATINDEX('%[0-9][0-9][0-9]%', x), 20)), ')', ''), '\', ''),
  '-', ''),7,0, '-'),4,0,'-')z)v    )
UPDATE CTE2
SET work = nwork, home = nhome, office = noffice

SELECT home,work,office FROM @t

结果:

home          work          office
555-555-5551  555-555-5552  555-555-5555
NULL          555-555-5555  555-555-5555
555-555-5555  NULL          555-555-5555

这篇关于MS SQL-如何从长不一致的字符串中提取电话号码的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

09-21 15:27