问题描述
我有一个excel表与两列。第一列是关键短语,第二列是消息。关键短语可能发生在消息列中。我需要知道消息列中发生了一个关键短语的次数。关键词是一列,消息是第二列。消息列是1个或多于1个关键短语的组合(连接)。我需要知道每个消息包含多少关键短语。还有一些消息与他们有一些日期和数字。还有一些消息中有日期和数字,目前与(xx-xxx-xxxx)作为该日期/数字的匹配关键短语。
例如消息是该交易于2014年10月8日结束,因此不允许进一步交易,关键短语是关闭(xx-xxx-xxxx) 。还有消息是交易号码4238428DDSSD有问题,关键字短语是交易号码xxxxxxxx hass问题。需要正则表达式匹配。
您可以选择一些关键字短语,为其创建正则表达式模式,然后对短语进行编码,使的regex关键字
随意修改附加附加关键字,短语和RegEx模式。
I have an excel sheet with two columns. The first column is the key phrase and the second is the messages. The key phrase may occur in the messages column. I need to know how many times a key phrase has occurred in messages column.
The key phrase is one column and the messages is the second column. The messages column is combination (concatenation) of 1 or more than 1 key phrases. I need to find out that how many key phrases does each message contain. Also some of the messages have some dates and numbers with them. Also some of the messages have dates and numbers in them, the matching key phrase as that date/numbe as (xx-xxx-xxxx) presently.
e.g. the message is "The deal closed on 08-Oct-2014 so no further transaction allowed" and the key phrase is "The deal closed on (xx-xxx-xxxx)". Also there are messages as "Deal number 4238428DDSSD has problems" and the keyphrase is "Deal number xxxxxxxx hass problems". The regex matching is required.
You can pick a few keyword phrases, create the regex pattern for them, then encode the phrases such that a Range.Replace method can be used on them to substitute an appropriate RegEx pattern mask into the keyword phrase.
In the following, I've used X00000000X, XSHORTDATEX and XDEALNMBRX as placeholders within the keywords. These will be replaced with [0-9,-]{7,8}, [0-9,-]{3}[a-z]{3}[0-9,-]{3,5} and [0-9]{7}[a-z]{5} respectively.
X00000000X is designed to handle anything that looks like 1234567 or * 99-11-00*. XSHORTDATEX will handle dates in the dd-mmm-yy or dd-mmm-yyyy format (once converted to lower case) and XDEALNMBRX will locate alphanumeric patterns similar to 4238428DDSSD.
Sub count_strings_inside_strings_rgx()
Dim rw As Long, lr As Long
Dim k As Long, p As Long, vKEYs As Variant, vPHRASEs As Variant, vCOUNTs As Variant
Dim sPATTERN As String, vbaRGX As New RegExp, cMATCHES As MatchCollection
ReDim vKEYs(0)
ReDim vPHRASEs(0)
With Worksheets("Sheet1") '<~~ set to the correct worksheet name\
'populate the vKEYs array
For rw = 2 To .Cells(Rows.Count, 1).End(xlUp).Row
vKEYs(UBound(vKEYs)) = LCase(.Cells(rw, 1).Value2)
ReDim Preserve vKEYs(UBound(vKEYs) + 1)
Next rw
ReDim Preserve vKEYs(UBound(vKEYs) - 1)
'populate the vPHRASEs array
For rw = 2 To .Cells(Rows.Count, 2).End(xlUp).Row
vPHRASEs(UBound(vPHRASEs)) = LCase(.Cells(rw, 2).Value2)
ReDim Preserve vPHRASEs(UBound(vPHRASEs) + 1)
Next rw
ReDim Preserve vPHRASEs(UBound(vPHRASEs) - 1)
ReDim vCOUNTs(0 To UBound(vPHRASEs))
For p = LBound(vPHRASEs) To UBound(vPHRASEs)
For k = LBound(vKEYs) To UBound(vKEYs)
sPATTERN = Replace(vKEYs(k), "x00000000x", "[0-9,\-]{7,8}")
sPATTERN = Replace(sPATTERN, "xshortdatex", "[0-9,\-]{3}[a-z]{3}[0-9,\-]{3,5}")
sPATTERN = Replace(sPATTERN, "xdealnmbrx", "[0-9]{7}[a-z]{5}")
sPATTERN = Replace(sPATTERN, "xshortwrapdatex", "\([0-9,\-]{3}[a-z]{3}[0-9,\-]{3,5}\)")
With vbaRGX
.Global = True
.Pattern = sPATTERN
Set cMATCHES = .Execute(vPHRASEs(p))
End With
vCOUNTs(p) = vCOUNTs(p) + cMATCHES.Count
Next k
Next p
.Cells(2, 3).Resize(UBound(vCOUNTs) + 1, 1) = Application.Transpose(vCOUNTs)
Call key_in_phrase_helper_rgx(vKEYs, .Range(.Cells(2, 2), .Cells(Rows.Count, 2).End(xlUp)))
End With
Set cMATCHES = Nothing
Set vbaRGX = Nothing
End Sub
Sub key_in_phrase_helper_rgx(vKYs As Variant, rPHRSs As Range)
Dim c As Long, m As Long, p As Long, r As Long, v As Long, sPTTRN As String
Dim vbaRGX As New RegExp, cMATCHES As MatchCollection
With rPHRSs
For r = 1 To rPHRSs.Rows.Count
With .Cells(r, 1)
.ClearFormats
For v = LBound(vKYs) To UBound(vKYs)
sPTTRN = Replace(vKYs(v), "x00000000x", "[0-9,\-]{7,8}")
sPTTRN = Replace(sPTTRN, "xshortdatex", "[0-9,\-]{3}[a-z]{3}[0-9,\-]{3,5}")
sPTTRN = Replace(sPTTRN, "xdealnmbrx", "[0-9]{7}[a-z]{5}")
sPTTRN = Replace(sPTTRN, "xshortwrapdatex", "\([0-9,\-]{2,3}[a-z]{3}[0-9,\-]{3,5}\)")
c = 5 + CBool(vKYs(v) <> sPTTRN) * 2
Debug.Print sPTTRN
With vbaRGX
.Global = True
.Pattern = sPTTRN
End With
Set cMATCHES = vbaRGX.Execute(LCase(.Value2))
For m = 0 To cMATCHES.Count - 1
p = 0
Do While CBool(InStr(p + 1, .Value2, cMATCHES.Item(m), vbTextCompare))
p = InStr(p + 1, .Value2, cMATCHES.Item(m), vbTextCompare)
'Debug.Print vKYs(v)
With .Characters(Start:=p, Length:=Len(cMATCHES.Item(m))).Font
.Bold = True
.ColorIndex = c
End With
Loop
Next m
Next v
End With
Next r
End With
Set cMATCHES = Nothing
Set vbaRGX = Nothing
End Sub
In the following image of me sample's results, the staight location items are noted in bold}blue and the RegEx pattern matching is noted by bold|red.
Feel free to modify and append with additional keywords, phrases and RegEx patterns.
这篇关于模式匹配在excel的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!