本文介绍了Excel VBA使用正则表达式查找和屏蔽PAN数据,以符合PCI DSS的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

由于大多数在文件系统中发现信用卡数据的工具不再列出可疑文件,因此需要工具来屏蔽必须保留的文件中的任何数据。

Because most of the tools to discover credit card data in file systems does no more that list the suspicious files, tools are needed to mask any data in files that must be retained.

对于可能存在信用卡数据负载的Excel文件,我使用正则表达式在所选列/行中检测信用卡数据的宏,并用X代替中间6-8位将对许多文件有用。可悲的是,我不是正则表达式宏空间的上师。

For excel files, where loads of credit card data may exist, I figure a macro that detects credit card data in the selected column/row using regex and replaces the middle 6-8 digits with Xs would be useful to many. Sadly, I'm not a guru in the regex macro space.

以下基本上只适用于3个卡牌的正则表达式,如果PAN与其他数据(例如注释字段)在一个单元格中,则工作原理

The below basically works with regex for 3 card brands only, and works if the PAN is in a cell with other data (e.g. comments fields)

以下代码有效,但可以改进。改善正则表达式,使其适用于更多/所有卡牌,并通过包含LUHN算法检查来减少假阳性。

The below code works, but could be improved. It would be good to improve the regex to make it work for more/all card brands and reduce false-positives by including a LUHN algorithm check.

改进/剩余问题:


  • 将所有卡牌品牌的PAN与扩展的正则表达式匹配

  • 包含Luhn算法检查(FIXED - 好主意Ron)

  • 改善Do While逻辑(由stribizhev修复)

  • 更好地处理不会包含PAN(FIXED)

  • Match all card brand's PANs with expanded regex
  • Include Luhn algorithm checking (FIXED - good idea Ron)
  • Improve the Do While logic (FIXED by stribizhev)
  • Even better handling of cells that don't contain PANs (FIXED)

这是我迄今为止对于AmEx,Visa和万事达卡似乎正常工作:

Here's what I have so far which seems to be working ok for AmEx, Visa and Mastercard:

Sub PCI_mask_card_numbers()
' Written to mask credit card numbers in excel files in accordance with PCI DSS.
' Highlight the credit card data in the Excel sheet, then run this macro.

Dim strPattern As String: strPattern = "([4][0-9]{3})([^a-zA-Z0-9_]?[0-9]{4})([^a-zA-Z0-9_]?[0-9]{4})([^a-zA-Z0-9_]?[0-9]{4})|" & _
"([5][0-9]{3})([^a-zA-Z0-9_]?[0-9]{4})([^a-zA-Z0-9_]?[0-9]{4})([^a-zA-Z0-9_]?[0-9]{4})|" & _
"([3][0-9]{2})([^a-zA-Z0-9_]?[0-9]{4})([^a-zA-Z0-9_]?[0-9]{4})([^a-zA-Z0-9_]?[0-9]{4})|" & _
"([3][0-9]{3})([^a-zA-Z0-9_]?[0-9]{3})([^a-zA-Z0-9_]?[0-9]{4})([^a-zA-Z0-9_]?[0-9]{4})|" & _
"([3][0-9]{3})([^a-zA-Z0-9_]?[0-9]{4})([^a-zA-Z0-9_]?[0-9]{3})([^a-zA-Z0-9_]?[0-9]{4})|" & _
"([3][0-9]{3})([^a-zA-Z0-9_]?[0-9]{4})([^a-zA-Z0-9_]?[0-9]{4})([^a-zA-Z0-9_]?[0-9]{3})|" & _
"([3][0-9]{3})([^a-zA-Z0-9_]?[0-9]{6})([^a-zA-Z0-9_]?[0-9]{5})"

' Regex patterns for PANs above are broken into multiple parts (between the brackets)
' As such the when regex matches the first part of a PAN will fit into one of rMatch(k).SubMatches(#) where # is 0, 4, 8, 12, 16, 20 or 24.
' Visa start with a 4 and is 16 digits long. Typically the data entry pattern is four groups of four digits
' MasterCard start with a 5 and is 16 digits long. Typically the data entry pattern is four groups of four digits
' AmEx start with a 3 and is 15 digits long. Typically the pattern is 4-6-5, but data entry seems inconsistent

    Dim strReplace As String: strReplace = ""
'     Dim regEx As New RegExp  ' if this line is used instead of the next 2, the MS VBS RegEx v5.5 needs to be enabled manually. The next 2 lines seem to do it from within the script
    Dim regEx As Object
    Set regEx = CreateObject("VBScript.RegExp")
    Dim regEx As New RegExp
    Dim strInput As String
    Dim Myrange As Range
    Dim NewPAN As String
    Dim Aproblem As String
    Dim Masked As Long
    Dim Problems As Long
    Dim Total As Long

With regEx
    .Global = True
    .MultiLine = True
    .IgnoreCase = False
    .Pattern = strPattern ' sets the regex pattern to match the pattern above
End With

Set Myrange = Selection

    MsgBox ("The macro will now start masking credit card numbers identified in the selected cells only. If entire columns are selected, each column will take 10-30 seconds to complete. Ditto for Rows.")

For Each cell In Myrange
    Total = Total + 1

    ' Check that the cell is a likely candidate for holding a PAN, not just a long number
    If strPattern <> "" _
    And cell.HasFormula = False _
    And Left(cell.NumberFormat, 1) <> "$" _
    And Mid(cell.NumberFormat, 3, 1) <> "$" Then
'        cell.NumberFormat = "@"
        strInput = cell.Value

        ' Depending on the data matching the regex pattern, fix it
        If regEx.Test(strInput) Then
            Set rMatch = regEx.Execute(strInput)
            For k = 0 To rMatch.Count - 1
                toReplace = rMatch(k).Value

        ' If the regex matched, replace the PAN based on its regex segment
                Select Case 2
                    Case Is < Len(rMatch(k).SubMatches(0))
                        strReplace = rMatch(k).SubMatches(0) & "xxxxxxxx" & Trim(rMatch(k).SubMatches(3))
                        Masked = Masked + 1
                    Case Is < Len(rMatch(k).SubMatches(4))
                        strReplace = rMatch(k).SubMatches(4) & "xxxxxxxx" & Trim(rMatch(k).SubMatches(7))
                        Masked = Masked + 1
                    Case Is < Len(rMatch(k).SubMatches(8))
                        strReplace = rMatch(k).SubMatches(8) & "xxxxxxxx" & Trim(rMatch(k).SubMatches(11))
                        Masked = Masked + 1
                    Case Is < Len(rMatch(k).SubMatches(12))
                        strReplace = rMatch(k).SubMatches(12) & "xxxxxxxx" & Trim(rMatch(k).SubMatches(13))
                        Masked = Masked + 1
                    Case Is < Len(rMatch(k).SubMatches(16))
                        strReplace = rMatch(k).SubMatches(16) & "xxxxxxxx" & Trim(rMatch(k).SubMatches(19))
                        Masked = Masked + 1
                    Case Is < Len(rMatch(k).SubMatches(20))
                        strReplace = rMatch(k).SubMatches(20) & "xxxxxxxx" & Trim(rMatch(k).SubMatches(23))
                        Masked = Masked + 1
                    Case Is < Len(rMatch(k).SubMatches(24))
                        strReplace = rMatch(k).SubMatches(24) & "xxxxxxxx" & Trim(rMatch(k).SubMatches(26))
                        Masked = Masked + 1
                    Case Else
                        Aproblem = cell.Value
                        Problems = Problems + 1
                        ' MsgBox (Aproblem) ' only needed when curios
                End Select
                If cell.Value <> Aproblem Then
                    cell.Value = Replace(strInput, toReplace, strReplace)
                End If

            Next k
        Else
            ' Adds the cell value to a variable to allow the macro to move past the cell
            ' Once the macro is trusted not to loop forever, the message box can be removed
            ' MsgBox ("Problem. Regex fail? Bad data = " & Aproblem)
        End If
    End If
Next cell
' All done, tell the user
    MsgBox ("Cardholder data is now masked" & vbCr & vbCr & "Total cells highlighted (including blanks) = " & Total & vbCr & "Cells masked = " & Masked & vbCr & "Possible problem cells = " & Problems & vbCr & "All other cells were ignored")

End Sub


推荐答案

从假期回来这是一个简单的VBA函数,用于测试LUHN算法。参数是数字字符串;结果为布尔值。

Back from vacation. Here's a simple VBA function that will test for the LUHN algorithm. The argument is a string of the digits; the result is boolean.

它生成一个校验和数字,并将该数字与您提供的数字字符串中的数字进行比较。

It generates a checksum digit and compares that digit with the one in the digit string you feed it.

Option Explicit
Function Luhn(sNum As String) As Boolean
'modulus 10 algorithm for various numbers
Dim X As Long, I As Long, J As Long

For I = Len(sNum) - 1 To 1 Step -2
    X = X + DoubleSumDigits(Mid(sNum, I, 1))
    If I > 1 Then X = X + Mid(sNum, I - 1, 1)
Next I

If Right(sNum, 1) = (X * 9) Mod 10 Then
    Luhn = True
Else
    Luhn = False
End If
End Function

Function DoubleSumDigits(L As Long) As Long
    Dim X As Long
    X = L * 2
    If X > 9 Then X = Val(Left(X, 1)) + Val(Right(X, 1))
    DoubleSumDigits = X
End Function

这篇关于Excel VBA使用正则表达式查找和屏蔽PAN数据,以符合PCI DSS的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

07-18 09:21
查看更多