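This article covers a problem encountered when writing a program to split a protein sequence, along with one way to solve it. It may be a useful reference for anyone facing the same problem.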

Problem description

I am having trouble creating a program to split up a protein sequence.

Module Module1

    Sub Main()
        ' Apomyoglobin sequence, one letter per amino acid residue.
        Dim apomyoglobin As String
        apomyoglobin = "GLSDGEWQQVLNVWGKVEADIAGHGQEVLIRLFTGHPETLEKFDKFKHLKTEAEMKASEDLKKHGTVVLTALGGILKKKEGHHEAELKPLAQSHATKHKIPIKYLEFISDAIIHVLHSKHRPGDFGADAQGAMTKALELFRNDIAAKYKELGFQG"

        ' Split("") does not break a string into characters; ToCharArray() gives one Char per residue.
        Dim thearray() As Char = apomyoglobin.ToCharArray()

        For i = 0 To (thearray.Length - 1)
            ' Stuck here: how to decide whether to cut after thearray(i)?
        Next

    End Sub

End Module

I want to split the sequence at specific letters, i.e.:

Cut the string after an arginine (R)

Cut after a lysine (K)

Do not cut if the lysine or arginine is followed by a proline (P)

I am stuck on exactly how to split this string into separate chunks without removing any characters. I just want to put each chunk into an array.

Solution

Here's an example of splitting using IndexOfAny and Substring in a loop, wrapped into an extension method:

Imports System.Runtime.CompilerServices

Public Module StringExtensions

    <Extension>
    Public Function SplitAtChars(source As String, ParamArray chars As Char()) As String()
        Dim substrings As New List(Of String)

        'Start at the beginning of the source text.
        Dim startIndex = 0

        'Find the first delimiter.
        Dim endIndex As Integer = source.IndexOfAny(chars)

        Do Until endIndex = -1
            'Get the substring from the current start index to and including the current delimiter.
            substrings.Add(source.Substring(startIndex, endIndex - startIndex + 1))

            'Start again after the current delimiter.
            startIndex = endIndex + 1

            'Find the next delimiter.
            endIndex = source.IndexOfAny(chars, startIndex)
        Loop

        'Get the substring from after the last delimiter to the end of the source text.
        substrings.Add(source.Substring(startIndex))

        Return substrings.ToArray()
    End Function

End Module

Example usage:

Dim text = "AB1CD2DEF1GHIJK1LMN2OPQR2STU1VW2XYZ"
Dim text1 = "1AB1CD2DEF1GHIJK1LMN2OPQR2STU1VW2XYZ"
Dim text2 = "AB1CD2DEF1GHIJK1LMN2OPQR2STU1VW2XYZ2"

Console.WriteLine(String.Join(",", text.SplitAtChars("1"c, "2"c)))
Console.WriteLine(String.Join(",", text1.SplitAtChars("1"c, "2"c)))
Console.WriteLine(String.Join(",", text2.SplitAtChars("1"c, "2"c)))

Output:

AB1,CD2,DEF1,GHIJK1,LMN2,OPQR2,STU1,VW2,XYZ
1,AB1,CD2,DEF1,GHIJK1,LMN2,OPQR2,STU1,VW2,XYZ
AB1,CD2,DEF1,GHIJK1,LMN2,OPQR2,STU1,VW2,XYZ2,
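
The third line ends with a comma because the source text ends with a delimiter, so the last substring is empty. If that empty entry is unwanted, one possible tweak is to filter the result afterwards. The sketch below is not part of the original answer; the helper name WithoutEmptyEntries is hypothetical and chosen only for illustration.

```vb
Imports System
Imports System.Linq

Public Module SplitCleanup

    ' Hypothetical helper: drops empty strings from a split result.
    Public Function WithoutEmptyEntries(parts As String()) As String()
        Return parts.Where(Function(s As String) s.Length > 0).ToArray()
    End Function

    Sub Main()
        ' Stand-in for the array SplitAtChars returns when the text ends with a delimiter.
        Dim parts As String() = New String() {"AB1", "CD2", "XYZ2", ""}

        ' Prints: AB1,CD2,XYZ2
        Console.WriteLine(String.Join(",", WithoutEmptyEntries(parts)))
    End Sub

End Module
```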

You can adjust as required for your specific case.
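
For the rules in the question, IndexOfAny alone is not enough, because the cut after K or R must be skipped when the next residue is P. One way this might be adapted is to walk the sequence character by character and look one position ahead. This is a minimal sketch under that assumption, not the original answerer's code; the module and function names (ProteinDigest, SplitAfterLysineOrArginine) are hypothetical.

```vb
Imports System
Imports System.Collections.Generic

Public Module ProteinDigest

    ' Hypothetical helper: cuts after K or R unless the next residue is P.
    ' Every character of the input ends up in exactly one fragment.
    Public Function SplitAfterLysineOrArginine(sequence As String) As String()
        Dim fragments As New List(Of String)
        Dim startIndex As Integer = 0

        For i As Integer = 0 To sequence.Length - 1
            Dim isCutResidue As Boolean = (sequence(i) = "K"c OrElse sequence(i) = "R"c)
            Dim followedByProline As Boolean =
                (i + 1 < sequence.Length AndAlso sequence(i + 1) = "P"c)

            If isCutResidue AndAlso Not followedByProline Then
                ' Keep the K or R at the end of the current fragment.
                fragments.Add(sequence.Substring(startIndex, i - startIndex + 1))
                startIndex = i + 1
            End If
        Next

        ' Add whatever remains after the last cut, if anything.
        If startIndex < sequence.Length Then
            fragments.Add(sequence.Substring(startIndex))
        End If

        Return fragments.ToArray()
    End Function

    Sub Main()
        ' Short made-up sequence: the R is followed by P, so no cut happens there.
        ' Prints: AK,RPGK,C
        Console.WriteLine(String.Join(",", SplitAfterLysineOrArginine("AKRPGKC")))
    End Sub

End Module
```

Passing the apomyoglobin string from the question to the same function should give its fragments in order. Unlike SplitAtChars, this version adds no empty entry when the sequence ends with K or R.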


08-20 09:59