How to reconstruct a conversation from Watson Speech-to-Text output?

Question

I have the JSON output from Watson's Speech-to-Text service, which I have converted into a list and then into a Pandas DataFrame.

I'm trying to work out how to reconstruct the conversation (with timings), along the lines of the following:

Speaker 0: Said this [00.01 - 00.12]

Speaker 1: Said that [00.12 - 00.22]

Speaker 0: Said something else [00.22 - 00.56]

My DataFrame has one row per word, with columns for the word, its start and end times, and the speaker tag (either 0 or 1).

words = [['said', 0.01, 0.06, 0],['this', 0.06, 0.12, 0],['said', 0.12,
0.15, 1],['that', 0.15, 0.22, 1],['said', 0.22, 0.31, 0],['something',
0.31, 0.45, 0],['else', 0.45, 0.56, 0]]

Ideally, I would like to create the following, where words spoken by the same speaker are grouped together and a new group starts whenever the next speaker steps in:

grouped_words = [[['said','this'], 0.01, 0.12, 0],[['said','that'], 0.12,
0.22, 1],[['said','something','else'], 0.22, 0.56, 0]]

Update: As requested, here is a link to a sample JSON file: https://github.com/cookie1986/STT_test

Recommended answer

It should be fairly straightforward to load the speaker labels into a Pandas DataFrame for an easy-to-read view, and then identify the speaker shifts.

speakers=pd.DataFrame(jsonconvo['speaker_labels']).loc[:,['from','speaker','to']]
convo=pd.DataFrame(jsonconvo['results'][0]['alternatives'][0]['timestamps'])
speakers=speakers.join(convo)

Output:

   from  speaker    to          0     1     2
0  0.01        0  0.06       said  0.01  0.06
1  0.06        0  0.12       this  0.06  0.12
2  0.12        1  0.15       said  0.12  0.15
3  0.15        1  0.22       that  0.15  0.22
4  0.22        0  0.31       said  0.22  0.31
5  0.31        0  0.45  something  0.31  0.45
6  0.45        0  0.56       else  0.45  0.56
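The positional join above relies on the speaker_labels and timestamps lists lining up one-to-one, as they do in this sample. If that cannot be taken for granted, one alternative is to match rows on their start and end times. The following is only a sketch under that assumption; the shortened jsonconvo dictionary and the column names word, from and to are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Hypothetical, shortened response in the same shape as the Watson output above.
jsonconvo = {
    'results': [{'alternatives': [{'timestamps': [
        ['said', 0.01, 0.06], ['this', 0.06, 0.12], ['said', 0.12, 0.15]]}]}],
    'speaker_labels': [
        {'from': 0.01, 'to': 0.06, 'speaker': 0},
        {'from': 0.06, 'to': 0.12, 'speaker': 0},
        {'from': 0.12, 'to': 0.15, 'speaker': 1}],
}

speakers = pd.DataFrame(jsonconvo['speaker_labels'])[['from', 'to', 'speaker']]
convo = pd.DataFrame(jsonconvo['results'][0]['alternatives'][0]['timestamps'],
                     columns=['word', 'from', 'to'])

# Match each word to its speaker label by start and end time instead of row position.
# A left merge keeps every word; a word without a matching label would get NaN.
merged = convo.merge(speakers, on=['from', 'to'], how='left')
print(merged)
```

Checking merged['speaker'].isna().any() afterwards would reveal words that found no matching label.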

From there, you can identify just the rows where the speaker changes and collapse the DataFrame with a quick loop:

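# Index labels of the rows where the speaker differs from the previous row
ChangeSpeaker = speakers.loc[speakers['speaker'].shift() != speakers['speaker']].index

turns = []
for counter in range(0, len(ChangeSpeaker)):
    currentindex = ChangeSpeaker[counter]
    try:
        # .loc slicing is inclusive, so stop one row before the next speaker change
        nextIndex = ChangeSpeaker[counter+1] - 1
        temp = speakers.loc[currentindex:nextIndex, :]
    except IndexError:
        # Last speaker turn: there is no next change point, so take the remaining rows
        temp = speakers.loc[currentindex:, :]
    turns.append(pd.DataFrame([[temp.head(1)['from'].values[0], temp.tail(1)['to'].values[0], temp.head(1)['speaker'].values[0], temp[0].tolist()]], columns=['from','to','speaker','transcript']))

# DataFrame.append was removed in pandas 2.0, so the per-turn frames are combined with pd.concat
Transcript = pd.concat(turns)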

You want to take the start point from the first value (hence head) and the end point from the last value (hence tail) in the temporary DataFrame. Additionally, to handle the last speaker's turn (where you would normally get an index-out-of-bounds error), you use a try/except.

Output:

   from    to speaker               transcript
0  0.01  0.12       0             [said, this]
0  0.12  0.22       1             [said, that]
0  0.22  0.56       0  [said, something, else]
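The same collapse can also be written without an explicit loop. What follows is a minimal sketch of an alternative, not the method used in this answer: it starts from the word-level list given in the question, numbers the turns with a cumulative sum over the speaker changes, and aggregates each turn with groupby. The names df, turn_id, start and end are assumptions made for illustration (start and end are used instead of from and to because from is a Python keyword and cannot serve as a keyword argument).

```python
import pandas as pd

# Word-level rows as described in the question: word, start time, end time, speaker.
words = [['said', 0.01, 0.06, 0], ['this', 0.06, 0.12, 0],
         ['said', 0.12, 0.15, 1], ['that', 0.15, 0.22, 1],
         ['said', 0.22, 0.31, 0], ['something', 0.31, 0.45, 0],
         ['else', 0.45, 0.56, 0]]
df = pd.DataFrame(words, columns=['word', 'from', 'to', 'speaker'])

# A new turn starts wherever the speaker differs from the previous row;
# the cumulative sum of those flags gives every row a turn number.
turn_id = (df['speaker'] != df['speaker'].shift()).cumsum().rename('turn')

# One row per turn: first start time, last end time, the speaker, and the words as a list.
turns = df.groupby(turn_id).agg(
    start=('from', 'first'),
    end=('to', 'last'),
    speaker=('speaker', 'first'),
    transcript=('word', list),
).reset_index(drop=True)
print(turns)
```

Grouping on the turn number rather than on the speaker column itself is what keeps the two separate turns of speaker 0 apart.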

Full code here:

import json
import pandas as pd

jsonconvo=json.loads("""{
   "results": [
      {
         "alternatives": [
            {
               "timestamps": [
                  [
                     "said",
                     0.01,
                     0.06
                  ],
                  [
                     "this",
                     0.06,
                     0.12
                  ],
                  [
                     "said",
                     0.12,
                     0.15
                  ],
                  [
                     "that",
                     0.15,
                     0.22
                  ],
                  [
                     "said",
                     0.22,
                     0.31
                  ],
                  [
                     "something",
                     0.31,
                     0.45
                  ],
                  [
                     "else",
                     0.45,
                     0.56
                  ]
               ],
               "confidence": 0.85,
               "transcript": "said this said that said something else "
            }
         ],
         "final": true
      }
   ],
   "result_index": 0,
   "speaker_labels": [
      {
         "from": 0.01,
         "to": 0.06,
         "speaker": 0,
         "confidence": 0.55,
         "final": false
      },
      {
         "from": 0.06,
         "to": 0.12,
         "speaker": 0,
         "confidence": 0.55,
         "final": false
      },
      {
         "from": 0.12,
         "to": 0.15,
         "speaker": 1,
         "confidence": 0.55,
         "final": false
      },
      {
         "from": 0.15,
         "to": 0.22,
         "speaker": 1,
         "confidence": 0.55,
         "final": false
      },
      {
         "from": 0.22,
         "to": 0.31,
         "speaker": 0,
         "confidence": 0.55,
         "final": false
      },
      {
         "from": 0.31,
         "to": 0.45,
         "speaker": 0,
         "confidence": 0.55,
         "final": false
      },
      {
         "from": 0.45,
         "to": 0.56,
         "speaker": 0,
         "confidence": 0.54,
         "final": false
      }
   ]
}""")



speakers=pd.DataFrame(jsonconvo['speaker_labels']).loc[:,['from','speaker','to']]
convo=pd.DataFrame(jsonconvo['results'][0]['alternatives'][0]['timestamps'])
speakers=speakers.join(convo)

ChangeSpeaker=speakers.loc[speakers['speaker'].shift()!=speakers['speaker']].index


turns=[]
for counter in range(0,len(ChangeSpeaker)):
    currentindex=ChangeSpeaker[counter]
    try:
        # .loc slicing is inclusive, so stop one row before the next speaker change
        nextIndex=ChangeSpeaker[counter+1]-1
        temp=speakers.loc[currentindex:nextIndex,:]
    except IndexError:
        # Last speaker turn: there is no next change point, so take the remaining rows
        temp=speakers.loc[currentindex:,:]
    turns.append(pd.DataFrame([[temp.head(1)['from'].values[0],temp.tail(1)['to'].values[0],temp.head(1)['speaker'].values[0],temp[0].tolist()]],columns=['from','to','speaker','transcript']))

# DataFrame.append was removed in pandas 2.0, so the per-turn frames are combined with pd.concat
Transcript=pd.concat(turns)
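To get from the collapsed frame to the "Speaker 0: Said this [...]" lines asked for in the question, each turn still has to be rendered as text. A possible sketch follows, assuming a frame with the same columns as Transcript; the hard-coded rows stand in for the result of the code above, and both the format_turn helper and the two-decimal time format are illustrative assumptions.

```python
import pandas as pd

# Stand-in for the Transcript frame built above: one row per speaker turn.
Transcript = pd.DataFrame(
    [[0.01, 0.12, 0, ['said', 'this']],
     [0.12, 0.22, 1, ['said', 'that']],
     [0.22, 0.56, 0, ['said', 'something', 'else']]],
    columns=['from', 'to', 'speaker', 'transcript'])

def format_turn(start, end, speaker, word_list):
    """Render one turn as 'Speaker N: words [start - end]'."""
    text = ' '.join(word_list)
    return f"Speaker {int(speaker)}: {text} [{start:.2f} - {end:.2f}]"

# zip over the columns avoids attribute access, which would fail for the 'from' column.
lines = [format_turn(s, e, spk, w)
         for s, e, spk, w in zip(Transcript['from'], Transcript['to'],
                                 Transcript['speaker'], Transcript['transcript'])]
print('\n'.join(lines))
```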
