Problem description
I've received a data dump from a library catalogue, which came out in .txt format. I've been able to get the data into a spreadsheet, but it is all in one column. I would like to transpose the rows into columns.
The data is in this one column in the following order: Title, Document Type, Author, Date.
But in some cases, the catalogue records appear in the order: Title, Document Type, Synopsis, Author, Date.
Therefore I cannot transpose these records into columns based on the number of rows.
Each title has the word "Description" ahead of it. This is the one regular feature throughout the entire dataset.
Is there a way to use OpenRefine to transpose rows into columns based on the text in a column? Specifically, can I transpose the x rows that follow a row containing "Description", up to the next instance of the word "Description"?
Recommended answer
The approach I'd suggest is to group your rows into OpenRefine 'records' - I'd approach this as follows:
- Import the data into OpenRefine as it is
- Write a 'custom text facet' with the GREL
value.startsWith("Description")
- Select the rows for which this facet shows 'true' - this should give you all the rows containing titles
- Still with this facet choice applied, use 'add column based on this column' to add a new column which contains just the titles
- Move this new column to the start (left hand) of your project
- Switch to 'Records' mode
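For anyone who wants to see the grouping logic outside OpenRefine, here is a minimal Python sketch of what the steps above achieve. It assumes, as the GREL facet does, that every title row starts with "Description". The function name `group_records` and the sample rows are made up for illustration; they are not taken from the actual data.

```python
def group_records(rows, marker="Description"):
    """Start a new record at every row that begins with `marker`."""
    records = []
    for row in rows:
        if row.startswith(marker):
            records.append([row])      # a marker row opens a new record
        elif records:
            records[-1].append(row)    # other rows belong to the open record
        # rows before the first marker are ignored in this sketch
    return records


# Hypothetical sample: one record without a synopsis, one with.
sample_rows = [
    "Description Title A", "Book", "Author A", "2001",
    "Description Title B", "Report", "A short synopsis", "Author B", "2005",
]
records = group_records(sample_rows)
```

Each inner list corresponds to one OpenRefine record, so records with a synopsis hold five values and the rest hold four.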
You should now see that you have a single Record for each set of rows which relate to the same title. You can now use the option to "Join multi-valued cells" to get the title, document type, synopsis (if it exists), author, and date into a single cell.
Now use 'split into several columns' to split the values across columns
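The join and split steps can be sketched in Python as below. The `|` separator and the sample record are assumptions for illustration only. Whatever separator you pick should not occur in the data itself, otherwise the split will produce extra columns.

```python
SEPARATOR = "|"  # assumed separator; choose one that never appears in the data

# Hypothetical record: the rows belonging to one title.
record = ["Description Title B", "Report", "A short synopsis", "Author B", "2005"]

joined = SEPARATOR.join(record)    # like "Join multi-valued cells"
columns = joined.split(SEPARATOR)  # like "Split into several columns"
```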
You should now have one row per title. You'll still have a little work to do, as the data in rows where there is a synopsis will be shifted across by one compared to the rows where there is no synopsis. To fix this I'd suggest a 'facet by blank' on the last column - the non-synopsis rows should be empty in the last column, as they have one fewer piece of data.
You can then use transformations to shift the values across columns one by one (starting at the empty column, otherwise you'll overwrite data).
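To show why the order of the shifts matters, here is a small Python sketch of the same idea. It assumes five columns after the split, with the synopsis belonging in the third one. The function name `shift_right_from` and the sample row are hypothetical.

```python
def shift_right_from(row, start):
    """Shift the values from index `start` onwards one column to the right.

    The last column is assumed to be empty. Copying begins at that empty
    column and works leftwards, so no value is overwritten before it has
    been copied.
    """
    row = list(row)
    for i in range(len(row) - 1, start, -1):
        row[i] = row[i - 1]
    row[start] = ""  # the gap left behind is the missing synopsis
    return row


# Hypothetical row without a synopsis: author and date sit one column too far left.
row = ["Description Title A", "Book", "Author A", "2001", ""]
shifted = shift_right_from(row, 2)
```

Working from the left instead would copy the author over the date before the date had been moved, which is the overwriting problem mentioned above.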
Hope that all makes sense. If you post some example data as Ettore suggests, then I could do a screencast to illustrate.
Owen