问题描述
GAS 非常强大,您可以使用 Google Sheet 作为数据库后端编写一个完整的网络应用程序.有很多理由不 这样做,但我认为在某些情况下是可以的.
GAS is quite powerful and you could write a full fledged web-app using a Google Sheet as the DB back-end. There are many reasons not to do this but I figure in some cases it is okay.
我认为在具有大量行的工作表中根据某些条件查找行时,最大的问题将是性能问题.我知道有很多方法可以查询"一张纸,但我找不到关于哪个最快的可靠信息.
I think the biggest issue will be performance issues when looking for rows based on some criteria in a sheet with a lot of rows. I know there are many ways to "query" a sheet but I can't find reliable information on which is the fastest.
其中一个复杂性是许多人都可以编辑工作表,这意味着您必须考虑多种情况.为了简单起见,我想假设工作表:
One of the complexities is that many people can edit a sheet which means there are a variable number of situations you'd have to account for. For the sake of simplicity, I want to assume the sheet:
- 已锁定,因此只有一个人可以看到它
- 第一列有行号 (
=row()
)
最基本的查询是查找特定列等于某个值的行.
The most basic query is finding a row where a specific column equals some value.
哪种方法最快?
推荐答案
我有一个包含 ~19k 行和 ~38 列的工作表,其中填充了各种未排序的现实世界数据.那是几乎 700k 行,所以我认为这是一个很好的表格,可以对几种方法进行计时并查看哪种方法最快.
I have a sheet with ~19k rows and ~38 columns, filled with all sorts of unsorted real-world data. That is almost 700k rows so I figured it would be a good sheet to time a few methods and see which is the fastest.
- 方法 1:将工作表作为二维数组获取,然后遍历每一行
- 方法二:将sheet作为一个二维数组,排序,然后使用二分查找算法找到行
- 方法 3:对 Google 可视化查询进行
UrlFetch
调用 并且不要提供最后一行 - 方法 4:对 Google 可视化查询进行
UrlFetch
调用 并提供最后一行
这是我的查询函数.
function method1(spreadsheetID, sheetName, columnIndex, query)
{
// get the sheet values excluding header,
var rowValues = SpreadsheetApp.openById(spreadsheetID).getSheetByName(sheetName).getSheetValues(2, 1, -1, -1);
// loop through each row
for(var i = 0, numRows = rowValues.length; i < numRows; ++i)
{
// return it if found
if(rowValues[i][columnIndex] == query) return rowValues[i]
}
return false;
}
function method2(spreadsheetID, sheetName, columnIndex, query)
{
// get the sheet values excluding header
var rowValues = SpreadsheetApp.openById(spreadsheetID).getSheetByName(sheetName).getSheetValues(2, 1, -1, -1);
// sort it
rowValues.sort(function(a, b){
if(a[columnIndex] < b[columnIndex]) return -1;
if(a[columnIndex] > b[columnIndex]) return 1;
return 0;
});
// search using binary search
var foundRow = matrixBinarySearch(rowValues, columnIndex, query, 0, rowValues.length - 1);
// return if found
if(foundRow != -1)
{
return rowValues[foundRow];
}
return false;
}
function method3(spreadsheetID, sheetName, queryColumnLetterStart, queryColumnLetterEnd, queryColumnLetterSearch, query)
{
// SQL like query
myQuery = "SELECT * WHERE " + queryColumnLetterSearch + " = '" + query + "'";
// the query URL
// don't provide last row in range selection
var qvizURL = 'https://docs.google.com/spreadsheets/d/' + spreadsheetID + '/gviz/tq?tqx=out:json&headers=1&sheet=' + sheetName + '&range=' + queryColumnLetterStart + ":" + queryColumnLetterEnd + '&tq=' + encodeURIComponent(myQuery);
// fetch the data
var ret = UrlFetchApp.fetch(qvizURL, {headers: {Authorization: 'Bearer ' + ScriptApp.getOAuthToken()}}).getContentText();
// remove some crap from the return string
return JSON.parse(ret.replace("/*O_o*/", "").replace("google.visualization.Query.setResponse(", "").slice(0, -2));
}
function method4(spreadsheetID, sheetName, queryColumnLetterStart, queryColumnLetterEnd, queryColumnLetterSearch, query)
{
// find the last row in the sheet
var lastRow = SpreadsheetApp.openById(spreadsheetID).getSheetByName(sheetName).getLastRow();
// SQL like query
myQuery = "SELECT * WHERE " + queryColumnLetterSearch + " = '" + query + "'";
// the query URL
var qvizURL = 'https://docs.google.com/spreadsheets/d/' + spreadsheetID + '/gviz/tq?tqx=out:json&headers=1&sheet=' + sheetName + '&range=' + queryColumnLetterStart + "1:" + queryColumnLetterEnd + lastRow + '&tq=' + encodeURIComponent(myQuery);
// fetch the data
var ret = UrlFetchApp.fetch(qvizURL, {headers: {Authorization: 'Bearer ' + ScriptApp.getOAuthToken()}}).getContentText();
// remove some crap from the return string
return JSON.parse(ret.replace("/*O_o*/", "").replace("google.visualization.Query.setResponse(", "").slice(0, -2));
}
我的二分查找算法:
function matrixBinarySearch(matrix, columnIndex, query, firstIndex, lastIndex)
{
// find the value using binary search
// https://www.w3resource.com/javascript-exercises/javascript-array-exercise-18.php
// first make sure the query string is valid
// if it is less than the smallest value
// or larger than the largest value
// it is not valid
if(query < matrix[firstIndex][columnIndex] || query > matrix[lastIndex][columnIndex]) return -1;
// if its the first row
if(query == matrix[firstIndex][columnIndex]) return firstIndex;
// if its the last row
if(query == matrix[lastIndex][columnIndex]) return lastIndex;
// now start doing binary search
var middleIndex = Math.floor((lastIndex + firstIndex)/2);
while(matrix[middleIndex][columnIndex] != query && firstIndex < lastIndex)
{
if(query < matrix[middleIndex][columnIndex])
{
lastIndex = middleIndex - 1;
}
else if(query > matrix[middleIndex][columnIndex])
{
firstIndex = middleIndex + 1;
}
middleIndex = Math.floor((lastIndex + firstIndex)/2);
}
return matrix[middleIndex][columnIndex] == query ? middleIndex : -1;
}
这是我用来测试它们的函数:
This is the function I used to test them all:
// each time this function is called it will try one method
// the first time it is called it will try method1
// then method2, then method3, then method4
// after it does method4 it will start back at method1
// we will use script properties to save which method is next
// we also want to use the same query string for each batch so we'll save that in script properties too
function testIt()
{
// get the sheet where we're staving run times
var runTimesSheet = SpreadsheetApp.openById("...").getSheetByName("times");
// we want to see true speed tests and don't want server side caching so we a copy of our data sheet
// make a copy of our data sheet and get its ID
var tempSheetID = SpreadsheetApp.openById("...").copy("temp sheet").getId();
// get script properties
var scriptProperties = PropertiesService.getScriptProperties();
// the counter
var searchCounter = Number(scriptProperties.getProperty("searchCounter"));
// index of search list we want to query for
var searchListIndex = Number(scriptProperties.getProperty("searchListIndex"));
// if we're at 0 then we need to get the index of the query string
if(searchCounter == 0)
{
searchListIndex = Math.floor(Math.random() * searchList.length);
scriptProperties.setProperty("searchListIndex", searchListIndex);
}
// query string
var query = searchList[searchListIndex];
// save relevant data
var timerRow = ["method" + (searchCounter + 1), searchListIndex, query, 0, "", "", "", ""];
// run the appropriate method
switch(searchCounter)
{
case 0:
// start time
var start = (new Date()).getTime();
// run the query
var ret = method1(tempSheetID, "Extract", 1, query);
// end time
timerRow[3] = ((new Date()).getTime() - start) / 1000;
// if we found the row save its values in the timer output so we can confirm it was found
if(ret)
{
timerRow[4] = ret[0];
timerRow[5] = ret[1];
timerRow[6] = ret[2];
timerRow[7] = ret[3];
}
break;
case 1:
var start = (new Date()).getTime();
var ret = method2(tempSheetID, "Extract", 1, query);
timerRow[3] = ((new Date()).getTime() - start) / 1000;
if(ret)
{
timerRow[4] = ret[0];
timerRow[5] = ret[1];
timerRow[6] = ret[2];
timerRow[7] = ret[3];
}
break;
case 2:
var start = (new Date()).getTime();
var ret = method3(tempSheetID, "Extract", "A", "AL", "B", query);
timerRow[3] = ((new Date()).getTime() - start) / 1000;
if(ret.table.rows.length)
{
timerRow[4] = ret.table.rows[0].c[0].v;
timerRow[5] = ret.table.rows[0].c[1].v;
timerRow[6] = ret.table.rows[0].c[2].v;
timerRow[7] = ret.table.rows[0].c[3].v;
}
break;
case 3:
var start = (new Date()).getTime();
var ret = method3(tempSheetID, "Extract", "A", "AL", "B", query);
timerRow[3] = ((new Date()).getTime() - start) / 1000;
if(ret.table.rows.length)
{
timerRow[4] = ret.table.rows[0].c[0].v;
timerRow[5] = ret.table.rows[0].c[1].v;
timerRow[6] = ret.table.rows[0].c[2].v;
timerRow[7] = ret.table.rows[0].c[3].v;
}
break;
}
// delete the temp file
DriveApp.getFileById(tempSheetID).setTrashed(true);
// save run times
runTimesSheet.appendRow(timerRow);
// start back at 0 if we're the end
if(++searchCounter == 4) searchCounter = 0;
// save the search counter
scriptProperties.setProperty("searchCounter", searchCounter);
}
我有一个全局变量 searchList
,它是各种查询字符串的数组——有些在工作表中,有些不在.
I have a global variable searchList
that is an array of various query strings -- some are in the sheet, some are not.
我在触发器上运行 testit
每分钟运行一次.在 152 次迭代之后,我有 38 个批次.查看结果,这是我对每种方法看到的:
I ran testit
on a trigger to run every minute. After 152 iterations I had 38 batches. Looking at the result, this is what I see for each method:
| Method | Minimum Seconds | Maximum Seconds | Average Seconds |
|---------|-----------------|-----------------|-----------------|
| method1 | 8.24 | 36.94 | 11.86 |
| method2 | 9.93 | 23.38 | 14.09 |
| method3 | 1.92 | 5.48 | 3.06 |
| method4 | 2.20 | 11.14 | 3.36 |
所以看来,至少对于我的数据集,正在使用 Google 可视化查询 是最快的.
So it appears that, at least for my data-set, is using Google visualization query is the fastest.
这篇关于使用/在 Google Apps 脚本中在大型 Google 表格中搜索行的最快方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!