Getting random words from a text file


Problem description

I have a text file which contains a list of words in a precise order. I'm trying to create a function that returns an array of words from this file. I managed to retrieve the words in the same order as the file like this:

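#include <stdio.h>
#include <stdlib.h>

#define MAX_WORDS 100

/* Reads up to MAX_WORDS whitespace-separated words from fileName, in file order.
   Returns a heap-allocated array (unused slots are NULL), or NULL on failure. */
char **readDict(const char *fileName) {
    int i;
    char **lines;
    FILE *pf = fopen(fileName, "r");

    if (pf == NULL) {
        printf("Unable to open the file");
        return NULL;
    }

    lines = calloc(MAX_WORDS, sizeof(char *));
    if (lines == NULL) {
        fclose(pf);
        return NULL;
    }

    for (i = 0; i < MAX_WORDS; i++) {
        lines[i] = malloc(128);
        /* Stop at end of file; %127s keeps the word inside the 128-byte buffer. */
        if (lines[i] == NULL || fscanf(pf, "%127s", lines[i]) != 1) {
            free(lines[i]);
            lines[i] = NULL;
            break;
        }
        printf("%d: %s\n", i, lines[i]);
    }

    fclose(pf);
    return lines;
}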

My question is: how can I return an array of random words from the text file, rather than the words in file order?

The file looks like this:

exemple1
exemple2
exemple3
exemple4

Recommended answer

Reservoir sampling allows you to select a fixed number of elements uniformly at random from a stream of indeterminate size. Something like this could work (although untested):

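#include <limits.h>   /* LINE_MAX (POSIX) */
#include <stdio.h>
#include <stdlib.h>   /* calloc, free, random (POSIX) */
#include <string.h>   /* strdup (POSIX) */

/* Returns an array of count pointers holding a random sample of the lines in
   filename; slots stay NULL if the file has fewer than count lines.
   Each line keeps its trailing newline, as read by fgets.
   Returns NULL on error. The caller frees each line and the array. */
char **reservoir_sample(const char *filename, int count) {
    FILE *file;
    char **lines;
    char buf[LINE_MAX];
    int i, n;

    file = fopen(filename, "r");
    if (file == NULL)
        return NULL;
    lines = calloc(count, sizeof(char *));
    if (lines == NULL) {
        fclose(file);
        return NULL;
    }
    for (n = 1; fgets(buf, sizeof buf, file); n++) {
        if (n <= count) {
            /* Fill the reservoir with the first count lines. */
            lines[n - 1] = strdup(buf);
        } else {
            /* Keep line n with probability count / n. */
            i = random() % n;
            if (i < count) {
                free(lines[i]);
                lines[i] = strdup(buf);
            }
        }
    }
    fclose(file);

    return lines;
}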

This is "Algorithm R":

  • Read the first count lines into the sample array.
  • For each subsequent line, replace a random element of the sample array with probability count / n, where n is the line number.
  • At the end, the sample contains a set of random lines. (The order is not uniformly random, but you can fix that with a shuffle.)
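The shuffle mentioned in the last step could look something like the sketch below. It is a Fisher-Yates shuffle over an array of string pointers; the name shuffle_words is hypothetical (it is not part of the answer above), and it uses the standard rand() rather than random(), with a modulo that introduces a slight bias for simplicity.

```c
#include <stdlib.h>  /* rand */

/* Hypothetical helper, not from the original answer: shuffles the first n
   pointers of words in place using the Fisher-Yates algorithm. */
void shuffle_words(char **words, int n) {
    int i, j;
    char *tmp;

    for (i = n - 1; i > 0; i--) {
        /* Pick j in [0, i]; rand() % (i + 1) is slightly biased,
           which is assumed acceptable for this sketch. */
        j = rand() % (i + 1);

        /* Swap the pointers; the strings themselves are not copied. */
        tmp = words[i];
        words[i] = words[j];
        words[j] = tmp;
    }
}
```

Assuming the sample array came from reservoir_sample(filename, count), calling shuffle_words(lines, count) afterwards should randomize the order of the sampled lines. Only pointers are swapped, so any NULL slots left by a short file are moved around like the other entries. Seeding with srand() beforehand is needed if different runs should give different orders.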


09-05 17:31