Problem description
So I have a vector which has three numbers: 65, 66, and 67. I am converting these numbers from int to binary and appending them to a string. The string becomes 100000110000101000011 (65, 66, 67 respectively). I am writing this data to a file through the dynamic_bitset library. I have a BitOperations class which does the work of reading from and writing to the file. When I read the data back from the file, instead of the bits above it gives me these bits: 001100010100001000001.
Here is my BitOperations class:
#include <iostream>
#include <boost/dynamic_bitset.hpp>
#include <fstream>
#include <streambuf>
#include "Utility.h"
using namespace std;
using namespace boost;
template <typename T>
class BitOperations {
private:
T data;
int size;
dynamic_bitset<unsigned char> Bits;
string fName;
int bitSize;
public:
BitOperations(dynamic_bitset<unsigned char> b){
Bits = b;
size = b.size();
}
BitOperations(dynamic_bitset<unsigned char> b, string fName){
Bits = b;
this->fName = fName;
size = b.size();
}
BitOperations(T data, string fName, int bitSize){
this->data = data;
this->fName = fName;
this->bitSize = bitSize;
}
BitOperations(int bitSize, string fName){
this->bitSize = bitSize;
this->fName = fName;
}
void writeToFile(){
if (data != ""){
vector<int> bitTemp = extractIntegersFromBin(data);
for (int i = 0; i < bitTemp.size(); i++){
Bits.push_back(bitTemp[i]);
}
}
ofstream output(fName, ios::binary| ios::app);
ostream_iterator<char> osit(output);
to_block_range(Bits, osit);
cout << "File Successfully modified" << endl;
}
dynamic_bitset<unsigned char> readFromFile(){
ifstream input(fName);
stringstream strStream;
strStream << input.rdbuf();
T str = strStream.str();
dynamic_bitset<unsigned char> b;
for (int i = 0; i < str.length(); i++){
for (int j = 0; j < bitSize; ++j){
bool isSet = str[i] & (1 << j);
b.push_back(isSet);
}
}
return b;
}
};
And here is the code which calls these operations:
#include <iostream>
// #include <string.h>
#include <boost/dynamic_bitset.hpp>
#include "Utility/BitOps.h"
int main(){
vector<int> v;
v.push_back(65);
v.push_back(66);
v.push_back(67);
stringstream ss;
string st;
for (int i = 0; i < v.size(); i++){
ss = toBinary(v[i]);
st += ss.str().c_str();
cout << i << " )" << st << endl;
}
// reverse(st.begin(), st.end());
cout << "Original: " << st << endl;
BitOperations<string> b(st, "bits2.bin", 7);
b.writeToFile();
BitOperations<string>c(7, "bits2.bin");
boost::dynamic_bitset<unsigned char> bits;
bits = c.readFromFile();
string s;
// for (int i = 0; i < 16; i++){
to_string(bits, s);
// reverse(s.begin(), s.end());
// }
cout << "Decompressed: " << s << endl;
}
What am I doing wrong which results in incorrect behaviour?
Here is the extractIntegersFromBin(string s) function.
vector<int> extractIntegersFromBin(string s){
char tmp;
vector<int> nums;
for (int i = 0; s[i]; i++ ){
nums.push_back(s[i] - '0');
}
return nums;
}
Edit 2: Here is the code for toBinary:
stringstream toBinary(int n){
vector<int> bin, bin2;
int i = 0;
while (n > 0){
bin.push_back(n % 2);
n /= 2;
i++;
}
// for (int j = i-1; j >= 0; j--){
// bin2.push_back(bin[j]);
// }
reverse(bin.begin(), bin.end());
stringstream s;
for (int i = 0; i < bin.size(); i++){
s << bin[i];
}
return s;
}
Recommended answer
You are facing two different issues:
The boost function to_block_range will pad the output to a multiple of the internal block size, by appending zeros at the end. In your case, the internal block size is sizeof(unsigned char)*8 == 8. So if the bit sequence you write to the file in writeToFile is not a multiple of 8, additional 0s will be written to make its length a multiple of 8. So if you read the bit sequence back in with readFromFile, you have to find some way to remove the padding bits again.
There is no standard way for how to represent a bit sequence (reference). Depending on the scenario, it might be more convenient to represent the bits left-to-right or right-to-left (or some completely different order). For this reason, when you use different code pieces to print the same bit sequence and you want these code pieces to print the same result, you have to make sure that these code pieces agree on how to represent the bit sequence. If one piece of code prints left-to-right and the other right-to-left, you will get different results.
Let's discuss each issue individually:
Regarding issue 1
I understand that you want to define your own block size with the bitSize variable, on top of the internal block size of boost::dynamic_bitset. For example, in your main method, you construct BitOperations<string> c(7, "bits2.bin");. I understand that to mean that you expect the bit sequence stored in the file to have a length that is some multiple of 7.
If this understanding is correct, you can remove the padding bits that have been inserted by to_block_range by reading the file size and then rounding it (in bits) down to the nearest multiple of your block size. Note, though, that you currently do not enforce this contract in the BitOperations constructor or in writeToFile (e.g. by ensuring that the data size is a multiple of 7).
In your readFromFile method, first note that the inner loop incorrectly takes bitSize into account. So if bitSize is 7, the loop only considers the lowest 7 bits of each byte. The blocks written by to_block_range, however, use all 8 bits of each 1-byte block, since boost::dynamic_bitset does not know anything about your 7-bit block size. So the most significant bit of every byte is never read, and you miss some bits.
Here is one example for how to fix your code:
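// Payload bits: file size in bits, rounded down to a multiple of bitSize
size_t bitCount = (str.length()*8) / bitSize * bitSize;
size_t bitsPerByte = 8;
for (size_t i = 0; i < bitCount; i++) {
    size_t index = (i / bitsPerByte);   // byte (block) that holds bit i
    size_t offset = (i % bitsPerByte);  // position of bit i inside that byte, LSB first
    bool isSet = (str[index] & (1 << offset));
    b.push_back(isSet);
}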
This example first calculates how many bits should be read in total, by rounding the file size (in bits) down to the nearest multiple of your block size. It then iterates over the full bytes in the input (i.e. the internal blocks that were written by boost::dynamic_bitset) until the targeted number of bits has been read. The remaining padding bits are discarded.
An alternative method would be to use boost::from_block_range. This allows you to get rid of some boilerplate code (i.e. reading the input into a string buffer):
dynamic_bitset<unsigned char> readFromFile() {
    // Open in binary mode: the file contains raw bytes, not text
    ifstream input{fName, ios::binary};
    // Get file size
    input.seekg(0, ios_base::end);
    streamoff fileSize = input.tellg();
    // TODO Handle error: fileSize < 0
    // Reset to beginning of file
    input.clear();
    input.seekg(0);
    // Create bitset with desired size
    size_t bitsPerByte = 8;
    size_t bitCount = (static_cast<size_t>(fileSize) * bitsPerByte) / bitSize * bitSize;
    dynamic_bitset<unsigned char> b{bitCount};
    // TODO Handle error: fileSize != b.num_blocks() * b.bits_per_block / bitsPerByte
    // Read file into bitset. istreambuf_iterator yields every byte unchanged;
    // istream_iterator<char> would skip bytes that look like whitespace.
    istreambuf_iterator<char> iter{input};
    boost::from_block_range(iter, {}, b);
    return b;
}
Regarding issue 2
Once you have solved issue 1, the boost::dynamic_bitset that is written to the file by writeToFile will be the same as the one read by readFromFile. If you print both with the same method, the output will match. However, if you use different methods for printing, and these methods do not agree on the order in which to print the bits, you will get different results.
For example, in the output of your program you can now see that the "Original:" output is the same as "Decompressed:", except in reverse order:
Original: 100000110000101000011
...
Decompressed: 110000101000011000001
Again, this does not mean that readFromFile is working incorrectly, only that you are using different ways of printing the bit sequences.
The output for Original: is obtained by directly printing the 0/1 input string in main from left to right. In writeToFile, this string is then decomposed in the same order with extractIntegersFromBin and each bit is passed to the push_back method of boost::dynamic_bitset. The push_back method appends to the end of the bit sequence, meaning it will interpret each bit you pass as more significant than the previous one (reference):
Effects: Increases the size of the bitset by one, and sets the value of the new most-significant bit to value.
Therefore, your input string is interpreted such that the first bit in the input string is the least significant bit (i.e. the "first" bit of the sequence), and the last bit of the input string is the most significant bit (i.e. the "last" bit of the sequence).
Whereas you construct the output for "Decompressed:" with to_string. From the documentation of this method, we can see that the least-significant bit of the bit sequence will be the last bit of the output string (reference):
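Effects: Copies a representation of b into the string s. A character in the string is '1' if the corresponding bit is set, and '0' if it is not. Character position i in the string corresponds to bit position b.size() - 1 - i.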
So the problem is simply that to_string (by design) prints in the opposite order to the one in which you print the input string manually. To fix this, you have to reverse one of them, e.g. by printing the input string by iterating over it in reverse order, or by reversing the output of to_string.