Protocol Buffers (a.k.a., protobuf) are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data.
syntax = "proto2";

message TCPLog {
  // Field numbers were missing in the source; 1-3 are assumed here.
  optional int32 total_byteps = 1;
  optional int64 flow_start_time = 2;
  optional int64 date = 3;
}
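The Java classes can be generated either with an Eclipse plugin or directly from the command line:
protoc.exe -I=<proto input directory> --java_out=<Java class output directory> <proto input directory>/<proto file>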
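As a rough sketch of the command-line route, the same invocation could be assembled from Java and handed to `ProcessBuilder`. The class name `ProtocCommand` and the paths `src/proto`, `src/java`, and `TCPLog.proto` are hypothetical placeholders, and the sketch only builds and prints the argument list; actually running it assumes `protoc` is installed and on the PATH.

```java
import java.util.Arrays;
import java.util.List;

final class ProtocCommand {
    // Builds the argument list for:
    //   protoc -I=<protoDir> --java_out=<javaOutDir> <protoDir>/<protoFile>
    static List<String> build(String protoDir, String javaOutDir, String protoFile) {
        return Arrays.asList(
                "protoc",
                "-I=" + protoDir,
                "--java_out=" + javaOutDir,
                protoDir + "/" + protoFile);
    }

    public static void main(String[] args) {
        // Hypothetical paths, for illustration only.
        List<String> command = build("src/proto", "src/java", "TCPLog.proto");
        System.out.println(String.join(" ", command));
        // To actually invoke the compiler (requires protoc on the PATH):
        // int exitCode = new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```

Keeping the command as a list of separate arguments avoids shell-quoting problems when a directory name contains spaces.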
import java.io.File;
import java.io.FileOutputStream;

public class ProtoTest3 {
    /**
     * @param args
     * @throws Exception
     * @author qiang(upupgo)
     */
    public static void main(String[] args) throws Exception {
        TCPLogOuterClass.TCPLog.Builder builder = TCPLogOuterClass.TCPLog.newBuilder();
        TCPLogOuterClass.TCPLog tcpLog = builder.build();
        // Write the serialized message to disk.
        try (FileOutputStream out = new FileOutputStream(new File("D:/pb"))) {
            tcpLog.writeTo(out);
        }
        // Parse the message back from its byte representation.
        TCPLogOuterClass.TCPLog tcp = TCPLogOuterClass.TCPLog.parseFrom(tcpLog.toByteArray());
    }
}
Apache Avro™ is a data serialization system.
Avro provides:
Rich data structures.
A compact, fast, binary data format.
A container file, to store persistent data.
Remote procedure call (RPC).
Simple integration with dynamic languages. Code generation is not required to read or write data files nor to use or implement RPC protocols. Code generation as an optional optimization, only worth implementing for statically typed languages.
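To make the "compact binary format" point concrete: the Avro specification encodes `int` and `long` values with variable-length zig-zag coding, so small numbers take fewer bytes than a fixed-width field. The class below is a hand-written, JDK-only illustration of that encoding, not the Avro library's own encoder; the names `ZigZagVarint` and `encode` are made up for this sketch.

```java
import java.io.ByteArrayOutputStream;

final class ZigZagVarint {
    // Zig-zag maps signed values to unsigned ones (0, -1, 1, -2 -> 0, 1, 2, 3),
    // then the result is written 7 bits at a time, low-order group first.
    static byte[] encode(long value) {
        long z = (value << 1) ^ (value >> 63);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // high bit set: more bytes follow
            z >>>= 7;
        }
        out.write((int) z); // final byte, high bit clear
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Sample values taken from the TCPLog example in this post.
        System.out.println(encode(1024).length);        // prints 2
        System.out.println(encode(1502415717L).length); // prints 5
    }
}
```

Under this scheme the sample timestamp takes 5 bytes instead of the 8 a fixed-width long would need.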
{"namespace": "example.avro",
 "type": "record",
 "name": "TCPLog",
 "fields": [
     {"name": "total_byteps", "type": "int"},
     {"name": "flow_start_time", "type": "long"},
     {"name": "date", "type": "long"}
 ]
}
package avro;

import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;

/**
 * @date 2017-08-13 22:15:32
 * @author qiang(upupgo)
 */
public class AvroTest2 {
    public static void main(String[] args) throws Exception {
        String filePath = "D:/TCPLog.avsc";
        Schema schema = new Schema.Parser().parse(new File(filePath));

        GenericRecord tcpLog = new GenericData.Record(schema);
        tcpLog.put("total_byteps", 1024);
        tcpLog.put("flow_start_time", 1502415717L);
        tcpLog.put("date", 1502415717L);
        System.out.println(tcpLog);

        // Serialize tcpLog to disk
        File file = new File("D:/avro");
        DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
        DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter);
        dataFileWriter.create(schema, file);
        long timestart = System.currentTimeMillis();
        // Assumed reconstruction: append 1,000,000 records, the test size cited in the conclusion.
        for (int i = 0; i < 1000000; i++) {
            dataFileWriter.append(tcpLog);
        }
        dataFileWriter.close();
        long timeend = System.currentTimeMillis();
        System.out.println(timeend - timestart);

        // Deserialize the records from disk
        DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>(schema);
        DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(new File("D:/avro"), datumReader);
        GenericRecord tcpLogs = null;
        long timestart1 = System.currentTimeMillis();
        while (dataFileReader.hasNext()) {
            // Reuse the record object by passing it to next(). This saves us from
            // allocating and garbage collecting many objects for files with
            // many items.
            tcpLogs = dataFileReader.next(tcpLogs);
            // System.out.println("xx"+tcpLogs);
        }
        dataFileReader.close();
        long timeend1 = System.currentTimeMillis();
        System.out.println(timeend1 - timestart1);
    }
}
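The following conclusions come from comparing serialization of 1,000,000 tcpLog records:
The comparison shows that Avro performs slightly better than protobuf and also supports dynamic use (no code generation required), so it can be given priority when choosing a technology.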
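The timing pattern behind a comparison like this can be sketched with a small harness. This is a minimal JDK-only illustration, not the author's test code: `TimingSketch`, `timeMillis`, and the counter workload are hypothetical, and a real run would put one protobuf or Avro serialization call inside the task.

```java
final class TimingSketch {
    // Runs the task the given number of times and returns the elapsed
    // wall-clock time in milliseconds.
    static long timeMillis(int iterations, Runnable task) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int[] counter = {0};
        // Placeholder workload; swap in one serialization call per iteration.
        long elapsed = timeMillis(1_000_000, () -> counter[0]++);
        System.out.println(counter[0] + " iterations in " + elapsed + " ms");
    }
}
```

Single-pass timings like this are sensitive to JIT warm-up, so repeating the run a few times and discarding the first result gives a fairer comparison.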
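Google protobuf:
Pros:
Binary messages with good performance and efficiency (both space and time efficiency are quite good)
Integrated with frameworks such as Netty
Cons:
Official language bindings were originally limited to C++, Java, and Python
Covers only serialization and deserialization, with no RPC functionality (similar to an XML or JSON parser)

Apache Thrift:
Applications:
Facebook's open-source log collection system (scribe: https://github.com/facebook/scribe)
Taobao's real-time data transfer platform (TimeTunnel http://code.taobao.org/p/TimeTunnel/wiki/index)
HBase ( http://abloz.com/hbase/book.html#thrift )
…
Pros:
Supports a very large number of language bindings
Supports both synchronous and asynchronous communication
Cons:
Like protobuf, it does not support dynamic features

Apache Avro:
Applications:
Hadoop RPC (http://hadoop.apache.org/#What+Is+Apache+Hadoop%3F)
Pros:
Binary messages with good performance and efficiency
Provides both Jetty-based and Netty-based services
Cons:
Supports only Avro's own serialization format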