Problem description
I want to be able to generate C code dynamically and re-load it quickly into my running C program.
I am on Linux; how could this be done?
Can a library .so file on Linux be re-compiled and reloaded at runtime?
Could it be compiled without producing a .so file? Could the compiled output somehow go to memory and then be reloaded? I want to reload the compiled code quickly.
Recommended answer
What you want to do is reasonable, and I am doing exactly that in MELT (a high-level domain-specific language to extend GCC; MELT is compiled to C through a translator itself written in MELT).
First, when generating C code (or code in many other source languages), a good piece of advice is to keep some sort of abstract syntax tree (AST) in memory. So first build the entire AST of the generated C code, then emit it as C syntax. Don't design your code generation framework without an explicit AST (in other words, generating C code with a bunch of printf calls is a maintenance nightmare; you want some intermediate representation).
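A minimal sketch of the build-an-AST-then-emit idea, for integer expressions only. The node kinds and the functions mk_const, mk_param, mk_binop, emit_expr, emit_function and free_tree are hypothetical names chosen for illustration; a real generator such as MELT needs a far richer representation (statements, declarations, types).

```c
#include <stdio.h>
#include <stdlib.h>

/* A deliberately tiny, hypothetical AST for integer expressions. */
enum node_kind { NODE_CONST, NODE_PARAM, NODE_ADD, NODE_MUL };

struct node {
    enum node_kind kind;
    long value;             /* used by NODE_CONST */
    const char *name;       /* used by NODE_PARAM */
    struct node *lhs, *rhs; /* used by NODE_ADD and NODE_MUL */
};

static struct node *new_node(enum node_kind kind)
{
    struct node *n = calloc(1, sizeof *n);
    if (!n) {
        perror("calloc");
        exit(EXIT_FAILURE);
    }
    n->kind = kind;
    return n;
}

struct node *mk_const(long v)
{
    struct node *n = new_node(NODE_CONST);
    n->value = v;
    return n;
}

struct node *mk_param(const char *name)
{
    struct node *n = new_node(NODE_PARAM);
    n->name = name;
    return n;
}

struct node *mk_binop(enum node_kind kind, struct node *l, struct node *r)
{
    struct node *n = new_node(kind);
    n->lhs = l;
    n->rhs = r;
    return n;
}

void free_tree(struct node *n)
{
    if (!n)
        return;
    free_tree(n->lhs);
    free_tree(n->rhs);
    free(n);
}

/* Print one expression as C syntax. Binary nodes are fully parenthesized,
   so the emitter never has to reason about operator precedence. */
void emit_expr(FILE *out, const struct node *n)
{
    switch (n->kind) {
    case NODE_CONST:
        fprintf(out, "%ldL", n->value);
        break;
    case NODE_PARAM:
        fputs(n->name, out);
        break;
    case NODE_ADD:
    case NODE_MUL:
        fputc('(', out);
        emit_expr(out, n->lhs);
        fputs(n->kind == NODE_ADD ? " + " : " * ", out);
        emit_expr(out, n->rhs);
        fputc(')', out);
        break;
    }
}

/* Wrap an expression AST into a complete C function definition. */
void emit_function(FILE *out, const char *fname, const char *param,
                   const struct node *body)
{
    fprintf(out, "long %s(long %s) {\n  return ", fname, param);
    emit_expr(out, body);
    fputs(";\n}\n", out);
}
```

For example, building ADD(MUL(x, 3), 1) and calling emit_function(out, "f", "x", body) writes a function f whose body is return ((x * 3L) + 1L); with all layout decisions made in one place instead of scattered across printf calls.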
Second, the main reason to generate C code is to take advantage of a good optimizing compiler (another reason is the portability and ubiquity of C). If you don't care about the performance of the generated code (and TCC compiles C very quickly into very naive and slow machine code), you could use some other approaches, e.g. JIT libraries like GNU lightning (very quick generation of slow machine code), GNU libjit or AsmJit (the generated machine code is a bit better), or LLVM or GCCJIT (good machine code, but generation time comparable to a compiler).
So if you generate C code and want it to run quickly, the compilation time of the C code is not negligible (since you would probably fork a gcc -O -fPIC -shared command to make some shared object foo.so out of your generated foo.c). In my experience, generating C code takes much less time than compiling it (with gcc -O). In MELT, the generation of C code is more than 10x faster than its compilation by GCC (and usually 30x faster). But the optimizations done by a C compiler are worth it.
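A sketch of forking that compilation step from the running program, assuming gcc is on the PATH; compile_shared_object is a hypothetical helper name. It uses fork, execlp and waitpid rather than system(), so no shell is involved and the compiler's exit status is checked directly.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run "gcc -O -fPIC -shared -o SO_PATH SRC_PATH" in a
   child process and wait for it. Returns 0 if gcc succeeded, -1 otherwise. */
int compile_shared_object(const char *src_path, const char *so_path)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: become the compiler; arguments are passed without a shell. */
        execlp("gcc", "gcc", "-O", "-fPIC", "-shared",
               "-o", so_path, src_path, (char *)NULL);
        perror("execlp gcc"); /* reached only if exec failed */
        _exit(127);
    }
    /* Parent: wait for the child, retrying if interrupted by a signal. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            perror("waitpid");
            return -1;
        }
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A caller writes its generated source to, say, /tmp/foo.c, calls compile_shared_object("/tmp/foo.c", "/tmp/foo.so"), and only proceeds to dlopen if the result is 0.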
Once you have emitted your C code and forked its compilation into a .so shared object, you can dlopen it. Don't be shy: my manydl.c example demonstrates that on Linux you can dlopen a great many shared objects (many hundreds of thousands). The real bottleneck is the compilation of the generated C code. In practice, you don't really need to dlclose on Linux (unless you are coding a server program that must run for months); an unused shared module can practically stay dlopen-ed, and you are mostly leaking process address space (which is a cheap resource), since most of that unused .so would be swapped out. dlopen is done quickly; what takes time is the compilation of the C source, because you really want the optimization to be done by the C compiler.
You could use many other approaches, e.g. have a bytecode interpreter and generate bytecode for it, or use Common Lisp (e.g. SBCL on Linux, which compiles dynamically to machine code), LuaJIT, Java, MetaOCaml, etc.
As others suggested, you need not care much about the time it takes to write a C file; in practice it will stay in the filesystem cache. And writing it is much faster than compiling it, so keeping it in memory is not worth the trouble. Use a tmpfs if you are concerned about I/O times.
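If you follow the tmpfs suggestion, one option is to give each run its own private directory on that filesystem. make_generation_dir is a hypothetical helper built on POSIX mkdtemp. Which mount point is a tmpfs (often /dev/shm, sometimes /tmp) depends on the system, and a filesystem mounted noexec will not work, because dlopen has to map the .so with execute permission.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: create a fresh private directory under BASE (ideally
   a tmpfs mount that is not noexec) to hold generated*.c and generated*.so.
   Returns a malloc-ed path the caller frees, or NULL on failure. */
char *make_generation_dir(const char *base)
{
    char path[4096];
    int len = snprintf(path, sizeof path, "%s/gencode-XXXXXX", base);
    if (len < 0 || (size_t)len >= sizeof path) {
        fprintf(stderr, "base directory name too long\n");
        return NULL;
    }
    /* mkdtemp replaces the XXXXXX suffix and creates the directory (mode 0700). */
    if (!mkdtemp(path)) {
        perror("mkdtemp");
        return NULL;
    }
    return strdup(path);
}
```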
You asked:
Can a library .so file on Linux be re-compiled and re-loaded at runtime?
Of course yes: you should fork a command to build the library from the generated C code (e.g. gcc -O -fPIC -shared generated.c -o generated.so, but you could do it indirectly, e.g. by running make -j, especially if generated.so is big enough to make it worthwhile to split generated.c into several generated C files!), then dynamically load your library with dlopen (giving it a full path like /some/file/path/to/generated.so, and probably the RTLD_NOW flag), and use dlsym to find the relevant symbols inside. Don't think of re-loading (a second time) the same generated.so: calling dlopen on a shared object that is already loaded just returns the same handle, so the new code would not be picked up. It is better to emit a unique generated1.c C file (then generated2.c, etc.), compile it into a unique generated1.so (generated2.so the second time, etc.), and then dlopen it (this can be done many hundreds of thousands of times). You may want the emitted generated*.c files to contain some constructor functions, which would be executed at dlopen time of the generated*.so.
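Putting these steps together, here is a sketch of the emit-unique-file, compile, dlopen, dlsym cycle, assuming gcc is on the PATH and a writable directory is passed in. generate_and_load and the dynvalue entry point are hypothetical names; the emitted file carries a GCC constructor function that runs while dlopen loads it. For brevity the sketch compiles through system(); a real program would rather fork and exec the compiler and wait for it.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Signature of the entry point every generated file defines (hypothetical). */
typedef int dynvalue_fn(void);

/* Hypothetical helper: emit DIR/generatedN.c with a fresh N, compile it to
   DIR/generatedN.so, dlopen it and return its "dynvalue" function.
   RESULT is baked into the generated code. Returns NULL on failure. */
dynvalue_fn *generate_and_load(const char *dir, int result)
{
    static unsigned serial = 0; /* file names are never reused */
    char c_path[512], so_path[512], cmd[1200];

    serial++;
    snprintf(c_path, sizeof c_path, "%s/generated%u.c", dir, serial);
    snprintf(so_path, sizeof so_path, "%s/generated%u.so", dir, serial);

    FILE *out = fopen(c_path, "w");
    if (!out) {
        perror(c_path);
        return NULL;
    }
    /* The constructor runs while dlopen loads the .so, before dlopen returns. */
    fprintf(out,
            "static int ready;\n"
            "static void __attribute__((constructor)) generated_init(void)\n"
            "{ ready = 1; }\n"
            "int dynvalue(void) { return ready ? %d : -1; }\n",
            result);
    if (fclose(out) != 0) {
        perror(c_path);
        return NULL;
    }

    /* system() keeps the sketch short; a real program would fork/exec gcc. */
    snprintf(cmd, sizeof cmd, "gcc -O -fPIC -shared -o '%s' '%s'", so_path, c_path);
    if (system(cmd) != 0) {
        fprintf(stderr, "compilation failed: %s\n", cmd);
        return NULL;
    }

    /* A new path each time, so dlopen really maps new code. RTLD_NOW makes
       unresolved symbols fail here instead of at the first call. */
    void *handle = dlopen(so_path, RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return NULL;
    }
    /* No dlclose: the module stays loaded for the rest of the process. */
    void *sym = dlsym(handle, "dynvalue");
    if (!sym) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        return NULL;
    }
    return (dynvalue_fn *)sym; /* POSIX requires this conversion to work for dlsym */
}
```

Calling it twice, first with 1 and then with 2, yields two independent function pointers from generated1.so and generated2.so; since neither handle is ever closed, the older code stays callable.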
Your base application program should define a convention about the set of dlsym-ed names (usually functions) and how they are called. It should only call functions in your generated*.so through dlsym-ed function pointers. In practice you would decide, for example, that each generated*.c defines a function void dynfoo(int) and int dynbar(int,int), use dlsym with "dynfoo" and "dynbar", and call these through the function pointers returned by dlsym. You should also define conventions for how and when these dynfoo and dynbar would be called. You'd better link your base application with -rdynamic so that your generated*.c files can call your application's functions.
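A sketch of the host side of such a convention, using the dynfoo and dynbar signatures from the text; struct dyn_module, dyn_module_load and app_note are hypothetical names. Generated code could call app_note by declaring extern void app_note(const char *); in the emitted C, which only resolves at dlopen time if the host was linked with -rdynamic (e.g. gcc -rdynamic host.c -o host; glibc versions before 2.34 also need -ldl).

```c
#include <dlfcn.h>
#include <stdio.h>

/* The convention every generated*.c must honour. */
typedef void dynfoo_t(int);
typedef int dynbar_t(int, int);

struct dyn_module {
    void *handle;
    dynfoo_t *dynfoo;
    dynbar_t *dynbar;
};

/* Application function that generated code may call; it is only visible to
   dlopen-ed modules if this program is linked with -rdynamic. */
void app_note(const char *msg)
{
    fprintf(stderr, "app_note: %s\n", msg);
}

/* Load SO_PATH and fetch both conventional entry points. Returns 0 on success. */
int dyn_module_load(struct dyn_module *m, const char *so_path)
{
    m->handle = dlopen(so_path, RTLD_NOW);
    if (!m->handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }
    m->dynfoo = (dynfoo_t *)dlsym(m->handle, "dynfoo");
    m->dynbar = (dynbar_t *)dlsym(m->handle, "dynbar");
    if (!m->dynfoo || !m->dynbar) {
        fprintf(stderr, "%s does not follow the dynfoo/dynbar convention\n", so_path);
        dlclose(m->handle); /* safe here: nothing from the module is in use yet */
        m->handle = NULL;
        return -1;
    }
    return 0;
}
```

The host then only ever reaches generated code through m.dynfoo and m.dynbar, so the set of names the two sides share stays small and explicit.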
You don't want your generated*.so to redefine existing names. For instance, you don't want to redefine malloc in your generated*.c and expect all heap allocation functions to magically use your new variant (that probably won't work, and even if it did, it would be dangerous).
You probably won't bother to dlclose a dynamically loaded shared object, except at application clean-up and exit time (but I don't bother to dlclose at all). If you do dlclose some dynamically loaded generated*.so file, be sure that nothing in it is still in use: no pointers into it, not even return addresses in call frames, may remain.
P.S. The MELT translator is currently 57 KLOC of MELT code, translated into nearly 1770 KLOC of C code.