每每看到什么内部原理,深入剖析,大全之类的字眼,总是对书的作者崇拜的五体投地,但是随着自己技术阅历的增加,发现自己昔日
心目中的大神的地位开始动摇,发现在偶看来很高深的知识,原来都是官方文档,很多东西都是自描述的,很多东西官网上面都有详细的介绍,
只是自己从来没有遇到过,mysql如此,rabbitmq如此,python也是如此,得出的结论就是要向对技术了解的比较深,一定要重视官方文档,特别
是开源项目~~~~~~~~~~~~~~~~~
layout of source code :
Demo/ Demonstration scripts, modules and programs
Doc/ Documentation sources (reStructuredText)
Grammar/ Input for the parser generator
Include/ Public header files
LICENSE Licensing information
Lib/ Python library modules
Mac/ Macintosh specific resources
Makefile.pre.in Source from which config.status creates the Makefile.pre
Misc/ Miscellaneous useful files
Modules/ Implementation of most built-in modules
Objects/ Implementation of most built-in object types
PC/ Files specific to PC ports (DOS, Windows, OS/2)
PCbuild/ Build directory for Microsoft Visual C++
Parser/ The parser and tokenizer and their input handling
Python/ The byte-compiler and interpreter
README The file you're reading now
RISCOS/ Files specific to RISC OS port
Tools/ Some useful programs written in Python
pyconfig.h.in Source from which pyconfig.h is created (GNU autoheader output)
configure Configuration shell script (GNU autoconf output)
configure.in Configuration specification (input for GNU autoconf)
install-sh Shell script used to install files
setup.py Python script used to build extension modules
在进行python的源码研究之前,先要把C的基本功熟悉一把,在python的include中,宏使用的频率还是很高的,高频率的使用,使得
python的源码的理解难度增大
基本功1 宏续行 example1: #define _PyObject_HEAD_EXTRA \ struct _object *_ob_next; \ struct _object *_ob_prev; 基本功2 条件编译 两种方式: #if #ifdef example2: #if defined(Py_DEBUG) && !defined(Py_TRACE_REFS) #define Py_TRACE_REFS #endif #ifdef Py_TRACE_REFS /* Define pointers to support a doubly-linked list of all live heap objects. */ #define _PyObject_HEAD_EXTRA \ struct _object *_ob_next; \ struct _object *_ob_prev; #define _PyObject_EXTRA_INIT 0, 0, #else #define _PyObject_HEAD_EXTRA #define _PyObject_EXTRA_INIT #endif 基本功3: 宏替换: example3 /* PyObject_HEAD defines the initial segment of every PyObject. */ #define PyObject_HEAD \ _PyObject_HEAD_EXTRA \ Py_ssize_t ob_refcnt; \ struct _typeobject *ob_type; typedef struct _object { PyObject_HEAD } PyObject; 基本功4 类型定义 typedef int Py_ssize_t; 基本功5 指向函数的指针以及类型定义 typedef PyObject *(*getiterfunc) (PyObject *); typedef PyObject *(*iternextfunc) (PyObject *); C的声明相对还是比较复杂,我们可以按照声明并使用的原则进行解读: PyObject *(*iternextfunc) (PyObject *) *(*iternextfunc) (PyObject *)是一个PyObject (*iternextfunc) (PyObject *)为一个指向PyObject类型的函数指针 (*iternextfunc)为一个函数 *iternextfunc代表函数本身, iternextfunc代表指向函数的指针 基本功5 交换链接: extern "C" { }这样理解一下是不是会好一点?说完了宏,接着不得不说的就是结构体,Cpython使用结构体来模拟python的继承与多态------python中一切皆对象
python中的对象主要分为两种,一种是定长对象,一种是变长对象,在宏的部分,我们已经给出定长对象的定义:
typedef struct _object { PyObject_HEAD } PyObject;其它定长对象于该对象相比,都具有相同结构的对象头,即所有对象进行解析的时候,开始部分是相同的,都可以使用 *PyObject 进行访问。
基本数据类型的对象的大小都是都通过
struct _typeobject *ob_type;来进行确定的,该结构指向了一个对象所属的类型,而所有对象类型的泛型定义如下:
typedef struct _typeobject { PyObject_VAR_HEAD const char *tp_name; /* For printing, in format "." */ Py_ssize_t tp_basicsize, tp_itemsize; /* For allocation */ /* Methods to implement standard operations */ destructor tp_dealloc; printfunc tp_print; getattrfunc tp_getattr; setattrfunc tp_setattr; cmpfunc tp_compare; reprfunc tp_repr; /* Method suites for standard classes */ PyNumberMethods *tp_as_number; PySequenceMethods *tp_as_sequence; PyMappingMethods *tp_as_mapping; /* More standard operations (here for binary compatibility) */ hashfunc tp_hash; ternaryfunc tp_call; reprfunc tp_str; getattrofunc tp_getattro; setattrofunc tp_setattro; /* Functions to access object as input/output buffer */ PyBufferProcs *tp_as_buffer; /* Flags to define presence of optional/expanded features */ long tp_flags; const char *tp_doc; /* Documentation string */ /* Assigned meaning in release 2.0 */ /* call function for all accessible objects */ traverseproc tp_traverse; /* delete references to contained objects */ inquiry tp_clear; /* Assigned meaning in release 2.1 */ /* rich comparisons */ richcmpfunc tp_richcompare; /* weak reference enabler */ Py_ssize_t tp_weaklistoffset; /* Added in release 2.2 */ /* Iterators */ getiterfunc tp_iter; iternextfunc tp_iternext; /* Attribute descriptor and subclassing stuff */ struct PyMethodDef *tp_methods; struct PyMemberDef *tp_members; struct PyGetSetDef *tp_getset; struct _typeobject *tp_base; PyObject *tp_dict; descrgetfunc tp_descr_get; descrsetfunc tp_descr_set; Py_ssize_t tp_dictoffset; initproc tp_init; allocfunc tp_alloc; newfunc tp_new; freefunc tp_free; /* Low-level free-memory routine */ inquiry tp_is_gc; /* For PyObject_IS_GC */ PyObject *tp_bases; PyObject *tp_mro; /* method resolution order */ PyObject *tp_cache; PyObject *tp_subclasses; PyObject *tp_weaklist; destructor tp_del; /* Type attribute cache version tag. Added in version 2.6 */ unsigned int tp_version_tag; #ifdef COUNT_ALLOCS /* these must be last and never explicitly initialized */ Py_ssize_t tp_allocs; Py_ssize_t tp_frees; Py_ssize_t tp_maxalloc; struct _typeobject *tp_prev; struct _typeobject *tp_next; #endif } PyTypeObject;python中所有对象的类型对象都具有上述结构的定义。
/* Method suites for standard classes */ PyNumberMethods *tp_as_number; PySequenceMethods *tp_as_sequence; PyMappingMethods *tp_as_mapping;上述三个声明,确定了一个类型可以进行操作方法集的集合,所有基本数据类型中以Int数据类型最为简单,我们以它为例:
int 对象的声明:
typedef struct { PyObject_HEAD long ob_ival; } PyIntObject;可以看到python中的int就是对C中long数据的一个封装。
然后看看int 类型对象的声明:
PyTypeObject PyInt_Type = { PyVarObject_HEAD_INIT(&PyType_Type, 0) "int", sizeof(PyIntObject), 0, (destructor)int_dealloc, /* tp_dealloc */ (printfunc)int_print, /* tp_print */ 0, /* tp_getattr */ 0, /* tp_setattr */ (cmpfunc)int_compare, /* tp_compare */ (reprfunc)int_to_decimal_string, /* tp_repr */ &int_as_number, /* tp_as_number */ 0, /* tp_as_sequence */ 0, /* tp_as_mapping */ (hashfunc)int_hash, /* tp_hash */ 0, /* tp_call */ (reprfunc)int_to_decimal_string, /* tp_str */ PyObject_GenericGetAttr, /* tp_getattro */ 0, /* tp_setattro */ 0, /* tp_as_buffer */ Py_TPFLAGS_DEFAULT | Py_TPFLAGS_CHECKTYPES | Py_TPFLAGS_BASETYPE | Py_TPFLAGS_INT_SUBCLASS, /* tp_flags */ int_doc, /* tp_doc */ 0, /* tp_traverse */ 0, /* tp_clear */ 0, /* tp_richcompare */ 0, /* tp_weaklistoffset */ 0, /* tp_iter */ 0, /* tp_iternext */ int_methods, /* tp_methods */ 0, /* tp_members */ int_getset, /* tp_getset */ 0, /* tp_base */ 0, /* tp_dict */ 0, /* tp_descr_get */ 0, /* tp_descr_set */ 0, /* tp_dictoffset */ 0, /* tp_init */ 0, /* tp_alloc */ int_new, /* tp_new */ (freefunc)int_free, /* tp_free */ };
python中的对象主要分为两种,一种是定长对象,一种是变长对象,在宏的部分,我们已经给出定长对象的定义:
typedef struct _object { PyObject_HEAD } PyObject;其它定长对象于该对象相比,都具有相同结构的对象头,即所有对象进行解析的时候,开始部分是相同的,都可以使用 *PyObject 进行访问。
基本数据类型的对象的大小都是都通过
struct _typeobject *ob_type;来进行确定的,该结构指向了一个对象所属的类型,而所有对象类型的泛型定义如下:
typedef struct _typeobject { PyObject_VAR_HEAD const char *tp_name; /* For printing, in format "." */ Py_ssize_t tp_basicsize, tp_itemsize; /* For allocation */ /* Methods to implement standard operations */ destructor tp_dealloc; printfunc tp_print; getattrfunc tp_getattr; setattrfunc tp_setattr; cmpfunc tp_compare; reprfunc tp_repr; /* Method suites for standard classes */ PyNumberMethods *tp_as_number; PySequenceMethods *tp_as_sequence; PyMappingMethods *tp_as_mapping; /* More standard operations (here for binary compatibility) */ hashfunc tp_hash; ternaryfunc tp_call; reprfunc tp_str; getattrofunc tp_getattro; setattrofunc tp_setattro; /* Functions to access object as input/output buffer */ PyBufferProcs *tp_as_buffer; /* Flags to define presence of optional/expanded features */ long tp_flags; const char *tp_doc; /* Documentation string */ /* Assigned meaning in release 2.0 */ /* call function for all accessible objects */ traverseproc tp_traverse; /* delete references to contained objects */ inquiry tp_clear; /* Assigned meaning in release 2.1 */ /* rich comparisons */ richcmpfunc tp_richcompare; /* weak reference enabler */ Py_ssize_t tp_weaklistoffset; /* Added in release 2.2 */ /* Iterators */ getiterfunc tp_iter; iternextfunc tp_iternext; /* Attribute descriptor and subclassing stuff */ struct PyMethodDef *tp_methods; struct PyMemberDef *tp_members; struct PyGetSetDef *tp_getset; struct _typeobject *tp_base; PyObject *tp_dict; descrgetfunc tp_descr_get; descrsetfunc tp_descr_set; Py_ssize_t tp_dictoffset; initproc tp_init; allocfunc tp_alloc; newfunc tp_new; freefunc tp_free; /* Low-level free-memory routine */ inquiry tp_is_gc; /* For PyObject_IS_GC */ PyObject *tp_bases; PyObject *tp_mro; /* method resolution order */ PyObject *tp_cache; PyObject *tp_subclasses; PyObject *tp_weaklist; destructor tp_del; /* Type attribute cache version tag. Added in version 2.6 */ unsigned int tp_version_tag; #ifdef COUNT_ALLOCS /* these must be last and never explicitly initialized */ Py_ssize_t tp_allocs; Py_ssize_t tp_frees; Py_ssize_t tp_maxalloc; struct _typeobject *tp_prev; struct _typeobject *tp_next; #endif } PyTypeObject;python中所有对象的类型对象都具有上述结构的定义。
/* Method suites for standard classes */ PyNumberMethods *tp_as_number; PySequenceMethods *tp_as_sequence; PyMappingMethods *tp_as_mapping;上述三个声明,确定了一个类型可以进行操作方法集的集合,所有基本数据类型中以Int数据类型最为简单,我们以它为例:
int 对象的声明:
typedef struct { PyObject_HEAD long ob_ival; } PyIntObject;可以看到python中的int就是对C中long数据的一个封装。
然后看看int 类型对象的声明:
PyTypeObject PyInt_Type = { PyVarObject_HEAD_INIT(&PyType_Type, 0) "int", sizeof(PyIntObject), 0, (destructor)int_dealloc, /* tp_dealloc */ (printfunc)int_print, /* tp_print */ 0, /* tp_getattr */ 0, /* tp_setattr */ (cmpfunc)int_compare, /* tp_compare */ (reprfunc)int_to_decimal_string, /* tp_repr */ &int_as_number, /* tp_as_number */ 0, /* tp_as_sequence */ 0, /* tp_as_mapping */ (hashfunc)int_hash, /* tp_hash */ 0, /* tp_call */ (reprfunc)int_to_decimal_string, /* tp_str */ PyObject_GenericGetAttr, /* tp_getattro */ 0, /* tp_setattro */ 0, /* tp_as_buffer */ Py_TPFLAGS_DEFAULT | Py_TPFLAGS_CHECKTYPES | Py_TPFLAGS_BASETYPE | Py_TPFLAGS_INT_SUBCLASS, /* tp_flags */ int_doc, /* tp_doc */ 0, /* tp_traverse */ 0, /* tp_clear */ 0, /* tp_richcompare */ 0, /* tp_weaklistoffset */ 0, /* tp_iter */ 0, /* tp_iternext */ int_methods, /* tp_methods */ 0, /* tp_members */ int_getset, /* tp_getset */ 0, /* tp_base */ 0, /* tp_dict */ 0, /* tp_descr_get */ 0, /* tp_descr_set */ 0, /* tp_dictoffset */ 0, /* tp_init */ 0, /* tp_alloc */ int_new, /* tp_new */ (freefunc)int_free, /* tp_free */ };可以看到PyInt_Type是PyTypeObject类型的对象,并进行了结构体的初始化,该数据类型相对来说比较简单,
因此很多函数指针的指针的值都为0,基本数据类型没有iterator,从上面的初始化就可以看出。我们需要注意的是
&int_as_number, /* tp_as_number */
该变量为静态变量,变量的声明位于类型对象的声明之上:
static PyNumberMethods int_as_number = { (binaryfunc)int_add, /*nb_add*/ (binaryfunc)int_sub, /*nb_subtract*/ (binaryfunc)int_mul, /*nb_multiply*/ (binaryfunc)int_classic_div, /*nb_divide*/ (binaryfunc)int_mod, /*nb_remainder*/ (binaryfunc)int_divmod, /*nb_divmod*/ (ternaryfunc)int_pow, /*nb_power*/ (unaryfunc)int_neg, /*nb_negative*/ (unaryfunc)int_int, /*nb_positive*/ (unaryfunc)int_abs, /*nb_absolute*/ (inquiry)int_nonzero, /*nb_nonzero*/ (unaryfunc)int_invert, /*nb_invert*/ (binaryfunc)int_lshift, /*nb_lshift*/ (binaryfunc)int_rshift, /*nb_rshift*/ (binaryfunc)int_and, /*nb_and*/ (binaryfunc)int_xor, /*nb_xor*/ (binaryfunc)int_or, /*nb_or*/ int_coerce, /*nb_coerce*/ (unaryfunc)int_int, /*nb_int*/ (unaryfunc)int_long, /*nb_long*/ (unaryfunc)int_float, /*nb_float*/ (unaryfunc)int_oct, /*nb_oct*/ (unaryfunc)int_hex, /*nb_hex*/ 0, /*nb_inplace_add*/ 0, /*nb_inplace_subtract*/ 0, /*nb_inplace_multiply*/ 0, /*nb_inplace_divide*/ 0, /*nb_inplace_remainder*/ 0, /*nb_inplace_power*/ 0, /*nb_inplace_lshift*/ 0, /*nb_inplace_rshift*/ 0, /*nb_inplace_and*/ 0, /*nb_inplace_xor*/ 0, /*nb_inplace_or*/ (binaryfunc)int_div, /* nb_floor_divide */ (binaryfunc)int_true_divide, /* nb_true_divide */ 0, /* nb_inplace_floor_divide */ 0, /* nb_inplace_true_divide */ (unaryfunc)int_int, /* nb_index */ };该变量定义了int数据类型所支持的很多操作;结构体的声明如下:
typedef struct { /* For numbers without flag bit Py_TPFLAGS_CHECKTYPES set, all arguments are guaranteed to be of the object's type (modulo coercion hacks -- i.e. if the type's coercion function returns other types, then these are allowed as well). Numbers that have the Py_TPFLAGS_CHECKTYPES flag bit set should check *both* arguments for proper type and implement the necessary conversions in the slot functions themselves. */ binaryfunc nb_add; binaryfunc nb_subtract; binaryfunc nb_multiply; binaryfunc nb_divide; binaryfunc nb_remainder; binaryfunc nb_divmod; ternaryfunc nb_power; unaryfunc nb_negative; unaryfunc nb_positive; unaryfunc nb_absolute; inquiry nb_nonzero; unaryfunc nb_invert; binaryfunc nb_lshift; binaryfunc nb_rshift; binaryfunc nb_and; binaryfunc nb_xor; binaryfunc nb_or; coercion nb_coerce; unaryfunc nb_int; unaryfunc nb_long; unaryfunc nb_float; unaryfunc nb_oct; unaryfunc nb_hex; /* Added in release 2.0 */ binaryfunc nb_inplace_add; binaryfunc nb_inplace_subtract; binaryfunc nb_inplace_multiply; binaryfunc nb_inplace_divide; binaryfunc nb_inplace_remainder; ternaryfunc nb_inplace_power; binaryfunc nb_inplace_lshift; binaryfunc nb_inplace_rshift; binaryfunc nb_inplace_and; binaryfunc nb_inplace_xor; binaryfunc nb_inplace_or; /* Added in release 2.2 */ /* The following require the Py_TPFLAGS_HAVE_CLASS flag */ binaryfunc nb_floor_divide; binaryfunc nb_true_divide; binaryfunc nb_inplace_floor_divide; binaryfunc nb_inplace_true_divide; /* Added in release 2.5 */ unaryfunc nb_index; } PyNumberMethods;该结构体一共声明了39个int类型数据可以进行的操作,至于是否实现根据版本的不同进行讨论,我们不在继续深入。这样我们就知道,python中所有的int对象都是 PyIntObject的实例;而对象的本身的信息(如大小,加减乘除操作的信息)是通过PyInt_Type类型获得,那么自然而然的一个问题就是,PyInt_Type的信息是根据什么过的呢?
PyTypeObject PyInt_Type = { PyVarObject_HEAD_INIT(&PyType_Type, 0)从PyInt_Typ的初始化信息,下面需要观察的结构体就是PyType_Type
#define PyObject_HEAD_INIT(type) \ _PyObject_EXTRA_INIT \ 1, type, #define PyVarObject_HEAD_INIT(type, size) \ PyObject_HEAD_INIT(type) size,采用宏来进行初始化,现在我们唯一需要关注就是传入的参数PyType_Type--谜底马上就要揭开,不要走开
PyTypeObject PyType_Type = { PyVarObject_HEAD_INIT(&PyType_Type, 0) "type", /* tp_name */ sizeof(PyHeapTypeObject), /* tp_basicsize */ sizeof(PyMemberDef), /* tp_itemsize */ (destructor)type_dealloc, /* tp_dealloc */ 0, /* tp_print */ 0, /* tp_getattr */ 0, /* tp_setattr */ 0, /* tp_compare */ (reprfunc)type_repr, /* tp_repr */ 0, /* tp_as_number */ 0, /* tp_as_sequence */ 0, /* tp_as_mapping */ (hashfunc)_Py_HashPointer, /* tp_hash */ (ternaryfunc)type_call, /* tp_call */ 0, /* tp_str */ (getattrofunc)type_getattro, /* tp_getattro */ (setattrofunc)type_setattro, /* tp_setattro */ 0, /* tp_as_buffer */ Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC | Py_TPFLAGS_BASETYPE | Py_TPFLAGS_TYPE_SUBCLASS, /* tp_flags */ type_doc, /* tp_doc */ (traverseproc)type_traverse, /* tp_traverse */ (inquiry)type_clear, /* tp_clear */ type_richcompare, /* tp_richcompare */ offsetof(PyTypeObject, tp_weaklist), /* tp_weaklistoffset */ 0, /* tp_iter */ 0, /* tp_iternext */ type_methods, /* tp_methods */ type_members, /* tp_members */ type_getsets, /* tp_getset */ 0, /* tp_base */ 0, /* tp_dict */ 0, /* tp_descr_get */ 0, /* tp_descr_set */ offsetof(PyTypeObject, tp_dict), /* tp_dictoffset */ type_init, /* tp_init */ 0, /* tp_alloc */ type_new, /* tp_new */ PyObject_GC_Del, /* tp_free */ (inquiry)type_is_gc, /* tp_is_gc */ };
可以进行如下的测试:>>> type(int)
>>> type(5)
>>> a=1
>>> b=1
>>> hex(id(a))'0x12aa2d8'
>>> hex(id(b))'0x12aa2d8'
使用了对象缓冲池技术,后续继续研究
REF:
python 源码学习
python manual
c pitfall and trap