Question
Hey guys I'm new here but hope my question is clear.
My code is written in Python. I have a base class representing a general website; this class holds some basic methods to fetch the data from the website and save it. That class is extended by many other classes, each representing a different website and holding attributes specific to that website, and each subclass uses the base class methods to fetch the data. All sites need their data parsed, but many sites share the same parsing functionality. So I created several parsing classes that hold the functionality and properties for the different parsing methods (I have about six). I started to think about what would be the best way to integrate those classes with the website classes that need them.
At first I thought that each website class would hold a class variable with the parser class that corresponds to it but then I thought there must be some better way to do it.
I read a bit and thought I might be better off relying on mixins to integrate the parsers for each website, but then I thought that though that would work, it doesn't "sound" right, since the website class has no business inheriting from the parser class (even though it is only a mixin and not meant to be full-on class inheritance): they aren't related in any way except that the website uses the parser functionality.
Then I thought I might rely on some dependency injection code I saw for Python to inject the parser into each website, but that sounded like a bit of overkill.
So I guess my question basically is, when is it best to use each case (in my project and in any other project really) since they all do the job but don't seem to be the best fit.
Thank you for any help you may offer, I hope I was clear.
Adding a small mock example to illustrate:
class BaseWebsite():
    def fetch(): # Shared by all subclasses websites
        ....
    def save(): # Shared by all subclasses websites
        ....

class FirstWebsite(BaseWebsite): # Uses parsing method one
    ....

class SecondWebsite(BaseWebsite): # Uses parsing method one
    ....

class ThirdWebsite(BaseWebsite): # Uses parsing method two
    ....
And so on.
Recommended answer
I think your problem is that you're using subclasses where you should be using instances.
From your description, there's one class for each website, with a bunch of attributes. Presumably you create singleton instances of each of the classes. There's rarely a good reason to do this in Python. If each website needs different data—a base URL, a parser object/factory/function, etc.—you can just store it in instance attributes, so each website can be an instance of the same class.
If the websites actually need to, say, override base class methods in different ways, then it makes sense for them to be different classes (although even there, you should consider moving that functionality into external functions or objects that the websites can use, as you already have with the parser). But if not, there's no good reason to do this.
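As a rough sketch of that parenthetical, per-site behavior can be handed to a single class as a callable instead of being overridden in a subclass. Everything named here (`default_fetcher`, `shouting_fetcher`, `key_value_parser`, the `fetcher` argument) is hypothetical and only for illustration; the fetchers are offline stand-ins, not real HTTP code.

```python
from typing import Callable, Optional


def default_fetcher(url: str) -> str:
    # Stand-in for a real HTTP request, so the sketch runs offline.
    return f"title=Page at {url}"


def shouting_fetcher(url: str) -> str:
    # A site-specific variation on fetching (hypothetical).
    return default_fetcher(url).upper()


def key_value_parser(raw: str) -> dict:
    # Toy parser: splits "key=value" at the first "=" into a one-item dict.
    key, _, value = raw.partition("=")
    return {key: value}


class Website:
    """One class for every site; per-site behavior is passed in as callables."""

    def __init__(self, url: str, parser: Callable[[str], dict],
                 fetcher: Optional[Callable[[str], str]] = None):
        self.url = url
        self.parser = parser
        # A site that needs unusual fetching supplies its own function
        # instead of overriding fetch() in a subclass.
        self.fetcher = fetcher or default_fetcher

    def fetch(self) -> dict:
        raw = self.fetcher(self.url)
        return self.parser(raw)


plain = Website("http://example.com/foo", key_value_parser)
special = Website("http://example.com/bar", key_value_parser,
                  fetcher=shouting_fetcher)
```

Both objects are instances of the same class; only the data and callables they were given differ.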
Of course I could be wrong here, but the fact that you defined old-style classes, left the self parameter out of your methods, talked about class attributes, and generally used Java terminology instead of Python terminology makes me think that this mistake isn't too unlikely.
In other words, what you want is:
class Website:
    def __init__(self, parser, spam, eggs):
        self.parser = parser
        # ...

    def fetch(self):
        data = ...  # placeholder: fetch the raw data here
        soup = self.parser(data)
        # ...

first_website = Website(parser_one, urls[0], 23)
second_website = Website(parser_one, urls[1], 42)
third_website = Website(parser_two, urls[2], 69105)
Let's say you have 20 websites. If you're creating 20 subclasses, you're writing half a dozen lines of boilerplate for each, and there's a whole lot you can get wrong with the details which may be painful to debug. If you're creating 20 instances, it's just a few characters of boilerplate, and a lot less to get wrong:
websites = [Website(parser_one, urls[0], 23),
            Website(parser_two, urls[1], 42),
            # ...
            ]
Or you can even move the data to a data file. For example, a CSV like this:
url,parser,spam
http://example.com/foo,parser_one,23
http://example.com/bar,parser_two,42
…
You can edit this more easily—or even use a spreadsheet program to do it—with no need for any extraneous typing. And you can import it into Python with a couple lines of code:
import csv
# Assumes the CSV column names match Website.__init__'s parameter names.
with open('websites.csv', newline='') as f:
    websites = [Website(**row) for row in csv.DictReader(f)]
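One detail worth spelling out: `csv.DictReader` yields every value as a string, so the `parser` column holds the text `parser_one`, not a callable, and `spam` holds `'23'`, not `23`. A minimal sketch of one way to handle that is to look parser names up in a dict and convert the other fields explicitly. The `PARSERS` registry, the toy parser bodies, the `load_websites` helper, and the `(url, parser, spam)` constructor signature are all assumptions chosen to match the CSV columns above.

```python
import csv
import io


def parser_one(data):
    # Hypothetical parser: splits on whitespace.
    return data.split()


def parser_two(data):
    # Hypothetical parser: splits on commas.
    return data.split(",")


# Maps the names used in the CSV file to the actual parser callables.
PARSERS = {"parser_one": parser_one, "parser_two": parser_two}


class Website:
    def __init__(self, url, parser, spam):
        self.url = url
        self.parser = parser
        self.spam = spam


def load_websites(csv_file):
    websites = []
    for row in csv.DictReader(csv_file):
        websites.append(Website(
            url=row["url"],
            parser=PARSERS[row["parser"]],  # name -> callable
            spam=int(row["spam"]),          # CSV values arrive as strings
        ))
    return websites


# In-memory stand-in for websites.csv so the sketch runs on its own.
sample = io.StringIO(
    "url,parser,spam\n"
    "http://example.com/foo,parser_one,23\n"
    "http://example.com/bar,parser_two,42\n"
)
websites = load_websites(sample)
```

With a real file, `load_websites(open('websites.csv', newline=''))` inside a `with` block would do the same thing; an unknown parser name raises `KeyError`, which is probably what you want for a typo in the data file.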