


I have a database containing two types of users, Mentors and Mentees. I want Mentees to be able to "search" for Mentors who match their profile. Both Mentors and Mentees can change items in their profile at any time.

Currently, I am using Apache Mahout for the user matching (recommender.mostSimilarIDs()). The problem I'm running into is that I have to reload the user data every single time anyone searches. Loading by itself doesn't take that long, but Mahout then seems to take a very long time to process the data (14 minutes for 3000 Mentors and 3000 Mentees). After processing, matching takes mere seconds. I also get the same INFO message over and over again while it's processing ("Processed 2248 users"), even though the code suggests that the message should only be output every 10000 users.


I'm using the GenericUserBasedRecommender and the GenericDataModel, along with the NearestNUserNeighborhood, AveragingPreferenceInferrer and PearsonCorrelationSimilarity. I load mentors from the database, add the mentee to the list of POJOs and convert them to a FastByIDMap to give to the DataModel.


Is there a better way to be doing this? The product owner needs the data to be current for every search.




You shouldn't need to ask it to reload the data every time. Why is that necessary?


14 minutes sounds way, way too long to load such a small amount of data, so something's wrong. You might follow up with more info at user@mahout.apache.org.


You are seeing log messages from a DataModel, which you can disable in your logging system of choice. It prints one final count. This is nothing to worry about.


I would advise you against using a PreferenceInferrer unless you absolutely know you want it. Do you actually have ratings here? I might suggest LogLikelihoodSimilarity if not.
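To give a feel for what a log-likelihood similarity measures when there are no ratings, here is a minimal plain-Java sketch. It does not call Mahout; the class name `LogLikelihoodSketch`, the idea of treating each profile as a set of attribute IDs, and the choice to return 0 when two profiles share nothing are all assumptions made for illustration. It computes Dunning's log-likelihood ratio on the 2x2 overlap table of two profiles and maps it into [0, 1).

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration class; not part of Mahout.
final class LogLikelihoodSketch {

    // x * ln(x), with the convention 0 * ln(0) = 0.
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized Shannon entropy of a list of counts.
    static double entropy(long... counts) {
        long sum = 0;
        double sumOfXLogX = 0.0;
        for (long c : counts) {
            sumOfXLogX += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - sumOfXLogX;
    }

    // Log-likelihood ratio for a 2x2 table:
    // k11 = in both, k12 = only in A, k21 = only in B, k22 = in neither.
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point round-off.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    // Similarity of two profiles given as sets of attribute IDs.
    // totalAttributes is the number of distinct attributes that exist overall.
    static double similarity(Set<Long> a, Set<Long> b, long totalAttributes) {
        Set<Long> both = new HashSet<>(a);
        both.retainAll(b);
        long k11 = both.size();
        if (k11 == 0) {
            return 0.0; // assumption of this sketch: no overlap means no similarity
        }
        long k12 = a.size() - k11;
        long k21 = b.size() - k11;
        long k22 = totalAttributes - k11 - k12 - k21;
        double llr = logLikelihoodRatio(k11, k12, k21, k22);
        return 1.0 - 1.0 / (1.0 + llr);
    }

    public static void main(String[] args) {
        Set<Long> mentee = Set.of(1L, 2L, 3L);
        Set<Long> closeMentor = Set.of(1L, 2L, 3L);
        Set<Long> looseMentor = Set.of(1L, 4L, 5L);
        System.out.println("close: " + similarity(mentee, closeMentor, 10));
        System.out.println("loose: " + similarity(mentee, looseMentor, 10));
    }
}
```

The point of the sketch is that only the presence or absence of an attribute matters, so there is no rating value to infer and no need for a PreferenceInferrer in this style of comparison.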


08-03 19:11