Problem description
I have a dictionary whose keys are customer IDs and whose values are movie IDs. Even if a customer has watched the same movie many times, I want it counted once. I need to convert the dictionary to binary data: one row per customer ID and one column per movie ID, with 1 if the customer has watched the movie and 0 otherwise.
d = {'121212121': [111, 222, 333, 333, 444, 444], '212121212': [222, 555, 555, 666], '212123322': [555, 666, 666, 666, 777]}
Desired output:
customer ID 111 222 333 444 555 666 777
121212121 1 1 1 1 0 0 0
212121212 0 1 0 0 1 1 0
121323231 0 0 0 0 1 1 1
I have tried using CountVectorizer().
Code:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer()
movies = cv.fit_transform(cust['movies_list'])
cols = cv.vocabulary_
movies_ = pd.DataFrame(movies.toarray(), columns=cols, index=cust['customer_id'])
movies_
Output:
customer ID 111 222 333 444 555 666 777
212121212 1 1 2 2 0 0 0
121212121 0 1 0 0 2 1 0
121323231 0 0 0 0 1 3 1
The customer IDs didn't match, and I got a count of how many times each customer watched each movie instead of a 0/1 flag.
Recommended answer
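It looks like you can just use clip_upper to clip positive values to 1. (clip_upper was removed in pandas 1.0; on current versions clip(upper=1) does the same thing.)
movies_.clip(upper=1)  # movies_.clip_upper(1) on pandas < 1.0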
111 222 333 444 555 666 777
121212121 1 1 1 1 0 0 0
212121212 0 1 0 0 1 1 0
212123322 0 0 0 0 1 1 1
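To make the clipping step concrete, here is a minimal sketch that skips CountVectorizer and starts from a hand-built count table. The counts are the ones shown in the question's output, with rows labelled by the keys of d; the names counts and binary are illustrative and not from the original post.

```python
import pandas as pd

# Per-customer view counts, as in the question's CountVectorizer output.
# Row labels are the keys of `d` (an assumption about which row is which).
counts = pd.DataFrame(
    [
        [1, 1, 2, 2, 0, 0, 0],
        [0, 1, 0, 0, 2, 1, 0],
        [0, 0, 0, 0, 1, 3, 1],
    ],
    index=["121212121", "212121212", "212123322"],
    columns=["111", "222", "333", "444", "555", "666", "777"],
)

# Cap every value at 1: zeros stay 0, any positive count becomes 1.
binary = counts.clip(upper=1)
print(binary)
```

Because the counts are never negative, (counts > 0).astype(int) would give the same table.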
Here's an alternative solution starting with d. You can use pd.get_dummies, then sum per customer and clip the counts at 1.
import pandas as pd

df = pd.concat(
    [pd.Series(v, name=k).astype(str) for k, v in d.items()],  # `d` is your dict
    axis=1,
)
# On pandas < 1.0 the last line was written .sum(level=1).clip_upper(1)
pd.get_dummies(df.stack()).groupby(level=1).sum().clip(upper=1)
111 222 333 444 555 666 777
121212121 1 1 1 1 0 0 0
212121212 0 1 0 0 1 1 0
212123322 0 0 0 0 1 1 1
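As a cross-check that needs no libraries, the same table can be built in plain Python by turning each customer's list into a set. This is only a sketch, not part of the original answer: the helper name to_indicator_rows is hypothetical, and d is assumed to map each customer ID to a list of movie IDs.

```python
# Assumed shape of the question's dict: customer ID -> list of movie IDs.
d = {
    "121212121": [111, 222, 333, 333, 444, 444],
    "212121212": [222, 555, 555, 666],
    "212123322": [555, 666, 666, 666, 777],
}


def to_indicator_rows(watched):
    """Hypothetical helper: return (sorted movie IDs, {customer: 0/1 flags})."""
    # Every distinct movie ID becomes one column, in sorted order.
    movie_ids = sorted({m for movies in watched.values() for m in movies})
    rows = {}
    for customer, movies in watched.items():
        seen = set(movies)  # repeat viewings collapse to a single entry
        rows[customer] = [1 if m in seen else 0 for m in movie_ids]
    return movie_ids, rows


movie_ids, rows = to_indicator_rows(d)
print("customer ID", *movie_ids)
for customer, flags in rows.items():
    print(customer, *flags)
```

If a DataFrame is needed afterwards, pd.DataFrame.from_dict(rows, orient="index", columns=movie_ids) should wrap the result.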