Here is my code:
Block 1

import requests
import pandas as pd

url = ('http://www.omdbapi.com/' '?apikey=ff21610b&t=social+network')
r = requests.get(url)
json_data = r.json()
# from app
print(json_data['Awards'])
json_dict = dict(json_data)
tab=""
# printing all data as Dictionary
print("JSON as Dictionary (all):\n")
for k,v in json_dict.items():
  if len(k) > 6:
    tab = "\t"
  else:
    tab = "\t\t"
  print(str(k) + ":" + tab + str(v))
df = pd.DataFrame(json_dict)
df.drop_duplicates(inplace=True)
# printing Pandas DataFrame of all data
print("JSON as DataFrame (all):\n{}".format(df))

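I was just testing a sample exercise on DataCamp and then started exploring different things. The exercise ends at print(json_data['Awards']). I went a step further and tested converting the JSON response to a dictionary and creating a pandas DataFrame from it. Interestingly, my output is as follows: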
Won 3 Oscars. Another 165 wins & 168 nominations.
JSON as Dictionary (all):

Title:      The Social Network
Year:       2010
Rated:      PG-13
Released:   01 Oct 2010
Runtime:    120 min
Genre:      Biography, Drama
Director:   David Fincher
Writer:     Aaron Sorkin (screenplay), Ben Mezrich (book)
Actors:     Jesse Eisenberg, Rooney Mara, Bryan Barter, Dustin Fitzsimons
Plot:       Harvard student Mark Zuckerberg creates the social networking site that would become known as Facebook, but is later sued by two brothers who claimed he stole their idea, and the co-founder who was later squeezed out of the business.
Language:   English, French
Country:    USA
Awards:     Won 3 Oscars. Another 165 wins & 168 nominations.
Poster:     https://m.media-amazon.com/images/M/MV5BMTM2ODk0NDAwMF5BMl5BanBnXkFtZTcwNTM1MDc2Mw@@._V1_SX300.jpg
Ratings:    [{'Source': 'Internet Movie Database', 'Value': '7.7/10'}, {'Source': 'Rotten Tomatoes', 'Value': '96%'}, {'Source': 'Metacritic', 'Value': '95/100'}]
Metascore:  95
imdbRating: 7.7
imdbVotes:  542,658
imdbID:     tt1285016
Type:       movie
DVD:        11 Jan 2011
BoxOffice:  $96,400,000
Production: Columbia Pictures
Website:    http://www.thesocialnetwork-movie.com/
Response:   True
Traceback (most recent call last):
  File "C:\Users\rschosta\OneDrive - Incitec Pivot Limited\Documents\Data Science\omdb-api-test.py", line 20, in <module>
    df.drop_duplicates(inplace=True)
  File "C:\Users\rschosta\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 3535, in drop_duplicates
    duplicated = self.duplicated(subset, keep=keep)
  File "C:\Users\rschosta\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 3582, in duplicated
    labels, shape = map(list, zip(*map(f, vals)))
  File "C:\Users\rschosta\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 3570, in f
    vals, size_hint=min(len(self), _SIZE_HINT_LIMIT))
  File "C:\Users\rschosta\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\algorithms.py", line 471, in factorize
    labels = table.get_labels(values, uniques, 0, na_sentinel, check_nulls)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1367, in pandas._libs.hashtable.PyObjectHashTable.get_labels
TypeError: unhashable type: 'dict'

I did some research on .drop_duplicates(), since I have used it before and it worked fine. Here is an example that runs correctly:
Block 2
import pandas as pd
import numpy as np

#Create a DataFrame
d = {
    'Name':['Alisa','Bobby','jodha','jack','raghu','Cathrine',
            'Alisa','Bobby','kumar','Alisa','Alex','Cathrine'],
    'Age':[26,24,23,22,23,24,26,24,22,23,24,24],

    'Score':[85,63,55,74,31,77,85,63,42,62,89,77]}

df = pd.DataFrame(d,columns=['Name','Age','Score'])
print(df)
df.drop_duplicates(keep=False, inplace=True)
print(df)

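Note that the two blocks differ slightly. I also tried importing numpy as np in my first script, but that did not change the result.
Any ideas on how to get the drop_duplicates() method to work on Block 1?
Output Block 1-A
At @wen's request, here is the data as a dictionary: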
{'Title': 'The Social Network', 'Year': '2010', 'Rated': 'PG-13', 'Released': '01 Oct 2010', 'Runtime': '120 min', 'Genre': 'Biography, Drama', 'Director': 'David Fincher', 'Writer': 'Aaron Sorkin (screenplay), Ben Mezrich (book)', 'Actors': 'Jesse Eisenberg, Rooney Mara, Bryan Barter, Dustin Fitzsimons', 'Plot': 'Harvard student Mark Zuckerberg creates the social networking site that would become known as Facebook, but is later sued by two brothers who claimed he stole their idea, and the co-founder who was later squeezed out of the business.', 'Language': 'English, French', 'Country': 'USA', 'Awards': 'Won 3 Oscars. Another 165 wins & 168 nominations.', 'Poster': 'https://m.media-amazon.com/images/M/MV5BMTM2ODk0NDAwMF5BMl5BanBnXkFtZTcwNTM1MDc2Mw@@._V1_SX300.jpg', 'Ratings': [{'Source': 'Internet Movie Database', 'Value': '7.7/10'}, {'Source': 'Rotten Tomatoes', 'Value': '96%'}, {'Source': 'Metacritic', 'Value': '95/100'}], 'Metascore': '95', 'imdbRating': '7.7', 'imdbVotes': '542,658', 'imdbID': 'tt1285016', 'Type': 'movie', 'DVD': '11 Jan 2011', 'BoxOffice': '$96,400,000', 'Production': 'Columbia Pictures', 'Website': 'http://www.thesocialnetwork-movie.com/', 'Response': 'True'}

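Now, rather than calling the .drop_duplicates() method right away, I converted the Ratings dictionaries into columns before dropping duplicates, so the dictionary listing I print also has more output and is easier to read: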
Title:      The Social Network
Year:       2010
Rated:      PG-13
Released:   01 Oct 2010
Runtime:    120 min
Genre:      Biography, Drama
Director:   David Fincher
Writer:     Aaron Sorkin (screenplay), Ben Mezrich (book)
Actors:     Jesse Eisenberg, Rooney Mara, Bryan Barter, Dustin Fitzsimons
Plot:       Harvard student Mark Zuckerberg creates the social networking site that would become known as Facebook, but is later sued by two brothers who claimed he stole their idea, and the co-founder who was later squeezed out of the business.
Language:   English, French
Country:    USA
Awards:     Won 3 Oscars. Another 165 wins & 168 nominations.
Poster:     https://m.media-amazon.com/images/M/MV5BMTM2ODk0NDAwMF5BMl5BanBnXkFtZTcwNTM1MDc2Mw@@._V1_SX300.jpg
Ratings:    [{'Source': 'Internet Movie Database', 'Value': '7.7/10'}, {'Source': 'Rotten Tomatoes', 'Value': '96%'}, {'Source': 'Metacritic', 'Value': '95/100'}]
Metascore:  95
imdbRating: 7.7
imdbVotes:  542,658
imdbID:     tt1285016
Type:       movie
DVD:        11 Jan 2011
BoxOffice:  $96,400,000
Production: Columbia Pictures
Website:    http://www.thesocialnetwork-movie.com/
Response:   True

Best answer

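You have a Ratings column that is full of dictionaries, so you cannot use drop_duplicates: dicts are mutable and therefore not hashable.
As a workaround, you can transform each of these values into a frozenset of (key, value) tuples and then use drop_duplicates: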

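# Convert each rating dict into a hashable frozenset of (key, value) pairs
df['Ratings'] = df.Ratings.transform(lambda k: frozenset(k.items()))
# drop_duplicates returns a new DataFrame unless inplace=True is passed
df = df.drop_duplicates()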
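
For anyone who wants to reproduce this without calling the API, here is a minimal sketch of the same idea. The `record` dict is a trimmed-down, hypothetical stand-in for the OMDb response in the question (with one rating repeated on purpose), and `Series.map` is used where the snippet above uses `transform`; for this element-wise conversion I would expect the two to behave the same.

```python
import pandas as pd

# Hypothetical, trimmed-down stand-in for the OMDb response in the question.
record = {
    "Title": "The Social Network",
    "Year": "2010",
    "Ratings": [
        {"Source": "Internet Movie Database", "Value": "7.7/10"},
        {"Source": "Rotten Tomatoes", "Value": "96%"},
        {"Source": "Rotten Tomatoes", "Value": "96%"},  # deliberate duplicate
    ],
}

# The scalar values are broadcast against the 3-element list, giving 3 rows,
# each holding one dict in the Ratings column.
df = pd.DataFrame(record)

# Hashing rows that contain dicts is what triggers the TypeError.
try:
    df.drop_duplicates()
except TypeError as exc:
    print("drop_duplicates failed:", exc)

# Replace each dict with a frozenset of its (key, value) pairs, which is hashable.
hashable = df.assign(Ratings=df["Ratings"].map(lambda d: frozenset(d.items())))
deduped = hashable.drop_duplicates()

print(len(df), "rows before,", len(deduped), "rows after")
```

Note that this only works while the dict values themselves are hashable (strings here); a dict containing a list or another dict would need further conversion.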

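Alternatively, select only the columns you want to use as a reference. For example, to drop duplicates based only on Title and Year, you could do the following: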
ref_cols = ['Title', 'Year']
df.loc[~df[ref_cols].duplicated()]
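
A small sketch of the column-based approach, again on made-up data ("Example Movie" is a hypothetical second title, not something from the question). It also shows `drop_duplicates(subset=...)`, which should give the same rows as the boolean mask, since only the listed columns are hashed and the dict column is never touched.

```python
import pandas as pd

# Hypothetical sample: two rows for one film, one row for a made-up second film.
df = pd.DataFrame({
    "Title": ["The Social Network", "The Social Network", "Example Movie"],
    "Year": ["2010", "2010", "2011"],
    "Ratings": [
        {"Source": "Internet Movie Database", "Value": "7.7/10"},
        {"Source": "Rotten Tomatoes", "Value": "96%"},
        {"Source": "Metacritic", "Value": "80/100"},
    ],
})

ref_cols = ["Title", "Year"]

# Boolean-mask form: keep rows whose (Title, Year) pair has not been seen before.
by_mask = df.loc[~df[ref_cols].duplicated()]

# Equivalent form: restrict the duplicate check to the reference columns.
by_subset = df.drop_duplicates(subset=ref_cols)

print(by_mask)
print(by_subset)
```

Keep in mind that in the DataFrame from the question every row shares the same Title and Year, so this approach would keep only the first row and discard the other two ratings.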

08-25 17:33