python - Conditional formatting for partial duplicates in Pandas and Excel

I have the following csv data, in a file named reviews.csv:

Movie,Reviewer,Sentence,Tag,Sentiment,Text,
Jaws,John,s1,Plot,Positive,The plot was great,
Jaws,Mary,s1,Plot,Positive,The plot was great,
Jaws,John,s2,Acting,Positive,The acting was OK,
Jaws,Mary,s2,Acting,Neutral,The acting was OK,
Jaws,John,s3,Scene,Positive,The visuals blew me away,
Jaws,Mary,s3,Effects,Positive,The visuals blew me away,
Vertigo,John,s1,Scene,Negative,The scenes were terrible,
Vertigo,Mary,s1,Acting,Negative,The scenes were terrible,
Vertigo,John,s2,Plot,Negative,The actors couldn’t make the story believable,
Vertigo,Mary,s2,Acting,Positive,The actors couldn’t make the story believable,
Vertigo,John,s3,Effects,Negative,The effects were awful,
Vertigo,Mary,s3,Effects,Negative,The effects were awful,

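My goal is to convert this csv file into an Excel spreadsheet with conditional formatting. Specifically, I want to apply the following rules:

If the "Movie", "Sentence", "Tag" and "Sentiment" values are the same, the whole row should be green.
If the "Movie", "Sentence" and "Tag" values are the same but the "Sentiment" value differs, the row should be blue.
If the "Movie" and "Sentence" values are the same but the "Tag" value differs, the row should be red.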
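
To make the three rules concrete, here is a minimal sketch in plain Python (no pandas) that labels each row with the color it should get. The `classify` helper and the `SAMPLE` string are hypothetical names introduced only for this illustration, and the sketch assumes a row is compared against every other row that shares its "Movie" and "Sentence" values; the trailing commas of the real file are dropped for brevity.

```python
import csv
import io
from collections import defaultdict

# A few rows from reviews.csv (trailing commas omitted in this sketch).
SAMPLE = """Movie,Reviewer,Sentence,Tag,Sentiment,Text
Jaws,John,s1,Plot,Positive,The plot was great
Jaws,Mary,s1,Plot,Positive,The plot was great
Jaws,John,s2,Acting,Positive,The acting was OK
Jaws,Mary,s2,Acting,Neutral,The acting was OK
Jaws,John,s3,Scene,Positive,The visuals blew me away
Jaws,Mary,s3,Effects,Positive,The visuals blew me away
"""


def classify(rows):
    """Return one color name per row: 'green', 'blue', 'red' or None."""
    # Group rows that describe the same sentence of the same movie.
    groups = defaultdict(list)
    for row in rows:
        groups[(row["Movie"], row["Sentence"])].append(row)

    colors = []
    for row in rows:
        others = [r for r in groups[(row["Movie"], row["Sentence"])] if r is not row]
        same_tag = [r for r in others if r["Tag"] == row["Tag"]]
        if any(r["Sentiment"] == row["Sentiment"] for r in same_tag):
            colors.append("green")   # Movie, Sentence, Tag and Sentiment all match
        elif same_tag:
            colors.append("blue")    # Tag matches, Sentiment differs
        elif others:
            colors.append("red")     # only Movie and Sentence match
        else:
            colors.append(None)      # no other row for this sentence
    return colors


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
colors = classify(rows)
print(colors)
```

For these six rows the two s1 rows come out green, the two s2 rows blue and the two s3 rows red.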

So I would like to create an Excel spreadsheet (.xlsx) that looks like this:

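I have been looking at the Pandas styling documentation and at the conditional formatting tutorial for XlsxWriter, but I can't seem to put the two together. Here is what I have so far. I can read the csv into a Pandas dataframe, sort it (though I'm not sure that is necessary), and write it back out to an Excel spreadsheet. How do I do the conditional formatting, and where in the code does it go?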

import pandas as pd

def csv_to_xls(source_path, dest_path):
    """
    Convert a csv file to a formatted xlsx spreadsheet
    Input: path to review csv file
    Output: formatted xlsx spreadsheet
    """
    #Read the source file and convert to Pandas dataframe
    df = pd.read_csv(source_path)

    #Sort by movie, then by sentence number (column names as in reviews.csv)
    df.sort_values(['Movie', 'Sentence'], ascending=[True, True], inplace = True)

    #Create the xlsx file that we'll be writing to
    orig = pd.ExcelWriter(dest_path, engine='xlsxwriter')

    #Convert the dataframe to Excel, create the sheet
    df.to_excel(orig, index=False, sheet_name='report')

    #Variables for the workbook and worksheet
    workbook = orig.book
    worksheet = orig.sheets['report']

    #Formats for exact match, partial match, mismatch
    exact = workbook.add_format({'bg_color':'#B7F985'}) #green
    partial = workbook.add_format({'bg_color':'#D3F6F4'}) #blue
    mismatch = workbook.add_format({'bg_color':'#F6D9D3'}) #red

    #Do the conditional formatting somehow

    #Write the file to disk (ExcelWriter.save() was removed in pandas 2.0)
    orig.close()

Best answer

Disclaimer: I am one of the authors of this library.

This can be achieved easily with StyleFrame and DataFrame.duplicated:

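import pandas as pd
# Module name as used when this answer was written; newer releases of the
# package are, as far as I know, imported as
# `from styleframe import StyleFrame, Styler`.
from StyleFrame import StyleFrame, Styler

df = pd.read_csv('reviews.csv')
sf = StyleFrame(df)

green = Styler(bg_color='#B7F985')
blue = Styler(bg_color='#D3F6F4')
red = Styler(bg_color='#F6D9D3')

# Apply the broadest match first; each narrower match then overrides it.
sf.apply_style_by_indexes(sf[df.duplicated(subset=['Movie', 'Sentence'], keep=False)],
                          styler_obj=red)
sf.apply_style_by_indexes(sf[df.duplicated(subset=['Movie', 'Sentence', 'Tag'], keep=False)],
                          styler_obj=blue)
sf.apply_style_by_indexes(sf[df.duplicated(subset=['Movie', 'Sentence', 'Tag', 'Sentiment'],
                                           keep=False)],
                          styler_obj=green)

sf.to_excel('test.xlsx').save()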

This produces the following output:
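
The order of the three `apply_style_by_indexes` calls in the answer matters: every green row is also matched by the blue and red masks, so the narrowest rule has to be applied last. The sketch below illustrates this with pandas alone, using `DataFrame.duplicated(keep=False)` to build the same three masks and a plain label column instead of cell styles. The `rules` list and the `color` series are hypothetical names for this illustration, and the small frame is a hand-copied subset of reviews.csv.

```python
import pandas as pd

# The six Jaws rows from reviews.csv, limited to the columns the rules use.
df = pd.DataFrame({
    "Movie": ["Jaws"] * 6,
    "Sentence": ["s1", "s1", "s2", "s2", "s3", "s3"],
    "Tag": ["Plot", "Plot", "Acting", "Acting", "Scene", "Effects"],
    "Sentiment": ["Positive", "Positive", "Positive", "Neutral", "Positive", "Positive"],
})

# Broadest rule first, narrowest last, so later labels overwrite earlier ones.
rules = [
    ("red", ["Movie", "Sentence"]),
    ("blue", ["Movie", "Sentence", "Tag"]),
    ("green", ["Movie", "Sentence", "Tag", "Sentiment"]),
]

color = pd.Series("", index=df.index)
for name, subset in rules:
    # keep=False marks every member of a duplicate group, not only the repeats.
    mask = df.duplicated(subset=subset, keep=False)
    color.loc[mask] = name

print(color.tolist())
```

Reversing the order of `rules` would leave every row labelled red, because the broadest mask would be applied last and cover all six rows.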

For more on python - Conditional formatting for partial duplicates in Pandas and Excel, see the similar question on Stack Overflow: https://stackoverflow.com/questions/44814798/