Problem description
I have an Excel file that I processed for data analysis, and I created a pandas DataFrame from it.
Now I need to get the result. I'm trying to get it by iterating over the pandas rows and columns using for loops and if conditions, but I'm not getting the desired output.
I used a hyphen (-) in the Excel file so that I can apply some conditions.
Excel input file
Required Output
A -> B -> C -> E -> I
F -> G -> L
H -> J -> K
A1 -> B1
C1 -> A1
Z -> X
Note: The output should be saved to a text file as plain text. No graph or visualization is needed.
Code
import pandas as pd

df = pd.read_excel('Test.xlsx')
# fillna returns a new DataFrame, so the result has to be assigned back
df = df.fillna('-')

# Below code answers Z -> X
for index, row in df.iterrows():
    if row['Start_Name'] != '-':
        if row['End_Name'] != '-':
            print(row['Start_Name'] + ' -> ' + row['End_Name'])

# Below code answers A -> B / F -> G / H -> J / C1 -> A1
for index, row in df.iterrows():
    if row['Start_Name'] != '-':
        if row['Mid_Name_1'] == '-':
            if row['Mid_Name_2'] != '-':
                print(row['Start_Name'] + ' -> ' + row['Mid_Name_2'])

# Below code answers B -> C / C -> E
for index, row in df.iterrows():
    if row['Mid_Name_1'] != '-':
        if row['Mid_Name_2'] != '-':
            print(row['Mid_Name_1'] + ' -> ' + row['Mid_Name_2'])
Recommended answer
Setup:
- fronts: a dictionary that maps a name to the position (index in sequences) of the sequence that starts with that name.
- backs: a dictionary that maps a name to the position (index in sequences) of the sequence that ends with that name.
- sequences: a list that holds all combined relations.
- position_counter: the position that the next new sequence will receive; it is incremented each time a new sequence is appended.
from collections import deque
import pandas as pd
data = pd.read_csv("Names_relations.csv")
fronts = dict()
backs = dict()
sequences = []
position_counter = 0
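To make the bookkeeping concrete, here is a hand-worked sketch of what the four structures would hold after the relations A -> B, B -> C and F -> G have been processed. The three relations are picked only for illustration; the values follow from the update rules described further down in this answer.

```python
from collections import deque

# State after processing A -> B, B -> C, F -> G (worked out by hand).
sequences = [deque(["A", "B", "C"]), deque(["F", "G"])]
fronts = {"A": 0, "F": 1}  # name that starts a sequence -> index in `sequences`
backs = {"C": 0, "G": 1}   # name that ends a sequence   -> index in `sequences`
position_counter = 2       # number of sequences created so far

# Every entry points at a sequence that really starts/ends with that name.
assert all(sequences[pos][0] == name for name, pos in fronts.items())
assert all(sequences[pos][-1] == name for name, pos in backs.items())
```

Note that B no longer appears in backs: once B -> C was attached, C became the end of sequence 0.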
Extract all: for each row, select the values that match the regex pattern.
# raw string avoids Python's invalid-escape warning for \w and \d
selector = data.apply(lambda row: row.str.extractall(r"([\w\d]+)"), axis=1)
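To see what the pattern selects, the same regular expression can be tried with the standard `re` module on a raw CSV line. This is only an illustration of the pattern, not the pandas call used in the answer. Since `\w` already matches digits, `[\w\d]+` behaves like `\w+`.

```python
import re

PATTERN = re.compile(r"[\w\d]+")  # same character class as in the answer

# Each data row holds exactly two names; hyphens and commas are skipped.
print(PATTERN.findall("A,-,B,-"))    # ['A', 'B']
print(PATTERN.findall("-,A1,-,B1"))  # ['A1', 'B1']
```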
For each relation from selector, get the extracted elements and put them into a queue (a deque).
Check whether the front of the new relation can be attached to any previous sequence, that is, whether front is in backs.keys().
If so:
- take the position of that sequence
- take the sequence itself as llist2
- remove the last, duplicated element from llist2
- join the two sequences
- update sequences with the connected llist
- update backs with the position of the current end of the sequence
- finally, remove the exhausted ends of the previous sequence from fronts and backs
The check for back in fronts.keys() works analogously (it is commented out in the code).
If no existing sequence matches the new relation:
- save the relation as a new sequence
- update fronts and backs with the position of that relation
- update position_counter
for relation in selector:
    front, back = relation[0]
    llist = deque((front, back))
    finb = front in backs.keys()
    # binf = back in fronts.keys()
    if finb:
        position = backs[front]
        llist2 = sequences[position]
        back_llist2 = llist2.pop()
        llist = llist2 + llist
        sequences[position] = llist
        backs[llist[-1]] = position
        if front in fronts.keys():
            del fronts[front]
        if back_llist2 in backs.keys():
            del backs[back_llist2]
    # if binf:
    #     position = fronts[back]
    #     llist2 = sequences[position]
    #     front_llist2 = llist2.popleft()
    #     llist = llist + llist2
    #     sequences[position] = llist
    #     fronts[llist[0]] = position
    #     if back in backs.keys():
    #         del backs[back]
    #     if front_llist2 in fronts.keys():
    #         del fronts[front_llist2]
    # if not (finb or binf):
    if not finb:  # (equivalent to 'else:')
        sequences.append(llist)
        fronts[front] = position_counter
        backs[back] = position_counter
        position_counter += 1

for s in sequences:
    print(' -> '.join(str(el) for el in s))
Output:
A -> B -> C -> E -> I
F -> G -> L
H -> J -> K
A1 -> B1
C1 -> A1
Z -> X
#if binf is active:
# A -> B -> C -> E -> I
# F -> G -> L
# H -> J -> K
# C1 -> A1 -> B1
# Z -> X
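The question also asks for the result in a plain text file, which the answer does not show. The following is a minimal, pandas-free sketch of the same chaining idea plus a file writer. It is a simplified variant, not the answer's exact code: `chain_relations`, `write_sequences`, the `merge_front` flag and the `PAIRS` list are hypothetical names introduced here, and a relation that fits both the end of one sequence and the start of another is only attached to the end (the two sequences are not merged).

```python
from collections import deque

# (front, back) pairs in the row order of the CSV file listed below.
PAIRS = [("A", "B"), ("B", "C"), ("C", "E"), ("F", "G"), ("H", "J"),
         ("E", "I"), ("J", "K"), ("G", "L"), ("A1", "B1"), ("C1", "A1"),
         ("Z", "X")]


def chain_relations(pairs, merge_front=False):
    """Chain (front, back) pairs into sequences.

    merge_front=False attaches a relation only to the END of an existing
    sequence (like the active 'finb' branch). merge_front=True also attaches
    it to the START of a sequence (like the commented-out 'binf' branch).
    """
    fronts, backs, sequences = {}, {}, []
    for front, back in pairs:
        if front in backs:
            # Some sequence ends with `front`: extend it to the right.
            pos = backs.pop(front)
            sequences[pos].append(back)
            backs[back] = pos
        elif merge_front and back in fronts:
            # Some sequence starts with `back`: extend it to the left.
            pos = fronts.pop(back)
            sequences[pos].appendleft(front)
            fronts[front] = pos
        else:
            # No match: start a new sequence.
            sequences.append(deque((front, back)))
            fronts[front] = backs[back] = len(sequences) - 1
    return [list(s) for s in sequences]


def write_sequences(sequences, path):
    """Write one 'A -> B -> C' line per sequence to a plain text file."""
    with open(path, "w", encoding="utf-8") as fh:
        for seq in sequences:
            fh.write(" -> ".join(seq) + "\n")
```

Calling `write_sequences(chain_relations(PAIRS), "output.txt")` would write the six lines shown in the output; `output.txt` is just a placeholder file name.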
Names_relations.csv
Start_Name,Mid_Name_1,Mid_Name_2,End_Name
A,-,B,-
-,B,C,-
-,C,E,-
F,-,G,-
H,-,J,-
-,E,-,I
-,J,-,K
-,G,-,L
-,A1,-,B1
C1,-,A1,-
Z,-,-,X