Visualizing multivariate data from multiple inputs

I am trying to visualise a multivariate data model by reading it from multiple input files. I am looking for a simple way to visualise multi-category data read from multiple input CSV files. The number of rows ranges from 1 to 10000s per file. All inputs share the same format: CSV files with 4 columns.

Input 1

    tweetcricscore 34 51 high

Input 2

    tweetcricscore 23 46 low
    tweetcricscore 24 12 low
    tweetcricscore 456 46 low

Input 3

    tweetcricscore 653 1 medium
    tweetcricscore 789 178 medium

Input 4

    tweetcricscore 625 46 part
    tweetcricscore 86 23 part
    tweetcricscore 3 1 part
    tweetcricscore 87 8 part
    tweetcricscore 98 56 part

Each of the four inputs belongs to a different category, and col[1] and col[2] are paired results of some kind of classification. All the inputs here are outputs of the same classification. I want to visualise them in a better way, showing all the categories in a single plot. I am looking for a Python or pandas solution: a scatter plot, or whatever approach plots this best.

I have already posted this query in the data analysis section of Stack Exchange and had no luck, hence trying here: https://datascience.stackexchange.com/questions/11440/multi-model-data-set-visualization-python

Maybe something like the image below, where every class has its own marker and color and can be categorized, or any better way to show the pair values together.

Edit 1: I am trying to plot a scatter plot with the above input files. Code:
    import numpy as np
    import matplotlib.pyplot as plt
    from pylab import *
    import math
    from matplotlib.ticker import LogLocator
    import pandas as pd

    df1 = pd.read_csv('input_1.csv', header = None)
    df1.columns = ['col1','col2','col3','col4']
    plt.df1(kind='scatter', x='col2', y='col3', s=120, c='b', label='Highly')
    plt.legend(loc='upper right')
    plt.xlabel('Freq (x)')
    plt.ylabel('Freq(y)')
    #plt.gca().set_xscale("log")
    #plt.gca().set_yscale("log")
    plt.show()

Error:

    Traceback (most recent call last):
      File "00_scatter_plot.py", line 12, in <module>
        plt.scatter(x='col2', y='col3', s=120, c='b', label='High')
      File "/usr/lib/pymodules/python2.7/matplotlib/pyplot.py", line 3087, in scatter
        linewidths=linewidths, verts=verts, **kwargs)
      File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 6337, in scatter
        self.add_collection(collection)
      File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 1481, in add_collection
        self.update_datalim(collection.get_datalim(self.transData))
      File "/usr/lib/pymodules/python2.7/matplotlib/collections.py", line 185, in get_datalim
        offsets = np.asanyarray(offsets, np.float_)
      File "/usr/local/lib/python2.7/dist-packages/numpy/core/numeric.py", line 514, in asanyarray
        return array(a, dtype, copy=False, order=order, subok=True)
    ValueError: could not convert string to float: col2

Expected output: (image: Plotting - Pandas)

Solution

UPDATE: with different colors:

    colors = dict(low='DarkBlue', high='red', part='yellow', medium='DarkGreen')
    fig, ax = plt.subplots()
    for grp, vals in df.groupby('col4'):
        color = colors[grp]
        vals[['col2','col3']].plot.scatter(x='col2', y='col3', ax=ax, s=120, label=grp, color=color)

PS: you will have to make sure that all your groups (col4) are defined in the colors dictionary.

OLD answer: assuming that you've concatenated/merged/joined your files into a single DF, we can do the following:

    fig, ax = plt.subplots()
    [vals[['col2','col3']].plot.scatter(x='col2', y='col3', ax=ax, label=grp)
     for grp, vals in df.groupby('col4')]

PS: as homework, you can play with colors ;)
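The answer assumes the input files have already been combined into a single DataFrame. A minimal sketch of that step, followed by the per-group scatter plot, could look like the code below. The helper names `load_inputs` and `plot_groups`, the gray fallback color, and the comma separator are assumptions made for illustration and are not part of the original answer. The sample rows are the ones from the question, wrapped in `io.StringIO` so the sketch runs without files on disk; real file paths such as 'input_1.csv' can be passed instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

COLUMNS = ["col1", "col2", "col3", "col4"]

# Sample rows from the question. Assumption: fields are comma-separated.
inputs = [
    io.StringIO("tweetcricscore,34,51,high\n"),
    io.StringIO("tweetcricscore,23,46,low\n"
                "tweetcricscore,24,12,low\n"
                "tweetcricscore,456,46,low\n"),
    io.StringIO("tweetcricscore,653,1,medium\n"
                "tweetcricscore,789,178,medium\n"),
    io.StringIO("tweetcricscore,625,46,part\n"
                "tweetcricscore,86,23,part\n"
                "tweetcricscore,3,1,part\n"
                "tweetcricscore,87,8,part\n"
                "tweetcricscore,98,56,part\n"),
]

# Lowercase color names, otherwise the same mapping as in the answer.
colors = dict(low="darkblue", high="red", part="yellow", medium="darkgreen")


def load_inputs(sources):
    """Read each headerless 4-column CSV and stack the rows into one DataFrame."""
    frames = [pd.read_csv(src, header=None, names=COLUMNS) for src in sources]
    return pd.concat(frames, ignore_index=True)


def plot_groups(df, colors, default_color="gray"):
    """Draw one scatter series per category in col4 on a shared Axes."""
    fig, ax = plt.subplots()
    for grp, vals in df.groupby("col4"):
        # colors.get() falls back to default_color for categories
        # that are missing from the colors dictionary.
        vals.plot.scatter(x="col2", y="col3", ax=ax, s=120,
                          label=grp, color=colors.get(grp, default_color))
    ax.set_xlabel("Freq (x)")
    ax.set_ylabel("Freq (y)")
    ax.legend(loc="upper right")
    return fig, ax


df = load_inputs(inputs)
fig, ax = plot_groups(df, colors)
# Call plt.show() or fig.savefig(...) here to view or save the figure.
```

Using `colors.get(grp, default_color)` instead of `colors[grp]` is a design choice that avoids a `KeyError` when a file introduces a category that is not in the dictionary. As for the traceback in the question, it appears to come from passing the column names as plain strings to `plt.scatter` with no DataFrame attached; plotting through `DataFrame.plot.scatter`, as the answer does, lets pandas look the columns up.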