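The DataMapPlot library draws attractive data maps for use in presentations, posters, and papers. Its focus is on producing good-looking static plots with as little work as possible: you only need to label the clusters of points in the data map. Although this means most aesthetic choices are automated, the library offers several ways to customize the resulting plot to your needs.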


1. Installation

pip3 install datamapplot



2. Preparing the data

2.1 Reading arxiv.csv.gz

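Download arxiv.csv.gz. The dataset has three fields, x1, x2, and label, where

  • x1 and x2 are the coordinates obtained after dimensionality reduction; common algorithms include PCA, UMAP, and t-SNE
  • label is the annotation (category) of each point

import pandas as pd

df = pd.read_csv('arxiv.csv.gz', compression='gzip')
df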
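
To make the origin of the x1 and x2 columns more concrete, here is a minimal sketch of one way such coordinates could be produced, using PCA implemented with NumPy's SVD. The function name pca_2d and the random embeddings array are hypothetical stand-ins; the source does not say which algorithm or features were used to build arxiv.csv.gz.

```python
import numpy as np


def pca_2d(features: np.ndarray) -> np.ndarray:
    """Project an (n_samples, n_features) array onto its first two principal components."""
    # PCA requires centered data.
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Hypothetical stand-in for real document embeddings: 200 samples, 16 features.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 16))

coords = pca_2d(embeddings)
x1, x2 = coords[:, 0], coords[:, 1]
```

The two resulting columns could then be stored as x1 and x2 next to a label column. UMAP or t-SNE would slot into the same place and usually separate clusters more clearly than PCA.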


Use PIL to read arxiv_logo.png (download the image) and convert it to a NumPy array.

import PIL.Image
import numpy as np

arxiv_logo = np.asarray(PIL.Image.open('arxiv_logo.png'))



3. Plotting

import datamapplot
import matplotlib.pyplot as plt
import matplotlib_inline
import numpy as np
import pandas as pd
import PIL.Image

# Render inline figures as both PNG and SVG in Jupyter.
matplotlib_inline.backend_inline.set_matplotlib_formats('png', 'svg')



df = pd.read_csv('arxiv.csv.gz', compression='gzip')
data_map_coords, labels = np.array(df[['x1', 'x2']]), df['label']
arxiv_logo = np.asarray(PIL.Image.open('arxiv_logo.png'))
highlight_labels =  ["Clustering",
                     "Manifold learning and dimension reduction",
                     "Active learning",
                     "Topic modelling and text classification"]


datamapplot.create_plot(
    data_map_coords, 
    labels,
    title = "ArXiv ML Landscape",
    sub_title = "A data map of papers from the Machine Learning section of ArXiv",
    highlight_labels = highlight_labels,
    label_font_size = 8,
    highlight_label_keywords = {
        "fontsize": 12, "fontweight": "bold", "bbox":{"boxstyle":"circle", "pad":0.75}
    },
    logo=arxiv_logo,
)

plt.savefig('arxiv_white.png', dpi=200)
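
The highlight_labels list above is written by hand. As a rough sketch, the labels to highlight could instead be chosen programmatically by taking the most frequent ones. The helper name top_labels and the toy label list are hypothetical, and the "Unlabelled" noise label is an assumption about how unclustered points are marked in the label column; adjust it to match your data.

```python
from collections import Counter


def top_labels(labels, k=4, noise_label="Unlabelled"):
    """Return the k most frequent labels, skipping the noise label."""
    counts = Counter(label for label in labels if label != noise_label)
    return [label for label, _ in counts.most_common(k)]


# Toy stand-in for df['label']; the real column would be passed instead.
toy_labels = (
    ["Clustering"] * 5
    + ["Active learning"] * 3
    + ["Unlabelled"] * 10
    + ["Bandits"] * 1
)

highlight_labels = top_labels(toy_labels, k=2)
```

With the real data, top_labels(df['label'], k=4) would give a list that can be passed as highlight_labels. Frequency is only one possible criterion; hand-picking the topics most relevant to the audience, as the example above does, is often the better choice.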



4. Gallery

For more, see the DataMapPlot documentation: https://github.com/TutteInstitute/datamapplot



