re, Word Cloud

yu
News
2025-01-01


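from jieba.analyse import set_stop_words, extract_tags
from wordcloud import WordCloud, STOPWORDS as sw
import numpy as np
from PIL import Image

# ① Extract high-frequency Chinese words with jieba:
stopWords = 'D:/中文停用词表.txt'  # Chinese stop-word list, one word per line (save it as UTF-8)
txtFile = 'F:/New Download/example.txt'
with open(txtFile, encoding='utf-8') as f:  # assumes the text file is UTF-8
    sentence = f.read()
set_stop_words(stopWords)  # jieba filtering: custom Chinese stop words
# words = extract_tags(sentence, 50)
# text = ' '.join(words)
# Most word-cloud websites only need the words in ranked order; wc.generate_from_frequencies({...}) and similar also need the frequencies
words = extract_tags(sentence, topK=50, withWeight=True)  # list of (word, weight) pairs
frequencies = {word: int(weight * 1000) for word, weight in words}

# ② Feed the {word: frequency} data into the word cloud:
backImg = 'F:/New Download/background.png'
bg = np.array(Image.open(backImg))  # formerly bg = scipy.misc.imread(backImg), which has since been removed from SciPy
wc = WordCloud('simhei.ttf', mask=bg, max_font_size=81)  # first argument is font_path (a Chinese font); the mask must be an ndarray
# wc.stopwords = sw | set(open(stopWords, encoding='utf-8').read().splitlines())  # word-cloud filtering: built-in English plus custom Chinese; only applied by wc.generate(text)
# wc.generate_from_frequencies({word1: freq1, ...}) or wc.generate('space-separated words')
wc.generate_from_frequencies(frequencies)  # or wc.generate(text)

# ③ Show the image: method 1 opens the saved file path (a str) with PIL's Image; method 2 passes the WordCloud object to matplotlib
saveImg = 'F:/New Download/result.jpg'
wc.to_file(saveImg)
Image.open(saveImg).show()
# import matplotlib.pyplot as plt
# plt.imshow(wc)
# plt.axis('off')
# plt.savefig(saveImg, dpi=240, bbox_inches='tight')
# plt.show()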
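Step ① of the script scales jieba's `(word, weight)` pairs to integers. The commented-out `wc.stopwords` line builds a stop-word set from a file. Reading that file with `readlines()` keeps each line's trailing `\n`, so words loaded that way never match anything; `splitlines()` avoids this.

Below is a minimal pure-Python sketch of both steps. It assumes a UTF-8 stop-word file with one word per line. `load_stop_words` and `weights_to_frequencies` are hypothetical helper names, and the sample pairs are made-up stand-ins for what `extract_tags(..., withWeight=True)` returns.

```python
from pathlib import Path
import tempfile


def load_stop_words(path):
    """Read a stop-word file (one word per line, assumed UTF-8) into a set.

    str.splitlines() drops the line endings, whereas readlines() would keep
    '\n', so an entry like '的\n' would never match the word '的'.
    """
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def weights_to_frequencies(pairs, scale=1000):
    """Turn (word, weight) pairs into {word: int frequency}, as in step ①."""
    return {word: int(weight * scale) for word, weight in pairs}


if __name__ == "__main__":
    # Made-up weights standing in for jieba output (not real results).
    pairs = [("词云", 0.5), ("分词", 0.25), ("停用词", 0.125)]
    print(weights_to_frequencies(pairs))  # → {'词云': 500, '分词': 250, '停用词': 125}

    # Write a throwaway stop-word file (with a blank line) and load it.
    with tempfile.TemporaryDirectory() as tmp:
        stop_file = Path(tmp) / "stopwords.txt"
        stop_file.write_text("的\n了\n\n是\n", encoding="utf-8")
        print(sorted(load_stop_words(stop_file)))
```

In the script, the filtering line could then read `wc.stopwords = sw | load_stop_words(stopWords)`, since `STOPWORDS` is a plain Python set. This only matters when using `wc.generate(text)`, because `generate_from_frequencies` does not consult `stopwords`.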

****************** Divider ******************

How to use the word-cloud website https://wordart.com/create (after the page loads, wait a few seconds):

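On the left: WORDS → Import → paste in the words, keeping their order (tick both Remove options; if each word comes with a frequency in a delimited format, also tick CSV) → SHAPES: pick an image → FONTS → Add font (e.g. a Microsoft YaHei font from your own machine, since none of the fonts the site provides support Chinese) → LAYOUT: set the word tilt → STYLE → Custom: make the words multicolored

→ Visualize (top right)

→ DOWNLOAD (at the very top right; in Chrome, use the built-in downloader).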
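The site only needs the words in order, so the `frequencies` dict built in the script can be turned into paste-ready text for the WORDS → Import box. This is a minimal sketch. `ranked_words` and `to_paste_text` are hypothetical helpers. The `word,frequency` line format used when `with_counts=True` is an assumption about what the CSV option accepts, not something stated above, so check it on the site.

```python
def ranked_words(frequencies):
    """Return words ordered from highest to lowest frequency (ties keep dict order)."""
    ordered = sorted(frequencies.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ordered]


def to_paste_text(frequencies, with_counts=False, sep=","):
    """Build text to paste into the import box.

    with_counts=False: one word per line, in ranked order.
    with_counts=True: 'word<sep>frequency' per line (assumed format).
    """
    lines = []
    for word in ranked_words(frequencies):
        lines.append(f"{word}{sep}{frequencies[word]}" if with_counts else word)
    return "\n".join(lines)


if __name__ == "__main__":
    freqs = {"分词": 250, "词云": 500, "停用词": 125}
    print(to_paste_text(freqs))
    print(to_paste_text(freqs, with_counts=True))
```

Sorting in Python rather than relying on insertion order keeps the output ranked even if the dict was built or merged in some other order.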
