What project would you potentially like to work with us on, using Weibo-by-Year Models (Wbbyyr)? 使用逐年微博模型,您有意与我们合作什么项目?
※ What is Wbbyyr?
Wbbyyr (short for WeiBo-BY-YeaR) is a set of 70 word embedding models trained on Sina Weibo posts.

※ Why 70 models?
The models cover the 7 years from 2012 to 2018, with 10 cross-validation folds for each year. 7 * 10 = 70.

※ How are the models trained?
From each year, 5 million unique Weibo posts written in Mandarin Chinese are selected. These 5 million posts are randomly split into 10 equal parts. Each part is held out in turn, and the remaining 4,500,000 posts are used to train one fold's FastText model.
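
The splitting scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Wbbyyr pipeline: the function names (split_into_folds, training_sets), the fixed seed, and the toy list of 50 posts are all hypothetical, and the FastText training step itself is left out because the form does not say which library or settings were used.

```python
import random


def split_into_folds(posts, n_folds=10, seed=0):
    """Shuffle the posts and split them into n_folds parts of equal size."""
    if len(posts) % n_folds != 0:
        raise ValueError("number of posts must be divisible by n_folds")
    shuffled = list(posts)
    random.Random(seed).shuffle(shuffled)  # seed is an assumption, for repeatability
    fold_size = len(shuffled) // n_folds
    return [shuffled[i * fold_size:(i + 1) * fold_size] for i in range(n_folds)]


def training_sets(folds):
    """Yield (held_out_index, training_posts), leaving one fold out each time."""
    for held_out in range(len(folds)):
        train = [post
                 for i, fold in enumerate(folds) if i != held_out
                 for post in fold]
        yield held_out, train


# Toy stand-in for one year's 5 million posts (hypothetical data).
posts = [f"post-{i}" for i in range(50)]
folds = split_into_folds(posts)
sets = list(training_sets(folds))

# One model per (year, fold) pair: 7 years * 10 folds.
model_ids = [(year, fold) for year in range(2012, 2019) for fold in range(10)]
```

With 50 toy posts, each fold holds 5 posts and each training set holds the other 45, mirroring the 500,000 / 4,500,000 split per year. Each training set would then be passed to whatever FastText trainer is in use.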

※ Terms and Conditions:
(1) you will use the models solely for the purpose of academic research,
(2) you must not redistribute or share the models without our explicit consent,
(3) you will be held liable (morally and legally) for any misuse of the models, and
(4) this form is meant for setting up collaborations, and we do not imply any obligations.

---------------------------------------------------------------------------------------

※ Wbbyyr 是什么?
Wbbyyr(“逐年微博”的缩写)是一套(共70个)用新浪微博文本语料训练的“词嵌入”模型。

※ 为什么有70个?
从2012年到2018年(共7年),我们为每年都训练了10折,用以交叉验证。

※ 模型是如何训练的?
我们筛选了五百万条普通话、简体中文、不重复的微博。这五百万条被随机分配成大小相等的10折。我们依次隐去每个折,使用剩下的四百五十万条微博训练一个 FastText 模型。

※ 条款:
(1) 您只可将这套模型用于学术研究;
(2) 未经我方明确同意,您不可分享这套模型;
(3) 如有滥用,您对所有后果担负学术道德及法律责任;
(4) 本问卷仅是为了建立合作之用,我方不作任何保证。
Email address *
How should we address you? 如何称呼您? *
What is your affiliation? 您的单位是? *
How do you plan to use Wbbyyr? (Answer this like a research proposal, abstract, and/or a title.) 您打算如何使用这套模型?(可当作研究计划、论文摘要、论文标题来回答。) *
Do you understand and accept our T&C? 您是否理解并接受我方条款? *
Thank you! 谢谢!
This form was created inside of Penn Alumni.