<div dir="ltr">Dear all, <br><br>Thank you for all your input on how to go about compiling a WoW corpus.<div>I shall try to summarize the discussion to this point:<br></div><div><br></div><div><b>Are there any existing WoW corpora?</b></div>
<div><ul><li>(for now) there is no openly available WoW corpus, but<br></li><li>given studies such as <a href="http://dl.acm.org/citation.cfm?id=1920331.1920490" target="_blank">http://dl.acm.org/citation.cfm?id=1920331.1920490</a> or <a href="http://www.pitt.edu/~lbc8/FriedlineCollister-constructingpowerfulidentityinWoW.pdf" target="_blank">http://www.pitt.edu/~lbc8/FriedlineCollister-constructingpowerfulidentityinWoW.pdf</a>, some closed, in-group WoW corpora already exist</li>
</ul></div><div><b>How could one go about collecting a WoW corpus?</b></div><div><ul><li>as a <u><b>field linguist</b></u>, join and embrace the WoW community, and collect data using the game's built-in chat logging and an ethnographic journal.</li>
<ul><li>see <font face="arial, sans-serif">Friedline, B., &amp; Collister, L. (2012) “Constructing a Powerful Identity in World of Warcraft: A Sociolinguistic Approach to MMORPGs.” In Call, Voorhees, and Whitlock (eds.), Dungeons, Dragons, and Digital Denizens: The Digital Role-Playing Game. New York: Continuum.</font></li>
</ul><li>as an <b><u>out-group</u></b> observer, join with a free account and stay in the free locations to log the chats</li><ul><li>The problem is that you will end up logging mostly auction-related chats, because the locations available to free accounts are usually used as marketplaces</li>
</ul><li>from <b><u>second-hand</u></b> data, using openly available gameplay videos, run OCR to collect texts</li><ul><li>foreseeable problems include:<br></li><ul><li>no speaker metadata</li><li>trouble converting video into frame images for OCR</li>
<li>low-quality videos leading to OCR input noise</li><li>noisy OCR outputs</li></ul></ul><li><b>ask the game developer for data.</b></li><ul>
</ul></ul><div><b>Issues that might be raised:</b></div></div><div><ul><li><b>copyright issues</b>, one needs to read through the TOS or consult Blizzard's staff</li><li><b>ethical issues</b>, should consent be sought before or after data collection? </li>
<li><b>data quality issues, </b>"<i>If you want chat that includes meaningful interactions, I think you have to actually be recording someone truly participating in the game, preferably someone in a functional guild.</i>" - Mary Elaine Califf</li>
<li><b>corpus representation issues</b>, given a collection of chat logs from different users in the community, the users' language use would differ, as it does among people with different levels of prestige/power/solidarity, and according to the function/domain of the utterance </li>
</ul><div>Regards,</div></div><div>liling</div></div>