<table cellspacing="0" cellpadding="0" border="0" ><tr><td valign="top" style="font: inherit;">Hi All,<br>I really appreciate the post by Marco Baroni with links to LDA implementations.<br>My challenge is building my own corpora for activity recognition: the 'words' in my case are features extracted from video, and the 'documents' are video clips.<br><br>1. What files do I need to provide as input to these LDA codes, and what is the data format?<br>I came across the "LDA-C format" on Blei's site, where he says:<br><span style="font-style: italic;">"The data is a file where each line is of the form:<br>[M] [term_1]:[count] [term_2]:[count] ... [term_N]:[count]<br>where [M] is the number of unique terms in the document, and the [count] associated with each term is how many times that term appeared in the document."</span><br><br>I got confused because my understanding so far is that the 'words' are rows and the 'documents' are columns, so row i holds the occurrences of word i across all documents j.<br>But the passage <span style="font-style: italic;">"where [M] is the number of unique terms in the document, and the [count] associated with each term is how many times that term appeared in the document"</span> makes it sound as if the rows are documents and the columns are words.<br><br>
Please, can someone help me clarify this? I would really appreciate it if someone could email me even a small piece of any corpus already in LDA-C format, so I can use it as a template, along with any other files I need to provide.<br>Thanks<br>Toyin Popoola<br>toyinpopoola@ieee.org<br>HEU<br></td></tr></table><br>
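The following is a minimal sketch of how the two orientations relate. It assumes the quoted format puts one document per line. Under that assumption, a words-by-documents count matrix (rows are words, columns are documents, as described in the post) becomes LDA-C by reading it column by column, with only the non-zero counts written out. The function name `word_doc_matrix_to_ldac` and the list-of-lists input are hypothetical. The use of 0-based term indices (the row number of each feature "word") is also an assumption, not something stated in the quote.

```python
from typing import List, Sequence


def word_doc_matrix_to_ldac(counts: Sequence[Sequence[int]]) -> List[str]:
    """Convert a words-by-documents count matrix to LDA-C lines (hypothetical helper).

    counts[i][j] is how many times word i occurs in document j, so rows are
    words and columns are documents. LDA-C wants one line per document, so
    this walks the matrix column by column. Term indices are assumed 0-based.
    """
    n_words = len(counts)
    n_docs = len(counts[0]) if n_words else 0
    lines = []
    for j in range(n_docs):
        # Keep only the terms that actually occur in document j.
        pairs = [(i, counts[i][j]) for i in range(n_words) if counts[i][j] > 0]
        # [M] is the number of unique terms in the document, i.e. len(pairs).
        fields = [str(len(pairs))] + [f"{i}:{c}" for i, c in pairs]
        lines.append(" ".join(fields))
    return lines


if __name__ == "__main__":
    # 3 feature "words" (rows) x 2 video clips (columns); made-up counts.
    counts = [
        [2, 0],  # word 0
        [0, 1],  # word 1
        [5, 3],  # word 2
    ]
    for line in word_doc_matrix_to_ldac(counts):
        print(line)
```

For this toy matrix the sketch prints `2 0:2 2:5` and `2 1:1 2:3`. The first clip contains two distinct words (word 0 twice, word 2 five times), so its line starts with `M = 2`. Writing these lines one per line to a text file gives a data file shaped like the format quoted above.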