# evaluation file - top 1000 terms sorted by: cmpr_cvalue
# annotations: 2 for valid tech term, 1 for valid term but not technology term, 0 for invalid term, - for not annotated yet
# term_id term_string, annotation
447 natural language 1
1706 machine translation 2
1922 natural language processing 2
1489 other hand 0
2881 noun phrase 1
9254 speech recognition 2
28246 language model 1
1431 language processing 2
6033 computational linguistics 2
34285 error rate 1
34179 training corpus 1
33684 test set 1
44456 knowledge base 1
37346 parse tree 1
450 information retrieval 2
33622 training set 1
224403 computational linguistics volume 0
33229 word sense 1
8255 target language 1
241 word order 1
34900 machine learning 2
6912 total number 0
42352 wall street journal 0
197 phrase structure 1
27750 data set 1
251 l ~ 0
33632 information extraction 2
33165 named entity 1
37782 target word 1
34902 decision tree 2
2276 large number 0
7646 semantic information 1
33690 future work 0
8277 spoken language 1
7846 syntactic structure 1
31 c ~ 0
1907 next section 0
36668 penn treebank 1
47085 logical form 1
8304 discourse structure 1
1678 same time 0
224404 linguistics volume 0
33700 mutual information 1
4642 semantic representation 1
8253 source language 1
5851 input sentence 0
33044 feature structure 1
33230 word sense disambiguation 2
35587 previous work 0
8804 translation system 2
2850 data base 2
36555 morphological analysis 2
9602 lexical entry 1
8258 question answering 2
42351 wall street 0
70125 word segmentation 2
1057 first step 0
1056 same way 0
33231 sense disambiguation 2
15121 language understanding 2
42353 street journal 0
1423 syntactic analysis 2
29339 previous section 0
4806 speech act 1
4644 prepositional phrase 1
14350 input string 1
34609 linguistic knowledge 1
2264 syntactic information 1
7399 semantic interpretation 1
34203 word alignment 2
35078 search space 1
210 r ~ 0
34284 word error rate 1
31781 test corpus 1
7584 lexical item 1
7941 semantic analysis 2
51662 anaphora resolution 2
34283 word error 0
12532 finite state 2
46834 knowledge representation 2
19275 small number 0
10200 relative clause 1
21333 translation model 1
34534 semantic role 1
33951 statistical machine translation 2
7623 lexical information 1
8803 machine translation system 2
38078 gold standard 1
34578 unknown word 0
35832 natural language generation 2
2473 t ~ 0
19504 parsing algorithm 2
34628 mt system 2
34407 probability distribution 1
36048 dynamic programming 2
40471 brown corpus 1
9118 main verb 1
35833 language generation 2
49935 world knowledge 1
2976 direct object 0
38989 feature selection 2
21573 % accuracy 0
1498 english word 0
51291 case frame 1
12225 context-free grammar 1
36526 contextual information 1
13044 data structure 1
34532 language modeling 2
30406 head noun 1
64330 domain knowledge 1
46526 derivation tree 1
12993 average number 0
32759 root node 1
2818 semantic class 1
2302 wide range 0
9776 head word 1
33872 conditional probability 1
9974 verb phrase 1
13459 word sequence 1
55893 natural language understanding 2
3016 dependency tree 1
12714 maximum likelihood 2
7469 new york 0
45912 speech recognizer 2
34241 proposed method 0
33869 statistical machine 0
39933 search engine 2
3018 tree structure 1
36587 morphological analyzer 2
11965 sentence length 0
34905 pos tag 1
37729 named entity recognition 2
44859 elementary tree 1
10541 system performance 1
62299 text generation 2
571 single word 0
52276 reference resolution 2
2862 dependency structure 1
35596 user model 1
67846 discourse segment 1
28960 recognition system 2
9028 linguistic information 1
14683 feature set 1
34473 markov model 2
34943 pattern matching 2
8783 input text 1
4689 semantic structure 1
18781 generation system 1
21346 large corpus 1
106 c \/ 0
35251 dialogue system 2
16954 similarity measure 1
34286 translation quality 1
40048 probabilistic model 1
21195 additional information 0
33978 parallel corpus 1
60256 speech recognition system 2
52282 argument structure 1
35477 spoken dialogue 2
1586 text processing 2
34186 source sentence 1
9516 new information 0
39024 dialogue act 1
5631 generation process 1
33779 entity recognition 2
10388 proper noun 1
2989 first sentence 0
33587 % precision 0
15777 syntactic category 1
43573 local context 1
277 ~ l 0
34924 coreference resolution 2
38269 part-of-speech tagging 2
1653 computer science 2
48292 user interface 2
36671 data sparseness 1
46652 auxiliary tree 1
35129 spontaneous speech 0
9113 l \ 0
37796 semantic similarity 1
4648 surface form 1
9873 given word 0
1472 ~ r 0
63 finite set 1
30753 word w 0
31464 normal form 1
36653 sentence level 1
22030 small set 0
21107 right-hand side 1
5804 parsing process 1
22128 next step 0
10240 language system 0
6601 p ~ 0
62364 classi cation 2
57360 neural network 2
56247 semantic relation 1
24189 important role 0
7382 proper name 1
12745 statistical model 1
34404 development set 1
61311 lexical choice 1
10043 english sentence 1
47484 text categorization 2
17129 future research 0
34502 baseline system 0
39352 discourse model 1
35338 domain model 1
50847 regular expression 2
9724 new word 0
57000 lexical knowledge 1
61158 chinese word 1
33705 feature vector 1
4641 b ~ 0
83110 semantic network 1
5306 phrase structure grammar 1
292 same sentence 0
34287 bleu score 1
3195 n ~ 0
881 data collection 2
12004 source text 1
36377 feature space 1
33243 training data 1
34168 edit distance 1
22506 continuous speech 1
57552 trigram model 1
5036 linguistic analysis 2
30310 human language 2
33479 vector space 1
33588 % recall 0
35098 annotation scheme 1
41132 maximum entropy 2
39950 recent work 0
33646 knowledge acquisition 2
37628 related work 0
7936 special case 0
33713 average precision 0
33979 em algorithm 2
34153 parallel corpora 1
13771 wide variety 0
17708 categorial grammar 1
1622 national science foundation 0
5748 language learning 2
4858 real world 0
35814 information structure 1
6679 ambiguous word 1
6753 hierarchical structure 1
21311 current state 0
30848 first stage 0
34580 overall performance 0
1722 high precision 0
43191 semantic knowledge 1
6795 basic idea 0
39503 dialogue manager 2
21214 hand side 0
33047 grammar formalism 1
28418 first order 0
3377 surface structure 1
12753 correct answer 0
2746 same set 0
926 great deal 0
38155 large amount 0
5001 relative frequency 1
2213 same word 0
13043 search algorithm 2
11883 main clause 1
8599 semantic content 1
35192 pronoun resolution 2
33753 annotated corpus 1
101056 lexical rule 1
28698 right hand side 0
148343 natural language interface 2
4882 artificial intelligence 2
21105 left-hand side 0
20686 given sentence 0
35043 success rate 0
36799 large corpora 1
488738 o \ 0
7962 linguistic theory 1
8358 ~ \ 0
51723 query expansion 2
67634 continuous speech recognition 2
46910 rhetorical structure 1
87840 chinese word segmentation 2
47730 qa system 2
382 first word 0
47114 semantic type 1
31371 same type 0
37363 target sentence 1
36617 probability model 1
10239 natural language system 2
38404 chart parsing 2
7924 discourse context 1
33886 sentence pair 1
34828 word form 1
45871 understanding system 0
819 retrieval system 2
19573 internal structure 0
1603 english text 1
42655 broadcast news 1
33795 high level 0
41831 source word 0
36736 language technology 2
27962 grammar rule 1
28567 word list 0
53769 chart parser 2
40417 high accuracy 0
45843 meaning representation 2
37354 time complexity 1
39824 standard deviation 1
33261 window size 1
6326 high frequency 0
42072 relevant information 0
23876 o o o 0
33663 text corpus 1
98191 representation language 1
35383 new domain 0
36559 beam search 2
43721 current word 0
29450 second step 0
35603 current implementation 0
42407 training material 1
45126 error reduction 2
48534 unification grammar 1
12658 initial state 0
125111 text planning 2
581 next word 0
39100 naive bayes 2
34730 statistical language 0
50640 text structure 1
53551 free word order 1
12206 ~ t 0
1500 word class 0
2852 following way 0
30201 word level 1
21281 large scale 0
34375 significant improvement 0
9055 sentence boundary 1
17322 t \ 0
35409 automatic evaluation 2
36258 supervised learning 2
17697 dependency grammar 1
34879 classification accuracy 1
34904 classification task 1
36335 learning algorithm 2
45356 grammatical function 1
111750 compound noun 1
34431 automatic speech recognition 2
55030 translation process 2
42297 pos tagger 2
37088 objective function 0
51790 semantic processing 2
35161 pos tagging 2
15183 such information 0
3646 low frequency 0
17930 definite noun 1
34629 system using 0
57444 research projects agency 0
33433 lexical sample 0
36915 following section 0
34471 hidden markov 2
54467 bilingual corpus 1
5245 b \ 0
4738 constituent structure 1
67497 description length 0
45881 recognition accuracy 1
36680 cross validation 1
833 content word 0
6523 detailed description 0
12328 ~ c 0
33855 word pair 1
37480 tree adjoining 0
58840 parse forest 1
28697 right hand 0
5307 structure grammar 0
10252 simple example 0
5666 system \ 0
56815 viterbi algorithm 2
10765 o o 0
34961 first experiment 0
92589 expert system 2
1883 ~ ~ 0
18118 above example 0
3935 other language 0
52825 text summarization 2
32790 typed feature 1
39504 lexical database 1
44163 corpus analysis 2
40213 semantic category 1
41120 second stage 0
7383 definite description 1
42220 high quality 0
34430 automatic speech 0
2937 present paper 0
43586 parsing model 1
51341 entropy model 1
34624 part-of-speech tagger 2
57442 advanced research projects 0
35128 right context 0
79902 word boundary 1
49462 text classification 2
98327 text understanding 2
40206 extraction system 2
45546 shared task 0
70883 text analysis 2
951706 de coling-92 0
37831 distributional similarity 1
52582 lexical ambiguity 1
61322 correct parse 0
137844 reference time 0
1564 document collection 1
6924 input word 0
37188 joint probability 1
1623 science foundation 0
7978 programming language 2
79655 grammar development 2
34511 trigram language model 1
1621 national science 0
55183 language text 0
10192 english translation 1
5045 sentence structure 1
6645 problem solving 2
2841 second sentence 0
47777 unsupervised learning 2
33868 computational complexity 1
35885 dependency parsing 2
41166 high degree 0
47518 discourse representation 2
33908 parameter estimation 2
44265 text corpora 1
156381 n \ 0
44851 example sentence 0
27560 + l 0
42614 lexical cohesion 1
53144 message understanding 2
33906 bigram model 1
241048 text planner 2
34919 current work 0
8893 subordinate clause 1
46560 foot node 0
37986 british national corpus 1
1617 document retrieval 2
8155 error correction 2
10133 relative pronoun 1
17423 c \ 0
958 o ~ 0
18715 current system 0
9761 morphological information 1
14744 top level 0
33867 ibm model 1
50158 information content 1
27790 processing system 0
14524 feature value 1
34482 n-best list 1
35601 user satisfaction 1
52715 linguistic structure 1
54700 human language technology 2
36008 syntactic parsing 2
41174 prepositional phrase attachment 1
41006 speech synthesis 2
53522 text retrieval 2
67598 real time 0
73101 knowledge source 0
33899 polynomial time 1
96222 context model 1
33480 vector space model 2
68345 context vector 1
148344 language interface 0
19091 background knowledge 1
36215 small amount 0
12621 equivalence class 0
27883 same number 0
30167 left context 0
51302 finite verb 1
58523 % error 0
5110 whole sentence 0
51324 long distance 0
18587 large set 0
34695 statistical significance 1
35230 recognition performance 0
39084 tag set 1
59077 natural language text 1
40538 word frequency 1
41043 pp attachment 1
66657 partial parse 1
22593 other information 0
59076 predicate-argument structure 1
6310 original text 0
40482 further research 0
34472 hidden markov model 2
17550 new approach 0
21078 dependency relation 1
27982 parsing system 2
34302 single sentence 0
18974 maximum number 0
27006 final state 2
50331 term frequency 1
63332 decision list 0
1396 foreign language 1
57526 street journal corpus 0
77717 first phase 0
33033 first argument 0
34535 role labeling 1
42438 common noun 0
32838 type hierarchy 1
35369 system development 0
36255 word accuracy 0
78311 hong kong 0
77235 word recognition 2
50431 human performance 1
12911 same meaning 0
37813 raw text 0
51961 base form 0
35877 parsing strategy 1
96904 nlp system 2
3779 first part 0
38697 similarity score 1
3174 following sentence 0
45648 pitch accent 1
33154 document frequency 1
7797 w ~ 0
39996 computational model 1
20769 general purpose 0
27180 useful information 0
48441 definite clause 0
49302 dialog system 2
831 information retrieval system 2
57782 expressive power 0
71637 discourse relation 1
78926 brown et 0
87445 brown et al. 0
12720 k ~ 0
36870 language acquisition 2
52236 discourse entity 1
112277 discourse marker 2
61155 speci c 0
10333 rule application 0
36760 average length 0
48616 grammar writer 2
54738 exact match 1
6558 m ~ 0
46666 initial tree 0
54192 statistical information 1
23970 word string 1
45360 training corpora 1
33434 lexical sample task 0
18357 current version 0
46871 temporal information 1
34495 baseline model 1
147360 language understanding system 2
17574 new method 0
59850 discourse processing 2
8740 language pair 1
76 linear order 1
8777 empty string 1
47088 answer type 1
32965 lexical category 1
37728 relation extraction 2
51345 subcategorization frame 1
36831 corpus size 1
37343 tree kernel 2
50223 shallow parsing 2
53729 lexical head 1
1520 english language 1
3911 vocabulary size 1
34225 word lattice 1
34343 good performance 0
34891 indirect object 0
57440 defense advanced research 0
1748 latter case 0
10253 semantic feature 1
35152 model using 0
8180 second language 0
54615 tag sequence 1
57234 significant difference 0
57441 advanced research 0
112588 speech understanding 2
21938 general case 0
30705 past participle 1
1035 further work 0
17901 discourse referent 1
33999 probability mass 0
34197 speech translation 2
34476 alignment model 1
695 third person 0
42271 annotated corpora 1
66619 bilingual lexicon 1
70688 propositional content 1
6704 f ~ 0
12563 ~ b 0
42726 cue phrase 1
44905 active learning 2
271047 x \ 0
9942 processing time 1
65781 system architecture 1
2261 generative model 1
2946 transitive verb 1
9335 second experiment 0
293176 text plan 1
33242 cosine similarity 1
17127 query language 2
13434 first time 0
293 same level 0
32740 given input 0
9987 given context 0
39203 inter-annotator agreement 1
54162 information gain 1
42875 free text 0
50124 extraction task 0
128052 resource management 2
157937 intentional structure 0
8817 correct translation 0
21074 natural way 0
46546 derived tree 0
53550 free word 0
87305 tagged corpus 1
2193 g ~ 0
48604 logic programming 2
50421 first case 0
54129 memory-based learning 2
102343 joint venture 0
80123 spoken language system 2
34510 trigram language 0
1333 previous research 0
37412 predicate argument 1
39128 context information 1
52776 rhetorical relation 1
39408 dialogue management 2
59900 inverse document frequency 1
39926 support vector 1
63527 statistical approach 1
65625 retrieval performance 1
80714 database query 1
30509 syntactic tree 1
33623 second order 0
37587 classification problem 1
46934 application domain 0
37481 tree adjoining grammar 1
40894 sentence extraction 2
269382 s \ 0
1557 language analysis 2
90 second phase 0
3151 current sentence 0
33487 error analysis 2
37715 syntactic parser 2
17931 definite noun phrase 1
20809 particular word 0
27303 given text 0
38852 sentence alignment 2
46092 n-gram model 1
53965 lexical acquisition 2
47117 question answering system 2
26030 semantic distance 1
29547 formal language 1
34966 overall accuracy 0
1759 test collection 1
34659 system output 0
5147 english grammar 1
10520 internal representation 0
53942 thematic role 1
27981 similar way 0
57446 projects agency 0
6420 limited number 0
46354 structural information 1
80364 japanese sentence 0
41079 world wide web 2
14563 character string 0
40899 web page 1
44691 transitive closure 1
10300 ambiguity resolution 2
37319 maximum likelihood estimation 2
1470 u ~ 0
38053 correct sense 0
67692 word similarity 1
33055 leaf node 0
7605 first place 0
34668 first pass 0
22327 np \ 0
42422 wsj corpus 1
49720 conceptual structure 1
53053 person name 0
118570 information extraction system 2
211694 language processing system 2
24091 new york times 0
43506 same information 0
45296 subject position 1
952096 acres de coling-92 0
35888 dependency graph 1
42042 high performance 0
5259 r \ 0
40809 comparable corpora 1
4818 language use 1
12794 p \ 0
9930 dictionary entry 1
42535 auxiliary verb 1
56762 word formation 1
28703 left hand side 0
35658 content selection 2
2925 written text 0
1940 different word 0
3971 second part 0
10067 near future 0
57443 research projects 0
37705 dynamic programming algorithm 2
22519 written language 0
44112 n-gram language 0
45212 posterior probability 1
52784 overall system 0
60204 speech signal 1
43987 same entity 0
38789 training text 1
23901 name recognition 1
46064 partial parsing 2
112605 speech input 1
40892 bilingual corpora 1
46287 parallel text 1
27707 linguistic processing 2
52721 previous sentence 0
18765 other work 0
11274 second argument 0
87446 et al. 0
89900 bilingual dictionary 1
330837 default unification 0
449409 y \ 0
490394 j \ 0
34656 translation probability 1
2920 sentence generation 2
43747 multi-document summarization 2
48524 semantic lexicon 1
39864 training phase 1
26033 semantic space 1
43061 discourse analysis 2
2560 e \ 0
81319 plan recognition 0
1090 information system 2
54861 new algorithm 0
41056 broad coverage 0
41175 phrase attachment 1
51314 constraint satisfaction 1
36685 statistical parsing 2
9567 entire sentence 0
61449 syntactic knowledge 1
6754 x x 0
7340 s ~ 0
21095 original sentence 0
44330 communicative goal 0
48530 original document 0
53606 linear precedence 0
1334 function word 1
64972 full text 0
2745 same class 0
34110 bigram language model 1
12228 l l 0
35413 specific domain 0
39481 common sense 0
487 last word 0
35683 generation component 0
37987 national corpus 0
45512 verb sense 1
64068 linguistic data consortium 1
30046 second case 0
12738 finite number 0
38502 % improvement 0
19514 further processing 0
27226 semantic classification 2
36749 singular value decomposition 2
3173 sentence analysis 2
1521 statistical analysis 2
34937 important information 0
70563 natural language learning 2
279 ~ n 0
36392 parsing accuracy 1
5718 entire corpus 0
58283 inside-outside algorithm 2
66729 semantic frame 1
14821 k \ 0
10547 chinese language 1
58146 prototype system 0
102630 sentence planning 2
119073 lr parsing 2
33481 space model 0
35606 based approach 0
1966 complete set 0
46480 development corpus 0
47479 feature function 0
47813 specific information 0
31478 modified version 0
32153 structure analysis 0
2801 generation algorithm 2
54250 first sense 0
1739 subject matter 0
39160 different feature 0
224196 noun group 1
313319 discourse plan 1
225196 computational linguistics computational 0
5900 new language 0
12487 start symbol 0
225198 linguistics computational linguistics 0
45250 automatic acquisition 0
64622 term weighting 2
246 \/ ~ 0
37361 syntactic parse 1
156681 semantic interpreter 2
35311 statistical language model 1
34156 training procedure 2
43485 task completion 0
84070 message understanding conference 1
29771 only difference 0
36832 training process 2
1453 th ~ 0
52981 enough information 0
37463 kernel function 1
80597 matrix clause 1
110121 language sentence 0
41747 distance measure 1
43867 simple sentence 0
12381 v ~ 0
37114 relative importance 0
3084 second type 0
52578 rhetorical structure theory 1
54024 confusion matrix 1
179804 data extraction 2
30842 last year 0
21913 same category 0
8440 k + 0
37449 first set 0
9914 other word 0
33586 dependency parser 2
21957 generative capacity 1
48083 part-of-speech tag 1
23 ~ o 0
27089 first column 0
34261 alignment algorithm 2
37982 particular domain 0
3807 correlation coefficient 1
15120 general knowledge 0
17107 current research 0
21985 np vp 0
9266 y ~ 0
14501 first element 0
52675 event structure 1
10105 chinese text 1
10118 right side 0
44839 sentence containing 0
46282 statistical parser 2
28702 left hand 0
29704 main difference 0
34807 evaluation method 1
35328 second set 0
8656 grammatical information 1
39613 speech processing 2
54125 labeled training 0
18794 string matching 2
77009 syntax tree 1
36679 10-fold cross validation 1
32684 intermediate representation 0
36672 sparseness problem 1
43509 channel model 0
62993 human intervention 1
73343 control structure 0
34701 test data 1
8287 canonical form 1
17196 given set 0
9875 word length 0
1979 minimum number 0
9760 translation lexicon 1
57277 last section 0
4914 difficult task 0
37551 vector machines 0
37718 linear time 1
39267 % confidence 0
41641 linear combination 1
45115 corpus using 0
18545 time consuming 0
1001 same thing 0
66253 correct analysis 0
6755 x x x 0
2071 rule set 1
3874 different language 0
37483 adjoining grammar 0
42805 web search 2
73937 prosodic information 1
31014 semantic level 0
55539 surface string 0
71931 data model 1
90289 search process 2
159 tree t 0
712 computer program 1
2753 same manner 0
55163 probability p 0
58271 training algorithm 2
52752 discourse representation theory 1
12793 final result 0
51182 complete parse 1
57527 journal corpus 0
206227 attentional state 0
35747 dialogue structure 1
37985 british national 0
44437 intended referent 0
62133 % error rate 0
29433 verb class 1
6518 h ~ 0
14255 french word 1
34279 linear interpolation 2
21841 certain threshold 0
37937 % reduction 0
38308 local tree 1
17688 b c 0
39131 full set 0
23877 new set 0
48290 word graph 0
13531 high probability 0
51900 robust parsing 2
52081 syntactic processing 2
33792 syntactic relation 1
47921 discourse level 1
5172 upper case 0
54482 word translation 2
62983 parser output 1
154312 wide scope 0
154689 dr. smith 0
44113 n-gram language model 1
29581 previous example 0
35272 development test 0
42183 ordered list 1
49444 relational database 2
77157 structural ambiguity 1
12863 word meaning 1
36627 set size 1
22017 phrase type 1
43857 target domain 0
47264 second column 0
51338 grammatical relation 1
63483 supervised training 2
68857 new text 0
93518 della pietra 0
14693 semantic component 0
12513 same language 0
34654 phrase translation 2
2936 present work 0
39133 manual annotation 1
73980 relevance feedback 0
148085 dialogue model 1
5400 new model 0
22016 dependency analysis 2
53648 much information 0
34574 alignment error rate 1
1464 same document 0
34250 important feature 0
8490 detailed analysis 0
28239 major problem 0
40912 selection process 2
11987 sample size 0
2617 chinese character 0
961 different corpora 0
48486 single document 0
57439 defense advanced 0
31546 english corpus 1
414 method using 0
37743 parent node 0
41251 boundary detection 2
9695 first example 0
56607 entity type 1
177031 japanese text 1
289331 upper model 0
8070 information processing 2
53359 source document 0
100630 user input 0
6859 relative position 0
36021 recognition problem 0
37148 computational cost 0
38760 logistic regression 2
41156 same corpus 0
54701 language technology conference 0
32533 search strategy 1
1584 automatic text 0
33435 sample task 0
46944 answer key 0
56533 argument position 1
57201 particular type 0
59995 ir system 2
76130 previous utterance 0
82657 sentence planner 2
87718 case study 0
91271 statistical translation 2
98237 air travel 0
38197 null ing 0
2047 information concerning 0
20707 uniform distribution 1
27668 binary classification 2
37358 shallow parser 2
31093 syntactic form 1
41717 cost function 0
42310 error detection 2
50718 van den 0
55795 search procedure 1
59484 accuracy rate 0
89544 precision rate 0
104004 unrestricted text 0
952095 acres de 0
35627 spoken dialogue system 2