[Corpora-List] Q: Hyphenation removal

Angus Grieve-Smith grvsmth at panix.com
Fri Aug 17 15:39:28 UTC 2012


On 8/16/2012 7:37 AM, Roland Schäfer wrote:
> are there any tools to remove hard-coded "hyphe- nation" from texts (or
> papers describing principled solutions to the problem).

     I'm sure that there's something out there and that someone on the 
list will know where to find it.

     I don't know about German, but in English there is significant 
ambiguity: there are many words where a hyphen is optional, so a hyphen 
at a line break may or may not belong to the word itself. 
Fortunately for your purpose, I believe that the differences in meaning 
are small enough that in those cases you could probably remove all the 
hyphens.  Some hyphens are even orthographically motivated: 
"antiinflammatory" exists but is used less often than 
"anti-inflammatory" because people seem to be uncomfortable writing two 
"i"s in the middle of a word in English.
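
     To make the "remove all the hyphens" idea concrete, here is a rough 
sketch in Python, not an existing tool. The names dehyphenate, BROKEN, 
SUSPENDED and the optional lexicon argument are my own inventions for 
illustration. It joins fragments like "hyphe- nation", keeps a hyphen only 
when a supplied word list knows the hyphenated form but not the joined 
one, and skips suspended hyphens such as "pre- and post-". The lowercase 
test covers ASCII letters only, so a continuation starting with an umlaut 
would be missed.

```python
import re

# A word fragment ending in "-", then whitespace, then a lowercase continuation.
# Assumption: the break shows up as hyphen + whitespace (space or newline).
BROKEN = re.compile(r"(\w+)-\s+([a-z]\w*)")

# Hypothetical stop list for suspended hyphens ("pre- and post-", "Haus- und ...").
SUSPENDED = {"and", "or", "und", "oder"}

def dehyphenate(text, lexicon=None):
    """Join words split by a hard-coded hyphen.

    lexicon is an optional set of lowercase known word forms. Without it,
    every candidate is joined with the hyphen removed.
    """
    def join(match):
        left, right = match.group(1), match.group(2)
        if right in SUSPENDED:
            return match.group(0)  # leave suspended hyphens untouched
        joined = left + right
        hyphenated = left + "-" + right
        if (lexicon is not None
                and joined.lower() not in lexicon
                and hyphenated.lower() in lexicon):
            return hyphenated  # only the hyphenated form is attested
        return joined  # default: drop the hyphen

    return BROKEN.sub(join, text)

print(dehyphenate('remove hard-coded "hyphe- nation" from texts'))
print(dehyphenate("an anti- inflammatory drug", {"anti-inflammatory"}))
```

     Without a word list this errs on the side of dropping hyphens, which 
matches the assumption above that the resulting differences are small.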

     Maybe someone with more experience in this area can elaborate.

-- 
				-Angus B. Grieve-Smith
				grvsmth at panix.com

