More on Transcriber and Unicode input

Bruce Cox bruce_cox at SIL.ORG
Sat Nov 20 15:51:03 UTC 2010


There is another option for keyboarding special characters, based on the 
AutoHotkey program, that I succeeded in getting to work with Transcriber. 
An IPA keyboard based on this system can be found at 
http://scripts.sil.org/UniIPAKeyboard. In compatibility mode, it 
effectively pastes the Unicode character into the target window. If you 
don't like the bindings, you can (in principle) write your own script.

Having said that, I don't use it myself. For some reason, the only script 
I could get to work in Transcriber was an alternative one I had written, 
based on a script I downloaded several months ago for a different 
keyboard. But that did work, which is at least a proof of concept.
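
If you do write your own bindings, you will need the Unicode code point of each character you want to bind. Below is a minimal Python sketch for looking these up. The character list is only an illustrative sample, and how the code points are then written into an AutoHotkey script is not shown here.

```python
import unicodedata

# Illustrative sample only: substitute the characters your own bindings need.
SAMPLE_IPA = ["ŋ", "ɛ", "ə", "ʔ"]


def describe(char: str) -> str:
    """Return 'U+XXXX <char> <official Unicode name>' for one character."""
    return f"U+{ord(char):04X} {char} {unicodedata.name(char)}"


if __name__ == "__main__":
    for ch in SAMPLE_IPA:
        print(describe(ch))
```

Running it prints one line per character, e.g. `U+014B ŋ LATIN SMALL LETTER ENG` for eng.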

Cheers,
bruce

On 20/11/2010 7:15 AM, Andrew Margetts wrote:
> This is a follow up to my earlier post, 'Toolbox as Transcriber'. 
> Several people responded with the suggestion that an alternative 
> strategy might be to use Microsoft Keyboard Layout Creator (MSKLC) 
> with Transcriber to facilitate the direct input of Unicode special 
> characters (using UTF-8). MSKLC is freely available from 
> http://msdn.microsoft.com/en-us/goglobal/bb964665.aspx
>
> This sounded like a great idea (albeit one that rather negated my 
> own), so I had a look at it. Unfortunately, as far as I can see, MSKLC 
> doesn't work in Transcriber (at least with version 1.5.1 on Windows 
> XP Professional). If anybody knows otherwise I would be very 
> interested to hear.
>
> The good news is that MSKLC is easy to set up and use, and does work 
> well in (among others):
> Toolbox
> ELAN
> Notepad
>
> Regarding what IS possible in Transcriber:
> 1) you can do search-and-replace operations within Transcriber, but it 
> seems you must paste special characters into the dialog box from an 
> external editor that can handle the required input. It is therefore 
> simpler to do all such editing in a text editor after completing the 
> Transcriber file. Notepad can be used for this task, since it can 
> handle UTF-8. (Avoid WordPad and Word, which just introduce problems; 
> Notepad is reliable because it is purely a text editor.)
> 2) you can also paste text strings which include special characters 
> directly into Transcriber units.
>
> In any case, it is crucial to explicitly set the encoding in 
> Transcriber to UTF-8 thus:
> 'Options > General > Encoding > Unicode(UTF-8)'
>
> The result is that the top line in the .trs file will read:
> <?xml version="1.0" encoding="UTF-8"?>
> rather than
> <?xml version="1.0" encoding="ISO-8859-1"?>
> which is the default.
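
The declaration matters because ISO-8859-1 has only 256 characters. Most IPA symbols cannot be stored in it at all, and UTF-8 bytes read back as ISO-8859-1 come out garbled. The small Python illustration below demonstrates the two encodings themselves, not what Transcriber does internally.

```python
def fits_latin1(text: str) -> bool:
    """True if every character of text exists in ISO-8859-1 (Latin-1)."""
    try:
        text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


def misread_as_latin1(text: str) -> str:
    """Show what UTF-8 text looks like when its bytes are read as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


if __name__ == "__main__":
    print(fits_latin1("café"))        # True: é (U+00E9) is in Latin-1
    print(fits_latin1("ŋ"))           # False: U+014B is outside Latin-1
    print(misread_as_latin1("café"))  # cafÃ©
```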
>
> This technique, however, does not always work well on existing 
> Transcriber files (you have to make at least one change to the file 
> before you can save it); but of course you can instead just make the 
> substitution in a text editor rather than using the Transcriber commands.
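
Changing only the declaration line is safe when the file contains plain ASCII. If the file already holds accented characters stored as ISO-8859-1, those bytes also need converting. Otherwise they will be invalid UTF-8 once the header says UTF-8. The minimal Python sketch below does both. It assumes the file really is ISO-8859-1 as its header claims, so keep a backup. The function names are illustrative.

```python
import re
from pathlib import Path

# Matches the ISO-8859-1 encoding attribute of the XML declaration, e.g.
# <?xml version="1.0" encoding="ISO-8859-1"?>
DECL_LATIN1 = re.compile(r'(<\?xml[^>]*encoding=")ISO-8859-1(")', re.IGNORECASE)


def relabel_as_utf8(text: str) -> str:
    """Change an ISO-8859-1 XML declaration to UTF-8 (first match only)."""
    return DECL_LATIN1.sub(r"\1UTF-8\2", text, count=1)


def convert_trs_to_utf8(path: Path) -> None:
    """Re-save a .trs file as UTF-8, assuming its bytes are ISO-8859-1."""
    # Decoding as ISO-8859-1 never fails (every byte maps to a character),
    # so correctness depends on the file genuinely being ISO-8859-1.
    text = path.read_bytes().decode("iso-8859-1")
    # Working in bytes preserves the file's original line endings.
    path.write_bytes(relabel_as_utf8(text).encode("utf-8"))
```

For example, `convert_trs_to_utf8(Path("session1.trs"))` would do this, where the file name is hypothetical.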
>
> For good measure I suggest also doing in Transcriber:
> 'Options > Save configuration'
> to keep UTF-8 as the default encoding for new Transcriber files.
>
> Failure to do this may result in Transcriber discarding all your 
> Unicode characters on save, close and re-open - which is really very 
> annoying. If you are having this problem check that the top line is 
> correct!
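
With many files, checking the top line by hand gets tedious. The minimal Python sketch below reports whether each file's first line declares UTF-8. The script name in the usage comment is hypothetical.

```python
import re
import sys
from pathlib import Path
from typing import Optional

# Matches encoding="..." (or single quotes) in an XML declaration.
ENCODING_ATTR = re.compile(rb"""encoding=["']([A-Za-z0-9._-]+)["']""")


def declared_encoding(first_line: bytes) -> Optional[str]:
    """Return the encoding named in an XML declaration line, or None if absent."""
    match = ENCODING_ATTR.search(first_line)
    return match.group(1).decode("ascii") if match else None


def is_declared_utf8(path: Path) -> bool:
    """True if the file's first line declares UTF-8 (case-insensitive)."""
    with path.open("rb") as f:
        enc = declared_encoding(f.readline())
    return enc is not None and enc.upper() == "UTF-8"


if __name__ == "__main__":
    # Usage: python check_trs.py file1.trs file2.trs ...
    for name in sys.argv[1:]:
        status = "OK" if is_declared_utf8(Path(name)) else "NOT UTF-8"
        print(f"{status}: {name}")
```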
>
> To summarise this as a work-flow, in case you do wish to use 
> search-and-replace techniques with MSKLC, I suggest:
> 1) define and load your custom MSKLC keyboard - it will show up as one 
> of the options in the 'Language bar' (usually present in the Windows 
> Taskbar at the bottom of the screen - if it is not there you will have 
> to enable it via 'Control Panel > Regional and Language Options > 
> Languages > Details > Settings > Language Bar > Show the language bar 
> on the desktop').
> 2) set Transcriber to encode as UTF-8, but then use a working 
> orthography in Transcriber.
> 3) open each finished file in Notepad (or another text editor) and 
> transform to the real orthography with search-and-replace, using the 
> default keyboard for the 'search' term and switching to your custom 
> keyboard (using the Language bar) for the 'replace' term.
> 4) (Optionally, reopen the file in Transcriber to see what it should 
> have looked like all along.)
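
The Notepad search-and-replace step can also be scripted. The minimal Python sketch below uses a made-up working-orthography mapping, so substitute your own. Unlike a blanket replace in Notepad, it only touches text outside `<...>` markup. A rule for "ng" will therefore not alter the word "encoding" in the XML declaration, and a rule for "E" will not alter the tag name Episode. The sketch assumes the file is UTF-8, that no attribute value contains ">", and that no key occurs inside an entity such as `&amp;`. Attribute values such as speaker names are left unconverted.

```python
import re
from pathlib import Path

# HYPOTHETICAL working orthography -> real orthography; replace with your own.
WORKING_TO_REAL = {"ng": "ŋ", "E": "ɛ", "q": "ʔ"}

# One piece of markup: a tag, the XML declaration, or the DOCTYPE line.
MARKUP = re.compile(r"(<[^>]*>)")


def to_real_orthography(xml_text: str, mapping: dict[str, str]) -> str:
    """Apply mapping to text content only, leaving everything inside <...> as is."""
    if not mapping:
        return xml_text
    # Longest keys first, so a longer rule wins over a shorter overlapping one.
    keys = sorted(mapping, key=len, reverse=True)
    rule = re.compile("|".join(re.escape(k) for k in keys))
    # With a capturing group, re.split keeps markup at the odd indices.
    parts = MARKUP.split(xml_text)
    for i in range(0, len(parts), 2):
        parts[i] = rule.sub(lambda m: mapping[m.group(0)], parts[i])
    return "".join(parts)


def convert_file(path: Path, mapping: dict[str, str] = WORKING_TO_REAL) -> None:
    """Rewrite a UTF-8 .trs file in place (keep a backup of the original)."""
    text = path.read_bytes().decode("utf-8")
    path.write_bytes(to_real_orthography(text, mapping).encode("utf-8"))
```

All keys are matched in a single pass, so the output of one rule is never fed into another rule.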
>
> Incidentally, the on-line Transcriber to Toolbox converter can handle 
> (and display) UTF-8 so you should have no problem using it to convert 
> such a Transcriber file to an accurate Toolbox representation. As 
> mentioned above, both Toolbox and ELAN support MSKLC keyboards.
>
> If you subsequently import a Toolbox file that uses UTF-8 special 
> characters into ELAN, you must use 'File > Import > Toolbox File...', 
> rather than 'File > Import > Shoebox File...', and you must tick the 
> box 'All markers are Unicode'. Similarly, for export you should use 
> 'File > Export As > Toolbox File(UTF-8)...'.
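
Ticking 'All markers are Unicode' presumably only gives sensible results if the Toolbox file really is UTF-8 throughout. Below is a quick pre-import check, sketched in Python under that assumption. It only tests whether the bytes decode as UTF-8 and knows nothing about Toolbox markers. The script name is hypothetical.

```python
import sys
from pathlib import Path
from typing import Optional


def first_invalid_utf8(data: bytes) -> Optional[tuple[int, int]]:
    """Return (line number, byte offset) of the first invalid UTF-8 byte, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # Count newlines before the bad byte to get a 1-based line number.
        line = data.count(b"\n", 0, err.start) + 1
        return line, err.start
    return None


if __name__ == "__main__":
    # Usage: python check_utf8.py mytext.txt
    for name in sys.argv[1:]:
        problem = first_invalid_utf8(Path(name).read_bytes())
        if problem is None:
            print(f"{name}: decodes cleanly as UTF-8")
        else:
            print(f"{name}: not valid UTF-8 at line {problem[0]} (byte {problem[1]})")
```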
>
> I hope these notes save someone some pain.
>
> Andrew Margetts
>


