Braille encoding

forget-e
Posts: 2
Joined: Thu Oct 14, 2021 6:16 pm

Braille encoding

Post by forget-e »

Hello!

I'm working on a project in which I am transcribing and encoding a book written in two languages, along with the braille versions of both. I know that braille characters are available among the special characters in Oxygen, which is great, but I'm wondering whether it's better practice to use the actual braille characters or to use the decimal or hexadecimal character reference instead.

I've done a ton of Googling on this and I haven't found anything that describes best practices for encoding braille. I've been using the actual characters up until now and I'm hoping I don't have to redo it in one of the other formats, but it occurred to me that there might be advantages to using the character entity instead.

Thanks! Any advice is helpful.
Radu
Posts: 9431
Joined: Fri Jul 09, 2004 5:18 pm

Re: Braille encoding

Post by Radu »

Hi,

What XML vocabulary are you using for the documents?
I'm afraid I do not know much about Braille characters. I would assume that if you write the XML using plain English characters, you could apply a custom XSLT stylesheet that automatically converts the characters to their Braille equivalents. XSLT has a translate function that maps one character to another:
https://www.saxonica.com/html/documenta ... slate.html
But I do not know if alphanumerical chars and Braille map one to one.
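As a rough illustration of the character-for-character mapping idea, here is a minimal Python sketch, with Python's str.translate standing in for the XSLT translate function. It assumes uncontracted English braille and covers only the letters a to e; the table and the helper names (dots_to_char, to_braille) are hypothetical. Contracted braille, or the braille codes of other languages, will generally not map one to one like this.

```python
# Unicode Braille Patterns start at U+2800; raised dot n (1-8) sets bit n-1.
BRAILLE_BASE = 0x2800


def dots_to_char(*dots: int) -> str:
    """Return the Unicode braille pattern character for the given raised dots."""
    code = BRAILLE_BASE
    for dot in dots:
        code |= 1 << (dot - 1)
    return chr(code)


# Assumption: a partial table for uncontracted English letters a-e only.
LETTER_DOTS = {
    "a": (1,),
    "b": (1, 2),
    "c": (1, 4),
    "d": (1, 4, 5),
    "e": (1, 5),
}

# Build a translation table from each letter to its braille pattern.
TABLE = str.maketrans({letter: dots_to_char(*dots) for letter, dots in LETTER_DOTS.items()})


def to_braille(text: str) -> str:
    """Replace every mapped letter; characters not in the table pass through unchanged."""
    return text.translate(TABLE)


print(to_braille("abc"))
```

Characters missing from the table are left untouched, so gaps in the mapping stay visible in the output instead of being silently dropped.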

There is also the DAISY Consortium, which deals with accessibility and uses XML. Maybe you can find a public list there and ask them about this use case:
https://github.com/daisy

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
forget-e
Posts: 2
Joined: Thu Oct 14, 2021 6:16 pm

Re: Braille encoding

Post by forget-e »

Hi, Radu. Thanks for your response!

I'm using TEI, if that's what you mean. I'm quite new to XML/TEI so I'm still figuring out the basics as I work through this project and another one simultaneously.

It's actually easier for me to look up the braille special characters rather than typing it out in plain English and having the software translate it to the braille characters for me. I'm transcribing directly from the braille book itself. I'm not fluent in braille yet, but I know enough that it's better to do it that way - and it's helping me learn more braille as I go, so I like that process.

I'm just wondering if there might be a reason why I'd want the hexadecimal code rather than the special character itself in my XML/TEI document, for instance if it might make searching or queries easier later on? As I said, I'm still learning XML/TEI and I'm using this as a research-creation project, so I'm not 100% sure what I'll be looking for/doing with it once I've finished the transcription. But there doesn't seem to be a way to translate the special character to the hexadecimal automatically (unless there is and I'm missing it? That would be perfect if there is, honestly) so I'd hate to get to the end of the project and realize I have to re-transcribe everything because I needed the hexadecimal rather than the actual braille characters.

If there's no difference in the end, then I'd definitely prefer to keep using the special characters because I can sight-proofread those a lot easier than the hexadecimal because I know the braille characters.
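On the worry about re-transcribing: converting literal braille characters to hexadecimal references (and back) is a mechanical step that can be scripted after the fact. A minimal Python sketch, assuming the text uses the Unicode Braille Patterns block (U+2800 to U+28FF); the function names and the sample `<w>` element are illustrative only.

```python
import re

# Unicode Braille Patterns block: U+2800 through U+28FF.
BRAILLE = re.compile(r"[\u2800-\u28FF]")
# Hexadecimal character references that point into the same block.
HEX_REF = re.compile(r"&#x(28[0-9A-Fa-f]{2});")


def braille_to_hex_refs(text: str) -> str:
    """Replace each literal braille character with a reference such as &#x2801;."""
    return BRAILLE.sub(lambda m: f"&#x{ord(m.group()):X};", text)


def hex_refs_to_braille(text: str) -> str:
    """Replace each braille hexadecimal reference with the literal character."""
    return HEX_REF.sub(lambda m: chr(int(m.group(1), 16)), text)


sample = "<w>\u2813\u2811</w>"  # two braille cells inside a sample element
converted = braille_to_hex_refs(sample)
print(converted)
print(hex_refs_to_braille(converted) == sample)
```

Only characters in the braille block are touched, so markup and other text pass through unchanged, and the round trip returns the original string.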
Radu
Posts: 9431
Joined: Fri Jul 09, 2004 5:18 pm

Re: Braille encoding

Post by Radu »

Hi,

From the point of view of an application interpreting the XML content, a character and its hexadecimal character reference are perfectly equivalent.
The only time I would use the hexadecimal character reference instead of the character itself is when the XML document declares in its header a very restrictive encoding like "ASCII".
For example, a document like this:

Code:

<?xml version='1.0' encoding='ASCII'?>
<root>
  ऎ
</root>
cannot be saved with ASCII encoding because it contains a non-ASCII character (the Devanagari letter ऎ, U+090E), so the character must be written as a hexadecimal character reference in order to save the content:

Code:

<?xml version='1.0' encoding='ASCII'?>
<root>
  &#x90e;
</root>
But by default, XML documents are saved with UTF-8 encoding, which can encode any Unicode character directly, so something like this can be saved as is:

Code:

<?xml version='1.0' encoding='UTF-8'?>
<root>
  ऎ
</root>
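
The equivalence can be checked with any XML parser. A minimal sketch using Python's standard-library xml.etree.ElementTree as a stand-in for "an application interpreting the XML content"; the braille character U+2801 is just a sample value. Note that this particular serializer writes decimal rather than hexadecimal references when asked for an ASCII encoding.

```python
import xml.etree.ElementTree as ET

# The same content written as a literal character and as a hexadecimal reference.
literal = ET.fromstring("<root>\u2801</root>")
reference = ET.fromstring("<root>&#x2801;</root>")

# After parsing, both forms yield the identical one-character string.
print(literal.text == reference.text)

# Serializing to an ASCII encoding forces a character reference, because
# the character itself cannot be represented in ASCII.
ascii_bytes = ET.tostring(literal, encoding="us-ascii")
print(ascii_bytes)

# Parsing the ASCII output gives back the original character.
print(ET.fromstring(ascii_bytes).text == "\u2801")
```

Since the parsed text is the same either way, searches and queries run against the parsed document should not be affected by which form appears in the file.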
Regards,
Radu