28 July 2004 @ 10:33 pm
Unicode keyboard (in progress)  
I've been working on a "US Unicode" keymap off and on lately. It should be more "off" than "on"; MS Keyboard Layout Creator seems to have become my procrastination tool of choice aside from the Internet. Anyway, it's not done, but it seems to be shaping up nicely. It's a lot more ad hoc than my US Latin-1 keymap; instead of being designed to make characters from a well-defined, limited charset easily accessible, it's just meant to make usable a bunch of characters from the Unicode repertoire that I thought would be handy or just nifty.

Although I'm calling it "US Unicode", it obviously doesn't (and won't) contain the entire Unicode repertoire. That'd be impossible in a simple keymap. Instead, it provides a bunch of (mostly non-decomposable) characters from the Unicode Latin ranges, the more common Combining Diacritical Marks (not including some of the IPA diacritics and particularly obscure ones), plus some General Punctuation, and a smattering of Greek, Mathematical Symbols, Arrows, and other stuff. The repertoire should, I believe, cover all of the characters necessary to write the languages covered by the ISO Latin charsets (but using combining diacritics instead of precomposed accented characters) plus Vietnamese, and more.

If you see anything that doesn't make sense, or doesn't seem like a good idea, or if you just have any ideas, please tell me. More than the Latin-1 keymap, this one is "just for fun", but I still want it to be handy.

The "unshifted" and "shift key" shift states are identical to the standard US-ASCII keymap, so it doesn't get in the way of your normal typing. The only difference is that shift spacebar, instead of being identical to unshifted space, is "hair space" (a very narrow space).

The keymap in the AltGr (right alt or ctrl+alt) shift state:

And in the "AltGr Shift" shift state:

You'll note that the B key doesn't do anything in these. That's because I haven't thought of anything for it to do yet. As I said, it's unfinished, although it's mostly usable. Also, the diacritics in the AltGr shift states are combining diacritics: you type them after the letter you want to apply them to—this is the opposite order from the dead-key-based Latin-1 keymap, but probably more intuitive, and more flexible as well: you can add as many diacritics as you like to a character.
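To make the "type the letter first, then stack diacritics" idea concrete, here is a minimal Python sketch. It only models the resulting text, not the keymap itself; the choice of macron and acute (AltGr 9 and AltGr ' in the list below) is just for illustration, and the variable names are made up.

```python
import unicodedata

# Two of the combining marks the keymap offers (code points per the Unicode charts).
MACRON = "\u0304"  # COMBINING MACRON
ACUTE = "\u0301"   # COMBINING ACUTE ACCENT

# Typing order with combining marks: base letter first, then each diacritic.
typed = "o" + MACRON + ACUTE

# Show what the sequence is made of.
for ch in typed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Count how many of the code points are combining marks
# (unicodedata.combining returns a nonzero class for them).
combining_count = sum(1 for ch in typed if unicodedata.combining(ch) != 0)
```

The string is three code points long but should render as a single accented letter, assuming the font and renderer cooperate.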

Whenever possible, if a character comes in a case pair (has lowercase and uppercase versions), the shift state reflects this. Not all do, though.

I'll list the characters here, because it can be kind of tough to make them out in some cases. If only one character name is given, it's lowercase for AltGr and uppercase for AltGr Shift. Otherwise, the first character listed by a key is reached by AltGr and the second by AltGr Shift. Some explanations are given in parentheses.

Numbers row:
`: combining Grave accent; combining Tilde
1: roman numeral I (dead key: numerals); inverted exclamation point (Spanish)
2: trademark; registered trademark
3: pound; euro
4: cent sign; yen
5: permille (per thousand); permyriad (per ten thousand)
6: combining Caron/Hacek; combining Circumflex
7: combining Rhotic hook; reversed question mark ("symbol for substitute form two", whatever that means)
8: infinity; dagger
9: combining Macron accent; combining Breve accent
0: combining Ring Above; combining Inverted Breve
-: en dash; em dash
=: equals sign (dead key: mathematical relations); plus sign (dead key: mathematical operators and misc. math)

Top row:
Q: O-slash
W: wynn (Old English equivalent to "w")
E: open e/epsilon
R: smallcaps R; Old Norse yr
T: thorn (Icelandic/Old Norse/Old English character for "th")
Y: psi
U: latin letter OU (used in Algonquin)
I: latin letter OI (pan-Turkic latin alphabets)
O: OE ligature
P: pi
[: left single guillemet (angle quotation mark); left double guillemet (chevron quotation mark)
]: right single guillemet (angle quotation mark); right double guillemet (chevron quotation mark)
\: backslash (dead key: single arrows & symbols); broken pipe (dead key: double arrows and symbols)

Middle row:
A: ash (ae ligature)
S: esh
D: eth (Icelandic/Old English character for voiced "th")
F: phi
G: yogh (Middle English "gh")
H: hwair (Gothic "hw")
J: ezh
K: kra (Greenlandic); glottal stop
L: L with middle dot (Catalan)
;: combining Diæresis/Umlaut; combining Dot Above
': combining Acute Accent; combining Double Acute

Bottom row:
Z: long s (archaic German non-final small s); sharp S (German ess-zet, long s small s ligature)
X: schwa
C: combining Cedilla; copyright sign
V: upsilon
B: (currently unassigned)
N: eng (velar N, like "ing")
M: micro sign; minim (or Scorpio sign)
,: combining Comma Below; combining Comma Above
.: combining Ogonek; combining Hook
/: fraction slash (dead key: fractions); inverted question mark

Spacebar: non-breaking space (dead key: spacing equivalents of diacritics, misc.); zero width space (dead key: spaces, soft hyphen)
Decimal point (number pad period): middle dot; figure dash

Technically, the rhotic hook (AltGr 7) is not a combining character, it's a modifier letter, but it's always attached to a preceding letter so it may as well be.

The dead keys provide extra characters (mostly symbols) that don't fit elsewhere. Hit the dead key combination and then another key to get the additional characters:

AltGr space: spacing characters
any combining diacritic: spacing equivalent of the diacritic (mostly from Spacing Modifier Letters, but some from elsewhere)
hyphen: dictionary hyphenation point
P: pilcrow (paragraph symbol)
lowercase a and o: Spanish feminine and masculine ordinals (respectively)
1, 2, and 3: superscript equivalents
0 (number zero): degree sign
tilde (non-combining): swung dash
lowercase s: section sign
semicolon: reversed semicolon
8: two asterisks stacked vertically
asterisk: asterism (three asterisks in a triangle)
infinity (AltGr 8): reference mark
dagger (AltGr *): double dagger
space: non-breaking space

AltGr shift space: additional spaces
hyphen: soft hyphen
nonbreaking space (AltGr space): narrow no-break space
hair space (shift space): thin space
en dash (AltGr hyphen): en space
em dash (AltGr shift hyphen): em space
space: zero-width space (word break)

AltGr backslash: single arrows and symbols
numbers: basic arrows. This is intended for use with the number pad with numlock on. Each number gives an arrow pointing in the same direction on the numpad (2 is down, 6 right, etc.). 5 is a two-headed left-right arrow, 0 is a counterclockwise circular arrow
period/decimal point: clockwise circular arrow
equals sign: left arrow over right arrow
slash: left-down corner arrow
asterisk: down-right corner arrow
backslash: paired up and down arrows
pipe: two-headed up-down arrow
lowercase z: downward zigzag arrow
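
The numpad arrangement is easier to see as a table. This is a rough Python sketch of the digit-to-arrow mapping described above; the specific code points are my assumption (the plain arrows from the Unicode Arrows block), and `NUMPAD_ARROWS` and `numpad_grid` are hypothetical names, not anything from MSKLC.

```python
import unicodedata

# Assumed mapping: each numpad digit gives the arrow pointing toward
# that key's position on the pad; 5 is the two-headed left-right arrow.
NUMPAD_ARROWS = {
    "7": "\u2196", "8": "\u2191", "9": "\u2197",
    "4": "\u2190", "5": "\u2194", "6": "\u2192",
    "1": "\u2199", "2": "\u2193", "3": "\u2198",
}

def numpad_grid():
    """Return the arrows laid out in the same three rows as the number pad."""
    rows = ["789", "456", "123"]
    return [" ".join(NUMPAD_ARROWS[d] for d in row) for row in rows]

for line in numpad_grid():
    print(line)
```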

AltGr pipe (shift backslash): double arrows and game symbols
numbers: thick (double stem) arrows. Same arrangement as with AltGr backslash, but no 0
pipe: two-headed thick up-down arrow
k q r b n p: chess pieces (king, queen, rook, bishop, knight, pawn). Lowercase is white, uppercase is black
I'll probably add card suits too

AltGr equals: mathematical comparisons
tilde (non-combining): approximately equal to
slash: not equal to
less than, greater than: less than or equal to, greater than or equal to
lowercase e, uppercase e: member of the set of, not a member of the set of
lowercase schwa, uppercase schwa: contains, does not contain
square brackets: left and right single arrows (I may remove this because of the AltGr backslash arrows)
curly brackets: left and right double arrows (may remove because of AltGr pipe)

AltGr plus (shift equals): mathematical operators
hyphen: minus sign
plus: plus or minus sign
lowercase x: multiplication (cross product) sign
asterisk: multiplication (dot product) sign
ampersand: and operator
pipe: or operator
exclamation point: not sign

AltGr one: roman numerals
digits 1–9: roman numerals I–IX
digit 0: roman numeral X
e and t: roman numerals XI and XII
l c d m: roman numerals L, C, D, M
shifted keys produce lowercase roman numerals
o, p, w: roman numerals for 1,000, 5,000, 10,000 (uppercase is the same)
u or U: roman numeral reversed 100 (for large numbers)

AltGr slash: fractions
digit 1: "one over" (superscript 1 with fraction slash)
digit 2: one half (½)
digit 3: three quarters (¾)
digit 4: one quarter (¼)
lowercase c: care of (c/o)
o and O: o-slash and O-slash
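
For anyone who hasn't dealt with dead keys before, the lists above boil down to a two-level lookup: dead key first, then the follow-up key. Here's a minimal Python sketch of that idea using a handful of the entries listed; `DEAD_KEYS` and `resolve` are hypothetical names, and returning `None` for an undefined pair is a simplification of mine, not how Windows actually handles it.

```python
# A few of the dead-key tables from the lists above, keyed by dead key name.
DEAD_KEYS = {
    "AltGr equals": {"/": "\u2260", "<": "\u2264", ">": "\u2265"},  # not equal, <=, >=
    "AltGr plus": {"-": "\u2212", "+": "\u00b1", "x": "\u00d7"},    # minus, plus-minus, times
    "AltGr slash": {"2": "\u00bd", "3": "\u00be", "4": "\u00bc"},   # half, 3/4, 1/4
}

def resolve(dead_key, next_key):
    """Look up the character for a dead key followed by next_key.

    Returns None when the pair has no entry (a simplification).
    """
    return DEAD_KEYS.get(dead_key, {}).get(next_key)

print(resolve("AltGr equals", "/"))
print(resolve("AltGr slash", "2"))
```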

An earlier version was abandoned because, since it only used AltGr-space as a dead key (which I thought was kinda cool), there was very little rhyme or reason to a lot of the symbols' keystrokes. Also, the en and em dashes were available through AltGr space hyphen and AltGr space underscore, respectively; pretty inconvenient keystrokes for characters that were to some extent the reason for the keymap's existence. This version dumps the combining long and short bar overlays (AltGr hyphen and underscore in the earlier version) and the combining long and short slash overlays (AltGr backslash and slash in the earlier version), opening up better positions for the dashes. The new dead keys were mainly prompted by having some positions with no obvious symbols to associate with them, and making extended symbols somewhat more sensible was kind of a side effect. AltGr 1 is still kind of a "yeah, whatever, here's something to stick there" assignment, since there's really no point to having special roman numeral characters (except maybe the very large ones that don't correspond to regular letters).

I may remove the O-slash entries from the fraction-slash list, since I assigned them to Q. I may actually drop the slash dead key entirely and just make it the fraction slash (since it's supposed to turn adjacent numbers into a fraction anyway, so the vulgar fractions are kind of redundant). I'd only lose the care-of symbol that way, but I may be able to fit that in somewhere else.
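
The redundancy is visible in the Unicode data itself: the vulgar fractions have compatibility decompositions that are just digit, fraction slash, digit. A small Python sketch (the helper name `make_fraction` is made up for illustration):

```python
import unicodedata

FRACTION_SLASH = "\u2044"  # FRACTION SLASH

def make_fraction(numerator, denominator):
    """Build a fraction the 'plain fraction slash' way: digits around U+2044."""
    return f"{numerator}{FRACTION_SLASH}{denominator}"

# Compatibility decomposition (NFKD) of the precomposed vulgar fractions.
half_decomposed = unicodedata.normalize("NFKD", "\u00bd")            # one half
three_quarters_decomposed = unicodedata.normalize("NFKD", "\u00be")  # three quarters

print(make_fraction(1, 2) == half_decomposed)
```

Whether digits around a fraction slash actually display as a neat fraction depends on the font and renderer, which is the one practical argument for keeping the precomposed ones around.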

One thing I've been pondering is using SGCAPS: setting it so capslock is not equivalent to shift, but is a separate shift state entirely. Letters would then not change case when capslock is on, but the keys that normally produce diacritics when AltGr is pressed would act as dead keys to produce precomposed accented characters. Since characters with combining diacritics are supposed to be interchangeable with equivalent precomposed characters, this is technically redundant, but the fact of the matter is that with current fonts and display technologies, precomposed characters tend to look better than ones that are composed on the fly. Also, it'd allow access to accented characters even when using software that only supports non-combining charsets like Latin-1 and not Unicode (a dedicated keymap like my US Latin-1 would probably be preferable in those cases though). It'd be a lot of tedious work, though (MSKLC's interface for assigning dead key combinations isn't very convenient), and I'm not sure it'd be worth it. It'd also make accidentally hitting the %$#@* capslock less likely to interfere with what you're typing (since a lot of the letters would not change), but it'd still occasionally cause problems (lowercase c would suddenly become a cedilla dead key, quotes would also be dead keys—similar to how US International confuses things) and you'd be less likely to notice that capslock is on (since there probably wouldn't be any immediate change). Something to think about, at least.
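
The "interchangeable but not really" situation can be illustrated with Python's `unicodedata` module. This is just a sketch of the text-level behavior, nothing to do with how MSKLC or SGCAPS works; the variable names are made up.

```python
import unicodedata

combining = "e\u0301"   # e followed by COMBINING ACUTE ACCENT
precomposed = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# Canonically equivalent: normalization converts one form into the other.
same_after_nfc = unicodedata.normalize("NFC", combining) == precomposed
same_after_nfd = unicodedata.normalize("NFD", precomposed) == combining

# Only the precomposed form survives in a non-combining charset like Latin-1.
latin1_bytes = precomposed.encode("latin-1")
try:
    combining.encode("latin-1")
    combining_fits_latin1 = True
except UnicodeEncodeError:
    combining_fits_latin1 = False

print(same_after_nfc, same_after_nfd, latin1_bytes, combining_fits_latin1)
```

So software that normalizes to NFC gets the precomposed character either way, but anything stuck with Latin-1 only ever sees the precomposed one.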

This has got to be one of the geekiest hobbies ever devised, bar none.
Current Mood: geeky
gwalla on July 30th, 2004 12:55 am (UTC)
Added B With Topbar (U+0182 and U+0183) for B. My desire for symmetry is bugging me now, since there's also D With Topbar, and the D key is already filled. Damn.

Also replacing the L With Middle Dot with Lambda, since the middle-dotted Ls are really the same as Ls followed by a middle dot (which is AltGr+decimal point) in Catalan (the middle dot is just added to a sequence of two Ls to differentiate it from the digraph "ll"). Considering dropping yogh, since it's very visually similar to ezh, even though it's possibly convenient for writing a voiced velar fricative in some potential conlang romanization. I'd replace it with gamma, but that brings up the question of which gamma, since Unicode provides two pairs: Latin Letter Gamma (used in some African languages), and Greek Letter Gamma. The Latin letters sound like the right thing (since they'd be mixed in with other Latin letters, as the keymap really isn't suitable for writing Greek), but the problem is the capital just looks like a larger form of the lowercase, instead of the cool-looking (IMO) upside-down L/gallows shape of the Greek capital. Since I'm already using some letters from the Greek block, I may as well use the Greek versions, I suppose.
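
Both points can be checked against the Unicode character database. A quick Python sketch, assuming nothing beyond the standard `unicodedata` module (variable names are made up):

```python
import unicodedata

# The precomposed Catalan letter has a compatibility decomposition
# to plain l followed by MIDDLE DOT.
l_middle_dot = "\u0140"  # LATIN SMALL LETTER L WITH MIDDLE DOT
l_decomposed = unicodedata.normalize("NFKD", l_middle_dot)

# The two gamma case pairs, looked up by their Unicode names.
latin_small = unicodedata.lookup("LATIN SMALL LETTER GAMMA")
latin_capital = unicodedata.lookup("LATIN CAPITAL LETTER GAMMA")
greek_small = unicodedata.lookup("GREEK SMALL LETTER GAMMA")
greek_capital = unicodedata.lookup("GREEK CAPITAL LETTER GAMMA")

for ch in (latin_small, latin_capital, greek_small, greek_capital):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Either pair works as a case pair, so the choice really does come down to which glyphs look better.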

Current thinking is to drop the smallcaps R and move the rhotic hook from 7 to R, keeping YR as the shifted equivalent. I'm just not seeing a use for smallcaps R, especially in the absence of other smallcaps letters (except for kra, which kinda counts). I'd only put it in there because I wanted YR, and smallcaps R is listed in the Unicode charts as being the lowercase equivalent thereof. I may also dump the Symbol For Substitute Form Two, despite looking like a rhetorical question mark, since it doesn't seem to be present in any fonts. Oh well. This would, incidentally, open up 7 entirely. I could possibly use it for combining short stroke overlay, or maybe a dead key for letter-with-stroke. Not sure.
gwalla on July 30th, 2004 12:58 am (UTC)
Or I could keep the smallcaps R, leave the rhotic hook on AltGr+7, and just drop the Symbol For Substitute Form Two, giving me only one open space to assign (which could be easier) and leaving the smallcaps R in case I want it sometime later (like if I wanted more than one kind of R sound in a conlang and didn't want to or couldn't use a digraph...hmm...)