From: Hart, Edwin F.
Sent: 21 June, 1999 10:39
To: 'John Safarig'
Cc: 'Winkler, Arnold'
Subject: RE: SGML, ASCII, ISO 8859-1, etc.

Dear John,

No offense, but I think that this particular technical subject merits a direct conversation and that using an intermediary will result in too much information being lost.

First of all, I recommend that you spend the money to purchase the standards because it's difficult to discuss them if you have not seen them. In particular, you need to examine the code tables in the standards. I suspect that the cost will be about $50-$100. You should buy:

ANSI X3.4-1986 (7-bit ASCII)
ANSI/ISO 8859-1
ANSI/ISO 646 (optional, only if you are really interested)

Contact ANSI for ordering information at

Attn: Customer Service
American National Standards Institute
1 W. 42nd St.
New York, NY 10026
1-212-642-4900
1-212-302-1286 (fax)

Second, you should understand what a code is. The jargon is a "coded character set". Each coded-character-set standard lists a set of characters, assigns a unique numeric value (code position) to each character in the set, names each character, and shows an example picture of the character. Typically, the standards illustrate the example characters in the code with a code table.

Third, you need to be aware that one can use two numbering systems to specify the code value (code position) of a character: decimal (base 10) and hexadecimal (base 16). I believe that HTML allows both but I am unsure of the exact notation. You will need to check this. When you see the 7-bit ASCII and the 8859-1 code tables, you can see why one might want to use hexadecimal notation over decimal. Hexadecimal uses the digits 0-9, and then the letters A to F (for 10 to 15).

The code value for a character is its position in the code table. Let's take the LATIN CAPITAL LETTER A. It is located in column 4 and row 1 of the ASCII and 8859-1 code tables. In decimal, its code position is 65; in hexadecimal, it is 0x41 ((4 x 16) + 1 = 65 decimal).
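
The column-and-row arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function name code_position is an illustrative assumption, not something taken from the standards, and it assumes the usual code-table layout of 16 rows per column.

```python
def code_position(column: int, row: int) -> int:
    """Return the code position of a cell in a code table with 16 rows per column.

    The name and signature are hypothetical, chosen only for this illustration.
    """
    if not (0 <= column <= 15 and 0 <= row <= 15):
        raise ValueError("column and row must each be in the range 0-15")
    return column * 16 + row


# LATIN CAPITAL LETTER A sits in column 4, row 1.
position = code_position(4, 1)
print(position)       # decimal notation
print(hex(position))  # hexadecimal notation
print(chr(position))  # the character at that code position
```

For column 4, row 1 this prints 65, 0x41, and A. The same function can be applied to any other cell of the table.
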
Similarly, the LATIN CAPITAL LETTER Z is located in column 5, row 10 and has a code position of 90 in decimal and 0x5A in hexadecimal ((5 x 16) + 10 = 90 decimal).

For your information, the US 7-bit ASCII code (X3.4-1986) is the archetype of contemporary ISO standard codes. It is the standard code referenced by the SMTP e-mail standard (RFC) and the character set referenced in the ANSI/ISO C Programming Language Standard. (IBM uses its EBCDIC family of codes, but this is a different topic.)

7-bit ASCII is a code table of 128 code positions for 95 graphic characters (printing characters) and 33 control characters. ISO 8859-1 defines graphic characters at 191 code positions within a code table of 256 code positions. ISO/IEC 646 defines graphic characters at 95 code positions within a code table of 128 code positions. The 95 characters comprising the "International Reference Version" or IRV of ISO/IEC 646 are coded exactly the same in ASCII, ISO/IEC 646 IRV, and ISO/IEC 8859-1. (That is, the LATIN CAPITAL LETTER A has the same code position (65/0x41) in each code table.)

The following table describes the layout of the code tables for ASCII, ISO/IEC 646 IRV, ISO/IEC 8859-1, and the Microsoft Windows 3.1 codes.

Code Positions   ASCII                ISO 646 IRV              ISO/IEC 8859-1                 MS Windows 3.1
0-31 (00-1F)     C0 Controls          undefined (C0 controls)  undefined (C0 controls)        undefined
32-126 (20-7E)   Graphic Characters   Graphic Characters       Graphic Characters             Graphic Characters
127 (7F)         DELete (C0 control)  undefined (C0 controls)  undefined (C0 controls)        undefined
128-159 (80-9F)  undefined            undefined                undefined (C1 controls)        MS Graphic Characters
160-255 (A0-FF)  undefined            undefined                Additional Graphic Characters  Additional Graphic Characters

Notes on Control Characters

ISO defines the C0 and C1 control characters in ISO/IEC 6429. The C0 controls defined in ASCII are either the same as or equivalent to the C0 controls in ISO/IEC 6429.
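
The layout in the table above can be sketched in Python as a rough cross-check of the figures quoted. The function name region is a hypothetical helper, and the sketch assumes that Python's "latin-1" codec corresponds to ISO/IEC 8859-1 and that its "cp1252" codec can stand in for the Windows code table (it reflects a later revision than Windows 3.1).

```python
import unicodedata


def region(position: int) -> str:
    """Name the ISO/IEC 8859-1 table region holding a code position.

    Hypothetical helper; the region boundaries follow the layout table.
    """
    if not 0 <= position <= 0xFF:
        raise ValueError("8-bit code positions run from 0 to 255")
    if position <= 0x1F or position == 0x7F:
        return "C0 control area"
    if position <= 0x7E:
        return "graphic characters"
    if position <= 0x9F:
        return "C1 control area"
    return "additional graphic characters"


# Count the positions at which ISO/IEC 8859-1 defines graphic characters.
graphic = [p for p in range(256) if "graphic" in region(p)]
print(len(graphic))

# LATIN CAPITAL LETTER A is coded identically in ASCII and ISO/IEC 8859-1.
print("A".encode("ascii") == "A".encode("latin-1"))

# Position 0x93: a C1 control in 8859-1, a graphic character in Windows.
# Assumption: Python's cp1252 codec stands in for the Windows code table.
print(unicodedata.category(bytes([0x93]).decode("latin-1")))
print(unicodedata.name(bytes([0x93]).decode("cp1252")))
```

Run as written, this prints 191 (95 plus 96 graphic positions), True, Cc (the Unicode category for control characters), and LEFT DOUBLE QUOTATION MARK.
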
Notes on Graphic Characters

Graphic characters from code positions 20-7E are the same across these codes. The additional graphic characters from code positions A0-FF are the same across the 8859-1 and Windows codes. The additional graphic characters consist of 96 additional symbols and accented letters. The Windows code page replaces the C1 control character positions with its own set of 32 graphic characters.

There is an additional code standard of which you need to be aware and that is becoming increasingly important. It is a multibyte code with the goal of encoding all of the world's characters. It is called ISO/IEC 10646-1. In addition, the Unicode Consortium defines the Unicode Standard, which describes 10646 and how to implement it. The 10646-1 standard is very expensive (over $300 when I purchased it in 1993). The Unicode Standard, Version 2.0 book is much more reasonably priced at about $60 or $70 (USD). However, in 2000 ISO will be publishing a second edition of 10646-1 and Unicode will be publishing version 3.0.

For your information, the ISO/IEC 8859-1 standard defines the first 256 code positions (0 to 255) of ISO/IEC 10646-1 (and Unicode). Thus, you can see how the ISO coded-character-set standards build on each other.

If you are using Windows (3.1, 95, 98, or NT), go to Accessories, Character Map and you can see the graphic characters in the Windows Code Table (but be sure the font is Arial or Times New Roman rather than Symbol or WingDings).

I hope that this explanation clarifies the situation. Good luck with your web site.

Best regards,
Ed Hart
1-443-778-6926
edwin.hart@jhuapl.edu

-----Original Message-----
From: John Safarig [mailto:johnsafarig@yahoo.com]
Sent: 21 June, 1999 00:56
To: Edwin.Hart@jhuapl.edu
Subject: Re: SGML, ASCII, ISO 8859-1, etc.

Dear Mr. Ed Hart,

Hello, how are you? I apologize for my delay in contacting you. I am currently out of town and will be for some time. That is why I use a Yahoo!
mail account because I cannot access my service provider's account. I am very pleased to know you are willing to help me out. I appreciate it sincerely.

Since I will not be in one place for a while, which limits my phone communications, I contacted my mother and asked her to take the call for me, if you don't mind. I left her a notepad with my precise queries and hopefully you can help. You can reach her at the following number: 813-685-0055 (that's a Florida number). It is also her workplace (she is a manager at an apartment complex); just ask for "Diana".

Basically, so you know what to expect, I am in the process of building a website for my company, "Safari Graphics Inc." (or Safarig). I am writing a lot of the HTML myself; however, I came across ASCII and ISO-8859-1 and I was confused about how they both work. It is my understanding that ASCII is a binary code (7-bit) that represents letters, numbers, etc. in data processing and translating. I believe that ISO-8859-1 is used for escape codes in HTML. However, I noticed that they warned users that numbers above 127 may not be compatible with all platforms. I don't understand this because I thought that they were "standards". Then they list entity references such as &lt; along with numeric references like ? So is ISO-8859-1 comprised of numeric and entity references combined for decimals 00 through 255? Or are the entity references categorized as ISO 8879:1986 Added Latin? I can't seem to find a complete and accurate listing for these. Everyone refers to it as HTML special characters and not what ISO defines as the complete standard character set for 8859-1. ASCII is new to me, thus I'm not sure how it works internally.

I really hope I'm not asking too much of you, but I can't seem to find the answers. I have a lot to learn, but I have big dreams for this web thing. I'm interested in a long-term relationship with computers. Mainly, one day I'd like my company to produce a browser. Long way off, but I love the idea.
Hopefully soon, you can visit my site at what should be "Safarig.com".

Well Mr. Hart, I thank you for your time and your willingness to help. Maybe one day I could return the favor. Take care.

Sincerely,
John Safarig
JohnSafarig@yahoo.com

--- "Hart, Edwin F." wrote:
> Dear Mr. John Safarig,
>
> I know nothing about SGML nor ISO 8879. (I last worked with IBM's
> GML/SCRIPT/DCF products about 9-10 years ago.) However, I do know something
> about ASCII, ISO/IEC 646, and ISO/IEC 8859-1.
>
> Please give me a call so that we can discuss your concerns.
>
> Best regards,
> Ed Hart
> 1-443-778-6926
> edwin.hart@jhuapl.edu