EBCDIC Encoding with .NET
Posted by Jon Skeet on Thursday, August 26, 2004 (EST)

A discussion of IBM's EBCDIC character encoding, and a library to help developers to use it in .NET.

Technology: Algorithms.

debug.zip (10.59 KB)
release.zip (6.58 KB)
source.zip (19.02 KB)

After reading a post on the C# newsgroup asking for an EBCDIC to ASCII converter, and seeing one solution, I decided to write my own implementation. This page describes the implementation and its limitations, and a bit about EBCDIC itself.

EBCDIC

Unfortunately it appears to be fairly tricky to get hold of many concrete specifications of EBCDIC. This is what I've managed to glean from various websites:

  • Introduced by IBM, EBCDIC is an encoding mostly used on mainframes.
  • Like "OEM", EBCDIC isn't a single character encoding: there are many EBCDIC encodings, suited to different cultures.
  • It is primarily a single-byte encoding, i.e. each character is encoded as a single byte. However, there are two characters, "shift out" and "shift in" (0x0e and 0x0f respectively), which are used to change between this and a double-byte character set (DBCS). As far as I can tell, a single EBCDIC encoding doesn't specify which DBCS is to be used - in other words, you really need even more information before you can tell what's going on. Presumably the DBCS in question can't have any pairs beginning with byte 0x0f, as otherwise it would be confused with the "shift in" flag.
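
To make the shift mechanism concrete, here is a minimal sketch that locates the byte runs lying between a "shift out" and the following "shift in". It is not part of the library described on this page; the class and method names (ShiftScanner, FindDoubleByteRuns) are hypothetical, and it assumes, as guessed above, that no double-byte pair begins with 0x0f.

```csharp
using System;
using System.Collections.Generic;

public static class ShiftScanner
{
    const byte ShiftOut = 0x0e; // switches to double-byte mode
    const byte ShiftIn = 0x0f;  // switches back to single-byte mode

    // Returns {start, length} pairs for each run of bytes between a
    // shift out and the next shift in (or the end of the data).
    public static List<int[]> FindDoubleByteRuns(byte[] data)
    {
        List<int[]> runs = new List<int[]>();
        int runStart = -1; // -1 means we are in single-byte mode
        int i = 0;
        while (i < data.Length)
        {
            if (runStart < 0)
            {
                if (data[i] == ShiftOut)
                {
                    runStart = i + 1;
                }
                i++;
            }
            else if (data[i] == ShiftIn)
            {
                // Shift in is only looked for at the start of a pair
                // (assumption: no pair begins with 0x0f).
                runs.Add(new int[] { runStart, i - runStart });
                runStart = -1;
                i++;
            }
            else
            {
                i += 2; // skip one whole double-byte pair
            }
        }
        if (runStart >= 0)
        {
            // Data ended while still shifted out.
            runs.Add(new int[] { runStart, data.Length - runStart });
        }
        return runs;
    }
}
```

For the bytes C1 0E 41 0F 42 43 0F C2 this reports a single run starting at offset 2 with length 4: the 0x0f at offset 3 is treated as the second byte of a pair, not as a shift in. Decoding the run itself would still need to know which DBCS is in use.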

If you have any more information, particularly about the DBCS aspect, please mail me at skeet@pobox.com.

My EBCDIC Encoding implementation

I managed to get hold of details of 47 EBCDIC encodings from http://std.dkuug.dk/i18n/charmaps/. To be honest, I don't really know what DKUUG is, so I'm really just hoping that the maps are accurate - they seem to be quite reasonable though. Each encoding has a name and several have aliases, although I currently ignore this aliasing.

My implementation consists of three projects, described below, of which only the middle one is of any interest to most people.

  • A character map reader: This simply finds all of the files whose names begin with "EBCDIC-" in the current directory, reads them all in (warning of any oddities in the encoding, such as any non-zero byte having two distinct meanings) and writes a resource file out, ebcdic.dat. This is a console application built from a single C# source file.
  • An encoding library: This is a library built from two C# source files and the ebcdic.dat file generated by the reader. This library is all most users will need. More details are provided below.
  • A test program: This is a console application built from a single C# source file and requiring the library described above. Currently it just displays the encoded version of "hello" and then decodes it.
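
As a rough illustration of what the character map reader has to do, the sketch below parses mapping lines into a 256-entry table and records a warning when one byte is given two different meanings. It is not the reader's actual source: the names (CharMapSketch, Parse) are hypothetical, and the line shape it expects, `<name> /xHH <Uhhhh> description`, is an assumption about the charmap files rather than something stated on this page. Unlike the real reader, it reports conflicts for every byte, including zero.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class CharMapSketch
{
    // Assumed line shape: <name> /xHH <Uhhhh> description
    static readonly Regex MappingLine =
        new Regex(@"/x([0-9A-Fa-f]{2})\s+<U([0-9A-Fa-f]{4})>");

    // Builds a byte-to-character table; bytes with no mapping stay null.
    // Conflicting mappings are reported in 'warnings' and the first one wins.
    public static char?[] Parse(IEnumerable<string> lines, List<string> warnings)
    {
        char?[] table = new char?[256];
        foreach (string line in lines)
        {
            Match m = MappingLine.Match(line);
            if (!m.Success)
            {
                continue; // comment, header or other non-mapping line
            }
            int b = int.Parse(m.Groups[1].Value, NumberStyles.HexNumber);
            char c = (char)int.Parse(m.Groups[2].Value, NumberStyles.HexNumber);
            if (table[b].HasValue && table[b].Value != c)
            {
                warnings.Add(string.Format(
                    "Byte 0x{0:x2} maps to both U+{1:X4} and U+{2:X4}",
                    b, (int)table[b].Value, (int)c));
                continue;
            }
            table[b] = c;
        }
        return table;
    }
}
```

The real reader goes one step further and writes all of the tables out to ebcdic.dat; that resource format is not described here, so the sketch stops at the in-memory table.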

Using The Encoding Library

The encoding library is very simple to use, as the encoding class (JonSkeet.Ebcdic.EbcdicEncoding) is a subclass of the standard .NET System.Text.Encoding class. To obtain an instance of the appropriate encoding, call EbcdicEncoding.GetEncoding(string), passing it the name of the encoding you wish to use (e.g. EBCDIC-US). You can list the names of the available encodings using the EbcdicEncoding.AllNames property, which returns them as an array of strings.

Once you have obtained an EbcdicEncoding instance, use it like any other Encoding: call GetString, GetBytes etc. The encoding does not save any state between requests, and can safely be used by many threads simultaneously. There is no need (or indeed facility) to release encoding resources when it is no longer needed. All encodings are created on the first use of the EbcdicEncoding class, and maintained until the application domain is unloaded.
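
A short sketch of those calls in practice is shown here. It uses only the members named above (EbcdicEncoding.GetEncoding and EbcdicEncoding.AllNames) together with standard System.Text.Encoding methods; the class name EncodingDemo is just for illustration, and the program needs a reference to the library to compile.

```csharp
using System;
using System.Text;
using JonSkeet.Ebcdic;

public class EncodingDemo
{
    public static void Main()
    {
        // List every encoding name the library knows about.
        foreach (string name in EbcdicEncoding.AllNames)
        {
            Console.WriteLine(name);
        }

        // Obtain one encoding by name and use it like any other Encoding.
        Encoding ebcdic = EbcdicEncoding.GetEncoding("EBCDIC-US");
        byte[] encoded = ebcdic.GetBytes("hello");
        Console.WriteLine(BitConverter.ToString(encoded));
        Console.WriteLine(ebcdic.GetString(encoded));

        // Convert the EBCDIC bytes straight to ASCII bytes in memory.
        byte[] ascii = Encoding.Convert(ebcdic, Encoding.ASCII, encoded);
        Console.WriteLine(BitConverter.ToString(ascii));
    }
}
```

Encoding.Convert is handy when the data is already in memory; for whole files, wrapping the streams in a StreamReader and StreamWriter is usually simpler.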

Sample Code

The following is a sample program to convert a file from EBCDIC-US to ASCII. It should be easy to see how to modify it to convert the other way, or to use a different encoding (e.g. from EBCDIC-UK, or to UTF-8).

using System;
using System.IO;
using System.Text;
using JonSkeet.Ebcdic;
public class ConvertFile
{
    public static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.WriteLine 
                ("Usage: ConvertFile <input file> <output file>");
            return;
        }
        
        string inputFile = args[0];
        string outputFile = args[1];
        Encoding inputEncoding = EbcdicEncoding.GetEncoding ("EBCDIC-US");
        Encoding outputEncoding = Encoding.ASCII;
        
        try
        {
            // Create the reader and writer with appropriate encodings.
            using (StreamReader inputReader = 
                      new StreamReader (inputFile, inputEncoding))
            {
                using (StreamWriter outputWriter = 
                           new StreamWriter (outputFile, false, outputEncoding))
                {
                    // Create an 8K-char buffer
                    char[] buffer = new char[8192];
                    int len=0;
                    
                    // Repeatedly read into the buffer and then write it out
                    // until the reader has been exhausted.
                    while ( (len=inputReader.Read (buffer, 0, buffer.Length)) > 0)
                    {
                        outputWriter.Write (buffer, 0, len);
                    }
                }
            }
        }
        // Not much in the way of error handling here - you may well want
        // to do better handling yourself!
        catch (IOException e)
        {
            Console.WriteLine ("Exception during processing: {0}", e.Message);
        }
    }
}

Limitations

Due to the lack of available information about the DBCS aspect of EBCDIC, this encoding class makes no effort whatsoever to simulate proper shifting. Shift out and shift in are merely encoded/decoded to/from their equivalent Unicode characters, and bytes between them are treated as if the shift had not taken place. (This means that a decoded byte array is always a string of the same length as the byte array, and vice versa).

Any byte not recognised to be from the specific encoding being used is decoded to the question mark character, '?'. Any character not recognised to be in the set of characters encoded by the specific encoding being used is encoded to the byte representing the question mark character, or to byte zero if the question mark character is not in the character set either.
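
The behaviour described in the two paragraphs above can be sketched with a simple table-driven codec. This is an illustration under assumptions, not the library's source: TableCodec is a hypothetical class, and it expects a 256-entry table in which null marks a byte that has no meaning in the encoding.

```csharp
using System;
using System.Collections.Generic;

public class TableCodec
{
    readonly char?[] byteToChar; // 256 entries; null = byte not in the encoding
    readonly Dictionary<char, byte> charToByte = new Dictionary<char, byte>();
    readonly byte unknownByte;

    public TableCodec(char?[] byteToChar)
    {
        if (byteToChar.Length != 256)
        {
            throw new ArgumentException("Table must have 256 entries");
        }
        this.byteToChar = byteToChar;
        for (int b = 0; b < 256; b++)
        {
            if (byteToChar[b].HasValue)
            {
                charToByte[byteToChar[b].Value] = (byte)b;
            }
        }
        // Unknown characters become the byte for '?' if the character set
        // has one, or byte zero otherwise.
        byte question;
        unknownByte = charToByte.TryGetValue('?', out question) ? question : (byte)0;
    }

    // One character per byte: shift out/in get no special treatment.
    public string Decode(byte[] bytes)
    {
        char[] chars = new char[bytes.Length];
        for (int i = 0; i < bytes.Length; i++)
        {
            char? c = byteToChar[bytes[i]];
            chars[i] = c.HasValue ? c.Value : '?';
        }
        return new string(chars);
    }

    // One byte per character, falling back to unknownByte.
    public byte[] Encode(string text)
    {
        byte[] bytes = new byte[text.Length];
        for (int i = 0; i < text.Length; i++)
        {
            byte b;
            bytes[i] = charToByte.TryGetValue(text[i], out b) ? b : unknownByte;
        }
        return bytes;
    }
}
```

Because every byte becomes exactly one character and every character exactly one byte, the output length always matches the input length, which is the property noted above.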

The library doesn't currently have a strong name, so it can't be placed in the GAC. You may, however, download the source and modify it yourself.

Licence

This was just an interesting half-day project. I have no desire to make any money out of this code whatsoever, but I hope it's interesting and useful to others. So, feel free to use it. If you have any questions about it, or just find it useful and wish to let me know, please mail me at skeet@pobox.com. You may use this code in commercial projects, either in binary or source form. You may change the namespace and the class names to suit your company, and modify the code if you wish. I'd rather you didn't try to pass it off as your own work, and specifically you may not sell just this code - at least not without asking me first. I make no claims whatsoever about this code - it comes with no warranty, not even the implied warranty of fitness for purpose, so don't sue me if it breaks something. (Mail me instead, so we can try to stop it from happening again.)

History

  • August 31st 2003, v1.0.0.1 - no in-code changes, just made the XML documentation build correctly.
  • August 28th 2003, v1.0.0.1 - slight tweaking to remove unnecessary (and probably counterproductive) efficiency measure. No functional changes.
  • May 21st 2003, v1.0.0.0 - initial implementation.

Author: Jon Skeet
