


Contents

Issue 1, February 2005

EDITORIAL

Welcome to the first Free Software Magazine 7

FOCUS

Format Wars 8

by Marco Fioretti File formats: the past, the present and a possible future

XML: the answer to everything? 12 by Kay Ethier, Scott Abel

This article weighs the pros and cons of XML for some applications (publishing), and explores why it is the best possible solution for many programming and publishing needs.

Free file formats and the future of intellectual freedom 17 by Terry Hancock

Information as property may be served by closed file formats, but the freedom of information requires free formats

TECH WORLD

Creating Free Software Magazine 24 by Tony Mobily

A long path that takes us to the very beginning of this project

Mac OS X: Welcome to the jungle 30 by Chris J. Karr A look inside the Mac OS X software ecology

The magic of live CDs 35 by Harish Pillay What are live CDs, and how do they work?

Every engineer’s checklist for justifying free software 39 by Malcolm D. Spence

Free software is not just about “no license fees”!

Smarter password management 45 by John Locke How to handle your passwords without getting lost

WORD WORLD

The content tail wags the IT dog 49 by Daniel James

Without hardware and software, there would be nothing for digital media to be created on, or used with. And yet the content industry attempts to tell the far larger IT industry what it can and cannot do.

4 Free Software Magazine n. 1, February 2005

ivation ге DU (Sx 52 by Aaron E. Klemm

Wikipedia and PlanetMath show the way

by Christian Einfeldt

Freedom is free software’s competitive advantage by David M. Berry

The Commons as an Idea - Ideas as a Commons

by Tom Chance

Free software is not just about cost or stability: free software is a movement that mustn’t forget the principles which made it possible

by Richard Stallman Richard’s ( ), from September 2004 to October 2004

MAGAZINE

BY SUBSCRIBING YOU WILL BE SUPPORTING A

MAGAZINE WHICH BELIEVES IN FREE SOFTWARE. ALL OUR ARTICLES ARE RELEASED UNDER THE GNU FREE DOCUMENTATION LICENSE, ENHANCING EXISTING INFORMATION ON FREE SOFTWARE.

SUBSCRIBE NOW!

WWW.FREESOFTWAREMAGAZINE.COM/SUBSCRIBE



EDITORIAL

I would have liked to start this editorial defining what free software is, but I found myself writing and deleting my sentences time and again. The problem is that free software means different things to different people. To some, free software is a way to save money in licensing fees and technical support. To some, it's a way of sharing their skills (which they do for different reasons: research, personal development, money, etc). And to others free software is a movement, a way of life. Whatever the case, due to its many merits, free software's popularity is growing daily. Even non-geeks are discovering that most of the web sites that they visit run on free software (Apache); there is a valid alternative to Internet Explorer (Firefox); and their internet provider's network is secured by free software (Nexus, free firewall, etc).

And yet, until today there hasn't been a single magazine dedicated entirely to free software. As the Editor in Chief, I'm very excited because I’ve always wanted to be involved in a project like this. I’ve always considered myself a “free software consultant and advocate”, but I have never felt that I was giving enough back to the community. In a way, I consider Free Software Magazine to be my big opportunity and I believe that it's a big opportunity for free software, its users and its programmers. All of the articles are released under a free license six weeks after publication. This means that we'll steadily build up a library of valuable material, which can then be used both in technical and non-technical discussions by the public at large.

Now, this project is not risk-free. In the publishing industry you need numbers to make everything work. The more you print, the less you pay. The more readers you have, the more likely you are to get paying advertisers, and so it goes on. At the moment, nobody really knows what these numbers will be for a magazine on free software, simply because there's never been one.

I believe that we (myself, the staff, and the contributors) did a fantastic job, and it shows. If you don't think we did, and you believe that Free Software Magazine isn't up to standard, please let us know - we welcome any criticism. If you believe in this project, please let the whole world know about it, using all those means that made great free software projects successful: talk about Free Software Magazine in your blog, user group mailing lists, social networks, professional web sites, IRC, etc. This way, you will help the magazine gain momentum and obtain the exposure it and free software deserve. I'll see you here next month!

Copyright information (c) 2005 by Tony Mobily Verbatim copying and distribution of this entire article is permitted in any medium without royalty provided this notice is preserved.


EDITOR IN CHIEF Tony Mobily

TECHNICAL EDITORS Clare James

Pancrazio De Mauro Gianluca Insolvibile

EDITORS Anna Dymitr Hawkes Dave Guard

TECHS

Gianluca Pignalberi (LaTeX class and magazine generation)

Gian Maria Ricci (RTF to XML converter using VBA)

GRAPHIC DESIGN

Alan Sprecacenere (Web, cover and advertising design)

Tony Mobily, Gianluca Pignalberi, Alan Sprecacenere (Magazine design)

THIS PROJECT EXISTS THANKS TO

Donald E. Knuth, Leslie Lamport, People at TeX Users Group TUG (http://www.tug.org)

For copyright information about the contents of Free Software Magazine, please see the section “Copyright information” at the end of each article. If an article is released under a free license six weeks after publication, and the six weeks are not over yet, you may not reproduce or retransmit the article, in whole or in part, in any manner, without the prior written consent of the author.

Real programmers love their applications’ source code: the faster and more elegant it is, the better. Users are after very different things: they seem to want simplicity, flashy colors, nice icons and tons of options. In spite of these reasons, or perhaps because of them, programmers and users often forget what lies in the middle of it all: information.

Who owns the information?

Almost all software applications are used to manage information, so these applications are worthless without information to process, store and display. For example, you could use a word processor to write letters or a video editing suite to edit footage of your girlfriend at the beach.

If information exists before (and independent of) the applications, the file format used to store the information should be defined beforehand. In this ideal situation, you could potentially write several programs (released under free or non-free licenses) to handle your information.

Please keep in mind that here “information” means any kind of creative work: blog entries, private movies, essays, government reports, court rulings, road projects...

In an ideal world, the format used to store this information doesn’t matter: the information should simply belong only to its author, or whoever paid for its production.

Fig. 1: An OpenOffice RTF file opened with Word X for Macintosh


In practice, applications and file formats have historically grown and changed together. Moreover, the file formats for proprietary software have not always been documented (see Microsoft products) unless you sign unacceptable NDAs (Non-Disclosure Agreements); the result of this is that digital information isn’t always under the complete control of the person who created it.

In my opinion this problem has been underestimated for a long time, probably because in the beginning people didn’t think it was such a big deal.

First of all, far fewer people had computers. When they did have them, they weren’t often networked and were physically incompatible (think of Macs and PCs, which even had problems sharing a floppy disk!). Resources were very limited: monitors, processors and hard drives weren’t even remotely comparable to what we have today, and therefore visually “fancy” information wasn’t as important as it is today (think WYSIWYG). Even complex spreadsheets were stored in CSV format (plain text separated by commas) or as binary files. Back then, there was a situation similar to today’s: if the information was stored as text files, you could use powerful text processing tools like sed, awk and then Perl. If it was stored in binary format, reverse engineering and black magic fixed most of the problems. Exchanging information at that point wasn’t often a problem; even when binary-only formats became more common thanks to WordStar and AutoCAD, the end product was nearly always a stack of paper that was to be shipped or archived somewhere.

Fig. 2: The same OpenOffice RTF file opened with Word 2004 for Macintosh

This paper could then be read even centuries after it was written, without a concern for what "brand" of paper, or which printer or pen had been used to write on it.

In a way, paper was the lingua franca.

Today, with the internet, CDs and search engines, any file can be used and distributed in several different ways without ever turning into durable, non-proprietary (and non-searchable, I must add) printed paper. Talk about progress...

Today's scenario

Today's scenario is in many ways very similar to what it was a few years ago - just a bit more complicated. Proprietary file formats are now more complex than before and therefore harder to reverse-engineer. Text-based file formats are still based on text (obviously!), but they have gained a level of complexity as well: rather than representing the information directly (like plain text documents or CSV spreadsheets do), they are usually based on XML.

For example, the content of a cell in OpenOffice.org could be represented with this:

<style:properties style:column-width="1.785cm"/>

...

<table:table-cell><text:p>600000</text:p></table:table-cell>

The two lines above simply state that the width of the column containing this cell must be 1.785 cm and that the cell stores the number 600000.

A paragraph in a letter could be:

<p>This is the <b>first</b> paragraph</p>

<p>This is the second one</p>

The advantages of XML files are clear: anybody can write an application which manipulates them, as long as they know what every XML tag means in that specific context.
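As a rough illustration of how little code such an application needs, here is a minimal Python sketch that reads the cell shown earlier. The namespace URIs and the function name `cell_text` are assumptions made up for this example; a real OpenOffice.org file declares its own namespaces.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumptions for this sketch only).
NS = {
    "table": "urn:example:table",
    "text": "urn:example:text",
}

SNIPPET = (
    '<table:table-cell xmlns:table="urn:example:table" '
    'xmlns:text="urn:example:text">'
    "<text:p>600000</text:p>"
    "</table:table-cell>"
)


def cell_text(xml_source: str) -> str:
    """Return the text of the first text:p paragraph inside a table cell."""
    cell = ET.fromstring(xml_source)
    paragraph = cell.find("text:p", NS)
    return paragraph.text if paragraph is not None else ""


cell_value = cell_text(SNIPPET)
print(cell_value)
```

The point is only that, once the meaning of each tag is known, a generic XML parser does the rest.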

A word on encoding

Even "plain text" can mean different things, depending on how it's encoded. The encoding defines which sequence of bits represents a particular character (such as a letter, a white space, symbols like “(c)” and “4”, and so on) used in a written language.

In ASCII (American Standard Code for Information Interchange), for example, the sequence “01000001” corresponds with the capital letter "A".
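The mapping can be checked in a couple of lines of Python. This is just a small sketch; the variable names are arbitrary.

```python
bits = "01000001"

# Interpret the bit string as a base-2 integer, then look up the character.
code_point = int(bits, 2)
letter = chr(code_point)

# Going the other way: from the character back to its 8-bit pattern.
round_trip = format(ord("A"), "08b")

print(code_point, letter, round_trip)
```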


The ASCII encoding (or format) is really ubiquitous these days, but has simply outlived its usefulness in a wired world where most people don't speak English. Over the last few years many more types of encoding have been created in order to deal with almost any other language on the planet, including non-alphabetic ones (Chinese, Hindi, Korean, Japanese...). The resulting confusion has been made worse by the fact that “plain text files” don't contain, by definition, any headers to declare their internal encoding. Consequently, the programs processing them have to guess, or be told, which encoding they should use to display them; otherwise, blank or strange characters are displayed instead of the correct ones.
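The "strange characters" effect is easy to reproduce. The following Python sketch, with an arbitrary sample word, encodes text as UTF-8 and then decodes the same bytes with the wrong encoding (Latin-1), as a program that guessed badly would.

```python
original = "caf\u00e9"  # the word "café"

# The same text stored as bytes using the UTF-8 encoding.
raw_bytes = original.encode("utf-8")

# A program that wrongly assumes Latin-1 shows two odd characters
# in place of the single accented letter.
wrong = raw_bytes.decode("latin-1")

# A program that is told (or guesses) the right encoding recovers the text.
right = raw_bytes.decode("utf-8")

print(repr(wrong), repr(right))
```

Nothing in the bytes themselves says which reading is correct, which is exactly the problem with header-less plain text.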


Today, the Unicode family of standards provides a definitive solution; unfortunately, it will take a lot of effort and time to have it accepted - not to mention used - everywhere. When, in 2002, Red Hat Linux switched to Unicode, many people complained on mailing lists because “everything had become slower, just to make the French happy”.

The problem with closed file formats

Why should end users care at all about these issues? Because in the last two decades at least, file formats have been used to avoid free market competition, making it harder for customers to switch to newer and better products, or to place restrictions on how people use programs or the information produced with them.


This is evident in fields as varied as office automation, industrial design and video streaming. In the first case, almost every user knows that the only guaranteed way to open “.doc” or “.xls” files reliably is by using the same version of Microsoft Word or Excel which created them in the first place. Remember that this applies to all of your files, starting from your personal diary...

When it comes to engineering, many projects for buildings, mechanical parts, furniture and bridges are stored in the DWG file format of AutoCAD, produced by AutoDesk. In 1998, competitors launched cheaper products based on an equivalent format. AutoDesk's reaction was not limited to improving features, service and discounts. Their advertising campaign focused on reminding people that only AutoDesk's products were 100% capable of keeping existing projects completely accessible. The full story can be read online in the FAQ and history pages on the website of the Open Design Alliance (http://www.opendesign.com), founded just to create an alternative file format.

What about multimedia? MPEG-4 is an advanced format for compressed video: DivX and many other decoders are based on it. Now, according to the MPEG-4 License page (www.mpegla.com/m4v/m4v-faq.cfm):

Fig. 3: The home page of the Open Design Alliance


Video providers who receive remuneration for offering MPEG-4 video either directly (e.g. subscription or title-by-title fees) or indirectly (e.g. advertising or underwriting fees) pay a royalty for the right to use the decoders and encoders to receive and transmit the remunerated video.

The fear of having to pay MPEG-4 fees even when you place banners and video clips of your holidays on your home page has been enough to start projects like Theora (http://www.theora.org).

XML: the savior?

The cases above are just a few examples of how file formats have effectively been used to enforce a much greater control on end users than was possible before.

The family of technologies known as XML (eXtensible Markup Language) can play an essential role in solving these problems (at least in some areas).

XML was designed to make it easy to exchange information rather than locking it.


XML files are in a plain (Unicode!) text format similar to HTML. This alone makes reverse-engineering of XML files much easier, compared with binary formats. Of course text can never be as compact and fast to parse as pure binary data, but it has a huge advantage: it can be processed with any of the existing text-processing tools, known to and improved on by Unix users since the '70s.

Fig. 4: The OASIS consortium’s home page

For example, to extract the XML code shown above I only had to unzip the original spreadsheet and open the content.xml file with a text editor.

However, XML is no more or less proprietary or open than binary formats. Its full benefits are only available when it’s completely and openly documented, guaranteed to stay that way and, above all, legally usable without asking permission or paying fees to anybody.
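The same unzip-and-read step can be scripted. The Python sketch below uses only the standard zipfile module; the function name `read_content_xml` is made up for this example, and a tiny stand-in archive is built in memory so that the code runs without a real spreadsheet at hand.

```python
import io
import zipfile


def read_content_xml(spreadsheet) -> str:
    """Return content.xml from a zipped document (a path or a file object)."""
    with zipfile.ZipFile(spreadsheet) as archive:
        return archive.read("content.xml").decode("utf-8")


# Stand-in for a real spreadsheet: a zip archive holding one content.xml entry.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("content.xml", "<text:p>600000</text:p>")
buffer.seek(0)

extracted = read_content_xml(buffer)
print(extracted)
```

With a real file, passing its path to `read_content_xml` should be enough, assuming the document is a zip archive containing content.xml as described above.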

Format wars: the next episode

The OASIS consortium (http://www.oasis-open.org) produces open XML standards in all fields of business and computing activity. Perhaps its most important achievement is the OpenDocument format for word processing, spreadsheets and presentations, directly derived from the one used in OpenOffice.org and submitted to the International Organization for Standardization (ISO).

OpenDocument is more powerful than XHTML and, unlike other formats, there are already some cross-platform applications which use it. OpenOffice.org 2.0, due for release around March 2005, will use it by default, and other products, from KOffice (http://koffice.kde.org/) to IBM's Workplace and servers like Plone (http://www.plone.org/), can already read and write files in this format. Just add a Firefox plug-in, and OpenDocument will be immediately accessible from your browser! For these reasons, some people think that it could eventually replace HTML as the default format for the internet.

However, there is no need to look that far. There is something much more important already happening today. Namely, the European Union (EU) wants to make it possible for all EU public administrations to (re)take ownership of the documents they manage on behalf of their citizens. In order for this to happen, these administrations, or anybody willing to do business with them, will eventually have to produce, exchange and store files in the right format.

In 2003 an EU study called the Valoris Report (http://europa.eu.int/ida/en/document/3439) concluded that an XML file format, highly portable and very open, is required to reach this goal. The report mentions the efforts in this field by Sun (OpenOffice.org/OASIS) and Microsoft (MSXML), pointing out several limitations of the latter. The main limitation is the fact that Microsoft prefers not to completely separate the file format from the applications. They would much rather assist selected partners in enabling their applications to read and interoperate with MSXML. It doesn’t sound like much of a concession, does it?

This episode of the Format Wars is still being quietly fought while we’re writing (mid December 2004): stay tuned for further news. Hopefully, if it all ends as it should, the utopia described at the beginning of the article can come into being: that file formats are defined before and independently of any implementation, in any field of computing.

Conclusions

In my opinion, one of the best signs that software is still in its infancy is the way this issue of formats has been ignored so far, by professionals and casual users alike. Luckily the tide has started to turn. Things like OpenDocument are certainly steps in the right direction. Nobody can predict what combinations of proprietary and free software will be used twenty years from now. The most probable guess is that there will be a lot of them, and each user will be free to choose the best combination for his or her real needs. In any case, I hope that in twenty years the era where information is locked up by proprietary and application-specific formats will be just a laughable memory.

Copyright information

© 2005 by Marco Fioretti

Verbatim copying and distribution of this entire article is permitted in any medium without royalty provided this notice is preserved.

About the author


This article weighs the pros and cons of XML for some applications (publishing), and explores why it is the best possible solution for many programming and publishing needs.

Everywhere you turn these days, someone is talking about Extensible Markup Language (XML). Jump into a discussion about publishing - XML is touted as a means of exchanging information. Talk with someone about the new software tool she is creating - she describes setting up some of her actions in XML. Ask a webmaster what he’s been doing - he raves about the dynamic content he’s serving up to site visitors using XML from a database. In short, XML is a great solution to a wide variety of challenges, and it seems to be everywhere. But is it the cure for every data or content challenge? The simple answer is no.

Not everyone needs XML to make things work. For some small organizations, publishing processes are straightforward enough that the costs of implementing an XML solution may not be worthwhile. But the only way to be sure is to perform a thorough examination of the business processes and review cycles that produce information products - most organizations and companies grossly underestimate the amount of information they could potentially reuse in publishing, and overestimate the costs of reusing that information with an XML-based solution. And they’re not aware of the breadth of available free tools that can get them well on the road to their XML destination.

What is XML?

XML is a meta markup language that is used to create new markup languages. It’s most commonly used to create tag sets and processing instructions that describe structured content for presentation in text documents, but it can also be used to describe, manage, and deliver content of all types (text, images, voice, forms, multimedia files, and so on) and to transform transactional data between disparate database systems.

Unlike Hypertext Markup Language (HTML), which is a display markup language with a predefined list of tag sets designed solely to control how information is presented in a web browser, XML presents content in an open, standards-based, media-neutral, operating system-agnostic, platform-independent format. XML is extensible because it allows organizations to define their own sets of tags, each with a meaningful (semantic) “name”. Semantic names (or tags) are more useful than generic HTML tags because they can describe content in real-world, user-friendly and context-specific ways. For instance, the XML tag <productname> is much more descriptive than the HTML tag <h2>.

In a traditional word processing environment, the formatting data is stored with the content it governs, and changes to the formatting involve changes to the content itself. XML's strength comes in its ability to separate content from formatting data, thus allowing authors to create content without spending unnecessary time formatting that information. XML style sheets control the formatting of the content being created, and specify how it will be presented in each medium.

XML content can therefore be automatically transformed (with the help of style sheets) from a single text source into a variety of information products (printed product brochures, web site content, wireless content, etc.), each with its own look and feel. And XML content can be personalized and delivered dynamically on the fly, based on the specific requirements of the end user.

XML also differs from HTML in that it allows documentation to be processed by computer software programs, thus allowing organizations to reuse content from disparate data repositories, and recombine that data in ways and in media not possible with HTML. XML supports single source content reuse, and allows organizations to make changes to a content element (like a product description) and have those changes reflected instantly and automatically in every information product that uses that information, regardless of the medium. This ability to reuse information and to make changes once and have them appear globally saves organizations considerable time and money revising, updating, and translating content.

XML content is also “validated” against document guidelines encoded in a Document Type Definition (DTD) and can enforce standards on the authors who develop content. This ability is particularly useful in validated or regulated environments (life sciences companies, legal firms, automobile and aerospace industries, the financial sector) in which completeness, consistent structure, and accuracy of information are all essential, if costly regulatory compliance and legal issues are to be avoided.
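To make single source reuse concrete, here is a small Python sketch that renders one XML source into two different outputs. The element names (product, productname, description) and both rendering functions are hypothetical, invented for this illustration; a real publishing system would more likely use style sheets than hand-written functions.

```python
import xml.etree.ElementTree as ET
from html import escape

# One source of content; the element names are made up for this example.
SOURCE = """<product>
  <productname>Widget</productname>
  <description>A small, useful widget.</description>
</product>"""


def to_html(product):
    """Render the product for a web page."""
    name = escape(product.findtext("productname", default=""))
    desc = escape(product.findtext("description", default=""))
    return f"<h2>{name}</h2><p>{desc}</p>"


def to_plain_text(product):
    """Render the same product for a plain-text medium."""
    name = product.findtext("productname", default="")
    desc = product.findtext("description", default="")
    return f"{name.upper()}\n{desc}"


product = ET.fromstring(SOURCE)
html_output = to_html(product)
text_output = to_plain_text(product)
print(html_output)
print(text_output)
```

Changing the description in SOURCE changes both outputs at once, which is the whole point of keeping content in one place.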
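The idea of enforcing structure can be sketched in a few lines of Python. Note that this is not DTD validation: Python's standard library has no DTD validator, so the sketch below uses a hand-written rule table as a stand-in. The rule table, the element names and the function name `missing_elements` are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: which child elements each root element must contain.
REQUIRED_CHILDREN = {"product": ["productname", "description"]}


def missing_elements(xml_source):
    """Return the required child elements that the document lacks."""
    root = ET.fromstring(xml_source)  # raises ParseError if not well-formed
    required = REQUIRED_CHILDREN.get(root.tag, [])
    return [name for name in required if root.find(name) is None]


complete = (
    "<product><productname>Widget</productname>"
    "<description>Small.</description></product>"
)
incomplete = "<product><productname>Widget</productname></product>"

print(missing_elements(complete))
print(missing_elements(incomplete))
```

A real DTD expresses far more (element order, attributes, allowed content), but the effect on authors is the same: incomplete documents are caught before they are published.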

W3C Goals for XML

After the world wide web explosion, web users were inundated with miles of good and bad HTML, and the W3C sought a better solution for publishing, cataloguing, locating, retrieving and archiving data. The guidelines they set for this “something better than HTML” resulted in the development of XML. The “design goals” for XML, which set it aside from HTML, include the following (source: W3C (http://www.w3c.com)).

1. XML shall be straightforwardly usable over the Internet.

2. XML shall support a wide variety of applications.

3. XML shall be compatible with SGML.

4. It shall be easy to write programs which process XML documents.

5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.

6. XML documents should be human-legible and reasonably clear.

7. The XML design should be prepared quickly.

8. The design of XML shall be formal and concise.

9. XML documents shall be easy to create.

10. Terseness in XML markup is of minimal importance.

This article focuses primarily on the second W3C requirement for XML, that it plays well with a variety of tools that perform various tasks. And since the potential uses of XML are countless, and space is limited, we’ve restricted our scope to the use of XML in publishing.

XML Uses

In the publishing arena, XML is used by authoring and content management tools. Authors use the XML elements and attributes to produce documents. Content management tools use the XML elements and attributes as data that can be retrieved or marked for reuse.

Is this the answer to everything? Well, in the publishing world the answer is sometimes “no”, because affordable publishing can sometimes be accomplished without the help of XML - XML would be overkill. However, XML often is the best option for organizations that take the time to evaluate their content lifecycle and to examine how much it costs to create, maintain, translate, deliver, store, reuse, archive, and retire content. A recent study by ZapThink (“XML in the Content Lifecycle Foundation Report: Creating, Managing, Publishing, Syndicating, and Protecting Content with XML”) found that the biggest - and most expensive - challenge for most organizations today is content reuse. The study found that “Producers of content in the enterprise spend over 60% of their time locating, formatting, and structuring content and just 40% of their time actually creating” (source: ZapThink (http://www.zapthink.com/report.html?id=ZTR-CL100)).

The sad fact is, most organizations don't know how much their content creation and management efforts cost them, and so they assume that XML is not for them. The reality is that the only way to know whether XML is the right choice for your organization's publishing needs is to seek the assistance of a content management expert who can perform an organizational needs analysis, a content lifecycle analysis, and an audit of your existing content. Additional services offered by content management consultants include customer needs analysis, tools recommendations and assistance calculating return on investment. Analysis often identifies obstacles to change (tools, processes, and people) that will need to be addressed before you adopt XML as a publishing solution. Once you know how much it costs, and what obstacles you’ll face, you can make an informed business decision about whether to move to XML publishing or not.

XML does provide a lot of options. Exchanging content, for example, is often easier and more affordable with XML than it is with proprietary tools like Microsoft Word. Rather than saving content in a proprietary format, authors can output their document content into XML and pass it along to colleagues or customers who need the content but who may use other authoring and publishing tools. Additionally, XML makes reuse of information easier since formatting data is separated from XML content. Separating content from format is one of the biggest productivity gains an organization can obtain by adopting XML.


XML content may be used to produce one document, and that same XML content can then be harnessed to create additional documents, each with a completely different look and feel. Alternatively, the same XML content can be dynamically served up to various audiences in different chunks or in different sequences using other technologies (see “XSLT”, below). This represents a degree of flexibility that HTML simply doesn't offer.

Free XML Authoring Tools

There are a wide variety of free XML authoring tools available for download on the internet. Each has its own strengths and weaknesses, and no one free tool does it all (i.e. your mileage may vary).

Check them out and learn as much as you can about XML authoring before you decide to employ any particular tool:

• Altova Authentic (http://www.altova.com/products_doc.html)

• XML Cooktop (http://www.xmlcooktop.com/)

• Open XML Editor (http://www.philo.de/xmledit/)

• Xray2 (http://architag.com/xray/)

XML-related Technologies

Jonathan Robie (http://www.gca.org/papers/xmleurope2001/papers/bio/s13-lauth2.html), an XML Research Specialist at Software AG, once exclaimed, “XML doesn't do anything!” In its purest sense, this is true; by itself, XML will not magically repurpose content for multiple media or audiences. XML doesn't provide formatting in the absence of additional technologies. In order to make XML “look good”, or turn it into a final deliverable, some assistance from format-conscious technologies is required... but on the other hand, no amount of such formatting technology can turn ugly-duckling HTML content into a coterie of media swans.

XSL and XSLT

In the HTML world, Cascading Style Sheets (“CSS” files) make HTML display as desired in a web browser. Because XML separates content from its formatting data, you must employ additional technologies to format XML, allowing it to display as you wish. XML can be formatted a few different ways. You can bring XML content into XML-based tools to change its appearance. (You can also use HTML to format XML.) The XML formatting and transforming language (Extensible Stylesheet Language Transformations, “XSLT” for short) can adjust XML output for various display purposes. When you have multiple media in which you want to present your content, XML is far more flexible than its HTML ancestors.

XSLT uses the tags within an XML document to control formatted output. Formatting XML content can be as simple as adding bold to a <companyname> tagged object. The formatting can be as complex as telling all of the pieces of an invoice, for example, to display in a certain font, point size, style, etc. in a table and make the table content “sortable” by any of the tags used in your XML content. Free software tools used for XSLT include Saxon and Xalan (and others). Each allows you to perform transforms without moving your XML content into a proprietary tool that will “trap” you into using that tool in future.
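The invoice case can be sketched in a few lines of XSLT. The example below is an illustration under stated assumptions, not a production stylesheet: apart from <companyname>, the element names (<invoice>, <item>, <name>, <price>), the sample data and the sortKey parameter are all invented. It uses the XSLT 1.0 processor bundled with the Java runtime; Saxon or Xalan could be swapped in behind the same javax.xml.transform API.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class InvoiceTableDemo {
    // Hypothetical invoice; only <companyname> appears in the article.
    static final String INVOICE =
        "<invoice><companyname>Acme Widgets</companyname>"
      + "<item><name>Washer</name><price>0.10</price></item>"
      + "<item><name>Anchor</name><price>1.50</price></item>"
      + "<item><name>Bolt</name><price>0.25</price></item></invoice>";

    // The company name is wrapped in <b>; the rows are sorted by
    // whichever child tag is named in the sortKey parameter.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' "
      + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:param name='sortKey' select=\"'name'\"/>"
      + "<xsl:template match='/invoice'>"
      + "<div><p><b><xsl:value-of select='companyname'/></b></p>"
      + "<table><xsl:for-each select='item'>"
      + "<xsl:sort select='*[name() = $sortKey]'/>"
      + "<tr><td><xsl:value-of select='name'/></td>"
      + "<td><xsl:value-of select='price'/></td></tr>"
      + "</xsl:for-each></table></div>"
      + "</xsl:template></xsl:stylesheet>";

    // Renders the invoice as an HTML fragment, sorted by the given tag.
    // Values are compared as text, which is enough for this sample data.
    static String render(String sortKey) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            t.setParameter("sortKey", sortKey);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(INVOICE)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("name"));  // rows ordered by item name
        System.out.println(render("price")); // rows ordered by price text
    }
}
```

Because the sort key is a stylesheet parameter, the same stylesheet serves every ordering; a real invoice would likely add data-type='number' to the sort so that prices compare numerically.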

Saxon, created by Michael Kay, is available in several flavors. The “lite” version allows you to do transformations on any PC running the Java Runtime Environment (JRE).

14 Free Software Magazine n. 1, February 2005


Saxon is available via Michael Kay’s SourceForge website (http://saxon.sourceforge.net/). The JRE is available from multiple sites, including java.com (http://www.java.com/en/download/windows_automatic.jsp).

Xalan is an XSLT processor designed to transform XML documents into HTML, text, or other XML document types and is available via The Apache XML Project (http://xml.apache.org/xalan-j/), among other sites.


A good resource for more information on working with XSLT and XML is Mitch Amiano's free software collection, the “Agile Markup Toolkit”, which is available at no cost. The CD itself contains several dozen free software installations and links. Any software on the CD also includes reference information that indicates where it came from, allowing you to update as new releases become available. Mitch is a big user of free software, very involved in the free software community, and is also a user of the tools he has gathered on this CD.

Visit the Agile Markup Toolkit’s web site (http://home.agilemarkup.com/index.php?option=content&task=view&id=55&Itemid=29) for more information about the “Agile Markup Toolkit”.

XSL-FO

Another subset of XSL is XSL-FO. The FO stands for “formatting objects.” XSL-FO provides a means for formatting XML for presentation. More information on its capabilities is available at the W3C website (http://www.w3.org/TR/xsl/).

XQuery

Some companies may be publishing information stored in a database or even stored as XML. XQuery allows you to query XML, similar to the way SQL is used to access databases. More information, and a great overview, are available from Data Direct Technologies (http://www.datadirect.com/techzone/xml/basics/basics/index.ssp).
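The comparison with SQL is easier to see with a small query. The Java runtime does not bundle an XQuery engine, so the sketch below swaps in XPath, the path language that XQuery builds on, through the standard javax.xml.xpath API. The <orders> document and its attribute names are invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OrderQueryDemo {
    // Hypothetical data standing in for content "stored as XML".
    static final String ORDERS = "<orders>"
      + "<order id='1' customer='alice'><total>40</total></order>"
      + "<order id='2' customer='bob'><total>120</total></order>"
      + "<order id='3' customer='alice'><total>75</total></order>"
      + "</orders>";

    // Roughly the XML counterpart of:
    //   SELECT id FROM orders WHERE customer = 'alice' AND total > 50
    static final String BIG_ALICE_ORDERS =
        "/orders/order[@customer='alice' and total > 50]/@id";

    // Evaluates an XPath expression and returns the value of each
    // selected node (here, the matching id attributes).
    static List<String> select(String xml, String expression) {
        try {
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression,
                          new InputSource(new StringReader(xml)),
                          XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                values.add(hits.item(i).getNodeValue());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("query failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(ORDERS, BIG_ALICE_ORDERS)); // prints [3]
    }
}
```

A full XQuery processor adds joins, sorting and the construction of new XML on top of path expressions like this one.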

XML Performance

How has XML measured up to the W3C’s (http://www.w3.org) expectations? Certainly there are many XML-driven websites. Check out Safari, CNN, Fidelity, and Wired, among others. These are dynamically generated pages with XML behind the scenes. At Fidelity, XML ties together web and back-end systems to deliver hundreds of thousands of transactions per hour to its web site customers. Fidelity says it’s realizing millions of dollars of savings in infrastructure and development costs by eliminating the need for transformation of data between the company’s disparate database systems and by reducing (by 50%) the number of web application servers through which customer data travels. (Source: InternetWeek (http://www.internetweek.com/newslead01/lead080601.htm)).

In publishing, XML has proven beneficial for creating materials derived from information stored in a database or publishing information that developers have created in XML. Some tools can open the XML and style it, providing paragraph formatting along with page layout (and in publishing, presentation is everything!). Such tools, which can automatically style XML, make publishing data easier and more affordable than traditional publishing methods.

However, XML can slow performance if it is not integrated properly and appropriately planned for. “Research by IBM Labs shows that even small XML-based documents can increase the CPU cost of a relational database transaction by up to 10 times in the absence of a dedicated XML processing engine. The research concluded that XML parsing could have a ‘potentially fatal impact’ on high-performance, transaction-oriented database applications that use XML.” (Source: nwfusion.com (http://www.nwfusion.com/news/2004/0503xmlaccel.html)). Hardware vendors are rushing to develop new gigabit-speed silicon to address the spread of XML and the processing problems it can sometimes cause.

Again, it’s important to employ a content management expert with experience in planning and implementing XML solutions before you adopt XML in your organization. XML is a business solution, not an IT solution. Employ it only after developing and conducting a thorough analysis of your organizational business needs, the needs of your customers, and after evaluating your content lifecycle. The results should yield a unified strategy for XML use across your enterprise that will provide measurable benefits and a positive return on investment.

Conclusion

XML isn’t the universal panacea... but it is often preferable to alternatives. Particularly in publishing applications, which represent so many ways data can be caught up in proprietary systems, it’s a good idea to use non-proprietary technologies for content authoring, management and delivery, and it’s crucial to assess and quantify the potential paybacks of XML versus HTML systems.

Copyright information

© 2005 by Kay Ethier, Scott Abel. This article is made available under the “Attribution-NonCommercial-NoDerivs” Creative Commons License 2.0 available from http://creativecommons.org/licenses/by-nc-nd/2.0/.

Kay Ethier is an Adobe Certified Expert in FrameMaker 7.x and several prior versions. She instructs training classes, performs consulting, and provides support to clients in a variety of industries. Kay resides in the Research Triangle Park area of North Carolina and works for Bright Path Solutions (http://www.travelthepath.com). In 2001, Kay co-authored the book XML Weekend Crash Course (Wiley/Hungry Minds). She has most recently been a contributing author on Advanced FrameMaker (TIPS Technical Publishing) and XML and FrameMaker (Apress).

Scott Abel is a technical writing specialist and content management strategist whose strengths lie in helping organizations improve the