Converters from PC Textprocessors to LaTeX - Overview

Author: Wilfried Hennings, Forschungszentrum (Research Center) Jülich GmbH
last update (including subpages): July 31, 2001
The url of this page is http://tug.org/utilities/texconv/pctotex.html

I maintain these pages because I need converters between LaTeX and PC Textprocessors for my work and I want to share the information with others who need it. Because I maintain them in my spare time (uh, what is spare time?), I cannot answer individual questions.

This list is only as good as the support it receives, and I need YOUR support to update and supplement it. Please tell me if you know of more or better converters. There are more converters on the CTAN sites, but the following seem to be the most promising for conversion to and from the current versions of wordprocessors.

Neither correctness nor completeness is guaranteed.
All opinions mentioned (if any) are my own, not my employer's. Please send corrections, enhancements and supplements to the following address:
W.Hennings@fz-juelich.de

Note that this FAQ list contains information ONLY about converters between LaTeX and PC word processors. Converters to and from other formats may have their own FAQ lists, e.g. see the link for converters to and from HTML.


For the impatient, here is a table with an overview of the features of the most recent converters.


General Remarks

Before looking for a converter, stop and think about a fundamental question:

What do you want to be converted in which way?

Do you want to convert the document structure, i.e. a heading should remain a heading, a list should remain a list etc., no matter how it will look in the target format?
Or do you want to convert the appearance, i.e. what it looks like, no matter how it is represented in the target format?
Or do you want a mixture of both?
If you use SGML as an intermediate format, you have to specify the translation rules yourself (as far as I understood). This makes sense, and it explains why different people have very different opinions about which converter best fits their needs: they simply have different demands and expectations about what should be converted and how.
So not only is there in practice no converter that is good for everyone and every purpose; such a converter is impossible in principle, because there are no well-defined requirements which a converter should meet.

Keep this in mind when looking through the following list of converters; try them yourself and decide what you need.

Principal problems of wordprocessor to LaTeX conversion

One advantage of LaTeX is that it forces you to structure a document, whereas wordprocessors like Word/WordPerfect allow unstructured documents. It is hardly possible to automatically structure a document where there was no structure before.

It is nevertheless possible to write a structured document with a wordprocessor by consistently using styles. Wordprocessor documents that use styles can therefore be converted to LaTeX, e.g. by a macro written for the specific wordprocessor.
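
To illustrate why consistent styles make conversion feasible, here is a minimal Python sketch (not taken from any of the converters listed on this page). It assumes the document has already been read into (style name, text) pairs; the style names "Heading 1", "Heading 2" and "Normal" and the function names are assumptions chosen for illustration only.

```python
# Hypothetical mapping from wordprocessor style names to LaTeX sectioning
# commands. Styles not listed here are emitted as plain paragraphs.
STYLE_TO_COMMAND = {
    "Heading 1": "section",
    "Heading 2": "subsection",
}

# Characters with a special meaning in LaTeX and their escaped forms.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape LaTeX special characters one character at a time."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


def styled_paragraphs_to_latex(paragraphs):
    """Turn (style, text) pairs into LaTeX source, one block per paragraph."""
    blocks = []
    for style, text in paragraphs:
        body = escape_latex(text)
        command = STYLE_TO_COMMAND.get(style)
        # A known heading style becomes \section{...} or \subsection{...};
        # everything else is kept as an ordinary paragraph.
        blocks.append(f"\\{command}{{{body}}}" if command else body)
    return "\n\n".join(blocks)


if __name__ == "__main__":
    doc = [("Heading 1", "Introduction"), ("Normal", "Costs rose by 50%.")]
    print(styled_paragraphs_to_latex(doc))
```

A document written without styles gives such a mapping nothing to work with, which is the structural problem described above.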

There are several ways to convert

To illustrate these, let me restrict it to the Microsoft Word case:

  1. Word binary format -> LaTeX
  2. RTF (Word ASCII format, use Word's own RTF export) -> LaTeX
  3. WordPerfect 5.1 format (use Word's own export) -> LaTeX
  4. HTML (use Word's internet assistant or built-in html converter) -> LaTeX
  5. maybe other external format(s)

The most complete converters that are still being developed and supported are:

rtf2latex2e - free standalone converter for Mac, PC, and Unix

word2tex - shareware, Word export filter for PC

Publishing Companion - commercial converter for PC


Using a Word macro

Free:

winw2ltx: A set of macros for WinWord 2, now also available for WinWord 6 and 7 (95)

Commercial:

MathType: PC equation editor with export to LaTeX. MathType home page (USA)


Using a Word export filter

Shareware:

Word2TeX: This converter can save documents from Word6/Word7(=95) or later as LaTeX, including equation editor (!) objects and MathType objects.
Converts:

(*) Restrictions apply in unregistered Word2TeX: only the first 7 equations, the first table and the first figure will be translated, and unregistered Word2TeX will never save its own settings.

For a complete list of features, visit its homepage.


Converting from Word binary format

Free:

LAOLA: LAOLA can read Word6/Word7(=95) documents under Unix and extract the text. LAOLA homepage (DE site)

word2x: Converts Word6/Word7(=95) documents to LaTeX or plain text. word2x homepage (UK site)

antiword: A free MS Word reader for Linux, BeOS and RISC OS. It converts the binary files from Word 6, 7, 97 and 2000 to text and PostScript. See antiword homepage. A user's comment: "It is still a bit incomplete, but I found it to be rather useful. Moreover, it is available for a wider-than-usual range of platforms."

wvWare is a library that can read the Word6/Word7(=95), Word8(=97) and Word9(=2000) binary file formats. See wvWare homepage (Ireland site). The wvWare library is used as the import library in the wordprocessor AbiWord (see below).
Its predecessor MSWordView could only read Word8(=97) and convert Word documents into HTML, which can then be read with a browser.

The free (GPL) wordprocessor AbiWord can import Word format (by using the aforementioned wvWare) and export to LaTeX format. AbiWord runs on BeOS, several Unix variants and also Windows95/98/NT and stores documents as XML.


Converting from RTF

To use an RTF converter, the wordprocessor document must first be "saved as" Rich Text Format. However, each new version of MS Word came with a new level of the RTF language, and most of the available converters cannot understand the current RTF version.
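
Before handing a file to an RTF converter, it can be worth checking that it really was saved as RTF and not in Word's binary format. The following Python sketch is only a sanity check under the assumption that an RTF file begins with the group `{\rtf`; it says nothing about which RTF level the file uses. The function name is hypothetical.

```python
def looks_like_rtf(data: bytes) -> bool:
    """Return True if the data begins with the RTF header group."""
    # Leading whitespace is skipped; the header itself is ASCII.
    return data.lstrip().startswith(b"{\\rtf")


if __name__ == "__main__":
    sample = b"{\\rtf1\\ansi Hello, world.}"
    print(looks_like_rtf(sample))
```

In practice you would pass the first few bytes read from the file in binary mode.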

Free:

rtf2latex2e: new (2000) version which can also read current RTF levels. Now also converts equations (courtesy Steve Swanson, http://www.mackichan.com)
rtf2latex2e homepage and download site (USA site)

RTF2LaTeX, a patch for WP2LaTeX that allows it to convert RTF documents as well. Experimental Release 0.4 (works, but it knows only a small group of commands). See its homepage

older versions:
rtflatex understands only older RTF levels
rtf2latex understands only older RTF levels. RTF utilities homepage (USA site)
w2latex understands only older RTF levels

The free (GPL) wordprocessor AbiWord can import RTF and export to LaTeX format. AbiWord runs on BeOS, several Unix variants and also Windows95/98/NT and stores documents as XML.

Commercial:

Scientific Word: Win95/98/2000/NT4 based TeX/LaTeX system with graphical editor and RTF import capability, including equations from Microsoft's equation editor. The RTF import converter is basically the same as the new rtf2latex2e.
Scientific Word home page (USA)


Converting from WordPerfect format

Free:

WP2LaTeX: converts WordPerfect 3.x / 4.x / 5.x / 6.x / 7.x / 8.x, including equations, to LaTeX. homepage

TeXPerfect: WordPerfect 5.1 for DOS -> LaTeX translator

Commercial:

Publishing Companion: converts Word/WordPerfect, including equations, to LaTeX. Comes with own equation editor. KTALK's home page (USA)


HTML as intermediate format

Wordprocessor to HTML

There are free HTML converters for Word 6 and 7 for Windows available from Microsoft:
Download... IA for Word 6 / IA for Word 7 / IA for Word for Mac
Word 97 contains an HTML converter by default, but in contrast to the previous versions it only recognizes heading styles if they are first converted into the corresponding HTML styles. Also, it sometimes inserts unnecessary tags.
Word 9 (2000) also contains an HTML converter by default, but you should not use this default: it actually creates a sort of XML with many Word-specific elements. Instead, for saving as "clean" HTML, download and install the add-on converter from Microsoft.

WordPerfect 7 and up have an integrated InternetPublisher.
For WordPerfect 6.1 for Windows, the InternetPublisher is available separately:
Download... InternetPublisher for WPWin 6.1

There also is a tool for Unix which is intended to convert word6, word7(95) and word8(97) binary files to html. See http://www.su.shuttle.de/turbo/word2html.c.gz

Also see www.w3.org for a list of converters between word processors and HTML.

HTML to LaTeX

Because HTML is a structured format, the conversion between HTML and LaTeX is rather straightforward. However, the limitations of HTML compared to LaTeX remain, i.e. there are many elements in LaTeX which cannot (yet?) be represented in HTML.
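
As a rough illustration of why a structured source makes this direction straightforward, the following Python sketch maps a handful of HTML elements to LaTeX using only the standard library's html.parser module. It is not one of the converters listed below; the set of supported tags, the class name and the function names are assumptions made for this example, and real HTML (attributes, tables, images, whitespace between tags) needs far more care.

```python
from html.parser import HTMLParser

# Characters with a special meaning in LaTeX and their escaped forms.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape LaTeX special characters one character at a time."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


class HtmlToLatex(HTMLParser):
    """Translate a small, illustrative subset of HTML tags to LaTeX."""

    # LaTeX text emitted when a tag opens or closes.
    OPEN = {"h1": "\\section{", "h2": "\\subsection{", "em": "\\emph{",
            "strong": "\\textbf{", "ul": "\\begin{itemize}\n",
            "li": "\\item "}
    CLOSE = {"h1": "}\n\n", "h2": "}\n\n", "em": "}", "strong": "}",
             "ul": "\\end{itemize}\n\n", "li": "\n", "p": "\n\n"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Unknown tags are silently ignored; only their text is kept.
        self.parts.append(self.OPEN.get(tag, ""))

    def handle_endtag(self, tag):
        self.parts.append(self.CLOSE.get(tag, ""))

    def handle_data(self, data):
        # Character references such as &amp; arrive here already decoded.
        self.parts.append(escape_latex(data))


def html_to_latex(html):
    parser = HtmlToLatex()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


if __name__ == "__main__":
    print(html_to_latex("<h1>Title</h1><p>Some <em>text</em> &amp; more</p>"))
```

Each HTML element carries its own meaning (heading, emphasis, list item), so a lookup table is enough for the simple cases; this is exactly the information that an unstructured wordprocessor document lacks.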

There are several HTML-to-LaTeX converters available. Without giving recommendations:

Frans Faase's html2tex (NL site) (C source)

Peter Thatcher's html2latex at sourceforge.net (Perl script)

Jeffrey Schaefer's html2latex at www.geom.umn.edu (Perl script)

Some converters are available from CTAN ("Comprehensive TeX Archive Network"), e.g. in .../support/html2latex. However, what you can find in CTAN under .../support/html2latex/ is Nathan Torkington's converter of 1993 -- rather outdated.
(The ... stands for a host specific base directory, which often is either "/pub/tex" or "/tex-archive")

Also see www.w3.org for a list of converters between HTML and LaTeX.


Other intermediate formats

There are ways to use SGML as an intermediate format, and others have used it successfully. Having had a quick look at it, I found it rather complicated; in particular, it seems that you have to define the translation rules yourself. So I did not put more effort into trying to use it. If anyone can give a ready-to-use cookbook solution, I will include it here.


Converting from FrameMaker

FrameMaker Utilities (UK site): Contains converters for both directions (LaTeX <-> FrameMaker) as well as templates which make conversion from FrameMaker to LaTeX easier


Converting from NotaBene

NB4LATEX: converts files from NotaBene4 (including ancient Greek and all the symbols of logic) to LaTeX2e format. homepage


Converting from Excel

Excel macro to convert Excel to LaTeX: http://www.jam-software.com/software.html

On CTAN in .../support/excel2latex/

The generated LaTeX code uses the tabular environment.
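
For readers unfamiliar with the tabular environment, the following Python sketch shows the kind of output such a conversion produces. It is not the Excel macro itself: it assumes the cell values have already been read into a list of rows, uses left-aligned columns throughout, and the function names are hypothetical.

```python
# Characters with a special meaning in LaTeX and their escaped forms.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape LaTeX special characters one character at a time."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


def rows_to_tabular(rows):
    """Render a list of rows (lists of cell values) as a LaTeX tabular."""
    if not rows:
        raise ValueError("at least one row is required")
    # One left-aligned column per cell of the first row.
    lines = ["\\begin{tabular}{" + "l" * len(rows[0]) + "}"]
    for row in rows:
        cells = [escape_latex(str(cell)) for cell in row]
        # Cells are separated by '&'; each row ends with a double backslash.
        lines.append(" & ".join(cells) + " \\\\")
    lines.append("\\end{tabular}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(rows_to_tabular([["Item", "Price"], ["Tea", 2.5]]))
```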


This HTML page is part of the texcnv site.
Copyright © 1998, 1999, 2000, 2001 Wilfried Hennings
You may copy and redistribute it under the following conditions:

Please also note the disclaimer.