Alaska Software Inc. - extracting text and images
Topic: extracting text and images
Zdenko Bielik
extracting text and images
on Tue, 05 Oct 2010 12:51:46 +0200
Hi gurus,

I need to save the content of a Word document as separate text files and images.

E.g., from the attached file I need to save:

- the first section of "a" characters as file "section-1.txt",
- the second section of "b" characters as file "section-2.txt",

- the red image as file "image-1.jpg",

- the third section of "c" characters as file "section-3.txt",
- the fourth section of "d" characters as file "section-4.txt",

- the blue image as file "image-2.jpg",
- the yellow image as file "image-3.jpg",

- the fifth section of "e" characters as file "section-5.txt".

I have no experience with ActiveX.
Please, can someone help me with this?

TIA & Regards
                        Zdeno 




Attachment: ActiveX test.doc
AUGE_OHR
Re: extracting text and images
on Tue, 05 Oct 2010 18:08:18 +0200
hi,

> I need save content of word document as separate text files and images.

start the Word macro recorder,
perform your "action" manually, just as you would normally do it,
then stop the recording.

> I have none experience with ActiveX.

now open the macro in the VBA editor (ALT-F11) and look "inside".
if you have any "code" we can help you to "translate" it

> Please, can someone help me with this?

I'm not sure if Word can do this. How would Word "recognize" which parts you want?
I think you would have to use OCR for it.

greetings by OHR
Jimmy
Zdenko Bielik
Re: extracting text and images
on Tue, 05 Oct 2010 19:03:56 +0200
Hi Jimmy,

> im not shure if Word can do this. How will Word "recognize" what you want 
> ?
hmmm, I understand...
so, another question: is it possible to save the whole text from the doc file
in one txt file and the included images in separate files?

Regards
             Zdeno
Thomas Braun
Re: extracting text and images
on Wed, 06 Oct 2010 15:42:56 +0200
Zdenko Bielik wrote:

> hmmm, I understand...
> so, other question: is it possible save whole text from doc file in one txt 

Yes - you can do a "save as..." operation and specify text as the target
format. When doing this with the macro recorder, you will get VB code
similar to this:

Sub Makro1()
'
' Makro1 macro
' Macro recorded on 06.10.2010 by Thomas Braun
'
    ChangeFileOpenDirectory _
        "C:\Dokumente und Einstellungen\thomas.braun\Desktop\"
    ActiveDocument.SaveAs FileName:="dsfsa.txt", FileFormat:=wdFormatText, _
        LockComments:=False, Password:="", AddToRecentFiles:=True, WritePassword _
        :="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:=False, _
        SaveNativePictureFormat:=False, SaveFormsData:=False, SaveAsAOCELetter:= _
        False, Encoding:=1252, InsertLineBreaks:=False, AllowSubstitutions:=False _
        , LineEnding:=wdCRLF
End Sub

> file and included images in separate files?

You could save the file as an HTML document and take the pictures from the
subfolder generated by Word:

http://vbadud.blogspot.com/2010/05/how-to-retrieve-images-of-word-document.html

Thomas
Zdenko Bielik
Re: extracting text and images
on Thu, 07 Oct 2010 12:46:02 +0200
Hi Thomas,

thanks for the info and the link. Unfortunately, I don't know how to translate
VB code to Xbase++...
I will try to ask Jimmy.

Thanks
            Zdeno
Zdenko Bielik
Re: extracting text and images
on Thu, 07 Oct 2010 12:49:55 +0200
Hi Jimmy, (or someone else),

> Yes - you can do a "save as..." operation and specify text as the target
> format. When doing this with the macro recorder, you will get VB code
> similar to this:


> You could save the file as a html document and take the pictures from the
> subfolder generated by word:
>
> http://vbadud.blogspot.com/2010/05/how-to-retrieve-images-of-word-document.html


can you help me with my problem now?
Thomas posted a workaround and a link to a VB solution,
but I don't know how this can be translated into Xbase++...

TIA
          Zdeno
AUGE_OHR
Re: extracting text and images
on Thu, 07 Oct 2010 17:48:05 +0200
hi,

>> Yes - you can do a "save as..." operation and specify text as the target
>> format. When doing this with the macro recorder, you will get VB code
>> similar to this:
> ...

there is not much to "translate" (see the code below)

>> You could save the file as a html document and take the pictures from the
>> subfolder generated by word:
>>
>> http://vbadud.blogspot.com/2010/05/how-to-retrieve-images-of-word-document.html
>
> can you help me now with my problem?
> Thomas posted any workaround and link with VB solution,
> but I don't know, how can be this translated into Xbase++...

when you "SaveAs" HTML, Word creates a subfolder,
so you just have to search it with Directory() for your pictures.

The name of the subfolder, in my German version, is FileName+"-Dateien",
so you have to check what name your version uses.

greetings by OHR
Jimmy

*** Code ***

* always use the full path

#include "activex.ch"
#include "common.ch"

#define wdFormatHTML    8

PROCEDURE MAIN(cFile,cSaveAs)
LOCAL oWord,oDoc
LOCAL cPath    := CURDRIVE()+":\"+CurDir()+"\"

DEFAULT cFile    TO "ActiveX test.doc"
DEFAULT cSaveAs  TO "ActiveX test.htm"

   IF FILE(cPath+cFile)
     oWord := CreateObject("Word.Application")
     IF Empty( oWord )
       MsgBox( "Microsoft Word not installed" )
       // nothing more to do without Word
       RETURN
     ENDIF

     oWord:visible := .T.
     // open DOC
     oWord:documents:open( cPath+cFile )
     oDoc := oWord:ActiveDocument
     // save as HTML
     oDoc:saveas(cPath+cSaveAs,wdFormatHTML)
     // close DOC
     oDoc:close()
     oWord:Quit()
     oWord:destroy()

     // now search for the subfolder with Directory()

   ELSE
     MsgBox( "File "+cPath+cFile+" not found" )
   ENDIF

RETURN

*** EOF ***
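
The Directory() search mentioned in the code above could look roughly like the sketch below. It is not part of the original post. The function name FindPictures, the list of extensions, and the folder name in the usage line are assumptions for illustration only; the real folder name depends on the Word language version (FileName+"-Dateien" in the German version).

```xbase
#include "directry.ch"

// Hypothetical helper: returns the names of the picture files
// found in cFolder (full path, without trailing backslash).
FUNCTION FindPictures( cFolder )
   LOCAL aFiles := Directory( cFolder + "\*.*" )
   LOCAL aPics  := {}
   LOCAL i, cName, cExt

   FOR i := 1 TO Len( aFiles )
      cName := aFiles[i, F_NAME]
      // take the text after the last dot as the extension
      cExt  := Lower( SubStr( cName, RAt( ".", cName ) + 1 ) )
      // the extension list is an assumption; extend it as needed
      IF cExt == "jpg" .OR. cExt == "png" .OR. cExt == "gif"
         AAdd( aPics, cName )
      ENDIF
   NEXT
RETURN aPics
```

Usage, assuming the German folder name: aPics := FindPictures( cPath + "ActiveX test-Dateien" ). The returned names can then be copied or renamed to "image-1.jpg", "image-2.jpg" and so on.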
Zdenko Bielik
Re: extracting text and images
on Thu, 07 Oct 2010 19:26:52 +0200
Hi Jimmy,

thank you!!! Works great!


> #define wdFormatHTML    8
Could you please post all the other possible "define constants" for file
types here?

TIA
       Zdeno
AUGE_OHR
Re: extracting text and images
on Thu, 07 Oct 2010 20:59:25 +0200
hi,

>> #define wdFormatHTML    8
> Please, can you post here all other possible "define constants" for file 
> types?

since it is different for each Office version, you have to "generate" the
list yourself:

TLB2CH.EXE "WORD.APPLICATION" /o:MyWord.CH

greetings by OHR
Jimmy
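
For the earlier question about saving the whole text into one txt file, the same pattern as the HTML example might be reused with a different format constant. This is a sketch only and was not posted in the thread. The procedure name SaveAsText is hypothetical, and the value 2 for wdFormatText is an assumption that should be checked against the MyWord.CH file generated by TLB2CH.EXE.

```xbase
#include "activex.ch"

// assumed value; verify it against your generated MyWord.CH
#define wdFormatText    2

// Hypothetical helper: saves cDocFile as plain text in cTxtFile.
// Both parameters should be full paths.
PROCEDURE SaveAsText( cDocFile, cTxtFile )
   LOCAL oWord := CreateObject( "Word.Application" )
   LOCAL oDoc

   IF Empty( oWord )
      MsgBox( "Microsoft Word not installed" )
      RETURN
   ENDIF

   // open the document, save it as text, then clean up
   oDoc := oWord:documents:open( cDocFile )
   oDoc:saveas( cTxtFile, wdFormatText )
   oDoc:close()
   oWord:Quit()
   oWord:destroy()
RETURN
```

A call could look like SaveAsText( cPath+"ActiveX test.doc", cPath+"ActiveX test.txt" ), where cPath is the full path built as in the HTML example.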