[C] Function that converts XHTML\HTML to TEXT: X\HTML2TEXT.


Matrixbob
22-04-2007, 14:07
Something efficient, simple and without frills.

Googling turns up something, for example this:
http://userpage.fu-berlin.de/~mbayer/tools/html2text.html

but it is too complicated and sprawling.
Not to mention that neither MinGW nor Cygwin manages to finish compiling it. :(

lovaz
22-04-2007, 14:58
Try this:
http://www.nirsoft.net/utils/htmlastext.html

It should also work from the command line

Matrixbob
22-04-2007, 15:22
Try this:
http://www.nirsoft.net/utils/htmlastext.html

It should also work from the command line

It's freeware, not open source.
I would prefer something like a library, so that I can do:
include "x-html2txt.c" in my main. :(

Matrixbob
22-04-2007, 15:40
I'm asking you because maybe someone has already been through this, and because carefully searching for all these possible keywords is a bit arduous:
HTML2TXT, htmltotxt, html_to_txt, html_convert, txt_convert, HTML markup remover, html_tool.

... and then all of them again with an X in front. :O

Matrixbob
22-04-2007, 16:25
Not even:
http://code.google.com/

finds anything "good".

Matrixbob
22-04-2007, 21:33
What I found does NOT clean up this junk :(

<div class="feedflare">
<a href="http://feeds.wsjonline.com/~f/wsj/xml/rss/3_7011?a=BLH6Vx1e><img
src="http://feeds.wsjonline.com/~f/wsj/xml/rss/3_7011?i=BLH6Vx1e" border="0"></img></a>
<a href="http://feeds.wsjonline.com/~f/wsj/xml/rss/3_7011?a=Tifvjja7"><img
src="http://feeds.wsjonline.com/~f/wsj/xml/rss/3_7011?i=Tifvjja7" border="0"></img></a>
<a href="http://feeds.wsjonline.com/~f/wsj/xml/rss/3_7011?a=IHlNc3H8"><img
src="http://feeds.wsjonline.com/~f/wsj/xml/rss/3_7011?i=IHlNc3H8" border="0"></img></a>
<a href="http://feeds.wsjonline.com/~f/wsj/xml/rss/3_7011?a=MEhhYBSl"><img
src="http://feeds.wsjonline.com/~f/wsj/xml/rss/3_7011?i=MEhhYBSl" border="0"></img></a>
</div>
<img src="http://feeds.wsjonline.com/~r/wsj/xml/rss/3_7011/~4/110997774" height="1" width="1"/>

Is this a mix of HTML and XHTML?!
It doesn't look like purely either of the two.

Matrixbob
23-04-2007, 17:51
I found the "function/program" that you can read below.
But already in the "main" there is something I don't like, and it is this:

RBuffer = (char*) malloc( BUFFER_LEN );
while( fgets( RBuffer, BUFFER_LEN, InFile ) != NULL ){
    dfConvert( RBuffer );
    fputs( RBuffer, OutFile );
}// endwhile
free( RBuffer );

I think fgets is a function that reads at MOST that many characters and then stops.
If the file is longer, then there's trouble and you have to change the MACRO.
Right?!


/*
* HTML2TXT
*
* Copyright 2000 Matteo Baccan <mbaccan@planetisa.com>
* www - http://www.infomedia.it/artic/Baccan
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA (or visit
* their web site at http://www.gnu.org/).
*
*/

#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>   // non-standard; malloc/free are already declared in <stdlib.h>
#include <io.h>       // non-standard (DOS/Windows); declares access()

#define BUFFER_LEN 4096 // buffer length
#define CV_EXITPARA 1 // parameter error
#define CV_EXITOPEN 2 // open error
#define CV_EXITEXIST 3 // file exist

void dfPrintLogo(void); // logo
void dfPrintInfo(void); // parameter info
void dfPrintOpenError( const char * InFile ); // open error
void dfPrintExist( const char * OutFile ); // file Exist
void dfConvert( char * Buffer ); // convert string
int dfCheckChar( int iLen,
                 char *Buffer,
                 int iPointer,
                 char *Check,
                 int iCheckLen );

int main(int argc, char *argv[]) {
    FILE *InFile, *OutFile; // file declaration
    char *RBuffer; // Read Buffer

    if( argc < 3 ){ // check parameter
        dfPrintInfo();
        exit( CV_EXITPARA );
    }// endif

    dfPrintLogo();

    if( access( argv[2], 00 ) >= 0 ){ // file exist
        dfPrintExist( argv[2] );
        exit( CV_EXITEXIST );
    }// endif

    if( (InFile = fopen( argv[1], "r")) == NULL ){ // open file
        dfPrintOpenError( argv[1] );
        exit( CV_EXITOPEN );
    }// endif

    if( (OutFile = fopen( argv[2], "w")) == NULL ){ // create file
        dfPrintOpenError( argv[2] );
        exit( CV_EXITOPEN );
    }// endif

    printf("\n ■ Reading %s\n", argv[1] ); // Convert

    RBuffer = (char*) malloc( BUFFER_LEN );
    while( fgets( RBuffer, BUFFER_LEN, InFile ) != NULL ){
        dfConvert( RBuffer );
        fputs( RBuffer, OutFile );
    }// endwhile
    free( RBuffer );

    printf("\n ■ OK \n" );

    fclose( InFile ); // Close File
    fclose( OutFile );
    return 0;

}// end of main


void dfPrintInfo(){
    printf("\n");
    printf("┌────────────────────────────────────────────────────────────────────────┐\n");
    printf("│▓▒░ HTML2TXT Converter HTML to TXT Version 2.00 ░▒▓│\n");
    printf("│▓▒░ Copyright 1997-2000 The Wonderful Team All Rights Reserved ░▒▓│\n");
    printf("│▓▒░ ──────────────────────────────────────────────────────────────── ░▒▓│\n");
    printf("│▓▒░ Usage: HTML2TXT <InFile> <OutFile> ░▒▓│\n");
    printf("│▓▒░ ░▒▓│\n");
    printf("│▓▒░ InFile = File 2 convert into TXT ░▒▓│\n");
    printf("│▓▒░ ░▒▓│\n");
    printf("│▓▒░ OutFile = File 2 save ░▒▓│\n");
    printf("└────────────────────────────────────────────────────────────────────────┘\n");
}// end of print info

void dfPrintLogo(){
    printf("\n");
    printf("┌────────────────────────────────────────────────────────────────────────┐\n");
    printf("│▓▒░ HTML2TXT Converter HTML to TXT Version 2.00 ░▒▓│\n");
    printf("│▓▒░ Copyright 1997-2000 The Wonderful Team All Rights Reserved ░▒▓│\n");
    printf("└────────────────────────────────────────────────────────────────────────┘\n");
}// end of print logo


void dfPrintOpenError( const char * InFile ){
    printf("\n ■ Error opening file %s\n", InFile );
}// end of print error


void dfPrintExist( const char * OutFile ){
    printf("\n ■ Output file %s Exist \n", OutFile );
}// end of print Exist


void dfConvert( char * Buffer ){
    static int iStart=0; // 1 while inside a tag; static, so the state survives across calls
    int iPos,iPointer;
    int iLen=strlen( Buffer );

    iPos=iPointer=0;
    while( iLen-->0 ){
        iPointer++;

        // BlockQuote
        if( dfCheckChar( iLen, Buffer, iPointer, "<BLOCKQUOTE>", 12 ) ){
            Buffer[iPos++]='"';
            iPointer+=11;
            continue;
        }
        if( dfCheckChar( iLen, Buffer, iPointer, "</BLOCKQUOTE>", 13 ) ){
            Buffer[iPos++]='"';
            iPointer+=12;
            continue;
        }

        // LineBreak
        if( dfCheckChar( iLen, Buffer, iPointer, "<BR>", 4 ) ){
            Buffer[iPos++]=0x0d;
            iPointer+=3;
            continue;
        }
        if( dfCheckChar( iLen, Buffer, iPointer, "</BR>", 5 ) ){
            Buffer[iPos++]=0x0d;
            iPointer+=4;
            continue;
        }

        // Citation
        if( dfCheckChar( iLen, Buffer, iPointer, "<CITE>", 6 ) ){
            Buffer[iPos++]='"';
            iPointer+=5;
            continue;
        }
        if( dfCheckChar( iLen, Buffer, iPointer, "</CITE>", 7 ) ){
            Buffer[iPos++]='"';
            iPointer+=6;
            continue;
        }

        // Tab
        if( dfCheckChar( iLen, Buffer, iPointer, "<TD>", 4 ) ){
            Buffer[iPos++]=9;
            iPointer+=3;
            continue;
        }
        if( dfCheckChar( iLen, Buffer, iPointer, "</TD>", 5 ) ){
            Buffer[iPos++]=9;
            iPointer+=4;
            continue;
        }

        // HTML Command Skipper
        if( Buffer[iPointer-1]=='<' ){
            if( Buffer[iPointer]!='\0' ){
                if( Buffer[iPointer]>='a' && Buffer[iPointer]<='z' ) iStart=1;
                if( Buffer[iPointer]>='A' && Buffer[iPointer]<='Z' ) iStart=1;
                if( Buffer[iPointer]=='!' ) iStart=1;
                if( Buffer[iPointer]=='/' ) iStart=1;
            }
        }
        if( Buffer[iPointer-1]=='>'&& iStart==1 ) {
            iStart=0;
            continue;
        }

        if( iStart==0 ){
            if( Buffer[iPointer-1]=='&' ){
                if( dfCheckChar( iLen, Buffer, iPointer, "&lt;" , 4 ) ){ Buffer[iPos++]='<'; iPointer+=3; continue; } //4
                if( dfCheckChar( iLen, Buffer, iPointer, "&gt;" , 4 ) ){ Buffer[iPos++]='>'; iPointer+=3; continue; } //4
                if( dfCheckChar( iLen, Buffer, iPointer, "&amp;" , 5 ) ){ Buffer[iPos++]='&'; iPointer+=4; continue; } //5
                if( dfCheckChar( iLen, Buffer, iPointer, "&quot;" , 6 ) ){ Buffer[iPos++]='"'; iPointer+=5; continue; } //6
                // Accented characters are DOS code page 850 bytes, written here as hex escapes.
                // The bytes for Aring/aring, Igrave/igrave and Uuml/uuml were lost in the archived
                // page; 0x8F, 0x8D and 0x81 are an assumption reconstructed from the entity names.
                if( dfCheckChar( iLen, Buffer, iPointer, "&Aacute;" , 8 ) ){ Buffer[iPos++]='\xA0'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&Agrave;" , 8 ) ){ Buffer[iPos++]='\x85'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&Acirc;" , 7 ) ){ Buffer[iPos++]='\x83'; iPointer+=6; continue; } //7
                if( dfCheckChar( iLen, Buffer, iPointer, "&Atilde;" , 8 ) ){ Buffer[iPos++]='\x86'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&Aring;" , 7 ) ){ Buffer[iPos++]='\x8F'; iPointer+=6; continue; } //7
                if( dfCheckChar( iLen, Buffer, iPointer, "&Auml;" , 6 ) ){ Buffer[iPos++]='\x84'; iPointer+=5; continue; } //6
                if( dfCheckChar( iLen, Buffer, iPointer, "&AElig;" , 7 ) ){ Buffer[iPos++]='\x92'; iPointer+=6; continue; } //7
                if( dfCheckChar( iLen, Buffer, iPointer, "&Ccedil;" , 8 ) ){ Buffer[iPos++]='\x87'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&Eacute;" , 8 ) ){ Buffer[iPos++]='\x82'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&Egrave;" , 8 ) ){ Buffer[iPos++]='\x8A'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&Ecirc;" , 7 ) ){ Buffer[iPos++]='\x88'; iPointer+=6; continue; } //7
                if( dfCheckChar( iLen, Buffer, iPointer, "&Euml;" , 6 ) ){ Buffer[iPos++]='\x89'; iPointer+=5; continue; } //6
                if( dfCheckChar( iLen, Buffer, iPointer, "&Iacute;" , 8 ) ){ Buffer[iPos++]='\xA1'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&Igrave;" , 8 ) ){ Buffer[iPos++]='\x8D'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&Icirc;" , 7 ) ){ Buffer[iPos++]='\x8C'; iPointer+=6; continue; } //7
                if( dfCheckChar( iLen, Buffer, iPointer, "&Iuml;" , 6 ) ){ Buffer[iPos++]='\x8B'; iPointer+=5; continue; } //6
                if( dfCheckChar( iLen, Buffer, iPointer, "&ETH;" , 5 ) ){ Buffer[iPos++]='\xD1'; iPointer+=4; continue; } //5
                if( dfCheckChar( iLen, Buffer, iPointer, "&Ntilde;" , 8 ) ){ Buffer[iPos++]='\xA4'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&Oacute;" , 8 ) ){ Buffer[iPos++]='\xA2'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&Ograve;" , 8 ) ){ Buffer[iPos++]='\x95'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&Ocirc;" , 7 ) ){ Buffer[iPos++]='\x93'; iPointer+=6; continue; } //7
                if( dfCheckChar( iLen, Buffer, iPointer, "&Otilde;" , 8 ) ){ Buffer[iPos++]='\x94'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&Ouml;" , 6 ) ){ Buffer[iPos++]='\x94'; iPointer+=5; continue; } //6
                if( dfCheckChar( iLen, Buffer, iPointer, "&Oslash;" , 8 ) ){ Buffer[iPos++]='0'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&Uacute;" , 8 ) ){ Buffer[iPos++]='\xA3'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&Ugrave;" , 8 ) ){ Buffer[iPos++]='\x97'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&Ucirc;" , 7 ) ){ Buffer[iPos++]='\x96'; iPointer+=6; continue; } //7
                if( dfCheckChar( iLen, Buffer, iPointer, "&Uuml;" , 6 ) ){ Buffer[iPos++]='\x81'; iPointer+=5; continue; } //6
                if( dfCheckChar( iLen, Buffer, iPointer, "&Yacute;" , 8 ) ){ Buffer[iPos++]='Y'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&THORN;" , 7 ) ){ Buffer[iPos++]='\xE8'; iPointer+=6; continue; } //7
                if( dfCheckChar( iLen, Buffer, iPointer, "&szlig;" , 7 ) ){ Buffer[iPos++]='\xE1'; iPointer+=6; continue; } //7
                if( dfCheckChar( iLen, Buffer, iPointer, "&aacute;" , 8 ) ){ Buffer[iPos++]='\xA0'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&agrave;" , 8 ) ){ Buffer[iPos++]='\x85'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&acirc;" , 7 ) ){ Buffer[iPos++]='\x83'; iPointer+=6; continue; } //7
                if( dfCheckChar( iLen, Buffer, iPointer, "&atilde;" , 8 ) ){ Buffer[iPos++]='\x86'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&aring;" , 7 ) ){ Buffer[iPos++]='\x8F'; iPointer+=6; continue; } //7
                if( dfCheckChar( iLen, Buffer, iPointer, "&auml;" , 6 ) ){ Buffer[iPos++]='\x84'; iPointer+=5; continue; } //6
                if( dfCheckChar( iLen, Buffer, iPointer, "&aelig;" , 7 ) ){ Buffer[iPos++]='\x91'; iPointer+=6; continue; } //7
                if( dfCheckChar( iLen, Buffer, iPointer, "&ccedil;" , 8 ) ){ Buffer[iPos++]='\x87'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&eacute;" , 8 ) ){ Buffer[iPos++]='\x82'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&egrave;" , 8 ) ){ Buffer[iPos++]='\x8A'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&ecirc;" , 7 ) ){ Buffer[iPos++]='\x88'; iPointer+=6; continue; } //7
                if( dfCheckChar( iLen, Buffer, iPointer, "&euml;" , 6 ) ){ Buffer[iPos++]='\x89'; iPointer+=5; continue; } //6
                if( dfCheckChar( iLen, Buffer, iPointer, "&iacute;" , 8 ) ){ Buffer[iPos++]='\xA1'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&igrave;" , 8 ) ){ Buffer[iPos++]='\x8D'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&icirc;" , 7 ) ){ Buffer[iPos++]='\x8C'; iPointer+=6; continue; } //7
                if( dfCheckChar( iLen, Buffer, iPointer, "&iuml;" , 6 ) ){ Buffer[iPos++]='\x8B'; iPointer+=5; continue; } //6
                if( dfCheckChar( iLen, Buffer, iPointer, "&eth;" , 5 ) ){ Buffer[iPos++]='\xD1'; iPointer+=4; continue; } //5
                if( dfCheckChar( iLen, Buffer, iPointer, "&ntilde;" , 8 ) ){ Buffer[iPos++]='\xA4'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&oacute;" , 8 ) ){ Buffer[iPos++]='\xA2'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&ograve;" , 8 ) ){ Buffer[iPos++]='\x95'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&ocirc;" , 7 ) ){ Buffer[iPos++]='\x93'; iPointer+=6; continue; } //7
                if( dfCheckChar( iLen, Buffer, iPointer, "&otilde;" , 8 ) ){ Buffer[iPos++]='\x94'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&ouml;" , 6 ) ){ Buffer[iPos++]='\x94'; iPointer+=5; continue; } //6
                if( dfCheckChar( iLen, Buffer, iPointer, "&oslash;" , 8 ) ){ Buffer[iPos++]='0'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&uacute;" , 8 ) ){ Buffer[iPos++]='\xA3'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&ugrave;" , 8 ) ){ Buffer[iPos++]='\x97'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&ucirc;" , 7 ) ){ Buffer[iPos++]='\x96'; iPointer+=6; continue; } //7
                if( dfCheckChar( iLen, Buffer, iPointer, "&uuml;" , 6 ) ){ Buffer[iPos++]='\x81'; iPointer+=5; continue; } //6
                if( dfCheckChar( iLen, Buffer, iPointer, "&yacute;" , 8 ) ){ Buffer[iPos++]='Y'; iPointer+=7; continue; } //8
                if( dfCheckChar( iLen, Buffer, iPointer, "&thorn;" , 7 ) ){ Buffer[iPos++]='\xE8'; iPointer+=6; continue; } //7
                if( dfCheckChar( iLen, Buffer, iPointer, "&yuml;" , 6 ) ){ Buffer[iPos++]='\x98'; iPointer+=5; continue; } //6
                if( dfCheckChar( iLen, Buffer, iPointer, "&reg;" , 5 ) ){
                    Buffer[iPos++]='(';
                    Buffer[iPos++]='r';
                    Buffer[iPos++]=')';
                    iPointer+=4;
                    continue;
                } //5
                if( dfCheckChar( iLen, Buffer, iPointer, "&copy;" , 6 ) ){
                    Buffer[iPos++]='(';
                    Buffer[iPos++]='c';
                    Buffer[iPos++]=')';
                    iPointer+=5;
                    continue;
                } //6
                if( dfCheckChar( iLen, Buffer, iPointer, "&trade;" , 7 ) ){
                    Buffer[iPos++]='t';
                    Buffer[iPos++]='m';
                    iPointer+=6;
                    continue;
                } //7
                if( dfCheckChar( iLen, Buffer, iPointer, "&nbsp;" , 6 ) ){ Buffer[iPos++]=' '; iPointer+=5; continue; } //6

                // &#number
                if( Buffer[iPointer]=='#' ){
                    // May be a Number
                    int nCount = 0;
                    while( iLen-(nCount+1)>0 && // I have char?
                           Buffer[iPointer+1+nCount]>='0' && // Are number ?
                           Buffer[iPointer+1+nCount]<='9' ){
                        nCount++;
                    }

                    // If I have number .. try to convert it
                    // (the two debug printf calls that printed every digit were removed)
                    if( nCount>0 ){
                        int nDmm = 0;
                        int nChar = 0;
                        int nMul = 1;
                        while( nDmm<nCount ){
                            nChar += (Buffer[iPointer+nCount-nDmm]-48)*nMul;
                            nMul *= 10;
                            nDmm++;
                        }
                        if( nChar>0 ){
                            Buffer[iPos++]=nChar;
                            iPointer+=nDmm+1;
                            continue;
                        }
                    }
                }
            }

            if( Buffer[iPointer-1]==0x0d &&
                Buffer[iPointer ]==0x0d ){
                iPointer++;
                continue;
            }

            Buffer[iPos++]=Buffer[iPointer-1];
        }
    }
    Buffer[iPos++]='\0';
}

// check string: compares Buffer, starting at iPointer-1, against Check;
// a Buffer character also matches when its lowercase form equals the Check character
int dfCheckChar( int iLen,
                 char *Buffer,
                 int iPointer,
                 char *Check,
                 int iCheckLen ){

    int iRet=0;
    if( iLen+1 >= iCheckLen ){
        iPointer--;
        while( *Check!='\0' ){
            if( Buffer[iPointer] ==*Check ||
                (Buffer[iPointer]|32)==*Check ){
                iPointer++;
                Check++;
            } else break;
        }
        iRet = (*Check==0 || *Check==13 || *Check==';' || *Check==' ');
    }

    return iRet;
}

Matrixbob
23-04-2007, 18:13
I set that one aside because I thought mine was better, but mine hangs too.

[NB]
What follows is only the part of my program that causes problems.

int crea_mod_file(char *infile, char *outfile)
{
    FILE *fp_infile, *fp_outfile;
    char *tmp_string, *new_string;

    if((fp_infile = fopen(infile, "r")) == NULL)
    {
        printf("\nError opening file %s.\n", infile);
        return -1;
    }

    if((fp_outfile = fopen(outfile, "w")) == NULL)
    { // create file
        printf("\nError opening file %s.\n", outfile);
        fclose(fp_infile); // do not leak the input file
        return -2;
    }

    tmp_string=(char*) malloc(SIZEBUF);
    if(tmp_string == NULL)
    { // out of memory
        fclose(fp_infile);
        fclose(fp_outfile);
        return -3;
    }
    memset(tmp_string, 0, SIZEBUF);

    // each fgets call reads at most SIZEBUF-2 characters; the loop runs until end of file
    while((fgets(tmp_string, (SIZEBUF-1), fp_infile))!=NULL)
    {
        new_string=mod_escaped_html(tmp_string);
        fputs(new_string, fp_outfile);
        memset(tmp_string, 0, SIZEBUF);
    }

    free(tmp_string);
    fclose(fp_infile);
    fclose(fp_outfile);
    return 0;
}
/******************************/

/* Function that normalizes the characters of a string */
char *mod_escaped_html(char *tmp_buffer)
{
    tmp_buffer = subst_string(tmp_buffer, "&lt;", "<");
    tmp_buffer = subst_string(tmp_buffer, "&gt;", ">");
    tmp_buffer = subst_string(tmp_buffer, "&nbsp;", " ");
    tmp_buffer = subst_string(tmp_buffer, "&agrave;", "à");
    tmp_buffer = subst_string(tmp_buffer, "&egrave;", "è");
    tmp_buffer = subst_string(tmp_buffer, "&eacute;", "é");
    tmp_buffer = subst_string(tmp_buffer, "&igrave;", "ì");
    tmp_buffer = subst_string(tmp_buffer, "&ograve;", "ò");
    tmp_buffer = subst_string(tmp_buffer, "&ugrave;", "ù");
    tmp_buffer = subst_string(tmp_buffer, "&laquo;", "<");
    tmp_buffer = subst_string(tmp_buffer, "&raquo;", ">");
    tmp_buffer = subst_string(tmp_buffer, ">", ">");
    tmp_buffer = subst_string(tmp_buffer, "&deg;", "°");
    tmp_buffer = subst_string(tmp_buffer, "&amp;", "&");
    tmp_buffer = subst_string(tmp_buffer, "&rsquo;", "'");
    tmp_buffer = subst_string(tmp_buffer, "&apos;", "'");
    tmp_buffer = subst_string(tmp_buffer, "&quot;", "\"");
    return tmp_buffer;
}
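
As a point of comparison for the chain of substitutions above, here is a minimal sketch of a single-pass, table-driven entity decoder. It is not the code from this thread: `subst_string` is not shown in the post, so the sketch does not use it, and the names `struct entity`, `ENTITIES` and `decode_entities` are made up for illustration. It assumes every replacement is no longer than its entity, which is what allows decoding in place.

```c
#include <string.h>

/* Hypothetical lookup table: entity name -> replacement text.
   Every replacement is shorter than its entity, so in-place decoding is safe. */
struct entity { const char *name; const char *text; };

static const struct entity ENTITIES[] = {
    { "&lt;", "<" }, { "&gt;", ">" }, { "&amp;", "&" },
    { "&quot;", "\"" }, { "&apos;", "'" }, { "&nbsp;", " " },
};

/* Decodes the known entities of s in place, in a single left-to-right pass.
   Unknown entities are copied unchanged. */
static void decode_entities(char *s)
{
    const size_t n = sizeof ENTITIES / sizeof ENTITIES[0];
    size_t in = 0, out = 0;

    while (s[in] != '\0') {
        size_t k;
        for (k = 0; k < n; k++) {
            size_t len = strlen(ENTITIES[k].name);
            if (strncmp(s + in, ENTITIES[k].name, len) == 0) {
                size_t tlen = strlen(ENTITIES[k].text);
                memcpy(s + out, ENTITIES[k].text, tlen);
                out += tlen;
                in += len;
                break;
            }
        }
        if (k == n)             /* no entity starts here: copy one character */
            s[out++] = s[in++];
    }
    s[out] = '\0';
}
```

Because the text is scanned only once, an input such as `&amp;amp;` decodes to `&amp;` and is not decoded a second time, which can happen when substitutions are applied one after another over the whole string.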

Matrixbob
23-04-2007, 18:15
:edit:

Apparently solved.

lovaz
24-04-2007, 08:41
...
I think fgets is a function that reads at MOST that many characters and then stops.
If the file is longer, then there's trouble and you have to change the MACRO.
Right?!
...
I don't think so: it's a loop, so it keeps reading until the file is finished...

Matrixbob
24-04-2007, 09:09
I don't think so: it's a loop, so it keeps reading until the file is finished...

But are we sure that fgets reads one string, then another, then another?
Isn't there a risk of loading half a tag into memory that way?!

lovaz
24-04-2007, 09:23
Yes, it reads one string at a time, but it sits inside a while loop,
so it keeps reading until fgets returns NULL
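
A small sketch of that behavior, assuming the same 4096-byte buffer as the program above (the helper name `read_in_chunks` is made up for illustration): each fgets call returns at most `BUFFER_LEN - 1` characters, and the next call continues from where the previous one stopped, so a line longer than the buffer simply arrives in several pieces.

```c
#include <stdio.h>
#include <string.h>

#define BUFFER_LEN 4096

/* Reads the stream with fgets until it returns NULL.
   Stores in *chunks how many fgets calls returned data
   and returns the total number of characters read. */
static long read_in_chunks(FILE *in, int *chunks)
{
    char buf[BUFFER_LEN];
    long total = 0;

    *chunks = 0;
    while (fgets(buf, sizeof buf, in) != NULL) {
        total += (long)strlen(buf);   /* at most BUFFER_LEN - 1 per call */
        (*chunks)++;
    }
    return total;
}
```

For a single line of 10000 characters plus a newline, the loop should run three times (4095 + 4095 + 1811 characters): nothing is lost and nothing is read twice, because the FILE stream keeps track of its own position.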

Matrixbob
24-04-2007, 09:26
Yes, it reads one string at a time, but it sits inside a while loop,
so it keeps reading until fgets returns NULL

So it has a file pointer to the string to be read?
I'm afraid that it reads the same string every time...

lovaz
24-04-2007, 10:21
Come on, compile it and try it, so you can see whether it converts correctly...

Matrixbob
24-04-2007, 10:23
Come on, compile it and try it, so you can see whether it converts correctly...

Yes, it converts, but I think it can pick up half a TAG if the "gentleman" or the "tool" that made the web page did not put the "\n" in the right places, but at random. :)

lovaz
24-04-2007, 10:26
So try it: give it an HTML file with a tag "right there", around position 4096
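
The experiment can also be sketched without a 4096-byte file. The function below is a simplified stand-in, not the dfConvert posted above: it treats every '<' as the start of a tag, and its name `strip_tags_chunk` is made up for illustration. What it has in common with dfConvert, which keeps `iStart` in a static variable, is that the "inside a tag" flag outlives a single call, so a tag cut in half by fgets is still removed.

```c
#include <stddef.h>

/* Removes everything between '<' and '>' from buf, in place.
   *in_tag carries the "inside a tag" state from one chunk to the next.
   Returns the length of the remaining text. */
static size_t strip_tags_chunk(char *buf, int *in_tag)
{
    size_t out = 0;

    for (size_t i = 0; buf[i] != '\0'; i++) {
        if (*in_tag) {
            if (buf[i] == '>')
                *in_tag = 0;          /* tag closed, possibly opened in an earlier chunk */
        } else if (buf[i] == '<') {
            *in_tag = 1;              /* tag opened, may close in a later chunk */
        } else {
            buf[out++] = buf[i];      /* ordinary text: keep it */
        }
    }
    buf[out] = '\0';
    return out;
}
```

Feeding it `Hello <di` and then `v class="x">World` with the same flag leaves `Hello ` and `World`. In the posted program the fixed patterns checked by dfCheckChar (`<BR>`, `&egrave;` and so on) are a different matter: they are compared within the current buffer only, so one of those split across two reads would presumably not be recognized as such.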