...but is the forum...

ally · 07-08-2004, 13:44

...getting too big?...

...dunno...maybe it's this splitting of off topic...maybe it's the modding or gaming subsections...but I'm really starting to feel a bit lost...it used to take me a few minutes to glance at the 3 or 4 sections I cared about...now you almost have to take time off work

/\/\@®¢Ø · 07-08-2004, 14:03

Quote:

Originally posted by ally
...getting too big?...

...dunno...maybe it's this splitting of off topic...maybe it's the modding or gaming subsections...but I'm really starting to feel a bit lost...it used to take me a few minutes to glance at the 3 or 4 sections I cared about...now you almost have to take time off work

We're in the age of automation.
A little Python script that downloads the pages of the sections you care about and alerts you to new threads, and you're done.

OzzYRulez · 07-08-2004, 14:55

Quote:

Originally posted by /\/\@®¢Ø
We're in the age of automation.
A little Python script that downloads the pages of the sections you care about and alerts you to new threads, and you're done.

Interested....where, how, when and....dunno?!!?

|unknown| · 07-08-2004, 15:17

Quote:

Originally posted by OzzYRulez
Interested....where, how, when and....dunno?!!?

grab Notepad and start writing

/\/\@®¢Ø · 07-08-2004, 17:31

I use this... it's thrown-together code (let's be honest, it's a mess), but it does its job.
It simply picks out the new threads in the listed sections (here 7, 34 and 38, i.e. Linux, Programming and... dunno) and mails them to the address you give it. You only have to change the first few lines, where you set where you saved the program (data_path), the mail recipient (recipient), the server to send it through (smtpserver), the sender (fromadd) and the list of forums (forum_list). Schedule it to run every few hours and you're set.

Code:

import re,time,smtplib,urllib2
# Change these

data_path = '/dati/Coding/forum_parser/'
recipient = '[email protected]'
smtpserver = 'localhost:25'
fromadd = '[email protected]'
forum_list = [ '7' , '34' , '38' ]
verbose_level = 4

def log( level , txt ):
    if level < verbose_level:
        print txt

def download( uri ):
    log(3,'downloading page ' + uri )
    x = urllib2.urlopen( uri ).read()
    log(3,'done !')
    return x

def prepare_html( topic_list , base_uri ):
    # HTML alternative to forum_parser.generate; base_uri is the forum base address
    body = ("<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\n<html><body>"
         + "<p>The following topics appeared on the Forum:</p>\n\n" )
    for m in topic_list:
        link , title = m['link'] , m['title']
        body = body + "<p><a href=\"" + base_uri + link + "\">" + title + "</a></p>\n"
    body = body + "</body></html>"
    return (body,'text/html')


class forum_parser:
    def __init__( self , name , uri , fetch , forum_list , regex , num_regex , seen_name ):
        # Subject of the mail
        self.name = name
        # Base address of the forum
        self.uri = uri
        # Fetch page base uri
        self.fetch = fetch
        # Forum numbers to be appended to the fetch page
        self.forum_list = forum_list
        # Regular expression used to find the link and the title;
        # it needs two groups: the first captures the (relative) link, the second the topic title
        self.p = re.compile(regex,re.IGNORECASE|re.DOTALL )
        # Regular expression used to find the topic number from the thread link
        self.find_num = re.compile(num_regex)
        # Name of the file (under data_path) holding the thread numbers already seen
        self.seen_file = seen_name
        # Must a topic already be in the seen list to be reported? (False: report only unseen topics)
        self.must_appear = False
        self.load( data_path + seen_name )

    def load(self,file_name):
        log ( 1 , 'loading ' + file_name )
        self.seen = {}
        try:

            c = file(file_name).readlines()
            for i in c :
                self.seen[ i.strip() ] = True
        except IOError:
            self.seen = {}
        except ValueError:
            log(1,'Warning: invalid data found in seen_file: using default values')
            self.seen = {}

    def save(self):
        # Write one seen thread number per line, then close the file explicitly
        f = file(data_path + self.seen_file,'w')
        for i in self.seen:
            f.write( str(i) + "\n")
        f.close()

    def generate( self , topic_list ):
        body = "The following topics appeared on " + self.name + ":\n\n"
        for m in topic_list:
            title = m['title']
            link = m['link']
            body = body + title + "\n" + self.uri + link + "\n\n\n"
        return (body,'text/plain')

    def get_num( self , uri ):
        return self.find_num.search( uri ).group(1)


    def parse( self , contents ):
        log(3,'parsing...')
        iterator = self.p.finditer( contents )
        unknown = []
        for match in iterator:
            link,title = match.group(1),match.group(2)
            thread_num = self.get_num( link )
            unknown.append( { 'link':link,'title':title , 'thread_num':thread_num } )
        unknown = self.remove_unwanted(unknown)
        if len( unknown ) > 0:
            return self.generate( unknown )
        else:
            return ('','')

    def remove_unwanted(self,link_list):
        '''Filter out unwanted topics, and return a new list'''
        log(3,'removing unwanted topics')
        result = []
        for i in link_list:
            log(3,'considering thread ' + i['thread_num'] )
            if self.seen.has_key( i['thread_num'] ) == self.must_appear:
                log(3,"I'm going to keep it")
                result.append( i )
                self.seen[ i['thread_num'] ] = True
        return result


    def load_page(self,page):
        p = self.uri + self.fetch + page
        try:
            return download(p)
        except IOError:
            log( 0 , "Warning: unable to fetch page " + p )
            return ''

    def go(self):
        page = ''
        for p in self.forum_list:
            page = page + self.load_page(p)
        if page != '':
            (body,content_type) = self.parse( page )
            if ( body != '' ):
                log( 1 , "Sending message to " + recipient )
                message = ("To: " + recipient
                    + "\nFrom: " + fromadd
                    + "\nSubject: " + self.name
                    + "\nDate: " + time.strftime("%a, %d %b %Y %H:%M:%S -0000", time.gmtime())
                    + "\nContent-Type: " + content_type + "; charset=utf-8"
                    + "\n\n" + body )
                serv = smtplib.SMTP(smtpserver)
                serv.sendmail(fromadd,recipient,message)
                serv.quit()
                self.save()
            else:
                log( 0 , "No new topics,exiting..." )
        else:
            log( 0 , "No pages loaded, connection down ?" )




hw = forum_parser( 'HWUpgrade Forum' ,
                   "http://forum.hwupgrade.it/",
                   'forumdisplay.php?s=&forumid=',
                   forum_list ,
                   r'<a href="(showthread\.php\?s=[0-9a-zA-Z]*&threadid=\d+)"><b>([^<.]*?)</b>' ,
                   r'threadid=(\d+)'
                   , 'visti.txt' )

hw.go()
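
As an aside, the core of the script (pull the thread links out of the page, then drop the ones already seen) can be tried on its own. What follows is only a minimal sketch in Python 3, not the script above: it uses the same two patterns the script passes to forum_parser, written here as raw strings with the dot in showthread.php escaped. SAMPLE_PAGE, find_threads and keep_new are hypothetical names made up for this illustration, and the HTML fragment is invented rather than taken from the real forum.

```python
import re

# Link/title pattern and thread-number pattern, as passed to forum_parser
THREAD_RE = re.compile(
    r'<a href="(showthread\.php\?s=[0-9a-zA-Z]*&threadid=\d+)"><b>([^<.]*?)</b>',
    re.IGNORECASE | re.DOTALL,
)
THREAD_ID_RE = re.compile(r'threadid=(\d+)')

# Hypothetical page fragment, invented for illustration only
SAMPLE_PAGE = (
    '<td><a href="showthread.php?s=&threadid=101"><b>Kernel question</b></a></td>\n'
    '<td><a href="showthread.php?s=abc123&threadid=102"><b>Python sockets</b></a></td>\n'
)

def find_threads(html):
    """Return one dict (link, title, thread_num) per thread link found."""
    threads = []
    for match in THREAD_RE.finditer(html):
        link, title = match.group(1), match.group(2)
        thread_num = THREAD_ID_RE.search(link).group(1)
        threads.append({'link': link, 'title': title, 'thread_num': thread_num})
    return threads

def keep_new(threads, seen):
    """Keep only threads whose number is not in `seen`, then mark them as seen."""
    fresh = [t for t in threads if t['thread_num'] not in seen]
    seen.update(t['thread_num'] for t in fresh)
    return fresh

seen_ids = {'101'}  # pretend thread 101 was already reported
new_threads = keep_new(find_threads(SAMPLE_PAGE), seen_ids)
for t in new_threads:
    print(t['thread_num'], t['title'])
```

One thing worth knowing: the title group `[^<.]*?` cannot match a period, so a thread whose title contains a dot is not picked up by this pattern.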

The script's output arrives in your mailbox in HTML format, in the following form:

Quote:

The code could indeed be shorter... but I also use it for other forums, so I had to generalize it a bit.

/\/\@®¢Ø · 07-08-2004, 17:32

ufff, why is the quote button so damn close to the edit button?
