#1 |
Banned
Joined: Jan 2003
City:
Posts: 4421

...but the forum...
...is it getting too big?...
...dunno...maybe it's this splitting of off topic...maybe it's the modding or games subsections...but I'm really starting to feel a bit lost...it used to take me a few minutes to glance at the 3 or 4 sections I cared about...now you almost have to take time off work
#2 | |
Banned
Joined: Jul 2000
City: Malo (VI)
Posts: 1000

Re: ...but the forum...
Quote:

A little Python script that downloads the pages of the sections you care about and alerts you to new threads, and you're done
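
A minimal sketch of the "alert me about new threads" step suggested here, written for Python 3 and run on an in-memory page so it needs no network access. The link shape (`showthread.php?...threadid=N` followed by a bold title) is an assumption about the forum's HTML, and `THREAD_RE` and `find_new_threads` are hypothetical names chosen for illustration.

```python
import re

# Assumed link shape: <a href="showthread.php?...threadid=123"><b>Title</b>
THREAD_RE = re.compile(
    r'<a href="(showthread\.php\?[^"]*threadid=(\d+))"><b>([^<]*)</b>',
    re.IGNORECASE,
)


def find_new_threads(html, seen):
    """Return (thread_id, link, title) for threads not yet in `seen`.

    `seen` is a set of thread ids; it is updated in place so that a
    second call on the same page reports nothing new.
    """
    fresh = []
    for match in THREAD_RE.finditer(html):
        link, thread_id, title = match.group(1), match.group(2), match.group(3)
        if thread_id not in seen:
            seen.add(thread_id)
            fresh.append((thread_id, link, title.strip()))
    return fresh


if __name__ == "__main__":
    # Hypothetical page fragment standing in for a downloaded section page.
    sample = (
        '<a href="showthread.php?s=&threadid=101"><b>Kernel 2.6 question</b></a>'
        '<a href="showthread.php?s=&threadid=102"><b>Python regex help</b></a>'
    )
    already_seen = {"101"}
    for thread_id, link, title in find_new_threads(sample, already_seen):
        print(thread_id, title)
```

Keeping the seen ids in a set makes the "is this thread new?" check a single membership test; a real script would also have to store that set between runs, for example one id per line in a text file.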
#3 | |
Senior Member
Joined: Sep 2002
City: Prato
Posts: 446

Re: Re: ...but the forum...
Quote:

__________________
~ TheBosZ ~ My Music Setup: MusicMan John Petrucci ~Mystic Dream~ 6 strings '04; Fender Telecaster American '10; Gibson LesPaul Custom White '96; Fender Stratocaster Vintage Sunburst '00; Fender Hot Rod Deluxe; Marshall JTM 45 Reissue; Framus 2x12 Celestion Vintage; Ts9 Replica; Face Fuzz Replica; Small Clone; Holy Grail; MD-3 MicroDelay
#4 | |
Senior Member
Joined: Jul 2003
City: Home (VE)
Posts: 1659

Re: Re: Re: ...but the forum...
Quote:

__________________
- |
#5 | |
Banned
Joined: Jul 2000
City: Malo (VI)
Posts: 1000

I use this... it's thrown-together code (let's be honest, it's rubbish).
It simply fishes out the new threads in the listed sections (in this case 7, 34, 38, i.e. Linux, Programming and... who knows).

Code:
```python
# Python 2 script (print statement, urllib2).
import sys,re,time,urllib,smtplib,urllib2

# Change those
data_path = '/dati/Coding/forum_parser/'
recipient = '[email protected]'
smtpserver = 'localhost:25'
fromadd = '[email protected]'
forum_list = [ '7' , '34' , '38' ]
verbose_level = 4

def log( level , txt ):
    # Print only messages below the configured verbosity
    if level < verbose_level:
        print txt

def download( uri ):
    log(3,'downloading page ' + uri )
    x = urllib2.urlopen( uri ).read()
    log(3,'done !')
    return x

def prepare_html( topic_list , forum ):
    # Unused helper: HTML variant of the mail body.
    # topic_list holds (link,title) pairs, forum is the base address of the forum.
    body = ("<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\n<html><body>" +
            "<p>The following topics appeared on the Forum:</p>\n\n" )
    for m in topic_list:
        (link,title) = m
        body = body + "<p><a href=\"" + forum + link + "\">" + title + "</a></p>\n"
    body = body + "</body></html>"
    return (body,'text/html')

class forum_parser:
    def __init__( self , name , uri , fetch , forum_list , regex , num_regex , seen_name ):
        # Subject of the mail
        self.name = name
        # Base address of the forum
        self.uri = uri
        # Fetch page base uri
        self.fetch = fetch
        # Forum numbers to be appended to the fetch page
        self.forum_list = forum_list
        # Regular expression used to find the link and the title;
        # it needs two groups, the first to get the (relative) link,
        # the second to get the topic title
        self.p = re.compile(regex,re.IGNORECASE|re.DOTALL )
        # Regular expression used to find the topic number from the thread link
        self.find_num = re.compile(num_regex)
        self.seen_file = seen_name
        # Has the topic to appear or not in the seen list ?
        self.must_appear = False
        self.load( data_path + seen_name )

    def load(self,file_name):
        log ( 1 , 'loading ' + file_name )
        self.seen = {}
        try:
            c = open(file_name).readlines()
            for i in c :
                self.seen[ i.strip() ] = True
        except IOError:
            # No seen file yet: start from scratch
            self.seen = {}
        except ValueError:
            log(1,'Warning: invalid data found in seen_file: using default values')
            self.seen = {}

    def save(self):
        f = open(data_path + self.seen_file,'w')
        for i in self.seen:
            f.write( str(i) + "\n")
        f.close()

    def generate( self , topic_list ):
        body = "The following topics appeared on " + self.name + ":\n\n"
        for m in topic_list:
            title = m['title']
            link = m['link']
            body = body + title + "\n" + self.uri + link + "\n\n\n"
        return (body,'text/plain')

    def get_num( self , uri ):
        return self.find_num.search( uri ).group(1)

    def parse( self , contents ):
        log(3,'parsing...')
        iterator = self.p.finditer( contents )
        unknown = []
        for match in iterator:
            link,title = match.group(1),match.group(2)
            thread_num = self.get_num( link )
            unknown.append( { 'link':link,'title':title , 'thread_num':thread_num } )
        unknown = self.remove_unwanted(unknown)
        if len( unknown ) > 0:
            return self.generate( unknown )
        else:
            return ('','')

    def remove_unwanted(self,link_list):
        '''Filter out unwanted topics, and return a new list'''
        log(3,'removing unwanted topics')
        result = []
        for i in link_list:
            log(3,'considering thread ' + i['thread_num'] )
            if ( i['thread_num'] in self.seen ) == self.must_appear:
                log(3,"I'm going to keep it")
                result.append( i )
                self.seen[ i['thread_num'] ] = True
        return result

    def load_page(self,page):
        p = self.uri + self.fetch + page
        try:
            return download(p)
        except IOError:
            log( 0 , "Warning: unable to fetch page " + p )
            return ''

    def go(self):
        page = ''
        for p in self.forum_list:
            page = page + self.load_page(p)
        if page != '':
            (body,content_type) = self.parse( page )
            if ( body != '' ):
                log( 1 , "Sending message to " + recipient )
                message = ("To: " + recipient +
                           "\nFrom: " + fromadd +
                           "\nSubject: " + self.name +
                           "\nDate: " + time.strftime("%a, %d %b %Y %H:%M:%S -0000", time.gmtime()) +
                           "\nContent-type: " + content_type + "; charset=utf-8" +
                           "\n\n" + body )
                serv = smtplib.SMTP(smtpserver)
                serv.sendmail(fromadd,recipient,message)
                serv.quit()
                # Remember the reported threads only after the mail went out
                self.save()
            else:
                log( 0 , "No new topics,exiting..." )
        else:
            log( 0 , "No pages loaded, connection down ?" )

hw = forum_parser( 'HWUpgrade Forum' ,
                   "http://forum.hwupgrade.it/",
                   'forumdisplay.php?s=&forumid=',
                   forum_list ,
                   "<a href=\"(showthread.php\?s=[0-9a-zA-Z]*\&threadid=\d+)\"><b>([^<.]*?)</b>" ,
                   r'threadid=(\d+)' ,
                   'visti.txt' )
hw.go()
```
Quote:
Last edited by /\/\@®¢Ø : 07-08-2004 at 17:36.
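
The script in this post assembles the mail headers by concatenating strings. As a hedged alternative for Python 3, the sketch below builds the same kind of plain-text notification with the standard library's `email.message.EmailMessage`, which takes care of header formatting and the body charset. The function names `build_message` and `send`, the `(title, link)` tuple layout, and the default host and port are assumptions made for illustration, not part of the original script.

```python
import smtplib
from email.message import EmailMessage
from email.utils import formatdate


def build_message(subject, sender, recipient, topics, base_uri):
    """Build a plain-text notification mail.

    `topics` is a list of (title, relative_link) pairs; each link is
    appended to `base_uri` to form an absolute address.
    """
    lines = ["The following topics appeared on " + subject + ":", ""]
    for title, link in topics:
        lines += [title, base_uri + link, ""]

    msg = EmailMessage()
    msg["To"] = recipient
    msg["From"] = sender
    msg["Subject"] = subject
    msg["Date"] = formatdate()          # current time as an RFC 2822 date
    msg.set_content("\n".join(lines))   # sets a text/plain body
    return msg


def send(msg, smtp_host="localhost", smtp_port=25):
    """Hand the message to an SMTP server (host and port are assumptions)."""
    with smtplib.SMTP(smtp_host, smtp_port) as server:
        server.send_message(msg)
```

Separating message construction from delivery means the body and headers can be inspected or tested without a mail server; only `send` touches the network.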
#6 |
Banned
Joined: Jul 2000
City: Malo (VI)
Posts: 1000

Ugh, why is the quote button so damn close to the edit button?
All times are GMT +1. The time now is 06:30.