PDA

View Full Version : Programma per contare la frequenza di una parola in un documento


Careca
05-04-2006, 01:55
Ciao.

Sto cercando un programma possibilmente freeware che analizza un documento di testo (txt, doc, rtf) e conta le parole più utilizzate e fa altre statistiche di questo tipo con magari un bel grafico.

Un programma del genere si chiama di analisi lessicale.

Qualche consoglio?

Grazie
Ciao

Blackie
05-04-2006, 09:20
www.funsci.com

Careca
05-04-2006, 09:58
www.funsci.com

Ciao. Grazie per la segnalazione. Il sito è veramente molto bello e interessante...solo che lo avevo già trovato e mi serviva qualcosa di più semplice da utilizzare visto che devo analizzare parecchi testi.

Puoi aiutarmi?

Grazie
Ciao

Blackie
05-04-2006, 10:47
Basta stampare i documenti e sottolineare con matite di diversi colori le parole che devi contare, poi pagare un paio di amici che facciano i conti.... :O
Sto scherzando :D :D
Non saprei che programmi consigliarti, c'era un modo per farlo con Word ma non mi ricordo come.
Puoi provare dei programmi per scrivere (Openoffice,Abiword,Tomahawk) e vedere se hanno la funzione che ti serve. I tre che ti ho segnato sono programmi free.

Careca
05-04-2006, 11:31
Basta stampare i documenti e sottolineare con matite di diversi colori le parole che devi contare, poi pagare un paio di amici che facciano i conti.... :O

Scusa ma non avevo detto freeware? :D :Prrr:

Sto scherzando :D :D

Anch'io ;)
Non saprei che programmi consigliarti, c'era un modo per farlo con Word ma non mi ricordo come.
Puoi provare dei programmi per scrivere (Openoffice,Abiword,Tomahawk) e vedere se hanno la funzione che ti serve. I tre che ti ho segnato sono programmi free.

Pensavo anch'io che si potesse fare con word ma non ho trovato nulla (solo il numero di parole, frasi...)
Se ti dovesse venire in mente tienimi presente. ;)

Grazie
Ciao

Blackie
05-04-2006, 11:51
Il sistema c'è con word, attraverso una funzione del correttore automatico, mi sembrava di averlo letto in un'altra discussione del forum, guarda bene nella sezione programmi e utility

caviccun
05-04-2006, 12:48
Il sistema c'è con word, attraverso una funzione del correttore automatico, mi sembrava di averlo letto in un'altra discussione del forum, guarda bene nella sezione programmi e utility

un programma così lo si può fare in pascal , con poche righe di codice (senza grafico) . potresti chiedere anche nel forum "linguaggi". se non hai premura provo a scriverlo (non te lo assicuro) io e te lo invio.

Careca
05-04-2006, 14:23
un programma così lo si può fare in pascal , con poche righe di codice (senza grafico) . potresti chiedere anche nel forum "linguaggi". se non hai premura provo a scriverlo (non te lo assicuro) io e te lo invio.

Ti ringrazio ma un programma del genere me l'ha segnalato già Blackie.
Sto cercando qualcosa di più user-friendly per gestire parecchi documenti.

x Blackie: non sono riuscito a trovare niente...non ti ricordi qualche altro particolare?

Grazie ad entrambi
Ciao

Careca
05-04-2006, 15:36
Sub WordFrequency()


Dim SingleWord As String 'Raw word pulled from doc
Const maxwords = 9000 'Maximum unique words allowed
Dim Words(maxwords) As String 'Array to hold unique words
Dim Freq(maxwords) As Integer 'Frequency counter for Unique Words
Dim WordNum As Integer 'Number of unique words
Dim ByFreq As Boolean 'Flag for sorting order
Dim ttlwds As Long 'Total words in the document
Dim Excludes As String 'Words to be excluded
Dim Found As Boolean 'Temporary flag
Dim j, k, l, Temp As Integer 'Temporary variables
Dim tword As String '


' Set up excluded words
Excludes = ""
Excludes = InputBox$("Enter words that you wish to exclude, surrounding each word with [ ].", "Excluded Words", "")


' Find out how to sort
ByFreq = True
Ans = InputBox$("Sort by WORD or by FREQ?", "Sort order", "FREQ")
If Ans = "" Then End
If UCase(Ans) = "WORD" Then
ByFreq = False
End If


Selection.HomeKey Unit:=wdStory
System.Cursor = wdCursorWait
WordNum = 0
ttlwds = ActiveDocument.Words.Count
Totalwords = ActiveDocument.Words.Count


' Control the repeat
For Each aword In ActiveDocument.Words
SingleWord = Trim(aword)
If SingleWord < "A" Or SingleWord > "z" Then SingleWord = ""
'Out of range?
If InStr(Excludes, "[" & SingleWord & "]") Then SingleWord = ""
'On exclude list?
If Len(SingleWord) > 0 Then
Found = False
For j = 1 To WordNum
If Words(j) = SingleWord Then
Freq(j) = Freq(j) + 1
Found = True
Exit For
End If
Next j
If Not Found Then
WordNum = WordNum + 1
Words(WordNum) = SingleWord
Freq(WordNum) = 1
End If
If WordNum > maxwords - 1 Then
j = MsgBox("The maximum array size has been exceeded. Increase maxwords.", vbOKOnly)
Exit For
End If
End If
ttlwds = ttlwds - 1
StatusBar = "Remaining: " & ttlwds & " Unique: " & WordNum
Next aword


' Now sort it into word order
For j = 1 To WordNum - 1
k = j
For l = j + 1 To WordNum
If (Not ByFreq And Words(l) < Words(k)) Or (ByFreq And Freq(l) > Freq(k)) Then k = l
Next l
If k <> j Then
tword = Words(j)
Words(j) = Words(k)
Words(k) = tword
Temp = Freq(j)
Freq(j) = Freq(k)
Freq(k) = Temp
End If
StatusBar = "Sorting: " & WordNum - j
Next j


' Now write out the results
tmpName = ActiveDocument.AttachedTemplate.FullName
Documents.Add Template:=tmpName, NewTemplate:=False
Selection.ParagraphFormat.TabStops.ClearAll
With Selection
For j = 1 To WordNum
.TypeText Text:=Words(j) & vbTab & Trim(Str(Freq(j))) & vbCrLf
Next j
End With
ActiveDocument.Range.Select
Selection.ConvertToTable
Selection.Collapse wdCollapseStart
ActiveDocument.Tables(1).Rows.Add BeforeRow:=Selection.Rows(1)
ActiveDocument.Tables(1).Cell(1, 1).Range.InsertBefore "Word"
ActiveDocument.Tables(1).Cell(1, 2).Range.InsertBefore "Occurrences"
ActiveDocument.Tables(1).Range.ParagraphFormat.Alignment = wdAlignParagraphCenter
ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 1).Range.InsertBefore "Total words in Document"
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 2).Range.InsertBefore Totalwords
ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 1).Range.InsertBefore "Number of different words in Document"
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 2).Range.InsertBefore Trim(Str(WordNum))
System.Cursor = wdCursorNormal
' j = MsgBox("There were " & Trim(Str(WordNum)) & " different words ", vbOKOnly, "Finished")
Selection.HomeKey wdStory


End Sub

---------------------------
Doug Robbins - Word MVP

Careca
05-04-2006, 15:52
Sub WordFrequencyPerCartella()

Dim SingleWord As String 'Raw word pulled from doc
Const maxwords = 9000 'Maximum unique words allowed
Dim Words(maxwords) As String 'Array to hold unique words
Dim Freq(maxwords) As Integer 'Frequency counter for Unique Words
Dim WordNum As Integer 'Number of unique words
Dim ByFreq As Boolean 'Flag for sorting order
Dim ttlwds As Long 'Total words in the document
Dim Excludes As String 'Words to be excluded
Dim Found As Boolean 'Temporary flag
Dim i, j, k, l, q, Temp As Integer 'Temporary variables
Dim tword As String
Dim Ans As String
Dim totalwords As Long
Dim aword As Object
Dim sDocumenti As String
Dim tmpname As String
Dim DirName As String


With Application.FileDialog(msoFileDialogFolderPicker)
.Title = "Seleziona la cartella contenente i file da analizzare"
.Show
If .SelectedItems.Count = 0 Then Exit Sub
DirName = .SelectedItems(1)
End With


' Set up excluded words
Excludes = ""
Excludes = InputBox$("Enter words that you wish to exclude,surrounding each word with [ ].", "Excluded Words", "")


' Find out how to sort
ByFreq = True
Ans = InputBox$("Sort by WORD or by FREQ?", "Sort order", "FREQ")
If Ans = "" Then End
If UCase(Ans) = "WORD" Then
ByFreq = False
End If


WordNum = 0


With Application.FileSearch
.NewSearch
.LookIn = DirName
.SearchSubFolders = False 'impostare a True per analizzare anche sottocartelle
.FileName = "*.doc"
.MatchTextExactly = True
'.FileType = msoFileTypeWordDocuments


If .Execute() > 0 Then
MsgBox "Trovati " & .FoundFiles.Count & "file(s)."
For i = 1 To (.FoundFiles.Count)
Documents.Open FileName:=.FoundFiles(i)
With ActiveDocument


If sDocumenti = "" Then
sDocumenti = .Name
Else
sDocumenti = sDocumenti + vbCrLf + .Name
End If


Selection.HomeKey Unit:=wdStory
System.Cursor = wdCursorWait
ttlwds = ttlwds + ActiveDocument.Words.Count
totalwords = totalwords + ActiveDocument.Words.Count


' Control the repeat
For Each aword In ActiveDocument.Words
SingleWord = Trim(aword)
If SingleWord < "A" Or SingleWord > "z" Then SingleWord = ""
'Out of range?
If InStr(Excludes, "[" & SingleWord & "]") Then SingleWord = ""
'On exclude list?
If Len(SingleWord) > 0 Then
Found = False
For j = 1 To WordNum
If Words(j) = SingleWord Then
Freq(j) = Freq(j) + 1
Found = True
Exit For
End If
Next j
If Not Found Then
WordNum = WordNum + 1
Words(WordNum) = SingleWord
Freq(WordNum) = 1
End If
If WordNum > maxwords - 1 Then
j = MsgBox("The maximum array size has been exceeded.Increase maxwords.", vbOKOnly)
Exit For
End If
End If
ttlwds = ttlwds - 1
StatusBar = "Remaining: " & ttlwds & " Unique: " & WordNum
Next aword


.Close wdDoNotSaveChanges
End With
Next i
Else
MsgBox "Nessun file trovato"
End If
End With


' Now sort it into word order
For j = 1 To WordNum - 1
k = j
For l = j + 1 To WordNum
If (Not ByFreq And Words(l) < Words(k)) Or (ByFreq And Freq(l) > Freq(k)) Then k = l
Next l
If k <> j Then
tword = Words(j)
Words(j) = Words(k)
Words(k) = tword
Temp = Freq(j)
Freq(j) = Freq(k)
Freq(k) = Temp
End If
StatusBar = "Sorting: " & WordNum - j
Next j


' Now write out the results
tmpname = ActiveDocument.AttachedTemplate.FullName
Documents.Add Template:=tmpname, NewTemplate:=False
Selection.ParagraphFormat.TabStops.ClearAll
With Selection
For j = 1 To WordNum
.TypeText Text:=Words(j) & vbTab & Trim(Str(Freq(j))) & vbCrLf
Next j
End With
ActiveDocument.Range.Select
Selection.ConvertToTable
Selection.Collapse wdCollapseStart
ActiveDocument.Tables(1).Rows.Add BeforeRow:=Selection.Rows(1)
ActiveDocument.Tables(1).Cell(1, 1).Range.InsertBefore "Word"
ActiveDocument.Tables(1).Cell(1, 2).Range.InsertBefore "Occurrences"
ActiveDocument.Tables(1).Range.ParagraphFormat.Alignment = wdAlignParagraphCenter
ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 1).Range.InsertBefore "Total words in Documents"
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 2).Range.InsertBefore totalwords
ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 1).Range.InsertBefore "Number of different words in Documents"
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 2).Range.InsertBefore Trim(Str(WordNum))
ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 1).Range.InsertBefore "Documenti analizzati:"
ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 1).Range.InsertBefore sDocumenti


System.Cursor = wdCursorNormal
j = MsgBox("There were " & Trim(Str(WordNum)) & " different words", vbOKOnly, "Finished")
Selection.HomeKey wdStory


End Sub
'------------------------------------------------------------------


--
Tiziano Marmiroli
Microsoft MVP - Office

caviccun
05-04-2006, 16:49
Sub WordFrequencyPerCartella()

Dim SingleWord As String 'Raw word pulled from doc
Const maxwords = 9000 'Maximum unique words allowed
Dim Words(maxwords) As String 'Array to hold unique words
Dim Freq(maxwords) As Integer 'Frequency counter for Unique Words
Dim WordNum As Integer 'Number of unique words
Dim ByFreq As Boolean 'Flag for sorting order
Dim ttlwds As Long 'Total words in the document
Dim Excludes As String 'Words to be excluded
Dim Found As Boolean 'Temporary flag
Dim i, j, k, l, q, Temp As Integer 'Temporary variables
Dim tword As String
Dim Ans As String
Dim totalwords As Long
Dim aword As Object
Dim sDocumenti As String
Dim tmpname As String
Dim DirName As String


With Application.FileDialog(msoFileDialogFolderPicker)
.Title = "Seleziona la cartella contenente i file da analizzare"
.Show
If .SelectedItems.Count = 0 Then Exit Sub
DirName = .SelectedItems(1)
End With


' Set up excluded words
Excludes = ""
Excludes = InputBox$("Enter words that you wish to exclude,surrounding each word with [ ].", "Excluded Words", "")


' Find out how to sort
ByFreq = True
Ans = InputBox$("Sort by WORD or by FREQ?", "Sort order", "FREQ")
If Ans = "" Then End
If UCase(Ans) = "WORD" Then
ByFreq = False
End If


WordNum = 0


With Application.FileSearch
.NewSearch
.LookIn = DirName
.SearchSubFolders = False 'impostare a True per analizzare anche sottocartelle
.FileName = "*.doc"
.MatchTextExactly = True
'.FileType = msoFileTypeWordDocuments


If .Execute() > 0 Then
MsgBox "Trovati " & .FoundFiles.Count & "file(s)."
For i = 1 To (.FoundFiles.Count)
Documents.Open FileName:=.FoundFiles(i)
With ActiveDocument


If sDocumenti = "" Then
sDocumenti = .Name
Else
sDocumenti = sDocumenti + vbCrLf + .Name
End If


Selection.HomeKey Unit:=wdStory
System.Cursor = wdCursorWait
ttlwds = ttlwds + ActiveDocument.Words.Count
totalwords = totalwords + ActiveDocument.Words.Count


' Control the repeat
For Each aword In ActiveDocument.Words
SingleWord = Trim(aword)
If SingleWord < "A" Or SingleWord > "z" Then SingleWord = ""
'Out of range?
If InStr(Excludes, "[" & SingleWord & "]") Then SingleWord = ""
'On exclude list?
If Len(SingleWord) > 0 Then
Found = False
For j = 1 To WordNum
If Words(j) = SingleWord Then
Freq(j) = Freq(j) + 1
Found = True
Exit For
End If
Next j
If Not Found Then
WordNum = WordNum + 1
Words(WordNum) = SingleWord
Freq(WordNum) = 1
End If
If WordNum > maxwords - 1 Then
j = MsgBox("The maximum array size has been exceeded.Increase maxwords.", vbOKOnly)
Exit For
End If
End If
ttlwds = ttlwds - 1
StatusBar = "Remaining: " & ttlwds & " Unique: " & WordNum
Next aword


.Close wdDoNotSaveChanges
End With
Next i
Else
MsgBox "Nessun file trovato"
End If
End With


' Now sort it into word order
For j = 1 To WordNum - 1
k = j
For l = j + 1 To WordNum
If (Not ByFreq And Words(l) < Words(k)) Or (ByFreq And Freq(l) > Freq(k)) Then k = l
Next l
If k <> j Then
tword = Words(j)
Words(j) = Words(k)
Words(k) = tword
Temp = Freq(j)
Freq(j) = Freq(k)
Freq(k) = Temp
End If
StatusBar = "Sorting: " & WordNum - j
Next j


' Now write out the results
tmpname = ActiveDocument.AttachedTemplate.FullName
Documents.Add Template:=tmpname, NewTemplate:=False
Selection.ParagraphFormat.TabStops.ClearAll
With Selection
For j = 1 To WordNum
.TypeText Text:=Words(j) & vbTab & Trim(Str(Freq(j))) & vbCrLf
Next j
End With
ActiveDocument.Range.Select
Selection.ConvertToTable
Selection.Collapse wdCollapseStart
ActiveDocument.Tables(1).Rows.Add BeforeRow:=Selection.Rows(1)
ActiveDocument.Tables(1).Cell(1, 1).Range.InsertBefore "Word"
ActiveDocument.Tables(1).Cell(1, 2).Range.InsertBefore "Occurrences"
ActiveDocument.Tables(1).Range.ParagraphFormat.Alignment = wdAlignParagraphCenter
ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 1).Range.InsertBefore "Total words in Documents"
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 2).Range.InsertBefore totalwords
ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 1).Range.InsertBefore "Number of different words in Documents"
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 2).Range.InsertBefore Trim(Str(WordNum))
ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 1).Range.InsertBefore "Documenti analizzati:"
ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 1).Range.InsertBefore sDocumenti


System.Cursor = wdCursorNormal
j = MsgBox("There were " & Trim(Str(WordNum)) & " different words", vbOKOnly, "Finished")
Selection.HomeKey wdStory


End Sub
'------------------------------------------------------------------


--
Tiziano Marmiroli
Microsoft MVP - Office

BELLISSIMO , non so se funziona perchè non ho VB , tuttavia resto sempre affascinato dai codici.

Careca
05-04-2006, 17:05
Le ho provate tutte e due e funzionano perfettamente :)