News for Package 'tm.plugin.koRpus'
Changes in tm.plugin.koRpus version 0.4-2 (2021-05-17)
fixed
- updated test standards after changes to koRpus' internal calculation of the number of lines in texts imported from TIF data frames
changed
- kRp.corpus: replaced prototype() in class definition with initialize method
Changes in tm.plugin.koRpus version 0.4-1 (2020-12-17)
fixed
- docTermMatrix(): results were wrong because numbers were assigned to wrong columns; now fixed in koRpus
- unit tests failed on Windows due to a UTF-8 issue
changed
- the nested object class kRp.hierarchy was replaced by kRp.corpus; instead of reproducing the file hierarchy in the object structure, kRp.corpus has a flat structure with all texts in one single data frame; this data frame was also renamed from "TT.res" into "tokens"
- the class name kRp.corpus was used in tm.plugin.koRpus before and is just being recycled ;)
- kRp.corpus inherits from class kRp.text as defined in the koRpus package
- status messages are currently only shown when only one CPU is used
- corpusTagged(): now called taggedText() as in koRpus
- corpusDesc(): now called describe() as in koRpus
- [, [<-, [[ and [[<- methods no longer apply to the summary data frame but to the tokens slot, as in koRpus (where it applies to the TT.res slot)
- show(): kRp.corpus objects now list all available features
- read.corp.custom(): removed unused mc.cores argument
- docTermMatrix(): by default behaves like most other methods and adds its result to the input object rather than returning just the matrix; also, the generic is now defined by the koRpus package and was removed, including all of the actual function code
- adjusted unit tests and vignette
- updated all examples to use a new sample corpus (see added), to the benefit that many "\dontrun{}" cases could be removed
added
- readCorpus(): the hierarchy levels of a text corpus can now be assumed directly from the directory structure by setting "hierarchy=TRUE"
- corpusHasFeatures(), corpusHasFeatures()<-, corpusFeatures(), corpusFeatures()<-, corpusHierarchy(), corpusHierarchy()<-, corpusCorpFreq(), corpusCorpFreq()<-, diffText(), diffText()<-, originalText(): new getter/setter methods for kRp.corpus objects
- split_by_doc_id(): new method transforms a kRp.corpus object into a list of kRp.text objects
- corpusDocTermMatrix(): new method to get/set the sparse document term matrix in kRp.corpus objects
- [[/[[<-: gained new argument "doc_id" to limit the scope to particular documents
- describe()/describe()<-: now support filtering by doc_id
- new sample corpus for use in examples
removed
- removed all classes and methods dealing with kRp.hierarchy
- removed deprecated methods of the pre-kRp.hierarchy era
- removed generic of tif_as_tokens_df() as it was moved to the koRpus package
Changes in tm.plugin.koRpus version 0.3-1 (2019-05-14)
fixed
- readCorpus(): solved a cryptic warning when more than one text was tokenized
added
- docTermMatrix(): new method to generate document-term matrices, either with absolute frequencies or tf-idf values
- query(): new method, extending the generic of koRpus >= 0.12-1
- filterByClass(): new method, extending the generic of koRpus >= 0.12-1
- jumbleWords(): new method, extending the generic of koRpus >= 0.12-1
- clozeDelete(): new method, extending the generic of koRpus >= 0.12-1
- cTest(): new method, extending the generic of koRpus >= 0.12-1
- textTransform(): new method, extending the generic of koRpus >= 0.12-1
- show(): new method for objects of class kRp.hierarchy
changed
- depends on koRpus >= 0.12-1 now
- depends on the Matrix package now (for docTermMatrix())
- adjusted test standards to include the additional POS tags from koRpus >= 0.12-1
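As an illustration of the two value types docTermMatrix() can produce, the following base R sketch derives tf-idf weights from a small matrix of absolute frequencies. It does not call tm.plugin.koRpus; the toy data, the variable names, and the log(N/df) weighting with length-normalized term frequencies are assumptions, and the package's own formula may differ.

```r
# Toy document-term matrix of absolute frequencies
# (rows: documents, columns: terms); the numbers are made up.
dtm <- matrix(
  c(2, 0, 1,
    0, 3, 1,
    1, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"), c("corpus", "matrix", "text"))
)

# Document frequency: in how many documents does each term occur?
doc_freq <- colSums(dtm > 0)

# Inverse document frequency, using the common log(N / df) variant (assumption).
idf <- log(nrow(dtm) / doc_freq)

# Term frequency relative to the length of each document.
tf <- dtm / rowSums(dtm)

# tf-idf: scale each term column of tf by that term's idf.
tf_idf <- sweep(tf, 2, idf, `*`)

print(round(tf_idf, 3))
```

A term that occurs in every document ("text" in this toy data) gets an idf of zero and therefore a tf-idf of zero, which is the main practical difference from the absolute frequencies.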
Changes in tm.plugin.koRpus version 0.02-2 (2019-01-18)
fixed
- readCorpus(), kRpSource(): added missing imports from packages tm, NLP and parallel
- readCorpus(): fixed status message formatting
- corpusTm(): removed useless "level" argument and corrected the output
- readCorpus(): removed unused "level" argument
- corpusFiles(): now also works with flat hierarchy objects
added
- readCorpus(): can now also import data frames in TIF format, including support for hierarchical categories
- tif_as_corpus_df(): new S4 method to transform a kRp.hierarchy object into a TIF compliant data frame
changed
- readCorpus(): the tm corpora now include full hierarchy metadata
- removed pre-hierarchy portions from internal function whatIsAvailable()
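For orientation, a corpus-level data frame in Text Interchange Format (TIF) has one row per document, with the character columns doc_id and text first. The sketch below builds such a data frame in base R; the extra columns source and topic are hypothetical stand-ins for hierarchical categories, and how readCorpus() maps such columns to hierarchy levels is not shown here.

```r
# A minimal TIF-style corpus data frame: doc_id and text come first,
# both as character vectors. All values are invented for illustration.
tif_df <- data.frame(
  doc_id = c("blog_politics_01", "blog_sports_01", "news_politics_01"),
  text = c("First example text.", "Second example text.", "Third example text."),
  source = c("blog", "blog", "news"),             # hypothetical category column
  topic = c("politics", "sports", "politics"),    # hypothetical category column
  stringsAsFactors = FALSE
)

# Basic checks of the properties described above.
stopifnot(
  identical(names(tif_df)[1:2], c("doc_id", "text")),
  is.character(tif_df$doc_id),
  is.character(tif_df$text),
  !anyDuplicated(tif_df$doc_id)
)
```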
Changes in tm.plugin.koRpus version 0.02-1 (2018-07-29)
changed
- vignette: also includes info on readCorpus()
- tests: adjusted test standards to new object class
added
- kRp.hierarchy: new S4 class to replace kRp.sourcesCorpus and kRp.topicCorpus to allow more generic nesting of hierarchical levels
- readCorpus(): new function to generate kRp.hierarchy objects recursively
- many corpus*() getter functions can now filter by hierarchy level or category ID
- removed all code regarding simpleCorpus(), sourcesCorpus() and topicCorpus(), their object classes and methods; this is all handled much more flexibly by kRp.hierarchy and readCorpus() now
Changes in tm.plugin.koRpus version 0.01-4 (2018-03-07)
fixed
- sourcesCorpus(): speak of "text" instead of "texts" if it's only one
changed
- adjusted package to support koRpus >= 0.11 and sylly, especially with regards to summary(), hyphen(), and new class constructors
- summary(): for more coherence with the koRpus package the "text" column in the summary slot was renamed into "doc_id"
- reaktanz.de supports HTTPS now, updated references
- vignette is now in RMarkdown/HTML format; the SWeave/PDF version was dropped
- hyphen()/lex.div()/readability(): 'quiet' is now TRUE by default
- lex.div(): 'char' is now an empty string by default; computing all characteristics was not a useful default for large text corpora
added
- README.md
- new [, [<-, [[ and [[<- methods added for corpus object classes
- new methods tif_as_tokens_df() to export corpus objects as a single data.frame in fully TIF compliant format
- summary(): now also includes the total number of stopwords (if available)
- new class object constructors kRp_corpus(), kRp_sourcesCorpus(), and kRp_topicCorpus() can be used instead of new("kRp.corpus", ...) etc.
Changes in tm.plugin.koRpus version 0.01-3 (2016-07-12)
fixed
- the arguments that simpleCorpus() was supposed to pipe to DirSource() weren't used
changed
- the "paths" argument of topicCorpus() now expects a list, not a vector
- using the parallel package to be able to use more CPU cores
added
- new argument "format" for simpleCorpus(), sourcesCorpus(), and topicCorpus(), to be able to work with text objects directly, instead of files
Changes in tm.plugin.koRpus version 0.01-2 (2015-07-08)
changed
- using the S4 methods of koRpus 0.06-1 now, therefore renamed all methods removing the *.corpus suffix (e.g., lex.div.corpus() is now lex.div())
- renamed classes into kRp.corpus, kRp.sourcesCorpus and kRp.topicCorpus, and their generator functions accordingly
added
- new methods read.corp.custom(), freq.analysis() and summary()
- new getter/setter methods: corpusSources(), corpusTopics(), corpusFreq(), corpusSummary()
- first basic unit tests, using the testthat package
- new option "summary" for lex.div() and readability(), to automatically update the summary data.frames
- first notes in a vignette
Changes in tm.plugin.koRpus version 0.01-1 (2015-06-29)
added
- initial release