Type: Package
Title: An Interface to the 'fastText' Library
Version: 2.1.0
Description: An interface to the 'fastText' library <https://github.com/facebookresearch/fastText>. The package can be used for text classification and to learn word vectors. An example of how to use 'fastTextR' can be found in the 'README' file.
License: BSD_3_clause + file LICENSE
Imports: stats, graphics, Rcpp (≥ 0.12.4), slam
Suggests: knitr, rmarkdown
VignetteBuilder: knitr
LinkingTo: Rcpp
Encoding: UTF-8
RoxygenNote: 7.2.3
URL: https://github.com/EmilHvitfeldt/fastTextR
BugReports: https://github.com/EmilHvitfeldt/fastTextR/issues
NeedsCompilation: yes
Packaged: 2023-12-08 23:17:48 UTC; emilhvitfeldt
Author: Florian Schwendinger [aut], Emil Hvitfeldt
Maintainer: Emil Hvitfeldt <emilhhvitfeldt@gmail.com>
Repository: CRAN
Date/Publication: 2023-12-09 00:40:09 UTC
Create a New FastText Model
Description
Create a new FastText model. The available methods are the same as the package functions but without the prefix "ft_" and without the need to provide the model.
Usage
fasttext()
Examples
ft <- fasttext()
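A minimal sketch of the method-style interface described above, assuming the methods are accessed with the usual "$" operator and mirror the package functions without the "ft_" prefix; the file name is a placeholder.
## Not run:
ft <- fasttext()
ft$load("dbpedia.bin")                     # assumed equivalent of ft_load(), bound to ft
ft$nearest_neighbors("enviroment", k = 6L) # assumed equivalent of ft_nearest_neighbors(model, ...)
ft$words()                                 # assumed equivalent of ft_words(model)
## End(Not run)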
Get Analogies
Description
Obtain word analogies for a given word triplet from a previously trained model.
Usage
ft_analogies(model, word_triplets, k = 10L)
Arguments
model: an object inheriting from "fasttext".
word_triplets: a character vector of length 3 giving the word triplet.
k: an integer giving the number of nearest neighbors to be returned.
Value
.
Examples
## Not run:
ft_analogies(model, c("berlin", "germany", "france"), k = 6L)
## End(Not run)
Default Control Settings
Description
An auxiliary function for defining the control variables.
Usage
ft_control(
loss = c("softmax", "hs", "ns"),
learning_rate = 0.05,
learn_update = 100L,
word_vec_size = 100L,
window_size = 5L,
epoch = 5L,
min_count = 5L,
min_count_label = 0L,
neg = 5L,
max_len_ngram = 1L,
nbuckets = 2000000L,
min_ngram = 3L,
max_ngram = 6L,
nthreads = 1L,
threshold = 1e-04,
label = "__label__",
verbose = 0,
pretrained_vectors = "",
output = "",
save_output = FALSE,
seed = 0L,
qnorm = FALSE,
retrain = FALSE,
qout = FALSE,
cutoff = 0L,
dsub = 2L,
autotune_validation_file = "",
autotune_metric = "f1",
autotune_predictions = 1L,
autotune_duration = 300L,
autotune_model_size = ""
)
Arguments
loss: a character string giving the name of the loss function; allowed values are "softmax", "hs", and "ns".
learning_rate: a numeric giving the learning rate; the default value is 0.05.
learn_update: an integer giving after how many tokens the learning rate should be updated. The default value is 100L.
word_vec_size: an integer giving the length (size) of the word vectors.
window_size: an integer giving the size of the context window.
epoch: an integer giving the number of epochs.
min_count: an integer giving the minimal number of word occurrences.
min_count_label: an integer giving the minimal number of label occurrences.
neg: an integer giving how many negatives are sampled (only used if the loss is "ns").
max_len_ngram: an integer giving the maximum length of ngrams used.
nbuckets: an integer giving the number of buckets.
min_ngram: an integer giving the minimal ngram length.
max_ngram: an integer giving the maximal ngram length.
nthreads: an integer giving the number of threads.
threshold: a numeric giving the sampling threshold.
label: a character string specifying the label prefix (default is "__label__").
verbose: an integer giving the verbosity level; the default value is 0.
pretrained_vectors: a character string giving the file path to the pretrained word vectors used for supervised learning.
output: a character string giving the output file path.
save_output: a logical (default is FALSE) controlling whether the output parameters should be saved.
seed: an integer giving the random seed (default is 0L).
qnorm: a logical (default is FALSE) controlling whether the norm is quantized separately.
retrain: a logical (default is FALSE) controlling whether the embeddings are fine-tuned if a cutoff is applied.
qout: a logical (default is FALSE) controlling whether the classifier is quantized.
cutoff: an integer (default is 0L) giving the number of words and ngrams to retain.
dsub: an integer (default is 2L) giving the size of each sub-vector.
autotune_validation_file: a character string giving the path to the validation file used for autotuning.
autotune_metric: a character string (default is "f1") giving the metric optimized during autotuning.
autotune_predictions: an integer (default is 1L) giving the number of predictions used during autotune evaluation.
autotune_duration: an integer (default is 300L) giving the maximum duration of the autotuning in seconds.
autotune_model_size: a character string constraining the final model file size; the empty string (default) means the model is not quantized.
Value
a list with the control variables.
Examples
ft_control(learning_rate=0.1)
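A short additional sketch showing that the returned control list can be built with several overrides and inspected before training; the chosen values are illustrative only.
ctrl <- ft_control(learning_rate = 0.1, epoch = 25L, min_count = 1L, nthreads = 1L)
str(ctrl)   # a plain list of control variables, as described under 'Value'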
Load Model
Description
Load a previously saved model from file.
Usage
ft_load(file)
Arguments
file: a character string giving the name of the file to be read in.
Value
an object inheriting from "fasttext".
Examples
## Not run:
model <- ft_load("dbpedia.bin")
## End(Not run)
Get Nearest Neighbors
Description
Obtain the k nearest neighbors of a given word from a previously trained model.
Usage
ft_nearest_neighbors(model, word, k = 10L)
Arguments
model: an object inheriting from "fasttext".
word: a character string giving the word.
k: an integer giving the number of nearest neighbors to be returned.
Value
.
Examples
## Not run:
ft_nearest_neighbors(model, "enviroment", k = 6L)
## End(Not run)
Normalize
Description
Applies normalization to a given text.
Usage
ft_normalize(txt)
Arguments
txt: a character vector to be normalized.
Value
a character vector.
Examples
## Not run:
ft_normalize(some_text)
## End(Not run)
Write Model
Description
Write a previously trained model to a file.
Usage
ft_save(model, file, what = c("model", "vectors", "output"))
Arguments
model: an object inheriting from "fasttext".
file: a character string giving the name of the file.
what: a character string giving what should be saved; allowed values are "model", "vectors", and "output".
Examples
## Not run:
ft_save(model, "my_model.bin", what = "model")
## End(Not run)
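A hedged round-trip sketch combining ft_save() and ft_load(); the model object and the temporary file path are placeholders.
## Not run:
path <- tempfile(fileext = ".bin")
ft_save(model, path, what = "model")   # write the trained model to disk
model2 <- ft_load(path)                # read it back as a "fasttext" object
## End(Not run)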
Get Sentence Vectors
Description
Obtain sentence vectors from a previously trained model.
Usage
ft_sentence_vectors(model, sentences)
Arguments
model: an object inheriting from "fasttext".
sentences: a character vector giving the sentences.
Value
a matrix containing the sentence vectors.
Examples
## Not run:
ft_sentence_vectors(model, c("sentence", "vector"))
## End(Not run)
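Since the return value is a matrix of sentence vectors (see 'Value'), ordinary matrix operations apply; a brief sketch computing the cosine similarity of two sentences, assuming one row per input sentence in input order.
## Not run:
sv <- ft_sentence_vectors(model, c("this is a sentence", "this is another sentence"))
# cosine similarity between the two sentence vectors (rows assumed to follow input order)
sum(sv[1, ] * sv[2, ]) / (sqrt(sum(sv[1, ]^2)) * sqrt(sum(sv[2, ]^2)))
## End(Not run)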
Evaluate the Model
Description
Evaluate the quality of the predictions. Precision and recall are used for the model evaluation.
Usage
ft_test(model, file, k = 1L, threshold = 0)
Arguments
model: an object inheriting from "fasttext".
file: a character string giving the location of the validation file.
k: an integer giving the number of labels to be returned.
threshold: a double giving the threshold.
Examples
## Not run:
ft_test(model, file)
## End(Not run)
Train a Model
Description
Train a new word representation model or supervised classification model.
Usage
ft_train(
file,
method = c("supervised", "cbow", "skipgram"),
control = ft_control(),
...
)
Arguments
file: a character string giving the location of the input file.
method: a character string giving the method; possible values are "supervised", "cbow", and "skipgram".
control: a list giving the control variables; for more information see ft_control().
...: additional control arguments inserted into the control list.
Examples
## Not run:
cntrl <- ft_control(nthreads = 1L)
model <- ft_train("my_data.txt", method="supervised", control = cntrl)
## End(Not run)
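An expanded, hedged sketch of a supervised run, illustrating the input format in which each line starts with the label prefix (by default "__label__", see ft_control()) followed by the text; the tiny inline data set and file names are placeholders.
## Not run:
train_file <- tempfile(fileext = ".txt")
writeLines(c(
  "__label__positive great value and works as advertised",
  "__label__negative broke after two days very disappointed"
), train_file)
cntrl <- ft_control(nthreads = 1L, epoch = 25L, min_count = 1L)
model <- ft_train(train_file, method = "supervised", control = cntrl)
ft_predict(model, "works great", k = 1L)
## End(Not run)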
Get Word Vectors
Description
Obtain word vectors from a previously trained model.
Usage
ft_word_vectors(model, words)
Arguments
model: an object inheriting from "fasttext".
words: a character vector giving the words.
Value
a matrix containing the word vectors.
Examples
## Not run:
ft_word_vectors(model, c("word", "vector"))
## End(Not run)
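Since the result is a matrix of word vectors, base R suffices for simple similarity computations; a brief sketch, assuming one row per requested word.
## Not run:
wv <- ft_word_vectors(model, c("paris", "berlin"))
# cosine similarity between the two word vectors (one row per word assumed)
sum(wv[1, ] * wv[2, ]) / (sqrt(sum(wv[1, ]^2)) * sqrt(sum(wv[2, ]^2)))
## End(Not run)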
Get Words
Description
Obtain all the words from a previously trained model.
Usage
ft_words(model)
Arguments
model: an object inheriting from "fasttext".
Value
a character vector.
Examples
## Not run:
ft_words(model)
## End(Not run)
Predict using a Previously Trained Model
Description
Predict values based on a previously trained model.
Usage
ft_predict(
model,
newdata,
k = 1L,
threshold = 0,
rval = c("sparse", "dense", "slam"),
...
)
Arguments
model: an object inheriting from "fasttext".
newdata: a character vector giving the new data.
k: an integer giving the number of labels to be returned.
threshold: a double within [0, 1] giving the threshold (default is 0).
rval: a character string controlling the return value; allowed values are "sparse", "dense", and "slam".
...: currently not used.
Value
NULL if a 'result_file' is given; otherwise, if 'prob' is true, a data.frame with the predicted labels and the corresponding probabilities, and if 'prob' is false, a character vector with the predicted labels.
Examples
## Not run:
ft_predict(model, newdata)
## End(Not run)
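A slightly fuller hedged sketch of the documented arguments; 'newdata' is a placeholder character vector, and only the shape of the call (not the exact structure of each 'rval' variant) is illustrated.
## Not run:
newdata <- c("works great and arrived on time", "stopped working after a week")
# top 3 labels per document, keeping only predictions with probability >= 0.1
ft_predict(model, newdata, k = 3L, threshold = 0.1, rval = "dense")
## End(Not run)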