Merges two data frames
merge.RdMerges two data frames
Arguments
- x
the first data frame to be joined.
- y
the second data frame to be joined.
- ...
additional argument(s) passed to the method.
- by
a character vector specifying the join columns. If by is not specified, the common column names in
xandywill be used. If by or both by.x and by.y are explicitly set to NULL or of length 0, the Cartesian Product of x and y will be returned.- by.x
a character vector specifying the joining columns for x.
- by.y
a character vector specifying the joining columns for y.
- all
a boolean value setting
all.xandall.yif any of them are unset.- all.x
a boolean value indicating whether all the rows in x should be including in the join.
- all.y
a boolean value indicating whether all the rows in y should be including in the join.
- sort
a logical argument indicating whether the resulting columns should be sorted.
- suffixes
a string vector of length 2 used to make colnames of
xandyunique. The first element is appended to each colname ofx. The second element is appended to each colname ofy.
Details
If all.x and all.y are set to FALSE, a natural join will be returned. If all.x is set to TRUE and all.y is set to FALSE, a left outer join will be returned. If all.x is set to FALSE and all.y is set to TRUE, a right outer join will be returned. If all.x and all.y are set to TRUE, a full outer join will be returned.
See also
Other SparkDataFrame functions:
SparkDataFrame-class,
agg(),
alias(),
arrange(),
as.data.frame(),
attach,SparkDataFrame-method,
broadcast(),
cache(),
checkpoint(),
coalesce(),
collect(),
colnames(),
coltypes(),
createOrReplaceTempView(),
crossJoin(),
cube(),
dapplyCollect(),
dapply(),
describe(),
dim(),
distinct(),
dropDuplicates(),
dropna(),
drop(),
dtypes(),
exceptAll(),
except(),
explain(),
filter(),
first(),
gapplyCollect(),
gapply(),
getNumPartitions(),
group_by(),
head(),
hint(),
histogram(),
insertInto(),
intersectAll(),
intersect(),
isLocal(),
isStreaming(),
join(),
limit(),
localCheckpoint(),
mutate(),
ncol(),
nrow(),
persist(),
printSchema(),
randomSplit(),
rbind(),
rename(),
repartitionByRange(),
repartition(),
rollup(),
sample(),
saveAsTable(),
schema(),
selectExpr(),
select(),
showDF(),
show(),
storageLevel(),
str(),
subset(),
summary(),
take(),
toJSON(),
unionAll(),
unionByName(),
union(),
unpersist(),
withColumn(),
withWatermark(),
with(),
write.df(),
write.jdbc(),
write.json(),
write.orc(),
write.parquet(),
write.stream(),
write.text()
Examples
if (FALSE) {
sparkR.session()
df1 <- read.json(path)
df2 <- read.json(path2)
merge(df1, df2) # Performs an inner join by common columns
merge(df1, df2, by = "col1") # Performs an inner join based on expression
merge(df1, df2, by.x = "col1", by.y = "col2", all.y = TRUE)
merge(df1, df2, by.x = "col1", by.y = "col2", all.x = TRUE)
merge(df1, df2, by.x = "col1", by.y = "col2", all.x = TRUE, all.y = TRUE)
merge(df1, df2, by.x = "col1", by.y = "col2", all = TRUE, sort = FALSE)
merge(df1, df2, by = "col1", all = TRUE, suffixes = c("-X", "-Y"))
merge(df1, df2, by = NULL) # Performs a Cartesian join
}