DataFrame
Constructor
| 
 | pandas-on-Spark DataFrame that corresponds to pandas DataFrame logically. | 
Attributes and underlying data
| The index (row labels) Column of the DataFrame. | |
| The column labels of the DataFrame. | |
| Returns True if the current DataFrame is empty. | 
| Return the dtypes in the DataFrame. | |
| Return a tuple representing the dimensionality of the DataFrame. | |
| Return a list representing the axes of the DataFrame. | |
| Return an int representing the number of array dimensions. | |
| Return an int representing the number of elements in this object. | |
| 
 | Return a subset of the DataFrame’s columns based on the column dtypes. | 
| Return a NumPy representation of the DataFrame or the Series. | 
Conversion
| 
 | Make a copy of this object’s indices and data. | 
| Detects missing values for items in the current DataFrame. | |
| 
 | Cast a pandas-on-Spark object to a specified dtype  | 
| Detects missing values for items in the current DataFrame. | |
| Detects non-missing values for items in the current DataFrame. | |
| Detects non-missing values for items in the current DataFrame. | |
| 
 | Synonym for DataFrame.fillna() or Series.fillna() with  | 
| Return the bool of a single element in the current object. | 
Indexing, iteration
| Access a single value for a row/column label pair. | |
| Access a single value for a row/column pair by integer position. | |
| 
 | Return the first n rows. | 
| 
 | Return index of first occurrence of maximum over requested axis. | 
| 
 | Return index of first occurrence of minimum over requested axis. | 
| Access a group of rows and columns by label(s) or a boolean Series. | |
| Purely integer-location based indexing for selection by position. | |
| This is an alias of  | |
| Iterator over (column name, Series) pairs. | |
| Iterate over DataFrame rows as (index, Series) pairs. | |
| 
 | Iterate over DataFrame rows as namedtuples. | 
| Return alias for columns. | |
| 
 | Return item and drop from frame. | 
| 
 | Return the last n rows. | 
| 
 | Return cross-section from the DataFrame. | 
| 
 | Get item from object for given key (DataFrame column, Panel slice, etc.). | 
| 
 | Replace values where the condition is False. | 
| 
 | Replace values where the condition is True. | 
| 
 | Query the columns of a DataFrame with a boolean expression. | 
Binary operator functions
| 
 | Get Addition of dataframe and other, element-wise (binary operator +). | 
| 
 | Get Addition of dataframe and other, element-wise (binary operator +). | 
| 
 | Get Floating division of dataframe and other, element-wise (binary operator /). | 
| 
 | Get Floating division of dataframe and other, element-wise (binary operator /). | 
| 
 | Get Floating division of dataframe and other, element-wise (binary operator /). | 
| 
 | Get Floating division of dataframe and other, element-wise (binary operator /). | 
| 
 | Get Multiplication of dataframe and other, element-wise (binary operator *). | 
| 
 | Get Multiplication of dataframe and other, element-wise (binary operator *). | 
| 
 | Get Subtraction of dataframe and other, element-wise (binary operator -). | 
| 
 | Get Subtraction of dataframe and other, element-wise (binary operator -). | 
| 
 | Get Exponential power of dataframe and other, element-wise (binary operator **). | 
| 
 | Get Exponential power of dataframe and other, element-wise (binary operator **). | 
| 
 | Get Modulo of dataframe and other, element-wise (binary operator %). | 
| 
 | Get Modulo of dataframe and other, element-wise (binary operator %). | 
| 
 | Get Integer division of dataframe and other, element-wise (binary operator //). | 
| 
 | Get Integer division of dataframe and other, element-wise (binary operator //). | 
| 
 | Compare if the current value is less than the other. | 
| 
 | Compare if the current value is greater than the other. | 
| 
 | Compare if the current value is less than or equal to the other. | 
| 
 | Compare if the current value is greater than or equal to the other. | 
| 
 | Compare if the current value is not equal to the other. | 
| 
 | Compare if the current value is equal to the other. | 
| 
 | Compute the matrix multiplication between the DataFrame and other. | 
Function application, GroupBy & Window
| 
 | Apply a function along an axis of the DataFrame. | 
| 
 | Apply a function to a DataFrame elementwise. | 
| 
 | Apply func(self, *args, **kwargs). | 
| 
 | Aggregate using one or more operations over the specified axis. | 
| 
 | Aggregate using one or more operations over the specified axis. | 
| 
 | Group DataFrame or Series using a Series of columns. | 
| 
 | Provide rolling transformations. | 
| 
 | Provide expanding transformations. | 
| 
 | Call  | 
Computations / Descriptive Stats
| Return a Series/DataFrame with absolute numeric value of each element. | |
| 
 | Return whether all elements are True. | 
| 
 | Return whether any element is True. | 
| 
 | Trim values at input threshold(s). | 
| 
 | Compute pairwise correlation of columns, excluding NA/null values. | 
| 
 | Count non-NA cells for each column. | 
| 
 | Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding  | 
| 
 | Return unbiased kurtosis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). | 
| 
 | Return unbiased kurtosis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). | 
| 
 | Return the mean absolute deviation of values. | 
| 
 | Return the maximum of the values. | 
| 
 | Return the mean of the values. | 
| 
 | Return the minimum of the values. | 
| 
 | Return the median of the values for the requested axis. | 
| 
 | Percentage change between the current and a prior element. | 
| 
 | Return the product of the values. | 
| 
 | Return the product of the values. | 
| 
 | Return value at the given quantile. | 
| 
 | Return number of unique elements in the object. | 
| 
 | Return unbiased standard error of the mean over requested axis. | 
| 
 | Return unbiased skew normalized by N-1. | 
| 
 | Return the sum of the values. | 
| 
 | Return sample standard deviation. | 
| 
 | Return unbiased variance. | 
| 
 | Return cumulative minimum over a DataFrame or Series axis. | 
| 
 | Return cumulative maximum over a DataFrame or Series axis. | 
| 
 | Return cumulative sum over a DataFrame or Series axis. | 
| 
 | Return cumulative product over a DataFrame or Series axis. | 
| 
 | Round a DataFrame to a variable number of decimal places. | 
| 
 | First discrete difference of element. | 
| 
 | Evaluate a string describing operations on DataFrame columns. | 
Reindexing / Selection / Label manipulation
| 
 | Prefix labels with string prefix. | 
| 
 | Suffix labels with string suffix. | 
| 
 | Align two objects on their axes with the specified join method. | 
| 
 | Select values at particular time of day (example: 9:30AM). | 
| 
 | Select values between particular times of the day (example: 9:00-9:30 AM). | 
| 
 | Drop specified labels from columns. | 
| 
 | Return DataFrame with requested index / column level(s) removed. | 
| 
 | Return DataFrame with duplicate rows removed, optionally only considering certain columns. | 
| 
 | Return boolean Series denoting duplicate rows, optionally only considering certain columns. | 
| 
 | Compare if the current value is equal to the other. | 
| 
 | Subset rows or columns of dataframe according to labels in the specified index. | 
| 
 | Select first periods of time series data based on a date offset. | 
| 
 | Return the first n rows. | 
| 
 | Select final periods of time series data based on a date offset. | 
| 
 | Alter axes labels. | 
| 
 | Set the name of the axis for the index or columns. | 
| 
 | Reset the index, or a level of it. | 
| 
 | Set the DataFrame index (row labels) using one or more existing columns. | 
| 
 | Interchange axes and swap values axes appropriately. | 
| 
 | Swap levels i and j in a MultiIndex on a particular axis. | 
| 
 | Return the elements in the given positional indices along an axis. | 
| 
 | Whether each element in the DataFrame is contained in values. | 
| 
 | Return a random sample of items from an axis of object. | 
| 
 | Truncate a Series or DataFrame before and after some index value. | 
Missing data handling
| 
 | Synonym for DataFrame.fillna() or Series.fillna() with  | 
| 
 | Remove missing values. | 
| 
 | Fill NA/NaN values. | 
| 
 | Returns a new DataFrame replacing a value with another value. | 
| 
 | Synonym for DataFrame.fillna() or Series.fillna() with  | 
| 
 | Synonym for DataFrame.fillna() or Series.fillna() with  | 
Reshaping, sorting, transposing
| 
 | Create a spreadsheet-style pivot table as a DataFrame. | 
| 
 | Return reshaped DataFrame organized by given index / column values. | 
| 
 | Sort object by labels (along an axis). | 
| 
 | Sort by the values along either axis. | 
| 
 | Return the first n rows ordered by columns in descending order. | 
| 
 | Return the first n rows ordered by columns in ascending order. | 
| Stack the prescribed level(s) from columns to index. | |
| Pivot the (necessarily hierarchical) index labels. | |
| 
 | Unpivot a DataFrame from wide format to long format, optionally leaving identifier variables set. | 
| 
 | Transform each element of a list-like to a row, replicating index values. | 
| 
 | Squeeze 1 dimensional axis objects into scalars. | 
| Transpose index and columns. | |
| Transpose index and columns. | |
| 
 | Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. | 
| 
 | Return a DataFrame with matching indices as other object. | 
| 
 | Compute numerical data ranks (1 through n) along axis. | 
Combining / joining / merging
| 
 | Append rows of other to the end of caller, returning a new object. | 
| 
 | Assign new columns to a DataFrame. | 
| 
 | Merge DataFrame objects with a database-style join. | 
| 
 | Join columns of another DataFrame. | 
| 
 | Modify in place using non-NA values from another DataFrame. | 
| 
 | Insert column into DataFrame at specified location. | 
Serialization / IO / Conversion
| 
 | Convert structured or record ndarray to DataFrame. | 
| 
 | Print a concise summary of a DataFrame. | 
| 
 | Write the DataFrame into a Spark table. | 
| 
 | Write the DataFrame out as a Delta Lake table. | 
| 
 | Write the DataFrame out as a Parquet file or directory. | 
| 
 | Write the DataFrame out to a Spark data source. | 
| 
 | Write object to a comma-separated values (csv) file. | 
| Return a pandas DataFrame. | |
| 
 | Render a DataFrame as an HTML table. | 
| A NumPy ndarray representing the values in this DataFrame or Series. | |
| 
 | Spark related features. | 
| 
 | Render a DataFrame to a console-friendly tabular output. | 
| 
 | Convert the object to a JSON string. | 
| 
 | Convert the DataFrame to a dictionary. | 
| 
 | Write object to an Excel sheet. | 
| 
 | Copy object to the system clipboard. | 
| 
 | Print Series or DataFrame in Markdown-friendly format. | 
| 
 | Convert DataFrame to a NumPy record array. | 
| 
 | Render an object to a LaTeX tabular environment table. | 
| Property returning a Styler object containing methods for building a styled HTML representation for the DataFrame. | 
Plotting
DataFrame.plot is both a callable method and a namespace attribute for
specific plotting methods of the form DataFrame.plot.<kind>.
| alias of  | |
| 
 | Draw a stacked area plot. | 
| 
 | Make a horizontal bar plot. | 
| 
 | Vertical bar plot. | 
| 
 | Draw one histogram of the DataFrame’s columns. | 
| 
 | Plot DataFrame/Series as lines. | 
| 
 | Generate a pie plot. | 
| 
 | Create a scatter plot with varying marker point size and color. | 
| 
 | Generate Kernel Density Estimate plot using Gaussian kernels. | 
| 
 | Draw one histogram of the DataFrame’s columns. | 
| 
 | Generate Kernel Density Estimate plot using Gaussian kernels. | 
Pandas-on-Spark specific
DataFrame.pandas_on_spark provides pandas-on-Spark specific features that exist only in the pandas API on Spark.
These can be accessed by DataFrame.pandas_on_spark.<function/property>.
| Apply a function that takes pandas DataFrame and outputs pandas DataFrame. | |
| Transform chunks with a function that takes pandas DataFrame and outputs pandas DataFrame. |