From: CORAL: COde RepresentAtion learning with weakly-supervised transformers for analyzing data analysis
Stage | Definition | When to Use | When Not to Use | Example |
---|---|---|---|---|
Import | These cells are used primarily to import libraries into the Python environment. Although they may serve other functions, like defining constants or initializing helper objects, the majority of the code in these cells sets up analytical tools for use later in the notebook. | Loading libraries, defining constants, initializing environments, connecting to databases | A cell has one or more import statements, but most of the cell serves another purpose | |
Wrangle | Wrangle cells clean, filter, summarize, and/or integrate data. These cells often permute data for use in later cells. | Cleaning data, feature processing, data transformations, augmenting an existing dataset, loading and/or saving data, splitting data into train and test sets | Transformations are applied, but the result is simply examined (See: Explore) | |
Explore | Interactive explorations of data. These cells tend to yield a result that informs later decisions or enables the user to draw new conclusions. Explore cells may also transform data, but only for the purpose of exploring relationships and not for further in-depth analysis. | Rendering DataFrames, visualizing relationships, printing summaries of data, calculating simple statistics, examining the output of functions | Visualizations are used to evaluate the performance of a model (See: Evaluate) | |
Model | Define and fit models of relationships to data. These cells may include some data transformations, but the primary purpose is to create a model to describe or predict some facet of the dataset. | Statistical modeling, fitting and/or specifying machine learning models, simulation, defining loss functions | Significance testing and calculating feature importance (See: Evaluate) | |
Evaluate | Measure the explanatory power or predictive accuracy of a model using appropriate statistical techniques. These cells sometimes employ visualizations to explore analytical results (e.g., plotting regression residuals). | Cross validation, significance testing, inspecting model output, plotting feature significance | If a cell both evaluates and defines a machine learning model (a common pattern), default to “Model” | |
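
The tie-breaking rules in the table can be illustrated with a small rule-based labeler. The sketch below is not the method used in CORAL; it is a minimal, hypothetical heuristic that encodes three of the rules above: a cell is Import only when most of its statements are imports, a cell that both fits and evaluates a model defaults to Model, and a transformation whose result is only examined counts as Explore rather than Wrangle. The keyword sets and the names `Stage`, `called_names`, and `guess_stage` are assumptions chosen for illustration.

```python
import ast
from enum import Enum


class Stage(Enum):
    IMPORT = "Import"
    WRANGLE = "Wrangle"
    EXPLORE = "Explore"
    MODEL = "Model"
    EVALUATE = "Evaluate"


# Hypothetical keyword sets; a real labeler would need far broader coverage.
MODEL_CALLS = {"fit"}
EVALUATE_CALLS = {"cross_val_score", "score", "accuracy_score", "ttest_ind"}
WRANGLE_CALLS = {"dropna", "fillna", "merge", "read_csv", "train_test_split"}
EXPLORE_CALLS = {"head", "describe", "hist", "plot", "print"}


def called_names(tree):
    """Collect the names of all functions and methods called in the cell."""
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            if isinstance(node.func, ast.Attribute):
                names.add(node.func.attr)
            elif isinstance(node.func, ast.Name):
                names.add(node.func.id)
    return names


def guess_stage(source):
    """Return a guessed Stage for one cell's source, or None if no rule fires."""
    tree = ast.parse(source)
    stmts = tree.body
    if not stmts:
        return None

    # Import: only when the majority of top-level statements are imports.
    n_imports = sum(isinstance(s, (ast.Import, ast.ImportFrom)) for s in stmts)
    if 2 * n_imports > len(stmts):
        return Stage.IMPORT

    calls = called_names(tree)

    # A cell that both fits and evaluates a model defaults to Model,
    # so the Model check runs before the Evaluate check.
    if calls & MODEL_CALLS:
        return Stage.MODEL
    if calls & EVALUATE_CALLS:
        return Stage.EVALUATE

    # Wrangle: a transformation whose result is stored for later cells.
    stores_result = any(
        isinstance(s, (ast.Assign, ast.AugAssign, ast.AnnAssign)) for s in stmts
    )
    if calls & WRANGLE_CALLS and stores_result:
        return Stage.WRANGLE

    # Explore: the result is simply examined (rendered or printed).
    if calls & EXPLORE_CALLS or isinstance(stmts[-1], ast.Expr):
        return Stage.EXPLORE
    return None
```

For example, `guess_stage("df.dropna().describe()")` returns `Stage.EXPLORE` because the transformed data is only rendered, while `guess_stage("df = df.dropna()")` returns `Stage.WRANGLE`. The heuristic cannot tell an exploratory plot from one that evaluates a model, which is the kind of ambiguity the "When Not to Use" column is meant to resolve for human annotators.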