Export Best Practices

The Anthology Reach data export process (framework) is designed as a scheduled batch process best suited to exporting incremental transactional data to Excel or CSV file output. The process works best for data added or changed within the last 24-48 hours. It is not suitable for large-volume data migration activities.

The following topics help you evaluate your export process requirements during export design and optimize your export run-time settings to best suit your business needs.

Export Template Design

Designing an efficient export process involves considering various factors. You can optimize your export runs by adjusting settings based on your export template design. When planning and designing your export, keep the following points in mind:

  • Will your export template include related tables and files?

    • The export process allows you to configure export templates to produce a single output file or multiple files.

    • Exports designed around a single source table and output file process more efficiently than exports that use multiple source tables and produce multiple output files.

  • How frequently will your export need to run?

    • The Recurrence schedule may vary depending on your needs and use of the exported data files.

    • More frequent exports typically contain fewer rows of data per file than exports that run less often; the actual volume depends on your data patterns and the source tables you export from.

      Tip: Use your expected export criteria and planned recurrence schedule to analyze your data and estimate the average row counts expected in your export output files (see the row-count estimation sketch after this list).

      Example

      • Export criteria: Contacts modifiedon within the last 1 hour

      • Recurrence: Every 1 hour

  • Use Dataset mappings

    • Dataset mappings can be configured as value crosswalks within your export template.

    • Dataset mappings are especially useful when your target external system needs a code value instead of a description for a field.

      Example: The export value for the ContactType field is “STU” instead of “Student”.

      Tip: Using dataset mappings with your export template improves the overall performance of your export processes.

    • Configure dataset mappings for OPTIONSET and LOOKUP data transformation type fields included in your export template.

    • Using dataset mappings with a LOOKUP type field may not always be practical, depending on the field and the type of data you collect, but always consider it as part of your export template design.

  • Use the Batch Size parameter

    • You can use the Batch Size parameter to control parallel processing aspects of the export process.

    • The Batch Size can be adjusted based on the design of your export template and expected data output.

    • The Batch Size can be set between 1 and 4,900.

    • Increasing the batch size generally reduces the time needed to process the export file.

    • Identify the most suitable batch size for your template.

      The following examples show the batch size and the time taken to export the same data under test conditions.

      • Batch size of 1000 takes 12 minutes to export 10,000 rows of data.

      • Batch size of 25 takes 25 minutes to export 10,000 rows of data.
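
The following is a minimal sketch of the row-count estimation suggested in the tip above. It assumes you have already pulled the modification timestamps for your source table (for example, Contact modifiedon values) into a local list; the function name and sample data are illustrative only, not part of the delivered framework.

    from collections import Counter
    from datetime import datetime

    def estimate_rows_per_run(modified_timestamps, recurrence_hours=1):
        """Estimate average and peak rows per export run by bucketing
        historical modification timestamps into recurrence-sized windows."""
        buckets = Counter()
        for ts in modified_timestamps:
            # Bucket key: the recurrence window the timestamp falls into.
            epoch_hours = int(ts.timestamp() // 3600)
            buckets[epoch_hours // recurrence_hours] += 1
        counts = list(buckets.values())
        return {
            "average_rows_per_run": sum(counts) / len(counts) if counts else 0,
            "peak_rows_per_run": max(counts, default=0),
        }

    # Illustrative sample: Contacts modified during the analysis period.
    sample = [datetime(2024, 5, 1, 9, 15), datetime(2024, 5, 1, 9, 40),
              datetime(2024, 5, 1, 10, 5)]
    print(estimate_rows_per_run(sample, recurrence_hours=1))

The average indicates the typical file size to expect for the planned recurrence, while the peak helps you confirm that even the busiest window stays within the limits described later in this topic.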

Recurrence and Scheduling

As part of designing your export process, evaluate your recurrence schedule requirements and take the following into consideration.

  1. Distribute your export process recurrence schedule

    1. Running parallel exports that produce higher-volume output increases the processing load and could lead to unexpected failures.

      If your export needs require larger-volume output files to run during a concurrent time frame, carefully test and, if necessary, adjust your recurrence schedules to accommodate the load.

    2. Consider the total number of output files you need to produce in a day and stay within system limits (for example, API service limits); see the scheduling sketch after this list.

  2. Always test in Sandbox/Dev/Test environments before moving to Production.
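
One simple way to distribute your recurrence schedule, as suggested above, is to stagger the start times of your export templates so that high-volume runs do not begin at the same moment, and to total the files produced per day against your environment's service limits. The sketch below is illustrative only; the template names, intervals, and file counts are hypothetical, not delivered settings.

    # Hypothetical export templates: (name, recurrence interval in hours,
    # output files produced per run).
    templates = [
        ("Contacts", 1, 1),
        ("Applications", 4, 4),
        ("TestScores", 24, 1),
    ]

    stagger_minutes = 60 // len(templates)  # spread start times across the hour

    for i, (name, interval_hours, files_per_run) in enumerate(templates):
        runs_per_day = 24 // interval_hours
        files_per_day = runs_per_day * files_per_run
        offset = i * stagger_minutes
        print(f"{name}: start at :{offset:02d}, every {interval_hours} h, "
              f"{files_per_day} files/day")

    total_files = sum((24 // hours) * files for _, hours, files in templates)
    print(f"Total output files per day: {total_files}")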

Note: Export results may vary based on the load on the server, the export template, the number of columns, the type of columns (optionsets, lookups), etc.

Limits

  • Recommended maximum File Size Limits

    1. A total output size of up to 100 MB per CSV file is supported.

    2. A total output size of up to 10 MB per Excel file is supported.

    These are general guidelines and may vary based on your environment.

  • Processing limits

    1. Calculating the number of loops for processing the export records: the total number of rows divided by the batch size should not exceed 200 (a validation sketch appears at the end of this topic).

      1. Delivered sample export templates have a default Batch Size parameter value.

      2. For your own export template, review the template design and adjust the Batch Size to optimize your export process.

        The following examples explain how to calculate the number of loops for processing the export records:

        EXAMPLE 1

        • The total number of rows is 100,000

        • The Batch Size is 500

        The number of loops to process the export records = total number of rows / Batch Size, which is 100,000 / 500 = 200

        EXAMPLE 2

        • The total number of rows is 100,000

        • The Batch Size is 100

        The number of loops to process the export records = total number of rows / Batch Size, which is 100,000 / 100 = 1,000

        In the above examples:

        • In Example 1, the calculated number of loops is 200, which is within the recommended limit.

        • In Example 2, the calculated number of loops is 1,000, which exceeds the recommended limit of 200.

      3. A higher batch size setting increases resource consumption for each batch, while a batch size that is too low can cause the process to slow down or fail because the number of loops (Azure Function executions) becomes too high, extending the overall processing time.

      4. As the number of columns increases, the Batch Size parameter should decrease accordingly.

    2. The maximum total row output per run of the export process, whether single-file or multi-file output, should not exceed 500,000 rows.

      1. If your export template is designed to produce related files, their rows count toward the total.

        Example

        • Scenario A: A template designed with a parent output file for Applications and separate files for Application Recommendations, Application Requirements, and Test Scores

          • Applications - 100,000 rows

            • 5 Application Recommendations per application - 500,000 rows

            • 5 Application Requirements per application - 500,000 rows

            • 6 Test Scores per application - 600,000 rows

          In this scenario, the Applications.xlsx, ApplicationRecommendations.xlsx, and ApplicationRequirements.xlsx exports are likely to work, but the TestScores.xlsx export might fail because it exceeds the 500,000-row limit.

        • Scenario B: Single Entity – Applications with 510,000 rows

          In this scenario, the Applications.xlsx export might not work because it exceeds the 500,000-row limit.

    3. Although there is no limit on the configurations you can create using the export framework, account for critical factors such as maximum peak load, scheduling during non-business hours, and the long-term impact of data load on the server before using the export framework.
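
To make the processing limits above easier to apply, the following sketch checks a planned export run against the recommended values described in this topic (a batch size between 1 and 4,900, no more than 200 loops per run, and no more than 500,000 total rows per run). The threshold constants come from this topic; the function name and sample figures are illustrative only.

    import math

    MAX_BATCH_SIZE = 4900        # supported Batch Size range is 1-4,900
    MAX_LOOPS = 200              # rows / batch size should not exceed 200
    MAX_ROWS_PER_RUN = 500_000   # total rows across all output files per run

    def check_export_plan(total_rows, batch_size):
        """Return warnings for a planned export run that exceeds the
        recommended processing limits."""
        warnings = []
        if not 1 <= batch_size <= MAX_BATCH_SIZE:
            warnings.append(f"Batch Size must be between 1 and {MAX_BATCH_SIZE}.")
        loops = math.ceil(total_rows / batch_size)
        if loops > MAX_LOOPS:
            warnings.append(f"{loops} loops exceeds the recommended {MAX_LOOPS}; "
                            "increase the Batch Size or reduce the row volume.")
        if total_rows > MAX_ROWS_PER_RUN:
            warnings.append(f"{total_rows} rows exceeds the recommended "
                            f"{MAX_ROWS_PER_RUN} rows per run.")
        return warnings

    # Example 2 above: 100,000 rows with a Batch Size of 100 (1,000 loops).
    for warning in check_export_plan(total_rows=100_000, batch_size=100):
        print(warning)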