# Metrics Update Process

## Overview
Drill4J uses an ETL (Extract, Transform, Load) process to transform raw data collected by agents into actionable metrics. The system maintains two separate database schemas:
- `raw_data`: stores data sent by agents in its original format, without any processing.
- `metrics`: contains processed data that powers the dashboards and API responses. This schema is populated and maintained by the ETL pipeline.
The ETL process runs automatically on a schedule and can also be triggered manually.
It reads from raw_data, performs necessary transformations and calculations, and updates the metrics schema.
This architecture allows for data reprocessing if needed and separates data collection from data analysis concerns.
## Scheduled Run
The ETL process runs automatically as a cron job.
The schedule is controlled by the `DRILL_SCHEDULER_ETL_JOB_CRON` environment variable passed to the Drill4J Backend.
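For illustration, the variable might be set like this in the Backend's environment (a sketch: the shell-export form and the schedule shown are assumptions; verify the cron expression format expected by your Drill4J version):

```sh
# Run the ETL job every 15 minutes (standard five-field cron shown;
# confirm the expected cron format for your Drill4J version).
export DRILL_SCHEDULER_ETL_JOB_CRON="*/15 * * * *"
```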
Note: Applying changes to the environment variable requires restarting the Drill4J Backend instance for the new schedule to take effect.
Best Practice: Adjust the ETL schedule frequency so that job execution finishes before the next run time. Although the job won't start a new run until the previous one finishes, we recommend leaving extra buffer time.
## On-Demand Run
You can manually trigger the ETL process without waiting for the scheduled execution. The ETL process supports two modes of operation:
### Incremental Updates
By default, the ETL process performs incremental updates, processing only new data from raw_data that hasn't been transformed yet. This is the most efficient approach for day-to-day operations.
When to use:
- Regular scheduled updates
- Quick catch-up after a brief period of downtime
- Before querying impacted tests/methods API endpoints
API Request:
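A minimal sketch of the call, assuming the Backend is reachable at `http://localhost:8090`; the route is a placeholder, so substitute the metrics refresh endpoint from your API reference:

```sh
# Trigger an incremental ETL run (processes only new raw_data).
# Host and path are placeholders -- check your Drill4J API reference.
curl -X POST "http://localhost:8090/api/metrics/refresh"
```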
### Complete Restart
A complete restart clears all data from the metrics schema and reprocesses everything from scratch based on available data in raw_data.
When to use:
- When settings affecting the metrics have changed (e.g., the metrics period was expanded, or rules for ignoring methods and classes were modified)
- When retrospective changes have occurred
- When data integrity issues are suspected
- After schema migrations or updates
API Request:
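As above, a sketch with a placeholder host and route; the `reset=true` query parameter is the flag this guide references for complete restarts:

```sh
# Clear the metrics schema and reprocess everything from raw_data.
# Host and path are placeholders -- check your Drill4J API reference.
curl -X POST "http://localhost:8090/api/metrics/refresh?reset=true"
```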
Warning: This operation is resource-intensive and can take a long time to complete (from hours to days, depending on the amount of data).
Best Practice: Schedule complete restarts during maintenance windows when metrics access is not critical.
## Data Retention and Cleanup
Drill4J automatically manages data retention for both raw_data and metrics schemas using dedicated cleanup jobs.
This prevents unlimited data growth and maintains optimal database performance.
### Configuring Retention Periods
Each agent group has its own retention settings that control how long data is kept in each schema.
Viewing Current Settings:
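A sketch using a placeholder host, route, and group ID (`my-group`); consult your API reference for the exact settings endpoint:

```sh
# Fetch the retention settings for an agent group.
# Host, path, and group ID are placeholders.
curl "http://localhost:8090/api/groups/my-group/settings"
```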
Example Response:
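A hypothetical payload illustrating the two settings discussed in this guide (a real response may contain additional fields):

```json
{
  "retentionPeriodDays": 60,
  "metricsPeriodDays": 30
}
```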
Updating Retention Settings:
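A sketch of the update call with the same placeholder host and route; note that the full payload is sent, per the warning below:

```sh
# Overwrite the group's retention settings (the payload is not merged,
# so every parameter must be present). Host and path are placeholders.
curl -X PUT "http://localhost:8090/api/groups/my-group/settings" \
  -H "Content-Type: application/json" \
  -d '{"retentionPeriodDays": 60, "metricsPeriodDays": 30}'
```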
Warning: The request body must contain all parameters with appropriate values. The payload is not merged; it overwrites all existing settings.
### Configuring Cleanup Schedule
The cleanup jobs run on a schedule controlled by the `DRILL_SCHEDULER_DATA_RETENTION_JOB_CRON` environment variable.
Example Configuration:
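A sketch in shell-export form, matching the documented default of a daily run at 01:00:

```sh
# Run the cleanup job daily at 01:00 (standard five-field cron shown;
# confirm the expected cron format for your Drill4J version).
export DRILL_SCHEDULER_DATA_RETENTION_JOB_CRON="0 1 * * *"
```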
By default, the cleanup job is set to run daily at 01:00.
Note: Changes to the environment variable require restarting the Drill4J Backend instance.
Best Practice:
The retentionPeriodDays for raw_data should be greater than or equal to the metricsPeriodDays. This ensures that all data required for a complete ETL reprocessing is available, allowing safe full metric recalculation if needed.
## Fine-Tuning Performance
The ETL pipeline can be tuned for optimal performance based on your infrastructure and data volume. These parameters control memory usage, database interaction, and throughput.
### Buffer Size
- Environment Variable: `DRILL_ETL_BUFFER_SIZE`
- Purpose: Size of the in-memory buffer between the data extractor and loaders
- Behavior: Prevents unbounded memory growth. When the buffer is full, the extractor suspends, giving loaders time to process
- Impact: Affects throughput and memory usage
- Default: 2000
- Tuning Guidance:
  - Increase for faster processing if memory allows (4000-8000)
  - Decrease if experiencing memory pressure (500-1000)
### Fetch Size
- Environment Variable: `DRILL_ETL_FETCH_SIZE`
- Purpose: JDBC fetch size hint for SQL queries used by the data extractor
- Behavior: Determines how many rows are fetched from the database per round trip
- Impact: Network latency and database load
- Default: 2000
- Tuning Guidance:
  - Increase for better throughput on fast networks (5000-10000)
  - Decrease for slower networks or smaller result sets (500-1000)
### Batch Size
- Environment Variable: `DRILL_ETL_BATCH_SIZE`
- Purpose: Number of items grouped into a single write batch/transaction used by the data loaders
- Behavior: Controls commit frequency and transaction size
- Impact: Write performance and transaction overhead
- Default: 1000
- Tuning Guidance:
  - Increase for better write performance (2000-5000)
  - Decrease to reduce transaction lock time (100-500)
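As a combined illustration, the three parameters might be set together like this (the values are examples, not recommendations; tune against your own workload and memory budget):

```sh
# Larger buffer and fetch sizes for throughput, a moderate batch size
# to limit transaction lock time. Defaults: 2000 / 2000 / 1000.
export DRILL_ETL_BUFFER_SIZE=4000
export DRILL_ETL_FETCH_SIZE=5000
export DRILL_ETL_BATCH_SIZE=2000
```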
Note: Applying changes to these environment variables requires restarting the Drill4J Backend instance for the new values to take effect.
## Tracking and Monitoring
The ETL process provides comprehensive logging to help you monitor execution, troubleshoot issues, and optimize performance.
### Logging Levels
The ETL logging supports multiple levels:
- INFO: Logs only ETL start and completion events.
- DEBUG: In addition to INFO, logs when each extractor and loader starts and finishes.
- TRACE: In addition to DEBUG, logs every batch commit during loading.
### Tracking Progress
You can track the real-time progress of ETL executions by calling the Metrics Refresh Status API:
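A sketch with a placeholder host and route; substitute the Metrics Refresh Status endpoint from your API reference:

```sh
# Poll the current ETL refresh status.
# Host and path are placeholders -- check your Drill4J API reference.
curl "http://localhost:8090/api/metrics/refresh-status"
```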
Example Response:
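A hypothetical response shape, using one of the status values listed below (a real response may include additional detail):

```json
{
  "status": "LOADING"
}
```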
ETL Statuses:
- `EXTRACTING`: ETL process is extracting data from the `raw_data` schema.
- `LOADING`: ETL is actively loading data.
- `SUCCESS`: ETL completed successfully.
- `FAILED`: ETL run ended with an error. Check the Drill4J Admin container logs for details. The ETL will retry on the next scheduled run, but it will continue to fail until the underlying issue is fixed.
## Troubleshooting

### ETL Pipeline Fails
Symptoms:
- ETL log shows errors
- Manual refresh API returns errors
- Metrics not updating
Solutions:
Check Database Connectivity:
- Verify database connection credentials
- Test database accessibility from the Backend instance
- Check firewall rules and network connectivity
Verify Schema Existence:
- Confirm that the raw_data and metrics schemas exist
- Check that the database user has the necessary permissions (SELECT, INSERT, UPDATE, DELETE)
### ETL Running Slowly
Symptoms:
- ETL process execution time keeps increasing
- Data processing delay continues to grow
- Metrics stop reflecting the most recent data
Solutions:
Data Volume:
- Review retention settings - older data may not be needed
- Review methods ignore rules - consider excluding unneeded classes and methods
Database Performance:
- Consider increasing database resources (CPU, memory, IOPS)
- Consider database maintenance (VACUUM, ANALYZE, etc.)
Review Performance Parameters:
- Consider increasing `bufferSize`, `fetchSize`, or `batchSize`
- Monitor memory usage when adjusting parameters
### Metrics Data Inconsistency
Symptoms:
- Dashboard showing unexpected values
- API results don't match raw data
Solutions:
Perform Complete Refresh:
- Use the `reset=true` API call to reprocess all data
Check Data Retention:
- Review the `retentionPeriodDays` and `metricsPeriodDays` settings
- Verify that cleanup jobs haven't removed needed data
Investigate Errors:
- Review ETL logs for failure counts
- Review ETL Metadata table for error messages