Docker agent troubleshooting and best practices

This section combines actionable troubleshooting steps with best practices so that your Middleware.io Host Agent runs efficiently and securely and provides reliable data.

Verification & Common Issues

This part focuses on diagnosing and resolving problems if your agent is not performing as expected.

Verify Agent Status

Before diving into complex troubleshooting, confirm the basics:

Check Container Status: Ensure the mw-host-agent container is running:
```
docker ps -a --filter name=mw-host-agent
```
- Expected Output: Look for STATUS to be Up ... seconds/minutes.
- Problematic Output: If it's Exited, Created, or Restarting, it indicates an issue that needs further investigation (check logs).
Inspect Agent Logs: The agent's logs are your primary source of diagnostic information. This will show startup messages, connection attempts, and any errors.
```
docker logs mw-host-agent
```
- What to Look For:
  - Messages indicating successful connection to MW_TARGET.
  - Errors related to API key, network connectivity, or file/volume permissions.
  - If necessary, increase MW_LOG_LEVEL to debug (see Section 4.1. Environment Variables) for more verbose output, but remember to revert for production.
Check Middleware.io Dashboard: Confirm that data is actually appearing in your Middleware.io account.
- Infrastructure Section: Verify if your Docker host appears and is reporting metrics (CPU, Memory, Disk, Network).
- Container List: Check if your running Docker containers are listed and reporting metrics and logs.
- Logs Section: Confirm that container logs and any configured host logs are flowing into the platform.
- Note: It may take 1-2 minutes for initial data to be processed and displayed.
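The status check above can be scripted. The following is a minimal sketch, not an official tool: `classify_status` is a hypothetical helper name, and the strings it matches are the usual prefixes of the STATUS column printed by `docker ps`.

```shell
#!/usr/bin/env bash
# classify_status is a hypothetical helper (not part of Docker or the agent).
# It maps the STATUS text reported by `docker ps -a` to a short verdict.
classify_status() {
  case "$1" in
    Up*)         echo "running" ;;
    Restarting*) echo "restarting: check logs" ;;
    Exited*)     echo "exited: check logs" ;;
    Created*)    echo "created but never started" ;;
    "")          echo "container not found" ;;
    *)           echo "unknown: $1" ;;
  esac
}

# Usage on a Docker host:
#   status=$(docker ps -a --filter name=mw-host-agent --format '{{.Status}}' | head -n 1)
#   classify_status "$status"
```

Keeping the classification in a function that takes plain text makes it easy to reuse in a cron job or a CI check.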

Common Issues and Solutions

Agent Container Not Starting or Exited:
- Symptom: docker ps shows mw-host-agent as Exited or Created shortly after deployment.
- Solution:
  - Check Logs First: The docker logs mw-host-agent command will almost always reveal the root cause (e.g., "invalid API key", "connection refused", "permission denied").
  - API Key/Target: Crucial. Double-check your MW_API_KEY and MW_TARGET environment variables. They must be exact, without leading/trailing spaces or typos. Get them directly from your Middleware.io "Installation" page.
  - Network: Ensure the Docker host has outbound internet access to the MW_TARGET URL (typically port 443 for HTTPS). Check firewall rules (ufw, firewalld, iptables).
  - Volume Mounts: Verify that all required volumes (/var/run/docker.sock, /proc, /sys, /var/lib/docker/containers) are correctly mounted and accessible. Incorrect paths or insufficient host permissions can prevent the agent from starting.
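Stray whitespace in MW_API_KEY or MW_TARGET is easy to miss by eye. The sketch below checks the values the container actually received without printing them. `has_edge_whitespace` and `check_env_line` are hypothetical helper names; the `docker inspect` call in the usage comment reads the container's configured environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the value starts or ends with whitespace
# (space, tab, carriage return, newline).
has_edge_whitespace() {
  case "$1" in
    [[:space:]]*|*[[:space:]]) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: takes one NAME=value line and reports on the value
# without echoing it, so the API key never reaches the terminal.
check_env_line() {
  local name="${1%%=*}" value="${1#*=}"
  if [ -z "$value" ]; then
    echo "$name: empty"
  elif has_edge_whitespace "$value"; then
    echo "$name: leading or trailing whitespace"
  else
    echo "$name: ok"
  fi
}

# Usage on a Docker host:
#   docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' mw-host-agent |
#     grep -E '^MW_(API_KEY|TARGET)=' |
#     while IFS= read -r line; do check_env_line "$line"; done
```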
No Data Appearing in Dashboard (Agent Running):
- Symptom: The agent container appears Up in docker ps, but no metrics or logs are visible in your Middleware.io dashboard.
- Solution:
  - Agent Logs Review: Run docker logs mw-host-agent again. Look for any messages indicating data transmission failures or connection errors to MW_TARGET.
  - API Key/Target Validation: Even if the agent starts, an incorrect key/target will prevent data ingestion. Re-verify them.
  - Firewall/Proxy: Confirm no firewall is blocking outbound connections from the Docker host to your Middleware.io ingestion endpoint. If you are behind a corporate proxy, ensure HTTP_PROXY/HTTPS_PROXY environment variables are correctly set for the agent container (see Section 4.1).
  - Time Synchronization: Ensure your Docker host's system clock is synchronized (e.g., using ntpdate or chrony). Significant time skew can cause data rejection by the ingestion service.
  - Initial Delay: Allow at least 1-2 minutes for data to be collected, transmitted, and indexed by Middleware.io before it appears in the dashboard.
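To test outbound connectivity from the Docker host, you first need the host and port behind MW_TARGET. The following sketch extracts them; `target_host_port` is a hypothetical helper, and the default ports (443, or 80 for http://) are assumptions based on the HTTPS note above. The direct TCP probe in the usage comment will not succeed on hosts that can only reach the internet through a proxy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "<host> <port>" for a URL such as MW_TARGET.
# Assumes port 443 when none is given (80 for http:// URLs).
target_host_port() {
  local url="$1" rest host port
  rest="${url#*://}"   # drop the scheme, if any
  rest="${rest%%/*}"   # drop any path
  host="${rest%%:*}"
  if [ "$host" = "$rest" ]; then
    case "$url" in
      http://*) port=80 ;;
      *)        port=443 ;;
    esac
  else
    port="${rest##*:}"
  fi
  echo "$host $port"
}

# Usage on the Docker host (bash's /dev/tcp plus coreutils timeout):
#   read -r host port < <(target_host_port "$MW_TARGET")
#   timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" && echo "reachable"
```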
Data Missing or Incomplete (e.g., some containers, no logs):
- Symptom: You see some data (e.g., host metrics) but not all expected data (e.g., specific container metrics, or any logs).
- Solution:
  - Volume Mounts for Logs:
    - For container logs: Ensure /var/lib/docker/containers is correctly mounted read-only (:ro). This path is where Docker stores individual container log files when the default json-file logging driver is in use.
    - For host logs: Verify that /var/log (or your custom paths specified in MW_AGENT_LOG_PATHS) is correctly mounted.
  - Host Permissions: Although mounts are read-only from the container's perspective, the Docker daemon (and underlying user) on the host needs sufficient permissions to access those paths.
  - Exclusion/Inclusion Lists: Check if MW_CONTAINER_EXCLUDE_LIST or MW_CONTAINER_INCLUDE_LIST environment variables are set. These can inadvertently filter out containers or log sources you intend to monitor.
  - APM Traces: Remember, the host agent provides infrastructure for APM. To collect application traces, you must also instrument your application code with language-specific Middleware.io APM agents (e.g., Java Agent, Node.js Agent) within your application containers. The host agent does not automatically collect APM traces from your application code.
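The mount checks above can be automated by auditing what `docker inspect` reports. This is a sketch under assumptions: `audit_mounts` is a hypothetical helper, the required paths are the four host paths named in this guide, and Docker may report a resolved path (for example /run/docker.sock) on hosts where /var/run is a symlink.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "<source> <destination> <rw>" lines on stdin,
# reports required host paths that are not mounted, then mounts that are writable.
audit_mounts() {
  local mounts path
  mounts="$(cat)"
  for path in /var/run/docker.sock /proc /sys /var/lib/docker/containers; do
    printf '%s\n' "$mounts" | grep -q "^$path " || echo "missing: $path"
  done
  printf '%s\n' "$mounts" | awk '$3 == "true" { print "writable: " $1 }'
}

# Usage on a Docker host:
#   docker inspect \
#     --format '{{range .Mounts}}{{println .Source .Destination .RW}}{{end}}' \
#     mw-host-agent | audit_mounts
```

No output means all four paths are mounted and none of the mounts is writable.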
Agent Using Excessive Resources (CPU/Memory):
- Symptom: The mw-host-agent container is consistently consuming a high percentage of CPU or a large amount of memory.
- Solution:
  - Check Agent Logs Verbosity: If MW_LOG_LEVEL is set to debug or trace, it can significantly increase resource usage due to extensive logging. Revert to info for production environments.
  - Reduce Collection Frequency: Adjust MW_METRIC_INTERVAL_SECONDS to a higher value (e.g., 30 or 60 seconds) if very frequent metric updates are not critically needed. This reduces the load on the agent.
  - Number of Monitored Entities: A very high number of Docker containers, host processes, or log files on the system will naturally increase the agent's resource consumption. If this is the case, consider using MW_CONTAINER_EXCLUDE_LIST to focus monitoring on critical containers.
  - Set Resource Limits: Proactively set CPU and memory limits for the agent container to prevent it from impacting other services on the host (see Section 5.2.1).
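To decide whether the agent is "consistently" heavy, sample its usage rather than eyeballing it. The sketch below compares one CPU reading against a threshold; `cpu_exceeds` is a hypothetical helper and the threshold of 50 in the usage comment is an arbitrary example, not a recommended value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when a CPU reading such as "12.50%"
# (the format printed by docker stats) is above the given threshold.
cpu_exceeds() {
  awk -v cpu="${1%\%}" -v max="$2" 'BEGIN { exit !(cpu + 0 > max + 0) }'
}

# Usage on a Docker host:
#   cpu=$(docker stats --no-stream --format '{{.CPUPerc}}' mw-host-agent)
#   if cpu_exceeds "$cpu" 50; then echo "agent CPU high: $cpu"; fi
```

A single sample can be misleading; running the check several times a minute apart gives a better picture.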

Logs and Diagnostics for Support

When contacting Middleware.io support, providing the following information will help expedite resolution:

- The exact docker run command or docker-compose.yml file you are using.
- The full output of docker logs mw-host-agent (ideally after setting MW_LOG_LEVEL=debug for a few minutes to capture verbose information).
- The output of docker ps -a --filter name=mw-host-agent.
- A clear description of the issue: what's happening, what you expect to happen, and when the issue started.
- Any relevant error messages or screenshots from your Middleware.io dashboard.
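Commands, Compose files, and debug logs can contain your API key, so it is worth masking it before sharing anything. The following is a minimal sketch: `redact_api_key` is a hypothetical filter, and it only catches values that directly follow `MW_API_KEY=` or `MW_API_KEY:`, so review the output before sending it.

```shell
#!/usr/bin/env bash
# Hypothetical filter: replaces the value after MW_API_KEY= or MW_API_KEY:
# (optionally quoted) with the word REDACTED. Reads stdin, writes stdout.
redact_api_key() {
  sed -E "s/(MW_API_KEY[=:][[:space:]]*[\"']?)[^[:space:]\"']+/\1REDACTED/g"
}

# Usage on a Docker host (docker logs relays the container's stderr on stderr):
#   docker logs mw-host-agent 2>&1 | redact_api_key > agent.log
#   redact_api_key < docker-compose.yml > docker-compose.redacted.yml
```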

Best Practices & Tips

Proactive measures to ensure a robust, efficient, and secure monitoring setup.

Resource Allocation (Limits & Reservations)

Even though the Middleware.io agent is designed to be lightweight, setting resource limits is a crucial Docker best practice, especially in shared or resource-constrained environments. This prevents the agent from consuming excessive resources and potentially impacting your application's performance.

Why? Prevents resource contention and ensures the agent doesn't starve other critical services on the host.

How to Set:

Docker Run: Use --cpus and --memory flags.

```
# Limit the agent to 0.5 CPU cores and 256 MB of memory.
# (A comment cannot follow a trailing backslash, so it goes above the command.)
docker run -d \
  --name mw-host-agent \
  --cpus="0.5" \
  --memory="256m" \
  # ... rest of your command
```

Docker Compose (Recommended deploy key for V3.x):

```
services:
  mw-host-agent:
    # ...
    deploy:
      resources:
        limits:
          cpus: '0.5'     # Limit to 0.5 CPU cores
          memory: '256M'  # Limit to 256 MB of memory
        reservations:     # Optional: guaranteed resources
          cpus: '0.25'    # Reserve 0.25 CPU cores
```

Tip: Start with conservative limits (e.g., 0.5 CPU, 256MB memory) and adjust based on actual resource usage observed in your environment.
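After setting limits, it is worth confirming that Docker actually applied them. The sketch below formats the two values `docker inspect` reports for a container: `HostConfig.NanoCpus` (billionths of a CPU) and `HostConfig.Memory` (bytes), where 0 means no limit. `format_limits` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: $1 is HostConfig.NanoCpus, $2 is HostConfig.Memory in bytes.
# A value of 0 means that no limit is set.
format_limits() {
  awk -v n="$1" -v m="$2" 'BEGIN {
    printf "cpus=%s memory=%s\n",
      (n > 0 ? sprintf("%.2f", n / 1000000000) : "unlimited"),
      (m > 0 ? sprintf("%dMiB", m / 1048576) : "unlimited")
  }'
}

# Usage on a Docker host:
#   format_limits $(docker inspect \
#     --format '{{.HostConfig.NanoCpus}} {{.HostConfig.Memory}}' mw-host-agent)
```

With the limits from this section, the expected report is `cpus=0.50 memory=256MiB`.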

Security Considerations

Security should be a top priority when deploying agents that access host resources.

Read-Only Volumes (:ro):
- Why? Minimizes the potential attack surface by ensuring the agent container cannot write to the host's critical filesystems.
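- How: Always use the :ro (read-only) flag when mounting host paths (e.g., /var/run/docker.sock:/var/run/docker.sock:ro, /proc:/proc:ro). Note that :ro on the Docker socket only protects the socket file itself; a process that can connect to the socket can still issue any Docker API call, so treat socket access as highly privileged.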
privileged: true vs. cap_add:
- Why? privileged: true grants the container nearly all capabilities of the host, which is a significant security risk. While it simplifies host-level access, consider alternatives.
- Recommendation: If strict security is paramount, explore whether specific Linux capabilities (cap_add) can replace privileged mode, for example SYS_PTRACE for process monitoring or DAC_READ_SEARCH for reading files regardless of their permission bits. Configuring cap_add correctly can be complex and may require experimentation.
API Key Management:
- Why? Your MW_API_KEY is a sensitive credential. Exposure could lead to unauthorized data access or abuse.
- How:
  - Avoid Hardcoding: Never hardcode your API key directly in public repositories or images.
  - Environment Variables: Use environment variables as shown in this guide, and ensure they are managed securely (e.g., not committed to version control).
  - Docker Secrets/Orchestrator Secrets: For production environments, utilize Docker Secrets (for Docker Swarm) or Kubernetes Secrets to inject the API key securely.
  - CI/CD Integration: Integrate your CI/CD pipeline's secret management features to inject the MW_API_KEY during deployment.
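One simple way to keep the key out of shell history and version control is an env file passed with `docker run --env-file`. The sketch below checks that such a file is not readable by group or others. `env_file_is_private` and the file name `mw-agent.env` are hypothetical, and `stat -c` assumes GNU coreutils (Linux).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the file grants no permissions to
# group or others (modes such as 600 or 400). Uses GNU stat.
env_file_is_private() {
  case "$(stat -c '%a' "$1")" in
    ?00) return 0 ;;
    *)   return 1 ;;
  esac
}

# Usage on a Docker host (mw-agent.env holds MW_API_KEY=... and MW_TARGET=...):
#   chmod 600 mw-agent.env
#   env_file_is_private mw-agent.env && docker run -d --env-file mw-agent.env ...
```

Add the env file to .gitignore so that it is never committed.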

Monitoring Agent Health

It's crucial to monitor the monitoring agent itself to ensure continuous data collection.

Why? If the agent fails, you lose visibility into your Docker environment.
How:
- Regular Checks: Periodically check the agent's status using docker ps and review its logs with docker logs mw-host-agent.
- Alerting in Middleware.io: Consider setting up alerts in Middleware.io for the agent's own metrics (if available, e.g., agent CPU/memory usage) or for the absence of data from a specific host/agent.
- Container Restart Policy: The --restart unless-stopped flag (or restart: unless-stopped in Docker Compose) is a critical best practice to ensure the agent automatically restarts if Docker restarts or if the agent container crashes.
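The regular checks above can be reduced to a single verdict suitable for cron. This is a sketch, not an official health check: `health_verdict` is a hypothetical helper, and the restart threshold of 3 is an arbitrary example.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Arguments: container state, restart count, restart policy
# name, as reported by docker inspect. The threshold of 3 restarts is arbitrary.
health_verdict() {
  local status="$1" restarts="$2" policy="$3"
  if [ "$status" != "running" ]; then
    echo "ALERT: agent is $status"
  elif [ "$restarts" -gt 3 ]; then
    echo "WARN: agent restarted $restarts times"
  elif [ -z "$policy" ] || [ "$policy" = "no" ]; then
    echo "WARN: no restart policy set"
  else
    echo "OK"
  fi
}

# Usage on a Docker host:
#   read -r status restarts policy < <(docker inspect \
#     --format '{{.State.Status}} {{.RestartCount}} {{.HostConfig.RestartPolicy.Name}}' \
#     mw-host-agent)
#   health_verdict "$status" "$restarts" "$policy"
```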

Updating the Agent

Keeping your agent updated ensures you benefit from the latest features, bug fixes, and security patches.

Why? Access to new features, performance improvements, and security enhancements.
How to Upgrade:
1. Stop and remove the current agent container:
```
docker stop mw-host-agent
docker rm mw-host-agent
```
2. Pull the latest image: This ensures you get the most recent version.
```
docker pull ghcr.io/middleware-labs/mw-host-agent:master
```
3. Run the agent again using your original docker run command or docker compose up -d. This will start a new container with the updated image.
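The first two steps can be wrapped in a small script. The sketch below is one possible arrangement: `run` and `upgrade_agent` are hypothetical helpers, and the script pulls the image before stopping the container (a deliberate reordering of the steps above) so that the agent is down for as short a time as possible. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
IMAGE="ghcr.io/middleware-labs/mw-host-agent:master"
NAME="mw-host-agent"

# Hypothetical helper: prints the command, and executes it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Hypothetical helper: pull first, then stop and remove the old container.
# Starting the new container is left to you: re-run your original
# docker run command or `docker compose up -d`.
upgrade_agent() {
  run docker pull "$IMAGE" &&
  run docker stop "$NAME" &&
  run docker rm "$NAME"
}

# Usage:
#   upgrade_agent              # dry run: prints the three commands
#   DRY_RUN=0 upgrade_agent    # actually pulls, stops, and removes
```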

Consistency Across Environments

Maintain uniformity in your agent deployment and configuration.

Why? Reduces configuration drift, simplifies troubleshooting, and ensures consistent data quality across development, staging, and production.
How: Use declarative tools like Docker Compose or orchestration platforms (Kubernetes) to define your agent service, ensuring the same configuration (volume mounts, environment variables, resource limits) is applied everywhere.