Docker agent troubleshooting and best practices
This section combines actionable troubleshooting steps with best practices to keep your Middleware.io Host Agent running efficiently and securely while delivering reliable data.
Verification & Common Issues
This part focuses on diagnosing and resolving problems if your agent is not performing as expected.
Verify Agent Status
Before diving into complex troubleshooting, confirm the basics:
- Check Container Status: Ensure the `mw-host-agent` container is running: `docker ps -a --filter name=mw-host-agent`
  - Expected Output: The `STATUS` column reads `Up ... seconds/minutes`.
  - Problematic Output: If it shows `Exited`, `Created`, or `Restarting`, there is an issue that needs further investigation (check the logs).
- Inspect Agent Logs: The agent's logs are your primary source of diagnostic information. They show startup messages, connection attempts, and any errors: `docker logs mw-host-agent`
  - What to Look For:
    - Messages indicating a successful connection to `MW_TARGET`.
    - Errors related to the API key, network connectivity, or file/volume permissions.
    - If necessary, raise `MW_LOG_LEVEL` to `debug` (see Section 4.1. Environment Variables) for more verbose output, but remember to revert it for production.
- Check Middleware.io Dashboard: Confirm that data is actually appearing in your Middleware.io account.
  - Infrastructure Section: Verify that your Docker host appears and is reporting metrics (CPU, memory, disk, network).
  - Container List: Check that your running Docker containers are listed and reporting metrics and logs.
  - Logs Section: Confirm that container logs and any configured host logs are flowing into the platform.
  - Note: It may take 1-2 minutes for initial data to be processed and displayed.
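The container-status and log checks above can be scripted so they are quick to repeat. The following is a minimal POSIX shell sketch, assuming the container is named `mw-host-agent` as in this guide. The function names and the error keywords it searches for are illustrative choices, not part of the agent.

```shell
# Sketch: repeatable version of the container-status and log checks.
# Function names and grep keywords are illustrative assumptions.

# Print the state Docker reports (running, exited, created, restarting, ...).
agent_state() {
  docker inspect --format '{{.State.Status}}' "${1:-mw-host-agent}" 2>/dev/null
}

# Return 0 only when the container is running; otherwise say what to do next.
check_agent_state() {
  state=$(agent_state "$1")
  case "$state" in
    running) echo "agent is running"; return 0 ;;
    "")      echo "agent container not found"; return 1 ;;
    *)       echo "agent is '$state': inspect the logs"; return 1 ;;
  esac
}

# Show recent log lines containing common failure keywords.
# docker logs replays the container's stderr on stderr, so merge both streams.
recent_agent_errors() {
  docker logs --tail "${2:-200}" "${1:-mw-host-agent}" 2>&1 |
    grep -iE 'error|fail|denied|refused|invalid'
}
```

Usage: source the file, then run `check_agent_state || recent_agent_errors`.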
Common Issues and Solutions
Agent Container Not Starting or Exited:
- Symptom: `docker ps` shows `mw-host-agent` as `Exited` or `Created` shortly after deployment.
- Solution:
  - Check Logs First: The `docker logs mw-host-agent` command will almost always reveal the root cause (e.g., "invalid API key", "connection refused", "permission denied").
  - API Key/Target: Crucial. Double-check your `MW_API_KEY` and `MW_TARGET` environment variables. They must be exact, without leading/trailing spaces or typos. Get them directly from your Middleware.io "Installation" page.
  - Network: Ensure the Docker host has outbound internet access to the `MW_TARGET` URL (typically port 443 for HTTPS). Check firewall rules (`ufw`, `firewalld`, `iptables`).
  - Volume Mounts: Verify that all required volumes (`/var/run/docker.sock`, `/proc`, `/sys`, `/var/lib/docker/containers`) are correctly mounted and accessible. Incorrect paths or insufficient host permissions can prevent the agent from starting.
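Stray whitespace in `MW_API_KEY` or `MW_TARGET` is easy to miss by eye. If you keep these values in an env file passed with `docker run --env-file` (an assumption about your setup), a small check like the following flags the problem before the container starts. The file name `mw-agent.env` and the function name are hypothetical. Because `[[:space:]]` also matches a carriage return, the check catches Windows line endings too.

```shell
# Sketch: flag MW_API_KEY / MW_TARGET entries in an env file that are missing,
# empty, or carry leading/trailing whitespace. The default path is hypothetical.
check_mw_env_file() {
  file=${1:-./mw-agent.env}
  [ -r "$file" ] || { echo "cannot read $file"; return 2; }
  status=0
  for var in MW_API_KEY MW_TARGET; do
    # First line that defines the variable; $(...) strips only newlines, not spaces.
    line=$(grep "^$var=" "$file" | head -n 1)
    value=${line#*=}
    if [ -z "$line" ] || [ -z "$value" ]; then
      echo "$var: missing or empty"; status=1
    elif [ "$value" != "$(printf '%s' "$value" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')" ]; then
      echo "$var: leading or trailing whitespace"; status=1
    else
      echo "$var: ok"
    fi
  done
  return $status
}
```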
No Data Appearing in Dashboard (Agent Running):
- Symptom: The agent container appears as `Up` in `docker ps`, but no metrics or logs are visible in your Middleware.io dashboard.
- Solution:
  - Agent Logs Review: Run `docker logs mw-host-agent` again. Look for any messages indicating data transmission failures or connection errors to `MW_TARGET`.
  - API Key/Target Validation: Even if the agent starts, an incorrect key or target will prevent data ingestion. Re-verify them.
  - Firewall/Proxy: Confirm that no firewall is blocking outbound connections from the Docker host to your Middleware.io ingestion endpoint. If you are behind a corporate proxy, ensure the `HTTP_PROXY`/`HTTPS_PROXY` environment variables are correctly set for the agent container (see Section 4.1).
  - Time Synchronization: Ensure your Docker host's system clock is synchronized (e.g., using `ntpdate` or `chrony`). Significant time skew can cause the ingestion service to reject data.
  - Initial Delay: Allow a few minutes (at least 1-2) for data to be collected, transmitted, and indexed by Middleware.io before it appears in the dashboard.
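Two of these checks, outbound reachability and clock synchronization, can be run from the Docker host itself. This sketch assumes `curl` is installed and that the host runs systemd with `timedatectl show` available. `curl` reads the `HTTPS_PROXY`/`https_proxy` variables, so running it with the same proxy settings as the agent approximates the agent's network path. Pass your own `MW_TARGET` value; the function names are illustrative.

```shell
# Sketch: check the ingestion endpoint and the host clock from the Docker host.
# Function names are illustrative; pass your own MW_TARGET value.

# Any HTTP status (even 401 or 404) proves the network path works; curl prints
# 000 when it cannot connect at all (DNS, firewall, proxy, or TLS failure).
check_endpoint() {
  code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$1") || true
  if [ -z "$code" ] || [ "$code" = "000" ]; then
    echo "unreachable: $1"; return 1
  fi
  echo "reachable: $1 (HTTP $code)"
}

# Report whether systemd considers the system clock NTP-synchronized.
check_clock() {
  synced=$(timedatectl show --property=NTPSynchronized --value 2>/dev/null) || synced=
  if [ "$synced" = "yes" ]; then
    echo "clock synchronized"
  else
    echo "clock NOT synchronized (got '${synced:-unknown}')"; return 1
  fi
}
```

Usage: `check_endpoint "$MW_TARGET"; check_clock`.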
Data Missing or Incomplete (e.g., some containers, no logs):
- Symptom: You see some data (e.g., host metrics) but not all expected data (e.g., specific container metrics, or any logs).
- Solution:
  - Volume Mounts for Logs:
    - For container logs: Ensure `/var/lib/docker/containers` is correctly mounted read-only (`:ro`). This path is where Docker stores individual container log files when containers use the default `json-file` logging driver.
    - For host logs: Verify that `/var/log` (or your custom paths specified in `MW_AGENT_LOG_PATHS`) is correctly mounted.
  - Host Permissions: Although the mounts are read-only, the agent still needs read permission on those host paths; a non-root user inside the container, or a host security policy such as SELinux, can block access.
  - Exclusion/Inclusion Lists: Check whether the `MW_CONTAINER_EXCLUDE_LIST` or `MW_CONTAINER_INCLUDE_LIST` environment variables are set. These can inadvertently filter out containers or log sources you intend to monitor.
  - APM Traces: The host agent provides the infrastructure for APM but does not automatically collect traces from your application code. To collect application traces, you must also instrument your application with the language-specific Middleware.io APM agents (e.g., Java Agent, Node.js Agent) inside your application containers.
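To confirm which host paths are actually mounted into the running agent, and whether they are read-only, you can read the container's mount table with `docker inspect`. This sketch assumes the container name `mw-host-agent` and checks the four required paths listed under "Agent Container Not Starting or Exited". Add `/var/log` or your `MW_AGENT_LOG_PATHS` entries to the list if you collect host logs. Paths are compared exactly as Docker reports them, so use the same spelling as in your `-v` options. The function name is illustrative.

```shell
# Sketch: report required host paths that are missing from the agent's mounts
# or mounted writable. Add /var/log or custom log paths to the list if needed.
check_agent_mounts() {
  name=${1:-mw-host-agent}
  # One line per mount: "<host source path> <true|false for read-write>"
  mounts=$(docker inspect --format '{{range .Mounts}}{{println .Source .RW}}{{end}}' "$name") || return 2
  status=0
  for path in /var/run/docker.sock /proc /sys /var/lib/docker/containers; do
    line=$(printf '%s\n' "$mounts" | awk -v p="$path" '$1 == p')
    if [ -z "$line" ]; then
      echo "MISSING: $path"; status=1
    elif [ "${line##* }" = "true" ]; then
      echo "WRITABLE (expected :ro): $path"; status=1
    else
      echo "ok: $path"
    fi
  done
  return $status
}
```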
Agent Using Excessive Resources (CPU/Memory):
- Symptom: The `mw-host-agent` container is consistently consuming a high percentage of CPU or a large amount of memory.
- Solution:
  - Check Agent Log Verbosity: If `MW_LOG_LEVEL` is set to `debug` or `trace`, the extensive logging can significantly increase resource usage. Revert to `info` for production environments.
  - Reduce Collection Frequency: Raise `MW_METRIC_INTERVAL_SECONDS` (e.g., to 30 or 60 seconds) if very frequent metric updates are not critical. This reduces the load on the agent.
  - Number of Monitored Entities: A very high number of Docker containers, host processes, or log files on the system naturally increases the agent's resource consumption. In that case, consider using `MW_CONTAINER_EXCLUDE_LIST` to focus monitoring on critical containers.
  - Set Resource Limits: Proactively set CPU and memory limits for the agent container to prevent it from impacting other services on the host (see Section 5.2.1).
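Before changing settings, it helps to confirm the usage with a one-shot `docker stats` snapshot. This sketch reads the CPU percentage and memory usage of the agent container and compares CPU against a threshold. The default threshold of 50% and the function names are illustrative assumptions.

```shell
# Sketch: one-shot resource snapshot of the agent container.
agent_usage() {
  docker stats --no-stream --format '{{.CPUPerc}} {{.MemUsage}}' "${1:-mw-host-agent}"
}

# Exit 0 when CPU usage is above the threshold (default 50, illustrative),
# 1 when it is not, 2 when docker stats fails.
cpu_over_threshold() {
  usage=$(agent_usage "$2") || return 2
  # First field looks like "12.34%": strip the % sign and compare numerically.
  printf '%s\n' "$usage" |
    awk -v t="${1:-50}" '{ sub(/%$/, "", $1); exit !($1 + 0 > t + 0) }'
}
```

`docker stats` reports 100% for one fully used CPU core, so on a multi-core host the value can exceed 100%. Example: `cpu_over_threshold 50 && agent_usage`.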
Logs and Diagnostics for Support
When contacting Middleware.io support, providing the following information will help expedite resolution:
- The exact `docker run` command or `docker-compose.yml` file you are using.
- The full output of `docker logs mw-host-agent` (ideally after setting `MW_LOG_LEVEL=debug` for a few minutes to capture verbose information).
- The output of `docker ps -a --filter name=mw-host-agent`.
- A clear description of the issue: what's happening, what you expect to happen, and when the issue started.
- Any relevant error messages or screenshots from your Middleware.io dashboard.
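Collecting these items into one directory makes them easier to attach to a ticket. This sketch gathers the outputs listed above plus `docker inspect` and `docker version`, and masks the `MW_API_KEY` value so the key is not shared. The function names, the directory naming, and the masking pattern are illustrative assumptions. Add your `docker run` command or `docker-compose.yml` by hand, masking the key there as well.

```shell
# Sketch: gather diagnostics for Middleware.io support into one directory.
# Replace the value after MW_API_KEY= with REDACTED (pattern is illustrative).
redact_api_key() {
  sed -E 's/(MW_API_KEY=)[^"[:space:]]*/\1REDACTED/g'
}

collect_support_bundle() {
  name=${1:-mw-host-agent}
  dir=${2:-mw-support-$(date +%Y%m%d-%H%M%S)}
  mkdir -p "$dir" || return 1
  docker ps -a --filter "name=$name" > "$dir/docker-ps.txt" 2>&1
  # docker logs replays the container's stderr on stderr, so merge both streams.
  docker logs "$name" 2>&1 | redact_api_key > "$dir/agent.log"
  # The Env section of docker inspect contains the API key in clear text.
  docker inspect "$name" 2>&1 | redact_api_key > "$dir/inspect.json"
  docker version > "$dir/docker-version.txt" 2>&1
  echo "$dir"
}
```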
Best Practices & Tips
Proactive measures to ensure a robust, efficient, and secure monitoring setup.
Resource Allocation (Limits & Reservations)
Even though the Middleware.io agent is designed to be lightweight, setting resource limits is a crucial Docker best practice, especially in shared or resource-constrained environments. This prevents the agent from consuming excessive resources and potentially impacting your application's performance.
- Why? Prevents resource contention and ensures the agent doesn't starve other critical services on the host.
- How to Set:
  - Docker Run: Use the `--cpus` and `--memory` flags.
    ```shell
    # Limit the agent to 0.5 CPU cores and 256 MB of memory.
    # Add the rest of your flags (volumes, environment variables, restart policy)
    # before the image name, and keep any arguments that follow it.
    docker run -d \
      --name mw-host-agent \
      --cpus="0.5" \
      --memory="256m" \
      ghcr.io/middleware-labs/mw-host-agent:master
    ```
  - Docker Compose (the `deploy` key, recommended for Compose file format 3.x):
    ```yaml
    services:
      mw-host-agent:
        # ... image, volumes, environment, etc.
        deploy:
          resources:
            limits:
              cpus: '0.5'     # Limit to 0.5 CPU cores
              memory: 256M    # Limit to 256 MB of memory
            reservations:     # Optional: guaranteed resources
              cpus: '0.25'    # Reserve 0.25 CPU cores
    ```
  - Tip: Start with conservative limits (e.g., 0.5 CPU, 256 MB memory) and adjust based on the resource usage you observe in your environment.
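After creating or recreating the container, you can confirm the limits Docker actually applied, which is useful when tuning them as the tip suggests. This sketch reads `HostConfig.NanoCpus` (billionths of a CPU) and `HostConfig.Memory` (bytes) from `docker inspect`; 0 in either field means no limit. Limits set with `docker run --cpus`/`--memory` appear in these fields. Compose's `deploy.resources.limits` is expected to populate them as well, but verify with your Compose version. The function name is illustrative.

```shell
# Sketch: show the CPU and memory limits Docker applied to the agent container.
# NanoCpus is in billionths of a CPU; Memory is in bytes; 0 means unlimited.
show_agent_limits() {
  docker inspect --format '{{.HostConfig.NanoCpus}} {{.HostConfig.Memory}}' "${1:-mw-host-agent}" |
    awk '{
      cpus = ($1 == 0) ? "unlimited" : $1 / 1000000000
      mem  = ($2 == 0) ? "unlimited" : ($2 / 1048576) "MiB"
      print "cpus=" cpus " memory=" mem
    }'
}
```

`docker update` can change `--cpus` and `--memory` on a running container, but the change is lost when the container is recreated from your original command, so update that command too.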
Security Considerations
Security should be a top priority when deploying agents that access host resources.
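- Read-Only Volumes (`:ro`):
  - Why? Minimizes the potential attack surface by ensuring the agent container cannot write to the host's critical filesystems.
  - How: Always use the `:ro` (read-only) flag when mounting host paths (e.g., `/var/run/docker.sock:/var/run/docker.sock:ro`, `/proc:/proc:ro`).
  - Note: `:ro` does not make the Docker API read-only. Any process that can connect to `/var/run/docker.sock` can still issue every API call, so treat socket access as equivalent to root access on the host.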
- `privileged: true` vs. `cap_add`:
  - Why? `privileged: true` grants the container nearly all capabilities of the host, which is a significant security risk. While it simplifies host-level access, consider alternatives.
  - Recommendation: If strict security is paramount, explore whether specific Linux capabilities (`cap_add`) can replace `privileged` mode (e.g., `SYS_PTRACE` for process monitoring, `DAC_READ_SEARCH` to bypass file read and directory search permission checks). However, configuring `cap_add` correctly can be complex and may require experimentation.
- API Key Management:
  - Why? Your `MW_API_KEY` is a sensitive credential. Exposure could lead to unauthorized data access or abuse.
  - How:
    - Avoid Hardcoding: Never hardcode your API key directly in public repositories or images.
    - Environment Variables: Use environment variables as shown in this guide, and ensure they are managed securely (e.g., not committed to version control).
    - Docker Secrets/Orchestrator Secrets: For production environments, use Docker Secrets (for Docker Swarm) or Kubernetes Secrets to inject the API key securely.
    - CI/CD Integration: Use your CI/CD pipeline's secret management features to inject `MW_API_KEY` during deployment.
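The three recommendations can be combined in one Compose service definition. Treat this as a sketch only. The container-side mount paths mirror the host paths, following the `/proc:/proc:ro` example above. The `cap_add` set is the untested candidate from the recommendation, not a confirmed replacement for `privileged` mode. `mw-agent.env` is a hypothetical env file holding `MW_API_KEY` and `MW_TARGET`; keep it out of version control (for example via `.gitignore`) and readable only by the deploying user. Compare against your installation command and carry over any other options it uses.

```yaml
services:
  mw-host-agent:
    image: ghcr.io/middleware-labs/mw-host-agent:master
    restart: unless-stopped
    # Read-only host mounts (container-side paths assumed to mirror host paths).
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /proc:/proc:ro
      - /sys:/sys:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
    # Assumption: candidate capabilities instead of privileged: true.
    # Verify the agent still collects all expected data before relying on this.
    cap_add:
      - SYS_PTRACE
      - DAC_READ_SEARCH
    # Hypothetical untracked file containing MW_API_KEY=... and MW_TARGET=...
    env_file:
      - ./mw-agent.env
```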
Monitoring Agent Health
It's crucial to monitor the monitoring agent itself to ensure continuous data collection.
- Why? If the agent fails, you lose visibility into your Docker environment.
- How:
  - Regular Checks: Periodically check the agent's status using `docker ps` and review its logs with `docker logs mw-host-agent`.
  - Alerting in Middleware.io: Consider setting up alerts in Middleware.io for the agent's own metrics (if available, e.g., agent CPU/memory usage) or for the absence of data from a specific host/agent.
  - Container Restart Policy: The `--restart unless-stopped` flag (or `restart: unless-stopped` in Docker Compose) is a critical best practice: it ensures the agent automatically restarts if Docker restarts or if the agent container crashes.
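A host-side check can complement dashboard alerts by catching an agent that has stopped or is crash-looping. This sketch uses the `State.Running` and `RestartCount` fields from `docker inspect`. The threshold of 5 restarts and the function name are illustrative assumptions, and what to do on failure (log, page, restart) is left to you; the function is suitable for running from cron.

```shell
# Sketch: non-zero exit when the agent is down or has restarted too often.
# The default restart threshold (5) is an illustrative assumption.
check_agent_health() {
  name=${1:-mw-host-agent}
  max_restarts=${2:-5}
  info=$(docker inspect --format '{{.State.Running}} {{.RestartCount}}' "$name" 2>/dev/null) || {
    echo "$name: container not found"; return 2
  }
  running=${info% *}
  restarts=${info#* }
  if [ "$running" != "true" ]; then
    echo "$name: not running"; return 1
  fi
  if [ "$restarts" -gt "$max_restarts" ]; then
    echo "$name: running, but restarted $restarts times (possible crash loop)"; return 1
  fi
  echo "$name: healthy ($restarts restarts)"
}
```

For a container that already exists, `docker update --restart unless-stopped mw-host-agent` applies the restart policy without recreating it.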
Updating the Agent
Keeping your agent updated ensures you benefit from the latest features, bug fixes, and security patches.
- Why? Access to new features, performance improvements, and security enhancements.
- How to Upgrade:
  - Stop and remove the current agent container:
    ```shell
    docker stop mw-host-agent
    docker rm mw-host-agent
    ```
  - Pull the latest image. This ensures you get the most recent version:
    ```shell
    docker pull ghcr.io/middleware-labs/mw-host-agent:master
    ```
  - Run the agent again using your original `docker run` command or `docker compose up -d`. This starts a new container from the updated image.
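With Docker Compose, the same upgrade takes two commands. Pulling before the container is replaced also shortens the monitoring gap compared with stopping first. This sketch assumes the Compose service is named `mw-host-agent` (adjust to your file) and that you run it from the directory containing your Compose file. The function name is illustrative.

```shell
# Sketch: upgrade a Compose-managed agent. The service name is an assumption.
upgrade_agent_compose() {
  service=${1:-mw-host-agent}
  # Download the new image while the old container keeps running.
  docker compose pull "$service" || return 1
  # Recreate the container only if its image or configuration changed.
  docker compose up -d "$service"
}
```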
Consistency Across Environments
Maintain uniformity in your agent deployment and configuration.
- Why? Reduces configuration drift, simplifies troubleshooting, and ensures consistent data quality across development, staging, and production.
- How: Use declarative tools like Docker Compose or orchestration platforms (Kubernetes) to define your agent service, ensuring the same configuration (volume mounts, environment variables, resource limits) is applied everywhere.
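With Docker Compose, one way to do this is a shared base file plus small per-environment override files. Compose merges them when you pass several `-f` options, for example `docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d`. The file name and the limit values below are illustrative assumptions; only the keys that differ from the shared file belong in the override.

```yaml
# docker-compose.prod.yml (hypothetical production override).
# Compose merges these keys over the shared docker-compose.yml.
services:
  mw-host-agent:
    environment:
      MW_LOG_LEVEL: info
    deploy:
      resources:
        limits:
          cpus: '1.0'      # illustrative value
          memory: 512M     # illustrative value
```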