ClickHouse¶
Since testcontainers-python v4.6.0
Introduction¶
The Testcontainers module for ClickHouse.
Adding this module to your project dependencies¶
Please run the following command to add the ClickHouse module to your python dependencies:
pip install testcontainers[clickhouse] clickhouse-driver
Usage example¶
from datetime import datetime, timedelta
import pandas as pd
from clickhouse_driver import Client
from testcontainers.clickhouse import ClickHouseContainer
def basic_example():
with ClickHouseContainer() as clickhouse:
# Get connection parameters
host = clickhouse.get_container_host_ip()
port = clickhouse.get_exposed_port(clickhouse.port)
# Create ClickHouse client
client = Client(host=host, port=port)
# Create a test table
client.execute("""
CREATE TABLE IF NOT EXISTS test_table (
id UInt32,
name String,
value Float64,
timestamp DateTime
) ENGINE = MergeTree()
ORDER BY (id, timestamp)
""")
print("Created test table")
# Generate test data
now = datetime.now()
data = [
(1, "test1", 100.0, now),
(2, "test2", 200.0, now + timedelta(hours=1)),
(3, "test3", 300.0, now + timedelta(hours=2)),
]
# Insert data
client.execute("INSERT INTO test_table (id, name, value, timestamp) VALUES", data)
print("Inserted test data")
# Query data
result = client.execute("""
SELECT *
FROM test_table
ORDER BY id
""")
print("\nQuery results:")
for row in result:
print(f"ID: {row[0]}, Name: {row[1]}, Value: {row[2]}, Timestamp: {row[3]}")
# Execute a more complex query
result = client.execute("""
SELECT
name,
avg(value) as avg_value,
min(value) as min_value,
max(value) as max_value
FROM test_table
GROUP BY name
ORDER BY avg_value DESC
""")
print("\nAggregation results:")
for row in result:
print(f"Name: {row[0]}, Avg: {row[1]:.2f}, Min: {row[2]:.2f}, Max: {row[3]:.2f}")
# Convert to pandas DataFrame
df = pd.DataFrame(result, columns=["name", "avg_value", "min_value", "max_value"])
print("\nDataFrame:")
print(df)
if __name__ == "__main__":
basic_example()
Features¶
- Column-oriented storage
- High-performance analytics
- Real-time data processing
- SQL support
- Data compression
- Parallel processing
- Distributed queries
- Integration with pandas for data analysis
Configuration¶
The ClickHouse container can be configured with the following parameters:
port
: Port to expose (default: 9000)version
: ClickHouse version to use (default: "latest")user
: Database username (default: "default")password
: Database password (default: "")database
: Database name (default: "default")