{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# ERA5-Land Hourly Data\n", "\n", "ERA5-Land is a high-resolution reanalysis dataset that provides a consistent and detailed view of land variables over several decades. It combines model data with atmospheric forcing from ERA5 to achieve high accuracy. Because the input variables are corrected for elevation differences and observations influence the results indirectly, ERA5-Land offers improved precision for land-surface applications such as flood and drought forecasting. Despite some uncertainties, its high temporal and spatial resolution makes ERA5-Land a valuable resource for decision-making and environmental analysis.\n", "\n", "**Dataset information:**\n", "* Source: [ERA5-Land Hourly Data](https://cds.climate.copernicus.eu/datasets/reanalysis-era5-land?tab=overview)\n", "* Author: str.ucture GmbH\n", "* Notebook version: 1.2 (updated: March 05, 2025)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Setting the paths and working directories" ] },
{ "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "''' ---- Specify the directories here ---- '''\n", "download_folder = r\".\\data\\era5-land-hourly-data\\download\"\n", "working_folder = r\".\\data\\era5-land-hourly-data\\working\"\n", "geotiff_folder = r\".\\data\\era5-land-hourly-data\\geotiff\"\n", "csv_folder = r\".\\data\\era5-land-hourly-data\\csv\"\n", "output_folder = r\".\\data\\era5-land-hourly-data\\output\"\n", "''' ---- End of directory settings ---- '''\n", "\n", "os.makedirs(download_folder, exist_ok=True)\n", "os.makedirs(working_folder, exist_ok=True)\n", "os.makedirs(geotiff_folder, exist_ok=True)\n", "os.makedirs(csv_folder, exist_ok=True)\n", "os.makedirs(output_folder, exist_ok=True)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Downloading and unpacking the dataset" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 2.1 Authentication" ] },
{ "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import cdsapi\n", "\n", "def main():\n", "    # API key for authentication (replace with your own CDS API key)\n", "    api_key = \"fdae60fd-35d4-436f-825c-c63fedab94a4\"\n", "    api_url = \"https://cds.climate.copernicus.eu/api\"\n", "\n", "    # Create the CDS API client\n", "    client = cdsapi.Client(url=api_url, key=api_key)\n", "    return client" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2 Define the request and download the dataset" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "27814f47066445f8abd3fc208f368a57", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Dropdown(description='Select a variable group', layout=Layout(width='50%'), options=('var_group_temperature…" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ],
"source": [ "import ipywidgets as widgets\n", "import _utils.extra_era5_land_hourly as utils\n", "\n", "var_group_name_list = utils.var_group_name_list\n", "var_group_dict = utils.var_group_dict\n", "\n", "selected_variable_group = widgets.Dropdown(\n", "    options = var_group_name_list,\n", "    value = var_group_name_list[0],\n", "    description = \"Select a variable group\",\n", "    style = dict(description_width='initial'),\n", "    layout = widgets.Layout(width='50%'),\n", ")\n", "\n", "selected_variable_group" ] },
{ "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "2f457a4f32b445c0926b0afeee71dfa7", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Dropdown(description='Select the desired variable', index=1, layout=Layout(width='50%'), options=('2m_dewpoi…" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "current_variable_group = var_group_dict[selected_variable_group.value]\n", "\n", "selected_variable = widgets.Dropdown(\n", "    options=current_variable_group,\n", "    value=current_variable_group[1],\n", "    description=\"Select the desired variable\",\n", "    style=dict(description_width='initial'),\n", "    layout=widgets.Layout(width='50%'),\n", ")\n", "\n", "selected_variable" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 2.3 Define the year to download\n", "\n", "> Note: For the selected **year**, the request includes all months (January to December), all days (1 to 31) and all hours (00:00 to 23:00). Change these settings to reduce the file size or to download a specific subset." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "bd9c9699ea0f46ffb14fdd9f97f666cd", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Dropdown(description='Select the year to download data for:', index=74, layout=Layout(width='50%'), opti…" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from datetime import datetime\n", "\n", "selected_year = widgets.Dropdown(\n", "    options=[str(year) for year in range(1950, 2024+1)],\n", "    value=str(2024),\n", "    description=\"Select the year to download data for:\",\n", "    disabled=False,\n", "    style=dict(description_width='initial'),\n", "    layout=widgets.Layout(width='50%'),\n", ")\n", "\n", "selected_year" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 2.4 Define the bounding box extents (bbox)" ] },
{ "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Define the bounding box coordinates (WGS84)\n", "# Coordinate order expected by the CDS request: [North, West, South, East]\n", "bbox_wgs84_deutschland = [56.0, 5.8, 47.2, 15.0]\n", "bbox_wgs84_de_standard = [5.7, 47.1, 15.2, 55.2] # [West, South, East, North]\n", "bbox_wgs84_konstanz = [47.9, 8.9, 47.6, 9.3]\n", "bbox_wgs84_konstanz_standard = [9.0, 47.6, 9.3, 47.8] # [West, South, East, North]\n", "\n", "# Alternatively, a shapefile can be used for precise geographic filtering\n", "import geopandas as gpd\n", "import math\n", "\n", "# Example: load the shapefile of Germany (WGS84 projection)\n", "de_shapefile = r\"./shapefiles/de_boundary.shp\"\n", "de_gdf = gpd.read_file(de_shapefile)\n", "\n", "# Extract the bounding box of the shapefile: [min_lon, min_lat, max_lon, max_lat]\n", "de_bounds = de_gdf.total_bounds\n", "\n", "# Round the bounding box outward to 0.1 degrees and add a 0.1 degree buffer to get a slightly larger extent\n", "de_bounds_adjusted = [(math.floor(de_bounds[0]* 10)/10)-0.1,\n", "                      (math.floor(de_bounds[1]* 
10)/10)-0.1,\n", "                      (math.ceil(de_bounds[2]* 10)/10)+0.1,\n", "                      (math.ceil(de_bounds[3]* 10)/10)+0.1]\n", "\n", "# Reorder the coordinates into the format [North, West, South, East]\n", "bbox_de_bounds_adjusted = [de_bounds_adjusted[3], de_bounds_adjusted[0],\n", "                           de_bounds_adjusted[1], de_bounds_adjusted[2]]" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 2.5 Define the dataset and the request" ] },
{ "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Define the dataset and the request parameters\n", "dataset = \"reanalysis-era5-land\"\n", "request = {\n", "    \"variable\": selected_variable.value,\n", "    \"year\": selected_year.value,\n", "    # Months 01-12 and days 01-31 (range() excludes its upper bound)\n", "    \"month\": [f\"{month:02d}\" for month in range(1, 13)],\n", "    \"day\": [f\"{day:02d}\" for day in range(1, 32)],\n", "    \"time\": [f\"{hour:02d}:00\" for hour in range(24)],\n", "    \"data_format\": \"netcdf\",\n", "    \"download_format\": \"unarchived\",\n", "    \"area\": bbox_de_bounds_adjusted\n", "}" ] },
{ "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset already downloaded.\n" ] } ], "source": [ "download_folder_subset = os.path.join(download_folder, f\"{selected_variable.value}\")\n", "os.makedirs(download_folder_subset, exist_ok=True)\n", "\n", "# Run this cell to download the dataset\n", "def main_retrieve():\n", "    dataset_filename = f\"{dataset}-{selected_variable.value}-{selected_year.value}.nc\"\n", "    dataset_filepath = os.path.join(download_folder_subset, dataset_filename)\n", "\n", "    # Download the dataset only if it has not been downloaded yet\n", "    if not os.path.isfile(dataset_filepath):\n", "        # Create the CDS client only when a download is actually needed\n", "        client = main()\n", "        # Download the dataset with the request parameters defined above\n", "        client.retrieve(dataset, request, dataset_filepath)\n", "    else:\n", "        print(\"Dataset already downloaded.\")\n", "\n", "if __name__ == \"__main__\":\n", "    main_retrieve()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 2.6 Extract the ZIP file into the folder\n", "\n", "> Note: Because the dataset is downloaded for a single variable, only one NetCDF file is downloaded, and the CDS does not create a ZIP file for a single variable in NetCDF format. The extraction code below is therefore commented out." ] },
{ "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# import zipfile\n", "\n", "# # Define an extraction folder for the ZIP file (here: the working folder)\n", "# extract_folder = working_folder\n", "\n", "# # Extract the ZIP file\n", "# try:\n", "#     if not os.listdir(extract_folder):\n", "#         dataset_filename = f\"{dataset}.zip\"\n", "#         dataset_filepath = os.path.join(download_folder, dataset_filename)\n", "\n", "#         with zipfile.ZipFile(dataset_filepath, 'r') as zip_ref:\n", "#             zip_ref.extractall(extract_folder)\n", "#             print(f\"Files successfully extracted to: {extract_folder}\")\n", "#     else:\n", "#         print(\"Folder is not empty. Skipping extraction.\")\n", "# except FileNotFoundError:\n", "#     print(f\"Error: The file {dataset_filepath} was not found.\")\n", "# except zipfile.BadZipFile:\n", "#     print(f\"Error: The file {dataset_filepath} is not a valid ZIP file.\")\n", "# except Exception as e:\n", "#     print(f\"An unexpected error occurred: {e}\")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Inspecting the metadata of the NetCDF4 file" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 3.1 Create a DataFrame of the available NetCDF files" ] },
{ "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr><th></th><th>filename</th><th>dataset</th><th>ds_variable</th><th>variable_name</th><th>year</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>reanalysis-era5-land...</td><td>reanalysis-era5-land</td><td>2m_temperature</td><td>t2m</td><td>1950</td></tr>\n",
"    <tr><th>1</th><td>reanalysis-era5-land...</td><td>reanalysis-era5-land</td><td>2m_temperature</td><td>t2m</td><td>1951</td></tr>\n",
"    <tr><th>2</th><td>reanalysis-era5-land...</td><td>reanalysis-era5-land</td><td>2m_temperature</td><td>t2m</td><td>1952</td></tr>\n",
"    <tr><th>3</th><td>reanalysis-era5-land...</td><td>reanalysis-era5-land</td><td>2m_temperature</td><td>t2m</td><td>1953</td></tr>\n",
"    <tr><th>4</th><td>reanalysis-era5-land...</td><td>reanalysis-era5-land</td><td>2m_temperature</td><td>t2m</td><td>1954</td></tr>\n",
"    <tr><th>5</th><td>reanalysis-era5-land...</td><td>reanalysis-era5-land</td><td>2m_temperature</td><td>t2m</td><td>1955</td></tr>\n",
"    <tr><th>6</th><td>reanalysis-era5-land...</td><td>reanalysis-era5-land</td><td>2m_temperature</td><td>t2m</td><td>1956</td></tr>\n",
"    <tr><th>7</th><td>reanalysis-era5-land...</td><td>reanalysis-era5-land</td><td>2m_temperature</td><td>t2m</td><td>1957</td></tr>\n",
"    <tr><th>8</th><td>reanalysis-era5-land...</td><td>reanalysis-era5-land</td><td>2m_temperature</td><td>t2m</td><td>1958</td></tr>\n",
"    <tr><th>9</th><td>reanalysis-era5-land...</td><td>reanalysis-era5-land</td><td>2m_temperature</td><td>t2m</td><td>1959</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " filename dataset ds_variable \\\n", "0 reanalysis-era5-land... reanalysis-era5-land 2m_temperature \n", "1 reanalysis-era5-land... reanalysis-era5-land 2m_temperature \n", "2 reanalysis-era5-land... reanalysis-era5-land 2m_temperature \n", "3 reanalysis-era5-land... reanalysis-era5-land 2m_temperature \n", "4 reanalysis-era5-land... reanalysis-era5-land 2m_temperature \n", "5 reanalysis-era5-land... reanalysis-era5-land 2m_temperature \n", "6 reanalysis-era5-land... reanalysis-era5-land 2m_temperature \n", "7 reanalysis-era5-land... reanalysis-era5-land 2m_temperature \n", "8 reanalysis-era5-land... reanalysis-era5-land 2m_temperature \n", "9 reanalysis-era5-land... reanalysis-era5-land 2m_temperature \n", "\n", " variable_name year \n", "0 t2m 1950 \n", "1 t2m 1951 \n", "2 t2m 1952 \n", "3 t2m 1953 \n", "4 t2m 1954 \n", "5 t2m 1955 \n", "6 t2m 1956 \n", "7 t2m 1957 \n", "8 t2m 1958 \n", "9 t2m 1959 " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ],
"source": [ "import re\n", "import pandas as pd\n", "import netCDF4 as nc\n", "\n", "def meta(filename):\n", "    # Check whether the filename matches the expected pattern\n", "    match = re.search(r\"(?P<dataset>reanalysis-era5-land)-(?P<ds_variable>\\d+m_[a-z_]+)-(?P<year>\\d{4})\", filename)\n", "\n", "    # Raise an error if the filename does not match the expected naming scheme\n", "    if not match:\n", "        raise ValueError(\"The given filename does not match the expected naming scheme.\")\n", "\n", "    # Helper that extracts the name of the data variable from the NetCDF file\n", "    def get_nc_variable():\n", "        with nc.Dataset(os.path.join(download_folder_subset, filename), 'r') as nc_dataset:\n", "            nc_variable_name = nc_dataset.variables.keys()\n", "            # The data variable follows the five coordinate/auxiliary variables\n", "            return [*nc_variable_name][5]\n", "\n", "    # Return the metadata as a dictionary\n", "    return dict(\n", "        filename=filename,\n", "        path=os.path.join(download_folder_subset, filename),\n", "        # index=match.group('index'),\n", "        
dataset=match.group('dataset'),\n", "        ds_variable=match.group('ds_variable'),\n", "        variable_name=get_nc_variable(),\n", "        year=match.group('year')\n", "    )\n", "\n", "# Extract the metadata for all NetCDF files in the folder\n", "# The list 'nc_files' holds the relevant metadata of every available NetCDF4 file\n", "# It is used later to convert the files to GeoTIFF\n", "nc_files = [meta(f) for f in os.listdir(download_folder_subset) if f.endswith('.nc')]\n", "nc_files = sorted(nc_files, key=lambda x: x['year']) # Sort by year\n", "df_nc_files = pd.DataFrame.from_dict(nc_files)\n", "\n", "# Adjust the pandas display options\n", "pd.options.display.max_colwidth = 24\n", "\n", "# Display the DataFrame without the 'path' column\n", "df_nc_files.head(10).loc[:, df_nc_files.columns != 'path']" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2 Print the unique variable names and the available variables" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 t2m : Available variables: ['number', 'valid_time', 'latitude', 'longitude', 'expver', 't2m']\n" ] } ], "source": [ "# Set of variable names that were already processed, used to avoid duplicates\n", "seen_variables = set()\n", "\n", "# List all variables in each NetCDF file\n", "for i, nc_file in enumerate(nc_files):\n", "    variable_name = nc_file['variable_name']\n", "\n", "    # Skip the file if its variable has already been processed\n", "    if variable_name in seen_variables:\n", "        continue\n", "\n", "    # Open the NetCDF file in read mode\n", "    with nc.Dataset(nc_file['path'], mode='r') as nc_dataset:\n", "        # List all variables in the current dataset\n", "        variables_list = list(nc_dataset.variables.keys())\n", "\n", "    # Print the details of the file and its variables\n", "    print(f\"{i + 1:<2} {variable_name:<18}: Available variables: {variables_list}\")\n", "\n", "    # Mark this variable as processed\n", "    seen_variables.add(variable_name)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1. Summary of variable 't2m':\n" ] }, { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr><th></th><th>Description</th><th>Remarks</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>Variable name</td><td>t2m</td></tr>\n",
"    <tr><th>1</th><td>Data type</td><td>float32</td></tr>\n",
"    <tr><th>2</th><td>Shape</td><td>(8759, 82, 96)</td></tr>\n",
"    <tr><th>3</th><td>Dimensions</td><td>('valid_time', 'lati...</td></tr>\n",
"    <tr><th>4</th><td>Units</td><td>K</td></tr>\n",
"    <tr><th>5</th><td>Long name</td><td>2 metre temperature</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Description Remarks\n", "0 Variable name t2m\n", "1 Data type float32\n", "2 Shape (8759, 82, 96)\n", "3 Dimensions ('valid_time', 'lati...\n", "4 Units K\n", "5 Long name 2 metre temperature" ] }, "metadata": {}, "output_type": "display_data" } ],
"source": [ "# List the information on the primary variable in each NetCDF file\n", "seen_variables = set()\n", "\n", "for i, nc_file in enumerate(nc_files):\n", "    variable_name = nc_file['variable_name']\n", "\n", "    # Skip the file if its variable has already been processed\n", "    if variable_name in seen_variables:\n", "        continue\n", "\n", "    # Open the NetCDF file in read mode\n", "    with nc.Dataset(nc_file['path'], mode='r') as nc_dataset:\n", "        # Get the primary variable\n", "        variable_data = nc_dataset[variable_name]\n", "\n", "        # Build a summary of the primary variable\n", "        summary = {\n", "            \"Variable name\": variable_name,\n", "            \"Data type\": variable_data.dtype,\n", "            \"Shape\": variable_data.shape,\n", "            \"Dimensions\": f\"{variable_data.dimensions}\",\n", "            \"Units\": getattr(variable_data, \"units\", \"N/A\"),\n", "            \"Long name\": getattr(variable_data, \"long_name\", \"N/A\"),\n", "        }\n", "\n", "    # Show the summary as a DataFrame for better readability\n", "    nc_summary = pd.DataFrame(list(summary.items()), columns=['Description', 'Remarks'])\n", "    print(f\"{i + 1}. Summary of variable '{variable_name}':\")\n", "    display(nc_summary)\n", "\n", "    # Add the variable name to the set of processed variables\n", "    seen_variables.add(variable_name)\n", "\n", "    # Limit the output\n", "    output_limit = 2\n", "    if len(seen_variables) >= output_limit:\n", "        print(f\".... (output truncated to the first {output_limit} variables)\")\n", "        break" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Exporting the NetCDF4 files to CSV format" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 4.1 Define a function to convert NetCDF data into a DataFrame" ] },
{ "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "import xarray as xr\n", "\n", "# Function that converts NetCDF data into a pandas DataFrame\n", "def netcdf_to_dataframe(\n", "    nc_file,\n", "    bounding_box=None):\n", "\n", "    # Open the NetCDF dataset in read mode\n", "    with xr.open_dataset(nc_file['path']) as nc_dataset:\n", "        # Access the variable data in the dataset\n", "        variable_data = nc_dataset[nc_file['variable_name']]\n", "\n", "        # Determine the coordinate names used for latitude and longitude\n", "        latitude_name = 'latitude' if 'latitude' in nc_dataset.coords else 'lat'\n", "        longitude_name = 'longitude' if 'longitude' in nc_dataset.coords else 'lon'\n", "\n", "        # If a bounding box [lon_min, lat_min, lon_max, lat_max] is given, filter the data\n", "        if bounding_box:\n", "            filtered_data = variable_data.where(\n", "                (nc_dataset[latitude_name] >= bounding_box[1]) & (nc_dataset[latitude_name] <= bounding_box[3]) &\n", "                (nc_dataset[longitude_name] >= bounding_box[0]) & (nc_dataset[longitude_name] <= bounding_box[2]),\n", "                drop=True\n", "            )\n", "        else:\n", "            filtered_data = variable_data\n", "\n", "        # Convert the xarray data into a pandas DataFrame\n", "        df = filtered_data.to_dataframe().reset_index()\n", "\n", "        # Drop columns that are not needed (these vary by dataset)\n", "        if 'number' in df.columns:\n", "            df = df.drop(columns=['number'])\n", "        if 'expver' in df.columns:\n", "            df = df.drop(columns=['expver'])\n", "\n", "        # Split valid_time into date and time\n", "        df['valid_time'] = pd.to_datetime(df['valid_time'])\n", "        df['date'] = df['valid_time'].dt.date\n", "        df['time'] = df['valid_time'].dt.time\n", "        df = df.set_index(['date', 'time', latitude_name, longitude_name])\n", "\n", "    return df" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 4.2 Filter by bounding box, create the DataFrame and export it as a CSV file" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "File already exists at .\\data\\era5-land-hourly-data\\csv\\2m_temperature\\t2m-1950.csv.\n", "Skipping the export.\n", "Reading the last existing CSV file...\n" ] }, { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr><th></th><th></th><th></th><th></th><th>valid_time</th><th>number</th><th>expver</th><th>t2m</th></tr>\n",
"    <tr><th>date</th><th>time</th><th>latitude</th><th>longitude</th><th></th><th></th><th></th><th></th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>1950-01-01</th><th>01:00:00</th><th>47.8</th><th>9.1</th><td>1950-01-01 01:00:00</td><td>0</td><td>1</td><td>270.76500</td></tr>\n",
"    <tr><th></th><th></th><th></th><th>9.2</th><td>1950-01-01 01:00:00</td><td>0</td><td>1</td><td>270.69080</td></tr>\n",
"    <tr><th></th><th></th><th></th><th>9.3</th><td>1950-01-01 01:00:00</td><td>0</td><td>1</td><td>270.63416</td></tr>\n",
"    <tr><th></th><th></th><th>47.7</th><th>9.1</th><td>1950-01-01 01:00:00</td><td>0</td><td>1</td><td>271.12048</td></tr>\n",
"    <tr><th></th><th></th><th></th><th>9.2</th><td>1950-01-01 01:00:00</td><td>0</td><td>1</td><td>271.46814</td></tr>\n",
"    <tr><th>...</th><th>...</th><th>...</th><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>1950-12-31</th><th>23:00:00</th><th>47.8</th><th>9.2</th><td>1950-12-31 23:00:00</td><td>0</td><td>1</td><td>267.41724</td></tr>\n",
"    <tr><th></th><th></th><th></th><th>9.3</th><td>1950-12-31 23:00:00</td><td>0</td><td>1</td><td>267.42114</td></tr>\n",
"    <tr><th></th><th></th><th>47.7</th><th>9.1</th><td>1950-12-31 23:00:00</td><td>0</td><td>1</td><td>267.54810</td></tr>\n",
"    <tr><th></th><th></th><th></th><th>9.2</th><td>1950-12-31 23:00:00</td><td>0</td><td>1</td><td>267.38208</td></tr>\n",
"    <tr><th></th><th></th><th></th><th>9.3</th><td>1950-12-31 23:00:00</td><td>0</td><td>1</td><td>267.46216</td></tr>\n",
"  </tbody>\n", "</table>\n", "<p>52554 rows × 4 columns</p>\n", "</div>" ], "text/plain": [ " valid_time number expver \\\n", "date time latitude longitude \n", "1950-01-01 01:00:00 47.8 9.1 1950-01-01 01:00:00 0 1 \n", " 9.2 1950-01-01 01:00:00 0 1 \n", " 9.3 1950-01-01 01:00:00 0 1 \n", " 47.7 9.1 1950-01-01 01:00:00 0 1 \n", " 9.2 1950-01-01 01:00:00 0 1 \n", "... ... ... ... \n", "1950-12-31 23:00:00 47.8 9.2 1950-12-31 23:00:00 0 1 \n", " 9.3 1950-12-31 23:00:00 0 1 \n", " 47.7 9.1 1950-12-31 23:00:00 0 1 \n", " 9.2 1950-12-31 23:00:00 0 1 \n", " 9.3 1950-12-31 23:00:00 0 1 \n", "\n", " t2m \n", "date time latitude longitude \n", "1950-01-01 01:00:00 47.8 9.1 270.76500 \n", " 9.2 270.69080 \n", " 9.3 270.63416 \n", " 47.7 9.1 271.12048 \n", " 9.2 271.46814 \n", "... ... \n", "1950-12-31 23:00:00 47.8 9.2 267.41724 \n", " 9.3 267.42114 \n", " 47.7 9.1 267.54810 \n", " 9.2 267.38208 \n", " 9.3 267.46216 \n", "\n", "[52554 rows x 4 columns]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ],
"source": [ "# Create a folder for the subset CSV files, based on the selected variable\n", "subset_csv_folder = os.path.join(csv_folder, f\"{selected_variable.value}\")\n", "os.makedirs(subset_csv_folder, exist_ok=True)\n", "\n", "# Export the NetCDF4 files as individual CSV files\n", "for nc_file in nc_files:\n", "    # Define the CSV filename and the output path\n", "    csv_filename = f\"{nc_file['variable_name']}-{nc_file['year']}.csv\"\n", "    csv_filepath = os.path.join(subset_csv_folder, csv_filename)\n", "\n", "    # Export the DataFrame as CSV if the file does not exist yet\n", "    if not os.path.isfile(csv_filepath):\n", "        dataframe = netcdf_to_dataframe(nc_file, bounding_box=bbox_wgs84_konstanz_standard)\n", "        dataframe.to_csv(csv_filepath, sep=',', encoding='utf8')\n", "    else:\n", "        # Stop at the first file that already exists\n", "        print(f\"File already exists at {csv_filepath}.\\nSkipping the export.\")\n", "        break\n", "\n", "print(\"Reading the last existing CSV file...\")\n", "dataframe = 
pd.read_csv(csv_filepath).set_index(['date', 'time', 'latitude', 'longitude'])\n", "\n", "# Display the DataFrame\n", "dataframe" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Exporting the NetCDF4 file to GeoTIFF" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 5.1 Define a function to export the NetCDF4 file as GeoTIFF file(s)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from rasterio.transform import from_origin\n", "import rasterio\n", "import sys\n", "\n", "from tqdm.notebook import tqdm\n", "\n", "def main_export_geotiff(\n", "    nc_file,\n", "    bounding_box=None,\n", "    start_year=None,\n", "    end_year=None,\n", "    merged=None,\n", "    output_directory=None):\n", "\n", "    \"\"\"\n", "    Parameters:\n", "        nc_file (dict): A dictionary with the keys 'path' (file path), 'variable'...\n", "        bounding_box (list): [lon_min, lat_min, lon_max, lat_max] (optional).\n", "        start_year (int): Start year for the dataset (optional, currently unused).\n", "        end_year (int): End year for the dataset (optional, currently unused).\n", "        merged (bool): Whether to create one merged GeoTIFF or individual GeoTIFFs (optional).\n", "        output_directory (str): Directory in which the output GeoTIFF files are saved (optional).\n", "    \"\"\"\n", "\n", "    # Open the NetCDF file (the context manager closes it again)\n", "    with nc.Dataset(nc_file['path'], 'r') as nc_dataset:\n", "        lon = nc_dataset['longitude'][:]\n", "        lat = nc_dataset['latitude'][:]\n", "\n", "        # If a bounding box is given, filter the data accordingly\n", "        if bounding_box:\n", "            lon_min, lat_min, lon_max, lat_max = bounding_box\n", "\n", "            indices_lat = np.where((lat >= lat_min) & (lat <= lat_max))[0]\n", "            indices_lon = np.where((lon >= lon_min) & (lon <= lon_max))[0]\n", "            start_lat, end_lat = indices_lat[0], indices_lat[-1] + 1\n", "            start_lon, end_lon = indices_lon[0], 
indices_lon[-1] + 1\n", "        else:\n", "            start_lat, end_lat = 0, len(lat)\n", "            start_lon, end_lon = 0, len(lon)\n", "\n", "        lat = lat[start_lat:end_lat]\n", "        lon = lon[start_lon:end_lon]\n", "\n", "        # Extract the time variable and convert it into readable dates\n", "        time_var = nc_dataset.variables['valid_time']\n", "        time_units = time_var.units\n", "        time_calendar = getattr(time_var, \"calendar\", \"standard\")\n", "        cftime = nc.num2date(time_var[:], units=time_units, calendar=time_calendar)\n", "\n", "        # Compute the spatial resolution and the raster transform\n", "        # (coordinates are cell centres, so the origin is shifted by half a cell)\n", "        dx = abs(lon[1] - lon[0])\n", "        dy = abs(lat[1] - lat[0])\n", "        transform = from_origin(lon.min() - dx / 2, lat.max() + dy / 2, dx, dy)\n", "        # Note: the transform used in this code differs from the one used for other datasets\n", "\n", "        # Extract the variable data\n", "        variable_data = nc_dataset.variables[nc_file['variable_name']]\n", "        variable_data_subset = variable_data[..., start_lat:end_lat, start_lon:end_lon]\n", "\n", "        if merged:\n", "            # Create one merged GeoTIFF with all time slices as separate bands\n", "            if output_directory:\n", "                subset_directory_path = output_directory\n", "            else:\n", "                subset_directory_path = os.path.join(geotiff_folder, f\"{selected_variable.value}-merged\")\n", "            os.makedirs(subset_directory_path, exist_ok=True)\n", "\n", "            # Define the path of the output file\n", "            output_filename = f\"{nc_file['filename'].replace('.nc','')}.tif\"\n", "            output_filepath = os.path.join(subset_directory_path, output_filename)\n", "\n", "            # Create a GeoTIFF file with one band per time slice\n", "            with rasterio.open(\n", "                output_filepath,\n", "                \"w\",\n", "                driver = \"GTiff\",\n", "                dtype = str(variable_data_subset.dtype),\n", "                width = variable_data_subset.shape[2],\n", "                height = variable_data_subset.shape[1],\n", "                count = variable_data_subset.shape[0],\n", "                crs = \"EPSG:4326\",\n", "                nodata = -9999,\n", "                
transform=transform,\n", "            ) as dst:\n", "                for time_index in tqdm(range(variable_data_subset.shape[0]),\n", "                                       desc=f\"Exporting merged GeoTIFF file for {nc_file['year']}\"):\n", "                    band_data = variable_data_subset[time_index,:,:]\n", "                    band_desc = str(cftime[time_index])\n", "\n", "                    # Write each time slice as one band\n", "                    dst.write(band_data, time_index + 1)\n", "                    dst.set_band_description(time_index + 1, band_desc)\n", "\n", "        else:\n", "            # Export as individual GeoTIFF files\n", "            if output_directory:\n", "                subset_directory_path = output_directory\n", "            else:\n", "                subset_directory_path = os.path.join(geotiff_folder,\n", "                                                     f\"{selected_variable.value}-individual\",\n", "                                                     f\"{nc_file['year']}\")\n", "            os.makedirs(subset_directory_path, exist_ok=True)\n", "\n", "            for time_index in tqdm(range(variable_data_subset.shape[0]),\n", "                                   desc=\"Exporting individual GeoTIFF files\"):\n", "                # Determine the date of the current time slice\n", "                band_desc = str(cftime[time_index])\n", "\n", "                # Define the location of the output GeoTIFF file\n", "                output_filename = f\"{nc_file['filename'].replace('.nc','')}-{band_desc.replace(' ','').replace(':','-')}.tif\"\n", "                output_filepath = os.path.join(subset_directory_path, output_filename)\n", "\n", "                # Export the current time slice as a GeoTIFF\n", "                with rasterio.open(\n", "                    output_filepath,\n", "                    \"w\",\n", "                    driver=\"GTiff\",\n", "                    dtype=str(variable_data_subset.dtype),\n", "                    width=variable_data_subset.shape[2],\n", "                    height=variable_data_subset.shape[1],\n", "                    count=1,\n", "                    crs=\"EPSG:4326\",\n", "                    nodata=-9999,\n", "                    transform=transform,\n", "                ) as dst:\n", "                    band_data = variable_data_subset[time_index,:,:]\n", "\n", "                    dst.write(band_data, 1)\n", "                    dst.set_band_description(1, band_desc)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The folder is not empty. Skipping the conversion.\n" ] } ], "source": [ "if __name__ == \"__main__\":\n", "    # Export the NetCDF files as merged GeoTIFF files (one multi-band file per year)\n", "    # Set to True to convert all NetCDF files, or to False to convert only two files for testing\n", "    convert_all = False\n", "\n", "    for i, nc_file in enumerate(nc_files):\n", "        main_export_geotiff(nc_file=nc_file, bounding_box=None, merged=True)\n", "\n", "        if not convert_all and i >= 1:\n", "            print(\"Test conversion finished: 2 files converted successfully.\\nStopping the conversion.\")\n", "            break\n", "\n", "    # # Additional case: export the NetCDF files as individual GeoTIFF files\n", "    # # Note: because of the large number of time steps (in most cases 365*24 per dataset),\n", "    # # exporting individual GeoTIFF files is recommended only when needed.\n", "    # # The following code exports the first available NetCDF file, i.e. year=1950\n", "    # for nc_file in nc_files[:1]:\n", "    #     main_export_geotiff(nc_file=nc_file, bounding_box=None, merged=False)\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Analysis and visualization options" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 6.1 Define a function to create a heatmap" ] },
{ "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import cartopy.feature as cfeature\n", "import cartopy.crs as ccrs\n", "import numpy as np\n", "import cftime\n", "\n", "def main_plt_plot(\n", "    year=None,\n", "    month=None,\n", "    day=None,\n", "    hour_of_day=None,\n", "    bounding_box=None):\n", "\n", "    # Define the file path based on the selected variable and the year\n", "    filename = f\"{dataset}-{selected_variable.value}-{year}.nc\"\n", "    filepath = os.path.join(download_folder_subset, filename)\n", "\n", "    # Open the NetCDF file\n", "    with nc.Dataset(filepath, mode='r') as nc_dataset:\n", "        latitudes = nc_dataset.variables['latitude'][:]\n", "        longitudes = nc_dataset.variables['longitude'][:]\n", "\n", "        # If a bounding box [lon_min, lat_min, lon_max, lat_max] is given, filter the data accordingly\n", "        if bounding_box:\n", "            lat_indices = np.where((latitudes >= bounding_box[1]) & (latitudes <= bounding_box[3]))[0]\n", "            lon_indices = np.where((longitudes >= bounding_box[0]) & (longitudes <= bounding_box[2]))[0]\n", "\n", "            lat_subset = latitudes[lat_indices]\n", "            lon_subset = longitudes[lon_indices]\n", "        else:\n", "            lat_indices = slice(None)\n", "            lon_indices = slice(None)\n", "\n", "            lat_subset = latitudes\n", "            lon_subset = longitudes\n", "\n", "        # Convert the valid_time variable into cftime objects\n", "        time_var = nc_dataset.variables['valid_time']\n", "        time_units = time_var.units\n", "        time_calendar = getattr(time_var, \"calendar\", \"standard\")\n", "        cftime_values = nc.num2date(time_var[:], units=time_units, calendar=time_calendar)\n", "\n", "        selected_time = cftime.DatetimeProlepticGregorian(year, month, day, hour_of_day, 0, 0, 0, has_year_zero=True)\n", "        time_index = np.where(cftime_values == 
selected_time)[0]\n", "\n", "        # Extract the variable data\n", "        nc_variable_name = nc_dataset.variables.keys()\n", "        variable_name = [*nc_variable_name][5]\n", "        # Convert from Kelvin to degrees Celsius (assumes a temperature variable)\n", "        variable_data = nc_dataset[variable_name][..., lat_indices, lon_indices] - 273.15\n", "        var_units = getattr(nc_dataset.variables[variable_name], \"units\", \"N/A\")\n", "        var_longname = getattr(nc_dataset.variables[variable_name], \"long_name\", \"N/A\")\n", "\n", "    # Remove NaN values before computing percentiles\n", "    band_data_nonan = variable_data[~np.isnan(variable_data)]\n", "    vmin = np.nanpercentile(band_data_nonan, 1)\n", "    vmax = np.nanpercentile(band_data_nonan, 99)\n", "\n", "    def dynamic_round(value):\n", "        # Determine the order of magnitude of the value\n", "        order_of_magnitude = np.floor(np.log10(abs(value)))\n", "\n", "        # Use the order of magnitude to choose the rounding precision\n", "        if order_of_magnitude < -2:  # Absolute values below 0.01\n", "            return round(value, 3)\n", "        elif order_of_magnitude < -1:  # Absolute values from 0.01 up to 0.1\n", "            return round(value, 2)\n", "        elif order_of_magnitude < 0:  # Absolute values from 0.1 up to 1\n", "            return round(value, 1)\n", "        else:  # Absolute values of 1 or greater\n", "            return round(value)\n", "\n", "    # Apply dynamic rounding to vmin and vmax\n", "    vmin = dynamic_round(vmin)\n", "    vmax = dynamic_round(vmax)\n", "\n", "    bins = 10\n", "    interval = (vmax - vmin) / bins\n", "\n", "    # Create a 2D mesh grid for plotting\n", "    lon_grid, lat_grid = np.meshgrid(lon_subset, lat_subset)\n", "\n", "    # Create the figure\n", "    fig, ax = plt.subplots(\n", "        figsize=(12, 8),\n", "        facecolor='#f1f1f1',\n", "        edgecolor='k',\n", "        subplot_kw={'projection': ccrs.PlateCarree()}\n", "    )\n", "\n", "    # Add map features\n", "    ax.coastlines(edgecolor='black', linewidth=0.5)\n", "    ax.add_feature(cfeature.BORDERS, edgecolor='black', linewidth=0.5)\n", "\n", "    # Create a colormesh plot\n", "    cmap = plt.get_cmap(\"viridis\", bins)\n", "    pcm = 
ax.pcolormesh(\n", "        # Plot the time step that matches selected_time\n", "        lon_grid, lat_grid, variable_data[time_index[0], :, :],\n", "        transform=ccrs.PlateCarree(),\n", "        cmap=cmap,\n", "        shading='auto',\n", "        vmin=vmin,\n", "        vmax=vmax\n", "    )\n", "\n", "    # Fit the map extent to the data\n", "    ax.set_extent([lon_subset.min(), lon_subset.max(), lat_subset.min(), lat_subset.max()], crs=ccrs.PlateCarree())\n", "\n", "    # Add a colorbar\n", "    ticks = np.linspace(vmin, vmax, num=bins + 1)\n", "    cbar = plt.colorbar(pcm, ax=ax, orientation='vertical', pad=0.02, ticks=ticks)\n", "    cbar.set_label(f\"{var_longname}, ({variable_name})\", fontsize=12)\n", "    cbar.ax.tick_params(labelsize=12)\n", "\n", "    # Add gridlines\n", "    gl = ax.gridlines(draw_labels=True,\n", "                      crs=ccrs.PlateCarree(),\n", "                      linewidth=0.8,\n", "                      color='gray',\n", "                      alpha=0.7,\n", "                      linestyle='--')\n", "    gl.top_labels = False\n", "    gl.right_labels = False\n", "    gl.xlabel_style = {'size': 10, 'color': 'black'}\n", "    gl.ylabel_style = {'size': 10, 'color': 'black'}\n", "\n", "    # Add axis labels\n", "    fig.text(0.5, 0.0, 'Longitude', ha='center', fontsize=14)\n", "    fig.text(0.04, 0.5, 'Latitude', va='center', rotation='vertical', fontsize=14)\n", "    ax.set_aspect(\"equal\")\n", "\n", "    # Add a title\n", "    ax.set_title(f\"{var_longname} ({variable_name}), {str(selected_time)}\", fontsize=14)\n", "\n", "    # Adjust the layout and show the plot\n", "    plt.tight_layout()\n", "    plt.show()" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "c:\\Users\\ShaileshShrestha\\anaconda3\\envs\\cds_env\\lib\\site-packages\\numpy\\lib\\_function_base_impl.py:4842: UserWarning: Warning: 'partition' will ignore the 'mask' of the MaskedArray.\n", " arr.partition(\n" ] }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "if __name__ == \"__main__\":\n", " main_plt_plot(year=2020, month=8, day=15, hour_of_day=12)" ] } ], "metadata": { "kernelspec": { "display_name": "cds_env", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.16" } }, "nbformat": 4, "nbformat_minor": 2 }