M3: Data Structures — Python for Corporate Professionals | OTLMS


Lists, Tuples, Dictionaries & Sets

Real programs don’t work with one value at a time — they work with collections. A list of servers. A dictionary of ticket details. A set of unique user IDs. This module teaches you Python’s four built-in collection types and, crucially, when to reach for each one. Get this right and your code becomes dramatically easier to write and read.

5 topics · ~75 min read · Foundation level · All roles

📖

Concept — Four Ways Python Organises Data

Lists, tuples, dictionaries, sets — what each one is for and when to use it


So far you’ve stored one value per variable. That works fine for a single server name or a single ticket count. But what about a list of 200 servers? Or a record with a server’s name, IP address, CPU usage, and online status all together? You need collections — and Python gives you four excellent ones.

📋 List

Ordered, changeable, allows duplicates. Your go-to collection for any sequence of items you might need to add to, remove from, or loop over. Server names, log entries, ticket IDs.

🔒 Tuple

Ordered, unchangeable, allows duplicates. Use when the data should never change after creation. Coordinates, RGB colour values, database connection parameters.

🗂 Dictionary

Key-value pairs, ordered (Python 3.7+), changeable. Perfect for structured records — a server with its properties, a ticket with its fields, a config with its settings.

🔵 Set

Unordered, unique items only, changeable. Use when you need to eliminate duplicates or perform set operations — who’s in team A but not team B, which IPs appear in both lists.
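As a quick orientation, here is one small value written in each of the four forms. This is only a sketch: the variable names and values (`servers`, `endpoint`, `ticket`, `user_ids`) are made up for illustration.

```python
# Illustrative values only; every name here is an assumption for this sketch.
servers  = ["web-01", "db-01", "web-01"]            # list: ordered, duplicates kept
endpoint = ("10.0.1.15", 1521)                      # tuple: ordered, cannot be changed
ticket   = {"id": "TKT-1041", "priority": "high"}   # dict: values looked up by key
user_ids = {1001, 1002, 1001}                       # set: duplicates dropped automatically

print(type(servers).__name__, len(servers))     # list 3
print(type(endpoint).__name__, len(endpoint))   # tuple 2
print(type(ticket).__name__, len(ticket))       # dict 2
print(type(user_ids).__name__, len(user_ids))   # set 2
```

Note how the list keeps the repeated "web-01" while the set silently drops the repeated 1001.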

Topic 1

Lists — Your Most-Used Collection

A list is an ordered sequence of items. The items can be of any type — strings, numbers, booleans, or even other lists. You create a list with square brackets and separate items with commas.

Lists are indexed, meaning every item has a position number starting at zero. The first item is at index 0, the second at index 1, and so on. Python also supports negative indexing — index -1 always refers to the last item, -2 to the second-to-last, regardless of how long the list is.

Item	Positive index	Negative index
"web-01"	0	-5
"db-01"	1	-4
"api-01"	2	-3
"cache-01"	3	-2
"lb-01"	4	-1

Positive indices count from the start; negative indices count from the end.

Slicing lets you extract a portion of a list: servers[1:3] gives items at index 1 and 2 (the stop index is excluded). servers[:2] gives the first two items. servers[-2:] gives the last two. This notation feels strange at first but becomes second nature very quickly.

Lists come with a rich set of methods. The ones you’ll use most: .append() adds one item to the end, .extend() adds multiple items, .insert(index, item) adds at a specific position, .remove(item) deletes the first matching item, .pop() removes and returns the last item, .sort() sorts in place, and .reverse() flips the order.
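Two of these methods are easy to mix up, so a small sketch may help. It assumes a made-up `servers` list and shows the difference between .append() and .extend(), plus the fact that .sort() changes the list in place and returns None.

```python
servers = ["web-01", "db-01"]

servers.append("api-01")                 # adds one item
servers.extend(["cache-01", "lb-01"])    # adds each item from another list
print(servers)  # ['web-01', 'db-01', 'api-01', 'cache-01', 'lb-01']

# A common slip: append() with a list adds the whole list as ONE nested item.
nested = ["web-01"]
nested.append(["db-01", "api-01"])
print(nested)       # ['web-01', ['db-01', 'api-01']]
print(len(nested))  # 2

# sort() and reverse() change the list in place and return None,
# so do not assign their result back to a variable.
result = servers.sort()
print(result)   # None
print(servers)  # ['api-01', 'cache-01', 'db-01', 'lb-01', 'web-01']
```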

List comprehensions are a compact way to create a new list by transforming or filtering an existing one: [x * 2 for x in numbers] doubles every number. [s for s in servers if s.startswith("db")] filters to only database servers. They’re cleaner than writing a for loop with an append, and you’ll see them constantly in professional Python code.
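To make the comparison concrete, the sketch below writes the same filter twice, once as a loop with .append() and once as a comprehension. The `servers` list is a made-up example.

```python
servers = ["web-01", "db-01", "api-01", "db-02"]

# Loop version: create an empty list, test each item, append the matches.
db_loop = []
for s in servers:
    if s.startswith("db"):
        db_loop.append(s)

# Comprehension version: the same result in one expression.
db_comp = [s for s in servers if s.startswith("db")]

print(db_loop == db_comp)  # True
print(db_comp)             # ['db-01', 'db-02']

# Transform and filter together: upper-case names of the non-database servers.
others = [s.upper() for s in servers if not s.startswith("db")]
print(others)              # ['WEB-01', 'API-01']
```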

Topic 2

Tuples — Fixed Collections That Don’t Change

A tuple looks just like a list but uses parentheses instead of square brackets, and — crucially — once created, it cannot be changed. You can’t add items, remove items, or change any item in a tuple. This immutability is the whole point.

Use tuples for data that represents a fixed “record” or “coordinate” — things like (latitude, longitude), (host, port, database), or (255, 128, 0) for an RGB colour. If someone reads your code and sees a tuple, they immediately know: this data is not supposed to change. A list signals flexibility; a tuple signals permanence.

Tuples support indexing and slicing just like lists, and they can be unpacked — split into separate variables in one line:

Python

db_config = ("ora-prod-01", 1521, "ORCL")

# Unpack into individual variables
host, port, service = db_config
print(f"Connecting to {host}:{port}/{service}")
# Connecting to ora-prod-01:1521/ORCL
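The immutability described above can be seen directly. In this sketch, an attempt to overwrite one item raises a TypeError; `new_config` is a hypothetical name for the replacement tuple you would build instead.

```python
db_config = ("ora-prod-01", 1521, "ORCL")

try:
    db_config[1] = 1522          # tuples do not support item assignment
except TypeError as exc:
    error_message = str(exc)
    print(f"TypeError: {error_message}")

# To "change" a tuple, build a new one; the original stays as it was.
new_config = (db_config[0], 1522, db_config[2])
print(db_config)   # ('ora-prod-01', 1521, 'ORCL')
print(new_config)  # ('ora-prod-01', 1522, 'ORCL')
```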

Topic 3

Dictionaries — Structured Records with Named Fields

A dictionary stores key-value pairs. Instead of accessing items by position (like a list), you access them by name. This is enormously useful for representing structured objects — a server record, a ticket, a user profile, a configuration block.

Keys are usually strings (though they can be any immutable, hashable type). Values can be anything at all: strings, numbers, booleans, lists, even other dictionaries. You create a dictionary with curly braces: {"key": value}. You access a value with dict["key"] or the safer dict.get("key"), which returns None instead of raising an error if the key doesn’t exist.

Dictionaries are the Python equivalent of a JSON object — and since most APIs return JSON, you’ll be working with dictionaries constantly if you’re in DevOps, AI/ML, or automation work.
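A minimal sketch of that JSON connection, using the standard-library json module. The response text and its field names are invented for illustration; a real API defines its own.

```python
import json

# A made-up API response body; the field names are assumptions for illustration.
response_text = '{"name": "prod-db-01", "cpu": 74, "online": true, "tags": ["db", "prod"]}'

server = json.loads(response_text)   # JSON object -> dict
print(type(server).__name__)         # dict
print(server["name"])                # prod-db-01
print(server["online"])              # True (JSON true becomes Python True)
print(server["tags"][0])             # db   (JSON array becomes a list)

# And back again: dict -> JSON text
payload = json.dumps({"status": "ok", "count": 3})
print(payload)                       # {"status": "ok", "count": 3}
```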

💡 Use .get("key") instead of ["key"] whenever you’re not 100% sure the key exists. dict["missing_key"] raises a KeyError, which stops your script unless you catch it. dict.get("missing_key") quietly returns None. You can also provide a default: dict.get("cpu", 0) returns 0 if "cpu" isn’t present.

Topic 4

Sets — Unique Items & Membership Operations

A set is an unordered collection that automatically eliminates duplicates. Every item in a set appears exactly once, no matter how many times you add it. Sets are created with curly braces (like dicts) but with no key-value pairs — just values: {1, 2, 3}. To create an empty set you must use set() — not {}, which creates an empty dict.

Sets shine in two situations: when you need to deduplicate a list quickly, and when you need to compare two collections using set operations. Python’s set operations are clean and fast:

Operation	Python syntax	What it gives you
Union	A | B or A.union(B)	All items in A or B (or both)
Intersection	A & B or A.intersection(B)	Items that appear in both A and B
Difference	A - B or A.difference(B)	Items in A that are NOT in B
Symmetric diff	A ^ B	Items in A or B but NOT both
Subset check	A.issubset(B)	True if every item in A is also in B
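The operations in the table can be tried on two small sets. The IP addresses below are hypothetical scan results, and sorted() is used only so the printed order is predictable, since sets themselves have no defined order.

```python
# Hypothetical IP sets from two monitoring tools.
scan_a = {"10.0.1.1", "10.0.1.2", "10.0.1.3"}
scan_b = {"10.0.1.2", "10.0.1.3", "10.0.1.4"}

print(sorted(scan_a | scan_b))  # ['10.0.1.1', '10.0.1.2', '10.0.1.3', '10.0.1.4']
print(sorted(scan_a & scan_b))  # ['10.0.1.2', '10.0.1.3']
print(sorted(scan_a - scan_b))  # ['10.0.1.1']
print(sorted(scan_a ^ scan_b))  # ['10.0.1.1', '10.0.1.4']

core = {"10.0.1.2", "10.0.1.3"}
print(core.issubset(scan_a))    # True
print(scan_a.issubset(scan_b))  # False
```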

Topic 5

Nested Structures — Lists of Dicts & Dicts of Lists

The real power of Python’s data structures comes from combining them. A list of dictionaries is the most common pattern in professional Python — it’s how you represent a table of records. Each dictionary is one row; each key is a column name.

This pattern matches exactly how data arrives from databases (list of rows), REST APIs (list of JSON objects), and CSV files (list of records). Once you understand list-of-dicts, you’ll recognise it everywhere.

Think of it this way: a list of dictionaries is like a spreadsheet. The list is the spreadsheet. Each dictionary is a row. Each key is a column header. records[2]["email"] reads: the row at index 2 (the third row, because counting starts at 0), column "email". It works much like a cell reference.
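The spreadsheet analogy can be sketched with three placeholder records. The names and addresses are invented for illustration.

```python
# Hypothetical records; names and addresses are placeholders.
records = [
    {"name": "Asha",  "email": "asha@example.com"},
    {"name": "Bruno", "email": "bruno@example.com"},
    {"name": "Chen",  "email": "chen@example.com"},
]

third_email = records[2]["email"]        # index 2 is the third row
names = [r["name"] for r in records]     # one whole "column"

print(third_email)  # chen@example.com
print(names)        # ['Asha', 'Bruno', 'Chen']
```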

A dictionary of lists is also very common; it groups items by category. For example, {"production": ["web-01", "db-01"], "dev": ["dev-01", "dev-02"]} maps environments to their server lists. You’ll build these patterns constantly once you reach the file-handling and database modules.

✏️

Syntax Reference

Every operation for all four data structures, with annotations


Lists — create, access, modify, loop

Python

# Create
servers = ["web-01", "db-01", "api-01", "cache-01"]
empty   = []

# Access by index
servers[0]          # "web-01"  (first)
servers[-1]         # "cache-01" (last)
servers[1:3]        # ["db-01", "api-01"]
servers[:2]         # ["web-01", "db-01"]
servers[-2:]        # ["api-01", "cache-01"]

# Modify
servers.append("lb-01")          # add to end
servers.insert(1, "proxy-01")    # insert at index 1
servers.remove("cache-01")       # remove by value
servers.pop()                       # remove & return last item
servers.pop(0)                     # remove & return item at index 0
servers[0] = "web-prod-01"        # replace by index

# Useful operations
len(servers)                        # number of items
"db-01" in servers                 # True / False membership check
servers.sort()                      # sort alphabetically in-place
servers.sort(reverse=True)         # reverse sort
sorted(servers)                     # returns new sorted list (original unchanged)
servers.reverse()                   # reverse in-place
servers.count("db-01")             # how many times item appears
servers.index("api-01")            # index position of item

# List comprehensions
upper = [s.upper() for s in servers]
db_only = [s for s in servers if s.startswith("db")]
lengths  = [len(s) for s in servers]

Tuples — create, access, unpack

Python

# Create
db_config  = ("ora-prod-01", 1521, "ORCL")
single     = ("only-one",)        # trailing comma required for single-item tuple
coords     = 19.0760, 72.8777     # parentheses optional — still a tuple

# Access (same as lists)
db_config[0]    # "ora-prod-01"
db_config[-1]   # "ORCL"

# Unpack — clean way to split a tuple into named variables
host, port, service = db_config
lat, lon = coords

# Useful operations
len(db_config)          # 3
1521 in db_config        # True
db_config.count("ORCL") # 1
db_config.index(1521)  # 1

# Convert between list and tuple
as_list  = list(db_config)
as_tuple = tuple(as_list)

Dictionaries — create, access, modify, loop

Python

# Create
server = {
    "name":   "prod-db-01",
    "ip":     "10.0.1.15",
    "cpu":    74,
    "online": True
}
empty = {}

# Access
server["name"]               # "prod-db-01"  — raises KeyError if missing
server.get("cpu")            # 74            — returns None if missing
server.get("disk", 0)        # 0             — returns default if missing

# Add & update
server["disk"] = 88           # add new key
server["cpu"]  = 81           # update existing key
server.update({"mem": 62, "env": "production"})  # bulk update

# Delete
del server["disk"]             # remove key — raises KeyError if missing
server.pop("mem", None)       # remove & return — safe, no error if missing

# Check membership
"cpu" in server               # True — checks keys only
"cpu" in server.values()       # False — checks values

# Loop patterns
for key in server:              # iterate over keys
    print(key)

for key, value in server.items():   # iterate over key-value pairs
    print(f"{key}: {value}")

for value in server.values():   # iterate over values only
    print(value)

# Useful operations
len(server)                     # number of key-value pairs
server.keys()                   # dict_keys([...]) — all key names
server.values()                 # dict_values([...]) — all values
server.items()                  # dict_items([('name','prod-db-01'),...]) — both

Sets — create, modify, operations

Python

# Create
team_a = {"alice", "bob", "carol"}
team_b = {"carol", "dave", "eve"}
empty  = set()                   # NOT {} — that creates an empty dict

# Deduplicate a list instantly
ip_list  = ["10.0.1.1", "10.0.1.2", "10.0.1.1", "10.0.1.3"]
unique_ips = set(ip_list)        # {'10.0.1.1', '10.0.1.2', '10.0.1.3'} (display order may vary)

# Modify
team_a.add("frank")              # add one item
team_a.update(["grace", "henry"]) # add multiple items
team_a.remove("bob")             # remove — raises KeyError if missing
team_a.discard("nobody")         # remove safely — no error if missing

# Set operations
team_a | team_b                   # union — everyone in either team
team_a & team_b                   # intersection — in both teams
team_a - team_b                   # difference — in A but not B
team_a ^ team_b                   # symmetric diff — in one but not both

# Membership check (very fast — much faster than a list)
"carol" in team_a                 # True
"dave"  in team_a                 # False

Nested structures — list of dicts

Python

# The most common pattern: list of dictionaries
fleet = [
    {"name": "web-01",  "env": "prod", "cpu": 45, "online": True},
    {"name": "db-01",   "env": "prod", "cpu": 82, "online": True},
    {"name": "dev-01",  "env": "dev",  "cpu": 12, "online": False},
]

# Access a single field
fleet[0]["name"]             # "web-01"
fleet[1]["cpu"]              # 82

# Loop and filter
online_servers = [s for s in fleet if s["online"]]

# Sort by a field
by_cpu = sorted(fleet, key=lambda s: s["cpu"], reverse=True)

# Group by environment (dict of lists)
by_env = {}
for s in fleet:
    env = s["env"]
    if env not in by_env:
        by_env[env] = []
    by_env[env].append(s["name"])
# {'prod': ['web-01', 'db-01'], 'dev': ['dev-01']}
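The grouping loop above checks for the key by hand. Two standard-library alternatives are sketched below as options, not as the course's prescribed approach: dict.setdefault() and collections.defaultdict. The shortened `fleet` list is an assumption made to keep the example small.

```python
from collections import defaultdict

fleet = [
    {"name": "web-01", "env": "prod"},
    {"name": "db-01",  "env": "prod"},
    {"name": "dev-01", "env": "dev"},
]

# Option 1: setdefault() creates the empty list the first time a key is seen.
by_env = {}
for s in fleet:
    by_env.setdefault(s["env"], []).append(s["name"])

# Option 2: defaultdict(list) does the same without any explicit check.
by_env_dd = defaultdict(list)
for s in fleet:
    by_env_dd[s["env"]].append(s["name"])

print(by_env)           # {'prod': ['web-01', 'db-01'], 'dev': ['dev-01']}
print(dict(by_env_dd))  # {'prod': ['web-01', 'db-01'], 'dev': ['dev-01']}
```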

💡

Examples — Data Structures Doing Real Work

Five programs that show how each collection type earns its place


Each example uses the data structure that’s genuinely the right fit for the problem — not just the first one that comes to mind. Noticing why each type was chosen is as important as reading the code itself.

Example 1 — IT Support: Ticket queue manager

IT Support · All Roles

Uses a list to manage an IT ticket queue — adding new tickets, escalating one to the front, closing resolved ones, and printing the current queue. Lists are right here because order matters and items change constantly.

Python

# IT Support: Ticket queue as a list

queue = ["TKT-1041", "TKT-1035", "TKT-1029"]

# Add a new ticket to the back of the queue
queue.append("TKT-1055")

# Escalate a critical ticket — move it to the front
critical = "TKT-1062"
queue.insert(0, critical)

# Resolve the ticket at the front of the queue
resolved = queue.pop(0)
print(f"Resolved: {resolved}")

# Remove a ticket that was cancelled
if "TKT-1029" in queue:
    queue.remove("TKT-1029")
    print("TKT-1029 cancelled and removed")

# List comprehension: extract only high-priority IDs (ticket num > 1040)
high_priority = [t for t in queue if int(t.split("-")[1]) > 1040]

print(f"\nCurrent queue ({len(queue)} tickets):")
for i, t in enumerate(queue, 1):
    print(f"  {i}. {t}")

print(f"\nHigh priority: {high_priority}")

Output

Resolved: TKT-1062
TKT-1029 cancelled and removed

Current queue (3 tickets):
  1. TKT-1041
  2. TKT-1035
  3. TKT-1055

High priority: ['TKT-1041', 'TKT-1055']

Example 2 — Database Developer: Query result processor

Database Dev

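Represents database query results as a list of dictionaries, a shape that Python database libraries (cx_Oracle, psycopg2) can return rows in. By default these drivers return each row as a tuple; dictionary rows need an option such as psycopg2’s RealDictCursor or a cx_Oracle rowfactory. Filters, sorts, and summarises the data before display. A tuple holds the column definitions, which shouldn’t change.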

Python

# Database: Process query results (list of dicts pattern)

# Simulates rows returned from:  SELECT * FROM employees
# (all departments; the IT filter is applied in Python below)
employees = [
    {"id": 1, "name": "Priya Sharma",  "dept": "IT",      "salary": 85000, "active": True},
    {"id": 2, "name": "Rahul Mehta",  "dept": "IT",      "salary": 92000, "active": True},
    {"id": 3, "name": "Anita Patel",  "dept": "Finance", "salary": 78000, "active": True},
    {"id": 4, "name": "Suresh Kumar", "dept": "IT",      "salary": 67000, "active": False},
    {"id": 5, "name": "Meera Nair",   "dept": "Finance", "salary": 95000, "active": True},
]

# Immutable column definition (tuple — should never change)
columns = ("id", "name", "dept", "salary", "active")

# Filter: active IT employees only
it_active = [e for e in employees
             if e["dept"] == "IT" and e["active"]]

# Sort by salary descending
it_active.sort(key=lambda e: e["salary"], reverse=True)

# Summary stats
salaries  = [e["salary"] for e in it_active]
total_sal = sum(salaries)
avg_sal   = total_sal / len(salaries)

print(f"{'Name':16} {'Dept':10} {'Salary':>10}")
print("-" * 40)
for e in it_active:
    print(f"{e['name']:16} {e['dept']:10} {e['salary']:>10,}")
print("-" * 40)
print(f"{'Total':27} {total_sal:>10,}")
print(f"{'Average':27} {avg_sal:>10,.0f}")

Output

Name             Dept           Salary
----------------------------------------
Rahul Mehta      IT             92,000
Priya Sharma     IT             85,000
----------------------------------------
Total                          177,000
Average                         88,500
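The `columns` tuple in the example is defined but not otherwise used. One plausible use, sketched here under the assumption that a driver hands back plain tuples, is to pair it with each row to build the list of dictionaries. `raw_rows` is a hypothetical stand-in for what a cursor's fetchall() might return.

```python
columns = ("id", "name", "dept", "salary", "active")

# Hypothetical raw rows, shaped like the tuples a database cursor often returns.
raw_rows = [
    (1, "Priya Sharma", "IT", 85000, True),
    (2, "Rahul Mehta",  "IT", 92000, True),
]

# zip() pairs each column name with the value in the same position.
employees = [dict(zip(columns, row)) for row in raw_rows]

print(employees[0]["name"])    # Priya Sharma
print(employees[1]["salary"])  # 92000
```

Because the column order must line up with the row order, keeping `columns` as a tuple guards against accidental reordering.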

Example 3 — DevOps: Config file reader

DevOps · IT Support

Stores environment configuration in a dictionary of dictionaries, demonstrates safe key access with .get(), and generates a deployment summary from the nested structure. DevOps engineers work with config dicts constantly, from YAML files to environment variables.

Python

# DevOps: Environment config as nested dict

config = {
    "production": {
        "db_host":    "ora-prod-01.internal",
        "db_port":    1521,
        "replicas":   3,
        "ssl":        True,
        "log_level":  "WARNING",
    },
    "staging": {
        "db_host":    "ora-stg-01.internal",
        "db_port":    1521,
        "replicas":   1,
        "ssl":        True,
        "log_level":  "INFO",
    },
    "dev": {
        "db_host":    "localhost",
        "db_port":    5432,
        "replicas":   1,
        "ssl":        False,
        "log_level":  "DEBUG",
    },
}

target_env = "staging"
env_cfg    = config.get(target_env)

if not env_cfg:
    print(f"Error: environment '{target_env}' not found in config.")
else:
    print(f"=== Deployment Config: {target_env.upper()} ===")
    for key, value in env_cfg.items():
        ssl_note = " (⚠ disabled)" if key == "ssl" and not value else ""
        print(f"  {key:14}: {value}{ssl_note}")

    # Safe access for optional key
    timeout = env_cfg.get("timeout_secs", 30)
    print(f"  {'timeout_secs':14}: {timeout} (default)")

Output

=== Deployment Config: STAGING ===
  db_host       : ora-stg-01.internal
  db_port       : 1521
  replicas      : 1
  ssl           : True
  log_level     : INFO
  timeout_secs  : 30 (default)
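When both levels of a nested dictionary might be missing, .get() calls can be chained. This is a small sketch with a trimmed-down `config`; the "qa" environment and the "not configured" fallback are assumptions for illustration.

```python
config = {
    "staging": {"db_host": "ora-stg-01.internal", "db_port": 1521},
}

# config["qa"]["db_host"] would raise KeyError because "qa" is missing.
# Supplying {} as the first default keeps the second .get() safe.
qa_host     = config.get("qa", {}).get("db_host", "not configured")
stg_host    = config.get("staging", {}).get("db_host", "not configured")
stg_timeout = config.get("staging", {}).get("timeout_secs", 30)

print(qa_host)      # not configured
print(stg_host)     # ora-stg-01.internal
print(stg_timeout)  # 30
```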

Example 4 — AI / ML: Feature vocabulary builder

AI / ML

Uses sets to build a clean vocabulary from text data — deduplicating tokens, finding words unique to one corpus versus another, and identifying shared terms. Set operations are a fundamental tool in NLP preprocessing.

Python

# AI/ML: Set operations for NLP vocabulary analysis

# Two document corpora (simplified — normally thousands of docs)
corpus_support = [
    "reset password account login failed error system",
    "network timeout connection error server down reset",
    "account locked password reset request user",
]

corpus_billing = [
    "invoice payment failed account error billing",
    "refund request account credit card payment",
    "subscription renewal billing account update",
]

def build_vocab(corpus):
    words = []
    for doc in corpus:
        words.extend(doc.split())
    return set(words)          # set() removes all duplicates

vocab_support = build_vocab(corpus_support)
vocab_billing = build_vocab(corpus_billing)

# Set operations
shared_terms   = vocab_support & vocab_billing     # intersection
support_only   = vocab_support - vocab_billing     # difference
billing_only   = vocab_billing - vocab_support     # difference
all_terms      = vocab_support | vocab_billing     # union

print(f"Support vocabulary : {len(vocab_support)} unique terms")
print(f"Billing vocabulary : {len(vocab_billing)} unique terms")
print(f"Shared terms       : {sorted(shared_terms)}")
print(f"Support-only terms : {sorted(support_only)}")
print(f"Billing-only terms : {sorted(billing_only)}")
print(f"Total unique terms : {len(all_terms)}")

Output

Support vocabulary : 15 unique terms
Billing vocabulary : 13 unique terms
Shared terms       : ['account', 'error', 'failed', 'request']
Support-only terms : ['connection', 'down', 'locked', 'login', 'network', 'password', 'reset', 'server', 'system', 'timeout', 'user']
Billing-only terms : ['billing', 'card', 'credit', 'invoice', 'payment', 'refund', 'renewal', 'subscription', 'update']
Total unique terms : 24

Example 5 — Automation: Multi-environment deployment tracker

Automation · DevOps

Combines all four data structure types in one program: a tuple for immutable version info, a list to track deployment order, a dict of lists to group deployments by environment, and a set to track which services have been deployed. This is the kind of structure a release automation script would maintain.

Python

# Automation: Release tracker using all four data structures

# Tuple: immutable release metadata
release = ("v2.4.1", "2026-06-14", "hotfix")
version, date, rel_type = release

# List: ordered deployment steps
deploy_order = ["auth-service", "api-gateway", "user-service",
                "notification-service", "report-service"]

# Dict of lists: services by environment
environments = {
    "dev":        ["auth-service", "api-gateway"],
    "staging":    ["auth-service", "api-gateway", "user-service"],
    "production": ["auth-service", "api-gateway", "user-service",
                    "notification-service", "report-service"],
}

# Set: services already deployed (updated as we go)
deployed = set()

print(f"Release {version} ({rel_type}) — {date}\n")

for env, services in environments.items():
    print(f"[{env.upper()}]")
    for svc in deploy_order:
        if svc not in services:
            continue                      # skip services not in this env
        if svc in deployed:
            print(f"  ↩ {svc} (already deployed)")
        else:
            print(f"  ✓ Deploying {svc}...")
            deployed.add(svc)
    print()

pending = set(deploy_order) - deployed
print(f"Deployed : {len(deployed)}/{len(deploy_order)} services")
if pending:
    print(f"Pending  : {sorted(pending)}")

Output

Release v2.4.1 (hotfix) — 2026-06-14

[DEV]
  ✓ Deploying auth-service...
  ✓ Deploying api-gateway...

[STAGING]
  ↩ auth-service (already deployed)
  ↩ api-gateway (already deployed)
  ✓ Deploying user-service...

[PRODUCTION]
  ↩ auth-service (already deployed)
  ↩ api-gateway (already deployed)
  ↩ user-service (already deployed)
  ✓ Deploying notification-service...
  ✓ Deploying report-service...

Deployed : 5/5 services

🏋️

Practice Exercises

Four problems — one for each data structure type


Each exercise is focused on one specific structure. This is intentional — you need to build intuition for each type individually before you start mixing them. Aim to complete each one without looking at the Syntax Reference first; consult it only if you get stuck.

List manipulation (Lists). Start with this list of IP addresses: ["10.0.1.1", "10.0.1.2", "10.0.1.3", "10.0.1.4", "10.0.1.5"]. Write code to: (a) add two more IPs to the end, (b) insert "10.0.0.1" as the first item, (c) remove "10.0.1.3", (d) sort the list, (e) use a list comprehension to create a new list containing only IPs that start with "10.0.1". Print the list after each step.

For step (b) use .insert(0, "10.0.0.1"); index 0 puts it at the start. For step (c) use .remove("10.0.1.3"). For the list comprehension: [ip for ip in ips if ip.startswith("10.0.1")]. Remember to print after each operation so you can see the state changing.

Server record (Dictionary). Create a dictionary representing a single server with at least 6 keys: name, environment, IP, CPU percentage, memory percentage, and online status. Then: (a) print each key-value pair using a .items() loop with f-string formatting, (b) add a “disk” key, (c) safely retrieve a “last_reboot” key that doesn’t exist using .get() with a default, (d) check whether CPU is above 80 and print an appropriate status message.

For the loop: for key, value in server.items(): print(f"{key}: {value}"). Adding a key is just assignment: server["disk"] = 55. For safe retrieval: server.get("last_reboot", "Unknown"). For the CPU check: if server["cpu"] > 80: ….

Duplicate detector (Sets). You receive two lists of user login IDs from two different authentication systems that should match: sys_a = [1001, 1002, 1003, 1004, 1001, 1002] and sys_b = [1002, 1003, 1005, 1006]. Using sets, find: (a) how many unique IDs are in each system, (b) which IDs exist in both systems, (c) which IDs are only in system A, (d) which IDs are only in system B (new users to provision), (e) the total number of unique IDs across both systems.

Convert each list to a set first: set_a = set(sys_a). Then: intersection set_a & set_b, difference set_a - set_b, union set_a | set_b. The len() of a set tells you how many unique items it contains.

Fleet grouper (Nested structures). You have a list of 6 servers, each as a dictionary with name, env (“production”, “staging”, or “dev”), and cpu. Write code that: (a) groups all server names into a dictionary keyed by environment (dict of lists), (b) calculates the average CPU per environment, (c) prints a summary table showing each environment, its server count, and its average CPU.

For grouping: loop through servers, check if s["env"] not in groups: groups[s["env"]] = [], then groups[s["env"]].append(s["name"]). For average CPU per env: loop through the original list again, accumulate CPU values per env into another dict, then divide by count. Or do both in one pass using two dicts.

📋

Assignment — M3

An inventory management system — estimated 40–50 minutes


📋 IT Asset Inventory System

Your company needs a script to manage its IT asset inventory. Build a Python script called asset_inventory.py that uses all four data structure types appropriately.

Assets list: create a list of at least 8 asset dictionaries. Each asset should have asset_id (str), type (str, one of "laptop", "server", "network", or "phone"), owner (str, an employee name or "unassigned"), location (str, a city name), purchase_year (int), and value_inr (int).
Immutable config tuple: create a tuple called asset_meta holding the current inventory date, the company name, and the currency code ("INR"). Unpack it into three variables and use them in your output headers.
Group by type: build a dictionary where each key is an asset type and the value is a list of asset IDs of that type. Print this grouped view.
Unique locations: use a set to find all unique office locations in the inventory. Print the count and the sorted list of locations.
Financial summary: calculate and print the total asset value, the average value per asset, the most expensive asset (asset ID and value), and the total value grouped by asset type.
Ageing report: assets older than 4 years (purchase_year < 2022) should be flagged for review. Use a list comprehension to extract them and print a "Flagged for replacement" list.

✅ Paste your script and output in the comments. If your work covers all six requirements (asset list, metadata tuple, grouped view, unique locations, financial summary, and ageing report), you’re ready for M4: Functions & Modules.