
If you scrape public web data with Python, this guide shows you how to set up proxies with BeautifulSoup. BeautifulSoup only parses HTML, so the proxy is configured in requests, which fetches the page. You'll learn the differences between HTTP/HTTPS and SOCKS5 in requests, how to authenticate and rotate IPs, how to verify connectivity with httpbin, and how to choose the right Oculus proxy type for your workload. We'll also share a short testing plan, compliance notes, and troubleshooting tips so you can collect data reliably and responsibly. Reminder: proxies forward traffic; encryption comes from HTTPS/TLS.
Install BeautifulSoup and requests:

```bash
pip install beautifulsoup4 requests
```

For SOCKS5 support with requests:

```bash
pip install "requests[socks]"
```

Find your Host, Port, Username, and Password in your Oculus Dashboard:
https://oculusproxies.com/dashboard/page/plans
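Credentials copied from a dashboard can contain characters such as `@`, `:`, or `/`, which would be misread as URL delimiters once placed inside a proxy URL. The sketch below percent-encodes them first. The helper name `build_proxy_url` and the sample values are hypothetical, and it assumes your HTTP client decodes the credentials before sending them, which requests is expected to do.

```python
from urllib.parse import quote


def build_proxy_url(user, password, host, port, scheme="http"):
    """Return a proxy URL whose username and password are percent-encoded."""
    # safe="" makes quote() encode "/", ":" and "@" as well, since any of
    # them inside the credentials would break URL parsing.
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"{scheme}://{safe_user}:{safe_password}@{host}:{port}"


# Hypothetical values, for illustration only.
example_url = build_proxy_url("user", "p@ss:word/1", "proxy.example", 8000)
print(example_url)
```

The resulting string can be used for both the `http` and `https` keys of the `proxies` dictionary.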
A basic request through an authenticated HTTP proxy:

```python
import os

import requests
from bs4 import BeautifulSoup

# Read credentials from the environment; the bracketed values are placeholders.
HOST = os.getenv("OCULUS_HOST", "[HOST]")
PORT = os.getenv("OCULUS_PORT", "[PORT]")
USER = os.getenv("OCULUS_USER", "[USERNAME]")
PASS = os.getenv("OCULUS_PASS", "[PASSWORD]")

# The proxy URL uses http:// for both keys; HTTPS requests are tunneled through it.
proxies = {
    "http": f"http://{USER}:{PASS}@{HOST}:{PORT}",
    "https": f"http://{USER}:{PASS}@{HOST}:{PORT}",
}

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}

# 1) Verify the proxy IP
ip_resp = requests.get("https://httpbin.org/ip", proxies=proxies, headers=headers, timeout=15)
ip_resp.raise_for_status()
print("Proxy exit IP:", ip_resp.text)

# 2) Fetch and parse a page
resp = requests.get("https://example.org/", proxies=proxies, headers=headers, timeout=15)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")
print("Title:", soup.title.text if soup.title else "No title")
```

For repeated requests, use a session with retries and backoff:

```python
import time
import random
import logging

import requests
from bs4 import BeautifulSoup
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")


def build_session(proxies, headers):
    """Create a session that retries transient failures through the given proxies."""
    s = requests.Session()
    retry_cfg = Retry(
        total=3,
        backoff_factor=0.5,  # exponential backoff between retries
        status_forcelist=[403, 407, 408, 429, 500, 502, 503, 504],
        allowed_methods=["GET", "HEAD", "OPTIONS"],
    )
    s.mount("http://", HTTPAdapter(max_retries=retry_cfg))
    s.mount("https://", HTTPAdapter(max_retries=retry_cfg))
    s.proxies.update(proxies)
    s.headers.update(headers)
    return s


def fetch(url, session, timeout=20):
    """Return the response, or None if the request fails after retries."""
    try:
        r = session.get(url, timeout=timeout)
        r.raise_for_status()
        return r
    except requests.exceptions.RequestException as e:
        logging.warning("Fetch error for %s: %s", url, e)
        return None


def make_proxies(user, passwd, host, port, scheme="http"):
    """Build the proxies dict that requests expects."""
    return {
        "http": f"{scheme}://{user}:{passwd}@{host}:{port}",
        "https": f"{scheme}://{user}:{passwd}@{host}:{port}",
    }


HOST = "[HOST]"
PORT = "[PORT]"
USER = "[USERNAME]"
PASS = "[PASSWORD]"

proxies_http = make_proxies(USER, PASS, HOST, PORT, scheme="http")
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
}

session = build_session(proxies_http, headers)

r = fetch("https://httpbin.org/ip", session)
if r:
    logging.info("Exit IP: %s", r.text)

target = "https://example.org"
r = fetch(target, session)
if r:
    soup = BeautifulSoup(r.text, "html.parser")
    logging.info("Title: %s", soup.title.text if soup.title else "No title")
```

SOCKS5 works with the same session helper once the extra is installed:

```python
# pip install "requests[socks]"
# Use socks5h to ensure DNS resolution happens through the proxy.
proxies_socks = {
    "http": f"socks5h://{USER}:{PASS}@{HOST}:{PORT}",
    "https": f"socks5h://{USER}:{PASS}@{HOST}:{PORT}",
}

session = build_session(proxies_socks, headers)
print(session.get("https://httpbin.org/ip", timeout=15).text)
```

Many providers offer rotating endpoints or session parameters in the username.
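When a plan instead exposes several fixed endpoints, rotation can also happen on the client side. The following is a minimal sketch of round-robin rotation, assuming a list of ready-made proxy URLs; the function name `proxy_rotator` and the example endpoints are hypothetical and not part of any provider's API.

```python
from itertools import cycle


def proxy_rotator(proxy_urls):
    """Return an endless iterator of requests-style proxies dicts, round-robin."""
    urls = list(proxy_urls)
    if not urls:
        raise ValueError("proxy_urls must not be empty")
    # cycle() repeats the list forever, so every next() yields the next endpoint.
    return ({"http": url, "https": url} for url in cycle(urls))


# Hypothetical endpoints, for illustration only.
pool = [
    "http://user:pass@proxy-a.example:8000",
    "http://user:pass@proxy-b.example:8000",
]
rotator = proxy_rotator(pool)
first_proxies = next(rotator)   # uses proxy-a
second_proxies = next(rotator)  # uses proxy-b
```

Each dictionary can be passed as `proxies=` to a single request or used to build a new session.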
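If your plan supports session parameters, build a fresh session per identity:

```python
# Continues the script above: reuses build_session, fetch, headers, and the credentials.
def new_session_with_identity(seed):
    # Plain username by default; use f"{USER}-sess-{seed}" if supported by your plan.
    user = f"{USER}"
    proxy = f"http://{user}:{PASS}@{HOST}:{PORT}"
    return build_session({"http": proxy, "https": proxy}, headers)


for i in range(3):
    s = new_session_with_identity(random.randint(1, 1_000_000))
    r = fetch("https://httpbin.org/ip", s)
    if r:
        print(i, "exit:", r.json())
    # Pause briefly between identities.
    time.sleep(random.uniform(1.0, 2.0))
```

The table below compares several proxy providers. Always confirm details on each provider's official site.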
| Provider | Network Types | Geo Targeting | Protocols | Compliance | Pricing Model | Best For |
|---|---|---|---|---|---|---|
| Oculus Proxies | Residential, ISP, Datacenter | Country, City, State, ASN, ZIP | HTTP/S, SOCKS5 | ToS/KYC + Acceptable Use | Usage‑based & monthly tiers — Datacenter from $0.10/GB, Residential from $0.80/GB | macOS setup simplicity; mixed workloads needing region flexibility |
| Bright Data | Residential, ISP, Datacenter, Mobile | Country, City, State, ASN, ZIP | HTTP/S, SOCKS5 | Compliance program | Usage‑based & monthly tiers — Datacenter from $0.90/GB, Residential from $2.50/GB | Enterprise-scale targeting and datasets |
| ASocks | Residential, Mobile | Country | HTTP/S, SOCKS5 | ToS/AUP | Pay‑as‑you‑go, No datacenter — Residential from $0.75/IP | Budget-friendly residential/mobile with simple setup |
| SOAX | Residential, ISP, Datacenter, Mobile | Country, City | HTTP/S, SOCKS5 | ToS/compliance | Usage‑based & monthly tiers — Datacenter from $0.40/GB, Residential from $2.00/GB | Precise geo targeting with broad network mix |
| FloppyData | Residential, ISP, Datacenter, Mobile | Country, City | HTTP/S, SOCKS5 | ToS/AUP | Usage‑based & monthly tiers — Datacenter from $0.60/GB, Residential from $1.00/GB | Low per‑GB rates and quick start across proxy types |
Notes: Specs and pricing are publicly stated by each provider and may change. Checked: January 2026.
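To put the per-GB rates in context, a rough bandwidth estimate helps. The sketch below uses the Oculus starting rates from the table; the workload (100,000 pages at 500 KB each) and the function name `estimate_cost_usd` are assumptions for illustration. Real traffic also includes headers, retries, and failed requests, so treat the result as a lower bound.

```python
def estimate_cost_usd(pages, avg_page_kb, price_per_gb):
    """Estimate bandwidth cost using decimal units (1 GB = 1,000,000 KB)."""
    gigabytes = pages * avg_page_kb / 1_000_000
    return round(gigabytes * price_per_gb, 2)


# Hypothetical workload: 100,000 pages averaging 500 KB, which is 50 GB.
pages, avg_page_kb = 100_000, 500

# Starting rates taken from the table above; actual plan pricing may differ.
for label, price_per_gb in [("datacenter", 0.10), ("residential", 0.80)]:
    print(label, estimate_cost_usd(pages, avg_page_kb, price_per_gb))
```

For this workload, 50 GB works out to roughly $5 at the datacenter rate and $40 at the residential rate.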
Setting up proxies with BeautifulSoup is straightforward: use requests.Session with retries, verify with httpbin, and match proxy type to target difficulty—datacenter for speed, residential/ISP for tougher sites.
Test 2–3 providers for 7–14 days, comparing success rates, TTFB, and bans on identical workloads. Always respect site terms, robots.txt, and provider policies. Choose Oculus Proxies for granular geo targeting, flexible sessions, and transparent pricing across all network types.
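The provider comparison described above can be sketched as a small harness that runs the same URL list through each provider and reports the success rate, block count, and median latency. This is an illustration under stated assumptions: `benchmark`, `fetch_status`, and `BLOCK_CODES` are hypothetical names, the harness treats 403 and 429 as blocks, and it measures total response time rather than true TTFB.

```python
import time
from statistics import median

# Assumption: these status codes are treated as a block or ban.
BLOCK_CODES = {403, 429}


def benchmark(fetch_status, urls):
    """Call fetch_status(url) for each URL and summarize the outcomes.

    fetch_status is a caller-supplied callable that returns an HTTP status
    code; any exception it raises is counted as a failed request.
    """
    latencies = []
    ok = blocked = total = 0
    for url in urls:
        total += 1
        start = time.perf_counter()
        try:
            status = fetch_status(url)
        except Exception:
            continue  # network or proxy error: neither ok nor blocked
        elapsed = time.perf_counter() - start
        if status in BLOCK_CODES:
            blocked += 1
        elif 200 <= status < 400:
            ok += 1
            latencies.append(elapsed)
    return {
        "requests": total,
        "success_rate": ok / total if total else 0.0,
        "blocked": blocked,
        "median_latency_s": median(latencies) if latencies else None,
    }
```

With requests, `fetch_status` could be something like `lambda url: session.get(url, timeout=20).status_code`, using one session per provider. A session without status-based retries gives cleaner block counts, because exhausted retries surface as exceptions rather than status codes.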