An entrance is a portal to the inside, exposing the character of a place as you gain access to its inner world. A seaport's entrance is wide and obtuse, while a gate is cut at two right angles. Other entrances, the uncommon ones, are narrower.
Narrow entrances lead deep into caves, into closets and attics, where the room is empty of space. A close brush with the edge gives us claustrophobia, as we realize there is really only one way out. In the structures of rocks and attics lie our most sacred fears. An ancient temple is known to be haunted, and so are our attics and closets.
This is our primal fear, because our instincts are, in essence, a rodent's. But what other instincts drive us? It's true, we are very unnatural in this way.
We have an instinct to dominate: we drive slaves. Our instincts compel us toward freedom: we ride horses. Our instincts crave beauty and art, and, finally, social health.
How do we fulfill these urges? Technology.
And we'll never run out of this asset while we make duplicates of our reality. We make duplicates of products, we make duplicates of files. We multiply because we fear the absence of space.
***
import requests
import time
import pickle
import os
import random
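
# What follows is a small breadth-first crawler: it resumes from visited.pkl (or starts
# at the Main Page), seeds its queue with article links from pages it has already seen,
# saves each article's plain text under text/, and checkpoints its progress after every fetch.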


def fetch_wikipedia_content(title):
    # Ask the MediaWiki API for the plain-text extract of a single article.
    url = "https://en.wikipedia.org/w/api.php"
    params = {"action": "query", "format": "json", "prop": "extracts", "explaintext": True, "titles": title}
    response = requests.get(url, params=params)
    pages = response.json().get("query", {}).get("pages", {})
    # Page id "-1" means the title does not exist; otherwise return its extract.
    return next((page_info.get("extract", "") for page_id, page_info in pages.items() if page_id != "-1"), None)


def fetch_links_on_page(title):
    # Ask the MediaWiki API for the article-namespace links on a page.
    url = "https://en.wikipedia.org/w/api.php"
    params = {"action": "query", "format": "json", "prop": "links", "titles": title, "plnamespace": 0, "pllimit": "max"}
    response = requests.get(url, params=params)
    links = response.json().get("query", {}).get("pages", {}).values()
    return [link["title"] for page_info in links if "links" in page_info for link in page_info["links"]]


def crawl_wikipedia(current_title):
    # Fetch one article and save its text under text/, skipping titles already on disk.
    print(f"Fetching content for: {current_title}")
    content = fetch_wikipedia_content(current_title)
    if content:
        os.makedirs("text", exist_ok=True)
        file_name = f"text/{current_title.replace('/', '_')}.txt"
        if not os.path.exists(file_name):
            with open(file_name, "w", encoding="utf-8") as file:
                file.write(content)
            print(f"Saved: {file_name}")
        else:
            print("File exists, skipping.")


def main():
    visited_file = "visited.pkl"
    # Resume from the pickled list of visited titles, or start from the Main Page.
    if os.path.exists(visited_file):
        with open(visited_file, "rb") as file:
            visited = pickle.load(file)
    else:
        visited = ["Main Page"]
    # Seed the queue with links taken from a random sample of already-visited pages.
    to_crawl = [link for title in random.sample(visited, min(60, len(visited)))
                for link in fetch_links_on_page(title) if link not in visited]
    print("Starting Wikipedia crawler... Press Ctrl+C to stop.")
    try:
        while to_crawl:
            current_title = to_crawl.pop(0)
            visited.append(current_title)
            crawl_wikipedia(current_title)
            # Queue any new links from this page, then checkpoint and pause politely.
            to_crawl.extend(
                link for link in fetch_links_on_page(current_title)
                if link not in visited and link not in to_crawl)
            with open(visited_file, "wb") as file:
                pickle.dump(visited, file)
            time.sleep(1)
    except KeyboardInterrupt:
        # On Ctrl+C, save progress one last time before exiting.
        with open(visited_file, "wb") as file:
            pickle.dump(visited, file)
        print("Crawling stopped. Data saved.")


if __name__ == "__main__":
    main()
***