Connect Kafka on WSL2 with PyCharm on Windows

WSL2 with Kafka and PyCharm - Solving the "broker not found" error between Kafka on WSL2 and a PyCharm client on Windows

*** This post is technical and about a problem I faced with Kafka on WSL2 and PyCharm***

This guide is for people who are having difficulty using Kafka on WSL2 with PyCharm for Python clients.

Hello and welcome to this small blog on Kafka with WSL2 and PyCharm. 

The intention of this blog is to give a basic understanding of a WSL2 networking feature and how it prevents IDEs like PyCharm and other JetBrains products from accessing servers running on WSL2.

If you have already come across this issue and have found a solution different from the one given here, please feel free to correct me.

  • Okay, so many of us Linux fellas who are obliged to work on Windows would definitely love the WSL (Windows Subsystem for Linux) feature introduced by Microsoft for Windows 10. This feature runs a Linux environment on our Windows machine without the overhead of a virtual machine like VirtualBox or VMware. (WSL Microsoft Documentation)
  • But it has its own set of compromises in terms of memory, CPU and networking.
  • However, on a system with at least 16GB of RAM, the CPU and memory constraints won't cause much trouble for normal application development.
  • To further improve the performance of WSL when running Linux servers alongside heavy IDEs like PyCharm or other JetBrains products, Microsoft has come up with a new version of WSL labelled WSL2.

The approach taken to construct the Linux environment in WSL2 is very different from the one used in WSL1.

WSL2 is closer to how an actual VM works: it relies on hardware virtualization support (enabled in the BIOS), Hyper-V and other prerequisites that are usually found in VM requirements.

Hence it would be fair to say that WSL2 is more of a Microsoft VM for Linux machines than a sub-space file system like WSL1.

Although WSL2 needs more resources to run, it gives users a chance to manually cap resources like CPU cores, memory and swap space by setting limits in the .wslconfig file.

[Link to WSL config]
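As a rough illustration of such limits, here is a minimal Python sketch that renders the text of a `.wslconfig` file. The `[wsl2]` section with its `memory`, `processors` and `swap` keys comes from Microsoft's WSL configuration docs; the function name `render_wslconfig` and the default values are just assumptions for this example, so pick numbers that suit your machine.

```python
import configparser
import io


def render_wslconfig(memory="8GB", processors=4, swap="2GB"):
    """Return the text of a .wslconfig file that caps WSL2 resources.

    The default values are illustrative only, not recommendations.
    """
    config = configparser.ConfigParser()
    config["wsl2"] = {
        "memory": memory,               # upper limit on RAM for the WSL2 VM
        "processors": str(processors),  # number of logical CPU cores
        "swap": swap,                   # size of the swap space
    }
    buffer = io.StringIO()
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


print(render_wslconfig())
```

The resulting text would go into `.wslconfig` in your Windows user profile folder, and WSL usually needs to be restarted (for example with `wsl --shutdown`) before the new limits apply.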


There are certain perks to WSL2 that were not found in WSL1. As listed by Microsoft, they are:

  •  increased file system performance,
  •  full system call compatibility.


I will not go too deep into what they mean for us, as it would deviate from the purpose of this article. However, do note that the features of WSL2 are not all enhancements over WSL1. WSL2 is just a slightly different take on integrating a Linux environment into Windows, with some perks over WSL1.

Now, we saw some perks of WSL2 above. But everything is not cherry atop cream when it comes to WSL. People who already know and use WSL will find that IO, serial connections and networking configurations have changed drastically between v1 and v2 of WSL.

WSL1 has better support for serial connections like USB (WSL2 doesn't yet support serial) and faster IO times than WSL2 when working on files stored on the Windows file system. Slow release of memory also seems to be an active drawback of using WSL2.

Now, coming to networking and how WSL2 handles it, I would say there is a huge difference in how WSL and Windows make use of the network setup.

According to Microsoft's WSL documentation, any application running on WSL2 with open ports can be accessed from Windows using `localhost:<port-number>`. However, the reverse is not possible, as communication from WSL2 to Windows is not treated as happening on the same system. In order to access Windows servers from WSL2, you need to use the addresses of the virtual network that is set up between Windows and the WSL instance. It won't be the regular localhost but a unique IP address on a *virtual LAN*, much like how a VM connects to its host machine (a virtual LAN or a network bridge). Also, common IP tools like ping (ICMP) will fail when we try to ping Windows from WSL.


[WSL IPCONFIG](ip_addr_wsl.png)


[WINDOWS IPCONFIG](windows_ps_ipconfig.png)
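The WSL2 address seen in output like this can also be read from a script. Below is a minimal sketch, assuming Python runs on the Windows side and that `wsl hostname -I` prints the addresses of your default distribution. The helper names `first_ipv4` and `wsl2_ip` are made up for this example.

```python
import ipaddress
import subprocess


def first_ipv4(hostname_output):
    """Return the first IPv4 address found in `hostname -I` style output."""
    for token in hostname_output.split():
        try:
            address = ipaddress.ip_address(token)
        except ValueError:
            continue  # skip anything that is not an IP address
        if address.version == 4:
            return str(address)
    return None


def wsl2_ip():
    """Ask the default WSL distribution for its addresses (run on Windows)."""
    result = subprocess.run(
        ["wsl", "hostname", "-I"],
        capture_output=True, text=True, check=True,
    )
    return first_ipv4(result.stdout)
```

Calling `wsl2_ip()` from a Windows Python interpreter should return something like the `172.x.x.x` address shown by `ip addr` inside WSL. Keep in mind that this address typically changes when WSL is restarted, so it is worth reading it again instead of hard-coding it.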

Now, coming to the other part of the article, you all might know Apache Kafka. (If not, I recommend you go check it out once. It is an amazing piece of software for efficient data/event streaming.)

The rest of the blog is focused on a specific issue of accessing Kafka brokers on WSL2 from PyCharm on Windows.


As we saw above, the network in WSL2 is completely different from how it was in WSL1. In WSL1, even if the server is running on WSL and PyCharm is on Windows, using `localhost` is sufficient to access services running anywhere on the system.

But in WSL2, with the network being in a separate WSL domain, clients (producer/consumer/admin) running on Windows are unable to connect to Kafka on WSL2 with `localhost:9092` or `localhost:2181` (the default Kafka broker and ZooKeeper ports).
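One way to check from the Windows side whether a port is reachable at all is a plain TCP probe. This is only a sketch using Python's standard `socket` module. The function name `can_connect` is hypothetical, and `172.23.231.153` in the comment is the example WSL2 address used later in this post.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example calls (results depend entirely on your own setup):
#   can_connect("localhost", 9092)
#   can_connect("172.23.231.153", 9092)
```

Note that a successful TCP connection is only the first step for a Kafka client. After connecting, the client asks the broker for metadata and then talks to whatever address the broker hands back, so a probe like this can pass while the client still fails.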

Now, in order to fix this issue, we need to understand a topic in Kafka called listeners.

Listeners are a configuration parameter of the Kafka server that defines how anything communicates with the broker.

There are two types of configurations for listeners:

  • Listeners (`listeners`): the addresses and ports the broker binds to and accepts connections on
  • Advertised Listeners (`advertised.listeners`): the addresses the broker hands out to clients (and other brokers) so they know where to connect


Now, in a single-node setup, `listeners` can be just localhost:9092, as the broker and ZooKeeper exist on the same system and the broker only needs to accept connections from that system.


But our focus here should be on the `advertised.listeners` config, which dictates the address that clients should connect to in order to communicate with the Kafka server.

        [Image of server.properties socket config]
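To see which address a broker is currently telling clients to use, you can read the `advertised.listeners` line out of `server.properties`. Here is a small sketch that parses the file's text. The function name `advertised_endpoints` is an assumption for this example, and it only handles the plain `PROTOCOL://host:port` form.

```python
from urllib.parse import urlsplit


def advertised_endpoints(properties_text):
    """Return (protocol, host, port) tuples from advertised.listeners."""
    endpoints = []
    for line in properties_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and lines that are not key=value
        key, value = line.split("=", 1)
        if key.strip() != "advertised.listeners":
            continue
        for listener in value.split(","):
            raw = listener.strip()
            if not raw:
                continue
            parts = urlsplit(raw)  # e.g. PLAINTEXT://localhost:9092
            protocol = raw.split("://", 1)[0]
            endpoints.append((protocol, parts.hostname, parts.port))
    return endpoints
```

For a file containing `advertised.listeners=PLAINTEXT://localhost:9092`, this returns one tuple with the protocol, host and port, which is the address every client will be pointed to after its first contact with the broker.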


Now, in WSL1, even if we leave both the listener and the advertised listener as localhost:9092, it will work and the clients will be able to connect.


But in WSL2, my clients on PyCharm (Windows) weren't able to connect to Kafka brokers with localhost:9092.

You will see a Kafka error saying that the client cannot connect to the broker on the IPv4 address [ip address: port], with the reason given as unknown. Know that there might be other reasons for your errors too, but if the scenario matches and you get this error, this might be one possible fix.

So after a lot of research and understanding from the docs (I was eager to start working early instead of knowing completely about Kafka configs [*facepalm*]), I deduced that even with a single-node system on WSL2, we cannot give localhost:9092 as `advertised.listeners`, because the broker on WSL2 is not communicating with a client on its own system but with a client on a different network. Also, Windows doesn't find any Kafka broker on its own localhost:9092, so the client reports an error while connecting. Instead, you have to give your WSL2 IP address (in my case: `172.23.231.153`) as the advertised listener address and use the same address in your client's bootstrap-server configuration to connect to the Kafka server.
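As a sketch of what the changed lines in `server.properties` could look like, the helper below builds them from a WSL2 address. Two things here are my assumptions and not taken from the setup above: the `PLAINTEXT` protocol (the default in a fresh Kafka install) and binding `listeners` to `0.0.0.0` so the broker accepts connections on every interface. The function name `listener_settings` is hypothetical.

```python
import ipaddress


def listener_settings(wsl_ip, port=9092):
    """Return server.properties lines that advertise the WSL2 address."""
    ipaddress.ip_address(wsl_ip)  # raises ValueError if this is not an IP
    return "\n".join([
        # Assumption: bind on all interfaces inside WSL2.
        f"listeners=PLAINTEXT://0.0.0.0:{port}",
        # The address handed to clients; must be reachable from Windows.
        f"advertised.listeners=PLAINTEXT://{wsl_ip}:{port}",
    ])


print(listener_settings("172.23.231.153"))
```

The broker needs a restart after editing the file. Since the WSL2 address typically changes when WSL restarts, these lines may need updating after a reboot.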

This tells both the Kafka broker and your client that any client should connect to the broker on the specified advertised listener address.
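On the client side, the same address goes into the bootstrap servers setting. The sketch below assumes the third-party `kafka-python` package (`pip install kafka-python`), which may not be the client library you use. The topic name `test-topic` and the helper names are made up for illustration.

```python
def client_config(wsl_ip, port=9092):
    """Build the connection settings shared by producers and consumers."""
    return {"bootstrap_servers": f"{wsl_ip}:{port}"}


def send_test_message(wsl_ip, topic="test-topic"):
    """Send one message to Kafka on WSL2 from a client on Windows."""
    # Imported here so the config helper works without kafka-python installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(**client_config(wsl_ip))
    producer.send(topic, b"hello from Windows")
    producer.flush()  # block until the message has actually been sent
    producer.close()
```

Running `send_test_message("172.23.231.153")` from PyCharm on Windows should then reach the broker, provided the broker advertises that same address.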

This particular approach is generally what Kafka administrators do when setting up a multi-node Kafka cluster in the cloud. They use either a domain name or an internal (static) IP address when setting up the cluster, so that changes in IP don't affect the running Kafka servers, because in production the machines and servers will literally be on different physical systems.

So to conclude, here we have to forget that Linux servers can be accessed from Windows using localhost, assume that we are configuring the server and the client on different systems, and use the appropriate IP addresses to communicate.

Regards,

- Sachmo

P.S: I wrote this guide because I found it useful. I didn't find any clear solution to this problem on websites like the Confluent docs or GitHub. Hence I wanted to create one. If you don't find it useful, please give constructive criticism so that I can improve the quality or content.




 

 
