Hi all,
I'm interested in getting involved the Python API development. The first use-case I've encountered in my work is that of zipWithIndex, so I started looking into how to go about implementing that. It looks like the core of it involves being able to uniquely identify what worker you're currently in between distributed calls; the Scala end has getRuntimeContext().getIndexOfThisSubtask(), but Python's runtime context is more or less limited to the broadcast variables. Happy to hear any hints as to how I should get started with this. Thanks. Regards, Shannon |
The subtaskIndex is not currently exposed to the python operator.
Fortunately this can be changed very easily: On the java side, within PythonStreamer.startPython() the python process is started and several parameters are transferred (L.129++) using stdin/-out. These parameters are received on the python side in Environment.execute() (L.168++). So the transfer is rather straight-forward, after that you only have to modify the operator.configure() method to also take a subtaskIndex argument, modify the RuntimeContext constructor, add a getIndexOfThisSubtask() method and you're set. Feel free to open a JIRA for this. On 11.03.2016 18:15, Shannon Quinn wrote: > Hi all, > > I'm interested in getting involved the Python API development. The > first use-case I've encountered in my work is that of zipWithIndex, so > I started looking into how to go about implementing that. It looks > like the core of it involves being able to uniquely identify what > worker you're currently in between distributed calls; the Scala end > has getRuntimeContext().getIndexOfThisSubtask(), but Python's runtime > context is more or less limited to the broadcast variables. > > Happy to hear any hints as to how I should get started with this. Thanks. > > Regards, > Shannon > |
Hi Shannon,
I'm happy to see some community engagement on our Python APIs! On Fri, Mar 11, 2016 at 6:32 PM, Chesnay Schepler <[hidden email]> wrote: > The subtaskIndex is not currently exposed to the python operator. > > Fortunately this can be changed very easily: > On the java side, within PythonStreamer.startPython() the python process > is started and several parameters are transferred (L.129++) using > stdin/-out. > These parameters are received on the python side in Environment.execute() > (L.168++). > > So the transfer is rather straight-forward, after that you only have to > modify the operator.configure() method to > also take a subtaskIndex argument, modify the RuntimeContext constructor, > add a getIndexOfThisSubtask() method and you're set. > > Feel free to open a JIRA for this. > > > On 11.03.2016 18:15, Shannon Quinn wrote: > >> Hi all, >> >> I'm interested in getting involved the Python API development. The first >> use-case I've encountered in my work is that of zipWithIndex, so I started >> looking into how to go about implementing that. It looks like the core of >> it involves being able to uniquely identify what worker you're currently in >> between distributed calls; the Scala end has >> getRuntimeContext().getIndexOfThisSubtask(), but Python's runtime context >> is more or less limited to the broadcast variables. >> >> Happy to hear any hints as to how I should get started with this. Thanks. >> >> Regards, >> Shannon >> >> > |
I'm a Python guru; if it doesn't have a Python API, I'll likely help
make one :) Work is bad this week but I'm planning to get started on this next week! Shannon On 3/14/16 5:37 AM, Robert Metzger wrote: > Hi Shannon, > > I'm happy to see some community engagement on our Python APIs! > > On Fri, Mar 11, 2016 at 6:32 PM, Chesnay Schepler <[hidden email]> > wrote: > >> The subtaskIndex is not currently exposed to the python operator. >> >> Fortunately this can be changed very easily: >> On the java side, within PythonStreamer.startPython() the python process >> is started and several parameters are transferred (L.129++) using >> stdin/-out. >> These parameters are received on the python side in Environment.execute() >> (L.168++). >> >> So the transfer is rather straight-forward, after that you only have to >> modify the operator.configure() method to >> also take a subtaskIndex argument, modify the RuntimeContext constructor, >> add a getIndexOfThisSubtask() method and you're set. >> >> Feel free to open a JIRA for this. >> >> >> On 11.03.2016 18:15, Shannon Quinn wrote: >> >>> Hi all, >>> >>> I'm interested in getting involved the Python API development. The first >>> use-case I've encountered in my work is that of zipWithIndex, so I started >>> looking into how to go about implementing that. It looks like the core of >>> it involves being able to uniquely identify what worker you're currently in >>> between distributed calls; the Scala end has >>> getRuntimeContext().getIndexOfThisSubtask(), but Python's runtime context >>> is more or less limited to the broadcast variables. >>> >>> Happy to hear any hints as to how I should get started with this. Thanks. >>> >>> Regards, >>> Shannon >>> >>> |
Free forum by Nabble | Edit this page |