|Topic||Lead||Interested?||Spec?||Summary||Notes / Goals / Etc||Scheduling notes|
|discovery||devananda||JayF, dtantsur, wanyen, pensu, rameshg87, stendulker, nobodycam, deva, victor_lowther, Nisha||too many to list||We've talked about this several times. The general consensus at the midcycle was "out of scope", and that's the stance we took for the rest of Juno.||If we have a session to discuss this again, I'd like to frame it as a way to encourage integration with external inventory management system(s) -- and if someone wants to add one to OpenStack, that's fine, but not in scope for the current project.||operator track?|
|Introspection and "ready state"||devananda||JayF, lucasagomes, dtantsur, JoshNang, stendulker, rameshg87, nobodycam, deva, victor_lowther||again, too many||Related to discovery of unknown hardware, but distinct from it, is the topic of interrogating hardware about which we at least know *something*: figuring out what it is, populating node.properties, and so on. This paves the way for pre-configuring the hardware into a "ready state".|
Can we do this? Yep. Without a major API change? Yep.
Should we? Probably. It's a highly requested feature, and fits within the existing project scope.
|In addition to this, should we have a joint session with Heat or TripleO |
dtantsur: see also https://github.com/Divius/ironic-discoverd :) it uses the old term "discovery", but is actually about introspection via PXE and a ramdisk.
|Versioned Driver API||NobodyCam||wanyen, stendulker, rameshg87||none||Nova is talking about moving its virt drivers out of tree in order to "avert a crisis." Do we ever expect our driver pool to grow to the point that we would want the same type of split-out?||(NobodyCam)|
I would like to see if the Ironic community believes there would be value in splitting drivers into their own repo. The goal would be basically the same as Nova's: provide faster reviews, have a review team dedicated to drivers, and provide a single location for all drivers, thereby avoiding the in-tree / out-of-tree stigma.
Also, properly versioning the driver API would help support out-of-tree drivers. Whether vendor/proprietary or merely in separate trees for organizational reasons, making that API contract follow SemVer might be a good thing, and would help set a good precedent within OpenStack.
The agent has a similar issue here too; we support "third party" hardware managers but don't currently version that interface. Perhaps IPA could adopt (or even proof of concept) the same approach that Ironic takes.
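The SemVer contract discussed above could be as simple as this sketch (the names `DRIVER_API_VERSION` and `is_compatible`, and the `(major, minor)` tuple encoding, are hypothetical illustrations, not existing Ironic or IPA code):

```python
# Hypothetical sketch of a SemVer compatibility check for a
# versioned driver API. Nothing here is existing Ironic code.

DRIVER_API_VERSION = (1, 2)  # (major, minor) the conductor implements


def is_compatible(declared):
    """SemVer-style check: an out-of-tree driver built against API
    version `declared` works iff the major versions match and the
    conductor's minor version is >= the one the driver was built for."""
    major, minor = declared
    return major == DRIVER_API_VERSION[0] and minor <= DRIVER_API_VERSION[1]
```

The same check would apply equally to IPA's third-party hardware manager interface if it adopted the approach.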
|Ramdisk builders||JayF||wanyen, pensu, lucasagomes, rameshg87, adam_g, deva||n/a||Ramdisks are key to all in-band Ironic deploy mechanisms -- the IPA and PXE drivers both require a properly built ramdisk to function. Right now the source for these builders is spread across many repos: the IPA repo holds its own, the PXE image build lives in diskimage-builder, and so forth. We should come up with a single, sensible home for all the build scripts Ironic requires to function.||This became especially evident when an ISO builder was written that builds an ISO out of a ramdisk/kernel combo. Right now the code is duplicated across repos because we don't have a reasonable, common place for it.|
|oslo.objects||JayF, lucasagomes, mrda||We copied the "object" code from Nova a while back. It's evolved, and we've tried to stay mostly in sync.|
Is it time to move this code out into Oslo now?
NOTE: dansmith is working on this already... not sure we need to have a session for this
|It would be great if we could drag Dan Smith from Nova in to participate in this conversation|
It's on the oslo agenda for this summit. https://etherpad.openstack.org/p/kilo-oslo-summit-topics
|in Oslo track|
|Capabilities aka Firmware/BIOS/RAID configuration||JayF is willing but others likely more capable||dtantsur, wanyen, lucasagomes, stendulker, adam_g, deva, victor_lowther, ifarkas|
many rejected from J
|Many folks want a mechanism for configuring firmware, BIOS, RAID, or other settings on a node at or after deploy time. While things done post-deploy are likely out of scope, it's possible to utilize capabilities+flavors to express configuration in a reasonable, cross-driver way.||I (JayF) personally would like to see solutions that don't involve directly exposing any firmware/bios/raid settings as an Ironic API endpoint, as these will be inherently hard to abstract.||APIs are Hard (tm)|
|operators||devananda||jroll, lucasagomes, JoshNang, nobodycam||We will have a slot in the Monday operators track.|
|What feedback do we want?|
(Nobodycam) The list of desired feedback should be ready for the Monday operators track. We may want to go over this list in our weekly meeting. I would love to get feedback on our Python library vs. the CLI client.
|in Operators track|
|"Experimental" tag for drivers||JayF||Nobodycam||none yet||There is currently no indication to deployers of which Ironic drivers are tested, and how well. We should mark drivers that are not tested and known-good as experimental, and require a deployer to explicitly enable them.||This will help prevent folks from accidentally using drivers in production that aren't suited for it. Also, by making it an attribute of the software, changes to 'experimental' status automatically flow through our usual code review processes.|
(Nobodycam) I feel it would be better to use positive tags like "production ready".
(JayF): I just worry that if we state it positively, it'll be seen as an endorsement -- I've seen a lot of ML threads about declaring things "certified" or similar get pushback from the TC. I don't have a personal preference which way it's implemented though :).
|security||jroll, dtantsur, lucasagomes, JoshNang, adam_g||none||At the moment the pxe/iscsi driver resets the partition table on the first hard disk, but doesn't wipe the data. This has two holes: other disks have their partition tables preserved; tenant data is able to be read by the new instance.||https://bugs.launchpad.net/ironic/+bug/1174153 This should be part of the decom talk below.||No. Roll into decom|
|decom||JoshNang / JayF||jroll, dtantsur, lucasagomes, deva, JoshNang, adam_g|
|Cleaning up after tenants and preparing the server for the next tenant. There is also a main summit presentation about it here: https://openstacksummitnovember2014paris.sched.org/event/722245d15f368a720d95c9a9bbb77100#.VCGa2S5dVag||Most of the code for Ironic and IPA to run this is written, and being actively used downstream by OnMetal. Also, I think there are synergies to be had between this and the "ready state" stuff given they both are directed at validating a machine is ready for another tenant.||Definitely a good thing.|
Do we need a whole slot for it?
|lock breaking||Lucas (if no one else wants to do it)||jroll, JayF, deva, JoshNang, rameshg87, mrda, lucasagomes, nobodycam||none||Breaking the lock on a node. Should we offer that in our API? Or have some separate utility to do it? Of course we should aim to _not_ have a node stuck with a lock, but you know sh*t happens, so it would be nice to offer a way to recover from that problem.|
Could also talk about pluggable locking / zookeeper work here.
|Pluggable locking spec/code: https://review.openstack.org/#/q/status:open+branch:master+topic:bp/pluggable-distributed-synchronization,n,z|
Zookeeper spec/code: https://review.openstack.org/#/q/status:open+branch:master+topic:bp/zookeeper-syncmanager,n,z
Auto-dropping a lock when a conductor loses network connectivity will lead to split-brain and defeats the main purpose of locks. Granted, that purpose has been lost along the way -- protecting a system from potentially dangerous interactions during sensitive operations. Let's see if we can get back to the intent of locks.
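For reference, the per-node lock usage the discussion is about can be sketched as a context manager. This is a deliberate simplification of Ironic's TaskManager (the real thing stores the reservation in the database, in node.reservation, as the conductor hostname); the in-memory dict and the class name here are illustrative only:

```python
import threading

class NodeLocked(Exception):
    """Raised when a node's lock is already held."""

_locks = {}                 # node_uuid -> holder (simplification: real
_guard = threading.Lock()   # Ironic uses the node.reservation DB column)


class task_manager:
    """Simplified per-node exclusive lock in the spirit of Ironic's
    TaskManager: acquire on enter, always release on exit."""

    def __init__(self, node_uuid, holder):
        self.node_uuid = node_uuid
        self.holder = holder

    def __enter__(self):
        with _guard:
            if self.node_uuid in _locks:
                raise NodeLocked(self.node_uuid)
            _locks[self.node_uuid] = self.holder
        return self

    def __exit__(self, *exc):
        with _guard:
            _locks.pop(self.node_uuid, None)
```

"Lock breaking" then amounts to an operator-initiated forced release of a reservation whose holder died before `__exit__` ever ran.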
|policy.json||jroll (or anyone else)||JayF, deva, lucasagomes, nobodycam||none yet||There is a need to provide non-admin access to the API. One example is a read-only monitoring user. Another is providing a way for ops teams to do certain things without giving away the keys to the entire cloud (e.g. putting a node in maintenance mode or updating MAC addresses for NIC replacements). oslo.policy has mechanisms for this; let's make it granular and let operators decide who gets access to what.||Hackathon!|
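As a sketch of what granular rules might look like (the `baremetal:*` rule names and roles here are hypothetical, not actual Ironic policy targets):

```json
{
    "is_admin": "role:admin",
    "is_observer": "role:observer or rule:is_admin",
    "baremetal:node:get": "rule:is_observer",
    "baremetal:node:set_maintenance": "role:ops or rule:is_admin",
    "baremetal:port:update": "role:ops or rule:is_admin",
    "baremetal:node:set_power_state": "rule:is_admin"
}
```

The read-only monitoring user and the "ops team without full keys" cases from the summary both fall out naturally from rules like these.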
|soft power control||NobodyCam||JayF, jroll, rameshg87, deva||none yet||Is there a reason to provide "soft power off" options from Ironic?||I would like to gauge operators' need for such support within Ironic.|
jroll: "nova rescue" sounds like a good reason. "nova reboot" also currently does a hard power off; users don't expect this to corrupt a disk. If users don't have console access, the node may not even be able to run fsck and becomes unusable.
|Let's poll the operators, but we don't need a whole session for this|
|HA nova-compute||jroll||JayF, deva, lucasagomes, JoshNang, rameshg87, nobodycam, adam_g||none yet||How do we solve making nova-compute HA while not completely breaking Nova's view of the world?||Also known as "clustered hypervisors"||This belongs in the Nova track. And we need to talk about it. Seriously.|
|standardizing driver_info parameters||dtantsur, wanyen, lucasagomes, nobodycam, ramineni||All drivers require auth'n. This is stored in node.driver_info. But why do they all require different parameters?|
Could/Should we standardize the parameter names for user_name, password? What about unique properties of some drivers?
|context: the iLO+IPMI hack in https://review.openstack.org/124704|
JayF: This would make it easier to implement hiding secrets from certain users (see policy.json row), or only providing them when specifically requested.
|Split "boot" and "deploy" interfaces||devananda||jroll, lucasagomes, JoshNang, rameshg87, stendulker, adam_g||The iLO driver implementation highlighted the conflation we currently have between booting and provisioning. For example, we can boot using PXE or iLO, and we can deploy using iSCSI or IPA. However, we only have one "deploy" driver interface today.||hackathon?|
|Driver compositing||devananda||dtantsur, lucasagomes, nobodycam, victor_lowther||A "driver" is a composition of realized classes which implement a defined set of interfaces. Sometimes those classes belong to a single module; other times they do not. This is resulting in Ironic exposing an exponentially increasing list of driver names. We have _13_ today, and that is not a complete list.|
Should we give drivers cute names?
|dtantsur: also bothered by this. sometimes our driver split looks too artificial even||meetup?|
|Network partitioning||jroll(?)||We should support separate provisioning and tenant networks (and decommissioning networks, eventually). We'll need to integrate with Neutron or something to tell (physical/software) switches to do such a thing.|
Neutron has been working on some things to support this, and Rackspace does this in production right now :)
|option for netboot or local boot||wanyen (I am proposing this topic. Anyone who wants to lead is fine with me.)||wanyen, lucasagomes, rameshg87, nobodycam, jroll||Currently the PXE driver always netboots, and in Juno the IPA driver always boots locally after a node has been provisioned. Ideally, we want to give users the option to choose netboot or local boot on a per-instance or per-node basis.||This is similar to the "capabilities" discussion above. Ironic should expose such things to the user (whether directly or through Nova flavors) so that they can choose.||Merge with capabilities|
|node placement based on additional hardware/firmware properties||wanyen (I am proposing this topic. Anyone who wants to lead is fine with me.)||wanyen||Currently bare-metal node placement is based on a basic set of hardware properties, i.e., disk and RAM sizes, number of CPUs, and CPU arch. There are use cases where a deployer may want more control over which bare-metal node a workload is placed on; for instance, placing a workload on a specific server model, or on a node with a 10Gb NIC, etc. Flavor extra specs with ComputeCapabilitiesFilter might be able to facilitate this feature.||Q: Doesn't ComputeCapabilitiesFilter already solve that problem!?|
This can be achieved with manually setting node['properties']['capabilities'] today and using appropriate nova flavor metadata. Making this automatic falls into the discussion of hw introspection above.
- create some demonstrations
- create a tempest test
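The capability matching described above can be sketched roughly like this. It is a deliberate simplification of what ComputeCapabilitiesFilter does (the real filter also supports comparison operators such as `>=`); the helper names are mine, but the `key1:val1,key2:val2` capabilities string format is the one Ironic uses today:

```python
def parse_capabilities(caps_string):
    """Parse Ironic's 'key1:val1,key2:val2' capabilities format
    (stored in node['properties']['capabilities']) into a dict."""
    caps = {}
    for pair in caps_string.split(','):
        if ':' in pair:
            key, _, value = pair.partition(':')
            caps[key.strip()] = value.strip()
    return caps


def node_matches_flavor(node_caps_string, flavor_extra_specs):
    """Simplified ComputeCapabilitiesFilter logic: every
    'capabilities:<key>' extra spec on the flavor must equal the
    node's advertised value for <key>."""
    caps = parse_capabilities(node_caps_string)
    for spec, wanted in flavor_extra_specs.items():
        if not spec.startswith('capabilities:'):
            continue
        key = spec[len('capabilities:'):]
        if caps.get(key) != wanted:
            return False
    return True
```

So setting the node property and the matching flavor extra spec by hand is enough for the "10Gb NIC" use case today; automating the property population is the introspection discussion above.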
|BMC sensor type and emitting hw sensors to destinations other than Ceilometer||wanyen (I am proposing this topic. Anyone who wants to lead is fine with me.)||wanyen|
|Currently the hardware sensor type is hardcoded as "ipmi". Vendors may collect additional sensors using non-IPMI protocols, so there is a need for a more generic sensor type name, e.g., "bmc" instead of "ipmi". There is also a need to emit sensor data to other monitoring tools such as Monasca, Nagios, etc.||The data is sent to the RPC channel so any service could listen to that and consume the data AFAIUI||work in progress|
|firmware update||wanyen (I am proposing this topic. Anyone who wants to lead is fine with me.)||wanyen, dtantsur, ramineni, victor_lowther||Firmware update is a useful feature, and I have heard users express interest in this functionality. There are multiple ways to do a firmware update, e.g., out-of-band, or as part of node decom or deploy.||JayF: Firmware updates are likely something that would happen as a part of decom; perhaps this should be rolled into there?|
Deva: reflashing firmware during decom is an essential part of security in a multi-tenant environment. That could be done in- or out-of-band. There is also interest in certain platforms which support live out-of-band firmware updates.
|We need to agree on an API for specifying this that works across drivers and hardware vendors. Is that achievable in one session?|
|trusted boot||lucasagomes, jroll||none for kilo (yet)||implement trusted boot.||prob not worth a session slot, but maybe worth chatting about in the pod area, or unconference talk.|
|notifications||jroll proposed, others feel free to lead||nova-style notifications. When an error happens, put it on the notifications bus. e.g. deploy() fails|
|external events||adam_g proposed, others feel free to lead||jroll|
(obsolete Juno spec) https://review.openstack.org/#/c/99770/
|We have no good way of coordinating events with external services (nova, neutron); we are resorting to polling and (worse) sleeps in various places. We should provide a way for Ironic to send event callbacks to other services (i.e., tell Nova a node is powered on) as well as receive them (i.e., Neutron tells Ironic that VIFs have been plugged). Nova implemented this last cycle and it may be useful here as well.|
|client automatic retries||Nova has a client_wrapper class that wraps retries for certain exceptions. If this is the behavior we expect in our clients, why is our client not doing this itself?!!?||A hack-a-thon. Not a design session.||hackathon|
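A retry helper along the lines of Nova's client_wrapper could be as small as this sketch (the `ConnectionRefused` exception and function names are placeholders for whichever exceptions and naming we'd actually pick):

```python
import time


class ConnectionRefused(Exception):
    """Placeholder for a retryable client-side exception."""


def call_with_retries(func, *args, max_retries=3, delay=0.0, **kwargs):
    """Call `func`, retrying on retryable exceptions and re-raising
    after max_retries attempts. A sketch of what a client_wrapper-style
    helper inside python-ironicclient might do."""
    for attempt in range(1, max_retries + 1):
        try:
            return func(*args, **kwargs)
        except ConnectionRefused:
            if attempt == max_retries:
                raise
            time.sleep(delay)
```

Baking this into the client itself means every consumer (Nova's Ironic driver included) gets the same retry behaviour for free.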
|deva||lucasagomes, jroll, nobodycam|
how 'bout some bugs?
|PXE deploys partition images, and cannot create a boot block, so nodes must always netboot.|
IPA deploys whole-disk images, and requires a boot block!
WTF?? Our drivers behave inconsistently and this is not exposed in the API anywhere!
introducing new driver features
|dtantsur?||dtantsur, lucasagomes, ramineni, victor_lowther|
|Folks want to introduce more and more features that are generic, but will currently be implemented by only one driver.|
The current agreement is to have vendor-passthru as a staging area. First question: is that correct? If so, we need to support the full range of HTTP verbs and also support a sync approach in vendor-passthru.
Anyway, my current understanding is that we aim to promote these vendor-specific things eventually. How do we cope with features that are supported by _most_ drivers (but e.g. not IPMI)?
Raising NotImplementedError is OK sometimes, but it's not really good for that to happen only in the middle of a complex operation. A possible solution is some kind of capabilities framework.
Another use case: supporting whole-disk images. As currently proposed, the deploy process will not fail if the driver does not support whole-disk images; it will just silently produce a broken deploy. With capabilities attached to a driver, we could fail the deploy at the Nova driver stage.
|Also see line 7 (RAID etc)|
(deva) failing during deploy based on information available ahead of time is a bad example. Such information should inform scheduling decisions.
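Either way, the check itself (whether consumed by the scheduler ahead of time, as deva suggests, or used to fail fast at the Nova driver stage) boils down to something like this sketch; the capability name, exception, and function are all hypothetical:

```python
class UnsupportedDriverCapability(Exception):
    """Hypothetical error raised before any deploy work starts."""


def validate_deploy_request(driver_capabilities, instance_info):
    """Reject a request up front instead of silently producing a
    broken deploy: if the image is whole-disk but the driver has not
    declared whole-disk support, fail before touching the node."""
    if (instance_info.get('image_type') == 'whole-disk'
            and 'whole-disk-images' not in driver_capabilities):
        raise UnsupportedDriverCapability(
            'driver does not support whole-disk images')
```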
long-running deploy ramdisks
|Reduce deploy time by a full POST cycle by powering on the node and letting the deploy ramdisk idle until a deploy command is issued|
|chassis / node inheritance||nobodycam (if none better can be found)||none||Ironic has the concept of a chassis, but it is woefully underused. I would like to talk about adding chassis-to-node inheritance for properties / driver_info.|
I can see several advantages:
- rotating IPMI info (for nodes with the same account info)
- each node may require less data (such as CI systems)
- inherited properties would be visible when --detail is passed to node-show
|DHCP MAC registration||zyluo|
|The `support-external-dhcp` specification made the DHCP provider class pluggable.
The base class does two things: 1) update port DHCP options and 2) update port MAC addresses.
This implies that the DHCP server knows about the port, but there is no method to register a MAC/IP pair with the DHCP service.
It is therefore not guaranteed that the DHCP server knows about a MAC address, except when the Neutron DHCP provider class is used, since Neutron receives a port-create request from Nova upon "nova boot".
The DHCP base class needs a "create_port_address" method, which will be called when the admin does an `ironic port-create`.
This specification will change the behaviour of `ironic port-create` to send the MAC address to the DHCP server, along with an optional IP address.
The Neutron DHCP provider class will not override the "create_port_address" method, so no behavioural change occurs there.
Future DHCP provider classes should implement the method accordingly.
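The proposed base-class addition could be sketched like this. Only `create_port_address` is what the text proposes; the other method names are paraphrased from the description above, not copied verbatim from the Ironic source:

```python
import abc


class BaseDHCP(abc.ABC):
    """Sketch of the pluggable DHCP provider interface with the
    proposed create_port_address hook added."""

    @abc.abstractmethod
    def update_port_dhcp_opts(self, port_id, dhcp_options):
        """Existing responsibility 1: update a port's DHCP options."""

    @abc.abstractmethod
    def update_port_address(self, port_id, address):
        """Existing responsibility 2: update a port's MAC address."""

    def create_port_address(self, mac_address, ip_address=None):
        """Proposed: register a MAC (and optionally an IP) with the
        DHCP service when the admin runs `ironic port-create`.
        Default is a no-op, so the Neutron provider's behaviour is
        unchanged; future providers override this as needed."""
```

Making the default a no-op is what keeps the Neutron provider's behaviour identical while letting future providers hook port creation.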
|Related to "Making Ironic more stand-alone and increasing independent functional testing of Ironic", Wed 11:00|